# D3: Visualizing Titanic Survivors by Gender, Age and Class

Titanic: Machine Learning from Disaster is the "101"-style introductory machine learning competition that Kaggle has hosted since it started. The task is to predict who would survive the disaster given each individual's age, gender, socio-economic status (class) and various other features.

Recently, during the winter break, I started learning the JavaScript library D3.js. The above graph is a screenshot of the first visualization project I created with D3.js. The live version of the project is here.

• Each rectangle in the graph represents a passenger on the Titanic. Yellow means that the passenger survived the disaster, and blue means that the passenger did not.
• There can be multiple people with the same age, gender, and class values, so I set the opacity of each rectangle to 20%. Places on the graph that appear solid yellow are where many survivors overlap, so passengers there had a higher chance of surviving, whereas solid blue indicates danger.
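To see why overlapping rectangles at 20% opacity end up looking solid, here is a small Python sketch of the arithmetic. It assumes standard "source-over" alpha blending and that all stacked rectangles have the same color; the function name `stacked_opacity` is made up for illustration and is not part of the D3 project.

```python
def stacked_opacity(alpha: float, layers: int) -> float:
    """Effective opacity of `layers` same-colored shapes drawn on top of each other.

    Assumes source-over blending: each layer lets a fraction (1 - alpha)
    of whatever is underneath show through.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if layers < 0:
        raise ValueError("layers must be non-negative")
    return 1.0 - (1.0 - alpha) ** layers


# How quickly does a 20% rectangle become "solid" as passengers overlap?
for n in (1, 2, 5, 10, 20):
    print(n, round(stacked_opacity(0.2, n), 3))
```

With ten overlapping passengers the color is already close to 90% opaque. Where yellow and blue rectangles overlap, the colors mix instead, so this formula only describes the single-color case.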

Based on this visualization we can see that:

1. Females (young or old, except around age 25) and young males (under age 15) from the middle and upper classes tended to survive.
2. The overall survival rate for female passengers is higher than for male passengers.

So, without all the drama shown in the classic movie, this visualization basically predicts that Jack will most likely not be able to make it, but Rose will survive...

Update: Jan 14

I made a couple of changes to the visualization during the last few days. A newer version is now available here.

# VIDEO 3 - A Basic Line Plot

## Making the `x` values an ordered categorical data type won't help; this is probably due to differences in the `ggplot` implementation in Python.

# VIDEO 4 - Adding the Hour of the Day

## Create our plot/Change the colors

`ggplot` in Python does not support `aes(group=Var)` yet.

Also, the `ggplot` version I got using `pip` does not seem to plot the legend correctly.

## Make a heatmap:

I struggled to plot a heatmap using `ggplot` in Python, without success, so I am turning to my old friend `matplotlib`.
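A minimal sketch of a heatmap in plain `matplotlib`, using `imshow` on a 2-D grid of counts. The weekday-by-hour layout, the axis labels, and the helper name `plot_heatmap` are assumptions for illustration, and the counts are random placeholders rather than the data from the video.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np


def plot_heatmap(counts, row_labels, col_labels):
    """Draw a 2-D array of counts as a heatmap and return the figure."""
    counts = np.asarray(counts)
    fig, ax = plt.subplots(figsize=(8, 3))
    image = ax.imshow(counts, aspect="auto", cmap="Reds")
    # Put one tick per cell and label it with the category name.
    ax.set_xticks(np.arange(len(col_labels)))
    ax.set_xticklabels(col_labels)
    ax.set_yticks(np.arange(len(row_labels)))
    ax.set_yticklabels(row_labels)
    ax.set_xlabel("Hour")  # assumed axis meaning
    ax.set_ylabel("Day")   # assumed axis meaning
    fig.colorbar(image, ax=ax, label="Count")
    return fig


days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
hours = list(range(24))
rng = np.random.default_rng(0)
toy_counts = rng.integers(0, 100, size=(len(days), len(hours)))  # placeholder values
fig = plot_heatmap(toy_counts, days, hours)
# fig.savefig("heatmap.png")  # uncomment to write the image to disk
```

With real data, the grid would typically come from a pivot of counts per category pair before being passed to the function.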

# VIDEO 5 - Maps

I gave up on it...

# VIDEO 4 - A BASIC SCATTERPLOT

## Let's redo this using ggplot

There is a `ggplot` library developed by `yhat` for Python, but it is not as developed as `ggplot2` in `R`.

Since `ggplot` in Python is based on `matplotlib`, there are some small differences.

Create the `ggplot` object with the data and the aesthetic mapping:

## Redo the plot with blue triangles instead of circles:

To specify the type of plotting symbol, refer to the marker codes in `matplotlib`. Also notice that the `size` is different.
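For reference, this is roughly what the marker and size settings look like in `matplotlib` itself. It is a sketch with made-up points, not the original plot. In `matplotlib`, `"^"` is the upward-triangle marker and the `s` argument of `scatter` is the marker area in points squared, which may be one reason sizes look different from `ggplot2`; how `ggplot` for Python maps its own `size` onto this is not verified here.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line inside a notebook
import matplotlib.pyplot as plt


def scatter_blue_triangles(x, y, size=40):
    """Scatter plot with blue upward triangles; `size` is area in points^2."""
    fig, ax = plt.subplots()
    points = ax.scatter(x, y, marker="^", color="blue", s=size)
    return fig, points


# Made-up coordinates, only to exercise the function.
toy_x = [1.0, 2.0, 3.0, 4.0]
toy_y = [2.5, 1.5, 3.5, 3.0]
fig, points = scatter_blue_triangles(toy_x, toy_y)
```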

# VIDEO 5 - MORE ADVANCED SCATTERPLOTS

## Is the fertility rate of a country a good predictor of the percentage of the population under 15?

## Simple linear regression model to predict the percentage of the population under 15, using the log of the fertility rate:
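A minimal sketch of such a model in plain Python, fitting `under15 = intercept + slope * log(fertility)` by ordinary least squares. The function names are hypothetical, and the numbers below are synthetic values generated from a known line so the fit can be checked; they are not real country data.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def fit_log_model(fertility, under15):
    """Regress the under-15 percentage on the log of the fertility rate."""
    return fit_line([math.log(f) for f in fertility], under15)


def predict_under15(intercept, slope, fertility_rate):
    """Predicted under-15 percentage for a given fertility rate."""
    return intercept + slope * math.log(fertility_rate)


# Synthetic data built from under15 = 10 + 20 * log(fertility).
toy_fertility = [1.5, 2.0, 3.0, 4.0, 6.0]
toy_under15 = [10 + 20 * math.log(f) for f in toy_fertility]

intercept, slope = fit_log_model(toy_fertility, toy_under15)
print(round(intercept, 6), round(slope, 6))
```

Because the predictor is the log of the fertility rate, the slope describes the change in the under-15 percentage per unit change in log fertility, not per additional child.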

# Machine Learning Notes Week 8: Dimensionality Reduction

Week 8 consists of two parts:

### 2 Dimensionality Reduction

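#### 2.3 Principal Component Analysis

• The problem is to reduce n-dimensional data to k dimensions
• The goal is to find vectors u(1), u(2), ..., u(k) such that the total projection error is minimized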
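A small NumPy sketch of this idea: take the first k singular vectors of the covariance matrix as u(1), ..., u(k) and measure the average squared projection error. It assumes the data only needs mean-centering (no feature scaling), and the function names and the five sample points are made up for illustration.

```python
import numpy as np


def pca_fit(X, k):
    """Return (mean, U_reduce); U_reduce has u(1)..u(k) as its columns."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    centered = X - mean
    sigma = centered.T @ centered / X.shape[0]  # n x n covariance matrix
    U, _, _ = np.linalg.svd(sigma)              # columns sorted by variance explained
    return mean, U[:, :k]


def projection_error(X, mean, directions):
    """Average squared distance between each point and its projection."""
    centered = np.asarray(X, dtype=float) - mean
    Z = centered @ directions       # k-dimensional coordinates
    approx = Z @ directions.T       # back in the original n dimensions
    return float(np.mean(np.sum((centered - approx) ** 2, axis=1)))


# Made-up 2-D points lying close to a diagonal line.
toy_X = np.array([[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9], [5.0, 5.1]])
mean, U_reduce = pca_fit(toy_X, k=1)

err_pca = projection_error(toy_X, mean, U_reduce)
err_axis = projection_error(toy_X, mean, np.array([[1.0], [0.0]]))  # project onto the x-axis instead
print(err_pca, err_axis)
```

Projecting onto the direction found by PCA gives a smaller error than projecting onto an arbitrary direction such as the x-axis, which is what "minimizing the total projection error" means here.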

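#### 2.6 Applying Principal Component Analysis

1. The first step is to use principal component analysis to compress the data to 1000 features
2. Then run the learning algorithm on the training set
3. At prediction time, use the previously learned Ureduce to transform the input features x into the feature vector z, and then make the prediction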
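The three steps can be sketched in NumPy as follows. This is an illustration under assumptions: the sizes are scaled down (50 features compressed to 5 instead of 1000), the data is random, and plain least squares stands in for "the learning algorithm". The helper names `learn_u_reduce` and `to_z` are hypothetical.

```python
import numpy as np


def learn_u_reduce(X_train, k):
    """Step 1: learn the mean and U_reduce from the training set only."""
    mean = X_train.mean(axis=0)
    centered = X_train - mean
    sigma = centered.T @ centered / X_train.shape[0]
    U, _, _ = np.linalg.svd(sigma)
    return mean, U[:, :k]


def to_z(x, mean, U_reduce):
    """Map original features x to compressed features z."""
    return (x - mean) @ U_reduce


rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 50))   # random stand-in for real training inputs
y_train = rng.normal(size=200)         # random stand-in for real targets

# Step 1: compress the training inputs.
mean, U_reduce = learn_u_reduce(X_train, k=5)
Z_train = to_z(X_train, mean, U_reduce)

# Step 2: run a learning algorithm on the compressed training set
# (least squares with an intercept, chosen only as a placeholder).
design = np.column_stack([np.ones(len(Z_train)), Z_train])
weights, *_ = np.linalg.lstsq(design, y_train, rcond=None)

# Step 3: at prediction time, reuse the same mean and U_reduce.
x_new = rng.normal(size=50)
z_new = to_z(x_new, mean, U_reduce)
y_hat = float(np.concatenate([[1.0], z_new]) @ weights)
```

The point of the sketch is that `mean` and `U_reduce` are computed once from the training data and then reused unchanged for every new input.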