Predicting Titanic Survivors

This is my maiden voyage when it comes to Kaggle competitions, that is!

In this work, we implement a deep neural network from scratch, coding the forward and backward propagation ourselves instead of relying on frameworks such as TensorFlow, Keras, or MXNet. We then compare the neural network's results with other statistical methods, which we build using the popular scikit-learn library.

We follow these steps to make our predictions:

  1. Load Data:

    • Load Data Modelling Libraries
  2. Data Visualization

    • We provide a few plots of the data
  3. Data Wrangling

    • 3.1 Adding title column and digitizing Embarked, Pclass
    • 3.2 Adding Family Size feature
    • 3.3 Adding Survival expectation value by sex and pclass
  4. Neural Network
  5. Compare with other Statistical Methods

  6. Conclusion

    • The neural network is the most accurate method, even though the data set is small: only 668 rows of training data.
    • The statistical methods are the most time-efficient, taking much less time to predict.

1. Load Data

Load Data Modelling Libraries

We use the popular scikit-learn library to develop our machine learning algorithms. In sklearn, algorithms are called Estimators, and each is implemented in its own class. For data visualization, we use the matplotlib and seaborn libraries. Below are the common classes we load.
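A minimal sketch of the loading step, assuming Kaggle's standard `train.csv` and `test.csv` files (their location on disk is an assumption). The helper `load_titanic` is hypothetical: it stacks both files into one DataFrame so that later feature engineering is applied to both sets identically, and an `is_train` flag lets us split them apart again. The in-memory CSV rows are toy stand-ins.

```python
import io

import pandas as pd


def load_titanic(train_src, test_src):
    """Read the train/test CSVs (paths or file-like objects) and stack them.

    The test file has no 'Survived' column, so concat fills it with NaN.
    """
    train = pd.read_csv(train_src)
    test = pd.read_csv(test_src)
    train["is_train"] = True
    test["is_train"] = False
    return pd.concat([train, test], ignore_index=True, sort=False)


# Toy in-memory stand-ins for train.csv / test.csv (illustrative rows only).
train_csv = io.StringIO(
    "PassengerId,Survived,Pclass,Sex,Age\n1,0,3,male,22\n2,1,1,female,38\n"
)
test_csv = io.StringIO("PassengerId,Pclass,Sex,Age\n892,3,male,34.5\n")
full = load_titanic(train_csv, test_csv)
print(full.shape)  # (3, 6)
```

On Kaggle you would pass the real file paths instead of the `StringIO` objects.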

2. Explore Data Graphically

Visualize Age Data
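A plot of age against survival shows how the survival rate changes across age groups. The sketch below computes the numbers such a plot would display, assuming the data sits in a pandas DataFrame with `Age` and `Survived` columns; the helper name and the age bands are our own illustrative choices, and the rows are toy data.

```python
import numpy as np
import pandas as pd


def survival_by_age_band(df, bins=(0, 12, 18, 30, 50, 80)):
    """Survival rate and passenger count per age band.

    Rows with a missing Age (or a missing Survived label) are dropped;
    empty bands are omitted from the result.
    """
    known = df.dropna(subset=["Age", "Survived"])
    bands = pd.cut(known["Age"], bins=list(bins))  # right-closed, e.g. (0, 12]
    return known.groupby(bands, observed=True)["Survived"].agg(["mean", "count"])


toy = pd.DataFrame({
    "Age": [4, 8, 25, 28, 40, 60, np.nan],
    "Survived": [1, 1, 0, 1, 0, 0, 1],
})
summary = survival_by_age_band(toy)
print(summary)
```

The `mean` column is what a bar chart of survival rate by age band would show; `count` indicates how much data backs each bar.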

3. Data Wrangling

3.1 Adding title column and digitizing Embarked, Pclass
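A minimal sketch of this step, assuming the standard Kaggle `Name` format ("Surname, Title. Given names"). The title grouping and the integer codes below are illustrative assumptions, not necessarily the mapping used in this notebook. `Pclass` is already stored as the integers 1, 2, 3, so the sketch only enforces an integer type for it.

```python
import pandas as pd


def add_title_and_encode(df):
    out = df.copy()
    # "Braund, Mr. Owen Harris" -> "Mr": text between the comma and the first period.
    out["Title"] = out["Name"].str.extract(r",\s*([^\.]+)\.", expand=False).str.strip()
    # Fold French/variant forms into common titles (illustrative grouping).
    out["Title"] = out["Title"].replace({"Mlle": "Miss", "Ms": "Miss", "Mme": "Mrs"})
    common = {"Mr", "Mrs", "Miss", "Master"}
    out.loc[~out["Title"].isin(common), "Title"] = "Rare"
    # Digitize categorical columns as integer codes.
    out["Title"] = out["Title"].map({"Mr": 0, "Mrs": 1, "Miss": 2, "Master": 3, "Rare": 4})
    # Missing ports are filled with "S" (Southampton), the most common port.
    out["Embarked"] = out["Embarked"].fillna("S").map({"S": 0, "C": 1, "Q": 2})
    out["Pclass"] = out["Pclass"].astype(int)
    return out


toy = pd.DataFrame({
    "Name": ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina",
             "Smith, Dr. John", "Doe, Mme. Jane"],
    "Embarked": ["S", "C", None, "Q"],
    "Pclass": [3, 3, 1, 2],
})
encoded = add_title_and_encode(toy)
print(encoded[["Title", "Embarked", "Pclass"]])
```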

3.2 Add Family Size feature
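Family size is commonly derived from the `SibSp` (siblings/spouses aboard) and `Parch` (parents/children aboard) columns, plus one for the passenger. A minimal sketch; the `IsAlone` flag is an extra, hypothetical feature that some notebooks derive from it.

```python
import pandas as pd


def add_family_size(df):
    out = df.copy()
    # Siblings/spouses + parents/children aboard, plus the passenger themself.
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1
    # Optional binary flag for passengers travelling without family.
    out["IsAlone"] = (out["FamilySize"] == 1).astype(int)
    return out


toy = pd.DataFrame({"SibSp": [1, 0, 3], "Parch": [0, 0, 2]})
fam = add_family_size(toy)
print(fam)
```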

Family Survival

Credit for this feature goes to https://www.kaggle.com/shunjiangxu/blood-is-thicker-than-water-friendship-forever
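As we understand the linked kernel, it groups passengers who share a surname and a fare, then sets each passenger's family-survival value from whether any other group member is known to have survived or died, defaulting to 0.5 when nothing is known. The sketch below reproduces only that surname-plus-fare idea in simplified form (the kernel also groups by ticket); the function and column names are hypothetical and the rows are toy data.

```python
import pandas as pd

DEFAULT = 0.5  # no information about relatives


def add_family_survival(df):
    out = df.copy()
    out["LastName"] = out["Name"].str.split(",").str[0]
    out["FamilySurvival"] = DEFAULT
    # Same surname + same fare is used as a proxy for travelling together.
    for _, group in out.groupby(["LastName", "Fare"]):
        if len(group) < 2:
            continue
        for idx in group.index:
            # Labels of the *other* members; NaN for test-set passengers.
            others = group.drop(idx)["Survived"]
            if others.max() == 1.0:    # some relative survived
                out.loc[idx, "FamilySurvival"] = 1.0
            elif others.min() == 0.0:  # known relatives all died
                out.loc[idx, "FamilySurvival"] = 0.0
    return out


toy = pd.DataFrame({
    "Name": ["Allison, Mr. H", "Allison, Mrs. H", "Allison, Miss. L",
             "Smith, Mr. A", "Carter, Mr. W", "Carter, Mrs. W"],
    "Fare": [151.55, 151.55, 151.55, 7.25, 120.0, 120.0],
    "Survived": [0, 0, float("nan"), 1, 1, 0],
})
fs = add_family_survival(toy)
print(fs[["Name", "FamilySurvival"]])
```

A passenger's own label is excluded on purpose, so the feature only carries information about relatives.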

3.3 Survival expectation value by sex and pclass
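One way to compute this feature, assuming "survival expectation" means the observed survival rate of each (Sex, Pclass) group, calculated on the labelled training rows only and then attached to every passenger. The function and column names are hypothetical. Note that a training passenger's own label contributes to its group's mean, which leaks a little target information into the feature.

```python
import numpy as np
import pandas as pd


def add_survival_expectation(df):
    out = df.copy()
    train = out[out["Survived"].notna()]  # labelled rows only
    rates = (
        train.groupby(["Sex", "Pclass"])["Survived"].mean().rename("SurvivalExp")
    )
    # Look up each passenger's (Sex, Pclass) group rate.
    return out.join(rates, on=["Sex", "Pclass"])


toy = pd.DataFrame({
    "Sex": ["male", "female", "male", "female", "male"],
    "Pclass": [1, 1, 1, 1, 3],
    "Survived": [0, 1, 1, np.nan, 0],
})
with_exp = add_survival_expectation(toy)
print(with_exp)
```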

3.4 Dividing data

To train and evaluate the machine learning algorithms, we divide the data into training and test sets. The goal is to achieve good accuracy on both sets, since a large gap between them indicates overfitting.
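The figure of 668 training rows quoted in the conclusion is consistent with holding out 25% of the 891 labelled rows in Kaggle's `train.csv` (scikit-learn's default `test_size=0.25`); that ratio is our inference, not something stated in the text. A sketch using scikit-learn's `train_test_split`, where random features stand in for the engineered Titanic features and the `stratify` and `random_state` settings are our own choices.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(891, 8))        # placeholder features, 891 rows
y = rng.integers(0, 2, size=891)     # placeholder 0/1 labels

# Hold out 25% for testing; stratify keeps the class balance similar in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)  # (668, 8) (223, 8)
```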

4. Neural Network

4.1 Define a Neural Network and Initialize values
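A minimal sketch of parameter initialization for a fully connected network, assuming ReLU hidden layers (hence He initialization) and examples stored as columns, so that Z[l] = W[l] @ A[l-1] + b[l]. The function name, the list-based storage and the layer sizes are hypothetical. Weights are drawn randomly to break the symmetry between units, while biases can safely start at zero.

```python
import numpy as np


def initialize_parameters(layer_dims, seed=0):
    """layer_dims = [n_features, n_hidden_1, ..., n_output].

    Returns lists W, b indexed 1..L (index 0 unused) with
    W[l].shape == (n_l, n_{l-1}) and b[l].shape == (n_l, 1).
    """
    rng = np.random.default_rng(seed)
    W, b = [None], [None]
    for l in range(1, len(layer_dims)):
        fan_in = layer_dims[l - 1]
        # He initialization: variance 2 / fan_in keeps ReLU activations well scaled.
        W.append(rng.standard_normal((layer_dims[l], fan_in)) * np.sqrt(2.0 / fan_in))
        b.append(np.zeros((layer_dims[l], 1)))
    return W, b


W, b = initialize_parameters([10, 16, 8, 1])
for l in range(1, len(W)):
    print(f"W{l}", W[l].shape, f"b{l}", b[l].shape)
```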

4.2 Neural Network Training

- If the neural network trains successfully, the cost function should decrease steadily over the iterations and level off near its minimum.
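The training loop below is a minimal from-scratch sketch of forward and backward propagation using NumPy only, assuming ReLU hidden layers, a sigmoid output unit, binary cross-entropy cost and plain batch gradient descent; the architecture and hyperparameters actually used in this notebook may differ, and all names are hypothetical. It records the cost at every iteration so that its decrease can be checked.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train(X, Y, layer_dims, learning_rate=0.1, num_iterations=2000, seed=0):
    """X: (n_features, m); Y: (1, m) with 0/1 labels. Returns W, b, costs."""
    rng = np.random.default_rng(seed)
    L = len(layer_dims) - 1
    # He-initialized weights, zero biases; index 0 unused.
    W = [None] + [rng.standard_normal((layer_dims[l], layer_dims[l - 1]))
                  * np.sqrt(2.0 / layer_dims[l - 1]) for l in range(1, L + 1)]
    b = [None] + [np.zeros((layer_dims[l], 1)) for l in range(1, L + 1)]
    m = X.shape[1]
    costs = []
    for _ in range(num_iterations):
        # Forward propagation: keep every Z and A for the backward pass.
        A, Zs = [X], [None]
        for l in range(1, L + 1):
            Z = W[l] @ A[-1] + b[l]
            Zs.append(Z)
            A.append(sigmoid(Z) if l == L else np.maximum(0.0, Z))
        # Binary cross-entropy (clipped to avoid log(0)).
        AL = np.clip(A[L], 1e-12, 1 - 1e-12)
        costs.append(float(-np.mean(Y * np.log(AL) + (1 - Y) * np.log(1 - AL))))
        # Backward propagation; for sigmoid + cross-entropy, dZ_L = A_L - Y.
        dZ = A[L] - Y
        for l in range(L, 0, -1):
            dW = dZ @ A[l - 1].T / m
            db = dZ.sum(axis=1, keepdims=True) / m
            if l > 1:
                # Propagate through W[l] before it is updated, then through ReLU.
                dZ = (W[l].T @ dZ) * (Zs[l - 1] > 0)
            W[l] -= learning_rate * dW
            b[l] -= learning_rate * db
    return W, b, costs


# Usage on a toy, linearly separable problem (synthetic data, not Titanic).
rng = np.random.default_rng(1)
X_toy = rng.standard_normal((2, 200))
Y_toy = (X_toy[0] + X_toy[1] > 0).astype(float).reshape(1, -1)
W_fit, b_fit, costs = train(X_toy, Y_toy, [2, 8, 1], learning_rate=0.3, num_iterations=2000)
print(f"cost: {costs[0]:.3f} -> {costs[-1]:.3f}")
```

Plotting `costs` against the iteration number gives the usual training curve; a cost that stalls early or grows usually points to a learning rate that is too small or too large.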

4.3 Neural Network Predictions
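Predictions reuse the forward pass: the sigmoid output is read as a survival probability and thresholded at 0.5. A minimal sketch, assuming parameters stored as lists `W`, `b` indexed from 1 (index 0 unused), ReLU hidden layers and a sigmoid output; the function names are hypothetical and the hand-set weights in the usage lines are purely illustrative.

```python
import numpy as np


def predict(W, b, X, threshold=0.5):
    """Forward pass; returns 0/1 labels with shape (1, m)."""
    A = X
    L = len(W) - 1
    for l in range(1, L + 1):
        Z = W[l] @ A + b[l]
        A = 1.0 / (1.0 + np.exp(-Z)) if l == L else np.maximum(0.0, Z)
    # Probability strictly above the threshold -> predicted survivor.
    return (A > threshold).astype(int)


def accuracy(y_pred, y_true):
    return float(np.mean(y_pred == y_true))


# One sigmoid unit with hand-set weights: Z = x1 - x2.
W_demo = [None, np.array([[1.0, -1.0]])]
b_demo = [None, np.zeros((1, 1))]
X_demo = np.array([[2.0, 0.0, 1.0],
                   [1.0, 3.0, 1.0]])
labels = predict(W_demo, b_demo, X_demo)
print(labels)  # [[1 0 0]]
print(accuracy(labels, np.array([[1, 1, 0]])))
```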

5. Compare with other Statistical Methods
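A sketch of the comparison, assuming a handful of common scikit-learn classifiers (the specific models are our assumption) and timing fit plus scoring with `time.perf_counter`. Synthetic data from `make_classification` stands in for the engineered Titanic features, so the printed numbers say nothing about the Titanic results; the point is the uniform `fit`/`score` interface shared by all sklearn Estimators.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def compare_models(X_train, y_train, X_test, y_test):
    """Return {name: (test accuracy, seconds for fit + score)}."""
    models = {
        "LogisticRegression": LogisticRegression(max_iter=1000),
        "KNeighbors": KNeighborsClassifier(n_neighbors=5),
        "SVC": SVC(),
        "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
    }
    results = {}
    for name, model in models.items():
        start = time.perf_counter()
        model.fit(X_train, y_train)
        acc = model.score(X_test, y_test)  # mean accuracy on the test set
        results[name] = (acc, time.perf_counter() - start)
    return results


X, y = make_classification(n_samples=891, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
results = compare_models(X_train, y_train, X_test, y_test)
for name, (acc, secs) in results.items():
    print(f"{name:20s} accuracy={acc:.3f} time={secs:.3f}s")
```

Timing the neural network's training loop the same way makes the speed comparison in the conclusion directly measurable.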

6. Conclusion

  • The neural network is the most accurate method, even though the data set is small: only 668 rows of training data.
  • The statistical methods are the most time-efficient, taking much less time to predict.