Index of /teaching/dav_20/labs/lab13/

      Name                 Last modified         Size  Description
      Parent Directory     01-Jun-2020 11:54        -
      titanic/             01-Jun-2020 11:54        -
      test.csv             01-Jun-2020 12:10      28k
      train.csv            01-Jun-2020 12:10      60k

====================================================================
                            TITANIC
                            Part 3
====================================================================

Use the passenger data from the Titanic shipwreck to answer the question 
"what sorts of people were more likely to survive?"

You will be given: name, age, gender, socio-economic class, etc.

====================================================================

The data has been split into two groups:

- training set (train.csv)
- test set (test.csv)
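
A minimal sketch of reading the two files with Python's standard csv module.
The helper name load_rows is hypothetical, and the sketch only assumes that
each file starts with a header row; it does not rely on specific column names.

```python
import csv


def load_rows(path):
    """Read a CSV file with a header row into a list of dicts (one per passenger).

    All values are returned as strings; missing values come back as "".
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Typical use, with the two files in the working directory:
#   train_rows = load_rows("train.csv")
#   test_rows = load_rows("test.csv")
#   print(len(train_rows), len(test_rows), list(train_rows[0].keys()))
```

Printing the keys of the first row is a quick way to check the exact column
names before writing any further processing code.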

Variable description:

pclass: A proxy for socio-economic status (SES)
1st = Upper, 2nd = Middle, 3rd = Lower

age: Age is fractional if less than 1. If the age is estimated, it is in the form of xx.5

sibsp: Number of siblings/spouses aboard. The dataset defines family relations in this way...
Sibling = brother, sister, stepbrother, stepsister
Spouse = husband, wife

parch: Number of parents/children aboard. The dataset defines family relations in this way...
Parent = mother, father
Child = daughter, son, stepdaughter, stepson
Some children traveled only with a nanny, therefore parch=0 for them.
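
The variables above have to be turned into numbers before a model can use
them. The following is only a sketch of one possible encoding: the function
name encode_passenger, the one-hot coding of pclass, the division of age by
100 and the default_age placeholder are all assumptions made for illustration,
not part of the assignment.

```python
def encode_passenger(pclass, sex, age, sibsp, parch, default_age=30.0):
    """Turn one passenger record into a list of floats.

    default_age is an arbitrary placeholder for a missing age (assumption);
    the median age of the training set is a common alternative.
    """
    if pclass not in (1, 2, 3):
        raise ValueError("pclass must be 1, 2 or 3")
    sex = sex.strip().lower()
    if sex not in ("male", "female"):
        raise ValueError("sex must be 'male' or 'female'")
    age_value = default_age if age is None else float(age)
    return [
        float(pclass == 1),   # one-hot: 1st class (Upper)
        float(pclass == 2),   # one-hot: 2nd class (Middle)
        float(pclass == 3),   # one-hot: 3rd class (Lower)
        1.0 if sex == "male" else 0.0,
        age_value / 100.0,    # rough scaling to about 0..1
        float(sibsp),
        float(parch),
    ]
```

For example, encode_passenger(1, "Male", 24, 0, 0) yields seven numbers, with
the first-class flag and the male flag set to 1.0 and the age stored as 0.24.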

====================================================================

3) Building ML models (deep learning)
- install tensorflow & keras
- train a dense model (at least two dense layers, adam optimizer, relu/softsign activations, dropout layers)
- print the model structure with model.summary() and save it in a text or image file (screenshot)
- calculate scores and save the models in json & hdf5 format (model.to_json() for the json part)
- write a prediction script (a command-line tool that asks for age, gender, 
socio-economic class, etc. and returns the prediction)
for instance:
python titanic_dl_predictor.py
Age: 24
Gender: Male
socio-economic class: 1
...
Most likely you would not survive the Titanic crash (DEAD 0.9311)
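
The steps of part 3 could be sketched roughly as follows, assuming
TensorFlow 2.x with its bundled Keras. The layer sizes, dropout rate, number
of epochs, file names and the SURVIVED label are assumptions chosen for
illustration; only the DEAD message follows the example above. x_train and
y_train are assumed to be NumPy arrays of encoded features and 0/1 labels.

```python
def build_model(n_features):
    """Dense binary classifier: two hidden layers with dropout, Adam optimizer."""
    # Imported here so format_verdict below stays usable without TensorFlow.
    from tensorflow import keras

    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dropout(0.2),
        keras.layers.Dense(16, activation="softsign"),
        keras.layers.Dropout(0.2),
        keras.layers.Dense(1, activation="sigmoid"),  # P(survived)
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model


def train_and_save(x_train, y_train, prefix="titanic_dl"):
    """Fit the model, then write its summary, architecture and weights to disk."""
    model = build_model(x_train.shape[1])
    model.fit(x_train, y_train, epochs=50, batch_size=32,
              validation_split=0.2, verbose=0)
    loss, accuracy = model.evaluate(x_train, y_train, verbose=0)
    print(f"training loss {loss:.4f}, accuracy {accuracy:.4f}")

    # Collect the summary text; **kwargs tolerates extra arguments that
    # some Keras versions pass to print_fn.
    summary_lines = []
    model.summary(print_fn=lambda line, **kwargs: summary_lines.append(line))
    with open(prefix + "_summary.txt", "w", encoding="utf-8") as handle:
        handle.write("\n".join(summary_lines))

    with open(prefix + ".json", "w", encoding="utf-8") as handle:
        handle.write(model.to_json())            # architecture as JSON
    model.save_weights(prefix + ".weights.h5")   # weights as HDF5
    return model


def predict_survival(model, features):
    """Return the predicted survival probability for one encoded passenger."""
    import numpy as np

    batch = np.asarray([features], dtype="float32")
    return float(model.predict(batch, verbose=0)[0][0])


def format_verdict(p_survive):
    """Turn a survival probability into the message printed by the predictor."""
    if not 0.0 <= p_survive <= 1.0:
        raise ValueError("p_survive must be between 0 and 1")
    if p_survive < 0.5:
        return ("Most likely you would not survive the Titanic crash "
                f"(DEAD {1.0 - p_survive:.4f})")
    return ("Most likely you would survive the Titanic crash "
            f"(SURVIVED {p_survive:.4f})")
```

A predictor script would read the answers with input(), encode them the same
way as the training data, call predict_survival and print format_verdict of
the result. Scores here are computed on the training data only; a held-out
split gives a more honest estimate.
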
* GPU vs CPU
For our exercises, it is sufficient to use only the CPU, but in real-life 
scenarios deep learning training can require a lot of RAM and computing power. 
A GPU can then be used, but this is also the tricky part. First of all, you need 
an NVIDIA card (most laptops do not have a separate graphics card). Next, you 
need to install a special NVIDIA driver supporting deep learning (it seems easy 
to install, but it frequently leads to serious problems, including complete 
system failure or re-installation of the X environment).

====================================================================

Homework:

Make a PDF report with all plots and tables summarizing the Titanic parts 1-3. 
If possible, draw some conclusions.

Additionally, provide all scripts and separate plot image files, such as:
- initial data exploration plots
- the decision tree visualizations
- for deep learning, also the json & hdf5 models

All files should be sent by 07.06.2020 via email to lukaskoz@mimuw.edu.pl 
with the email subject 'lab13_hw_Name_Surname', without email body text, and 
with a 'lab13_hw_Name_Surname.7z' (ASCII letters only) attachment.

All emails with a different structure (ones that will not go through the email 
filter to the folder dedicated to homework) will be scored -10%.

Using non-English labels, legends, descriptions, etc. will be scored -10%.