Lab 8: The importance of plotting

Task 1) Analyze the dataset: ans.csv

Calculate the missing values (fill in the table).

Property                                              | Value | Accuracy
Mean of x                                             | ?     | ?
Sample variance of x                                  | ?     | ?
Mean of y                                             | ?     | ?
Sample variance of y                                  | ?     | ?
Correlation between x and y                           | ?     | ?
Linear regression line (y = a + bx)                   | ?     | ?
Coefficient of determination of the linear regression | ?     | ?
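
The quantities in the table above can be computed with NumPy. The following is only a minimal sketch: the helper name summarize is hypothetical, the toy numbers are not taken from ans.csv, and the layout of ans.csv (column names, number of datasets) is not assumed here, so loading the file is left to you.

```python
import numpy as np


def summarize(x, y):
    """Return the table quantities for one (x, y) dataset (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.polyfit returns the highest-degree coefficient first: slope b, then intercept a.
    b, a = np.polyfit(x, y, deg=1)
    r = np.corrcoef(x, y)[0, 1]  # Pearson correlation coefficient
    return {
        "mean_x": x.mean(),
        "var_x": x.var(ddof=1),  # ddof=1 gives the sample variance
        "mean_y": y.mean(),
        "var_y": y.var(ddof=1),
        "corr": r,
        "intercept_a": a,
        "slope_b": b,
        # For a simple linear regression with intercept, R^2 equals r squared.
        "r_squared": r ** 2,
    }


# Toy data for illustration only (NOT the contents of ans.csv).
toy = summarize([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
for key, value in toy.items():
    print(f"{key}: {value:.3f}")
```

If ans.csv holds several datasets, the same function can be called once per dataset and the results compared side by side.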

Task 2) Analyze the dataset: ans2.tsv

Calculate the missing values (fill in the tables).

Property                    | Value | Accuracy (up to 3 decimal places)
Mean of x                   | ?     | ?
Mean of y                   | ?     | ?
Standard deviation (SD) of x | ?     | ?
Standard deviation (SD) of y | ?     | ?
Correlation between x and y | ?     | ?
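
For a file with several datasets, the per-dataset statistics can be computed with a pandas groupby. This is a sketch under stated assumptions: the column names "dataset", "x" and "y" are a guess at the layout of ans2.tsv, the toy TSV is made up for illustration, and the helper agreement_places assumes that "Accuracy" means the number of decimal places to which the values agree across datasets (as in the Wikipedia table for Anscombe's quartet). Adjust all of these to the real file and to the intended meaning of the column.

```python
import io

import pandas as pd

# Toy TSV for illustration only; the column names are an assumption
# about ans2.tsv, not its documented layout.
toy_tsv = "dataset\tx\ty\nA\t1\t2\nA\t2\t4\nA\t3\t6\nB\t1\t3\nB\t2\t2\nB\t3\t1\n"
df = pd.read_csv(io.StringIO(toy_tsv), sep="\t")


def per_dataset_stats(frame):
    """Return one row of summary statistics per dataset (hypothetical helper)."""
    rows = {}
    for name, group in frame.groupby("dataset"):
        rows[name] = {
            "mean_x": group["x"].mean(),
            "mean_y": group["y"].mean(),
            "sd_x": group["x"].std(),  # pandas defaults to ddof=1 (sample SD)
            "sd_y": group["y"].std(),
            "corr": group["x"].corr(group["y"]),  # Pearson by default
        }
    return pd.DataFrame(rows).T


def agreement_places(values, max_places=3):
    """Largest number of decimal places (max_places down to 0) at which all
    values round to the same number; None if they differ even at 0 places."""
    for places in range(max_places, -1, -1):
        if len({round(float(v), places) for v in values}) == 1:
            return places
    return None


stats = per_dataset_stats(df)
print(stats)
print("mean_x agrees to", agreement_places(stats["mean_x"]), "decimal places")
```

For the real file, replacing the io.StringIO object with the path "ans2.tsv" in pd.read_csv should be enough, provided the column names match.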

Homework:

Prepare a report in HTML (exported from Jupyter) with the filled-in tables and scatter plots for:

  • Task 1: one single plot similar to the one from Wikipedia
  • Task 2: separate subplots for each dataset (the structure of the plot should look similar to the plots from Task 5 in Lab 7)
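
A grid of scatter subplots, one panel per dataset, can be sketched with matplotlib as follows. The helper name plot_datasets, the figure size, and the toy data are assumptions made for illustration; the layout expected in the lectures and in Lab 7 may differ, so treat this as a starting point only.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; can be dropped inside Jupyter
import matplotlib.pyplot as plt
import numpy as np


def plot_datasets(datasets, ncols=2):
    """Draw one scatter subplot per dataset, each with its fitted line.

    datasets maps a dataset name to an (x, y) pair of sequences.
    """
    nrows = -(-len(datasets) // ncols)  # ceiling division
    fig, axes = plt.subplots(
        nrows, ncols, figsize=(4 * ncols, 3.5 * nrows),
        sharex=True, sharey=True, squeeze=False,
    )
    flat_axes = axes.ravel()
    for ax, (name, (x, y)) in zip(flat_axes, datasets.items()):
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        b, a = np.polyfit(x, y, deg=1)  # slope b, intercept a
        ax.scatter(x, y)
        xs = np.array([x.min(), x.max()])
        ax.plot(xs, a + b * xs)  # regression line y = a + b*x
        ax.set_title(name)
        ax.set_xlabel("x")
        ax.set_ylabel("y")
    # Hide panels that have no dataset assigned.
    for ax in flat_axes[len(datasets):]:
        ax.set_visible(False)
    fig.tight_layout()
    return fig


# Toy data for illustration only (not taken from ans.csv or ans2.tsv).
toy_sets = {
    "A": ([1, 2, 3, 4], [2, 4, 6, 8]),
    "B": ([1, 2, 3, 4], [8, 6, 4, 2]),
    "C": ([1, 2, 3, 4], [3, 3, 4, 4]),
}
fig = plot_datasets(toy_sets)
fig.savefig("toy_subplots.png", dpi=100)
```

Shared axes make it easy to compare datasets at a glance, which is the point of the exercise: sets with matching summary statistics can look very different once plotted.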

Can you guess what "d" and "s" stand for in the datasets in ans2.tsv?

The report should contain:

  • the main report file in HTML (with all the plots embedded)
  • the Jupyter notebook*

* This time no .py scripts are needed, as the Python code should be included in the Jupyter notebook and its HTML export.


The homework should be sent by 26.04.2020 via email to lukaskoz@mimuw.edu.pl with the email subject:

'lab8_hw_Name_Surname', with an empty message body and the attachment 'lab8_hw_Name_Surname.7z' (ASCII letters only).

All emails with a different structure (i.e. ones that will not pass through the email filter into the folder dedicated to homework) will be penalized by 10%.

Using non-English labels, legends, descriptions, etc. will be penalized by 10%.

Additionally, any problems with the structure of the plot, e.g. the plot size, label font size, etc., will also affect the grading. Follow the advice given in the lectures.

Epilogue: Read the article "Same Stats, Different Graphs".