Problem 3a. In this problem we look at the mean and the standard deviation from a more statistical point of view.
Does the standard deviation accurately describe the typical deviations?
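One way to approach this question is to draw a normal sample and compare its SD with two other measures of a "typical" deviation: the mean and the median of the absolute deviations from the sample mean. The location, scale and sample size below are arbitrary illustrative choices. For a normal distribution, the mean absolute deviation should come out near $\sqrt{2/\pi}\,\sigma \approx 0.80\,\sigma$, and about 68% of the points should lie within one SD of the mean.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
# Arbitrary example parameters: mean 10, standard deviation 2
sample = rng.normal(loc=10.0, scale=2.0, size=100_000)

mean = sample.mean()
sd = sample.std(ddof=1)                # sample standard deviation
deviations = np.abs(sample - mean)

mad_mean = deviations.mean()           # mean absolute deviation from the mean
mad_median = np.median(deviations)     # median absolute deviation from the mean
within_one_sd = np.mean(deviations <= sd)  # fraction of points in mean +/- 1 SD

print(f"mean={mean:.3f}  sd={sd:.3f}")
print(f"mean |dev|={mad_mean:.3f}  median |dev|={mad_median:.3f}")
print(f"fraction within 1 SD: {within_one_sd:.3f}")
```

For a normal sample all three numbers are of the same order, so the SD is a reasonable summary of the typical deviation. The extra task below asks whether this still holds for other distributions.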
Extra task. Explore np.random to generate samples from other distributions.
Plot the histograms, the probability density functions (PDFs) and the cumulative distribution functions (CDFs) for some of them (see, e.g., scipy.stats.norm). Play with the parameters.
Answer the same question for them: does the standard deviation accurately describe the typical deviations?
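The extra task could be started along these lines. The sketch draws samples with np.random, overlays the matching scipy.stats PDF on a histogram, plots the theoretical and empirical CDFs, and records the same SD-versus-typical-deviation numbers as for the normal case. The three distributions and their parameters are arbitrary examples. The heavy-tailed Student t with `df=3` is included because its SD is dominated by rare large values.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(seed=1)

# (label, sampler built on np.random, matching frozen scipy.stats distribution).
# The parameter values are arbitrary; change them and rerun.
cases = [
    ("normal(0, 1)", lambda n: rng.normal(0.0, 1.0, n), stats.norm(loc=0.0, scale=1.0)),
    ("exponential(scale=1)", lambda n: rng.exponential(1.0, n), stats.expon(scale=1.0)),
    ("Student t, df=3", lambda n: rng.standard_t(3, n), stats.t(df=3)),
]

n = 50_000
fig, axes = plt.subplots(len(cases), 2, figsize=(10, 3 * len(cases)))
summary = {}
for (ax_pdf, ax_cdf), (label, sampler, dist) in zip(axes, cases):
    sample = sampler(n)
    sd = sample.std(ddof=1)
    dev = np.abs(sample - sample.mean())
    # (SD, median absolute deviation from the mean, fraction within mean +/- 1 SD)
    summary[label] = (sd, np.median(dev), np.mean(dev <= sd))

    # Show the central 99% of the sample so heavy tails do not squash the picture;
    # the histogram density is normalized over this range only.
    lo, hi = np.percentile(sample, [0.5, 99.5])
    grid = np.linspace(lo, hi, 400)
    ax_pdf.hist(sample, bins=60, range=(lo, hi), density=True, alpha=0.5, label="histogram")
    ax_pdf.plot(grid, dist.pdf(grid), label="PDF")
    ax_pdf.set_title(label)
    ax_pdf.legend()

    ax_cdf.plot(np.sort(sample), np.arange(1, n + 1) / n, label="empirical CDF")
    ax_cdf.plot(grid, dist.cdf(grid), "--", label="CDF")
    ax_cdf.set_xlim(lo, hi)
    ax_cdf.set_title(f"{label}: CDF")
    ax_cdf.legend()
fig.tight_layout()

for label, (sd, med_dev, frac) in summary.items():
    print(f"{label:22s} sd={sd:.3f}  median|dev|={med_dev:.3f}  within 1 SD={frac:.3f}")
```

In a notebook the figure is displayed automatically; in a script, call `plt.show()`. Comparing the printed ratios across the three rows is one way to answer the question for each distribution.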
Extra task: The importance of plotting
Analyze the dataset: ans.csv
Calculate the missing values (fill in the tables).
Property | Value | Accuracy |
---|---|---|
Mean of x | ? | ? |
Sample variance of x | ? | ? |
Mean of y | ? | ? |
Sample variance of y | ? | ? |
Correlation between x and y | ? | ? |
Linear regression line (y = a+bx) | ? | ? |
Coefficient of determination of the linear regression | ? | ? |
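The "Value" column of the table could be filled with a helper like the one below. It is a sketch that assumes each data set is given as two numeric arrays `x` and `y`; the layout of ans.csv is not described here, so the column names used later are hypothetical. Sample variances divide by $n-1$ (`ddof=1`). The regression is an ordinary least-squares fit of $y = a + bx$, and for such a fit with an intercept the coefficient of determination equals the squared correlation.

```python
import numpy as np


def describe_xy(x, y):
    """Summary statistics for one (x, y) data set (sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]            # Pearson correlation coefficient
    b, a = np.polyfit(x, y, deg=1)         # polyfit returns the slope first
    y_hat = a + b * x
    ss_res = np.sum((y - y_hat) ** 2)      # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares
    return {
        "mean_x": x.mean(),
        "var_x": x.var(ddof=1),            # sample variance
        "mean_y": y.mean(),
        "var_y": y.var(ddof=1),
        "corr": r,
        "a": a,                            # intercept
        "b": b,                            # slope
        "r2": 1.0 - ss_res / ss_tot,       # equals r**2 for a fit with intercept
    }
```

A possible usage, with hypothetical column names: `df = pd.read_csv("ans.csv")` followed by `describe_xy(df["x"], df["y"])`. If the file holds several subsets, call the helper once per subset.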
Analyze the dataset: ans2.tsv
Property | Value | Accuracy (up to 3 places) |
---|---|---|
Mean of x | ? | ? |
Mean of y | ? | ? |
SD of x | ? | ? |
SD of y | ? | ? |
Correlation between x and y | ? | ? |
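Because ans2.tsv is split into subsets, a sketch that computes the table's statistics separately for each subset label may help. It assumes the data can be passed as three aligned arrays: a subset label and the two coordinates. Whether the table means the sample SD (`ddof=1`, used here) or the population SD (`ddof=0`) is an assumption; at three decimal places the two can differ for small subsets.

```python
import numpy as np


def describe_subsets(labels, x, y):
    """Mean, SD and correlation of (x, y) for every subset label (sketch)."""
    labels = np.asarray(labels)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    table = {}
    for label in np.unique(labels):
        mask = labels == label
        xs, ys = x[mask], y[mask]
        table[str(label)] = {
            "mean_x": xs.mean(),
            "mean_y": ys.mean(),
            "sd_x": xs.std(ddof=1),          # sample SD; use ddof=0 for population SD
            "sd_y": ys.std(ddof=1),
            "corr": np.corrcoef(xs, ys)[0, 1],
        }
    return table
```

A possible usage, with hypothetical column names: `df = pd.read_csv("ans2.tsv", sep="\t")`, then `describe_subsets(df["dataset"], df["x"], df["y"])`, printing each value with `f"{v:.3f}"` to match the three-place accuracy in the table.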
Now plot all the data from ans.csv and ans2.tsv.
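A minimal plotting sketch, again assuming the data come as aligned arrays of subset labels and coordinates. It draws one scatter panel per subset with shared axes, so subsets with nearly identical summary statistics can be compared at a glance. The `ncols` layout parameter is an arbitrary choice.

```python
import math

import numpy as np
import matplotlib.pyplot as plt


def plot_subsets(labels, x, y, ncols=4):
    """One scatter panel per subset label, on shared axes (sketch)."""
    labels = np.asarray(labels)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    names = np.unique(labels)
    nrows = math.ceil(len(names) / ncols)
    fig, axes = plt.subplots(nrows, ncols, figsize=(3 * ncols, 3 * nrows),
                             sharex=True, sharey=True, squeeze=False)
    for ax, name in zip(axes.flat, names):
        mask = labels == name
        ax.scatter(x[mask], y[mask], s=8)
        ax.set_title(str(name))
    # Hide the leftover panels when the number of subsets is not a multiple of ncols
    for ax in axes.flat[len(names):]:
        ax.set_visible(False)
    fig.tight_layout()
    return fig
```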
Can you guess what "d" and "s" stand for in the given subsets of ans2.tsv?
Problem 3b. In this task we consider a discrete distribution without a mean and verify whether we can still estimate the location of its peak by simply computing averages. Consider a discrete variable $X$ with the following distribution: $P(X=k) = \frac{1}{4|k|(|k|+1)}$ for $k \neq 0$ and $P(X=0) = \frac{1}{2}$.
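The probabilities sum to one, since $\sum_{k \neq 0} \frac{1}{4|k|(|k|+1)} = \frac{1}{2}\sum_{m \ge 1} \frac{1}{m(m+1)} = \frac{1}{2}$. However, $E|X| = \frac{1}{2}\sum_{m \ge 1} \frac{1}{m+1}$ diverges, so $E[X]$ does not exist. One way to run the experiment is sketched below, using inverse-transform sampling (a technique chosen here, not prescribed by the problem). Given $X \neq 0$, $P(|X| \ge m) = 1/m$, so $|X| = \lfloor 1/U \rfloor$ with $U$ uniform on $(0, 1]$. The function name `sample_x`, the seed and the sample size are illustrative.

```python
import numpy as np


def sample_x(n, rng):
    """Draw n samples of X: P(X=0)=1/2, P(X=k)=1/(4|k|(|k|+1)) for k != 0.

    Given X != 0, P(|X| >= m) = 1/m, so |X| = floor(1/U) for U uniform
    on (0, 1] (inverse-transform sampling); the sign is +/- with equal odds.
    """
    u = 1.0 - rng.random(n)                       # uniform on (0, 1]
    magnitude = np.floor(1.0 / u).astype(np.int64)
    sign = rng.choice([-1, 1], size=n)
    nonzero = rng.random(n) < 0.5                 # X != 0 with probability 1/2
    return np.where(nonzero, sign * magnitude, 0)


rng = np.random.default_rng(seed=2)
x = sample_x(200_000, rng)
running_mean = np.cumsum(x, dtype=float) / np.arange(1, x.size + 1)
sample_median = np.median(x)

for n in (10**2, 10**3, 10**4, 10**5, 2 * 10**5):
    # The running mean need not settle: occasional huge values keep moving it
    print(f"n={n:>7d}  running mean={running_mean[n - 1]: .3f}")
print("sample median:", sample_median)  # robust estimate of the peak at 0
```

Rerunning with different seeds shows how much the running mean changes from run to run, while the median stays at the peak.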
Problem 3c. We are now going to investigate an intermediate case: a variable with a finite mean but an infinite variance. Consider a discrete variable $Y$ with the following distribution: $P(Y=k) = \frac{1}{|k|(|k|+1)(|k|+2)}$ for $k \neq 0$ and $P(Y=0) = \frac{1}{2}$.
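Here $E|Y| = 2\sum_{m \ge 1} \frac{1}{(m+1)(m+2)} = 1$ is finite, and by symmetry $E[Y] = 0$. In contrast, $E[Y^2] = 2\sum_{m \ge 1} \frac{m}{(m+1)(m+2)}$ diverges. A sketch in the same inverse-transform style as before follows. Given $Y \neq 0$, $P(|Y| \ge m) = \frac{2}{m(m+1)}$, so $|Y|$ is the largest $m$ with $m(m+1) \le 2/U$. The name `sample_y`, the seed and the checkpoints are illustrative choices.

```python
import numpy as np


def sample_y(n, rng):
    """Draw n samples of Y: P(Y=0)=1/2, P(Y=k)=1/(|k|(|k|+1)(|k|+2)) for k != 0.

    Given Y != 0, P(|Y| >= m) = 2/(m(m+1)), so |Y| is the largest m with
    m(m+1) <= 2/U for U uniform on (0, 1] (inverse-transform sampling).
    """
    u = 1.0 - rng.random(n)                       # uniform on (0, 1]
    magnitude = np.floor((np.sqrt(1.0 + 8.0 / u) - 1.0) / 2.0).astype(np.int64)
    sign = rng.choice([-1, 1], size=n)
    nonzero = rng.random(n) < 0.5                 # Y != 0 with probability 1/2
    return np.where(nonzero, sign * magnitude, 0)


rng = np.random.default_rng(seed=3)
y = sample_y(1_000_000, rng)
running_mean = np.cumsum(y, dtype=float) / np.arange(1, y.size + 1)

for n in (10**3, 10**4, 10**5, 10**6):
    # The running mean approaches E[Y] = 0; the sample variance does not settle
    # and tends to grow with n, because the true variance is infinite.
    print(f"n={n:>8d}  mean={running_mean[n - 1]: .4f}  var={y[:n].var(ddof=1):.2f}")
```

Compared with Problem 3b, the average now does locate the peak, but its error shrinks more slowly than the usual $1/\sqrt{n}$ rate suggests, and the sample SD is not a stable measure of spread.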