==== The importance of the (big) data ====
a) E. coli, human, yeast,
A. thaliana, D. melanogaster, C. elegans, mouse,
zebrafish (D. rerio), Bacillus subtilis
Prepare a bar plot (matplotlib) showing protein length for all 9 organisms:
- the x and y axes should be labeled
- place bars belonging to the same group next to each other and give each group a different color
- add a legend (upper-left corner)
- add an error bar to each bar
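
A minimal sketch of such a grouped bar plot follows. The lengths are toy numbers, not real proteome data, and the group names, the colors, and the choice of standard deviation for the error bars are all assumptions made only for illustration; the assignment does not prescribe them.

```python
import statistics

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Toy protein lengths (NOT real data): replace with lengths parsed from each proteome.
lengths = {
    "E. coli": [120, 310, 250, 410],
    "B. subtilis": [140, 290, 230, 380],
    "Yeast": [200, 480, 350, 610],
    "Human": [250, 560, 430, 900],
    "Mouse": [240, 540, 420, 860],
}
# Hypothetical grouping and colors, chosen only for this example.
groups = {
    "Bacteria": ["E. coli", "B. subtilis"],
    "Fungi": ["Yeast"],
    "Mammals": ["Human", "Mouse"],
}
colors = {"Bacteria": "tab:blue", "Fungi": "tab:orange", "Mammals": "tab:green"}

fig, ax = plt.subplots(figsize=(8, 4))
position = 0
ticks, tick_labels = [], []
for group, organisms in groups.items():
    xs = list(range(position, position + len(organisms)))
    means = [statistics.mean(lengths[o]) for o in organisms]
    # Assumption: the error bar shows the sample standard deviation.
    errors = [statistics.stdev(lengths[o]) for o in organisms]
    ax.bar(xs, means, yerr=errors, capsize=4, color=colors[group], label=group)
    ticks.extend(xs)
    tick_labels.extend(organisms)
    position += len(organisms) + 1  # leave a gap between groups

ax.set_xticks(ticks)
ax.set_xticklabels(tick_labels, rotation=45, ha="right")
ax.set_xlabel("Organism")
ax.set_ylabel("Mean protein length (amino acids)")
ax.legend(loc="upper left")
fig.tight_layout()
fig.savefig("protein_length_bar.png")
```

One `ax.bar` call per group keeps the legend to one entry per group rather than one per organism.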
Calculate the percentage content of all amino acids and prepare a table (PrettyTable module).
Additionally, prepare a bar plot of the percentage content of all amino acids for E. coli,
human, and yeast (that is, a group of three bars for each amino acid).
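
The percentage calculation can be sketched with the standard library alone. The function name `aa_percentages` and the toy sequences are hypothetical. Ignoring non-standard residue codes (such as X) is an assumption of this sketch, not a requirement of the assignment.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_percentages(sequences):
    """Return {amino acid: percentage} computed over all residues in `sequences`."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq.upper())
    # Assumption: only the 20 standard residues count towards the total.
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


# Toy sequences, not real proteins.
toy = {"organism_1": ["MKVLA", "MAAG"], "organism_2": ["MSTK"]}
table = {name: aa_percentages(seqs) for name, seqs in toy.items()}

# Plain-text preview: one row per amino acid, one column per organism.
print("AA  " + "  ".join(f"{name:>10}" for name in table))
for aa in AMINO_ACIDS:
    print(f"{aa}   " + "  ".join(f"{table[name][aa]:10.2f}" for name in table))
```

The same rows can be passed to PrettyTable (set `field_names`, then call `add_row` once per amino acid) to produce the requested table.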
- calculate the average protein length and the percentage content of all amino acids (numbers only)
Compare the results with those from point (a). Can you explain the difference
(hint: open ftp://ftp.wwpdb.org/pub/pdb/derived_data/pdb_seqres.txt in a text editor)?
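
Before computing anything, it can help to read the records programmatically and look at the headers. The sketch below is a generic FASTA reader. The toy records only imitate a header layout with `mol:` and `length:` fields, which is an assumption here; check it against the real file.

```python
import io
import statistics


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Made-up records imitating the assumed header layout; not real PDB entries.
example = io.StringIO(
    ">xxxx_A mol:protein length:6  TOY PROTEIN\n"
    "MKVLAA\n"
    ">xxxx_B mol:protein length:6  TOY PROTEIN\n"
    "MKVLAA\n"
    ">yyyy_A mol:na length:4  TOY NUCLEIC ACID\n"
    "ACGT\n"
)
records = list(read_fasta(example))
mol_types = [header.split()[1] for header, _ in records]
lengths = [len(seq) for _, seq in records]
print(mol_types)
print(statistics.mean(lengths))
```

For the real file, replace `example` with `open("pdb_seqres.txt")` after downloading it.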
- full UniProt (Swiss-Prot)
- 200 randomly selected Bacteria
- 200 randomly selected Viruses
- 200 randomly selected Archaea
- 200 randomly selected Eukaryota
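
Random selection can be made reproducible with a seeded generator. In this sketch the function `sample_per_kingdom` and the placeholder identifiers are hypothetical. It works on any list of items, so it applies whether the 200 items are proteins or organisms.

```python
import random


def sample_per_kingdom(records_by_kingdom, n=200, seed=0):
    """Pick up to `n` records per kingdom without replacement, reproducibly."""
    rng = random.Random(seed)  # fixed seed so that the report can be regenerated
    sampled = {}
    for kingdom, records in records_by_kingdom.items():
        k = min(n, len(records))  # avoid ValueError when fewer than n are available
        sampled[kingdom] = rng.sample(records, k)
    return sampled


# Placeholder identifiers, not real UniProt accessions.
toy = {
    "Bacteria": [f"bact_{i}" for i in range(1000)],
    "Viruses": [f"vir_{i}" for i in range(150)],
}
picked = sample_per_kingdom(toy)
print({kingdom: len(items) for kingdom, items in picked.items()})
```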
Prepare similar bar plots and a table as in (a).
d) data exploration:
- for each organism (a) and kingdom (b) make separate histogram for protein length
- calculate and plot the median instead of the arithmetic mean
- instead of bar plots, use the "boxplot" function (protein length only)
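
The histograms and the box plot could be drawn along these lines. The sample names and lengths are toy values, and the number of bins is an arbitrary choice for this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Toy protein lengths (NOT real data); one entry per organism or kingdom.
lengths = {
    "sample_1": [90, 120, 150, 180, 210, 260, 320, 400, 650, 1200],
    "sample_2": [60, 80, 110, 130, 170, 200, 240, 300, 380, 2500],
}

# One histogram panel per sample, plus one shared panel for the box plots.
fig, axes = plt.subplots(1, len(lengths) + 1, figsize=(12, 3.5))
for ax, (name, values) in zip(axes, lengths.items()):
    ax.hist(values, bins=10)
    ax.set_title(name)
    ax.set_xlabel("Protein length (amino acids)")
    ax.set_ylabel("Number of proteins")

box_ax = axes[-1]
box_ax.boxplot(list(lengths.values()))  # boxes are placed at x = 1, 2, ...
box_ax.set_xticks(range(1, len(lengths) + 1))
box_ax.set_xticklabels(list(lengths.keys()))
box_ax.set_ylabel("Protein length (amino acids)")

fig.tight_layout()
fig.savefig("protein_length_distributions.png")
```

By default, `boxplot` marks the median of each sample with a line inside the box.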
Discuss which is better: the median or the arithmetic mean (pros and cons)?
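
As a starting point for that discussion, both statistics can be computed with the standard `statistics` module. The list below is a toy example with one deliberately extreme value; it is not real data.

```python
import statistics

# Toy lengths with a single very long protein (NOT real data).
lengths = [100, 120, 130, 150, 160, 5340]

mean_length = statistics.mean(lengths)
median_length = statistics.median(lengths)

print("mean:  ", mean_length)
print("median:", median_length)
```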
Moreover, which amino acid is the most frequent
at the N-terminus? Can you explain why? Is it the
same in every organism?
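
Counting N-terminal residues only requires the first character of each sequence. The helper `n_terminal_counts` and the toy sequences are hypothetical, and the letters were picked arbitrarily so that they say nothing about the real answer.

```python
from collections import Counter


def n_terminal_counts(sequences):
    """Count the first residue of every non-empty sequence."""
    return Counter(seq[0].upper() for seq in sequences if seq)


# Toy sequences with arbitrary letters (NOT real proteins).
toy = {
    "organism_1": ["AKV", "AAA", "GKL"],
    "organism_2": ["GST", "gkt"],
}
for name, seqs in toy.items():
    counts = n_terminal_counts(seqs)
    residue, count = counts.most_common(1)[0]
    share = 100 * count / sum(counts.values())
    print(f"{name}: most frequent N-terminal residue {residue} ({share:.1f}%)")
```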
Prepare a short report (PDF) containing all of the above plots and tables and the answers to the above questions, and send it to firstname.lastname@example.org by 08.03.2020.