================================================
====      The importance of (big) data      ====
================================================

Exercise 1

Calculate the average protein length and amino acid 
content for different data sets:

a) E. coli, Bacillus subtilis, human, yeast,
A. thaliana, D. melanogaster, C. elegans, mouse,
zebrafish (D. rerio)

b) PDB

c) UniProt
- full UniProtKB/Swiss-Prot (the reviewed section of UniProt)
- 200 randomly selected Bacteria
- 200 randomly selected Viruses
- 200 randomly selected Archaea
- 200 randomly selected Eukaryota
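
A minimal sketch of the average-length calculation, assuming each data set
has been downloaded as a FASTA file. The helper names (read_fasta,
average_length) and the toy records are made up for illustration; they are
not part of the exercise.

```python
from statistics import mean


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def average_length(sequences):
    """Mean length over a collection of protein sequences."""
    return mean(len(seq) for seq in sequences)


# Toy records invented for illustration, not real proteins.
example = """>toy1 hypothetical protein
MKTAYIAK
QR
>toy2 hypothetical protein
MSLV
""".splitlines()

records = list(read_fasta(example))
sequences = [seq for _, seq in records]
print(average_length(sequences))
```

For a real data set, pass an open file handle to read_fasta instead of the
toy lines. Reading record by record keeps memory use low for large files
such as the full Swiss-Prot download.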


Make plots comparing average protein length between:
- selected organisms (a)
- all kingdoms (c)
- PDB vs. UniProt

For amino acid content, you should also calculate an
error estimate (e.g. the standard deviation).
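
One possible way to get amino acid content with an error estimate is
sketched below. It assumes the "error" means the spread between proteins:
the fraction of each residue is computed per protein, then the mean and
sample standard deviation are taken across proteins. This is only one
reading of the task; pooling all residues of a data set is another. The
function names and toy sequences are hypothetical.

```python
from collections import Counter
from statistics import mean, stdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Fraction of each standard amino acid in one sequence.

    Non-standard symbols (e.g. X, B, Z, U) are ignored in the total.
    """
    counts = Counter(seq.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def composition_stats(sequences):
    """Per-residue (mean, sample standard deviation) across proteins.

    Needs at least two sequences, because stdev is undefined for one value.
    """
    per_protein = [composition(s) for s in sequences]
    stats = {}
    for aa in AMINO_ACIDS:
        values = [c[aa] for c in per_protein]
        stats[aa] = (mean(values), stdev(values))
    return stats


# Toy sequences invented for illustration.
toy = ["AAAA", "AACC"]
stats = composition_stats(toy)
for aa in "AC":
    avg, sd = stats[aa]
    print(f"{aa}: {avg:.3f} +/- {sd:.3f}")
```

The (mean, standard deviation) pairs can be used directly as bar heights
and error bars when plotting.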

Also check which amino acid is the most frequent
at the N-terminus. Can you explain why it is this one?
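
The N-terminus check can be sketched as a simple count of the first
residue of every sequence. The function name and the toy sequences are
hypothetical; the result on real data is left for you to find and explain.

```python
from collections import Counter


def n_terminal_counts(sequences):
    """Count the first residue of every non-empty sequence."""
    return Counter(seq[0].upper() for seq in sequences if seq)


# Toy sequences invented for illustration, not real proteins.
toy = ["MKT", "MSL", "AGH", "MVV"]
counts = n_terminal_counts(toy)

# most_common(1) returns a list with the single top (residue, count) pair.
residue, n = counts.most_common(1)[0]
fraction = n / sum(counts.values())
print(residue, fraction)
```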

Additional material:
https://en.wikipedia.org/wiki/FASTA_format
https://en.wikipedia.org/wiki/List_of_model_organisms

Homework
1) Finish the plots of average protein content for UniProt.
2) Find any plot on the internet that you would like to show
at the next lesson (think about how to improve it).
