
Protein structure prediction
Protein Data Bank (PDB): 46818 structures (Oct 2007)
SCOP (Structural Classification Of Proteins):
- 971 folds (major structural similarity)
- 1586 superfamilies (probable common evolutionary origin)
- 3004 families (clear evolutionary relationship, ~30% identity)
Nearly all folds are known (?)
But 5 million known protein sequences (TrEMBL) -> need for structure prediction
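The "~30% identity" threshold for SCOP families can be made concrete with a small sketch. It assumes the two sequences are already aligned (gaps written as "-") and that identity is counted over columns where neither sequence has a gap; other denominators are in common use, so this is one possible convention rather than the one SCOP applies. The function name and the example sequences are illustrative.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # skip gapped columns (assumed convention)
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


if __name__ == "__main__":
    # Two made-up aligned fragments, for illustration only.
    print(percent_identity("AG-LV", "AGVLI"))
```

With this convention, a pair of sequences near or above 30% would fall in the range the slide associates with a clear evolutionary relationship.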

Structure prediction: what for?
Usually, structure-activity relationships: site-directed mutagenesis, pharmacological studies, drug design, …
But also:
- genomic studies: recognizing orphan genes
- distant evolution studies
Sequences diverge more than structures

Methods for protein structural studies
Known structures: simulations at the atom level
- molecular modelling (enthalpic energy)
- molecular dynamics
- normal modes

Methods for protein structural studies
Unknown structures: before using molecular mechanics, one must have a "realistic" structure.
3D structure prediction:
1) homology modelling
2) ab initio folding
3) threading

Homology modelling
Requires a known 3D structure that is homologous to the query sequence
e.g.: Modeller web server (http://www.salilab.org/modeller)


Ab initio folding (Baker et al.)
Target sequence: AGVLVAGHM...
[Figure labels: Protein Data Bank (PDB); generation; minimisation - energy evaluation]
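The generation / energy-evaluation cycle named on this slide can be illustrated with a toy search. This is only a sketch under strong assumptions: the conformation is reduced to a list of angles, `toy_energy` is a made-up placeholder rather than a force field, and candidates are produced by random perturbation with a Metropolis acceptance rule. It is not the procedure of Baker et al., and nothing here uses the PDB.

```python
import math
import random


def toy_energy(angles):
    """Hypothetical energy, lowest when every angle is -60 degrees (placeholder only)."""
    target = math.radians(-60.0)
    return sum(1.0 - math.cos(a - target) for a in angles)


def fold(n_angles, n_steps=2000, temperature=0.5, step=0.3, seed=0):
    """Generate candidate conformations and keep the lowest-energy one seen."""
    rng = random.Random(seed)
    current = [rng.uniform(-math.pi, math.pi) for _ in range(n_angles)]
    current_e = toy_energy(current)
    best, best_e = list(current), current_e
    for _ in range(n_steps):
        # Generation: perturb one randomly chosen angle.
        candidate = list(current)
        k = rng.randrange(n_angles)
        candidate[k] += rng.uniform(-step, step)
        # Energy evaluation, then Metropolis acceptance.
        cand_e = toy_energy(candidate)
        if cand_e <= current_e or rng.random() < math.exp(-(cand_e - current_e) / temperature):
            current, current_e = candidate, cand_e
            if current_e < best_e:
                best, best_e = list(current), current_e
    return best, best_e


if __name__ == "__main__":
    conformation, energy = fold(n_angles=9)
    print(round(energy, 3))
```

The loop structure (propose, evaluate, accept or reject) is the part that carries over; a real method would replace both the move set and the energy function.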

Threading (1)
Protein Data Bank (PDB) -> families
family -> core + interactions
Protein Data Bank -> library of cores

Threading (2)
Protein Data Bank (PDB) -> statistics for 3D neighbouring residue pairs -> energy
AL = -1.2
AI = -2.2
...
Other characteristics: residue accessibility, secondary structure, …
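One common way to turn neighbouring-pair statistics into energies is an inverse-Boltzmann (log-odds) formula; the slide does not say which formula was used, so the sketch below is an assumption. The function name, the pseudocount, and the tiny contact list are all illustrative, and energies are in arbitrary units.

```python
import math
from collections import Counter


def pair_energies(contacts, pseudocount=1.0):
    """Map unordered residue pairs to -ln(observed / expected) contact frequency."""
    pair_counts = Counter()
    residue_counts = Counter()
    for a, b in contacts:
        pair_counts[tuple(sorted((a, b)))] += 1
        residue_counts[a] += 1
        residue_counts[b] += 1
    total_pairs = sum(pair_counts.values())
    total_residues = sum(residue_counts.values())
    residues = sorted(residue_counts)
    n_types = len(residues) * (len(residues) + 1) // 2
    energies = {}
    for i, a in enumerate(residues):
        for b in residues[i:]:
            # Observed frequency of this pair among all contacts (smoothed).
            observed = (pair_counts[(a, b)] + pseudocount) / (total_pairs + pseudocount * n_types)
            # Expected frequency if residues paired at random.
            fa = residue_counts[a] / total_residues
            fb = residue_counts[b] / total_residues
            expected = fa * fb if a == b else 2.0 * fa * fb
            energies[(a, b)] = -math.log(observed / expected)
    return energies


if __name__ == "__main__":
    # Made-up contact list: A-L contacts are over-represented.
    demo = [("A", "L")] * 8 + [("A", "A")] + [("L", "L")]
    for pair, energy in sorted(pair_energies(demo).items()):
        print(pair, round(energy, 2))
```

Pairs seen more often than chance get a negative (favourable) energy, which matches the sign convention of the values shown on the slide.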

Threading (3)
Thread the sequence onto the core
Pair energies shown for three example alignments:
- VI = -2.3, LN = -4.2, LG = -5.1
- NG = -1.3, VI = -2.2, SA = -4.2
- IG = -3.3, NG = -3.0, GL = -2.1
Compute energy for every alignment of the sequence onto the core (many alignments, gaps…)
Thread the sequence onto all cores -> choose the best core (low energy)
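The search described on this slide can be sketched as follows, under simplifying assumptions: a core is represented only by its length and a list of contacting position pairs, alignments are ungapped (the sequence simply slides along the core), and pairs missing from the energy table score zero. The core definitions and the energy table are hypothetical; the few numeric values are borrowed from the slides purely as illustration.

```python
def pair_energy(table, a, b, default=0.0):
    """Look up a symmetric pair energy; unknown pairs get `default`."""
    return table.get((a, b), table.get((b, a), default))


def thread_onto_core(sequence, core, table):
    """Return (best_energy, best_offset) over all ungapped placements, or None."""
    best = None
    for offset in range(len(sequence) - core["length"] + 1):
        energy = sum(
            pair_energy(table, sequence[offset + i], sequence[offset + j])
            for i, j in core["contacts"]
        )
        if best is None or energy < best[0]:
            best = (energy, offset)
    return best


def best_core(sequence, cores, table):
    """Thread the sequence onto every core and return the lowest-energy one."""
    results = {}
    for name, core in cores.items():
        result = thread_onto_core(sequence, core, table)
        if result is not None:
            results[name] = result
    if not results:
        return None
    name = min(results, key=lambda k: results[k][0])
    return name, results[name]


if __name__ == "__main__":
    table = {("A", "L"): -1.2, ("A", "I"): -2.2, ("V", "I"): -2.3, ("L", "G"): -5.1}
    cores = {
        "core_a": {"length": 4, "contacts": [(0, 3), (1, 2)]},
        "core_b": {"length": 5, "contacts": [(0, 4), (1, 2)]},
    }
    print(best_core("AGVLVAGHM", cores, table))
```

Handling gaps, which the slide mentions, would replace the simple sliding loop with a proper alignment search; the scoring step stays the same.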

Threading
Threading methods are under development:
- optimisation of 3D alignments
- better core definition
- statistical assessment of results
Can be used when sequence tools (BLAST or PSI-BLAST) cannot find similarities

Threading
- Robetta: http://robetta.bakerlab.org/
- 3DPSSM: http://www.sbg.bio.ic.ac.uk/ 3dpssm/
- bioinbgu: http://www.cs.bgu.ac.il/ bioinbgu/form.html
- GenTHREADER: http://bioinf.cs.ucl.ac.uk/psipred/psiform.html
- FROST: http://genome.jouy.inra.fr/frost/

The end…

A) Quantifying the similarity of every pair of structures (an "all-against-all" comparison) gives the position of each structure in an abstract high-dimensional space. The height of the peaks reflects the population density of folds; the horizontal axes are the first two eigenvectors (i.e. those associated with the two largest eigenvalues), and the vertical axis gives the number of folds. The distribution of architectures is given by the projection onto the plane (proximity in this plane indicates the structural similarity between two proteins).
B) 40% of all known domains are covered by 16 fold classes. These 16 folds are shown here as secondary-structure topology diagrams, in the class of their attractor (the attractor number is the same as in figure A).
Figures taken from Holm and Sander (1996), "Mapping the protein universe"

Threading: evaluation function

Sequence/structure alignment method

Sequence/structure alignment method (2)

Score normalisation
