EA 3781 Evolution Biologique

1 EA 3781 Evolution Biologique
The use of the concepts of evolutionary biology in genome (biological) annotation. Pierre Pontarotti EA 3781 Evolution Biologique

2

3 Some concepts in evolutionary biology
Use of the concepts for gene structural and functional annotation. Computerization. Other concepts.

4 Metazoan Phylogeny (Adoutte et al. 2000)
[Figure: metazoan phylogenetic tree, with Urbilateria marked as the ancestor of the Bilateria.]
BILATERIA
PROTOSTOMES
ECDYSOZOANS: Arthropods, Gastrotrichs, Nematodes, Onychophorans, Tardigrades, Kinorhynchs, Priapulids
LOPHOTROCHOZOANS: Molluscs, Rotifers, Annelids, Gnathostomulids, Sipunculans, Nemerteans, Pogonophorans, Platyhelminthes, Entoprocts, Bryozoans, Brachiopods, Phoronids
DEUTEROSTOMES: Vertebrates, Cephalochordates, Urochordates, Hemichordates, Echinoderms
Outside the Bilateria: Ctenophorans, Cnidarians, Poriferans

5 URBILATERIA: The hypothetical Metazoan Ancestor
Geoffroy Saint-Hilaire, 19th century.
The URBILATERIA genome evolved by the fixation of:
Nucleotide substitution
Gene loss
Genic duplication: gene duplication, genome region duplication, whole genome duplication
Chromosomal rearrangement

6 Large-scale gene duplication in the vertebrate lineage
[Figure: phylogenetic tree with node labels 360, 450, 528, 564, 751 and >751, and two events, T1 and T2, marked on the lineage leading to vertebrates. Pikaia also appears in the figure.]
Deuterostomia (including Vertebrates): Amniota (human), Lissamphibia, Actinopterygii (zebrafish), Chondrichthyes (shark), Cephalaspidomorphi (lamprey), Myxini (hagfish), Cephalochordata (amphioxus), Urochordata (Ciona), Echinodermata
Protostomia: Insects (Drosophila), Nematoda (C. elegans)

7 From alleles to orthologs
[Figure] Point mutations create alleles A, B, C and D in population POP 1. POP 1 splits into 2 autonomous populations, POP 1A and POP 1B. In POP 1A, allele A is fixed and accumulates new mutations (A1, A2). In POP 1B, allele B is fixed and accumulates new mutations (B1, B2).

8 From alleles to orthologs: point mutations
[Figure] POP 1A splits into 2 autonomous populations, POP 1A1 and POP 1A2; POP 1B splits into POP 1B1 and POP 1B2. Alleles A1, A2, B1 and B2 are each fixed in one population and accumulate new mutations: A1 gives A11 and A12, A2 gives A21 and A22, B1 gives B11 and B12, B2 gives B21 and B22.

9 From alleles to orthologs
B.1.1 B.1.2 B.2.1 B.2.2 Alleles Orthologs

10 Orthologs and paralogs
[Figure: gene tree with duplication and speciation events. The ancestral genes are labelled A, A1/2 and A3.]
URBILATERIA: A1, A2, B (paralogs)
DROSOPHILA multigenic family: A1, A2, A3
HUMAN multigenic family: A1, A2, A3’, A3”

11 Orthology/Paralogy
Orthologs: 2 genes in different species that come from a common ancestor and are separated by a speciation event.
Paralogs: 2 genes resulting from a duplication event in a genome.
[Figure: gene tree with duplication and speciation events. A duplicates into A1/2 and A3. A1/2 duplicates into A1 (A1 HUMAN, A1 DROSO) and A2 (A2 HUMAN, A2 DROSO). A3 gives A3 DROSO and, after a duplication in the human lineage, A3’ HUMAN and A3” HUMAN, which are co-orthologs of A3 DROSO.]
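
The two definitions on this slide can be sketched in code. The block below is a minimal illustration, not the author's method: it assumes the gene tree is already reconciled (every internal node labelled as a duplication or a speciation) and encodes the slide's tree as nested tuples. The leaf names such as `A1_HUMAN` and the helper names are illustrative choices.

```python
# Internal node: (event, left, right); leaf: gene name (string).
# The tree mirrors the slide's example; the encoding itself is an assumption.
TREE = ("duplication",
        ("duplication",
         ("speciation", "A1_HUMAN", "A1_DROSO"),
         ("speciation", "A2_HUMAN", "A2_DROSO")),
        ("speciation",
         ("duplication", "A3'_HUMAN", "A3''_HUMAN"),
         "A3_DROSO"))


def leaves(node):
    """Return the gene names found below a node."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)


def classify_pairs(node, out=None):
    """Label each gene pair by the event at its last common ancestor."""
    if out is None:
        out = {}
    if isinstance(node, str):
        return out
    event, left, right = node
    label = "orthologs" if event == "speciation" else "paralogs"
    # Genes on opposite sides of this node have this node as last common ancestor.
    for a in leaves(left):
        for b in leaves(right):
            out[frozenset((a, b))] = label
    classify_pairs(left, out)
    classify_pairs(right, out)
    return out


PAIRS = classify_pairs(TREE)


def relation(a, b):
    """Look up the relationship between two genes of the tree."""
    return PAIRS[frozenset((a, b))]


print(relation("A1_HUMAN", "A1_DROSO"))    # separated by a speciation
print(relation("A1_HUMAN", "A2_DROSO"))    # separated by the A1/A2 duplication
print(relation("A3'_HUMAN", "A3_DROSO"))   # one of the two co-orthologs
```

Both human A3 copies come out as orthologs of A3 DROSO, which is the co-ortholog case shown on the slide.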

12 From Gene History To Gene Function

13 Orthologs under purifying selection
[Figure] Gene A in URBILATERIA; after speciation, A is under purifying selection in both HUMAN and DROSOPHILA and keeps the ancestral function in each.

14 Ortholog functional switch
[Figure] Gene A in URBILATERIA; after speciation, DROSOPHILA A keeps the ancestral function, while HUMAN A2, under positive selection or relaxed purifying selection, may have a new function (?).

15 Co-ortholog subfunctionalization
[Figure] Gene A in URBILATERIA; after speciation, DROSOPHILA A keeps the ancestral function. In HUMAN, A duplicates into A’ and A”, each carrying a sub-function of the ancestral gene (purifying selection).

16 Co-ortholog neofunctionalization
[Figure] Gene A in URBILATERIA; after speciation, DROSOPHILA A keeps the ancestral function under purifying selection. In HUMAN, A duplicates: copy A keeps the ancestral function under purifying selection, while copy A2, under positive or relaxed selection, acquires a new function.

17 Orthology/paralogy information
is important for functional inference (but cannot be relied on for species with a high level of horizontal gene transfer).

18 Orthology/Paralogy
[Slide 11 repeated: definitions of orthologs, paralogs and co-orthologs, with the same gene tree.]

19 A warning that will be discussed by other speakers
Many scientists use the best BLAST hit to look for orthologous relationships… BUT!
Many co-orthologs can be present.
There are problems with genomes that are not fully sequenced, or when gene loss has occurred.
AND even with phylogenetic analysis, bias must be corrected. A phylogenetic tree is hypothetical.
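
The co-ortholog problem can be made concrete with a toy example. The sketch below implements the common reciprocal-best-hit heuristic on invented similarity scores. The scores are hypothetical and are not BLAST output, and the function names are illustrative. It shows that a best-hit approach keeps only one of two co-orthologs.

```python
# Hypothetical similarity scores (higher = more similar); NOT real BLAST results.
SCORES = {
    ("A3_DROSO", "A3'_HUMAN"): 410,
    ("A3_DROSO", "A3''_HUMAN"): 395,
    ("A1_DROSO", "A1_HUMAN"): 520,
    ("A1_DROSO", "A2_HUMAN"): 300,
    ("A2_DROSO", "A2_HUMAN"): 505,
    ("A2_DROSO", "A1_HUMAN"): 310,
}


def best_hits(scores, query_index):
    """Best-scoring partner for each gene on one side of the comparison."""
    best = {}
    for pair, score in scores.items():
        query = pair[query_index]
        target = pair[1 - query_index]
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(scores):
    """Pairs in which each gene is the best hit of the other."""
    forward = best_hits(scores, 0)    # Drosophila gene -> best human gene
    backward = best_hits(scores, 1)   # human gene -> best Drosophila gene
    return {(q, t) for q, t in forward.items() if backward.get(t) == q}


RBH = reciprocal_best_hits(SCORES)
print(sorted(RBH))
```

With these scores the pair (A3_DROSO, A3''_HUMAN) is not reported, although A3''_HUMAN is as much a co-ortholog of A3_DROSO as A3'_HUMAN is. This is one reason the slide recommends a phylogenetic analysis.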

20 Evolutionary shift (due to positive or relaxed selection) could be linked to functional shift . See N Galtier and A Levasseur talks.
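
One common way to describe the selection regime on a branch is the ratio of nonsynonymous to synonymous substitution rates (dN/dS, often written omega). The slides do not say which measure the speakers use, so the sketch below is only an assumption-laden illustration: the function name and the `tolerance` band around 1 are arbitrary choices, and real analyses rely on likelihood-based tests rather than a fixed threshold.

```python
def selection_regime(dn, ds, tolerance=0.1):
    """Crude reading of omega = dN/dS; the tolerance band is an arbitrary assumption."""
    if ds <= 0:
        raise ValueError("dS must be positive to compute dN/dS")
    omega = dn / ds
    if omega > 1 + tolerance:
        return "positive selection"
    if omega < 1 - tolerance:
        return "purifying selection"
    # omega close to 1: compatible with neutral evolution or relaxed constraint
    return "neutral or relaxed constraint"


print(selection_regime(0.02, 0.20))  # omega = 0.1
print(selection_regime(0.30, 0.10))  # omega = 3.0
print(selection_regime(0.10, 0.10))  # omega = 1.0
```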

21

22 Detection of Positive selection and functional shift

23

24 Detection of Evolutionary constraint relaxation and functional shift

25 Co-ortholog neofunctionalization
[Figure, as on slide 16] Gene A in URBILATERIA; after speciation, DROSOPHILA A keeps the ancestral function. After a duplication in HUMAN, copy A keeps the ancestral function and copy A2 acquires a new function. Purifying selection is marked on two branches.

26

27 Paralogue replacement
Replacement of the constitutive proteasome β-subunits after interferon-γ stimulation (paralogue = duplicated gene).
Constitutive proteasome: PSMB5, PSMB6, PSMB7. Ancestral function: protein degradation. Present in all Metazoans, therefore present in Urbilateria (Metazoan ancestor).
Immunoproteasome: PSMB8 (LMP 7), PSMB9 (LMP 2), PSMB10 (LMP Z). New function (specialization): specific-size protein or peptide degradation, used by the MHC system. Only found in vertebrates.

28 Large-scale gene duplication in the vertebrate lineage: proteasome and immunoproteasome
[Figure: the tree of slide 6, with node labels 360, 450, 528, 564, 751 and >751, carrying the labels "Proteasome" and "Immuno Proteasome".]
Deuterostomia (including Vertebrates): Amniota (human), Lissamphibia, Actinopterygii (zebrafish), Chondrichthyes (shark), Cephalaspidomorphi (lamprey), Myxini (hagfish), Cephalochordata (amphioxus), Urochordata (Ciona), Echinodermata
Protostomia: Insects (Drosophila), Nematoda (C. elegans)

29 [Figure: phylogenetic tree of the PSMB7 and PSMB10 genes, with branch support values, a scale bar of 0.1 and the duplication event marked.]
PSMB7: Mus, Ratt, Bos, Homo, Gall, Xeno, Zebra, Fugu
PSMB10: Zebra, Fugu, Bos, Mus, Homo
PSMB7/10: Bran, Ci-zeta (Ciona), Bombyx, Prosbeta2 and CG18341 (Drosophila)

30 The study of gene and genome HISTORY
helps to find evidence for gene FUNCTION.

31 Concepts in evolutionary biology
Use of the concepts for structural and functional annotation. Structural annotation (deciphering of gene structure). Functional annotation (especially the use of phylogeny to decipher protein function).

32 Functional annotation
Biochemical and biological process:
Experimental approach: RNA interference; tandem affinity purification and mass spectrometry
In silico

33 Functional annotation
Based on phylogeny, from experimentally annotated genes…

34 INTERLUDE: FUNCTION? A complex concept.

35 Function prediction
Using orthology information (done)
Using the evolutionary shift information (in progress)
Function prediction by integrative phylogenomics (Engelhardt et al., PLoS Computational Biology 2005) (in progress)

36 Textual information analysis
Functional annotation. Homologs with experimentally known function: how the information can be found.
Sources: Gene Ontology, SwissProt, GenBank, MedLine.
Textual information analysis, G.O. standard.

37 Gene Ontology classification
Functional annotation: Gene Ontology classification.
Biological process: biological process to which the gene or gene product contributes. Cell growth and maintenance; pyrimidine metabolism; …
Molecular function: biochemical activity, including specific binding to ligands or structures, of a gene product. Enzyme; transporter; Toll receptor ligand; …
Cellular component: place in the cell where a gene product is active. Cytoplasm; ribosome; …
Plus other classifications to develop, in particular an evolution-based ontology.

38

39 Functional prediction: using orthology information
Only a small fraction of sequences correspond to known, well-characterized proteins.
If the function is unknown: phylogenetic analysis.
Functional prediction:
Using orthology information
Using the evolutionary shift information
By integrative phylogenomics
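
Function prediction from orthology can be sketched as a transfer of annotations. The block below is a minimal illustration under stated assumptions: the gene names are hypothetical, the term names are taken from the Gene Ontology examples given earlier in the talk, and only experimentally supported annotations are propagated. It is not the procedure implemented in any particular platform.

```python
from dataclasses import dataclass, field

ASPECTS = ("biological_process", "molecular_function", "cellular_component")


@dataclass
class Annotation:
    gene: str
    experimental: bool                          # True if backed by an experiment
    terms: dict = field(default_factory=dict)   # aspect -> set of term names


def predict_terms(orthologs):
    """Union of the terms carried by experimentally annotated orthologs."""
    predicted = {aspect: set() for aspect in ASPECTS}
    for annotation in orthologs:
        if not annotation.experimental:
            continue  # do not propagate annotations that are themselves predictions
        for aspect, names in annotation.terms.items():
            predicted[aspect] |= set(names)
    return predicted


# Hypothetical orthologs of an uncharacterized gene.
ORTHOLOGS = [
    Annotation("geneX_mouse", True,
               {"molecular_function": {"transporter"},
                "cellular_component": {"cytoplasm"}}),
    Annotation("geneX_fly", True,
               {"biological_process": {"pyrimidine metabolism"}}),
    Annotation("geneX_fish", False,
               {"molecular_function": {"enzyme"}}),
]

PREDICTED = predict_terms(ORTHOLOGS)
print(PREDICTED)
```

Restricting the transfer to experimental annotations is a design choice intended to avoid chaining predictions on top of predictions.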

40 Tumor necrosis factor family phylogenetic tree: orthologs identification
[Figure: phylogenetic tree of the TNF family, with branch support values, a scale bar of 0.2 and nodes labelled DF1, DF2 and DF3. Source: Trends in Immunology (July 2003).]
Sequences: GgaTNFSF10, DreTNFSF10, HsaTNFSF10, PolTNFSF11, HsaTNFSF11, XlaTNFSF11, GgaTNFSF5, HsaTNFSF5, BboTNFSF5, MmuTNFSF5, MmuTNFSF2, HsaTNFSF2, MmuTNFSF1, HsaTNFSF1, MmuTNFSF15, HsaTNFSF15, HsaTNFSF14, MmuTNFSF14, HsaTNFSF6, RnoTNFSF6, MmuTNFSF6, HsaTNFSF13, GgaTNFSF13, PolTNFSF13, MmuTNFSF7, HsaTNFSF7, HsaTNFSF8, MmuTNFSF8, HsaTNFSF9, MmuTNFSF9, EIGER (DmeTNF)
Disease annotations on the figure: atherosclerotic plaque formation; ALPS - LPR/GLD lymphoproliferative syndrome.

41 Functional annotation
Human TNF family phylogenetic tree: search for the closest paralog. Source: Trends in Immunology (July 2003).
[Figure: tree of human TNF ligands with their receptors (molecular function) and biological processes; the labels below are given in the order in which they were extracted from the figure.]
TNFSF3 TNFRSF3 LN, PP, GC, Tumoricidal activity TNFSF1 TNFRSF1A PP, GC, T cell Homeostasis (death) TNFSF2 TNFRSF1B T cell Homeostasis (death) TNFSF15 TNFRSF12 T cell costimulation, negative selection? T cell Homeostasis (survival?), CTL activation, peripheral tolerance? TNFSF14 TNFRSF14 TNFRSF6B T cell Homeostasis (death), CTL function, peripheral tolerance, T cell costimulation, chemotaxis TNFSF6 TNFRSF6 TNFSF18 TNFRSF18 T cell transmigration and homeostasis (survival)? TNFSF4 TNFRSF4 T cell homeostasis (survival), peripheral tolerance GC, B cell function, peripheral tolerance, T cell priming TNFSF5 TNFRSF5 TNFRSF10B TNFRSF10A TNFRSF10C TNFRSF10D Tumoricidal activity, T cell function? TNFSF10 TNFRSF11B TNFSF11 TNFRSF11A LN, bone Homeostasis, mammary gland development BR3 B cell Homeostasis B cell Homeostasis ? TNFSF13B TNFRSF17 TNFSF13 TACI TNFSF12? TNFSF7 TNFRSF7 T cell activation? TNFSF9 TNFRSF9 T cell activation and survival, CTL activity, Tumoricidal activity? TNFSF8 TNFRSF8 Negative selection, autoimmunity TNFRSF19 ? EDA-A1 EDAR Tooth, hair, sweat gland formation EDA-A2 XEDAR Tooth, hair, skin formation? TNFRSF21 ? RELT ?

42 Gene function prediction: using orthology information
Only a small fraction of sequences correspond to known, well-characterized proteins.
If the function is unknown: phylogenetic analysis.
Gene function prediction:
Using orthology information
Using the evolutionary shift information (see Levasseur talk)
By integrative phylogenomics

43

44 Evolutionary biology concepts for genome annotation
Further reading
Concepts:
Levasseur A, Danchin E, Orlando L, Bailly X, Pontarotti P. Conceptual bases for quantifying the role of the environment on genomes evolution: the participation of positive selection and neutral evolution. Biological Reviews, in press.
Danchin E.G.J, et al. The Major Histocompatibility Complex origin. Immunological Reviews. 2004;198(1):
Concepts for applied evolution:
Danchin E.G.J, Levasseur A, Lopez-Rascol V, Gouret P, Pontarotti P. The use of evolutionary biology concepts for genome annotation. J. Exp. Zoology Part B: Mol. and Dev. Evol.

45

46 Computerization of concepts and knowledge
Phylogeny
Detection of orthologous and paralogous genes
Detection of evolutionary shifts (in progress)
Function prediction

47 FIGENIX is a multi-user software platform dedicated to structural and functional annotation tasks:
- Gene prediction for large DNA sequences
- Construction of robust phylogenetic trees
- Automatic detection of orthologs and paralogs
- Automatic retrieval of functional data on genes from Web databases
- Filtering and construction of protein databases (EST contig assembly)
- Chained processes (e.g. gene prediction followed by a phylogenetic study for each gene)

48 STEPS OF THE phylogeny PIPELINE (1)
[Flowchart; the extracted text does not preserve the arrows, so the boxes are listed in a plausible processing order.]
Input: protein sequence encoded by a putative gene
BLAST against Ensembl, NR… + filtering
Multiple alignment: CLUSTAL W + purification + bias correction
Domain search with HmmPFAM (PFAM)
Domain enumeration
Decision: do "repeats" exist? (yes/no), leading to: conservation of the monophyletic "repeats"; alignment of the merged "repeats"; conservation of the complete alignment
Creation of a "FIGENIX" domain (correctDomains)
Composition test with TREE-PUZZLE to eliminate sequences that are too divergent
Tree of Life construction, giving the reference tree

49 STEPS OF THE phylogeny PIPELINE (2)
[Flowchart; the extracted text does not preserve the arrows, so the boxes are listed in a plausible processing order.]
Detection of "paralogy groups" + elimination of sites that evolve too fast ("Gu test")
Elimination of sequences with >30% gaps
Elimination of the most incongruent domains, detected by HomPart of PAUP
Saturation test
Tree building with three methods: NJ, parsimony, maximum likelihood (one tree each)
Comparison of the topologies by Templeton-Hasegawa tests
Congruent topologies? Yes: consensus tree. No: NJ tree.
Tree of Life construction, giving the reference tree
Detection of orthologs, then search for functions
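
One of the filtering steps listed in the pipeline, the removal of sequences with more than 30% gaps, is simple enough to sketch. The code below is an illustration only and is not FIGENIX code: the function names, the gap characters and the toy alignment are assumptions.

```python
def gap_fraction(aligned_seq, gap_chars="-."):
    """Fraction of alignment columns in which this sequence has a gap."""
    if not aligned_seq:
        raise ValueError("empty aligned sequence")
    return sum(ch in gap_chars for ch in aligned_seq) / len(aligned_seq)


def drop_gappy_sequences(alignment, max_gap_fraction=0.30):
    """Keep sequences whose gap fraction does not exceed the threshold."""
    return {name: seq for name, seq in alignment.items()
            if gap_fraction(seq) <= max_gap_fraction}


# Toy alignment with hypothetical sequence names.
ALIGNMENT = {
    "seq_a": "MKT-LLVAGA",   # 1 gap in 10 columns
    "seq_b": "MK--LL--G-",   # 5 gaps in 10 columns, removed
    "seq_c": "MKTALLVAGA",   # no gap
    "seq_d": "M---LLVAGA",   # 3 gaps in 10 columns, exactly 30%, kept
}

KEPT = drop_gappy_sequences(ALIGNMENT)
print(sorted(KEPT))
```

The slide states ">30%", so a sequence with exactly 30% gaps is kept in this sketch.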

50

51 FIGENIX architecture
[Figure] Components: EST Agent, MGI Agent, GO Agent, Functional Collector Agent, Archiver, RDBMS, Expert System, Genomic Data, Annotation Engine, Persistence Layer, Repository, Load Balancing, Security, ..., Web Server, Request, Data exchange
- Intranet/Extranet platform
- 3-tier architecture (web interface / business-logic servers / database)

52 1)

53

54 Further reading: about the computerization of concepts
Gouret et al. FIGENIX: intelligent automation of genomic annotation: expertise integration in a new software platform. BMC Bioinformatics Aug 5;6:198
Balandraud et al. A rigorous method for multigenic families' functional annotation: the peptidyl arginine deiminase (PADs) proteins family example. BMC Genomics 2005, 6:153

55 Further reading on FIGENIX utilization
Danchin et al. Eleven ancestral gene families lost in mammals and vertebrates while otherwise universally conserved in animals. BMC Evolutionary Biology 2006, 6:5
Paillisson et al. Bromodomain testis-specific protein is expressed in mouse oocyte and evolves faster than its ubiquitously expressed paralogs BRD2, -3 and -4. Genomics. 2006
Levasseur et al. Tracking the evolutionary and functional shifts connection: the lipase-esterase example. BMC Evolutionary Biology 2006

56

57 Structural annotation

58 Gene finding and protein prediction
Structural annotation. Genome nucleotide-level annotation:
Mapping
Finding genomic landmarks
Gene finding and protein prediction
Non-coding RNAs and regulatory regions
Identifying repetitive elements
Mapping segmental duplications
Mapping variations (SNP, microsatellites, …)

59 Structural annotation: available tools, state of the art
Ab initio: Genscan, Fgenesh, Genie, etc.
Based on statistical signals within the DNA: coding propensity (hexamer signals), splice site signals.
Strengths: easy and quick to run; only need DNA as input.
Weakness: high false positive rate.
Similarity-based: Genewise, Sim4, Est2genome, Figenix
Alignment programs that know about gene structure. Very accurate with strong sequence similarities.
Strengths: accurate.
Weakness: need strong similarities; slow to run.

60 "FIGENIX SOFTWARE PLATFORM" annotating method
Structural annotation combining a statistical approach and a homology-based approach (similarities with known proteins). Automating the process resulted in an expert system based on biological inference rules, using gene history and ab initio programs. It is not yet completely based on evolutionary biology.

61 [Figure] DNA segment with two regions
Region 1: protein A (best hit for region 1), HSPs A1, A2, A3
Region 2: protein B (best hit for region 2), HSPs B1, B2
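
The figure suggests that each region of the DNA segment is delimited by the HSPs (high-scoring segment pairs) of its best-hit protein. A minimal sketch of that idea follows. It is one reading of the figure, not a description of the FIGENIX implementation, and the coordinates are invented for illustration.

```python
# (protein, HSP name, start, end) on the DNA segment; coordinates are hypothetical.
HSPS = [
    ("A", "A1", 100, 250),
    ("A", "A2", 400, 520),
    ("A", "A3", 700, 810),
    ("B", "B1", 1500, 1650),
    ("B", "B2", 1800, 1990),
]


def regions_from_hsps(hsps):
    """Span covered on the DNA by all HSPs of each best-hit protein."""
    regions = {}
    for protein, _name, start, end in hsps:
        low, high = regions.get(protein, (start, end))
        regions[protein] = (min(low, start), max(high, end))
    return regions


REGIONS = regions_from_hsps(HSPS)
print(REGIONS)
```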

62 D M S A +

63 Validation of structural annotation
Gene = nucleotide sequence; mRNA = nucleotide sequence (transcription); protein = amino acid sequence (translation).
Correctly predicted proteins: Genscan: 31%; HMMGene: 38%; Figenix: 87%.
The platform's performance was validated on a standard dataset (HMR195); see Guigó et al., 2000; Rogic et al., 2001.

64 Structural annotation: accuracy versus exon type and prediction
PROGRAMS   Initial   Internal (186)   Terminal (55)   OVER PREDICTION   CORRECT PROTEIN PREDICTION
Genscan    0.55      0.80             0.65            0.22              0.31
Hmmgen     0.75      0.81             0.78            0.15              0.38
Figenix    0.91      0.92             0.95            0.05              0.87
The mouse and rat sequences from the HMR195 dataset were run against the human division of Swiss-Prot.

65 The next step for structural annotation
is to take into account the evolutionary history of genes.

66 Concepts, modeling, computerization, bioanalysis
Structural annotation (deciphering of gene structure). Functional annotation (especially the use of phylogeny to decipher protein function).

67 Next Phylogenomics (genome Evolution) Phylopostgenomics
- phylotranscriptomics - phylointeractomics ………..

68

69 Knowledge/concepts
Observation: conserved synteny regions exist between species.
Explanation/concept: these regions derive from an ancestral region that evolved independently in each lineage after speciation, but not enough to lose all trace of conservation.
From this knowledge and this prediction follows a line of reasoning indicating that the analysis of conserved syntenies and the reconstruction of ancestral regions are of interest, both from an applied point of view (assistance to positional cloning) and from a conceptual point of view (understanding genome evolution).
Formalization of the biological question: how can conserved syntenies be detected? This is also where conceptualization comes fully into play. If conserved syntenies really derive from an ancestral region, the genes in these regions must show 1/ orthology relationships and 2/ a clustering of orthologous genes that is improbable under the hypothesis of chance (the clustering must be significant).
Programs are therefore needed that can detect orthology relationships and find significant clusters.
Genome reconstruction (translocation, fusion, inversion…; weighting of these events).
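
The first requirement, clusters of genes linked by orthology, can be illustrated with a small collinearity scan. The sketch below is not the CASSIOPE algorithm: it assumes one-to-one orthologs, hypothetical gene names and an arbitrary minimum block size, and it leaves out the statistical test of significance that the slide asks for.

```python
def conserved_blocks(order_a, order_b, orthologs, min_genes=3):
    """Runs of consecutive genes in genome A whose orthologs are adjacent in genome B."""
    position_b = {gene: index for index, gene in enumerate(order_b)}
    blocks, current, previous = [], [], None
    for gene in order_a:
        partner = orthologs.get(gene)
        here = position_b.get(partner) if partner is not None else None
        if here is not None and previous is not None and abs(here - previous) == 1:
            current.append(gene)          # extends the current block
        else:
            if len(current) >= min_genes:
                blocks.append(current)    # close the previous block
            current = [gene] if here is not None else []
        previous = here
    if len(current) >= min_genes:
        blocks.append(current)
    return blocks


# Hypothetical gene orders and one-to-one orthology relationships.
ORDER_A = ["a1", "a2", "a3", "a4", "a5", "a6"]
ORDER_B = ["b3", "b2", "b1", "b9", "b5", "b6"]
ORTHOLOGS = {"a1": "b1", "a2": "b2", "a3": "b3", "a5": "b5", "a6": "b6"}

BLOCKS = conserved_blocks(ORDER_A, ORDER_B, ORTHOLOGS)
print(BLOCKS)
```

The block a1-a2-a3 is found even though its order is inverted in genome B, while the pair a5-a6 is too short to be reported with `min_genes=3`.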

70 Mathematical modeling
Modeling is needed when the computational tools do not exist or when their biological formalism is not correct. This is the case for the statistical tests of clustering (in particular with respect to the size of in-paralog families). Modeling of genome reconstruction.
Computational formalization:
1) Algorithms: statistical tests; modeling of ancestral genome reconstruction
2) Integration with the other software tools in the computing system (CASSIOPE)

71 Bioanalysis
Automatic search for conserved syntenies. Reconstruction and evolution of genomic regions.
New knowledge and new concepts. Direct application: help with positional cloning. Concepts/knowledge: detection of functional clustering.

72 C.A.S.S.I.O.P.E
C.A.S.S.I.O.P.E: Clever Agent System for Synteny Inheritance and Other Phenomena in Evolution. Finds conserved regions between genomes. For more information, see Virginie Lopez Rascol.

73 C.A.S.S.I.O.P.E.

74 Toward the ancestral genome reconstruction

75 Toward the ancestral genome reconstruction

76 C.A.S.S.I.O.P.E: bioanalysis
Automatic search for conserved syntenies. Reconstruction and evolution of genomic regions.
New knowledge and new concepts. Direct application: help with positional cloning. Concepts/knowledge: detection of functional clustering.

77 MEG* project (Modeling of Genome Evolution)
Collaborators
Nathalie Balandraud, Etienne Danchin, Philippe Gouret, Vérane Vitiello
Math/bio: Julien Berestycki*, Simona Grusea*, Stéphanie Léocard*, Valda Limic*, Laure Rigal*, Etienne Pardoux*
Computing/bio: Olivier Chabrol*, Virginie Lopez*, Cedric Notredame*
Concepts and bioanalysis: Roxane Barthelemy*, Jean-Paul Casanova*, Elodie Darbo*, Anthony Levasseur*, Eric Faure*, Pierre Pontarotti*

78 Open Discussion Phylo postgenomic

