The use of the concepts of evolutionary biology in genome annotation.

Presentation on "The use of the concepts of evolutionary biology in genome annotation." Transcript of the presentation:

1 The use of the concepts of evolutionary biology in genome annotation.

2 Comparative genomics, concept of orthology and paralogy.
What is phylogenomics? Structural and functional annotation: structural annotation (deciphering of gene structure); functional annotation (especially the use of phylogeny to decipher protein function). Figenix. Genome evolution: CASSIOPE.

3 Metazoan Phylogeny (from Adoutte et al. 2000)
[Figure: metazoan phylogenetic tree; labels recovered below. Urbilateria is marked "??" at the base of BILATERIA.]
BILATERIA, PROTOSTOMES, ECDYSOZOANS: Arthropods, Gastrotrichs, Nematodes, Onychophorans, Tardigrades, Kinorhynchs, Priapulids.
BILATERIA, PROTOSTOMES, LOPHOTROCHOZOANS: Molluscs, Rotifers, Annelids, Gnathostomulids, Sipunculans, Nemerteans, Pogonophorans, Platyhelminthes, Entoprocts, Bryozoans, Brachiopods, Phoronids.
BILATERIA, DEUTEROSTOMES: Vertebrates, Cephalochordates, Urochordates, Hemichordates, Echinoderms.
Outside BILATERIA: Ctenophorans, Cnidarians, Poriferans.
Sequenced metazoan species: Drosophila, Anopheles, C. elegans, human, mouse, zebrafish, Fugu, Ciona.

4 URBILATERIA: the hypothetical Metazoan ancestor
Geoffroy Saint-Hilaire, 19th century. URBILATERIA lived about 800 million years ago. Since then, genomes have evolved by the fixation of: gene mutation; gene loss; gene duplication (single gene, genome region, or whole genome).

5 Large-scale gene duplication in the vertebrate lineage
[Figure: tree of deuterostomes and protostomes with dated nodes; labels recovered below.]
Vertebrates: Amniota (human), Lissamphibia, Actinopterygii (zebrafish), Chondrichthyes (shark), Cephalaspidomorphi (lamprey), Myxini (hagfish).
Other Deuterostomia: Cephalochordata (amphioxus), Urochordata (Ciona), Echinodermata.
Protostomia: insects (Drosophila), nematodes (C. elegans).
Node labels: 360, 450, 528, 564, 751, >751. Other labels: T1, T2, Pikaia, genes, AIS (Adaptive Immune System).

6 Orthologs under purifying selection
[Diagram labels] URBILATERIA: gene A. Speciation. DROSOPHILA and HUMAN: A, ancestral function, purifying selection.

7 Ortholog functional switch
[Diagram labels] URBILATERIA: gene A. Speciation. DROSOPHILA: A, ancestral function, purifying selection. HUMAN: A2, new function, positive or relaxed selection.

8 Co-ortholog subfunctionalization
[Diagram labels] URBILATERIA: gene A. Speciation. DROSOPHILA: A, ancestral function, purifying selection. HUMAN: duplication into A' and A'', each with a sub-function.

9 Co-ortholog neofunctionalization
[Diagram labels] URBILATERIA: gene A. Speciation. DROSOPHILA: A, ancestral function, purifying selection. HUMAN: duplication into A and A2; new function, positive or relaxed selection.

10 Orthologs and paralogs
[Diagram labels] URBILATERIA: A, A1/2, A3, A1, A2. HUMAN multigenic family: includes A3' and A3''. DROSOPHILA multigenic family. Paralogs: A1, A2, B. Events shown: duplication, speciation.

11 Orthology / Paralogy
[Tree labels] Root A, with internal nodes A1/2 and A3. Leaves: A1 HUMAN, A1 DROSO, A2 HUMAN, A2 DROSO, A3' HUMAN, A3'' HUMAN, A3 DROSO. A3' HUMAN and A3'' HUMAN are labeled co-orthologs. Node types: duplication, speciation.
Orthologs: two genes in different species that derive from a common ancestral gene and were separated by a speciation event. Paralogs: two genes that result from a duplication event within a genome.
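The two definitions above can be applied mechanically once each internal node of a gene tree is labeled as a speciation or a duplication: the event at the last common ancestor of two genes decides the relationship. The following is a minimal sketch, not the Figenix implementation. The `Node` class, the function names, and the exact tree shape (in particular which nodes are speciations) are assumptions made for illustration, loosely following the leaf names of slide 11.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    name: str
    event: Optional[str] = None  # "speciation" or "duplication"; None for leaves
    children: List["Node"] = field(default_factory=list)


def path_to(node: Node, leaf: str, path: Tuple[Node, ...] = ()) -> Optional[Tuple[Node, ...]]:
    """Return the chain of nodes from `node` down to the leaf named `leaf`."""
    path = path + (node,)
    if not node.children:
        return path if node.name == leaf else None
    for child in node.children:
        found = path_to(child, leaf, path)
        if found is not None:
            return found
    return None


def relationship(root: Node, leaf_a: str, leaf_b: str) -> str:
    """Classify two distinct genes by the event at their last common ancestor."""
    path_a, path_b = path_to(root, leaf_a), path_to(root, leaf_b)
    if path_a is None or path_b is None:
        raise KeyError("both genes must be leaves of the tree")
    last_common = None
    for x, y in zip(path_a, path_b):
        if x is not y:
            break
        last_common = x
    return "orthologs" if last_common.event == "speciation" else "paralogs"


# Hypothetical tree: A1/A2 duplicated before the speciation, A3 duplicated in human after it.
tree = Node("A", "duplication", [
    Node("A1/2", "duplication", [
        Node("A1", "speciation", [Node("A1_HUMAN"), Node("A1_DROSO")]),
        Node("A2", "speciation", [Node("A2_HUMAN"), Node("A2_DROSO")]),
    ]),
    Node("A3", "speciation", [
        Node("A3_DROSO"),
        Node("A3_human_dup", "duplication", [Node("A3'_HUMAN"), Node("A3''_HUMAN")]),
    ]),
])

print(relationship(tree, "A1_HUMAN", "A1_DROSO"))     # separated by a speciation
print(relationship(tree, "A1_HUMAN", "A2_DROSO"))     # separated by a duplication
print(relationship(tree, "A3'_HUMAN", "A3_DROSO"))    # co-ortholog of the fly gene
print(relationship(tree, "A3'_HUMAN", "A3''_HUMAN"))  # human-specific duplicates
```

In this toy tree both A3' HUMAN and A3'' HUMAN come out as orthologs of A3 DROSO, which is the co-ortholog situation named on the slide.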

12 How to establish an orthologous relationship?
Many scientists use the best BLAST hit to look for orthologous relationships, BUT: several co-orthologs can be present, and the approach breaks down with genomes that are not fully sequenced or when gene loss has occurred. AND even with phylogenetic analysis: biases must be corrected, and different methods must be used to reconstruct the phylogenetic trees.
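To illustrate the first objection, here is a small sketch of why a single best hit hides co-orthologs. The gene names and bit scores are invented for the example, and the "near-best" tolerance is an assumption used only to make the point visible; the slide recommends phylogenetic analysis for this case, not a score threshold.

```python
def best_hit(hits):
    """Return the subject gene with the highest score (hits: gene -> bit score)."""
    return max(hits, key=hits.get)


def near_best_hits(hits, tolerance=0.10):
    """Return all genes scoring within `tolerance` of the top score (illustrative heuristic)."""
    top = max(hits.values())
    return sorted(gene for gene, score in hits.items() if score >= (1 - tolerance) * top)


# Hypothetical scores for a Drosophila A3 query against human proteins.
# A3a_HUMAN and A3b_HUMAN stand for the two human co-orthologs A3' and A3''.
hits = {"A3a_HUMAN": 410.0, "A3b_HUMAN": 395.0, "A1_HUMAN": 120.0}

print(best_hit(hits))        # a single gene, although two co-orthologs exist
print(near_best_hits(hits))  # both co-orthologs are visible here
```

The best-hit rule reports one gene and silently drops the second co-ortholog. It also cannot tell whether the true ortholog is missing from an incomplete genome or was lost, in which case the best hit is a paralog.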

13 Co-ortholog neofunctionalization
[Diagram labels] URBILATERIA: gene A. Speciation. DROSOPHILA: A, ancestral function, purifying selection. HUMAN: duplication into A and A2; new function.

14

15 Paralogue replacement
Constitutive proteasome β-subunits are replaced after interferon-γ stimulation (paralogue = duplicated gene).
Constitutive proteasome: PSMB5, PSMB6, PSMB7. Ancestral function: protein degradation. Present in all Metazoans, therefore present in Urbilateria (Metazoan ancestor).
Immunoproteasome: PSMB8 (LMP 7), PSMB9 (LMP 2), PSMB10 (LMP Z). New function (specialization): specific-size protein or peptide degradation, used by the MHC system. Only found in vertebrates.

16 Large-scale gene duplication in the vertebrate lineage: PROTEASOME genes
[Figure: tree of deuterostomes and protostomes with dated nodes; labels recovered below.]
Immunoproteasome: Vertebrates. Proteasome: Deuterostomia and Protostomia.
Vertebrates: Amniota (human), Lissamphibia, Actinopterygii (zebrafish), Chondrichthyes (shark), Cephalaspidomorphi (lamprey), Myxini (hagfish).
Other Deuterostomia: Cephalochordata (amphioxus), Urochordata (Ciona), Echinodermata.
Protostomia: insects (Drosophila), nematodes (C. elegans).
Node labels: 360, 450, 528, 564, 751, >751. Other labels: T1, T2, Pikaia.

17 [Figure: phylogenetic tree of PSMB7 and PSMB10 with node support values, a scale bar of 0.1, and a node labeled "Duplication".]
PSMB7: Mus, Ratt, Bos, Homo, Gall, Xeno, Zebra, Fugu. PSMB10: Zebra, Fugu, Bos, Mus, Homo. PSMB7/10: Bran, Ci-zeta (Ciona), Bombyx, Prosbeta2 and CG18341 (Drosophila).

18 PHYLOGENOMICS = the STUDY of the history of genes and genomes. => It HELPS to find evidence for gene function.

19 Structural and functional annotation.
Comparative genomics, concept of orthology and paralogy. What is phylogenomics? Structural and functional annotation: structural annotation (deciphering of gene structure); functional annotation (especially the use of phylogeny to decipher protein function). Figenix. Genome evolution: CASSIOPE.

20 A correct structural prediction for a correct phylogenetic analysis.

21 Structural annotation: gene finding and protein prediction
Genome nucleotide-level annotation: mapping; finding genomic landmarks; gene finding and protein prediction; non-coding RNAs and regulatory regions; identifying repetitive elements; mapping segmental duplications; mapping variations (SNPs, microsatellites, ...).

22 Structural annotation: available tools, state of the art
Ab initio: Genscan, Fgenesh, Genie, etc. Based on statistical signals within the DNA: coding propensity (hexamer signals) and splice-site signals. Strengths: easy and quick to run; only need DNA as input. Weakness: high false positive rate.
Similarity-assisted: GenomeScan, Twinscan. Extensions of ab initio programs that use sequence similarities to guide the predictions. Strengths: should be better than pure ab initio. Weakness: high false positive rate.
Similarity-based: Genewise, Sim4, Est2genome. Alignment programs that know about gene structure; very accurate with strong sequence similarities. Strengths: accurate. Weakness: need strong similarities; slow to run.
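As an illustration of what a "hexamer signal" can look like, the sketch below scores a sequence by the log-likelihood ratio of its hexamers under coding versus non-coding frequency tables. This is a generic textbook-style scheme with add-one smoothing, not the model actually used by Genscan, Fgenesh or Genie. The training sequences are toy strings invented for the example.

```python
import math
from collections import Counter


def hexamer_counts(seqs):
    """Count all overlapping 6-mers in a list of DNA sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - 5):
            counts[s[i:i + 6]] += 1
    return counts


def coding_score(seq, coding, noncoding, pseudo=1.0):
    """Sum of log(P(hexamer | coding) / P(hexamer | non-coding)) over the sequence."""
    vocab = 4 ** 6  # number of possible hexamers, used for smoothing
    c_total = sum(coding.values()) + pseudo * vocab
    n_total = sum(noncoding.values()) + pseudo * vocab
    score = 0.0
    for i in range(len(seq) - 5):
        h = seq[i:i + 6]
        p_coding = (coding[h] + pseudo) / c_total
        p_noncoding = (noncoding[h] + pseudo) / n_total
        score += math.log(p_coding / p_noncoding)
    return score


# Toy training data (hypothetical): GC-rich "coding" versus AT-rich "non-coding".
coding_table = hexamer_counts(["ATGGCCGCCGCCAAGGCC" * 5])
noncoding_table = hexamer_counts(["ATATATTTTTAAAATATA" * 5])

print(coding_score("GCCGCCGCC", coding_table, noncoding_table))  # positive: looks coding
print(coding_score("ATATATAT", coding_table, noncoding_table))   # negative: looks non-coding
```

Because the score depends only on the DNA itself, it needs no homologous protein, which matches the strength listed for ab initio tools. It is also why such signals alone produce false positives.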

23 Structural annotation: the « FIGENIX SOFTWARE PLATFORM » annotating method
Structural annotation combines a statistical approach with a homology-based approach (similarities with known proteins). Automating the process resulted in an expert system based on biological inference rules that use gene history and an ab initio program.

24 [Diagram labels] DNA segment with region 1 and region 2.
Protein A (best hit for region 1): HSPs A1, A2, A3. Protein B (best hit for region 2): HSPs B1, B2.

25 D M S A. The "best" solution will be, for example: +

26 Validation of structural annotation
[Diagram labels] Genome; gene = nucleotide sequence; transcription; mRNA = nucleotide sequence; translation; protein = amino acid sequence. The protein predicted from the genome sequence is compared with the protein sequence obtained by experimentation (result: 100%).
Correct proteins: Genscan 31%; HMMGene 38%; Figenix 87%.
The platform's performance was validated on a standard dataset (HMR195); see Guigó et al., 2000; Rogic et al., 2001.
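A percentage of this kind can be computed as the fraction of benchmark genes whose predicted protein matches the reference protein. The sketch below assumes that a prediction counts as correct only when the two sequences are identical. The slide does not state the exact criterion, and the gene names and sequences are invented.

```python
def correct_protein_rate(predicted, reference):
    """Fraction of reference genes whose predicted protein is identical to the reference."""
    if not reference:
        raise ValueError("reference set is empty")
    correct = sum(1 for gene, protein in reference.items() if predicted.get(gene) == protein)
    return correct / len(reference)


# Hypothetical benchmark of four genes.
reference = {"g1": "MKTAYIAK", "g2": "MSLLNQ", "g3": "MAVTRE", "g4": "MPPQLS"}
predicted = {"g1": "MKTAYIAK", "g2": "MSLLNQ", "g3": "MAVT", "g4": "MPPQLS"}  # g3 is truncated

print(correct_protein_rate(predicted, reference))  # 0.75
```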

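27 Structural annotation: accuracy versus exon type and prediction
Columns: program; accuracy on initial exons; on internal exons (186); on terminal exons (55); over-prediction; correct protein prediction.
Genscan: initial 0.55; internal 0.80; terminal 0.65; over-prediction 0.22; correct protein prediction 0.31.
Hmmgene: initial 0.75; internal 0.81; terminal 0.78; over-prediction 0.15; correct protein prediction 0.38.
Figenix: initial 0.91; internal 0.92; terminal 0.95; over-prediction 0.05; correct protein prediction 0.87.
The mouse and rat sequences from the HMR195 dataset were run against the human division of Swiss-Prot.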

28 Functional annotation
Biochemical function and biological process. Experimental approach: RNA interference; tandem affinity purification and mass spectrometry. In silico: similarity.

29 Functional annotation
Based on phylogeny. It is inferred exclusively from experimentally annotated genes…

30 Only a small fraction of genes corresponds to known, well-characterized proteins
If the function is unknown, phylogenetic analysis is used. Case 1: an ortholog with an experimentally known function is found; the function of the gene to annotate can be deduced from it. Case 2: no ortholog with an experimentally known function is found; the function of the gene to annotate is deduced from the function of the closest paralog. In both cases, protein molecular function prediction by Bayesian phylogenomics can be used (Engelhardt et al., PLoS Computational Biology, 2005).
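The two cases amount to a simple decision rule, sketched below. This is a toy illustration of the logic on the slide, not the Figenix expert system. It assumes that the orthologs and the paralogs (ordered from closest to most distant) have already been identified from a phylogenetic tree, and every gene name and annotation in the example is hypothetical.

```python
def infer_function(orthologs, paralogs_closest_first, known_functions):
    """Return (function, evidence) following case 1, then case 2, of the slide."""
    # Case 1: an ortholog with an experimentally known function exists.
    for gene in orthologs:
        if gene in known_functions:
            return known_functions[gene], f"ortholog {gene}"
    # Case 2: fall back on the closest paralog with a known function.
    for gene in paralogs_closest_first:
        if gene in known_functions:
            return known_functions[gene], f"closest paralog {gene}"
    return None, "no experimentally annotated homolog"


# Hypothetical experimental annotations.
known = {"geneX_MOUSE": "protein degradation", "geneY_HUMAN": "peptide processing"}

print(infer_function(["geneX_MOUSE"], ["geneY_HUMAN"], known))  # case 1
print(infer_function(["geneX_FUGU"], ["geneY_HUMAN"], known))   # case 2
```

Returning the evidence alongside the function keeps track of whether an annotation rests on an ortholog or only on a paralog, which is the weaker of the two inferences.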

31 Functional annotation: textual information analysis
Orthologs and paralogs with experimentally known function: how the information can be found. Sources: Gene Ontology, SwissProt, GenBank, MedLine. Textual information analysis; G.O. standard.

32 Functional annotation: Gene Ontology classification
Functionality classification in three GO categories:
Biological process: the biological process to which the gene or gene product contributes. Examples: cell growth and maintenance; pyrimidine metabolism.
Molecular function: the biochemical activity, including specific binding to ligands or structures, of a gene product. Examples: enzyme; transporter; Toll receptor ligand.
Cellular component: the place in the cell where a gene product is active. Examples: cytoplasm; ribosome.
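One simple way to hold such a classification in a program is a record with one field per GO category. The sketch below is only an illustration: the class name, the gene name and the choice of terms are assumptions, and real GO annotations use GO identifiers and evidence codes that are not modeled here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GOAnnotation:
    gene: str
    biological_process: List[str] = field(default_factory=list)
    molecular_function: List[str] = field(default_factory=list)
    cellular_component: List[str] = field(default_factory=list)

    def is_annotated(self) -> bool:
        """True if at least one of the three categories has a term."""
        return bool(self.biological_process or self.molecular_function or self.cellular_component)


# Hypothetical gene annotated with example terms taken from the slide.
annotation = GOAnnotation(
    gene="hypothetical_gene_1",
    biological_process=["pyrimidine metabolism"],
    molecular_function=["enzyme"],
    cellular_component=["cytoplasm"],
)

print(annotation.is_annotated())
print(GOAnnotation(gene="hypothetical_gene_2").is_annotated())
```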

33 Tumor necrosis factor family phylogenetic tree: orthologs identification
[Figure: phylogenetic tree of the TNF family with node support values and a scale bar of 0.2. Trends in Immunology (July 2003).]
Sequences: TNFSF10 (Gga, Dre, Hsa); TNFSF11 (Pol, Hsa, Xla); TNFSF5 (Gga, Hsa, Bbo, Mmu); TNFSF2 (Mmu, Hsa); TNFSF1 (Mmu, Hsa); TNFSF15 (Mmu, Hsa); TNFSF14 (Hsa, Mmu); TNFSF6 (Hsa, Rno, Mmu); TNFSF13 (Hsa, Gga, Pol); TNFSF7 (Mmu, Hsa); TNFSF8 (Hsa, Mmu); TNFSF9 (Hsa, Mmu); EIGER (DmeTNF).
Other labels: DF1, DF2, DF3; atherosclerotic plaque formation; ALPS - LPR/GLD lymphoproliferative syndrome.

34 Functional annotation: human TNF family phylogenetic tree, search for the closest paralog
[Figure: tree of human TNF ligands with their receptors, annotated with molecular function and biological process. Trends in Immunology (July 2003). Labels in extraction order; the original layout is lost.]
TNFSF3 TNFRSF3 LN, PP, GC, tumoricidal activity TNFSF1 TNFRSF1A PP, GC, T cell homeostasis (death) TNFSF2 TNFRSF1B T cell homeostasis (death) TNFSF15 TNFRSF12 T cell costimulation, negative selection? T cell homeostasis (survival?), CTL activation, peripheral tolerance? TNFSF14 TNFRSF14 TNFRSF6B T cell homeostasis (death), CTL function, peripheral tolerance, T cell costimulation, chemotaxis TNFSF6 TNFRSF6 TNFSF18 TNFRSF18 T cell transmigration and homeostasis (survival)? TNFSF4 TNFRSF4 T cell homeostasis (survival), peripheral tolerance GC, B cell function, peripheral tolerance, T cell priming TNFSF5 TNFRSF5 TNFRSF10B TNFRSF10A TNFRSF10C TNFRSF10D Tumoricidal activity, T cell function? TNFSF10 TNFRSF11B TNFSF11 TNFRSF11A LN, bone homeostasis, mammary gland development BR3 B cell homeostasis B cell homeostasis? TNFSF13B TNFRSF17 TNFSF13 TACI TNFSF12? TNFSF7 TNFRSF7 T cell activation? TNFSF9 TNFRSF9 T cell activation and survival, CTL activity, tumoricidal activity? TNFSF8 TNFRSF8 Negative selection, autoimmunity TNFRSF19 ? EDA-A1 EDAR Tooth, hair, sweat gland formation EDA-A2 XEDAR Tooth, hair, skin formation? TNFRSF21 ? RELT ?

35 COMPUTERIZATION OF THE CONCEPTS

36 FIGENIX
FIGENIX is a multi-user software platform dedicated to structural and functional annotation tasks:
- Gene prediction for large DNA sequences
- Construction of robust phylogenetic trees
- Automatic detection of orthologs and paralogs
- Automatic retrieval of functional data on genes from Web databases
- Filtering and construction of protein databases (EST contig assembly)
- Chained processes (e.g. gene prediction followed by a phylogenetic study for each gene)

37 STEPS OF THE phylogeny PIPELINE (1)
[Flowchart; labels translated in extraction order, the original layout is lost.]
Input: protein sequence encoded by a putative gene. BLAST against Ensembl, NR, ... followed by filtering. Multiple alignment with CLUSTAL W, followed by purification and bias correction. Domain search in PFAM with HmmPFAM; enumeration of domains. Decision: do "repeats" exist? (yes / no). Keep monophyletic "repeats"; alignment of merged "repeats"; keep the complete alignment. Composition test with TREE-PUZZLE to eliminate sequences that are too divergent. Creation of the "FIGENIX" domain (correctDomains). Construction of the Tree of Life (reference tree).

38 STEPS OF THE phylogeny PIPELINE (2)
[Flowchart; labels translated in extraction order, the original layout is lost.]
Detect "paralogy groups" and eliminate sites that evolve too fast ("Gu test"). Eliminate sequences with more than 30% gaps. Construction of the Tree of Life (reference tree). Eliminate the most non-congruent domains, detected by HomPart in PAUP. Saturation test. Build three trees: NJ, parsimony, maximum likelihood. Compare the topologies with Templeton-Hasegawa tests. Are the topologies congruent? Yes: consensus tree. No: NJ tree. Detection of orthologs and search for functions.
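One of the pipeline steps, removing sequences with more than 30% gaps from the multiple alignment, is simple enough to sketch. The code below is an illustration under stated assumptions, not the Figenix code: the alignment is a dictionary of equal-length strings with "-" as the gap character, and the sequence names are invented.

```python
def drop_gappy_sequences(alignment, max_gap_fraction=0.30):
    """Keep aligned sequences whose gap fraction does not exceed `max_gap_fraction`."""
    kept = {}
    for name, seq in alignment.items():
        gap_fraction = seq.count("-") / len(seq)
        if gap_fraction <= max_gap_fraction:
            kept[name] = seq
    return kept


# Hypothetical alignment of length 10.
alignment = {
    "seq_full": "MKTALLVAGA",      # 0% gaps
    "seq_one_gap": "MKT-LLVAGA",   # 10% gaps
    "seq_limit": "MKT---VAGA",     # exactly 30% gaps, kept
    "seq_gappy": "M----LVAGA",     # 40% gaps, removed
}

filtered = drop_gappy_sequences(alignment)
print(sorted(filtered))
```

The threshold is exposed as a parameter because the 30% value comes from the slide, while other alignments may call for a different cutoff.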

39

40 FIGENIX architecture (EGEE)
- Intranet/Extranet platform - 3-tier architecture (web interface / "business" servers / database)
[Diagram labels] Web Server; Request; Data exchange; Load Balancing, Security, ...; Annotation Engine; Expert System; Genomic Data; Persistence Layer; Repository; RDBMS; Archiver; Functional Collector Agent; EST Agent; MGI Agent; GO Agent.

41 Results (1) EGEE

42 Results (2) EGEE

43 Gouret P, Vitiello V, Balandraud N, Gilles A, Pontarotti P, Danchin EG. FIGENIX: intelligent automation of genomic annotation: expertise integration in a new software platform. BMC Bioinformatics Aug 5;6:198.
Balandraud N, Gouret P, Danchin EGJ, Blanc M, Zinn D, Roudier J, Pontarotti P. A rigorous method for multigenic families' functional annotation: the peptidyl arginine deiminase (PADs) proteins family example. BMC Genomics 2005, 6:153.

44 Analysis using Figenix
Vienne et al. Evolution of the proto-MHC ancestral region: more evidence for the plesiomorphic organisation of human chromosome 9q34 region. Immunogenetics (7):429-36.
Danchin E, et al. The Major Histocompatibility Complex origin. Immunological Reviews April;198(1):
Danchin EGJ, Gouret P, Pontarotti P. Universally conserved genes lost in mammals and vertebrates. BMC Evolutionary Biology, accepted.
C Yu, et al. Roles of co-option in the emergence of vertebrate adaptive immune system, insights from amphioxus. Submitted.
Online users: INSERM U624*, TAGC, UPRESA CNRS 6032*, Marseille; INRA Nancy; Institute Mol. Genet., Acad. Sci. Czech Republic; Sun Yat-Sen University, China; Uppsala University, Department of Neuroscience, Sweden. * Draft papers

45 Structural and functional annotation.
Comparative genomics, concept of orthology and paralogy. What is phylogenomics? Structural and functional annotation: structural annotation (deciphering of gene structure); functional annotation (especially the use of phylogeny to decipher protein function). Figenix. Genome evolution: CASSIOPE.

46 C.A.S.S.I.O.P.E
C.A.S.S.I.O.P.E: Clever Agent System for Synteny Inheritance and Other Phenomena in Evolution. It finds conserved regions between genomes. C.A.S.S.I.O.P.E reduces the working time 50-fold.

47 C.A.S.S.I.O.P.E.

48 Toward the reconstruction of ancestral genomes

49

50 Collaboration
Etienne Danchin (AFMB), Philippe Gouret, Etienne Pardoux, Vérane Vitiello, Simona Grusea, Nathalie Balandraud, Alexandre Vienne, Virginie Lopez, Magali Lienart, Pierre Pontarotti

