The use of the concepts of evolutionary biology in genome annotation.


1 The use of the concepts of evolutionary biology in genome annotation.

2 Comparative genomics: the concepts of orthology and paralogy. What is phylogenomics? Structural and functional annotation. Structural annotation (deciphering gene structure). Functional annotation (especially the use of phylogeny to decipher protein function). Figenix. Genome evolution: CASSIOPE

3 Metazoan phylogeny (from Adoutte et al. 2000). Sequenced metazoan species: Drosophila, Anopheles, C. elegans, Human, Mouse, Zebrafish, Fugu, Ciona.

4 URBILATERIA: the hypothetical Metazoan ancestor (Geoffroy Saint-Hilaire, 19th century). The URBILATERIA genome evolved by the fixation of: gene mutation, gene loss, gene duplication, genome region duplication, whole genome duplication. 800 million years ago …

5 Large-scale gene duplication in the vertebrate lineage. AIS (Adaptive Immune System). Tree taxa: Deuterostomia; Protostomia; Vertebrates; Amniota (human); Lissamphibia; Chondrichthyes (shark); Cephalaspidomorphi (lamprey); Cephalochordata (amphioxus); Echinodermata; Actinopterygii (zebrafish); Urochordata (Ciona); Insects (Drosophila); Myxini (hagfish); Nematoda (C. elegans). Figure labels: 751 > < T1 T genes; Pikaia

6 Orthologs under purifying selection. Ancestral gene A in URBILATERIA; after speciation, A in DROSOPHILA (ancestral function, purifying selection) and A in HUMAN (ancestral function, purifying selection).

7 Ortholog functional switch. Ancestral gene A in URBILATERIA; after speciation, A in DROSOPHILA (ancestral function, purifying selection) and A2 in HUMAN (new function, positive or relaxed selection).

8 Co-ortholog subfunctionalization. Ancestral gene A in URBILATERIA; after speciation, A in DROSOPHILA (ancestral function, purifying selection); in HUMAN, duplication of A gives two copies, each retaining a sub-function.

9 Co-ortholog neofunctionalization. Ancestral gene A in URBILATERIA; after speciation, A in DROSOPHILA (ancestral function, purifying selection); in HUMAN, duplication gives A (ancestral function, purifying selection) and A2 (new function, positive or relaxed selection).

10 Orthologs and paralogs. Diagram: gene A in URBILATERIA; duplication into A1/2 and A3; speciation; HUMAN multigenic family (A1, A2, A3); DROSOPHILA multigenic family. Figure labels: A1, A2, B; paralogs; duplication; speciation.

11 Orthology / Paralogy. Orthologs: two genes in different species that derive from a common ancestral gene and were separated by a speciation event. Paralogs: two genes resulting from a duplication event within a genome. Diagram: A, A1/2, A3; duplication; speciation; A1 HUMAN, A1 DROSO, A2 HUMAN, A2 DROSO, A3 HUMAN, A3 DROSO; co-orthologs.
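
The two definitions above can be sketched as a rule on a gene tree whose internal nodes are labeled with the event they represent: a pair of genes is orthologous when its last common ancestor is a speciation node, and paralogous when it is a duplication node. The tree below is a hypothetical illustration (gene names such as `A12_DROSO` are assumptions, not taken from the slide's figure), and the tuple encoding is only one possible representation.

```python
# Hypothetical reconciled gene tree for a family "A".
# Internal nodes are (event, left, right); leaves are "GENE_SPECIES" strings.
TREE = (
    "duplication",                      # ancient duplication: A1/2 versus A3
    ("speciation",
        ("duplication", "A1_HUMAN", "A2_HUMAN"),  # human-specific duplication
        "A12_DROSO"),
    ("speciation", "A3_HUMAN", "A3_DROSO"),
)


def leaves(node):
    """Return the leaf names below a node."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)


def classify_pairs(node, out=None):
    """Label every gene pair by the event at its last common ancestor."""
    if out is None:
        out = {}
    if isinstance(node, str):
        return out
    event, left, right = node
    relation = "orthologs" if event == "speciation" else "paralogs"
    # Pairs with one gene on each side have this node as last common ancestor.
    for a in leaves(left):
        for b in leaves(right):
            out[frozenset((a, b))] = relation
    classify_pairs(left, out)
    classify_pairs(right, out)
    return out


RELATIONS = classify_pairs(TREE)
```

In this toy tree both `A1_HUMAN` and `A2_HUMAN` come out as orthologs of `A12_DROSO`, which is the co-ortholog situation, while `A1_HUMAN` and `A3_DROSO` are paralogs even though they sit in different species.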

12 How to evidence orthologous relationships? Many scientists use the best BLAST hit to look for orthologous relationships … BUT: many co-orthologs can be present, and problems arise with genomes that are not fully sequenced or when gene loss has occurred … AND even with phylogenetic analysis, biases must be corrected and different methods must be used to reconstruct phylogenetic trees.
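
A minimal sketch of why a best-hit criterion can miss co-orthologs, using the common reciprocal-best-hit variant as an assumption (the slide only mentions the best BLAST hit). The scores are invented for illustration and no BLAST call is made; the function and variable names are hypothetical.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > scores[(query, best[query])]:
            best[query] = subject
    return best


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}


# Invented similarity scores: one Drosophila gene, two human co-orthologs.
DROSO_VS_HUMAN = {("A_DROSO", "A1_HUMAN"): 900, ("A_DROSO", "A2_HUMAN"): 850}
HUMAN_VS_DROSO = {("A1_HUMAN", "A_DROSO"): 900, ("A2_HUMAN", "A_DROSO"): 850}

PAIRS = reciprocal_best_hits(DROSO_VS_HUMAN, HUMAN_VS_DROSO)
```

Only the pair with `A1_HUMAN` is reported here; `A2_HUMAN` is dropped although, in this toy scenario, it is equally a co-ortholog of the Drosophila gene.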

13 Co-ortholog neofunctionalization. Ancestral gene A in URBILATERIA; after speciation, A in DROSOPHILA (ancestral function, purifying selection); in HUMAN, duplication gives A (ancestral function, purifying selection) and A2 (new function).

14

15 Constitutive proteasome β-subunit replacement after interferon-γ stimulation. Paralogue = duplicated gene. Paralogue replacement: constitutive proteasome (PSMB5, PSMB6, PSMB7); immunoproteasome (PSMB8 (LMP 7), PSMB9 (LMP 2), PSMB10 (LMP Z)). Ancestral function: protein degradation; present in all Metazoans, therefore present in Urbilateria (Metazoan ancestor). New function (specialization): size-specific protein or peptide degradation, used by the MHC system; only found in vertebrates.

16 Large-scale gene duplication in the vertebrate lineage: proteasome and immunoproteasome. Tree taxa: Deuterostomia; Protostomia; Vertebrates; Amniota (human); Lissamphibia; Chondrichthyes (shark); Cephalaspidomorphi (lamprey); Cephalochordata (amphioxus); Echinodermata; Actinopterygii (zebrafish); Urochordata (Ciona); Insects (Drosophila); Myxini (hagfish); Nematoda (C. elegans). Figure labels: 751 > < T1 T2 360; PROTEASOME genes; Pikaia

17

18 PHYLOGENOMICS = the STUDY of the history of genes and genomes. => HELPS to find evidence for gene function.

19 Comparative genomics: the concepts of orthology and paralogy. What is phylogenomics? Structural and functional annotation. Structural annotation (deciphering gene structure). Functional annotation (especially the use of phylogeny to decipher protein function). Figenix. Genome evolution: CASSIOPE

20 A correct structural prediction for a correct phylogenetic analysis.

21 Structural annotation. Genome nucleotide-level annotation: mapping; finding genomic landmarks; gene finding and protein prediction; non-coding RNAs and regulatory regions; identifying repetitive elements; mapping segmental duplications; mapping variations (SNPs, microsatellites, …).

22 Structural annotation: state of the art. Available tools:
Ab initio (Genscan, Fgenesh, Genie, etc.): based on statistical signals within the DNA, namely coding propensity (hexamer signals) and splice site signals. Strengths: easy and quick to run; only need DNA as input. Weakness: high false positive rate.
Similarity assisted (GenomeScan, Twinscan): extensions of ab initio programs that use sequence similarities to guide the predictions. Strengths: should be better than pure ab initio. Weakness: high false positive rate.
Similarity based (Genewise, Sim4, Est2genome): alignment programs that know about gene structure; very accurate with strong sequence similarities. Strengths: accurate. Weakness: need strong similarities; slow to run.
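
As an illustration of the "coding propensity (hexamer signals)" idea, here is a minimal sketch of a hexamer log-odds score: hexamer frequencies are counted in known coding and non-coding training sequences, and a candidate region is scored by summing the log ratio of the two frequencies. This is a simplified stand-in, not the model used by any of the programs listed; the training sequences are toy data and all names are hypothetical.

```python
import math
from collections import Counter

HEXAMER_SPACE = 4 ** 6  # number of possible DNA hexamers


def hexamer_counts(sequences):
    """Count all overlapping hexamers in a list of DNA strings."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - 5):
            counts[seq[i:i + 6]] += 1
    return counts


def coding_score(seq, coding, noncoding, pseudo=1.0):
    """Sum of log-odds (coding vs non-coding) over the hexamers of seq.

    A pseudocount keeps unseen hexamers from producing log(0).
    """
    n_coding = sum(coding.values()) + pseudo * HEXAMER_SPACE
    n_noncoding = sum(noncoding.values()) + pseudo * HEXAMER_SPACE
    score = 0.0
    for i in range(len(seq) - 5):
        hexamer = seq[i:i + 6]
        p_coding = (coding[hexamer] + pseudo) / n_coding
        p_noncoding = (noncoding[hexamer] + pseudo) / n_noncoding
        score += math.log(p_coding / p_noncoding)
    return score


# Toy training data (invented): one "coding" and one "non-coding" sequence.
CODING = hexamer_counts(["ATGGCCGCCGCCGCC"])
NONCODING = hexamer_counts(["ATATATATATATATA"])
```

A positive score means the region looks more like the coding training set. With so little context per window, scores of this kind are easy to trigger by chance, which is consistent with the high false positive rate noted for ab initio tools.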

23 Structural annotation: annotating method. Structural annotation combines a statistical approach with a homology-based approach (similarities with known proteins). Automating the process resulted in an expert system based on biological inference rules that uses gene history and ab initio programs: the « FIGENIX SOFTWARE PLATFORM ».

24 DNA segment: region 1, region 2. Protein A (best hit for region 1): HSPs A1, A2, A3. Protein B (best hit for region 2): HSPs B1, B2.

25 DMSDADDDDAAD A ADA The "best" solution will be, for example, + DAAA

26 Validation of structural annotation. Gene = nucleotide sequence; mRNA = nucleotide sequence; protein = amino acid sequence. Genome sequence, transcription, translation, protein. From sequence to protein: experimentation (result) 100%; Figenix 87%; Genscan 31%; HMMGene 38%. The platform's performance was validated on a standard dataset (HMR195); see Guigó et al., 2000; Rogic et al., 2001.

27 Structural annotation. Accuracy versus exon type and prediction. Programs: Genscan, Figenix, HMMGene. Exon types: initial (55), internal (186), terminal (55). Categories: correct protein prediction, over-prediction. The mouse and rat sequences from the HMR195 dataset were used against the human division of SwissProt.

28 Functional annotation. Biochemical and biological process. Experimental approaches: RNA interference; tandem affinity purification and mass spectrometry. In silico: similarity …

29 Functional annotation based on phylogeny. It is inferred exclusively from experimentally annotated genes…

30 Only a small fraction of genes correspond to known, well-characterized proteins. If the function is unknown, phylogenetic analysis is used. Case 1: an ortholog of experimentally known function is found; the function of the gene to annotate can be deduced from it. Case 2: no ortholog of experimentally known function is found; the function of the gene to annotate is deduced from the function of the closest paralog. In both cases, protein molecular function prediction by Bayesian phylogenomics can be used (Engelhardt et al., PLoS Computational Biology, 2005).
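
The two cases above amount to a simple decision rule, sketched below under stated assumptions: orthologs and paralogs have already been identified from a tree, paralogs come with some tree distance, and `known_functions` holds only experimentally supported annotations. All gene names, distances and function labels are hypothetical.

```python
def infer_function(orthologs, paralog_distances, known_functions):
    """Return (function, evidence) following the two cases described above.

    orthologs: list of ortholog gene names.
    paralog_distances: dict mapping paralog gene name to tree distance.
    known_functions: dict mapping gene name to an experimentally known function.
    """
    # Case 1: an ortholog with an experimentally known function exists.
    for gene in orthologs:
        if gene in known_functions:
            return known_functions[gene], "ortholog " + gene
    # Case 2: fall back to the closest paralog that has a known function.
    for gene in sorted(paralog_distances, key=paralog_distances.get):
        if gene in known_functions:
            return known_functions[gene], "closest paralog " + gene
    return None, "no experimentally annotated homolog"


# Hypothetical data for illustration only.
KNOWN = {"GENE_X_MOUSE": "kinase activity", "GENE_Y_HUMAN": "transporter activity"}

CASE_1 = infer_function(["GENE_X_MOUSE"], {"GENE_Y_HUMAN": 0.8}, KNOWN)
CASE_2 = infer_function(["GENE_Q_FUGU"], {"GENE_Y_HUMAN": 0.8, "GENE_Z_HUMAN": 0.3}, KNOWN)
```

Returning the evidence string alongside the function keeps track of whether an annotation rests on an ortholog or only on a paralog, which matters because paralogs may have diverged in function.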

31 Functional annotation. Orthologs and paralogs with experimentally known function: where the information can be found. Gene Ontology (G.O. standard), MedLine, SwissProt, GenBank; textual information analysis.

32 Functional annotation: Gene Ontology classification. Functionality classification: three GO categories.
Biological process: the biological process to which the gene or gene product contributes. Examples: cell growth and maintenance; pyrimidine metabolism; …
Molecular function: the biochemical activity, including specific binding to ligands or structures, of a gene product. Examples: enzyme; transporter; Toll receptor ligand; …
Cellular component: the place in the cell where a gene product is active. Examples: cytoplasm; ribosome; …

33 Tumor necrosis factor family phylogenetic tree: ortholog identification. Trends in Immunology (July 2003). Figure labels: atherosclerotic plaque formation; ALPS - LPR/GLD lymphoproliferative syndrome.

34 Functional annotation. Human TNF family phylogenetic tree: search for the closest paralog. Trends in Immunology (July 2003).
Ligands: TNFSF1, TNFSF2, TNFSF3, TNFSF14, TNFSF6, TNFSF10, TNFSF11, TNFSF5, TNFSF13B, TNFSF13, TNFSF12, ?, TNFSF9, TNFSF8, TNFSF7, TNFSF18, TNFSF4, EDA-A1, EDA-A2, TNFSF15.
Receptors: TNFRSF1A, TNFRSF1B, TNFRSF3, TNFRSF14, TNFRSF6B, TNFRSF11A, TNFRSF5, TNFRSF11B, TNFRSF17, TNFRSF9, TNFRSF8, TNFRSF6, TNFRSF10B, TNFRSF10A, TNFRSF10C, TNFRSF10D, TACI, TNFRSF7, TNFRSF18, TNFRSF4, TNFRSF19, EDAR, XEDAR, TNFRSF21, RELT, TNFRSF12, BR3.
Annotation columns: molecular function; biological process. Labels, in figure order: LN, PP, GC, tumoricidal activity; T cell homeostasis (death); T cell homeostasis (death), CTL function, peripheral tolerance, T cell costimulation, chemotaxis; LN, bone homeostasis, mammary gland development; T cell homeostasis (survival?), CTL activation, peripheral tolerance?; T cell homeostasis (survival), peripheral tolerance; T cell activation?; T cell activation and survival, CTL activity, tumoricidal activity?; ?; tooth, hair, sweat gland formation; tooth, hair, skin formation?; PP, GC, T cell homeostasis (death); T cell transmigration and homeostasis (survival)?; GC, B cell function, peripheral tolerance, T cell priming; tumoricidal activity, T cell function?; negative selection, autoimmunity; ?; ?; T cell costimulation, negative selection?; B cell homeostasis; B cell homeostasis; ?; B cell homeostasis.

35 COMPUTER IMPLEMENTATION OF THE CONCEPTS

36 FIGENIX. FIGENIX is a multi-user software platform dedicated to structural and functional annotation tasks:
- Gene prediction for large DNA sequences
- Construction of robust phylogenetic trees
- Automatic detection of orthologs and paralogs
- Automatic retrieval of functional data on genes from Web databases
- Filtering and construction of protein databases (EST contig assembly)
- Chained processes (e.g. gene prediction followed by a phylogenetic study of each predicted gene)

37 STEPS OF THE phylogeny PIPELINE (1). Flowchart boxes: protein sequence encoded by a putative gene; BLAST (Ensembl, NR, …) + filtering; CLUSTAL W + purification + bias correction; multiple alignment; domain search with HmmPFAM (PFAM); creation of « FIGENIX » domains (correctDomains); do « repeats » exist? (No / Yes); keep the complete alignment; keep monophyletic « repeats »; alignment of merged « repeats »; domain enumeration; composition test with TREE-PUZZLE to eliminate overly divergent sequences; Tree of Life construction; reference tree.

38 STEPS OF THE phylogeny PIPELINE (2). Flowchart boxes: detection of « paralogy groups » + elimination of sites that evolve too fast (« Gu test »); elimination of sequences with >30% gaps; elimination of the most incongruent domains, detected by HomPart in PAUP; saturation test; tree construction by NJ, parsimony and maximum likelihood; comparison of topologies by Templeton-Hasegawa tests; congruent topologies? (Yes / No); NJ tree; consensus tree; Tree of Life construction; reference tree; ortholog detection and function search.
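
One of the filtering steps on this slide, removing sequences with more than 30% gaps, is simple enough to sketch. The code below assumes the alignment is held as a dictionary of equal-length gapped strings with `-` as the gap character; this representation, the function name and the toy alignment are assumptions, not FIGENIX internals.

```python
def drop_gappy_sequences(alignment, max_gap_fraction=0.30):
    """Keep only aligned sequences whose gap fraction does not exceed the limit.

    alignment: dict mapping sequence name to its gapped row.
    """
    kept = {}
    for name, row in alignment.items():
        # Empty rows carry no information and are dropped as well.
        if row and row.count("-") / len(row) <= max_gap_fraction:
            kept[name] = row
    return kept


# Toy alignment with 10 columns (invented sequences).
ALIGNMENT = {
    "seq_ok": "MKT-AL-GHV",        # 20% gaps: kept
    "seq_limit": "MK--AL-GHV",     # exactly 30% gaps: kept
    "seq_gappy": "M---AL---V",     # 60% gaps: removed
}

FILTERED = drop_gappy_sequences(ALIGNMENT)
```

A sequence at exactly 30% is kept here because the slide specifies removal only above that threshold.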

39

40 EGEE. FIGENIX architecture. Components: Web Server; Annotation Engine; Expert System; Genomic Data; RDBMS; Persistence Layer; Repository; Archiver; load balancing, security, ...; request and data exchange; agents (MGI Agent, GO Agent, EST Agent, Functional Collector Agent). - Intranet/Extranet platform - 3-tier architecture (web interface / business-logic servers / database)

41 EGEE Results (1)

42 EGEE Results (2)

43 Gouret P, Vitiello V, Balandraud N, Gilles A, Pontarotti P, Danchin EG. FIGENIX: intelligent automation of genomic annotation: expertise integration in a new software platform. BMC Bioinformatics Aug 5;6:198.
Balandraud N, Gouret P, Danchin EGJ, Blanc M, Zinn D, Roudier J, Pontarotti P. A rigorous method for multigenic families' functional annotation: the peptidyl arginine deiminase (PADs) proteins family example. BMC Genomics 2005, 6:153.

44 Analysis using Figenix.
Vienne et al. Evolution of the proto-MHC ancestral region: more evidence for the plesiomorphic organisation of human chromosome 9q34 region. Immunogenetics (7):
Danchin E, et al. The Major Histocompatibility Complex origin. Immunological Reviews April;198(1):
Danchin EGJ, Gouret P, Pontarotti P. Universally conserved genes lost in mammals and vertebrates. BMC Evolutionary Biology, accepted.
C Yu, et al. Roles of co-option in the emergence of vertebrate adaptive immune system, insights from amphioxus. Submitted.
Online users: INSERM U624*, TAGC, UPRESA CNRS 6032*, Marseille; INRA Nancy; Institute Mol. Genet., Acad. Sci. Czech Republic; Sun Yat-sen University, China; Uppsala University, Department of Neuroscience, Sweden. * Draft papers

45 Comparative genomics: the concepts of orthology and paralogy. What is phylogenomics? Structural and functional annotation. Structural annotation (deciphering gene structure). Functional annotation (especially the use of phylogeny to decipher protein function). Figenix. Genome evolution: CASSIOPE

46 C.A.S.S.I.O.P.E: Clever Agent System for Synteny Inheritance and Other Phenomena in Evolution. It finds conserved regions between genomes. C.A.S.S.I.O.P.E reduces the working time 50-fold.

47 C.A.S.S.I.O.P.E.

48 Towards the reconstruction of ancestral genomes

49

50 Etienne Danchin (AFMB) Collaboration Philippe Gouret Etienne Pardoux Vérane Vitiello Simona Grusea Nathalie Balandraud Alexandre Vienne Virginie Lopez Magali Lienart Pierre Pontarotti

