The use of the concepts of evolutionary biology in genome (biological) annotation. Pierre Pontarotti EA 3781 Evolution Biologique

1 The use of the concepts of evolutionary biology in genome (biological) annotation. Pierre Pontarotti EA 3781 Evolution Biologique

2

3 Some concepts in evolutionary biology. Use of the concepts for gene structural and functional annotation. Computerization. Other concepts.

4 Metazoan phylogeny (Adoutte et al. 2000)

5 URBILATERIA: the hypothetical metazoan ancestor (Geoffroy Saint-Hilaire, 19th century). The URBILATERIA genome evolved by the fixation of: nucleotide substitution; gene loss; genic duplication; gene duplication; genome region duplication; whole-genome duplication; chromosomal rearrangement.

6 Large-scale gene duplication in the vertebrate lineage. [Phylogeny figure. Taxa shown: Deuterostomia, Protostomia, Vertebrates, Amniota (human), Lissamphibia, Chondrichthyes (shark), Cephalaspidomorphi (lamprey), Cephalochordata (amphioxus), Echinodermata, Actinopterygii (zebrafish), Urochordata (Ciona), insects (Drosophila), Myxini (hagfish), Nematoda (C. elegans), Pikaia.]

7 From alleles to orthologs (point mutations). Population POP 1 carries alleles A, B, C, D. POP 1 splits into 2 autonomous populations, POP 1A and POP 1B. POP 1A: fixation of allele A and accumulation of new mutations (A1, A2). POP 1B: fixation of allele B and accumulation of new mutations (B1, B2).

8 From alleles to orthologs (point mutations). POP 1A splits into 2 autonomous populations, POP 1A1 and POP 1A2; POP 1B splits into 2 autonomous populations, POP 1B1 and POP 1B2. POP 1A1: fixation of allele A1 and accumulation of new mutations (A11, A12). POP 1A2: fixation of allele A2 and accumulation of new mutations (A21, A22). POP 1B1: fixation of allele B1 and accumulation of new mutations (B11, B12). POP 1B2: fixation of allele B2 and accumulation of new mutations (B21, B22).

9 From alleles to orthologs A.1.1 A.1.2 A.2.1 A.2.2 B.1.1 B.1.2 B.2.1 B.2.2 Alleles Orthologs

10 Orthologs and paralogs. [Diagram: in URBILATERIA, gene A duplicates into A1/2 and A3, and A1/2 into A1 and A2; after speciation, the HUMAN and DROSOPHILA multigenic families each contain A1, A2, A3. Labels: paralogs, duplication, speciation.]

11 Orthology/paralogy. Orthologs: 2 genes in different species that derive from a common ancestral gene and were separated by a speciation event. Paralogs: 2 genes resulting from a duplication event within a genome. [Tree: A duplicates into A1/2 and A3; A1/2 duplicates into A1 and A2; speciation then gives A1 HUMAN / A1 DROSO, A2 HUMAN / A2 DROSO, A3 HUMAN / A3 DROSO. Labels: co-orthologues, duplication, speciation.]
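The two definitions on this slide can be turned into a small rule: a pair of genes is orthologous when their last common ancestor in the gene tree is a speciation node, and paralogous when it is a duplication node. The sketch below is only an illustration of that rule. The `Node` class, the helper names and the hand-labelled tree (reproducing the A1/A2/A3 example) are assumptions made for this example, not part of the presentation or of any tool it mentions.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Node:
    """Gene-tree node; `event` is 'leaf', 'speciation' or 'duplication' (hypothetical model)."""
    event: str = "leaf"
    name: str = ""
    children: list = field(default_factory=list)


def leaves(node):
    """Return the gene names found under a node."""
    if node.event == "leaf":
        return [node.name]
    return [name for child in node.children for name in leaves(child)]


def classify_pairs(node, out=None):
    """Label every pair of genes by the event at their last common ancestor."""
    if out is None:
        out = {}
    if node.event == "leaf":
        return out
    relation = "orthologs" if node.event == "speciation" else "paralogs"
    # Genes taken from two different children meet for the first time at this node.
    for left, right in combinations(node.children, 2):
        for a in leaves(left):
            for b in leaves(right):
                out[frozenset((a, b))] = relation
    for child in node.children:
        classify_pairs(child, out)
    return out


def gene(name):
    return Node(name=name)


def spec(a, b):
    return Node("speciation", children=[a, b])


def dup(a, b):
    return Node("duplication", children=[a, b])


# Hand-labelled tree for the slide's example: two duplications, then speciation.
tree = dup(
    dup(spec(gene("A1_HUMAN"), gene("A1_DROSO")),
        spec(gene("A2_HUMAN"), gene("A2_DROSO"))),
    spec(gene("A3_HUMAN"), gene("A3_DROSO")),
)
pairs = classify_pairs(tree)
ortholog_pairs = [p for p, rel in pairs.items() if rel == "orthologs"]
```

In this toy tree only the three same-named human/Drosophila pairs come out as orthologs; every other pair, including pairs across species such as A1 HUMAN and A2 DROSO, is separated by a duplication and is therefore paralogous.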

12 From Gene History To Gene Function

13 Orthologs under purifying selection. [Diagram: gene A in URBILATERIA; after speciation, A is kept under purifying selection in DROSOPHILA (ancestral function) and in HUMAN (ancestral function).]

14 Ortholog functional switch. [Diagram: gene A in URBILATERIA; after speciation, A stays under purifying selection in DROSOPHILA (ancestral function), while in HUMAN it becomes A2 under positive or relaxed selection (new function?).]

15 Co-ortholog subfunctionalization. [Diagram: gene A in URBILATERIA; after speciation, A stays under purifying selection in DROSOPHILA (ancestral function); in HUMAN, A duplicates and each copy keeps a sub-function.]

16 Co-ortholog neofunctionalization. [Diagram: gene A in URBILATERIA; after speciation, A stays under purifying selection in DROSOPHILA (ancestral function); in HUMAN, A duplicates: one copy, A, stays under purifying selection (ancestral function), the other, A2, is under positive or relaxed selection (new function).]

17 Orthology/paralogy information is important for functional inference (but not applicable to species with a high level of horizontal transfer).

18 Orthology/paralogy. Orthologs: 2 genes in different species that derive from a common ancestral gene and were separated by a speciation event. Paralogs: 2 genes resulting from a duplication event within a genome. [Tree: A duplicates into A1/2 and A3; A1/2 duplicates into A1 and A2; speciation then gives A1 HUMAN / A1 DROSO, A2 HUMAN / A2 DROSO, A3 HUMAN / A3 DROSO. Labels: co-orthologues, duplication, speciation.]

19 Many scientists use the best BLAST hit to infer orthologous relationships. A warning that will be discussed by other speakers… BUT: several co-orthologs can be present; there are problems with genomes that are not fully sequenced, or when gene loss has occurred. AND even with phylogenetic analysis, biases must be corrected. A phylogenetic tree is a hypothesis.
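The co-ortholog warning on this slide can be illustrated with a toy reciprocal-best-hit search. The sketch below assumes similarity scores are already available as nested dictionaries; the gene names and scores are invented for the example, and no real BLAST output format or API is implied.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject; scores = {query: {subject: score}}."""
    return {query: max(hits, key=hits.get) for query, hits in scores.items() if hits}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs in which each gene is the best hit of the other."""
    forward, reverse = best_hits(a_vs_b), best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Invented scores: one Drosophila gene (dA) with two human co-orthologs (hA1, hA2)
# produced by a duplication in the human lineage.
droso_vs_human = {"dA": {"hA1": 310.0, "hA2": 295.0}}
human_vs_droso = {"hA1": {"dA": 310.0}, "hA2": {"dA": 295.0}}

rbh = reciprocal_best_hits(droso_vs_human, human_vs_droso)
missed = sorted(set(human_vs_droso) - {human for _, human in rbh})
```

Here `rbh` keeps only the pair (dA, hA1), and hA2 ends up in `missed` although it is just as much a co-ortholog of dA. The same approach also reports a wrong partner when the true ortholog has been lost or is absent from an incomplete genome, which is why the slide points to phylogenetic analysis instead.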

20 An evolutionary shift (due to positive or relaxed selection) could be linked to a functional shift. See the talks by N. Galtier and A. Levasseur.

21

22 Detection of positive selection and functional shift

23

24 Detection of evolutionary constraint relaxation and functional shift

25 Co-ortholog neofunctionalization. [Diagram: gene A in URBILATERIA; after speciation, A stays under purifying selection in DROSOPHILA (ancestral function); in HUMAN, A duplicates: one copy, A, stays under purifying selection (ancestral function), the other, A2, acquires a new function.]

26

27 Constitutive proteasome β-subunit replacement after interferon-γ stimulation (paralogue = duplicated gene). Paralogue replacement, constitutive proteasome to immunoproteasome: PSMB8 (LMP 7), PSMB9 (LMP 2) and PSMB10 (LMP Z) replace PSMB5, PSMB6 and PSMB7. Ancestral function: protein degradation; present in all metazoans, therefore present in Urbilateria (metazoan ancestor). New function (specialization): degradation into proteins or peptides of a specific size, used by the MHC system; only found in vertebrates.

28 Large-scale gene duplication in the vertebrate lineage: proteasome and immunoproteasome. [Phylogeny figure labelled Proteasome and Immuno-proteasome. Taxa shown: Deuterostomia, Protostomia, Vertebrates, Amniota (human), Lissamphibia, Chondrichthyes (shark), Cephalaspidomorphi (lamprey), Cephalochordata (amphioxus), Echinodermata, Actinopterygii (zebrafish), Urochordata (Ciona), insects (Drosophila), Myxini (hagfish), Nematoda (C. elegans).]

29

30 Studying the HISTORY of genes and genomes helps to find evidence for gene FUNCTION.

31 Concepts in evolutionary biology. Use of the concepts for structural and functional annotation. Structural annotation (deciphering gene structure). Functional annotation (especially the use of phylogeny to decipher protein function).

32 Biochemical and biological process. Experimental approach: RNA interference; tandem affinity purification and mass spectrometry. In silico functional annotation.

33 Functional annotation based on phylogeny, starting from experimentally annotated genes…

34 INTERLUDE: FUNCTION? A complex concept.

35 Function prediction: using orthology information (done); using evolutionary shift information (in progress); by integrative phylogenomics (Engelhardt et al., PLoS Computational Biology 2005) (in progress).

36 Functional annotation. Homologs with experimentally known function: how the information can be found. Sources: Gene Ontology (G.O. standard), MedLine, SwissProt, GenBank; textual information analysis.

37 Functional annotation: Gene Ontology classification. Biological process: the biological process to which the gene or gene product contributes (cell growth and maintenance; pyrimidine metabolism; …). Molecular function: the biochemical activity, including specific binding to ligands or structures, of a gene product (enzyme; transporter; Toll receptor ligand; …). Cellular component: the place in the cell where a gene product is active (cytoplasm; ribosome; …). Plus other classifications to develop, in particular an evolution-based ontology.

38

39 Only a small fraction corresponds to known, well-characterized proteins. If the function is unknown: phylogenetic analysis for functional prediction, using orthology information, using evolutionary shift information, or by integrative phylogenomics.
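A minimal sketch of the first option, function prediction from orthology information: annotations of experimentally characterized orthologs are collected for an uncharacterized gene, together with the orthologs that support each one. The function name, the gene identifiers and the annotation tables are hypothetical; the annotation labels reuse examples from the Gene Ontology slide, and nothing here reflects how FIGENIX actually stores or retrieves annotations.

```python
def predict_function(gene, orthologs, known):
    """Collect annotations of characterized orthologs of `gene`.

    orthologs: {gene: [orthologous genes]}      (assumed output of a phylogenetic analysis)
    known:     {gene: [experimental annotations]}
    Returns {annotation: [orthologs supporting it]}.
    """
    support = {}
    for other in orthologs.get(gene, []):
        for term in known.get(other, []):
            support.setdefault(term, []).append(other)
    return support


# Hypothetical data: one uncharacterized human gene with two characterized orthologs.
orthologs = {"geneX_HUMAN": ["geneX_MOUSE", "geneX_DROSO"]}
known = {
    "geneX_MOUSE": ["pyrimidine metabolism"],
    "geneX_DROSO": ["pyrimidine metabolism", "cytoplasm"],
}

prediction = predict_function("geneX_HUMAN", orthologs, known)
```

Keeping the supporting orthologs next to each proposed annotation leaves the prediction open to review. As the earlier slides on functional switch and neofunctionalization stress, an ortholog or co-ortholog that has undergone an evolutionary shift may no longer carry the ancestral function.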

40 Tumor necrosis factor family phylogenetic tree: ortholog identification. Trends in Immunology (July 2003). Atherosclerotic plaque formation. ALPS - LPR/GLD lymphoproliferative syndrome.

41 Functional annotation. Human TNF family phylogenetic tree: search for the closest paralog. Trends in Immunology (July 2003). [Figure with two columns, Molecular Function and Biological Process.] Ligands: TNFSF1, TNFSF2, TNFSF3, TNFSF14, TNFSF6, TNFSF10, TNFSF11, TNFSF5, TNFSF13B, TNFSF13, TNFSF12, ?, TNFSF9, TNFSF8, TNFSF7, TNFSF18, TNFSF4, EDA-A1, EDA-A2, TNFSF15. Receptors: TNFRSF1A, TNFRSF1B, TNFRSF3, TNFRSF14, TNFRSF6B, TNFRSF11A, TNFRSF5, TNFRSF11B, TNFRSF17, TNFRSF9, TNFRSF8, TNFRSF6, TNFRSF10B, TNFRSF10A, TNFRSF10C, TNFRSF10D, TACI, TNFRSF7, TNFRSF18, TNFRSF4, TNFRSF19, EDAR, XEDAR, TNFRSF21, RELT, TNFRSF12, BR3. Annotations shown on the tree, in figure order: LN, PP, GC, Tumoricidal activity T cell Homeostasis (death) T cell Homeostasis (death), CTL function, peripheral tolerance, T cell costimulation, chemotaxis LN, bone Homeostasis, mammary gland development T cell Homeostasis (survival?), CTL activation, peripheral tolerance? T cell homeostasis (survival), peripheral tolerance T cell activation? T cell activation and survival, CTL activity, Tumoricidal activity? ? Tooth, hair, sweat gland formation Tooth, hair, skin formation? PP, GC, T cell Homeostasis (death) T cell transmigration and homeostasis (survival)? GC, B cell function, peripheral tolerance, T cell priming Tumoricidal activity, T cell function? Negative selection, autoimmunity ? ? T cell costimulation, negative selection? B cell Homeostasis B cell Homeostasis ? B cell Homeostasis

42 Only a small fraction corresponds to known, well-characterized proteins. If the function is unknown: phylogenetic analysis for gene function prediction, using orthology information, using evolutionary shift information (see the Levasseur talk), or by integrative phylogenomics.

43

44 Evolutionary biology concepts for genome annotation: further reading. Concepts: Levasseur A, Danchin E, Orlando L, Bailly X, Pontarotti P. Conceptual bases for quantifying the role of the environment on genomes evolution: the participation of positive selection and neutral evolution. Biological Reviews, in press. Danchin E.G.J, et al. The Major Histocompatibility Complex Origin. Immunological Reviews. 2004;198(1): Concepts for applied evolution: Danchin E.G.J, Levasseur A, Lopez-Rascol V, Gouret P, Pontarotti P. The use of evolutionary biology concepts for genome annotation. J. Exp. Zoology Part B: Mol. and Dev. Evol. 2006

45

46 Computerization of concepts and knowledge: phylogeny; detection of orthologous and paralogous genes; detection of evolutionary shifts (in progress); function prediction.

47 FIGENIX is a multi-user software platform dedicated to structural and functional annotation tasks: - Gene prediction for large DNA sequences - Construction of robust phylogenetic trees - Automatic detection of orthologs and paralogs - Automatic retrieval of functional data on genes from web databases - Filtering and construction of protein databases (EST contig assembly) - Chained processes (e.g. gene prediction followed by a phylogenetic study of each predicted gene)

48 STEPS OF THE phylogeny PIPELINE (1). [Flowchart.] Input: protein sequence encoded by a putative gene. BLAST (Ensembl, NR, …) + filtering. CLUSTAL W multiple alignment + purification + bias correction. Domain search with HmmPFAM (PFAM); creation of a "FIGENIX" domain (correctDomains); domain enumeration. Do repeats exist? No: the complete alignment is kept. Yes: monophyletic "repeats" are kept and the merged "repeats" are aligned. Composition test with TREE-PUZZLE to eliminate overly divergent sequences. Construction of the Tree of Life: reference tree.

49 STEPS OF THE phylogeny PIPELINE (2). [Flowchart.] Detection of "paralogy groups" + elimination of sites that evolve too fast ("Gu test"). Elimination of sequences with >30% gaps. Elimination of the most incongruent domains, detected by HomPart in PAUP. Saturation test. Trees built by NJ, parsimony and maximum likelihood. Comparison of topologies with Templeton-Hasegawa tests. Congruent topologies? (yes/no): NJ tree or consensus tree. Construction of the Tree of Life: reference tree. Ortholog detection and function search.
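One step of this pipeline is simple enough to sketch directly: dropping aligned sequences with more than 30% gaps. The example below assumes the alignment is held as a dictionary of equal-length strings with "-" as the gap character and that the gap fraction is computed over the full alignment length. Both choices, and the function name, are assumptions for illustration, since the slide does not say how FIGENIX implements the filter.

```python
def drop_gappy_sequences(alignment, max_gap_fraction=0.30):
    """Keep aligned sequences whose gap fraction does not exceed the threshold.

    alignment: {sequence name: aligned sequence}, with '-' as the gap character (assumed).
    """
    kept = {}
    for name, seq in alignment.items():
        # An empty row is treated as all gaps and removed.
        gap_fraction = seq.count("-") / len(seq) if seq else 1.0
        if gap_fraction <= max_gap_fraction:
            kept[name] = seq
    return kept


# Toy alignment of 10 columns: 10%, 40% and 30% gaps respectively.
alignment = {
    "seq_keep": "MKT-LLVAGA",
    "seq_drop": "MK----VAGA",
    "seq_edge": "MKTALLV---",
}
filtered = drop_gappy_sequences(alignment)
```

With the threshold read as "more than 30%", the sequence at exactly 30% gaps is kept; a stricter reading would only need `<` in place of `<=`.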

50

51 FIGENIX architecture. [Diagram components: RDBMS, Expert System, Genomic Data, Annotation Engine, Web Server, Persistence Layer, Repository, Load Balancing, Security, ..., Archiver, Request, Data exchange, MGI Agent, GO Agent, EST Agent, Functional Collector Agent.] - Intranet/Extranet platform - 3-tier architecture (web interface / business-logic servers / database)

52 1)

53

54 Further reading on the computerization of concepts. Gouret et al. FIGENIX: intelligent automation of genomic annotation: expertise integration in a new software platform. BMC Bioinformatics Aug 5;6:198. Balandraud et al. A rigorous method for multigenic families' functional annotation: the peptidyl arginine deiminase (PADs) proteins family example. BMC Genomics 2005, 6:153.

55 Further reading on FIGENIX utilization. Danchin et al. Eleven ancestral gene families lost in mammals and vertebrates while otherwise universally conserved in animals. BMC Evolutionary Biology 2006, 6:5. Paillisson et al. Bromodomain testis-specific protein is expressed in mouse oocyte and evolves faster than its ubiquitously expressed paralogs BRD2, -3 and -4. Genomics. Levasseur et al. Tracking the evolutionary and functional shifts connection: the lipase-esterase example. BMC Evolutionary Biology 2006.

56

57 Structural annotation

58 Structural annotation. Genome nucleotide-level annotation: mapping; finding genomic landmarks; gene finding and protein prediction; non-coding RNAs and regulatory regions; identifying repetitive elements; mapping segmental duplications; mapping variations (SNPs, microsatellites, …).

59 Structural annotation: state of the art. Available tools. Ab initio (Genscan, Fgenesh, Genie, etc.): based on statistical signals within the DNA, such as coding propensity (hexamer signals) and splice-site signals. Strengths: easy and quick to run; only need DNA as input. Weakness: high false-positive rate. Similarity-based (Genewise, Sim4, Est2genome, Figenix): alignment programs that know about gene structure; very accurate with strong sequence similarities. Strengths: accurate. Weakness: need strong similarities; slow to run.

60 Structural annotation. Annotation method: the "FIGENIX SOFTWARE PLATFORM". Structural annotation combines a statistical approach with a homology-based approach (similarities with known proteins). Automating the process resulted in an expert system based on biological inference rules that use gene history and ab initio programs, but it is not yet completely based on evolutionary biology.

61 [Diagram: a DNA segment with two regions. Region 1: best hit is protein A, with HSPs A1, A2, A3. Region 2: best hit is protein B, with HSPs B1, B2.]

62

63 Validation of structural annotation. Gene = nucleotide sequence; transcription; mRNA = nucleotide sequence; translation; protein = amino acid sequence. The platform's performance was validated on a standard dataset (HMR195); see Guigó et al., 2000; Rogic et al., 2001. Figenix: 87%; Genscan: 31%; HMMGene: 38%.

64 Structural annotation: accuracy versus exon type and prediction. [Chart: correct protein prediction and over-prediction for the programs Genscan, Figenix and HMMGene, by exon type: initial (55), internal (186), terminal (55).] The mouse and rat sequences from the HMR195 dataset were used against the human division of Swiss-Prot.

65 The next step for structural annotation is to take the evolutionary history of the gene into account.

66 Concepts, modelling, computerization, bio-analysis. Structural annotation (deciphering gene structure). Functional annotation (especially the use of phylogeny to decipher protein function).

67 Next Phylogenomics (genome Evolution) Phylopostgenomics - phylotranscriptomics - phylointeractomics ………..

68

69 Knowledge/concepts. Observation: regions of conserved synteny exist between species. Explanation/concept: these regions derive from an ancestral region that evolved independently in each lineage after speciation, but not enough to lose all trace of conservation. From this knowledge and this prediction follows a line of reasoning indicating that analysing conserved syntenies and reconstructing ancestral regions is worthwhile, both from an applied point of view (assistance with positional cloning) and from a conceptual point of view (understanding genome evolution). Formalization of the biological question: how can conserved syntenies be detected? This is also where conceptualization comes fully into play. If conserved syntenies really derive from an ancestral region, the genes in these regions must show: 1/ orthology relationships; 2/ a clustering of orthologous genes that is improbable under the hypothesis of chance (the clustering must be significant). Programs are therefore needed that can detect orthology relationships and find significant clusters. Genome reconstruction (translocation, fusion, inversion…; weighting of these events).
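The second criterion, that the clustering of orthologs must be improbable under chance, can be illustrated with a naive hypergeometric model. Assume a genome of N genes of which K have an ortholog in the reference region of the other species; if a window of n genes contains k of them, the probability of seeing at least k by chance is

P(X >= k) = sum over i from k to min(K, n) of C(K, i) C(N - K, n - i) / C(N, n).

This model, the function name and the numbers below are assumptions made for illustration only. It is not the test used in CASSIOPE, and the modelling slide of this talk states that existing clustering tests are inadequate, in particular because of in-paralog family sizes, which this baseline ignores.

```python
from math import comb


def clustering_p_value(N, K, n, k):
    """Upper-tail hypergeometric probability P(X >= k).

    N: genes in the genome, K: genes with an ortholog in the reference region,
    n: genes in the candidate window, k: orthologs observed in the window.
    Naive baseline (assumption): every gene is equally likely to fall in the window.
    """
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


# Invented numbers: 5 of 15 orthologs found inside a 30-gene window of a 20000-gene genome.
p = clustering_p_value(N=20000, K=15, n=30, k=5)
significant = p < 1e-6
```

A very small value suggests that the grouping is unlikely to be a random coincidence, which is the sense in which the slide asks for the clustering to be significant.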

70 Mathematical modelling. Modelling is needed where computational tools do not exist or where their biological formalism is incorrect, which is the case for statistical tests of clustering (the size of in-paralog families in particular). Modelling genome reconstruction. Computational formalization: 1) Algorithms: statistical tests; modelling of ancestral genome reconstruction. 2) Integration with the other software tools in the computing system (CASSIOPE).

71 Bioanalysis. Automatic search for conserved syntenies. Reconstruction and evolution of genomic regions. New knowledge and new concepts. Direct application: assistance with positional cloning. Concepts/knowledge: detection of functional clustering.

72 C.A.S.S.I.O.P.E: Clever Agent System for Synteny Inheritance and Other Phenomena in Evolution. Finds conserved regions between genomes. For more information, see Virginie Lopez Rascol.

73 C.A.S.S.I.O.P.E.

74 Toward the ancestral genome reconstruction

75

76 C.A.S.S.I.O.P.E bioanalysis. Automatic search for conserved syntenies. Reconstruction and evolution of genomic regions. New knowledge and new concepts. Direct application: assistance with positional cloning. Concepts/knowledge: detection of functional clustering.

77 Collaborators. MEG project* (Modélisation Evolution Génome: genome evolution modelling). Nathalie Balandraud, Etienne Danchin, Philippe Gouret, Vérane Vitiello. Math/bio: Julien Berestycki*, Simona Grusea*, Stéphanie Léocard*, Valda Limic*, Laure Rigal*, Etienne Pardoux*. Computing/bio: Olivier Chabrol*, Virginie Lopez*, Cedric Notredame*. Concepts and bio-analysis: Roxane Barthelemy*, Jean-Paul Casanova*, Elodie Darbo*, Anthony Levasseur*, Eric Faure*, Pierre Pontarotti*.

78 Open Discussion Phylo postgenomic

