09 March 2017

Valérie de Crécy-Lagard

Thursday the 9th of March 2017
at 11 A.M.
in Amphi SAUVY

Ecole polytechnique

Dr. Valérie de Crécy-Lagard

Professor, Department of Microbiology and Cell Science & Genetics Institute, University of Florida
Distinguished Invited Professor, Ecole Polytechnique

will give the following seminar:

Linking gene and function by integrated approaches: how to improve the poor annotation status of sequenced genomes

Identifying the function of every gene in all sequenced organisms is the major challenge of the post-genomic era and an obligate step for any systems biology approach. This objective is far from reached. By various estimates, at least 30-50% of the genes of any given organism are of unknown function, incorrectly annotated, or have only a generic annotation such as “ATPase”. Moreover, with ~8000 genomes sequenced and ~80,000 in the pipeline (http://www.genomesonline.org), the number of unknown genes is increasing, and annotation errors are proliferating rapidly. For some gene families, 40% of the annotations are wrong. On the other side of the coin, there are still ~1,900 known enzyme activities for which no corresponding gene has been identified, and this number is also increasing. This biochemical knowledge is yet to be captured in genome annotations.
Using mainly a comparative genomic approach, we have linked gene and function for around 50 gene families related mainly to the fields of coenzyme metabolism, tRNA modification, protein modification and, more recently, metabolite repair. This approach integrates several types of data and uses filters, sieves, and associations to make predictions that can then be tested experimentally. An unknown gene’s function may thus be predicted from those of its associates: the ‘guilt by association’ principle. Associations that can be derived from whole genome datasets include gene clustering, gene fusion events, phylogenetic occurrence profiles or signatures, and shared regulatory sites. Post-genomic experimental sources such as protein interaction networks, gene expression profiles and phenomics data can also be used to find associations. In practice it is often ‘guilt by multiple association’, as genes can be associated in several ways, and analyzing more than one of these improves the accuracy of predictions.
If these types of comparative genomic approaches were systematically used to annotate genomes, the quality of annotations would greatly improve. Experimentalists also need to be more involved in the annotation process, because without expert knowledge the curation effort is beyond what annotation resources such as UniProt or NCBI can achieve alone.
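
As an illustration of one association type named in the abstract, phylogenetic occurrence profiles, the short Python sketch below ranks genes of unknown function by how closely their presence/absence pattern across genomes matches that of a gene of known function. It is only a toy example under stated assumptions: the gene names, the six-genome profiles and the choice of Jaccard similarity as the score are all invented here for illustration and are not taken from the speaker's work.

```python
# Toy phylogenetic occurrence profiles (hypothetical data).
# Each tuple has one entry per genome: 1 = gene family present, 0 = absent.
PROFILES = {
    "known_pathway_gene": (1, 1, 0, 1, 0, 1),
    "unknown_gene_1":     (1, 1, 0, 1, 0, 1),
    "unknown_gene_2":     (1, 0, 1, 1, 1, 0),
    "unknown_gene_3":     (0, 0, 1, 0, 1, 0),
}


def jaccard(p, q):
    """Jaccard similarity of two presence/absence profiles."""
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(1 for a, b in zip(p, q) if a or b)
    return both / either if either else 0.0


def rank_associates(query, profiles):
    """Rank every other gene by profile similarity to the query gene."""
    scores = [
        (name, jaccard(profiles[query], prof))
        for name, prof in profiles.items()
        if name != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Candidates most likely to share a function with the known gene come first.
RANKED = rank_associates("known_pathway_gene", PROFILES)

if __name__ == "__main__":
    for name, score in RANKED:
        print(f"{name}: {score:.2f}")
```

A high score is only a hypothesis to be tested experimentally; in the spirit of 'guilt by multiple association', it would normally be combined with other evidence such as gene clustering or fusion events.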

For further information:

Yves Mechulam - Tel: 01 69 33 48 85