AutoGRAPH: a genome comparison server - Application to the identification of new genes in the dog
Thomas DERRIEN & Christophe Hitte
Laboratory: CNRS - Institute of Genetics and Development of Rennes (France)
Team: Dog Genetics
Rennes - 23 Oct 2007
Context
Dog Radiation Hybrid (RH) map
- 2003: >3200 markers (Guyon et al.)
- 2004: >4200 markers FISH/RH (Breen et al.)
- 2005: 10,000 genes (Hitte et al.)
Canis familiaris: 38 autosomes + XY
[Figure: chromosomes chr1 to chr4]
Outline: Introduction - Multiresources map - Multispecies map - Gene prediction - Comparison and perspectives
Context
Dog sequence
- 2005: Optimization of the low-coverage sequence of the dog genome (Hitte C. et al.)
- 2005: Framework for the high-coverage dog sequence assembly (Lindblad-Toh K. et al.)
Comparative genomics x2
Multi-resource and multi-species comparative genomics analyses:
=> Sequence vs. RH map vs. cytogenetic map...
=> Dog vs. mammal sequences.
Multi-resources comparative maps:
Aims:
- Compare gene order between the RH map and the sequence assembly.
- Estimate the colinearity between the 2 resources.
[Figure: CFA 9, dog RH map vs. dog sequence. RH marker localizations and sequence alignments from the dog sequence assembly (CanFam 1.0). Legend: RH markers/genes; sequence alignments; relation between sequence and RH markers.]
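One simple way to put a number on the colinearity between two resources is to order the markers by their RH position and ask how many of them keep the same relative order in the sequence. The sketch below does this with a longest increasing subsequence; it is only an illustration under stated assumptions, not the method implemented in AutoGRAPH, and the marker names and positions are hypothetical.

```python
from bisect import bisect_left


def longest_increasing_run(values):
    """Length of the longest strictly increasing subsequence (patience sorting)."""
    tails = []
    for v in values:
        i = bisect_left(tails, v)
        if i == len(tails):
            tails.append(v)
        else:
            tails[i] = v
    return len(tails)


def colinearity(markers):
    """Fraction of markers keeping the same relative order in both resources.

    markers: list of (name, rh_position, sequence_position) for one chromosome.
    Both orientations are tried, since the two maps may run in opposite directions.
    """
    if not markers:
        raise ValueError("at least one marker is required")
    by_rh = sorted(markers, key=lambda m: m[1])
    seq_positions = [m[2] for m in by_rh]
    forward = longest_increasing_run(seq_positions)
    reverse = longest_increasing_run([-p for p in seq_positions])
    return max(forward, reverse) / len(markers)


# Hypothetical markers: m3 and m4 are swapped in the sequence relative to the RH map.
markers = [
    ("m1", 10.0, 1_000_000),
    ("m2", 25.0, 2_500_000),
    ("m3", 40.0, 4_800_000),
    ("m4", 55.0, 4_100_000),
    ("m5", 70.0, 6_000_000),
]
score = colinearity(markers)
print(f"colinearity: {score:.2f}")
```

Markers that fall outside the longest ordered run are candidates for a discrepancy between the two resources and would need independent evidence (for example cytogenetic data) to decide which resource is right.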
AutoGRAPH and multi-resources datasets:
Sequence (CanFam1.0) for dog chromosome 11 (CFA 11):
- Strong colinearity between the RH map and the cytogenetic map.
- The inversion might be due to a problem in the sequence assembly.
[Figure: CFA 11, RH map vs. sequence vs. cytogenetic map]
Results:
- 8 discrepancies between the sequence assembly and the RH map.
- Cytogenetic experiments.
- 4 have been resolved in favor of the RH map.
Led to CanFam2.0 (Dec. 2005) (Lindblad-Toh K.)
Multispecies Comparative Maps
Multispecies map: why?
- Identification of conserved sequences between species => functional sequences
- Compare chromosomal organization between species => chromosome rearrangements and evolution
Multispecies map: how?
- "Comparative anchors": conserved sequences between species...
- Orthologous genes (1:1 orthology relationships)

SPECIATION (ancestor: Carnivore; descendants: Canis familiaris, Felis catus)
Ancestral genome: gene A, gene B
Genome 1: gene A.1, gene B.1
Genome 2: gene A.2, gene B.2
Orthologs: gene A.1 and A.2 = homologous genes separated by a speciation event

DUPLICATION (Canis familiaris, Felis catus)
Genome 1: gene A.1, gene B.1', gene B.1''
Genome 2: gene A.2, gene B.2
Paralogs: gene B.1' and B.1'' = homologous genes separated by a duplication event
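The two definitions can be made concrete on the gene B example from the slide: two genes are orthologs when their last common ancestor in the gene tree is a speciation node, and paralogs when it is a duplication node. The following sketch assumes a gene tree whose internal nodes are already labeled with their event; the nested-tuple encoding and the function names are illustrative choices, not part of any tool mentioned in the talk.

```python
# Gene tree as nested tuples: (event, left_subtree, right_subtree); leaves are gene names.
# Encodes the slide's example: gene B duplicated in Genome 1 after the speciation.
gene_b_tree = ("speciation", ("duplication", "B.1'", "B.1''"), "B.2")


def leaves(tree):
    """Return all gene names below a node."""
    if isinstance(tree, str):
        return [tree]
    _, left, right = tree
    return leaves(left) + leaves(right)


def classify_pairs(tree, out=None):
    """Label each pair of genes by the event at their last common ancestor."""
    if out is None:
        out = {}
    if isinstance(tree, str):
        return out
    event, left, right = tree
    label = "orthologs" if event == "speciation" else "paralogs"
    # Genes on opposite sides of this node have this node as last common ancestor.
    for a in leaves(left):
        for b in leaves(right):
            out[frozenset((a, b))] = label
    classify_pairs(left, out)
    classify_pairs(right, out)
    return out


relations = classify_pairs(gene_b_tree)
for pair, label in relations.items():
    print(sorted(pair), label)
```

In this toy tree B.2 is orthologous to both B.1' and B.1'', so the relationship is no longer 1:1, which is why duplications complicate the choice of comparative anchors.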
Multi-species comparative maps:
Data sets:
- Collect ortholog data sets from Ensembl v.42 (BioMart/MartView)
=> Ortholog features for 5 species of interest: Dog - Human - Chimp - Rat - Mouse.
- Compare genomes and construct multispecies comparative maps (synteny maps)
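As a rough illustration of how 1:1 ortholog anchors can be turned into a synteny map, the sketch below sorts anchors along the dog genome and groups consecutive anchors that map to the same human chromosome. This is a deliberately simplified stand-in, not the AutoGRAPH algorithm: the tuple layout, gene names, coordinates, and chromosome pairings are all hypothetical, and a real analysis would also check gene order and orientation within each segment.

```python
from itertools import groupby

# Hypothetical 1:1 ortholog anchors: (gene, dog_chr, dog_start, human_chr, human_start)
anchors = [
    ("g1", "CFA9", 100, "HSA17", 5000),
    ("g2", "CFA9", 200, "HSA17", 5200),
    ("g3", "CFA9", 300, "HSA9", 900),
    ("g4", "CFA9", 400, "HSA9", 950),
    ("g5", "CFA11", 50, "HSA5", 70),
]


def synteny_segments(anchors):
    """Group anchors that are consecutive on a dog chromosome
    and map to the same human chromosome."""
    ordered = sorted(anchors, key=lambda a: (a[1], a[2]))
    segments = []
    # groupby only merges adjacent items, so a return to an earlier human
    # chromosome further along the dog chromosome starts a new segment.
    for (dog_chr, human_chr), run in groupby(ordered, key=lambda a: (a[1], a[3])):
        genes = [a[0] for a in run]
        segments.append((dog_chr, human_chr, genes))
    return segments


segments = synteny_segments(anchors)
for dog_chr, human_chr, genes in segments:
    print(dog_chr, human_chr, genes)
```

Repeating the same grouping against each reference species (human, chimp, rat, mouse) gives one set of segments per species pair, which can then be compared side by side.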
Steps (1 to 4):
- Testing the dog syntenic interval by sequence alignments
- Dog gene predictions
- Comparison with Ensembl dog annotated genes (4)
Gene counts: 412 => 389 => 348 => 285
285 = 185 new dog genes + 100 new orthology relationships
Results validation:
- 80.5% have a peptide motif detected with InterProScan (InterPro database)
- 48.6% (90/185) match a canine EST (DB_EST)
- > 40% protein identity with a reference ortholog
104 with no gene prediction: reasons?
Step 1: Dog consensual intervals definition
Step 2: Overlap of reference transcripts vs. dog consensual interval
Step 3: Gene prediction in dog consensual interval

Genes | Mean size (bp) | GAP content (%) | Repeat content (%) | GC content (%) | Genes in telomeric region (%)
412 | no dog interval predicted
23 | no dog interval predicted
389 | 347,401 | 2.23 | 37.33 | 46.23 | 44.4
41 | 231,900 | 5.96 | 35.24 | 48.86 | 53.6
348 | 361,008 | 1.79 | 37.57 | 45.92 | 43.4
63 | 287,821 | 3.35 | 35.36 | 48.09 | 46.0
285 | 377,187 | 1.44 | 38.06 | 45.43 | 42.8
Random set (1000) | 375,929 | 1.3233 | 35.8865 | 45.63 | 31.0

104 without dog prediction
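The counts quoted on these slides fit together, and the arithmetic can be checked in a few lines. All numbers below are taken from the slides; reading the 104 genes as the sum of those dropped at steps 2 and 3 is an interpretation of the table rather than something the slides state explicitly.

```python
# Genes remaining at the start and after steps 1, 2 and 3 (from the table)
remaining = [412, 389, 348, 285]

# Genes dropped at each step: differences between consecutive counts
dropped = [before - after for before, after in zip(remaining, remaining[1:])]

# Assumption: the "104 without dog prediction" are the genes that had a dog
# interval (passed step 1) but were lost at step 2 or step 3.
no_prediction = dropped[1] + dropped[2]

# Final split of the 285 genes, and EST support among the new dog genes
new_genes, new_orthology = 185, 100
est_supported = 90
est_percent = round(100 * est_supported / new_genes, 1)

print("dropped per step:", dropped)
print("without prediction:", no_prediction)
print("EST-supported new genes (%):", est_percent)
```

The per-step losses reproduce the intermediate rows of the table (23, 41 and 63 genes), which is what makes it possible to compare the properties of the lost genes with those of the genes that were retained.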
104 with no gene prediction?
Structural problems (sequence quality):
- Higher GAP content (> 10% for 12 genes)
- Protein identity < 20% [3/4]:
  - Smaller sizes of the dog intervals
  - Higher GC content + telomeric localization
  - No EST validation
  - Biological function prone to "gain and loss" (immunity, olfaction = adaptation to environment; GOTM analysis)
92 with no gene prediction? The example of the PNMA family: RAP (Dufayard et al.)
92 genes
Evolutionary scenario: loss of dog genes
[Figure: gene tree and reconciled tree]
Conclusions - Directions:
- Analysis of the evolutionary rate of these dog sequences compared to the reference sequences
- Other orphan-gene sets & other species sets (using Cat, Elephant...)
- Using gene adjacency + in-depth gene prediction to refine gene family orthology: 1:0 orthology + n:m orthology
COILs approach:
- Multispecies
- Multiple sets of 1:1:
  - complementary contributions of different genomes
- Short interval = small search space (350 kb):
  - reduces the cost of detecting false positives
  - makes divergent sequences easier to match
  - background noise is significantly reduced
Acknowledgements:
Francis Galibert, Catherine André, Christophe Hitte
Rennes Dog Genetics-Genomics Team
(Sophie Roucan, Hugues Leroy, Anthony Assi, Olivier Filangi...)