The Transforming Genetic Medicine Initiative (TGMI)
£5.3M, 4-year programme funded by Wellcome
Designing, Developing and Delivering Integrated Foundations for Genetic Medicine
TGMI PIs: Nazneen Rahman, Paul Flicek, Caroline Wright, Sian Ellard, David Fitzpatrick, Ewan Birney, Fiona Cunningham, Graeme Black, Helen Firth, Gerton Lunter, Matthew Hurles, Patrick Chinnery
National and international collaborations
Transforming genetic medicine
We must ensure that the wealth of existing medical genetic knowledge informs our use of current and future technology if we are to do more right and less wrong.
‘The past is never dead. It’s not even past.’ William Faulkner
Genomic medicine
GENOME – PHENOME
TGMI is focussed on genes
GENE – ‘MENDELIAN’ DISORDERS
Genetic medicine 1990–2010
GENE – ‘MENDELIAN’ DISORDERS
Prior to NGS, genetic medicine was phenotype-driven: meticulous phenotyping was used to decide which genes to test.
Genetic medicine 2020
GENE – ‘MENDELIAN’ DISORDERS
With NGS, genetic medicine becomes genotype-driven and can potentially be large-scale and routine.
Genetic medicine 2010–2016
GENE – ‘MENDELIAN’ DISORDERS
With NGS, genetic medicine can be genotype-driven. But because the processes are not yet well formed, phenotyping is often used (often incorrectly) to decide which data are ‘relevant’.
TGMI aims to undertake conceptual, foundational research to deliver practical solutions that make genetic medicine work.
TGMI Aims
1. To provide robust, comprehensive information on links between genes and human disease in a user-friendly interface.
2. To develop standardised frameworks for consistent clinical annotation and reporting of gene variation.
3. To develop approaches to deliver fast, automated, high-throughput, large-scale variant interpretation.
4. To develop and validate flexible, multipurpose analytical processes to maximise the clinical and research utility of genetic testing.
GENES: Gene 1 to Gene 20,000
For each gene, ask: are germline mutations known to ‘cause’ a human disorder?
• YES – red (should not become blue)
• NO – blue (some will become red)
• All others – grey (further work needed to classify as red or blue)
Gene Disease Map
DISEASES: many complexities at the phenotype level
‘Mendelian’ diseases
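The red/blue/grey scheme above is effectively a three-state classification with one-way transitions. The sketch below models it in Python purely for illustration; the names `GeneStatus`, `initial_status` and `reclassify` are hypothetical, and two choices are assumptions rather than anything stated on the slide: "should not become blue" is enforced as a hard rule, and a red gene is not allowed to drop back to grey.

```python
from enum import Enum
from typing import Optional


class GeneStatus(Enum):
    RED = "germline mutations known to cause a human disorder"
    BLUE = "germline mutations not known to cause a human disorder"
    GREY = "unclassified; further work needed"


# Permitted changes of status (assumption: "should not" is enforced as a hard rule,
# and a red gene is not allowed to return to grey).
ALLOWED_CHANGES = {
    GeneStatus.RED: frozenset(),
    GeneStatus.BLUE: frozenset({GeneStatus.RED}),
    GeneStatus.GREY: frozenset({GeneStatus.RED, GeneStatus.BLUE}),
}


def initial_status(known_to_cause: Optional[bool]) -> GeneStatus:
    """Map the answer to the per-gene question onto a colour; None means 'not yet known'."""
    if known_to_cause is None:
        return GeneStatus.GREY
    return GeneStatus.RED if known_to_cause else GeneStatus.BLUE


def reclassify(current: GeneStatus, proposed: GeneStatus) -> GeneStatus:
    """Return the new status, refusing changes the scheme does not permit."""
    if proposed is current:
        return current
    if proposed not in ALLOWED_CHANGES[current]:
        raise ValueError(f"{current.name} -> {proposed.name} is not a permitted change")
    return proposed
```

Encoding the transitions as data (`ALLOWED_CHANGES`) rather than scattered `if` statements keeps the policy in one place, so relaxing it later (for example, permitting red to grey after review) is a one-line change.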
Why this is needed
Q: How many disease genes are there?
A: It depends on whom you ask and how you ask.
• OMIM: ‘genes with phenotype-causing mutation’ = 3416; ‘phenotype description, molecular basis known’ = 4482
• BioMart (Ensembl Genes with Swiss-Prot IDs and an OMIM phenotype) = 3268
• GeneCards: ‘disease genes’ = 9578
TGMI – Aim 2
2.1 – Defining a Clinical Annotation Reference System (CARS)
2.2 – Defining a Clinical Sequencing Notation (CSN)
2.3 – Development and distribution of conversion tools
Why this is needed
• In clinical and research settings there is huge variability in the annotation of genetic variation at every level (gene name, transcript choice, variant annotation, etc.).
• This inevitably compromises data integration and clinical utility, and fosters errors and harm.
The CARS
• The Clinical Annotation Reference System (CARS) encompasses the set of protein-coding genes, the set of reference transcripts and proteins corresponding to those genes, and a Clinical Sequencing Notation (CSN) for annotating variation relative to those sequences.
• It is defined against the reference human genome.
TGMI gene set working criteria
• Has an HGNC ID
• Has an annotated start (which can be non-methionine)
• Has an annotated stop
• Occurs on chromosomes 1–22, X, Y or MT
• Has a gene and transcript biotype of “protein-coding” in Ensembl (release 84)
The TGMI gene working set comprises 18,885 genes.
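The working criteria translate directly into a record filter. This is a minimal sketch, assuming each candidate gene has been flattened into one hypothetical `GeneRecord` (the field names are illustrative, not an Ensembl export format). In Ensembl data the biotype value is spelled `protein_coding`, which is why the code compares against that string rather than "protein-coding".

```python
from dataclasses import dataclass
from typing import Optional

# Chromosomes allowed by the working criteria: 1-22, X, Y and MT.
ALLOWED_CHROMOSOMES = {str(n) for n in range(1, 23)} | {"X", "Y", "MT"}


@dataclass(frozen=True)
class GeneRecord:
    # Hypothetical flattened annotation for one gene (illustrative field names).
    hgnc_id: Optional[str]
    chromosome: str
    gene_biotype: str
    transcript_biotype: str
    has_annotated_start: bool  # the start codon may be non-methionine
    has_annotated_stop: bool


def meets_working_criteria(rec: GeneRecord) -> bool:
    """True only if the record satisfies every working criterion listed above."""
    return (
        bool(rec.hgnc_id)
        and rec.has_annotated_start
        and rec.has_annotated_stop
        and rec.chromosome in ALLOWED_CHROMOSOMES
        and rec.gene_biotype == "protein_coding"
        and rec.transcript_biotype == "protein_coding"
    )
```

A working set would then be `[r for r in records if meets_working_criteria(r)]`; records on patch or alternate-haplotype sequences fall out automatically because their sequence names are not in `ALLOWED_CHROMOSOMES`.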
Clinical reference transcripts
1. Sequences must be based on the reference human genome.
2. The system must allow flexible iteration without compromising stability or clarity of sequence selection.
3. Reference transcripts must have durability, i.e. historical sequences used for clinical reporting that are subsequently superseded must remain available.
4. The reference transcript set should include as few sequences as possible (one per gene for most genes) but as many as required.
5. The reference transcript set must be easily available and usable to encourage universal uptake.
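Principles 2 and 3 (flexible iteration, and durability of superseded sequences) suggest an append-only history per gene. The sketch below is one hypothetical way to model that; `ReferenceTranscriptRegistry` and the transcript IDs in the usage note are invented for illustration, and it simplifies principle 4 by keeping exactly one current reference transcript per gene.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ReferenceTranscriptRegistry:
    """Hypothetical registry: one current reference transcript per gene, full history retained."""

    _history: Dict[str, List[str]] = field(default_factory=dict)

    def designate(self, gene: str, transcript_id: str) -> None:
        """Make transcript_id the current reference for gene; earlier choices stay on record."""
        history = self._history.setdefault(gene, [])
        if transcript_id in history:
            raise ValueError(f"{transcript_id} has already been designated for {gene}")
        history.append(transcript_id)

    def current(self, gene: str) -> str:
        """The most recently designated reference transcript for gene."""
        return self._history[gene][-1]

    def history(self, gene: str) -> Tuple[str, ...]:
        """All transcripts ever used as the reference for gene, oldest first."""
        return tuple(self._history.get(gene, []))

    def was_reference(self, gene: str, transcript_id: str) -> bool:
        """True if transcript_id was ever the reference for gene, so old reports stay traceable."""
        return transcript_id in self._history.get(gene, [])
```

For example, after `designate("GENEX", "TXA.1")` and then `designate("GENEX", "TXA.2")`, new reports would use `TXA.2`, while a historical report citing `TXA.1` can still be matched to the gene. Because nothing is ever deleted, iteration never invalidates earlier reports.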
CSN – Clinical Sequencing Notation
• Once a transcript is selected, the observed variant must be named according to how it differs from the reference.
• A fixed, standardised, automatic process for annotating sequence variation
• Consistent with historical HGVS guidelines
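To make "fixed, standardised, automatic" concrete, here is a small sketch of naming a variant relative to a reference coding sequence in HGVS-style `c.` notation. It is not the CSN specification: it handles only single-base substitutions and deletions, assumes the input string starts at c.1, and applies the HGVS 3′ rule (a deletion inside a repeat is shifted to its most 3′ position) so that equivalent deletions always receive the same name. The function name `name_variant` is hypothetical.

```python
def name_variant(cds: str, pos: int, ref: str, alt: str) -> str:
    """Name a substitution or deletion in HGVS-style c. notation (illustrative sketch).

    cds: reference coding sequence, where cds[0] is c.1 (an assumption of this sketch)
    pos: 1-based c. position of the first base of `ref`
    """
    if pos < 1 or cds[pos - 1 : pos - 1 + len(ref)] != ref:
        raise ValueError("ref does not match the reference sequence at pos")
    if len(ref) == 1 and len(alt) == 1 and ref != alt:
        return f"c.{pos}{ref}>{alt}"
    if ref and alt == "":
        n = len(ref)
        start = pos - 1  # 0-based start of the deleted block
        # 3' rule: shifting the block one base right yields the same resulting
        # sequence whenever its first base equals the base just after it.
        while start + n < len(cds) and cds[start] == cds[start + n]:
            start += 1
        first, last = start + 1, start + n
        return f"c.{first}del" if n == 1 else f"c.{first}_{last}del"
    raise NotImplementedError("only single-base substitutions and deletions are sketched")
```

With `cds = "ATGAAAGC"`, deleting the A reported at c.4 is normalised to `c.6del`, because removing any one of the three adjacent As produces the same sequence; a deterministic rule like this is what allows two laboratories to arrive at the same name independently.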
TGMI – Aim 3
Traditional interpretation process
1. Leveraging generic predictors (e.g. evolutionary conservation, protein structural features, impact on splicing) to predict the functional consequences of individual variants (done in the lab).
2. Leveraging expert assessment of clinical impact, drawing on disease- and gene-specific knowledge of the phenotype, genetic architecture, genotype–phenotype correlations, personal and family history, variant segregation, etc. (done in the clinic).
Interpretation requirements
1. High throughput and large volume
2. Fast turnaround
3. Integrated into NGS pipelines
4. Integrated into clinical pipelines
5. Intelligible and usable by non-experts and patients
Variant – Phenotype
• Frequency of phenotype
• Mechanism of pathogenicity
• Inheritance pattern
• Attribution of gene for phenotype
• Penetrance of gene for phenotype
• Population variation
• Variability of gene
• Gene structure/function
Much of this information can be captured and applied automatically, so that manual curation can be focussed on the ~2–5% of variants that require it.
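The idea of automating what gene-level knowledge can decide, and routing only the remainder to a curator, can be sketched as a triage function. Everything here is a hypothetical illustration rather than TGMI's method: `GeneProfile` stores two of the factors listed above (a population-frequency ceiling standing in for "population variation", and whether loss of function is a known "mechanism of pathogenicity"), and the two rules are deliberately simplistic placeholders.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class Route(Enum):
    AUTO_EXCLUDE = "excluded automatically"
    AUTO_FLAG = "flagged automatically"
    MANUAL = "manual curation"


@dataclass(frozen=True)
class GeneProfile:
    # Hypothetical gene-level knowledge, curated once per gene and reused for every variant.
    max_credible_af: float  # population allele frequency above which a variant is implausible as causal
    lof_mechanism: bool  # loss of function is a known mechanism of pathogenicity


@dataclass(frozen=True)
class Variant:
    gene: str
    population_af: float
    protein_truncating: bool


def triage(variant: Variant, profiles: Dict[str, GeneProfile]) -> Route:
    """Apply gene-level rules first; anything they cannot decide goes to a curator."""
    profile = profiles.get(variant.gene)
    if profile is None:
        return Route.MANUAL  # no gene-level knowledge, so a person must look
    if variant.population_af > profile.max_credible_af:
        return Route.AUTO_EXCLUDE
    if variant.protein_truncating and profile.lof_mechanism:
        return Route.AUTO_FLAG
    return Route.MANUAL
```

The design point is that the expensive expert work moves from per-variant to per-gene: once a `GeneProfile` exists, every variant in that gene benefits, and `Route.MANUAL` is the residue that still needs a human.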
TGMI – Aim 4
OpEx (Optimised Exome) pipeline
https://github.com/RahmanTeam/OpEx
www.icr.ac.uk/opex
OpEx pipeline
• Simple
• To clinical standards
• High-quality indel calling
• Comparisons with ExAC
• Comparisons with clinical exome pipelines
All input is welcome!
• The TGMI is keen to hear from and engage with anyone interested in our aims. We are grateful for any input into what is needed in genetic medicine, how those needs are best met, and whether our solutions work.
• How to stay in touch:
– http://theTGMI.org
– Weekly blog
– Twitter: @theTGMI