Medical genetics: Identification of hidden structural variants with long-read sequencing
Alexander Hoischen
Assistant Professor Immuno-Genomics
Scientific Director, Radboud Genomics Technology Center
Departments of Human Genetics and Internal Medicine
Radboud University Medical Center, Nijmegen, The Netherlands
Contact: [email protected] | www.radboudumc.nl/en/immunogenomics | @ahoischen
Engineer/PhD-student/Postdoc jobs in bioinformatics available!
This project is a collaboration between RUMC and PacBio Inc. in which reagent costs were shared.
Finding the answer in the genome
6 billion nucleotides
46 chromosomes
2 people differ at >4 million positions
1 variant (mutation) can result in disease*
Genome sequencing: All variants in one experiment!
* With all variant types known to cause disease: karyotype aberrations, SVs, indels, SNVs
De novo mutations in ID
• Intellectual disability (ID = IQ <70) is a model for severe, sporadic disorders
• Similar to autism spectrum disorder (ASD), epilepsy and other (neuro-)developmental disorders (NDDs)
• >60% of severe ID is caused by de novo mutations
• De novo mutation rate for SNVs is ca. 1.8×10⁻⁸, i.e. 30-100 de novo SNVs per genome per generation (i.e. 1-2 per exome)
• De novo mutation rate for large CNVs ca. 0.2 per generation
• De novo mutation rate for SVs/large indels – largely unknown!
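The SNV counts above follow from multiplying a mutation rate by the number of sites considered. A rough check, assuming the quoted rate is per nucleotide per generation and using the 6 billion nucleotides from the earlier slide. The diploid exome size of about 60 Mb is an assumption made for illustration and is not a number from the talk:

```python
# Back-of-the-envelope check of expected de novo SNVs per generation.
SNV_RATE = 1.8e-8         # per nucleotide per generation (from the slide)
DIPLOID_GENOME_NT = 6e9   # 6 billion nucleotides (from the slide)
DIPLOID_EXOME_NT = 60e6   # ASSUMPTION: ~30 Mb of coding sequence x 2 copies


def expected_de_novo(rate: float, n_nucleotides: float) -> float:
    """Expected number of new mutations = rate x number of sites."""
    return rate * n_nucleotides


genome_dnms = expected_de_novo(SNV_RATE, DIPLOID_GENOME_NT)
exome_dnms = expected_de_novo(SNV_RATE, DIPLOID_EXOME_NT)
print(f"per genome: ~{genome_dnms:.0f}")
print(f"per exome:  ~{exome_dnms:.2f}")
```

With these inputs the genome-wide estimate lands near the upper end of the 30-100 range and the exome estimate near 1. A lower rate gives proportionally smaller counts.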
Nat Rev Genet. 2012 Jul 18;13(8):565-75
De novo mutations reduce genome complexity/wealth of variation greatly
Severe, sporadic diseases offer opportunity to identify novel paradigms
De Vries et al. AJHG 2006 & Vulto-van Silfhout et al. Hum Mut 2013
New Genomic Technologies elucidate Intellectual Disability
[Figure: % of ID patients with a diagnosis, by technology; ±1,500 ID patients; timeline labels 2008, 2012, 2014]
• Single gene test: ~1-5%
• Genomic microarray: 11.6% (1)
• Exome sequencing: 27% (2)
• Whole genome sequencing (2014): 42% (3)
• Overall: 62% with a diagnosis, 38% no diagnosis
• Diagnosis categories: de novo SNVs, de novo SVs, inherited
Majority is de novo!
(1) Vulto-van Silfhout et al. Hum Mutat. 2013; (2) De Ligt et al. NEJM 2012; (3) Gilissen et al. Nature 2014
...hidden SVs – i.e. long reads?
Hidden de novo SVs in unsolved ID trios?
Patient cohort:
• A clinically well-characterized patient population with intellectual disability. These samples have been previously analyzed extensively:
• CNV-microarrays (Vulto-van Silfhout et al. Hum Mut 2013)
• Whole exome sequencing (de Ligt et al. NEJM 2012)
• Short-read whole genome sequencing (Gilissen et al. Nature 2014)
• NovaSeq 30-40x whole genome sequencing
• All previous analyses failed to detect a causal variant
This study:
• Here we perform long-read SMRT sequencing on the Sequel platform in 5 such patient-parent trios
Hypothesis:
• Hidden, previously undetected, de novo SVs may explain disease
DNA
• 5 trios
• Fresh gDNA from whole blood
• Final libraries: fragment sizes of 40-70kb
[Figure panels: Genomic DNAs; Sheared DNAs]
Michael Kwint
Our first data
First 5 trios:
• 1 trio sequenced at ~40x in Menlo Park (PacBio)
• 4 trios sequenced at ~15x in Nijmegen
Output:
• On average 4.7Gb/SMRT cell*
• 11.6kb average read length*
• All trios were also sequenced by short-read WGS (Complete Genomics, 80x & Illumina NovaSeq, 40x)
• BioNano mapping data has been generated for one trio
*Sequel 2.0 chemistry with mixture of 4.0 and 5.0 software.
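The coverage and yield figures above imply a rough number of SMRT cells per genome: target coverage times genome size, divided by yield per cell. A minimal sketch, assuming a haploid human genome size of about 3.1 Gb (not stated in the talk) and ignoring any loss from read filtering:

```python
import math

HAPLOID_GENOME_GB = 3.1  # ASSUMPTION: approximate human genome size in Gb


def smrt_cells_needed(target_coverage: float, yield_gb_per_cell: float,
                      genome_gb: float = HAPLOID_GENOME_GB) -> int:
    """Smallest number of cells whose total yield reaches the target coverage."""
    return math.ceil(target_coverage * genome_gb / yield_gb_per_cell)


# Average yield of 4.7 Gb per SMRT cell, as reported on the slide.
for coverage in (15, 40):
    print(f"{coverage}x: {smrt_cells_needed(coverage, 4.7)} cells per individual")
```

These counts are per individual, so a trio needs roughly three times as many cells.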
New developments, latest WGS samples
Express library:
• Lower input, higher yield, less hands-on; amenable to automation
• 25 SMRT cells:
• Average read length 19kb
• Average output per SMRT cell: 6.8Gb (Max. 9.2Gb)
Michael Kwint
Gigabase output – 234 SMRT cells
Sequel 2.0 chemistry with mixture of 4.0 and 5.0 software.
• Comparison with 40X WGS data (NovaSeq, called with Manta**): Using SURVIVOR*
SV distribution over samples
[Figure: bar chart, SVs > 20bp. x-axis: individuals with respective SV (1-15); y-axis: count of SVs (0-50,000); annotation: rare/private]
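A distribution like the one in this figure can be tallied once SV calls have been merged across samples. The sketch below assumes a merged call set is already available as a mapping from SV identifier to the set of carriers. The identifiers and sample names are hypothetical toy data, not results from the study:

```python
from collections import Counter


def sharing_histogram(carriers_by_sv: dict[str, set[str]]) -> Counter:
    """Count SVs by the number of individuals that carry them."""
    return Counter(len(samples) for samples in carriers_by_sv.values() if samples)


# Toy data: hypothetical SV identifiers and sample names.
toy_calls = {
    "sv1": {"trio1_child"},
    "sv2": {"trio1_child", "trio1_mother"},
    "sv3": {"trio1_child", "trio1_father", "trio2_mother"},
    "sv4": {"trio2_father"},
}

hist = sharing_histogram(toy_calls)
for n_carriers in sorted(hist):
    print(f"{n_carriers} individual(s): {hist[n_carriers]} SV(s)")
```

SVs in the bin for a single individual correspond to the rare/private end of the plot.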
How many are inherited?
Up to 74% in patient & parent (18% random)
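The inherited fraction can be estimated by matching each SV in the child against the parental call sets. This is a simplified stand-in for what a merging tool does, not the procedure used in the study. The matching rule (same chromosome and type, both breakpoints within 500 bp) and all coordinates are assumptions chosen for illustration:

```python
from typing import NamedTuple


class SV(NamedTuple):
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "DEL", "INS", "INV"


def same_sv(a: SV, b: SV, max_dist: int = 500) -> bool:
    """Same event if chromosome and type agree and both breakpoints
    lie within max_dist bp of each other (ASSUMED tolerance)."""
    return (a.chrom == b.chrom and a.svtype == b.svtype
            and abs(a.start - b.start) <= max_dist
            and abs(a.end - b.end) <= max_dist)


def split_by_inheritance(child, father, mother, max_dist=500):
    """Return (inherited, candidate_de_novo) lists of the child's SVs."""
    parental = list(father) + list(mother)
    inherited, candidates = [], []
    for sv in child:
        if any(same_sv(sv, p, max_dist) for p in parental):
            inherited.append(sv)
        else:
            candidates.append(sv)
    return inherited, candidates


# Hypothetical toy calls for one trio.
child = [SV("chr1", 10_000, 10_800, "DEL"),
         SV("chr2", 5_000, 5_300, "INS"),
         SV("chr7", 90_000, 95_000, "INV")]
father = [SV("chr1", 10_050, 10_790, "DEL")]
mother = [SV("chr7", 90_020, 95_010, "INV")]

inherited, candidates = split_by_inheritance(child, father, mother)
print(f"{len(inherited) / len(child):.0%} of child SVs found in a parent")
print("candidate de novo:", candidates)
```

Absence from the parental call sets is only a starting point: an SV missed in a parent would also end up among the candidates, so these need validation.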
Inversions in 15 genomes
[Figure: bar chart of the number of inversions per genome (y-axis 0-250) for child, father and mother in Trio1 to Trio5]
Called with PBSV (developers version)
Inversions – size distribution
[Figure: two histograms of inversion counts by size]
• Inversions <1Kb, bin=100 bp (x-axis bins: 0-100, 100-200, 200-300, 300-400, 400-500, >500; y-axis 0-300)
• Inversions 1-10 Kb, bin=1 Kb (y-axis 0-14)
Next steps
• Are any candidate de novo SVs truly de novo?
• If they are, could they explain disease?• Genotype/phenotype recurrence in other cases?
• Do rare hidden SVs unmask recessive disease?
Extra ‘goodies’ of long-reads?
• Can we detect de novo SNVs?
• Phasing de novo mutations
• Phasing of candidate comp. het. variants
We can detect de novo SNVs
[Figure tracks: PacBio 40x child; PacBio 40x father; PacBio 40x mother]
de novo c.1685A>C; p.(His562Pro); TBKBP1
We can phase de novo SNVs
de novo c.1685A>C; p.(His562Pro); TBKBP1
[Figure tracks: PacBio 40x child; PacBio 40x father; PacBio 40x mother]
On same allele: maternal heterozygous SNP
Phasing de novo mutations important to understand DNM biology
e.g.: Goldmann et al. Nat Genet 2016 & 2018
Compound heterozygous variants?
[Figure: PacBio 10x, child only. Allele 1: heterozygous variant; Allele 2: heterozygous variant]
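Both phasing questions, whether a de novo SNV sits on the same allele as a parental SNP and whether two heterozygous variants sit on opposite alleles, reduce to counting long reads that span both sites. A minimal sketch under a toy read model (each read is a dict from position to observed base). The positions, bases and function name are hypothetical, and no variant-calling library is implied:

```python
from collections import Counter


def phase_pair(reads, site_a, alt_a, site_b, alt_b):
    """Classify two heterozygous variants as 'cis' (same allele) or
    'trans' (opposite alleles) using reads that span both sites."""
    votes = Counter()
    for read in reads:
        if site_a not in read or site_b not in read:
            continue  # read does not span both sites: uninformative
        has_a = read[site_a] == alt_a
        has_b = read[site_b] == alt_b
        # Both alt or both ref supports cis; exactly one alt supports trans.
        votes["cis" if has_a == has_b else "trans"] += 1
    if not votes or votes["cis"] == votes["trans"]:
        return "unresolved", votes
    return votes.most_common(1)[0][0], votes


# Toy reads: hypothetical positions 100 and 4100, alt alleles C and T.
cis_reads = [{100: "C", 4100: "T"}] * 3 + [{100: "A", 4100: "G"}] * 2 + [{100: "C"}]
trans_reads = [{100: "C", 4100: "G"}] * 2 + [{100: "A", 4100: "T"}] * 2

print(phase_pair(cis_reads, 100, "C", 4100, "T")[0])
print(phase_pair(trans_reads, 100, "C", 4100, "T")[0])
```

For candidate compound heterozygous variants, a "trans" result is the configuration of interest, since it means each allele carries one variant.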
Work in progress..
• Calling SVs with other tools, e.g. Sniffles
• Calling single nucleotide variants
• Assembly using the GRCh38 reference and full de novo assembly
• Comparison with other technologies (10x Genomics, Bionano, etc.)
Summary
• Per genome: High quality coverage for 28Mb of previously uncovered sequence
• SMRT sequencing allows detection of ~25,000 SVs per genome
• Also: >33,000 indels (20-50bp) are called per genome
• Majority of those SVs/indels were not detected by short-read WGS
• Long reads help to comprehend de novo mutation rates of indels/SVs, a start toward understanding their clinical relevance
PacBio in diagnostics
• Kornelia Neveling/Marcel Nelen:
Use long-range PCR for complex human regions:
• HLA (5 amplicons), collab. with medical immunology
• Pseudogenes
• mtDNA
• Repeat expansions
Launch of diagnostic PacBio assay for HLA: June 15th! ...others will follow in 2018
Full list of diagnostic tests: www.genomediagnosticsnijmegen.nl