Next Generation Sequencing Activities at NIST
NGS Workshop, Mid-Atlantic Association of Forensic Science
May 19, 2014
Katherine Butler Gettings, Ph.D.
Research Biologist, Applied Genetics Group
National Institute of Standards and Technology
I will mention commercial STR kit names and information, but I am in no way attempting to endorse any specific products.
NIST Disclaimer: Certain commercial equipment, instruments and materials are identified in order to specify experimental procedures as completely as possible. In no case does such identification imply a recommendation, nor does it imply that any of the materials, instruments or equipment identified are necessarily the best available for the purpose.
Points of view are mine and do not necessarily represent the official position of the National
Institute of Standards and Technology or the U.S. Department of Justice. Our group receives or
has received funding from the FBI Laboratory and the National Institute of Justice.
Disclaimer
Outline
Background
NGS of Forensic DNA markers
– STRs
– mtDNA
– Single Nucleotide Polymorphisms (SNPs)
NGS on the PGM: Ampliseq workflow
Experimental data
– HID-Ion Ampliseq Identity Panel
– HID-Ion Ampliseq Ancestry Panel
What’s in a name???
Next-generation sequencing
Massively parallel sequencing
Second-generation sequencing
Third-generation sequencing
NGS
High-throughput sequencing
Next-generation genomics
Whole-genome sequencing
Parallel Sequencing
‘A million capillary Sanger sequencer’
• Clonal vs population amplification
• Shorter reads (Range 75 to 400)
• Errors are more ‘detectable’
• High coverage: 100x, 1,000x, or 10,000x
• Rely more on informatics to assemble millions of short reads
http://electronicsbyexamples.blogspot.com/2013/03/milestones-in-digital-electronics.html
http://blog.trentonsystems.com/moores-law-pushing-processor-technology-to-14-nanometers/
Forensic NGS Applications
• Short Tandem Repeats (STRs)
– PCR fragment-length polymorphisms
• Mitochondrial DNA (mtDNA)
– Sanger sequencing
• Single Nucleotide Polymorphisms (SNPs)
Capillary electrophoresis electropherogram
NGS of Forensic STR Loci
D8S1179
D21S11
D7S820
CSF1PO
D3S1358
TH01
D13S317
D16S539
D2S1338
D19S433
vWA
TPOX
D18S51
A
D5S818
FGA
Sizes of largest observed alleles (not including primer binding/flanking region)
NGS of Forensic STR Loci
D21S11: Individual appears homozygous by CE but different sequencing composition shown with NGS.
NGS of Forensic STR Loci
STR sequence data from 2391c, Component A:
• Truseq Library Prep
• MiSeq sequencing
• STRait Razor data parsing
• R script (NIST) data viewer
Forensic DNA Markers
• Short Tandem Repeats (STRs)
– PCR fragment-length polymorphisms
• Mitochondrial DNA (mtDNA)
– Sanger sequencing
• Single Nucleotide Polymorphisms (SNPs)
http://www.orchidcellmark.ca http://remf.dartmouth.edu/images/mammalianLungTEM/source/8.html
Mitochondria
www.wikipedia.org
Maternally inherited
Circular genome
mtDNA Information
• Increase in variants by whole genome analysis (based on analysis of 3 SRM samples)
• Over ten times more variable per site
• Three times more polymorphisms than HV alone
http://www.sas.upenn.edu/~tgschurr/labwork/labwork_text.html
Current Method
• Sequence based on chromatogram
• Consensus of one forward and one reverse
NGS
• Sequence based on thousands of individual reads
• Improved sensitivity:
– Mixture detection
– Low level heteroplasmy
mtDNA Information
Current Method
• Minor peaks may not be reproducible
• SRM 2392 9947a, 1393 G/A heteroplasmy
NGS
• More consistent detection of minor genotypes
• Validation important
– Variant calling thresholds
– Characterizing noise
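The idea of calling low-level heteroplasmy from thousands of individual reads can be sketched in Python. This is only an illustration under stated assumptions, not the analysis used for the SRM data: the function names, the 2 % minor-allele threshold, the 1000-read minimum coverage, and the read counts are all hypothetical.

```python
from collections import Counter


def minor_allele_frequency(base_counts):
    """Return (major_base, minor_base, minor_fraction) for one position.

    base_counts maps each observed base to its read count.
    """
    total = sum(base_counts.values())
    if total == 0:
        raise ValueError("no reads at this position")
    ranked = Counter(base_counts).most_common()
    major_base = ranked[0][0]
    # The second most frequent base is treated as the candidate minor allele.
    minor_base, minor_reads = ranked[1] if len(ranked) > 1 else (None, 0)
    return major_base, minor_base, minor_reads / total


def call_heteroplasmy(base_counts, min_fraction=0.02, min_coverage=1000):
    """Flag a position only if coverage and minor fraction both pass.

    Both thresholds are hypothetical placeholders; real values would
    come from a validation study that characterizes noise.
    """
    total = sum(base_counts.values())
    _, _, fraction = minor_allele_frequency(base_counts)
    return total >= min_coverage and fraction >= min_fraction


# Hypothetical read counts at a single mtDNA position.
example = {"G": 9000, "A": 1000, "C": 3, "T": 2}
print(minor_allele_frequency(example))
print(call_heteroplasmy(example))
```

Keeping the coverage check separate from the frequency check reflects the two validation questions on the slide: how many reads are needed, and how much signal is distinguishable from noise.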
mtDNA Information
Characterization of SRM 2392 and 2392-I Mitochondrial genome sequencing standard
Detection of low level heteroplasmy
[Figure: minor allele frequency (0.0% to 20.0%) and coverage (0 to 50,000) at four sites: HL60 - 2445, HL60 - 5149, 9947A - 1393, 9947A - 7861; base labels C, T, T, A. Coverage data labels, in extraction order: 2878, 1981, 1571, 571, 18821, 10884, 22719, 17499, 41064, 40356, 48071, 42101, 11737, 16411, 17697, 11796]
SNaPshot
http://portal.ccg.uni-koeln.de/
Allelic Discrimination (qPCR)
Forensic DNA Markers
• Short Tandem Repeats (STRs)
– PCR fragment-length polymorphisms
• Mitochondrial DNA (mtDNA)
– Sanger sequencing
• Single Nucleotide Polymorphisms (SNPs)
Sanger Sequencing
Most methods are low throughput and/or require a lot of DNA
NGS method can analyze many SNPs for many samples in one run
• IISNP-Individual
• AISNP-Ancestry
• LISNP-Lineage
• PISNP-Phenotype
SNP Information
• Individual Identification
– Balancing has occurred in all populations
– Low F statistics within (FIS) and among (FST) populations
– High heterozygosity
SNP Information
• Individual Identification (Pakstis 2010, Kidd 2012)
– Panel of 45 unlinked SNPs
– FST below ≈ 0.07
– Avg het > 0.4
– RMP 10^-15 to 10^-18 in 44 populations
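A minimal sketch of how a random match probability (RMP) accumulates across a panel of unlinked SNPs, assuming Hardy-Weinberg proportions and independence between loci. The allele frequency of 0.5 and the function names are illustrative assumptions, not values or methods taken from the Pakstis or Kidd studies.

```python
def genotype_frequencies(p):
    """Hardy-Weinberg genotype frequencies for a biallelic SNP.

    p is the frequency of allele A; allele B has frequency 1 - p.
    """
    q = 1.0 - p
    return {"AA": p * p, "AB": 2.0 * p * q, "BB": q * q}


def profile_rmp(genotypes, allele_freqs):
    """RMP of one specific profile: product of genotype frequencies."""
    rmp = 1.0
    for genotype, p in zip(genotypes, allele_freqs):
        rmp *= genotype_frequencies(p)[genotype]
    return rmp


def average_match_probability(p):
    """Chance that two random people share a genotype at one locus."""
    return sum(f * f for f in genotype_frequencies(p).values())


# Illustrative best case: 45 loci, each with allele frequency 0.5.
n_loci = 45
per_locus = average_match_probability(0.5)
combined = per_locus ** n_loci
print(f"per locus: {per_locus}, combined over {n_loci} loci: {combined:.1e}")
```

At an allele frequency of 0.5 the per-locus match probability is 0.375, so 45 such loci multiply out to a value just below 10^-19. Real loci have lower heterozygosity, which gives larger (less discriminating) values.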
SNP Information
• HID-Ion Ampliseq Identity Panel (version 2.3)
– 90 autosomal SNPs
– 30 Y-chromosome SNPs
– RMP 10^-35
SNP Information
Kidd 45
SNPforID 52
HID Identity Panel
• Ancestry Information
– High Fixation Index (FST)
– Population specific fixation has occurred
– Low heterozygosity
• Example: Malaria resistance SNPs in Sub-Saharan Africa
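The contrast between low-FST identity SNPs and high-FST ancestry SNPs can be illustrated with a small calculation. This sketch uses the textbook heterozygosity form of Wright's fixation index, FST = (HT - HS) / HT, for one biallelic SNP and equally weighted populations. The allele frequencies are hypothetical, and the panel designers may have used a different estimator.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2pq for allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst(subpop_freqs):
    """F_ST for one biallelic SNP across equally weighted populations.

    subpop_freqs holds the frequency of the same allele in each population.
    """
    n = len(subpop_freqs)
    p_bar = sum(subpop_freqs) / n
    h_total = expected_heterozygosity(p_bar)
    if h_total == 0.0:
        return 0.0  # allele fixed everywhere: no variation to partition
    h_within = sum(expected_heterozygosity(p) for p in subpop_freqs) / n
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies in two populations.
print(fst([0.45, 0.55]))  # similar frequencies: low F_ST, identity-type SNP
print(fst([0.05, 0.95]))  # near fixation in each: high F_ST, ancestry-type SNP
```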
SNP Information
• HID Ancestry Panel
– Beta version 3.0
– Publicly available soon
– 170 loci
– Derived from Kosoy et al. (2008): 128 SNPs and Kidd et al. (2014): 55 SNPs
SNP Information
• Ion Torrent Personal Genome Machine (PGM)
– Launched in 2010
• Ion Torrent sequencing:
– Emulsion PCR for single copy reactors
– Non-labeled nucleotide triphosphates
– Flowed over a bead on a semiconductor surface
• Hydrogen ion detection
– pH change is detected
– No optics
Life Tech - Ion Torrent - PGM
Ion Torrent PGM Workflow
http://www.youtube.com/watch?v=MxkYa9XCvBQ
The PGM Instrument at NIST
PGM Sequencer
OneTouch 2 (Emulsion PCR)
OneTouch ES (Enrichment)
7 ft
Ampliseq Workflow
PCR amplify
Chew back primers
Ligate adapters
Emulsion PCR
Sequencing
One template per bead/droplet
454 PGM “Ionogram”
Ion Ampliseq Library Kit
1 ng DNA input
Ampliseq Primer Pool
Template Kit Sequencing Kit
Front-End: Multiplex PCR
• HID-Ion Ampliseq Identity Panel (IISNP)
– 120 markers in a single PCR reaction
– Amplified regions 33 bp to 192 bp long
• HID-Ion Ampliseq Ancestry Panel (AISNP)
– 170 markers in a single PCR reaction
– Amplified regions 34 bp to 136 bp long
• Small amplicons well suited to degraded or damaged DNA
[Thermal cycling diagram: 99° for 2:00; 18 cycles of 99° for 0:15 and 60° for 4:00; hold at 4° (∞). Time ≈ 1:30. Input: 1 ng DNA]
Digest Primer Regions & Ligate Adaptors
• Enzymatic digestion removes ≈ 25 bp from ends of amplicons
• Universal sequencing adaptors are ligated to DNA
– Adaptors termed P1 and A
• Barcoded sequencing adaptors can be used in this step
– Sequence multiple samples in one PGM run
P1 Adaptor
A Adaptor
Barcode Sequence
PCR Fragment
Adapted and Barcoded Sequencing Template
Enzyme Ligase
Prepare Ion Sphere Particles (ISPs)
• Libraries quantified by qPCR
– Quantity of DNA going into emPCR is very important!
– Goal: 10 % to 30 % template positive ISPs
• Too much DNA → polyclonal ISPs (mixed reads)
• Emulsion PCR
– Nanoliter droplets of PCR reagents in oil
– Attaches a single DNA molecule to a single ISP
• Enrich for positive ISPs
– Liquid handler removes non-templated ISPs
– Biotinylated primer/streptavidin beads
[Figure labels: Ideal; Non-Ideal]
[Diagram labels (OneTouch ES): ISP; magnetic bead w/ streptavidin; biotinylated PCR product; ISP; NaOH]
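Why a low template-positive target (10 % to 30 %) limits polyclonal ISPs can be illustrated with a simple Poisson loading model. The Poisson assumption is ours for illustration and is not stated on the slides; the function names are hypothetical.

```python
import math


def mean_templates_per_isp(positive_fraction):
    """Poisson mean that yields the given fraction of templated ISPs.

    Under a Poisson model, P(no template) = exp(-lambda), so
    positive_fraction = 1 - exp(-lambda).
    """
    if not 0.0 < positive_fraction < 1.0:
        raise ValueError("positive_fraction must be between 0 and 1")
    return -math.log(1.0 - positive_fraction)


def polyclonal_share(positive_fraction):
    """Share of templated ISPs that carry two or more templates."""
    lam = mean_templates_per_isp(positive_fraction)
    p_zero = math.exp(-lam)
    p_one = lam * p_zero
    return (1.0 - p_zero - p_one) / (1.0 - p_zero)


for target in (0.10, 0.30, 0.60):
    print(f"{target:.0%} templated -> {polyclonal_share(target):.1%} polyclonal")
```

Under this model the polyclonal share rises steadily with the templated fraction, which is consistent with the slide's point that too much input DNA leads to mixed reads.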
Sequencing & Data Analysis
• Library ISPs loaded onto chip
• PGM runs flows & detects pH
• Torrent Server & Torrent Suite Software
– Processes pH signal into base calls
– Displays run summary
– Maps reads to reference genome
Photo: www.lifetechnologies.com
Data Analysis HID SNP Genotyper Plugin
Allele coverage histogram
Normalized y-axis scale
X-axis is refSNP I.D.
Autosomal SNPs Y-SNPs
Data Analysis HID SNP Genotyper Plugin
Total Coverage
Reads for Each Base
Coverage for Either Strand
Strand Bias
Genotype
Quality Score
Major Allele Frequency
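The per-SNP quantities listed above can be derived from strand-specific base counts. The sketch below is a hypothetical illustration only: the input layout, the field names, and the use of a simple forward-read fraction as a strand-balance measure are assumptions, and the plugin's actual formulas are not given in the slides.

```python
def snp_metrics(counts):
    """Summarize one SNP from per-base (forward, reverse) read counts.

    counts maps each base to a (forward_reads, reverse_reads) tuple.
    """
    per_base = {base: fwd + rev for base, (fwd, rev) in counts.items()}
    total = sum(per_base.values())
    if total == 0:
        raise ValueError("no coverage at this SNP")
    forward = sum(fwd for fwd, _ in counts.values())
    major = max(per_base, key=per_base.get)
    return {
        "total_coverage": total,
        "reads_per_base": per_base,
        "forward_coverage": forward,
        "reverse_coverage": total - forward,
        # Assumed strand-balance measure: share of reads on the forward strand.
        "forward_fraction": forward / total,
        "major_allele": major,
        "major_allele_frequency": per_base[major] / total,
    }


# Hypothetical counts for a heterozygous A/G SNP with one stray T read.
print(snp_metrics({"A": (270, 250), "G": (240, 240), "T": (1, 0)}))
```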
HID SNP Panel Sensitivity Study
• Dynamic range of DNA input to PCR
– 1 ng is recommended
– 10 ng (1 data point): no problems were observed
– 1 ng
– 0.5 ng
– 0.1 ng
– 0.05 ng
• Libraries were generated and pooled (n = 12)
• Sequenced on PGM 318 chip (11 M wells)
– 200 bp read chemistry
[Figure: coverage at 90 autosomal SNP loci, sorted from highest to lowest coverage, for 0.5 ng, 0.1 ng, and 0.05 ng input DNA; 3 replicates; all scaled to 8000X coverage]
Thresholds: analytical = 50 RFU, stochastic = 300 RFU, PHR = 0.5
Thresholds:
50X analytical
300X stochastic
50% balance
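One way these three thresholds could be applied to the read counts at a biallelic SNP is sketched below. The decision logic is an assumption made by analogy with STR interpretation (analytical threshold, stochastic threshold, peak height ratio); the slides give the threshold values but not the rules, and the function name is hypothetical.

```python
def call_snp(reads_a, reads_b, analytical=50, stochastic=300, balance=0.5):
    """Assign a genotype call from allele read counts.

    Assumed rules:
    - alleles below the analytical threshold are ignored;
    - two detected alleles must meet the balance ratio to be called AB;
    - a single detected allele must reach the stochastic threshold
      to be called homozygous.
    """
    a_detected = reads_a >= analytical
    b_detected = reads_b >= analytical
    if a_detected and b_detected:
        ratio = min(reads_a, reads_b) / max(reads_a, reads_b)
        return "AB" if ratio >= balance else "inconclusive"
    if a_detected or b_detected:
        reads = reads_a if a_detected else reads_b
        allele = "A" if a_detected else "B"
        return allele * 2 if reads >= stochastic else "inconclusive"
    return "no call"


# Hypothetical read counts.
for pair in [(400, 380), (900, 10), (120, 5), (10, 20), (400, 100)]:
    print(pair, call_snp(*pair))
```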
[Figure: Identifiler® Plus results for 0.05 ng, 0.1 ng, and 0.5 ng input DNA; all scaled to 4000 RFU]
Identifiler® Plus amplification (29 cycle), 25 µl reaction, 3500xl electrophoresis, 1.2 kV for 8 seconds
Thresholds: analytical = 50 RFU, stochastic = 200 RFU, PHR = 0.5
HID SNP Panel Sensitivity Study
Random match probability (RMP, 1 in) by input DNA amount

Identifiler Plus
STR loci (table columns): D8S1179, D21S11, D7S820, CSF1PO, D3S1358, TH01, D13S317, D16S539, D2S1338, D19S433, vWA, TPOX, D18S51, D5S818, FGA
• 0.05 ng
– 1.78E+01 (loci listed: D8S1179, D3S1358, D2S1338, D19S433, vWA, TPOX, D18S51, FGA, TH01, D5S818, D7S820, D16S539, CSF1PO, D21S11)
– 1.17E+05 (loci listed: D8S1179, vWA, D18S51, TH01, D5S818, D7S820, D16S539, CSF1PO, D21S11)
– 1.17E+06 (loci listed: D19S433, TPOX, FGA, TH01, D5S818, D7S820, D16S539, CSF1PO, D21S11)
• 0.1 ng
– 4.35E+03 (loci listed: D2S1338, D19S433, vWA, TPOX, D18S51, FGA, D7S820, D16S539, CSF1PO, D21S11)
– 6.19E+10 (loci listed: D2S1338, D21S11)
– 6.34E+11 (loci listed: CSF1PO, D21S11)
• 0.5 ng
– 5.67E+14 (no loci listed)

PGM HID SNP Panel v2.3
SNP loci (table columns): autosomal SNPs labeled by refSNP ID, rs1005533 through rs9951171
• 0.05 ng: 6.12E+15, 3.62E+18, 7.17E+21
• 0.1 ng: 7.58E+27, 8.87E+27, 2.88E+30
• 0.5 ng: 1.16E+35, 2.31E+35

Just like STR loci, some SNPs are consistently less robust
SNPs have more possible loci and better performance at low levels
HID SNP Panel Sensitivity Study
[Figure: random match probability (1 in, log scale from 1 to 1E+36) at 0.05 ng, 0.1 ng, and 0.5 ng input DNA for PGM HID SNP Panel and Identifiler Plus]
HID SNPs give better RMP with 50 pg than ID+ gives with 0.5 ng
HID SNP Panel Sensitivity Study Summary
• Higher RMPs are expected for SNP panel compared to STRs due to many more loci
• Under the thresholds indicated, a higher percentage of SNPs than STRs also produce results
• Better STR assays (GlobalFiler or NGS-STR) may lessen the “gap”
• Validation needed for SNP thresholds
HID SNP Panel Degraded DNA Study
Genomic DNA (gDNA) sheared with a Covaris S2 Focused Ultrasonicator
[Diagram: gDNA + Covaris S2 = Sheared DNA]
HID SNP Panel Degraded DNA Study
Sheared DNA was fractionated by size range
Blue Pippin system (3% Gel)
Automated size selection
1) 50 bp to 200 bp
2) 50 bp to 150 bp
3) 50 bp to 100 bp
4) 50 bp to 75 bp
5) 35 bp to 50 bp
Five individual agarose columns
Size fractionated fragments collected into recovery wells
1 2 3 4 5
HID SNP Panel Degraded DNA Study
Sheared DNA was fractionated by size range
Agilent Bioanalyzer Trace
Size selected sheared DNA:
– 50 bp to 200 bp
– 50 bp to 150 bp
– 50 bp to 100 bp
– 50 bp to 75 bp
– 35 bp to 50 bp
Input to HID Panel PCR 1 ng DNA
Built libraries and sequenced
Bioanalyzer Standard
Blue Pippin Marker (65 bp)
HID SNP Panel Degraded DNA Study
90 autosomal IISNPs
HID SNP Panel
Fragmented, size selected < 75 bp
Fragmented, size selected < 100 bp
Fragmented, size selected < 150 bp
Fragmented, size selected < 200 bp
Fragmented, non-size selected
Minifiler® amplification (30 cycle), 25 µl reaction, 3500xl electrophoresis, 1.2 kV for 8 seconds Thresholds: analytical = 100 RFU, PHR = 0.5; data scaled to 1000 RFU
HID SNP Panel Degraded DNA Study
[Figure: coverage at 90 autosomal SNPs, sorted from smallest to largest, on a PGM 318 chip, all scaled to 2000X coverage. Panels: fragmented, size selected < 250 bp, < 200 bp, < 150 bp, < 100 bp, and < 75 bp; fragmented, non-size selected. Performed in triplicate; one rep shown]
Thresholds:
50X analytical
300X stochastic
50% balance
PGM HID SNP Panel: RMP (1 in) by size fraction and replicate
SNP loci (table columns): 90 autosomal SNPs labeled by refSNP ID, rs1005533 through rs9951171
Fraction   Replicate 1   Replicate 2   Replicate 3
< 75 bp    1.17E+01      3.14E+01      3.14E+01
< 100 bp   6.83E+05      2.92E+07      1.38E+08
< 150 bp   1.05E+21      8.75E+19      5.08E+20
< 200 bp   1.96E+30      3.25E+26      6.61E+27
< 250 bp   1.75E+36      8.62E+35      3.47E+36
HID SNP Panel Degraded DNA Study
Table row groups: PGM IISNPs (90); MiniFiler STRs (8)
SNPs have MANY more possible loci and better performance in degraded samples
MiniFiler STRs: RMP (1 in) by size fraction and replicate
Fraction   Replicate 1   Replicate 2   Replicate 3
< 75 bp    (no value)    (no value)    (no value)
< 100 bp   (no value)    (no value)    (no value)
< 150 bp   7.08E+03      7.08E+03      1.26E+03
< 200 bp   7.77E+07      6.94E+09      6.94E+09
< 250 bp   6.94E+09      6.94E+09      6.94E+09
HID SNP Panel Degraded DNA Study
[Figure: random match probability (1 in, log scale from 1.0E+00 to 1.0E+39) by size fraction (<75, <100, <150, <200, <250, Fragmented) for MiniFiler and PGM HID SNP Panel]
HID SNP Panel Degraded DNA Study Summary
• SNPs and STRs show expected performance in each fraction based on amplicon size
• Some SNPs can still amplify in degraded samples where STRs cannot
• Due to the high number of SNPs, very high RMPs are possible
• Better STR assays (GlobalFiler or NGS-STR) may lessen the “gap”
• Validation needed for SNP thresholds
SNPs and Mixtures
Single source (1 contributor)
Genotype   A      B
AA         100%   0%
AB         50%    50%
BB         0%     100%
% is coverage (like PH balance)
HID SNP Panel Mixture Detection
[Figure series: major allele frequency (50 to 100) plotted for 90 autosomal HID SNPs in ascending order, with regions labeled HETEROZYGOTE AB and HOMOZYGOTE AA or BB. Successive builds show one single source sample, then one, two, three, four, and five single source samples in triplicate]
• Single source samples should be either 50% or 100%
• Akin to an imbalanced STR locus
• 3 SNPs give outlying values, less useful for mixtures
• Assessing single source outliers will improve mixture model
Ratio 1:1 (columns 1A, 1A, 1B, 1B: weighted A and B allele copies from each contributor)
1    1    1A   1A   1B   1B   A      B
AA   AA   2    2    0    0    100%   0%
AA   AB   2    1    0    1    75%    25%
AB   AA   1    2    1    0    75%    25%
AA   BB   2    0    0    2    50%    50%
AB   AB   1    1    1    1    50%    50%
BB   AA   0    2    2    0    50%    50%
AB   BB   1    0    1    2    25%    75%
BB   AB   0    1    2    1    25%    75%
BB   BB   0    0    2    2    0%     100%
SNPs in 1:1 Mixtures
[Figure: stacked bars of A and B allele percentage (0% to 100%) for the nine genotype combinations AA:AA, AA:AB, AB:AA, AA:BB, AB:AB, BB:AA, AB:BB, BB:AB, BB:BB at a 1:1 ratio]
[Figure: major allele frequency (50 to 100) for 90 autosomal SNPs; one single source sample plotted for 90 HID SNPs in ascending order, with the theoretical distribution]
• A two-person mixture in a 1:1 ratio should have frequencies at: 50%, 75%, and 100%
• Theoretical distribution: size of “bin” is related to average heterozygosity
• Genotype combinations in each bin:
– 50%: AA:BB, AB:AB, BB:AA
– 75%: AA:AB, AB:AA, AB:BB, BB:AB
– 100%: AA:AA, BB:BB
Ratio 2:1 (columns 2A, 1A, 2B, 1B: weighted A and B allele copies from each contributor)
2    1    2A   1A   2B   1B   A      B
AA   AA   4    2    0    0    100%   0%
AA   AB   4    1    0    1    83%    17%
AA   BB   4    0    0    2    67%    33%
AB   AA   2    2    2    0    67%    33%
AB   AB   2    1    2    1    50%    50%
AB   BB   2    0    2    2    33%    67%
BB   AA   0    2    4    0    33%    67%
BB   AB   0    1    4    1    17%    83%
BB   BB   0    0    4    2    0%     100%
SNPs in 2:1 Mixtures
[Figure: stacked bars of A and B allele percentage (0% to 100%) for the nine genotype combinations AA:AA, AA:AB, AA:BB, AB:AA, AB:AB, AB:BB, BB:AA, BB:AB, BB:BB at a 2:1 ratio]
[Figure: major allele frequency (50 to 100) for 90 autosomal SNPs; one single source sample plotted for 90 HID SNPs in ascending order]
• A two-person mixture in a 2:1 ratio should have frequencies at: 50%, 66.7%, 83.3%, and 100%
Ratio 3:1 (columns 3A, 1A, 3B, 1B: weighted A and B allele copies from each contributor)
3    1    3A   1A   3B   1B   A      B
AA   AA   6    2    0    0    100%   0%
AA   AB   6    1    0    1    88%    13%
AA   BB   6    0    0    2    75%    25%
AB   AA   3    2    3    0    63%    38%
AB   AB   3    1    3    1    50%    50%
AB   BB   3    0    3    2    38%    63%
BB   AA   0    2    6    0    25%    75%
BB   AB   0    1    6    1    13%    88%
BB   BB   0    0    6    2    0%     100%
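The expected allele percentages in these two-person tables follow from weighting each contributor's allele copies by the mixture ratio. The sketch below reproduces that arithmetic with exact fractions; the function names are illustrative, and the model assumes ideal amplification with no stochastic variation.

```python
from fractions import Fraction
from itertools import product

# Copies of allele A carried by each genotype.
A_COPIES = {"AA": 2, "AB": 1, "BB": 0}


def allele_a_fraction(genotype1, genotype2, weight1, weight2):
    """Expected fraction of allele A for contributors mixed weight1:weight2."""
    a_copies = weight1 * A_COPIES[genotype1] + weight2 * A_COPIES[genotype2]
    return Fraction(a_copies, 2 * (weight1 + weight2))


def major_allele_bins(weight1, weight2):
    """Distinct expected major allele frequencies over all genotype pairs."""
    bins = set()
    for genotype1, genotype2 in product(A_COPIES, repeat=2):
        fraction = allele_a_fraction(genotype1, genotype2, weight1, weight2)
        bins.add(max(fraction, 1 - fraction))
    return sorted(bins)


for ratio in [(1, 1), (2, 1), (3, 1)]:
    percents = [f"{float(b) * 100:.1f}%" for b in major_allele_bins(*ratio)]
    print(f"{ratio[0]}:{ratio[1]} ->", ", ".join(percents))
```

The number of distinct bins grows with the ratio (three at 1:1, four at 2:1, five at 3:1), and the bins sit closer together, which is why higher ratios are harder to resolve against real coverage noise.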
SNPs in 3:1 Mixtures
[Figure: stacked bars of A and B allele percentage (0% to 100%) for the nine genotype combinations AA:AA, AA:AB, AA:BB, AB:AA, AB:AB, AB:BB, BB:AA, BB:AB, BB:BB at a 3:1 ratio]
[Figure series, actual vs theoretical: variant frequency (50 to 100) for 90 autosomal SNPs, major allele frequency plotted for 90 HID SNPs in ascending order. Successive builds show one single source sample, then one single source sample with one, two, and three 3:1 mixed samples]
• A two-person mixture in a 3:1 ratio should have frequencies at: 50%, 62.5%, 75%, 87.5% and 100%
• >3:1 or >2 people = many more “bins”
HID SNP Panel Mixtures Summary
• Mixtures can be detected in SNP data based on the coverage levels at heterozygous loci
• It may be possible to determine two-person 1:1 or 2:1 mixtures (maaaybe 3:1)
• More than two contributors or greater than 3:1 mixtures will be difficult to distinguish
• Need to determine which SNPs “behave”
• Stay tuned!
PGM AIM Panel (beta testing)
• Ampliseq library prep
• 170 SNPs
– Seldin 128
– Kidd 55
• Analysis plug-in integrates FROGkb
AIM Panel Ancestry Prediction – SRM 2391c
SRM 2391c Component   Gender        Ethnicity (self declared)
A                     Female        Not listed
B                     Male          Mexican-American
C                     Male          Melanesian
D                     Female:Male   Mixed sample
E                     Female        Not listed
F                     Male          Caucasian
• Likelihood Ratio calculations
– Four categories extant in both Kidd and Seldin studies: Europeans, African Americans, Maya, and Han Chinese
– Allows comparison of SNP sets’ performance
– Representative of major U.S. populations
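A likelihood ratio between two candidate populations can be sketched as the ratio of profile likelihoods computed from each population's allele frequencies. This is a minimal illustration assuming Hardy-Weinberg proportions and independent SNPs. The SNP names and frequencies are made up, and this is not a description of how the plugin or FROG-kb performs the calculation.

```python
import math


def genotype_likelihood(genotype, p):
    """Hardy-Weinberg probability of a genotype given allele A frequency p."""
    q = 1.0 - p
    return {"AA": p * p, "AB": 2.0 * p * q, "BB": q * q}[genotype]


def log10_profile_likelihood(profile, freqs):
    """Sum of log10 genotype likelihoods over all SNPs in the profile.

    Frequencies must lie strictly between 0 and 1 so every genotype
    has a non-zero probability.
    """
    return sum(
        math.log10(genotype_likelihood(genotype, freqs[snp]))
        for snp, genotype in profile.items()
    )


def likelihood_ratio(profile, freqs_pop1, freqs_pop2):
    """How much more likely the profile is in population 1 than in 2."""
    log_ratio = (log10_profile_likelihood(profile, freqs_pop1)
                 - log10_profile_likelihood(profile, freqs_pop2))
    return 10.0 ** log_ratio


# Hypothetical two-SNP profile and allele A frequencies.
profile = {"snp1": "AA", "snp2": "AB"}
pop1 = {"snp1": 0.9, "snp2": 0.5}
pop2 = {"snp1": 0.3, "snp2": 0.5}
print(likelihood_ratio(profile, pop1, pop2))
```

Summing logarithms rather than multiplying raw probabilities avoids underflow when a panel contains many SNPs.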
HID SNP Genotyper Plugin (v4.1 Beta) New Feature – Ancestry Map
• Heatmap of highest probability of origin
Ancestry Prediction: SRM 2391c Components A, B, C, E, and F
[Figure panels for each component: Kidd 55 SNPs; Seldin 128 SNPs]
Component   Gender   Ethnicity          Kidd 55 Prediction          Seldin 128 Prediction
A           Female   Not listed         European, 1.02 x 10^33      European, 6.32 x 10^66
B           Male     Mexican-American   European, 5.39 x 10^12      Han Chinese, 1.48 x 10^19
C           Male     Melanesian         Han Chinese, 1.54 x 10^14   Han Chinese, 6.67 x 10^28
E           Female   Not listed         European, 5.41 x 10^21      European, 3.92 x 10^50
F           Male     Caucasian          European, 2.35 x 10^31      European, 1.16 x 10^55
HID SNP Panel Ancestry Summary
• 170 SNP panel containing two SNP sets that are suitable for use in the U.S.
• Plug-in integrates FROG-kb (http://frog.med.yale.edu/FrogKB/)
• Heat maps give quick overview
• Interpretation tools being developed
– Combining loci
– Choosing/combining populations
Conclusions
• NGS can give more information on currently used forensic markers
– More STRs and STR sequence info
– Whole genome mtDNA
• NGS facilitates genotyping of forensic SNPs
• SNPs may help with low level & degraded samples
• SNPs may provide ancestry (and phenotype?) information
• Forensic NGS kits/methods are being developed
• Many questions to answer prior to implementation
Acknowledgements
Dr. Peter Vallone, Group Leader
Kevin Kiesler, Research Biologist
Thermo Fisher (Life Tech): Nnamdi Ihuegbu, Robert Lagace
Funding from the FBI Biometrics Center of Excellence: Forensic DNA Typing as a Biometric Tool
Thank you for your attention!
Contact Info: 301-975-6401