Genetic architecture of developmental traits in populations of male gypsy moths Christopher J. Friedline, Ph.D. Virginia Commonwealth University @noituloveand @cfriedline Evolution 2014 Raleigh, NC 6.21.2014
Jul 19, 2015
Genetic architecture of developmental traits in populations of male gypsy moths
Christopher J. Friedline, Ph.D.
Virginia Commonwealth University
@noituloveand @cfriedline
Evolution 2014, Raleigh, NC, 6.21.2014
Lymantria dispar
http://bugguide.net/node/view/553307
Adapted from: http://www.fs.fed.us/ne/morgantown/4557/gmoth/atlas/quar1a.gif
http://green.blogs.nytimes.com/2011/09/13/the-toll-from-tree-boring-pests/?_php=true&_type=blogs&_r=0
Can we gain insight into the genetic architecture of invasion
using L. dispar as a model?
Experimental design
• 7 populations established in replicated common gardens (NY, VA)
• NC (1), VA (2), NY (1), Quebec (2) + 1 lab strain (CPHST Otis Lab)
• Single reference assembly (HiSeq 2500 PE)
• 188 barcoded individuals
• 3 phenotypes (developmental)
http://www.fs.fed.us/ne/morgantown/4557/gmoth/
Parry and Grayson, EntSoc 2013
Reference Assembly
• MaSuRCA: Zimin et al. (2013), Bioinformatics 29(21): 2669-2677
• 591,450 scaffolds (90M QC PE reads)
• 588,164,124 bp (~588 Mb; ~1 Kb/contig, max = 28 Kb)
• 268 contigs > 10 Kb
• N50 = 1.3 Kb
• 35.3% GC
• 32%/56% CEGMA complete/partial
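As a reminder of what the N50 figure summarizes, here is a minimal sketch of the calculation. The function name and the toy lengths are illustrative assumptions; this is not the pipeline used for the assembly above.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical contig lengths in bp, for illustration only
toy_lengths = [5000, 3000, 2000, 1300, 1300, 1000, 900, 500]
print(n50(toy_lengths))
```

An assembly with many short contigs, as here, will have an N50 close to the typical contig length.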
ddRADseq
• 316 M reads (41.5 GB)
• 188 individuals
• 690 Kb/individual [10, 1400] Kb (QC)
Peterson et al. (2012), PLoS ONE 7(5): e37135
SNP Calling/Filtering
• Called with FreeBayes (n = 30,791)
• Kept only biallelic SNPs
• Removed > 50% missing
• Removed FIS outliers (|FIS| > 0.5)
• Removed MAF < 0.01
• n = 11,021
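The filters listed above can be sketched per locus as follows. This is a hedged illustration, not the author's code: it assumes biallelic genotypes already coded as alternate-allele counts (0, 1, 2, or None for missing), treats the missing-data filter as per-locus, and uses a simple FIS = 1 - Hobs/Hexp, which may differ from the estimator actually used. The function names are hypothetical.

```python
def locus_stats(genotypes):
    """Return (missing fraction, minor allele frequency, FIS) for one locus."""
    called = [g for g in genotypes if g is not None]
    missing = 1 - len(called) / len(genotypes)
    if not called:
        return missing, 0.0, 0.0
    p = sum(called) / (2 * len(called))          # alternate allele frequency
    maf = min(p, 1 - p)
    h_exp = 2 * p * (1 - p)                      # expected heterozygosity
    h_obs = sum(1 for g in called if g == 1) / len(called)
    fis = 1 - h_obs / h_exp if h_exp > 0 else 0.0
    return missing, maf, fis


def keep_locus(genotypes, max_missing=0.5, min_maf=0.01, max_abs_fis=0.5):
    """Apply the three thresholds from the slide to one locus."""
    missing, maf, fis = locus_stats(genotypes)
    return missing <= max_missing and maf >= min_maf and abs(fis) <= max_abs_fis


# Toy loci, for illustration only
print(keep_locus([0, 1, 2, 1, 0, 1, 1, 2]))        # balanced, near Hardy-Weinberg
print(keep_locus([0, None, None, None, 2, None]))  # mostly missing
print(keep_locus([0, 0, 2, 2, 0, 2, 0, 2]))        # no heterozygotes
```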
Phenotypes
[Figure: three panels by population (NC, NY, OTIS, QC32, QC93, VA1, VA2): Pupal Duration (axis 7 to 13), Mass (axis 0.2 to 0.7), Total Dev Time (axis 65 to 95)]
FST
[Figure: distribution of per-locus FST (x-axis 0.00 to 0.35; y-axis 0 to 8000)]
Multilocus FIS = 0.0486, FST = 0.0224, FIT = 0.0700
n = 11021 [0.00, 0.34]

MAF Bin   FIS       FST      FIT
50         0.1542   0.0234    0.174
40         0.1007   0.0303    0.128
30         0.0757   0.0295    0.103
20         0.0093   0.0169    0.0261
10        -0.0673   0.0088   -0.0579
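The three F columns in the MAF-bin table lost their subscripts in extraction. Reading them as FIS, FST, and FIT (an assumption) can be checked against the standard identity (1 - FIT) = (1 - FIS)(1 - FST). A minimal sketch using the values reported on the slide:

```python
def f_it(f_is, f_st):
    """Total inbreeding coefficient implied by FIS and FST."""
    return 1.0 - (1.0 - f_is) * (1.0 - f_st)


# (label, FIS, FST, reported FIT), copied from the slide
reported = [
    ("multilocus", 0.0486, 0.0224, 0.0700),
    ("MAF 50", 0.1542, 0.0234, 0.174),
    ("MAF 40", 0.1007, 0.0303, 0.128),
    ("MAF 30", 0.0757, 0.0295, 0.103),
    ("MAF 20", 0.0093, 0.0169, 0.0261),
    ("MAF 10", -0.0673, 0.0088, -0.0579),
]

for label, f_is, f_st, fit_reported in reported:
    print(label, round(f_it(f_is, f_st), 4), fit_reported)
```

Each implied FIT agrees with the reported value to within rounding, which supports that column order.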
Population Structure Correction by PCA
• Price et al. (2006) "Principal Components Analysis Corrects for Stratification in Genome-wide Association Studies." Nat Genet 38(8)
• First principal component approximates FST
• Number of axes chosen using a Tracy-Widom test, described in Eckert et al. (2010). Genetics 185: 969-982.
• Correlation between genotype residuals and phenotype residuals, converted to a χ² statistic and then to p-values
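The correction described above can be sketched roughly as follows, in the spirit of Price et al. (2006): regress the top K principal components out of both the genotype and the phenotype, then test the correlation r of the residuals with (N - K - 1) r², compared to a χ² distribution with 1 degree of freedom. This is a simplified illustration under stated assumptions: no missing genotypes, a plain allele-frequency normalization, K supplied by hand rather than chosen by a Tracy-Widom test, and simulated toy data. All function names are hypothetical.

```python
import math
import numpy as np


def pca_axes(genotypes, n_axes):
    """Scores of individuals (rows) on the top PCs of a 0/1/2 genotype matrix."""
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0
    keep = (p > 0) & (p < 1)                       # drop monomorphic loci
    z = (g[:, keep] - 2 * p[keep]) / np.sqrt(2 * p[keep] * (1 - p[keep]))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_axes] * s[:n_axes]


def residualize(values, axes):
    """Residuals after regressing values on an intercept plus the PC axes."""
    y = np.asarray(values, dtype=float)
    design = np.column_stack([np.ones(len(y)), axes])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta


def corrected_test(snp, phenotype, axes):
    """Chi-square statistic (1 df) and p-value for one SNP after PC adjustment."""
    n, k = axes.shape
    r = np.corrcoef(residualize(snp, axes), residualize(phenotype, axes))[0, 1]
    chi2 = (n - k - 1) * r ** 2
    p_value = math.erfc(math.sqrt(chi2 / 2.0))     # upper tail of chi-square, 1 df
    return chi2, p_value


# Simulated data: two populations that differ in allele frequency and phenotype
rng = np.random.default_rng(1)
pop = np.repeat([0, 1], 50)
freqs = np.where(pop == 0, 0.3, 0.6)[:, None]
genos = rng.binomial(2, freqs, size=(100, 200))
phenotype = pop + rng.normal(size=100)

axes = pca_axes(genos, n_axes=1)
chi2, p_value = corrected_test(genos[:, 0], phenotype, axes)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

Without the adjustment, any locus differentiated between the two simulated populations would tend to correlate with the phenotype simply through shared structure.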
[Figure: PCA of n = 7 populations on 11021 loci; PC1 (0.021%) vs. PC2 (0.015%)]
Top SNP
[Figure: phenotype by genotype at the top SNP for each trait]
• Mass: Locus 14103 (ctg7180001511349/152); genotypes C/C, C/T, T/T; p = 0.000004, FST = -0.008543
• Total Dev Time: Locus 14908 (ctg7180001527347/31); genotypes T/T, T/C, C/C; p = 0.000111, FST = 0.016849
• Pupal Duration: Locus 10529 (ctg7180001452692/364); genotypes A/A, A/T, T/T; p = 0.000022, FST = -0.024810
Significant SNPs
[Figure: overlap among the SNP sets for Total Dev Time, Pupal Duration, and Mass]
• Corrected for population structure (Price et al. 2006), binned by MAF
• By p value (p < 0.05):
  • Mass: n = 555
  • Pupal duration: n = 526
  • Total development time: n = 524
• By q value (Storey and Tibshirani, 2003):
  • Mass: n = 3 (14103(*,10), 27843(40), 9023(40))
  • Pupal duration: n = 1 (S10529(40))
  • Total development time: n = 0
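The drop from hundreds of SNPs at p < 0.05 to a handful by q value reflects the multiple-testing adjustment. Below is a minimal sketch of a Storey-style q-value calculation. It is a simplification and an assumption on my part: the proportion of true nulls (pi0) is estimated at a single fixed λ = 0.5, whereas Storey and Tibshirani (2003) smooth the estimate over a range of λ. The function name and toy p-values are hypothetical.

```python
def storey_qvalues(pvalues, lam=0.5):
    """Return (q-values in input order, estimated pi0) for a list of p-values."""
    m = len(pvalues)
    # Fraction of tests that look null, judged from p-values above lam
    pi0 = min(1.0, sum(p > lam for p in pvalues) / (m * (1.0 - lam)))
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values are monotone in p
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[i] / rank)
        qvalues[i] = running_min
    return qvalues, pi0


# Toy p-values, for illustration only
toy_p = [0.9, 0.001, 0.02, 0.6, 0.01]
q, pi0 = storey_qvalues(toy_p)
print(pi0, [round(v, 4) for v in q])
```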
Multilocus effect
[Figure: multiple-regression coefficients (mult.reg.coeff) for the top 25 loci by p-value, one panel per trait]
• Mass: R² = 0.556, R²adj = 0.494
• Total Dev Time: R² = 0.639, R²adj = 0.584
• Pupal Duration: R² = 0.541, R²adj = 0.473
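The R² and adjusted R² values above come from regressing each trait on its top loci jointly. A minimal sketch of such a fit follows, assuming ordinary least squares on 0/1/2 genotype codes with no missing data; the slide does not state the exact model or sample size used, so the function name and toy data are hypothetical.

```python
import numpy as np


def multilocus_fit(genotypes, phenotype):
    """OLS of phenotype on all loci at once.

    Returns (per-locus coefficients, R^2, adjusted R^2).
    """
    x = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    n, k = x.shape
    design = np.column_stack([np.ones(n), x])      # intercept plus k loci
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return coef[1:], r2, r2_adj


# Toy data: 9 individuals, 2 loci, small hand-written noise
toy_genotypes = [[a, b] for a in (0, 1, 2) for b in (0, 1, 2)]
noise = [0.1, -0.1, 0.1, -0.1, 0.0, 0.1, -0.1, 0.1, -0.1]
toy_phenotype = [1 + 2 * a - b + e for (a, b), e in zip(toy_genotypes, noise)]

coef, r2, r2_adj = multilocus_fit(toy_genotypes, toy_phenotype)
print(coef, r2, r2_adj)
```

The adjusted value penalizes the number of predictors, which matters when 25 loci are fit to a modest number of individuals.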
BLAST results
Mass
• reverse transcriptase*
• non-LTR retrotransposon
• predicted craniofacial development protein
• sulfotransferase (amine, estrogen)
Pupal duration
• endonuclease-reverse transcriptase
• phosphatidylinositol 3-kinase
Development time
• reverse transcriptase
• endonuclease-reverse transcriptase
• transcription initiation factor TFIID subunit 2-like protein
* chosen by p + q
+ chosen by q
Conclusions/Future Work
• Assembly curation is likely necessary for more robust biological conclusions
• Small effect sizes are difficult to detect with small sample sizes and few populations
• High degree of multilocus effects
• Additional replicate gardens with related material
• Probabilistic genotype calling with full set
Acknowledgments
Johnson Lab: Derek Johnson, Kristine Grayson, Trevor Faske
NPGI: NSF Postdoctoral Fellowship in Biology FY 2013
Rodney Dyer, VCU; Dylan Parry, SUNY-ESF
Eckert Lab: Andrew Eckert, Brandon Lind, Erin Hobson, Ethan Harwood
VCU NARF; VCU CHiPC