Ecological and evolutionary functional genomics: finding & studying genes that matter Christopher W. Wheat
Dec 03, 2014
Ecological and Evolutionary Functional Genomics
• Integrative research
  – Ecology (Hanski) + Molecular Biology (Frilander) + Evolutionary Genetics (Wheat) + Physiology (Marden)
• Need genomic tools
  – Access to the coding genes
  – 1000’s of SNPs &/or 100’s of microsatellites
  – Microarrays for gene expression
  – QTL, association mapping, and outlier analyses
• Need to sequence the genes
C. Vera and C. Wheat, Fescemyer, Frilander, Crawford, Hanski, Marden 2008
Rapid transcriptome characterization for a nonmodel organism using 454 pyrosequencing
48k contigs
SNP’s
First de novo transcriptome assembly using 454
pyrosequencing
Annotation of 45,000 contigs + singletons
Predicted genes: D. melanogaster = 13,379; B. mori = 18,510
Estimated coverage: 70% (D. mel estimate); 50% (B. mori estimate)
Lower estimate 50% genes
Upper estimate of 70% = 13,142 genes
[Figure: bar chart (axis 0 to 140) comparing M. cinxia and B. mori across glycolysis & related pathways, citric acid cycle, fatty acid metabolism, purine metabolism, pyrimidine metabolism, oxidative phosphorylation, ribosomal proteins, and the average]
Metabolic Map Comparison
[Figure panels: Bombyx mori with WGS; M. cinxia with 454 seq.]
Bivariate Fit of ContigLength By NumOfEsts
[Figure: scatter plot of ContigLength (0 to 1500) against NumOfEsts (0 to 300), with a linear fit and a polynomial fit of degree 2]

Linear Fit
ContigLength = 192.47394 + 4.8993402 NumOfEsts

Summary of Fit
RSquare                       0.675697
RSquare Adj                   0.673347
Root Mean Square Error        163.9744
Mean of Response              327.8357
Observations (or Sum Wgts)    140

Analysis of Variance
Source      DF    Sum of Squares    Mean Square    F Ratio     Prob > F
Model       1     7730931           7730931        287.5276    <.0001
Error       138   3710490           26888
C. Total    139   11441421

Parameter Estimates
Term         Estimate     Std Error    t Ratio    Prob>|t|
Intercept    192.47394    15.99312     12.03      <.0001
NumOfEsts    4.8993402    0.288933     16.96      <.0001

Polynomial Fit Degree=2

Summary of Fit
RSquare                       0.817203
RSquare Adj                   0.814534
Root Mean Square Error        123.5563
Mean of Response              327.8357
Observations (or Sum Wgts)    140

Analysis of Variance
Source      DF    Sum of Squares    Mean Square    F Ratio     Prob > F
Model       2     9349959           4674980        306.2318    <.0001
Error       137   2091462           15266
C. Total    139   11441421

Parameter Estimates
Term                       Estimate     Std Error    t Ratio    Prob>|t|
Intercept                  145.09149    12.89942     11.25      <.0001
NumOfEsts                  8.4811567    0.41033      20.67      <.0001
(NumOfEsts-27.6286)^2      -0.02242     0.002177     -10.30     <.0001
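The fit statistics reported for ContigLength against NumOfEsts can be cross-checked from the sums of squares alone. The sketch below is only an illustration: it recomputes R-square and the F ratio for both fits and writes the two fitted curves as functions. The function and variable names are made up for this example, and the quadratic equation is assembled from the reported parameter estimates rather than quoted from the slides.

```python
# Sums of squares and degrees of freedom as reported in the JMP output.
SS_TOTAL = 11441421.0

fits = {
    "linear": {"ss_model": 7730931.0, "df_model": 1,
               "ss_error": 3710490.0, "df_error": 138},
    "quadratic": {"ss_model": 9349959.0, "df_model": 2,
                  "ss_error": 2091462.0, "df_error": 137},
}


def r_square(fit):
    """Share of the total sum of squares explained by the model."""
    return fit["ss_model"] / SS_TOTAL


def f_ratio(fit):
    """Model mean square divided by error mean square."""
    ms_model = fit["ss_model"] / fit["df_model"]
    ms_error = fit["ss_error"] / fit["df_error"]
    return ms_model / ms_error


def linear_fit(num_ests):
    """Predicted contig length from the reported linear fit."""
    return 192.47394 + 4.8993402 * num_ests


def quadratic_fit(num_ests):
    """Predicted contig length from the reported degree-2 estimates
    (the squared term is centered at 27.6286, as in the term label)."""
    return 145.09149 + 8.4811567 * num_ests - 0.02242 * (num_ests - 27.6286) ** 2


for name, fit in fits.items():
    print(f"{name}: R^2 = {r_square(fit):.4f}, F = {f_ratio(fit):.1f}")
```

The negative squared term means predicted contig length rises more slowly as the number of ESTs per contig grows, which is consistent with the better fit of the degree-2 model.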
Increasing 454 sequencing (Aland + China & France)
• Find & cover more genes
  – Get full length contigs
• SNPs
  – Reconfirm previous
  – Identify population specific SNPs
  – Map genes & compare with other species (synteny)
  – Finding genes of interest
    • QTL and association mapping
    • Outlier analyses (Fst)
  – Demographics
    • Population structure effects on genetic diversity
    • Colonization history
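To make the idea of an Fst outlier analysis concrete, here is a minimal sketch for biallelic SNPs. It assumes equally weighted subpopulations and uses the basic heterozygosity definition Fst = (H_T - H_S) / H_T. The locus names, allele frequencies, and the fixed threshold are all hypothetical. A real outlier scan would compare each locus against a neutral expectation rather than a hand-picked cutoff.

```python
def fst(subpop_freqs):
    """Fst for one biallelic locus from per-subpopulation allele frequencies.

    Assumes equal subpopulation weights (an illustrative simplification).
    """
    n = len(subpop_freqs)
    p_bar = sum(subpop_freqs) / n
    h_total = 2 * p_bar * (1 - p_bar)          # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0                             # locus is fixed everywhere
    h_within = sum(2 * p * (1 - p) for p in subpop_freqs) / n
    return (h_total - h_within) / h_total


def outlier_loci(freqs_by_locus, threshold):
    """Return loci whose Fst exceeds a fixed, user-chosen threshold."""
    return [locus for locus, freqs in freqs_by_locus.items()
            if fst(freqs) > threshold]


# Hypothetical allele frequencies in three local populations.
example = {
    "snp_a": [0.50, 0.52, 0.48],
    "snp_b": [0.10, 0.90, 0.50],
    "snp_c": [0.30, 0.35, 0.25],
}

print(outlier_loci(example, threshold=0.2))
```

Only the locus whose frequencies diverge strongly among populations is flagged. Loci with similar frequencies everywhere have Fst near zero.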
Custom Designed Microarrays (Agilent):
• Using 13,780 assembled contigs
• Used validated probes (selected best of 6 per contig)
• 60 bp probes randomly printed at least in triplicate
• 44,000 feature array, with 4 arrays per slide
• 2 dyes hybridized per array
• Randomized across population type
• Amplified RNA indirectly labeled using Alexa Fluor dyes (555 & 647) coupled to aaUTP incorporated during synthesis
• Scanned using GenePix 4000B
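As a quick sanity check on the array layout, the sketch below works through the feature budget. It assumes exactly one selected probe per contig printed exactly three times (the slide says "at least in triplicate") and one sample per dye channel. The constant names are invented for this example.

```python
# Figures taken from the array description; the names are illustrative.
CONTIGS = 13_780             # assembled contigs, one selected probe each (assumed)
REPLICATES = 3               # minimum replication stated on the slide
FEATURES_PER_ARRAY = 44_000
ARRAYS_PER_SLIDE = 4
DYES_PER_ARRAY = 2

probe_features = CONTIGS * REPLICATES
spare_features = FEATURES_PER_ARRAY - probe_features
samples_per_slide = ARRAYS_PER_SLIDE * DYES_PER_ARRAY  # one sample per dye (assumed)

print(f"features used by triplicated probes: {probe_features}")
print(f"features left for extra replicates or controls: {spare_features}")
print(f"samples hybridized per slide: {samples_per_slide}")
```

Under these assumptions, triplicating every probe fits within one 44,000 feature array and leaves a few thousand features spare.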
Wheat et al., in prep.
Population age: P = 0.02
• 2 day old females, 25 families from 25 different local populations (n = 65)
• Common garden reared for 2 generations
Assessing expression variation across 12,000 genes in:
Abdomen (n = 20)
Thorax (n = 34)
Head (n = 18), analysis in progress
What are the expression differences:
1. between new and old populations?
2. across PMR performance variation?
Popage fixed effects: dye, popage, bodymass, popage*bodymass
PMR fixed effects: dye, PMR
Random effects: slide, array(slide), spot, spot*array, individual
(FDR threshold = 5%) Using JMP Genomics (SAS)
Mixed Model Analysis: popage & peak metabolic rate
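Written out per gene, the two mixed models listed above might look like the following. This is one plausible reading, not the exact JMP Genomics specification. The symbols are assumptions: y is the normalized log intensity of a gene, fixed effects are Greek letters, and random effects are Latin letters, each taken as normally distributed with its own variance.

```latex
% Population-age model (fixed: dye, popage, bodymass, popage x bodymass)
\begin{aligned}
y &= \mu + \delta_{\text{dye}} + \alpha_{\text{popage}}
     + \beta\,\text{bodymass}
     + \gamma_{\text{popage}}\,\text{bodymass} \\
  &\quad + s_{\text{slide}} + a_{\text{array(slide)}} + t_{\text{spot}}
     + (ta)_{\text{spot}\times\text{array}} + u_{\text{individual}}
     + \varepsilon
\end{aligned}

% Peak metabolic rate model (fixed: dye, PMR), same random effects
\begin{aligned}
y &= \mu + \delta_{\text{dye}} + \beta_{\text{PMR}}\,\text{PMR} \\
  &\quad + s_{\text{slide}} + a_{\text{array(slide)}} + t_{\text{spot}}
     + (ta)_{\text{spot}\times\text{array}} + u_{\text{individual}}
     + \varepsilon
\end{aligned}
```

In this reading the quantity tested for each gene is the popage contrast (new minus old) in the first model and the PMR slope in the second.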
Mixed model analysis of Abdomen, ANOVA normalized
(Multiple test correction used FDR @ 5%, i.e. -log10(P) = 3.16)
221 genes differentially expressed (≈ 11 false positives)
Horizontal reference line drawn at -log10(p) = 3.16
[Figure: two volcano plots of -log10 P-value (0 to 9) for the abdomen; x-axes: Diff of PopAge = (new)-(old), and Estimate of popage: new - old]
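The FDR figures quoted for the abdomen analysis can be unpacked with a little arithmetic. The sketch below converts the -log10(P) cutoff of 3.16 back to a P-value and estimates the expected number of false positives among 221 discoveries at a 5% FDR. It also includes a plain Benjamini-Hochberg step-up procedure as one common way of controlling the FDR. Whether JMP Genomics used exactly this procedure is an assumption, and the ten P-values are invented for the example.

```python
FDR = 0.05
N_SIGNIFICANT = 221
NEG_LOG10_P_CUTOFF = 3.16

p_cutoff = 10 ** (-NEG_LOG10_P_CUTOFF)          # raw P-value at the cutoff
expected_false_positives = FDR * N_SIGNIFICANT  # about 11, as on the slide


def benjamini_hochberg(pvalues, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Step-up rule: find the largest rank k with p_(k) <= q * k / m and
    reject the k smallest P-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= q * rank / m:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical P-values for ten genes.
toy_pvalues = [0.001, 0.008, 0.039, 0.041, 0.042,
               0.060, 0.074, 0.205, 0.212, 0.216]
toy_rejected = benjamini_hochberg(toy_pvalues, q=FDR)

print(f"P-value at cutoff: {p_cutoff:.2e}")
print(f"expected false positives: {expected_false_positives:.1f}")
print(f"toy genes passing FDR: {sum(toy_rejected)}")
```

Note that several toy P-values below 0.05 do not pass. The threshold is far stricter than a per-test 5% level, just as the cutoff on the slide is.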
Significantly different genes (FDR threshold = 5%)
[Figure: Venn diagrams of significantly different genes for Peak Metabolic Rate (continuous) and Population Age (categorical); region counts as extracted: 623 3, 3, 188, 65, 6, 22, 1]
Wheat et al., in prep.
Abdomen:
• 33 genes overlap population age and PMR
  – Egg development
    • Ecdysteroid 22-phosphate
    • 3-dehydroecdysone 3alpha-reductase
  – Metabolism
    • Fructose 1,6-bisphosphate aldolase
    • Gallerin
    • Tyrosine aminotransferase
• Only 12 of 33 with homology inferred function
Thorax: no significant popage effects on expression
Biological validation: vitellogenin
• Egg yolk precursor protein
• Higher protein = more eggs
• Protein levels – new > old (P < 0.05)
• Microarray (mRNA levels) – new > old (P = 0.02)
Temperature treatments
Treatment              Hot     Standard    Cold
Temperature maximum    35°C    26°C        20°C
• Last instar larvae
• Reared in lab
• Study gene expression
[Figure: Venn diagram of genes differing in H vs. S and C vs. S; counts as extracted: 11405, 1066835 302]
[Figure: Hot, Standard, and Cold samples plotted along Comp1 (axis -0.95 to -0.70)]
Kvist et al., in prep.
Summary
• We’ve built, and are improving, our genomic tools:
  – Sequenced transcriptome & aiming for full length, better coverage
  – 1000’s of SNPs identified and being prep’d for high throughput analysis
  – Microarray validated and used to address ecological hypotheses
• Using tools to test hypotheses about expression differences
  – between population types?
  – larval temperature experience?
• Preliminary findings:
  – Metapopulation dynamics sort/maintain expression variation
  – Larval thermal experience matters, some shared responses to cold and heat
• Dispersal variation appears to arise from
  – abdomen fuel supply variation rather than thorax structural variation