Candidate Gene Resource Steering Committee Meeting July 25, 2006
Dec 19, 2015
Goals for Today
• Strengthen relationships among CARE investigators
• Define pilot project (phenotypes & SNPs)
• Establish principles of data release
• Discuss genotyping study design
• Select phenotypes to be analyzed
CARE Governance
• Steering committee
  – Representative of each CARE organization
  – Subcommittees: Data Release, Phenotypes, Study Design, Informatics, SNP Selection, DNA/Genotyping
• NHLBI staff
• NHLBI appointed oversight committee
CARE : timeline
• RFP released March 2005
• Response submitted July 15, 2005
• Awarded April 1, 2006
• Four year award
  – Y1: Create DNA and phenotype database
  – Y2: Genotyping
  – Y3 / 4: Joint analysis and data distribution
Resources Provided by NHLBI
• $18.3M over 4 years to create a resource to relate genotype-phenotype across cohorts:
  – Create a consortium among CARE cohorts
  – Database DNA and phenotypes
  – Genotype a common set of SNPs across cohorts
  – Create software tools to enable joint analysis
  – Data distribution as per CARE data release policy
  – Project management and coordination
    - PM hired: Deb Farlow
Areas for Discussion Today
• Data Release
• Study Design
• Phenotypes
NHLBI
Current state of genotyping technology
Presentation of informatics tools
Data release
• Data release policy to be established by CARE steering committee with NHLBI and local IRBs
• Broad proposed a secure, HIPAA-compliant web architecture to implement this policy and to enable an access-controlled environment for data sharing and analysis
Original CARE Study Design
• Candidate Gene Study
  – 50,000 samples
  – average 10 SNPs/gene x 1700 genes = 17,000 SNPs
  – Requirement: $0.01 / genotype (fully loaded)
• Whole Genome Association Study
  – 500 cases / 1,000 controls
  – At least 300,000 SNPs genome wide
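The candidate gene figures above can be checked with a few lines of arithmetic. This is a minimal sketch using only the numbers on the slide; the variable names are illustrative, and the resulting total agrees with the $8.5M shown for the candidate gene study on the "In Summary" slide.

```python
# Figures taken from the slide; variable names are illustrative only.
snps_per_gene = 10
genes = 1_700
samples = 50_000
cost_per_genotype_cents = 1  # $0.01 per genotype, fully loaded

snps = snps_per_gene * genes          # SNPs in the candidate gene panel
genotypes = snps * samples            # one genotype per SNP per sample
total_cost_usd = genotypes * cost_per_genotype_cents / 100

print(f"{snps:,} SNPs x {samples:,} samples = {genotypes:,} genotypes")
print(f"Total at $0.01/genotype: ${total_cost_usd:,.0f}")
```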
Candidate gene study
• Targeted genotyping technology has remained stable : same price and throughput as in approved proposal
• Key issue: criteria for selecting 17,000 candidate gene-based SNPs
  – biological hypotheses
Developments since RFP
• Whole genome scans promise new hypotheses for candidate genes
• Evaluation of coverage / performance of whole genome arrays
• Price for whole genome genotyping technology has improved
Whole genome scanning
• SHARE will genotype 15,000 people from NHLBI cohorts (FHS and TBA)
• RFA for 4-5 whole genome scans
• GAIN, WTCCC, etc.
• Implication: hypotheses that could be confirmed and extended by CARE
• Challenge: timing doesn’t synch up well with original CARE timeline
Do they work?
Platform                  Samples  Average call rate  Concordance with HapMap   Trio concordance
Affymetrix 500K (Broad)   1200     99.10%             48 CEU samples, 99.10%    60 trios, 99.9%
Illumina 317K (CIDR*)     1400     99.80%             8 CEU samples, 99.85%     10 trios, 99.85%
* from http://www.cidr.jhmi.edu/human_gwa.html
Do They Work at High Scale?
Recent Call Rate Data (at Broad)
Product     Chips    Call Rate
Affy 500K   12,000   98.7%
ILMN 317K   250      99.2%
In-Process QC test
HapMap sample vs HapMap
[Figure: Concordance (control vs HapMap, n=42); y-axis 97.50% to 100.00%, x-axis 0 to 45]
Avg = 99.62% over 7,947,748 comparisons
QC statistics: MS and T2D Scans
(counts, with % of total in parentheses)
                                    T2D Scan (Nsp)   T2D Scan (Sty)   MS Scan (Nsp)    MS Scan (Sty)
Samples attempted                   1530 (100%)      1558 (100%)      1117 (100%)      867 (100%)
Pass DM (0.26) >=85%                1474 (96%)       1476 (97%)       1040 (93%)       817 (94%)
Pass BRLMM >=95%                    1438 (94%)       1428 (93%)       1008 (90%)       792 (91%)
Avg call rate, passing samples      99.10%           99.00%           99.00%           98.70%
# Passing SNPs in passing samples   253,172 (97%)    230,816 (97%)    251,248 (96%)    228,972 (96.10%)
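The two QC rows above read as a two-stage sample filter: a sample must first pass a DM call-rate cutoff of 85%, then a BRLMM call-rate cutoff of 95%. The following is only a sketch of that reading. The `SampleQC` record, the field names, and the demo values are hypothetical and are not taken from the Broad pipeline.

```python
from dataclasses import dataclass

# Thresholds as stated on the slide; everything else here is hypothetical.
DM_MIN_CALL_RATE = 0.85
BRLMM_MIN_CALL_RATE = 0.95


@dataclass
class SampleQC:
    sample_id: str
    dm_call_rate: float      # call rate from the first-pass DM calls
    brlmm_call_rate: float   # call rate from the BRLMM calls


def passing_samples(samples):
    """Apply the DM cutoff first, then the BRLMM cutoff to the survivors."""
    stage1 = [s for s in samples if s.dm_call_rate >= DM_MIN_CALL_RATE]
    return [s for s in stage1 if s.brlmm_call_rate >= BRLMM_MIN_CALL_RATE]


# Made-up demo records, for illustration only.
demo = [
    SampleQC("S1", 0.93, 0.991),   # passes both stages
    SampleQC("S2", 0.80, 0.970),   # fails the DM cutoff
    SampleQC("S3", 0.90, 0.940),   # passes DM, fails BRLMM
]
print([s.sample_id for s in passing_samples(demo)])
```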
Genotyping Costs per Sample
[Figure: chip cost per sample ($0 to $1,800) over time, Jul-05 to Jun-07. Series: Affy 500K, ILMN 317K, ILMN 550K, ILMN 650Y, MIP (20K)]
WGAS: Then and Now
Original Plan
Product: Affymetrix 500K
Total cost per sample: $1600 (chip+reagents+equipment+labor+IDC)
Study Design: 500 cases / 1,000 controls
Budget = $2,400,000
WGAS: Then and Now
Now possible
Product: Affymetrix 500K
Total cost per sample: $530 (chip+reagents+equipment+labor+IDC)
Study Design: 4,500 samples
Budget = $2,400,000
WGAS: Then and Now
January 2007
Product: Affymetrix 500K
Total cost per sample: $410 (chip+reagents+equipment+labor+IDC)
Study Design: 5,800 samples
Budget = $2,400,000
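The sample counts on the three "Then and Now" slides follow from dividing the fixed budget by the total cost per sample. Here is a small sketch of that calculation; the function name is illustrative. The slides appear to round the results down to 4,500 and 5,800.

```python
def affordable_samples(budget_usd, cost_per_sample_usd):
    """Whole number of samples that fit within the budget."""
    return budget_usd // cost_per_sample_usd


BUDGET_USD = 2_400_000
# Total cost per sample (chip+reagents+equipment+labor+IDC), from the slides.
scenarios = {"Original plan": 1600, "Now possible": 530, "January 2007": 410}

for label, cost in scenarios.items():
    n = affordable_samples(BUDGET_USD, cost)
    print(f"{label}: ${cost}/sample -> {n:,} samples")
```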
In Summary
           Whole genome scan               Candidate gene study
Date       SNPs      Samples   Cost        SNPs     Samples   Cost
7/15/05    500,000   1,500     $2.4M       17,000   50,000    $8.5M
7/25/06    500,000   4,500     $2.4M       17,000   50,000    $8.5M
1/07       500,000   5,800     $2.4M       17,000   50,000    $8.5M
Conclusions: genotyping
• Targeted genotyping (custom set of candidate genes) stable @ $0.01 / gt
• Timing of candidate gene selection
• Improved cost and performance of whole genome arrays @ $0.001 / gt
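The per-genotype figures in these conclusions can be reproduced from the per-sample costs quoted earlier. This sketch assumes the $0.001 figure corresponds to the $530 per-sample cost spread over 500,000 SNPs; the slide does not say which per-sample cost it used, and the function name is illustrative.

```python
def cost_per_genotype(cost_per_sample_usd, snps_per_sample):
    """Per-genotype cost when one chip yields snps_per_sample genotypes."""
    return cost_per_sample_usd / snps_per_sample


TARGETED_USD_PER_GT = 0.01                      # targeted genotyping, from the slide
whole_genome = cost_per_genotype(530, 500_000)  # assumed inputs, see lead-in
ratio = TARGETED_USD_PER_GT / whole_genome

print(f"Whole genome array: ${whole_genome:.5f} per genotype")
print(f"Targeted genotyping costs about {ratio:.1f}x more per genotype")
```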
High Level Workflow – for CaRE
[Diagram: workflow boxes]
• Upload Samples, Peds, Individuals, Phenotypes
• Create Experiments (Samples x Features)
• Summarize/Filter
• PLINK
• Data Vault
• QC/Curate Results
• Design and Execute Experiments
• ProjectDB
• LIMS DBs
• BSP DB
• FeatureDB
• Association & Statistics Viewers
• Cohort’s Custom Algorithms, Viewers
• Web Services
• Data Compile
• Analysis: Gene Pattern + CaRE analysis tools
• Production: BSP/GAP + CaRE enhancements
Designing a Pilot
• A trial run for DNA quality, genotyping, phenotype and joint analysis, and publication
• Scale and content of pilot to be refined, topic for today’s discussion sessions
Our shared aspiration: the greatest genetic epidemiology experiment to date
CSSCD
How?
Smaller format
BRLMM
Sequence Variability (DNA Analysis)
A/A, A/B, B/B
Mismatch probes not needed
Fewer probes needed
Single format
Coverage of Common Variants by Whole-genome Products
Tag SNPs
Affymetrix Mapping 500K GeneChip
Illumina HumanHap300 BeadChip
Coverage Mostly Provided by Pairwise Correlations
[Figure: example haplotypes showing alleles at neighboring SNPs]
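Pairwise correlation between two SNPs is usually summarized as r², computed from allele and haplotype frequencies. The following is a minimal sketch of that standard calculation for two biallelic SNPs, assuming phased haplotypes are available. The function name and the toy haplotypes are illustrative and are not taken from the slide.

```python
def r_squared(snp_a, snp_b):
    """r^2 between two biallelic SNPs.

    snp_a and snp_b hold one allele per phased haplotype, in the same
    haplotype order (for example "AAGG" and "TTCC").
    """
    n = len(snp_a)
    if n == 0 or n != len(snp_b):
        raise ValueError("need two equal-length, non-empty allele sequences")
    ref_a, ref_b = snp_a[0], snp_b[0]          # arbitrary reference alleles
    p_a = sum(x == ref_a for x in snp_a) / n   # frequency of ref_a
    p_b = sum(y == ref_b for y in snp_b) / n   # frequency of ref_b
    p_ab = sum(x == ref_a and y == ref_b for x, y in zip(snp_a, snp_b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:                             # a monomorphic SNP carries no signal
        return 0.0
    d = p_ab - p_a * p_b                       # linkage disequilibrium coefficient D
    return d * d / denom


print(r_squared("AAGG", "TTCC"))  # perfectly correlated pair
print(r_squared("AAGG", "TCTC"))  # uncorrelated pair
```

A SNP with r² near 1 to a genotyped SNP is effectively measured by proxy, which is how an array can cover variants it does not type directly.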
Specified Multimarker Tests Improve Effective Coverage
[Figure: example haplotypes showing a multimarker combination of SNP alleles]
Coverage of the genome
[Figure: YRI Coverage. Fraction of common SNPs captured at r² of 0.8 (0% to 100%), by array: Affy100k, Affy500k, Ilmn300k, Ilmn550k. Series: single markers, 2-marker predictors]
[Figure: CEU Coverage. Fraction of common SNPs captured at r² of 0.8 (0% to 100%), by array: Affy100k, Affy500k, Ilmn300k, Ilmn550k. Series: single markers, 2-marker predictors]
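The coverage metric in these charts, the fraction of common SNPs captured at r² of 0.8, can be sketched as follows for the single-marker case. The function name, the input layout (a dictionary of precomputed pairwise r² values), and the SNP identifiers are hypothetical. The 2-marker predictors shown in the charts would need haplotype-based tests that are not covered here.

```python
def single_marker_coverage(common_snps, array_snps, pairwise_r2, threshold=0.8):
    """Fraction of common SNPs on the array or tagged by an array SNP.

    pairwise_r2 maps frozenset({snp_1, snp_2}) to their r^2; pairs that
    are missing from the mapping are treated as r^2 = 0.
    """
    if not common_snps:
        raise ValueError("common_snps must not be empty")
    on_array = set(array_snps)
    captured = 0
    for snp in common_snps:
        if snp in on_array:                    # typed directly
            captured += 1
            continue
        best = max(
            (pairwise_r2.get(frozenset((snp, tag)), 0.0) for tag in on_array),
            default=0.0,
        )
        if best >= threshold:                  # captured by proxy
            captured += 1
    return captured / len(common_snps)


# Made-up example: four common SNPs, one of them on the array.
common = ["snp1", "snp2", "snp3", "snp4"]
r2 = {
    frozenset(("snp1", "snp2")): 0.95,
    frozenset(("snp1", "snp3")): 0.50,
}
print(single_marker_coverage(common, ["snp1"], r2))
```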