A Bayesian clustering approach for detecting gene-gene interactions in high-dimensional genotype data
Sui-Pi Chen and Guan-Hua Huang
Institute of Statistics, National Chiao Tung University, Hsinchu, Taiwan
2012.8.16
Outline
1 Motivation
2 Methods for detecting gene-gene interaction
3 Proposed method: ABCDE
4 Simulation
5 Real data
6 Efficient Stochastic Search
7 Conclusion
Motivation
[Figure: diagram with the labels cultural factors, individual environment, polygenic background and common environment.]
Single nucleotide polymorphism (SNP)
A DNA sequence variation
Two alleles: A and a
Treating SNPs as categorical features that have three possible values: AA, Aa, aa.
Relabel: AA (2), Aa (1), aa (0).
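The relabelling above can be sketched in a few lines of Python. The function name `encode_genotype` and the two-character string representation of a genotype are assumptions made for illustration; the slides only give the mapping AA to 2, Aa to 1, aa to 0.

```python
# Hypothetical helper: encode a genotype by counting copies of allele "A".
def encode_genotype(genotype: str) -> int:
    """Return 2 for AA, 1 for Aa (or aA), 0 for aa."""
    if len(genotype) != 2 or set(genotype) - {"A", "a"}:
        raise ValueError(f"unexpected genotype: {genotype!r}")
    return genotype.count("A")

genotypes = ["AA", "Aa", "aa", "aA"]
codes = [encode_genotype(g) for g in genotypes]
print(codes)
```

Counting copies of one allele gives the 0/1/2 coding directly and treats Aa and aA as the same heterozygous genotype.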
What is gene-gene interaction (epistasis)?
The effects of a given gene on a biological trait are masked or enhanced by one or more other genes.
An increasing body of evidence suggests that epistasis plays an important role in susceptibility to complex human diseases, such as type 1 diabetes, breast cancer, obesity, and schizophrenia.
Further evidence confirms that some loci display interaction effects without displaying marginal effects.
Methods for detecting gene-gene interaction
[Figure: approaches to detecting epistasis: traditional methods, two-stage methods, data-mining methods, Bayesian model selection.]
Traditional methods
– Logistic regression, contingency table χ² test.
– They do not include interaction terms for loci without main effects.
– For high-dimensional data with high-order interactions, the contingency table has many empty cells.
Two-stage methods
– A subset of loci that pass some single-locus significance threshold is chosen as the “filtered” subset.
– An exhaustive search of all two-locus or higher-order interactions is carried out on the “filtered” subset.
Data-mining methods
– Nonparametric.
– No exhaustive search.
– Multifactor Dimensionality Reduction (MDR).
Bayesian model selection
– Bayesian epistasis association mapping (BEAM).
– Algorithm via Bayesian Clustering to Detect Epistasis (ABCDE).
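To illustrate the empty-cell problem noted for contingency tables, the following sketch counts the cells of a k-locus table. With three genotypes per SNP and two disease states there are 2 × 3^k cells. The sample size of 2000 is an assumption chosen only for illustration.

```python
# Each SNP has 3 genotypes (0, 1, 2); a k-locus genotype table crossed
# with case/control status therefore has 2 * 3**k cells.
def n_cells(k: int) -> int:
    return 2 * 3 ** k

n_samples = 2000  # assumed sample size, for illustration only
for k in range(1, 9):
    cells = n_cells(k)
    # Average count per cell if the samples were spread evenly.
    print(k, cells, round(n_samples / cells, 2))
```

At k = 7 the table already has 4374 cells, more than twice the assumed 2000 samples, so most cells must be empty.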
Multifactor Dimensionality Reduction (MDR)
Step 1: Select 2-locus combinations, e.g. SNPs 1, 2, 3 give (1,2), (1,3), (2,3).
Step 2: Calculate case-control ratios for each multilocus genotype (SNP 1 × SNP 2).
Step 3: Identify high-risk multilocus genotypes.
Step 4: Cross-validation: calculate the prediction error (PE).
Step 5: Average PE.
Step 6: Select the best 2-locus model.
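The steps above can be sketched as follows. This is a minimal illustration rather than the reference MDR implementation: the function names, the case:control threshold of 1.0, the interleaved fold assignment, and the toy data are all assumptions.

```python
from collections import Counter
from itertools import combinations, product

def fit_high_risk(X, y, pair, threshold=1.0):
    """Steps 2-3: a two-locus genotype is high-risk when its
    case:control ratio exceeds `threshold` (an assumed cut-off)."""
    cases, controls = Counter(), Counter()
    for row, label in zip(X, y):
        cell = (row[pair[0]], row[pair[1]])
        if label == 1:
            cases[cell] += 1
        else:
            controls[cell] += 1
    return {c for c in cases if cases[c] > threshold * controls[c]}

def prediction_error(X, y, pair, high_risk):
    """Fraction of samples misclassified by the high/low-risk rule."""
    wrong = 0
    for row, label in zip(X, y):
        predicted_case = (row[pair[0]], row[pair[1]]) in high_risk
        wrong += predicted_case != (label == 1)
    return wrong / len(y)

def mdr_best_pair(X, y, n_folds=5):
    """Steps 1 and 4-6: score each SNP pair by its average
    cross-validated prediction error and return the best pair."""
    n = len(y)
    folds = [list(range(f, n, n_folds)) for f in range(n_folds)]
    scores = {}
    for pair in combinations(range(len(X[0])), 2):
        errors = []
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in range(n) if i not in held_out]
            high_risk = fit_high_risk(
                [X[i] for i in train_idx], [y[i] for i in train_idx], pair
            )
            errors.append(prediction_error(
                [X[i] for i in test_idx], [y[i] for i in test_idx], pair, high_risk
            ))
        scores[pair] = sum(errors) / n_folds
    return min(scores, key=scores.get), scores

# Toy data (assumed): disease status depends only on SNPs 0 and 1 jointly.
X = [list(g) for _ in range(4) for g in product(range(3), repeat=3)]
y = [int((row[0] == 1) != (row[1] == 1)) for row in X]
best_pair, scores = mdr_best_pair(X, y)
print(best_pair, scores[best_pair])
```

On this toy data the pair (0, 1) is recovered with zero average prediction error, while pairs that include the noise SNP 2 misclassify some held-out samples.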
MDR
From all best models, the model with the minimal average prediction error is the final best model.
MDR is a data reduction strategy that is nonparametric and genetic-model-free.
Permutation test for the final best model.
Applying MDR to 1000 permuted datasets, we use the PEs of the resulting 1000 final best models to create an empirical distribution, against which the PE of the final best model for the original data is compared to estimate a p-value.
Note: this permutation test accounts for the variability introduced by the search.
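A minimal sketch of such a permutation test is given below. The callable `best_pe_fn` is hypothetical: it stands for rerunning the whole model search on a dataset and returning the PE of its final best model, which is what lets the test include the variation of the search. The add-one correction is also an assumption, not something stated on the slides.

```python
import random

def permutation_p_value(observed_pe, X, y, best_pe_fn, n_perm=1000, seed=0):
    """Empirical p-value: share of label-permuted datasets whose final
    best model has a PE at least as small as the observed one."""
    rng = random.Random(seed)
    labels = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any genotype-phenotype association
        if best_pe_fn(X, labels) <= observed_pe:
            hits += 1
    # Add-one correction (an assumption here) avoids reporting p = 0.
    return (hits + 1) / (n_perm + 1)
```

Because the full search is repeated on every permuted dataset, the cost is roughly `n_perm` times that of a single analysis.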
Simulation
Single-set models
[Figure: diagrams of Models 1 to 5, linking SNP sets to disease. The SNP sets shown are: 1 and 2 separately; (1,2); (1,2,3); (1,2,3,4,5,6).]
Result for Single-set models
[Figure: power (0.0 to 1.0) against MAF (0.05, 0.1, 0.2, 0.5) for ABCDE and BEAM, one panel each for Models 1 to 5.]
Multiple-set models and LD-extend models
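[Figure: diagrams of Models 6 to 10, linking SNP sets to disease. Model 6: (1,2) and (3,4). Model 7: (1,2) and (3,4,5). Model 8: (1,2,3), 4 and 5. Model 9: (1,2) and (3,4), with SNPs 5 and 6. Model 10: (1,2) and (3,4), with SNPs 5, 6, 7 and 8.]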
Result for Multiple-set models and LD-extend models
[Figure: power (0.0 to 1.0) against MAF (0.05, 0.1, 0.2, 0.5), one panel each for Models 6 to 10.]
Real data
Detect pairwise and/or higher-order SNP interactions and understand the genetic architecture of schizophrenia through ABCDE and BEAM.
1512 individuals, including 912 schizophrenia cases and 600 controls.
Gene      Chr.   Number of SNPs
DISC1     1q     16
LMBRD1    6q     11
DPYSL2    8p     14
TRIM35    8p     10
PTK2B     8p     19
NRG1      8p     10
DAO       12q    5
G72       13q    5
RASD2     22q    4
CACNG2    22q    6
Flow chart: quality control
1512 samples (912 cases, 600 controls); 100 SNPs (10 genes)
Quality control (Haploview)
- Exclusion criterion for samples: individuals with GCR < 70%
- Exclusion criteria for SNPs: HW p-value < 0.0001, GCR < 75%, MAF < 0.005
1509 samples (909 cases, 600 controls); 95 SNPs (10 genes)
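Part of the quality-control step can be sketched as below. The sketch reads GCR as genotype call rate (an assumption, since the slides do not expand the abbreviation), applies the sample filter before the SNP filters (also an assumption), and leaves out the Hardy-Weinberg test, which the slides run in Haploview. Function names and the `None` marker for missing calls are illustrative.

```python
def call_rate(values):
    """Share of non-missing genotype calls (None marks a missing call)."""
    return sum(v is not None for v in values) / len(values)

def minor_allele_frequency(values):
    """MAF from 0/1/2 genotype codes, ignoring missing calls."""
    called = [v for v in values if v is not None]
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)

def quality_control(genotypes, sample_gcr=0.70, snp_gcr=0.75, min_maf=0.005):
    """Drop low-call-rate samples, then low-call-rate or rare SNPs.
    Assumes a non-empty matrix in which some samples pass the filter."""
    samples = [row for row in genotypes if call_rate(row) >= sample_gcr]
    keep = []
    for j in range(len(genotypes[0])):
        column = [row[j] for row in samples]
        # The call-rate check runs first, so MAF is never computed
        # on a column with no called genotypes.
        if call_rate(column) >= snp_gcr and minor_allele_frequency(column) >= min_maf:
            keep.append(j)
    return [[row[j] for j in keep] for row in samples], keep
```

The default thresholds are the ones listed in the flow chart: 70% for samples, and 75% and 0.005 for SNPs.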
Flow chart
All SNPs passing QC (95 SNPs)
Tag SNPs (78 SNPs) (Haploview)
Imputation of missing data (MDR Data Tool)
BEAM: run for 8 different hyperparameter settings
ABCDE: run for 9 different hyperparameter settings
Validation test: B-statistic; cross-validation permutation test (BA, PA)
Detection of gene-gene interaction
To obtain robust results, we adopted a two-stage approach.
Candidate SNPs or SNP subsets hit by ABCDE (BEAM): a SNP or SNP subset that is hit with posterior probability higher than a predefined cut-off of 0.3 in at least 3 of the different settings.
Susceptibility SNPs: permutation test (p-value < 0.001) or B-statistic (p-value < 0.1).
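The candidate-selection rule of the first stage can be sketched as follows. The input layout (one dictionary per hyperparameter setting, mapping a SNP subset to its posterior probability) and the SNP names are assumptions for illustration; only the cut-off of 0.3 and the minimum of 3 settings come from the slides.

```python
def candidate_sets(posteriors_by_setting, cutoff=0.3, min_settings=3):
    """Return SNP subsets whose posterior probability exceeds `cutoff`
    in at least `min_settings` of the hyperparameter settings."""
    hits = {}
    for posteriors in posteriors_by_setting:
        for snp_set, prob in posteriors.items():
            if prob > cutoff:
                hits[snp_set] = hits.get(snp_set, 0) + 1
    return {s for s, n in hits.items() if n >= min_settings}

# Hypothetical posterior summaries from four settings.
pair = frozenset({"rs1", "rs2"})
single = frozenset({"rs3"})
runs = [{pair: 0.5, single: 0.4}, {pair: 0.31, single: 0.2}, {pair: 0.9}, {single: 0.35}]
selected = candidate_sets(runs)
print(selected)
```

Here the pair passes the cut-off in three settings and is kept, while the single SNP passes in only two and is dropped.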
Result
Table: Identified significant epistatic sets by BEAM using all 95 SNPs.
SNP Chr. Gene B-statistic(p-value) BA(p-value) PA(p-value)
Efficient Stochastic Search
Although the GWCR algorithm works well on high-dimensional data (simulated data with 1000 SNPs from 2000 cases and 2000 controls), genome-scale gene-gene interaction analysis is still infeasible.
To improve the mixing of the chains: restricted Gibbs split-merge procedure (RGSM) (Jain and Neal, 2004).
To move easily between local modes: equi-energy (EE) sampler (Kou, Zhou and Wong, 2006).
Restricted Gibbs split merge procedure (RGSM)
Simple random split-merge procedure:
- The split proposals are unlikely to be appropriate, and hence are unlikely to be accepted.
Restricted Gibbs split-merge procedure (RGSM):
- Employs a more complex proposal distribution obtained by running Gibbs sampling on a subset of the data.
- Split proposals made with reference to the observed data are more likely to be accepted.
Outline of Restricted Gibbs split merge procedure
Step 1: Random partition
Step 2: Split or merge
Step 3: Restricted Gibbs sampling (t)
Step 4: Restricted Gibbs sampling (1): proposal distribution
[Figure: example with 10 items partitioned as {1}, {2,5}, {3,4,6,8,9}, {7,10}. The cluster C = {3,4,6,8,9} is split at random into {3,6} and {9,4,8}; restricted Gibbs sampling then reassigns items 6, 4 and 8 one at a time between the two groups.]
Equi-Energy (EE) Sampler
The distribution of a system in thermal equilibrium at temperature T is described by the Boltzmann distribution,
$$p(h) = \frac{1}{Z(T)} \exp\left(-\frac{q(h)}{T}\right), \qquad \text{where } Z(T) = \sum_h \exp\left(-\frac{q(h)}{T}\right).$$
p(h): posterior distribution.
q(h): −log(p(h))
$$1 = T_0 < T_1 < \cdots < T_K$$
$$p_i(h) = \frac{1}{Z(T_i)} \exp\left(-\frac{q(h)}{T_i}\right)$$
The idea is to perform sampling at different temperatures; higher temperatures make the distribution flatter.
[Figure: chains H^(K), H^(K−1), ..., H^(0), each with a burn-in period followed by D̂^(K), D̂^(K−1), ..., D̂^(0).]
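The flattening effect of temperature can be checked numerically. The sketch below applies the Boltzmann form above to a toy three-state posterior, which is an assumed example rather than data from the slides; it requires every probability to be strictly positive.

```python
import math

def tempered(p, T):
    """Return p_T(h) proportional to exp(-q(h)/T), with q(h) = -log p(h)."""
    weights = [math.exp(math.log(ph) / T) for ph in p]  # exp(-q/T) = p**(1/T)
    Z = sum(weights)  # normalising constant Z(T)
    return [w / Z for w in weights]

posterior = [0.90, 0.09, 0.01]  # assumed toy posterior over three states
for T in (1, 2, 10):
    print(T, [round(x, 3) for x in tempered(posterior, T)])
```

At T = 1 the posterior itself is recovered; as T grows the three probabilities move towards each other, which is what lets high-temperature chains cross between modes.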
$$q(h) = -\log p(h) \in [E_k, E_{k+1})$$
$$E_0 < E_1 < E_2 < \cdots < E_K < E_{K+1} = \infty$$
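Assigning a state to its energy ring amounts to locating q(h) among the sorted energy levels. A minimal sketch follows; the function name `energy_ring` and the level values are assumptions for illustration.

```python
import bisect
import math

def energy_ring(prob, levels):
    """Index k with levels[k] <= q < levels[k + 1], where q = -log(prob)
    and the last ring extends to infinity."""
    q = -math.log(prob)
    if q < levels[0]:
        raise ValueError("energy below the lowest level E_0")
    return bisect.bisect_right(levels, q) - 1

levels = [0.0, 1.0, 2.0, 4.0]  # assumed E_0 < E_1 < E_2 < E_3, with E_4 = infinity
for prob in (0.6, 0.2, 0.05, 1e-5):
    print(prob, energy_ring(prob, levels))
```

Using `bisect_right` keeps the intervals closed on the left and open on the right, matching [E_k, E_{k+1}).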
Hybrid-GRE Sampler
The Hybrid-GRE sampler consists of:
1. Global move: EE sampler.
2. Local move: GWCR(1)+RGSM(1).
Chain H^(K): local moves only.
Other chains: the probability of the global move is increasing.
[Figure: diagram labelled EE and local.]
Result for Hybrid-GRE sampler
[Figure: trace plots of log likelihood against Iterations/L (20000 to 24000). GWCR panel: y-axis from −76600 to −76400. Hybrid-GRE panel: y-axis from −75000 to −74600.]
Conclusion
We propose the ABCDE algorithm, which can characterize all explicit (interaction) effects, regardless of the number of groups.
We further develop permutation tests to validate the disease association of SNP subsets selected by ABCDE.
Applying ABCDE to the real data, we identify several known and novel schizophrenia-associated SNPs and sets of SNPs.
We may develop a parallel implementation of ABCDE, making it an algorithm for large-scale epistatic interaction mapping, including genome-wide studies with hundreds of thousands of markers.