Evaluation of methods in gene association studies: yet another case for Bayesian networks
Gábor Hullám, Péter Antal, András Falus and Csaba Szalai
Department of Measurement and Information Systems, Budapest University of Technology and Economics
Department of Genetics, Cell and Immunobiology, Semmelweis University
Genetic association studies (GAS)
A Bayesian approach to GAS
Bayesian networks in GAS
Evaluation of methods
Motivation: Exploring the variome
Number of gene association studies: GWAS ~100, PGAS ~10K
Current challenge: the discovery of epistasis
Statistical epistasis: non-linear interaction of genes
The goal is the exploration of the explanatory variables of the target variable(s) and of the interaction of the explanatory variables.
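The following is a minimal sketch of why statistical epistasis is hard to discover one variable at a time. It assumes a hypothetical, idealized two-locus model that is not data from this study: binary loci X1 and X2, with all genotype combinations equally likely, and phenotype Y = X1 XOR X2. In this model each locus alone carries zero mutual information about the phenotype, while the pair determines it completely.

```python
from itertools import product
from math import log2

def mutual_information(joint):
    """I(A; B) in bits from a dict {(a, b): p(a, b)}."""
    p_a, p_b = {}, {}
    for (a, b), p in joint.items():
        p_a[a] = p_a.get(a, 0.0) + p
        p_b[b] = p_b.get(b, 0.0) + p
    return sum(p * log2(p / (p_a[a] * p_b[b]))
               for (a, b), p in joint.items() if p > 0)

# Hypothetical idealized model: two loci coded 0/1, each genotype
# combination equally likely, phenotype Y = X1 XOR X2 (pure epistasis)
genotypes = {(x1, x2): 0.25 for x1, x2 in product((0, 1), repeat=2)}

def joint_with_phenotype(view):
    """Joint distribution of view(x1, x2) and the phenotype Y."""
    joint = {}
    for (x1, x2), p in genotypes.items():
        key = (view(x1, x2), x1 ^ x2)
        joint[key] = joint.get(key, 0.0) + p
    return joint

mi_x1 = mutual_information(joint_with_phenotype(lambda x1, x2: x1))
mi_x2 = mutual_information(joint_with_phenotype(lambda x1, x2: x2))
mi_pair = mutual_information(joint_with_phenotype(lambda x1, x2: (x1, x2)))
print(mi_x1, mi_x2, mi_pair)  # 0.0 0.0 1.0
```

A univariate scan would rank both loci as irrelevant here. Only a multivariate model that represents the interaction detects them.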
Genetic association concepts can be (partially) formalized as machine learning concepts and as Bayesian network concepts
The model class: Bayesian networks
Advantages of GA-to-BN – 1
Strong relevance - direct association: clear semantics and a dedicated goal for the explicit, faithful representation of strongly relevant (e.g. non-transitive) relations
Graphical representation: it offers a better overview of the dependence-independence structure, e.g. of interactions and conditional relevance.
Multiple targets: it inherently works for multiple targets.
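The following is a minimal sketch of the strong-relevance idea. It relies on the standard result that, under faithfulness, the strongly relevant variables of a target are exactly its Markov blanket: its parents, its children, and its children's other parents. The DAG and the node names (SNP1 to SNP4, IgE) are hypothetical and not the study's model. SNP4 influences asthma only through SNP1, so it is associated but not strongly relevant. SNP3 enters the asthma blanket only as a co-parent of IgE, which is a relation that a univariate association scan would not reveal.

```python
def markov_blanket(parents, target):
    """Parents, children and the children's other parents of `target`.

    `parents` maps every node of a DAG to the set of its parents.
    """
    children = {node for node, ps in parents.items() if target in ps}
    spouses = {p for child in children for p in parents[child]} - {target}
    return set(parents[target]) | children | spouses

# Hypothetical DAG: SNP4 acts on Asthma only via SNP1 (transitive relation)
dag = {
    "SNP4": set(),
    "SNP1": {"SNP4"},
    "SNP2": set(),
    "SNP3": set(),
    "Asthma": {"SNP1", "SNP2"},
    "IgE": {"Asthma", "SNP3"},
}

print(sorted(markov_blanket(dag, "Asthma")))  # ['IgE', 'SNP1', 'SNP2', 'SNP3']
print(sorted(markov_blanket(dag, "IgE")))     # ['Asthma', 'SNP3']
```

Because the same DAG answers the question for every node, the blankets of several targets (here Asthma and IgE) come from one model.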
Advantages of GA-to-BN – 2
Incomplete data: it offers integrated management of incomplete data within Bayesian inference.
Causality: model-based causal interpretation of associations
Haplotype level: offers an integrated approach to haplotype reconstruction and association analysis (assuming unphased genotype data)
Challenges of applying BNs in GAS
High computational complexity
High sample complexity
Bayesian statistics
Bayesian model averaging
Feature posterior
Goal: approximate the full-scale summation (integral)
A solution: Metropolis coupled Markov chain Monte Carlo (MCMCMC)
$$P(F=f) = \sum_{G:\,F(G)=f} P(G)$$
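The feature posterior can be estimated by sampling structures and counting how often the feature holds. The following is a simplified, hypothetical illustration of Metropolis coupled MCMC (parallel tempering). It does not sample full DAGs. Instead, the state is a candidate Markov blanket set over four hypothetical SNPs, and `log_score` is an invented unnormalized log posterior with one interaction term. Several chains run at inverse temperatures `betas`, and adjacent chains occasionally swap states. Only the cold chain (beta = 1) is used for the estimate. On this tiny space the exact sum can also be computed for comparison.

```python
import math
import random
from itertools import product

K = 4  # number of candidate SNPs (hypothetical)

def log_score(state):
    """Invented unnormalized log posterior of a candidate Markov blanket set.

    `state` holds one 0/1 flag per SNP (1 = SNP is in the set). A real
    analysis would score Bayesian network structures against data instead.
    """
    main = (1.5, -0.5, 0.2, -2.0)
    score = sum(w * x for w, x in zip(main, state))
    return score + 2.5 * state[1] * state[2]  # interaction between SNPs 1 and 2

def exact_feature_posterior():
    """P(SNP j in set) by summing the normalized posterior over all 2^K sets."""
    states = list(product((0, 1), repeat=K))
    weights = [math.exp(log_score(s)) for s in states]
    z = sum(weights)
    return [sum(w for s, w in zip(states, weights) if s[j]) / z for j in range(K)]

def accept(log_ratio, rng):
    """Metropolis acceptance test for a log acceptance ratio."""
    return rng.random() < math.exp(min(0.0, log_ratio))

def mc3_feature_posterior(n_iter=20000, burn_in=1000, betas=(1.0, 0.5, 0.25), seed=0):
    rng = random.Random(seed)
    chains = [[0] * K for _ in betas]  # every chain starts from the empty set
    counts = [0] * K
    for it in range(burn_in + n_iter):
        # Local Metropolis move in each chain: toggle one SNP (symmetric
        # proposal), targeting exp(beta * log_score)
        for beta, state in zip(betas, chains):
            j = rng.randrange(K)
            old = log_score(state)
            state[j] ^= 1
            if not accept(beta * (log_score(state) - old), rng):
                state[j] ^= 1  # rejected: undo the toggle
        # Coupling move: propose swapping the states of two adjacent chains
        i = rng.randrange(len(betas) - 1)
        s_i, s_next = log_score(chains[i]), log_score(chains[i + 1])
        if accept((betas[i] - betas[i + 1]) * (s_next - s_i), rng):
            chains[i], chains[i + 1] = chains[i + 1], chains[i]
        # Only the cold chain (beta = 1) samples the posterior itself
        if it >= burn_in:
            for j in range(K):
                counts[j] += chains[0][j]
    return [c / n_iter for c in counts]

exact = exact_feature_posterior()
estimate = mc3_feature_posterior()
for j, (p, q) in enumerate(zip(exact, estimate)):
    print(f"SNP{j}: exact {p:.3f}, MCMCMC {q:.3f}")
```

The interaction term shows why averaging over structures matters. SNP1 has a negative main-effect weight, yet its membership posterior is high (about 0.81) because of its interaction with SNP2.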
Uncertainty in multivariate analysis
Automated correction for “multiple testing”: the measure of uncertainty at a given level automatically indicates its applicability.
Prior incorporation: better prior incorporation both at parameter and structural levels.
Post fusion: better semantics for the construction of meta probabilistic knowledge bases.
Normative uncertainty for model properties (cf. bootstrap)
The basis for comparison
Our approach is a model-based exploration of the underlying structure (note: multiple targets, causal and direct aspects) ≠ prediction of class labels
Comparison of GAS tools
Dedicated GAS tools: MDR, …
General purpose FSS tools: Causal Explorer, …
Application domain: The genomic background of asthma
A moderate number of clinical variables (in the range of 50)
Hundreds of genotypic SNP variables for each patient
Thousands of gene expression measurements
Asthma: complex disease mechanism. Half of the patients do not respond well to current treatments. Unknown pathways in the asthmatic process.
Evaluation on an artificial data set
Artificial model based on a real-world domain: the genomic background of asthma
The real data set consists of 113 SNPs and 1117 samples.
Summary
General BN representation is feasible and gives superior performance for PGAS.
Bayesian statistics allows the quantification of the applicability of BNs.
Special extensions are necessary for:
multiple targets;
combined discovery of relevance and interactions (MBM, MBS, MBG);
scalable multivariate analysis (k-MBS concept);
feature aggregation.
Antal et al.: A Bayesian View of Challenges in Feature Selection: Multilevel Analysis, Feature Aggregation, Multiple Targets, Redundancy and Interaction, JMLR Workshop and Conference Proceedings
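The following sketch assumes that k-MBS denotes the posterior probability that a given set of k variables is contained in the target's Markov blanket set. Under that reading, k = 1 recovers Markov blanket membership (MBM). The quantity can then be aggregated directly from a posterior over Markov blanket sets. The posterior used here is invented for illustration. In an analysis like the one above, these probabilities would come from the structure sampling.

```python
from itertools import combinations

# Invented posterior over Markov blanket sets of one target (sums to 1)
mbs_posterior = {
    frozenset({"SNP1", "SNP2"}): 0.40,
    frozenset({"SNP1", "SNP2", "SNP3"}): 0.25,
    frozenset({"SNP2", "SNP3"}): 0.20,
    frozenset({"SNP1"}): 0.15,
}

def k_mbs_posterior(posterior, k):
    """Posterior that each k-subset of variables lies inside the Markov blanket set."""
    result = {}
    for mb_set, p in posterior.items():
        for subset in combinations(sorted(mb_set), k):
            result[subset] = result.get(subset, 0.0) + p
    return result

print(k_mbs_posterior(mbs_posterior, 1))  # k = 1: Markov blanket membership (MBM)
print(k_mbs_posterior(mbs_posterior, 2))  # k = 2: pairs inside the blanket
```

Moving from k = 1 to larger k trades scalability for multivariate detail. Here the pair (SNP1, SNP2) has posterior 0.65, which is below the MBM of either member.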
Future work
Specific local models (GA-specific local models)
Integrated missing data management and GA analysis (cf. imputation)
Noisy genotyping, probabilistic data (see poster)
Integrated haplotype reconstruction (see poster)
Integrated study design and analysis (see