Big Data Network Genomics
Network Inference and Perturbation to Study Chemical-Mediated Cancer Induction
Stefano Monti [email protected]
Section of Computational BioMedicine, Boston University School of Medicine
Biostatistics, BUSPH
Bioinformatics Program, BU
Graduate Program in Genetics & Genomics, BU
Broad Institute of MIT & Harvard
Big Data Network Genomics Network Inference and Perturbation
Abstract
Domain specific: Development and application of novel methods of network inference and differential analysis from multiple genomic data types toward the elucidation of a chemical's mechanism(s) of cancer induction
Generalization: Development and application of novel methods of network inference and differential analysis from high-dimensional data types toward the elucidation of functionally relevant modules
The Motivating Problem
Goals: Development of “Carcinogenicity Biomarker(s)”
Chemical → Carcinogenicity Prediction Model → Carcinogen / Non-carcinogen
Understand why: pathways affected, driver alterations, biomarkers, …
Manuscript under Review
Goals: Development of “Carcinogenicity Biomarker(s)”
Chemical → Carcinogenicity Prediction Model → Carcinogen / Non-carcinogen
[Figure: data matrix with rows gene1, gene2, …, gene7, … and columns grouped into Non-carcinogens and Carcinogens]
To generate this ‘matrix’, 100,000s of experiments need to be performed and 1,000s of controls generated
In Progress: high-throughput data generation
384-well plates, 100,000s of profiles
Phase I: 24 plates (liver and lung), ~200 compounds, ~10,000 profiles
Future plans …
Phase II: more tissue types (breast, prostate, etc.), more compounds (~1,500), mixtures, 100,000s of profiles
Phase III: iPSC-derived cells & 3D cultures, “personalized exposure” models
Generalization of the Motivating Problem
Comparison of a control state to multiple perturbation states
Standard approaches of gene-based differential analysis might miss salient (aggregate) differences
High-dimensional data (1000s of ‘features’), usually representable as 2D [10K x 1K] matrices
Large sample size for the ‘control state’: ≥1000 observations
Small sample size for each of the ‘perturbation states’: ~10-100 observations/perturbation
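The point that gene-based differential analysis might miss aggregate differences can be illustrated with a small simulation. This is a minimal sketch, not the method used in this work: the two-gene setup, the correlation values (0.8 and 0.0), the sample sizes (1000 controls, 50 perturbed), and all function names are assumptions chosen for illustration. Both genes keep the same mean and variance in the two states, so per-gene tests have nothing to detect, while the gene-gene correlation (a network-level property) changes.

```python
import math
import random

def simulate(n, rho, rng):
    """Draw n observations of two genes, each N(0, 1), with correlation rho."""
    g1, g2 = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        g1.append(z1)
        g2.append(rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    return g1, g2

def mean(xs):
    return sum(xs) / len(xs)

def sample_var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def welch_t(a, b):
    """Welch's t statistic for a difference in means between a and b."""
    se = math.sqrt(sample_var(a) / len(a) + sample_var(b) / len(b))
    return (mean(a) - mean(b)) / se

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

rng = random.Random(0)               # fixed seed for reproducibility
control = simulate(1000, 0.8, rng)   # large control state, genes coupled (assumed rho)
perturbed = simulate(50, 0.0, rng)   # small perturbation state, coupling lost (assumed rho)

# Gene-based view: one Welch t statistic per gene (control vs. perturbed).
t_stats = [welch_t(c, p) for c, p in zip(control, perturbed)]

# Aggregate view: gene1-gene2 correlation within each state.
r_control = pearson(*control)
r_perturbed = pearson(*perturbed)

print("per-gene Welch t:", [round(t, 2) for t in t_stats])
print("gene1-gene2 correlation (control, perturbed):",
      round(r_control, 2), round(r_perturbed, 2))
```

In this toy setting the per-gene statistics stay near zero because the marginal distributions are identical by construction, whereas the correlation differs markedly between states. With only ~10-100 observations per perturbation, the perturbed-state correlation estimate is noisy, which is one reason the small-sample setting is challenging.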
Generalization of the Motivating Problem: an example
The Connectivity Map/LINCS project: Expression Profiling of Chemical/Genetic perturbations