Joaquín Dopazo Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Functional Genomics Node, (INB), Bioinformatics Group (CIBERER) and Medical Genome Project, Spain. http://bioinfo.cipf.es http://www.medicalgenomeproject.com http://www.babelomics.org http://www.hpc4g.org @xdopazo Forum on Personalized Medicine, 25 September 2014 Bioinformatics and Big Data in the era of Personalized Medicine
Bioinformatics and Big Data in the era of Personalized Medicine 10th Anniversary Instituto Roche Forum on Personalized Medicine: Challenges for the next decade. Santiago de Compostela (Spain), September 25th 2014
Personalized Genomic Medicine. Phase I: generating the knowledge database
sequencing
Patient List of variants
Database. Query: variant/pathway
Therapy Outcome
System feedback
Genetic variants are linked to therapies through the knowledge of their functional effects (systems biology)
Initially the system will need much feedback: Knowledge generation phase. Growing knowledge database
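The feedback loop described above can be illustrated with a small sketch. This is not the system presented in the talk: the `KnowledgeBase` class, its method names, and the variant, pathway and therapy identifiers are all hypothetical, and ranking therapies by raw success rate is a simplifying assumption.

```python
from collections import defaultdict


class KnowledgeBase:
    """Toy store linking variants to pathways and recording therapy outcomes."""

    def __init__(self):
        self.variant_pathway = {}  # variant id -> pathway name
        # (pathway, therapy) -> [successes, treated patients]
        self.outcomes = defaultdict(lambda: [0, 0])

    def annotate(self, variant, pathway):
        """Register the functional effect of a variant as the pathway it affects."""
        self.variant_pathway[variant] = pathway

    def _pathways(self, variants):
        return {self.variant_pathway[v] for v in variants if v in self.variant_pathway}

    def feedback(self, variants, therapy, success):
        """System feedback: one treated patient, the therapy given and its outcome."""
        for pathway in self._pathways(variants):
            record = self.outcomes[(pathway, therapy)]
            record[0] += int(success)
            record[1] += 1

    def suggest(self, variants):
        """Query by variant/pathway: rank therapies by observed success rate."""
        pathways = self._pathways(variants)
        totals = {}
        for (pathway, therapy), (ok, n) in self.outcomes.items():
            if pathway in pathways:
                prev_ok, prev_n = totals.get(therapy, (0, 0))
                totals[therapy] = (prev_ok + ok, prev_n + n)
        ranked = [(t, ok / n) for t, (ok, n) in totals.items() if n > 0]
        return sorted(ranked, key=lambda item: -item[1])


# Hypothetical identifiers, for illustration only.
kb = KnowledgeBase()
kb.annotate("var1", "pathway_A")
kb.annotate("var2", "pathway_A")
kb.feedback(["var1"], "therapy_X", success=True)
kb.feedback(["var2"], "therapy_X", success=False)
kb.feedback(["var1"], "therapy_Y", success=True)
print(kb.suggest(["var2"]))
```

Because the two variants share a pathway, outcomes recorded for a patient carrying one of them inform the suggestion for a patient carrying the other, which is the sense in which variants are linked to therapies through their functional effects.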
Genomic medicine
Knowledge
database
Personalized genomic medicine.
Phase II: applying the knowledge database
Patient
1) Genomic sequencing 2) Database of markers 3) Therapy prediction
Genomic core facility phase II
Clinician receives hints on possible prescriptions and therapeutic interventions
+ Other factors (risk, cost, etc.)
Prescription
Pre-symptomatic:
• Genetic predisposition to acquired diseases (>6000, some treatable)
• Early diagnosis of genetic diseases
Symptomatic analysis:
• Diagnosis of acquired diseases
• Early cancer detection
• Cancer treatment recommendation
From genetics to genomic medicine
Test 1
Test 2
Therapy 1
Therapy 2
Therapy 3
?
Genetic medicine
Test
Therapy 1
Therapy 2
Therapy 3
?
Genomic medicine
+
Genomic analysis allows patients to be associated with therapies from the very beginning, saving time and costs and increasing the success of treatments.
feedback
Some examples
                      Conventional sequencing          NGS (with capture)
Marfan syndrome       1300€ (2 genes, 75 exons)        900€ (3 genes, 237 exons)
Hereditary deafness   12500€ (36 genes, 1500 exons)    1100€ (38 genes, >1500 exons)
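To make the comparison above easier to read, the prices can be normalized by the number of exons covered. The sketch below uses only the figures given for the two examples; the `cost_per_exon` helper is hypothetical, and since the NGS deafness panel covers ">1500" exons, taking 1500 gives an upper bound on its cost per exon.

```python
# Figures from the two examples: (price in EUR, exons covered).
panels = {
    "Marfan syndrome": {"conventional": (1300, 75), "ngs": (900, 237)},
    # The NGS panel covers >1500 exons, so 1500 yields an upper bound per exon.
    "Hereditary deafness": {"conventional": (12500, 1500), "ngs": (1100, 1500)},
}


def cost_per_exon(price_eur, exons):
    """Price divided by the number of exons covered by the test."""
    return price_eur / exons


for disease, methods in panels.items():
    for method, (price, exons) in methods.items():
        print(f"{disease:20s} {method:12s} {cost_per_exon(price, exons):6.2f} EUR/exon")
```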
• Low initial investment
• Already existent infrastructure
• Quick implementation
• Easy implementation as a cloud service, which guarantees sustainability
Preparing the scenario for the introduction of the genome in the clinics
Patient
Treatment
eHR
feedback
Corporate systems: Orion Clinic, Abucasis, Gaia, etc.
• Decision support techniques: algorithms that relate biomarkers to treatments, outcomes, etc. (gene prioritization and predictors)
• Integration of the data in the eHR
• Visualization and data presentation. Ready for the clinical interpretation
• Acceleration of algorithms for data pre-processing. Data storage optimization
New Big Data storage strategies
Pipeline stages:
• Automatic QC and sequence cleansing
• Mapping + QC
• Variant calling + QC
Stage durations shown on the slide: 8-10 hours, 8-12 hours, 8-12 hours
CLOUD
File sizes for an exome: FASTQ (10 GB), BAM (7 GB), VCF (200 MB). For whole genomes, sizes are >20x larger.
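The storage implications of these file sizes can be estimated with simple arithmetic. The sketch below assumes the per-exome sizes quoted above, decimal units (1 TB = 1000 GB), and a whole-genome factor of exactly 20, which is a lower bound since the stated factor is ">20x"; the function names are hypothetical.

```python
# Per-exome file sizes quoted in the talk, in GB (200 MB = 0.2 GB).
EXOME_GB = {"FASTQ": 10.0, "BAM": 7.0, "VCF": 0.2}
# Whole genomes are ">20x" larger; 20 is therefore a lower bound.
GENOME_FACTOR = 20


def storage_gb(n_samples, whole_genome=False):
    """Storage per file type, in GB, for n_samples sequenced individuals."""
    factor = GENOME_FACTOR if whole_genome else 1
    return {fmt: size * factor * n_samples for fmt, size in EXOME_GB.items()}


def total_tb(n_samples, whole_genome=False):
    """Total storage in TB if all three file types are kept."""
    return sum(storage_gb(n_samples, whole_genome).values()) / 1000


print(f"1000 exomes:  {total_tb(1000):.1f} TB")
print(f"1000 genomes: at least {total_tb(1000, whole_genome=True):.0f} TB")
```

Keeping only the VCF files would reduce the footprint by roughly two orders of magnitude, at the price of not being able to re-run mapping or variant calling.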
Remote visualization
of big data.
Data production phase
e-health record
Final human supervision
of data QC
Tools developed to improve the pipeline
Genome Maps, an HTML5+SVG data visualization tool for VCF and BAM
o Genome-scale data visualization plays an important role in the data analysis process. It is a big data management problem.
o Features of Genome Maps (Medina, 2013, NAR; ICGC data analysis portal)
● First 100% HTML5 web-based genome browser: HTML5+SVG (inspired by Google Maps)
● Always up to date, no browser plugins or installation
● Data taken from CellBase, remote NGS data, local files and DAS servers: genes, transcripts, exons, SNPs, TFBS, miRNA targets, etc.
● Other features: multi-species, API-oriented, easy integration, plugin framework, etc.
BAM viewer
VCF viewer
ICGC genomic viewer
www.genomemaps.org
Finding new biomarkers
Test
Therapy 1
Therapy 2
Therapy 3
?
feedback
Feedback: treatment failures are reanalyzed to search for:
1) Biomarkers (of failure)
2) Subgroups (to search for new personalized and rational therapeutic interventions)
Treatables
Failure
treatment
biomarkers
Group A
biomarkers
Group A
biomarkers
Irrelevant
Non treatables
Signaling
Protein interaction Regulation
Variants are used as biomarkers to distinguish
between responders and non-responders and to
sub-classify non-responders
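One way to look for variants that separate responders from non-responders is a per-variant enrichment test. The sketch below uses a one-sided Fisher's exact test, which is an assumption made here for illustration and not a method stated in the talk; the function names, patient identifiers and variant names are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """P(X >= a) under the hypergeometric null for the 2x2 table
        non-responders: a carriers, b non-carriers
        responders:     c carriers, d non-carriers
    A small value means carriers are over-represented among non-responders."""
    non_resp, carriers, n = a + b, a + c, a + b + c + d
    upper = min(non_resp, carriers)
    favourable = sum(
        comb(non_resp, k) * comb(n - non_resp, carriers - k)
        for k in range(a, upper + 1)
    )
    return favourable / comb(n, carriers)


def rank_failure_biomarkers(carriers_by_variant, non_responders, responders):
    """Sort variants by enrichment among non-responders (smallest p-value first)."""
    results = []
    for variant, carriers in carriers_by_variant.items():
        a = len(carriers & non_responders)
        c = len(carriers & responders)
        b = len(non_responders) - a
        d = len(responders) - c
        results.append((variant, fisher_one_sided(a, b, c, d)))
    return sorted(results, key=lambda item: item[1])


# Hypothetical cohort: three treatment failures and three successes.
non_responders = {"p1", "p2", "p3"}
responders = {"p4", "p5", "p6"}
carriers_by_variant = {
    "variant_1": {"p1", "p2", "p3"},  # carried only by non-responders
    "variant_2": {"p1", "p4"},        # spread over both groups
}
for variant, p in rank_failure_biomarkers(carriers_by_variant, non_responders, responders):
    print(f"{variant}: p = {p:.2f}")
```

The same ranking, applied within the non-responder group against candidate subgroups, could serve as a starting point for sub-classification.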
Rational design of therapies relies on Systems Biology concepts. Pathways are complex and must be understood with the proper bioinformatic tools
BiERapp: interactive web-based tool for easy candidate prioritization by successive filtering
SEQUENCING CENTER
Data preprocessing
VCF FASTQ
Genome Maps
BAM
BiERapp filters
No-SQL (Mongo) VCF indexing
Population frequencies Consequence types
Experimental design
BAM viewer and Genomic context ?
Easy scale up
NA19660 NA19661
NA19600 NA19685
BiERapp: the interactive filtering tool for easy candidate prioritization
http://bierapp.babelomics.org Aleman et al., 2014 NAR
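Candidate prioritization by successive filtering can be sketched as a chain of predicates applied one after another, mirroring the three filter types named above (population frequencies, consequence types, experimental design). This is not BiERapp's code or API: the record layout, field names, thresholds, sample names and helper functions are all hypothetical, and the design filter shown (present in affected samples, absent in unaffected ones) is just one possible design.

```python
def by_frequency(max_maf):
    """Keep variants that are rare in the reference population."""
    return lambda v: v["maf"] <= max_maf


def by_consequence(allowed):
    """Keep variants whose predicted consequence is in the allowed set."""
    return lambda v: v["consequence"] in allowed


def by_design(affected, unaffected):
    """Keep variants carried by all affected and by no unaffected samples."""
    def keep(v):
        gt = v["genotypes"]
        return (all(gt[s] != "0/0" for s in affected)
                and all(gt[s] == "0/0" for s in unaffected))
    return keep


def apply_filters(variants, filters):
    """Apply each named filter in turn, reporting how many candidates remain."""
    remaining = list(variants)
    for name, keep in filters:
        remaining = [v for v in remaining if keep(v)]
        print(f"{name}: {len(remaining)} candidates left")
    return remaining


# Hypothetical trio and variant records, for illustration only.
trio = {"child": "0/1", "mother": "0/0", "father": "0/0"}
variants = [
    {"id": "v1", "maf": 0.25, "consequence": "missense_variant", "genotypes": trio},
    {"id": "v2", "maf": 0.001, "consequence": "synonymous_variant", "genotypes": trio},
    {"id": "v3", "maf": 0.001, "consequence": "missense_variant",
     "genotypes": {"child": "0/1", "mother": "0/1", "father": "0/0"}},
    {"id": "v4", "maf": 0.0005, "consequence": "stop_gained", "genotypes": trio},
]
filters = [
    ("population frequency", by_frequency(0.01)),
    ("consequence type", by_consequence({"missense_variant", "stop_gained"})),
    ("experimental design", by_design(["child"], ["mother", "father"])),
]
candidates = apply_filters(variants, filters)
print([v["id"] for v in candidates])
```

Keeping each filter as an independent predicate makes it easy to add, remove or reorder criteria interactively, which is the style of use the tool is described as supporting.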