Page 1
Ferran Sanz – GRIB (IMIM-UPF)
Ferran Sanz
Research Programme on Biomedical Informatics (GRIB), Institut Hospital del Mar d’Investigacions Mèdiques (IMIM)
Universitat Pompeu Fabra, Barcelona
Biomedical big data: the massive integration of data for research
Page 2
Clinical Data
Biomedical imaging
‘omics & Systems Biology
Drugs & other chemicals
Biomedical literature
Biomedical Big Data
HEALTH CARE PRACTICE
Page 3
Biomedical Big Data
Clinical Data
Biomedical imaging
‘omics & Systems Biology
Drugs & other chemicals
Biomedical literature
Millions of EHRs that can be reused for research
Page 4
Biomedical Big Data
Clinical Data
Biomedical imaging
‘omics & Systems Biology
Drugs & other chemicals
Biomedical literature
Estimated worldwide medical imaging data in 2020: 35 ZB (S. Sarcar, GE Healthcare, http://es.slideshare.net/sarcar/data-explosion-in-medical-imaging)
Page 5
Clinical Data
Biomedical imaging
‘omics & Systems Biology
Drugs & other chemicals
Biomedical literature
Biomedical Big Data
BIOMEDICAL RESEARCH
Page 6
Clinical Data
Biomedical imaging
‘omics & Systems Biology
Drugs & other chemicals
Biomedical literature
Biomedical Big Data
In May 2015, the European Genome-phenome Archive (EGA) stored 1.8 PB of human ‘omics data
Page 7
Biomedical Big Data
Clinical Data
Biomedical imaging
‘omics & Systems Biology
Drugs & other chemicals
Biomedical literature
ChEMBL: 11K targets; 1.5M compounds; 14M activities
Page 8
Biomedical Big Data
Clinical Data
Biomedical imaging
‘omics & Systems Biology
Drugs & other chemicals
Biomedical literature
More than 20 million scientific papers are referenced in PubMed®, and more than 700,000 are added every year
Page 9
Clinical Data
Biomedical imaging
‘omics & Systems Biology
Drugs & other chemicals
Biomedical literature
BIOMEDICAL BIG DATA
Health information in social media (Web 2.0) should not be forgotten
Page 10
Clinical Data
Biomedical imaging
‘omics & Systems Biology
Drugs & other chemicals
Biomedical literature
BIOMEDICAL BIG DATA
Health information in social media (Web 2.0) should not be forgotten
More than 80% of the available digital information is unstructured and in multiple languages
Page 11
Clinical Data
Biomedical imaging
‘omics & Systems Biology
Drugs & other chemicals
Biomedical literature
Integration of heterogeneous biomedical information to gain a more complete and powerful view of diseases and therapeutics
INTEGRATIVE BIOINFORMATICS
Page 12
Exploitation of the Biomedical Big Data in pharmacovigilance
EHR databases: db i, db ii, db iii, db iv
Data extraction and integration
Signal detection
Signal substantiation
In silico pharmacology
Pharmacoepidemiological analysis
Text mining
Standardization & terminology mapping
Bioinformatics
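The diagram names signal detection as one step but does not say how it is carried out. One widely used family of techniques is disproportionality analysis. The minimal sketch below computes a reporting odds ratio (ROR) with a 95% confidence interval from a 2×2 table of drug/event counts. The counts, the function name and the "lower bound above 1" flagging rule are illustrative assumptions, not the method used in the work presented here.

```python
import math


def reporting_odds_ratio(a: int, b: int, c: int, d: int) -> tuple:
    """Reporting odds ratio (ROR) with a 95% confidence interval.

    2x2 table of records:
        a: exposed to the drug, with the event
        b: exposed to the drug, without the event
        c: not exposed, with the event
        d: not exposed, without the event
    """
    if min(a, b, c, d) <= 0:
        # This sketch applies no continuity correction for empty cells
        raise ValueError("all four cells must be positive")
    ror = (a * d) / (b * c)
    # Standard error of ln(ROR)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ror) - 1.96 * se)
    upper = math.exp(math.log(ror) + 1.96 * se)
    return ror, lower, upper


# Hypothetical counts, for illustration only
ror, lower, upper = reporting_odds_ratio(a=20, b=980, c=200, d=98_800)
# Assumed flagging rule: lower confidence bound above 1
is_signal = lower > 1
print(f"ROR={ror:.2f}, 95% CI=({lower:.2f}, {upper:.2f}), signal={is_signal}")
```

A flagged pair is only a candidate signal; in the workflow above it would still go through substantiation and pharmacoepidemiological analysis.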
Page 13
Page 14
From Bauer-Mehren A, Bundschus M, Rautschka M, Mayer MA, Sanz F, Furlong LI. PLoS One 2011; 6(6): e20284
Knowledge discovery by information linkage
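The cited study analyses a gene-disease network. As a minimal sketch of the general idea of discovering knowledge by linking information through shared entities, the code below projects a bipartite list of gene-disease associations onto a disease-disease network, in which two diseases are connected when they share at least one gene. The function name and the association pairs are hypothetical placeholders. This simple projection is not claimed to be the exact procedure of Bauer-Mehren et al.

```python
from collections import defaultdict
from itertools import combinations


def project_to_disease_network(gdas):
    """Map (disease_a, disease_b) -> set of shared genes, for diseases sharing genes."""
    diseases_by_gene = defaultdict(set)
    for gene, disease in gdas:
        diseases_by_gene[gene].add(disease)

    edges = defaultdict(set)
    for gene, diseases in diseases_by_gene.items():
        # Every pair of diseases associated with the same gene is linked by that gene
        for d1, d2 in combinations(sorted(diseases), 2):
            edges[(d1, d2)].add(gene)
    return dict(edges)


# Hypothetical gene-disease associations (placeholders, not DisGeNET data)
gdas = [("G1", "D1"), ("G1", "D2"), ("G2", "D2"),
        ("G2", "D3"), ("G3", "D1"), ("G3", "D2")]
for (d1, d2), genes in sorted(project_to_disease_network(gdas).items()):
    print(d1, d2, sorted(genes))
```

Here D1 and D3 share no gene but both connect to D2, which is the kind of indirect relationship that linkage makes visible.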
Page 15
Workflows for chemo-bioinformatic signal substantiation
Diagram: drug used → proteins it interacts with → clinical adverse event
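The substantiation workflow looks for a biological explanation connecting a drug to an adverse event through the proteins the drug interacts with. A minimal sketch, assuming two lookup tables are already available (drug to interacting proteins, and protein to adverse events), returns the proteins that connect a given drug-event pair. The table contents and the function name are hypothetical.

```python
def explain_signal(drug, event, drug_targets, protein_events):
    """Return proteins the drug interacts with that are also linked to the event."""
    return sorted(
        protein
        for protein in drug_targets.get(drug, set())
        if event in protein_events.get(protein, set())
    )


# Hypothetical lookup tables, for illustration only
drug_targets = {"drug_X": {"P1", "P2", "P3"}}
protein_events = {
    "P1": {"event_Y"},
    "P2": {"event_Z"},
    "P3": {"event_Y", "event_Z"},
}
print(explain_signal("drug_X", "event_Y", drug_targets, protein_events))  # ['P1', 'P3']
```

An empty result does not refute a signal; it only means these particular tables offer no mechanistic link.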
Page 16
Information on the genetic basis of diseases
Data silos
Different standards
Large volume
Need for resources that gather, standardize and integrate information on the genetic basis of diseases
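Data silos and different standards mean that the same gene-disease association can arrive under different disease codes. A minimal sketch of standardization by terminology mapping, assuming a mapping table from each source's local code to a shared concept identifier (a role played, for instance, by a UMLS concept identifier), merges records and keeps track of which sources support each association. All codes, source names and concept IDs below are hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping: (source, local disease code) -> shared concept id
MAPPING = {
    ("source_a", "dis-001"): "CONCEPT_1",
    ("source_b", "X17"): "CONCEPT_1",
    ("source_b", "X42"): "CONCEPT_2",
}


def integrate(records):
    """Merge (source, gene, local_code) records into gene-concept associations.

    Returns ({(gene, concept_id): set_of_sources}, list_of_unmapped_codes).
    """
    merged = defaultdict(set)
    unmapped = []
    for source, gene, code in records:
        concept = MAPPING.get((source, code))
        if concept is None:
            # Keep unmapped codes for manual curation instead of dropping them silently
            unmapped.append((source, code))
            continue
        merged[(gene, concept)].add(source)
    return dict(merged), unmapped


records = [
    ("source_a", "G1", "dis-001"),
    ("source_b", "G1", "X17"),  # same association, different coding scheme
    ("source_b", "G2", "X42"),
    ("source_b", "G3", "X99"),  # no mapping available
]
merged, unmapped = integrate(records)
print(merged)
print(unmapped)
```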
Page 17
Page 18
• A comprehensive resource on gene-disease associations (GDAs)
• Integrates information from publicly available databases and from the literature by text mining
• DisGeNET v4.0 (April 2016) contains 429,036 GDAs involving 17,381 genes and more than 15,000 diseases and phenotypes
• Freely available at: http://www.disgenet.org
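DisGeNET obtains part of its content from the literature by text mining. The BEFREE system used for this is not reproduced here. Instead, the sketch below uses a much simpler, swapped-in technique: dictionary-based entity recognition combined with sentence-level co-occurrence. The dictionaries and the example text are hypothetical and deliberately tiny.

```python
import re

# Hypothetical, tiny dictionaries for illustration only
GENES = {"ATP7B", "SLC6A4"}
DISEASES = {"wilson disease": "Wilson disease", "major depression": "Major Depression"}


def cooccurrences(text):
    """Return (gene, disease) pairs mentioned in the same sentence."""
    pairs = set()
    # Naive sentence splitting on ., ! or ? followed by whitespace
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        tokens = set(re.findall(r"[A-Za-z0-9]+", sentence))
        genes = GENES & tokens
        lowered = sentence.lower()
        diseases = {name for key, name in DISEASES.items() if key in lowered}
        pairs.update((g, d) for g in genes for d in diseases)
    return pairs


text = ("Mutations in ATP7B cause Wilson disease. "
        "SLC6A4 variants have been studied in major depression.")
print(sorted(cooccurrences(text)))
```

Co-occurrence produces many false positives (a sentence can mention a gene and a disease without relating them), which is why relation extraction systems go further than this.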
Page 19
DisGeNET version 4.0: Data sources
Source categories: Curated, Predicted, Literature
Sources: GWAS Catalog, OrphaNet, UniProt, CTD (two entries), LHGDN, RGD, BEFREE (Bio-Entity Finder and Relation Extraction), GAD, ClinVar, MGD
Page 20
DisGeNET version 4.0: Statistics
Source      Genes   Diseases  Associations
Curated     7,362   7,607     32,834
Predicted   2,743   2,064     10,264
Literature  16,141  11,447    403,925
All         17,381  15,093    429,036
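The association counts of the three source categories add up to more than the "All" row. Assuming that "All" counts each distinct gene-disease association once even when several categories report it (the slide does not state this explicitly), the difference measures the overlap between categories: each association reported by k categories contributes k − 1 extra counts. The short calculation below uses only the numbers in the table, followed by a toy check of the principle with made-up sets.

```python
# Association counts copied from the DisGeNET v4.0 statistics table
associations = {"Curated": 32_834, "Predicted": 10_264, "Literature": 403_925}
all_associations = 429_036

total_by_category = sum(associations.values())
surplus = total_by_category - all_associations  # extra counts due to overlap
print(total_by_category, surplus)  # 447023 17987

# Toy check with made-up sets: sum of sizes minus size of the union
# equals the number of extra times shared items are counted
curated = {"a", "b"}
literature = {"b", "c", "d"}
print(len(curated) + len(literature) - len(curated | literature))  # 1
```

The surplus is a count of duplicate listings, not of distinct shared associations: an association present in all three categories adds 2 to it.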
Page 21
DisGeNET version 4.0: Tools
• Web interface
• Network analysis
• Semantic Web
• Programmatic access
• R package
• Federated queries
Page 22
DisGeNET version 4.0: Top scoring genes for Wilson disease
Gene    No. of diseases  DisGeNET score  DSI    DPI    No. of PMIDs  No. of SNPs
ATP7B   57               0.819           0.596  0.592  234           99
ANXA5   129              0.2             0.505  0.741  1             0
PRNP    205              0.128           0.468  0.962  4             1
CP      114              0.126           0.532  0.704  26            0
LOX     141              0.123           0.498  0.778  2             0
LOXL2   48               0.123           0.610  0.481  1             0
APOE    729              0.122           0.333  1      2             0
TNF     1524             0.120           0.247  1      2             0
IL6     1260             0.120           0.268  1      2             0
NDUFB7  1                0.120           1      0.148  1             0
DSI: Disease Specificity Index; DPI: Disease Pleiotropy Index
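The DPI (Disease Pleiotropy Index) column can be read as a simple ratio. As we understand the DisGeNET documentation, DPI is the number of distinct disease classes among the diseases associated with a gene, divided by the total number of disease classes; the sketch below assumes that definition. The total of 27 classes used in the example is an inference, not a value stated on the slide: every DPI value in this table lies within 0.001 of a multiple of 1/27 (for example 4/27 ≈ 0.148 for NDUFB7). The function name is hypothetical.

```python
def disease_pleiotropy_index(gene_classes: int, total_classes: int) -> float:
    """DPI = N_dc / N_TC (definition assumed, see lead-in).

    gene_classes: distinct disease classes among the gene's diseases (N_dc)
    total_classes: total number of disease classes considered (N_TC)
    """
    if total_classes <= 0 or not 0 <= gene_classes <= total_classes:
        raise ValueError("need 0 <= gene_classes <= total_classes and total_classes > 0")
    return gene_classes / total_classes


TOTAL_CLASSES = 27  # assumption inferred from the table values, not stated on the slide
for n in (4, 8, 27):
    print(n, round(disease_pleiotropy_index(n, TOTAL_CLASSES), 3))
```

A DPI of 1, as for APOE, TNF and IL6, means the gene's diseases span every class, which fits genes involved in many kinds of disease.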
Page 23
DisGeNET version 4.0: Top scoring genes for Major Depression
Gene    No. of diseases  DisGeNET score  DSI    DPI    No. of PMIDs  No. of SNPs
SLC6A4  374              0.236           0.411  0.852  157           5
TPH2    89               0.211           0.548  0.667  26            1
HTR2A   222              0.155           0.463  0.778  45            17
PCLO    20               0.130           0.696  0.333  12            5
CRHR1   118              0.127           0.531  0.778  11            11
CYP2D6  316              0.127           0.428  0.852  11            2
FKBP5   78               0.126           0.563  0.814  16            1
SP4     16               0.125           0.739  0.296  3             1
GRM7    32               0.123           0.666  0.444  5             1
GNAI3   7                0.122           0.812  0.296  2             1
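The DSI (Disease Specificity Index) column rewards genes associated with few diseases. As we understand the DisGeNET documentation, DSI = log2(N_d / N_T) / log2(1 / N_T), where N_d is the number of diseases associated with the gene and N_T the total number of diseases; the sketch below assumes that definition and uses the algebraically equivalent form 1 − log2(N_d) / log2(N_T). Taking N_T = 15,093 (the "All" disease count of DisGeNET v4.0) is an assumption, so the printed values need not match the DSI column exactly. What the sketch does reproduce is the ordering: GNAI3 (7 diseases) scores higher than SLC6A4 (374 diseases). The function name is hypothetical.

```python
import math


def disease_specificity_index(gene_diseases: int, total_diseases: int) -> float:
    """DSI = log2(N_d / N_T) / log2(1 / N_T), written as 1 - log2(N_d) / log2(N_T).

    Equals 1.0 for a gene linked to a single disease and 0.0 for a gene
    linked to every disease in the collection.
    """
    if total_diseases < 2 or not 1 <= gene_diseases <= total_diseases:
        raise ValueError("need 1 <= gene_diseases <= total_diseases and total_diseases >= 2")
    return 1 - math.log2(gene_diseases) / math.log2(total_diseases)


TOTAL_DISEASES = 15_093  # assumption: the 'All' disease count of DisGeNET v4.0
for gene, n in (("GNAI3", 7), ("SLC6A4", 374)):
    print(gene, round(disease_specificity_index(n, TOTAL_DISEASES), 3))
```

The logarithmic scale keeps the index informative across several orders of magnitude in disease counts, from 1 (NDUFB7) to more than 1,500 (TNF).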
Page 24
Acknowledgements
Integrative Biomedical Informatics Group (http://grib.upf.edu):
• L.I. Furlong • A. Bauer-Mehren (Roche) • A. Bravo
• A. Gutiérrez • J. Piñero • N. Queralt
Page 25
Thanks for your attention!