METACORE January 2020 Data-Mining and Pathway Analysis
Big data challenges in the Life Sciences
• As technological advances generate ever more omics data, scientists face a new challenge: making sense of it all.
• New tools are needed to meet this challenge and enable researchers to generate actionable hypotheses from their data.
MetaCore: Your GPS in pathway analysis
• Gain molecular understanding of disease
• Analyze and understand experimental findings (omics data) in the context of validated biological pathways
• Generate and confirm hypotheses for novel biomarkers, targets and mechanisms of action
Figure: MetaCore overview (Knowledge Mining; ‘Omics’ Data Analysis; Pathway Analysis Platform)
MetaCore delivers high-quality biological systems content in context, giving you essential data and analytics to accelerate your scientific research. MetaCore provides sophisticated, integrated pathway and network analysis for multi-omics data.
With MetaCore you can analyze and understand experimental findings in the context of validated biological pathways.
Why Pathway Analysis Software?
• A learning tool
  – Study a group of gene products.
• A data analysis tool
  – Which pathways are particularly affected?
  – Which diseases have similar biomarkers?
• A hypothesis generation tool
  – Gain insight into the mechanism of regulation of your genes: which is the likely causative agent for the observed changes, and what is likely to happen as a result of these changes?
  – Suggest effects of gene knock-ins or knock-outs.
  – Suggest side effects of drugs.
  – Highlight new phenomena that need further investigation: what does the program not explain?
2018 Enhancements: More data, more interactions and new maps
o 158,000 new network interactions, including:
  82,000 new protein-protein interactions
  13,000 new substrate-product reactions (interactions)
  13,000 new RNA-protein interactions
o 57,000 new unique gene-disease associations
o 10 new regulatory, metabolic and disease maps
o 14,500 new articles (network: interactions, reactions)
MetaCore Specialty Modules are now available under the standard MetaCore license:
o MetaCore Allergic Contact Dermatitis Module
o MetaCore Lung Disease Module
o MetaCore Metabolic Disease Module
o MetaCore Neurology Module
o MetaCore Normal Stem Cells Module
o MetaCore Oncology Module
o MetaCore Systems Toxicology Module
METABASE/METACORE CONTENT OVERVIEW
31,545
8,441
16,956
68,968
20,251
60,492
• Human genes
• Human SwissProt proteins
• Mouse genes
• Mouse SwissProt proteins
• Rat genes
• Rat SwissProt proteins
• Compounds
• Compounds with structure
• Endogenous compounds
• Nutritional compounds
• Metabolites of xenobiotics
• Drugs
- Biologics
- Small molecules
- Approved drugs
- Withdrawn drugs
- Clinical trial drugs
- Discontinued drugs
- Preclinical drugs
- Unknown
- Drug combination regimens
• Human genes in network
• Mouse genes in network
• Rat genes in network
• Chemical compounds
• Drugs
• Endogenous compounds
• Metabolic reactions
• Transport reactions
• Processing reactions
• PubMed journals
• PubMed records
• PubMed articles (unique)
• Total amount of interactions
- Protein – Protein
- Compound – Protein
- Compound – Compound
- Metabolic enzyme – Reaction
- Transporter – Reaction
- Substrate, Product – Reaction
- RNA – Protein
• Pathway maps
- Human genes in maps
- Mouse genes in maps
- Rat genes in maps
- Interactions in maps
849,694
760,787
11,340
47,130
4,681
101,732
148,689
1,550
5,457
7,529
6,856
6,704
47,959
8,021
831,988
126
29,754
9,166
1,344
7,822
2,293
261
5,037
1,188
136
251
24,798
21,519
18,730
4,603
3,557
36,945
4,162
3,700
2,141,916
273,267
1,924,033
850,906
4,681
390,215
METABASE/METACORE CHEMICAL COMPOUND CONTENT
Compound distribution by target:
• Kinases (30,174)
• Binding proteins (11,886)
• Phospholipases (2,186)
• Proteases (39,135)
• Phosphatases (5,005)
• Transporters (15,849)
• Ion channels (30,043)
• Transcription factors (16,949)
• Enzymes (93,557)
• Nuclear receptors (15,404)
• Receptors with kinase activity (13,026)
• Receptors, GPCR (122,598)
Compound set | Count | Share of all compounds
Compounds | 850,906 | 100.0%
Compounds in network | 390,215 | 45.9%
Compounds in reactions | 38,267 | 4.5%
Chemical compounds related to toxic pathology | 3,548 | 0.4%
INTERACTIONS BY TYPE
Protein-Protein, RNA-Protein Interactions
Interaction type | Count | Share
Regulation of transcription | 381,290 | 38.2%
Influence on expression | 52,041 | 5.2%
Unspecified regulation | 3,629 | 0.4%
Covalent modification | 24,435 | 2.4%
Direct interaction | 470,914 | 47.1%
Co-regulation of transcription | 66,530 | 6.7%
Chemical Compound-Protein Interactions (small molecule-protein interactions: 760,726)
Interaction type | Count
Other | 1,649
Unspecified | 48,672
Influence on expression | 4,858
Binding | 705,547
METABASE PTM CONTENT
Sites of modification: 4,108, of which 1,328 are modified sites involved in interactions after modification
Enzyme-substrate interactions
Columns: (1) interactions by mechanism of modification; (2) interactions with defined sites of modification; (3) interactions with effect and with defined sites of modification; (4) records with defined sites of modification; (5) unique substrate proteins with defined sites of modification; (6) unique sites of modification
Mechanism | (1) | (2) | (3) | (4) | (5) | (6)
Total | 23,023 | 4,102 | 2,575 | 7,330 | 2,423 | 4,108
Phosphorylation | 16,661 | 3,482 | 2,141 | 6,264 | 2,236 | 3,558
Dephosphorylation | 1,234 | 226 | 169 | 410 | |
Other types | 5,128 | 394 | 265 | 656 | 337 | 550
Interactions with modified proteins
Columns: (1) interactions with defined modified sites of proteins; (2) records with defined modified sites of proteins; (3) unique proteins interacting in modified status; (4) unique modified sites in interactions
Site type | (1) | (2) | (3) | (4)
Total | 3,013 | 3,855 | 1,177 | 1,328
Phosphorylated sites | 2,631 | 3,357 | 1,018 | 981
Other sites | 410 | 501 | 232 | 347
1,024
NON-CODING RNAs
Figure: Human RNAs in network, by type (microRNA, asRNA, lincRNA, snoRNA, other RNAs)
Non-coding RNA interactions
Columns: (1) total interactions; (2) incoming interactions; (3) records for incoming interactions; (4) outgoing interactions; (5) records for outgoing interactions; (6) predicted/validated interactions
RNA type | (1) | (2) | (3) | (4) | (5) | (6)
microRNA | 143,083 | 10,935 | 14,255 | 132,183 | 183,029 | 109,136/33,947
Other non-coding RNAs | 8,101 | 5,743 | 6,447 | 2,365 | 2,652 | 3,722/4,379
Non-coding RNAs per organism in network
RNA type | Human | Mouse | Rat
microRNA | 4,455 | 3,102 | 1,254
Other non-coding RNAs | 1,918 | 544 | 44
INTERACTIONS BY MECHANISM
Mechanism | Count | Share
Interactions of substrates and products with reactions | 101,732 | 6.7%
Unspecified | 50,628 | 3.3%
Covalent modification | 7,137 | 0.5%
Phosphorylation | 16,735 | 1.1%
Dephosphorylation | 1,247 | 0.1%
Binding | 632,478 | 41.6%
Competition | 1,113 | 0.1%
Transformation | 1,391 | 0.1%
Cleavage | 7,065 | 0.5%
Transcription regulation | 381,290 | 25.1%
Co-regulation of transcription | 66,530 | 4.4%
Influence on expression | 55,979 | 3.7%
Catalysis | 47,130 | 3.1%
Transport | 5,468 | 0.4%
Pharmacological effect | 2,159 | 0.1%
Toxic effect | 8,077 | 0.5%
miRNA binding | 132,423 | 8.7%
(30367)
(5019) (4068)
6,532
19,561
65,135
30,367
DRUG-TARGET INTERACTION STATISTICS
Figure: Drug network objects (small molecules vs. biologics)
Figure: Drug-target interactions by mechanism (binding 77.3%, unspecified 14.6%, influence on expression 7.7%, covalent modification 0.5%)
Figure: Target network objects (proteins 99.6%, RNAs 0.4%)
interactions with
records.
unique targets
from unique
articles.
NETWORK OBJECTS
Object type | Count | Share
Proteins | 25,424 | 8.5%
RNA | 9,611 | 3.2%
Chemical compounds | 220,717 | 73.6%
Metabolic reactions | 36,899 | 12.3%
Transport reactions | 3,628 | 1.2%
Processing reactions | 3,742 | 1.2%
METABASE TOXICITY CONTENT
Database objects related to toxic pathology | Volume
Gene aberrations in tox notes | 366
Genes with aberrations in tox notes | 323
Chemical compounds related to toxic pathology | 3,548
Chemical toxic agents | 2,585
Chemical protective agents | 1,159
Chemical markers | 309
Proteins and RNAs as markers | 4,067
Genes encoding marker proteins/RNAs | 3,027
References for toxicity content | Volume
Total amount of tox notes | 90,228
Notes with toxic agents | 60,078
Notes with protective agents | 17,343
PubMed articles in tox notes | 8,339
Other references in tox notes | 648
Associations related to toxicity content | Volume
Chemical agent-Protein-Pathology-PMID associations | 212,125
Chemical agent-RNA-Pathology-PMID associations | 52,403
Chemical agent-Endogenous compound-Pathology-PMID associations | 123,128
Figure: Toxic/protective agents (chemical toxic agents 69%, chemical protective agents 31%)
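The 69%/31% split can be reproduced from the counts in the toxicity table, assuming the chart's base is the sum of toxic and protective agents. That sum (3,744) exceeds the 3,548 chemical compounds related to toxic pathology, so, assuming both agent groups are drawn from those 3,548 compounds, some compounds must be counted as both toxic and protective. A quick check in Python:

```python
# Counts copied from the METABASE toxicity content table.
toxic_agents = 2_585
protective_agents = 1_159
compounds_related_to_toxic_pathology = 3_548

# Assumption: the chart's base is toxic + protective agents.
base = toxic_agents + protective_agents                    # 3,744
toxic_share = round(100 * toxic_agents / base)             # 69
protective_share = round(100 * protective_agents / base)   # 31

# Assumption: both agent groups are subsets of the 3,548 compounds, so the
# excess of `base` over that total is a lower bound on dual-role compounds.
min_dual_role = max(0, base - compounds_related_to_toxic_pathology)  # 196

print(toxic_share, protective_share, min_dual_role)  # 69 31 196
```

Under these assumptions, at least 196 compounds appear in both the toxic and the protective group.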
TISSUE-SPECIFIC TOXIC PATHOLOGY STATISTICS
Database objects related to toxic pathology, by tissue
Pathology | Genes with aberrations related to pathology | Chemical toxic agents | Chemical protective agents | Chemical markers | Proteins and RNAs as markers | Genes encoding marker proteins/RNAs
Bone joint pathology | 0 | 8 | 8 | 7 | 60 | 54
Bone marrow pathology | 5 | 154 | 19 | 23 | 150 | 140
Bone pathology | 3 | 67 | 29 | 25 | 178 | 163
Epididymis pathology | 1 | 326 | 44 | 54 | 289 | 271
Esophagus pathology | 4 | 118 | 0 | 8 | 47 | 53
Forestomach pathology | 4 | 196 | 10 | 14 | 67 | 75
Glandular stomach pathology | 8 | 180 | 130 | 28 | 149 | 135
Heart pathology | 8 | 330 | 95 | 51 | 489 | 438
Intestine pathology | 276 | 545 | 558 | 148 | 1,538 | 1,163
Kidney pathology | 25 | 1,078 | 231 | 96 | 922 | 737
Liver pathology | 42 | 1,334 | 327 | 128 | 1,503 | 1,216
Lung pathology | 22 | 471 | 159 | 71 | 562 | 458
Nose pathology | 3 | 200 | 3 | 8 | 35 | 47
Testicular pathology | 7 | 479 | 77 | 83 | 512 | 436
Trachea pathology | 1 | 73 | 0 | 7 | 38 | 45
PRE-BUILT MAPS AND NETWORKS
Figure: example map (ACM2 and ACM4 activation of ERK) and example network (Inflammation_IL-6 signaling)
Unique content: maps (1,550 total) | Volume
Regulatory maps | 662
Disease maps | 721
Metabolic maps | 138
Toxicity maps | 29
Unique content: networks (1,102 total) | Volume
Processes networks | 159
Metabolic endo networks | 118
Toxicity networks | 395
Metabolic networks | 250
Disease biomarkers networks | 88
Drug target networks | 92
DISEASE STATISTICS (MetaCore)
Human genes total: 60,492
• Genes linked to diseases: 20,445 (34%)
• Genes NOT linked to diseases: 40,047 (66%)
Diseases, based on MeSH + OMIM: 9,422
• Diseases linked to genes: 2,762 (29%)
• Diseases NOT linked to genes: 71%
20,445 genes are linked to 2,762 diseases.
Unique gene-disease associations from articles:
94,252
182,786
DISEASE STATISTICS (METABASE)
Human genes total: 60,492
• Genes linked to diseases: 21,545 (36%)
• Genes NOT linked to diseases: 38,947 (64%)
Diseases, based on MeSH + OMIM: 9,422
• Diseases linked to genes: 3,478 (37%)
• Diseases NOT linked to genes: 5,944 (63%)
21,545 genes are linked to 3,478 diseases.
Unique gene-disease associations from articles:
117,374
240,223
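The percentages on the two disease slides follow directly from the counts. The check below assumes 60,492 human genes and 9,422 diseases (3,478 + 5,944 on the METABASE slide) as the bases, and takes 2,762 as the MetaCore count of diseases linked to genes, since that is the figure that reproduces the 29% share:

```python
HUMAN_GENES = 60_492  # "Human genes total" on both disease slides
DISEASES = 9_422      # MeSH + OMIM diseases (3,478 + 5,944)


def pct(part: int, whole: int) -> int:
    """Percentage rounded to a whole number, as shown on the slides."""
    return round(100 * part / whole)


# (genes linked to diseases, diseases linked to genes)
coverage = {
    "MetaCore": (20_445, 2_762),
    "MetaBase": (21_545, 3_478),
}

shares = {name: (pct(genes, HUMAN_GENES), pct(diseases, DISEASES))
          for name, (genes, diseases) in coverage.items()}
print(shares)  # {'MetaCore': (34, 29), 'MetaBase': (36, 37)}
```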
OBJECT TYPES IN GENE-DISEASE ASSOCIATIONS
Database object | Number of genes | Number of objects
Genetic variants and epigenetics | 18,618 | 331,948
  Amplification | 999 | 1,051
  Rearrangement | 942 | 3,221
  Locus change | 125 | 128
  Fusion gene | 201 | 135
  Haplotype/SNP | 18,323 | 325,417
  Epigenetic modification | 1,513 | 1,693
mRNA level | 7,647 | 8,979
  Major transcript, protein-coding | 6,673 | 6,573
  Alternative transcript, protein-coding | 418 | 931
  miRNA/other non-coding RNA | 921 | 1,427
Protein abundance, activity, concentration or localization change | 5,653 | 6,779
  Generic protein | 5,597 | 5,737
  Peptide | 27 | 52
  Posttranslational modifications | 265 | 432
  Isoform | 233 | 429
  Mutant protein | 47 | 129
Endogenous metabolites | - | 583
Figure: Genes by biomarker object type: genetic variants and epigenetics (18,443 genes), mRNA (7,411), protein (5,595)
38,832
14,738
Aberration biomarkers
Quantitative biomarkers
What do you need to do?
• Disease pathway modeling and investigation of causal mechanisms: knowledge-mine manually curated data from peer-reviewed sources to generate hypotheses, then incorporate ‘omics’ data to further validate these hypotheses.
• Target and biomarker assessment and validation: pathway analysis of ‘omics’ data for drug and biomarker discovery.
• Understanding the mechanisms of genes associated with variants: combine variant data with other ‘omics’ data for a systems view of your disease.
• Patient stratification and comprehensive sample comparison: compare multiple ‘omics’ datasets at once to uncover differences and similarities in patient groups, time courses, or drug treatments.
Systems Biology Solutions
• METABASE: content manually curated from journal literature
• Content add-ons: Disease Specialty Modules; Systems Toxicology Modules
• Analysis platforms: METACORE (pathway analysis platform); METADRUG (compound activity prediction platform)
• Programmatic access: API or SQL access; Pipeline Pilot
Confidence in interactions used for interpretation
ONE Complete Global Network
“The results of this study show that for the majority of pathway databases, the overlap between experimentally obtained target genes and targets reported in transcriptional regulatory pathway databases is surprisingly small and often is not statistically significant. The only exception is MetaCore pathway database which yields statistically significant intersection with experimental results in 84% cases.”
Figure: Percentage of statistically significant intersections with gold standards, per transcription factor/gold standard ID
Systematic study of transcription factors and their targets identified through “gold standard experiments”, and of their intersection with transcriptional regulatory interactions in free and commercially available databases:
16% Ingenuity (Transcription)
36% Ingenuity (All)
32% TransPath
16% TransFac
16% Biocarta
24% KEGG
8% Wikipathways
16% Cell Signaling Technology
16% GeneSpring (Expression or Binding)
4% GeneSpring (Expression and Binding)
28% PathwayStudio
84% MetaCore
Source: Assessing quality and completeness of human transcriptional regulatory pathways on a genome-wide scale. Biology Direct 2011, 6:15. doi:10.1186/1745-6150-6-15
From peer-reviewed articles to signaling pathways
• 195 publications for the EGF-EGFR interaction
• Manual annotation from publications by a team of PhDs and MDs over more than 10 years
• Publications → molecular interactions → 1,600 canonical and disease signaling pathways
• Global network: 1,800,000 molecular interactions
Unique features of MetaCore interactions
Molecular function features in MetaBase, and whether they exist in the public domain:
• Directionality
  – MetaBase: there is always an initiator molecule and an effector (acceptor) molecule. Directionality exists for all types of interactions, both those presented and those not presented on pathway maps.
  – Public domain: directionality, effect and mechanism may be available for interactions presented in pathway databases, but only for interactions found in pathways, not for the entire network.
• Effect
  – MetaBase: indicates whether an initiator molecule activates or inhibits an effector. Effect exists for all types of interactions, both those presented and those not presented on pathway maps.
  – Public domain: some specialized databases that focus on miRNA action, transcription factor regulation or PTMs have directionality/mechanism due to the nature of their interaction mechanisms. However, they are source-specific and often include low-trust experimental sources, and effect is usually unavailable.
• Mechanism of interaction
  – MetaBase: describes the physico-chemical process by which molecules interact. These are not restricted to physical interactions (e.g., binding, phosphorylation); they also include catalysis, miRNA binding, transcriptional regulation and indirect causal effects such as influence on expression caused by other molecules, including natural ligands and compounds (including drugs).
  – Public domain: most databases hold different kinds of gene/protein associations: 1) physical binding, 2) co-expression, 3) disease co-association, 4) genome neighborhood. However, these databases lack functional feature data, especially mechanism of interaction.
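The three features above (directionality, effect and mechanism) map naturally onto a directed, signed, labeled edge. A minimal Python sketch, assuming a hypothetical schema and placeholder object names rather than MetaBase's actual data model:

```python
from dataclasses import dataclass
from enum import Enum


class Effect(Enum):
    ACTIVATION = "activation"
    INHIBITION = "inhibition"
    UNSPECIFIED = "unspecified"


@dataclass(frozen=True)
class Interaction:
    """One directed edge from an initiator to an effector (hypothetical schema)."""
    initiator: str
    effector: str
    effect: Effect
    mechanism: str  # e.g. "binding", "phosphorylation", "transcription regulation"


def activated_by(edges, source):
    """Effectors that `source` activates. Direction matters: A -> B says nothing about B -> A."""
    return sorted(e.effector for e in edges
                  if e.initiator == source and e.effect is Effect.ACTIVATION)


# Hypothetical objects, not real MetaBase records.
edges = [
    Interaction("LIGAND_1", "RECEPTOR_1", Effect.ACTIVATION, "binding"),
    Interaction("RECEPTOR_1", "TF_1", Effect.ACTIVATION, "phosphorylation"),
    Interaction("TF_1", "GENE_X", Effect.ACTIVATION, "transcription regulation"),
    Interaction("GENE_X", "TF_1", Effect.INHIBITION, "binding"),
]
print(activated_by(edges, "TF_1"))    # ['GENE_X']
print(activated_by(edges, "GENE_X"))  # []
```

Because each edge is directed and signed, GENE_X -> TF_1 (inhibition) is a separate record from TF_1 -> GENE_X (activation), which an undirected co-association cannot express.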
Flexibility in data analysis
• 11 different network-building algorithms, all with written and visual descriptions
• Multiple automated workflows to save, share and export
• But also one-click analysis for instant answers
• Causal reasoning algorithm to find key hubs
MetaCore: Genomic Analysis Tool
The Genomic Analysis Tools (GAT) analyze gene variant data obtained by next-generation sequencing. They include tools for cohort and family-trio analysis, followed by comprehensive annotation and interpretation of gene variants to identify potentially causal ones. All gene variants can be used for pathway and network analysis.
Workflow:
• Annotation: filter based on standard and knowledge-based options
• Functional prediction: predict damaging changes caused by the variant
• Pathway analysis: understand the variant's role in disease
Result: top-scored variants, comprising variants with known phenotypic impact and variants predicted to be damaging.
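The filtering step of the variant workflow can be sketched as follows. The `Variant` fields and the `score` are hypothetical stand-ins for annotation, functional-prediction and prioritization output; GAT's actual scoring is not described here, so treat this only as an illustration of the idea:

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    known_phenotypic_impact: bool  # hypothetical: from curated annotation
    predicted_damaging: bool       # hypothetical: from a functional-prediction step
    score: float                   # hypothetical prioritization score


def top_variants(variants, top_n=2):
    """Keep variants with known impact or predicted damage, then take the highest-scoring ones."""
    kept = [v for v in variants if v.known_phenotypic_impact or v.predicted_damaging]
    kept.sort(key=lambda v: v.score, reverse=True)
    return kept[:top_n]


variants = [
    Variant("GENE_A", True, False, 0.90),
    Variant("GENE_B", False, True, 0.70),
    Variant("GENE_C", False, False, 0.95),  # high score, but no impact evidence
    Variant("GENE_D", False, True, 0.40),
]

# Genes carrying top-scored variants become the input for pathway analysis.
genes_for_pathway_analysis = sorted({v.gene for v in top_variants(variants)})
print(genes_for_pathway_analysis)  # ['GENE_A', 'GENE_B']
```

GENE_C is dropped despite having the highest score, because the filter runs before ranking.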
Multi-omics simultaneous analysis
Combining metabolomic, transcriptomic and gene variant data for side-by-side omics enrichment analysis
Companion modules available with MetaCore
Data Annotation and Processing Tool (DAPT) processes raw microarray data and uploads the resulting differentially expressed gene sets directly to MetaCore. You can upload a dataset from a public repository and generate a list of differentially expressed genes.
Pathway Map Creator (PMC) is a companion application for creating custom pathway diagrams in MetaCore style. Such maps can visualize an analysis made in MetaCore, adapted for publication, or can be uploaded to MetaCore to become part of the analysis of new data.
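DAPT's output is a list of differentially expressed genes. As a simplified illustration of the fold-change part of that step (the same >2x criterion appears in the candidate gene example later in this deck), the sketch below keeps genes whose expression changes more than twofold in either direction. It assumes positive, normalized expression values and ignores the statistical testing a real differential expression analysis adds; the function and gene names are hypothetical:

```python
import math


def differentially_expressed(expression, min_fold=2.0):
    """Genes whose treated/control ratio changes by more than `min_fold` in either direction."""
    threshold = math.log2(min_fold)
    return sorted(gene for gene, (control, treated) in expression.items()
                  if abs(math.log2(treated / control)) > threshold)


# Hypothetical (control, treated) expression values.
expression = {
    "GENE_A": (10.0, 45.0),  # 4.5x up
    "GENE_B": (80.0, 20.0),  # 4x down
    "GENE_C": (50.0, 60.0),  # 1.2x up
    "GENE_D": (30.0, 60.0),  # exactly 2x up, not strictly above the cut-off
}
print(differentially_expressed(expression))  # ['GENE_A', 'GENE_B']
```

Working on the log2 scale makes up- and down-regulation symmetric: a 4x decrease (log2 ratio of -2) passes the same cut-off as a 4x increase.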
Pathway Creation Algorithms in MetaCore
Direct Interactions algorithm
• Draws direct interactions between selected objects. No additional objects are added to the network.
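The Direct Interactions algorithm amounts to taking the sub-network induced by the selected objects. A minimal Python sketch, assuming the global network is a plain list of directed (source, target) edges; the function name `direct_interactions` and the toy edges are hypothetical, not MetaCore's API:

```python
def direct_interactions(edges, selected):
    """Keep only edges whose endpoints are both selected; no other objects are added."""
    chosen = set(selected)
    return [(a, b) for a, b in edges if a in chosen and b in chosen]


# Hypothetical directed edges.
network = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("B", "E")]
print(direct_interactions(network, ["A", "B", "D"]))  # [('A', 'B'), ('D', 'A')]
```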
Self-regulatory Networks
• Finds the shortest directed paths containing transcription factors between the genes in your list (best used for a small number of targets).
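A minimal sketch of the idea behind self-regulatory networks, assuming an adjacency-list network and a known set of TF identifiers (all names hypothetical). It uses breadth-first search, which finds one shortest directed path per ordered pair of genes; MetaCore's actual path search and scoring may differ:

```python
from collections import deque
from itertools import permutations


def shortest_path(adj, source, target):
    """Breadth-first search for one shortest directed path, or None if unreachable."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def tf_paths(adj, genes, tfs):
    """Shortest directed paths between pairs of list genes that pass through at least one TF."""
    found = []
    for source, target in permutations(genes, 2):
        path = shortest_path(adj, source, target)
        if path is not None and any(node in tfs for node in path):
            found.append(path)
    return found


# Hypothetical directed network: G1 -> TF1 -> G2 -> G3
adj = {"G1": ["TF1"], "TF1": ["G2"], "G2": ["G3"]}
print(tf_paths(adj, ["G1", "G2", "G3"], {"TF1"}))
# [['G1', 'TF1', 'G2'], ['G1', 'TF1', 'G2', 'G3']]
```

Every ordered pair of list genes needs its own search, so the work grows roughly with the square of the list length, which is consistent with the advice to use this algorithm for a small number of targets.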
Auto expand
• Draws sub-networks around the selected objects, stopping the expansion when the sub-networks intersect.
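One way to read the Auto expand description is a simultaneous breadth-first expansion from every selected object that halts when any two neighborhoods overlap. The sketch below assumes an undirected adjacency list and a `max_steps` safety limit; both are illustrative choices, not documented MetaCore parameters:

```python
def auto_expand(adj, seeds, max_steps=10):
    """Grow one breadth-first layer around every seed per step; stop once two sub-networks share a node."""
    balls = {seed: {seed} for seed in seeds}
    frontiers = {seed: {seed} for seed in seeds}
    for _ in range(max_steps):
        for seed in seeds:
            layer = {n for node in frontiers[seed] for n in adj.get(node, ())} - balls[seed]
            balls[seed] |= layer
            frontiers[seed] = layer
        # Stop as soon as any pair of sub-networks intersects.
        if any(balls[a] & balls[b]
               for i, a in enumerate(seeds) for b in seeds[i + 1:]):
            break
    return {seed: sorted(nodes) for seed, nodes in balls.items()}


# Hypothetical undirected network: A - X - Y - B
adj = {"A": ["X"], "X": ["A", "Y"], "Y": ["X", "B"], "B": ["Y"]}
print(auto_expand(adj, ["A", "B"]))
# {'A': ['A', 'X', 'Y'], 'B': ['B', 'X', 'Y']}
```

Expanding all seeds in lockstep keeps the sub-networks balanced, so no single seed can swallow the network before the others have grown.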
• Analyze Network: creates a list of possible networks, ranked by how many objects in each network correspond to the user's gene list, how many nodes the network contains, and how many nodes each smaller sub-network contains.
• Analyze Transcription Network: similar to the above, but the sub-networks created are centered on TFs.
• Analyze Networks (Transcription Factors): focuses on the presence of TFs at end nodes.
• Analyze Networks (Receptors): focuses on the presence of receptors at the end point of a network.
Analyze Network algorithm
Generates sub-networks highly saturated with selected objects. Sub-networks are ranked by p-value and G-score and interpreted in terms of Gene Ontology.
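The p-value ranking can be illustrated with a hypergeometric test: how surprising is it that a sub-network of n objects contains k objects from a list of K, out of N objects overall? This sketch assumes that test (a common choice for enrichment, not confirmed here as MetaCore's exact statistic), uses only the standard library (`math.comb`), and does not reproduce the G-score:

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N objects, K of them in the list, n drawn)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def rank_subnetworks(subnetworks, user_list, universe_size):
    """Sort sub-networks by the p-value of their overlap with the user's list (smallest first)."""
    user = set(user_list)
    ranked = []
    for name, nodes in subnetworks.items():
        k = len(user & set(nodes))
        p = hypergeom_sf(k, universe_size, len(user), len(nodes))
        ranked.append((p, name, k, len(nodes)))
    return sorted(ranked)


# Hypothetical data: a 20-gene list in a universe of 1,000 network objects.
user_list = [f"G{i}" for i in range(20)]
subnetworks = {
    "net_1": ["G0", "G1", "G2", "G3", "G4"] + [f"X{i}" for i in range(5)],
    "net_2": ["G5"] + [f"Y{i}" for i in range(9)],
}
for p, name, k, size in rank_subnetworks(subnetworks, user_list, 1_000):
    print(f"{name}: {k}/{size} list objects, p = {p:.2e}")
```

Both toy sub-networks have 10 nodes, but net_1 is far more saturated with list objects, so it ranks first with a much smaller p-value.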
Analyze Networks (Transcription Factors) algorithm: an example
Favors network construction where the end nodes of transcriptionally regulated pathways are present in the original gene list.
Example from an mRNA expression analysis data set comparing healthy and lesional skin (p = 7.2e-46).
Analyze Networks (Receptors) algorithm: an example
Favors network construction where the end point of a pathway leads to a receptor (through “receptor binding”) and the starting point of the pathway (a transcription factor, ligand, etc.) is present in the original gene list, regardless of whether the end-point receptor is in the list.
Transcription Regulation algorithm
Generates sub-networks centered on transcription factors. Sub-networks are ranked by p-value and interpreted in terms of Gene Ontology.
Example: 13 targets/14 nodes, p = 7.3e-31
Immune response: Histamine H1 receptor signaling in immune response (p = 1e-4)
Disease biomarker enrichment
Network-Disease Associations
1) Carcinoma (72% coverage, p=3.3e-10)
2) Neoplasms, connective and soft tissue. (42% coverage, p=8e-10)
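Coverage figures such as "72% coverage" can be read as the share of a network's objects that are annotated with a disease. The sketch below assumes that definition (the slide does not state it) and uses hypothetical genes and diseases:

```python
def disease_coverage(network_nodes, disease_genes):
    """Share of network nodes annotated with each disease (assumed definition), highest first."""
    nodes = set(network_nodes)
    rows = [(disease, len(nodes & set(genes)) / len(nodes))
            for disease, genes in disease_genes.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical network and disease annotations.
network = ["G1", "G2", "G3", "G4"]
disease_genes = {
    "Disease B": ["G4", "G8"],
    "Disease A": ["G1", "G2", "G3", "G9"],
}
for disease, share in disease_coverage(network, disease_genes):
    print(f"{disease}: {share:.0%} coverage")
# Disease A: 75% coverage
# Disease B: 25% coverage
```

Disease genes outside the network (G8, G9) do not affect coverage under this definition; a significance test, like the p-values quoted on the slide, would be needed to account for how large each disease's gene set is.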
Use of Pathway Analysis in Candidate Gene Identification
• 1,061 genes are located in the mapped region for the disease.
• 360 genes are up- or down-regulated by >2x.
• 17 receptor ligand genes are important “input” nodes to pathways formed by genes with changed expression: FGF2, WNT5A, Tenascin-C, EGF, IL1RN, BDNF, TGF-beta2, FGF2, OSF-2, CSPG4 (NG2), IL-8, ENA-78, GCP2, SLIT2, SLIT3, Activin beta A, Annexin I.
Pathway analysis narrows down the number of candidate genes for the disease
Figure: network linking genes from the mapped region through FGF2, IL1RN and ErbB2 (other nodes include PECAM1, DDX5, BCAS3, microRNA1, RARalpha, MUL, VHR, WIP, NIK, Plakoglobin, HEXIM1, Prohibitin, STAT5A, STAT3, Clathrin, PSME3, PSMC5)
These genes, from the mapped region of interest, can form interaction pathways that pass through the receptor ligands identified by the first analysis.
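The narrowing step can be sketched as a set operation: keep genes from the mapped region that share at least one pathway with a receptor-ligand gene. This is a simplification under stated assumptions (pathways as plain sets of gene identifiers, hypothetical names), not the exact procedure behind the slide:

```python
def narrow_candidates(region_genes, ligands, pathways):
    """Keep region genes that share at least one pathway with a receptor-ligand gene."""
    region, ligand_set = set(region_genes), set(ligands)
    hits = set()
    for members in pathways.values():
        members = set(members)
        if members & ligand_set:      # pathway touches at least one ligand
            hits |= members & region  # collect the region genes it contains
    return sorted(hits)


# Hypothetical identifiers.
region_genes = ["R1", "R2", "R3", "R4", "R5"]
ligands = ["L1", "L2"]
pathways = {
    "pathway_1": ["L1", "R1", "X"],
    "pathway_2": ["R2", "Y"],
    "pathway_3": ["L2", "R3", "R4"],
}
print(narrow_candidates(region_genes, ligands, pathways))  # ['R1', 'R3', 'R4']
```

R2 drops out because its only pathway contains no ligand, and R5 because it belongs to no pathway at all, however relevant it might be biologically.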
A caveat
Not every gene belongs to a pathway in the database…