Finding Consistent Subnetworks across Microarray Datasets
Post on 24-Feb-2016
FINDING CONSISTENT SUBNETWORKS ACROSS MICROARRAY DATASETS
Fan Qi
GS5002 Journal Club
2
OUTLINE
  • Introduction
  • Methodology
  • Results & Discussions
  • Conclusions
3
INTRODUCTION
Identify Differential Gene Expression
  • Identify significant genes w.r.t. a phenotype
Importance:
  • Testing the effectiveness of treatment
  • Biological insights into diseases
  • Developing new treatments
  • Disease prophylaxis
  • Any others?
4
CURRENT METHODS
Individual Genes
  • Search for individual differentially expressed genes
  • Fold-change, t-test, SAM
Gene Pathway Detection
  • Looking at a set of genes instead of individual genes
  • Bayesian learning and Boolean network learning
Gene Classes
  • Adding existing biological insights
  • Over-representation analysis (ORA), Functional Class Scoring (FCS), GSEA, NEA, ErmineJ
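As a rough illustration of the individual-gene approach listed above, the sketch below ranks genes by absolute log2 fold change between two groups. The gene names and expression values are hypothetical, and real analyses would normally combine fold change with a statistical test such as the t-test or SAM.

```python
import math
from statistics import mean

def log2_fold_change(case, control):
    """log2 of the ratio of mean expression (case over control)."""
    return math.log2(mean(case) / mean(control))

# Hypothetical expression values: gene -> (case samples, control samples).
expression = {
    "geneA": ([8.0, 8.0, 8.0], [2.0, 2.0, 2.0]),
    "geneB": ([3.0, 3.0, 3.0], [3.0, 3.0, 3.0]),
    "geneC": ([1.0, 1.0, 1.0], [2.0, 2.0, 2.0]),
}

fold_changes = {g: log2_fold_change(c, k) for g, (c, k) in expression.items()}

# Rank genes by absolute log2 fold change, largest first.
ranked = sorted(fold_changes, key=lambda g: abs(fold_changes[g]), reverse=True)
```

Each gene is scored on its own here, which is the limitation the pathway-based and class-based methods try to address.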
5
CHALLENGE
Different results from different datasets of the SAME disease!
Zhang et al. [1] demonstrated inconsistency in SAM:

Dataset           DEGs      POG    nPOG
Prostate cancer   Top 10    0.3    0.3
                  Top 50    0.14   0.14
                  Top 100   0.15   0.15
Lung cancer       Top 10    0.00   0.00
                  Top 50    0.20   0.19
                  Top 100   0.31   0.30
DMD               Top 10    0.20   0.20
                  Top 50    0.42   0.42
                  Top 100   0.54   0.54

Reconstructed from Table 1 in [1]
Inconsistency among datasets
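To make the kind of overlap reported in the table concrete, here is a minimal sketch that computes the fraction of top-k genes shared by two ranked gene lists. It is only in the spirit of the POG column: the exact POG and nPOG definitions are given in [1] and may differ, and the gene identifiers below are hypothetical.

```python
def top_k_overlap(ranked_a, ranked_b, k):
    """Fraction of the top-k genes of list A that also appear in the top-k of list B.

    Assumes both lists contain at least k distinct genes.
    """
    top_a = set(ranked_a[:k])
    top_b = set(ranked_b[:k])
    return len(top_a & top_b) / k

# Hypothetical ranked DEG lists from two datasets of the same disease.
dataset_1 = ["g1", "g2", "g3", "g4", "g5", "g6"]
dataset_2 = ["g3", "g7", "g1", "g8", "g2", "g9"]

overlap_top3 = top_k_overlap(dataset_1, dataset_2, 3)
overlap_top5 = top_k_overlap(dataset_1, dataset_2, 5)
```

A value near 1 means the two datasets agree on their top genes; the low values in the table are what motivates the search for a more consistent method.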
6
NEW APPROACH
SNet [2]
  • Proposed in 2011
  • Utilizes gene-gene relationships in the analysis
Gene-gene relationship
  • Activates vs. inhibits
  • From Fig 1 in [2]
Gene Subnetwork
  • A gene is a vertex, a relationship is an edge
[Figure: example subnetwork with genes RHOA, VAV, PIK3R2, ARHGEF1, RAC1, IQGAP1; partially adapted from Fig 2 in [2]]
7
METHODOLOGY
Input:
  • Genes labeled with phenotype
    • Obtained from microarray experiments
  • Third-party info:
    • Gene pathway info
    • Gene reaction info
Attributes of a subnetwork
  • Size, score
Output:
  • A set of significant subnetworks
Steps: Subnetwork Extraction, Subnetwork Scoring, Subnetwork Significance
8
METHODOLOGY – STEP 1
[Figure: each patient's ranked gene list, with patients grouped by phenotype (P1, P2, P3, ...)]
9
METHODOLOGY – STEP 1
  • Only the top-ranked genes are kept for each patient
  • Repeat for every phenotype group
[Figure: ranked gene lists of patients with phenotype P1]
10
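METHODOLOGY – STEP 1
  • Select one phenotype as $d$ and the others as $\neg d$
  • Select the genes that occur in at least $\beta$% of the patients of $d$ ($\beta = 50$), giving the gene list $GL$
[Figure: patients of phenotype P1 (d) contributing to the gene list $GL$]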
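The gene selection in Step 1 can be sketched as follows. This is a minimal illustration, not the implementation from [2]: the `top_fraction` parameter, the function names, and the expression values are all assumptions made for the example, and only the 50% patient threshold comes from the slides.

```python
def top_genes(expression, top_fraction):
    """Return the set of genes in the top `top_fraction` of one patient's ranked list."""
    ranked = sorted(expression, key=expression.get, reverse=True)
    keep = max(1, int(len(ranked) * top_fraction))
    return set(ranked[:keep])

def build_gene_list(patients, top_fraction, beta):
    """Genes that are highly expressed in at least `beta` percent of the patients."""
    counts = {}
    for expression in patients:
        for gene in top_genes(expression, top_fraction):
            counts[gene] = counts.get(gene, 0) + 1
    needed = beta / 100 * len(patients)
    return {gene for gene, c in counts.items() if c >= needed}

# Hypothetical expression profiles for four patients of phenotype d.
patients_d = [
    {"a1": 9.0, "a2": 8.0, "a3": 1.0, "a4": 2.0},
    {"a1": 7.0, "a2": 1.0, "a3": 8.0, "a4": 2.0},
    {"a1": 9.0, "a2": 7.0, "a3": 2.0, "a4": 1.0},
    {"a1": 1.0, "a2": 9.0, "a3": 2.0, "a4": 8.0},
]

gene_list = build_gene_list(patients_d, top_fraction=0.5, beta=50)
```

In this toy input, a1 and a2 are each in the top half for three of the four patients, so they pass the 50% threshold, while a3 and a4 do not.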
11
METHODOLOGY – STEP 1
  • Partition the gene list $GL$ into multiple pathways
  • Generate subnetworks
[Figure: a pathway over genes a1 to a7, and the subnetworks generated from it]
A list of subnetworks w.r.t. the selected phenotype
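One plausible way to generate subnetworks from a pathway is to keep only the genes in the gene list and take the connected components of what remains. The sketch below does that under this assumption; the edges, the gene list, and the function name are hypothetical, and the exact rule used by SNet is described in [2].

```python
def extract_subnetworks(pathway_edges, gene_list):
    """Connected components of a pathway after keeping only genes in `gene_list`."""
    # Build an undirected adjacency map over the retained genes.
    adjacency = {}
    for u, v in pathway_edges:
        if u in gene_list and v in gene_list:
            adjacency.setdefault(u, set()).add(v)
            adjacency.setdefault(v, set()).add(u)

    subnetworks, seen = [], set()
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first traversal collects one component.
        component, stack = set(), [start]
        while stack:
            gene = stack.pop()
            if gene in component:
                continue
            component.add(gene)
            stack.extend(adjacency[gene] - component)
        seen |= component
        subnetworks.append(component)
    return subnetworks

# Hypothetical pathway edges over genes a1..a7; a4 is not in the gene list.
edges = [("a1", "a3"), ("a3", "a5"), ("a3", "a4"),
         ("a4", "a7"), ("a6", "a7"), ("a2", "a6")]
gene_list = {"a1", "a2", "a3", "a5", "a6", "a7"}

subnetworks = extract_subnetworks(edges, gene_list)
```

Dropping a4 splits the pathway into two subnetworks. Genes left with no retained edge are ignored in this sketch.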
12
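METHODOLOGY – STEP 2
For each subnetwork $sp$ in the list and each patient $p_j$, compute the overall expression level:
  $SNet_{sp,j} = \sum_{g_i} k_i / n$, where
  • $g_i$: a gene in $sp$ that is highly expressed in $p_j$
  • $k_i$: # patients in $d$ who have $g_i$ highly expressed
  • $n$: total # patients in $d$
For patients in $d$ and $\neg d$, compute a t-test between the two score vectors:
  $S_{sp,d} = \langle SNet_{sp,1}, SNet_{sp,2}, \ldots, SNet_{sp,n} \rangle$
  $S_{sp,\neg d} = \langle SNet_{sp,n+1}, SNet_{sp,n+2}, \ldots, SNet_{sp,m} \rangle$
Assign the resulting t-test score $S_{sp,t}$ to each subnetwork
[Figure: subnetwork over genes a1 to a7 scored for phenotype P1 (d)]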
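The scoring step can be sketched as below, assuming each patient is represented by the set of genes that are highly expressed in that patient. The slides say only "t-test", so the use of Welch's t-statistic is an assumption, as are the patient sets and function names.

```python
from math import sqrt
from statistics import mean, variance

def subnetwork_score(subnetwork, patient_high, gene_weight):
    """Sum the weights of subnetwork genes that are highly expressed in one patient."""
    return sum(gene_weight[g] for g in subnetwork if g in patient_high)

def welch_t(xs, ys):
    """Welch's t-statistic for two samples (fails if both variances are zero)."""
    se = sqrt(variance(xs) / len(xs) + variance(ys) / len(ys))
    return (mean(xs) - mean(ys)) / se

# Hypothetical sets of highly expressed genes per patient.
high_d = [{"a1", "a3"}, {"a1", "a3", "a5"}, {"a1", "a5"}, {"a3", "a5"}]
high_not_d = [{"a1"}, set(), {"a5"}, {"a1"}]
subnetwork = {"a1", "a3", "a5"}

# Weight of a gene: fraction of phenotype-d patients with the gene highly expressed.
n = len(high_d)
gene_weight = {g: sum(g in p for p in high_d) / n for g in subnetwork}

scores_d = [subnetwork_score(subnetwork, p, gene_weight) for p in high_d]
scores_not_d = [subnetwork_score(subnetwork, p, gene_weight) for p in high_not_d]
t_score = welch_t(scores_d, scores_not_d)
```

Patients of phenotype d score higher on this subnetwork than the others, so the t-score is positive. A large t-score marks a subnetwork that separates the two phenotype groups.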
13
METHODOLOGY – STEP 3
A. Randomly swap the phenotype labels of the patients, recreating the subnetworks and t-test scores (steps 1-2)
B. Repeat [A] for 1,000 permutations
   • Forms a 2-D histogram
C. Estimate the nominal p-value of each subnetwork
D. Select the subnetworks whose p-value passes the significance threshold
   • Null hypothesis: the subnetwork is not significant
Fig 5 in original paper
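A simplified sketch of the permutation test follows. It departs from the procedure above in one respect that should be kept in mind: the subnetwork is held fixed and only the t-score is recomputed after shuffling labels, whereas the slides describe recreating the subnetworks in every permutation and pooling the results into a 2-D histogram. The scores, the seed, and the add-one correction are assumptions for illustration.

```python
import random
from math import sqrt
from statistics import mean, variance

def welch_t(xs, ys):
    """Welch's t-statistic; returns 0.0 when both samples have zero variance."""
    se = sqrt(variance(xs) / len(xs) + variance(ys) / len(ys))
    return (mean(xs) - mean(ys)) / se if se > 0 else 0.0

def permutation_p_value(scores_d, scores_not_d, permutations=1000, seed=0):
    """Share of label permutations whose t-score is at least the observed one."""
    rng = random.Random(seed)
    observed = welch_t(scores_d, scores_not_d)
    pooled = list(scores_d) + list(scores_not_d)
    n = len(scores_d)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(pooled)  # randomly reassign phenotype labels
        if welch_t(pooled[:n], pooled[n:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (permutations + 1)

# Hypothetical per-patient subnetwork scores for the two phenotype groups.
scores_d = [1.5, 2.25, 1.5, 1.5, 2.25, 1.5]
scores_not_d = [0.75, 0.0, 0.75, 0.75, 0.0, 0.75]

p_value = permutation_p_value(scores_d, scores_not_d)
```

Because the two groups are fully separated in this toy input, very few shuffles reach the observed t-score and the estimated p-value is small.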
14
RESULTS AND DISCUSSIONS
Datasets:
  • Leukemia: Golub vs. Armstrong
  • ALL: Ross vs. Yeoh
  • DMD: Haslett vs. Pescatori
  • Lung: Bhattacharjee vs. Garber
Performance comparison:
  • Subnetwork overlap (with GSEA)
  • Gene overlap (GSEA, SAM, t-test)
Other comparisons:
  • Network size, gene validity with t-test
15
RESULTS AND DISCUSSIONS
Subnetwork Overlap

                                         Overlap (%)        Count
Disease    Dataset 1       Dataset 2     SNET     GSEA      SNET   GSEA
Leukemia   Golub           Armstrong     83.33%   0%        20     0
ALL        Ross            Yeoh          47.63%   23.1%     10     6
DMD        Haslett         Pescatori     58.33%   55.6%     7      10
Lung       Bhattacharjee   Garber        90.90%   0%        9      0

Synthesized from Tables 1 and 2 in [2]. Higher is better.
16
RESULTS AND DISCUSSIONS
Gene Overlap

Disease    SNet     GSEA     T-Test (p < 0.05)   T-Test (top)   SAM (p < 0.05)   SAM (top)
Leukemia   91.30%   2.38%    73.01%              14.29%         49.96%           22.62%
ALL        93.01%   4.0%     60.20%              57.33%         81.25%           49.33%
DMD        69.23%   28.9%    49.60%              20.00%         76.98%           42.22%
Lung       51.18%   4.0%     65.61%              26.16%         65.61%           24.62%

Synthesized from Tables 3, 4, and 5 in [2]. Higher is better.
17
RESULTS AND DISCUSSIONS
Size of subnetworks
Disease T-Test SNet
Size of Network 2 3 4 5 5 6 7 >8
Leukemia 84 8 1 0 0 2 3 2 1
Subtype 75 5 1 1 1 1 0 1 6
DMD 45 3 1 0 0 1 0 0 5
Lung 65 3 2 1 0 5 3 0 1
Reconstructed from Table 6 in [2]
18
RESULTS AND DISCUSSIONS
Validity
  • Compare the genes in EACH subnetwork with those found by the t-test
  • The share of genes in each subnetwork that also appear in the t-test is around 70%-100%
  • Selected results (too large to present in full)

Subnetwork Name         Percentage   Subnetwork Name     Percentage
Leukaemia_B Cell-VAV1   81.82%       SNET_CTNNB1         100%
Leukaemia_UBC           100%         SNET_TNFSF10        60%
Leukaemia_RAC1          57.15%       SNET_PYGM           60%
DMD_RHOA                75%          DMD_ACTB            83.33%
DMD_SDC3                88.89%       Leukaemia_POU2F2    75.00%
MLLBCR_ACAA1            28.67%       BCR_T_RASA1         44.44%
MLLBCR_BLNK             72.73%       BCR_ABL1            75.00%
SNET_NOTCH3             100%         DMD_CALM1           80%

Selected from Tables 7, 8, 9, and 10 in [2]
19
CONCLUSIONS
  • Traditional methods have an inconsistency problem across different datasets of the same disease
  • SNet utilizes biological insights to mitigate the gap
    • Gene-to-gene relationships
    • Gene pathway knowledge
  • SNet shows better results than established algorithms
    • More consistent
20
REFERENCES
[1] Zhang M, Zhang L, Zou J, Yao C, Xiao H, Liu Q, Wang J, Wang D, Wang C, Guo Z: Evaluating reproducibility of differential expression discoveries in microarray studies by considering correlated molecular changes.
[2] Soh D, Dong D, Guo Y, Wong L: Finding consistent disease subnetworks across microarray datasets.
21
THANK YOU!!