Finding Consistent Subnetworks across Microarray Datasets

Fan Qi
GS5002 Journal Club

OUTLINE

• Introduction
• Methodology
• Results & Discussions
• Conclusions

INTRODUCTION

Identify Differential Gene Expression
• Identify significant genes w.r.t. a phenotype

Importance:
• Testing the effectiveness of treatments
• Biological insights into diseases
• Developing new treatments
• Disease prophylaxis
• Any others?

CURRENT METHODS

Individual Genes
• Search for individual differentially expressed genes
• Fold-change, t-test, SAM

Gene Pathway Detection
• Looking at a set of genes instead of individual genes
• Bayesian learning and Boolean network learning

Gene Classes
• Adding existing biological insights
• Over-representation analysis (ORA), Functional Class Scoring (FCS), GSEA, NEA, ErmineJ

CHALLENGE

Different results from different datasets of the SAME disease!

Zhang M et al. [1] demonstrated inconsistency in SAM:

Dataset           DEGs      POG    nPOG
Prostate cancer   Top 10    0.30   0.30
                  Top 50    0.14   0.14
                  Top 100   0.15   0.15
Lung cancer       Top 10    0.00   0.00
                  Top 50    0.20   0.19
                  Top 100   0.31   0.30
DMD               Top 10    0.20   0.20
                  Top 50    0.42   0.42
                  Top 100   0.54   0.54

Reconstructed from Table 1 in [1]

Inconsistency among datasets
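The overlap figures in the table can be illustrated with a small sketch. It assumes POG (percentage of overlapping genes) is the number of genes shared by two top-k lists divided by the list length. The gene names are made up, and nPOG, which additionally corrects for chance overlap, is not shown.

```python
def pog(list_a, list_b):
    """Fraction of the genes in list_a that also appear in list_b."""
    if not list_a:
        raise ValueError("list_a must not be empty")
    shared = set(list_a) & set(list_b)
    return len(shared) / len(list_a)


# Hypothetical top-5 DEG lists from two datasets of the same disease.
top_dataset_1 = ["G1", "G2", "G3", "G4", "G5"]
top_dataset_2 = ["G4", "G9", "G1", "G7", "G8"]

# Two of the five genes (G1 and G4) are shared.
print(pog(top_dataset_1, top_dataset_2))
```

A value close to 1 means the two datasets agree on the top genes; the values in the table are mostly well below that.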

NEW APPROACH

SNet [2]
• Proposed in 2011
• Utilizes gene-gene relationships in the analysis

Gene-gene relationship
• Activates vs. inhibits

Gene Subnetwork
• A gene is a vertex, a relationship is an edge

From Fig 1 in [2]

[Figure: example subnetwork containing the genes RHOA, VAV, PIK3R2, ARHGEF1, RAC1 and IQGAP1. Partially adapted from Fig 2 in [2]]
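The vertex/edge view of a subnetwork can be sketched as a small adjacency map. This is only an illustration: the gene names a2 to a5 and the relationships between them are hypothetical placeholders, not taken from [2].

```python
from collections import defaultdict


def build_subnetwork(edges):
    """Return an adjacency map: gene -> list of (target gene, relation)."""
    graph = defaultdict(list)
    for source, target, relation in edges:
        if relation not in ("activates", "inhibits"):
            raise ValueError(f"unknown relation: {relation}")
        graph[source].append((target, relation))
        graph.setdefault(target, [])  # make sure the target is a vertex too
    return dict(graph)


# Hypothetical gene-gene relationships (illustration only).
edges = [
    ("a2", "a3", "activates"),
    ("a3", "a4", "inhibits"),
    ("a3", "a5", "activates"),
]
network = build_subnetwork(edges)

for gene, targets in sorted(network.items()):
    print(gene, targets)
```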

METHODOLOGY

Input:
• Genes labeled with phenotype
  • Obtained from microarray experiments
• Third-party info:
  • Gene pathway info
  • Gene reaction info

Attributes of a subnetwork
• Size, score

Output:
• A set of significant subnetworks

Subnetwork Extraction → Subnetwork Scoring → Subnetwork Significance

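METHODOLOGY – STEP 1

• Patients (P1, P2, P3, ...) are grouped by phenotype, and each patient has a ranked gene list
• Only the top-ranked genes are kept for each patient
• Select one phenotype as d and treat the others as ¬d
• Among the patients of phenotype d, select the genes that occur in at least β% of patients (β = 50); these form the gene list GL
• Partition GL into multiple pathways and generate subnetworks
• Repeat for every phenotype group

Result: a list of subnetworks w.r.t. phenotype d

[Figure: patients P1, P2, P3 grouped by phenotype, the gene list GL, and an example subnetwork over genes a2 to a7]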
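The extraction step can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation from [2]: "occurs in β% of patients" is read as "at least β%", pathway edges are treated as undirected, a subnetwork is taken to be a connected component of the pathway restricted to the selected genes, and all function names and gene names are hypothetical.

```python
from collections import Counter


def select_genes(top_genes_per_patient, beta=50):
    """Genes found in the top list of at least beta% of the patients."""
    counts = Counter(g for genes in top_genes_per_patient for g in set(genes))
    needed = beta / 100 * len(top_genes_per_patient)
    return {g for g, c in counts.items() if c >= needed}


def extract_subnetworks(pathway_edges, gene_list):
    """Connected components of the pathway restricted to gene_list."""
    neighbors = {g: set() for g in gene_list}
    for a, b in pathway_edges:
        if a in neighbors and b in neighbors:
            neighbors[a].add(b)
            neighbors[b].add(a)
    seen, components = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first walk over the selected genes
            gene = stack.pop()
            if gene in component:
                continue
            component.add(gene)
            stack.extend(neighbors[gene] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical top-gene sets for four patients of phenotype d.
patients_d = [
    {"a2", "a3", "a4"},
    {"a2", "a3", "a6"},
    {"a3", "a4", "a6"},
    {"a2", "a4", "a7"},
]
# Hypothetical pathway edges.
pathway = [("a2", "a3"), ("a3", "a4"), ("a4", "a7"), ("a7", "a6")]

gene_list = select_genes(patients_d, beta=50)
subnetworks = extract_subnetworks(pathway, gene_list)
print(sorted(gene_list), subnetworks)
```

In this toy input a7 is dropped because only one of four patients has it in the top list, which disconnects a6 from the rest and leaves it as a single-gene component. A real pipeline would probably filter out such small components.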

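METHODOLOGY – STEP 2

For each subnetwork sp in the list for phenotype d and each patient p_k, compute the overall expression level:

  SNet_{sp,k} = \sum_{i} \frac{n_i}{N}

where
  g_i: a gene in sp that is highly expressed in p_k
  n_i: # patients in d who have g_i highly expressed
  N: total # patients in d

For the patients in d and in ¬d, collect the scores and compute a t-test:

  S_{sp,d} = \langle SNet_{sp,1}, SNet_{sp,2}, \ldots, SNet_{sp,n} \rangle
  S_{sp,\neg d} = \langle SNet_{sp,n+1}, SNet_{sp,n+2}, \ldots, SNet_{sp,m} \rangle

Assign the resulting t-test score to each subnetwork.

[Figure: example subnetwork over genes a2 to a7, scored for patient P1 of phenotype d]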
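The scoring step can be sketched in a few lines. The sketch assumes that a patient's subnetwork score is the sum, over the subnetwork genes highly expressed in that patient, of the fraction of phenotype-d patients who also highly express the gene. The slides only say "t-test", so the unequal-variance (Welch) statistic used here is an assumption, as are all function names and the toy data.

```python
from math import sqrt
from statistics import mean, variance


def subnetwork_score(subnetwork, patient_top_genes, patients_d_top_genes):
    """Sum of per-gene support in phenotype d, over the subnetwork genes
    that are highly expressed in this patient."""
    total = len(patients_d_top_genes)
    score = 0.0
    for gene in subnetwork:
        if gene in patient_top_genes:
            supporting = sum(gene in top for top in patients_d_top_genes)
            score += supporting / total
    return score


def t_statistic(scores_d, scores_not_d):
    """Two-sample t statistic with unequal variances (assumed variant).
    Undefined when both groups have zero variance."""
    var_d, var_n = variance(scores_d), variance(scores_not_d)
    standard_error = sqrt(var_d / len(scores_d) + var_n / len(scores_not_d))
    return (mean(scores_d) - mean(scores_not_d)) / standard_error


# Hypothetical subnetwork and highly expressed gene sets per patient.
subnetwork = {"a2", "a3", "a4"}
patients_d = [{"a2", "a3", "a4"}, {"a2", "a3"}, {"a3", "a4"}]
patients_not_d = [{"a2"}, set(), {"a4"}]

scores_d = [subnetwork_score(subnetwork, p, patients_d) for p in patients_d]
scores_not_d = [subnetwork_score(subnetwork, p, patients_d) for p in patients_not_d]
t_score = t_statistic(scores_d, scores_not_d)
print(scores_d, scores_not_d, t_score)
```

Note that the per-gene support is always measured against the phenotype-d patients, even when scoring a patient from ¬d, which is what makes the two score vectors comparable.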

METHODOLOGY – STEP 3

A. Randomly swap the phenotype labels of the patients, then recreate the subnetworks and t-test scores (steps 1-2).
B. Repeat [A] for 1,000 permutations.
   • Forms a 2-D histogram
C. Estimate the nominal p-value of each subnetwork.
D. Select the subnetworks whose p-value is significant.
   • Null hypothesis: the subnetwork is not significant

Fig 5 in original paper
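The permutation idea in steps A to C can be sketched in a simplified, one-dimensional form. The sketch shuffles the phenotype labels over a fixed list of per-patient scores instead of regenerating the subnetworks, and it uses a plain difference of means as the statistic, so it does not reproduce the 2-D histogram from [2]. The function names, the toy scores and the "+1" correction in the p-value are assumptions made for illustration.

```python
import random


def mean_difference(values, labels):
    """Mean score of phenotype d minus mean score of the other patients."""
    in_d = [v for v, lab in zip(values, labels) if lab == "d"]
    not_d = [v for v, lab in zip(values, labels) if lab != "d"]
    return sum(in_d) / len(in_d) - sum(not_d) / len(not_d)


def permutation_p_value(values, labels, statistic, n_permutations=1000, seed=0):
    """Share of label permutations whose statistic is at least as large
    as the observed one (with a +1 correction so p is never zero)."""
    rng = random.Random(seed)
    observed = statistic(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # swap phenotype labels among patients
        if statistic(values, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


# Hypothetical per-patient scores for one subnetwork.
scores = [2.3, 1.7, 1.7, 2.0, 0.7, 0.0, 0.7, 0.3]
labels = ["d", "d", "d", "d", "not_d", "not_d", "not_d", "not_d"]

p_value = permutation_p_value(scores, labels, mean_difference)
print(p_value)
```

A small p-value indicates that the observed separation between d and ¬d is rarely matched when the labels are assigned at random.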

RESULTS AND DISCUSSIONS

Datasets:
• Leukemia: Golub vs. Armstrong
• ALL: Ross vs. Yeoh
• DMD: Haslett vs. Pescatori
• Lung: Bhattacharjee vs. Garber

Performance comparison:
• Subnetwork overlap (with GSEA)
• Gene overlap (GSEA, SAM, t-test)

Other comparisons:
• Network size, gene validity with t-test

RESULTS AND DISCUSSIONS

Subnetwork Overlap

                                         Overlap             Number
Disease    Dataset 1       Dataset 2     SNet      GSEA      SNet   GSEA
Leukemia   Golub           Armstrong     83.33%    0%        20     0
ALL        Ross            Yeoh          47.63%    23.1%     10     6
DMD        Haslett         Pescatori     58.33%    55.6%     7      10
Lung       Bhattacharjee   Garber        90.90%    0%        9      0

Synthesized from Tables 1 and 2 in [2]. Higher is better.

RESULTS AND DISCUSSIONS

Gene Overlap

Disease    SNet     GSEA    t-test (p < 0.05)   t-test (top)   SAM (p < 0.05)   SAM (top)
Leukemia   91.30%   2.38%   73.01%              14.29%         49.96%           22.62%
ALL        93.01%   4.0%    60.20%              57.33%         81.25%           49.33%
DMD        69.23%   28.9%   49.60%              20.00%         76.98%           42.22%
Lung       51.18%   4.0%    65.61%              26.16%         65.61%           24.62%

Synthesized from Tables 3, 4 and 5 in [2]. Higher is better.

RESULTS AND DISCUSSIONS

Size of subnetworks

Disease    T-Test   SNet
Size of network: 2 3 4 5 5 6 7 >8

Leukemia   84 8 1 0 0 2 3 2 1
Subtype    75 5 1 1 1 1 0 1 6
DMD        45 3 1 0 0 1 0 0 5
Lung       65 3 2 1 0 5 3 0 1

Reconstructed from Table 6 in [2]

RESULTS AND DISCUSSIONS

Validity
• Compare the genes in EACH subnetwork with those found by the t-test
• Around 70%-100% of the genes in each subnetwork also appear in the t-test results

Selected results (too large to present in full):

Subnetwork Name         Percentage   Subnetwork Name     Percentage
Leukaemia_B Cell-VAV1   81.82%       SNET_CTNNB1         100%
Leukaemia_UBC           100%         SNET_TNFSF10        60%
Leukaemia_RAC1          57.15%       SNET_PYGM           60%
DMD_RHOA                75%          DMD_ACTB            83.33%
DMD_SDC3                88.89%       Leukaemia_POU2F2    75.00%
MLLBCR_ACAA1            28.67%       BCR_T_RASA1         44.44%
MLLBCR_BLNK             72.73%       BCR_ABL1            75.00%
SNET_NOTCH3             100%         DMD_CALM1           80%

Selected from Tables 7, 8, 9 and 10 in [2]

CONCLUSIONS

• Traditional methods give inconsistent results across different datasets of the same disease
• SNet utilizes biological insights to narrow the gap
  • Gene-to-gene relationships
  • Gene pathway knowledge
• SNet shows better results than established algorithms
  • More consistent

REFERENCES

[1] Zhang M, Zhang L, Zou J, Yao C, Xiao H, Liu Q, Wang J, Wang D, Wang C, Guo Z: Evaluating reproducibility of differential expression discoveries in microarray studies by considering correlated molecular changes.

[2] Soh D, Dong D, Guo Y, Wong L: Finding consistent disease subnetworks across microarray datasets.

THANK YOU!!
