Alex Lewin, Centre for Biostatistics, Imperial College, London. Joint work with Natalia Bochkina, Sylvia Richardson. BBSRC Exploiting Genomics grant. Mixture models for classifying differentially expressed genes

Mar 28, 2015


Abigail Mooney

Alex Lewin

Centre for Biostatistics

Imperial College, London

Joint work with Natalia Bochkina, Sylvia Richardson

BBSRC Exploiting Genomics grant

Mixture models for classifying differentially expressed genes


Modelling differential expression

• Many different methods/models for differential expression
– t-test
– t-test with stabilised variances (EB)
– Bayesian hierarchical models
– mixture models

• Choice whether to model alternative hypothesis or not

• Our model:
– Model the alternative hypothesis
– Fully Bayesian


Mixture model features

• Gene means and fold differences: linear model on the log scale

• Gene variances: borrow information across genes by assuming exchangeable variances

• Mixture prior on fold difference parameters

• Point mass prior for ‘null hypothesis’


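Fully Bayesian mixture model for differential expression

• 1st level

$y_{g1r} \mid \alpha_g, d_g, \sigma_{g1} \sim N(\alpha_g - \tfrac{1}{2} d_g, \sigma_{g1}^2)$,

$y_{g2r} \mid \alpha_g, d_g, \sigma_{g2} \sim N(\alpha_g + \tfrac{1}{2} d_g, \sigma_{g2}^2)$

• 2nd level

$\sigma_{gs}^2 \mid a_s, b_s \sim IG(a_s, b_s)$

$d_g \sim \pi_0 \delta_0 + \pi_1 G_-(1.5, \lambda_1) + \pi_2 G_+(1.5, \lambda_2)$

(The point mass $\pi_0 \delta_0$ is H0; the two Gamma components are the explicit modelling of the alternative.)

• 3rd level

Gamma hyperprior for $\lambda_1, \lambda_2, a_s, b_s$

Dirichlet distribution for $(\pi_0, \pi_1, \pi_2)$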
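As a rough illustration of the mixture prior on the fold differences, the sketch below draws values of d_g from a point mass at zero plus a negative and a positive Gamma component. The function name, the rate values and the mixture weights in the usage line are hypothetical, and treating the second Gamma argument as a rate (so NumPy's scale is its reciprocal) is an assumption, not something the slides state.

```python
import numpy as np

def sample_fold_differences(n_genes, pi, rate_down, rate_up, rng):
    """Draw d_g from pi0*delta_0 + pi1*G_-(1.5, rate_down) + pi2*G_+(1.5, rate_up)."""
    # Component labels: 0 = null, 1 = negative Gamma, 2 = positive Gamma.
    z = rng.choice(3, size=n_genes, p=pi)
    d = np.zeros(n_genes)  # null genes keep d_g = 0 (the point mass)
    n_down = int(np.sum(z == 1))
    n_up = int(np.sum(z == 2))
    # Assumption: the second Gamma argument is a rate, so scale = 1 / rate.
    d[z == 1] = -rng.gamma(shape=1.5, scale=1.0 / rate_down, size=n_down)
    d[z == 2] = rng.gamma(shape=1.5, scale=1.0 / rate_up, size=n_up)
    return z, d

rng = np.random.default_rng(0)
# Hypothetical weights and rates, chosen only for illustration.
z, d = sample_fold_differences(1000, [0.8, 0.1, 0.1], 1.0, 1.0, rng)
```

In the full model the weights and Gamma parameters are not fixed as here but given Dirichlet and Gamma hyperpriors and estimated from the data.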


Decision Rules

• In the fully Bayesian framework, introduce a latent allocation variable $z_g \in \{0, 1\}$, with $z_g = 0$ if gene g is in the null and $z_g = 1$ if it is in the alternative

• For each gene, calculate the posterior probability of belonging to the unmodified component: $p_g = \Pr(z_g = 0 \mid \text{data})$

• Classify using a cut-off on $p_g$ (Bayes rule corresponds to 0.5)

• For any given cut-off on $p_g$, can estimate FDR, FNR.

For gene-list S, est. (FDR | data) = $\sum_{g \in S} p_g / |S|$
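The decision rule above can be sketched in a few lines. The FDR estimate follows the formula on the slide; the FNR estimate (the average of 1 - p_g over the genes not in the list) is the analogous quantity and is an assumption here, since the slide does not spell it out. The function name and the example probabilities are hypothetical.

```python
import numpy as np

def classify_and_estimate(p_null, cutoff=0.5):
    """Declare genes DE when p_g < cutoff and estimate FDR / FNR from p_g."""
    p = np.asarray(p_null, dtype=float)
    selected = p < cutoff  # gene list S: genes declared DE
    # Estimated FDR: average posterior null probability inside the list.
    est_fdr = p[selected].mean() if selected.any() else 0.0
    # Assumed analogue for FNR: average posterior non-null probability outside the list.
    est_fnr = (1.0 - p[~selected]).mean() if (~selected).any() else 0.0
    return selected, est_fdr, est_fnr

# Hypothetical posterior probabilities for five genes.
selected, est_fdr, est_fnr = classify_and_estimate([0.01, 0.2, 0.6, 0.95, 0.4])
```

Lowering the cut-off shortens the gene list and reduces the estimated FDR at the cost of a higher estimated FNR.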


Simulation Study

Explore performance of fully Bayesian mixture in different situations:

• Non-standard distribution of DE genes

• Small number of DE genes

• Small number of replicate arrays

• Asymmetric distributions of over- and under-expressed genes

50 simulated data sets for each of several different set-ups.


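Simulation Study

2500 genes, 8 replicates in each experimental condition

$d_g \sim \pi_0 \delta_0 + \pi_1 \left( w\,\mathrm{Unif}(\cdot) + (1 - w)\,N(\cdot) \right) + \pi_2 \left( w\,\mathrm{Unif}(\cdot) + (1 - w)\,N(\cdot) \right)$

$\sigma_{gs} \sim \mathrm{logNorm}(-1.8, 0.5)$ (logNorm based on data)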


Non-standard distributions of DE genes

True π0 = 0.8

[Figure: three panels, labelled with parameter values 0.3, 0.5 and 0.8, with Gamma distributions superimposed]

Av. est. π0 = 0.805 ± 0.010

Av. est. π0 = 0.797 ± 0.010

Av. est. π0 = 0.781 ± 0.010


Small number of DE genes / Small number of replicate arrays

True π0   Replicates   Av. FDR   Av. FNR   Av. est. π0
0.95      8            7.0 %     2.0 %     0.947 ± 0.007
0.95      3            17.9 %    3.6 %     0.956 ± 0.009
0.99      8            9.2 %     0.6 %     0.990 ± 0.003
0.99      3            17.6 %    0.9 %     0.995 ± 0.007


Asymmetric distributions of over/under-expressed genes

True π0 = 0.9, True π1 = 0.09, True π2 = 0.01

Av. est. π0 = 0.897 ± 0.007, Av. est. π1 = 0.093 ± 0.003, Av. est. π2 = 0.011 ± 0.006

$d_g \sim \pi_0 \delta_0 + \pi_1 \left( 0.6\,\mathrm{Unif}(0.01, 1.7) + 0.4\,N(1.7, 0.8) \right) + \pi_2 \left( 0.6\,\mathrm{Unif}(-0.7, -0.01) + 0.4\,N(-0.7, 0.8) \right)$
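A data set of this asymmetric kind could be generated roughly as follows. This is a minimal sketch under several assumptions that the slides do not confirm: the second arguments of N and logNorm are read as standard deviations (of the log, for logNorm), the log-normal draw is used as the standard deviation of each gene in each condition, and the gene baselines (the `alpha` values) are drawn from a hypothetical N(7, 1) purely to have something to simulate around. The function and helper names are illustrative.

```python
import numpy as np

def simulate_dataset(rng, n_genes=2500, n_reps=8, pi=(0.9, 0.09, 0.01)):
    """Simulate log-expression data y[g, s, r] for two conditions s = 0, 1."""
    z = rng.choice(3, size=n_genes, p=pi)  # 0 = null, 1 and 2 = DE components
    d = np.zeros(n_genes)

    def draw_component(n, lo, hi, mean, sd):
        # With probability 0.6 take the uniform draw, otherwise the normal draw.
        use_unif = rng.random(n) < 0.6
        return np.where(use_unif, rng.uniform(lo, hi, n), rng.normal(mean, sd, n))

    d[z == 1] = draw_component(int(np.sum(z == 1)), 0.01, 1.7, 1.7, 0.8)
    d[z == 2] = draw_component(int(np.sum(z == 2)), -0.7, -0.01, -0.7, 0.8)

    # Assumption: logNorm(-1.8, 0.5) gives the standard deviation per gene and condition.
    sigma = rng.lognormal(mean=-1.8, sigma=0.5, size=(n_genes, 2))
    # Hypothetical gene baselines; the slides do not say how these were chosen.
    alpha = rng.normal(7.0, 1.0, n_genes)
    means = np.stack([alpha - 0.5 * d, alpha + 0.5 * d], axis=1)  # shape (n_genes, 2)
    y = rng.normal(means[:, :, None], sigma[:, :, None], size=(n_genes, 2, n_reps))
    return z, d, y

rng = np.random.default_rng(1)
z, d, y = simulate_dataset(rng)
```

Because the normal parts are not truncated, a small share of simulated DE genes can end up with a fold difference of the opposite sign to their component, which is part of what makes the set-up non-standard for the Gamma mixture.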


Additional Checks

1) FDR / FNR can be estimated well

[Figure: true and estimated FDR, true and estimated FNR]

2) Model works when there are no DE genes

50 simulations of same set-up: Av. est. π0 = 0.999. No genes are declared to be DE.


Comparison with conjugate mixture prior

Replace

$d_g \sim \pi_0 \delta_0 + \pi_1 G_-(1.5, \lambda_1) + \pi_2 G_+(1.5, \lambda_2)$

with

$d_g \sim \pi_0 \delta_0 + \pi_1 N(0, c\,\sigma_g^2)$

NB: We estimate both $c$ and $\pi_0$ in a fully Bayesian way.

True π0   Est. π0 with Gamma prior   Est. π0 with conjugate prior
0.8       0.781 ± 0.010              0.796 ± 0.010
0.95      0.947 ± 0.007              0.955 ± 0.006
0.99      0.990 ± 0.003              0.991 ± 0.003
1         0.999 ± 0.001              0.999 ± 0.001


Application to Mouse data

Mouse wildtype (WT) and knock-out (KO) data (Affymetrix)

~ 22700 genes, 8 replicates in each of WT and KO

Gamma prior: Est. π0 = 0.996 ± 0.001; declares 59 genes DE


Summary

• Good performance of fully Bayesian mixture model
– can estimate proportion of DE genes in variety of situations
– accurate estimation of FDR / FNR

• Different mixture priors give similar classification results

• Gives reasonable results for real data