Gibbs sampling for motif finding Yves Moreau

Dec 13, 2015


Gibbs sampling for motif finding

Yves Moreau


Overview

Markov Chain Monte Carlo

Gibbs sampling

Motif finding in cis-regulatory DNA

Biclustering microarray data


Markov Chain Monte-Carlo

Markov chain with transition matrix T

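\[
T_{ij} = P(X_{t+1} = j \mid X_t = i)
\]

         A        C        G        T
A   0.0643   0.8268   0.0659   0.0430
C   0.0598   0.0484   0.8515   0.0403
G   0.1602   0.3407   0.1736   0.3255
T   0.1507   0.1608   0.3654   0.3231

(rows: current state X_t; columns: next state X_{t+1}; each row sums to 1)

[Figure: state diagram with the four states X=A, X=C, X=G, X=T]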
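As an illustration of how such a transition matrix generates a sequence, here is a minimal sketch in plain Python. The matrix values are those of the slide; the function name `simulate_chain`, the start state and the seed are assumptions made for the example.

```python
import random

STATES = "ACGT"
# Transition matrix from the slide: row = current state, column = next state.
T = [
    [0.0643, 0.8268, 0.0659, 0.0430],  # from A
    [0.0598, 0.0484, 0.8515, 0.0403],  # from C
    [0.1602, 0.3407, 0.1736, 0.3255],  # from G
    [0.1507, 0.1608, 0.3654, 0.3231],  # from T
]

def simulate_chain(n_steps, start="A", seed=0):
    """Generate n_steps successive states of the chain, starting from `start`."""
    rng = random.Random(seed)
    state = STATES.index(start)
    out = [STATES[state]]
    for _ in range(n_steps - 1):
        # Draw the next state with the probabilities in row `state` of T.
        state = rng.choices(range(4), weights=T[state])[0]
        out.append(STATES[state])
    return "".join(out)

print(simulate_chain(60))
```

Because the A to C and C to G transitions have probabilities above 0.8, the trinucleotide ACG should recur often in the generated sequences.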


Markov Chain Monte-Carlo

Markov chains can sample from complex distributions

ACGCGGTGTGCGTTTGACGAACGGTTACGCGACGTTTGGTACGTGCGGTGTACGTGTACGACGGAGTTTGCGGGACGCGTACGCGCGTGACGTACGCGTGAGACGCGTGCGCGCGGACGCACGGGCGTGCGCGCGTCGCGAACGCGTTTGTGTTCGGTGCACCGCGTTTGACGTCGGTTCACGTGACGCGTAGTTCGACGACGTGACACGGACGTACGCGACCGTACTCGCGTTGACACGATACGGCGCGGCGGGCGCGGACGTACGCGTACACGCGGGAACGCGCGTGTTTACGACGTGACGTCGCACGCGTCGGTGTGACGGCGGTCGGTACACGTCGACGTTGCGACGTGCGTGCTGACGGAACGACGACGCGACGCACGGCGTGTTCGCGGTGCGG

[Figure: percentage (%) of A, C, G and T as a function of position]


Markov Chain Monte-Carlo

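Let us look at the transition after two steps

\[
\begin{aligned}
T^{(2)}_{ij} &= P(X_{t+2} = j \mid X_t = i) \\
 &= \sum_{k \in S} P(X_{t+2} = j \mid X_{t+1} = k, X_t = i)\, P(X_{t+1} = k \mid X_t = i) \\
 &= \sum_{k \in S} P(X_{t+2} = j \mid X_{t+1} = k)\, P(X_{t+1} = k \mid X_t = i) \\
 &= \sum_{k \in S} T_{ik}\, T_{kj}
\end{aligned}
\]

\[
T^{(2)} = T \cdot T = T^{2}
\]

Similarly, after n steps

\[
T^{(n)} = P(X_{t+n} \mid X_t) = T^{n}
\]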
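The two-step formula can be checked numerically. The sketch below, in plain Python, compares one entry of T·T with the explicit sum over the intermediate state k; the helper names `matmul` and `matrix_power` are assumptions for the example, and T is the matrix of the earlier slide.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matrix_power(T, n):
    """n-step transition matrix T^n, by repeated multiplication."""
    size = len(T)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = matmul(result, T)
    return result

T = [
    [0.0643, 0.8268, 0.0659, 0.0430],
    [0.0598, 0.0484, 0.8515, 0.0403],
    [0.1602, 0.3407, 0.1736, 0.3255],
    [0.1507, 0.1608, 0.3654, 0.3231],
]

T2 = matrix_power(T, 2)
# P(X_{t+2} = C | X_t = A), summing over the intermediate state k.
p_a_to_c = sum(T[0][k] * T[k][1] for k in range(4))
print(T2[0][1], p_a_to_c)
```

Each row of T^n remains a probability distribution, which gives a quick sanity check on the computation.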


Markov Chain Monte-Carlo

Stationary distribution

If the samples are generated according to the distribution p, the samples at the next step will also be generated according to p

\[
p\,T = p
\]

p is a left eigenvector of T (with eigenvalue 1)

Equilibrium distribution

\[
T^{\infty} = \lim_{n \to \infty} T^{n}, \qquad T^{\infty}\, T = \lim_{n \to \infty} T^{n+1} = T^{\infty}
\]

Rows of T^∞ are stationary distributions

From an arbitrary initial condition and after a sufficient number of steps (burn-in), the successive states of the Markov chain are samples from a stationary distribution

\[
\begin{pmatrix} 0.1188 & 0.2788 & 0.3905 & 0.2119 \end{pmatrix}
\begin{pmatrix}
0.0643 & 0.8268 & 0.0659 & 0.0430 \\
0.0598 & 0.0484 & 0.8515 & 0.0403 \\
0.1602 & 0.3407 & 0.1736 & 0.3255 \\
0.1507 & 0.1608 & 0.3654 & 0.3231
\end{pmatrix}
=
\begin{pmatrix} 0.1188 & 0.2788 & 0.3905 & 0.2119 \end{pmatrix}
\]
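The stationary distribution of the example chain can be approximated by applying T repeatedly to an arbitrary starting distribution. This is a minimal sketch in plain Python; the function names `step` and `stationary`, the uniform starting point and the tolerance are assumptions for the example.

```python
def step(p, T):
    """One step of the chain applied to a distribution: the row vector p T."""
    return [sum(p[i] * T[i][j] for i in range(len(p))) for j in range(len(T[0]))]

def stationary(T, tol=1e-12, max_iter=10_000):
    """Iterate p <- p T from the uniform distribution until p stops changing."""
    p = [1.0 / len(T)] * len(T)
    for _ in range(max_iter):
        nxt = step(p, T)
        if max(abs(a - b) for a, b in zip(p, nxt)) < tol:
            return nxt
        p = nxt
    return p

T = [
    [0.0643, 0.8268, 0.0659, 0.0430],
    [0.0598, 0.0484, 0.8515, 0.0403],
    [0.1602, 0.3407, 0.1736, 0.3255],
    [0.1507, 0.1608, 0.3654, 0.3231],
]

print([round(x, 4) for x in stationary(T)])
```

The result should agree with the vector shown on the slide up to rounding, since every entry of T is positive and the chain therefore has a single stationary distribution.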


Detailed balance

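A sufficient condition for p to be a stationary distribution of the Markov chain is that p and T satisfy the condition of detailed balance

\[
p_i\, T_{ij} = p_j\, T_{ji}, \quad \forall i, j
\]

Proof:

\[
(pT)_i = \sum_j p_j\, T_{ji} = \sum_j p_i\, T_{ij} = p_i \sum_j T_{ij} = p_i, \quad \forall i
\]

Problem: disjoint regions in probability space (the chain cannot move between them, so it need not converge to p from every starting state)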
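A small numerical check may help to separate the two notions. The sketch below, in plain Python, tests detailed balance and stationarity for a hypothetical two-state chain (`T_two`, `p_two` are invented for the example) and for the DNA chain of the earlier slides with its rounded stationary vector.

```python
def satisfies_detailed_balance(p, T, tol=1e-9):
    """True if p_i T_ij == p_j T_ji for all i, j (within tol)."""
    n = len(p)
    return all(abs(p[i] * T[i][j] - p[j] * T[j][i]) <= tol
               for i in range(n) for j in range(n))

def is_stationary(p, T, tol=1e-9):
    """True if (p T)_i == p_i for all i (within tol)."""
    n = len(p)
    return all(abs(sum(p[j] * T[j][i] for j in range(n)) - p[i]) <= tol
               for i in range(n))

# Hypothetical two-state chain, chosen so that detailed balance holds.
T_two = [[0.9, 0.1], [0.2, 0.8]]
p_two = [2 / 3, 1 / 3]

# DNA chain from the slides, with its stationary vector rounded to 4 digits.
T_dna = [
    [0.0643, 0.8268, 0.0659, 0.0430],
    [0.0598, 0.0484, 0.8515, 0.0403],
    [0.1602, 0.3407, 0.1736, 0.3255],
    [0.1507, 0.1608, 0.3654, 0.3231],
]
p_dna = [0.1188, 0.2788, 0.3905, 0.2119]

print(satisfies_detailed_balance(p_two, T_two), is_stationary(p_two, T_two))
print(satisfies_detailed_balance(p_dna, T_dna, tol=1e-3),
      is_stationary(p_dna, T_dna, tol=1e-3))
```

The DNA chain is stationary under its vector without satisfying detailed balance, which illustrates that the condition is sufficient but not necessary.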


Gibbs sampling

Markov chain for Gibbs sampling

\[
P(A, B, C) \;\rightarrow\; P(A \mid B, C),\; P(B \mid A, C),\; P(C \mid A, B)
\]

\[
\begin{aligned}
 & & &(a_0, b_0, c_0) \\
a_{i+1} &\sim P(A \mid B = b_i, C = c_i) & &(a_{i+1}, b_i, c_i) \\
b_{i+1} &\sim P(B \mid A = a_{i+1}, C = c_i) & &(a_{i+1}, b_{i+1}, c_i) \\
c_{i+1} &\sim P(C \mid A = a_{i+1}, B = b_{i+1}) & &(a_{i+1}, b_{i+1}, c_{i+1})
\end{aligned}
\]
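The update scheme can be sketched for three binary variables. In the plain-Python example below the joint table `WEIGHTS` is hypothetical (any positive table would do), and the function names, sweep count and seed are assumptions; each sweep draws A, then B, then C from its conditional given the current values of the other two.

```python
import random

# Hypothetical unnormalized joint weights over binary (a, b, c).
WEIGHTS = {
    (0, 0, 0): 4.0, (0, 0, 1): 1.0, (0, 1, 0): 1.0, (0, 1, 1): 2.0,
    (1, 0, 0): 1.0, (1, 0, 1): 2.0, (1, 1, 0): 2.0, (1, 1, 1): 7.0,
}

def sample_conditional(state, index, rng):
    """Draw variable `index` from its conditional given the other two."""
    s0 = list(state)
    s1 = list(state)
    s0[index], s1[index] = 0, 1
    w0, w1 = WEIGHTS[tuple(s0)], WEIGHTS[tuple(s1)]
    return 1 if rng.random() < w1 / (w0 + w1) else 0

def gibbs(n_sweeps, seed=0):
    rng = random.Random(seed)
    state = [0, 0, 0]  # (a_0, b_0, c_0)
    samples = []
    for _ in range(n_sweeps):
        for index in range(3):  # update A, then B, then C
            state[index] = sample_conditional(state, index, rng)
        samples.append(tuple(state))
    return samples

samples = gibbs(20_000)
estimate = sum(a for a, _, _ in samples) / len(samples)
exact = sum(w for (a, _, _), w in WEIGHTS.items() if a == 1) / sum(WEIGHTS.values())
print(estimate, exact)
```

The empirical frequency of A = 1 should approach the exact marginal computed from the table, here 12/20.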


Gibbs sampling

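Detailed balance

\[
P(x)\, \pi(y \mid x) = P(y)\, \pi(x \mid y), \quad \forall x, y
\]

where \pi(y \mid x) is the transition probability from state x to state y

Detailed balance for the Gibbs sampler

\[
\pi(y \mid x) = P(x_i' \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n), \qquad P(x) = P(x_1, \ldots, x_n)
\]

Prove detailed balance

\[
\begin{aligned}
 & P(x_1, \ldots, x_n)\, P(x_i' \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n) \\
 &\quad = P(x_1, \ldots, x_{i-1}, x_i', x_{i+1}, \ldots, x_n)\, P(x_i \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)
\end{aligned}
\]

Bayes' rule

\[
\begin{aligned}
 & P(x_1, \ldots, x_n)\, P(x_1, \ldots, x_{i-1}, x_i', x_{i+1}, \ldots, x_n) \,/\, P(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n) \\
 &\quad = P(x_1, \ldots, x_{i-1}, x_i', x_{i+1}, \ldots, x_n)\, P(x_1, \ldots, x_n) \,/\, P(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)
\end{aligned}
\]

Q.E.D.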
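The identity can also be verified by brute force on a tiny example. The sketch below, in plain Python, uses a hypothetical joint distribution `P` over two binary variables and checks P(x) π(y|x) = P(y) π(x|y) for every pair of states; the names `gibbs_kernel` and `detailed_balance_holds` are assumptions for the example.

```python
from itertools import product

# Hypothetical joint distribution over two binary variables (x1, x2).
P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def gibbs_kernel(x, y, i):
    """Transition probability pi(y | x) of a Gibbs update of coordinate i."""
    # A Gibbs update of coordinate i leaves all other coordinates unchanged.
    if any(x[k] != y[k] for k in range(len(x)) if k != i):
        return 0.0
    # Marginal of the other coordinates: sum over the values of coordinate i.
    marginal = sum(P[x[:i] + (v,) + x[i + 1:]] for v in (0, 1))
    return P[y] / marginal

def detailed_balance_holds(i, tol=1e-12):
    states = list(product((0, 1), repeat=2))
    return all(abs(P[x] * gibbs_kernel(x, y, i) - P[y] * gibbs_kernel(y, x, i)) <= tol
               for x in states for y in states)

print(detailed_balance_holds(0), detailed_balance_holds(1))
```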


Data augmentation Gibbs sampling

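Introducing unobserved variables often simplifies the expression of the likelihood

A Gibbs sampler can then be set up

\[
P(\theta, M \mid D) \;\rightarrow\; P(\theta \mid M, D),\; P(M \mid \theta, D)
\]

\[
\theta_i \sim P(\theta_i \mid M, D), \qquad M_j \sim P(M_j \mid \theta, D)
\]

\theta: model parameters, M: missing data, D: data

Samples from the Gibbs sampler can be used to estimate parameters

\[
\theta_{\mathrm{PME}} = E(\theta \mid D) = \int_{\theta} \int_{M} \theta\, P(\theta, M \mid D)\, dM\, d\theta \approx \frac{1}{N} \sum_{k=1}^{N} \theta^{(k)}
\]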
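A toy model, not taken from the slides, can make the posterior mean estimate concrete. Assume a coin with unknown heads probability θ and a uniform prior, some observed flips D and some unobserved flips M. The sketch below alternates between imputing M given θ and drawing θ given M and D, then averages the sampled θ values; the function name and all numeric settings are assumptions.

```python
import random

def posterior_mean_by_gibbs(heads, tails, n_missing,
                            n_samples=20_000, burn_in=1_000, seed=0):
    """Estimate E(theta | D) as the average of Gibbs samples of theta."""
    rng = random.Random(seed)
    theta = 0.5
    total = 0.0
    for k in range(burn_in + n_samples):
        # M | theta, D: impute the missing flips.
        missing_heads = sum(rng.random() < theta for _ in range(n_missing))
        # theta | M, D: Beta posterior under a uniform Beta(1, 1) prior.
        theta = rng.betavariate(1 + heads + missing_heads,
                                1 + tails + n_missing - missing_heads)
        if k >= burn_in:
            total += theta
    return total / n_samples

estimate = posterior_mean_by_gibbs(heads=7, tails=3, n_missing=5)
exact = (1 + 7) / (2 + 10)  # mean of Beta(8, 4), the posterior given D alone
print(estimate, exact)
```

In this toy case the missing flips carry no information about θ beyond D, so the sample average can be compared with the analytic posterior mean.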


Pros and cons

Pros
  Clear probabilistic interpretation
  Bayesian framework
  “Global optimization”

Cons
  Mathematical details not easy to work out
  Relatively slow


Motif finding


Gibbs sampler

Gibbs sampling for motif finding
  Set up a Gibbs sampler for the joint probability of the motif matrix and the alignment given the sequences

\[
P(\theta, A \mid S) \;\rightarrow\; P(\theta \mid A, S),\; P(A \mid \theta, S)
\]

\theta: motif matrix, A: alignment, S: sequences

  Sequence by sequence

\[
P(A \mid \theta, S) = \prod_{i=1}^{K} P(a_i \mid \theta, S_i)
\]

Lawrence et al.
  One motif of fixed length
  One occurrence per sequence
  Background model based on single nucleotides
  Too sensitive to noise
  Lots of parameter tuning


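Motif model: NCACGTGN

       1      2      3      4      5      6      7      8
A    0.3   0.01    0.8   0.03   0.04   0.04   0.03    0.2
C    0.4    0.9    0.1    0.8   0.01   0.01   0.02    0.2
G    0.2   0.04   0.05   0.03    0.9   0.05    0.9    0.4
T    0.1   0.05   0.05   0.04   0.05    0.9   0.05    0.2

Background model

A   0.32
C   0.16
G   0.24
T   0.28

[Figure: 500 bp sequence with the translation start marked, divided into background (bg1), motif and background (bg2)]

\[
P(S \mid a, \theta_W, B_0) = P_{bg1} \cdot P_{Motif} \cdot P_{bg2}
\]

\[
P_{bg1} = \prod_{j=1}^{a-1} q_{0, b_j}, \qquad
P_{Motif} = \prod_{j=1}^{W} q_{j, b_{a+j-1}}, \qquad
P_{bg2} = \prod_{j=a+W}^{L} q_{0, b_j}
\]

where b_j is the base at position j of S, a the start of the motif instance, W the motif length and L the sequence length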
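The factorisation into background, motif and background can be turned into a short scoring function. The plain-Python sketch below uses the motif matrix and background frequencies of the slide; the function name `log_likelihood`, the 0-based start index and the test sequence are assumptions for the example.

```python
import math

# Motif matrix from the slide: MOTIF[base][j] is the probability of `base`
# at motif position j + 1 (consensus NCACGTGN).
MOTIF = {
    "A": [0.3, 0.01, 0.8, 0.03, 0.04, 0.04, 0.03, 0.2],
    "C": [0.4, 0.9, 0.1, 0.8, 0.01, 0.01, 0.02, 0.2],
    "G": [0.2, 0.04, 0.05, 0.03, 0.9, 0.05, 0.9, 0.4],
    "T": [0.1, 0.05, 0.05, 0.04, 0.05, 0.9, 0.05, 0.2],
}
BACKGROUND = {"A": 0.32, "C": 0.16, "G": 0.24, "T": 0.28}
W = 8  # motif length

def log_likelihood(seq, a):
    """log P(S | a, motif, background), motif starting at 0-based index a."""
    total = 0.0
    for j, base in enumerate(seq):
        if a <= j < a + W:
            total += math.log(MOTIF[base][j - a])   # inside the motif instance
        else:
            total += math.log(BACKGROUND[base])     # background on either side
    return total

# Hypothetical sequence with one instance (ACACGTGT) planted at index 6.
seq = "ATATAT" + "ACACGTGT" + "TTATAA"
best = max(range(len(seq) - W + 1), key=lambda a: log_likelihood(seq, a))
print(best)
```

Working with logarithms avoids numerical underflow when the products run over several hundred bases.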


Gibbs motif finding

Initialization
  Sequences
  Random motif matrix

Iteration
  Sequence scoring
  Alignment update
  Motif instances
  Motif matrix

Termination
  Convergence of the alignment and of the motif matrix
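The loop outlined above can be sketched in plain Python for the simplest setting: one motif of fixed width and one occurrence per sequence. This is an illustrative sketch, not the implementation behind the slides. It starts from a random alignment (which induces a random motif matrix), uses pseudocounts, and runs a fixed number of iterations instead of testing for convergence; all function names and the toy sequences are assumptions.

```python
import random

BASES = "ACGT"

def motif_matrix(seqs, starts, width, skip, pseudo=0.5):
    """Column-wise base frequencies of the aligned instances, leaving out sequence `skip`."""
    counts = [{b: pseudo for b in BASES} for _ in range(width)]
    for i, (seq, a) in enumerate(zip(seqs, starts)):
        if i == skip:
            continue
        for j in range(width):
            counts[j][seq[a + j]] += 1
    return [{b: c[b] / sum(c.values()) for b in BASES} for c in counts]

def background(seqs):
    """Single-nucleotide background frequencies estimated from all sequences."""
    joined = "".join(seqs)
    return {b: (joined.count(b) + 1) / (len(joined) + 4) for b in BASES}

def gibbs_motif_finder(seqs, width, n_iter=200, seed=0):
    rng = random.Random(seed)
    bg = background(seqs)
    # Initialization: one random start position per sequence.
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(n_iter):
        for i, seq in enumerate(seqs):
            # Motif matrix built from the instances in all other sequences.
            q = motif_matrix(seqs, starts, width, skip=i)
            # Sequence scoring: motif-to-background ratio at every position.
            weights = []
            for x in range(len(seq) - width + 1):
                w = 1.0
                for j in range(width):
                    w *= q[j][seq[x + j]] / bg[seq[x + j]]
                weights.append(w)
            # Alignment update: sample the new motif instance.
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts, motif_matrix(seqs, starts, width, skip=None)

seqs = ["TTGACCACGTGATTA", "ACACGTGTTTAGGAT", "GGATTACCACGTGCA", "CACGTGAATTGGCTA"]
starts, q = gibbs_motif_finder(seqs, width=6)
print(starts)
```

Because the positions are sampled rather than maximised, repeated runs with different seeds can return different alignments, especially on short or noisy inputs.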


Gibbs motif finding

Sequence scoring

Each candidate start position x in sequence S_i is scored by the ratio of the probability of the segment of length W at x under the motif model \theta_W to its probability under the background model \theta_0

\[
W(x) = \frac{P(x \mid S_i, \theta_W)}{P(x \mid S_i, \theta_0)}
     = \prod_{l=1}^{W} \frac{q_{l, b_{x+l-1}}}{q_{0, b_{x+l-1}}}
\]
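As a sketch of the scoring step, the function below computes the motif-to-background ratio for every start position and normalises the ratios so that they can serve as sampling probabilities for the alignment update. The three-column motif, the uniform background and the name `position_weights` are hypothetical choices for the example.

```python
def position_weights(seq, motif, background):
    """Score every start x by the likelihood ratio, then normalise to sum to 1."""
    width = len(motif)
    ratios = []
    for x in range(len(seq) - width + 1):
        r = 1.0
        for pos in range(width):
            base = seq[x + pos]
            r *= motif[pos][base] / background[base]
        ratios.append(r)
    total = sum(ratios)
    return [r / total for r in ratios]

# Hypothetical 3-column motif with consensus CAT and a uniform background.
motif = [
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

weights = position_weights("GGCATGG", motif, background)
print(max(range(len(weights)), key=weights.__getitem__))
```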


Gibbs motif finding

Initialization
  Sequences
  Random motif matrix

Iteration
  Sequence scoring
  Alignment update
  Motif instances
  Motif matrix

Termination
  Stabilization of the motif matrix (not of the alignment)


Motif Sampler (extended Gibbs sampling)

Model
  One motif of fixed length per round
  Several occurrences per sequence
    Each sequence has a discrete probability distribution over the number of copies of the motif (under a maximum bound)
  Multiple motifs found in successive rounds by masking occurrences of previous motifs
  Improved background model based on oligonucleotides
  Gapped motifs


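Motif model: NCACGTGN

       1      2      3      4      5      6      7      8
A    0.3   0.01    0.8   0.03   0.04   0.04   0.03    0.2
C    0.4    0.9    0.1    0.8   0.01   0.01   0.02    0.2
G    0.2   0.04   0.05   0.03    0.9   0.05    0.9    0.4
T    0.1   0.05   0.05   0.04   0.05    0.9   0.05    0.2

Background model (order m)

\[
P(b_j \mid b_{j-1}\, b_{j-2} \ldots b_{j-m})
\]

[Figure: 500 bp sequence with the translation start marked, divided into background segments and c motif instances]

\[
P(S \mid a, c, \theta_W, B_m) = P_{bg}^{0} \prod_{i=1}^{c} P_{Motif}^{i}\, P_{bg}^{i}
\]

\[
P_{Motif}^{i} = \prod_{j=1}^{W} q_{j, b_{a_i+j-1}}
\]

\[
P_{bg}^{0} = P(b_1, \ldots, b_m) \prod_{j=m+1}^{a_1-1} P(b_j \mid b_{j-1} \ldots b_{j-m})
\]

\[
P_{bg}^{i} = \prod_{j=a_i+W}^{a_{i+1}-1} P(b_j \mid b_{j-1} \ldots b_{j-m})
\]

where a_i is the start of motif instance i and c the number of instances in the sequence (for the last instance, the background product runs to the end of the sequence)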
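A higher-order background model of this kind can be sketched as a table of conditional frequencies indexed by the m preceding bases. In the plain-Python example below, the add-one smoothing, the uniform fallback for unseen contexts, the function names and the training sequence are all assumptions; the slides do not specify how the model is estimated.

```python
import math
from collections import defaultdict

def train_background(seqs, m):
    """Estimate P(b_j | m preceding bases) from counts, with add-one smoothing."""
    counts = defaultdict(lambda: {b: 1.0 for b in "ACGT"})
    for seq in seqs:
        for j in range(m, len(seq)):
            counts[seq[j - m:j]][seq[j]] += 1
    return {ctx: {b: c[b] / sum(c.values()) for b in "ACGT"}
            for ctx, c in counts.items()}

def log_prob_background(segment, context, model):
    """Sum of log P(b_j | preceding bases) over `segment`, given the bases before it."""
    m = len(context)
    padded = context + segment
    uniform = {b: 0.25 for b in "ACGT"}  # fallback for unseen contexts (assumption)
    total = 0.0
    for j in range(m, len(padded)):
        probs = model.get(padded[j - m:j], uniform)
        total += math.log(probs[padded[j]])
    return total

model = train_background(["ACGACGACGACG"], m=2)
print(model["AC"]["G"], log_prob_background("GAC", "AC", model))
```

With m = 0 the same code reduces to a single-nucleotide background, since every base then shares the empty context.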