Markov Models Charles Yan Spring 2006

Dec 19, 2015

Page 1

Markov Models

Charles Yan, Spring 2006

Page 2

Markov Models

Page 3

Markov Chains

While this chapter is about protein function prediction, I will use a gene-finding example (to be exact, CpG island identification) to illustrate Markov chains, since it is a simple and well-studied case.

The same approach can be applied to other problems.

Page 4

Markov Chains

The CG island is a short stretch of DNA in which the frequency of the CG sequence is higher than in other regions. It is also called the CpG island, where "p" simply indicates that "C" and "G" are connected by a phosphodiester bond.

Whenever the dinucleotide CpG occurs, the C nucleotide is typically chemically modified by methylation.

The C of CpG is methylated into methyl-C, and methyl-C mutates into T relatively easily.

Page 5

Markov Chains

Thus, in general, CpG dinucleotides are rarer in the genome than the individual base frequencies would predict: f(CpG) < f(C) * f(G).

The methylation process is suppressed before the “starting point” of many genes.
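As a rough illustration of the inequality above, the sketch below compares the observed CpG frequency of a sequence with the product f(C) * f(G). The function name `cpg_observed_expected` and the toy sequence are assumptions made up for this example; they are not from the slides, and a real analysis would use genomic sequence.

```python
from collections import Counter


def cpg_observed_expected(seq):
    """Return (f(CpG), f(C) * f(G)) for a DNA string.

    f(CpG) is the fraction of adjacent base pairs that read "CG";
    f(C) and f(G) are the single-base frequencies.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("need at least two bases")
    base_counts = Counter(seq)
    f_c = base_counts["C"] / n
    f_g = base_counts["G"] / n
    # Count every position where C is immediately followed by G.
    cg_pairs = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    f_cpg = cg_pairs / (n - 1)
    return f_cpg, f_c * f_g


# Toy sequence, invented for illustration only.
observed, expected = cpg_observed_expected("ATGCCAGGTCATGCAG")
print(observed, expected)
```

An observed value well below the expected one is the depletion described above; a window whose observed value approaches or exceeds the expected one is a candidate CpG island.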

These regions (CpG islands) have more CpG than elsewhere.

Usually, CpG islands are a few hundred to a few thousand bases long.

Identification of CpG islands is important for gene finding.

Page 6

Markov Chains

[Figure: APRT (Homo sapiens)]

Page 7

Markov Chains

We want to develop a probabilistic model for CpG islands, such that every CpG island sequence is generated by the model.

Since dinucleotides are important, we want a model that generates sequences in which the probability of a symbol depends on the previous symbol.

The simplest one is a Markov chain.
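To make "a model that generates sequences" concrete, here is a minimal sketch of sampling from a first-order Markov chain, where each new symbol is drawn from a distribution that depends only on the previous symbol. The transition values in `TRANSITIONS` and the function name `generate` are hypothetical and chosen only for illustration; they are not estimates from real CpG islands.

```python
import random

# Hypothetical transition probabilities (each row sums to 1).
TRANSITIONS = {
    "A": {"A": 0.2, "C": 0.3, "G": 0.4, "T": 0.1},
    "C": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "G": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "T": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}


def generate(transitions, first, length, rng):
    """Sample a sequence of the given length, starting from `first`."""
    seq = [first]
    while len(seq) < length:
        row = transitions[seq[-1]]  # distribution given the previous symbol
        symbols = list(row)
        weights = [row[s] for s in symbols]
        seq.append(rng.choices(symbols, weights=weights, k=1)[0])
    return "".join(seq)


sample = generate(TRANSITIONS, "A", 20, random.Random(0))
print(sample)
```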

Page 8

Markov Chains

Page 9

Markov Chains

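The probability that a sequence x = x_1 x_2 ... x_L is generated by a Markov chain model is obtained by applying

$P(X, Y) = P(X)\,P(Y \mid X)$

many times:

$P(x) = P(x_L, x_{L-1}, \ldots, x_1) = P(x_L \mid x_{L-1}, \ldots, x_1)\,P(x_{L-1} \mid x_{L-2}, \ldots, x_1) \cdots P(x_1)$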

Page 10

Markov Chains

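One assumption of a Markov chain is that the probability of $x_i$ depends only on the previous symbol $x_{i-1}$, i.e.,

$P(x_i \mid x_{i-1}, \ldots, x_1) = P(x_i \mid x_{i-1}) = a_{x_{i-1} x_i}$

Thus, for a sequence x of length L,

$P(x) = P(x_1) \prod_{i=2}^{L} a_{x_{i-1} x_i}$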
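As a small worked instance of this factorization (the sequence CGCG is chosen only for illustration), write $a_{st}$ for the probability that letter t follows letter s, $a_{st} = P(x_i = t \mid x_{i-1} = s)$. Under the first-order Markov assumption, each factor looks back one symbol only:

$$P(\mathrm{CGCG}) = P(x_1 = \mathrm{C}) \; a_{\mathrm{CG}} \; a_{\mathrm{GC}} \; a_{\mathrm{CG}}$$

Note that the same transition probability $a_{\mathrm{CG}}$ appears twice, once for each place where C is followed by G.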

Page 11

Markov Chains

In this model, we must specify the probability $P(x_1)$ as well as the transition probabilities $a_{x_{i-1} x_i} = P(x_i \mid x_{i-1})$. To make the formula homogeneous (i.e., composed only of terms of the form $a_{x_{i-1} x_i}$), we can introduce a begin state to the model.

Page 12

Markov Chains

Page 13

Markov Chains

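The probability that a sequence x of length L is generated by a Markov chain model (with a begin state), where $x_0$ denotes the begin state so that $a_{x_0 x_1} = P(x_1)$:

$P(x) = \prod_{i=1}^{L} a_{x_{i-1} x_i}$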
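A minimal sketch of this computation follows, assuming the transition probabilities are stored in a nested dictionary with an extra row for the begin state. The label `BEGIN`, the function name `log_probability`, and the two-letter toy model are all hypothetical. The sum is done in log space, a common precaution against numerical underflow on long sequences.

```python
import math

BEGIN = "B"  # hypothetical label for the begin state


def log_probability(seq, a):
    """Return log P(x) as the sum of log a[x_{i-1}][x_i], with x_0 = BEGIN."""
    total = 0.0
    prev = BEGIN
    for symbol in seq:
        total += math.log(a[prev][symbol])
        prev = symbol
    return total


# Toy two-letter model, invented for illustration only.
a = {
    "B": {"C": 0.5, "G": 0.5},    # begin row plays the role of P(x_1)
    "C": {"C": 0.25, "G": 0.75},
    "G": {"C": 0.5, "G": 0.5},
}

p = math.exp(log_probability("CGC", a))
print(p)
```

Here the product is a[B][C] * a[C][G] * a[G][C], so every factor has the same form and no separate P(x_1) term is needed.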

Page 14

Markov Chains

Training the model, i.e., estimating the transition probabilities

The maximum likelihood (ML) approach is used to estimate the transition probabilities:

$a_{st} = \dfrac{c_{st}}{\sum_{t'} c_{st'}}$

where $c_{st}$ is the number of times that letter t followed letter s.
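The ML estimate amounts to counting adjacent letter pairs in the training sequences and normalizing each row of counts. The sketch below assumes uppercase sequences over the alphabet ACGT; the function name `estimate_transitions` and the two training strings are made up for illustration.

```python
def estimate_transitions(sequences, alphabet="ACGT"):
    """Estimate a[s][t] = c[s][t] / sum over t' of c[s][t'] from training sequences."""
    counts = {s: {t: 0 for t in alphabet} for s in alphabet}
    for seq in sequences:
        # zip pairs each letter s with the letter t that follows it.
        for s, t in zip(seq, seq[1:]):
            counts[s][t] += 1
    a = {}
    for s in alphabet:
        row_total = sum(counts[s].values())
        if row_total == 0:
            # Letter s was never followed by anything, so the ML
            # estimate for its row is undefined; it is left out here.
            continue
        a[s] = {t: counts[s][t] / row_total for t in alphabet}
    return a


# Toy training set, invented for illustration only.
a = estimate_transitions(["ACGCG", "CGCGA"])
print(a["G"])
```

With small training sets some counts are zero, which gives zero probabilities; adding pseudocounts before normalizing is a common remedy, though it is not covered in these slides.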

Page 15

Markov Chains

A set of CpG islands (CpG model)

1st row: The probabilities that A is followed by each of the four bases.

The sum of each row is 1.

The sum of each column? (Hint: P(.A)=P(A.)=1)

A set of sequences that are not CpG islands (background model)

Page 16

Markov Chains

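Given a sequence x, does it come from a CpG island?

Compute the log likelihood ratio of x under the CpG model versus the background model.

If the log likelihood ratio is greater than 0, then x is classified as a CpG island.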
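One way to sketch this decision rule is to sum, over adjacent letter pairs, the log of the ratio between the CpG-model and background-model transition probabilities. The names `log_odds`, `a_plus`, and `a_minus` and the toy two-letter tables are assumptions for illustration. The sketch also ignores the begin-state term, which amounts to assuming both models assign the same probability to the first letter.

```python
import math


def log_odds(seq, a_plus, a_minus):
    """Sum of log2(a_plus[s][t] / a_minus[s][t]) over adjacent pairs (s, t)."""
    score = 0.0
    for s, t in zip(seq, seq[1:]):
        score += math.log2(a_plus[s][t] / a_minus[s][t])
    return score


# Toy tables over a two-letter alphabet, invented for illustration only.
a_plus = {"C": {"C": 0.5, "G": 0.5}, "G": {"C": 0.5, "G": 0.5}}
a_minus = {"C": {"C": 0.75, "G": 0.25}, "G": {"C": 0.5, "G": 0.5}}

score = log_odds("CGCG", a_plus, a_minus)
is_island = score > 0
print(score, is_island)
```

Each C followed by G contributes log2(0.5 / 0.25) = 1 bit in this toy setup, and the G followed by C contributes 0, so the score is positive and the sequence is assigned to the CpG model.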

Page 17

Markov Chains

Page 18

Markov Chains

Page 19

Markov Chains

Page 20

Markov Chains to Hidden Markov Models