DNA Sequencing Jessica Scheld

DNA Sequencing Jessica Scheld. Recall: DNA Polymer of nucleotides which encodes information Made up of long sequences of A,T,C,G’s We want to read the.

Dec 21, 2015

Transcript
Page 1

DNA Sequencing

Jessica Scheld

Page 2

Recall: DNA

Polymer of nucleotides which encodes information

Made up of long sequences of A’s, T’s, C’s, and G’s

We want to read the DNA sequence, but it’s often too long to just read off

Problem: How do we know that a reassembled DNA sequence is in the right order?

http://www.tokyo-med.ac.jp/genet/picts/dna.jpg

Page 3

DNA Sequencing: Biology

The Players

DNA

DNA polymerase

Taq polymerase

primer

C, A, T, G

deoxynucleotides

dideoxynucleotides

Page 4

What would our sequence read? T-C-G-A

Page 5

Problem

Say we want to read this piece of single-stranded DNA:

ATCGACTATAAGGCATCGAA

We can’t read it all in one piece, so we break it up into l-length “snippets,” and then piece them back together to reconstruct the DNA sequence.

Here, we’ll use snippets of 4 nucleotides (l = 4). For the sequence above, we would get snippets of:

ATCG, TCGA, CGAC, GACT, ACTA, CTAT, TATA, ATAA, TAAG, AAGG, AGGC, GGCA, GCAT, CATC, ATCG, TCGA, CGAA
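As a minimal sketch of the snippet step, the overlapping windows can be read off with a sliding slice. The function name `snippets` and the variable `reads` are illustrative names, not from the slides.

```python
def snippets(sequence: str, length: int) -> list[str]:
    """Return every overlapping window of `length` letters, left to right."""
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


reads = snippets("ATCGACTATAAGGCATCGAA", 4)
print(len(reads))   # 17
print(reads[:3])    # ['ATCG', 'TCGA', 'CGAC']
```

Note that ATCG and TCGA each occur twice, which is what makes the reassembly ambiguous.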

Page 6

Constructing the de Bruijn Graph

These snippets can be represented by a graph.

Each snippet has length 4. Make each vertex of the graph consist of 3 of these letters. Each snippet then becomes a directed edge from the vertex holding its first three letters to the vertex holding its last three letters.

Example:

GGCA = GGC → GCA

Snippets: ATCG, TCGA, CGAC, GACT, ACTA, CTAT, TATA, ATAA, TAAG, AAGG, AGGC, GGCA, GCAT, CATC, ATCG, TCGA, CGAA

Page 7

Constructing the de Bruijn Graph cont.

Do this for all snippets and connect the vertices with directed edges. The figure shows the de Bruijn graph constructed for this DNA sequence:

ATCGACTATAAGGCATCGAA

[Figure: de Bruijn graph with vertices ATC, TCG, CGA, GAC, ACT, CTA, TAT, ATA, TAA, AAG, AGG, GGC, GCA, CAT, GAA]
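The construction can be sketched in a few lines. Each 4-letter snippet contributes one directed edge between its two 3-letter ends, and repeated snippets are kept as parallel edges. The names `de_bruijn_edges` and `adjacency` are hypothetical helpers chosen for this illustration.

```python
from collections import defaultdict


def de_bruijn_edges(sequence: str, length: int) -> list[tuple[str, str]]:
    """One directed edge (first letters -> last letters) per snippet."""
    edges = []
    for i in range(len(sequence) - length + 1):
        snippet = sequence[i:i + length]
        edges.append((snippet[:-1], snippet[1:]))
    return edges


def adjacency(edges: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group edges by start vertex, keeping parallel edges."""
    graph = defaultdict(list)
    for start, end in edges:
        graph[start].append(end)
    return dict(graph)


edges = de_bruijn_edges("ATCGACTATAAGGCATCGAA", 4)
graph = adjacency(edges)
print(graph["CGA"])  # ['GAC', 'GAA']
```

Keeping the edges in a list rather than a set is a deliberate choice here: the doubled snippets ATCG and TCGA have to stay doubled for the later circuit counting to make sense.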

Page 8

Creating the 2-in 2-out digraph

Notice that only 3 vertices (ATC, TCG, and CGA) have degree greater than two. We can redraw this graph so that it only has vertices of degree 4.

[Figure: the de Bruijn graph above, redrawn on the vertices ATC, TCG, and CGA]
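A quick way to check which vertices have degree greater than two is to count in-degree and out-degree while sliding over the sequence. This is only a sketch, and `degrees` and `busy` are illustrative names. In this count ATC has total degree 3 (1 in, 2 out); an extra edge from the last vertex GAA back to ATC would bring it to 2-in 2-out, though that closing edge is an assumption here and is not stated on the slide.

```python
from collections import Counter


def degrees(sequence: str, length: int) -> tuple[Counter, Counter]:
    """In-degree and out-degree of every vertex of the de Bruijn graph."""
    indeg, outdeg = Counter(), Counter()
    for i in range(len(sequence) - length + 1):
        snippet = sequence[i:i + length]
        outdeg[snippet[:-1]] += 1   # edge leaves the first three letters
        indeg[snippet[1:]] += 1     # edge enters the last three letters
    return indeg, outdeg


indeg, outdeg = degrees("ATCGACTATAAGGCATCGAA", 4)
busy = sorted(v for v in set(indeg) | set(outdeg) if indeg[v] + outdeg[v] > 2)
print(busy)  # ['ATC', 'CGA', 'TCG']
```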

Page 9

Recap

There is almost always more than one way to reconstruct a strand of DNA (and only 1 correct way)

The de Bruijn graph visualizes these ways and can be redrawn as an Eulerian digraph

We want to find the number of ways to reconstruct the DNA, and from that the probability of getting the correct one

Page 10

Eulerian digraphs

Required: Only 2-in 2-out digraphs

• why not 1 or 3?

Use circuit: ABCDCABDA

Then we can rewrite this graph state as:

[Figures: the 2-in 2-out digraph on the vertices A, B, C, D, and the redrawn graph state for the circuit]

Page 11

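Chord Diagram and Interlace Graph

Use circuit: ABCDCABDA

[Figures: the chord diagram of the circuit, with each of the letters A, B, C, D appearing twice around the circle, and the interlace graph on the vertices A, B, C, D]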
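For readers without the figures: two letters are joined in the interlace graph when their chords cross, that is, when the letters alternate around the circuit (A ... B ... A ... B). The sketch below reads this off the circuit word. It assumes the word repeats its start letter at the end and that every letter occurs exactly twice; `interlace_edges` is an illustrative name.

```python
from itertools import combinations


def interlace_edges(circuit: str) -> set[tuple[str, str]]:
    """Pairs of letters whose two occurrences alternate around the circuit."""
    word = circuit[:-1]  # drop the repeated start letter
    positions: dict[str, list[int]] = {}
    for index, letter in enumerate(word):
        positions.setdefault(letter, []).append(index)
    edges = set()
    for a, b in combinations(sorted(positions), 2):
        first, second = positions[a]
        # The chords cross when exactly one occurrence of b lies between the two a's.
        inside = sum(first < p < second for p in positions[b])
        if inside == 1:
            edges.add((a, b))
    return edges


print(sorted(interlace_edges("ABCDCABDA")))
# [('A', 'B'), ('A', 'D'), ('B', 'D'), ('C', 'D')]
```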

Page 12

Interlace polynomial

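$$
q(G; x) =
\begin{cases}
x^{n} & \text{if } G = E_n, \text{ the edgeless graph on } n \text{ vertices} \\
q(G - v; x) + q(G^{vw} - w; x) & \text{if } vw \in E(G)
\end{cases}
$$

Arratia, Bollobás, Sorkin, ‘00

$G^{vw}$: Interchange edges and non-edges between vertices lying in different sets among $A_{vw}$, $A_v$, and $A_w$. Here $A_v$ is the set of vertices adjacent to $v$ but not $w$, $A_w$ the set adjacent to $w$ but not $v$, and $A_{vw}$ the set adjacent to both.

[Figure: $G$ and $G^{vw}$, showing $v$, $w$, and the sets $A_v$, $A_w$, $A_{vw}$]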
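The recursion can be sketched directly, with a graph stored as a dict of neighbor sets and a polynomial as a coefficient list (index = power of x). The helper names `delete`, `pivot`, `add`, and `interlace` are hypothetical. The graph `H` below is the interlace graph of the circuit ABCDCABDA, worked out here by checking which letters alternate (edges AB, AD, BD, CD) rather than copied from the slides.

```python
def delete(graph, v):
    """Return G - v as a new adjacency dict."""
    return {u: nbrs - {v} for u, nbrs in graph.items() if u != v}


def pivot(graph, v, w):
    """Return G^{vw}: toggle adjacency between different classes A_v, A_w, A_vw."""
    a_vw = graph[v] & graph[w]
    a_v = graph[v] - graph[w] - {w}
    a_w = graph[w] - graph[v] - {v}
    new = {u: set(nbrs) for u, nbrs in graph.items()}
    for left, right in ((a_v, a_w), (a_v, a_vw), (a_w, a_vw)):
        for s in left:
            for t in right:
                new[s] ^= {t}   # flip edge / non-edge between s and t
                new[t] ^= {s}
    return new


def add(p, r):
    """Add two coefficient lists of possibly different length."""
    size = max(len(p), len(r))
    p = p + [0] * (size - len(p))
    r = r + [0] * (size - len(r))
    return [a + b for a, b in zip(p, r)]


def interlace(graph):
    """Coefficients [c0, c1, ...] of q(G; x)."""
    for v in sorted(graph):
        if graph[v]:                      # found an edge vw: recurse
            w = min(graph[v])
            return add(interlace(delete(graph, v)),
                       interlace(delete(pivot(graph, v, w), w)))
    return [0] * len(graph) + [1]         # edgeless graph on n vertices: x^n


H = {"A": {"B", "D"}, "B": {"A", "D"}, "C": {"D"}, "D": {"A", "B", "C"}}
print(interlace(H))  # [0, 4, 2], i.e. 4x + 2x^2
```

Which edge the recursion pivots on is arbitrary; this sketch simply takes the alphabetically first one.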

Page 13

Finding the Interlace Polynomial

[Figures: the interlace graph $G$ on the vertices A, B, C, D, its pivot $G^{db}$, and the two reduced graphs $G - d$ and $G^{db} - b$]

Page 14

Finding the Interlace Polynomial cont.

[Figures: the recursion applied again to the reduced graphs, with panels labeled $G^{ab}$, $G^{ab} - b$, $G^{da}$, $G^{da} - a$, $G - d$, and $G - a$]

Page 15

Finding Interlace Polynomial cont.

[Figure: the recursion expanded as a sum of small graphs on the vertices A, B, C, D]

$$q(G; x) = 2x^2 + 4x$$

Page 16

Interlace Polynomial Reconstructions

To find the number of reconstructions, we must relate the interlace polynomial to the circuit partition polynomial.

Theorem*: If $\vec{G}$ is a 4-regular Eulerian digraph, $C$ is any Eulerian circuit of $\vec{G}$, and $H$ is the circle graph of the chord diagram determined by $C$, then

$$f(\vec{G}; x) = x\,q(H; x + 1).$$

Thus:

$$
\begin{aligned}
q(H; x) &= 2x^2 + 4x \\
f(\vec{G}; x) &= x \cdot q(H; x + 1) \\
&= x \cdot \left[ 2(x + 1)^2 + 4(x + 1) \right] \\
&= x \cdot \left[ (2x^2 + 4x + 2) + 4x + 4 \right] \\
&= 2x^3 + 4x^2 + 2x + 4x^2 + 4x \\
&= 2x^3 + 8x^2 + 6x
\end{aligned}
$$

6 reconstructions

*J.A. Ellis-Monaghan, I. Sarmiento. Properties of the Interlace Polynomial via Isotropic Systems, Preprint.
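The substitution in the theorem is mechanical, so it can be checked with a short sketch. Polynomials are coefficient lists with index = power of x, and `shift_by_one` and `circuit_partition` are illustrative names, not from the slides.

```python
from math import comb


def shift_by_one(coeffs: list[int]) -> list[int]:
    """Coefficients of p(x + 1), given the coefficients of p(x)."""
    out = [0] * len(coeffs)
    for power, c in enumerate(coeffs):
        for j in range(power + 1):
            out[j] += c * comb(power, j)   # binomial expansion of c * (x + 1)^power
    return out


def circuit_partition(q_coeffs: list[int]) -> list[int]:
    """Coefficients of f = x * q(x + 1)."""
    return [0] + shift_by_one(q_coeffs)    # prepending 0 multiplies by x


f = circuit_partition([0, 4, 2])           # q(H; x) = 4x + 2x^2
print(f)     # [0, 6, 8, 2], i.e. 6x + 8x^2 + 2x^3
print(f[1])  # 6
```

The coefficient of x, here `f[1]`, is the number of ways to cover all edges with a single circuit, which is the count of reconstructions quoted above.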

Page 17

Different Possible Cycles

[Figures: circuits drawn on the digraph with vertices A, B, C, D]

ABCABDCDA

ABCDABDCA

ABCDCABDA

ABCDABDCA

ABCABDCDA

ABCDCABDA
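For a graph this small, the circuits can also be found by brute force. The sketch below rebuilds the digraph from the circuit ABCDCABDA, treats the two parallel A→B edges as distinct, and tries every way of using each edge once. The names `euler_circuits` and `canonical` are hypothetical; `canonical` only rotates each closed word to a standard starting point so that rotations of the same circuit compare equal.

```python
def euler_circuits(circuit: str) -> list[str]:
    """Eulerian circuits of the digraph traced by `circuit`, as vertex words."""
    edges = list(zip(circuit, circuit[1:]))  # edge i: circuit[i] -> circuit[i + 1]
    start = circuit[0]
    found = []

    def extend(vertex, used, word):
        if len(used) == len(edges):
            if vertex == start:
                found.append(word)
            return
        for i, (tail, head) in enumerate(edges):
            if i not in used and tail == vertex:
                extend(head, used | {i}, word + head)

    # Fix edge 0 as the first step so rotations of one circuit are not recounted.
    extend(edges[0][1], frozenset({0}), start + edges[0][1])
    return found


def canonical(word: str) -> str:
    """Rotate a closed word to its smallest rotation beginning at the start letter."""
    cycle, start = word[:-1], word[0]
    rotations = [cycle[i:] + cycle[:i] for i, c in enumerate(cycle) if c == start]
    best = min(rotations)
    return best + best[0]


words = sorted(canonical(w) for w in euler_circuits("ABCDCABDA"))
print(len(words))  # 6
print(words)
```

Each of the three distinct vertex words shows up twice because swapping the two parallel A→B edges gives a different circuit with the same spelling.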

Page 18

What does it mean?

The 6x term counts the Eulerian circuits: there are 6 ways to reconstruct the original Eulerian graph.

We can find this by counting, but with bigger graphs it would take much longer to find all the different circuits, and it is easy to miss one.

Useful in determining the probability of getting the right sequence of DNA

Problem above: probability = 1/6

Page 19

Problem for the Class

Find the interlace polynomial, using these graphs:

What is the sequence we are using? How can you tell?

How many reconstructions are possible?

[Figures: graphs on the vertices A, B, C, D]

Page 20

Solution

[Figures: solution graphs on the vertices A, B, C, D]

Page 21

Solution cont.

[Figures: the graph $G$ on the vertices A, B, C, D, with panels labeled $G^{ab}$, $G^{ab} - b$, and $G - a$]

Page 22

Solution cont.

[Figure: the recursion expanded as a sum of small graphs on the vertices A, B, C, D]

$$q(H; x) = 2x^2 + 4x$$

Similar to the previous example, the interlace polynomial can be related to the circuit partition polynomial, giving:

$$f(\vec{G}; x) = x\,q(H; x + 1) = 2x^3 + 8x^2 + 6x$$

This implies that there are 6 ways to reconstruct this snippet of DNA, so the probability that you have the correct one is 1/6.

Page 23

The 6 circuits

[Figure: the digraph on the vertices A, B, C, D]

ABCDACBDA

ABCDACBDA

ABCBDACDA

ABCBDACDA

ABDACBCDA

ABDACBCDA
