Design and Optimization of Universal DNA Arrays Ion Mandoiu Computer Science & Engineering Department University of Connecticut

Dec 19, 2015

Page 1

Design and Optimization of Universal DNA Arrays

Ion Mandoiu

Computer Science & Engineering Department

University of Connecticut

Page 2

Overview

Background on DNA Microarrays

DNA Tag Arrays

- Tag Set Design

- Tag Assignment Problem

New SBE/SBH Assay

- Decoding and Multiplexing Algorithms

Conclusions

Page 3

Watson-Crick Complementarity

• Four nucleotide types: A,C,T,G

• A’s paired with T’s (2 hydrogen bonds)

• C’s paired with G’s (3 hydrogen bonds)

Page 4

DNA Microarrays

• Exploit Watson-Crick complementarity to simultaneously perform a large number of substring tests

• Used in a variety of high-throughput genomic analyses

– Transcription (gene expression) analysis

– Single Nucleotide Polymorphism (SNP) genotyping

– Genomic-based microorganism identification

– Point-of-service diagnosis

– Alternative splicing, ChIP-on-chip, tiling arrays, …

• Common microarray formats involve direct hybridization between labeled DNA/RNA sample and DNA probes attached to a glass slide
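The "substring test" view of hybridization can be illustrated with a minimal Python sketch. It assumes an idealized model in which a probe binds a sample strand exactly when the probe's reverse complement occurs in the sample; the function names are illustrative and do not come from the slides.

```python
# Idealized model (an assumption): a probe hybridizes to a sample strand
# exactly when its reverse complement occurs in the sample as a substring.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the Watson-Crick reverse complement of a DNA string."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def hybridizes(probe, sample):
    """Substring test: does the sample contain the probe's complement?"""
    return reverse_complement(probe) in sample


def scan_array(probes, sample):
    """Return the probes that would give a signal for this sample."""
    return [p for p in probes if hybridizes(p, sample)]
```

Real hybridization also depends on temperature, mismatches and secondary structure, which this sketch ignores.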

Page 5

SNP Genotyping

• Genome variation: 0.1% of the DNA differs from one individual to another

– 80% of the variation is represented by Single Nucleotide Polymorphisms (SNPs)

– 2 possible nucleotides (alleles) for each SNP

• SNP genotyping = determining the alleles present at the SNP sites

• Highest throughput for SNP genotyping is achieved by high-density DNA microarrays based on direct hybridization

Page 6

SNP genotyping via direct hybridization

• SNP1 with alleles T/G

• SNP2 with alleles A/G

[Figure: solid phase hybridization of the sample to the array, with 2 probes per SNP]

Optical scanning is used to identify probes with complements in the mixture

Page 7

Universal DNA Arrays

• Limitations of direct hybridization formats:

– Arrays of cDNAs: inexpensive, but can only be used for transcription analysis

– Oligonucleotide arrays: flexible, but expensive unless produced in large quantities

• Universal DNA arrays: “programmable” arrays

– Array consists of application-independent oligonucleotides

– Detection is carried out by a sequence of reactions involving application-specific primers

– Flexible AND cost effective

• Universal array architectures: DNA tag arrays, APEX arrays, SBE/SBH arrays

Page 8

Overview

Background on DNA Microarrays

DNA Tag Arrays

- Tag Set Design

- Tag Assignment Problem

New SBE/SBH Assay

- Decoding and Multiplexing Algorithms

Conclusions

Page 9

DNA Tag Arrays

• “Programmable” array format [Brenner 97, Morris et al. 98]

– Array consists of application-independent array probes called tags

– The complements of the tags are called antitags

– Detection is carried out by a sequence of reactions involving application-specific primers and antitags

Page 10

DNA Tag Arrays

[Figure: a reporter probe consists of an antitag joined to a primer; labeled dideoxynucleotides and DNA polymerase are added for the extension step, and the antitag then binds its tag on the array]

1. Mix antitag+primer reporter probes with genomic DNA

2. Solution phase hybridization

3. Single-Base Extension (SBE)

4. Solid phase hybridization

Page 11

Universal Tag Array Advantages

• Cost effective

– Same array used in many analyses can be mass produced

• Fast to customize

– Only need to synthesize new set of reporter probes

• Reliable

– Solution phase hybridization better understood than hybridization on solid support

Page 12

Tag Set Design Problem

(H1) Tags hybridize strongly to complementary antitags

(H2) No tag hybridizes to a non-complementary antitag

(H3) Tags do not cross-hybridize to each other

[Figure: tags t1, t2 and their antitags illustrating (H1)-(H3)]

Tag Set Design Problem: Find a maximum cardinality set of tags satisfying (H1)-(H3)

Page 13

Hybridization Models

• Hamming distance model, e.g., [Marathe et al. 01]

– Models rigid DNA strands

• LCS/edit distance model, e.g., [Torney et al. 03]

– Models infinitely elastic DNA strands

• c-token model [Ben-Dor et al. 00]:

– Duplex formation requires formation of nucleation complex between perfectly complementary substrings

– Nucleation complex must have weight ≥ c, where wt(A)=wt(T)=1, wt(C)=wt(G)=2 (2-4 rule)

Page 14

c-h Code Problem

• c-token: left-minimal DNA string of weight ≥ c, i.e.,

– w(x) ≥ c

– w(x’) < c for every proper suffix x’ of x

• A set of tags is a c-h code if

(C1) Every tag has weight ≥ h

(C2) Every c-token is used at most once

c-h Code Problem [Ben-Dor et al. 00]: Given c and h, find a maximum cardinality c-h code

[Ben-Dor et al. 00] give an approximation algorithm based on De Bruijn sequences
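The definitions above can be checked mechanically. The following Python sketch is only an illustration: it reads the c-token weight condition as w(x) ≥ c and (C1) as a lower bound on tag weight, both of which are assumptions about symbols lost in the transcript, and all function names are hypothetical.

```python
# 2-4 rule weights of the c-token model: wt(A)=wt(T)=1, wt(C)=wt(G)=2.
WEIGHT = {"A": 1, "T": 1, "C": 2, "G": 2}


def weight(s):
    """Total weight of a DNA string under the 2-4 rule."""
    return sum(WEIGHT[b] for b in s)


def c_tokens(tag, c):
    """List the c-tokens occurring in tag (at most one per end position).

    A substring x is taken to be a c-token if w(x) >= c and dropping
    its first base brings the weight below c (assumed reading).
    """
    tokens = []
    for end in range(1, len(tag) + 1):
        start, w = end, 0
        # Grow the window leftwards until its weight reaches c.
        while start > 0 and w < c:
            start -= 1
            w += WEIGHT[tag[start]]
        if w >= c:
            tokens.append(tag[start:end])
    return tokens


def is_c_h_code(tags, c, h):
    """Check (C1) and (C2) for a list of tags."""
    seen = set()
    for tag in tags:
        if weight(tag) < h:        # (C1), read here as weight >= h
            return False
        for tok in c_tokens(tag, c):
            if tok in seen:        # (C2): this c-token was already used
                return False
            seen.add(tok)
    return True
```

For example, with c=4 the tag AACC contains the c-tokens AAC and CC, so it cannot share a code with TCCA, which also contains CC.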

Page 15

Periodic Tags [MT05]

• Key observation: c-token uniqueness constraint in c-h code formulation is too strong

– A c-token should not appear in two different tags, but can be repeated in a tag

– Periodic tags use fewer c-tokens!

Tag set design can be cast as a cycle packing problem
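A small Python sketch makes the saving concrete by counting the distinct c-tokens in a periodic tag and in a non-periodic tag of the same weight. The two example tags and the helper name are illustrative assumptions, and c-tokens are read as substrings of weight ≥ c whose weight drops below c when the first base is removed.

```python
# 2-4 rule weights: wt(A)=wt(T)=1, wt(C)=wt(G)=2.
WEIGHT = {"A": 1, "T": 1, "C": 2, "G": 2}


def distinct_c_tokens(tag, c):
    """Return the set of distinct c-tokens occurring in tag."""
    tokens = set()
    for end in range(1, len(tag) + 1):
        start, w = end, 0
        # Extend leftwards until the window weight reaches c.
        while start > 0 and w < c:
            start -= 1
            w += WEIGHT[tag[start]]
        if w >= c:
            tokens.add(tag[start:end])
    return tokens


periodic = "ACACACAC"   # period AC, weight 12 (hypothetical example tag)
aperiodic = "ACAGTCTG"  # also weight 12, no repeated c-token
print(len(distinct_c_tokens(periodic, 4)))   # 2
print(len(distinct_c_tokens(aperiodic, 4)))  # 6
```

With c=4 the periodic tag consumes only the tokens ACA and CAC, leaving more c-tokens available for other tags.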

Page 16

Vertex-disjoint Cycle Packing Problem

• Given directed graph G, find maximum number of vertex disjoint directed cycles in G

• [MT 05] APX-hard even for regular directed graphs with in-degree and out-degree 2

– h-c/2+1 approximation factor for tag set design problem

• [Salavatipour and Verstraete 05]

– Quasi-NP-hard to approximate within $\Omega(\log^{1-\varepsilon} n)$

– $O(n^{1/2})$ approximation algorithm

Page 17

[Figure: c-token factor graph for c=4 (incomplete); the labels shown include CC, AAG, AAC, AAAA, AAAT]

Page 18

Cycle Packing Algorithm

1. Construct c-token factor graph G

2. T ← {}

3. For all cycles C defining periodic tags, in increasing order of cycle length:

• Add to T the tag defined by C

• Remove C from G

4. Perform an alphabetic tree search and add to T tags consisting of unused c-tokens

5. Return T

– Gives an increase of over 40% in the number of tags compared to previous methods
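The "shortest cycles first" idea of step 3 can be sketched on an arbitrary directed graph. The Python code below is a generic greedy vertex-disjoint cycle packing, not the authors' implementation: it does not build the c-token factor graph or check that a cycle defines a valid periodic tag, and the adjacency-dictionary input format is an assumption.

```python
from collections import deque


def shortest_cycle(adj):
    """Return a shortest directed cycle as a vertex list, or None."""
    best = None
    for s in sorted(adj):
        parent = {s: None}
        queue = deque([s])
        last = None
        # BFS from s; the first edge found back to s closes a shortest
        # cycle through s, because vertices are popped by distance.
        while queue and last is None:
            u = queue.popleft()
            for v in sorted(adj[u]):
                if v == s:
                    last = u
                    break
                if v not in parent:
                    parent[v] = u
                    queue.append(v)
        if last is not None:
            cycle = []
            u = last
            while u is not None:
                cycle.append(u)
                u = parent[u]
            cycle.reverse()
            if best is None or len(cycle) < len(best):
                best = cycle
    return best


def greedy_cycle_packing(adj):
    """Repeatedly take a shortest cycle and delete its vertices."""
    adj = {u: set(vs) for u, vs in adj.items()}
    cycles = []
    while True:
        cycle = shortest_cycle(adj)
        if cycle is None:
            return cycles
        cycles.append(cycle)
        removed = set(cycle)
        adj = {u: vs - removed for u, vs in adj.items() if u not in removed}


# Hypothetical toy graph; every vertex must appear as a key.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["d"], "d": ["e"], "e": ["c"]}
print(greedy_cycle_packing(graph))
```

On the toy graph the 2-cycle through a and b is taken before the 3-cycle through c, d and e, matching the increasing-length order of the slide.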

Page 19

Experimental Results

[Figure: results plotted against h]

Page 20

Antitag-to-Antitag Hybridization

• Additional practical constraint (ignored by Ben-Dor et al.): antitags do not cross-hybridize, including self

• Formalization in c-token hybridization model:

(C3) No two (anti)tags contain complementary substrings of weight c

• Cycle packing and tree search extend easily

Page 21

Results w/ Extended Constraints

[Figure: results plotted against h]

Page 22

More Hybridization Constraints…

• Enforced during tag assignment by

- Leaving some tags unassigned and distributing primers across multiple arrays [Ben-Dor et al. 03]

- Exploiting availability of multiple primer candidates [MPT05]

[Figure: hybridization diagram involving tags t1 and t2]

Page 23

Assignable Primers

• If primer p hybridizes to tag t’, at most one of the assignments (p,t’), (p,t) and (p’,t’) can be made

[Figure: primers p, p’ and tags t, t’]

• Set P of primers is assignable to a set T of tags if the condition above is satisfied for every p,p’ and t,t’

Page 24

Finding Assignable Primer Sets

Maximum Assignable Primer Set Problem: given primer set P and tag set T, find a maximum size assignable subset of P

Multiplexing Problem: given primer set P and tag set T, find partition of P into minimum number of assignable sets

• Both problems are NP-hard [Ben-Dor 04]

Page 25

Integration with Primer Selection

• In practice, several primer candidates with equivalent functionality

– In SNP genotyping, can pick primer from either forward or reverse strand

– In gene expression/identification applications, many primers have desired length, Tm, etc.

Page 26

Pooled Array Multiplexing Problem

Pooled Multiplexing Problem: Given set of primer pools P and tag set T, find a primer from each pool and a partition of selected primers into minimum number of assignable sets

Page 27

Pooled Multiplexing Algorithms

1. Primer-Del = greedy deletion for pools, similar to [Ben-Dor et al 04]

– Repeatedly delete primer of maximum potential until X+Y ≥ #pools, where

• Potential of tag t is $2^{-\deg(t)}$

• Potential of primer p is the sum of potentials of conflicting tags

• Subtract ½ if primer is adjacent to a tag of degree 1
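The deletion rule can be sketched in Python for the simplest case, where every pool holds a single primer so that the stopping test becomes X+Y ≥ number of remaining primers. This is an assumption-laden illustration, not the authors' code: the tag potential is read as 2^(-deg(t)), ties are broken by primer name, and the input format (primer mapped to the set of tags it hybridizes with) is hypothetical.

```python
def degrees_and_xy(conflicts, tags):
    """Tag degrees in the conflict graph, plus the counts X and Y."""
    deg = {t: 0 for t in tags}
    for ts in conflicts.values():
        for t in ts:
            deg[t] += 1
    # X: primers adjacent to a degree-1 tag; Y: tags of degree 0.
    x = sum(1 for ts in conflicts.values() if any(deg[t] == 1 for t in ts))
    y = sum(1 for t in tags if deg[t] == 0)
    return deg, x, y


def primer_del(conflicts, tags):
    """Greedy deletion sketch; returns (kept primers, deleted primers)."""
    conflicts = {p: set(ts) for p, ts in conflicts.items()}
    deleted = []
    while conflicts:
        deg, x, y = degrees_and_xy(conflicts, tags)
        if x + y >= len(conflicts):
            break

        def potential(p):
            # Sum of 2^(-deg(t)) over conflicting tags, minus 1/2 if
            # the primer is adjacent to a tag of degree 1.
            pot = sum(2.0 ** -deg[t] for t in conflicts[p])
            if any(deg[t] == 1 for t in conflicts[p]):
                pot -= 0.5
            return pot

        victim = max(sorted(conflicts), key=potential)
        deleted.append(victim)
        del conflicts[victim]
    return sorted(conflicts), deleted


# Hypothetical instance: p1 and p3 both hybridize with tag t1.
kept, dropped = primer_del({"p1": {"t1"}, "p2": set(), "p3": {"t1"}},
                           ["t1", "t2", "t3"])
print(kept, dropped)
```

In the toy instance one of the two primers competing for t1 is deleted, after which X+Y = 3 covers the two remaining primers.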

Page 28

Pooled Multiplexing Algorithms

1. Primer-Del = greedy deletion for pools similar to [Ben-Dor et al 04]

2. Primer-Del+ = same but never delete last primer from pool unless no other choice

3. Min-Pot = select primer with min potential from each pool, then run Primer-Del

4. Min-Deg = select primer with min degree, then run Primer-Del

5. Iterative ILP = iteratively find a maximum assignable pool set using integer linear program

Page 29

Results: 213 [MPT05] Tags, c=7

Page 30

Herpes B Gene Expression Assay

Tm   # pools   Pool size   500 tags             1000 tags            2000 tags
                           # arrays   % Util.   # arrays   % Util.   # arrays   % Util.
60   1446      1           4          94.06     2          97.20     1          72.30
60   1446      5           4          96.13     2          100.00    1          72.30
67   1560      1           4          96.53     2          98.70     1          78.00
67   1560      5           4          98.00     2          99.90     1          78.00
70   1522      1           4          96.73     2          98.90     1          76.10
70   1522      5           4          97.80     2          99.80     1          76.10

Tm   # pools   Pool size   500 tags             1000 tags            2000 tags
                           # arrays   % Util.   # arrays   % Util.   # arrays   % Util.
60   1446      1           4          82.26     3          65.35     2          57.05
60   1446      5           4          88.26     3          70.95     2          63.55
67   1560      1           4          86.33     3          69.70     2          61.15
67   1560      5           4          91.86     3          76.00     2          67.20
70   1522      1           4          88.46     3          73.65     2          65.40
70   1522      5           4          92.26     2          91.10     2          70.30

GenFlex Tags

Periodic Tags

Page 31

Overview

Background on DNA Microarrays

DNA Tag Arrays

- Tag Set Design

- Tag Assignment Problem

New SBE/SBH Assay

- Decoding and Multiplexing Algorithms

Conclusions

Page 32

New SBE/SBH Assay

[Figure: primers undergo single-base extension (SBE), followed by hybridization on a 2-mer array (SBH); sequences shown: TTGCA, GATAA, CCATT; array probes: AA AC CC CA / AT AG CG CT / TT TG GG GT / TA TC GC GA]

Page 33

Some notations

• P: set of primers, X: set of probes

• $E_p \subseteq \{A,C,T,G\}$: the set of possible extensions for primer p

• The spectrum of primer p, $Spec_X(p)$, is the set of probes hybridizing with p

• The extended spectrum of primer p with extension set $E_p$: $Spec_X(p, E_p) = \bigcup_{e \in E_p} Spec_X(pe)$
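As an illustration of these definitions, the Python sketch below computes a spectrum and an extended spectrum for a k-mer array. It assumes an idealized model in which each probe is identified with the primer k-mer it detects, so a probe hybridizes with p exactly when that k-mer is a substring of p; the function names and this simplification are not from the slides.

```python
from itertools import product


def spectrum(primer, probes, k):
    """Spec_X(p): probes matching some length-k substring of the primer."""
    kmers = {primer[i:i + k] for i in range(len(primer) - k + 1)}
    return kmers & probes


def extended_spectrum(primer, extensions, probes, k):
    """Spec_X(p, E_p): union of Spec_X(pe) over all extensions e in E_p."""
    spec = set()
    for e in extensions:
        spec |= spectrum(primer + e, probes, k)
    return spec


# All 16 probes of a 2-mer array.
ALL_2MERS = {"".join(t) for t in product("ACGT", repeat=2)}

print(sorted(spectrum("CCATT", ALL_2MERS, 2)))
print(sorted(extended_spectrum("CCATT", {"A", "G"}, ALL_2MERS, 2)))
```

For the primer CCATT with extension set {A, G}, the extended spectrum adds the probes TA and TG to the four probes of the plain spectrum.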

Page 34

Decodable primer sets

• Four parallel single-color SBE/SBH experiments, one type of extension in each SBE experiment

– P is weakly decodable with respect to extension e if for every primer p: $Spec_X(pe) \setminus \bigcup_{p' \neq p} Spec_X(p'e) \neq \emptyset$

• One SBE/SBH experiment with 4 colors (4 extensions)

– P is weakly decodable if for every primer p and every extension e ∈ Ep: $Spec_X(pe) \setminus \bigcup_{p' \neq p,\, e' \in E_{p'}} Spec_X(p'e') \neq \emptyset$

Page 35

Strongly r-decodable primer sets

• Hybridization involving labeled nucleotide is unreliable

– Informative probes should not rely on it

• Signal from one SNP may obscure signal from another when read at the same probe due to differences in DNA amplification efficiency

– Informative probes cannot be shared between SNPs

• P is strongly r-decodable if for every primer p: $|Spec_X(p) \setminus \bigcup_{p' \neq p} Spec_X(p', E_{p'})| \geq r$, where r = redundancy parameter
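The strong decodability condition translates directly into set operations. The Python sketch below assumes the condition reads: for every primer p, at least r probes of its spectrum lie outside the extended spectra of all other primers. Spectra are passed in as precomputed dictionaries, and the function name and toy data are hypothetical.

```python
def is_strongly_r_decodable(spec, ext_spec, r):
    """Check strong r-decodability from precomputed spectra.

    spec[p]     -- Spec_X(p), the probes hybridizing with primer p
    ext_spec[p] -- Spec_X(p, E_p), the extended spectrum of p
    """
    for p in spec:
        # Probes that may carry signal from any other primer.
        others = set()
        for q in ext_spec:
            if q != p:
                others |= ext_spec[q]
        if len(spec[p] - others) < r:
            return False
    return True


# Toy example with probes numbered 1..7 (made-up data).
spec = {"p1": {1, 2, 3}, "p2": {3, 4, 5}}
ext_spec = {"p1": {1, 2, 3, 6}, "p2": {3, 4, 5, 7}}
print(is_strongly_r_decodable(spec, ext_spec, 2))
print(is_strongly_r_decodable(spec, ext_spec, 3))
```

In the toy data each primer keeps two private probes once the shared probe 3 is discarded, so the set is strongly 2-decodable but not strongly 3-decodable.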

Page 36

MPPP

Minimum Pool Partitioning Problem (MPPP)

Given:

• primer pools set P and extension sets Ep, for every primer p

• probe set X

• redundancy r

Find:

• partition of P into the min number of strongly r-decodable subsets

A set of primer pools P = {P1,…,Pn} is strongly r-decodable iff there is a primer pi in each pool Pi such that {p1,…,pn} is strongly r-decodable.

Page 37

MDPSP

Maximum r-Decodable Pool Subset Problem (MDPSP)

Given:

• primer pools set P and extension sets Ep, for every primer p

• probe set X

• redundancy r

Find:

• strongly r-decodable subset of P of maximum size

Page 38

Min-Greedy Algorithm for Maximum Induced Matching in General Graphs

• Pick a vertex u of min degree

• Pick a vertex v of min degree from among u’s neighbors

• Add edge (u,v) to the matching

• Delete all neighbors of u and v from the graph

• Repeat the above steps until the graph becomes empty

• [Duckworth 05] d-1 approximation factor for d-regular graphs
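The steps above can be sketched in Python for an undirected graph given as an adjacency dictionary. Two details are assumptions not stated on the slide: u and v are deleted together with their neighbors, and an isolated vertex is simply dropped since it cannot be matched. Ties are broken by vertex order.

```python
def min_greedy_induced_matching(adj):
    """Greedy induced matching following the min-degree rule."""
    adj = {u: set(vs) for u, vs in adj.items()}
    matching = []
    while adj:
        # Vertex of minimum degree (ties broken by vertex order).
        u = min(sorted(adj), key=lambda x: len(adj[x]))
        if not adj[u]:
            # Assumed handling: an isolated vertex is dropped.
            del adj[u]
            continue
        # Minimum-degree vertex among u's neighbors.
        v = min(sorted(adj[u]), key=lambda x: len(adj[x]))
        matching.append((u, v))
        # Delete u, v and all of their neighbors.
        removed = {u, v} | adj[u] | adj[v]
        adj = {x: ns - removed for x, ns in adj.items() if x not in removed}
    return matching


# Path 1-2-3-4-5-6 as a small example.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4, 6}, 6: {5}}
print(min_greedy_induced_matching(path))
```

On the path example the edges (1,2) and (4,5) are chosen; no edge joins them, so the matching is induced.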

Page 39

Min-Greedy Algorithms for MDPSP

• Bipartite hybridization graph G:

– Primers in left side, probes in right side

– Two types of edges:

• N+(p) = SpecX(p)

• N-(p) = SpecX(p,Ep) \ SpecX(p)

• Two algorithm variants:

– MinPrimerGreedy: pick primer first

– MinProbeGreedy: pick probe first

• Delete primer/probe if N+ degree drops below r (for a primer) or 1 (for a probe)

Page 40

Experimental results for k-mers

Page 41

Experimental Results for k-mers, k=10

[Figure: plot against primer length (10 to 30), vertical axis from 0 to 160000; series for r=2 and r=5 with n=200k: MinProbe, Sequential, MinPrimer]

Page 42

Experimental results for c-tokens

Page 43

Experimental Results, c=13

[Figure: plot against primer length (10 to 30), vertical axis from 0 to 80000; series for r=2 and r=5 with n=200k: MinProbe, Sequential, MinPrimer]

Page 44

Overview

Background on DNA Microarrays

DNA Tag Arrays

- Tag Set Design

- Tag Assignment Problem

New SBE/SBH Assay

- Decoding and Multiplexing Algorithms

Conclusions

Page 45

Conclusions and Ongoing Work

• Combinatorial algorithms yield significant increases in multiplexing rates of universal DNA arrays

– New SBE/SBH architecture particularly promising based on preliminary simulation results

• Ongoing work:

– Extend methods to more accurate hybridization models, e.g., use NN melting temperature models

– More complex (e.g., temperature dependent) DNA tag set non-interaction requirements for DNA self/mediated assembly

– Probabilistic decoding in presence of hybridization errors

Page 46

Acknowledgments

• Claudia Prajescu and Dragos Trinca

• Funding by NSF (CAREER Award IIS 0546457) and UCONN Research Foundation

Page 47

Backup Slides

Microarray Technologies

Number of c-tokens

Characterization of assignable sets

Integer program for MAPS

Page 48

Microarray Technologies

• Arrays of cDNAs

– Obtained by reverse transcription from Expressed Sequence Tags (ESTs)

• Oligonucleotide arrays

– Short (20-60bp) synthetic DNA strands

Page 49

Robotic cDNA Arrayers

[Figure: arrayer technologies shown: Ink jet Technology, Pin Technology, Quill Pen Technology, Pin Ring Technology]

Page 50

In-Place Oligonucleotide Synthesis

[Figure: probes to be synthesized (CG, AC, CG, AC, ACG, AG, G, AG, C); array after the A synthesis step]

Page 51

In-Place Oligonucleotide Synthesis

[Figure: same probes; array after the A and C synthesis steps]

Page 52

In-Place Oligonucleotide Synthesis

[Figure: same probes; array after the A, C and G synthesis steps]

Page 53

Number of c-tokens

• W = A or T, S = C or G

• $G_n$ = number of strings of weight n

$G_1 = 2$; $G_2 = 6$; $G_n = 2G_{n-2} + 2G_{n-1}$

Token type    Num tokens
<c-2>S        $2G_{c-2}$
S<c-3>S       $4G_{c-3}$
<c-1>W        $2G_{c-1}$
S<c-2>W       $4G_{c-2}$
Total         $G_c + 2G_{c-1}$

Page 54

Number of c-tokens

c     Num c-tokens
5     208
6     568
7     1552
8     4240
9     11584
10    31648
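The counts in this table follow from the recurrence for G_n and the total G_c + 2G_{c-1}. The Python sketch below recomputes them and cross-checks one value by brute-force enumeration; it assumes a c-token is a string x with w(x) ≥ c whose weight drops below c when its first base is removed, and the function names are illustrative.

```python
from itertools import product

# 2-4 rule weights: W bases (A, T) weigh 1, S bases (C, G) weigh 2.
WEIGHT = {"A": 1, "T": 1, "C": 2, "G": 2}


def num_strings(n):
    """G_n from the recurrence G_1=2, G_2=6, G_n = 2G_{n-2} + 2G_{n-1}."""
    g = {1: 2, 2: 6}
    for i in range(3, n + 1):
        g[i] = 2 * g[i - 2] + 2 * g[i - 1]
    return g[n]


def num_c_tokens(c):
    """Closed-form count G_c + 2G_{c-1} (valid for c >= 2)."""
    return num_strings(c) + 2 * num_strings(c - 1)


def enumerate_c_tokens(c):
    """Brute force over all short strings; practical only for small c."""
    tokens = []
    for length in range(1, c + 2):
        for chars in product("ACGT", repeat=length):
            w = sum(WEIGHT[b] for b in chars)
            # Weight at least c, but below c without the first base.
            if w >= c and w - WEIGHT[chars[0]] < c:
                tokens.append("".join(chars))
    return tokens


for c in range(5, 11):
    print(c, num_c_tokens(c))
```

The brute-force count for c=5 agrees with the closed form, which supports the reading of the token types in the preceding table.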

Page 55

Characterization of Assignable Sets

• Conflict graph:

– G = (T ∪ P, E), where (t,p) ∈ E if t hybridizes with p

– X = number of primers adjacent to a degree 1 tag

– Y = number of degree 0 tags

[Figure: example conflict graph with X=1 and Y=2]

• [Ben-Dor 04] Set P is assignable to T iff X+Y ≥ |P|
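The characterization gives a direct assignability test. The Python sketch below counts X and Y from a conflict graph and applies the condition, read here as X+Y ≥ |P| (the comparison symbol is missing from the transcript, so this is an assumption). The input format and the function name are hypothetical.

```python
def assignability(conflicts, tags):
    """Return (X, Y, assignable) for primers P and tags T.

    conflicts maps each primer to the set of tags it hybridizes with.
    """
    deg = {t: 0 for t in tags}
    for ts in conflicts.values():
        for t in ts:
            deg[t] += 1
    # X: primers adjacent to some tag of degree 1.
    x = sum(1 for ts in conflicts.values() if any(deg[t] == 1 for t in ts))
    # Y: tags that no primer hybridizes with.
    y = sum(1 for t in tags if deg[t] == 0)
    return x, y, x + y >= len(conflicts)


tags = ["t1", "t2", "t3"]
print(assignability({"p1": {"t1"}, "p2": set(), "p3": {"t1"}}, tags))
print(assignability({"p2": set(), "p3": {"t1"}}, tags))
```

In the first call two primers compete for t1, giving X=0 and Y=2 for three primers, so the set is not assignable; dropping p1 gives X=1, Y=2 and the remaining two primers are assignable.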

Page 56

X+Y Characterization Fails for Pools

Page 57

Integer Linear Program for MAPS

• where zpt = 1 iff primer p is assigned to tag t