Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg

Dec 21, 2015

Page 1: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Genome Rearrangements, Synteny, and Comparative Mapping

CSCI 4830: Algorithms for Molecular Biology

Debra S. Goldberg

Page 2: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Turnip vs Cabbage

• Share a recent common ancestor

• Look and taste different

Page 3: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Turnip vs Cabbage

• Comparing mtDNA gene sequences yields no evolutionary information

• 99% similarity between genes

• These surprisingly identical gene sequences differed in gene order

• This study helped pave the way to analyzing genome rearrangements in molecular evolution

Page 4: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Turnip vs Cabbage: Different mtDNA Gene Order

• Gene order comparison:

Page 8: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Turnip vs Cabbage: Gene Order Comparison

Before

After

• Evolution is manifested as the divergence in gene order

Page 9: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Transforming Cabbage into Turnip

Page 10: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Reversals

1, 2, 3, 4, 5, 6, 7, 8, 9, 10

1, 2, 3, -8, -7, -6, -5, -4, 9, 10

Blocks represent conserved genes. In the course of evolution or in a clinical context, blocks 1,…,10 could be misread as 1, 2, 3, -8, -7, -6, -5, -4, 9, 10.
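The reversal on this slide can be reproduced with a few lines of Python. This is a minimal sketch: the function name `signed_reversal` and the 1-indexed, inclusive block positions are assumptions made for illustration, not notation from the slides.

```python
def signed_reversal(blocks, i, j):
    """Reverse blocks i..j (1-indexed, inclusive) and flip their orientation.

    Hypothetical helper: a negative number stands for a block read in the
    opposite direction, as in the slide's example.
    """
    flipped = [-x for x in reversed(blocks[i - 1:j])]
    return blocks[:i - 1] + flipped + blocks[j:]


genome = list(range(1, 11))            # 1, 2, ..., 10
print(signed_reversal(genome, 4, 8))   # [1, 2, 3, -8, -7, -6, -5, -4, 9, 10]
```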

Page 11: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Types of Mutations

Page 12: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Types of Rearrangements

Inversion/Reversal: 1 2 3 4 5 6 → 1 2 -5 -4 -3 6

Page 13: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Types of Rearrangements

Translocation: chromosomes (1 2 3) and (4 5 6) → (1 2 6) and (4 5 3)

Page 14: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Types of Rearrangements

Fusion: two chromosomes are joined into one (1 2 3 4 5 6)

Fission: one chromosome (1 2 3 4 5 6) is split into two
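The multi-chromosome rearrangements can be sketched the same way, treating each chromosome as a list of blocks. The function names and the cut-position parameters below are hypothetical, chosen only for illustration.

```python
def translocation(chrom_a, chrom_b, cut_a, cut_b):
    """Exchange the tails of two chromosomes after the given cut positions."""
    return (chrom_a[:cut_a] + chrom_b[cut_b:],
            chrom_b[:cut_b] + chrom_a[cut_a:])


def fusion(chrom_a, chrom_b):
    """Join two chromosomes end to end."""
    return chrom_a + chrom_b


def fission(chrom, cut):
    """Split one chromosome into two at the given cut position."""
    return chrom[:cut], chrom[cut:]


print(translocation([1, 2, 3], [4, 5, 6], 2, 2))   # ([1, 2, 6], [4, 5, 3])
```

Fission undoes fusion when the cut is placed at the original join, which mirrors the way the slide pairs the two operations.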

Page 15: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

• What are the similarity blocks and how to find them?

• What is the architecture of the ancestral genome?

• What is the evolutionary scenario for transforming one genome into the other?

Unknown ancestor, ~75 million years ago

Mouse (X chrom.)

Human (X chrom.)

Genome rearrangements

Page 16: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Why do we care?

Page 17: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

SKY (spectral karyotyping)

Page 18: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Robertsonian Translocation

13 14

Page 19: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Robertsonian Translocation

• Translocation of chromosomes 13 and 14

• No net gain or loss of genetic material: normal phenotype.

• Increased risk for an abnormal child or spontaneous pregnancy loss

13 14

Page 20: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Philadelphia Chromosome

9

22

Page 21: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Philadelphia Chromosome

• A translocation between chromosomes 9 and 22 (part of 22 is attached to 9)

• Seen in about 90% of patients with chronic myelogenous leukemia (CML)

9 22

Page 22: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Colon cancer

Page 23: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Colon Cancer

Page 24: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Comparative maps

Page 25: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Waardenburg’s Syndrome: Mouse Provides Insight into Human Genetic Disorder

• Characterized by pigmentary dysplasia

• Gene implicated was linked to human chromosome 2

• It was not clear where exactly on chromosome 2

Page 26: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Waardenburg’s syndrome and splotch mice

• A breed of mice (with splotch gene) had similar symptoms caused by the same type of gene as in humans

• Scientists identified location of gene responsible for disorder in mice

• Finding the gene in mice gives clues to where same gene is located in humans

Page 28: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Reversals: Example

π = 1 2 3 4 5 6 7 8

ρ(3,5): 1 2 5 4 3 6 7 8

ρ(5,6): 1 2 5 4 6 3 7 8

Page 29: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Reversals and Gene Orders

• Gene order is represented by a permutation π:

π = π_1 … π_{i-1} π_i π_{i+1} … π_{j-1} π_j π_{j+1} … π_n

π • ρ(i, j) = π_1 … π_{i-1} π_j π_{j-1} … π_{i+1} π_i π_{j+1} … π_n

• Reversal ρ(i, j) reverses (flips) the elements from i to j in π
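For unsigned permutations, a reversal is just a slice reversal. A minimal sketch, assuming a hypothetical function name `reversal` and 1-indexed positions i ≤ j as in the definition above:

```python
def reversal(perm, i, j):
    """Return perm with the elements at positions i..j (1-indexed) flipped."""
    return perm[:i - 1] + perm[i - 1:j][::-1] + perm[j:]


pi = [1, 2, 3, 4, 5, 6, 7, 8]
step1 = reversal(pi, 3, 5)
step2 = reversal(step1, 5, 6)
print(step1)   # [1, 2, 5, 4, 3, 6, 7, 8]
print(step2)   # [1, 2, 5, 4, 6, 3, 7, 8]
```

Applying the same reversal twice restores the original permutation, so every reversal is its own inverse.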

Page 30: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Reversal Distance Problem

• Goal: Given two permutations, find the shortest series of reversals that transforms one into the other

• Input: Permutations π and σ

• Output: A series of reversals ρ_1, …, ρ_t transforming π into σ such that t is minimum

• t = reversal distance between π and σ

• d(π, σ) = smallest possible value of t, given π and σ

Page 31: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Sorting By Reversals Problem

• Goal: Given a permutation, find a shortest series of reversals that transforms it into the identity permutation (1 2 … n)

• Input: Permutation π

• Output: A series of reversals ρ_1, …, ρ_t transforming π into the identity permutation such that t is minimum

• min t = d(π) = reversal distance of π

Page 32: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Sorting by reversals

Example: 5 steps

Step 0: 2 -4 -3 5 -8 -7 -6 1

Step 1: 2 3 4 5 -8 -7 -6 1

Step 2: 2 3 4 5 6 7 8 1

Step 3: 2 3 4 5 6 7 8 -1

Step 4: -8 -7 -6 -5 -4 -3 -2 -1

Step 5: 1 2 3 4 5 6 7 8

Page 33: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Sorting by reversals

Example: 4 steps

Step 0: 2 -4 -3 5 -8 -7 -6 1

Step 1: 2 3 4 5 -8 -7 -6 1

Step 2: -5 -4 -3 -2 -8 -7 -6 1

Step 3: -5 -4 -3 -2 -1 6 7 8

Step 4: 1 2 3 4 5 6 7 8

What is the reversal distance for this permutation? Can it be sorted in 3 steps?
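For a permutation this small, the question can be checked by brute force. The sketch below is not from the slides: it is a two-sided breadth-first search over signed reversals that explores everything within two reversals of the start and of the identity, which is enough to settle any distance up to 4. All function names are hypothetical.

```python
def neighbors(perm):
    """Yield every permutation reachable from perm by one signed reversal."""
    n = len(perm)
    for i in range(n):
        for j in range(i, n):
            flipped = tuple(-x for x in reversed(perm[i:j + 1]))
            yield perm[:i] + flipped + perm[j + 1:]


def ball(start, radius):
    """Map each permutation within `radius` reversals of start to its distance."""
    dist = {start: 0}
    frontier = [start]
    for d in range(1, radius + 1):
        next_frontier = []
        for p in frontier:
            for q in neighbors(p):
                if q not in dist:
                    dist[q] = d
                    next_frontier.append(q)
        frontier = next_frontier
    return dist


def reversal_distance_up_to_4(perm):
    """Exact reversal distance if it is at most 4, otherwise None."""
    start = tuple(perm)
    identity = tuple(range(1, len(perm) + 1))
    from_start = ball(start, 2)
    from_identity = ball(identity, 2)
    # Reversals are their own inverses, so a shortest path of length <= 4
    # has a midpoint that lies in both balls.
    common = from_start.keys() & from_identity.keys()
    if not common:
        return None
    return min(from_start[p] + from_identity[p] for p in common)


print(reversal_distance_up_to_4([2, -4, -3, 5, -8, -7, -6, 1]))
```

Each ball holds at most 1 + 36 + 36² permutations for eight blocks, so the search is fast. Exhaustive search does not scale to long permutations.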

Page 34: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Pancake Flipping Problem

• Chef prepares an unordered stack of pancakes of different sizes

• The waiter wants to sort (rearrange) them, smallest on top, largest at bottom

• He does it by flipping over several from the top, repeating this as many times as necessary

Christos Papadimitriou and Bill Gates flip pancakes
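The waiter's procedure can be sketched as a sort that only ever reverses a prefix of the stack. This is the simple "bring the largest to the top, then flip it into place" strategy, shown for illustration. It is not claimed to use the fewest flips, and the function name is hypothetical. Index 0 is the top of the stack and the pancake sizes are assumed distinct.

```python
def pancake_sort(stack):
    """Sort a stack (index 0 = top) using only flips of the top k pancakes.

    Returns the sorted stack and the list of flip sizes k that were used.
    """
    stack = list(stack)
    flips = []
    for size in range(len(stack), 1, -1):
        largest = stack.index(max(stack[:size]))
        if largest == size - 1:
            continue                      # already in its final place
        if largest != 0:
            stack[:largest + 1] = stack[:largest + 1][::-1]   # bring it to the top
            flips.append(largest + 1)
        stack[:size] = stack[:size][::-1]                     # flip it into place
        flips.append(size)
    return stack, flips


print(pancake_sort([3, 1, 4, 2]))   # ([1, 2, 3, 4], [3, 4, 2, 3])
```

Each pancake needs at most two flips, so this strategy uses at most 2(n - 1) flips for n pancakes.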

Page 35: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Sorting By Reversals: A Greedy Algorithm

• Unsigned permutations

• Example: permutation π = 1 2 3 6 4 5

• First three elements are already in order

• prefix(π) = length of already sorted prefix

– prefix(π) = 3

• Idea: increase prefix(π) at every step

Page 36: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Greedy Algorithm: An Example

• Doing so, π can be sorted

1 2 3 6 4 5

1 2 3 4 6 5

1 2 3 4 5 6

• Number of steps to sort a permutation of length n is at most (n – 1)

Page 37: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Greedy Algorithm: Pseudocode

SimpleReversalSort(π)
1 for i ← 1 to n – 1
2   j ← position of element i in π (i.e., π_j = i)
3   if j ≠ i
4     π ← π • ρ(i, j)
5     output π
6   if π is the identity permutation
7     return
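A direct Python transcription of the pseudocode, as a sketch: it returns the list of intermediate permutations instead of printing them, and the function name is an illustrative choice.

```python
def simple_reversal_sort(perm):
    """Greedy prefix-extending sort by reversals; returns each intermediate permutation."""
    perm = list(perm)
    n = len(perm)
    identity = list(range(1, n + 1))
    steps = []
    for i in range(1, n):                 # i = 1 .. n-1
        j = perm.index(i) + 1             # 1-indexed position of element i
        if j != i:
            perm[i - 1:j] = perm[i - 1:j][::-1]   # apply rho(i, j)
            steps.append(list(perm))
        if perm == identity:
            break
    return steps


for step in simple_reversal_sort([1, 2, 3, 6, 4, 5]):
    print(step)
# [1, 2, 3, 4, 6, 5]
# [1, 2, 3, 4, 5, 6]
```

Running it on 6 1 2 3 4 5 produces five reversals, which is the worst case n – 1 for n = 6.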

Page 38: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Analyzing SimpleReversalSort

• SimpleReversalSort does not guarantee the smallest number of reversals and takes five steps on π = 6 1 2 3 4 5:

• Step 1: 1 6 2 3 4 5

• Step 2: 1 2 6 3 4 5

• Step 3: 1 2 3 6 4 5

• Step 4: 1 2 3 4 6 5

• Step 5: 1 2 3 4 5 6

Page 39: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Analyzing SimpleReversalSort (cont’d)

• But it can be sorted in two steps:

π = 6 1 2 3 4 5

– Step 1: 5 4 3 2 1 6

– Step 2: 1 2 3 4 5 6

• So, SimpleReversalSort(π) is not optimal

• Optimal algorithms are unknown for many problems; approximation algorithms are used instead

Page 40: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Approximation Algorithms

• These algorithms find approximate solutions rather than optimal solutions

• The approximation ratio of an algorithm A on input π is A(π) / OPT(π), where

A(π) = solution produced by algorithm A

OPT(π) = optimal solution of the problem

Page 41: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Approximation Ratio / Performance Guarantee

• Approximation ratio (performance guarantee) of algorithm A: the worst-case approximation ratio over all inputs of size n

– For algorithm A that minimizes the objective function (minimization algorithm):

• max_{|π| = n} A(π) / OPT(π)

– For maximization algorithm:

• min_{|π| = n} A(π) / OPT(π)

Page 42: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Adjacencies and Breakpoints

π = π_1 π_2 π_3 … π_{n-1} π_n

• A pair of elements π_i and π_{i+1} is adjacent if π_{i+1} = π_i ± 1

• For example:

π = 1 9 3 4 7 8 2 6 5

• (3, 4), (7, 8) and (6, 5) are adjacent pairs

Page 43: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Breakpoints: An Example

There is a breakpoint between any two neighboring elements that are not consecutive:

π = 1 9 3 4 7 8 2 6 5

• Pairs (1,9), (9,3), (4,7), (8,2) and (2,6) form breakpoints of permutation π

• b(π) = # breakpoints in permutation π

Page 44: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Adjacency & Breakpoints

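• An adjacency: a pair of neighboring elements that are consecutive

• A breakpoint: a pair of neighboring elements that are not consecutive

π = 5 6 2 1 3 4

Extend π with π_0 = 0 and π_{n+1} = n+1: 0 5 6 2 1 3 4 7

Adjacencies: (5,6), (2,1), (3,4)

Breakpoints: (0,5), (6,2), (1,3), (4,7)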

Page 45: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Extending Permutations

• Add π_0 = 0 and π_{n+1} = n+1 at the ends of π

Example:

π = 1 9 3 4 7 8 2 6 5

Extending with 0 and 10:

π = 0 1 9 3 4 7 8 2 6 5 10

Note: A new breakpoint was created after extending

Page 46: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Reversal Distance and Breakpoints

Each reversal eliminates at most 2 breakpoints.

π = 2 3 1 4 6 5

0 2 3 1 4 6 5 7 b(π) = 5

0 1 3 2 4 6 5 7 b(π) = 4

0 1 2 3 4 6 5 7 b(π) = 2

0 1 2 3 4 5 6 7 b(π) = 0

This implies: reversal distance ≥ #breakpoints / 2
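Counting breakpoints and the resulting lower bound takes only a few lines. A minimal sketch for unsigned permutations, with hypothetical function names:

```python
import math


def extend(perm):
    """Add 0 at the front and n+1 at the end of an unsigned permutation."""
    return [0] + list(perm) + [len(perm) + 1]


def breakpoints(perm):
    """Number of neighboring pairs in the extended permutation that are not consecutive."""
    ext = extend(perm)
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


def reversal_distance_lower_bound(perm):
    """Each reversal removes at most 2 breakpoints, so d >= b / 2 (rounded up)."""
    return math.ceil(breakpoints(perm) / 2)


print(breakpoints([2, 3, 1, 4, 6, 5]))                    # 5
print(reversal_distance_lower_bound([2, 3, 1, 4, 6, 5]))  # 3
```

The slide's example sorts this permutation in exactly three reversals, so the bound is tight here.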

Page 47: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Sorting By Reversals: A Better Greedy Algorithm

BreakPointReversalSort(π)
1 while b(π) > 0
2   Among all possible reversals, choose reversal ρ minimizing b(π • ρ)
3   π ← π • ρ(i, j)
4   output π
5 return

Problem: this algorithm may work forever

Page 48: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Strips

• Strip: an interval between two consecutive breakpoints in a permutation

– Decreasing strip: elements in decreasing order (e.g. 6 5)

– Increasing strip: elements in increasing order (e.g. 7 8)

0 1 9 4 3 7 8 2 5 6 10

– Single-element strips are considered decreasing, except the strips containing 0 and n+1, which are considered increasing
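Splitting an extended permutation into strips follows directly from the definition. The sketch below uses a hypothetical function name and tags each strip as "inc" or "dec", treating single-element strips as decreasing unless they hold 0 or n+1.

```python
def strips(ext):
    """Split an extended permutation into (kind, elements) strips."""
    runs, current = [], [ext[0]]
    for prev, cur in zip(ext, ext[1:]):
        if abs(cur - prev) == 1:
            current.append(cur)       # adjacency: same strip continues
        else:
            runs.append(current)      # breakpoint: start a new strip
            current = [cur]
    runs.append(current)

    last = len(ext) - 1               # the value n+1
    tagged = []
    for run in runs:
        if len(run) > 1:
            kind = "inc" if run[1] > run[0] else "dec"
        else:
            kind = "inc" if run[0] in (0, last) else "dec"
        tagged.append((kind, run))
    return tagged


for kind, run in strips([0, 1, 9, 4, 3, 7, 8, 2, 5, 6, 10]):
    print(kind, run)
# inc [0, 1]
# dec [9]
# dec [4, 3]
# inc [7, 8]
# dec [2]
# inc [5, 6]
# inc [10]
```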

Page 49: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Reducing the Number of Breakpoints

Theorem 1:

If permutation π contains at least one decreasing strip, then there exists a reversal ρ which decreases the number of breakpoints (i.e. b(π • ρ) < b(π))

Page 53: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Things To Consider (cont’d)

• For π = 1 4 6 5 7 8 3 2

0 1 4 6 5 7 8 3 2 9 b(π) = 5

– Choose the decreasing strip with the smallest element k in π (k = 2 in this case)

– Find k – 1 in the permutation

– Reverse the segment between k – 1 and k: 0 1 2 3 8 7 5 6 4 9 b(π • ρ) = 4
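The construction behind Theorem 1 can be sketched as follows. The helper names are hypothetical, and the input is assumed to be an extended permutation that has at least one decreasing strip.

```python
def count_breakpoints(ext):
    """Breakpoints of an already extended permutation."""
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


def decreasing_elements(ext):
    """Interior elements that lie in decreasing (or single-element) strips."""
    return [ext[p] for p in range(1, len(ext) - 1)
            if ext[p - 1] != ext[p] - 1 and ext[p + 1] != ext[p] + 1]


def theorem1_reversal(ext):
    """Reverse the segment between k-1 and k, where k is the smallest
    element found in any decreasing strip."""
    k = min(decreasing_elements(ext))
    pos_k, pos_prev = ext.index(k), ext.index(k - 1)
    if pos_prev < pos_k:
        lo, hi = pos_prev + 1, pos_k      # k-1 lies to the left of k
    else:
        lo, hi = pos_k + 1, pos_prev      # k-1 lies to the right of k
    return ext[:lo] + ext[lo:hi + 1][::-1] + ext[hi + 1:]


before = [0, 1, 4, 6, 5, 7, 8, 3, 2, 9]
after = theorem1_reversal(before)
print(after)                                                 # [0, 1, 2, 3, 8, 7, 5, 6, 4, 9]
print(count_breakpoints(before), count_breakpoints(after))   # 5 4
```

In both cases the reversal places k directly next to k – 1, which removes one breakpoint without creating a new one.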

Page 54: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Reducing the Number of Breakpoints Again

– If there is no decreasing strip, there may be no reversal ρ that reduces the number of breakpoints (i.e. b(π • ρ) ≥ b(π) for any reversal ρ).

– By reversing an increasing strip (the # of breakpoints stays unchanged), we create a decreasing strip. Then the number of breakpoints will be reduced in the next step (Theorem 1).

Page 55: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Things To Consider (cont’d)

• There are no decreasing strips in π, for:

π = 0 1 2 5 6 7 3 4 8 b(π) = 3

π • ρ(3,4) = 0 1 2 5 6 7 4 3 8 b(π • ρ(3,4)) = 3

• ρ(3,4), which flips the strip 3 4, does not change the # of breakpoints

• ρ(3,4) creates a decreasing strip, thus guaranteeing that the next step will decrease the # of breakpoints.

Page 56: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

ImprovedBreakpointReversalSort

ImprovedBreakpointReversalSort(π)
1 while b(π) > 0
2   if π has a decreasing strip
3     Among all possible reversals, choose reversal ρ that minimizes b(π • ρ)
4   else
5     Choose a reversal ρ that flips an increasing strip in π
6   π ← π • ρ
7   output π
8 return
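A runnable sketch of the pseudocode for unsigned permutations. The helper names are hypothetical. When there is no decreasing strip the pseudocode leaves open which increasing strip to flip, so this sketch takes the first one that does not touch 0 or n+1.

```python
def breakpoints(ext):
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


def has_decreasing_strip(ext):
    # An interior element is in a decreasing (or single-element) strip
    # when neither neighbor continues an increasing run through it.
    return any(ext[p - 1] != ext[p] - 1 and ext[p + 1] != ext[p] + 1
               for p in range(1, len(ext) - 1))


def flip(ext, i, j):
    """Reverse positions i..j (0-indexed, inclusive) of the extended permutation."""
    return ext[:i] + ext[i:j + 1][::-1] + ext[j + 1:]


def interior_increasing_strip(ext):
    """Positions (i, j) of the first increasing strip not attached to 0 or n+1."""
    n = len(ext) - 2
    i = 1
    while i <= n:
        j = i
        while j + 1 <= n and ext[j + 1] == ext[j] + 1:
            j += 1
        if j > i and ext[i - 1] != ext[i] - 1 and ext[j + 1] != ext[j] + 1:
            return i, j
        i = j + 1
    raise ValueError("no interior increasing strip")


def improved_breakpoint_reversal_sort(perm):
    """Return the list of intermediate permutations, ending with the identity."""
    n = len(perm)
    ext = [0] + list(perm) + [n + 1]
    steps = []
    while breakpoints(ext) > 0:
        if has_decreasing_strip(ext):
            candidates = (flip(ext, i, j)
                          for i in range(1, n + 1) for j in range(i + 1, n + 1))
            ext = min(candidates, key=breakpoints)
        else:
            i, j = interior_increasing_strip(ext)
            ext = flip(ext, i, j)
        steps.append(ext[1:-1])
    return steps


for step in improved_breakpoint_reversal_sort([2, 3, 1, 4, 6, 5]):
    print(step)
```

Trying every reversal costs O(n²) candidates per step, each scored in O(n) time. That is fine for illustration but far from the fastest way to implement the idea.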

Page 57: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Performance Guarantee

• ImprovedBreakPointReversalSort is an approximation algorithm with a performance guarantee of at most 4

– It eliminates at least one breakpoint in every two steps; at most 2b(π) steps

– Approximation ratio: 2b(π) / d(π)

– Optimal algorithm eliminates at most 2 breakpoints in every step: d(π) ≥ b(π) / 2

– Performance guarantee: 2b(π) / d(π) ≤ 2b(π) / (b(π) / 2) = 4

Page 58: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Signed Permutations

• Up to this point, reversal sort algorithms sorted unsigned permutations

• But genes have directions… so we should consider signed permutations

5’ 3’

π = 1 -2 -3 4 -5

Page 59: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Signed Permutation

• Genes are directed fragments of DNA

• Genes in the same position but different orientations do not have same gene order

• These two permutations are not equivalent gene sequences

1 2 3 4 5

-1 2 -3 -4 -5

Page 60: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Signed permutations are easier!

Polynomial time (optimal) algorithm is known

Page 61: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

• What are the similarity blocks and how to find them?

• What is the architecture of the ancestral genome?

• What is the evolutionary scenario for transforming one genome into the other?

Unknown ancestor, ~75 million years ago

Mouse (X chrom.)

Human (X chrom.)

Genome rearrangements

Page 63: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Comparative maps

Page 64: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

A brief history

• Chromosome comparisons

– no information about genes

• 1920’s: Sturtevant, Weinstein

• Today: many organisms, many uses

• Humans:

– primates, mouse, cat, dog, zebrafish, ...

– Alzheimer, cancers, diabetes, obesity, ...

Page 65: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Why construct comparative maps?

• Identify & isolate genes

– Crops: drought resistance, yield, nutrition, ...

– Human: disease genes, drug response, ...

• Infer ancestral relationships

• Discover principles of evolution

– Chromosome

– Gene family

• “key to understanding the human genome”

Page 66: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Map construction

[Figure: Maize 1 (target) markers, each annotated with the location of its homolog in Rice (base). Wilson et al. Genetics 1999]

Markers as listed: pds1 (3S), rz742a (2S), rz103b (2L), cdo1387b (3S), isu040 (3), rz574 (3S), cdo38a (7L), cdo938a (3S), rz585a (3S), rz672a (3S), isu081b (3S 10L), rz323a (8L), cdo344c (12L), rz296a (5L), bcd734b (3S), rz500 (10L), rz421 (10L), isu74 (3S), cdo464a (8L), isu73 (3S), cdo475b (6S), cdo595 (8L), cdo116 (8L), rz28a (8L), cdo99 (8L), rz698a (9L), bcd207a (10L), cdo94b (10L), bcd386a (10L), isu78 (5L), csu77 (10L), cdo98b (10L), rz630e (3L), rz403 (3L), cdo795a (3L), bcd1072c (5C), isu92b (3L), cdo122a (3L), rz912a (3L), bcd808a (11S), cdo246 (3L), adh1 (11S), cdo353b (3L), isu106a (3L), phi1 (3L)

Go from this marker list to a map labeled with rice segments 3S, 8L, 10L, 3L

Page 67: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Why automate?

• Time consuming, laborious

– Needs to be redone frequently

• Codify a common set of principles

• Nadeau and Sankoff: warn of “arbitrary nature of comparative map construction”

Page 68: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Input/Output

• Input:

– genetic maps of 2 species

– marker/gene correspondences (homologs)

• Output:

– a comparative map

– homeologies identified

Page 69: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

A natural model?

[Figure: Maize 1 (target), Rice (base), Wilson et al. Genetics 1999. The Maize 1 marker list from the "Map construction" slide, shown next to the rice segment labels 3S, 8L, 10L, 3L]

Page 70: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Scoring

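[Figure: markers bcd207a (10L), cdo94b (10L), bcd386a (10L), isu78 (5L), csu77 (10L), cdo98b (10L), rz630e (3L), rz403 (3L), cdo795a (3L), isu92b (3L), labeled as segments 10L and 3L. The boundary between the two segments is charged s; the marker that does not match its segment label (isu78) is charged m.]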

Page 71: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Assumptions

• Accept published marker order

• All linkage groups of base are unique

• Simplistic homeology criteria

• At least one homeologous region

Page 72: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

A natural model?

Page 73: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Dynamic programming

• l_i = location of homolog to marker i

• S[i,a] = penalty (score) for an optimal labeling of the submap from marker i to the end, when labeling begins with label a

[Figure: markers 1 … i … n, with marker i labeled a]

Page 74: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Recurrence relation

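• S[n,a] = m·δ(a, l_n)

• S[i,a] = m·δ(a, l_i) + min_{b ∈ L} (S[i+1,b] + s·δ(a,b))

where δ(x, y) = 0 if x = y and 1 otherwise, and L is the set of labels

[Figure: marker i labeled a and marker i+1 labeled b, above the homolog locations l_i, l_{i+1}, …, l_n]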
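The linear-model recurrence can be filled in from right to left. The sketch below rests on assumptions: a mismatched marker costs m, a change of label between neighboring markers costs s, and the label set L is taken to be the set of locations that occur in the input. The function name and the traceback are illustrative additions.

```python
def linear_labeling(locations, s, m):
    """Return (score, labels) minimizing mismatch and segment-boundary penalties.

    locations[i] is the base-genome location of the homolog of marker i.
    """
    labels = sorted(set(locations))       # assumed label set L
    n = len(locations)
    S = [dict() for _ in range(n)]        # S[i][a] as in the recurrence
    nxt = [dict() for _ in range(n)]      # best label for marker i+1
    for a in labels:
        S[n - 1][a] = m * (a != locations[n - 1])
    for i in range(n - 2, -1, -1):
        for a in labels:
            b = min(labels, key=lambda b: S[i + 1][b] + s * (a != b))
            S[i][a] = m * (a != locations[i]) + S[i + 1][b] + s * (a != b)
            nxt[i][a] = b
    a = min(labels, key=lambda a: S[0][a])
    score, out = S[0][a], [a]
    for i in range(n - 1):
        a = nxt[i][a]
        out.append(a)
    return score, out


locs = ["10L", "10L", "10L", "5L", "10L", "10L", "3L", "3L", "3L", "3L"]
print(linear_labeling(locs, s=2, m=1))
# (3, ['10L', '10L', '10L', '10L', '10L', '10L', '3L', '3L', '3L', '3L'])
```

With these penalties the stray 5L marker is absorbed into the 10L segment as one mismatch (cost m = 1) rather than opening a segment of its own, which would cost 2s = 4.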

Page 75: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Problem with linear model

s = 2

a-b-c motif: markers a a a b b b c c c, labeled a | b | c, score: 2s = 4

a-b-a motif: markers a a a b b b a a a, labeled a throughout, score: 3m = 3

Page 76: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

The stack model

• Segment at top of the stack can be:

– pushed (remembered), later popped

– replaced

• Push and replace cost s; pop is free.

[Figure: a stack of segment labels]

Page 77: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Scoring

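[Figure: markers uaz265a (7L), isu136 (2L), isu151 (7L), rz509b (7L), cdo59c (7L), rz698c (9L), bcd1087a (9L), rz206b (9L), bcd1088c (9L), csu40 (3S), cdo786a (9L), csu154 (7L), isu113a (7L), csu17 (7L), cdo337 (3L), rz530a (7L), labeled as segments 7L, 9L, 7L. Entering 9L costs s, returning to 7L is a “free” pop, and each of the three markers that match neither segment (isu136, csu40, cdo337) costs m.]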

Page 78: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Dynamic programming

• S[i,j,a] = score for an optimal labeling of:

– the submap from marker i to marker j

– when labeling begins with label a, i.e., marker i is labeled a

[Figure: markers 1 … i … j … n, with marker i labeled a]

Page 79: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Recurrence relation

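• S[i,i,a] = m·δ(a, l_i)

• S[i,j,a] = min of:

– m·δ(a, l_i) + min_{b ∈ L} (S[i+1,j,b] + s·δ(a,b))

– min_{i<k<j} (S[i,k,a] + S[k+1,j,a])

where δ(x, y) = 0 if x = y and 1 otherwise, and L is the set of labels

[Figure: the two cases. Either marker i is labeled a and marker i+1 is labeled b, or the submap is split so that markers i and k+1 are both labeled a]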
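The stack-model recurrence maps naturally onto a memoized recursion. As with the linear model, this is a sketch under assumptions: mismatches cost m, label changes cost s, and the label set is taken from the input. The split case is what makes returning to an earlier label free. The function name is hypothetical.

```python
from functools import lru_cache


def stack_score(locations, s, m):
    """Optimal stack-model score for a list of homolog locations."""
    labels = sorted(set(locations))       # assumed label set L
    n = len(locations)

    @lru_cache(maxsize=None)
    def S(i, j, a):
        mismatch = m * (a != locations[i])
        if i == j:
            return mismatch
        # Case 1: marker i gets label a, marker i+1 gets label b (cost s if b differs).
        best = mismatch + min(S(i + 1, j, b) + s * (a != b) for b in labels)
        # Case 2: split at k; both parts begin with label a, so resuming a is a free pop.
        for k in range(i + 1, j):
            best = min(best, S(i, k, a) + S(k + 1, j, a))
        return best

    return min(S(0, n - 1, a) for a in labels)


print(stack_score(list("aaabbbaaa"), s=2, m=1))   # 2
print(stack_score(list("aaabbbccc"), s=2, m=1))   # 4
```

On the a-b-a motif the stack model pays a single push (s = 2) and pops back to a for free, instead of the three mismatches the linear model settles for.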

Page 80: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Advantage: output similar to experts’

Maize 6 (target),

Rice (base)

Page 81: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Advantage: proposes testable hypotheses

• New relations predicted; higher-resolution maps confirm them

Ahn-Tanksley ‘93 Ahn-Tanksley data Wilson et al. ‘99

Maize 7 (target),

Rice (base)

Page 82: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Advantage: infers evolutionary events

Maize 1 (target)

Rice (base)

Wilson et al.

Stack

Page 83: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Problem: Incomplete input

• Gene order not always fully resolved.

• Co-located genes can be ordered to give most parsimonious labeling.

[Figure: a megalocus of ten co-located markers, all mapped at position 33.0: Atp6b1 (8p), Comp (19), Jak3 (19p), Jund1 (19p), Lpl (8p), Mel (19p), Npy1r (4q), Pde4c (19), Slc18a1 (8p), Srebf1 (17p). Reordered so that the 8p markers (Atp6b1, Lpl, Slc18a1) sit together and the 19/19p markers (Comp, Jak3, Jund1, Mel, Pde4c) sit together, the same markers are labeled with segments 8p and 19p.]

Page 84: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

The reordering algorithm

• Uses a compression scheme

– Within a megalocus, group genes by location of related gene.

– Order these groups

– First, last groups interact with nearby genes

– Any ordering of internal groups is equally parsimonious

Page 85: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

The reordering algorithm

Page 86: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

The reordering algorithm

Page 87: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Definitions

δ extended to the distance to a set A of labels:

δ(a, A) = 0 if a ∈ A, 1 otherwise

l_i = set of labels matching markers in megalocus i

S = set of megalocus start nodes

Page 88: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Definitions

• p(i,a,b) gives the total mismatched marker and segment boundary penalties attributed to “hidden markers”

– i is index for megalocus

– a and b are labels for megalocus ends

– Do any markers in megalocus match a, b?

• No: don’t penalize in recurrence and p(i,a,b)

Page 89: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Recurrence relation

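S[i,i,a] = m·δ(a, l_i)

S[i,j,a] = min of:

– m·δ(a, l_i) + min_{b ∈ L} (S[i+1,j,b] + s·δ(a,b) + p(i,a,b))

– min_{i<k<j, k ∈ S} (S[i,k,a] + S[k+1,j,a])

[Figure: the two cases. Either marker i is labeled a and marker i+1 is labeled b, or the submap is split so that markers i and k+1 are both labeled a]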

Page 90: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Results: Fewer mismatches

stack reordering

Mouse 5 (target)

Human (base)

Page 91: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Results: Mismatches placed between segments

stack reordering

Mouse 8 (target)

Human (base)

Page 92: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Results: Detects new segments

stack reordering

Mouse 13 (target)

Human (base)

Page 93: Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.

Summary

• Global view

• Finds optimal comparative map

– Arranges markers in most parsimonious way

• Biologically meaningful results

• Robust

– not species-specific

– high/low resolution, genetic/physical maps

– stable to errors in marker order