Compressed Genotyping
Yaniv Erlich, Hannon Lab, Cold Spring Harbor Laboratory
3/23/09, erlich@cshl.edu
Sequencing shRNA libraries with DNA Sudoku


Dec 17, 2015



Poster in a nutshell
• Genotyping is the process of determining which genetic variants an individual carries for a certain trait.
• It is one of the main diagnostic tools in medical genetics:
- Finding carriers of rare genetic diseases such as Cystic Fibrosis
- Tissue matching in organ donation
- Forensic DNA analysis
• Until now, only serial genotyping (one specimen at a time) has been possible, which is expensive and tedious.
• Taking advantage of signal sparsity, we developed and tested a compressed genotyping framework.


Abstract

A significant body of knowledge has accumulated in recent years linking subtle genetic variations to a wide variety of medical disorders, from cystic fibrosis to mental retardation. Nevertheless, applying this knowledge routinely in the clinic remains a great challenge, largely because DNA sequencing is a relatively tedious and expensive process. Since the genetic polymorphisms that underlie these disorders are relatively rare in the human population, the presence or absence of a disease-linked polymorphism can be thought of as a sparse signal. Using methods and ideas from compressed sensing and group testing, we have developed a cost-effective reconstruction protocol, called "DNA Sudoku", that recovers the genotype of each specimen from pooled assays. In particular, we have adapted our scheme to a recently developed class of high-throughput DNA sequencing technologies, and assembled a mathematical framework that differs in several important ways from "traditional" compressed sensing in order to address biological and technical constraints.


The genotyping problem

Input: Thousands of specimens

Output: Genotype of each specimen



Genotyping as a sparse graph reconstruction

[Figure: bipartite graph with sample nodes on one side and allele nodes on the other]

An example of a carrier screen for Cystic Fibrosis. There are two allele nodes: the Wild Type (WT) and the Cystic Fibrosis mutation. Samples 1, 2, 3 and 5 are WT, while specimen 4 is a carrier. The specimen labeled "X" is affected and does not enter the screen. Genotyping is equivalent to finding the edges of the graph.

THE GRAPH IS SPARSE
1. The number of carriers is very low.
2. There are no affected individuals.
3. The degree of every sample node is always two (the human genome is diploid).

Genotyping is equivalent to revealing the edges of the bipartite graph.
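Here is a minimal sketch of the bipartite graph as a data structure. It assumes each screened specimen is summarized by its number of mutant copies: 0 for WT and 1 for a carrier. The column order (CF mutation, WT) and the function name `biadjacency` are illustrative choices and do not come from the poster. Row s counts the edges from specimen s to each allele node, so every row sums to two.

```python
def biadjacency(mutant_copies: list[int]) -> list[list[int]]:
    """Specimen-by-allele matrix; columns are (CF mutation, WT) in this sketch.

    Entry [s][a] is the number of edges between specimen s and allele a,
    so every row sums to 2 because the human genome is diploid.
    """
    rows = []
    for m in mutant_copies:
        if m not in (0, 1):
            # Affected specimens (two mutant copies) do not enter the screen.
            raise ValueError("only WT (0) or carrier (1) specimens are screened")
        rows.append([m, 2 - m])
    return rows


# Specimens 1-5 of the carrier-screen example; specimen 4 is the carrier.
X = biadjacency([0, 0, 0, 1, 0])
carriers = [i + 1 for i, (cf, _wt) in enumerate(X) if cf > 0]
print(X)         # [[0, 2], [0, 2], [0, 2], [1, 1], [0, 2]]
print(carriers)  # [4]
```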


The main idea – pooled processing

One could reveal the graph edges by sequencing each sample individually

- expensive, tedious, and slow

Better:

Pool the samples and then sequence the pools


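What the observer wants: the biadjacency matrix of the graph (specimen × allele; allele columns: CF mutation, WT).

The pooling design: a binary matrix (pool × specimen; '1' – in the pool, '0' – otherwise).

What the observer sees: the pooled results (pool × allele).

Mathematically speaking:

$$
\underbrace{\begin{pmatrix} 1 & 0 & 1 & 1 & 1 \\ 1 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 & 1 \end{pmatrix}}_{\text{pooling design (pool} \times \text{specimen)}}
\;
\underbrace{\begin{pmatrix} 0 & 2 \\ 0 & 2 \\ 0 & 2 \\ 1 & 1 \\ 0 & 2 \end{pmatrix}}_{\text{biadjacency matrix (specimen} \times \text{allele)}}
=
\underbrace{\begin{pmatrix} 1 & 7 \\ 1 & 5 \\ 0 & 6 \end{pmatrix}}_{\text{pooled results (pool} \times \text{allele)}}
$$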
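The pooled measurement is an ordinary matrix product. The sketch below reproduces it in plain Python with the numbers from the Cystic Fibrosis example. The allele column order (CF mutation, WT) follows the biadjacency matrix as laid out in the example, and the function name `pool_results` is hypothetical.

```python
def pool_results(design: list[list[int]], biadj: list[list[int]]) -> list[list[int]]:
    """Multiply the binary pooling design (pools x specimens) by the biadjacency
    matrix (specimens x alleles); entry [p][a] counts copies of allele a in pool p."""
    n_specimens = len(biadj)
    n_alleles = len(biadj[0])
    return [
        [sum(design[p][s] * biadj[s][a] for s in range(n_specimens)) for a in range(n_alleles)]
        for p in range(len(design))
    ]


M = [[1, 0, 1, 1, 1],
     [1, 1, 0, 1, 0],
     [1, 1, 0, 0, 1]]
X = [[0, 2], [0, 2], [0, 2], [1, 1], [0, 2]]
print(pool_results(M, X))  # [[1, 7], [1, 5], [0, 6]]
```

You can read the decoding off the result. Pool 3 contains no mutant copy, which clears specimens 1, 2 and 5. In pool 2 that leaves specimen 4 as the only candidate for its single mutant copy, and specimen 4 also accounts for the single mutant copy in pool 1.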


What is a good pooling design?

Attribute: why it matters

Basic compressed sensing demands:
• Decodability: the genotypes can be recovered from the pooled results
• Small number of pools: fewer genotyping assays

Biologically oriented requirements:
• Constant column weight: the robot can pull several specimens at every step
• Low column weight: less robotic effort
• Low row weight: reduces the chance of biological noise

We need a light-weight d-disjunct matrix.
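A binary matrix is d-disjunct when no column is covered by the Boolean OR (union) of any d other columns. This property lets up to d positive specimens be identified unambiguously. The brute-force checker below is a sketch for small matrices only, because its cost grows combinatorially with d. The name `is_d_disjunct` is hypothetical. The second test matrix is the 3 × 5 pooling design from the "Mathematically speaking" example.

```python
from itertools import combinations


def is_d_disjunct(matrix: list[list[int]], d: int) -> bool:
    """Brute force: True if no column is covered by the union of any d other columns."""
    n_cols = len(matrix[0])
    # Represent each column (specimen) by the set of rows (pools) where it has a 1.
    columns = [frozenset(r for r, row in enumerate(matrix) if row[c]) for c in range(n_cols)]
    for c in range(n_cols):
        others = [columns[o] for o in range(n_cols) if o != c]
        for group in combinations(others, d):
            if columns[c] <= frozenset().union(*group):
                return False
    return True


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
slide_design = [[1, 0, 1, 1, 1], [1, 1, 0, 1, 0], [1, 1, 0, 0, 1]]
print(is_d_disjunct(identity, 2))      # True
print(is_d_disjunct(slide_design, 1))  # False: specimen 2's pools are a subset of specimen 1's
```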


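Light Chinese Design

Inputs: N (number of specimens) and W (column weight, i.e., the robotic effort)

Algorithm:
1. Find W numbers $\{x_1, x_2, \ldots, x_W\}$ such that:
(a) each is bigger than $\sqrt{N}$;
(b) they are pairwise coprime.
2. Generate W modular equations:
$$\text{Pool} \equiv \text{Specimen} \pmod{x_1}, \quad \ldots, \quad \text{Pool} \equiv \text{Specimen} \pmod{x_W}$$
3. Construct the pooling matrix from the modular equations: equation w defines a block of $x_w$ pools, and a specimen enters pool r of that block when Specimen ≡ r (mod $x_w$).

Output: Pooling matrix

The algorithm reaches the bound derived by Kautz & Singleton (1964).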
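The construction can be sketched in Python as follows, assuming that the size condition in step 1(a) is $x_i > \sqrt{N}$. The smallest-first greedy choice of moduli is also an assumption of this sketch, and the authors may select them differently. The function names are hypothetical.

The reasoning behind the checks is short. Any two moduli are coprime and multiply to more than N. Two distinct specimens can therefore share a pool in at most one of the W blocks: sharing two blocks would force their index difference to be divisible by a product larger than N. With every column of weight W and every pairwise overlap at most one, the Kautz and Singleton argument makes the matrix (W − 1)-disjunct.

```python
from itertools import combinations
from math import gcd, isqrt


def choose_moduli(n_specimens: int, weight: int) -> list[int]:
    """Pick `weight` pairwise-coprime integers, each strictly greater than sqrt(N).

    Smallest-first greedy selection is an assumption of this sketch.
    """
    moduli: list[int] = []
    candidate = isqrt(n_specimens) + 1  # smallest integer strictly above sqrt(N)
    while len(moduli) < weight:
        if all(gcd(candidate, m) == 1 for m in moduli):
            moduli.append(candidate)
        candidate += 1
    return moduli


def light_chinese_design(n_specimens: int, weight: int) -> list[list[int]]:
    """Binary pooling matrix: rows are pools, columns are specimens.

    Equation w contributes x_w pools; specimen s (0-indexed) enters pool r
    of that block exactly when s % x_w == r.
    """
    matrix: list[list[int]] = []
    for x in choose_moduli(n_specimens, weight):
        for r in range(x):
            matrix.append([1 if s % x == r else 0 for s in range(n_specimens)])
    return matrix


def max_pairwise_overlap(matrix: list[list[int]]) -> int:
    """Largest number of pools shared by any two specimens (columns)."""
    n_specimens = len(matrix[0])
    pools = [{p for p, row in enumerate(matrix) if row[s]} for s in range(n_specimens)]
    return max(len(a & b) for a, b in combinations(pools, 2))


M = light_chinese_design(100, 3)
print(len(M))                         # 36 pools (moduli 11, 12, 13)
print({sum(col) for col in zip(*M)})  # {3}: every specimen sits in exactly W pools
print(max_pairwise_overlap(M))        # 1
```

For the simulation setting (N = 1000, W = 5), this greedy rule returns the moduli 32, 33, 35, 37 and 41, which gives 178 pools. The simulation slide reports 180 pools, so the authors' choice of moduli presumably differs.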


Example of a pooling matrix


Decoding the genotyping results by Belief Propagation

The pooled results can be decoded using Belief Propagation.

[Figure: graph connecting specimens and pools; the decoder combines the genotyping results with a-priori biological information]


03/06/09

Example of Belief Propagation

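[Figure: seven specimens (#1 to #7), each starting with the possible genotypes A, B, C and D, connected to pools whose genotyping results are {D, C, A}, {A, C, B}, {C, B, A} and {C, D, B}; an edge means the specimen is in the pool]

Example messages:
1. Pool to specimen: "You can be either A, C, or D."
2. Specimen to pool: "I can't be B."
3. Pool to specimens: "Specimens #3, #6 and #7: one of you should be B."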
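The message-passing idea in this example can be illustrated with a simplified, hard-decision variant. Each specimen keeps a set of still-possible genotypes instead of a probability distribution. This makes it constraint propagation rather than the probabilistic Belief Propagation decoder described on the poster.

The sketch applies two pool-side rules:
- The first rule mirrors message 1: a specimen can only have a genotype its pool observed.
- The second rule mirrors message 3: if exactly one member can still explain an observed genotype, that member gets it.

Message 2 is implicit, because each pool reads the specimens' current candidate sets. The pools and genotype labels below form a hypothetical toy instance, and `propagate` is an illustrative name.

```python
Pool = tuple[frozenset[int], frozenset[str]]  # (member specimens, genotypes observed in the pool)


def propagate(candidates: dict[int, set[str]], pools: list[Pool]) -> dict[int, set[str]]:
    """Hard-decision message passing between pools and specimens until nothing changes."""
    cands = {s: set(g) for s, g in candidates.items()}  # copy; inputs are not mutated
    changed = True
    while changed:
        changed = False
        for members, observed in pools:
            # Pool -> specimen: "you can only be a genotype this pool observed".
            for s in members:
                narrowed = cands[s] & observed
                if narrowed != cands[s]:
                    cands[s] = narrowed
                    changed = True
            # Pool -> specimens: "one of you must carry genotype g"; if only one
            # member can still be g, that member is assigned g.
            for g in observed:
                holders = [s for s in members if g in cands[s]]
                if len(holders) == 1 and cands[holders[0]] != {g}:
                    cands[holders[0]] = {g}
                    changed = True
    return cands


ALL = {"A", "B", "C", "D"}
toy_pools = [
    (frozenset({1, 2}), frozenset({"A", "B"})),
    (frozenset({2, 3}), frozenset({"A", "B"})),
    (frozenset({1, 4}), frozenset({"A", "C"})),
    (frozenset({3, 4}), frozenset({"A", "C"})),
]
result = propagate({s: ALL for s in (1, 2, 3, 4)}, toy_pools)
print(result)  # {1: {'A'}, 2: {'B'}, 3: {'A'}, 4: {'C'}}
```

Every update strictly shrinks some candidate set, so the loop always terminates. On harder instances, a probabilistic decoder can also weigh the a-priori information (for example, that carriers are rare), which this set-based sketch ignores.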


Simulation results

1000 specimens

W = 5

Total pools = 180

[Figure: results plotted against the number of carriers]


Real results – biotechnology application

40,000 specimens

W = 5

Total pools = 1900
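A quick calculation puts the simulation and the biotechnology application side by side. It assumes that every specimen enters exactly W = 5 pools in both cases, as the constant column weight of the design implies. The matrix then holds N × W ones spread over T pools, so the average pool size (row weight) is N × W / T. The function name `pooling_summary` is hypothetical.

```python
def pooling_summary(n_specimens: int, n_pools: int, weight: int) -> tuple[float, float]:
    """Return (specimens per pooled assay, average number of specimens in each pool)."""
    return n_specimens / n_pools, n_specimens * weight / n_pools


for label, n, t in [("simulation", 1000, 180), ("biotechnology application", 40_000, 1900)]:
    per_assay, row_weight = pooling_summary(n, t, weight=5)
    print(f"{label}: {per_assay:.2f} specimens per assay, about {row_weight:.1f} specimens per pool")
```

The per-assay saving grows with N, from about 5.6 in the simulation to about 21 in the application. The average pool size also grows, and that is what the low-row-weight requirement keeps in check.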


Work in progress


References & Acknowledgments

• Compressed Genotyping. Yaniv Erlich, Assaf Gordon, Michael Brand, Gregory J. Hannon & Partha P. Mitra. Submitted to IEEE Trans. Info. Theory. 2009.

• DNA Sudoku - harnessing high-throughput sequencing for multiplexed specimen analysis. Yaniv Erlich, Kenneth Chang, Assaf Gordon, Roy Ronen, Oron Navon, Michelle Rooks & Gregory J. Hannon. Genome Research. 2009.

Lindsay-Goldberg Fellowship