Near-optimal probabilistic RNA-seq quantification
Bray and Pachter et al., Nature Biotechnology (2016), doi:10.1038/nbt.3519
Saket Choudhary, September 25, 2016
Jul 16, 2018

RNA-Seq Workflow

Zheng and Mortazavi (2012)

Motivation

• First two steps in a typical RNA-Seq processing pipeline:
  • Alignment
  • Quantification
• Alignment is slow, and knowing exactly where each read aligns is probably not that important for quantification

It’s all about compatible transcripts

• Circumvent the alignment step: use information from k-mers
• Pseudoalignment: find the compatible transcripts for a read, without pinpointing exactly where it aligns

Method I

Figure 1: Reads and overlapping transcripts

Method II

Figure 2: de Bruijn Graph

Method III

Figure 3: Transcriptome de Bruijn graph (T-DBG). Node = k-mer, path = transcript
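
The construction in Figure 3 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `kmers` and `build_tdbg`, the toy transcripts, and the choice k = 4 are all assumptions made for the example, and reverse complements are ignored for brevity.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the k-mers of seq in order of occurrence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_tdbg(transcripts, k):
    """Build a colored transcriptome de Bruijn graph (toy version).

    transcripts: dict mapping transcript name -> sequence.
    Returns (colors, edges):
      colors[kmer] is the set of transcripts containing that k-mer,
      edges[kmer] is the set of k-mers that directly follow it in
      at least one transcript.
    """
    colors = defaultdict(set)
    edges = defaultdict(set)
    for name, seq in transcripts.items():
        path = kmers(seq, k)  # each transcript is a path through the graph
        for node in path:
            colors[node].add(name)
        for a, b in zip(path, path[1:]):
            edges[a].add(b)
    return dict(colors), dict(edges)


# Hypothetical toy transcriptome: two transcripts sharing a prefix.
transcripts = {
    "t1": "ACGTTGCA",
    "t2": "ACGTTGGA",
}
colors, edges = build_tdbg(transcripts, k=4)
```

In this toy graph the shared prefix gives nodes colored by both transcripts, and the node `GTTG` is where the two paths branch.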

Method IV

Figure 4: k-mers in read = black nodes

Method V

Figure 5: Nodes can be skipped if their k-mers arose from the blue transcript

Method VI

Figure 6: Intersection of k-compatibility classes
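
The intersection step in Figure 6 might look roughly like the sketch below. The names `build_index` and `pseudoalign`, the toy transcripts, and k = 4 are assumptions for illustration. The sketch also assumes that a k-mer absent from the index makes the read compatible with no transcript; a real tool may treat such k-mers differently.

```python
from collections import defaultdict


def build_index(transcripts, k):
    """Map each k-mer to its k-compatibility class (a set of transcript names)."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return dict(index)


def pseudoalign(read, index, k):
    """Intersect the k-compatibility classes of all k-mers in the read.

    Assumption of this sketch: a k-mer missing from the index is treated
    as compatible with no transcript, so the result becomes empty.
    Reads shorter than k also return the empty set.
    """
    compatible = None
    for i in range(len(read) - k + 1):
        kmer_class = index.get(read[i:i + k], set())
        if compatible is None:
            compatible = set(kmer_class)
        else:
            compatible &= kmer_class
        if not compatible:
            break  # nothing left to intersect
    return compatible if compatible is not None else set()


# Hypothetical toy transcriptome: two transcripts sharing a prefix.
transcripts = {"t1": "ACGTTGCA", "t2": "ACGTTGGA"}
index = build_index(transcripts, k=4)

shared = pseudoalign("ACGTTG", index, k=4)  # read from the shared prefix
unique = pseudoalign("GTTGCA", index, k=4)  # read reaching the t1-only suffix
```

No alignment coordinates are computed anywhere: the result is only the set of transcripts the read could have come from.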

Method VII

Figure 7: Quantification over a cup of coffee

Questions?

notes

• Better than Sailfish, which looks up the k-mers of each read among the k-mers of the transcriptome

• Key idea (pseudoalignment): find the transcripts compatible with a read, not where exactly it aligns

• Pseudoalignment of a read r: the subset S ⊆ T of transcripts with which r is compatible

• Hash the k-mers of the reads and keep a de Bruijn graph of the transcriptome assembly handy

• T-DBG: nodes are k-mers, each transcript corresponds to a path, and the path cover induces a k-compatibility class for each k-mer

• T-DBG: colors correspond to transcripts and nodes to k-mers; every k-mer receives a color for each transcript it occurs in

• A hash table stores the mapping from each k-mer to the contig that contains it
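
The set notation in these notes can be written out more explicitly. The symbols K(r) for the set of k-mers of read r and C(κ) for the k-compatibility class of a k-mer κ are introduced here only for illustration; T is the set of transcripts as above.

```latex
\[
C(\kappa) = \{\, t \in T : \kappa \text{ occurs in } t \,\},
\qquad
S(r) = \bigcap_{\kappa \in K(r)} C(\kappa) \subseteq T
\]
```

Read this way, the pseudoalignment S(r) is the intersection of the colors of the T-DBG nodes visited by the read, which is why no alignment position is needed.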
