The spectra of somatic mutations across many tumor types

Mike Lawrence, Broad Institute of Harvard and MIT

1st Annual TCGA Scientific Symposium, November 17, 2011

Page 2

mutation rates across cancer

[Figure: somatic mutation rates across cancer types, with groups labeled Carcinogens, HPV, Hematologic, Childhood, and ?? (cause unknown)]

Page 4

OV (ovarian serous cystadenocarcinoma)

[Figure: mutation spectrum; x-axis: mutation type (C→T, C→A, C→G, A→G, A→T, A→C); y-axis: mutation rate (per million sites)]

Page 5

GBM (glioblastoma multiforme)

[Figure: mutation spectrum; x-axis: mutation type (C→T, C→A, C→G, A→G, A→T, A→C); y-axis: mutation rate (per million sites)]

Page 6

LUSC (lung squamous cell carcinoma)

[Figure: mutation spectrum by mutation type (C→T, C→A, C→G, A→G, A→T, A→C)]

Page 7

LUAD (lung adenocarcinoma)

[Figure: mutation spectrum by mutation type (C→T, C→A, C→G, A→G, A→T, A→C)]

Page 8

Melanoma

[Figure: mutation spectrum by mutation type (C→T, C→A, C→G, A→G, A→T, A→C)]

Page 9

Cervical

[Figure: mutation spectrum by mutation type (C→T, C→A, C→G, A→G, A→T, A→C)]

Page 10

Bladder

[Figure: mutation spectrum by mutation type (C→T, C→A, C→G, A→G, A→T, A→C)]

Page 12

total rate

[Figure: total mutation rate, log-scale axis with ticks at 0.1/Mb, 1/Mb, 10/Mb and 100/Mb]

Page 13

[Figure axes: total rate; type of spectrum]

Page 14

[Figure: tumor types labeled Lung, Melanoma, Head&Neck (HPV), Bladder (viral?), Esophageal, Gastric, Colorectal, GBM, Kidney, plus an HPV label]

Page 15

[Figure: tumor types labeled H&N, Bladder, GBM, Esophageal, Gastric, Colorectal, Melanoma, Lung, Kidney]

Page 16

finding significantly mutated genes

Page 18

MutSig scoring algorithm, version 0

Assume the background mutation rate is:
· uniform across sequence contexts
· uniform across patients
· uniform across genes

[Diagram: tally of mutations (patients × genes) → scoring algorithm → significance]
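A minimal sketch of what version 0 implies, assuming a one-sided Poisson test per gene and Benjamini-Hochberg q-values. The slides don't say which test MutSig uses, and the helper names (`poisson_sf`, `bh_qvalues`, `mutsig_v0`), gene names and counts are all hypothetical. Under version 0 a gene's expected count is just one shared rate times its coding length times the number of patients.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    if lam <= 0.0:
        return 0.0
    if k <= lam:
        # Upper tail is large here, so 1 - CDF loses no meaningful precision.
        cdf = sum(math.exp(i * math.log(lam) - lam - math.lgamma(i + 1)) for i in range(k))
        return max(0.0, 1.0 - cdf)
    # k > lam: terms shrink monotonically, so sum the tail directly to keep tiny p-values.
    total, i = 0.0, k
    while True:
        term = math.exp(i * math.log(lam) - lam - math.lgamma(i + 1))
        total += term
        if term == 0.0 or term < total * 1e-16:
            break
        i += 1
    return min(1.0, total)


def bh_qvalues(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg q-values, returned in the same order as pvals."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [1.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


def mutsig_v0(counts: dict[str, int], coding_bp: dict[str, int],
              n_patients: int, rate_per_mb: float) -> dict[str, tuple[float, float]]:
    """Test every gene against one background rate shared by all contexts, patients and genes."""
    mu = rate_per_mb / 1e6  # mutations per base per patient
    genes = sorted(counts)
    pvals = [poisson_sf(counts[g], mu * coding_bp[g] * n_patients) for g in genes]
    return {g: (p, q) for g, p, q in zip(genes, pvals, bh_qvalues(pvals))}


# Hypothetical tallies for two genes of equal length in 457 patients at 10/Mb.
counts = {"GENE_A": 40, "GENE_B": 5}
coding_bp = {"GENE_A": 1000, "GENE_B": 1000}
result = mutsig_v0(counts, coding_bp, n_patients=457, rate_per_mb=10.0)
for gene, (p, q) in result.items():
    print(f"{gene}: p={p:.2e} q={q:.2e}")
```

Because the expectation ignores context, patient and gene, any gene whose true background rate sits above the shared rate will tend to look significant.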

Page 20

MutSig scoring algorithm, version 1

Assume the background mutation rate is:
· variable across sequence contexts
· uniform across patients
· uniform across genes

[Figure: genes W, X, Y, Z in melanoma patients, with mutations split into C→T (UV-induced) and A→T]
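Version 1 only changes how the expected count is computed: the gene's territory is split into mutation categories, each with its own rate. A minimal sketch, assuming just the two categories named on the slide (C→T and A→T; real category schemes are finer), hypothetical rates, and a hypothetical helper `expected_v1`. The resulting expectation would feed the same kind of test as in version 0.

```python
def expected_v1(bases_at_risk: dict[str, int], rates_per_mb: dict[str, float],
                n_patients: int) -> float:
    """Expected background mutations in one gene when the rate depends on category.

    bases_at_risk: coding bases in this gene that can undergo each category of mutation
    rates_per_mb:  background rate for each category (per Mb, per patient)
    """
    per_patient = sum(n * rates_per_mb[c] / 1e6 for c, n in bases_at_risk.items())
    return n_patients * per_patient


# Hypothetical melanoma-like rates: UV-induced C->T far more common than A->T.
rates = {"C->T": 30.0, "A->T": 1.0}
gene_w = {"C->T": 800, "A->T": 200}   # C/G-rich gene
gene_x = {"C->T": 200, "A->T": 800}   # A/T-rich gene, same total length
print(expected_v1(gene_w, rates, n_patients=100))
print(expected_v1(gene_x, rates, n_patients=100))
```

Version 0 would give these two equal-length genes the same expectation; version 1 expects more than three times as many mutations in the C/G-rich gene W when C→T dominates, as it does in UV-exposed melanoma.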

Page 22

MutSig scoring algorithm, version 2

Assume the background mutation rate is:
· variable across sequence contexts
· variable across patients
· uniform across genes

[Figure: genes A, B, C, D in patient 1 (low mutation rate) and patient 2 (high mutation rate)]
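Version 2 additionally lets each patient have their own category rates, so the expectation becomes a sum over patients. A minimal sketch with a hypothetical helper `expected_v2` and made-up rates for one low- and one high-mutation-rate patient; how MutSig combines the per-patient terms into a single score is not shown on the slides.

```python
def expected_v2(bases_at_risk: dict[str, int],
                patient_rates_per_mb: dict[str, dict[str, float]]) -> dict[str, float]:
    """Expected background mutations in one gene, per patient.

    patient_rates_per_mb[p][c] is patient p's background rate for category c (per Mb).
    """
    return {
        p: sum(n * rates[c] / 1e6 for c, n in bases_at_risk.items())
        for p, rates in patient_rates_per_mb.items()
    }


gene = {"C->T": 500, "A->T": 500}                  # hypothetical gene
patient_rates = {
    "patient 1": {"C->T": 1.0, "A->T": 0.5},       # low mutation rate
    "patient 2": {"C->T": 50.0, "A->T": 5.0},      # high mutation rate
}
per_patient = expected_v2(gene, patient_rates)
total = sum(per_patient.values())
print(per_patient, total)
```

Here a mutation in patient 1 is about 37 times less expected than one in patient 2 (0.00075 vs. 0.0275), so it should count as stronger evidence; a single pooled rate would treat the two alike.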

Page 23

MutSig scoring algorithm, version 2: the background rate is assumed to vary across sequence contexts and patients, but to be uniform across genes.

[Figure: genes J, K, L, M]

Problem: mutation rate is heterogeneous across genes

Page 29

MutSig scoring algorithm, version 2
Problem: mutation rate is heterogeneous across genes

Model: version 2 assumes the background rate varies across sequence contexts and patients but is uniform across genes, so every gene gets the average rate, 3/Mb; genes at q<0.01 are called hits.
Reality: the average is still 3/Mb, but 75% of genes mutate at 2/Mb and 25% at 6/Mb (0.75 × 2/Mb + 0.25 × 6/Mb = 3/Mb).
→ the fast-mutating 25% are tested against half their true rate, so many of them pass q<0.01: false positives.
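The false-positive problem can be checked numerically. A minimal sketch, assuming a Poisson background, a hypothetical per-gene territory of 0.75 Mb (1,500 coding bp × 500 patients), and a fixed per-gene threshold standing in for the q<0.01 cut-off; only the 2/Mb vs. 6/Mb rates and the 75%/25% split come from the slide.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam); 1 - CDF is accurate enough for these small k."""
    return 1.0 - sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))


def critical_count(lam_null: float, alpha: float) -> int:
    """Smallest mutation count whose tail probability under the null is below alpha."""
    k = 0
    while poisson_sf(k, lam_null) >= alpha:
        k += 1
    return k


TERRITORY_MB = 0.75   # hypothetical: 1,500 coding bp x 500 patients
ALPHA = 1e-3          # hypothetical per-gene threshold
classes = {2.0: 0.75, 6.0: 0.25}   # true rate (/Mb) -> fraction of genes, as on the slide

avg_rate = sum(rate * frac for rate, frac in classes.items())  # 0.75*2 + 0.25*6 = 3.0 /Mb
k_star = critical_count(avg_rate * TERRITORY_MB, ALPHA)

# What the uniform model promises vs. what happens when rates actually differ by gene.
nominal_fp = poisson_sf(k_star, avg_rate * TERRITORY_MB)
actual_fp = sum(frac * poisson_sf(k_star, rate * TERRITORY_MB) for rate, frac in classes.items())
print(f"average rate = {avg_rate}/Mb, genes need >= {k_star} mutations to be called")
print(f"nominal false-positive rate = {nominal_fp:.1e} per gene, actual = {actual_fp:.1e}")
```

Under these assumptions the 6/Mb genes cross the threshold far more often than the uniform model promises, even though the genome-wide average is exactly right.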

Page 30

Lung cancer: 457 patients (180 lung squamous cell carcinoma, 277 lung adenocarcinoma); average mutation rate = 10/Mb

MutSig version 0 results (assuming a uniform background mutation rate across genes):

#1 TP53*, #2 KRAS*, #7 OR4A15, #13 KEAP1*, #14 OR8H2, #15 STK11*, #17 OR2T4, #25 OR2T3, #31 OR2T6, #48 CSMD3, #49 OR5D16, #55 RYR2, #100 CSMD1, #139 PIK3CA*, #158 RYR3, #159 MUC16, #161 OR2T33, #169 NFE2L2*, #172 OR10G8, #180 OR2L8, #198 MUC17, #217 TTN

Total of 843 genes significantly mutated (q<0.01); all of these genes are extremely significant (q<10^-7).

* known lung cancer genes

Bryan Hernandez, Peter Hammerman, Marcin Imielinski, Matthew Meyerson

Page 31

Lung cancer, MutSig version 0 (843 genes at q<0.01), with the "fishy" genes annotated:
· olfactory receptors (OR genes): 146 with q<0.01
· titin (TTN): largest human protein, roughly 100x bigger than p53; 34,350 amino acids, ~100 kb coding sequence
· mucins (MUC16, MUC17): gel-forming proteins
· "CUB and Sushi" proteins (CSMD1, CSMD3): reported to be tumor suppressors, but significantly mutated in almost every tumor type (including TCGA ovarian)
· ryanodine receptors (RYR2, RYR3): calcium-release channels (RYR2 is the cardiac form)

Page 32

MutSig scoring algorithm, version 2

Assume the background mutation rate is:
· variable across sequence contexts
· variable across patients
· uniform across genes

Problem: mutation rate is actually heterogeneous across genes.

Challenge: predict gene-specific background mutation rates.

We eventually want to learn the background mutation rate of every gene (and of every possible mutation at every base pair!). As we sequence more and more samples, we get closer to this goal.

Page 33

Highly expressed genes have lower mutation rates

[Figure: mutation rate (/Mb, axis 0 to 3) versus expression (low → high)]

Chapman et al. Nature (2011)

Page 35

Early-replicating genes have lower mutation rates

[Figure: chr10, mutations/Mb and replication time (early to late) versus position (Mb)]

Background mutation rate varies ten-fold or more across the genome (shown: noncoding mutation rate from the TCGA lung cancer dataset).

Replication time also varies greatly across the genome (shown: replication time measurements from Chen et al. (2010) Genome Research 20:447).

The two are highly correlated.

Sunyaev Lab (Harvard/BWH); Stamatoyannopoulos et al. (2009) Nat. Gen.

Page 36

Late replication explains most olfactory receptors

[Figure: chr1 (Mb), mutations/Mb and replication time (early to late), with 16 ORs marked; second panel: Early vs. Late replication for All Genes and for Olfactory Receptors]

Page 37

[Figure: chr8, mutations/Mb and replication time (early to late) versus position (Mb); second panel: mutations/Mb versus replication time, with CSMD3 marked]

CSMD3: extrapolate even later replication times

Page 38

initial model assumed a flat mutational landscape

[Figure: mutation rate landscape with genes A and B]

Page 39

the landscape is actually not flat

[Figure: mutation rate landscape with genes A and B]

Page 40

improve the estimate by binning together similar genes...

[Figure: mutation rate landscape with genes A and B]
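Binning can be sketched as: group genes with similar covariates (here expression and replication time, the two covariates shown earlier), then give every gene in a bin the bin's pooled rate. A minimal sketch under assumptions: equal-count bins by rank, hypothetical helper names and gene data, and a `mutations` field that should hold background mutations (for example silent or noncoding ones) so that true driver signal doesn't inflate the estimate.

```python
from collections import defaultdict


def quantile_bins(values: list[float], n_bins: int) -> list[int]:
    """Assign each value to one of n_bins equal-count bins by rank (0 = lowest)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def binned_background(genes: list[dict], n_bins: int = 2) -> dict[str, float]:
    """Give each gene the pooled mutation rate (/Mb) of the genes in its covariate bin."""
    expr_bin = quantile_bins([g["expression"] for g in genes], n_bins)
    rep_bin = quantile_bins([g["replication"] for g in genes], n_bins)
    keys = list(zip(expr_bin, rep_bin))
    muts, terr = defaultdict(int), defaultdict(float)
    for g, key in zip(genes, keys):
        muts[key] += g["mutations"]
        terr[key] += g["territory_mb"]
    return {g["name"]: muts[key] / terr[key] for g, key in zip(genes, keys)}


# Hypothetical genes; a higher "replication" value means later replicating.
genes = [
    {"name": "G1", "expression": 9.0, "replication": 0.1, "mutations": 2, "territory_mb": 1.0},
    {"name": "G2", "expression": 8.0, "replication": 0.2, "mutations": 4, "territory_mb": 1.0},
    {"name": "G3", "expression": 1.0, "replication": 0.9, "mutations": 10, "territory_mb": 1.0},
    {"name": "G4", "expression": 2.0, "replication": 0.8, "mutations": 6, "territory_mb": 1.0},
]
print(binned_background(genes))
```

The early-replicating, highly expressed genes share a low pooled rate and the late, lowly expressed genes share a high one; the cost is that genes near a bin boundary are pooled with neighbors that may differ more than genes just across it.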

Page 41

...or by local regression: average outward from the gene until the neighborhood becomes too different from the starting point

[Figure: mutation rate landscape with genes A and B]
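Local regression avoids fixed bin boundaries. A minimal sketch of the "average outward" idea: sort the other genes by distance from the starting gene in covariate space and pool them in that order until a neighbor looks too different. The stopping rule (a fold-change limit on each neighbor's own rate), the pseudocount, and all names and numbers are assumptions for illustration, not MutSig's implementation.

```python
import math


def local_rate(target: dict, genes: list[dict], max_neighbors: int = 50,
               max_fold: float = 2.0, pseudo: float = 0.5) -> float:
    """Pool a gene with its nearest neighbors in covariate space; returns mutations/Mb.

    Neighbors are added in order of distance; we stop at the first neighbor whose own
    rate differs from the starting gene's by more than max_fold (hypothetical rule).
    """
    def rate(g: dict) -> float:
        # pseudocount keeps a gene with zero mutations from having a zero rate
        return (g["mutations"] + pseudo) / g["territory_mb"]

    start = rate(target)
    others = sorted((g for g in genes if g is not target),
                    key=lambda g: math.dist(target["covariates"], g["covariates"]))
    muts, terr = target["mutations"], target["territory_mb"]
    for g in others[:max_neighbors]:
        fold = rate(g) / start
        if fold > max_fold or fold < 1.0 / max_fold:
            break  # neighborhood has become too different from the starting point
        muts += g["mutations"]
        terr += g["territory_mb"]
    return muts / terr


# Hypothetical genes; covariates could be (expression, replication time), standardized.
genes = [
    {"name": "A", "covariates": (0.0, 0.0), "mutations": 3, "territory_mb": 1.0},
    {"name": "B", "covariates": (0.1, 0.0), "mutations": 4, "territory_mb": 1.0},
    {"name": "C", "covariates": (0.2, 0.1), "mutations": 2, "territory_mb": 1.0},
    {"name": "D", "covariates": (1.0, 1.0), "mutations": 20, "territory_mb": 1.0},
    {"name": "E", "covariates": (1.1, 1.0), "mutations": 3, "territory_mb": 1.0},
]
print(local_rate(genes[0], genes))  # pools A, B and C, then stops at D
```

Unlike a fixed bin, each gene gets its own neighborhood, so gene A is never averaged with the distant, fast-mutating gene D.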

Page 43

Lung cancer

MutSig v0, assuming a uniform background mutation rate across all genes: 843 genes significantly mutated (q<0.01), top genes at q<10^-7
#1 TP53*, #2 KRAS*, #7 OR4A15, #13 KEAP1*, #14 OR8H2, #15 STK11*, #17 OR2T4, #25 OR2T3, #31 OR2T6, #48 CSMD3, #49 OR5D16, #55 RYR2, #100 CSMD1, #139 PIK3CA*, #158 RYR3, #159 MUC16, #161 OR2T33, #169 NFE2L2*, #172 OR10G8, #180 OR2L8, #198 MUC17, #217 TTN

Improved MutSig, using gene-specific background mutation rates: 52 genes significantly mutated (q<0.01)
Known lung cancer genes (q<10^-5): STK11* #1, NFE2L2* #4, TP53* #7, KRAS* #8, KEAP1* #11, PIK3CA* #12
"Fishy" genes drop far down the list: OR8H2 #181 (most significant olfactory receptor), OR5T2 #276, OR10J3 #334, CSMD3 #388, MUC17 #2614, RYR2 #2898, CSMD1 #4482, TTN #4825, MUC16 #5650, RYR3 #11496
[q-value markers along the improved ranking: q~0.2 and q=1]

* known lung cancer genes

Page 44

Correcting for variation in mutation rate

Significantly mutated genes before and after correction (number of olfactory receptors, OR, in parentheses):

Tumor type      Before        After
Lung Squamous   261 (50 OR)   18 (0 OR)
Lung Adeno      511 (93 OR)   33 (1 OR)
Melanoma        177 (7 OR)    61 (0 OR)
Prostate        3 (0 OR)      3 (0 OR)
DLBCL           32 (1 OR)     15 (0 OR)

Ultimate solution: learn the background rate

Page 45

putting it all together

Page 46

$$\mu_{g,p,c} = \mu_0 \cdot F_{g,p,c} = \mu_0 \cdot F_g \cdot F_p \cdot \left( \sum_k w_{p,k}\, v_{k,c} \right)$$

where
· $\mu_{g,p,c}$: mutation rate in gene g, patient p, category c
· $F_{g,p,c}$: relative mutation rate in gene g, patient p, category c
· $\mu_0$: overall mutation rate across the entire dataset
· $F_g$: relative mutation rate of gene g
· $F_p$: relative mutation rate of patient p
· $\sum_k$: sum across the k "factors" (i.e., mutational processes)
· $w_{p,k}$: weight of factor k in patient p
· $v_{k,c}$: contribution of factor k to mutation category c

[Figure panels: patient 1, patient 2]
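The factorization can be evaluated directly once its parameters are known. The slides don't show how µ0, F_g, F_p, w and v are estimated, so this minimal sketch uses toy values and a hypothetical helper `mu_gpc`, with two mutation categories (C→T, A→T) and two factors.

```python
def mu_gpc(mu0: float, f_gene: float, f_patient: float,
           weights: list[float], factors: list[list[float]], category: int) -> float:
    """mu_{g,p,c} = mu0 * F_g * F_p * sum_k w_{p,k} * v_{k,c}."""
    mix = sum(w_pk * v_k[category] for w_pk, v_k in zip(weights, factors))
    return mu0 * f_gene * f_patient * mix


# Hypothetical numbers; categories are indexed [0: C->T, 1: A->T].
mu0 = 2.0                          # overall rate (/Mb)
f_gene = 1.5                       # this gene mutates 1.5x the average
v = [[1.6, 0.4],                   # factor 0: a C->T-heavy process
     [1.0, 1.0]]                   # factor 1: a flat process
patients = {
    "patient 1": (0.5, [0.0, 1.0]),    # (F_p, w_{p,k})
    "patient 2": (2.0, [0.75, 0.25]),
}
for name, (f_p, w_p) in patients.items():
    rates = [mu_gpc(mu0, f_gene, f_p, w_p, v, c) for c in range(2)]
    print(name, [round(r, 3) for r in rates])
```

Setting every F_g to 1 recovers version 2, in which the rate varies across patients and categories but not across genes.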

Page 47

significantly mutated genes across tumor types

[Figure: genes plotted against tumor types; labeled genes: TP53, KRAS; NRAS, PTEN, PIK3CA, IDH1, EGFR; ATM, BRAF, SF3B1, MYD88, NFE2L2, TNFRSF14; CREBBP]

Page 48

Acknowledgements

MutSig Team: Petar Stojanov, Bryan Hernandez, Marcin Imielinski, Peter Hammerman, Gregory Kryukov, Eran Hodis, Chip Stewart

Analysis Team: Kristian Cibulskis, Andrey Sivachenko, Scott Carter, Gordon Saksena, Yotam Drier, Alex Ramos, Aaron McKenna, Rui Jing, Lihua Zou, David DeLuca, Elena Helman, Jaegil Kim, Cheng-Zhong Zhang, Sylvan Baca, Trevor Pugh, Nam Pho, Andrew Cherniak, Alex Kostic, Peter Chen, Nikhil Wagle

Broad Institute Sequencing Program and Platform: Tim Fennell, Sara Chauvin, Lauren Ambrogio, Sheila Fisher, Joshua Levin, Xian Adiconis, Andreas Gnirke, David Jaffe, Toby Bloom, Chad Nusbaum

Broad Institute Biological Genetic Analysis Platform: Robb Onofrio, Brendan Blumenstiel, Huy Nguyen, Mellisa Parkin, Wendy Winckler
Broad Institute Biological Samples Platform: Kristin Ardlie

Stacey Gabriel

Levi Garraway

Lynda Chin Broad Institute of Harvard and MIT

Dana Farber Cancer Institute

NHGRI NCI

Project Management Carrie Sougnez Erica Shefler Elizabeth Nickerson Daniel Auclair Marisa Cortes Kristin Thompson

Jim Robinson Helga Thorvaldsdottir Marc-Danie Nazaire Jill Mesirov

Mark DePristo Eric Banks Kiran Garimella

Firehose Doug Voet Mike Noble Pei Lin Dan DiCara Lee Lichtenstein Robert Zupko Peter Carr

Rameen Beroukhim Steve Schumacher Barbara Tabak

Cathy Wu Jennifer Brown Lili Wang Youzhong Wan Dan-Avi Landau Aviv Regev Nathalie Pochet

Matthew Meyerson

Eric Lander

Todd Golub

Gaddy Getz

Adam Bass Austin Dulak

(cont.) Mike Berger Mike Chapman Jens Lohr Derek Chiang Craig Mermel Nicolas Stransky Roel Verhaak Barbara Weir Shantanu Banerji

Shamil Sunyaev Paz Polak Leonid Mirny