Lecture 23. Genomic Futures Michael Schatz April 20, 2020 JHU 600.749: Applied Comparative Genomics
Page 1: Lecture 23. Genomic Futures - Schatzlab

Lecture 23. Genomic Futures
Michael Schatz

April 20, 2020
JHU 600.749: Applied Comparative Genomics

Page 4: Lecture 23. Genomic Futures - Schatzlab

Part I. Metagenomics

Page 5: Lecture 23. Genomic Futures - Schatzlab

Your second genome?

Are We Really Vastly Outnumbered? Revisiting the Ratio of Bacterial to Host Cells in Humans
Sender et al (2016) Cell. http://doi.org/10.1016/j.cell.2016.01.013

Human body: ~10 trillion cells

Human brain: ~3.3 lbs

Microbiome: ~100 trillion cells

Total mass: ~3.3 lbs

Page 6: Lecture 23. Genomic Futures - Schatzlab

Pre-PCR: Gram-Staining

Gram staining differentiates bacteria by the chemical and physical properties of their cell walls by detecting peptidoglycan, which forms a thick layer in the cell wall of Gram-positive bacteria.

Page 7: Lecture 23. Genomic Futures - Schatzlab

16S rRNA

The 16S rRNA gene is a section of prokaryotic DNA found in all bacteria and archaea. This gene codes for an rRNA, and this rRNA in turn makes up part of the ribosome.

The 16S rRNA gene is a commonly used tool for identifying bacteria for several reasons. First, traditional characterization depended upon phenotypic traits like Gram-positive or Gram-negative, bacillus or coccus, etc. Taxonomists today consider analysis of an organism's DNA more reliable than classification based solely on phenotypes. Secondly, researchers may, for a number of reasons, want to identify or classify only the bacteria within a given environmental or medical sample. Thirdly, the 16S rRNA gene is relatively short at 1.5 kb, making it faster and cheaper to sequence than many other unique bacterial genes.

http://greengenes.lbl.gov/cgi-bin/JD_Tutorial/nph-16S.cgi

Page 11: Lecture 23. Genomic Futures - Schatzlab

16S versus shotgun NGS

16S
• Fast (minutes – hours)
• Directed analysis
• Cheap per sample
• Family/Genus Identification

NGS
• Slower (hours to days)
• Whole Metagenome
• More expensive per sample
• Species/Strain Identification
• Genes presence/absence
• Variant analysis
• Eukaryotic hosts
• Can ID fungi, viruses, etc.

Page 12: Lecture 23. Genomic Futures - Schatzlab

Kraken

Kraken: ultrafast metagenomic sequence classification using exact alignments
Wood and Salzberg (2014) Genome Biology. DOI: 10.1186/gb-2014-15-3-r46
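
The "exact alignments" in the title refer to exact k-mer matches against a taxonomy-labelled index. The following is a toy sketch of that idea, not Kraken's implementation: the taxonomy, genome strings, and function names are all hypothetical, and it omits reverse complements, Kraken's compact database layout, and its tie-breaking rule. Each k-mer is stored with the lowest common ancestor (LCA) of the genomes that contain it, and a read is assigned to the hit taxon whose root-to-taxon path collects the most k-mer hits.

```python
from collections import Counter

# Hypothetical toy taxonomy: child -> parent (the root maps to None).
PARENT = {
    "root": None,
    "Bacteria": "root",
    "Listeria": "Bacteria",
    "L. monocytogenes": "Listeria",
    "Escherichia": "Bacteria",
    "E. coli": "Escherichia",
}

def lineage(taxon, parent=PARENT):
    """Return the path from a taxon up to the root (taxon first)."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = parent[taxon]
    return path

def lca(a, b, parent=PARENT):
    """Lowest common ancestor of two taxa."""
    ancestors = set(lineage(a, parent))
    for taxon in lineage(b, parent):
        if taxon in ancestors:
            return taxon
    raise ValueError("taxa do not share a root")

def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))

def build_index(genomes, k, parent=PARENT):
    """Map each k-mer to the LCA of all genomes that contain it."""
    index = {}
    for taxon, seq in genomes.items():
        for km in kmers(seq, k):
            index[km] = lca(index[km], taxon, parent) if km in index else taxon
    return index

def classify(read, index, k, parent=PARENT):
    """Assign a read to the hit taxon with the heaviest root-to-taxon path."""
    hits = Counter(index[km] for km in kmers(read, k) if km in index)
    if not hits:
        return None  # no k-mer matched: read stays unclassified

    def path_score(taxon):
        return sum(hits[t] for t in lineage(taxon, parent))

    # Ties are broken by first occurrence here (a simplification).
    return max(hits, key=path_score)

# Made-up sequences standing in for reference genomes.
genomes = {"L. monocytogenes": "ACGTACGGTTCA", "E. coli": "TTGACCATGGAC"}
index = build_index(genomes, k=4)
print(classify("GTACGGTT", index, k=4))
```

Because lookups are exact dictionary hits rather than alignments, classification time grows with the number of k-mers in the read, not with the size of the reference collection.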

Page 13: Lecture 23. Genomic Futures - Schatzlab

Global Ocean Survey

Page 14: Lecture 23. Genomic Futures - Schatzlab

Metasub

Geospatial Resolution of Human and Bacterial Diversity with City-Scale Metagenomics
Afshinnekoo et al (2015) Cell Systems. http://dx.doi.org/10.1016/j.cels.2015.01.001

Page 16: Lecture 23. Genomic Futures - Schatzlab

Microbes and Human Health

“MICROBE DIET Mice fed microbes from obese people tend to gain fat. Microbes from lean people protect mice from excessive weight gain, even when animals eat a high-fat, low-fiber diet.”

Gut Microbiota from Twins Discordant for Obesity Modulate Metabolism in Mice
Ridaura et al (2013) Science. doi: 10.1126/science.1241214

Page 17: Lecture 23. Genomic Futures - Schatzlab

Microbes and Human Health

The human microbiome: at the interface of health and disease
Cho & Blaser (2012) Nature Reviews Genetics. doi:10.1038/nrg3182

Page 18: Lecture 23. Genomic Futures - Schatzlab

Human Microbiome Project

Structure, function and diversity of the healthy human microbiome
The Human Microbiome Project Consortium (2012) Nature. doi:10.1038/nature11234

Page 19: Lecture 23. Genomic Futures - Schatzlab

Functional composition tends to be more stable than genome composition

Structure, function and diversity of the healthy human microbiome
The Human Microbiome Project Consortium (2012) Nature. doi:10.1038/nature11234

Page 20: Lecture 23. Genomic Futures - Schatzlab

[Figure: plot of Relative.Abundance (0 to 100) by sample, colored by species; samples are labeled B, H, or U followed by a time point from 0 to 48 in steps of 4.]
Species legend: Listeria monocytogenes, Anoxybacillus flavithermus, Thermus parvatiensis, Thermus thermophilus, Geobacillus stearothermophilus, Vibrio alginolyticus, StaphEpidermidis_d101_6055 Branch, Pseudomonas fulva, StaphEpidermidis_d99_6057 Branch, Enterococcus faecium, Pseudomonas sp. URMO17WK12:I11, Firmicutes bacterium JGI 0000112−M16, Vibrio antiquarius, Pasteurella multocida, Escherichia coli, Streptococcus_2055 Branch, Streptococcus thermophilus, Enterococcus faecalis, StaphEpidermidis_d100_6056 Branch, Staphylococcus aureus, Clostridium perfringens, Geobacillus_12818 Branch, Firmicutes bacterium JGI 0000112−P22, Enterococcus sp. GMD5E, others

Listeria in ice cream

Page 21: Lecture 23. Genomic Futures - Schatzlab

Amerithrax Analysis

Bacillus anthracis comparative genome analysis in support of the Amerithrax investigation
Rasko et al (2011) PNAS. doi: 10.1073/pnas.1016657108

Page 22: Lecture 23. Genomic Futures - Schatzlab

Diagnosing Brain Infections with NGS

Next-generation sequencing in neuropathologic diagnosis of infections of the nervous system
Salzberg et al (2016) Neurol Neuroimmunol Neuroinflamm. dx.doi.org/10.1212/NXI.0000000000000251

Page 24: Lecture 23. Genomic Futures - Schatzlab

The Future of Metagenomics

• Applications:
– WGS metagenomics in the clinic for anaerobic infections and high risk patients (NICU etc.)
– Surveillance: bioterror agents and epidemiology

• Methods:
– Single cell, Hi-C, and long read sequencing
– Computational challenges
    • Species level binning of large datasets
    • Plasmid analysis (antimicrobial resistance genes)
    • Going from associations to specific mechanisms
    • Functional analysis

Page 25: Lecture 23. Genomic Futures - Schatzlab

Part II. Genetic Privacy

Page 27: Lecture 23. Genomic Futures - Schatzlab

What are microsatellites?
• Tandemly repeated sequence motifs
– Motifs are 1 – 6 nt long
– So far, min. 8 nt length, min. 3 tandem repeats for our analyses
• Ubiquitous in human genome
– >5.7 million uninterrupted microsatellites in hg19
• Extremely unstable
– Mutation rate thought to be ~10^-3 per generation in humans
• Unique mutation mechanism
– Replication slippage during mitosis and meiosis
• May be under neutral selection

cCTCTCTCTCTCTCTCTCTCTCTCTCa → (CT)13

tTTGTCTTGTCTTGTCTTGTCTTGTCTTGTCc → (TTGTC)6

tCAACAACAACAACAACAACAAa → (CAA)7

cCATTCATTCATTCATTa → (CATT)4

Microsatellites: Simple Sequences with Complex Evolution
Ellegren (2004) Nature Reviews Genetics. doi:10.1038/nrg1348
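
The thresholds on this slide (motifs of 1 to 6 nt, at least 8 nt total, at least 3 tandem copies) can be turned into a small scanner. This is a minimal sketch under those stated thresholds, not the pipeline used for the hg19 count; the function name and parameters are illustrative, and the input is upper-cased, so the lowercase flanking bases in the slide's examples are treated like any other base.

```python
def find_microsatellites(seq, max_motif=6, min_len=8, min_copies=3):
    """Return (start, motif, copies) for each uninterrupted tandem repeat.

    Scans left to right; at each position the shortest motif that meets
    both thresholds wins, so reported motifs are never themselves repeats
    of a shorter unit.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        found = None
        for k in range(1, max_motif + 1):
            motif = seq[i:i + k]
            if len(motif) < k:
                break  # ran off the end of the sequence
            copies = 1
            # Count how many back-to-back copies of the motif follow.
            while seq[i + copies * k:i + (copies + 1) * k] == motif:
                copies += 1
            if copies >= min_copies and copies * k >= min_len:
                found = (i, motif, copies)
                break
        if found:
            hits.append(found)
            i += found[2] * len(found[1])  # jump past the repeat tract
        else:
            i += 1
    return hits

# Examples built in the same flank + repeat + flank style as the slide.
print(find_microsatellites("t" + "CAA" * 7 + "a"))
print(find_microsatellites("t" + "TTGTC" * 6 + "c"))
```

A tract such as ACACAC (3 copies but only 6 nt) is skipped because it fails the 8 nt minimum, which is why both thresholds are checked together.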

Page 28: Lecture 23. Genomic Futures - Schatzlab

Replication slippage
• Out-of-phase re-annealing
– Nascent and template strands dissociate and re-anneal out-of-phase
• Loops repaired by mismatch repair machinery (MMR)
– Very efficient for small loops
– Possible strand-specific repair
• Stepwise process
– Nascent strand gains or loses full repeat units
– Typically single unit mutations
• Varies by motif length, motif composition, etc.

Expansion:

Contraction:

Microsatellites: Simple Sequences with Complex Evolution
Ellegren (2004) Nature Reviews Genetics. doi:10.1038/nrg1348
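
The stepwise process described above can be illustrated with a short simulation. This is a deliberately simplified sketch: it assumes a fixed per-generation mutation rate (defaulting to the ~10^-3 figure quoted earlier), equal odds of expansion and contraction, and single-unit steps only, and it ignores the dependence on motif length and composition. The function and parameter names are hypothetical.

```python
import random

def simulate_slippage(start_copies, generations, rate=1e-3, rng=None):
    """Follow one repeat tract through a lineage under a stepwise model.

    Each generation the tract mutates with probability `rate`; a mutation
    adds or removes exactly one full repeat unit. The tract is not allowed
    to drop below one copy.
    """
    rng = rng or random.Random()
    copies = start_copies
    for _ in range(generations):
        if rng.random() < rate:
            step = rng.choice((-1, 1))  # expansion or contraction
            copies = max(1, copies + step)
    return copies

# A (CT)13 tract followed for 10,000 generations, seeded for repeatability.
print(simulate_slippage(13, 10_000, rng=random.Random(42)))
```

Running it with different seeds gives a spread of allele lengths around the starting value, which is the kind of variation that makes these loci useful as lineage markers.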

Page 29: Lecture 23. Genomic Futures - Schatzlab

lobSTR Algorithm Overview

lobSTR: A short tandem repeat profiler for personal genomes
Gymrek et al. (2012) Genome Research. doi:10.1101/gr.135780.111

Page 30: Lecture 23. Genomic Futures - Schatzlab

Why should we care about microsatellites?

• Polymorphism and mutation rate variation
• Disease
– Huntington’s Disease
– Fragile X syndrome
– Friedreich’s ataxia
• Mutations as lineage
– Organogenesis/embryonic development
– Tumor development

Phylogenetic fate mapping
Salipante (2006) PNAS. doi: 10.1073/pnas.0601265103

Page 33: Lecture 23. Genomic Futures - Schatzlab

Surname Inference

Whose sequence reads are these?

Identifying Personal Genomes by Surname Inference
Gymrek et al (2013) Science. doi: 10.1126/science.1229566

Page 34: Lecture 23. Genomic Futures - Schatzlab

Step 1. Profile Y-STRs from the individual’s genome.

Page 35: Lecture 23. Genomic Futures - Schatzlab

Step 2. Search for a surname hit in online genetic genealogy databases.

http://www.ysearch.org
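
Conceptually, Step 2 compares the Y-STR profile from Step 1 with surname-labelled profiles in a database. The sketch below is an illustration only: it is not the scoring used by Gymrek et al. or by any genealogy site, the records and allele values are made up, and the function name and `min_shared` parameter are hypothetical. It ranks records by the number of mismatching alleles among the markers they share with the query.

```python
def rank_surnames(query, records, min_shared=3):
    """Rank database records by agreement with a query Y-STR profile.

    query   : dict of marker name -> repeat count
    records : list of {"surname": str, "markers": dict} entries
    Returns (surname, mismatches, shared_markers) tuples, best match first.
    """
    ranked = []
    for rec in records:
        shared = [m for m in query if m in rec["markers"]]
        if len(shared) < min_shared:
            continue  # too few markers in common to compare
        mismatches = sum(query[m] != rec["markers"][m] for m in shared)
        ranked.append((mismatches, -len(shared), rec["surname"]))
    ranked.sort()  # fewest mismatches first, then most shared markers
    return [(name, mism, -neg) for mism, neg, name in ranked]

# Made-up query profile and database entries for illustration.
query = {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}
records = [
    {"surname": "SurnameA",
     "markers": {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}},
    {"surname": "SurnameB",
     "markers": {"DYS19": 15, "DYS390": 23, "DYS391": 11, "DYS393": 13}},
    {"surname": "SurnameC", "markers": {"DYS19": 14, "DYS390": 24}},
]
print(rank_surnames(query, records))
```

Because Y-STRs mutate quickly, a real search has to tolerate a few mismatches between relatives, which is why the sketch ranks candidates instead of demanding an exact match.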

Page 36: Lecture 23. Genomic Futures - Schatzlab

Step 3. Search with additional metadata to narrow down the individual.

http://www.ussearch.com

Page 37: Lecture 23. Genomic Futures - Schatzlab

Surname Inference

It’s Craig Venter!

Identifying Personal Genomes by Surname Inference
Gymrek et al (2013) Science. doi: 10.1126/science.1229566

Page 38: Lecture 23. Genomic Futures - Schatzlab

Possible route for identity tracing

● Tracing attacks combine metadata and surname inference to triangulate the identity of an unknown individual.

● With no information, there are roughly 300 million matching individuals in the US, equating to about 28 bits of entropy.

● Sex reduces entropy by 1 bit; state of residence and age reduce it to ~16 bits; successful surname inference reduces it to ~3 bits.

● US population: ~313.9 million individuals

● log2(313,900,000) = 28.226 bits
● Sex ~ 1.0 information bits
● log2(156,950,000) = 27.226 bits
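
The arithmetic on this slide can be checked directly. The snippet below is a small sketch using only the population figure quoted above; the helper names are illustrative. It treats every individual as equally likely, so the entropy is simply log2 of the number of candidates, and each bit of information halves the candidate pool.

```python
import math

def entropy_bits(candidates):
    """Bits needed to single out one person among equally likely candidates."""
    return math.log2(candidates)

def candidates_left(bits):
    """Size of the candidate pool that a given entropy corresponds to."""
    return 2 ** bits

us_population = 313_900_000
print(round(entropy_bits(us_population), 3))      # full population
print(round(entropy_bits(us_population / 2), 3))  # after learning sex
print(candidates_left(16))                        # pool size at 16 bits
print(candidates_left(3))                         # pool size at 3 bits
```

At ~3 bits only about 8 candidates remain, which is why a successful surname hit combined with age and state of residence is usually enough to name the individual.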

Page 39: Lecture 23. Genomic Futures - Schatzlab

The risks of big data?

Page 40: Lecture 23. Genomic Futures - Schatzlab

Genomic Futures?

The rise of a digital immune system
Schatz & Phillippy (2012) GigaScience 1:4

Page 41: Lecture 23. Genomic Futures - Schatzlab

Computational Research Landscape
• Avoid
– New Illumina/PacBio base callers
– Entirely new genome assembler from scratch
• Good
– Alignment/Assembly/Analysis methods robust to errors, polyploidy, aneuploidy
– Use insights from long-reads to improve analysis of short-reads
• Best
– Synthesis of large numbers of samples (“pan-genome assembly”) and/or multiple data types (“multi-omics”)
– Prioritization and interpretation of variations

http://schatz-lab.org

NGM+Sniffles, Ribbon, SURVIVOR, Assemblytics, LRSim, FALCON

Page 42: Lecture 23. Genomic Futures - Schatzlab


Also consider starting a company!