Quantitative Metagenomics
Lea Benedicte Skov Hansen, PhD
NGS Course, 13th of June 2016
13/06/2016 Quantitative Metagenomics, DTU Systems Biology, Technical University of Denmark
Exercise
• Metagenome assembly
  – Preassembled with two methods: SOAP and MetaVelvet
  – Contig coverage
  – Assembly statistics
• Gene prediction
  – Prodigal
  – Gene clustering based on similarity
  – Gene catalogue
• Gene abundance matrix
  – Align reads to the gene catalogue with bwa
  – Count the number of reads mapping to each gene with samtools
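The counting step can be sketched in Python. This is a minimal sketch that assumes the per-gene counts come from `samtools idxstats` run on each sample's indexed BAM file, whose tab-separated columns are reference name, reference length, mapped read segments and unmapped read segments. The gene names and counts below are hypothetical. The per-sample counts are merged into a gene by sample abundance matrix.

```python
def parse_idxstats(text):
    """Return {gene_id: mapped_count} from `samtools idxstats` output.

    Columns (tab-separated): reference name, reference length,
    mapped read segments, unmapped read segments. The final '*' line
    (reads without a reference) is skipped.
    """
    counts = {}
    for line in text.strip().splitlines():
        name, _length, mapped, _unmapped = line.split("\t")
        if name == "*":
            continue
        counts[name] = int(mapped)
    return counts


def abundance_matrix(samples):
    """Build a gene x sample count matrix from {sample: {gene: count}}.

    Genes missing from a sample's counts are filled with 0.
    """
    sample_names = sorted(samples)
    genes = sorted({g for counts in samples.values() for g in counts})
    matrix = {g: [samples[s].get(g, 0) for s in sample_names] for g in genes}
    return sample_names, matrix


# Hypothetical idxstats output for two samples mapped to a 3-gene catalogue
s1 = "gene1\t900\t120\t0\ngene2\t1500\t40\t0\ngene3\t600\t0\t0\n*\t0\t0\t35\n"
s2 = "gene1\t900\t15\t0\ngene2\t1500\t88\t0\ngene3\t600\t7\t0\n*\t0\t0\t12\n"
names, matrix = abundance_matrix({"s1": parse_idxstats(s1), "s2": parse_idxstats(s2)})
for gene, row in matrix.items():
    print(gene, row)
```

Filling absent genes with 0 keeps every row the same length, one count per sample, which is the shape the downstream diversity calculations expect.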
Exercise 2
• Taxonomic annotation of the gene catalogue
  – BLAST the gene catalogue against:
    • NCBI bacterial genomes
    • 373 additional genomes
  – Rearrange gene abundances into taxonomic abundances
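Rearranging gene abundances into taxonomic abundances amounts to summing the rows of the gene abundance matrix that share a taxon. This minimal sketch assumes a hypothetical `gene_taxon` mapping (for example, each gene's best BLAST hit) and pools unannotated genes under `unassigned`; the taxon names and counts are made up for illustration.

```python
def taxonomic_abundance(gene_counts, gene_taxon, unassigned="unassigned"):
    """Sum gene counts per taxon.

    gene_counts: {gene_id: [count per sample]}
    gene_taxon:  {gene_id: taxon}, e.g. from each gene's best BLAST hit
    Genes without a taxonomic annotation are pooled under `unassigned`.
    """
    n_samples = len(next(iter(gene_counts.values())))
    taxa = {}
    for gene, counts in gene_counts.items():
        taxon = gene_taxon.get(gene, unassigned)
        totals = taxa.setdefault(taxon, [0] * n_samples)
        for j, c in enumerate(counts):
            totals[j] += c
    return taxa


# Hypothetical gene abundance matrix (two samples) and annotation
gene_counts = {"gene1": [120, 15], "gene2": [40, 88], "gene3": [0, 7], "gene4": [5, 5]}
gene_taxon = {"gene1": "Bacteroides", "gene2": "Prevotella", "gene3": "Bacteroides"}
print(taxonomic_abundance(gene_counts, gene_taxon))
# {'Bacteroides': [120, 22], 'Prevotella': [40, 88], 'unassigned': [5, 5]}
```

Summing raw read counts gives more weight to taxa with longer or more numerous genes; normalising each gene's count by its length before summing is a common alternative.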
Ecology
Ecology is the scientific study of interactions among organisms and their environment, including the interactions organisms have with each other and with their abiotic environment.
Nothing new – except the technology
Classical measures:
• Abundance
• Diversity
• Richness
Abundance (Counts)
Species        Count
Lion              64
Zebra            128
Giraffe           64
Leopard           64
Rhinoceros        64
Hippopotamus     128
Gazelle          128
Elephant          64
Monkey             9
Richness
Richness = number of observed species = 9 (all nine species have a count above zero)
Rarefaction curves
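Rarefaction curves plot the number of species observed against sequencing depth; a curve that levels off suggests the sample is close to saturation. Such curves are often drawn by repeated random subsampling. As a deterministic alternative, this sketch uses Hurlbert's analytical formula for the expected richness of a subsample of n reads drawn without replacement, applied to the abundance table (Lion to Monkey). It requires Python 3.8+ for `math.comb`.

```python
from math import comb


def expected_richness(counts, n):
    """Expected number of species in a random subsample of n reads
    drawn without replacement (Hurlbert's rarefaction formula):
    E[S_n] = sum_i [1 - C(N - N_i, n) / C(N, n)].
    """
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total count")
    denom = comb(total, n)
    # comb(a, b) is 0 when b > a, so abundant species contribute ~1
    return sum(1 - comb(total - c, n) / denom for c in counts if c > 0)


# Counts from the abundance table: Lion ... Monkey
counts = [64, 128, 64, 64, 64, 128, 128, 64, 9]
for depth in (1, 5, 10, 50, 100, 713):
    print(depth, round(expected_richness(counts, depth), 2))
```

At full depth the expected richness equals the 9 observed species; plotting the values against depth gives the rarefaction curve.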
Richness
Species        Count
Lion               1
Zebra              2
Giraffe            1
Leopard            1
Rhinoceros         1
Hippopotamus       2
Gazelle            2
Elephant           1
Monkey             0

Species richness estimator, Chao1 index:
$$S_{\mathrm{Chao1}} = S_{\mathrm{obs}} + \frac{f_1^2}{2 f_2}$$
S_obs = observed species, f_1 = species observed once, f_2 = species observed twice

8 observed species (f_1 = 5, f_2 = 3):
$$S_{\mathrm{Chao1}} = 8 + \frac{5^2}{2 \cdot 3} = 12.17$$
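The Chao1 calculation above can be reproduced with a short Python function. This is a minimal sketch; it assumes at least one species is observed exactly twice, since the classic formula divides by f_2.

```python
from collections import Counter


def chao1(counts):
    """Classic Chao1 richness estimate: S_obs + f1^2 / (2 * f2).

    counts: per-species counts for one sample; zeros are not observed.
    """
    observed = [c for c in counts if c > 0]
    freq = Counter(observed)       # how many species have each count
    f1, f2 = freq[1], freq[2]      # singletons and doubletons
    if f2 == 0:
        raise ValueError("f2 = 0: the classic Chao1 formula is undefined")
    return len(observed) + f1 ** 2 / (2 * f2)


# Lion, Zebra, Giraffe, Leopard, Rhinoceros, Hippopotamus, Gazelle, Elephant, Monkey
counts = [1, 2, 1, 1, 1, 2, 2, 1, 0]
print(round(chao1(counts), 2))  # 12.17
```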
Evenness
Species        s1   s2
Lion            1    1
Zebra           2    1
Giraffe         1    8
Leopard         1    1
Rhinoceros      1    1
Hippopotamus    2    1
Gazelle         2    1
Elephant        1    1
Monkey          0    0
Alpha Diversity
Richness: s1 = s2 (8 observed species each)
Evenness: s1 ≠ s2 (s2 is dominated by Giraffe)
Shannon index
$$H = -\sum_{i=1}^{R} p_i \ln p_i$$
H = Shannon index, p_i = count of species i / total count, R = number of observed species
$H_{s1} = 2.02$, $H_{s2} = 1.60$
When all species are equally abundant ($p_1 = p_2 = \dots = p_R$), H reaches its maximum:
$$H = \ln R = \ln 8 = 2.08$$
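A minimal Python sketch of the Shannon index, reproducing the values for the two samples in the evenness table and the maximum ln(R). Species with a count of 0 are skipped, since they are not observed and p ln p tends to 0 as p tends to 0.

```python
from math import log


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over observed species."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


s1 = [1, 2, 1, 1, 1, 2, 2, 1, 0]
s2 = [1, 1, 8, 1, 1, 1, 1, 1, 0]
print(round(shannon(s1), 2), round(shannon(s2), 2))  # 2.02 1.6
print(round(log(8), 2))  # 2.08, the maximum H for R = 8 equally abundant species
```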
Sample Sizes
Accounting for different sample sizes:
• Normalize to sample size
• Rarefy samples
• Statistical model of sample variance
Normalize to library size (as a percentage): Norm_i = n_i / n_tot × 100

Species         s1 count   s2 count   s1 norm (%)   s2 norm (%)
Lion                  64          1          8.98          9.09
Zebra                128          2         17.95         18.18
Giraffe               64          1          8.98          9.09
Leopard               64          1          8.98          9.09
Rhinoceros            64          1          8.98          9.09
Hippopotamus         128          2         17.95         18.18
Gazelle              128          2         17.95         18.18
Elephant              64          1          8.98          9.09
Monkey                 9          0          1.26          0
Total                713         11        100           100
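Normalising to library size is a per-sample division by the total count. A minimal sketch reproducing the percentages in the table:

```python
def relative_abundance(counts):
    """Normalise counts to library size, as percentages: 100 * n_i / n_tot."""
    total = sum(counts)
    return [100 * c / total for c in counts]


# Lion, Zebra, Giraffe, Leopard, Rhinoceros, Hippopotamus, Gazelle, Elephant, Monkey
s1 = [64, 128, 64, 64, 64, 128, 128, 64, 9]
s2 = [1, 2, 1, 1, 1, 2, 2, 1, 0]
print([round(x, 2) for x in relative_abundance(s1)])
print([round(x, 2) for x in relative_abundance(s2)])
```

Note that normalisation cannot recover Monkey in s2: a species with no reads stays at 0 however the counts are scaled.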
Rarefying to a smaller library size (here 10 reads per sample):

Species          s1   s1 rarefied    s2   s2 rarefied
Lion             64             2     1             1
Zebra           128             3     2             2
Giraffe          64             0     1             1
Leopard          64             1     1             1
Rhinoceros       64             0     1             1
Hippopotamus    128             3     2             2
Gazelle         128             1     2             2
Elephant         64             0     1             0
Monkey            9             0     0             0
Total           713            10    11            10
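Rarefying draws reads at random without replacement until the target depth is reached, so the rarefied counts differ from run to run. A minimal sketch, assuming each read is assigned to exactly one species; the `seed` parameter is only there to make a run repeatable.

```python
import random


def rarefy(counts, depth, seed=None):
    """Randomly subsample `depth` reads without replacement."""
    if depth > sum(counts):
        raise ValueError("depth exceeds library size")
    rng = random.Random(seed)
    # One entry per read, labelled with the index of its species
    reads = [i for i, c in enumerate(counts) for _ in range(c)]
    picked = rng.sample(reads, depth)
    rarefied = [0] * len(counts)
    for i in picked:
        rarefied[i] += 1
    return rarefied


s1 = [64, 128, 64, 64, 64, 128, 128, 64, 9]
print(rarefy(s1, 10, seed=1))
```

Because the draw is random, the output will generally not match the table above species by species; only the total of 10 reads per sample is fixed.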
Normalization and rarefying do not account for heteroscedasticity (the variance of the counts depends on their mean)! Methods that model the variance statistically:
• DESeq2
• edgeR
Beta-Diversity
Diversity between communities!
Species        s1   s2
Lion            0    2
Zebra           3    2
Giraffe         0    4
Leopard         0    2
Rhinoceros      1    2
Hippopotamus    4    0
Gazelle         0    1
Elephant        1    0
Total           9   13
Bray-Curtis dissimilarity metric:
$$B_{ij} = 1 - \frac{2 C_{ij}}{S_i + S_j}, \qquad 0 \le B_{ij} \le 1$$
C_ij = sum of the lower count of each species common to both samples, S = total count of a sample (B = 0: identical communities, B = 1: no species shared)

For s1 and s2: C = 2 (Zebra) + 1 (Rhinoceros) = 3 and S_s1 + S_s2 = 9 + 13 = 22, so
$$B_{s1s2} = 1 - \frac{2 \cdot 3}{22} = 0.73$$
i.e. the two communities are largely dissimilar.
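The Bray-Curtis worked example can be checked with a few lines of Python. Taking the minimum over every species is equivalent to summing over the shared species only, because a species absent from either sample contributes min(n, 0) = 0.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: 1 - 2*C / (S_x + S_y),
    where C sums the lower count of each species present in both samples."""
    shared = sum(min(a, b) for a, b in zip(x, y))
    return 1 - 2 * shared / (sum(x) + sum(y))


# Lion, Zebra, Giraffe, Leopard, Rhinoceros, Hippopotamus, Gazelle, Elephant
s1 = [0, 3, 0, 0, 1, 4, 0, 1]
s2 = [2, 2, 4, 2, 2, 0, 1, 0]
print(round(bray_curtis(s1, s2), 2))  # 0.73
```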
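Other distance metrics:
• Euclidean distance: $d(x, y) = \sqrt{\sum_i (x_i - y_i)^2}$
• Jensen-Shannon distance: $\mathrm{JSD}(x, y) = \sqrt{\tfrac{1}{2} D_{\mathrm{KL}}(x \,\|\, M) + \tfrac{1}{2} D_{\mathrm{KL}}(y \,\|\, M)}$, with $M = (x + y)/2$ and $D_{\mathrm{KL}}$ the Kullback-Leibler divergence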
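A sketch of both metrics in Python, applied to the two samples from the Bray-Curtis example. Two choices here are assumptions rather than something the slides specify: counts are first converted to proportions, and the Kullback-Leibler divergence uses log base 2, which bounds the Jensen-Shannon distance between 0 and 1.

```python
from math import log2, sqrt


def to_proportions(counts):
    total = sum(counts)
    return [c / total for c in counts]


def euclidean(x, y):
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(a * log2(a / b) for a, b in zip(p, q) if a > 0)


def jensen_shannon_distance(p, q):
    m = [(a + b) / 2 for a, b in zip(p, q)]  # M = (x + y) / 2, never 0 where p_i > 0
    return sqrt(0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m))


s1 = [0, 3, 0, 0, 1, 4, 0, 1]
s2 = [2, 2, 4, 2, 2, 0, 1, 0]
p, q = to_proportions(s1), to_proportions(s2)
print(round(euclidean(p, q), 3), round(jensen_shannon_distance(p, q), 3))
```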
Distance matrix
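A distance matrix holds the distance between every pair of samples; it is symmetric with zeros on the diagonal. A minimal sketch that accepts any metric function, here Python's `math.dist` (Euclidean distance, Python 3.8+); a Bray-Curtis or Jensen-Shannon function could be passed instead. The three samples are hypothetical relative abundances.

```python
from math import dist


def distance_matrix(samples, metric):
    """Symmetric pairwise distance matrix for a list of abundance vectors."""
    n = len(samples)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Each pair is computed once and mirrored across the diagonal
            d[i][j] = d[j][i] = metric(samples[i], samples[j])
    return d


# Hypothetical relative abundances for three samples over three species
samples = [[0.5, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 0.5, 0.5]]
for row in distance_matrix(samples, dist):
    print([round(v, 3) for v in row])
```

The resulting matrix is the usual input to ordination (e.g. PCoA) or clustering of samples.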
Diversity - example
Hands on!