Whole Genome Sequencing analysis of an outbreak of Mycobacterium caprae in alpine region affecting both domestic and wildlife animals: molecular epidemiological perspective
By
Ashok Varadharajan
MSc Epidemiology
LMU
Supervisor
Dr. Helmut Blum
Laboratory for Functional Genome Analysis
Gene Center of the LMU
Feodor-Lynen-Strasse 25
81377 Munich
BACKGROUND
Tuberculosis
An infectious bacterial disease characterized by the growth of nodules (tubercles) in the tissues, especially the lungs.
Mycobacterium tuberculosis complex (MTC)
Genetically related group of Mycobacterium species causing tuberculosis in humans or other organisms
Mycobacterium tuberculosis
Mycobacterium africanum
Mycobacterium bovis
Mycobacterium microti
Mycobacterium canettii
Mycobacterium caprae
Mycobacterium pinnipedii
Mycobacterium suricattae
Mycobacterium mungi
BACKGROUND
Mycobacterium caprae
Formerly known as M. tuberculosis subsp. caprae and M. bovis subsp. caprae
Designated as new species in the MTC in 2003
Predominantly found in Central and Western European countries
The pathogen can affect both animals and humans
Differentiating Features in MTC
A specific combination of polymorphisms in the genes oxyR, pncA, katG, gyrA and gyrB
Region of Difference 4 (RD4)
BACKGROUND
Phylogeny of MTC (Rodriguez-Campos et al., 2014)
Differences
M. caprae      M. bovis
RD4            RD4
gyrB n1311     gyrB n1410
-              pncA c57
PROBLEMS
Increased prevalence of tuberculosis caused by M. caprae around the Alpine regions
Affects various species, such as cattle, red deer, foxes and humans
Lack of knowledge about the transmission of infection between domestic and wildlife animals
Little knowledge about the genomic features of M. caprae that could serve as markers for genotyping
Lack of a complete genome sequence of M. caprae, which complicates the differentiation of M. caprae and M. bovis
OBJECTIVES
Analysis of spatial spread of the disease by Whole Genome Sequencing analysis.
1. Identification of Variants
2. Clustering of isolates
3. Analysis of Transmission chain
4. Analysis of RD4 region
Development of a web-integrated environment
1. Automation of the variant calling pipeline
2. A tailored database to store the identified variants
3. Real-time generation of results such as the phylogenetic tree, minimum spanning tree, SNP plot, coverage plot, etc.
Distribution of sequenced M. caprae isolates
By host
Host Number of Isolates
Red Deer 95
Cattle 180
Human 3
Sheep 1
Total 279
By location
Germany
Region Number of Isolates
GAP 9
Landshut 3
OA 122
OAL 21
Oldenburg 2
RO 1
TÖL 2
UA 7
UAL 4
WM 2
Unknown 15
Total 188
Austria
Region Number of Isolates
BH Schwaz 7
Bludenz 18
Bregenz 2
Eranet 12
Kaisers 12
Reutte 15
Vorarlberg 1
Zillertal 23
Total 90
WHOLE GENOME SEQUENCING (WGS)
Sequencer - Illumina HiSeq 1500
Technique - Paired-End sequencing
Coverage - 100x
Read Length - 100-250 base pairs
Principle of WGS
1. M. caprae genome
2. Library preparation
3. Illumina HiSeq 1500
4. Sequenced reads
5. Downstream analysis
METHODS - IDENTIFICATION OF VARIANTS
Variant Calling Workflow
1. Paired-end FASTQ files
2. Quality filter
3. Alignment: BWA-MEM
4. Removing duplicates: Picard MarkDuplicates
5. Variant calling: Samtools pileup, VarScan mpileup2snp
6. SNPs (VCF)
METHODS - IDENTIFICATION OF VARIANTS
Paired-end FASTQ files
@HISEQ:55:H80HWADXX:2:1101:1341:2054 1:N:0:
GCTCAATTCGAGCCGATGCACCAGGTGTTCCTAGGTGTGCGGT
+
CCCFFFFFHHHHHJJJJJJJJJJJJHIJJJJJIJJHIIJJJJJ
@HISEQ:55:H80HWADXX:2:1101:1259:2056 1:N:0:
AGCCTGCTGGTTGCTGGGTCATTGCGCCATGCCTTCGAGAACA
+
CBCFFFFFHHHHHIJIIJIHIJJIJIJJJJJJJJJJJJIJJJJ
@HISEQ:55:H80HWADXX:2:1101:1341:2054 3:N:0:
GGGGGATCGACCGCTCCCGGAATTCGGTGGAAGCTGCTGCGGT
+
CCCFFFFFHHHHHJJJJJJJJJJJJJJJJJJJJJJJJJJJHHD
@HISEQ:55:H80HWADXX:2:1101:1259:2056 3:N:0:
GGCGAGGGCCGCGTCATTGCGGCGTAGCGTGGACGCGATGTTG
+
CCCFFFFFHHHHHIIJJJJJJJJJIGHFFFDDDDDDDDDDDDD
Each FASTQ record consists of four lines:
1. Identifier (starts with @)
2. Sequence
3. Comment (starts with +)
4. Quality
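The four-line record layout labelled above (identifier, sequence, comment, quality) can be read with a few lines of Python. This is only a minimal sketch, not the parser used in the presented pipeline: the names `FastqRecord`, `read_fastq` and `phred_scores` are hypothetical, the toy record is made up, and the quality offset of 33 (Sanger/Illumina 1.8+ encoding) is an assumption.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    identifier: str  # text after the leading '@'
    sequence: str    # called bases
    comment: str     # text after the leading '+', often empty
    quality: str     # one ASCII-encoded quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield one record per group of four lines (hypothetical helper)."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\n")
        comment = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not comment.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, comment[1:], quality)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode quality characters to Phred scores (offset 33 is assumed)."""
    return [ord(char) - offset for char in quality]


# Toy record, invented for illustration only.
toy_file = io.StringIO("@read1 1:N:0:\nGATTACA\n+\nCCCFFHJ\n")
records = list(read_fastq(toy_file))
scores = phred_scores(records[0].quality)
```

A Phred score Q corresponds to an estimated base-calling error probability of 10^(-Q/10), so a score of 30 means roughly one error in 1000 bases.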
METHODS - IDENTIFICATION OF VARIANTS
Quality Filter
Figure: Quality scores across all bases (x-axis: position in read (bp); y-axis: quality score)
METHODS - IDENTIFICATION OF VARIANTS
Alignment
Figure: Mapped reads and coverage along the reference genome position
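Coverage at a reference position is the number of mapped reads that overlap it. The sketch below illustrates that idea under simplifying assumptions: reads are given as hypothetical `(start, length)` tuples with 0-based starts and no gaps or clipping, and `coverage_per_position` is an illustrative name, not a function of BWA or Samtools.

```python
from typing import Iterable, List, Tuple


def coverage_per_position(
    reads: Iterable[Tuple[int, int]], genome_length: int
) -> List[int]:
    """Return the read depth at every reference position.

    Uses a difference array: +1 where a read starts, -1 just past its end,
    then a running sum turns the deltas into depths.
    """
    deltas = [0] * (genome_length + 1)
    for start, length in reads:
        if length <= 0 or start < 0 or start >= genome_length:
            continue  # ignore reads that do not fall on the reference
        end = min(start + length, genome_length)
        deltas[start] += 1
        deltas[end] -= 1

    depth: List[int] = []
    running = 0
    for delta in deltas[:genome_length]:
        running += delta
        depth.append(running)
    return depth


# Three toy reads of length 4 on a reference of length 8.
toy_reads = [(0, 4), (2, 4), (3, 4)]
toy_depth = coverage_per_position(toy_reads, 8)
mean_depth = sum(toy_depth) / len(toy_depth)
```

The mean of the per-position depths is the figure usually quoted as "x-fold" coverage.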
METHODS - IDENTIFICATION OF VARIANTS
Removing Duplicates
Figure: Duplicate reads
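Duplicate reads are read pairs that map to exactly the same place and most likely stem from the same library fragment. The following is a deliberately simplified sketch of the idea and not Picard's actual implementation: the `AlignedPair` fields, the grouping key and the rule of keeping the pair with the highest summed base quality are assumptions made for illustration, and read names are assumed to be unique.

```python
from typing import Dict, List, NamedTuple, Tuple


class AlignedPair(NamedTuple):
    name: str              # read pair name (assumed unique)
    chrom: str             # reference sequence name
    start: int             # mapped start of read 1
    mate_start: int        # mapped start of read 2
    strand: str            # '+' or '-'
    base_quality_sum: int  # summed base qualities of the pair


def mark_duplicates(pairs: List[AlignedPair]) -> Dict[str, bool]:
    """Map each pair name to True if it is flagged as a duplicate."""
    best: Dict[Tuple[str, int, int, str], AlignedPair] = {}
    for pair in pairs:
        key = (pair.chrom, pair.start, pair.mate_start, pair.strand)
        current = best.get(key)
        # Keep the highest-quality pair of each group as the representative.
        if current is None or pair.base_quality_sum > current.base_quality_sum:
            best[key] = pair
    kept = {pair.name for pair in best.values()}
    return {pair.name: pair.name not in kept for pair in pairs}


# Toy input: pairs "a" and "b" share the same coordinates, "c" does not.
toy_pairs = [
    AlignedPair("a", "ref", 100, 300, "+", 3000),
    AlignedPair("b", "ref", 100, 300, "+", 3200),
    AlignedPair("c", "ref", 150, 350, "+", 2900),
]
flags = mark_duplicates(toy_pairs)
```

Flagging rather than deleting duplicates keeps the alignment file complete while letting the variant caller ignore the flagged reads.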
METHODS - IDENTIFICATION OF VARIANTS
Variant Calling
Figure: SNP compared with the reference base
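A SNP is reported where the bases of the reads covering a position consistently differ from the reference base. The sketch below shows that logic for a single position; it is not VarScan's algorithm, and the function name `call_snp` as well as the thresholds `min_coverage` and `min_var_freq` are hypothetical values chosen only for the example.

```python
from collections import Counter
from typing import Optional, Tuple


def call_snp(
    ref_base: str,
    pileup_bases: str,
    min_coverage: int = 8,      # hypothetical threshold
    min_var_freq: float = 0.2,  # hypothetical threshold
) -> Optional[Tuple[str, float]]:
    """Return (alternative base, frequency) or None if no SNP is called."""
    reference = ref_base.upper()
    # Only plain base calls are counted; other characters are ignored.
    bases = [base for base in pileup_bases.upper() if base in "ACGT"]
    depth = len(bases)
    if depth < min_coverage:
        return None  # too few reads to make a call

    alternatives = Counter(base for base in bases if base != reference)
    if not alternatives:
        return None  # every read agrees with the reference

    alt_base, alt_count = alternatives.most_common(1)[0]
    frequency = alt_count / depth
    if frequency < min_var_freq:
        return None  # variant allele too rare, likely sequencing error
    return alt_base, frequency


snp = call_snp("A", "TTTTTTTTAA")       # 8 of 10 reads show T
no_snp = call_snp("A", "AAAAAAAAAA")    # all reads match the reference
low_depth = call_snp("A", "TTT")        # below the coverage threshold
```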
METHODS - IDENTIFICATION OF VARIANTS
SNPs
VCF file format
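A VCF data line is tab-separated, and its first eight columns are CHROM, POS, ID, REF, ALT, QUAL, FILTER and INFO. As a rough illustration, such a line could be split as follows; `parse_vcf_line` is a hypothetical helper, and the example record (contig name, position, values) is invented rather than taken from the study.

```python
from typing import Any, Dict


def parse_vcf_line(line: str) -> Dict[str, Any]:
    """Split one VCF data line into its eight fixed columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 tab-separated columns")
    chrom, pos, variant_id, ref, alt, qual, filter_status, info = fields[:8]

    info_dict: Dict[str, Any] = {}
    if info != ".":
        for entry in info.split(";"):
            key, separator, value = entry.partition("=")
            # Flag-style entries have no '=' and are stored as True.
            info_dict[key] = value if separator else True

    return {
        "chrom": chrom,
        "pos": int(pos),                 # 1-based position on the reference
        "id": variant_id,
        "ref": ref,
        "alt": alt.split(","),           # several alternative alleles possible
        "qual": None if qual == "." else float(qual),
        "filter": filter_status,
        "info": info_dict,
    }


# Invented example record.
record = parse_vcf_line("ref_contig\t1234\t.\tA\tG\t.\tPASS\tDP=100")
```

Header lines, which start with `#`, are not handled here and would need to be skipped before calling the helper.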
METHODS – CLUSTERING OF ISOLATES
- A phylogenetic tree can be drawn from a custom-generated FASTA sequence containing only the identified SNPs
Figure: Phylogenetic tree, with a magnified view of the constructed tree
Custom-generated FASTA sequence
>Reference
ATCGCCCTA
>isolate1
TTGGCCGTA
>isolate2
AACTCAGTT
>isolate3
AACGACGTA
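A SNP-only FASTA like the one above could be assembled from per-isolate variant calls roughly as follows. This is a sketch under stated assumptions, not the presented pipeline: the function names are hypothetical, positions are 1-based, and an isolate without a call at a position is assumed to carry the reference base, which ignores positions with missing coverage.

```python
from typing import Dict, List


def snp_alignment(
    reference: str, isolate_snps: Dict[str, Dict[int, str]]
) -> Dict[str, str]:
    """Build one SNP-only sequence per sample, reference first."""
    # Every position that is variable in at least one isolate, in genome order.
    positions: List[int] = sorted(
        {pos for snps in isolate_snps.values() for pos in snps}
    )
    alignment = {"Reference": "".join(reference[pos - 1] for pos in positions)}
    for isolate, snps in isolate_snps.items():
        # Fall back to the reference base where the isolate has no SNP.
        alignment[isolate] = "".join(
            snps.get(pos, reference[pos - 1]) for pos in positions
        )
    return alignment


def to_fasta(alignment: Dict[str, str]) -> str:
    """Render the alignment as FASTA text."""
    return "".join(f">{name}\n{sequence}\n" for name, sequence in alignment.items())


# Toy reference and variant calls, invented for illustration.
toy_reference = "ACGTACGTAC"
toy_snps = {
    "isolate1": {2: "T", 7: "A"},
    "isolate2": {2: "T"},
}
toy_alignment = snp_alignment(toy_reference, toy_snps)
fasta_text = to_fasta(toy_alignment)
```

Because every sequence has the same length and column order, the result is already an alignment and can be passed directly to a tree-building tool.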
- FastTree tool can generate approximately-maximum-