Whole Genome Sequencing analysis of an outbreak of Mycobacterium caprae in alpine region affecting both domestic and wildlife animals: molecular epidemiological perspective By Ashok Varadharajan MSc Epidemiology LMU Supervisor Dr. Helmut Blum Laboratory for Functional Genome Analysis Gene Center of the LMU Feodor-Lynen-Strasse 25 81377 Munich
Page 1: Master Thesis Presentation

Whole Genome Sequencing analysis of an outbreak of Mycobacterium caprae in alpine region affecting both domestic and wildlife animals: molecular epidemiological perspective

By Ashok Varadharajan

MSc Epidemiology, LMU

Supervisor: Dr. Helmut Blum

Laboratory for Functional Genome Analysis, Gene Center of the LMU

Feodor-Lynen-Strasse 25, 81377 Munich

Page 2: Master Thesis Presentation

BACKGROUND

Tuberculosis

An infectious bacterial disease characterized by the growth of nodules (tubercles) in the tissues, especially the lungs.

Mycobacterium tuberculosis complex (MTC)

Genetically related group of Mycobacterium species causing tuberculosis in humans or other organisms

Mycobacterium tuberculosis

Mycobacterium africanum

Mycobacterium bovis

Mycobacterium microti

Mycobacterium canettii

Mycobacterium caprae

Mycobacterium pinnipedii

Mycobacterium suricattae

Mycobacterium mungi

[Figure: tubercles]

Page 3: Master Thesis Presentation

BACKGROUND

Mycobacterium caprae

Formerly known as M. tuberculosis subsp. caprae and M. bovis subsp. caprae

Designated as new species in the MTC in 2003

Predominantly found in Central and Western European countries

The pathogen can affect both animals and humans

Differentiating Features in MTC

Specific combination of polymorphisms in the genes oxyR, pncA, katG, gyrA and gyrB

Region of Difference 4 (RD4)

Page 4: Master Thesis Presentation

BACKGROUND

Phylogeny of MTC (Rodriguez-Campos et al., 2014)

Differences

M. caprae       M. bovis
RD4             RD4
gyrB n1311      gyrB n1410
                pncA c57

Page 5: Master Thesis Presentation

PROBLEMS

Increased prevalence of tuberculosis caused by M. caprae in Alpine regions

Affects various species, including cattle, red deer, foxes and humans

Lack of knowledge about the transmission of infection between domestic and wildlife animals

Little knowledge about genomic features of M. caprae that could serve as markers for genotyping

Lack of a complete genome sequence of M. caprae, which complicates the differentiation of M. caprae and M. bovis

Page 6: Master Thesis Presentation

OBJECTIVES

Analysis of spatial spread of the disease by Whole Genome Sequencing analysis.

1. Identification of Variants

2. Clustering of isolates

3. Analysis of Transmission chain

4. Analysis of RD4 region

Development of Web Integrated Environment

1. Automation of Variant Calling pipeline

2. A tailored database to store the identified variants

3. Real-time generation of results such as phylogenetic tree, minimum spanning tree, SNP plot and coverage plot

Page 7: Master Thesis Presentation

Distribution of sequenced M. caprae isolates

By host

Host         Number of isolates
Red deer     95
Cattle       180
Human        3
Sheep        1
Total        279

By location: Germany

Region       Number of isolates
GAP          9
Landshut     3
OA           122
OAL          21
Oldenburg    2
RO           1
TÖL          2
UA           7
UAL          4
WM           2
Unknown      15
Total        188

By location: Austria

Region       Number of isolates
BH Schwaz    7
Bludenz      18
Bregenz      2
Eranet       12
Kaisers      12
Reutte       15
Vorarlber    1
Zillertal    23
Total        90

Page 8: Master Thesis Presentation

WHOLE GENOME SEQUENCING (WGS)

Sequencer - Illumina HiSeq 1500

Technique - Paired-End sequencing

Coverage - 100x

Read Length - 100-250 base pairs
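
The coverage and read length figures are linked by the usual depth estimate, coverage = (number of reads x read length) / genome size. The following is a minimal sketch of that arithmetic. The genome size is an assumption (about 4.4 Mb, roughly the size of the M. tuberculosis H37Rv genome), since the slides do not state the size of the M. caprae genome.

```python
import math

# Assumption: ~4.4 Mb, roughly the size of the M. tuberculosis H37Rv genome.
# The slides do not state the M. caprae genome size.
GENOME_SIZE_BP = 4_400_000


def expected_coverage(n_reads: int, read_length: int,
                      genome_size: int = GENOME_SIZE_BP) -> float:
    """Mean depth if n_reads reads of read_length bp are spread over the genome."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int,
                 genome_size: int = GENOME_SIZE_BP) -> int:
    """Smallest number of reads that reaches target_coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


print(reads_needed(100, 100))  # reads needed for 100x with 100 bp reads
print(reads_needed(100, 250))  # fewer reads are needed with 250 bp reads
```

For paired-end data each fragment contributes two reads, so the number of fragments is half the number of reads.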

Principle of WGS

M. caprae genome -> Library preparation -> Illumina HiSeq 1500 -> Sequenced reads -> Downstream analysis

Page 9: Master Thesis Presentation

METHODS - IDENTIFICATION OF VARIANTS

Variant Calling Workflow

1. Paired-end FASTQ files

2. Quality filter

3. Alignment: BWA-MEM

4. Removing duplicates: Picard MarkDuplicates

5. Variant calling: SAMtools mpileup, VarScan mpileup2snp

6. SNPs (VCF)
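
The workflow can be sketched as a list of shell commands for one sample. This is an illustration under stated assumptions, not the commands used in the thesis: all file names and jar paths are hypothetical placeholders, the sort step and `REMOVE_DUPLICATES=true` are assumptions, and the quality filter is left out because the slides do not name the tool used for it.

```python
from typing import List


def build_pipeline_commands(sample: str, reference: str = "reference.fasta") -> List[str]:
    """Return the shell commands for one sample, in workflow order.

    All file names and jar paths are hypothetical placeholders.
    """
    r1, r2 = f"{sample}_R1.fastq", f"{sample}_R2.fastq"
    return [
        # Alignment of the paired-end reads (reference indexed beforehand with `bwa index`)
        f"bwa mem {reference} {r1} {r2} > {sample}.sam",
        # Coordinate sorting, needed before duplicate marking
        f"samtools sort -o {sample}.sorted.bam {sample}.sam",
        # Duplicate removal with Picard MarkDuplicates
        f"java -jar picard.jar MarkDuplicates I={sample}.sorted.bam "
        f"O={sample}.dedup.bam M={sample}.dup_metrics.txt REMOVE_DUPLICATES=true",
        # Pileup of the de-duplicated alignments
        f"samtools mpileup -f {reference} {sample}.dedup.bam > {sample}.mpileup",
        # SNP calling with VCF output
        f"java -jar VarScan.jar mpileup2snp {sample}.mpileup --output-vcf 1 > {sample}.vcf",
    ]


for command in build_pipeline_commands("isolate1"):
    print(command)
```

Returning the commands as strings keeps the sketch free of side effects; an automated pipeline would pass each one to a job runner and check its exit status.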

Page 10: Master Thesis Presentation

METHODS - IDENTIFICATION OF VARIANTS

@HISEQ:55:H80HWADXX:2:1101:1341:2054 1:N:0:
GCTCAATTCGAGCCGATGCACCAGGTGTTCCTAGGTGTGCGGT
+
CCCFFFFFHHHHHJJJJJJJJJJJJHIJJJJJIJJHIIJJJJJ
@HISEQ:55:H80HWADXX:2:1101:1259:2056 1:N:0:
AGCCTGCTGGTTGCTGGGTCATTGCGCCATGCCTTCGAGAACA
+
CBCFFFFFHHHHHIJIIJIHIJJIJIJJJJJJJJJJJJIJJJJ

@HISEQ:55:H80HWADXX:2:1101:1341:2054 3:N:0:
GGGGGATCGACCGCTCCCGGAATTCGGTGGAAGCTGCTGCGGT
+
CCCFFFFFHHHHHJJJJJJJJJJJJJJJJJJJJJJJJJJJHHD
@HISEQ:55:H80HWADXX:2:1101:1259:2056 3:N:0:
GGCGAGGGCCGCGTCATTGCGGCGTAGCGTGGACGCGATGTTG
+
CCCFFFFFHHHHHIIJJJJJJJJJIGHFFFDDDDDDDDDDDDD

Each record consists of four lines: identifier, sequence, comment (+) and quality.
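
The four-line record layout (identifier, sequence, comment, quality) can be read with a few lines of code. The sketch below is illustrative only; it assumes Phred+33 quality encoding, which the slide does not state, and the example record is the first 10 bases of the first read shown above.

```python
from typing import Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    identifier: str
    sequence: str
    comment: str
    qualities: List[int]


def parse_fastq(lines: List[str], offset: int = 33) -> Iterator[FastqRecord]:
    """Yield one record per group of four non-empty lines.

    offset=33 assumes Phred+33 encoding (an assumption, not stated on the slide).
    """
    rows = [line.strip() for line in lines if line.strip()]
    if len(rows) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of four lines")
    for i in range(0, len(rows), 4):
        header, seq, comment, qual = rows[i:i + 4]
        if not header.startswith("@") or not comment.startswith("+"):
            raise ValueError(f"malformed record starting at row {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        yield FastqRecord(header[1:], seq, comment, [ord(c) - offset for c in qual])


# First 10 bases and quality characters of the first read on the slide
example = [
    "@HISEQ:55:H80HWADXX:2:1101:1341:2054 1:N:0:",
    "GCTCAATTCG",
    "+",
    "CCCFFFFFHH",
]
record = next(parse_fastq(example))
print(record.identifier)
print(record.sequence, record.qualities)
```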

Page 11: Master Thesis Presentation

METHODS - IDENTIFICATION OF VARIANTS

[Figure: Quality scores across all bases. x-axis: position in read (bp); y-axis: quality score]
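
A plot of quality scores across all bases summarises, for every read position, the Phred scores of all reads at that position. A minimal sketch of that summary and of a simple read-level filter follows; the Phred+33 offset and the threshold of 20 are assumptions for illustration, not values taken from the thesis.

```python
from typing import List


def mean_quality_per_position(quality_strings: List[str], offset: int = 33) -> List[float]:
    """Mean Phred score at each read position (reads may differ in length)."""
    totals: List[int] = []
    counts: List[int] = []
    for qual in quality_strings:
        for pos, char in enumerate(qual):
            if pos == len(totals):
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(char) - offset
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]


def passes_filter(quality_string: str, min_mean: float = 20.0, offset: int = 33) -> bool:
    """Keep a read only if its mean Phred score reaches min_mean (hypothetical threshold)."""
    scores = [ord(c) - offset for c in quality_string]
    return bool(scores) and sum(scores) / len(scores) >= min_mean


print(mean_quality_per_position(["CCFH", "FFH"]))
print(passes_filter("IIII"), passes_filter("####"))
```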

Page 12: Master Thesis Presentation

METHODS - IDENTIFICATION OF VARIANTS

[Figure: alignment view. Labels: reference genome position, mapped reads, coverage]
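
Coverage at a reference position is the number of mapped reads that overlap it. The sketch below computes it from (start, length) pairs with a difference array. It assumes gap-free alignments and 0-based positions, which is a simplification of what an aligner such as BWA-MEM actually reports.

```python
from typing import List, Tuple


def coverage_per_position(alignments: List[Tuple[int, int]], genome_length: int) -> List[int]:
    """Depth at each 0-based reference position.

    alignments holds (start, length) pairs for reads mapped without gaps.
    """
    diff = [0] * (genome_length + 1)
    for start, length in alignments:
        if start < 0 or start >= genome_length:
            continue  # read falls outside the reference
        end = min(start + length, genome_length)
        diff[start] += 1   # depth rises where the read begins
        diff[end] -= 1     # and falls just past its last base
    depth, running = [], 0
    for delta in diff[:genome_length]:
        running += delta
        depth.append(running)
    return depth


print(coverage_per_position([(0, 4), (2, 4), (3, 5)], genome_length=10))
```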

Page 13: Master Thesis Presentation

METHODS - IDENTIFICATION OF VARIANTS

[Figure: duplicate reads]
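
Duplicate reads are read pairs that map to the same positions in the same orientation, typically because one library fragment was amplified several times. The sketch below shows the idea in a simplified form: it groups alignments by position and strand and keeps the one with the highest base-quality sum. It is not Picard's implementation, and the `Alignment` type is hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class Alignment(NamedTuple):
    name: str
    start: int             # leftmost mapped position of the read
    mate_start: int        # leftmost mapped position of its mate
    strand: str            # "+" or "-"
    base_quality_sum: int


def remove_duplicates(alignments: List[Alignment]) -> List[Alignment]:
    """Keep, for each (start, mate_start, strand) group, the read with the highest quality sum."""
    best: Dict[Tuple[int, int, str], Alignment] = {}
    for aln in alignments:
        key = (aln.start, aln.mate_start, aln.strand)
        if key not in best or aln.base_quality_sum > best[key].base_quality_sum:
            best[key] = aln
    return sorted(best.values(), key=lambda a: (a.start, a.name))


reads = [
    Alignment("read_a", 100, 350, "+", 3500),
    Alignment("read_b", 100, 350, "+", 3900),  # duplicate of read_a, better quality
    Alignment("read_c", 120, 360, "+", 3700),
]
print([aln.name for aln in remove_duplicates(reads)])
```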

Page 14: Master Thesis Presentation

METHODS - IDENTIFICATION OF VARIANTS

[Figure: SNP compared with the reference base]
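
A SNP is called where enough of the reads covering a position disagree with the reference base. The following sketch shows that decision for a single pileup column. The thresholds are hypothetical values chosen for illustration and are not the settings used with VarScan in the thesis.

```python
from collections import Counter
from typing import Optional


def call_snp(ref_base: str, read_bases: str, min_coverage: int = 8,
             min_var_freq: float = 0.8) -> Optional[str]:
    """Return the variant base if it is supported strongly enough, else None.

    min_coverage and min_var_freq are hypothetical thresholds.
    """
    bases = [b for b in read_bases.upper() if b in "ACGT"]
    if len(bases) < min_coverage:
        return None  # too few reads to make a call
    counts = Counter(b for b in bases if b != ref_base.upper())
    if not counts:
        return None  # every read agrees with the reference
    alt, alt_count = counts.most_common(1)[0]
    return alt if alt_count / len(bases) >= min_var_freq else None


print(call_snp("C", "TTTTTTTTTC"))  # nine of ten reads carry T
print(call_snp("C", "CCCCCCCCCT"))  # a single T is treated as noise
```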

Page 15: Master Thesis Presentation

METHODS - IDENTIFICATION OF VARIANTS

VCF File format
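
A VCF data line is tab-separated, with the fixed columns CHROM, POS, ID, REF, ALT, QUAL, FILTER and INFO. A minimal parser for those columns might look as follows; the example line is made up for illustration and is not taken from the thesis data.

```python
from typing import Dict, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int            # 1-based position on the reference
    ref: str
    alt: str
    info: Dict[str, str]


def parse_vcf_line(line: str) -> Variant:
    """Parse the eight fixed columns of one VCF data line."""
    chrom, pos, _id, ref, alt, _qual, _filter, info = line.rstrip("\n").split("\t")[:8]
    info_dict: Dict[str, str] = {}
    for field in info.split(";"):
        if field and field != ".":
            key, _, value = field.partition("=")
            info_dict[key] = value
    return Variant(chrom, int(pos), ref, alt, info_dict)


# Made-up example line, not taken from the thesis data
line = "chr1\t1234\t.\tC\tT\t.\tPASS\tADP=95;HOM=1"
print(parse_vcf_line(line))
```

Header lines, which start with `#`, would be skipped before calling the parser.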

Page 16: Master Thesis Presentation

METHODS – CLUSTERING OF ISOLATES

- A phylogenetic tree can be drawn from a custom-generated FASTA sequence containing only the identified SNPs

[Figure: phylogenetic tree, with a magnified version of the constructed tree]

Custom-generated FASTA sequence

>Reference
ATCGCCCTA
>isolate1
TTGGCCGTA
>isolate2
AACTCAGTT
>isolate3
AACGACGTA
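
A SNP-only FASTA of this kind can be built by taking every position that is variable in at least one isolate and writing, per isolate, the base it carries there. The sketch below is one possible way to do this. The positions are hypothetical, and treating a position without a called SNP as the reference base is an assumption, since it ignores positions with too little coverage.

```python
from typing import Dict, List


def snp_fasta(reference: Dict[int, str], isolates: Dict[str, Dict[int, str]]) -> str:
    """Write, per isolate, the base at every position that is variable in any isolate.

    reference maps position -> reference base; isolates maps name -> {position: variant base}.
    Positions without a called SNP in an isolate take the reference base (an assumption).
    """
    positions: List[int] = sorted({p for snps in isolates.values() for p in snps})
    records = [">Reference", "".join(reference[p] for p in positions)]
    for name in sorted(isolates):
        snps = isolates[name]
        records.append(f">{name}")
        records.append("".join(snps.get(p, reference[p]) for p in positions))
    return "\n".join(records) + "\n"


# Hypothetical positions; the bases mirror the first three columns of the slide's example
reference_bases = {10: "A", 25: "T", 40: "C"}
isolate_snps = {
    "isolate1": {10: "T", 40: "G"},
    "isolate2": {25: "A"},
}
print(snp_fasta(reference_bases, isolate_snps), end="")
```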

- The FastTree tool can generate approximately-maximum-likelihood phylogenetic trees

Page 17: Master Thesis Presentation

METHODS – ANALYSIS OF TRANSMISSION CHAIN

Minimum Spanning Tree

Algorithms available:

Kruskal's algorithm

Prim's algorithm

Reverse-delete algorithm

Applications

Taxonomy

Cluster Analysis
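
As an illustration of how a minimum spanning tree can be built from SNP data, the sketch below runs Kruskal's algorithm on pairwise Hamming distances between SNP sequences. The four sequences are the ones from the custom-generated FASTA example on the clustering slide. Using the Hamming distance as the edge weight is an assumption, since the slides do not say which distance was used.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(seqs: Dict[str, str]) -> List[Tuple[str, str, int]]:
    """Kruskal's algorithm on the complete graph of pairwise SNP distances."""
    parent = {name: name for name in seqs}

    def find(node: str) -> str:
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    edges = sorted(
        (hamming(seqs[a], seqs[b]), a, b) for a, b in combinations(sorted(seqs), 2)
    )
    tree = []
    for dist, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # edge joins two components, so it cannot close a cycle
            parent[root_a] = root_b
            tree.append((a, b, dist))
    return tree


sequences = {
    "Reference": "ATCGCCCTA",
    "isolate1": "TTGGCCGTA",
    "isolate2": "AACTCAGTT",
    "isolate3": "AACGACGTA",
}
for a, b, dist in minimum_spanning_tree(sequences):
    print(a, b, dist)
```

Each edge of the resulting tree links two isolates that differ by few SNPs, which is what makes the tree a candidate picture of a transmission chain.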

Page 18: Master Thesis Presentation

METHODS – ANALYSIS OF RD4 REGION

Coverage Plot
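
A coverage plot over the RD4 region shows whether reads map there; little or no coverage suggests that the region is missing from the isolate. The sketch below expresses that check as a comparison of mean depth inside and outside a region. The coordinates and the ratio threshold are hypothetical, because the real RD4 coordinates depend on the reference genome used.

```python
from statistics import mean
from typing import List


def region_is_deleted(depth: List[int], start: int, end: int, max_ratio: float = 0.1) -> bool:
    """True if mean depth in [start, end) is below max_ratio times the mean depth elsewhere.

    start, end and max_ratio are hypothetical illustration values.
    """
    if not 0 <= start < end <= len(depth):
        raise ValueError("region must lie inside the depth vector")
    inside = depth[start:end]
    outside = depth[:start] + depth[end:]
    if not outside or mean(outside) == 0:
        raise ValueError("no covered positions outside the region to compare against")
    return mean(inside) < max_ratio * mean(outside)


# Toy depth vectors: one with a gap in coverage, one evenly covered
with_gap = [100] * 50 + [0] * 20 + [100] * 30
even = [100] * 100
print(region_is_deleted(with_gap, 50, 70), region_is_deleted(even, 50, 70))
```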