Bioinformatic Analyses of Whole-Genome Sequence Data in a Public Health Laboratory
InFORM 2017
Garden Grove, CA
Kelly F. Oakeson, Ph.D.
Utah Public Health Laboratory (UPHL) Bioinformatic Workflow
Computational Requirements & Throughput
Oakeson KF, Wagner JM, Mendenhall M, Rohrwasser A, Atkinson-Dunn R.
Bioinformatic Analyses of Whole-Genome Sequence Data in a Public Health
Laboratory. Emerging Infect Dis. 2017 Sep;23(9):1441–5.
Analysis Workflow
Sequence QC → High Quality Sequence → De novo Genome Assembly → Draft Genome Sequence → Annotation → Draft Genome Annotation → Phylogenetic Relationships → Phylogenetic Tree Construction
Sequence QC with SeqyClean
Ilya Y. Zhbannikov, Samuel S. Hunter, James A. Foster, and Matthew L. Settles. 2017. SeqyClean: A Pipeline for High-throughput Sequence Data Preprocessing.
In Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics (ACM-BCB '17). ACM, New York,
NY, USA, 407-416. DOI: https://doi.org/10.1145/3107411.3107446
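SeqyClean's own preprocessing steps are described in the paper cited above. As a generic illustration of what read-level quality trimming involves, the minimal sketch below trims the 3' end of a single read with a sliding-window mean-quality rule. It is not SeqyClean's algorithm: the function names, the Phred+33 encoding, the 4-base window, the Q20 threshold, and the 20-base minimum length are all assumptions chosen for illustration.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into integer Phred scores.

    Assumes Phred+33 encoding (offset 33), as used by current Illumina data.
    """
    return [ord(ch) - offset for ch in quality]


def sliding_window_trim(seq: str, quality: str, window: int = 4,
                        min_mean: float = 20.0, min_len: int = 20):
    """Trim a read at the first window whose mean quality falls below min_mean.

    Returns the (sequence, quality) pair that is kept, or None if the trimmed
    read is shorter than min_len and should be discarded. Window size, Q20
    threshold and minimum length are illustrative assumptions, not SeqyClean
    settings.
    """
    if len(seq) != len(quality):
        raise ValueError("sequence and quality strings must be the same length")
    scores = phred_scores(quality)
    cut = len(seq)  # keep the whole read unless a low-quality window is found
    for start in range(len(seq) - window + 1):
        if sum(scores[start:start + window]) / window < min_mean:
            cut = start  # drop everything from the start of the bad window
            break
    if cut < min_len:
        return None
    return seq[:cut], quality[:cut]


if __name__ == "__main__":
    read = "ACGT" * 10
    qual = "I" * 30 + "#" * 10  # Q40 for 30 bases, then Q2 for the 3' tail
    print(sliding_window_trim(read, qual))
```

Cutting at the start of the first failing window is deliberately conservative; a real trimmer would typically also handle adapter removal and paired-end bookkeeping, which this sketch ignores.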
De novo Genome Assembly with SPAdes
Figure: determine sequence overlap → assembled overlapping sequence → assembled draft genome sequence
Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, et al. SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell
Sequencing. J Comput Biol. 2012 May;19(5):455–77.
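The slide depicts assembly as finding overlaps between reads and merging them into longer sequences. SPAdes itself builds de Bruijn graphs from k-mers of several sizes rather than merging reads pairwise, so the toy greedy overlap assembler below only illustrates the overlap idea shown in the figure. The function names and the 3-base minimum overlap are hypothetical choices, and the greedy strategy does not cope with sequencing errors, repeats, or reads contained inside other reads.

```python
def overlap_length(left: str, right: str, min_overlap: int) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Overlaps shorter than min_overlap are ignored (returns 0).
    """
    for length in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Toy greedy assembler: repeatedly merge the pair with the longest overlap.

    Stops when no pair overlaps by at least min_overlap and returns the
    remaining sequences as contigs. Not SPAdes' de Bruijn graph approach.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                olen = overlap_length(left, right, min_overlap)
                if olen > best_len:
                    best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
    print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

Greedy merging is easy to follow but can commit early to a wrong join when repeats are longer than the reads, which is one reason production assemblers such as SPAdes use graph-based methods instead.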
Draft Genome Annotation with Prokka
Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. Oxford University Press; 2014 Jul 15;30(14):2068–9.
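Prokka wraps several external tools and, for protein-coding genes, relies on Prodigal. The minimal sketch below shows only the most basic idea behind coding-sequence calling: scanning the forward strand in all three reading frames for ATG-to-stop open reading frames. It is not how Prodigal or Prokka predict genes (they also scan the reverse strand, allow alternative start codons, and score candidates with trained models); the function name and the 100-codon minimum are assumptions for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_forward_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """Find ATG-initiated open reading frames on the forward strand only.

    Returns (start, end, frame) tuples with 0-based, end-exclusive coordinates
    that include the stop codon. min_codons counts the stop codon too.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos + 3 - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None  # close the ORF and look for the next ATG
    return orfs


if __name__ == "__main__":
    contig = "CC" + "ATG" + "AAA" * 3 + "TAA" + "GG"
    print(find_forward_orfs(contig, min_codons=2))  # [(2, 17, 2)]
```

ORFs that run off the end of a contig without a stop codon are not reported here; in fragmented draft assemblies such partial genes are common, and real annotation pipelines handle them explicitly.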
Phylogenetic Analysis with Roary
Page AJ, Cummins CA, Hunt M, Wong VK, Reuter S, Holden MTG, et al. Roary: rapid large-scale prokaryote pan genome analysis.
Bioinformatics. Oxford University Press; 2015 Nov 15;31(22):3691–3.
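Roary clusters annotated genes across isolates and labels each gene cluster as core or accessory according to how many isolates carry it; when a core gene alignment is requested, the aligned core genes are concatenated into the input for tree building. The sketch below mimics only those two bookkeeping steps on a precomputed presence/absence table and precomputed per-gene alignments. The data structures and function names are hypothetical, and the 99% core threshold is an assumption intended to mirror Roary's documented default.

```python
def classify_genes(presence: dict[str, set[str]], n_isolates: int,
                   core_percent: int = 99) -> tuple[set[str], set[str]]:
    """Split gene clusters into core and accessory sets.

    presence maps a (hypothetical) gene-cluster name to the isolates carrying
    it. A cluster is core if it occurs in at least core_percent % of isolates;
    the 99 % default is an assumption meant to mirror Roary's core definition.
    """
    core, accessory = set(), set()
    for gene, isolates in presence.items():
        # integer arithmetic avoids floating-point rounding at the threshold
        if len(isolates) * 100 >= core_percent * n_isolates:
            core.add(gene)
        else:
            accessory.add(gene)
    return core, accessory


def concatenate_core(alignments: dict[str, dict[str, str]], core: set[str],
                     isolates: list[str]) -> dict[str, str]:
    """Concatenate per-gene alignments of core genes into one sequence per isolate.

    alignments[gene][isolate] holds that isolate's aligned (gapped) sequence.
    Genes are joined in sorted order so every isolate uses the same layout.
    """
    return {iso: "".join(alignments[gene][iso] for gene in sorted(core))
            for iso in isolates}


if __name__ == "__main__":
    presence = {"gyrA": {"i1", "i2"}, "recA": {"i1", "i2"}, "blaX": {"i1"}}
    core, accessory = classify_genes(presence, n_isolates=2)
    alignments = {"gyrA": {"i1": "AC-T", "i2": "ACGT"},
                  "recA": {"i1": "GGA", "i2": "GTA"}}
    print(sorted(core), sorted(accessory))
    print(concatenate_core(alignments, core, ["i1", "i2"]))
```

Joining genes in a fixed order matters because the phylogenetic step compares the concatenated sequences column by column.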
Phylogenetic Analysis with RAxML
Stamatakis A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 2006;22(21):2688–90.
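RAxML infers maximum-likelihood trees under nucleotide substitution models such as GTR with gamma-distributed rate heterogeneity, which is far beyond a short example. To make the idea of a substitution model concrete, the sketch below instead computes the Jukes-Cantor (JC69) corrected distance between two aligned sequences, d = -(3/4) ln(1 - 4p/3), where p is the proportion of differing sites. This is a pairwise distance calculation, not maximum-likelihood tree search, and skipping gaps and ambiguous bases is an assumption.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Only columns where both sequences have an unambiguous base (A, C, G, T)
    are compared; gaps and ambiguity codes are skipped (an assumption).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            differing += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def jc69_distance(a: str, b: str) -> float:
    """Jukes-Cantor (1969) corrected distance: d = -3/4 * ln(1 - 4/3 * p)."""
    p = p_distance(a, b)
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1 - 4 * p / 3)


if __name__ == "__main__":
    print(round(jc69_distance("ACGTACGTAC", "ACGTACGTAA"), 4))  # 0.1073
```

The correction grows faster than the raw proportion p because it accounts for multiple substitutions at the same site; richer models such as GTR relax JC69's assumption of equal base frequencies and equal substitution rates.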
Campylobacter jejuni
• May 2014: three confirmed cases of C. jejuni infection
• Identical PFGE patterns
• All patients reported consuming raw milk from dairy “A”
• Additional cases identified during May and June
• Outbreak investigation initiated June 10, 2014
• Total of 99 cases
Davis KR, Dunn AC, Burnett C, McCullough L, Dimond M, Wagner J,
et al. Campylobacter jejuni Infections Associated with Raw Milk
Consumption--Utah, 2014. MMWR Morb Mortal Wkly Rep. 2016 Apr
1;65(12):301–5.
Campylobacter jejuni PFGE
• PFGE performed on 79 isolates
• 61 patient-derived isolates
• 18 isolates derived from bulk milk storage tanks
• 76 of 79 isolates have indistinguishable SmaI PFGE patterns
Salmonella enterica
• Complex Multistate Outbreak
• Associated with Rotisserie Chicken
• Five Distinct PFGE Patterns
• 88 Isolates in Total
• 80 Patient-Derived Isolates
• 8 Environmental Isolates
• Sequence Data Obtained from the NCBI Sequence Read Archive (SRA)
Analysis Pipeline
Sequence QC → High Quality Sequence → De novo Genome Assembly → Draft Genome Sequence → Annotation → Draft Genome Annotation → Phylogenetic Relationships and Signatures of Selection → Phylogenetic Tree Construction and Phylogenetic Analysis