
Edge Implementation with Intel® NUC Mini PCs for MDR, XDR ...

Jun 06, 2022


White Paper

Edge Implementation with Intel® NUC Mini PCs for MDR, XDR and XXDR Tuberculosis Genome Informatics Pipeline

About HaystackAnalytics® Private Limited

HaystackAnalytics® Private Limited is a MedTech startup based out of the Society for Innovation and Entrepreneurship, the business incubator at the Indian Institute of Technology Bombay, Mumbai. Dr. Anirvan Chatterjee, Mr. Gaurav Srivastav and Prof. Kiran Kondabagil co-founded HaystackAnalytics® with the core idea of research-driven value creation in the space of genomics analysis.

HaystackAnalytics® created an automated genomic data analysis, inferencing and clinical report generation platform (Ω-Suite). It is a plug-and-play SaaS with a graphical user interface that enables non-bioinformaticians to generate clinical reports from raw next generation sequencing (NGS) data.

The Ω-Suite is a B2B product intended for the use of genomics in pathology laboratories (standalone; multi-geography chains; in-hospital diagnostics) to make fast and comprehensive clinical decisions, while substantially reducing the risk of misdiagnosis.

Intel Startup Program

Intel Startup Program is a coveted innovation platform for deep tech startups, aimed at providing access to advanced technologies, enabling design, and assisting with scale and go-to-market (GTM). While the Ω-Suite was developed as a cloud native SaaS, after initial deployments it was clear that the very high raw NGS data output created huge latency in data upload due to poor bandwidth availability. Furthermore, incomplete data upload increased the risk of data corruption. Thus, in the Intel Startup Program, with mentorship from Intel engineers with significant experience in computational architecture for genomic analysis, HaystackAnalytics® set out to create an edge device for genomic analysis of small genomes (bacteria, viruses and other pathogens).

The Intel® Next Unit of Computing (NUC) was selected as the computing platform due to its low power requirement and compatibility with the latest generation of Intel® Core™ processors. During the acceleration program, HaystackAnalytics® optimized the Ω-Suite to run non-batched sample analysis, thereby enabling parallelisation of individual steps in the pipeline. As a further optimization, the latest Intel optimized genomic tools were tested, validated and substituted into the Ω-Suite. Finally, OS-level optimization was applied, in which other system processes were blocked to give the Ω-Suite maximum computing time.

At the end of the accelerator program, HaystackAnalytics® was able to reproducibly achieve an analysis time of 13–15 minutes on ΩTB® (whole genome sequence analysis of Mycobacterium tuberculosis on the Ω-Suite) using up to 0.5 GB of paired-end Illumina NGS reads.

Abstract

The high proportion of drug resistance among Tuberculosis (TB) patients in India¹ and elsewhere² necessitates early and rapid drug susceptibility testing in TB diagnostic facilities. Genomic analysis of TB whole genome sequencing (WGS) data has emerged as the gold standard for comprehensive drug susceptibility testing (DST) for TB worldwide³⁻⁶. However, the need for high performance computing (HPC) to run validated genomic analysis pipelines is a bottleneck for the uptake of WGS-based TB DST. Here we describe the edge implementation of automated genomic DST for TB samples by the ΩTB® software created by HaystackAnalytics®, using Intel® NUC Mini PCs.

Authors

Dr. Anirvan Chatterjee, HaystackAnalytics®

Dr. Ramanathan Sethuraman, Intel India

Lakshminarasimhan Ranganathan, Intel India

Healthcare Analytics

Table of Contents

About HaystackAnalytics® Private Limited

Intel Startup Program

Abstract

Combating Tuberculosis Drug Resistance is Essential for the Success of the National TB Elimination Program

TB Genomics - The Global Gold Standard for DR-TB Diagnosis

TB Genomics – The Most Comprehensive, Rapid and Economical Diagnosis for MDR, XDR and XXDR TB

Edge Implementation of Genomic DST for TB

Future developments on the Integrated Genomic Analysis Platform

References

Appendix

Combating Tuberculosis Drug Resistance is Essential for the Success of the National TB Elimination Program

Tuberculosis (TB) patients who do not respond to therapy due to drug resistance actively transmit the disease in the community. By recent estimates, 75% of drug resistant (DR) TB cases remain undiagnosed⁶, and each undiagnosed DR-TB patient can cause, on average, 2-3 percent new DR-TB cases⁷.

Despite an aggressive TB control program in India, DR TB has risen sharply over the past decade.

The most worrying aspect is the increase in the number of antibiotics to which resistance has been detected, wherein Multidrug Resistant (MDR) TB has been overshadowed by the increase in Extensively Drug Resistant (XDR), Extremely Drug Resistant (XXDR) TB and Totally Drug Resistant (TDR) TB⁸.

The inability to diagnose DR-TB early in the diagnostic cycle is the most challenging part of treating DR-TB. Existing approaches in the Programmatic Management of DR-TB (PMDT⁹) can take longer than 3 weeks, and up to 3 months, to detect DR-TB, causing poor diagnostic outcomes and significantly increasing the proportion of DR-TB. Therefore, until an efficient and comprehensive diagnostic test is adopted nationally, existing procedures will not be sufficient to achieve TB elimination goals.

TB Genomics - The Global Gold Standard for DR-TB Diagnosis

Genomic prediction of TB drug resistance has revolutionised DR-TB diagnosis globally. The National Health Service (NHS) in the UK employed genomics in TB diagnosis in 2015¹⁰, after which the World Health Organisation released technical guidelines for using genomics for DR-TB diagnosis.

While current methods use 11 different tests to detect resistance to 13 drugs over a period of several months, genomics can provide a drug resistance profile for 17 antibiotics in one test within eight hours¹¹. Clinical studies from Mumbai have already shown that the use of genomics can significantly reduce the TB burden in India⁵.

TB Genomics – The Most Comprehensive, Rapid and Economical Diagnosis for MDR, XDR and XXDR TB

Most Affordable Complete DST for DR-TB

A new pilot by HaystackAnalytics® Private Limited (a startup supported by the Dept. of Science and Technology¹³, BIRAC¹⁴ and Plugin¹⁵) for the National Health Mission¹² in Mumbai showed that while current diagnostic methods can cost up to INR 18000 per sample, genomics-based drug resistance profiling can be conducted at 30% lower cost per sample. HaystackAnalytics® services have been registered on the GeM portal¹⁶ and are ready for deployment anywhere in India.

Exponential Year-on-Year Reduction in TB Expenditure

By diagnosing DR-TB patients within a few days, the number of new DR-TB infections is expected to fall by an estimated factor of 0.30. Given that the cost of treating an XDR TB patient is 5x that of a normal TB patient, genomics-based DST will enable a rapid year-on-year reduction in the cost of diagnosis and therapy.

ΩTB® Works on Culture and Direct Sputum

The first deployment of ΩTB®, supported by the Maharashtra State Innovation Society¹⁷ and approved by the Central TB Division¹⁸, was used for genomic diagnosis of DR-TB from TB cultures. A recent deployment of ΩTB®, supported by the Indian Council for Medical Research¹⁹ (ICMR) at the Hinduja Hospital²⁰, analysed genomic data directly from sputum to provide the first genomic DST of TB from sputum in India. The data has been forwarded to ICMR and the TB RePORT consortia²¹.

Intuitive and Easy-to-Use Genomic Analysis

As a SaaS product, ΩTB® does not require up-skilling in existing TB laboratories. It only requires access to genome sequencers. The web interface lets the user pick the samples to be analysed, and the raw sequencing data is then processed according to the GATK²² best practices guide and the peer-validated genomic research pipeline³,⁵.

From Sample to DST Report in 8 Hours

Implementation of genomic DST for TB in accredited TB DST laboratories can enable comprehensive diagnosis within one working day¹¹, as against the current diagnostic turnaround time (TAT) of greater than 3 weeks.

Edge Implementation of Genomic DST for TB

Genomic DST for TB in 15 Minutes

When HaystackAnalytics® started working with Intel to optimize the ΩTB® genomic analytical pipeline, it took ~60 minutes to analyse one sample. Our goal was to reduce this time by half. ΩTB® is a validated genomic pipeline which automates the use of standard genomic libraries³,⁵.

To optimize the computation-hungry genomic analytics pipeline for the αBox (a small computing device built on Intel® NUC Mini PCs), tools such as Intel® VTune™ Profiler²² were used to identify critical hotspots and bottlenecks. Replacing standard libraries with Intel optimized libraries reduced per-sample analysis time by 25%. The pipeline was then further optimized to maximise parallelisation of libraries that are benchmarked for multithreading, while the others were queued, resulting in optimal peak performance from the CPU.

The per-sample analysis time was reduced to 15 minutes (a 4X performance improvement over the baseline). Other Intel optimized libraries are being explored to reduce this time further, which will in turn reduce batch processing time. Overall, these optimizations have considerable potential to reduce the cost of analysing samples while improving analysis throughput.
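The timings quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below uses only the three figures reported here (the ~60 minute baseline, the 25% reduction from Intel optimized libraries, and the final 15 minutes). The variable names are illustrative, and the share attributed to the later optimizations is inferred from those figures rather than reported separately.

```python
# Figures taken from the text; names are illustrative only.
baseline_min = 60.0        # approximate per-sample time before optimization
library_reduction = 0.25   # reduction reported after adopting Intel optimized libraries
final_min = 15.0           # per-sample time reported after all optimizations

# Time after the library swap alone.
after_libraries_min = baseline_min * (1 - library_reduction)

# Overall speedup, and the speedup left for the later pipeline optimizations to explain.
overall_speedup = baseline_min / final_min
remaining_speedup = after_libraries_min / final_min

print(f"After library swap: {after_libraries_min:.0f} min")
print(f"Overall speedup: {overall_speedup:.0f}X")
print(f"Speedup from later optimizations: {remaining_speedup:.0f}X")
```

On these numbers, the library swap brings the run to about 45 minutes, so the remaining optimizations together account for roughly a further 3X.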

Salient Features of ΩTB®:

Fully automated

Antibiotics tested: 17

Turn Around Time: 15 minutes

WHO validated mutation catalogue

Only test to detect and quantify mixed infections

One test for strain classification, drug susceptibility testing and co-infections

No requirement for internet connectivity

Registered on GeM

Made in India

Intel and HaystackAnalytics® have collaborated to create the first edge computing device (αBox), which can be deployed countrywide at minimal cost. Further, the ΩTB® software created by HaystackAnalytics®, loaded on the αBox, provides antibiotic profiling for suspected DR-TB cases within 15 minutes.

ΩTB® has been validated with the WHO genomic guidelines for 17 antibiotics. The αBox-based ΩTB® genomic testing solution is the world's first edge computing device for genomics-based DST for TB. In addition to being a shot in the arm for achieving the elimination goals, the implementation of this device in the National TB Elimination Program would also establish India's commitment to eliminating TB at the global level.

Process

The genomics process, as detailed in the figure below, is broadly split into DNA sequencing using next generation sequencing (NGS) and NGS data analysis. While NGS platforms are widely available commercially, analysis of NGS data is contingent on the availability of large teams of highly specialised scientists and engineers, resulting in non-standardised analytical pipelines.

The αBox-based ΩTB® genomic testing provides an integrated solution which automates computation, inferencing and report generation.

The genomic analytical pipeline enabled on ΩTB® is based on the pipelines implemented by Chatterjee et al.⁵ and by Pankhurst et al.¹⁰. As per GATK best practices, reference-based analysis of NGS data involves quality filtering, reference mapping, pileup, variant calling and annotation. While there are more than 100 different open source libraries available for each of the steps, the choice and combination of libraries determines the speed of analysis and the accuracy of variant calling.

Here, the raw FASTQ NGS data is first processed through a three-step quality control: de-duplication, filtering for Phred score >30, and metagenomic read binning for the Mycobacterium tuberculosis complex (MTBC). The binned reads are mapped to the H37Rv reference genome using both BWA-MEM and GATK. The variants from the two mappings are reconciled to create the final variant list in the SAMtools Variant Call Format (VCF). The variants are then annotated against the ΩTB® resistance catalogue.

The resistance catalogue is manually curated and updated by HaystackAnalytics® based on variants reported in the literature which show association with phenotypic drug resistance testing⁵,⁷, and by phylogenomic inferencing of novel mutations detected in clades previously annotated with drug resistance⁵,⁶.
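The first two quality control steps can be illustrated with a minimal, standard-library Python sketch. It is not the ΩTB® implementation: it assumes Phred+33 quality encoding, treats "Phred score >30" as a per-read mean, and de-duplicates on exact sequence identity. Metagenomic binning is omitted, and all function names are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class Read(NamedTuple):
    name: str
    sequence: str
    quality: str  # quality string, assumed to be Phred+33 encoded


def parse_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Yield reads from well-formed 4-line FASTQ records."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # skip blank lines between records
        sequence = next(it)
        next(it)  # '+' separator line
        quality = next(it)
        yield Read(header[1:], sequence, quality)


def mean_phred(quality: str) -> float:
    """Mean Phred score of a read, assuming Phred+33 encoding."""
    if not quality:
        return 0.0
    return sum(ord(ch) - 33 for ch in quality) / len(quality)


def deduplicate(reads: Iterable[Read]) -> Iterator[Read]:
    """Keep only the first read seen for each exact sequence (assumed criterion)."""
    seen = set()
    for read in reads:
        if read.sequence not in seen:
            seen.add(read.sequence)
            yield read


def quality_control(reads: Iterable[Read], min_mean_phred: float = 30.0) -> Iterator[Read]:
    """De-duplicate, then keep reads whose mean Phred score exceeds the threshold."""
    for read in deduplicate(reads):
        if mean_phred(read.quality) > min_mean_phred:
            yield read


# Toy input: r2 duplicates r1, and r3 has very low quality ('#' is Phred 2).
fastq_text = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nIIII\n@r3\nTTGA\n+\n####\n"
kept = [read.name for read in quality_control(parse_fastq(fastq_text.splitlines()))]
print(kept)
```

In this toy input only `r1` survives: `r2` is dropped as a duplicate and `r3` fails the quality threshold.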
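The reconciliation and annotation steps can be sketched as follows. The text does not state how the two variant lists are reconciled, so this sketch assumes a simple intersection (a variant is kept only if both mappings report it). The catalogue holds two widely reported resistance mutations purely as an illustration; it is not the ΩTB® catalogue, and every name in the code is hypothetical.

```python
from typing import Dict, List, Set, Tuple

Variant = Tuple[str, str]  # (gene, amino-acid change)

# Illustrative catalogue only, NOT the ΩTB® resistance catalogue.
ILLUSTRATIVE_CATALOGUE: Dict[Variant, str] = {
    ("rpoB", "S450L"): "RIF",  # rifampicin
    ("katG", "S315T"): "INH",  # isoniazid
}


def reconcile(calls_a: Set[Variant], calls_b: Set[Variant]) -> Set[Variant]:
    """Keep variants reported by both mappings (assumed reconciliation rule)."""
    return calls_a & calls_b


def annotate(variants: Set[Variant], catalogue: Dict[Variant, str]) -> Dict[str, List[Variant]]:
    """Group catalogued variants by the drug they are associated with."""
    profile: Dict[str, List[Variant]] = {}
    for variant in sorted(variants):
        drug = catalogue.get(variant)
        if drug is not None:
            profile.setdefault(drug, []).append(variant)
    return profile


# Hypothetical call sets from the two mappings; ("geneX", "A1B") is a placeholder.
calls_first = {("rpoB", "S450L"), ("katG", "S315T"), ("geneX", "A1B")}
calls_second = {("rpoB", "S450L"), ("geneX", "A1B")}

final_variants = reconcile(calls_first, calls_second)
profile = annotate(final_variants, ILLUSTRATIVE_CATALOGUE)
print(profile)
```

Under the intersection rule, the `katG` call is discarded because only one mapping reports it, and the placeholder variant is kept but carries no annotation because it is not in the catalogue. A union rule, or a rule weighted by read support, would be an equally plausible reading of "reconciled".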

Future developments on the Integrated Genomic Analysis Platform

Future development of the αBox-based ΩTB® includes ISO27001 and CE certification. Current R&D pipeline at HaystackAnalytics® includes three products, ΩID (one test for detecting 200+ pathogens with antibiotic resistance profile for bacteria), ΩGut (microbiome based analysis of metabolic and immune disorders) and ΩOnco (automation of onco-genomics).

[Figure: ΩTB® process. DNA sequencing on a third-party next generation sequencer feeds the αBox, which performs computation, inferencing and reporting. The sample report shows TB diagnosis (XDR), sample ID, sample source (sputum), M.tb detected (34%), mixed infection (56%) and strain (CAS-Delhi), with a drug panel of RIF, INH, EMB, PYZ, STREP, CAP, KAN, AMI, MOXO, OFLO, GATI, ETH, LZD, PAS, BDQ, CFZ and DLM. ΩTB™ is labelled Simple, Precise, Personalized, Automated and Validated. The ΩPlatform comprises ΩTB, ΩGut, ΩID and ΩOnco.]

References

1. Dalal A, Pawaskar A, Das M, et al. Resistance Patterns among Multidrug-Resistant Tuberculosis Patients in Greater Metropolitan Mumbai: Trends over Time. PLoS One 2015; 10: e0116798.

2. Phelan J, Coll F, McNerney R, et al. Mycobacterium tuberculosis whole genome sequencing and protein structure modelling provides insights into anti-tuberculosis drug resistance. BMC Med 2016; 14. DOI:10.1186/s12916-016-0575-9.

3. Coll F, Phelan J, Hill-Cawthorne GA, et al. Genome-wide analysis of multi- and extensively drug-resistant Mycobacterium tuberculosis. Nat Genet 2018. DOI:10.1038/s41588-017-0029-0.

4. Makhado NA, Matabane E, Faccin M, et al. Outbreak of multidrug-resistant tuberculosis in South Africa undetected by WHO-endorsed commercial tests: an observational study. Lancet Infect Dis 2018; 18: 1350–9.

5. Chatterjee A, Nilgiriwala K, Saranath D, Rodrigues C, Mistry N. Whole genome sequencing of clinical strains of Mycobacterium tuberculosis from Mumbai, India: A potential tool for determining drug-resistance and strain lineage. Tuberculosis 2017; published online Aug. DOI: 10.1016/j.tube.2017.08.002.

6. Farhat MR, Freschi L, Calderon R, et al. GWAS for quantitative resistance phenotypes in Mycobacterium tuberculosis reveals resistance genes and regulatory regions. Nat Commun 2019; 10: 2128.

7. Kendall EA, Azman AS, Cobelens FG, Dowdy DW. MDR-TB treatment as prevention: The projected population-level impact of expanded treatment for multidrug-resistant tuberculosis. PLoS One 2017; 12: e0172748.

8. Udwadia ZF. MDR, XDR, TDR tuberculosis: ominous progression. Thorax 2012; 67: 286–8.

9. PMDT: https://tbcindia.gov.in/index1.php?lang=1&level=1&sublinkid=4150&lid=2794

10. Pankhurst LJ, Del Ojo Elias C, Votintseva AA, et al. Rapid, comprehensive, and affordable mycobacterial diagnosis with whole-genome sequencing: a prospective study. Lancet Respir Med 2016; 4: 49–58.

11. Votintseva AA, Bradley P, Pankhurst L, et al. Same-Day Diagnostic and Surveillance Data for Tuberculosis via Whole-Genome Sequencing of Direct Respiratory Samples. J Clin Microbiol 2017; 55: 1285–98.

12. NHM: https://nhm.gov.in/

13. DST: https://dst.gov.in/

14. BIRAC: https://www.birac.nic.in/

15. Plugin: https://plugin.org.in/current-cohort

16. GeM portal: https://gem.gov.in/

17. Maharashtra State Innovation Society: https://www.msins.in/

18. Central TB Division: https://tbcindia.gov.in/

19. ICMR: https://www.icmr.gov.in/

20. Hinduja hospital: https://www.hindujahospital.com/about-us/tb-awards.html

21. TB RePORT Consortia: https://www.reportinternational.org/about

22. Libraries/Tools Used:

• BWA: http://bio-bwa.sourceforge.net/

• GATK: https://gatk.broadinstitute.org/hc/en-us

• BBMAP: https://sourceforge.net/projects/bbmap/

• PICARD: https://broadinstitute.github.io/picard/

• vcftools: http://vcftools.sourceforge.net/

• samtools: http://www.htslib.org/

• kraken: https://ccb.jhu.edu/software/kraken/

• BWA-Mem2: https://github.com/bwa-mem2/bwa-mem2

• Intel® Select Solutions for Genomics: https://www.intel.com/content/www/us/en/high-performance-computing/select-solutions-for-genomics-analytics.html

• Intel® VTune™ Profiler: https://software.intel.com/content/www/us/en/develop/tools/vtune-profiler.html

Appendix: Hardware Configuration

Platform: x86_64

CPU: Intel® Core™ i7-8559U

Cores/socket, Threads/socket: 4 cores/8 threads

Hyper Threading: Yes

Turbo: Yes

BIOS version (including microcode version: cat /proc/cpuinfo | grep microcode -m1): 0xd6

System DDR Mem Config (slots / cap / run-speed): DDR4-2400 SODIMM @ 1.2V

Total Memory/Node (DDR+DCPMM): 16 GB

Hard drive type and capacity: SSD, 256 GB, V-NAND SSD 970 EVO, NVMe M.2

Other HW (Accelerator): None

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.

Performance results are based on testing as of December 2020 and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure.

Your costs and results may vary.

Intel technologies may require enabled hardware, software or service activation.

Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.

HaystackAnalytics® disclaims all express and implied warranties whatsoever, including without limitation, the implied warranties of merchantability, non-infringement and fitness for any particular purpose. Further, HaystackAnalytics® will not be liable for any direct, indirect, special, incidental, punitive, or consequential damages of any kind.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.