Faculty of Medicine and Health Sciences

Genomic insights into the emergence and spread of ‘high-risk’

Klebsiella pneumoniae and Acinetobacter baumannii clones

Thesis submitted for the degree of doctor in Medical Sciences at the

University of Antwerp to be defended by

Mattia PALMIERI

Supervisors:

Prof. Herman Goossens

Prof. Alex van Belkum

Dr. Pieter Moons

Antwerp, 2020

Genomische inzichten in het ontstaan en de verspreiding van

“hoog-risico” Klebsiella pneumoniae en Acinetobacter baumannii

klonen

Doctoral committee:

Promotors:

Prof. Herman Goossens

Prof. Alex van Belkum

Dr. Pieter Moons

Counsellor:

Dr. Caroline Mirande

Internal jury, Universiteit Antwerpen:

Dr. Arvid Suls

Prof. Annelies Van Rie

External jury:

Prof. Christian Giske

Prof. Derrick Crook

Prof. Marco Maria D’Andrea

Index of contents

Abstract
Samenvatting
List of abbreviations
List of figures
Preface
CHAPTER 1 : General introduction and aims
    1.1 The antimicrobial resistance crisis
    1.2 The ESKAPE pathogens
    1.3 Whole Genome Sequencing (WGS): a disruptive diagnostic tool
    1.4 Aims
    1.5 References
CHAPTER 2 : Genomic epidemiology of carbapenem- and colistin-resistant Klebsiella pneumoniae isolates from Serbia: predominance of ST101 strains carrying a novel OXA-48 plasmid
    2.1 Abstract
    2.2 Introduction
    2.3 Materials and methods
    2.4 Results
    2.5 Discussion
    2.6 References
CHAPTER 3 : Abundance of colistin-resistant, OXA-23- and ArmA-producing Acinetobacter baumannii belonging to International Clone 2 in Greece
    3.1 Abstract
    3.2 Introduction
    3.4 Results
    3.5 Discussion
    3.6 References
CHAPTER 4 : Genomic evolution and local epidemiology of Klebsiella pneumoniae from the Beijing Hospital 301 over a fifteen-year period: dissemination of known and novel high-risk clones
    4.1 Introduction
    4.3 Results and discussion
    4.4 Conclusions
    4.5 References
CHAPTER 5 : Interpreting k-mer based signatures for antibiotic resistance prediction
    5.1 Abstract
    5.2 Introduction
    5.3 Methods
    5.4 Results
    5.5 Discussion
    5.6 References
CHAPTER 6 : PFM-like, a novel family of subclass B2 metallo β-lactamase from Pseudomonas synxantha belonging to the Pseudomonas fluorescens complex
    6.1 Abstract
    6.2 Main text
    6.3 Data availability
    6.4 References
CHAPTER 7 : Summary and perspectives
    7.1 Summary
    7.2 General discussion and future perspectives
    7.3 References
Acknowledgments

Abstract

While antibiotics still represent the major antibacterial agents for the treatment of bacterial infections, an increasing number of bacteria are becoming multidrug-resistant (MDR), complicating treatment. Carbapenems are highly effective antibiotics commonly used to treat severe infections caused by MDR bacteria that are resistant to first-line antibiotics. Of major concern, carbapenem resistance is on the rise, and in some countries it is so high that other drugs, usually reserved as last options, are widely used. For example, colistin, an old drug that was essentially unused due to its toxicity, is now commonly used in some countries, and resistance to this antibiotic is also on the rise.

Of the several pathogens associated with MDR, carbapenem-resistant K. pneumoniae and A. baumannii represent major concerns. Both pathogens frequently cause outbreaks of infections, and strains that are resistant to all available antibiotics are emerging. In K. pneumoniae, a novel kind of superbug has recently been emerging. MDR clones causing hospital outbreaks and hypervirulent, drug-susceptible clones causing severe community-acquired infections used to be two separate concerns, but strains showing convergence of the two traits are now emerging. Acquisition of hypervirulence genes by MDR clones and of resistance genes by hypervirulent clones has been observed, especially in Asia. Tracking the emergence and evolution of such novel clones, which cause severe infections with limited treatment options, is fundamental.

The decreasing cost of Whole Genome Sequencing (WGS) is allowing its increasing implementation in bacterial diagnostics. However, there is still a lack of surveillance investigations for last-line resistance mechanisms and for convergence of resistance and hypervirulence traits. Moreover, while phenotype prediction from genomic data has shown encouraging results, the understanding of the genetic mechanisms of resistance to some drugs, such as colistin, is still limited, and novel in silico tools for phenotype prediction are needed.

We employed WGS and bioinformatics, together with phenotypic techniques, to address different problems: i) to decipher the colistin resistance mechanisms and the genomic epidemiology of clinical isolates of K. pneumoniae and A. baumannii from countries where carbapenem resistance is extremely high and colistin represents a life-saving agent; ii) to explore the longitudinal population dynamics of K. pneumoniae in a major Chinese hospital, focusing on the simultaneous carriage of resistance and hypervirulence genes; iii) to predict the phenotype of K. pneumoniae strains from their genomes; and iv) to study a novel carbapenemase-encoding gene obtained from environmental bacteria.

Samenvatting

Although antibiotics are the most important antibacterial agents for the treatment of bacterial infections, an increasing number of bacterial species are becoming (multidrug-)resistant (MDR), which complicates the treatment of infections. Carbapenems are highly effective antibiotics that are often used to treat severe infections caused by MDR bacteria that proved resistant to first-line antibiotics. Worryingly, carbapenem resistance is increasing, and in some countries it is so high that other drugs, which are usually reserved as a last option, are used on a large scale. Colistin, an old drug that was rarely used because of its toxicity, is now commonly used in some countries, and resistance to this antibiotic is increasing.

Among the various MDR pathogens, carbapenem-resistant Klebsiella pneumoniae and Acinetobacter baumannii are clinically important examples. Both pathogens frequently cause outbreaks of infections, while strains that are resistant to all available antibiotics are emerging. In the case of K. pneumoniae, a new kind of superbug has recently been observed. Whereas MDR and hypervirulence were normally observed separately in K. pneumoniae clones, clones have now been identified that show convergence of these two traits. Acquisition of hypervirulence and resistance genes has been seen mainly in Asia. Tracking the emergence and evolution of such novel clones, which cause severe infections with limited treatment options, is of fundamental importance.

The decreasing cost of Whole Genome Sequencing (WGS) makes it possible to accelerate its implementation in routine bacterial diagnostics of infectious diseases. However, there is still a lack of surveillance of existing and novel resistance mechanisms and of the convergence of resistance and hypervirulence traits. Moreover, although phenotype prediction from genomic data has shown encouraging results, the understanding of resistance mechanisms for some drugs, such as colistin, is still limited, and new bioinformatic in silico tools for phenotype prediction are needed.

In my thesis, I used WGS and bioinformatics, together with phenotypic techniques, to address several problems. First, I investigated colistin resistance mechanisms and the genomic epidemiology of clinical isolates of K. pneumoniae and A. baumannii from countries where carbapenem resistance is extremely high. Second, I studied the longitudinal population dynamics of K. pneumoniae in a large Chinese hospital, with emphasis on the analysis of local and international spread of resistance and hypervirulence genes. I analyzed and developed methods to predict the phenotype of K. pneumoniae strains from their genomes. Finally, I studied a novel carbapenemase-encoding gene that was found in environmental bacteria. The results of these investigations are summarized in this thesis.

List of abbreviations

Abbreviations Full description

ACL adaptive cluster lasso

AMR antimicrobial resistance

AST antimicrobial susceptibility testing

AUC area under the curve

bACC balanced accuracy

CC clonal complex

cDBG compacted De Bruijn Graph

CG clonal group

cKp classical K. pneumoniae

colR/ColR colistin resistant

ColS colistin susceptible

cps capsular polysaccharide

CRAB carbapenem resistant A. baumannii

CRKP/CR-Kp carbapenem-resistant K. pneumoniae

dNTP deoxyribonucleotide triphosphate

ESBL extended spectrum β-lactamase

GI gastro-intestinal

GWAS genome-wide association studies

HAI hospital acquired infection

hvKp hyper-virulent K. pneumoniae

IC international clone

ICU intensive care unit

IS insertion sequence

KPC Klebsiella pneumoniae carbapenemases

L-Ara4N L-aminoarabinose

LD linkage disequilibrium

LPS lipopolysaccharide

MAF minor allele frequency

MALDI-TOF MS matrix-assisted laser desorption/ionization–time of flight mass spectrometry

MBL metallo-β-lactamase

MDR multidrug-resistant

MIC minimum inhibitory concentration

ML machine learning

MLST multi-locus sequence typing

NGS Next-Generation Sequencing

NS non-susceptible

OCL outer core locus

ONT Oxford Nanopore Technologies

PBS phosphate-buffered saline

pEtN phosphoethanolamine

PFGE pulsed-field gel electrophoresis

ROC Receiver Operating Characteristic

S susceptible

SMRT single-molecule real-time

SNP single nucleotide polymorphism

UTI urinary tract infection

VNTR variable-number tandem repeat

WGS whole genome sequencing

WHO World Health Organization

ZMW zero-mode waveguide

List of figures

Figure 1. Antibiotic resistance strategies in bacteria. From Erik Gullberg, 2014.

Figure 2. Predicted global deaths due to antimicrobial-resistant infections every year, compared to

other major diseases. From O’Neill, 2014.

Figure 3. WHO priority pathogens list for R&D of new antibiotics. *Enterobacteriaceae include: K.

pneumoniae, E. coli, Enterobacter spp., Serratia spp., Proteus spp., Providencia spp. and

Morganella spp. From Tacconelli et al., 2018.

Table 1. β-lactamase types, including some examples of clinically relevant enzymes.

Figure 4. Regulation pathways of LPS modifications in Klebsiella pneumoniae. From Poirel et al.,

2017.

Figure 5. Four well-characterized virulence factors in classical and hypervirulent K. pneumoniae

strains. From Paczosa and Mecsas, 2016.

Figure 6. Schematic representation of A. baumannii colistin resistance mechanisms. From Trebosc

et al., 2019.

Figure 7. A schematic representation of the hypothetical workflow after adoption of WGS, with low

complexity and an expected turnaround time within one day. Adapted from Didelot et al., 2012.

Figure 8. Overview of the three generations of sequencing technologies, with examples of the

major sequencing platforms. From Loman and Pallen, 2015.

Preface

In this preface, an overview of the contents of each chapter in this thesis is provided, the chapters that are included as publications are listed, and the direct contributions of the author of this thesis to each chapter are described.

Chapter 1: General introduction and aims

This is an original overview of the background, key concepts and objectives of this thesis.

Chapter 2: Genomic epidemiology of carbapenem- and colistin-resistant Klebsiella pneumoniae

isolates from Serbia: predominance of ST101 strains carrying a novel OXA-48 plasmid

This chapter is an original work that resulted in a publication in Frontiers in Microbiology (DOI:

10.3389/fmicb.2020.00294). I was first author and the main contributor of the work presented in this

publication.

The nature and extent of my contributions to this chapter are detailed below:

• I contributed to the design of this published study and interpretation with Prof. Alex van Belkum,

Prof. Marco Maria D’Andrea and Prof. Gian Maria Rossolini.

• I performed all wet lab experiments, including antimicrobial susceptibility testing, MALDI-TOF MS

and DNA extraction.

• I performed library preparations for Nanopore long-read sequencing under supervision by and

assistance from Franck Tarendeau (bioMérieux Grenoble).

• I conducted all epidemiological, phylogenetic, and genomic analysis with Prof. Marco Maria

D’Andrea.

• I was responsible for the planning, drafting, editing, and submission of the manuscript, though all

co-authors also edited the manuscript.

Chapter 3: Abundance of colistin-resistant, OXA-23- and ArmA-producing Acinetobacter baumannii

belonging to International Clone 2 in Greece

This chapter is an original work that resulted in a publication in Frontiers in Microbiology (DOI:

10.3389/fmicb.2020.00668). I was first author and the main contributor of the work presented in this

publication.

The nature and extent of my contributions to this chapter are detailed below:

• I contributed to the design of this published study and interpretation with Prof. Alex van Belkum,

Prof. Marco Maria D’Andrea and Prof. Gian Maria Rossolini. Dr. Nikos Legakis was responsible for the

collection, initial characterization and shipment of the strains. I verified some of the strain

characteristics for reasons of quality control.

• I performed MALDI-TOF MS under supervision by and assistance from Nadine Perrot.

• I performed all wet lab experiments, including antimicrobial susceptibility testing and DNA

extraction.

• I conducted all epidemiological, phylogenetic, and genomic analysis with input from Prof. Marco

Maria D’Andrea.

• I was responsible for the planning, drafting, editing, and submission of the manuscript, though all

co-authors also edited the manuscript.

Chapter 4: Genomic evolution and local epidemiology of Klebsiella pneumoniae from the Beijing

Hospital 301 over a fifteen-year period: dissemination of known and novel high-risk clones

This chapter is an original work that resulted in an in-progress manuscript, soon to be submitted for publication. I was first author and the main contributor of the work presented in this manuscript.

The nature and extent of my contributions to this chapter are detailed below:


• I conducted all epidemiological, phylogenetic, and genomic analysis together with Dr. Kelly L. Wyres.

• I wrote the first draft of the manuscript and consolidated the editing suggestions made by the co-

authors.

Chapter 5: Interpreting k-mer based signatures for antibiotic resistance prediction

This chapter is an original work that resulted in a submitted manuscript, under revision at the time of

submission of this thesis. I was second author.

The nature and extent of my contributions to this chapter are detailed below:

• I contributed to the design of this study and performed data interpretation with

Dr. Pierre Mahé, Dr. Magali Jaillard and Prof. Alex van Belkum.

• I built the K. pneumoniae database used to test the machine learning algorithm.

• I contributed to the analysis of the data.

• I contributed to the initial writing and editing of the manuscript.

Chapter 6: PFM-like, a novel family of subclass B2 metallo β-lactamase from Pseudomonas

synxantha belonging to the Pseudomonas fluorescens complex

This chapter is an original work that resulted in a publication in Antimicrobial Agents and Chemotherapy (DOI: 10.1128/AAC.01700-19). I was second author and the main contributor of the experimental work presented in this publication.

The nature and extent of my contributions to this chapter are detailed below:


• I performed most of the wet lab experiments, including antimicrobial susceptibility testing, gene

cloning, enzyme purification and kinetic analysis of hydrolysis.

• I conducted all bioinformatics analyses.

• I wrote the first draft of the manuscript.

Chapter 7: Summary and future perspectives

This is an original summary of the implications and significance of the work presented in this thesis,

together with a brief general discussion and the future perspectives.

CHAPTER 1 : General introduction and aims

1.1 The antimicrobial resistance crisis

The discovery of antibiotics in the early twentieth century was one of the most important developments in medicine and a milestone in the history of modern human society. Before the introduction of antibiotics, infectious diseases were a major cause of mortality, through systemic infections, sepsis resulting from wound infections, pneumonia and common infections surrounding childbirth. Without antibiotics, routine clinical practices such as organ transplantation, surgery and cancer chemotherapy would be impossible 1.

As soon as antibiotics were introduced in clinical practice, clinically relevant antibiotic-resistant

bacterial strains were described. These strains emerged due to their ability to rapidly evolve via both

vertical and horizontal inheritance 2.

Moreover, antibiotics have been used inappropriately, in particular outside healthcare settings and especially in low-income countries. The misuse and overuse of antibiotics is not only a problem in human clinical settings, but also common practice in agriculture, aquaculture and animal farming. Alarmingly, in these sectors the drugs are largely used for disease prophylaxis and as growth promoters 3.

This situation has led to selection and propagation of antibiotic resistant strains in many

environments, turning them into reservoirs that contribute to storage, transmission and selection of

new superbugs. Consequently, some infections previously easily manageable are now difficult or

impossible to treat 4. Infections caused by a pathogen resistant to the drug of treatment generally

have a poorer clinical outcome (possibly even death) and are also linked to a greater overall

consumption of healthcare resources, when compared to infections caused by antibiotic-susceptible

organisms 1.

Members of a bacterial species can all be naturally resistant to a specific drug (intrinsic resistance), or resistance traits can be acquired by previously susceptible microorganisms (acquired resistance). On a genetic level, resistance may arise i) endogenously, through random chromosomal point mutations, often when sub-therapeutic concentrations of antibiotics increase mutability and specifically select for resistant strains, or ii) exogenously, through horizontal gene transfer, when foreign DNA is mobilized via conjugative plasmids (conjugation), bacteriophages (transduction), transposons, insertion sequences or naked DNA (transformation), eventually leading to the recombination of acquired DNA into the chromosome 2. Concerning the endogenous mechanisms, the process toward high-level resistance is usually stepwise. The antibiotic selection pressure enriches for bacterial cells with an initial mutation that enhances their survival, followed by subsequent additional mutations that

confer increased resistance levels during further antibiotic therapy. Though mutation frequencies can

be as low as 10^-8, this is offset by the huge numbers of cells in bacterial colonies 5. Concerning

exogenous mechanisms, the major genetic elements associated with resistance genes are plasmids.

These are nearly ideal carriers for acquisition and dissemination of resistance genes, followed by

transposons, which can move genes between plasmids and chromosomes, and integrons, which

ease the recruitment and expression of resistance determinants. These elements are widely present

among both Gram-negative and Gram-positive bacterial species and play a crucial role for

dissemination of resistance determinants 6.
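
As a rough illustration of why a low mutation frequency does not prevent resistance from emerging, the expected number of resistant mutants can be estimated as the mutation frequency multiplied by the number of cells. The sketch below uses the frequency of 10^-8 mentioned above; the population size of 10^9 cells is an assumed, purely illustrative value, and the function names are hypothetical. It also assumes that each cell mutates independently, which is a simplification.

```python
import math


def expected_mutants(mutation_frequency: float, population_size: int) -> float:
    """Expected number of cells carrying the resistance mutation."""
    return mutation_frequency * population_size


def prob_at_least_one_mutant(mutation_frequency: float, population_size: int) -> float:
    """Probability that at least one cell in the population is a mutant.

    Computes 1 - (1 - mu)^N in a numerically stable way, assuming that
    cells mutate independently of each other.
    """
    return -math.expm1(population_size * math.log1p(-mutation_frequency))


if __name__ == "__main__":
    mu = 1e-8          # mutation frequency quoted in the text
    n_cells = 10**9    # assumed population size, for illustration only

    print("Expected mutants:", expected_mutants(mu, n_cells))
    print("P(at least one mutant):", prob_at_least_one_mutant(mu, n_cells))
```

Under these assumptions, a population of 10^9 cells is expected to contain about ten mutants, and the probability that at least one is present exceeds 99.99%, so antibiotic selection pressure will almost always have a resistant cell to enrich.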

From a biochemical point of view, five major mechanisms of resistance can occur in bacteria: i) decreased antibiotic uptake associated with reduced membrane permeability (e.g. resistance to tetracyclines and quinolones); ii) enzymatic inactivation of the antibiotic (e.g. resistance to β-lactams by β-lactamases); iii) rapid efflux of the antibiotic from the cell (e.g. resistance to tetracyclines and macrolides); iv) target alteration, i.e. modification or replacement of the cellular structure that the antibiotic targets (e.g. resistance to oxacillin and methicillin through acquisition of the mecA gene, or mutations in DNA gyrase resulting in resistance to several fluoroquinolones); and v) acquisition of one or more alternative metabolic pathways that bypass those inhibited by antibiotics (e.g. resistance to sulfonamides) 7 (Figure 1). These resistance mechanisms can be present together in different combinations in a single bacterial cell, potentially allowing high-level resistance to multiple antibiotic compounds simultaneously 8.

Figure 1. Antibiotic resistance strategies in bacteria 9

Ever-growing levels of antimicrobial resistance (AMR) threaten the health benefits provided by

antibiotics and this phenomenon is recognised as a global crisis 10. With an estimate of 50,000 deaths

across the US and Europe every year attributable to AMR, urgent international actions need to be

taken to preserve the efficacy of modern antibiotic treatments.

Without proactive solutions to prevent the continued escalation of antibiotic resistance, it is

estimated that by 2050 approximately 10 million people will die annually of antimicrobial-resistant

infections, which is more than the cumulative number of people dying today from any other type of

disease 1 (Figure 2).

Figure 2. Predicted global deaths due to antimicrobial-resistant infections every year, compared to other major diseases 1

1.2 The ESKAPE pathogens

The ESKAPE pathogens (Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae,

Acinetobacter baumannii, Pseudomonas aeruginosa and Enterobacter species), although not the only

worrisome pathogens, have been labelled as requiring special attention since they are responsible

for the majority of hospital acquired infections (HAIs), concurrently showing a high prevalence of

AMR 11. The World Health Organization (WHO) has also recently listed twelve bacterial species

against which new antibiotics are urgently needed 12. The list describes three categories of pathogens,

namely critical, high and medium priority, according to the urgency of the need for new antibiotics

(Figure 3). Carbapenem-resistant A. baumannii and P. aeruginosa, along with extended-spectrum

β-lactamase (ESBL)-producing or carbapenem-resistant Enterobacteriaceae (including K. pneumoniae), were listed

in the critical priority category.

Figure 3. WHO priority pathogens list for R&D of new antibiotics. *Enterobacteriaceae include: K. pneumoniae, E. coli, Enterobacter spp., Serratia spp., Proteus spp., Providencia spp. and Morganella spp. 12

1.2.1 Klebsiella pneumoniae

K. pneumoniae, belonging to the Enterobacteriaceae family, was first isolated in the late 19th century

and was initially known as Friedlaender’s bacterium 13. From a clinical point of view, the species K.

pneumoniae is the most important member of the genus Klebsiella, which also includes other

clinically relevant species such as K. oxytoca and, to a lesser extent, K. rhinoscleromatis and K.

ozaenae 14. Klebsiella spp. are Gram-negative, encapsulated, non-motile bacteria that are able to

readily colonize human mucosal surfaces, including the gastro-intestinal (GI) tract and oropharynx,

where colonization usually appears benign 15. From these sites, this opportunistic pathogen can gain

entry to other tissues where it can cause severe infections in humans. Major diseases include urinary

tract infections, lower respiratory tract infections, intraabdominal infections and bloodstream

infections. Other diseases, such as meningitis and wound infections, are less common 16.

As the best known genus member, K. pneumoniae is a common opportunistic, mostly nosocomial

pathogen, accounting for about one third of all Gram-negative HAIs 17. It is also an important

cause of serious community-onset infections such as necrotizing pneumonia, pyogenic liver abscesses

and endogenous endophthalmitis 14.

In healthcare settings, K. pneumoniae infections commonly occur among patients who already suffer

from serious underlying clinical conditions, often together with a state of general immunodeficiency.

Risk factors for K. pneumoniae infections include extremes of age, presence of malignancy, diabetes,

chronic liver disease, recent solid-organ transplantation, and chronic dialysis 18. Other risk factors for

nosocomial infections by K. pneumoniae are treatment with corticosteroids, chemotherapy, organ

transplantation, or other treatments or conditions resulting in neutropenia 19.

Over the last few decades, there has been a concerning rise in the acquisition of resistance to a wide

range of antibiotic classes by “classical” K. pneumoniae strains 20. Consequently, simple infections

such as UTIs have become hard to treat, while more serious infections such as pneumonia and

bacteremia have become increasingly life-threatening 21.

Since the mid-1980s, a novel type of community-acquired invasive K. pneumoniae infection, primarily

in the form of pyogenic liver abscesses, has emerged, mostly in Asian countries 22. K. pneumoniae

strains causing these invasive infections are defined as being hyper-virulent and express a distinct

hyper-mucoviscous phenotype when grown on agar plates 23.

Very recently, strains with a hyper-virulent phenotype have been found to carry antimicrobial

resistance genes, including carbapenemase genes 24, as well as mechanisms of resistance against last-resort

antibiotics such as colistin 25. This is an alarming scenario, given the lack of novel approaches to treat

this kind of superbug.

1.2.1.1 Antimicrobial resistance in K. pneumoniae: the β-lactamases

K. pneumoniae can produce various enzymes that hydrolyze the four-membered ring of β-lactams

and inactivate them. These enzymes include ESBLs, oxacillinases, carbapenemases (including metallo-

and serine-β-lactamases), among others (Table 1). Genes encoding such enzymes are generally

present on plasmids which K. pneumoniae seems to readily acquire. Such plasmids often carry other

genes conferring resistance to other antibiotic classes including aminoglycosides, chloramphenicol,

sulfonamides, trimethoprim, and tetracyclines. Thus, bacteria containing these plasmids are often

multidrug-resistant (MDR) 26.

Type                            Ambler class  Features                                            Enzymes
Narrow-spectrum β-lactamases    A             Hydrolyze penicillins                               TEM-1, TEM-2, SHV-1
Extended-spectrum β-lactamases  A             Hydrolyze narrow and extended-spectrum β-lactams    SHV-2, CTX-M-15, VEB-1, PER-1
Serine carbapenemases           A             Hydrolyze carbapenems                               KPC-2, KPC-3, IMI-1
Metallo β-lactamases            B             Hydrolyze carbapenems                               NDM-1, VIM-1, IMP-1
Cephalosporinases               C             Hydrolyze cephamycins and some oxyimino β-lactams   AmpC, CMY-2, FOX-1
OXA-type enzymes                D             Hydrolyze carbapenems                               OXA-48, OXA-232

Table 1. β-lactamase types, including some examples of clinically relevant enzymes.
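
As an illustration only, the content of Table 1 can be expressed as a small lookup structure. The sketch below is written in Python; the names TABLE_1, CARBAPENEM_HYDROLYZING and hydrolyzes_carbapenems are hypothetical, and the structure holds only the example enzymes listed in Table 1, not a complete resistance database.

```python
# Example enzymes from Table 1: enzyme -> (Ambler class, type).
# Only the examples given in the table are included (illustrative, not exhaustive).
TABLE_1 = {
    "TEM-1": ("A", "narrow-spectrum"),
    "TEM-2": ("A", "narrow-spectrum"),
    "SHV-1": ("A", "narrow-spectrum"),
    "SHV-2": ("A", "ESBL"),
    "CTX-M-15": ("A", "ESBL"),
    "VEB-1": ("A", "ESBL"),
    "PER-1": ("A", "ESBL"),
    "KPC-2": ("A", "serine carbapenemase"),
    "KPC-3": ("A", "serine carbapenemase"),
    "IMI-1": ("A", "serine carbapenemase"),
    "NDM-1": ("B", "metallo-beta-lactamase"),
    "VIM-1": ("B", "metallo-beta-lactamase"),
    "IMP-1": ("B", "metallo-beta-lactamase"),
    "AmpC": ("C", "cephalosporinase"),
    "CMY-2": ("C", "cephalosporinase"),
    "FOX-1": ("C", "cephalosporinase"),
    "OXA-48": ("D", "OXA-type"),
    "OXA-232": ("D", "OXA-type"),
}

# Enzyme types that Table 1 lists as hydrolyzing carbapenems.
CARBAPENEM_HYDROLYZING = {
    "serine carbapenemase",
    "metallo-beta-lactamase",
    "OXA-type",
}


def hydrolyzes_carbapenems(enzymes):
    """Return the enzymes that Table 1 lists as carbapenem-hydrolyzing.

    Enzymes absent from the table are ignored rather than guessed at.
    """
    return [
        e for e in enzymes
        if e in TABLE_1 and TABLE_1[e][1] in CARBAPENEM_HYDROLYZING
    ]
```

For an isolate carrying CTX-M-15, KPC-3 and OXA-48, the function would return KPC-3 and OXA-48, while CTX-M-15 is an ESBL that Table 1 does not list as hydrolyzing carbapenems.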

Two major types of antibiotic resistance have been commonly described in K. pneumoniae, both

involving the production of β-lactamases. The first mechanism, initially described in the late 1980s

concomitantly in Europe 27 and in the US 28, is the production of variants of the SHV-1 or TEM-1

β-lactamases, in which the substitution of only one or two amino acids led to the appearance of

variants that have been termed ESBLs. ESBLs are chromosomally or plasmid-encoded enzymes that

mediate resistance to penicillins, extended-spectrum (third-generation) cephalosporins (e.g.

ceftazidime, cefotaxime, and ceftriaxone) and monobactams (e.g. aztreonam), but do not affect

cephamycins (e.g. cefoxitin and cefotetan) or carbapenems (e.g. meropenem and imipenem) 29. The

early SHV and TEM variants have been largely replaced by the CTX-M family of ESBLs, which were

identified in the early 1990s in Western Europe and South America and are currently the most common

type of ESBL in enteric bacteria 30.

The second major mechanism of resistance is the expression of carbapenemases, which renders K.

pneumoniae resistant to all β-lactams, including the carbapenems. Carbapenemases can be classified

on the basis of their amino acid sequence into different molecular classes: class A (e.g. IMI-, SME-, KPC-

type enzymes), class B (of which the main representatives in clinical isolates are the NDM-, IMP- and

VIM-types) and class D β-lactamases (e.g. OXA-48-types, OXA-232-types) 31.

Klebsiella pneumoniae carbapenemases (KPCs) represent the clinically most relevant mechanism of

acquired antimicrobial resistance observed in K. pneumoniae during recent years. This is due to their

very wide range of activity against several β-lactam families, including penicillins, older and newer

cephalosporins, aztreonam and carbapenems 32.

Several different KPC variants (KPC-2 to KPC-22) have been described, although KPC-2 and KPC-3 are

the most widespread. KPCs are mostly plasmid-encoded enzymes and bacteria carrying these

plasmids are often susceptible to only a few antibiotics such as colistin, aminoglycosides, and

tigecycline.

1.2.1.2 Antimicrobial resistance in K. pneumoniae: colistin resistance

Polymyxins have represented the major antimicrobial therapeutic option against carbapenem-resistant K.

pneumoniae infections over the last decades. Indeed, polymyxin E (colistin) is considered a “last

resort” antimicrobial for the treatment of MDR K. pneumoniae infections, being essentially the only drug

that reaches adequate serum levels and exceeds the minimum inhibitory concentration (MIC)

of the infecting strain 33.

Consequently, the increasing prevalence of colistin-resistant K. pneumoniae is a major concern,

considering the scarcity of the alternative treatment options and the high mortality rate associated

with carbapenem- and colistin-resistant K. pneumoniae infections 34.

The target of colistin is the outer membrane of Gram-negative bacteria. An electrostatic interaction

occurs between the positively charged colistin molecule on the one side and the phosphate groups of

the negatively charged lipid A on the other side. Divalent cations (Ca2+ and Mg2+) are consequently

displaced from the negatively charged phosphate groups of membrane lipids 35. Then, the

lipopolysaccharide (LPS) is destabilized, the permeability of the bacterial membrane is increased, and

cytoplasmic leakage ultimately causes cell death 36. Even though LPS is the initial target, the exact

colistin mode of action is still uncertain 37.

Similar to what is observed in bacteria that are naturally resistant to colistin, LPS modifications via

addition of cationic groups, i.e. L-aminoarabinose (L-Ara4N) and phosphoethanolamine (pEtN), are

responsible for colistin resistance in K. pneumoniae. A large panel of genes and operons is involved in

qualitative modification of the LPS (Figure 4). The pmrCAB operon encodes the pEtN

phosphotransferase PmrC, the response regulator PmrA, and the sensor kinase protein PmrB. The

pEtN phosphotransferase PmrC adds a pEtN group to the LPS. Environmental stimuli such as ferric

(Fe3+) iron, aluminium (Al3+), and low pH (e.g., pH 5.5) activate PmrB through its periplasmic domain.

The histidine kinase PmrB in turn activates PmrA by phosphorylation. Finally, PmrA activates the

transcription of the pmrCAB operon itself, and also of the pmrHFIJKLM operon and the pmrE gene

which are also involved in LPS modifications. Specific PmrA/B mutations are responsible for

constitutive activation of the PmrAB two-component system, and have been described as being

responsible for colistin resistance in K. pneumoniae 38.

The pmrHFIJKLM operon encodes seven proteins, and together with the pmrE gene they are

responsible for the synthesis of the L-Ara4N and its coupling to lipid A. The phoPQ operon encodes

the regulator protein PhoP and the sensor protein kinase PhoQ. In a similar way to PmrB, PhoQ

senses environmental stimuli such as low magnesium (Mg2+) and low pH (e.g., pH 5.5), which mediate

PhoQ activation through its periplasmic domain. PhoQ in turn activates PhoP by phosphorylation.

Finally, PhoP activates the transcription of the pmrHFIJKLM operon, mediating the addition of

L-Ara4N to the LPS. PhoP can also activate the PmrA protein, either directly or indirectly via the PmrD

connector protein, causing the LPS modification via pEtN addition. Several mutations in the phoP/Q

genes are responsible for constitutive activation of the PhoPQ two-component system and

consequently colistin resistance in K. pneumoniae 38.

MgrB is a small transmembrane protein that acts as a negative regulator of the PhoPQ two-

component system. Inactivation of the mgrB gene leads to overexpression of the phoPQ operon and

consequently colistin resistance. Several missense mutations resulting in amino acid substitutions

and nonsense mutations leading to a truncated MgrB protein have been observed. Insertional

inactivation caused by different insertion sequences (IS), belonging to several families and inserted at

different locations within the mgrB gene, is often responsible for colistin resistance in K. pneumoniae

39,40.

The crrAB operon encodes the regulatory protein CrrA and the sensor protein kinase CrrB, which

regulate the pmrAB expression. Inactivation of the crrB gene leads to overexpression of the pmrAB

operon, finally resulting in colistin resistance 41.

Finally, the plasmid-mediated mcr-1 gene is responsible for horizontal transfer of colistin resistance.

It was initially described in E. coli and K. pneumoniae isolates from Chinese patients between 2011

and 2014 42. The encoded MCR-1 protein is a pEtN transferase, and its acquisition results in the

addition of pEtN to lipid A, similarly to the chromosomal mutations mentioned above. Following mcr-

1, several other variants, up to mcr-9, have been described 43–50.
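
The regulatory cascade described in this section can be summarized, in a deliberately simplified form, as a small directed graph. The Python sketch below is an illustration under stated assumptions, not a validated model: it encodes only the activation and repression relationships mentioned in the text, omits crrAB and mcr, treats every interaction as all-or-nothing, and uses hypothetical names (ACTIVATES, REPRESSES, MODIFICATION, lps_modifications).

```python
# Activation edges taken from the text: regulator -> targets it switches on.
ACTIVATES = {
    "PhoQ": ["PhoP"],
    "PhoP": ["pmrHFIJKLM", "PmrD", "PmrA"],
    "PmrD": ["PmrA"],
    "PmrB": ["PmrA"],
    "PmrA": ["pmrCAB", "pmrHFIJKLM", "pmrE"],
}

# Repression edges: MgrB is described as a negative regulator of PhoPQ.
# Simplification: its loss is modelled as switching PhoQ on.
REPRESSES = {"MgrB": ["PhoQ"]}

# Lipid A modification that each effector locus contributes to.
MODIFICATION = {
    "pmrCAB": "pEtN",
    "pmrHFIJKLM": "L-Ara4N",
    "pmrE": "L-Ara4N",
}


def downstream(start_nodes):
    """Return every node reachable from start_nodes via activation edges."""
    seen = set()
    stack = list(start_nodes)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(ACTIVATES.get(node, []))
    return seen


def lps_modifications(constitutively_active=(), inactivated=()):
    """List the LPS modifications expected to be upregulated.

    constitutively_active: regulators locked 'on' by mutation (e.g. PmrB).
    inactivated: negative regulators that were lost (e.g. MgrB).
    """
    start = list(constitutively_active)
    for gene in inactivated:
        # Losing a repressor releases the regulators it normally holds back.
        start.extend(REPRESSES.get(gene, []))
    reached = downstream(start)
    return sorted({MODIFICATION[n] for n in reached if n in MODIFICATION})


print(lps_modifications(inactivated=["MgrB"]))
```

In this simplified graph, both mgrB inactivation and constitutive PmrB activation lead to the same two modifications (L-Ara4N and pEtN), which reflects the convergence of the pathways on PmrA described above.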

Figure 4. Regulation pathways of LPS modifications in Klebsiella pneumoniae 37

1.2.1.3 Hyper-virulent K. pneumoniae

Despite rendering bacterial infections more difficult to treat, MDR does not enhance the virulence of

K. pneumoniae strains. However, starting from the 1980s, K. pneumoniae strains with the ability to

cause severe infections in apparently healthy individuals emerged. These strains are defined as

hyper-virulent K. pneumoniae (hvKp) compared to classical K. pneumoniae (cKp) strains as they are

able to infect both healthy and immunocompromised individuals, with resulting infections which are

generally invasive.

Infections were first described in Taiwan and are common on the Asian Pacific Rim. However, new

cases have recently been reported on a more global scale. In contrast to the infections caused by cKp,

most hvKp infections originate in the community 51. While pyogenic liver abscesses represent the

major disease, hvKp strains can also cause pneumonia and lung abscesses, among others 52.

Bacteremia is frequent among hvKp-infected patients and is correlated with a significantly poorer

prognosis 53.

Several virulence factors have been reported and studied in hvKp strains. The capsule is a polysaccharide

matrix that overlays the cell and is fundamental for K. pneumoniae virulence. hvKp strains are

characterized by hyper-capsulation which consists of an extensive mucoviscous exopolysaccharide

coating that is thicker and more robust than that of the typical capsule. This hyper-capsule

contributes significantly to the pathogenicity of hvKp 20.

Most hvKp are associated with only two of the 130 reported capsular serotypes, K1 and K2, that were

shown to be particularly anti-phagocytic and serum resistant 20,54. hvKp are also associated with

several other key virulence factors (Figure 5): the rmpA and rmpA2 genes that upregulate capsule

expression thereby aiding the formation of a hyper-capsule which is linked to the hyper-mucoviscous

phenotype; the colibactin genotoxin that induces eukaryotic cell death and promotes bacterial

transfer from the intestines into the blood; the yersiniabactin, aerobactin and salmochelin

siderophores that enhance survival in the blood by promoting iron scavenging 20. Yersiniabactin

synthesis is encoded by the ybt locus that is generally mobilized by an integrative, conjugative

element termed ICEKp. Its prevalence is about 40% in K. pneumoniae and it is frequently acquired

and lost from MDR clones 55. Conversely, the salmochelin (iro), aerobactin (iuc) and rmpA/rmpA2 loci

are usually co-harbored by a virulence plasmid 56. The prevalence of that virulence plasmid is less

than 10% in the K. pneumoniae population, and until recently it was rarely reported among cKp

strains 57.

hvKp strains are generally susceptible to most antimicrobials. However, the last few years have seen

an increasing number of reports of ‘convergent’ K. pneumoniae strains that are both hyper-virulent

(carrying the iuc aerobactin locus, which is recognized as the single most important feature of hvKp

strains 58) and ESBL/carbapenemase producers. The majority of these reports represent sporadic

isolations, but in 2017 Gu and colleagues reported a fatal outbreak in a Chinese hospital caused by a

hyper-virulent carbapenemase-producing K. pneumoniae isolate 59.

Figure 5. Four well-characterized virulence factors in classical and hypervirulent K. pneumoniae strains 20

1.2.2 Acinetobacter baumannii

Acinetobacter baumannii is a Gram-negative coccobacillus recognized as an important opportunistic

human pathogen causing infections of the urinary tract, skin, bloodstream, and soft tissues 60. The

majority of A. baumannii infections occur among critically ill patients in the intensive care unit (ICU)

setting, accounting for as much as 20% of infections in ICUs worldwide 61. MDR phenotypes due to

the acquisition of antibiotic resistance mechanisms represent a major factor of the success of A.

baumannii in hospital environments. Antibiotic modifying enzymes, decreased permeability to

antibiotic molecules, and active efflux pumps are among the major AMR mechanisms. Apart from its

multidrug resistance, the success of A. baumannii can also be attributed to its ability to survive in the

hospital environment 62. Examples of the challenges that A. baumannii faces as an opportunistic

human pathogen include the survival at low temperatures, the exposure to antiseptics and

desiccating agents and the rapid changes of environmental and nutritional conditions when

transferred into the human body from the hospital environment. Therefore, A. baumannii needs to

sense and adapt to these changes in an efficient and prompt manner. A. baumannii also has the

ability to colonize the skin of patients or healthy individuals without causing any apparent illness.

However, transmission of such colonizing bacteria to a susceptible patient can result in immediate

infection.

1.2.2.1 Multidrug-Resistant A. baumannii

The major mechanism of β-lactam resistance in A. baumannii is enzymatic degradation by β-

lactamases. A. baumannii strains are characterized by chromosomally encoded AmpC

cephalosporinases, which are also known as Acinetobacter-derived cephalosporinases (ADCs). The

overexpression of such enzymes in A. baumannii is regulated by the presence of an upstream

insertion sequence (IS) element, the major representative being ISAba1. The presence of this

element correlates with resistance to extended-spectrum cephalosporins due to the increased ADC

production. Cefepime and carbapenems are not hydrolyzed by these enzymes.

ESBLs of the VEB-, PER-, TEM- and CTX-M-type have also been reported in A. baumannii. However,

the assessment of their prevalence is hindered by difficulties with laboratory detection in the

presence of ADCs 60.

The β-lactamases with carbapenemase activity are of major concern and include the serine

oxacillinases (Ambler class D OXA type) and the metallo-β-lactamases (MBLs) (Ambler class B).

The second intrinsic β-lactamase produced by A. baumannii is an oxacillinase, represented by the

OXA-51/69 variants. The OXA-51-like-encoding genes are chromosomally located in A. baumannii and

the carbapenemase activities of OXA-51/69 enzymes have been studied in detail 63,64. However, the

level of expression of the corresponding genes is quite low in most cases, resulting in a minor impact

on β-lactam susceptibility 65.

Identification of a carbapenem-hydrolyzing oxacillinase-encoding gene was first reported in A.

baumannii in 1995 and named blaOXA-23. This enzyme type now represents the major carbapenem

resistance determinant in A. baumannii on a global scale. Two other acquired OXA-type genes giving

rise to the production of proteins with carbapenemase activity have been reported, the blaOXA-24-like

and the blaOXA-58-like carbapenemase genes 65.

IS elements play an important role in oxacillinase-mediated carbapenem resistance in A. baumannii.

These elements provide two major functions. First, they encode a transposase, allowing the

mobilization of the carbapenemase-encoding gene. Second, they can contain promoter regions that

lead to overexpression of downstream genes. IS elements have been frequently described upstream

of blaOXA-23 and blaOXA-58 genes, but they may also promote carbapenem resistance in association with

intrinsic genes such as blaOXA-51. Some IS elements, in particular ISAba1, are relatively unique to A.

baumannii 60.

Aminoglycoside resistance in A. baumannii is conferred by acetyltransferase-, nucleotidyltransferase-,

and phosphotransferase-encoding genes. More alarmingly, 16S rRNA methylation is becoming

common in A. baumannii due to the expression of the armA gene. This resistance mechanism

protects the 30S ribosomal subunit from aminoglycoside binding conferring high-level resistance to

all clinically useful aminoglycosides, including gentamicin, tobramycin, and amikacin 66.

The major fluoroquinolone resistance mechanism depends on modifications of DNA gyrase or

topoisomerase IV through mutations in the gyrA and parC genes. Such mutations modify the

fluoroquinolone’s target binding site 60.

1.2.2.2 Colistin resistance in A. baumannii

The main mechanism of colistin resistance in A. baumannii corresponds to the addition of cationic

groups to the LPS (Figure 6). Colistin resistance may also be the consequence of a complete loss of

LPS production. However, LPS loss is associated with growth defects and decreased virulence, and for

these reasons very few clinical isolates are LPS deficient 67.

Colistin resistance has been linked to mutations in the two-component transcriptional regulator

genes pmrA/B and consequent pmrC overexpression in most instances. The pEtN phosphotransferase

PmrC adds a pEtN group to the lipid A of the lipopolysaccharide, lowering the net negative charge of

the cell membrane, thus impacting the binding of colistin and preventing the cell membrane leakage.

The complete loss of LPS is caused by alterations of the lipid A biosynthesis genes, namely the lpxA,

lpxC, and lpxD genes. Mutations identified in those genes were either substitutions, truncations,

frameshifts, or insertional inactivation by the insertion sequence ISAba11 37.

Colistin resistance may also result from the overexpression of eptA, a pmrC homolog. This is

mediated by insertional inactivation of a gene encoding an H-NS family transcriptional regulator 68 or

by integration of insertion sequence elements upstream of the eptA gene itself 69–71.

Figure 6. Schematic representation of A. baumannii colistin resistance mechanisms 69.

1.3 Whole Genome Sequencing (WGS): a disruptive diagnostic tool

The current methods of clinical microbiology diagnostics consist mainly of conventional culturing of

clinical samples on different agar plates, followed by antimicrobial susceptibility testing (AST) and

further characterization on a case-by-case basis. The major steps in processing a sample are isolating

a pathogen, determining its species, testing antimicrobial susceptibility and virulence and, in specific

settings, intra-species typing for epidemiological purposes. The first three steps are crucial for the

treatment and management of an infected patient, while the last step is valuable for identifying

outbreaks and improving surveillance. Depending on the pathogen, this practice usually takes one

to two days for culturing, an additional one to two days for species identification and susceptibility

testing, and several days for typing 72. While the species identification and AST can be performed

significantly faster, for example by employing MALDI-TOF MS and rapid disk diffusion after 4-6 hours

of culture 73,74, the overall diagnostic process, including typing, remains complex, time-consuming

and difficult to automate 72.

Several methods for rapid diagnostic testing have been developed and evaluated. Molecular

methods, such as PCR, microarray, and nucleic acid sequencing, have been widely adopted in the

clinical laboratory. These methods are able to identify microorganisms, genes and genetic

polymorphisms with high sensitivity and specificity through detection of specific nucleic acid targets.

Regardless of methodology, molecular diagnostics have the capability to reduce the time to results

and provide more accurate diagnosis. Despite these clear advantages, molecular diagnostic methods

are still expensive, and AST is limited to the detection of few resistance markers 75.

WGS has the potential to revolutionize bacterial diagnosis and surveillance by

replacing current time-consuming and labour-intensive techniques with a single and rapid diagnostic

test (Figure 7). Over the past two decades, huge progress has been made in the field of high-throughput

sequencing technologies, and nowadays sequencing the full genome of a bacterial pathogen is

no longer considered challenging or particularly expensive. As a result, multiple reviews and opinion

articles regard WGS as the obvious and inevitable future of diagnostics 72,75–79.

Figure 7. A schematic representation of the hypothetical workflow after adoption of WGS, with low complexity and an expected turnaround time within one day (Adapted from 72).

However, WGS diagnostics is still not widely adopted in clinical microbiology, which contrasts with

the number of applications for which WGS has huge potential and which are already

widely used in academic research 80.

Some major applications of WGS in diagnosing infectious diseases include:

i) Strain identification and typing. WGS data can be exploited to obtain information concerning the

bacterial species and subtype. WGS can also allow the phylogenetic placement of a given sequence

relative to an existing set of isolates for which the complete genome sequence is also known. WGS-

based strain identification offers a greater resolution compared to current genetic marker-based

approaches such as multi-locus sequence typing (MLST), pulsed-field gel electrophoresis (PFGE), and

variable-number tandem repeat (VNTR) profiling. The greater resolution offered by WGS is also of

major significance for bacteria with large accessory genomes. While the core genome contains the

essential housekeeping genes which are present in all members of a lineage, the accessory genome is

defined as the genome fraction containing nonessential genes. In K. pneumoniae and A. baumannii

most of the clinically relevant genes, such as those encoding resistance or virulence determinants, are located in the

accessory genome.
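
The distinction between core and accessory genome can be made concrete with a minimal sketch. The Python example below assumes that each isolate has already been reduced to a set of gene names; the function name split_pan_genome and the three isolates are hypothetical. It applies the strict definition given above (core genes are present in every isolate), whereas practical pipelines may relax this to a presence threshold.

```python
def split_pan_genome(gene_content):
    """Split genes into core and accessory sets.

    gene_content maps an isolate name to the set of genes it carries.
    Core genes are present in every isolate; accessory genes are
    present in at least one isolate but not in all of them.
    """
    gene_sets = list(gene_content.values())
    if not gene_sets:
        return set(), set()
    pan = set().union(*gene_sets)          # every gene seen in any isolate
    core = set.intersection(*gene_sets)    # genes shared by all isolates
    return core, pan - core


# Hypothetical isolates with illustrative gene names.
isolates = {
    "isolate_A": {"gyrA", "rpoB", "blaKPC-2"},
    "isolate_B": {"gyrA", "rpoB", "iucA"},
    "isolate_C": {"gyrA", "rpoB"},
}

core, accessory = split_pan_genome(isolates)
print(sorted(core))
print(sorted(accessory))
```

In this toy data set the housekeeping genes fall in the core, while the resistance (blaKPC-2) and virulence (iucA) genes fall in the accessory genome, in line with the observation made for K. pneumoniae and A. baumannii.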

ii) Phenotype prediction. WGS data provide a rich resource that can be exploited to predict the

pathogen’s phenotype. The major bacterial traits of clinical relevance are AMR and virulence, but

may also include other traits such as the ability to form biofilms or survival in the environment.

Concerning AMR prediction, several databases and bioinformatics tools were developed to detect

known genes and mutations associated with a resistance phenotype 81. More recently, the use of

machine learning (ML) techniques was assessed for the antimicrobial susceptibility prediction

without any previous knowledge of the actual AMR determinants involved 82. In general, ML

algorithms work by finding the relevant features in a complex data set that enable strong and reliable

prediction 83. ML algorithms are used to select the genomic features that are relevant to a given

antibiotic susceptibility profile. These relevant genomic features are then used as a phenotype

“classifier” for unknown genomes and as a source for identifying important genomic regions. From a

practical point of view, the counts of overlapping k-mers (subsequences of length k contained

within a biological sequence) are computed and combined with the clinical laboratory generated

phenotypic data for each antibiotic to form one large matrix containing both the k-mers and

antibiotics as features. Different algorithms (boosting algorithms, penalized regression models,

decision trees, random forest, neural networks or set cover machines) are then used to build a

predictive model 82.
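
The k-mer counting step described above can be sketched as follows. This Python example is a minimal illustration using only the standard library; the function names count_kmers and build_feature_matrix, the toy sequences, and the phenotype labels are assumptions made for the example. Real genomes would require a much larger k and a memory-efficient k-mer counter.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count all overlapping subsequences of length k in a sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def build_feature_matrix(genomes, k):
    """Build one row of k-mer counts per genome over a shared vocabulary.

    genomes maps a genome name to its sequence. Returns the sorted k-mer
    vocabulary (the columns) and a dict of genome name -> list of counts.
    """
    counts = {name: count_kmers(seq, k) for name, seq in genomes.items()}
    vocabulary = sorted(set().union(*counts.values()))
    matrix = {
        name: [counts[name][kmer] for kmer in vocabulary]
        for name in genomes
    }
    return vocabulary, matrix


# Toy sequences and hypothetical phenotypes for a single antibiotic.
genomes = {"g1": "ATGCATG", "g2": "ATGAAA"}
phenotypes = {"g1": "resistant", "g2": "susceptible"}

vocabulary, matrix = build_feature_matrix(genomes, k=3)
for name, row in matrix.items():
    # Each row pairs the k-mer counts with the laboratory phenotype.
    print(name, row, phenotypes[name])
```

The resulting rows, paired with the laboratory phenotypes, form the kind of matrix on which any of the algorithms listed above could then be trained.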

iii) Tracking outbreaks and identifying sources of recurrent infections. WGS can identify isolates

which are part of an outbreak and, by combining epidemiological data with phylogenetic information,

detect putative transmission events between patients or between patients and the environment.

WGS was successfully employed to reconstruct outbreaks within hospitals and the community

caused by pathogens belonging to several species, including carbapenem-resistant K. pneumoniae 84–86

and A. baumannii 87. A recent review summarizes the major bioinformatics tools for outbreak

investigations 88.
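
One common way of flagging putative transmission events is to compare isolates by the number of single-nucleotide differences between their aligned genomes. The Python sketch below illustrates the idea under simplifying assumptions: the sequences are already aligned, uppercase and of equal length, only unambiguous A/C/G/T positions are compared, and the cut-off max_snps is a hypothetical parameter that in practice depends on the organism and the study. The function names are illustrative.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Count positions at which two aligned sequences differ.

    Positions where either sequence has a gap or an ambiguous base
    are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a in "ACGT" and b in "ACGT"
    )


def putative_transmission_pairs(alignment, max_snps):
    """Return isolate pairs whose SNP distance is at most max_snps."""
    pairs = []
    for (name_a, seq_a), (name_b, seq_b) in combinations(
        sorted(alignment.items()), 2
    ):
        distance = snp_distance(seq_a, seq_b)
        if distance <= max_snps:
            pairs.append((name_a, name_b, distance))
    return pairs


# Toy alignment of three hypothetical patient isolates.
alignment = {
    "P1": "ACGTACGT",
    "P2": "ACGTACGA",
    "P3": "TCGAACCA",
}
print(putative_transmission_pairs(alignment, max_snps=1))
```

Genetic proximity alone does not demonstrate transmission; as noted above, it has to be combined with epidemiological data such as ward stays and sampling dates.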

iv) Improved surveillance. Molecular surveillance and real-time tracking of bacterial disease are

among the major promises of WGS implementation. In order to achieve this, the genomes sequenced

each year together with their metadata (e.g. sampling date, geographic location, isolation host) need

to be shared and methodically archived in an exploitable form. With such data, surveillance

initiatives have the capability to identify the likely geographic origin of emerging bacteria and AMR

genes, to group seemingly unrelated cases into outbreaks, and to clearly identify the emergence of

new clones. In a hospital environment, surveillance can help to detect cross-transmission events

between the hospital and the community and to improve antimicrobial stewardship; on a wider scale,

it can anticipate emerging worldwide trends, thereby enabling anticipatory policy decisions.

Despite the potential of WGS, there are some major bottlenecks to its implementation as a routine

clinical microbiology diagnostic tool. Major limitations include: the cost of performing WGS, which is

still high but keeps falling; a lack of clinical microbiologists with bioinformatics skills; a lack of the

necessary computational infrastructure in most medical settings; the incompleteness of reference

microbial genomics databases required for AMR and virulence determinants detection; and the lack

of standardized, effective and easy to use bioinformatics protocols 75,80.

1.3.1 Different WGS platforms

From 2005, novel sequencing technologies emerged under the name of second (or next) generation

sequencing platforms, as opposed to the automated Sanger method, which is a first-generation

technology (Figure 8). Three major technologies, Illumina, SOLiD and 454, were employed to

generate bacterial genomes. From 2011, Illumina displaced the other competitors, and nowadays it

represents the major sequencing platform 89.

Illumina sequencing is based on the sequencing-by-synthesis principle to elucidate the sequence of

DNA. Briefly, DNA polymerases catalyse the incorporation of fluorescently labelled deoxyribonucleotide

triphosphates (dNTPs) into a DNA template strand during sequential cycles of DNA synthesis. During

each cycle, at the point of incorporation, the nucleotides are identified by fluorophore excitation.

This process takes place across millions of fragments in a massively parallel fashion. The size of the

Illumina reads (the fragments of DNA that are sequenced by the instrument) is up to 300 bases. With

appropriate multiplexing, the ordinary coverage for a bacterial genome sequence project is between

30 and 100 reads per base. Illumina read accuracy is typically around 99.9%, although

systematic biases related to GC-rich regions and some specific DNA motifs exist 90. Illumina has

developed several instruments ranging from low-throughput benchtop machines (MiniSeq, MiSeq) to

ultra-high-throughput instruments (HiSeq, NovaSeq). Illumina sequencing is considered short-read

sequencing. Such short reads are too short to span repeat elements such as transposons

and insertion sequences, which usually mobilize resistance and virulence determinants.

Consequently, short-read genome assemblies are fragmented and can consist of up to hundreds of

DNA fragments, called contigs. Sequencing technologies producing longer reads can cover such

repeats allowing the complete assembly of bacterial genomes.
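
The coverage figures quoted above follow from simple arithmetic: the expected coverage is the total number of sequenced bases divided by the genome size. The Python sketch below illustrates this relationship; the function names are hypothetical, and the genome size of 5.5 Mb is an assumed, approximate value for a K. pneumoniae genome used only for the example.

```python
import math


def expected_coverage(n_reads, read_length, genome_size):
    """Average number of reads covering each base of the genome."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Smallest number of reads that reaches the target coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


GENOME_SIZE = 5_500_000   # assumed genome size in bases (approximate)
READ_LENGTH = 300         # maximum Illumina read length cited in the text

# Reads required for 50-fold coverage, within the 30 to 100 range above.
print(reads_needed(50, READ_LENGTH, GENOME_SIZE))

# Coverage obtained from one million reads of that length.
print(round(expected_coverage(1_000_000, READ_LENGTH, GENOME_SIZE), 1))
```

This estimate assumes that reads are distributed uniformly over the genome; in practice, the GC-related biases mentioned above make local coverage uneven.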

In 2011, the first single-molecule, third generation long-read sequencing technology was released by

Pacific Biosciences (PacBio), while in 2014 Oxford Nanopore Technologies (ONT) released the MinION

instrument. PacBio’s single-molecule real-time (SMRT) sequencing is also based on the

sequencing-by-synthesis principle, as it detects sequence information during the replication process of the target

DNA molecule. The method is based on the optical observation of the polymerase-mediated

synthesis in real time. A zero-mode waveguide (ZMW), a hole less than half the wavelength of light,

limits fluorescent excitation to only a single polymerase together with its template. Consequently,

only fluorescently labelled nucleotides integrated into the growing DNA chain emit signals of

sufficient duration to be read 91.

SMRT sequencers (RSII, Sequel and Sequel II) have fast run times, typically less than three hours, and

the long reads produced can be longer than 80 kb. The raw base-called error rate has decreased over

recent years and is now below 1% 92. However, the high cost per base compared

with Illumina technologies and the high purchase cost of a PacBio sequencer represent major obstacles for

the implementation of this technology in the clinical microbiology laboratory 93.

The ONT sequencing principle is based on the passage of single-stranded DNA through a nanopore across which

a voltage is continuously applied. The current through the nanopore changes depending on which

base is passing through it. Such changes can be processed and translated to obtain the sequence of

the DNA molecule that passes through the pore 94. The MinION, the main ONT device, is a small

and portable sequencer that can be used outside of traditional laboratories. Its throughput is up to

30 Gb per run, and it can produce reads longer than 200 kb. The raw base-called error rate is claimed

to have been reduced to < 5% for nanopore sequences 95. An important feature of the MinION

sequencer is that the output can be analysed during its generation. This allows strain identification

within 30 minutes and prediction of the antibiotic resistance profile within 10 hours after the start of

a run 89.

Figure 8. Overview of the three generations of sequencing technologies, with examples of the major sequencing platforms 96.

1.4 Aims

Antimicrobial resistance is a severe threat to public health worldwide, leading to growing costs,

treatment failure, morbidity and mortality. Nowadays, the antibiotic resistance level of bacterial

strains can be assessed by simple, mostly culture-based clinical AST methods. Although the classic

tests are reliable, they require extensive manual laboratory work and results are normally obtained

only after several days. WGS is a high-throughput DNA sequencing strategy that can produce a large

amount of data in a single reaction. WGS could potentially reduce the turnaround time for laboratory

results and allow clinically actionable information to be obtained sooner than traditional laboratory

diagnostic tests. However, translating genomic information to AST results is challenging. Moreover,

WGS allows for high resolution epidemiologic investigations, fundamental to track the spread and

the evolution of novel ‘high-risk’ clones.

This research project focuses on the use of WGS in order to study collections of MDR strains obtained

from countries with high AMR rates. The general aim is to study the AMR mechanisms at the

genomic level, with particular focus on last line drugs, such as colistin, and to perform

epidemiological investigations about the nosocomial spread focusing mainly on clinical A. baumannii

and K. pneumoniae strains.

The research was part of an initiative to define new diagnostic routing in infectious disease under the

name of ND4ID (Novel Diagnostics for Infectious Diseases). This project received funding from the

European Union’s Horizon 2020 research and innovation program under the Marie Sklodowska-Curie

grant agreement No 675412.

The specific aims of this thesis are:

1. To investigate the genetic mechanisms of colistin resistance in K. pneumoniae (CHAPTER 2)

and A. baumannii (CHAPTER 3) from two countries facing high AMR levels. Resistance

mechanism analysis of other antimicrobials, plasmid analysis and genomic epidemiology

investigations were also performed.

2. To study the population of K. pneumoniae isolates collected over a 15-year period in the

Beijing hospital H301 (CHAPTER 4). WGS was employed to decipher the genomic

epidemiology, the AMR and virulence determinants, as well as the emergence of novel ‘high-

risk’ clones, characterized by hyper-virulence and MDR.

3. To build and evaluate a machine learning algorithm for the prediction of antimicrobial

susceptibilities from genomic data (CHAPTER 5). To test the algorithm’s performance in

predicting phenotypes from K. pneumoniae genomes.

4. To perform classical molecular and enzymology techniques for the cloning, expression and

enzymatic activity testing of a novel carbapenemase. WGS was employed to detect the

putative determinant of carbapenem resistance and its genetic environment and to perform

phylogenetic analysis (CHAPTER 6).

CHAPTER 2 : Genomic epidemiology of carbapenem- and colistin-

resistant Klebsiella pneumoniae isolates from Serbia: predominance of

ST101 strains carrying a novel OXA-48 plasmid

Mattia Palmieri1, Marco Maria D’Andrea2,3, Andreu Coello Pelegrin1, Caroline Mirande4, Snezana

Brkic5, Ivana Cirkovic6, Herman Goossens7, Gian Maria Rossolini8,9, Alex van Belkum1

1bioMérieux, Data Analytics Unit, La Balme Les Grottes, France.

2Department of Biology, University of “Tor Vergata”, Rome, Italy.

3Department of Medical Biotechnologies, University of Siena, Siena, Italy.

4bioMérieux, R&D Microbiology, La Balme Les Grottes, France.

5Institute for Laboratory Diagnostics Konzilijum, Belgrade, Serbia.

6Institute of Microbiology and Immunology, Faculty of Medicine, University of Belgrade, Serbia.

7Laboratory of Medical Microbiology, Vaccine and Infectious Disease Institute, University of Antwerp,

Belgium.

8Microbiology and Virology Unit, Florence Careggi University Hospital, Florence, Italy.

9Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy.

Published in Frontiers in Microbiology, 21 February 2020, doi: 10.3389/fmicb.2020.00294

2.1 Abstract

Klebsiella pneumoniae is a major cause of severe healthcare-associated infections and often shows

MDR phenotypes. Carbapenem resistance is frequent, and colistin represents a key molecule to treat

infections caused by such isolates. Here we evaluated the antimicrobial resistance mechanisms and

the genomic epidemiology of clinical K. pneumoniae isolates from Serbia. Consecutive non-replicate

K. pneumoniae clinical isolates (N=2,298) were collected from seven hospitals located in five Serbian

cities and tested for carbapenem resistance by disk diffusion. Isolates resistant to at least one

carbapenem (N=426) were further tested for colistin resistance with Etest or Vitek2. Broth

microdilution (BMD) was performed to confirm the colistin resistance phenotype, and colistin-

resistant isolates (N=45, 10.6%) were characterized by Vitek2 and whole genome sequencing. Three

different clonal groups (CGs) were observed: CG101 (ST101, N=38), CG258 (ST437, N=4; ST340, N=1;

ST258, N=1) and CG17 (ST336, N=1). mcr genes, which encode acquired colistin resistance, were not

observed, whereas all the genomes presented mutations previously associated with colistin resistance.

In particular, all strains had a mutated MgrB, with MgrB C28S being the prevalent mutation and

associated with ST101. Isolates belonging to ST101 harbored the carbapenemase OXA-48, which is

generally encoded by an IncL/M plasmid that was not detected in our isolates. MinION sequencing

was performed on a representative ST101 strain, and the obtained long reads were assembled

together with the Illumina high-quality reads to decipher the blaOXA-48 genetic background. The blaOXA-48

gene was located in a novel IncFIA-IncR hybrid plasmid, also containing the extended-spectrum

β-lactamase-encoding gene blaCTX-M-15 and several other antimicrobial resistance genes. Non-ST101

isolates presented different MgrB alterations (C28S, C28Y, K2*, K3*, Q30*, adenine deletion leading

to frameshift and premature termination, IS5-mediated inactivation) and expressed different

carbapenemases: OXA-48 (ST437 and ST336), NDM-1 (ST437 and ST340) and KPC-2 (ST258). Our

study reports the clonal expansion of the newly emerging ST101 clone in Serbia. This high-risk clone

appears adept at acquiring resistance, and efforts should be made to contain the spread of this

clone.

2.2 Introduction

Klebsiella pneumoniae has emerged as one of the most challenging antibiotic-resistant pathogens,

since it can cause a variety of infections, including pneumonia and bloodstream infections, and

exhibits a remarkable propensity to acquire antimicrobial resistance (AMR) traits. In particular,

carbapenem-resistant K. pneumoniae (CRKP) are challenging pathogens due to the limited treatment

options, high mortality rates, and potential for rapid dissemination in health care settings (Paczosa

and Mecsas, 2016).

Treatment options for CRKP infections are usually limited to aminoglycosides, tigecycline, fosfomycin

and colistin. Novel β-lactam/β-lactamase inhibitor combinations, such as ceftazidime-avibactam and

meropenem-vaborbactam, have represented a major breakthrough for treatment of some CRKP (e.g.

those producing KPC-type and OXA-48-like enzymes), but unfortunately they do not cover strains

producing metallo-carbapenemases (Bassetti et al., 2018). Colistin, despite its nephrotoxicity and

neurotoxicity, remains a key component of some anti-CRKP regimens (Karaiskos et al., 2017).

Colistin resistance (colR) is mainly mediated by modifications of the lipid A moiety of the bacterial

lipopolysaccharide (LPS) by addition of positively charged 4-amino-4-deoxy-L-arabinose (L-Ara4N)

and/or phosphoethanolamine (pEtN) residues. A large panel of genes and operons is involved in

modifications of the LPS, and mutations conferring colistin resistance have mainly been observed in

mgrB, phoP/phoQ, pmrA/pmrB, and crrB genes (Cheng et al., 2010; Cannatelli et al., 2013, 2014a;

Wright et al., 2015). Recently, several plasmid-mediated colistin resistance genes, named mcr,

encoding pEtN transferases, have also been reported in E. coli and other members of

Enterobacterales, including K. pneumoniae (Sun et al., 2018).

Global dissemination of CRKP is mainly caused by the spread of a few successful clones. Major

representatives of these high-risk clonal lineages include the Clonal Group (CG) 11, CG15, CG307,

CG17, CG37, CG101 and CG147 strains. CG258 strains, and in particular those of ST258, are major

players in the worldwide spread of KPC-type carbapenemases, and are responsible for 68% of the

CRKP outbreaks (Navon-Venezia et al., 2017). CG101 strains harbor different clinically-relevant

resistance determinants, such as carbapenemases of the KPC, OXA-48, VIM and NDM types, and

virulence genes, such as an integrative conjugative element carrying the yersiniabactin siderophore

(ICEKp3), the fimbriae cluster (mrkABCDFHIJ), the ferric uptake system (kfuABC), a capsular K type

K17, and an O antigen type of O1 (Roe et al., 2019). These features, together with the ability to

produce biofilm, are likely major factors in the ecological success of CG101 strains. Indeed, the spread

of this clone is on the rise (Navon-Venezia et al., 2017).

Multidrug resistance (MDR) prevalence in clinical isolates of K. pneumoniae, including resistance to

third-generation cephalosporins, fluoroquinolones and aminoglycosides, may be as high as 50% in

Southern Europe, and even higher proportions have been observed in Eastern Europe. In Serbia, in

2016, MDR K. pneumoniae accounted for 63% of all K. pneumoniae infections in humans, of which

35% were also carbapenem resistant (WHO Regional Office for Europe, 2017). Previous studies

reported that NDM-1 was the main K. pneumoniae-associated carbapenemase observed in Serbia in

the period 2013-2014 followed by OXA-48, while KPC was only sporadically reported (Grundmann et

al., 2017; Trudic et al., 2017). Novović et al. performed a molecular epidemiology study of

carbapenem- and colistin-resistant strains from Serbia, showing prevalence of CG258 and CG101

strains, producing NDM-1 and OXA-48 carbapenemases, respectively. However, the proportion of

colistin resistance among those isolates was not reported, and the mechanisms of colistin resistance

of those isolates were not elucidated (Novović et al., 2017).

In this study, we used whole genome sequencing (WGS) to study the genomic epidemiology and

antimicrobial resistance mechanisms of colR K. pneumoniae isolates from Serbia, including some

representatives of the previously mentioned collection as a reference to study dynamic changes in

population structure (Novović et al., 2017).

2.3 Materials and methods

Bacterial isolates and susceptibility testing. In the period between November 2013 and May 2017, K.

pneumoniae isolates were obtained from routine microbiological cultures of clinical samples (e.g.

urine, blood, skin, bronchial aspirate) from seven Serbian medical centers distributed in five Serbian

cities (Niš, Novi Sad, Belgrade, Kraljevo and Subotica). Bacteria were not isolated by the authors but

provided by the respective medical centers. Therefore, an ethics approval was not required as per

institutional and national guidelines and regulations. Information about patients’ antimicrobial

treatment was not available. Identification at the species level was performed by MALDI-TOF MS

(Vitek MS, bioMérieux, Marcy l’Etoile, France), and carbapenem susceptibility was determined by

disk diffusion and interpreted according to the EUCAST breakpoints (EUCAST, 2019). Isolates

non-susceptible to at least one carbapenem (ertapenem, meropenem or imipenem) were tested for

colistin resistance by Vitek2 or Etest (bioMérieux, Marcy l’Etoile, France) according to manufacturer’s

instructions (note that the warning by EUCAST about colistin susceptibility testing was only issued in

July 2016, and for this reason the above methods were used for colistin susceptibility testing of the

isolates collected in this study). Antimicrobial susceptibility testing of the colR isolates was

performed using the Vitek2 automated system, and results were interpreted according to EUCAST

breakpoints (EUCAST, 2019). Colistin minimum inhibitory concentrations (MICs) were confirmed

using the broth microdilution method performed according to the CLSI guidelines (CLSI, 2019) and

interpreted by using the EUCAST breakpoints (EUCAST, 2019). For carbapenems (ertapenem,

imipenem and meropenem), MICs were obtained by using Etests (bioMérieux, Marcy l’Etoile, France).

Of note, 25 colR isolates were from the previously described collection by Novović et al., and were

included in this study for comparative purposes.

Mass spectrometry analysis of lipid A. Preparations of lipid A were obtained as previously described

(Kocsis et al., 2017). An aliquot of 0.7 µL of each preparation was spotted on a matrix-assisted laser

desorption/ionization–time of flight mass spectrometry (MALDI-TOF MS) sample plate, mixed with an

isovolume of norharmane matrix (Sigma-Aldrich, St Louis, Missouri) and then air-dried. Samples were

analyzed with a Vitek MS instrument (bioMérieux, Marcy l’Étoile, France) in the negative-ion mode.

DNA extraction and Whole Genome Sequencing. Genomic DNA was extracted with the DNeasy

UltraClean kit (Qiagen, Hilden, Germany), quantified by using the Qubit fluorometer (Thermo Fisher

Scientific, USA) and quality checked by using the 260/280 ratio absorbance parameter as determined

by the DS-11 FX+ instrument (DeNovix, Wilmington, USA). Sequencing was performed using a

NextSeq platform (Illumina, Inc., San Diego, USA) and a 2x150 bp paired-end approach. Raw data

from paired-end sequencing were quality checked with the FastQC tool (v.0.11.6) and assembled

with SPAdes (v.3.10.1) (Bankevich et al., 2012). One representative strain (KB-2017-139) was also

sequenced with the MinION sequencer (ONT, Oxford, UK) using an R9.5.1 flow cell and the protocol

1D Genomic DNA by Ligation (SQK-LSK109). Illumina and Nanopore raw data from KB-2017-139 were

assembled with a hybrid approach using Unicycler (Wick et al., 2017). Whole genome sequencing

data of the 45 clinical isolates have been deposited under BioProject PRJNA449293

(www.ncbi.nlm.nih.gov/bioproject/PRJNA449293). The complete sequence of the plasmid

pSRB_OXA-48 obtained by Illumina and Nanopore sequencing was deposited in GenBank under

accession number MN218814.
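
The quality-control and assembly steps described above can be sketched as a small helper that only builds the command lines. This is an illustration under stated assumptions, not the pipeline actually run for this study: the function name, file names and output layout are hypothetical, and only the basic input and output options of FastQC, SPAdes and Unicycler are used, because the exact parameters are not reported here.

```python
from pathlib import Path


def assembly_commands(sample, r1, r2, long_reads=None, outdir="assemblies"):
    """Return argv lists for read QC and assembly of one isolate.

    Hypothetical helper: nothing is executed, the commands are only built.
    Short reads alone -> SPAdes; short plus long reads -> Unicycler (hybrid).
    """
    out = Path(outdir) / sample
    commands = [
        # FastQC writes its reports into the directory given with -o
        # (the directory has to exist before the command is run).
        ["fastqc", "-o", str(out / "fastqc"), r1, r2],
    ]
    if long_reads is None:
        # Illumina-only assembly of the paired-end reads.
        commands.append(["spades.py", "-1", r1, "-2", r2, "-o", str(out / "spades")])
    else:
        # Hybrid assembly: paired-end short reads plus Nanopore long reads.
        commands.append(
            ["unicycler", "-1", r1, "-2", r2, "-l", long_reads, "-o", str(out / "unicycler")]
        )
    return commands


if __name__ == "__main__":
    # File names below are placeholders, not the study's actual files.
    for cmd in assembly_commands("KB-2017-139", "R1.fastq.gz", "R2.fastq.gz",
                                 long_reads="minion.fastq.gz"):
        print(" ".join(cmd))
```

Returning the commands instead of running them keeps the sketch testable without the tools installed; each list could be passed to `subprocess.run` once the software versions and parameters have been fixed.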

Bioinformatics analysis. MLST was performed in silico by using the tool mlst

(https://github.com/tseemann/mlst) and the Pasteur database (https://bigsdb.pasteur.fr/). BLAST+

(2.7.1) was used to detect mutations in genes potentially involved in colistin resistance (mgrB,

pmrA/B, phoP/Q, crrA/B), and only mutations leading to amino acid variations were considered. For

the characterization of colistin resistance mechanisms, strains of CG258, ST101 and ST336 were

compared to colistin-susceptible reference strains of the same CG, i.e. NJST258_2 (accession no.

NZ_CP006918.1), BA33875 (NEWA00000000) and MGH-78578 (NC_009648.1), respectively.

Phylogenetic relatedness was investigated with the parsnp tool (v1.2) (Treangen et al., 2014) by using

default parameters and the strain NTUH-K2044 (accession no. NC_012731.1) as reference. The

phylogenetic tree obtained was visualized with the online tool iTOL (Letunic and Bork, 2016). The

ABRicate tool (https://github.com/tseemann/abricate) was used to detect acquired antimicrobial

resistance genes using the ResFinder database (Zankari et al., 2012), while plasmid replicons were

predicted by PlasmidFinder (Carattoli et al., 2014). Kaptive was used for the capsular type detection

(Wyres et al., 2016). Comparative analysis of plasmids was performed with BLAST Ring Image

Generator (Alikhan et al., 2011) and Easyfig (Sullivan et al., 2011).
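
The rule that only mutations leading to amino acid variations were considered can be illustrated with a minimal sketch. It assumes that the coding sequence of a gene (e.g. mgrB) has already been extracted from the reference and from the query genome and is in frame; the function names and the toy sequences are hypothetical, and insertions, deletions and IS-mediated disruptions are not handled. The notation follows the one used in this chapter (e.g. C28S, K3*).

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    cds = cds.upper()
    protein = []
    for start in range(0, len(cds) - len(cds) % 3, 3):
        residue = CODON_TABLE.get(cds[start:start + 3], "X")  # X = ambiguous codon
        protein.append(residue)
        if residue == "*":
            break
    return "".join(protein)


def amino_acid_changes(ref_cds, query_cds):
    """List substitutions and premature stops of the query relative to the reference."""
    changes = []
    pairs = zip(translate(ref_cds), translate(query_cds))
    for position, (ref_aa, query_aa) in enumerate(pairs, start=1):
        if ref_aa != query_aa:
            changes.append(f"{ref_aa}{position}{query_aa}")
            if query_aa == "*":
                break  # nothing downstream of a premature stop is translated
    return changes


if __name__ == "__main__":
    reference = "ATGAAATGTGGCTAA"  # toy sequence encoding M K C G (not a real gene)
    print(amino_acid_changes(reference, "ATGAAAAGTGGCTAA"))  # ['C3S']
    print(amino_acid_changes(reference, "ATGTAATGTGGCTAA"))  # ['K2*']
    print(amino_acid_changes(reference, "ATGAAGTGTGGCTAA"))  # [] (synonymous change)
```

Synonymous nucleotide changes produce an empty list, which corresponds to the filtering step described above.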

For the comparative genomic analysis of ST101 isolates, on 31 October 2018 all the K. pneumoniae

genomes available on NCBI (N=5,820) were downloaded with the ncbi-genome-download tool

(https://github.com/kblin/ncbi-genome-download). MLST was performed and all ST101 (N=195)

(Table S2) together with ST101 strains from this study were used for phylogenetic investigations by

using parsnp and the closed ST101 chromosome from Kp_Goe_121641 (accession no.

NZ_CP018735.1) as reference.
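
The selection of ST101 genomes from the downloaded collection can be sketched as a simple filter on in silico MLST results. The sketch assumes a tab-separated table with the file name in the first column, the scheme in the second and the sequence type in the third, followed by optional allele columns; the file names and the scheme label in the example are placeholders, not actual records from this study.

```python
import csv
import io


def genomes_with_st(mlst_table, sequence_type):
    """Return the file names whose sequence type equals `sequence_type`.

    `mlst_table` is tab-separated text: file, scheme, ST, then optional alleles.
    Rows without an assigned ST (for example '-') never match.
    """
    wanted = str(sequence_type)
    reader = csv.reader(io.StringIO(mlst_table), delimiter="\t")
    return [row[0] for row in reader if len(row) >= 3 and row[2] == wanted]


if __name__ == "__main__":
    # Invented rows for illustration only.
    table = (
        "genome_A.fna\tSCHEME\t101\n"
        "genome_B.fna\tSCHEME\t258\n"
        "genome_C.fna\tSCHEME\t-\n"
        "genome_D.fna\tSCHEME\t101\n"
    )
    print(genomes_with_st(table, 101))  # ['genome_A.fna', 'genome_D.fna']
```

The resulting list of files would then be passed, together with the study isolates, to the core-genome alignment step.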

2.4 Results

K. pneumoniae isolates and antimicrobial susceptibilities. In the period between November 2013

and May 2017, a total of 2,298 clinical isolates of K. pneumoniae were isolated from patients

admitted to seven medical settings located in five Serbian cities. Among those, 426 isolates (18.5%)

were non-susceptible to at least one carbapenem by disk diffusion, and were tested for colistin

resistance. A total of 45 strains (10.6%) out of this subset showed a colistin resistant phenotype. At

the time of the collection, colistin susceptibility testing was routinely performed with the Vitek2

instrument or Etest, although these methods had several limitations (Tan and Ng, 2007). Thus, the

number of colR isolates may be underestimated.
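
As a quick check of the proportions reported above, the following sketch recomputes them from the stated counts. The function name is hypothetical; the counts are those given in the text.

```python
def percentage(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


if __name__ == "__main__":
    total_isolates = 2298          # K. pneumoniae clinical isolates collected
    carbapenem_non_susceptible = 426
    colistin_resistant = 45

    # Share of all isolates that were non-susceptible to at least one carbapenem.
    print(percentage(carbapenem_non_susceptible, total_isolates))      # 18.5
    # Share of the carbapenem non-susceptible subset that was colistin resistant.
    print(percentage(colistin_resistant, carbapenem_non_susceptible))  # 10.6
```

Note that the 10.6% refers to the carbapenem non-susceptible subset, not to the whole collection.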

All the strains were confirmed as colistin resistant by the broth microdilution method (considering

the EUCAST susceptibility breakpoint of 2 mg/L) with MICs that ranged between 8 and 32 mg/L

(Table S1). Etest results for carbapenems showed that all the strains were resistant to ertapenem,

while meropenem and imipenem had susceptibility rates of 93.3% and 91.1%, respectively. Vitek2

results showed that none of the fluoroquinolones, penicillins combined with β-lactamase inhibitors

and cephalosporins (including cefoxitin and the 4th generation cephalosporin cefepime) were

effective against the 45 colR isolates. Conversely, amikacin (86% susceptibility) and

trimethoprim/sulfamethoxazole (78% susceptibility) were the most active agents together with

imipenem and meropenem (Table S1).

Genomic epidemiology. Genome sequence data were used to investigate the population structure of

the colR K. pneumoniae strains circulating in Serbia. Five different STs were detected among the

investigated collection (ST101, ST437, ST258, ST336 and ST340), with the majority of strains

belonging to ST101 (N=38) or CG258 (ST258, N=1; ST340, N=1 and ST437, N=4) (Figure 1). The

remaining strain belonged to CG17 and was typed as ST336. Isolates of ST101 were closely related to

each other (single nucleotide polymorphism (SNP) variation: 5–893, mean 107, median 61), with only

two of them (i.e. KV-2017-142 and KV-2017-143) having more than 200 SNPs when compared to

other ST101 isolates and to each other. The ST101 isolates were detected in all the cities involved in

this study, except Niš, indicating that this clone is endemic at the national level.

Moreover, isolates did not cluster clearly by hospital of origin, suggesting

inter-hospital transmission.
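
SNP ranges, means and medians of the kind reported above are summaries of a pairwise distance matrix. The sketch below computes them for a toy alignment in Python; the isolate names and sequences are hypothetical, and real analyses would start from the core-genome alignment produced by the phylogenetic pipeline.

```python
from itertools import combinations
from statistics import mean, median

# Toy core-genome alignment; isolate names and sequences are hypothetical.
ALIGNMENT = {
    "iso1": "ACGTACGTAC",
    "iso2": "ACGTACGTAT",
    "iso3": "ACGAACGTAT",
    "iso4": "TCGAACCTAT",
}


def snp_distance(a, b):
    """Count positions where both sequences carry different unambiguous bases."""
    return sum(1 for x, y in zip(a, b) if x != y and x in "ACGT" and y in "ACGT")


# Distance for every unordered pair of isolates.
distances = {
    (p, q): snp_distance(ALIGNMENT[p], ALIGNMENT[q])
    for p, q in combinations(sorted(ALIGNMENT), 2)
}
values = list(distances.values())
print("range:", min(values), "-", max(values))
print("mean:", round(mean(values), 2), "median:", median(values))
```

Positions with gaps or ambiguous bases are ignored in this sketch, which is one common convention; a different handling of such sites would change the counts.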

Figure 1. Phylogenetic tree of the colR K. pneumoniae isolates from Serbia. For each isolate, the medical setting (CN, Clinical center of Niš, Niš; CV, Clinical center of Vojvodina, Novi Sad; KB, Konzilijum, Belgrade; DM, University hospital center “Dr Dragiša Mišovic-Dedinje”, Belgrade; KV, The General hospital “Studenica”, Kraljevo; GZ, The Institute of Public health of Belgrade, Belgrade; SU, General Hospital Subotica, Subotica), the year of isolation and the sample number are reported. Colored nodes indicate MLST, while the presence/absence of ESBLs, carbapenemases, resistance genes (black) and plasmid replicons is indicated by filled boxes.

The genomes of the ST101 Serbian isolates were compared with 195 ST101 genomes available in the

NCBI databases, and their phylogenetic relation is shown in Figure 2. Strains from our study (red

lines) cluster together in the tree in a well-defined branch containing other strains from Serbia,

Slovenia, Turkey and Greece. Overall, the number of SNPs among all analyzed ST101 isolates ranged

between 1 and 1,547 (mean 195, median 135), and two major lineages within this group can be

observed. The majority of SNPs separating these two lineages fell in the cps gene cluster, and this

was consistent with the previous observations that strains of ST101 are characterized by two

different K-loci, KL17 and KL106, associated with wzi alleles 137 and 29, respectively (Roe et al.,

2019). While KL17 is prevalent among ST101 strains, KL106 is less frequent but, interestingly, it is the

second most abundant capsular variant of CG258 (Wyres et al., 2015), reinforcing the hypothesis that

capsular exchange in K. pneumoniae is a common event (Chen et al., 2014; Bowers et al., 2015).

All non-ST101 isolates (excluding KB-2015-119) were part of a single monophyletic subclade within

the CG258 (Bowers et al., 2015) and produced different carbapenemases or were carbapenemase

negative (Figure 1), while the remaining isolate of ST336 was an OXA-48 producer and harbored the

KL25 capsular type.

Figure 2. Phylogenetic tree of the ST101 K. pneumoniae isolates from this study (red lines) in comparison to ST101 isolates retrieved from NCBI (black lines). The two types of capsular polysaccharides (KL17 and KL106) are indicated by colored ranges. Two datasets are also present, indicating the type of carbapenemase (inner circle) and the country of origin (outer circle).

Colistin resistance mechanisms. No mcr genes were observed in the genomes of the colR isolates.

Conversely, all of them showed alterations in the PhoP/PhoQ regulator mgrB gene. These alterations

were mainly SNPs, with the majority of ST101 isolates from this study characterized by the mutation

MgrBC28S (N=37; 97.4%). Although different substitutions of the cysteine amino acid at position 28

have already been described (e.g. MgrBC28F and MgrBC28Y), and their role in colistin resistance has

been experimentally demonstrated (Cannatelli et al., 2014b; Olaitan et al., 2014; Cheng et al., 2015;

Wright et al., 2015), the MgrBC28S substitution is first described here. This cysteine residue has been previously

shown to be involved in a key disulfide bond relevant to MgrB function (Lippa and Goulian, 2012),

thus its substitution by serine or by any other amino acid is expected to interfere with the ability to

repress PhoQ, leading to the overexpression of the pmrHFIJKLM operon and to a colistin resistance

phenotype. The isolate CN-2013-099, belonging to ST340, displayed the previously studied MgrBC28Y

substitution (Cheng et al., 2015). Mutations leading to premature stop codons were MgrBK2*

in the ST101 isolate KV-2017-143, first described here, MgrBK3* in the ST437 isolate GZ-2017-145

(Nordmann et al., 2016) and MgrBQ30* in the ST336 strain KB-2015-119 (Nordmann et al., 2016). The

ST258 isolate was characterized by an insertion sequence of the family IS5 which interrupted the

mgrB gene at nucleotide 75. Disruption of the mgrB gene by insertion sequences has been shown to be

a common mechanism of colistin resistance in KPC-producing strains (Cannatelli et al., 2014b). Three

ST437 strains were characterized by an adenine deletion within the polyadenine region present from

nucleotide 4 to 9 in mgrB, resulting in a frameshift mutation. Collectively, the results of these

analyses demonstrated that all colistin resistant strains investigated in this study were characterized

by genetic alterations in the mgrB gene.
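
The substitution names used above (for example C28S or K3*) encode the reference residue, its 1-based position and the observed residue. The Python sketch below derives such labels by comparing a variant protein with a reference. The 47-residue reference is a placeholder: only positions 2, 3, 28 and 30 mirror residues named in the text, and every other residue is arbitrary filler, so it must not be taken as the real MgrB sequence. The helpers `describe_changes` and `substitute` are hypothetical names.

```python
# Placeholder 47-residue protein: only K2, K3, C28 and Q30 mirror residues
# named in the text; all other positions are arbitrary filler (assumption).
REFERENCE = "MKK" + "A" * 24 + "CDQ" + "A" * 17


def describe_changes(reference, variant):
    """List substitutions as <ref><position><alt>; '*' marks a premature stop."""
    changes = []
    for pos, (ref_aa, alt_aa) in enumerate(zip(reference, variant), start=1):
        if alt_aa != ref_aa:
            changes.append(f"{ref_aa}{pos}{alt_aa}")
        if alt_aa == "*":
            break  # nothing downstream of a stop codon is translated
    return changes


def substitute(reference, pos, alt_aa):
    """Return a copy of the reference with one residue replaced (1-based position)."""
    return reference[: pos - 1] + alt_aa + reference[pos:]


print(describe_changes(REFERENCE, substitute(REFERENCE, 28, "S")))  # ['C28S']
print(describe_changes(REFERENCE, substitute(REFERENCE, 3, "*")))   # ['K3*']
```

This position-by-position comparison covers only substitutions and premature stops. Frameshifts and insertion-sequence disruptions, such as those described above, need a nucleotide-level alignment and are outside the scope of the sketch.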

Other genetic alterations potentially involved in colistin resistance were: PmrAE57G (KB-2015-119,

ST336), PmrBT157P (CV-2015-105, ST101) and PhoQV446G (DM-2017-135, ST258). Among these, only

PmrBT157P was previously reported, and its role in reducing colistin susceptibility was demonstrated

(Jayol et al., 2014). Accordingly, the ST101 isolate CV-2015-105 having PmrBT157P together with

MgrBC28S, showed a colistin MIC 1- to 2-fold higher than isogenic strains carrying only MgrBC28S.

Mass spectrometry of lipid A was performed on a subset of isolates representative of the different

alterations potentially involved in colistin resistance. Compared to the colistin susceptible reference

ATCC11296 strain, colR isolates showed an additional peak at 1,971 m/z resulting from the addition

of a 4-amino-4-deoxy-L-arabinose moiety (131 m/z) to lipid A (peak at 1,840 m/z), as previously

reported (Leung et al., 2017) (results not shown). This supports the role of the observed mutations in

the overexpression of the pmrHFIJKLM operon and consequent lipid A modification, leading to

reduced colistin interactions. Moreover, no addition of pEtN moieties to lipid A was observed,

consistent with the absence of mcr-like genes (Liu et al., 2017).
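
The peak assignment described above is simple mass arithmetic: 1,840 + 131 = 1,971 m/z for lipid A carrying one L-Ara4N moiety. The Python sketch below labels peaks in a spectrum on that basis. The peak list, the 1 m/z tolerance and the helper `annotate_peaks` are assumptions for illustration; the phosphoethanolamine shift of 123 m/z is a commonly cited value that is not given in the text.

```python
LIPID_A_BASE_MZ = 1840.0  # unmodified lipid A peak reported in the text
L_ARA4N_SHIFT = 131.0     # 4-amino-4-deoxy-L-arabinose addition reported in the text
PETN_SHIFT = 123.0        # assumption: commonly cited pEtN shift, not given in the text


def annotate_peaks(peaks_mz, tolerance=1.0):
    """Label observed peaks that match base lipid A or one of its modified forms."""
    expected = {
        "lipid A": LIPID_A_BASE_MZ,
        "lipid A + L-Ara4N": LIPID_A_BASE_MZ + L_ARA4N_SHIFT,
        "lipid A + pEtN": LIPID_A_BASE_MZ + PETN_SHIFT,
    }
    found = set()
    for peak in peaks_mz:
        for label, mz in expected.items():
            if abs(peak - mz) <= tolerance:
                found.add(label)
    return found


resistant_spectrum = [1840.2, 1971.1]  # hypothetical peak list
print(sorted(annotate_peaks(resistant_spectrum)))
```

With these hypothetical peaks the sketch reports the base and L-Ara4N-modified species and no pEtN-modified species, which is the pattern described for the colR isolates.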

Of note, our findings concerning MgrB alterations differ from those previously reported by Novović

et al. (2017), who did not detect significant MgrB alterations in most of their isolates. This underlines the

importance of using well-characterized colistin-susceptible reference isolates, as the one used in the

mentioned study was not characterized with reference methods for colistin susceptibility testing

(Mirovic et al., 2012).

Other antibiotic resistance mechanisms. All strains were positive for an ESBL-encoding gene, with

blaCTX-M-15 harbored by all strains except the single ST258 isolate, which carried a blaSHV-12 gene. Analysis of the

ompK35 gene, encoding a major outer membrane protein, showed that all non-ST258 strains had

deletions leading to frameshifts and premature stop codons, while the ompK36 gene was intact in all

the genomes. Outer membrane impermeability most likely explains resistance to cefoxitin (a

cephamycin) and to ertapenem for those isolates negative for a carbapenemase-encoding gene

(Ardanuy et al., 1998). Two ST437 isolates and the ST336 isolate harbored the 16S rRNA methylase gene

armA, which confers high-level resistance to aminoglycosides. Several other antimicrobial resistance

genes were observed for the following antimicrobial classes: aminoglycosides (presence of aac-,

aad-, aph- and ant-type modifying enzymes), fluoroquinolones (oqxAB, qnrB1, aac(6')-Ib-cr, parCS80I,

gyrAS83Y-S83I-D87G-D87N), phenicol (floR, catA1 and catB4 genes), sulfonamide (sul1 and sul2 genes),

tetracycline (tetA and tetD genes) and trimethoprim (dfrA).

Novel IncR/IncFIA OXA-48 plasmid within ST101 isolates. OXA-48 production was the basis

of carbapenem resistance in the ST101 K. pneumoniae isolates analyzed in this study. For this reason, we

investigated the genetic context of this gene in detail. Spread of the blaOXA-48 gene among

Enterobacterales is mainly related to the dissemination of a single ~62-kb IncL/M-like conjugative

plasmid (Poirel et al., 2012). However, PlasmidFinder analysis did not detect any IncL/M replicon

among ST101 isolates from Serbia. Therefore, MinION sequencing was performed on one

representative strain (KB-2017-139) with the aim to fully characterize the genomic background of the

blaOXA-48 gene.

The blaOXA-48 gene was located on a plasmid of 83,654 bp, named pSRB_OXA-48, carrying both the

IncR and the IncFIA type replicons, the blaCTX-M-15 and several other antimicrobial resistance genes

(tet(D), aac(6')-Ib-cr, blaOXA-1, catB3-like, aac(3)-IIa and dfrA14). A BLAST analysis showed that

pSRB_OXA-48 is a hybrid plasmid composed of i) a fragment having 99.7% identity with the IncFIA-

IncR pKp_Goe_641-1 plasmid (CP018737.1) and carrying the blaCTX-M-15 gene and several other

antimicrobial resistance genes (aac(3)-IIa, catB3, blaOXA-1, aac(6')-Ib-cr, aac(6')-Ib, ant(3'')-Ia, blaOXA-9,

blaTEM-1A, dfrA14), and ii) a fragment identical to the IncL/M plasmid pKp_Goe_641-2 (CP018736.1)

carrying the blaOXA-48 gene (Figure 3). Both these plasmids have been described in K. pneumoniae

strain Kp_Goe_121641 (accession no. NZ_CP018735.1), isolated from a refugee from North Africa

hospitalized in Germany, in 2013. The latter strain belongs to ST101 and has a median of 142 SNPs

(min 134, max 601) compared to the Serbian ST101 isolates from this study. Collectively these results

suggest that pSRB_OXA-48 likely originated by recombination events between two plasmids within

an ST101 strain related to Kp_Goe_121641. In order to elucidate the recombination mechanisms at

the origin of pSRB_OXA-48, we compared this plasmid to pKp_Goe_641-1 and to pRA35

(LN864821.1), an IncL/M plasmid similar to pKp_Goe_641-2 but with an intact structure of the

transposon Tn6237 carrying blaOXA-48 (Beyrouthy et al., 2014) (Figure 3). A detailed analysis showed

that pSRB_OXA-48 contained a copy of Tn6237 which was disrupted by an IS26 composite transposon

of 73.7 kb sharing similarity with pKp_Goe_641-1. This hypothesis was corroborated by the

presence of 8-bp target site duplication sequences (5’-GCGAATAA-3’) flanking the composite

transposon region (Figure 4). The results of read mapping performed against pSRB_OXA-48 using

Illumina short-reads from the other ST101/OXA-48 strains were consistent with the presence of a

pSRB_OXA-48-related plasmid in all the ST101/OXA-48 isolates. Non-ST101 OXA-48 strains (ST336

KB-2015-119 and ST437 GZ-2017-145) had the IncL/M replicon, while lacking the IncFIA and IncR

replicons, suggesting that the blaOXA-48 gene was located in a classic IncL/M plasmid and not in a

pSRB_OXA-48-like plasmid (Figure 1).
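
The target site duplication argument above rests on finding the same 8-bp sequence immediately on both sides of the inserted region. A minimal Python sketch of that check follows. The backbone and inserted element are short invented placeholders, not real plasmid or IS26 sequence; only the 8-bp repeat is taken from the text, and `flanking_duplication` is a hypothetical helper.

```python
TSD = "GCGAATAA"  # 8-bp target site duplication reported in the text


def flanking_duplication(sequence, start, end, size=8):
    """Return the direct repeat flanking sequence[start:end], or None if the flanks differ."""
    if start < size or end + size > len(sequence):
        return None  # not enough flanking sequence to compare
    left = sequence[start - size:start]
    right = sequence[end:end + size]
    return left if left == right else None


# Toy construct: backbone + TSD + inserted element + TSD + backbone.
element = "ACGTACGTTGCATG"  # arbitrary placeholder for the inserted region
backbone_left, backbone_right = "TTTTCCCC", "CCCCTTTT"
plasmid = backbone_left + TSD + element + TSD + backbone_right
start = len(backbone_left) + len(TSD)  # first base of the inserted element
end = start + len(element)             # first base after the inserted element

print(flanking_duplication(plasmid, start, end))
```

Shifting either boundary by a single base makes the two flanks differ, so the check is sensitive to how precisely the insertion boundaries have been annotated.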

Figure 3. BLAST ring image generator output of the OXA-48 plasmid pSRB_OXA-48 from the ST101 isolate KB-2017-139 (violet) against the two major plasmids from the ST101 isolate Kp_Goe_121641 (pKp_Goe_641-1, in red and pKp_Goe_641-2 in green). Only identities >95% are indicated. Antimicrobial resistance genes are indicated in red, plasmid replicons in blue and all other genes in black.

Figure 4. Comparison of plasmids pSRB_OXA-48, pKp_Goe_641-1 and pRA35. Antimicrobial resistance genes, plasmid replicons and mobile elements are also indicated. TSD: target site duplication.

2.5 Discussion

This study exploited WGS to characterize a collection of colR CRKP isolates obtained from seven

medical settings and five Serbian cities over a nearly four-year period. Results showed that all the

isolates presented alterations in the PhoP/PhoQ regulator MgrB, confirming its major role in colR in K.

pneumoniae. Lipid A alterations associated with colR were also studied with MALDI-TOF MS. The

analysis revealed the addition of a 4-amino-4-deoxy-L-arabinose moiety to lipid A, but no addition of

pEtN moieties, for all isolates tested. These results support the role of the MgrB mutations in colistin

resistance, and also confirm the absence of mcr-like genes.

The predominant ST observed was ST101, an emerging high-risk clone detected worldwide and

associated with different carbapenemases and high mortality (Navon-Venezia et al., 2017; Can et al.,

2018). In a recent European survey of CRKP isolates, including 244 hospitals in 32 countries, four

major clonal lineages accounted for roughly 70% of the carbapenemase-producing isolates, including

ST 11, 15, 101, 258/512 and their derivatives (David et al., 2019). The first ST101 strain from Serbia

was isolated in 2013, and coproduced the OXA-48 and the NDM-1 carbapenemases (Seiffert et al.,

2014). Most of the colR ST101 from this study were carbapenemase-producers, and OXA-48 was the

only carbapenemase expressed. ST101/OXA-48 has been frequently reported, and in an 11-year

epidemiology study of OXA-48 producers among European and North African countries, a quarter of

the OXA-48 K. pneumoniae isolates belonged to ST101 (Potron et al., 2013). Outbreaks of

ST101/OXA-48 were also described, with reports from Spain (Pitart et al., 2011; Cubero et al., 2015),

Algeria (Loucif et al., 2016), Czech Republic (Skálová et al., 2016) and Greece (Avgoulea et al., 2018).

The challenging phenotypic detection of OXA-48 carbapenemases and the rapid horizontal transfer

of OXA-48-encoding plasmids favor hospital outbreaks linked to patient transfer (Skálová et al., 2016)

and draw attention to the need for continuous and meticulous surveillance, as well as timely

investigation.

The blaOXA-48 gene spread is mainly related to the dissemination of a single ~62-kb IncL/M-like

conjugative plasmid that does not carry additional resistance determinants (Poirel et al., 2012).

Conversely, ST101/OXA-48 isolates from this study carried a novel hybrid plasmid (pSRB_OXA-48)

with replicons IncR and IncFIA and encoding OXA-48, the CTX-M-15 ESBL and several other

antimicrobial resistance genes. Such plasmids confer an MDR phenotype which limits the use of most

β-lactams, including carbapenems. In fact, even if most isolates (91%) were susceptible to imipenem,

carbapenems were not effective against OXA-48 producers in a murine infection model (Wiskirchen et al.,

2014). Moreover, there have been a number of case reports and series describing treatment failures

with carbapenem-containing regimens in the treatment of OXA-48-producing bacterial infections

(Stewart et al., 2018). Ceftazidime-avibactam may represent an effective alternative against such

isolates, as previously reported (Kazmierczak et al., 2018).

Similarities among the Serbian ST101 strains, supported by the limited number of SNPs observed and

the presence of the same alteration in the mgrB gene, suggest clonal expansion of this lineage among

Serbian medical settings. This observation underscores the need to strengthen contact precautions

for patients diagnosed with or suspected of having CRKP infections to limit the diffusion of colR CRKP

of ST101.

Of note, colR ST101 strains have recently been associated with high mortality rates. Indeed, a

prospective cohort study showed that among colR isolates, ST101 was found to be a significant

independent predictor of patient mortality, with a 30-day patient mortality of 72% (Can et al., 2018).

In conclusion, this work represents the first genomic investigation of colistin resistance in K.

pneumoniae isolates from Serbia. The major role of MgrB mutations in colistin resistance in K.

pneumoniae, observed in strains of CG258, is here confirmed for those of ST101. We also report the

full sequence of a novel plasmid, pSRB_OXA-48, conferring an MDR phenotype and encoding the

ESBL CTX-M-15 and the carbapenemase OXA-48.

2.6 References

Alikhan, N.-F., Petty, N. K., Ben Zakour, N. L., and Beatson, S. A. (2011). BLAST Ring Image Generator

(BRIG): simple prokaryote genome comparisons. BMC Genomics 12, 402. doi:10.1186/1471-

2164-12-402.

Ardanuy, C., Liñares, J., Domínguez, M. A., Hernández-Allés, S., Benedí, V. J., and Martínez-Martínez,

L. (1998). Outer membrane profiles of clonally related Klebsiella pneumoniae isolates from

clinical samples and activities of cephalosporins and carbapenems. Antimicrob. Agents

Chemother. 42, 1636–40. Available at: http://www.ncbi.nlm.nih.gov/pubmed/9660996.

Avgoulea, K., Di Pilato, V., Zarkotou, O., Sennati, S., Politi, L., Cannatelli, A., et al. (2018).

Characterization of extensively- or pandrug-resistant ST147 and ST101 OXA-48-producing

Klebsiella pneumoniae isolates causing bloodstream infections in ICU patients. Antimicrob.

Agents Chemother., AAC.02457-17. doi:10.1128/AAC.02457-17.

Bankevich, A., Nurk, S., Antipov, D., Gurevich, A. A., Dvorkin, M., Kulikov, A. S., et al. (2012). SPAdes:

A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing. J. Comput.

Biol. 19, 455–477. doi:10.1089/cmb.2012.0021.

Bassetti, M., Righi, E., Carnelutti, A., Graziano, E., and Russo, A. (2018). Multidrug-resistant Klebsiella

pneumoniae: challenges for treatment, prevention and infection control. Expert Rev. Anti.

Infect. Ther. 00, 14787210.2018.1522249. doi:10.1080/14787210.2018.1522249.

Beyrouthy, R., Robin, F., Delmas, J., Gibold, L., Dalmasso, G., Dabboussi, F., et al. (2014). IS1R-

Mediated Plasticity of IncL/M Plasmids Leads to the Insertion of blaOXA-48 into the Escherichia

coli Chromosome. Antimicrob. Agents Chemother. 58, 3785. doi:10.1128/AAC.02669-14.

Bowers, J. R., Kitchel, B., Driebe, E. M., MacCannell, D. R., Roe, C., Lemmer, D., et al. (2015). Genomic

Analysis of the Emergence and Rapid Global Dissemination of the Clonal Group 258 Klebsiella

pneumoniae Pandemic. PLoS One 10, e0133727. doi:10.1371/journal.pone.0133727.

Can, F., Menekse, S., Ispir, P., Atac, N., Albayrak, O., Demir, T., et al. (2018). Impact of the ST101

clone on fatality among patients with colistin-resistant Klebsiella pneumoniae infection. J.

Antimicrob. Chemother., 1–7. doi:10.1093/jac/dkx532.

Cannatelli, A., D’Andrea, M. M., Giani, T., Di Pilato, V., Arena, F., Ambretti, S., et al. (2013). In vivo

emergence of colistin resistance in Klebsiella pneumoniae producing KPC-type carbapenemases

mediated by insertional inactivation of the PhoQ/PhoP mgrB regulator. Antimicrob. Agents

Chemother. 57, 5521–5526. doi:10.1128/AAC.01480-13.

Cannatelli, A., Di Pilato, V., Giani, T., Arena, F., Ambretti, S., Gaibani, P., et al. (2014a). In vivo

evolution to Colistin resistance by PmrB sensor kinase mutation in KPC-producing Klebsiella

pneumoniae is associated with low-dosage colistin treatment. Antimicrob. Agents Chemother.

58, 4399–4403. doi:10.1128/AAC.02555-14.

Cannatelli, A., Giani, T., D’Andrea, M. M., Pilato, V. Di, Arena, F., Conte, V., et al. (2014b). MgrB

inactivation is a common mechanism of colistin resistance in KPC-producing Klebsiella

pneumoniae of clinical origin. Antimicrob. Agents Chemother. 58, 5696–5703.

doi:10.1128/AAC.03110-14.

Carattoli, A., Zankari, E., García-Fernández, A., Larsen, M. V., Lund, O., Villa, L., et al. (2014). In Silico

detection and typing of plasmids using plasmidfinder and plasmid multilocus sequence typing.

Antimicrob. Agents Chemother. 58, 3895–3903. doi:10.1128/AAC.02412-14.

Chen, L., Mathema, B., Pitout, J. D. D., DeLeo, F. R., and Kreiswirth, B. N. (2014). Epidemic Klebsiella

pneumoniae ST258 Is a Hybrid Strain. mBio 5, e01355-14. doi:10.1128/mBio.01355-14.

Cheng, H.-Y., Chen, Y.-F., and Peng, H.-L. (2010). Molecular characterization of the PhoPQ-PmrD-

PmrAB mediated pathway regulating polymyxin B resistance in Klebsiella pneumoniae CG43. J.

Biomed. Sci. 17, 60. doi:10.1186/1423-0127-17-60.

Cheng, Y. H., Lin, T. L., Pan, Y. J., Wang, Y. P., Lin, Y. T., and Wang, J. T. (2015). Colistin resistance

mechanisms in Klebsiella pneumoniae strains from Taiwan. Antimicrob. Agents Chemother. 59,

2909–2913. doi:10.1128/AAC.04763-14.

CLSI (2019). CLSI. Performance Standards for Antimicrobial Susceptibility Testing. 29th ed. CLSI

supplement M100. Wayne, PA: Clinical and Laboratory Standards Institute; 2019.

Cubero, M., Cuervo, G., Dominguez, M. Á., Tubau, F., Martí, S., Sevillano, E., et al. (2015).

Carbapenem-resistant and carbapenem-susceptible isogenic isolates of Klebsiella pneumoniae

ST101 causing infection in a tertiary hospital. BMC Microbiol. 15, 177. doi:10.1186/s12866-015-

0510-9.

David, S., Reuter, S., Harris, S. R., Glasner, C., Feltwell, T., Argimon, S., et al. (2019). Epidemic of

carbapenem-resistant Klebsiella pneumoniae in Europe is driven by nosocomial spread. Nat.

Microbiol. doi:10.1038/s41564-019-0492-8.

EUCAST (2019). The European Committee on Antimicrobial Susceptibility Testing. Breakpoint tables

for interpretation of MICs and zone diameters. Version 9.0, 2019. http://www.eucast.org.

Grundmann, H., Glasner, C., Albiger, B., Aanensen, D. M., Tomlinson, C. T., Andrasević, A. T., et al.

(2017). Occurrence of carbapenemase-producing Klebsiella pneumoniae and Escherichia coli in

the European survey of carbapenemase-producing Enterobacteriaceae (EuSCAPE): a prospective,

multinational study. Lancet Infect. Dis. 17, 153–163. doi:10.1016/S1473-3099(16)30257-2.

Jayol, A., Poirel, L., Brink, A., Villegas, M. V., Yilmaz, M., and Nordmann, P. (2014). Resistance to

colistin associated with a single amino acid change in protein PmrB among Klebsiella

pneumoniae isolates of worldwide origin. Antimicrob. Agents Chemother. 58, 4762–4766.

doi:10.1128/AAC.00084-14.

Karaiskos, I., Souli, M., Galani, I., and Giamarellou, H. (2017). Colistin: still a lifesaver for the 21st

century? Expert Opin. Drug Metab. Toxicol. 13, 59–71. doi:10.1080/17425255.2017.1230200.

Kazmierczak, K. M., Bradford, P. A., Stone, G. G., de Jonge, B. L. M., and Sahm, D. F. (2018). In Vitro

Activity of Ceftazidime-Avibactam and Aztreonam-Avibactam against OXA-48-Carrying

Enterobacteriaceae Isolated as Part of the International Network for Optimal Resistance

Monitoring (INFORM) Global Surveillance Program from 2012 to 2015. Antimicrob. Agents

Chemother. 62. doi:10.1128/AAC.00592-18.

Kocsis, B., Kilár, A., Péter, S., Dörnyei, Á., Sándor, V., and Kilár, F. (2017). “Mass Spectrometry for

Profiling LOS and Lipid A Structures from Whole-Cell Lysates: Directly from a Few Bacterial

Colonies or from Liquid Broth Cultures,” in Methods in molecular biology (Clifton, N.J.), 187–198.

doi:10.1007/978-1-4939-6958-6_17.

Letunic, I., and Bork, P. (2016). Interactive tree of life (iTOL) v3: an online tool for the display and

annotation of phylogenetic and other trees. Nucleic Acids Res. 44, W242-5.

doi:10.1093/nar/gkw290.

Leung, L. M., Fondrie, W. E., Doi, Y., Johnson, J. K., Strickland, D. K., Ernst, R. K., et al. (2017).

Identification of the ESKAPE pathogens by mass spectrometric analysis of microbial membrane

glycolipids. Sci. Rep. 7, 6403. doi:10.1038/s41598-017-04793-4.

Lippa, A. M., and Goulian, M.

(2012). Perturbation of the Oxidizing Environment of the Periplasm Stimulates the PhoQ/PhoP

System in Escherichia coli. J. Bacteriol. 194, 1457–1463. doi:10.1128/JB.06055-11.

Liu, Y.-Y., Chandler, C. E., Leung, L. M., McElheny, C. L., Mettus, R. T., Shanks, R. M. Q., et al. (2017).

Structural Modification of Lipopolysaccharide Conferred by mcr-1 in Gram-Negative ESKAPE

Pathogens. Antimicrob. Agents Chemother. 61, e00580-17. doi:10.1128/AAC.00580-17.

Loucif, L., Kassah Laouar, A., Saidi, M., Messala, A., Chelaghma, W., and Rolain, J.-M. (2016).

Outbreak of OXA-48-producing Klebsiella pneumoniae involving an ST 101 clone in Batna

University Hospital, Algeria. Antimicrob. Agents Chemother. 60, AAC.00525-16.

doi:10.1128/AAC.00525-16.

Mirovic, V., Tomanovic, B., Lepsanovic, Z., Jovcic, B., and Kojic, M. (2012). Isolation of Klebsiella

pneumoniae Producing NDM-1 Metallo-β-Lactamase from the Urine of an Outpatient Baby Boy

Receiving Antibiotic Prophylaxis. Antimicrob. Agents Chemother. 56, 6062–6063.

doi:10.1128/AAC.00838-12.

Navon-Venezia, S., Kondratyeva, K., and Carattoli, A. (2017). Klebsiella pneumoniae: a major

worldwide source and shuttle for antibiotic resistance. FEMS Microbiol. Rev. 013, 252–275.

doi:10.1093/femsre/fux013.

Nordmann, P., Jayol, A., and Poirel, L. (2016). Rapid Detection of Polymyxin Resistance in

Enterobacteriaceae. Emerg. Infect. Dis. 22, 1038–1043. doi:10.3201/eid2206.151840.

Novović, K., Trudić, A., Brkić, S., Vasiljević, Z., Kojić, M., Medić, D., et al. (2017). Molecular

Epidemiology of Colistin-Resistant, Carbapenemase-Producing Klebsiella pneumoniae in Serbia

from 2013 to 2016. Antimicrob. Agents Chemother. 61, e02550-16. doi:10.1128/AAC.02550-16.

Olaitan, A. O., Diene, S. M., Kempf, M., Berrazeg, M., Bakour, S., Gupta, S. K., et al. (2014). Worldwide

emergence of colistin resistance in Klebsiella pneumoniae from healthy humans and patients in

Lao PDR, Thailand, Israel, Nigeria and France owing to inactivation of the PhoP/PhoQ regulator

mgrB: an epidemiological and molecular study. Int. J. Antimicrob. Agents 44, 500–507.

doi:10.1016/j.ijantimicag.2014.07.020.

Paczosa, M. K., and Mecsas, J. (2016). Klebsiella pneumoniae: Going on the Offense with a Strong

Defense. Microbiol. Mol. Biol. Rev. 80, 629–61. doi:10.1128/MMBR.00078-15.

Pitart, C., Solé, M., Roca, I., Fàbrega, A., Vila, J., and Marco, F. (2011). First Outbreak of a Plasmid-

Mediated Carbapenem-Hydrolyzing OXA-48 β-Lactamase in Klebsiella pneumoniae in Spain.

Antimicrob. Agents Chemother. 55, 4398–4401. doi:10.1128/AAC.00329-11.

Poirel, L., Bonnin, R. A., and Nordmann, P. (2012). Genetic features of the widespread plasmid coding

for the carbapenemase OXA-48. Antimicrob. Agents Chemother. 56, 559–62.

doi:10.1128/AAC.05289-11.

Potron, A., Poirel, L., Rondinaud, E., and Nordmann, P. (2013). Intercontinental spread of OXA-48

beta-lactamase-producing Enterobacteriaceae over a 11-year period, 2001 to 2011. Euro

Surveill. 18. doi:10.2807/1560-7917.es2013.18.31.20549.

Roe, C. C., Vazquez, A. J., Esposito, E. P., Zarrilli, R., and Sahl, J. W. (2019). Diversity, Virulence, and

Antimicrobial Resistance in Isolates From the Newly Emerging Klebsiella pneumoniae ST101

Lineage. Front. Microbiol. 10, 1–13. doi:10.3389/fmicb.2019.00542.

Seiffert, S. N., Marschall, J., Perreten, V., Carattoli, A., Furrer, H., and Endimiani, A. (2014). Emergence

of Klebsiella pneumoniae co-producing NDM-1, OXA-48, CTX-M-15, CMY-16, QnrA and ArmA in

Switzerland. Int. J. Antimicrob. Agents 44, 260–262. doi:10.1016/j.ijantimicag.2014.05.008.

Skálová, A., Chudějová, K., Rotová, V., Medvecky, M., Študentová, V., Chudáčková, E., et al. (2016).

Molecular characterization of OXA-48-like-producing Enterobacteriaceae in the Czech Republic:

evidence for horizontal transfer of pOXA-48-like plasmids. Antimicrob. Agents Chemother. 61,

AAC.01889-16. doi:10.1128/AAC.01889-16.

Stewart, A., Harris, P., Henderson, A., and Paterson, D. (2018). Treatment of Infections by OXA-48-

Producing Enterobacteriaceae. Antimicrob. Agents Chemother. 62. doi:10.1128/AAC.01195-18.

Sullivan, M. J., Petty, N. K., and Beatson, S. A. (2011). Easyfig: a genome comparison visualizer.

Bioinformatics 27, 1009–1010. doi:10.1093/bioinformatics/btr039.

Sun, J., Zhang, H., Liu, Y.-H., and Feng, Y. (2018). Towards Understanding MCR-like Colistin Resistance.

Trends Microbiol. 26, 794–808. doi:10.1016/j.tim.2018.02.006.

Tan, T. Y., and Ng, S. Y. (2007). Comparison of Etest, Vitek and agar dilution for susceptibility testing

of colistin. Clin. Microbiol. Infect. 13, 541–544. doi:10.1111/j.1469-0691.2007.01708.x.

Treangen, T. J., Ondov, B. D., Koren, S., and Phillippy, A. M. (2014). The Harvest suite for rapid core-

genome alignment and visualization of thousands of intraspecific microbial genomes. Genome

Biol. 15, 524. doi:10.1186/s13059-014-0524-x.

Trudic, A., Jelesic, Z., Mihajlovic-Ukropina, M., Medic, D., Zivlak, B., Gusman, V., et al. (2017).

Carbapenemase production in hospital isolates of multidrug-resistant Klebsiella pneumoniae

and Escherichia coli in Serbia. Vojnosanit. Pregl. 74, 715–721. doi:10.2298/VSP150917260T.

WHO Regional Office for Europe (2017). Central Asian and Eastern European Surveillance of

Antimicrobial Resistance. doi:10.2307/3395557.

Wick, R. R., Judd, L. M., Gorrie, C. L., and Holt, K. E. (2017). Unicycler: Resolving bacterial genome

assemblies from short and long sequencing reads. PLoS Comput. Biol. 13, e1005595.

doi:10.1371/journal.pcbi.1005595.

Wiskirchen, D. E., Nordmann, P., Crandon, J. L., and Nicolau, D. P. (2014). Efficacy of Humanized

Carbapenem and Ceftazidime Regimens against Enterobacteriaceae Producing OXA-48

Carbapenemase in a Murine Infection Model. Antimicrob. Agents Chemother. 58, 1678–1683.

doi:10.1128/AAC.01947-13.

Wright, M. S., Suzuki, Y., Jones, M. B., Marshall, S. H., Rudin, S. D., van Duin, D., et al. (2015). Genomic

and transcriptomic analyses of colistin-resistant clinical isolates of Klebsiella pneumoniae reveal

multiple pathways of resistance. Antimicrob. Agents Chemother. 59, 536–43.

doi:10.1128/AAC.04037-14.

Wyres, K. L., Gorrie, C., Edwards, D. J., Wertheim, H. F. L., Hsu, L. Y., Van Kinh, N., et al. (2015).

Extensive capsule locus variation and large-scale genomic recombination within the Klebsiella

pneumoniae clonal group 258. Genome Biol. Evol. 7, 1267–1279. doi:10.1093/gbe/evv062.

Wyres, K. L., Wick, R. R., Gorrie, C., Jenney, A., Follador, R., Thomson, N. R., et al. (2016).

Identification of Klebsiella capsule synthesis loci from whole genome data. Microb. Genomics 2,

e000102. doi:10.1099/mgen.0.000102.

Zankari, E., Hasman, H., Cosentino, S., Vestergaard, M., Rasmussen, S., Lund, O., et al. (2012).

Identification of acquired antimicrobial resistance genes. J. Antimicrob. Chemother. 67, 2640–

2644. doi:10.1093/jac/dks261.

CHAPTER 3 : Abundance of colistin-resistant, OXA-23- and ArmA-

producing Acinetobacter baumannii belonging to International Clone 2

in Greece

Mattia Palmieri1, Marco Maria D’Andrea2,3, Andreu Coello Pelegrin1, Nadine Perrot4, Caroline

Mirande4, Bernadette Blanc4, Nicholas Legakis5*, Herman Goossens6, Gian Maria Rossolini7,8 and

Alex van Belkum1

1bioMérieux, Data Analytics Unit, La Balme Les Grottes, France

2Department of Medical Biotechnologies, University of Siena, Siena, Italy.

3Department of Biology, University of Rome “Tor Vergata”, Rome, Italy.

4bioMérieux, R&D Microbiology, La Balme Les Grottes, France

5Central Laboratories, IASO Group Hospitals, Athens, Greece

6Laboratory of Medical Microbiology, Vaccine and Infectious Disease Institute, University of Antwerp, Antwerp, Belgium

7Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy

8Clinical Microbiology and Virology Unit, Florence Careggi University Hospital, Florence, Italy

Published in Frontiers in Microbiology, 15 April 2020, doi: 10.3389/fmicb.2020.00668


3.1 Abstract

Carbapenem-resistant Acinetobacter baumannii (CRAB) represents one of the most challenging

pathogens in clinical settings. Colistin is routinely used to treat infections caused by this pathogen,

but increasing colistin resistance has been reported. We obtained 122 CRAB isolates from nine Greek

hospitals between 2015 and 2017. The colistin-resistant (ColR) isolates (N=40, 32.8%) were subjected to

whole-genome sequencing, together with two colistin-susceptible (ColS) isolates for comparison. All ColR

isolates were characterized by a previously described mutation, PmrBA226V, which was associated with

low-level colistin resistance. Some isolates were characterized by additional mutations in PmrB

(E140V or L178F) or PmrA (K172I or D10N), first described here, and higher colistin MICs, up to 64

mg/L. Mass spectrometry analysis of lipid A showed the presence of a phosphoethanolamine (pEtN)

moiety on lipid A, likely resulting from PmrA/B-induced pmrC overexpression. Interestingly, the

two ColS isolates also had the same lipid A modification, suggesting that not all lipid A modifications

lead to colistin resistance or that other factors could contribute to the resistance phenotype. Most of

the isolates (N=37, 92.5%) belonged to the globally distributed international clone (IC) 2 and

comprised four different sequence types (STs) as defined by using the Oxford scheme (ST 425, 208,

451 and 436). Three isolates belonged to IC1 and ST1567. All the genomes harbored an intrinsic

blaOXA-51 group carbapenemase gene, where blaOXA-66 and blaOXA-69 were associated with IC2 and IC1,

respectively. Carbapenem resistance was due to the most commonly reported acquired

carbapenemase gene blaOXA-23, with ISAba1 located upstream of the gene and likely increasing its

expression. The armA gene, associated with high-level resistance to aminoglycosides, was detected

in 87.5 % of isolates. Collectively, these results revealed a convergent evolution of different clonal

lineages towards the same colistin resistance mechanism, thus limiting the effective therapeutic

options for the treatment of CRAB infections.

3.2 Introduction

Acinetobacter baumannii is now recognized as a major hospital pathogen owing to its ability to resist major

antimicrobials and to survive in the healthcare environment (Peleg et al., 2008). Currently,

carbapenem-resistant A. baumannii (CRAB) is widespread, with rates reaching or exceeding 90% in

some clinical settings in Southern and Eastern European countries (ECDC, 2018) and elsewhere

(https://resistancemap.cddep.org). Mortality rates for the most common CRAB infections, such as

bloodstream infections and hospital-acquired pneumonia, approach 60% (Wong et al., 2017).

OXA-type carbapenemases constitute the most prevalent mechanism of carbapenem resistance in

this species, with OXA-23, OXA-24 and OXA-58 being the most prevalent enzymes (Poirel and

Nordmann, 2006). Molecular epidemiological studies usually revealed an oligoclonal distribution of

CRAB, with outbreak strains mostly belonging to international clones (IC) 1 and 2 (Zarrilli et al., 2013).


In Greece, since their first emergence in 2000, CRAB have become endemic, and the percentage of

carbapenem resistance reached 94% in 2017 (Tsakris et al., 2003; ECDC, 2018). Regarding the CRAB

clonal structure and carbapenemase gene content, a study conducted from 2000 to 2009 in Greece

showed that CRAB harbored only the OXA-58 carbapenemase gene; while IC1 was prevalent

until 2004, IC2 became dominant during 2005–2009 (Gogou et al., 2011). Between 2009 and 2011,

OXA-23 producers emerged and replaced the previously predominant OXA-58 producing A.

baumannii strains (Liakopoulos et al., 2012). Recently, a molecular epidemiological study on

contemporary CRAB clinical isolates derived from hospitals throughout Greece demonstrated the

predominance of OXA-23 producers belonging to IC2 (Pournaras et al., 2017).

Colistin-based treatment often represents the only therapeutic option for CRAB infections (Viehman

et al., 2014). However, CRAB isolates that are also ColR are being reported more frequently. Data

from the EARS-Net study in 2016 collected from 30 European countries showed that 4.0 % of the

tested isolates were resistant to colistin, with the vast majority (70.7 %) of the resistant isolates

reported from Greece and Italy (ECDC, 2017). A study from Greece reported an increase in colistin

resistance from 1% in 2012 to 21.1% in 2014 (Oikonomou et al., 2015), while Pournaras et al.

reported a resistance rate of 27.3% in 2015 (Pournaras et al., 2017). More alarmingly, the colistin

resistance rate was 56.8% in isolates collected from patients with ventilator-associated pneumonia in

Greece during 2015 (Nowak et al., 2017). Colistin resistance has been linked to mutations in the two-

component transcriptional regulator genes pmrA/B and consequent pmrC overexpression in most

instances. The phosphoethanolamine phosphotransferase PmrC adds a pEtN group to the lipid A of

the lipopolysaccharide, lowering the net negative charge of the cell membrane and thereby reducing

colistin binding and preventing membrane leakage (Poirel et al., 2017). Colistin resistance

may also result from the overexpression of eptA, a pmrC homolog. This is mediated by insertional

inactivation of a gene encoding an H-NS family transcriptional regulator (Lucas et al., 2018) or by

integration of insertion sequence elements upstream of the eptA gene itself (Gerson et al., 2019;

Potron et al., 2019; Trebosc et al., 2019).

In this study, 40 ColR and two ColS CRAB isolates collected from nine Greek hospitals between 2015

and 2017 were studied. Whole genome sequencing was performed to investigate the mechanisms of

antibiotic resistance as well as the genomic relatedness between the strains.

3.3 Materials and Methods


Bacterial strains and antimicrobial susceptibility testing. In the period 2015-2017, a total of 122

consecutive non-duplicate clinical CRAB isolates were obtained from routine microbiological cultures

of clinical samples (e.g. urine, blood, skin, bronchial aspirate) from different patients admitted to

nine Greek hospitals involved in this study (Figure 1). Bacteria were not isolated by the authors but

provided by the respective medical centers. Therefore, ethics approval was not required as per

institutional and national guidelines and regulations. Antimicrobial susceptibility testing was

performed using the Vitek2 instrument (bioMérieux, Marcy l’Étoile, France) and the results were

interpreted following the EUCAST breakpoints (EUCAST, 2019). Since EUCAST does not provide

cephalosporin breakpoints for Acinetobacter spp., CLSI breakpoints were used for those

antibiotics (CLSI, 2019). Colistin minimum inhibitory concentrations (MICs) were obtained by broth

microdilution following the CLSI guidelines (CLSI, 2019), and the results were interpreted following

the EUCAST susceptibility breakpoint of 2 mg/L (EUCAST, 2019). Only the ColR CRAB isolates plus two

randomly selected ColS CRAB isolates were retained for further experiments.
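
The colistin interpretation step can be illustrated with a minimal Python sketch. It assumes the usual reading of a susceptibility breakpoint, namely that an MIC at or below 2 mg/L is called susceptible and anything above it resistant. The function name and the example MIC values are hypothetical and are not taken from the study data.

```python
# Susceptibility breakpoint cited in the text (EUCAST, 2019), in mg/L.
COLISTIN_BREAKPOINT_MG_L = 2.0


def classify_colistin(mic_mg_l: float) -> str:
    """Return 'ColS' for MIC <= breakpoint and 'ColR' otherwise (assumed convention)."""
    if mic_mg_l <= 0:
        raise ValueError("MIC must be a positive concentration")
    return "ColS" if mic_mg_l <= COLISTIN_BREAKPOINT_MG_L else "ColR"


# Hypothetical two-fold dilution values, used only to exercise the function.
example_mics = [0.5, 2, 4, 16, 64]
calls = {mic: classify_colistin(mic) for mic in example_mics}
print(calls)
```

Under this convention an isolate with an MIC of exactly 2 mg/L is still reported as susceptible, which is why the lowest MIC among the resistant isolates in a two-fold dilution series is 4 mg/L.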

Genome sequencing and assembly. Total DNA of the selected CRAB isolates was extracted using

the QIAGEN UltraClean Microbial kit and sequenced with a NovaSeq sequencer (Illumina, USA),

generating paired-end reads of 100 bp. Raw reads were assembled using SPAdes v3.11.1 (Bankevich

et al., 2012) and annotated with Prokka (Seemann, 2014). Whole genome sequencing data have been

deposited under BioProject PRJNA578598.

Bioinformatics analysis. Sequence types (STs) were assigned by the mlst tool

(https://github.com/tseemann/mlst) by using the Oxford (gltA, gyrB, gdhB, recA, cpn60, gpi and rpoD

genes) and the Pasteur (cpn60, fusA, gltA, pyrG, recA, rplB and rpoB genes) schemes available on

pubMLST.org. The ABRicate tool (https://github.com/tseemann/abricate) was used for the detection

of antimicrobial resistance genes, by using the ResFinder (Zankari et al., 2012), CARD (Jia et al., 2017),

BLDB (Naas et al., 2017) and ARG-ANNOT (Gupta et al., 2014) databases. The minimum percentage of

coverage and identity thresholds were 60 % and 90 %, respectively. The Kaptive tool was used to detect

the capsule (KL) and outer core (OC) loci (Wyres et al., 2019). BLAST+ (2.7.1) was used to detect mutations in genes

previously demonstrated to be potentially involved in colistin resistance (i.e. pmrCAB, eptA), and only

those leading to amino acid variations were considered. The pmrA/B/C and eptA genes were

compared to the reference genome ACICU (accession no. CP031380.1). The presence of insertion

sequence elements in the 500 bp region upstream of the blaADC, blaOXA-23, armA, eptA and pmrC genes

was determined using the ISfinder tool (Siguier et al., 2006). Core genes were defined by Roary

(v3.12.0) (Page et al., 2015) by using the annotated genomes, and genomes belonging to different

international clones (ICs) were treated separately. The alignment of these genes was screened for

recombination using Gubbins (v2.3.4) (Croucher et al., 2015), and a maximum-likelihood (ML) phylogeny was

obtained by using RAxML (v8.2.12) (Stamatakis, 2014) with the GTRGAMMA model and 100

bootstrap replicates. The phylogenetic tree was visualized together with associated metadata using

Microreact (v7.0.0) (Argimón et al., 2016). Pairwise single nucleotide polymorphism (SNP) distances were obtained

with the snp-dists tool (https://github.com/tseemann/snp-dists) by using the Roary core genes

alignment as input.
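
The pairwise SNP distance step can be sketched in a few lines of Python. This is a simplified illustration of the idea and not the snp-dists implementation: it assumes that only positions where both sequences carry an unambiguous A, C, G or T are compared, and it uses a toy alignment with hypothetical isolate names.

```python
from itertools import combinations
from statistics import mean, median

VALID_BASES = set("ACGT")


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count aligned positions where both bases are A/C/G/T and differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in VALID_BASES and b in VALID_BASES and a != b
    )


def pairwise_snp_distances(alignment: dict) -> dict:
    """Return {(name_a, name_b): distance} for every unordered pair of sequences."""
    return {
        (name_a, name_b): snp_distance(alignment[name_a], alignment[name_b])
        for name_a, name_b in combinations(sorted(alignment), 2)
    }


# Toy core-gene alignment (hypothetical sequences, gap shown as '-').
toy_alignment = {
    "isolate_1": "ACGTACGTAC",
    "isolate_2": "ACGTACGTAT",
    "isolate_3": "ACGAAC-TAT",
}
distances = pairwise_snp_distances(toy_alignment)
for pair, dist in distances.items():
    print(pair, dist)
print("mean:", mean(distances.values()), "median:", median(distances.values()))
```

Summary statistics such as the minimum, maximum, mean and median of these pairwise values are what the Results report for each clone.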

Analysis of Lipid A. Lipid A was extracted using an acetic acid-based procedure as previously

described (Kocsis et al., 2017). Once extracted, 0.7 µL of the concentrate was spotted on a matrix-

assisted laser desorption/ionization–time of flight mass spectrometry (MALDI-TOF MS) plate

followed by 0.7 µL of norharmane matrix (Sigma-Aldrich, St Louis, Missouri) and then air-dried. The

samples were analyzed on a Vitek MS instrument (bioMérieux, Marcy l’Étoile, France) in the

negative-ion mode. The resulting spectra were compared to that obtained for the ColS reference

strain A. baumannii ATCC 19606.

3.4 Results

Bacterial strains and antimicrobial susceptibilities. Of the 122 CRAB isolates, 40 (32.8%) were also

ColR, with colistin MICs ranging from 4 to 64 mg/L. All following data concern only the ColR isolates.

Antimicrobial susceptibility testing revealed that all isolates were resistant to cephalosporins

(ceftazidime and cefepime), carbapenems (imipenem and meropenem), fluoroquinolones

(ciprofloxacin and levofloxacin) and tobramycin. Resistance rates for gentamicin and

trimethoprim/sulfamethoxazole were 87.5% (N=35) and 92.5% (N=37), respectively. The two ColS

CRAB isolates, included in this study for comparative purposes, had a colistin MIC of 0.5 mg/L (Table

S1).

Genomic epidemiology. The majority of the ColR CRAB isolates (N=37, 92.5 %) were sequence type

(ST) 2, belonging to the previously described IC2 as defined by the Pasteur MLST scheme (Diancourt

et al., 2010) (Figure 1). The Oxford MLST scheme further differentiated the IC2 isolates into four

STs, all belonging to clonal complex (CC) 208: the majority of isolates (N=29, 78.4%)

belonged to ST425, while ST208, ST451 and ST436 accounted for 8.1% (N=3), 8.1% (N=3) and 5.4%

(N=2), respectively. These four STs shared six of the seven alleles and differed only in the gpi allele. Because

gpi is one of the capsular polysaccharide synthesis genes, it is prone to homologous recombination,

which is a known limitation of the Oxford MLST scheme (Gaiarsa et al., 2019).

From a total of 4,612 different genes detected in all the isolates, 3,031 (65.7%) were core genes. Core

gene SNPs among IC2 genomes varied between 2 and 1,652 (mean: 569, median: 531). The

phylogenetic analysis of IC2 isolates shows two major clusters of ST425, well differentiated in the

tree. These two clusters were characterized by two different capsular polysaccharides, KL4 and KL40.

Different capsular polysaccharides were observed in the other IC2 isolates (Figure 1), while all the IC2

isolates were characterized by the lipooligosaccharide outer core (OC) locus 1 (OCL1). Isolates of

ST425:KL4 were only observed in Athens, within two hospitals in 2015 (Aglaia Kyriakoy and Agia Olga)


and one isolate in the Thriassio General hospital in 2017. The thirteen isolates obtained from the

Aglaia Kyriakoy hospital had an average of 12 core SNPs, suggesting cross-transmission of isolates

between different patients. Isolates of ST425:KL40 were retrieved between 2015-2016 from four

hospitals in Athens and one isolate from the University Hospital in Patras (200 km west of Athens).

This underscores the local endemicity of this clone and suggests inter-hospital

cross-infection, given that isolates obtained from different hospitals did not cluster separately

in the tree.

The remaining three isolates belonged to ST1 (IC1) and ST1567 according to the Pasteur and Oxford

MLST schemes, respectively, and harbored a capsule and lipooligosaccharide of type KL40 and OCL2.

Core gene SNPs varied between 29 and 305.

The two ColS CRAB isolates belonged to IC2 and were assigned to ST208 (isolate PU_2016_41) and ST195 (GE_2017_62)

by the Oxford MLST scheme. Compared with the ColR isolates, they had a median of 706 (min: 8, max: 1,568) and 1,175 SNPs

(min: 725, max: 1,652), respectively.

Figure 1. Phylogenetic tree of the A. baumannii clinical strains belonging to IC2. cps: capsular polysaccharides.

Colistin resistance mechanisms. Several chromosomal mutations in genes potentially involved in

colistin resistance were detected, in comparison with the ACICU ColS reference genome. The

mutation PmrBA138T was detected in all ColR and ColS isolates, indicating that it may not contribute


significantly to the resistance phenotype, as previously reported (Oikonomou et al., 2015).

Conversely, the mutation A226V in the histidine kinase A (phosphoacceptor) domain of PmrB was

observed in all ColR isolates, and not in the ColS ones (Figure 1). This mutation has been described in

several prior studies, associated with ColR strains (Arroyo et al., 2011; Mavroidi et al., 2015, 2017;

Dortet et al., 2018; Trebosc et al., 2019).
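
The substitution notation used here (for example A226V: alanine in the reference replaced by valine at position 226) can be derived from a pairwise protein alignment. The following Python sketch shows one way to do this. It assumes the reference and query proteins have already been aligned to equal length with '-' as the gap character, and it uses short invented sequences, not the actual PmrB sequence.

```python
def list_substitutions(reference: str, query: str) -> list:
    """List amino acid substitutions as <ref><position><query>, numbered on the reference."""
    if len(reference) != len(query):
        raise ValueError("aligned sequences must have equal length")
    substitutions = []
    ref_position = 0
    for ref_aa, query_aa in zip(reference, query):
        if ref_aa != "-":
            ref_position += 1  # gaps in the reference do not consume a position
        if "-" in (ref_aa, query_aa):
            continue  # insertions and deletions are not reported as substitutions
        if ref_aa != query_aa:
            substitutions.append(f"{ref_aa}{ref_position}{query_aa}")
    return substitutions


# Hypothetical aligned protein fragments.
toy_reference = "MKALVDE"
toy_query = "MKVLVDN"
print(list_substitutions(toy_reference, toy_query))
```

Only differences at the protein level are reported, in line with the choice stated in the Methods to consider only mutations leading to amino acid variations.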

Isolates with PmrBA226V without other alterations had colistin MICs ranging from 4 to 8 mg/L. When

an additional mutation in PmrB occurred (PmrBE140V in AK_2015_33 and PmrBL178F in SI_2017_69),

strains showed a colistin MIC of 16 mg/L. Two strains belonging to IC1 (FK_2016_46 and FK_2016_47)

had an additional K172I mutation in the transcriptional regulatory protein C-terminal domain of

PmrA, and showed colistin MICs of 32 mg/L. Finally, the strain AO_2015_54 had an additional D10N

mutation in the CheY-homologous receiver PmrA domain and was associated with colistin MIC of 64

mg/L. All these additional mutations are, to the best of our knowledge, first described here.

The susceptible strain GE_2017_62 had no additional mutations in pmrA/B genes. However, it had an

ISAba1 positioned 110 bp upstream of the pmrC gene, in reverse orientation. This is, to the best of

our knowledge, the first report of an insertion sequence transposition upstream of the pmrC gene.

However, this transposition event does not seem to alter colistin susceptibility in this isolate. The

second susceptible strain PU_2016_41 had PmrAM12V and PmrBR181H+Y388N. These mutations are first

described here, and in this strain they do not seem to affect colistin susceptibility.

The pmrC homolog eptA was detected in all the isolates of the IC2 except the susceptible isolate

GE_2017_62, while it was absent in the IC1 isolates. The obtained eptA gene sequences were

identical to that of the susceptible reference ACICU, and did not present insertion elements in the

upstream region.

The mcr genes, which encode acquired colistin resistance, have not been described in A. baumannii

yet, and were not detected in our strain collection.

Lipid A modifications. Increased expression of pmrC or eptA results in the addition of pEtN to

lipid A. The lipid A of the ColR and ColS CRAB isolates was extracted and analyzed by MALDI-TOF MS,

and the resulting spectra were compared to that of the ColS reference strain A. baumannii ATCC

19606. Several lipid A species were detected in the reference strain ATCC 19606 and in all clinical

isolates: hepta-acylated lipid A (m/z 1,910), hexa-acylated lipid A (m/z 1,728) and tetra-acylated lipid

A (m/z 1,404). The addition of pEtN (m/z 124) to lipid A was shown by the mass at m/z 2,034, and

unexpectedly it was observed in all the clinical strains, including the ColS ones (Figure 2). Isolates

with colistin MICs higher than 8 mg/L also showed a peak at m/z 1,954, representing the pEtN-

modified hepta-acylated lipid A (m/z 2,034) minus one phosphate group (m/z 80), as previously

reported (Kim et al., 2014). The addition of galactosamine to lipid A, which is indicated by a mass at

m/z 2,071 (Pelletier et al., 2013), was not observed in any isolate.
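
The peak assignments above follow from simple mass arithmetic, which the Python sketch below makes explicit. The nominal m/z values are those quoted in the text; the matching tolerance of 1 m/z unit and the list of observed peaks are assumptions made for illustration only.

```python
# Nominal m/z values quoted in the text.
HEPTA_ACYL_LIPID_A = 1910
PETN_SHIFT = 124        # addition of phosphoethanolamine (pEtN)
PHOSPHATE_SHIFT = 80    # loss of one phosphate group

petn_modified = HEPTA_ACYL_LIPID_A + PETN_SHIFT           # 1910 + 124 = 2034
petn_minus_phosphate = petn_modified - PHOSPHATE_SHIFT    # 2034 - 80 = 1954

EXPECTED_PEAKS = {
    1404: "tetra-acylated lipid A",
    1728: "hexa-acylated lipid A",
    HEPTA_ACYL_LIPID_A: "hepta-acylated lipid A",
    petn_modified: "pEtN-modified hepta-acylated lipid A",
    petn_minus_phosphate: "pEtN-modified hepta-acylated lipid A minus phosphate",
}


def annotate_peaks(observed_mz, tolerance=1.0):
    """Map each observed m/z to an expected species if within the tolerance (assumed value)."""
    annotations = {}
    for mz in observed_mz:
        for expected, label in EXPECTED_PEAKS.items():
            if abs(mz - expected) <= tolerance:
                annotations[mz] = label
                break
    return annotations


# Hypothetical peak list, not measured data.
for mz, label in annotate_peaks([1404.2, 1910.1, 2034.3, 1500.0]).items():
    print(mz, label)
```

Peaks that match no expected species, such as the hypothetical value at m/z 1,500, are simply left unannotated.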

Figure 2. Mass spectrometry analysis of Lipid A. From the bottom, isolate (colistin MIC, resistant/susceptible): ATCC-19606 (0.5, S), TR_2016_35 (4, R), SI_2017_69 (16, R), AO_2015_54 (64, R) and GE_2017_62 (0.5, S).

Antimicrobial resistance mechanisms and phenotype correlation. All CRAB genomes harbored a

chromosomal blaADC cephalosporinase, an intrinsic blaOXA-51 group carbapenemase and an acquired

blaOXA-23. IC2 genomes contained the blaADC-73 (accession no. KP881233), a variant of blaADC with a

sequence identity of 1,151/1,152 nucleotides compared to that of blaADC-30, and previously observed

in IC2 isolates (Karah et al., 2016). An ISAba1 element was present 9 bp upstream of the blaADC-73

gene in reverse orientation in all IC2 genomes; such an insertion has been shown to increase cephalosporinase

gene expression (Héritier et al., 2006). Conversely, IC1 genomes contained blaADC-175 (MH594297)

with an ISAba125 element positioned 66 bp upstream of the gene in reverse orientation, as

previously reported (Lopes and Amyes, 2012). ISAba125 was shown to increase the cephalosporinase

expression 6 times more than ISAba1 (Lopes and Amyes, 2012). The allelic variants of the intrinsic

blaOXA-51-like carbapenemase genes were blaOXA-66 and blaOXA-69, associated with IC2 and IC1,

respectively, as previously observed (Zander et al., 2012). All the blaOXA-23 genes were characterized

by the presence of an ISAba1 located upstream of the gene, which has been previously

demonstrated to increase its expression (Turton et al., 2006). In particular, the blaOXA-23 gene was part

of a Tn2006 transposon in the IC2 genomes. Conversely, a Tn2008 embedded within a TnaphA6 was

found in the three IC1 genomes, matching 100% with the sequence of plasmid pABKp1 (KP074966.1)

obtained from A. baumannii isolates from Romania (Gheorghe et al., 2014). Consistent with the

mentioned genes and their genetic environments, all isolates were resistant to cephalosporins,

including ceftazidime (3rd generation) and cefepime (4th generation), and carbapenems (imipenem

and meropenem).

Several aminoglycoside resistance genes were observed among the isolates, namely aac(3)-I,

aac(3)-Ia, ant(3”)-1a, aph(3’)-Ia, aph(3’)-VIa, aph(6)-Id, armA and strA (Table S1). ArmA is a 16S ribosomal

RNA methyltransferase, which protects the 30S ribosomal subunit from aminoglycoside binding and

confers high aminoglycoside MICs. Consistently, all the strains carrying armA (35/40, 87.5 %)

were resistant to gentamicin and tobramycin. The armA gene was located on the chromosome

within the widely disseminated Tn1548, and it was found downstream of a cluster of genes

encoding proteins annotated as paraquat-inducible protein A and protein B, as previously described

for ST195 strain AC29 (Lean et al., 2016).

All strains contained substitutions within the QRDR, namely GyrAS83L and ParCS80L, previously

associated with quinolone resistance (Vila et al., 1995, 1997). As expected, all strains were non-

susceptible to ciprofloxacin and levofloxacin.

3.5 Discussion

Carbapenems represent first-line agents for the treatment of A. baumannii infections; consequently,

the rise of infections due to carbapenem-resistant strains is of major concern. Carbapenem

resistance in the isolates described here was associated with the ISAba1-mediated overexpression of

blaOXA-23 located in either the Tn2006 (IC2 isolates) or the Tn2008 (IC1) transposon. Previous studies

reported that OXA-23 producers emerged and replaced the previously predominant OXA-58 A.

baumannii isolates (Liakopoulos et al., 2012), and this phenomenon could be linked to the stronger

hydrolytic activity of OXA-23 compared to OXA-58 (Peleg et al., 2008). Most CRAB isolates are

susceptible to only 1 or 2 agents, making them extensively drug-resistant (XDR) pathogens (Viehman

et al., 2014).

Because of the increasing use of colistin, resistance to this antibiotic has rapidly increased, especially

in CRAB isolates (Giamarellou, 2016; Jeannot et al., 2017), and has now reached critical levels in some

countries (Nowak et al., 2017). Across the nine hospitals involved in this study, 32.8% of the CRAB

isolates were also ColR. These results suggest that colistin resistance rates among CRAB isolates from

Greece are on the rise, as a previous study reported a resistance rate of 27.3% in 2015 (Pournaras et al.,

2017). While the mcr genes, which encode acquired colistin resistance, were absent among our

isolates, we found several mutations in the pmrCAB operon associated with the colistin resistance

phenotype. Interestingly, the PmrBA226V mutation, previously associated with low-level

colistin resistance, was detected in all the ColR isolates but not in the ColS ones. In a recent

study, Trebosc et al. (2019) investigated the colistin resistance mechanisms of 12 clinical A. baumannii

strains. The authors concluded that colistin resistance was conferred, in most cases, by mutations in

the PmrB sensor kinase that led to PmrC overexpression. Two of those strains were isolated in

Greece in 2012, belonged to either IC1 or IC2 and had the mutation PmrBA226V. Such findings support

the important role of the mentioned PmrB mutation in the colistin resistance phenotype. Moreover,

a similar substitution of the alanine in position 226 of PmrB was reported to confer stable colistin

resistance in clinical A. baumannii isolates (Charretier et al., 2018). Some of our isolates had

additional mutations in either PmrB or PmrA, and were associated with higher colistin MICs, up to 64

mg/L. Multiple mutations may result in an increased expression of pmrC, as recently shown by RNA-

Seq experiments (Wright et al., 2017) and by qRT-PCR (Gerson et al., 2020). However, the same

studies reported clinical isolates characterized by pmrC overexpression due to PmrA/B mutations,

but with an unexpected colistin susceptible phenotype. Similarly, the two ColS isolates from our

study had pmrCAB alterations and a modified lipid A, as observed with mass spectrometry. All these

observations support the hypothesis that additional and still unknown factors are involved in colistin

resistance of clinical A. baumannii isolates (Jeannot et al., 2017; Gerson et al., 2019, 2020).

Determination of the cell-envelope charge could be useful in the elucidation process of the complex

mechanism of colistin resistance in A. baumannii (Cafiso et al., 2019).

In this study, we observed a clear predominance of IC2, which is globally distributed (Higgins et al.,

2010) and which is gradually replacing IC1 (Gogou et al., 2011; Villalon et al., 2011). The major

sequence type within IC2 was ST425, as defined by the Oxford MLST scheme. To the best of our

knowledge, only one study has reported this ST, in a single clinical isolate collected in 2002 in Sydney,

Australia (Nigro and Hall, 2016). However, WGS data were not provided. Both capsular

polysaccharides reported within our ST425 isolates, KL4 and KL40, were rarely observed (0.2%) or

completely absent, respectively, within IC2 genomes in a recent study where 3,416 publicly available

A. baumannii genomes were analyzed (Wyres et al., 2019). Conversely, KL4 and KL40 represented the

second (20.1%) and third (11.9%) most common capsular types observed within IC1 genomes. It is

conceivable that ST425 resulted from homologous recombination between a CC208 genome and an IC1

genome, in which the IC1 capsular polysaccharide genes were acquired by the CC208 strain, as this

region was previously shown to be a frequent subject of homologous recombination (Adams et al.,

2008; Snitkin et al., 2011; Kenyon and Hall, 2013).

In conclusion, genomic analysis of ColR CRAB isolates from different Greek hospitals revealed a

convergent evolution of different clonal lineages towards the same colistin resistance mechanism,


characterized by the mutation PmrBA226V. The prevalence of ColR CRAB isolates belonging to IC2 and

expressing OXA-23 and ArmA is increasing, and it represents a major threat within clinical settings,

given the very limited number of effective agents for the treatment of infections caused by such isolates.

3.6 References

Adams, M. D., Goglin, K., Molyneaux, N., Hujer, K. M., Lavender, H., Jamison, J. J., et al. (2008).

Comparative genome sequence analysis of multidrug-resistant Acinetobacter baumannii. J.

Bacteriol. 190, 8053–8064. doi:10.1128/JB.00834-08.

Argimón, S., Abudahab, K., Goater, R. J. E., Fedosejev, A., Bhai, J., Glasner, C., et al. (2016). Microreact:

visualizing and sharing data for genomic epidemiology and phylogeography. Microb. Genomics 2,

e000093. doi:10.1099/mgen.0.000093.

Arroyo, L. A., Herrera, C. M., Fernandez, L., Hankins, J. V., Trent, M. S., and Hancock, R. E. W. (2011).

The pmrCAB Operon Mediates Polymyxin Resistance in Acinetobacter baumannii ATCC 17978

and Clinical Isolates through Phosphoethanolamine Modification of Lipid A. Antimicrob. Agents


Bankevich, A., Nurk, S., Antipov, D., Gurevich, A. A., Dvorkin, M., Kulikov, A. S., et al. (2012). SPAdes:

A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing. J. Comput.

Biol. 19, 455–477. doi:10.1089/cmb.2012.0021.

Cafiso, V., Stracquadanio, S., Lo Verde, F., Gabriele, G., Mezzatesta, M. L., Caio, C., et al. (2019).

Colistin Resistant A. baumannii: Genomic and Transcriptomic Traits Acquired Under Colistin

Therapy. Front. Microbiol. 9, 3195. doi:10.3389/fmicb.2018.03195.

Charretier, Y., Diene, S. M., Baud, D., Chatellier, S., Santiago-Allexant, E., Van Belkum, A., et al. (2018).

Colistin heteroresistance and involvement of the PmrAB regulatory system in Acinetobacter

baumannii. Antimicrob. Agents Chemother. 62. doi:10.1128/AAC.00788-18.

CLSI (2019). CLSI. Performance Standards for Antimicrobial Susceptibility Testing. 29th ed. CLSI

supplement M100. Wayne, PA: Clinical and Laboratory Standards Institute; 2019.

Croucher, N. J., Page, A. J., Connor, T. R., Delaney, A. J., Keane, J. A., Bentley, S. D., et al. (2015). Rapid

phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using

Gubbins. Nucleic Acids Res. 43, e15. doi:10.1093/nar/gku1196.

Diancourt, L., Passet, V., Nemec, A., Dijkshoorn, L., and Brisse, S. (2010). The Population Structure of

Acinetobacter baumannii: Expanding Multiresistant Clones from an Ancestral Susceptible


Genetic Pool. PLoS One 5, e10034. doi:10.1371/journal.pone.0010034.

Dortet, L., Potron, A., Bonnin, R. A., Plesiat, P., Naas, T., Filloux, A., et al. (2018). Rapid detection of

colistin resistance in Acinetobacter baumannii using MALDI-TOF-based lipidomics on intact

bacteria. Sci. Rep. 8, 16910. doi:10.1038/s41598-018-35041-y.

ECDC (2017). European Centre for Disease Prevention and Control. Surveillance of antimicrobial

resistance in Europe 2016. Annual Report of the European Antimicrobial Resistance Surveillance

Network (EARS-Net). Stockholm: ECDC; 2017.

ECDC (2018). European Centre for Disease Prevention and Control. Surveillance of antimicrobial

resistance in Europe – Annual report of the European Antimicrobial Resistance Surveillance

Network (EARS-Net) 2017. Stockholm: ECDC; 2018.

EUCAST (2019). The European Committee on Antimicrobial Susceptibility Testing. Breakpoint tables


Gaiarsa, S., Batisti Biffignandi, G., Esposito, E. P., Castelli, M., Jolley, K. A., Brisse, S., et al. (2019).

Comparative analysis of the two Acinetobacter baumannii multilocus sequence typing (MLST)

schemes. Front. Microbiol. 10. doi:10.3389/fmicb.2019.00930.

Gerson, S., Betts, J. W., Lucaßen, K., Nodari, C. S., Wille, J., Josten, M., et al. (2019). Investigation of

Novel pmrB and eptA Mutations in Isogenic Acinetobacter baumannii Isolates Associated with

Colistin Resistance and Increased Virulence In Vivo. Antimicrob. Agents Chemother. 63, 1–15.

doi:10.1128/aac.01586-18.

Gerson, S., Lucaßen, K., Wille, J., Nodari, C. S., Stefanik, D., Nowak, J., et al. (2020). Diversity of amino

acid substitutions in PmrCAB associated with colistin resistance in clinical isolates of

Acinetobacter baumannii. Int. J. Antimicrob. Agents. doi:10.1016/j.ijantimicag.2019.105862.

Gheorghe, I., Novais, Â., Grosso, F., Rodrigues, C., Chifiriuc, M. C., Lazar, V., et al. (2014). Snapshot on

carbapenemase-producing Pseudomonas aeruginosa and Acinetobacter baumannii in bucharest

hospitals reveals unusual clones and novel genetic surroundings for blaOXA-23. J. Antimicrob.

Chemother. 70, 1016–1020. doi:10.1093/jac/dku527.

Giamarellou, H. (2016). Epidemiology of infections caused by polymyxin-resistant pathogens. Int. J.

Antimicrob. Agents 48, 614–621. doi:10.1016/j.ijantimicag.2016.09.025.

Gogou, V., Pournaras, S., Giannouli, M., Voulgari, E., Piperaki, E.-T., Zarrilli, R., et al. (2011). Evolution

of multidrug-resistant Acinetobacter baumannii clonal lineages: a 10 year study in Greece


(2000-09). J. Antimicrob. Chemother. 66, 2767–2772. doi:10.1093/jac/dkr390.

Gupta, S. K., Padmanabhan, B. R., Diene, S. M., Lopez-Rojas, R., Kempf, M., Landraud, L., et al. (2014).

ARG-annot, a new bioinformatic tool to discover antibiotic resistance genes in bacterial

genomes. Antimicrob. Agents Chemother. 58, 212–220. doi:10.1128/AAC.01310-13.

Héritier, C., Poirel, L., and Nordmann, P. (2006). Cephalosporinase over-expression resulting from

insertion of ISAba1 in Acinetobacter baumannii. Clin. Microbiol. Infect. 12, 123–130.

doi:10.1111/j.1469-0691.2005.01320.x.

Higgins, P. G., Dammhayn, C., Hackel, M., and Seifert, H. (2010). Global spread of carbapenem-

resistant Acinetobacter baumannii. J. Antimicrob. Chemother. 65, 233–238.

doi:10.1093/jac/dkp428.

Jeannot, K., Bolard, A., and Plésiat, P. (2017). Resistance to polymyxins in Gram-negative organisms.

Int. J. Antimicrob. Agents 49, 526–535. doi:10.1016/j.ijantimicag.2016.11.029.

Jia, B., Raphenya, A. R., Alcock, B., Waglechner, N., Guo, P., Tsang, K. K., et al. (2017). CARD 2017:

expansion and model-centric curation of the comprehensive antibiotic resistance database. Nucleic Acids Res. 45,

566–573. doi:10.1093/nar/gkw1004.

Karah, N., Dwibedi, C. K., Sjöström, K., Edquist, P., Johansson, A., Wai, S. N., et al. (2016). Novel

Aminoglycoside Resistance Transposons and Transposon-Derived Circular Forms Detected in

Carbapenem-Resistant Acinetobacter baumannii Clinical Isolates. Antimicrob. Agents


Kenyon, J. J., and Hall, R. M. (2013). Variation in the Complex Carbohydrate Biosynthesis Loci of

Acinetobacter baumannii Genomes. PLoS One 8, e62160. doi:10.1371/journal.pone.0062160.

Kim, Y., Bae, I. K., Lee, H., Jeong, S. H., Yong, D., and Lee, K. (2014). In vivo emergence of colistin

resistance in Acinetobacter baumannii clinical isolates of sequence type 357 during colistin

treatment. Diagn. Microbiol. Infect. Dis. 79, 362–366. doi:10.1016/j.diagmicrobio.2014.03.027.

Kocsis, B., Kilár, A., Péter, S., Dörnyei, Á., Sándor, V., and Kilár, F. (2017). “Mass Spectrometry for

Profiling LOS and Lipid A Structures from Whole-Cell Lysates: Directly from a Few Bacterial

Colonies or from Liquid Broth Cultures,” in Methods in molecular biology (Clifton, N.J.), 187–198.

doi:10.1007/978-1-4939-6958-6_17.

Lean, S. S., Yeo, C. C., Suhaili, Z., and Thong, K. L. (2016). Comparative genomics of two ST 195

carbapenem-resistant Acinetobacter baumannii with different susceptibility to polymyxin

revealed underlying resistance mechanism. Front. Microbiol. 6, 1–17.

doi:10.3389/fmicb.2015.01445.

Liakopoulos, A., Miriagou, V., Katsifas, E. A., Karagouni, A. D., Daikos, G. L., Tzouvelekis, L. S., et al.

(2012). Identification of OXA-23-producing Acinetobacter baumannii in Greece, 2010 to 2011.

Euro Surveill. 17. Available at: http://www.ncbi.nlm.nih.gov/pubmed/22449866 [Accessed

February 7, 2019].

Lopes, B. S., and Amyes, S. G. B. (2012). Role of ISAba1 and ISAba125 in governing the expression of

blaADC in clinically relevant Acinetobacter baumannii strains resistant to cephalosporins. J. Med.

Microbiol. 61, 1103–1108. doi:10.1099/jmm.0.044156-0.

Lucas, D. D., Crane, B., Wright, A., Han, M.-L., Moffatt, J., Bulach, D., et al. (2018). Emergence of high-

level colistin resistance in an Acinetobacter baumannii clinical isolate mediated by inactivation

of the global regulator H-NS. Antimicrob. Agents Chemother. 30, 1–17. doi:10.1128/AAC.02442-17.

Mavroidi, A., Katsiari, M., Palla, E., Likousi, S., Roussou, Z., Nikolaou, C., et al. (2017). Investigation of

Extensively Drug-Resistant blaOXA-23-Producing Acinetobacter baumannii Spread in a Greek

Hospital. Microb. Drug Resist. 23, 488–493. doi:10.1089/mdr.2016.0101.

Mavroidi, A., Likousi, S., Palla, E., Katsiari, M., Roussou, Z., Maguina, A., et al. (2015). Molecular

identification of tigecycline- and colistin-resistant carbapenemase-producing Acinetobacter

baumannii from a Greek hospital from 2011 to 2013. J. Med. Microbiol. 64, 993–997.

doi:10.1099/jmm.0.000127.

Naas, T., Oueslati, S., Bonnin, R. A., Dabos, M. L., Zavala, A., Dortet, L., et al. (2017). Beta-lactamase

database (BLDB) – structure and function. J. Enzyme Inhib. Med. Chem. 32, 917–919.

doi:10.1080/14756366.2017.1344235.

Nigro, S. J., and Hall, R. M. (2016). Loss and gain of aminoglycoside resistance in global clone 2

Acinetobacter baumannii in Australia via modification of genomic resistance islands and

acquisition of plasmids. J. Antimicrob. Chemother. 71, 2432–40. doi:10.1093/jac/dkw176.

Nowak, J., Zander, E., Stefanik, D., Higgins, P. G., Roca, I., Vila, J., et al. (2017). High incidence of

pandrug-resistant Acinetobacter baumannii isolates collected from patients with ventilator-

associated pneumonia in Greece, Italy and Spain as part of the MagicBullet clinical trial. J.

Antimicrob. Chemother. 72, 3277–3282. doi:10.1093/jac/dkx322.

Oikonomou, O., Sarrou, S., Papagiannitsis, C. C., Georgiadou, S., Mantzarlis, K., Zakynthinos, E., et al.

(2015). Rapid dissemination of colistin and carbapenem resistant Acinetobacter baumannii in

Central Greece: Mechanisms of resistance, molecular identification and epidemiological data.

BMC Infect. Dis. 15, 13–18. doi:10.1186/s12879-015-1297-x.

Page, A. J., Cummins, C. A., Hunt, M., Wong, V. K., Reuter, S., Holden, M. T. G., et al. (2015). Roary:

rapid large-scale prokaryote pan genome analysis. Bioinformatics 31, 3691–3693.

doi:10.1093/bioinformatics/btv421.

Peleg, A. Y., Seifert, H., and Paterson, D. L. (2008). Acinetobacter baumannii: Emergence of a

Successful Pathogen. Clin. Microbiol. Rev. 21, 538–582. doi:10.1128/CMR.00058-07.

Pelletier, M. R., Casella, L. G., Jones, J. W., Adams, M. D., Zurawski, D. V., Hazlett, K. R. O., et al.

(2013). Unique Structural Modifications Are Present in the Lipopolysaccharide from Colistin-

Resistant Strains of Acinetobacter baumannii. Antimicrob. Agents Chemother. 57, 4831–4840.

doi:10.1128/AAC.00865-13.

Poirel, L., Jayol, A., and Nordmann, P. (2017). Polymyxins: Antibacterial Activity, Susceptibility Testing,

and Resistance Mechanisms Encoded by Plasmids or Chromosomes. Clin. Microbiol. Rev. 30,

557–596. doi:10.1128/CMR.00064-16.

Poirel, L., and Nordmann, P. (2006). Carbapenem resistance in Acinetobacter baumannii: mechanisms

and epidemiology. Clin. Microbiol. Infect. 12, 826–836. doi:10.1111/j.1469-0691.2006.01456.x.

Potron, A., Vuillemenot, J.-B., Puja, H., Triponney, P., Bour, M., Valot, B., et al. (2019). ISAba1-

dependent overexpression of eptA in clinical strains of Acinetobacter baumannii resistant to

colistin. J. Antimicrob. Chemother. 74, 2544–2550. doi:10.1093/jac/dkz241.

Pournaras, S., Dafopoulou, K., Del Franco, M., Zarkotou, O., Dimitroulia, E., Protonotariou, E., et al.

(2017). Predominance of international clone 2 OXA-23-producing Acinetobacter baumannii

clinical isolates in Greece, 2015: results of a nationwide study. Int. J. Antimicrob. Agents 49,

749–753. doi:10.1016/j.ijantimicag.2017.01.028.

Seemann, T. (2014). Prokka: rapid prokaryotic genome annotation. Bioinformatics 30, 2068–2069.

doi:10.1093/bioinformatics/btu153.

Siguier, P., Perochon, J., Lestrade, L., Mahillon, J., and Chandler, M. (2006). ISfinder: the reference

centre for bacterial insertion sequences. Nucleic Acids Res. 34, D32-6. doi:10.1093/nar/gkj014.

Snitkin, E. S., Zelazny, A. M., Montero, C. I., Stock, F., Mijares, L., Mullikin, J., et al. (2011). Genome-

wide recombination drives diversification of epidemic strains of Acinetobacter baumannii. Proc.

Natl. Acad. Sci. U. S. A. 108, 13758–13763. doi:10.1073/pnas.1104404108.

Stamatakis, A. (2014). RAxML version 8: A tool for phylogenetic analysis and post-analysis of large

phylogenies. Bioinformatics 30, 1312–1313. doi:10.1093/bioinformatics/btu033.

Trebosc, V., Gartenmann, S., Tötzl, M., Lucchini, V., Schellhorn, B., Pieren, M., et al. (2019). Dissecting

Colistin Resistance Mechanisms in Extensively Drug-Resistant Acinetobacter baumannii Clinical

Isolates. MBio 10. doi:10.1128/mBio.01083-19.

Tsakris, A., Tsioni, C., Pournaras, S., Polyzos, S., Maniatis, A. N., and Sofianou, D. (2003). Spread of

low-level carbapenem-resistant Acinetobacter baumannii clones in a tertiary care Greek

hospital. J. Antimicrob. Chemother. 52, 1046–1047. doi:10.1093/jac/dkg470.

Turton, J. F., Ward, M. E., Woodford, N., Kaufmann, M. E., Pike, R., Livermore, D. M., et al. (2006).

The role of ISAba1 in expression of OXA carbapenemase genes in Acinetobacter baumannii.

FEMS Microbiol. Lett. 258, 72–77. doi:10.1111/j.1574-6968.2006.00195.x.

Viehman, J. A., Nguyen, M. H., and Doi, Y. (2014). Treatment Options for Carbapenem-Resistant and

Extensively Drug-Resistant Acinetobacter baumannii Infections. Drugs 74, 1315–1333.

doi:10.1007/s40265-014-0267-8.

Vila, J., Ruiz, J., Goñi, P., and Jimenez de Anta, T. (1997). Quinolone-resistance mutations in the

topoisomerase IV parC gene of Acinetobacter baumannii. J. Antimicrob. Chemother. 39, 757–62.

Available at: http://www.ncbi.nlm.nih.gov/pubmed/9222045 [Accessed October 17, 2018].

Vila, J., Ruiz, J., Goñi, P., Marcos, A., and Jimenez de Anta, T. (1995). Mutation in the gyrA gene of

quinolone-resistant clinical isolates of Acinetobacter baumannii. Antimicrob. Agents Chemother.

39, 1201–3. Available at: http://www.ncbi.nlm.nih.gov/pubmed/7625818 [Accessed October 17,

2018].

Villalon, P., Valdezate, S., Medina-Pascual, M. J., Rubio, V., Vindel, A., and Saez-Nieto, J. A. (2011).

Clonal Diversity of Nosocomial Epidemic Acinetobacter baumannii Strains Isolated in Spain. J.

Clin. Microbiol. 49, 875–882. doi:10.1128/JCM.01026-10.

Wong, D., Nielsen, T. B., Bonomo, R. A., Pantapalangkoor, P., Luna, B., and Spellberg, B. (2017).

Clinical and Pathophysiological Overview of Acinetobacter Infections: a Century of Challenges.

Clin. Microbiol. Rev. 30, 409–447. doi:10.1128/CMR.00058-16.

Wright, M. S., Jacobs, M. R., Bonomo, R. A., and Adams, M. D. (2017). Transcriptome Remodeling of

Acinetobacter baumannii during Infection and Treatment. MBio 8, e02193-16.

doi:10.1128/mBio.02193-16.

Wyres, K. L., Cahill, S. M., Holt, K. E., Hall, R. M., and Kenyon, J. J. (2019). Identification of

Acinetobacter baumannii loci for capsular polysaccharide (KL) and lipooligosaccharide outer

core (OCL) synthesis in genome assemblies using curated reference databases compatible with

Kaptive. bioRxiv 1, 869370. doi:10.1101/869370.

Zander, E., Nemec, A., Seifert, H., and Higgins, P. G. (2012). Association between β-lactamase-

encoding blaOXA-51 variants and DiversiLab rep-PCR-based typing of Acinetobacter baumannii

isolates. J. Clin. Microbiol. 50, 1900–4. doi:10.1128/JCM.06462-11.

Zankari, E., Hasman, H., Cosentino, S., Vestergaard, M., Rasmussen, S., Lund, O., et al. (2012).

Identification of acquired antimicrobial resistance genes. J. Antimicrob. Chemother. 67, 2640–

2644. doi:10.1093/jac/dks261.

Zarrilli, R., Pournaras, S., Giannouli, M., and Tsakris, A. (2013). Global evolution of multidrug-resistant

Acinetobacter baumannii clonal lineages. Int. J. Antimicrob. Agents 41, 11–19.

doi:10.1016/j.ijantimicag.2012.09.008.

CHAPTER 4 : Genomic evolution and local epidemiology of Klebsiella

pneumoniae from the Beijing Hospital 301 over a fifteen-year period:

dissemination of known and novel high-risk clones

Mattia Palmieri1, Kelly L. Wyres2, Andreu Coello Pelegrin1, Caroline Mirande3, Zhao Qiang4, Ye

Liyan4, Chen Gang4, Herman Goossens5, Kathryn E. Holt2, Alex van Belkum1, Luo Yan Ping4.


2Department of Infectious Diseases, Monash University, Melbourne, Victoria, Australia.

3bioMérieux, R&D Microbiology, La Balme Les Grottes, France.

4Chinese PLA General Hospital 301, BJ 301 clinical hospital laboratory, Beijing, China.

5Laboratory of Medical Microbiology, Vaccine and Infectious Disease Institute, University of Antwerp, Belgium.

Manuscript in preparation

4.1 Introduction

Klebsiella pneumoniae is one of the greatest infectious threats amongst Gram-negative pathogens.

Multidrug-resistant (MDR) strains causing hospital outbreaks and hypervirulent strains causing

severe community-acquired infections are of major concern (Paczosa & Mecsas 2016). In China,

hypervirulent K. pneumoniae (hvKp), primarily of clonal group (CG) 23, and carbapenem-resistant K.

pneumoniae (CR-Kp), mostly belonging to CG258, represent the two major clinically significant

lineages of K. pneumoniae (Struve et al. 2015; Zhang et al. 2017).

HvKp infections are characterized by high morbidity and mortality and they are mainly associated

with severe life-threatening liver abscesses, pneumonia, meningitis, and endophthalmitis in young

and healthy individuals (Shon et al. 2013). Several virulence factors have been reported in hvKp

strains. The capsular polysaccharide (cps) is a major virulence factor, and hvKp strains are usually

associated with K1 or K2 capsular serotypes, which were shown to be particularly anti-phagocytic and

to provide serum resistance (Kabha et al. 1995; Paczosa & Mecsas 2016). HvKp also harbour other

virulence genes: i) the rmpA and rmpA2 genes, which upregulate capsule expression; ii) the colibactin

(clb) genotoxin, which induces eukaryotic cell death and promotes bacterial translocation from the gut

to the blood; and iii) the yersiniabactin (ybt), aerobactin (iuc) and salmochelin (iro) siderophores, which

enhance survival in the blood by promoting iron scavenging (Paczosa & Mecsas 2016). While the ybt

locus is generally mobilized by an integrative, conjugative element termed ICEKp (Lam, Wick, et al.

2018), the iro, iuc and rmpA/rmpA2 loci are usually co-located on a virulence plasmid (Lam, Wyres, et

al. 2018). CG23 strains are usually susceptible to most antibiotics (Siu et al. 2012); however, the last

few years have seen the emergence of MDR strains, including those resistant to carbapenems,

namely CR-hvKp (Bialek-Davenet et al. 2014; Liu et al. 2017; Shen et al. 2019; Dong, Lin, et al. 2018;

Chen et al. 2020).

Carbapenem resistance is rapidly increasing in China, and the CHINET surveillance network showed

that the resistance rate of K. pneumoniae to imipenem and meropenem increased from 3.0% and

2.9% in 2005 to 25% and 26.3% in 2018, respectively, resulting in a more than 8-fold increase (Hu et

al. 2016, 2019). KPC-2 is the most prevalent enzyme among CR-Kp in China, with 77% KPC-2 positive

strains among the carbapenemase producers reported in a recent study (Zhang et al. 2018). CG258 is

recognized worldwide as the most common clinical carbapenem-resistant clone and the major vector

of KPC-2, with ST258 being most prevalent in Europe and the U.S.A (Chen et al. 2014) and ST11

accounting for 75% of CR-Kp in China (Zhang et al. 2018). Genomic studies revealed that most of the

ST11 CR-Kp strains in China harbour a capsule of type KL47 or the recently emerging KL64 (Dong,

Zhang, et al. 2018; Zhou et al. 2020). Recently, CR-Kp ST11 strains with a hyper-virulent phenotype,

as defined by the carriage of the iuc aerobactin locus, have emerged (Gu et al. 2017; Yao et al. 2018;

Wong et al. 2018; Dong, Zhang, et al. 2018; Xu et al. 2019; Zhang et al. 2019; Yang et al. 2020; Zhou

et al. 2020). While the majority of these reports represent sporadic isolations, in 2017 a fatal

outbreak was caused by a CR-Kp ST11-KL47 strain harbouring a virulence plasmid containing the iuc

and the rmpA2 genes (Gu et al. 2017). Further retrospective investigations revealed that similar

strains were already circulating within China before the initial report (Gu et al. 2017; Yao et al. 2018).

Numerous studies have investigated the genetic epidemiology of CR-Kp in China (Van Dorp et al.

2019; Yang et al. 2020; Zhou et al. 2020). Here, we study a large collection of serially selected K.

pneumoniae strains obtained from patients in the H301 Beijing hospital during the period 2002-2016.

Phenotypic antimicrobial susceptibility testing and WGS were employed in order to obtain a global

picture of the strains circulating within the hospital during the study period. Focusing on the broad

population, instead of CR-Kp solely, allows the understanding of the evolution towards MDR,

including ESBL production, and hypervirulence, as well as the convergence of the two traits.

4.2 Materials and methods


Bacterial isolates and antimicrobial susceptibility. Bacterial isolates were obtained from the 4,000-

bed Hospital 301 in Beijing, China. A total of 300 K. pneumoniae isolates were collected from routine

microbiological cultures of clinical samples (urine, blood, sputum, tissues, etc.) within the period 2002-

2016. Of those, 200 were randomly selected from different patients over the study period; they

represented 3% of the K. pneumoniae isolates collected during the study period and were used for

genomic epidemiology investigations. The additional 100 isolates were selected based on different

criteria (e.g. carbapenem-resistance, isolates from outbreak) and were included to enrich the analysis

of the major clones. Antimicrobial susceptibility testing was performed for all isolates with the Vitek2

automated system (bioMérieux, Marcy l’Étoile, France), and results were interpreted according to

the EUCAST breakpoints (EUCAST 2019). Antimicrobials tested were: amikacin, aztreonam, cefepime,

ceftazidime, ciprofloxacin, ertapenem, gentamicin, imipenem, levofloxacin, piperacillin/tazobactam,

tobramycin and trimethoprim/sulfamethoxazole. MDR was defined as non-susceptibility to three

or more classes of antimicrobials, as described previously (Magiorakos et al. 2012).

Data were analysed with Python (v3.7.4), and statistical analyses were conducted using a linear

regression method from the scikit-learn package.
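
The MDR definition above can be illustrated with a minimal sketch. The grouping of the twelve tested agents into classes, the class names, and the choice to count both "I" and "R" results as non-susceptible are assumptions made for illustration; they are not taken from the study's own scripts.

```python
# Hypothetical drug-to-class grouping for the twelve agents tested.
# The class names and the grouping itself are illustrative assumptions.
DRUG_CLASS = {
    "amikacin": "aminoglycosides",
    "gentamicin": "aminoglycosides",
    "tobramycin": "aminoglycosides",
    "aztreonam": "monobactams",
    "cefepime": "cephalosporins",
    "ceftazidime": "cephalosporins",
    "ciprofloxacin": "fluoroquinolones",
    "levofloxacin": "fluoroquinolones",
    "ertapenem": "carbapenems",
    "imipenem": "carbapenems",
    "piperacillin/tazobactam": "penicillin/inhibitor combinations",
    "trimethoprim/sulfamethoxazole": "folate pathway inhibitors",
}


def non_susceptible_classes(profile):
    """Return the classes with at least one non-susceptible result.

    `profile` maps a drug name to "S", "I" or "R". Both "I" and "R" are
    treated as non-susceptible here (an assumption). Drugs that are not
    in DRUG_CLASS are ignored.
    """
    return {
        DRUG_CLASS[drug]
        for drug, sir in profile.items()
        if drug in DRUG_CLASS and sir.upper() in {"I", "R"}
    }


def is_mdr(profile, min_classes=3):
    """True when non-susceptibility spans `min_classes` or more classes."""
    return len(non_susceptible_classes(profile)) >= min_classes


# Invented example profile, not a strain from the collection.
example = {
    "amikacin": "S",
    "gentamicin": "R",
    "tobramycin": "R",
    "ceftazidime": "R",
    "ciprofloxacin": "I",
    "imipenem": "S",
}
print(is_mdr(example))
```

Counting classes rather than individual drugs means that resistance to several agents of one class (for example, all three aminoglycosides) contributes only once towards the threshold.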

Whole genome sequencing and assembly. Genomic DNA was extracted with the DNeasy UltraClean

kit (Qiagen, Hilden, Germany), quantified by using the Qubit fluorometer (Thermo Fisher Scientific,

USA) and quality checked using the 260/280 absorbance ratio as determined by the

DS-11 FX+ instrument (DeNovix, Wilmington, USA). Sequencing was performed using a HiSeq

platform (Illumina, Inc., San Diego, USA) and a 2x150 bp paired-end approach. Raw data from paired-

end sequencing were quality checked with the FastQC tool (v.0.11.6) and assembled with SPAdes

(v.3.11.1) (Bankevich et al. 2012). Assemblies were inspected with Bandage (v0.8.1) (Wick et al.

2015).

Bioinformatics analysis. Sequence types (STs) were assigned by the mlst tool

(github.com/tseemann/mlst) by using the Pasteur database (bigsdb.pasteur.fr/). The ABRicate tool

(github.com/tseemann/abricate) was used to detect acquired antimicrobial resistance genes using

the ResFinder database (Zankari et al. 2012), while plasmid replicons were predicted by

PlasmidFinder (Carattoli et al. 2014). Kaptive was used for the capsular type detection (Wyres et al.

2016a). Kleborate (github.com/katholt/Kleborate) was used for the species identification, detection

of ICEKp associated virulence loci (yersiniabactin (ybt), colibactin (clb)), virulence plasmid associated

loci (salmochelin (iro), aerobactin (iuc), hypermucoidy (rmpA, rmpA2)) and for checking the

ompK35/36 gene integrity. Phylogenetic analyses of CG258, CG23 and ST383 genomes were

performed by mapping the respective reads to the reference genomes GD4 (accession

no. CP025951), SGH10 (CP025080) and KpvST383_NDM_OXA-48 (CP034200), respectively. Snippy

was used for read mapping (github.com/tseemann/snippy). The whole genome alignments

obtained were screened for recombination using Gubbins (v2.3.4) (Croucher et al. 2015), while a

maximum likelihood phylogeny was obtained by using RAxML (v8.2.12) (Stamatakis 2014) with the

GTRGAMMA model and 100 bootstrap replicates. Pairwise core genome single nucleotide polymorphism

(SNP) distances were obtained with the snp-dists tool (github.com/tseemann/snp-dists) by using the Gubbins

output. The phylogenetic trees were visualized together with associated metadata using Microreact

(v7.0.0) (Argimón et al. 2016) or Phandango (Hadfield et al. 2018). The Harvest suite was used to align

and visualize genomes of CG23 and ST35 strains in order to decipher the recombination events

within ST1265 (Treangen et al. 2014).
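
The pairwise SNP comparison described above can be sketched as follows. This is a simplified illustration and not the snp-dists implementation: it assumes that only positions where both sequences carry an unambiguous A, C, G or T are counted, and the toy alignment and strain names are invented.

```python
from itertools import combinations
from statistics import mean, median

VALID_BASES = set("ACGT")


def snp_distance(seq_a, seq_b):
    """Count differing positions where both bases are unambiguous."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in VALID_BASES and b in VALID_BASES and a != b
    )


def pairwise_distances(alignment):
    """Return {(name_a, name_b): distance} for every unordered pair."""
    return {
        (name_a, name_b): snp_distance(alignment[name_a], alignment[name_b])
        for name_a, name_b in combinations(sorted(alignment), 2)
    }


def summarise(distances):
    """Summary statistics of the kind reported per clade in the text."""
    values = list(distances.values())
    return {
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


# Toy alignment (hypothetical); "N" and "-" positions are skipped.
toy_alignment = {
    "strain1": "ACGTACGTAC",
    "strain2": "ACGTACGTAC",
    "strain3": "ACGAACGTNC",
    "strain4": "TCGAACG-AC",
}
toy_distances = pairwise_distances(toy_alignment)
print(summarise(toy_distances))
```

In practice the input would be the recombination-filtered alignment, so that SNPs imported by recombination do not inflate the distances between closely related strains.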

4.3 Results and discussion

A total of 300 K. pneumoniae strains were successfully sequenced. One isolate was subsequently identified

as K. michiganensis and was excluded, leaving 299 isolates. Of those, 200 were randomly selected

over the 15-year period (2002-2016) and were considered for longitudinal and epidemiological

investigations. In silico species identification revealed four members of the K. pneumoniae

species complex (Figure 7), with a predominance of K. pneumoniae sensu stricto (N=177, 88.5%) followed by K.

quasipneumoniae subsp. similipneumoniae (N=11, 5.5%), K. quasipneumoniae subsp.

quasipneumoniae (N=8, 4%) and K. variicola (N=4, 2%). No particular trends in terms of species

abundance were observed.

Figure 7. Phylogenetic analysis of the whole K. pneumoniae collection, showing the different K. pneumoniae species.

4.3.1 Antimicrobial susceptibility.

Phenotypic results highlighted imipenem as the most effective drug, with 94.5% susceptibility,

followed by amikacin and ertapenem (both at 87.5% susceptibility) (Table 2). By clustering the strains

in 5-year groups (2002-2006, 2007-2011 and 2012-2016), we observed a decrease in susceptibility

rates for most of the drugs. The observed trends were statistically significant for imipenem

(p=0.024), ertapenem (p=0.048) and ceftazidime (p=0.045). Data from the China Antimicrobial

Resistance Surveillance System (CARSS) revealed that the resistance rates of K. pneumoniae were on

a rising trend and reached 34.5% and 8.7% in 2016 for third-generation cephalosporins and

carbapenems, respectively (CARSS). In line with these results, K. pneumoniae resistance rates reached

51.0% and 4.1% in 2016 for ceftazidime and imipenem, respectively, within H301.

Overall, the majority of the strains were classified as MDR (N=118, 59%), with an increase from 44.8%

through 51.2% to 64.8% over the three 5-year periods.

Period     AK    ATM   FEP   CAZ   CIP   ETP   GEN   IPM   LEV   TZP   TOB   SXT

2002-2006  86.2  55.2  79.3  72.4  39.3  96.6  73.1  100   63.0  89.3  65.5  75.0

2007-2011  90.2  62.8  79.1  69.8  37.2  90.7  58.1  97.7  65.1  74.4  51.2  65.1

2012-2016  87.5  46.1  72.7  50.8  35.2  84.4  54.7  92.2  62.5  79.5  51.2  45.3

Total      87.5  51.0  75.0  58.0  36.1  87.5  57.8  94.5  63.1  79.8  53.3  53.8

Table 2. Percentages of susceptibility to the tested drugs. AK: amikacin; ATM: aztreonam; FEP: cefepime; CAZ: ceftazidime; CIP: ciprofloxacin; ETP: ertapenem; GEN: gentamicin; IPM: imipenem; LEV: levofloxacin; TZP: piperacillin/tazobactam; TOB: tobramycin; SXT: trimethoprim/sulfamethoxazole.
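
As an illustration of the downward trends in Table 2, the sketch below fits an ordinary least-squares line to the three period-level susceptibility percentages for imipenem, ertapenem and ceftazidime. Encoding the 5-year periods as 0, 1 and 2 is an assumption, and the sketch does not reproduce the reported p values, which come from the study's own analysis.

```python
def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two or more paired observations")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return sxy / sxx


# Periods 2002-2006, 2007-2011 and 2012-2016 encoded as 0, 1, 2 (assumption).
PERIODS = [0, 1, 2]

# Susceptibility percentages copied from Table 2.
SUSCEPTIBILITY = {
    "IPM": [100.0, 97.7, 92.2],
    "ETP": [96.6, 90.7, 84.4],
    "CAZ": [72.4, 69.8, 50.8],
}

slopes = {
    drug: least_squares_slope(PERIODS, values)
    for drug, values in SUSCEPTIBILITY.items()
}
for drug, slope in slopes.items():
    print(f"{drug}: {slope:+.1f} percentage points per 5-year period")
```

A negative slope corresponds to a loss of susceptibility per period; with only three aggregated points the slope is purely descriptive.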

4.3.2 Genomic epidemiology

Considering the random collection of 200 strains, 98 different STs were observed, including 27 novel

STs. The majority of STs (72.4%) were represented by only a single strain, highlighting the diversity

within the K. pneumoniae population. Eight clonal groups (CGs) were represented by at least five

strains: CG258 (N=28, the most frequent), CG23 (N=14), CG37 (N=13), CG14 (N=10), CG65

(N=9), CG15 (N=8), CG147 (N=8) and CG307 (N=7). Figure 8 and Table 3 summarize their major

features. Strains belonging to CG258 represented 14% of the population overall, and 60% of all

carbapenemase producers.
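
The ST diversity figures above (number of distinct STs and the share represented by a single strain) can be derived from a per-strain list of ST assignments, as in the following minimal sketch. The ST list is invented for illustration and is not the study data.

```python
from collections import Counter


def singleton_fraction(st_labels):
    """Fraction of distinct STs that are represented by exactly one strain."""
    counts = Counter(st_labels)
    if not counts:
        raise ValueError("no sequence types provided")
    singletons = sum(1 for n in counts.values() if n == 1)
    return singletons / len(counts)


# Hypothetical ST assignments, one entry per strain.
toy_sts = ["ST11", "ST11", "ST11", "ST23", "ST23", "ST37", "ST307", "ST15", "ST147"]

st_counts = Counter(toy_sts)
print(len(st_counts), "distinct STs")
print(f"{singleton_fraction(toy_sts):.1%} of STs are singletons")
```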

A total of 73 different K loci were detected, with 60 of them represented by a maximum of three

strains. The major K loci were KL2 (N=21, including ST14, ST65, ST380, ST375, ST86 and ST25), KL1

(N=17, including ST23, ST367 and two novel STs) and KL107 (N=10, including ST15 and 5 other less

represented STs). CG258 strains had the highest number of K loci, with 12 different ones detected, of

which 11 were detected in ST11 strains. CG37 had the second-highest K locus diversity, with

eight different loci detected. Conversely, CG23 and CG65, the hypervirulent clones, carried only

KL1 and KL2, respectively (Table 3).

Table 3. Features of the major CGs observed. Values are numbers of strains, with percentages in brackets.

CG      count  mlst                   K_loci                       MDR        ESBL       CARBA     ybt         iuc         clb         rmpA       rmpA2

CG258   28     ST11, ST11-1LV,        KL105, KL110, KL111, KL14,   25 (89.3)  15 (53.6)  9 (32.1)  16 (57.1)   2 (7.1)     0           0          2 (7.1)
               ST1264, ST340, ST437   KL141, KL142, KL15, KL22,
                                      KL25, KL36, KL39, KL47,
                                      KL64

CG23    14     ST23                   KL1                          2 (14.3)   1 (7.1)    1 (7.1)   14 (100.0)  14 (100.0)  14 (100.0)  13 (92.9)  14 (100.0)

CG37    13     ST309, ST37,           KL118, KL12, KL122, KL128,   6 (46.2)   8 (61.5)   1 (7.7)   1 (7.7)     0           0           0          0
               ST726, ST727           KL15, KL21, KL23, KL42

CG14    10     ST14                   KL16, KL2                    5 (50.0)   2 (20.0)   0         0           0           0           0          0

CG65    9      ST375, ST65            KL2                          0          2 (22.2)   0         5 (55.6)    8 (88.9)    5 (55.6)    8 (88.9)   6 (66.7)

CG147   8      ST147, ST273           KL14, KL64, KL74, KL81       7 (87.5)   4 (50.0)   2 (25.0)  1 (12.5)    1 (12.5)    0           1 (12.5)   1 (12.5)

CG15    8      ST15                   KL107, KL19, KL24, KL48      8 (100.0)  6 (75.0)   0         0           0           0           0          0

CG307   7      ST307                  KL102                        6 (85.7)   7 (100.0)  0         1 (14.3)    0           0           0          0

Figure 8. Features of the major CGs observed among the 200 randomly collected strains: the prevalence of MDR vs MDS (A) and the types of ESBLs (B), carbapenemases (C) and capsular types (D) observed within the major CGs.

4.3.3 Antimicrobial resistance determinants.

More than half of the strains (N=110, 55%) harboured an ESBL-encoding gene, with 13 strains

harbouring more than one gene with up to four genes per strain. The most common ESBLs observed

were of the CTX-M type, with CTX-M-14 (N=35), CTX-M-3 (N=26) and CTX-M-15 (N=22) being the

most prevalent. CG307 strains had the highest prevalence of ESBLs, with all strains encoding

either CTX-M-15 or CTX-M-14.

Four different carbapenemase-encoding genes were observed: blaKPC-2 (N=10), blaIMP-4 (N=2), blaOXA-48

(N=2) and blaIMP-30 (N=1). Strains belonging to ST11 carried most of the blaKPC-2 genes (90%), while the

remaining gene was found in an ST37 strain. The blaIMP-4 genes were observed in a hypervirulent

ST23 strain and in an ST337 strain. Two ST147 strains had either blaOXA-48 or blaIMP-30, and an ST383

strain had blaOXA-48.

Mutations in ompK genes were observed in 42 strains (21%) and consisted of insertions and deletions

leading to premature termination of OmpK35, which in a few cases (N=9) were combined with

simultaneous ompK36 alterations. Such porin deficiencies were mainly observed within CG258, with

22 mutated strains out of 28 (78.6%). No porin alterations were observed for hypervirulent CG23

and CG65 strains.

Genes encoding 16S rRNA methyltransferases, associated with high-level aminoglycoside resistance,

were observed, with 13 strains harbouring armA, 11 harbouring rmtB and 2 strains harbouring

both armA and rmtB. Such genes were mainly observed in strains belonging to ST11 (N=9) and ST15

(N=4).

Several chromosomal mutations associated with known fluoroquinolone resistance were observed,

the most common being ParC80I (N=57), GyrA83I (N=47) and GyrA83F (N=11). Overall, 65 strains (32.5%)

had at least one ParC or GyrA mutation, the most common combination being GyrA83I-ParC80I (N=37),

and all 65 strains had high ciprofloxacin MIC (≥4 mg/L). Concerning the acquired fluoroquinolone

resistance mechanisms, QnrS1 (N=65), Aac(6')-Ib-cr (N=61) and QnrB4 (N=33) were the most

prevalent. Overall, 150 strains had at least one mechanism of fluoroquinolone resistance.

Genes encoding resistance to trimethoprim (dfrA) and sulfonamides (sul) were observed in 138

strains, with 100 carrying both genes and showing trimethoprim/sulfamethoxazole resistance.

Acquired mechanisms of colistin resistance were also observed. The mcr-1.1 gene was observed in

the K. pneumoniae ST231 strain K089 isolated in 2015. The gene was carried by a plasmid with

replicon IncX4 and identical to plasmid pMCR_WCHEC1618 (accession no. KY463454.1) obtained

from an E. coli strain from China in 2015 (Zhao et al. 2017). Strain K089 also encoded the ESBL CTX-M-27,

as well as fluoroquinolone, trimethoprim and sulfonamide resistance mechanisms. Two mcr-9.1

genes were detected in K. quasipneumoniae subsp. quasipneumoniae strains K7029 and K7030,

both belonging to ST1681 and collected in 2005. Relying only on Illumina short reads,

we were unable to determine the genetic context of the mcr-9.1 genes.

4.3.4 Hypervirulent K loci and acquired virulence genes.

The K. pneumoniae capsule is a major virulence factor, and the capsule synthesis locus shows considerable

genetic diversity between clonal groups (DeLeo et al. 2014; Wyres et al. 2015; Holt et al. 2015; Wyres

et al. 2016b). The hypervirulence-associated KL1 and KL2 represented the two most common capsular

polysaccharides within our collection. KL2 was associated with CG14 and CG65 strains (N=9 each),

and three more strains belonging to ST380, ST86 and ST25. KL1 was strictly linked to ST23 in K.

pneumoniae sensu stricto (N=14). KL1 was also observed in an ST367 K. quasipneumoniae subsp.

similipneumoniae strain, in a novel ST (double locus variant of ST367) belonging to K. quasipneumoniae subsp.

similipneumoniae, and in a novel ST (single locus variant of ST527) belonging to Klebsiella variicola.

Siderophore gene acquisition was recently recognised as an important contributor to severe K.

pneumoniae invasive disease (Holt et al. 2015; Lam, Wick, et al. 2018). Lam et al. reported that the

ybt locus was present in 40.0% of the CG258, 87.8% of the hyper-virulent CG23, and was identified in

32.2% of the wider population. In our collection, yersiniabactin-encoding genes were observed in 61

strains (30.5%), and were located in eight different ICEKp chromosomally integrated mobile elements

and one plasmid. The major mobile elements were ICEKp10 (N=22) and ICEKp3 (N=17). While

ICEKp10 was linked to hypervirulent clones (CG23, N=14; CG65, N=5), ICEKp3 was mostly associated

with CG258 (N=9) and other non-hypervirulent clones. We observed ybt genes in 57.1% and 100% of

CG258 and CG23 strains, respectively, which is higher than previously reported (Lam, Wick, et al.

2018).

Plasmid-related iuc, iro, clb, rmpA and rmpA2 genes were also observed (iuc, 17%; iro, 16.5%; rmpA,

16%; rmpA2, 15%; clb, 11%), mostly associated with CG23 and CG65 (Figure 9). Because of its crucial

role in hypervirulence, aerobactin (iuc) positivity was considered a defining genetic trait for hvKP

(Russo et al. 2014). iuc1 was the most prevalent iuc lineage (N=32), and was linked to CG23 (N=14),

CG65 (N=8) and six other less represented CGs, including ‘classic’ clones and a K.

quasipneumoniae subsp. similipneumoniae strain. iuc1 is usually located within the KpVP-1 virulence

plasmid (Lam, Wyres, et al. 2018) together with the previously mentioned virulence genes. We found

iuc1 together with iro1 (N=28), clb2 (N=14), clb3 (N=4), rmpA (N=28) and rmpA2 (N=29). Other iuc

lineages observed were iuc2, which is associated with KpVP-2 (Lam, Wyres, et al. 2018) and was observed in

an ST380 strain, and iuc5, observed in an ST107 strain.

Figure 9. Percentages of virulence genes within the major CGs.

4.3.5 Comparative genomics of CG258 strains: cps diversity and hypervirulence

Figure 10. Phylogenetic analysis of CG258 strains, including 48 strains from this study and 18 strains from previous studies (Gu et al. 2017; Dong, Zhang, et al. 2018; Zhou et al. 2020). The fatal outbreak clone reported in China in 2017 (Gu et al. 2017) is highlighted on the tree. Aerobactin and salmochelin are not shown in the legend, as they were of the types iuc1 and iro1 only, respectively. Chromosomal regions characterized by high SNP density are reported on the right, and their locations are shown relative to the reference GD4 genome (CP025951). Red blocks indicate predicted recombinations occurring on an internal branch, which are therefore shared by multiple isolates through common descent. Blue blocks represent recombinations that occur on terminal branches, which are unique to individual isolates.

Considering all 299 genomes, we obtained 48 non-duplicate CG258 genomes (ST11, N=40; ST11-1LV,

N=3; ST395, ST437, ST1264, ST340, ST1326, N=1 each). The rapid evolution within CG258 was

emphasized by the number of different capsular polysaccharides detected (N=17), 11 of which were

detected in ST11 only, and by the high evolutionary rate (~15 SNPs/genome/year) reported in

previous studies (Wyres et al. 2015; Zhou et al. 2020).

Figure 10 shows the phylogenetic relations of the 48 strains together with other ST11 strains

sequenced in previous studies. Two major clades were formed, with clade 1 consisting of ST11-KL47

and ST11-KL64 only, and clade 2 consisting of six different STs and 15 different cps types. Average

core SNP difference between clade 1 strains was 23, ranging from 0 to 60. Consistent with previous

studies, the major CG258 clone was ST11-KL47-KPC-2, which was similar to strains recently described

in China and causing outbreaks, including the fatal one that caused 5 deaths in 2017 (Gu et al. 2017;

Dong, Zhang, et al. 2018; Zhou et al. 2020). All strains from this clade harboured blaKPC-2 and carried

the ybt9 locus on an ICEKp3 element. Two of our ST11-KL47 strains were CR-hvKp and carried blaKPC-2

plus a pLVPK-like plasmid containing iuc1 and a truncated rmpA2. Retrospective studies have shown

that ST11-KL47 CR-hvKP emerged before 2015 and has since become detectable in different Asian

countries, including China, Hong Kong and India, suggesting that CR-hvKP may undergo worldwide

dissemination in the near future (Shankar et al. 2016; Wong et al. 2017; Du et al. 2018).

Clade 2 strains had 47 core SNPs on average, ranging from 0 to 123 (median 45). Recent studies

revealed the emergence and predominance of a novel ST11 clone, harbouring KL64, KPC-2 and the

hypervirulence plasmid in some instances (Zhou et al. 2020; Yang et al. 2020). Genomic analysis

revealed that this clade originated from ST11-KL47 after recombination of the cps genes around 2011

(Zhou et al. 2020). Of note, ST11-KL64 strains from this study did not cluster in clade 1 together with

previously reported ST11-KL64 strains, but they were located within clade 2. Analysis of

recombination sites revealed that such strains had two major regions of recombination, the cps

genes and the ICEKpnHS11286-1 region. Conversely, ST11-KL64 strains described by Zhou et al. only

showed recombination within the cps biosynthesis genes. Such findings suggest a different

evolutionary origin of ST11-KL64 strains from this study compared to the emerging clone described

by Zhou et al. The three ST11-KL64 strains in our collection were isolated in 2006 and 2007 and

lacked the blaKPC-2 gene and the ybt locus, which is normally present in the ICEKpnHS11286-1

recombinant region. Strain ST11-KL64 K7069, isolated in 2007, carried a pLVPK-like plasmid

containing iuc1 and a truncated rmpA2 and also co-harboured blaCTX-M-3, armA and several other AMR

genes (Table S1). Only three strains out of the 28 composing the lower clade harboured blaKPC-2. Also,

the prevalence of yersiniabactin-encoding genes was lower compared to that of clade 1, with twelve

strains carrying either ybt9, ybt10, ybt13 or ybt14.
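
The pairwise core SNP figures quoted for each clade (mean, range and median) are summary statistics over all strain pairs in a core-genome alignment. The following is a minimal Python sketch of that calculation, assuming a toy alignment of equal-length sequences. The strain names and sequences are invented for illustration, and the function names are hypothetical; the actual pipeline used in this study is not reproduced here.

```python
from itertools import combinations
from statistics import mean, median


def snp_distance(seq_a, seq_b):
    """Count positions where two aligned sequences differ, ignoring gaps and Ns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    ignored = {"-", "N"}
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in ignored and b not in ignored
    )


def summarise_pairwise(alignment):
    """Return mean, min, max and median SNP distance over all strain pairs."""
    distances = [
        snp_distance(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    ]
    return {
        "mean": mean(distances),
        "min": min(distances),
        "max": max(distances),
        "median": median(distances),
    }


# Hypothetical toy core alignment (not data from this study).
toy_alignment = {
    "strain_A": "ACGTACGTAC",
    "strain_B": "ACGTACGTAC",
    "strain_C": "ACGTTCGTAA",
    "strain_D": "ACCTTCGNAA",
}
summary = summarise_pairwise(toy_alignment)
print(summary)
```

Positions carrying a gap or an ambiguous base in either sequence are skipped, so that missing data do not inflate the distance between two otherwise identical strains.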

4.3.6 Phylogenetic analysis of the hypervirulent CG23

Figure 11. Comparative genomics of CG23 strains from the present study. STs are indicated by coloured tips, with yellow and green indicating ST23 and ST1265, respectively. All strains also contained the cps KL1, ybt1, clb2 and a truncated rmpA2. *replicons IncFIB(K) and IncHI1B of the pLVPK-like plasmid were observed in all strains.

A total of 19 non-duplicate CG23 strains were sequenced over the study period (Figure 11). All

belonged to ST23, except strain K7159 which belonged to ST1265. Average core SNPs observed were

186, ranging from 49 to 288 (median 188). All genomes contained the KL1 capsular locus, the

chromosomally encoded ybt1 embedded in ICEKp10 and the colibactin locus clb2. The hypervirulent

plasmid with IncFIB(K) and IncHI1B replicons was observed in all strains, containing iuc1, iro1, rmpA

and rmpA2 in most instances (Figure 11).

Strain K7159 (ST1265) shared 6 MLST genes with ST23, differing only in the phoE allele, which is of type

9 and 10 in ST23 and ST1265, respectively. ST1265 was first described in Beijing in 2010, associated

with KL1 cps type, rmpA and a negative string test (Liu et al. 2014). Recombination analysis revealed

that strain K7159 had a ~750 Kbp recombinant region which also contained the phoE gene. Genomic

comparison revealed that this region likely originated from an ST35 genome (Figure 12).
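
The relationship between ST23 and ST1265 described above is that of a single-locus variant: the two allelic profiles agree at six of the seven MLST loci. A minimal Python sketch of this comparison follows. Only the phoE alleles (9 and 10) are taken from the text; every other allele number is a placeholder, and the function names are hypothetical.

```python
# The seven housekeeping loci of the K. pneumoniae MLST scheme.
MLST_LOCI = ("gapA", "infB", "mdh", "pgi", "phoE", "rpoB", "tonB")


def differing_loci(profile_a, profile_b):
    """Return the loci at which two allelic profiles carry different alleles."""
    return [locus for locus in MLST_LOCI if profile_a[locus] != profile_b[locus]]


def is_single_locus_variant(profile_a, profile_b):
    """True when exactly one of the seven MLST loci differs."""
    return len(differing_loci(profile_a, profile_b)) == 1


# Only the phoE alleles (9 and 10) come from the text; all other allele
# numbers are placeholders chosen for illustration.
st23 = {"gapA": 1, "infB": 1, "mdh": 1, "pgi": 1, "phoE": 9, "rpoB": 1, "tonB": 1}
st1265 = dict(st23, phoE=10)

print(differing_loci(st23, st1265))
print(is_single_locus_variant(st23, st1265))
```

A single differing locus says nothing about how the difference arose. Here the recombination analysis, not the MLST profile, shows that phoE lies within a large imported region.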

Figure 12. Whole genome alignment of ST1265 in comparison to ST23 and ST35 genomes. The SGH10 chromosome was used as reference for the alignment. Pink lines indicate SNPs identified with the Harvest suite. The MLST gene phoE position is indicated, as well as the ~750 Kb region of divergence of ST1265 strains originating from ST35 genomes.

Strain K7159 was nearly identical to strain 11420 (GCA_009497755.1) isolated in Beijing in 2014 (Li et

al. 2020). Strain 11420 consists of a chromosome of length 5,438,591 bp, a pLVPK-like plasmid of size

229,796 bp and a KPC-2 plasmid of size 81,180 bp, containing the replicon IncN without additional

AMR genes. Read-mapping analysis showed that our ST1265 genome also contained two plasmids

with identical organization and 99.9% nucleotide identity compared to plasmids from strain 11420.

Three additional cases of genomic convergence of MDR and hypervirulence were observed. Strains

K931 and K862 both carried a ~50 Kbp IncN plasmid similar to pIMP-HZ1 (KU886034.1) described in

IMP-4-producing Enterobacteriaceae from China (Wang et al. 2017). While K862 carried a plasmid

identical to pIMP-HZ1, the IncN plasmid from strain K931 had blaCTX-M-3 and blaTEM-1 replacing the

blaIMP-4 gene. Strain K7046 had a plasmid identical to pCTX-M-3 (AF550415) described in C. freundii in

Poland (Gołȩbiewski et al. 2007). It is a ~90 Kbp IncL/M plasmid carrying blaCTX-M-3, armA, blaTEM-1,

aac(3)-IId, mph(E), msr(E), sul1, aadA2 and dfrA12 genes.

4.3.7 Global comparison of ST383: an emerging high-risk clone

Figure 13. Phylogenetic tree of ST383 genomes from this study in comparison with publicly available ST383 genomes. Coloured leaves indicate different capsular polysaccharides, where yellow is for KL30 and green for KL15.

We investigated the ST383 strains in depth, as several of them were CR-hvKp.

ST383 is an emerging clone that was first observed in Greek hospitals during 2009-2010 and strains

belonging to this clone were co-harbouring blaVIM-4, blaKPC-2 and blaCMY-4 β-lactamases (Papagiannitsis

et al. 2010). Figure 13 shows the phylogenetic relatedness of our ST383 strains together with publicly

available ST383 genomes. Only ten genomes were available, with most of them originating from

Greece. Strain KpvST383_NDM_OXA-48 from the UK had a complete genome and it was used as

reference for the phylogeny (Turton et al. 2019). Genomic relatedness showed strains from Europe

clustering together, the strain from the UK positioned apart from the rest of the tree, and the

Chinese strains from this study clustering together. Overall, an average of 158 core SNPs was

observed (min: 4, max: 627, median: 157), which decreases to 53 (min: 4, max: 182, median: 40) if we

only consider the strains from China. Two different K loci were observed, with the strain from

Belgium carrying KL15 and all other strains carrying KL30. Gubbins analysis revealed that the capsular

polysaccharide genes represented the major recombinant region. A second recombination concerned

a ~12 Kbp region consisting of mercury resistance genes and several transposases. No other major

recombination events were observed. Several carbapenemase-encoding genes were observed,

comprising the major clinically relevant KPC, OXA-48, NDM and VIM types, with two strains co-

harbouring two different carbapenemase genes. All strains from China carried the blaOXA-48 gene and

had an IncL/M plasmid replicon. ESBL-encoding genes were blaCTX-M-14, observed in all strains from

China, and strain K57 additionally had blaCTX-M-55.

Concerning virulence factors, yersiniabactin-encoding genes were not observed. Conversely, the

hypervirulent pLVPK-like plasmid was observed in some strains from China and in the strain from the

UK. Although it was not possible to fully reconstruct the hv plasmid sequences from our short-read

sequence data, we detected iuc1 on a contig that matches a 45 Kb region of pLVPK and also carries

rmpA and rmpA2.

Strains belonging to ST383 and carrying OXA-48 plasmids were previously reported, with reports

from the UK (Dimou et al. 2012) and from China (Guo et al. 2016). In the latter study, Guo et al.

reported an outbreak caused by ST383 strains carrying a 70 Kb IncL/M OXA-48 plasmid. ST383 strains

carrying hypervirulence genes were also reported from the UK, carrying the iuc and rmpA/A2 genes

together with carbapenemase-encoding genes of type blaOXA-48, sometimes in combination with

blaNDM (Turton et al. 2017, 2019).

4.3.8 Simultaneous carriage of acquired AMR and hypervirulence genes

We detected eleven examples of genomic convergence of hypervirulence, indicated by the presence

of the aerobactin locus (iuc), and MDR, indicated by the presence of either an ESBL- or a

carbapenemase-encoding gene, in our 200 randomly selected strains (5.5%), spanning eight different

STs. Similarly, in a recent study from South and Southeast Asia aiming at studying the population

structure of bloodstream infection isolates, the prevalence of convergent strains was 7.3%, with

seven different STs observed (Wyres et al. 2020). By considering our complete collection of genomes

after exclusion of duplicates, we ended up with 25 cases of genomic MDR-hv convergence (Table S2).

The occurrence of such convergent strains is on the rise, with 80% of them being detected in the

period 2012-2016. Among the convergent strains, the major ST reported was ST383, with 6 cases,

followed by ST11 and ST23 (3 cases each), ST29 (two cases) and eleven other STs with only one case.

Most cases of convergence (N=21) were characterized by the presence of a pLVPK-like plasmid. Such

a plasmid is common within hypervirulent clones such as CG23 and CG65, and we observed more

than 80% of its sequence within our CG23 and CG65 convergent strains. Conversely, variable portions

of the virulent plasmid were observed in normally non-hypervirulent clones (Table S2).
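
The working definition of convergence used in this section (aerobactin locus present, plus at least one ESBL- or carbapenemase-encoding gene) is a simple logical rule, and the prevalence is the share of strains that satisfy it. The following is a minimal Python sketch under that definition. The `Strain` record, its field names and the example strains are hypothetical; only the counts 11 and 200 are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Strain:
    name: str
    has_iuc: bool             # aerobactin locus detected
    has_esbl: bool            # any ESBL-encoding gene detected
    has_carbapenemase: bool   # any carbapenemase-encoding gene detected


def is_convergent(strain):
    """Apply the working definition: iuc plus an ESBL or carbapenemase gene."""
    return strain.has_iuc and (strain.has_esbl or strain.has_carbapenemase)


def prevalence_percent(n_convergent, n_total):
    """Percentage of convergent strains in a sample."""
    return 100.0 * n_convergent / n_total


# Hypothetical strains for illustration only.
strains = [
    Strain("hv_only", True, False, False),
    Strain("mdr_only", False, True, True),
    Strain("convergent_esbl", True, True, False),
    Strain("convergent_carba", True, False, True),
]
convergent = [s.name for s in strains if is_convergent(s)]

# Figures quoted in the text: 11 convergent strains among 200 random ones.
random_sample_prevalence = prevalence_percent(11, 200)
print(convergent, random_sample_prevalence)
```

Because the rule is genotypic, it flags strains that carry both sets of determinants whether or not the corresponding phenotypes were confirmed experimentally.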

Aerobactin loci detected were of type 1 (N=21), 3 (N=3) and 5 (N=1). Most of the iuc1 convergent

strains belonged to ST383, ST23 and ST11 and were previously described. In some cases we were

able to detect the genetic background of the hv and MDR genes. The K. quasipneumoniae subsp.

similipneumoniae strain K898 belonged to ST367 and had a hypervirulent capsule of the KL1 type. It

carried a blaCTX-M-15 gene on an IncFII plasmid together with blaTEM-1. This IncFII plasmid is ~95 Kbp and

is identical to pL22-5 (CP031262.1) obtained from an ST367 from Beijing. The pLVPK-like plasmid was

characterized by the presence of the replicon IncFIB(K) and by the virulence genes iuc1, iro1, rmpA

and a truncated rmpA2. Strain K7058 belonged to ST65 and carried a pLVPK-like plasmid plus a ~70

Kb IncFII plasmid harbouring blaCTX-M-15 and no other AMR genes.

Three strains carried iuc3 which was associated with IncFIBK and IncFII plasmids similar to NCTC11676

(NZ_UGMR01000002.1). Two of those strains also carried iro3 and an ICEKp1 element containing

ybt2 and rmpA. All three strains carried multiple ESBL-encoding genes, and strain K7156 additionally

harboured a blaIMP-4 carbapenemase-encoding gene.

The strain K7146 belonged to ST107 and carried iuc5 together with iro5, which have been previously

detected in E. coli plasmids such as p3PCN033 (CP006635.1). Read mapping revealed that our ST107

strain contained a plasmid with 90% coverage and 99.5% identity compared to p3PCN033, including

the plasmid replicons IncFIB, IncFIC and IncQ1 and several AMR genes (aph(3')-Ia, aph(6)-Id, aph(3'')-

Ib, sul2, oqxA/B, dfrA17, blaTEM-1B, tet(B)). K7146 also carried the ESBL-encoding gene blaCTX-M-3 on a

plasmid with replicons IncN and IncU, also containing additional AMR genes (aac(6')-Ib-cr, ARR-3,

qnrS1, catA1, mph(A), dfrA14).

4.4 Conclusions

This study aimed to investigate the longitudinal population of K. pneumoniae clinical isolates from

Hospital 301 (People's Liberation Army General Hospital) in Beijing, China. The major focus was

directed towards the investigation of ‘high-risk’ clones, those characterized by the simultaneous

carriage of AMR and hypervirulence genes and potentially able to cause serious infections with

limited treatment options. A major limitation was that the sample size was small, especially if we

consider that it was spread over a long time frame. While some sporadic clones may have been

missed from our collection, the major K. pneumoniae clones, as described in previous reports from

China (Zhang et al. 2016; Van Dorp et al. 2019; Yang et al. 2020; Zhou et al. 2020), were observed.

While we did not get a complete picture of the complex K. pneumoniae population, we were able to

detect the major AMR and virulence determinants and, where possible, their genetic environment. We

detected three major high-risk clones, characterized by ESBL and/or carbapenemase production or

hypervirulence, with some strains also expressing both features simultaneously. Strains belonging to

CG258, the globally dominant clinical K. pneumoniae clone, were the most represented and showed

high diversity. However, one clone, ST11-KL47, represented the majority of strains, and was highly

associated with KPC-2 and several virulence factors. CG23 still remains the dominant hvKp clone.

While it is usually susceptible to multiple antibiotics, we found some strains harbouring MDR

plasmids encoding ESBLs and carbapenemases. Moreover, we found a strain belonging to the

recently described ST1265 and we showed that it is a hybrid strain originating from an ST23 and an

ST35. The simultaneous carriage of the cps KL1, the hypervirulence plasmid and a KPC-2 plasmid

underscores the importance of tracking the spread of this novel clone. We also reported the

emergence of a recently described high-risk clone, ST383. In contrast to strains belonging to CG258,

which are usually associated with KPC-2, ST383 strains seem to readily acquire carbapenemases of

different types, sometimes harbouring two different types. Moreover, we found several ST383

strains carrying the hypervirulent plasmid. The combination of carbapenem resistance and

hypervirulence significantly reduces the antimicrobial options for treating the life-threatening

infections caused by such strains and therefore represents a major urgent challenge for clinical

treatment, infection control and public health (Chen & Kreiswirth 2017).

4.5 References

Argimón S et al. 2016. Microreact: visualizing and sharing data for genomic epidemiology and

phylogeography. Microb. genomics. 2:e000093. doi: 10.1099/mgen.0.000093.

Bialek-Davenet S et al. 2014. Genomic definition of hypervirulent and multidrug-resistant Klebsiella

pneumoniae clonal groups. Emerg. Infect. Dis. 20:1812–20. doi: 10.3201/eid2011.140206.

Carattoli A et al. 2014. In Silico detection and typing of plasmids using plasmidfinder and plasmid

multilocus sequence typing. Antimicrob. Agents Chemother. 58:3895–3903. doi: 10.1128/AAC.02412-

14.

CARSS. National Health and Family Planning Commission of the People’s Republic of China (2017).

Report on Current Status of Antimicrobial Agent Management and Antimicrobial Resistance in China.

Beijing: Beijing Union Medical University Press.

Chen L et al. 2014. Carbapenemase-producing Klebsiella pneumoniae: molecular and genetic

decoding. Trends Microbiol. 22:686–696. doi: 10.1016/j.tim.2014.09.003.

Chen L, Kreiswirth BN. 2017. Convergence of carbapenem-resistance and hypervirulence in Klebsiella

pneumoniae. Lancet Infect. Dis. 3099:9–10. doi: 10.1016/S1473-3099(17)30517-0.

Chen Y et al. 2020. Acquisition of Plasmid with Carbapenem-Resistance Gene blaKPC2 in Hypervirulent

Klebsiella pneumoniae, Singapore. Emerg. Infect. Dis. 26:549–559. doi: 10.3201/eid2603.191230.

Croucher NJ et al. 2015. Rapid phylogenetic analysis of large samples of recombinant bacterial whole

genome sequences using Gubbins. Nucleic Acids Res. 43:e15. doi: 10.1093/nar/gku1196.

DeLeo FR et al. 2014. Molecular dissection of the evolution of carbapenem-resistant multilocus

sequence type 258 Klebsiella pneumoniae. Proc. Natl. Acad. Sci. 111:4988–4993. doi:

10.1073/pnas.1321364111.

Dimou V, Dhanji H, Pike R, Livermore DM, Woodford N. 2012. Characterization of Enterobacteriaceae

producing OXA-48-like carbapenemases in the UK. J. Antimicrob. Chemother. 67:1660–1665. doi:

10.1093/jac/dks124.

Dong N, Zhang R, et al. 2018. Genome analysis of clinical multilocus sequence Type 11 Klebsiella

pneumoniae from China. Microb. Genomics. doi: 10.1099/mgen.0.000149.

Dong N, Lin D, Zhang R, Chan EW-C, Chen S. 2018. Carriage of blaKPC-2 by a virulence plasmid in

hypervirulent Klebsiella pneumoniae. J. Antimicrob. Chemother. 73:3317–3321. doi:

10.1093/jac/dky358.

Van Dorp L et al. 2019. Rapid phenotypic evolution in multidrug-resistant Klebsiella pneumoniae

hospital outbreak strains. Microb. Genomics. 5:1–11. doi: 10.1099/mgen.0.000263.

Du P, Zhang Y, Chen C. 2018. Emergence of carbapenem-resistant hypervirulent Klebsiella

pneumoniae. Lancet Infect. Dis. 18:23–24. doi: 10.1016/S1473-3099(17)30625-4.

EUCAST. 2019. The European Committee on Antimicrobial Susceptibility Testing. Breakpoint tables


Gołȩbiewski M et al. 2007. Complete nucleotide sequence of the pCTX-M3 plasmid and its

involvement in spread of the extended-spectrum β-lactamase gene blaCTX-M-3. Antimicrob. Agents

Chemother. 51:3789–3795. doi: 10.1128/AAC.00457-07.

Gu D et al. 2017. A fatal outbreak of ST11 carbapenem-resistant hypervirulent Klebsiella pneumoniae

in a Chinese hospital: A molecular epidemiological study. Lancet Infect. Dis. 18:37–46. doi:

10.1016/S1473-3099(17)30489-9.

Guo L et al. 2016. Nosocomial Outbreak of OXA-48-Producing Klebsiella pneumoniae in a Chinese

Hospital: Clonal Transmission of ST147 and ST383. Forestier C, editor. PLoS One. 11:e0160754. doi:

10.1371/journal.pone.0160754.

Hadfield J et al. 2018. Phandango: an interactive viewer for bacterial population genomics.

Bioinformatics. 34:292–293. doi: 10.1093/bioinformatics/btx610.

Holt KE et al. 2015. Genomic analysis of diversity, population structure, virulence, and antimicrobial

resistance in Klebsiella pneumoniae, an urgent threat to public health. Proc. Natl. Acad. Sci.

112:E3574–E3581. doi: 10.1073/pnas.1501049112.

Hu F et al. 2019. Resistance reported from China antimicrobial surveillance network (CHINET) in 2018.

Eur. J. Clin. Microbiol. Infect. Dis. 38:2275–2281. doi: 10.1007/s10096-019-03673-1.

Hu FP et al. 2016. Resistance trends among clinical isolates in China reported from CHINET

surveillance of bacterial resistance, 2005-2014. Clin. Microbiol. Infect. 22:S9–S14. doi:

10.1016/j.cmi.2016.01.001.

Kabha K et al. 1995. Relationships among capsular structure, phagocytosis, and mouse virulence in

Klebsiella pneumoniae. Infect. Immun. 63:847–52. http://www.ncbi.nlm.nih.gov/pubmed/7868255.

Lam MMC, Wick RR, et al. 2018. Genetic diversity, mobilisation and spread of the yersiniabactin-

encoding mobile element ICEKp in klebsiella pneumoniae populations. Microb. Genomics. 4. doi:

10.1099/mgen.0.000196.

Lam MMC, Wyres KL, et al. 2018. Tracking key virulence loci encoding aerobactin and salmochelin

siderophore synthesis in Klebsiella pneumoniae. Genome Med. 10:77. doi: 10.1186/s13073-018-

0587-5.

Li C et al. 2020. A rare carbapenem-resistant hypervirulent K1/ST1265 Klebsiella pneumoniae with an

untypeable blaKPC-harbored conjugative plasmid. J. Glob. Antimicrob. Resist. doi:

10.1016/j.jgar.2020.04.009.

Liu Y et al. 2017. Capsular Polysaccharide Types and Virulence-Related Traits of Epidemic KPC-

Producing Klebsiella pneumoniae Isolates in a Chinese University Hospital. Microb. Drug Resist.

23:901–907. doi: 10.1089/mdr.2016.0222.

Liu YM et al. 2014. Clinical and molecular characteristics of emerging hypervirulent Klebsiella

pneumoniae bloodstream infections in mainland China. Antimicrob. Agents Chemother. 58:5379–85.

doi: 10.1128/AAC.02523-14.

Magiorakos AP et al. 2012. Multidrug-resistant, extensively drug-resistant and pandrug-resistant

bacteria: An international expert proposal for interim standard definitions for acquired resistance.

Clin. Microbiol. Infect. 18:268–281. doi: 10.1111/j.1469-0691.2011.03570.x.

Paczosa MK, Mecsas J. 2016. Klebsiella pneumoniae: Going on the Offense with a Strong Defense.

Microbiol. Mol. Biol. Rev. 80:629–61. doi: 10.1128/MMBR.00078-15.

Papagiannitsis CC et al. 2010. Emergence of Klebsiella pneumoniae of a novel sequence type (ST383)

producing VIM-4, KPC-2 and CMY-4 β-lactamases. Int. J. Antimicrob. Agents. 36:573–574. doi:

10.1016/j.ijantimicag.2010.07.018.

Russo TA et al. 2014. Aerobactin mediates virulence and accounts for increased siderophore

production under iron-limiting conditions by hypervirulent (hypermucoviscous) Klebsiella

pneumoniae. Infect. Immun. 82:2356–2367. doi: 10.1128/IAI.01667-13.

Shankar C et al. 2016. Draft Genome Sequences of Three Hypervirulent Carbapenem-Resistant

Klebsiella pneumoniae Isolates from Bacteremia. Genome Announc. 4. doi: 10.1128/genomeA.01081-

16.

Shen D et al. 2019. Emergence of a multidrug-resistant hypervirulent klebsiella pneumoniae

sequence type 23 strain with a rare blaCTX-M-24-harboring virulence plasmid. Antimicrob. Agents

Chemother. 63. doi: 10.1128/AAC.02273-18.

Shon AS, Bajwa RPS, Russo TA. 2013. Hypervirulent (hypermucoviscous) Klebsiella pneumoniae.

Virulence. 4:107–118. doi: 10.4161/viru.22718.

Siu LK, Yeh KM, Lin JC, Fung CP, Chang FY. 2012. Klebsiella pneumoniae liver abscess: A new invasive

syndrome. Lancet Infect. Dis. 12:881–887. doi: 10.1016/S1473-3099(12)70205-0.

Stamatakis A. 2014. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large

phylogenies. Bioinformatics. 30:1312–1313. doi: 10.1093/bioinformatics/btu033.

Struve C et al. 2015. Mapping the Evolution of Hypervirulent Klebsiella pneumoniae. MBio. 6:e00630.

doi: 10.1128/mBio.00630-15.

Treangen TJ, Ondov BD, Koren S, Phillippy AM. 2014. The Harvest suite for rapid core-genome

alignment and visualization of thousands of intraspecific microbial genomes. Genome Biol. 15:524.

doi: 10.1186/s13059-014-0524-x.

Turton Jane et al. 2019. Hybrid resistance and virulence plasmids in “high-risk” clones of klebsiella

pneumoniae, including those carrying blaNDM-5. Microorganisms. 7. doi:

10.3390/microorganisms7090326.

Turton JF et al. 2017. Virulence genes in isolates of Klebsiella pneumoniae from the UK during 2016,

including among carbapenemase gene-positive hypervirulent K1-ST23 and ‘non-hypervirulent’ types

ST147, ST15 and ST383. J. Med. Microbiol. doi: 10.1099/jmm.0.000653.

Wang Y et al. 2017. IncN ST7 epidemic plasmid carrying blaIMP-4 in Enterobacteriaceae isolates with

epidemiological links to multiple geographical areas in China. J. Antimicrob. Chemother. 72:99–103.

doi: 10.1093/jac/dkw353.

Wick RR, Schultz MB, Zobel J, Holt KE. 2015. Bandage: interactive visualization of de novo genome

assemblies. Bioinformatics. 31:3350–2. doi: 10.1093/bioinformatics/btv383.

Wong MHY et al. 2018. Emergence of carbapenem-resistant hypervirulent Klebsiella pneumoniae.

Lancet Infect. Dis. 18:24. doi: 10.1016/S1473-3099(17)30629-1.

Wong MHY et al. 2017. Emergence of carbapenem-resistant hypervirulent Klebsiella pneumoniae.

Lancet Infect. Dis. 3099:5–6. doi: 10.1016/S1473-3099(17)30629-1.

Wyres KL et al. 2015. Extensive capsule locus variation and large-scale genomic recombination within

the Klebsiella pneumoniae clonal group 258. Genome Biol. Evol. 7:1267–1279. doi:

10.1093/gbe/evv062.

Wyres KL et al. 2020. Genomic surveillance for hypervirulence and multi-drug resistance in invasive

Klebsiella pneumoniae from South and Southeast Asia. Genome Med. 12:11. doi: 10.1186/s13073-

019-0706-y.

Wyres KL et al. 2016. Identification of Klebsiella capsule synthesis loci from whole genome data.

Microb. Genomics. 2:e000102. doi: 10.1099/mgen.0.000102.

Xu M et al. 2019. High prevalence of KPC-2-producing hypervirulent Klebsiella pneumoniae causing

meningitis in Eastern China. Infect. Drug Resist. 12:641–653. doi: 10.2147/IDR.S191892.

Yang Q et al. 2020. Emergence of ST11-K47 and ST11-K64 hypervirulent carbapenem-resistant

Klebsiella pneumoniae in bacterial liver abscesses from China: a molecular, biological, and

epidemiological study. Emerg. Microbes Infect. 9:320–331. doi: 10.1080/22221751.2020.1721334.

Yao H, Qin S, Chen S, Shen J, Du XD. 2018. Emergence of carbapenem-resistant hypervirulent

Klebsiella pneumoniae. Lancet Infect. Dis. 18:25. doi: 10.1016/S1473-3099(17)30628-X.

Zankari E et al. 2012. Identification of acquired antimicrobial resistance genes. J. Antimicrob.

Chemother. 67:2640–2644. doi: 10.1093/jac/dks261.

Zhang R et al. 2017. Nationwide Surveillance of Clinical Carbapenem-resistant Enterobacteriaceae

(CRE) Strains in China. EBioMedicine. 19:98–106. doi: 10.1016/j.ebiom.2017.04.032.

Zhang Y et al. 2016. High Prevalence of Hypervirulent Klebsiella pneumoniae Infection in China:

Geographic Distribution, Clinical Characteristics, and Antimicrobial Resistance. Antimicrob. Agents

Chemother. 60:6115–6120. doi: 10.1128/AAC.01127-16.

Zhang Y et al. 2018. Epidemiology of carbapenem-resistant Enterobacteriaceae infections: Report

from the China CRE Network. Antimicrob. Agents Chemother. 62. doi: 10.1128/AAC.01882-17.

Zhang Y et al. 2019. Evolution of hypervirulence in carbapenem-resistant Klebsiella pneumoniae in

China: a multicentre, molecular epidemiological analysis. J. Antimicrob. Chemother. doi:

10.1093/jac/dkz446.

Zhao F, Feng Y, Lü X, McNally A, Zong Z. 2017. Remarkable diversity of Escherichia coli carrying mcr-1

from hospital sewage with the identification of two new mcr-1 variants. Front. Microbiol. 8:2094. doi:

10.3389/fmicb.2017.02094.

Zhou K et al. 2020. Novel subclone of carbapenem-resistant klebsiella pneumoniae sequence type 11

with enhanced virulence and transmissibility, China. Emerg. Infect. Dis. 26:289–297. doi:

10.3201/eid2602.190594.

CHAPTER 5 : Interpreting k-mer based signatures for antibiotic

resistance prediction

Magali Jaillard1, Mattia Palmieri1, Alex van Belkum1 and Pierre Mahé1

1bioMérieux, Marcy l’Etoile, France

Submitted to GigaScience

5.1 Abstract

Background. Recent years witnessed the development of several k-mer-based approaches aiming to

predict phenotypic traits of bacteria based on their whole-genome sequences. While often

convincing in terms of predictive performance, the underlying models are in general not

straightforward to interpret, the interplay between the actual genetic determinant and its translation

as k-mers being generally hard to decipher.

Results. We propose a simple and computationally efficient strategy allowing one to cope with the

high correlation inherent to k-mer-based representations in supervised machine learning models,

leading to concise and easily interpretable signatures. We demonstrate the benefit of this approach

on the task of predicting the antibiotic resistance profile of a Klebsiella pneumoniae strain from its

genome, where our method leads to signatures defined as weighted linear combinations of genetic

elements that can easily be identified as genuine antibiotic resistance determinants, with state of the

art predictive performance.

Conclusions. By enhancing the interpretability of genomic k-mer-based antibiotic resistance

prediction models, our approach improves their clinical utility, hence will facilitate their adoption in

routine diagnostics by clinicians and microbiologists. While antibiotic resistance was the motivating

application, the method is generic and can be transposed to any other bacterial trait.

5.2 Introduction

Antimicrobial resistance (AMR) is a global healthcare problem and rapid diagnostics are needed to

select the right treatment, to follow the route to cure and to monitor and prevent community- and

hospital-acquired outbreaks of infections. Next-Generation Sequencing (NGS) is a disruptive

technology which is, potentially, able to supplement or even replace the current plethora of diagnostic

tests with a single, most probably affordable and faster solution. Inferring the antibiotic

resistance profile from a bacterial genome is challenging. However, good results have been obtained

for several species [1-7], including Klebsiella pneumoniae [8]. Su et al. [9] discussed the challenges of

NGS-based antibiotic susceptibility testing (AST) and provided a comprehensive review of the current

state of the art in this field.

Early approaches relied on the detection of known resistance markers to claim resistance, a strategy

sometimes referred to as direct association analysis [10]. While effective when the genetic bases of

antibiotic resistance are well known, which is the case for instance for most antibiotic resistance

mechanisms in the highly clonal species M. tuberculosis [11, 12] and Salmonella typhi [13], this

approach suffers from several limitations. First and foremost, it intrinsically relies on prior knowledge

of the precise nature of the resistance determinants, which may not be available for all species and

drugs. Secondly, it is not able to account for the fact that these markers can have different levels of

predictive power [14, 15], that they can act in a multi-factorial fashion through epistasis [16, 17], or

that resistance can result from the accumulation of several different mutations [18, 19]. Last but not

least, it is hazardous to predict susceptibility when no marker is detected, since the resistance marker

may be novel and databases incomplete. This issue is more and more addressed from the supervised

machine learning (ML) standpoint: given a set of genomes with associated reference phenotypes

(provided by phenotypic AST methods [20]), one seeks a prediction rule that allows inferring the

resistance or susceptibility of a novel strain from genomic features. Even for M. tuberculosis, where

the antibiotic resistance knowledge is probably among the most thorough and complete, recent

studies showed that performance of direct association strategies can still be significantly improved

by ML models [10, 17].

A great variety of ML strategies have been explored, taking into account several parameters. First,

regarding the nature of the genomic features considered: supervised ML models can indeed operate

from known markers like the ones involved in direct association strategies, offering the possibility to

discover more complex and multivariate marker combinations better predicting resistance

phenotypes [3, 10, 17], or directly using the raw sequences represented as k-mers [4, 8, 21-23]. The

latter approach offers several advantages: it does not require prior knowledge about the underlying

resistance mechanisms, allows capturing various types of genomic determinants (including the

acquisition of genes or point mutations), and does not require aligning the genomes to a common

reference which may be hard to define for some species, especially the less clonal ones [24, 25].

Second, regarding the type of ML algorithm: boosting algorithms [4, 8, 21], penalized regression

models [10, 17, 23], decision trees [26], random forest [10, 27], neural networks [17] or set cover

machines [22, 26] have already been successfully deployed in this context. While each algorithm has

its own merits and shortcomings, several studies reported comparable global performance for

various algorithms, with specific variations by drug and microbial species [10, 17, 28]. Finally,

different kinds of antibiotic susceptibility information can be considered: either discrete when the

objective is to distinguish susceptible strains from resistant (or non-susceptible) ones [10, 17, 21, 22], or

continuous, where one seeks to predict the minimum inhibitory concentration (MIC) of the

antimicrobial agent itself [3, 4, 8].

A critical challenge for the adoption of such predictive ML models by clinicians and microbiologists

resides in their level of interpretability and, ultimately, clinical action-driving ability. While the notion

of interpretability is somewhat ill-defined, a natural requirement for the end-user would be to

achieve the prediction from a limited number of genomic features, that can be easily and

unambiguously interpreted as actual genetic determinants [25, 26]. This challenge is particularly

important using k-mer-based representations, for several reasons. Firstly, k-mers covering conserved

genomic regions are redundant and can be easily detected and filtered [29], but they define groups

of equivalent k-mers which are not always straightforward to interpret as genomic determinants [21-23, 26].

Secondly, k-mers may not be specific to a given genomic region, hence may be hard to

annotate. This is especially the case for short k-mers, e.g., when k = 8 or k = 10 [4, 8]. Last but not

least, the k-mer-based representation of genomes intrinsically leads to very high-dimensional feature

spaces, with strongly correlated variables. Using k = 31 for instance, and depending on the bacterial

species considered, it is common to end up working with 10^5 to 10^6 (non-redundant) k-mers, many of

which are observed in almost the same sets of genomes, hence bringing almost the same

information regarding the studied phenotype.

We propose to rely on the adaptive cluster lasso (ACL) [30], an extension of Bühlmann et al. [31]

tailored to the high-dimensional setting by means of a prior screening of variables. We implemented, in

an R package, a simple and efficient ACL-inspired strategy able to cope with the very high dimension

and strong correlations of k-mer-based representations, leading to sparse and interpretable genomic

signatures. This approach compared favorably to the standard lasso on a systematic validation study

focusing on K. pneumoniae. It provided a comparable level of performance while offering better

interpretability of the genomic determinants involved in the models. We could identify known and

potentially novel resistance determinants from the corresponding k-mer signatures, which allowed us to

extract meaningful scientific insights.

5.3 Methods

5.3.1 Datasets

Training dataset. We gathered the assembled genomes, provided as contigs, of 1665 strains to

develop MIC prediction models for K. pneumoniae [8]. This set of genomes defines our training

dataset. We focused on the 10 clinically most relevant antibiotics listed in Table 1 which belong to

seven different antibiotic classes. The reference MICs were cast into resistant, susceptible and

intermediate according to the Clinical and Laboratory Standards Institute (CLSI) breakpoints. The

intermediate and resistant strains were finally merged into a common category, to define a binary

classification problem aiming to distinguish susceptible (S) from non-susceptible (NS) strains. Table 1

provides the number of S/NS phenotypes available for each selected drug.
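
The casting of MICs into a binary phenotype can be illustrated with a minimal Python sketch. The breakpoint values below are hypothetical placeholders chosen for the example and are not actual CLSI breakpoints; the function names are likewise illustrative.

```python
def categorize_mic(mic, susceptible_max, resistant_min):
    """Cast an MIC (mg/L) into 'S', 'I' or 'R' given two breakpoints."""
    if mic <= susceptible_max:
        return "S"
    if mic >= resistant_min:
        return "R"
    return "I"


def to_binary_phenotype(category):
    """Merge intermediate and resistant strains: 0 = susceptible, 1 = non-susceptible."""
    return 0 if category == "S" else 1


# Hypothetical breakpoints for an imaginary drug: S <= 2 mg/L, R >= 8 mg/L.
mics = [0.5, 2, 4, 8, 32]
categories = [categorize_mic(m, susceptible_max=2, resistant_min=8) for m in mics]
labels = [to_binary_phenotype(c) for c in categories]
```

In this toy example the strain with an MIC of 4 mg/L is intermediate and therefore ends up in the non-susceptible class.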

Table 1. Dataset constitution. This table provides the number of susceptible (S) and non-susceptible (NS) strains available in the training and test datasets for the various antibiotics considered. piper.tazo stands for piperacillin/tazobactam. Note that a limited number of susceptible strains is available in the test dataset for aztreonam, and to a lesser extent cefepime and meropenem.

k-merization of the training dataset. The k-merization was computed from the contigs of all training

genomes, using the DBGWAS software [25], with a k-mer size of 31 and filtering patterns with a

minor allele frequency (MAF) below 1%. DBGWAS allows for the deduplication of the strictly

equivalent k-mers by compacting overlapping non-branching paths of k-mers into unitigs, thanks to

the use of a compacted De Bruijn Graph (cDBG) (Figure 1 A). DBGWAS stores the profiles of

presence/absence of each unitig in the training genomes in a matrix V such that Vi,j = 1 if the j-th unitig

is present in the i-th input genome and Vi,j = 0 otherwise (Figure 1, B1). Each column vector V.,j is then

transformed according to its allele frequency: if its allele frequency exceeds 0.5, meaning that it is

observed in more than 50% of the panel genomes, it is inverted as V.,j = |1–V.,j| so that its MAF

corresponds to its average value. This transformation renders identical two originally complementary

vectors. Keeping only the unique patterns then leads to an optimal reduction of the number of

features, without modifying the intrinsic statistical signal (Figure 1 B2). These unique, MAF-filtered,

patterns define the final variant matrix X, where Xi,j = 1 if the j-th pattern is found in the i-th genome,

and 0 otherwise. This process is described in detail in Jaillard et al. [25]. The DBGWAS files

describing the cDBG are kept for the further interpretation of the genomic signatures, making it possible to

visualize the unitigs of the selected patterns within their genomic environment.
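
The folding of presence/absence vectors to their minor allele and the subsequent deduplication of patterns can be sketched in a few lines of Python. This is only an illustration of the principle on toy data, not the DBGWAS implementation; the unitig names and function names are hypothetical, and the order of the folding and filtering operations is an assumption of the sketch.

```python
def fold_to_minor_allele(column):
    """Invert a 0/1 presence vector if it is present in more than half of the genomes."""
    frequency = sum(column) / len(column)
    if frequency > 0.5:
        return tuple(1 - v for v in column)
    return tuple(column)


def build_patterns(unitig_columns, min_maf=0.01):
    """Map each unique folded pattern to the list of unitigs sharing it."""
    patterns = {}
    for unitig_id, column in unitig_columns.items():
        pattern = fold_to_minor_allele(column)
        maf = sum(pattern) / len(pattern)
        if maf < min_maf:
            continue  # drop patterns below the MAF threshold
        patterns.setdefault(pattern, []).append(unitig_id)
    return patterns


# Toy presence/absence profiles over 4 genomes (hypothetical unitig names).
V = {
    "u1": [1, 0, 0, 0],
    "u2": [0, 1, 1, 1],  # complement of u1: same pattern after folding
    "u3": [1, 1, 0, 0],
    "u4": [1, 1, 1, 1],  # present everywhere: MAF of 0 after folding, filtered out
}
patterns = build_patterns(V)
```

Here the four unitigs collapse into two patterns: u1 and u2 share one pattern, u3 defines another, and u4 is discarded by the MAF filter.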

In practice we carry out this k-merization process for each antibiotic separately, processing solely the

strains that have been phenotypically tested. The output of this k-merization step is a sparse variant

matrix X with, for instance in the case of the cefoxitin antibiotic, N = 1643 rows for the N cefoxitin-

phenotyped strains of the training panel and p = 1,234,397 columns representing the p distinct

patterns of presence/absence retained by DBGWAS. The matrix X is binary as DBGWAS only encodes

the presence or absence in the genomes. It is sparse as only around 13% of the values are not null.

Figure 1. K-merization of the training genomes. Illustration of the DBGWAS process of k-merization and variant matrix construction. Refer to Jaillard et al. [25] for further details.

Test dataset. To validate the predictive performance of the models, we built an independent test

dataset involving 634 strains, including 114 strains from our bioMérieux collection (NCBI Bioproject

PRJNA449293 and PRJNA597427) and 520 strains from the PATRIC database (https://www.patricbrc.org/).

Such strains were mostly from the USA, the UK, Serbia, Greece and other European countries

and the MICs were obtained with either agar dilution, broth microdilution or Vitek 2 (bioMérieux,

Marcy l’Étoile, France) (see Supplementary Section S1). Table 1 provides the number of S/NS

phenotypes available in the test dataset.

5.3.2 Coping with highly correlated genomic features

Logistic regression is a widely used generalized linear model addressing binary classification problems.

In our case, it consists of building a linear function defined for a strain represented by a vector $x \in \{0, 1\}^p$ as:

$$f(x) = \beta_0 + \sum_{j=1}^{p} \beta_j x_j$$

where p corresponds to the number of distinct patterns identified by DBGWAS, and x encodes their

presence/absence in the strain genome. To estimate the model coefficients and simultaneously

select a limited number of patterns from a training panel of n strains, one can rely on the L1 or lasso

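penalty and consider the following optimization problem:

$$\min_{\beta_0, \beta} \; \frac{1}{n} \sum_{i=1}^{n} L\big(y_i, f(X_{i,\cdot})\big) + \lambda \lVert \beta \rVert_1 \qquad (1)$$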

where yi = 0 if the ith strain, stored in the ith row of the training matrix X, is susceptible and 1

otherwise. The function L is the logistic loss function, which quantifies the discrepancy between the

true phenotypes yi of the strains and the predictions f(Xi,.) obtained by the model. The λ parameter

achieves a trade-off between this empirical error and the lasso regularization term, and is usually

optimized by cross-validation.
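
To make the objective concrete, the following Python sketch evaluates a lasso-penalized logistic objective for given coefficients. It is an illustration under stated assumptions (the data term is averaged over strains, and the intercept is left unpenalized); it does not fit a model and is not the glmnet implementation. All names and toy values are hypothetical.

```python
import math


def linear_score(x, intercept, coefs):
    """f(x) = intercept + sum_j coefs[j] * x[j] for a binary feature vector x."""
    return intercept + sum(b * v for b, v in zip(coefs, x))


def logistic_loss(y, score):
    """Negative log-likelihood of label y in {0, 1} under a logistic model."""
    # log(1 + exp(score)) - y * score, written in a numerically stable form
    return max(score, 0.0) + math.log1p(math.exp(-abs(score))) - y * score


def lasso_objective(X, y, intercept, coefs, lam):
    """Average logistic loss plus lam times the L1 norm of the coefficients."""
    n = len(X)
    data_term = sum(
        logistic_loss(yi, linear_score(xi, intercept, coefs)) for xi, yi in zip(X, y)
    ) / n
    penalty = lam * sum(abs(b) for b in coefs)
    return data_term + penalty


# Toy panel: 2 strains, 2 patterns; the first strain is non-susceptible.
X = [[1, 0], [0, 1]]
y = [1, 0]
objective_null = lasso_objective(X, y, 0.0, [0.0, 0.0], lam=0.1)
objective_fit = lasso_objective(X, y, 0.0, [2.0, -2.0], lam=0.1)
```

With all coefficients at zero the objective equals log(2); the non-zero coefficients lower the data term enough to offset the penalty in this toy case. Larger values of λ make such non-zero coefficients less attractive, which is what drives sparsity.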

The feature selection ability of the lasso penalty is notoriously unstable in the presence of strong

correlation between features. This is particularly the case using k-mer based representations, making

it difficult to derive meaningful interpretations from the features selected by the model, and their

associated coefficients. We propose a simple and efficient three-step strategy to identify sparse and

interpretable genomic signatures.

Screening step. In this step, we screen features. For this purpose, we first fit a standard lasso-

penalized regression model on the original feature matrix X for several values of the regularization

parameter λ, and extract the set of features that are selected at some point on this regularization

path. Formally, let (λ1, ..., λm) be the m values of the considered grid of λ, and let B be the p x m matrix

containing the model coefficients obtained by solving Equation 1 for each of these values. We define the set a of active features as:

$$a = \{\, j \in \{1, \dots, p\} : \exists\, k \in \{1, \dots, m\},\ B_{j,k} \neq 0 \,\}$$

and let pa = |a| be their number. Since the lasso cannot select more features than observations, we

typically end up with pa in the order of N (i.e., $10^3$ in our case). We then extract the features which

are strongly correlated to the active ones from the entire feature matrix. For this purpose, we

compute a pa x p matrix G containing the pairwise correlations between the pa active features

identified beforehand and the p original ones. Formally, Gi,j = cor(X.,ai , X.,j), where cor is the standard

Pearson correlation between vectors of MAF patterns across the genomes, and is a classical criterion

to quantify linkage disequilibrium (LD) between genomic features [32]. Since we rely on binary

variables encoding the presence/absence of features in the genomes, Gi,j quantifies the extent to

which the i-th active feature and the j-th original feature co-occur in the genomes. As pa is typically much smaller than p (in the order

of $10^3$ versus $10^6$ in our case), computing this matrix is much easier than computing the entire p x p

correlation matrix. Finally, we extract the set e of features that are strongly correlated to at least one

active feature as:

$$e = \{\, j \in \{1, \dots, p\} : \exists\, i \in \{1, \dots, p_a\},\ G_{i,j} \geq s_1 \,\}$$

where the hyperparameter s1 controls the minimum level of correlation required, and is referred to

as the screening threshold. This operation defines a set of pe = |e| features, called the set of

extended features. Obviously, we have pa ≤ pe ≤ p. In our context, we typically end up with a few

thousand extended features, hence pa < pe << p.
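
The extension of the active set can be illustrated with a short Python sketch on toy data. It assumes that a feature is retained when its Pearson correlation with at least one active feature is greater than or equal to the screening threshold; the function names and the toy matrix are hypothetical, and the set of active features is simply given rather than obtained from a lasso fit.

```python
import math


def pearson(u, v):
    """Pearson correlation between two equal-length numeric vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0  # constant column: correlation undefined, treated as 0 here
    return cov / math.sqrt(var_u * var_v)


def extend_active_set(columns, active, s1):
    """Indices of features correlated at level >= s1 with at least one active feature."""
    extended = []
    for j, col in enumerate(columns):
        if any(pearson(columns[a], col) >= s1 for a in active):
            extended.append(j)
    return extended


# Toy data: 4 features (stored as columns) observed over 6 genomes.
columns = [
    [1, 1, 1, 0, 0, 0],  # feature 0, assumed selected by the lasso
    [1, 1, 1, 0, 0, 0],  # feature 1, identical to feature 0
    [1, 1, 0, 0, 0, 0],  # feature 2, correlation of about 0.71 with feature 0
    [0, 1, 0, 1, 0, 1],  # feature 3, negatively correlated with feature 0
]
extended = extend_active_set(columns, active=[0], s1=0.95)
```

With a threshold of 0.95 only the duplicate of the active feature is added; lowering the threshold to 0.7 would also bring in feature 2.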

Clustering step. While the screening step identifies a limited number of features deemed sufficiently

correlated to the features identified by a standard lasso, the second step aims to explicitly define

groups, or clusters, of strongly correlated variables. We rely for this purpose on a bottom-up

agglomerative clustering procedure, as suggested by Bühlmann et al. [31]. More precisely, we first

define a pe x pe distance matrix D between extended features, defined as Di,j = |1 – cor(X.,ei , X.,ej )|.

This matrix is then used to carry out a hierarchical clustering, implemented in R by the hclust function,

using a minimum (i.e., single) linkage criterion. The resulting dendrogram is finally cut at a height of 1–s2, where the

second hyperparameter s2, called the clustering threshold, controls the level of within-cluster

correlation.
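
For single linkage, cutting the dendrogram at a given height is equivalent to taking the connected components of the graph that links every pair of features whose distance does not exceed that height. The Python sketch below uses this equivalence with a union-find structure; it is an illustration on toy data, assuming the single-linkage reading of the criterion, and not a substitute for the hclust function. Names and data are hypothetical.

```python
import math


def pearson(u, v):
    """Pearson correlation between two equal-length numeric vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0  # constant column: correlation undefined, treated as 0 here
    return cov / math.sqrt(var_u * var_v)


def single_linkage_clusters(columns, s2):
    """Clusters obtained by linking features whose distance |1 - cor| is <= 1 - s2."""
    n = len(columns)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            distance = abs(1.0 - pearson(columns[i], columns[j]))
            if distance <= 1.0 - s2:
                parent[find(j)] = find(i)  # merge the two components

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy extended features (columns) observed over 6 genomes.
columns = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],  # identical to feature 0
    [1, 1, 0, 0, 0, 0],  # correlation of about 0.71 with feature 0
    [0, 0, 0, 1, 1, 0],  # negatively correlated with the others
]
strict = single_linkage_clusters(columns, s2=0.95)
loose = single_linkage_clusters(columns, s2=0.7)
```

The strict threshold only groups the two identical features, whereas the looser one also absorbs feature 2, illustrating how s2 controls the within-cluster correlation.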

Learning step. Finally, we summarize each identified cluster as a new composite variable, defined as

the average of the original variables defining the cluster, and carry out a standard lasso at the cluster

level. Since in our case the original variables encode the presence/absence of a given DBGWAS

pattern in the genomes, these composite variables correspond to the proportion of patterns involved

in a cluster that are present/absent in the genomes. Figure 2 summarizes this three-step method.
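
The construction of the cluster-level variables amounts to a row-wise average over the columns of each cluster, as in the following minimal Python sketch. The toy matrix and the cluster assignment are hypothetical.

```python
def cluster_features(X, clusters):
    """Replace each cluster of columns by the row-wise average of its members."""
    return [
        [sum(row[j] for j in members) / len(members) for members in clusters]
        for row in X
    ]


# Toy variant matrix: 3 genomes (rows) x 4 patterns (columns).
X = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]
clusters = [[0, 1], [2], [3]]  # hypothetical output of the clustering step
Z = cluster_features(X, clusters)
```

The second genome carries one of the two patterns of the first cluster, so its composite value for that cluster is 0.5. The matrix Z then replaces X in the final lasso fit.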

Figure 2. Three-step process. Illustration of the proposed three-step procedure.

5.3.3 Model selection

Our approach involves three hyperparameters that must be optimized for each antibiotic: the

screening and clustering thresholds s1 and s2 used to build the clusters of correlated variables, and

the regularization parameter λ involved in the final cluster-level lasso model. We relied on the

glmnet software [33] to fit the lasso models involved in both the screening and learning steps. We

used the default heuristic proposed by the software to define the grids of candidate values for the

regularization parameters. The screening and clustering thresholds were both systematically set to

0.95 based on preliminary experiments (see Supplementary Section S2), and we relied on a 10-fold

cross-validation procedure to optimize the regularization parameter involved in the final cluster-level

lasso model, as we now describe.

We first split the training dataset into ten folds, stratified by sequence type and phenotype. For each

of the ten folds, nine tenths of the dataset were used to screen variables and identify clusters. The final

cluster-level lasso model was then fit and applied to the held-out strains, for each candidate value of

the regularization parameter. Our model selection strategy aimed to simultaneously maximize its

sensitivity and specificity, respectively defined as the fractions of correctly classified non-susceptible

and susceptible strains. For this purpose, a Receiver Operating Characteristic (ROC) curve was built

for each candidate regularization parameter after completion of the cross-validation procedure, and

the point closest to the optimal one (defined by a true positive rate of 1 and a false positive rate of 0)

was used to define the optimal sensitivity/specificity trade-off. Following Hicks et al. [28], we refer to

the average of the optimal sensitivity and specificity as balanced accuracy (bACC). Finally, we

selected the sparsest model whose balanced accuracy was within one point of the maximum, in

order to reduce the risk of overfitting. In practice, this cross-validation procedure was repeated three

times and the selection was based on average balanced accuracy values obtained across the three

repetitions. Supplementary Figure S4 illustrates this model selection strategy.
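
The two ingredients of this selection rule can be sketched in Python: finding the ROC point closest to the ideal corner for one candidate model, and then choosing the sparsest candidate within one point of the best balanced accuracy. This is an illustration only; the scores, the candidate values of the regularization parameter, and the balanced accuracies and support sizes below are made-up numbers, not results from this study.

```python
import math


def best_roc_point(y_true, scores):
    """Sensitivity and specificity of the threshold closest to (FPR = 0, TPR = 1)."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    best = None
    for threshold in sorted(set(scores)):
        predicted = [1 if s >= threshold else 0 for s in scores]
        tp = sum(1 for p, y in zip(predicted, y_true) if p == 1 and y == 1)
        tn = sum(1 for p, y in zip(predicted, y_true) if p == 0 and y == 0)
        sensitivity, specificity = tp / positives, tn / negatives
        distance = math.hypot(1 - sensitivity, 1 - specificity)
        if best is None or distance < best[0]:
            best = (distance, sensitivity, specificity)
    return best[1], best[2]


def balanced_accuracy(sensitivity, specificity):
    """Average of sensitivity and specificity."""
    return (sensitivity + specificity) / 2


def select_sparsest(candidates, tolerance=1.0):
    """Smallest-support candidate whose bACC (in %) is within `tolerance` points of the best."""
    best_bacc = max(c["bacc"] for c in candidates)
    eligible = [c for c in candidates if c["bacc"] >= best_bacc - tolerance]
    return min(eligible, key=lambda c: c["support"])


# Made-up held-out predictions for one candidate model (1 = non-susceptible).
y_true = [0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.6, 0.4, 0.7, 0.8, 0.9]
sensitivity, specificity = best_roc_point(y_true, scores)
bacc = balanced_accuracy(sensitivity, specificity)

# Made-up cross-validation summaries for three values of the regularization parameter.
candidates = [
    {"lam": 0.001, "bacc": 91.2, "support": 40},
    {"lam": 0.010, "bacc": 90.5, "support": 12},
    {"lam": 0.100, "bacc": 88.0, "support": 5},
]
chosen = select_sparsest(candidates)
```

In the made-up example, the model with 12 features is preferred over the one with 40 features because it loses less than one point of balanced accuracy.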

5.3.4 Interpretation of the predictive signature

We use the DBGWAS software to interpret the genomic signatures, based on the cDBG built during

the k-merization step. The unitigs defining the patterns involved in the final model are visualized

within their neighborhood in the cDBG, which represents their genomic environment and hence provides

insight into the type of variant involved, typically a plasmid-based acquired gene versus a local

mutation (single nucleotide polymorphism (SNP) or indel) in a chromosomal region.

5.3.5 Evaluation of the computational requirements

We evaluate the computational requirements of the standard lasso and cluster-lasso procedures by

measuring the time and memory required to compute a regularization path involving 100 values of

the regularization parameter. For the standard lasso, this simply amounts to calling the glmnet

function of the glmnet R package, using the variant matrix provided by DBGWAS. For the cluster-

lasso procedure, this amounts to:

i. making the same call to glmnet to identify the set of active variables,

ii. computing the pa x p correlation matrix G in order to identify the set of extended features,

iii. building the clusters of correlated variables,

iv. making a second call to glmnet, using the variant matrix defined at the cluster-level.

This procedure is repeated five times for each drug, using a single Xeon E5-2690-V3 CPU.

5.4 Results

5.4.1 Cross-validation results

Table 2 provides the results obtained in terms of cross-validation performance and support size of

the models. The predictive performance is measured by the area under the ROC curve (AUC) and

balanced accuracy. Additional performance indicators are provided in Supplementary Table S1. The

support size of a model is defined as the number of features it involves, which correspond to individual

DBGWAS patterns for the lasso and to clusters of DBGWAS patterns for the cluster-lasso.

We also report the overall number of unitigs involved, which is only slightly higher than the number

of features for the lasso and corresponds to unitigs in total LD. In contrast, this overall number is

markedly higher for the cluster-lasso strategy, because of the pattern clustering.

Table 2. Cross-validation results. This table summarizes the cross-validation results obtained by the lasso and cluster-lasso strategies for the 10 antibiotics, in terms of balanced accuracy (bACC), AUC, support size, overall number of unitigs involved and maximal number of unitigs associated to a single pattern or cluster (between brackets).

Both strategies show similar performance in terms of both balanced accuracy and AUC, confirming

that taking into account, or not, the correlation between features has a limited impact in terms of

predictive performance. We also note that the model support is often slightly smaller with cluster-

lasso (for 8 drugs out of 10), suggesting that several features selected separately with the lasso

ended up merged in a single cluster by the cluster-lasso. As expected, the overall number of unitigs

involved in a cluster-lasso model is significantly larger. Interestingly, it is not evenly distributed across

its features. In the meropenem model, for instance, 159 out of the 164 unitigs defining the model

features are associated to a single feature, suggesting that it corresponds to the presence of a gene,

as confirmed in the interpretation analysis depicted in the next section.

Finally, Figure 3 provides a graphical representation of the lasso and cluster-lasso signatures obtained

for ceftazidime, which are of moderate complexity. The heatmap shows the correlation between the

patterns involved in one signature and/or the other, and highlights the 8 major clusters identified by

the cluster-lasso strategy (clusters including more than 10 patterns). While all the patterns defining a

cluster have by construction a similar level of predictive power, the lasso model usually selected a

single one of them. There is an exception for the 3rd cluster, shown in green in the zoomed area of

Figure 3, where two patterns were selected as distinct features of the lasso model.

By explicitly reconstructing and providing these clusters of correlated features to the learning

algorithm, the cluster-lasso strategy leads to a more meaningful characterization of the genetic

determinants involved, as we describe below.

Figure 3. Correlation within features selected in the signatures. This heatmap shows the correlation matrix built from the features selected by the lasso and the cluster-lasso (identified by the orange and blue bars shown above the heatmap, respectively), for ceftazidime. The corresponding values of model coefficients are represented by green bars. The 8 major clusters (involving more than 10 patterns) of the cluster-lasso signatures are identified by a dedicated color ranging from red to grey. A zoom of the top left side of the figure allows a better reading of the colored bars for the major clusters 1, 3, 7 and 8.

5.4.2 Model interpretation

We focus on two drugs to illustrate the improved interpretability offered by cluster-lasso signatures:

meropenem, where the interpretation is straightforward, and cefoxitin, which is among the

signatures of highest support. Additional results obtained for the remaining drugs are deferred to

Supplementary Materials, Section S5.

As shown in Table 2, the lasso and cluster-lasso meropenem models involve 8 and 3 features,

respectively. As shown in Figure 4(B), each lasso feature corresponds to a single unitig, while the

cluster-lasso signature involves a large cluster of unitigs (159 out of the 164 involved). Figure 4(A)

shows the magnitude of the model coefficients. It reveals that the cluster-lasso signature is

essentially driven by a single prominent feature, while 4 to 5 features of the lasso signature have a

non-negligible weight. The major feature of the cluster-lasso signature corresponds to the large

cluster of correlated patterns, and the DBGWAS visualization (Figure 4(C)) shows that the

corresponding unitigs are organized as a long linear path in the cDBG. This suggests that this cluster

corresponds to an entire gene. The annotation provided by DBGWAS shows the gene to be the Class

A beta-lactamase blaKPC. The DBGWAS visualization obtained for the lasso signature indicates that 3

of the 8 features – features 1, 2 and 4 – are also co-located in a region of the cDBG annotated as

blaKPC. The fact that the lasso selected these specific unitigs within the blaKPC gene suggests that the

resistance determinants involved are SNPs or indels. While the gene-level annotation is the same as

that obtained with the cluster-lasso, the interpretation of the signature in terms of genetic variants is

therefore radically different. A closer look at the lasso signature reveals that the 3 blaKPC features are

actually strongly correlated: they are often observed together. Unsurprisingly, they belong to the

largest cluster involved in the cluster-lasso signature, and interestingly, their cumulative weight is

approximately equal to that of the cluster-lasso feature (3.4 instead of 3.3). By explicitly detecting

that these features are correlated, and merging them into a single feature, together with additional

correlated features not even involved in the lasso signature, the cluster-lasso leads to a more

meaningful interpretation of the underlying prediction model, in two aspects. Firstly, it captures the

true nature of the genomic determinant involved: the presence of the blaKPC gene, as opposed to

mutations within the gene. Secondly, it assesses the overall contribution of the gene presence in the

decision rule, while, in the lasso signature, this contribution is shared by several distinct yet

correlated features.

Figure 4. Interpretation of the meropenem signatures. This figure provides a detailed comparison of the lasso (left) and cluster-lasso (right) signatures. A) Absolute value of the coefficients of the models. B) Number of unitigs involved in the features of the models. C) Visualization of the first subgraph obtained by DBGWAS for each signature. Nodes of the graphs correspond to unitigs of the cDBG built by DBGWAS from the training panel of genomes, as illustrated in Figure 1 and detailed in [25]. Colors indicate which unitigs of the graphs in panel C are related to which features of the models in panels A and B.

Likewise, Figure 5 presents the DBGWAS analysis of the lasso and cluster-lasso signatures obtained

for cefoxitin. We focused on the first two subgraphs provided by the software, which represent the

two genomic neighbourhoods of the most important patterns, or clusters of patterns, involved in the

models. The subgraphs are indeed ordered according to the maximal absolute value of model

coefficients among all patterns or clusters involved in the subgraph. While DBGWAS identifies the

same resistance genes with both methods (the porin gene ompK36 and blaKPC), the nature of the

underlying resistance determinants cannot be deduced from the lasso signature. The ompK36-

annotated subgraph obtained for the cluster-lasso signature (top-right panel of Figure 5) involves 2

clusters gathering 9 unitigs (clusters 1 and 3), and presents a topology attributable to a local

polymorphism: a complex bubble, with a fork separating susceptible (blue) and resistant (red) strains,

as described in [25]. The corresponding lasso subgraph, shown on the top-left panel, includes 4

patterns (patterns 1, 2, 32 and 56) each having its proper value of model coefficient, represented by

4 shades of colors ranging from blue to red. These distinct model coefficient values can lead to wrong

conclusions regarding the individual importance of the corresponding unitig sequences. Indeed,

aligning these unitigs with annotated ompK36 sequences reveals that features 2 and 56 both

represent the wild type, while features 1 and 32 align to the insertion of two amino acids in the L3

loop, as described in Novais et al. [34] (Supplementary Figure S6). The second lasso subgraph

(bottom-left panel of Figure 5) includes a single feature of the signature (shown in purple),

surrounded by seven nodes (shown in grey), among which two are annotated as blaKPC. The node of

the signature is however not annotated itself, hence the subgraph could be interpreted as a local

polymorphism in the promoter region of the blaKPC gene. The cluster-lasso subgraph shown on the

bottom-right panel reveals however that this unitig was selected by the lasso among hundreds of

highly correlated unitigs. They all belong to cluster 2, which includes the complete blaKPC gene (shown

between brackets) and plasmid sequences in strong LD.

Figure 5. DBGWAS visualizations for the interpretation of the cefoxitin signatures. This figure presents the first two subgraphs obtained by DBGWAS for the lasso and cluster-lasso signatures. The DBGWAS subgraphs are ordered by decreasing maximal absolute value of model coefficient among all patterns/clusters involved in the subgraph. Likewise, pattern and cluster identifiers are ordered by decreasing absolute value of model coefficient, meaning for instance that pattern/cluster #1 has a greater weight in the model than pattern/cluster #2. The nodes (unitigs) belonging to patterns/clusters of the signatures are colored by the value of their model coefficients (from blue to red, indicating negative and positive values, respectively). The grey nodes/unitigs, not involved in the models, represent their genomic neighbourhood. The nodes for which an annotation related to antibiotic resistance was found are surrounded by a black circle. Bold brackets are used on the bottom right subgraph to highlight these black-circled nodes. This particular subgraph gathers 7 clusters, whose identifiers are reported on the picture. Cluster 2 is the largest one, and includes the blaKPC-annotated nodes. The dashed arrow shows which node of the cluster-lasso blaKPC subgraph corresponds to the one selected by the lasso.

Performance on the test set

Table 3 shows the predictive performance obtained on the test set by the lasso and cluster-lasso

signatures, as well as the models defined by Nguyen et al. [8], in terms of sensitivity, specificity and

balanced accuracy.

Table 3. Test set results. This table summarizes the results obtained on the test dataset by the lasso, cluster-lasso and Nguyen et al. [8] models for the 10 antibiotics, in terms of sensitivity, specificity and balanced accuracy (bACC). The MIC predicted by the Nguyen et al. [8] models were converted into S/NS categorical phenotypes according to the CLSI breakpoints.

We first noted that the lasso and cluster-lasso strategies reached a similar level of balanced accuracy

for most drugs, although they did not always achieve the same trade-off in terms of sensitivity and

specificity. We noted however that the confidence intervals of the corresponding sensitivities and

specificities largely overlapped for all drugs but ceftazidime (Figure 6 and Supplementary Figure S8),

indicating that they were not significantly different between lasso and cluster-lasso, except for one

drug.

Figure 6. Test set results. This figure represents the ROC curves obtained for cefepime, cefoxitin, ceftazidime and meropenem by the lasso (red) and cluster-lasso (blue) signatures, as well as their associated sensitivities / specificities and that of the Nguyen et al. [8] models, with their 95% confidence intervals.

We also noted that the models proposed by Nguyen et al. [8] usually achieved a lesser level of

balanced accuracy. This was the case for all drugs but cefepime, imipenem and meropenem, where

the performance remained comparable. Apart from these three drugs, the loss ranged from 6.6

points for piperacillin-tazobactam to 23.6 points for aztreonam. Strikingly, these models usually

achieved a much lower level of specificity than the lasso and cluster-lasso ones. This was especially

the case for ceftazidime, piperacillin-tazobactam, tetracycline and aztreonam, where the specificity

fell below 50%. In the latter case, every single strain was actually classified as resistant, hence the

specificity was null. As can be seen from Figure 6 and Supplementary Figure S8, however, the

confidence intervals of their sensitivities and specificities often overlapped with the ROC curves of

the lasso and cluster-lasso models. Note, however, that these models were trained to predict MICs, which

we subsequently cast into S/NS categories according to the CLSI breakpoints. While this strategy may

not be optimal to evaluate the ability of these models to accurately predict MICs, we noted that the

agreement between reference and predicted MICs was much smaller on this dataset than reported

in the original publication (see Supplementary Table S3).

We often observed a serious drop between the predictive performance estimated by cross-validation

and that observed for the test set: more than 5 points of balanced accuracy for 6 drugs out of 10, and

about 10 points or more for amikacin, cefoxitin, imipenem and meropenem (13.4, 10.2, 10.9 and 9.9

points, respectively). This suggested that the training dataset taken from Nguyen et al. [8] could not

account for the entire diversity displayed by K. pneumoniae. A simple resistome-based analysis done

using the kleborate software revealed indeed that the prevalence of well-known resistance genes

was sometimes very different in the two panels. This is illustrated in Figure 7 for amikacin and

imipenem, which suffered from the highest performance drop. Redesigning the training and test

datasets by shuffling the original ones in order to obtain a homogeneous split fixed this

generalization issue (Supplementary Section S9). This illustrates that while machine learning models

can indeed succeed in learning accurate prediction rules, they fail to generalize when the dataset

they are trained on does not account for the overall diversity of the bacterial species.

Figure 7. Resistome analysis. This figure compares the training and test panels of genomes in terms of predictive performance and resistome constitution for the drugs amikacin (top) and imipenem (bottom). Left: predictive performance in terms of sensitivity, specificity, bACC and AUC estimated by cross-validation on the training set and measured on the test set, using the lasso signatures. Right: comparison of the resistome constitutions. Each kleborate resistance marker is represented by its prevalence in the resistant strains of the training (x-axis) and test (y-axis) panels.

Finally, Table 3 and Supplementary Figure S9 show an uneven level of prediction performance

among the ten antibiotics considered. The best performances were obtained for ciprofloxacin and

ceftazidime, with an AUC around 95% using either the original or the redesigned datasets

(Supplementary Figure S9). The poorest performances were obtained for two beta-lactams: cefepime,

a 4th-generation cephalosporin, and the monobactam aztreonam. This may be due to a reduced

penetrance of their genetic determinants, as described in human genetics [35], because more

complex resistance mechanisms are involved, including efflux pumps, gene regulation, or plasmid

copy number [36-38].

5.4.3 Computational requirements

Figure 8 indicates that while the cluster-lasso took on average about three times

longer than the lasso (571 vs 180 seconds), it took only about 10 minutes to obtain an entire

regularization path defined at the cluster-level. Optimizing the regularization parameter using our

cross-validation process therefore took approximately 5 hours on a single CPU. We noted that while

the time required by the lasso was relatively homogeneous across drugs, it was more variable for the

cluster-lasso. This variability was due to the fact that the lasso used in the first step identified a

variable number of active features, which directly impacted the time required to screen the

remaining ones. This is illustrated in Supplementary Figure S7.

Figure 8. Time and memory requirements. The boxplots represent the variability of the time (panel A) and maximum memory (panel B) required to generate a lasso or cluster-lasso regularization path for the ten antibiotics.

In terms of memory, we noted that the cluster-lasso procedure led to an overhead of about 2 GB

with respect to the lasso, which was related to the computation of the correlation matrix G. In

practice, we limited this overhead by computing this matrix by slices, considering subsets of p’ =

10,000 features and computing pa x p’ matrices instead of the entire pa x p matrix at once. Altogether,

this led to a computationally efficient procedure, allowing the identification of cluster-level signatures in a few

hours, for a limited memory footprint. We note that it could be straightforwardly parallelized, using

several CPUs to compute the various slices of the correlation matrix G.
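
The slicing idea can be illustrated with a small Python sketch in which only one block of the correlation matrix, covering a limited range of columns, is held in memory at a time. This is a toy illustration of the principle, not the implementation used here; the function names, the data, and the use of a "greater than or equal to" comparison against the screening threshold are assumptions of the sketch.

```python
import math


def pearson(u, v):
    """Pearson correlation between two equal-length numeric vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0  # constant column: correlation undefined, treated as 0 here
    return cov / math.sqrt(var_u * var_v)


def iter_slices(p, slice_size):
    """Yield (start, stop) column ranges covering 0..p in chunks of at most slice_size."""
    for start in range(0, p, slice_size):
        yield start, min(start + slice_size, p)


def extended_features_by_slices(columns, active, s1, slice_size):
    """Same result as a full screening pass, computed one block of columns at a time."""
    extended = []
    for start, stop in iter_slices(len(columns), slice_size):
        # Only a len(active) x (stop - start) block of G exists at this point.
        block = [
            [pearson(columns[a], columns[j]) for j in range(start, stop)]
            for a in active
        ]
        for offset in range(stop - start):
            if any(row[offset] >= s1 for row in block):
                extended.append(start + offset)
    return extended


# Toy data: 4 features (columns) over 6 genomes, feature 0 being the active one.
columns = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 1],
]
extended = extended_features_by_slices(columns, active=[0], s1=0.95, slice_size=3)
```

Because the slices are independent of each other, each one could be handed to a separate worker, which is the parallelization opportunity mentioned above.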

5.5 Discussion

Representing bacterial genomes using k-mers leads to very high-dimensional representations with

strong correlation structures. This may hinder a meaningful interpretation of predictive models built

by sparse ML strategies like lasso-penalized regressions [39] or decision-tree-based algorithms [40],

which are known to be unstable in this case: when some features are strongly correlated, they tend

to arbitrarily pick one, or a few, of them [41]. This instability may not be an issue in terms of

predictive performance: as long as one feature among a group of correlated ones appears in the

model, the prediction may be unchanged. It may however have a severe impact in terms of

interpretability, as the features selected by the model may provide an incomplete or erroneous

characterization of the causal resistance determinant.

We propose a simple and computationally efficient strategy to cope with the strong correlation

structures inherent to k-mer-based representations, and build sparse and meaningful genomic

signatures. While performing a systematic study on thousands of strains of K. pneumoniae, our

approach compared favorably to the state of the art, providing a comparable level of

performance, while offering a greater interpretability of the genomic features involved in the models.

For this challenging, genetically flexible bacterial species with significant accessory genome

components, this new approach allowed us to extract meaningful scientific insights from the identified

signatures, as further detailed in Section S5 of the Supplementary Materials.

Central to our approach is a three-step strategy, where a sparse ML algorithm is first used to screen

features in a generic manner, which are then extended to clusters of strongly correlated features,

ultimately considered as candidate features to be included in the final antibiotic resistance prediction

model. While we here relied on lasso-penalized logistic regression for both the screening and final

learning stages, this principle is generic and could readily be transposed to other sparse ML

algorithms, like xgboost [4, 8] or set cover machines [26]. Likewise, it could straightforwardly be

extended to handle MICs or other phenotypic traits, as well as other types of genomic features (e.g.,

relying on SNPs instead of k-mers).

Several alternative strategies could be considered to handle correlations between k-mers. Most

related to our approach are the elastic-net and the group-lasso strategies, which also rely on logistic

regression – and more generally on generalized linear models – but with alternative regularization

penalties. The elastic-net penalty combines the lasso and the ridge penalties, which leads to sparse

models with a grouping mechanism: correlated features tend to be selected together [42]. This

approach was recently shown to be efficient in the context of bacterial genome-wide association

studies (GWAS), providing increased statistical power for the identification of genotype-phenotype

associations and accurate prediction rules [43]. As we demonstrate in Supplementary Section S10,

however, it remains limited in its ability to provide interpretable predictive signatures, for several

reasons. First, while it has the effect of stabilizing the lasso solution and of simultaneously activating

groups of correlated features, these groups are not defined explicitly, which intrinsically makes the

interpretation of the model difficult. Moreover, while the parameter controlling the trade-off

between the lasso and ridge penalties had a direct impact on the number of selected features, it had

little impact on the predictive performance of the model, thereby making it difficult to optimize

objectively. Finally, we empirically observed that it led to a partial and heterogeneous reconstruction

of the genomic features obtained by the cluster-lasso: a significant fraction of the cluster members

were not selected by the elastic-net, and the individual weights associated with the selected ones

varied greatly, although their level of predictive power was comparable.
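
The grouping mechanism mentioned above can be illustrated with a short sketch. It assumes the parameterization used by glmnet [33], in which the penalty is lambda * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2). The function name and the toy weight vectors are hypothetical. For two identical features, the lasso penalty alone does not distinguish between putting all the weight on one feature and splitting it, whereas adding the ridge term favours the split.

```python
def elastic_net_penalty(beta, lam, alpha):
    """Elastic-net penalty in the glmnet parameterization (assumed here).

    alpha = 1 gives the lasso penalty, alpha = 0 gives the ridge penalty.
    """
    l1 = sum(abs(b) for b in beta)
    l2_squared = sum(b * b for b in beta)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_squared)


# Two ways of giving a total weight of 1 to a pair of identical features.
one_feature = [1.0, 0.0]    # all the weight on a single feature
both_features = [0.5, 0.5]  # weight shared between the two features

# Lasso: both solutions are penalized equally, so the choice is arbitrary.
lasso_one = elastic_net_penalty(one_feature, lam=1.0, alpha=1.0)
lasso_both = elastic_net_penalty(both_features, lam=1.0, alpha=1.0)

# Elastic-net: the shared solution has the smaller penalty.
enet_one = elastic_net_penalty(one_feature, lam=1.0, alpha=0.5)
enet_both = elastic_net_penalty(both_features, lam=1.0, alpha=0.5)
```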

The group-lasso penalty leverages predefined groups of features, ensuring that all features of a given

group are either active or inactive simultaneously [44]. This strategy was for instance considered in

human GWAS, using groups of SNPs defined spatially to account for their LD [45]. Transposing this

idea to bacterial genomes is challenging since no such prior information is available to define groups,

as LD can be genome-wide [29]. A solution could be to identify clusters of correlated k-mers using

agglomerative strategies [31], but this is hard to carry out in practice on the high-dimensional datasets

involving 10^5 to 10^6 features encountered in our application. Our approach can therefore be seen as a

simple and efficient strategy to approximate such a group-lasso process in very high-dimensional

settings. Instead of collapsing groups of correlated features into composite variables, a natural

extension of our method would however be to rely on a group-lasso penalized regression defined at

the cluster level. Each feature would then be granted its own weight, which could better

reflect its individual predictive power. We empirically observed, however, that the variability of the weights within a

cluster was very small, as shown in Supplementary Figure S13, which indicates that

keeping the features separate or averaging them is essentially equivalent. In practice, we find it

easier to explicitly collapse each cluster to a single composite variable to interpret the model

parameters.
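
The equivalence between separate and averaged features can be made concrete with a small sketch. It is an illustration under stated assumptions and not the code of the R package: the function names and the toy data are hypothetical, and every member of a cluster is assumed to carry exactly the same weight. In that case the linear score obtained from the individual features equals the score obtained from the cluster average, provided the cluster weight is the sum of the member weights.

```python
def collapse_clusters(rows, clusters):
    """Replace each cluster of feature indices by the mean of its members.

    rows     : list of samples, each a list of feature values
    clusters : list of clusters, each a list of feature indices
    """
    return [
        [sum(row[j] for j in members) / len(members) for members in clusters]
        for row in rows
    ]


def linear_score(row, weights):
    """Linear predictor of a (logistic) regression model, without intercept."""
    return sum(w * x for w, x in zip(weights, row))


# Toy presence/absence profiles: features 0 and 1 form one cluster.
rows = [[1, 1, 0], [0, 0, 1]]
clusters = [[0, 1], [2]]

feature_weights = [0.4, 0.4, -1.0]  # equal weights within the first cluster
cluster_weights = [
    sum(feature_weights[j] for j in members) for members in clusters
]

composite_rows = collapse_clusters(rows, clusters)
separate_scores = [linear_score(r, feature_weights) for r in rows]
composite_scores = [linear_score(r, cluster_weights) for r in composite_rows]
```

With one weight per cluster, the model parameters can be read directly at the level of the genomic determinant that the cluster represents.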

On the practical side, our method involves two hyper-parameters, besides the regularization

parameter, to identify active variables and to build the final model. Although these so-called

screening and clustering thresholds did not have a strong influence in this study (Supplementary

Section S2), they may be cumbersome to optimize in practice for other applications. A natural

extension to our method would be to consider re-sampling strategies in the clustering step, in order

to identify stable clusters, whose constitution would be robust to the precise definition of the

clustering threshold [46]. Alternatively, one could rely on tree-guided lasso penalization to leverage

the entire dendrogram during the final learning step, which would then simultaneously identify

clusters and learn the prediction model [47].

Regarding AMR prediction, our study of K. pneumoniae confirms several observations made

recently, namely that k-mer-based approaches can learn sparse prediction rules without any prior

information, that predictions are more accurate with R/S models than MICs and that the level of

predictive performance can vary by antibiotic [26, 28]. Importantly, our study involved a novel panel

of 634 K. pneumoniae strains for the validation of the prediction models and suggested that the

problem is more challenging than reported in Nguyen et al. [8]. The figures reported in that study

were indeed probably optimistic because the genome panel considered did not account for the

overall genomic diversity of the K. pneumoniae species (Supplementary Section S1). The 634

additional strains with genomes and phenotypes considered in this study will help to learn more

accurate and generalizable prediction models.

Finally, the ML methods developed in this study are available in a generic R package that can be

easily transposed to other applications, not necessarily involving k-mers nor AMR phenotypes. On

the challenging dataset considered in this study, involving more than a thousand strains for more

than a million genomic features, the computational requirements remained limited and the

signatures could be identified in a few hours on a standard workstation. Coupled with the enriched

level of interpretability they offer, we believe our approach will help define prediction models

amenable to routine diagnostics.

5.6 References

1. Gordon NC, Price JR, Cole K, Everitt R, Morgan M, Finney F, et al. Prediction of Staphylococcus

aureus Antimicrobial Resistance by Whole-Genome Sequencing. Journal of Clinical Microbiology

2014;52(4):1182–1191.

2. Walker TM, Kohl TA, Omar SV, Hedge J, Elias CDO, Bradley P, et al. Whole-genome sequencing for

prediction of Mycobacterium tuberculosis drug susceptibility and resistance: a retrospective cohort

study. The Lancet Infectious Diseases 2015;15:1193–1202.

3. Eyre DW, De Silva D, Cole K, Peters J, Cole MJ, Grad YH, et al. WGS to predict antibiotic MICs for

Neisseria gonorrhoeae. The Journal of Antimicrobial Chemotherapy 2017;72(7):1937–1947.

4. Nguyen M, Long SW, McDermott PF, Olsen RJ, Olson R, Stevens RL, et al. Using Machine Learning

To Predict Antimicrobial MICs and Associated Genomic Features for Nontyphoidal Salmonella.

Journal of Clinical Microbiology 2019;57(2).

5. Tyson GH, McDermott PF, Li C, Chen Y, Tadesse DA, Mukherjee S, et al. WGS accurately predicts

antimicrobial resistance in Escherichia coli. Journal of Antimicrobial Chemotherapy 2015;70(10).

6. Moradigaravand D, Palm M, Farewell A, Mustonen V, Warringer J, Parts L. Prediction of antibiotic

resistance in Escherichia coli from large-scale pan-genome data. PLOS Computational Biology

2018;14(12):1–17.

7. Deng X, Memari N, Teatero S, Athey T, Isabel M, Mazzulli T, et al. Whole-genome Sequencing for

Surveillance of Invasive Pneumococcal Diseases in Ontario, Canada: Rapid Prediction of Genotype,

Antibiotic Resistance and Characterization of Emerging Serotype 22F. Frontiers in Microbiology

2016;7:2099.

8. Nguyen M, Brettin T, Long SW, Musser JM, Olsen RJ, Olson R, et al. Developing an in silico

minimum inhibitory concentration panel test for Klebsiella pneumoniae. Scientific reports

2018;8(1):421.

9. Su M, Satola SW, Read TD. Genome-Based Prediction of Bacterial Antibiotic Resistance. Journal of

Clinical Microbiology 2019;57(3).

10. Yang Y, Niehaus KE, Walker TM, Iqbal Z, Walker AS, Wilson DJ, et al. Machine Learning for

Classifying Tuberculosis Drug-Resistance from DNA Sequencing Data. Bioinformatics 2017;p. btx801.

11. Coll F, McNerney R, Preston MD, Guerra-Assunção JA, Warry A, Hill-Cawthorne G, et al. Rapid

determination of anti-tuberculosis drug resistance from whole-genome sequences. Genome

Medicine 2015;7(1):51.

12. Bradley P, Gordon NC, Walker TM, Dunn L, Heys S, Huang B, et al. Rapid antibiotic-resistance

predictions from genome sequence data for Staphylococcus aureus and Mycobacterium tuberculosis.

Nature Communications 2015;6:10063.

13. Tanmoy AM, Westeel E, De Bruyne K, Goris J, Rajoharison A, Sajib MS, et al. Salmonella enterica

Serovar Typhi in Bangladesh: exploration of genomic diversity and antimicrobial resistance. mBio

2018;9(6):e02112–18.

14. Miotto P, Tessema B, Tagliani E, Chindelevitch L, Starks AM, Emerson C, et al. A standardised

method for interpreting the association between mutations and phenotypic drug resistance in

Mycobacterium tuberculosis. European Respiratory Journal 2017;50(6).

15. Mahé P, El Azami M, Barlas P, Tournoud M. A large scale evaluation of TBProfiler and Mykrobe for

antibiotic resistance prediction in Mycobacterium tuberculosis. PeerJ 2019 May;7:e6857.

16. Gygli SM, Borrell S, Trauner A, Gagneux S. Antimicrobial resistance in Mycobacterium tuberculosis:

mechanistic and evolutionary perspectives. FEMS Microbiology Reviews 2017 03;41(3):354–373.

17. Chen ML, Doddi A, Royer J, Freschi L, Schito M, Ezewudo M, et al. Beyond multidrug resistance:

Leveraging rare variants with machine and statistical learning models in Mycobacterium tuberculosis

resistance prediction. EBioMedicine 2019.

18. Palomino JC, Martin A. Drug resistance mechanisms in Mycobacterium tuberculosis. Antibiotics

2014;3:317–340.

19. Palmer AC, Kishony R. Understanding, predicting and manipulating the genotypic evolution of

antibiotic resistance. Nature Reviews Genetics 2013;14:243–248.

20. van Belkum A, Burnham CAD, Rossen JWA, Mallard F, Rochas O, Dunne Jr WM. Innovative and

rapid antimicrobial susceptibility testing systems. Nature Reviews Microbiology 2020.

21. Davis JJ, Boisvert S, Brettin T, Kenyon RW, Mao C, Olson R, et al. Antimicrobial Resistance

Prediction in PATRIC and RAST. Scientific Reports 2016;6:27930.

22. Drouin A, Giguère S, Déraspe M, Marchand M, Tyers M, Loo VG, et al. Predictive computational

phenotyping and biomarker discovery using reference-free genome comparisons. BMC Genomics

2016;17(1):754.

23. Mahé P, Tournoud M. Predicting bacterial resistance from whole-genome sequences using k-

mers and stability selection. BMC Bioinformatics 2018 Oct;19(1):383.

24. Lees JA, Vehkala M, Välimäki N, Harris SR, Chewapreecha C, Croucher NJ, et al. Sequence element

enrichment analysis to determine the genetic basis of bacterial phenotypes. Nature Communications

2016;7(12797).

25. Jaillard M, Lima L, Tournoud M, Mahé P, van Belkum A, Lacroix V, et al. A fast and agnostic

method for bacterial genome-wide association studies: Bridging the gap between k-mers and genetic

events. PLOS Genetics 2018 11;14(11):1–28.

26. Drouin A, Letarte G, Raymond F, Marchand M, Corbeil J, Laviolette F. Interpretable genotype-to-

phenotype classifiers with performance guarantees. Scientific Reports 2019 dec;9(1).

27. Farhat MR, Sultana R, Iartchouk O, Bozeman S, Galagan J, Sisk P, et al. Genetic determinants of

drug resistance in Mycobacterium tuberculosis and their diagnostic value. Am J Respir Crit Care Med

2016 Sep 1;194(5):621–30.

28. Hicks AL, Wheeler N, Sanchez-Buso L, Rakeman JL, Harris SR, Grad YH. Evaluation of parameters

affecting performance and reliability of machine learning-based antibiotic susceptibility testing from

whole genome sequencing data. PLOS Computational Biology 2019;15(9):e1007349.

29. Earle SG, Wu CH, Charlesworth J, Stoesser N, Gordon NC, Walker TM, et al. Identifying lineage

effects when controlling for population structure improves power in bacterial association studies.

Nature Microbiology 2016;1(16041).

30. Gauraha N, Parui SK. Efficient clustering of correlated variables and variable selection in

high-dimensional linear models. arXiv preprint arXiv:1603.03724; 2016.

31. Bühlmann P, Rütimann P, van de Geer S, Zhang CH. Correlated variables in regression: Clustering

and sparse estimation. Journal of Statistical Planning and Inference 2013;143:1835–1858.

32. Slatkin M. Linkage disequilibrium: understanding the evolutionary past and mapping the medical

future. Nature reviews genetics 2008;9(6):477.

33. Friedman J, Hastie T, Tibshirani R. Regularization Paths for Generalized Linear Models via

Coordinate Descent. Journal of Statistical Software 2010;33(1):1–22.

34. Novais A, Rodrigues C, Branquinho R, Antunes P, Grosso F, Boaventura L, et al. Spread of an

OmpK36-modified ST15 Klebsiella pneumoniae variant during an outbreak involving multiple

carbapenem-resistant Enterobacteriaceae species and clones. European journal of clinical

microbiology & infectious diseases 2012;31(11):3057–3063.

35. Cooper DN, Krawczak M, Polychronakos C, Tyler-Smith C, Kehrer-Sawatzki H. Where genotype is

not predictive of phenotype: towards an understanding of the molecular basis of reduced penetrance

in human inherited disease. Human genetics 2013;132(10):1077–1130.

36. Hocquet D, Nordmann P, El Garch F, Cabanne L, Plésiat P. Involvement of the MexXY-OprM efflux

system in emergence of cefepime resistance in clinical strains of Pseudomonas aeruginosa.

Antimicrobial agents and chemotherapy 2006;50(4):1347–1351.

37. Pages JM, Lavigne JP, Leflon-Guibout V, Marcon E, Bert F, Noussair L, et al. Efflux pump, the

masked side of ß-lactam resistance in Klebsiella pneumoniae clinical isolates. PLoS One

2009;4(3):e4817.

38. Kitchel B, Rasheed JK, Endimiani A, Hujer AM, Anderson KF, Bonomo RA, et al. Genetic factors

associated with elevated carbapenem resistance in KPC-producing Klebsiella pneumoniae.

Antimicrobial agents and chemotherapy 2010;54(10):4201–4207.

39. Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical

Society: Series B (Methodological) 1996;58(1):267–288.

40. Chen T, Guestrin C. Xgboost: A scalable tree boosting system. In: Proceedings of the 22nd acm

sigkdd international conference on knowledge discovery and data mining ACM; 2016. p. 785–794.

41. Hastie T, Tibshirani R, Wainwright M. Statistical Learning with Sparsity: The Lasso and

Generalizations. Chapman & Hall/CRC; 2015.

42. Zou H, Hastie T. Regularization and variable selection via the elastic net. Journal of the royal

statistical society: series B (statistical methodology) 2005;67(2):301–320.

43. Lees JA, Tien Mai T, Galardini M, Wheeler NE, Corander J. Improved inference and prediction of

bacterial genotype-phenotype associations using pangenome-spanning regressions. bioRxiv

2019; https://www.biorxiv.org/content/early/2019/11/23/852426.

44. Yuan M, Lin Y. Model selection and estimation in regression with grouped variables. Journal of

the Royal Statistical Society: Series B (Statistical Methodology) 2006;68(1):49–67.

45. Dehman A, Ambroise C, Neuvial P. Performance of a blockwise approach in variable selection

using linkage disequilibrium information. BMC bioinformatics 2015;16(1):148.

46. Kimes PK, Liu Y, Hayes DN, Marron JS. Statistical significance for hierarchical clustering.

Biometrics 2014;73(3):811–821.

47. Kim S, Xing EP. Tree-guided group lasso for multi-task regression with structured sparsity. In:

International Conference on Machine Learning; 2010. p. 543–550.

CHAPTER 6 : PFM-like, a novel family of subclass B2 metallo β-

lactamase from Pseudomonas synxantha belonging to the

Pseudomonas fluorescens complex

Laurent Poirel1,2,3, Mattia Palmieri1,4, Michael Brilhante5, Amandine Masseron1, Vincent Perreten5,

Patrice Nordmann1,2,3,6

1Microbiology Unit, Department of Medicine, Faculty of Science, University of Fribourg, Fribourg,

Switzerland.

2INSERM European Unit (IAME, France), University of Fribourg, Fribourg, Switzerland.

3Swiss National Reference Center for Emerging Antibiotic Resistance (NARA), University of Fribourg,

Fribourg, Switzerland.


5Institute of Veterinary Bacteriology, Vetsuisse Faculty, University of Bern, Bern, Switzerland.

6Institute for Microbiology, University of Lausanne and University Hospital Centre, Lausanne,

Switzerland

Published in Antimicrobial Agents and Chemotherapy, 27 January 2020, doi: 10.1128/AAC.01700-19

6.1 Abstract

A carbapenem-resistant Pseudomonas synxantha isolate recovered from chicken meat produced the

novel carbapenemase PFM-1. That subclass B2 metallo-β-lactamase shared 71% amino acid identity

with β-lactamase Sfh-1 from Serratia fonticola. The blaPFM-1 gene was chromosomally located and

likely acquired. Variants of PFM-1 sharing 90% to 92% amino acid identity were identified in bacterial

species belonging to the Pseudomonas fluorescens complex, including Pseudomonas libanensis (PFM-

2) and Pseudomonas fluorescens (PFM-3), highlighting that these species constitute reservoirs of

PFM-like encoding genes.

6.2 Main text

Metallo-β-lactamases (MBLs) are zinc-dependent enzymes that can catalyze the hydrolysis of

virtually all β-lactam antibiotics (including carbapenems) except for monobactams and that are

resistant to the β-lactamase inhibitors clavulanate, tazobactam, and avibactam (1). They constitute a

highly diverse family of enzymes and can be categorized into three subclasses, namely, B1, B2, and

B3 (2). The subclass B1 enzymes are the most clinically important since they comprise MBLs such as

IMP-1, NDM-1, SPM-1, KHM-1, VIM-1, and VIM-2 (3), widely identified in Enterobacteriaceae,

Acinetobacter spp., and Pseudomonas spp. Subclass B2 includes CphA (4, 5), ImiS (6, 7), and AsbM1

(8), which are intrinsic enzymes in Aeromonas spp., and Sfh-I (9) from the occasionally pathogenic

species Serratia fonticola. These carbapenemases are monozinc enzymes that usually show much

higher hydrolysis rates against carbapenem substrates than the other β-lactams (9).

Production of MBLs in the Pseudomonas genus is frequently observed, with acquired MBL-encoding

genes (blaIMP, blaVIM, blaSPM) being reported worldwide mainly in Pseudomonas aeruginosa and, to a

lesser extent, in Pseudomonas fluorescens (10, 11). In addition, intrinsic MBL genes encoding subclass

B3 POM-1-like and PAM-1-like enzymes have been identified in Pseudomonas otitidis and

Pseudomonas alcaligenes, respectively (12–14).

P. fluorescens and related species belonging to the same complex are rarely associated with infections

in human medicine (15). Nevertheless, P. fluorescens can cause bloodstream infections in humans,

and most reported cases have been iatrogenic (16). Few studies have focused on the β-lactamase

gene content of the P. fluorescens complex. While P. fluorescens possesses a chromosomally located

and inducible Ambler class C β-lactamase gene (17), the acquired but chromosomally located blaBIC-1

gene encoding an Ambler class A carbapenemase was previously identified as a source of

carbapenem resistance in P. fluorescens isolates recovered from the Seine River, Paris (18).

Here, we analyzed a carbapenem-resistant Pseudomonas sp. isolate that had been recovered during

a survey aimed at studying the spread of multidrug-resistant Gram-negative organisms among food

varieties and food-producing animals in Switzerland in 2018. Isolate MCP-106 was isolated from

chicken meat after an 18-h preenrichment in LB broth and subsequent selection on ChromID

CarbaSmart (bioMérieux, La Balme-les-Grottes, France). Carbapenemase production was tested

using the Rapid Carba NP test (19). Matrix-assisted laser desorption ionization–time of flight (MALDI-

TOF) analysis assigned the strain to the Pseudomonas synxantha species, and that assignment was

further confirmed by analysis of the rpoB and rpoD gene sequences (Fig. 1). P. synxantha, which

belongs to the P. fluorescens complex (20), is an environmental species that reduces and

accumulates the heavy metal chromium (21, 22) and that is pathogenic to nematode eggs and may

therefore be used as a nematicidal agent (23). Susceptibility testing performed for β-lactams by disk

diffusion showed that P. synxantha strain MCP-106 was resistant to amino- and carboxypenicillins,

broad-spectrum cephalosporins, aztreonam, and carbapenems. Whole-genome sequencing was

performed using an Illumina MiSeq platform (2 × 150-bp paired ends) to assess the genetic

determinants of carbapenem resistance. The obtained reads were trimmed using Trimmomatic 0.36,

assembled with SPAdes version 3.11.1 (24), and annotated with PROKKA version 1.12. TBLASTN

analysis of the DNA contigs using VIM as a reference revealed a chromosomally located MBL protein

that was named PFM-1 (Pseudomonas fluorescens metallo-β-lactamase). PFM-1 (encoded by the

blaPFM-1 gene) consisted of a β-lactamase with 253 amino acids and a relative molecular mass of 28.5

kDa.

FIG 1. Dendrogram constructed using the seven genes from the multilocus sequence typing (MLST) analysis, compared with representative genes from other Pseudomonas species, in particular the most closely related ones, Pseudomonas fluorescens and Pseudomonas synxantha. The alignment used for the tree calculation was performed with the Clustal Omega program.

A BLASTN analysis against the NCBI database revealed the presence of a blaPFM-like gene (named

blaPFM-2, with PFM-2 sharing 92% amino acid identity with PFM-1) in Pseudomonas libanensis strain

CIP105460 (GenBank accession no. GCA_001439685.1) (25), which belongs to the

Pseudomonas fluorescens complex. In addition, genes encoding PFM-like products were also

identified in the genomes of a single P. fluorescens strain (WP_050516231.1) and two Pseudomonas

brenneri strains, sharing 90% amino acid identity with PFM-1 (WP_128593843.1 and OAE14554.1).

Furthermore, a gene encoding a more distantly related enzyme (75% amino acid identity) was found

in the genome of a Pseudomonas chlororaphis strain (WP_038635452.1). However, no other blaPFM-

like gene was identified in any other P. fluorescens genomes (or in any genomes of species belonging

to the same complex), despite numerous genomes of strains belonging to the P. fluorescens complex

(n = 145) having been fully sequenced. We then screened 10 P. fluorescens strains from our

laboratory collection, all of which had been recovered from human, animal, or environmental

samples. A PCR-based approach using primer pair PFM-1-Fw (5’-GTTACGCCTGATGGACTTTG-3’) and

PFM-1-Rv (5’-CTTAGAAGCATGTCAGTGCG-3’) for blaPFM-1 and primer pair PFM-2-Fw (5’-

CTGATCAGAAAATGTGGGGC-3’) and PFM-2-Rw (5’-GACACGCCGTGTTTCTATATC-3’) for blaPFM-2 was

employed. A single strain gave a positive result, and Sanger sequencing identified a blaPFM-like gene

(blaPFM-3) encoding a protein sharing 91% amino acid identity with PFM-1. The blaPFM-3 gene was

identified from P. fluorescens PF1, an isolate recovered from a water sample from the Seine River in

Paris, France, and also producing the Ambler class A carbapenemase BIC-1 (18). PFM-2 and PFM-3

differed by five amino acids.

Pairwise alignment of the PFM-like amino acid sequences with those of other MBLs

revealed that these newly identified enzymes were most closely related to the subclass B2 MBL

enzymes. PFM-1 shares 71% amino acid identity with Sfh-1, originally identified in Serratia fonticola

strain UTAD54 (9), and 53% identity with CphA-1 from Aeromonas hydrophila (26). It shared very low

identity with subclass B1 MBLs such as NDM-1 (17%) and VIM-1 and IMP-1 (22%) (Fig. 2). Protein

alignments of the β-lactamase PFM-1 with representative subclass B2 MBLs revealed the presence of

conserved amino acid residues known to be involved in binding to zinc of class B β-lactamase (BBL)

(27) (Fig. 3). The motif Asn-Tyr-His-Thr-Asp (positions 116 to 120 [BBL nomenclature]), being a

distinctive feature of subclass B2 MBLs and presumably involved in the coordination of the two zinc

ions found in the active site of these enzymes, was identified in PFM-like enzymes. Amino acids

Asp-120, Cys-221, and His-263, presumably involved in the binding of the second zinc ion in subclass

B2 MBLs, were also conserved in the PFM-like proteins.
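
The identity percentages reported above are derived from pairwise alignments. A minimal sketch of such a calculation is given below. It is not the procedure used by the alignment programs cited in this chapter: the function name is hypothetical, and ignoring gapped columns in the denominator is only one of several conventions, so values may differ slightly from those reported by other tools. The toy sequences are invented for the example and are based on the Asn-Tyr-His-Thr-Asp motif.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues between two aligned sequences.

    Columns containing a gap in either sequence are ignored (assumed
    convention); both sequences must come from the same alignment.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped column to compare")
    return 100.0 * matches / compared


# Toy example: the B2 motif NYHTD against a hypothetical variant.
identity = percent_identity("NYHTD", "NYHSD")
```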

FIG 2. Dendrogram of PFM-1, PFM-2, and PFM-3 in comparison with representative class B β-lactamases subjected to neighbor-joining analysis. The alignment used for the tree calculation was performed with the Clustal Omega program. Numbers in parentheses indicate percentages of amino acid identity with PFM-1. The β-lactamases used for the comparisons (GenBank accession numbers) were Sfh-1 (NZ_AUZV01000091.1), CphA-1 (X57102), ImiS (Y10415), ImiH (AJ548797), VIM-1 (AJ278514), IMP-1 (EF027105), NDM-1 (KJ018857), POM-1 (EU315252), and PAM-1 (AB858498).

FIG 3. Alignment of the amino acid sequences of subclass B2 MBLs. Residues conserved in the enzymes are indicated by asterisks; colons indicate conservation between groups with strongly similar properties; dots indicate conservation between groups with weakly similar properties. The BBL numbering scheme (in bold) is used for residues conserved in MBLs.

In order to gain insight into the β-lactam resistance phenotype conferred by the corresponding

proteins, the blaPFM-1, blaPFM-2, and blaPFM-3 genes of P. synxantha strain MCP-106, P. libanensis strain

CIP105460, and P. fluorescens PF1 were cloned into plasmid pTOPO (Invitrogen, Illkirch, France) and

expressed in Escherichia coli. Cloning experiments were performed using the pCR-blunt TOPO cloning

kit (Invitrogen, Illkirch, France) after amplification of the genes with primers PFM-1-Fw and PFM-1-Rv

for blaPFM-1 and with primers PFM-2-Fw and PFM-2-Rw for blaPFM-2 and blaPFM-3. The resulting

recombinant plasmids were transformed into chemically competent E. coli TOP10 strains. Once

expressed in E. coli TOP10, similar resistance phenotypes were observed with the different PFM

variants, with reduced susceptibility to carbapenems seen (Table 1) but paradoxically no effect on

the other β-lactams tested such as amoxicillin, ticarcillin, cefoxitin, cefotaxime, and ceftazidime (data

not shown). MICs of carbapenems were determined by Etest and showed values for the PFM-3-

producing recombinant strain that were higher than those obtained with the PFM-1-producing and

PFM-2-producing recombinant strains, particularly for imipenem (Table 1).

Table 1. MICs of carbapenems for E. coli TOP10 recipient strain with and without the blaPFM genes and for Pseudomonas isolates.

aCIP105460 was originally described by Dabboussi et al. (25).

bPF1 was originally described by Girlich et al. (18).

cClavulanic acid was used at a concentration of 2 µg/ml.

dTazobactam was used at a concentration of 4 µg/ml.

Purification of the PFM-1 enzyme was performed using a four-liter LB broth culture of E. coli TOP10

(pTOPO-blaPFM-1) recombinant strain supplemented with kanamycin (50 µg/ml) and incubated for 24

h at 37°C under shaking conditions. The bacterial culture was centrifuged, and the pellet was

resuspended in Tris-HCl buffer (50 mM Tris-HCl, 100 µM ZnCl2, pH 8.5) and sonicated using a Vibra-

Cell 75186 sonicator (Thermo Fisher Scientific). After filtration using a 0.22-µm pore size

nitrocellulose filter, the crude extract was loaded in a Q-Sepharose column connected to an

ÄKTAprime chromatography system (GE Healthcare, Glattbrugg, Switzerland) and eluted with a linear

NaCl gradient. The presence of the β-lactamase was monitored using the Rapid Carba NP test (19),

and the fractions showing the highest β-lactamase activity were pooled and dialyzed against 100 mM

phosphate buffer (pH 7.0), prior to 10-fold concentration performed with a Vivaspin 20 concentrator

(GE Healthcare). The purified β-lactamase extract was immediately used for enzymatic

determinations.

The protein concentrations were measured using Bradford reagent (Sigma-Aldrich, Buchs,

Switzerland), and the purity of the enzyme was estimated by SDS-PAGE analysis (GenScript, NJ, USA).

The purity of PFM-1 was estimated to be >95%, with a single dominant band visible on the SDS-

polyacrylamide gel. Kinetic measurements were performed at room temperature using phosphate-

buffered saline (PBS) buffer (0.1 M, pH 7) supplemented with ZnSO4 (5 µM) using a UV/visible

Ultrospec 2100 Pro spectrophotometer (Amersham Biosciences, Buckinghamshire, United Kingdom).

This kinetic analysis confirmed that PFM-1 hydrolyzed carbapenems; however, the catalytic efficiency

was slightly lower than that seen with the previously described subclass B2 MBLs (Table 2). In

contrast, hydrolysis of other β-lactam substrates such as benzylpenicillin or cefotaxime was not

detected (kcat value < 0.01 s⁻¹). This study therefore characterized a novel family of subclass B2

MBLs with substantial carbapenemase activity. Compared to other subclass B2 MBLs, PFM-1

hydrolysis is limited to carbapenems, and the catalytic efficiency is lower.

Table 2. Kinetic parameters of purified β-lactamase PFM-1 and comparison with other B2 MBLs. Kinetic data are displayed for Sfh-1, CphA, and AsbM1 as reported previously by Fonseca et al. (30), Vanhove et al. (31), and Yang and Bush (8), respectively. ImiS kinetic values are presented for imipenem and meropenem as reported by Sharma et al. (32) and Crawford et al. (6), respectively. NR, not reported.

The levels of G+C content of blaPFM-1 (50%) and blaPFM-2/-3 (52%) differed from the expected range of

the G+C content of Pseudomonas genes (ca. 60%); in addition, the fact that no other blaPFM-like genes

were identified in several fully sequenced genomes of P. fluorescens strains available in the GenBank

databases further suggests a non-Pseudomonas origin. However, no obvious genetic element that

could have been involved in the acquisition of that gene was observed in its nearby genetic

environment. Similarly, no mobile genetic elements were identified in their upstream vicinity by analyzing

the genes showing significant identities with blaPFM-1 in the GenBank database. It may be speculated

that those genes have been acquired by transformation since P. fluorescens strains, as with many

other Gram-negative nonfermenters, are spontaneously transformable at high frequency (28).

However, a discrepancy was always noticed between all of the putative MBL-encoding genes

(including blaPFM-1) and the surrounding chromosomal sequences in terms of GC content (ca. 50%

versus ca. 60%), suggesting a foreign origin (data not shown).
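The G+C comparison described above can be illustrated with a minimal sketch. The function names and the toy sequences are hypothetical and chosen for illustration only; they are not the tools or data used in this study.

```python
def gc_content(seq: str) -> float:
    """Return the G+C fraction (0 to 1) of a nucleotide sequence.

    Only unambiguous A, C, G and T bases are counted; other symbols (e.g. N) are ignored.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return gc / (gc + at)


def gc_difference(gene: str, flanking: str) -> float:
    """G+C difference, in percentage points, between a gene and its flanking sequence."""
    return 100.0 * (gc_content(gene) - gc_content(flanking))


# Toy sequences, made up for illustration (not real blaPFM or Pseudomonas data).
toy_gene = "ATGCATGCAT"    # 4 of 10 bases are G or C
toy_flank = "GCGCGCATGC"   # 8 of 10 bases are G or C
print(round(gc_difference(toy_gene, toy_flank), 1))
```

A clearly negative difference between a gene and its chromosomal context, as reported here for the blaPFM genes, is one commonly used indication of horizontal acquisition, although it is not proof on its own.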

This work underlines that P. fluorescens-like species may possess class B β-lactamase genes that are,

however, not systematically present in their genomes. Although strains belonging to the P.

fluorescens complex are rarely involved in human infections, they are widely disseminated in the

environment and parts of the human microbiota and can also be found in chicken meat (16). Those

bacterial species may therefore constitute reservoirs of antimicrobial resistance genes (29).


6.3 Data availability

The sequences of PFM-1, PFM-2, and PFM-3 have been deposited in the NCBI database under GenBank

accession numbers MN065826 (PFM-1), MN080496 (PFM-2), and MN080497 (PFM-3). The sequence

of the whole genome of P. synxantha strain MCP-106 has been deposited under GenBank accession

number VSRO00000000.1, BioProject accession no. PRJNA561277, and BioSample accession no.

SAMN12612925.

6.4 References

1. Jeon J, Lee JH, Lee JJ, Park KS, Karim AM, Lee CR, Jeong BC, Lee SH. 2015. Structural basis for

carbapenem-hydrolyzing mechanisms of carbapenemases conferring antibiotic resistance. Int J Mol

Sci 16:9654–9692. https://doi.org/10.3390/ijms16059654.

2. Palzkill T. 2013. Metallo-β-lactamase structure and function. Ann N Y Acad Sci 1277:91–104.

https://doi.org/10.1111/j.1749-6632.2012.06796.x.

3. Cornaglia G, Giamarellou H, Rossolini GM. 2011. Metallo-β-lactamases: a last frontier for β-lactams?

Lancet Infect Dis 11:381–393. https://doi.org/10.1016/S1473-3099(11)70056-1.

4. Hernandez Valladares M, Felici A, Weber G, Adolph HW, Zeppezauer M, Rossolini GM, Amicosante

G, Frère JM, Galleni M. 1997. Zn(II) dependence of the Aeromonas hydrophila AE036 metallo-β-

lactamase activity and stability. Biochemistry 36:11534–11541. https://doi.org/10.1021/bi971056h.

5. Segatore B, Massidda O, Satta G, Setacci D, Amicosante G. 1993. High specificity of cphA-encoded

metallo-β-lactamase from Aeromonas hydrophila AE036 for carbapenems and its contribution to β-

lactam resistance. Antimicrob Agents Chemother 37:1324–1328.

https://doi.org/10.1128/aac.37.6.1324.

6. Crawford PA, Sharma N, Chandrasekar S, Sigdel T, Walsh TR, Spencer J, Crowder MW. 2004. Over-

expression, purification, and characterization of metallo-β-lactamase ImiS from Aeromonas veronii bv.

sobria. Protein Expr Purif 36:272–279. https://doi.org/10.1016/j.pep.2004.04.017.

7. Walsh TR, Gamblin S, Emery DC, MacGowan AP, Bennett PM. 1996. Enzyme kinetics and

biochemical analysis of ImiS, the metallo-β-lactamase from Aeromonas sobria. J Antimicrob

Chemother 37:423–441. https://doi.org/10.1093/jac/37.3.423.

8. Yang Y, Bush K. 1996. Biochemical characterization of the carbapenem-hydrolyzing β-lactamase

AsbM1 from Aeromonas sobria AER 14M: a member of a novel subgroup of metallo-β-lactamases.

FEMS Microbiol Lett 137:193–200. https://doi.org/10.1111/j.1574-6968.1996.tb08105.x.


9. Saavedra MJ, Peixe L, Sousa JC, Henriques I, Alves A, Correia A. 2003. Sfh-I, a subclass B2 metallo-β-

lactamase from a Serratia fonticola environmental isolate. Antimicrob Agents Chemother 47:2330–2333.

https://doi.org/10.1128/aac.47.7.2330-2333.2003.

10. Koh TH, Wang GCY, Sng L-H. 2004. IMP-1 and a novel metallo-β-lactamase, VIM-6, in fluorescent

pseudomonads isolated in Singapore. Antimicrob Agents Chemother 48:2334–2336.

https://doi.org/10.1128/AAC.48.6.2334-2336.2004.

11. Pellegrini C, Mercuri PS, Celenza G, Galleni M, Segatore B, Sacchetti E, Volpe R, Amicosante G,

Perilli M. 2009. Identification of blaIMP-22 in Pseudomonas spp. in urban wastewater and nosocomial

environments: biochemical characterization of a new IMP metallo-enzyme variant and its genetic

location. J Antimicrob Chemother 63:901–908. https://doi.org/10.1093/jac/dkp061.

12. Lee K, Kim CK, Yong D, Yum JH, Chung MH, Chong Y, Thaller MC, Rossolini GM. 2012. POM-1

metallo-β-lactamase-producing Pseudomonas otitidis isolate from a patient with chronic otitis media.

Diagn Microbiol Infect Dis 72:295–296. https://doi.org/10.1016/j.diagmicrobio.2011.11.007.

13. Borgianni L, De Luca F, Thaller MC, Chong Y, Rossolini GM, Docquier JD. 2015. Biochemical

characterization of the POM-1 metallo-β-lactamase from Pseudomonas otitidis. Antimicrob Agents

Chemother 59:1755–1758. https://doi.org/10.1128/AAC.03843-14.

14. Suzuki M, Suzuki S, Matsui M, Hiraki Y, Kawano F, Shibayama K. 2014. A subclass B3 metallo-β-

lactamase found in Pseudomonas alcaligenes. J Antimicrob Chemother 69:1430–1432.

https://doi.org/10.1093/jac/dkt498.

15. Garrido-Sanz D, Arrebola E, Martínez-Granero F, García-Méndez S, Muriel C, Blanco-Romero E,

Martín M, Rivilla R, Redondo-Nieto M. 2017. Classification of isolates from the Pseudomonas

fluorescens complex into phylogenomic groups based in group-specific markers. Front Microbiol

8:413. https://doi.org/10.3389/fmicb.2017.00413.

16. Scales BS, Dickson RP, LiPuma JJ, Huffnagle GB. 2014. Microbiology, genomics, and clinical

significance of the Pseudomonas fluorescens species complex, an unappreciated colonizer of humans.

Clin Microbiol Rev 27:927–948. https://doi.org/10.1128/CMR.00044-14.

17. Pierrard A, Ledent P, Docquier JD, Feller G, Gerday C, Frère JM. 1998. Inducible class C β-

lactamases produced by psychrophilic bacteria. FEMS Microbiol Lett 161:311–315.

https://doi.org/10.1111/j.1574-6968.1998.tb12962.x.


18. Girlich D, Poirel L, Nordmann P. 2010. Novel Ambler class A carbapenem-hydrolyzing β-lactamase

from a Pseudomonas fluorescens isolate from the Seine River, Paris, France. Antimicrob Agents

Chemother 54:328–332. https://doi.org/10.1128/AAC.00961-09.

19. Nordmann P, Poirel L, Dortet L. 2012. Rapid detection of carbapenemase-producing

Enterobacteriaceae. Emerg Infect Dis 18:1503–1507. https://doi.org/10.3201/eid1809.120355.

20. Anzai Y, Kim H, Park JY, Wakabayashi H, Oyaizu H. 2000. Phylogenetic affiliation of the

pseudomonads based on 16S rRNA sequence. Int J Syst Evol Microbiol 50:1563–1589.

https://doi.org/10.1099/00207713-50-4-1563.

21. McLean JS, Beveridge TJ, Phipps D. 2000. Isolation and characterization of a chromium-reducing

bacterium from a chromated copper arsenate-contaminated site. Environ Microbiol 2:611–619.

https://doi.org/10.1046/j.1462-2920.2000.00143.x.

22. Badar U, Ahmed N, Beswick AJ, Pattanapipitpaisal P, Macaskie LE. 2000. Reduction of chromate

by microorganisms isolated from metal contaminated sites of Karachi, Pakistan. Biotechnol Lett

22:829–836. https://doi.org/10.1023/A:1005649113190.

23. Wechter WP, Begum D, Presting G, Kim JJ, Wing RA, Kluepfel DA. 2002. Physical mapping, BAC-

end sequence analysis, and marker tagging of the soilborne nematicidal bacterium, Pseudomonas

synxantha BG33R. OMICS 6:11–21. https://doi.org/10.1089/15362310252780807.

24. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI,

Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA. 2012.

SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput

Biol 19:455–477. https://doi.org/10.1089/cmb.2012.0021.

25. Dabboussi F, Hamze M, Elomari M, Verhille S, Baida N, Izard D, Leclerc H. 1999. Pseudomonas

libanensis sp. nov., a new species isolated from Lebanese spring waters. Int J Syst Bacteriol 49:1091–

1101. https://doi.org/10.1099/00207713-49-3-1091.

26. Massidda O, Rossolini GM, Satta G. 1991. The Aeromonas hydrophila cphA gene: molecular

heterogeneity among class B metallo-β-lactamases. J Bacteriol 173:4611–4617.

https://doi.org/10.1128/jb.173.15.4611-4617.1991.

27. Garau G, García-Sáez I, Bebrone C, Anne C, Mercuri P, Galleni M, Frère JM, Dideberg O. 2004.

Update of the standard numbering scheme for class B β-lactamases. Antimicrob Agents Chemother

48:2347–2349. https://doi.org/10.1128/AAC.48.7.2347-2349.2004.


28. Nielsen KM, Smalla K, van Elsas JD. 2000. Natural transformation of Acinetobacter sp. strain

BD413 with cell lysates of Acinetobacter sp., Pseudomonas fluorescens, and Burkholderia cepacia in

soil microcosms. Appl Environ Microbiol 66:206–212.

https://doi.org/10.1128/aem.66.1.206-212.2000.

29. D’Costa VM, King CE, Kalan L, Morar M, Sung WWL, Schwarz C, Froese D, Zazula G, Calmels F,

Debruyne R, Golding GB, Poinar HN, Wright GD. 2011. Antibiotic resistance is ancient. Nature

477:457–461. https://doi.org/10.1038/nature10388.

30. Fonseca F, Sarmento AC, Henriques I, Samyn B, van Beeumen J, Domingues P, Domingues MR,

Saavedra MJ, Correia A. 2011. Biochemical characterization of Sfh-I, a subclass B2 metallo-β-

lactamase from Serratia fonticola UTAD54. Antimicrob Agents Chemother 55:5392–5395.

https://doi.org/10.1128/AAC.00429-11.

31. Vanhove M, Zakhem M, Devreese B, Franceschini N, Anne C, Bebrone C, Amicosante G, Rossolini

GM, Van Beeumen J, Frère JM, Galleni M. 2003. Role of Cys221 and Asn116 in the zinc-binding sites

of the Aeromonas hydrophila metallo-β-lactamase. Cell Mol Life Sci 60:2501–2509.

https://doi.org/10.1007/s00018-003-3092-x.

32. Sharma NP, Hajdin C, Chandrasekar S, Bennett B, Yang KW, Crowder MW. 2006. Mechanistic

studies on the mononuclear ZnII-containing metallo-β-lactamase ImiS from Aeromonas sobria.

Biochemistry 45:10729–10738. https://doi.org/10.1021/bi060893t.


CHAPTER 7 : Summary and perspectives

7.1 Summary

A general description of the main topics in this work is provided in CHAPTER 1: the AMR challenge,

with particular focus on the species K. pneumoniae and A. baumannii, and the potential role of WGS

in improving diagnostics and surveillance.

While antibiotics remain the main antibacterial agents for the treatment of bacterial

infections, an increasing number of bacteria are becoming resistant to them, complicating

treatment. Carbapenems are highly effective antibiotics commonly used to treat severe

infections caused by MDR bacteria, which are resistant to first-line antibiotics.

Of major concern, carbapenem resistance is on the rise, and strains carrying mobile genetic

determinants of carbapenem resistance are the leading cause of nosocomial outbreaks. In some

countries the prevalence of carbapenem resistance is so high that other drugs, usually reserved as last

options, are widely used. For example, colistin, an old drug that had fallen out of use because of its toxicity, is

now commonly used in some countries, and resistance to this antibiotic is on the rise.

Of the several pathogens associated with AMR, carbapenem-resistant K. pneumoniae and A.

baumannii represent major concerns. Both pathogens frequently cause outbreaks of infection, and

strains resistant to all available antibiotics are emerging. In K. pneumoniae, a

novel kind of superbug has recently emerged. MDR K. pneumoniae clones causing hospital

outbreaks and hypervirulent, drug-susceptible clones causing severe community-acquired infections

used to be two separate concerns, but the two traits are now converging. Acquisition of

hypervirulence genes by MDR clones and of resistance genes by hypervirulent clones has been observed,

especially in Asia. Tracking the emergence and evolution of such novel clones, causing

severe infections with limited treatment options, is fundamental.

The decreasing cost of WGS is allowing its increased implementation in bacterial diagnostics.

Surveillance, outbreak investigation and phenotype prediction, in particular for AMR and virulence

determinants, are some of the major applications of WGS in the clinical microbiology laboratory.

Despite an increasing number of studies, there is still a lack of surveillance investigations for last-line

resistance mechanisms and for convergence of resistance and hypervirulence traits. Moreover, while

the phenotype prediction from the genomic data showed encouraging results, the understanding of

the genetic resistance mechanisms of some drugs, such as colistin, is still limited, and novel in silico

tools for the phenotype prediction are needed.


The first aim of this work was to employ WGS to characterize collections of clinical colistin-resistant

isolates from countries where the carbapenem-resistance rate is very high and colistin often

represents the last treatment option. In CHAPTER 2 we analysed forty-five colistin-resistant K.

pneumoniae strains from Serbia, collected during 2013-2017 from seven Serbian medical settings

covering the entire country. WGS showed the absence of acquired colistin resistance mechanisms,

while alterations in the mgrB gene, involved in LPS modifications, were observed in all strains. Such

modifications were confirmed by mass spectrometry, which revealed addition of LAra4N to the LPS,

consistent with MgrB inactivation. Genomic epidemiology investigations revealed the abundance of

an emerging ‘high-risk’ clone, ST101, which was observed in most of the cities involved in the study,

demonstrating its high endemicity. Interestingly, ST101 strains carried the carbapenemase-encoding

gene blaOXA-48; however, this gene was not embedded in the usual OXA-48 plasmid. In order to

decipher the blaOXA-48 genetic background, we performed long-read sequencing with the ONT

MinION instrument and obtained the full plasmid sequence. This plasmid was novel, likely

resulting from the recombination of two previously described plasmids. Compared to a classic OXA-

48 plasmid, it had different plasmid replicons and carried several other AMR genes, including the

ESBL-encoding gene blaCTX-M-15.

In CHAPTER 3 we studied a collection of carbapenem- and colistin-resistant A. baumannii clinical

isolates obtained during 2015-2017 from several Greek hospitals. WGS revealed that the strains

belonged to one of the two major international clones (IC1 or IC2), with a clear predominance of IC2

which is replacing IC1 globally. Interestingly, we observed the same colistin resistance-associated

mutation in all the strains from both ICs, namely an amino acid substitution in PmrB, a

regulator involved in LPS modifications. This mutation was associated with low-level colistin

resistance. In some strains, additional mutations in either PmrA or PmrB further decreased the

colistin susceptibility, leading to high-level colistin resistance. Interestingly, mass spectrometry

analysis of LPS detected modifications in both colistin-resistant and susceptible strains. This finding

indicates that other, still unknown factors may be needed for a resistant phenotype. Overall, we

observed convergent evolution of different clonal lineages towards the same colistin resistance

mechanism, indicating that this mechanism may not have a major impact on strain fitness.

Given the frequent emergence of novel AMR mechanisms and high-risk clones from Asia, in CHAPTER

4 we employed WGS to study the evolution and epidemiology of a large collection (N=300) of K.

pneumoniae isolates from the H301 hospital in Beijing, China. The isolates were of clinical origin and

obtained during the period 2002-2016. Of those, 200 were randomly selected, aiming to study the

population structure within the hospital during the time period. We observed an increase in

carbapenem resistance during the study period, driven by carbapenemase production by strains

mostly belonging to the globally dominant CG258 clone. Hypervirulent strains causing severe

infections were also observed, and were mainly represented by the CG23 clone. Interestingly, we

also detected eleven cases, corresponding to 5.5% overall, of simultaneous carriage of AMR genes

(ESBLs or carbapenemases) and hypervirulence genes. Tracking the emergence and evolution of such

strains, causing severe infections with limited treatment options, is fundamental in order to

understand their origin, possible further evolution and to limit their spread.

In CHAPTER 5 we described the performance and interpretability of a machine learning (ML)

algorithm for the genome-based prediction of antimicrobial susceptibilities. While several algorithms

with gold-standard performances were built and successfully tested on different bacterial species,

such methods are not interpretable, as the predictive genetic features are not revealed. Our

approach was tested on a panel of K. pneumoniae genomes, with state-of-the-art predictive

performances while also revealing the underlying resistance mechanisms. By enhancing the

interpretability of in silico prediction models, such approach improves their clinical utility, hence

facilitating their adoption in routine diagnostics by clinicians and microbiologists.
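To illustrate what "interpretable" means in this context, a minimal sketch follows. It is not the algorithm of CHAPTER 5; it is a deliberately simple single-feature rule search over presence/absence features, and the feature names and toy data are hypothetical.

```python
from typing import Dict, List, Tuple


def best_single_feature_rule(
    genomes: List[Dict[str, bool]], resistant: List[bool]
) -> Tuple[str, float]:
    """Return (feature, accuracy) for the rule 'feature present => resistant'
    that agrees best with the observed phenotypes."""
    if not genomes or len(genomes) != len(resistant):
        raise ValueError("need one phenotype label per genome")
    features = sorted({name for genome in genomes for name in genome})
    best: Tuple[str, float] = ("", 0.0)
    for name in features:
        # Count genomes where presence of the feature matches the phenotype.
        correct = sum(g.get(name, False) == r for g, r in zip(genomes, resistant))
        accuracy = correct / len(genomes)
        if accuracy > best[1]:
            best = (name, accuracy)
    return best


# Toy data, made up for illustration: four genomes and a colistin phenotype.
toy_genomes = [
    {"blaKPC": True, "mgrB_disrupted": True},
    {"blaKPC": True, "mgrB_disrupted": False},
    {"blaKPC": False, "mgrB_disrupted": True},
    {"blaKPC": False, "mgrB_disrupted": False},
]
toy_resistant = [True, False, True, False]
print(best_single_feature_rule(toy_genomes, toy_resistant))
```

The point of the sketch is that the model output names the genetic feature driving the prediction, which is what allows a clinician or microbiologist to check it against known resistance mechanisms.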

Finally, in CHAPTER 6 we employed WGS to decipher the carbapenem resistance mechanism of an

environmental P. fluorescens isolate. WGS revealed the presence of a putative carbapenemase-

encoding gene. The gene was cloned in a plasmid vector and the carbapenemase, PFM-1, was

expressed in an E. coli laboratory strain. The hydrolytic activity of the carbapenemase was also tested and

compared with that of similar carbapenemases. Bioinformatics analysis further revealed

the presence of this carbapenemase-encoding gene in other environmental strains, and allowed us to

study its genetic environment. Although the gene was observed in environmental isolates, it could

mobilize to successful mobile genetic elements and spread to clinically relevant pathogens, as

previously reported for the most common clinically relevant AMR mechanisms.

7.2 General discussion and future perspectives

WGS is a powerful tool for both the practical monitoring and the more theoretical study of bacterial

epidemiology since it provides a comprehensive picture of bacterial populations in a single uniform

assay that can be used for all microbial species indiscriminately. WGS allows the simultaneous

detection and identification of species, distinction of strains within a species, characterisation of

lineages, indirect assessment of capsular and other phenotypes, the definition of core and accessory

AMR and virulence determinants, and it also allows for the unravelling of high-resolution strain

relatedness with which to assess evidence of microbial transmission.


Given the genetic diversity and complexity of K. pneumoniae and A. baumannii as clinical pathogens,

WGS is rapidly becoming a fundamental tool for epidemiology and surveillance of such pathogens.

Genomic surveillance has already elucidated patterns of clonal spread at local, regional and global levels

and uncovered insights into the burden and geographic spread of K. pneumoniae and A. baumannii

HAIs 1–6. In particular, detecting novel AMR mechanisms, monitoring the emergence and

dissemination of high-risk clones, tracking the convergence of AMR and hypervirulence traits, and

exploring links between clinical infections and potential ecological reservoirs (the environment and

zoonoses being the two main examples of such reservoirs) are among the major WGS-resolvable

issues for genomic surveillance.

We successfully exploited WGS in order to study the presence and dynamics of AMR and virulence

genes, the population structure and the genomic epidemiology of clinical isolates of the two

aforementioned species from European and Asian countries, especially in regions and institutes

where AMR levels are very high. A major limitation of our studies was that in some cases the isolate

selection criteria were not optimally designed and our study design had to be somewhat opportunistic. For

instance, studies of colistin-resistant isolates should also have included several colistin-susceptible

isolates from the same time period and geographic locales for comparative purposes. Concerning the

longitudinal study of K. pneumoniae isolates within the Chinese hospital H301, randomly selecting

isolates during a time frame of 15 years led to an overly diverse population, including several isolates

with little clinical relevance. An alternative would have been to select only MDR

isolates and hypermucoviscous isolates, the latter identified by a positive string test. Such a

collection would be focused on the characterisation of hypervirulence, MDR and the convergence of

the two traits, which is an emerging and serious threat, especially in Asia 3.

Another limitation of our studies was that patient information was generally scarce, and this was

not only due to patient data privacy. For instance, data about colistin

administration were not available for patients included in our Greek and Serbian studies on

colistin resistance (Chapters 2 and 3). Concerning the Chinese H301 study (Chapter 4), the scarcity of

patient data did not allow outbreaks within the hospital to be investigated, a task for which WGS has proven

well suited 7.

The usefulness of WGS for outbreak investigations transcends its initial purpose. Local hospital

outbreak studies have shown for example that MDR strains are more transmissible than susceptible

ones in hospitals 2,8 and revealed hospital plumbing systems as a source of prolonged outbreaks 9,

providing useful information for intervention and prevention strategies.


WGS investigations also underscored the importance of the dynamics of plasmids and other mobile

genetic elements in hospital outbreaks 10. In particular, K. pneumoniae was shown to play a leading

role as both donor and recipient for mobilisable AMR genes 9–12. The increasing awareness of the

potential for so-called plasmid outbreaks is transforming our vision on how screening and

surveillance of pathogens should be handled, moving the focus from individual pathogens to the

enzymes and plasmids that make the pathogen a threat.

Therefore, plasmid analysis is of major importance, and resolving the dynamics of mobile genetic

elements represents the next step to further improve genomic surveillance. Obtaining complete

assemblies is a major prerequisite for studying plasmids. The low cost and high accuracy of

Illumina short-read sequencing technology make it well suited to high-throughput bacterial

genomics. This led Illumina sequencing to become the dominant technology for WGS of bacterial

pathogens 13. However, short read sequencing cannot resolve all genomic repeats, which are

particularly abundant in mobile genetic elements. Therefore, fragmented assemblies of hundreds of

contigs are often the best achievable outcome. Where needed, short-read sequencing can be

combined with long-read sequencing data obtained with technologies such as PacBio SMRT and ONT

nanopore sequencing 14,15, resulting in the definition of more complete genomes. We obtained the

complete genome of a K. pneumoniae strain by combining the Illumina short reads with the ONT

MinION long-reads (Chapter 2). While this approach is known to be quite costly, multiplexing can

result in a final complete assembly for about 150 USD on a per strain basis 15. Therefore, the selection

of a subset of representative strains for long-read sequencing for genomic surveillance studies,

especially involving outbreaks, should be a major priority. This will limit the overall number of contigs

and facilitate inter-genomic comparisons, since alignment between genomes will become

easier.

Currently, WGS in clinical or public health microbiology laboratories is mainly used as a typing tool to

inform on the genetic relatedness of strains, which can be used for local and global epidemiological

studies. However, WGS also holds considerable promise for AST 16. Sequence data can be queried to

identify the presence of both acquired antibiotic resistance genes and chromosomal mutations that

contribute to antibiotic resistance.
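As a rough illustration of such a query, the sketch below maps detected determinants to the drug classes they are expected to affect. The catalogue is a tiny illustrative example built from determinants mentioned in this thesis; the function name, the determinant labels and the data structure are assumptions, not an existing tool or database.

```python
from typing import Dict, Iterable, Set

# Illustrative catalogue only: determinant -> drug class it is assumed to affect.
KNOWN_DETERMINANTS: Dict[str, str] = {
    "blaOXA-48": "carbapenems",                        # acquired carbapenemase gene
    "blaCTX-M-15": "third-generation cephalosporins",  # acquired ESBL gene
    "mgrB_inactivation": "colistin",                   # chromosomal alteration
}


def predict_resistance(detected: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each drug class to the detected determinants that support a resistance call."""
    calls: Dict[str, Set[str]] = {}
    for determinant in detected:
        drug_class = KNOWN_DETERMINANTS.get(determinant)
        if drug_class is not None:  # determinants not in the catalogue are ignored
            calls.setdefault(drug_class, set()).add(determinant)
    return calls


# Hypothetical genome annotation, for illustration.
print(predict_resistance(["blaOXA-48", "mgrB_inactivation", "unknown_gene"]))
```

A catalogue-based approach of this kind can only report mechanisms that are already known, which is one reason why drugs with incompletely understood resistance mechanisms, such as colistin, remain difficult to predict.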

We were able to correlate the AST data with genomic data, also for some drugs where the underlying

genetic AMR mechanisms are still not completely understood, such as in the case of colistin. ML

algorithms for in silico prediction of antimicrobial susceptibilities are also promising, and we showed

that the data interpretability could still be augmented (Chapter 5). Unfortunately, we were only able

to test our ML algorithm on a single species. Extending the test to other bacterial species is an urgent

and mandatory next step, and focusing on A. baumannii will be a priority, given that only a few

reports on genomic AMR detection exist for this pathogen 17,18.

An overall limitation of our work (and, more generally, of similar work by others) was that the

genome collections were relatively small and focused on limited geographical locations. A further

improvement would be to expand the genome collection. ML algorithms generally require big

databases where genome sequences are combined with their respective AST data 19. While there is a

lot of public data available for several species/drug combinations 20,21, this is not the case for recently

introduced drugs or for drugs requiring a testing method that cannot be automated. Therefore,

testing ML algorithms in the prediction of antimicrobial susceptibilities of such molecules is today not

possible. Efforts should be made to collect existing and prospective data on such drugs in order to

create an exhaustive database suitable for ML testing, also potentially enabling the discovery of

novel AMR mechanisms.

Finally, a next step in our work will be to translate the pipeline from the R language to the Python

language. While R is one of the most popular programming languages, it is rapidly being replaced by Python

among researchers. In general, bioinformatics is a new scientific domain. No globally

accepted pipelines are available for either epidemiological analyses or the establishment of resistance and

virulence profiles. Different researchers use different systems, which does not promote

standardisation and common use of certain tools. The few comparative studies available show that

this is risky and that different bioinformatic tools used for the analysis of the same dataset will lead

to different interpretations 22. This aspect should be taken very seriously and further research in tool

comparison is much needed. Resolving the irreproducibility will in the end allow the development of

diagnostic assays that will withstand scrutiny by regulatory agencies such as the FDA.

WGS is enabling the discovery of novel AMR mechanisms with ease and at an unprecedented speed,

also uncovering novel reservoirs of AMR genes in the environment and the animal domain. We

identified and characterized a novel family of carbapenemase enzymes produced by environmental

isolates (Chapter 6). It is well known that environmental bacterial species constitute an important

reservoir of antimicrobial resistance genes 23. More research is needed to fill knowledge gaps and

assess the potential risk antimicrobials and resistant bacteria in the environment pose to human

health and the broader environmental ecosystem. Though this was not the main purpose of this

thesis, a next step should be to exploit metagenomics, the study of genetic material recovered

directly from environmental samples or essentially any niche specific sample containing microbes,

which was already previously demonstrated to be a fundamental tool in elucidating the strong

correlation between AMR and the environment 24.


Finally, the insights obtained in the present thesis add to existing knowledge,

allowing a step forward towards the complete implementation of WGS in clinical microbiology

laboratories and towards a WGS-based antimicrobial susceptibility test. The major obstacles that still

exist include the current lack of rapidity, the still elevated costs of sequencing-based assays, the

fact that the technologies are still developing rapidly and, consequently, the lack of generally

accepted FDA-approved tests. The diagnostic microbiology community is aware of these fundamental

problems, and many initiatives to solve them are underway. The coming 5 years will

see solutions appear and there will be rapid, cheap and reproducible tests available for all to use.

Such tests are going to be based upon NGS technology solely and will have a major positive impact

on the quality of the diagnostic laboratory.

7.3 References

1. Long SW, Olsen RJ, Eagar TN, et al. Population genomic analysis of 1,777 extended-spectrum beta-

lactamase-producing Klebsiella pneumoniae isolates, Houston, Texas: Unexpected abundance of

clonal group 307. MBio 2017; 8.

2. David S, Reuter S, Harris SR, et al. Epidemic of carbapenem-resistant Klebsiella pneumoniae in

Europe is driven by nosocomial spread. Nat Microbiol 2019.

3. Wyres KL, Nguyen TNT, Lam MMC, et al. Genomic surveillance for hypervirulence and multi-drug

resistance in invasive Klebsiella pneumoniae from South and Southeast Asia. Genome Med 2020; 12:

11.

4. Heinz E, Brindle R, Morgan-McCalla A, Peters K, Thomson NR. Caribbean multi-centre study of

Klebsiella pneumoniae: whole-genome sequencing, antimicrobial resistance and virulence factors.

Microb Genomics 2019.

5. Ellington MJ, Heinz E, Wailan AM, et al. Contrasting patterns of longitudinal population dynamics

and antimicrobial resistance mechanisms in two priority bacterial pathogens over 7 years in a single

center. Genome Biol 2019; 20: 1–16.

6. Wright MS, Haft DH, Harkins DM, et al. New Insights into Dissemination and Variation of the

Health Care- Associated Pathogen Acinetobacter baumannii from Genomic Analysis. 2014; 5: 1–13.

7. Quainoo S, Coolen JPM, van Hijum SAFT, et al. Whole-genome sequencing of bacterial pathogens:

The future of nosocomial outbreak analysis. Clin Microbiol Rev 2017; 30: 1015–63.

8. Gorrie CL, Mirceta M, Wick RR, et al. Antimicrobial-Resistant Klebsiella Pneumoniae Carriage and


Infection in Specialized Geriatric Care Wards Linked to Acquisition in the Referring Hospital. Clin

Infect Dis 2018; Jul 15; 67.

9. Weingarten RA, Johnson RC, Conlan S, et al. Genomic analysis of hospital plumbing reveals diverse

reservoir of bacterial plasmids conferring carbapenem resistance. MBio 2018; 9.

10. Martin J, Phan HTT, Findlay J, et al. Covert dissemination of carbapenemase-producing Klebsiella

pneumoniae (KPC) in a successfully controlled outbreak: Long- and short-read whole-genome

sequencing demonstrate multiple genetic modes of transmission. J Antimicrob Chemother 2017; 72:

3025–34.

11. Conlan S, Park M, Deming C, et al. Plasmid dynamics in KPC-positive Klebsiella pneumoniae during

long-term patient colonization. MBio 2016; 7.

12. Sheppard AE, Stoesser N, Wilson DJ, et al. Nested Russian doll-like genetic mobility drives rapid

dissemination of the carbapenem resistance gene blaKPC. Antimicrob Agents Chemother 2016; 60:

3767–78.

13. Kwong JC, McCallum N, Sintchenko V, Howden BP. Whole genome sequencing in clinical and

public health microbiology. Pathology 2015; 47: 199–210.

14. Zhang L, Li Y, Shen W, Wang SM, Wang G, Zhou Y. Whole-genome sequence of a carbapenem-

resistant hypermucoviscous Klebsiella pneumoniae isolate SWU01 with capsular serotype K47

belonging to ST11 from a patient in China. J Glob Antimicrob Resist 2017; 11: 87–9.

15. Wick RR, Judd LM, Gorrie CL, Holt KE. Completing bacterial genome assemblies with multiplex

MinION sequencing. Microb Genomics 2017: 0–6.

16. Schürch AC, van Schaik W. Challenges and opportunities for whole-genome sequencing–based

surveillance of antibiotic resistance. Ann N Y Acad Sci 2017; 1388: 108–20.

17. Ellington MJ, Ekelund O, Aarestrup FM, et al. The role of whole genome sequencing in

antimicrobial susceptibility testing of bacteria: report from the EUCAST Subcommittee. Clin Microbiol

Infect 2017; 23: 2–22.

18. Drouin A, Letarte G, Raymond F, Marchand M, Corbeil J, Laviolette F. Interpretable genotype-to-

phenotype classifiers with performance guarantees. Sci Rep 2019; 9.

19. Su M, Satola SW, Read TD. Genome-based prediction of bacterial antibiotic resistance. J Clin

Microbiol 2019; 57: 1–15.

20. Walker TM, Kohl TA, Omar SV, et al. Whole-genome sequencing for prediction of

Mycobacterium tuberculosis drug susceptibility and resistance: A retrospective cohort study. Lancet

Infect Dis 2015; 15: 1193–202.

21. Gordon NC, Price JR, Cole K, et al. Prediction of Staphylococcus aureus Antimicrobial Resistance

by Whole-Genome Sequencing. J Clin Microbiol 2014; 52: 1182–91.

22. Doyle RM, O’Sullivan DM, Aller SD, et al. Discordant bioinformatic predictions of

antimicrobial resistance from whole-genome sequencing data of bacterial isolates: an inter-

laboratory study. Microb Genomics 2020; 6: e000335.

23. D’Costa VM, King CE, Kalan L, et al. Antibiotic resistance is ancient. Nature 2011; 477: 457–61.

24. Forbes JD, Knox NC, Ronholm J, Pagotto F, Reimer A. Metagenomics: The next culture-

independent game changer. Front Microbiol 2017; 8.

Acknowledgments

Firstly, I would like to express my sincere gratitude to my supervisors Prof. Alex van Belkum and Dr.

Caroline Mirande for the continuous support of my PhD study and related research, for their

patience, motivation, and valuable suggestions. Their guidance helped me throughout the research

and the writing of this thesis. I would also like to thank the bioMérieux employees who facilitated

my access to the research facilities and laboratory.

I thank the members of the doctoral committee, Dr. Arvid Suls, Prof. Annelies Van Rie, Prof. Herman

Goossens and Dr. Pieter Moons, for reading my work and for their time. A special thanks to Prof.

Herman Goossens, my promotor, for allowing me to get through my PhD and for his support, and to

Dr. Pieter Moons, for the great ND4ID project organization. I also thank all the people involved in the

ND4ID project, including PIs and PhD students.

I thank the other PhD students in bioMérieux, Andreu, Manisha and Rucha, for their support and

chats.

My sincere thanks also go to Prof. Marco Maria D’Andrea and Prof. Gian Maria Rossolini for the

nice collaborations, for all the suggestions and writing tips.

I am very grateful to Dr. Kelly Wyres and Prof. Kathryn Holt for the wonderful collaboration, for their

great suggestions and for helping me make sense of the research results.

I would also like to thank Dr. Laurent Poirel and Prof. Patrice Nordmann, who gave me the

opportunity to join their team for a few months and gave me access to their laboratory and

research facilities. Thanks also to all the lab members who helped me during that time.

I thank Dr. Pierre Mahé and Dr. Magali Jaillard for involving me in a wonderful project and teaching

me new things.

I thank Prof. Ivana Cirkovic, Dr. Nicholas Legakis and Prof. Luo Yan Ping for providing me with the

clinical isolates and for their involvement in the writing of the manuscripts.

Finally, I am grateful to my family, friends and acquaintances for supporting me throughout the

writing of this thesis and in my life in general. A special thanks to my girlfriend, Amandine, for always

being there.
