
Complete genome sequence of the entomopathogenic and metabolically versatile soil bacterium Pseudomonas entomophila

Nicolas Vodovar¹, David Vallenet², Stéphane Cruveiller², Zoé Rouy², Valérie Barbe², Carlos Acosta¹, Laurence Cattolico², Claire Jubin², Aurélie Lajus², Béatrice Segurens², Benoît Vacherie², Patrick Wincker², Jean Weissenbach², Bruno Lemaitre¹, Claudine Médigue² & Frédéric Boccard¹

Pseudomonas entomophila is an entomopathogenic bacterium that, upon ingestion, kills Drosophila melanogaster as well as insects from different orders. The complete sequence of the 5.9-Mb genome was determined and compared to the sequenced genomes of four Pseudomonas species. P. entomophila possesses most of the catabolic genes of the closely related strain P. putida KT2440, reflecting its metabolic versatility and its soil lifestyle. Several features that probably contribute to its entomopathogenic properties were identified. Unexpectedly for an animal pathogen, P. entomophila lacks a type III secretion system and its associated toxins. Instead, it relies on a number of potential virulence factors, such as insecticidal toxins, proteases, putative hemolysins, hydrogen cyanide and novel secondary metabolites, to infect and kill insects. Genome-wide random mutagenesis revealed the major role of the two-component system GacS/GacA, which regulates most of the potential virulence factors identified.

Pseudomonas spp. are ubiquitous Gram-negative bacteria that colonize and survive in numerous ecological niches, including soil, water and plant surfaces. This versatility is reflected in the sizes of their genomes, which contain large sets of genes involved in carbon source utilization and adaptation. In 2001, we isolated Pseudomonas entomophila, a bacterial strain closely related to the saprophytic soil bacterium Pseudomonas putida. P. entomophila triggers a systemic immune response in D. melanogaster after ingestion¹. It is highly pathogenic for both D. melanogaster larvae and adults, and its persistence in larvae leads to massive destruction of gut cells¹.

Entomopathogenic bacteria have developed different strategies to interact with and kill insects². They include the Gram-negative bacteria Photorhabdus luminescens, Xenorhabdus nematophilus, Yersinia pestis, Serratia marcescens and Serratia entomophila, and the Gram-positive bacterium Bacillus thuringiensis. Some gene products derived from these bacteria, as well as the bacteria themselves, have been used to generate biopesticides³. P. entomophila can orally infect and kill larvae of insect species belonging to different orders. This makes it a promising model for the study of host-pathogen interactions and for the development of biocontrol agents against insect pests. To uncover features contributing to the entomopathogenic properties of P. entomophila, we determined its complete genome sequence. We also performed a genome-wide screen for mutants affected in their ability to trigger an immune response and lethality in D. melanogaster.

RESULTS

Genome features and comparative genomics

The P. entomophila genome is composed of a single circular chromosome of 5,888,780 base pairs (Fig. 1). Of the 5,169 coding sequences identified, 3,466 (67%) have been assigned a predicted function (Table 1). The P. entomophila genome is smaller than the six other published Pseudomonas genomes (Table 1). These are the genomes of the human opportunistic pathogen P. aeruginosa PAO1 (ref. 4), three P. syringae pathovars⁵⁻⁷, the plant commensal P. fluorescens Pf-5 (ref. 8) and the saprophytic soil bacterium P. putida KT2440 (ref. 9).
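As a quick consistency check, the rounded values in Table 1 (5.9 Mb; 67.1% of proteins with a predicted function) follow directly from the counts given in the paragraph above. The sketch below uses only those counts; the variable names are illustrative.

```python
# Counts quoted in the paragraph above.
genome_bp = 5_888_780
cds_total = 5_169
cds_with_function = 3_466

size_mb = genome_bp / 1_000_000                     # 5.88878
fraction_annotated = cds_with_function / cds_total  # about 0.6705

print(f"{size_mb:.1f} Mb; {100 * fraction_annotated:.1f}% with predicted function")
# prints: 5.9 Mb; 67.1% with predicted function
```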

GC skew analysis identified two replichores of similar size, in contrast to the unbalanced replichores found in the genomes of P. putida KT2440 (ref. 10) and P. aeruginosa PAO1 (ref. 4) (see Supplementary Fig. 1 online). This analysis was combined with the predicted locations of the origin of replication (oriC, near dnaA) and of the chromosome dimer resolution site (dif, in PSEEN2780). BLAST comparisons of the genomes of the five representative Pseudomonas species identified a set of 2,065 genes that constitutes the Pseudomonas core genome. Based on this analysis, we identified 1,002 genes unique to the P. entomophila genome. Consistent with the close relatedness between P. entomophila and P. putida¹, 70.2% of P. entomophila genes (3,630) have orthologs in the P. putida genome, and more than 96% of these are found in synteny (see Supplementary Table 1 online). The smaller size of the P. entomophila genome compared to those of other Pseudomonas does not seem to result from reductive evolution.
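GC skew is conventionally computed per window as (G − C)/(G + C); Figure 1 plots it in 1,000-bp windows. The sketch below shows that calculation and one common heuristic for placing the origin and terminus: the minimum and maximum of the cumulative skew curve. The two replichore lengths follow from those positions. The heuristic is an assumption here, not necessarily the authors' procedure, since they also used the positions of dnaA and dif, which this sketch ignores. The sequence is synthetic.

```python
def gc_skew(seq: str, window: int = 1000) -> list[float]:
    """GC skew (G - C) / (G + C) for consecutive non-overlapping windows."""
    seq = seq.upper()
    skews = []
    # Trailing bases shorter than one window are ignored.
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def replichores(seq: str, window: int = 1000) -> tuple[int, int, int, int]:
    """Estimate (origin, terminus, arm1, arm2) in bp from cumulative GC skew.

    Heuristic: cumulative skew is lowest at the origin and highest at the
    terminus. Positions are rounded to window boundaries.
    """
    cumulative, total = [], 0.0
    for s in gc_skew(seq, window):
        total += s
        cumulative.append(total)
    origin = (cumulative.index(min(cumulative)) + 1) * window
    terminus = (cumulative.index(max(cumulative)) + 1) * window
    length = len(seq)
    arm1 = (terminus - origin) % length  # origin -> terminus, clockwise
    return origin, terminus, arm1, length - arm1


# Synthetic 10-kb "chromosome": G-rich from 3 kb to 8 kb, C-rich elsewhere.
toy = "C" * 3000 + "G" * 5000 + "C" * 2000
print(replichores(toy))  # (3000, 8000, 5000, 5000): two equal replichores
```

Comparing `arm1` and `arm2` is one simple way to call replichores balanced or unbalanced. Real genomes give noisy curves, so smoothing or larger windows are often used.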

© 2006 Nature Publishing Group http://www.nature.com/naturebiotechnology

Received 30 January; accepted 7 April; published online 14 May 2006; doi:10.1038/nbt1212

¹Centre de Génétique Moléculaire, Centre National de la Recherche Scientifique, 91198 Gif-sur-Yvette, France. ²Genoscope, Centre National de Séquençage and CNRS-UMR8030, 2 rue Gaston Crémieux, 91057 Evry Cedex, France. Correspondence should be addressed to F.B.

NATURE BIOTECHNOLOGY ADVANCE ONLINE PUBLICATION 1

ARTICLES


Indeed, the 50 genes of P. entomophila that are present in other Pseudomonas but absent from P. putida belong to functional classes as diverse as those of the 34 genes of P. putida present in other Pseudomonas but absent from P. entomophila. Furthermore, comparison of gene contents indicates that P. putida has more species-specific genes than P. entomophila (1,774 versus 1,539), largely because it has more paralogous genes (Fig. 2 and Supplementary Table 2 online). We compared the chromosome structures of P. entomophila and P. putida KT2440 and performed scatter plot analysis of syntenic regions of the two strains. This revealed frequent genetic inversions that reverse the genomic sequence symmetrically across oriC, as observed in other bacterial genera¹¹ (Fig. 2 and Supplementary Fig. 2 online). The same rearrangement profile was observed when comparing the P. entomophila genome with those of other Pseudomonas spp., even though the levels of orthology and synteny were lower (see Supplementary Table 1 and Supplementary Fig. 2 online).

A search for repetitive extragenic palindromic sequences (REPs) identified 943 REPs similar to those found in the genomes of P. putida KT2440 (ref. 12) and P. fluorescens Pf-5 (ref. 8). Mobile genetic elements and bacteriophage insertions have remodeled the P. entomophila genome considerably less than the genomes of other environmental pseudomonads such as P. putida KT2440 and P. syringae pv. tomato DC3000 (Fig. 1). Particularly notable are three clustered prophages, related to FluMu phage, a pyocin-like phage and a lambdoid phage, respectively. They are inserted between recA and mutS, as observed for FluMu phage in the P. fluorescens Pf-5 genome. Also of particular interest are two putative prophages inserted in the genes encoding 4.5S RNA and tmRNA, respectively. The genome of P. entomophila contains only nine genes encoding transposase-like proteins, including three that are remnant or inactive. Unlike the genomes of P. putida KT2440 and P. syringae pv. tomato DC3000, the genome of P. entomophila is devoid of group II introns.
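Figure 2 defines orthologs as proteins sharing more than 60% identity over more than 80% of their length. It detects duplicates with a 35% identity constraint over the same coverage. The sketch below splits species-specific genes into unique and duplicated classes under those thresholds, starting from precomputed pairwise hits. Several details are assumptions: the `Hit` record, the single coverage value per hit, and the toy protein names. In practice the hits would come from BLAST output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One precomputed protein-protein alignment (hypothetical record type)."""
    query: str         # protein from the focal genome
    subject: str       # matched protein
    identity: float    # fraction of identical residues, 0-1
    coverage: float    # aligned fraction of protein length, 0-1 (assumed definition)
    same_genome: bool  # True if the subject belongs to the focal genome


def is_ortholog(h: Hit) -> bool:
    # Figure 2: >60% identity over >80% of the length, between the two genomes.
    return not h.same_genome and h.identity > 0.60 and h.coverage > 0.80


def is_duplicate(h: Hit) -> bool:
    # Figure 2: 35% identity over >80% of the length, within one genome
    # (treating 35% as an inclusive lower bound is an assumption).
    return (h.same_genome and h.query != h.subject
            and h.identity >= 0.35 and h.coverage > 0.80)


def classify(proteins: list[str], hits: list[Hit]) -> dict[str, str]:
    """Label each focal protein 'shared', 'specific-duplicated' or 'specific-unique'."""
    labels = {}
    for p in proteins:
        own = [h for h in hits if h.query == p]
        if any(is_ortholog(h) for h in own):
            labels[p] = "shared"
        elif any(is_duplicate(h) for h in own):
            labels[p] = "specific-duplicated"
        else:
            labels[p] = "specific-unique"
    return labels


# Toy data: pe* proteins from the focal genome, pp* from the comparison genome.
proteins = ["pe1", "pe2", "pe3", "pe4"]
hits = [
    Hit("pe1", "pp1", 0.85, 0.95, False),  # passes the ortholog thresholds
    Hit("pe2", "pp2", 0.45, 0.90, False),  # too divergent to count as an ortholog
    Hit("pe2", "pe3", 0.50, 0.85, True),   # in-genome paralog pair
    Hit("pe3", "pe2", 0.50, 0.85, True),
    Hit("pe4", "pp4", 0.70, 0.50, False),  # fails the coverage threshold
]
labels = classify(proteins, hits)
print(labels)
```

Tallying these labels by role category would give the kind of breakdown plotted in Figure 2b. A genome with many paralogs accumulates "specific-duplicated" labels, which is the effect the text attributes to P. putida.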

Toxins against insects

We used several criteria to identify genes that may contribute to the entomopathogenic properties of P. entomophila:

- specificity to the P. entomophila genome;
- localization within genomic islands suggestive of recent lateral acquisition, based on synteny breaks, atypical GC content and absence of REPs;
- similarity to genes associated with virulence in other systems (Table 2).
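Footnote b of Table 2 flags genes whose GC content differs by more than 1 s.d. from the average GC. The sketch below applies that single criterion, assuming the average and the population s.d. are taken over per-gene GC values; the paper does not specify the baseline or the s.d. estimator. The sequences are toy data.

```python
from statistics import mean, pstdev


def gc_content(seq: str) -> float:
    """Fraction of G + C in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def atypical_gc(genes: dict[str, str], n_sd: float = 1.0) -> list[str]:
    """Names of genes whose GC content deviates from the mean by more than n_sd s.d."""
    gc = {name: gc_content(s) for name, s in genes.items()}
    values = list(gc.values())
    mu = mean(values)
    sd = pstdev(values)  # population s.d. (assumption)
    return sorted(name for name, value in gc.items() if abs(value - mu) > n_sd * sd)


# Toy data: four genes at 70% GC and one AT-rich candidate island gene at 20% GC.
genes = {
    "geneA": "GCGCGCGATA",
    "geneB": "CGCGCGCTAT",
    "geneC": "GGGCCCCAAT",
    "geneD": "GCGCGCGATA",
    "islandX": "ATATATATGC",
}
print(atypical_gc(genes))  # ['islandX']
```

GC content alone is a weak signal, so it is combined with synteny breaks and REP absence rather than used in isolation.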

Particularly striking are three genes, absent from other Pseudomonas genomes, that encode proteins related to insecticidal toxin complexes. Such complexes have been found only in entomopathogenic enterobacteria such as Photorhabdus luminescens, Serratia entomophila and Xenorhabdus nematophilus, and in Yersinia spp.¹³,¹⁴. Three basic types of genetic


[Figure 1 image: circular map of the Pseudomonas entomophila chromosome (5,888,780 bp), with coordinate ticks every 500,000 bp]

Figure 1 Circular representation of the P. entomophila genome. The outer scale indicates coordinates in base pairs (bp). Circles 1 and 2 (from outside to inside) show predicted coding regions transcribed clockwise and counterclockwise, respectively. Coding sequences are color coded by role categories: salmon, amino acid biosynthesis; light blue, biosynthesis of cofactors, prosthetic groups and carriers; light green, cell envelope; red, cellular processes; brown, central intermediary metabolism; yellow, DNA metabolism; green, energy metabolism; purple, fatty acid and phospholipid metabolism; violet, mobile and extrachromosomal element functions; pink, protein synthesis and fate; orange, purines, pyrimidines, nucleosides and nucleotides; navy blue, regulatory functions and signal transduction; lime green, secondary metabolite biosynthesis; gray, transcription; teal, transport and binding proteins; black, unknown function and hypothetical proteins. Circle 3 shows rRNA genes in salmon, tRNA genes in green and miscellaneous RNA genes in blue. Circle 4 shows transposase genes, putative prophages and gene clusters encoding secondary metabolites, coded by colored symbols as follows: green arrowheads, transposases; gray, putative prophages; red, pyoverdine synthesis; light blue, cluster involved in lipopeptide II biosynthesis; violet, acinetobactin-like siderophore synthesis; light green, cluster involved in lipopeptide III biosynthesis; navy blue, cluster and isolated genes involved in lipopeptide I biosynthesis; pink, hydrogen cyanide production; brown, polyketide synthesis. Circle 5 shows the distribution of REPs. These repeats are scattered all over the genome and were found either as single elements, in paired elements or in clusters of up to six elements in alternating orientation. Circle 6 shows G+C in relation to the mean G+C in a 1,000-bp window. Circle 7 shows GC skew in a 1,000-bp window.

Table 1 General features of genomes of representative Pseudomonas species

General features | Pe | Ppᵃ | Pfᵃ | Paᵃ | Pstᵃ
Size (Mb) | 5.9 | 6.2 | 7.1 | 6.3 | 6.4
GC (%) | 64.2 | 61.6 | 63.3 | 66.6 | 58.4
Number of CDS | 5,169 | 5,420 | 6,144 | 5,570 | 5,615
Coding (%) | 89.1 | 87.7 | 88.8 | 89.0 | 86.8
rRNA operons | 7 | 7 | 5 | 4 | 5
tRNAs | 78 | 74 | 71 | 63 | 63
Proteins with predicted function (%) | 67.1 | 65.8 | 62.2 | 54.2 | 61.0
Proteins without predicted function:
Conserved hypothetical proteins (%) | 25.3 | 19.1 | 32.5 | 13.8 | 17.0
Hypothetical proteins (%) | 7.5 | 15.1 | 5.3 | 31.9 | 22.0

ᵃ The distributions of ORFs for the published chromosomes are derived from the original annotation. These numbers, particularly those of hypothetical and conserved hypothetical proteins, may differ from numbers obtained with updated BLAST searches and annotations. Features of the genomes of P. syringae pv. syringae B728a (6.1 Mb) and P. syringae pv. phaseolicola 1448A (5.9 Mb) are not indicated. CDS, coding sequences; Pe, P. entomophila; Pa, Pseudomonas aeruginosa; Pp, Pseudomonas putida; Pf, Pseudomonas fluorescens Pf-5; Pst, Pseudomonas syringae pv. tomato DC3000.


elements encode insecticidal toxin complexes: tcdA-, tcdB- and tccC-like genes. The P. entomophila genome encodes three TccC-type insecticidal toxins (PSEEN2485, PSEEN2697, PSEEN2788) (see Supplementary Fig. 3 online). Like the P. syringae genome, it also encodes proteins more distantly related to TccC-type toxins (PSEEN0701 and PSEEN0702) and to TcdB-type toxins (PSEEN1172). The three P. entomophila insecticidal toxins likely play a major role in its pathogenicity, because TccC and TcdB proteins have been shown to have entomocidal activity¹⁵,¹⁶. However, the molecular mechanisms remain to be characterized. These findings highlight the efficient spread of toxin-complex gene homologs among insect-interacting soil bacteria belonging to different genera.

Bacterial hemolysins are exotoxins that attack blood cell membranes and cause cell rupture by poorly defined mechanisms¹⁷. Unlike the other Pseudomonas tested, P. entomophila secretes a strong diffusible hemolytic activity (see Supplementary Fig. 4 online), which may also be involved in pathogenicity against D. melanogaster. We identified three genes unique to P. entomophila that may be responsible for this activity (Table 2). The gene encoding PSEEN3925, a putative repeats-in-toxin (RTX) protein, is clustered with genes encoding a type I secretion system. PSEEN0968 and PSEEN3843 encode proteins related to outer membrane autotransporters that have been associated with virulence in other bacteria. A number of lipases have also been shown to confer hemolytic activity. The P. entomophila genome encodes four lipases that are absent from P. putida KT2440 and that may contribute to its hemolytic activity (PSEEN0709, PSEEN1065, PSEEN2195, PSEEN3432). Interestingly, the gene encoding a lysophospholipase (PSEEN0709) is found in a genomic islet together with two genes encoding proteins related to insecticidal toxins.

Proteases constitute another important group of extracellular, biologically active substances that are thought to contribute to bacterial virulence. P. entomophila encodes three serine proteases (PSEEN3027, PSEEN3028, PSEEN4433) and an alkaline protease (PSEEN1550) that are absent from P. putida KT2440. These four genes are located at synteny breakpoints between the genomes of P. entomophila and other Pseudomonas spp. PSEEN1550 is the homolog of the alkaline protease AprA, which has been shown to be involved in various virulence processes in different species¹⁸. AprA likely plays a key role in virulence, because pathogenicity is affected in mutants defective in PrtR, the predicted transcriptional regulator of aprA (see below).

Pathogenic bacteria rely on a variety of cell surface–associated virulence factors that allow adhesion to the host surface and promote effective colonization. Filamentous hemagglutinin-like adhesins are broadly important virulence factors in both plant and animal pathogens. The genome of P. entomophila encodes three proteins (PSEEN0141, PSEEN2177, PSEEN3946) that are predicted to be involved in adhesion. Their genes cluster with genes encoding type I or two-partner secretion system proteins (Table 2). We also noticed the presence of two putative autotransporter proteins with a pertactin-type adhesion domain.

Toxins against competitors

In addition to the putative toxins described above, which may be crucial for its entomopathogenic properties, P. entomophila carries a number of genes specifying diverse traits. These traits may be required not only for interaction with insects but also for its lifestyle in soil, aquatic or rhizosphere environments (see Supplementary Fig. 5 online).

Fluorescent pseudomonads are characterized by the production of pyoverdines, a diverse class of siderophores. A pyoverdine contains a chromophore linked to a small peptide of varying length and composition, synthesized by nonribosomal peptide synthetases¹⁹. In P. entomophila, two gene clusters encode proteins required for pyoverdine biosynthesis and uptake (PSEEN1813-PSEEN1815 and PSEEN3224-PSEEN3234). Their general organization is similar to that found in other fluorescent pseudomonads²⁰. We also identified a gene cluster responsible


[Figure 2 image: (a) alignment of the P. entomophila and P. putida KT2440 chromosomes; (b) bar chart of species-specific genes per role category (Aa to Uf, as defined in the legend), split into unique and duplicated P. entomophila-specific and P. putida-specific genes; x axis, number of genes (0–1,000)]

Figure 2 Comparison of the P. entomophila and P. putida genomes. (a) Regions of significant sequence identity between the nucleotide sequences of P. entomophila (top) and P. putida KT2440 (bottom). Colinear regions are connected by red lines and inverted regions by blue lines. The display was generated using the Artemis Comparison Tool (freely available at http://www.sanger.ac.uk/Software/ACT/). (b) Comparison of the specific gene contents of the genomes of P. entomophila and P. putida KT2440. Specific genes of P. entomophila (Pe) and of P. putida KT2440 (Pp) with no ortholog in the other species are indicated in blue and green, respectively. They are classified according to role categories as described in Figure 1. Two genes were considered orthologs when their products share more than 60% identity over more than 80% of their length. Duplicated genes, indicated by light colors, were detected using a constraint of 35% identity over more than 80% of the length of the protein. Aa, amino acid biosynthesis; Bc, biosynthesis of cofactors, prosthetic groups and carriers; Ce, cell envelope; Cp, cellular processes; Ci, central intermediary metabolism; Dm, DNA metabolism; Em, energy metabolism; Fam, fatty acid and phospholipid metabolism; Me, mobile and extrachromosomal element functions; P, protein synthesis and fate; Pp, purines, pyrimidines, nucleosides and nucleotides; Rf, regulatory functions and signal transduction; Sm, secondary metabolite biosynthesis; T, transcription; Tb, transport and binding proteins; Uf, unknown function and hypothetical proteins.


Table 2 Gene/gene products potentially involved in P. entomophila-D. melanogaster interaction

Gene/gene productᵃ,ᵇ,ᶜ | Function | Best hit in other Pseudomonas (Ps.)ᵈ

Adhesion
PSEEN0141ᵃ | Putative surface adhesion protein | 54% PP0168ᵉ
PSEEN2177ᵃ | Putative filamentous hemagglutinin | 51% PFL4237
PSEEN3946ᵃ | Putative filamentous hemagglutinin | 41% PA0041
PSEEN3161 | Putative autotransporter, pertactin-like protein | 63% PP3069
PSEEN4310ᵃ | Putative autotransporter, pertactin-like protein | 42% PSPTO2225

Proteases
aprAᶜ | Alkaline metalloprotease | 72% PSPTO3332
PSEEN3027ᵇ | Putative autotransporter, SSP-h1 serine protease | 68% PSPTO1650
PSEEN3028ᵇ | Putative autotransporter, serine protease | 64% PA3535
PSEEN4433ᵃ | Putative subtilisin-like serine protease | Absent

Lipases
PSEEN0709ᵇ | Lysophospholipase | 76% PA2540
PSEEN1065ᵇ | Phospholipase C | 62% PFL0888
PSEEN2195 | Triacylglycerol lipase | 64% Pf B52 (P21773)ᵍ
PSEEN3432ᵃ,ᵇ | Lipase class 3 | 48% Pfo0149

Toxins
hcnABCᶜ | Hydrogen cyanide production | 76% PA2193 (hcnA)
PSEEN0132/3332/3042-5ᵃ,ᵇ | Cluster involved in lipopeptide I biosynthesis | See textʰ
PSEEN2138-56ᵃ,ᵇ | Cluster involved in lipopeptide II biosynthesis | Absent
PSEEN2716-20ᵇ | Cluster involved in lipopeptide III biosynthesis | 77% Pfo2266 (2717)
PSEEN5524-36ᵃ,ᵇ | Cluster involved in polyketide biosynthesis | Absent
PSEEN0701ᵃ,ᵇ | Protein related to TccC-type insecticidal toxin | Absentᶠ
PSEEN0702ᵃ,ᵇ | Protein related to TccC-type insecticidal toxin | Absentᶠ
PSEEN1172ᵃ | Protein related to TcdB-type insecticidal toxin | Absentᶠ
PSEEN2485ᵃ,ᶜ | TccC-type insecticidal toxin | Absent
PSEEN2697ᵃ,ᵇ,ᶜ | TccC-type insecticidal toxin | Absent
PSEEN2788ᵃ,ᵇ,ᶜ | TccC-type insecticidal toxin | Absent
PSEEN3326ᵃ,ᵇ | Putative toxin (cytolethal distending toxin B domains) | Absent
PSEEN3925-9ᵃ | Putative RTX toxin and type I secretion system | Absent

Miscellaneous
PSEEN0968ᵃ,ᵇ | Putative autotransporter with unknown passenger domain | Absent
PSEEN3843ᵃ | Putative autotransporter with unknown passenger domain | 53% PSPTO0714

Noninfectious and nonlethal Tn5 derivativesⁱ
gacS (5) | Sensor histidine kinase | 88% PP1650
gacA (2) | Response regulator, LuxR family | 98% PP4099
bioC (1) | Biotin biosynthesis | 86% PP0365
PSEEN5207 (1)-8 (2) | Putative amino acid ABC transporter | 97%/82% PP0283-2
PSEEN4425 (2) | CHPʲ, CAIB/BAIF family | 62% PFL4631

Infectious and nonlethal Tn5 derivativesⁱ
prtR (3) | Transmembrane transcriptional regulator | 74% PP2889
algR (2) | Transcriptional regulator involved in alginate production | 91% PP0185
PSEEN0132 (3)-3 (1) | NRPS loading protein, CHP (operonic) | 59%/75% PSPTO5546-7
PSEEN0389 (1) | Putative chorismate mutase, operonic with glnA, ntrBC | 44% PFL0385

ᵃ Gene products specific to P. entomophila and not found in other Pseudomonas species (constraint of 60% identity over more than 80% of the protein length). ᵇ Unusual GC content (differing by more than 1 s.d. from the average GC), likely due to recent lateral transfer. ᶜ Gene products or predicted domains associated with virulence in other systems. ᵈ Sequence identity between the protein encoded by P. entomophila and the best BLAST hit among proteins from other Pseudomonas. PP, P. putida KT2440; PA, P. aeruginosa PAO1; PSPTO, P. syringae pv. tomato DC3000; PFL, P. fluorescens Pf-5; Pfo, P. fluorescens PfO-1; Pf, P. fluorescens. ᵉ PSEEN0141 and PP0168 are aligned over only 67% of the PP0168 length. ᶠ <40% identity. ᵍ TrEMBL accession number. ʰ This cluster and its similarity to that of P. fluorescens Pf-5 are discussed in Supplementary Figure 5. ⁱ Numbers in parentheses indicate the number of independent Tn5 insertions. ʲ Conserved hypothetical protein.


for the synthesis of a siderophore related to acinetobactin and containing a salicylamide moiety²¹ (Supplementary Fig. 5).

Five gene clusters that direct the production of secondary metabolites have been identified (see Supplementary Fig. 5). PSEEN5520-PSEEN5522 are responsible for the production of hydrogen cyanide. Hydrogen cyanide is involved in the killing of Caenorhabditis elegans by P. aeruginosa²² and in the suppression of soil-borne plant pathogens by certain Pseudomonas species²³. The remaining four clusters are predicted to encode three different lipopeptides and a polyketide (Table 2 and Supplementary Fig. 5).

Regulation of virulence revealed by a genome-wide mutagenesis

To directly identify factors that modulate the interaction between P. entomophila and D. melanogaster, we generated a Tn5-derived library of variants and screened each for its infectious and pathogenic properties. Among the 7,500 clones, we isolated 23 mutants whose growth was not affected and that displayed attenuated infectious and/or pathogenic properties (Table 2). Mapping of the mini-Tn5 insertion sites directly identified only one virulence factor, a putative lipopeptide. No other genes predicted to encode virulence factors were identified, which suggests functional redundancy among them. By contrast, a number of insertions affected regulators that likely modulate the expression of such virulence factors.

Seven independent insertions inactivated the two-component system GacS/GacA, which regulates various processes, including virulence, in different species. These mutants were unable to induce an immune response. P. entomophila gac mutants are defective in the secretion of protease and hemolysin (data not shown) and do not persist in the gut of D. melanogaster¹. This indicates the pivotal role of GacS/GacA in modulating the entomopathogenic properties of this strain. As observed in other Pseudomonas species²³, the GacS/GacA two-component system probably regulates P. entomophila virulence genes at a post-transcriptional level. It would do so via the two small noncoding RNAs identified in the genome, RsmY and RsmZ, which relieve post-transcriptional repression by RsmA and RsmE homologs.

Three independent insertions in the prtR gene reduced the pathogenic properties of P. entomophila, but the mutants retained the capacity to induce an immune response. In P. fluorescens LS107d2 (ref. 24), PrtR and PrtI regulate the transcription of the aprA-inh-aprDEF operon. This suggests that P. entomophila relies on the AprA protease to fully express its pathogenic properties in D. melanogaster. Two independent insertions with the same consequences for the interaction with D. melanogaster were found in algR. In P. aeruginosa, AlgR regulates a number of processes, including fimbrial biogenesis, biofilm formation and cyanide production²⁵,²⁶. Altogether, the genetic analysis indicates that GacA is a master regulator of the interaction, whereas the PrtR and AlgR regulators seem to play secondary roles in the infection process.
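The proposed Gac/Rsm regulation can be summarized as a chain of on/off dependencies. The Boolean sketch below is a qualitative toy model, not the authors' analysis. It assumes, as reported for other pseudomonads, that active GacA is required for the small RNAs RsmY and RsmZ, which in turn relieve RsmA/RsmE repression of target mRNAs. The function and parameter names are hypothetical. The model is consistent with the finding that insertions in either gacS or gacA gave the same noninfectious phenotype.

```python
def rsm_targets_translated(gacS_intact: bool, gacA_intact: bool) -> bool:
    """Toy Boolean model of the GacS/GacA -> RsmY/RsmZ -> RsmA/RsmE cascade.

    Qualitative only: real regulation is graded, and other inputs are ignored.
    """
    # GacS, the sensor kinase, activates the response regulator GacA.
    gacA_active = gacS_intact and gacA_intact
    # Active GacA is required for the small noncoding RNAs RsmY and RsmZ.
    rsmYZ_present = gacA_active
    # RsmY/RsmZ titrate RsmA/RsmE; otherwise the Rsm proteins repress target mRNAs.
    rsm_repression = not rsmYZ_present
    return not rsm_repression


# A mutation in either gene of the two-component system blocks target expression.
for gacS_ok, gacA_ok in [(True, True), (False, True), (True, False)]:
    print(gacS_ok, gacA_ok, rsm_targets_translated(gacS_ok, gacA_ok))
```

Because the cascade is linear in this model, any single break upstream of RsmY/RsmZ gives the same output. That matches how the gacS and gacA insertions behave in the screen.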

Metabolism, transport and regulation

The P. entomophila genome encodes most of the central metabolic pathways found in other Pseudomonas, including the pentose phosphate pathway, the Entner-Doudoroff pathway and the tricarboxylic acid cycle. Consistent with Pseudomonas metabolism, P. entomophila has an incomplete Embden-Meyerhof-Parnas pathway owing to the absence of 6-phosphofructokinase, and relies on a complete Entner-Doudoroff route for hexose utilization. The genome harbors several genes that encode hydrolytic activities, such as chitinases, lipases and proteases, as well as a set of 19 uncharacterized hydrolases. These enzymes are potentially involved in the degradation of polymers found in the soil. However, unlike phytopathogenic strains such as P. syringae⁵⁻⁷, P. entomophila lacks genes encoding enzymes capable of degrading plant cell walls. This is consistent with the observation that this species is not pathogenic for plants (M. Arlat, Institut National de la Recherche Agronomique, Castanet, France, personal communication).
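The inference that the Embden-Meyerhof-Parnas (EMP) pathway is incomplete while the Entner-Doudoroff (ED) route is intact amounts to checking each pathway's required steps against the annotated enzymes. The sketch below does this under two assumptions. First, the step lists are simplified, whereas curated pathway definitions include alternative enzymes and isoenzymes. Second, the annotation is hypothetical and lacks only 6-phosphofructokinase, as stated above.

```python
# Simplified step lists (assumption): enzyme names only, no isoenzymes or alternatives.
EMP_STEPS = [
    "glucokinase",
    "glucose-6-phosphate isomerase",
    "6-phosphofructokinase",
    "fructose-bisphosphate aldolase",
    "triosephosphate isomerase",
    "glyceraldehyde-3-phosphate dehydrogenase",
    "phosphoglycerate kinase",
    "phosphoglycerate mutase",
    "enolase",
    "pyruvate kinase",
]
# Entner-Doudoroff-specific steps; the route then joins the lower EMP steps.
ED_STEPS = [
    "glucose-6-phosphate dehydrogenase",
    "6-phosphogluconolactonase",
    "6-phosphogluconate dehydratase",
    "2-keto-3-deoxy-6-phosphogluconate aldolase",
]


def missing_steps(steps: list[str], annotated: set[str]) -> list[str]:
    """Steps of a pathway with no matching enzyme in the annotation."""
    return [step for step in steps if step not in annotated]


# Hypothetical annotation: every enzyme above except 6-phosphofructokinase.
annotated = (set(EMP_STEPS) | set(ED_STEPS)) - {"6-phosphofructokinase"}
print("EMP missing:", missing_steps(EMP_STEPS, annotated))
print("ED missing:", missing_steps(ED_STEPS, annotated))
```

A single missing step is enough to break the EMP route. Glucose can still reach pyruvate through the ED-specific steps followed by the shared lower steps.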

The P. entomophila genome also contains determinants for the catabolism of various aromatic compounds (see Supplementary Fig. 6 online) and long-chain carbohydrates. P. entomophila shares several gene clusters with P. putida²⁷ that are involved in the degradation of various classes of aromatic compounds. These include benzoate and quinate, 4-hydroxybenzoate, phenylacetaldehyde and phenylalkanoate, as well as phenylalanine and tyrosine. The P. entomophila genome contains two additional catabolic gene clusters, also present in the genome of P. aeruginosa PAO1. They encode determinants for the degradation of 3-hydroxybenzoate through gentisate²⁸ and for the meta-cleavage of homoprotocatechuate²⁹,³⁰.

Consistent with the size of its genome, P. entomophila possesses more than 535 transporter-encoding genes. Remarkably, no genes encoding a type III or type IV secretion system, present in numerous Gram-negative bacterial pathogens³¹, were found in P. entomophila. The genome also contains many transcriptional regulators (more than 300) and genes whose products are involved in signal transduction. This suggests that P. entomophila is able to adapt to substantial substrate variations in its habitats.

The soil and entomopathogenic lifestyle of P. entomophila

The metabolic properties of P. entomophila predicted from its genome suggest that this strain is a ubiquitous, metabolically versatile bacterium that may colonize diverse habitats, including soil, rhizosphere and aquatic systems, as shown for P. putida KT2440. However, in contrast to P. putida, P. entomophila contains a number of genes that are predicted, or have been shown, to be important for virulence. The expression of these factors is under the control of the major regulator GacA and presumably allows this strain to exploit new niches and interact with various insects, particularly D. melanogaster (Fig. 3).

©2006 Nature Publishing Group http://www.nature.com/naturebiotechnology

[Figure 3 image: schematic of the D. melanogaster gut (esophagus, proventriculus, midgut, gastric cecum (gc)) showing the infection steps: 1, ingestion (0 h); 2, resistance to oxidative burst (catalases, SOD, GST); 3, persistence (2–3 h; Gac-dependent PPF); 4, immune response escape (3–6 h; PrtR, AprA, AlgR); 5, pathogenicity and death (12–24 h; Tc toxins, proteases, hemolysin, HCN, lipopeptides).]

Figure 3 Steps in the interaction between P. entomophila and D. melanogaster. Five different steps are shown: 1. ingestion of P. entomophila through the esophagus; 2. resistance to oxidative stress in response to an oxidative burst in the gut; 3. persistence of P. entomophila in the gut; 4. escape from immune response effectors; 5. pathogenicity and lethal outcome of the interaction after major modifications of the midgut physiology, including microvilli disruption, cell destruction (indicated by *) and, in some cases, peritrophic matrix disorganization (indicated by a dashed line). Red indicates important steps in the infection process. Blue indicates newly identified proteins that could be involved at these steps. The time scale is indicated in brackets. Ep, epithelial cell; mv, microvilli; PM, peritrophic matrix; gc, gastric cecum.

NATURE BIOTECHNOLOGY ADVANCE ONLINE PUBLICATION 5


In D. melanogaster, an environment hostile to microbial colonization is maintained in the gut by the secretion of antimicrobial factors such as lysozymes32,33 and other digestive enzymes. Recently, it has been shown that an epithelial oxidative burst limits microbial proliferation in the gut34; resistance to oxidative stress might therefore be a prerequisite for D. melanogaster gut colonization. The P. entomophila genome encodes 40 proteins that are predicted to be involved in resistance to oxidative stress, including four catalases, two superoxide dismutases, three hydroperoxide reductases and eleven glutathione S-transferases. Notably, resistance to oxidative stress is probably not sufficient for colonization, as other Pseudomonas species that possess a large repertoire of oxidant-detoxifying proteins are not able to persist in the gut of D. melanogaster1. This assumption is further supported by the observation that P. entomophila gacA mutants, which do not persist in the gut, were not less resistant to peroxide, hypochlorite or paraquat (data not shown). As the P. entomophila-D. melanogaster interaction is specific, P. entomophila infectivity likely involves the expression of a specific gene enabling this strain to persist in the D. melanogaster gut, as shown for the Erwinia carotovora Evf factor35. Because P. entomophila does not contain any evf-related genes, we cannot predict candidates for this putative persistence-promoting factor (PPF in Fig. 3). Nonetheless, this gene is likely regulated by the GacS/GacA two-component system: the gacA::Tn5 or gacS::Tn5 mutants do not persist in the gut, and P. entomophila cells are infectious only at stationary phase, concomitant with Gac activation of virulence genes (data not shown). It is striking that in both P. entomophila and E. carotovora35, genes required to interact with D. melanogaster are under the control of global regulators, GacA and Hor, respectively, indicating that virulence genes are embedded in complex regulatory networks.

Infection of D. melanogaster by P. entomophila is accompanied by a blockage of food uptake1. This phenomenon is also observed in the interaction between Serratia entomophila and the grass grub Costelytra zealandica and between Yersinia pestis and the flea. The processes used to cause food blockage seem to differ between these two systems: Y. pestis relies on phospholipase synthesis and biofilm formation36,37, whereas the mechanism used by S. entomophila remains unknown38. The genes responsible for the anti-feeding determinants of S. entomophila have a prophage origin, and no related genes were identified in the genome of P. entomophila. Since algR mutants, which are expected to be affected in biofilm formation, still provoke food-uptake blockage, biofilm formation is probably not essential for D. melanogaster infection by P. entomophila.

The persistence of P. entomophila in the larval gut triggers both a local and a systemic immune response1. The P. entomophila load remains as high in wild-type larvae as in relish mutants unable to induce an immune response1, suggesting that this strain is able to escape the D. melanogaster immune response. Biofilm formation might protect P. entomophila cells from immune effectors, or the persistence of bacteria might result from the degradation of these effectors. The defects observed with prtR mutants suggest that AprA may degrade antimicrobial peptides, as supported by recent in vivo studies39, and thereby disable the immune response.

Twelve hours after D. melanogaster ingests the bacteria, the physiological modifications caused by P. entomophila are dramatic and the expression of 205 D. melanogaster genes is modified1 (Fig. 3). These changes probably result from the action of virulence factors such as proteases, hemolysins, insecticidal toxin-like proteins, secondary metabolites or hydrogen cyanide. However, lethality becomes apparent only after 16 h, suggesting that host gene expression induced at this late stage is unlikely to alter the fatal outcome of the interaction.

DISCUSSION

The complete genome sequence of P. entomophila provides insight into this organism’s entomopathogenic lifestyle. Combined with a genetic approach, it has revealed potential virulence factors along with regulators that modulate their expression. This study also revealed that P. entomophila is the first Pseudomonas strain known to be pathogenic for a multicellular organism while lacking a type III secretion system. Its capacity to use various plant-derived compounds, including aromatic molecules, and its resistance to antibiotics and oxidative stress suggest that P. entomophila is a commensal bacterium. As this strain is not a plant pathogen, it may have potential as a biocontrol agent against insects. Unexpectedly for an environmental isolate, the P. entomophila genome contains only a limited number of bacteriophages and transposons, which may contribute to its relatively small size compared to other Pseudomonas genomes. Finally, the complete genome sequence of P. entomophila provides a framework for further studies of its pathogenic properties and establishes a host-pathogen system in which both organisms are amenable to genetic and genomic analysis.

METHODS

Genome sequencing, assembly and annotation. The complete genome sequence of P. entomophila L48 was determined using the whole-genome shotgun method (10× coverage, using two plasmid libraries and one BAC library to order contigs). Finishing was performed by PCR amplification from contig ends. After a first round of annotation, regions of lower quality as well as regions with putative frameshifts were resequenced from PCR amplification of the dubious regions. Using the AMIGene software (annotation of microbial genes)40, a total of 5,279 CoDing Sequences were predicted (and assigned a unique identifier prefixed with “PSEEN”) and submitted to automatic functional annotation: exhaustive BLAST searches against the UniProt databank were performed to identify significant homologies. Protein motifs and domains were documented using the InterPro databank. In parallel, genes coding for enzymes were classified using the PRIAM software41. TMHMM v2.0 was used to identify transmembrane domains42, and SignalP 3.0 was used to predict signal peptide regions43. Finally, tRNAs were identified using tRNAscan-SE44. Sequence data for comparative analyses were obtained from the NCBI databank (RefSeq section). Putative orthologs and synteny groups (that is, conservation of the chromosomal colocalization between pairs of orthologous genes from different genomes) were computed between P. entomophila and all the other complete genomes as previously described45. Manual validation of the automatic annotation was performed using the MaGe (Magnifying Genomes) interface, which allows graphic visualization of the P. entomophila annotations enhanced by a synchronized representation of synteny groups in other genomes chosen for comparison45. All the data (that is, syntactic and functional annotations, and results of comparative analysis) were stored in a relational database called EntomoScope, which is publicly available via the MaGe interface at http://www.genoscope.cns.fr/agc/mage/.

Bacterial mutagenesis and screening. Random mutagenesis was performed by biparental mating using P. entomophila1 and Escherichia coli S17.1-λpir46 carrying the pUT-Tn5-Tc suicide plasmid as previously described47. A total of 7,500 TcR colonies obtained from several independent conjugations were screened individually as previously described35. Transconjugants that displayed attenuated virulence were subjected to several secondary screenings by natural infection as previously described1. Insertion sites were determined using two different methods. First, genomic DNA was digested by PstI or NotI/PstI and ligated into pUC18 and pBluescript, respectively. Clones that contained the mini-transposon and its flanking sequences were selected by plating the E. coli BW25142 transformants on tetracycline (10 μg/ml). One flanking region was sequenced from the Tc gene using the oligonucleotide (Tc-F) 5′-TCGTCGACAAGCTTCGG-3′. Second, some insertion sites were determined by inverse PCR. Genomic DNA was digested by either PstI or EagI, self-ligated and amplified using the oligonucleotides Tc-F and 5′-AGATCTGATCAAGAGACAT-3′ for PstI-digested DNA, or 5′-GGCGGCCCTATACCTTGTCTG-3′ (Tet-end) and 5′-CATAATGGGGAAGGCCAT-3′ for EagI-digested DNA. One flanking region was sequenced using the oligonucleotides Tc-F or Tet-end. Insertion sites were confirmed by amplifying the region overlapping the insertion site. Southern blot analysis was carried out to verify that the selected clones carried only a single copy of the transposon.
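In inverse PCR, the genomic DNA recovered next to a transposon insertion is bounded by the nearest recognition site of the enzyme used for digestion. The minimal Python sketch below illustrates that logic under stated assumptions: the genome is a plain linear string, only the PstI (CTGCAG) and EagI (CGGCCG) recognition sequences are considered, and sites inside the transposon itself are ignored. The function names are hypothetical and are not part of any published pipeline.

```python
import re

# Recognition sequences of the two enzymes used for inverse PCR. Both are
# palindromic, so scanning one strand finds every site.
RECOGNITION_SITES = {"PstI": "CTGCAG", "EagI": "CGGCCG"}


def find_sites(seq: str, motif: str) -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) motif match."""
    return [m.start() for m in re.finditer(f"(?={motif})", seq.upper())]


def bounding_sites(genome: str, insertion_pos: int, enzyme: str = "PstI"):
    """Hypothetical helper: nearest recognition sites left and right of an insertion.

    Returns (left_site, right_site) as 0-based start positions, with None on a side
    that has no site (the genome is treated as linear in this sketch).
    """
    motif = RECOGNITION_SITES[enzyme]
    sites = find_sites(genome, motif)
    left = [s for s in sites if s + len(motif) <= insertion_pos]
    right = [s for s in sites if s >= insertion_pos]
    return (max(left) if left else None, min(right) if right else None)


def flank_lengths(genome: str, insertion_pos: int, enzyme: str = "PstI"):
    """Length (bp) of genomic DNA between the insertion and each bounding site."""
    motif_len = len(RECOGNITION_SITES[enzyme])
    left, right = bounding_sites(genome, insertion_pos, enzyme)
    left_len = insertion_pos - left if left is not None else None
    right_len = right + motif_len - insertion_pos if right is not None else None
    return left_len, right_len


# Toy example: two PstI sites flanking an insertion at position 15.
toy = "AAAA" + "CTGCAG" + "T" * 20 + "CTGCAG" + "AAAA"
print(bounding_sites(toy, 15))  # (4, 30)
print(flank_lengths(toy, 15))   # (11, 21)
```

Comparing the flank lengths obtained with each enzyme is one way to see why having two alternative digests can help recover a fragment of amplifiable size; this is an illustration, not a statement of how the digests were chosen for each mutant.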

Accession numbers. The P. entomophila nucleotide sequence and annotation data have been deposited in the EMBL databank under accession number CT573326.

Note: Supplementary information is available on the Nature Biotechnology website.

ACKNOWLEDGMENTS

This work was supported by CNRS (Programme Sequencage a grande echelle), by IFR115 and by MRT/ACI IMPBio 2004 ‘MicroScope.’ We thank Celia Floquet and Camille Jourlain for technical assistance, Matthieu Arlat for plant assays and helpful discussions, Alexandra Gruss, Linda Sperling and Sean Kennedy for critical reading of the manuscript, and Olivier Espeli for expert annotation. N.V. was supported by a doctoral fellowship from the Association Vaincre la Mucoviscidose and the Association pour la Recherche sur le Cancer.

AUTHOR CONTRIBUTIONS

N.V., V.B., P.W., B.S., J.W., B.L., C.M. and F.B. designed research; N.V., D.V., S.C., Z.R., V.B., C.A., L.C., C.J., A.L., B.V. and F.B. performed research; N.V., D.V., S.C., V.B., C.A., C.M. and F.B. contributed new reagents/analytic tools; N.V., D.V., S.C., Z.R., V.B., C.A., L.C., C.J., A.L., B.V., B.L., C.M. and F.B. analyzed data; and N.V. and F.B. wrote the paper.

COMPETING INTERESTS STATEMENT

The authors declare that they have no competing financial interests.

Published online at http://www.nature.com/naturebiotechnology/

Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/

1. Vodovar, N. et al. Drosophila host defense after oral infection by an entomopathogenic Pseudomonas species. Proc. Natl. Acad. Sci. USA 102, 11414–11419 (2005).

2. Waterfield, N.R., Wren, B.W. & ffrench-Constant, R.H. Invertebrates as a source of emerging human pathogens. Nat. Rev. Microbiol. 2, 833–841 (2004).

3. Chattopadhyay, A., Bhatnagar, N.B. & Bhatnagar, R. Bacterial insecticidal toxins. Crit. Rev. Microbiol. 30, 33–54 (2004).

4. Stover, C.K. et al. Complete genome sequence of Pseudomonas aeruginosa PAO1, an opportunistic pathogen. Nature 406, 959–964 (2000).

5. Buell, C.R. et al. The complete genome sequence of the Arabidopsis and tomato pathogen Pseudomonas syringae pv. tomato DC3000. Proc. Natl. Acad. Sci. USA 100, 10181–10186 (2003).

6. Feil, H. et al. Comparison of the complete genome sequences of Pseudomonas syringae pv. syringae B728a and pv. tomato DC3000. Proc. Natl. Acad. Sci. USA 102, 11064–11069 (2005).

7. Joardar, V. et al. Whole-genome sequence analysis of Pseudomonas syringae pv. phaseolicola 1448A reveals divergence among pathovars in genes involved in virulence and transposition. J. Bacteriol. 187, 6488–6498 (2005).

8. Paulsen, I.T. et al. Complete genome sequence of the plant commensal Pseudomonas fluorescens Pf-5. Nat. Biotechnol. 23, 873–878 (2005).

9. Nelson, K.E. et al. Complete genome sequence and comparative analysis of the metabolically versatile Pseudomonas putida KT2440. Environ. Microbiol. 4, 799–808 (2002).

10. Weinel, C., Nelson, K.E. & Tummler, B. Global features of the Pseudomonas putida KT2440 genome sequence. Environ. Microbiol. 4, 809–818 (2002).

11. Eisen, J.A., Heidelberg, J.F., White, O. & Salzberg, S.L. Evidence for symmetric chromosomal inversions around the replication origin in bacteria. Genome Biol. 1, RESEARCH0011 (2000).

12. Aranda-Olmedo, I., Tobes, R., Manzanera, M., Ramos, J.L. & Marques, S. Species-specific repetitive extragenic palindromic (REP) sequences in Pseudomonas putida. Nucleic Acids Res. 30, 1826–1833 (2002).

13. Waterfield, N.R., Bowen, D.J., Fetherston, J.D., Perry, R.D. & ffrench-Constant, R.H. The tc genes of Photorhabdus: a growing family. Trends Microbiol. 9, 185–191 (2001).

14. Bowen, D. et al. Insecticidal toxins from the bacterium Photorhabdus luminescens. Science 280, 2129–2132 (1998).

15. Joo Lee, P. et al. Cloning and heterologous expression of a novel insecticidal gene (tccC1) from Xenorhabdus nematophilus strain. Biochem. Biophys. Res. Commun. 319, 1110–1116 (2004).

16. Waterfield, N., Hares, M., Yang, G., Dowling, A. & ffrench-Constant, R. Potentiation and cellular phenotypes of the insecticidal Toxin complexes of Photorhabdus bacteria. Cell. Microbiol. 7, 373–382 (2005).

17. Wilson, M., McNab, R. & Henderson, B. Bacterial Disease Mechanisms (Cambridge University Press, Cambridge, UK, 2002).

18. Miyoshi, S. & Shinoda, S. Microbial metalloproteases and pathogenesis. Microbes Infect. 2, 91–98 (2000).

19. Meyer, J.M. Pyoverdines: pigments, siderophores and potential taxonomic markers of fluorescent Pseudomonas species. Arch. Microbiol. 174, 135–142 (2000).

20. Ravel, J. & Cornelis, P. Genomics of pyoverdine-mediated iron uptake in pseudomonads. Trends Microbiol. 11, 195–200 (2003).

21. Mercado-Blanco, J. et al. Analysis of the pmsCEAB gene cluster involved in biosynthesis of salicylic acid and the siderophore pseudomonine in the biocontrol strain Pseudomonas fluorescens WCS374. J. Bacteriol. 183, 1909–1920 (2001).

22. Gallagher, L.A. & Manoil, C. Pseudomonas aeruginosa PAO1 kills Caenorhabditis elegans by cyanide poisoning. J. Bacteriol. 183, 6207–6214 (2001).

23. Haas, D. & Defago, G. Biological control of soil-borne pathogens by fluorescent pseudomonads. Nat. Rev. Microbiol. 3, 307–319 (2005).

24. Burger, M., Woods, R.G., McCarthy, C. & Beacham, I.R. Temperature regulation of protease in Pseudomonas fluorescens LS107d2 by an ECF sigma factor and a transmembrane activator. Microbiology 146, 3149–3155 (2000).

25. Lizewski, S.E. et al. Identification of AlgR-regulated genes in Pseudomonas aeruginosa by use of microarray analysis. J. Bacteriol. 186, 5672–5684 (2004).

26. Whitchurch, C.B. et al. Phosphorylation of the Pseudomonas aeruginosa response regulator AlgR is essential for type IV fimbria-mediated twitching motility. J. Bacteriol. 184, 4544–4554 (2002).

27. Jimenez, J.I., Minambres, B., Garcia, J.L. & Diaz, E. Genomic analysis of the aromatic catabolic pathways from Pseudomonas putida KT2440. Environ. Microbiol. 4, 824–841 (2002).

28. Liu, D.Q., Liu, H., Gao, X.L., Leak, D.J. & Zhou, N.Y. Arg169 is essential for catalytic activity of 3-hydroxybenzoate 6-hydroxylase from Klebsiella pneumoniae M5a1. Microbiol. Res. 160, 53–59 (2005).

29. Prieto, M.A., Diaz, E. & Garcia, J.L. Molecular characterization of the 4-hydroxyphenylacetate catabolic pathway of Escherichia coli W: engineering a mobile aromatic degradative cluster. J. Bacteriol. 178, 111–120 (1996).

30. Thotsaporn, K., Sucharitakul, J., Wongratana, J., Suadee, C. & Chaiyen, P. Cloning and expression of p-hydroxyphenylacetate 3-hydroxylase from Acinetobacter baumannii: evidence of the divergence of enzymes in the class of two-protein component aromatic hydroxylases. Biochim. Biophys. Acta 1680, 60–66 (2004).

31. Hueck, C.J. Type III protein secretion systems in bacterial pathogens of animals and plants. Microbiol. Mol. Biol. Rev. 62, 379–433 (1998).

32. Hultmark, D. Insect lysozymes. EXS 75, 87–102 (1996).

33. Regel, R., Matioli, S.R. & Terra, W.R. Molecular adaptation of Drosophila melanogaster lysozymes to a digestive function. Insect Biochem. Mol. Biol. 28, 309–319 (1998).

34. Ha, E.M. et al. An antioxidant system required for host protection against gut infection in Drosophila. Dev. Cell 8, 125–132 (2005).

35. Basset, A., Tzou, P., Lemaitre, B. & Boccard, F. A single gene that promotes interaction of a phytopathogenic bacterium with its insect vector, Drosophila melanogaster. EMBO Rep. 4, 205–209 (2003).

36. Hinnebusch, B.J. et al. Role of Yersinia murine toxin in survival of Yersinia pestis in the midgut of the flea vector. Science 296, 733–735 (2002).

37. Darby, C., Ananth, S.L., Tan, L. & Hinnebusch, B.J. Identification of gmhA, a Yersinia pestis gene required for flea blockage, by using a Caenorhabditis elegans biofilm system. Infect. Immun. 73, 7236–7242 (2005).

38. Hurst, M.R., Glare, T.R. & Jackson, T.A. Cloning Serratia entomophila antifeeding genes–a putative defective prophage active against the grass grub Costelytra zealandica. J. Bacteriol. 186, 5116–5128 (2004).

39. Liehl, P., Blight, M., Vodovar, N., Boccard, F. & Lemaitre, B. Prevalence of local immune response against oral infection in a Drosophila/Pseudomonas infection model. PLoS Pathog., in the press.

40. Bocs, S., Cruveiller, S., Vallenet, D., Nuel, G. & Medigue, C. AMIGene: Annotation of MIcrobial Genes. Nucleic Acids Res. 31, 3723–3726 (2003).

41. Claudel-Renard, C., Chevalet, C., Faraut, T. & Kahn, D. Enzyme-specific profiles for genome annotation: PRIAM. Nucleic Acids Res. 31, 6633–6639 (2003).

42. Krogh, A., Larsson, B., von Heijne, G. & Sonnhammer, E.L. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305, 567–580 (2001).

43. Bendtsen, J.D., Nielsen, H., von Heijne, G. & Brunak, S. Improved prediction of signal peptides: SignalP 3.0. J. Mol. Biol. 340, 783–795 (2004).

44. Lowe, T.M. & Eddy, S.R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997).

45. Vallenet, D. et al. MaGe: a microbial genome annotation system supported by synteny results. Nucleic Acids Res. 34, 53–65 (2006).

46. Miller, V.L. & Mekalanos, J.J. A novel suicide vector and its use in construction of insertion mutations: osmoregulation of outer membrane proteins and virulence determinants in Vibrio cholerae requires toxR. J. Bacteriol. 170, 2575–2583 (1988).

47. de Lorenzo, V., Herrero, M., Jakubzik, U. & Timmis, K.N. Mini-Tn5 transposon derivatives for insertion mutagenesis, promoter probing, and chromosomal insertion of cloned DNA in gram-negative eubacteria. J. Bacteriol. 172, 6568–6572 (1990).


[Supplementary Figure 1 plots: GC skew versus position (bp) for P. entomophila, P. putida KT2440, P. aeruginosa PAO1, P. syringae pv. tomato DC3000, P. syringae pv. syringae B728a and P. fluorescens Pf-5, with the positions of dnaA and dif marked.]

Supplementary Figure 1 Comparison of the replichore organization in selected Pseudomonas genomes. Replichores were mapped by GC skew analysis using a 2,000-bp window. The dnaA gene, close to oriC, and the chromosome dimer resolution dif site are shown in red and green, respectively.
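The GC skew analysis behind this figure can be sketched in a few lines. The minimal Python example below assumes the common definition of GC skew as (G - C)/(G + C) computed over consecutive, non-overlapping windows (2,000 bp by default, matching the window size stated above), and places the origin and terminus at the minimum and maximum of the cumulative skew. The function names are hypothetical; this is not the plotting pipeline used for the figure, which additionally marks the dnaA and dif positions.

```python
def gc_skew(seq: str, window: int = 2000) -> list[tuple[int, float]]:
    """(window start, (G - C)/(G + C)) for consecutive non-overlapping windows.

    A trailing partial window is ignored; windows with no G or C get a skew of 0.
    """
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((start, (g - c) / (g + c) if g + c else 0.0))
    return skews


def predict_ori_ter(skews: list[tuple[int, float]], window: int) -> tuple[int, int]:
    """Origin ~ minimum and terminus ~ maximum of the cumulative GC skew."""
    total, cumulative = 0.0, []
    for start, value in skews:
        total += value
        cumulative.append((total, start + window))  # value reached at window end
    ori = min(cumulative)[1]
    ter = max(cumulative)[1]
    return ori, ter


def replichore_lengths(ori: int, ter: int, genome_len: int) -> tuple[int, int]:
    """Lengths of the two arms of a circular chromosome split at ori and ter."""
    arm = (ter - ori) % genome_len
    return arm, genome_len - arm


# Toy chromosome: C-rich, then G-rich, then C-rich again.
toy = "C" * 10 + "G" * 20 + "C" * 10
skews = gc_skew(toy, window=10)
ori, ter = predict_ori_ter(skews, window=10)
print(ori, ter, replichore_lengths(ori, ter, len(toy)))  # 10 30 (20, 20)
```

Two arms of similar length, as in the toy output, correspond to the balanced replichores described for P. entomophila; markedly unequal arm lengths would indicate the unbalanced organization seen in some other genomes.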

Supplementary information for Vodovar et al. "Complete genome sequence of the entomopathogenic and metabolically versatile soil bacterium Pseudomonas entomophila"


[Supplementary Figure 2 panels: pairwise genome comparisons (a–d) and synteny maps (a′–d′) of P. entomophila L48 (5,888,780 bp) against P. putida KT2440, P. fluorescens Pf-5, P. aeruginosa PAO1 and P. syringae pv. tomato DC3000.]


Supplementary Figure 2 Comparison of the P. entomophila genome with those of other Pseudomonas species and visualization of P. entomophila genomic synteny relative to selected Pseudomonas. Regions of significant sequence identity between the nucleotide sequence of P. entomophila (top) and those of different Pseudomonas spp. (bottom), P. putida KT2440 (a), P. fluorescens Pf-5 (b), P. aeruginosa PAO1 (c) and P. syringae pv. tomato DC3000 (d), are connected by red (collinear regions) and blue (inverted regions) lines. Axes represent the coding regions in the order in which they occur on the chromosomes. The display was generated using the Artemis Comparison Tool (freely available at www.sanger.ac.uk/Software/ACT). Visualization of P. entomophila genomic synteny compared to P. putida KT2440 (a′), P. fluorescens Pf-5 (b′), P. aeruginosa PAO1 (c′) and P. syringae pv. tomato DC3000 (d′). Squares represent synteny groups, drawn in proportion to their length. Synteny groups correspond to groups of genes shared by the two genomes (>60% identity over >80% of their length at the protein level) that display a similar organization, allowing up to five insertion or deletion events.
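The synteny-group criteria in this legend can be illustrated with a simplified Python sketch. It assumes that ortholog pairs are already available as (gene rank on P. entomophila, gene rank on the other genome, % identity, % aligned length), keeps pairs with >60% identity over >80% of their length, and chains consecutive pairs into one group when at most five genes are skipped on either genome and the orientation (collinear or inverted) stays the same. The class and function names are hypothetical, and reading "five insertion or deletion events" as "up to five skipped genes between consecutive pairs" is an assumption; this is not the MaGe algorithm itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrthologPair:
    """Hypothetical record for one putative ortholog pair."""
    a_idx: int        # gene rank on genome A (e.g. P. entomophila)
    b_idx: int        # gene rank on genome B
    identity: float   # % amino acid identity
    coverage: float   # % of protein length covered by the alignment


def _extends(group, p, max_gap):
    """True if pair p can be appended to the current synteny group."""
    prev = group[-1]
    da = p.a_idx - prev.a_idx
    db = p.b_idx - prev.b_idx
    # At most max_gap genes skipped on either genome between consecutive pairs
    if not (1 <= da <= max_gap + 1 and 1 <= abs(db) <= max_gap + 1):
        return False
    if len(group) >= 2:
        # Keep one orientation per group: collinear (db > 0) or inverted (db < 0)
        prev_db = group[-1].b_idx - group[-2].b_idx
        return (db > 0) == (prev_db > 0)
    return True


def synteny_groups(pairs, min_identity=60.0, min_coverage=80.0, max_gap=5):
    """Chain filtered ortholog pairs into synteny groups, ordered along genome A."""
    kept = sorted(
        (p for p in pairs if p.identity > min_identity and p.coverage > min_coverage),
        key=lambda p: (p.a_idx, p.b_idx),
    )
    groups, current = [], []
    for p in kept:
        if current and _extends(current, p, max_gap):
            current.append(p)
        else:
            if current:
                groups.append(current)
            current = [p]
    if current:
        groups.append(current)
    return groups


pairs = [
    OrthologPair(1, 10, 90, 95), OrthologPair(2, 11, 85, 90),
    OrthologPair(3, 12, 50, 95),  # fails the identity threshold
    OrthologPair(5, 14, 70, 85),  # two genes skipped: same group
    OrthologPair(20, 40, 90, 90), OrthologPair(21, 39, 90, 90),  # inverted block
]
print([[p.a_idx for p in g] for g in synteny_groups(pairs)])  # [[1, 2, 5], [20, 21]]
```

The sketch returns single-gene groups as well and assumes one ortholog per gene; a real implementation would typically discard singletons and resolve paralogs before chaining.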


[Supplementary Figure 3 tree: TccC-type toxins from P. entomophila (PSEEN2788, PSEEN2485, PSEEN2697), S. entomophila (SepA (pADAP_54), SepC (pADAP_57)), P. luminescens (TccC1 (plu4167), TccC3 (plu0967), TccC4 (plu0976), TccC5 (plu0964), TccC6 (plu4182), TccC7 (plu4488)), Yersinia (YPO2312, YPO3673, YPO3674) and P. syringae (PSPTO1231, PSPTO4340, PSPTO4343, PSPTO4344, PSPPH_2571, PSPPH_4042), grouped into a Serratia entomophila, Pseudomonas entomophila, Photorhabdus luminescens, Yersinia YPO2312 subgroup; a Yersinia spp. subgroup; and a Pseudomonas syringae subgroup.]


Supplementary Figure 3 Phylogeny of representative TccC-type toxins. The tree was reconstructed using the neighbor-joining (NJ) method. The number next to each node indicates the bootstrap support from 1,000 replicates. The sequence of the Serratia entomophila SepA protein was used as the outgroup. The tree reveals three major subgroups of TccC-type toxins and shows that the P. entomophila toxins are related to the Yersinia spp. toxin YPO2312, to the S. entomophila SepC toxin and to the Photorhabdus luminescens TccC toxins (blue subgroup). The TccC-type toxins identified in the genomes of the three P. syringae pathovars (green) are more distantly related. Labels indicate the GenBank locus tags, except for the S. entomophila Sep proteins and the P. luminescens TccC proteins, for which both the protein name and the locus tag (in parentheses) are shown. YPO, Yersinia pestis; PSPTO, P. syringae pv. tomato DC3000; PSPPH, P. syringae pv. phaseolicola 1448A.
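For readers unfamiliar with the tree-building step, the following minimal Python sketch implements the neighbor-joining algorithm (the Q-criterion with the standard branch-length and distance-update formulas) on a small symmetric distance matrix stored as nested dictionaries. It is an illustration under simplifying assumptions, not the software used for this figure: it takes precomputed distances rather than aligned toxin sequences, and it omits the 1,000-replicate bootstrap, which would repeat the whole procedure on resampled alignment columns. The function name and the internal node labels ("node1", "node2", ...) are hypothetical.

```python
def neighbor_joining(dist):
    """Build an unrooted tree by neighbor joining.

    dist: symmetric nested dict, dist[i][j] = distance, dist[i][i] = 0.
    Returns (joins, final_edge): each join is (i, j, length_i, length_j) for two
    nodes merged into a new internal node, and final_edge = (a, b, length)
    connects the last two remaining nodes.
    """
    d = {i: dict(row) for i, row in dist.items()}  # working copy
    joins, counter = [], 0
    while len(d) > 2:
        nodes = list(d)
        n = len(nodes)
        # Net divergence of each node from all others
        r = {i: sum(d[i][k] for k in nodes if k != i) for i in nodes}
        # Q-criterion: join the pair minimizing (n - 2) * d(i, j) - r_i - r_j
        pairs = [(a, b) for idx, a in enumerate(nodes) for b in nodes[idx + 1:]]
        i, j = min(pairs, key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from i and j to the new internal node u
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        counter += 1
        u = f"node{counter}"
        # Distances from u to every remaining node k
        new_row = {k: (d[i][k] + d[j][k] - d[i][j]) / 2 for k in nodes if k not in (i, j)}
        del d[i], d[j]
        for k in d:
            del d[k][i], d[k][j]
            d[k][u] = new_row[k]
        new_row[u] = 0.0
        d[u] = new_row
        joins.append((i, j, li, lj))
    a, b = list(d)
    return joins, (a, b, d[a][b])


# Five-taxon toy matrix (hypothetical distances, not toxin data)
dist = {
    "a": {"a": 0, "b": 5, "c": 9, "d": 9, "e": 8},
    "b": {"a": 5, "b": 0, "c": 10, "d": 10, "e": 9},
    "c": {"a": 9, "b": 10, "c": 0, "d": 8, "e": 7},
    "d": {"a": 9, "b": 10, "c": 8, "d": 0, "e": 3},
    "e": {"a": 8, "b": 9, "c": 7, "d": 3, "e": 0},
}
joins, final_edge = neighbor_joining(dist)
print(joins[0])  # ('a', 'b', 2.0, 3.0)
```

Neighbor joining produces an unrooted tree; rooting it, as done in the figure with S. entomophila SepA, is a separate step that places the root on the branch leading to the chosen outgroup.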


[Supplementary Figure 4 image: hemolysis assay plate with the 29 strains listed below spotted at numbered positions.]

1: P. fluorescens pv. lomagnae; 2: P. cedrina; 3: P. libanensis; 4: P. mandelii;
5: P. corrugata; 6: P. fluorescens biovar 1; 7: P. fluorescens biovar 2;
8: P. fluorescens biovar 3; 9: P. fluorescens biovar 4; 10: P. fluorescens biovar 5;
11: P. marginalis pv. marginalis; 12: P. rhodesiae; 13: P. tolaasii; 14: P. veronii;
15: P. putida KT2440; 16: P. putida; 17: P. putida biovar b; 18: P. monteilii;
19: P. mosselii; 20: P. cichorii; 21: P. fuscovaginae; 22: P. chlororaphis;
23: P. aeruginosa PAO1; 24: P. gingeri; 25: P. brassicacearum; 26: P. jessenii;
27: P. agarici; 28: P. asplenii; 29: P. entomophila.


Supplementary Figure 4 P. entomophila secretes a highly diffusible hemolytic activity compared to the other Pseudomonas strains tested. These strains have been described previously1, except P. jessenii (CFBP4842) and P. asplenii (CFBP2063), which come from the Collection Française de Bactéries Phytopathogènes. Under the conditions tested (29 °C on Trypticase soy broth containing sheep erythrocytes (bioMérieux, Marcy l’Etoile)), only slight hemolytic activity is observed for some strains (for example, P. cedrina), whereas the hemolytic activity catalyzed by phospholipase C of P. aeruginosa PAO1 is repressed2.

1. Vodovar, N. et al. Drosophila host defense after oral infection by an entomopathogenic Pseudomonas species. Proc. Natl. Acad. Sci. USA 102, 11414–11419 (2005).

2. Vasil, M.L., Berka, R.M., Gray, G.L. & Nakai, H. Cloning of a phosphate-regulated hemolysin gene (phospholipase C) from Pseudomonas aeruginosa. J. Bacteriol. 152, 431–440 (1982).


[Supplementary Figure 5 panels:]
1. Pyoverdine: PSEEN1813-PSEEN1815, PSEEN3223-PSEEN3234
2. Acinetobactin-like siderophore: PSEEN2492-PSEEN2507
3. Lipopeptide I: PSEEN0132, PSEEN3042-PSEEN3045, PSEEN3332
4. Lipopeptide II: PSEEN2138-PSEEN2156
5. Lipopeptide III: PSEEN2716-PSEEN2720
6. HCN and PKS: PSEEN5520-PSEEN5536


Supplementary Figure 5 Gene clusters involved in siderophore and secondary metabolism

biosynthesis in P. entomophila. Each cluster identified on the genome is represented and the

genes colored according to their assigned function as described in Figure 1: salmon, amino

acid biosynthesis; red, cellular processes; brown, central intermediary metabolism; navy blue,

regulatory functions and signal transduction; lime green, secondary metabolite biosynthesis;

teal, transport and binding proteins; black, unknown function and hypothetical proteins.

Concerning the siderophore related to acinetobactin, the gene cluster (PSEEN2492-

PSEEN2507) encodes determinants for both the synthesis of salicylic acid (SA)(PSEEN2504-

2507; similar to the pmsCEAB cluster from P. fluorescens WCS7343) and for the

nonribosomal synthesis and transport of a siderophore (PSEEN2492-PSEEN2503; similar to

the gene cluster involved in acinetobactin biosynthesis in Acinetobacter baumannii4).

Contrary to acinetobactin, the siderophore produced is thought to contain a salicylamide

moiety and might resemble to pseudomonine from P. fluorescens WCS734 as the genes

cluster in SA and siderophore biosynthesis seem to be linked3.

The cluster predicted to encode determinants for the synthesis of the lipodecapeptide I

contains three nonribosomal peptide synthetase (NRPS) encoding genes (PSEEN3332,

PSEEN3044-45) similar to those of P. fluorescens Pf-5 (PFL2144-2146) that are predicted to

be involved in a cyclic lipodecapeptide biosynthesis5. The ortholog of PFL2145 is found apart

from the two others and PSEEN3045 corresponds to a complete duplication of PFL2146. As

observed in the genome of P. fluorescens Pf-5, this cluster lacks an initial loading module;

this function may be carried by PSEEN0132 that specifies a loading module of NRPS. The

cluster predicted to encode determinants responsible for the synthesis of lipopeptide II

(PSEEN2139-PSEEN2156) is 32-kb long and encodes several NRPSs and PKSs which may

involved in the production of a novel uncharacterized lipopeptide. The cluster predicted to

encode determinants responsible for the synthesis of lipopeptide III contains three genes

(PSEEN2716-PSEEN2720) similar to Psyr1792-4 from P. syringae B728a. They encode two

NRPSs and a hybrid NRPS/polyketide synthase (PKS). As P. entomophila is not pathogenic

for plants (M. Arlat, unpublished data), this lipopeptide is probably not a phytotoxin but may

rather be involved in the suppression of plant diseases by competing with plant pathogens.

The cluster predicted to encode determinants for the synthesis of a polyketide lies adjacent to the genes encoding hydrogen cyanide synthase. It spans 21 kb and contains genes encoding five PKSs and six proteins related to polyketide biosynthesis (PSEEN5524-5536).


3. Mercado-Blanco, J. et al. Analysis of the pmsCEAB gene cluster involved in biosynthesis of salicylic acid and the siderophore pseudomonine in the biocontrol strain Pseudomonas fluorescens WCS374. J Bacteriol 183, 1909-1920 (2001).

4. Dorsey, C.W., Tolmasky, M.E., Crosa, J.H. & Actis, L.A. Genetic organization of an Acinetobacter baumannii chromosomal region harbouring genes related to siderophore biosynthesis and transport. Microbiology 149, 1227-1238 (2003).

5. Paulsen, I.T. et al. Complete genome sequence of the plant commensal Pseudomonas fluorescens Pf-5. Nat Biotechnol 23, 873-878 (2005).


[Supplementary Figure 6 image: (a) linear map of the chromosome (0 to 5 Mb) showing the positions of the aromatic catabolic gene clusters (pca/cat, mai/tyrB1, paa/pha, mhb, hpa, C1-hpaH/C2-hpaH, pcaGH/phh, tyrB2/fahA/hmgA, ben/cat, fad and quiA genes); (b) catabolic pathways for 4-hydroxybenzoate, quinate, benzoate, benzylamine, 3-hydroxybenzoate, phenylalanine, tyrosine, phenylethylamine, phenylalkanoates, phenylacetate and 4-hydroxyphenylacetate, leading to tricarboxylic acid cycle intermediates through the β-ketoadipate, gentisate, homogentisate, phenylacetate and homoprotocatechuate pathways.]

Supplementary information for Vodovar et al. "Complete genome sequence of the entomopathogenic and metabolically versatile soil bacterium Pseudomonas entomophila"


Supplementary Figure 6 Catabolic pathways for aromatic compounds identified in the P. entomophila genome. (a) The genes involved are positioned on a linear map of the chromosome. Unlike in the genome of Acinetobacter sp. ADP1 (ref. 6), most of these genes are dispersed throughout the genome. The exception is a 57-kb region (the ~60-kb cluster boxed in yellow), which contains all the determinants of the catechol and homoprotocatechuate pathways. It also contains several oxidases, dehydrogenases and oxygenases that might be involved in aromatic compound degradation. (b) Pathways similar to those found in P. putida KT2440 include the catechol and protocatechuate pathways, which both lead to the β-ketoadipate pathway, as well as the phenylacetate and homogentisate pathways. The pathways involved in phenylpropanoid utilization (vanillate, coumarate, ferulate, caffeate) are absent. However, several aldehyde dehydrogenases that might convert particular phenylpropanoids (e.g. coniferyl aldehyde, PSEEN0293) were identified. In contrast, the P. entomophila genome contains two additional catabolic gene clusters. The first (PSEEN2593-2596), which is also present in the genome of P. aeruginosa PAO1, is similar to the mhbDBIM operon of Klebsiella pneumoniae M5a1. That operon encodes determinants for the degradation of 3-hydroxybenzoate via gentisate (ref. 7). The second (PSEEN3092-3098) is similar to the hpaRAGEDFHI operon of Escherichia coli W, which encodes the meta-cleavage pathway of homoprotocatechuate (ref. 8); this operon is also present in the genomes of P. aeruginosa and P. fluorescens. Unlike E. coli W, P. entomophila lacks the hpaBC operon, whose products are involved in the first step of 4-hydroxyphenylacetate catabolism. Instead, it contains genes (PSEEN3104-3105) similar to the hpaH (C1-C2) operon of Acinetobacter baumannii, which encodes the same activity as hpaBC (ref. 9).
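The two ortho-cleavage branches in panel (b) can be read as a small directed graph in which the catechol and protocatechuate branches converge on β-ketoadipate enol-lactone before entering the shared β-ketoadipate pathway. The Python sketch below is a simplified illustration with hypothetical names (`PATHWAY`, `enzyme_route`), not an annotation resource. Its edges follow the standard β-ketoadipate pathway and the enzyme names shown in the figure, with one product per step and cofactors omitted. It uses breadth-first search to list the enzymes linking a substrate to a target.

```python
from collections import deque

# Simplified ortho-cleavage steps: substrate -> [(enzyme, product)].
# Enzyme names follow Supplementary Figure 6b; cofactors and co-products are omitted.
PATHWAY = {
    # protocatechuate branch
    "4-hydroxybenzoate": [("PobA", "protocatechuate")],
    "protocatechuate": [("PcaGH", "β-carboxy-cis,cis-muconate")],
    "β-carboxy-cis,cis-muconate": [("PcaB", "γ-carboxymuconolactone")],
    "γ-carboxymuconolactone": [("PcaC", "β-ketoadipate enol-lactone")],
    # catechol branch
    "benzoate": [("BenABC", "benzoate diol")],
    "benzoate diol": [("BenD", "catechol")],
    "catechol": [("CatA", "cis,cis-muconate")],
    "cis,cis-muconate": [("CatB", "muconolactone")],
    "muconolactone": [("CatC", "β-ketoadipate enol-lactone")],
    # shared β-ketoadipate pathway, ending in tricarboxylic acid cycle substrates
    "β-ketoadipate enol-lactone": [("PcaD", "β-ketoadipate")],
    "β-ketoadipate": [("CatIJ", "β-ketoadipyl-CoA")],
    "β-ketoadipyl-CoA": [("PcaF", "acetyl-CoA + succinyl-CoA")],
}


def enzyme_route(start: str, target: str):
    """Breadth-first search; return the enzymes from start to target, or None if unreachable."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        compound, enzymes = queue.popleft()
        if compound == target:
            return enzymes
        for enzyme, product in PATHWAY.get(compound, []):
            if product not in seen:
                seen.add(product)
                queue.append((product, enzymes + [enzyme]))
    return None
```

Adding the gentisate, homogentisate or phenylacetate branches would only require new dictionary entries.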

6. Barbe, V. et al. Unique features revealed by the genome sequence of Acinetobacter sp. ADP1, a versatile and naturally transformation competent bacterium. Nucleic Acids Res 32, 5766-5779 (2004).

7. Liu, D.Q., Liu, H., Gao, X.L., Leak, D.J. & Zhou, N.Y. Arg169 is essential for catalytic activity of 3-hydroxybenzoate 6-hydroxylase from Klebsiella pneumoniae M5a1. Microbiol Res 160, 53-59 (2005).

8. Prieto, M.A., Diaz, E. & Garcia, J.L. Molecular characterization of the 4-hydroxyphenylacetate catabolic pathway of Escherichia coli W: engineering a mobile aromatic degradative cluster. J Bacteriol 178, 111-120 (1996).


9. Thotsaporn, K., Sucharitakul, J., Wongratana, J., Suadee, C. & Chaiyen, P. Cloning and expression of p-hydroxyphenylacetate 3-hydroxylase from Acinetobacter baumannii: evidence of the divergence of enzymes in the class of two-protein component aromatic hydroxylases. Biochim Biophys Acta 1680, 60-66 (2004).


Supplementary Table 1 Comparison between the genome of P. entomophila and those of other pseudomonads

                                   Pp (a)   Pf (a)   Pa (a)  Pst (a)
No. of orthologous genes             3630     3301     2683     2603
% of orthologs in synteny            96.9     93.6     90.3     93.5
No. of synteny groups (b)             227      376      399      295
Maximal size of synteny groups        211       62       61       67
Average size of synteny groups       14.6      8.3      6.7      8.5

(a) Pp: P. putida KT2440; Pa: P. aeruginosa PAO1; Pf: P. fluorescens Pf-5; Pst: P. syringae pv. tomato DC3000. (b) Synteny groups are groups of genes shared by the two genomes (> 60% identity over > 80% of their length at the protein level) that display a conserved organization, allowing up to five insertion or deletion events.
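The synteny statistics in Supplementary Table 1 combine an ortholog filter with a grouping rule. The Python sketch below is a simplified, hypothetical illustration of how such groups could be computed; it is not the software used for this analysis. `OrthologPair`, `synteny_groups` and `summarize` are illustrative names, and `EXAMPLE_PAIRS` is toy data, not data from these genomes. It assumes one-to-one orthology and a linear gene order, ignoring the circularity of the chromosomes. It also interprets "five insertion or deletion events" as at most five genes skipped in total, across both genomes, between consecutive syntenic genes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrthologPair:
    """Hypothetical record for one protein-level match between two genomes."""
    pos_a: int        # gene rank along chromosome A (e.g. P. entomophila)
    pos_b: int        # gene rank along chromosome B (e.g. P. putida KT2440)
    identity: float   # % identity at the protein level
    coverage: float   # % of the protein length covered by the alignment


def is_ortholog(p: OrthologPair) -> bool:
    # Thresholds from footnote b: > 60% identity over > 80% of the length.
    return p.identity > 60.0 and p.coverage > 80.0


def synteny_groups(pairs, max_gap=5):
    """Greedily chain collinear ortholog pairs into synteny groups (simplified sketch)."""
    kept = sorted((p for p in pairs if is_ortholog(p)), key=lambda p: p.pos_a)
    groups, current, direction = [], [], 0
    for p in kept:
        if current:
            prev = current[-1]
            da, db = p.pos_a - prev.pos_a, p.pos_b - prev.pos_b
            step = 1 if db > 0 else -1
            # Same orientation in genome B as the rest of the group (or group of one).
            collinear = da >= 1 and db != 0 and direction in (0, step)
            skipped = (da - 1) + (abs(db) - 1)  # genes inserted/deleted between the pair
            if collinear and skipped <= max_gap:
                current.append(p)
                direction = step
                continue
            groups.append(current)
        current, direction = [p], 0
    if current:
        groups.append(current)
    # A group needs at least two shared genes.
    return [g for g in groups if len(g) > 1]


def summarize(pairs, max_gap=5):
    """Statistics in the style of Supplementary Table 1 for one genome pair."""
    n_orthologs = sum(1 for p in pairs if is_ortholog(p))
    groups = synteny_groups(pairs, max_gap)
    in_groups = sum(len(g) for g in groups)
    return {
        "orthologs": n_orthologs,
        "pct_in_synteny": 100.0 * in_groups / n_orthologs if n_orthologs else 0.0,
        "n_groups": len(groups),
        "max_size": max((len(g) for g in groups), default=0),
        "mean_size": in_groups / len(groups) if groups else 0.0,
    }


# Toy data only: an inversion starts a second group, and one match fails the filter.
EXAMPLE_PAIRS = [
    OrthologPair(1, 101, 90.0, 95.0),
    OrthologPair(2, 102, 85.0, 99.0),
    OrthologPair(4, 103, 70.0, 90.0),
    OrthologPair(5, 50, 95.0, 95.0),
    OrthologPair(6, 49, 90.0, 90.0),
    OrthologPair(7, 48, 40.0, 90.0),
    OrthologPair(8, 47, 80.0, 85.0),
]
```

The greedy chaining is only one of several reasonable designs; dynamic-programming chaining would tolerate noisier gene orders at the cost of more code.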


Supplementary Table 2 Gene comparison between P. entomophila and P. putida KT2440

                                                                          Pe (c)      Pp (d)
Total genes                                                                 5169        5404
Common genes (a)                                                            3630        3630
Number of duplicated genes (a / b)                                    262 / 1271  389 / 1585
Specific genes                                                              1539        1774
Number of specific genes duplicated in the genome (a / b)              139 / 410   237 / 635
% of specific duplicated genes duplicated among specific genes (a)           81%         76%

(a) Number of common or duplicated genes obtained with a constraint of 60% identity over 80% of the protein length. (b) Number of common or duplicated genes obtained with a constraint of 35% identity over 80% of the protein length. (c) Pe: P. entomophila. (d) Pp: P. putida KT2440.
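Supplementary Table 2 rests on two identity constraints and on simple set relationships between its rows. The Python sketch below uses hypothetical helpers, `match_level` and `check_table2`, and treats the thresholds as inclusive, which is an assumption. It classifies a protein-level match under constraints a and b. It then checks the table's internal arithmetic:

- common plus specific genes equal the total;
- any gene duplicated under the 60% constraint is also duplicated under the relaxed 35% constraint;
- specific duplicated genes are a subset of all duplicated genes.

```python
from typing import Optional


def match_level(identity: float, coverage: float) -> Optional[str]:
    """Classify a protein-level match under the two Supplementary Table 2 constraints.

    Returns "a" for >= 60% identity over >= 80% of the protein length (a match at
    level "a" also satisfies "b"), "b" for >= 35% identity over >= 80% of the
    length, else None. Inclusive thresholds are an assumption.
    """
    if coverage < 80.0:
        return None
    if identity >= 60.0:
        return "a"
    if identity >= 35.0:
        return "b"
    return None


# Counts transcribed from Supplementary Table 2; tuples are (constraint a, constraint b).
TABLE2 = {
    "Pe": {"total": 5169, "common": 3630, "specific": 1539,
           "dup": (262, 1271), "specific_dup": (139, 410)},
    "Pp": {"total": 5404, "common": 3630, "specific": 1774,
           "dup": (389, 1585), "specific_dup": (237, 635)},
}


def check_table2(table) -> list:
    """List the rows whose internal arithmetic does not hold (empty list if consistent)."""
    problems = []
    for genome, row in table.items():
        # Every gene is either common (shared) or specific to its genome.
        if row["total"] - row["common"] != row["specific"]:
            problems.append(f"{genome}: total - common != specific")
        for i, level in enumerate(("a", "b")):
            # Specific duplicated genes are a subset of all duplicated genes.
            if row["specific_dup"][i] > row["dup"][i]:
                problems.append(f"{genome}: specific_dup > dup under constraint {level}")
        for key in ("dup", "specific_dup"):
            strict, relaxed = row[key]
            # Any pair passing the 60% constraint also passes the 35% one.
            if strict > relaxed:
                problems.append(f"{genome}: {key} count under a exceeds b")
    return problems
```

For the values transcribed above, `check_table2(TABLE2)` returns an empty list.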