Journal of Structural and Functional Genomics 5: 195–204, 2004. © 2004 Kluwer Academic Publishers. Printed in the Netherlands.

Refolding strategies from inclusion bodies in a structural genomics project

Lionel Trésaugues¹, Bruno Collinet¹, Philippe Minard¹, Gilles Henckes², Robert Aufrère², Karine Blondeau², Dominique Liger¹, Cong-Zhao Zhou¹, Joël Janin³, Herman van Tilbeurgh¹,³ & Sophie Quevillon-Cheruel¹,*

¹Institut de Biochimie et de Biophysique Moléculaire et Cellulaire (CNRS-UMR 8619), Université Paris-sud, Bât. 430, F-91400 Orsay, France; ²Institut de Génétique et Microbiologie (CNRS-UMR 8621), Université Paris-sud, Bât. 360, F-91400 Orsay, France; ³Laboratoire d’Enzymologie et Biochimie Structurales (CNRS-UPR 9063), Bât. 34, 1 Av. de la Terrasse, F-91198 Gif sur Yvette, France; *Author for correspondence (Fax: (33) 1 69 85 37 15; e-mail: [email protected])

Received 13 June 2003; accepted in revised form 14 January 2004

Key words: cell-free protein synthesis, chaperones, E. coli expression, inclusion bodies, in vitro refolding, solubility, structural genomics

Abstract

The South-Paris Yeast Structural Genomics Project aims at systematically expressing, purifying and determining the structure of S. cerevisiae proteins with no detectable homology to proteins of known structure (http://genomics.eu.org/). We brought 250 yeast ORFs to expression in E. coli, but 37% of them form inclusion bodies. This important fraction of proteins that are well expressed but lost for structural studies prompted us to test methodologies to recover these proteins. Three different strategies were explored in parallel on a set of 20 proteins: (1) refolding from solubilized inclusion bodies using an original and fast screening test in 96-well plates, (2) co-expression of the targets in E. coli with the DnaK-DnaJ-GrpE and GroEL-GroES chaperones, and (3) use of a cell-free expression system. Most of the tested proteins (17/20) could be resolubilized by at least one approach, but the subsequent purification proved difficult for most of them.

Abbreviations: GdnHCl – guanidine hydrochloride; IPTG – isopropyl-β-D-thiogalactopyranoside; NMR – nuclear magnetic resonance spectroscopy; ORF – open reading frame; PCR – polymerase chain reaction; SDS-PAGE – sodium dodecylsulfate-polyacrylamide gel electrophoresis; TCA – trichloroacetic acid; β-SH – 2-mercaptoethanol.

Introduction

The huge amount of DNA sequence information that is now continuously becoming available provides unprecedented opportunities to study protein function, evolution and interactions. It is generally recognised that a major challenge of the post-genomic era will be proteomics [1]. What fraction of potential proteins is expressed, where in the cell, and in what chemical form they are active are just a few of the crucial questions that post-genomic studies will need to tackle [2]. A fundamental property for the understanding of protein function is the unique three-dimensional structure of each protein. Structural genomics initiatives all over the world now aim to speed up the protein structure determination process through a systematic approach to cloning, expression and purification [3, 4]. It is believed that these initiatives will steadily complete the protein structure catalogue, allowing meaningful structural information to be extracted for the majority of protein families [5].

The yeast genome codes for about 6000 proteins and, if membrane proteins are considered, reliable structural information is available for only about 10% of these [6]. The South-Paris Yeast Structural Genomics Project has targeted 250 yeast proteins of unknown structure. Membrane proteins, coiled-coil and multidomain proteins (at least those that could be predicted) were initially eliminated from the target list [7]. In the pilot project, we successfully expressed 80% of the 250 target proteins in E. coli, but 37% of these ended up in inclusion bodies. Up to now, 30% of the expressed proteins could be purified in sufficient quantities for structural studies (for an updated report, consult our web site http://genomics.eu.org/).

Some strategies to recover proteins expressed as inclusion bodies have been tried in structural genomics programs, with varying success. For instance, directed evolution has been used to create soluble mutants through a combination of random mutagenesis and selection [8]. In this paper we present three strategies for the recovery of proteins from inclusion bodies that can be inserted into a general structural genomics flowchart. The efficiency of the three strategies is compared using a test set of 20 well-expressed but insoluble proteins (Table 1). First, we developed a rapid systematic screening test to find the best refolding conditions for proteins purified under denaturing conditions. Second, we co-expressed these proteins with the main prokaryotic chaperone systems (GroEL, GroES, DnaJ, DnaK, GrpE). Third, we tried a cell-free expression system as an alternative to heterologous E. coli expression, to verify whether in vitro conditions provide a more favourable environment for soluble expression of eukaryotic proteins. It is crucial to note that solubility alone is not sufficient for structural genomics: the proteins must also be free of aggregation. The strategies presented here for the solubilization of proteins produced in inclusion bodies often lead to polydisperse samples.

Materials and methods

Cloning of selected ORFs

Restriction enzymes and T4 DNA ligase were purchased from New England Biolabs. DyNAzyme EXT DNA polymerase from Finnzymes was used for PCR amplification of selected ORFs. Oligonucleotides were purchased from MWG-Biotech. All DNA manipulations were performed in the XL1-Blue strain (Stratagene).

Table 1. List of the target proteins. The number given to the targets in the Yeast Structural Genomics Project is indicated (http://genomics.eu.org/), next to the systematic and usual name (http://genome-www.stanford.edu/Saccharomyces). The functions of the proteins are specified whenever they are known. MW: molecular mass, pI: theoretical isoelectric point of the proteins.

ORF   Systematic name  Usual name  MW (kDa)  pI    Function
4     YDL087c          LUC7        30.2      9.3   Spliceosomal subunit
12    YOL110w          ERF4        26.5      5.1   Involved in RAS localization and palmitoylation
27    YIL102c          –           11.3      5.0   Unknown
28    YEL073c          –           11.9      4.2   Unknown
32    YPL098c          –           12.2      9.7   Unknown
33    YLR156w          –           13.1      8.9   Unknown
55    YMR195w          ICY1        14.3      4.6   Unknown (interacting with the cytoskeleton?)
57    YDR511w          ACN9        15.8      9.1   Weak similarity to C. elegans protein F25H9.7
60    YOR226c          ISU2        16.9      9.6   Similarity to iron-sulfur cluster nitrogen fixation proteins
61    YLR010c          TEN1        18.6      6.9   Weak similarity to Aquifex aeolicus adenylosuccinate synthetase
70    YKL208w          CBT1        31.2      9.5   Subunit of complex involved in processing of the 3′ end of cyt-b pre-mRNA
71    YLR412w          –           31.7      6.7   Unknown
77    YGL258w-a        –           8.7       8.9   Unknown
115   YJR108w          ABM1        14.2      4.4   Required for normal microtubule structure
121   YJL016w          –           19.7      5.9   Unknown
127   YHR157w          REC104      20.7      4.8   Transcription activator, meiosis-specific protein
161   YLR307w          CDA1        35.7      5.1   Chitin deacetylase
179   YDR530c          APA2        36.8      5.3   Tetraphosphate phosphorylase II, nucleotide metabolism
180   YDR151c          CTH1        36.8      9.0   Zinc finger protein CTH1
191   YOR357c          SNO1        24.9      6.7   Putative pyridoxine (vit B6) biosynthetic enzyme


The pET-9 vector (Novagen) was modified by replacing the multiple cloning site with NdeI and NotI restriction sites in the 5′ and 3′ positions, respectively, and a SfiI restriction site in between, in order to counter-select against self-ligated vectors before transformation.

Selected ORFs were cloned by PCR using genomic DNA of the sequenced S288C Saccharomyces cerevisiae strain as a template (prepared with the Wizard® Genomic DNA purification kit from Promega). The ORFs were amplified by PCR using a 5′ oligonucleotide containing an NdeI site in place of the AUG codon, followed by the first six codons of the cloned ORF, and a 3′ oligonucleotide containing the last six codons of the desired ORF followed by six histidine codons, a stop codon and the NotI sequence. The PCR products were ligated into the derivative pET-9 vector to give pET-ORF. When an NdeI site already existed in the selected ORF, the cloning was made in a pET28 vector between NcoI and NotI sites. The sequence of the constructs was verified by QBiogen S.A.
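The primer layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `design_primers`, the histidine codon (CAC), the stop codon (TAA) and the reading of "first six codons" as the six codons following the start codon are all choices made here, not details given in the text. Only the NdeI (CATATG) and NotI (GCGGCCGC) recognition sequences are standard.

```python
# Hypothetical helper illustrating the primer layout; codon choices are assumptions.
NDEI = "CATATG"    # NdeI site; its ATG doubles as the start codon
NOTI = "GCGGCCGC"  # NotI site
HIS6 = "CAC" * 6   # six histidine codons (codon choice assumed)
STOP = "TAA"       # stop codon (choice assumed)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_primers(orf, n_codons=6):
    """Return (forward, reverse) primers, both written 5' to 3'."""
    orf = orf.upper()
    if len(orf) % 3 or not orf.startswith("ATG"):
        raise ValueError("ORF must start with ATG and contain whole codons")
    if orf[-3:] not in {"TAA", "TAG", "TGA"}:
        raise ValueError("ORF must end with a stop codon")
    if NDEI in orf:
        # The text switches to NcoI/NotI cloning in pET28 in this case.
        raise ValueError("internal NdeI site: NdeI/NotI cloning not applicable")
    coding = orf[:-3]  # drop the native stop so the His-tag is read through
    if len(coding) < 3 * (n_codons + 1):
        raise ValueError("ORF too short for the requested primer length")
    # Forward: NdeI site replacing the ATG, then the next n_codons codons.
    forward = NDEI + coding[3:3 + 3 * n_codons]
    # Reverse: last n_codons codons + 6xHis + stop + NotI, reverse-complemented.
    reverse = reverse_complement(coding[-3 * n_codons:] + HIS6 + STOP + NOTI)
    return forward, reverse
```

For a toy ORF such as `"ATG" + "GCT" * 10 + "TAA"`, the forward primer starts with the NdeI site and the reverse primer starts with the NotI site, since NotI is palindromic.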

Systematic in vitro refolding of denatured proteins

Isolation of inclusion bodies and purification under denaturing conditions

For our study, we chose 20 proteins that were expressed as inclusion bodies in BL21 (DE3) pLysS or Rosetta (DE3) pLysS E. coli strains (Novagen). The cells transformed with the constructs were grown in 2 × YT medium (BIO101 Inc.) at 37 °C up to an A600 nm of 1. Expression was induced with 0.3 mM IPTG (Sigma) and the cells were grown for a further 4 h at the same temperature. Cells were collected by centrifugation, resuspended in 20 mM Tris-HCl pH 7.5, 200 mM NaCl, 5 mM β-SH and stored at −20 °C. After extraction by freezing/thawing and sonication followed by centrifugation at 5000 g for 10 min, the pellet was washed twice with a 0.2% Triton solution containing 5 mM β-SH and centrifuged at 8000 g for 20 min. The pellet was then solubilized in 30 ml of 25 mM Tris-HCl pH 7.5 buffer containing 5 mM β-SH and 6 M GdnHCl, stirred for 12 h at 4 °C and centrifuged at 20000 g for 30 min [9]. The solubilized protein was purified by metal chelate affinity chromatography using Ni2+-NTA beads (Qiagen, Hilden, Germany) with a bed volume of 3 ml. The resin was washed with 20 ml of 50 mM Tris-HCl pH 7.5, 150 mM NaCl, 5 mM β-SH and then equilibrated with the same buffer complemented with 6 M GdnHCl. 10 mM imidazole was added to the solubilized protein before loading onto the resin. After washing with 30 ml of buffer containing 6 M GdnHCl, 50 mM Tris-HCl pH 7 or 8, 150 mM NaCl, 5 mM β-SH and 10 mM imidazole, the protein was eluted with 10 ml of 6 M GdnHCl, 50 mM Tris-HCl pH 7 or 8, 150 mM NaCl, 5 mM β-SH, 400 mM imidazole and concentrated by centrifugation (Millipore filtration system, cut-off 5000 Da) to about 4 mg/ml. The purity of the protein was evaluated by SDS-PAGE after precipitation of the sample with TCA.

In vitro renaturation by dilution in 96-well plates

We designed a multi-well plate strategy that allows one to follow in parallel the refolding process of 7 different purified proteins in 4 different buffers: 200 mM NaCl, 20% glycerol, 800 mM arginine and a mix of 50 mM each of CuSO4, ZnCl2, MgCl2, MnCl2, ADP, NADH, biotin and thiamine (so-called ‘cocktail’ buffer). Each refolding compound was prepared in solutions at three different pH values (5.5, 7 and 8.5), giving 12 refolding conditions in total. 5 µl of GdnHCl-solubilized protein solution at 2 mg/ml (supplemented with 10 mM β-SH) was deposited in the well plate and refolding was triggered by dilution with 95 µl of the appropriate refolding buffer. The plate was shaken at room temperature for 60 min and protein aggregation was monitored by following light scattering (i.e. turbidity of the solution) at 390 nm directly from the 96-well plates, after 15, 30, 45 and 60 min. The refolding experiment was performed in triplicate.
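The plate design and the dilution arithmetic can be written down as a short sketch. The assignment of one protein per row and one condition per column is an assumption made here for illustration; the text only states that 7 proteins are screened against 12 conditions. The 0.3 M residual GdnHCl figure further assumes that the protein stock is still in 6 M GdnHCl.

```python
from itertools import product
from string import ascii_uppercase

ADDITIVES = ["200 mM NaCl", "20% glycerol", "800 mM arginine", "cocktail"]
PH_VALUES = [5.5, 7.0, 8.5]


def plate_layout(proteins):
    """Map well names (A1, A2, ...) to (protein, additive, pH).

    Assumed layout: one protein per row, one refolding condition per column.
    """
    if len(proteins) > 8:
        raise ValueError("a 96-well plate has only 8 rows")
    conditions = list(product(ADDITIVES, PH_VALUES))  # 4 x 3 = 12 columns
    layout = {}
    for row, protein in zip(ascii_uppercase, proteins):
        for col, (additive, ph) in enumerate(conditions, start=1):
            layout[f"{row}{col}"] = (protein, additive, ph)
    return layout


def diluted(stock, sample_ul=5.0, buffer_ul=95.0):
    """Concentration after mixing sample_ul of stock with buffer_ul of buffer."""
    return stock * sample_ul / (sample_ul + buffer_ul)


layout = plate_layout([f"ORF{n}" for n in (4, 12, 27, 28, 32, 33, 55)])
final_protein_mg_ml = diluted(2.0)  # 2 mg/ml stock, 20-fold dilution
residual_gdnhcl_m = diluted(6.0)    # assumes the stock is in 6 M GdnHCl
```

Seven proteins fill 84 wells, and the 20-fold dilution brings the protein to 0.1 mg/ml while lowering the denaturant to about 0.3 M under the stated assumption.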

Determination of the ratio of the soluble and insoluble protein fraction

50 µl of pure, 6 M GdnHCl-denatured protein was rapidly diluted in 1 ml of the buffer giving the smallest absorbance at 390 nm in the screening test, and renaturation proceeded for 1 h at 15 °C. The sample was then centrifuged at 20,000 g for 30 min and the ‘soluble fraction’ (supernatant) was analysed by SDS-PAGE and compared with the ‘total fraction’ (before centrifugation). The soluble fraction was then estimated by scanning the Coomassie brilliant blue-stained protein bands with a Personal Densitometer SI (Molecular Dynamics). When arginine-containing buffers were used, samples were desalted on a Sephadex G25 HiTrap column (Pharmacia) prior to centrifugation in order to estimate the soluble fraction in the absence of arginine.


Determination of the polydispersity of the soluble fraction

In order to check for the presence of multimeric species formed during renaturation of the protein, 200 µl of the soluble fraction was loaded on a Superose-12 column 16/20 (Amersham Pharmacia Biotech) and eluted at 0.5 ml/min in the previously determined best buffer, supplemented with 200 mM NaCl and 10 mM β-SH (only 1 mM β-SH when the best buffer was the ‘cocktail’ buffer).

In vivo chaperone co-expression with the target protein

The plasmid pG-KJE3, which expresses the DnaK-DnaJ-GrpE and GroEL-GroES chaperones (a kind gift from Y. Takashi [10]), is compatible with our pET expression system. The E. coli expression strain XL10-Gold from Stratagene was co-transformed with pG-KJE3 and pET-ORF, and grown in 10 ml of 2 × YT (BIO101 Inc.) at 37 °C until the culture reached an A600 nm of 1. The chaperone proteins were simultaneously over-expressed under the control of the araB promoter by adding 2 mg/ml arabinose to the culture. 10 min later, the expression of yeast proteins was induced with 0.3 mM IPTG for 3 h at either 25 or 37 °C. Cells were collected by centrifugation, resuspended in lysis buffer (20 mM Tris-HCl pH 8, 200 mM NaCl, 5 mM β-SH) and stored at −20 °C. Cells were lysed by two cycles of freezing/thawing followed by sonication. An aliquot was collected and the total extract was analysed by SDS-PAGE. The rest of the samples were centrifuged at 20,000 g and 4 °C; the supernatant corresponding to the soluble fraction of the cells was treated as described. Proteins were detected by staining the gels with Coomassie brilliant blue, and the fractions of soluble versus total produced protein were estimated by densitometry.

Cell-free synthesis of target proteins

S30 extract was prepared from the E. coli BL21-CodonPlus strain (Stratagene) according to the protocol developed by Yokoyama’s group [11]. pET9-derived plasmids were used as DNA templates. Batch-wise reaction mixtures containing 7.2 µl of S30 extract and 6.7 ng/µl (final concentration) of plasmid, in a final volume of 30 µl, were incubated at either 37 °C or 25 °C for 1 h. Total fractions (10 µl of batch reaction after incubation) and soluble fractions (10 µl of supernatant obtained after centrifugation for 30 min at 20000 g) were first acetone precipitated on ice for 10 min and centrifuged at 15000 g for 15 min. The dry pellet was then resuspended in Laemmli buffer and analyzed by SDS-PAGE. His-tagged T7 RNA polymerase was expressed and purified according to the protocol of Ellinger and Ehricht [12].
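As a quick check of the batch reaction quantities, the amounts implied by the figures above can be computed directly. The function name `batch_reaction` and the returned keys are illustrative only. The remaining volume covers the plasmid and all other reaction components, whose composition is not specified in this section.

```python
def batch_reaction(final_volume_ul=30.0, s30_ul=7.2, plasmid_ng_per_ul=6.7):
    """Derive per-reaction amounts from the quantities quoted in the text."""
    return {
        # total template mass needed for the stated final concentration
        "plasmid_ng": plasmid_ng_per_ul * final_volume_ul,
        # volume fraction of S30 extract in the reaction
        "s30_fraction": s30_ul / final_volume_ul,
        # volume left for plasmid and all other components (unspecified here)
        "remaining_ul": final_volume_ul - s30_ul,
    }


amounts = batch_reaction()
```

With the default values this gives about 201 ng of plasmid per 30 µl reaction, with the S30 extract making up 24% of the volume.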

Results

In this paper we present our strategy to improve the yield of soluble protein after over-expression for structural studies. With the perspective of a large-scale application in the context of a structural genomics project, we present a comparative study of three different solubilization strategies tested on 20 proteins that were expressed as inclusion bodies: in vitro refolding, in vivo chaperone co-expression and cell-free expression. The general flowchart of the project is shown in Figure 1.

Expression of the target proteins in E. coli

The 20 target proteins were selected from the pool of 71 proteins expressed as inclusion bodies in the yeast structural genomics project. Their E. coli expression level in 750 ml of 2 × YT medium varied between 5 and 50 mg of pure inclusion bodies in total. The systematic names and some biochemical properties, as well as their internal numbering used in the project, are gathered in Table 1. Their molecular weights range from 8.7 to 36.8 kDa and their theoretical pI values vary between 4.41 and 9.73. About half of them are of unknown function.

In vitro refolding

We first designed a fast and convenient procedure for an initial screen of the best refolding conditions for each of a subset of proteins expressed as inclusion bodies in E. coli. After isolation of inclusion bodies, the GdnHCl-solubilized denatured proteins were purified by a single Ni2+ affinity chromatographic step (purity between 75 and 90%, data not shown) and concentrated up to 4 mg/ml for the in vitro refolding protocol. This protocol consists of three steps. First, the best refolding buffer is explored on a microgram scale (96-well plates). In order to rapidly select good refolding buffers we used light scattering, a good indicator of misfolding, as a screening test (Figure 2). Second, the ratio and the polydispersity of the soluble fraction obtained with the selected buffer are monitored on milligram-scale preparations (Figure 3). Finally, if satisfactory conditions are found, they are applied to the preparation of protein for structural studies.

The 96-well plate refolding screen is based on the detection of protein aggregate formation by measurement of light scattering. A low and constant absorption value at 390 nm after dilution of the denatured protein sample in the refolding buffer indicates the absence of high molecular weight aggregates. Our working hypothesis is that such a buffer may offer potential refolding conditions. In a typical experiment, using one multi-well plate, 7 proteins are tested in parallel in 4 different types of buffers at 3 different pH values. The first buffer contained NaCl; the second, glycerol, which is commonly used to prevent hydrophobic interactions between proteins; and the third, arginine, recognized as a good additive for refolding [13]. Finally, we tested a buffer composed of a cocktail of cofactors and metallic ions that may help the folding process. The list of our additives is not exhaustive and can be supplemented provided the compounds do not precipitate from the mixed buffer. The ORFs were selected to be devoid of export sequences and are therefore supposed to code for cytoplasmic, disulfide-free proteins [7]. The reducing potential of the refolding buffer should therefore fit all of the proteins used in the present study. Figure 2A shows data for three protein samples with different behaviour: citrate synthase, which forms aggregates under these conditions and served as a positive control [14]; ORF60, which remains soluble in every refolding buffer; and ORF4, which can only be refolded in arginine buffer. The behaviour of all 20 selected proteins falls under one of these three cases.

As can be seen in Figure 2B, the best buffer condition, defined as the one leading to the minimal increase of light scattering upon dilution, contained 800 mM arginine for 15/20 proteins. The ‘cocktail’ buffer ranks second (3 proteins); NaCl (1 protein) and glycerol buffers (1 protein) were less successful.
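The selection rule used in the screen (lowest light scattering after dilution) can be sketched as follows. The scoring function is an assumption made for illustration: each condition is scored by the highest A390 reached during the 60 min time course, averaged over the triplicates. The readings in `EXAMPLE_READINGS` are made-up numbers, not data from this study.

```python
from statistics import mean

# Made-up A390 readings (15, 30, 45, 60 min) in triplicate, for illustration only.
EXAMPLE_READINGS = {
    ("200 mM NaCl", 7.0): [
        [0.20, 0.28, 0.33, 0.35],
        [0.22, 0.30, 0.34, 0.36],
        [0.21, 0.27, 0.31, 0.34],
    ],
    ("20% glycerol", 7.0): [
        [0.15, 0.19, 0.22, 0.24],
        [0.14, 0.18, 0.21, 0.23],
        [0.16, 0.20, 0.22, 0.25],
    ],
    ("800 mM arginine", 8.5): [
        [0.04, 0.04, 0.05, 0.05],
        [0.05, 0.05, 0.05, 0.06],
        [0.04, 0.05, 0.05, 0.05],
    ],
}


def aggregation_score(replicates):
    """Mean over replicates of the highest A390 seen in each time course."""
    return mean(max(course) for course in replicates)


def best_condition(readings):
    """Return the (additive, pH) key with the lowest aggregation score."""
    return min(readings, key=lambda cond: aggregation_score(readings[cond]))


best = best_condition(EXAMPLE_READINGS)
```

Scoring on the maximum rather than the final reading penalises conditions in which turbidity rises at any point, which matches the "low and constant" criterion; other scores (final value, slope) would be equally defensible.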

Although arginine-containing buffers were very efficient in maintaining the protein in soluble form, we realized that such high arginine concentrations would not be compatible with NMR or crystallization experiments. The behaviour of the 20 individual proteins after desalting in arginine-free buffer is summarized in Table 2. During removal of arginine by desalting of refolded proteins on a G25 column, we observed three types of behaviour: the protein (i) remained soluble (ORFs 12, 55, 60, 161, 179 and 191), (ii) remained only partially soluble (ORFs 4, 27, 61, 71, 77 and 180), or (iii) totally precipitated.

Figure 1. General flowchart for protein preparation within the Yeast Structural Genomics Project. The target ORFs are cloned in a pET-type expression vector, in fusion with a 6 His-Tag at the C-terminus. The expression level and in vivo solubility are tested at a small scale as described in [7]. The left branch of the scheme illustrates the procedure followed in case of soluble proteins. The right branch, composed of the three options, illustrates the strategies applied in this study for the recovery of inclusion bodies.

Finally, the monodispersity of the soluble samples after desalting was checked by size exclusion chromatography (Figure 3A). The elution profiles of the desalted protein samples showed that only a few were monodisperse (exemplified by ORF55), some others were in multimer equilibrium (monomer or dimer, as illustrated by ORF180), while the remaining ones seem to form soluble aggregates (Figure 3B).

Co-expression with chaperones

We wanted to test alternative and potentially general strategies that are more easily adaptable to the general flowchart of a high-throughput approach (Figure 1). It is now well recognized that co-expression of molecular chaperones in E. coli can help recombinant proteins to adopt a native conformation during their synthesis [15–17]. In this report we tested the effect of co-expressing the five major E. coli molecular chaperones DnaK-DnaJ-GrpE and GroEL-GroES with the 20 proteins described above.

Figure 2. In vitro refolding by dilution in 96-well plates. (A) Light scattering measurement at 390 nm after dilution. Buffers: 200 mM NaCl at pH 5.5, 7 or 8.5; 20% glycerol at pH 5.5, 7 or 8.5; 800 mM arginine at pH 5.5, 7 or 8.5; mix of 50 mM each of CuSO4, ZnCl2, MgCl2, MnCl2, ADP, NADH, biotin and thiamine at pH 5.5, 7 or 8.5 (cocktail); CS: citrate synthase. (B) Distribution of the best refolding buffers (i.e. those giving the lowest absorption values at 390 nm after dilution of the GdnHCl-solubilized sample in the test buffer).

Figure 3. Determination of the polydispersity of the in vitro refolded proteins. (A) Elution profiles from an analytical Superose-12 column. Typical behaviour for three ORFs is illustrated: ORF55 elutes as a dimer, ORF180 as a mix of dimer and tetramer, and ORF161 seems to form aggregates. (B) Distribution chart for the behaviour of refolded proteins during desalting and gel filtration.

As a first approach, we decided to over-express the five chaperones simultaneously, with a medium activation strength of the promoter. Interestingly, co-expression of DnaK-DnaJ-GrpE and GroEL-GroES with the 20 yeast proteins under study only slightly affected the expression level of the target proteins (Figure 4A): 14 proteins have similar expression levels in standard strains and chaperone-expressing strains (9 with high yield, 6 with low yield). One protein is expressed at 50% and two at less than 50% of the levels obtained with our standard expression strain. Three proteins could not be expressed in this system, even though they are well expressed without chaperones. As seen in Table 2 and Figure 4B, co-expression with chaperones appears very successful, since solubility was improved for 16 of the 17 expressed proteins: the solubility gain exceeds 75% for 3 of them, lies between 25 and 75% for 10, and is below 25% for 3. Figure 4C shows an SDS-PAGE gel illustrating the case of ORF71, which is found mainly in the soluble fraction when over-expressed with chaperones at induction temperatures of 37 or 25 °C. Other proteins, such as ORF161, remained entirely in inclusion bodies, even at the lower induction temperature (data not shown). The production and purification of 10 of the 16 partially soluble proteins was scaled up (750 ml cultures). The final gel-filtration step yielded monodisperse samples, free of chaperones, for four of these proteins (ORF4, 71, 76 and 191), and crystallization screens could be launched. For ORF71 and 179, 50% of the sample eluted as a monodisperse fraction, while the remaining part was contaminated by chaperones. ORF55, 57, 60 and 121 probably eluted as soluble aggregates, since they came out in the dead volume of the column.
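The distribution of solubility gains quoted for chaperone co-expression can be re-derived from the chaperone column of Table 2. The sketch below copies those percentages (with "no" entered as 0 and the three non-expressed ORFs left out). The class boundaries at 25% and 75% are taken from the text, and treating both boundaries as part of the middle class is an assumption.

```python
from collections import Counter

# Chaperone co-expression solubility (%) per ORF, copied from Table 2.
# "no" is entered as 0; ORFs 32, 115 and 127 (no expression) are omitted.
CHAPERONE_SOLUBILITY = {
    4: 20, 12: 50, 27: 80, 28: 30, 33: 10, 55: 50, 57: 90, 60: 50,
    61: 10, 70: 70, 71: 70, 77: 50, 121: 70, 161: 0, 179: 80, 180: 70,
    191: 50,
}


def gain_class(percent):
    """Bin a solubility percentage into the classes used in the text."""
    if percent == 0:
        return "none"
    if percent < 25:
        return "<25%"
    if percent <= 75:
        return "25-75%"
    return ">75%"


counts = Counter(gain_class(p) for p in CHAPERONE_SOLUBILITY.values())
```

With these inputs the counts come out as 3 proteins above 75%, 10 between 25 and 75%, 3 below 25% and 1 with no gain, for 17 expressed proteins in total.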

Cell-free expression

We wanted to test whether in vitro expression improves the solubility of proteins that are produced as inclusion bodies in E. coli. We applied the in vitro expression system developed at the Genomic Science Center (Riken, Yokohama, Japan) [18]. We first tested expression at ‘batch’ scale at various incubation temperatures. At 37 °C, 17 proteins out of 20 are well expressed (Table 2 and Figure 5), but expression levels drop dramatically at 25 °C (data not shown). Interestingly, 7 out of 17 expressed proteins are present in the soluble fraction. The soluble fraction reaches 90% for two of them (ORFs 179 and 191), 50% for ORF60, and between 5 and 25% for ORF57, 61, 71 and 77. The production of three partially soluble ORFs is currently being scaled up for crystallization purposes.

Discussion

A serious hurdle for the systematic structure determination of proteins is obtaining pure, concentrated and soluble samples. Many proteins fail to adopt their three-dimensional structure and end up as inclusion bodies when expressed in the more convenient bacterial expression systems [19]. A large-scale expression experiment on the Thermotoga maritima proteome, for instance, showed that only 40% of the tested proteins were expressed in sufficient quantities and in a soluble form [20]. Another study at a comparable scale on Methanobacterium thermoautotrophicum gave about 40% insoluble proteins in the pool of expressed proteins (80% of the cloned ORFs) [21]. The problem became more pronounced with increasing ORF length. Reasons for folding failure may be multiple: a protein may need general or specific chaperones, appropriate redox potentials, specific protein partners, cofactors, etc. It is clear that these problems will get worse with protein targets from more complex eukaryotic organisms. Proteins in these organisms are more often modular than bacterial ones and can be subject to many complex post-translational modifications that are not present in bacteria.

Table 2. Summary of the results obtained for the three strategies. (R) In vitro refolding by dilution; the buffer giving the best refolding score in the 96-well plate format is noted, as well as the percentage of soluble protein after desalting. (C) E. coli co-expression with chaperones; the expression level of the targets and the percentage of solubility gain are noted. (F) Cell-free expression; the expression level of the proteins and their percentage of solubility are noted.

ORF   Refolding (R)             Chaperones (C)            Cell-free (F)             Method of choice
      best buffer / solubility  expr. level / solubility  expr. level / solubility
4     Arg pH8.5 / 15%           +++ / 20%                 +++ / no                  R / C
12    Arg pH8.5 / 80%           + / 50%                   ++ / no                   R / C
27    Arg pH8.5 / 50%           + / 80%                   No expression             R / C
28    Glycerol / no             ++ / 30%                  No expression             C
32    Arg pH5.5 / no            No expression             + / no                    None
33    Cocktail pH5.5 / no       +++ / 10%                 No expression             C
55    Arg pH7 / 80%             + / 50%                   No expression             R / C
57    Arg pH7 / no              + / 90%                   +++ / 20%                 C / F
60    Cocktail pH5.5 / 75%      ++ / 50%                  +++ / 50%                 R / C / F
61    Arg pH8.5 / 15%           +++ / 10%                 +++ / 10%                 R / C / F
70    Cocktail pH5.5 / no       ++ / 70%                  +++ / no                  C
71    Arg pH8.5 / 25%           ++ / 70%                  +++ / 10%                 R / C / F
77    NaCl pH5.5 / 50%          ++ / 50%                  +++ / 5%                  R / C / F
115   Arg pH7 / no              No expression             ++ / no                   None
121   Arg pH7 / no              + / 70%                   +++ / no                  C
127   Arg pH8.5 / no            No expression             ++ / no                   None
161   Arg pH8.5 / 90%           +++ / no                  +++ / no                  R
179   Arg pH7 / 90%             +++ / 80%                 + / 90%                   R / C / F
180   Arg pH8.5 / 20%           + / 70%                   ++ / no                   R / C
191   Arg pH8.5 / 80%           ++ / 50%                  + / 90%                   R / C / F

Our study compares and assesses the efficiency of three different approaches to solubilize proteins that are expressed as inclusion bodies in E. coli under standard conditions. These protocols have been tested on the same set of 20 proteins. The results are summarised by means of a Venn diagram in Figure 6, showing the successful ORFs for the different approaches and their overlaps. Six proteins have been refolded (according to the experimental criteria described in this paper) in all three protocols. Five were refolded with chaperone co-expression as well as in vitro refolding, but were not soluble in the in vitro expression system. One protein was refolded by both chaperone co-expression and in vitro expression. One protein could only be solubilised by in vitro refolding, and four proteins were soluble only by chaperone co-expression. All three protocols failed for three of the proteins. We should point out that, although the majority of proteins could be obtained in large quantities by the ‘chaperone’ and/or ‘in vitro refolding’ protocols, we frequently met aggregation and/or chaperone contamination problems during the subsequent purification steps.
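The counts quoted for the Venn diagram can be cross-checked against the "Method of choice" column of Table 2. The short Python sketch below is an illustration only, not part of the original analysis: it assumes that each letter listed in that column (R, C, F) marks a protocol counted as successful for the ORF, and the names `METHOD_OF_CHOICE` and `venn_regions` are hypothetical.

```python
from collections import Counter

# "Method of choice" column of Table 2, keyed by ORF number.
# R = in vitro refolding, C = chaperone co-expression, F = cell-free expression.
# Assumption: a letter in this column means the protocol succeeded for that ORF.
METHOD_OF_CHOICE = {
    4: "RC", 12: "RC", 27: "RC", 28: "C", 32: "", 33: "C", 55: "RC",
    57: "CF", 60: "RCF", 61: "RCF", 70: "C", 71: "RCF", 77: "RCF",
    115: "", 121: "C", 127: "", 161: "R", 179: "RCF", 180: "RC", 191: "RCF",
}


def venn_regions(methods):
    """Count ORFs in each exclusive region of the three-set Venn diagram."""
    return Counter(frozenset(letters) for letters in methods.values())


regions = venn_regions(METHOD_OF_CHOICE)

# Print the regions, largest overlaps first; the empty set means "None".
for combo, count in sorted(regions.items(),
                           key=lambda kv: (-len(kv[0]), sorted(kv[0]))):
    label = " / ".join(sorted(combo, key="RCF".index)) or "None"
    print(f"{label:10s} {count}")

# ORFs rescued by at least one protocol (non-empty region).
rescued = sum(count for combo, count in regions.items() if combo)
print(f"rescued by at least one approach: {rescued}/{len(METHOD_OF_CHOICE)}")
```

Under that assumption the tally gives six ORFs for R / C / F, five for R / C, one for C / F, one for R alone, four for C alone and three for none, that is 17 of 20 proteins rescued by at least one approach, in agreement with the text.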

From these studies it appears that all three approaches are potentially useful to recover proteins expressed as inclusion bodies. As the best method for any given protein cannot be predicted, the three strategies can easily be used in parallel when a limited number of protein samples is being processed. However, for systematic structural genomics programs this might not be possible, and establishing a hierarchy of refolding procedures is probably a more realistic approach. Our results suggest that chaperone co-expression is altogether the most convenient and efficient strategy, and probably the first to be tried. It can easily be inserted into a general expression protocol, and the chaperone-rescued proteins can then be processed through the same pipeline as soluble proteins. For proteins that are not solubilized or are contaminated by chaperones after purification (even after incubation with ATP), we suggest that in vitro expression at 37 °C or refolding in arginine-containing buffer may be the most efficient options. A considerable difficulty for the rescue of proteins from inclusion bodies is the control for monodispersity and correct folding of the sample. Routinely, we opted for a size exclusion chromatographic control, but HSQC-NMR spectroscopic analysis of the soluble fractions is clearly a complementary option in order to distinguish between folded, unfolded and aggregated protein samples [22].
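The suggested hierarchy can be written down as a small decision rule. The Python sketch below is only one possible reading of the paragraph above, not a protocol taken from this work; the names `Step` and `next_step` and the three boolean inputs are hypothetical.

```python
from enum import Enum


class Step(Enum):
    """Hypothetical labels for the stages of the suggested hierarchy."""
    CHAPERONE_COEXPRESSION = "chaperone co-expression"
    STANDARD_PIPELINE = "same pipeline as soluble proteins"
    FALLBACK = ("in vitro expression at 37 °C or refolding "
                "in arginine-containing buffer")


def next_step(coexpression_tried, soluble, chaperone_contaminated):
    """Return the next stage for a target expressed as inclusion bodies.

    coexpression_tried: chaperone co-expression has already been attempted.
    soluble: the target was solubilized by chaperone co-expression.
    chaperone_contaminated: chaperones still co-purify with the target,
        even after incubation with ATP.
    """
    if not coexpression_tried:
        # Chaperone co-expression is suggested as the first strategy to try.
        return Step.CHAPERONE_COEXPRESSION
    if soluble and not chaperone_contaminated:
        # Rescued proteins follow the same route as soluble proteins.
        return Step.STANDARD_PIPELINE
    # Not solubilized, or contaminated by chaperones: try the alternatives.
    return Step.FALLBACK


print(next_step(False, False, False).value)
print(next_step(True, True, False).value)
print(next_step(True, True, True).value)
```

The rule deliberately leaves the choice between the two fallback options open, since the text does not rank them.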

We have recently applied this approach to a new set of 13 proteins expressed as inclusion bodies. All of them were solubilized by chaperone co-expression, and six were successfully purified and are being tested for crystallization. For two of these the structure could be solved (manuscripts in preparation), and for one the biochemical function could be demonstrated [23]. Three of the 13 proteins precipitated during purification, suggesting that permanent interactions with chaperones were essential for maintaining solubility.

Figure 4. In vivo co-expression of the proteins with chaperones. (A) Distribution chart of the protein expression levels in E. coli in the presence of chaperones compared to standard expression. (B) Distribution chart of the solubility gain obtained with chaperone co-expression. (C) SDS-PAGE showing the total and soluble extracts of E. coli overexpressing ORF71 without or with chaperones, at 37 or 25 °C.

Acknowledgements

This work was supported by grants from the Ministère de la Recherche et de la Technologie (Programme Génopoles). We are very grateful to Thomas Ellinger for the generous gift of the T7 RNA polymerase expression system (including expression vector and E. coli BL21(pREP4) strain). B.C. wishes to thank members of Yokoyama’s group, and more particularly Akiko Tanaka and Rie Nakajima, for teaching him how to prepare S30 extract and for providing helpful expertise about the cell-free expression system.

References

1. Horak, C.E. and Snyder, M. (2002) Funct. Integr. Genom. 2, 171–180.

2. Forster, J., Famili, I., Fu, P., Palsson, B.O. and Nielsen, J. (2003) Genome Res. 13, 244–253.

3. Brenner, S.E. (2001) Nat. Rev. Genet. 2, 801–809.

4. Knaust, R.K. and Nordlund, P. (2001) Anal. Biochem. 297, 79–85.

5. Hou, J., Sims, G.E., Zhang, C. and Kim, S.H. (2003) Proc. Natl. Acad. Sci. USA 100, 2386–2390.

6. Sanchez, R. and Sali, A. (1998) Proc. Natl. Acad. Sci. USA 95, 13597–13602.

7. Quevillon-Cheruel, S., Collinet, B., Zhou, C.Z., Minard, P., Blondeau, K., Henkes, G., Aufrere, R., Coutant, J., Guittet, E., Lewit-Bentley, A., Leulliot, N., Ascone, I., Sorel, I., Savarin, P., Li de la Sierra-Gallay, I.L., de la Torre, F., Poupon, A., Fourme, R., Janin, J. and van Tilbeurgh, H. (2003) J. Synchrotron Radiat. 10, 4–8.

8. Pedelacq, J.D., Piltch, E., Liong, E.C., Berendzen, J., Kim, C.Y., Rho, B.S., Park, M.S., Terwilliger, T.C. and Waldo, G.S. (2002) Nat. Biotechnol. 20, 927–932.

9. Georgiou, G. and Valax, P. (1999) Methods Enzymol. 309, 48–58.

10. Nishihara, K., Kitagawa, M., Yanagi, H. and Yura, T. (1998) Appl. Environ. Microbiol. 64, 1694–1699.

11. Kigawa, T., Yabuki, T., Yoshida, Y., Tsutsui, M., Ito, Y., Shibata, T. and Yokoyama, S. (1999) FEBS Lett. 442, 15–19.

12. Ellinger, T. and Ehricht, R. (1998) Biotechniques 24, 718–720.

13. Lilie, H., Schwarz, E. and Rudolph, R. (1998) Curr. Opin. Biotechnol. 9, 497–501.

14. Buchner, J., Schmidt, M., Fuchs, M., Jaenicke, R., Rudolph, R., Schmid, F.X. and Kiefhaber, T. (1991) Biochemistry 30, 1586–1591.

15. Baneyx, F. (1999) Curr. Opin. Biotechnol. 10, 411–421.

Figure 5. Distribution chart of the percentage of protein present in the soluble fraction versus the total fraction for proteins expressed in the cell-free expression system, compared to standard expression in E. coli.

Figure 6. Global overview of the efficiency of the three refolding strategies for the solubility of the 20 tested proteins. Figures in circles denote the number of successful ORFs.


16. Ben-Zvi, A.P. and Goloubinoff, P. (2001) J. Struct. Biol. 135, 84–93.

17. Maki, J.A., Schnobrich, D.J. and Culver, G.M. (2002) Mol. Cell 10, 129–138.

18. Kigawa, T., Muto, Y. and Yokoyama, S. (1995) J. Biomol. NMR 6, 129–134.

19. Yokoyama, S. (2003) Curr. Opin. Chem. Biol. 7, 39–43.

20. Lesley, S.A., Kuhn, P., Godzik, A., Deacon, A.M., Mathews, I., Kreusch, A., Spraggon, G., Klock, H.E., McMullan, D., Shin, T., Vincent, J., Robb, A., Brinen, L.S., Miller, M.D., McPhillips, T.M., Miller, M.A., Scheibe, D., Canaves, J.M., Guda, C., Jaroszewski, L., Selby, T.L., Elsliger, M.A., Wooley, J., Taylor, S.S., Hodgson, K.O., Wilson, I.A., Schultz, P.G. and Stevens, R.C. (2002) Proc. Natl. Acad. Sci. USA 99, 11664–11669.

21. Christendat, D., Yee, A., Dharamsi, A., Kluger, Y., Savchenko, A., Cort, J.R., Booth, V., Mackereth, C.D., Saridakis, V., Ekiel, I., Kozlov, G., Maxwell, K.L., Wu, N., McIntosh, L.P., Gehring, K., Kennedy, M.A., Davidson, A.R., Pai, E.F., Gerstein, M., Edwards, A.M. and Arrowsmith, C.H. (2000) Nat. Struct. Biol. 10, 903–909.

22. Maxwell, K.L., Bona, D., Liu, C., Arrowsmith, C.H. and Edwards, A.M. (2003) Protein Sci. 12, 2073–2080.

23. Ganem, C., Devaux, F., Torchet, C., Jacq, C., Quevillon-Cheruel, S., Labesse, G., Facca, C. and Faye, G. (2003) EMBO J. 22, 1588–1598.
