Vol. 12 no. 1 1996, Pages 41-48
HTP: a neural network-based method for predicting the topology of helical transmembrane domains in proteins
Piero Fariselli and Rita Casadio1
Abstract
In this paper we describe a microcomputer program (HTP) for predicting the location and orientation of α-helical transmembrane segments in integral membrane proteins. HTP is a neural network-based tool which gives as output the protein membrane topology based on the statistical propensity of residues to be located in external and internal loops. This method, which uses single protein sequences as input to the network system, correctly predicts the topology of 71 out of 92 membrane proteins of putative membrane orientation, independently of the protein source.
Introduction
The database of membrane proteins known at atomic or nearly atomic resolution currently contains only a few examples as compared to ~300 globular proteins (Bernstein et al., 1977). This paucity is due to difficulties in the crystallization procedure of membrane proteins (Michel et al., 1986), and is in contrast to the large and exponentially growing number of residue sequences pertaining to this protein class (as well as to the globular class) (Chothia, 1992).
Efficient predictive methods can therefore help to model the structure of membrane proteins starting from the amino acid sequence, as obtained with DNA recombinant techniques.
Except for the 16 strands of the β-barrel porins (Weiss et al., 1991; Cowan et al., 1992), the transmembrane segments of the other proteins in the database consist of typical apolar helices of some 20 residues perpendicular or partially tilted with respect to the normal to the membrane bilayer (Michel et al., 1986).
The topography and topology of membrane proteins are in turn determined via experimental approaches, such as gene fusion analysis and biochemical probes (Manoil et al., 1986; McGovern et al., 1991; Traxler et al., 1993). Following these criteria, the transmembrane organization of some 90 membrane proteins is presently known and available in the SWISS-PROT database (Bairoch and Boeckmann, 1992).
Laboratory of Biophysics, Department of Biology, University of Bologna, I-40126 Bologna, Italy
1 To whom correspondence should be addressed. E-mail: [email protected]
Most of the theoretical methods so far developed to predict membrane proteins are essentially aimed at the recognition of the hydrophobic membrane-spanning stretches distributed throughout the amino acid sequence (Traxler et al., 1993), and implicitly assume that the predicted segments are α-helices associated with the peaks found in the hydrophobicity plot, which displays the average hydrophobicity of each residue versus its position along the protein sequence.
The most evident sequence pattern which correlates with the topology of the membrane-spanning segments is the prevalence of positively charged residues in the cytoplasmic loops of polytopic (multiple spanning) membrane proteins (von Heijne, 1992).
The theoretical search for membrane-spanning segments along a protein sequence is usually performed by evaluating a running-average hydrophobicity over a sliding window of appropriate length on the basis of hydrophobicity scales (Kyte and Doolittle, 1982; Engelman et al., 1986); by calculating the statistical propensities for each residue to be buried in the membrane phase (Kuhn and Leigh, 1985; Klein et al., 1985; Rao and Argos, 1986); and, for amphipathic structures, by computing the periodicity of the hydrophobicity (Eisenberg et al., 1984; Cornette et al., 1987).
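As a rough illustration of the running-average approach, the sketch below computes a hydropathy profile in Python using the Kyte and Doolittle (1982) scale. The function name `hydropathy_profile` and the 19-residue default window are assumptions made for this example; they are not taken from HTP.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window, reported at the window centre.

    The window length (odd) is an illustrative default, not an HTP setting.
    """
    half = window // 2
    profile = []
    for centre in range(half, len(seq) - half):
        segment = seq[centre - half:centre + half + 1]
        profile.append(sum(KD[res] for res in segment) / window)
    return profile
```

Peaks in the returned profile mark candidate membrane-spanning stretches; where to place the decision threshold is exactly the empirical choice that the text describes as method dependent.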
Several hydrophobicity scales have been calculated (Schultz, 1988; Fasman, 1989; Fasman and Gilbert, 1990; Degli Esposti et al., 1990; von Heijne, 1994; White, 1994). However, the window length and the decision functions, which are crucial to the accuracy of the prediction, are empirically tailored for specific functional classes of membrane proteins and are of no general use. An exception is a quadratic minimization procedure evaluated on the membrane proteins known at atomic resolution (Edelman, 1993).
The average values of the hydrophobicity and a consensus procedure, which combines the predictions of several methods, seem to improve the accuracy of the prediction (Degli Esposti et al., 1990; Parodi et al., 1994), and this approach has recently been argued on the basis of comparison of the prediction with structural models obtained with NMR (Turner and Weiner, 1993).
Prediction scores improve considerably by using evolutionary information derived by multiple alignments of amino acid sequences. In this case, the protein to be predicted is modified according to the residue frequency in the aligned sequences (Rost et al., 1993; Persson and Argos, 1994).
A step forward in the automatic extraction of the information from the database of known structures was made by using supervised feed-forward neural networks (Hirst and Sternberg, 1992; Presnell and Cohen, 1993). This computational system provides a very efficient predictive method for protein secondary structures (Rost and Sander, 1993; Rost et al., 1994).
Using a neural network as a pattern classifier and evolutionary information as input code, the structure of transmembrane helices was recently predicted with 95% accuracy (Rost et al., 1995). Alternatively, a statistical method based on a model recognition approach could predict both the structure and the topology of all-helical membrane proteins (Jones et al., 1994).
In this paper we describe HTP, a method based on neural networks and running on microcomputers. HTP is trained on a set comprising 11 membrane proteins of known topology, uses as input to the network system the single protein code, and does not rely on multiple sequence alignment. HTP nevertheless performs at a 91% level of predictive accuracy when tested on a set of 69 proteins with no homology with those of the training set. Moreover, by using a set of rules derived from the statistical propensity of each residue to be located in a cytoplasmic or extra-cytoplasmic loop, HTP also correctly evaluates the topology of 48 of the 69 testing proteins. When evaluating a larger set, which included 92 proteins (as described in the SWISS-PROT database), HTP correctly predicts the location and topology of the transmembrane segments of 71 chains from a mixture of organism classes. The statistical method based on a model recognition approach (Jones et al., 1994), tested on the same 92 proteins, correctly predicts the location and topology of 63 chains.
Table I. The training and testing set of membrane proteins
System and methods
HTP has been developed in the Microsoft QuickC (version 2.5 for MS-DOS) environment and compiler using a PC-compatible 386 running MS-DOS. The code is written in standard C. Graphic routines are machine dependent.
Proteins known at atomic resolution are taken from the Brookhaven Data Bank (Bernstein et al., 1977). Membrane proteins for which experimental topology information is available are individually selected from the SWISS-PROT database (Bairoch and Boeckmann, 1992). The training set of membrane proteins (SETL, the subset in Table I) consists of proteins known at atomic or nearly atomic resolution and includes: the three subunits of the two photosynthetic reaction centers from Rhodopseudomonas viridis (Deisenhofer et al., 1985) and Rhodobacter sphaeroides (Feher et al., 1989), the seven transmembrane helices of bacteriorhodopsin (Henderson et al., 1990) and the plant light-harvesting chlorophyll a/b-protein complex (Kuhlbrandt et al., 1994). Since the structure of bacteriorhodopsin is only known in the transmembrane regions, the set is enlarged to include rhodopsin (Khorana, 1992), of which a projection map is available at 7 Å resolution (Schertler et al., 1993), and two other receptors (the human β2 adrenergic and m1 acetylcholine receptors), for which a presumed transmembrane topology is derived from experimental data (Strosberg, 1991). By this, the
Prediction of all-helical membrane protein topology
average level of homology existing between the proteins of the two different functional groups included in the training set is balanced. During validation of HTP, care was taken that the proteins of the training set had little or no homology with those of the testing set. Homology was detected with the FASTP program for comparing sequences (Lipman and Pearson, 1985).
Algorithms
Artificial neural networks
In HTP, the architecture of neural networks (NNs) consists of two basic feed-forward multilayered perceptrons with one hidden layer containing two hidden units (Figure 1).
Fig. 1. Architecture of the neural networks used to predict the transmembrane domains of proteins. Grey, filled and open circles represent the neurons in the input, hidden and output layers respectively. Lines between neurons in different layers indicate the connecting weights ($w_{ij}$). The residue sequence is encoded in the sliding input window with a binary scheme (bar symbols). Outputs of the simplest basic neural network without hidden layers are inputs to the cascaded network (NET_2) with two neurons in the hidden layer. At each neuron $i$, the activation is computed as $a_i = \sum_j w_{ij} o_j + \theta_i$, where $o_j$ is the output of unit $j$ and $\theta_i$ is a bias term (threshold value). Its output is computed according to a sigmoid trigger function: $o_i = [1 + \exp(-a_i)]^{-1}$. Both networks have two output units, the transmembrane (T) and non-transmembrane (NT) nodes.
Similar NN systems have been described before (Fariselli et al., 1993; Vivarelli et al., 1995). Supervised learning is performed with the back-propagation algorithm (Rumelhart et al., 1986).
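The unit activation and sigmoid output given in the Figure 1 caption can be sketched in Python as follows. This is a minimal forward pass only, without back-propagation; the function names and the weight layout (`weights[i][j]` connects input j to unit i) are assumptions made for illustration, and HTP itself is written in C.

```python
import math


def sigmoid(a):
    """Sigmoid trigger function: 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + math.exp(-a))


def layer_forward(inputs, weights, biases):
    """Outputs of one layer: weighted sum of inputs plus bias, then sigmoid.

    weights[i][j] is the (assumed) weight from input j to unit i.
    """
    return [
        sigmoid(sum(w * o for w, o in zip(row, inputs)) + bias)
        for row, bias in zip(weights, biases)
    ]


def mlp_forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """One hidden layer followed by the output layer (T and NT units)."""
    hidden = layer_forward(inputs, w_hidden, b_hidden)
    return layer_forward(hidden, w_out, b_out)
```

With two rows in `w_hidden` and two rows in `w_out`, `mlp_forward` has the shape described for the cascaded network: two hidden units and two output units.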
In NET_1 the input pattern is provided by a sliding window which reads an appropriate number of protein residues. A 20 bit binary encoding scheme is used for the amino acid residues. The two real outputs give the propensity of each residue in the sequence-dependent pattern (a segment of flanked residues) to belong (or not) to the transmembrane helical class. The classification of each residue in the transmembrane (T) and non-transmembrane (NT) class is carried out according to the winner-take-all strategy.
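A minimal Python sketch of the input encoding and of the winner-take-all decision is given here. The 17-residue window is the value the text reports for NET_1; the ordering of the amino acid alphabet and the all-zero padding beyond the chain termini are assumptions, since the paper does not specify them.

```python
# Alphabet order is an assumption; any fixed order gives an equivalent code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_window(seq, centre, window=17):
    """Encode the window centred on `centre` as window * 20 binary inputs.

    Positions falling outside the sequence are left as 20 zeros (assumed).
    """
    half = window // 2
    bits = []
    for pos in range(centre - half, centre + half + 1):
        one_hot = [0] * 20
        if 0 <= pos < len(seq):
            one_hot[AMINO_ACIDS.index(seq[pos])] = 1
        bits.extend(one_hot)
    return bits


def winner_take_all(out_t, out_nt):
    """Assign the class whose output unit has the larger value."""
    return "T" if out_t > out_nt else "NT"
```

For a 17-residue window this yields 340 inputs, matching the "17 groups of 20 units" quoted for the network.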
NET_1 is trained on SETL, updating the weights after each pattern presentation. The architecture of NET_1 is selected after a search in the parameter space of the network using a cross-validation test on the learning sets. The number of training cycles was set to 50. An input window of 17 groups of 20 units is selected. The initial weights are randomly chosen in the range [-1.0e-2, 1.0e-2] and the learning rate is fixed at 0.01.
Sequences of outputs from the first network are used as inputs to the second network (CASCADED NET (NET_2) in Figure 1). NET_2 is expected to filter out spurious assignments and/or sum up short neighboring fragments (Qian and Sejnowski, 1988). As an example, the prediction of malf_ecoli is shown in Figure 2. The cascaded network performs best with one hidden layer containing two hidden units. The optimized window size also contains 17 groups of 20 units. The number of output nodes is fixed at two, as for NET_1, and the winner-take-all procedure is again adopted in order to classify the residue as T or NT. Training is performed as for NET_1.
A cut-off procedure calibrated on the training set
Although NET_2 usually gives a well-defined segment, a slight overprediction of transmembrane segments can be seen, as compared to the expected structures contained in the training sets. In order to correct for this, we introduce a criterion for rejecting false positive segments based on the logical AND function.
In considering the prediction of the training set, a transmembrane segment is accepted provided that the length, height and area of the corresponding signal are greater than or equal to those of the smallest existing helix (CUT1). A less stringent version of the rejection criterion is obtained by requiring that the above values are greater than those of the largest non-existing helix of the training set (CUT2). CUT2 can also reveal transmembrane signals weaker than those accepted with CUT1.
Fig. 2. Prediction of the maltose transporter of E. coli (malf_ecoli): (A) with NET_1; (B) with the cascaded net (NET_2); (C) after the cut-off procedure (CUT2) (Q3 = 0.89, C = 0.79). Q3 and C are the fraction of correct predictions and the correlation coefficient (Matthews, 1975) respectively. Expected transmembrane segments are: 17-35, 40-58, 73-91, 277-295, 319-337, 371-389, 418-436, 486-504. Predicted segments are: 17-37, 39-58, 69-94, 278-311, 318-339, 368-392, 415-448, 486-508.
Based on the notion that transmembrane segments cannot exceed a given length, a stretch longer than 38 residues is automatically split in the neighborhood of the midpoint where the output signal has a local minimum.
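The splitting rule can be sketched in Python as below. The 38-residue limit comes from the text; the search radius around the midpoint, the use of the lowest signal value in that neighborhood as the cut point, and the decision to drop the cut residue from both halves are assumptions made for this example.

```python
def split_long_segment(start, end, signal, max_len=38, search=5):
    """Split the segment [start, end] (inclusive, 0-based) if it is too long.

    The cut is placed at the lowest signal value within `search` residues
    of the midpoint (assumed radius); the cut residue joins neither half.
    """
    if end - start + 1 <= max_len:
        return [(start, end)]
    mid = (start + end) // 2
    lo = max(start + 1, mid - search)
    hi = min(end - 1, mid + search)
    cut = min(range(lo, hi + 1), key=lambda i: signal[i])
    return [(start, cut - 1), (cut + 1, end)]
```

A 50-residue stretch whose signal dips near its centre is returned as two shorter helices, while a 20-residue stretch is returned unchanged.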
Prediction of the topology
After locating the transmembrane segments, HTP also evaluates their topology with respect to the membrane bilayer. The algorithm calculates:
(1)
where P is the statistical propensity of the ith residue (res(i)) to be found in a cytoplasmic (inner) or in an extra-cytoplasmic (outer) loop region in the proteins of the database. Index j runs over the number of predicted loop regions (N). In turn, index i runs over the set S including the residues predicted in the jth loop region. In each loop P is a function of the initial assignment of the protein N-terminus. The in(j) and out(j) functions assume inner and outer propensity values for the ith residue, depending on the parity of the jth loop.
To prevent the inclusion of possible globular domains in the loop scores, an additional empirical rule is adopted. When a loop region exceeds 60 residues, only two flanking stretches of 15 residues and in contact with the transmembrane helices are considered.
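The loop-trimming rule is simple enough to state as a short Python sketch. The 60- and 15-residue values come from the text; the function name is hypothetical, and applying the same two-flank rule to terminal tails (which touch only one helix) is an assumption, since the paper does not cover that case.

```python
def loop_residues_for_scoring(loop, max_loop=60, flank=15):
    """Return the part of a loop sequence that enters the topology score.

    Loops longer than `max_loop` contribute only their first and last
    `flank` residues, i.e. the stretches adjacent to the flanking helices.
    """
    if len(loop) <= max_loop:
        return loop
    return loop[:flank] + loop[-flank:]
```

For a 70-residue loop, the 40 central residues are ignored and 30 residues are scored.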
The statistical propensity (P) for each residue to be located in an inner or outer loop region is evaluated by considering the topology of the membrane proteins listed in Table I and discriminating between single-spanning (monotopic) and multi-spanning (polytopic) proteins.
A protein is then classified inner or outer depending on the ΔP value. When ΔP > 0 the sequence is classified inner (IN), otherwise outer (OUT).
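Equation (1) is not legible in this copy, so the Python sketch below is only one plausible reading of the surrounding description, not the authors' formula. It assumes the N-terminus is initially placed inside, lets loops alternate sides by parity, and sums for each residue the propensity for the assumed side minus the propensity for the opposite side. The function names and the toy propensity tables are hypothetical.

```python
def topology_score(loops, p_in, p_out):
    """Sum over loops of (propensity for assumed side - opposite side).

    `loops` lists the loop sequences from the N-terminus onward. Even-indexed
    loops are assumed cytoplasmic (N-terminus in), odd-indexed loops outer.
    """
    total = 0.0
    for j, loop in enumerate(loops):
        assumed_inside = (j % 2 == 0)
        for res in loop:
            if assumed_inside:
                total += p_in[res] - p_out[res]
            else:
                total += p_out[res] - p_in[res]
    return total


def classify_topology(loops, p_in, p_out):
    """IN when the score is positive, otherwise OUT, as stated in the text."""
    return "IN" if topology_score(loops, p_in, p_out) > 0 else "OUT"


# Toy propensities for illustration only; real values would come from Table I.
P_IN = {"K": 0.8, "D": 0.3}
P_OUT = {"K": 0.2, "D": 0.7}
```

With these toy tables, lysine-rich loops at even positions support the IN assignment, in line with the positive-inside pattern mentioned in the Introduction.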
Implementation
HTP is implemented with four different menus (Figure 3) which are briefly outlined in the following.
Predict
The prediction phase is performed in batch mode with a NN system as described above. The cut-off procedure is implemented as follows: IF [(LENGTH > Threshold_Length) AND (AREA > Threshold_Area) AND (MAXIMUM HEIGHT > Threshold_Height)] THEN keep the segment (and if LENGTH > 2max_length split it) ELSE cut it.
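The logical AND test above can be written as a small Python sketch. The paper does not define how the area is measured, so taking it as the sum of the per-residue signal over the segment is an assumption, as are the function names and the dictionary layout of the thresholds.

```python
def segment_features(signal, start, end):
    """Length, area and maximum height of the signal over [start, end].

    Area is taken as the sum of the per-residue signal (an assumption).
    """
    values = signal[start:end + 1]
    return {"length": len(values), "area": sum(values), "height": max(values)}


def keep_segment(features, thresholds):
    """Keep a segment only if all three features exceed their thresholds."""
    return (
        features["length"] > thresholds["length"]
        and features["area"] > thresholds["area"]
        and features["height"] > thresholds["height"]
    )
```

Calibrating `thresholds` on the smallest true helix or on the largest false helix of the training set would correspond to the CUT1 and CUT2 variants described earlier.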
HTP evaluates the difference between transmembrane (T) and non-transmembrane (NT) outputs for each residue. Positive and negative values indicate a helical transmembrane and an external loop propensity respectively. A plot of the transmembrane propensity (transmembranicity) for each residue is shown as a function of the residue sequence. A typical output is shown in Figure 2, where malf_ecoli is used as a test protein.
Write
The routine evaluates the protein topology using a set of rules based on the statistical propensity of the residues to be located in the cytoplasmic (in) or extra-cytoplasmic (out) loop regions. The reliability index is computed to score the prediction (Figure 4).
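The Figure 4 legend describes the reliability index as the integer part of ten times the absolute difference between the two network outputs, rescaled to the range 0 to 9. A minimal Python sketch follows; how the value 10 is mapped into that range is not stated, so capping it at 9 is an assumption.

```python
def reliability_index(out_t, out_nt):
    """Reliability of a residue assignment from the two output units.

    integer(10 * |out_T - out_NT|), capped at 9 (the cap is an assumption).
    """
    raw = int(10 * abs(out_t - out_nt))
    return min(raw, 9)
```

Outputs that are nearly equal give an index close to 0 (unreliable), while strongly separated outputs give 9.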
Create
HTP is provided with an interface in order to train the predictor on a user-defined learning set. In this way it is possible to update the training set as soon as the topology of other membrane proteins is available.
Total no. of proteins   Q3     QT     QNT    C      ⟨L⟩   N    Ntop
69                      0.91   0.89   0.91   0.76   23    54   48
Receptors 12            0.93   0.97   0.90   0.87   26    12   12
Total 81                0.92   0.96   0.91   0.77   23    66   60
Q3, QT and QNT are the fraction of total, transmembrane and non-transmembrane correct predictions respectively; C is the correlation coefficient (Matthews, 1975); ⟨L⟩ is the average length (number of residues) of a transmembrane segment; N is the number of proteins correctly predicted and Ntop is the subset of N whose topology is correctly assigned.
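The per-residue scores defined in the footnote can be computed as in the following Python sketch. It assumes two-state strings with "T" for transmembrane and "N" for non-transmembrane residues, and it normalizes the per-class fractions by the number of observed residues in each class; the paper does not say whether observed or predicted counts are used, so that choice is an assumption. The correlation coefficient follows the standard Matthews (1975) form.

```python
import math


def two_state_scores(observed, predicted):
    """Return (Q_total, Q_T, Q_NT, C) for two equal-length T/N strings."""
    pairs = list(zip(observed, predicted))
    tp = sum(o == "T" and p == "T" for o, p in pairs)
    tn = sum(o == "N" and p == "N" for o, p in pairs)
    fp = sum(o == "N" and p == "T" for o, p in pairs)
    fn = sum(o == "T" and p == "N" for o, p in pairs)
    q_total = (tp + tn) / len(pairs)
    q_t = tp / (tp + fn) if (tp + fn) else 0.0
    q_nt = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    c = (tp * tn - fp * fn) / denom if denom else 0.0
    return q_total, q_t, q_nt, c
```

For example, `two_state_scores("TTTTNNNN", "TTTNNNNT")` gives 0.75 for each of the three fractions and a correlation coefficient of 0.5.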
Discussion
The predictive performance
HTP can correctly predict the transmembrane location of 54 out of 69 proteins from the database with no homology with those of the training set (z < 3) (Table II). This predictive efficiency (78% on a protein basis) compares well with that previously obtained using multiple sequence alignment and a cross-validation procedure over 69 membrane proteins (80% on a protein basis) (Rost et al., 1995).
When topology is evaluated on the same testing set, the number of proteins correctly predicted reduces to 48 (70% on a protein basis). Finally, the inclusion of 12 receptors (with high homology with those of the training set) increases the predictive efficiency to 74%.
The performance of HTP was then tested on all the proteins of Table I. In this case we can also evaluate the predictive efficiency of HTP on the proteins of the training set. The topology (location and orientation of the transmembrane segments) of 71 out of 92 proteins of the database was correctly predicted (marked in Table I). This performance is better than that obtained on the same set with MEMSAT, a statistical method based on a model recognition approach (Jones et al., 1994) (Table III).
HTP correctly rejected porins (data not shown) and mispredicted only 15 out of 145 globular proteins. In this case, however, the average length of the predicted segments was shorter than that of membrane proteins (Table IV).
Table III. Prediction of the membrane proteins of the database
Method     No. of proteins   Ntop
HTP        92                71
MEMSATa    92                63
a Statistical method based on a model recognition approach (Jones et al., 1994).
N is the number of proteins correctly rejected by HTP.
Prediction of polytopic and monotopic membrane proteins
A typical HTP output shows the residue assignment and the computed topology along the protein sequence (Figure 4). The value of the reliability index scores the prediction for the T or NT classification of each residue.
A few significant examples are shown in Figure 4. The predicted proteins include polytopic and monotopic membrane proteins and the results obtained with HTP match the expected models. The topology of SecY, a protein involved in protein export in Escherichia coli, has been determined with a genetic approach (Akyama and Ito, 1987) (Figure 4A). The two apoproteins of the light-harvesting complex of photosynthetic bacteria have recently been crystallized (McDermott et al., 1995) (Figure 4B, C). Although the two coat proteins from bacteriophages M13 and Pf1 are closely related, their structures (available from NMR data) were predicted with different levels of accuracy using different hydrophobicity scales, an average hydrophobicity scale and a consensus procedure (Turner and Weiner, 1993; Parodi et al., 1994). Our method, trained on SETL, which shows no homology with the two coat proteins, scores as high as 89% and 78% of total correct prediction (Q3) for Pf1 and M13 respectively, in good agreement with the NMR structures, and predicts the correct topology (Figure 4D, E).
In conclusion, in spite of the small training set used, the present version of HTP seems to be of general use in evaluating the topology of integral membrane proteins, independently of their source and organization with respect to the lipid bilayer.
[Figure 4: HTP output listings (Seq, Pred and Rel lines) for (A) secy_ecoli, (B) lha4_rhoac and (C) lhb5_rhoac. Legend: H, putative transmembrane helix; the remaining symbols mark putative outside and inside loops. Panel B lists one transmembrane segment, 8-41 (length 34). Rel is the network prediction reliability index [0-9], the rescaled value of O from 0 to 9, where O = integer(10 × absolute_value(out[tr] − out[Not_tr])).]
Fig. 4. HTP at work: predictions of different membrane proteins. (A) SecY from E. coli (secy_ecoli) (Q3 = 0.89; C = 0.74). Expected topology: in; expected transmembrane segments: 23-41, 75-95, 122-139, 154-174, 183-203, 217-237, 274-294, 316-335, 376-395, 399-414. (B) α-apoprotein of the light-harvesting complex from Rhodopseudomonas acidophila (lha4_rhoac) (Q3 = 0.85; C = 0.73). Expected topology: in; expected transmembrane segment: 11-36. (C) β-apoprotein of the light-harvesting complex of R. acidophila (lhb5_rhoac) (Q3 = 0.81; C = 0.67). Expected topology: in; expected transmembrane segment: 11-36. (D) Coat protein from bacteriophage Pf1 (coab_bppf1) (Q3 = 0.89; C = 0.80). Expected topology: out; expected transmembrane segment: 19-42. (E) Coat protein from bacteriophage M13 (coab_bpfd) (Q3 = 0.78; C = 0.56). Expected topology: out; expected transmembrane segment: 25-47. The reliability index (Rel) evaluates the reliability of the prediction from the network output and ranges from 0 (unreliable prediction) to 9 (fully reliable prediction).
Acknowledgements
We thank Dr B.Rost (EMBL, Heidelberg, Germany) for discussions. We are also indebted to Dr D.T.Jones for sending us the MEMSAT program. This work was partially supported by the Italian Consiglio Nazionale delle Ricerche.
References
Akyama,Y. and Ito,K. (1987) Topology analysis of the SecY protein, an integral membrane protein involved in protein export in Escherichia coli. EMBO J., 6, 3465-3470.
Bairoch,A. and Boeckmann,B. (1992) The SWISS-PROT protein sequence data bank. Nucleic Acids Res., 20, 2019-2022.
Bernstein,F.C., Koetzle,T.F., Williams,G.J.B., Meyer,E.F., Brice,M.D., Rodgers,J.R., Kennard,O., Shimanouchi,T. and Tasumi,M. (1977) The protein data bank: a computer-based archival file for macromolecular structures. J. Mol. Biol., 112, 535-542.
Chothia,C. (1992) One thousand families for the molecular biologist. Nature, 357, 543-544.
Cornette,J.L., Cease,K.B., Margalit,H., Spouge,J.L., Berzofsky,J.A. and De Lisi,C. (1987) Hydrophobicity scales and computational techniques for detecting amphipathic structures in proteins. J. Mol. Biol., 195, 659-685.
Cowan,S.W., Schirmer,T., Rummel,G., Steiert,M., Ghosh,R., Pauptit,R.A., Jansonius,J.N. and Rosenbusch,J.P. (1992) Crystal structures explain functional properties of two E. coli porins. Nature, 358, 727-733.
Degli Esposti,M., Crimi,M. and Venturoli,G. (1990) A critical evaluation of the hydropathy profile of membrane proteins. Eur. J. Biochem., 190, 207-219.
Deisenhofer,J., Epp,O., Miki,K., Huber,R. and Michel,H. (1985) Structure of the protein subunits in the photosynthetic reaction centre of Rhodopseudomonas viridis at 3 Å resolution. Nature, 318, 618-624.
Edelman,J. (1993) Quadratic minimization of predictors for protein secondary structure: application to transmembrane α-helixes. J. Mol. Biol., 232, 165-191.
Eisenberg,D., Schwarz,E., Komaromy,M. and Wall,R. (1984) Analysis of membrane and surface protein sequences with the hydrophobic moment plot. J. Mol. Biol., 179, 125-142.
Engelman,D.M., Steitz,T.A. and Goldman,A. (1986) Identifying non-polar transbilayer helixes in amino acid sequences of membrane proteins. Annu. Rev. Biophys. Biophys. Chem., 15, 321-353.
Fariselli,P., Compiani,M. and Casadio,R. (1993) Predicting secondary structures of membrane proteins with neural networks. Eur. Biophys. J., 22, 41-51.
Fasman,G.D. (1989) The development of the prediction of protein structure. In Fasman,G.D. (ed.), Prediction of Protein Structure and the Principles of Protein Conformation. Plenum Press, New York, pp. 193-316.
Fasman,G.D. and Gilbert,W.A. (1990) The prediction of transmembrane protein sequences and their conformation: an evaluation. Trends Biochem. Sci., 15, 89-92.
Feher,G., Allen,J.P., Okamura,M.Y. and Rees,D.C. (1989) Structure and function of bacterial photosynthetic reaction centers. Nature, 339, 111-116.
Henderson,R., Baldwin,J.M., Ceska,T.A., Zemlin,F., Beckmann,E. and Downing,K.H. (1990) Model for the structure of bacteriorhodopsin based on high resolution electron cryo-microscopy. J. Mol. Biol., 213, 899-929.
Hirst,J.D. and Sternberg,J.E. (1992) Prediction of structural and functional features of proteins and nucleic acid sequences by artificial neural networks. Biochemistry, 31, 7211-7218.
Jones,D.T., Taylor,W.R. and Thornton,J.M. (1994) A model recognition approach to the prediction of all-helical membrane protein structure and topology. Biochemistry, 33, 3038-3049.
Khorana,H.G. (1992) Rhodopsin, photoreceptor of the rod cell. J. Biol. Chem., 267, 1-4.
Klein,P., Kanehisa,M. and De Lisi,C. (1985) The detection and classification of membrane-spanning protein. Biochim. Biophys. Acta, 815, 468-476.
Kyte,J. and Doolittle,R.F. (1982) A simple method for displaying the hydrophobic character of a protein. J. Mol. Biol., 157, 103-132.
Kuhlbrandt,W., Wang,D.N. and Fujiyoshi,Y. (1994) Atomic model of plant light-harvesting complex by electron crystallography. Nature, 367, 614-621.
Kuhn,L.A. and Leigh,J.S. (1985) A statistical technique for predicting membrane protein structure. Biochim. Biophys. Acta, 828, 351-361.
Lipman,D.J. and Pearson,W.L. (1985) Rapid and sensitive protein similarity searches. Science, 227, 1435-1441.
Manoil,C. and Beckwith,J. (1986) A genetic approach to analyzing membrane protein topology. Science, 233, 1403-1408.
Matthews,B.W. (1975) Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim. Biophys. Acta, 405, 442-451.
McDermott,G., Prince,S.M., Freer,A.A., Hawthornthwaite-Lawless,A.M., Papiz,M.Z., Cogdell,R.J. and Isaacs,N.W. (1995) Crystal structure of an integral membrane light-harvesting complex from photosynthetic bacteria. Nature, 374, 517-521.
Michel,H. (1993) Crystallization of membrane proteins. Trends Biochem. Sci., 8, 56-59.
Michel,H., Weyer,K.A., Gruenberg,H., Dunger,I., Osterhelt,D. and Lottspeich,F. (1986) The light and medium subunits of the photosynthetic reaction center from Rhodopseudomonas viridis: isolation of the genes, nucleotide and amino acid sequence. EMBO J., 5, 1149-1158.
Parodi,L.A., Granatir,C.A. and Maggiora,G.M. (1994) A consensus procedure for predicting the location of α-helical transmembrane segments in proteins. Comput. Applic. Biosci., 5, 527-535.
Persson,B. and Argos,P. (1994) Prediction of transmembrane segments in proteins utilising multiple sequence alignments. J. Mol. Biol., 237, 182-192.
Presnell,S.R. and Cohen,F.E. (1993) Artificial neural networks for pattern recognition in biochemical sequences. Annu. Rev. Biophys. Biomol. Struct., 22, 283-298.
Qian,N. and Sejnowski,T.G. (1988) Predicting the secondary structure of globular proteins using neural network models. J. Mol. Biol., 202, 865-884.
Rao,J.K.M. and Argos,P. (1986) A conformational preference parameter to predict helices in integral membrane proteins. Biochim. Biophys. Acta, 869, 197-214.
Rost,B. and Sander,C. (1993) Prediction of protein secondary structure at better than 70% accuracy. J. Mol. Biol., 232, 584-599.
Rost,B., Sander,C. and Schneider,R. (1994) Redefining the goals of protein secondary structure prediction. J. Mol. Biol., 235, 13-26.
Rost,B., Casadio,R., Fariselli,P. and Sander,C. (1995) Transmembrane helices predicted at 95% accuracy. Prot. Sci., 4, 521-533.
Rumelhart,D.E., Hinton,G.E. and Williams,R.J. (1986) Learning representation by back-propagation errors. Nature, 323, 533-536.
Schultz,G.E. (1988) A critical evaluation of methods for prediction of protein secondary structures. Annu. Rev. Biophys. Biophys. Chem., 17, 1-21.
Strosberg,A.D. (1991) Structure/function relationship of proteins belonging to the family of receptors coupled to GTP-binding proteins. Eur. J. Biochem., 29, 11009-11023.
Traxler,B., Boyd,D. and Beckwith,J. (1993) The topological analysis of integral membrane proteins. J. Membrane Biol., 132, 1-11.
Turner,R.J. and Weiner,J.H. (1993) Evaluation of transmembrane helix prediction methods using the recently defined NMR structures of the coat proteins from bacteriophages M13 and Pf1. Biochim. Biophys. Acta, 1202, 161-168.
Vivarelli,F., Giusti,G., Villani,M., Campanini,R., Fariselli,P., Compiani,M. and Casadio,R. (1995) LGANN: a parallel system combining a local genetic algorithm and neural networks for the prediction of secondary structure of proteins. Comput. Applic. Biosci., 11, 253-260.
von Heijne,G. (1992) Hydrophobicity analysis and the positive-inside rule. J. Mol. Biol., 225, 487-494.
von Heijne,G. (1994) Membrane proteins: from sequence to structure. Annu. Rev. Biophys. Biomol. Struct., 23, 167-192.
Weiss,M.S., Kreush,A., Shiltz,E., Nestel,U., Welte,W., Weckesser,J. and Schulz,G.E. (1991) The structure of porin from Rhodobacter capsulatus at 1.8 Å resolution. FEBS Lett., 280, 379-382.
White,S.H. (1994) Hydropathy plots and the prediction of membrane protein topology. In White,S.H. (ed.), Membrane Protein Structure. Oxford University Press, New York, pp. 97-124.
Received on August 22, 1995; revised and accepted on October 27, 1995