BIOINFORMATICS Vol. 20 no. 6 2004, pages 959–969
DOI: 10.1093/bioinformatics/bth015

Identification of optimal classification functions for biological sample and state discrimination from metabolic profiling data

Kyongbum Lee1,†, Daehee Hwang2,†, Tadaaki Yokoyama3, George Stephanopoulos2, Gregory N. Stephanopoulos2 and Martin L. Yarmush3,∗

1Chemical and Biological Engineering, Tufts University, Medford, MA 02155, USA, 2Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA and 3Shriners Burns Hospital and Center for Engineering in Medicine, Massachusetts General Hospital, Harvard Medical School, GRB 1402, Boston, MA 02114, USA

Received on February 26, 2003; accepted on September 9, 2003

Advance Access publication January 29, 2004

ABSTRACT
Motivations: Classification of biological samples for diagnostic purposes is a difficult task because of the many decisions involved on the number, type and functional manipulations of the input variables. This study presents a generally applicable strategy for systematic formulation of optimal diagnostic indexes. To this end, we develop a novel set of computational tools by integrating regression optimization, stepwise variable selection and cross-validation algorithms.
Results: The proposed discrimination methodology was applied to plasma and tissue (liver) metabolic profiling data describing the time progression of liver dysfunction in a rat model of acute hepatic failure generated by d-galactosamine (GalN) injection. From the plasma data, our methodology identified seven (out of a total of 23) metabolites, and the corresponding transform functions, as the best inputs to the optimal diagnostic index. This index showed better time resolution and increased noise robustness compared with an existing metabolic index, Fischer’s BCAA/AAA molar ratio, as well as indexes generated using other commonly used discriminant analysis tools. Comparison of plasma and liver indexes found two consensus metabolites, lactate and glucose, which implicate glycolysis and/or gluconeogenesis in mediating the metabolic effects of GalN.
Contact: [email protected]

∗To whom correspondence should be addressed.
†The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.

Bioinformatics 20(6) © Oxford University Press 2004; all rights reserved.

INTRODUCTION
With recent advances in high-throughput assay methods, various types of bioinformatics approaches have been proposed to extract useful empirical knowledge from large volumes of gene and protein expression data. For example, these kinds of analyses have been used for (1) classifying normal and cancer cells (Golub et al., 1999; Califano et al., 2000; Bichsel et al., 2001), (2) discovering novel subtypes of tumors (Hedenfalk et al., 2001), (3) identifying potential biomarkers for cancer prognosis (Alaiya et al., 2000; Kennedy, 2001), (4) delineating gene families by function (Xu et al., 2001; Rus et al., 2002) and (5) improving general understanding of cellular biochemical regulation (Holter et al., 2000). Nevertheless, the extent to which such information may be employed directly in practical clinical applications is currently limited, because: (1) the knowledge output as described by gene regulatory or protein interaction networks is in a form too convoluted for clinically meaningful interpretation, (2) dimension reduction analyses [e.g. principal component analysis (PCA) or Fisher’s discriminant analysis (FDA)] restrict description of the data structure to weighted linear combinations, which often result in loss of one-to-one correspondences between dimensional axes and physiological variables, (3) it is difficult to validate the statistical significance of results (e.g. regulation mechanisms elucidated by complex gene or protein network models) and (4) most gene and some protein expression data types are not well suited for medical uses such as routine diagnostics, because they are obtained via invasive procedures (e.g. tissue biopsies).

This study addresses these issues by developing a systematic method for extracting useful diagnostic information from multivariate biological data. In particular, the proposed method is applied to metabolic profiling data, which offers complementary information to gene and protein expression profiling data (Glassbrook et al., 2000; Glassbrook and Ryals, 2001). Metabolites are intermediates of biochemical pathways that convert nutrient fuel into energy, maintain cellular homeostasis, eliminate harmful chemicals and provide building blocks for biosynthesis. Moreover, many metabolic intermediates are in constant exchange with the extracellular medium, and changes in extracellular metabolite concentrations may be correlated with changes in pathway activity. Hence, metabolites can be considered the most immediate descriptors of cell function. In recent applications, metabolic profiling has been used to study plant metabolic phenotypes (Fiehn et al., 2000a; Roessner et al., 2001) and discover novel gene functions (Trethewey, 2001). Metabolite profiling also has obvious applications to the investigation of metabolic disorders (Griffin et al., 2001). The potential for clinical diagnosis protocols based on multiple metabolite measurements has been recognized for some time (Goodman et al., 1977; Jellum, 1977), but the early efforts did not produce strategies or generally applicable models for systematic analysis. Recent work by Nicholson et al. has applied pattern recognition techniques, notably PCA and partial least squares (PLS), to ‘metabonomic’ data generated by nuclear magnetic resonance (NMR) spectroscopy to develop diagnostic screens for heart disease (Brindle et al., 2002) and drug toxicity (Nicholson et al., 2002; Coen et al., 2003). This paper expands the metabolic profiling tool space by developing a flexible discriminant analysis strategy that, unlike FDA or PLS-DA, constructs the sample classifier (diagnostic) using both linear and nonlinear variable transforms.

In the present analysis, we used a moderate sized dataset (∼1000 entries) obtained in an established animal model of injury as inputs to a new computational methodology to derive the optimal functional forms for a diagnostic index. The injury model (Shito et al., 2001) describes the progression of liver dysfunction, or fulminant hepatic failure (FHF), induced by injection of the liver toxin d-galactosamine (GalN). The computational methodology combines multivariate alternating conditional expectation (MACE; Breiman and Friedman, 1985) with a stepwise search algorithm and leave-one-out cross-validation (LOOCV; Hwang et al., 2002). This combination integrates the selection (number and type) of the best input variables with the identification of the optimal functional forms whereby the selected variables enter the diagnostic index. This computational method should also facilitate the physiological interpretation of the diagnostic index. To explore this notion, the study also compares the effects of GalN on the metabolic profiles of the whole body (plasma samples) and the liver (isolated perfused liver samples), which is the primary site of GalN action.

SYSTEMS AND METHODS
Animals
Male Sprague–Dawley rats (Charles River Laboratories, Boston, MA) weighing 150–200 g were housed in a temperature (25°C) and light-controlled room (12 h light–dark cycle). Animals were individually housed and allowed to adjust to their new surroundings for at least 2 days before receiving treatment. The animals were cared for in accordance with the National Research Council guidelines. Experimental protocols were approved by the Subcommittee on Research Animal Care, Committee on Research, Massachusetts General Hospital. The initiation of FHF by GalN injection (Arai et al., 2001; Shito et al., 2001) has been described in detail elsewhere. Briefly, FHF was caused by intraperitoneally administering GalN dissolved in normal saline at a dose of 1.4 g/kg to fasted rats 12 and 24 h after beginning the fast. Previous work showed that a single dose administration of GalN elicits reversible liver dysfunction, whereas a dual dose administration produces irreversible liver failure. The mortality rate of this treatment was 25% at 48 h and 83% at 168 h after the initial injection. Control rats were fasted, but received normal saline injections. Rats remained fasted throughout the study period, which lasted up to 36 h. Plasma samples were obtained via cardiac puncture at 1, 4, 8 and 12 h after the second GalN or saline injection. Livers were then isolated from the same animals and perfused. Livers were perfused in a recirculating system as described previously (Yamaguchi et al., 1997; Lee et al., 2000). During each 60 min perfusion, periodic samples were taken from the medium reservoir and analyzed for metabolites (Yamaguchi et al., 1997; Lee et al., 2000). Figure 1 summarizes the experimental design.

[Figure 1 timeline labels: Fasting; D-GalN or Saline 1st Injection (12 hrs); D-GalN or Saline 2nd Injection (12 hrs); Cardiac Puncture (1, 4, 8, 12 hrs); Liver Perfusion (60 min); End]

Fig. 1. Experimental design. At time zero, all rats were subjected to a fast, which was continued throughout the study period. At 12 h, rats were randomly selected and injected with saline or GalN dissolved in saline (1.4 g/kg). These injections were repeated at 24 h. Plasma samples were obtained via cardiac puncture at 1, 4, 8 and 12 h after the second GalN or saline injection. Immediately following the cardiac puncture, livers were isolated and perfused for 60 min, during which time the perfusion medium was periodically sampled.

Metabolite measurements
Biochemical assays measured 23 metabolite concentrations on plasma samples from 61 animals divided into groups according to treatment and time elapsed following the second GalN/saline injection: control at 1 h (n = 8), 4 h (n = 9), 8 h (n = 11) and 12 h (n = 6), and FHF at 1 h (n = 5), 4 h (n = 9), 8 h (n = 7) and 12 h (n = 6). The metabolites were glucose, urea, lactate, ammonia, ketone bodies and amino acids. Assays on perfused liver samples measured the same set of metabolites. Each perfusion produced eight time point samples, which were collapsed into a single uptake or release rate by linear regression of the concentration–time data. Glucose and lactate concentrations were measured using commercial enzyme assay kits (Sigma Chemical Co., St Louis, MO) that are based on the methods of Trinder (1969) and Loomis (1961), respectively. Urea was measured by the urease method reported by Searcy et al. (1961). Ketone bodies (acetoacetate and β-hydroxybutyrate) were measured using an enzymatic fluorometric method developed by Olsen (1971). Ammonia and amino acids were quantified by high-performance liquid chromatography (HPLC) following pre-column derivatization with 6-aminoquinolyl-N-hydroxysuccinimidyl carbamate (Cohen and De Antonis, 1994), which is a commercially available fluorescent label for amino acids and primary and secondary amines (Waters Corp., Milford, MA).
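The reduction of each perfusion's eight samples to a single rate can be illustrated with an ordinary least-squares slope. The following is a minimal sketch, not the authors' code: the function name `fit_rate`, the sampling times and the concentration values are hypothetical, and the sign convention (positive slope read as release, negative as uptake) is an assumption.

```python
def fit_rate(times, concentrations):
    """Least-squares slope of concentration versus time.

    Under the assumed sign convention, a positive slope is read as
    net release into the medium and a negative slope as net uptake.
    """
    n = len(times)
    t_mean = sum(times) / n
    c_mean = sum(concentrations) / n
    sxy = sum((t - t_mean) * (c - c_mean)
              for t, c in zip(times, concentrations))
    sxx = sum((t - t_mean) ** 2 for t in times)
    return sxy / sxx


# Hypothetical perfusion: eight samples over 60 min (made-up values).
times_min = [0.0, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
# Exactly linear toy data, so the fitted slope equals 0.05 per min.
conc = [2.0 + 0.05 * t for t in times_min]
rate = fit_rate(times_min, conc)
```

In practice the slope would typically also be normalized (for example by liver mass or medium volume); the text does not specify this, so the sketch leaves it out.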

ALGORITHM
Multi-variate alternating conditional expectation
MACE is a non-parametric procedure to estimate the optimal transformations $\theta(Y)$ and $\phi_1(X_1), \ldots, \phi_p(X_p)$ of the response variable $Y$ (treatments of GalN in this study; 1 for normal and 2 for GalN injected samples) and the predictor variables (metabolites in this study) $X_1, \ldots, X_p$ for multiple regression (Breiman and Friedman, 1985). These transformations aid in the interpretation of the relationship structure between response and predictor variables. Here, we used MACE to find the optimal transformations of metabolite variables that discriminate plasma or perfused liver samples from control and FHF rats. Algorithmically, MACE finds optimal transformations that maximize the correlation between response and predictor variables. In its original formulation, for the bivariate case ($p = 1$), the optimal transformations of $Y$ and $X$ [denoted $\theta^*(Y)$ and $\phi^*(X)$] satisfy

$$\rho^* = \rho(\theta^*, \phi^*) = \max_{\theta, \phi} \rho[\theta(Y), \phi(X)], \tag{1}$$

such that the transformations minimize the fraction of the variance not explained by a regression of $\theta(Y)$ on $\sum_{i=1}^{p} \phi_i(X_i)$. This fraction ($e^2$) is defined by

$$e^2(\theta, \phi) = \frac{E\left\{\left[\theta(Y) - \sum_{i=1}^{p} \phi_i(X_i)\right]^2\right\}}{E\,\theta^2(Y)}. \tag{2}$$

In this study, we looked for the optimal transformations of response and predictor variables for the purpose of discriminant analysis, rather than regression analysis. Therefore, the objective function maximizes the ratio of between-group variance to total variance instead of the correlation. Correspondingly, $e^2$ in Equation (2) is replaced with the ratio of within-group variance to total variance. This replacement permits a more stable stepwise search to find the best additive discriminant model. Let $R = \sum_{i=1}^{p} \phi_i(X_i)$ and $T = \theta(Y)$. Then, the ratio of within-group variance to total variance is defined by

$$e^2 = \frac{\sum_{j=1}^{c} \left( \sum_{i \in j}^{N_j} \left(R_i - \bar{R}_j\right)^2 \right)}{\sum_{k=1}^{N} \left(R_k - \bar{R}\right)^2}, \tag{3}$$

where $\bar{R}_j$ is the mean of $R$ in group $j$, $\bar{R}$ is the mean of $R$ over all samples, $N_j$ is the number of samples in group $j$, $N$ is the total number of samples and $c$ is the number of groups. Then, the ratio of between-group variance to total variance is $\rho = 1 - e^2$. For the sake of consistency, the between- and within-group ratios are denoted by the symbols used for their corresponding regression measures, correlation and the fraction of residuals.
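To make Equation (3) concrete, the within-group to total variance ratio and the resulting discriminatory power ρ = 1 − e² can be computed directly from a list of index scores and group labels. This is a minimal sketch; the function name `discriminatory_power` and the example scores are hypothetical and not taken from the study data.

```python
def discriminatory_power(scores, groups):
    """Return (rho, e2) for additive index scores R and group labels.

    e2  = within-group sum of squares / total sum of squares (Eq. 3)
    rho = 1 - e2, the between-group to total variance ratio
    """
    n = len(scores)
    grand_mean = sum(scores) / n
    total_ss = sum((r - grand_mean) ** 2 for r in scores)

    within_ss = 0.0
    for g in set(groups):
        members = [r for r, label in zip(scores, groups) if label == g]
        group_mean = sum(members) / len(members)
        within_ss += sum((r - group_mean) ** 2 for r in members)

    e2 = within_ss / total_ss
    return 1.0 - e2, e2


# Hypothetical scores: three control (label 1) and three FHF (label 2) samples.
example_scores = [-1.0, -0.8, -1.2, 1.0, 0.9, 1.1]
example_groups = [1, 1, 1, 2, 2, 2]
rho, e2 = discriminatory_power(example_scores, example_groups)
```

For these toy values the within-group sum of squares is 0.10 and the total is 6.10, so ρ is close to 1, as expected for well-separated groups.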

Stepwise variable selection
MACE is combined with a stepwise selection method (Dillon and Goldstein, 1984) to find the best-fitting additive discriminant model (called the diagnostic index in this study). The original implementation of MACE (Breiman and Friedman, 1985) used the forward selection method (Dillon and Goldstein, 1984), but this study uses the stepwise selection method, because it reveals more clearly the independent nature of the variables selected for the discriminant model.

Leave-one-out cross-validation
In order to develop a discriminant model with a set of predictors that are robust with respect to sample variability and population noise, we used LOOCV (Hwang et al., 2002). The dataset was split into a test set consisting of eight randomly selected samples (one control and one FHF sample for each of the four time points) and a training set composed of the remaining 53 samples. LOOCV then proceeded as follows:

(1) A sample is taken out from the training set. Stepwise variable selection combined with MACE is performed on the remainder of the 53-sample training set to find the best-fitting additive discriminant model.

(2) A set of predictor variables ($s_l$) retained in the model and their partial F-ratios ($r_l$) are stored for the $l$-th iteration.

(3) Steps 1 and 2 are repeated until every training sample has been taken out once.

At the completion of LOOCV, $N$ sets of predictors ($s_l$) and their partial ratios ($r_l$) are obtained. In order to identify the best set of predictors for the training set, we used the averaged partial ratios and the conditional expectation of partial ratio for predictor variables as measures from the $N$ sets of $s_l$ and $r_l$. They are referred to as power and local power in Table 1. Let $S$ be the overall set of predictors selected during a total of $N$ LOOCV procedures. First, the averaged partial ratio of $X_k$ during $N$ iterations of LOOCV is computed by:

$$g_k = E(r_k \mid X_k \in S) = \frac{\sum_{X_k \in s_l}^{N} r_{kl}}{N}, \tag{4}$$

where $r_{kl}$ is the partial ratio ($r_k$) in the $l$-th iteration, when $X_k$ is selected in the $l$-th iteration. Next, the conditional expectation ($c_{kh}$) of $X_k$ in a particular position $h$ of the stepwise selection is computed by:

$$c_{kh} = E(r_k \mid X_k \in o(h)) = \frac{\sum_{X_k \in o(h)}^{N} r_{kl}}{q_{kh}}, \tag{5}$$

where $o(h)$ is the predictor that is most frequently selected in the $h$-th position by the stepwise selection method during the $N$ LOOCV procedures, and $q_{kh}$ is the number of times that $X_k$ is selected in the $h$-th position. For example, if $o(1)$ is $X_7$ and $X_7$ is selected 37 times as the first variable to enter ($q_{kh} = 37$), then the conditional expectation of $X_7$ in the first position ($c_{71}$) is the average of the corresponding 37 partial ratios. If the variable most often selected in position $h_j$ is already included in $o(h_i;\ i < j)$, then $o(h_j)$ is defined as the variable that is most often selected in that position but not included in $o(h_i;\ i < j)$. For example, if $X_7$ of $o(1)$ and $X_{15}$ are most and second most often selected in the fifth position, respectively, $o(5)$ is $X_{15}$, not $X_7$. The predictor variables that will be included in the discriminant model should have $g_k$ and $c_{kh}$ values larger than the cutoff value of 0.01.

Table 1. Plasma discriminant model variables

Rank  Metabolite  Power   Local power  Fitted equation
1     HIS         0.4767  0.6148       T(x) = 0.43 + 0.98x
2     ARG         0.1533  0.1907       T(x) = −0.07 − 0.57x
3     LAC         0.1555  0.0587       T(x) = 0.17 + 0.51x
4     ALA         0.0249  0.0452       T(x) = −0.33 − 0.53x + 0.21x^2 − 0.27x^3
5     ILE         0.0219  0.0201       T(x) = −0.05 − 0.21x^3 + 0.40x^4 + 0.45x^5
6     MET         0.0273  0.015        T(x) = 0.31 log(2.91x + 2.96) − 0.04
7     GLC         0.0543  0.0161       T(x) = 0.02 exp(−3.07x + 0.96) − 0.29

Metabolites were selected and ordered by MACE, stepwise variable selection and LOOCV. Power and local power refer to position (selected order) independent and dependent contribution, respectively, of the entering variable to overall discrimination as measured by the within-group to between-group variances (see Methods section). Variables were not allowed to enter unless they increased the final discrimination by at least 1%. This cut-off value selected a total of seven metabolite predictor variables for the final plasma discriminant model (diagnostic index).
Abbreviations: HIS, histidine; ARG, arginine; LAC, lactate; ALA, alanine; ILE, isoleucine; MET, methionine; GLC, glucose.

IMPLEMENTATIONS AND RESULTS
Optimal formulation of diagnostic index
We obtained metabolic profiles on plasma samples from control and injured rats undergoing progressive FHF. Control and injured rats were injected with normal saline and GalN dissolved in saline, respectively. The metabolic profiles consisted of plasma concentration data for 23 major primary metabolites (called predictor variables $x_k$; see Methods section) obtained at various time periods after the GalN insult (1, 4, 8 and 12 h in both controls and FHF samples). From these data, the proposed criteria for outputs of LOOCVs (see Methods section) identified an optimal subset of seven metabolites and their transforms such that a diagnostic index consisting of the sum of these transforms yields maximal discrimination between control and FHF samples at the earliest time point after the GalN injection. These seven metabolites are called a basis set of metabolites. The most discriminating metabolites are, in order, histidine (HIS), arginine (ARG), lactate, alanine (ALA), isoleucine (ILE), methionine (MET) and glucose (Table 1).

Figure 2A shows the concentrations of these metabolites at various times after the second saline or GalN injection. Plasma concentrations of HIS and MET in the FHF animal increased significantly over control by 4 h after the second GalN injection, and remained elevated thereafter. FHF increased lactate concentration and decreased ARG and glucose concentrations. Finally, ALA and ILE concentrations changed little throughout the study period, except a sudden increase in ALA concentration between 8 and 12 h. The decrease in glucose and increase in lactate concentrations are noteworthy, because they consistently point to impaired glucose metabolism.

The transforms of these predictor variables and the progressive improvement in discriminatory power (i.e. the ratio of between-group variance to total variance; see Methods section) are shown in Figure 2C. The scatter plots of metabolites ($x_i$) and treatments of GalN ($y$; 1 for controls and 2 for FHFs) in Figure 2B show that the untransformed metabolites do not discriminate individually the source samples to any significant degree. In contrast, the scatter plots of the sum of transformed $x_i$ ($I_p$) and the transformed $y$ (response score; $T_y$) (Fig. 2D) clearly show the progressively improving discrimination of the source samples with the inclusion of each additional metabolite in the regression model:

$$I_p = \sum_{i=1}^{p} T(x_i). \tag{6}$$

The transform of the response variable $y$ was trivial, yielding 1 and −1, respectively, for FHF and control. With the seven metabolites, the final discriminatory power was 0.936. The shapes of the transform functions for the predictor variables are described in Figure 2C. The equations, shown in Table 1, were derived by fitting $x_i$ and $T(x_i)$ to various function families, including linear, logarithmic, exponential and polynomial functions (Friedman and Stuetzle, 1982). For each series of $x_i$ and $T(x_i)$, the best function estimate was chosen to minimize the residual (Table 1). Substituting for each $T(x_i)$ in Equation (6) with Table 1 entries, the final diagnostic index $I_p$ becomes:

$$I_p = T(x_1 = \mathrm{HIS}) + T(x_2 = \mathrm{ARG}) + \cdots + T(x_7 = \mathrm{glucose}). \tag{7}$$

In general, the transforms (six out of the seven) increased or decreased monotonically over the present range of predictor variables, suggesting a simple relationship between the raw and transformed predictor variables. One exception was the transform of ILE ($x_5$), which was described by a fifth order polynomial.

Fig. 2. Plasma metabolic profiles and optimized diagnostic index. MACE, stepwise variable selection and LOOCV identified metabolite predictor variables and transforms that progressively improved the discriminatory power of the final diagnostic index. (A) Mean plasma concentrations of the selected metabolites, in order of decreasing discriminatory power, top-to-bottom (Table 1). Controls are represented by open square boxes and FHF by closed square boxes. Error bars are SD. Treatment and time point sample sizes varied from five to eight. (B) Scatter plots of untransformed response ($y$) and scaled predictor ($x_i$) variables. Last row shows the linear transform of $y$ to $T_y$. (C) Scatter plots of transformed predictors ($Tx_i$) and $x_i$. Dashed lines represent estimates of $T_x$ obtained by fitting various function families to the data. (D) Scatter plots of $T_y$ versus $T_x$ showing progressively improving discrimination. Numbers are ratios of between-group variance to within-group variance.
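As an illustration, the additive index of Equation (7) could be evaluated from the fitted equations in Table 1 roughly as follows. This is a sketch under stated assumptions rather than the authors' implementation: `log` is taken to be the natural logarithm, the MET constant is read as −0.04 from the table row, the inputs are assumed to be already scaled the same way as in the study (the scaling itself is not specified in this section), and the names `TRANSFORMS` and `diagnostic_index` are hypothetical.

```python
import math

# Fitted transforms T(x) as listed in Table 1; x is a scaled predictor.
# Assumptions: natural log for MET, and -0.04 as the MET constant.
TRANSFORMS = {
    "HIS": lambda x: 0.43 + 0.98 * x,
    "ARG": lambda x: -0.07 - 0.57 * x,
    "LAC": lambda x: 0.17 + 0.51 * x,
    "ALA": lambda x: -0.33 - 0.53 * x + 0.21 * x**2 - 0.27 * x**3,
    "ILE": lambda x: -0.05 - 0.21 * x**3 + 0.40 * x**4 + 0.45 * x**5,
    # Defined only where 2.91 * x + 2.96 > 0.
    "MET": lambda x: 0.31 * math.log(2.91 * x + 2.96) - 0.04,
    "GLC": lambda x: 0.02 * math.exp(-3.07 * x + 0.96) - 0.29,
}


def diagnostic_index(scaled_profile):
    """Sum the seven transformed predictors, as in Equation (7)."""
    return sum(
        transform(scaled_profile[name])
        for name, transform in TRANSFORMS.items()
    )


# Hypothetical sample whose scaled concentrations are all zero.
zero_profile = {name: 0.0 for name in TRANSFORMS}
index_at_zero = diagnostic_index(zero_profile)
```

Keeping one function per metabolite mirrors the additive structure of the model: each variable contributes independently, so a single transform can be inspected or replaced without touching the others.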

Application to diagnostics
The diagnostic index ($I$) described in Table 1 was evaluated for the training data, and plotted against the corresponding sample identification numbers (Fig. 3A). Consistent with our objective, the control and FHF samples clearly separated, as all the index values for the FHF samples were found outside the confidence limits (dotted lines for 95% and solid lines for 99% confidence levels) of the control group index values. The separation between control and FHF samples was achieved even for the earliest time point, thus enabling us to detect FHF from metabolite measurements only 1 h after the second GalN injection.

Fig. 3. Optimized diagnostic index scores for training and test data. (A) The diagnostic index ($I$) described in Table 1 is evaluated for the training data and plotted against corresponding sample identification numbers. The sample group memberships are indicated by the symbol legend. L1, L4, L8 and L12 are FHF samples taken at 1, 4, 8 and 12 h after GalN injection. H1, H4, H8 and H12 are corresponding control samples. FHF are plotted as closed and control as open symbols. (B) Index $I$ for the test data plotted against corresponding sample identification numbers.
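The classification rule described above, flagging a sample when its index falls outside the control group's confidence limits, can be sketched as follows. The text does not state how the limits were computed, so this sketch assumes normally distributed control scores and limits of the form mean ± z·SD; the function names and the control scores are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def control_limits(control_scores, level=0.95):
    """Two-sided limits for control index values.

    Assumption: limits are mean +/- z * SD with z taken from the
    standard normal distribution; the paper does not specify this.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    centre = mean(control_scores)
    spread = stdev(control_scores)
    return centre - z * spread, centre + z * spread


def is_outside(score, limits):
    """True when a score lies outside the control limits."""
    low, high = limits
    return score < low or score > high


# Hypothetical control index values from a training set.
controls = [-1.1, -0.9, -1.0, -1.05, -0.95]
limits_95 = control_limits(controls, level=0.95)
limits_99 = control_limits(controls, level=0.99)

flag_fhf_like = is_outside(0.9, limits_95)       # far from the controls
flag_control_like = is_outside(-1.0, limits_95)  # at the control mean
```

Limits derived once from training controls can then be applied unchanged to held-out samples, which is how the test set is described as being evaluated.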

In Figure 3B, new test samples, which were not involved in formulating the diagnostic index, were predicted using the confidence limits derived from the training set. As with the training set, all the test samples were correctly classified. The performance of our method was compared against that of a clinical index, Fischer’s ratio (Fischer et al., 1975), which correlates increasing severity of hepatic dysfunction with decreasing molar ratio of branched chain amino acids to aromatic amino acids (BCAA/AAA). This ratio was evaluated for the training data used to develop our optimal diagnostic index and plotted against corresponding sample identification numbers in Figure 4A. In contrast to our optimal index, which clearly separated control from FHF samples even at the 1 h time point, the BCAA/AAA ratio did not discriminate significantly between control and FHF samples at any time point. The largest difference between control and FHF mean BCAA/AAA scores was observed for the 12 h samples, but even at this time point, statistical significance could not be inferred (at the level of p < 0.05).
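For comparison, the BCAA/AAA molar ratio is simple to compute. The sketch below assumes the commonly used definition, with valine, leucine and isoleucine as the branched chain amino acids and phenylalanine and tyrosine as the aromatic amino acids; the text does not list which amino acids entered the ratio in this study, and the concentrations shown are made up.

```python
def fischer_ratio(concentrations):
    """Molar BCAA/AAA ratio.

    Assumed definition: (VAL + LEU + ILE) / (PHE + TYR), with all
    concentrations in the same molar units.
    """
    bcaa = (concentrations["VAL"]
            + concentrations["LEU"]
            + concentrations["ILE"])
    aaa = concentrations["PHE"] + concentrations["TYR"]
    return bcaa / aaa


# Hypothetical plasma concentrations (arbitrary but consistent molar units).
sample = {"VAL": 200.0, "LEU": 150.0, "ILE": 100.0, "PHE": 80.0, "TYR": 70.0}
ratio = fischer_ratio(sample)
```

Unlike the optimized index, this ratio uses a fixed set of variables with fixed, untransformed weights, which is one way to read its weaker discrimination at early time points.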

We also compared our diagnostic index against other popular models used for sample classification: self-organizing map (SOM; Fig. 4B), PCA (Fig. 4C) and method of PLS (Fig. 4D). SOM (Fig. 4B) did not separate control and FHF samples correctly except for the 12 h time point. PCA and PLS (Nguyen and Rocke, 2002) yielded comparable results, and performed better than the previous two methods, as they discriminated between control and FHF samples for both the 8 and 12 h time points. However, neither method was able to discriminate clearly between control and FHF samples for the earlier time points (1 and 4 h). In all cases, the separation afforded by MACE was superior. The two best discriminated groups were the control and FHF samples obtained at the latest time point (12 h), consistent with the time course of GalN insult established using plasma levels of liver enzymes (Shito et al., 2001).

Metabolic profiling at the tissue level
To investigate the relationship between changes in cellular metabolic activity at the site of initial disturbance (by GalN action) and the metabolic state of the whole body, we compared the optimized diagnostic indexes for liver and plasma. Liver metabolic profiles were obtained using the isolated perfused liver. This experimental system removed systemic influences and afforded measurements on metabolic rates, which more directly reflect pathway activities than concentrations.

MACE and stepwise selection identified the following basis metabolites for the liver, listed in order of their discriminatory power: lactate output, glutamine (GLN) uptake, β-hydroxybutyrate (3-OHB) output, ammonia (AMM) output, glucose output and ornithine (ORN) output. Figure 5A shows the time course of changes to the uptake or output of these metabolites. Lactate output was elevated throughout, whereas GLN uptake and glucose output steadily decreased. The trends were not as simple for 3-OHB, AMM and ORN output, as these rate profiles for control and FHF intersected at one or more time points.

The elevated lactate and depressed glucose outputs pointed to impaired glucose metabolism in the liver, consistent with observations on the plasma data. Moreover, both the plasma and perfused liver diagnostic indexes included lactate and glucose, with lactate as the more important variable. On the other hand, HIS, ARG, ALA, ILE and MET in the plasma set were replaced by GLN, 3-OHB, AMM, ORN and tyrosine (TYR) in the liver set. Although lactate appeared in both indexes, it played a more critical role in the liver diagnostic index than the plasma index, as it accounted for 62 and 15% of discrimination of the liver and plasma samples, respectively.

Fig. 4. Diagnostic index comparison. A clinical index (A) and commonly used discriminant models (B, C and D) are compared against our optimized diagnostic index. The sample group memberships are indicated by the symbol legend as in Figure 3A. (A) Fischer’s ratio correlates decreasing molar BCAA/AAA ratios with increasing liver dysfunction severity. (B) Unsupervised SOM. (C) PCA (FDA). (D) Method of PLS. Note that PCA and PLS plots show sample projections onto the first Principal Component (PC) and the first Latent Variable (LV) for sake of fair comparisons.

The liver transform functions exhibited more complex shapes (Fig. 5C). Six out of the seven plasma transform functions (Fig. 2C) were monotonic, and could be described with a linear, exponential or polynomial function of order 3 or less. In contrast, most of the basis metabolites in the liver diagnostic index had to be fitted using stepwise polynomial regression. Five out of the six liver transforms showed non-monotonic behavior, and were described by higher (>3) power polynomials. Transforms for LAC, GLN and AMM were biphasic, and those for 3-OHB and ORN oscillatory, suggestive of a complex relationship between the raw and transformed variables.

Fig. 5. Perfused liver metabolic profiles and optimized diagnostic index. After obtaining plasma samples by cardiac puncture, livers were isolated and perfused to obtain corresponding liver specific metabolic rate profiles. (A) Mean metabolic rates, in order of decreasing discriminatory power, top-to-bottom. Controls are represented by open square boxes and FHF by closed square boxes. Error bars are SD. Treatment and time point sample sizes varied from five to eight. See Figure 2 for (B)–(D) captions. (E) The diagnostic index (I) is evaluated for the training data (see Methods section) and plotted against corresponding sample identification numbers. The sample treatment group memberships are indicated by the symbol legend as in Figure 3A.

DISCUSSION AND CONCLUSION

This study presents a novel computational methodology for optimally discriminating biological systems based on metabolite measurements, which has potential clinical applications, such as diagnostics. We demonstrate the methodology on a moderate sized (2 × 61 × 23) dataset describing the progression of liver failure in a rat model of GalN toxemia developed previously (Shito et al., 2001). Some detailed information already exists regarding GalN toxicity and liver metabolism (Ozeki et al., 1982; Usui et al., 1985; De Oliveira et al., 1992; Arai et al., 2001), providing an opportunity to check the discrimination results against physiological knowledge. The data were collected at two levels, whole body (plasma) and tissue (isolated liver). The plasma data was used to construct an additive discriminant model (the diagnostic index) composed of linear and non-linear transforms of the basis metabolites. The diagnostic index correctly classified, even at the earliest observed time point (1 h), both the training and test samples with 99% confidence. This represented a significant improvement over an existing metabolic index, Fischer’s BCAA/AAA molar ratio as well as indexes generated using other popular discriminant analysis tools (SOM and FDA).
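For reference, the clinical benchmark can be sketched in a few lines. The sketch assumes the common definition of Fischer's ratio as the molar ratio of branched-chain amino acids (valine, leucine, isoleucine) to aromatic amino acids (phenylalanine, tyrosine); the function name and the concentration values are hypothetical and are not taken from the dataset.

```python
def fischer_ratio(conc):
    """Molar ratio of branched-chain to aromatic amino acids."""
    bcaa = conc["VAL"] + conc["LEU"] + conc["ILE"]
    aaa = conc["PHE"] + conc["TYR"]
    return bcaa / aaa

# Hypothetical plasma concentrations in micromolar (illustrative only).
control = {"VAL": 200.0, "LEU": 150.0, "ILE": 100.0, "PHE": 60.0, "TYR": 90.0}
failure = {"VAL": 150.0, "LEU": 100.0, "ILE": 50.0, "PHE": 120.0, "TYR": 180.0}

ratio_control = fischer_ratio(control)
ratio_failure = fischer_ratio(failure)
```

With these made-up values the ratio falls from 3.0 to 1.0, in the direction that the index associates with increasing liver dysfunction.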

The systematic development of a diagnostic index from even moderate sized (∼10s–100s of variables) data is a complicated task, when little or no mechanistic information is at hand regarding the relationship structure (i.e. functional dependence) between the measured metabolites and the pathological state under investigation, which is often the case for many diseases of multi-factorial origin. For example, with only five functional (e.g. periodic, exponential, polynomial, power and logarithmic) families considered, the number of all possible functional form combinations for a given set of k metabolites is 5^k, even in the simplest case of an additive model. A key advantage of the present method is that it simultaneously selects both the best subset of basis metabolites and identifies the necessary transforms of these metabolites for optimal discrimination, without a priori ordering of the variables and any assumptions about the type of transform functions. Thus, both parsimony and flexibility are achieved.
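The size of this search space can be checked by brute-force enumeration. The sketch below is illustrative only: the helper `form_combinations` and the choice of three metabolites are assumptions, and the family names are simply the five examples listed above.

```python
from itertools import product

FAMILIES = ("periodic", "exponential", "polynomial", "power", "logarithmic")

def form_combinations(metabolites):
    """Yield every assignment of one functional family per metabolite."""
    for combo in product(FAMILIES, repeat=len(metabolites)):
        yield dict(zip(metabolites, combo))

metabolites = ["LAC", "HIS", "ARG"]  # k = 3, chosen only for illustration
n_forms = sum(1 for _ in form_combinations(metabolites))
```

For k = 3 the enumeration yields 125 combinations. For a seven-metabolite index the same count is 5^7 = 78,125, before any choice among candidate subsets of the measured variables is considered.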

[Figure 6 graphic. Panel B axes: Tx(GLC) and Tx(LAC); legend: FHF, Control. Panel C labels: Liver, Whole body, Circulation, r(x) uptake, r(x) output, and $[x] = \int_{0}^{t} r(x)\,dt$.]

Fig. 6. Metabolite basis and physiological spaces. (A) Three dimensional representation of the plasma liver dysfunction space spanned by transforms of LAC, HIS and ARG. Data shown are projections of sample group means. Treatment group memberships are indicated by the symbol legend as in Figure 3A. (B) Two-dimensional physiological subspace spanned by transforms of LAC and GLC representing the overlap between the plasma and tissue level liver dysfunction spaces. Data shown are scatter plots of Tx(LAC) versus Tx(GLC) for each sample in the training set. (C) Interactions between liver and whole body-level metabolic activities in the regulation of organism level metabolic homeostasis. Variables r(x) and t refer to rate of uptake or output of metabolite x by the liver and time, respectively.

The stepwise search algorithm guarantees that the metabolite inputs to the diagnostic index define the most compact, and therefore maximally independent, set of metabolic characteristics that differentiate one physiological state from another. An intuitive, geometric interpretation of these basis metabolites is given by an analogy with vector spaces. In this view, the basis metabolites are considered to be nonlinear bases spanning a particular physiological discrimination space. In this study, we refer to the hyperspace spanned by the seven basis metabolites as a ‘liver dysfunction space’. Similar interpretations have been suggested for projection-based data analysis methods such as PCA (in essence a singular value decomposition) and FDA, where linear combinations of variables constitute bases spanning a multi-dimensional hyperspace. However, unlike PCA or FDA, our method uniquely associates each basis with a single transformed metabolite rather than a linear combination of several metabolites. This type of one-to-one basis mapping directly facilitates physiological interpretation of discrimination characteristics. Figure 6A shows a three-dimensional liver dysfunction space, spanned by Tx(HIS), Tx(ARG) and Tx(LAC), where the FHF and control group separation occurs mainly along the positive directions of the Tx(HIS), Tx(ARG) and Tx(LAC) axes. Thus, elevated HIS and LAC, and decreased ARG (ARG transform is linear and negative) concentrations are characteristic of this pathology. It is important to note that the aforementioned liver dysfunction space, as defined by the plasma diagnostic index, is a subspace of the total attainable physiological space. By definition, this subspace captures significant metabolic changes associated with FHF as observed on plasma samples.
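The one-to-one mapping between axes and metabolites can be made concrete with a small sketch. It assumes that the additive index is a plain, unweighted sum of per-metabolite transforms; the transform shapes below are placeholders, and only the statement that the ARG transform is linear and negative comes from the text.

```python
import math

# Hypothetical transforms; only the sign of the ARG transform (linear,
# negative) is taken from the text. The others are placeholders.
TRANSFORMS = {
    "HIS": lambda x: math.log(x),
    "ARG": lambda x: -0.01 * x,
    "LAC": lambda x: 0.5 * x,
}

def coordinates(sample):
    """Map a sample to its coordinates: one axis per transformed metabolite."""
    return {m: f(sample[m]) for m, f in TRANSFORMS.items()}

def diagnostic_index(sample):
    """Additive index: the sum of the per-metabolite transformed values."""
    return sum(coordinates(sample).values())

sample = {"HIS": 1.0, "ARG": 100.0, "LAC": 2.0}  # made-up concentrations
coords = coordinates(sample)
index = diagnostic_index(sample)
```

Because each coordinate depends on exactly one metabolite, a shift along an axis can be read directly as a change in that metabolite, which is not possible for an axis defined by a linear combination of many variables.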


Physiological subspaces can also be defined for various subsystems within an organism that are subject to the same pathology (in this case liver dysfunction), but are described by different discriminant models. In this study, a second liver dysfunction subspace was defined by the diagnostic index derived from the perfused liver data. The plasma and liver subspaces shared two common basis components, lactate and glucose, which can be used to construct a ‘consensus’ subspace (Fig. 6B). In this space, plasma and liver sample projections obtained by their respective glucose and lactate transforms trace two distinct trajectories. In the case of plasma, Tx(LAC) monotonically decreased with increasing Tx(GLC) regardless of sample origin (control or FHF). On the other hand, FHF samples traced a steeper curve, suggestive of higher sensitivity of lactate to changes in glucose concentration. In the case of the perfused liver, Tx(LAC) again decreased with increasing Tx(GLC) for the control samples, but showed biphasic behavior for the FHF samples. Moreover, the perfused liver trajectory spanned a larger range of transformed variable values than the plasma trajectory. This is unlikely due to differences in measurement error, as the same animals and biochemical assays were used for both plasma and perfused liver samples. A more likely explanation is that the effects of GalN on glucose–lactate metabolism in the liver are dampened by extra-hepatic contributions in the whole body. Conceptually, we consider such ‘smoothing’ to result from a ‘time integral’ of tissue-level metabolic activity, which manifests as regulatory interactions between various tissues aimed at maintaining whole body-level metabolic homeostasis (Fig. 6C).
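The smoothing effect of a time integral can be illustrated numerically. The sketch below is a simplified reading of the relation between a liver rate r(x) and an accumulated quantity, using a trapezoidal running integral; it ignores extra-hepatic contributions and distribution volume, and the rate values are hypothetical.

```python
def cumulative_integral(rates, times):
    """Trapezoidal running integral of a rate profile r(t)."""
    total, out = 0.0, [0.0]
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        total += 0.5 * (rates[i] + rates[i - 1]) * dt
        out.append(total)
    return out

times = [0.0, 1.0, 2.0, 3.0, 4.0]
rates = [2.0, 0.0, 2.0, 0.0, 2.0]   # hypothetical oscillating net output rate
pool = cumulative_integral(rates, times)
```

Here an oscillating rate profile produces a steadily increasing accumulated value, which is one way that a non-monotonic tissue-level signal could appear monotonic at the whole body level.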

The monotone nature of control sample trajectories indicates that whole body and liver glucose and lactate metabolic activities are synchronized, presumably because the liver is the primary regulator of glucose homeostasis in the body. Our subspace analysis indicates that GalN disrupts this synchronization and, together with the concentration and rate profiles of lactate and glucose (Figs 2A and 5A), points to metabolic pathways connecting glucose and lactate as likely sites of GalN action in the liver. This interpretation is supported by several published reports. For example, administration of fructose 1,6-bisphosphate reduced liver cell death during GalN-induced FHF (De Oliveira et al., 1992). Other studies found a diminishing effect of GalN on the reserve power of glycolysis and fructose 1,6-diphosphatase activity (Ozeki et al., 1982).

In conclusion, our methodology identified a robust and efficient discriminant model from metabolic profiling data, which has shown superior diagnostic power compared with a clinical index as well as other statistical models. A novel feature of our method is that it simultaneously selects the best subset of input variables and their linear or nonlinear transforms for optimal sample discrimination. The method is systematic and general, and could be applied to other diseases, especially those of multi-factorial origin, such as diabetes and obesity, as well as other types of data, such as protein expression profiles. Given enough repeated observations, we expect to develop separate discriminant models (diagnostic indexes) for particular time points, or stages, of disease progression, which will enable us to construct a dynamic model for monitoring disease progression. Moreover, a larger dataset would more representatively sample the population distribution, and lessen the probability of over-fitting, which gives rise to transforms with little or no physical meaning. This was likely the case when a fifth-degree polynomial was selected as the optimal transform for the plasma basis metabolite ILE (Table 1). In light of these limitations, future studies could benefit from (1) improved tools for high-throughput measurements of metabolites in complex biological fluids (Fiehn et al., 2000b; Griffin et al., 2002) and (2) improved knowledge of metabolic pathways, so that more comprehensive datasets, including derivative quantities like metabolic fluxes, are routinely available for systematic analysis.
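The over-fitting concern can be illustrated by comparing training error with leave-one-out error across polynomial degrees. This is a generic sketch and not the procedure used in this study; the function name, the data and the perturbation pattern are all hypothetical.

```python
import numpy as np

def train_and_loocv_error(x, y, degree):
    """Return (training MSE, leave-one-out MSE) for a polynomial fit."""
    fit = np.polyfit(x, y, degree)
    train = float(np.mean((np.polyval(fit, x) - y) ** 2))
    held_out = []
    for i in range(len(x)):
        keep = np.arange(len(x)) != i          # drop sample i, refit, predict it
        coef = np.polyfit(x[keep], y[keep], degree)
        held_out.append((np.polyval(coef, x[i]) - y[i]) ** 2)
    return train, float(np.mean(held_out))

# Hypothetical data: a linear trend plus a fixed, alternating perturbation.
x = np.linspace(0.0, 1.0, 12)
y = 2.0 * x + 0.1 * np.array([1, -1, 1, 1, -1, 1, -1, -1, 1, -1, 1, -1])
errors = {d: train_and_loocv_error(x, y, d) for d in range(1, 6)}
best_degree = min(errors, key=lambda d: errors[d][1])
```

Training error can only decrease as the degree grows, so it always favors the most flexible transform. The held-out error does not share this property, which is why a cross-validated criterion is the safer guide when samples are few.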

ACKNOWLEDGEMENTS

The authors would like to thank Dr Albert K. Lee for useful discussions. This work was supported by NIH grant no. GM58125, the Shriners Hospitals for Children and the Tufts University Faculty Fund.

REFERENCES

Alaiya,A.A., Franzen,B., Auer,G. and Linder,S. (2000) Cancer proteomics: from identification of novel markers to creation of artificial learning models for tumor classification. Electrophoresis, 21, 1210–1217.

Arai,K., Lee,K., Berthiaume,F., Tompkins,R.G. and Yarmush,M.L. (2001) Intrahepatic amino acid and glucose metabolism in a D-galactosamine-induced rat liver failure model. Hepatology, 34, 360–371.

Bichsel,V.E., Liotta,L.A. and Petricoin,E.F.,III (2001) Cancer proteomics: from biomarker discovery to signal pathway profiling. Cancer J., 7, 69–78.

Breiman,L. and Friedman,J.H. (1985) Estimating optimal transformations for multiple-regression and correlation. J. Am. Stat. Assoc., 80, 580–598.

Brindle,J.T., Antti,H., Halmes,E., Tranter,G., Nicholson,J.K., Bethell,H.W., Clarke,S., Schofield,P.M., McKilligin,E., Mosedale,D.E. and Grainger,D.J. (2002) Rapid and noninvasive diagnosis of the presence and severity of coronary heart disease using 1H-NMR-based metabonomics. Nat. Med., 8, 1439–1444.

Califano,A., Stolovitzky,G. and Tu,Y. (2000) Analysis of gene expression microarrays for phenotype classification. Proc. Int. Conf. Intell. Syst. Mol. Biol., 8, 75–85.

Coen,M., Lenz,E.M., Nicholson,J.K., Wilson,J.D., Pognan,F. and Lindon,J.C. (2003) An integrated metabonomic investigation of acetaminophen toxicity in the mouse using NMR spectroscopy. Chem. Res. Toxicol., 16, 295–303.

Cohen,S.A. and De Antonis,K.M. (1994) Applications of amino acid derivatization with 6-aminoquinolyl-N-hydroxysuccinimidyl carbamate. Analysis of feed grains, intravenous solutions and glycoproteins. J. Chromatogr A, 661, 25–34.

De Oliveira,J.R., Rosa,J.L., Ambrosio,S. and Bartrons,R. (1992) Effect of galactosamine on hepatic carbohydrate metabolism: protective role of fructose 1,6-bisphosphate. Hepatology, 15, 1147–1153.

Dillon,W.R. and Goldstein,M. (1984) Multivariate Analysis: Methods and Applications. New York, Wiley.

Fiehn,O., Kopka,J., Dormann,P., Altmann,T., Trethewey,R.N. and Willmitzer,L. (2000a) Metabolite profiling for plant functional genomics. Nat. Biotechnol., 18, 1157–1161.

Fiehn,O., Kopka,J., Trethewey,R.N. and Willmitzer,L. (2000b) Identification of uncommon plant metabolites based on calculation of elemental compositions using gas chromatography and quadrupole mass spectrometry. Anal. Chem., 72, 3573–3580.

Fischer,J.E., Funovics,J.M., Aguirre,A., James,J.H., Keane,J.M., Wesdorp,R.L., Yoshimura,N. and Westman,T. (1975) The role of plasma amino acids in hepatic encephalopathy. Surgery, 78, 276–290.

Friedman,J.H. and Stuetzle,W. (1982) Smoothing of Scatterplots. CA Stanford University, Stanford, Springfield, VA.

Glassbrook,N., Beecher,C. and Ryals,J. (2000) Metabolic profiling on the right path. Nat. Biotechnol., 18, 1142–1143.

Glassbrook,N. and Ryals,J. (2001) A systematic approach to biochemical profiling. Curr. Opin. Plant Biol., 4, 186–190.

Golub,T.R., Slonim,D.K., Tamayo,P., Huard,C., Gaasenbeek,M., Mesirov,J.P., Coller,H., Loh,M.L., Downing,J.R., Caligiuri,M.A., Bloomfield,C.D. and Lander,E.S. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286, 531–537.

Goodman,S.I., Helland,P., Stokke,O., Flatmark,A. and Jellum,E. (1977) Organic acid profiles of human tissue biopsies by capillary gas chromatography-mass spectrometry. J. Chromatogr., 142, 497–503.

Griffin,J.L., Bollard,M., Nicholson,J.K. and Bhakoo,K. (2002) Spectral profiles of cultured neuronal and glial cells derived from HRMAS 1H NMR spectroscopy. NMR Biomed., 15, 375–384.

Griffin,J.L., Williams,H.J., Sang,E., Clarke,K., Roe,C. and Nicholson,J.K. (2001) Metabolic profiling of genetic disorders: a multitissue (1)H nuclear magnetic resonance spectroscopic and pattern recognition study into dystrophic tissue. Anal. Biochem., 293, 16–21.

Hedenfalk,I., Duggan,D., Chen,Y., Radmacher,M., Bittner,M., Simon,R., Meltzer,P., Gustenson,B., Esteller,M., Kallioniemi,O.P. et al. (2001) Gene-expression profiles in hereditary breast cancer. N. Engl. J. Med., 344, 539–548.

Holter,N.S., Mitra,M., Maritan,A., Cieplak,M., Banavar,J.R. and Fedoroff,N.V. (2000) Fundamental patterns underlying gene expression profiles: simplicity from complexity. Proc. Natl Acad. Sci. USA, 97, 8409–8414.

Hwang,D., Schmitt,W.A., Stephanopoulos,G. and Stephanopoulos,G. (2002) Determination of minimum sample size and discriminatory expression patterns in microarray data. Bioinformatics, 18, 1184–1193.

Jellum,E. (1977) Profiling of human body fluids in healthy and diseased states using gas chromatography and mass spectrometry, with special reference to organic acids. J. Chromatogr., 143, 427–462.

Kennedy,S. (2001) Proteomic profiling from human samples: the body fluid alternative. Toxicol. Lett., 120, 379–384.

Lee,K., Berthiaume,F., Stephanopoulos,G.N., Yarmush,D.M. and Yarmush,M.L. (2000) Metabolic flux analysis of postburn hepatic hypermetabolism. Metab. Eng., 2, 312–327.

Loomis,M.E. (1961) An enzymatic fluorometric method for the determination of lactic acid in serum. J. Lab. Clin. Med., 57, 966.

Nguyen,D.V. and Rocke,D.M. (2002) Tumor classification by partial least squares using microarray gene expression data. Bioinformatics, 18, 39–50.

Nicholson,J.K., Connelly,J., Lindon,J.C. and Holmes,E. (2002) Metabonomics: a platform for studying drug toxicity and gene function. Nat. Rev. Drug Discov., 1, 153–161.

Olsen,C. (1971) An enzymatic fluorimetric micromethod for the determination of acetoacetate, β-hydroxybutyrate, pyruvate and lactate. Clin. Chim. Acta, 33, 293–300.

Ozeki,T., Kan,M., Yamagota,S. and Taoka,Y. (1982) Studies on severe hepatic damage induced by galactosamine. Gastroenterol. Jpn., 17, 87–92.

Roessner,U., Luedemann,A., Brust,D., Fiehn,O., Linke,T., Willmitzer,L. and Fernie,A. (2001) Metabolic profiling allows comprehensive phenotyping of genetically or environmentally modified plant systems. Plant Cell, 13, 11–29.

Rus,V., Atamas,S.P., Shustova,V., Luzina,I.G., Selaru,F., Magder,L.S. and Via,C.S. (2002) Expression of cytokine- and chemokine-related genes in peripheral blood mononuclear cells from lupus patients by cDNA array. Clin. Immunol., 102, 283–290.

Searcy,R.L., Gough,G.S., Korotzer,J.L. and Bergquist,L.M. (1961) Evaluation of a new technique for estimation of urea nitrogen in serum. Am. J. Med. Technol., 27, 255–262.

Shito,M., Balis,U.J., Tompkins,R.G., Yarmush,M.L. and Toner,M. (2001) A fulminant hepatic failure model in the rat: involvement of interleukin-1beta and tumor necrosis factor-alpha. Dig. Dis. Sci., 46, 1700–1708.

Trethewey,R.N. (2001) Gene discovery via metabolic profiling. Curr. Opin. Biotechnol., 12, 135–138.

Trinder,P. (1969) Determination of blood glucose using an oxidase–peroxidase system with a non-carcinogenic chromogen. J. Clin. Pathol., 22, 158–161.

Usui,H., Ukida,M. and Nagashima,H. (1985) Metabolism of branched-chain amino acids in rats with acute hepatic failure: a tracer study using 15N-leucine. Acta Med. Okayama, 39, 397–406.

Xu,W., Bak,S., Decker,A., Paquette,S.M., Feyereisen,R. and Galbraith,D.W. (2001) Microarray-based analysis of gene expression in very large gene families: the cytochrome P450 gene superfamily of Arabidopsis thaliana. Gene, 272, 61–74.

Yamaguchi,Y., Yu,Y.M., Zupke,C., Yarmush,D.M., Berthiaume,F., Tompkins,R.G. and Yarmush,M.L. (1997) Effect of burn injury on glucose and nitrogen metabolism in the liver: preliminary studies in a perfused liver system. Surgery, 121, 295–303.
