
GASTROENTEROLOGY 2002;122:606-613

RAPID COMMUNICATIONS

Artificial Neural Networks Distinguish Among Subtypes of Neoplastic Colorectal Lesions

FLORIN M. SELARU,* YAN XU,* JING YIN,* TONG ZOU,* THOMAS C. LIU,* YURIKO MORI,* JOHN M. ABRAHAM,* FUMIAKI SATO,* SUNA WANG,* CHARLIE TWIGG,* ANDREEA OLARU,* VALENTINA SHUSTOVA,* ANATOLY LEYTIN,‡ PRODROMOS HYTIROGLOU,‡ DAVID SHIBATA,* NOAM HARPAZ,* and STEPHEN J. MELTZER*

*Department of Medicine, Division of Gastroenterology and Greenebaum Cancer Center, University of Maryland School of Medicine; Baltimore VA Hospital, Baltimore, Maryland; and ‡The Mount Sinai School of Medicine, New York, New York

Background & Aims: There is a subtle distinction between sporadic colorectal adenomas and cancers (SAC) and inflammatory bowel disease (IBD)-associated dysplasias and cancers. However, this distinction is clinically important because sporadic adenomas are usually managed by polypectomy alone, whereas IBD-related high-grade dysplasias mandate subtotal colectomy. The current study evaluated the ability of artificial neural networks (ANNs) based on complementary DNA (cDNA) microarray data to discriminate between these 2 types of colorectal lesions. Methods: We hybridized cDNA microarrays, each containing 8064 cDNA clones, to RNAs derived from 39 colorectal neoplastic specimens. Hierarchical clustering was performed, and an ANN was constructed and trained on a set of 5 IBD-related dysplasias or cancers (IBDNs) and 22 SACs. Results: Hierarchical clustering based on all 8064 clones failed to correctly categorize the SACs and IBDNs. However, the ANN correctly diagnosed 12 of 12 blinded samples in a test set (3 IBDNs and 9 SACs). Furthermore, using an iterative process based on the computer programs GeneFinder, Cluster, and MATLAB, we reduced the number of clones used for diagnosis from 8064 to 97. Even with this reduced clone set, the ANN retained its capacity for correct diagnosis. Moreover, cluster analysis performed with these 97 clones now separated the 2 types of lesions. Conclusions: Our results suggest that ANNs have the potential to discriminate among subtly different clinical entities, such as IBDNs and SACs, as well as to identify gene subsets having the power to make these diagnostic distinctions.

Abbreviations used in this paper: ANN, artificial neural network; IBDN, IBD-related dysplasia or cancer; L-FABP, liver fatty acid binding protein; PAF, platelet-activating factor; SAC, sporadic adenoma or carcinoma.

© 2002 by the American Gastroenterological Association 0016-5085/02/$35.00

doi:10.1053/gast.2002.31904

March 2002 ANNs IN IBD AND SPORADIC COLONIC NEOPLASMS 607

Sporadic colorectal adenomas and cancers (SAC) are diagnosed histologically, removed colonoscopically, and followed up with surveillance colonoscopy. In contrast, flat inflammatory bowel disease (IBD)-dysplasias mandate subtotal colectomy when they are high-grade. 1-11 Despite extensive molecular, histopathological, and gross morphological studies of these 2 types of dysplasias, no test exists by which to reliably discriminate between them. 12 The clinical implications of this distinction are profound. For example, a patient with IBD developing a sporadic adenoma may undergo an unnecessary and life-altering surgical procedure (total colectomy). Conversely, a patient with an IBD-related dysplasia or cancer (IBDN) mistakenly diagnosed as a sporadic adenoma has a higher risk of developing adenocarcinoma. 13-16

Tremendous progress has been made toward elucidating the molecular basis of IBDNs. There are important contrasts between the molecular events occurring in IBDN and those occurring in SACs. In both types of neoplasia, these events include proto-oncogene activation, loss of tumor-suppressor gene function, and defective DNA mismatch repair. However, ras proto-oncogene mutation is common in SACs but rare in IBDNs. 17-22 Similarly, p53 mutation is believed to occur late in SAC but early in IBDN. 23-27 Nevertheless, much remains to be learned regarding the molecular basis of IBD-associated neoplasia, and what makes it different from sporadic colorectal neoplasia.

Complementary DNA (cDNA) microarray technologies have enabled high-throughput determinations of gene expression levels in large numbers of biological samples. Microarray studies of acute leukemia, lymphoma, breast cancer, and other cancers have been performed. 28-34 There is now the promise of a modern taxonomy based on comprehensive gene expression profiles. 35 These comprehensive data offer the potential to improve disease classification. 32-34 Furthermore, they may improve our understanding of disease processes and generate biomarkers by identifying genes involved in normal or abnormal biologic processes. 36 However, the massive amount of data generated by cDNA microarrays is difficult to process. Moreover, certain bioinformatics methods, such as clustering, are incapable of detecting subtle differences between biologic states. 36

Artificial neural networks (ANNs) are novel means for processing information, inspired by the way a mammalian brain functions. ANNs are mathematical models composed of a large number of highly interconnected processing elements analogous to neurons and are tied together with weighted connections analogous to synapses. One of the basic characteristics of ANNs is that they can learn through example. Learning occurs through exposure, or training with a training set composed of pairs of input-output data. ANNs learn to associate each of the inputs with the corresponding output by modifying the weights of the connections between their processing units (neurons). Based on the knowledge accumulated during training, ANNs assign outputs (diagnoses) to new inputs (testing datasets). Thus, after training, ANNs are able to identify patterns or make predictions based on datasets never presented to them before. Figure 1 is a schematic illustration of a typical ANN.
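As a minimal sketch of the weighted-connection idea described above (not the network used in this study), a single processing element can be written as a weighted sum passed through a squashing function. The sigmoid activation, the function name `neuron_output`, and all numeric values are illustrative assumptions.

```python
import math

def sigmoid(z):
    """Logistic function mapping any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs (the 'synapses') followed by the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# Hypothetical inputs for three clones and made-up connection weights.
example = neuron_output([0.5, -1.2, 2.0], [0.8, 0.1, -0.4], bias=0.0)
```

Training, in this picture, amounts to adjusting `weights` and `bias` so that the output moves toward the desired value for each training example.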

ANNs can discover interrelationships within large sets of data. In complex biological systems, characterized by an abundance of data, ANNs allow pattern recognition that would otherwise prove difficult. ANNs have recently been applied to diverse medical problems. Within cancer research alone, ANNs have been applied to image processing, 37 outcome prediction, 38 treatment response prediction, 39 and diagnosis and staging. 37 Finally, microarray technology provides the type of abundant information that lends itself to analysis by ANNs, because a typical experiment containing 39 human specimens and 8000 genes will generate 312,000 data points (i.e., gene-expression ratios).

We performed a proof-of-principle study, using a wide variety of neoplastic lesions known to have developed either in the sporadic setting (SACs) or on the substratum of IBD (IBDNs). Both polypoid and nonpolypoid, as well as dysplastic and cancerous lesions, were included in each group to focus on the difference between IBD-associated and sporadic neoplasias (rather than on the difference between polyps and cancers, or between flat and polypoid adenomas). We devised a strategy to process microarray-derived data based on a combination of classical statistical techniques and ANNs. ANNs and gene filtering algorithms may be useful in the diagnosis and classification of colorectal neoplastic lesions. Moreover, this approach suggests a means to identify genes whose altered expression is most relevant to particular colorectal neoplastic lesions.

Materials and Methods

Tissues and Patients

Patients were enrolled sequentially during a 3-year period from the Baltimore VA Hospital (SACs) or the Mount Sinai School of Medicine (IBDNs). Patients with a presumed diagnosis of IBDN or SAC were enrolled in the study, with histologic confirmation. Tissues were obtained under protocols approved by the University of Maryland/Baltimore VA Hospital and Mount Sinai School of Medicine Institutional Review Boards. Tissues were immediately frozen on dry ice and stored under liquid nitrogen at -180°C until further use. Diagnoses of IBDN or SAC were confirmed by clinical diagnosis of underlying inflammatory bowel disease or lack thereof, using accepted clinical criteria for the diagnosis of IBD. 40 IBDNs came only from patients undergoing surgical resections for a known diagnosis of IBD, whereas SACs came only from patients who did not have this diagnosis. Three patients contributed more than 1 tissue specimen to the study (Table 1). A variety of neoplastic and preneoplastic lesions were included in each group to broaden the basis of comparison between IBDNs and SACs.

Microarray Experiments

Preparation of glass microarray slides, printing, probe labeling, microarray hybridization, and array scanning were performed according to standard protocols (http://www.microarrays.org/protocols.html). cDNA clones were obtained from Research Genetics (Huntsville, AL), and prepared according to protocols obtained from the National Cancer Institute-Advanced Technology Center; 8064 clones were printed using 8 pins in a 32-pin print head (Majer Precision Engineering, Tempe, AZ) on a GeneMachines Omnigrid Arrayer (GeneMachines, Oxnard, CA). The printed slides were ultraviolet-crosslinked, posttreated with succinic anhydride to reduce background, and subjected to hybridization.

[Figure 1 diagram: input layer (microarray data), hidden layer 1, hidden layer 2, output layer (diagnosis).]

Figure 1. Schematic representation of ANN. ANNs typically have an input layer, 1 or more hidden layers and an output layer (e.g., the diagnosis). During the learning process, the ANN is presented with a training set (e.g., cDNA microarray data) and is taught the correct diagnosis or output, which is encoded by numbers: e.g., in our study, IBDN was encoded by 0 and SAC by 1. Data from each ANN layer passes to the next one until it reaches the output. The ANN diagnosis is compared with the ideal diagnosis (0 or 1), an error is generated, and this error travels back through the ANN (backpropagation). The weights of the connections between the neurons are adjusted in an attempt to decrease the error generated at the output layer level. This iterative learning process occurs until the ANN accurately approximates the training set. Finally, the ANN is tested against a test set, which has never been seen by the ANN.

Figure 2. Graphic representation of artificial neural network diagnosis. The Y-axis displays the number assigned to each diagnosis. Blue squares show the number assigned to the ideal (known pathologic) diagnosis; red circles display the ANN approximation. By convention, IBDN is coded with zero (0) and SAC is coded with one (1). The X-axis displays the 12 samples in the test set (3 IBDNs, 9 SACs). These samples were unknown to the ANN a priori and were not used to train the ANN. Also on the X-axis are displayed the precise ideal and ANN approximation values that were displayed graphically on the Y-axis. This figure shows that the ANN accurately approximated the known pathologic diagnosis in all 12 cases.

Figure 3. Cluster analysis of 39 colonic neoplasias, using 97 genes derived by a gene filtering protocol. As shown in the colored dendrogram, this analysis correctly clustered IBDNs (the 8 red branches of dendrogram) separately from SACs (the remaining 31 green branches). The benign and malignant sporadic neoplasias did not form distinct clusters. This lack of divergence occurred because the ANN selected only genes important in distinguishing IBDNs from SACs, rather than in distinguishing among different grades of neoplasia.

For hybridizations, total RNA was extracted from freshly frozen tissues by standard organic methods and amplified, as previously described. 41-43 SAC- or IBDN-derived RNAs were labeled with Cy5, while a reference probe RNA was labeled with Cy3 and hybridized to all slides. This reference probe was produced from a mixture containing equimolar aliquots of RNA from the cell lines Hct116, HT29, Caco-2, Hct15, HTB114, MCF-7, HeLa, and AGS. HTB114 was derived from a patient with leukemia, HeLa from a patient with cervical cancer, MCF-7 from a patient with breast cancer, and AGS from a gastric cancer. The remaining 4 cell lines (Hct116, HT29, Caco-2, and Hct15) were derived from colorectal cancers. These cell lines were chosen to represent a variety of cell types to garner a baseline (i.e., green or Cy3) expression level (1) for comparison by ratio with a red (Cy5) lesion-derived signal, and (2) as a process control to ensure that each clone was generating a detectable signal on the array. 31,32,34,44 RNA from each lesion was amplified, labeled, and hybridized to its own microarray slide once; repeat hybridizations were not performed, because preliminary validation studies indicated that variation among identical repeat experiments was <10% (data not shown). The hybridized slides were scanned using a GenePix 4000A dual-laser slide scanning system (Axon Instruments, Foster City, CA) at wavelengths corresponding to each probe's unique fluorescence (532 and 635 nm). Quantification of individual gene expression ratios (Cy5:Cy3) was accomplished using the program GenePix 3.0, provided with the microarray scanner (Axon, Foster City, CA).
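To make the Cy5:Cy3 expression ratio concrete, the sketch below computes a log-transformed ratio for one clone. The paper states only that primary data were log-transformed; the base-2 logarithm, the function name `log_ratio`, and the intensity values are assumptions chosen for illustration, not output from GenePix.

```python
import math

def log_ratio(cy5, cy3):
    """Return log2(Cy5/Cy3); positive values mean a stronger lesion-channel signal."""
    if cy5 <= 0 or cy3 <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(cy5 / cy3)

# Hypothetical background-corrected intensities for one clone.
up = log_ratio(2000.0, 500.0)    # lesion signal 4x the reference
down = log_ratio(500.0, 2000.0)  # lesion signal one quarter of the reference
```

A log scale treats 4-fold overexpression and 4-fold underexpression symmetrically (here, +2 and -2), which a raw ratio does not.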

Data Analysis

Primary data were log-transformed and entered into GeneFinder, which selected clones with expression levels significantly different between IBDNs and SACs. Gene expression in IBDNs was compared directly with that in SACs. GeneFinder software was used to select clones showing statistically significant differences in mean expression levels between IBDNs and SACs (P = 0.001, Welch test).
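The Welch test named above compares two group means without assuming equal variances. GeneFinder's internals are not described in the paper, so the following is only a textbook sketch of the statistic for a single clone; `welch_t` and the sample values are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    # Squared standard error of each group mean (sample variance / n).
    va, vb = variance(group_a) / na, variance(group_b) / nb
    t = (mean(group_a) - mean(group_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Made-up log ratios of one clone in IBDNs and in SACs.
ibdn = [2.1, 1.8, 2.4, 2.0, 2.2]
sac = [0.3, 0.1, -0.2, 0.4, 0.0, 0.2]
t_stat, dof = welch_t(ibdn, sac)
```

Turning `t_stat` and `dof` into a P value requires the t distribution, which the Python standard library does not provide; SciPy's `scipy.stats.ttest_ind(..., equal_var=False)` is one option. A clone would then be retained when its P value falls below the chosen threshold.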

We constructed an ANN using the software program MATLAB (MathWorks, Inc., Natick, MA). The ANN was based on the principle of feedforward with error backpropagation. The net was designed with 2 hidden layers. The number of neurons in the input layer was equal to the number of clones used, whereas the ideal outputs were set at 0 for IBDNs and 1 for SACs. The ANN was then trained using a training set consisting of 5 IBDNs and 22 SACs. After training, the ANN was tested with the remaining data samples, consisting of 3 IBDNs and 9 SACs. Results were evaluated using regression analysis.
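The general technique named here, a feedforward network with 2 hidden layers trained by error backpropagation toward targets of 0 and 1, can be sketched in plain Python. This is not the authors' MATLAB implementation: the layer sizes, tanh and sigmoid activations, learning rate, epoch count, and the toy two-input data are all assumptions made for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(n_in, n_out, rng):
    """One weight row per neuron; the last entry of each row is the bias."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]

def layer_forward(layer, inputs, activation):
    return [activation(sum(w * x for w, x in zip(row, inputs)) + row[-1]) for row in layer]

def forward(net, x):
    h1 = layer_forward(net[0], x, math.tanh)
    h2 = layer_forward(net[1], h1, math.tanh)
    y = layer_forward(net[2], h2, sigmoid)[0]
    return h1, h2, y

def train_step(net, x, target, lr):
    """One feedforward pass followed by error backpropagation for one sample."""
    h1, h2, y = forward(net, x)
    # Output delta for squared error with a sigmoid output unit.
    d_out = [(y - target) * y * (1.0 - y)]
    # Propagate deltas backward; the derivative of tanh is 1 - tanh^2.
    d_h2 = [(1.0 - h2[j] ** 2) * sum(d_out[k] * net[2][k][j] for k in range(len(d_out)))
            for j in range(len(h2))]
    d_h1 = [(1.0 - h1[j] ** 2) * sum(d_h2[k] * net[1][k][j] for k in range(len(d_h2)))
            for j in range(len(h1))]
    # Gradient-descent updates, applied only after all deltas are computed.
    for layer, deltas, inputs in ((net[2], d_out, h2), (net[1], d_h2, h1), (net[0], d_h1, x)):
        for row, delta in zip(layer, deltas):
            for i, value in enumerate(inputs):
                row[i] -= lr * delta * value
            row[-1] -= lr * delta
    return 0.5 * (y - target) ** 2

def train(samples, n_hidden=3, epochs=2000, lr=0.5, seed=0):
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    net = [init_layer(n_in, n_hidden, rng), init_layer(n_hidden, n_hidden, rng),
           init_layer(n_hidden, 1, rng)]
    for _ in range(epochs):
        for x, target in samples:
            train_step(net, x, target, lr)
    return net

# Toy two-clone "expression profiles": target 0 stands for IBDN, 1 for SAC.
training_set = [([-1.0, -0.8], 0), ([-0.9, -1.1], 0), ([-1.2, -1.0], 0),
                ([1.0, 0.9], 1), ([0.8, 1.1], 1), ([1.1, 1.0], 1)]
net = train(training_set)
unseen = [([-1.1, -0.9], 0), ([0.9, 1.2], 1)]
predictions = [forward(net, x)[2] for x, _ in unseen]
```

In the study the input layer had one neuron per clone (1192, later 97) rather than two, and the held-out samples played the role of `unseen`.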

Hierarchical clustering was performed with the program Cluster (Stanford University, Stanford, CA). Average linkage clustering with centered correlation was used. TreeView generated visual representations of these clusters.
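A rough sketch of the clustering idea follows, assuming that "centered correlation" means the Pearson correlation and using the textbook definition of average linkage (mean pairwise distance between clusters). The Cluster program's exact procedure may differ; the function names and sample profiles are hypothetical.

```python
from itertools import combinations
from statistics import mean

def centered_correlation(a, b):
    """Pearson correlation: each profile is mean-centered before comparison."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den

def average_linkage(profiles):
    """Merge the two closest clusters repeatedly; return the merge order."""
    def dist(c1, c2):
        # Mean of (1 - correlation) over all cross-cluster sample pairs.
        return mean(1.0 - centered_correlation(profiles[i], profiles[j])
                    for i in c1 for j in c2)

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
        merges.append((c1, c2))
    return merges

# Made-up expression profiles (four clones) for four samples.
profiles = {
    "IBDN_a": [2.0, 1.0, -1.0, 0.5],
    "IBDN_b": [1.8, 1.2, -0.9, 0.4],
    "SAC_a": [-1.0, 0.2, 1.5, -0.3],
    "SAC_b": [-1.2, 0.1, 1.4, -0.1],
}
merges = average_linkage(profiles)
```

The merge order is what a dendrogram such as the one drawn by TreeView displays: samples joined early are the most similar.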

Selective gene filtering also was performed with the software program Cluster. The standard deviation was calculated across all samples. Clones were then filtered by iteratively increasing stringency of this calculated standard deviation value. With each increase in value of the standard deviation, the number of clones in the analyzed set decreased and this new set was again analyzed with an ANN designed as mentioned previously.
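The stepwise filtering described above can be illustrated as follows. This sketch covers only the standard-deviation cutoff; in the study, each reduced clone set was also used to retrain and retest the ANN, which is omitted here. The cutoff values, clone names, and data are invented, and the use of the sample (n - 1) standard deviation is an assumption.

```python
from statistics import stdev

def filter_by_sd(expression, cutoff):
    """Keep clones whose standard deviation across all samples exceeds the cutoff."""
    return {clone: values for clone, values in expression.items()
            if stdev(values) > cutoff}

def stepwise_reduction(expression, cutoffs):
    """Apply successively stricter cutoffs, recording how many clones survive."""
    history = []
    current = expression
    for cutoff in sorted(cutoffs):
        current = filter_by_sd(current, cutoff)
        history.append((cutoff, len(current)))
    return current, history

# Hypothetical log ratios for four clones across five samples.
expression = {
    "clone_A": [0.1, 0.0, 0.1, 0.0, 0.1],
    "clone_B": [1.0, -1.0, 1.2, -0.8, 0.9],
    "clone_C": [0.5, 0.2, 0.6, 0.1, 0.4],
    "clone_D": [2.5, -2.0, 2.2, -2.4, 2.1],
}
kept, history = stepwise_reduction(expression, [0.1, 0.5, 1.5])
# history == [(0.1, 3), (0.5, 2), (1.5, 1)]; only clone_D survives the last cutoff
```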

ANN approximations were tested with regression analysis, comparing expected output (Targets, T) with ANN output following training (Network Approximation, A). An unpaired 2-sided Student t-test was also applied to evaluate the statistical difference between the net-defined IBDNs vs. SACs (i.e., 0 vs. 1).
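One way to express the agreement between targets (T) and network approximations (A) is the Pearson correlation coefficient, sketched below. Whether this matches the exact regression statistic reported by the authors' software is an assumption, and the output values are invented rather than taken from the study.

```python
from statistics import mean

def pearson_r(targets, outputs):
    """Correlation between expected outputs (T) and network approximations (A)."""
    mt, mo = mean(targets), mean(outputs)
    cov = sum((t - mt) * (o - mo) for t, o in zip(targets, outputs))
    ss_t = sum((t - mt) ** 2 for t in targets)
    ss_o = sum((o - mo) ** 2 for o in outputs)
    return cov / (ss_t * ss_o) ** 0.5

# Targets for a 12-sample test set (3 IBDNs coded 0, 9 SACs coded 1) and
# made-up network outputs; these are not the study's actual values.
targets = [0, 0, 0] + [1] * 9
outputs = [0.02, 0.01, 0.03, 0.98, 0.99, 0.97, 1.00, 0.99, 0.98, 0.99, 0.97, 0.98]
r = pearson_r(targets, outputs)
```

Values of `r` close to 1 indicate that the network outputs track the 0/1 coding closely.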

Results

Thirty-nine colorectal neoplasias consisting of 8 IBDNs and 31 SACs were analyzed using cDNA microarrays, each containing 8064 clones. GeneFinder software was used to select clones showing significantly different mean expression levels between IBDNs and SACs (P = 0.001). Using this algorithm, we reduced our original dataset of 8064 clones to 1192.

Using these 1192 clones, we constructed an ANN and trained it using data from 5 IBDNs and 22 SACs. The net learned the training set very well, with network outputs uniformly approaching ideal outputs. We then tested the ANN on a test set of new samples that had never been seen by the net. This test set comprised the remaining 3 IBDNs and 9 SACs. In all 12 of these samples, the correct diagnosis was made by the ANN.

Next, we used a stepwise reduction method to minimize the number of clones used by the ANN. This reduction was performed by sequentially increasing cutoff values for standard deviation (SD) in the filtering module of Cluster (see Methods). We selected only clones with an SD across all samples higher than the specified cutoff. After each reduction, we repeated the ANN training and testing process until we had reduced the number of clones to 97. The complete list of these genes can be found at http://microarray.umaryland.edu/, along with results obtained from BLAST searches for each gene, and 198 references regarding the genes on this list that have been studied by previous authors. Using these 97 clones, the ANN was still trainable. Moreover, the net accurately diagnosed all 12 test samples (Figure 2).

However, decreasing the number of clones to <97 resulted in unsuccessful training of the ANN because of insufficient information.

The trained ANN was accurate in diagnosing the test samples: correlation between expected output (Targets, T) and ANN outputs after training (Network Approximation, A) yielded a regression factor of 0.999, indicating a high level of correlation. The P value for the difference between IBDNs and SACs in the test set was less than 0.0001 (unpaired 2-sided Student t-test).

As a further method of validation, we performed Cluster analysis using only the 97 clones identified as mentioned previously. This analysis correctly clustered IBDNs and SACs in the combined 39-lesion set into separate groups (Figure 3).

Table 1. Detailed Clinical and Pathological Information About the Specimens Included in This Study

No.  Sample name  Sex  Age  Group  ID  Pathology     Location  Size (cm)  Type       Dukes stage
1    C882P        M    57   SAC    1   TA            L         0.8        sessile    N/A
2    C933P        M    85   SAC    2   TA            R         0.8        pedunc.    N/A
3    C942P        M    64   SAC    3   TA            L         1.1        pedunc.    N/A
4    C980P        M    69   SAC    4   TA            R         4          pedunc.    N/A
5    C986P        M    48   SAC    5   TA            L         0.7        pedunc.    N/A
6    C992P        M    50   SAC    6   TA            L         1.5        pedunc.    N/A
7    C1008P       M    51   SAC    7   TA            L         2          pedunc.    N/A
8    C1011P       M    67   SAC    8   TA            R         0.2        sessile    N/A
9    C1027P       M    50   SAC    9   TA            L         0.8        sessile    N/A
10   C1034P       M    76   SAC    10  TA            R         1.5        sessile    N/A
11   C1035P       M    68   SAC    11  TA            L         2          pedunc.    N/A
12   C1045P       M    77   SAC    12  TA            L         1.5        pedunc.    N/A
13   C1053P       M    73   SAC    13  TA            R         4          sessile    N/A
14   C890P        M    73   SAC    14  TA-MDADCA     L         1.5        pedunc.    Dukes A
15   C967P        M    70   SAC    15  TA-HGD        L         0.5        pedunc.    N/A
16   C987P        M    72   SAC    16  TVA-HGD       R         5          pedunc.    N/A
17   C1028P       M    67   SAC    17  TVA           R         1          pedunc.    N/A
18   C837         M    54   SAC    18  TA/TVA-CIS    L         6          pedunc.    N/A
19   C1006        M    76   SAC    19  TVA           L         6.5        pedunc.    N/A
20   C1022        M    66   SAC    20  TA-HGD        L         9.8        pedunc.    N/A
21   C1129        M    53   SAC    21  TVA           R         5          pedunc.    N/A
22   C976P        M    70   SAC    22  HP            L         0.8        sessile    N/A
23   C642         M    70   SAC    23  MDADCA        L         2          mass       Dukes C
24   C1011        M    67   SAC    24  MDADCA        R         7          mass       Dukes B2
25   C1026        M    63   SAC    25  MD-PDADCA     R         3.5        mass       NK
26   C11331       M    72   SAC    26  muc. MDADCA   L         4.3        mass       NK
27   C12742       M    70   SAC    27  MDADCA        R         7.5        mass       Dukes C
28   C12884       F    54   SAC    28  muc. ADCA     L         8          mass       NK
29   C13348       F    60   SAC    29  muc. ADCA     R         5.7        mass       NK
30   C13380       M         SAC    30  muc. ADCA     R         8          mass       Dukes D
31   C13431       M    74   SAC    31  muc. MDADCA   L         5.1        mass       Dukes B2
32   H31RC        M    35   IBDN   1   WDADCA        R         0.9        mass       Dukes C1
33   H31LC2       M    35   IBDN   2   HGD           L         1.7        flat       N/A
34   H33          M    70   IBDN   3   WD-MDADCA     L         10.5       stricture  Dukes B2
35   H34          M    29   IBDN   4   MDADCA        L         3.5        polypoid   Dukes B2
36   H29          F    54   IBDN   5   MDADCA        NA        6          mass       Dukes B2
37   H30T1        F    36   IBDN   6   MD-PDADCA     L         2          mass       Dukes B2
38   H30T3        F    36   IBDN   7   PDADCA        L         NA         mass       Dukes B1
39   H35          F    77   IBDN   8   TVA-LGD       R         3.5        polypoid   N/A

NOTE. Specimens SAC 8 and SAC 24 were collected from a single patient at widely separated locations, as were H31RC (right colon) and H31LC (left colon). In addition, specimens H30T1 and H30T3 were also collected from 1 patient at locations widely separated by intervening nonneoplastic mucosa. ADCA, adenocarcinoma; HGD, high-grade dysplasia; LGD, low-grade dysplasia; MDADCA, moderately differentiated ADCA; MD-PDADCA, moderately-to-poorly differentiated ADCA; muc., mucinous; PDADCA, poorly differentiated ADCA; pedunc., pedunculated; TA, tubular adenoma; TVA, tubulovillous adenoma; WDADCA, well-differentiated ADCA.

Cluster analysis performed using information from all 8064 clones was unable to clearly separate these 2 entities, probably because of excessive "noise" in the dataset. However, cluster analysis performed on the 1192 clones generated by the first filtering round was successful at separating the 2 types of lesions.

Of the 97 clones discriminating IBDNs from SACs, 89 were overexpressed in IBDNs, whereas 8 were overexpressed in SACs. Some of the clones that have been studied previously include aldehyde dehydrogenase 6, acid phosphatase 1, family A group 5 purinergic receptor, adenosine A1 receptor, platelet-activating factor (PAF) acetylhydrolase, fibulin 4, and liver fatty acid binding protein. These genes are discussed in detail later.

Discussion

Sporadic colorectal adenomas, which are by definition dysplastic lesions, may occur in the setting of IBD; however, these lesions should not constitute an indication for colectomy. 1-8 A dilemma arises when it becomes difficult or impossible to distinguish these adenomas from idiopathic IBD-associated dysplasias, which are widely considered to indicate colectomy. 9-11 Dysplasias occurring in IBD may be flat or polypoid; when polypoid, it may be difficult to distinguish between a sporadic adenoma arising in an IBD patient vs. a polypoid area of dysplasia developing in and as a result of underlying IBD. Although several molecular differences between the 2 types of dysplasia have been observed, these molecular alterations are neither common nor consistent enough to provide the requisite level of discrimination), 6,45-49 Furthermore, the practice of colonoscopic surveillance in ulcerative colitis and Crohn's colitis has been challenged because of its low sensitivity, as well as its low yield. 50-54 Finally, when frank carcinoma is diagnosed, treatment decisions based on traditional histologic grading and Dukes' staging may not be appropriate to the biological aggressiveness of IBD-associated tumors, which may be greater than that of age-matched sporadic carcinomas of similar grade and stage. 13-16 Thus, a more reliable and comprehensive method of distinguishing these 2 types of dysplasia would be clinically welcomed, and a better way of discovering biomarkers of true cancer risk would be quite useful.

To our knowledge, this is the first report applying ANNs combined with cDNA microarrays to the analysis of human colonic neoplastic lesions. The data presented herein suggest that subtle categories of colorectal dysplastic lesions can be distinguished using ANNs based on global gene expression data. In this respect, ANNs may be complementary to histologic or monomolecular characterization. Previous studies of IBD-related and sporadic colorectal neoplasias have pointed out important molecular and histologic differences between these 2 types of lesions. 55-57 Nevertheless, none of these differences has proven consistently capable of discriminating between the 2 lesional subtypes.

Six known genes overexpressed in IBDNs relative to SACs have been studied previously. The product of aldehyde dehydrogenase 1 belongs to the aldehyde dehydrogenase family, other members of which have been implicated in colorectal carcinogenesis. 58 Acid phosphatase 1 is overexpressed in neutrophils of patients with gastric, colorectal, and other carcinomas, possibly reflecting local inflammatory processes accompanying malignancy. 59 Similarly, the Family A group 5 purinergic receptor gene belongs to a family of purinergic receptor genes overexpressed in patients with IBD. 60 Another member of the same family, adenosine A1 receptor, was reported as more abundant in colon cancers than in normal colon. 61 Furthermore, PAF acetylhydrolase is a phospholipid mediator with potent biological effects on the digestive tract that was overexpressed in IBDNs. This gene product was higher in stools of patients with active Crohn's disease than in subjects with irritable bowel syndrome or in patients with diarrhea accompanying malabsorption. PAF acetylhydrolase may also be involved in the onset and perpetuation of inflammatory symptoms in active Crohn's disease. 62 Finally, colon tumors showed approximately 2- to 7-fold increases in levels of fibulin 4 mRNA compared with normal colon, 63 and another member of the fibulin family, fibulin 1, was overexpressed in IBDNs compared with SACs in our study.

Eight of the 97 filtered clones were expressed at higher levels in SACs than in IBDNs. One clone encoded the liver fatty acid binding protein (L-FABP). L-FABP has been hypothesized to be involved in fatty acid absorption and transport. 64 It has been detected in epithelial absorptive cells of the duodenum, jejunum, ileum, and colon, but not in esophagus or stomach. In fact, the level of L-FABP in colon cancer specimens is reportedly similar to that in normal colon, but it is significantly decreased in ulcerative colitis tissues. 65

ANNs may represent an improvement over classical statistical methods, such as hierarchical clustering. Most published literature on cDNA microarrays has used clustering, which has certain shortcomings. For example, it is an unsupervised method. However, biological samples represent complex states, and clustering may not be able to discriminate relevant characteristics (e.g., IBD or sporadic) from irrelevant traits (e.g., age, ethnic, or sexual traits). Thus, in our study, Cluster could not group the 2 sets of neoplasias into distinct groups when it analyzed the full 8064-clone set. However, after filtering using GeneFinder and ANNs, the number of clones was narrowed to include the 1192 most relevant to distinguishing between IBDNs and SACs. Clustering was successful when performed on this filtered set. This improved accuracy may have resulted because "noise" (i.e., data irrelevant to the difference between IBDNs and SACs) was eliminated by filtering.

In conclusion, our study shows the utility of ANNs in understanding biology, biomarker discovery, and diagnosis. Ultimately, ANNs may lead to a comprehensive molecular taxonomy, which reliably and accurately discriminates among subtly different clinical subgroups. In addition to providing valuable biologic insights into disease, this taxonomy may also offer clinically valuable input into medical decision-making.


References

1. Medlicott SA, Jewell LD, Price L, Fedorak RN, Sherbaniuk RW, Urbanski SJ. Conservative management of small adenomata in ulcerative colitis. Am J Gastroenterol 1997;92:2094-2098.

2. Suzuki K, Muto T, Shinozaki M, Yokoyama T, Matsuda K, Masaki T. Differential diagnosis of dysplasia-associated lesion or mass and coincidental adenoma in ulcerative colitis. Dis Colon Rectum 1998;41:322-327.

3. Fogt F, Alsaigh N. Polypoid dysplasias in ulcerative colitis and sporadic adenomas: genetic approach to the differential diagnosis (review). Oncol Rep 1999;6:721-725.

4. Friedman S, Rubin PH, Bodian C, Goldstein E, Harpaz N, Present DH. Screening and surveillance colonoscopy in chronic Crohn's colitis. Gastroenterology 2001;120:820-826.

5. Engelsgjerd M, Farraye FA, Odze RD. Polypectomy may be adequate treatment for adenoma-like dysplastic lesions in chronic ulcerative colitis. Gastroenterology 1999;117:1288-1294 [discussion 1488-1491].

6. Mueller E, Vieth M, Stolte M, Mueller J. The differentiation of true adenomas from colitis-associated dysplasia in ulcerative colitis: a comparative immunohistochemical study. Hum Pathol 1999;30:898-905.

7. Odze RD. Adenomas and adenoma-like DALMs in chronic ulcerative colitis: a clinical, pathological, and molecular review. Am J Gastroenterol 1999;94:1746-1750.

8. Fogt F, Urbanski SJ, Sanders ME, Furth EE, Zimmerman RL, Deren JJ, Noffsinger AE, Vortmeyer AO, Hartmann CJ, Odze RL, Brown CA. Distinction between dysplasia-associated lesion or mass (DALM) and adenoma in patients with ulcerative colitis. Hum Pathol 2000;31:288-291.

9. Lashner BA, Provencher KS, Bozdech JM, Brzezinski A. Worsening risk for the development of dysplasia or cancer in patients with chronic ulcerative colitis. Am J Gastroenterol 1995;90:377-380.

10. Riddell RH. Implications of a diagnosis of dysplasia in ulcerative colitis. J Gastroenterol 1995;30:25-29.

11. Axon T. Management of dysplasia in ulcerative colitis: is prophylactic colectomy the preferred strategy? J Gastrointest Surg 1998;2:322-324.

12. Jain SK, Peppercorn MA. Inflammatory bowel disease and colon cancer: a review. Dig Dis 1997;15:243-252.

13. Lavery IC, Chiulli RA, Jagelman DG, Fazio VW, Weakley FL. Survival with carcinoma arising in mucosal ulcerative colitis. Ann Surg 1982;195:508-512.

14. Smith C, Butler JA. Colorectal cancer in patients younger than 40 years of age. Dis Colon Rectum 1989;32:843-846.

15. Papatheodoridis GV, Hamilton M, Mistry PK, Davidson B, Rolles K, Burroughs AK. Ulcerative colitis has an aggressive course after orthotopic liver transplantation for primary sclerosing cholangitis. Gut 1998;43:639-644.

16. Green S, Stock RG, Greenstein AJ. Rectal cancer and inflammatory bowel disease: natural history and implications for radiation therapy. Int J Radiat Oncol Biol Phys 1999;44:835-840.

17. Bos JL, Fearon ER, Hamilton SR, Verlaan-de Vries M, van Boom JH, van der Eb AJ, Vogelstein B. Prevalence of ras gene mutations in human colorectal cancers. Nature 1987;327:293-297.

18. Vogelstein B, Fearon ER, Hamilton SR, Kern SE, Preisinger AC, Leppert M, Nakamura Y, White R, Smits AM, Bos JL. Genetic alterations during colorectal-tumor development. N Engl J Med 1988;319:525-532.

19. Burmer GC, Levine DS, Kulander BG, Haggitt RC, Rubin CE, Rabinovitch PS. c-Ki-ras mutations in chronic ulcerative colitis and sporadic colon carcinoma. Gastroenterology 1990;99:416-420.

20. Meltzer SJ, Mane SM, Wood PK, Resau JH, Newkirk C, Terzakis JA, Korelitz BI, Weinstein WM, Needleman SW. Activation of c-Ki-ras in human gastrointestinal dysplasias determined by direct sequencing of polymerase chain reaction products. Cancer Res 1990;50:3627-3630.

21. Bell SM, Kelly SA, Hoyle JA, Lewis FA, Taylor GR, Thompson H, Dixon MF, Quirke P. c-Ki-ras gene mutations in dysplasia and carcinomas complicating ulcerative colitis. Br J Cancer 1991;64:174-178.

22. Burmer GC, Rabinovitch PS, Loeb LA. Frequency and spectrum of c-Ki-ras mutations in human sporadic colon carcinoma, carcinomas arising in ulcerative colitis, and pancreatic adenocarcinoma. Environ Health Perspect 1991;93:27-31.

23. Greenwald BD, Harpaz N, Yin J, Huang Y, Tong Y, Brown VL, McDaniel T, Newkirk C, Resau JH, Meltzer SJ. Loss of heterozygosity affecting the p53, Rb, and mcc/apc tumor suppressor gene loci in dysplastic and cancerous ulcerative colitis. Cancer Res 1992;52:741-745.

24. Yin J, Harpaz N, Tong Y, Huang Y, Laurin J, Greenwald BD, Hontanosas M, Newkirk C, Meltzer SJ. p53 point mutations in dysplastic and cancerous ulcerative colitis lesions. Gastroenterology 1993;104:1633-1639.

25. Brentnall TA, Crispin DA, Rabinovitch PS, Haggitt RC, Rubin CE, Stevens AC, Burmer GC. Mutations in the p53 gene: an early marker of neoplastic progression in ulcerative colitis. Gastroenterology 1994;107:369-378.

26. Harpaz N, Peck AL, Yin J, Fiel I, Hontanosas M, Tong TR, Laurin JN, Abraham JM, Greenwald BD, Meltzer SJ. p53 protein expression in ulcerative colitis-associated colorectal dysplasia and carcinoma. Hum Pathol 1994;25:1069-1074.

27. Kern SE, Redston M, Seymour AB, Caldas C, Powell SM, Kornacki S, Kinzler KW. Molecular genetic profiles of colitis-associated neoplasms. Gastroenterology 1994;107:420-428.

28. Chu S, DeRisi J, Eisen M, Mulholland J, Botstein D, Brown PO, Herskowitz I. The transcriptional program of sporulation in budding yeast [published erratum appears in Science 1998;282:1421]. Science 1998;282:699-705.

29. Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, Futcher B. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Biol Cell 1998;9:3273-3297.

30. Eisen MB, Brown PO. DNA arrays for analysis of gene expression. Methods Enzymol 1999;303:179-205.

31. Iyer VR, Eisen MB, Ross DT, Schuler G, Moore T, Lee JCF, Trent JM, Staudt LM, Hudson J Jr, Boguski MS, Lashkari D, Shalon D, Botstein D, Brown PO. The transcriptional program in the response of human fibroblasts to serum [see comments]. Science 1999;283:83-87.

32. Perou CM, Jeffrey SS, van de Rijn M, Rees CA, Eisen MB, Ross DT, Pergamenschikov A, Williams CF, Zhu SX, Lee JC, Lashkari D, Shalon D, Brown PO, Botstein D. Distinctive gene expression patterns in human mammary epithelial cells and breast cancers. Proc Natl Acad Sci U S A 1999;96:9212-9217.

33. Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, Boldrick JC, Sabet H, Tran T, Yu X, Powell JI, Yang L, Marti GE, Moore T, Hudson J Jr, Lu L, Lewis DB, Tibshirani R, Sherlock G, Chan WC, Greiner TC, Weisenburger DD, Armitage JO, Warnke R, Staudt LM, Levy R, Wilson W, Grever MR, Byrd JC, Botstein D, Brown PO. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling [see comments]. Nature 2000;403:503-511.

34. Perou CM, Sorlie T, Eisen MB, van de Rijn M, Jeffrey SS, Rees CA, Pollack JR, Ross DT, Johnsen H, Akslen LA, Fluge O, Pergamenschikov A, Williams C, Zhu SX, Lonning PE, Borresen-Dale AL, Brown PO, Botstein D. Molecular portraits of human breast tumours. Nature 2000;406:747-752.

35. Bassett DE Jr, Eisen MB, Boguski MS. Gene expression informatics--it's all in your mind. Nat Genet 1999;21:51-55.

36. Khan J, Wei JS, Ringner M, Saal LH, Ladanyi M, Westermann F, Berthold F, Schwab M, Antonescu CR, Peterson C, Meltzer PS. Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat Med 2001;7:673-679.

37. Biagiotti R, Desii C, Vanzi E, Gacci G. Predicting ovarian malignancy: application of artificial neural networks to transvaginal and color Doppler flow US. Radiology 1999;210:399-403.

38. Marchevsky AM, Patel S, Wiley KJ, Stephenson MA, Gondo M, Brown RW, Yi ES, Benedict WF, Anton RC, Cagle PT. Artificial neural networks and logistic regression as tools for prediction of survival in patients with Stages I and II non-small cell lung cancer. Mod Pathol 1998;11:618-625.

39. Wells DM, Niederer JA. Medical expert system approach using artificial neural networks for standardized treatment planning. Int J Radiat Oncol Biol Phys 1998;41:173-182.

40. Tedesco FJ. Differential diagnosis of ulcerative colitis and Crohn's ileocolitis and other specific inflammatory disease of the bowel. Med Clin North Am 1980;64:1173-1183.

41. Van Gelder RN, von Zastrow ME, Yool A, Dement WC, Barchas JD, Eberwine JH. Amplified RNA synthesized from limited quantities of heterogeneous cDNA. Proc Natl Acad Sci U S A 1990;87:1663-1667.

42. Luo L, Salunga RC, Guo H, Bittner A, Joy KC, Galindo JE, Xiao H, Rogers KE, Wan JS, Jackson MR, Erlander MG. Gene expression profiles of laser-captured adjacent neuronal subtypes. Nat Med 1999;5:117-122.

43. Chomczynski P, Sacchi N. Single step extraction of RNA isolation. Anal Biochem 1987;162:156-159.

44. Alizadeh A, Eisen M, Botstein D, Brown PO, Staudt LM. Probing lymphocyte biology by genomic-scale gene expression analysis. J Clin Immunol 1998;18:373-379.

45. Tarmin L, Yin J, Harpaz N, Kozam M, Noordzij J, Antonio LB, Jiang HY, Chan O, Cymes K, Meltzer SJ. Adenomatous polyposis coli gene mutations in ulcerative colitis-associated dysplasias and cancers versus sporadic colon neoplasms. Cancer Res 1995;55:2035-2038.

46. Fogt F, Vortmeyer AO, Stolte M, Mueller E, Mueller J, Noffsinger A, Poremba C, Zhuang Z. Loss of heterozygosity of the von Hippel Lindau gene locus in polypoid dysplasia but not flat dysplasia in ulcerative colitis or sporadic adenomas. Hum Pathol 1998;29:961-964.

47. Fogt F, Vortmeyer AO, Goldman H, Giordano TJ, Merino MJ, Zhuang Z. Comparison of genetic alterations in colonic adenoma and ulcerative colitis-associated dysplasia and carcinoma. Hum Pathol 1998;29:131-136.

48. Walsh SV, Loda M, Torres CM, Antonioli D, Odze RD. P53 and beta catenin expression in chronic ulcerative colitis-associated polypoid dysplasia and sporadic adenomas: an immunohistochemical study. Am J Surg Pathol 1999;23:963-969.

49. Odze RD, Brown CA, Hartmann CJ, Noffsinger AE, Fogt F. Genetic alterations in chronic ulcerative colitis-associated adenoma-like DALMs are similar to non-colitic sporadic adenomas. Am J Surg Pathol 2000;24:1209-1216.

50. Nugent FW, Haggitt RC, Gilpin PA. Cancer surveillance in ulcerative colitis [see comments]. Gastroenterology 1991;100:1241-1248.

51. Itzkowitz SH. Inflammatory bowel disease and cancer. Gastroenterol Clin North Am 1997;26:129-139.

52. Shapiro BD, Lashner BA. Cancer biology in ulcerative colitis and potential use in endoscopic surveillance. Gastrointest Endosc Clin North Am 1997;7:453-468.

53. Snapper SB, Syngal S, Friedman LS. Ulcerative colitis and colon cancer: more controversy than clarity. Dig Dis 1998;16:81-87.

54. Solomon MJ, Schnitzler M. Cancer and inflammatory bowel disease: bias, epidemiology, surveillance, and treatment. World J Surg 1998;22:352-358.

55. Tytgat GN, Dhir V, Gopinath N. Endoscopic appearance of dysplasia and cancer in inflammatory bowel disease. Eur J Cancer 1995;31A:1174-1177.

56. Pohl C, Hombach A, Kruis W. Chronic inflammatory bowel disease and cancer. Hepatogastroenterology 2000;47:57-70.

57. Pedersen G, Dalseg AM, Brynskov J, Bendtzen K. The role of cell adhesion molecules in inflammatory bowel disease and colorectal cancer. Ugeskr Laeger 1997;159:5077-5084.

58. Lindahl R. Aldehyde dehydrogenases and their role in carcinogenesis. Crit Rev Biochem Mol Biol 1992;27:283-335.

59. Lisiewicz J, Moszczynski P. Acid phosphatase of neutrophils in patients with malignancies. Folia Haematol Int Mag Klin Morphol Blutforsch 1984;111:595-598.

60. Somers GR, Hammet FM, Trute L, Southey MC, Venter DJ. Expression of the P2Y6 purinergic receptor in human T cells infiltrating inflammatory bowel disease. Lab Invest 1998;78:1375-1383.

61. Khoo HE, Ho CL, Chhatwal VJ, Chan ST, Ngoi SS, Moochhala SM. Differential expression of adenosine A1 receptors in colorectal cancer and related mucosa. Cancer Lett 1996;106:17-21.

62. Denizot Y, Chaussade S, Nathan N, Colombel JF, Bossant MJ, Cherouki N, Benveniste J, Couturier D. PAF-acether and acetylhydrolase in stool of patients with Crohn's disease. Dig Dis Sci 1992;37:432-437.

63. Gallagher WM, Greene LM, Ryan MP, Sierra V, Berger A, Laurent-Puig P, Conseiller E. Human fibulin-4: analysis of its biosynthetic processing and mRNA expression in normal and tumour tissues. FEBS Lett 2001;489:59-66.

64. Woodford JK, Behnke WD, Schroeder F. Liver fatty acid binding protein enhances sterol transfer by membrane interaction. Mol Cell Biochem 1995;152:51-62.

65. Sakai Y. Quantitative measurement of liver fatty acid binding protein in human gastrointestinal tract. Nippon Shokakibyo Gakkai Zasshi 1990;87:2594-2604.

Received November 15, 2001. Accepted December 13, 2001.

Address requests for reprints to: Stephen J. Meltzer, M.D., 8-009 BRB, 655 West Baltimore Street, Baltimore, Maryland 21201. e-mail: [email protected]; fax: (410) 706-1325.

Supported by NIH grants DK 47717, CA 95323, CA 85069, and CA 77057.

Drs. Selaru, Xu, Yin, and Zou contributed equally to this work.