Optimisation of cDNA Microarray Tumour Profiling and Molecular Analysis of Epithelial Ovarian Cancer


Ryan van Laar

Submitted in total fulfilment of the requirements of the degree of Doctor of Philosophy

June 2005

The Peter MacCallum Cancer Centre and The Department of Biochemistry and Molecular Biology

The University of Melbourne

Abstract

The advent of microarray technology has allowed diseases such as epithelial ovarian cancer (EOC) to be studied at an unprecedented level of molecular resolution.

EOC is the fifth leading cause of cancer death among women worldwide. The prognosis of women diagnosed with this disease is often extremely poor, partly because of the difficulty of detecting it in its early and most treatable stages. It is hypothesised that gene expression profiling can shed light on the molecular events responsible for EOC development and progression. This information could one day be used to develop novel screening methods and therapeutic approaches based on individual tumour profiling.

This thesis first describes the optimisation of several aspects of the microarray work flow and demonstrates their impact on the sensitivity and robustness of cDNA microarray data. An evaluation of reference RNA options was conducted, in which gene expression data generated using a reference pool of RNA from either a diverse range of cell lines or a cohort of EOC specimens were compared. Based on the diverse criteria applied, the cell line RNA was found to be the most suitable choice for a large-scale tumour profiling study. A number of factors with the potential to affect the spatial distribution of gene expression are also described, and a novel method for quantifying this type of systematic bias is proposed.

The findings from these comparisons are then used to create and analyse two clinically annotated datasets of EOC specimens. These data are interrogated to identify gene expression patterns related to overall length of patient survival and to the phenotypic differences between the invasive and low malignant potential EOC subtypes. These analyses generated several validated sets of differentially regulated genes, many of which were clinically relevant or previously implicated in other cancer types. The molecular signatures identified were technically and biologically validated before bioinformatic analyses were used to identify the key biological processes and functional relationships they represent.

The gene expression signatures deduced for patient survival and for serous low malignant potential vs. invasive cancer were compared with those from studies of similar and disparate cancer types, and the universality of the molecular events regulated by these genes in mediating survival and/or the malignant potential of EOC was evaluated. A significant relationship involving the altered expression of interacting calcium-dependent cell adhesion molecules was found to be important for both aspects of this disease.

Declaration

This is to certify that

(i) the thesis comprises only my original work towards the PhD except where indicated in the Preface*,

(ii) due acknowledgement has been made in the text to all other material used,

(iii) the thesis is less than 100,000 words in length, exclusive of tables, maps, bibliographies and appendices.

Ryan van Laar

Preface

The work presented in this thesis is the result of a number of collaborations. Samples of ovarian cancer were kindly provided by Dr Georgia Chenevix-Trench of the Royal Brisbane Hospital and Dr Anna DeFazio of the Westmead Millennium Institute, Sydney. The preparation and hybridisation of tumour material to the cDNA microarrays used in this study were carried out by Sophie Katsabanis and Dileepa Diyagama of the Peter MacCallum Cancer Centre Microarray Facility, Melbourne.

Clinical and gene expression data representing nine primary tumour types, used to generate a signature of primary EOC, were provided by Dr Richard Tothill. An additional dataset comprising gastric cancer gene expression profiles and associated follow-up information was provided by Dr Alex Boussioutas.

Tissue microarray construction and immunohistochemistry were carried out in collaboration with Dr Melissa Robbie of St. Vincent’s Hospital, Melbourne, and with Mr Neil O’Callaghan and Dr Melanie Trivett of the Peter MacCallum Cancer Centre Pathology Department.

Acknowledgements

I am indebted to a large number of people for their invaluable assistance in the completion of this thesis.

Firstly, I would like to thank my primary supervisor, Professor David Bowtell, for allowing me to complete this body of work under his guidance, in his excellent research group and utilising the unparalleled framework of the Australian Ovarian Cancer Study.

I would also like to thank Andrew Holloway for his involvement in the supervision of this project and for the mentoring, scientific or otherwise, I have received over the past six rather eventful years. He is responsible for much of what I have learnt about cancer and also for a great deal of the enjoyment, satisfaction and growth I have experienced working at Peter Mac.

Other senior members of the Australian Ovarian Cancer Study (AOCS), including Georgia Chenevix-Trench of the Royal Brisbane Hospital and Anna DeFazio of the Westmead Millennium Institute, Sydney, have welcomed me into the AOCS group and provided valuable assistance and materials throughout.

Fundamental to my understanding of ovarian pathology has been Dr Melissa Robbie, who worked extremely hard to review a large number of cases and provided much appreciated assistance with the biological validation stages of this project.

I would also like to thank Nadia Traficante, Sian Fereday and Anna Tinker, who make a great team and have been truly amazing people to interact with on a daily basis. Between them they are responsible for much of the current, and no doubt future, success of the AOCS.

Members of the Bowtell Lab, Microarray Core Facility and wider Peter Mac Research Division have also played a significant role in the completion of this project. These include Izi Haviv and Alex Boussioutas, who between them have enough enthusiasm and ideas for 100 people; Sophie Katsabanis, Dileepa Diyagama and Bianca Locandro, who have been instrumental in creating a world-class microarray facility; and the many others who make Peter Mac an outstanding place to work and study.

The ever-fashionable Linda Stevens deserves a special mention for her support and friendship, which have also been among the most enjoyable and reliable aspects of working at Peter Mac.

Successful collaborations that resulted in high-quality publications have arisen from my interactions with Richard Tothill, Bedrich Eckhardt and Melissa Peart, whom I thank for including me in their projects. An additional thank-you also to Melissa for her help at the bench and over the occasional, but much deserved, Treasury Café latte.

Thank-you to my parents for giving me a good head-start in life and, finally, to my closest friends Dean Chesterman, Dane McManus and Chris Sherman, for their laughter and unfailing support, without which I could not have done this.

Publications and presentations

The following publications arose out of collaborative work during this project:

Eckhardt, B. L., Parker, B. S., van Laar, R. K., Restall, C. M., Natoli, A. L., Tavaria, M. D., Stanley, K. L., Sloan, E. K., Moseley, J. M., and Anderson, R. L. (2005). Genomic analysis of a spontaneous model of breast cancer metastasis to bone reveals a role for the extracellular matrix. Mol Cancer Res 3, 1-13.

Holloway, A. J., van Laar, R. K., Tothill, R. W., and Bowtell, D. D. (2002). Options available--from start to finish--for obtaining data from DNA microarrays II. Nat Genet 32 Suppl, 481-489.

Peart, M. J., Smyth, G. K., van Laar, R. K., Bowtell, D. D., Richon, V. M., Marks, P. A., Holloway, A. J., and Johnstone, R. W. (2005). Identification and functional significance of genes regulated by structurally different histone deacetylase inhibitors. Proc Natl Acad Sci U S A 102, 3697-3702.

Tothill, R. W., Kowalczyk, A., Rischin, D., Bousioutas, A., Haviv, I., van Laar, R. K., Waring, P. M., Zalcberg, J., Ward, R., Biankin, A. V., et al. (2005). An expression-based site of origin diagnostic method designed for clinical application to cancer of unknown origin. Cancer Res 65, 4031-4040.

The following invited presentations were given based on this thesis:

February 2003: Bioinformatics tools for expression-based tumour classification. Bioinformatics Workshop, St. Vincent’s Hospital, Melbourne.

October 2004: Microarray Profiling of Low Malignant Potential & Invasive Ovarian Cancer. Familial Cancer 2004: Research and Practice - A combined meeting of kConFab & Australian Ovarian Cancer Study (AOCS) & Family Cancer Clinics of Australia and New Zealand, Couran Cove.

February 2005: Understanding invasive ovarian cancer by microarray analysis and comparison with the low malignant potential phenotype. AACR Oncogenomics 2005, San Diego, USA. AstraZeneca Travelling Scholar award recipient.

Table of Contents

1. Literature Review
1.1. Overview
1.2. Microarray technology and its impact on ovarian cancer research
1.2.1. Selection of an appropriate reference RNA for cDNA microarray analysis
1.2.2. The impact of microarray scanning hardware on gene expression data
1.2.3. Spatial bias in cDNA microarray data
1.3. Ovarian cancer
1.3.1. Clinical background
1.3.2. Histology and associated genetic aberrations
1.3.3. Current needs in ovarian cancer diagnosis and treatment
1.3.4. Molecular pathology of EOC and its relevance to patient prognosis
1.3.5. The ovarian tumour marker CA-125 and EOC prognosis
1.3.6. The use of DNA microarrays to discover novel biomarkers of EOC
1.3.7. Current status of microarray-based EOC prognostic signatures
1.4. Low malignant potential ovarian cancer
1.4.1. Molecular background and clinical information
1.4.2. Molecular characteristics of LMP tumours
1.4.3. Mucinous EOC and tumours metastatic to the ovary
1.4.4. Existing microarray profiling studies of LMP ovarian cancer
1.4.5. Other microarray studies of invasive vs. non-invasive cancer subtypes

1.5. Summary and goals of this thesis
2. Materials & Methods
2.1. Ethical Issues
2.1.1. Structure of ethical governance
2.1.2. Ethical use of human tissues
2.1.3. Patient Identifiers used in this thesis
2.1.4. Protection of privacy
2.1.5. Ethical contingencies
2.2. Pathology review and associated tumour classifications
2.2.1. Assessment of relative percentage tumour content
2.2.2. Residual disease
2.2.3. Tumour grade
2.2.4. Tumour stage
2.2.5. Patient status
2.3. In-vitro methods
2.3.1. Construction of cDNA microarrays
2.3.2. Collection and processing of tumour samples
2.3.3. Construction of reference RNA pools
2.3.4. Target labelling
2.3.5. Slide hybridisation
2.3.6. RT-PCR
2.3.7. Tissue microarray construction
2.3.8. Immunohistochemistry
2.4. In-silico methods
2.4.1. Image capture and data extraction
2.4.2. Microarray image analysis
2.4.3. Normalisation of cDNA microarray data
2.4.4. Microarray data visualisation methods
2.4.5. Unsupervised identification of differential gene expression
2.4.6. Identification of genes differentially expressed between tumour subtypes
2.4.7. Machine-learning approaches for class prediction
2.4.8. Class Prediction
2.4.9. Gene ontology analysis
2.4.10. Quantification of IHC staining

3. Optimisation of cDNA microarray profiling for large-scale tumour profiling studies
3.1. Introduction
3.1.1. A method for quantification of spatial bias in cDNA microarray data
3.1.2. Reference RNA options for large-scale cDNA microarray profiling studies
3.1.3. The impact of experimental replication on the robustness of cDNA microarray gene expression measurements
3.1.4. The impact of scanning hardware on cDNA microarray data quality
3.2. Results
3.2.1. Development of a method for measuring the degree of spatial bias present on a cDNA microarray
3.2.2. Evaluation of reference RNA options for a large-scale tumour profiling study
3.2.3. Analysis of cDNA microarray slide scanning technology on data quality
3.3. Discussion
3.3.1. The use of Mood's Median Test to quantify spatial bias in cDNA microarray data
3.3.2. Evaluation of reference RNA options suitable for large-scale tumour profiling studies
3.3.3. Microarray scanners and cDNA gene expression data quality
3.4. General conclusions
4. Gene expression analysis of epithelial ovarian cancer overall survival

4.1. Introduction
4.2. Results
4.2.1. Case selection and pathology review aimed at ensuring suitability for arraying and outcome analysis
4.2.2. A descriptive statistical analysis of the study cohort
4.2.3. Processing of microarray data prior to investigation of molecular signatures of patient survival
4.2.4. Identification of genes differentially expressed between patient survival groups
4.2.5. Experimentation with normalisation algorithms to improve detection of survival-related gene expression
4.2.6. RT-PCR validation: selection of genes with minimum 2-fold change in expression between patient survival groups
4.2.7. Analysis of published gene lists for predicting EOC prognosis
4.2.8. Network and pathway analysis of genes differentially expressed between survival groups
4.3. Discussion
4.3.1. The impact of residual disease and distribution of survival times on the identification of genes related to length of survival
4.3.2. EOC heterogeneity and its impact on the success of genomic analyses
4.3.3. Attempts to identify gene expression patterns with statistically significant relationships to length of survival
4.3.4. Biological and clinical relevance of genes identified
4.3.5. General conclusions
5. Molecular analysis of invasive and low malignant potential ovarian tumours
5.1. Introduction
5.2. Results
5.2.1. Case selection and pathology review of suitable cases
5.2.2. Generation of cDNA microarray expression dataset
5.2.3. Creation of an EOC gene expression signature for assistance in confirmation of primary ovarian origin
5.2.4. Application of the trained predictive algorithms to the invasive/LMP dataset
5.2.5. Gene expression based prediction of EOC histological subtype
5.2.6. Confirmation of LMP/invasive status with gene expression prediction analysis
5.2.7. Identification of differentially expressed genes between serous LMP and serous invasive EOC
5.2.8. Molecular pathway analysis of the invasive and LMP EOC gene expression signature
5.2.9. Validation of selected differentially expressed genes with RT-PCR
5.2.10. Biological validation of the LMP/invasive expression signature
5.3. Discussion
5.3.1. Findings from this analysis and relevance to published studies of LMP or invasive EOC
5.3.2. Analysis of differentially expressed genes identified by multiple studies
5.3.3. Use of gene expression based predictive analysis to confirm specimen diagnosis and identify metastatic disease
5.3.4. Cell adhesion molecules and EOC malignancy
5.3.5. High throughput analysis of TMA IHC
5.4. Summary and conclusions from chapter
6. Discussion & Conclusion

6.1. Summary of major findings
6.1.1. Optimisation of microarray technology for large-scale tumour profiling studies
6.1.2. Gene expression based prediction of patient survival
6.1.3. Molecular characterisation of ovarian LMP and invasive epithelial cancer
6.1.4. EOC and the differential expression of genes involved in cell adhesion processes: a recurring theme
6.2. Future directions
6.2.1. Meta-analysis of gene expression datasets
6.2.2. Extension of cDNA expression dataset with Affymetrix GeneChip profiling
6.2.3. Translation of findings to in vivo studies of gene function and the potential for clinical application
6.2.4. Conclusion
7. Bibliography

Appendix A: FIGO staging of EOC
Appendix B: Specimens of EOC included in TMA
Appendix C: Details of pooled tumour and cell line reference RNAs
Appendix D: MMT scores from the analysis of spatial bias on LOOCV prediction accuracy
Appendix E: Members of the ‘response to stimulus’ gene ontology
Appendix F: Reference RNA comparison: Predictions of histological subtype
Appendix G: Genes with minimum two-fold mean expression differences between survival groups
Appendix H: Higher-level gene ontologies represented by genes differentially expressed between survival groups
Appendix I: Samples used to generate predictive gene expression signature of primary EOC
Appendix J: Output of prediction of primary ovarian origin for LMP and invasive EOC cohort
Appendix K: Predictive gene expression signature of primary EOC
Appendix L: KEGG pathways significantly represented in gene expression signature of serous LMP and invasive EOC
Appendix M: Microsoft Access gene ontology filter
Appendix N: Visual Basic script for batch export of IHC image histogram statistics
Appendix O: UniGene annotated genes included in thesis
Appendix P: Genes differentially expressed between serous LMP and invasive EOC after excluding those involved in cell-cycle regulation and the immune response
Appendix Q: Microarray images from Gilks et al. study of LMP and invasive EOC


1. Literature Review

1.1. Overview

This review will focus on how microarray technology has evolved and been applied to address some of the needs of ovarian cancer research. It will also cover some of the advances in the microarray work flow that have increased the robustness and accuracy of cDNA microarray-generated gene expression data to the point where research findings are beginning to be applied in the clinic to influence disease diagnosis and treatment. The current understanding of the molecular basis of primary epithelial ovarian cancer (EOC) development and progression will be discussed, including examples of how microarray analysis has been applied to determine molecular signatures associated with a range of clinically relevant aspects of the disease.

As with any new technology in its early stages of development, microarrays have suffered from a range of teething problems. These include incorrect clone annotations, technical biases introduced by standard laboratory practices, and pitfalls associated with applying traditional experimental designs and methods of statistical analysis to data of a structure and magnitude previously unfamiliar to many biomedical researchers. Methods for addressing many of these issues are reviewed, and the outstanding needs that form the basis of this project are highlighted.

1.2. Microarray technology and its impact on ovarian cancer research

As knowledge about the molecular mechanisms underlying human cancers has accumulated, the full extent of their complexity, cellular origins, interactions with non-cancerous tissue and other previously unconsidered aspects has become apparent (Hanahan and Weinberg, 2000; Liotta and Petricoin, 2000). At the same time, advances in laboratory robotics and desktop computing have enabled a rapid increase in the rate at which information about the individual components of the human genome can be generated, stored and exploited. As a result, discoveries based on molecular information generated by high-throughput technologies such as microarrays are today occurring faster than ever before (Ochs and Godwin, 2003).

Whilst not specifically designed for cancer research, microarray technology has been one of the most significant advances in the field in recent years. Its application to cancer research has arisen from the recognition that cancer is primarily a disease of the genes (Hanahan and Weinberg, 2000; Holloway et al., 2002). The field of microarrays has undergone a rapid evolution, from nylon membrane arrays with fewer than 100 unique genes (Chen et al., 1998) to the latest commercial “whole-genome” single-chip oligonucleotide arrays, which contain sequence-verified probes for every known and purported gene in the human genome (Woo et al., 2004).

Two main types of gene expression microarray are presently used for cancer research: spotted glass slide arrays and in situ synthesised oligonucleotide arrays. Both types are based on the concept of DNA fragments (‘probes’) of known identity positioned at high density on a solid support. Glass slide cDNA microarrays can be produced with equipment that is within the economic reach of many academic or smaller-scale research facilities, whereas the specialised machinery required for in situ synthesised oligonucleotide arrays limits their production to commercial settings (Singh-Gasson et al., 1999). Commercial cDNA and oligonucleotide microarrays can be purchased from companies such as Agilent Technologies and Affymetrix, respectively, and are a common alternative to in-house manufacturing (Bowtell, 1999; Holloway et al., 2002). A diagram summarising the key differences in the chemical processes used by each array type to measure gene expression is shown in Figure 1-1.

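Although the array-to-array variability of spotted arrays, particularly those created at smaller microarray facilities, has the potential to be quite large, the relative nature of the gene expression measurements they produce effectively controls for this source of variation (Woo et al., 2004). Because the test and reference samples are hybridised to the same spot, differences in the amount of DNA deposited affect both channels alike and largely cancel out in the test/reference ratio.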
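The cancellation argument can be illustrated with a toy model. This is a minimal sketch, assuming (as a simplification) that the signal in each channel is simply the amount of probe DNA printed in a spot multiplied by the abundance of the corresponding labelled target; the function names and numbers are hypothetical, and real spots also carry background, dye-bias and saturation effects that this model ignores.

```python
import math


def log2_ratio(test_intensity: float, ref_intensity: float) -> float:
    """Log2 expression ratio for one spot: test channel divided by reference channel."""
    return math.log2(test_intensity / ref_intensity)


def simulate_spot(test_abundance: float, ref_abundance: float, probe_amount: float) -> tuple:
    """Hypothetical linear model: each channel's signal is the amount of probe
    DNA printed in the spot multiplied by the abundance of the labelled target."""
    return probe_amount * test_abundance, probe_amount * ref_abundance


# The same gene (four-fold more abundant in the test sample) printed on three
# arrays with very different amounts of DNA deposited in the spot.
for amount in (0.5, 1.0, 3.0):
    test, ref = simulate_spot(400.0, 100.0, amount)
    print(f"probe amount {amount}: test={test:.0f}, ref={ref:.0f}, "
          f"log2 ratio={log2_ratio(test, ref):.2f}")
```

Under this model the raw intensities differ six-fold between the first and last array, yet the log2 ratio is 2 (a four-fold difference) in every case; a single-channel reading from the same spots would inherit the full print variation.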

A key difference with one-colour platforms, such as the Affymetrix GeneChip, is that a single biological sample is hybridised to the in-situ synthesised oligonucleotides on each array. The precision of the measurement therefore relies on minimising array-to-array variability through the highly controlled environment in which the arrays are produced.

Success using either spotted cDNA or in-situ synthesised oligonucleotide microarrays depends on tightly controlled array production and hybridisation methods because of the intrinsic qualities of each platform type and the minute physical amounts of genetic material actually being quantified (Lockhart et al., 1996).


Figure 1-1: Hybridisation properties and differences of cDNA and oligonucleotide microarrays. (A) For cDNA microarrays, three elements are involved in the generation of gene expression measurements. Firstly, the arrays are prepared by depositing thousands of individual nanolitre amounts of concentrated PCR product, produced from cDNAs, onto a glass substrate in a predefined grid pattern. Next, fluorescently labelled cDNAs obtained from two RNA sources (usually a ‘test’ sample, which is compared with a ‘reference’ sample) are competitively hybridised to the prepared substrate. Finally, the relative fluorescence intensity measured for each label, per spotted feature, is used to determine a gene expression ratio, in the form of test intensity divided by reference intensity. (B) In-situ synthesised oligonucleotide microarrays, such as the Affymetrix GeneChip, rely on direct hybridisation of labelled transcript from a sample of interest to up to 20 micro squares of 25-mer oligonucleotides for each gene present on the array. These squares comprise perfect match and mismatch pairs for each probe or feature. The intensity from the mismatch probes is subtracted from that of the perfect matches and an average is determined. (Gibson, 2002)


The amount of variation within a technology and the amount of agreement between the

presently available platforms are crucial issues that are still being addressed by the

microarray field (Baker et al., 2005). A number of studies have been carried out in which data from different platforms have been analysed to determine the robustness of gene expression measurements made with these technologies, but there is no clear consensus. Some studies have shown significant divergence across platforms (Kuo et al., 2002; Rogojina et al., 2003; Tan et al., 2003), while others state that the level of concordance is acceptable (Ishii et al., 2000; Yuen et al., 2002). To date, no publications have appeared in which data from spotted cDNA and in-situ synthesised oligonucleotide microarrays have been analysed in parallel, possibly reflecting the divergence noted by some studies.

As summarised in Figure 1-2, generating data from a cDNA microarray involves several major steps, all of which can be carried out with assistance from the Microarray Core Facility at the Peter MacCallum Cancer Centre (Melbourne, Australia). These steps are:

(i) Preparation of ready-to-print cDNA probes and their precise deposition onto glass slides.

(ii) Extraction of RNA from test and reference biological specimens (e.g. tissue,

cell lines), reverse transcription, Cy3/Cy5 dye labelling and hybridisation of

the target to the printed slide.

(iii) Scanning of the slide with a high-resolution imaging device at two laser wavelengths.

(iv) Analysis of scanned image and quantification of the bound target as

numerical gene expression ratios.
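Step (iv) reduces each spotted feature to a single number. A minimal sketch of that calculation is given below, assuming local background subtraction and a log2 transformation of the test (Cy5) over reference (Cy3) ratio, as is common for two-colour data. The function name and the `min_signal` detection threshold are illustrative assumptions, not the settings of any particular image analysis package. The sketch also shows why a feature that is not bound by the reference target yields no ratio at all.

```python
import math
from typing import Optional

def log2_ratio(test_fg: float, test_bg: float,
               ref_fg: float, ref_bg: float,
               min_signal: float = 1.0) -> Optional[float]:
    """log2(test / reference) for one feature after local background subtraction.

    Returns None when either background-subtracted signal is below
    `min_signal` (an illustrative threshold), because no meaningful ratio
    can be formed, e.g. when the reference target did not bind the feature.
    """
    test = test_fg - test_bg   # Cy5 (test) signal
    ref = ref_fg - ref_bg      # Cy3 (reference) signal, the denominator
    if test < min_signal or ref < min_signal:
        return None
    return math.log2(test / ref)

print(log2_ratio(4200, 200, 1200, 200))  # 2.0, i.e. a 4-fold higher test signal
print(log2_ratio(4200, 200, 150, 200))   # None: reference below background
```

Working on the log2 scale makes up- and down-regulation symmetric: a 4-fold increase gives +2 and a 4-fold decrease gives -2.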

Recently, a standard for recording the specific steps in a microarray experiment has been proposed by a committee of microarray users and organisations, and has since been adopted by a large proportion of journals and publishers. The Minimum Information About a Microarray Experiment (MIAME) standard describes the minimum set of information that scientists are required to provide about gene expression data to ensure that it can be easily interpreted and that results derived from its analysis can be independently verified (Brazma et al., 2001). The introduction of this standard has been successful in addressing some of the early problems with studies based on microarray data, whereby results could not be replicated because the information required to create the microarray slides or to process the biological specimens was not provided with the processed findings.

In addition to the laboratory-based processes of microarray fabrication and usage, data management is an important part of microarray research. Depending on the image analysis software used, approximately 15 different measurements can be generated for each feature on a cDNA array, describing foreground and background pixel intensity, spot size and shape, foreground and background variation and a range of quality measures. This can result in almost 160,000 individual data points per 10.5k cDNA microarray hybridisation (15 × 10,500 = 157,500), which presents data storage and manipulation challenges that must be met through the use of complex relational databases such as BASE (Saal et al., 2002).
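To make this scale concrete, the sketch below stores every per-feature measurement of one hybridisation as a row in a relational table, using Python's built-in `sqlite3` module. The two-table schema (`hybridisation`, `measurement`) and the quantity names are hypothetical and far simpler than BASE's actual data model. They only illustrate how 15 quantities for each of 10,500 features become 157,500 rows for a single hybridisation.

```python
import sqlite3

# Hypothetical two-table schema, far simpler than BASE's real data model
SCHEMA = """
CREATE TABLE hybridisation (
    hyb_id    INTEGER PRIMARY KEY,
    slide     TEXT NOT NULL,
    sample    TEXT NOT NULL,
    reference TEXT NOT NULL
);
CREATE TABLE measurement (
    hyb_id     INTEGER NOT NULL REFERENCES hybridisation (hyb_id),
    feature_id INTEGER NOT NULL,  -- spot index on the printed array
    quantity   TEXT    NOT NULL,  -- e.g. foreground median, spot area
    value      REAL,
    PRIMARY KEY (hyb_id, feature_id, quantity)
);
"""

def load_placeholder_hybridisation(n_features=10_500, n_quantities=15):
    """Create an in-memory database holding one hybridisation of dummy values."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO hybridisation VALUES (1, 'slide_001', 'tumour_A', 'cell_line_pool')")
    rows = ((1, f, f"q{q:02d}", 0.0) for f in range(n_features) for q in range(n_quantities))
    conn.executemany("INSERT INTO measurement VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn

conn = load_placeholder_hybridisation()
print(conn.execute("SELECT COUNT(*) FROM measurement").fetchone()[0])  # 157500
```

A long (one row per quantity) layout keeps the schema fixed when different image analysis packages report different sets of measurements, at the cost of many more rows.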


Figure 1-2: Schematic diagram of the cDNA microarray workflow. The process can be viewed in three stages: (i) Probe preparation: thousands of cDNAs of known identity are prepared in large quantities and robotically printed in a grid structure onto a glass substrate. (ii) Target preparation: RNA from the tissue or cell line of interest is extracted, purified and labelled with either Cy3 (green) or Cy5 (red) dye before being competitively hybridised to the printed microarray. (iii) Data analysis: lasers of specific wavelengths are used to excite the fluorescently labelled targets bound to the microarray surface, and a high-resolution TIFF image is created for each dye. Image analysis software is used to convert the images to numerical data, which are then analysed in the form of gene expression ratios.


1.2.1. Selection of an appropriate reference RNA for cDNA microarray analysis

The competitive hybridisation design of cDNA microarrays allows the researcher the

flexibility of choosing the reference RNA that best suits the experimental design. The

intensity measurements obtained from the amount of Cy3-labelled reference RNA bound

to each probe (or feature) on the array are used as the denominator value in the

calculation of the final expression ratios. The choice of reference RNA is therefore essential, as it has the potential to impact significantly on the entire expression profile

generated. For example, if a particular feature on the array is not bound by a labelled

reference target, no expression ratio can be generated, even if the probe is bound by the

Cy5-labelled sample material.

The most common options used in large-scale cDNA microarray experiments are

commercially available RNA stocks such as the Stratagene Human Reference RNA

(Stratagene, USA), genomic DNA (Gadgil et al., 2005), pooled RNA from all (or a

subset) of the samples actually being investigated (van 't Veer et al., 2002), or a ‘home

grown’ universal RNA produced from cell lines, such as the Stanford pooled 11 cell line

reference (Khan et al., 1998; Ross et al., 2000). While this decision has a major impact on

the final microarray data, few comparative studies have been carried out to determine the

impact, if any, of the type of RNA used (Novoradovskaya et al., 2004; Weil et al., 2002).

The concept of using a pool of cell-line derived RNA as a universal experimental

reference was first introduced by Ross et al (Ross et al., 2000) who combined an equal

mixture of RNA from 12 different cell lines to create a gene expression ‘baseline’ for a

microarray comparison of 60 different cell lines (known as the NCI 60 (Stinson et al.,

1992)). The pool comprised RNA extracted from a range of cell lines selected for maximally diverse gene expression, based on previously conducted two-dimensional gel analyses (Khan et al., 1998). These were HL-60 (acute myeloid leukemia) and K562 (chronic myeloid leukemia); NCI-H226 (non-small-cell lung); COLO205 (colon); SNB-19 (central nervous system); LOX-IMVI (melanoma); OVCAR-3 and OVCAR-4 (ovarian); CAKI-1 (renal); PC-3 (prostate); and MCF7 and Hs578T (breast).

After excluding features on the microarray without significant intensity readings in the reference channel, 6,831 of the 9,703 cDNA features remained, indicating that the reference pool successfully hybridised to 70.4% of the features on the particular microarray used. Other groups have reported over 90% array coverage from pooled cell line universal references (Bergstrom et al., 2002); however, this figure is dependent on the type of array used and the method for calculating the number of successful reference channel hybridisations.
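Array coverage, as quoted above, is the proportion of printed features that give a detectable reference-channel signal. The sketch below computes it for a hypothetical intensity threshold and reproduces the 70.4% figure from the reported counts. The toy intensities and thresholds are invented; they show that the same data can yield different coverage values under different detection criteria, which is one reason figures from different groups are hard to compare.

```python
def array_coverage(ref_intensities, threshold):
    """Fraction of features whose reference-channel intensity exceeds threshold."""
    detected = sum(1 for value in ref_intensities if value > threshold)
    return detected / len(ref_intensities)

# Counts reported above: 6,831 of 9,703 features detected
print(round(100 * 6831 / 9703, 1))  # 70.4

# The same hypothetical intensities give different coverage under different criteria
toy_reference = [50, 800, 1200, 90, 3000]
print(array_coverage(toy_reference, threshold=100))   # 0.6
print(array_coverage(toy_reference, threshold=1000))  # 0.4
```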

Reference RNA stocks generated from tumour cell lines have the advantage of being

scalable because of the unlimited growth potential of the cell lines used, however there

are concerns over batch-to-batch variation arising from the use of different cell passages, as well as changes in gene expression patterns resulting from minor variations in culture conditions (Holloway et al., 2002; Sterrenburg et al., 2002).

Interestingly, Yang et al (Yang et al., 2002a) determined that pooling of RNA from a

small number of tissue samples or cell lines with diverse gene expression profiles can be

superior to the use of more complex RNA mixes. It was hypothesised that while some

cell lines actively express more genes than others, the level at which each gene is

expressed can vary between individual lines. Thus by adding more cell lines to a pool,

those genes expressed at lower levels may be diluted to a level at which they are undetectable by the microarray platform. Yang et al demonstrated that using a

combination of only three cell lines from dissimilar tissues gives similar array coverage to

the commercial Stratagene universal reference, composed of RNA isolated from ten

different lines (Stratagene, USA).
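The dilution argument can be illustrated with simple arithmetic. The sketch below assumes that each cell line contributes an equal mass of RNA, so the level of a transcript in the pool is the mean of its per-line levels, and uses a hypothetical detection limit. It is an illustration of the reasoning above, not a model taken from Yang et al.

```python
def pooled_level(levels_per_line):
    """Transcript level in an equal-mass pool: the mean of per-line levels."""
    return sum(levels_per_line) / len(levels_per_line)

DETECTION_LIMIT = 20.0  # hypothetical detection limit, arbitrary units

# A transcript at 100 units in one cell line and absent from the others
for n_lines in (1, 3, 10, 12):
    level = pooled_level([100.0] + [0.0] * (n_lines - 1))
    print(n_lines, round(level, 1), level >= DETECTION_LIMIT)
```

Under these assumptions the transcript remains detectable in a three-line pool (33.3 units) but falls below the limit in a ten- or twelve-line pool (10.0 and 8.3 units).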

Genomic DNA is readily available, inexpensive and invariant over time and between laboratories, and represents all genes with a uniform signal, rendering it a theoretically useful reference for competitive hybridisation. Mouse genomic DNA has been demonstrated to give extremely high coverage of a 16k mouse microarray and outperformed the Stratagene Universal Mouse Reference RNA (Stratagene, USA) in this regard (Williams et al., 2004). A benefit of using genomic material (or cDNA) is its ability to identify low-abundance genes that may be undetectable or unstable with the use of RNA references. As newer arrays include more genes expressed at relatively low levels, this may be an important factor in future evaluations of reference RNA options.

The use of genomic DNA compared with pooled RNA as a reference was studied by Kim et al (Kim et al., 2002a). The results from this comparison indicated that genomic DNA was the inferior option, on the basis of a decreased correlation seen in self-self hybridisations. The accuracy of data obtained with a pooled RNA reference was comparable to that achieved with self-self hybridisation of a single sample of RNA, as shown in Figure 1-3. Self-self hybridisations are carried out by labelling a stock of RNA with both Cy3 and Cy5 and hybridising both labelled samples to a single microarray, resulting in theoretically perfect 1:1 expression ratios for all genes detectably hybridised. The number and identity of differentially expressed genes were concordant between the pooled RNA arrays and the direct hybridisations, but varied substantially from those of the genomic DNA hybridised slides.
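Self-self hybridisations are summarised with a Pearson correlation coefficient between the two channels, as in Figure 1-3. The sketch below simulates log2 intensities for one RNA labelled with both dyes, under the illustrative assumption that each channel carries independent Gaussian measurement noise. It shows that larger technical noise lowers r even though the true ratio is 1:1 for every gene. The simulation parameters are arbitrary and are not values from Kim et al.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate_self_self(n_features=2000, noise_sd=0.2, seed=1):
    """Log2 intensities for one RNA stock labelled with both Cy3 and Cy5.

    Both channels share the same true abundance, so every true ratio is 1:1;
    each channel adds independent Gaussian noise (an illustrative assumption).
    """
    rng = random.Random(seed)
    true_log2 = [rng.uniform(6, 15) for _ in range(n_features)]
    cy3 = [t + rng.gauss(0, noise_sd) for t in true_log2]
    cy5 = [t + rng.gauss(0, noise_sd) for t in true_log2]
    return cy3, cy5

cy3, cy5 = simulate_self_self()
print(round(pearson_r(cy3, cy5), 3))
noisy3, noisy5 = simulate_self_self(noise_sd=1.0)
print(round(pearson_r(noisy3, noisy5), 3))  # lower r with noisier channels
```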

Sterrenburg et al reported a method for using the pooled cDNA products actually used to

print the microarray as a reference material (Sterrenburg et al., 2002). This method was

shown to yield excellent array coverage (>99%), allowing expression ratios to be calculated for virtually every array feature, although it was not compared to the other reference types described in this section and would restrict cross-experiment analyses to data generated on the same microarray platform.

To date, no comprehensive analyses of reference RNA types for tumour profiling have

been published, particularly for aspects other than relative array coverage. Questions still

exist around the use of a project-specific pool of sample RNA versus a ‘universal’ cell

line reference for such tasks as the identification of discriminating genes between

histological subtypes of a given cancer type, predictive machine-learning analyses or the accuracy of any quality control features contained on the microarray.


Figure 1-3: Comparison of reference RNA options via self-self hybridisation. Scatter plots of self versus self hybridisation intensities for (A) genomic DNA (gDNA), (B) a pool of RNA from 3 separate isolations, and (C) RNA from a single isolation. Pearson correlation coefficients (r) for the two channels are shown in each plot. (Kim et al., 2002a)


1.2.2. The impact of microarray scanning hardware on gene expression data

The microarray scanner is one of the most expensive and important pieces of equipment

in a cDNA microarray laboratory. By scanning hybridised microarray slides and

generating the high-resolution electronic images that are converted to numerical

expression data, the scanner is effectively the bridge between the in vitro and ‘in silico’ or

bioinformatic stages of an experiment. Owing to the rapidly expanding market for microarray products since the technology’s inception, many companies have introduced scanners with increasingly sophisticated features. Furthermore, within each scanner type, the settings that control the laser power and photomultiplier tube (PMT) voltage can be either varied by the operator or controlled by electronic feedback systems in response to the characteristics of the particular slide being scanned (Holloway et al., 2002).

All microarray scanners have a limited range of feature intensity detection, outside of

which the measurements are unreliable, as described by Lyng et al (Lyng et al., 2004). At

the higher end of the spectrum (>50,000 pixel intensity in the Lyng study) saturation of

the detectors became a source of significant error. In recognition of this, the image

analyses carried out for this thesis contained a filter to exclude array features with three

percent or higher pixel saturation. To avoid reducing low-intensity features to an

undetectable level by reducing the overall laser power, Lyng et al suggest scanning each

microarray twice – once at a low PMT setting then again at a higher setting, followed by

the use of a novel algorithm for excluding faint or saturated spots respectively. Whilst this

approach may be suitable for smaller array experiments, the amount of data duplication

and extra image analysis that would be required is unfeasible for most larger-scale

projects.
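The saturation filter used for this thesis can be expressed as a per-feature rule. The sketch below assumes 16-bit images, so that a saturated pixel reads 65535, and represents each feature as a plain list of its foreground pixel values. The three percent cut-off is the one stated above, but the function names are hypothetical and image analysis packages may implement the check differently.

```python
SATURATION_VALUE = 65535      # maximum of a 16-bit image (assumed scanner output)
MAX_SATURATED_PERCENT = 3     # exclusion threshold stated in the text

def saturated_count(pixels, saturation_value=SATURATION_VALUE):
    """Number of foreground pixels at the detector's maximum value."""
    return sum(1 for p in pixels if p >= saturation_value)

def passes_saturation_filter(pixels):
    """Keep a feature only if fewer than 3% of its pixels are saturated.

    Integer arithmetic avoids floating-point edge cases at exactly 3%.
    """
    return 100 * saturated_count(pixels) < MAX_SATURATED_PERCENT * len(pixels)

spot_ok = [1000] * 98 + [65535] * 2    # 2% saturated -> kept
spot_bad = [1000] * 97 + [65535] * 3   # 3% saturated -> excluded
print(passes_saturation_filter(spot_ok), passes_saturation_filter(spot_bad))  # True False
```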

Few direct comparisons of microarray scanning hardware have been published to date.

One such study is that by Ramdas et al (Ramdas et al., 2001b) in which three types of

scanners were compared, although the identity of each was not revealed. The main differences between the scanners were summarised as follows: Scanner A was a four-laser imaging system that used PMT detectors and proprietary dark-field illumination to minimise background signal; Scanner B was a simultaneous dual-laser scanner with a large depth of field of 60 µm; and Scanner C used a patented confocal laser scanning system with the capability to automatically calibrate the PMT.


For the comparison of these scanners, a single image analysis package was used so as to avoid introducing variation from different image analysis algorithms. The correlation between data generated by different scanners was in the range r = 0.90 to r = 0.96, which did not differ significantly from the correlation obtained from scanning one slide multiple times on a single machine (r = 0.93). This indicates that the variation generated by using different scanners is comparable to that generated by scanning the same slide multiple times on a single machine. Furthermore, the most differentially

expressed genes, as assessed by a 3-fold change in expression, exhibited a 95%

agreement in identity between all three scanners. No gene expression quality control

measures were analysed in this study to determine if a significant difference existed in the

accuracy of data produced from these three scanners. Levels of spatial bias, variation in

background intensity or the overall dynamic range of the data are also important measures

of scanner performance that were not tested in this comparison. In addition, no

information about the normalisation algorithm used was given, making it difficult to

extend the findings to other datasets or laboratories. The authors state that their findings indicate that data from disparate microarray scanners can be interchanged and successfully analysed; however, the limitations of this comparison described above should be addressed before these conclusions are accepted.
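The gene-level comparison reported by Ramdas et al can be sketched as follows. Genes with at least a 3-fold change (|log2 ratio| ≥ log2 3 ≈ 1.585) are selected separately for each scanner, and agreement is taken here as the number of genes selected by all scanners divided by the number selected by any of them. This definition of ‘agreement’ is an assumption, as the exact metric used in that study is not described above, and the example ratios are invented.

```python
import math

def differentially_expressed(log2_ratios, fold=3.0):
    """Genes whose ratio is at least `fold`-fold up or down (None = no ratio)."""
    cutoff = math.log2(fold)
    return {gene for gene, m in log2_ratios.items()
            if m is not None and abs(m) >= cutoff}

def agreement(*gene_sets):
    """Genes selected by every scanner as a share of genes selected by any."""
    union = set().union(*gene_sets)
    if not union:
        return 1.0
    common = set.intersection(*gene_sets)
    return len(common) / len(union)

# Invented log2 ratios for four genes read from the same slide on three scanners
scanner_a = {"g1": 2.0, "g2": -1.7, "g3": 0.3, "g4": 1.6}
scanner_b = {"g1": 1.9, "g2": -1.8, "g3": 0.2, "g4": 1.5}
scanner_c = {"g1": 2.1, "g2": -1.6, "g3": 0.4, "g4": 1.7}
calls = [differentially_expressed(s) for s in (scanner_a, scanner_b, scanner_c)]
print(round(agreement(*calls), 3))  # 0.667: g1 and g2 agree, g4 misses on scanner B
```

Because a gene lying just either side of the cut-off (g4 here) counts as a disagreement, agreement figures of this kind are sensitive to the fold-change threshold chosen.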

During the course of this project, the Peter MacCallum Cancer Centre (Peter Mac)

Microarray Facility acquired a new microarray scanner manufactured by Agilent

Technologies (USA). The Agilent Microarray Scanner BA was claimed to offer

substantial improvements in the quality of cDNA expression data when compared to

other scanners, through the inclusion of features such as ‘Sure Scan’ technology, whereby

the focal point of the lasers is dynamically maintained throughout the duration of the

scan. This is a point of difference compared to other scanners, such as the Packard

Scanarray 5000 (Packard Bioscience, USA) in which the scanning lasers are focused

before the beginning of the scan and the focal point is kept constant for the duration of the scan. Despite claims that such hardware advances reduce the level of systematic noise in cDNA microarray data, the actual benefits appear not to have been rigorously tested outside of the manufacturer’s own literature, an example of which is shown in Figure 1-4.


Figure 1-4: Representation of background fluorescence intensity variation with and without dynamic auto-focus, a feature of the Agilent Microarray Scanner. A trend towards lower background intensity measurements in relation to the physical location of the feature can be observed, as reflected by the darkened upper-right corner of the lower scanned image (Agilent Technologies, USA).


1.2.3. Spatial bias in cDNA microarray data

In some microarray hybridisations, the values of the expression ratios depend on the physical location of the features on the array more than on their true expression in the specimen of interest. This is known as spatially dependent gene expression; it has been observed in cDNA microarray data by several groups and identified as a significant source of technical error (Lee, 2004; Miles, 2001; Quackenbush, 2002; Yang et al., 2002b).

False colour representations of microarrays are an effective way of visualising these

patterns, as shown in Figure 1-5. This type of systematic noise can be caused by a range

of factors including small variations in the dimensions of printing tips used to spot

individual microarray features onto the glass substrate, inadequate distribution of the

labelled target during the hybridisation stage, variation in the thickness of the glass

substrate or a slight angle in the position of the hybridised slide during the scanning

process.

The spatial arrangement of probes on the array can also lead to the appearance of spatially dependent patterns of differential expression (Balazsi et al., 2003); however, some randomisation of probe types (by known gene function or sequence homology) is usually incorporated into the assignment of probes throughout the array layout to avoid this effect, as was the case with the Peter Mac 10.5k cDNA microarray used for this thesis. Spatial bias often appears as a gradual effect from one corner of the array to the diagonally opposite corner (Figure 1-5); however, as shown in Figure 1-6, spatially dependent variation in expression ratios can occur in other patterns, depending on its cause.


Figure 1-5: False-colour or ‘virtual array’ images representing different components of a microarray affected by spatial bias. (A) Probe intensities of the Cy5 channel, (B) Corresponding background intensities for the same channel. The gradual fading of intensities can be observed in the background-subtracted image in (C). (Lee, 2004)

Figure 1-6: Position effect or spatial bias in cDNA microarray data as visualised by a high-density graph of relative fold change vs. array position. This method of visualisation shows that several different patterns of spatial bias can be evident in a dataset; this particular slide shows a Cy5 bias in approximately the second quarter of the dataset (Miles, 2001).


Two methods commonly used by researchers to address this issue are print-tip lowess normalisation (Yang et al., 2002b) and the Statistical Normalisation of Microarray Data (SNOMAD) method (Colantuoni et al., 2002). Both methods use the robust locally weighted regression (lowess) curve-fitting algorithm (Cleveland, 1979) to fit a smooth curve through non-linear data. The Yang et al method groups individual gene expression measurements into ‘bins’ for normalisation according to the printing tip used to spot the respective array feature onto the slide. This algorithm has the benefit of allowing the identification of individual tips that may be releasing too much or too little cDNA with each printing cycle. A curve is fitted through the data for each print tip using the lowess method. The deviation of this curve from the expected 1:1 intensity line is then determined, and the amount of correction required at each point along the curve is applied to the individual expression values, effectively correcting for any variation between printing tips.
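Print-tip lowess normalisation is usually expressed in terms of M = log2(R/G), the log ratio of the red (Cy5) and green (Cy3) intensities, and A = 0.5 log2(R·G), the average log intensity. Within each print-tip group a lowess curve c(A) is fitted to M against A, and the normalised log ratio is M - c(A). The sketch below follows that scheme using a simplified lowess (a single pass of tricube-weighted local linear regression, without Cleveland's robustness iterations) on simulated data with hypothetical tip-specific biases. For real analyses an established implementation, such as `normalizeWithinArrays(method = "printtiploess")` in the Bioconductor package limma, would normally be used instead.

```python
import numpy as np

def lowess_fit(x, y, frac=0.4):
    """Simplified lowess: one pass of tricube-weighted local linear regression.

    Cleveland's robustness iterations are omitted. Returns the fitted value
    of y at every x, in the original order.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = min(n, max(3, int(np.ceil(frac * n))))  # neighbourhood size
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        idx = np.argsort(dist)[:k]              # k nearest points in x
        h = dist[idx].max()
        if h == 0:                              # all neighbours share the same x
            fitted[i] = y[idx].mean()
            continue
        w = np.sqrt((1 - (dist[idx] / h) ** 3) ** 3)  # sqrt of tricube weights
        X = np.column_stack([np.ones(k), x[idx] - x[i]])
        beta, *_ = np.linalg.lstsq(X * w[:, None], y[idx] * w, rcond=None)
        fitted[i] = beta[0]                     # local intercept = fit at x[i]
    return fitted

def print_tip_lowess(M, A, tip, frac=0.4):
    """Return M minus a lowess curve of M on A fitted within each print tip."""
    M = np.asarray(M, dtype=float)
    A = np.asarray(A, dtype=float)
    tip = np.asarray(tip)
    M_norm = np.empty_like(M)
    for t in np.unique(tip):
        sel = tip == t
        M_norm[sel] = M[sel] - lowess_fit(A[sel], M[sel], frac)
    return M_norm

# Simulated data: two print tips with different (hypothetical) dye biases
rng = np.random.default_rng(0)
n = 400
tip = np.repeat([1, 2], n // 2)
A = rng.uniform(7, 14, n)
bias = np.where(tip == 1, 0.1 * (A - 10), -0.5)
M = bias + rng.normal(0, 0.1, n)
M_norm = print_tip_lowess(M, A, tip)
print(np.abs(M).mean(), np.abs(M_norm).mean())  # bias largely removed
```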

Because not all spatial irregularity is caused by variation in print-tip dimensions or similar printing attributes, this method may not always be effective for minimising spatially dependent bias in cDNA microarray data. The SNOMAD method uses the

physical X-Y (i.e. Row X, Column Y) coordinates of each array measurement and adjusts

each according to a mean intensity that is determined locally across the microarray

surface (Colantuoni et al., 2002). This technique is a multi-step approach and first

involves normalising the array to its median expression ratio in order to assist in

visualising the spatial bias present. The main point of difference between SNOMAD and

print-tip normalisation is the two-dimensional local estimation of mean hybridisation

intensity that is used to normalise each array feature. Again, the lowess function is used to

estimate the local mean intensity as a function of its specific location within the array.

The area or ‘window’ of the array used in this estimation can be controlled by the user.

Because this method is not limited to grouping data points into predetermined categories associated with only one cause of spatial bias, such as printing tips, it is potentially a more versatile approach to addressing this issue in cDNA microarrays.
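As described above, SNOMAD estimates a local mean across the array surface with lowess and adjusts each feature by it. The sketch below substitutes a simpler technique, a moving-window median over neighbouring grid positions, to show the same idea of subtracting a locally estimated centre from each feature; it is not the SNOMAD algorithm, and the hypothetical `half_window` parameter plays the role of the user-controlled window. The approach relies on probes being laid out without biological ordering, as with the randomised Peter Mac layout; otherwise genuine clusters of co-regulated genes would be partly removed.

```python
import numpy as np

def spatial_detrend(M_grid, half_window=2):
    """Subtract the median log ratio of neighbouring grid positions.

    M_grid holds log2 ratios laid out by (row, column); NaN marks flagged or
    missing features and is ignored. A simplified stand-in for a 2-D lowess
    surface, not the SNOMAD algorithm itself.
    """
    M = np.asarray(M_grid, dtype=float)
    n_rows, n_cols = M.shape
    out = np.full_like(M, np.nan)
    for r in range(n_rows):
        for c in range(n_cols):
            if np.isnan(M[r, c]):
                continue
            window = M[max(0, r - half_window): r + half_window + 1,
                       max(0, c - half_window): c + half_window + 1]
            out[r, c] = M[r, c] - np.nanmedian(window)
    return out

# Hypothetical 30 x 30 array with a corner-to-corner gradient plus noise
rng = np.random.default_rng(1)
rows, cols = np.indices((30, 30))
M = 0.05 * (rows + cols) + rng.normal(0, 0.1, (30, 30))
M[10, 10] = np.nan  # a flagged feature
D = spatial_detrend(M)
print(M[-5:, -5:].mean() - M[:5, :5].mean())  # about 2.5 before correction
print(D[-5:, -5:].mean() - D[:5, :5].mean())  # close to 0 afterwards
```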

However, because both the print-tip and SNOMAD methods take into account the location of a microarray feature, both are effective for correcting spatial bias in cDNA expression data. Print-tip normalisation is available through Bioconductor (Gentleman et al., 2004) and has also recently been implemented through an online interface: http://gepas.bioinfo.cnio.es (Herrero et al., 2003), as has SNOMAD (http://pevsnerlab.kennedykrieger.org/snomad.php). While normalisation methods such as those described are effective for correcting spatial bias, it can be difficult to determine when this type of normalisation is required, and the extent to which the bias is reduced, as neither method provides a quantification of the level of bias present.

As well as normalisation algorithms, various aspects of the laboratory-based stages of the

microarray workflow may be adjusted to minimise the introduction of spatial bias into

cDNA expression data. These include changes to hybridisation methods as new

techniques are proposed and validated (McQuain et al., 2004; Yuen et al., 2003) and

scanning equipment as previously discussed in section 1.2.2. Despite spatial bias being an

obvious problem for cDNA microarray experimentation, to date no method for

quantifying its extent has been described in the literature.

1.3. Ovarian cancer

1.3.1. Clinical background

Three main categories of ovarian cancer exist: epithelial, stromal and germ cell tumours, each having a distinct aetiology and clinical course. Of all gynaecological cancers, EOC

is the most common and has the poorest prognosis, rendering it the fifth leading cause of

female cancer deaths world-wide (Ries LAG, 2004). In patients where the disease is

confined to the ovaries (FIGO Stage 1 – See Appendix A) surgery alone can achieve a

cure in up to 90% of cases. However, for the 80% of patients who present with more advanced disease (FIGO stages 2-4), combined therapy of debulking surgery and chemotherapy is required (Agarwal and Kaye, 2003). Platinum agents such as cisplatin and carboplatin are the most active and most frequently used chemotherapeutic drugs for ovarian cancer. Recent randomised trials have suggested additional benefits of adding

taxanes to platinum drugs (Harper, 2002). The standard treatment for Australian women

with ovarian cancer is presently a combination of carboplatin and paclitaxel (Harries and

Gore, 2002a; Markman et al., 2001; Marsden et al., 2000; Piccart et al., 2000).

While survival times have significantly increased over the past 20 years, this has not

correlated with an equally significant improvement in the cure rate (Engel et al., 2002).

The development of drug resistance is a major factor in this statistic, with the majority of women diagnosed with more advanced stages of ovarian cancer eventually relapsing after their initial treatment and ultimately dying from drug-resistant tumour (Agarwal and Kaye, 2003). Drug-resistant disease is observed in more than 75% of cases four years after diagnosis, and consequently the 5-year survival rate in Australia is around 42%, lower than the mean combined 5-year survival rate of 63% for women with all other cancers between 1992 and 1997 (Australian Institute of Health and Welfare and Australasian Association of Cancer Registries, 2001).

1.3.2. Histology and associated genetic aberrations

EOC is classified into five main histological categories according to the cellular

appearance of the tumour. These classes are serous, mucinous, endometrioid, clear cell

and transitional cell (the latter sometimes referred to as Brenner tumours) (World Health

Organization, 1999). The classifications are based on the resemblance of the differentiation present in a tumour to other tissues. Serous tumours most closely resemble fallopian

tube epithelium, mucinous tumours the gastrointestinal tract or endocervical epithelium,

endometrioid tumours the proliferative endometrium, clear cell tumours the gestational

endometrium and transitional cell tumours the epithelium of the urinary tract. A range of malignant behaviours is also observed within these groups, classified as (i) benign, with simple non-stratified epithelium in which no cytologic atypia is present, (ii) low malignant potential (LMP), in which epithelial proliferation featuring stratification and tufting is observed (with varied mitotic activity and atypical nuclei), and (iii) malignant carcinoma, in which stromal invasion and cytologic atypia are observed (Kurman, 2003).

Approximately 10% of all EOCs are associated with autosomal dominant genetic predisposition, primarily inherited mutations in the BRCA1¹ or BRCA2 tumour suppressor genes (Jazaeri et al., 2002; Lakhani et al., 2004; Malander et al., 2004). Mutations of these genes are also seen in a small proportion (~5%) of sporadic ovarian cancers (Matias-Guiu and Prat, 1998). Other genetic features tend to relate to specific types of ovarian cancer. For example, invasive serous and undifferentiated ovarian carcinomas are characterised by mutations of the tumour suppressor gene TP53 and accumulation of the protein it encodes (Baekelandt et al., 1999). Loss of genetic material from chromosome 17, where the TP53 gene is located, is also common (Chenevix-Trench et al., 1997). Overexpression of the apoptosis-suppressing gene BCL2 is reported in endometrioid carcinomas (90% of cases) (Baekelandt et al., 1999). Mutations of the KRAS oncogene are characteristic features of mucinous carcinomas (detected in 40-50% of cases), although they are less frequent in mucinous tumours of low malignant potential (LMP), where they are detected in approximately 30% of cases (Cuatrecasas et al., 1998). The LMP form of ovarian cancer shares many of the characteristics of its invasive counterpart but exhibits a markedly different clinical course, and women diagnosed with this form of the disease have a significantly more favourable prognosis (Kliman et al., 1986; Trimble and Trimble, 2003), as discussed later in section 1.4.

¹ The UniGene symbol is used as the primary gene identifier in this thesis. A full list of UniGene symbols and complete gene names can be found in Appendix O and also in a spreadsheet format on the CD-ROM attached to this document (file name: “RvL_Thesis_Genelist.xls”).

Despite what is already known about the underlying molecular basis of EOC, a much

deeper understanding of the events leading to EOC development and progression is

needed. The high cure rate for patients diagnosed in the early stages of this disease has generated keen interest in identifying the specific genes or proteins whose expression or silencing indicates the first stages of tumorigenesis. Insight into the events

responsible for malignancy, particularly those required for a tumour to spread beyond the

confines of the ovary, may also lead to the discovery of novel therapeutic agents or

molecular targets for treating patients diagnosed with invasive or advanced stage disease.

1.3.3. Current needs in ovarian cancer diagnosis and treatment

Like many forms of human malignancy, when diagnosed early in its clinical course, EOC is a disease that can be treated effectively and often cured using the currently available range of surgical and chemotherapeutic strategies (Karlan, 1995; Smart and Chu, 1992; Teneriello and Park, 1995). When the disease is identified before it has had a chance to invade nearby tissues, or to grow to a size where bowel obstruction becomes a serious risk to the patient’s life, the 5-year survival rate is between 80 and 90%, with a steady decline in this rate as the cancer progresses, as shown in Figure 1-7A (Society, 2005). Unfortunately, only approximately one fifth of all cases are detected before local spread to other pelvic and abdominal structures has occurred (i.e. FIGO stage 1) (Agarwal and Kaye, 2003), hence the often poor prognosis of most EOC patients.

The challenge of identifying EOC in its early stages, where prognosis is significantly

more favourable, is compounded by the vagueness of its most common symptoms. Many

of the symptoms are often interpreted by patients and health-care professionals as normal

events associated with childbearing, menopause or the aging process (Bankhead et al.;

Fitch et al., 2002). According to large retrospective studies, the most common symptoms experienced by women with EOC are gastrointestinal discomfort, weight gain, abdominal pain and swelling, indigestion and shortness of breath (Fitch et al., 2002).


In a large survey of US women diagnosed with ovarian cancer it was found that 95%

experienced a range of symptoms prior to their diagnosis, despite the common belief that

early stage EOC is largely asymptomatic (Bankhead et al.; Goff et al., 2000). Women

who ignored these indications were significantly more likely to be diagnosed with

advanced stage disease compared to those who acted upon them (p=0.002). The study

concluded that ovarian cancer may not be as asymptomatic as once thought (Chan et al.,

2003), however the most common symptoms are often not considered indicative of a

gynaecologic condition, sometimes resulting in delayed or incorrect diagnoses (Ferrell et

al., 2003). Some studies have suggested that educating patients and doctors to consider EOC as a possible cause of the symptoms described, coupled with more effective screening using existing methods (e.g. pelvic examinations, CA-125 or ultrasound), may be beneficial for increasing the frequency of early-stage diagnoses (Igoe, 1997). In contrast, others have recently shown that advanced-stage disease at presentation, and consequently a poor prognosis, is rarely attributable to a diagnostic delay caused by misinterpreted symptoms (Lataifeh et al., 2005).

Because of the known relationship between advancing disease stage and poor prognosis,

there remains a pressing need to understand the molecular events underlying the

transition from one stage of EOC to the next, in particular the genes controlling a tumour’s ability to spread from the originating ovary to nearby tissues, as this phase of disease progression is associated with the largest change in treatment course and a significant decrease in patient survival (Clark et al., 2001; Friedlander, 1998). Advances

in our understanding of these processes will aid the development of tests designed to

identify the first stages of ovarian tumorigenesis and may also allow the development of

novel therapeutics targeted towards the specific gene products responsible for disease

progression.

Given the absence of any effective treatment for late-stage EOC, research into the molecular events responsible for EOC development, particularly those mediating a tumour’s drug resistance and invasive potential, offers the most promise for reducing the impact of this disease on the community. Although outside the scope of this thesis, microarray technology is also being used to investigate the precise mechanisms of drug resistance (Sakamoto et al., 2001).

[Figure 1-7 (A): 5-year survival rate (%) by EOC stage at diagnosis (Ia, Ib, Ic, II, IIIa, IIIb, IIIc, IV). (B): FIGO stage diagram.]

Figure 1-7: (A) 5-year survival rates of EOC patients by tumour stage at time of diagnosis (Society, 2005). (B) Diagram representing the region of the body to which the tumour has spread, corresponding to each of the four main FIGO stages of disease progression. Full descriptions of each stage can be found in Appendix 1.

1.3.4. Molecular pathology of EOC and its relevance to patient prognosis

Alterations in the tumour suppressor gene TP53 and its downstream targets p21 (cell cycle inhibitor), BAX (apoptosis agonist) and BCL-2 (apoptosis antagonist) are often observed in EOC; however, the prognostic value of these changes is still debated. Schuyer et al used a range of molecular and immunohistological methods to examine the relationship of these genes with important clinico-pathological variables, including outcome and response to platinum-based chemotherapy drugs such as cisplatin (Schuyer et al., 2001). Interestingly, while TP53 mutations were present in up to 50% of EOCs, in this study they were not correlated with an increased rate of progression or death, nor with expression of p21 or BCL-2. Higher TP53 expression levels were correlated with shorter overall survival (p=0.03). Combining TP53 mutation and over-expression status resulted in a more significant correlation with overall survival than the expression data alone (p=0.08), as observed in other studies (Wen et al., 1999). BAX, the remaining TP53 target investigated in this study, was however significantly linked to progression-free and overall survival. Furthermore, patients whose tumours expressed both BAX and BCL-2 exhibited longer survival times than those with tumours expressing BAX alone. The authors concluded that high expression of BAX may therefore be an independent prognostic indicator for this disease.

Expression of p21/WAF1, a tumour suppressor gene, is inversely correlated with TP53. It has been associated with higher EOC grades (i.e. a less differentiated cellular structure) and later FIGO stages (Anttila et al., 1999). DNA-damaging agents that cause G1-phase cell cycle arrest in cells with wild-type TP53 are capable of inducing the p21/WAF1 gene. Anttila et al used immunohistochemical profiling of over 300 ovarian tumour specimens to explore the relationship between expression of p21/WAF1 and patient outcome. Statistical analysis of expression levels and patient clinical information revealed that high-level expression of p21/WAF1 was associated with lower levels of cellular proliferation. In univariate analysis, loss of p21/WAF1 expression appeared to be an adverse prognostic factor: patients whose tumours had minimal or no expression appeared to have a higher risk of tumour recurrence after treatment and shorter disease-free and overall survival, particularly when their tumours were also TP53-positive. Although not statistically significant, there was also a trend towards higher p21/WAF1 expression in patients who exhibited a complete response to chemotherapy.

The gene KLK4 (Kallikrein 4) has been associated with disease progression and survival time in EOC. KLK4 has also been implicated in other hormonally regulated cancers, including those of the breast and prostate (Obiezu et al., 2001). In 147 EOC samples, expression of this gene was detected by RT-PCR in 69 cases (55%), and a significant association with tumour grade and stage was observed. Overall, the authors concluded that KLK4 expression was related to a more aggressive phenotype, which generally translated to an increased risk of disease relapse and ultimately death. When tested against chemotherapy response rates, positive KLK4 expression was correlated with a lack of treatment efficacy. Interestingly, when grade 1 and 2 tumours were compared with grade 3 tumours, positive KLK4 expression in grade 1 and 2 cases indicated a 2.5-fold increase in the relative risk of relapse, yet was not significantly predictive of relapse in the least differentiated grade 3 tumours (see Figure 1-8). This may indicate a loss of KLK4 expression with dedifferentiation.

Figure 1-8: Variation in rate of tumour relapse between grade 1-2 (A) and grade 3 (B) tumours by KLK4 expression (Obiezu et al., 2001). The expression level of this gene appears to be related to survival in moderately and well differentiated tumours, but not to the same extent in those that are poorly differentiated.

The Fanconi anemia-BRCA pathway has been implicated in the molecular changes occurring in cisplatin-resistant EOC. According to research by Taniguchi et al, disruption of this pathway ultimately leads to the development and selection of drug-resistant cancer cells (Taniguchi et al., 2003). The pathway is made up of six genes (FANC-A, -C, -D2, -E, -F and -G) plus BRCA1 and BRCA2, and normally regulates the cellular response to cisplatin and other DNA cross-linking agents. It takes its name from Fanconi anemia, a rare autosomal recessive disease causing abnormal development and predisposition to a wide range of tumours. The authors of this study showed that cisplatin resistance in EOC cell lines could be attributed to initial methylation-induced inactivation and subsequent demethylation of FANCF. A proposed model of tumour progression based on these findings is shown in Figure 1-9.

In this model, methylation of FANCF occurs during the early stages of tumour progression. This results in chromosomal instability and accumulation of other tumour-causing mutations. The majority of cells in the growing tumour remain hypersensitive to cisplatin due to their underlying Fanconi pathway defect, so cisplatin treatment results in significant apoptosis of the drug-susceptible cell population. In rare cells, demethylation of FANCF occurs, leading to reactivation of the pathway and selective growth of these cells, eventually forming a cisplatin-resistant tumour mass. As shown in Figure 1-9, a small-molecule Fanconi pathway inhibitor may be clinically useful for resensitising these relapsed tumours (Taniguchi et al., 2003).

As this Fanconi pathway analysis was carried out using EOC cell lines, the findings should be validated by other methods, such as expression profiling of RNA extracted from human tissue. Microarray profiling of cell lines and primary ovarian tumours has revealed that significant molecular differences exist between these two forms of the disease, calling into question the validity of cell line models for the study of human cancers unless observations are confirmed in actual human tissue (Ross and Perou, 2001). The extent of these differences between EOC cell lines and human tumours has been described by Sawiris et al, who used principal component analysis (PCA), a data reduction technique, to visualise complex gene expression patterns in three dimensions, as shown in Figure 1-10. In this analysis, the primary EOC tissue specimens appear as closely related to primary colorectal tissue specimens as to ovarian cell lines (Sawiris et al., 2002).
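The PCA projection described above can be sketched in a few lines of NumPy. This is a generic illustration, not Sawiris et al's pipeline: it assumes a matrix of (for example, log-transformed) expression values with samples as rows and genes as columns, centres each gene, and uses a singular value decomposition to obtain each sample's coordinates on the first three principal components, which could then be drawn as a three-dimensional scatter plot.

```python
import numpy as np


def pca_scores(expression, n_components=3):
    """Project samples onto their leading principal components.

    expression: array of shape (n_samples, n_genes), e.g. log expression
    values (an assumption for this sketch). Returns (scores, explained),
    where scores has shape (n_samples, n_components) and explained holds
    the fraction of total variance captured by each component.
    """
    x = np.asarray(expression, dtype=float)
    centred = x - x.mean(axis=0)  # centre each gene across samples
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # sample coordinates
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]
```

Samples with similar expression profiles (for instance, cell lines compared with tissue specimens) would be expected to cluster together in the resulting three coordinates.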

Figure 1-9: A proposed model of EOC progression and development of chemo-resistance. In the early stages of tumour progression, the FANCF gene is methylated, which results in chromosomal instability and build-up of other tumour-causing mutations. The majority of cells in the growing tumour remain hypersensitive to cisplatin, due to their underlying Fanconi pathway defect. As a result, cisplatin treatment results in significant apoptosis of the susceptible cell population. In rare cells, demethylation of FANCF occurs, reactivating the pathway and leading to selective growth of these cells, resulting in a cisplatin-resistant tumour mass. As shown in the model, the use of a small-molecule Fanconi-pathway inhibitor may be clinically useful for resensitising these relapsed tumours (Taniguchi et al., 2003).

Figure 1-10: Three-dimensional plot of principal component analysis of microarray gene expression data generated using EOC cell lines, EOC tissue samples and colon tumour samples. Significant differences between the expression profiles generated from cell lines and human tissues can be observed (Sawiris et al., 2002).

1.3.5. The ovarian tumour marker CA-125 and EOC prognosis

The currently established EOC prognostic factors are (Clark et al., 2001):

Age at diagnosis,

Histology,

Tumour stage and grade,

Volume of ascites,

Performance status according to the ZUBROD-ECOG-WHO scale (Oken et al.,

1982),

Findings at second-look laparotomy and

Debulk status

Along with the above prognosticators, the abundance of a cell-surface molecule called CA-125, detected by a blood test, is frequently used to assess the risk of ovarian malignancy. To a lesser extent, this marker is also used to identify disease stage and histological subtype, although these are more often determined pathologically using staging systems such as the FIGO scale (Benedet et al., 2000). The clinical usefulness of CA-125 was first identified in 1981 (Bast et al., 1981) and it remains one of the most commonly measured indicators of the disease to this day (Agarwal and Kaye, 2005). The molecule is expressed by over 80% of ovarian cancers and is secreted into the bloodstream, enabling its detection through a non-invasive blood test. CA-125 levels are measured at regular intervals throughout the course of a woman’s treatment and are used to predict the likelihood of a favourable response to chemotherapy and also the probability of disease recurrence up to 60 days following treatment (Meyer and Rustin, 2000).

When CA-125 levels are used to decide whether to continue, modify or stop therapy altogether, the definition of treatment response recently proposed by the Gynaecological Cancer Intergroup (GCIG) is a 50% reduction in the level of the protein that is sustained for 28 days (Rustin, 2004; Rustin et al., 2004). This definition was determined by comparing patient response rates according to CA-125 with response rates expected according to standard criteria, and by calculating the proportion of patients in whom the CA-125 prediction agreed or differed with the response determined by standard criteria. The accuracy of the CA-125 response definition has also been assessed by examining how accurately the CA-125-defined response predicted the activity of drugs in phase II trials, compared with response rates obtained by standard criteria (Rustin et al., 2000).
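The response rule above can be expressed as a simple check over a series of CA-125 measurements. The sketch below is a simplified illustration, not the GCIG's published procedure: it assumes a single pretreatment baseline and measurements supplied as hypothetical `(day, level)` pairs, and treats a response as a fall to 50% or less of baseline that is still present in a sample taken at least 28 days later, with no intervening rise above that threshold. The full GCIG definition includes further conditions (for example on the pretreatment sample) that are not modelled here.

```python
from typing import List, Tuple


def ca125_response(baseline: float,
                   measurements: List[Tuple[int, float]],
                   reduction: float = 0.5,
                   min_days: int = 28) -> bool:
    """Simplified CA-125 response check (illustrative assumption only).

    baseline: pretreatment CA-125 level.
    measurements: (day, level) pairs taken after treatment starts.
    Returns True if the level falls to (1 - reduction) * baseline or below
    and remains there in a later sample at least min_days days afterwards.
    """
    threshold = baseline * (1.0 - reduction)
    series = sorted(measurements)
    for i, (start_day, level) in enumerate(series):
        if level > threshold:
            continue
        # Confirm the fall: walk forward until a sample rises above the
        # threshold (no response from this start) or one is min_days later.
        for day, later_level in series[i + 1:]:
            if later_level > threshold:
                break
            if day - start_day >= min_days:
                return True
    return False
```

For example, a baseline of 400 followed by levels of 180, 150 and 120 at days 21, 42 and 63 would count as a response, whereas a rise back above 200 before the 28-day confirmation would not.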

Although CA-125 has been linked to the onset or progression of several clinically important aspects of EOC, none of the CA-125-based indices has gained universal acceptance in disease prognostication, despite extensive evaluation. The small sample size of many of the evaluating studies is a frequent limiting factor, along with the lack of prospective studies to confirm the original observations. Finally, the predictive ability of the indices, when applied to an individual patient, is insufficient to justify a change in management (Cruickshank et al., 1992; Rustin, 2004).

Over 230 papers have been published on potential prognostic factors for EOC in the past 5 years (Agarwal and Kaye, 2005); however, despite this volume of research, no single factor has met all the criteria necessary for acceptance into clinical practice for this disease (Agarwal and Kaye, 2003). The prognostic value of the tumour suppressor gene TP53 has been studied extensively in EOC, although its precise role in tumour response to DNA damage remains controversial. A review of published TP53 analyses found that 43% reported a significant correlation between TP53 status and clinical end points relating to chemoresistance. However, only six studies met the minimum criteria established in the review, and none of these found a reliable correlation with drug-resistance end points (Hall et al., 2004). The criteria used to evaluate the TP53 studies included sample size, the adequacy of positive and negative controls and the use of more than one antibody to assess TP53 levels.

One explanation for the difficulty in finding truly prognostic markers of EOC may be the univariate methods of analysis used to evaluate most novel candidates. Univariate analysis does not account for the impact of other established prognostic variables known to be important in determining patient outcome or chemotherapy response (e.g. the amount of residual disease remaining after surgery, patient age, etc.) (Altman, 2001). Another reason for the absence of a universally applicable EOC prognosticator may be the single gene or protein focus of most studies to date, such as those of TP53 (Wen et al., 1999), ERBB2 (Meden and Kuhn, 1997) and MDR (Ikeda et al., 2003). EOC is known to be a complex and heterogeneous disease, and may therefore require the simultaneous measurement and analysis of multiple molecular markers and/or clinical variables to accurately determine patient prognosis (Hernandez et al., 1984; Pieretti et al., 2002).

1.3.6. The use of DNA microarrays to discover novel biomarkers of EOC

Although microarray technology is still undergoing rapid development, early indications are that it has the potential to impact significantly on both the diagnosis of diseases with underlying molecular causes and the methods for assessing patient prognosis. Gene expression signatures associated with patient prognosis have been identified for a range of cancer types, including breast (van de Vijver et al., 2002), B-cell lymphoma (Alizadeh et al., 2000), ovarian (Berchuck et al., 2004), prostate (Dhanasekaran et al., 2001), renal cell (Vasselli et al., 2003) and oesophageal cancer (Kihara et al., 2001). In breast cancer, the predictive gene expression signature described by Van’t Veer et al has been developed into a clinical trial in which treatment decisions are made based on the expression profile of the 70 genes in the signature, which has been shown to correlate with either a good or poor prognosis (Branca, 2003). In this trial, women enrolled in the study (N>5000) will be assigned to one of two treatment groups based either on their molecular profile or on conventional assessment by clinicians. Patient outcomes in the ‘microarray’ and clinician-assigned groups will be compared throughout the study to determine whether either method is superior for identifying those women most at risk of recurrent disease and therefore requiring more aggressive treatment.

The ability of microarrays to monitor and predict the response of a tumour to a specific chemotherapy agent, a variable that impacts on prognosis, is currently being demonstrated for the multiple myeloma drug Velcade (Jung et al., 2004; Mitsiades et al., 2002). By analysing Affymetrix GeneChip expression profiles of patients before and after drug treatment, a predictive signature was devised for identifying whether a tumour is likely to respond favourably to the drug. This study is one of the first government-approved clinical trials in the USA to incorporate microarray-based expression profiling into a trial protocol.

Several groups have attempted to use microarray profiling to discover novel biomarkers for EOC, particularly markers of early stage disease. One of the first published studies was carried out using Affymetrix HuGeneFL GeneChips, an early version of the presently available GeneChips that contained approximately 6,000 oligonucleotide features (Welsh et al., 2001). A novel “array of arrays” format was used, in which 49 separate microarrays, separated by individual chambers, were hybridised in parallel on a single glass wafer. Criteria for selection of genes as potential diagnostic markers were: (i) low expression in normal tissue and high expression in neoplastic tissue and (ii) a clear and unambiguous difference in expression between these two tissue types. During analysis of the expression data, certain samples in the cohort were found to express high levels of genes normally associated with stroma or infiltrating immune cells. These tumours were confirmed pathologically to have low epithelial content and were subsequently excluded from the analysis, substantially reducing the sample size and therefore limiting the statistical power of the study. The hybridisation intensity of each gene in the normal and malignant specimens was analysed with three different methods for detection of differential expression: (i) difference of means, (ii) fold change and (iii) unpaired t-test. Genes were ranked according to each of these measures, and the sum of the three ranks was used as an overall estimate of differential expression, as reproduced in Figure 1-11.
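The rank-combination step can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not Welsh et al's code: it assumes positive, linear-scale intensities arranged as genes by samples, uses a pooled-variance (Student) unpaired t statistic, weights the three ranks equally, and ranks each measure so that genes more highly expressed in tumour receive smaller ranks.

```python
import numpy as np


def descending_rank(values):
    """Rank values so that the largest receives rank 1 (ties keep input order)."""
    order = np.argsort(-values, kind="stable")
    ranks = np.empty(len(values), dtype=int)
    ranks[order] = np.arange(1, len(values) + 1)
    return ranks


def combined_rank(tumour, normal):
    """Sum the ranks of three differential-expression measures per gene.

    tumour, normal: arrays of shape (n_genes, n_samples) of positive
    intensities. Returns (scores, order); a lower score indicates stronger
    evidence of higher expression in tumour than in normal tissue.
    """
    tumour = np.asarray(tumour, dtype=float)
    normal = np.asarray(normal, dtype=float)
    mean_t = tumour.mean(axis=1)
    mean_n = normal.mean(axis=1)

    diff = mean_t - mean_n  # (i) difference of means
    fold = mean_t / mean_n  # (ii) fold change

    # (iii) unpaired two-sample t statistic using a pooled variance
    n_t, n_n = tumour.shape[1], normal.shape[1]
    pooled = ((n_t - 1) * tumour.var(axis=1, ddof=1)
              + (n_n - 1) * normal.var(axis=1, ddof=1)) / (n_t + n_n - 2)
    t_stat = diff / np.sqrt(pooled * (1.0 / n_t + 1.0 / n_n))

    scores = descending_rank(diff) + descending_rank(fold) + descending_rank(t_stat)
    order = np.argsort(scores, kind="stable")  # best-ranked genes first
    return scores, order
```

Because each measure is converted to a rank before summing, the differing scales of the difference, ratio and t statistic do not dominate the combined score.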

Genes identified by this approach included several cell-proliferation genes (e.g. CCNB1, CDC20, RAN), previously identified tumour-specific genes (COX5B, PRSS8, PRAME), stromal genes upregulated in normal tissues (CNN1, MLCK, MUC18), genes only expressed in normal tissue (EGR1, IGFBP5, BTG2) and several ribosomal genes.

Important factors in determining the potential of a novel disease biomarker include the copy number of the gene, particularly for mRNA-based detection, and the translation of the gene into a circulating protein product, for a biomarker that could potentially be detected in blood, urine or saliva samples.

Figure 1-11: Affymetrix GeneChip gene expression measurements of the 30 highest ranked potential EOC biomarkers by Welsh et al (Welsh et al., 2001). Red and blue squares correspond to mean expression level of each gene in malignant and normal ovarian tissue respectively. Green bars correspond to expression of the gene in a pool of six normal tissues for comparison. 95% confidence intervals shown. Studies such as these demonstrate the power of microarrays to identify large numbers of potential biomarker candidates.


Prostasin (PRSS8), identified by Welsh et al, was also proposed as a potential serum marker for the early detection of EOC in an independent cDNA microarray-based study by Mok et al (Mok et al., 2001). Using a 2.4k commercial cDNA array (MICROMAX human cDNA microarray system, manufactured by Perkin Elmer, USA), this group identified genes coding for overexpressed proteins potentially suitable for use as early detection markers. Thirty genes with an expression ratio (EOC cell line to normal ovarian surface epithelial (OSE) cell) greater than five were identified. Included in the output of this analysis was PKB, which encodes a protein marker that is already used clinically for assessing renal cell carcinoma and lung cancer prognosis. PRSS8 had a Cy3:Cy5 expression ratio of 170, indicating an extreme increase in its abundance relative to normal OSE. This overexpression was confirmed with RT-PCR, and immunohistochemistry was carried out to determine its cellular localisation. Antibody staining of EOC sections revealed high levels of serum prostasin in 64 cases of EOC compared with 134 control cases (examples of staining shown in Figure 1-12); such independent validation is a crucial step in evaluating a novel biomarker (Statnikov et al., 2005). Importantly, variables that can potentially confound analyses of novel biomarkers, such as patient age and specimen quality, were controlled for in the statistical analysis of gene expression and disease state in this study. After adjusting for these clinical variables, a highly significant difference (P < 0.001) between EOC and normal tissue expression of PRSS8 was still observed.
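The ratio-based selection used by Mok et al can be sketched as a simple filter. The example assumes (as a hypothetical convention) background-corrected tumour cell line intensities in the Cy3 channel and normal OSE intensities in the Cy5 channel, and uses made-up gene identifiers and values; it is not the authors' analysis pipeline.

```python
import numpy as np


def overexpressed_genes(gene_ids, cy3, cy5, min_ratio=5.0):
    """Return (gene, ratio) pairs whose Cy3:Cy5 ratio exceeds min_ratio.

    cy3, cy5: background-corrected intensities per gene, assumed here to be
    tumour cell line (Cy3) and normal OSE (Cy5). Results are sorted from the
    highest ratio to the lowest.
    """
    cy3 = np.asarray(cy3, dtype=float)
    cy5 = np.asarray(cy5, dtype=float)
    ratio = cy3 / cy5
    keep = ratio > min_ratio
    kept_ids = np.asarray(gene_ids)[keep]
    kept_ratios = ratio[keep]
    order = np.argsort(-kept_ratios, kind="stable")
    return [(str(kept_ids[i]), float(kept_ratios[i])) for i in order]
```

With hypothetical inputs `["gene_a", "gene_b", "gene_c"]`, Cy3 intensities `[1700, 1000, 900]` and Cy5 intensities `[10, 1000, 100]`, only `gene_a` (ratio 170) and `gene_c` (ratio 9) pass a threshold of five.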

One caveat of this study was the exclusion of residual disease from the estimate of significance, a potential oversight given the prognostic value of this variable (Hoskins et al., 1992; Hoskins et al., 1994).

Figure 1-12: Immunohistochemistry validation of Prostasin (PRSS8), a novel serum marker for EOC identified by microarray analysis. Low prostasin expression in normal surface epithelial cells (A) and serous LMP tumour (B). Higher expression of the marker in a grade 3 EOC specimen is shown in (C), and no positive signal is observed for the same case in (D), for which a preimmune rabbit serum was used. S = stroma and the horizontal scale bar indicates a length of 50 µm. (Mok et al., 2001)

Another candidate marker from microarray profiling of EOC is Osteopontin (SPP1), one of 30 candidate genes identified by Wong et al (Wong et al., 2001) from cell line experiments, also using the MICROMAX platform. This gene was observed to be 150- to 180-fold overexpressed in the tumour-derived cell lines relative to the seven cultured normal OSE cell lines used as a reference. However, this study was largely a validation of the microarray platform itself, and only a rudimentary data analysis was used to identify candidate genes. The product of the SPP1 gene is an acidic calcium-binding glycophosphoprotein found in virtually all body fluids and in components of the extracellular matrix. It is thought to be involved in the regulation of cell adhesion and also acts as a cytokine and as a ligand for CD44 and several integrins (Standal et al., 2004).

The expression of SPP1 was validated initially by Kim et al (Kim et al., 2002b) using normal and cancerous cell lines, archival paraffin-embedded ovarian tissue, and fresh tissue and plasma from 144 patients treated for a pelvic mass at two locations in the United States (Brakora et al., 2004; Schorge et al., 2004). RT-PCR on microdissected tumour material revealed higher expression of osteopontin mRNA relative to normal tissue, but the difference was not statistically significant. Immunohistochemical analysis revealed a histological subtype-specific pattern of staining, for example high cytoplasmic staining in mucinous tumours compared with psammoma body-localised expression in the serous subtype. Ovarian tumours of low malignant potential have also been noted to express higher levels of osteopontin protein than their invasive counterparts, which suggests a role for this molecule in regulating tumour dissemination to other tissues (Tiniakos et al., 1998). In the study by Kim et al, osteopontin ELISA revealed clearer differences between healthy controls and tumour patients, with preoperative plasma levels significantly higher in patients for all histological subtypes tested.

Continuing the exploration of the potential diagnostic value of SPP1, Lu et al used genome-comprehensive Affymetrix U95 GeneChips to assess the expression of this gene, along with ten other potential tumour markers, in 42 EOC and normal OSE samples (Lu et al., 2004). Of the eleven markers examined, SPP1 was not selected by recursive descent partition analysis (Hastie et al., 2001); instead, a formula based on the expression of HE4, CA-125 and MUC1 was derived that discriminated 100% of the tumour and OSE samples tested when the expression levels of all three markers were elevated. Another set of genes with high classification accuracy was also identified from the available data. Claudin 3 (CLDN3) expression by itself could distinguish all serous, clear cell and endometrioid samples, and one of eight mucinous samples, from the non-cancerous OSE. With the addition of vascular endothelial growth factor (VEGF) expression to the classifier, the remaining mucinous samples were correctly classified. In a follow-up IHC analysis of 158 EOC cases, a combination of CLDN3, CA-125, MUC1 and VEGF staining was able to distinguish all tumour samples from normal tissue. This study demonstrated the potential for microarrays to aid in the development or improvement of cancer diagnostics. As the authors commented, one limitation of such studies is that, for a candidate to become a clinically useful diagnostic tool, it must be present in serum, not merely expressed at the mRNA level in a given tissue type.

Studies such as these describe the exhaustive process required to identify new biomarkers

for EOC using microarrays as the initial discovery platform and other more established

methods such as RT-PCR and IHC for validation.

Meta-analysis is an approach to data mining in which raw gene expression data from separately conducted microarray experiments are combined to create one dataset with increased statistical power. This method has been used successfully in studies of prostate cancer (Rhodes et al., 2002) and also to identify a transcriptional profile commonly activated in a wide range of cancer types (Rhodes et al., 2004).

By compiling a database of gene expression information from 14 different microarray studies of EOC gene expression relative to OSE or other forms of non-malignant tissue, Heinzelmann-Schwarz et al (Heinzelmann-Schwarz et al., 2004) identified three cell-adhesion genes that were overexpressed in all histological subtypes tested. As well as increasing the statistical power of the analysis, this approach is an effective way of controlling for variation between laboratory protocols, microarray platforms and data analysis methods. Genes found to be differentially expressed between phenotypes of interest in more than one independent study are more likely to be significant at the population level than those identified by one study alone.
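As a minimal illustration of this cross-study consistency idea, the sketch below implements simple vote counting: it counts how many independent studies report each gene as differentially expressed and keeps those reported by at least a chosen number of studies. This is a simpler technique than pooling raw expression data, and the study names, gene identifiers and the `min_studies` threshold are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def genes_in_multiple_studies(study_hits: Dict[str, Iterable[str]],
                              min_studies: int = 2) -> List[str]:
    """Return genes reported as differentially expressed by >= min_studies studies.

    study_hits maps a study name to the genes it reported; duplicate entries
    within one study are counted only once.
    """
    counts: Counter = Counter()
    for genes in study_hits.values():
        counts.update(set(genes))  # one vote per gene per study
    return sorted(gene for gene, n in counts.items() if n >= min_studies)
```

Raising `min_studies` trades sensitivity for a shorter list of genes that replicate across laboratories and platforms.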

Using a database created from these compiled studies in association with the authors’ own unpublished dataset of EOC Affymetrix profiles, 69 genes differentially expressed between EOC and OSE were found in common. From this list, cellular localisation, minimal expression in normal ovarian tissue and each gene’s individual p-value for differential expression were used to identify candidate tumour markers of EOC. Three cell-adhesion markers were chosen for follow-up analysis with immunohistochemistry: discoidin domain receptor 1 (DDR1), claudin 3 (CLDN3) and epithelial cell adhesion molecule (EP-CAM). The relative levels of these molecules in surface epithelium and a range of EOC histological subtypes are shown in Figure 1-13. Immunohistochemistry revealed low expression of these candidate markers in normal surface epithelium and significantly higher expression in the EOC subtypes profiled.

Although none of these potential markers was predictive of relapse-free survival when compared with other variables tested, such as age, debulk status and tumour stage, patients with lower CLDN3 expression exhibited a trend towards shorter survival (p=0.068).

Studies of this type highlight the benefit that microarray technology brings to the search for novel biomarkers, but also the immense amount of work still required to find suitable molecules. Whilst the scientific community now has the capacity to screen thousands of genes in parallel due to the continual refinement of microarray technology, much effort is still required to ensure that any candidate genes are expressed specifically in the tissue of interest and that the gene product is present at detectable levels in plasma. The growing databases of tissue expression profiles (Ramaswamy et al., 2001; Su et al., 2001) offer one method for assessing tissue specificity; however, issues of cross-platform and inter-laboratory variation remain (King and Sinha, 2001; Simon et al., 2003b).

As well as improved means of early detection and population screening, a need exists for more accurate prognostic factors to assist in individualising the treatment EOC patients receive. This need is pressing, as most ovarian cancer patients are diagnosed with advanced stage disease and present treatment options are effective in only a proportion of these cases (Harries and Gore, 2002a). Debate continues in the medical community on the most effective methods for determining whether a patient receives any form of chemotherapy, the route of administration and dosage levels, and also on the most appropriate surgical approach for maximum benefit (Agarwal and Kaye, 2003; Harries and Gore, 2002a; Harries and Gore, 2002b; Marsden et al., 2000).

Figure 1-13: IHC expression of potential EOC markers. Units are mean percentage of cells expressing each marker. SOC: Serous ovarian cancer; MOC: Mucinous ovarian cancer; EnOC: Endometrioid ovarian cancer; ClCCA: Clear cell ovarian cancer. Black and white bars correspond to cytoplasmic and membrane expression respectively (Heinzelmann-Schwarz et al., 2004).


After initial surgery, some patients can be identified as having a sufficiently favourable pathological assessment (stage 1) that their chance of being cured by surgery alone is high enough to avoid the personal and financial cost of chemotherapy. However, this category of EOC is still remarkably heterogeneous and substantial variation exists in survival times, underscoring the need for reliable prognostic measures. A comprehensive retrospective study of 1545 women with stage 1 EOC by Vergote et al (Vergote et al., 2001) identified the most important prognostic factors for probability of relapse as the degree of tumour differentiation, the presence of cyst rupture, bilateralism (presence of tumour in both ovaries) and the age of the patient at diagnosis.

Another recently reported prognostic factor for risk of relapse amongst stage 1 patients is DNA ploidy (Kristensen et al., 2003; Trope et al., 2000). Kristensen et al found that patients with polyploid and aneuploid tumours had 10-year relapse-free survival rates of 70% and 29% respectively, whereas those with diploid and tetraploid tumours had significantly higher rates of 95% and 89% respectively. In a multivariate analysis including tumour grade, FIGO stage and histological subtype, DNA ploidy was found to be the strongest predictor of survival, while all variables were independently prognostic at statistically significant levels. The investigators identified low-, medium- and high-risk relapse groups based on DNA ploidy and other variables, and proposed the routine use of DNA ploidy analysis for the selection of early-stage patients likely to benefit from post-surgical adjuvant chemotherapy.

1.3.7. Current status of microarray-based EOC prognostic signatures

Possibly due to the difficulty in accruing sufficient numbers of patients representing a broad range of survival times, microarray-based identification of novel EOC prognostic factors has lagged behind that of other cancer types, for example breast cancer (Huang et al., 2003; Sorlie et al., 2001; van de Vijver et al., 2002) and B-cell lymphoma (Rosenwald et al., 2002; Lossos et al., 2004; Yeoh et al., 2002).

Based on a 68-patient cohort, Spentzos et al (Spentzos et al., 2004) identified a set of 115 genes referred to as the ‘Ovarian Cancer Prognostic Profile (OCPP)’. These genes were narrowed down from over 12,000 represented on the Affymetrix U95A GeneChip, and their expression patterns grouped patients into classes with statistically significant differences in survival times, based on a three-step process. Genes were first selected by comparing expression data from patients at the extreme ends of the survival distribution. Samples from patients whose survival times fell between the extreme long- and short-term groups were then classified using the derived OCPP. The authors noted an imbalance in debulking efficiency and patient age between the favourable and unfavourable groups, both of these variables being significantly prognostic in univariate analysis. The prognostic value of these variables has also been reported by a number of other studies (Clark et al., 2001; Friedlander, 1998; Vergote et al., 2001). Despite this, the OCPP remained independently prognostic in a multivariate analysis adjusted for age and debulking status. Whilst an independent test set showed the OCPP to be significantly prognostic for samples not used in the training process, the total number of cases in the study (n=102) and the small number of specimen collection sites would need to be expanded before the OCPP could be confidently applied to EOC patients in a clinical setting. Importantly, many of the 115 genes have previously been implicated in processes such as invasion and disease progression. These include:

Fibronectin (FN1); known to be integral to neovascularisation, metastasis, and immunosuppressive and apoptotic pathways. In a large German immunohistochemistry study, its expression was significantly correlated with other established prognostic factors as well as with overall patient survival (Franke et al., 2003).

Plasminogen activator inhibitor 1 (PAI1); in a large patient cohort studied by quantitative ELISA, elevated levels of PAI1 and its target, urokinase-type plasminogen activator (PLAU), were significantly associated with disease prognosis and progression (Konecny et al., 2001). This enzyme-inhibitor complex is thought to mediate a tumour’s ability to degrade extracellular matrix and basement membranes, which is essential for invasion to occur (Dano et al., 1985; Schmitt et al., 1997). Both molecules have also been implicated as prognostic markers in breast (Duffy et al., 1988; Foekens et al., 1992), kidney (Hofmann et al., 1996), colon (Ganesh et al., 1994), lung (Pedersen et al., 1994) and gastrointestinal cancers (Nekarda et al., 1994).

Thrombospondin 2 (TSP2); the role of this gene in EOC is still debated, but overexpression has been associated with a more aggressive phenotype and shorter survival (Kodama et al., 2001). It encodes a disulfide-linked glycoprotein that mediates cell-cell and cell-matrix adhesion and interactions, and has been reported to potently inhibit tumour growth and angiogenesis (Lopes et al., 2003).
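The first two steps of the OCPP derivation described above (selecting genes by contrasting the survival extremes, then classifying the intermediate-survival samples) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the published procedure: the quartile cut-offs, the Welch t-statistic used to rank genes, the nearest-centroid classifier, the function names and the synthetic data are all hypothetical choices made here.

```python
import numpy as np

def select_genes_extreme_groups(expr, survival, n_genes=10, quantile=0.25):
    """Rank genes by a Welch t-statistic contrasting long- vs short-term survivors.

    expr: genes x samples array (e.g. log ratios); survival: one time per sample.
    Only samples in the lowest and highest `quantile` of survival are used
    (an assumed definition of the 'extreme' groups).
    Returns the top gene indices and the two group centroids for those genes.
    """
    lo, hi = np.quantile(survival, [quantile, 1 - quantile])
    long_grp = expr[:, survival >= hi]
    short_grp = expr[:, survival <= lo]
    diff = long_grp.mean(axis=1) - short_grp.mean(axis=1)
    se = np.sqrt(long_grp.var(axis=1, ddof=1) / long_grp.shape[1]
                 + short_grp.var(axis=1, ddof=1) / short_grp.shape[1])
    top = np.argsort(-np.abs(diff / se))[:n_genes]
    return top, long_grp[top].mean(axis=1), short_grp[top].mean(axis=1)

def classify_by_centroid(profile, favourable, unfavourable):
    """Assign a sample to the nearer group centroid (Euclidean distance)."""
    d_fav = np.linalg.norm(profile - favourable)
    d_unf = np.linalg.norm(profile - unfavourable)
    return "favourable" if d_fav <= d_unf else "unfavourable"

# Synthetic example: 200 genes x 40 samples, genes 0-4 track survival.
rng = np.random.default_rng(0)
survival = rng.uniform(6, 120, 40)                  # months (synthetic)
z = (survival - survival.mean()) / survival.std()
expr = rng.normal(0.0, 1.0, (200, 40))
expr[:5] += 2 * z
top, fav, unf = select_genes_extreme_groups(expr, survival, n_genes=5)

# Step 2: classify the intermediate-survival samples with the derived profile.
q1, q3 = np.quantile(survival, [0.25, 0.75])
intermediate = np.where((survival > q1) & (survival < q3))[0]
calls = [classify_by_centroid(expr[top, j], fav, unf) for j in intermediate]
```

Because the intermediate samples play no part in gene selection, their classification acts as a partial internal check; an external test set, as used by Spentzos et al, remains the stronger test.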


Figure 1-14: Kaplan-Meier analysis of EOC patients classified into prognosis groups on the basis of a 115-gene expression profile. (A) Prognostic model applied to a validation set of EOC (independent of the cohort used to create the original model). (B) Model applied to the entire cohort. Highly significant differences between survival curves were observed (Spentzos et al., 2004).
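Survival curves such as those in Figure 1-14 are Kaplan-Meier (product-limit) estimates: at each observed event time, the running survival probability is multiplied by the fraction of at-risk patients who did not have the event. A minimal sketch follows; the function name and the toy cohort are hypothetical, and the sketch uses the common convention that a patient censored at time t is still counted as at risk at t.

```python
import numpy as np

def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times: follow-up time for each patient.
    events: 1 if the event (e.g. death) was observed, 0 if censored.
    Returns the distinct event times and S(t) just after each of them.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events == 1])
    surv = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)                   # still under follow-up at t
        deaths = np.sum((times == t) & (events == 1))  # events occurring at t
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return event_times, np.array(surv)

# Toy cohort of six patients (months); one patient at 8 months and the
# patient at 20 months are censored.
t, s = kaplan_meier([5, 8, 8, 12, 20, 25], [1, 1, 0, 1, 0, 1])
```

For this toy cohort, S(t) falls to 5/6, 2/3, 4/9 and 0 at 5, 8, 12 and 25 months. Formal comparison of two such curves is usually made with a log-rank test, which is not shown here.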


The potential of microarray technology to reveal important information about the underlying causes of variation in EOC survival rates, and to uncover novel markers of disease development and progression, was demonstrated by Lancaster et al (Lancaster et al., 2004). A list of differentially expressed genes was obtained by comparing 31 advanced stage serous EOCs from patients with survival times of either (i) less than two years or (ii) more than seven years. A gene called Tumour Necrosis Factor-related Apoptosis-inducing Ligand (TRAIL) was flagged for further validation after it was found to be expressed 7.4-fold higher in ovarian cancer than in normal epithelium. It was also expressed 1.5-fold higher in patients with longer survival times than in those with shorter survival times in the cohort investigated.

Using RT-PCR profiling in a follow-up study of 120 EOCs, the authors described a significant relationship between TRAIL expression and increased length of survival: patients who lived for more than five years had 2.2-fold higher expression of this gene than those who died within 12 months of diagnosis (Lancaster et al., 2003). TRAIL is a member of the “death ligand” family involved in the regulation of apoptosis, increasing the chemosensitivity of tumours in which it is expressed at high levels. An independent study demonstrated that the combination of TRAIL and chemotherapy led to a significant increase in apoptosis and growth inhibition of EOC cell lines, further adding weight to the potential clinical use of this molecule to improve the effectiveness of current chemotherapeutics (Cuello et al., 2001).

In another study with clinically relevant findings, Hartmann et al (Hartmann et al., 2005) identified a gene expression signature that discriminated between ovarian cancer patients with either a short or long time to recurrence following platinum-paclitaxel combination chemotherapy. This type of platinum-based chemotherapy given after surgery has the highest clinical benefit as defined by response rate, time to recurrence and overall survival, making it the current standard of care for EOC (Harries and Gore, 2002a; McGuire et al., 1996). Gene expression profiling of a cohort of 79 patients with advanced stage, high grade EOC was carried out using cDNA microarrays. A 14-gene signature was identified that classified patients into either an early (≤21 months) or late (>21 months) relapse category with an accuracy of 86%, as measured by cross-validation of the dataset and by the use of an independent test cohort of patients not involved in the determination of the 14-gene signature. This study demonstrates that gene expression data may be able to identify those EOC patients at risk of early disease relapse, making them candidates for more aggressive treatment modalities or novel therapies. This analysis, however, was limited by not considering other prognostic variables such as residual disease levels or patient age. There was also very little overlap between the genes in this prognostic signature and those identified in other studies of EOC; however, as discussed later, this phenomenon has also been observed for other cancer types and reflects the heterogeneity of EOC, the method of gene selection and the sample size (Ein-Dor et al., 2005).
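Cross-validated accuracy, as reported for the 14-gene signature, estimates how well a classifier performs on samples it was not trained on. The sketch below is illustrative only and does not reproduce the method of Hartmann et al: it assumes a simple nearest-centroid classifier, a mean-difference gene ranking and hypothetical function names, and performs leave-one-out cross-validation. Gene selection is repeated inside every fold, because selecting genes on the full dataset first would leak information from the held-out sample and inflate the accuracy estimate.

```python
import numpy as np

def top_genes(x, y, k):
    """Indices of the k genes with the largest absolute difference in class means.

    x: samples x genes array; y: binary labels (0 = early relapse, 1 = late relapse).
    """
    diff = x[y == 1].mean(axis=0) - x[y == 0].mean(axis=0)
    return np.argsort(-np.abs(diff))[:k]

def nearest_centroid(train_x, train_y, test_x):
    """Predict each test sample's label as that of the closest class centroid."""
    classes = np.unique(train_y)
    centroids = np.array([train_x[train_y == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(test_x[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]

def loocv_accuracy(x, y, k=14):
    """Leave-one-out accuracy with gene selection repeated inside every fold."""
    n = len(y)
    correct = 0
    for i in range(n):
        train = np.arange(n) != i
        genes = top_genes(x[train], y[train], k)   # chosen without sample i
        pred = nearest_centroid(x[train][:, genes], y[train], x[i:i + 1, genes])
        correct += int(pred[0] == y[i])
    return correct / n

# Synthetic data: 30 tumours x 500 genes, 5 genes shifted in the late-relapse group.
rng = np.random.default_rng(1)
y = np.array([0] * 15 + [1] * 15)
x = rng.normal(size=(30, 500))
x[y == 1, :5] += 3.0
acc = loocv_accuracy(x, y, k=5)
```

With a sample size such as 79, a cross-validated figure of this kind can still vary considerably between datasets, which is one reason an independent test cohort, as used by Hartmann et al, is valuable.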

1.4. Low malignant potential ovarian cancer

1.4.1. Molecular background and clinical information

Cancers of the ovary are a heterogeneous class of malignancies (Hernandez et al., 1984). Classification is primarily carried out according to cell type, with the main subtypes described in section 1.3.2. These labels refer to the histological appearance of the tumour as observed by the pathologist. Each of these major categories is then further classified according to the behaviour of the tumour: benign, malignant or low malignant potential (LMP). The latter was sometimes referred to in the past as ‘borderline’, as it was once thought of as an intermediate stage between benign and malignant disease rather than an entity in its own right (World Health Organization, 1999).

The LMP subtype is of keen interest in EOC research because it shares several characteristics with its invasive counterpart, yet has a markedly different clinical course. Several important characteristics distinguish this subtype, which was introduced into the FIGO classification in 1971 (International Federation of Gynecology and Obstetrics, 1971), from the invasive form of the disease. These include:

Atypical cellular proliferation without stromal invasion, despite other malignant characteristics such as cellular stratification and nuclear atypia

Significantly better prognosis; the 5-year survival rate for women diagnosed with

stage 1 disease is in excess of 95% (compared to 30% for all EOC)

Younger age of patients at diagnosis; in a retrospective study the median age of

the 339 women with LMP tumours was found to be 39 years (Zanetta et al.,

2001).


The efficacy of conservative (fertility-sparing) treatment, as indicated by the higher 5-year survival rate for this subtype of EOC compared with the subtypes described previously.

LMP ovarian tumours account for 15% of all EOC diagnoses (Ries LAG, 2004). Debate once existed as to whether these tumours are a separate class from invasive EOC or represent a transitional stage from benign to invasive cancer (Kurman and Trimble, 1993). There is now a general consensus that true LMP tumours rarely develop invasive characteristics. Occasionally, LMP tumours of the mucinous or endometrioid type are associated with invasive carcinomas in the same patient, unlike the serous type, which rarely appears to progress or to be associated with invasive disease.

The overall rate of conversion from true LMP tumours (i.e. not metastases from a primary tumour in another tissue) to invasive carcinoma is extremely low. In the Zanetta et al study of 339 women with LMP disease, two percent of LMP tumours progressed to invasive disease (Zanetta et al., 2001). Consequently, many clinicians and scientists no longer refer to this cancer type as ‘borderline’ EOC, a label implying the previously held belief that this form of EOC represents a transitional entity rather than a clinically distinct tumour subtype.


Figure 1-15: Survival rates for (A) serous LMP, (B) serous invasive, (C) mucinous LMP and (D) mucinous invasive EOC. For both histological subtypes, LMP tumours have a markedly higher survival rate than invasive tumours. Data are grouped by disease stage, with greater tumour spread corresponding to shorter survival (Sherman et al., 2004).


In the past, the treatment of LMP tumours with chemotherapy has been controversial, reflecting uncertainty about how this subtype relates to the more common invasive form of EOC. Because of the low mitotic index of LMP cells, some have argued that chemotherapy is theoretically incapable of producing the desired response (Kurman and Trimble, 1993), while others contend that a high response rate is observed despite this characteristic (Fort et al., 1989). In current practice, treatment usually involves conservative surgery followed by observation (Trope et al., 2000). In a review of retrospective patient data, Kurman and Trimble (Kurman and Trimble, 1993) found that deaths caused by complications of chemotherapy or radiotherapy exceeded those caused by progression to invasive disease. A number of studies have reported no significant difference in the frequency of disease recurrence or progression between women who did and did not receive postoperative chemotherapy, and it is therefore now rarely used to treat this form of EOC (Kliman et al., 1986; Nikrui, 1981; Trope et al., 1993). At the 10-year survival point used by some studies, 97% of women diagnosed with serous LMP tumours are alive, compared with only 30% of those with invasive disease (Sherman et al., 2004). This difference in survival highlights the importance of exploring the molecular differences between these two classes of EOC.

1.4.2. Molecular characteristics of LMP tumours

Several genetic mutations have been shown to be differentially represented between LMP and invasive ovarian tumours. For example, TP53 mutations and somatic or germ-line BRCA1/2 abnormalities are commonly observed in invasive EOC but not in LMP tumours. Conversely, point mutations in the KRAS and BRAF genes and microsatellite instability are well-documented traits of the LMP type (Russell and McCluggage, 2004; Singer et al., 2003).

KRAS and BRAF are members of the RAS-RAF-MEK-ERK-MAP kinase pathway (hereafter referred to as the RAS pathway), the primary function of which is to control how a cell responds to a range of growth signals (Davies et al., 2002). The RAS pathway is mutated in approximately 15% of all human cancers and is involved in regulating the tumour-suppressing functions of the TP53 pathway. To determine the role of BRAF and KRAS in EOC, Singer et al (Singer et al., 2003) screened a series of serous LMP and invasive tumours for three common mutations. Fifteen of 22 (68%) invasive micropapillary serous carcinomas (MPSCs) were found to have mutations in either codon 599 of BRAF or codons 12 and 13 of KRAS. MPSCs display a micropapillary architecture and low grade nuclei, and are thought to arise from atypical serous tumours, as opposed to conventional serous carcinomas, which are thought to develop de novo (Smith Sehdev et al., 2003). Thirty-one of 51 (61%) serous LMP tumours tested contained the same BRAF or KRAS mutations, suggesting a shared pathway of carcinogenesis between MPSC and LMP disease involving these members of the RAS pathway. Singer et al also tested 72 high-grade invasive serous carcinomas, and neither type of mutation was detected. The restriction of KRAS and BRAF mutations to low grade serous carcinomas thus suggests separate developmental pathways for low- and high-grade serous EOC. No tumour tested had mutations in both the KRAS and BRAF genes.

Based on findings such as these, a model for the development of EOC was proposed by Shih and Kurman (Shih Ie and Kurman, 2004), illustrated in Figure 1-16. The model comprises two main pathways that can lead to EOC and attempts to resolve the position of the LMP type within the spectrum of ovarian malignancies. The two pathways are:

Type I: Low grade neoplasms that arise in a linear fashion from LMP tumours.

This type is composed of low grade serous tumours, mucinous, clear-cell and

endometrioid carcinomas as well as malignant Brenner tumours.

Type II: High grade neoplasms that do not arise from a precursor lesion or a morphologically recognisable transitional state. High grade serous carcinomas, malignant mixed mesodermal tumours (carcinosarcomas) and undifferentiated carcinomas make up this group.

One of the key reasons for this division is that the molecular changes associated with Type I tumours, described above, are rarely found in Type II tumours. Other than the high frequency of TP53 mutations, little is known about other genetic alterations present in Type II tumours.

It is currently believed that advanced stage LMP tumours, defined by the detection of nodal metastases or peritoneal implants (Rao et al., 2004), are not precursors of grade 1 invasive serous EOC, a hypothesis supported by studies such as that of Ortiz et al (Ortiz et al., 2001). That study focused on a group of eight patients who initially presented with advanced stage serous LMP tumours and later developed grade 1 invasive serous disease. PCR-based single-strand conformation polymorphism (SSCP) analysis was used to investigate mutations in TP53 and KRAS. The mutations in the primary and secondary tumours differed in seven of the eight cases, suggesting that the secondary tumours had arisen independently of the previous LMP tumour.


Figure 1-16: Proposed two-pathway model of ovarian carcinoma development. The Type I pathway has frequent BRAF/KRAS mutations, low cellular proliferation, a gradual increase in chromosomal instability (CIN) and a 5-year survival rate of approximately 55%. The Type II pathway has a high frequency of TP53 mutation, higher cellular proliferation and CIN, and a lower 5-year survival rate of approximately 30% (Shih Ie and Kurman, 2004).


1.4.3. Mucinous EOC and tumours metastatic to the ovary

The relationship between mucinous LMP and mucinous invasive EOC is complicated by the fact that invasive mucinous tumours found in the ovary are often metastases from another primary site (Ronnett et al., 2004). In one study, 40 of 52 (77%) mucinous tumours of the ovary collected in a consecutive series of 124 ovarian malignancies were found to be of metastatic origin. Of the remaining 12, three were atypical proliferative mucinous tumours with microinvasion (Seidman et al., 2003). Overall, only three tumours from the total cohort of 124 (2.4%) were classified as primary invasive mucinous EOC. Mucinous EOC has historically been reported to represent up to 25% of all ovarian cancer diagnoses, although recent advances in the interpretation of histological features, immunohistochemistry and other molecular classification methods suggest that the actual proportion may be substantially lower. Consistent with the Seidman et al figures, only 2% of patients recruited to the Australian Ovarian Cancer Study (http://www.aocstudy.org/) have a confirmed diagnosis of primary invasive mucinous EOC (Unpublished data).

The most common origins of mucinous tumours metastatic to the ovary are the gastrointestinal tract (45% of the Seidman et al metastatic cases), pancreas (20%), gynaecological sites such as the cervix or endometrium (18%), breast (7%) and unknown primary sites (10%). Seidman et al proposed a rule for identifying primary mucinous carcinomas: any unilateral tumour 10 cm or more in diameter is deemed to have arisen in the ovary, and all others are considered metastatic. This rule correctly classified 90% of the tumours in the cohort, although no independent validation was carried out (Seidman et al., 2003). Other histological parameters indicative of metastasis include surface implants (i.e. microscopic surface involvement by epithelial cells) and an infiltrative pattern of invasion (Lee and Young, 2003).
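The size and laterality rule of Seidman et al is simple enough to express directly, and writing it out makes its decision boundary explicit. The sketch below is a hypothetical encoding (the class, function names and example cases are invented for illustration and are not data from the study); it also shows how an accuracy figure such as the reported 90% is calculated against a reference diagnosis.

```python
from dataclasses import dataclass

@dataclass
class MucinousTumour:
    bilateral: bool      # tumour present in both ovaries
    diameter_cm: float   # largest tumour diameter

def seidman_call(t: MucinousTumour) -> str:
    """Unilateral tumours >= 10 cm are called primary; all others metastatic."""
    if not t.bilateral and t.diameter_cm >= 10.0:
        return "primary"
    return "metastatic"

def rule_accuracy(tumours, truth):
    """Fraction of tumours whose rule-based call matches the reference diagnosis."""
    calls = [seidman_call(t) for t in tumours]
    return sum(c == g for c, g in zip(calls, truth)) / len(truth)

# Hypothetical cases, not data from Seidman et al.
cases = [MucinousTumour(False, 12.0), MucinousTumour(True, 15.0),
         MucinousTumour(False, 8.0), MucinousTumour(False, 10.0)]
reference = ["primary", "metastatic", "primary", "primary"]
acc = rule_accuracy(cases, reference)
```

Here the third case, a small unilateral primary tumour, is misclassified, giving an accuracy of 0.75. Because the 90% figure was measured on the same cohort from which the rule was derived, it is likely to be optimistic until the rule is validated independently.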

Recent studies suggest that mucinous ovarian carcinomas may have a substantially better prognosis than previously described, most likely because metastatic tumours from the pancreas and intestines were misclassified as primary ovarian tumours and wrongly included in survival analyses (Lee and Scully, 2000; Ronnett et al., 1997). Findings that most, if not all, ovarian mucinous cystic tumours associated with a grossly visible accumulation of mucus in the pelvis or abdomen (pseudomyxoma peritonei) are metastatic from the appendix (or sometimes elsewhere in the gastrointestinal tract) have prompted calls for changes to the official staging systems, such as those of FIGO, the Union Internationale Contre le Cancer (UICC) and the American Joint Committee on Cancer (AJCC).

Because of the controversy surrounding the classification and treatment of mucinous and serous ovarian tumours (LMP and invasive), microarray analysis is a suitable approach for gaining insight into the underlying molecular differences between these classes of EOC (Benedet et al., 2000). Comparing the gene expression profiles of these tumour types may yield an increased understanding of the genes and molecular events responsible for a true LMP tumour’s inability to invade the surrounding tissues. Therapeutic manipulation of these processes may therefore have the potential to greatly reduce the mortality of invasive EOC.

Given how frequently metastatic tumours are incorrectly diagnosed as primary ovarian disease, accurate pathological diagnosis based on up-to-date EOC classification guidelines is essential for any microarray-based study, to avoid contaminating the data with non-ovarian gene expression profiles. Possibly reflecting the difficulty of this task, only a small number of studies comparing gene expression data generated from true LMP and invasive EOC have been published to date.

1.4.4. Existing microarray profiling studies of LMP ovarian cancer

One of the first groups to publish a microarray-based analysis of LMP and invasive ovarian carcinoma was Lee et al (Lee et al., 2003), using the Atlas 1.2k cDNA array platform (Clontech, USA). Of the 1176 features on the nylon array, 76 (6.5%) were identified as differentially expressed between normal ovarian tissue (n=4), LMP (n=2) and invasive EOC (n=4). A higher proportion of genes were upregulated in the invasive tumours relative to the less malignant types, although little information was given about the statistical method used to define differential expression or about the selection and review of the samples involved. Several of the differentially expressed genes observed in this study had previously been implicated in the glucose/insulin pathway (e.g. S100A1, ERBB3, HMG1), suggesting a role for this pathway in the progression of EOC. Both the small sample size and the type of microarray used are limiting factors in this study. However, a number of biologically relevant differentially expressed genes were identified, and a pathway describing their potential interactions with the well-characterised glucose/insulin pathway was deduced. A link was also made between molecular events in ovarian and breast cancer, based on a number of the genes having previously been implicated in breast cancer studies. These include COUPTFII, one of the few genes downregulated in the invasive EOC cases, which shows reduced expression in 30% of breast cancers and has been demonstrated to bind to the insulin promoter as well as to influence the expression of CCND1 and p21.

Warrenfeltz et al (Warrenfeltz et al., 2004) used Affymetrix U95A GeneChips to profile a small cohort (n=18) of ovarian tumours, including two mucinous and two serous LMP tumours, with the goal of identifying genes that correlated with malignant potential. A set of 163 genes (1.6% of reliably detected probe sets) was found to be differentially expressed between the benign, LMP and invasive tumours compared. Malignant potential was related to loss of insulin-like growth factor (IGF) binding proteins and of molecules involved in the regulation of cell adhesion, and there were several examples of multiple differentially expressed genes sharing chromosomal locations. The authors state that a significant proportion of the genes identified had expression levels in the LMP tumours intermediate between those of the benign and invasive samples, and suggest this as evidence that LMP tumours represent a transitional state between the two extremes of benign and invasive EOC. However, the small sample size, the lack of information about the pathology review used to confirm the primary ovarian origin of the samples, and the large body of contrary evidence in the literature weigh against this interpretation.

In a study focusing on serous LMP and invasive tumours, Gilks et al (Gilks et al., 2005) generated cDNA microarray profiles for 23 tumours subjected to thorough pathological review by two pathologists according to WHO criteria (World Health Organization, 1999). This study used comprehensive 43k cDNA microarrays; however, only a relatively small number of genes were identified as differentially expressed by supervised or unsupervised analyses, leading the authors to postulate that the mechanisms responsible for the phenotypic difference under investigation may lie outside the scope of microarray detection. A list of 541 genes (1.25% of the total microarray feature set) was identified as differentially expressed across the dataset by an unsupervised filter requiring a minimum 2-fold change in at least three samples.
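An unsupervised filter of the kind described above can be sketched in a few lines. The sketch assumes that the data are log2 ratios against a common reference and that both up- and down-regulation count towards the 2-fold criterion; these are assumptions, since the exact implementation used by Gilks et al is not described here, and the function name and example values are hypothetical.

```python
import numpy as np

def fold_change_filter(log2_ratios, min_fold=2.0, min_samples=3):
    """Flag genes whose absolute log2 ratio reaches log2(min_fold) in at least
    `min_samples` samples. Input is a genes x samples array; comparisons with
    NaN (missing spots) evaluate to False, so they never count as hits."""
    threshold = np.log2(min_fold)                 # 1.0 for a 2-fold change
    hits = np.abs(log2_ratios) >= threshold
    return hits.sum(axis=1) >= min_samples

ratios = np.array([
    [1.2, -1.5, 1.0, 0.0],   # 2-fold in three samples -> kept
    [0.5, 0.9, -0.2, 0.1],   # never reaches 2-fold -> dropped
    [1.1, 1.3, 0.0, 0.2],    # 2-fold in only two samples -> dropped
])
mask = fold_change_filter(ratios)
```

A filter of this type is blind to sample class, so it selects genes that vary anywhere in the dataset rather than genes that separate LMP from invasive tumours; a supervised test is needed for the latter.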

This figure seems disproportionately small for a microarray platform containing over 43,000 features and indicates that over 98% of the clones represented did not meet even this modest variation criterion. Furthermore, a permutation-based approach to identifying differentially expressed genes between invasive and LMP tumours also yielded a similarly small list of genes (n = 217; 0.5% of the total clone set). Somewhat unusually, all of these differentially expressed genes were overexpressed in the LMP type relative to the invasive tumours. This observation implies that no genes are expressed at a higher level in the invasive EOC subtype, which is known to have a comparatively faster growth rate and greater proliferative ability, biological processes known to involve substantial gene regulation.
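A permutation-based test, such as the one used to derive the 217-gene list, estimates significance by repeatedly shuffling the class labels and asking how often a difference as large as the observed one arises by chance. The single-gene sketch below is illustrative only: the statistic (difference in group means), the function name and the toy values are assumptions, not the procedure of Gilks et al. Applied genome-wide, the resulting p-values would additionally need multiple-testing control, for example a false discovery rate.

```python
import numpy as np

def permutation_pvalue(values, is_lmp, n_perm=5000, seed=0):
    """Two-sided permutation p-value for the difference in mean expression
    of one gene between two groups (boolean mask `is_lmp`)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    is_lmp = np.asarray(is_lmp, dtype=bool)
    observed = values[is_lmp].mean() - values[~is_lmp].mean()
    extreme = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(is_lmp)         # break the label-expression link
        diff = values[shuffled].mean() - values[~shuffled].mean()
        if abs(diff) >= abs(observed):
            extreme += 1
    return (extreme + 1) / (n_perm + 1)            # add-one avoids p = 0

# Toy gene: clearly higher in the five 'LMP' samples than in the five 'invasive' ones.
expr = [5.1, 5.3, 4.9, 5.2, 5.0, 1.0, 1.2, 0.9, 1.1, 1.3]
labels = [True] * 5 + [False] * 5
p = permutation_pvalue(expr, labels)
```

With only 252 distinct ways to split ten samples into two groups of five, the smallest attainable p-value is about 2/252, which illustrates how small cohorts limit the resolution of permutation tests.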

Whilst gene ontology analysis of the 217 differentially expressed genes was not carried out, one would expect a large number of differentially expressed cell cycle or proliferation genes to emerge from a comparison of LMP and invasive tumours because of their known difference in mitotic rate; however, based on the discussion of the selected genes, this does not appear to have been observed.

Gilks et al observe that many previous array studies of ovarian cancer have used RNA from cultured cell lines or normal OSE as a reference for normal ovary (Lu et al., 2004; Zorn et al., 2003). However, as the normal precursor of EOC is still in question and isolating uncontaminated surface epithelium in adequate quantities is notoriously difficult, this may not be the most suitable reference material for studying gene expression in malignant ovarian tissue. Several potential novel EOC markers identified from such studies, such as HE4, MUC1, MSLN and PAX8, were observed to have higher expression in the LMP tumours than in the invasive type in this study. The authors highlight this as a pitfall of not including LMP tumours in studies designed to investigate tumourigenic pathways or identify novel molecular markers.

Overall, there is a trend towards larger numbers of genes being upregulated in invasive EOC than in LMP tumours, the exception being the Gilks et al study, reasons for which are discussed further in Chapter 5. Genes involved in the insulin/glucose pathway have been identified as differentially expressed between the phenotypes by at least two independent studies. For the majority of LMP microarray datasets published to date, only limited analysis appears to have been carried out, with minimal in silico or other functional characterisation of the genes that appear to discriminate between the LMP and invasive subtypes.

The relationship of LMP tumours to invasive serous carcinomas of varying grade, or differentiation status, has recently been investigated using oligonucleotide microarray profiling. LMP tumours were found to share many more molecular characteristics with well differentiated (grade 1) invasive tumours than with moderately (grade 2) or poorly (grade 3) differentiated tumours, while the higher grade tumours showed a high degree of similarity in gene expression to one another. These microarray observations were supported by comparative genomic hybridisation (CGH), which revealed the same similarity between LMP and low grade invasive EOC: both generally showed far fewer chromosomal abnormalities than the grade 2 and 3 specimens (Meinhold-Heerlein et al., 2005). Taken together, these findings suggest that LMP EOC shares a common transcriptional profile with grade 1 invasive EOC, which is lost as dedifferentiation occurs in association with tumour progression.

1.4.5. Other microarray studies of invasive vs. non-invasive cancer subtypes

The ability of microarray-based studies to elucidate the molecular processes involved in other models of invasive vs. non-invasive cancer has been demonstrated by a number of other studies. These include analyses of breast cancer (Iacobuzio-Donahue et al., 2002; Kluger et al., 2004; Seth et al., 2003; van 't Veer et al., 2002; van de Vijver et al., 2002), gastric cancer (Notterman et al., 2001), bladder cancer (Dyrskjot et al., 2003) and prostate cancer (Bull et al., 2001; Calvo et al., 2002; Singh et al., 2002).

Invasive ductal breast carcinoma (IDC) and ductal carcinoma in situ (DCIS) represent two well characterised stages of breast cancer that are widely held to be part of a progression from normal to malignant tissue. In a comparison by Seth et al (Seth et al., 2003), 9k cDNA microarrays were used to generate an expression model consisting of 303 genes differentially expressed at a two-fold level between DCIS and IDC. The most upregulated genes in the invasive tumours were immunoglobulin heavy constant gamma 3 (IGHG3) and calgranulin B (S100A9), both known to be involved in the immune and inflammatory response to cancer.

Ma et al used laser microdissection of the premalignant stages of breast cancer to isolate sufficient quantities of uncontaminated RNA to profile against breast tumours of other stages (Ma et al., 2003). One unexpected finding from this study was the consistency of molecular profiles across tumours of distinct pathological stages. Significant changes were observed between the profiles of patient-matched normal breast epithelium and the earliest recognised premalignant stage, atypical ductal hyperplasia (ADH). These changes then appear to be maintained as the tumour progresses through the subsequent DCIS and IDC stages. This suggests that the metastatic potential of a tumour is determined in the very early stages of its development, a hypothesis supported by van 't Veer et al (van 't Veer et al., 2002), who predicted the metastatic potential of a large cohort of tumours from the expression pattern of a gene set already present in the primary tumour.

Another interesting finding from the Ma et al study was a set of genes associated with both tumour grade and the transition from DCIS to IDC. These genes may represent a connection between tumour stage and grade, suggesting that the mechanisms leading to loss of differentiation and increasing malignancy may also control a tumour’s invasive ability. RRM2 was found to correlate with advanced grade and stage and is thought to play a dual role, promoting accelerated cell proliferation as well as conferring invasive capacity on the tumour in which it is overexpressed.

While the model for breast cancer development and progression is far more established than those presently proposed for EOC, studies such as this indicate the potential clinical benefit of using microarrays to analyse tumour subtypes of varying metastatic potential. Multi-gene signatures capable of predicting complex clinical variables, such as the probability of disease recurrence and development of metastases, as well as individual genes potentially responsible for tumour invasion, can be identified by microarray-based studies that use appropriately reviewed sample cohorts and independent validation methods.

1.5. Summary and goals of this thesis

This review has described the positive impact on cancer research that has resulted from the advent of DNA microarray technology, along with some of its shortcomings and areas in which progress is still to be made.

It also covers the clinical and molecular background of EOC, the fifth leading cause of

cancer death in women world-wide (Ries LAG, 2004). There is a clear need for a greater

understanding of the precise molecular events that the epithelial cells of the ovary

undergo during the transition from a normal to a malignant state. Attempts to identify

novel prognostic markers or molecular signatures through the use of microarray analysis

are described, both for EOC and for other cancer types, such as breast cancer, for which significant progress has been made towards clinical application.


Furthermore, the potential benefits that may come from an understanding of the genes

and processes responsible for dictating either an invasive or non-invasive (LMP)

phenotype are also described, along with the current state of research into these areas.

This study therefore aims to:

(i) Experimentally determine the optimal conditions for carrying out a large-scale tumour profiling study using cDNA microarrays, including the selection of an appropriate reference RNA, methods for monitoring data quality, and the impact of scanning hardware, normalisation and replication.

(ii) Analyse a cohort of EOC gene expression data to identify genes or molecular

processes related to length of patient survival, thereby gaining an insight into

the malignant events responsible for death from this disease, and

(iii) Analyse differences in expression patterns between invasive and non-

invasive (LMP) EOC to identify genes responsible for the observed

phenotypic differences and clinical course of these disease subtypes.


2. Materials & Methods

2.1. Ethical Issues

This project took place during the establishment phase of the Australian Ovarian Cancer Study (AOCS) (http://www.aocstudy.org). As such, the ethical approvals required to conduct a molecular profiling study of human tissues obtained from a diverse range of hospitals and research institutes were obtained by the AOCS Management Committee, as detailed below.

2.1.1. Structure of ethical governance

Human Research Ethics Committee (HREC) approval was first obtained at Peter Mac

and the Queensland Institute of Medical Research (QIMR) (AOCS host institutions).

Approval for the study was then obtained from each of the 19 collaborating centres across

the country. Thereafter, modifications to the protocol were first considered by Peter Mac

and QIMR, and then submitted to each of the collaborating centres.

2.1.2. Ethical use of human tissues

Tissues removed from patients in the normal course of that person’s treatment are

required by law to be stored (archived) for diagnostic or forensic reasons. This tissue can

be used for teaching and quality assurance without the consent of the patient and is not

considered tissue banking.

This project falls within the definition of tissue banking given in the National Health and Medical Research Council's (NHMRC) “National Statement on Ethical Conduct in Research Involving Humans”: “The collection and storage of human tissue into a database specifically for the purpose of medical research which researchers and the HREC have deemed conforms to the guidelines outlined by the National Statement” (NHMRC, 1999).

The principle of obtaining informed consent prior to collection of tissue samples was

strongly adhered to in this study. Patients scheduled for surgery with a suspected

diagnosis of ovarian cancer were identified through the surgeons or hospital pre-

admission clinics. A research nurse or research assistant approached patients and

explained the study. Using a research assistant rather than the treating clinician was considered preferable for recruitment, to avoid the potential for unintentional coercion. Written informed consent was obtained from all patients.

All aspects of the study were explained to each subject, who was then asked to volunteer for blood and tissue collection and follow-up.

The disclosure of “no duty of care” towards the subject was emphasized during the

consent process. No researcher in the study was a primary care provider for the subject

and any medical questions that arose during the study were referred to the treating doctor.

Any concerns on the part of the researcher were discussed with the treating surgeon and,

if necessary, addressed by the Peter Mac HREC.

All tissue was catalogued using a unique identifier to protect the privacy of the individual. Access to identifying information was necessary for initial case discovery and for clinical information collection, including follow-up; however, this access was restricted to the Chief Investigators, the Program Manager, and the Research Nurses dealing with individual patients.

2.1.3. Patient Identifiers used in this thesis

No personally identifying information can be interpreted from the system used to

enumerate biological specimens in this study. In general, the method used involves letters

and numbers which refer to the hospital/institute where the specimen was processed and

stored and the order in which they were selected for the study. This does not necessarily

relate to the location or time that a patient received treatment.

2.1.4. Protection of privacy

The privacy of participants was maintained in a number of ways and conformed to the

provisions of the Privacy Act 2001 (Privacy Act No.119 1988 as amended)

(www.privacy.gov.au). Personal identifiers were retained in the master database and were

accessible only to the Tissue Bank Manager, Chief Investigators and staff with specific

access rights. Electronic databases containing patient information and questionnaire data

were stored on a server at QIMR and backed up regularly. Study nurses around Australia

had password protected on-line access to the patient database for real-time monitoring of

recruitment. Data transmitted to and from the database were protected with a 128-bit encryption algorithm implemented via Secure Sockets Layer (SSL). The web server sat behind a firewall implemented by the QIMR IT department. Biospecimen, pathology and clinical data were stored in password-protected electronic databases on a firewall-protected server at the Peter MacCallum Cancer Centre that was backed up regularly.

Records were kept in locked cabinets within the Tissue Bank and electronic records were

kept on the database, behind a firewall and utilizing standard Microsoft Windows 2000

administrative security measures. All records and communications with patients were

kept confidential and de-identified once entered into the database.

2.1.5. Ethical contingencies

There were no adverse events during the study that required HREC intervention.

2.2. Pathology review and associated tumour classifications

Standard pathology procedures were used to review EOC specimens for inclusion in this

study. A number of classifications were used to create groupings of patients which could

be used to compare gene expression profiles, described below.

2.2.1. Assessment of relative percentage tumour content

Haematoxylin and eosin (H&E) stained sections of fixed or fresh tumour were analysed by either Dr Melissa Robbie or Dr Paul Waring to determine their suitability for microarray, RT-PCR or IHC analysis. Sections were reviewed for percentage necrosis by cross-sectional area and for percentage tumour epithelial cells by the tumour nuclei method (i.e. the percentage of tumour cells present). This approach was chosen because RNA content is likely to correlate best with the percentage of cells: large areas of collagen containing occasional fibroblasts are likely to have a different RNA profile from a section composed of densely packed epithelial cells. Estimates were made from a light microscopy survey of the whole section at low to medium power.

2.2.2. Residual disease

The level of residual disease remaining after debulking surgery was categorised by

measurement of the thickness of the largest visible area of tumour remaining after

surgery.

The categories used were: 0 cm, Nil; 0-1 cm, Minimal; 1-2 cm, Moderate; >2 cm, Maximum.
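The residual disease categories amount to a simple lookup. The sketch below is a hypothetical helper, not part of the original analysis; in particular, the text does not state how values lying exactly on a boundary (1 cm or 2 cm) were assigned, so it assumes upper bounds are inclusive.

```python
def residual_disease_category(largest_residual_cm: float) -> str:
    """Map the largest residual tumour dimension (cm) to a residual disease category.

    Boundary handling is an assumption: upper limits are treated as inclusive.
    """
    if largest_residual_cm < 0:
        raise ValueError("residual disease size cannot be negative")
    if largest_residual_cm == 0:
        return "Nil"
    if largest_residual_cm <= 1:
        return "Minimal"
    if largest_residual_cm <= 2:
        return "Moderate"
    return "Maximum"


print([residual_disease_category(x) for x in (0, 0.5, 1.5, 3.0)])
# ['Nil', 'Minimal', 'Moderate', 'Maximum']
```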


2.2.3. Tumour grade

Tumours were graded according to the level of cellular differentiation observed by the

reviewing pathologist. Grade 1: the least malignant appearance with well differentiated

cells, Grade 2: intermediate with moderately differentiated cells and Grade 3: the most

malignant, with poorly differentiated cells.

2.2.4. Tumour stage

Tumours were staged based on the international FIGO staging guidelines (Benedet et al.,

2000), detailed in Appendix A.

2.2.5. Patient status

In order to identify those patients suitable for the gene expression analysis of survival times carried out in Chapter 4, a classification system was used to define the status of each patient at the time of last follow-up. The status codes are:

0 = Patient alive, disease absent

1 = Patient alive, disease present

2 = Patient deceased from cancer

3 = Patient deceased from other causes

4 = Patient deceased as a result of treatment

5 = Patient deceased, cause unknown

6 = Patient lost to follow-up, disease absent at last point of contact

7 = Patient lost to follow-up, disease present at last point of contact

8 = No registry follow-up
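For survival analysis these status codes must be collapsed into an event indicator. The mapping below is one plausible choice for disease-specific survival and is an assumption, not necessarily the rule applied in Chapter 4: only code 2 (death from cancer) counts as an event, codes 0 to 7 are otherwise treated as censored at last follow-up, and code 8 is excluded because no follow-up is available. The cohort records are invented for illustration.

```python
from typing import Optional

STATUS_CODES = {
    0: "alive, disease absent",
    1: "alive, disease present",
    2: "deceased from cancer",
    3: "deceased from other causes",
    4: "deceased as a result of treatment",
    5: "deceased, cause unknown",
    6: "lost to follow-up, disease absent at last contact",
    7: "lost to follow-up, disease present at last contact",
    8: "no registry follow-up",
}


def disease_specific_event(status: int) -> Optional[int]:
    """1 = death from cancer (event), 0 = censored, None = exclude (assumed mapping)."""
    if status not in STATUS_CODES:
        raise ValueError(f"unknown status code: {status}")
    if status == 8:
        return None
    return 1 if status == 2 else 0


# Hypothetical cohort: (patient ID, months of follow-up, status code)
cohort = [("A01", 14.0, 2), ("A02", 40.5, 0), ("A03", 22.0, 3), ("A04", 0.0, 8)]
usable = [
    (pid, months, disease_specific_event(code))
    for pid, months, code in cohort
    if disease_specific_event(code) is not None
]
print(usable)  # [('A01', 14.0, 1), ('A02', 40.5, 0), ('A03', 22.0, 0)]
```

For an overall-survival analysis, codes 3, 4 and 5 would typically also be counted as events; the choice should follow the endpoint actually being modelled.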

2.3. In-vitro methods

Table 2-1: General reagents and suppliers for in vitro work.

| Reagent | Supplier |
|---|---|
| Acetic acid | BDH |
| Agarose | Progen |
| β-mercaptoethanol | Sigma |
| Boric acid | BDH |
| 1st strand buffer | Invitrogen |
| Chloroform | BDH |
| Cot-1 DNA (10 mg/mL) | GIBCO |
| Cyanine-3 dCTP (Cy-3) | Renaissance |
| Cyanine-5 dCTP (Cy-5) | Renaissance |
| DAKO LSAB+ kit | DAKO |
| Denhardt's solution | Sigma |
| 3,3'-diaminobenzidine (DAB) | DAKO |
| Diethyl pyrocarbonate (DEPC) | Sigma |
| Dimethylsulphoxide (DMSO) | Sigma |
| Dithiothreitol (DTT) | Invitrogen |
| 50X Low C dNTP | Amersham Biosciences |
| Eosin | BDH |
| Ethanol (99.7-100%) | BDH |
| Ethidium Bromide | Sigma |
| Ethylenediamine tetra-acetic acid (EDTA) | Boehringer Mannheim |
| Ficoll | Amersham Biosciences |
| Foetal calf serum (FCS) | Trace |
| tRNA (4 mg/mL) | Sigma |
| Tween-20 | Bio-Rad |
| Xylene cyanol | Ajax |

2.3.1. Construction of cDNA microarrays

Human cDNA microarrays were supplied by the Peter MacCallum Cancer Centre

Microarray Core Facility using protocols described in Sambrook and Bowtell (Sambrook

and Bowtell, 2003). Briefly, clone inserts from a set of approximately 10,500 cDNA

clones (‘10.5K clone set’) were PCR amplified. These were then printed on superamine

glass slides (Telechem) using an ESI Chipwriter Pro (Perkin Elmer) robotic arrayer. The

10.5K clone set contained approximately 8,000 sequence-verified cDNA clones, predominantly corresponding to named human genes, obtained from Research Genetics

(USA). In addition to these, approximately 2,500 clones were picked from a larger 40K

Research Genetics human cDNA clone collection based on their relevance to cancer-

related processes as determined by literature searching.

The 10.5k cDNA microarray contains 11,088 unique features or probes, including the

Lucidea Microarray Scorecard System (GE Healthcare, USA). Of these features, 9,857 correspond to unique GenBank accession numbers, which in turn represent 7,833 individual UniGene clusters (UniGene Build #184).

Full array details and feature identities can be viewed in the European Bioinformatics Institute (EBI) online database ArrayExpress (Brazma et al., 2003) under the Array ID “A-MEXP-28”. This database contains information about the printing configuration of the Peter Mac 10.5k cDNA microarray as well as MIAME-compliant descriptions of all protocols used (Brazma et al., 2001).

Sequencing of several hundred clones, selected from a large number of experiments carried out using the services of the Peter Mac Microarray Facility, has shown the rate of incorrect feature identity assignment to be less than 3%.

2.3.1.1. Lucidea Microarray Scorecard 1.0

A number of statistical comparisons in this work are made using data from the Lucidea Microarray Scorecard system. This kit includes a series of control targets which are incorporated into the design of the cDNA microarray, specifically in the last row of each sub-grid of the Peter Mac 10.5k cDNA microarray. Also included are control mRNA spike mixes, consisting of in vitro transcribed intergenic region mRNAs that correspond to the dynamic range and Cy3:Cy5 ratio control features (listed in Table 2-2). When these spike mixes are added to the mRNA sample of interest prior to hybridisation, they can be used to assess the sensitivity of the experiment quantitatively.

Table 2-2: Details of the Lucidea Microarray Scorecard quality control features present on Peter Mac 10.5k cDNA microarray.

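| Feature | Cy3:Cy5 ratio | Cy3 (pg/5 µl mix) | Cy5 (pg/5 µl mix) | Relative abundance |
|---|---|---|---|---|
| 1RC | 1:3 | 1000 | 3000 | NA |
| 2RC | 3:1 | 3000 | 1000 | NA |
| 3RC | 1:10 | 1000 | 10000 | NA |
| 4RC | 10:1 | 10000 | 1000 | NA |
| 1DR | 1:1 | 33000 | 33000 | 3.3% |
| 2DR | 1:1 | 10000 | 10000 | 1% |
| 3DR | 1:1 | 1000 | 1000 | 0.1% |
| 4DR | 1:1 | 330 | 330 | 0.033% |
| 5DR | 1:1 | 100 | 100 | 0.01% |
| 6DR | 1:1 | 33 | 33 | 0.0033% |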
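Because the Cy3 and Cy5 spike amounts in Table 2-2 are known, each control feature has an expected log2(Cy5/Cy3) value against which the measured ratio can be compared. The sketch below computes that expected value and the deviation of an observed spot from it. The function names are hypothetical, and the observed signals are assumed to be background-corrected intensities taken from image analysis output.

```python
import math

# (Cy3 pg, Cy5 pg) per 5 µl spike mix, from Table 2-2
SCORECARD_SPIKES = {
    "1RC": (1000, 3000), "2RC": (3000, 1000),
    "3RC": (1000, 10000), "4RC": (10000, 1000),
    "1DR": (33000, 33000), "2DR": (10000, 10000), "3DR": (1000, 1000),
    "4DR": (330, 330), "5DR": (100, 100), "6DR": (33, 33),
}


def expected_log2_ratio(feature: str) -> float:
    """Expected log2(Cy5/Cy3) for a Scorecard control feature."""
    cy3_pg, cy5_pg = SCORECARD_SPIKES[feature]
    return math.log2(cy5_pg / cy3_pg)


def ratio_deviation(feature: str, cy5_signal: float, cy3_signal: float) -> float:
    """Observed minus expected log2 ratio; 0 means the spiked ratio was recovered exactly."""
    return math.log2(cy5_signal / cy3_signal) - expected_log2_ratio(feature)


# Hypothetical background-corrected signals for the 3:1 (Cy3:Cy5) control 2RC
print(round(ratio_deviation("2RC", cy5_signal=1200.0, cy3_signal=3000.0), 3))  # 0.263
```

The dynamic range (DR) features all have an expected log ratio of 0, so for those spots the signal intensity, rather than the ratio, indicates how far down the abundance series the experiment can still detect.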

2.3.2. Collection and processing of tumour samples

Fresh frozen tumour specimens were collected through the Australian Ovarian Cancer Study collection sites: the Peter MacCallum Cancer Centre (Melbourne), Royal Brisbane Hospital (Brisbane) and Westmead Hospital (Sydney). Archival paraffin-embedded tissue specimens were collected from St Vincent's Hospital (Melbourne) according to the same ethical criteria.


2.3.2.1. Extraction of RNA from frozen tissues

Total RNA was isolated from fresh frozen tumour specimens with assistance from Sophie

Katsabanis and Dileepa Diyagama, Peter Mac Microarray Facility staff. Total RNA from

ovarian tumours and common reference cell lines was isolated using phenol-chloroform

extraction (TRIzol; Invitrogen) and purified by column chromatography (RNeasy;

QIAGEN). The common reference RNA, containing pooled RNA from 11 human tumour

cell lines, was prepared as described previously (Pollack, 2002).

2.3.2.2. RNA amplification, labelling and hybridisation

Total RNA was used to amplify mRNA using a modified Eberwine protocol (Van Gelder, 1990). Briefly, 3 µg of total RNA was primed for cDNA synthesis with an oligo(dT) primer carrying a T7 RNA polymerase promoter sequence at its 5′ end. After second-strand synthesis in the presence of E. coli RNase H and DNA polymerase I, the double-stranded template was linearly transcribed using T7 RNA polymerase (Ambion). Amplified RNA was cleaned up using RNeasy mini columns according to the manufacturer's protocol (QIAGEN). After verification of antisense RNA (aRNA) quality and quantity, the aRNA was stored at –80˚C until required for labelling.

2.3.3. Construction of reference RNA pools

2.3.3.1. 11 cell line reference

The construction of universal cell-line derived reference RNA was as first described by

Perou (Perou et al., 2000) and also Sambrook et al (Sambrook and Bowtell, 2003).

Briefly, mRNA was combined in equal proportions from the following cell lines: MCF7

(breast adenocarcinoma, ATCC catalogue number HTB-22), Hs578T (breast

adenocarcinoma, HTB-126), NTERA-2 cl.D1 (testicular embryonal carcinoma, CRL-

1973), Colo205 (colorectal adenocarcinoma, CCL-222), OVCAR-3 (ovarian

adenocarcinoma, HTB-161), MOLT-4 (acute lymphoblastic leukemia, CRL-1582),

RPMI-8226 (myeloma, CCL-155), SW-872 (liposarcoma, HTB-92), HEP-G2

(hepatocellular carcinoma, HB-8065), UACC-62 (melanoma, (Stinson et al., 1992)), NB4

+ATRA (acute promyelocytic leukemia, (Lanotte et al., 1991)).

Cells were grown to confluence, with the medium changed 48 hours prior to harvesting. Cells were pelleted by centrifugation and total RNA was extracted using TRIzol (Invitrogen) and RNeasy columns (Qiagen) according to the manufacturers' protocols. RNA was quantified by spectrophotometry, and equal proportions of each RNA type were mixed before aliquotting.
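Mixing equal proportions requires converting each spectrophotometric reading into a concentration and then into a volume. A minimal sketch, assuming the conventional approximation that one A260 unit corresponds to about 40 µg/mL of RNA, and using purely hypothetical readings; the helper names are illustrative only.

```python
# Conventional approximation: 1 A260 unit ~ 40 µg/mL RNA, numerically equal to 40 ng/µL.
RNA_NG_PER_UL_PER_A260 = 40.0


def rna_concentration_ng_per_ul(a260: float, dilution_factor: float) -> float:
    """Estimate RNA concentration (ng/µL) from a diluted A260 reading."""
    return a260 * RNA_NG_PER_UL_PER_A260 * dilution_factor


def pooling_volumes(concentrations: dict, ng_per_line: float) -> dict:
    """Volume (µL) of each cell-line RNA needed to contribute ng_per_line ng to the pool."""
    volumes = {}
    for name, conc in concentrations.items():
        if conc <= 0:
            raise ValueError(f"{name}: concentration must be positive")
        volumes[name] = ng_per_line / conc
    return volumes


# Hypothetical readings for two of the eleven cell lines
concs = {
    "MCF7": rna_concentration_ng_per_ul(a260=0.25, dilution_factor=50),     # 500 ng/µL
    "OVCAR-3": rna_concentration_ng_per_ul(a260=0.125, dilution_factor=50),  # 250 ng/µL
}
print(pooling_volumes(concs, ng_per_line=1000.0))  # {'MCF7': 2.0, 'OVCAR-3': 4.0}
```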

2.3.3.2. Pooled tumour reference

Extracted RNA from 22 samples of EOC was combined to generate a project-specific

stock of reference RNA. The volume of this pool was sufficient to hybridize several

hundred 10.5k cDNA microarray slides. The specimens were chosen so that the final pool would contain several histological subtypes: eight serous tumours, six mucinous tumours and a third group of mixed histology (adenocarcinoma, benign and endometrioid carcinomas, n=8).

2.3.4. Target labelling

10 µg of amplified aRNA, primed with random hexamers, was reverse transcribed with Moloney murine leukemia virus reverse transcriptase (Promega) in the presence of amino-allyl (AA)-modified dUTP (Sigma-Aldrich). The AA-dUTP cDNA was then labelled by coupling to Cy3 and Cy5 (reference and sample, respectively) mono-reactive dyes (Amersham Biosciences).

2.3.5. Slide hybridisation

Labelled probe was hybridized to the array in 3.1 SSC and 50% formamide at 42˚C for

14–16 hours in a humidified and temperature-controlled chamber (HyPro20; Thermo

Hybaid). Slides were washed at room temperature with 0.5x SSC/0.01% SDS (for 1

minute), then with 0.5x SSC (for 3 minutes), and finally with 0.06X SSC (for 3 minutes).

2.3.6. RT-PCR

RT-PCR was used as a tool to validate results of DNA microarray expression studies,

offering an independent platform as well as facilitating investigation of independent

samples.

2.3.6.1. Primer design and supply

Primers were designed using GenScript primer design service (GenScript Corporation).

This program selected unique primer sets based on sequence specificity and GC content,

and then checked for genome-wide specificity using BLAST (Wheeler et al., 2003).


Oligonucleotides were obtained from Geneworks (40 nmole synthesis; sequencing grade).

Sequences for primer sets used are shown in the text.

The most representative accession number for each selected gene was obtained from SOURCE (Diehn et al., 2003) using the UniGene gene names of interest. This representative accession number (potentially different from that in the 10.5k clone set) was then used to query GenScript and identify primer sets that spanned an exon boundary and had minimum, optimum and maximum annealing temperatures of 58, 59 and 60˚C, respectively.

2.3.6.2. PCR reactions

Primer concentrations varied, but the stocks were maintained at 100 nM in 10 mM Tris-HCl, 0.5 mM EDTA pH 7.6 and stored at -20ºC.

Quantification using SYBR green

Total RNA from samples was isolated as described in section 2.3.2.1, and cDNA template was produced from 5 µg of total RNA. Reverse transcription was performed using dNTPs (25 mM each) with incubation at 42°C for 60 minutes. The reaction was then heated to 100ºC for 5 minutes, diluted to 50 µL with 10 mM Tris, 0.5 mM EDTA pH 7.6 and stored at -20ºC.

Reactions were performed in 384-well plates with a 10 µL reaction volume per well.

Each reaction comprised 1 µL of template cDNA, 1 µL primers (combined), 5 µL SYBR

green master mix (Applied Biosystems) and 4 µL distilled water. The plate was sealed

and centrifuged at 100g for 1 minute at room temperature and placed in an ABI PRISM®

7900 thermal cycler (Applied Biosystems) and run for 2 hours and 15 minutes.

Replicates and no template (water only) controls were incorporated on each plate, as was

a ratio control (control gene selected due to little variation in expression across all

samples being tested, according to the microarray experiments). Results were expressed

as CT (cycle number at defined threshold) and delta CT was calculated for each

primer/template pair as shown in Equation 2-1.

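$$\text{Ratio} = 2^{\Delta\Delta C_T}$$

$$\Delta\Delta C_T = \left(C_{T,\text{test}}^{HPRT} - C_{T,\text{test}}^{GENE}\right) - \left(C_{T,\text{ref}}^{HPRT} - C_{T,\text{ref}}^{GENE}\right)$$

Equation 2-1: Formula for calculation of RT-PCR gene expression ratios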
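Equation 2-1 can be checked with a few lines of code. This sketch assumes, as any 2^ΔΔCT calculation does, an amplification efficiency close to 100% (one doubling per cycle) for both HPRT and the gene of interest; the function names and CT values are hypothetical.

```python
def delta_delta_ct(hprt_ct_test, gene_ct_test, hprt_ct_ref, gene_ct_ref):
    """ΔΔCT as defined in Equation 2-1: (HPRT - GENE) in test minus (HPRT - GENE) in reference."""
    return (hprt_ct_test - gene_ct_test) - (hprt_ct_ref - gene_ct_ref)


def expression_ratio(hprt_ct_test, gene_ct_test, hprt_ct_ref, gene_ct_ref):
    """Test/reference expression ratio, assuming perfect doubling each cycle."""
    return 2.0 ** delta_delta_ct(hprt_ct_test, gene_ct_test, hprt_ct_ref, gene_ct_ref)


# Hypothetical example: the gene crosses threshold 2 cycles before HPRT in the
# tumour, but at the same cycle as HPRT in the reference.
print(expression_ratio(hprt_ct_test=24.0, gene_ct_test=22.0,
                       hprt_ct_ref=24.0, gene_ct_ref=24.0))  # 4.0
```

Because ΔCT is defined as HPRT minus GENE, an earlier-crossing (more abundant) gene gives a positive exponent, so no minus sign is needed in front of ΔΔCT.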


CT refers to the cycle number at which fluorescence crosses the threshold, set in the most linear part of the amplification curve. ∆∆CT is the difference in CT of a particular gene (GENE) between the test sample and the reference, after normalisation against the gene HPRT. HPRT did not vary significantly in expression across the samples tested and was used as a loading control. Ratio refers to the expression ratio of the test sample relative to the reference. The reference in these experiments was the universal reference RNA, allowing direct comparison with the microarray data, for which the same reference was used.

2.3.7. Tissue microarray construction

Tissue Microarrays (TMAs) were created for high-throughput validation and large-scale

experimental design. Tissue was sourced by Dr Melissa Robbie from both the AOCS collection and St Vincent's Hospital (Melbourne). TMAs were produced

essentially as described in Sambrook and Bowtell (Sambrook and Bowtell, 2003) and

schematically in Figure 2-1.

Briefly, H&E stained slides from cases identified as suitable for inclusion in the study (i.e. ovarian serous carcinomas, invasive or LMP) were reviewed to confirm the diagnosis, to find areas of tumour typical of the diagnosis and to check that these areas contained features plentifully represented elsewhere (i.e. that the area was diagnostically redundant). This area was then circled on the slide and used to locate the matching area on the paraffin block for needle punch biopsy. The diagnosis was recorded but no other information was retained. Agar blocks were processed in paraffin to form the recipient block. After melting, the histology scientist Neal O’Callaghan attempted to press the cores down so that both long and short cores were present at the base of the cassette, which becomes the cutting face.

Two identical copies of each TMA were constructed to allow a large number of sections

to be cut for this and future analyses. The finished blocks were stored by the Pathology

Department in appropriate conditions. The layout of each TMA constructed and relevant

information for each specimen is shown in Appendix B. Tissue histology and original

specimen number were recorded for each grid reference and stored in a Microsoft Excel

spreadsheet.


Figure 2-1: Schematic diagram of the TMA construction process. Multiple formalin-fixed tissue specimens are embedded in individual paraffin blocks. A 2 mm core of tumour-representative tissue is then taken with a punch biopsy tool. The cores (shown here packed together) are inserted into a pre-cored, paraffin-embedded recipient block of agar as described in the text. The final block, consisting of up to 54 tumour cores, is then sectioned into thin slices that are placed onto standard microscopy slides for IHC analysis (Liotta and Petricoin, 2000).


2.3.8. Immunohistochemistry

Blocks were routinely sectioned at 3 µm. The sections were then stored in foil at room temperature to avoid exposure to light and air. Immediately prior to use, sections were dewaxed in xylene (twice, for 3 minutes each) and then rehydrated by passage through a series of ethanol solutions (100%, then 100% to 70%) and finally tap water.

Antigen retrieval (Shi et al., 2001) was necessary for all antibodies used. Sections were

placed into 10 mM Sodium Citrate buffer (pH 6.0) and boiled under pressure for two

minutes using Biocare Decloaker (Biocare Medical, USA).

IHC was performed on a Dako Autostainer (Dako, USA) and all incubations were performed in a humidified chamber at room temperature for 30 minutes. Blocking was carried out for 10 minutes in 3% hydrogen peroxide. Primary antibodies were diluted in Dako diluent (Dako Product Code S0809) at the dilutions shown in Table 2-3.

The primary antibody was then detected with a polymer linked detection system,

Envision+ (Dako) with a 30 minute incubation. The chromogen, DAB+ (Dako K3468)

was applied for 10 minutes. Slides were finally washed and then counterstained with

Haematoxylin and progressively dehydrated through an ethanol series (70% to 100%),

then placed in xylene. Cover slips were mounted using DPX mounting medium (BDH)

and air dried overnight in a fume hood.

Table 2-3: Antibody information used for IHC on TMAs.

| Antibody | Clone | Supplier | Dilution | Detection kit |
|---|---|---|---|---|
| CD31 | JC/70A | Dako M0823 | 1:50 | Envision+ mouse (product code K4001) |
| Cyclin D | SP-4 (rabbit monoclonal) | Labvision RM9104-S | 1:50 | Envision+ rabbit (product code K4003) |
| E-Cadherin | NCL-Ecad | Novocastra NCL-E-cad | 1:50 | Envision+ mouse |
| ER | 6F11 | Novocastra NCL-ER-L-6F11 | 1:100 | Envision+ mouse |
| Ki67 | MIB1 | Dako M7240 | 1:100 | Envision+ mouse |

2.4. In-silico methods

A comprehensive range of data analysis methods was used in this thesis to interrogate several types of raw data (predominantly microarray gene expression data) in order to explore a range of biological questions.


2.4.1. Image capture and data extraction

Hybridised microarray slides were scanned with either a ScanArray 5000 (Packard

Biosystems, USA) or Agilent Microarray Scanner BA (Agilent Technologies, USA), as

indicated in the text.

For the ScanArray 5000, the confocal laser was focused using both channels. Cy3 and Cy5 fluorescence was detected at 570 nm and 670 nm, respectively. The ScanArray 5000 scanner required manual allocation of laser power and photomultiplier tube (PMT) settings. These settings were selected to produce the largest dynamic range of signal detection with minimal increase in background intensity. Finally, the settings for each laser (Cy3 and Cy5) were adjusted to give equivalent signal in both channels, avoiding bias due to the dominant Cy5 signal.

For the Agilent Microarray Scanner, Cy3 and Cy5 fluorescence was detected at 570 nm and 670 nm, respectively. Unlike the ScanArray 5000, the Agilent scanner does not require manual adjustment of laser power and PMT settings. This removes the need for iterative scanning to find the optimum dynamic range of signal intensity, and so minimises photobleaching of features. Signal range is optimised independently and simultaneously for the Cy3 and Cy5 channels, which improves the signal-to-noise ratio. Furthermore, the dynamic auto-focus ability of the Agilent scanner achieves better spot-to-spot consistency by minimising the spatial bias that results from glass curvature and misaligned slides. These features are compared and contrasted further in Chapter 3.

2.4.2. Microarray image analysis

A 16-bit TIFF image was obtained for each hybridisation channel and stored initially on a dedicated hard drive at Peter Mac, then archived onto DVD-RW media. The images were reviewed using a pseudo-colour overlay of the Cy5 and Cy3 channels, with red allocated to Cy5 and green to Cy3. The overlay images allow visual assessment of background non-specific staining and other staining artefacts, the consistency of spot morphology, and the relative signal intensity between the two excitation channels.

Data were extracted from the TIFF images with either GenePix Pro 4.1 (“GenePix”) (Molecular Devices, USA) or Quantarray (Packard Bioscience, USA), as specified. These programs function by overlaying a user-defined grid structure onto the scanned TIFF images corresponding to each hybridisation channel. The identity, layout and size of the probes are built into the grid file; once the grid has been accurately positioned, the software converts the pixel intensities within each feature to numerical measurements. The areas of the microarray used for hybridisation intensity quantification are described in Figure 2-2.

Features on each microarray with poor morphology or no detectable signal were flagged as ‘absent’ in the image analysis software, which attaches a code to the recorded hybridisation values so that downstream data analysis software can identify them. In Quantarray this is carried out manually by the operator, by visual inspection of the array images, whereas GenePix uses a series of numerical criteria to identify poor quality or missing features. The formula used to exclude poor quality features from the microarrays in this thesis is shown below in Equation 2-2.

These criteria flag features for which no more than 55% of pixels in either channel have an intensity greater than the median local background intensity plus one standard deviation, features with a diameter of 80 µm or less or 150 µm or more, features with 3% or more saturated pixels in either channel, and features for which the sum of the Cy3 and Cy5 median intensities is 300 or less. Features already flagged as bad, absent or not found are also excluded. Spots flagged by these rules can be excluded from downstream data analysis to reduce the chance of introducing systematic noise into a gene expression profile.

[% > B635+1SD] > 55 or [% > B532+1SD] > 55 And

[Dia.] > 80 And [Dia.] < 150 And

[Flags] <> [Bad] And

[Flags] <> [Absent] And

[Flags] <> [Not Found] And

[F532 % Sat.] < 3 And

[F635 % Sat.] < 3 And

[Sum of Medians] > 300

Equation 2-2: Criteria for flagging poor quality cDNA microarray features in GenePix image analysis software
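Equation 2-2 can be restated as a boolean test over one row of GenePix output. The sketch below is a hypothetical re-implementation for illustration, not GenePix itself: the field names are invented stand-ins for the bracketed GenePix columns, flags are represented as strings rather than the numeric codes GenePix writes to its results file, and the `or` is read as grouping only the two channel tests, which matches the written description of the criteria.

```python
from dataclasses import dataclass

# Flags treated as failures by Equation 2-2 (strings used here for readability only).
EXCLUDED_FLAGS = {"bad", "absent", "not found"}


@dataclass
class Feature:
    pct_above_bg_635: float  # [% > B635+1SD]
    pct_above_bg_532: float  # [% > B532+1SD]
    diameter_um: float       # [Dia.]
    flag: str                # [Flags]
    pct_sat_532: float       # [F532 % Sat.]
    pct_sat_635: float       # [F635 % Sat.]
    sum_of_medians: float    # [Sum of Medians]


def passes_quality_filter(f: Feature) -> bool:
    """Return True if a feature meets every criterion in Equation 2-2."""
    return (
        # at least one channel must have >55% of pixels above background + 1 SD
        (f.pct_above_bg_635 > 55 or f.pct_above_bg_532 > 55)
        and 80 < f.diameter_um < 150
        and f.flag.lower() not in EXCLUDED_FLAGS
        and f.pct_sat_532 < 3
        and f.pct_sat_635 < 3
        and f.sum_of_medians > 300
    )


good = Feature(80, 70, 110, "ok", 0.0, 0.5, 2500)
too_small = Feature(80, 70, 60, "ok", 0.0, 0.5, 2500)
print(passes_quality_filter(good), passes_quality_filter(too_small))  # True False
```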

The measurements of hybridisation intensities are stored in a tab-delimited text file which

can be read into a number of different analysis packages or opened directly by most

spreadsheet applications.


Figure 2-2: Schematic diagram of cDNA microarray image analysis. The region shaded black indicates the area of the slide between spotted probes that is used to assess the level of background hybridisation. GenePix uses the median intensity of this area, whereas Quantarray uses the mean. A 2-pixel gap is left between the areas used for quantification to avoid including small fragments of the spotted probe, or other artefacts, in the calculation of background intensity (Axon, 2004).
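The difference between a median and a mean background estimate matters most when the background region contains artefacts. A minimal sketch, assuming for illustration only that foreground is summarised with the same statistic as background and that the pixel intensities for a feature and its background region are already available as lists; the function names are hypothetical.

```python
import math
import statistics


def background_corrected(foreground, background, method="median"):
    """Foreground minus local background for one channel of one feature.

    method="median" mimics the GenePix background choice; method="mean" mimics Quantarray.
    """
    if method == "median":
        centre = statistics.median
    elif method == "mean":
        centre = statistics.mean
    else:
        raise ValueError("method must be 'median' or 'mean'")
    return centre(foreground) - centre(background)


def log2_ratio(cy5_fg, cy5_bg, cy3_fg, cy3_bg, method="median"):
    """Background-corrected log2(Cy5/Cy3); None if either channel is not positive."""
    red = background_corrected(cy5_fg, cy5_bg, method)
    green = background_corrected(cy3_fg, cy3_bg, method)
    if red <= 0 or green <= 0:
        return None
    return math.log2(red / green)


# One bright artefact pixel in the background shifts the mean far more than the median.
bg_with_artefact = [100, 100, 900]
print(background_corrected([500, 500, 500], bg_with_artefact, "median"))  # 400
print(background_corrected([500, 500, 500], bg_with_artefact, "mean"))
```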


2.4.3. Normalisation of cDNA microarray data

Normalisation refers to adjustment of systematic differences in the relative intensity of

the Cy3 and Cy5 channels so that data can be compared within and between microarrays

(Yang et al., 2002b).

Normalisation of microarray data in this study was carried out using a range of methods.

These include simple median normalisation, intensity-dependent normalisation (Yang et al., 2002b) and two spatially dependent normalisation algorithms, SNOMAD (Colantuoni et al., 2002) and print-tip normalisation (Yang et al., 2002b).

Print-tip normalisation was incorporated into a broad range of bioinformatic tools during the later stages of this study and has therefore been used in a larger number of the published analyses to date that have made use of these tools. It is implemented in packages such as BRB ArrayTools (Biometric Research Branch, National Cancer Institute, USA) and Bioconductor (www.bioconductor.org) (Gentleman et al., 2004).

Both SNOMAD and print-tip methods use information about a gene's location within a microarray when determining the level of correction that needs to be applied.

2.4.3.1. Median normalisation

Median normalisation is a simple method for addressing variation between the cDNA microarrays in one experiment, and also for centring the distribution of genes within each array around an expression ratio of 1.0, which indicates equal expression in the test and reference RNA samples.

This is achieved by dividing each gene's expression ratio (i) by that gene's median ratio across the entire dataset and (ii) by the median of all ratios on the particular microarray.
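The two division steps can be sketched with NumPy. This is an illustrative re-implementation rather than the code used in this thesis; it assumes ratios are held in a genes x arrays matrix with NaN marking flagged spots, and the function name is hypothetical.

```python
import numpy as np


def median_normalise(ratios):
    """Median-normalise a genes x arrays matrix of Cy5/Cy3 ratios.

    NaN marks flagged features and is ignored when medians are computed.
    """
    r = np.asarray(ratios, dtype=float)
    if np.any(r[~np.isnan(r)] <= 0):
        raise ValueError("ratios must be positive")
    # (i) divide each gene (row) by its median ratio across all arrays
    r = r / np.nanmedian(r, axis=1, keepdims=True)
    # (ii) divide each array (column) by the median of all its ratios
    r = r / np.nanmedian(r, axis=0, keepdims=True)
    return r


data = np.array([[2.0, 4.0, 1.0],
                 [0.5, 1.0, np.nan],
                 [1.0, 3.0, 2.0]])
normalised = median_normalise(data)
print(np.nanmedian(normalised, axis=0))  # each array median is now (approximately) 1.0
```

On a log2 scale the same two steps become subtractions of medians, which centres each array on the baseline of 0.0.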

2.4.3.2. Intensity-dependent normalisation

Intensity-dependent normalisation (also called non-linear or lowess normalisation) is a technique used to eliminate dye-related artefacts in two-colour experiments that cause the Cy5/Cy3 ratio to depend on the total intensity of the spot. This normalisation process attempts to correct for artefacts caused by non-linear rates of dye incorporation as well as inconsistencies in the relative fluorescence intensity of some red and green dyes. Such artefacts often result in a curve in the graph of raw versus control signal (Quackenbush, 2002; Yang et al., 2002b; Yang and Speed, 2002).

In the absence of bias, the Cy5/Cy3 ratio would not depend on spot intensity, and the data points would be scattered symmetrically around the 1:1 line of Cy5:Cy3 expression. Intensity-dependent normalisation fits a curve through the expression data and uses this curve to adjust the control value for each measurement. When the resulting normalised data are graphed against the adjusted control value, the points are distributed more symmetrically between hybridisation channels. In this project, each local fit in the smoothing process used 20% of the total data (Quackenbush, 2002).
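A sketch of the procedure in the common M/A form, where M = log2(Cy5/Cy3) is the log ratio and A is the mean log2 intensity. It assumes a simplified, single-pass lowess written directly in NumPy (tricube weights, local linear fits, no robustness iterations), so it will not reproduce the exact output of published implementations; `frac=0.2` mirrors the 20% smoothing fraction described above, and the function names are hypothetical. The loop is O(n²), which is slow but workable for an array of roughly 11,000 features.

```python
import numpy as np


def lowess_fit(x, y, frac=0.2):
    """Locally weighted linear fit of y on x using tricube weights (single pass)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    if n < 3:
        raise ValueError("need at least three points")
    k = min(n, max(int(np.ceil(frac * n)), 3))  # neighbours used in each local fit
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        idx = np.argsort(dist)[:k]
        h = dist[idx].max()
        if h == 0:  # all neighbours share the same x value
            fitted[i] = y[idx].mean()
            continue
        # square roots of tricube weights, so the least-squares fit is weighted by tricube
        w = np.sqrt((1 - (dist[idx] / h) ** 3) ** 3)
        design = np.column_stack([np.ones(k), x[idx]])
        beta, *_ = np.linalg.lstsq(design * w[:, None], y[idx] * w, rcond=None)
        fitted[i] = beta[0] + beta[1] * x[i]
    return fitted


def lowess_normalise(cy5, cy3, frac=0.2):
    """Return intensity-normalised log2(Cy5/Cy3): M minus its lowess fit on A."""
    cy5 = np.asarray(cy5, dtype=float)
    cy3 = np.asarray(cy3, dtype=float)
    m = np.log2(cy5 / cy3)        # log ratio
    a = 0.5 * np.log2(cy5 * cy3)  # mean log intensity
    return m - lowess_fit(a, m, frac)


# Synthetic data with a dye bias that grows with intensity
intensity = np.linspace(6, 14, 200)
bias = 0.3 * intensity - 2.0
cy5 = 2 ** (intensity + bias / 2)
cy3 = 2 ** (intensity - bias / 2)
print(np.abs(lowess_normalise(cy5, cy3)).max())  # close to 0: the trend has been removed
```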

2.4.3.3. Print tip normalisation

Print-tip normalisation applies the intensity-dependent correction separately to the features printed by each print tip (sub-grid) of the array (Yang et al., 2002b). Examples of an individual microarray slide pre- and post-normalisation, together with a representation of the entire dataset at the same stages, are shown in Figure 2-3. Each array print tip (n=24, corresponding to the number of array sub-grids present) is represented by a box plot in panels A and B of this figure. The deviation of the data from each print tip from the baseline expression of 0.0 (log2 of 1.0) can be observed in panel A and was corrected by the normalisation process, as shown in panel B. Panels C and D show the entire dataset, with each array represented by an individual box-and-whisker plot. The normalisation process has centred the distribution of data points for each array about the baseline expression level, effectively correcting for any bias in fluorescence intensity (Yang et al., 2002b).
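The simplest form of print-tip correction, median-centring the log2 ratios within each print-tip group so that every box in Figure 2-3B sits on 0.0, is sketched below. It is a simplified stand-in for print-tip lowess (which fits a separate intensity-dependent curve per tip) and assumes each feature's print-tip ID is known; the function name is hypothetical.

```python
from collections import defaultdict
from statistics import median

def print_tip_centre(log_ratios, tip_ids):
    """Subtract each print-tip group's median log2 ratio from its features."""
    groups = defaultdict(list)
    for value, tip in zip(log_ratios, tip_ids):
        groups[tip].append(value)
    tip_medians = {tip: median(values) for tip, values in groups.items()}
    return [value - tip_medians[tip] for value, tip in zip(log_ratios, tip_ids)]
```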

2.4.3.4. SNOMAD normalisation

SNOMAD uses a two-dimensional approach in which a topographical view of each hybridisation channel is created using the lowess algorithm, and the difference between the patterns of expression in each channel is used to determine the level of adjustment to apply to each data point.
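The idea of removing a spatial trend can be illustrated with the sketch below. It is not SNOMAD's algorithm: as a simplifying assumption, a moving median over neighbouring grid positions replaces the two-dimensional lowess surface, and the input is assumed to be a rows × columns grid of log-ratios. The function name and `radius` parameter are hypothetical.

```python
from statistics import median

def spatial_detrend(grid, radius=2):
    """Subtract a local moving-median trend from a rows x cols grid of log-ratios."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # neighbourhood of up to (2*radius+1)^2 features, clipped at the array edges
            window = [grid[i][j]
                      for i in range(max(0, r - radius), min(rows, r + radius + 1))
                      for j in range(max(0, c - radius), min(cols, c + radius + 1))]
            out_row.append(grid[r][c] - median(window))
        out.append(out_row)
    return out
```

A smooth regional offset is absorbed by the local median, whereas an isolated strongly regulated feature keeps its value because a single outlier barely moves the median of its neighbourhood.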


Figure 2-3: Impact of print-tip normalisation on cDNA microarray expression data (Herrero et al., 2003; Vaquerizas et al., 2004). (A) Sample individual microarray dataset prior to normalisation. Each box and whisker corresponds to data generated by an individual print tip in the microarray fabrication process. Some drift from baseline expression (horizontal dashed line) can be observed. (B) The same sample microarray post-normalisation; all data are now centred on the baseline expression level. (C) Representation of the entire dataset used in Chapter 5 before normalisation, showing a large degree of variation in the data range of individual arrays, with the majority having a median expression ratio of <1, suggesting a bias towards the Cy3 channel. (D) Chapter 5 dataset following print-tip normalisation.

[Figure 2-3 image: panels A and B plotted by print tip ID; panels C and D plotted by individual array ID]


Figure 2-4: Diagram of 3D lowess-based mapping of array data carried out during SNOMAD normalisation. (A) Mean pixel intensity vs. Cy5:Cy3 ratio for each array feature (Ratio-Intensity plot) of a sample array. Blue dots indicate up-regulated genes and red dots down-regulated genes. (B) Distribution of the same up- and down-regulated features in a two-dimensional ‘virtual array’ view of the microarray slide. A disproportionate number of similarly regulated genes appear in opposite corners of this array, indicating that a technical error has resulted in spatial bias. The lowess curve-fitting algorithm is then used to map the variation in feature intensity relative to location on the array, and this map is used to normalise the features for (C) the test channel and (D) the reference channel. (E) Ratio-Intensity plot of the normalised data shows that the redistributed expression data still contain similar proportions of up- and down-regulated genes. (F) ‘Virtual array’ view of the slide reveals a more even distribution of up- and down-regulated features across the array surface (Colantuoni et al., 2002).

[Figure 2-4 image: panels A to F]


2.4.4. Microarray data visualisation methods

2.4.4.1. Hierarchical Clustering

Clustering is one method for uncovering patterns of gene expression and the relationships between these patterns, and for reducing data complexity to facilitate visualisation. Hierarchical clustering uses similarity measures to divide genes or samples into groups with similar gene expression profiles (Quackenbush, 2001). In this thesis, clustering was carried out using GeneSpring (Silicon Genetics, USA) and, unless otherwise stated, genes are displayed on the horizontal axis and samples on the vertical axis.

In any clustering algorithm, the calculation of a ‘distance’ between any two objects is fundamental to placing them into groups. When several experiments (arrays) are clustered together, their correlations are combined through a weighted correlation in which the weight of each experiment can be specified, making it possible for one sample to be more important in the clustering process than another. If all of the experiments or experiment sets are given the same weight, they are averaged equally. For example, if Experiment 1 is given a weight of 2 and Experiment 2 a weight of 1, the correlations found in Experiment 1 are twice as influential in creating the tree as the correlations between the genes in Experiment 2.

The equation used to determine the overall correlation is shown as Equation 2-3, where the variables are:

A: the correlation coefficient between the gene in question in Experiment 1 and the reference gene, also from Experiment 1.

a: the weight specified for Experiment 1.

B: the correlation coefficient between the gene in question in Experiment 2 and the reference gene, also from Experiment 2.

b: the weight associated with Experiment 2.

C: the correlation coefficient between the gene in question in Experiment 3 and the reference gene, also from Experiment 3.

c: the weight associated with Experiment 3, and so on.


$$X = \frac{aA + bB + cC + \cdots}{a + b + c + \cdots}$$

Equation 2-3: Equation for determining the overall correlation coefficient for hierarchical clustering
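Equation 2-3 is a weighted mean of the per-experiment correlations, as the short sketch below shows. The function names are hypothetical, and the minimum/maximum check mirrors the correlation filter described in the text rather than any documented GeneSpring API.

```python
def overall_correlation(correlations, weights):
    """X = (aA + bB + cC + ...) / (a + b + c + ...) from Equation 2-3."""
    if len(correlations) != len(weights):
        raise ValueError("one weight is needed per experiment")
    return sum(w * r for w, r in zip(weights, correlations)) / sum(weights)

def passes_filter(x, min_corr, max_corr):
    """True if the overall correlation lies within the researcher's limits."""
    return min_corr <= x <= max_corr
```

With correlations of 0.9 (Experiment 1, weight 2) and 0.3 (Experiment 2, weight 1), X = (1.8 + 0.3)/3 = 0.7, so Experiment 1 pulls the result twice as strongly.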

If X lies between the minimum and maximum correlations specified by the researcher, the gene in question passes the correlation filter. The minimum distance and separation ratio equation is shown as Equation 2-4.

Equation 2-4: Equation for determining minimum distance and separation ratio for hierarchical clustering

To make a tree or dendrogram, GeneSpring calculates the correlation of each gene with every other gene in the set. It then takes the highest correlation, pairs those two genes and averages their expression profiles. GeneSpring then compares this new composite gene with all of the other unpaired genes.

This is repeated until all of the genes have been paired. At this point the minimum distance and the separation ratio come into play; both affect the branching behaviour of the tree. The minimum distance determines how far down the tree discrete branches are depicted. Using a minimum distance smaller than 0.001 has minimal impact because few genes are more highly correlated than this cut-off. A higher value tends to incorporate more genes into each group, making the groups less specific.
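The pairing procedure described above can be sketched as a loop that repeatedly merges the most correlated pair and replaces it with the averaged profile. This is an illustrative assumption about the procedure rather than GeneSpring's implementation; it omits the minimum distance and separation ratio settings, and the function names are hypothetical.

```python
import math

def pearson(u, v):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0

def agglomerate(profiles):
    """Merge the most correlated pair until one cluster remains.

    Returns the merge history as (member_ids_a, member_ids_b, correlation) tuples.
    """
    clusters = [((i,), list(p)) for i, p in enumerate(profiles)]
    history = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                r = pearson(clusters[i][1], clusters[j][1])
                if best is None or r > best[0]:
                    best = (r, i, j)
        r, i, j = best
        (ids_a, prof_a), (ids_b, prof_b) = clusters[i], clusters[j]
        # the composite "gene" is the average of the two paired profiles
        merged = (ids_a + ids_b, [(a + b) / 2 for a, b in zip(prof_a, prof_b)])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        history.append((ids_a, ids_b, r))
    return history
```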

2.4.4.2. Principal component analysis and multidimensional scaling

Principal component analysis (PCA) is a decomposition technique that produces a set of

expression patterns known as principal components (Holter et al., 2000). Linear

combinations of these patterns can be assembled to represent the behaviour of all of the

genes in a given data set. The application of PCA to microarray gene expression data was


carried out according to principles first described by Raychaudhuri et al (Raychaudhuri et

al., 2000).

Principal component analysis is based on the covariance between different factors. Covariance is always measured between two factors, so with three factors covariance is measured between x and y, y and z, and x and z. When more than two factors are involved, the covariance values can be placed into a matrix, and this is where PCA becomes useful: PCA finds the eigenvectors and eigenvalues of this covariance matrix. Eigenvectors can be thought of as “preferential directions” of a data set, or in other words, the main patterns in the data. For PCA on genes, an eigenvector would be represented as the expression profile that is most representative of the data; for PCA on conditions, an eigenvector would resemble a main condition profile. In either case, there cannot be more components than there are conditions in the data. Eigenvalues can be thought of as a quantitative assessment of how much a component represents the data: the higher the eigenvalue of a component, the more representative it is of the data. On their own, raw eigenvalues are not informative; each eigenvalue divided by the sum of all eigenvalues gives the proportion of the total variance explained by that component. Taken together, all of the components explain 100% of the variability in the data.
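A minimal PCA via eigen-decomposition of the covariance matrix is sketched below, assuming NumPy is available and that the data form a genes × conditions matrix (genes treated as observations, so each eigenvector is an expression profile across conditions). The function name is hypothetical; the sketch shows how eigenvalues translate into the proportion of variance explained.

```python
import numpy as np

def pca(data):
    """PCA of a genes x conditions matrix via the conditions x conditions covariance."""
    X = np.asarray(data, dtype=float)
    Xc = X - X.mean(axis=0)                 # centre each condition (column)
    cov = np.cov(Xc, rowvar=False)          # covariance between conditions
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric matrix; ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()     # proportion of total variance per component
    scores = Xc @ eigvecs                   # gene coordinates on each component
    return eigvals, eigvecs, explained, scores
```

The number of components equals the number of conditions (columns), and the `explained` proportions sum to one.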

2.4.5. Unsupervised identification of differential gene expression

In order to reduce the size of a microarray dataset to facilitate other analyses, an unsupervised filter is often applied to remove genes that vary little across the dataset and are therefore unlikely to contribute to differences between phenotypes. The unsupervised nature of these filters means that no information about the class structure present in the dataset is used to select genes at this stage.

2.4.5.1. Fold change method

A common method for identifying genes without substantial variation in expression relative to the reference channel is to filter on the basis of a minimum fold change. This is done by specifying the proportion of a dataset in which a given gene must be at least x-fold differentially expressed. A common application of this filter is to exclude genes that do not vary at least 1.5-fold in at least 20% of the dataset. Studies have shown that cDNA microarray platforms accurately detect gene expression changes of 1.4-fold or greater (Yue et al., 2001).
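The fold-change filter can be sketched as below, assuming positive linear Cy5/Cy3 ratios in a genes × arrays list of lists; the defaults (1.5-fold, up or down, in at least 20% of arrays) follow the example above, and the function name is hypothetical.

```python
import math

def fold_change_filter(ratios, min_fold=1.5, min_fraction=0.2):
    """Indices of genes changed >= min_fold (up or down) in >= min_fraction of arrays."""
    threshold = math.log2(min_fold)  # symmetric on the log scale: 1.5x up or 1/1.5 down
    keep = []
    for g, row in enumerate(ratios):
        n_changed = sum(1 for r in row if abs(math.log2(r)) >= threshold)
        if n_changed >= min_fraction * len(row):
            keep.append(g)
    return keep
```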


2.4.5.2. Signal-to-noise

The signal-to-noise method of gene selection involves calculating the following quantity for each gene: (µ0 − µ1) / (σ0 + σ1), where µ and σ represent the mean and standard deviation of the expression level in each class, respectively.

A threshold can then be applied to the signal-to-noise scores, or the genes can be ranked

by this metric and an algorithm which iteratively tests for the optimal number, and

combination, of genes for a particular task can use this ranking to prioritise genes within

the search.
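A sketch of the signal-to-noise score and the resulting gene ranking follows, assuming two classes coded 0 and 1 with at least two samples each and non-zero spread. Whether the sample or the population standard deviation is used is an assumption here (the sample form from Python's `statistics.stdev`); the function names are hypothetical.

```python
from statistics import mean, stdev

def signal_to_noise(values, labels):
    """(mu0 - mu1) / (sigma0 + sigma1) for one gene; labels are 0/1 class codes."""
    class0 = [v for v, c in zip(values, labels) if c == 0]
    class1 = [v for v, c in zip(values, labels) if c == 1]
    return (mean(class0) - mean(class1)) / (stdev(class0) + stdev(class1))

def rank_genes(matrix, labels):
    """Gene indices ordered by decreasing absolute signal-to-noise score."""
    scores = [signal_to_noise(row, labels) for row in matrix]
    return sorted(range(len(matrix)), key=lambda g: -abs(scores[g]))
```

The ranking, rather than a fixed cut-off, is what an iterative search over gene subsets would consume.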

2.4.5.3. Log-expression variation method

An alternative to using absolute fold changes for identifying differentially expressed genes is to calculate a p-value for each gene describing the significance of its variation. This method does not rely on setting a fixed threshold for excluding genes, but is instead based on a statistical comparison of the variation of a particular gene across the dataset with either baseline expression (i.e. 1.0) or the median variation of all genes on the array. Genes that are not significantly more variable than the median gene, at a pre-determined p-value, are filtered out. A p-value of 0.001 was used for most log-expression variation filtering in this thesis.

Specifically, the quantity (n − 1) Var_i / Var_med is computed for each gene i, where Var_i is the variance of the log intensity for gene i across the entire set of n arrays and Var_med is the median of these gene-specific variances. This quantity is compared with a percentile of the chi-square distribution with n − 1 degrees of freedom, which gives an approximate test of the hypothesis that gene i has the same variance as the median variance.
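The variance test can be sketched with the Python standard library alone. The chi-square upper-tail probability is computed from its closed forms for integer degrees of freedom; the function names and the data layout (a genes × arrays list of log values) are assumptions for illustration, not BRB ArrayTools code.

```python
import math
from statistics import median, variance

def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x) for integer df (closed forms)."""
    y = x / 2.0
    if df % 2 == 0:
        # even df: exp(-y) * sum_{j < df/2} y^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= y / j
            total += term
        return math.exp(-y) * total
    # odd df: erfc(sqrt(y)) + exp(-y) * sum_j y^(j - 1/2) / Gamma(j + 1/2)
    total = math.erfc(math.sqrt(y))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (j - 0.5) / math.gamma(j + 0.5)
    return total

def variance_filter(log_values, alpha=0.001):
    """Keep genes whose variance is significantly larger than the median gene variance."""
    n = len(log_values[0])
    variances = [variance(row) for row in log_values]
    var_med = median(variances)
    return [g for g, v in enumerate(variances)
            if chi2_sf((n - 1) * v / var_med, n - 1) < alpha]
```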

2.4.6. Identification of genes differentially expressed between tumour subtypes

2.4.6.1. Significance Analysis of Microarrays (SAM)

SAM is a popular method for identifying genes with significant differential expression from microarray data, described by Tusher et al. (Tusher et al., 2001). The SAM algorithm is one method of controlling the false discovery rate (FDR), which SAM defines as the median number of false-positive genes divided by the number of significant genes. The SAM algorithm is an alternative to the multivariate permutation test used by several other algorithms described in this chapter.


First, for each gene in the dataset, a modified F-statistic (or t-statistic for two-class data) is computed in which a “fudge factor for standard deviation” is included in the denominator to stabilise the gene-specific standard deviation estimates. The F-statistics are then sorted from smallest to largest (F(1), F(2), …, F(i), …, F(n)), where n is the number of genes. Next, the class labels are permuted and a set of ordered F-statistics is re-computed for each permutation. The expected ordered statistics are estimated as the average of the ordered statistics over the set of permutations. A cut point is then defined as F(i*(∆)), where i*(∆) is the first index i at which the actual ordered F-statistic exceeds the expected ordered F-statistic by more than a threshold value ∆; the cut point is therefore a function of ∆. Genes that have an F-statistic larger than this cut point are considered to be “significant”. For random permutations, any “significant” genes are presumed to be false positives, and a median number of false-positive genes can be computed over the set of permutations. This median is then multiplied by a shrinkage factor π, which represents the proportion of true null genes in the dataset. π is computed as the number of actual F-statistics that fall within the interquartile range of the F-statistics computed for all permutations and all genes, divided by 0.5 times the number of genes; if π is greater than 1, a value of 1 is used instead. The median number of false-positive genes, multiplied by π and divided by the number of “significant” genes, yields the FDR for a given ∆ value.
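The SAM steps above are sketched here for the two-class case, with |d| (a t-like statistic) playing the role of the F-statistic. This is a simplified illustration under stated assumptions: a pooled-variance statistic with a fixed fudge factor `s0`, a fixed number of random label permutations, and quartiles taken by simple indexing. It is not the published SAM software, and all names and defaults are hypothetical.

```python
import math
import random
from statistics import median

def d_stat(row, labels, s0):
    """Absolute two-class t-like statistic with fudge factor s0 in the denominator."""
    g1 = [v for v, c in zip(row, labels) if c == 1]
    g0 = [v for v, c in zip(row, labels) if c == 0]
    m1, m0 = sum(g1) / len(g1), sum(g0) / len(g0)
    ss = sum((v - m1) ** 2 for v in g1) + sum((v - m0) ** 2 for v in g0)
    s = math.sqrt((1 / len(g1) + 1 / len(g0)) * ss / (len(row) - 2))
    return abs(m1 - m0) / (s + s0)

def sam_fdr(data, labels, delta, s0=0.5, n_perm=200, seed=1):
    """Return (indices of significant genes, estimated FDR) for one delta value."""
    rng = random.Random(seed)
    n = len(data)
    observed = [d_stat(row, labels, s0) for row in data]
    obs_sorted = sorted(observed)
    perm_sorted = []
    for _ in range(n_perm):
        perm = list(labels)
        rng.shuffle(perm)
        perm_sorted.append(sorted(d_stat(row, perm, s0) for row in data))
    # expected order statistics: mean of the i-th smallest value over permutations
    expected = [sum(p[i] for p in perm_sorted) / n_perm for i in range(n)]
    i_star = next((i for i in range(n) if obs_sorted[i] - expected[i] > delta), None)
    if i_star is None:
        return [], 0.0
    cut = obs_sorted[i_star]
    significant = [g for g, d in enumerate(observed) if d >= cut]
    # "significant" genes in permuted data are false positives
    median_false = median(sum(1 for d in p if d >= cut) for p in perm_sorted)
    # pi: observed statistics inside the permutation interquartile range / (0.5 * n)
    pooled = sorted(d for p in perm_sorted for d in p)
    q1, q3 = pooled[len(pooled) // 4], pooled[(3 * len(pooled)) // 4]
    inside = sum(1 for d in observed if q1 <= d <= q3)
    pi0 = min(1.0, inside / (0.5 * n))
    return significant, pi0 * median_false / len(significant)
```

Repeating the call over a grid of ∆ values reproduces the usual SAM trade-off between list length and FDR.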

2.4.6.2. Class comparison using data blocking

Genes that were differentially expressed between classes of samples were identified using a multivariate permutation test (Korn et al., 2004b; Simon et al., 2003a). In comparing classes of samples, gene expression variation arising from a potentially confounding factor was controlled for using ‘data blocking’. This refers to the inclusion of confounding factors in the general linear model (ANOVA) used to identify genes with expression differences between the classes of interest. Such a factor could be different batches of microarray slides within one experiment, or a variable such as the volume of residual disease remaining after surgery. Variation in the expression data attributable to these variables is thereby controlled for, allowing the test to identify genes with expression differences between the true classes of interest.

The multivariate permutation test was used to provide 90% confidence that the false

discovery rate was less than 10%. The false discovery rate is the proportion of the list of

genes claimed to be differentially expressed that are false positives. The test statistics

used are random variance F-statistics for the effect of tumour type for each gene (Wright


and Simon, 2003). The F-statistics were computed from a two-way analysis of variance with tumour type and the blocking variable as the two factors. Although F-statistics were used, the multivariate

permutation test is non-parametric and does not require the assumption of Gaussian

distributions. In our analyses to find genes that were differentially expressed among

classes, technical replicates of the same sample were averaged.
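The effect of blocking can be sketched as an ordinary least-squares comparison of an additive model (block + class) against a block-only model, which gives an F-statistic for the class effect after the blocking variable has been accounted for. This assumes NumPy is available and uses the ordinary F rather than the random variance F-statistic of Wright and Simon (2003); the function name is hypothetical.

```python
import numpy as np

def class_f_statistic(y, classes, blocks):
    """F-statistic for the class effect in the additive linear model y ~ block + class."""
    y = np.asarray(y, dtype=float)

    def dummies(labels):
        levels = sorted(set(labels))
        # one indicator column per level except the first (absorbed by the intercept)
        return np.array([[1.0 if lab == lev else 0.0 for lev in levels[1:]] for lab in labels])

    def rss(x):
        coef, *_ = np.linalg.lstsq(x, y, rcond=None)
        resid = y - x @ coef
        return float(resid @ resid)

    x_reduced = np.hstack([np.ones((len(y), 1)), dummies(blocks)])  # block-only model
    x_full = np.hstack([x_reduced, dummies(classes)])              # block + class model
    rss_red, rss_full = rss(x_reduced), rss(x_full)
    df_class = x_full.shape[1] - x_reduced.shape[1]
    df_resid = len(y) - x_full.shape[1]
    return ((rss_red - rss_full) / df_class) / (rss_full / df_resid)
```

A large slide-batch offset shared by both classes contributes nothing to this F-statistic, because it is already explained by the block term.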

2.4.7. Machine-learning approaches for class prediction

2.4.7.1. Survival analysis

Survival analysis was used to identify genes whose expression was significantly related to

survival of the patients in a given experiment. A statistical significance level was

computed for each gene based on univariate proportional hazards models (Cox, 1972).

These p values were then used in a multivariate permutation test (Korn et al., 2004b;

Simon et al., 2003a) in which the survival times and censoring indicators were randomly

permuted among arrays. The multivariate permutation test was used to provide 90%

confidence that the false discovery rate was less than 10%. The false discovery rate is the

proportion of the list of genes claimed to be differentially expressed that are false

positives. The multivariate permutation test is non-parametric and does not require the

assumption of Gaussian distributions.
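The per-gene univariate proportional hazards fit can be sketched as a Newton-Raphson maximisation of the Cox partial likelihood for a single covariate, with Breslow handling of tied times and a Wald p-value. The choice of the Wald test and the function name are assumptions for illustration; the permutation step described above would then operate on these p-values.

```python
import math

def cox_univariate(x, time, event, max_iter=50, tol=1e-9):
    """Fit h(t | x) = h0(t) * exp(beta * x) for one gene; return (beta, Wald p-value)."""
    n = len(x)

    def score_info_loglik(beta):
        # Breslow partial likelihood: risk set = everyone still under observation
        u = info = ll = 0.0
        for i in range(n):
            if not event[i]:
                continue
            risk = [j for j in range(n) if time[j] >= time[i]]
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, risk))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
            u += x[i] - s1 / s0
            info += s2 / s0 - (s1 / s0) ** 2
            ll += beta * x[i] - math.log(s0)
        return u, info, ll

    beta = 0.0
    u, info, ll = score_info_loglik(beta)
    for _ in range(max_iter):
        step = u / info  # Newton-Raphson step on the concave log partial likelihood
        new = score_info_loglik(beta + step)
        while new[2] < ll and abs(step) > tol:  # halve the step if the likelihood falls
            step /= 2
            new = score_info_loglik(beta + step)
        beta += step
        u, info, ll = new
        if abs(step) < tol:
            break
    z = beta * math.sqrt(info)  # beta / SE, with SE = 1 / sqrt(observed information)
    return beta, math.erfc(abs(z) / math.sqrt(2))
```

A positive beta indicates that higher expression is associated with a higher hazard, that is, shorter survival.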

2.4.7.2. Quantitative Trait Analysis

Genes whose expression was significantly related to a continuous variable such as patient

survival were identified with Quantitative Trait Analysis. A statistical significance level

was computed for each gene for testing the hypothesis that the Spearman’s correlation

between gene expression and survival time (in months) was zero. These p values were

then used in a multivariate permutation test (Korn et al., 2004b; Simon et al., 2003a) in

which the survival times were randomly permuted among arrays. The multivariate permutation test

was used to provide 90% confidence that the false discovery rate was less than 10% of

the number of genes identified. The false discovery rate is the proportion of the list of

genes claimed to be differentially expressed that are false positives. The multivariate

permutation test is non-parametric and does not require the assumption of Gaussian

distributions.
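The per-gene test can be sketched as a Spearman correlation (average ranks for ties) with a permutation p-value obtained by shuffling the trait values. The function names, the number of permutations and the add-one correction in the p-value are assumptions, and this is a univariate test rather than the multivariate permutation procedure itself.

```python
import random

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

def permutation_p(expr, trait, n_perm=999, seed=0):
    """Two-sided permutation p-value for rho, shuffling the trait among arrays."""
    rng = random.Random(seed)
    observed = abs(spearman(expr, trait))
    shuffled = list(trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(spearman(expr, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```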


2.4.8. Class Prediction

Using the BRB ArrayTools package (Simon and Lam), models that use gene expression profiles to predict the class of future samples were created. The models developed were based on the Compound Covariate Predictor (Radmacher et al., 2002), Diagonal Linear Discriminant Analysis (Dudoit et al., 2002), Nearest Neighbour Classification (Dudoit et al., 2002) and Support Vector Machines with a linear kernel (Ramaswamy et al., 2001). The models incorporated genes that were differentially expressed among classes at the 0.001 significance level, as assessed by the random variance t-test (Wright and Simon, 2003). The prediction error of each model was estimated using leave-one-out cross-validation (LOOCV) as described by Simon et al. (Simon et al., 2003b). For each LOOCV training set, the entire model-building process was repeated, including the gene selection process. Whether the cross-validated error rate estimate for a model was significantly lower than would be expected from random prediction was also evaluated: the class labels were randomly permuted and the entire LOOCV process was repeated. The significance level is the proportion of the random permutations that gave a cross-validated error rate no greater than the cross-validated error rate obtained with the real data. One thousand random permutations were used.
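The key point above, that gene selection must be repeated inside every LOOCV training set, can be sketched as follows. As simplifying assumptions, genes are chosen as the top `k` absolute Welch t-statistics rather than by a p < 0.001 threshold, a nearest-centroid rule stands in for the predictors listed above, data are held as samples × genes lists, and each class keeps at least two samples in every training set; all names are hypothetical.

```python
from statistics import mean, stdev

def select_genes(X, y, k):
    """Indices of the k genes with the largest absolute Welch t-statistic."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
        scores.append(abs(mean(a) - mean(b)) / se if se > 0 else 0.0)
    return sorted(range(len(scores)), key=lambda g: -scores[g])[:k]

def nearest_centroid(train_X, train_y, x, genes):
    """Predict the class whose centroid (over the selected genes) is closest to x."""
    best = None
    for c in sorted(set(train_y)):
        members = [row for row, lab in zip(train_X, train_y) if lab == c]
        dist = sum((x[g] - mean(row[g] for row in members)) ** 2 for g in genes)
        if best is None or dist < best[0]:
            best = (dist, c)
    return best[1]

def loocv_error(X, y, k=2):
    """Leave-one-out error rate with gene selection repeated inside every fold."""
    errors = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = select_genes(train_X, train_y, k)  # the left-out sample is never seen
        if nearest_centroid(train_X, train_y, X[i], genes) != y[i]:
            errors += 1
    return errors / len(X)
```

Selecting genes once on the full dataset and only then cross-validating would let the left-out sample influence gene selection, biasing the error estimate downwards.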

2.4.8.1. Multivariate permutation tests for controlling the number and proportion of false discoveries

The multivariate permutation tests for controlling the number and proportion of false discoveries are used for class comparison, survival analysis and quantitative trait analysis. Using a stringent p<0.001 threshold for identifying differentially expressed genes is one way of controlling the number of false discoveries; a false discovery is a gene that is declared differentially expressed among the classes when in fact it is not. There are two problems with this approach. First, it is based on p values computed from the parametric t/F tests or random variance t/F tests, and these parametric p values may not be accurate in the extreme tails of the distribution for small numbers of samples. Second, the approach does not take into account the correlation among the genes. Applying stringent thresholds to univariate permutation p values instead will not be effective when there are few samples and will still not account for correlations. Multivariate permutation tests that address both problems were used in this study, as described in Technical Report 3, Biometric Research Branch, National Cancer Institute, 2002 (http://linus.nci.nih.gov/~brb), and by Reiner et al. (Reiner et al., 2003).


The multivariate permutation tests are based on permutations of the labels assigning experiments to classes. If there are fewer than 1000 possible permutations, all permutations are considered; otherwise, a large number of random permutations is used. For each permutation, the parametric tests are re-computed to determine a p value for each gene, measuring the extent to which it appears differentially expressed between the random classes defined by that permutation. The genes are ordered by their p values for the permutation (smallest p values at the top of the list), and for each potential p value threshold the program records the number of genes in the list. This process is repeated for a large number of permutations. Consequently, for any p value threshold, the distribution of the number of genes with p values smaller than that threshold under permutation can be computed. This is the distribution of the number of false discoveries, since genes that are significant for random permutations are false discoveries. The algorithm selects a threshold p value such that the number of false discoveries is no greater than that specified by the user C% of the time, where C denotes the desired confidence level.
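The threshold-selection step can be sketched as below, assuming the permutation p-values have already been computed (one list per permutation). The function returns the largest observed p-value cut-off for which the permutation count of false discoveries is at most `max_false` in at least C% of permutations, together with the number of genes declared significant. The names and the quantile rule are assumptions.

```python
import math

def mvp_threshold(observed_p, permuted_p, max_false=0, confidence=0.9):
    """Return (p threshold, number of genes called) controlling false discoveries."""
    n_perm = len(permuted_p)
    q_index = math.ceil(confidence * n_perm) - 1  # position of the C% quantile
    best = (0.0, 0)
    for t in sorted(set(observed_p)):
        # false discoveries per permutation at this threshold
        counts = sorted(sum(1 for p in perm if p <= t) for perm in permuted_p)
        if counts[q_index] <= max_false:
            best = (t, sum(1 for p in observed_p if p <= t))
        else:
            break  # counts can only grow as the threshold increases
    return best
```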

The procedures for controlling the number or proportion of false discoveries are based on multivariate permutation tests. Although parametric p values are used in the procedures, the permutation distribution of these p values is determined, and hence the false discovery control is non-parametric and does not depend on normal distribution assumptions.

The multivariate permutation tests also take advantage of the correlation among the genes. For a given p value used to truncate an ordered gene list, the expected number of false discoveries does not depend on the correlations among the genes, but the distribution of the number of false discoveries does; this distribution is skewed for highly correlated data. If the confidence coefficient is specified as 50%, the program provides the length of the gene list associated with a specified median number of false discoveries or a given proportion of false discoveries.

2.4.8.2. Compound Covariate Predictor

The Compound Covariate (CC) method of prediction uses a weighted linear combination

of log-ratios (or log intensities for single-channel experiments) for genes that are

univariately significant at the specified level. By specifying a more stringent significance

level, fewer genes are included in the multivariate predictor. Genes in which larger values

of the log-ratio pre-dispose to class 2 rather than class 1 have weights of one sign,

whereas genes in which larger values of the log-ratios pre-dispose to class 1 rather than


class 2 have weights of the opposite sign. The univariate t-statistics for comparing the classes are used as the weights. Detailed information about the Compound Covariate Predictor is available in Hedenfalk et al. (Hedenfalk et al., 2001) or in Technical Report 01, 2001, Biometric Research Branch, National Cancer Institute, USA (http://linus.nci.nih.gov/~brb/TechReport.htm).
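A minimal compound covariate predictor is sketched below, assuming two classes coded 0 and 1 and a pre-selected list of gene indices. Welch t-statistics are used as the weights, so genes that are higher in class 1 get positive weights, and the classification threshold is assumed to be the midpoint between the two class means of the compound covariate; the function names are hypothetical.

```python
from statistics import mean, stdev

def t_stat(a, b):
    """Welch t-statistic for mean(b) - mean(a)."""
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    return (mean(b) - mean(a)) / se

def train_ccp(X, y, genes):
    """Return a predict(x) function for a compound covariate predictor."""
    weights = {}
    for g in genes:
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        weights[g] = t_stat(a, b)

    def score(x):
        # weighted linear combination of log-ratios, weights = t-statistics
        return sum(w * x[g] for g, w in weights.items())

    cut = (mean(score(x) for x, lab in zip(X, y) if lab == 0)
           + mean(score(x) for x, lab in zip(X, y) if lab == 1)) / 2

    def predict(x):
        return 1 if score(x) > cut else 0

    return predict
```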

2.4.8.3. Diagonal Linear Discriminant Analysis

Diagonal Linear Discriminant Analysis (DLDA) is similar to the Compound Covariate Predictor. It is a version of linear discriminant analysis that ignores correlations among the genes in order to avoid over-fitting the data. Many complex methods have too many parameters for the amount of data available; consequently, they appear to fit the training data used to estimate the parameters of the model well, but have poor prediction performance for independent data. The study by Dudoit et al. (Dudoit et al., 2002) found that diagonal linear discriminant analysis performed as well as much more complicated methods on a range of microarray data sets.
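A sketch of DLDA with equal class priors follows: each gene is scaled by its pooled within-class variance, and correlations between genes are ignored (a diagonal covariance matrix), which keeps the number of estimated parameters small. The equal-prior assumption and the function name are illustrative, and genes with zero pooled variance would need to be removed first.

```python
from statistics import mean

def train_dlda(X, y):
    """Return a predict(x) function for diagonal linear discriminant analysis."""
    classes = sorted(set(y))
    n_genes = len(X[0])
    centroids = {c: [mean(x[g] for x, lab in zip(X, y) if lab == c) for g in range(n_genes)]
                 for c in classes}
    # pooled within-class variance of each gene; gene-gene covariances are ignored
    pooled = [sum((x[g] - centroids[lab][g]) ** 2 for x, lab in zip(X, y))
              / (len(X) - len(classes)) for g in range(n_genes)]

    def predict(x):
        return min(classes, key=lambda c: sum((x[g] - centroids[c][g]) ** 2 / pooled[g]
                                              for g in range(n_genes)))

    return predict
```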

2.4.8.4. k Nearest Neighbour Predictor

The k Nearest Neighbour (kNN) Predictor is based on determining which expression

profile in the training set is most similar to the expression profile of the specimen whose

class is to be predicted.

The expression profile is a vector of log-ratios or log-intensities for the genes selected for

inclusion in the multivariate predictor. Euclidean distance is used as the distance metric

for the Nearest Neighbour Predictor. Once the test specimen's nearest neighbour in the training set has been determined, the class of that nearest neighbour is used as the predicted class of the test specimen. kNN prediction is an extension of the Nearest Neighbour

method. For example, with the 3-Nearest Neighbour algorithm, the expression profile of

the test specimen is compared to the expression profiles of all of the specimens in the

training set and the 3 specimens in the training set most similar to the expression profile

of the test specimen are determined. The distance metric is also Euclidean distance with

regard to the genes that are univariately significantly differentially expressed between the

two classes at the threshold significance level specified. Once the 3 nearest specimens are

identified, their classes vote and the majority class among the 3 is the class predicted for

the test specimen.


This approach was first applied to microarray data by Golub et al. (Golub et al., 1999) and, despite its relative simplicity compared with other algorithms frequently used for microarray analysis, it remains popular because of its modest computational requirements and its potential for producing highly accurate results.
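A minimal k-nearest-neighbour predictor using Euclidean distance and majority voting is sketched below, assuming the training profiles already contain only the selected genes. Ties in the vote are resolved by `Counter.most_common`, which favours the class first encountered among the nearest specimens; this tie rule is an assumption rather than documented BRB ArrayTools behaviour, and the function name is hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Majority class among the k training profiles closest to x (Euclidean)."""
    nearest = sorted((math.dist(row, x), lab) for row, lab in zip(train_X, train_y))[:k]
    votes = Counter(lab for _, lab in nearest)
    return votes.most_common(1)[0][0]
```

With `k=1` the function reduces to the plain Nearest Neighbour Predictor described above.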

2.4.8.5. Nearest Centroid Predictor

Nearest Centroid Prediction (NC) is another algorithm implemented in a range of data

analysis tools, including the ArrayTools suite (Simon and Lam). In the training set there

are samples belonging to class 1 and to class 2. The centroid of each class is determined.

The centroid of class 1, for example, is a vector containing the means of the log-ratios (or

log intensities for single label data) of the training samples in class 1. There is a

component of the centroid vector for each gene represented in the multivariate predictor;

that is, for each gene that is univariately significantly differentially expressed between the

two classes at the threshold significance level specified. The distance of the expression

profile for the test sample to each of the two centroids is measured and the test sample is

predicted to belong to the class corresponding to the nearest centroid.
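The nearest centroid rule can be sketched as follows, assuming profiles restricted to the selected genes and Euclidean distance (the text does not name the metric, so this is an assumption). It accepts any number of classes; the function name is hypothetical.

```python
import math
from statistics import mean

def nearest_centroid_predict(train_X, train_y, x):
    """Assign x to the class whose centroid (vector of gene means) is closest."""
    classes = sorted(set(train_y))
    centroids = {
        c: [mean(col) for col in zip(*[row for row, lab in zip(train_X, train_y) if lab == c])]
        for c in classes
    }
    return min(classes, key=lambda c: math.dist(x, centroids[c]))
```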

2.4.8.6. Support Vector Machines

A Support Vector Machine (SVM) is a class prediction algorithm that has proved effective in other contexts and is currently of great interest to the bioinformatics and

machine learning communities. SVMs were developed by V. Vapnik (Vapnik, 1998).

The SVM predictor is a linear function of the log-ratios or the log-intensities that best

separates the data subject to penalty costs on the number of specimens misclassified. The

SVM implementation used in this study is LIBSVM of Chang and Lin (Fan et al., 2005).

For all classification tasks the SVM algorithm used a one-vs.-all architecture, which means that separate classifiers were trained to distinguish each class from all the other cases in the data set. An individual support vector machine solution is determined by the vector w and the constant b (bias), obtained as the minimiser of the so-called ‘regularised risk’, where n is the number of genes; w ∈ R^n is a vector of dimensionality equal to the number of genes, n, and ||w|| is its Euclidean norm; b ∈ R is a constant (bias); x_i ∈ R^n and y_i = ±1 are the sample measurement vector and label, i = 1, …, S (number of samples); “·” denotes the dot product in R^n; and C > 0.

$$\min_{w,\,b}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{S}\max\bigl(0,\ 1 - y_i\,(w\cdot x_i + b)\bigr)$$


Equation 2-5: Equation for determining an individual support vector machine solution

The SVM creates a hyperplane in n-dimensional space, where n is the number of genes selected. This hyperplane in effect allows a decision as to whether or not a case is within a class, and the relative distance of a case from the hyperplane provides a measure of decision confidence.

The sign of the SVM output determines the predicted class: positive values denote membership of the class in question (labelled +1), whereas negative values denote that the case does not belong to that class (labelled −1). In Figure 2.2, Class 1 may be the default and hence labelled +1, so that all Class 2 cases receive negative scores. The further the output lies from zero, the more confident the SVM prediction. Therefore, in a three-class problem, the classifier giving the highest score determines the class label (Brown et al., 2000).
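To illustrate what the hyperplane and its signed output represent, the sketch below minimises the regularised risk ½‖w‖² + C Σ max(0, 1 − y_i(w·x_i + b)) for a binary problem by plain full-batch subgradient descent. This is a teaching sketch, not LIBSVM's solver; the learning rate, iteration count and function names are assumptions.

```python
def train_linear_svm(X, y, C=1.0, lr=0.01, n_iter=500):
    """Full-batch subgradient descent on 0.5*||w||^2 + C * sum of hinge losses; y in {-1, +1}."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(n_iter):
        grad_w = list(w)  # gradient of the 0.5*||w||^2 term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active: sample is inside the margin or misclassified
                for j in range(n_features):
                    grad_w[j] -= C * yi * xi[j]
                grad_b -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def decision(w, b, x):
    """Signed output: > 0 predicts the +1 class; larger magnitude = further from the hyperplane."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b
```

For a one-vs.-all problem, one such classifier would be trained per class and the class with the highest `decision` value chosen.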

2.4.8.7. Cross validated calculations of misclassification probabilities

To determine the p-value for the cross-validated misclassification error rate, permutation

analysis was carried out. For each random permutation of class labels, the entire cross-

validation procedure was repeated to determine the cross-validated misclassification rate

obtained from developing a multivariate predictor with two random classes. The final p-

value is the proportion of the random permutations that gave as small a cross-validated

misclassification rate as was obtained when using the true class labels. A cross-validated misclassification rate and a corresponding p-value are thus obtained for each class prediction method used.

One thousand permutations were carried out for each test of this kind, unless otherwise

stated, to obtain a statistically robust permutation p-value for the cross-validated

misclassification rate of a given algorithm.
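The permutation p-value can be sketched as a wrapper around any function that runs the complete cross-validation, including gene selection, and returns an error rate. The toy one-gene leave-one-out nearest-mean classifier used in the example is hypothetical and only serves to make the sketch runnable.

```python
import random

def permutation_p_value(cv_error, X, y, n_perm=1000, seed=0):
    """Proportion of label permutations whose CV error is <= the error with the true labels."""
    rng = random.Random(seed)
    observed = cv_error(X, y)
    labels = list(y)
    as_good = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if cv_error(X, labels) <= observed:
            as_good += 1
    return observed, as_good / n_perm

def loo_nearest_mean(X, y):
    """Toy LOOCV: predict each left-out value by the nearest class mean of the rest."""
    errors = 0
    for i in range(len(X)):
        tx, ty = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        means = {c: sum(v for v, lab in zip(tx, ty) if lab == c) / ty.count(c) for c in set(ty)}
        errors += min(means, key=lambda c: abs(X[i] - means[c])) != y[i]
    return errors / len(X)

error, p = permutation_p_value(loo_nearest_mean, [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3],
                               [0, 0, 0, 0, 1, 1, 1, 1])
```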


2.4.9. Gene ontology analysis

2.4.9.1. EASE

Gene ontology analysis was carried out on the list of overlapping genes identified as

being significantly differentially expressed between histological subtypes to determine

whether the difference in the size of these lists corresponded to particular classes of genes

or known functional groups. The online ontology analysis tool EASE (Hosack et al.,

2003) was used to determine significantly represented ontologies or biological ‘themes’,

in both sets of histologically discriminant genes.

Briefly, this method annotates a given list of genes with their known ontology

memberships and statistically compares the biological themes represented in the list with

the total ontology profile of a ‘background list’ of genes, usually the entire list of genes

present on the array being used. It also takes into account the total number of genes in the

genome known to belong to each ontology classification. A Fisher's exact p-value and EASE score are calculated and the ontologies are ranked in order of significance. The number and names of genes present in each class can be viewed, alongside the number of

genes present in the background list belonging to the same class, as well as the total

number of genes on a genome level with the same classification. These values are used to

determine the significance of observing groups of genes with similar functions in a given

list of genes.

The EASE score is a conservative adjustment to the Fisher exact probability that weights

significance in favour of themes supported by more genes. The concept of jack-knifing a

probability is the theoretical basis of the EASE score (Baty et al., 2005). The stability of any given statistic can be ascertained by a procedure called jack-knifing, in which each data point is removed in turn and the statistic recalculated, giving a distribution of probabilities that is broad if the result is highly variable and tight if the result is robust. The EASE score is calculated by penalising (removing) one gene within the given category from the list and calculating the resulting Fisher exact probability for that category. It therefore represents the upper bound of the distribution of jack-knife Fisher exact probabilities, and has the advantage of penalising the significance of categories supported by few genes.
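A sketch of the one-sided Fisher exact (hypergeometric) probability and the corresponding EASE score follows. Based on the description above, the EASE score is assumed to be the same probability after removing one category gene from the list (list hits and list size each reduced by one); the names are hypothetical and the background list is treated as the population.

```python
from math import comb

def upper_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N genes, K in category, n genes in the list)."""
    total = comb(N, n)
    return sum(comb(K, x) * comb(N - K, n - x) for x in range(k, min(n, K) + 1)) / total

def fisher_and_ease(k, n, K, N):
    """One-sided Fisher exact p-value and the EASE score for a category."""
    fisher = upper_tail(k, n, K, N)
    ease = upper_tail(k - 1, n - 1, K, N)  # one category gene removed from the list
    return fisher, ease
```

Because the EASE score is always at least as large as the Fisher probability, categories supported by only one or two list genes lose most of their apparent significance.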


2.4.9.2. FatiGO

Comparative ontology analysis was carried out using the FatiGO algorithm, which is based on concepts similar to those of EASE (described above) but can statistically compare the ontologies represented by two separate gene lists (Al-Shahrour et al., 2004). This method was used to compare gene lists generated from a range of analyses, to determine whether significantly different biological themes were represented by either list. A Fisher's exact test is used to calculate the level of significance, and permutation testing is used to control the false discovery rate, i.e. the expected proportion of ontologies declared differentially represented between the two gene lists by chance alone, which would otherwise be inflated by the large number of individual tests carried out.
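The per-term comparison can be sketched as a two-sided Fisher exact test on a 2 x 2 table (genes annotated or not annotated to a term, in list 1 or list 2). FatiGO corrects for multiple testing by permutation; purely for brevity, this sketch substitutes the Benjamini-Hochberg step-up adjustment, which is a different, analytic false discovery rate procedure. The function names and table layout are illustrative assumptions rather than FatiGO's implementation.

```python
from math import comb


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]].

    a, b: list-1 genes annotated / not annotated to the term
    c, d: list-2 genes annotated / not annotated to the term
    """
    n1, K, N = a + b, a + c, a + b + c + d
    denom = comb(N, n1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x annotated genes falling in list 1.
        return comb(K, x) * comb(N - K, n1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, n1 - (N - K)), min(n1, K)
    # Sum every table no more probable than the observed one; the small
    # tolerance guards against floating-point ties.
    tail = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, tail)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted q-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


# One GO term: 3 of 4 list-1 genes annotated versus 1 of 4 list-2 genes.
print(fisher_two_sided(3, 1, 1, 3))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```

In practice one p-value would be computed per ontology term and the whole vector adjusted together, since it is the number of terms tested that drives the multiple-testing problem.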

2.4.10. Quantification of IHC staining

In order to analyse the staining intensity of the antibodies used for IHC validation work in

this thesis, three high power (400x) fields of each stained tumour section were captured.

An effort was made to select those areas most representative of the predominant staining

pattern in each tumour. These images were backed up onto external media then processed

with a modified protocol for quantifying IHC staining intensity using Adobe Photoshop

CS2 (Adobe Systems Inc., USA) described by a number of groups (Lehr et al., 1997;

Lehr et al., 1999; Matkowskyj et al., 2000).

This method involves the application of a threshold filter to each image, calibrated for

each antibody or IHC batch if necessary, to exclude the unstained sections of the image as

well as any non-specific staining. The threshold command in Adobe Photoshop converts

colour images to high-contrast black-and-white. All pixels lighter than the threshold

value, determined by the operator, are converted to white; all pixels darker are converted

to black. The threshold can be set between 0 and 255, the digital tonal range of the image.

Next the image is inverted and converted to greyscale, resulting in the areas of antibody

staining appearing white and the rest of the field black. A histogram of pixel intensities is

then created for the inverted image and the mean and standard deviation values exported

into a tab-delimited text file. Statistical analysis of these mean and standard deviation

values was then carried out using Minitab version 13 (Minitab Inc, USA) to determine if a

significant difference existed between the IHC intensities of the tumour classes under

investigation. Examples of this method are given in chapter five.
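The threshold, invert and histogram steps can be approximated outside Photoshop. The sketch below is an illustration under stated assumptions, not the Photoshop implementation: it takes an image already converted to 8-bit greyscale (0 = black, 255 = white), treats pixels at or above the threshold as "lighter" (white), and reports the mean and population standard deviation of the inverted pixel values. The function name and example image are hypothetical.

```python
from statistics import mean, pstdev


def ihc_stain_summary(grey_image, threshold):
    """Threshold, invert and summarise an 8-bit greyscale image.

    grey_image: list of rows of pixel values in 0..255.
    threshold: operator-chosen cut-off in 0..255.
    Returns (mean, standard deviation) of the inverted binary image, in which
    stained (dark) pixels are 255 and unstained pixels are 0.
    """
    if not 0 <= threshold <= 255:
        raise ValueError("threshold must lie in the 0-255 tonal range")
    inverted = []
    for row in grey_image:
        for px in row:
            binary = 255 if px >= threshold else 0   # lighter -> white, darker -> black
            inverted.append(255 - binary)            # invert so staining becomes white
    return mean(inverted), pstdev(inverted)


# Two dark (stained) and two light pixels, threshold 128.
print(ihc_stain_summary([[10, 200], [50, 250]], 128))
```

Because the inverted image contains only 0 and 255, its mean is simply 255 multiplied by the fraction of the field classed as stained, which is why the choice of threshold for each antibody or batch matters so much.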


3. Optimisation of cDNA microarray profiling for large-scale tumour profiling studies

3.1. Introduction

This chapter aims to experimentally analyse several technical aspects of the cDNA microarray work flow in order to determine the optimal parameters for large-scale tumour profiling studies. The choice between a universal and a project-specific reference RNA, and the impact of the microarray scanner on the robustness and statistical accuracy of the expression data generated, are evaluated. Issues of replication and normalisation are investigated, and a novel method for quantifying spatial bias in expression data is proposed.

The concepts introduced and findings generated in this chapter have implications for chapters four and five, in which cDNA microarrays are used to profile EOC specimens with the goal of exploring the molecular basis of clinically important phenotypic differences.

3.1.1. A method for quantification of spatial bias in cDNA microarray data

Systematic error is frequently detected in data generated from microarray experiments.

Common sources of such non-biological variability include differences between the

labelling efficiency of the Cy3/Cy5 dyes and inconsistencies in the surface on which the

probes are printed (Quackenbush, 2002). These factors can be minimised by careful

laboratory work and selection of quality reagents and substrates, but cannot be

completely avoided. Different methods of normalisation and other data manipulations,

such as analysis of gene expression ratio ranks rather than actual intensities, have been

proposed (Bilban et al., 2002; Broberg, 2003; Hoffmann et al., 2002; Qin and Kerr, 2004)

and are routinely used to correct for sources of technical noise in microarray data.

Specific examples of normalisation include intensity-dependent (non-linear) normalisation and print-tip normalisation, both of which use a locally weighted regression method of curve fitting (lowess) (Cleveland, 1979).

Intensity-dependent normalisation is used to compensate for differences in the labelling efficiency of the Cy dyes used to fluorescently label the reverse-transcribed sample RNA, whilst print-tip lowess normalisation is used to compensate for variation in the amount of probe deposited by individual pins during the printing process (Park et al., 2003).
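Both normalisations fit a lowess curve of the log-ratio M = log2(Cy5/Cy3) against the average log-intensity A = 0.5 x log2(Cy5 x Cy3) and subtract it; print-tip normalisation repeats the fit within each print-tip group. The sketch below assumes background-subtracted, positive intensities and uses a single-pass local linear fit with tricube weights and no robustness iterations, so it is a simplified stand-in for the full Cleveland (1979) procedure and for established implementations. The function names and the span frac = 0.4 are illustrative choices.

```python
import math


def lowess_fit(x, y, frac=0.4):
    """Single-pass local linear fit with tricube weights (no robustness steps)."""
    n = len(x)
    r = min(n, max(2, math.ceil(frac * n)))      # points in each local window
    fitted = []
    for xi in x:
        dist = [abs(xj - xi) for xj in x]
        h = sorted(dist)[r - 1] or 1e-12         # bandwidth: distance to r-th nearest
        w = [(1 - min(d / h, 1.0) ** 3) ** 3 for d in dist]   # tricube weights
        sw = sum(w)
        mx = sum(wj * xj for wj, xj in zip(w, x)) / sw
        my = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - mx) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - mx) * (yj - my) for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (xi - mx))    # weighted line evaluated at xi
    return fitted


def ma_values(red, green):
    """M (log-ratio) and A (mean log-intensity) from Cy5 (red) and Cy3 (green)."""
    M = [math.log2(r / g) for r, g in zip(red, green)]
    A = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return M, A


def loess_normalise(red, green, frac=0.4):
    """Intensity-dependent normalisation: subtract the lowess fit of M on A."""
    M, A = ma_values(red, green)
    return [m - f for m, f in zip(M, lowess_fit(A, M, frac))]


def print_tip_loess(red, green, tips, frac=0.4):
    """Print-tip normalisation: a separate lowess fit for each print-tip group."""
    groups = {}
    for i, tip in enumerate(tips):
        groups.setdefault(tip, []).append(i)
    out = [0.0] * len(red)
    for idx in groups.values():
        norm = loess_normalise([red[i] for i in idx], [green[i] for i in idx], frac)
        for i, value in zip(idx, norm):
            out[i] = value
    return out
```

For example, if one print-tip group shows a uniform two-fold Cy5 excess and another a four-fold excess, print_tip_loess returns log-ratios of approximately zero in both groups, whereas a single global fit would leave residual group-specific offsets.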


The issue of technically introduced spatial bias, or position effect (Miles, 2001; Smyth et

al., 2003) is more difficult to identify as it cannot be readily detected with a Ratio-

Intensity plot (also known as an ‘M vs. A’ plot). These are a common method for

visualising the range and distribution of both Cy3/Cy5 expression ratios and absolute

hybridisation intensities of microarray data (Yang et al., 2002b). Inspection of these plots

can assist in determining the type of normalisation required, particularly in the case of

bias due to higher incorporation or fluorescence intensity of one Cy dye. Print-tip

normalisation (Yang et al., 2002b) and Statistical Normalisation of Microarray Data

(SNOMAD) (Colantuoni et al., 2002) are two methods that take the physical position of

the array features into account in the normalisation process. However, as with all

statistical manipulations of raw data, normalisation of array data can have a detrimental

impact. Often the overall (dynamic) range of data points is substantially reduced in the

process of correcting for technical bias (Yang and Speed, 2002). This can potentially

impact on the biological question under investigation by reducing the proportion of genes

observed to be differentially expressed over a given threshold or between known classes

of samples (Hoffmann et al., 2002). Therefore, wherever possible, it is preferable to

identify and correct the cause of systematic error at the experimental level, rather than

relying on statistical manipulations, themselves a potential source of systematic variation

(Tsodikov et al., 2002).

3.1.2. Reference RNA options for large-scale cDNA microarray profiling studies

As microarray technology gives researchers the ability to investigate the expression of thousands of genes in parallel, microarrays are ideally suited to studying diseases with genetic foundations such as EOC. To date, the wide availability and lower cost of cDNA ‘spotted’ microarrays have led many academic research institutes, including the Peter MacCallum Cancer Centre, to adopt this technology for large-scale tumour profiling

studies. Microarray technology is a recent development compared to more traditional

methods for analysing patterns of gene expression information, such as differential

display (DD) (Liang and Pardee, 1992; Martin and Pardee, 1999), representational

differential analysis (RDA) (Lisitsyn et al., 1993) or suppression subtractive hybridisation (SSH) (Diatchenko et al., 1996). Therefore, before conducting a large-scale project using valuable RNA derived from human tissue, extensive planning and validation of the techniques

involved should be carried out in order to conserve resources and ensure the data

produced accurately represents true biology. Key decisions around choice of reference

RNA, appropriate levels of replication, selection of equipment, software and algorithms


for scanning, data extraction, normalisation and analysis need to be made at the outset of

the project to ensure that the output is free of technical bias and represents a true

measurement of gene expression in a particular tissue or cell line at a given time.

At the beginning of this project, little information was available on the impact of using

different types of reference RNA on the data generated. The selection of the most appropriate material for the study in question is imperative, as this provides a ‘baseline’ expression level for the genes being investigated. Because the reference RNA is an experimental constant used on every slide in a cDNA microarray-based study, it is important to select the one most appropriate for the goals of the planned study. Changing either the type or even the batch of reference RNA used has the potential to alter the results substantially and may introduce a significant ‘batch effect’ into the final expression data. If this occurs, one cannot determine whether observed differences in gene expression reflect the type or batch of reference used rather than true variation in the gene expression profiles of the specimens.

To investigate the most suitable reference RNA for the EOC-profiling study planned as

part of the AOCS, an experiment was carried out using two conceptually different

reference RNA types to determine what, if any, impact each reference had on the quality

of the expression data produced. The reference RNAs used for this comparison were:

(i) Pooled cell line reference: composed of RNA from 11 human cell lines (of

breast, testicular, colorectal, ovarian, melanoma, acute lymphoblastic

leukaemia, myeloma, acute promyelocytic leukaemia, fibrosarcoma and

hepatocellular origins), as first described by Perou et al. (2000).

(ii) Pooled tumour reference: composed of RNA extracted from a subset of the

ovarian tumours to be profiled for future analysis as part of the AOCS.

Tumours were selected for the pool to represent the most common

histological subtypes; predominantly serous and mucinous, with smaller

numbers of endometrioid, benign and other less frequent subtypes.

Specific sample information for the cell lines and tumour specimens used to create the

reference RNA stocks are given in Appendix C.


The pooled tumour reference represented a project-specific reference RNA. Samples were

chosen for inclusion in the ‘tumour pool’ in approximately the same histological subtype

proportions as observed in the general population. Hybridisations against this reference

will result in expression ratios with a common denominator approximately equivalent to

an ‘average’ specimen of EOC. The pooled cell line reference, by contrast, is designed to represent a molecular ‘average’ of gene expression across a range of common cancer types, and is therefore not specific to cancer of any one primary origin.

One of the key goals of the AOCS gene expression profiling study is to identify

potentially subtle differences in gene expression between known classes of EOC, such as

histological subtypes, as well as to identify novel classes that

correspond to variables of clinical importance. Because of this, the ability of a reference

RNA to give maximum resolution of gene expression differences between histological

and other subtypes was an important factor in evaluating the performance of these two

reference types. Other bioinformatic measures used to evaluate the references were:

Accuracy and reproducibility of synthetic quality control genes. These are

printed in the last row of each sub-grid and targeted by synthetic RNAs spiked

into the reverse transcribed specimen RNA to be hybridised. Comparison of the

observed expression ratios to the theoretical values of these features can be used

to assess the accuracy of the microarray platform.

The proportion of the genes on the array which were identified as detectably

expressed. As the final gene expression measurements obtained from two-colour

cDNA arrays are ratios of two intensity readings (test and reference intensity), a

reference that binds to a larger proportion of the total array will produce a larger

number of valid expression ratios available for downstream analysis.

The proportion of genes on the array identified as differentially expressed. Differential expression can be defined by an expression ratio above or below a predetermined threshold, or by a statistical assessment based on the global variance present on each array analysed.

involve the exclusion of genes without sufficient variation across a series of

arrays to constitute a contribution to the phenotype or variable of interest,

therefore any difference in the proportion of genes excluded on the basis of the

type of reference RNA used is important to note.


The number and type of genes with statistically significant differences in

expression between known histological subtypes in the dataset. A common

goal of microarray analysis is to identify genes that are differentially expressed

between two tumour subtypes. Analysis was carried out to determine if a

significant difference in the number of genes or represented ontologies identified was attributable to the reference RNA used.

The performance of the total dataset as assessed with a sample predictive

analysis. A large gene expression dataset of samples representing different

tumour or histological types can be used as a reference for the correct

classification of unknown samples (Huang et al., 2003; Kan et al., 2004;

Ramaswamy et al., 2001; Shedden et al., 2003; Tothill et al., 2005). In this

section, a number of classification algorithms were trained to predict the

histological subtype of a set of tumours for which the subtype was unknown to

the classifier. The percentage accuracy generated by each algorithm was

compared to determine if either reference type conveyed a predictive advantage

for this type of investigation.
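Two of the criteria above reduce to simple calculations. The sketch below assumes each array is a list of Cy5/Cy3 ratios (None for features without a valid ratio), uses a symmetric two-fold cut-off for differential expression, and estimates classification accuracy by leave-one-out cross-validation with a nearest-centroid rule on log2 ratios. The threshold, the nearest-centroid classifier and all names are illustrative assumptions rather than the algorithms used in this study.

```python
import math


def proportion_differential(ratios, fold=2.0):
    """Fraction of valid ratios at least `fold`-fold up or down."""
    valid = [r for r in ratios if r is not None and r > 0]
    if not valid:
        return 0.0
    cut = math.log2(fold)
    return sum(abs(math.log2(r)) >= cut for r in valid) / len(valid)


def _centroid(profiles):
    return [sum(col) / len(col) for col in zip(*profiles)]


def _sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def loo_nearest_centroid_accuracy(profiles, labels):
    """Percentage of samples whose held-out label is predicted correctly.

    profiles: equal-length lists of log2 ratios (no missing values).
    labels: class label (e.g. histological subtype) for each profile.
    """
    correct = 0
    for i, (x, truth) in enumerate(zip(profiles, labels)):
        # Train on every other sample, then classify the held-out one.
        train = [(p, l) for j, (p, l) in enumerate(zip(profiles, labels)) if j != i]
        classes = sorted({l for _, l in train})
        centroids = {c: _centroid([p for p, l in train if l == c]) for c in classes}
        predicted = min(classes, key=lambda c: _sq_dist(x, centroids[c]))
        correct += predicted == truth
    return 100.0 * correct / len(profiles)
```

Running both functions on the same specimens hybridised against each reference would give directly comparable figures for the proportion of informative genes and the predictive accuracy.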

As well as bioinformatic comparisons, several practical issues are considered and taken

into account when evaluating the most appropriate reference RNA for a large-scale

tumour profiling study. These include the time and expense required to maintain and

extract RNA from eleven different cell lines, the longevity of the reference in terms of the

maximum amount able to be produced and its ability to be regenerated if the original

supply is exhausted, plus the significant advantage of being able to combine data sets

from disparate sources using the same reference material.

3.1.3. The impact of experimental replication on the robustness of cDNA microarray gene expression measurements

Due to a range of factors, including the relative expense of microarray analysis and often

the limited nature of the material being profiled, it is rarely feasible to replicate the

microarray analysis of every sample included in a large-scale tumour profiling study. It

has been widely demonstrated that cDNA microarray experiments are susceptible to a

range of technical biases, including variation attributable to individual array batches,

hybridisation method and the type of scanner used (Holloway et al., 2002; King and

Sinha, 2001; Ramdas et al., 2001a; Simon et al., 2003b; Wang et al., 2001; Yang et al.,

2001). As a common goal of tumour profiling studies is to identify a small number of


genes with robust differential expression between classes, it is important for the gene

expression values generated to reflect actual biology rather than technical or systematic

noise.

ScoreCard (GE Healthcare, USA) gene expression measurements from a series of quality-control microarray hybridisations routinely carried out by the Peter Mac Microarray Facility were used to assess the impact of replication on data quality. The same batch of the commercially supplied solution of synthetic genes, spiked into the Cy3- and Cy5-labelled samples prior to hybridisation, was used for all arrays in this analysis. In this sense, each hybridisation including this material can be regarded as a replicate measurement for these genes. Therefore, for the purposes of this investigation, it was assumed that an average calculated from multiple ScoreCard measurements was equivalent to replicate profiling of true biological specimens. As more replicates were included in the average of a particular ScoreCard measurement, its deviation from the theoretical expected value was analysed using ANOVA.
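A sketch of this design, assuming each ScoreCard control has a known expected Cy5/Cy3 ratio and a series of replicate measured ratios: the absolute deviation of the averaged log2 ratio from the expected log2 ratio is computed for increasing numbers of replicates, and the resulting groups are compared with a one-way ANOVA F statistic. Averaging on the log2 scale, the grouping and all names are assumptions for illustration; the F statistic would still need to be referred to an F distribution (for example in Minitab) to obtain a p-value.

```python
import math


def deviation_after_averaging(replicates, expected, k):
    """|mean log2 ratio of the first k replicates - log2(expected ratio)|."""
    if not 1 <= k <= len(replicates):
        raise ValueError("k must be between 1 and the number of replicates")
    avg = sum(math.log2(r) for r in replicates[:k]) / k
    return abs(avg - math.log2(expected))


def deviations_by_replicate_count(controls, max_k):
    """One group of deviations per replicate count k = 1..max_k.

    controls: dict mapping control name -> (expected_ratio, [replicate ratios]).
    """
    return [[deviation_after_averaging(reps, exp, k) for exp, reps in controls.values()]
            for k in range(1, max_k + 1)]


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    values = [v for g in groups for v in g]
    grand = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_b, df_w = len(groups) - 1, len(values) - len(groups)
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w
```

A falling mean deviation from k = 1 upwards, together with a large F statistic, would indicate that averaging replicates brings the measurements measurably closer to their theoretical values.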

Another method of assessing the level of replication required to generate statistically

accurate cDNA expression data was the use of individual GenBank accession numbers

from the same UniGene clusters. UniGene Cluster IDs represented with multiple

accession numbers in the 10.5k cDNA human clone set used to generate the Peter Mac

10.5k cDNA microarray were identified (UniGene Build number 160). The questions

posed by this analysis were (i) what proportion of UniGene IDs represented on the array

by multiple accession numbers have significant variation in the expression values of these

array features and (ii) does replication and averaging reduce this variation to a level

where the difference between accession numbers from the same UniGene cluster is no

longer statistically significant?
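The grouping step behind these questions can be sketched as follows. It assumes a feature table mapping each GenBank accession to its UniGene cluster ID, and a list of replicate log2 ratios for each accession. For a cluster represented more than once, it reports the spread (maximum minus minimum) of the accession-level means over the first k replicates, so one can see whether averaging more replicates narrows the gap. The data structures, names and spread measure are illustrative; assessing statistical significance would still require a formal test.

```python
from collections import defaultdict


def multi_accession_clusters(accession_to_cluster):
    """Return {cluster_id: [accessions]} for clusters represented more than once."""
    clusters = defaultdict(list)
    for acc, cluster in accession_to_cluster.items():
        clusters[cluster].append(acc)
    return {c: sorted(accs) for c, accs in clusters.items() if len(accs) > 1}


def cluster_spread(accessions, log_ratios, k):
    """Max minus min of per-accession means over the first k replicate log2 ratios."""
    means = [sum(log_ratios[a][:k]) / k for a in accessions]
    return max(means) - min(means)


# Hypothetical example: two accessions from one cluster, two replicate arrays.
clusters = multi_accession_clusters({"AA001": "Hs.1", "AA002": "Hs.1", "AA003": "Hs.2"})
ratios = {"AA001": [1.0, 0.0], "AA002": [0.0, 1.0]}
for cluster, accs in clusters.items():
    print(cluster, [cluster_spread(accs, ratios, k) for k in (1, 2)])
```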

3.1.4. The impact of scanning hardware on cDNA microarray data quality

A key piece of equipment in any microarray experiment is the scanner, effectively the

bridge from the in vitro to in silico components of a microarray experiment. The primary

role of the scanner in two-colour microarray systems is to generate a high-resolution

digital image of the hybridised slide at two laser wavelengths selected to excite the

labelled probe bound to genetic material printed on the surface of the slide. Another

important function is to measure the amount of non-specific or background hybridisation.

Accurate measurement of this is important to calculate the proportion of a given gene’s


intensity reading due to random binding of labelled cDNA probes to the slide substrate or

non-specific probes.

Microarray scanners commonly consist of lasers, used to excite fluorescently labelled probes at two wavelengths, and dual-channel photomultiplier tubes (PMTs) for signal detection. During the scanning process, a series of filters, mirrors and lenses directs the fluorescence emitted on laser excitation of the printed region of the microarray to the PMTs, which convert it into an electronic signal that is then digitised. One image per dye is recorded for each microarray scanned, by moving either the slide or the laser itself (Lyng et al., 2004).

Newer model scanners such as the Agilent Microarray Scanner (Model BA, Product

#G2565BA) (Agilent Technologies, USA) feature a laser focusing technology referred to

as ‘dynamic focussing’ whereby the focal point of the laser detectors is continually

adjusted throughout the scan in response to minor variations in the substrate surface or

slide angle. Theoretically this is designed to reduce technical variation in the

measurement of fluorescence intensities that can be introduced by even minor variations

in glass thickness and topography. There is little peer-reviewed literature assessing the effect of dynamic focussing on microarray data quality, although data from the manufacturer suggest that this feature substantially decreases the spatial

dependency of background hybridisation measurements. A series of microarray

hybridisations was scanned on two different microarray scanners, with and without this

dynamic focussing technology to determine the impact, if any, on the data produced.

Significant differences observed between the datasets produced by the two scanners have implications for the preceding analysis of replication, and are discussed later.

An important caveat to consider with this analysis is that a number of other potentially

confounding variables exist between the two microarray scanners used for this

comparison, including the age of the machines. The Agilent Microarray Scanner was less

than one year old at the time of this project, whilst the Packard ScanArray had been in operation for approximately three years. As a result, some loss of laser intensity may have occurred over time in the Packard scanner. However, to counteract this, the machine was regularly serviced and the quality of its output monitored by staff of the Microarray Facility (Sambrook and Bowtell, 2003).


Figure 3-1: Schematic diagram of two microarray slides being scanned with and without adaptive focussing. (A) A slide loaded into the machine with a slight tilt results in measurements being made at different focal points. (B) The same tilted slide in the Agilent Microarray Scanner, in which the focal point is adjusted and each feature is measured in focus. (C) A slide with variation in glass thickness, resulting in feature intensities again being read at varying focal points. (D) The same slide scanned with the Agilent Microarray Scanner (Agilent Technologies, USA).


3.2. Results

3.2.1. Develop a method for measuring the degree of spatial bias present on a cDNA microarray

3.2.1.1. The importance of quantifying spatial bias: application and calibration of Mood's Median Test

Some arrays had large areas in which all features appeared to be up- or down-regulated. As the distribution of features on the arrays was intentionally randomised with respect to the biological properties of the associated genes, these spatial patterns were most likely due to technical problems. A number of normalisation algorithms exist for addressing spatial bias in cDNA microarray data (Colantuoni et al., 2002; Schuchhardt et al., 2000; Yang et al., 2002b); however, no method for visualising and quantifying the degree of this type of error has been described.

Spotted cDNA microarrays such as the Peter Mac 10.5k human array are printed by a series of pins, which results in a 4 x 6 sub-grid structure in the final product, each sub-grid containing 19 rows and 24 columns of spotted cDNAs. It was hypothesised that, on a cDNA array not affected by spatial bias, the differences between the median expression ratios of the sub-grids should be small relative to those on an array with more extensive spatial bias. Based on this premise, a statistical test was sought to determine the significance of observed differences in median expression between array sub-grids.

Mood's Median Test (MMT), implemented in the Minitab statistical package (Minitab Inc, USA), tests the significance of variation between the median values of a series of ‘data blocks’. This test was applied to each array in the series, using the sub-grid number to group the expression ratios into data blocks. The graphical output of the test reflected the pattern of spatial bias, and a chi-square score provided a numerical measurement of the degree of variation between sub-grid median values. This value is referred to as the MMT score from here on. The test is non-parametric and, being median-based, is robust to the outliers frequently present in gene expression data. This makes it suitable for microarray data, in which the expression ratio of one gene theoretically has no impact on the expression of an adjacent gene. Because of this, small numbers of highly upregulated features can be present in regions of a microarray where the majority of other features are down-regulated or not expressed at all, which would affect the mean, but not the median, expression ratio of an array sub-grid.
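For reference, the statistic can be computed directly: the pooled median of all ratios is found, each sub-grid's ratios are counted as at or below versus above that median, and a Pearson chi-square statistic is calculated on the resulting 2 x k table with k - 1 degrees of freedom. The sketch follows the at-or-below versus above counting suggested by the N<= and N> columns of Minitab's output; the function name and input format (a mapping from sub-grid number to expression ratios) are assumptions, and Minitab's handling of missing values may differ.

```python
from statistics import median


def mood_median_chi_square(blocks):
    """Mood's median test statistic for {block_id: [values]}.

    Returns (chi_square, degrees_of_freedom). Values equal to the pooled
    median are counted in the 'at or below' row.
    """
    pooled = [v for values in blocks.values() for v in values]
    grand_median = median(pooled)
    n_total = len(pooled)
    counts = []
    for values in blocks.values():
        below = sum(v <= grand_median for v in values)
        counts.append((below, len(values) - below))
    row_below = sum(b for b, _ in counts)
    row_above = n_total - row_below
    chi_square = 0.0
    for below, above in counts:
        size = below + above
        for observed, row_total in ((below, row_below), (above, row_above)):
            expected = size * row_total / n_total   # expected count under equal medians
            if expected > 0:
                chi_square += (observed - expected) ** 2 / expected
    return chi_square, len(blocks) - 1
```

Passing the 24 sub-grids of one array as the blocks gives a statistic with 23 degrees of freedom. Because only counts either side of the median enter the calculation, taking logs of the (positive) ratios first should leave the score unchanged.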

For all Peter Mac cDNA microarrays tested with Mood's Median Test, the p-value for variation in sub-grid median expression ratio was below 0.001, even when no visible spatial bias was present, indicating that a statistically significant difference existed between at least two sub-grids. Examination of the graphical output of the large number of tests performed on expression data from several studies conducted through the Peter Mac Microarray Facility (e.g. Figure 3-2) showed that, even on arrays without apparent spatial bias as judged by visual inspection of virtual array images, the 95% confidence intervals for the sub-grid median expression ratios differed sufficiently to produce p-values of less than 0.001. However, after applying the test to a large number of arrays, it was found that the Mood's Median Test chi-square values correlated positively with the extent of spatial bias visible to the naked eye.

By applying this test to a large number of arrays and relating both the MMT score and the confidence-interval diagrams to the level of spatial bias observed in virtual-array representations of the data, the MMT chi-square value was calibrated to determine an appropriate cut-off for deciding when the level of spatial bias present warrants correction by the spatially dependent normalisation algorithms described previously. Where the MMT chi-square exceeded 1000, the use of such algorithms did not reduce the MMT score sufficiently and the array was excluded from further analysis. Arrays with an MMT score below 200 had no visible spatial bias on inspection of their virtual array representation, so this value was adopted as the threshold for an acceptable level of spatial bias. Examples of the test applied to arrays with and without spatial bias are shown in Figure 3-2.

Another method to calibrate an acceptable MMT score for a particular microarray would

be the use of a box plot to visualise the range of MMT values present in a series of

hybridisations belonging to the one experiment, such as those generated in the statistical

package Minitab (Minitab Inc, 2002). This type of graph highlights any individual data

points (MMT scores) that are sufficiently larger than the main distribution as outliers. It is

an effective method for identifying any sub-standard arrays in a large series based on the

overall distribution of MMT scores in the entire dataset and the MMT scores

corresponding to these arrays can be used to determine the threshold of acceptable spatial

bias.
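Both calibration approaches can be expressed as a small triage step. The sketch applies the cut-offs reported above (scores below 200 acceptable, above 1000 excluded, intermediate scores flagged for spatially dependent normalisation and re-testing) and flags high box-plot outliers using Tukey's fences, with quartiles from Python's `statistics.quantiles` using its "exclusive" method. Treating scores beyond 3 x IQR as "extreme" outliers is a common convention assumed here, and the array names are hypothetical.

```python
from statistics import quantiles


def triage_by_cutoff(score, acceptable=200.0, exclude=1000.0):
    """Classify one MMT score using fixed cut-offs."""
    if score < acceptable:
        return "acceptable"
    if score > exclude:
        return "exclude"
    return "normalise and re-test"


def boxplot_outliers(scores):
    """Return {name: 'outlier' | 'extreme outlier'} for unusually high MMT scores.

    scores: dict mapping array name -> MMT score (at least two arrays).
    """
    q1, _, q3 = quantiles(scores.values(), n=4, method="exclusive")
    iqr = q3 - q1
    flagged = {}
    for name, s in scores.items():
        if s > q3 + 3 * iqr:          # assumed convention for an extreme outlier
            flagged[name] = "extreme outlier"
        elif s > q3 + 1.5 * iqr:      # standard upper Tukey fence
            flagged[name] = "outlier"
    return flagged


print(triage_by_cutoff(107), triage_by_cutoff(2976))
```

Only high scores are flagged, since a low MMT score indicates little spatial bias and is never a cause for concern.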


Figure 3-2: (A) Example of MMT output for a microarray slide with low spatial bias as observed with a virtual array diagram (right). The MMT score for this array is 107, as reflected by the small range of sub-grid confidence intervals (i.e. Scale 0.6 – 1.14). (B) Example of a second array with a high degree of spatial bias observed upon inspection of the virtual array figure on right. The range of sub-grid median confidence intervals is far greater (Cy5:Cy3 ratio 0.7 – 1.6) and the corresponding chi square value is 2976, reflecting the substantially greater degree of bias present in this array.

[Figure 3-2, panel A: Minitab output, Mood median test for microarray UP127, Chi-Square = 106.70, P = 0.000; individual 95.0% CIs of the median expression ratio (Cy5/Cy3) for sub-grids 01 to 24, axis 0.80 to 1.12]

[Figure 3-2, panel B: Minitab output, Mood median test for microarray UP013, Chi-Square = 2976.28, P = 0.000; individual 95.0% CIs of the median expression ratio (Cy5/Cy3) for sub-grids 01 to 24, axis 0.70 to 1.60]


3.2.1.2. Creation of Microsoft Excel Macro workbook to automate handling of raw gene expression data files

Most tumour profiling studies involve large numbers of specimens, with larger sample

sizes generally resulting in greater statistical power. Because of this it is often necessary

to automate repetitive aspects of the data handling involved. In order to facilitate the

calculation of MMT scores for multiple microarray profiles a Visual Basic macro was

created in Microsoft Excel (Microsoft Corporation, USA). This macro, accessible through

the interface shown in Figure 3-3, carries out the following functions:

(i) Asks the user to specify the directory (local or networked) containing the

GenePix (Fielden et al., 2002) or Quantarray (Packard Bioscience, 2000) raw

data files

(ii) Creates a new sub-directory at this location in which to place the finished

summary file

(iii) Opens the first .txt (Quantarray) or .gpr (GenePix) file in Microsoft Excel

(iv) Creates a column of background-subtracted gene expression ratios for each feature on the array, by applying the formula (Cy5 Feature Intensity – Cy5 Background Intensity) / (Cy3 Feature Intensity – Cy3 Background Intensity).

(v) Annotates each ratio with a number identifying the array sub-grid to which it

belongs.

(vi) Copies the following columns to a new spreadsheet: Feature identifier (generally the GenBank accession number), Subgrid identifier, background-subtracted expression ratio

(vii) Closes the raw data file

(viii) Opens the next file in directory and repeats steps (iv) – (vii).

(ix) Saves a compiled summary spreadsheet in the directory created in step (ii).

This summary spreadsheet can then be directly opened in Minitab and the MMT score for

each microarray profile determined rapidly from the now correctly formatted expression

data.
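The same steps can be reproduced outside Excel. The Python sketch below handles GenePix .gpr files only (the Quantarray .txt layout is not covered). It takes the "F635 Median"/"B635 Median" columns as Cy5 feature/background intensities and "F532 Median"/"B532 Median" as Cy3, and uses the GenePix "Block" column as the sub-grid identifier and "ID" as the feature identifier. These column choices, the summary file name and the function names are assumptions. Unlike the formula in step (iv), it returns no ratio when either background-subtracted intensity is not positive.

```python
import csv
from pathlib import Path


def read_gpr(path):
    """Return the data rows of a GenePix .gpr (ATF text) file as dicts."""
    with open(path, newline="") as fh:
        fh.readline()                                  # "ATF  1.0" version line
        n_header = int(fh.readline().split("\t")[0])  # number of header records
        for _ in range(n_header):
            fh.readline()                              # skip "Key=Value" records
        return list(csv.DictReader(fh, delimiter="\t"))


def background_subtracted_ratio(row):
    """(Cy5 feature - Cy5 background) / (Cy3 feature - Cy3 background)."""
    cy5 = float(row["F635 Median"]) - float(row["B635 Median"])
    cy3 = float(row["F532 Median"]) - float(row["B532 Median"])
    if cy5 <= 0 or cy3 <= 0:
        return None                                    # no meaningful ratio
    return cy5 / cy3


def summarise_directory(directory, out_name="summary"):
    """Compile feature ID, sub-grid and ratio for every .gpr file into one table."""
    directory = Path(directory)
    out_dir = directory / out_name
    out_dir.mkdir(exist_ok=True)                       # step (ii)
    out_path = out_dir / "mmt_input.txt"
    records = []
    for gpr in sorted(directory.glob("*.gpr")):        # steps (iii) and (viii)
        for row in read_gpr(gpr):
            ratio = background_subtracted_ratio(row)   # step (iv)
            records.append((gpr.stem, row["ID"], int(row["Block"]), ratio))
    with open(out_path, "w", newline="") as fh:        # step (ix)
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(["Array", "ID", "Subgrid", "Ratio"])
        for array, feature_id, block, ratio in records:
            writer.writerow([array, feature_id, block, "" if ratio is None else ratio])
    return out_path, records
```

The resulting tab-delimited file has one row per feature, so it can be stacked or split by array before running Mood's Median Test with the sub-grid column as the grouping variable.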


Figure 3-3: Custom Excel Macro worksheet interface created to automate several aspects of microarray data manipulation. A number of functions are available, including the ability to prepare multiple arrays for SNOMAD normalisation (#2) and converting expression ratios from log to linear scale (#4).


3.2.1.3. Using Mood's Median Test to evaluate the impact of image analysis software on cDNA microarray spatial bias

The MMT described in this chapter can also be used to assess the impact of other variables in the microarray work flow that have the potential to affect the spatial distribution of data. These include variations in array hybridisation methods and the type of software used for image analysis.

During the period of this project, developments in image analysis software were made,

including methods for dynamically adjusting the size of the region used to

identify an individual array feature (Kim et al., 2001; Korn et al., 2004a). Minor

variations in the amount of material spotted onto the slide are a common feature of cDNA

arrays, and can be caused by slight variations in the physical dimensions of the spotting

pins used or the length of time each pin is in contact with the slide. Because of this, even

the most carefully controlled array platform will produce arrays with a range of feature

diameters. An image analysis tool that is not capable of responding to these variations, by

having a spot detection window of fixed size, is likely to include areas of background in

the quantification of feature intensity. This has the potential to impact on the final

intensity measurement due to the inclusion of pixels actually corresponding to the glass

slide and not the spotted probe.
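The effect of a fixed detection window can be illustrated with a synthetic spot. The sketch assumes a uniform circular feature on a uniform background (the sizes and intensities are invented for illustration). It compares the mean intensity inside a window matched to the feature with the mean inside a larger fixed window, and it is not a model of either GenePix or Quantarray.

```python
def synthetic_spot(size, centre, radius, spot_value, background_value):
    """Square image with a uniform circular feature on a uniform background."""
    cy, cx = centre
    return [[spot_value if (y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2 else background_value
             for x in range(size)] for y in range(size)]


def mean_in_window(image, centre, radius):
    """Mean pixel value inside a circular window centred on `centre`."""
    cy, cx = centre
    inside = [image[y][x] for y in range(len(image)) for x in range(len(image[0]))
              if (y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2]
    return sum(inside) / len(inside)


img = synthetic_spot(41, (20, 20), 6, 5000.0, 200.0)
adaptive = mean_in_window(img, (20, 20), 6)   # window matched to the feature
fixed = mean_in_window(img, (20, 20), 9)      # fixed window larger than the feature
print(adaptive, fixed)
```

The matched window recovers the feature intensity exactly, whereas the oversized window dilutes it with background pixels. Because spot diameters vary across an array, this dilution also varies from spot to spot rather than acting as a constant offset.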

To investigate the impact of image analysis software on spatial bias, a series of cDNA microarray images was analysed using two different programs: GenePix (Axon, USA), which can dynamically adjust the size of the circular window used to distinguish the hybridised array feature from the surrounding background, and Quantarray (Packard Bioscience, USA), which maintains a fixed spot size for the entire array. Both packages use the histogram method for determining intensity threshold levels (Ahmed et al., 2004) and also allow the spot windows to be moved in any direction, individually or in blocks, to accommodate some misalignment in the printing process.

MMT scores were calculated for a series of 45 cDNA microarray images, each analysed with both packages, to determine whether the distribution and magnitude of spatial bias differed according to the image analysis software used. The two series of MMT scores were compared with a box plot, and an analysis of variance (ANOVA) was carried out to determine whether the difference was statistically significant.
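A minimal sketch of this comparison using SciPy is shown below. The MMT values are hypothetical, not the 45 scores from this study. With only two groups, a one-way ANOVA is equivalent to a pooled-variance t-test (F = t²), which the code uses as a consistency check.

```python
import numpy as np
from scipy import stats

# Hypothetical MMT scores for the same array images quantified by two packages
mmt_quantarray = np.array([150.0, 210.0, 185.0, 120.0, 240.0, 160.0, 175.0, 130.0])
mmt_genepix = np.array([105.0, 130.0, 118.0, 90.0, 140.0, 112.0, 108.0, 95.0])

# One-way ANOVA across the two software groups
f_stat, p_anova = stats.f_oneway(mmt_quantarray, mmt_genepix)

# Equivalent pooled-variance two-sample t-test (F equals t squared for two groups)
t_stat, p_t = stats.ttest_ind(mmt_quantarray, mmt_genepix)

# Each image was analysed by both packages, so a paired test is a natural alternative
t_paired, p_paired = stats.ttest_rel(mmt_quantarray, mmt_genepix)
print(f"ANOVA p = {p_anova:.4f}, paired t-test p = {p_paired:.4f}")
```

Because the same images were quantified twice, the paired test uses the pairing that the independent-groups ANOVA ignores.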


As shown in Figure 3-4, the data resulting from the GenePix analysis exhibited an overall lower and less varied degree of spatial bias. The mean MMT score for the Quantarray-analysed images was 169.8, compared with 111.9 for the same array images analysed with GenePix, representing a reduction in spatial bias of 34% (p=0.006).

This analysis demonstrates the utility of MMT scores for evaluating advances in microarray technology and their impact on data quality, specifically a reduction in systematic spatial bias. It also demonstrates the benefit of image analysis algorithms that respond to varying feature diameters, a characteristic of this type of microarray.


Figure 3-4: MMT scores from 45 EOC microarray profiles analysed using (i) Quantarray and (ii) GenePix. Data generated using Quantarray had, on average, higher and more variable MMT scores than data from the same microarray images analysed with GenePix. The p-value for the difference between mean values (indicated by red dots) is 0.006.

[Figure 3-5 plot: y-axis MMT score, 0 to 1,000]

Figure 3-5: Box plot of MMT scores from the Gastric Cancer Prediction of Recurrent Disease dataset (described in section 3.2.1.4). Four arrays are indicated as outliers by this method, one of them an extreme outlier appearing at the very top of the figure. This array produced an MMT score of 1,224 and was subsequently removed from the dataset after normalisation failed to reduce its MMT score to the level determined as acceptable for this microarray platform (200).

[Figure 3-4 panels: (i) MMT scores from Quantarray-analysed array images (MMT - QA); (ii) MMT scores from GenePix-analysed array images (MMT - GP); y-axis MMT score, 0 to 500]


3.2.1.4. Use of MMT to identify cDNA microarrays with extreme spatial bias and the impact of their exclusion on a sample dataset

In order to assess the impact of a reduction in spatial bias on the bioinformatic performance of a sample dataset, the cross-validation accuracies obtained for a tumour classification problem were compared. In a parallel study by Dr Alex Boussioutas using the same Peter Mac 10.5k microarray platform (Boussioutas et al., 2003), specimens of gastric cancer were profiled to identify a gene expression signature capable of predicting the likelihood of disease recurrence following initial treatment. Two versions of the dataset were created, one normalised with intensity-dependent lowess and the other using a spatially dependent normalisation method (SNOMAD). MMT scores for the spatially normalised dataset were calculated, as shown in Appendix D. A box plot was used to visually analyse the distribution of MMT scores across the dataset and identify any outliers, shown in Figure 3-5.
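Box-plot outliers are conventionally defined by Tukey's fences, i.e. values more than 1.5 interquartile ranges (IQR) beyond the quartiles. The sketch below flags such arrays and also applies a 3 × IQR fence for "extreme" outliers, a common convention assumed here; the plotting software used for Figure 3-5 may compute quartiles or extreme fences slightly differently. The MMT values are hypothetical, with the largest chosen to mimic an array such as AB016, and the acceptability threshold of 200 is taken from the Figure 3-5 caption.

```python
import numpy as np

def boxplot_outliers(scores, k_mild=1.5, k_extreme=3.0):
    """Flag mild and extreme outliers using Tukey's fences on the interquartile range."""
    scores = np.asarray(scores, dtype=float)
    q1, q3 = np.percentile(scores, [25, 75])
    iqr = q3 - q1
    extreme = (scores > q3 + k_extreme * iqr) | (scores < q1 - k_extreme * iqr)
    beyond_mild = (scores > q3 + k_mild * iqr) | (scores < q1 - k_mild * iqr)
    return beyond_mild & ~extreme, extreme

# Hypothetical MMT scores for one dataset (one value per array)
mmt = np.array([95, 110, 120, 130, 105, 140, 115, 125, 100, 135, 190, 1224])
mild, extreme = boxplot_outliers(mmt)

# Platform-specific acceptability threshold for MMT scores
unacceptable = mmt > 200
print("mild:", mmt[mild], "extreme:", mmt[extreme], "above 200:", mmt[unacceptable])
```

An array can be a statistical outlier within its own dataset yet still fall under the platform threshold (190 here), so the two criteria answer different questions.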

Array AB016 produced an MMT score of 1,224 and is represented by the asterisk at the top of Figure 3-5. Following the identification of this array with extensive spatial bias, a third version of the dataset was created without array AB016, in order to determine whether excluding arrays with extremely high MMT scores (even after spatially dependent normalisation) affected the cross-validated accuracy obtained when the dataset is used for a sample classification task.

For each of the three dataset versions, the number of genes giving the highest leave-one-out cross-validation accuracy for predicting the recurrent-disease status of each sample was then determined. This was achieved using GeneCluster version 1.0 from the Broad Institute (USA) (Reich et al., 2004). This method uses a recursive signal-to-noise approach to identify genes of interest and a k-nearest-neighbour (kNN) method of class prediction, as described in Materials and Methods section 2.4.8.4. GeneCluster allows the researcher to specify a range of gene numbers to search (e.g. 2 to 1,000) and calculates the classification accuracy obtained with each gene number within this range. Because of computing limitations, the number of genes used in each iteration increases by a factor of two. The cross-validation accuracy of each dataset was recorded along with the number of genes used by the kNN algorithm, to ascertain whether a reduction in spatial bias also affected the optimal number of genes required to perform the classifications.
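The general procedure can be sketched in NumPy as shown below. This is not GeneCluster's implementation. The choice of k = 3, Euclidean distance, ranking by absolute signal-to-noise score and the synthetic data are all assumptions made for illustration. Genes are re-ranked inside every leave-one-out fold so that the held-out sample never influences gene selection.

```python
import numpy as np

def signal_to_noise(X, y):
    """Signal-to-noise score per gene: (mean_1 - mean_0) / (sd_1 + sd_0)."""
    a, b = X[y == 1], X[y == 0]
    return (a.mean(0) - b.mean(0)) / (a.std(0, ddof=1) + b.std(0, ddof=1) + 1e-12)

def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    nearest = np.argsort(np.linalg.norm(X_train - x, axis=1))[:k]
    return int(y_train[nearest].sum() * 2 > k)

def loocv_accuracy(X, y, n_genes, k=3):
    """Leave-one-out accuracy; genes are re-ranked within each fold to avoid selection bias."""
    correct = 0
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        scores = signal_to_noise(X[train], y[train])
        genes = np.argsort(-np.abs(scores))[:n_genes]  # top genes by |S2N|
        pred = knn_predict(X[train][:, genes], y[train], X[i, genes], k)
        correct += int(pred == y[i])
    return correct / len(y)

# Hypothetical data: 30 samples x 500 genes, 10 genes shifted in the "recurrent" class
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 500))
y = np.repeat([0, 1], 15)
X[y == 1, :10] += 2.0

gene_counts = [2 ** p for p in range(1, 9)]  # 2, 4, ..., 256: doubling increments
accuracy = {n: loocv_accuracy(X, y, n) for n in gene_counts}
best_n = max(accuracy, key=accuracy.get)     # smallest gene number among ties
print(accuracy, "best:", best_n)
```

Searching only powers of two means the reported optimum is a coarse bracket: a best result at 8 genes only says that 8 performed better than 4 or 16.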

The results of this analysis are shown in Figure 3-6. In summary, as spatial bias is reduced across the three versions of this dataset, the number of genes required for optimal classification decreases, while the corresponding accuracy of the predictions (as assessed by leave-one-out cross-validation) increases. This observation supports the hypothesis that reducing the level of technical bias present in a cDNA microarray dataset, in this case spatial bias, results in more accurate classifications.

The highest cross-validation accuracy was obtained using the spatially normalised dataset minus the one array exhibiting extremely high spatial bias (87% correct). This was followed by the complete spatially normalised dataset (78% correct) and finally the dataset normalised with intensity-dependent lowess (73% correct). Following an inverse trend to that of the classification accuracies, the number of genes selected by the algorithm decreased from 1,024 for the intensity-dependent lowess data, to 256 for the complete spatially normalised data, to only 8 genes for the spatially normalised dataset without array AB016. The size of this reduction is exaggerated by the doubling increments in which GeneCluster selects and evaluates gene numbers. This observation suggests that by reducing the level of spatial bias present in a cDNA microarray dataset, first by removing unacceptably biased arrays and then by minimising the spatial bias in those remaining, predictive algorithms such as kNN are able to perform with a higher degree of accuracy using fewer gene expression data points.

The substantial increase in cross-validation accuracy observed in this study, attributable to the exclusion of only one array, also reflects how sensitive cDNA datasets of this size are to their sample composition. This phenomenon has recently been described by Ein-Dor et al. (Ein-Dor et al., 2005), who noted that the genes selected to predict the probability of breast cancer patients developing metastatic disease are highly contingent on the subset of patients used for gene selection.


Figure 3-6: Example machine-learning-based predictive analysis demonstrating the impact of spatial bias on dataset performance. (A) Percentage of gastric cancer specimens assigned to the correct disease recurrence category by leave-one-out cross-validation following lowess normalisation, spatially dependent normalisation, and then exclusion of a single array with an extremely high MMT score. (B) The optimal number of genes (tested in doubling increments) selected for the kNN classification algorithm, resulting in the corresponding prediction accuracy shown in (A). Genes were selected based on signal-to-noise ranking.

[Figure 3-6 panels: (A) y-axis percentage of samples correctly classified, 65 to 90; (B) y-axis no. genes, logarithmic scale, 1 to 10,000. Both panels: x-axis categories Intensity-dependent lowess, SNOMAD, SNOMAD minus high-MMT sample]


3.2.2. Evaluation of reference RNA options for a large-scale tumour profiling study

3.2.2.1. Selection of samples and creation of a dataset to compare the performance of the two reference RNA options

To create a dataset for comparison of these reference options, RNA was extracted from 95 tissue samples obtained from patients diagnosed with EOC. The histological subtypes of these cases are summarised in Figure 3-7. Samples were selected to resemble the frequency of the EOC histological subtypes observed in the general population as closely as possible (Ries LAG, 2004). The methods for RNA extraction and labelling are described in Materials and Methods section 2.3.2.1. Specimen RNA was reverse transcribed with incorporation of the Cy5 fluorescent dye and hybridised to cDNA microarrays in duplicate by Sophie Katsabanis, as described in Materials and Methods sections 2.3.4 and 2.3.5. The first series of arrays was hybridised against the Cy3-labelled pooled tumour reference and the second series against the Cy3-labelled pooled cell line material.

In total, RNA from 95 specimens of EOC was hybridised to 10.5k cDNA microarrays using the two types of reference RNA as described. Image analysis and data extraction were carried out using GenePix, as per Materials and Methods section 2.4.2.


[Figure 3-7 pie chart; legend: Serous, Mucinous, Endometrioid, Adenocarcinoma, Other/Unknown; segment labels: 41%, 18%, 14%, 6%, 21%]

Figure 3-7: Summary of histological subtypes in the cDNA gene expression dataset used to compare the use of either a universal pooled cell line or a project-specific pooled tumour reference RNA (N = 95). The serous, mucinous and endometrioid subtypes of EOC represent 73% of the samples in the dataset.


3.2.2.2. Hierarchical clustering of EOC datasets

Hierarchical clustering was used to visualise the natural grouping of samples on the basis of their expression profiles. Initially an unsupervised method was used, whereby all genes identified as reliably hybridised were included in the clustering process. Genes whose log-ratios did not vary significantly from baseline expression (ratio = 1.0) in at least 20% of samples were excluded, and the remaining data were used for unsupervised hierarchical clustering. The dendrograms resulting from these analyses are shown in Figure 3-8.
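A minimal sketch of filtering followed by unsupervised clustering of samples is given below, using SciPy's `linkage` and `fcluster`. The text does not state the variation threshold, linkage method or distance measure used here. The 1.5-fold threshold (borrowed from section 3.2.2.6), average linkage and correlation distance are assumptions, and the two-group data are synthetic.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def variation_filter(log2_ratios, min_abs_log2=np.log2(1.5), min_fraction=0.2):
    """Keep genes whose |log2 ratio| reaches the threshold in at least min_fraction of samples."""
    hits = np.abs(log2_ratios) >= min_abs_log2
    return hits.mean(axis=1) >= min_fraction

rng = np.random.default_rng(1)
n_genes, n_a, n_b = 300, 6, 6
data = rng.normal(0, 0.2, size=(n_genes, n_a + n_b))  # genes x samples, log2 ratios
data[:40, :n_a] += 1.5     # hypothetical genes raised in group A
data[40:80, n_a:] += 1.5   # hypothetical genes raised in group B

keep = variation_filter(data)
samples = data[keep].T     # cluster samples, so samples become rows
Z = linkage(samples, method="average", metric="correlation")
groups = fcluster(Z, t=2, criterion="maxclust")  # cut the dendrogram into two clusters
print(groups)
```

`Z` is the linkage matrix that `scipy.cluster.hierarchy.dendrogram` would draw; cutting it into clusters is shown only to check the structure, in keeping with the point below that clustering is used for visualisation rather than classification.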

From these clustering figures it can be observed that the first and most divergent branch points in the dendrogram structure correspond to the mucinous and serous histological subtypes present in this cohort of patients. The endometrioid subtype does not appear to form a discrete cluster with either reference RNA type or gene set. It is important to note, however, that clustering is not used to predict or classify samples in this study, but rather as a method of visualising the natural groupings in the available data (Simon et al., 2003b).

Overall, both reference types generated expression data that produced biologically meaningful hierarchical clustering results, even without supervised gene selection. One notable difference between the clusterings obtained with the two reference RNAs is the small number of serous tumours grouped in the predominantly mucinous branch of the cell line reference dendrogram (Figure 3-8 (B)). One of these four cases is also grouped with the mucinous tumours in the pooled tumour dendrogram; however, the other three are positioned in the main serous branch.

This observation suggests that the dataset created using the project-specific pooled tumour RNA may allow the known histological subtypes to be resolved more clearly than with the universal 11 cell line reference, particularly for samples that are difficult to classify. It is important to emphasise, however, that hierarchical clustering is not a recommended method for classifying tumour samples, although it is an effective method for visualising complex gene expression data and its relationship to known clinical variables, including histological subtype (Simon et al., 2003b).


Figure 3-8: Dendrogram structure of unsupervised hierarchically clustered (A) pooled tumour reference arrays and (B) pooled cell line reference arrays. Both patient IDs and histological subtype information are given to allow identification of individual specimens within the figure. The separation of mucinous and serous tumours is the dominant pattern observed, with the majority of the endometrioid samples being clustered in the mucinous branch.

[Figure 3-8 panels (A) and (B); subtype legend: Serous, Mucinous, Endometrioid]


3.2.2.3. Differences in the distribution of expression data based on reference RNA type

In order to determine whether either the pooled 11 cell line or the pooled tumour RNA resulted in a greater dynamic range of gene expression measurements, the mean expression ratio for each detectably hybridised gene was calculated for the two datasets after intensity-dependent normalisation. The mean normalised values were analysed with a test for equal variances using Minitab and visualised with a histogram, shown in Figure 3-9. The range of mean expression ratios was significantly wider in the data generated with the 11 cell line reference (p<0.001).
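Tests for equal variances of this kind typically report both an F-test, which assumes approximate normality, and Levene's test, which is more robust to non-normal data. The sketch below computes both with SciPy on hypothetical per-gene mean log2 ratios. It is an illustration, not the Minitab procedure. The median-centred (Brown-Forsythe) form of Levene's test is an assumption.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Hypothetical per-gene mean log2 ratios for the two reference designs
tumour_ref = rng.normal(0.0, 0.5, size=2000)
cell_line_ref = rng.normal(0.0, 1.0, size=2000)

# Two-sided F-test on the ratio of sample variances
f = np.var(cell_line_ref, ddof=1) / np.var(tumour_ref, ddof=1)
df1, df2 = len(cell_line_ref) - 1, len(tumour_ref) - 1
p_f = 2 * min(stats.f.sf(f, df1, df2), stats.f.cdf(f, df1, df2))

# Levene's test, median-centred (Brown-Forsythe variant)
w, p_levene = stats.levene(cell_line_ref, tumour_ref, center="median")
print(f"variance ratio = {f:.2f}, F-test p = {p_f:.3g}, Levene p = {p_levene:.3g}")
```

Both tests treat the two sets of values as independent samples; because the same genes appear in both datasets, this is a simplification.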

The wider range of gene expression ratios resulting from hybridisation with the cell line reference RNA reflects the molecular differences between the specimens being profiled and the composition of the reference RNA. In a hybridisation against the pooled tumour RNA, each gene in the specimen of interest is quantified relative to the level of that gene in a pool of other primary EOCs. When the same specimen is profiled against the 11 cell line reference RNA, each gene measurement is relative to the expression of that gene in a much broader range of cancer types, resulting in larger expression ratios. As well as representing 11 unique cell lines, this reference RNA also carries the molecular differences known to exist between cultured cell lines and human tissue, further widening the difference between the molecular profiles of the two specimens being competitively hybridised to the microarray.


Figure 3-9: Histogram of mean normalised gene expression measurements for a series of microarray hybridisations using either a pooled cell line (universal) or pooled tumour (project-specific) reference RNA. A test for equal variances revealed the range of gene expression ratios in the dataset generated with the 11 cell line reference was significantly different (i.e. wider) than for the same dataset constructed using the pooled tumour reference RNA (p<0.001).

[Figure 3-9 histogram; x-axis: mean expression ratio (log2), -5 to 4; y-axis: frequency, 0 to 1,500; series: pooled tumour reference, 11 cell line reference]


3.2.2.4. Comparison of the proportion of the microarray clone set represented by each reference type

After background subtraction, gene expression ratios are calculated by dividing the intensity of hybridisation in the test (sample) channel by that of the reference channel; the degree of differential expression observed for a particular gene is therefore relative to the abundance of that gene in the reference RNA used. A feature with no detectable signal in the reference channel cannot yield a valid ratio. As a result, an important measure of reference RNA performance is the proportion of the total microarray for which valid expression ratios are generated.
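The ratio calculation can be sketched as follows. The sketch assumes simple subtraction of a local background estimate from each channel. GenePix offers several background methods, and this is not presented as the one used here. The intensities are hypothetical. Features whose background-subtracted signal is not positive in either channel receive no valid ratio (NaN).

```python
import numpy as np

def log2_ratio(test_fg, test_bg, ref_fg, ref_bg):
    """Background-subtracted log2(test / reference); NaN where either channel is not positive."""
    test = np.asarray(test_fg, float) - np.asarray(test_bg, float)
    ref = np.asarray(ref_fg, float) - np.asarray(ref_bg, float)
    valid = (test > 0) & (ref > 0)
    out = np.full(test.shape, np.nan)
    out[valid] = np.log2(test[valid] / ref[valid])
    return out

# Hypothetical Cy5 (test) and Cy3 (reference) foreground/background values for 4 features
ratios = log2_ratio([2100, 900, 400, 5000], [100, 100, 100, 200],
                    [1100, 1700, 90, 1400], [100, 100, 100, 200])
print(ratios)
```

The third feature has a reference signal below its background, so no ratio can be formed for it however strongly the test sample hybridised.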

Using the BRB ArrayTools microarray analysis package (Biometric Research Branch, National Cancer Institute, USA), the proportion of array features with (i) a minimum absolute intensity of 300 in the reference Cy3-labelled channel and (ii) fewer than 20% missing values across all arrays was determined for each individual array in both datasets. The minimum intensity level of 300 has been determined by the Peter Mac Microarray Facility as an appropriate cut-off, below which measurements are unreliable. The number of features passing these criteria was tallied for each of the 98 arrays, and a one-way ANOVA was used to determine whether the mean microarray coverage differed significantly between the two reference types.
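The two criteria can be combined into a per-array coverage measure as sketched below. The sketch assumes that a "missing value" means a reference intensity below 300. BRB ArrayTools may also mark spots as missing for other reasons (e.g. flagged spots), so this is a simplification rather than its exact logic. The small intensity matrix is hypothetical.

```python
import numpy as np

def coverage_per_array(ref_intensity, min_ref=300.0, max_missing=0.2):
    """
    ref_intensity: features x arrays matrix of reference (Cy3) channel intensities.
    A value below min_ref is treated as missing. Features missing on max_missing or more
    of the arrays are excluded, and the fraction of all features that remain detected is
    returned for each array.
    """
    detected = ref_intensity >= min_ref
    missing_fraction = (~detected).mean(axis=1)
    feature_ok = missing_fraction < max_missing  # "fewer than 20% missing"
    return (detected & feature_ok[:, None]).mean(axis=0)

# Hypothetical reference intensities: 3 features on 5 arrays
ref = np.array([[500, 400, 350, 320, 310],
                [100, 500, 500, 500, 500],   # missing on 1/5 arrays (20%): excluded
                [100, 100, 500, 500, 500]])  # missing on 2/5 arrays (40%): excluded
coverage = coverage_per_array(ref)
print(coverage)
```

The resulting per-array coverage values for each reference type can then be compared with a one-way ANOVA, as in the MMT comparison earlier in this chapter.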

The mean number of array features successfully bound by the Cy3-labelled tumour reference (9,056) was marginally higher than for the pooled cell line reference (9,018), a difference of 38 features, or less than 1% of the array. ANOVA showed that this difference was not significant (p = 0.499), as also indicated by the box plots shown in Figure 3-10. Therefore both types of reference RNA used in this analysis hybridise to the same proportion of the microarray. It is important to note that the proportion of array features bound is highly dependent on array quality, RNA labelling and hybridisation efficiency, all variables that are difficult to quantify. However, by using a large number of arrays, as in this comparison, the impact of these variables is reduced, and the conclusion that there is no significant difference in array coverage between these types of reference RNA can be drawn.


Figure 3-10: Box plot of mean proportion of array features successfully bound by each reference type. Over 90% of array features were identified by both reference types in the dataset analysed. The difference between means was not statistically significant.

[Figure 3-10 box plots; y-axis: proportion of total array detectably hybridised, 0.6 to 1.0; x-axis: pooled cell-line reference, pooled tumour reference]


3.2.2.5. Analysis of genes uniquely detected by either reference type

Although no difference was observed in the proportion of microarray features detectably bound by the two reference types, 241 genes (2.6%) were detected with the tumour reference but not with the cell line material, and a further 213 (2.4%) were detected with the cell line reference but not with the tumour reference.

In order to characterise the genes differentially represented between the two reference types, gene ontology analysis was carried out using the EASE method (Materials and Methods section 2.4.9.1) (Hosack et al., 2003). This method determines the biological processes, molecular functions and cellular components that are significantly over-represented in a list of genes, relative to the composition of the microarray platform used. The ontologies identified as statistically significant for these two lists of genes are shown in Table 3-1 (tumour reference) and Table 3-2 (cell line reference).
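The EASE score is a conservative variant of the one-sided Fisher exact test. To our understanding of Hosack et al. (2003), the test is repeated after removing one gene from the list's hits in the category being tested. This penalises categories supported by very few genes. The sketch below implements that understanding with SciPy; the counts (3 of 300 list genes, 40 of 30,000 population genes) are hypothetical.

```python
from scipy.stats import fisher_exact

def ease_score(list_hits, list_total, pop_hits, pop_total):
    """
    Return (Fisher exact p, EASE score) for over-representation of one category.
    The EASE score repeats the one-sided test with one gene removed from the list's hits,
    which favours categories supported by more genes.
    """
    def p_for(hits):
        table = [[hits, list_total - list_hits],
                 [pop_hits, pop_total - pop_hits]]
        return fisher_exact(table, alternative="greater")[1]
    return p_for(list_hits), p_for(max(list_hits - 1, 0))

# Hypothetical category: 3 of 300 list genes vs 40 of 30,000 population genes
fisher_p, ease = ease_score(3, 300, 40, 30000)
print(f"Fisher p = {fisher_p:.4f}, EASE score = {ease:.4f}")
```

In this example a category that looks significant by Fisher's test alone (p below 0.01) is no longer significant at 0.05 once the EASE penalty is applied.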

This analysis revealed that genes known to be involved in cell-cell communication (e.g. WNT1, FGF14, TNFRSF2) and genes with structural roles in membranes and extracellular features (CCR5, MUC4, NLGN1) were detected by the tumour reference but not by the cell line reference. It is possible that these genes were not expressed in the cell line reference RNA because the in vitro conditions under which the cell lines are grown differ substantially from those in the human body. The reference generated by extracting RNA from whole tumours is likely to contain a higher proportion of transcripts involved in the tumour's interactions with its microenvironment. No cellular component ontologies were significantly over-represented in the list of genes unique to the cell line reference, further supporting this hypothesis. This difference may be an important consideration if the goal of a microarray study is to explore these interactions.

Genes uniquely detected with the cell line reference RNA include those involved in morphogenesis and in the detection of and response to external and mechanical stimuli (e.g. OPHN1, ALDH7A1). This again most likely reflects the growth conditions of these cells, which are dislodged from their substrates and transferred between tissue culture flasks, where most reattach, changing their morphology in response to contact with each other and with the flask itself.


Table 3-1: Significantly represented gene ontologies in the list of 241 genes uniquely detected by the pooled tumour reference RNA

| Biological process ontologies | EASE score |
|---|---|
| Sexual reproduction | 0.00861 |
| Cell surface receptor linked signal transduction | 0.0147 |
| Neurogenesis | 0.0228 |
| G-protein coupled receptor protein signalling pathway | 0.0248 |
| Ion transport | 0.028 |
| Organogenesis | 0.0295 |
| Cell communication | 0.0396 |
| Organismal physiological process | 0.0462 |

| Cellular component ontologies | EASE score |
|---|---|
| Integral to plasma membrane | 0.00139 |
| Plasma membrane | 0.00471 |
| Cell fraction | 0.0158 |
| Extracellular | 0.0163 |
| Integral to membrane | 0.0209 |

| Molecular function ontologies | EASE score |
|---|---|
| Signal transducer activity | 4.5 × 10⁻⁶ |
| Receptor binding | 4.17 × 10⁻⁵ |
| Cytoskeleton protein binding | 0.00467 |
| Cation channel activity | 0.0134 |
| Cytokine activity | 0.0144 |
| Rhodopsin-like receptor activity | 0.0163 |
| G-protein coupled receptor activity | 0.0239 |
| Growth factor activity | 0.0492 |

Table 3-2: Significantly represented gene ontologies in the list of 213 genes uniquely hybridised by the pooled cell line reference RNA

| Biological process ontologies | EASE score |
|---|---|
| Organogenesis | 0.00897 |
| Ectoderm development | 0.011 |
| Detection of abiotic stimulus | 0.0117 |
| Development | 0.0181 |
| Morphogenesis | 0.0184 |
| Sensory perception of mechanical stimulus | 0.0217 |
| Detection of mechanical stimulus | 0.0236 |
| Skeletal development | 0.0292 |
| Histogenesis | 0.03 |
| Detection of external stimulus | 0.039 |
| Regulation of transcription from Pol II promoter | 0.039 |
| Purine ribonucleotide biosynthesis | 0.0487 |

| Molecular function ontologies | EASE score |
|---|---|
| Monooxygenase activity | 0.02 |
| Nucleic acid binding | 0.0457 |


3.2.2.6. Comparison of the proportion of the microarray set identified as differentially expressed relative to baseline expression

Two methods were used to compare the proportion of differentially expressed genes in the two datasets: (i) a minimum fold change of 1.5 or greater in at least 20% of samples, and (ii) log-expression variation filtering, whereby the variance of the log-ratios for each gene is compared with the median of all gene variances. Genes whose variance is not significantly greater than that of the median gene are filtered out.
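Both filters can be sketched in a few lines of NumPy/SciPy. For the variance filter, the sketch uses one standard formulation, which we understand to be the form used by BRB ArrayTools: under the null hypothesis, (n − 1)·var_i / var_median follows a chi-square distribution with n − 1 degrees of freedom. This should be treated as an assumption rather than the package's documented code. The data are synthetic and contain no missing values.

```python
import numpy as np
from scipy import stats

def fold_change_filter(log2_ratios, fold=1.5, min_fraction=0.2):
    """Genes with |log2 ratio| >= log2(fold) in at least min_fraction of samples."""
    hits = np.abs(log2_ratios) >= np.log2(fold)
    return hits.mean(axis=1) >= min_fraction

def log_variation_filter(log2_ratios, alpha=0.001):
    """Genes whose log-ratio variance is significantly larger than the median gene variance."""
    n = log2_ratios.shape[1]
    var = np.var(log2_ratios, axis=1, ddof=1)
    stat = (n - 1) * var / np.median(var)   # approx. chi-square(n - 1) for a "median-like" gene
    return stats.chi2.sf(stat, df=n - 1) < alpha

rng = np.random.default_rng(6)
data = rng.normal(0.0, 0.3, size=(1000, 40))     # genes x samples, log2 ratios
data[:50] = rng.normal(0.0, 1.0, size=(50, 40))  # 50 hypothetical highly variable genes
fc = fold_change_filter(data)
lv = log_variation_filter(data)
print(fc.sum(), lv.sum())
```

The fold-change filter applies the same absolute cut-off to every dataset, whereas the variance filter is scaled by each dataset's own median variance.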

The decision to set the cut-off for differential expression relative to the reference at 1.5-fold was based on this level representing a 50% increase in relative expression (or, equivalently, a ratio of about 0.67 for decreased expression), and also on published work in which a fold change of 1.4 was shown to be the minimum expression ratio change that can be accurately detected by cDNA microarrays (Yue et al., 2001).

Intensity-dependent lowess normalisation (Cleveland, 1979) was carried out before applying the filter for differentially expressed genes, to reduce bias in the data due to factors such as variation in the efficiency of Cy-dye incorporation. The same filtering as described in section 3.2.2.4 was used to screen out features with unreliably low intensity measurements and/or failing to produce valid expression ratios in over 20% of the samples.
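Intensity-dependent lowess normalisation fits the trend of the log ratio M = log2(Cy5/Cy3) against the average log intensity A = ½·log2(Cy5·Cy3) and subtracts it. The sketch below is a simplified lowess written in NumPy: a local linear fit with tricube weights and no robustness iterations. The span `frac=0.3` and the simulated dye bias are assumptions, not the settings used in this study.

```python
import numpy as np

def lowess_fit(x, y, frac=0.3):
    """Local linear regression with tricube weights (lowess without robustness iterations)."""
    n = len(x)
    k = max(int(np.ceil(frac * n)), 3)
    fitted = np.empty(n)
    for i in range(n):
        d = np.abs(x - x[i])
        h = max(np.partition(d, k - 1)[k - 1], 1e-12)  # distance to the k-th nearest point
        w = np.clip(1 - (d / h) ** 3, 0, None) ** 3     # tricube weights, zero beyond h
        xm, ym = np.average(x, weights=w), np.average(y, weights=w)
        sxx = np.sum(w * (x - xm) ** 2)
        slope = np.sum(w * (x - xm) * (y - ym)) / sxx if sxx > 0 else 0.0
        fitted[i] = ym + slope * (x[i] - xm)
    return fitted

def lowess_normalise(cy5, cy3, frac=0.3):
    """Intensity-dependent normalisation: subtract the lowess trend of M on A."""
    m = np.log2(cy5) - np.log2(cy3)          # log ratio
    a = 0.5 * (np.log2(cy5) + np.log2(cy3))  # average log intensity
    return m - lowess_fit(a, m, frac)

rng = np.random.default_rng(4)
cy3 = 2 ** rng.uniform(8, 15, size=1500)
true_m = rng.normal(0, 0.1, size=1500)
dye_bias = 0.5 + 0.2 * (np.log2(cy3) - 11.5)  # hypothetical intensity-dependent dye bias
cy5 = cy3 * 2 ** (true_m + dye_bias)
m_raw = np.log2(cy5 / cy3)
m_norm = lowess_normalise(cy5, cy3)
print(f"sd before: {m_raw.std():.3f}, sd after: {m_norm.std():.3f}")
```

Cleveland's full algorithm adds iterative down-weighting of outliers, which makes the fitted trend less sensitive to genes that are genuinely differentially expressed.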

Using the fold-change method of identifying differential expression, 2,956 genes from the cell line reference data and 2,259 genes from the tumour reference set were identified as differentially expressed. A statistical test of proportions indicated that this difference is highly significant (p<0.001). Therefore, at the 1.5-fold level of differential expression, use of the cell line reference yields a significantly larger proportion of genes that are differentially expressed relative to their expression in the reference channel. This result reflects the greater dynamic range of mean expression ratios in the cell line reference dataset compared with the tumour reference data (section 3.2.2.3).

A more sophisticated method of identifying differential gene expression, which takes into account sources of variation such as low overall intensity, is to use a statistical test comparing the variance of each gene's log-ratios with the median variance of all genes in the same dataset. This method was developed after it became apparent that the biological relevance of a fold-change cut-off can vary depending on technical factors that affect the dynamic range of data points on a given microarray, although both methods are still in use (Cui and Churchill, 2003; Yang et al., 2002a).

Using a log-expression variation filter with a p-value cut-off of 0.001, 4,092 features from the tumour reference arrays and 4,026 features from the cell line reference arrays were identified as differentially expressed. This difference was not statistically significant (p=0.356), as determined by a test of proportions. This indicates that, unlike the fold-change method of gene selection, log-expression variation filtering shows no difference between the reference types in the proportion of genes identified as differentially expressed.
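The tests of proportions used above can be illustrated with a pooled two-sample z-test. The number of filtered features in each dataset is not stated in this section, so the sketch assumes a hypothetical denominator of 9,000 features per dataset. Its p-values therefore show the pattern of the result but will not exactly reproduce the values reported in the text.

```python
import numpy as np
from scipy.stats import norm

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-sample z-test for a difference between proportions (two-sided)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = np.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 2 * norm.sf(abs(z))

N_FEATURES = 9000  # hypothetical number of filtered features per dataset (assumption)
z_fc, p_fc = two_proportion_z(2956, N_FEATURES, 2259, N_FEATURES)    # fold-change filter
z_var, p_var = two_proportion_z(4026, N_FEATURES, 4092, N_FEATURES)  # log-variation filter
print(f"fold change: z = {z_fc:.2f}, p = {p_fc:.2g}; log variation: z = {z_var:.2f}, p = {p_var:.2f}")
```

Under any plausible denominator, the fold-change counts differ by several hundred genes while the log-variation counts differ by only 66, which is why only the first comparison is significant.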

It is to be expected that a filtering method based on comparing each individual gene's variation across the dataset with the median degree of variation will detect similar numbers of genes for the two reference types. This is because the median variation is calculated separately for each dataset, so the actual fold change that corresponds to a given p-value can vary substantially between datasets. This method is generally accepted as the most robust approach to filtering out genes that do not contribute to the biological question being investigated, as it does not depend on a fixed ratio threshold, which may not always correspond to the same increase or decrease in transcript abundance (Cui and Churchill, 2003).

Further analyses of microarray data in this chapter will use the log-expression filtered

gene lists generated from the reference comparison datasets.

3.2.2.7. Analysis of differentially expressed genes between histological subtypes

Histological subtype information was available for 82 of the profiled EOC specimens, as summarised in Figure 3-7. As different subtypes of ovarian cancer have markedly different clinical outcomes (Pieretti et al., 2002), the genes and molecular pathways underlying these differences are of interest. The ability of each reference type to identify genes differentially expressed between these categories was tested.

Using the Significance Analysis of Microarrays (SAM) method for identification of differentially expressed genes (Tusher et al., 2001), the three largest subtypes of ovarian cancer in the available dataset were analysed. This method assigns a score to each gene based on its change in expression relative to the standard deviation of repeated measurements. For genes with a score above a predetermined threshold, permutation testing is then carried out to estimate the percentage of the gene list identified by chance (the false discovery rate, FDR), and a gene list is created based on the FDR computed from the permutation analysis. Because the permutations are applied to whole samples, this method allows for the possibility of dependent measurements in the dataset, i.e. genes whose expression levels depend on the levels of other genes in the same sample.
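The core of SAM can be sketched for the simpler two-class case. The analysis above compared three subtypes; multiclass SAM uses an F-like statistic instead. This is a deliberately simplified sketch, not the SAM software. The fudge factor s0 is set to the median gene standard error, a fixed |d| threshold replaces SAM's delta-based calling, and the FDR is estimated as the median number of genes called in label permutations divided by the number called in the real data. The data are synthetic.

```python
import numpy as np

def sam_d(X, y, s0):
    """SAM-style relative difference per gene (rows): d = (mean_1 - mean_0) / (s + s0)."""
    a, b = X[:, y == 1], X[:, y == 0]
    n1, n0 = a.shape[1], b.shape[1]
    pooled = ((n1 - 1) * a.var(axis=1, ddof=1) + (n0 - 1) * b.var(axis=1, ddof=1)) / (n1 + n0 - 2)
    s = np.sqrt(pooled * (1.0 / n1 + 1.0 / n0))
    return (a.mean(axis=1) - b.mean(axis=1)) / (s + s0), s

def sam_call(X, y, threshold, n_perm=200, seed=0):
    """Call genes with |d| >= threshold; estimate FDR as median permuted calls / observed calls."""
    _, s = sam_d(X, y, 0.0)
    s0 = np.median(s)  # simplification: SAM chooses s0 by minimising the CV of d
    d, _ = sam_d(X, y, s0)
    called = np.abs(d) >= threshold
    rng = np.random.default_rng(seed)
    # Permuting whole sample labels keeps gene-gene correlations within each sample intact
    null_calls = [np.sum(np.abs(sam_d(X, rng.permutation(y), s0)[0]) >= threshold)
                  for _ in range(n_perm)]
    fdr = min(np.median(null_calls) / max(called.sum(), 1), 1.0)
    return called, fdr

rng = np.random.default_rng(5)
X = rng.normal(size=(500, 20))   # 500 genes x 20 samples (hypothetical)
y = np.repeat([0, 1], 10)
X[:20, y == 1] += 3.0            # 20 genes truly differentially expressed
called, fdr = sam_call(X, y, threshold=2.0)
print(called.sum(), "genes called, estimated FDR", fdr)
```

Adding s0 to the denominator stops genes with tiny, unstable variance estimates from receiving inflated scores.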

A total of 1,737 genes were identified by SAM as differentially expressed between histological subtypes in the cell line reference dataset, compared with 1,287 in the pooled tumour reference data. A test of proportions gave p<0.001, indicating that significantly more genes were identified with the cell line reference RNA. In total, 862 genes were selected by SAM in both datasets. The statistical significance of this overlap is P < 0.001 (representation factor = 4.2), as calculated by the online tool provided by the Kim Laboratory, Stanford University (CA, USA) (Lund, 2003), indicating that the overlap is highly significant. Of the top 100 genes, 99 are shared by both lists; the top 50 are listed, along with the mean expression for each histological subtype, in Table 3-3.
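The representation factor is the observed overlap divided by the overlap expected by chance, n_A·n_B / N, where N is the number of genes both lists could have been drawn from. Its significance can be assessed with the hypergeometric distribution. The sketch below uses SciPy and assumes N = 10,500 (the approximate size of the array). That value is not stated in the text. With this assumption the factor comes out at about 4.05 rather than the reported 4.2, so the original calculation presumably used a somewhat larger N.

```python
from scipy.stats import hypergeom

def overlap_significance(n_a, n_b, overlap, n_total):
    """Representation factor and one-sided hypergeometric p-value for a gene-list overlap."""
    expected = n_a * n_b / n_total
    rf = overlap / expected
    # P(X >= overlap) when drawing n_b genes from n_total, of which n_a are in list A
    p = hypergeom.sf(overlap - 1, n_total, n_a, n_b)
    return rf, p

N_TOTAL = 10500  # assumed size of the gene universe (~10.5k array); not stated in the text
rf, p = overlap_significance(1737, 1287, 862, N_TOTAL)
print(f"representation factor = {rf:.2f}, p = {p:.3g}")
```

A representation factor above 1 indicates more overlap than expected. An overlap of 862 against roughly 213 expected is far beyond anything chance would produce.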


Table 3-3 Top 50 most differentially expressed genes between histological subtypes for both reference types. The order of genes was identical for both datasets, and mean expression ratios for each EOC subtype are given. Unless otherwise stated, UniGene build #184 was used for all annotations. Clones without current UniGene IDs are represented by their GenBank accession numbers.

| Rank | UniGene symbol | mRNA accession number | Endometrioid | Mucinous | Serous |
|---|---|---|---|---|---|
| 1 | LGALS4 | BC005146 | 0.284 | 2.876 | 0.221 |
| 2 | GPX2 | BE512691 | 0.182 | 1.338 | 0.115 |
| 3 | TFF1 | BM923753 | 0.097 | 1.719 | 0.059 |
| 4 | PTPRN2 | BX537722 | 0.704 | 1.465 | 0.552 |
| 5 | S100P | BG571732 | 0.428 | 1.514 | 0.187 |
| 6 | PPARG | AK123253 | 0.636 | 1.592 | 0.549 |
| 7 | SPINK1 | AA652500 | 0.472 | 3.271 | 0.346 |
| 8 | RNASE4 | NM_194430 | 0.642 | 1.839 | 0.504 |
| 9 | TM4SF5 | W84524 | 0.666 | 1.51 | 0.647 |
| 10 | AKR1C2 | AK091128 | 0.373 | 0.555 | 0.142 |
| 11 | RNASE4 | NM_194430 | 0.794 | 1.785 | 0.607 |
| 12 | CRIP2 | AK091845 | 2.102 | 0.927 | 3.078 |
| 13 | AZGP1 | BC014470 | 0.41 | 1.215 | 0.348 |
| 14 | AKR1C3 | BQ939577 | 0.288 | 0.747 | 0.207 |
| 15 | PTGS1 | NM_000962 | 1.973 | 1.191 | 4.842 |
| 16 | ABCC3 | NM_020038 | 0.731 | 1.599 | 0.487 |
| 17 | FOLR1 | NM_016730 | 1.896 | 1.108 | 5.626 |
| 18 | TNRC9 | AK095095 | 0.705 | 1.397 | 0.495 |
| 19 | SPTBN1 | NM_003128 | 0.619 | 1.268 | 0.607 |
| 20 | PAX8 | CR936798 | 2.372 | 0.793 | 2.857 |
| 21 | CYP2S1 | NM_030622 | 0.642 | 1.958 | 0.753 |
| 22 | CaMKIIN alpha | NM_033259 | 0.561 | 1.192 | 0.323 |
| 23 | MUC3B | XM_168578 | 1.01 | 2.36 | 0.982 |
| 24 | TFF3 | BU536516 | 1.015 | 1.928 | 0.33 |
| 25 | AKR1C1 | AK095239 | 0.18 | 0.477 | 0.118 |
| 26 | CYP3A4 | NM_017460 | 0.677 | 2.439 | 0.644 |
| 27 | AGR2 | BM924878 | 1.096 | 3.749 | 0.524 |
| 28 | KLK9 | NM_012315 | 1.518 | 0.809 | 3.805 |
| 29 | GALNT5 | AY277591 | 0.559 | 1.309 | 0.559 |
| 30 | PIP5K1B | NM_001031687 | 0.602 | 1.326 | 0.486 |
| 31 | XPR1 | BC028576 | 1.418 | 0.816 | 2.146 |
| 32 | PEG3 | NM_006210 | 0.544 | 1.48 | 0.277 |
| 33 | VIL1 | AK125889 | 0.556 | 1.61 | 0.486 |
| 34 | CA2 | AK123309 | 0.359 | 1.113 | 0.246 |
| 35 | CaMKIIN alpha | NM_033259 | 0.487 | 1.264 | 0.307 |
| 36 | USP37 | BX538024 | 0.436 | 1.477 | 0.525 |
| 37 | WT1 | L25110 | 1.243 | 1.116 | 3.298 |
| 38 | UCC1 | NM_017549 | 1.068 | 1.324 | 0.561 |
| 39 | H84871 | H84871 | 0.505 | 0.981 | 0.4 |
| 40 | UCP2 | AK025742 | 1.707 | 0.872 | 2.84 |
| 41 | MSLN | BC003512 | 2.469 | 1.636 | 7.337 |
| 42 | ALDH1A1 | NM_000689 | 0.934 | 1.424 | 0.691 |
| 43 | HPS3 | NM_032383 | 2.543 | 0.502 | 3.32 |
| 44 | KLK8 | BC040887 | 1.318 | 0.933 | 2.806 |
| 45 | FMO5 | NM_001461 | 0.69 | 1.508 | 0.74 |
| 46 | ITGB8 | AB209429 | 1.741 | 0.767 | 2.393 |
| 47 | N57754 | N57754 | 0.198 | 0.496 | 0.125 |
| 48 | LYPDC1 | AK122643 | 1.897 | 1.165 | 4.941 |
| 49 | KLK7 | NM_005046 | 1.869 | 1.26 | 5.157 |
| 50 | R95691 | R95691 | 0.209 | 0.466 | 0.106 |


3.2.2.8. Gene ontology analysis of genes with differential expression patterns influenced by type of reference RNA used

In order to determine the biological significance of genes selected in common and also

specific to each dataset, ontology analysis was carried out. Firstly, gene ontologies for the

862 genes identified by both reference types were analysed, relative to all genes present

on the microarray, to determine the significance of the ontologies observed in this list of

genes differentially expressed between EOC histological subtypes. The EASE method

was used for this analysis (Hosack et al., 2003).

Significantly represented (p<0.05) ontologies for all three of the main Gene Ontology Consortium categories (Ashburner et al., 2000) are shown in Table 3-4. This analysis shows that the genes in common are significantly enriched for processes and functions including cell differentiation (p=0.00926), cell-matrix adhesion (p=0.0119), negative regulation of cell proliferation (p=0.0145), transferase activity (p=0.000297) and protease inhibitor activity (p=0.033).

Next, to determine whether any ontologies were significantly enriched for one reference type, differential ontology analysis was carried out using the FatiGO method (Al-Shahrour et al., 2004), as described in Materials and Methods section 2.4.9.2. All gene ontology analyses carried out in this thesis include up to three levels of the gene ontology hierarchy, a depth recommended to generate interpretable results whilst minimising ontology redundancy.
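A differential ontology comparison of this kind can be sketched as a two-sided Fisher exact test per GO term, comparing how many genes of list A and list B fall in the term. The p-values are then adjusted for multiple testing. To our understanding FatiGO follows this general scheme, but the code below is an illustration, not the FatiGO implementation. The Benjamini-Hochberg adjustment is written out explicitly, and all term names and counts are hypothetical.

```python
import numpy as np
from scipy.stats import fisher_exact

def compare_term(in_a, total_a, in_b, total_b):
    """Two-sided Fisher exact test: is a GO term differently represented in list A vs list B?"""
    table = [[in_a, total_a - in_a], [in_b, total_b - in_b]]
    return fisher_exact(table)[1]

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (false discovery rate)."""
    p = np.asarray(pvals, dtype=float)
    n = len(p)
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]  # enforce monotonicity
    out = np.empty(n)
    out[order] = np.minimum(adjusted, 1.0)
    return out

# Hypothetical counts: (genes in term for list A, size of A, genes in term for list B, size of B)
terms = {"response to stimulus": (120, 480, 60, 420),
         "term B": (50, 480, 45, 420),
         "term C": (30, 480, 25, 420)}
raw = [compare_term(*counts) for counts in terms.values()]
adjusted = dict(zip(terms, benjamini_hochberg(raw)))
print(adjusted)
```

Adjustment matters here because many terms are tested at once; without it, some terms would appear differentially represented by chance.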

FatiGO analysis revealed that only one gene ontology was differentially represented between the two histology-discriminating gene sets. This was the ‘response to stimulus’ category, into which 25.05% of the cell line reference genes were classified, compared with 14.17% of the tumour reference genes. This broad classification covers genes whose expression can cause a change in the state or activity of a cell or organism, such as movement, secretion, enzyme production or gene expression, in response to the detection of a stimulus. The annotated genes in this ontology are listed in Appendix E. The larger number of differentially expressed genes of this type obtained with the cell line reference may reflect the culturing process of the 11 cell lines used to create the reference, as stimulus-response genes were also a theme among the genes detected by this reference type but not by the pooled tumour RNA reference (section 3.2.2.5).


Of the large number of gene ontologies represented in these lists of histological subtype-discriminating genes, comparative ontology analysis revealed only one significant difference. This suggests that, while the total number of genes identified as differentially expressed between EOC histological subtypes may be significantly higher when using a cell line reference RNA, the classes or ontologies represented by these additional genes differ little between reference types.

Therefore, from this analysis it appears both reference types are equally capable of

identifying the classes of genes and molecular events that define the differences between

endometrioid, serous and mucinous EOC, with the cell line reference resulting in a larger

number of genes in some categories.

Table 3-4: Gene ontology analysis of genes differentially expressed between histological subtypes using both reference types.

Biological process ontologies | No. genes in list | EASE score
Digestion | 13 | 0.000004
Homophilic cell adhesion | 16 | 0.000178
Xenobiotic metabolism | 8 | 0.00256
Glycoprotein metabolism | 14 | 0.00603
Cell differentiation | 18 | 0.00926
Cell-matrix adhesion | 10 | 0.0119
Protein amino acid glycosylation | 12 | 0.014
Negative regulation of cell proliferation | 16 | 0.0145
Anion transport | 13 | 0.0195

Cellular component ontologies | No. genes in list | EASE score
Membrane fraction | 54 | 0.000158
Membrane | 214 | 0.000223
Microsome | 14 | 0.0124
Vesicular fraction | 14 | 0.0139
Cell fraction | 59 | 0.015
Extracellular space | 30 | 0.0217

Molecular function ontologies | No. genes in list | EASE score
Transferase activity, transferring hexosyl groups | 15 | 0.000297
Oxidoreductase activity | 42 | 0.00144
Anion transporter activity | 10 | 0.00924
Calcium ion binding | 40 | 0.00951
Acyl-CoA or acyl binding | 7 | 0.0176
Organic anion transporter activity | 7 | 0.0217
Ion transporter activity | 24 | 0.0241
Electrochemical potential-driven transporter activity | 16 | 0.0254
Serine-type endopeptidase inhibitor activity | 9 | 0.0315
Protease inhibitor activity | 12 | 0.033
Trypsin activity | 10 | 0.0341


3.2.2.9. Comparison of reference RNA types with the use of a sample machine learning-based classification task

To determine whether either the pooled tumour or the pooled cell line reference RNA offered an advantage when machine learning algorithms are used to predict a clinical variable of interest, a sample classification task was set. All genes on the array were used as the starting pool, rather than a pre-filtered list, to avoid over-fitting the algorithms, and leave-one-out cross-validation (LOOCV) was used in conjunction with three different learning algorithms to assess the ability of each dataset to predict the histological subtype of an 'unknown' sample. The algorithms used were k-nearest neighbours (kNN, with 1 and 3 neighbours), nearest centroid (NC) and diagonal linear discriminant analysis (DLDA), as described in Materials and Methods section 2.4.8. Genes were included in the classification model if they differed significantly between the histology classes at a univariate significance level of p<0.001 using a randomised variance model. Gene selection was repeated within each iteration of the learning process, minimising the chance of over-fitting the model to the cohort being analysed.
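The key safeguard described here, re-selecting genes inside every leave-one-out iteration so that the held-out sample never influences which genes are used, can be sketched as below. This is a simplified illustration rather than the exact procedure of section 2.4.8: a one-way ANOVA F-statistic cutoff stands in for the p<0.001 randomised variance model criterion, only the nearest centroid (NC) classifier is shown, and the samples and subtype labels are hypothetical.

```python
from statistics import mean


def f_statistic(values, labels):
    """One-way ANOVA F statistic for one gene across the classes in `labels`."""
    classes = sorted(set(labels))
    if len(classes) < 2:
        return 0.0
    groups = [[v for v, lab in zip(values, labels) if lab == c] for c in classes]
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = len(groups) - 1, len(values) - len(groups)
    if ss_within == 0 or df_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / df_between) / (ss_within / df_within)


def nearest_centroid(train_x, train_y, test_x, genes):
    """Assign test_x to the class whose centroid (over `genes`) is closest."""
    best_class, best_dist = None, float("inf")
    for c in sorted(set(train_y)):
        members = [x for x, y in zip(train_x, train_y) if y == c]
        centroid = [mean(x[g] for x in members) for g in genes]
        dist = sum((test_x[g] - m) ** 2 for g, m in zip(genes, centroid))
        if dist < best_dist:
            best_class, best_dist = c, dist
    return best_class


def loocv_accuracy(samples, labels, f_cutoff):
    """Leave-one-out CV in which gene selection is repeated inside every fold."""
    n_genes = len(samples[0])
    correct, n_selected = 0, []
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        # Genes are chosen from the training samples only, never the held-out one
        genes = [g for g in range(n_genes)
                 if f_statistic([x[g] for x in train_x], train_y) >= f_cutoff]
        n_selected.append(len(genes))
        if genes and nearest_centroid(train_x, train_y, samples[i], genes) == labels[i]:
            correct += 1
    return correct / len(samples), mean(n_selected)


# Hypothetical log2 ratios: 9 samples x 4 genes, three subtypes
samples = [
    [2.0, 0.1, -0.2, 0.0], [2.2, -0.1, 0.1, 0.3], [1.9, 0.0, 0.2, -0.1],
    [0.0, 1.8, 0.1, 0.2], [0.1, 2.1, -0.1, 0.0], [-0.2, 1.9, 0.0, -0.3],
    [0.1, 0.0, 0.1, 0.1], [-0.1, 0.2, -0.2, 0.2], [0.0, -0.1, 0.0, -0.1],
]
labels = ["serous"] * 3 + ["mucinous"] * 3 + ["endometrioid"] * 3
acc, genes_used = loocv_accuracy(samples, labels, f_cutoff=10.0)
print(f"LOOCV accuracy = {acc:.0%}, mean genes selected = {genes_used:.1f}")
```

Selecting genes once on the full dataset and then cross-validating would let each held-out sample help choose its own predictors, inflating the apparent accuracy.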

The complete output from the LOOCV predictions of histological subtype using both reference types is listed in Appendix F, along with a summary of the percentage of samples correctly classified. For 3 of the 4 classification attempts, the cell line reference data produced fewer errors in classifying samples according to their correct histological subtype. The DLDA, 3-NN and NC classifiers correctly predicted the subtype of 84%, 82% and 84% of specimens respectively, compared with 75%, 76% and 76% accuracy when the same algorithms were applied to data generated using a pooled tumour reference.

The mean number of genes selected by the algorithms was 654 from the cell line reference data, compared with 424 from the tumour reference data. The difference in the number of genes selected is statistically significant (p<0.001), which supports the previous observation that more genes are robustly differentially expressed between histological subtypes when a cell line-based reference is used.

3.2.3. Analysis of the impact of cDNA microarray slide scanning technology on data quality

To test the impact of two common scanning methods on the quality of expression data, forty-six 10.5k cDNA microarray slides were scanned in duplicate, once using a Packard Scanarray 5000 scanner (Packard Bioscience, USA) and immediately afterwards with an Agilent Microarray Scanner BA (Agilent Technologies, USA).


These slides were hybridised with EOC RNA using the pooled 11 cell line reference, as

part of the reference RNA comparison study (section 3.2.2), using protocols described

previously.

3.2.3.1. Correlation of expression data generated from a series of cDNA microarray slides scanned on different scanners

Pearson correlation coefficients were calculated for each pair of array scans. The mean correlation was 0.91, and a box plot of the correlation scores is shown in Figure 3-11. This indicates that, while overall agreement between data generated by different microarray scanners is high, some variation does exist, reflecting the large number of measurements made and indicating the potential for systematic noise to be introduced into microarray data at this stage of the work flow. By comparison, the Pearson correlation obtained for an individual slide scanned multiple times on the same scanner was 0.98.

This finding contrasts with that of a previous study (Ramdas et al., 2001b), in which the correlation of data between scanners was equivalent to that of data generated by a single machine.
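The per-slide agreement between two scans can be summarised along the lines of the sketch below. It assumes that each scan has been reduced to Cy5:Cy3 expression ratios for the same features in the same order, that features flagged in either scan (represented here as None) are excluded pairwise, and that correlation is computed on log2 ratios so that up- and down-regulation are treated symmetrically. The ratios shown are hypothetical.

```python
from math import log2, sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def scan_agreement(ratios_scan1, ratios_scan2):
    """Correlate log2 ratios from two scans, dropping features missing in either."""
    pairs = [(log2(a), log2(b)) for a, b in zip(ratios_scan1, ratios_scan2)
             if a is not None and b is not None and a > 0 and b > 0]
    xs, ys = zip(*pairs)
    return pearson(xs, ys)


# Hypothetical Cy5:Cy3 ratios for the same six features read by two scanners
packard = [2.1, 0.48, 1.0, None, 3.9, 0.26]
agilent = [1.9, 0.52, 1.1, 0.8, 4.2, 0.24]
r = scan_agreement(packard, agilent)
print(f"r = {r:.3f}")
```

Repeating this for every slide and averaging the coefficients gives a summary of the kind reported above.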

3.2.3.2. Impact of the microarray scanner used on the consistency of repeated microarray measurements.

To determine whether a significant difference existed between the Agilent and Packard scanners in the measurement of individual array features, a comparison of quality control features was carried out. Variation between repeated measurements of eleven different synthetic quality control genes, each printed multiple times throughout the cDNA microarray layout, was assessed. A general linear model was used to control for the expected variation between different types of ScoreCard gene (i.e. quality controls at predetermined ratios and intensities) and also between individual arrays.

Highly significant variation (p<0.001) was observed between the QC expression ratios generated by the two microarray scanners, after accounting for the expected variation between the 46 individual arrays (p<0.001) and between the control gene ratios (p<0.001). This comparison shows that statistically significant differences exist between the expression ratios generated for theoretically identical features when measured by different microarray scanners. Which of the two scanners produced the more accurate measurements of these genes was then investigated further (section 3.2.3.5).


3.2.3.3. Comparison of spatial bias present in cDNA microarray data from two different scanners

MMT scores (section 3.2.1) were then calculated for each dataset. As shown in Figure

3-12, the measurements of spatial bias obtained from the images generated by the Agilent

microarray scanner exhibited significantly lower (mean: 140) and less varied (standard

deviation: 105) MMT scores compared to those obtained from the Scanarray machine

(mean: 461, standard deviation: 445). Analysis of variance of the two series of MMT

scores revealed a highly significant statistical difference (p<0.001). Visual inspection of virtual array images confirmed the MMT scores for these datasets: the arrays with the highest MMT scores had clearly visible position effects, such as those shown in Figure 3-13.


[Figure 3-11 image not reproduced; y-axis: Pearson correlation, 0.5 to 1.0]

Figure 3-11: Box plot of Pearson correlations for duplicate scanned microarrays (N=45). On average there is good agreement between microarray data generated by two different slide scanners (Pearson correlation >0.95)

[Figure 3-12 image not reproduced; MMT score (0 to 2000) for Packard and Agilent scans]

Figure 3-12: Box plot of MMT scores from duplicate scanned microarrays. The Agilent Microarray Scanner results in a dataset with a significantly lower mean and range of MMT scores compared with the Packard machine. Outliers are observed with both machines, indicating a small number of arrays heavily affected by spatial bias compared with the dataset as a whole.


Figure 3-13: Virtual array images of two microarray slides with visible spatial bias scanned on (A & B) Packard and (C & D) Agilent microarray scanners. The Agilent images show visibly less spatial bias, with the overwhelming Cy5 (red) signal reduced in the lower right and lower left of these two arrays, respectively.


3.2.3.4. Comparison of spatially-dependent variation in background hybridisation between scans of a sample microarray on different scanners

To explore the variation in spatial bias introduced between scans of the same microarray

slide in finer detail, one array was selected where a large difference in MMT score was

observed between the images generated by the two available microarray scanners. An

example of how MMT reflects and quantifies spatial bias is shown in Figure 2-4. Moving average plots (200-point window) of the foreground and background hybridisation intensity measurements were generated for both Cy3 and Cy5 dyes from each scan of this array, from the first to the final array feature.

As seen in Figure 3-14, both foreground and background absolute intensity measurements

exhibit substantial variation across the surface of this sample microarray. The lines

representing background intensity appear to track with array location to a greater degree

in the Packard-generated data compared to the Agilent scan of the same microarray slide.

The same trends can also be observed in the foreground (feature) intensity measurements, with the Packard data increasing with array location to a greater degree than the Agilent data.

As both background and foreground measurements generated by the Packard scanner appear to follow the same trend, it was hypothesised that the routine practice of subtracting the background intensity from the foreground might in fact eradicate the spatial

bias in the final expression data. Background subtracted intensity measurements were

then plotted as shown in Figure 3-15. By visualising the intensity measurements

generated by these two scanner types in this manner it is possible to observe the impact of

bias in background hybridisation on the overall level of spatial bias in the final expression

ratios.
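The moving-average traces and background-subtracted intensities described above can be generated along the lines of the sketch below. It assumes that per-feature foreground and background intensities are available in print order, uses a trailing 200-feature window (an assumption; the figure legends suggest a spreadsheet moving-average trendline), and floors background-subtracted values at a small positive number before log transformation. The floor value and the toy slide are illustrative assumptions.

```python
from math import log10


def moving_average(values, window=200):
    """Trailing moving average: one value per position once `window` points exist."""
    if window > len(values):
        return []
    running = sum(values[:window])
    out = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]  # slide the window by one feature
        out.append(running / window)
    return out


def background_subtract(foreground, background, floor=1.0):
    """Foreground minus local background, floored so that log10 remains defined."""
    return [max(f - b, floor) for f, b in zip(foreground, background)]


# Toy slide: background rises with feature number; true net signal is constant (500)
n = 1000
background = [100 + 0.5 * i for i in range(n)]
foreground = [500 + b for b in background]
net = background_subtract(foreground, background)

trend_bg = moving_average([log10(b) for b in background])
trend_net = moving_average([log10(v) for v in net])
print(f"background trend: {trend_bg[0]:.3f} -> {trend_bg[-1]:.3f}")
print(f"net signal trend: {trend_net[0]:.3f} -> {trend_net[-1]:.3f}")
```

Because the toy background offset is purely additive, subtraction removes the trend completely; the Packard data in Figure 3-15 show that a real position effect need not behave this way.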


[Figure 3-14 image not reproduced; panels A (Packard) and B (Agilent): 200-point moving averages of ch1/ch2 background and intensity; x-axis: feature number; y-axis: hybridisation intensity (log10)]




Figure 3-14: 200-point moving average graphs of individual detection channels from a sample microarray slide scanned on two cDNA microarray scanners. (A) Packard-scanned microarray slide with visible spatial bias. Measurements of both foreground and background intensity correlate with array location (x-axis). (B) The same slide rescanned using continual laser focus, resulting in reduced visible spatial bias and a lower MMT score. Foreground and background intensity measurements still correlate with array location, but background intensity readings from this scan are noticeably less variable and less correlated with location than those from the Packard machine.


[Figure 3-15 image not reproduced; panels A (Packard) and B (Agilent): 200-point moving averages of background-subtracted Ch1 and Ch2 intensity; x-axis: feature number; y-axis: hybridisation intensity (log10)]

Figure 3-15: 200-point moving average graphs of the difference between foreground and background intensity measurements on a sample microarray. (A) Packard-scanned microarray with visible spatial bias. The correlation with array location indicates that the difference between foreground and background hybridisation is not constant across the slide. (B) The same slide rescanned using the Agilent Microarray Scanner. The difference between foreground and background intensity does not track with array location to the same extent.


3.2.3.5. Analysis of ScoreCard quality control features to compare microarray scanner accuracy

As previously detailed, the ScoreCard quality control system allows researchers to

determine the accuracy of their expression data by comparing the difference between

observed and expected expression ratios at a range of ratios and absolute abundances. A

reliable microarray platform should produce observed values for these features, which are distributed throughout the array, that do not differ significantly from their known correct values in any given hybridisation.

Intensity-dependent lowess-normalised mean ratios for ScoreCard ratio control (RC) genes were compared between both datasets. Differences between down-regulated expression ratios were tested with ANOVA, factoring in differences between gene types (four ratio control genes), individual arrays and scanners. P-values for all comparisons were highly significant (p<0.001), indicating a statistically significant difference between the data generated by the two scanners tested. Box plots were created for the mean expression of each ScoreCard gene generated by both the Agilent and Packard microarray scanners. As shown in Figure 3-16, measurements of ratio-control ScoreCard genes were significantly closer to their expected values, indicated by the dashed horizontal lines, in data generated by the Agilent scanner. Furthermore, the range of values obtained for these controls was significantly smaller (p<0.001), as determined by a test for equal variances implemented in Minitab and illustrated by the size of the interquartile ranges in Figure 3-16.
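The two properties examined here, closeness to the expected ratio and spread of the measurements, can be summarised per control gene as sketched below. The sketch assumes per-array mean ratios for one ScoreCard control and its expected ratio. It reports the mean absolute log2 deviation from the expected value, and a Brown-Forsythe (median-centred Levene) statistic comparing spread between scanners on the log2 scale. Which equal-variance test variant was run in Minitab is not stated here, so this choice is an assumption; the p-value, which requires the F distribution, is omitted, and the ratios are hypothetical.

```python
from math import log2
from statistics import mean, median


def mean_abs_log_deviation(observed, expected):
    """Average |log2(observed) - log2(expected)| across arrays."""
    return mean(abs(log2(o) - log2(expected)) for o in observed)


def brown_forsythe(*groups):
    """Levene-type W statistic using absolute deviations from each group's median.

    Compare W with an F(k - 1, N - k) distribution to obtain a p-value.
    """
    z = [[abs(x - median(g)) for x in g] for g in groups]
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    z_bar = [mean(zi) for zi in z]
    z_grand = mean(v for zi in z for v in zi)
    between = sum(len(zi) * (zb - z_grand) ** 2 for zi, zb in zip(z, z_bar))
    within = sum((v - zb) ** 2 for zi, zb in zip(z, z_bar) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical per-array mean ratios for a 3:1 ratio control on each scanner
expected = 3.0
agilent = [2.9, 3.1, 2.8, 3.0, 3.2]
packard = [2.2, 3.9, 1.8, 2.6, 4.4]
for name, obs in (("Agilent", agilent), ("Packard", packard)):
    print(name, round(mean_abs_log_deviation(obs, expected), 3))
w = brown_forsythe([log2(v) for v in agilent], [log2(v) for v in packard])
print("Brown-Forsythe W =", round(w, 2))
```

Working on the log2 scale keeps a 3:1 and a 1:3 control on a comparable footing when their deviations are compared.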

It is important that a microarray platform can accurately detect differences in relative

gene expression levels at varying absolute concentrations as genes are present at a broad

spectrum of concentrations in all biological samples. Dynamic range control genes are an

effective way of monitoring the sensitivity of a platform to varying RNA abundances. Six synthetic genes are double-spotted on the array six times each, so that each dynamic range gene is represented 12 times. The mean expression ratio of each dynamic range control is used to compare the accuracy of the array platform at detecting hybridisation of probes at a range of absolute concentrations. The concentrations of the dynamic range RNA spiked into the biological sample applied to the array range from 33 pg/5 µL to 33,000 pg/5 µL, as described in Materials and Methods.


Analysis of the mean raw expression ratios obtained for these dynamic range controls is shown in Figure 3-16. For all six dynamic range controls, the distribution of observed values spanned the expected value, indicated by a dashed line. The distributions of dynamic range control measurements from the fixed focus (Packard) scanner were further from the expected value and also more variable within each individual control.

This comparison indicated that the Agilent microarray scanner produced data with a significantly higher degree of accuracy for a range of genes with known Cy3:Cy5 ratios and absolute abundances. It can therefore be expected that data produced by this machine more accurately reflect the true biological signal in the sample being profiled.


Figure 3-16: Box plots of per-array mean expression ratios for selected ScoreCard quality control genes. Dashed lines indicate the theoretically expected values for each gene. Ag: Agilent Microarray Scanner BA; Pa: Packard Scanarray 5000. (A) Upregulated quality control features, Cy5:Cy3 ratios given in figure. (B) Down-regulated quality control features. (C) Dynamic range features. In all cases, the Agilent data were closer to the theoretical values of these quality control genes, indicating a higher degree of accuracy at a range of expression ratios and relative abundances.

[Figure 3-16 image not reproduced; panels A and B: Cy5:Cy3 ratio for ratio controls 3:1, 10:1, 1:3 and 1:10 (Ag, Pa); panel C: Cy5:Cy3 ratio for dynamic range controls at relative abundances from 0.00% to 3.3% (Agilent, Packard)]


3.2.3.6. Analysis of the effect of scanner type on the biological outcome of a sample microarray experiment

A crucial question when comparing technical modifications of existing methods is

whether the changes have any significant impact on the biological question being posed

by the experiment. The proportion of genes differentially expressed (at least 2-fold) in a subset of arrays in each of the two datasets was used as a measure of how differences in scanning method affect the range of gene expression ratios in a given dataset. The fold change method was used for gene selection specifically because it does not compensate for variation between slide intensities potentially attributable to the type of microarray scanner used.

This expression threshold was applied to the data after an intensity-based normalisation was carried out to compensate for subtle differences in Cy3 and Cy5 incorporation during the RNA labelling process. In 10 of the 46 arrays scanned on the fixed focus scanner (an arbitrary proportion), 1176 of the 9995 genes available were either 2-fold up- or down-regulated compared with the reference. When the same arrays were scanned on the adaptively focussing system, a total of 1925 genes passed this filter. The difference in the proportions of differentially expressed genes is highly significant in a two-sided t-test (p<0.001).
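The fold-change filter can be expressed as in the sketch below: a gene passes if the magnitude of its normalised log2 ratio is at least log2(2) = 1 (a 2-fold change in either direction) in a minimum number of arrays. It assumes a genes-by-arrays matrix of normalised log2 ratios with missing values as None, and assumes the "10 of the 46 arrays" criterion is applied as "at least 10 arrays". The same function with a 1.5-fold threshold and a minimum of 20% of arrays gives the unsupervised filter discussed in section 3.3.2. The matrix is hypothetical.

```python
from math import log2


def passes_fold_filter(log_ratios, fold=2.0, min_arrays=10):
    """True if |log2 ratio| >= log2(fold) in at least `min_arrays` arrays."""
    cutoff = log2(fold)
    hits = sum(1 for v in log_ratios if v is not None and abs(v) >= cutoff)
    return hits >= min_arrays


def count_passing(matrix, fold=2.0, min_arrays=10):
    """Number of genes (rows of the matrix) that pass the filter."""
    return sum(passes_fold_filter(row, fold, min_arrays) for row in matrix)


# Hypothetical normalised log2 ratios: 3 genes x 12 arrays
matrix = [
    [1.2] * 10 + [0.1, None],         # 2-fold up in 10 arrays: passes
    [-1.5] * 9 + [0.0, 0.2, -0.3],    # 2-fold down in only 9 arrays: fails
    [0.4] * 12,                       # never reaches 2-fold: fails
]
print(count_passing(matrix))
```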

The variation in quality control features can also be used to infer how different scanning methods affect the biological meaning of microarray data. Although two-colour microarrays produce values that are relative to a reference RNA, relating absolute intensity levels back to control genes of known abundance can give a measure of a gene's actual concentration or abundance in the sample being profiled. Accurate quality control measurements are therefore required for reliable estimates of absolute gene abundance. Mean values for all quality control genes were significantly closer to the expected values in data generated using the newer Agilent microarray scanner, indicating a greater potential for reliable estimates of absolute gene abundance than data generated by the Packard scanner.


3.3. Discussion

3.3.1. The use of Mood's Median Test to quantify spatial bias in cDNA microarray data

Spatial bias in cDNA microarray data is often observed as a spatial gradient or trend in

gene expression ratios related to their position on the microarray, rather than their actual

expression level in the specimen of interest. Because genes of similar structure and function are randomised throughout the printing area as far as possible, a random distribution of up- and down-regulated features is expected across the surface of a cDNA microarray.

Therefore, patterns of hybridisation related to physical location are an indication of

technical error, such as inadequate probe distribution during hybridisation, irregular slide

topography or the angle of the microarray slide during scanning.

These effects can be visualised by creating a virtual-array image, as shown in Figure

3-2. Often the effect is more visible after intensity-based normalisation is applied and any

imbalance between Cy3 and Cy5 labelling efficiency has been corrected. A method for

quantifying the degree of this bias is proposed whereby a chi-square statistic is generated

from a test of disparity of array sub-grid median ratio values. As the median sub-grid

expression values deviate from one another, the magnitude of the chi-square test statistic

generated by MMT increases. Examples of the test applied to arrays with and without

obvious spatial irregularities are shown in Figure 3-2. A global measure of position effect

is useful for identifying arrays in a particular experiment that may need to be repeated if

the effect cannot be corrected to an acceptable level with normalisation or other statistical

manipulation. As demonstrated, this test can also be used to compare differences between

array scanners, image analysis packages or other variables in the microarray process that

could potentially impact on the degree of position effect present in the final data obtained.
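The MMT calculation can be sketched as a standard Mood's median test across sub-grids. The sketch assumes that normalised log2 ratios have already been grouped by array sub-grid; it pools all values, finds the grand median, counts how many values in each sub-grid lie above it versus at or below it, and returns the Pearson chi-square statistic for the resulting 2 x k table (k - 1 degrees of freedom). Implementations differ in how values equal to the grand median are handled; counting them as "not above" is an assumption here, and the sub-grid data are simulated.

```python
import random
from statistics import median


def mood_median_chi2(groups):
    """Chi-square statistic of Mood's median test across sub-grids.

    `groups` is a list of lists, one list of expression values per sub-grid.
    """
    pooled = [v for g in groups for v in g]
    grand_median = median(pooled)
    above = [sum(1 for v in g if v > grand_median) for g in groups]
    not_above = [len(g) - a for g, a in zip(groups, above)]
    total_above, total_not = sum(above), sum(not_above)
    n = total_above + total_not
    chi2 = 0.0
    for a, b in zip(above, not_above):
        size = a + b
        exp_a = size * total_above / n  # expected count above the grand median
        exp_b = size * total_not / n    # expected count at or below it
        if exp_a > 0:
            chi2 += (a - exp_a) ** 2 / exp_a
        if exp_b > 0:
            chi2 += (b - exp_b) ** 2 / exp_b
    return chi2


random.seed(1)
# Simulated log2 ratios for 4 sub-grids x 100 features with no position effect
unbiased = [[random.gauss(0, 1) for _ in range(100)] for _ in range(4)]
# The same data with a position effect: later sub-grids shifted upwards
biased = [[v + 0.5 * k for v in g] for k, g in enumerate(unbiased)]
print(round(mood_median_chi2(unbiased), 1), round(mood_median_chi2(biased), 1))
```

Because only counts above and below the grand median enter the statistic, a handful of extreme features cannot inflate the score on their own.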

MMT is based on non-parametric statistics and is therefore robust to outliers, which are often present in microarray data to a degree that depends on the type of sample RNA being investigated. For example, RNA extracted from human tissues will generally bind a higher proportion of array features than RNA extracted from cell lines, potentially resulting in more features being identified as outliers. While array data often approach a normal distribution, this cannot be assumed (Aris et al., 2004); the non-parametric nature of MMT is therefore another factor that makes it suitable for analysing microarray data.

In section 3.2.1.2, the test is used to evaluate the impact of different microarray image

analysis software packages on the degree of spatial bias present in the final data. A key


difference between the two packages compared is the variable spot size algorithm

implemented in GenePix. This allows the diameter of each individual array feature to be

adjusted in response to variation in the actual amount of cDNA deposited on the glass

surface. The benefit of this is a reduction in the amount of background hybridisation included in the measurement of smaller array features and, conversely, in the amount of foreground (feature) intensity included in the measurement of background hybridisation near larger features. Another important difference is the method

employed for calculating background hybridisation. The Quantarray package uses the

average pixel intensity of a circular area surrounding each spot, whereas GenePix takes

four separate measurements of background intensity and uses the median of these as the

final measurement.

By calculating MMT scores for a series of arrays analysed in duplicate by these two methods, it was observed that GenePix produced expression data with lower and less variable spatial bias. From this observation it can be concluded that the image analysis stage of microarray analysis can also introduce spatial bias into the final data obtained.

Other methods for correcting spatial irregularity in expression data exist and operate on principles similar to those of SNOMAD. These include the widely used print-tip normalisation method, implemented in the R language (Ihaka, 1997) through the Bioconductor project (Gentleman et al., 2004) and also online at GEPAS Tools (http://gepas.bioinfo.cnio.es/) (Herrero et al., 2003). In print-tip normalisation, the adjustment factor is determined by a lowess regression curve fitted through the expression data binned into groups according to the printing tip used to deposit the probe onto the glass substrate during array fabrication.
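The grouping principle behind print-tip normalisation can be illustrated with a deliberately simplified variant. In the sketch below, each print-tip group's log2 ratios are centred on that group's median instead of being adjusted by the intensity-dependent lowess curve described above; this median substitute, the (print_tip, log2_ratio) data layout and the toy values are assumptions for illustration, not the Bioconductor implementation.

```python
from collections import defaultdict
from statistics import median


def print_tip_median_normalise(features):
    """Subtract each print-tip group's median log2 ratio from its features.

    `features` is a list of (print_tip_id, log2_ratio) pairs in array order.
    """
    by_tip = defaultdict(list)
    for tip, m in features:
        by_tip[tip].append(m)
    tip_median = {tip: median(ms) for tip, ms in by_tip.items()}
    return [(tip, m - tip_median[tip]) for tip, m in features]


# Toy data: print tip 2 carries a systematic offset of about +1 (log2 scale)
features = [(1, 0.1), (1, 0.3), (1, 0.2), (2, 1.1), (2, 0.9), (2, 1.0)]
normalised = print_tip_median_normalise(features)
print([round(m, 2) for _, m in normalised])
```

The offset on print tip 2, which would otherwise appear as spatially clustered up-regulation, is removed; the lowess version additionally allows the correction to vary with spot intensity.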

To demonstrate how MMT can be used in a practical application, a sample classification

task was carried out using a dataset designed to develop a predictive algorithm for gastric

cancer disease recurrence. By creating three separate versions of the dataset and carrying

out the cross-validated prediction analysis, it was possible to observe the increase in

prediction accuracy associated with a reduction in the overall level of spatial bias present

in a cDNA microarray dataset. The number of genes required by the algorithm, designed

to iteratively add genes to the predictive set based on their signal-to-noise ranking until

optimal classification is reached, decreased with the reduction in spatial bias. This may further reflect the reduction in the systematic error component of each gene's expression profile across the dataset, allowing the algorithm to perform optimally using a smaller


number of genes whose expression patterns more closely resemble the underlying

biology, rather than characteristics of the laboratory procedures.
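The ranking-and-growing procedure described above can be sketched as follows. The sketch assumes one common definition of the signal-to-noise statistic, (mean_A - mean_B) / (sd_A + sd_B), since the exact form used in the gastric cancer analysis is not given here; `evaluate` is a hypothetical caller-supplied function returning cross-validated accuracy for a gene subset, and all data are hypothetical.

```python
from statistics import mean, stdev


def signal_to_noise(values_a, values_b):
    """(mean_a - mean_b) / (sd_a + sd_b) for one gene (needs >= 2 samples per class)."""
    return (mean(values_a) - mean(values_b)) / (stdev(values_a) + stdev(values_b))


def rank_genes(matrix, labels, class_a, class_b):
    """Gene indices sorted by |S2N|, largest first (rows = samples, columns = genes)."""
    scores = []
    for g in range(len(matrix[0])):
        a = [row[g] for row, y in zip(matrix, labels) if y == class_a]
        b = [row[g] for row, y in zip(matrix, labels) if y == class_b]
        scores.append((abs(signal_to_noise(a, b)), g))
    return [g for _, g in sorted(scores, reverse=True)]


def smallest_best_gene_set(ranked, evaluate, max_genes):
    """Add genes in rank order; keep the smallest set reaching the best accuracy."""
    best_acc, best_set = -1.0, []
    for n in range(1, max_genes + 1):
        acc = evaluate(ranked[:n])
        if acc > best_acc:  # strict '>' keeps the smaller set when accuracy ties
            best_acc, best_set = acc, ranked[:n]
    return best_set, best_acc


# Hypothetical log2 ratios: 6 samples x 3 genes
matrix = [
    [3.0, 0.1, 1.0], [2.8, -0.2, 1.2], [3.1, 0.0, 0.9],  # recurrence
    [0.9, 0.2, 1.1], [1.1, -0.1, 1.0], [1.0, 0.1, 0.8],  # no recurrence
]
labels = ["R", "R", "R", "N", "N", "N"]
ranked = rank_genes(matrix, labels, "R", "N")

# Toy evaluator standing in for cross-validated accuracy of a gene subset
toy_accuracy = {1: 0.80, 2: 0.90, 3: 0.90}
genes, acc = smallest_best_gene_set(ranked, lambda gs: toy_accuracy[len(gs)], 3)
print(ranked, genes, acc)
```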

In general, the effect of reducing the overall amount of spatial bias in a dataset appears to

impact favourably on classification accuracy. Switching from intensity-dependent to spatially-dependent normalisation resulted in a 9% increase in classification accuracy and a four-fold reduction in the number of genes required to achieve optimal prediction. MMT scores calculated for arrays in this experiment after SNOMAD normalisation showed a significant reduction in the mean level of spatial bias; however, one array

continued to generate a high MMT score. Removing this one array from the dataset

resulted in a further classification-accuracy increase and also a decrease in the optimal

number of genes required by the algorithm to perform the classifications.

This section of the analysis further supports the benefits of identifying and reducing the

degree of spatial bias in cDNA microarray datasets. As technical noise in gene expression data is reduced through improved wet-lab protocols or bioinformatic methods such as SNOMAD or print-tip normalisation, the biological signal appears to become less obscured, making it possible to generate predictive signatures that achieve higher cross-validation accuracy with fewer genes. This is important, as a core set of genes is often sought for follow-up analysis using other methods such as RT-PCR, in-situ hybridisation or immunohistochemistry. Methods that reduce redundancy in gene sets identified as predictive of a phenotype of interest therefore facilitate these validation analyses by allowing the minimal number of most influential genes to be determined.

3.3.2. Evaluation of reference RNA options suitable for large-scale tumour profiling studies

The advantages and disadvantages of the two distinct types of reference RNA analysed

for suitability in a large-scale tumour profiling study vary depending on the scope of the

project, method of data analysis planned and technical considerations such as cost of

production, longevity of the material and the ability to combine data for meta-analysis

studies.

An important measure of reference RNA performance is the proportion of array features

successfully hybridised and thus capable of generating an expression ratio that passes the

predetermined quality criteria. An expression ratio is only generated if the image analysis

software detects an adequate level of fluorescence in both sample and reference channels

of the scanned microarray images. In this study, no significant difference was detected in


the proportion of the microarray successfully hybridised by either reference type; in this regard, both references perform equally, producing viable expression measurements for the vast majority of array features. While no statistically significant difference was found, the box plots of the proportion of successful hybridisations (Figure 2-4) show that five of the cell line reference microarrays were outliers, with proportions as low as 55%. No outliers were identified in the tumour reference arrays; however, it is difficult to know whether this reflects individual array quality, the particular printing batch the arrays were sourced from, a hybridisation problem, or the reference type used.

Although no difference was detected in the proportion of genes hybridised by the two reference types, several important ontology differences were observed between the lists of genes bound by only one reference type, approximately 2% of the total set. In line with the structurally more diverse tissue used to generate the pooled tumour reference, a significant proportion of the genes on the microarray involved in cell-cell communication, membrane and extracellular structure were detected only by this reference type. This observation may be important when selecting a reference type for a microarray study in which the interactions between a tumour and its surrounding tissue or the immune system are of interest. Genes identified by the cell line reference alone included genes related to cell morphology and response to external stimuli, possibly reflecting the tissue culture conditions in which the cell lines are grown.

Reduction of spatial correlation or bias in expression data is important for robust and

repeatable array results (Qian et al., 2003). The proposed method of visualising and

quantifying this factor revealed a significantly lower degree of spatial bias present across

the arrays using the pooled cell line reference compared to that observed in the tumour-

derived reference hybridisations. One reason for the significant difference in spatial bias

could be a difference in the overall mean intensity values of the hybridised features. With

higher feature intensity, the resulting expression ratios are less affected by the subtraction

of background hybridisation when compared to features where the distinction between

background and spot intensity is smaller. As a result, any spatially-dependent variation in background hybridisation, due to uneven slide topography for example, is less likely to

result in up or down regulation of all genes in a specific section of an array. It is also

important to note that the difference in spatial bias between reference types was reduced

to a non-significant level after SNOMAD normalisation had been applied, indicating that

for the extent of spatial bias present in the datasets tested, this normalisation method is

effective for correcting this source of technical error (Colantuoni et al., 2002).
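The argument that bright features are less sensitive to background errors can be made concrete with a small calculation. The intensities below are hypothetical: each feature has a true 2:1 Cy5:Cy3 ratio, the true background is 100 units in both channels, and the Cy5 background estimate is 50 units too high, as a spatially varying background might cause in one region of a slide.

```python
from math import log2


def measured_log_ratio(net_cy5, net_cy3, background=100.0, cy5_bg_error=50.0):
    """log2 ratio after subtracting a background estimate that is wrong in Cy5 only."""
    fg5, fg3 = net_cy5 + background, net_cy3 + background
    return log2((fg5 - (background + cy5_bg_error)) / (fg3 - background))


for label, cy5, cy3 in (("bright", 4000.0, 2000.0), ("dim", 200.0, 100.0)):
    observed = measured_log_ratio(cy5, cy3)
    print(f"{label}: true log2 ratio 1.000, observed {observed:.3f}")
```

The same 50-unit error shifts the bright feature's log2 ratio by about 0.02 but the dim feature's by about 0.4, so a regional background artefact mainly distorts weakly hybridised features.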


Approximately 600 more genes were identified from the cell line reference data using an

unsupervised method of identifying those genes potentially contributing to a phenotype of

interest (i.e. at least 1.5-fold differentially expressed in at least 20% of samples). This suggests that the cell line reference RNA produces expression data with a wider dynamic range, a conclusion supported by an analysis of variance in the overall mean expression ratios for each dataset, illustrated in Figure 3-9. A t-test-based approach (log expression variation) for identifying genes with significant variation in a given proportion of a dataset yielded no difference in the proportion of genes identified. By adapting to the

specifics of each individual microarray, this method of data reduction, or unsupervised

gene selection, is therefore more suited to cDNA microarray data for which the overall

dynamic range of expression ratios can vary based on a number of variables, including

the type of reference RNA used.

In a comparison of genes identified as being differentially expressed between

histological subtypes present in the dataset, 450 more genes were identified from the cell-

line reference data. Therefore, while an unsupervised gene selection filter identified a

similar number of genes, a supervised approach yielded a substantially larger number of

genes from the cell-line data. Among the top 50 differentially expressed genes identified in both datasets (Table 3-3), a number of biologically relevant genes were observed.

These include:

Trefoil factor 1 (TFF1) – upregulated 2-fold in mucinous tumours and

implicated in a number of other cancers including breast and colorectal. Trefoil factor genes are thought to be involved in protecting the mucosa from damage and stabilising the mucosal layer, and to have a role in epithelial healing and tumour

suppression (Dossinger et al., 2002; Schwartz et al., 2002).

Folate receptor 1 (FOLR1) – upregulated 5-fold in serous and 2-fold in

endometrioid samples and used as a marker for ovarian tumour progression. A

membrane glycoprotein not present on ovarian surface epithelium but found

during early transformation stages (Galmozzi et al., 2001; Tomassetti et al.,

2003).

Kallikrein 8 (KLK8) – upregulated 3-fold in serous and 1.3-fold in endometrioid samples. Proposed as a biomarker for ovarian cancer and thought to be involved in invasion and metastasis. One of several kallikrein genes identified as differentially expressed (Shigemasa et al., 2004).


Mesothelin (MSLN) – upregulated 7-fold in serous and 2.5-fold in endometrioid samples. Binds specifically to the ovarian cancer antigen CA-125 and controls cellular adhesion to the mesothelial epithelium, thereby mediating one of the key stages of tumour invasion (Rump et al., 2004).

Although comparing gene expression profiles between subtypes of EOC was not one of the goals of this chapter, the data generated have the potential to form the basis of such an analysis. In Chapter 5 the molecular differences between several subtypes of EOC are explored, making use of data and findings from this preliminary chapter.

For differential ontology analysis, gene lists were restricted to those genes differentially expressed between the mucinous and serous subtypes only. Comparing the significantly represented ontologies in these lists revealed that only a single class or ontology was differentially represented between the two lists. This again indicates that neither reference type identified a substantially different list of genes differentially expressed between EOC subtypes. Furthermore, it shows that the Peter Mac cDNA microarray platform is sufficiently robust that the same classes of genes are identified from a duplicate experiment, even when a major experimental variable, such as the type of reference RNA used, is changed.

By using the data in a sample classification task it was possible to observe whether either reference type conferred an advantage in what is often the main purpose of a microarray dataset: pattern learning and prediction of the class or phenotype of one or more samples for which it is unknown. The data were used in conjunction with learning algorithms to generate a predictive signature for ovarian cancer histological subtype. In three of the four prediction trials, with leave-one-out cross-validation (LOOCV) used to evaluate the performance of the algorithm, the data generated with the cell line RNA reference produced fewer misclassifications. On average, 81% of the samples were assigned their true histological subtype, compared to 78% for the classifier trained on the tumour reference data. Although this amounts to only a small number of samples predicted incorrectly with one dataset and correctly with the other, it is still an important advantage, particularly when an algorithm is being used to predict clinically relevant phenotypes.
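The LOOCV evaluation described above can be sketched as follows. The learning algorithms used in the prediction trials are not named in this section, so this minimal example assumes a simple nearest-centroid classifier on log ratios; the function names and the toy profiles are hypothetical. Each sample is held out in turn, the classifier is built from the remaining samples, and the held-out sample is predicted.

```python
import math
from collections import defaultdict


def centroid(rows):
    """Mean profile (per-gene average log ratio) of a list of samples."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def nearest_centroid_predict(train_x, train_y, sample):
    """Assign `sample` to the class whose centroid is closest (Euclidean distance)."""
    by_class = defaultdict(list)
    for row, label in zip(train_x, train_y):
        by_class[label].append(row)
    best_label, best_distance = None, math.inf
    for label, rows in by_class.items():
        distance = math.dist(sample, centroid(rows))
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label


def loocv_accuracy(x, y):
    """Leave-one-out cross-validation: hold out each sample once and predict it."""
    correct = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_x, train_y, x[i]) == y[i]:
            correct += 1
    return correct / len(x)


# Hypothetical log2 ratios for two genes in six tumours.
profiles = [[2.0, 0.1], [1.8, -0.2], [2.2, 0.0], [-0.5, 1.9], [-0.3, 2.1], [0.1, 1.7]]
subtypes = ["serous"] * 3 + ["mucinous"] * 3
print(loocv_accuracy(profiles, subtypes))  # prints 1.0
```

With real data, any gene selection step would ideally be repeated inside each LOOCV fold so that the held-out sample does not influence which genes the classifier sees.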

The practical and technical considerations outlined such as cost of production, long term

availability and portability of data generated are also important factors to consider when

selecting a reference RNA type. Experiments carried out using the same reference type

have a greater potential for meta-analysis, in which the statistical power of a dataset is

increased by combining raw expression data from separate studies. Where a common


reference is used, such as the 11 cell line reference described in this chapter, it acts as a common denominator or baseline between studies, even if they have been carried out in separate laboratories. This is therefore a significant advantage over project-specific references, such as the pooled tumour material, for which the expression ratios generated are relative to the expression profiles of a specific group of tumours.
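The "common denominator" argument can be made concrete with log ratios. If samples A and B are each hybridised against the same reference R, then log2(A/R) - log2(B/R) = log2(A/B), so the reference cancels and the two measurements can be compared directly. The sketch below uses hypothetical intensities for a single gene and ignores dye and array effects.

```python
import math


def log2_ratio(sample_intensity, reference_intensity):
    """Log2 expression ratio of a sample against a reference for one gene."""
    return math.log2(sample_intensity / reference_intensity)


def compare_via_reference(ratio_a, ratio_b):
    """Indirect comparison of samples A and B from their log2 ratios.

    Only meaningful when both ratios were measured against the same reference,
    because log2(A/R) - log2(B/R) = log2(A/B).
    """
    return ratio_a - ratio_b


# Hypothetical intensities for one gene.
a, b = 800.0, 200.0
shared_reference = 400.0
same_ref = compare_via_reference(log2_ratio(a, shared_reference), log2_ratio(b, shared_reference))
print(same_ref)  # 2.0, i.e. log2(800 / 200)

# If B had instead been profiled against a different, project-specific reference
# (hypothetical intensity 100.0), the reference no longer cancels:
mixed_ref = compare_via_reference(log2_ratio(a, shared_reference), log2_ratio(b, 100.0))
print(mixed_ref)  # 0.0, although A is truly 4-fold higher than B
```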

Additional time and expense can be associated with the construction of a pooled cell line

reference RNA. Large volumes of 11 different cell lines need to be generated, each with

their own nutrient requirements and varying growth rates. In contrast, a pooled tumour

reference can be made by combining amplified or non-amplified RNA from tissue

specimens at the same time as those being processed for individual hybridisation,

requiring little additional time or expense.

After weighing up the advantages and disadvantages of each reference type, summarised

in Table 3-5, the pooled 11 cell line reference was determined to be the most appropriate

choice for large scale EOC-profiling studies.

Table 3-5: Summary of reference RNA comparisons carried out in this study. In most of the comparisons conducted, the pooled cell line reference out-performed the pooled tumour reference.

Method of assessment | Pooled cell line reference | Pooled tumour reference
Proportion of array features identified as reliably expressed
Wide dynamic range of expression ratios
Identification of genes related to interactions between tumour cells and their environment
Proportion of genes differentially expressed – fold change
Proportion of genes differentially expressed – log-variance
Lower degree of spatial bias as measured by MMT
Number of genes significantly differentially expressed between histological subtype
Accuracy of machine-learning based predictions of EOC histological subtype
Comparability of data generated to other datasets
Long term availability
Reproducibility by other laboratories
Cost of production


3.3.3. Microarray scanners and cDNA gene expression data quality

During the course of this project the Agilent Microarray Scanner BA was introduced with

several new features not found in other scanners on the market at the time. One of the

most significant new features was its ability to maintain the focal point of the scanning

lasers throughout the duration of the scan. This feature was claimed to result in higher

quality data because the individual measurements were made with the correct laser focal

point, irrespective of variation in substrate topography, tilt or any movement of the slide

during the scanning process. By scanning a series of hybridised microarray slides on an Agilent Microarray Scanner and then immediately again on a Packard Bioscience Scanarray 5000, which lacks this dynamic focusing technology, several important differences were observed in the final data that supported the claim of improved data quality over scanners without these features.

The comparisons of the datasets produced by the two scanners revealed a number of

significant differences in the accuracy of individual measurements and other methods of

assessment including variation in the degree of spatial bias present. These findings

suggest the use of multiple scanners for one experiment may result in a systematic bias

being introduced into the data generated. Through normalisation, or inclusion of scanner type as a variable in the data analysis stage, the impact of the observed differences between scanners could possibly be minimised; however, this may limit the statistical power of the dataset. Therefore, where possible, it is preferable to use a single scanner when creating a cDNA microarray dataset.

By analysing variation in raw background hybridisation intensity, for both Cy3 and Cy5

channels of sample arrays with and without visible spatial bias, it was possible to identify

that a significant proportion of the spatial bias present in cDNA microarray data comes

from a spatial relationship between background hybridisation intensity and array location.

This variation is transferred to the final gene expression ratios when the measurement of

background hybridisation is subtracted from the actual intensity of the hybridised probe, a

step intended to control for non-specific binding. Significantly lower variation was

observed in the background channels of array images produced by the Agilent Microarray

Scanner, explaining their overall lower MMT scores and greater accuracy of individual

data points.

These results also suggest that forgoing the background subtraction stage of cDNA

microarray data processing may reduce the level of spatial bias present in the final data,


particularly for microarrays where a distinct relationship between background

hybridisation and array location is observed.
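How a spatial pattern in the measured background can leak into expression ratios through background subtraction can be illustrated with a toy model. The model is a hypothetical sketch, not data from this study: both channels carry the same true signal (so the correct log ratio is 0), and a spatial artefact is assumed to raise the measured local Cy5 background linearly with row position without affecting the spot foreground.

```python
import math


def log2_ratio(cy5_foreground, cy3_foreground, cy5_background=0.0, cy3_background=0.0):
    """Log2(Cy5/Cy3) ratio, optionally after local background subtraction."""
    cy5 = cy5_foreground - cy5_background
    cy3 = cy3_foreground - cy3_background
    if cy5 <= 0 or cy3 <= 0:
        raise ValueError("background-subtracted intensity must be positive")
    return math.log2(cy5 / cy3)


def simulate_spot(row, spot_signal, gradient=2.0, nonspecific=100.0):
    """Return (log ratio with subtraction, log ratio without) for one hypothetical spot.

    Assumptions (illustrative only): identical true signal in both channels, the same
    uniform non-specific background in both foregrounds, and a measured Cy5 local
    background that increases by `gradient` per row.
    """
    cy5_fg = cy3_fg = spot_signal + nonspecific
    cy5_bg = nonspecific + gradient * row
    cy3_bg = nonspecific
    with_subtraction = log2_ratio(cy5_fg, cy3_fg, cy5_bg, cy3_bg)
    without_subtraction = log2_ratio(cy5_fg, cy3_fg)
    return with_subtraction, without_subtraction


for row in (0, 25, 50):
    dim = simulate_spot(row, spot_signal=200.0)
    bright = simulate_spot(row, spot_signal=5000.0)
    print(row, [round(v, 3) for v in dim], [round(v, 3) for v in bright])
```

Under these assumptions, subtraction produces a row-dependent bias that is largest for dim spots (a log2 ratio of -1 at row 50 for the dim spot versus roughly -0.03 for the bright one), while the unsubtracted ratios stay at 0. This mirrors the earlier observation that higher feature intensity makes ratios less sensitive to background variation.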

The analysis of Scorecard expression measurements obtained from the two scanner types

revealed a significant difference in the accuracy of individual feature measurements. At a

range of ratio levels and absolute RNA concentrations, values generated from the Agilent

Microarray Scanner were significantly closer to the expected values and exhibited less

variation within repeated measurements of the same array feature.
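The two accuracy criteria referred to above, closeness to the expected value and variation between repeated measurements of the same feature, can be sketched as follows. The exact statistics used in the scanner comparison are not given in this section; this example assumes a mean absolute log2 error and a coefficient of variation, and the replicate ratios are hypothetical values for a control feature with an expected ratio of 3.

```python
import math
import statistics


def mean_log2_error(measured_ratios, expected_ratio):
    """Average absolute distance from the expected ratio, in log2 units."""
    target = math.log2(expected_ratio)
    return statistics.fmean(abs(math.log2(r) - target) for r in measured_ratios)


def coefficient_of_variation(measured_ratios):
    """Sample standard deviation divided by the mean of the replicate ratios."""
    return statistics.stdev(measured_ratios) / statistics.fmean(measured_ratios)


# Hypothetical replicate measurements of one control feature (expected ratio 3:1).
scanner_a = [2.9, 3.1, 3.0]
scanner_b = [2.2, 3.6, 2.7]
for name, ratios in (("scanner A", scanner_a), ("scanner B", scanner_b)):
    print(name, round(mean_log2_error(ratios, 3.0), 3), round(coefficient_of_variation(ratios), 3))
```

Working in log2 units treats a 2-fold over-estimate and a 2-fold under-estimate as equally wrong, which suits ratio data.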

3.4. General conclusions

In this chapter a number of options available for several important steps of the microarray work flow are compared to determine the optimal choice for future tumour profiling studies.

A novel method for identifying and quantifying spatial bias in cDNA microarray

data is proposed. Its usefulness for identifying problem arrays whose removal

from a dataset dramatically increases the predictive accuracy of the dataset is

demonstrated.

A comparison of two types of reference RNA revealed that, for a large-scale project such as that planned for the AOCS, a pooled cell line material is superior to a more project-specific material made of pooled RNA from the samples to be profiled.

A comparison of data generated by different microarray scanners is also

described, in which the Agilent Microarray Scanner generated more accurate and robust measurements of gene expression, with a significantly lower degree of spatial bias.

Adoption of the optimal methods determined by analyses described in this chapter can be

expected to result in cDNA microarray data that is highly accurate, thus requiring less

technical replication. The data will also be readily comparable to other studies with

minimal statistical manipulation required and have a lower degree of technical error

present compared to data generated with other methods. These findings are applied to the

following chapters in this thesis, analysing patterns of gene expression that correlate with

length of EOC patient survival (Chapter 4) and the molecular characterisation of invasive

and LMP EOC subtypes (Chapter 5).


4. Gene expression analysis of epithelial ovarian cancer overall survival

4.1. Introduction

Ovarian cancer remains one of the most lethal of all cancer types, with a five-year survival rate of 42% (Ries LAG, 2004). Patients who are diagnosed in the early stages of the disease have a significantly better prognosis than those diagnosed with advanced-stage tumours. Approximately 80% of invasive epithelial ovarian cancer diagnoses are stage III or IV, in which tumour is present in one or both ovaries and has spread to other organs in the body (see Appendix A). Substantial variation in the prognosis and survival time of these women is observed (Australian Institute of Health and Welfare and Australasian Association of Cancer Registries, 2001). It is hypothesised that the molecular differences underlying differences in clinical behaviour, including survival time, are of interest as they may represent aspects of the disease which could be therapeutically manipulated to improve patient outcome.

It is difficult to assign EOC patients into valid survival groups because of the complexity

of treatment variables. These include both the number of cycles and specific type of

chemotherapy used, type of surgery and also level of residual disease remaining after

surgery (debulk status) (van der Burg et al., 1995). In one recent study of EOC that

sought to relate survival time to gene expression profiles, a group of patients classified as

‘short-term survivors’ had a median survival time of 30 months (Spentzos et al., 2004).

However this definition was based on the specifics of the cohort under investigation and

may or may not be appropriate for other studies.

Microarrays are one of the most promising high-throughput methods for analysing

diseases at a fundamental molecular level. By integrating gene expression and clinical

data it is possible to gain insight into the foundations of clinical variables such as

disparate survival times. In recent times they have been used to explore molecular

differences between patients with short or long term survival (which can be defined as

disease-free or overall survival) in a range of other cancer types including breast (Jenssen

et al., 2002; van de Vijver et al., 2002), mesothelioma (Gordon et al., 2003), kidney

(Vasselli et al., 2003), prostate (Singh et al., 2002), diffuse large-B-cell lymphoma

(Rosenwald et al., 2002) and most recently, EOC (Hartmann et al., 2005; Spentzos et al.,

2004).


This chapter describes various bioinformatic approaches to identifying genes whose

expression patterns correlate with the variable of patient survival, in a cohort of EOC

cases analysed with the Peter Mac 10.5k cDNA microarray platform. The first section

describes case selection and the review processes carried out to create the best possible

cohort of patients for this type of analysis. A variety of approaches are then described in

which the survival variable is related to gene expression data, either as a categorical (e.g.

short versus long-term survival) or as a continuous variable (no. months). Methods for

assessing the significance of gene expression signatures obtained from these analyses are

then explored, along with different approaches for normalising cDNA microarray data to

address technical error. Finally genes whose expression patterns are identified as being

significantly related to length of survival are analysed with respect to other functional

information and their discovery in other studies of a similar nature is reported.

4.2. Results

4.2.1. Case selection and pathology review aimed at ensuring suitability for arraying and outcome analysis

4.2.1.1. Identification of appropriate cases from the AOCS microarray database

As part of the AOCS microarray project, a database of EOC microarray profiles was

created from retrospectively collected specimens of tissue sourced from several

participating institutions including Westmead Hospital, Royal Brisbane Hospital, as well

as Peter Mac. The 10.5k cDNA microarrays were used as previously described. Sample processing and microarray hybridisation were carried out by Sophie Katsabanis of the Peter Mac Microarray Facility between 2001 and 2003.

A series of criteria for case selection was devised with assistance from Sian Fereday,

AOCS Data Manager and Dr Sherene Loi, a medical oncologist at Peter Mac. Patients

were included in the study if the following clinical and pathological information was

available (descriptions of these criteria are given in Materials and Methods section 2.2):

Date of diagnosis,

Date of last follow up or date of death,

Patient status at last follow up,

If patient deceased, cause of death,

Pathology grade and stage,


Histological subtype,

Chemotherapy information,

The amount of disease remaining at the conclusion of surgery (debulk status),

Arrayed using the pooled cell line reference RNA

In total, 26 cases that satisfied these criteria were identified. A large proportion of the

total number of cases available initially were excluded on the basis of missing

information about residual disease levels after surgical debulking. Given that the extent of

residual disease is known to be a significant prognosticator, it was not possible to

discount this factor and include cases without this data in the study (Bristow et al., 2002;

Grossi et al., 2002; van der Burg et al., 1995). The samples selected for this study are

described in Table 4-1.

The serous EOC subtype was selected as this represents the majority of EOC diagnoses.

The molecular and genetic differences between serous and the other major EOC subtypes,

endometrioid and mucinous, have been documented and it is clear that these subtypes

represent distinct disease types, each requiring separate investigation (Hess et al., 2004;

Pieretti et al., 2002; Schwartz et al., 2002).

Table 4-1: Samples selected for analysis of gene expression profiles associated with length of survival. Table sorted by disease stage. All patients had died of disease at the time of last follow-up. Patient status: 2 = deceased from cancer, 3 = deceased due to other causes. Grade 9 = unknown.

Patient ID | Morphology description | Grade | Stage | Survival (months) | Patient status | Residual disease
85.005 | Papillary serous cystadenocarcinoma | 9 | III | 15 | 2 | Mod
92.015 | Serous cystadenocarcinoma | 3 | III | 1 | 3 | Min
91.052 | Papillary serous cystadenocarcinoma | 2 | IIB | 17 | 2 | Nil
93.131 | Papillary serous adenocarcinoma | 9 | IIIA | 20 | 2 | Nil
92.071 | Serous cystadenocarcinoma | 2 | IIIB | 14 | 2 | Min
90.061 | Papillary serous adenocarcinoma | 3 | IIIC | 10 | 2 | Mod
91.007 | Papillary serous adenocarcinoma | 2 | IIIC | 9 | 2 | Max
92.003 | Serous surface papillary carcinoma | 3 | IIIC | 7 | 2 | Min
92.004 | Papillary serous adenocarcinoma | 0 | IIIC | 45 | 2 | Min
92.074 | Papillary serous adenocarcinoma | 3 | IIIC | 6 | 2 | Max
93.021 | Papillary serous adenocarcinoma | 3 | IIIC | 13 | 2 | Min
93.035 | Papillary serous adenocarcinoma | 0 | IIIC | 15 | 2 | Min
93.108 | Papillary serous cystadenocarcinoma | 0 | IIIC | 16 | 2 | Min
93.120 | Serous carcinoma | 2 | IIIC | 15 | 2 | Min
93.128 | Serous carcinoma | 3 | IIIC | 27 | 2 | Min
94.044 | Papillary serous cystadenocarcinoma | 3 | IIIC | 101 | 2 | Nil
94.050 | Papillary serous adenocarcinoma | 3 | IIIC | 40 | 2 | Min
94.056 | Serous cystadenocarcinoma | 3 | IIIC | 1 | 3 | Nil
94.067 | Papillary serous cystadenocarcinoma | 3 | IIIC | 20 | 2 | Min
94.070 | Serous cystadenocarcinoma | 2 | IIIC | 32 | 2 | Nil
94.084 | Serous carcinoma | 3 | IIIC | 14 | 2 | Min
94.093 | Serous cystadenocarcinoma | 1 | IIIC | 15 | 2 | Max
94.113 | Papillary serous adenocarcinoma | 9 | IIIC | 25 | 2 | Min
94.116 | Serous cystadenocarcinoma | 2 | IIIC | 12 | 2 | Mod
94.019 | Papillary serous adenocarcinoma | 3 | IV | 11 | 2 | Min
94.068 | Serous cystadenocarcinoma | 3 | IV | 13 | 2 | Max
92.001 | Serous carcinoma | 3 | x | 8 | 2 | Max

4.2.1.2. Pathology review of selected cases for outcome analysis

Cases identified as suitable for this study were reviewed by pathologists Dr. Paul Waring

and Dr. Melissa Robbie to confirm adequate percentage tumour content and also the

histological subtype and grade of each specimen. All samples were determined to be over

50% tumour according to the tumour-nuclei method (Materials and Methods section

2.2.1) and the histological subtype of each was confirmed to match the original diagnosis.

4.2.2. A descriptive statistical analysis of the study cohort

To explore and visualise the characteristics of the 26 samples comprising the cohort for

analysis, a descriptive statistical analysis was carried out using Minitab and Microsoft

Excel. The median survival time for these 26 cases was 15 months (Figure 4-1d) with a

mean and standard deviation each of 19 months. These figures indicate that a wide range of survival times was present in this cohort; however, the distribution of survival times does not accurately resemble the distribution of EOC survival in the general population. All but two patients in this sample set had died within 5 years, whereas according to

the Australian Institute of Health and Welfare, on average 42% of Australian women

survive past this time point (Australian Institute of Health and Welfare and Australasian

Association of Cancer Registries, 2001).

The Australian statistics are used for this study as they do not include the LMP form of

EOC in the calculations of survival rates, unlike figures from US agencies such as SEER

(Parkin and Muir, 1992; Ries LAG, 2004). As described in Chapter 5, this form of the

disease has a more favourable prognosis and is associated with significantly longer survival times (Behtash et al., 2004; Sherman et al., 2004).

The overall median survival of optimally-debulked women with stage III disease who are

treated with standard chemotherapy (consisting of the drugs cisplatin or carboplatin plus

paclitaxel) is approximately 50 months (Gadducci et al., 2000; Markman et al., 2001;

Piccart et al., 2000) (shown as a dashed horizontal line across the survival time box plot

in Figure 4-1).

Based on the clinical information available for the study cohort identified, a descriptive statistical analysis was carried out using Minitab, the results of which are shown in Figure 4-1. This analysis showed that over half the samples in the cohort were classified as stage III, indicating that in the majority of cases the tumour involved one or both ovaries and had spread beyond the pelvis to the abdominal lining and/or into the lymph system by the time it was diagnosed. Grade 3 tumours represented over half of the cohort, indicating an overall poor differentiation status. More than three quarters of samples were stage IIIC, which indicates that abdominal deposits of 2cm or greater were observed in the patients affected, corresponding to an advanced stage of EOC. The categories used for assessing the level of residual disease were as follows: nil: 0cm, min: 0-1cm, mod: 1-2cm, max: >2cm thick section of tumour remaining after surgery. Two thirds of patients were recorded as having nil or minimal levels of residual disease following surgery, with the remainder having moderate (1-2cm) or maximum (>2cm) levels, indicating that the surgeon was unable to adequately debulk the tumour.

Therefore, in summary, the typical patient in this study had grade 3, stage IIIC EOC with between 0 and 1cm of residual disease after surgery and lived for approximately 15 months after diagnosis. In the Spentzos et al. study of gene expression profiles of EOC, the most common clinical profile was grade 3, stage III, residual disease of less than 1cm, with a median overall survival of 49 months (Spentzos et al., 2004).


Figure 4-1: Descriptive statistical analysis of patient cohort. (A) Tumour grade summary of 26 sample cohort – over 50% were grade three (Key: 1: the least malignant, with well-differentiated cells, 2: intermediate, with moderately differentiated cells, 3: the most malignant, with poorly differentiated cells, 9: unknown) (B) Pathology stage (FIGO) of tumours in cohort, with stage III tumours representing the bulk of this cohort (C) Residual disease summary – nil: 0cm, min: 0-1cm, mod: 1-2cm, max: >2cm thick section of tumour remaining after debulking surgery (D) Box plot of patient survival times (months). Asterisks correspond to outlier cases RBH 94.019 (101 months survival) and RBH 94.116 (116 months survival). Dashed horizontal line indicates population median survival time (50 months) for optimally debulked stage III EOC (Gadducci et al., 2000; Markman et al., 2001; Piccart et al., 2000).

[Figure 4-1 panels: (A) tumour grade (Zero, One, Two, Three, Nine); (B) FIGO stage (II, IIIA, IIIB, IIIC, IVB); (C) residual disease (Nil, Min, Mod, Max); (D) box plot of survival (months, 0 to 120)]

4.2.2.1. Analysis of interaction between survival time and debulk status/residual disease

Due to the large body of evidence concerning the relationship between residual disease

and patient prognosis, the interaction of these two variables in the cohort was analysed.

By plotting the sample cohort in order of increasing survival, and then assigning each

class of residual disease a shading code, as shown in Figure 4-2A, a trend agreeing with

the literature was observed. Patients with higher levels of disease present after surgery

also appeared to have shorter survival periods. All patients with moderate or maximum levels of residual disease had survival times equal to or less than the cohort’s median survival of 15 months.

To determine if a statistically significant relationship existed, patients were grouped into two categories: (i) nil (0cm) or minimal (0-1cm) residual disease and (ii) moderate (1-2cm) or maximum (>2cm) residual disease. A box plot of the survival times for each group (Figure 4-2B) shows the overall shorter and less varied range of survival times seen in group (ii) compared to group (i). Whilst a one-way ANOVA revealed that this difference was not statistically significant (p=0.149), there is a trend towards shorter survival time correlating with higher levels of residual disease, in agreement with the current literature on the impact of this variable as previously described (Hoskins et al., 1992; Hoskins et al., 1994). It is possible that this trend would have reached statistical significance had a larger cohort been available for this study.
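The one-way ANOVA on the two residual-disease groups can be sketched with the Python standard library. The survival times below are the values listed in Table 4-1, grouped into (i) nil or minimal and (ii) moderate or maximum residual disease; the function name is illustrative, and this hand-rolled version is not necessarily how the statistical package used here computes the test.

```python
import statistics


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    values = [v for group in groups for v in group]
    grand_mean = statistics.fmean(values)
    group_means = [statistics.fmean(group) for group in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of each value around its own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Survival (months) by residual-disease group, as listed in Table 4-1.
group_i = [17, 20, 101, 1, 32, 1, 14, 7, 45, 13, 15, 16, 15, 27, 40, 20, 14, 25, 11]
group_ii = [15, 10, 12, 9, 6, 15, 13, 8]

f_stat, df1, df2 = one_way_anova_f(group_i, group_ii)
print(round(f_stat, 2), df1, df2)
```

The F statistic would then be converted to a p-value from the F distribution with the returned degrees of freedom (for example with `scipy.stats.f.sf(f_stat, df1, df2)` if SciPy is available), which should come out at roughly 0.15, in line with the reported p=0.149. Note how the single 101-month survivor in group (i) inflates the within-group variance.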


Figure 4-2: (A) Samples ordered by increasing survival time and coloured by residual disease status (B) Box plot of patient survival times grouped by residual disease categories. Categories: i – nil or minimal (0-1cm); ii – moderate or maximum (>1cm). Difference between two groups was not statistically significant (p=0.149) although the trend of shorter survival time associated with greater residual disease can be observed. Red dots indicate mean survival time for each group.

[Figure 4-2 panels: (A) survival time (months) for each patient ID, ordered by increasing survival and shaded by residual disease (Nil, Min, Mod, Max); (B) box plot of survival (months) for residual disease groups (i) and (ii)]

4.2.3. Processing of microarray data prior to investigation of molecular signatures of patient survival

Microarray image sets generated by an Agilent Microarray Scanner BA for the 26 EOC

specimens were processed using GenePix image analysis software. The standard algorithm for identifying poor-quality and unreliable features was applied as described in Materials and Methods section 2.4.2. MMT scores were calculated following a standard intensity-dependent lowess normalisation. All were within the range of acceptable levels of spatial bias (100-200); therefore no further normalisation was applied, in the interest of preserving the dynamic range present in the data and not unnecessarily manipulating it. Genes were then filtered to remove those with missing values in 50% or more of the sample set or having a log-ratio variation p-value > 0.01. Genes excluded by these criteria are assumed not to contribute to the molecular phenotype of interest, as described in Materials and Methods section 2.4.3. After this filtering, 474 genes remained, corresponding to a

95.5% reduction. Because this filter resulted in the exclusion of such a high proportion of

the total gene set, it was decided to use a less strict method for excluding genes with

lower (unsupervised) variation across the dataset, in order to supply the downstream

analyses with an adequately large list of genes. To achieve this, a minimum fold change

filter (1.5-fold change in either direction in >20% of samples) was used, which resulted in

a list of 4508 candidate genes for further analysis (42% of the total clone set).
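The missing-value and minimum fold-change filters described above can be sketched as follows. The data layout (a dictionary of per-gene log2 ratios with None for missing values), the function name, and the choice to count the 20% against all samples rather than only the non-missing ones are assumptions for illustration.

```python
import math


def passes_filters(log2_ratios, max_missing=0.5, min_fold=1.5, min_fraction=0.2):
    """Keep a gene only if fewer than half its values are missing and it changes
    at least `min_fold` (in either direction) in more than `min_fraction` of all
    samples. Missing values are represented as None."""
    n_samples = len(log2_ratios)
    present = [v for v in log2_ratios if v is not None]
    if (n_samples - len(present)) / n_samples >= max_missing:
        return False
    cutoff = math.log2(min_fold)  # 1.5-fold is about 0.585 in log2 units
    n_changed = sum(1 for v in present if abs(v) >= cutoff)
    return n_changed / n_samples > min_fraction


# Hypothetical log2 ratios for ten samples.
genes = {
    "gene_A": [0.9, -0.7, 0.1, 0.0, 0.8, 0.2, -0.1, 0.0, 0.3, 0.1],
    "gene_B": [0.9, -0.7, 0.1, 0.0, 0.2, 0.2, -0.1, 0.0, 0.3, 0.1],
    "gene_C": [1.2, None, None, None, None, None, 1.1, 0.9, 1.0, 1.3],
}
kept = [name for name, ratios in genes.items() if passes_filters(ratios)]
print(kept)  # ['gene_A']
```

Here gene_B is dropped because only 2 of 10 samples (exactly 20%) pass the fold-change cut-off, and gene_C because half of its values are missing.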

Ein-Dor et al. (2005) have recently noted that in several published analyses of breast cancer survival, no single gene was found to have a strong individual correlation

with patient survival. Rather, a large number of genes appeared to have a moderate

relationship with this variable. Thus it was the combination of these genes which was able

to accurately predict patient survival. This observation supports the decision to

compromise on the magnitude of individual gene variation in order to increase the

number of genes available for predictive analysis.

4.2.4. Identification of genes differentially expressed between patient survival groups

Several methods were used to identify genes with expression patterns that correlate with

length of survival, as described in Materials and Methods section 2.4.8.


Quantitative trait analysis – identifies genes that have a significant correlation with survival time (no. months) for each patient, using either Spearman or Pearson correlation coefficients

Class comparison – t- or F-tests between patients grouped into the following classes, including the variable of residual disease in the statistical calculations:

o two survival groups (median)

o three survival groups based on approximately 1/3 of the cohort

represented in each (<12months, 12-24 months, >24 months)

o first and last quartiles

Survival analysis – specifically designed algorithm for identifying genes that are

predictive of patient survival based on Cox’s proportional hazards model and

censoring of any patients still alive at the time of last follow up (Cox, 1972).

The quantitative trait analysis has the advantage of using a Spearman (rank-based) correlation measure to assess the relationship between gene expression and the survival variable. Using the ranks of each gene's expression values across samples, rather than the absolute expression ratios, can minimise the impact of outliers on the identification of genes with significant correlation to the survival variable (Broberg, 2003; Troyanskaya et al., 2002).
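A minimal sketch of the Spearman measure: expression values and survival times are each replaced by their ranks (tied values receive the average rank), and the Pearson correlation of the ranks is computed. In the hypothetical example, a single extreme expression value lowers the Pearson coefficient, while the Spearman coefficient stays at 1 because the ordering is unchanged.

```python
import math
import statistics


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    ordered = sorted(values)
    first_index, counts = {}, {}
    for i, v in enumerate(ordered):
        first_index.setdefault(v, i)
        counts[v] = counts.get(v, 0) + 1
    return [first_index[v] + (counts[v] + 1) / 2 for v in values]


def pearson(x, y):
    """Pearson correlation coefficient (assumes neither input is constant)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


survival_months = [6, 11, 15, 17]
expression = [0.2, 0.4, 0.6, 5.0]  # hypothetical log ratios with one outlier
print(round(pearson(expression, survival_months), 3), spearman(expression, survival_months))
```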

By combining the expression profiles of patients into discrete categories associated with length of survival, the class comparison methods allow the use of t- or F-tests (ANOVA) to identify genes with large differences in expression between survival classes, relative to the variation within each class.
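The three class definitions listed above can be sketched as follows, together with a Welch two-sample t statistic for one gene. How samples falling exactly on a cut point were assigned is not stated, so the tie handling, the function names and the example values are assumptions; quartiles use Python's `statistics.quantiles` with its default method, which may differ slightly from other software.

```python
import math
import statistics


def median_classes(months):
    """Two survival groups split at the cohort median (ties go to 'short')."""
    cut = statistics.median(months)
    return ["short" if m <= cut else "long" for m in months]


def fixed_classes(months):
    """Three groups: <12 months, 12-24 months, >24 months."""
    return ["<12" if m < 12 else ">24" if m > 24 else "12-24" for m in months]


def quartile_extremes(months):
    """Indices of samples in the first and last quartiles of survival."""
    q1, _, q3 = statistics.quantiles(months, n=4)
    lower = [i for i, m in enumerate(months) if m <= q1]
    upper = [i for i, m in enumerate(months) if m >= q3]
    return lower, upper


def welch_t(group_a, group_b):
    """Welch's t statistic comparing one gene's expression between two classes."""
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    se = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (statistics.fmean(group_a) - statistics.fmean(group_b)) / se


months = [6, 9, 13, 15, 20, 27, 40, 101]  # hypothetical survival times
expression = [0.8, 0.6, 0.5, 0.7, 0.1, -0.2, 0.0, -0.3]  # hypothetical log ratios
classes = median_classes(months)
short = [e for e, c in zip(expression, classes) if c == "short"]
long_ = [e for e, c in zip(expression, classes) if c == "long"]
print(fixed_classes(months), quartile_extremes(months), round(welch_t(short, long_), 2))
```

Comparing only the first and last quartiles discards the middle half of the cohort, trading sample size for a sharper contrast between short and long survivors.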

A survival analysis based on Cox's proportional hazards model is a semi-parametric method: it makes no assumptions about the shape of the baseline hazard function, although it does assume that hazards are proportional over time (Cox, 1972). This approach allows the inclusion of data from patients still alive at the time of last follow-up; however, as all patients included in the cohort for this chapter were deceased at the time of last follow-up, this method was found not to differ greatly from the quantitative trait method.


4.2.4.1. Quantitative trait analysis of gene expression data to find genes related to patient survival

Samples were analysed for genes with a significant correlation with survival time (months). A Spearman correlation was used to assess the relationship between each gene and the survival variable, with a univariate significance threshold of 0.001. The maximum number and proportion of false positives were set at 10 and 0.1 respectively. These limits were applied using multivariate permutation testing, based on 1000 random permutations of the dataset, to control the false discovery rate. This provides 90% confidence that the resulting list contains no more than 10% false discoveries.

Two genes were identified at the 0.001 significance level; this number was not significantly larger than that expected by chance, as measured by permutation analysis (p=0.825). The genes identified by this analysis, up to the maximum number of false positives permitted, are shown in Table 4-2, although the relationship between these ten genes and the months-survival variable is not statistically significant at the multivariate level, owing to the large number of tests carried out.
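The global permutation assessment reported above (p=0.825) compares the observed number of significant genes with the numbers obtained when survival times are randomly reassigned to patients. The sketch below shows that idea under stated assumptions: survival values are permuted across patients, significance is judged with Spearman p-values, and the function names are hypothetical. It does not reproduce the multivariate permutation procedure used to control the false discovery rate.

```python
import numpy as np
from scipy.stats import spearmanr

def count_significant(expr, survival, alpha=0.001):
    """Number of genes (rows of expr) whose Spearman p-value with survival is < alpha."""
    return sum(spearmanr(gene, survival)[1] < alpha for gene in expr)

def global_permutation_test(expr, survival, alpha=0.001, n_perm=1000, seed=0):
    """Observed count of significant genes and the proportion of permutations
    giving at least as many significant genes by chance."""
    rng = np.random.default_rng(seed)
    observed = count_significant(expr, survival, alpha)
    null_counts = np.array([
        count_significant(expr, rng.permutation(survival), alpha)
        for _ in range(n_perm)
    ])
    p_value = np.mean(null_counts >= observed)
    return observed, p_value
```

A large p-value, as obtained here, means that lists of this length arise readily from randomly permuted data.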

The list includes genes involved in apoptosis (PHLDA2), several genes related to endoplasmic reticulum function (SEC22L3, GALNT1 and CHST2), the negative regulator of transcription ZNF189 and a gene involved in muscle development and contraction regulation, TNNT1. Of these ten genes, GALNT1 has been identified as being overexpressed in colorectal cancer compared with normal tissue (Kohsaki et al., 2000; White et al., 1995); PHLDA2, whose expression is negatively correlated with survival in this study, has been mapped to a chromosomal region frequently altered in breast, lung and ovarian cancer (Hu et al., 1997); GNB5 overexpression has been identified as predictive of lymph node metastasis in oesophageal squamous cell carcinoma (Jones et al., 1998; Kan et al., 2004); and TNNT1 was included in a gene set used to identify small round blue-cell tumours (Barton et al., 1999; Khan et al., 2001).

To visualise the strength of the relationship between length of survival and the two genes significantly correlated with it at the univariate level, scatter plots were constructed (Figure 4-3). SCOC expression is negatively correlated with survival time (Spearman correlation = -0.684), whereas PPAP2B expression is positively correlated (0.811).

Limited information exists about the function of SCOC. Although widely expressed in human tissues, it does not appear to have been previously implicated in EOC. Its primary known function is as a binding partner for ARL1, which regulates intracellular vesicular membrane trafficking (Van Valkenburgh et al., 2001).

The VEGF-inducible gene PPAP2B, also known as VCIP, has been shown to function in the regulation of cell-cell interaction and aggregation. Cell line studies have demonstrated that recombinant expression of this gene leads to greater cell adhesion and spreading in endothelial cells (Humtsoe et al., 2003).

Although observing two genes correlated with patient survival is not statistically significant given the large number of genes tested, inspection of these plots illustrates the potential of this method to identify interesting and clinically relevant genes from microarray expression data. Panels (c) and (d) of this figure, from which the two outlier cases (RBH 94.019 with 101 months survival and RBH 94.116 with 116 months survival) have been removed, further show the association of these two genes with length of survival for the remaining 24 patients.

This method did not take into account the important factor of residual disease in determining the significance of each gene’s relationship with survival. Therefore an ANOVA was also performed in which SCOC and PPAP2B expression levels were compared across the residual disease categories of the patients. No significant difference in the expression of these two genes was detected between residual disease categories (P > 0.05 for all tests). This suggests that the expression of SCOC and PPAP2B correlates with the length of survival of these patients independently of the amount of residual disease present, with the important caveat that, because of the large number of genes tested, such patterns of expression may be detected by chance alone.
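The per-gene check against residual disease described above is a one-way ANOVA of a single gene's expression across residual disease categories. A minimal sketch, using SciPy's f_oneway; the helper name and the example ratios are hypothetical, and the category labels follow the nil, minimal, moderate and maximum levels mentioned later in this chapter.

```python
import numpy as np
from scipy.stats import f_oneway

def anova_by_residual_disease(expression, residual_category):
    """One-way ANOVA of one gene's expression across residual disease categories."""
    expression = np.asarray(expression, dtype=float)
    residual_category = np.asarray(residual_category)
    groups = [expression[residual_category == c] for c in np.unique(residual_category)]
    f_stat, p_value = f_oneway(*groups)
    return f_stat, p_value

# Hypothetical expression ratios for one gene in eight patients.
ratios = [0.8, 1.1, 0.9, 1.0, 1.2, 0.7, 1.0, 0.9]
residual = ["nil", "nil", "minimal", "minimal", "moderate", "moderate", "maximum", "maximum"]
f_stat, p = anova_by_residual_disease(ratios, residual)
print(f"F = {f_stat:.2f}, p = {p:.3f}")
```

A p-value above 0.05 corresponds to the "no significant difference between residual disease categories" result reported for SCOC and PPAP2B.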


Table 4-2: Genes identified by quantitative trait analysis. The first two are significant at the 0.001 level of the univariate test.

Each entry gives the UniGene name and symbol, the Spearman correlation coefficient, the P-value, a summary of function and a summary of gene ontology memberships.

Short coiled-coil protein (SCOC); Spearman correlation -0.684; P = 0.0004424
Function: Little functional information known. Binds to GTPases in the ARF family (Van Valkenburgh et al., 2001).
Gene ontology: N/A

Phosphatidic acid phosphatase type 2B (PPAP2B); Spearman correlation 0.811; P = 0.0007129
Function: Membrane glycoprotein localized at the cell plasma membrane. Expression is enhanced by epidermal growth factor and is known to promote growth and motility in EOC (Kai et al., 1997; Luquain et al., 2003).
Gene ontology: germ cell migration | hydrolase activity | integral to membrane | lipid metabolism

Carbohydrate (N-acetylglucosamine-6-O) sulfotransferase 2 (CHST2); Spearman correlation 0.774; P = 0.00108
Function: Involved in the inflammatory response of vascular endothelial cells. Sulfination of the leukocyte adhesion molecule L-selectin (Li and Tedder, 1999).
Gene ontology: carbohydrate metabolism | inflammatory response | integral to membrane | sulfotransferase activity

Polypeptide N-acetylgalactosaminyltransferase 1 (GalNAc-T1) (GALNT1); Spearman correlation -0.723; P = 0.0021738
Function: Initiates mucin-type O-linked glycosylation in the Golgi apparatus. Expressed in varied levels in colorectal cancer, with a trend towards higher expression in tumour tissue compared to normal (Kohsaki et al., 2000; White et al., 1995).
Gene ontology: O-linked glycosylation | integral to membrane | manganese ion binding

Zinc finger protein 189 (ZNF189); Spearman correlation -0.672; P = 0.0040614
Function: Maps to a chromosomal region commonly deleted in bladder cancer (Odeberg et al., 1998).
Gene ontology: negative regulation of transcription from RNA | nucleus | zinc ion binding

SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily a-like 1 (SMARCAL1); Spearman correlation -0.571; P = 0.0042025
Function: Has helicase and ATPase activities. Regulation of transcription of certain genes by altering chromatin structure. Mutations in this gene are a cause of a condition associated with T-cell immunodeficiency (Coleman et al., 2000).
Gene ontology: ATP binding | DNA binding | helicase activity

Pleckstrin homology-like domain, family A, member 2 (PHLDA2); Spearman correlation -0.725; P = 0.0044573
Function: Located at 11p15.5, an important tumour suppressor gene region. Alterations associated with lung, ovarian, and breast cancers; potentially involved in regulation of placental growth (Hu et al., 1997).
Gene ontology: apoptosis | imprinting

SEC22 vesicle trafficking protein-like 3 (S. cerevisiae) (SEC22L3); Spearman correlation -0.699; P = 0.0048764
Function: Vesicle trafficking protein, localized at the endoplasmic reticulum. Down-regulated in diffuse-type gastric cancer (Hasegawa et al., 2002; Tang et al., 1998).
Gene ontology: ER to Golgi transport | integral to membrane

Guanine nucleotide binding protein, beta 5 (GNB5); Spearman correlation 0.698; P = 0.0048764
Function: Identified as predictive of lymph node metastasis in oesophageal squamous cell carcinoma (Jones et al., 1998; Kan et al., 2004).
Gene ontology: G-protein coupled receptor protein signalling pathway

Troponin T1, skeletal, slow (TNNT1); Spearman correlation -0.675; P = 0.005147
Function: Component of the troponin complex, forming the calcium-sensitive molecular switch that regulates striated muscle contraction. Identified in analysis of predictive genes for small round blue-cell tumours (Barton et al., 1999; Khan et al., 2001).
Gene ontology: muscle development | regulation of muscle contraction | tropomyosin binding


Figure 4-3: Scatter plots of the two genes significantly correlated with the months-survival variable at the 0.001 level: (a) SCOC (Spearman correlation = -0.684) and (b) PPAP2B (0.811). Panels (c) and (d) show the correlation of these genes after excluding the patients identified as survival-time outliers: RBH 94.019 (101 months survival) and RBH 94.116 (116 months survival).

[Figure: four scatter plots titled "SCOC expression vs. survival" and "PPAP2B expression vs. survival", plotting expression ratio (axis range -3 to 2) against survival time (months). R² values shown across the panels: 0.469, 0.151, 0.233 and 0.470.]


4.2.4.2. F-test class comparison

This analysis was carried out to identify genes with statistically significant expression differences between patients grouped into three categories of survival time: <12 months (n=7), 12-24 months (n=12) and >24 months (n=7). Residual disease was included as a potential source of gene expression variation in the data.

A random variance version of the F-test was used because of the small number of samples in each group. The univariate significance threshold was set at 0.001, with a 90% confidence level for the false discovery rate assessment and a maximum of 10 false positive genes, as for the quantitative trait analysis described above.
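A random variance F-test replaces each gene's within-group variance with a compromise between that variance and a variance common to all genes, which stabilises the test when groups are small. The sketch below shows only this shrinkage step, in the form s_tilde^2 = (d0*s0^2 + d*s^2)/(d0 + d), with d + d0 denominator degrees of freedom. The prior degrees of freedom d0 and prior variance s0_sq are assumed inputs here, whereas the full random variance model estimates them from the distribution of variances across all genes; the function name is hypothetical.

```python
import numpy as np
from scipy.stats import f as f_dist

def shrunken_variance_f_test(expr, groups, d0, s0_sq):
    """One-way F-test per gene (rows of expr) using a shrunken within-group variance.

    d0, s0_sq: prior degrees of freedom and prior variance (assumed inputs).
    With d0 = 0 this reduces to the ordinary one-way ANOVA F-test.
    """
    expr = np.asarray(expr, dtype=float)
    groups = np.asarray(groups)
    levels = np.unique(groups)
    n, k = expr.shape[1], len(levels)
    grand_mean = expr.mean(axis=1)
    ss_between = np.zeros(expr.shape[0])
    ss_within = np.zeros(expr.shape[0])
    for level in levels:
        sub = expr[:, groups == level]
        group_mean = sub.mean(axis=1)
        ss_between += sub.shape[1] * (group_mean - grand_mean) ** 2
        ss_within += ((sub - group_mean[:, None]) ** 2).sum(axis=1)
    d = n - k                                          # residual degrees of freedom
    s_sq = ss_within / d                               # ordinary pooled variance per gene
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)    # shrunk towards s0_sq
    f_stat = (ss_between / (k - 1)) / s_tilde_sq
    p_value = f_dist.sf(f_stat, k - 1, d + d0)         # d0 extra degrees of freedom
    return f_stat, p_value
```

Genes with an unusually small sample variance, which would otherwise produce large F statistics from few samples, are pulled back towards the common variance.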

Five genes were identified as having significant differential expression between these classes of tumours at the 0.001 significance level. An additional five genes, whose expression differences between the groups approached significance at the 0.001 level, are also included. These are listed in Table 4-3, along with functional summaries and gene ontology information. The probability of obtaining at least five significant genes from a total of 4,508 F-tests if there are no real differences between classes (i.e. under the null hypothesis) is 0.577. This is based on 1000 random permutations of the dataset, with the number of significant genes between randomised classes calculated for each permutation. Again, this shows that it is not possible to determine whether the genes identified by this approach were selected by chance or represent genuine differences between these groups of patients.

Supervised hierarchical clustering of the ten genes (Figure 4-4) shows imperfect segregation of samples into survival-group categories. The first two major branch points of the dendrogram separate five of the seven shortest-term survivors from the remainder of the cohort, but there is no clear discrimination between the 12-24 and >24 month categories in this representation of the data.
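Clustering of samples on a selected gene set can be sketched with SciPy's hierarchical clustering routines. The sketch assumes log-ratio input, median-centring of each gene, average linkage and a 1 - Pearson correlation distance between samples; these are common choices but are not stated in the text, and the helper name and example values are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def cluster_samples(log_ratios, n_clusters=2):
    """Median-centre each gene (row), then cluster samples (columns)."""
    log_ratios = np.asarray(log_ratios, dtype=float)
    centred = log_ratios - np.median(log_ratios, axis=1, keepdims=True)
    # Samples are the observations: 1 - Pearson correlation, average linkage.
    tree = linkage(centred.T, method="average", metric="correlation")
    return fcluster(tree, t=n_clusters, criterion="maxclust")

# Hypothetical log2 ratios: 4 genes x 6 tumours with two opposite patterns.
example = np.array([
    [1.0, 1.2, 0.9, -1.0, -1.1, -0.8],
    [0.8, 1.0, 1.1, -0.9, -1.2, -1.0],
    [-1.0, -0.9, -1.1, 1.0, 0.8, 1.2],
    [-1.2, -1.0, -0.8, 1.1, 0.9, 1.0],
])
print(cluster_samples(example))
```

The cluster labels can then be cross-tabulated against the survival categories to quantify how well the dendrogram's top branches separate the groups.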

Closer inspection of the selected genes and the associated literature reveals a broad range of molecular functions. These include NAPSA (which generated the most significant p-value of all genes identified in this analysis), which increases the cell-surface expression of E-cadherin protein on breast cancer cells and is a novel therapeutic target currently under investigation (Tatnell et al., 1998; Thibout et al., 1999). NAPSA is more highly expressed in the longer-term survival groups in this dataset.


Also included was BRF1 (expressed at lower levels in the longer-term survival groups), which promotes the degradation of mRNAs containing AU-rich elements (AREs). This gene is thought to facilitate oncogenesis by regulating the decay of the mRNAs of the proliferation-associated genes IL3, GM-CSF and TNF (Stoecklin et al., 2002; Wang and Roeder, 1995).

Another gene identified, ACP6, whose expression increased in association with longer survival, has been shown to mediate cell proliferation and to protect cells from apoptosis induced in vitro by cisplatin, a commonly used chemotherapeutic agent (Hiroyama and Takenawa, 1999; Mackeigan et al., 2005). The protein encoded by this gene has been suggested as a novel biomarker for EOC, as it is detected at significantly higher levels in plasma from patients with EOC than in normal controls (Xu et al., 1998).

MAPK11 was also selected; it has been found to be amplified and overexpressed in EOC (Benetkiewicz et al., 2005; Goedert et al., 1997) and is important for a range of processes such as proliferation and cellular differentiation. This gene was expressed at higher levels in the long-term survival group in this dataset.

Finally, expression of the Ets family transcriptional repressor ETV3, also known as PE1, was also significantly different between patient groups, although it was expressed at higher levels in the mid-term survival group, making the result difficult to interpret in terms of a linear relationship with the survival variable. This gene has an anti-proliferative effect, blocking the oncogenic RAS pathway, which suggests it may be upregulated in patients with longer survival times (Klappacher et al., 2002).


Table 4-3: Genes identified by random-variance F-test with residual disease category used as a blocking variable. Groups used for ANOVA were (a) <12 months, (b) 12-24 months and (c) >24 months survival.

Each entry gives the UniGene name and symbol, the P-value, the mean expression ratios for the <12, 12-24 and >24 month groups, a summary of function and a summary of gene ontology memberships.

Napsin A aspartic peptidase (NAPSA); P = 0.000120; mean ratios 0.39, 0.633, 0.669
Function: Important for correct folding, targeting, and control of the activation of aspartic proteinase zymogens. Increases expression of E-cadherin on the surface of breast cancer cells. Novel therapeutic agent, clinical trials underway (Tatnell et al., 1998; Thibout et al., 1999).
Gene ontology: pepsin A activity | proteolysis and peptidolysis

Family with sequence similarity 50, member B (FAM50B); P = 0.000429; mean ratios 0.032, 0.653, 0.876
Function: Functional retroposon expressed in a wide range of tissues (Sedlacek et al., 1999).
Gene ontology: nucleus

Acid phosphatase 6, lysophosphatidic (ACP6); P = 0.000478; mean ratios 0.205, 0.692, 0.647
Function: Increases basal cell survival and provides significant protection from cisplatin-induced apoptosis in HeLa cell studies (Hiroyama and Takenawa, 1999; Mackeigan et al., 2005).
Gene ontology: acid phosphatase activity

Hypothetical protein MGC15875 (MGC15875); P = 0.000639; mean ratios 0.528, 0.443, 0.752
Function: Sequenced as part of the National Institutes of Health Mammalian Gene Collection (Strausberg et al., 2002).
Gene ontology: mitochondrion | pyridoxal phosphate binding | transaminase activity | transferase activity

Ets variant gene 3 (ETV3); P = 0.000639; mean ratios 0.284, 0.400, 0.199
Function: An Ets repressor suggested to contribute to growth arrest during terminal macrophage differentiation. Represses Ets target genes involved in Ras-dependent proliferation (Klemsz et al., 1994; Sawka-Verhelle et al., 2004).
Gene ontology: nucleus | regulation of transcription, DNA-dependent | transcription factor activity

BRF1 homolog, subunit of RNA polymerase III transcription initiation factor IIIB (S. cerevisiae) (BRF1); P = 0.00112; mean ratios 0.252, 0.157, 0.148
Function: Central role in transcription initiation by RNA polymerase III on genes encoding tRNA, 5S rRNA, and other structural RNAs. Promotes degradation of ARE (AU-rich element)-containing mRNA; important in cell activation and oncogenesis (Stoecklin et al., 2002; Wang and Roeder, 1995).
Gene ontology: RNA polymerase III transcription factor activity | tRNA transcription | zinc ion binding


Williams Beuren syndrome chromosome region 20C (WBSCR20C); P = 0.00122; mean ratios 0.294, 0.270, 0.256
Function: Deleted in Williams syndrome, a multi-system developmental disorder caused by the deletion of contiguous genes at 7q11.23 (Doll and Grzeschik, 2001).
Gene ontology: N/A

Mitogen-activated protein kinase 11 (MAPK11); P = 0.00138; mean ratios 0.186, 0.120, 0.519
Function: Member of the MAP kinase family, which is an integration point for multiple biochemical signals. Involved in proliferation, differentiation, transcription regulation and development. Identified as amplified and differentially expressed in EOC (Benetkiewicz et al., 2005; Goedert et al., 1997).
Gene ontology: MAP kinase activity | protein kinase cascade | response to stress | signal transduction | transferase activity

A kinase (PRKA) anchor protein 8 (AKAP8); P = 0.00169; mean ratios 0.414, 0.255, 0.271
Function: Binds to the regulatory subunit of protein kinase (PKA) and confines the holoenzyme to discrete locations within the cell. Has a cell cycle-dependent interaction with the RII subunit of PKA (Eide et al., 1998).
Gene ontology: DNA binding | membrane | mitosis | protein kinase A binding | signal transduction | sugar porter activity | transport | zinc ion binding

Double C2-like domains, beta (DOC2B); P = 0.00169; mean ratios 0.531, 1.413, 0.508
Function: Interacts with Ca2+ and phospholipid (Orita et al., 1995).
Gene ontology: N/A


Figure 4-4: Hierarchical clustering using 10 genes identified by a random variance F-test of 26 EOC microarray profiles grouped into three survival-length categories. All but one case of the shortest-term survivors are clustered away from the mid and long term groups at the first major branch point in the dendrogram. Most genes appear to be down regulated in this short term group suggesting that a reduction or loss of their expression may confer a more aggressive tumour phenotype. Gray squares indicate absent expression values for these genes, which may have influenced the clustering.

[Figure: heat map with dendrogram. Columns annotated by survival category (<12 months, 12-24 months, >24 months); rows labelled NAP1, AI478508, D6S2654, MGC1587, ETV3, AKAP8, ACP6, BRF1, WBSCR20C and MAPK11; colour scale shows expression ratio from 0.3 to 3.0.]


4.2.4.3. Genes with significant expression differences between survival groups independent of residual disease status

In an attempt to increase the chance of identifying genes with statistically significant expression differences between groups with different survival times, a two-class ANOVA was carried out between the <12 month (n=7) and >24 month (n=7) survival groups in the cohort. This approach of comparing patients at the extremes of the survival distribution is similar to that used by Spentzos et al. (2004).

Five of the seven patients in the <12 month group had either moderate or maximum residual disease, compared with only minimal or nil residual disease in the patients who lived for more than 24 months after diagnosis. This factor was therefore taken into account in the t-test, so as to identify genes that were differentially expressed between survival groups independently of the level of residual disease present.
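One way to test a survival-group effect independently of residual disease is to compare linear models with and without the group term, entering residual disease category as a blocking factor. The sketch below uses ordinary least squares rather than the random variance model, and the function names are hypothetical. It also shows why the partial overlap described above matters: if survival group and residual disease category coincided exactly, the group effect could not be separated from residual disease at all, and the function raises an error.

```python
import numpy as np
from scipy.stats import f as f_dist

def _design(n, block, group=None):
    """Intercept, dummy columns for residual disease levels, optional group column."""
    block = np.asarray(block)
    cols = [np.ones(n)]
    cols += [(block == level).astype(float) for level in np.unique(block)[1:]]
    if group is not None:
        group = np.asarray(group)
        cols.append((group == np.unique(group)[1]).astype(float))
    return np.column_stack(cols)

def _rss_and_rank(X, y):
    beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid), int(rank)

def blocked_survival_group_test(y, group, block):
    """F-test for a two-level survival group effect after adjusting for residual disease."""
    y = np.asarray(y, dtype=float)
    n = y.size
    rss_full, rank_full = _rss_and_rank(_design(n, block, group), y)
    rss_reduced, rank_reduced = _rss_and_rank(_design(n, block), y)
    df1 = rank_full - rank_reduced
    if df1 == 0:
        raise ValueError("survival group is completely confounded with residual disease")
    df2 = n - rank_full
    f_stat = ((rss_reduced - rss_full) / df1) / (rss_full / df2)
    return f_stat, f_dist.sf(f_stat, df1, df2)
```

With a single residual disease level, this F-test is equivalent to an ordinary two-sample t-test (F = t²).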

In total, this approach identified 27 genes significant at the 0.001 level of univariate testing. Four of these genes overlapped with the previous three-group ANOVA results (NAP1, MGC15875, ETV3 and ACP6). Again, 1000 permutations of the dataset were carried out to assess the significance of observing this number of differentially expressed genes in a dataset of this size. The p-value for this assessment was not significant (p=0.194), although this was an improvement on the previous attempts, described above, to identify a molecular signature of patient survival. The gene identities, along with functional and ontology information, are listed in Table 4-4.

Standard hierarchical clustering using the 27 genes is shown in Figure 4-5. Most of the genes appear upregulated in both classes of tumours when the data are median-centred, making the differences between patient groups difficult to visualise with this technique. Although the mean fold differences between the two groups are small, their consistency within groups resulted in a high level of statistical significance.
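The "fold difference of geom. means" column in Table 4-4 appears to be the geometric mean ratio of the <12 month group divided by that of the >24 month group (for example, 0.07/0.233 ≈ 0.30 for LTBP2 and 0.21/0.311 ≈ 0.675 for ZNF161). A minimal sketch, assuming linear-scale (not log) expression ratios; the function names are hypothetical.

```python
import numpy as np

def geometric_mean(ratios):
    """Geometric mean of expression ratios (all ratios must be > 0)."""
    ratios = np.asarray(ratios, dtype=float)
    return float(np.exp(np.mean(np.log(ratios))))

def fold_difference(short_ratios, long_ratios):
    """Geometric mean of the <12 month group divided by that of the >24 month group."""
    return geometric_mean(short_ratios) / geometric_mean(long_ratios)

print(round(fold_difference([0.5, 2.0], [2.0, 8.0]), 3))   # 0.25
```

Values below 1 indicate lower expression in the short-term survivors; the geometric mean is used because ratios such as 0.5 and 2.0 represent symmetric changes on the log scale.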

Gene ontology analysis was carried out on these 27 genes using the EASE method (Hosack et al., 2003), with the total list of genes exhibiting significant variation in an unsupervised 20% of the dataset used as the reference list. Reflecting the small number of genes in this list and their varied individual functions, as summarised in Table 4-4, no significantly over-represented gene ontologies were identified.

Using Fisher's exact test to assess significance, which does not take into account the potential for co-variation of gene expression, one significantly over-represented ontology class was identified: transcription regulator activity (p=0.03). This category is represented in the gene list by ESR1 and ZNF161, which are expressed at higher levels in the longer-term survival group, and by ETV3 and SOX13, which are expressed at lower levels in the longer-term survival group.
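Both over-representation tests can be sketched from a 2x2 table of gene counts. The EASE score of Hosack et al. (2003) is described as a conservative adjustment of the Fisher exact probability; this sketch assumes the commonly used form in which one gene is removed from the list-in-category count. Conventions for the table differ between tools (for example, whether the reference row includes the list genes), so this is an illustration rather than a reimplementation of EASE, and the function name and counts are hypothetical.

```python
from scipy.stats import fisher_exact

def over_representation_p(list_hits, list_size, pop_hits, pop_size, ease=False):
    """One-sided Fisher exact p-value for over-representation of a GO category.

    list_hits: genes in the significant list annotated to the category
    list_size: genes in the significant list
    pop_hits:  genes in the reference list annotated to the category
    pop_size:  genes in the reference list (assumed to include the significant list)
    ease=True removes one list gene from the category count (EASE-style penalty).
    """
    a = list_hits - 1 if ease else list_hits
    b = list_size - list_hits
    c = pop_hits - list_hits
    d = (pop_size - list_size) - c
    _, p = fisher_exact([[a, b], [c, d]], alternative="greater")
    return p

# Hypothetical counts: 6 of 27 listed genes vs 100 of 2000 reference genes.
print(over_representation_p(6, 27, 100, 2000))
print(over_representation_p(6, 27, 100, 2000, ease=True))
```

The penalty has most effect for categories supported by only one or two genes, which is consistent with a category reaching significance by Fisher's exact test but not by EASE.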

Genes in this ontology play a role in regulating transcription, for example by interacting with a DNA-binding factor or by binding a promoter or enhancer DNA sequence. Loss of the normal controls over DNA replication is associated with tumourigenesis and disease progression. Therefore, the relative down-regulation of most genes in this ontology within the short-term survival group may reflect such aberrant control of DNA replication, resulting in a more aggressive phenotype and shorter survival times.


Table 4-4: Genes differentially expressed between patients with either <12 months or >24 months survival at the 0.001 significance level. Genes are sorted by significance of differential expression between the two survival groups. Several genes involved in calcium transport are observed, as well as genes implicated in the progression of a number of other cancer types.

Each entry gives the UniGene name and symbol, the P-value, the geometric mean ratios for the <12 and >24 month groups, the fold difference of the geometric means, a summary of function and a summary of gene ontology memberships.

Latent transforming growth factor beta binding protein 2 (LTBP2); P < 1.0 x 10^-7; geometric means 0.07, 0.233; fold difference 0.3
Function: Extracellular matrix protein with multi-domain structure. Possesses unique regions similar to the fibrillins. Multiple functions: member of the TGF-beta latent complex, structural component of microfibrils, and role in cell adhesion (Moren et al., 1994).
Gene ontology: calcium ion binding | extracellular matrix (sensu Metazoa) | growth factor binding | protein secretion | regulation of cell cycle | transforming growth factor beta receptor signalling pathway

Zinc finger protein 161 (ZNF161); P < 1.0 x 10^-7; geometric means 0.21, 0.311; fold difference 0.675
Function: Involved in both normal and abnormal cellular proliferation and differentiation. Possible transcription factor; binds to the CT/GC-rich region of the interleukin-3 promoter and mediates tax transactivation of IL-3 (Koyano-Nakagawa et al., 1994).
Gene ontology: DNA binding | cellular defense response | regulation of transcription from RNA polymerase II promoter | zinc ion binding

Odz, odd Oz/ten-m homolog 4 (Drosophila) (ODZ4); P < 1.0 x 10^-7; geometric means 0.583, 0.813; fold difference 0.717
Function: Type II transmembrane molecule. A chromosomal translocation in breast cancer leading to the fusion of DOC4 and HGL, on chromosomes 11 and 8, may lead to activation of ErbB signalling through the production of an autocrine ligand (Ben-Zur et al., 2000).
Gene ontology: N/A

Estrogen receptor 1 (ESR1); P < 1.0 x 10^-7; geometric means 2.144, 2.359; fold difference 0.909
Function: Ligand-activated transcription factor composed of several domains important for hormone binding, DNA binding, and activation of transcription (Green et al., 1986).
Gene ontology: DNA binding | cell growth | chromatin remodelling complex | negative regulation of mitosis | steroid hormone receptor activity

SRY (sex determining region Y)-box 13 (SOX13); P < 1.0 x 10^-7; geometric means 0.517, 0.459; fold difference 1.126
Function: Involved in the regulation of embryonic development and in the determination of cell fate (Roose et al., 1999).
Gene ontology: morphogenesis | nucleus | regulation of transcription, DNA-dependent

BAT2 domain containing 1 (XTP2); P < 1.0 x 10^-7; geometric means 0.354, 0.288; fold difference 1.229
Function: Amplified and overexpressed in invasive bladder cancer. Highest expression levels found in ovary (Huang et al., 2002).
Gene ontology: N/A


Cisplatin resistance-associated overexpressed protein (CROP); P < 1.0 x 10^-7; geometric means 0.24, 0.179; fold difference 1.341
Function: This protein localizes with a speckled nuclear pattern. Could be involved in the formation of the spliceosome via the RE and RS domains. Isolated from a cisplatin-resistant cell line (Umehara et al., 2003).
Gene ontology: RNA splicing | apoptosis | nucleus | response to stress

Nuclear antigen Sp100 (SP100); P < 1.0 x 10^-7; geometric means 0.864, 0.636; fold difference 1.358
Function: Interacts with the ETS1 transcription factor. Inhibits the invasion of breast cancer cells and is induced by interferon-alpha, shown to inhibit the invasion of cancer cells (Seeler et al., 2001; Yordy et al., 2004).
Gene ontology: DNA binding | chromatin | regulation of transcription, DNA-dependent

Lymphotoxin beta (TNF superfamily, member 3) (LTB); P < 1.0 x 10^-7; geometric means 1.274, 0.789; fold difference 1.615
Function: Type II membrane protein of the TNF family. Anchors lymphotoxin-alpha to the cell surface through heterotrimer formation. LTB is an inducer of the inflammatory response system. Immune interaction with the LTB receptor promotes tumour growth by inducing angiogenesis (Browning et al., 1993).
Gene ontology: cell-cell signalling | immune response | integral to membrane | membrane | signal transduction | tumour necrosis factor receptor binding

KIAA1240 protein (KIAA1240); P = 0.0001125; geometric means 0.754, 1.224; fold difference 0.616
Function: Isolated from brain cDNA libraries. No functional information available (Nagase et al., 1999).
Gene ontology: ATP binding | DNA binding | cell cycle | nucleoside-triphosphatase activity

Bone gamma-carboxyglutamate (gla) protein (osteocalcin) (BGLAP); P = 0.0005158; geometric means 0.186, 0.412; fold difference 0.451
Function: Highly conserved bone-specific osteoblast-synthesised protein. May be involved in calcium phosphate deposition in psammoma bodies of ovarian serous papillary cystadenocarcinomas, associated with cellular degradation (Raymond et al., 1999).
Gene ontology: calcium ion binding | cell adhesion | hydroxyapatite binding | odontogenesis | regulation of bone mineralization

Acid phosphatase 6, lysophosphatidic (ACP6); P = 0.0007346; geometric means 0.205, 0.647; fold difference 0.317
Function: Increases basal cell survival and provides significant protection from cisplatin-induced apoptosis in HeLa cell studies (Hiroyama and Takenawa, 1999; Mackeigan et al., 2005).
Gene ontology: acid phosphatase activity

Hypothetical protein FLJ20152 (FLJ20152); P = 0.0007346; geometric means 0.213, 0.549; fold difference 0.388
Function: Sequenced as part of the National Institutes of Health Mammalian Gene Collection (Strausberg et al., 2002).
Gene ontology: N/A


Napsin A aspartic peptidase (NAPSA)
p = 0.0007346; geom. mean <12 months: 0.39, >24 months: 0.669; fold difference of geom. means: 0.583
Summary of function: Important for correct folding, targeting and control of the activation of aspartic proteinase zymogens. Increases expression of E-cadherin on breast cancer cells. Potential novel therapeutic agent; clinical trials under way (Tatnell et al., 1998; Thibout et al., 1999).
Gene ontology memberships: pepsin A activity | peptidase activity | proteolysis and peptidolysis

Hypothetical protein MGC15875 (MGC15875)
p = 0.0007346; geom. mean <12 months: 0.528, >24 months: 0.752; fold difference of geom. means: 0.702
Summary of function: Sequenced as part of the National Institutes of Health Mammalian Gene Collection (Strausberg et al., 2002).
Gene ontology memberships: mitochondrion | pyridoxal phosphate binding | transaminase activity | transferase activity

Caspase recruitment domain family, member 8 (CARD8)
p = 0.0007346; geom. mean <12 months: 0.388, >24 months: 0.535; fold difference of geom. means: 0.725
Summary of function: Involved in pathways leading to activation of caspases or nuclear factor kappa-B in the context of apoptosis or inflammation, respectively. Member of the CARD family that selectively suppresses apoptosis. Expression correlates with shorter survival time in colorectal cancer (Pathan et al., 2001).
Gene ontology memberships: nucleus | protein binding | regulation of apoptosis

Ryanodine receptor 1 (skeletal) (RYR1)
p = 0.0007346; geom. mean <12 months: 0.6, >24 months: 0.725; fold difference of geom. means: 0.828
Summary of function: A calcium release channel of the sarcoplasmic reticulum as well as a bridging structure connecting the sarcoplasmic reticulum and transverse tubule (Phillips et al., 1996).
Gene ontology memberships: B-cell proliferation | apoptosis | calcium channel activity | cell motility | positive regulation of transcription | protein folding | regulation of cell cycle | regulation of endo & exocytosis

RALBP1 associated Eps domain containing 2 (REPS2)
p = 0.0007346; geom. mean <12 months: 0.261, >24 months: 0.272; fold difference of geom. means: 0.96
Summary of function: Expression in prostate cancer cells induces apoptosis. Affects drug accumulation and lowers drug efflux in cancer cells (Ikeda et al., 1998).
Gene ontology memberships: calcium ion binding | epidermal growth factor receptor signalling pathway | protein complex assembly

G protein-coupled receptor 20 (GPR20)
p = 0.0007346; geom. mean <12 months: 0.633, >24 months: 0.575; fold difference of geom. means: 1.101
Summary of function: Integral membrane protein highly expressed in gastrointestinal stromal tumours (O'Dowd et al., 1997).
Gene ontology memberships: G-protein coupled receptor protein signalling pathway | integral to plasma membrane

Protocadherin 12 (PCDH12)
p = 0.0007346; geom. mean <12 months: 0.248, >24 months: 0.223; fold difference of geom. means: 1.112
Summary of function: Cellular adhesion molecule important for cell-cell interactions at interendothelial junctions. Promotes homotypic calcium-dependent aggregation and adhesion, and clusters at intercellular junctions (Ludwig et al., 2000).
Gene ontology memberships: calcium ion binding | cell adhesion | cytoskeleton | homophilic cell adhesion | integral to plasma membrane | neuronal cell recognition

Seven transmembrane domain protein (NIFIE14)
p = 0.0007346; geom. mean <12 months: 0.374, >24 months: 0.328; fold difference of geom. means: 1.14
Summary of function: Sequenced as part of the National Institutes of Health Mammalian Gene Collection (Strausberg et al., 2002).
Gene ontology memberships: integral to membrane


A disintegrin-like and metalloprotease with thrombospondin type 1 motif, 4 (ADAMTS4)
p = 0.0007346; geom. mean <12 months: 0.581, >24 months: 0.49; fold difference of geom. means: 1.186
Summary of function: Disintegrin and metalloproteinase with thrombospondin motifs-4, which is a member of the ADAMTS protein family. Responsible for the degradation of aggrecan, a major proteoglycan of cartilage (Tortorella et al., 2000).
Gene ontology memberships: extracellular matrix (sensu Metazoa) | integrin-mediated signalling pathway | metalloendopeptidase activity

Hydroxysteroid (11-beta) dehydrogenase 2 (HSD11B2)
p = 0.0007346; geom. mean <12 months: 0.356, >24 months: 0.293; fold difference of geom. means: 1.215
Summary of function: Plays a role in modulating mineralocorticoid and glucocorticoid receptor occupancy by glucocorticoids. Detected in adult adrenal cortical carcinoma and adenoma (Albiston et al., 1994).
Gene ontology memberships: cell-cell signalling | glucocorticoid biosynthesis | metabolism | microsome | oxidoreductase activity

Scavenger receptor class A, member 3 (SCARA3)
p = 0.0007346; geom. mean <12 months: 1.496, >24 months: 1.094; fold difference of geom. means: 1.367
Summary of function: A macrophage scavenger receptor-like protein. Depletes reactive oxygen species, protecting cells from oxidative stress, which induces its expression (Han et al., 1998).
Gene ontology memberships: UV protection | cytoplasm | phosphate transport | response to oxidative stress | scavenger receptor activity

Ets variant gene 3 (ETV3)
p = 0.0007346; geom. mean <12 months: 0.284, >24 months: 0.199; fold difference of geom. means: 1.427
Summary of function: An Ets repressor suggested to contribute to growth arrest during terminal macrophage differentiation. Represses Ets target genes involved in Ras-dependent proliferation (Klemsz et al., 1994; Sawka-Verhelle et al., 2004).
Gene ontology memberships: regulation of transcription, DNA-dependent | transcription factor activity

Butyrophilin, subfamily 3, member A1 (BTN3A1)
p = 0.0007346; geom. mean <12 months: 0.524, >24 months: 0.299; fold difference of geom. means: 1.753
Summary of function: Member of the B box family of proteins. Involved in cell proliferation and development. Sequence analysis suggests a cell surface receptor function (Rhodes et al., 2001; Taylor et al., 1996).
Gene ontology memberships: integral to membrane | lipid metabolism

Glutathione peroxidase 3 (plasma) (GPX3)
p = 0.0007346; geom. mean <12 months: 1.661, >24 months: 0.859; fold difference of geom. means: 1.934
Summary of function: Expressed highly in clear-cell ovarian cancer, a highly malignant subtype. Functions in the protection of cells against oxidative damage (Hough et al., 2001; Takahashi et al., 1987).
Gene ontology memberships: electron transporter activity | extracellular region | glutathione peroxidase activity | oxidoreductase activity | response to lipid hydroperoxide | soluble fraction


Figure 4-5: Hierarchical cluster generated from EOC specimens associated with either <12 or >24 months survival. The 27 genes used were identified as differentially expressed between survival classes using a t-test with a random variance model. Variation in the level of residual disease remaining after surgery was factored into the model. The univariate level of significance was set at 0.001. The p-value for observing this number of genes in a dataset of this size is p=0.194; therefore some of the genes present may have been selected by chance alone. Of all the methods tried, this approach came closest to statistical significance, and a number of biologically relevant genes were identified, as discussed.

[Figure 4-5 legend: expression ratio colour scale 0.3 to 3.0; survival group <12 months or >24 months. Genes shown: ESR1 KIAA1240 ODZ4 ADAMTS4 SP100 BGLAP XTP2 PCDH12 LTB GPR20 GPX3 CARD8 NIFIE14 LUC7A RYR1 MGC15875 ETV3 LTBP2 HSD11B2 SOX13 SCARA3 ZNF161 BTN3A1 ACP6 FLJ20152 NAP1 REPS2]


4.2.4.4. Cox’s proportional hazards model survival analysis of gene expression data for predictive model of EOC survival

A Cox proportional hazards model was fitted one gene at a time, and the Wald statistic (also known as a T statistic) was used to test the dependence of survival time on the expression of each gene. The significance cut-off was set at 0.001, and 1,000 permutations of the dataset were performed to determine the statistical significance of the genes identified in relation to the size of the available dataset. As all patients in the cohort had died by the last date of follow-up, no censoring of data was required.
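The per-gene Cox analysis described above can be sketched as follows. This is a minimal, standard-library-only illustration, assuming one expression value per patient for a single gene and, as in this cohort, no censored observations by default; the function names are hypothetical and this is not the software used for the analyses in this chapter. The coefficient is found by Newton-Raphson on the Cox partial likelihood (Breslow handling of tied times), and the Wald z statistic and its two-sided normal p-value are then computed. In a full analysis it would be applied to every gene on the array, keeping genes with p < 0.001.

```python
import math


def _score_and_information(beta, times, x, events):
    """Score U(beta) and observed information I(beta) of the Cox partial likelihood."""
    n = len(times)
    score, info = 0.0, 0.0
    for i in range(n):
        if not events[i]:
            continue  # censored patients contribute only through the risk sets
        # Risk set: patients still under observation at time t_i (Breslow for ties).
        risk = [j for j in range(n) if times[j] >= times[i]]
        w = [math.exp(beta * x[j]) for j in risk]
        s0 = sum(w)
        s1 = sum(wj * x[j] for wj, j in zip(w, risk))
        s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
        mean = s1 / s0
        score += x[i] - mean
        info += s2 / s0 - mean ** 2
    return score, info


def cox_wald_single_gene(times, x, events=None, max_iter=50, tol=1e-9):
    """One-covariate Cox model fitted by Newton-Raphson.

    times  : survival times (e.g. months)
    x      : expression values of a single gene, one per patient
    events : 1 = death observed, 0 = censored; defaults to all deaths (no censoring)
    Returns (beta, standard error, Wald z, two-sided p-value).
    """
    if events is None:
        events = [1] * len(times)
    beta = 0.0
    for _ in range(max_iter):
        score, info = _score_and_information(beta, times, x, events)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    # Information re-evaluated at the final estimate for the standard error.
    _, info = _score_and_information(beta, times, x, events)
    se = 1.0 / math.sqrt(info)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, se, z, p


if __name__ == "__main__":
    months = [5, 8, 12, 20, 30, 40]
    expression = [2.0, 1.5, 1.8, 0.5, 0.2, 0.9]
    print(cox_wald_single_gene(months, expression))
```

A positive coefficient means higher expression is associated with a higher hazard, that is, shorter survival. If a gene's expression perfectly orders the survival times, the partial likelihood has no finite maximum and the loop simply stops at `max_iter` with a very large coefficient; practical implementations guard against this.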

Only one gene was identified as significant at the specified level, and this observation was not statistically robust following permutation testing (p=0.737). The gene identified, hypothetical LOC388298, most likely has no relationship to survival time and was selected by chance alone due to the large number of tests performed. The function of this gene, based on 99.2% sequence homology to VAS1, is thought to be the acidification of intracellular compartments, and it does not appear to have been associated with cancer-specific events in the literature (Nelson and Harvey, 1999).

These experiments highlight the importance of evaluating the significance of ‘predictive’ gene lists identified from gene expression data. This can be done by performing large numbers of random permutations to determine the number of genes that can be expected by chance alone, and also by analysis of the literature available for each gene. A univariate significance level of 0.001 means that, on average, about 10 false-positive genes (0.001 × 10,000) would be expected by chance among the ten thousand present on the array platform in use; this is an expected value rather than a strict upper bound. In practice the number is often smaller, because genes without significant variation from baseline expression across a subset of the total experiment are first removed by unsupervised filtering. Multivariate permutation testing, as used for the analyses carried out in this chapter, is effective for controlling the proportion of false discoveries whilst taking into account (i) the small number of samples relative to the number of genes measured, (ii) potential inaccuracy at the extreme ends of the normal distribution of gene expression values and (iii) the known correlation between genes of similar structure and/or function (Reiner et al., 2003).
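The arithmetic behind the expected number of false positives, and a permutation-based p-value for the number of selected genes, can be sketched as below. This assumes independent genes whose null p-values are uniformly distributed, which is an idealisation (correlation between genes is exactly what multivariate permutation testing accounts for); all function names are hypothetical. The binomial tail shows how often more than 10 null genes would pass a 0.001 cut-off among 10,000, and the last function gives the proportion of label permutations that select at least as many genes as the real data.

```python
import math
import random


def expected_false_positives(n_genes, alpha):
    """Expected number of truly null genes passing a univariate cut-off alpha."""
    return n_genes * alpha


def prob_more_than(k, n_genes, alpha):
    """P(X > k) for X ~ Binomial(n_genes, alpha), assuming independent null genes."""
    cdf = sum(
        math.comb(n_genes, i) * alpha ** i * (1 - alpha) ** (n_genes - i)
        for i in range(k + 1)
    )
    return 1.0 - cdf


def permutation_count_pvalue(observed_count, permuted_counts):
    """Proportion of label permutations selecting at least as many genes as observed."""
    hits = sum(1 for c in permuted_counts if c >= observed_count)
    return hits / len(permuted_counts)


if __name__ == "__main__":
    print(expected_false_positives(10_000, 0.001))
    print(prob_more_than(10, 10_000, 0.001))
    rng = random.Random(1)
    # One simulated null experiment: uniform p-values for 10,000 genes.
    null_hits = sum(rng.random() < 0.001 for _ in range(10_000))
    print(null_hits)
```

Under these assumptions the chance of seeing more than 10 false positives is close to one half, which is why the permutation distribution, rather than the nominal figure of 10, is used to judge whether a gene list is larger than expected by chance.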


4.2.5. Experimentation with normalisation algorithms to improve detection of survival-related gene expression

The available gene expression dataset was re-normalised with a range of different methods in order to determine whether technical bias or noise introduced by a particular normalisation algorithm was the cause of the inability of the feature selection approaches to identify a survival-related set of genes.

The following normalisation methods, as described in section 2.4.3, were applied to

separate copies of the dataset:

Median per-gene and per-array normalisation

Intensity-based lowess normalisation

SNOMAD

Print-tip lowess with background subtraction

Print-tip lowess without background subtraction
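A median per-array and per-gene normalisation of the kind listed first can be sketched as follows. It assumes a genes x arrays matrix of log2 ratios held as nested lists, with `None` marking missing spots, and it centres arrays before genes; the order actually used in section 2.4.3 is not restated here, and the function name is hypothetical.

```python
import statistics


def median_normalise(log_ratios):
    """Median-centre a genes x arrays matrix of log2 ratios.

    Each array (column) median is subtracted first, then each gene (row) median.
    None marks a missing spot and is skipped in both steps.
    """
    n_genes = len(log_ratios)
    n_arrays = len(log_ratios[0])
    out = [row[:] for row in log_ratios]
    # Per-array centring.
    for a in range(n_arrays):
        column = [out[g][a] for g in range(n_genes) if out[g][a] is not None]
        if not column:
            continue
        m = statistics.median(column)
        for g in range(n_genes):
            if out[g][a] is not None:
                out[g][a] -= m
    # Per-gene centring.
    for g in range(n_genes):
        values = [v for v in out[g] if v is not None]
        if values:
            m = statistics.median(values)
            out[g] = [v - m if v is not None else None for v in out[g]]
    return out


if __name__ == "__main__":
    print(median_normalise([[0.5, None, 1.5], [2.0, 3.0, 4.0]]))
```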

For each type of normalisation listed above, the feature selection approaches described in sections 4.2.4.1 to 4.2.4.4 were repeated to determine whether the manipulation carried out by each algorithm affected the ability to identify genes with expression patterns significantly related to the survival variable. No combination of normalisation and feature selection yielded a set of genes with significantly different expression between survival categories, significant correlation with the continuous variable of survival time, or significant performance in Cox’s proportional hazards model.
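Intensity-dependent lowess normalisation fits the trend of the log ratio M = log2(R/G) against the average log intensity A = 0.5 log2(RG) and subtracts it. The sketch below uses a single pass of local linear regression with tricube weights and omits the robustness iterations of full lowess, so it is a simplified stand-in rather than the implementation used in this chapter; `frac` (the fraction of spots in each local window) and the function names are assumptions. Print-tip lowess would apply the same fit separately to the spots of each print-tip group.

```python
import math


def tricube(u):
    """Tricube weight: (1 - u^3)^3 for 0 <= u < 1, otherwise 0."""
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def lowess_fit(a, m, frac=0.3):
    """Local linear fit of M on A (one pass, no robustness iterations)."""
    n = len(a)
    k = max(2, int(math.ceil(frac * n)))
    fitted = []
    for i in range(n):
        # Bandwidth: distance from A_i to its k-th nearest spot (itself included).
        h = sorted(abs(aj - a[i]) for aj in a)[k - 1] or 1e-12
        w = [tricube(abs(aj - a[i]) / h) for aj in a]
        sw = sum(w)
        ma = sum(wj * aj for wj, aj in zip(w, a)) / sw
        mm = sum(wj * mj for wj, mj in zip(w, m)) / sw
        sxx = sum(wj * (aj - ma) ** 2 for wj, aj in zip(w, a))
        sxy = sum(wj * (aj - ma) * (mj - mm) for wj, aj, mj in zip(w, a, m))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(mm + slope * (a[i] - ma))
    return fitted


def lowess_normalise(red, green, frac=0.3):
    """Return log2(R/G) with the intensity-dependent trend removed."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    trend = lowess_fit(a, m, frac)
    return [mi - ti for mi, ti in zip(m, trend)]


if __name__ == "__main__":
    green = [100.0, 200.0, 400.0, 800.0, 1600.0, 3200.0]
    red = [g * 2 ** 0.5 for g in green]  # constant dye bias of 0.5 on the log2 scale
    print(lowess_normalise(red, green, frac=0.5))
```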

Analysis of data from Lucidea Microarray ScoreCard features revealed that normalisations (ii) – (v) described above resulted in measurements with no significant difference between the theoretically expected and observed values. Methods (i) and (vi), which involved scaling all genes to the median expression of each gene and also of each array, did result in average ScoreCard values significantly different (p<0.05 in all cases) from their expected values. In this case, the use of spatial lowess (print-tip and SNOMAD) or intensity-only lowess had no significant effect on the accuracy of the ScoreCard quality control features. This is most likely a reflection of the low spatial bias present in data generated using an Agilent Microarray Scanner and the GenePix image analysis package, as demonstrated in Chapter 3.


4.2.6. RT-PCR validation: selection of genes with minimum 2-fold change in expression between patient survival groups

In order to obtain a set of genes for validation by RT-PCR, and so carry the analysis through to the stage of independent validation (with the caveat of the lack of statistical significance described above), the mean expression ratio of all samples from patients with <12 months or >24 months survival was plotted (Figure 4-6). From this approach, 130 genes were identified with 2-fold or greater differences in mean expression between the two classes. The full list is given in Appendix G. Six genes were selected based on their fold change and on literature searches of their purported involvement in EOC biology: four with higher levels in patients with longer survival times (KLK7, SLPI, S100A2 and TNFSF10) and two with reduced mean expression with increased survival time (FN1 and UPA). Details and literature information for these genes are given in Table 4-5.

The identity of each gene was confirmed by sequencing, and RT-PCR primers corresponding to the sequences listed in Table 4-6 were designed using the GenScript Real-time PCR (TaqMan, USA) Primer Design tool (GenScript, USA).
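The 2-fold selection rule can be sketched as follows, assuming that fold change is the ratio of class means (>24 months over <12 months), as the values in Table 4-5 suggest; `select_fold_change_genes` is a hypothetical helper, not part of any package used here. Applied to the class means listed in Table 4-5, it reproduces the reported fold changes for KLK7 (2.5), FN1 (2.1) and UPA (2.0).

```python
def select_fold_change_genes(short_means, long_means, threshold=2.0):
    """Genes whose class-mean ratio (>24 / <12 months) is >= threshold or <= 1/threshold.

    short_means, long_means: dicts mapping gene symbol -> mean expression ratio.
    Values are (fold change rounded to 0.1, direction of change with longer survival).
    """
    selected = {}
    for gene, short in short_means.items():
        fold = long_means[gene] / short
        if fold >= threshold:
            selected[gene] = (round(fold, 1), "increases")
        elif fold <= 1.0 / threshold:
            selected[gene] = (round(1.0 / fold, 1), "decreases")
    return selected


if __name__ == "__main__":
    # Class means taken from Table 4-5.
    short = {"KLK7": 7.171, "FN1": 1.114, "UPA": 3.909}
    long_ = {"KLK7": 17.949, "FN1": 0.535, "UPA": 1.947}
    print(select_fold_change_genes(short, long_))
```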


[Figure 4-6 axes: x, mean (<12 months survival); y, mean (>24 months survival); both axes span -7 to 5]

Figure 4-6: Mean expression profile for short (a) and long (c) term survival cases. Diagonal lines indicate 2-fold up and down regulation. Red dots indicate genes above or below this threshold. An annotated list of all genes over 2-fold up or down regulated by this analysis is shown in Appendix G.


Table 4-5: Selected genes with 2-fold or greater differences between mean phenotype profiles. Mean expression ratios for each class are given along with the direction of expression change with increased length of survival and the mean fold change. KLK7, SLPI, S100A2 and TNFSF10 have higher mean levels of expression in patients with longer survival, whereas FN1 and UPA have lower levels.

UniGene Symbol | UniGene Name | Mean (<12 months) | Mean (>24 months) | Change in expression with increased length of survival | Relevance to EOC survival

KLK7 | Kallikrein 7 (chymotryptic, stratum corneum) | 7.171 | 17.949 | Increases (2.5-fold) | KLK7 is a potential biomarker for EOC, where it is expressed at significantly higher levels in late-stage disease compared to normal or benign adenoma. KLK7 status (negative/positive) has been demonstrated to be a predictor of both disease-free and overall survival in EOC and correlates with the amount of residual disease remaining after surgery (Dong et al., 2003; Kyriakopoulou et al., 2003).

SLPI | Secretory leukocyte protease inhibitor (antileukoproteinase) | 5.296 | 13.572 | Increases (2.6-fold) | Promotes the tumourigenic and metastatic potential of cancer cells. Promotes reporter and suicide gene expression. Has been proposed as a candidate for adenovirus-mediated gene therapy for EOC (Barker et al., 2003; Shigemasa et al., 2001).

S100A2 | S100 calcium binding protein A2 | 1.432 | 4.058 | Increases (2.8-fold) | S100A2 has a potential role as a tumour suppressor gene and also regulates the accumulation of calcium in normal mammary epithelial cells. Its expression is elevated in EOC relative to normal ovary tissue (Hough et al., 2001; Santin et al., 2004).

TNFSF10 | Tumour necrosis factor (ligand) superfamily, member 10 | 1.826 | 4.239 | Increases (2.3-fold) | Also known as TRAIL and associated with favourable outcome in EOC. It is a potent death protein that preferentially kills various types of cancer cells over normal cells. High expression of this gene in EOC is a significant indicator of longer survival times (Lancaster et al., 2004; Lancaster et al., 2003; Wiley et al., 1995).

FN1 | Fibronectin 1 | 1.114 | 0.535 | Decreases (2.1-fold) | FN1 is an extracellular matrix protein which promotes tumour migration and invasion through important cell-adhesion functions. It is also reported to have immunosuppressive functions. This gene is known to be upregulated in invasive tumours relative to tumours of low malignant potential (LMP) (Franke et al., 2003; Shigemasa et al., 2001).

UPA/PLAU | Plasminogen activator, urokinase | 3.909 | 1.947 | Decreases (2.0-fold) | High UPA is associated with residual disease and shortened disease-free survival. The PLAU/PAI-1 axis may play an important role in the intra-abdominal spread and reimplantation of ovarian cancer cells. The prognostic relevance of PLAU and PAI-1 supports their possible role in the malignant progression of ovarian cancer (Konecny et al., 2001; Schmitt et al., 1997).


Table 4-6: Primer sequences designed for RT-PCR validation of genes identified with greater than two-fold mean differential expression between patients with either <12 or >24 months survival

Gene | Forward primer | Reverse primer
HPRT1 (control) | CTGGCGTCGTGATTAGTGAT | CTCGAGCAAGACGTTCAGTC
KLK7 | CATCCCCGACTCCAAGAAAA | ACCAGACCTTGCAGGGTACCT
SLPI | ATGTGTGGGAAATCCTGCGT | CACACAGAGCAGGACTCCAGAG
S100A2 | AGGGCGACAAGTTCAAGCTG | CTTTCTCCCCCACAAAGCTG
TNFSF10 | TGCTGATCGTGATCTTCACA | AAGAAACAAGCAATGCCACTT
FN1 | GGTTCGGGAAGAGGTTGTTA | TCATCCGTAGGTTGGTTCAA
UPA | TACTGCAGGAACCCAGACAA | AGTCATGCACCATGCACTCT
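As a quick sanity check on the primers in Table 4-6, GC content and a rough melting temperature can be computed for each sequence. This sketch uses the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, which is only a coarse approximation for oligonucleotides of this length; it is not the algorithm used by the primer design tool, and the helper names are hypothetical.

```python
# Primer pairs (forward, reverse) copied from Table 4-6.
PRIMERS = {
    "HPRT1": ("CTGGCGTCGTGATTAGTGAT", "CTCGAGCAAGACGTTCAGTC"),
    "KLK7": ("CATCCCCGACTCCAAGAAAA", "ACCAGACCTTGCAGGGTACCT"),
    "SLPI": ("ATGTGTGGGAAATCCTGCGT", "CACACAGAGCAGGACTCCAGAG"),
    "S100A2": ("AGGGCGACAAGTTCAAGCTG", "CTTTCTCCCCCACAAAGCTG"),
    "TNFSF10": ("TGCTGATCGTGATCTTCACA", "AAGAAACAAGCAATGCCACTT"),
    "FN1": ("GGTTCGGGAAGAGGTTGTTA", "TCATCCGTAGGTTGGTTCAA"),
    "UPA": ("TACTGCAGGAACCCAGACAA", "AGTCATGCACCATGCACTCT"),
}


def gc_content(primer):
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    return 100.0 * sum(primer.count(b) for b in "GC") / len(primer)


def wallace_tm(primer):
    """Rough melting temperature in deg C: 2 x (A+T) + 4 x (G+C)."""
    primer = primer.upper()
    at = sum(primer.count(b) for b in "AT")
    gc = sum(primer.count(b) for b in "GC")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for gene, (fwd, rev) in PRIMERS.items():
        print(gene, round(gc_content(fwd)), wallace_tm(fwd), round(gc_content(rev)), wallace_tm(rev))
```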

4.2.6.1. Samples profiled by microarray analysis – PCR validation of selected two-fold differentially expressed genes

RT-PCR was carried out on reverse-transcribed RNA as described in section 2.3.2.2. The housekeeping gene HPRT was used as a control, and each sample was run in triplicate. Initially, RT-PCR was carried out on samples used to generate the microarray expression data. Only a subset of the total cohort was available, owing to the limited remaining amounts of extracted RNA and plans to profile these samples using the Affymetrix platform in future AOCS projects. Sufficient RNA was available for nine samples, five with survival times of <12 months (median 10.5 months) and four of >24 months (median 38.5 months), which are listed in Table 4-7.

The mean expression level of a small number of heterogeneous samples is sensitive to the addition or exclusion of individual samples. Because of the smaller cohort available for technical validation, the mean differences for these specific samples were calculated, as shown in Table 4-8. Due to limited amounts of RNA, only the 4 genes with the largest fold change differences, KLK7, SLPI, FN1 and UPA, were used for this technical validation. The average standard deviation for each triplicate PCR reaction was 0.16, indicating a low level of variation between separate reactions.

Good agreement was observed between microarray and RT-PCR measurement of KLK7 and SLPI, both being expressed more than 2-fold higher in the longer-term survival group (>24 months). The fold changes of FN1 and UPA expression did not correlate to the same extent, which is most likely due to the small sample size used in this analysis. The fold change between survival groups for these two genes was still less than half that observed for KLK7 and SLPI, as shown in Table 4-8.
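The summary rows of Table 4-8 rest on two small calculations: the ratio of group means and the Pearson correlation between paired array and RT-PCR values. The sketch assumes the correlation is computed across the samples' paired array and PCR measurements on their original scale; that pairing is an assumption, as are the function names. Using the KLK7 array means from Table 4-8, 20.67 / 9.08 gives the reported 2.28.

```python
import math


def mean_fold_change(group_a, group_c):
    """Ratio of class means, mean(Group C) / mean(Group A), as in Table 4-8."""
    return (sum(group_c) / len(group_c)) / (sum(group_a) / len(group_a))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Group means for KLK7 (array) taken from Table 4-8.
    print(round(mean_fold_change([9.08], [20.67]), 2))
```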


Table 4-7: Samples used for technical validation of genes with > 2-fold differential expression between EOC survival groups.

Patient ID | Survival group
92.003 | <12 months
91.007 | <12 months
90.061 | <12 months
94.019 | <12 months
94.116 | <12 months
94.113 | >24 months
94.070 | >24 months
92.004 | >24 months
94.044 | >24 months

Table 4-8: Mean microarray and RT-PCR gene expression ratios for technical validation subset of primary patient outcome cohort. Ratios calculated using data from specimens available for RT-PCR validation experiment only.

KLK7 SLPI FN1 UPA

Group Array PCR Array PCR Array PCR Array PCR

Group A: <12 months survival 9.08 777.25 1.79 3785.01 1.23 5.61 2.03 9.14

Group C: >24 months survival 20.67 2401.96 3.92 13754.52 0.89 4.06 1.18 7.76

Mean difference (Mean Group C / Mean Group A)

2.28 3.09 2.19 3.63 0.72 0.72 0.58 0.85

Pearson Correlation 0.72 0.63 0.69 0.62

4.2.6.2. Independent biological validation of genes with two-fold mean differential expression between survival groups

In order to validate the expression differences for these selected genes in samples not used in the primary analysis, RNA was obtained from EOC specimens not included in the original cohort. Five samples from patients with survival times of <12 months (median 9 months) and nine samples with survival times of >24 months (median 57 months) were obtained. Specimens were reviewed by pathologist Dr Melissa Robbie as previously described to ensure adequate tumour content and correct diagnosis. RNA was extracted by Anna Tinker as part of the ongoing AOCS tumour profiling study, and 5 µg of total RNA was used to produce cDNA template.

RT-PCR was carried out using the housekeeping gene HPRT as a control (de Kok et al., 2005). Each measurement was repeated three times per sample. Normalised RT-PCR results are shown in Table 4-9, and a summary of the mean fold change differences between the two survival classes is given in Table 4-10. Overall, the mean fold changes measured by RT-PCR agreed in direction with the microarray data, although the exact fold changes varied, as expected for a sample set of this limited size. This result does, however, support the ability of this approach to identify a subset of genes with differential expression based on observed mean fold changes and literature analysis. The standard deviation between triplicate RT-PCR reactions was 0.14, indicating a high degree of reproducibility between multiple measurements of the same gene/template combination.
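How triplicate reactions might be combined and normalised to HPRT is sketched below. It assumes Ct values and roughly 100% amplification efficiency, so that relative expression is 2^-(mean Ct of target - mean Ct of HPRT); the quantification procedure actually used is the one described in section 2.3.2.2 and may differ (for example, a standard-curve method), and the function names are hypothetical. The second function computes an average triplicate standard deviation of the kind quoted above.

```python
import statistics


def delta_ct_expression(target_ct, hprt_ct):
    """Relative expression 2**-(mean Ct target - mean Ct HPRT), assuming ~100% efficiency."""
    d_ct = statistics.mean(target_ct) - statistics.mean(hprt_ct)
    return 2.0 ** (-d_ct)


def mean_triplicate_sd(reactions):
    """Average of the sample standard deviations of each triplicate set of Ct values."""
    return statistics.mean(statistics.stdev(r) for r in reactions)


if __name__ == "__main__":
    print(delta_ct_expression([25.1, 25.0, 25.2], [20.0, 20.1, 19.9]))
    print(mean_triplicate_sd([[20.0, 20.1, 20.2], [25.0, 25.2, 25.4]]))
```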

This is not the ideal approach for identifying genes with significant expression differences between groups of patients; rather, it was tailored to the sample size of this study and to the inability of more accepted methods of survival analysis to identify a set of genes for validation.


Table 4-9: Independent biological validation set and RT-PCR data for selected genes with mean 2-fold differential expression. Residual disease summary: nil, 0 cm; min, 0-1 cm; mod, 1-2 cm; max, >2 cm thick section of tumour remaining after surgery. RT-PCR scores are normalised to HPRT expression and are the average of triplicate measurements. Dashes indicate either the absence of data generated from this reaction or a large divergence between replicate measures.

Patient ID | Pathology classification | Months survival | Residual disease | KLK7 | SLPI | S100A2 | TNFSF10n | FN1 | UPA

85.064 | Papillary serous cystadenocarcinoma | 2 | Max | 58.87 | - | - | 8.28 | 17.32 | 673.61
93.086 | Papillary serous adenocarcinoma | 3 | Max | 109.04 | 7472.15 | 1146.19 | 13.85 | 13.08 | 154.34
86.027 | Serous carcinoma | 5 | Min | 5.63 | 914.51 | 31.95 | 9.74 | 2.11 | 15.34
85.031 | Papillary serous adenocarcinoma | 9 | Min | 11.40 | 3275.01 | 285.31 | 15.14 | 72.95 | 1255.68
95.014 | Serous carcinoma | 9 | Min | 17.84 | 3434.70 | 94.15 | 16.19 | 1.72 | 48.61
94.113 | Papillary serous adenocarcinoma | 25 | Nil | 165.04 | 18669.01 | 6090.14 | 26.23 | 126.42 | 919.00
93.075 | Papillary serous adenocarcinoma | 27 | Nil | 20.93 | 195.83 | 35.24 | 7.23 | 0.51 | 19.38
93.072 | Serous adenocarcinoma | 33 | Nil | 149.99 | 1185.98 | 11.08 | 24.96 | 4.30 | 48.91
93.006 | Papillary serous cystadenocarcinoma | 34 | Max | 56.45 | 2761.14 | 260.00 | 37.66 | 11.20 | 118.41
95.002 | Papillary serous adenocarcinoma | 80 | Min | 9.67 | 1817.17 | 202.25 | 8.03 | 3.06 | 83.23
87.035 | Serous cystadenocarcinoma | 86 | Max | - | - | - | - | 4.25 | 10.96
93.056 | Papillary serous cystadenocarcinoma | 126 | Min | 16.30 | 2690.87 | 1020.06 | 5.11 | 41.41 | 804.65
86.028 | Serous cystadenocarcinoma | 211 | Min | 42.31 | 5825.52 | 2543.18 | 8.04 | 6.44 | 260.70
86.058 | Serous cystadenocarcinoma | 214 | Nil | 4.95 | 10655.47 | 741.92 | 8.70 | 9.18 | 92.80

Table 4-10: Summary of gene expression measurements from microarray data, RT-PCR validation where available (PCR1) and the RT-PCR independent validation cohort (PCR2). The mean fold change of each gene calculated from the complete primary microarray dataset is shown for comparison.

Gene and measurement | <12 months | >24 months | Mean fold change
KLK7 Array | 7.17 | 17.95 | 2.5
KLK7 PCR1 | 777.25 | 2401.96 | 3.09
KLK7 PCR2 | 38.68 | 54.18 | 1.4
SLPI Array | 5.30 | 13.57 | 2.6
SLPI PCR1 | 1.79 | 3.92 | 2.19
SLPI PCR2 | 2921.05 | 5042.28 | 1.7
S100A2 Array | 1.43 | 4.06 | 2.8
S100A2 PCR2 | 299.21 | 1266.40 | 4.2
TNFSF10n Array | 1.83 | 4.24 | 2.3
TNFSF10n PCR2 | 12.30 | 18.42 | 1.5
FN1 Array | 1.11 | 0.54 | 0.4
FN1 PCR1 | 5.61 | 4.06 | 0.72
FN1 PCR2 | 31.05 | 20.75 | 0.6
UPA Array | 3.91 | 1.947 | 0.5
UPA PCR1 | 2.03 | 1.18 | 0.58
UPA PCR2 | 459.62 | 237.25 | 0.5


[Figure 4-7 panels: (A) KLK7, (B) SLPI, (C) S100A2, (D) TNFSF10n, (E) FN1, (F) UPA; groups on each x-axis: <12 months, >24 months]

Figure 4-7: Box plots of RT-PCR-assessed expression of selected genes in the independent validation samples. Red dots indicate mean expression levels per class. All p-values for the comparison of mean expression level between groups were >0.05. The mean fold change in expression did, however, agree with the values observed in the microarray data. (A) KLK7 (B) SLPI (C) S100A2 (D) TNFSF10 (E) FN1 (F) UPA.


4.2.7. Analysis of published gene lists for predicting EOC prognosis

Prognostic gene lists identified by other published studies of EOC were used to

interrogate the dataset generated for this chapter in order to determine the relationship

between datasets. Univariate F-tests were carried out on expression data corresponding to

the published gene lists and hierarchical clustering was performed to visualise the

patterns of expression formed.
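A per-gene univariate F-test compares between-group and within-group variance in expression. The sketch below computes the one-way ANOVA F statistic; to stay within the Python standard library it estimates the p-value by permuting group labels instead of referring to the F distribution, which is a substitution rather than the procedure used in this chapter. Function names are hypothetical.

```python
import random


def f_statistic(groups):
    """One-way ANOVA F statistic for one gene; groups is a list of lists of values."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_pvalue(groups, n_perm=1000, seed=0):
    """Shuffle group labels (keeping group sizes) and count F values >= the observed F."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    values = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)
        permuted, start = [], 0
        for size in sizes:
            permuted.append(values[start:start + size])
            start += size
        if f_statistic(permuted) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    short_survival = [1.0, 2.0, 3.0]
    long_survival = [4.0, 5.0, 6.0]
    print(f_statistic([short_survival, long_survival]))
    print(permutation_pvalue([short_survival, long_survival], n_perm=500))
```

With two groups the F statistic equals the square of the pooled-variance t statistic, so this is equivalent to a two-sample t-test; with the three survival groups used in some figures it tests for any difference among the group means.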

4.2.7.1. Comparison to EOC gene expression prognostic signature

The 115-gene independent prognostic signature of Spentzos et al (Spentzos et al., 2004) was matched via UniGene IDs (Wheeler et al., 2003) to 90 genes on the Peter Mac 10.5k cDNA microarray. None of these genes exhibited statistically significant variation between the survival groups in this study or correlated significantly with survival time. Hierarchical clustering of the 26 samples used in this chapter with the ‘Ovarian Cancer Prognostic Signature’ is shown in Figure 4-9. While regions of co-expressed genes can be observed in the resulting cluster image, the samples were not grouped by survival, nor into groups corresponding to other clinical variables.
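Cross-platform matching by UniGene cluster ID, as used to map published gene lists onto the Peter Mac array, reduces to a dictionary lookup. The sketch assumes both lists have been annotated against the same UniGene build (cluster assignments change between builds) and uses placeholder identifiers rather than real cluster IDs; the function name is hypothetical.

```python
def match_by_unigene(published, array_annotation):
    """Map published genes onto array clones that share a UniGene cluster ID.

    published        : dict gene symbol -> UniGene ID (or None if unannotated)
    array_annotation : dict array clone ID -> UniGene ID (or None)
    Returns dict gene symbol -> list of matching clone IDs.
    """
    clones_by_cluster = {}
    for clone, cluster in array_annotation.items():
        if cluster:
            clones_by_cluster.setdefault(cluster, []).append(clone)
    return {
        symbol: clones_by_cluster[cluster]
        for symbol, cluster in published.items()
        if cluster in clones_by_cluster
    }


if __name__ == "__main__":
    # Placeholder identifiers only, not real UniGene clusters.
    published = {"GENE_A": "Hs.1", "GENE_B": "Hs.2", "GENE_C": None}
    array = {"clone1": "Hs.1", "clone2": "Hs.1", "clone3": "Hs.9"}
    matched = match_by_unigene(published, array)
    print(len(matched), "of", len(published), "published genes matched:", matched)
```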

4.2.7.2. Comparison to a molecular signature of EOC residual disease levels

Microarrays have been used to investigate the molecular component of residual disease following surgery, which is thought to depend on both the physiological characteristics of the tumour and the ability of the operating surgeon. Berchuck et al (Berchuck et al., 2004) found 32 genes that could distinguish between optimal and suboptimal debulking with 72.2% accuracy. Only 12 of these genes were present on the Peter Mac array (based on UniGene ID linking, build #184), reducing the power of this comparison. However, using this 12-gene subset, no significant grouping of patients into either survival or residual disease categories was observed on the basis of hierarchical clustering (Figure 4-8).

Furthermore, none of these twelve genes were present in any of the other published gene

lists used to interrogate these 26 EOC expression profiles. This suggests that the signature

of residual disease identified by Berchuck et al may be specific to the 44-patient cohort

from which it was developed, or these genes are not involved in the range of other

tumour-related processes represented by these gene lists.
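The clustering parameters are not restated in this section; a common choice for expression profiles, assumed here, is average-linkage agglomerative clustering with 1 - Pearson correlation as the distance between samples. The sketch below returns the merge history from which a dendrogram like those in the figures could be drawn; it is quadratic in the number of samples per merge, purely illustrative, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage(profiles):
    """Agglomerative clustering of samples with 1 - Pearson r as the distance.

    profiles: dict sample name -> list of expression values over the same genes.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    names = list(profiles)
    dist = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = 1.0 - pearson(profiles[a], profiles[b])
            dist[(a, b)] = dist[(b, a)] = d
    clusters = {name: [name] for name in names}
    merges = []
    while len(clusters) > 1:
        keys = list(clusters)
        best = None
        for i, p in enumerate(keys):
            for q in keys[i + 1:]:
                # Average distance over all sample pairs spanning the two clusters.
                pairs = [(s, t) for s in clusters[p] for t in clusters[q]]
                d = sum(dist[pair] for pair in pairs) / len(pairs)
                if best is None or d < best[2]:
                    best = (p, q, d)
        p, q, d = best
        merges.append((p, q, d))
        clusters[p + "+" + q] = clusters.pop(p) + clusters.pop(q)
    return merges


if __name__ == "__main__":
    samples = {"s1": [1, 2, 3, 4], "s2": [2, 4, 6, 8], "s3": [4, 3, 2, 1], "s4": [8, 6, 4, 2]}
    for merge in average_linkage(samples):
        print(merge)
```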


Figure 4-8: EOC survival dataset clustered using the 12 genes that overlap between the Peter Mac 10.5k cDNA microarray and a predictive signature of residual disease (Berchuck et al., 2004). The sample colour bar corresponds to residual disease categories nil (0 cm tumour remaining), min (0-1 cm), mod (1-2 cm) or max (>2 cm).

[Figure 4-8 legend: residual disease category (nil, minimal, moderate, maximum); expression ratio colour scale 0.3 to 3.0. Genes shown: PARD6A RPS6KA4 EIF3S8 SEPHS1 FLJ20397 FGFR3 ARPC3 P2RXL1 FGFR1 PCP4 RARB SDCCAG16]


4.2.7.3. Comparison to a molecular signature for predicting the likelihood of EOC relapse

Hartmann et al (Hartmann et al., 2005) found 14 genes, evaluated on 51 specimens of EOC, that were capable of predicting early relapse of the disease following platinum-paclitaxel therapy. Disease relapse can be considered a surrogate measure of survival time, due to the often poor prognosis and short survival time associated with chemotherapy-resistant recurrent EOC.

Ten of the fourteen genes found by Hartmann et al were matched to the Peter Mac array and the analysis was repeated. Again, none of the genes showed significant univariate expression variation between the survival groups or between patients grouped by residual disease category. As expected given the lack of statistical difference in these genes in this cohort, hierarchical clustering revealed no separation of survival groups.

There were no genes in common between the Spentzos et al and Hartmann et al gene lists, despite both correlating with patient prognosis. The implications of this are discussed further in section 4.3.4.


Figure 4-9: Hierarchical clustering using the overlap between the Peter Mac 10.5k microarray and the genes identified by Spentzos et al (Spentzos et al., 2004) as having independent prognostic significance for EOC. No clear clustering of patients according to survival time is observed using these genes on the data generated for this chapter.

Figure 4-10: Hierarchical clustering using the overlap between the Peter Mac 10.5k microarray and those genes identified by Hartmann et al (Hartmann et al., 2005) as predictive of early recurrence following first-round chemotherapy. No relationship between the expression of these genes and patient survival is evident.

[Figure 4-9 and Figure 4-10 legends: expression ratio scale 0.3 to 3.0; survival group <12 months, 12-24 months, >24 months]


4.2.8. Network and pathway analysis of genes differentially expressed between survival groups

To explore potential interactions and functional relationships between the 27 genes identified as differentially expressed between patients with survival times of <12 or >24 months, Ingenuity Pathway Analysis (Ingenuity Systems, USA) was used. This system makes use of a large, curated online database of millions of individually modelled relationships between proteins, genes, complexes, cells, tissues, drugs and diseases. To assess the significance of observing particular combinations of genes, all genes contained in the Ingenuity Pathways Knowledge Base (IPKB) are used as the reference set for computing significance levels. At the time of analysis this database contained information on over 22,700 mammalian genes, classified into a custom ontology structure with over 280,000 biological ‘concepts’, providing broad coverage of the human genome as a reference set. Significance is calculated using a right-tailed Fisher's exact test (Moore and McCabe, 2003), which tests for over-representation of a particular gene annotation in a given list.
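The over-representation test can be illustrated with SciPy. This is a sketch under stated assumptions, not Ingenuity's implementation: it takes the 22,700-gene knowledge base as the reference set and the 14 genes found in the database as the input list, while the annotation size (300 genes) and overlap (4 genes) are hypothetical numbers chosen only for illustration. The function name is also hypothetical. A right-tailed Fisher's exact test on the 2 × 2 table is equivalent to a hypergeometric upper-tail probability, which the example computes both ways.

```python
from scipy.stats import fisher_exact, hypergeom


def overrepresentation_p(n_reference, n_annotated, n_list, n_overlap):
    """Right-tailed Fisher's exact test for over-representation of one annotation."""
    # 2 x 2 table: rows = in list / not in list, columns = annotated / not annotated
    table = [
        [n_overlap, n_list - n_overlap],
        [n_annotated - n_overlap,
         n_reference - n_list - (n_annotated - n_overlap)],
    ]
    _, p = fisher_exact(table, alternative="greater")
    return p


# Hypothetical: 4 of the 14 list genes carry an annotation shared by 300 of 22,700 genes
p_fisher = overrepresentation_p(22700, 300, 14, 4)
# Same quantity as P(X >= 4) when drawing 14 genes from the reference set
p_hyper = hypergeom.sf(4 - 1, 22700, 300, 14)
```

Only about 0.19 annotated genes would be expected in a random list of 14 (14 × 300 / 22,700), so an overlap of 4 gives a very small right-tailed p-value.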

Fourteen of the 27 genes were found in this database. Ten of these mapped to a single known gene interaction network (Figure 4-11) which, based on interrogation of the IPKB, is involved in a range of cancer-related processes and diseases. This network was given a score of 21, based on the overlap between the 27 genes supplied and those present in the whole network. All other networks identified by this analysis had scores of two or less (Table 4-11), indicating that this cancer-related network best represented the molecular processes differentially expressed between the two groups of patients.
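To help read these scores: Ingenuity's documentation describes a network score as the negative base-10 logarithm of the right-tailed Fisher's exact test p-value for the network's focus genes. Assuming that convention holds for the version used here (it is not stated in this section), the scores translate into p-values as follows.

```latex
\[
\text{score} = -\log_{10} p, \qquad p = 10^{-\text{score}}
\]
\[
\text{score} = 21 \;\Rightarrow\; p = 10^{-21}, \qquad
\text{score} \le 2 \;\Rightarrow\; p \ge 10^{-2} = 0.01
\]
```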

Within this network, the most significantly represented functional groups of genes relate to the biological question at the centre of this chapter: cancer development and progression (p=4.36E-7 to 9.83E-3), cell death (p=2.71E-6 to 9.83E-3), diseases of the reproductive system (p=3.95E-6 to 9.83E-3), cellular growth and proliferation (p=4.88E-6 to 9.83E-3) and tumour morphology (p=2.34E-5 to 9.83E-3). Each range spans the p-values of all sub-classifications contained in that ontology. The full list of processes identified is given in Appendix H.

Several genes in the network with the most significant overlap with the 27 genes have a demonstrated involvement in EOC malignancy and progression. These include AKT2, a member of the protein kinase B family, whose product is frequently activated in primary ovarian cancer. Inhibition of this protein activates apoptosis pathways, raising the possibility of targeting it as a novel therapeutic approach to EOC (Yuan et al., 2000).

Also present in the network, and well represented in the literature, is the estrogen receptor gene ESR1. Expression of this gene has been linked to a range of processes crucial to tumour development and proliferation, predominantly in breast cancer models (Oesterreich et al., 2001). Levels of the ESR1 gene product are routinely assessed to make decisions about breast cancer treatment (Simpson et al., 2005).

Four of the five genes with calcium transport or binding functions are contained in the

network (LTBP2, BGLAP, REPS2, and RYR1), further underscoring the importance of

calcium-related events in tumour development and progression (Giovannucci et al., 1998;

Goodman et al., 2002; Kubota et al., 1999; Raymond et al., 1999).

Pathway analysis reveals that a significant number of tumour-related molecular events are shared between members of this gene list, which was generated by comparing expression data from patients with <12 or >24 month survival. This is despite the size of the list not being statistically significant relative to the starting gene set, as determined by permutation analysis in section 4.2.4.3.

It is hypothesised that repeating this analysis using a larger cohort of patients and a more

genome-comprehensive microarray platform would yield a statistically significant,

biologically relevant gene expression EOC prognostic signature.


Table 4-11: Known molecular networks identified by Ingenuity Pathway Analysis as having a significant representation of the 27 genes differentially expressed between EOC patients of <12 or >24 months survival. ‘Focus’ genes are those present in the list of genes input into the Ingenuity program (shown below in bold).

Genes in network | Network score | No. genes overlapping | Gene ontologies significantly represented by network

ADAMTS4, AKT2, BGLAP, CARD8, CASP1, CASP4, CCNG1, CDC2, Cdc2b, CKM, COL4A1, E4F1, ESR1, ETS1, ETV3, FBLN1, FN1, HBP1, LATS2, LTB, LTBP2, MYOD1, PHB, RB1, RBBP6, REPS2, RYR1, SFN, SKIIP, SLC2A1, SP100, TAF1A, TFAP2C, TGFA, TP53 | 21 | 10 | Cancer, Cell Death, Reproductive System Disease

NAPSA, SFTPB | 2 | 1 | Cellular Assembly and Organization, Respiratory System Development and Function, Nervous System Development and Function

SMAD7, SOX13 | 2 | 1 | Cancer, Cellular Growth and Proliferation, Gastrointestinal Disease

ARHGAP22, ZNF161 | 2 | 1 | Cardiovascular System Development and Function, Cellular Growth and Proliferation, Tissue Development

GPX3, NFE2L2 | 2 | 1 | Cancer, Gastrointestinal Disease, Inflammatory Disease


Figure 4-11: Gene interaction network identified as containing a statistically significant proportion of the 27 genes differentially expressed between patients with <12 months or >24 months survival times (shown in grey). Other genes in this figure were not present in the initial list of 27 but have documented relationships to the 14 genes found in the database, based on literature and database mining using the Ingenuity system. This network is implicated in a range of cancer types (including breast, colorectal and ovarian), as well as apoptosis, cell growth, proliferation, morphology and movement. These gene ontologies are related to this specific gene network at a significance level of p<0.001.


4.3. Discussion

This chapter describes the use of cDNA gene expression data to identify genes related to EOC patient survival. Although several methods of data analysis were applied, ranging in complexity and previously used successfully by others for similar analyses, it was not possible to identify a statistically significant gene set from the particular cohort available. Several confounding factors were identified as contributing to this outcome.

4.3.1. The impact of residual disease and distribution of survival times on the identification of genes related to length of survival

One of the most significant confounding factors in the attempt to identify genes related to

patient survival was the pattern of residual disease present in this cohort, particularly in

light of the demonstrated importance of this variable on EOC prognosis (Berchuck et al.,

2004; Bristow et al., 2002; Hoskins et al., 1994). A distinct trend of increasing levels of

residual disease associated with shorter survival times could be observed in the 26

patients analysed for this chapter, although it was not statistically significant. By grouping patients into categories of nil/minimal (<1cm) or moderate/maximum (>1cm) levels of residual disease and incorporating this information into the analysis of differential gene expression between survival groups, the aim was to identify genes whose expression was related to survival independently of the level of residual disease present. Whilst a number of gene lists were obtained, none were found to be more

significant than could be expected by chance alone. The method most closely resembling

that of Spentzos et al (Spentzos et al., 2004), in which patients representing the outer

edges of the survival time distribution were compared, resulted in the gene list that was

the closest to statistical significance on the basis of permutation analysis.

Various retrospective studies have demonstrated the benefit of optimal surgical debulking for advanced stage EOC. Figures from these reports show a median survival of approximately 5 years for patients left with residual tumour nodules <1cm in diameter, compared to 3 years in cases where larger volumes of tumour remain (Hoskins et al., 1992; Hoskins et al., 1994). Microarray analysis of late stage tumours of varying debulking status has revealed a small gene signature capable of predicting this variable in approximately 75% of cases tested. This implies that the benefits obtained from optimal debulking are at least partially linked to the molecular characteristics of the individual tumour, and not only to the physical removal of tumour bulk (Berchuck et al., 2004). The model generated from this gene expression study does not fit every sample in the cohort, and as such other explanations for the prognostic significance of residual disease remain valid. These include hypotheses that smaller tumour masses are more susceptible to chemotherapy, more likely to trigger an effective immune response and less likely to develop chemoresistance (Berek, 1995; Memarzadeh et al., 2003; van der Burg et al., 1995).

Another significant hurdle presented by this dataset was its limited sample size, a factor whose influence was most likely amplified in this study by the previously discussed heterogeneity of EOC and the influence of varying residual disease levels on patient survival (Hoskins et al., 1992; Hoskins et al., 1994). In keeping with the criteria determined for selecting suitable cases of EOC from the total set available at the start of this project, many samples had to be excluded because of incomplete clinical data, particularly concerning the level of residual disease present. Several attempts were made over the course of this project to obtain more clinical information about the total cohort, with most of the additional information obtained through a collaboration with the Royal Brisbane Hospital, QLD. The age of the specimens, some collected up to 15 years ago, as well as the varying locations at which the women were treated, made this information difficult to obtain.

One of the few studies to successfully identify a set of genes with prognostic capabilities for EOC is that by Spentzos et al (Spentzos et al., 2004). This work used 68 EOC microarray profiles generated with the 12,625-feature Affymetrix U95A2 array (Affymetrix, Santa Clara, CA USA), more than twice the number of samples used in this study. A method of analysis similar to that in section 4.2.4.3 was employed, in which groups of patients with the shortest and longest survival times were compared to identify genes with significant expression differences. The substantially greater range of survival times in the Spentzos et al cohort permitted a greater separation between the short and long term survival groups, the former having survival times below 26 months and the latter 58 months or greater. It is reasonable to assume that the further apart two groups of specimens are on a continuous variable such as survival time, the greater the molecular contrast between the two classes, improving the chances of identifying differentially expressed genes.

The distribution of survival times in the cohort analysed in this chapter did not permit a separation of short and long term survival groups equivalent to that used by Spentzos et al. However, should this dataset be extended by incorporating gene expression profiles generated from specimens prospectively collected by the AOCS, which are associated with substantially more comprehensive clinical information, such separation may become possible.

4.3.2. EOC heterogeneity and its impact on the success of genomic analyses

Microarray profiling of whole tumour specimens gives a ‘global overview’ of gene expression in a given piece of tissue. However, as human tissue is a complex mixture of tissue and cell types, it can be difficult to ascertain which cell type is responsible for a particular expression pattern when profiling macroscopic specimens (Liotta and Petricoin, 2000). The heterogeneity of EOC has been described previously and is a confounding factor for any study seeking to identify its underlying molecular causes using high throughput approaches such as microarray technology, where samples must be grouped into classes of sufficient size to allow statistically valid comparisons (Hernandez et al., 1984; Pieretti et al., 2002).

Concentrating the analysis on a single histological subtype is one method of reducing heterogeneity in a dataset. Limiting the analysis in this chapter to the serous subtype achieved this and also reduced the risk of the dataset being contaminated by metastases, which are frequently of mucinous histology (Lee and Young, 2003; Seidman et al., 2003). Despite limiting the cohort to the serous type only, the importance of full pathology review of specimens intended for microarray analysis cannot be overstated. One crucial measure obtained from a review of the tissue prior to array analysis is the percentage of tumour present in the section relative to the stroma and other non-malignant tissue. As the tissue processing protocol used for this study did not incorporate microdissection, any non-tumour tissue present would be homogenised and its genetic material extracted along with that of the tumour, contributing to the overall gene expression profile generated for the specimen. In this study, all samples were judged by the reviewing pathologist to have sufficient tumour content for cDNA microarray profiling, as described in Material & Methods section 2.2.1. It has been documented that, particularly for heterogeneous tumour types such as EOC, the location within the tumour from which the biopsy for microarray analysis is taken can have a significant impact on the resulting gene expression profile (Pieretti et al., 2002). This may be because certain areas of such a tumour are more malignant, or contain more stroma or a higher level of infiltrating immune cells, each influencing the molecular signature generated (Liotta and Petricoin, 2000).

Another method that could be employed to increase the clarity of information obtained by microarray analysis is the use of microdissection and RNA amplification. Although this process is extremely time-consuming, it would allow non-tumour tissue to be excluded from the biopsy being analysed, resulting in microarray data generated from a purer cell population. One could then state with greater confidence that the gene expression signature obtained from the amplified RNA was representative of the malignant tissue rather than its surrounding stroma or nearby normal tissue (Player et al., 2004; Sambrook and Bowtell, 2003). Comparisons could then be made to microarray profiles of the microdissected stroma and other non-tumour cells in order to profile gene expression in these cells.

To date, microdissection of tumour material has not been employed in the vast majority of EOC gene expression studies (Adib et al., 2004; Donninger et al., 2004; Gilks et al., 2005; Hartmann et al., 2005; Jazaeri et al., 2003; Lancaster et al., 2004; Lee et al., 2003; Sakamoto et al., 2001; Santin et al., 2004; Schaner et al., 2003; Schwartz et al., 2002; Spentzos et al., 2004; Tonin et al., 2001). In light of a growing body of evidence concerning the importance of tumour-stroma interaction in disease progression, this represents something of a deficiency in this area of research. The future use of microdissection coupled with microarray profiling may improve understanding of the interaction between EOC and its environment.

4.3.3. Attempts to identify gene expression patterns with statistically significant relationships to length of survival

A gene or set of genes whose expression correlated with survival length in a linear

fashion would make an ideal prognostic marker. Theoretically such a marker could be

measured throughout the course of a patient’s treatment regimen to assess disease

progression and possibly contribute to any decision concerning how aggressively a

tumour should be treated. To this end, several approaches were attempted to find such

genes in this dataset. Both the quantitative trait (section 4.2.4.1) and Cox proportional

hazards (section 4.2.4.4) analyses interrogate the microarray data with respect to the

continuous variable of survival time, the hazards model permitting the censoring of data

from any patients who were still alive at the last data collection point (follow-up date in

this instance).


Unfortunately, neither analysis approach yielded a list of genes more significantly correlated with survival than could be expected by chance. However, the genes identified do show a significant correlation with the length of survival experienced by the 26 cases in this study, demonstrating the theoretical ability of this approach to identify a small number of potentially clinically important genes from a very large starting set (4,508 genes were tested after excluding non-varying genes using an unsupervised filter).
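The filter criteria used to reduce the gene set to 4,508 genes are not restated here. As a hedged illustration of what an unsupervised variation filter looks like, the sketch below keeps a gene only if at least 20% of samples differ from that gene's median log2 ratio by at least 1.5-fold. Both thresholds and the function name are assumptions for illustration, not the settings used in this study. Because the filter never uses survival information, it does not select genes on the outcome being tested.

```python
import numpy as np


def variation_filter(log2_ratios, min_fold=1.5, min_fraction=0.2):
    """Boolean mask of genes (rows) passing a simple unsupervised variation filter.

    A gene passes if at least `min_fraction` of its samples lie at least `min_fold`
    away (in either direction) from that gene's median log2 ratio.
    """
    data = np.asarray(log2_ratios, dtype=float)
    deviation = np.abs(data - np.median(data, axis=1, keepdims=True))
    varies = deviation >= np.log2(min_fold)
    return varies.mean(axis=1) >= min_fraction


ratios = np.array([
    [0.0, 0.1, -0.1, 0.05, 0.0],  # flat gene: removed
    [0.0, 1.2, -1.0, 0.9, 0.1],   # variable gene: kept
])
keep = variation_filter(ratios)
```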

The phenomenon of non-overlapping gene sets claimed to address the same clinical question, such as the absence of shared genes between the Spentzos et al and Hartmann et al prognostic lists, has been observed by a number of groups (Ein-Dor et al., 2005). It may be a reflection of the heterogeneity of ovarian cancer, a disease whose subtypes are known to have markedly different clinical courses (Hess et al., 2004; Ronnett et al., 2004; Zanetta et al., 2001). Alternatively, or perhaps additionally, it may be due to the sample sizes used in each study being inadequate to produce a truly population-representative expression signature. Other explanations may be the disparate clone sets used to create the microarrays in each study, or the method of bioinformatic analysis employed (Lossos et al., 2004). Certain statistical approaches have been shown to cause overfitting, resulting in an expression signature that is applicable only to the samples from which it was generated (Ambroise and McLachlan, 2002; Simon et al., 2003b).

Ein-Dor et al (Ein-Dor et al., 2005) have recently shown that the creation of a predictive gene list is highly dependent on the subset of patients used in the analysis, and that even small changes in the number of samples used can result in significant changes in the number and identity of genes selected by many feature selection algorithms. Another reason these authors proposed for the difficulty of identifying universal sets of genes with prognostic expression patterns is that, while a large number of genes are correlated with survival, the magnitude of these differences is often very small. This hypothesis agrees with analyses carried out in this chapter, in which the statistically significant changes in expression detected between patients of <12 or >24 months survival corresponded to very small differences in relative fold change (e.g. less than 0.1 difference in mean fold change in some instances).
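The instability described by Ein-Dor et al can be reproduced in miniature with simulated data. The sketch below is hypothetical and uses no data from this study: it simulates many genes, each with a small true difference between outcome groups, repeatedly draws subsets of patients, ranks genes by absolute Welch t-statistic, and measures how much the top-ranked lists overlap (mean pairwise Jaccard index). The defaults (13 patients per group, a 27-gene list, an effect of 0.5 standard deviations) are loosely scaled to this chapter and are assumptions, as are the function names.

```python
import numpy as np


def welch_t(a, b):
    """Welch t-statistic per gene (rows) between two groups of samples (columns)."""
    va, vb = a.var(axis=1, ddof=1), b.var(axis=1, ddof=1)
    return (a.mean(axis=1) - b.mean(axis=1)) / np.sqrt(va / a.shape[1] + vb / b.shape[1])


def top_gene_overlap(n_genes=2000, n_informative=200, effect=0.5, n_per_group=40,
                     subset=13, top_k=27, n_repeats=20, seed=0):
    """Mean pairwise Jaccard overlap of top-k gene lists from random patient subsets."""
    rng = np.random.default_rng(seed)
    shift = np.zeros(n_genes)
    shift[:n_informative] = effect  # many genes, each with a small true difference
    short = rng.normal(size=(n_genes, n_per_group)) + shift[:, None]
    long_ = rng.normal(size=(n_genes, n_per_group))
    lists = []
    for _ in range(n_repeats):
        a = short[:, rng.choice(n_per_group, subset, replace=False)]
        b = long_[:, rng.choice(n_per_group, subset, replace=False)]
        t = np.abs(welch_t(a, b))
        lists.append(set(np.argsort(t)[-top_k:]))
    pairs = [len(x & y) / len(x | y) for i, x in enumerate(lists) for y in lists[i + 1:]]
    return float(np.mean(pairs))


mean_overlap = top_gene_overlap()
```

Even though every subset is drawn from the same simulated population, the top-ranked lists share only a fraction of their genes. This is consistent with the point above that many weakly associated genes make any single short list unstable.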

Attempts to identify genes correlated with EOC survival in this chapter were most likely

hindered by the number of samples available having the requisite clinical information, the

number of genes contained on the microarray platform used and also the known

heterogeneity of EOC, even within an individual histological subtype (Hernandez et al.,

1984; Pieretti et al., 2002; Sevin and Perras, 1997).


Several methods for identifying and correcting sources of technical error described in Chapter 3 were applied to the dataset in an attempt to reduce any systematic noise that may have been obscuring true biological information. The level of spatial bias present in the dataset was determined using the MMT method and was found to be at an acceptable level, indicating that this source of error was not a significant concern. Despite this, a range of normalisation approaches were applied to the dataset and the analyses of survival times repeated; however, no increase in the statistical significance of the resulting gene lists was obtained.

4.3.4. Biological and clinical relevance of genes identified

Despite the statistical uncertainty of the gene lists identified, a number of biologically relevant genes emerged from the analyses carried out. Of the approaches used in this chapter, the t-test between patients with <12 or >24 months survival, resembling the method of Spentzos et al (Spentzos et al., 2004), came closest to generating a list of differentially expressed genes with greater significance than would be obtained by chance alone, on the basis of 1000 permutations of the dataset (p=0.019). As discussed previously, and given that this approach came closest to producing a statistically significant gene list, a larger cohort with a greater separation in survival times between classes could be expected to yield results similar to those published.
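The permutation logic behind this kind of p-value can be sketched as follows. It is a simplified illustration, not the software used in this study: group labels are shuffled, a two-sample t-test is run on every gene, and the number of genes passing a univariate threshold is compared with the observed number. The threshold (0.001), the add-one correction, the reduced number of permutations in the example, the function names and the simulated data are all assumptions.

```python
import numpy as np
from scipy.stats import ttest_ind


def count_significant(expr, labels, alpha):
    """Number of genes (rows) whose two-sample t-test p-value is below alpha."""
    _, p = ttest_ind(expr[:, labels], expr[:, ~labels], axis=1)
    return int(np.sum(p < alpha))


def gene_list_permutation_p(expr, labels, alpha=0.001, n_perm=1000, seed=0):
    """Proportion of label permutations giving at least as many significant genes."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels, dtype=bool)
    observed = count_significant(expr, labels, alpha)
    hits = 0
    for _ in range(n_perm):
        if count_significant(expr, rng.permutation(labels), alpha) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero
    return observed, (hits + 1) / (n_perm + 1)


# Simulated data: 200 genes x 20 patients, 20 genes truly shifted in the first group
rng = np.random.default_rng(1)
expr = rng.normal(size=(200, 20))
group = np.array([True] * 10 + [False] * 10)
expr[:20, group] += 3.0
observed, perm_p = gene_list_permutation_p(expr, group, n_perm=200)
```

The test asks whether the size of the list is unusual, not whether any individual gene is real, which is why a list can contain plausible genes while its overall size remains compatible with chance.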

Nine genes are differentially expressed at the most significant univariate level possible in this analysis (p<1e-07). While it cannot be ruled out that the genes in this list were selected by chance alone, some have interesting and relevant biology behind them, which lends a degree of biological plausibility to their selection. Amongst these is cisplatin resistance-associated overexpressed protein (CROP), a stress-response molecule isolated from a cisplatin-resistant cell line (Umehara et al., 2003), which is expressed at higher levels in the tumours of the short term survivors. While its precise function remains unknown, its location within the cell is altered following cisplatin treatment, suggesting a mode of action for this chemotherapeutic agent, commonly used for EOC, that involves modulation of stress-response genes including CROP (Umehara et al., 2003). Its higher expression both in cisplatin-treated patients with shorter survival times and in a cell line resistant to the same drug may represent a molecular mechanism for evading the cytotoxic effects of this treatment, leading to chemoresistant disease and a poorer prognosis.


The extracellular matrix protein latent transforming growth factor beta binding protein 2 (LTBP2) was expressed at a 3-fold lower level in short term survivors. The cell adhesive properties of this gene have been demonstrated in melanoma adhesion assays, in which cell attachment to LTBP2 was inhibited in a dose-dependent manner by antibodies against beta-1 integrin (Moren et al., 1994; Vehvilainen et al., 2003). A reduction in the expression or functionality of cell adhesion genes is frequently associated with increased proliferation of tumour cells (Wijnhoven et al., 2000), a trend consistent with the lower levels of LTBP2 observed in patients with shorter survival times.

Other genes expressed at lower levels in those patients with shorter survival times

identified by this analysis are:

ACP6, which has been demonstrated to protect cells from cisplatin-induced death and contributes to overall cell survival (Mackeigan et al., 2005);

BGLAP, which is thought to contribute to the accumulation of calcium phosphate in EOC psammoma bodies and is associated with cellular degradation (Raymond et al., 1999); and

NAPSA, for which therapeutic targeting is currently under way. In breast cancer, increased expression of this gene results in higher levels of E-cadherin (CDH1), an important cell surface molecule heavily implicated in EOC progression (Tatnell et al., 1998; Thibout et al., 1999).

The reduced expression of these genes in patients with shorter survival times agrees with the current literature concerning their molecular functions. Other genes selected by this analysis, as listed in Table 4-4, are involved in control of the cell cycle, cell adhesion and the regulation of apoptosis, all important processes in tumour growth and progression.

Amongst those genes found to have higher expression levels in those patients with shorter

survival times were:

GPX3, which protects cells against damage from oxidative stress and has

approximately 6-fold higher expression in the highly malignant clear-cell ovarian

cancer subtype relative to mucinous and serous EOC (Hough et al., 2001;

Takahashi et al., 1987);


LTB, which promotes tumour growth by interacting with the immune system and triggering angiogenesis, allowing the tumour cells to obtain the nutrients required to proliferate (Browning et al., 1993);

SCARA3, which also protects against oxidative stress, a condition under which non-malignant cells would die, by depleting levels of reactive oxygen species. The expression of this gene increases in response to elevated levels of these damaging molecules (Han et al., 1998).

Notably, in the list of 27 genes differentially expressed between short and long term survivors in this study, five genes are involved in either calcium binding (LTBP2, BGLAP, REPS2 and PCDH12) or calcium channel activity (RYR1). All of these genes except PCDH12 are expressed at higher levels in the longer term (>24 months) survival group relative to the short term group (<12 months), and although the differences are small they are highly statistically significant (p<0.0001).

Large epidemiological studies have described a significant link between levels of dietary calcium intake and rates of ovarian cancer (Goodman et al., 2002). That study noted an inverse association between the risk of EOC and the consumption of lactose, which is thought to increase calcium absorption (Gueguen and Pointillart, 2000). In summary, the data revealed that women who consume higher levels of both calcium and lactose, particularly from dairy sources, are at a significantly decreased risk of developing EOC. Dietary calcium intake has also been reported to be inversely related to breast cancer (Lipkin and Newmark, 1999) and colorectal cancer (Martinez and Willett, 1998) and positively related to prostate cancer (Giovannucci et al., 1998), suggesting an important role for this nutrient, and the molecular processes in which it is involved, in a range of malignancies.

A potential mechanism to explain the association between calcium and EOC is the requirement of calcium ions for the correct function of cellular adhesion molecules, specifically the transmembrane glycoproteins known as cadherins. Variation in calcium availability, through either dietary consumption or defective regulation of calcium-processing genes, may alter the adhesive functionality of cadherins. This in turn may lead to increased rates of tumour progression and invasion as cells lose or gain adhesive abilities that confer a malignant phenotype. The regulation of cadherin expression and function has been demonstrated to be a pivotal stage of EOC progression (Patel et al., 2003). In another microarray-based study of EOC malignant potential, a significant number of calcium channel-related genes were observed to be differentially expressed between benign, borderline and malignant ovarian specimens (Warrenfeltz et al., 2004). These genes were under-expressed in the malignant tissues relative to the other tissues profiled in that study. This agrees with the observation made in this chapter that lower calcium-related gene expression was associated with shorter survival times, possibly reflecting a more malignant EOC phenotype in these patients.

4.3.5. General conclusions

In this chapter a range of methods were used to identify genes with patterns of expression

related to EOC survival. Whilst no single list of genes was identified with more statistical

significance than could be expected by chance, a number of biologically and clinically

relevant genes were identified, particularly in the list of 27 genes found by comparing the

two groups of patients with either <12 or >24 month survival. Several of these genes were

identified as being involved in cell adhesion and calcium binding or transport, therefore

regulation of these processes is hypothesised to be important for EOC progression and

ultimately patient survival.

Significant gene lists from a number of published studies of EOC survival or malignancy were used to interrogate the microarray data generated for this study; however, none were found to reproduce the results observed in their respective published analyses. This is thought to reflect the heterogeneity of EOC, the limited genome coverage of some microarray platforms and the small cohort sizes of most EOC studies of this kind carried out to date.

RT-PCR was used to attempt technical and biological validation of a series of genes that showed a mean fold change of 2 or greater between patient survival groups and had a relevant literature base. Despite complications arising from the limited quantities of RNA available for validation purposes, good agreement was generally found between microarray and independent RT-PCR based measurements of gene expression.
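This section does not restate how RT-PCR fold changes were calculated or compared with the array data. One common approach, assumed here purely for illustration, is the comparative Ct (2^-ΔΔCt) method, which normalises each target to a reference gene and assumes roughly 100% amplification efficiency. It is followed by a rank correlation between the log2 fold changes from the two platforms. All Ct values, fold changes and the function name are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr


def ddct_fold_change(ct_target_a, ct_ref_a, ct_target_b, ct_ref_b):
    """Comparative Ct fold change of group A relative to group B (assumes ~100% efficiency)."""
    delta_a = np.mean(ct_target_a) - np.mean(ct_ref_a)  # normalise to a reference gene
    delta_b = np.mean(ct_target_b) - np.mean(ct_ref_b)
    return 2.0 ** -(delta_a - delta_b)


# Hypothetical Ct values: the target crosses threshold 2 cycles earlier in group A
fold = ddct_fold_change([22.0, 22.5], [18.0, 18.5], [24.0, 24.5], [18.0, 18.5])  # 4.0

# Hypothetical agreement check: log2 fold changes for five genes on both platforms
array_log2 = np.array([1.2, -1.5, 0.9, -2.1, 1.8])
pcr_log2 = np.array([1.6, -1.1, 1.4, -2.5, 2.2])
rho, _ = spearmanr(array_log2, pcr_log2)  # identical rank order, so rho is 1.0
```

A rank correlation is used in this sketch because the two platforms often agree on direction and ordering while differing in absolute magnitude.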


5. Molecular analysis of invasive and low malignant potential ovarian tumours

5.1. Introduction

The invasive and low malignant potential (LMP) subtypes of ovarian cancer both arise from the epithelial lining of the ovary, yet have a number of important differences. By studying the molecular differences between these two tumour types, it is hoped to increase understanding of the events responsible for EOC progression and invasion.

The defining characteristics of LMP EOC include:

Atypical cellular proliferation without stromal invasion, despite sharing other malignant characteristics such as cellular stratification and nuclear atypia

Significantly better prognosis; the 5-year survival rate for women diagnosed with

stage 1 LMP disease is in excess of 95% (compared to 30% for all EOC)

Younger age at diagnosis.

The efficacy of conservative (fertility-sparing) surgery in achieving a cure.

Arising from the same tissue as invasive cancers, LMP tumours are an excellent model to

assist in identifying the molecular basis of EOC invasion.

This chapter describes a microarray-based investigation of invasive and LMP ovarian tumours of mucinous and serous histology. It covers the pathology review of specimens and the creation of an expression-based method for confirming pathological classifications, including tissue of origin. Bioinformatic analysis was carried out on the resulting dataset to characterise the molecular differences between LMP and invasive subtypes. Relationships to other invasive/non-invasive cancer models are analysed, and gene ontology and pathway analyses are carried out to identify key processes that may control the invasive potential of EOC.

High-throughput methods such as automated RT-PCR and microarrays have been applied to a range of cancer models in recent years to advance the understanding of their molecular foundations, particularly in relation to important clinical variables (Dhanasekaran et al., 2001; Dyrskjot et al., 2003; Golub et al., 1999; Ramaswamy et al., 2001; Spentzos et al., 2004; van 't Veer et al., 2002; Zembutsu et al., 2002). One such variable is the malignant or invasive potential of a tumour, as this can influence how a patient is treated. Tumours showing early indications of a more aggressive phenotype may be treated with a broader range of therapies (radiotherapy/chemotherapy and surgery), whereas those exhibiting more benign characteristics may be curable by surgery alone. Where this is the case, the patient is spared the time, expense and significant morbidity associated with other forms of treatment.

An unexpected challenge that arose during this work was the higher-than-expected frequency of tumours metastatic to the ovary that had been diagnosed as primary EOC. During the pathology review process, the original diagnosis of several cases was queried by the reviewing pathologist, Dr Melissa Robbie. As a result of interrogating the associated clinical information, reviewing the diagnostic slides and comparing the gene expression profiles of these queried samples to a large database of other primary tumour types (Tothill et al., 2005), a proportion of the cohort had to be excluded from further analysis. The mucinous invasive type of EOC was the most frequently misdiagnosed, as also observed in a number of other studies of these tumours (Ji et al., 2002; Lee and Scully, 2000; Lee and Young, 2003; Ronnett et al., 2004).

An improved understanding of the molecular differences between invasive and non-

invasive forms of common cancer types, such as EOC, has the potential to assist

clinicians in making important treatment decisions. Exploring the functions of those

genes differentially expressed between tumour subtypes may also lead to the discovery of

specific genes that can be therapeutically manipulated to reduce the malignant potential

of a tumour, thus increasing the efficacy of other treatments (Alizadeh et al., 2000; Liotta

and Petricoin, 2000).

This chapter describes the development of a bioinformatic approach for using cDNA

microarray data, gene ontology analysis and pathway discovery to identify key molecular

events whose aberrant regulation may be responsible for clinically important phenotypic

differences. The relationship of these molecular differences to other models of cancer

progression and invasion is also investigated. This was achieved by comparing the gene expression signature of LMP or invasive EOC to other published and in-house microarray analyses of biological relevance. The studies compared include those that identified gene expression related to the invasion of breast (van 't Veer et al., 2002), prostate (Singh et al., 2002) and gastrointestinal tract (Boussioutas et al., 2003) carcinoma. In addition to these studies of other cancer types, the expression profile that distinguishes between LMP and invasive EOC was technically validated by comparison to other recently published microarray-based studies of EOC malignancy (Gilks et al., 2005; Schwartz et al., 2002).

To biologically validate the gene expression differences found, two methods were used: RT-PCR and immunohistochemistry (IHC). Both techniques were applied to samples

independent of those in the cohort used to generate the LMP/invasive expression

signature, an important step in genomic profiling of any disease type to ensure the

findings are widely applicable on a population level and not restricted to one particular

group of patients (King and Sinha, 2001; Liotta and Petricoin, 2000; Simon et al., 2003b).

Appropriate paraffin-embedded specimens of EOC, confirmed by pathology review, were

used to create two tissue microarrays (TMAs). These were used for technical and

biological validation of the microarray signature through IHC analysis with labelled

antibodies specific to the protein product of several genes identified as having differential

expression levels between the EOC subtypes.

A method for capturing and objectively analysing large collections of IHC data using commercially available image-processing and statistical software is described in the

validation section of this chapter. This technique was used to quantify the differences in

staining intensities for a range of antibodies used to identify specific proteins

corresponding to a selection of genes differentially expressed between the LMP and

invasive tumour types.


[Figure 5-1 flowchart: identify suitable cases of EOC from the AOCS cohort and Peter Mac Tissue Bank → pathology review of specimens to confirm original diagnosis → process tissue and hybridise RNA to cDNA microarrays → analyse patterns of gene expression → investigate gene ontology and pathways represented by differentially expressed genes → explore relationships to other in-house and published studies of invasive/non-invasive cancers and also EOC → technically validate expression of differentially expressed genes with RT-PCR → biologically validate expression with RT-PCR on independent samples → biologically validate expression with qIHC]

Figure 5-1: Overview of the gene expression-based analysis of LMP and invasive EOC, from identification of suitable samples through to quantitative immunohistochemistry (qIHC) on tissue microarrays. The dashed line between pathology review and analysis of gene expression indicates that these two stages were carried out in parallel. A number of samples were found to be metastatic rather than primary tumours and were excluded from the study to avoid contaminating the dataset with non-primary EOC gene expression information.


5.2. Results

5.2.1. Case selection and pathology review of suitable cases

Patients diagnosed with mucinous or serous EOC were identified from the AOCS and Peter Mac Tissue Bank databases. H&E stained sections were inspected by either Dr Melissa Robbie or Dr Paul Waring to confirm that the available tumour specimen matched the diagnosis given. As the ratio of tumour to non-tumour cells present in a specimen is important for microarray work, an objective assessment of tumour content was made. Comments from the review of each case are summarised in Table 5-1. Unless otherwise stated, percentage tumour was judged to be sufficient for microarray analysis (>50% tumour cell content, assessed by the number of tumour cell nuclei present per high-power field, as described in Materials and Methods).
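As a worked illustration of the >50% criterion, the sketch below pools tumour and total nuclei counts across several high-power fields and compares the resulting percentage with the threshold. Pooling across fields, counting total nuclei per field and the function name are assumptions made for the example; the exact scoring procedure is the one given in Materials and Methods.

```python
def tumour_content(fields, threshold=50.0):
    """Pool nuclei counts over high-power fields and test against the threshold.

    fields: list of (tumour_nuclei, total_nuclei) pairs, one per high-power field
    (a hypothetical input format). Returns (percent_tumour, sufficient).
    """
    tumour = sum(t for t, _ in fields)
    total = sum(n for _, n in fields)
    percent = 100.0 * tumour / total
    return percent, percent > threshold


# Two hypothetical fields with 60/100 and 45/100 tumour nuclei
print(tumour_content([(60, 100), (45, 100)]))  # (52.5, True)
```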

A number of samples originally classified as primary EOC were questioned by one or

both of the pathologists during the review process, based on inspection of the original

pathology report and the corresponding H&E stained section. Where possible, the full

range of diagnostic slides was called in for further information, as well as other clinical

notes on the patient available from the treating hospital. Where no further information was available or slides could not be obtained, the specimen in question was excluded from further analysis so as to avoid contaminating the dataset with poor-quality or non-primary ovarian tumour material. Identification of non-primary or metastatic tumours to the ovary is crucial for a study of microarray gene expression data. It has been demonstrated that tumour metastases maintain the expression profile of their originating tissue; therefore, failing to identify and exclude such specimens could contaminate a dataset with gene expression patterns of tissue types other than the one being investigated (Su et al., 2001). Further analysis of the relationship between metastatic disease and its tissue of origin has revealed a molecular signature that can discriminate between metastases and their primary tumours; however, the predominant gene expression profile observed in these microarray analyses was that of the primary tissue (Ramaswamy et al., 2003).

Four specimens of mucinous invasive, 20 mucinous LMP, 12 serous invasive and 19

serous LMP EOC were reviewed and profiled by cDNA microarray analysis (total no. =

55), as shown in Table 5-1.

Table 5-1: Pathology information for original EOC cohort (n=55).

Patient ID | Subtype | Invasive/LMP | Grade | FIGO Stage | Comments from reviewing pathologist
--- | --- | --- | --- | --- | ---
93.064 | Mucinous | Invasive | 1 | 1C | -
90.007 | Mucinous | Invasive | 2 | 1C | Complex, atypia, necrosis consistent with well differentiated invasive carcinoma
94.036 | Mucinous | Invasive | 3 | 3C | Omental spread noted in pathology report
94.112 | Mucinous | Invasive | 3 | | Suspicious as to LMP status. Small glands but minimal atypia, no stromal reaction
93.002 | Mucinous | LMP | | LMP | Focal serous areas
P00627 | Mucinous | LMP | 1 | LMP | -
P00784 | Mucinous | LMP | 1 | LMP | Originally listed as invasive - path report and array analysis indicate LMP. Possibly sampling/biopsy site issue
P00934 | Mucinous | LMP | 1 | LMP | Mostly benign sample, small area of LMP foci
WM223 | Mucinous | LMP | 1 | LMP | -
WM438 | Mucinous | LMP | 1 | LMP | Path review indicated metastatic colorectal disease
P00488 | Mucinous | LMP | 3 | LMP | Path report suggests sample is metastatic from appendix (hyperplastic polyp)
92.011 | Mucinous | LMP | 5 | LMP | -
93.077 | Mucinous | LMP | 5 | LMP | Original report mentioned tumour had bosselated features, i.e. rounded nodules on surface
94.030 | Mucinous | LMP | 5 | LMP | Original report mentions tumour on broad ligament
94.072 | Mucinous | LMP | 5 | LMP | -
94.080 | Mucinous | LMP | 5 | LMP | Very little tumour present on H&E slide, predominantly stroma
93.085 | Mucinous | LMP | 6 | LMP | Report says metastatic & review indicates less than 1% tumour content in H&E section
44247 | Mucinous | LMP | 9 | LMP | High grade LMP tumour - may exhibit molecular characteristics similar to invasive specimen
51026 | Mucinous | LMP | 9 | LMP | -
51030 | Mucinous | LMP | 9 | LMP | -
P00718 | Mucinous | LMP | 9 | LMP | No tumour in H&E section, necrotic cyst
P00807 | Mucinous | LMP | 9 | LMP | -
P00935 | Mucinous | LMP | 9 | LMP | Sparse tumour cells, original report states mostly benign, small LMP foci
WM439A | Mucinous | LMP | 9 | LMP | Path report indicates sample is metastatic from appendix. Deposit on uterus also noted
93.117 | Serous | Invasive | | 3B | -
86.058 | Serous | Invasive | 1 | 2C | Original pathology mentions psammoma bodies
85.064 | Serous | Invasive | 2 | 3 | -
91.007 | Serous | Invasive | 2 | 3C | -
91.039 | Serous | Invasive | 2 | 1A | Sample is endometrioid upon review of H&E stained section
91.052 | Serous | Invasive | 2 | 2B | -
93.004 | Serous | Invasive | 2 | 3B | Original report mentions existence of primary peritoneal tumour
93.001 | Serous | Invasive | 3 | 3C | -
94.017 | Serous | Invasive | 3 | 2B | Focal TCC, psammoma bodies (i.e. areas of calcification), potentially related to sarcoma
94.127 | Serous | Invasive | 3 | 3C | -
P00756 | Serous | Invasive | 3 | | -
93.131 | Serous | Invasive | 9 | 3A | -
92.014 | Serous | LMP | | LMP | -
92.018 | Serous | LMP | | LMP | -
93.007 | Serous | LMP | | LMP | -
93.079 | Serous | LMP | | LMP | -
94.046 | Serous | LMP | | LMP | -
95.006 | Serous | LMP | | LMP | -
93.073 | Serous | LMP | 0 | LMP | -
90.037 | Serous | LMP | 5 | LMP | -
91.077 | Serous | LMP | 5 | LMP | -
93.090 | Serous | LMP | 5 | LMP | -
90.063 | Serous | LMP | 9 | LMP | -
22027 | Serous | LMP | 9 | LMP | -
44232 | Serous | LMP | 9 | LMP | -
70056 | Serous | LMP | 9 | LMP | -
70057 | Serous | LMP | 9 | LMP | -
P00633 | Serous | LMP | 9 | LMP | -
WM389A | Serous | LMP | 9 | LMP | -
WM542A | Serous | LMP | 9 | LMP | -
WM578A | Serous | LMP | 9 | LMP | Potential seromucinous case. Invasive implants

5.2.2. Generation of cDNA microarray expression dataset

Fresh-frozen pathology-reviewed biopsies from selected cases of EOC were processed by

Dileepa Diyagama according to standard protocols, as previously described (Boussioutas

et al., 2003). Amplified RNA was hybridised to 10.5k cDNA microarray slides using the pooled cell line reference, as described in Materials and Methods section 2.3.3.1 and by Sambrook and Bowtell (2003). Hybridised arrays were scanned

on an Agilent Microarray Scanner BA) and the scanned images were converted to gene

expression ratios with Axon GenePix image analysis software. Individual array features

were marked or flagged as either ‘present’, ‘marginal’ or ‘absent’ based on predetermined

quality control settings, described in section 2.4.2, to exclude hybridisation artefacts and

poor quality array features from downstream analysis.

MMT scores were calculated to assess the level of spatial bias present in the microarray

data generated for this study as described in Chapter 3. All scores were in the acceptable

range of <200, as determined by calibration of this test to the Peter Mac 10.5k cDNA

platform. During this work, an online tool that facilitated the batch-wise normalisation of

microarray data using the print-tip normalisation method was released (Herrero et al.,

2003; Vaquerizas et al., 2004). This method uses lowess curve fitting to correct for bias associated with the individual printing pins used to spot the cDNA material onto the glass substrate (Sambrook and Bowtell, 2003). Over the course of this thesis, this method of spatially dependent normalisation has been more widely adopted by the microarray field than the SNOMAD method previously described (section 2.4.3). As both methods are based on similar principles, the print-tip method, described in Materials & Methods section 2.4.3.3, was selected for this chapter.
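The print-tip idea can be sketched in plain Python, as an illustration only: within each print-tip group, a lowess curve of the log ratio M = log2(R/G) on the average log intensity A = ½·log2(R·G) is fitted and then subtracted. This single-pass version omits the robustness iterations of full lowess, assumes background-corrected intensities, and is not the code of the online tool cited above; the function names and the `frac` span default are assumptions.

```python
import math
from collections import defaultdict


def lowess_fit(x, y, frac=0.4):
    """Fitted values from a single-pass (non-robust) local linear lowess of y on x."""
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))  # points in each local neighbourhood
    fitted = []
    for x0 in x:
        # Neighbourhood radius: distance to the k-th nearest point (x0 itself included).
        h = sorted(abs(xi - x0) for xi in x)[k - 1] or 1e-12
        # Tricube weights; points at or beyond radius h get zero weight.
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        # Weighted least-squares line evaluated at x0.
        fitted.append(my + slope * (x0 - mx))
    return fitted


def print_tip_normalise(red, green, tips, frac=0.4):
    """Return normalised M values, with a lowess curve of M on A fitted
    separately within each print-tip group and subtracted."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    groups = defaultdict(list)
    for i, tip in enumerate(tips):
        groups[tip].append(i)
    normalised = [0.0] * len(m)
    for idx in groups.values():
        fit = lowess_fit([a[i] for i in idx], [m[i] for i in idx], frac)
        for i, f in zip(idx, fit):
            normalised[i] = m[i] - f
    return normalised
```

Fitting each print-tip group separately is what lets the correction absorb pin-specific differences, such as one pin depositing consistently more Cy5-labelled signal than its neighbours.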


Figure 5-2: Overview of gene expression based predictions used in association with the pathology sample review process. (i) Initially each specimen of EOC (n=55) was compared to a database of nine other tumour types to identify any samples of non-primary ovarian origin. (ii) Samples confirmed as primary EOC were then analysed to confirm their histological subtype. This was a concern for specimens where the pathology report indicated the specimen contained regions of mixed histological subtype. (iii) The invasive or LMP phenotype of each microarray profile was then predicted based on LOOCV. This assumes that the majority of samples in the cohort were originally correctly classified at this level. Discrepancies can exist between the official pathology classification of a specimen and that obtained from this process because of sampling bias and heterogeneity within individual tumours. Genes were re-selected at each iteration of LOOCV, for all three levels of classification (Simon et al., 2003b).

[Figure 5-2 flowchart: (i) Predict tissue of origin, primary ovarian vs. nine primary tumour classes; ‘Other’ → exclude from study; Ovarian → (ii) Predict histological subtype (Mucinous/Serous/Other); ‘Other’ → exclude from study; Mucinous → (iiia) Predict invasive/LMP status → (iv-a) confirmed as primary mucinous LMP EOC or (iv-b) confirmed as primary mucinous invasive EOC; Serous → (iiib) Predict invasive/LMP status → (iv-c) confirmed as primary serous LMP EOC or (iv-d) confirmed as primary serous invasive EOC]

5.2.3. Creation of an EOC gene expression signature for assistance in confirmation of primary ovarian origin

In order to assist in the pathology review process and confirm some of the observations

and classifications made, predictive algorithms were trained using gene expression data

from this and other studies and applied to all cases in the study. In a hierarchical process,

samples were first analysed to confirm their primary ovarian origin, as previously

described. To achieve this, a classifier was created using gene expression data provided

by Richard Tothill representing over nine different tumour types (Tothill et al., 2005). An

overview of this process is shown in Figure 5-2.

5.2.3.1. Selection of cases for use in training set for the prediction of primary ovarian origin

To build the first classifier in the predictive pathway for this study, raw gene expression data for specimens of nine types of primary carcinoma were obtained from a parallel study at the Peter MacCallum Cancer Centre into carcinoma of unknown primary (Tothill et al., 2005). For some tumour types, more samples were available than could be used in the training process. In these cases, the samples with the highest percentage tumour and pathology review agreement with the original diagnosis were selected. A full list of the samples used in this analysis and associated pathology review comments is given in Appendix I.
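The selection rule above can be sketched as follows, purely for illustration. It interprets the rule as discarding samples whose review disagreed with the original diagnosis, then ranking the remainder of each tumour type by percentage tumour and keeping the top ones. The record fields (`id`, `tumour_type`, `percent_tumour`, `review_agrees`) and the per-type cap are hypothetical; the actual cut-offs used are not stated here.

```python
from collections import defaultdict


def select_training_samples(samples, max_per_type):
    """samples: list of dicts with hypothetical keys 'id', 'tumour_type',
    'percent_tumour' and 'review_agrees' (bool). Returns the selected IDs."""
    by_type = defaultdict(list)
    for s in samples:
        if s["review_agrees"]:  # drop cases where review disagreed with the diagnosis
            by_type[s["tumour_type"]].append(s)
    selected = []
    for group in by_type.values():
        # Highest tumour content first, then cap the number kept per tumour type.
        group.sort(key=lambda s: s["percent_tumour"], reverse=True)
        selected.extend(s["id"] for s in group[:max_per_type])
    return selected
```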

To train the first-level binary predictor, capable of separating ovarian tumours from other tumour types, two groups of samples were created: (i) 19 confirmed primary ovarian tumours from the carcinoma of unknown primary project, as described, and (ii) 115 samples representing nine other tumour types, as summarised in Figure 5-3. These included lung, breast, colorectal, gastric, renal, melanoma, uterine, SCC and pancreatic cancers, representing the most common origins of metastatic disease found in the ovary (Blaustein, 1982; Fujiwara et al., 1995; Giordano et al., 2001).


Figure 5-3: Pie chart of tumour types and sample numbers used to train a range of predictive algorithms to identify primary EOC. Tumour types were selected based on the most frequently observed origins of tumours metastatic to the ovary reported in the literature (Blaustein, 1982; Fujiwara et al., 1995; Giordano et al., 2001). Individual specimens were selected from the total pool available based on highest tumour percentage and pathology review agreement with the original diagnosis (Tothill et al., 2005).

[Figure 5-3 pie chart, samples per tumour type: Lung 20, Breast 20, Ovarian 19, Colorectal 16, Gastric 15, Renal 10, Melanoma 10, Uterine 9, SCC 9, Pancreas 6]

5.2.3.2. Algorithm training for the gene expression based prediction of ovarian vs. non-ovarian primary origin

An unsupervised data-reduction filter was first applied to the training set to remove genes

not expressed at detectable levels, or not significantly varying across the dataset relative

to the median level of variation present. After removing any gene with (i) no signal in 50% or more of the samples or (ii) a log-ratio variation p-value of ≥ 0.001 (i.e. variance not significantly greater than that of the median gene), a list of 2,907 genes was left for further supervised analyses.
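The two-step filter can be sketched in plain Python as below. It assumes that the log-ratio variation test compares (n − 1)·s²_gene / s²_median against a χ² distribution with n − 1 degrees of freedom. This is our understanding of the BRB ArrayTools filter, not a transcription of its code. The helper names and the `None`-for-missing convention are hypothetical, and the χ² tail probability uses the standard closed forms for integer degrees of freedom.

```python
import math
import statistics


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2-1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(chi/sqrt2) + 2*phi(chi) * sum_{r=1}^{(df-1)/2} chi^(2r-1) / (1*3*...*(2r-1))
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)  # standard normal density at chi
    return math.erfc(chi / math.sqrt(2)) + 2 * phi * total


def filter_genes(log_ratios, max_missing=0.5, alpha=0.001):
    """log_ratios: dict gene -> list of log ratios, with None marking a missing spot.

    Drops genes missing in >= max_missing of samples, then keeps only genes whose
    variance is significantly greater than the median gene variance (p < alpha).
    """
    observed = {}
    for gene, values in log_ratios.items():
        present = [v for v in values if v is not None]
        if len(values) - len(present) >= max_missing * len(values) or len(present) < 2:
            continue
        observed[gene] = present
    variances = {g: statistics.variance(v) for g, v in observed.items()}
    median_var = statistics.median(variances.values())
    if median_var == 0:
        raise ValueError("median gene variance is zero; variance test undefined")
    kept = []
    for gene, var in variances.items():
        df = len(observed[gene]) - 1
        if chi2_sf(df * var / median_var, df) < alpha:
            kept.append(gene)
    return kept
```

Because the test is relative to the median gene, roughly half of all genes can never pass it, which is why such a filter shrinks a 10.5k-feature array so sharply.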

Next, a range of algorithms was trained to distinguish between primary EOC and the group of nine other tumour types, using methods described in Materials and Methods section 2.4.8. After the most significantly predictive subset of these 2,907 genes had been identified by LOOCV of the training set, the trained algorithms were applied to the 54 gene expression profiles specifically generated for the comparison of LMP and invasive EOC. These data were not included in the selection of the ‘EOC:other’ predictive genes.

A combination of algorithms, implemented in the BRB ArrayTools analysis package, was used to classify each of the 54 profiles as either primary EOC or not (Simon and Lam). These algorithms were linear discriminant analysis (LDA), 1- and 3-nearest neighbour (1-NN, 3-NN) and nearest centroid classification (NCC), as described in Materials and Methods. By using multiple algorithms, the primary EOC status of each sample was predicted four independent times, and it was possible to identify the algorithm most suited to this form of classification.
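The central requirement here is that genes are re-selected within every LOOCV fold, so the held-out sample never influences which genes are used (Simon et al., 2003b). A minimal k-nearest-neighbour sketch illustrates it. As a simplification, genes are ranked by the absolute value of a Welch t statistic and the top `n_genes` kept, whereas BRB ArrayTools selects genes by a univariate significance threshold. All function names are hypothetical.

```python
import math
import statistics


def t_score(a, b):
    """Welch t statistic for one gene between two groups of samples."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / (se or 1e-12)


def select_genes(X, y, n_genes):
    """Indices of the n_genes genes with the largest |t| between the two classes."""
    first, second = sorted(set(y))
    scores = []
    for g in range(len(X[0])):
        a = [x[g] for x, label in zip(X, y) if label == first]
        b = [x[g] for x, label in zip(X, y) if label == second]
        scores.append((abs(t_score(a, b)), g))
    return [g for _, g in sorted(scores, reverse=True)[:n_genes]]


def knn_predict(train_X, train_y, sample, genes, k):
    """Majority label among the k training samples closest in Euclidean distance."""
    nearest = sorted(
        (math.dist([x[g] for g in genes], [sample[g] for g in genes]), label)
        for x, label in zip(train_X, train_y)
    )[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def loocv_knn(X, y, n_genes, k=1):
    """Leave-one-out predictions; genes are re-selected from the n-1 training
    samples in every fold, so the held-out sample never influences selection."""
    predictions = []
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = select_genes(train_X, train_y, n_genes)
        predictions.append(knn_predict(train_X, train_y, X[i], genes, k))
    return predictions
```

Selecting genes once on the full dataset and then cross-validating only the classifier would let each held-out sample help choose its own predictors, giving optimistically biased error rates.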

All four algorithms were highly accurate in assigning samples from the training set to their correct class. On average, the classification accuracy observed was

98.3%. As the gene selection process is repeated for each cycle of the LOOCV, a slightly

different number of genes may be used by the algorithm for each classification. The mean

number of genes required for these predictions was 213. A summary of the predictions

made on the training set of samples for each algorithm is shown in Table 5-2. Several

important criteria are given to evaluate the performance of each algorithm. These are:

• Sensitivity: True positive rate – the probability of predicting a true primary

ovarian sample as ‘ovarian’

• Specificity: True negative rate – the probability of predicting a non-ovarian

sample as ‘non-ovarian’


• Positive predictive value (PPV): The probability that a sample is actually primary ovarian cancer, if given an ‘ovarian’ prediction by the algorithm.

• Negative predictive value (NPV): The probability that a sample is NOT

ovarian if predicted as ‘non-ovarian’
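The four measures follow directly from counts of true and false predictions. The sketch below treats ‘ovarian’ as the positive class and plugs in counts consistent with the 1-NN row of Table 5-2 and the single 1-NN misclassification described in the text: 18 of 19 ovarian and all 115 non-ovarian training samples classified correctly. The function name is hypothetical.

```python
def performance(tp, fn, fp, tn):
    """Sensitivity, specificity, PPV and NPV, with 'ovarian' as the positive class."""
    return {
        "sensitivity": tp / (tp + fn),  # ovarian samples called ovarian
        "specificity": tn / (tn + fp),  # non-ovarian samples called non-ovarian
        "ppv": tp / (tp + fp),          # 'ovarian' calls that are truly ovarian
        "npv": tn / (tn + fn),          # 'non-ovarian' calls that are truly non-ovarian
    }


m = performance(tp=18, fn=1, fp=0, tn=115)
print({k: round(v, 3) for k, v in m.items()})
# {'sensitivity': 0.947, 'specificity': 1.0, 'ppv': 1.0, 'npv': 0.991}
```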

The closer these four values are to 1.0, the greater the accuracy of the algorithm and the lower the chance of false negative or false positive predictions being made. The 1-NN

algorithm generated only 1 misclassification from the 134 samples tested and

consequently produced the optimal sensitivity, specificity, PPV and NPV. In general, all

algorithms trialled performed with a high degree of accuracy, comparable or superior to

several published analyses of tumour origin on the basis of molecular profiling.

Table 5-2: Performance of the machine learning classifiers in predicting the primary ovarian origin of a given specimen, as determined by LOOCV.

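Classifier | Class | Sensitivity | Specificity | PPV | NPV
--- | --- | --- | --- | --- | ---
1-NN | Other | 1 | 0.947 | 0.991 | 1
1-NN | Ovarian | 0.947 | 1 | 1 | 0.991
3-NN | Other | 1 | 0.895 | 0.983 | 1
3-NN | Ovarian | 0.895 | 1 | 1 | 0.983
NCC | Other | 0.991 | 0.895 | 0.983 | 0.944
NCC | Ovarian | 0.895 | 0.991 | 0.944 | 0.983
LDA | Other | 0.991 | 0.842 | 0.974 | 0.941
LDA | Ovarian | 0.842 | 0.991 | 0.941 | 0.974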

5.2.3.3. Permutation analysis to assess the statistical significance of predictions

In order to determine whether the error rate reported by the LOOCV experiments was significantly lower than would be expected from random predictions, permutation analysis was carried out. This involved randomly permuting the class labels of the samples and repeating the entire LOOCV process. The proportion of random permutations for which the cross-validated error rate was lower than that obtained with the correct class labels was used as the significance level of each predictor. A total of 2,000 permutations of the class labels, each followed by the full predictive analysis, were carried out per algorithm to ensure adequate sampling. Gene selection was repeated for each trial. The p-value for each predictor was < 5 × 10⁻⁴ (1/2,000, the resolution limit for this number of permutations), indicating that the predictions made by these algorithms are not based on random noise within the dataset.
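A minimal sketch of the permutation procedure follows. For brevity it assumes a nearest centroid classifier on a fixed set of features; unlike the analysis described, it does not repeat gene selection within each permutation. It counts permutations whose LOOCV error is no greater than the observed error, which is the conservative convention: with a strictly-lower count, an error-free classifier would always receive p = 0. Function names are hypothetical.

```python
import random


def nearest_centroid_loocv_errors(X, y):
    """Number of leave-one-out errors for a nearest centroid classifier."""
    errors = 0
    for i in range(len(X)):
        train = [(x, label) for j, (x, label) in enumerate(zip(X, y)) if j != i]
        centroids = {}
        for label in {lab for _, lab in train}:
            members = [x for x, lab in train if lab == label]
            centroids[label] = [sum(col) / len(members) for col in zip(*members)]
        predicted = min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(X[i], centroids[lab])),
        )
        errors += predicted != y[i]
    return errors


def permutation_p_value(X, y, n_perm=2000, seed=0):
    """Observed LOOCV error count and the proportion of label permutations whose
    LOOCV error count is no greater than the observed one."""
    rng = random.Random(seed)
    observed = nearest_centroid_loocv_errors(X, y)
    as_good = 0
    for _ in range(n_perm):
        shuffled = list(y)
        rng.shuffle(shuffled)
        if nearest_centroid_loocv_errors(X, shuffled) <= observed:
            as_good += 1
    return observed, as_good / n_perm
```

With a toy set of eight samples in two tight clusters, the p-value cannot become very small, because swapping the two cluster labels wholesale also gives zero errors; meaningful resolution needs a larger cohort.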


5.2.3.4. Use of multiple algorithms and LOOCV to predict primary ovarian origin of specimens

One benefit of using multiple predictive algorithms is that each serves as an

‘independent’ attempt at classification. Prediction results can be compared to determine

whether a sample was incorrectly predicted on a single occasion or by more than one

algorithm. The misclassified samples from the training set are shown in Table 5-3, along

with the number of misclassifications. Comments based on further interrogation of the

pathology associated with these cases are also given. It is important to investigate

incorrect classifications to determine if a particular class is frequently being misclassified,

which may indicate a problem with the quality of the expression data for that class or

inadequate number of samples to generate a robust predictive signature.

These results indicate that this approach is effective for identifying samples that may have

been incorrectly diagnosed as EOC or potentially mislabelled during the experimental or

data analysis stages of the experiment.

Both samples of endometrioid uterine cancer present in the training set were predicted as EOC. This cancer arises from the endometrial lining of the uterus, and the classification of these samples as ovarian suggests that these tumours may have a higher degree of molecular similarity to EOC than to the other tumour types present.

Of the four samples of EOC assigned to the non-ovarian category, review of the pathology revealed one malignant mixed Mullerian tumour (MMMT), one endometrioid cancer and one metastatic colorectal cancer, all of which had been incorrectly assigned to the ovarian category. For the remaining incorrectly classified sample, a tumour of mucinous LMP histology, no discrepancy was found between the original pathology report and the review information.


Table 5-3: Predictions and associated pathology review comments for samples resulting in incorrect classifications during the algorithm training stage.

Sample ID | Class used for algorithm training | Details about case available at time of analysis | No. times incorrectly predicted | Comments
--- | --- | --- | --- | ---
UP415 | Other | Endometrioid uterine cancer | 1 | Only two samples of endometrioid cancer included in training set
UP421 | Other | Endometrioid uterine cancer | 2 | Only two samples of endometrioid cancer included in training set
UP075 | Ovarian | Mucinous LMP EOC | 2 | No discrepancy between pathology review and original diagnostic report noted
UP146 | Ovarian | Serous EOC | 3 | Sample appears to be a Malignant Mixed Mullerian Tumour (MMMT) of the ovary
UP165 | Ovarian | Serous EOC | 1 | Sample initially diagnosed as serous but appears to be endometrioid upon review
UP286 | Ovarian | Mucinous LMP EOC | 5 | Sample is metastatic colorectal tissue and not primary EOC on review of pathology report and H&E stained section

5.2.4. Application of the trained predictive algorithms to the invasive/LMP dataset

Following the algorithm training stage and development of the 231 gene predictor of

EOC primary origin, predictive analysis of all prospective studies for this project was

carried out. 39 of the 55 samples analysed by this method were predicted as being

primary EOC by a majority of the algorithms used (71%). The prediction details for each

specimen are given in Appendix J.
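The vote-counting used to summarise the four classifiers can be sketched as below. A sample is called primary EOC when more than half of the algorithms predict ‘ovarian’. Any sample with at least one non-ovarian call is flagged for comparison with the pathology review, which is the inclusion criterion for Table 5-4. The function name, sample IDs and calls in the example are hypothetical, not data from this study.

```python
def summarise_calls(calls):
    """calls: dict of sample ID -> dict of classifier name -> 'ovarian' or 'other'.

    Returns, per sample, whether a strict majority of classifiers called it
    'ovarian' and whether it should be flagged (at least one 'other' call).
    """
    summary = {}
    for sample, predictions in calls.items():
        n_ovarian = sum(call == "ovarian" for call in predictions.values())
        summary[sample] = {
            "majority_ovarian": n_ovarian > len(predictions) / 2,
            "flag_for_review": n_ovarian < len(predictions),
        }
    return summary


# Hypothetical example: one dissenting classifier flags a sample without
# overturning the majority call.
example = {"sample_A": {"LDA": "other", "1-NN": "ovarian", "3-NN": "ovarian", "NCC": "ovarian"}}
print(summarise_calls(example))
# {'sample_A': {'majority_ovarian': True, 'flag_for_review': True}}
```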

Those samples predicted as non-ovarian by one or more of the algorithms used are listed in Table 5-4, together with comments from the pathology review process, which was carried out completely independently of the gene expression-based analysis. Array expression ratios for the markers cytokeratin 7 and cytokeratin 20 are also given where data were present for these genes. High levels of cytokeratin 20 and low levels of cytokeratin 7 are suggestive of metastatic disease; however, immunohistochemistry is usually performed to determine their relative abundance (Ji et al., 2002; Loy et al., 1996; Nishizuka et al., 2003). The information obtained from the pathology review of the majority of these specimens agreed with their microarray-based prediction of non-primary ovarian origin, supporting the validity of gene expression-based predictions of tumour origin for metastatic disease (Ramaswamy et al., 2003; Ramaswamy et al., 2001; Tothill et al., 2005).

After the predictions of primary origin had been carried out and compared to the

information obtained from the pathology review process, it became evident that the

invasive mucinous subtype of EOC is the most frequently misclassified or most subject to

sampling bias, of those included in this analysis. Consultation of the literature confirmed

this observation (Hess et al., 2004; Lee and Scully, 2000; Lee and Young, 2003; Ronnett

et al., 2004; Seidman et al., 2003). Only one case of serous carcinoma was predicted to be

non-ovarian by the expression-based classifiers. Upon review of the H&E stained section

associated with the case it was determined to be of endometrioid histology, a subtype not

intended to be included in this study.

After excluding samples that did not pass the pathology review process, all of which were

predicted to be non-ovarian based on their expression profile, the breakdown of samples

remaining in the cohort was: mucinous invasive EOC (2), mucinous LMP (14), serous

invasive (11) and serous LMP (19), for a total number of 46 cases.

Table 5-4: Samples of LMP or invasive EOC predicted as non-ovarian by at least one algorithm. Comments from pathology review are listed with gene expression ratios for the cytokeratin markers 7 (CK7) and 20 (CK20), which are used diagnostically for identifying metastatic disease to the ovary.

Patient ID | Histological subtype | Invasive/LMP | Comment | Array CK7 / CK20 ratio(s) | Excluded from study
--- | --- | --- | --- | --- | ---
90.007 | Mucinous | Invasive | Complex, atypia, necrosis consistent with well differentiated invasive carcinoma | 3.08 / 0.74 | Yes
91.039 | Serous | Invasive | Sample is endometrioid upon review of H&E section | 0.93 / 1.48 | Yes
94.036 | Mucinous | Invasive | Omental spread indicated in path report - suggestive of metastatic disease | 1.20 / 23.37 | Yes
51026 | Mucinous | LMP | Nothing suspicious in path report or on review of H&E section | 3.64 / 1.61 |
51030 | Mucinous | LMP | High array CK20 - suggestive of colorectal metastases | 1.82 / 24.87 |
92.011 | Mucinous | LMP | Nothing suspicious in path report or on review of H&E section | 18.02 |
93.002 | Mucinous | LMP | Nothing suspicious in path report or on review of H&E section | 2.05 |
93.085 | Mucinous | LMP | Report says tumour may be metastatic and review indicates section is approximately 1% tumour content | 1.36 | Yes
94.072 | Mucinous | LMP | Nothing suspicious in path report or on review of H&E section | 3.21 |
94.080 | Mucinous | LMP | Very little tumour present on H&E slide + high array CK20 | 12.65 | Yes
P00488 | Mucinous | LMP | Path report indicates sample is metastatic from appendix | 37.58 | Yes
P00627 | Mucinous | LMP | Nothing suspicious in path report or on review of H&E section | 1.07 |
P00718 | Mucinous | LMP | No tumour in H&E section | 4.26 | Yes
P00807 | Mucinous | LMP | Nothing suspicious in path report - request CK7, CK20. High array CK20 | 6.79 |
WM438 | Mucinous | LMP | Path review indicated metastatic colorectal disease present | | Yes
WM439A | Mucinous | LMP | Path report indicates sample is metastatic from appendix | | Yes

5.2.4.1. Hierarchical clustering analysis of the 231 gene EOC expression signature

A list of 231 genes was chosen by the classifier for prediction of ‘test’ samples, based on the number of times each gene was selected in the LOOCV iterations and on its individual parametric p-value for discriminating between ovarian tumours and ‘other’ tumours of nine types. This list of genes was annotated using the UniGene database (Build #184) and is provided in Appendix K. The genes are discussed in section 5.2.4.2.

Hierarchical clustering (as described in section 2.4.4.1) using the genes identified by LOOCV allowed inspection of differences in their expression levels between ovarian and the other tumour types. The resulting cluster image is shown in Figure 5-4. A clear divide between ovarian and non-ovarian tissues can be observed, which is expected from a supervised clustering analysis. It is interesting to note, however, that despite the binary criterion used to select these 231 genes (EOC or non-EOC), the other nine tumour types have formed sub-clusters corresponding to their site of origin. Inspection of the body of the cluster figure reveals patterns of up- and down-regulation that have grouped samples of the same tumour type, particularly the breast, colorectal, uterine and renal cancers, which have formed discrete sub-clusters. This observation implies that tissue-specific patterns exist in gene expression data at several levels. By design, the largest and most significant difference in the expression of these 231 genes is between EOC and the other nine tumour types grouped as a single class. Despite this, the hierarchical cluster shows clear tissue- or site-specific patterns of expression.
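Section 2.4.4.1 is not reproduced in this chapter, so the following is only an illustrative sketch of the general technique, not the software or settings actually used: naive average-linkage agglomerative clustering of samples, with 1 minus the Pearson correlation as the distance, a common choice for microarray log-ratios. The function name `average_linkage`, the linkage rule and the distance measure are all assumptions for illustration.

```python
import numpy as np

def average_linkage(X):
    """Naive average-linkage agglomerative clustering of the rows of X
    (samples), using 1 - Pearson correlation as the distance.

    Returns a list of merges (cluster_a, cluster_b, distance), where each
    cluster is a frozenset of row indices; the last merge joins the two
    top-level branches of the dendrogram.
    """
    D = 1.0 - np.corrcoef(X)                  # rows are treated as variables
    clusters = [frozenset([i]) for i in range(len(X))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average of all pairwise distances between the two clusters.
                d = np.mean([D[a, b] for a in clusters[i] for b in clusters[j]])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], float(d)))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges
```

Each merge records the two clusters joined and their average distance; drawing the merges as a tree gives a dendrogram of the kind shown in Figure 5-4. This naive version is slow for large numbers of samples, so dedicated clustering software would be used on a real dataset.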

Inspection of the relative positioning of ovarian samples in Figure 5-4 revealed a histologically driven dendrogram structure. An expanded view of this section is shown in Figure 5-5, with the colour bar below the cluster indicating the histological subtype of each tumour. Again, the intrinsic molecular differences between the predominant EOC subtypes (serous and mucinous) appear to have created a biologically relevant sub-structure, despite these samples being treated as a single class during the gene selection and evaluation stages of the analysis.

In general, this molecular signature is capable of distinguishing EOC from the nine other

tumour types tested as well as identifying a sample as either mucinous or serous with a

high degree of accuracy.


Figure 5-4: Supervised hierarchical cluster of ovarian and nine other classes of primary carcinomas using 231 differentially expressed genes (p<0.001). Clear separation of ovarian tumours (green section of colour bar at base of cluster) from the other nine tumour types can be observed on the basis of these selected genes. Even though a binary comparison of EOC vs. non-EOC was used to generate the gene list, the non-EOC portion of the cluster groups according to tumour type. This indicates that, as well as distinguishing EOC from other tissue types, these genes have expression characteristics unique to other types of cancer.

Legend: tumour type (Lung, Ovarian, Breast, Colorectal, Melanoma, Uterine, Gastric, Pancreas, Renal, SCC); colour scale, expression ratio 0.3 to 3.0.


Figure 5-5: Expansion of EOC tree from supervised cluster of EOC-predictive genes (Figure 5-4). EOC specimens have clustered according to histological subtype even though genes were selected for differential expression between EOC and a group of nine other primary tumour types. This reflects the innate structure within the microarray data that corresponds to known clinical parameters.

Legend: EOC subtype (Mucinous, Serous, Endometrioid); colour scale, expression ratio 0.3 to 3.0.


5.2.4.2. Gene ontology analysis of the 231 gene signature predictive of primary EOC

Gene ontology analysis was carried out to investigate the functional composition of the

genes contained in the list of 231 genes selected by LOOCV. The list of 2,907 genes

generated by unsupervised data-reduction filtering was used as a reference list of genes to

provide a measure of the frequency of genes belonging to a specific ontological group.

The analysis was carried out using the EASE method, which uses a modified Fisher's exact test to determine statistical significance, as described in section 2.4.9.1.
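As a rough illustration of the kind of test involved, the sketch below computes a one-sided Fisher exact (hypergeometric) p-value for over-representation of an ontology in a gene list relative to a reference list (the 2,907 filtered genes would play that role here), and a conservative EASE-style variant that removes one gene from the list's count in the category before testing. This follows one reading of the EASE score as a penalised Fisher exact probability (Hosack et al., 2003); it is a simplified sketch, not the EASE software, and the function names are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k, n_list, n_cat, n_pop):
    """P(X >= k) for a hypergeometric X: the one-sided Fisher exact p-value
    for finding at least k category genes in a list of n_list genes drawn
    from a reference list of n_pop genes, n_cat of which are in the category."""
    hi = min(n_list, n_cat)
    tail = sum(comb(n_cat, x) * comb(n_pop - n_cat, n_list - x)
               for x in range(k, hi + 1))
    return tail / comb(n_pop, n_list)

def ease_score(k, n_list, n_cat, n_pop):
    """Conservative EASE-style score: the same test with one gene removed
    from the list's category count (k -> k - 1), which penalises categories
    supported by only a few genes."""
    return hypergeom_upper_tail(max(k - 1, 0), n_list, n_cat, n_pop)
```

For example, `ease_score(k, 231, n_cat, 2907)` would score a category with `k` of its `n_cat` reference genes appearing in the 231-gene signature.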

The biological process ‘development’ ontology is the most significantly represented gene classification in the EOC signature (EASE score = 0.00292). This second-level ontology represents genes involved in the developmental progression of an organism over time (Zeeberg et al., 2003). It contains a number of genes involved in skeletal development, as well as genes with demonstrated involvement in ovarian malignancy, including BMP5, ADAMTS4, KLK7 and KLK8 (Dong et al., 2003; Shigemasa et al., 2004). Genes in this higher-level ontology have a broad range of functions, the common link being their involvement in tissue development; they may represent pathways of progression unique to EOC.

The ontologies ‘cell adhesion’, ‘basement membrane’, ‘extracellular matrix’ and ‘binding’ represent genes, expressed by both the tumour cells and cells of the extracellular matrix, that regulate adhesion interactions characteristic of EOC development (Gardner et al., 1995; Kim et al., 2003; Patel et al., 2003; Sundfeldt, 2003). Examples include NID2 and LAMB2, both major components of the basement membrane and important regulators of tumour/stroma interaction and associated growth regulation; ADAMTS4 and other genes with matrix-degradation properties; and CD31, CD44, CHL1, CDH2 and other cell-surface molecules involved in the physical attachment of cancer cells to each other or to the extracellular matrix (Gardner et al., 1995; Martin et al., 2003; Sillanpaa et al., 2003). As observed in Chapter 4 and discussed in later sections, cell-cell and cell-matrix adhesion genes play a crucial role in EOC invasion and progression and have certain ovarian-specific properties not observed in other cells of the body. The presence of these ontologies in the 231-gene EOC signature is therefore consistent with the literature identifying them as an important class of genes in the development and progression of this disease.


The ‘transcription factor activity’ ontology (GO:0003700) contains a number of known genes whose mutation or altered expression results in tumorigenesis. These include ESR1, MYCN and WT1, which can alter DNA transcription, leading to overexpression of oncogenes or underexpression of tumour suppressor genes relative to a non-cancerous state (Bardin et al., 2004; Lee et al., 2002; Slamon et al., 1986). Whilst this is a process common to many cancers, the transcription factors from this ontology that appear in the list of 231 genes are expressed at significantly different levels in EOC relative to the nine other tumour types analysed.

Table 5-5: Gene ontologies significantly represented in the 231-gene signature of EOC. The identification of adhesion, membrane and extracellular matrix ontologies indicates that the nature or extent of extracellular interaction of EOC may be a defining characteristic.

Ontology Genes (listed in order of statistical significance for discriminating between EOC and nine other tumour types)

EASE Score

Biological process: Development

FGF18, ZNF258, PAX8, ZNF261, BMP6, CPZ, , GATA4, HOXD4, TNFAIP2, ITGA2, D8S2298E, SMCY, BST2, PLXND1, EFNB3, PPP2CB, MYH10, PLXNB1, PLXNB1, POSTN, TRO, IMP-2, NDP, BMP5, KDR, KLK7, AMHR2, ADAMTS4, KLK8, ETS2, NGFRAP1, APLP1, DLK1, PBX1, UFD1L, PITX2, , KLK5, BMP7, HOXD8, SGCB

0.00292

Biological process: Cell adhesion

LAMB2, CD44, CDH6, DCBLD2, , NCAM1, ITGA2, NID2, PPP2CB, POSTN, TRO, SSPN, PTPRU, ASTN, NPTX2, NCAM1, FLRT2, ENTPD1, APLP1, CHST10, PECAM1, TNC, CHL1, MAG, ITGA6, CDH

0.00345

Biological process: Regulation of transcription

SMARCD3, WT1, ESR1, PAX8, SALL2, GATA4, HOXD4, PEG3, PEG3, ID4, SMCY, PPP2CB, PLXNB1, PLAGL2, TLE4, ETS2, FOXF2, ZFPM2, MYCN, SP110, PBX1, PITX2, HOXD8

0.0416

Cellular component: Basement membrane

LAMB2, NID2, SSPN, EFEMP2, APLP1, SGCB

0.0116

Cellular component: Extracellular matrix

MATN2, LAMB2, NID2, POSTN, SSPN, EFEMP2, FLRT2, ADAMTS4, APLP1, TNC, SGCB

0.031


Molecular function: Binding

FGF18, DAPK1, EIF1AY, SMARCD3, MATN2, WT1, RAB11FIP5, LAMB2, SDC3, FGR, ZNF258, PNOC, CD44, ESR1, PLTP, PDZK3, PAX8, CEACAM1, ZNF261, CDH6, BMP6, SALL2, GATA4, DCBLD2, DDX3Y, HOXD4, , STX6, PEG3, PEG3, NCAM1, GAS6, , ITGA2, ID4, SMCY, NID2, LGALS3BP, TRIP6, PPP2CB, MYH10, PLXNB1, POSTN, RCN2, TRO, IMP-2, ASTN, NDP, ASS, EFEMP2, NPTX2, CA14, PLAGL2, GABRA1, , BMP5, KDR, NCAM1, FLRT2, AMHR2, AGT, ADAMTS4, KCNS3, , VAV3, ENTPD1, ARF4L, PEG10, CP, ETS2, APLP1, KIF5C, FOXF2, DLK1, PECAM1, FOLR1, TNC, MAG, ITGA6, ZFPM2, MYCN, SP110, PBX1, PRKCI, PITX2, IQGAP2, CDH2, BMP7, LPHN2, HOXD8

0.0259

Molecular function: Peptidase activity

CPZ, ST5, , KLK7, ADAMTS4, KLK8, NAALAD2, BAP1, PRSS23, UFD1L, KLK5

0.0382

Molecular function: Transcription factor activity

WT1, ESR1, PAX8, SALL2, GATA4, HOXD4, PEG3, PLXNB1, PLAGL2, , ETS2, FOXF2, MYCN, PBX1, PITX2, HOXD8

0.049

5.2.5. Gene expression based prediction of EOC histological subtype

Samples that had been confirmed as primary ovarian in origin by both pathology review and microarray analysis were then analysed to confirm their histological subtype. Whilst the distinct histological features of the mucinous and serous EOC subtypes mean they are rarely misdiagnosed, the heterogeneity of EOC has the potential to cause significant discrepancy between the cellular composition of the specimen used for microarray analysis and that of the specimen used for sectioning, IHC and pathological review.

Using LOOCV, the subtype of each specimen was predicted based on the expression of genes identified as differentially expressed between mucinous and serous tumours in the remainder of the dataset. For each iteration of the cross-validation process, gene selection was repeated to ensure that information in the sample being predicted did not bias the identification of predictive genes. Samples of mucinous or serous EOC from Tothill et al. (2005) were also included in the analysis to strengthen the predictions by increasing the sample size of each subtype.
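The key point of the procedure above is that gene selection happens inside each cross-validation fold. A minimal sketch of that idea is shown below, assuming a samples-by-genes matrix of log-ratios, a 0/1 subtype label, genes ranked by a Welch t statistic, and a nearest centroid classifier (one of the four algorithms used; DLDA and k-NN are not shown). The fixed number of selected genes, `n_genes`, is a hypothetical parameter; the analysis in this chapter selected genes by statistical significance rather than by a fixed count.

```python
import numpy as np

def welch_t(X, y):
    """Welch t statistic per gene (column of X) for binary labels y in {0, 1}."""
    a, b = X[y == 0], X[y == 1]
    num = a.mean(axis=0) - b.mean(axis=0)
    den = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return num / np.where(den == 0, np.inf, den)   # constant genes get t = 0

def loocv_nearest_centroid(X, y, n_genes=10):
    """Leave-one-out predictions with gene selection repeated inside every fold,
    so the held-out sample never influences which genes are used to classify it."""
    preds = np.empty_like(y)
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        Xtr, ytr = X[train], y[train]
        # Select genes using the training samples only.
        genes = np.argsort(-np.abs(welch_t(Xtr, ytr)))[:n_genes]
        # Class centroids (row 0 = class 0, row 1 = class 1) over the selected genes.
        centroids = np.array([Xtr[ytr == c][:, genes].mean(axis=0) for c in (0, 1)])
        dists = np.linalg.norm(centroids - X[i, genes], axis=1)
        preds[i] = np.argmin(dists)   # index of nearest centroid = predicted class
    return preds
```

The cross-validated accuracy is then the fraction of `preds` equal to `y`; selecting genes once on the full dataset before cross-validation would instead give an optimistically biased estimate.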


The endometrioid subtype was excluded from this stage of the analysis because too few samples of this class were available for adequate representation, and because this project focuses on the mucinous and serous histological subtypes.

Binary predictors were again created using the same four algorithms described for the prediction of ovarian vs. non-ovarian primary origin. From the list of 2,907 genes generated previously, a final classifier comprising 618 genes was obtained. Genes were included in the final classification set according to the proportion of LOOCV iterations in which they were identified as being differentially expressed at a significant level.
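The final gene set was assembled from the proportion of LOOCV iterations in which each gene was selected. A small sketch of that bookkeeping follows, assuming each fold's selected genes are available as a list; the cut-off `min_fraction` is hypothetical, as the threshold actually used is not stated here.

```python
from collections import Counter

def stable_genes(fold_selections, min_fraction):
    """Return, sorted, the genes selected in at least min_fraction of folds.

    fold_selections: one iterable of gene identifiers per LOOCV iteration.
    """
    n_folds = len(fold_selections)
    # set() guards against a gene being listed twice within one fold.
    counts = Counter(g for selected in fold_selections for g in set(selected))
    return sorted(g for g, c in counts.items() if c / n_folds >= min_fraction)
```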

In summary, the predictions from three of the four algorithms used (the exception being the 1-kNN method) agreed with the original classification for 44 of the 46 cases remaining in the cohort at this stage of the analysis. This confirmed that the correct histological classification was given to the majority of cases at the time of their original diagnosis, and also that the tumour specimen used for microarray profiling was consistent with that used for pathological review. Samples for which the predicted histology disagreed with that given at the time of diagnosis are shown in Table 5-6.

Sample BRA02 was originally diagnosed and included in the study as a mucinous LMP specimen; however, it was observed to cluster more tightly with the serous EOC tumours in the hierarchical clustering analysis shown in Figure 5-4. During LOOCV, BRA02 was also predicted as being of serous histology, casting doubt on the original diagnosis or highlighting a potential sampling bias.

Review of the pathology record and H&E stained section associated with sample BRA02 revealed the presence of focal serous areas within the predominantly mucinous tumour. The gene expression profile of this specimen is therefore likely to contain elements of both mucinous and serous EOC cells, the proportion of each depending on the composition of the tumour biopsy used for microarray profiling.

As a result, specimen BRA02 was excluded from further analysis because of the strength

of the discrepancies observed between the original diagnosis and subsequent pathology

review and microarray analysis.

The other sample with a discrepancy between the diagnosed and predicted histological subtype was BRA24. This tumour was diagnosed as mucinous EOC and predicted as such by three of the four classifiers used (Table 5-6). This specimen clustered immediately adjacent to the poorly differentiated serous sample UP149, as shown in Figure 5-5, which may account for the serous prediction made by the 1-kNN classifier. No unusual features were noted during the pathology review of BRA24, and the majority of gene expression predictors were in agreement with the original diagnosis. In light of these observations, this sample was not excluded from the cohort.

Table 5-6: Samples with incorrect histological subtype predictions from one or more classifiers

Array ID | Class label | Number of genes in classifier | DLDA | 1-kNN | 3-kNN | NCC
BRA02 | Mucinous | 632 | NO | NO | NO | NO
BRA24 | Mucinous | 596 | YES | NO | YES | YES
(DLDA, 1-kNN, 3-kNN, NCC: agreement of each classifier with the original diagnosis)

5.2.6. Confirmation of LMP/invasive status with gene expression prediction analysis

Following the exclusion of two samples based on the pathology review and array prediction of histological subtype, 45 samples remained in the cohort. These included 2 mucinous invasive, 13 mucinous LMP, 11 serous invasive and 19 serous LMP tumours. Next, the LMP or invasive status of each specimen was confirmed by gene expression based prediction, using the same method as for the prediction of histological subtype. The two mucinous invasive samples were both predicted as LMP. This most likely reflects the inadequate number of tumours of this class in the dataset for ANOVA-based gene selection and predictive analysis, rather than true misdiagnoses.

Even prior to obtaining this result, it was apparent that the number of true mucinous invasive samples available was inadequate for a balanced comparison between invasive and LMP EOC within the two histological subtypes. As a result, it was decided to focus this study on the serous subtype only, for which 11 invasive and 19 LMP reviewed and diagnosis-confirmed samples were available. Mucinous EOC is increasingly being regarded as a separate entity from the serous type of ovarian cancer on the basis of a growing body of molecular and clinical evidence (Gilks, 2004; Hess et al., 2004; Lassus et al., 2001).


5.2.7. Identification of differentially expressed genes between serous LMP and serous invasive EOC

To identify those genes with robust patterns of differential expression between the serous

LMP and invasive tumours, the Significance Analysis of Microarrays (SAM) method was

used (Tusher et al., 2001). An unsupervised filter of log-ratio variation p-value < 0.001

and fewer than 50% missing values was again applied to the normalised data, leaving

2,965 genes available for SAM testing.

A total of 1,302 genes were identified as having significant differential expression between invasive and LMP serous EOC. The median false discovery rate among these 1,302 genes was 0.0088, indicating that fewer than 1% of the genes in this list are expected to be false positives. A SAM plot is shown in Figure 5-6 and reveals a balanced distribution of up- and down-regulated genes in the resulting list. Approximately 13% of the entire microarray and approximately 44% (1,302 of 2,965) of the unsupervised-filtered gene list were therefore identified as differentially expressed between these tumour subtypes. This indicates the substantial variation that exists at the molecular level between these classes of EOC.
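To make the SAM statistic and its false discovery rate more concrete, the sketch below computes the SAM relative difference d = (mean difference) / (s + s0) for each gene and estimates the median FDR from label permutations at a single symmetric cut-off on |d|. This is a simplification under stated assumptions: the SAM software chooses s0 from the data and sets thresholds from a plot of observed against expected d (the Δ parameter), whereas here `s0`, `cutoff` and `n_perm` are fixed, hypothetical values.

```python
import numpy as np

def sam_d(X, y, s0):
    """SAM relative difference per gene: d = (mean_1 - mean_0) / (s + s0),
    where s is the pooled standard error and s0 a small fudge factor that
    stops genes with tiny variance from dominating the list."""
    a, b = X[y == 0], X[y == 1]
    n0, n1 = len(a), len(b)
    pooled_var = ((n0 - 1) * a.var(axis=0, ddof=1)
                  + (n1 - 1) * b.var(axis=0, ddof=1)) / (n0 + n1 - 2)
    s = np.sqrt(pooled_var * (1.0 / n0 + 1.0 / n1))
    return (b.mean(axis=0) - a.mean(axis=0)) / (s + s0)

def sam_fdr(X, y, cutoff, s0=0.1, n_perm=200, seed=0):
    """Call genes with |d| >= cutoff and estimate the median FDR: the median
    number of genes passing the same cut-off under permuted class labels,
    divided by the number of genes called with the true labels."""
    rng = np.random.default_rng(seed)
    called = np.abs(sam_d(X, y, s0)) >= cutoff
    n_called = int(called.sum())
    if n_called == 0:
        return called, 0.0
    null = [int(np.sum(np.abs(sam_d(X, rng.permutation(y), s0)) >= cutoff))
            for _ in range(n_perm)]
    return called, min(1.0, float(np.median(null)) / n_called)
```

Raising `cutoff` shrinks the called list and usually lowers the estimated FDR, which is the trade-off the SAM threshold controls.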

5.2.7.1. Visualisation of expression differences between serous LMP and invasive EOC

Hierarchical clustering and principal component analysis were used to visualise the degree of difference between these classes of tumours on the basis of the genes identified by SAM analysis. The hierarchical cluster is shown in Figure 5-7, in which the clear divide between these samples can be observed, as well as the balance between up- and down-regulated genes selected for each class.

Principal component analysis (PCA) was carried out to further visualise the difference between LMP and invasive serous tumours represented by these genes. This method summarises the variation in gene expression values in a small number of ‘principal components’, each accounting for a percentage of the total variation present. By plotting the first three principal components, the samples being investigated can be visualised in three-dimensional space, as shown in Figure 5-8, where the extent of difference between these tumour classes can again be clearly observed. Visualisation techniques such as these help to gauge the extent of the divide between two classes of samples and to identify misclassified samples.
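Principal component scores such as those plotted in Figure 5-8 can be obtained from a singular value decomposition of the gene-centred data matrix. A minimal numpy sketch follows, assuming samples are rows and genes are columns; whether the original analysis also scaled each gene is not stated, so no scaling is applied here, and the function name is illustrative.

```python
import numpy as np

def pca_scores(X, k=3):
    """Scores of samples (rows of X) on the first k principal components,
    computed by SVD of the gene-centred matrix, plus the fraction of total
    variance explained by each of those components."""
    Xc = X - X.mean(axis=0)                 # centre each gene (column)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :k] * S[:k]               # equivalent to Xc @ Vt[:k].T
    explained = S ** 2 / np.sum(S ** 2)     # singular values are sorted, largest first
    return scores, explained[:k]
```

Plotting the three columns of `scores` against each other, coloured by class, gives a three-dimensional view of the kind described above.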


Figure 5-6: SAM plot for selection of genes differentially expressed between serous LMP and invasive EOC. Genes indicated by red dots above the threshold (dashed) line represent those overexpressed in the invasive EOC cases relative to the LMP tumours. Green dots indicate genes with lower expression in invasive EOC than in LMP tumours. 1,302 differentially expressed genes were identified by this approach.


Figure 5-7: Hierarchical clustering of 1302 SAM-selected genes identified as differentially expressed between serous LMP and invasive EOC. A slightly larger proportion of the genes identified by SAM are upregulated in the serous invasive tumours relative to the proportion upregulated in LMP samples.

Legend: histology (Serous invasive, Serous LMP); colour scale, expression ratio 0.3 to 3.0.


Figure 5-8: Principal component analysis of SAM-identified genes differentially expressed between serous LMP and invasive EOC. This three-dimensional view of the three most significant principal components reveals a clear divide between these subtypes of EOC on the basis of the differentially expressed genes identified.


5.2.7.2. Gene ontology analysis of 1302 genes differentially expressed between serous LMP and invasive EOC

Gene ontology analysis was carried out as previously described using the 1302 genes

differentially expressed between LMP and invasive serous EOC. The EASE method was

used for ontology analysis as it takes into account the composition of the array platform

used, the size of each ontology classification and also the potential for co-expression of

genes in computing the significance scores. Table 5-7 lists the classifications with EASE

scores < 0.05. The analysis was carried out to three levels of the gene ontology hierarchy

as recommended by Hosack et al (Hosack et al., 2003).

Inspection of the significantly represented gene ontologies revealed that a high proportion

of the genes differentially expressed between LMP and invasive EOC are involved in

regulation of the immune response, control of the cell cycle, as well as cell proliferation,

movement and adhesion. The cellular localisation most significantly represented suggests

that a large number of these differentially expressed genes are membrane bound or

expressed in the extracellular matrix.

The most significant biological processes distinguishing serous LMP and invasive EOC involved genes that regulate the immune system. This gene ontology includes INHBA, which regulates cell proliferation, has a tumour-suppressing role and facilitates TGFβ-mediated immunosuppression in thymocytes (T-cell precursors) (Ying and Becker, 1995); CXCL9, a chemotactic protein for activated T-cells whose presence in EOC has been shown to have significant prognostic value (Zhang et al., 2003); and GAGEB1, which codes for a specific antigen recognised by cytolytic T-cells (Van den Eynde et al., 1995). Genes in this ontology are upregulated in the invasive tumours relative to the LMP type and represent one aspect of the body’s own response to the detection of invading tumour cells.

The ontologies representing genes involved in mitosis and proliferation are also highly significant by EASE analysis. These genes are again predominantly upregulated in the invasive tumours, reflecting the faster growth rate of this subtype. Differentially expressed genes representing these ontologies include CCNB2, which is essential for control of the cell cycle at the G2/M transition and whose overexpression can lead to chromosomal instability (Sarafan-Vasseur et al., 2002); ANAPC7, a subunit of the anaphase-promoting complex, a ubiquitin ligase that controls progression through mitosis and the G1 phase of the cell cycle (Sarafan-Vasseur et al., 2002); and PTTG1, which blocks segregation of chromosomes during mitosis and has also been shown to negatively regulate the transcriptional and related apoptotic activity of TP53, a tumour suppressor gene implicated in many cancers (Zhang et al., 1999).

Another distinct theme in the 1,302 differentially expressed genes is represented by ontologies of cell-cell and cell-matrix adhesion genes. Reflecting the cellular location of many of these genes, the cellular component ontologies ‘integral to plasma membrane’ and ‘extracellular matrix’ are also highly statistically significant. Genes representing these categories include FN1, a component of the extracellular matrix and a prognostic EOC marker with an important role in the attachment of tumour cells to the mesothelium (Franke et al., 2003); MSLN, which has recently been shown to bind the cell-surface EOC marker CA-125 and mediate cell-matrix adhesion (Rump et al., 2004); CLDN10, which plays a major role in tight junctions, a type of cell adhesion that serves as a physical barrier preventing solutes and water from passing freely through the space between epithelial or endothelial cell sheets (Kubota et al., 1999); and finally CDH11, one of a large number of calcium-dependent cadherins differentially expressed between invasive and LMP tumours, which is required for homophilic cell adhesion (Tanihara et al., 1994).

Table 5-7: Significantly represented gene ontologies in the 1,302 genes differentially expressed between invasive and LMP EOC.

Biological process ontologies | No. genes | EASE score
Immune response | 96 | 0.000089
Mitosis | 28 | 0.000169
Cell adhesion | 81 | 0.000204
Response to biotic stimulus | 109 | 0.000239
Skeletal development | 23 | 0.000289
Cell proliferation | 129 | 0.000469
Complement activation | 10 | 0.00144
Muscle contraction | 20 | 0.00812
Cell motility | 38 | 0.0188
Cell-matrix adhesion | 12 | 0.0324

Cellular component ontologies | No. genes | EASE score
Integral to plasma membrane | 147 | 0.000000895
Chromosome, pericentric region | 9 | 0.00331
Extracellular matrix | 38 | 0.0119

Molecular function ontologies | No. genes | EASE score
Transmembrane receptor activity | 82 | 0.00018
Glucuronosyltransferase activity | 8 | 0.000609
Copper ion binding | 8 | 0.0109
Growth factor binding | 11 | 0.0111
MHC class II receptor activity | 6 | 0.0226
Extracellular matrix structural constituent | 14 | 0.0267
Exopeptidase activity | 12 | 0.027
Ligand-dependent nuclear receptor activity | 12 | 0.0379
Protease inhibitor activity | 16 | 0.0384
Chymotrypsin activity | 13 | 0.0431


5.2.7.3. Identification of differentially expressed genes significantly representing known molecular (KEGG) pathways

The KEGG database is a comprehensive collection of biological pathways created from studies of gene-gene interactions in various cellular processes (Kanehisa, 1997; Kanehisa and Goto, 2000). It presently contains over 24,000 molecular pathways, onto which gene expression data can be overlaid to gain insight into the outcome of microarray experiments. Using statistical approaches similar to those described for the gene ontology analyses, the list of genes differentially expressed between serous LMP and invasive EOC was queried against the KEGG database and pathways with statistically significant representation were identified.

The cell-cycle pathway was amongst those represented with high significance (p = 2.81E-5) and contains a large number of genes that were upregulated in the invasive tumours compared with LMP tumours (Appendix L). Aberrant expression of genes involved in regulating the cell cycle has been noted in a wide range of tumour types, and their upregulation in invasive EOC corresponds to the known difference in growth rate and speed of disease progression previously described for this subtype (D'Andrilli et al., 2004). Overexpression of cyclin D1, for example, has been shown to correlate with chromosomal instability in breast cancer (Lung et al., 2002), although other research suggests that distinct pathways of cyclin activation and effect exist in breast and ovarian cancer (Courjal et al., 1996; D'Andrilli et al., 2004). The mitotic checkpoint control gene MAD2 is also upregulated in this pathway. Studies in ovarian cancer cell lines have demonstrated that steady-state amounts of this molecule are required for cells to maintain proper control of the replication process (Wang et al., 2002).

The complement and coagulation cascade pathway contains several differentially expressed genes previously linked to ovarian cancer invasion and metastasis, including thrombin/F2 (Wilhelm et al., 1998) and the PLAU/PLAUR combination (Konecny et al., 2001; van der Burg et al., 1996). The complement pathway is a crucial part of the body's immune and inflammatory response to a tumour (Smith and Oi, 1984), and this response has been identified as a prognostic factor for EOC (Zhang et al., 2003).

Another significantly represented KEGG pathway in the EOC LMP/invasive signature is the cytokine-cytokine receptor interaction pathway (p = 5.28E-5). It contains chemokines such as CXCL9 and CXCL10, molecules also involved in the immune and inflammatory responses. These genes function particularly in the recruitment of T-cells, which can infiltrate a malignancy and assist the body's ability to slow or halt its uncontrolled growth by engaging other components of the immune response (Zhang et al., 2003).

5.2.8. Molecular pathway analysis of the invasive and LMP EOC gene expression signature

Pathway analysis is an emerging technique for mining existing databases of biological and clinical knowledge to add value to microarray data. This method involves the use of large databases created by applying intelligent text-mining algorithms to data sources such as PubMed. Sources such as this contain decades of medical, clinical and molecular information; however, much of this information relies on human interpretation of long strings of text (Wheeler et al., 2003). Applying these algorithms to such data sources has generated an index of over 500,000 biological interactions. These data exist in a proprietary format (ResNet) that can be queried with standard database protocols (Daraselia et al., 2004). The pathway analysis tool PathwayAssist (Ariadne Genomics, USA) was used to explore interactions between the 1,302 genes identified by statistical analysis of the serous LMP and invasive EOC profiles generated for this study (Nikitin et al., 2003).

Initially, the entire list of genes differentially expressed between LMP and invasive serous EOCs was used to interrogate the ResNet database and generate a network of interactions. The analysis was restricted to direct interactions between ‘nodes’ (i.e. genes) and to genes coding for known Homo sapiens proteins. Despite these restrictions, the network returned was too large to display and interpret with the available computing resources (data not shown). This likely reflects the extensive representation in ResNet of the dominant gene ontologies in the 1,302 differentially expressed genes, namely cell cycle regulation, proliferation and the immune response, as described in section 5.2.7.2.

5.2.8.1. Gene ontology-based filtering of the LMP and invasive EOC gene expression signature

In order to explore the key biological processes represented by the LMP and invasive EOC signature other than cell cycle regulation, proliferation and the immune response, a custom database query was written to identify and exclude genes implicated in these processes. This query was applied in Microsoft Access to a UniGene-annotated (Build #184) version of the gene list. The details of the query are shown in Appendix M.
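The Access query itself is given in Appendix M. Purely as an illustration of the same kind of key-word exclusion, a Python sketch might look like the following; the function name and the keyword list in the usage note are hypothetical and do not reproduce the query actually used.

```python
def exclude_by_keywords(annotations, keywords):
    """Keep genes whose annotation text mentions none of the keywords.

    annotations: dict mapping gene symbol -> free-text annotation
                 (e.g. concatenated gene ontology terms)
    keywords:    terms to exclude, matched as case-insensitive substrings
    """
    lowered = [k.lower() for k in keywords]
    return [gene for gene, text in annotations.items()
            if not any(k in text.lower() for k in lowered)]
```

For example, `exclude_by_keywords(annotations, ["cell cycle", "proliferation", "immune"])` would drop any gene whose annotation mentions one of those phrases. Substring matching is deliberately broad, which is why a manual review of the remaining genes, as described below, is still useful.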


This filtering resulted in a list of 142 genes with functions other than cell cycle regulation, proliferation and immune system activity. These are shown in hierarchical cluster format in Figure 5-9 and listed with annotations in Appendix P. The expression of this 142-gene subset remains highly distinct between the tumour subtypes.

The large decrease (89%, from 1,302 to 142 genes) in the number of differentially expressed genes remaining after excluding those involved in the cell cycle, proliferation and the immune response is consistent with the degree of over-representation of these processes observed in the EASE analysis of the total list. The annotated set of 142 remaining genes was then inspected manually alongside the hierarchical clustering output. This revealed that a large proportion of this subset of the LMP and invasive EOC signature is involved in cell-cell or cell-matrix adhesion processes. Judging by the colour intensity of the hierarchical cluster, these adhesion-related genes also appeared to be differentially expressed at extreme levels between the EOC subtypes. Gene ontology analysis of the 142-gene subset, again using EASE, confirmed this observation, with p-values for the over-representation of adhesion-related ontologies below 0.001.

This list of 142 genes was then used for pathway analysis and discovery to explore

potential interactions between the individual members and identify other molecular

events that may be differentially regulated between these EOC subtypes.


Figure 5-9: Hierarchical cluster of 142 genes remaining after key-word filtering. Columns indicated by the red and blue sections of the colour bar correspond to invasive and LMP tumours, respectively. Yellow indicators on the right correspond to genes in the cell-cell adhesion or cell-matrix adhesion gene ontologies. Gene names and mean expression ratios per class are given in Appendix P and on the attached CD-ROM.

Legend: histology (Serous invasive, Serous LMP); colour scale, expression ratio 0.3 to 3.0.


5.2.8.2. Pathway analysis of 142 gene subset of the invasive and LMP EOC expression signature

Using the PathwayAssist interface to the ResNet database of molecular interactions, all linkages between the 142 ontology-filtered genes differentially expressed between LMP and invasive EOC were determined. Genes without a connection to the main network were removed, and the network was arranged for optimal viewing of the relationships present. Each linkage was manually checked by reading the PubMed abstract from which it was obtained, to confirm that the natural language processor had interpreted the information correctly. The final result is shown in Figure 5-10.
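Discarding genes without a connection to the main network amounts to keeping the largest connected component of the interaction graph. A minimal sketch in plain Python follows; the edges are a subset of the interactions listed in Table 5-8, plus a hypothetical unconnected pair (GENE_X, GENE_Y) added only to show what is removed. PathwayAssist's own filtering may differ in detail.

```python
from collections import defaultdict, deque


def largest_connected_component(edges):
    """Node set of the largest connected component of an undirected graph."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    seen, best = set(), set()
    for start in list(graph):
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        if len(component) > len(best):
            best = component
    return best


edges = [
    ("MSLN", "CA-125"), ("LGALS1", "CA-125"), ("LGALS1", "FN1"),
    ("CDH1", "CDH6"), ("CDH1", "CDH11"), ("CDH1", "VIL2"), ("CDH1", "PLAU"),
    ("CDH1", "CA-125"), ("CDH1", "CD44"), ("FN1", "CD44"), ("FN1", "PLAU"),
    ("FN1", "LCP1"), ("FN1", "CD9"), ("CD44", "TNFAIP6"), ("CD44", "CD9"),
    ("GENE_X", "GENE_Y"),  # hypothetical pair with no link to the main network
]
main_network = largest_connected_component(edges)
removed = {gene for edge in edges for gene in edge} - main_network
print(sorted(removed))  # ['GENE_X', 'GENE_Y']
```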

For selected genes from the network created with PathwayAssist, Table 5-8 shows expression box plots, a description of their potential function in EOC malignancy and the other network members with which they are known to interact. The network members are coloured according to their mean expression level in invasive EOC specimens (up-regulation: red; down-regulation: green).

A hierarchical cluster of genes contained in the network generated was also produced to

visualise the relative consistency of each gene with the two EOC classes (Figure 5-11).
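The linkage rule and distance measure used for the clusters in Figures 5-9 and 5-11 are not specified in this section. As an illustrative sketch only, assuming average linkage on a correlation distance (1 - Pearson r), samples and genes can be clustered with SciPy as follows. The data are toy log2 ratios, not study data.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Toy data: rows = genes, columns = tumour samples, values = log2 expression ratios.
log_ratios = np.array([
    [2.0, 1.8, -1.5, -1.7],
    [1.5, 1.7, -2.0, -1.9],
    [-1.0, -1.2, 1.1, 0.9],
    [-0.5, -0.7, 0.8, 1.0],
])

# Cluster samples (columns): correlation distance = 1 - Pearson r between profiles.
sample_tree = linkage(pdist(log_ratios.T, metric="correlation"), method="average")
sample_groups = fcluster(sample_tree, t=2, criterion="maxclust")

# Cluster genes (rows) the same way, e.g. to order the rows of a heat map.
gene_tree = linkage(pdist(log_ratios, metric="correlation"), method="average")
gene_groups = fcluster(gene_tree, t=2, criterion="maxclust")
print(sample_groups, gene_groups)
```

Cutting each tree into two groups recovers the two sample classes and the two co-regulated gene sets built into the toy data.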


Figure 5-10: Gene expression network created from pathway analysis of the 142 keyword-filtered genes differentially expressed between invasive and LMP EOC. Regulation of cell-cell and cell-matrix adhesion is a primary function of the majority of these interacting genes, which show significant differences in expression between LMP and invasive EOC. Genes shown in grey shading were not present in the dataset after filtering, but were added based on literature information.

[Figure 5-10 legend: expression ratio colour scale from 0.3 to 3.0; grey shading, gene not present on microarray; network elements: protein, extracellular protein, receptor; connection types.]


Table 5-8: Details of key genes from the differentially expressed LMP/invasive network. Box plots of log2 expression ratios from the microarray dataset are shown (left box: invasive EOC; right box: LMP EOC). All genes were differentially expressed at the univariate level (p<0.001).

Gene Interacts with

Summary of purported involvement in EOC development and/or progression

MSLN (Mesothelin). Interacts with: CA-125.
- A cell surface molecule expressed by the mesothelial lining and by many tumour cells.
- CA-125 and MSLN are co-expressed in advanced grade ovarian adenocarcinoma.
- CA-125 initiates cell attachment to the mesothelial epithelium via binding to mesothelin, contributing to the metastasis of ovarian cancer to the peritoneum.
(Gilks et al., 2005; Hippo et al., 2001; Lu et al., 2004; Rump et al., 2004; Schaner et al., 2003)

LGALS1 (Galectin 1). Interacts with: CA-125, FN1.
- LGALS1 binds to CA-125.
- It is a component of the extracellular matrix and is implicated in the regulation of cell adhesion, apoptosis and tumour progression.
- LGALS1 export to the cell surface may be regulated by CA-125 activity.
(Seelenmeyer et al., 2003)

CDH1 (E-cadherin). Interacts with: CDH6, CDH11, VIL2, BAIP1, PLAU, CA-125, CD44.
- The cell-adhesion molecule CDH1 plays an important role in maintaining tissue integrity.
- Disappearance or impaired function of CDH1 has often been associated with tumour formation and invasion in vivo and in vitro.
- In normal ovaries, the expression of CDH1 is limited to inclusion cysts or deep clefts lined with OSE, whereas no CDH1 staining of the OSE is detected at the ovary surface.
- Benign and borderline EOC tumours uniformly express CDH1.
(Auersperg et al., 1999; Davies et al., 1998; Hiscox and Jiang, 1999; Sasaki et al., 1999; Sundfeldt et al., 1997; Xu and Yu, 2003)

[Box plots of log2 expression ratio by class (invasive, LMP) for MSLN, LGALS1, CDH1 and CD44.]



FN1 (Fibronectin). Interacts with: CD44, LGALS1, PLAU, LCP1, CD9.
- Involved in adhesion, motility, opsonization, wound healing and maintenance of cell shape.
- Chemotaxis of EOC cells is partially prevented by antibodies against FN1.
- The mesothelium plays an active role in inducing the intraperitoneal spread of EOC cells, and FN1 is one of the main mediators of mesothelium-induced cell motility.
- EOC cells bound to fibronectin are protected from apoptosis when treated with cisplatin or other drugs, including paclitaxel.
(Franke et al., 2003; Rieppi et al., 1999; Zand et al., 2003)

PLAU (Plasminogen activator, urokinase). Interacts with: CDH1, FN1.
- Over-expression of PLAU or PLAUR is a feature of malignancy and is correlated with tumour progression and metastasis.
- Expression is predictive of patient outcome, particularly when residual disease is present.
- PLAU expression is activated after contact with FN1 and initiation of cell-cell interactions that are mediated by CDH1.
(Foekens et al., 1992; Konecny et al., 2001; Nekarda et al., 1994; Pedersen et al., 1994; Sasaki et al., 1999; Schmitt et al., 1997; van der Burg et al., 1996)

CD44. Interacts with: VIL2 (ezrin), CDH1, TNFAIP6, CD9.
- Ovarian cancer cell adhesion to mesothelium can be inhibited by antibodies against CD44, suggesting a role in mediating cell adhesion.
- CD44 interacts with members of the ezrin/radixin/moesin (ERM) family, such as VIL2, and forms a complex with properties that suggest its importance in tumour-endothelium interactions, cell migration, cell adhesion, tumour progression and metastasis.
(Bar et al., 2004; Gardner et al., 1995; Lessan et al., 1999; Martin et al., 2003; Sillanpaa et al., 2003; Xu and Yu, 2003)

[Box plots of log2 expression ratio by class (invasive, LMP) for FN1 and PLAU.]



VIL2 (Ezrin). Interacts with: CDH1, CD44.
- ERM proteins, such as that encoded by VIL2, link the plasma membrane to the actin cytoskeleton.
- Inhibition of VIL2 expression in colorectal cancer cells causes reduced cell-cell adhesiveness together with a gain in motility and invasive behaviour. These cells also displayed increased spreading over matrix-coated surfaces.
- VIL2 regulates cell-cell and cell-matrix adhesion by interacting with the cell adhesion molecules CDH1 and beta-catenin, and is thought to play an important role in the control of adhesion and invasiveness of EOC cells.
(Hiscox and Jiang, 1999; Martin et al., 2003; Shen et al., 2003)

[Box plot of log2 expression ratio by class (invasive, LMP) for VIL2.]


Figure 5-11: Hierarchical cluster of genes included in adhesion-related network found to be differentially expressed between LMP and invasive EOC. Red and blue sections of colour bar correspond to invasive and LMP tumours, respectively. A high level of consistent differential expression can be seen between the two EOC subtypes for these genes, particularly those upregulated in the LMP tumours, e.g. MSLN and PTTG1.

[Figure 5-11 legend: Histology colour bar (Serous LMP, Serous invasive); expression ratio colour scale from 0.3 to 3.0.]


5.2.8.3. Comparison of the LMP/invasive gene expression signature to published studies of other invasive/non-invasive cancers

After performing the ontology analysis described previously, a large proportion of the 1,302 differentially expressed genes remained whose function or involvement in these tumours had not been accounted for. To further explore the LMP/invasive gene expression signature, a comparative analysis was carried out using a range of published datasets and lists of genes found to be differentially expressed between other forms of human cancer.

Similarity between gene lists was determined by first mapping the lists of differentially expressed genes from a study of interest to the Peter Mac 10.5k cDNA microarray clone set. This was done by converting the gene IDs used in each study to current UniGene cluster IDs (Build #184). The significance of any overlap with the UniGene-annotated list of LMP/invasive genes was then calculated using a standard Fisher's exact test, to ascertain whether the degree of overlap observed could be expected by chance alone. Table 5-9 summarises those gene lists found to have a significant overlap with the invasive/LMP EOC signature.
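The overlap test can be sketched as a one-sided hypergeometric (Fisher's exact) test on two gene sets restricted to a shared universe, which here stands in for the UniGene clusters represented on the 10.5k clone set. The identifier mapping, probe names and toy universe below are hypothetical; in practice the mapping would come from the UniGene build used.

```python
from math import comb


def map_to_unigene(ids, id_to_cluster):
    """Convert study-specific gene IDs to UniGene cluster IDs, dropping unmapped IDs."""
    return {id_to_cluster[i] for i in ids if i in id_to_cluster}


def overlap_test(signature, published, universe):
    """Return (overlap size, one-sided p) for the overlap of two gene sets.

    Both sets are first restricted to the universe (genes measurable on the array);
    p = P(overlap >= observed) under the hypergeometric distribution.
    """
    a = set(signature) & universe
    b = set(published) & universe
    k = len(a & b)
    n_total, n_a, n_b = len(universe), len(a), len(b)
    tail = sum(
        comb(n_a, x) * comb(n_total - n_a, n_b - x)
        for x in range(k, min(n_a, n_b) + 1)
    )
    return k, tail / comb(n_total, n_b)


# Toy example: 100 clusters on the array, a 10-gene signature, a 4-gene mapped list.
universe = {f"Hs.{i}" for i in range(1, 101)}
signature = {f"Hs.{i}" for i in range(1, 11)}
published_ids = ["probeA", "probeB", "probeC", "probeD", "probeE"]
id_to_cluster = {"probeA": "Hs.1", "probeB": "Hs.2", "probeC": "Hs.3", "probeD": "Hs.50"}
published = map_to_unigene(published_ids, id_to_cluster)  # probeE is unmapped
k, p = overlap_test(signature, published, universe)
print(k, round(p, 5))
```

Restricting both lists to the array universe first matters: genes that could never be detected on the clone set would otherwise inflate or deflate the apparent overlap.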

This analysis revealed a high similarity to a range of other models of invasive and non-

invasive cancer subtypes that have been studied with microarrays, including several

studies of ovarian cancer. The statistically significant overlap of genes identified by this

study and those described by Gilks et al (Gilks et al., 2005), Warrenfeltz et al

(Warrenfeltz et al., 2004) and Santin et al (Santin et al., 2004) represents a validation step

in the process of characterising these EOC subtypes. These studies were performed using

tissue from EOC patients from independent cohorts, processed in different laboratories,

hybridised to different array platforms and analysed with different algorithms and tools.

Despite these extensive differences, the genes found to be associated with EOC

malignancy overlap with those identified by this study at a statistically significant level.

This finding provides some validation for the use of microarray profiling as a means of identifying molecular signatures of human disease, and suggests that the findings generated are transferable between patient cohorts and laboratories.

The overlap with the recently published meta-signature of undifferentiated tumours further strengthens the power of this analysis. It also supports the hypothesis that tumorigenesis and progression rely on a common transcriptional profile, found to be present in a large proportion of cancer types (Rhodes et al., 2004). The similarity with the meta-signature furthermore suggests that the molecular differences between LMP and invasive EOC are similar to the core processes responsible for the malignant transformation of a broad range of tissues.

This meta-analysis of microarray datasets also revealed a relationship between the invasive/LMP signature and that from a microarray study of the breast cancer (BrCa) subtypes ductal carcinoma in situ (DCIS) and invasive ductal carcinoma (IDC) (Ma et al., 2003). BrCa is frequently studied as a progressive cancer model with defined pathological stages thought to correlate with distinct molecular events. These stages are atypical ductal hyperplasia (ADH), DCIS and IDC.

Initially, the overlap between the genes thought to characterise these stages of BrCa progression and the invasive/LMP signature deduced in this chapter suggested a progressive relationship between LMP and invasive EOC. Warrenfeltz et al have proposed that LMP tumours exist as an intermediate state between normal ovarian epithelium and fully invasive EOC (Warrenfeltz et al., 2004). However, one important observation from the Ma et al study was that the same genes found to correlate with the stages of breast cancer were also significantly related to the grade of the tumours used to generate the signature. It therefore appears unlikely that these stages of breast cancer are separated by distinct molecular profiles, at least not by profiles detectable by microarray analysis. The overlap between the breast and ovarian cancer expression profiles most likely reflects the increasing tumour grade associated with the development of a more aggressive phenotype.

Genes found to be differentially expressed between pre-invasive and invasive prostate

cancer (Dhanasekaran et al., 2001) had no significant overlap with the LMP/invasive

EOC signature. Also not significantly related was a gene expression signature deduced

from a comparison of human embryonic stem cells and a somatic cell line (Sperger et al.,

2003). This suggests that the molecular events underlying these phenotypes differ from those occurring in LMP and invasive EOC.


Table 5-9: Published gene lists or studies with significant homology to LMP/invasive EOC signature

List Name p value

Differentially expressed genes between low/intermediate grade breast DCIS and high grade DCIS/IDC (Seth et al., 2003) 0.000554

Genes differentially expressed between cell cultures of serous EOC and normal ovarian epithelium (Santin et al., 2004) 0.000122

Prognostic signature of breast cancer capable of predicting a short interval to the development of metastatic disease after surgery (van 't Veer et al., 2002) 4.82 × 10^-5

Genes with increased expression in association with both increasing breast tumour grade and the transition of tumours from premalignant (benign), pre-invasive (DCIS) to malignant (IDC) (Ma et al., 2003) 1.33 × 10^-5

Genes identified by SAM analysis of serous LMP and invasive EOC (Gilks et al., 2005) 4.67 × 10^-8

Genes identified as differentially expressed between gastric cancer specimens at different stages of invasion (Boussioutas et al., 2003) 5.16 × 10^-8

Genes differentially expressed between benign and malignant EOC, proposed as a signature of EOC malignant potential (Warrenfeltz et al., 2004) 7.15 × 10^-9

Metasignature of undifferentiated cancer identified by profiling of >3,700 tumours to identify core genes involved in malignant transformation and tumour progression in the majority of human cancer types (Rhodes et al., 2004) 1.51 × 10^-11

Table 5-10: Published gene lists from microarray studies in which no significant homology to the LMP/invasive EOC signature was observed

List Name p value

Genes differentially expressed between human embryonic stem cells and somatic cell line (Sperger et al., 2003) 0.342

Genes differentially expressed between prostate cancer and benign prostate hyperplasia (Dhanasekaran et al., 2001) 0.089

5.2.9. Validation of selected differentially expressed genes with RT-PCR

5.2.9.1. Selection of appropriate genes for expression signature validation and design of RT-PCR primers

RT-PCR was chosen as one method for both technically and biologically validating the expression of several genes identified by microarray analysis as having statistically significant differences in expression between the EOC subtypes of interest. A high-throughput implementation of this technique was employed using the ABI 7900HT system (Applied Biosystems, USA), enabling up to 384 RT-PCR reactions to be performed simultaneously, whilst requiring half the reaction volume of the standard 96-well plate format (Pinhasov et al., 2004). The performance of RT-PCR as a method to validate microarray results has been widely reported (Jenson et al., 2003; Mutch et al., 2001).

Several criteria were formulated to select genes appropriate for this validation method:

(i) Statistically significant difference between serous invasive and LMP EOC as

identified by SAM analysis.

(ii) Upregulated (mean normalised expression ratio >1.0) in at least one class, i.e.

not down-regulated to differing extents in both LMP and invasive tumours

(iii) Representative of significantly enriched gene ontologies.

(iv) Mean differences in expression levels that are robust to changes in the

number of samples in each class.
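Criteria (i), (ii) and (iv) can be expressed programmatically. The sketch below computes the SAM relative difference d = (mean_I - mean_L) / (s + s0) for one gene, checks criterion (ii) on linear ratios, and uses a leave-one-out check as one possible reading of criterion (iv). Assumptions: s0 is fixed rather than tuned from the data as SAM does, SAM's permutation-based false discovery rate is omitted, and the leave-one-out rule and example values are hypothetical.

```python
from math import sqrt
from statistics import fmean


def sam_d(x_inv, x_lmp, s0=0.1):
    """SAM relative difference for one gene from log2 ratios in two classes.

    s is the pooled standard error of the difference in means; s0 is a small
    fudge factor (fixed here; SAM estimates it from the data).
    """
    n1, n2 = len(x_inv), len(x_lmp)
    m1, m2 = fmean(x_inv), fmean(x_lmp)
    a = (1 / n1 + 1 / n2) / (n1 + n2 - 2)
    ss = sum((v - m1) ** 2 for v in x_inv) + sum((v - m2) ** 2 for v in x_lmp)
    return (m1 - m2) / (sqrt(a * ss) + s0)


def upregulated_in_one_class(ratios_inv, ratios_lmp):
    """Criterion (ii): mean normalised (linear) ratio > 1.0 in at least one class."""
    return fmean(ratios_inv) > 1.0 or fmean(ratios_lmp) > 1.0


def robust_to_sample_removal(x_inv, x_lmp):
    """Criterion (iv), read as: dropping any single sample never flips the sign
    of the mean difference (hypothetical rule, for illustration only)."""
    ref = fmean(x_inv) - fmean(x_lmp)
    for i in range(len(x_inv)):
        if (fmean(x_inv[:i] + x_inv[i + 1:]) - fmean(x_lmp)) * ref <= 0:
            return False
    for i in range(len(x_lmp)):
        if (fmean(x_inv) - fmean(x_lmp[:i] + x_lmp[i + 1:])) * ref <= 0:
            return False
    return True


# Hypothetical log2 ratios for one gene (not study data).
x_inv = [1.2, 1.5, 0.9, 1.1]
x_lmp = [-0.8, -1.0, -0.6, -1.1]
d = sam_d(x_inv, x_lmp)
keep = (
    upregulated_in_one_class([2 ** v for v in x_inv], [2 ** v for v in x_lmp])
    and robust_to_sample_removal(x_inv, x_lmp)
)
print(round(d, 2), keep)
```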

Primers for the RT-PCR reactions were designed using the online primer design tool provided by GenScript (GenScript Corporation, 2005). Details of the primers designed are shown in Table 5-11. All sequences were checked for homology to other, unrelated genes by BLAST search (Altschul et al., 1990). Official UniGene names, mean expression ratios for the two EOC subtypes of interest and a summary of each gene's relevant functions are shown in Table 5-12. The gene ontologies represented by each gene are shown in Table 5-13.

Table 5-11: RT-PCR primer sequences designed for validation of gene expression signature of LMP and invasive EOC

Symbol Forward primer sequence Reverse primer sequence
BIRC5 AGTGAGGGAGGAAGAAGGCA ATTCACTGTGGAAGGCTCTGC
CHL1 AATCATCCATTTGCTGGTGA CGGACATCCACAACATCAAT
CLDN10 TATTTGCGCTCTTTGGAATG AGCACAGCCCTGACAGTATG
COL5A3 TTTGAGATCGTGAAATTGGC AGTTCAGCTGCACGACATTC
CXCL9 ATTGGAGTGCAAGGAACCC GGATAGTCCCTTGGTTGGTG
FN1 GGTTCGGGAAGAGGTTGTTA TCATCCGTAGGTTGGTTCAA
MSLN GAATGTGAGCATGGACTTGG CCAGAAGTTTCTGCACCTCA
KLK5 AGGTCCTCCAGTGCTTGAAT ATCACCCTGGCAGGAGTCT
TNFSF10 TGCTGATCGTGATCTTCACA AAGAAACAAGCAATGCCACTT
PTTG1 ACCTGTGAAGATGCCCTCTC ACATCCAGGGTCGACAGAAT
SSPN GCTAGTCAGGGACACTCCATTT CCGTTCGTCAACCTGATATG
STK6 CATCTTCCAGGAGGACCACT AAGAACTCCAAGGCTCCAGA

Table 5-12: Details of genes differentially expressed between LMP and invasive EOC selected for RT-PCR validation. Table is sorted in order of increasing mean fold-change difference. For each gene: UniGene symbol and name; mean expression ratio (invasive; LMP); fold difference between mean expression ratios; summary of function.

CLDN10, Claudin 10. Mean expression ratio (invasive; LMP): 0.554; 7.051. Fold difference: 0.079.
- An integral membrane protein and component of tight junction strands, which serve as a physical barrier to prevent solutes and water from passing between epithelial or endothelial cell sheets.
- Has cell adhesion and structural properties.
- Expression levels are associated with risk of recurrence of primary hepatocellular carcinoma.
(Cheung et al., 2005; Kubota et al., 1999)

CHL1, Cell adhesion molecule with homology to L1CAM (close homolog of L1). Mean expression ratio (invasive; LMP): 0.197; 2.129. Fold difference: 0.093.
- A member of the L1 gene family of neural cell adhesion molecules.
- Involved in signal transduction pathways.
- General cell adhesion molecule, involved in various stages of embryonic development.
(Wei et al., 1998)

SSPN, Sarcospan (Kras oncogene-associated gene). Mean expression ratio (invasive; LMP): 1.095; 2.696. Fold difference: 0.406.
- Expressed in a variety of tissues, with highest levels in muscle.
- In certain tumours, KRAS2, SSPN and ITPR2 are co-amplified.
- The function of this gene is unknown, although it has cell-adhesive properties.
(Heighway et al., 1996)

TNFSF10, Tumour necrosis factor (ligand) superfamily, member 10. Mean expression ratio (invasive; LMP): 0.638; 1.376. Fold difference: 0.638.
- Preferentially induces apoptosis in transformed and tumour cells, but does not appear to kill normal cells.
- This protein binds to several members of the TNF receptor superfamily.
- The binding of this protein to its receptors has been shown to trigger the activation of MAPK8/JNK, caspase 8 and caspase 3.
- Expression significantly correlates with length of survival in EOC.
(Lancaster et al., 2003; Wiley et al., 1995)

FN1, Fibronectin 1. Mean expression ratio (invasive; LMP): 1.534; 0.521. Fold difference: 1.534.
- A glycoprotein present in a dimeric or multimeric form at the cell surface and in the extracellular matrix.
- Fibronectin is involved in cell adhesion and migration processes, including embryogenesis, wound healing, host defence and metastasis.
(Kornblihtt et al., 1985)

LTBP2, Latent transforming growth factor beta binding protein 2. Mean expression ratio (invasive; LMP): 3.113; 1.108. Fold difference: 2.81.
- Extracellular protein belonging to the family of latent transforming growth factor (TGF)-beta binding proteins (LTBP).
- Multiple functions include: member of the TGF-beta latent complex, a structural component of microfibrils, and a role in cell adhesion.
(Moren et al., 1994)

COL5A3, Collagen, type V, alpha 3. Mean expression ratio (invasive; LMP): 2.086; 0.601. Fold difference: 3.471.
- Encodes an alpha chain for one of the low-abundance fibrillar collagens.
- Found in tissues containing type I collagen and appears to regulate the assembly of heterotypic fibres composed of both type I and type V collagen.
- Stromal expression is associated with colorectal cancer.
- Known to have higher expression in endometriotic lesions relative to normal ovary.
(Imamura et al., 2000; Konno et al., 2003)

KLK5, Kallikrein 5. Mean expression ratio (invasive; LMP): 7.933; 2.267. Fold difference: 3.486.
- Kallikreins are implicated in carcinogenesis, and some have potential as novel cancer and other disease biomarkers.
- Expression is up-regulated by estrogens and progestins.
- May be involved in shedding of cells in the epidermis.
(Brattsand and Egelrud, 1999)

STK6, Serine/threonine kinase 6. Mean expression ratio (invasive; LMP): 3.037; 0.696. Fold difference: 4.364.
- A cell cycle-regulated kinase that appears to be involved in microtubule formation and/or stabilization during chromosome segregation.
- Has a role in tumour development and progression.
- Has been proposed as a candidate low-penetrance EOC-susceptibility gene.
(Dicioccio et al., 2004; Zhou et al., 1998)

CXCL9, Chemokine (C-X-C motif) ligand 9. Mean expression ratio (invasive; LMP): 3.853; 0.739. Fold difference: 5.214.
- Thought to be involved in T cell trafficking.
- The presence of intra-tumoural T cells is a significant prognostic indicator in EOC.
(Farber, 1993; Zhang et al., 2003)

MSLN, Mesothelin. Mean expression ratio (invasive; LMP): 5.682; 2.654. Fold difference: 5.682.
- The exact function of mesothelin is unknown; it may play a role in cellular adhesion and is present on mesothelium, mesotheliomas and ovarian cancers.
- Has recently been shown to control cellular adhesion in EOC by binding specifically to CA-125.
(Chang and Pastan, 1996; Rump et al., 2004)

BIRC5, Baculoviral IAP repeat-containing 5 (survivin). Mean expression ratio (invasive; LMP): 4.326; 0.71. Fold difference: 6.093.
- An apoptosis inhibitor that is expressed during the G2/M phase of the cell cycle.
- BIRC5 associates with the microtubules of the mitotic spindle.
- Disruption of this association results in the loss of its anti-apoptotic activity.
- Expression correlates with survival in breast but not ovarian cancer, although it is lower in the serous subtype than in others.
(Ambrosini et al., 1997; Ferrandina et al., 2005)

PTTG1, Pituitary tumour-transforming 1. Mean expression ratio (invasive; LMP): 4.168; 0.54. Fold difference: 7.648.
- Has transforming activity in vitro and tumourigenic activity in vivo, and the gene is highly expressed in various tumours.
- Contains 2 PXXP motifs, which are required for its transforming and tumourigenic activities, as well as for its stimulation of basic fibroblast growth factor expression.
- Believed to have a role in tumour angiogenesis and mitogenesis.
(Puri et al., 2001; Zhang et al., 1999)
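The fold difference column appears to be the ratio of the invasive mean expression ratio to the LMP mean expression ratio (for example, 0.554 / 7.051 ≈ 0.079 for CLDN10). Assuming that definition, a few rows copied from Table 5-12 reproduce the listed values and the sort order:

```python
# (invasive mean, LMP mean) expression ratios copied from a subset of Table 5-12.
means = {
    "CXCL9": (3.853, 0.739),
    "CLDN10": (0.554, 7.051),
    "COL5A3": (2.086, 0.601),
    "CHL1": (0.197, 2.129),
    "SSPN": (1.095, 2.696),
}


def fold_difference(invasive_mean: float, lmp_mean: float) -> float:
    """Ratio of class means; values < 1 indicate higher expression in LMP tumours."""
    return invasive_mean / lmp_mean


ranked = sorted(
    ((gene, round(fold_difference(*m), 3)) for gene, m in means.items()),
    key=lambda item: item[1],
)
for gene, fold in ranked:
    print(gene, fold)
```

Because the measure is a ratio, genes up-regulated in LMP tumours fall between 0 and 1 while those up-regulated in invasive tumours exceed 1; taking log2 of the ratio would make the two directions symmetric.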

Table 5-13: Gene ontology information for selected RT-PCR genes. Ontologies identified as significantly represented by the 1,302 genes differentially expressed between LMP and invasive EOC are shown in bold

Symbol Gene ontologies

BIRC5 G2/M transition of mitotic cell cycle | anti-apoptosis | caspase inhibitor activity | cysteine protease inhibitor activity | microtubule binding | spindle microtubule | zinc ion binding

CHL1 cell adhesion | integral to membrane | membrane | protein binding | signal transduction

CLDN10 cell adhesion | integral to membrane | tight junction

CXCL9 G-protein coupled receptor protein signalling pathway | cell-cell signalling | cellular defence response | chemokine activity | extracellular space | inflammatory response | signal transduction

KLK5 chymotrypsin activity | epidermis development | extracellular space | proteolysis and peptidolysis | trypsin activity

PTTG1 DNA metabolism | DNA repair | DNA replication and chromosome cycle | chromosome segregation | cysteine protease inhibitor activity | cytokinesis | cytoplasm | mitosis | nucleus | protein binding | spermatogenesis | transcription factor activity | transcription from RNA polymerase II promoter

SSPN cell adhesion | cytoskeleton | dystrophin-associated glycoprotein complex | integral to plasma membrane | muscle contraction

STK6 ATP binding | cell cycle | mitosis | protein amino acid phosphorylation | protein serine/threonine kinase activity | spindle | transferase activity

MSLN cell adhesion | membrane | protein binding

FN1 acute-phase response | cell adhesion | cell migration | collagen binding | extracellular matrix structural constituent | heparin binding | metabolism | oxidoreductase activity | response to wounding

TNFSF10 cell-cell signalling | immune response | induction of apoptosis | integral to plasma membrane | membrane | positive regulation of I-kappaB kinase/NF-kappaB cascade | signal transduction | soluble fraction | tumour necrosis factor receptor binding
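Flagging genes that belong to adhesion-related ontologies (as with the yellow indicators in Figure 5-9) can be sketched as a keyword match over each gene's ontology terms. The annotations below are copied from a subset of Table 5-13. The case-insensitive substring rule is an assumption, not necessarily the filtering procedure used in this chapter.

```python
# Ontology strings copied from a subset of rows in Table 5-13.
ontologies = {
    "CHL1": "cell adhesion | integral to membrane | membrane | protein binding | signal transduction",
    "CLDN10": "cell adhesion | integral to membrane | tight junction",
    "CXCL9": "G-protein coupled receptor protein signalling pathway | cell-cell signalling | "
             "cellular defence response | chemokine activity | extracellular space | "
             "inflammatory response | signal transduction",
    "KLK5": "chymotrypsin activity | epidermis development | extracellular space | "
            "proteolysis and peptidolysis | trypsin activity",
    "SSPN": "cell adhesion | cytoskeleton | dystrophin-associated glycoprotein complex | "
            "integral to plasma membrane | muscle contraction",
    "MSLN": "cell adhesion | membrane | protein binding",
    "FN1": "acute-phase response | cell adhesion | cell migration | collagen binding | "
           "extracellular matrix structural constituent | heparin binding | metabolism | "
           "oxidoreductase activity | response to wounding",
}


def genes_with_term(annotations, keyword):
    """Genes having at least one ontology term that contains `keyword` (case-insensitive)."""
    kw = keyword.lower()
    return sorted(
        gene
        for gene, terms in annotations.items()
        if any(kw in term.strip().lower() for term in terms.split("|"))
    )


print(genes_with_term(ontologies, "adhesion"))
```

A substring match is deliberately broad; a stricter filter could compare whole terms or use GO identifiers and the ontology hierarchy instead.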

5.2.9.2. Results from RT-PCR and correlation with microarray values

Hierarchical clustering was performed using data from both methods of expression measurement, as shown in Figure 5-12. The standard correlation method, implemented in Silicon Genetics GeneSpring (Agilent Technologies, USA), was used to quantify the strength of the relationships present. Standard correlation is identical to the Pearson correlation, except that it is centred around one rather than zero. Both data series were median centred (the value of each gene was divided by the median of that gene across all samples) to assist in visualising the differences present between tumour subtypes.
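Median centring, and a correlation measured from a ratio of 1, can be sketched as below. This assumes that GeneSpring's standard correlation behaves like an uncentred correlation of log2 ratios, so that deviations are measured from a ratio of 1 (log2 ratio 0) rather than from each profile's own mean. That interpretation, and the example values, are assumptions rather than GeneSpring's documented implementation.

```python
from math import log2, sqrt
from statistics import median


def median_centre(ratios):
    """Divide each of a gene's expression ratios by that gene's median across samples."""
    m = median(ratios)
    return [r / m for r in ratios]


def standard_correlation(x, y):
    """Uncentred correlation of log2 ratios (reference point: ratio = 1)."""
    lx = [log2(v) for v in x]
    ly = [log2(v) for v in y]
    num = sum(a * b for a, b in zip(lx, ly))
    return num / sqrt(sum(a * a for a in lx) * sum(b * b for b in ly))


array_gene = median_centre([0.8, 2.4, 1.2, 3.6])  # hypothetical microarray ratios
pcr_gene = median_centre([1.0, 4.0, 2.0, 6.0])    # hypothetical RT-PCR ratios
print(array_gene, pcr_gene, round(standard_correlation(array_gene, pcr_gene), 3))
```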


The hierarchical clustering of both microarray and RT-PCR data separated the LMP and

invasive samples and there was a high degree of concordance between the pattern and

extent of differential expression observed.

To quantify the extent of agreement between the two methods of gene expression quantification, correlation coefficients were calculated for the 10 genes analysed. Pearson (linear) and Spearman (rank-based) methods were used to assess the strength of agreement between these data, as shown in Table 5-14. Excluding MSLN, the mean correlation between methods was 0.73 for Pearson and 0.76 for Spearman correlation (0.79 and 0.80, respectively, when SSPN was also excluded), indicating good concordance between methods for this validation subset of the total cohort.
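Both coefficients are straightforward to compute from paired measurements of a gene across the validation cases. The sketch below implements Pearson correlation and Spearman correlation (Pearson correlation of average ranks) in plain Python, and re-derives the summary rows of Table 5-14 from its per-gene values; the genes to exclude are passed explicitly.

```python
from math import sqrt
from statistics import fmean


def pearson(x, y):
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


# Per-gene (Pearson, Spearman) coefficients copied from Table 5-14.
table_5_14 = {
    "KLK5": (0.74, 0.89), "FN1": (0.61, 0.79), "CHL1": (0.54, 0.59),
    "STK6": (0.85, 0.87), "TNFSF10": (0.83, 0.92), "SSPN": (0.22, 0.33),
    "MSLN": (-0.16, -0.25), "BIRC5": (0.94, 0.81), "CLDN10": (0.80, 0.78),
    "PTTG1": (0.87, 0.92), "CXCL9": (0.94, 0.67),
}


def mean_coefficients(table, excluded=()):
    kept = [v for gene, v in table.items() if gene not in excluded]
    return round(fmean(p for p, _ in kept), 2), round(fmean(s for _, s in kept), 2)


print(mean_coefficients(table_5_14, {"MSLN"}))          # (0.73, 0.76)
print(mean_coefficients(table_5_14, {"MSLN", "SSPN"}))  # (0.79, 0.8)
```

Spearman correlation is less sensitive than Pearson correlation to a single extreme sample, which can matter with a cohort of this size.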

A multivariate analysis was carried out using the microarray and RT-PCR data to determine whether a significant difference existed between methods after controlling for known sources of variation in the data. Using a general linear model, the significance of variation between individual samples, genes and quantification methods was determined. Statistically significant differences were observed between samples (p=0.001) and genes (p=0.0321), as expected. No significant difference was found between microarray and RT-PCR quantification (p=0.320), supporting the high concordance between microarray and RT-PCR gene expression measurements.
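An additive model of this kind can be sketched as expression = sample + gene + method on log-scale values, with a partial F-test comparing the models with and without the method term. The software, scale and exact model terms used for the analysis above are not stated, so the factor coding, names and toy data below are assumptions for illustration.

```python
from itertools import product

import numpy as np
from scipy import stats


def additive_design(factors):
    """Treatment-coded design matrix: intercept plus one indicator column per
    non-reference level of each categorical factor (first sorted level = reference)."""
    n = len(factors[0])
    cols = [np.ones(n)]
    for factor in factors:
        for level in sorted(set(factor))[1:]:
            cols.append(np.array([1.0 if f == level else 0.0 for f in factor]))
    return np.column_stack(cols)


def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def method_f_test(y, samples, genes, methods):
    """Partial F-test for the method term in y ~ sample + gene + method."""
    y = np.asarray(y, dtype=float)
    X_full = additive_design([samples, genes, methods])
    X_reduced = additive_design([samples, genes])
    df_num = X_full.shape[1] - X_reduced.shape[1]
    df_den = len(y) - X_full.shape[1]
    rss_full, rss_reduced = rss(X_full, y), rss(X_reduced, y)
    f = ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)
    return f, stats.f.sf(f, df_num, df_den)


# Toy balanced design: 3 samples x 2 genes x 2 methods (hypothetical log2 values).
sample_effect = {"s1": 0.5, "s2": -0.3, "s3": 1.1}
gene_effect = {"g1": 0.0, "g2": 1.4}
noise = [0.2, -0.1, -0.2, 0.3, 0.1, -0.3, -0.1, 0.2, 0.0, 0.1, 0.2, -0.2]
samples, genes, methods, y = [], [], [], []
for (s, g, m), e in zip(product(sample_effect, gene_effect, ("array", "pcr")), noise):
    samples.append(s)
    genes.append(g)
    methods.append(m)
    y.append(sample_effect[s] + gene_effect[g] + e)

f_null, p_null = method_f_test(y, samples, genes, methods)
y_shifted = [v + (3.0 if m == "pcr" else 0.0) for v, m in zip(y, methods)]
f_shift, p_shift = method_f_test(y_shifted, samples, genes, methods)
print(f"no method effect: p = {p_null:.3f}; +3 log2 units for RT-PCR: p = {p_shift:.1e}")
```

The shifted example shows that a systematic offset between platforms would be detected by the same test, which is what makes a non-significant method term informative.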

Table 5-14: Correlation coefficients for microarray and RT-PCR gene expression measurements

Gene Pearson correlation coefficient Spearman correlation coefficient
KLK5 0.74 0.89
FN1 0.61 0.79
CHL1 0.54 0.59
STK6 0.85 0.87
TNFSF10 0.83 0.92
SSPN 0.22 0.33
MSLN -0.16 -0.25
BIRC5 0.94 0.81
CLDN10 0.80 0.78
PTTG1 0.87 0.92
CXCL9 0.94 0.67
Mean (excluding MSLN) 0.73 0.76
Mean (excluding MSLN & SSPN) 0.79 0.80


Figure 5-12: Hierarchical cluster of 10 genes for validation of LMP/invasive expression data using (a) median centred microarray data and (b) median centred RT-PCR data. A high level of concordance can be observed between the data generated by these two methods of analysis. This was confirmed by statistical analysis, which revealed no significant difference between gene expression quantified by RT-PCR and by microarray for the genes tested in this section.

[Figure 5-12 panels: (a) Microarray data; (b) RT-PCR data. Histology colour bar (Serous invasive, Serous LMP); expression ratio colour scale from 0.3 to 3.0.]

Figure 5-13: Mean fold changes in the expression ratios for selected genes as determined by (A) microarray and (B) RT-PCR from the validation cohort of 10 cases. Confidence intervals based on the standard error are shown. In general, good agreement was observed between these assessments of gene expression, with larger mean fold change differences observed by RT-PCR.

[Figure 5-13 panels: (A) Microarray and (B) RT-PCR; mean fold change (axis 0 to 10) in LMP and invasive tumours for KLK5, FN1, CHL1, STK6, TNFSF10, SSPN, MSLN, BIRC5, CLDN10, PTTG1 and CXCL9.]
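The error bars in Figure 5-13 are described only as based on the standard error. A minimal sketch, assuming intervals of the form mean ± z × SE with SE = s / √n (z = 1 gives ±1 SE; z = 1.96 gives an approximate 95% interval); the fold-change values are hypothetical.

```python
from math import sqrt
from statistics import fmean, stdev


def mean_with_se_interval(values, z=1.96):
    """Mean and (lower, upper) bounds of mean ± z * SE, where SE = sample SD / sqrt(n)."""
    m = fmean(values)
    se = stdev(values) / sqrt(len(values))
    return m, (m - z * se, m + z * se)


fold_changes = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical per-case fold changes for one gene
mean_fc, (low, high) = mean_with_se_interval(fold_changes)
print(f"mean {mean_fc:.2f}, interval {low:.2f} to {high:.2f}")
```

With only 10 cases, a t-based multiplier would give somewhat wider intervals than z = 1.96.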


5.2.10. Biological validation of the LMP/invasive expression signature

To assess the extent to which protein expression reflects the observed mRNA changes, immunohistochemical analysis using EOC tissue microarrays (TMAs) was performed. The use of TMAs, in association with microarray data, as a high-throughput method of evaluating protein expression has been widely demonstrated for a range of cancer types (Abd El-Rehim et al., 2005; Rihl et al., 2004; Sallinen et al., 2000). Tissue microarrays add significant structural and cellular localisation information to gene expression data and enable large numbers of appropriately preserved tumour specimens to be analysed simultaneously.

The cases chosen for TMA validation represent samples completely independent of those used to generate the microarray expression data. The use of independent samples to biologically validate microarray-based findings is an essential step to ensure that the observations are not specific to the sample cohort used for the primary analysis.

Previously, the expression profiles analysed in this study have been validated technically

(section 5.2.9) and also by comparison to lists of differentially expressed genes from

other independent microarray studies of EOC or other invasive/non-invasive cancer

models (section 5.2.8.3).

5.2.10.1. Selection of independent EOC cases for validation and TMA

Formalin-fixed, paraffin-embedded specimens of EOC were obtained from the AOCS and Mercy Hospital Tissue Banks, with the assistance of Dr Melissa Robbie, by searching the respective specimen databases for the terms ‘ovarian’ and ‘serous’. De-identified pathology reports were reviewed by Dr Robbie to confirm the suitability of each specimen for this study; however, a full pathology review, including determination of tumour grade, was not possible due to time constraints. Dr Robbie did, however, attempt to select cases for this validation cohort that resembled the histological characteristics of the primary cohort used for microarray analysis.

H&E stained sections of each case were reviewed by Dr Robbie to confirm the diagnosis

of serous LMP or serous invasive EOC and to assess the relative tumour content of each

specimen to ensure adequate material was present for immunohistochemistry. The

method for TMA construction described in Sambrook and Bowtell (Sambrook and

Bowtell, 2003) was followed with a few modifications (Kononen et al., 1998).


Areas of each tumour that were typical of the given diagnosis and suitable for TMA inclusion were identified. It was confirmed that the features present in the section to be punch biopsied were plentifully represented elsewhere in the block, allowing the block to be used for future studies if necessary. The appropriate area was then circled on the H&E slide, which was in turn placed over the tumour block and used to locate the matching area from which to take the needle punch biopsy.

Tumour punches were inserted into an agar block, which was then processed into paraffin to form the recipient block. After the block containing the punches was melted, the histology scientist (Mr. Neal O’Callaghan) attempted to press the cores down so that punches of varying lengths were all present at the base of the cassette, which became the cutting face. Sections of 5 µm were cut from the final blocks for each antibody to be analysed. IHC staining was performed on a Dako Autostainer (Dako, USA) using standard IHC protocols.

In total, 84 cores (3 mm diameter) were taken from areas of representative EOC content from 52 cases of serous invasive EOC and 32 serous LMP tumours. These are detailed in Appendix B.

5.2.10.2. Selection of antibodies corresponding to differentially expressed genes identified by microarray analysis

Using the online antibody database of AbCam (AbCam Inc., UK) and gene and protein information from GeneCards (Rebhan et al., 1998), IHC markers used by the Peter Mac Pathology Department were mapped to specific features present on the 10.5k cDNA microarray used for this project. Those genes that overlapped with the list of 1,302 genes differentially expressed between serous LMP and invasive EOC were identified. Antibodies were chosen to represent gene ontology categories previously identified as being differentially expressed, including proliferation (MKI67), regulation of the cell cycle (CCND1) and cell adhesion (CDH1).

IHC was conducted on one section of each TMA created for this project. As the

antibodies selected were in routine use by the Peter Mac Pathology Department, the IHC

was carried out by Dr Melanie Trivet using routine diagnostic IHC protocols (Materials

and Methods section 2.3.8). Sample images of EOC sections showing areas of

representative staining can be seen in Figure 5-14. Box plots of the gene expression levels

corresponding to each antibody are also shown. These plots allow the extent of variation

in the expression of these genes to be observed for the two EOC subtypes.


Table 5-15: Differentially expressed genes corresponding to diagnostic antibodies used by the Peter Mac Pathology Department

UniGene symbol / antibody name | Mean expression ratio (invasive; LMP EOC) | Fold difference of mean expression ratios | UniGene name | Summary of function and literature references

PECAM1 / CD31 | 1.352; 0.594 | 2.276 | Platelet/endothelial cell adhesion molecule | A surface glycoprotein expressed on platelets and at endothelial cell junctions. Expressed in a range of solid tumours and thought to positively regulate the attachment of tumour cells to endothelium (Newman et al., 1990; Tang et al., 1993).

MKI67 / Ki67 | 2.662; 0.63 | 4.225 | Antigen identified by monoclonal antibody Ki-67 | Required for maintenance of cell proliferation. Expressed in the G1, S and G2 phases of the cell cycle. Correlates with poor survival in EOC (Anttila et al., 1998; Schluter et al., 1993).

CCND1 / Cyclin D1 | 0.901; 1.535 | 0.587 | Cyclin D1 (PRAD1: parathyroid adenomatosis 1) | This cyclin forms a complex with CDK4 or CDK6, whose activity is required for the cell cycle G1/S transition. Mutation, amplification and overexpression alter cell cycle progression, contributing to tumorigenesis (Motokura et al., 1991).

ESR1 / Estrogen receptor α | 0.87; 1.716 | 0.507 | Estrogen receptor 1 | A ligand-activated transcription factor composed of several domains important for hormone binding, DNA binding and activation of transcription. Expression of this molecule is used broadly to determine the clinical management of breast cancer patients (Green et al., 1986).

CDH1 / E-cadherin | 0.576; 1.404 | 0.41 | Cadherin 1, type 1, E-cadherin (epithelial) | A calcium-dependent cell-cell adhesion molecule. Mutations in this gene are correlated with gastric, breast, colorectal, thyroid and ovarian cancer. Loss of function contributes to cancer progression by increasing proliferation, invasion and/or metastasis (Bussemakers et al., 1993).
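The fold-difference column can be reproduced from the mean expression ratios. The Python sketch below assumes the fold difference is simply the invasive mean divided by the LMP mean; this matches every tabulated value (e.g. 1.352 / 0.594 ≈ 2.276), so values below 1 indicate higher expression in LMP tumours. The function name is hypothetical, and the LMP means for CCND1 and ESR1 are the values also reported in Table 5-16.

```python
def fold_difference(mean_invasive: float, mean_lmp: float) -> float:
    """Ratio of mean expression ratios (invasive / LMP).

    > 1 means higher in invasive EOC; < 1 means higher in LMP tumours.
    """
    if mean_lmp <= 0:
        raise ValueError("mean LMP expression ratio must be positive")
    return mean_invasive / mean_lmp


# Mean expression ratios (invasive, LMP) from Tables 5-15 and 5-16
mean_ratios = {
    "PECAM1": (1.352, 0.594),
    "MKI67": (2.662, 0.63),
    "CCND1": (0.901, 1.535),
    "ESR1": (0.87, 1.716),
    "CDH1": (0.576, 1.404),
}

for gene, (inv, lmp) in mean_ratios.items():
    fold = fold_difference(inv, lmp)
    direction = "higher in invasive" if fold > 1 else "higher in LMP"
    print(f"{gene}\t{fold:.3f}\t{direction}")
```

Rounded to three decimal places, the printed folds agree with the table (2.276, 4.225, 0.587, 0.507 and 0.410).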


Figure 5-14: IHC-stained sections of LMP and invasive EOC from the tissue microarray biological validation. Box plots indicate the microarray-quantified gene expression levels in these two EOC subtypes.

[Figure panels: IHC (LMP), IHC (invasive) and microarray box plots (LMP vs invasive) for Ki67, CD31, E-cadherin, cyclin D1 and ER-α]


5.2.10.3. Quantification of tissue microarray immunohistochemistry

Three high-power (400×) images of representative staining were captured for each tumour on each array. In total, 993 images were captured from 10 separate IHC-stained TMA sections. Because a small proportion of sectioned TMA cores had floated off or had otherwise been damaged during the staining process, it was not possible to record images for every specimen included in the two TMA designs described previously.

To facilitate the quantification and statistical analysis of IHC staining in this large number of digital images, an automated procedure was created, based on protocols described by several groups (Lehr et al., 1997; Lehr et al., 1999; Matkowskyj et al., 2000). Briefly, these methods first apply a threshold to the image to exclude background or haematoxylin staining, and then use the histogram-analysis feature of Adobe Photoshop to calculate the median and standard deviation of pixel intensity for the image.

Using Adobe Photoshop CS2 (Adobe Systems Inc., USA) and Microsoft Visual Basic (Microsoft Corporation, USA), a program was written that allowed an entire directory of images to be processed through a series of functions. An output file containing the mean, median and standard deviation of pixel intensities for each image was created. A sample script supplied by Adobe with Photoshop for exporting image histogram statistics was used as the foundation for the output section of the program. The full VBA code required to run the program is given in Appendix N.

The program carried out the following steps (as shown in Figure 5-15):

• Open the first image in the specified directory (images are stored in TIFF format).

• Apply a threshold adjustment at a tonal level of 150 (the range 0 to 255 represents the full tonal range of any digital image), eliminating unstained areas of the image and low-level background or non-specific staining.

• Invert the image and convert it to greyscale, so that IHC-stained areas appear white on a largely black background.

• Export the image histogram statistics to a tab-delimited text file created in a sub-directory of the directory of images currently being processed.
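The threshold, invert and histogram steps above can be sketched in Python. This is not the VBA/Photoshop program in Appendix N but a minimal stand-in under stated assumptions: the image has already been decoded to a flat list of (R, G, B) tuples, greyscale is approximated with Rec. 601 luma weights, and pixels darker than the threshold are counted as stain. Photoshop's own greyscale conversion and histogram calculations may give slightly different numbers. The function names and the output column layout are hypothetical.

```python
import csv
from statistics import mean, median, pstdev

THRESHOLD = 150  # tonal level used in the thesis (0-255 scale)


def to_grey(rgb):
    """Approximate greyscale value of an (R, G, B) pixel (Rec. 601 weights; an assumption)."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def quantify_stain(pixels, threshold=THRESHOLD):
    """Threshold and invert a flat list of RGB pixels, then summarise the histogram.

    Pixels darker than the threshold (putative DAB staining) end up white (255)
    after inversion; lighter pixels (unstained tissue, weak background) end up black (0).
    """
    thresholded = [0 if to_grey(p) < threshold else 255 for p in pixels]
    inverted = [255 - v for v in thresholded]
    return {
        "mean": mean(inverted),
        "median": median(inverted),
        "sd": pstdev(inverted),
    }


def export_stats(results, path):
    """Write one row per image (name -> stats dict) to a tab-delimited text file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["image", "mean", "median", "sd"])
        for name, stats in results.items():
            writer.writerow(
                [name, f"{stats['mean']:.2f}", f"{stats['median']:.2f}", f"{stats['sd']:.2f}"]
            )


if __name__ == "__main__":
    # Toy 4-pixel "image": two brown (stained) and two near-white (unstained) pixels
    toy = [(100, 60, 30), (100, 60, 30), (240, 240, 240), (240, 240, 240)]
    print(quantify_stain(toy))  # mean, median and sd are all 127.5 for this toy input
```

Because the thresholded image is essentially binary, the mean intensity is proportional to the fraction of stained pixels, which is why a weakly stained section (Figure 5-15, sample 1) gives a mean close to zero.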


(A) Sample image 1: Ki67 low staining

(B) Sample image 2: Ki67 high staining

(C) Sample image 1: image converted to greyscale and threshold level of 150 applied

(D) Sample image 2: image converted to greyscale and threshold level of 150 applied

(E) Sample image 1: image inverted. Histogram statistics: mean 0.04, standard deviation 2.7

(F) Sample image 2: image inverted. Histogram statistics: mean 52, standard deviation 96.73

Figure 5-15: Examples of the IHC quantification program applied to sample specimens with low (sample 1) and high (sample 2) expression of Ki67. The images are converted to greyscale and a threshold level between 0 and 255 is applied; the appropriate threshold level can be calibrated to the level of background or counterstaining present. The image statistics from the inverted, thresholded image, including pixel intensity mean and standard deviation, are then determined and exported.


5.2.10.4. Statistical analysis of quantified IHC data

Using Microsoft Excel and Minitab statistical software, a multivariate analysis of the IHC intensity data was carried out. A general linear model was used to analyse variation between the two TMAs, between measurements from the same tumour and, specifically, between the EOC histological types, to determine whether significant differences in expression existed for these proteins. Including the TMA number and replicate measurements in the ANOVA model allowed these variables to be controlled for when determining the statistical significance of any observed difference in the mean pixel intensities calculated from each high-power image of invasive or LMP EOC. Combining the data from the two separate arrays for each antibody increased the statistical power of the analysis. The results of the general linear model analyses are summarised in Table 5-16.

Table 5-16: Summary of statistical analysis and comparison of microarray and qIHC data used for biological validation of findings.

Antibody | Microarray (no. samples = 30): mean expression, invasive | Microarray: mean expression, LMP | Microarray: p-value | qIHC (n = 84): mean expression, invasive | qIHC: mean expression, LMP | qIHC: p-value (n = 84)

E-cadherin | 1.032 | 2.469 | <0.001 | 12.66 | 8.98 | 0.068

Ki67 | 2.662 | 0.63 | <0.001 | 13.49 | 2.465 | <0.001

CD31 | 1.352 | 0.594 | <0.001 | 2.878 | 1.584 | 0.005

Cyclin D1 | 0.901 | 1.535 | 0.01 | 5.023 | 6.052 | 0.18

ERα | 0.87 | 1.716 | >0.05 | 14.36 | 14.74 | 0.849
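The general linear model summarised in Table 5-16 can be illustrated with a small ordinary least-squares fit. The sketch below is an assumption-laden stand-in for the Minitab analysis: it codes histology (invasive vs LMP) and TMA as 0/1 indicator variables and estimates the histology effect adjusted for TMA. It omits the tumour/replicate term and does not compute p-values, which would need an F or t distribution from a statistics package. All function names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_glm(records):
    """Fit intensity ~ intercept + invasive + TMA2 by ordinary least squares.

    records: iterable of (mean_pixel_intensity, histology, tma), with
    histology in {"LMP", "INV"} and tma in {1, 2}.
    Returns (intercept, invasive_effect, tma2_effect).
    """
    X, y = [], []
    for intensity, histology, tma in records:
        X.append([1.0, 1.0 if histology == "INV" else 0.0, 1.0 if tma == 2 else 0.0])
        y.append(float(intensity))
    p = len(X[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return tuple(solve(xtx, xty))
```

For example, `fit_glm([(2, "LMP", 1), (5, "LMP", 2), (12, "INV", 1), (15, "INV", 2)])` recovers an intercept of 2, an invasive effect of 10 and a TMA2 effect of 3, up to floating-point rounding. Adjusting for TMA matters when the two histological types are unevenly distributed across arrays, because a raw difference in means would then partly reflect array-to-array staining variation.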

5.2.10.5. Summary of IHC-based biological validation of invasive/LMP gene expression profile

From this analysis it was observed that increased expression of the genes coding for the proliferation marker MKI67 (Ki67) and the angiogenesis and tumour-recurrence marker PECAM1 (CD31) was mirrored by increased protein expression in an independent cohort of samples, as measured by quantitative IHC (qIHC). Both genes were detected by microarray analysis as upregulated in invasive tumours relative to the LMP type, and both were also observed to be upregulated with high confidence by qIHC (p < 0.001 and p = 0.005, respectively).

Microarray analysis showed the cell-cycle regulating gene cyclin D1 (CCND1) to be down-regulated in the invasive tumours relative to the LMP tumours. The same trend was observed with qIHC analysis, although the difference between LMP and invasive specimens was not statistically significant.

When analysed separately, the qIHC data from TMA1 for the cell-adhesion protein E-cadherin showed a mean difference in the same direction as that observed by microarray analysis and by RT-PCR validation of the same cohort, although this difference was not statistically significant (p=0.096).

This section of the study shows that changes in gene expression identified by microarray analysis and appropriate analytical methods can be extended to an independent cohort. For Ki67 and CD31 in particular, these results also show that changes in mRNA expression levels correlate with changes in protein expression, as measured by a novel, automated qIHC analysis method.

5.3. Discussion

This chapter describes the molecular characterisation of serous LMP and invasive EOC through the use of microarray, RT-PCR and qIHC methods. Considerations around correct pathological classification are also covered, including a novel method for confirming the primary ovarian origin of an individual specimen on the basis of its microarray gene expression profile. A robust list of genes differentially expressed between the LMP and invasive EOC subtypes was determined, and the differential expression of a subset of this list was then validated using RT-PCR. Gene ontology and pathway analyses were carried out to determine the key molecular processes represented within the total list of genes and also within an ontology-filtered subset. IHC was performed on sections of two TMAs composed of needle biopsies taken from an independent cohort of EOC specimens. An automated method for objectively quantifying IHC staining intensity was also described and demonstrated for the antibodies selected.

5.3.1. Findings from this analysis and relevance to published studies of LMP or invasive EOC

At the commencement of this project, few studies involving the gene expression profiling of LMP EOC had been published, despite the potential for insight into the events responsible for the ultimately fatal invasive capability of serous EOC. During the course of this study, however, two studies involving microarray profiling of this invasive/non-invasive EOC model have been published (Gilks et al., 2005; Warrenfeltz et al., 2004).


One of the published microarray studies of EOC whose gene list was found to significantly overlap with the genes identified by comparison of LMP and invasive EOC in this chapter was that by Warrenfeltz et al in 2004. This analysis was done using Affymetrix U95Av2 oligonucleotide chips, making it difficult to directly compare gene expression measurements with those determined using cDNA microarrays. The Warrenfeltz study was limited by its small sample size (n=13) and the inclusion of only two mucinous and two serous LMP specimens. Another limitation was the combining of the mucinous and serous LMP tumours into one class, despite the known extensive molecular and behavioural differences between these subtypes. The publication does not describe how the specimens were reviewed to confirm their suitability for the study, nor does it report any comparison of the generated expression profiles with other datasets to confirm the primary ovarian origin of the mucinous-type tumours. This comparison could have been carried out using publicly available datasets from studies comparing broad ranges of human cancer types on the Affymetrix platform (Ramaswamy et al., 2001; Su et al., 2001).

One conclusion drawn by the authors of this study is that borderline tumours represent an intermediate stage between benign adenomas and malignant adenocarcinomas, based on the expression levels of selected genes lying midway between those observed for the benign and malignant types. The small sample size and the combination of mucinous and serous LMP samples call the validity of this conclusion into question. If either or both of the mucinous LMP specimens profiled were in fact metastatic invasive tumours from another site, such as the appendix, pooling the profiles of these samples could increase the mean expression level of genes associated with malignancy and invasion, making the class appear transitional between the two extremes represented by the other classes.

The literature concerning the frequency of malignant transformation of LMP EOC suggests this phenomenon is extremely rare (Puls et al., 1992). One meta-analysis of over 137 serous LMP tumours found only a single case of recurrence as invasive cancer among the 66 stage I tumours investigated. Of the 45 stage II-IV tumours, six (13%) were noted to possess invasive implants; the women affected experienced an unfavourable outcome, and three of them died of their disease. The presence of invasive implants in serous LMP tumours is associated with a significantly poorer prognosis (Prat and De Nictolis, 2002). It could therefore be hypothesised that tumours with these pathological features would exhibit a more ‘invasive-like’ gene expression profile if the tumour biopsy used for microarray analysis included areas of invasive implants and was not microdissected prior to RNA extraction and processing (Liotta and Petricoin, 2000).


Warrenfeltz et al do not state whether microdissection of the LMP tumours was performed or whether any evidence of invasive implants was noted during the review process. Despite these factors, their list of 163 genes differentially expressed between benign, LMP and invasive EOC overlaps with high statistical significance with the list produced from the larger cohort (but smaller microarray) used in this analysis. Their list contains a high proportion of genes involved in processes similar to those observed in the gene ontology analysis carried out in section 5.2.7.3, including regulation of growth/proliferation, adhesion, and control of DNA replication and associated events.
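One common way to assess whether two gene lists overlap more than expected by chance is a one-sided hypergeometric test. The thesis does not state here which test underlies the significance of this overlap, so the sketch below illustrates the general technique rather than reconstructing that analysis. The parameter names are hypothetical, and the universe size should be the number of genes measurable on both platforms, which for a cross-platform comparison is itself an assumption.

```python
from math import comb


def overlap_p_value(universe, list_a, list_b, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(universe, list_a, list_b).

    universe: number of genes measurable on both platforms
    list_a, list_b: sizes of the two differentially expressed gene lists
    overlap: number of genes common to both lists
    """
    total = comb(universe, list_b)
    upper = min(list_a, list_b)
    hits = sum(
        comb(list_a, k) * comb(universe - list_a, list_b - k)
        for k in range(overlap, upper + 1)
    )
    return hits / total
```

For example, drawing 5 genes from a universe of 10 that contains 5 "list A" genes, the probability that all 5 overlap is `overlap_p_value(10, 5, 5, 5)`, which equals 1/252. Python's arbitrary-precision integers keep `comb` exact even for universes of several thousand genes.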

Of the differentially expressed genes that overlap with this study, the cell-adhesion molecule E-cadherin (CDH1) is noted by Warrenfeltz et al as an example of a gene whose expression progressively increases, by over 2-fold, from benign to LMP and then invasive cancer. The authors propose that, since the adhesive function of cadherins is calcium-dependent (Pokutta et al., 1994) and their list of differentially expressed genes contained molecules involved in controlling calcium transport or channel activity, dietary calcium intake may be related to the development and progression of EOC. Studies of the dietary patterns of women with and without EOC have in fact shown a reduced risk of ovarian cancer with increased dietary calcium intake (Goodman et al., 2002), supporting this hypothesis.

Another, more recently published microarray study comparing LMP and invasive EOC is that by Gilks et al (Gilks et al., 2005). This study used genome-comprehensive cDNA microarrays to profile 23 samples of serous invasive or LMP cancer. Contrary to the findings of this study, only a small number of genes were found to be differentially expressed between the compared tumour classes. Unsupervised filtering of this dataset reduced the number of genes from >45,000 to 541 by excluding genes without a 2-fold variation (on the log2 scale) from the mean value in at least 3 arrays; from these, a total of 217 genes were identified by SAM gene selection (Tusher et al., 2001). Inspection of the SAM output reveals that all genes selected as differentially expressed were upregulated in the LMP tumours relative to the invasive cases. This is a surprising observation, as other studies comparing high- and low-grade tumours have revealed a significant number of genes with higher expression in the more metabolically active high-grade samples. The small number of class-discriminating genes and the uni-directional change in expression may indicate a technical fault in the experimental stages of this study. Visual inspection of the hybridised microarray images available from the Stanford Microarray Database at http://genome-www5.stanford.edu revealed extremely poor array printing and hybridisation quality (Sherlock et al., 2001). Examples of three randomly selected microarrays from this dataset are shown in Appendix Q. Individual array features had poor morphology, and there appear to have been problems with uneven hybridisation of labelled probe across the surface of the printed area. This most likely resulted in a spatial bias in the final gene expression ratios obtained, affecting all downstream data analysis. The manuscript and supporting material make no mention of the use of a spatially dependent normalisation algorithm, which may have lessened the impact of these technical issues. In light of these observations, the claims that only a small number of genes discriminate LMP and invasive serous EOC, or that the molecular events responsible for such marked phenotypic differences are below the threshold of detection by microarrays, must be questioned.
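The variance filter described for the Gilks et al dataset can be sketched as follows. The sketch assumes each gene's values are log2 expression ratios and interprets "2-fold variation from the mean" as an absolute deviation of at least log2(2) = 1 from that gene's mean across arrays; missing values (None) are skipped. It illustrates this style of filter only and is not the code used by Gilks et al; the function names are hypothetical.

```python
import math
from statistics import mean


def passes_filter(log2_ratios, min_fold=2.0, min_arrays=3):
    """True if the gene deviates by >= min_fold from its mean in at least min_arrays arrays."""
    values = [v for v in log2_ratios if v is not None]
    if not values:
        return False
    centre = mean(values)
    threshold = math.log2(min_fold)  # 1.0 for a 2-fold change
    n_varying = sum(1 for v in values if abs(v - centre) >= threshold)
    return n_varying >= min_arrays


def filter_genes(data, **kwargs):
    """Keep only genes (name -> list of log2 ratios) that pass the variance filter."""
    return {gene: vals for gene, vals in data.items() if passes_filter(vals, **kwargs)}
```

A filter of this kind is unsupervised, since it ignores class labels, but it strongly favours genes with large dynamic range, so any spatial or hybridisation artefact that inflates ratio variance will also influence which genes survive to the SAM step.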

5.3.2. Analysis of differentially expressed genes identified by multiple studies

Only three differentially expressed genes were identified in common between this study

and those by Gilks et al and Warrenfeltz et al. These were: progestagen-associated

endometrial protein (PAEP), connective tissue growth factor (CTGF) and anterior

gradient 2 homolog (AGR2).

PAEP, five-fold upregulated in LMP tumours in this study, codes for a protein known as glycodelin. This molecule has been detected in both normal and malignant ovaries. It is also secreted in the endometrium during the menstrual cycle and during the first trimester of pregnancy (Joshi et al., 1982). One form of the protein encoded by this gene (glycodelin-A) is known to have immunosuppressive activity, which may help the slower-growing LMP tumours to slow or prevent the immune system from engaging to the extent observed in invasive disease (Kamarainen et al., 1996). IHC analysis of 460 serous invasive EOC specimens using the TMA method revealed that higher levels of this protein are associated with a higher 5-year survival rate. Expression decreased with increasing tumour stage, defined by the extent of invasion observed, with the most significant reductions occurring in stage III and IV tumours compared with stage I. The reduction at stage II was not statistically significant, suggesting that loss of expression or function of glycodelin may be required for successful invasion beyond the pelvis, a pathological distinction between LMP and invasive EOC (Mandelin et al., 2003).

Over-expression of CTGF, coding for a secreted integrin-binding protein, has been implicated in the inhibition of lung adenocarcinoma metastasis and invasion (Chang et al., 2004); however, literature concerning its involvement in ovarian cancer is scarce. One group identified overexpression of this gene in a cisplatin-resistant ovarian cancer cell line through the use of microarray profiling (Sakamoto et al., 2001). However, CTGF is actually down-regulated in invasive tumours relative to the LMP type in both this and the two published studies, which may suggest another mode of action for this molecule in ovarian cancer.

AGR2/HAG-2 is a human homologue of a cement gland gene in Xenopus laevis and is on average 5.6-fold upregulated in the LMP tumours profiled in this chapter. The expression of this gene in breast cancer has been shown to be significantly higher in malignant cell lines and human tumours than in benign cell lines and non-cancerous tissues. Transfection of this gene into non-metastatic cell lines has been shown to result in lung metastases in animals transplanted with these cells. An increased rate of adhesion was also observed in the AGR2-transfected cells, further implicating this gene in the process of metastasis (Liu et al., 2005a). Increased expression of this gene has been detected in prostate cancer compared with benign disease, the inverse of the pattern detected in LMP/invasive EOC (Zhang et al., 2005). However, another study has shown that this over-expression in prostate cancer is not of prognostic significance (Kristiansen et al., 2005).

Analysis of genes associated with pancreatic cancer has identified AGR2 as a marker of malignancy; however, beyond its high over-expression in this malignancy relative to normal pancreas, little is known of its specific involvement in this disease, as is also the case for ovarian cancer (Missiaglia et al., 2004; Ryu et al., 2002).

In breast cancer this gene is co-expressed with the oestrogen receptor and is thought to

play a role in metastasis through the regulation of receptor adhesion and function. AGR2

is proposed as a novel molecular marker of disease progression or potential therapeutic

target for hormone responsive breast tumours (Fletcher et al., 2003).

Each of these three overlapping genes has an interesting clinical and molecular profile across a range of human cancers, and their expression appears to be involved in core processes associated with tumour invasion, including that of EOC. Each gene has been shown to have some form of adhesive functionality, whether by increasing the rate of cell attachment to a specific substrate (AGR2), by a marked decrease in expression as EOC adheres to and invades the pelvis (PAEP), or by binding to cell-attachment integrins (CTGF).


The very small overlap between all three lists of differentially expressed genes between

LMP and invasive EOC is perhaps not surprising in light of the known heterogeneity of

this disease. Furthermore, observations from microarray studies of other cancer types,

particularly BrCa, have indicated that sample size, specific cohort composition and

method of analysis play a major role in the robustness of lists describing differential gene

expression between tumour subtypes.

5.3.3. Use of gene expression based predictive analysis to confirm specimen diagnosis and identify metastatic disease

One feature distinguishing this study from other studies of LMP EOC is the use of cross-validated, gene expression-based confirmation of tumour diagnosis. Using a large microarray dataset comprising expression profiles of tumours from 10 different primary sites, a signature of genes with ovarian-specific expression patterns was identified. Using a combination of machine learning algorithms and leave-one-out cross-validation (LOOCV), each EOC sample was classified as either primary ovarian or not before inclusion in the final cohort. This was done in conjunction with a standard pathology review, which consisted of review of the H&E-stained section taken from the tumour specimen and the original diagnostic pathology report associated with each case.
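Leave-one-out cross-validation (LOOCV) holds out each sample in turn, builds the classifier from the remaining samples and predicts the held-out sample. The thesis used a combination of machine learning algorithms; the sketch below substitutes a simple nearest-centroid classifier (Euclidean distance on expression vectors) purely to show the LOOCV loop, and all names are hypothetical.

```python
import math


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def nearest_centroid_predict(train_x, train_y, sample):
    """Assign the sample to the class whose training centroid is closest."""
    classes = sorted(set(train_y))
    centroids = {
        c: centroid([x for x, y in zip(train_x, train_y) if y == c]) for c in classes
    }
    return min(classes, key=lambda c: math.dist(sample, centroids[c]))


def loocv_predictions(x, y):
    """Predict each sample using a model trained on all other samples."""
    preds = []
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        preds.append(nearest_centroid_predict(train_x, train_y, x[i]))
    return preds
```

Usage: `preds = loocv_predictions(profiles, labels)` followed by `accuracy = sum(p == t for p, t in zip(preds, labels)) / len(labels)`. A sample whose held-out prediction is not "ovary" would then be flagged for pathology re-review rather than excluded automatically. Because each held-out sample never contributes to the centroids used to classify it, the accuracy estimate is not inflated by training on the test sample; in a full analysis any gene selection step would also have to be repeated inside the loop for the same reason.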

In a landmark study, Ramaswamy et al determined that the molecular profile of a metastatic tumour was more similar to that of the tissue from which it arose than to that of the tissue from which it was excised, suggesting that metastatic potential is a variable determined in the early stages of tumorigenesis (Ramaswamy et al., 2003). Since then, a number of other studies have demonstrated the use of gene expression profiling to identify the origin of metastases taken from a range of sites (Li et al., 2003; Roepman et al., 2005; Shedden et al., 2003; Talbot et al., 2005). This technique is used and extended here to confirm the primary ovarian origin and other relevant classifications of the specimens being analysed.

The list of genes identified (shown in hierarchical cluster format in Figure 5-4), validated

with extensive permutation testing, serves as a molecular fingerprint for EOC and with

further refinement could form the basis of a high-throughput screening test for confirming

the origin of a suspicious sample of tumour found in the ovary, but exhibiting

characteristics of another tissue type. Several examples of this approach exist in the

literature to date, including a study in which a predictive microarray signature of breast

cancer recurrence was translated to 384-well RT-PCR format using the ABI 7900 real-

time thermal cycler (Paik et al., 2004) and also a recently published PCR-based predictor


of the primary origin for cancers of unknown primary by Tothill et al (Tothill et al.,

2005).
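Permutation testing asks how often a statistic at least as extreme as the observed one arises when class labels are randomly shuffled. The sketch below applies the idea to a single difference in group means; the thesis applied permutation testing to its gene signature, so the statistic, function names and number of permutations here are assumptions chosen for illustration.

```python
import random
from statistics import mean


def mean_difference(values, labels):
    """Mean of the samples labelled 1 minus mean of the samples labelled 0."""
    g1 = [v for v, lab in zip(values, labels) if lab == 1]
    g0 = [v for v, lab in zip(values, labels) if lab == 0]
    return mean(g1) - mean(g0)


def permutation_p_value(values, labels, statistic=mean_difference, n_perm=999, seed=0):
    """One-sided permutation p-value, counting the observed labelling as one permutation."""
    rng = random.Random(seed)
    observed = statistic(values, labels)
    shuffled = list(labels)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real association between labels and values
        if statistic(values, shuffled) >= observed:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_perm + 1)
```

Adding one to both numerator and denominator treats the observed labelling as one of the permutations, so the p-value can never be exactly zero. The same loop works for any statistic, for example the LOOCV accuracy of a classifier, by passing a different `statistic` function.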

In this chapter, in addition to confirming the primary ovarian origin of a given tumour specimen, predictive analyses were also carried out to confirm the histological subtype and LMP/invasive classification. With further development it may be possible to combine these lists of predictive genes into a single test capable of predicting a range of clinically important parameters on the basis of a small number of genes, thereby minimising the expense and delay involved in making such information available to the treating clinician.

5.3.4. Cell adhesion molecules and EOC malignancy

The importance of cell-cell and cell-matrix interactions in controlling the malignant potential of a range of cancer types has become apparent in recent years. The metastatic process in EOC involves cancer cells being shed from the epithelium, dissemination throughout the pelvis and localised proteolysis leading to invasion throughout the body (Rodriguez et al., 2001). A defining characteristic of true LMP tumours is their inability to spread beyond the pelvis, combined with an ability to grow successfully in this area while avoiding the immune response for many years (Allen et al., 1987; Kurman and Trimble, 1993; Prat and De Nictolis, 2002). It is therefore hypothesised that differential expression of cell adhesion molecules plays a pivotal role in determining the clinical behaviour, and consequent mortality rates, of these two disease subtypes.

In this study, genes with cell-cell/matrix adhesive functionality were observed to be a differentially regulated class between invasive and LMP EOC. This became apparent only after removal of the large number of genes whose selection during the analysis of LMP and invasive EOC was due to the faster growth rate and increased metabolic activity of the invasive form of this disease (D'Andrilli et al., 2004; Prat and De Nictolis, 2002). This observation agrees with the study by Warrenfeltz et al (Warrenfeltz et al., 2004), who also noted the over-representation of genes of these classes in their analysis of tumours of varying malignant potential. The finding is taken further in this chapter by the use of pathway analysis to elucidate a network of interconnected genes whose combined regulation may be responsible for mediating a tumour’s invasive ability.

The gene expression network deduced from pathway analysis in section 5.2.8 represents a collection of genes whose products are crucial for the ability of tumour cells to adhere to each other, as well as to the mesothelium lining the peritoneal cavity, a major route of invasion for EOC (Gardner et al., 1995). Amongst these genes is mesothelin (MSLN), a soluble product of which has recently been shown to bind exclusively to the CA-125 molecule, initiating cell adhesion processes. Conversely, antibodies that bind and block the mesothelin protein prevent an ovarian cell line from attaching to a mesothelin-coated substrate. Both MSLN and CA-125 exhibit higher levels of expression in advanced-grade tumours, and it is hypothesised that their interaction may represent a crucial stage in the attachment of cancer cells to the mesothelium, facilitating the invasion process (Rump et al., 2004). While the cDNA clone for CA-125 was not present on the cDNA microarray used for this study, the mesothelin gene was expressed on average 2.1-fold higher in the invasive tumours profiled.

The tumour antigen CA-125 is measured routinely as an indicator of disease burden and EOC patient prognosis. It is expressed in both normal and malignant ovarian tissue; however, its release into the extracellular space is strongly associated with tumorigenesis (Meyer and Rustin, 2000). Although it was discovered some time ago and is accepted as a reliable prognostic marker, its precise function in EOC remains unknown (Bast et al., 1981; Meyer and Rustin, 2000). CA-125 has a demonstrated high-affinity interaction with an extracellular matrix protein, galectin 1 (LGALS1), a cell surface lectin implicated in the regulation of cell-cell/matrix adhesion. It has also been demonstrated that the binding characteristics of CA-125 may be regulated by gene expression in the tissues surrounding the tumour cell on which it is expressed (Seelenmeyer et al., 2003). Other studies have demonstrated that the addition of galectin 1 results in a dose-dependent increase in EOC cell adhesion to both laminin-1 and fibronectin (FN1) (van den Brule et al., 2003). Another recent finding that has stimulated interest in galectin 1 is its immunosuppressive function: by directly causing T cell apoptosis, specifically at the site of tumour infiltration, this molecule appears to aid tumour progression by preventing an effective immune response (He and Baum, 2004).

Fibronectin (FN1) is expressed on the cell surface and also in the extracellular matrix of

EOC cells; however, it is the intracellular form of this molecule that has the most

significant impact on the behaviour of invasive EOC (Zand et al., 2003). Like galectin 1,

to which it binds, this gene has multiple functions in addition to its adhesive properties,

including tumour neovascularisation, immunosuppression and prevention of apoptosis, all

crucial events during tumour proliferation (Franke et al., 2003). In this study, a

statistically significant 1.5-fold increase in its expression was detected in invasive EOC

compared to the LMP tumours. Others have demonstrated its correlation with tumour stage

and growth fraction, making it a candidate prognostic factor. In a study to determine the

level of influence the mesothelium exerts over the motility and invasiveness of ovarian

cancer cells, monoclonal antibody blocking of fibronectin was observed to significantly

inhibit ovarian cancer cell motility (Rieppi et al., 1999). When EOC cells are bound to

fibronectin, however, they are protected from apoptosis on exposure to chemotherapy

drugs including cisplatin and paclitaxel, further underscoring the importance of this

molecule in ovarian malignancies (Jun S, 2003).

One of the adhesion genes upregulated in the LMP tumours analysed is CD44,

traditionally known as a homing receptor that interacts with members of the ezrin (VIL2)

family of genes and is also capable of binding extracellular fibronectin (Jalkanen and

Jalkanen, 1992; Martin et al., 2003). The CD44/ezrin combination has been found in a

range of cancer types and is thought to assist the tumour cell in locating and binding to

favourable organs for invasion and metastasis. Both genes are down-regulated in the

invasive tumours profiled in this analysis. CD44 is a surface glycoprotein with a range of

functions including cell-matrix adhesion, whereas ezrin is a member of the ERM protein

family (Ezrin, Radixin and Moesin), which performs structural and regulatory roles in

plasma membrane domains. In ovarian cancer, CD44 expression is detected in primary

tumours but much less frequently in metastatic growths (Bar et al., 2004), which is thought

to be due to hypermethylation of its promoter region. This reduction in expression

correlates with stage, survival and dissemination of cancer cells to surrounding tissues

(Martin et al., 2003).

In colorectal cancer studies, the function of ezrin in controlling the adhesive potential of

tumour cells has been demonstrated through the use of antisense oligonucleotides.

Inhibiting ezrin expression was observed to result in reduced cell-cell adhesion and a

subsequent increase in motility and invasiveness. Coprecipitation studies also revealed an

association with the cell adhesion molecule E-cadherin (CDH1) (Hiscox and Jiang, 1999),

another member of the differentially expressed gene network identified in this chapter.

Also present in the expression network are CD9, CD36 and CD47, which code for

membrane proteins involved in mediating a tumour cell’s adhesive interactions with the

surrounding stroma and (unlike CD44) are upregulated in the majority of invasive EOC

specimens profiled. Expression of these genes is associated with cellular differentiation


This study points to the importance of cell-cell adhesion junctions in EOC as a major

determinant of a tumour’s malignant potential. Impairment of a range of adhesion

interactions, whether as a cause or a consequence of over- or under-expression of a

relatively small number of genes with seemingly crucial roles, appears to be closely

connected to a tumour’s intra-abdominal spread and subsequent invasion of other parts of

the body.

E-cadherin is one of the central molecules in the gene expression network shown in

Figure 5-10 and was identified as being, on average, 2-fold upregulated in LMP tumours.

Conflicting reports exist in the literature concerning the regulation of this gene and its

impact on cellular adhesion, differentiation and tumour proliferation. Some studies show

that decreased expression correlates with dedifferentiation and increased invasive

properties in free-floating EOC cells, invasive endometrial carcinoma and other cancer

types including bladder and prostate (Fujimoto et al., 1997; Ross et al., 1995; Sakuragi et

al., 1994; Umbas et al., 1992; Veatch et al., 1994). Other studies have shown that levels of

E-cadherin mRNA are higher in weakly metastatic cell lines than in those with high

metastatic capability (Hashimoto et al., 1989), although cell lines can often exhibit

substantial differences in gene expression compared to primary tumours (Ross and Perou,

2001). Conceptually, genes involved in extracellular interactions such as adhesion to

other cells and to the extracellular matrix may be the most severely affected by the in vitro

culturing process. Despite this, several studies have shown that E-cadherin can be detected

in benign, LMP and malignant EOC (Sundfeldt et al., 1997). One hypothesis for this

apparent variation in expression between tumour types and stages of malignancy is a

transient up- and down-regulation of E-cadherin, and feasibly of other associated

differentially expressed genes identified in this study, to promote tumour proliferation

and multiple phases of invasion (Sundfeldt, 2003). As a significant number of the genes

found to discriminate between invasive and LMP EOC were also involved in promotion or

suppression of the immune response, a punctuated process of invasion may actually assist

the tumour in preventing a successful attack from the body’s natural defences.

In the biological validation section of this chapter, no significant difference in

E-cadherin IHC staining was detected between LMP and invasive tumours, despite the

significant reduction observed in the invasive EOC samples by microarray analysis. One

explanation for this apparent lack of correlation may be the higher stromal content of LMP

tumours, in which E-cadherin is not expressed. In tumours of this kind, less E-cadherin

expression would be detected by IHC per high-power field than in an equivalent area of

invasive EOC tumour, which generally has a higher tumour-to-stroma ratio.


Figure 5-16: Examples of ER-stained LMP and invasive tumours. (A) Sample LMP tumour with a higher proportion of stroma and a lower proportion of tumour present per high-power field (HPF). On average, mRNA expression of ERα is higher in LMP tumours than in the invasive type, but lower expression or no change was observed at the protein level via IHC analysis. (B) Sample invasive tumour showing more tumour material and less stroma per HPF.


5.3.5. High throughput analysis of TMA IHC

Also described in this chapter is a method for automating the quantification of antibody

staining intensity in large numbers of tumour sections, such as those generated by the use

of TMAs. A number of studies have demonstrated the use of commercially available

image analysis and statistics software for the objective analysis of IHC (Lehr et al., 1997;

Lehr et al., 1999; Matkowskyj et al., 2000). IHC is frequently used in gene expression

studies to determine the biological significance of differentially expressed mRNA levels

in a given tissue sample (Abd El-Rehim et al., 2005; Al Kuraya et al., 2004; Liu et al.,

2005b; Pacifico et al., 2004). Other groups have published and distributed software

designed for handling the large numbers of images and relevant clinical information

associated with a TMA (Liu et al., 2002).

The method proposed and demonstrated in this study makes use of Microsoft Visual

Basic scripting (Microsoft Corporation, USA) and several automated features of Adobe

Photoshop (Adobe Systems Inc., USA) to process complex digital images of IHC-stained

tumour sections and determine the mean pixel intensity of a given image and its associated

standard deviation. The method uses the threshold function to eliminate unstained areas

of a section during the quantification process. Because this function is based on a

user-definable level from 1 to 255 (arbitrary units), the method can be adapted to varying

levels of background or secondary haematoxylin staining. Whilst the method is not

packaged as a fully automated application with a graphical user interface, the Visual Basic

code written and the protocols described allowed the analysis of almost 1000 IHC images

in a relatively short period of time, and could potentially be developed into a more

user-friendly application.
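The threshold-then-summarise step can be illustrated outside Photoshop. The following is a minimal Python analogue, not the Visual Basic code used in this study. It assumes that each image has already been converted to an 8-bit greyscale NumPy array (0 = black, 255 = white) and that stained tissue is darker than the chosen threshold. Pixels at or below the user-defined level are therefore retained, and lighter, unstained pixels are discarded. The function name is hypothetical.

```python
import numpy as np

def stained_intensity_stats(gray, threshold):
    """Mean and SD of pixel intensity over the stained area of one image.

    Hypothetical helper. Assumes `gray` is an 8-bit greyscale array in which
    stained tissue is darker (lower values) than unstained areas. Pixels above
    `threshold` are treated as unstained and excluded, mirroring a threshold step.
    Returns (mean, sd, fraction of pixels retained).
    """
    gray = np.asarray(gray)
    if gray.dtype != np.uint8:
        raise ValueError("expected an 8-bit greyscale image")
    if not 1 <= threshold <= 255:
        raise ValueError("threshold must lie between 1 and 255")
    mask = gray <= threshold                  # True where the pixel counts as stained
    stained = gray[mask].astype(float)
    if stained.size == 0:                     # nothing passed the threshold
        return float("nan"), float("nan"), 0.0
    return float(stained.mean()), float(stained.std()), float(mask.mean())

# A 2 x 2 toy "image": two dark (stained) pixels and two light (background) pixels
toy = np.array([[10, 20], [200, 250]], dtype=np.uint8)
print(stained_intensity_stats(toy, threshold=100))  # (15.0, 5.0, 0.5)
```

To reproduce the batch character of the method, the function could be applied in a loop over all images from a TMA, with the threshold tuned once to the background and counterstain level. Loading image files is left out to keep the sketch dependency-light. Pillow's `Image.open(path).convert("L")` followed by `np.asarray` is one way to obtain the greyscale array.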

5.4. Summary and conclusions from chapter

This chapter describes the use of molecular profiling to characterise EOC in relation to

other tumour types and also to explore potential reasons for the differences between the

LMP and invasive subtypes of the disease. Through extensive pathology review and

microarray-based predictions, a high quality cDNA microarray dataset was generated

which was capable of predicting the primary origin of a tumour found in the ovary, as

either primary ovarian or metastatic, on the basis of a 231-gene signature. Extensive

bioinformatic analysis of the microarray data confirmed the diagnosis of the specimens of

primary serous invasive and LMP cancer and also revealed a large number of


differentially expressed genes. A significant proportion of the genes differentiating

between these tumour types were found to be involved in the body’s immune reaction to

invading tumour cells, or in control (or perhaps loss of control) of the cell cycle, leading

to unrestrained proliferation. After ontology filtering, a number of inter-connected

cell-cell or cell-matrix adhesion molecules were also identified as having differential

expression between EOC subtypes. These genes regulate a tumour cell’s ability to attach

to surfaces, and it is proposed that their dysregulation represents a crucial first step of the

invasion process. Gene expression pathway analysis was performed and a network of

interacting differentially expressed genes was deduced with a central function of cell

adhesion regulation, a recurring theme in several sections of this chapter. Finally, RT-PCR

and IHC on TMAs were used to confirm the expression of selected genes and their

corresponding proteins on a series of independent samples.


6. Discussion & Conclusion

6.1. Summary of major findings

6.1.1. Optimisation of microarray technology for large-scale tumour profiling studies

Despite the significant advances in molecular biology, robotics and computing power that

have made cDNA microarray technology accessible to many laboratories, ranging from

major pharmaceutical companies to small academic research groups, the generation of

high quality gene expression data remains a complex process that is dependent on many

factors. Chapter Three of this thesis describes the analysis of a number of important

variables in the microarray work flow that have the ability to impact on the quality and

interpretation of cDNA microarray data and thus impact on the success or failure of an

experiment carried out using this tool.

A novel method for quantifying and visualising spatial bias in cDNA microarray data was

proposed, and its ability to improve a bioinformatic analysis of genes predictive of

tumour relapse was demonstrated. This simple statistical test is

applicable to any cDNA microarray platform and makes use of widely available statistical

software. An automated script to facilitate its application to large datasets was also

described. This test was also demonstrated to be an effective way of monitoring the

impact of new image analysis algorithms and software packages, or changes to the

printing or scanning protocols and/or equipment by providing objective measurements of

spatial bias, rather than relying on a subjective visual interpretation.
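The specific statistic from Chapter Three is not restated here. The following sketch rests on an illustrative assumption. It scores spatial bias as differences in mean log-ratio between the print-tip sub-grids (blocks) of one array, and it summarises those differences with a one-way ANOVA F statistic. A value near zero indicates no block-to-block shift, and large values flag spatial structure worth inspecting. The function and argument names are hypothetical.

```python
import numpy as np

def block_f_statistic(log_ratios, block_ids):
    """One-way ANOVA F statistic of spot log-ratios grouped by sub-grid (block).

    Hypothetical helper; non-finite log-ratios (e.g. flagged spots) are dropped.
    """
    x = np.asarray(log_ratios, dtype=float)
    g = np.asarray(block_ids)
    keep = np.isfinite(x)
    x, g = x[keep], g[keep]
    groups = [x[g == b] for b in np.unique(g)]
    k, n = len(groups), x.size
    if k < 2 or n <= k:
        raise ValueError("need at least two blocks and more spots than blocks")
    grand_mean = x.mean()
    # Between-block and within-block sums of squares
    ss_between = sum(grp.size * (grp.mean() - grand_mean) ** 2 for grp in groups)
    ss_within = sum(((grp - grp.mean()) ** 2).sum() for grp in groups)
    return float((ss_between / (k - 1)) / (ss_within / (n - k)))

blocks = ["A", "A", "A", "B", "B", "B"]
print(block_f_statistic([0, 1, 2, 0, 1, 2], blocks))  # 0.0  (no block effect)
print(block_f_statistic([0, 1, 2, 3, 4, 5], blocks))  # 13.5 (block B shifted up)
```

A p-value could then be taken from the F distribution with (k - 1, n - k) degrees of freedom, for example with `scipy.stats.f.sf`. Tracking the statistic across arrays would allow scanners, image analysis settings or printing protocols to be compared objectively.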

There are important practical and data-quality issues associated with the choice of a

reference RNA for a large-scale gene expression study. By comparing gene expression

data obtained from a series of EOC specimens hybridised to two different types of

reference RNA, it was determined that a reference composed of cell-line RNA was, albeit

marginally, the most appropriate choice for a study of EOC gene expression. This

conclusion was based on the proportion of the total probe set detectably hybridised by the

reference RNA, the ability of the data generated to discriminate between tumour

subtypes, and practical considerations associated with ease of cross-study comparison and

the durability of the reference RNA resource. Although each of the references tested

detectably hybridised an equal proportion of the microarray features, the

reference material generated by combining RNA extracted from a subset of the cohort of


interest (i.e. project-specific) identified a number of extra genes involved in a tumour’s

interaction with its environment. This may be an important factor for studies seeking to

explore this aspect of tumour biology.

By comparing expression data generated by two microarray scanners with differing

features, it was observed that a significant amount of systematic error can be

introduced at this stage of the cDNA microarray work flow. It was found that one scanner

offered clear advantages in terms of lower spatial bias and more accurate individual gene

expression measurements at a range of different Cy3:Cy5 ratios and absolute abundances.

Importantly, the findings from chapter 3 were applied to the expression profiling and

bioinformatic analyses of EOC carried out in chapters 4 and 5.

6.1.2. Gene expression based prediction of patient survival

Gene expression profiling has been used to explore the molecular basis of a wide range of

solid human cancers, including malignancies of the breast (Hedenfalk et al., 2001; Seth et

al., 2003; Sorlie et al., 2001), prostate (Bull et al., 2001; Calvo et al., 2002; Dhanasekaran

et al., 2001; Singh et al., 2002) and gastrointestinal tract (Boussioutas et al., 2003;

Hasegawa et al., 2002; Hippo et al., 2002). Microarrays have also been used successfully

to profile various forms of leukaemia and lymphoma (Alizadeh et al., 2000; Golub, 2001;

Khan et al., 1998; Lossos et al., 2004). In these studies, the gene expression data was

analysed in the context of various types of clinical information, such as histological

subtypes, response to treatment and length of survival, or in relation to genetic

information such as the mutation status of BRCA1 or BRCA2.

Studies of EOC using microarrays to profile gene expression related to clinically

important variables have been limited by small sample sizes and the heterogeneous nature

of this type of cancer. During the course of this thesis a number of studies were published

in which comparisons between EOC and normal ovarian surface epithelium or

histological subtypes were made (Ono et al., 2000; Schaner et al., 2003; Schwartz et al.,

2002; Wang et al., 1999; Welsh et al., 2001). During the later stages of this work, several

studies were published in which expression profiles were related to tumour grade,

malignant potential or patient prognosis (Gilks et al., 2005; Jazaeri et al., 2003; Spentzos

et al., 2004; Warrenfeltz et al., 2004).

Chapter 4 sought to profile specimens of EOC surgically removed from patients with

varying survival times. The goal of this chapter was to identify gene expression patterns


that correlate with the length of survival and explore the biology behind these patterns.

Based on compelling evidence that the amount of residual disease following surgery

impacts substantially on patient survival, analyses were confined to those patients with

adequate clinical information. The resulting cohort was analysed using a range of

approaches, including methods for considering survival as a continuous or a categorical

variable. None of the analyses generated a statistically significant list of survival-related

genes, almost certainly due to the limited sample size. Despite this, within the gene lists

that were generated, a number of biologically relevant genes were observed, many of

which are implicated in the development and progression of other cancer types and other

reproductive system diseases, as well as in the regulation of cell growth and proliferation.

An analysis similar to that of Spentzos et al (Spentzos et al., 2004) came the closest to

generating a statistically significant list of differentially expressed genes between patient

survival groups. The list of 27 genes obtained was enriched for molecules known to be

involved in calcium-binding and calcium channel functions. Coupled with interesting

epidemiological evidence about the level of dietary calcium intake and EOC risk, and also

the known importance of calcium-dependent cell-adhesion molecules in EOC progression

and invasion, these 27 genes may represent a prognostic group worthy of further

investigation.

Gene lists obtained from other studies of EOC were applied to the data generated for this

chapter to determine whether they were able to segregate patients on the basis of survival,

level of residual disease or probability of tumour relapse. These gene lists were not able

to predict these key clinical parameters to the same extent achieved in their original

studies. This may reflect the heterogeneity of EOC, which makes it difficult to apply

findings based on one cohort of specimens to other samples, or a deficiency in the quality

of the microarray data generated for this section of the thesis. It may also indicate that the

sample sizes used to analyse EOC have been insufficient to produce a truly universal

prognostic signature for this cancer type.

The phenomenon of non-overlapping sets of genes generated by independent studies of

the same cancer type has been described by Ein-Dor et al (Ein-Dor et al., 2005), with

particular reference to breast cancer, for which a number of molecular signatures for

predicting the development of metastases have been published. By focusing on one

published dataset, that of van’t Veer et al (van 't Veer et al., 2002), they found that no

single gene had a very high correlation with the outcome variable; rather, a large number

of genes in the dataset had moderately correlated patterns of expression. As a result, a

number of different non-overlapping predictive subsets were identified that produced the

same classification accuracy as the published 70-gene profile. The ranking of genes in

order of prediction accuracy was observed to fluctuate drastically with even small changes

in the training cohort of samples.

The influence of sample size on the discovery of significant predictive gene expression

signatures using microarrays has been evaluated by Ntzani et al (Ntzani and Ioannidis,

2003). This was achieved by comparing the cohort size and prediction accuracies of 84

published studies in which microarray data was used to predict clinical outcomes such as

death, metastasis, recurrence or response to therapy. It was found that a doubling in the

number of samples profiled led to a 3.5-fold increase in the probability of a study

identifying a significant association between gene expression and outcome. Furthermore,

a significant association was 9.7 times more likely with each ten-fold increase in the

number of clones represented on the microarray platform used.
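The following sketch makes the scale of these effects concrete. It extrapolates them under an assumption that the source does not state: each reported factor is taken to act multiplicatively, once per doubling of sample size and once per ten-fold increase in clone number. On that assumption, quadrupling a cohort amounts to two doublings and a factor of 3.5 × 3.5 = 12.25. The function is hypothetical and is intended only as an arithmetic illustration.

```python
import math

def relative_odds(sample_fold, clone_fold, per_doubling=3.5, per_tenfold=9.7):
    """Assumed multiplicative effect on the odds of a significant association.

    sample_fold: ratio of new to baseline cohort size.
    clone_fold: ratio of new to baseline number of clones on the array.
    """
    if sample_fold <= 0 or clone_fold <= 0:
        raise ValueError("fold increases must be positive")
    doublings = math.log2(sample_fold)   # e.g. 4x samples -> 2 doublings
    tenfolds = math.log10(clone_fold)    # e.g. 10x clones -> 1 ten-fold step
    return per_doubling ** doublings * per_tenfold ** tenfolds

print(relative_odds(2, 1))   # 3.5   (one doubling of samples)
print(relative_odds(4, 1))   # 12.25 (two doublings)
print(relative_odds(1, 10))  # 9.7   (ten-fold more clones)
```

The extrapolation is only as sound as the assumed log-linear relationship. It should not be pushed far beyond the range of study sizes that were surveyed.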

Together, these observations about the possibility of generating multiple signatures of

outcome from a given microarray dataset, coupled with the impact of cohort and

microarray clone set size, explain how studies of the same cancer type can result in

non-overlapping sets of outcome-related genes. As cohort sizes in published studies

continue to expand over time, and newer microarrays are developed in which the vast

majority of genes in the human genome can be profiled simultaneously, a convergence of

predictive gene signatures is expected in the future.

6.1.3. Molecular characterisation of ovarian LMP and invasive epithelial cancer

The LMP type of EOC represents a subtype of ovarian cancer with a number of clinically

important differences to the invasive form, which result in it having a significantly more

favourable prognosis. An analysis of both the mucinous and serous types of LMP tumour

was planned; however, pathology review and microarray analysis determined that a

significant proportion of the mucinous EOC specimens were actually metastases from

other tissues. Consequently, a decision was made to restrict the investigation to the serous

type tumours only.

Through pathology review and gene expression-based predictive analysis using LOOCV,

a high quality gene expression dataset and associated clinical information was generated.

A large proportion of the detectably hybridised genes present on the Peter Mac 10.5k


human cDNA microarray were found to be differentially expressed between carefully

selected and validated samples of serous LMP and invasive EOC.
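Leave-one-out cross-validation (LOOCV) holds out each sample in turn, trains a classifier on the remaining samples and then predicts the held-out one. Every prediction therefore comes from a model that never saw that sample. The classifier used in this thesis is not restated here. Purely for illustration, the following sketch assumes a simple nearest-centroid rule applied to log-ratio profiles. The function name is hypothetical.

```python
import numpy as np

def loocv_nearest_centroid(X, y):
    """Leave-one-out predictions from a nearest-centroid classifier.

    X: samples x genes array of log-ratios; y: class label per sample.
    Returns the label predicted for each sample while it was held out.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    predictions = []
    for i in range(len(y)):
        train = np.arange(len(y)) != i                # all samples except sample i
        classes = np.unique(y[train])
        centroids = np.array([X[train][y[train] == c].mean(axis=0) for c in classes])
        distances = np.linalg.norm(centroids - X[i], axis=1)
        predictions.append(classes[np.argmin(distances)])
    return np.array(predictions)

X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
y = ["LMP", "LMP", "INV", "INV"]
pred = loocv_nearest_centroid(X, y)
print(pred, (pred == np.array(y)).mean())  # ['LMP' 'LMP' 'INV' 'INV'] 1.0
```

If genes are selected before classification, the selection must be repeated inside each fold. Selecting genes on the full dataset and then running LOOCV gives an optimistically biased accuracy.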

As a part of this analysis, a predictive signature of EOC based on 231 genes was

identified. This was shown to be capable of accurately discriminating between samples of

primary EOC and a series of several hundred tumours of other origins, representing nine

types of primary tumour. Characterisation of these genes with gene ontology analysis

revealed that the EOCs have a unique pattern of cell-adhesion gene expression which

distinguishes them from the other cancer types used in the analysis.

Using a combination of gene ontology and novel pathway/network analysis, a series of

interacting genes were identified that share a common function of controlling the

processes required to maintain cell-cell or cell-matrix adhesion, a critical step in the

process of tumour invasion. Also differentially expressed between LMP and invasive

serous EOC were large numbers of genes involved in regulation of the cell cycle and a

significant proportion of genes related to the body’s immune reaction to an invading

tumour.

Comparison of the genes identified as differentially expressed between LMP and invasive

EOC in this study with other published studies revealed a number of similarities,

particularly in terms of the biological processes represented. Statistically significant

overlaps were observed between the genes found in this chapter and those found in studies

of breast ductal carcinoma in situ (DCIS) and invasive ductal carcinoma (IDC), of depth of

invasion in gastric cancer, and of a recently identified transcriptional profile of

undifferentiated cancer that appears to be almost universally activated in human cancer.

These findings suggest that the molecular events responsible for the phenotypic

differences between LMP and invasive EOC are similar to those responsible for

malignancy and invasion in other parts of the body.
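The overlap test used is not specified in this summary. One common choice, assumed here, is the one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. Consider a universe of N genes measured in both studies, two gene lists of sizes a and b, and an observed overlap of x genes. The test gives the chance probability of an overlap at least that large, P(X ≥ x) = Σ C(a, k) C(N − a, b − k) / C(N, b), where the sum runs over k from x to min(a, b). The following is a hypothetical sketch that uses only the standard library.

```python
from math import comb

def hypergeom_overlap_p(overlap, size_a, size_b, universe):
    """One-sided P(overlap >= observed) for two gene lists from a shared universe."""
    if not (0 <= size_a <= universe and 0 <= size_b <= universe):
        raise ValueError("list sizes must lie between 0 and the universe size")
    total = comb(universe, size_b)
    # Sum the hypergeometric probabilities of every overlap at least as large
    tail = sum(comb(size_a, k) * comb(universe - size_a, size_b - k)
               for k in range(max(overlap, 0), min(size_a, size_b) + 1))
    return tail / total   # exact integer arithmetic until this final division

print(hypergeom_overlap_p(5, 5, 5, 10))  # 1/252, about 0.00397
print(hypergeom_overlap_p(0, 5, 5, 10))  # 1.0
```

The universe should be restricted to genes represented on both platforms. Using every gene on one array as the universe inflates the apparent significance when the other study measured fewer genes.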

6.1.4. EOC and the differential expression of genes involved in cell adhesion processes: a recurring theme

Genes that have been demonstrated to regulate a cell’s ability to adhere to other cells of

the same kind and/or to the extracellular matrix were identified in this study as

having key roles in (i) molecular differentiation of EOC from nine other primary tumour

types, (ii) processes relating to length of patient survival and (iii) the phenotypic

differences between LMP and invasive disease.


The importance of adhesion genes in ovarian malignancies has been described by a

number of authors (Davies et al., 1998; Hashimoto et al., 1989; Lessan et al., 1999; Patel

et al., 2003; Rump et al., 2004; Sundfeldt, 2003; Zand et al., 2003). This study extends

previous findings by further underscoring their correlation with extent of disease spread

and survival. The extent to which cell adhesion genes are involved in EOC malignancy,

relative to nine other tumour types, was indicated by the significant representation of this

gene ontology in the 231-gene signature of EOC identified in Chapter 5. This gene set

was capable of distinguishing primary EOC from metastatic disease in the ovary, so the

genes can be viewed as representing EOC-unique processes. The observation that genes

with this function also have a key role in the LMP/invasive phenotype highlights their

importance, and also the potential benefits that may come from the therapeutic

manipulation of these processes.

The cell adhesion gene PLAU was identified as differentially expressed between serous

LMP and invasive EOC in Chapter 5. The interaction of PLAU with its receptor was

recently studied by Krol et al in a model of EOC progression and invasion. Tri-functional

inhibitors composed of N-TIMP-1 or -3 (human matrix metalloproteinase inhibitors) and

a chicken variant of the protease inhibitor cystatin harbouring the PLAUR binding site of

PLAU (chCys-PLAU19-31) have been transfected into ovarian cancer cell lines to test

their ability to reduce the growth and spread of ovarian cancer cells (Krol et al., 2003).

The transfected cell lines were observed to display the same adhesive and proliferative

features as a vector-only transfected control line, but exhibited a significant reduction in

invasive potential in vitro. When the cell lines were inoculated into the peritoneum of nude

mice, a significant reduction in tumour burden was observed with the inhibitor-expressing

cell lines relative to those with the vector alone, indicating the potential for these

inhibitors to be used as gene therapy agents against solid ovarian malignancies.

Another promising study into the potential therapeutic benefits of anti-adhesive drugs

involved their use in conjunction with standard chemotherapeutic agents to improve the

efficiency of tumour cell killing. Tumour cells grown as multicellular spheroids are

known to be inherently more resistant to a large array of chemotherapeutic drugs

compared to the same cells grown as dispersed monolayer cell cultures. This process is

known as acquired multicellular resistance (Kobayashi et al., 1993).

The drug hyaluronidase has been demonstrated to sensitise tumour cells to a range of

chemotherapeutic agents. In a study of the effect of this drug on mouse mammary cell

lines, it was observed that this agent was able to disrupt tight clusters of cells, which

resulted in a significant increase in chemosensitivity. In both in vitro and in vivo models,

this drug was able to disrupt inter-cellular adhesion and sensitise cells and tumours to the

chemotherapeutic agent tested. Because the drug actually dispersed clusters of cells, this

observation supports the hypothesis that its chemosensitising ability results not from

increased drug penetration but from its anti-adhesive properties. It is suggested that by

overriding cell contact-dependent growth inhibition, more cells are actively dividing, thus

increasing the proportion of tumour cells sensitive to cytotoxic agents (Croix et al., 1996).

Other methods for therapeutically manipulating cell adhesion genes include the use of

anti-E-cadherin monoclonal antibodies. By using these antibodies to disrupt

E-cadherin-mediated cell adhesion interactions in multicellular spheroids of colorectal

cancer cells, a resensitisation to a range of chemotherapy agents was observed, including

paclitaxel (commonly used for EOC) but not cisplatin. This demonstrates the principle of

modifying a tumour’s adhesive ability to enhance the efficacy of conventional therapies

(Green et al., 2004).

FAK/PTK2² is a non-receptor protein tyrosine kinase that becomes phosphorylated and

activated during integrin-mediated EOC cell adhesion. Recent cell line studies have

demonstrated that the expression of this gene is up-regulated in invasive EOC and is

significantly associated with an aggressive phenotype, corresponding to a poor outcome in

patients. In addition, when FAK phosphorylation was inhibited by introducing a

dominant-negative construct called FAK-related non-kinase (FRNK) into highly

aggressive EOC cells, decreases in invasion (56-85%), migration (52-68%) and cell

spreading were observed. This indicates both the importance of this gene’s function in

key metastatic events and also its potential use in cell-adhesion-based gene therapy

approaches to EOC treatment (Sood et al., 2004).

In summary, there appears to be growing interest in the use of gene therapy approaches to

modify the expression and/or function of molecules involved in regulating cell adhesion

interactions for a range of cancer types. This study provides several lists of candidate

cell-cell and cell-matrix adhesion genes that, with further validation and translational

studies, could be used in this fashion to alter the clinical course of invasive EOC.

2 The feature corresponding to PTK2 on the Peter Mac 10.5k human cDNA microarray was excluded during the unsupervised filtering of non-expressing genes in the Chapter 5 analysis of LMP and invasive EOC (log-ratio variation P-value = 0.98). However, an individual ANOVA of this gene revealed significantly higher PTK2 expression in the serous LMP tumours (P = 0.002).


6.2. Future directions

6.2.1. Meta-analysis of gene expression datasets

Combining publicly available datasets is one method for increasing the

total sample size available for the type of bioinformatic analyses carried out in this thesis.

With the adoption of the MIAME guidelines and the requirement for complete gene

expression profiles to be made publicly available prior to manuscript acceptance, the

body of raw data available for meta-analysis is continually expanding.

Many microarray studies referenced in this thesis have not made their entire datasets

publicly available; most have opted to provide only the data specifically relating

to identified differentially expressed genes. This has restricted the opportunity for

meta-analysis of data generated for this study and other suitable microarray studies of

EOC outcome (Lancaster et al., 2004; Spentzos et al., 2004) or malignancy (Warrenfeltz

et al., 2004). The raw gene expression data used by Gilks et al (Gilks et al., 2005) to

analyse serous LMP and invasive EOC was recently made available through the Stanford

Microarray Database (SMD) (Sherlock et al., 2001). However, as discussed in Chapter 5,

inspection of these arrays revealed a range of hybridisation artefacts and irregular probe

distribution, reducing the value of the data extracted from these microarrays for meta-

analysis.

Databases such as SMD, ArrayExpress (Brazma et al., 2003) and the Gene Expression

Omnibus (Edgar et al., 2002) are online repositories of a wide range of high-throughput

data generated from both single and dual channel microarray experiments of mRNA,

genomic DNA and protein abundance (proteomics). It is hoped that with the increasing

adoption of policies requiring the full disclosure of raw expression data prior to

publication, along with the associated clinical information required to repeat the analyses

described, opportunities to extend this work by meta-analysis will arise.

6.2.2. Extension of cDNA expression dataset with Affymetrix GeneChip profiling

During the course of this project, the cost of Affymetrix GeneChip microarrays fell to a level at which they became a viable option for large-scale tumour profiling studies, such as that being carried out by the AOCS. As well as the price reduction, refinements in the construction of these chips and in the extensive clone sets from which they are designed have enabled the creation of the ‘whole genome’ expression microarray known as the Human Genome U133 Plus 2.0. This single-chip array contains over 47,000 transcripts, representing far more of the human genome than the 10.5k cDNA microarray used for this study. It includes 38,500 well-characterised human genes and provides multiple independent measurements of each transcript per array, which is claimed to increase data accuracy and reproducibility by lowering the probability of identifying differentially expressed genes by chance.
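The claim that multiple measurements per transcript improve reproducibility can be made concrete with the standard error of a mean. As a simplifying assumption (not a description of how Affymetrix software actually summarises probe sets, which uses more robust estimators than a plain mean), suppose an array provides $k$ independent measurements $x_1, \dots, x_k$ of one transcript, each with the same variance $\sigma^2$:

```latex
\bar{x} = \frac{1}{k}\sum_{i=1}^{k} x_i ,
\qquad
\operatorname{Var}(\bar{x}) = \frac{1}{k^{2}}\sum_{i=1}^{k}\operatorname{Var}(x_i) = \frac{k\sigma^{2}}{k^{2}} = \frac{\sigma^{2}}{k},
\qquad
\operatorname{SE}(\bar{x}) = \frac{\sigma}{\sqrt{k}} .
```

Under these assumptions, quadrupling the number of measurements halves the standard error, so for a fixed fold-change cut-off fewer unchanged genes exceed the cut-off through measurement noise alone. In practice, measurements of the same transcript are positively correlated, so the real gain is smaller than in this idealised case.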

As a result of these developments, future tumour profiling by the AOCS will be carried out using Human Genome U133 Plus 2.0 arrays. Approximately 500 specimens of EOC with extensive clinical annotation are planned to be profiled on this platform. As shown in Figure 6-1, the clone set used to create the Affymetrix array contains 70.3% of the genes present on the cDNA microarray used in this study. This overlap includes 78.6% of the genes identified as differentially expressed between LMP and invasive EOC in Chapter 5.

The future microarray profiling planned by the AOCS includes an extensive analysis of genes related to patient survival and will involve many more samples than those analysed in this thesis. The majority of specimens to be analysed on the Affymetrix platform are from patients prospectively recruited into the study; as a result, more extensive and complete clinical annotation will be available. Another branch of the AOCS study will extend the analysis of LMP and invasive tumours carried out here and will also examine other histological subtypes, such as clear cell and endometrioid tumours.

These future studies will benefit from the experience gained in the present body of work, particularly with respect to sample size and the value of specimen review by pathologists with expertise in gynaecological pathology. The analytical methods outlined throughout this thesis are entirely scalable and may serve as a guide for the bioinformatic investigation of future Affymetrix GeneChip datasets.

The substantial overlap between the genes present on the Peter Mac 10.5k cDNA microarray and the Affymetrix platform will allow the data generated in this study to be incorporated into future analyses. This will increase the total sample size and, with it, the probability of observing significant relationships between gene expression and clinical variables. The data from this thesis could also serve as an independent validation set for evaluating findings generated by Affymetrix profiling. This would serve two purposes: confirming the significance of a discovered molecular signature in biological specimens completely independent of the original cohort, and doing so on a microarray platform based on a different probe type (spotted cDNA rather than oligonucleotide probes), thereby demonstrating platform independence.
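The expected gain from pooling samples can be illustrated with a simple power calculation. The sketch below is a swapped-in simplification rather than the survival analysis used in this thesis: it approximates the power of a two-sided, two-sample z-test for a standardised difference in expression between two patient groups, assuming equal group sizes and known, equal variances. The function name, effect size and group sizes are hypothetical, and pooling data across the cDNA and Affymetrix platforms would first require platform effects to be removed.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test.

    effect_size: standardised mean difference between the two groups (hypothetical here).
    n_per_group: number of specimens in each group (equal group sizes assumed).
    alpha: two-sided significance level.
    """
    if n_per_group < 1:
        raise ValueError("n_per_group must be at least 1")
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the z statistic under the alternative hypothesis
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability of rejecting the null hypothesis in either tail
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Hypothetical comparison: a smaller cohort versus a pooled, larger cohort
    for n in (30, 100):
        print(f"n = {n} per group: power = {two_sample_power(0.5, n):.2f}")
```

With the same hypothetical effect size, the larger cohort has a much higher probability of detecting the difference. In a genome-wide analysis the per-gene significance level would be far stricter than 0.05, which makes the gain from a larger sample more important still.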


Figure 6-1: Visualisation of clone set overlap between the Peter Mac 10.5k cDNA microarray and the Affymetrix U133 Plus 2.0 GeneChip. Array features were matched using the gene-list homology function of Silicon Genetics GeneSpring 7.2 (Agilent Technologies, USA), which identifies genes shared between two lists on the basis of UniGene (build #184) and/or LocusLink identifiers. The region shown in orange corresponds to the overlap between the clone sets used to generate the two microarrays. The red and yellow regions indicate the features unique to the Peter Mac and Affymetrix platforms, respectively.

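[Venn diagram. Affymetrix U133 Plus 2.0: 54,978 total features, of which 47,283 are unique (yellow). Peter Mac 10.5k cDNA microarray: 10,944 total features, of which 3,249 are unique (red). Shared features (orange): 7,695.]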
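The percentages quoted for the cross-platform overlap follow directly from the feature counts in Figure 6-1. The sketch below checks that arithmetic and illustrates identifier-based matching with Python sets. It only approximates what GeneSpring's gene-list homology function does: the helper name `match_by_identifier` and the toy identifiers are hypothetical, and a plain set intersection collapses duplicate features that share an identifier, whereas the counts in Figure 6-1 are per feature.

```python
# Feature counts reported in Figure 6-1
PETER_MAC_TOTAL = 10_944   # Peter Mac 10.5k cDNA microarray
AFFY_TOTAL = 54_978        # Affymetrix U133 Plus 2.0
SHARED = 7_695             # features matched between the two platforms

peter_mac_only = PETER_MAC_TOTAL - SHARED   # red region
affy_only = AFFY_TOTAL - SHARED             # yellow region
pct_cdna_on_affy = 100 * SHARED / PETER_MAC_TOTAL


def match_by_identifier(ids_a, ids_b):
    """Hypothetical helper: overlap of two platforms keyed on a shared gene identifier
    (for example a UniGene cluster or LocusLink ID). Duplicate identifiers are collapsed."""
    set_a, set_b = set(ids_a), set(ids_b)
    shared = set_a & set_b
    return {
        "shared": len(shared),
        "only_a": len(set_a - set_b),
        "only_b": len(set_b - set_a),
        "pct_a_shared": 100 * len(shared) / len(set_a) if set_a else 0.0,
    }


if __name__ == "__main__":
    print(f"Peter Mac only: {peter_mac_only}, Affymetrix only: {affy_only}")
    print(f"cDNA features also on Affymetrix: {pct_cdna_on_affy:.1f}%")
    # Toy identifiers, not real UniGene or LocusLink IDs
    print(match_by_identifier(["gene_1", "gene_2", "gene_3", "gene_4"],
                              ["gene_2", "gene_3", "gene_5"]))
```

Running it reproduces the 3,249 and 47,283 unique features shown in the figure and the 70.3% overlap quoted in the text.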


6.2.3. Translation of findings to in vivo studies of gene function and the potential for clinical application

A number of findings described in this thesis could be extended through translational approaches to determine their clinical relevance. These include the use of RNA interference (RNAi) to modulate the expression of the cell adhesion genes identified in Chapters 4 and 5 in cell lines derived from either LMP or invasive EOC. Any observed difference in cell growth, proliferation, adhesion or spreading ability may indicate that a gene is suitable for further analysis using three-dimensional cell culture systems or animal models of EOC.

Monoclonal antibodies targeted to the products of differentially expressed genes could also be used to investigate the in vivo effect of blocking cell adhesion gene products and/or their specific receptors. Findings from this thesis indicate that such an experiment could result in a significant reduction in a tumour’s ability to invade through other tissues and organs. Its growth may then be restricted to a more localised area, increasing the chance of its complete excision by surgery alone and consequently improving patient prognosis, as suggested by the significant relationship between residual disease and outcome.

6.2.4. Conclusion

The work outlined in this study critically evaluates several important aspects of the microarray workflow and identifies a number of methods for generating high-quality cDNA microarray gene expression data with a minimum of systematic error. It also provides an analysis of genes associated with both length of patient survival and the LMP or invasive phenotype of this highly lethal disease. These analyses point to calcium-dependent cell adhesion molecules as potential novel therapeutic targets which could be manipulated to improve patient prognosis and enhance the efficacy of existing treatments.

The future microarray and translational work planned as part of the ongoing AOCS, as well as meta-analysis of any suitable publicly available datasets, will extend and hopefully support the findings of this thesis. With the increased understanding of the molecular foundation of EOC brought about by high-throughput genomic tools and analytical approaches such as those used in this study, it is hoped that a reduction in the burden of this disease on the community will be seen in the near future.


7. Bibliography

Abd El-Rehim, D. M., Ball, G., Pinder, S. E., Rakha, E., Paish, C., Robertson, J. F., Macmillan, D., Blamey, R. W., and Ellis, I. O. (2005). High-throughput protein expression analysis using tissue microarray technology of a large well-characterised series identifies biologically distinct classes of breast cancer confirming recent cDNA expression analyses. Int J Cancer.

Adib, T. R., Henderson, S., Perrett, C., Hewitt, D., Bourmpoulia, D., Ledermann, J., and Boshoff, C. (2004). Predicting biomarkers for ovarian cancer using gene-expression microarrays. Br J Cancer 90, 686-692.

Agarwal, R., and Kaye, S. B. (2003). Ovarian cancer: strategies for overcoming resistance to chemotherapy. Nat Rev Cancer 3, 502-516.

Agarwal, R., and Kaye, S. B. (2005). Prognostic factors in ovarian cancer: how close are we to a complete picture? Ann Oncol 16, 4-6.

Ahmed, A. A., Vias, M., Iyer, N. G., Caldas, C., and Brenton, J. D. (2004). Microarray segmentation methods significantly influence data precision. Nucleic Acids Res 32, e50.

Al-Shahrour, F., Diaz-Uriarte, R., and Dopazo, J. (2004). FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes. Bioinformatics 20, 578-580.

Al Kuraya, K., Simon, R., and Sauter, G. (2004). Tissue microarrays for high-throughput molecular pathology. Ann Saudi Med 24, 169-174.

Albiston, A. L., Obeyesekere, V. R., Smith, R. E., and Krozowski, Z. S. (1994). Cloning and tissue distribution of the human 11 beta-hydroxysteroid dehydrogenase type 2 enzyme. Mol Cell Endocrinol 105, R11-17.

Alizadeh, A. A., Eisen, M. B., Davis, R. E., Ma, C., Lossos, I. S., Rosenwald, A., Boldrick, J. C., Sabet, H., Tran, T., Yu, X., et al. (2000). Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403, 503-511.


Allen, H. J., Porter, C., Gamarra, M., Piver, M. S., and Johnson, E. A. (1987). Isolation and morphologic characterization of human ovarian carcinoma cell clusters present in effusions. Exp Cell Biol 55, 194-208.

Altman, D. G. (2001). Systematic reviews of evaluations of prognostic variables. BMJ 323, 224-228.

Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990). Basic local alignment search tool. J Mol Biol 215, 403-410.

Ambroise, C., and McLachlan, G. J. (2002). Selection bias in gene extraction on the basis of microarray gene-expression data. Proc Natl Acad Sci U S A 99, 6562-6566.

Ambrosini, G., Adida, C., and Altieri, D. C. (1997). A novel anti-apoptosis gene, survivin, expressed in cancer and lymphoma. Nat Med 3, 917-921.

Anttila, M., Kosma, V. M., Ji, H., Wei-Ling, X., Puolakka, J., Juhola, M., Saarikoski, S., and Syrjanen, K. (1998). Clinical significance of alpha-catenin, collagen IV, and Ki-67 expression in epithelial ovarian cancer. J Clin Oncol 16, 2591-2600.

Anttila, M. A., Kosma, V. M., Hongxiu, J., Puolakka, J., Juhola, M., Saarikoski, S., and Syrjanen, K. (1999). p21/WAF1 expression as related to p53, cell proliferation and prognosis in epithelial ovarian cancer. Br J Cancer 79, 1870-1878.

Aris, V. M., Cody, M. J., Cheng, J., Dermody, J. J., Soteropoulos, P., Recce, M., and Tolias, P. P. (2004). Noise filtering and nonparametric analysis of microarray data underscores discriminating markers of oral, prostate, lung, ovarian and breast cancer. BMC Bioinformatics 5, 185.

Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., Davis, A. P., Dolinski, K., Dwight, S. S., Eppig, J. T., et al. (2000). Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25, 25-29.

Auersperg, N., Pan, J., Grove, B. D., Peterson, T., Fisher, J., Maines-Bandiera, S., Somasiri, A., and Roskelley, C. D. (1999). E-cadherin induces mesenchymal-to-epithelial transition in human ovarian surface epithelium. Proc Natl Acad Sci U S A 96, 6249-6254.

Australian Institute of Health and Welfare, and Australasian Association of Cancer Registries (2001). Cancer survival in Australia, 2001: [relative survival data for selected cancers for the period 1982 to 1997] (Canberra: Australian Institute of Health and Welfare).

Axon (2004). GenePix Pro: image analysis software for microarrays, tissue arrays and cell arrays.

Baekelandt, M., Kristensen, G. B., Nesland, J. M., Trope, C. G., and Holm, R. (1999). Clinical significance of apoptosis-related factors p53, Mdm2, and Bcl-2 in advanced ovarian cancer. J Clin Oncol 17, 2061.

Baker, S. C., Bauer, S. R., Beyer, R. P., Brenton, J. D., Bromley, B., Burrill, J., Causton, H., Conley, M. P., Elespuru, R., Fero, M., et al. (2005). The External RNA Controls Consortium: a progress report. Nat Methods 2, 731-734.

Balazsi, G., Kay, K. A., Barabasi, A. L., and Oltvai, Z. N. (2003). Spurious spatial periodicity of co-expression in microarray data due to printing design. Nucleic Acids Res 31, 4425-4433.

Bankhead, C. R., Kehoe, S. T., and Austoker, J. Symptoms associated with diagnosis of ovarian cancer: a systematic review.

Bar, J. K., Grelewski, P., Popiela, A., Noga, L., and Rabczynski, J. (2004). Type IV collagen and CD44v6 expression in benign, malignant primary and metastatic ovarian tumors: correlation with Ki-67 and p53 immunoreactivity. Gynecol Oncol 95, 23-31.

Bardin, A., Hoffmann, P., Boulle, N., Katsaros, D., Vignon, F., Pujol, P., and Lazennec, G. (2004). Involvement of estrogen receptor beta in ovarian carcinogenesis. Cancer Res 64, 5861-5869.

Barker, S. D., Coolidge, C. J., Kanerva, A., Hakkarainen, T., Yamamoto, M., Liu, B., Rivera, A. A., Bhoola, S. M., Barnes, M. N., Alvarez, R. D., et al. (2003). The secretory leukoprotease inhibitor (SLPI) promoter for ovarian cancer gene therapy. J Gene Med 5, 300-310.

Barton, P. J., Cullen, M. E., Townsend, P. J., Brand, N. J., Mullen, A. J., Norman, D. A., Bhavsar, P. K., and Yacoub, M. H. (1999). Close physical linkage of human troponin genes: organization, sequence, and expression of the locus encoding cardiac troponin I and slow skeletal troponin T. Genomics 57, 102-109.


Bast, R. C., Jr., Feeney, M., Lazarus, H., Nadler, L. M., Colvin, R. B., and Knapp, R. C. (1981). Reactivity of a monoclonal antibody with human ovarian carcinoma. J Clin Invest 68, 1331-1337.

Baty, F., Bihl, M. P., Perriere, G., Culhane, A. C., and Brutsche, M. H. (2005). Optimized between-group classification: a new jackknife-based gene selection procedure for genome-wide expression data. BMC Bioinformatics 6, 239.

Behtash, N., Modares, M., Abolhasani, M., Ghaemmaghami, F., Mousavi, M., Yarandi, F., and Hanjani, P. (2004). Borderline ovarian tumours: clinical analysis of 38 cases. J Obstet Gynaecol 24, 157-160.

Ben-Zur, T., Feige, E., Motro, B., and Wides, R. (2000). The mammalian Odz gene family: homologs of a Drosophila pair-rule gene with expression implying distinct yet overlapping developmental roles. Dev Biol 217, 107-120.

Benedet, J. L., Bender, H., Jones, H., 3rd, Ngan, H. Y., and Pecorelli, S. (2000). FIGO staging classifications and clinical practice guidelines in the management of gynecologic cancers. FIGO Committee on Gynecologic Oncology. Int J Gynaecol Obstet 70, 209-262.

Benetkiewicz, M., Wang, Y., Schaner, M., Wang, P., Mantripragada, K. K., Buckley, P. G., Kristensen, G., Borresen-Dale, A. L., and Dumanski, J. P. (2005). High-resolution gene copy number and expression profiling of human chromosome 22 in ovarian carcinomas. Genes Chromosomes Cancer 42, 228-237.

Berchuck, A., Iversen, E. S., Lancaster, J. M., Dressman, H. K., West, M., Nevins, J. R., and Marks, J. R. (2004). Prediction of optimal versus suboptimal cytoreduction of advanced-stage serous ovarian cancer with the use of microarrays. Am J Obstet Gynecol 190, 910-925.

Berek, J. S. (1995). Interval debulking of ovarian cancer--an interim measure. N Engl J Med 332, 675-677.

Bergstrom, D. A., Penn, B. H., Strand, A., Perry, R. L., Rudnicki, M. A., and Tapscott, S. J. (2002). Promoter-specific regulation of MyoD binding and signal transduction cooperate to pattern gene expression. Mol Cell 9, 587-600.

Bilban, M., Buehler, L. K., Head, S., Desoye, G., and Quaranta, V. (2002). Normalizing DNA microarray data. Curr Issues Mol Biol 4, 57-64.


Blaustein, A. (1982). Metastatic carcinoma in the ovary. In Pathology of the Female Genital Tract, Second edn (New York: Springer-Verlag).

Boussioutas, A., Li, H., Liu, J., Waring, P., Lade, S., Holloway, A. J., Taupin, D., Gorringe, K., Haviv, I., Desmond, P. V., and Bowtell, D. D. (2003). Distinctive patterns of gene expression in premalignant gastric mucosa and gastric cancer. Cancer Res 63, 2569-2577.

Bowtell, D. D. (1999). Options available--from start to finish--for obtaining expression data by microarray. Nat Genet 21, 25-32.

Brakora, K. A., Lee, H., Yusuf, R., Sullivan, L., Harris, A., Colella, T., and Seiden, M. V. (2004). Utility of osteopontin as a biomarker in recurrent epithelial ovarian cancer. Gynecol Oncol 93, 361-365.

Branca, M. (2003). Genetics and medicine. Putting gene arrays to the test. Science 300, 238.

Brattsand, M., and Egelrud, T. (1999). Purification, molecular cloning, and expression of a human stratum corneum trypsin-like serine protease with possible function in desquamation. J Biol Chem 274, 30033-30040.

Brazma, A., Hingamp, P., Quackenbush, J., Sherlock, G., Spellman, P., Stoeckert, C., Aach, J., Ansorge, W., Ball, C. A., Causton, H. C., et al. (2001). Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet 29, 365-371.

Brazma, A., Parkinson, H., Sarkans, U., Shojatalab, M., Vilo, J., Abeygunawardena, N., Holloway, E., Kapushesky, M., Kemmeren, P., Lara, G. G., et al. (2003). ArrayExpress--a public repository for microarray gene expression data at the EBI. Nucleic Acids Res 31, 68-71.

Bristow, R. E., Tomacruz, R. S., Armstrong, D. K., Trimble, E. L., and Montz, F. J. (2002). Survival effect of maximal cytoreductive surgery for advanced ovarian carcinoma during the platinum era: a meta-analysis. J Clin Oncol 20, 1248-1259.

Broberg, P. (2003). Statistical methods for ranking differentially expressed genes. Genome Biol 4, R41.


Brown, M. P., Grundy, W. N., Lin, D., Cristianini, N., Sugnet, C. W., Furey, T. S., Ares, M., Jr., and Haussler, D. (2000). Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc Natl Acad Sci U S A 97, 262-267.

Browning, J. L., Ngam-ek, A., Lawton, P., DeMarinis, J., Tizard, R., Chow, E. P., Hession, C., O'Brine-Greco, B., Foley, S. F., and Ware, C. F. (1993). Lymphotoxin beta, a novel member of the TNF family that forms a heteromeric complex with lymphotoxin on the cell surface. Cell 72, 847-856.

Bull, J. H., Ellison, G., Patel, A., Muir, G., Walker, M., Underwood, M., Khan, F., and Paskins, L. (2001). Identification of potential diagnostic markers of prostate cancer and prostatic intraepithelial neoplasia using cDNA microarray. Br J Cancer 84, 1512-1519.

Bussemakers, M. J., van Bokhoven, A., Mees, S. G., Kemler, R., and Schalken, J. A. (1993). Molecular cloning and characterization of the human E-cadherin cDNA. Mol Biol Rep 17, 123-128.

Calvo, A., Xiao, N., Kang, J., Best, C. J., Leiva, I., Emmert-Buck, M. R., Jorcyk, C., and Green, J. E. (2002). Alterations in gene expression profiles during prostate cancer progression: functional correlations to tumorigenicity and down-regulation of selenoprotein-P in mouse and human tumors. Cancer Res 62, 5325-5335.

Chan, Y. M., Ng, T. Y., Lee, P. W., Ngan, H. Y., and Wong, L. C. (2003). Symptoms, coping strategies, and timing of presentations in patients with newly diagnosed ovarian cancer. Gynecol Oncol 90, 651-656.

Chang, C.-C., Shih, J.-Y., Jeng, Y.-M., Su, J.-L., Lin, B.-Z., Chen, S.-T., Chau, Y.-P., Yang, P.-C., and Kuo, M.-L. (2004). Connective tissue growth factor and its role in lung adenocarcinoma invasion and metastasis. J Natl Cancer Inst 96, 364-375.

Chang, K., and Pastan, I. (1996). Molecular cloning of mesothelin, a differentiation antigen present on mesothelium, mesotheliomas, and ovarian cancers. Proc Natl Acad Sci U S A 93, 136-140.

Chen, J. J., Wu, R., Yang, P. C., Huang, J. Y., Sher, Y. P., Han, M. H., Kao, W. C., Lee, P. J., Chiu, T. F., Chang, F., et al. (1998). Profiling expression patterns and isolating differentially expressed genes by cDNA microarray system with colorimetry detection. Genomics 51, 313-324.


Chenevix-Trench, G., Kerr, J., Hurst, T., Shih, Y. C., Purdie, D., Bergman, L., Friedlander, M., Sanderson, B., Zournazi, A., Coombs, T., et al. (1997). Analysis of loss of heterozygosity and KRAS2 mutations in ovarian neoplasms: clinicopathological correlations. Genes Chromosomes Cancer 18, 75-83.

Cheung, S. T., Leung, K. L., Ip, Y. C., Chen, X., Fong, D. Y., Ng, I. O., Fan, S. T., and So, S. (2005). Claudin-10 expression level is associated with recurrence of primary hepatocellular carcinoma. Clin Cancer Res 11, 551-556.

Clark, T. G., Stewart, M. E., Altman, D. G., Gabra, H., and Smyth, J. F. (2001). A prognostic model for ovarian cancer. Br J Cancer 85, 944-952.

Cleveland, W. S. (1979). Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association 74, 829-836.

Colantuoni, C., Henry, G., Zeger, S., and Pevsner, J. (2002). Local mean normalization of microarray element signal intensities across an array surface: quality control and correction of spatially systematic artifacts. Biotechniques 32, 1316-1320.

Coleman, M. A., Eisen, J. A., and Mohrenweiser, H. W. (2000). Cloning and characterization of HARP/SMARCAL1: a prokaryotic HepA-related SNF2 helicase protein from human and mouse. Genomics 65, 274-282.

Courjal, F., Louason, G., Speiser, P., Katsaros, D., Zeillinger, R., and Theillet, C. (1996). Cyclin gene amplification and overexpression in breast and ovarian cancers: evidence for the selection of cyclin D1 in breast and cyclin E in ovarian tumors. Int J Cancer 69, 247-253.

Cox, D. R. (1972). Regression models and life-tables (with discussion). Journal of the Royal Statistical Society B 34, 187-220.

Croix, B. S., Rak, J. W., Kapitain, S., Sheehan, C., Graham, C. H., and Kerbel, R. S. (1996). Reversal by hyaluronidase of adhesion-dependent multicellular drug resistance in mammary carcinoma cells. J Natl Cancer Inst 88, 1285-1296.

Cruickshank, D. J., Paul, J., Lewis, C. R., McAllister, E. J., and Kaye, S. B. (1992). An independent evaluation of the potential clinical usefulness of proposed CA-125 indices previously shown to be of prognostic significance in epithelial ovarian cancer. Br J Cancer 65, 597-600.


Cuatrecasas, M., Erill, N., Musulen, E., Costa, I., Matias-Guiu, X., and Prat, J. (1998). K-ras mutations in nonmucinous ovarian epithelial tumors: a molecular analysis and clinicopathologic study of 144 patients. Cancer 82, 1088-1095.

Cuello, M., Ettenberg, S. A., Nau, M. M., and Lipkowitz, S. (2001). Synergistic induction of apoptosis by the combination of TRAIL and chemotherapy in chemoresistant ovarian cancer cells. Gynecol Oncol 81, 380-390.

Cui, X., and Churchill, G. A. (2003). Statistical tests for differential expression in cDNA microarray experiments. Genome Biol 4, 210.

D'Andrilli, G., Kumar, C., Scambia, G., and Giordano, A. (2004). Cell cycle genes in ovarian cancer: steps toward earlier diagnosis and novel therapies. Clin Cancer Res 10, 8132-8141.

Dano, K., Andreasen, P. A., Grondahl-Hansen, J., Kristensen, P., Nielsen, L. S., and Skriver, L. (1985). Plasminogen activators, tissue degradation, and cancer. Adv Cancer Res 44, 139-266.

Daraselia, N., Yuryev, A., Egorov, S., Novichkova, S., Nikitin, A., and Mazo, I. (2004). Extracting human protein interactions from MEDLINE using a full-sentence parser. Bioinformatics 20, 604-611.

Davies, B. R., Worsley, S. D., and Ponder, B. A. (1998). Expression of E-cadherin, alpha-catenin and beta-catenin in normal ovarian surface epithelium and epithelial ovarian cancers. Histopathology 32, 69-80.

Davies, H., Bignell, G. R., Cox, C., Stephens, P., Edkins, S., Clegg, S., Teague, J., Woffendin, H., Garnett, M. J., Bottomley, W., et al. (2002). Mutations of the BRAF gene in human cancer. Nature 417, 949-954.

de Kok, J. B., Roelofs, R. W., Giesendorf, B. A., Pennings, J. L., Waas, E. T., Feuth, T., Swinkels, D. W., and Span, P. N. (2005). Normalization of gene expression measurements in tumor tissues: comparison of 13 endogenous control genes. Lab Invest 85, 154-159.

Dhanasekaran, S. M., Barrette, T. R., Ghosh, D., Shah, R., Varambally, S., Kurachi, K., Pienta, K. J., Rubin, M. A., and Chinnaiyan, A. M. (2001). Delineation of prognostic biomarkers in prostate cancer. Nature 412, 822-826.


Diatchenko, L., Lau, Y. F., Campbell, A. P., Chenchik, A., Moqadam, F., Huang, B., Lukyanov, S., Lukyanov, K., Gurskaya, N., Sverdlov, E. D., and Siebert, P. D. (1996). Suppression subtractive hybridization: a method for generating differentially regulated or tissue-specific cDNA probes and libraries. Proc Natl Acad Sci U S A 93, 6025-6030.

Dicioccio, R. A., Song, H., Waterfall, C., Kimura, M. T., Nagase, H., McGuire, V., Hogdall, E., Shah, M. N., Luben, R. N., Easton, D. F., et al. (2004). STK15 polymorphisms and association with risk of invasive ovarian cancer. Cancer Epidemiol Biomarkers Prev 13, 1589-1594.

Diehn, M., Sherlock, G., Binkley, G., Jin, H., Matese, J. C., Hernandez-Boussard, T., Rees, C. A., Cherry, J. M., Botstein, D., Brown, P. O., and Alizadeh, A. A. (2003). SOURCE: a unified genomic resource of functional annotations, ontologies, and gene expression data. Nucleic Acids Res 31, 219-223.

Doll, A., and Grzeschik, K. H. (2001). Characterization of two novel genes, WBSCR20 and WBSCR22, deleted in Williams-Beuren syndrome. Cytogenet Cell Genet 95, 20-27.

Dong, Y., Kaushal, A., Brattsand, M., Nicklin, J., and Clements, J. A. (2003). Differential splicing of KLK5 and KLK7 in epithelial ovarian cancer produces novel variants with potential as cancer biomarkers. Clin Cancer Res 9, 1710-1720.

Donninger, H., Bonome, T., Radonovich, M., Pise-Masison, C. A., Brady, J., Shih, J. H., Barrett, J. C., and Birrer, M. J. (2004). Whole genome expression profiling of advance stage papillary serous ovarian cancer reveals activated pathways. Oncogene.

Dossinger, V., Kayademir, T., Blin, N., and Gott, P. (2002). Down-regulation of TFF expression in gastrointestinal cell lines by cytokines and nuclear factors. Cell Physiol Biochem 12, 197-206.

Dudoit, S., Fridlyand, J., and Speed, T. (2002). Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association 97, 77-87.

Duffy, M. J., O'Grady, P., Devaney, D., O'Siorain, L., Fennelly, J. J., and Lijnen, H. J. (1988). Urokinase-plasminogen activator, a marker for aggressive breast carcinomas. Preliminary report. Cancer 62, 531-533.


Dyrskjot, L., Thykjaer, T., Kruhoffer, M., Jensen, J. L., Marcussen, N., Hamilton-Dutoit, S., Wolf, H., and Orntoft, T. F. (2003). Identifying distinct classes of bladder carcinoma using microarrays. Nat Genet 33, 90-96.

Edgar, R., Domrachev, M., and Lash, A. E. (2002). Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 30, 207-210.

Eide, T., Coghlan, V., Orstavik, S., Holsve, C., Solberg, R., Skalhegg, B. S., Lamb, N. J., Langeberg, L., Fernandez, A., Scott, J. D., et al. (1998). Molecular cloning, chromosomal localization, and cell cycle-dependent subcellular distribution of the A-kinase anchoring protein, AKAP95. Exp Cell Res 238, 305-316.

Ein-Dor, L., Kela, I., Getz, G., Givol, D., and Domany, E. (2005). Outcome signature genes in breast cancer: is there a unique set? Bioinformatics 21, 171-178.

Engel, J., Eckel, R., Schubert-Fritschle, G., Kerr, J., Kuhn, W., Diebold, J., Kimmig, R., Rehbock, J., and Holzel, D. (2002). Moderate progress for ovarian cancer in the last 20 years: prolongation of survival, but no improvement in the cure rate. Eur J Cancer 38, 2435-2445.

Fan, R. E., Chen, P. H., and Lin, C. J. (2005). Working set selection using the second order information for training SVM. Technical report (Taiwan: Department of Computer Science, National Taiwan University).

Farber, J. M. (1993). HuMig: a new human member of the chemokine family of cytokines. Biochem Biophys Res Commun 192, 223-230.

Ferrandina, G., Legge, F., Martinelli, E., Ranelletti, F. O., Zannoni, G. F., Lauriola, L., Gessi, M., Gallotta, V., and Scambia, G. (2005). Survivin expression in ovarian cancer and its correlation with clinico-pathological, surgical and apoptosis-related parameters. Br J Cancer 92, 271-277.

Ferrell, B., Smith, S., Cullinane, C., and Melancon, C. (2003). Symptom concerns of women with ovarian cancer. J Pain Symptom Manage 25, 528-538.

Fielden, M. R., Halgren, R. G., Dere, E., and Zacharewski, T. R. (2002). GP3: GenePix post-processing program for automated analysis of raw microarray data. Bioinformatics 18, 771-773.


Fitch, M., Deane, K., Howell, D., and Gray, R. E. (2002). Women's experiences with ovarian cancer: reflections on being diagnosed. Can Oncol Nurs J 12, 152-168.

Fletcher, G. C., Patel, S., Tyson, K., Adam, P. J., Schenker, M., Loader, J. A., Daviet, L., Legrain, P., Parekh, R., Harris, A. L., and Terrett, J. A. (2003). hAG-2 and hAG-3, human homologues of genes involved in differentiation, are associated with oestrogen receptor-positive breast tumours and interact with metastasis gene C4.4a and dystroglycan. Br J Cancer 88, 579-585.

Foekens, J. A., Schmitt, M., van Putten, W. L., Peters, H. A., Bontenbal, M., Janicke, F., and Klijn, J. G. (1992). Prognostic value of urokinase-type plasminogen activator in 671 primary breast cancer patients. Cancer Res 52, 6101-6105.

Fort, M. G., Pierce, V. K., Saigo, P. E., Hoskins, W. J., and Lewis, J. L., Jr. (1989). Evidence for the efficacy of adjuvant therapy in epithelial ovarian tumors of low malignant potential. Gynecol Oncol 32, 269-272.

Franke, F. E., Von Georgi, R., Zygmunt, M., and Munstedt, K. (2003). Association between fibronectin expression and prognosis in ovarian carcinoma. Anticancer Res 23, 4261-4267.

Friedlander, M. L. (1998). Prognostic factors in ovarian cancer. Semin Oncol 25, 305-314.

Fujimoto, J., Ichigo, S., Hirose, R., Sakaguchi, H., and Tamaya, T. (1997). Expression of E-cadherin and alpha- and beta-catenin mRNAs in uterine cervical cancers. Tumour Biol 18, 206-212.

Fujiwara, K., Ohishi, Y., Koike, H., Sawada, S., Moriya, T., and Kohno, I. (1995). Clinical implications of metastases to the ovary. Gynecol Oncol 59, 124-128.

Gadducci, A., Carnino, F., Chiara, S., Brunetti, I., Tanganelli, L., Romanini, A., Bruzzone, M., and Conte, P. F. (2000). Intraperitoneal versus intravenous cisplatin in combination with intravenous cyclophosphamide and epidoxorubicin in optimally cytoreduced advanced epithelial ovarian cancer: a randomized trial of the Gruppo Oncologico Nord-Ovest. Gynecol Oncol 76, 157-162.


Gadgil, M., Lian, W., Gadgil, C., Kapur, V., and Hu, W. S. (2005). An analysis of the use of genomic DNA as a universal reference in two channel DNA microarrays. BMC Genomics 6, 66.

Galmozzi, E., Tomassetti, A., Sforzini, S., Mangiarotti, F., Mazzi, M., Nachmanoff, K., Elwood, P. C., and Canevari, S. (2001). Exon 3 of the alpha folate receptor gene contains a 5' splice site which confers enhanced ovarian carcinoma specific expression. FEBS Lett 502, 31-34.

Ganesh, S., Sier, C. F., Griffioen, G., Vloedgraven, H. J., de Boer, A., Welvaart, K., van de Velde, C. J., van Krieken, J. H., Verheijen, J. H., Lamers, C. B., et al. (1994). Prognostic relevance of plasminogen activators and their inhibitors in colorectal cancer. Cancer Res 54, 4065-4071.

Gardner, M. J., Jones, L. M., Catterall, J. B., and Turner, G. A. (1995). Expression of cell adhesion molecules on ovarian tumour cell lines and mesothelial cells, in relation to ovarian cancer metastasis. Cancer Lett 91, 229-234.

GenScript Corporation (2005). Real-Time PCR Primer Design (online primer design tool using a cross-exon-boundary design strategy to minimise genomic sequence contamination).

Gentleman, R. C., Carey, V. J., Bates, D. M., Bolstad, B., Dettling, M., Dudoit, S., Ellis, B., Gautier, L., Ge, Y., Gentry, J., et al. (2004). Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 5, R80.

Gibson, G. (2002). Microarrays in ecology and evolution: a preview. Mol Ecol 11, 17-24.

Gilks, C. B. (2004). Subclassification of ovarian surface epithelial tumors based on correlation of histologic and molecular pathologic data. Int J Gynecol Pathol 23, 200-205.

Gilks, C. B., Vanderhyden, B. C., Zhu, S., van de Rijn, M., and Longacre, T. A. (2005). Distinction between serous tumors of low malignant potential and serous carcinomas based on global mRNA expression profiling. Gynecol Oncol 96, 684-694.

Giordano, T. J., Shedden, K. A., Schwartz, D. R., Kuick, R., Taylor, J. M., Lee, N., Misek, D. E., Greenson, J. K., Kardia, S. L., Beer, D. G., et al. (2001). Organ-specific molecular classification of primary lung, colon, and ovarian adenocarcinomas using gene expression profiles. Am J Pathol 159, 1231-1238.


Giovannucci, E., Rimm, E. B., Wolk, A., Ascherio, A., Stampfer, M. J., Colditz, G. A., and Willett, W. C. (1998). Calcium and fructose intake in relation to risk of prostate cancer. Cancer Res 58, 442-447.

Goedert, M., Cuenda, A., Craxton, M., Jakes, R., and Cohen, P. (1997). Activation of the novel stress-activated protein kinase SAPK4 by cytokines and cellular stresses is mediated by SKK3 (MKK6); comparison of its substrate specificity with that of other SAP kinases. EMBO J 16, 3563-3571.

Goff, B. A., Mandel, L., Muntz, H. G., and Melancon, C. H. (2000). Ovarian carcinoma diagnosis. Cancer 89, 2068-2075.

Golub, T. R. (2001). Genomic approaches to the pathogenesis of hematologic malignancy. Curr Opin Hematol 8, 252-261.

Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., et al. (1999). Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, 531-537.

Goodman, M. T., Wu, A. H., Tung, K. H., McDuffie, K., Kolonel, L. N., Nomura, A. M., Terada, K., Wilkens, L. R., Murphy, S., and Hankin, J. H. (2002). Association of dairy products, lactose, and calcium with the risk of ovarian cancer. Am J Epidemiol 156, 148-157.

Gordon, G. J., Jensen, R. V., Hsiao, L. L., Gullans, S. R., Blumenstock, J. E., Richards, W. G., Jaklitsch, M. T., Sugarbaker, D. J., and Bueno, R. (2003). Using gene expression ratios to predict outcome among patients with mesothelioma. J Natl Cancer Inst 95, 598-605.

Green, S., Walter, P., Kumar, V., Krust, A., Bornert, J. M., Argos, P., and Chambon, P. (1986). Human oestrogen receptor cDNA: sequence, expression and homology to v-erb-A. Nature 320, 134-139.

Green, S. K., Francia, G., Isidoro, C., and Kerbel, R. S. (2004). Antiadhesive antibodies targeting E-cadherin sensitize multicellular tumor spheroids to chemotherapy in vitro. Mol Cancer Ther 3, 149-159.

Grossi, M., Quinn, M. A., Thursfield, V. J., Francis, P. A., Rome, R. M., Planner, R. S.,

and Giles, G. G. (2002). Ovarian cancer: patterns of care in Victoria during 1993-1995.

Med J Aust 177, 11-16.

Gueguen, L., and Pointillart, A. (2000). The bioavailability of dietary calcium. J Am Coll

Nutr 19, 119S-136S.

Hall, J., Paul, J., and Brown, R. (2004). Critical evaluation of p53 as a prognostic marker

in ovarian cancer. Expert Rev Mol Med 2004, 1-20.

Han, H. J., Tokino, T., and Nakamura, Y. (1998). CSR, a scavenger receptor-like protein

with a protective role against cellular damage caused by UV irradiation and oxidative

stress. Hum Mol Genet 7, 1039-1046.

Hanahan, D., and Weinberg, R. A. (2000). The hallmarks of cancer. Cell 100, 57-70.

Harper, P. (2002). Current clinical practices for ovarian cancers. Semin Oncol 29, 3-6.

Harries, M., and Gore, M. (2002a). Part I: chemotherapy for epithelial ovarian cancer-

treatment at first diagnosis. Lancet Oncol 3, 529-536.

Harries, M., and Gore, M. (2002b). Part II: chemotherapy for epithelial ovarian cancer-

treatment of recurrent disease. Lancet Oncol 3, 537-545.

Hartmann, L. C., Lu, K. H., Linette, G. P., Cliby, W. A., Kalli, K. R., Gershenson, D.,

Bast, R. C., Stec, J., Iartchouk, N., Smith, D. I., et al. (2005). Gene expression profiles

predict early relapse in ovarian cancer after platinum-Paclitaxel chemotherapy. Clin

Cancer Res 11, 2149-2155.

Hasegawa, S., Furukawa, Y., Li, M., Satoh, S., Kato, T., Watanabe, T., Katagiri, T.,

Tsunoda, T., Yamaoka, Y., and Nakamura, Y. (2002). Genome-Wide Analysis of Gene

Expression in Intestinal-Type Gastric Cancers Using a Complementary DNA Microarray

Representing 23,040 Genes. Cancer Res 62, 7012-7017.

Hashimoto, M., Niwa, O., Nitta, Y., Takeichi, M., and Yokoro, K. (1989). Unstable

expression of E-cadherin adhesion molecules in metastatic ovarian tumor cells. Jpn J

Cancer Res 80, 459-463.

Hastie, T., Tibshirani, R., and Friedman, J. (2001). The elements of statistical learning,

(New York, USA: Springer Publishing).

He, J., and Baum, L. G. (2004). Presentation of galectin-1 by extracellular matrix triggers

T cell death. J Biol Chem 279, 4705-4712.

Hedenfalk, I., Duggan, D., Chen, Y., Radmacher, M., Bittner, M., Simon, R., Meltzer, P.,

Gusterson, B., Esteller, M., Kallioniemi, O. P., et al. (2001). Gene-expression profiles in

hereditary breast cancer. N Engl J Med 344, 539-548.

Heighway, J., Betticher, D. C., Hoban, P. R., Altermatt, H. J., and Cowen, R. (1996).

Coamplification in tumors of KRAS2, type 2 inositol 1,4,5 triphosphate receptor gene,

and a novel human gene, KRAG. Genomics 35, 207-214.

Heinzelmann-Schwarz, V. A., Gardiner-Garden, M., Henshall, S. M., Scurry, J., Scolyer,

R. A., Davies, M. J., Heinzelmann, M., Kalish, L. H., Bali, A., Kench, J. G., et al. (2004).

Overexpression of the cell adhesion molecules DDR1, Claudin 3, and Ep-CAM in

metaplastic ovarian epithelium and ovarian cancer. Clin Cancer Res 10, 4427-4436.

Hernandez, E., Rosenshein, N. B., Bhagavan, B. S., and Parmley, T. H. (1984). Tumor

heterogeneity and histopathology in epithelial ovarian cancer. Obstet Gynecol 63, 330-

334.

Herrero, J., Al-Shahrour, F., Diaz-Uriarte, R., Mateos, A., Vaquerizas, J. M., Santoyo, J.,

and Dopazo, J. (2003). GEPAS: A web-based resource for microarray gene expression

data analysis. Nucleic Acids Res 31, 3461-3467.

Hess, V., A'Hern, R., Nasiri, N., King, D. M., Blake, P. R., Barton, D. P., Shepherd, J. H.,

Ind, T., Bridges, J., Harrington, K., et al. (2004). Mucinous epithelial ovarian cancer: a

separate entity requiring specific treatment. J Clin Oncol 22, 1040-1044.

Hippo, Y., Taniguchi, H., Tsutsumi, S., Machida, N., Chong, J. M., Fukayama, M.,

Kodama, T., and Aburatani, H. (2002). Global gene expression analysis of gastric cancer

by oligonucleotide microarrays. Cancer Res 62, 233-240.

Hippo, Y., Yashiro, M., Ishii, M., Taniguchi, H., Tsutsumi, S., Hirakawa, K., Kodama,

T., and Aburatani, H. (2001). Differential gene expression profiles of scirrhous gastric

cancer cells with high metastatic potential to peritoneum or lymph nodes. Cancer Res 61,

889-895.

Hiroyama, M., and Takenawa, T. (1999). Isolation of a cDNA encoding human

lysophosphatidic acid phosphatase that is involved in the regulation of mitochondrial lipid

biosynthesis. J Biol Chem 274, 29172-29180.

Hiscox, S., and Jiang, W. G. (1999). Ezrin regulates cell-cell and cell-matrix adhesion, a

possible role with E-cadherin/beta-catenin. J Cell Sci 112 Pt 18, 3081-3090.

Hoffmann, R., Seidl, T., and Dugas, M. (2002). Profound effect of normalization on

detection of differentially expressed genes in oligonucleotide microarray data analysis.

Genome Biol 3, RESEARCH0033.

Hofmann, R., Lehmer, A., Buresch, M., Hartung, R., and Ulm, K. (1996). Clinical

relevance of urokinase plasminogen activator, its receptor, and its inhibitor in patients

with renal cell carcinoma. Cancer 78, 487-492.

Holloway, A. J., van Laar, R. K., Tothill, R. W., and Bowtell, D. D. (2002). Options

available--from start to finish--for obtaining data from DNA microarrays II. Nat Genet 32

Suppl, 481-489.

Holter, N. S., Mitra, M., Maritan, A., Cieplak, M., Banavar, J. R., and Fedoroff, N. V.

(2000). Fundamental patterns underlying gene expression profiles: simplicity from

complexity. Proc Natl Acad Sci U S A 97, 8409-8414.

Hosack, D., Dennis, G., Sherman, B., Lane, H., and Lempicki, R. (2003). Identifying

biological themes within lists of genes with EASE. Genome Biol 4, R70.

Hoskins, W. J., Bundy, B. N., Thigpen, J. T., and Omura, G. A. (1992). The influence of

cytoreductive surgery on recurrence-free interval and survival in small-volume stage III

epithelial ovarian cancer: a Gynecologic Oncology Group study. Gynecol Oncol 47, 159-

166.

Hoskins, W. J., McGuire, W. P., Brady, M. F., Homesley, H. D., Creasman, W. T.,

Berman, M., Ball, H., and Berek, J. S. (1994). The effect of diameter of largest residual

disease on survival after primary cytoreductive surgery in patients with suboptimal

residual epithelial ovarian carcinoma. Am J Obstet Gynecol 170, 974-979; discussion

979-980.

Hough, C. D., Cho, K. R., Zonderman, A. B., Schwartz, D. R., and Morin, P. J. (2001).

Coordinately Up-Regulated Genes in Ovarian Cancer. Cancer Res 61, 3869-3876.

Hu, R. J., Lee, M. P., Connors, T. D., Johnson, L. A., Burn, T. C., Su, K., Landes, G. M.,

and Feinberg, A. P. (1997). A 2.5-Mb transcript map of a tumor-suppressing

subchromosomal transferable fragment from 11p15.5, and isolation and sequence analysis

of three novel genes. Genomics 46, 9-17.

Huang, E., Cheng, S. H., Dressman, H., Pittman, J., Tsou, M. H., Horng, C. F., Bild, A.,

Iversen, E. S., Liao, M., Chen, C. M., et al. (2003). Gene expression predictors of breast

cancer outcomes. Lancet 361, 1590-1596.

Huang, W. C., Taylor, S., Nguyen, T. B., Tomaszewski, J. E., Libertino, J. A.,

Malkowicz, S. B., and McGarvey, T. W. (2002). KIAA1096, a gene on chromosome 1q,

is amplified and overexpressed in bladder cancer. DNA Cell Biol 21, 707-715.

Humtsoe, J. O., Feng, S., Thakker, G. D., Yang, J., Hong, J., and Wary, K. K. (2003).

Regulation of cell-cell interactions by phosphatidic acid phosphatase 2b/VCIP. Embo J

22, 1539-1554.

Iacobuzio-Donahue, C. A., Argani, P., Hempen, P. M., Jones, J., and Kern, S. E. (2002).

The desmoplastic response to infiltrating breast carcinoma: gene expression at the site of

primary invasion and implications for comparisons between tumor types. Cancer Res 62,

5351-5357.

Igoe, B. A. (1997). Symptoms attributed to ovarian cancer by women with the disease.

Nurse Pract 22, 122, 127-128, 130 passim.

Ihaka, R., and Gentleman, R. (1997). R.

Ikeda, K., Sakai, K., Yamamoto, R., Hareyama, H., Tsumura, N., Watari, H., Shimizu,

M., Minakami, H., and Sakuragi, N. (2003). Multivariate analysis for prognostic

significance of histologic subtype, GST-pi, MDR-1, and p53 in stages II-IV ovarian

cancer. Int J Gynecol Cancer 13, 776-784.

Ikeda, M., Ishida, O., Hinoi, T., Kishida, S., and Kikuchi, A. (1998). Identification and

characterization of a novel protein interacting with Ral-binding protein 1, a putative

effector protein of Ral. J Biol Chem 273, 814-821.

Imamura, Y., Scott, I. C., and Greenspan, D. S. (2000). The pro-alpha3(V) collagen

chain. Complete primary structure, expression domains in adult and developing tissues,

and comparison to the structures and expression domains of the other types V and XI

procollagen chains. J Biol Chem 275, 8749-8759.

International Federation of Gynecology and Obstetrics (1971). Classification and staging

of malignant tumours in the female pelvis. Acta Obstet Gynecol Scand 50, 1-7.

Ishii, M., Hashimoto, S., Tsutsumi, S., Wada, Y., Matsushima, K., Kodama, T., and

Aburatani, H. (2000). Direct comparison of GeneChip and SAGE on the quantitative

accuracy in transcript profiling analysis. Genomics 68, 136-143.

Jalkanen, S., and Jalkanen, M. (1992). Lymphocyte CD44 binds the COOH-terminal

heparin-binding domain of fibronectin. J Cell Biol 116, 817-825.

Jazaeri, A. A., Lu, K., Schmandt, R., Harris, C. P., Rao, P. H., Sotiriou, C.,

Chandramouli, G. V., Gershenson, D. M., and Liu, E. T. (2003). Molecular determinants

of tumor differentiation in papillary serous ovarian carcinoma. Mol Carcinog 36, 53-59.

Jazaeri, A. A., Yee, C. J., Sotiriou, C., Brantley, K. R., Boyd, J., and Liu, E. T. (2002).

Gene expression profiles of BRCA1-linked, BRCA2-linked, and sporadic ovarian

cancers. J Natl Cancer Inst 94, 990-1000.

Jenson, S. D., Robetorye, R. S., Bohling, S. D., Schumacher, J. A., Morgan, J. W., Lim,

M. S., and Elenitoba-Johnson, K. S. (2003). Validation of cDNA microarray gene

expression data obtained from linearly amplified RNA. Mol Pathol 56, 307-312.

Jenssen, T. K., Kuo, W. P., Stokke, T., and Hovig, E. (2002). Associations between gene

expressions in breast cancer and patient survival. Hum Genet 111, 411-420.

Ji, H., Isacson, C., Seidman, J. D., Kurman, R. J., and Ronnett, B. M. (2002).

Cytokeratins 7 and 20, Dpc4, and MUC5AC in the distinction of metastatic mucinous

carcinomas in the ovary from primary ovarian mucinous tumors: Dpc4 assists in

identifying metastatic pancreatic carcinomas. Int J Gynecol Pathol 21, 391-400.

Jones, P. G., Lombardi, S. J., and Cockett, M. I. (1998). Cloning and tissue distribution of

the human G protein beta 5 cDNA. Biochim Biophys Acta 1402, 288-291.

Joshi, S. G., Bank, J. F., Henriques, E. S., Makarachi, A., and Matties, G. (1982). Serum

levels of a progestagen-associated endometrial protein during the menstrual cycle and

pregnancy. J Clin Endocrinol Metab 55, 642-648.

Jun S, V. K., Spriggs D (2003). Adhesion mediated drug resistance in ovarian cancer cell

lines. Paper presented at: 2003 ASCO Annual Meeting (ASCO).

Jung, L., Holle, L., and Dalton, W. S. (2004). Discovery, Development, and clinical

applications of bortezomib. Oncology (Huntingt) 18, 4-13.

Kai, M., Wada, I., Imai, S., Sakane, F., and Kanoh, H. (1997). Cloning and

characterization of two human isozymes of Mg2+-independent phosphatidic acid

phosphatase. J Biol Chem 272, 24572-24578.

Kamarainen, M., Leivo, I., Koistinen, R., Julkunen, M., Karvonen, U., Rutanen, E. M.,

and Seppala, M. (1996). Normal human ovary and ovarian tumors express glycodelin, a

glycoprotein with immunosuppressive and contraceptive properties. Am J Pathol 148,

1435-1443.

Kan, T., Shimada, Y., Sato, F., Ito, T., Kondo, K., Watanabe, G., Maeda, M., Yamasaki,

S., Meltzer, S. J., and Imamura, M. (2004). Prediction of lymph node metastasis with use

of artificial neural networks based on gene expression profiles in esophageal squamous

cell carcinoma. Ann Surg Oncol 11, 1070-1078.

Kanehisa, M. (1997). A database for post-genome analysis. Trends Genet 13, 375-376.

Kanehisa, M., and Goto, S. (2000). KEGG: kyoto encyclopedia of genes and genomes.

Nucleic Acids Res 28, 27-30.

Karlan, B. Y. (1995). Screening for ovarian cancer: what are the optimal surrogate

endpoints for clinical trials? J Cell Biochem Suppl 23, 227-232.

Khan, J., Simon, R., Bittner, M., Chen, Y., Leighton, S. B., Pohida, T., Smith, P. D.,

Jiang, Y., Gooden, G. C., Trent, J. M., and Meltzer, P. S. (1998). Gene expression

profiling of alveolar rhabdomyosarcoma with cDNA microarrays. Cancer Res 58, 5009-

5013.

Khan, J., Wei, J. S., Ringner, M., Saal, L. H., Ladanyi, M., Westermann, F., Berthold, F.,

Schwab, M., Antonescu, C. R., Peterson, C., and Meltzer, P. S. (2001). Classification and

diagnostic prediction of cancers using gene expression profiling and artificial neural

networks. Nat Med 7, 673-679.

Kihara, C., Tsunoda, T., Tanaka, T., Yamana, H., Furukawa, Y., Ono, K., Kitahara, O.,

Zembutsu, H., Yanagawa, R., Hirata, K., et al. (2001). Prediction of sensitivity of

esophageal tumors to adjuvant chemotherapy by cDNA microarray analysis of gene-

expression profiles. Cancer Res 61, 6474-6479.

Kim, H., Zhao, B., Snesrud, E. C., Haas, B. J., Town, C. D., and Quackenbush, J.

(2002a). Use of RNA and genomic DNA references for inferred comparisons in DNA

microarray analyses. Biotechniques 33, 924-930.

Kim, J. H., Herlyn, D., Wong, K. K., Park, D. C., Schorge, J. O., Lu, K. H., Skates, S. J.,

Cramer, D. W., Berkowitz, R. S., and Mok, S. C. (2003). Identification of epithelial cell

adhesion molecule autoantibody in patients with ovarian cancer. Clin Cancer Res 9,

4782-4791.

Kim, J. H., Kim, H. Y., and Lee, Y. S. (2001). A novel method using edge detection for

signal extraction from cDNA microarray image analysis. Exp Mol Med 33, 83-88.

Kim, J. H., Skates, S. J., Uede, T., Wong, K. K., Schorge, J. O., Feltmate, C. M.,

Berkowitz, R. S., Cramer, D. W., and Mok, S. C. (2002b). Osteopontin as a potential

diagnostic biomarker for ovarian cancer. Jama 287, 1671-1679.

King, H. C., and Sinha, A. A. (2001). Gene expression profile analysis by DNA

microarrays: promise and pitfalls. Jama 286, 2280-2288.

Klappacher, G. W., Lunyak, V. V., Sykes, D. B., Sawka-Verhelle, D., Sage, J., Brard, G.,

Ngo, S. D., Gangadharan, D., Jacks, T., Kamps, M. P., et al. (2002). An induced Ets

repressor complex regulates growth arrest during terminal macrophage differentiation.

Cell 109, 169-180.

Klemsz, M., Hromas, R., Raskind, W., Bruno, E., and Hoffman, R. (1994). PE-1, a novel

ETS oncogene family member, localizes to chromosome 1q21-q23. Genomics 20, 291-

294.

Kliman, L., Rome, R. M., and Fortune, D. W. (1986). Low malignant potential tumors of

the ovary: a study of 76 cases. Obstet Gynecol 68, 338-344.

Kluger, H. M., Kluger, Y., Gilmore-Hebert, M., DiVito, K., Chang, J. T., Rodov, S.,

Mironenko, O., Kacinski, B. M., Perkins, A. S., and Sapi, E. (2004). cDNA microarray

analysis of invasive and tumorigenic phenotypes in a breast cancer model. Lab Invest 84,

320-331.

Kobayashi, H., Man, S., Graham, C. H., Kapitain, S. J., Teicher, B. A., and Kerbel, R. S.

(1993). Acquired multicellular-mediated resistance to alkylating agents in cancer. Proc

Natl Acad Sci U S A 90, 3294-3298.

Kodama, J., Hashimoto, I., Seki, N., Hongo, A., Yoshinouchi, M., Okuda, H., and Kudo,

T. (2001). Thrombospondin-1 and -2 messenger RNA expression in epithelial ovarian

tumor. Anticancer Res 21, 2983-2987.

Kohsaki, T., Nishimori, I., Nakayama, H., Miyazaki, E., Enzan, H., Nomoto, M.,

Hollingsworth, M. A., and Onishi, S. (2000). Expression of UDP-GalNAc: polypeptide

N-acetylgalactosaminyltransferase isozymes T1 and T2 in human colorectal cancer. J

Gastroenterol 35, 840-848.

Konecny, G., Untch, M., Pihan, A., Kimmig, R., Gropp, M., Stieber, P., Hepp, H.,

Slamon, D., and Pegram, M. (2001). Association of Urokinase-Type Plasminogen

Activator and Its Inhibitor with Disease Progression and Prognosis in Ovarian Cancer.

Clin Cancer Res 7, 1743-1749.

Konno, R., Yamada-Okabe, H., Fujiwara, H., Uchiide, I., Shibahara, H., Ohwada, M.,

Ihara, T., Sugamata, M., and Suzuki, M. (2003). Role of immunoreactions and mast cells

in pathogenesis of human endometriosis--morphologic study and gene expression

analysis. Hum Cell 16, 141-149.

Kononen, J., Bubendorf, L., Kallioniemi, A., Barlund, M., Schraml, P., Leighton, S.,

Torhorst, J., Mihatsch, M. J., Sauter, G., and Kallioniemi, O. P. (1998). Tissue

microarrays for high-throughput molecular profiling of tumor specimens. Nat Med 4,

844-847.

Korn, E. L., Habermann, J. K., Upender, M. B., Ried, T., and McShane, L. M. (2004a).

Objective method of comparing DNA microarray image analysis systems. Biotechniques

36, 960-967.

Korn, E. L., Troendle, J. F., McShane, L. M., and Simon, R. (2004b). Controlling

the number of false discoveries: application to high-dimensional genomic data. 124, 379.

Kornblihtt, A. R., Umezawa, K., Vibe-Pedersen, K., and Baralle, F. E. (1985). Primary

structure of human fibronectin: differential splicing may generate at least 10 polypeptides

from a single gene. Embo J 4, 1755-1759.

Koyano-Nakagawa, N., Nishida, J., Baldwin, D., Arai, K., and Yokota, T. (1994).

Molecular cloning of a novel human cDNA encoding a zinc finger protein that binds to

the interleukin-3 promoter. Mol Cell Biol 14, 5099-5107.

Kristensen, G. B., Kildal, W., Abeler, V. M., Kaern, J., Vergote, I., Trope, C. G., and

Danielsen, H. E. (2003). Large-scale genomic instability predicts long-term outcome for

women with invasive stage I ovarian cancer. Ann Oncol 14, 1494-1500.

Kristiansen, G., Pilarsky, C., Wissmann, C., Kaiser, S., Bruemmendorf, T., Roepcke, S.,

Dahl, E., Hinzmann, B., Specht, T., Pervan, J., et al. (2005). Expression profiling of

microdissected matched prostate cancer samples reveals CD166/MEMD and CD24 as

new prognostic markers for patient survival. J Pathol 205, 359-376.

Krol, J., Kopitz, C., Kirschenhofer, A., Schmitt, M., Magdolen, U., Kruger, A., and

Magdolen, V. (2003). Inhibition of intraperitoneal tumor growth of human ovarian cancer

cells by bi- and trifunctional inhibitors of tumor-associated proteolytic systems. Biol

Chem 384, 1097-1102.

Kubota, K., Furuse, M., Sasaki, H., Sonoda, N., Fujita, K., Nagafuchi, A., and Tsukita, S.

(1999). Ca(2+)-independent cell-adhesion activity of claudins, a family of integral

membrane proteins localized at tight junctions. Curr Biol 9, 1035-1038.

Kuo, W. P., Jenssen, T. K., Butte, A. J., Ohno-Machado, L., and Kohane, I. S. (2002).

Analysis of matched mRNA measurements from two different microarray technologies.


Kurman, R. J. (2003). Ovarian Cancer: Diagnosis and Treatment, (Baltimore, USA: Johns

Hopkins Pathology).

Kurman, R. J., and Trimble, C. L. (1993). The behavior of serous tumors of low

malignant potential: are they ever malignant? Int J Gynecol Pathol 12, 120-127.

Kyriakopoulou, L. G., Yousef, G. M., Scorilas, A., Katsaros, D., Massobrio, M.,

Fracchioli, S., and Diamandis, E. P. (2003). Prognostic value of quantitatively assessed

KLK7 expression in ovarian cancer. Clin Biochem 36, 135-143.

Lakhani, S. R., Manek, S., Penault-Llorca, F., Flanagan, A., Arnout, L., Merrett, S.,

McGuffog, L., Steele, D., Devilee, P., Klijn, J. G., et al. (2004). Pathology of ovarian

cancers in BRCA1 and BRCA2 carriers. Clin Cancer Res 10, 2473-2481.

Lancaster, J. M., Dressman, H. K., Whitaker, R. S., Havrilesky, L., Gray, J., Marks, J. R.,

Nevins, J. R., and Berchuck, A. (2004). Gene expression patterns that characterize

advanced stage serous ovarian cancers. J Soc Gynecol Investig 11, 51-59.

Lancaster, J. M., Sayer, R., Blanchette, C., Calingaert, B., Whitaker, R., Schildkraut, J.,

Marks, J., and Berchuck, A. (2003). High Expression of Tumor Necrosis Factor-related

Apoptosis-inducing Ligand Is Associated with Favorable Ovarian Cancer Survival. Clin

Cancer Res 9, 762-766.

Lanotte, M., Martin-Thouvenin, V., Najman, S., Balerini, P., Valensi, F., and Berger, R.

(1991). NB4, a maturation inducible cell line with t(15;17) marker isolated from a human

acute promyelocytic leukemia (M3). Blood 77, 1080-1086.

Lassus, H., Laitinen, M. P., Anttonen, M., Heikinheimo, M., Aaltonen, L. A., Ritvos, O.,

and Butzow, R. (2001). Comparison of serous and mucinous ovarian carcinomas: distinct

pattern of allelic loss at distal 8p and expression of transcription factor GATA-4. Lab

Invest 81, 517-526.

Lataifeh, I., Marsden, D. E., Robertson, G., Gebski, V., and Hacker, N. F. (2005).

Presenting symptoms of epithelial ovarian cancer. Aust N Z J Obstet Gynaecol 45, 211-

214.

Lee, B. C., Cha, K., Avraham, S., and Avraham, H. K. (2003). Microarray analysis of

differentially expressed genes associated with human ovarian cancer. Int J Oncol 24, 847-

851.

Lee, K. R., and Scully, R. E. (2000). Mucinous tumors of the ovary: a clinicopathologic

study of 196 borderline tumors (of intestinal type) and carcinomas, including an

evaluation of 11 cases with 'pseudomyxoma peritonei'. Am J Surg Pathol 24, 1447-1464.

Lee, K. R., and Young, R. H. (2003). The distinction between primary and metastatic

mucinous carcinomas of the ovary: gross and histologic findings in 50 cases. Am J Surg

Pathol 27, 281-292.

Lee, M.-L. T. (2004). Analysis of microarray gene expression data, (Boston, Mass:

Kluwer).

Lee, T. H., Lwu, S., Kim, J., and Pelletier, J. (2002). Inhibition of Wilms tumor 1

transactivation by bone marrow zinc finger 2, a novel transcriptional repressor. J Biol

Chem 277, 44826-44837.

Lehr, H. A., Mankoff, D. A., Corwin, D., Santeusanio, G., and Gown, A. M. (1997).

Application of photoshop-based image analysis to quantification of hormone receptor

expression in breast cancer. J Histochem Cytochem 45, 1559-1565.

Lehr, H. A., van der Loos, C. M., Teeling, P., and Gown, A. M. (1999). Complete

chromogen separation and analysis in double immunohistochemical stains using

Photoshop-based image analysis. J Histochem Cytochem 47, 119-126.

Lessan, K., Aguiar, D. J., Oegema, T., Siebenson, L., and Skubitz, A. P. (1999). CD44

and beta1 integrin mediate ovarian carcinoma cell adhesion to peritoneal mesothelial

cells. Am J Pathol 154, 1525-1537.

Li, X., and Tedder, T. F. (1999). CHST1 and CHST2 sulfotransferases expressed by

human vascular endothelial cells: cDNA cloning, expression, and chromosomal

localization. Genomics 55, 345-347.

Li, Y., Tang, Y., Ye, L., Liu, B., Liu, K., Chen, J., and Xue, Q. (2003). Establishment of a

hepatocellular carcinoma cell line with unique metastatic characteristics through in vivo

selection and screening for metastasis-related genes through cDNA microarray. J Cancer

Res Clin Oncol 129, 43-51.

Liang, P., and Pardee, A. B. (1992). Differential display of eukaryotic messenger RNA by

means of the polymerase chain reaction. Science 257, 967-971.

Liotta, L., and Petricoin, E. (2000). Molecular profiling of human cancer. Nat Rev Genet

1, 48-56.

Lipkin, M., and Newmark, H. L. (1999). Vitamin D, calcium and prevention of breast

cancer: a review. J Am Coll Nutr 18, 392S-397S.

Lisitsyn, N., Lisitsyn, N., and Wigler, M. (1993). Cloning the differences between two

complex genomes. Science 259, 946-951.

Liu, C. L., Prapong, W., Natkunam, Y., Alizadeh, A., Montgomery, K., Gilks, C. B., and

van de Rijn, M. (2002). Software tools for high-throughput analysis and archiving of

immunohistochemistry staining data obtained with tissue microarrays. Am J Pathol 161,

1557-1565.

Liu, D., Rudland, P. S., Sibson, D. R., Platt-Higgins, A., and Barraclough, R. (2005a).

Human homologue of cement gland protein, a novel metastasis inducer associated with

breast carcinomas. Cancer Res 65, 3796-3805.

Liu, K., Lei, X. Z., Zhao, L. S., Tang, H., Liu, L., Feng, P., and Lei, B. J. (2005b). Tissue

microarray for high-throughput analysis of gene expression profiles in hepatocellular

carcinoma. World J Gastroenterol 11, 1369-1372.

Lockhart, D. J., Dong, H., Byrne, M. C., Follettie, M. T., Gallo, M. V., Chee, M. S.,

Mittmann, M., Wang, C., Kobayashi, M., Horton, H., and Brown, E. L. (1996).

Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat

Biotechnol 14, 1675-1680.

Lopes, N., Gregg, D., Vasudevan, S., Hassanain, H., Goldschmidt-Clermont, P., and

Kovacic, H. (2003). Thrombospondin 2 regulates cell proliferation induced by Rac1

redox-dependent signaling. Mol Cell Biol 23, 5401-5408.

Lossos, I. S., Czerwinski, D. K., Alizadeh, A. A., Wechser, M. A., Tibshirani, R.,

Botstein, D., and Levy, R. (2004). Prediction of survival in diffuse large-B-cell

lymphoma based on the expression of six genes. N Engl J Med 350, 1828-1837.

Loy, T. S., Calaluce, R. D., and Keeney, G. L. (1996). Cytokeratin immunostaining in

differentiating primary ovarian carcinoma from metastatic colonic adenocarcinoma. Mod

Pathol 9, 1040-1044.

Lu, K. H., Patterson, A. P., Wang, L., Marquez, R. T., Atkinson, E. N., Baggerly, K. A.,

Ramoth, L. R., Rosen, D. G., Liu, J., Hellstrom, I., et al. (2004). Selection of potential

markers for epithelial ovarian cancer with gene expression arrays and recursive descent

partition analysis. Clin Cancer Res 10, 3291-3300.

Ludwig, D., Lorenz, J., Dejana, E., Bohlen, P., Hicklin, D. J., Witte, L., and Pytowski, B.

(2000). cDNA cloning, chromosomal mapping, and expression analysis of human VE-

Cadherin-2. Mamm Genome 11, 1030-1033.

Lund, J. (2003). Statistical significance of overlap of two groups of genes.

Lung, J. C., Chu, J. S., Yu, J. C., Yue, C. T., Lo, Y. L., Shen, C. Y., and Wu, C. W.

(2002). Aberrant expression of cell-cycle regulator cyclin D1 in breast cancer is related to

chromosomal genomic instability. Genes Chromosomes Cancer 34, 276-284.

Luquain, C., Singh, A., Wang, L., Natarajan, V., and Morris, A. J. (2003). Role of

phospholipase D in agonist-stimulated lysophosphatidic acid synthesis by ovarian cancer

cells. J Lipid Res 44, 1963-1975.

Lyng, H., Badiee, A., Svendsrud, D. H., Hovig, E., Myklebost, O., and Stokke, T. (2004).

Profound influence of microarray scanner characteristics on gene expression ratios:

analysis and procedure for correction. BMC Genomics 5, 10.

Ma, X. J., Salunga, R., Tuggle, J. T., Gaudet, J., Enright, E., McQuary, P., Payette, T.,

Pistone, M., Stecker, K., Zhang, B. M., et al. (2003). Gene expression profiles of human

breast cancer progression. Proc Natl Acad Sci U S A 100, 5974-5979.

MacKeigan, J. P., Murphy, L. O., and Blenis, J. (2005). Sensitized RNAi screen of human

kinases and phosphatases identifies new regulators of apoptosis and chemoresistance. Nat

Cell Biol.

Malander, S., Ridderheim, M., Masback, A., Loman, N., Kristoffersson, U., Olsson, H.,

Nilbert, M., and Borg, A. (2004). One in 10 ovarian cancer patients carry germ line

BRCA1 or BRCA2 mutations: results of a prospective study in Southern Sweden. Eur J

Cancer 40, 422-428.

Mandelin, E., Lassus, H., Seppala, M., Leminen, A., Gustafsson, J. A., Cheng, G.,

Butzow, R., and Koistinen, R. (2003). Glycodelin in ovarian serous carcinoma:

association with differentiation and survival. Cancer Res 63, 6258-6264.

Markman, M., Bundy, B. N., Alberts, D. S., Fowler, J. M., Clarke-Pearson, D. L., Carson,

L. F., Wadler, S., and Sickel, J. (2001). Phase III trial of standard-dose intravenous

cisplatin plus paclitaxel versus moderately high-dose carboplatin followed by intravenous

paclitaxel and intraperitoneal cisplatin in small-volume stage III ovarian carcinoma: an

intergroup study of the Gynecologic Oncology Group, Southwestern Oncology Group,

and Eastern Cooperative Oncology Group. J Clin Oncol 19, 1001-1007.

Marsden, D. E., Friedlander, M., and Hacker, N. F. (2000). Current management of

epithelial ovarian carcinoma: a review. Semin Surg Oncol 19, 11-19.

Martin, K. J., and Pardee, A. B. (1999). Principles of differential display. Methods

Enzymol 303, 234-258.

Martin, T. A., Harrison, G., Mansel, R. E., and Jiang, W. G. (2003). The role of the

CD44/ezrin complex in cancer metastasis. Crit Rev Oncol Hematol 46, 165-186.

Martinez, M. E., and Willett, W. C. (1998). Calcium, vitamin D, and colorectal cancer: a

review of the epidemiologic evidence. Cancer Epidemiol Biomarkers Prev 7, 163-168.

Matias-Guiu, X., and Prat, J. (1998). Molecular pathology of ovarian carcinomas.

Virchows Arch 433, 103-111.

Matkowskyj, K. A., Schonfeld, D., and Benya, R. V. (2000). Quantitative

Immunohistochemistry by Measuring Cumulative Signal Strength Using Commercially

Available Software Photoshop and Matlab. J Histochem Cytochem 48, 303-312.

McGuire, W. P., Hoskins, W. J., Brady, M. F., Kucera, P. R., Partridge, E. E., Look, K.

Y., Clarke-Pearson, D. L., and Davidson, M. (1996). Cyclophosphamide and cisplatin

versus paclitaxel and cisplatin: a phase III randomized trial in patients with suboptimal

stage III/IV ovarian cancer (from the Gynecologic Oncology Group). Semin Oncol 23,

40-47.

McQuain, M. K., Seale, K., Peek, J., Fisher, T. S., Levy, S., Stremler, M. A., and

Haselton, F. R. (2004). Chaotic mixer improves microarray hybridization. Anal Biochem

325, 215-226.

Meden, H., and Kuhn, W. (1997). Overexpression of the oncogene c-erbB-2 (HER2/neu)

in ovarian cancer: a new prognostic factor. Eur J Obstet Gynecol Reprod Biol 71, 173-

179.

Meinhold-Heerlein, I., Bauerschlag, D., Hilpert, F., Dimitrov, P., Sapinoso, L. M.,

Orlowska-Volk, M., Bauknecht, T., Park, T. W., Jonat, W., Jacobsen, A., et al. (2005).

Molecular and prognostic distinction between serous ovarian carcinomas of varying grade

and malignant potential. Oncogene 24, 1053-1065.

Memarzadeh, S., Lee, S. B., Berek, J. S., and Farias-Eisner, R. (2003). CA125 levels are a

weak predictor of optimal cytoreductive surgery in patients with advanced epithelial

ovarian cancer. Int J Gynecol Cancer 13, 120-124.

Meyer, T., and Rustin, G. J. (2000). Role of tumour markers in monitoring epithelial

ovarian cancer. Br J Cancer 82, 1535-1538.

Miles, M. F. (2001). Microarrays: lost in a storm of data? Nat Rev Neurosci 2, 441-443.

Minitab Inc (2002). Minitab, (State College, PA, USA: Minitab Inc.).

Missiaglia, E., Blaveri, E., Terris, B., Wang, Y. H., Costello, E., Neoptolemos, J. P.,

Crnogorac-Jurcevic, T., and Lemoine, N. R. (2004). Analysis of gene expression in

cancer cell lines identifies candidate markers for pancreatic tumorigenesis and metastasis.

Int J Cancer 112, 100-112.

Mitsiades, N., Mitsiades, C. S., Poulaki, V., Chauhan, D., Fanourakis, G., Gu, X., Bailey,

C., Joseph, M., Libermann, T. A., Treon, S. P., et al. (2002). Molecular sequelae of

proteasome inhibition in human multiple myeloma cells. Proc Natl Acad Sci U S A 99,

14374-14379.

Mok, S. C., Chao, J., Skates, S., Wong, K., Yiu, G. K., Muto, M. G., Berkowitz, R. S.,

and Cramer, D. W. (2001). Prostasin, a potential serum marker for ovarian cancer:

identification through microarray technology. J Natl Cancer Inst 93, 1458-1464.

Moore, D. S., and McCabe, G. P. (2003). Introduction to the practice of statistics, 4th edn

(New York: W.H. Freeman and Co.).

Moren, A., Olofsson, A., Stenman, G., Sahlin, P., Kanzaki, T., Claesson-Welsh, L., ten

Dijke, P., Miyazono, K., and Heldin, C. H. (1994). Identification and characterization of

LTBP-2, a novel latent transforming growth factor-beta-binding protein. J Biol Chem

269, 32469-32478.

Motokura, T., Bloom, T., Kim, H. G., Juppner, H., Ruderman, J. V., Kronenberg, H. M.,

and Arnold, A. (1991). A novel cyclin encoded by a bcl1-linked candidate oncogene.

Nature 350, 512-515.

Mutch, D. M., Berger, A., Mansourian, R., Rytz, A., and Roberts, M. A. (2001).

Microarray data analysis: a practical approach for selecting differentially expressed

genes. Genome Biol 2, PREPRINT0009.

Nagase, T., Ishikawa, K., Kikuno, R., Hirosawa, M., Nomura, N., and Ohara, O. (1999).

Prediction of the coding sequences of unidentified human genes. XV. The complete

sequences of 100 new cDNA clones from brain which code for large proteins in vitro.

DNA Res 6, 337-345.

Nekarda, H., Siewert, J. R., Schmitt, M., and Ulm, K. (1994). Tumour-associated

proteolytic factors uPA and PAI-1 and survival in totally resected gastric cancer. Lancet

343, 117.

Nelson, N., and Harvey, W. R. (1999). Vacuolar and plasma membrane proton-

adenosinetriphosphatases. Physiol Rev 79, 361-385.

Newman, P. J., Berndt, M. C., Gorski, J., White, G. C., 2nd, Lyman, S., Paddock, C., and

Muller, W. A. (1990). PECAM-1 (CD31) cloning and relation to adhesion molecules of

the immunoglobulin gene superfamily. Science 247, 1219-1222.

Nikitin, A., Egorov, S., Daraselia, N., and Mazo, I. (2003). Pathway studio--the analysis

and navigation of molecular networks. Bioinformatics 19, 2155-2157.

Nikrui, N. (1981). Survey of clinical behavior of patients with borderline epithelial

tumors of the ovary. Gynecol Oncol 12, 107-119.

Nishizuka, S., Chen, S. T., Gwadry, F. G., Alexander, J., Major, S. M., Scherf, U.,

Reinhold, W. C., Waltham, M., Charboneau, L., Young, L., et al. (2003). Diagnostic

markers that distinguish colon and ovarian adenocarcinomas: identification by genomic,

proteomic, and tissue array profiling. Cancer Res 63, 5243-5250.

Notterman, D. A., Alon, U., Sierk, A. J., and Levine, A. J. (2001). Transcriptional gene

expression profiles of colorectal adenoma, adenocarcinoma, and normal tissue examined

by oligonucleotide arrays. Cancer Res 61, 3124-3130.

Novoradovskaya, N., Whitfield, M. L., Basehore, L. S., Novoradovsky, A., Pesich, R.,

Usary, J., Karaca, M., Wong, W. K., Aprelikova, O., Fero, M., et al. (2004). Universal

Reference RNA as a standard for microarray experiments. BMC Genomics 5, 20.

Ntzani, E. E., and Ioannidis, J. P. (2003). Predictive ability of DNA microarrays for

cancer outcomes and correlates: an empirical assessment. Lancet 362, 1439-1444.

O'Dowd, B. F., Nguyen, T., Jung, B. P., Marchese, A., Cheng, R., Heng, H. H.,

Kolakowski, L. F., Jr., Lynch, K. R., and George, S. R. (1997). Cloning and chromosomal

mapping of four putative novel human G-protein-coupled receptor genes. Gene 187, 75-

81.

Obiezu, C. V., Scorilas, A., Katsaros, D., Massobrio, M., Yousef, G. M., Fracchioli, S.,

Rigault de la Longrais, I. A., Arisio, R., and Diamandis, E. P. (2001). Higher human

kallikrein gene 4 (KLK4) expression indicates poor prognosis of ovarian cancer

patients. Clin Cancer Res 7, 2380-2386.

Ochs, M. F., and Godwin, A. K. (2003). Microarrays in cancer: research and applications.

Biotechniques Suppl, 4-15.

Odeberg, J., Rosok, O., Gudmundsson, G. H., Ahmadian, A., Roshani, L., Williams, C.,

Larsson, C., Ponten, F., Uhlen, M., Asheim, H. C., and Lundeberg, J. (1998). Cloning and

characterization of ZNF189, a novel human Kruppel-like zinc finger gene localized to

chromosome 9q22-q31. Genomics 50, 213-221.

Oesterreich, S., Zhang, P., Guler, R. L., Sun, X., Curran, E. M., Welshons, W. V.,

Osborne, C. K., and Lee, A. V. (2001). Re-expression of estrogen receptor alpha in

estrogen receptor alpha-negative MCF-7 cells restores both estrogen and insulin-like

growth factor-mediated signaling and growth. Cancer Res 61, 5771-5777.

Oken, M. M., Creech, R. H., Tormey, D. C., Horton, J., Davis, T. E., McFadden, E. T.,

and Carbone, P. P. (1982). Toxicity and response criteria of the Eastern Cooperative

Oncology Group. Am J Clin Oncol 5, 649-655.

Ono, K., Tanaka, T., Tsunoda, T., Kitahara, O., Kihara, C., Okamoto, A., Ochiai, K.,

Takagi, T., and Nakamura, Y. (2000). Identification by cDNA microarray of genes

involved in ovarian carcinogenesis. Cancer Res 60, 5007-5011.

Orita, S., Sasaki, T., Naito, A., Komuro, R., Ohtsuka, T., Maeda, M., Suzuki, H., Igarashi,

H., and Takai, Y. (1995). Doc2: a novel brain protein having two repeated C2-like

domains. Biochem Biophys Res Commun 206, 439-448.

Ortiz, B. H., Ailawadi, M., Colitti, C., Muto, M. G., Deavers, M., Silva, E. G., Berkowitz,

R. S., Mok, S. C., and Gershenson, D. M. (2001). Second primary or recurrence?

Comparative patterns of p53 and K-ras mutations suggest that serous borderline ovarian

tumors and subsequent serous carcinomas are unrelated tumors. Cancer Res 61, 7264-

7267.

Pacifico, M. D., Grover, R., Richman, P., Daley, F., and Wilson, G. D. (2004). Validation

of tissue microarray for the immunohistochemical profiling of melanoma. Melanoma Res

14, 39-42.

Packard Bioscience (2000). QuantArray. In, (Billerica, USA: Packard Bioscience).

Paik, S., Shak, S., Tang, G., Kim, C., Baker, J., Cronin, M., Baehner, F. L., Walker, M.

G., Watson, D., Park, T., et al. (2004). A multigene assay to predict recurrence of

tamoxifen-treated, node-negative breast cancer. N Engl J Med 351, 2817-2826.

Park, T., Yi, S. G., Kang, S. H., Lee, S., Lee, Y. S., and Simon, R. (2003). Evaluation of

normalization methods for microarray data. BMC Bioinformatics 4, 33.

Parkin, D. M., and Muir, C. S. (1992). Cancer Incidence in Five Continents.

Comparability and quality of data. IARC Sci Publ, 45-173.

Patel, I. S., Madan, P., Getsios, S., Bertrand, M. A., and MacCalman, C. D. (2003).

Cadherin switching in ovarian cancer progression. Int J Cancer 106, 172-177.

Pathan, N., Marusawa, H., Krajewska, M., Matsuzawa, S., Kim, H., Okada, K., Torii, S.,

Kitada, S., Krajewski, S., Welsh, K., et al. (2001). TUCAN, an antiapoptotic caspase-

associated recruitment domain family protein overexpressed in cancer. J Biol Chem 276,

32220-32229.

Pedersen, H., Brunner, N., Francis, D., Osterlind, K., Ronne, E., Hansen, H. H., Dano, K.,

and Grondahl-Hansen, J. (1994). Prognostic impact of urokinase, urokinase receptor, and

type 1 plasminogen activator inhibitor in squamous and large cell lung cancer tissue.

Cancer Res 54, 4671-4675.

Perou, C. M., Sorlie, T., Eisen, M. B., van de Rijn, M., Jeffrey, S. S., Rees, C. A.,

Pollack, J. R., Ross, D. T., Johnsen, H., Akslen, L. A., et al. (2000). Molecular portraits

of human breast tumours. Nature 406, 747-752.

Phillips, M. S., Fujii, J., Khanna, V. K., DeLeon, S., Yokobata, K., de Jong, P. J., and

MacLennan, D. H. (1996). The structural organization of the human skeletal muscle

ryanodine receptor (RYR1) gene. Genomics 34, 24-41.

Piccart, M. J., Bertelsen, K., James, K., Cassidy, J., Mangioni, C., Simonsen, E., Stuart,

G., Kaye, S., Vergote, I., Blom, R., et al. (2000). Randomized intergroup trial of

cisplatin-paclitaxel versus cisplatin-cyclophosphamide in women with advanced

epithelial ovarian cancer: three-year results. J Natl Cancer Inst 92, 699-708.

Pieretti, M., Hopenhayn-Rich, C., Khattar, N. H., Cao, Y., Huang, B., and Tucker, T. C.

(2002). Heterogeneity of ovarian cancer: relationships among histological group, stage of

disease, tumor markers, patient characteristics, and survival. Cancer Invest 20, 11-23.

Pinhasov, A., Mei, J., Amaratunga, D., Amato, F. A., Lu, H., Kauffman, J., Xin, H.,

Brenneman, D. E., Johnson, D. L., Andrade-Gordon, P., and Ilyin, S. E. (2004). Gene

expression analysis for high throughput screening applications. Comb Chem High

Throughput Screen 7, 133-140.

Player, A., Barrett, J. C., and Kawasaki, E. S. (2004). Laser capture microdissection,

microarrays and the precise definition of a cancer cell. Expert Rev Mol Diagn 4, 831-840.

Pokutta, S., Herrenknecht, K., Kemler, R., and Engel, J. (1994). Conformational changes

of the recombinant extracellular domain of E-cadherin upon calcium binding. Eur J

Biochem 223, 1019-1026.

Prat, J., and De Nictolis, M. (2002). Serous borderline tumors of the ovary: a long-term

follow-up study of 137 cases, including 18 with a micropapillary pattern and 20 with

microinvasion. Am J Surg Pathol 26, 1111-1128.

Puls, L. E., Powell, D. E., DePriest, P. D., Gallion, H. H., Hunter, J. E., Kryscio, R. J.,

and van Nagell, J. R., Jr. (1992). Transition from benign to malignant epithelium in

mucinous and serous ovarian cystadenocarcinoma. Gynecol Oncol 47, 53-57.

Puri, R., Tousson, A., Chen, L., and Kakar, S. S. (2001). Molecular cloning of pituitary

tumor transforming gene 1 from ovarian tumors and its expression in tumors. Cancer Lett

163, 131-139.

Qian, J., Kluger, Y., Yu, H., and Gerstein, M. (2003). Identification and correction of

spurious spatial correlations in microarray data. Biotechniques 35, 42-44, 46, 48.

Qin, L. X., and Kerr, K. F. (2004). Empirical evaluation of data transformations and

ranking statistics for microarray analysis. Nucleic Acids Res 32, 5471-5479.

Quackenbush, J. (2001). Computational analysis of microarray data. Nat Rev Genet 2,

418-427.

Quackenbush, J. (2002). Microarray data normalization and transformation. Nat Genet 32

Suppl, 496-501.

Radmacher, M. D., McShane, L. M., and Simon, R. (2002). A paradigm for class

prediction using gene expression profiles. J Comput Biol 9, 505-511.

Ramaswamy, S., Ross, K. N., Lander, E. S., and Golub, T. R. (2003). A molecular

signature of metastasis in primary solid tumors. Nat Genet 33, 49-54.

Ramaswamy, S., Tamayo, P., Rifkin, R., Mukherjee, S., Yeang, C. H., Angelo, M., Ladd,

C., Reich, M., Latulippe, E., Mesirov, J. P., et al. (2001). Multiclass cancer diagnosis

using tumor gene expression signatures. Proc Natl Acad Sci U S A 98, 15149-15154.

Ramdas, L., Coombes, K. R., Baggerly, K., Abruzzo, L., Highsmith, W. E., Krogmann,

T., Hamilton, S. R., and Zhang, W. (2001a). Sources of nonlinearity in cDNA microarray

expression measurements. Genome Biol 2, RESEARCH0047.

Ramdas, L., Wang, J., Hu, L., Cogdell, D., Taylor, E., and Zhang, W. (2001b).

Comparative evaluation of laser-based microarray scanners. Biotechniques 31, 546, 548,

550, passim.

Rao, G. G., Skinner, E., Gehrig, P. A., Duska, L. R., Coleman, R. L., and Schorge, J. O.

(2004). Surgical staging of ovarian low malignant potential tumors. Obstet Gynecol 104,

261-266.

Raychaudhuri, S., Stuart, J. M., and Altman, R. B. (2000). Principal components analysis

to summarize microarray experiments: application to sporulation time series. Pac Symp

Biocomput, 455-466.

Raymond, M. H., Schutte, B. C., Torner, J. C., Burns, T. L., and Willing, M. C. (1999).

Osteocalcin: genetic and physical mapping of the human gene BGLAP and its potential

role in postmenopausal osteoporosis. Genomics 60, 210-217.

Rebhan, M., Chalifa-Caspi, V., Prilusky, J., and Lancet, D. (1998). GeneCards: a novel

functional genomics compendium with automated data mining and query reformulation

support. Bioinformatics 14, 656-664.

Reich, M., Ohm, K., Angelo, M., Tamayo, P., and Mesirov, J. P. (2004). GeneCluster 2.0:

an advanced toolset for bioarray analysis. Bioinformatics 20, 1797-1798.

Reiner, A., Yekutieli, D., and Benjamini, Y. (2003). Identifying differentially expressed

genes using false discovery rate controlling procedures. Bioinformatics 19, 368-375.

Rhodes, D. A., Stammers, M., Malcherek, G., Beck, S., and Trowsdale, J. (2001). The

cluster of BTN genes in the extended major histocompatibility complex. Genomics 71,

351-362.

Rhodes, D. R., Barrette, T. R., Rubin, M. A., Ghosh, D., and Chinnaiyan, A. M. (2002).

Meta-analysis of microarrays: interstudy validation of gene expression profiles reveals

pathway dysregulation in prostate cancer. Cancer Res 62, 4427-4433.

Rhodes, D. R., Yu, J., Shanker, K., Deshpande, N., Varambally, R., Ghosh, D., Barrette,

T., Pandey, A., and Chinnaiyan, A. M. (2004). Large-scale meta-analysis of cancer

microarray data identifies common transcriptional profiles of neoplastic transformation

and progression. Proc Natl Acad Sci U S A 101, 9309-9314.

Rieppi, M., Vergani, V., Gatto, C., Zanetta, G., Allavena, P., Taraboletti, G., and

Giavazzi, R. (1999). Mesothelial cells induce the motility of human ovarian carcinoma

cells. Int J Cancer 80, 303-307.

Ries, L. A. G., Eisner, M. P., Kosary, C. L., Hankey, B. F., Miller, B. A., Clegg, L., Mariotto, A.,

Feuer, E. J., and Edwards, B. K. (2004). SEER Cancer Statistics Review, 1975-2002. In, (Bethesda, MD).

Rihl, M., Baeten, D., Seta, N., Gu, J., De Keyser, F., Veys, E. M., Kuipers, J. G., Zeidler,

H., and Yu, D. T. (2004). Technical validation of cDNA based microarray as screening

technique to identify candidate genes in synovial tissue biopsy specimens from patients

with spondyloarthropathy. Ann Rheum Dis 63, 498-507.

Rodriguez, G. C., Haisley, C., Hurteau, J., Moser, T. L., Whitaker, R., Bast, R. C., Jr., and

Stack, M. S. (2001). Regulation of invasion of epithelial ovarian cancer by transforming

growth factor-beta. Gynecol Oncol 80, 245-253.

Roepman, P., Wessels, L. F., Kettelarij, N., Kemmeren, P., Miles, A. J., Lijnzaad, P.,

Tilanus, M. G., Koole, R., Hordijk, G. J., van der Vliet, P. C., et al. (2005). An expression

profile for diagnosis of lymph node metastases from primary head and neck squamous

cell carcinomas. Nat Genet 37, 182-186.

Rogojina, A. T., Orr, W. E., Song, B. K., and Geisert, E. E., Jr. (2003). Comparing the

use of Affymetrix to spotted oligonucleotide microarrays using two retinal pigment

epithelium cell lines. Mol Vis 9, 482-496.

Ronnett, B. M., Kajdacsy-Balla, A., Gilks, C. B., Merino, M. J., Silva, E., Werness, B. A.,

and Young, R. H. (2004). Mucinous borderline ovarian tumors: points of general

agreement and persistent controversies regarding nomenclature, diagnostic criteria, and

behavior. Hum Pathol 35, 949-960.

Ronnett, B. M., Kurman, R. J., Shmookler, B. M., Sugarbaker, P. H., and Young, R. H.

(1997). The morphologic spectrum of ovarian metastases of appendiceal

adenocarcinomas: a clinicopathologic and immunohistochemical analysis of tumors often

misinterpreted as primary ovarian tumors or metastatic tumors from other gastrointestinal

sites. Am J Surg Pathol 21, 1144-1155.

Roose, J., Korver, W., de Boer, R., Kuipers, J., Hurenkamp, J., and Clevers, H. (1999).

The Sox-13 gene: structure, promoter characterization, and chromosomal localization.

Genomics 57, 301-305.

Rosenwald, A., Wright, G., Chan, W. C., Connors, J. M., Campo, E., Fisher, R. I.,

Gascoyne, R. D., Muller-Hermelink, H. K., Smeland, E. B., Giltnane, J. M., et al. (2002).

The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-

cell lymphoma. N Engl J Med 346, 1937-1947.

Ross, D. T., and Perou, C. M. (2001). A comparison of gene expression signatures from

breast tumors and breast tissue derived cell lines. Dis Markers 17, 99-109.

Ross, D. T., Scherf, U., Eisen, M. B., Perou, C. M., Rees, C., Spellman, P., Iyer, V.,

Jeffrey, S. S., Van de Rijn, M., Waltham, M., et al. (2000). Systematic variation in gene

expression patterns in human cancer cell lines. Nat Genet 24, 227-235.

Ross, J. S., del Rosario, A. D., Figge, H. L., Sheehan, C., Fisher, H. A., and Bui, H. X.

(1995). E-cadherin expression in papillary transitional cell carcinoma of the urinary

bladder. Hum Pathol 26, 940-944.

Rump, A., Morikawa, Y., Tanaka, M., Minami, S., Umesaki, N., Takeuchi, M., and

Miyajima, A. (2004). Binding of ovarian cancer antigen CA125/MUC16 to mesothelin

mediates cell adhesion. J Biol Chem 279, 9190-9198.

Russell, S. E., and McCluggage, W. G. (2004). A multistep model for ovarian

tumorigenesis: the value of mutation analysis in the KRAS and BRAF genes. J Pathol

203, 617-619.

Rustin, G. J. (2004). Can we now agree to use the same definition to measure response

according to CA-125? J Clin Oncol 22, 4035-4036.

Rustin, G. J., Quinn, M., Thigpen, T., du Bois, A., Pujade-Lauraine, E., Jakobsen, A.,

Eisenhauer, E., Sagae, S., Greven, K., Vergote, I., et al. (2004). Re: New guidelines to

evaluate the response to treatment in solid tumors (ovarian cancer). J Natl Cancer Inst 96,

487-488.

Rustin, G. J. S., Nelstrop, A. E., Bentzen, S. M., Bond, S. J., and McClean, P. (2000).

Selection of active drugs for ovarian cancer based on CA-125 and standard response

rates in phase II trials. J Clin Oncol 18, 1733-1739.

Ryu, B., Jones, J., Blades, N. J., Parmigiani, G., Hollingsworth, M. A., Hruban, R. H.,

and Kern, S. E. (2002). Relationships and differentially expressed genes among

pancreatic cancers examined by large-scale serial analysis of gene expression. Cancer Res

62, 819-826.

Saal, L. H., Troein, C., Vallon-Christersson, J., Gruvberger, S., Borg, Å., and Peterson, C.

(2002). BioArray Software Environment: a platform for comprehensive management

and analysis of microarray data. Genome Biol 3, software0003.0001-0003.0006.

Sakamoto, M., Kondo, A., Kawasaki, K., Goto, T., Sakamoto, H., Miyake, K.,

Koyamatsu, Y., Akiya, T., Iwabuchi, H., Muroya, T., et al. (2001). Analysis of gene

expression profiles associated with cisplatin resistance in human ovarian cancer cell lines

and tissues using cDNA microarray. Hum Cell 14, 305-315.

Sakuragi, N., Nishiya, M., Ikeda, K., Ohkouch, T., Furth, E. E., Hareyama, H., Satoh, C.,

and Fujimoto, S. (1994). Decreased E-cadherin expression in endometrial carcinoma is

associated with tumor dedifferentiation and deep myometrial invasion. Gynecol Oncol

53, 183-189.

Sallinen, S. L., Sallinen, P. K., Haapasalo, H. K., Helin, H. J., Helen, P. T., Schraml, P.,

Kallioniemi, O. P., and Kononen, J. (2000). Identification of differentially expressed

genes in human gliomas by DNA microarray and tissue chip techniques. Cancer Res 60,

6617-6622.

Sambrook, J., and Bowtell, D. (2003). DNA microarrays: a molecular cloning manual,

(Cold Spring Harbor, N.Y.; [Great Britain]: Cold Spring Harbor Laboratory Press).

Santin, A. D., Zhan, F., Bellone, S., Palmieri, M., Cane, S., Bignotti, E., Anfossi, S.,

Gokden, M., Dunn, D., Roman, J. J., et al. (2004). Gene expression profiles in primary

ovarian serous papillary tumors and normal ovarian epithelium: identification of

candidate molecular markers for ovarian cancer diagnosis and therapy. Int J Cancer 112,

14-25.

Sarafan-Vasseur, N., Lamy, A., Bourguignon, J., Pessot, F. L., Hieter, P., Sesboue, R.,

Bastard, C., Frebourg, T., and Flaman, J. M. (2002). Overexpression of B-type cyclins

alters chromosomal segregation. Oncogene 21, 2051-2057.

Sasaki, C. Y., Lin, H., and Passaniti, A. (1999). Regulation of urokinase plasminogen

activator (uPA) activity by E-cadherin and hormones in mammary epithelial cells. J Cell

Physiol 181, 1-13.

Sawiris, G. P., Sherman-Baust, C. A., Becker, K. G., Cheadle, C., Teichberg, D., and

Morin, P. J. (2002). Development of a highly specialized cDNA array for the study and

diagnosis of epithelial ovarian cancer. Cancer Res 62, 2923-2928.

Sawka-Verhelle, D., Escoubet-Lozach, L., Fong, A. L., Hester, K. D., Herzig, S., Lebrun,

P., and Glass, C. K. (2004). PE-1/METS, an antiproliferative Ets repressor factor, is

induced by CREB-1/CREM-1 during macrophage differentiation. J Biol Chem 279,

17772-17784.

Schaner, M. E., Ross, D. T., Ciaravino, G., Sorlie, T., Troyanskaya, O., Diehn, M., Wang,

Y. C., Duran, G. E., Sikic, T. L., Caldeira, S., et al. (2003). Gene expression patterns in

ovarian carcinomas. Mol Biol Cell, E03-05-0279.

Schluter, C., Duchrow, M., Wohlenberg, C., Becker, M. H., Key, G., Flad, H. D., and

Gerdes, J. (1993). The cell proliferation-associated antigen of antibody Ki-67: a very

large, ubiquitous nuclear protein with numerous repeated elements, representing a new

kind of cell cycle-maintaining proteins. J Cell Biol 123, 513-522.

Schmitt, M., Harbeck, N., Thomssen, C., Wilhelm, O., Magdolen, V., Reuning, U., Ulm,

K., Hofler, H., Janicke, F., and Graeff, H. (1997). Clinical impact of the plasminogen

activation system in tumor invasion and metastasis: prognostic relevance and target for

therapy. Thromb Haemost 78, 285-296.

Schorge, J. O., Drake, R. D., Lee, H., Skates, S. J., Rajanbabu, R., Miller, D. S., Kim, J.

H., Cramer, D. W., Berkowitz, R. S., and Mok, S. C. (2004). Osteopontin as an adjunct to

CA125 in detecting recurrent ovarian cancer. Clin Cancer Res 10, 3474-3478.

Schuchhardt, J., Beule, D., Malik, A., Wolski, E., Eickhoff, H., Lehrach, H., and Herzel,

H. (2000). Normalization strategies for cDNA microarrays. Nucleic Acids Res 28, E47.

Schuyer, M., van der Burg, M. E., Henzen-Logmans, S. C., Fieret, J. H., Klijn, J. G.,

Look, M. P., Foekens, J. A., Stoter, G., and Berns, E. M. (2001). Reduced expression of

BAX is associated with poor prognosis in patients with epithelial ovarian cancer: a

multifactorial analysis of TP53, p21, BAX and BCL-2. Br J Cancer 85, 1359-1367.

Schwartz, D. R., Kardia, S. L., Shedden, K. A., Kuick, R., Michailidis, G., Taylor, J. M.,

Misek, D. E., Wu, R., Zhai, Y., Darrah, D. M., et al. (2002). Gene expression in ovarian

cancer reflects both morphology and biological behavior, distinguishing clear cell from

other poor-prognosis ovarian carcinomas. Cancer Res 62, 4722-4729.

Sedlacek, Z., Munstermann, E., Dhorne-Pollet, S., Otto, C., Bock, D., Schutz, G., and

Poustka, A. (1999). Human and mouse XAP-5 and XAP-5-like (X5L) genes:

identification of an ancient functional retroposon differentially expressed in testis.

Genomics 61, 125-132.

Seelenmeyer, C., Wegehingel, S., Lechner, J., and Nickel, W. (2003). The cancer antigen

CA125 represents a novel counter receptor for galectin-1. J Cell Sci 116, 1305-1318.

Seeler, J. S., Marchio, A., Losson, R., Desterro, J. M., Hay, R. T., Chambon, P., and

Dejean, A. (2001). Common properties of nuclear body protein SP100 and TIF1alpha

chromatin factor: role of SUMO modification. Mol Cell Biol 21, 3314-3324.

Seidman, J. D., Kurman, R. J., and Ronnett, B. M. (2003). Primary and metastatic

mucinous adenocarcinomas in the ovaries: incidence in routine practice with a new

approach to improve intraoperative diagnosis. Am J Surg Pathol 27, 985-993.

Seth, A., Kitching, R., Landberg, G., Xu, J., Zubovits, J., and Burger, A. M. (2003). Gene

expression profiling of ductal carcinomas in situ and invasive breast tumors. Anticancer

Res 23, 2043-2051.

Sevin, B. U., and Perras, J. P. (1997). Tumor heterogeneity and in vitro chemosensitivity

testing in ovarian cancer. Am J Obstet Gynecol 176, 759-766; discussion 766-768.

Shedden, K. A., Taylor, J. M., Giordano, T. J., Kuick, R., Misek, D. E., Rennert, G.,

Schwartz, D. R., Gruber, S. B., Logsdon, C., Simeone, D., et al. (2003). Accurate

molecular classification of human cancers based on gene expression using a simple

classifier with a pathological tree-based framework. Am J Pathol 163, 1985-1995.

Shen, Z. Y., Xu, L. Y., Chen, M. H., Li, E. M., Li, J. T., Wu, X. Y., and Zeng, Y. (2003).

Upregulated expression of Ezrin and invasive phenotype in malignantly transformed

esophageal epithelial cells. World J Gastroenterol 9, 1182-1186.

Sherlock, G., Hernandez-Boussard, T., Kasarskis, A., Binkley, G., Matese, J. C., Dwight,

S. S., Kaloper, M., Weng, S., Jin, H., Ball, C. A., et al. (2001). The Stanford Microarray

Database. Nucleic Acids Res 29, 152-155.

Sherman, M. E., Mink, P. J., Curtis, R., Cote, T. R., Brooks, S., Hartge, P., and Devesa,

S. (2004). Survival among women with borderline ovarian tumors and ovarian carcinoma:

a population-based analysis. Cancer 100, 1045-1052.

Shi, S. R., Cote, R. J., and Taylor, C. R. (2001). Antigen retrieval techniques: current

perspectives. J Histochem Cytochem 49, 931-937.

Shigemasa, K., Tanimoto, H., Underwood, L. J., Parmley, T. H., Arihiro, K., Ohama, K.,

and O'Brien, T. J. (2001). Expression of the protease inhibitor antileukoprotease and the

serine protease stratum corneum chymotryptic enzyme (SCCE) is coordinated in ovarian

tumors. Int J Gynecol Cancer 11, 454-461.

Shigemasa, K., Tian, X., Gu, L., Tanimoto, H., Underwood, L. J., O'Brien, T. J., and

Ohama, K. (2004). Human kallikrein 8 (hK8/TADG-14) expression is associated with an

early clinical stage and favorable prognosis in ovarian cancer. Oncol Rep 11, 1153-1159.

Shih, I.-M., and Kurman, R. J. (2004). Ovarian tumorigenesis: a proposed model based

on morphological and molecular genetic analysis. Am J Pathol 164, 1511-1518.

Sillanpaa, S., Anttila, M. A., Voutilainen, K., Tammi, R. H., Tammi, M. I., Saarikoski, S.

V., and Kosma, V. M. (2003). CD44 expression indicates favorable prognosis in

epithelial ovarian cancer. Clin Cancer Res 9, 5318-5324.

Simon, R., Korn, E. L., McShane, L. M., Radmacher, M., Wright, G., and Zhao, Y.

(2003a). Design and Analysis of DNA Microarray Investigations, (New York: Springer-

Verlag).

Simon, R., and Lam, A. BRB-ArrayTools. In, (Biometric Research Branch, National

Cancer Institute).

Simon, R., Radmacher, M. D., Dobbin, K., and McShane, L. M. (2003b). Pitfalls in the

use of DNA microarray data for diagnostic and prognostic classification. J Natl Cancer

Inst 95, 14-18.

Simpson, P. T., Reis-Filho, J. S., Gale, T., and Lakhani, S. R. (2005). Molecular evolution

of breast cancer. J Pathol 205, 248-254.

Singer, G., Oldt, R., III, Cohen, Y., Wang, B. G., Sidransky, D., Kurman, R. J., and Shih,

I.-M. (2003). Mutations in BRAF and KRAS characterize the development of low-

grade ovarian serous carcinoma. J Natl Cancer Inst 95, 484-486.

Singh-Gasson, S., Green, R. D., Yue, Y., Nelson, C., Blattner, F., Sussman, M. R., and

Cerrina, F. (1999). Maskless fabrication of light-directed oligonucleotide microarrays

using a digital micromirror array. Nat Biotechnol 17, 974-978.

Singh, D., Febbo, P. G., Ross, K., Jackson, D. G., Manola, J., Ladd, C., Tamayo, P.,

Renshaw, A. A., D'Amico, A. V., Richie, J. P., et al. (2002). Gene expression correlates

of clinical prostate cancer behavior. Cancer Cell 1, 203-209.

Slamon, D. J., Boone, T. C., Seeger, R. C., Keith, D. E., Chazin, V., Lee, H. C., and

Souza, L. M. (1986). Identification and characterization of the protein encoded by the

human N-myc oncogene. Science 232, 768-772.

Smart, C. R., and Chu, K. C. (1992). Staging patterns and early cancer detection. Semin

Surg Oncol 8, 62-72.

Smith, L. H., and Oi, R. H. (1984). Detection of malignant ovarian neoplasms: a review

of the literature. I. Detection of the patient at risk; clinical, radiological and cytological

detection. Obstet Gynecol Surv 39, 313-328.

Smith Sehdev, A. E., Sehdev, P. S., and Kurman, R. J. (2003). Noninvasive and invasive

micropapillary (low-grade) serous carcinoma of the ovary: a clinicopathologic analysis of

135 cases. Am J Surg Pathol 27, 725-736.

Smyth, G. K., Yang, Y. H., and Speed, T. (2003). Statistical issues in cDNA microarray

data analysis. Methods Mol Biol 224, 111-136.

Society, A. C. (2005). Cancer Facts and Figures 2005. In, (Atlanta, USA: American

Cancer Society), p. 64.

Sood, A. K., Coffin, J. E., Schneider, G. B., Fletcher, M. S., DeYoung, B. R., Gruman, L.

M., Gershenson, D. M., Schaller, M. D., and Hendrix, M. J. (2004). Biological

significance of focal adhesion kinase in ovarian cancer: role in migration and invasion.

Am J Pathol 165, 1087-1095.

Sorlie, T., Perou, C. M., Tibshirani, R., Aas, T., Geisler, S., Johnsen, H., Hastie, T.,

Eisen, M. B., van de Rijn, M., Jeffrey, S. S., et al. (2001). Gene expression patterns of

breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad

Sci U S A 98, 10869-10874.

Spentzos, D., Levine, D. A., Ramoni, M. F., Joseph, M., Gu, X., Boyd, J., Libermann, T.

A., and Cannistra, S. A. (2004). A gene expression signature with independent

prognostic significance in epithelial ovarian cancer. J Clin Oncol.

Sperger, J. M., Chen, X., Draper, J. S., Antosiewicz, J. E., Chon, C. H., Jones, S. B.,

Brooks, J. D., Andrews, P. W., Brown, P. O., and Thomson, J. A. (2003). Gene

expression patterns in human embryonic stem cells and human pluripotent germ cell

tumors. Proc Natl Acad Sci U S A 100, 13350-13355.

Standal, T., Borset, M., and Sundan, A. (2004). Role of osteopontin in adhesion,

migration, cell survival and bone remodeling. Exp Oncol 26, 179-184.

Statnikov, A., Aliferis, C. F., Tsamardinos, I., Hardin, D., and Levy, S. (2005). A

comprehensive evaluation of multicategory classification methods for microarray gene

expression cancer diagnosis. Bioinformatics 21, 631-643.

Sterrenburg, E., Turk, R., Boer, J. M., van Ommen, G. B., and den Dunnen, J. T. (2002).

A common reference for cDNA microarray hybridizations. Nucleic Acids Res 30, e116.

Stinson, S. F., Alley, M. C., Kopp, W. C., Fiebig, H. H., Mullendore, L. A., Pittman, A.

F., Kenney, S., Keller, J., and Boyd, M. R. (1992). Morphological and

immunocytochemical characteristics of human tumor cell lines for use in a disease-

oriented anticancer drug screen. Anticancer Res 12, 1035-1053.

Stoecklin, G., Colombi, M., Raineri, I., Leuenberger, S., Mallaun, M., Schmidlin, M.,

Gross, B., Lu, M., Kitamura, T., and Moroni, C. (2002). Functional cloning of BRF1, a

regulator of ARE-dependent mRNA turnover. EMBO J 21, 4709-4718.

Strausberg, R. L., Feingold, E. A., Grouse, L. H., Derge, J. G., Klausner, R. D., Collins,

F. S., Wagner, L., Shenmen, C. M., Schuler, G. D., Altschul, S. F., et al. (2002).

Generation and initial analysis of more than 15,000 full-length human and mouse cDNA

sequences. Proc Natl Acad Sci U S A 99, 16899-16903.

Su, A. I., Welsh, J. B., Sapinoso, L. M., Kern, S. G., Dimitrov, P., Lapp, H., Schultz, P.

G., Powell, S. M., Moskaluk, C. A., Frierson, H. F., Jr., and Hampton, G. M. (2001).

Molecular classification of human carcinomas by use of gene expression signatures.

Cancer Res 61, 7388-7393.

Sundfeldt, K. (2003). Cell-cell adhesion in the normal ovary and ovarian tumors of

epithelial origin; an exception to the rule. Mol Cell Endocrinol 202, 89-96.

Sundfeldt, K., Piontkewitz, Y., Ivarsson, K., Nilsson, O., Hellberg, P., Brannstrom, M.,

Janson, P. O., Enerback, S., and Hedin, L. (1997). E-cadherin expression in human

epithelial ovarian cancer and normal ovary. Int J Cancer 74, 275-280.

Takahashi, K., Avissar, N., Whitin, J., and Cohen, H. (1987). Purification and

characterization of human plasma glutathione peroxidase: a selenoglycoprotein distinct

from the known cellular enzyme. Arch Biochem Biophys 256, 677-686.

Talbot, S. G., Estilo, C., Maghami, E., Sarkaria, I. S., Pham, D. K., O-charoenrat, P., Socci, N. D.,

Ngai, I., Carlson, D., Ghossein, R., et al. (2005). Gene expression profiling allows

distinction between primary and metastatic squamous cell carcinomas in the lung. Cancer

Res 65, 3063-3071.

Tan, P. K., Downey, T. J., Spitznagel, E. L., Jr., Xu, P., Fu, D., Dimitrov, D. S.,

Lempicki, R. A., Raaka, B. M., and Cam, M. C. (2003). Evaluation of gene expression

measurements from commercial microarray platforms. Nucleic Acids Res 31, 5676-5684.

Tang, B. L., Low, D. Y., and Hong, W. (1998). Hsec22c: a homolog of yeast Sec22p and

mammalian rsec22a and msec22b/ERS-24. Biochem Biophys Res Commun 243, 885-

891.


Tang, D. G., Chen, Y. Q., Newman, P. J., Shi, L., Gao, X., Diglio, C. A., and Honn, K. V.

(1993). Identification of PECAM-1 in solid tumor cells and its potential involvement in

tumor cell adhesion to endothelium. J Biol Chem 268, 22883-22894.

Taniguchi, T., Tischkowitz, M., Ameziane, N., Hodgson, S. V., Mathew, C. G., Joenje,

H., Mok, S. C., and D'Andrea, A. D. (2003). Disruption of the Fanconi anemia-BRCA

pathway in cisplatin-sensitive ovarian tumors. Nat Med 9, 568-574.

Tanihara, H., Sano, K., Heimark, R. L., St John, T., and Suzuki, S. (1994). Cloning of

five human cadherins clarifies characteristic features of cadherin extracellular domain and

provides further evidence for two structurally different types of cadherin. Cell Adhes

Commun 2, 15-26.

Tatnell, P. J., Powell, D. J., Hill, J., Smith, T. S., Tew, D. G., and Kay, J. (1998). Napsins:

new human aspartic proteinases. Distinction between two closely related genes. FEBS

Lett 441, 43-48.

Taylor, M. R., Peterson, J. A., Ceriani, R. L., and Couto, J. R. (1996). Cloning and

sequence analysis of human butyrophilin reveals a potential receptor function. Biochim

Biophys Acta 1306, 1-4.

Teneriello, M. G., and Park, R. C. (1995). Early detection of ovarian cancer. CA Cancer J

Clin 45, 71-87.

Thibout, D., Kraemer, M., Di Benedetto, M., Saffar, L., Gattegno, L., Derbin, C., and

Crepin, M. (1999). Sodium phenylacetate (NaPa) induces modifications of the

proliferation, the adhesion and the cell cycle of tumoral epithelial breast cells. Anticancer

Res 19, 2121-2126.

Tiniakos, D. G., Yu, H., and Liapis, H. (1998). Osteopontin expression in ovarian

carcinomas and tumors of low malignant potential (LMP). Hum Pathol 29, 1250-1254.

Tomassetti, A., Mangiarotti, F., Mazzi, M., Sforzini, S., Miotti, S., Galmozzi, E., Elwood,

P. C., and Canevari, S. (2003). The variant hepatocyte nuclear factor 1 activates the P1

promoter of the human alpha-folate receptor gene in ovarian carcinoma. Cancer Res 63,

696-704.


Tonin, P. N., Hudson, T. J., Rodier, F., Bossolasco, M., Lee, P. D., Novak, J., Manderson,

E. N., Provencher, D., and Mes-Masson, A. M. (2001). Microarray analysis of gene

expression mirrors the biology of an ovarian cancer model. Oncogene 20, 6617-6626.

Tortorella, M., Pratta, M., Liu, R. Q., Abbaszade, I., Ross, H., Burn, T., and Arner, E.

(2000). The thrombospondin motif of aggrecanase-1 (ADAMTS-4) is critical for

aggrecan substrate recognition and cleavage. J Biol Chem 275, 25791-25797.

Tothill, R. W., Kowalczyk, A., Rischin, D., Bousioutas, A., Haviv, I., van Laar, R. K.,

Waring, P. M., Zalcberg, J., Ward, R., Biankin, A. V., et al. (2005). An expression-based

site of origin diagnostic method designed for clinical application to cancer of unknown

origin. Cancer Res 65, 4031-4040.

Trimble, C. L., and Trimble, E. L. (2003). Ovarian tumors of low malignant potential.

Oncology (Williston Park) 17, 1563-1567; discussion 1567-1570, 1575.

Trope, C., Kaern, J., Hogberg, T., Abeler, V., Hagen, B., Kristensen, G., Onsrud, M.,

Pettersen, E., Rosenberg, P., Sandvei, R., et al. (2000). Randomized study on adjuvant

chemotherapy in stage I high-risk ovarian cancer with evaluation of DNA-ploidy as

prognostic instrument. Ann Oncol 11, 281-288.

Trope, C., Kaern, J., Vergote, I. B., Kristensen, G., and Abeler, V. (1993). Are borderline

tumors of the ovary overtreated both surgically and systemically? A review of four

prospective randomized trials including 253 patients with borderline tumors. Gynecol

Oncol 51, 236-243.

Troyanskaya, O. G., Garber, M. E., Brown, P. O., Botstein, D., and Altman, R. B. (2002).

Nonparametric methods for identifying differentially expressed genes in microarray data.


Tsodikov, A., Szabo, A., and Jones, D. (2002). Adjustments and measures of differential

expression for microarray data. Bioinformatics 18, 251-260.

Tusher, V. G., Tibshirani, R., and Chu, G. (2001). Significance analysis of microarrays

applied to the ionizing radiation response. Proc Natl Acad Sci U S A 98, 5116-5121.

Umbas, R., Schalken, J. A., Aalders, T. W., Carter, B. S., Karthaus, H. F., Schaafsma, H.

E., Debruyne, F. M., and Isaacs, W. B. (1992). Expression of the cellular adhesion


molecule E-cadherin is reduced or absent in high-grade prostate cancer. Cancer Res 52,

5104-5109.

Umehara, H., Nishii, Y., Morishima, M., Kakehi, Y., Kioka, N., Amachi, T., Koizumi, J.,

Hagiwara, M., and Ueda, K. (2003). Effect of cisplatin treatment on speckled distribution

of a serine/arginine-rich nuclear protein CROP/Luc7A. Biochem Biophys Res Commun

301, 324-329.

van 't Veer, L. J., Dai, H., van de Vijver, M. J., He, Y. D., Hart, A. A., Mao, M., Peterse,

H. L., van der Kooy, K., Marton, M. J., Witteveen, A. T., et al. (2002). Gene expression

profiling predicts clinical outcome of breast cancer. Nature 415, 530-536.

van de Vijver, M. J., He, Y. D., van't Veer, L. J., Dai, H., Hart, A. A., Voskuil, D. W.,

Schreiber, G. J., Peterse, J. L., Roberts, C., Marton, M. J., et al. (2002). A gene-

expression signature as a predictor of survival in breast cancer. N Engl J Med 347, 1999-

2009.

van den Brule, F., Califice, S., Garnier, F., Fernandez, P. L., Berchuck, A., and

Castronovo, V. (2003). Galectin-1 accumulation in the ovary carcinoma peritumoral

stroma is induced by ovary carcinoma cells and affects both cancer cell proliferation and

adhesion to laminin-1 and fibronectin. Lab Invest 83, 377-386.

Van den Eynde, B., Peeters, O., De Backer, O., Gaugler, B., Lucas, S., and Boon, T.

(1995). A new family of genes coding for an antigen recognized by autologous cytolytic

T lymphocytes on a human melanoma. J Exp Med 182, 689-698.

van der Burg, M. E., Henzen-Logmans, S. C., Berns, E. M., van Putten, W. L., Klijn, J.

G., and Foekens, J. A. (1996). Expression of urokinase-type plasminogen activator (uPA)

and its inhibitor PAI-1 in benign, borderline, malignant primary and metastatic ovarian

tumors. Int J Cancer 69, 475-479.

van der Burg, M. E., van Lent, M., Buyse, M., Kobierska, A., Colombo, N., Favalli, G.,

Lacave, A. J., Nardi, M., Renard, J., and Pecorelli, S. (1995). The effect of debulking

surgery after induction chemotherapy on the prognosis in advanced epithelial ovarian

cancer. Gynecological Cancer Cooperative Group of the European Organization for

Research and Treatment of Cancer. N Engl J Med 332, 629-634.


Van Valkenburgh, H., Shern, J. F., Sharer, J. D., Zhu, X., and Kahn, R. A. (2001). ADP-

ribosylation factors (ARFs) and ARF-like 1 (ARL1) have both specific and shared

effectors: characterizing ARL1-binding proteins. J Biol Chem 276, 22826-22837.

Vapnik, V. (1998). Statistical Learning Theory, (New York: John Wiley).

Vaquerizas, J. M., Dopazo, J., and Diaz-Uriarte, R. (2004). DNMAD: web-based

diagnosis and normalization for microarray data. Bioinformatics 20, 3656-3658.

Vasselli, J. R., Shih, J. H., Iyengar, S. R., Maranchie, J., Riss, J., Worrell, R., Torres-

Cabala, C., Tabios, R., Mariotti, A., Stearman, R., et al. (2003). Predicting survival in

patients with metastatic kidney cancer by gene-expression profiling in the primary tumor.

Proc Natl Acad Sci U S A 100, 6958-6963.

Veatch, A. L., Carson, L. F., and Ramakrishnan, S. (1994). Differential expression of the

cell-cell adhesion molecule E-cadherin in ascites and solid human ovarian tumor cells. Int

J Cancer 58, 393-399.

Vehvilainen, P., Hyytiainen, M., and Keski-Oja, J. (2003). Latent transforming growth

factor-beta-binding protein 2 is an adhesion protein for melanoma cells. J Biol Chem 278,

24705-24713.

Vergote, I., De Brabanter, J., Fyles, A., Bertelsen, K., Einhorn, N., Sevelda, P., Gore, M.

E., Kaern, J., Verrelst, H., Sjovall, K., et al. (2001). Prognostic importance of degree of

differentiation and cyst rupture in stage I invasive epithelial ovarian carcinoma. Lancet

357, 176-182.

Wang, K., Gan, L., Jeffery, E., Gayle, M., Gown, A. M., Skelly, M., Nelson, P. S., Ng,

W. V., Schummer, M., Hood, L., and Mulligan, J. (1999). Monitoring gene expression

profile changes in ovarian carcinomas using cDNA microarray. Gene 229, 101-108.

Wang, X., Ghosh, S., and Guo, S. W. (2001). Quantitative quality control in microarray

image processing and data acquisition. Nucleic Acids Res 29, E75-75.

Wang, X., Jin, D. Y., Ng, R. W., Feng, H., Wong, Y. C., Cheung, A. L., and Tsao, S. W.

(2002). Significance of MAD2 expression to mitotic checkpoint control in ovarian cancer

cells. Cancer Res 62, 1662-1668.


Wang, Z., and Roeder, R. G. (1995). Structure and function of a human transcription

factor TFIIIB subunit that is evolutionarily conserved and contains both TFIIB- and high-

mobility-group protein 2-related domains. Proc Natl Acad Sci U S A 92, 7026-7030.

Warrenfeltz, S., Pavlik, S., Datta, S., Kraemer, E. T., Benigno, B., and McDonald, J. F.

(2004). Gene expression profiling of epithelial ovarian tumours correlated with malignant

potential. Mol Cancer 3, 27.

Wei, M. H., Karavanova, I., Ivanov, S. V., Popescu, N. C., Keck, C. L., Pack, S., Eisen, J.

A., and Lerman, M. I. (1998). In silico-initiated cloning and molecular characterization of

a novel human member of the L1 gene family of neural cell adhesion molecules. Hum

Genet 103, 355-364.

Weil, M. R., Macatee, T., and Garner, H. R. (2002). Toward a universal standard:

comparing two methods for standardizing spotted microarray data. Biotechniques 32,

1310-1314.

Welsh, J. B., Zarrinkar, P. P., Sapinoso, L. M., Kern, S. G., Behling, C. A., Monk, B. J.,

Lockhart, D. J., Burger, R. A., and Hampton, G. M. (2001). Analysis of gene expression

profiles in normal and neoplastic ovarian tissue samples identifies candidate molecular

markers of epithelial ovarian cancer. Proc Natl Acad Sci U S A 98, 1176-1181.

Wen, W. H., Reles, A., Runnebaum, I. B., Sullivan-Halley, J., Bernstein, L., Jones, L. A.,

Felix, J. C., Kreienberg, R., el-Naggar, A., and Press, M. F. (1999). p53 mutations and

expression in ovarian cancers: correlation with overall survival. Int J Gynecol Pathol 18,

29-41.

Wheeler, D. L., Church, D. M., Federhen, S., Lash, A. E., Madden, T. L., Pontius, J. U.,

Schuler, G. D., Schriml, L. M., Sequeira, E., Tatusova, T. A., and Wagner, L. (2003).

Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 31, 28-33.

White, T., Bennett, E. P., Takio, K., Sorensen, T., Bonding, N., and Clausen, H. (1995).

Purification and cDNA cloning of a human UDP-N-acetyl-alpha-D-

galactosamine:polypeptide N-acetylgalactosaminyltransferase. J Biol Chem 270, 24156-

24165.

Wijnhoven, B. P., Dinjens, W. N., and Pignatelli, M. (2000). E-cadherin-catenin cell-cell

adhesion complex and human cancer. Br J Surg 87, 992-1005.


Wiley, S. R., Schooley, K., Smolak, P. J., Din, W. S., Huang, C. P., Nicholl, J. K.,

Sutherland, G. R., Smith, T. D., Rauch, C., Smith, C. A., et al. (1995). Identification

and characterization of a new member of the TNF family that induces apoptosis.

Immunity 3, 673-682.

Wilhelm, S., Schmitt, M., Parkinson, J., Kuhn, W., Graeff, H., and Wilhelm, O. G.

(1998). Thrombomodulin, a receptor for the serine protease thrombin, is decreased in

primary tumors and metastases but increased in ascitic fluids of patients with advanced

ovarian cancer FIGO IIIc. Int J Oncol 13, 645-651.

Williams, B. A., Gwirtz, R. M., and Wold, B. J. (2004). Genomic DNA as a

cohybridization standard for mammalian microarray measurements. Nucleic Acids Res

32, e81.

Wong, K. K., Cheng, R. S., and Mok, S. C. (2001). Identification of differentially

expressed genes from ovarian cancer cells by MICROMAX cDNA microarray system.

Biotechniques 30, 670-675.

Woo, Y., Affourtit, J., Daigle, S., Viale, A., Johnson, K., Naggert, J., and Churchill, G.

(2004). A Comparison of cDNA, Oligonucleotide, and Affymetrix GeneChip Gene

Expression Microarray Platforms. J Biomol Tech 15, 276-284.

World Health Organization (1999). Histological Typing of Ovarian Tumours, 2nd edn (Berlin and Heidelberg: Springer-Verlag).

Wright, G. W., and Simon, R. M. (2003). A random variance model for detection of

differential gene expression in small microarray experiments. Bioinformatics 19, 2448-

2455.

Xu, Y., Shen, Z., Wiper, D. W., Wu, M., Morton, R. E., Elson, P., Kennedy, A. W.,

Belinson, J., Markman, M., and Casey, G. (1998). Lysophosphatidic acid as a potential

biomarker for ovarian and other gynecologic cancers. Jama 280, 719-723.

Xu, Y., and Yu, Q. (2003). E-cadherin negatively regulates CD44-hyaluronan interaction

and CD44-mediated tumor invasion and branching morphogenesis. J Biol Chem 278,

8661-8668.

Yang, I. V., Chen, E., Hasseman, J. P., Liang, W., Frank, B. C., Wang, S., Sharov, V.,

Saeed, A. I., White, J., Li, J., et al. (2002a). Within the fold: assessing differential


expression measures and reproducibility in microarray assays. Genome Biol 3,

research0062.

Yang, M. C., Ruan, Q. G., Yang, J. J., Eckenrode, S., Wu, S., McIndoe, R. A., and She, J.

X. (2001). A statistical method for flagging weak spots improves normalization and ratio

estimates in microarrays. Physiol Genomics 7, 45-53.

Yang, Y. H., Dudoit, S., Luu, P., Lin, D. M., Peng, V., Ngai, J., and Speed, T. P. (2002b).

Normalization for cDNA microarray data: a robust composite method addressing single

and multiple slide systematic variation. Nucleic Acids Res 30, e15.

Yang, Y. H., and Speed, T. P. (2002). An Introduction to Microarray Bioinformatics: Part

3 - Normalisation. In DNA Microarrays - A Molecular Cloning Manual, D.D. Bowtell,

and J. Sambrook, eds. (New York: Cold Spring Harbor Laboratory Press), p. 712.

Yeoh, E. J., Ross, M. E., Shurtleff, S. A., Williams, W. K., Patel, D., Mahfouz, R., Behm,

F. G., Raimondi, S. C., Relling, M. V., Patel, A., et al. (2002). Classification, subtype

discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene

expression profiling. Cancer Cell 1, 133-143.

Ying, S. Y., and Becker, A. (1995). Inhibin and Activin Modulate Transforming Growth

Factor-beta-Induced Immunosuppression. J Biomed Sci 2, 237-241.

Yordy, J. S., Li, R., Sementchenko, V. I., Pei, H., Muise-Helmericks, R. C., and Watson,

D. K. (2004). SP100 expression modulates ETS1 transcriptional activity and inhibits cell

invasion. Oncogene 23, 6654-6665.

Yuan, Z. Q., Sun, M., Feldman, R. I., Wang, G., Ma, X., Jiang, C., Coppola, D., Nicosia,

S. V., and Cheng, J. Q. (2000). Frequent activation of AKT2 and induction of apoptosis

by inhibition of phosphoinositide-3-OH kinase/Akt pathway in human ovarian cancer.

Oncogene 19, 2324-2330.

Yue, H., Eastman, P. S., Wang, B. B., Minor, J., Doctolero, M. H., Nuttall, R. L., Stack,

R., Becker, J. W., Montgomery, J. R., Vainer, M., and Johnston, R. (2001). An evaluation

of the performance of cDNA microarrays for detecting changes in global mRNA

expression. Nucleic Acids Res 29, E41-41.


Yuen, P. K., Li, G., Bao, Y., and Muller, U. R. (2003). Microfluidic devices for fluidic

circulation and mixing improve hybridization signal intensity on DNA arrays. Lab Chip

3, 46-50.

Yuen, T., Wurmbach, E., Pfeffer, R. L., Ebersole, B. J., and Sealfon, S. C. (2002).

Accuracy and calibration of commercial oligonucleotide and custom cDNA microarrays.

Nucleic Acids Res 30, e48.

Zand, L., Qiang, F., Roskelley, C. D., Leung, P. C., and Auersperg, N. (2003).

Differential effects of cellular fibronectin and plasma fibronectin on ovarian cancer cell

adhesion, migration, and invasion. In Vitro Cell Dev Biol Anim 39, 178-182.

Zanetta, G., Rota, S., Chiari, S., Bonazzi, C., Bratina, G., and Mangioni, C. (2001).

Behavior of borderline tumors with particular interest to persistence, recurrence, and

progression to invasive carcinoma: a prospective study. J Clin Oncol 19, 2658-2664.

Zeeberg, B. R., Feng, W., Wang, G., Wang, M. D., Fojo, A. T., Sunshine, M.,

Narasimhan, S., Kane, D. W., Reinhold, W. C., Lababidi, S., et al. (2003). GoMiner: a

resource for biological interpretation of genomic and proteomic data. Genome Biol 4,

R28.

Zembutsu, H., Ohnishi, Y., Tsunoda, T., Furukawa, Y., Katagiri, T., Ueyama, Y.,

Tamaoki, N., Nomura, T., Kitahara, O., Yanagawa, R., et al. (2002). Genome-wide

cDNA microarray screening to correlate gene expression profiles with sensitivity of 85

human cancer xenografts to anticancer drugs. Cancer Res 62, 518-527.

Zhang, J. S., Gong, A., Cheville, J. C., Smith, D. I., and Young, C. Y. (2005). AGR2, an

androgen-inducible secretory protein overexpressed in prostate cancer. Genes

Chromosomes Cancer 43, 249-259.

Zhang, L., Conejo-Garcia, J. R., Katsaros, D., Gimotty, P. A., Massobrio, M., Regnani,

G., Makrigiannakis, A., Gray, H., Schlienger, K., Liebman, M. N., et al. (2003).

Intratumoral T cells, recurrence, and survival in epithelial ovarian cancer. N Engl J Med

348, 203-213.

Zhang, X., Horwitz, G. A., Prezant, T. R., Valentini, A., Nakashima, M., Bronstein, M.

D., and Melmed, S. (1999). Structure, expression, and function of human pituitary tumor-

transforming gene (PTTG). Mol Endocrinol 13, 156-166.


Zhou, H., Kuang, J., Zhong, L., Kuo, W. L., Gray, J. W., Sahin, A., Brinkley, B. R., and

Sen, S. (1998). Tumour amplified kinase STK15/BTAK induces centrosome

amplification, aneuploidy and transformation. Nat Genet 20, 189-193.

Zorn, K. K., Jazaeri, A. A., Awtrey, C. S., Gardner, G. J., Mok, S. C., Boyd, J., and

Birrer, M. J. (2003). Choice of normal ovarian control influences determination of

differentially expressed genes in ovarian cancer expression profiling studies. Clin Cancer

Res 9, 4811-4818.


Appendix A: FIGO staging of EOC

Stage I: The cancer is still contained within the ovary (or ovaries).

Stage IA: Cancer has developed in one ovary, and the tumor is confined to the inside of the ovary. There is no cancer on the outer surface of the ovary. Laboratory examination of washings from the abdomen and pelvis did not find any cancer cells.

Stage IB: Cancer has developed within both ovaries without any tumor on their outer surfaces. Laboratory examination of washings from the abdomen and pelvis did not find any cancer cells.

Stage IC: The cancer is present in one or both ovaries and one or more of the following are present:

• Cancer on the outer surface of at least one of the ovaries
• In the case of cystic tumors (fluid-filled tumors), the capsule (outer wall of the tumor) has ruptured (burst)
• Laboratory examination found cancer cells in fluid or washings from the abdomen.

Stage II: The cancer is in one or both ovaries and has involved other organs (such as the uterus, fallopian tubes, bladder, the sigmoid colon, or the rectum) within the pelvis.

Stage IIA: The cancer has spread to or has actually invaded the uterus or the fallopian tubes, or both. Laboratory examination of washings from the abdomen did not find any cancer cells.

Stage IIB: The cancer has spread to other nearby pelvic organs such as the bladder, the sigmoid colon, or the rectum. Laboratory examination of fluid from the abdomen did not find any cancer cells.

Stage IIC: The cancer has spread to pelvic organs as in stages IIA or IIB and laboratory examination of the washings from the abdomen found evidence of cancer cells.

Stage III: The cancer involves one or both ovaries, and one or both of the following are present: (1) cancer has spread beyond the pelvis to the lining of the abdomen; (2) cancer has spread to lymph nodes.

Stage IIIA: During the staging operation, the surgeon can see cancer involving the ovary or ovaries, but no cancer is grossly visible (can be seen without using a microscope) in the abdomen and the cancer has not spread to lymph nodes. However, when biopsies are checked under a microscope, tiny deposits of cancer are found in the lining of the upper abdomen.

Stage IIIB: There is cancer in one or both ovaries, and deposits of cancer large enough for the surgeon to see, but smaller than 2 cm (about 3/4 inch) across, are present in the abdomen. Cancer has not spread to the lymph nodes.


Stage IIIC: The cancer is in one or both ovaries, and one or both of the following are present:

• Cancer has spread to lymph nodes.
• Deposits of cancer larger than 2 cm (about 3/4 inch) across are seen in the abdomen.

Stage IV: This is the most advanced stage of ovarian cancer. The cancer is in one or both ovaries. Distant metastasis (spread of the cancer to the inside of the liver, the lungs, or other organs located outside of the peritoneal cavity) has occurred. Finding ovarian cancer cells in pleural fluid (from the cavity that surrounds the lungs) is also evidence of stage IV disease.

Recurrent ovarian cancer: This means that the disease has come back (recurred) after completion of treatment.

Source: American Cancer Society: http://www.cancer.org/


Appendix B: Specimens of EOC included in the TMAs

Details of AOCS cases used for TMA #AOCS-01

Patient ID  Tumour content (%)  Histological subtype  Phenotype
20022       >50                 Serous                Invasive
20032       >50                 Serous                Invasive
22012       >95                 Serous                Invasive
22013       >90                 Serous                Invasive
22020       >95                 Serous                Invasive
22023       >90                 Serous                Invasive
22027       >50                 Serous                LMP
22037       >95                 Serous                Invasive
23036       <80                 Serous                LMP
23037       >50                 Serous                LMP
23052       >50                 Serous                Invasive
23053       >99                 Serous                Invasive
23055       >50                 Serous                Invasive
23062       >50                 Serous                LMP
23066       >50                 Serous                LMP
23070       >50                 Serous                Invasive
23076       >50                 Serous                LMP
23077       >50                 Serous                Invasive
23098       >50                 Serous                Invasive
32009       <70                 Serous                LMP
32022       >95                 Serous                Invasive
32028       >80                 Serous                Invasive
32032       >95                 Serous                Invasive
32037       >50                 Serous                LMP
32058       >70                 Serous                LMP
32066       >75                 Serous                LMP
34002       >80                 Serous                Invasive
34049       >50                 Serous                Invasive
34058       >50                 Serous                Invasive
34066       >50                 Serous                Invasive
34079       >50                 Serous                LMP
34087       >50                 Serous                LMP
34101       >50                 Serous                LMP
41125       >40                 Serous                LMP
44262       >75                 Serous                LMP
51024       <10                 Serous                LMP
51041       >70                 Serous                Invasive
51068       >75                 Serous                LMP
60111       >50                 Serous                Invasive
70039       >99                 Serous                Invasive
P00633      >50                 Serous                LMP
P00756      >50                 Serous                Invasive


Details of St. Vincent’s Hospital EOC cases used for TMA #AOCS-02

ID          Tumour content (%)  Histological subtype  Phenotype
43          >50                 Serous                Invasive
698         >50                 Serous                Invasive
1159        >50                 Serous                Invasive
1680        >50                 Serous                LMP
2311        >50                 Serous                Invasive
4053        >50                 Serous                LMP
4444        >50                 Serous                Invasive
4974        >50                 Serous                Invasive
6350        >50                 Serous                Invasive
6350        >50                 Serous                LMP
7362        >50                 Serous                LMP
7893        >50                 Serous                Invasive
8938        >50                 Serous                Invasive
8965        >50                 Serous                Invasive
9188        >50                 Serous                Invasive
10082       >50                 Serous                Invasive
10788       >50                 Serous                Invasive
10922       >50                 Serous                LMP
11231       >50                 Serous                Invasive
11254       >50                 Serous                Invasive
14119       >50                 Serous                Invasive
14358       >50                 Serous                LMP
15255       >50                 Serous                Invasive
15401       >50                 Serous                Invasive
16457       >50                 Serous                Invasive
17587       >50                 Serous                LMP
18012       >50                 Serous                Invasive
18264       >50                 Serous                Invasive
18673       >50                 Serous                Invasive
20087       >50                 Serous                Invasive
20356       >50                 Serous                LMP
22228       >50                 Serous                LMP
22306       >50                 Serous                LMP
23344       >50                 Serous                Invasive
24084       >50                 Serous                LMP
24711       >50                 Serous                LMP
24778       >50                 Serous                Invasive
25060       >50                 Serous                LMP
26109       >50                 Serous                Invasive
26361       >50                 Serous                Invasive
1180_1a     >50                 Serous                LMP
1180_2a     >50                 Serous                Invasive


TMA #AOCS-01 Layout (columns A to E)

Row 1: Blank; 22027 (Serous LMP); 23053 (Serous Inv); 32066 (Serous LMP); 34049 (Serous Inv)
Row 2: 20022 (Serous Inv); 23036 (Serous LMP); 23055 (Serous Inv); 34079 (Serous LMP); 34058 (Serous Inv)
Row 3: 23037 (Serous LMP); 23070 (Serous Inv); 34087 (Serous LMP); 34066 (Serous Inv)
Row 4: 22012 (Serous Inv); 23062 (Serous LMP); 23077 (Serous Inv); 34101 (Serous LMP); 51041 (Serous Inv)
Row 5: 22013 (Serous Inv); 23066 (Serous LMP); 23098 (Serous Inv); 41125 (Serous LMP); 32057 (Serous LMP)
Row 6: 22020 (Serous Inv); 23076 (Serous LMP); 32022 (Serous Inv); 44262 (Serous LMP); 70039 (Serous Inv)
Row 7: 22023 (Serous Inv); 32009 (Serous LMP); 32028 (Serous Inv); 51024 (Serous LMP); P00756 (Serous Inv)
Row 8: 22037 (Serous Inv); 32037 (Serous LMP); 32032 (Serous Inv); 51068 (Serous LMP)
Row 9: 23052 (Serous Inv); 32058 (Serous LMP); 34002 (Serous Inv); P00633 (Serous LMP)


TMA #AOCS-02 Layout (columns A to F)

Row 1: Blank; 10922 (Serous LMP); 10082 (Serous Inv); 24084 (Serous LMP); 24778 (Serous Inv); 6115 4d
Row 2: 8938 (Serous Inv); 14358 (Serous LMP); 23344 (Serous Inv); 24711 (Serous LMP); 43 (Serous Inv); 6012 1b
Row 3: 14119 (Serous Inv); 17587 (Serous LMP); 26361 (Serous Inv); 1680 (Serous LMP); 698 (Serous Inv); 4890 1a
Row 4: 15401 (Serous Inv); 20356 (Serous LMP); 1180_2a (Serous Inv); 4053 (Serous LMP); 1159 (Serous Inv); 16457 1d
Row 5: 18012 (Serous Inv); 22306 (Serous LMP); 6350 (Serous Inv); 7362 (Serous LMP); 2311 (Serous Inv)
Row 6: 18673 (Serous Inv); 25060 (Serous LMP); 9188 (Serous Inv); 13516 (Serous LMP); 4974 (Serous Inv)
Row 7: 26109 (Serous Inv); 1180_1a (Serous LMP); 10788 (Serous Inv); 8965 (Serous Inv)
Row 8: 4444 (Serous Inv); 6350 (Serous LMP); 11231 (Serous Inv); 16457 (Serous Inv); 11254 (Serous Inv)
Row 9: 7893 (Serous Inv); 22228 (Serous LMP); 20087 (Serous Inv); 18264 (Serous Inv); 15255 (Serous Inv)


Appendix C: Details of pooled tumour and cell line reference RNAs

Cell lines used in the creation of the pooled cell line reference RNA

Name           Description
MCF7           Breast adenocarcinoma
Hs578T         Breast carcinosarcoma
NTERA-2 cl.D1  Testicular embryonal carcinoma
Colo-205       Colorectal adenocarcinoma
OVCAR-3        Ovarian adenocarcinoma
UACC-62        Melanoma
MOLT-4         Acute lymphoblastic leukaemia
RPMI-8226      Myeloma
NB4+ATRA       Acute promyelocytic leukaemia
SW872          Fibrosarcoma
HepG2          Hepatocellular carcinoma

Ovarian cancer specimens used to create 'pooled tumour' reference RNA.

Patient ID  Histological subtype
P00726      Adenocarcinoma
P00862      Adenocarcinoma
P00719      Benign
P00909      Endometrioid
P00784      Mucinous
P00807      Mucinous
P00819      Mucinous
P00934      Mucinous
P00935      Mucinous
RBH 94.021  Mucinous
P00703      Serous
P00756      Serous
P00772      Serous
P00773      Serous
P00806      Serous
P00933      Serous
RBH 85.005  Serous
RBH 89.023  Serous
RBH 88.014  Unknown
RBH 89.009  Unknown
RBH 89.018  Unknown
RBH 89.022  Unknown


Appendix D: MMT scores from the analysis of the effect of spatial bias on LOOCV prediction accuracy

Array ID  MMT score
ab001     135.95
ab003     65.89
ab009     80.84
ab013     27.89
ab016     1223.77
ab021     53.61
ab034     181.84
ab039     86.47
ab044     135.7
ab049     138.39
ab054     176.58
ab056     88.6
ab059     40.67
ab063     71.09
ab066     70.77
ab068     183.04
ab071     143.28
ab073     89
ab076     65.02
ab078     62.81
ab083     76.54
ab093     65.3
ab103     73.47
ab105     111.51
ab108     117.74
ab110     98.16
ab113     74.53
ab116     82.95
ab118     117.95
ab121     80.7
ab123     116.25
ab128     102.35
ab131     120.09
ab133     45.07
ab138     80.42
ab143     159.04
ab146     82.28
ab148     66.89
ab151     138.09
ab154     94.98
ab158     99.47
ab166     96.35
ab168     324.51
ab170     348.26
ab172     312.37
ab173     120.25
ab181a    66.33
ab182     72.44
ab183     283.72
ab184     86.81
ab188b    325.32
ab189b    119.16
ab191b    262.86
ab193b    142.14
ab195b    205.84
ab198     141.49
ab207     156
ab209     193.46
ab210     122.88
ab211     88.68
ab212     257.95
ab213     213.89
ab214     238.47
ab215     232.04
ab217     133.46


Appendix E: Members of the ‘response to stimulus’ Gene Ontology category

Of the genes found to be differentially expressed between histological subtypes of EOC (Chapter 3), a significantly larger number belonging to the ‘response to stimulus’ category was identified from the data generated using the 11-cell-line reference RNA.

Differentially expressed genes from the cell line reference dataset belonging to the ‘response to stimulus’ Gene Ontology category

UniGene ID  Symbol    Name
Hs.362807   IL7R      Interleukin 7 receptor
Hs.413924   CXCL10    Chemokine (C-X-C motif) ligand 10
Hs.519162   BTG2      BTG family, member 2
Hs.465221   FECH      Ferrochelatase (protoporphyria)
Hs.70327    CRIP1     Cysteine-rich protein 1 (intestinal)
Hs.437322   TNFAIP6   Tumor necrosis factor, alpha-induced protein 6
Hs.14155    ICOSL     Inducible T-cell co-stimulator ligand
Hs.497183   IVNS1ABP  Influenza virus NS1A binding protein
Hs.458262   IGLC2     Immunoglobulin lambda constant 2 (Kern-Oz- marker)
Hs.504641   CD163     CD163 antigen
Hs.439852   IGHD      Immunoglobulin heavy constant delta
Hs.513457   IL4R      Interleukin 4 receptor
Hs.77274    PLAU      Plasminogen activator, urokinase
Hs.494173   ANXA1     Annexin A1
Hs.352642   FCGR2A    Fc fragment of IgG, low affinity IIa, receptor for (CD32)
Hs.517307   MX1       Myxovirus (influenza virus) resistance 1, interferon-inducible protein p78 (mouse)
Hs.515369   TYROBP    TYRO protein tyrosine kinase binding protein
Hs.351279   HLA-DMA   Major histocompatibility complex, class II, DM alpha
Hs.347270   HLA-DPA1  Major histocompatibility complex, class II, DP alpha 1
Hs.17483    CD4       CD4 antigen (p55)
Hs.436061   IRF1      Interferon regulatory factor 1
Hs.524134   GATA3     GATA binding protein 3
Hs.458272   MPO       Myeloperoxidase
Hs.274402   HSPA1B    Heat shock 70kDa protein 1B
Hs.23262    RNASE6    Ribonuclease, RNase A family, k6
Hs.301921   CCR1      Chemokine (C-C motif) receptor 1
Hs.521903   LY6E      Lymphocyte antigen 6 complex, locus E
Hs.524517   CSF3R     Colony stimulating factor 3 receptor (granulocyte)
Hs.624      IL8       Interleukin 8
Hs.118110   BST2      Bone marrow stromal cell antigen 2
Hs.436911   AMBP      Alpha-1-microglobulin/bikunin precursor
Hs.146393   HERPUD1   Homocysteine-inducible, endoplasmic reticulum stress-inducible, ubiquitin-like domain member 1
Hs.413297   RGS16     Regulator of G-protein signalling 16
Hs.43728    GPX7      Glutathione peroxidase 7
Hs.458485   G1P2      Interferon, alpha-inducible protein (clone IFI-15K)
Hs.77810    NFATC4    Nuclear factor of activated T-cells, cytoplasmic, calcineurin-dependent 4


Hs.3268     HSPA6     Heat shock 70kDa protein 6 (HSP70B')
Hs.463059   STAT3     Signal transducer and activator of transcription 3 (acute-phase response factor)
Hs.85258    CD8A      CD8 antigen, alpha polypeptide (p32)
Hs.77424    FCGR1A    Fc fragment of IgG, high affinity Ia, receptor for (CD64)
Hs.76530    F2        Coagulation factor II (thrombin)
Hs.156727   ANKH      Ankylosis, progressive homolog (mouse)
Hs.237658   APOA2     Apolipoprotein A-II
Hs.82327    GSS       Glutathione synthetase
Hs.324746   AHSG      Alpha-2-HS-glycoprotein
Hs.196384   PTGS2     Prostaglandin-endoperoxide synthase 2 (prostaglandin G/H synthase and cyclooxygenase)
Hs.386793   GPX3      Glutathione peroxidase 3 (plasma)
Hs.90708    GZMA      Granzyme A (granzyme 1, cytotoxic T-lymphocyte-associated serine esterase 3)
Hs.122006   FGF7      Galactokinase 2
Hs.478368   KCNMB2    Potassium large conductance calcium-activated channel, subfamily M, beta member 2
Hs.281898   AIM2      Absent in melanoma 2
Hs.380334   ZNF148    Zinc finger protein 148 (pHZ-52)
Hs.279611   DMBT1     Deleted in malignant brain tumors 1
Hs.484703   CD83      CD83 antigen (activated B lymphocytes, immunoglobulin superfamily)
Hs.433300   FCER1G    Fc fragment of IgE, high affinity I, receptor for; gamma polypeptide
Hs.422181   S100B     S100 calcium binding protein, beta (neural)
Hs.311958   IL15      Interleukin 15
Hs.484741   GMPR      Guanosine monophosphate reductase
Hs.519580   TCF7      Transcription factor 7 (T-cell specific, HMG-box)
Hs.371720   SYK       Spleen tyrosine kinase
Hs.81328    NFKBIA    Nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor, alpha
Hs.533549   EIF2B3    Eukaryotic translation initiation factor 2B, subunit 3 gamma, 58kDa
Hs.20315    IFIT1     Interferon-induced protein with tetratricopeptide repeats 1
Hs.271387   CCL8      Chemokine (C-C motif) ligand 8
Hs.525157   TNFSF13B  Tumor necrosis factor (ligand) superfamily, member 13b
Hs.478275   TNFSF10   Tumor necrosis factor (ligand) superfamily, member 10
Hs.408767   CRYAB     Crystallin, alpha B
Hs.431550   MAP4K4    Mitogen-activated protein kinase kinase kinase kinase 4
Hs.512633   PRG2      Proteoglycan 2, bone marrow (natural killer cell activator, eosinophil granule major basic protein)
Hs.384598   SERPING1  Serine (or cysteine) proteinase inhibitor, clade G (C1 inhibitor), member 1, (angioedema, hereditary)
Hs.1706     ISGF3G    Interferon-stimulated transcription factor 3, gamma 48kDa
Hs.424932   PDCD8     Programmed cell death 8 (apoptosis-inducing factor)
Hs.316931   SCAP1     Src family associated phosphoprotein 1
Hs.512683   CCL3L1    Chemokine (C-C motif) ligand 3-like 1
Hs.512211   IFNAR2    Interferon (alpha, beta and omega) receptor 2
Hs.485130   HLA-DPB1  Major histocompatibility complex, class II, DP beta 1
Hs.208854   CD69      CD69 antigen (p60, early T-cell activation antigen)
Hs.239818   PIK3CB    Phosphoinositide-3-kinase, catalytic, beta polypeptide
Hs.154078   LBP       Lipopolysaccharide binding protein
Hs.375957   ITGB2     Integrin, beta 2 (antigen CD18 (p95), lymphocyte function-associated antigen 1; macrophage antigen 1 (mac-1) beta subunit)
Hs.430733   CLNS1A    Chloride channel, nucleotide-sensitive, 1A


Hs.14623 | IFI30 | Interferon, gamma-inducible protein 30
Hs.124503 | TCF8 | Transcription factor 8 (represses interleukin 2 expression)
Hs.504048 | CD3D | CD3D antigen, delta polypeptide (TiT3 complex)
Hs.520048 | HLA-DRA | Major histocompatibility complex, class II, DR alpha
Hs.128856 | SCARA3 | Scavenger receptor class A, member 3
Hs.188021 | KCNH2 | Potassium voltage-gated channel, subfamily H (eag-related), member 2
Hs.446529 | GUCA1B | Guanylate cyclase activator 1B (retina)
Hs.77367 | CXCL9 | Chemokine (C-X-C motif) ligand 9
Hs.160673 | RHOH | Ras homolog gene family, member H
Hs.483829 | CSF1R | Colony stimulating factor 1 receptor, formerly McDonough feline sarcoma viral (v-fms) oncogene homolog
Hs.519866 | LY86 | Lymphocyte antigen 86
Hs.449629 | OBP2B | Odorant binding protein 2B
Hs.344812 | TREX1 | Three prime repair exonuclease 1
Hs.416073 | S100A8 | S100 calcium binding protein A8 (calgranulin A)
Hs.521869 | KAL1 | Kallmann syndrome 1 sequence
Hs.514284 | NFE2L1 | Nuclear factor (erythroid-derived 2)-like 1
Hs.840 | INDO | Indoleamine-pyrrole 2,3 dioxygenase
Hs.408903 | C2 | Complement component 2
Hs.474787 | IL2RB | Interleukin 2 receptor, beta
Hs.375600 | IGHG1 | Immunoglobulin heavy constant gamma 1 (G1m marker)
Hs.516249 | IL1R1 | Interleukin 1 receptor, type I
Hs.93304 | PLA2G7 | Phospholipase A2, group VII (platelet-activating factor acetylhydrolase, plasma)
Hs.407135 | ADA | Adenosine deaminase
Hs.8986 | C1QB | Complement component 1, q subcomponent, beta polypeptide
Hs.224616 | EDEM1 | ER degradation enhancer, mannosidase alpha-like 1
Hs.29499 | TLR3 | Toll-like receptor 3
Hs.1116 | LTBR | Lymphotoxin beta receptor (TNFR superfamily, member 3)
Hs.286073 | PPM1D | Protein phosphatase 1D magnesium-dependent, delta isoform
Hs.409934 | HLA-DQB1 | Major histocompatibility complex, class II, DQ beta 1
Hs.437058 | STAT5A | Signal transducer and activator of transcription 5A
Hs.154654 | CYP1B1 | Cytochrome P450, family 1, subfamily B, polypeptide 1
Hs.75348 | PSME1 | Proteasome (prosome, macropain) activator subunit 1 (PA28 alpha)
Hs.348935 | IGLL1 | Immunoglobulin lambda-like polypeptide 1
Hs.518808 | AFP | Alpha-fetoprotein
Hs.325978 | IL18BP | Interleukin 18 binding protein
Hs.75256 | RGS1 | Regulator of G-protein signalling 1
Hs.487294 | ERCC2 | Excision repair cross-complementing rodent repair deficiency, complementation group 2 (xeroderma pigmentosum D)
Hs.110571 | GADD45B | Growth arrest and DNA-damage-inducible, beta
Hs.112405 | S100A9 | S100 calcium binding protein A9 (calgranulin B)
Hs.753 | FPR1 | Formyl peptide receptor 1
Hs.105806 | GNLY | Granulysin
Hs.463978 | MAP2K6 | Mitogen-activated protein kinase kinase 6
Hs.174195 | IFITM2 | Interferon induced transmembrane protein 2 (1-8D)
Hs.509554 | HIF1A | Hypoxia-inducible factor 1, alpha subunit (basic helix-loop-helix transcription factor)
Hs.517240 | IFNGR2 | Interferon gamma receptor 2 (interferon gamma transducer 1)
Hs.444451 | ZAK | Sterile alpha motif and leucine zipper containing kinase AZK
Hs.512898 | TNFRSF14 | Tumor necrosis factor receptor superfamily, member 14 (herpesvirus entry mediator)
Hs.479220 | PROM1 | Prominin 1
Hs.156519 | MSH2 | MutS homolog 2, colon cancer, nonpolyposis type 1 (E. coli)


Hs.59554 | SESN1 | Sestrin 1
Hs.524214 | MLF2 | Myeloid leukemia factor 2
Hs.117825 | PARP4 | Poly (ADP-ribose) polymerase family, member 4
Hs.118631 | TIMELESS | Timeless homolog (Drosophila)
Hs.192374 | TRA1 | Tumor rejection antigen (gp96) 1
Hs.523332 | OAT | Ornithine aminotransferase (gyrate atrophy)
Hs.191734 | MGST3 | Microsomal glutathione S-transferase 3

Appendix F: Reference RNA comparison: Predictions of histological subtype

Array id | Class label | Pooled cell line reference RNA dataset: Number of genes in classifier | Diagonal Linear Discriminant Analysis correct? | 1-Nearest Neighbour correct? | 3-Nearest Neighbours correct? | Nearest Centroid correct? | Pooled tumour reference RNA dataset: Number of genes in classifier | Diagonal Linear Discriminant Analysis correct? | 1-Nearest Neighbour correct? | 3-Nearest Neighbours correct? | Nearest Centroid correct?

P00109 | Endometrioid | 701 | NO | NO | NO | NO | 445 | NO | NO | NO | NO
P00121 | Endometrioid | 684 | YES | NO | NO | YES | 418 | YES | NO | NO | YES
P00127 | Endometrioid | 701 | NO | NO | NO | NO | 422 | YES | YES | YES | YES
P00136 | Endometrioid | 680 | YES | YES | YES | YES | 418 | YES | YES | YES | YES
P00138 | Endometrioid | 665 | YES | YES | YES | YES | 453 | NO | NO | NO | NO
P00145 | Endometrioid | 688 | YES | YES | YES | YES | 428 | YES | YES | NO | YES
P00201 | Endometrioid | 666 | YES | YES | YES | YES | 447 | NO | NO | NO | NO
P00212 | Endometrioid | 678 | YES | NO | NO | YES | 413 | YES | YES | NO | YES
P00217 | Endometrioid | 663 | YES | YES | YES | YES | 430 | YES | NO | NO | YES
P00909 | Endometrioid | 696 | YES | NO | NO | YES | 420 | YES | YES | YES | YES
RBH94020 | Endometrioid | 656 | YES | YES | YES | YES | 432 | YES | NO | NO | YES
RBH94051 | Endometrioid | 662 | YES | NO | NO | YES | 454 | NO | NO | NO | NO
RBH94123 | Endometrioid | 656 | YES | YES | YES | YES | 396 | YES | YES | NO | YES
P00092 | Mucinous | 669 | YES | YES | YES | YES | 393 | YES | YES | YES | YES
P00131 | Mucinous | 665 | YES | YES | YES | YES | 483 | NO | YES | NO | NO
P00154 | Mucinous | 676 | NO | NO | NO | NO | 450 | NO | NO | NO | NO
P00229 | Mucinous | 751 | NO | NO | NO | NO | 466 | NO | NO | NO | NO
P00245 | Mucinous | 610 | YES | YES | YES | YES | 391 | YES | YES | YES | YES
P00273 | Mucinous | 617 | YES | YES | YES | YES | 376 | YES | YES | YES | YES
P00282 | Mucinous | 630 | YES | YES | YES | YES | 385 | YES | YES | YES | YES


P00293 | Mucinous | 604 | YES | YES | YES | YES | 395 | YES | YES | YES | YES
P00316 | Mucinous | 606 | YES | YES | YES | YES | 395 | YES | YES | YES | YES
P00323 | Mucinous | 637 | YES | YES | YES | YES | 532 | NO | YES | NO | NO
P00358 | Mucinous | 658 | YES | YES | YES | YES | 498 | NO | NO | NO | NO
P00784 | Mucinous | 677 | NO | YES | YES | NO | 419 | NO | YES | YES | YES
P00807 | Mucinous | 631 | YES | YES | YES | YES | 370 | YES | YES | YES | YES
P00819 | Mucinous | 637 | YES | YES | YES | YES | 371 | YES | YES | YES | YES
P00934 | Mucinous | 649 | YES | YES | YES | YES | 408 | YES | YES | YES | YES
P00935 | Mucinous | 706 | NO | YES | NO | NO | 395 | NO | YES | YES | NO
RBH89009 | Mucinous | 598 | YES | YES | YES | YES | 401 | YES | YES | YES | YES
P00208 | Serous | 643 | YES | YES | YES | YES | 429 | NO | NO | NO | NO
P00446 | Serous | 656 | YES | YES | YES | YES | 443 | YES | YES | YES | YES
P00531 | Serous | 644 | YES | YES | YES | YES | 423 | YES | YES | YES | YES
P00667 | Serous | 495 | YES | YES | YES | YES | 430 | NO | YES | YES | NO
P00703 | Serous | 522 | YES | YES | YES | YES | 442 | NO | YES | YES | NO
P00706 | Serous | 568 | YES | YES | YES | YES | 428 | YES | YES | YES | YES
P00756 | Serous | 657 | YES | YES | YES | YES | 421 | YES | YES | YES | YES
P00772 | Serous | 566 | YES | YES | YES | YES | 421 | YES | YES | YES | YES
P00933 | Serous | 605 | YES | YES | NO | YES | 417 | YES | YES | YES | YES
RBH85005 | Serous | 683 | YES | YES | YES | YES | 406 | YES | YES | YES | YES
RBH88014 | Serous | 677 | YES | YES | YES | YES | 426 | YES | YES | YES | YES
RBH89018 | Serous | 688 | YES | YES | YES | YES | 397 | YES | YES | YES | YES
RBH89023 | Serous | 694 | YES | YES | YES | NO | 440 | YES | YES | YES | YES


RBH91007 | Serous | 668 | YES | YES | YES | YES | 417 | YES | YES | YES | YES
RBH91052 | Serous | 685 | YES | YES | YES | YES | 392 | NO | YES | YES | NO
RBH92001 | Serous | 673 | NO | NO | YES | NO | 426 | YES | YES | YES | YES
RBH92002 | Serous | 655 | YES | NO | YES | YES | 411 | YES | YES | YES | YES
RBH92003 | Serous | 710 | NO | NO | YES | NO | 427 | YES | YES | YES | YES
RBH92004 | Serous | 660 | YES | YES | YES | YES | 424 | YES | YES | YES | YES
RBH92007 | Serous | 692 | YES | YES | YES | YES | 438 | YES | YES | YES | YES
RBH92015 | Serous | 694 | YES | YES | YES | YES | 428 | YES | YES | YES | YES
RBH92023 | Serous | 623 | NO | NO | NO | NO | 430 | YES | YES | YES | YES
RBH92071 | Serous | 668 | YES | NO | YES | YES | 427 | YES | YES | YES | YES
RBH92071 | Serous | 664 | YES | YES | YES | YES | 427 | YES | YES | YES | YES
RBH93003 | Serous | 689 | NO | NO | YES | YES | 446 | NO | YES | YES | NO
RBH93013 | Serous | 625 | YES | NO | YES | YES | 429 | YES | YES | YES | YES
RBH93021 | Serous | 570 | YES | YES | YES | YES | 431 | YES | YES | YES | YES
RBH93083 | Serous | 662 | YES | YES | YES | YES | 439 | YES | YES | YES | YES
RBH93087 | Serous | 674 | YES | YES | YES | YES | 428 | YES | YES | YES | YES
RBH93108 | Serous | 654 | YES | YES | YES | YES | 431 | YES | YES | YES | YES
RBH93120 | Serous | 666 | YES | YES | YES | YES | 408 | YES | YES | YES | YES
RBH93130 | Serous | 712 | NO | NO | NO | NO | 393 | YES | YES | YES | YES
RBH93131 | Serous | 657 | YES | YES | YES | YES | 424 | YES | YES | YES | YES
RBH94056 | Serous | 651 | YES | YES | YES | YES | 433 | YES | YES | YES | YES
RBH94084 | Serous | 648 | YES | YES | YES | YES | 436 | NO | YES | YES | NO
RBH94093 | Serous | 661 | YES | NO | YES | YES | 426 | YES | YES | YES | YES


WM455A | Serous | 678 | YES | YES | YES | YES | 415 | YES | YES | YES | YES
WM494A | Serous | 650 | YES | YES | YES | YES | 425 | YES | YES | YES | YES

Summary of predictions of histological subtype using either pooled cell line or tumour reference RNA

Reference RNA | Diagonal Linear Discriminant Analysis | 1-Nearest Neighbour | 3-Nearest Neighbours | Nearest Centroid
Cell line reference | 84% | 75% | 82% | 84%
Tumour reference | 75% | 84% | 76% | 76%
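Each percentage in the summary is the share of the 68 arrays listed above that a method classified correctly. For example, 57 of 68 correct calls gives 83.8%, which rounds to 84%. The short Python sketch below recomputes the summary from the per-array calls as a consistency check. The calls are transcribed by hand from the table and are not taken from the original analysis output. The compact Y/N encoding and the names `CELL_LINE_REF`, `TUMOUR_REF` and `percent_correct` are illustrative only.

```python
# Hand transcription of the per-array calls in the table above (illustrative encoding).
# One 4-letter code per array, in table order, covering
# (DLDA, 1-NN, 3-NN, nearest centroid): "Y" = correct, "N" = incorrect.
CELL_LINE_REF = """
NNNN YNNY NNNN YYYY YYYY YYYY YYYY YNNY YYYY YNNY
YYYY YNNY YYYY YYYY YYYY NNNN NNNN YYYY YYYY YYYY
YYYY YYYY YYYY YYYY NYYN YYYY YYYY YYYY NYNN YYYY
YYYY YYYY YYYY YYYY YYYY YYYY YYYY YYYY YYNY YYYY
YYYY YYYY YYYN YYYY YYYY NNYN YNYY NNYN YYYY YYYY
YYYY NNNN YNYY YYYY NNYY YNYY YYYY YYYY YYYY YYYY
YYYY NNNN YYYY YYYY YYYY YNYY YYYY YYYY
""".split()

TUMOUR_REF = """
NNNN YNNY YYYY YYYY NNNN YYNY NNNN YYNY YNNY YYYY
YNNY NNNN YYNY YYYY NYNN NNNN NNNN YYYY YYYY YYYY
YYYY YYYY NYNN NNNN NYYY YYYY YYYY YYYY NYYN YYYY
NNNN YYYY YYYY NYYN NYYN YYYY YYYY YYYY YYYY YYYY
YYYY YYYY YYYY YYYY NYYN YYYY YYYY YYYY YYYY YYYY
YYYY YYYY YYYY YYYY NYYN YYYY YYYY YYYY YYYY YYYY
YYYY YYYY YYYY YYYY NYYN YYYY YYYY YYYY
""".split()

METHODS = ("DLDA", "1-NN", "3-NN", "Nearest centroid")


def percent_correct(calls):
    """Percentage of arrays each method classified correctly, rounded to a whole percent."""
    n = len(calls)
    return {
        method: round(100 * sum(code[i] == "Y" for code in calls) / n)
        for i, method in enumerate(METHODS)
    }


if __name__ == "__main__":
    for label, calls in (("Cell line reference", CELL_LINE_REF),
                         ("Tumour reference", TUMOUR_REF)):
        print(label, len(calls), percent_correct(calls))
```

Rounding to the nearest whole percent reproduces every figure in the summary row for row. For example, 56 of 68 gives 82% and 52 of 68 gives 76%.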


Appendix G: Genes with minimum two-fold mean expression differences between survival groups

Genes selected for RT-PCR validation shown in bold.

UniGene Symbol | UniGene Name | Mean difference in expression ratio between groups
KLK7 | Kallikrein 7 (chymotryptic, stratum corneum) | 10.778475
SLPI | Secretory leukocyte protease inhibitor (antileukoproteinase) | 8.275739
TSPAN-1 | Tetraspan 1 | 4.402794
HBB | Hemoglobin, beta | 3.523717
DEFB1 | Defensin, beta 1 | 3.217985
IL1R1 | Interleukin 1 receptor, type I | 2.903250
WNT7A | Wingless-type MMTV integration site family, member 7A | 2.727791
SCYE1 | Small inducible cytokine subfamily E, member 1 (endothelial monocyte-activating) | 2.726838
S100A2 | S100 calcium binding protein A2 | 2.625996
SLC6A8 | Solute carrier family 6 (neurotransmitter transporter, creatine), member 8 | 2.414614
TNFSF10 | Tumour necrosis factor (ligand) superfamily, member 10 | 2.413117
S100A1 | S100 calcium binding protein A1 | 2.243173
IGFBP5 | Insulin-like growth factor binding protein 5 | 2.198474
SHARP | SMART/HDAC1 associated repressor protein | 2.156932
SLC2A4 | Solute carrier family 2 (facilitated glucose transporter), member 4 | 1.923820
RUNX3 | Runt-related transcription factor 3 | 1.879850
CAV1 | Caveolin 1, caveolae protein, 22kDa | 1.816437
MYH8 | Myosin, heavy polypeptide 8, skeletal muscle, perinatal | 1.786125
TU3A | TU3A protein | 1.716421
FBP2 | Fructose-1,6-bisphosphatase 2 | 1.623844
TIMM44 | Translocase of inner mitochondrial membrane 44 homolog (yeast) | 1.493028
Ufm1 | Ubiquitin-fold modifier 1 | 1.481993
MALAT1 | Metastasis associated lung adenocarcinoma transcript 1 (non-coding RNA) | 1.404517
FAAH | Fatty acid amide hydrolase | 1.240690
GNA11 | Guanine nucleotide binding protein (G protein), alpha 11 (Gq class) | 1.220377
LOC150383 | Similar to RIKEN cDNA 2210021J22 | 1.199154
HOXB6 | Homeo box B6 | 1.184347
KLK2 | Kallikrein 2, prostatic | 1.148564
FNTB | Farnesyltransferase, CAAX box, beta | 1.129690
CSTF3 | Hypothetical protein LOC283267 | 1.118797
CR1 | Complement component (3b/4b) receptor 1, including Knops blood group system | 1.015026
OXTR | Oxytocin receptor | 1.006055
GNB5 | Guanine nucleotide binding protein (G protein), beta 5 | 0.993984
DDR1 | Discoidin domain receptor family, member 1 | 0.975661
ACP6 | Acid phosphatase 6, lysophosphatidic | 0.932364
NAPSA | Napsin A aspartic peptidase | 0.906903
NBS1 | Nijmegen breakage syndrome 1 (nibrin) | 0.889349




PVALB | Parvalbumin | 0.855187
ATP5S | ATP synthase, H+ transporting, mitochondrial F0 complex, subunit s (factor B) | 0.801345
IL27RA | Interleukin 27 receptor, alpha | 0.783403
GCLM | Glutamate-cysteine ligase, modifier subunit | 0.778351
FLJ11712 | Hypothetical protein FLJ11712 | 0.720180
CDNA: FLJ22256 fis, clone HRC02860 | CDNA: FLJ22256 fis, clone HRC02860 | 0.713549
OLFM4 | Olfactomedin 4 | 0.692021
CFTR | Cystic fibrosis transmembrane conductance regulator, ATP-binding cassette (sub-family C, member 7) | 0.544892
PLA2G5 | Phospholipase A2, group V | 0.526219
FOXH1 | Forkhead box H1 | 0.526186
SERPINA5 | Serine (or cysteine) proteinase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 5 | 0.482543
PMM2 | Phosphomannomutase 2 | 0.458712
JDP2 | Jun dimerization protein 2 | 0.431890
PPARG | Peroxisome proliferative activated receptor, gamma | 0.303424
IMPA1 | Inositol(myo)-1(or 4)-monophosphatase 1 | 0.280997
GPC3 | Glypican 3 | 0.191362
APOA2 | Apolipoprotein A-II | -0.025840
APOB | Apolipoprotein B (including Ag(x) antigen) | -0.034497
ORM2 | Orosomucoid 2 | -0.052697
AFP | Alpha-fetoprotein | -0.060447
APBB1IP | Amyloid beta (A4) precursor protein-binding, family B, member 1 interacting protein | -0.073620
AKR1C2 | Aldo-keto reductase family 1, member C2 (dihydrodiol dehydrogenase 2; bile acid binding protein; 3-alpha hydroxysteroid dehydrogenase, type III) | -0.090380
GATA3 | GATA binding protein 3 | -0.090536
IL13RA2 | Interleukin 13 receptor, alpha 2 | -0.093234
FGL1 | Fibrinogen-like 1 | -0.107689
S100B | S100 calcium binding protein, beta (neural) | -0.111323
AKR1C3 | Aldo-keto reductase family 1, member C3 (3-alpha hydroxysteroid dehydrogenase, type II) | -0.118542
ITIH2 | Inter-alpha (globulin) inhibitor H2 | -0.121214
SLC39A14 | Solute carrier family 39 (zinc transporter), member 14 | -0.211912
SCD | Stearoyl-CoA desaturase (delta-9-desaturase) | -0.247538
CDH18 | Cadherin 18, type 2 | -0.276031
SLC2A3 | Solute carrier family 2 (facilitated glucose transporter), member 3 | -0.312636
GFI1 | Growth factor independent 1 | -0.400517
AP1S2 | Adaptor-related protein complex 1, sigma 2 subunit | -0.423360
BAP1 | BRCA1 associated protein-1 (ubiquitin carboxy-terminal hydrolase) | -0.443343
POSTN | Periostin, osteoblast specific factor | -0.506935
NGFB | Nerve growth factor, beta polypeptide | -0.527451
NRG2 | Neuregulin 2 | -0.531888
FN1 | Fibronectin 1 | -0.579314
MATN3 | Matrilin 3 | -0.580105
GCSH | Glycine cleavage system protein H (aminomethyl carrier) | -0.586785




KIAA1078 | KIAA1078 protein | -0.594821
PHLDA2 | Pleckstrin homology-like domain, family A, member 2 | -0.674449
LHB | Luteinizing hormone beta polypeptide | -0.734889
TBC1D8 | TBC1 domain family, member 8 (with GRAM domain) | -0.778555
ABCG2 | ATP-binding cassette, sub-family G (WHITE), member 2 | -0.780847
UCHL1 | Ubiquitin carboxyl-terminal esterase L1 (ubiquitin thiolesterase) | -0.787579
SULF1 | Sulfatase 1 | -0.798672
GTPBP4 | GTP binding protein 4 | -0.835177
SEMA3C | Sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3C | -0.847566
ITGBL1 | Integrin, beta-like 1 (with EGF-like repeat domains) | -0.876397
COL8A1 | Collagen, type VIII, alpha 1 | -0.891811
EFHD1 | EF hand domain containing 1 | -0.899160
PLA2R1 | Phospholipase A2 receptor 1, 180kDa | -0.930531
CYP1B1 | Cytochrome P450, family 1, subfamily B, polypeptide 1 | -0.938289
EIF2B4 | Eukaryotic translation initiation factor 2B, subunit 4 delta, 67kDa | -0.974666
MEIS2 | Meis1, myeloid ecotropic viral integration site 1 homolog 2 (mouse) | -1.000953
CRABP2 | Cellular retinoic acid binding protein 2 | -1.014108
GJB2 | Gap junction protein, beta 2, 26kDa (connexin 26) | -1.030909
TNNT1 | Troponin T1, skeletal, slow | -1.096990
NRXN2 | Neurexin 2 | -1.116658
HIC1 | Hypermethylated in cancer 1 | -1.159792
PCP4 | Purkinje cell protein 4 | -1.208516
CSPG2 | Chondroitin sulfate proteoglycan 2 (versican) | -1.209208
COL10A1 | Collagen, type X, alpha 1 (Schmid metaphyseal chondrodysplasia) | -1.595166
ACADS | Acyl-Coenzyme A dehydrogenase, C-2 to C-3 short chain | -1.689010
CSF2RB | Colony stimulating factor 2 receptor, beta, low-affinity (granulocyte-macrophage) | -1.762307
PVRL3 | Poliovirus receptor-related 3 | -1.803357
ERBB2 | V-erb-b2 erythroblastic leukemia viral oncogene homolog 2, neuro/glioblastoma derived oncogene homolog (avian) | -1.961532
DSPG3 | Dermatan sulfate proteoglycan 3 | -2.175947
PNOC | Prepronociceptin | -2.382475
RBP1 | Retinol binding protein 1, cellular | -2.536895
COMP | Cartilage oligomeric matrix protein | -2.878504
HOXD4 | Homeo box D4 | -3.151666
NRGN | Neurogranin (protein kinase C substrate, RC3) | -4.199243


Appendix H: Higher-level gene ontologies represented by genes differentially expressed between survival groups

High Level Function | Significance | # Genes
Cancer | 4.36 x 10^-7 to 9.83 x 10^-3 | 18
Cell Death | 2.71 x 10^-6 to 9.83 x 10^-3 | 18
Reproductive System Disease | 3.95 x 10^-6 to 9.83 x 10^-3 | 12
Cellular Growth and Proliferation | 4.88 x 10^-6 to 9.83 x 10^-3 | 13
Skeletal and Muscular Disorders | 4.88 x 10^-6 to 9.83 x 10^-3 | 8
Tumor Morphology | 2.34 x 10^-5 to 9.83 x 10^-3 | 7
Gastrointestinal Disease | 2.36 x 10^-5 to 9.83 x 10^-3 | 7
Cellular Assembly and Organization | 2.36 x 10^-5 to 9.83 x 10^-3 | 6
Ophthalmic Disease | 2.36 x 10^-5 to 9.83 x 10^-3 | 3
Cell Morphology | 2.36 x 10^-5 to 9.83 x 10^-3 | 11
Cellular Movement | 3.21 x 10^-5 to 4.92 x 10^-3 | 10
Cell Cycle | 4.72 x 10^-5 to 9.83 x 10^-3 | 14
Connective Tissue Development and Function | 7.05 x 10^-5 to 9.83 x 10^-3 | 9
Tissue Development | 7.05 x 10^-5 to 9.83 x 10^-3 | 9
Gene Expression | 1.33 x 10^-4 to 4.92 x 10^-3 | 13
Renal and Urological Disease | 1.41 x 10^-4 to 4.92 x 10^-3 | 3
Developmental Disorder | 2.33 x 10^-4 to 9.83 x 10^-3 | 4
Organ Morphology | 2.33 x 10^-4 to 9.83 x 10^-3 | 6
Hematological Disease | 2.33 x 10^-4 to 9.83 x 10^-3 | 6
Organismal Injury and Abnormalities | 2.33 x 10^-4 to 9.83 x 10^-3 | 6
Cellular Development | 3.29 x 10^-4 to 9.83 x 10^-3 | 11
Skeletal and Muscular System Development and Function | 3.29 x 10^-4 to 9.83 x 10^-3 | 7
Respiratory Disease | 3.49 x 10^-4 to 9.83 x 10^-3 | 5
Organ Development | 3.66 x 10^-4 to 9.83 x 10^-3 | 5
Hepatic System Development and Function | 4.48 x 10^-4 to 9.83 x 10^-3 | 3
Reproductive System Development and Function | 4.87 x 10^-4 to 9.83 x 10^-3 | 6
Cellular Function and Maintenance | 4.87 x 10^-4 to 9.83 x 10^-3 | 7
Tissue Morphology | 4.87 x 10^-4 to 9.83 x 10^-3 | 11
Connective Tissue Disorders | 5.57 x 10^-4 to 4.92 x 10^-3 | 5
Hematological System Development and Function | 6.48 x 10^-4 to 9.83 x 10^-3 | 7
Immune & Lymphatic System Development & Function | 6.48 x 10^-4 to 5.59 x 10^-3 | 5
Organismal Functions | 7.58 x 10^-4 to 7.58 x 10^-4 | 3
Dermatological Diseases and Conditions | 7.58 x 10^-4 to 9.83 x 10^-3 | 6
Neurological Disease | 8.35 x 10^-4 to 9.83 x 10^-3 | 10
Immunological Disease | 1.17 x 10^-3 to 9.83 x 10^-3 | 6
DNA Replication, Recombination, and Repair | 1.18 x 10^-3 to 9.83 x 10^-3 | 5
Embryonic Development | 1.26 x 10^-3 to 9.83 x 10^-3 | 8
Post-Translational Modification | 1.78 x 10^-3 to 1.78 x 10^-3 | 2
Cell Signaling | 1.84 x 10^-3 to 3.71 x 10^-3 | 5
Vitamin and Mineral Metabolism | 1.84 x 10^-3 to 1.84 x 10^-3 | 4
Inflammatory Disease | 2.07 x 10^-3 to 4.92 x 10^-3 | 3
Cellular Compromise | 2.07 x 10^-3 to 9.83 x 10^-3 | 4
Organismal Development | 2.78 x 10^-3 to 9.83 x 10^-3 | 5
Small Molecule Biochemistry | 3.71 x 10^-3 to 3.71 x 10^-3 | 3
Organismal Survival | 3.82 x 10^-3 to 9.83 x 10^-3 | 10
Cardiovascular Disease | 3.82 x 10^-3 to 9.83 x 10^-3 | 3
Cardiovascular System Development and Function | 4.23 x 10^-3 to 9.83 x 10^-3 | 3


Cell-To-Cell Signaling and Interaction | 4.67 x 10^-3 to 9.83 x 10^-3 | 4
Energy Production | 4.67 x 10^-3 to 4.67 x 10^-3 | 2
Nucleic Acid Metabolism | 4.67 x 10^-3 to 9.83 x 10^-3 | 3
Hair and Skin Development and Function | 4.92 x 10^-3 to 9.83 x 10^-3 | 5
Carbohydrate Metabolism | 4.92 x 10^-3 to 4.92 x 10^-3 | 1
Endocrine System Development and Function | 4.92 x 10^-3 to 9.83 x 10^-3 | 4
Genetic Disorder | 4.92 x 10^-3 to 9.83 x 10^-3 | 3
Hepatic System Disease | 4.92 x 10^-3 to 9.83 x 10^-3 | 2
Metabolic Disease | 4.92 x 10^-3 to 4.92 x 10^-3 | 1
Nutritional Disease | 4.92 x 10^-3 to 4.92 x 10^-3 | 1
Visual System Development and Function | 4.92 x 10^-3 to 9.83 x 10^-3 | 4
Nervous System Development and Function | 4.92 x 10^-3 to 9.83 x 10^-3 | 5
Renal & Urological System Development and Function | 4.92 x 10^-3 to 9.83 x 10^-3 | 3
Free Radical Scavenging | 4.92 x 10^-3 to 4.92 x 10^-3 | 1
Immune Response | 4.92 x 10^-3 to 4.92 x 10^-3 | 1
Endocrine System Disorders | 4.92 x 10^-3 to 4.92 x 10^-3 | 1
Viral Function | 4.92 x 10^-3 to 9.83 x 10^-3 | 1
Cellular Response to Therapeutics | 4.92 x 10^-3 to 9.83 x 10^-3 | 1
Behaviour | 4.92 x 10^-3 to 9.83 x 10^-3 | 1
Protein Synthesis | 9.83 x 10^-3 to 9.83 x 10^-3 | 1


Appendix I: Samples used to generate predictive gene expression signature of primary EOC

Array ID Cancer type Subtype Pathology review comments % Tumour

UP012 Breast Ductal Breast (ductal) 90% (30% necrotic)
UP014 Breast Lobular Breast (lobular) 70%
UP016 Breast Breast(ductal) 70%
UP064 Breast Lobular Breast(lobularg2) 90%
UP082 Breast Ductal Breast 100%
UP096 Breast Ductal Breast (ductal) 90%
UP097 Breast Lobular Breast(lobular) 95%
UP098 Breast Lobular Breast (lobular) 90%
UP102 Breast Ductal Breast (ductal) 90%
UP111 Breast Ductal Breast(ductal) 95%
UP113 Breast Ductal Breast (ductal) 80%
UP116 Breast Lobular Breast (lobular) 70%
UP119 Breast Ductal Breast (ductal) 90%
UP161 Breast Ductal Breast(ductal) T130% T250%
UP166 Breast Lobular Breast(lobular) 95%
UP213 Breast Ductal Breast(ductal) 90%
UP423 Breast Breast Infiltrating ductal 60%
UP426 Breast Breast lobular 80%
UP428 Breast Breast Infiltrating ductal 70%
UP451 Breast Infiltrating ductal 60%
UP017 Colorectal Colorectal(moderate) 80%
UP019 Colorectal Colorectal (moderate) 70%
UP047 Colorectal Colorectal (moderate) 30%
UP062 Colorectal Colorectal 100%
UP063 Colorectal Colorectal (moderate) 70%
UP069 Colorectal
UP080 Colorectal Colorectal (moderate) 30%
UP341 Colorectal Colon adenocarcinoma 40%
UP356 Colorectal Mucinous adenocarcinoma 100%
UP369 Colorectal Colon adenocarcinoma 100%
UP371 Colorectal Colonic adenocarcinoma 70%
UP380 Colorectal Adenocarcinoma of rectum 20%
UP388 Colorectal Colon adenocarcinoma (moderate) 5%
UP399 Colorectal Colorectal adenocarcinoma (moderate) 80%
UP442 Colorectal Colon adenocarcinoma (moderate) 85%
UP453 Colorectal Colon adenocarcinoma (moderate) 70%
UP034 Gastric Gastric (intestinal) 80%
UP040 Gastric Diffuse Gastric (diffuse) 10%
UP045 Gastric Signet ring Gastric(diffuse) 80%
UP057 Gastric Intestinal Gastric (moderate) 85%
UP058 Gastric Gastric(diffuse) 40%
UP085 Gastric diffuse Gastric (diffuse) 30%
UP127 Gastric Gastric (diffuse) 50%



UP136 Gastric Gastric (poorly diff.) 60%
UP137 Gastric Diffuse Gastric (signet) 5%
UP143 Gastric Gastric (mixed) 90%
UP158 Gastric Intestinal Gastric (intestinal) 90%
UP162 Gastric Gastric(diffuse) 85%
UP163 Gastric Gastric(intestinal) 15%
UP164 Gastric Diffuse Gastric(diffuse) 10%
UP398 Gastric Gastric (signet ring) 30%
UP294 Lung-adeno adenocarcinoma Kf
UP295 Lung-adeno adenocarcinoma Kf
UP301 Lung-adeno adenocarcinoma Kf
UP334 Lung-adeno adenocarcinoma Kf
UP339 Lung-adeno adenocarcinoma Kf
UP361 Lung-adeno adenocarcinoma Kf
UP375 Lung-adeno adenocarcinoma Kf
UP382 Lung-adeno adenocarcinoma Kf
UP291 Lung-lc large cell Kf
UP297 Lung-lc large cell Kf
UP328 Lung-lc large cell Kf
UP335 Lung-lc large cell Kf
UP370 Lung-lc large cell Kf
UP372 Lung-lc large cell Kf
UP035 Lung-scc scc SCC(poorly diff.) 80%
UP065 Lung-scc scc SCC(poorly diff.) 80%
UP293 Lung-scc scc Kf
UP333 Lung-scc scc Kf
UP337 Lung-scc scc Kf
UP373 Lung-scc scc Kf
UP032 Melanoma Melanoma 100%
UP055 Melanoma Melanoma 90%
UP060 Melanoma Melanoma 85%
UP067 Melanoma Melanoma 100%
UP130 Melanoma Melanoma 100%
UP153 Melanoma Melanoma 90%
UP236 Melanoma Melanoma 100%
UP268 Melanoma Melanoma (spindle cell)(atypical) 100%
UP269 Melanoma Melanoma 95%
UP348 Melanoma Melanoma 99%
UP125 Ovarian serous Ovarian serous papillary 80%
UP128 Ovarian serous Poorly diff carcinoma of ovary 100%
UP135 Ovarian serous Ovarian (serous) 80%
UP139 Ovarian endometrioid Ovarian (endometrioid) 100%(60%necrotic)
UP140 Ovarian serous Ovarian(serous) 100%
UP146 Ovarian serous Ddx MMMT, munous tumour, Kruk 95%
UP149 Ovarian serous Ovarian (poorly) 100%
UP152 Ovarian endometrioid Ovarian (endometrioid) 100%
UP156 Ovarian serous Ovarian(endometrioid) 100%
UP165 Ovarian serous Ovarian (endometrioid) 100%
UP329 Ovarian serous Ovarian (serous) 90%



UP330 Ovarian serous Ovarian (serous) 80%
UP351 Ovarian
UP377 Ovarian serous Ovarian (serous) 95%
UP075 Ovarian mucinous LMP Ovarian mucinous 100%
UP083 Ovarian mucinous LMP
UP115 Ovarian mucinous LMP Ovarian(mucinous) 100%
UP286 Ovarian mucinous LMP Mucinous borderline 100%
UP289 Ovarian mucinous Benign mucinous tumour
UP323 Pancreas Adenocarcinoma c/w pancreas 50%
UP324 Pancreas Adenocarcinoma c/w pancreas 40%
UP342 Pancreas Adenocarcinoma c/w pancreas 30%
UP345 Pancreas Adenocarcinoma c/w pancreas 30%
UP357 Pancreas Adenocarcinoma c/w pancreas 20%
UP445 Pancreas Pancreas adenocarcinoma 40%
UP108 Renal Renal(clear cell) 100%
UP117 Renal Renal(clear cell) 100%
UP124 Renal Renal(clear cell) 100%
UP142 Renal Renal(clear cell) 100%
UP173 Renal Renal 70%
UP178 Renal Renal 100%
UP186 Renal Renal cell (clear cell type) 100%(40%Necrotic)
UP190 Renal Renal(clear cell) 100%
UP276 Renal Carcinoma
UP360 Renal Renal cell carcinoma (mixed clear cell/papillary) 100%
UP179 SCC Scc 80%
UP180 SCC SCC(moderate) 90%
UP182 SCC SCC(moderate) 70%
UP184 SCC Scc 90%
UP206 SCC Scc 60%
UP221 SCC Scc 50%
UP239 SCC
UP273 SCC SCC (larynx) (moderate) 80%
UP346 SCC SCC(mouth) (moderate) 50%
UP415 Uterine Endometrioid 70%
UP418 Uterine Endometrioid 95%
UP419 Uterine endometrioid Endometrial adenocarcinoma (uterine) 100%
UP421 Uterine Endometrioid 90%
UP422 Uterine Endometrioid 80%
UP427 Uterine Endometrioid 70%
UP439 Uterine Endometrioid 90%
UP443 Uterine Endometrioid 95%
UP446 Uterine endometrioid Endometrial adenocarcinoma 100%


Appendix J: Output of prediction of primary ovarian origin for LMP and invasive EOC cohort

Patient ID | Histology | Invasive/LMP | 1kNN | 3kNN | Nearest Centroid | Linear Discriminant Analysis
91.039 | Serous | Invasive | Ovarian | Ovarian | Ovarian | Other
85.064 | Serous | Invasive | Ovarian | Ovarian | Ovarian | Ovarian
86.058 | Serous | Invasive | Ovarian | Ovarian | Ovarian | Ovarian
91.007 | Serous | Invasive | Ovarian | Ovarian | Ovarian | Ovarian
91.052 | Serous | Invasive | Ovarian | Ovarian | Ovarian | Ovarian
93.001 | Serous | Invasive | Ovarian | Ovarian | Ovarian | Ovarian
93.004 | Serous | Invasive | Ovarian | Ovarian | Ovarian | Ovarian
93.117 | Serous | Invasive | Ovarian | Ovarian | Ovarian | Ovarian
93.131 | Serous | Invasive | Ovarian | Ovarian | Ovarian | Ovarian
94.017 | Serous | Invasive | Ovarian | Ovarian | Ovarian | Ovarian
94.127 | Serous | Invasive | Ovarian | Ovarian | Ovarian | Ovarian
P00756 | Serous | Invasive | Ovarian | Ovarian | Ovarian | Ovarian
90.037 | Serous | LMP | Ovarian | Ovarian | Ovarian | Ovarian
90.063 | Serous | LMP | Ovarian | Ovarian | Ovarian | Ovarian
91.077 | Serous | LMP | Ovarian | Ovarian | Ovarian | Ovarian
93.007 | Serous | LMP | Ovarian | Ovarian | Ovarian | Ovarian
92.014 | Serous | LMP | Ovarian | Ovarian | Ovarian | Ovarian
92.018 | Serous | LMP | Ovarian | Ovarian | Ovarian | Ovarian
93.073 | Serous | LMP | Ovarian | Ovarian | Ovarian | Ovarian
93.079 | Serous | LMP | Ovarian | Ovarian | Ovarian | Ovarian
94.046 | Serous | LMP | Ovarian | Ovarian | Ovarian | Ovarian
95.006 | Serous | LMP | Ovarian | Ovarian | Ovarian | Ovarian
22027 | Serous | LMP | Ovarian | Ovarian | Ovarian | Ovarian
44232 | Serous | LMP | Ovarian | Ovarian | Ovarian | Ovarian
70056 | Serous | LMP | Ovarian | Ovarian | Ovarian | Ovarian
70057 | Serous | LMP | Ovarian | Ovarian | Ovarian | Ovarian
93.090 | Serous | LMP | Ovarian | Ovarian | Ovarian | Ovarian
P00633 | Serous | LMP | Ovarian | Ovarian | Ovarian | Ovarian
WM389A | Serous | LMP | Ovarian | Ovarian | Ovarian | Ovarian
WM542A | Serous | LMP | Ovarian | Ovarian | Ovarian | Ovarian
WM578A | Serous | LMP | Ovarian | Ovarian | Ovarian | Ovarian
90.007 | Mucinous | Invasive | Other | Other | Other | Other
94.036 | Mucinous | Invasive | Other | Other | Other | Other
93.064 | Mucinous | Invasive | Ovarian | Ovarian | Ovarian | Ovarian
94.112 | Mucinous | Invasive | Ovarian | Ovarian | Ovarian | Ovarian
P00488 | Mucinous | LMP | Other | Other | Other | Other
WM439A | Mucinous | LMP | Other | Other | Other | Other
P00718 | Mucinous | LMP | Other | Other | Other | Other
WM438 | Mucinous | LMP | Other | Other | Other | Other
WM223 | Mucinous | LMP | Other | Other | Other | Other
94.080 | Mucinous | LMP | Other | Ovarian | Other | Other
92.011 | Mucinous | LMP | Other | Ovarian | Other | Other
93.085 | Mucinous | LMP | Ovarian | Ovarian | Other | Other


51030 | Mucinous | LMP | Ovarian | Ovarian | Other | Other
P00807 | Mucinous | LMP | Ovarian | Other | Other | Ovarian
93.002 | Mucinous | LMP | Ovarian | Ovarian | Other | Ovarian
51026 | Mucinous | LMP | Ovarian | Ovarian | Other | Ovarian
P00627 | Mucinous | LMP | Ovarian | Ovarian | Other | Ovarian
94.072 | Mucinous | LMP | Other | Other | Ovarian | Ovarian
93.077 | Mucinous | LMP | Ovarian | Ovarian | Ovarian | Ovarian
44247 | Mucinous | LMP | Ovarian | Ovarian | Ovarian | Ovarian
94.030 | Mucinous | LMP | Ovarian | Ovarian | Ovarian | Ovarian
P00784 | Mucinous | LMP | Ovarian | Ovarian | Ovarian | Ovarian
P00934 | Mucinous | LMP | Ovarian | Ovarian | Ovarian | Ovarian
P00935 | Mucinous | LMP | Ovarian | Ovarian | Ovarian | Ovarian
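The prediction tables in Appendices F and J name four classification rules but do not describe them. As a rough guide only, the pure-Python sketch below shows a textbook form of each rule for one new sample: k-nearest neighbours (k = 1 or 3), nearest centroid, and diagonal linear discriminant analysis. Several choices are assumptions, not the settings used to produce these tables: Euclidean distance, equal class priors, pooled per-gene variance, the tie-breaking rule, and the made-up two-gene profiles. The function names are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two expression profiles (assumed metric)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def class_means(X, y):
    """Mean profile (centroid) of each class, keyed by class label."""
    means = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        means[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return means


def knn(X, y, x_new, k=1):
    """k-nearest neighbours: majority vote among the k closest training profiles.
    Counter keeps first-seen order, so a tied vote goes to the closer neighbour's label."""
    order = sorted(range(len(X)), key=lambda i: euclidean(x_new, X[i]))
    votes = Counter(y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def nearest_centroid(X, y, x_new):
    """Assign x_new to the class whose centroid is closest."""
    means = class_means(X, y)
    return min(means, key=lambda lab: euclidean(x_new, means[lab]))


def dlda(X, y, x_new):
    """Diagonal LDA with equal priors: squared distance to each centroid, each gene
    scaled by its pooled within-class variance (gene-gene covariances ignored).
    Assumes no gene has zero within-class variance."""
    means = class_means(X, y)
    dof = len(X) - len(means)
    n_genes = len(x_new)
    pooled_var = [
        sum((x[g] - means[lab][g]) ** 2 for x, lab in zip(X, y)) / dof
        for g in range(n_genes)
    ]
    return min(
        means,
        key=lambda lab: sum((x_new[g] - means[lab][g]) ** 2 / pooled_var[g]
                            for g in range(n_genes)),
    )


if __name__ == "__main__":
    # Made-up two-gene log-ratio profiles for two classes.
    X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
    y = ["Other"] * 3 + ["Ovarian"] * 3
    for x_new in ([0.2, 0.3], [5.5, 5.2]):
        print(x_new, knn(X, y, x_new, k=1), knn(X, y, x_new, k=3),
              nearest_centroid(X, y, x_new), dlda(X, y, x_new))
```

Nearest centroid and DLDA differ only in how distance is measured. DLDA divides each gene's squared difference by that gene's pooled variance, so noisy genes count for less. With thousands of genes and few samples, the diagonal form avoids estimating a full covariance matrix.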

Appendix K: Predictive gene expression signature of primary EOC

Rank | UniGene Name | UniGene Symbol | t-value | Parametric p-value | % CV support | Mean of ratios in class 1: other | Mean of ratios in class 2: Ovarian

1 | F-box protein 21 | FBXO21 | -12.74 | p < 0.000001 | 100 | 0.856 | 2.682
2 | Wilms tumor 1 | WT1 | -11.74 | p < 0.000001 | 100 | 0.653 | 7.573
3 | Zinc finger protein 261 | ZNF261 | -7.41 | p < 0.000001 | 100 | 0.855 | 1.791
4 | Myelin associated glycoprotein | MAG | -7.05 | p < 0.000001 | 100 | 0.867 | 2.398
5 | Zinc finger protein, multitype 2 | ZFPM2 | -6.98 | p < 0.000001 | 100 | 0.726 | 3.048
6 | SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily d, member 3 | SMARCD3 | -6.89 | p < 0.000001 | 100 | 0.813 | 2.118
7 | Rap guanine nucleotide exchange factor (GEF) 3 | RAPGEF3 | -6.87 | p < 0.000001 | 100 | 0.832 | 2.278
8 | Spondin 1, extracellular matrix protein | SPON1 | -6.71 | p < 0.000001 | 100 | 0.711 | 4.235
10 | Catenin (cadherin-associated protein), alpha-like 1 | CTNNAL1 | -6.59 | p < 0.000001 | 100 | 0.875 | 2.195
11 | LY6/PLAUR domain containing 1 | LYPDC1 | -6.58 | p < 0.000001 | 100 | 1.081 | 4.994
12 | Scavenger receptor class A, member 3 | SCARA3 | -6.53 | p < 0.000001 | 100 | 0.801 | 2.855
13 | Fibroblast growth factor 18 | FGF18 | -6.51 | p < 0.000001 | 100 | 0.997 | 2.903
15 | Neural cell adhesion molecule 1 | NCAM1 | -6.22 | p < 0.000001 | 100 | 0.758 | 2.48
16 | Myosin, heavy polypeptide 10, non-muscle | MYH10 | -6.17 | p < 0.000001 | 100 | 0.759 | 1.771
17 | PDZ domain containing 3 | PDZK3 | -6.15 | p < 0.000001 | 100 | 0.82 | 1.798
18 | Protein kinase C, iota | PRKCI | -6.15 | p < 0.000001 | 100 | 0.969 | 2.175
19 | Growth arrest-specific 6 | GAS6 | -6.14 | p < 0.000001 | 100 | 0.715 | 1.895
20 | Argininosuccinate synthetase | ASS | -6.11 | p < 0.000001 | 100 | 0.76 | 2.306
21 | Paternally expressed 3 | PEG3 | -6.02 | p < 0.000001 | 100 | 0.567 | 4.481

22

Inhi

bito

r of D

NA

bin

ding

4, d

omin

ant n

egat

ive

helix

-looP

-hel

ix p

rote

in

ID4

-5.9

4 p

< 0.

0000

01

100

0.77

5 2.

661

23

Rep

rodu

ctio

n 8

D8S

2298

E -5

.93

p <

0.00

0001

10

0 1.

048

2.28

8 24

R

NA

-bin

ding

regi

on (R

NP1

, RR

M) c

onta

inin

g 1

RNPC

1 -5

.85

p <

0.00

0001

10

0 0.

771

2.00

1 25

B

one

mor

phog

enet

ic p

rote

in 6

BM

P6

-5.7

5 p

< 0.

0000

01

100

1.04

6 3.

169

26

GR

B2-

asso

ciat

ed b

indi

ng p

rote

in 2

G

AB2

-5.7

1 p

< 0.

0000

01

100

0.85

5 1.

751

27

Myo

-inos

itol 1

-pho

spha

te sy

ntha

se A

1 IS

YNA1

-5

.63

p <

0.00

0001

10

0 0.

777

2.34

6 28

Pa

rane

opla

stic

ant

igen

MA

1 PN

MA1

-5

.58

p <

0.00

0001

10

0 0.

771

1.41

5 29

H

omeo

box

D4

HO

XD4

-5.5

8 p

< 0.

0000

01

100

0.93

7 2.

695

359

30 | Reticulocalbin 2, EF-hand calcium binding domain | RCN2 | -5.56 | p < 0.000001 | 100 | 1.014 | 1.966
31 | Homeo box D8 | HOXD8 | -5.56 | p < 0.000001 | 100 | 0.888 | 2.576
32 | Anti-Mullerian hormone receptor, type II | AMHR2 | -5.51 | p < 0.000001 | 100 | 0.925 | 2.66
33 | EGF-containing fibulin-like extracellular matrix protein 2 | EFEMP2 | -5.49 | p < 0.000001 | 100 | 0.839 | 1.615
35 | Ets variant gene 1 | ETV1 | -5.38 | p < 0.000001 | 100 | 0.874 | 1.848
37 | Meis1, myeloid ecotropic viral integration site 1 homolog (mouse) | MEIS1 | -5.34 | p < 0.000001 | 100 | 0.948 | 2.597
38 | Guanosine monophosphate reductase | GMPR | -5.33 | p < 0.000001 | 100 | 0.832 | 2.23
39 | Hypothetical protein MGC20235 | MGC20235 | -5.31 | p < 0.000001 | 100 | 0.871 | 1.885
40 | Kinesin family member 5C | KIF5C | -5.3 | p < 0.000001 | 100 | 0.89 | 2.654
41 | Gamma-aminobutyric acid (GABA) A receptor, alpha 1 | GABRA1 | -5.28 | p < 0.000001 | 100 | 0.814 | 2.605
42 | Paired box gene 8 | PAX8 | -5.27 | p < 0.000001 | 100 | 1.018 | 5.205
43 | Neural cell adhesion molecule 1 | NCAM1 | -5.25 | p < 0.000001 | 100 | 0.775 | 2
44 | Matrilin 2 | MATN2 | -5.24 | p < 0.000001 | 100 | 0.832 | 2.215
45 | Ephrin-B3 | EFNB3 | -5.22 | p < 0.000001 | 100 | 0.889 | 1.664
46 | Cell adhesion molecule with homology to L1CAM (close homolog of L1) | CHL1 | -5.21 | p < 0.000001 | 100 | 0.777 | 3.33
48 | Hypothetical protein FLJ12442 | FLJ12442 | -5.2 | p < 0.000001 | 100 | 0.818 | 1.556
51 | Retinol binding protein 1, cellular | RBP1 | -5.16 | p < 0.000001 | 100 | 0.728 | 2.798
52 | Sarcospan (Kras oncogene-associated gene) | SSPN | -5.12 | 1.00E-06 | 100 | 0.844 | 1.814
53 | KIAA0020 | KIAA0020 | -5 | 2.00E-06 | 100 | 0.945 | 5.497
54 | Syndecan 3 (N-syndecan) | SDC3 | -5 | 2.00E-06 | 100 | 0.742 | 1.548
56 | KIAA1240 protein | KIAA1240 | -4.94 | 2.00E-06 | 100 | 0.924 | 1.943
57 | RAB11 family interacting protein 5 (class I) | RAB11FIP5 | -4.89 | 3.00E-06 | 100 | 0.718 | 1.717
58 | Zinc finger protein 258 | ZNF258 | -4.86 | 3.00E-06 | 100 | 0.89 | 2.37
59 | Prostate tumor overexpressed gene 1 | PTOV1 | -4.86 | 3.00E-06 | 100 | 0.876 | 2.239
60 | Yip1 interacting factor homolog (S. cerevisiae) | YIF1 | -4.84 | 4.00E-06 | 100 | 0.85 | 1.45
62 | Ubiquitin-conjugating enzyme E2E 2 (UBC4/5 homolog, yeast) | UBE2E2 | -4.81 | 4.00E-06 | 100 | 0.862 | 1.615
63 | Kallikrein 8 (neuropsin/ovasin) | KLK8 | -4.79 | 7.00E-06 | 100 | 0.978 | 3.895
64 | Full length insert cDNA clone ZD63C05 |  | -4.72 | 7.00E-06 | 100 | 0.828 | 2.624
65 | Nerve growth factor receptor (TNFRSF16) associated protein 1 | NGFRAP1 | -4.72 | 6.00E-06 | 100 | 0.738 | 1.553
67 | Nerve growth factor receptor (TNFRSF16) associated protein 1 | NGFRAP1 | -4.72 | 6.00E-06 | 100 | 0.802 | 1.584
68 | Bone marrow stromal cell antigen 2 | BST2 | -4.71 | 6.00E-06 | 100 | 0.716 | 1.612

69 | PTPRF interacting protein, binding protein 1 (liprin beta 1) | PPFIBP1 | -4.71 | 8.00E-06 | 100 | 0.919 | 1.795
70 | Transducin-like enhancer of split 4 (E(sp1) homolog, Drosophila) | TLE4 | -4.71 | 6.00E-06 | 100 | 0.861 | 1.731
71 | Astrotactin | ASTN | -4.64 | 1.10E-05 | 100 | 0.885 | 1.836
72 | Melanoma associated gene | D2S448 | -4.62 | 9.00E-06 | 100 | 0.759 | 1.573
74 | Cadherin 2, type 1, N-cadherin (neuronal) | CDH2 | -4.57 | 1.10E-05 | 100 | 0.913 | 2.662
75 | Hypothetical protein MGC22014 | MGC22014 | -4.57 | 1.30E-05 | 100 | 0.882 | 1.605
76 | GATA binding protein 4 | GATA4 | -4.55 | 1.30E-05 | 100 | 1.163 | 4.162
77 | ADP-ribosylation factor 4-like | ARF4L | -4.52 | 1.40E-05 | 100 | 0.872 | 1.985
78 | Delta-like 1 homolog (Drosophila) | DLK1 | -4.47 | 1.60E-05 | 100 | 0.868 | 5.331
79 | Lectin, galactoside-binding, soluble, 3 binding protein | LGALS3BP | -4.45 | 1.70E-05 | 100 | 0.857 | 1.553
80 | Phospholipid transfer protein | PLTP | -4.44 | 1.80E-05 | 100 | 0.899 | 1.723
81 | Transcribed locus |  | -4.44 | 1.90E-05 | 100 | 0.894 | 1.808
82 | Doublecortin domain containing 2 | DCDC2 | -4.43 | 2.10E-05 | 100 | 0.837 | 2.783
83 | BTG family, member 3 | BTG3 | -4.41 | 2.20E-05 | 100 | 1.029 | 1.877
84 | Integrin, alpha 9 | ITGA9 | -4.4 | 2.70E-05 | 100 | 1.008 | 1.95
85 | Cyclin E1 | CCNE1 | -4.37 | 2.50E-05 | 100 | 1.008 | 1.944
86 | Discs, large (Drosophila) homolog-associated protein 3 | DLGAP3 | -4.36 | 2.60E-05 | 100 | 0.768 | 1.525
87 | IGF-II mRNA-binding protein 2 | IMP-2 | -4.3 | 3.30E-05 | 100 | 0.704 | 1.738
88 | V-myc myelocytomatosis viral related oncogene, neuroblastoma derived (avian) | MYCN | -4.28 | 3.60E-05 | 100 | 0.887 | 2.301
89 | Paternally expressed 10 | PEG10 | -4.28 | 3.60E-05 | 100 | 0.738 | 3.072
90 | ST6 (alpha-N-acetyl-neuraminyl-2,3-beta-galactosyl-1,3)-N-acetylgalactosaminide alpha-2,6-sialyltransferase 5 | SIAT7E | -4.23 | 5.80E-05 | 100 | 1.087 | 2.616
91 | Plexin D1 | PLXND1 | -4.2 | 4.70E-05 | 100 | 0.85 | 1.511
92 | Phosphofructokinase, muscle | PFKM | -4.2 | 4.80E-05 | 100 | 0.9 | 1.491
93 | Bone morphogenetic protein 7 (osteogenic protein 1) | BMP7 | -4.19 | 5.00E-05 | 100 | 0.946 | 2.848
94 | Cadherin 6, type 2, K-cadherin (fetal kidney) | CDH6 | -4.19 | 5.20E-05 | 100 | 0.908 | 2.841
95 | Brain expressed X-linked 2 | BEX2 | -4.19 | 5.10E-05 | 100 | 0.9 | 1.901
96 | Laminin, beta 2 (laminin S) | LAMB2 | -4.16 | 5.50E-05 | 100 | 0.829 | 1.345
97 | Sal-like 2 (Drosophila) | SALL2 | -4.15 | 6.00E-05 | 100 | 0.842 | 1.584

98 | Kallikrein 7 (chymotryptic, stratum corneum) | KLK7 | -4.1 | 9.10E-05 | 100 | 1.013 | 4.22
99 | Latrophilin 2 | LPHN2 | -4.09 | 7.50E-05 | 100 | 0.84 | 1.718
100 | Death-associated protein kinase 1 | DAPK1 | -4.07 | 7.90E-05 | 100 | 0.841 | 1.618
101 | Calmodulin-like 3 | CALML3 | -4.05 | 0.000102 | 100 | 1.089 | 3.555
102 | Norrie disease (pseudoglioma) | NDP | -4.03 | 0.000104 | 99 | 0.908 | 1.904
104 | ATPase, Ca++ transporting, plasma membrane 1 | ATP2B1 | -3.97 | 0.000118 | 100 | 0.854 | 1.993
105 | Plexin B1 | PLXNB1 | -3.96 | 0.000126 | 100 | 0.936 | 1.615
107 | Hypothetical protein MGC35048 | MGC35048 | -3.95 | 0.000145 | 100 | 0.787 | 1.634
108 | Carboxypeptidase Z | CPZ | -3.94 | 0.000139 | 100 | 0.786 | 1.719
109 | Trophinin | TRO | -3.94 | 0.000143 | 99 | 0.815 | 1.631
110 | Sarcoglycan, beta (43kDa dystrophin-associated glycoprotein) | SGCB | -3.92 | 0.000141 | 100 | 0.804 | 1.9
112 | WAS protein family, member 1 | WASF1 | -3.88 | 0.000167 | 100 | 0.916 | 1.548
113 | SAC3 domain containing 1 | SHD1 | -3.88 | 0.000165 | 100 | 0.885 | 1.708
114 | Leucine zipper, down-regulated in cancer 1 | LDOC1 | -3.86 | 0.000173 | 100 | 0.729 | 1.517
115 | Protein tyrosine phosphatase, receptor type, U | PTPRU | -3.86 | 0.000176 | 100 | 0.909 | 1.698
116 | Prostaglandin I2 (prostacyclin) synthase | PTGIS | -3.84 | 0.000191 | 99 | 0.885 | 1.513
117 | Fibronectin leucine rich transmembrane protein 2 | FLRT2 | -3.83 | 0.000197 | 97 | 0.932 | 2.003
118 | Sarcoglycan, epsilon | SGCE | -3.83 | 0.000196 | 99 | 0.763 | 1.632
119 | IMP (inosine monophosphate) dehydrogenase 2 | IMPDH2 | -3.83 | 0.000197 | 100 | 0.888 | 1.448
120 | Transcribed locus, moderately similar to XP_534476.1 similar to Adhesion regulating molecule 1 precursor (110 kDa cell membrane glycoprotein) (Gp110) [Canis familiaris] |  | -3.82 | 0.000224 | 98 | 1.025 | 1.776
121 | Tumor necrosis factor, alpha-induced protein 2 | TNFAIP2 | -3.81 | 0.000214 | 99 | 0.962 | 1.787
122 | Kallikrein 5 | KLK5 | -3.81 | 0.00022 | 100 | 0.931 | 4.19
123 | Hypothetical LOC401022 | LOC401022 | -3.81 | 0.000219 | 99 | 1.014 | 1.982
124 | Syntaxin 6 | STX6 | -3.81 | 0.000244 | 99 | 0.963 | 2.115
125 | Secretory leukocyte protease inhibitor (antileukoproteinase) | SLPI | -3.79 | 0.000228 | 100 | 0.751 | 3.159
126 | Suppression of tumorigenicity 5 | ST5 | -3.78 | 0.000236 | 99 | 0.869 | 1.333
127 | Discoidin, CUB and LCCL domain containing 2 | DCBLD2 | -3.77 | 0.00024 | 100 | 0.81 | 1.62
128 | Pleiomorphic adenoma gene-like 2 | PLAGL2 | -3.77 | 0.000253 | 99 | 1.069 | 1.781
129 | WAS protein family, member 1 | WASF1 | -3.76 | 0.000266 | 99 | 0.819 | 1.435
130 | Cyclin-dependent kinase inhibitor 2C (p18, inhibits CDK4) | CDKN2C | -3.75 | 0.000275 | 99 | 0.927 | 1.531

132 | Thyroid hormone receptor interactor 6 | TRIP6 | -3.73 | 0.00029 | 98 | 0.747 | 1.275
133 | Maternally expressed 3 | MEG3 | -3.73 | 0.000297 | 98 | 0.979 | 2.153
134 | N-acetylated alpha-linked acidic dipeptidase 2 | NAALAD2 | -3.71 | 0.000305 | 97 | 0.874 | 1.449
135 | Cullin 7 | CUL7 | -3.71 | 0.000303 | 96 | 0.878 | 1.372
136 | Cyclin G1 | CCNG1 | -3.7 | 0.000316 | 97 | 0.907 | 1.43
137 | Prepronociceptin | PNOC | -3.7 | 0.000338 | 97 | 0.994 | 1.555
138 | Folate receptor 1 (adult) | FOLR1 | -3.66 | 0.000414 | 97 | 1.113 | 3.385
139 | TBP-interacting protein | TIP120B | -3.66 | 0.000413 | 97 | 0.942 | 1.678
140 | Similar to RIKEN cDNA 2310016C16 | LOC493869 | -3.64 | 0.00039 | 98 | 0.879 | 1.435
141 | WW domain binding protein 5 | WBP5 | -3.63 | 4.00E-04 | 97 | 0.86 | 1.297
142 | Ceruloplasmin (ferroxidase) | CP | -3.63 | 0.000407 | 96 | 0.992 | 2.661
144 | Actin binding LIM protein 1 | ABLIM1 | -3.62 | 0.000414 | 95 | 0.779 | 1.514
145 | SP110 nuclear body protein | SP110 | -3.62 | 0.000462 | 96 | 0.917 | 1.484
146 | Solute carrier family 6 (neurotransmitter transporter, creatine), member 8 | SLC6A8 | -3.6 | 0.000437 | 96 | 0.746 | 1.871
147 | F-box and leucine-rich repeat protein 7 | FBXL7 | -3.6 | 0.000443 | 96 | 0.897 | 1.639
148 | TU3A protein | TU3A | -3.6 | 0.000472 | 96 | 0.905 | 1.504
149 | Protein phosphatase 2 (formerly 2A), catalytic subunit, beta isoform | PPP2CB | -3.58 | 0.000473 | 95 | 0.959 | 1.365
150 | Golgin-67 | GOLGIN-67 | -3.58 | 0.000477 | 96 | 0.893 | 1.464
151 | Tetratricopeptide repeat domain 7A | TTC7A | -3.58 | 0.000482 | 96 | 0.899 | 1.43
152 | Estrogen receptor 1 | ESR1 | -3.58 | 0.00048 | 98 | 1.204 | 4.977
153 | Haptoglobin | HP | -3.57 | 0.000523 | 94 | 1.235 | 3.391
154 | Cyclin-dependent kinase inhibitor 1C (p57, Kip2) | CDKN1C | -3.57 | 0.000498 | 96 | 0.89 | 1.525
155 | Neuronal pentraxin II | NPTX2 | -3.57 | 0.000506 | 96 | 0.834 | 1.761
156 | DKFZP566O084 protein | DKFZp566O084 | -3.56 | 0.000512 | 98 | 0.738 | 1.656
157 | Lysosomal associated protein transmembrane 4 beta | LAPTM4B | -3.55 | 0.000525 | 97 | 0.922 | 1.7
158 | CDNA FLJ36931 fis, clone BRACE2005290 |  | -3.55 | 6.00E-04 | 96 | 0.804 | 1.663
159 | Pre-B-cell leukemia transcription factor 1 | PBX1 | -3.5 | 0.000623 | 96 | 0.983 | 1.629
160 | Glycine dehydrogenase (decarboxylating; glycine decarboxylase, glycine cleavage system protein P) | GLDC | -3.48 | 0.000689 | 93 | 0.934 | 3.622
161 | Carbohydrate sulfotransferase 10 | CHST10 | -3.47 | 0.00073 | 93 | 0.9 | 1.583
162 | Amyloid beta (A4) precursor-like protein 1 | APLP1 | -3.45 | 0.000802 | 96 | 0.846 | 1.626

163 | Nidogen 2 (osteonidogen) | NID2 | -3.43 | 0.000799 | 94 | 0.911 | 1.647
164 | KIAA1238 protein | KIAA1238 | -3.43 | 0.000798 | 91 | 0.839 | 1.463
165 | Carbonic anhydrase XIV | CA14 | -3.42 | 0.000841 | 94 | 0.956 | 1.428
166 | CUG triplet repeat, RNA binding protein 2 | CUGBP2 | -3.41 | 0.000847 | 92 | 0.931 | 1.725
167 | TTK protein kinase | TTK | -3.41 | 0.000863 | 92 | 1.044 | 1.761
168 | Transmembrane protein with EGF-like and two follistatin-like domains 1 | TMEFF1 | -3.38 | 0.000953 | 49 | 0.841 | 1.544
170 | Forkhead box F2 | FOXF2 | 3.38 | 0.000978 | 44 | 1.205 | 0.59
172 | Ubiquitin fusion degradation 1-like | UFD1L | 3.39 | 0.000962 | 55 | 1.401 | 0.657
173 | Gardner-Rasheed feline sarcoma viral (v-fgr) oncogene homolog | FGR | 3.39 | 0.000937 | 67 | 1.268 | 0.656
174 | Interferon stimulated gene 20kDa | ISG20 | 3.4 | 0.000886 | 80 | 1.156 | 0.667
175 | Ectonucleoside triphosphate diphosphohydrolase 1 | ENTPD1 | 3.4 | 0.000925 | 63 | 1.237 | 0.712
176 | IQ motif containing GTPase activating protein 2 | IQGAP2 | 3.41 | 0.000857 | 91 | 1.055 | 0.58
177 | Glutaredoxin (thioltransferase) | GLRX | 3.41 | 0.000846 | 90 | 1.273 | 0.722
180 | Paired-like homeodomain transcription factor 2 | PITX2 | 3.43 | 0.000834 | 93 | 1.514 | 0.615
181 | Mannosidase, alpha, class 2A, member 1 | MAN2A1 | 3.44 | 0.000798 | 90 | 1.426 | 0.366
182 | Hypothetical protein FLJ38564 | FLJ38564 | 3.44 | 0.000813 | 93 | 1.207 | 0.628
183 | Integrin, alpha 4 (antigen CD49D, alpha 4 subunit of VLA-4 receptor) | ITGA4 | 3.45 | 0.000753 | 93 | 1.207 | 0.737
184 | KIAA0408 | KIAA0408 | 3.47 | 0.000727 | 97 | 1.242 | 0.572
185 | Platelet/endothelial cell adhesion molecule (CD31 antigen) | PECAM1 | 3.55 | 0.000531 | 95 | 1.176 | 0.665
186 | KIAA1012 | KIAA1012 | 3.59 | 0.00051 | 96 | 1.444 | 0.658
187 | DnaJ (Hsp40) homolog, subfamily D, member 1 | DNAJD1 | 3.6 | 0.00045 | 97 | 1.157 | 0.695
188 | Trophoblast-derived noncoding RNA | TncRNA | 3.62 | 0.000407 | 96 | 0.981 | 0.584
189 | Smcy homolog, Y-linked (mouse) | SMCY | 3.63 | 0.000408 | 99 | 2.551 | 0.436
191 | Ubiquitously transcribed tetratricopeptide repeat gene, Y-linked | UTY | 3.67 | 0.000424 | 99 | 1.343 | 0.531
192 | Chromosome 6 open reading frame 4 | C6orf4 | 3.73 | 0.000288 | 99 | 1.199 | 0.553
193 | LYRIC/3D3 | LYRIC | 3.76 | 0.000287 | 99 | 1.143 | 0.538
194 | BENE protein | BENE | 3.76 | 0.000252 | 100 | 1.229 | 0.555
195 | Potassium voltage-gated channel, delayed-rectifier, subfamily S, member 3 | KCNS3 | 3.78 | 0.000237 | 99 | 1.313 | 0.607
196 | TRNA (5-methylaminomethyl-2-thiouridylate)-methyltransferase 1 | TRMT1 | 3.78 | 0.00023 | 100 | 1.278 | 0.781
197 | Calcium/calmodulin-dependent protein kinase II | CaMKIINalpha | 3.8 | 0.000217 | 100 | 1.137 | 0.49
198 | A disintegrin-like and metalloprotease (reprolysin type) with thrombospondin type 1 motif, 4 | ADAMTS4 | 3.81 | 0.00022 | 99 | 1.233 | 0.7

200 | Kinase insert domain receptor (a type III receptor tyrosine kinase) | KDR | 3.86 | 0.000175 | 100 | 1.149 | 0.568
201 | RAN-binding protein 2-like 1 short isoform | LOC400966 | 3.87 | 0.00018 | 99 | 1.098 | 0.626
202 | Potassium inwardly-rectifying channel, subfamily J, member 15 | KCNJ15 | 3.94 | 0.000132 | 100 | 1.348 | 0.639
203 | Neuritin 1 | NRN1 | 3.97 | 0.000117 | 100 | 1.308 | 0.576
204 | Integrin, alpha 6 | ITGA6 | 3.99 | 0.00011 | 100 | 1.12 | 0.509
206 | Vav 3 oncogene | VAV3 | 3.99 | 0.000107 | 100 | 1.654 | 0.538
207 | V-ets erythroblastosis virus E26 oncogene homolog 2 (avian) | ETS2 | 3.99 | 0.000113 | 100 | 1.414 | 0.673
208 | Ectonucleotide pyrophosphatase/phosphodiesterase 3 | ENPP3 | 3.99 | 0.000114 | 99 | 2.104 | 0.327
209 | Synaptotagmin VII | SYT7 | 4 | 0.000109 | 99 | 1.352 | 0.442
211 | KIAA1539 | KIAA1539 | 4.1 | 7.20E-05 | 100 | 1.159 | 0.704
212 | Protease, serine, 23 | PRSS23 | 4.18 | 5.30E-05 | 100 | 1.321 | 0.615
213 | Calcium/calmodulin-dependent protein kinase II | CaMKIINalpha | 4.19 | 5.00E-05 | 100 | 1.06 | 0.474
214 | T cell receptor alpha locus | TRA@ | 4.26 | 3.80E-05 | 100 | 1.306 | 0.629
215 | Periostin, osteoblast specific factor | POSTN | 4.3 | 3.30E-05 | 100 | 1.168 | 0.309
216 | Chromosome 14 open reading frame 147 | C14orf147 | 4.34 | 2.80E-05 | 100 | 1.193 | 0.617
217 | KIAA0753 gene product | KIAA0753 | 4.37 | 2.40E-05 | 100 | 1.076 | 0.393
218 | Hypothetical protein LOC152485 | LOC152485 | 4.4 | 2.30E-05 | 100 | 1.117 | 0.656
219 | DEAD (Asp-Glu-Ala-Asp) box polypeptide 3, Y-linked | DDX3Y | 4.41 | 2.20E-05 | 99 | 1.35 | 0.416
221 | BRCA1 associated protein-1 (ubiquitin carboxy-terminal hydrolase) | BAP1 | 4.54 | 1.20E-05 | 100 | 1.268 | 0.297
222 | Carcinoembryonic antigen-related cell adhesion molecule 1 | CEACAM1 | 4.58 | 1.00E-05 | 100 | 1.668 | 0.493
223 | Eukaryotic translation initiation factor 1A, Y-linked | EIF1AY | 4.6 | 1.00E-05 | 100 | 1.821 | 0.59
224 | Chromosome Y open reading frame 15B | CYorf15B | 4.61 | 9.00E-06 | 100 | 3.544 | 0.383
225 | Stomatin | STOM | 4.64 | 8.00E-06 | 100 | 1.256 | 0.73
226 | Bone morphogenetic protein 5 | BMP5 | 4.65 | 9.00E-06 | 100 | 1.401 | 0.608
227 | Angiotensinogen (serine (or cysteine) proteinase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 8) | AGT | 4.75 | 5.00E-06 | 100 | 1.909 | 0.451
228 | CD44 antigen (homing function and Indian blood group system) | CD44 | 4.81 | 4.00E-06 | 100 | 1.289 | 0.565
229 | Integrin, alpha 2 (CD49B, alpha 2 subunit of VLA-2 receptor) | ITGA2 | 4.83 | 4.00E-06 | 100 | 1.118 | 0.434
230 | Myosin, light polypeptide kinase | MYLK | 4.88 | 3.00E-06 | 100 | 1.184 | 0.519
231 | Tenascin C (hexabrachion) | TNC | 4.89 | 3.00E-06 | 100 | 1.306 | 0.414

Appendix L: KEGG pathways significantly represented in gene expression signature of serous LMP and invasive EOC

Cell cycle KEGG pathway. P-value for overlap with the LMP/invasive EOC differentially expressed gene list: 2.81 × 10⁻⁵. Red circles: genes over-expressed in invasive tumours.

Complement and coagulation KEGG pathway. P-value for overlap with the LMP/invasive EOC differentially expressed gene list: 3.88 × 10⁻³. Red circles: genes over-expressed in invasive tumours. Green circles: genes under-expressed in invasive tumours.

Cytokine-cytokine receptor interaction pathway. P-value for overlap with the LMP/invasive EOC differentially expressed gene list: 5.28 × 10⁻⁵. Red circles: genes over-expressed in invasive tumours. Green circles: genes under-expressed in invasive tumours.
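These captions give a p-value for the overlap between each KEGG pathway and the LMP/invasive differentially expressed gene list. The appendix does not say which test produced those values. A common choice for this kind of over-representation analysis is a one-sided hypergeometric test. The sketch below assumes that test and illustrates only how such a value is computed; the function and argument names are hypothetical, not taken from the software used in the thesis.

```python
from math import comb


def overlap_p_value(n_genes, n_pathway, n_list, n_overlap):
    """One-sided hypergeometric p-value, P(X >= n_overlap).

    Assumed (illustrative) setting:
      n_genes   - genes measured on the array (the background)
      n_pathway - of those, genes annotated to the KEGG pathway
      n_list    - genes in the differentially expressed list
      n_overlap - genes that are both in the list and in the pathway
    """
    denominator = comb(n_genes, n_list)
    upper = min(n_pathway, n_list)
    # Sum the probabilities of observing n_overlap or more pathway genes in the list
    numerator = sum(
        comb(n_pathway, i) * comb(n_genes - n_pathway, n_list - i)
        for i in range(n_overlap, upper + 1)
    )
    return numerator / denominator


# Toy example: 10 genes, 5 in the pathway, 5 in the list, all 5 overlapping
p = overlap_p_value(10, 5, 5, 5)
```

In this toy case only one of the 252 equally likely five-gene lists contains all five pathway genes, so `p` equals 1/252. Real analyses often apply a multiple-testing correction across pathways as well.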

Appendix M: Microsoft Access gene ontology filter SQL query applied to total list of differentially expressed genes to exclude cell-cycle regulating and immune-response genes from the LMP/invasive EOC expression signature

SELECT Invasive_LMP_SAM_source2.ID, Invasive_LMP_SAM_source2.Acc, Invasive_LMP_SAM_source2.Name,
       Invasive_LMP_SAM_source2.Symbol, Invasive_LMP_SAM_source2.SumFunc, Invasive_LMP_SAM_source2.GOabr
FROM Invasive_LMP_SAM_source2
WHERE ((Invasive_LMP_SAM_source2.SumFunc) Like "*cancer*"
       Or (Invasive_LMP_SAM_source2.SumFunc) Like "*adhesion*"
       Or (Invasive_LMP_SAM_source2.SumFunc) Like "*tumour*"
       Or (Invasive_LMP_SAM_source2.SumFunc) Like "*epith*"
       Or (Invasive_LMP_SAM_source2.SumFunc) Like "*apoptosis*"
       Or (Invasive_LMP_SAM_source2.SumFunc) Like "*invasion*"
       Or (Invasive_LMP_SAM_source2.SumFunc) Like "*metas*"
       Or (Invasive_LMP_SAM_source2.SumFunc) Like "*ovarian*"
       Or (Invasive_LMP_SAM_source2.SumFunc) Like "*growth*")
  AND (Invasive_LMP_SAM_source2.GOabr) Not Like "*cell cycle*"
  AND (Invasive_LMP_SAM_source2.GOabr) Not Like "*immune*";
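The selection logic of this query can also be expressed outside Microsoft Access. The Python sketch below assumes each gene record is a dictionary carrying the `SumFunc` and `GOabr` fields used in the query. It approximates Access's case-insensitive `Like "*...*"` matching with lower-cased substring tests. The function name and the example records are hypothetical.

```python
# Keywords searched for in the summary-function field (SumFunc)
INCLUDE_KEYWORDS = ("cancer", "adhesion", "tumour", "epith", "apoptosis",
                    "invasion", "metas", "ovarian", "growth")
# Keywords in the abbreviated GO field (GOabr) that exclude a gene
EXCLUDE_KEYWORDS = ("cell cycle", "immune")


def passes_go_filter(record):
    """Return True if a gene record would be selected by the filter query.

    A record passes when SumFunc contains at least one include keyword and
    GOabr contains none of the exclude keywords. A missing (None) field fails,
    in the same way that a NULL field makes the SQL condition evaluate to NULL.
    """
    sum_func = record.get("SumFunc")
    go_abr = record.get("GOabr")
    if sum_func is None or go_abr is None:
        return False
    sum_func, go_abr = sum_func.lower(), go_abr.lower()
    has_keyword = any(k in sum_func for k in INCLUDE_KEYWORDS)
    excluded = any(k in go_abr for k in EXCLUDE_KEYWORDS)
    return has_keyword and not excluded


# Hypothetical example records
genes = [
    {"Symbol": "GENE1", "SumFunc": "Involved in Tumour invasion", "GOabr": "cell adhesion"},
    {"Symbol": "GENE2", "SumFunc": "Regulates cell growth", "GOabr": "cell cycle arrest"},
    {"Symbol": "GENE3", "SumFunc": "Metabolic enzyme", "GOabr": "glycolysis"},
]
selected = [g["Symbol"] for g in genes if passes_go_filter(g)]
```

Only `GENE1` is selected here. `GENE2` matches "growth" but is excluded by its cell-cycle annotation, and `GENE3` matches no keyword.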


Appendix N: Visual basic script for batch export of IHC image histogram statistics Dim appRef, startRulerUnits, startTypeUnits, startDisplayDialogs, docRef Dim totalCount, channelIndex, activeChannels, myChannels, secondaryIndex Dim largestCount, histogramIndex, pixelsPerX, outputX, a, visibleChannelCount Dim fsoRef, fileRef Dim folderRef, fileCollection Dim ImageCount, ImageCountTotal Dim i, newFolderName Dim aChannelArray(), aChannelIndex, fileOut, hist Set appRef = CreateObject("Photoshop.Application") ' Save the current preferences startRulerUnits = appRef.Preferences.RulerUnits startTypeUnits = appRef.Preferences.TypeUnits startDisplayDialogs = appRef.DisplayDialogs ' Set Photoshop CS2 to use pixels and display no dialogs appRef.Preferences.RulerUnits = 1 'for PsUnits --> 1 (psPixels) appRef.Preferences.TypeUnits = 1 'for PsTypeUnits --> 1 (psPixels) appRef.DisplayDialogs = 3 'for PsDialogModes --> 3 (psDisplayNoDialogs) i = 0 Set fsoRef = CreateObject( "Scripting.FileSystemObject" ) Set folderRef = fsoRef.GetFolder( "SPECIFY FULL DIRECTORY OF IMAGES HERE" ) Set fileCollection = folderRef.Files newFolderName = folderRef & "\Histogram_reports" Set convertedFolderRef = fsoRef.CreateFolder( newFolderName ) Set fileOut = fsoRef.CreateTextFile(newFolderName & "\" & "compiled_histogram_report.txt") For Each fileRef In fileCollection On Error Resume Next Set docRef = appRef.Open( fileRef.Path ) ' find out how many pixels I have totalCount = docRef.Width * docRef.Height ' more info to the out file 'fileOut.WriteLine " with a total pixel count of " & totalCount ' remember which channels are currently active activeChannels = appRef.ActiveDocument.ActiveChannels

    ' document histogram only works in these modes
    ' enumerated values = PsDocumentMode --> 2 (psRGB), 3 (psCMYK), 6 (psIndexedColor)
    If docRef.Mode = 2 Or docRef.Mode = 3 Or docRef.Mode = 6 Then
        ' activate the main channels so we can get the document's histogram
        ' using the TurnOnDocumentHistogramChannels function
        Call TurnOnDocumentHistogramChannels(docRef)
        ' output the document's histogram
        Call OutputHistogram(docRef.Histogram, "Luminosity", fileOut)
    End If

    ' local reference to work from
    Set myChannels = docRef.Channels
    ' loop through each channel and output the histogram
    For channelIndex = 1 To myChannels.Count
        ' the channel has to be visible to get a histogram
        myChannels(channelIndex).Visible = True
        ' turn off all the other channels
        For secondaryIndex = 1 To myChannels.Count
            If Not channelIndex = secondaryIndex Then
                myChannels(secondaryIndex).Visible = False
            End If
        Next
        ' use the function to dump the histogram
        Call OutputHistogram(myChannels(channelIndex).Histogram, myChannels(channelIndex).Name, fileOut)
    Next

    ' the output file is closed once, after all files have been processed
    ' reset the active channels
    docRef.ActiveChannels = activeChannels
    ' reset the application preferences
    appRef.Preferences.RulerUnits = startRulerUnits
    appRef.Preferences.TypeUnits = startTypeUnits
    appRef.DisplayDialogs = startDisplayDialogs

    appRef.ActiveDocument.Close()
    i = i + 1
Next

fileOut.Close
MsgBox i & " files processed by Ryan's Histogram analysis tool!"

' Utility function that takes a histogram and name
' and dumps summary statistics to the output file
Private Function OutputHistogram(inHistogram, inHistogramName, inOutFile)
    ' find out which count has the largest number
    ' I scale everything to this number for the output
    largestCount = 0
    ' running total of pixels over all 256 levels
    histogramCount = 0
    ' a simple indexer I can reuse
    histogramIndex = 0
    ' search through all and find the largest single item
    For Each hist In inHistogram
        histogramCount = histogramCount + CLng(hist)
        If CLng(hist) > largestCount Then
            largestCount = CLng(hist)
        End If
    Next
    ' these should match
    If Not histogramCount = totalCount Then
        MsgBox "Something bad is happening!"
    End If
    ' see how much each "X" is going to count as
    pixelsPerX = largestCount / 100
    ' one line per histogram: document, channel, mean, SD and median grey level
    inOutFile.WriteLine docRef.Name & " " & inHistogramName & _
        " Mean Pixels: " & AverageHistogram(inHistogram) & _
        " Std. Dev. Pixels: " & StandardDeviationHistogram(inHistogram) & _
        " Median Pixels: " & MedianHistogram(inHistogram, histogramCount)
End Function

' Function to activate all the channels according to the document's mode
' Takes a document reference for input
Private Function TurnOnDocumentHistogramChannels(inDocument)
    ' see how many channels we need to activate
    visibleChannelCount = 0
    ' based on the mode of the document (PsDocumentMode values)
    Select Case inDocument.Mode
        Case 1, 5, 6    ' psGrayscale, psBitmap, psIndexedColor
            visibleChannelCount = 1
        Case 8          ' psDuotone
            visibleChannelCount = 2
        Case 2, 4       ' psRGB, psLab
            visibleChannelCount = 3
        Case 3          ' psCMYK
            visibleChannelCount = 4
        Case Else       ' psMultiChannel (7) and any other mode
            visibleChannelCount = inDocument.Channels.Count + 1
    End Select
    ' never ask for more channels than the document has
    If visibleChannelCount > inDocument.Channels.Count Then
        visibleChannelCount = inDocument.Channels.Count
    End If
    ' now get the channels to activate into a local (zero-based) array
    ReDim aChannelArray(visibleChannelCount - 1)
    For channelIndex = 1 To visibleChannelCount
        Set aChannelArray(channelIndex - 1) = inDocument.Channels(channelIndex)
    Next
    ' activate them so that the document histogram can be read
    inDocument.ActiveChannels = aChannelArray
End Function

' Sample standard deviation of the grey level, from a 256-bin histogram
Private Function StandardDeviationHistogram(inputArray)
    Dim numPixels, sum1, sum2, x, gray
    numPixels = 0
    sum1 = 0.0
    sum2 = 0.0
    ' compute totals for the various statistics
    For gray = 0 To 255
        x = inputArray(gray)
        numPixels = numPixels + x
        sum1 = sum1 + x * gray
        sum2 = sum2 + x * (gray * gray)
    Next
    StandardDeviationHistogram = Sqr((sum2 - (sum1 * sum1) / numPixels) / (numPixels - 1))
End Function

' Mean grey level, from a 256-bin histogram
Private Function AverageHistogram(inputArray)
    Dim numPixels, sum1, x, gray
    numPixels = 0
    sum1 = 0.0
    For gray = 0 To 255
        x = inputArray(gray)
        numPixels = numPixels + x
        sum1 = sum1 + x * gray
    Next
    AverageHistogram = sum1 / numPixels
End Function

' Median grey level: the first level whose cumulative count reaches (numPixels + 1) / 2
Private Function MedianHistogram(inputArray, numPixels)
    Dim gray, total, mid
    gray = 0
    total = inputArray(0)
    mid = (numPixels + 1) / 2
    Do While (total < mid)
        gray = gray + 1
        total = total + inputArray(gray)
    Loop
    MedianHistogram = gray
End Function
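The three statistics functions above treat the 256 histogram bins as pixel counts per grey level (0 to 255). The Python sketch below is a way to cross-check their output offline and is not part of the thesis tool; the function name `histogram_stats` is hypothetical. It applies the same formulas. The mean is sum(g * c[g]) / n. The sample standard deviation is sqrt((sum(g^2 * c[g]) - (sum(g * c[g]))^2 / n) / (n - 1)). The "median" is the first grey level whose cumulative count reaches (n + 1) / 2.

```python
import math


def histogram_stats(counts):
    """Mean, sample standard deviation and median grey level of a histogram.

    counts[g] is the number of pixels at grey level g (0-255), mirroring the
    arrays read by AverageHistogram, StandardDeviationHistogram and
    MedianHistogram. Assumes at least two pixels (the SD divides by n - 1).
    """
    n = sum(counts)
    sum1 = sum(g * c for g, c in enumerate(counts))      # sum of grey levels
    sum2 = sum(g * g * c for g, c in enumerate(counts))  # sum of squared levels
    mean = sum1 / n
    # sample (n - 1) standard deviation, same algebraic form as the VBScript
    sd = math.sqrt((sum2 - sum1 * sum1 / n) / (n - 1))
    # median: first grey level whose cumulative count reaches (n + 1) / 2
    mid = (n + 1) / 2
    total = 0
    median = None
    for g, c in enumerate(counts):
        total += c
        if total >= mid:
            median = g
            break
    return mean, sd, median


# Example: four pixels at grey levels 10, 10, 20 and 30
counts = [0] * 256
counts[10], counts[20], counts[30] = 2, 1, 1
print(histogram_stats(counts))
```

For these four pixels the mean is 17.5 and the median is 20. With an even pixel count, this median rule returns the upper of the two middle values rather than their average, so it can differ from a conventional median.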

Appendix O: UniGene annotated genes included in thesis

List sorted alphabetically by UniGene Symbol (Build #184). Also present on the CD-ROM attached to this thesis in Microsoft Excel format.

UniGene Symbol | UniGene Cluster | UniGene Name
ABCC3 | Hs.463421 | ATP-binding cassette, sub-family C (CFTR/MRP), member 3
ABCG2 | Hs.480218 | ATP-binding cassette, sub-family G (WHITE), member 2
ABLIM1 | Hs.438236 | Actin binding LIM protein 1
ACADS | Hs.507076 | Acyl-Coenzyme A dehydrogenase, C-2 to C-3 short chain
ACP6 | Hs.528084 | Acid phosphatase 6, lysophosphatidic
ADA | Hs.407135 | Adenosine deaminase
ADAMTS4 | Hs.211604 | A disintegrin-like and metalloprotease (reprolysin type) with thrombospondin type 1 motif, 4
AFP | Hs.518808 | Alpha-fetoprotein
AGR2 | Hs.530009 | Anterior gradient 2 homolog (Xenopus laevis)
AGT | Hs.19383 | Angiotensinogen (serine (or cysteine) proteinase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 8)
AHSG | Hs.324746 | Alpha-2-HS-glycoprotein
AIM2 | Hs.281898 | Absent in melanoma 2
AKAP8 | Hs.199029 | A kinase (PRKA) anchor protein 8
AKT2 | Hs.515406 | V-akt murine thymoma viral oncogene homolog 2
AMBP | Hs.436911 | Alpha-1-microglobulin/bikunin precursor
AMHR2 | Hs.437877 | Anti-Mullerian hormone receptor, type II
ANAPC7 | Hs.529280 | Anaphase promoting complex subunit 7
ANKH | Hs.156727 | Ankylosis, progressive homolog (mouse)
ANXA1 | Hs.494173 | Annexin A1
APLP1 | Hs.74565 | Amyloid beta (A4) precursor-like protein 1
APOA2 | Hs.237658 | Apolipoprotein A-II
APOB | Hs.120759 | Apolipoprotein B (including Ag(x) antigen)
ARL1 | Hs.372616 | ADP-ribosylation factor-like 1
ASS | Hs.160786 | Argininosuccinate synthetase
ASTN | Hs.495897 | Astrotactin
AZGP1 | Hs.546239 | Alpha-2-glycoprotein 1, zinc
B1 | Hs.372360 | Parathyroid hormone-responsive B1 gene
BAP1 | Hs.106674 | BRCA1 associated protein-1 (ubiquitin carboxy-terminal hydrolase)
BASE | Hs.434194 | Breast cancer and salivary gland expression gene
BAT2 | Hs.436093 | HLA-B associated transcript 2
BAX | Hs.159428 | BCL2-associated X protein
BCL2 | Hs.150749 | B-cell CLL/lymphoma 2
BDH | Hs.274539 | 3-hydroxybutyrate dehydrogenase (heart, mitochondrial)
BENE | Hs.185055 | BENE protein
BEX2 | Hs.398989 | Brain expressed X-linked 2
BF | Hs.69771 | B-factor, properdin
BGLAP | Hs.512679 | Bone gamma-carboxyglutamate (gla) protein (osteocalcin)
BIRC5 | Hs.514527 | Baculoviral IAP repeat-containing 5 (survivin)
BMP5 | Hs.296648 | Bone morphogenetic protein 5
BMP6 | Hs.285671 | Bone morphogenetic protein 6
BMP7 | Hs.473163 | Bone morphogenetic protein 7 (osteogenic protein 1)
BRAF | Hs.490366 | V-raf murine sarcoma viral oncogene homolog B1
BRCA1 | Hs.194143 | Breast cancer 1, early onset
BRCA2 | Hs.34012 | Breast cancer 2, early onset
BRF1 | Hs.424484 | BRF1 homolog, subunit of RNA polymerase III transcription initiation factor IIIB (S. cerevisiae)
BST2 | Hs.118110 | Bone marrow stromal cell antigen 2
BTG2 | Hs.519162 | BTG family, member 2
BTG3 | Hs.473420 | BTG family, member 3
C2 | Hs.408903 | Complement component 2
C3 | Hs.529053 | Complement component 3
CA2 | Hs.155097 | Carbonic anhydrase II
CALML3 | Hs.239600 | Calmodulin-like 3
CARD8 | Hs.446146 | Caspase recruitment domain family, member 8
CASP1 | Hs.2490 | Caspase 1, apoptosis-related cysteine protease (interleukin 1, beta, convertase)
CASP4 | Hs.138378 | Caspase 4, apoptosis-related cysteine protease
CAV1 | Hs.74034 | Caveolin 1, caveolae protein, 22kDa
CCL8 | Hs.271387 | Chemokine (C-C motif) ligand 8
CCNB1 | Hs.23960 | Cyclin B1
CCNB2 | Hs.194698 | Cyclin B2
CCND1 | Hs.523852 | Cyclin D1 (PRAD1: parathyroid adenomatosis 1)
CCNE1 | Hs.244723 | Cyclin E1
CCNG1 | Hs.79101 | Cyclin G1
CCR1 | Hs.301921 | Chemokine (C-C motif) receptor 1
CCR5 | Hs.546245 | Chemokine (C-C motif) receptor 5
CD4 | Hs.17483 | CD4 antigen (p55)
CD9 | Hs.114286 | CD9 antigen (p24)
CDC2 | Hs.334562 | Cell division cycle 2, G1 to S and G2 to M
CDH1 | Hs.461086 | Cadherin 1, type 1, E-cadherin (epithelial)
CDH2 | Hs.464829 | Cadherin 2, type 1, N-cadherin (neuronal)
CDH6 | Hs.171054 | Cadherin 6, type 2, K-cadherin (fetal kidney)
CDK4 | Hs.95577 | Cyclin-dependent kinase 4
CDK6 | Hs.119882 | Cyclin-dependent kinase 6
CEACAM1 | Hs.512682 | Carcinoembryonic antigen-related cell adhesion molecule 1 (biliary glycoprotein)
CFTR | Hs.489786 | Cystic fibrosis transmembrane conductance regulator, ATP-binding cassette (sub-family C, member 7)
CHD1 | Hs.519474 | Chromodomain helicase DNA binding protein 1
CHL1 | Hs.148909 | Cell adhesion molecule with homology to L1CAM (close homolog of L1)
CHST1 | Hs.104576 | Carbohydrate (keratan sulfate Gal-6) sulfotransferase 1
CHST2 | Hs.8786 | Carbohydrate (N-acetylglucosamine-6-O) sulfotransferase 2
CKM | Hs.334347 | Creatine kinase, muscle
CLDN3 | Hs.25640 | Claudin 3
CNN1 | Hs.465929 | Calponin 1, basic, smooth muscle
COMP | Hs.1584 | Cartilage oligomeric matrix protein
CP | Hs.550470 | Ceruloplasmin (ferroxidase)
CPZ | Hs.78068 | Carboxypeptidase Z
CR1 | Hs.334019 | Complement component (3b/4b) receptor 1, including Knops blood group system
CRABP2 | Hs.405662 | Cellular retinoic acid binding protein 2
CREM | Hs.200250 | CAMP responsive element modulator
CRIP1 | Hs.70327 | Cysteine-rich protein 1 (intestinal)
CRIP2 | Hs.534309 | Cysteine-rich protein 2
CROP | Hs.130293 | Cisplatin resistance-associated overexpressed protein
CRYAB | Hs.408767 | Crystallin, alpha B
CSPG2 | Hs.443681 | Chondroitin sulfate proteoglycan 2 (versican)
CSTF3 | Hs.44402 | Hypothetical protein LOC283267
CTGF | Hs.410037 | Connective tissue growth factor
CTNNAL1 | Hs.58488 | Catenin (cadherin-associated protein), alpha-like 1
CUGBP2 | Hs.309288 | CUG triplet repeat, RNA binding protein 2
CUL7 | Hs.520136 | Cullin 7
CXCL9 | Hs.77367 | Chemokine (C-X-C motif) ligand 9
DAPK1 | Hs.380277 | Death-associated protein kinase 1
DCBLD2 | Hs.203691 | Discoidin, CUB and LCCL domain containing 2
DCDC2 | Hs.512603 | Doublecortin domain containing 2
DDR1 | Hs.520004 | Discoidin domain receptor family, member 1
DEFB1 | Hs.32949 | Defensin, beta 1
DLGAP3 | Hs.436393 | Discs, large (Drosophila) homolog-associated protein 3
DLK1 | Hs.533717 | Delta-like 1 homolog (Drosophila)
DMBT1 | Hs.279611 | Deleted in malignant brain tumors 1
DNAJD1 | Hs.438830 | DnaJ (Hsp40) homolog, subfamily D, member 1
DSPG3 | Hs.435680 | Dermatan sulfate proteoglycan 3
EDEM1 | Hs.224616 | ER degradation enhancer, mannosidase alpha-like 1
EFEMP2 | Hs.381870 | EGF-containing fibulin-like extracellular matrix protein 2
EFHD1 | Hs.516769 | EF hand domain family, member D1
EFNB3 | Hs.26988 | Ephrin-B3
EGF | Hs.419815 | Epidermal growth factor (beta-urogastrone)
EGR1 | Hs.326035 | Early growth response 1
ENPP3 | Hs.486489 | Ectonucleotide pyrophosphatase/phosphodiesterase 3
ENTPD1 | Hs.550467 | Ectonucleoside triphosphate diphosphohydrolase 1
ERBB2 | Hs.446352 | V-erb-b2 erythroblastic leukemia viral oncogene homolog 2, neuro/glioblastoma derived oncogene homolog (avian)
ERBB3 | Hs.118681 | V-erb-b2 erythroblastic leukemia viral oncogene homolog 3 (avian)
ERCC2 | Hs.487294 | Excision repair cross-complementing rodent repair deficiency, complementation group 2 (xeroderma pigmentosum D)
ESR1 | Hs.208124 | Estrogen receptor 1
ET | Hs.464166 | Hypothetical protein ET
ETS1 | Hs.369438 | V-ets erythroblastosis virus E26 oncogene homolog 1 (avian)
ETS2 | Hs.517296 | V-ets erythroblastosis virus E26 oncogene homolog 2 (avian)
ETV1 | Hs.22634 | Hypothetical protein LOC221810
ETV3 | Hs.352672 | Ets variant gene 3
F2 | Hs.76530 | Coagulation factor II (thrombin)
FAAH | Hs.528334 | Fatty acid amide hydrolase
FANCF | Hs.523543 | Fanconi anemia, complementation group F
FBLN1 | Hs.24601 | Fibulin 1
FBP2 | Hs.61255 | Fructose-1,6-bisphosphatase 2
FBXL7 | Hs.433057 | F-box and leucine-rich repeat protein 7
FECH | Hs.465221 | Ferrochelatase (protoporphyria)
FGF7 | Hs.122006 | Galactokinase 2
FGL1 | Hs.491143 | Fibrinogen-like 1
FGR | Hs.1422 | Gardner-Rasheed feline sarcoma viral (v-fgr) oncogene homolog
FLRT2 | Hs.533710 | Fibronectin leucine rich transmembrane protein 2
FMO5 | Hs.303476 | Flavin containing monooxygenase 5
FN1 | Hs.203717 | Fibronectin 1
FNTB | Hs.509651 | Farnesyltransferase, CAAX box, beta
FOLR1 | Hs.73769 | Folate receptor 1 (adult)
FOXF2 | Hs.484423 | Forkhead box F2
FOXH1 | Hs.449410 | Forkhead box H1
FPR1 | Hs.753 | Formyl peptide receptor 1
G2 | Hs.502266 | G2 protein
GAB2 | Hs.429434 | GRB2-associated binding protein 2
GABRA1 | Hs.175934 | Gamma-aminobutyric acid (GABA) A receptor, alpha 1
GAGEB1 | Hs.128231 | P antigen family, member 1 (prostate associated)
GALNT1 | Hs.514806 | UDP-N-acetyl-alpha-D-galactosamine:polypeptide N-acetylgalactosaminyltransferase 1 (GalNAc-T1)
GALNT5 | Hs.269027 | UDP-N-acetyl-alpha-D-galactosamine:polypeptide N-acetylgalactosaminyltransferase 5 (GalNAc-T5)
GAN | Hs.112569 | Giant axonal neuropathy (gigaxonin)
GAS6 | Hs.369201 | Growth arrest-specific 6
GATA3 | Hs.524134 | GATA binding protein 3
GATA4 | Hs.243987 | GATA binding protein 4
GC | Hs.418497 | Group-specific component (vitamin D binding protein)
GCLM | Hs.315562 | Glutamate-cysteine ligase, modifier subunit
GCSH | Hs.546256 | Glycine cleavage system protein H (aminomethyl carrier)
GFI1 | Hs.73172 | Growth factor independent 1
GJB2 | Hs.524894 | Gap junction protein, beta 2, 26kDa (connexin 26)
GLDC | Hs.149156 | Glycine dehydrogenase (decarboxylating; glycine decarboxylase, glycine cleavage system protein P)
GLRX | Hs.28988 | Glutaredoxin (thioltransferase)
GMPR | Hs.484741 | Guanosine monophosphate reductase
GNB5 | Hs.155090 | Guanine nucleotide binding protein (G protein), beta 5
GNLY | Hs.105806 | Granulysin
GPC3 | Hs.435036 | Glypican 3
GPX2 | Hs.2704 | Glutathione peroxidase 2 (gastrointestinal)
GPX3 | Hs.386793 | Glutathione peroxidase 3 (plasma)
GPX7 | Hs.43728 | Glutathione peroxidase 7
GRB2 | Hs.444356 | Growth factor receptor-bound protein 2
GSS | Hs.82327 | Glutathione synthetase
GTPBP4 | Hs.215766 | GTP binding protein 4
GZMA | Hs.90708 | Granzyme A (granzyme 1, cytotoxic T-lymphocyte-associated serine esterase 3)
HBB | Hs.523443 | Hemoglobin, beta
HBP1 | Hs.162032 | HMG-box transcription factor 1
HDAC1 | Hs.88556 | Histone deacetylase 1
HERPUD1 | Hs.146393 | Homocysteine-inducible, endoplasmic reticulum stress-inducible, ubiquitin-like domain member 1
HIC1 | Hs.72956 | Hypermethylated in cancer 1
HOXB6 | Hs.98428 | Homeo box B6
HOXD4 | Hs.386365 | Homeo box D4
HOXD8 | Hs.301963 | Homeo box D8
HP | Hs.513711 | Haptoglobin
HPRT1 | Hs.412707 | Hypoxanthine phosphoribosyltransferase 1 (Lesch-Nyhan syndrome)
HPS3 | Hs.477898 | Hermansky-Pudlak syndrome 3
HSPA6 | Hs.3268 | Heat shock 70kDa protein 6 (HSP70B')
ID4 | Hs.519601 | Inhibitor of DNA binding 4, dominant negative helix-loop-helix protein
IFIT1 | Hs.20315 | Interferon-induced protein with tetratricopeptide repeats 1
IFITM2 | Hs.174195 | Interferon induced transmembrane protein 2 (1-8D)
IFNAR2 | Hs.549042 | Interferon (alpha, beta and omega) receptor 2
IFNGR2 | Hs.517240 | Interferon gamma receptor 2 (interferon gamma transducer 1)
IGFBP5 | Hs.369982 | Insulin-like growth factor binding protein 5
IGHD | Hs.439852 | Immunoglobulin heavy constant delta
IGHG1 | Hs.525648 | Immunoglobulin heavy constant gamma 1 (G1m marker)
IGLC2 | Hs.449585 | Immunoglobulin lambda variable 3-21
IGLL1 | Hs.348935 | Immunoglobulin lambda-like polypeptide 1
IL3 | Hs.694 | Interleukin 3 (colony-stimulating factor, multiple)
IL8 | Hs.624 | Interleukin 8
IMPA1 | Hs.492120 | Inositol(myo)-1(or 4)-monophosphatase 1
IMPDH2 | Hs.476231 | IMP (inosine monophosphate) dehydrogenase 2
INDO | Hs.840 | Indoleamine-pyrrole 2,3 dioxygenase
INHBA | Hs.28792 | Inhibin, beta A (activin A, activin AB alpha polypeptide)
IQGAP2 | Hs.291030 | IQ motif containing GTPase activating protein 2
IRF1 | Hs.436061 | Interferon regulatory factor 1
ISYNA1 | Hs.405873 | Myo-inositol 1-phosphate synthase A1
ITGA2 | Hs.482077 | Integrin, alpha 2 (CD49B, alpha 2 subunit of VLA-2 receptor)
ITGA4 | Hs.553495 | Integrin, alpha 4 (antigen CD49D, alpha 4 subunit of VLA-4 receptor)
ITGA6 | Hs.133397 | Integrin, alpha 6
ITGA9 | Hs.113157 | Integrin, alpha 9
ITGB2 | Hs.375957 | Integrin, beta 2 (antigen CD18 (p95), lymphocyte function-associated antigen 1; macrophage antigen 1 (mac-1) beta subunit)
ITGB8 | Hs.285724 | Integrin, beta 8
ITGBL1 | Hs.508597 | Integrin, beta-like 1 (with EGF-like repeat domains)
ITIH2 | Hs.75285 | Inter-alpha (globulin) inhibitor H2
ITPR2 | Hs.512235 | Inositol 1,4,5-triphosphate receptor, type 2
JDP2 | Hs.196482 | Jun dimerization protein 2
JUN | Hs.525704 | V-jun sarcoma virus 17 oncogene homolog (avian)
KAL1 | Hs.521869 | Kallmann syndrome 1 sequence
KCNH2 | Hs.188021 | Potassium voltage-gated channel, subfamily H (eag-related), member 2
KCNMB2 | Hs.478368 | Potassium large conductance calcium-activated channel, subfamily M, beta member 2
KCNS3 | Hs.414489 | Potassium voltage-gated channel, delayed-rectifier, subfamily S, member 3
KDR | Hs.479756 | Kinase insert domain receptor (a type III receptor tyrosine kinase)
KLK2 | Hs.515560 | Kallikrein 2, prostatic
KLK4 | Hs.218366 | Kallikrein 4 (prostase, enamel matrix, prostate)
KLK5 | Hs.50915 | Kallikrein 5
KLK7 | Hs.151254 | Kallikrein 7 (chymotryptic, stratum corneum)
KLK8 | Hs.104570 | Kallikrein 8 (neuropsin/ovasin)
KLK9 | Hs.448942 | Kallikrein 9
KRAS2 | Hs.505033 | V-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog
LAMB2 | Hs.439726 | Laminin, beta 2 (laminin S)
LATS2 | Hs.78960 | LATS, large tumor suppressor, homolog 2 (Drosophila)
LBP | Hs.154078 | Lipopolysaccharide binding protein
LCP1 | Hs.381099 | Lymphocyte cytosolic protein 1 (L-plastin)
LDOC1 | Hs.45231 | Leucine zipper, down-regulated in cancer 1
LGALS1 | Hs.445351 | Lectin, galactoside-binding, soluble, 1 (galectin 1)
LGALS4 | Hs.5302 | Lectin, galactoside-binding, soluble, 4 (galectin 4)
LHB | Hs.154704 | Luteinizing hormone beta polypeptide
LOX | Hs.102267 | Lysyl oxidase
LPHN2 | Hs.24212 | Latrophilin 2
LTB | Hs.376208 | Lymphotoxin beta (TNF superfamily, member 3)
LTBP2 | Hs.512776 | Latent transforming growth factor beta binding protein 2
LTBR | Hs.1116 | Lymphotoxin beta receptor (TNFR superfamily, member 3)
LU | Hs.155048 | Lutheran blood group (Auberger b antigen included)
LYPDC1 | Hs.432395 | LY6/PLAUR domain containing 1
LYRIC | Hs.377155 | LYRIC/3D3
MAG | Hs.348346 | Malignancy-associated protein
MALAT1 | Hs.187199 | Metastasis associated lung adenocarcinoma transcript 1 (non-coding RNA)
MAPK1 | Hs.431850 | Mitogen-activated protein kinase 1
MAPK8 | Hs.522924 | Mitogen-activated protein kinase 8
MATN2 | Hs.189445 | Matrilin 2
MATN3 | Hs.6985 | Matrilin 3
MAX | Hs.285354 | MYC associated factor X
MEG3 | Hs.525589 | Maternally expressed 3
MEIS1 | Hs.526754 | Meis1, myeloid ecotropic viral integration site 1 homolog (mouse)
MEIS2 | Hs.510989 | Meis1, myeloid ecotropic viral integration site 1 homolog 2 (mouse)
MGST3 | Hs.191734 | Microsomal glutathione S-transferase 3
MIB1 | Hs.140903 | Mindbomb homolog 1 (Drosophila)
MLF2 | Hs.524214 | Myeloid leukemia factor 2
MPO | Hs.458272 | Myeloperoxidase
MSH2 | Hs.156519 | MutS homolog 2, colon cancer, nonpolyposis type 1 (E. coli)
MSLN | Hs.408488 | Mesothelin
MUC1 | Hs.89603 | Mucin 1, transmembrane
MUC4 | Hs.369646 | Mucin 4, tracheobronchial
MX1 | Hs.517307 | Myxovirus (influenza virus) resistance 1, interferon-inducible protein p78 (mouse)
MYCN | Hs.25960 | V-myc myelocytomatosis viral related oncogene, neuroblastoma derived (avian)
MYH8 | Hs.534028 | Myosin, heavy polypeptide 8, skeletal muscle, perinatal
MYLK | Hs.477375 | Myosin, light polypeptide kinase
MYOD1 | Hs.181768 | Myogenic factor 3
NAALAD2 | Hs.503560 | N-acetylated alpha-linked acidic dipeptidase 2
NAPSA | Hs.512843 | Napsin A aspartic peptidase
NBS1 | Hs.492208 | Nijmegen breakage syndrome 1 (nibrin)
NCAM1 | Hs.503878 | Neural cell adhesion molecule 1
NCL | Hs.79110 | Nucleolin
NDP | Hs.522615 | Norrie disease (pseudoglioma)
NFATC4 | Hs.77810 | Nuclear factor of activated T-cells, cytoplasmic, calcineurin-dependent 4
NFKBIA | Hs.81328 | Nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor, alpha
NGFB | Hs.2561 | Nerve growth factor, beta polypeptide
NGFRAP1 | Hs.448588 | Nerve growth factor receptor (TNFRSF16) associated protein 1
NID2 | Hs.369840 | Nidogen 2 (osteonidogen)
NLGN1 | Hs.549114 | Neuroligin 1
NPTX2 | Hs.3281 | Neuronal pentraxin II
NRG2 | Hs.408515 | Neuregulin 2
NRGN | Hs.524116 | Neurogranin (protein kinase C substrate, RC3)
NRN1 | Hs.103291 | Neuritin 1
NRXN2 | Hs.372938 | Neurexin 2
OAT | Hs.523332 | Ornithine aminotransferase (gyrate atrophy)
ODZ4 | Hs.213087 | Odz, odd Oz/ten-m homolog 4 (Drosophila)
OLFM4 | Hs.508113 | Olfactomedin 4
OPHN1 | Hs.128824 | Oligophrenin 1
ORM2 | Hs.522356 | Orosomucoid 2
OXTR | Hs.2820 | Oxytocin receptor
PAEP | Hs.532325 | Progestagen-associated endometrial protein (placental protein 14, pregnancy-associated endometrial alpha-2-globulin, alpha uterine protein)
PARP4 | Hs.117825 | Poly (ADP-ribose) polymerase family, member 4
PAX8 | Hs.469728 | Paired box gene 8
PBX1 | Hs.493096 | Pre-B-cell leukemia transcription factor 1
PC | Hs.89890 | Pyruvate carboxylase
PCP4 | Hs.80296 | Purkinje cell protein 4
PDCD8 | Hs.424932 | Programmed cell death 8 (apoptosis-inducing factor)
PDZK3 | Hs.481819 | PDZ domain containing 3
PECAM1 | Hs.514412 | Platelet/endothelial cell adhesion molecule (CD31 antigen)
PEG3 | Hs.201776 | Paternally expressed 3
PFKM | Hs.75160 | Phosphofructokinase, muscle
PHB | Hs.514303 | Prohibitin
PHLDA2 | Hs.154036 | Pleckstrin homology-like domain, family A, member 2
PITX2 | Hs.92282 | Paired-like homeodomain transcription factor 2
PLAGL2 | Hs.154104 | Pleiomorphic adenoma gene-like 2
PLAU | Hs.77274 | Plasminogen activator, urokinase
PLAUR | Hs.466871 | Plasminogen activator, urokinase receptor
PLTP | Hs.439312 | Phospholipid transfer protein
PLXNB1 | Hs.476209 | Plexin B1
PLXND1 | Hs.301685 | Plexin D1
PMM2 | Hs.459855 | Phosphomannomutase 2
PNMA1 | Hs.194709 | Paraneoplastic antigen MA1
PNOC | Hs.88218 | Prepronociceptin
POSTN | Hs.136348 | Periostin, osteoblast specific factor
PPARG | Hs.162646 | Peroxisome proliferative activated receptor, gamma
PPFIBP1 | Hs.172445 | PTPRF interacting protein, binding protein 1 (liprin beta 1)
PRAME | Hs.30743 | Preferentially expressed antigen in melanoma
PRG2 | Hs.512633 | Proteoglycan 2, bone marrow (natural killer cell activator, eosinophil granule major basic protein)
PRKCI | Hs.399873 | Protein kinase C, iota
PROM1 | Hs.479220 | Prominin 1
PRSS8 | Hs.75799 | Protease, serine, 8 (prostasin)
PSME1 | Hs.75348 | Proteasome (prosome, macropain) activator subunit 1 (PA28 alpha)
PTGIS | Hs.302085 | Prostaglandin I2 (prostacyclin) synthase
PTGS1 | Hs.201978 | Prostaglandin-endoperoxide synthase 1 (prostaglandin G/H synthase and cyclooxygenase)
PTGS2 | Hs.196384 | Prostaglandin-endoperoxide synthase 2 (prostaglandin G/H synthase and cyclooxygenase)
PTK2 | Hs.395482 | PTK2 protein tyrosine kinase 2
PTOV1 | Hs.515540 | Prostate tumor overexpressed gene 1
PTPRF | Hs.272062 | Protein tyrosine phosphatase, receptor type, F
PTPRN2 | Hs.490789 | Protein tyrosine phosphatase, receptor type, N polypeptide 2
PTPRU | Hs.19718 | Protein tyrosine phosphatase, receptor type, U
PTTG1 | Hs.350966 | Pituitary tumor-transforming 1
PVALB | Hs.295449 | Parvalbumin
PVRL3 | Hs.293917 | Poliovirus receptor-related 3
RALBP1 | Hs.528993 | RalA binding protein 1
RAN | Hs.10842 | RAN, member RAS oncogene family
RAPGEF3 | Hs.8578 | Rap guanine nucleotide exchange factor (GEF) 3
RB1 | Hs.408528 | Retinoblastoma 1 (including osteosarcoma)
RBBP6 | Hs.188553 | Retinoblastoma binding protein 6
RBP1 | Hs.529571 | Retinol binding protein 1, cellular
RCN2 | Hs.79088 | Reticulocalbin 2, EF-hand calcium binding domain
REPS2 | Hs.186810 | RALBP1 associated Eps domain containing 2
RGS1 | Hs.75256 | Regulator of G-protein signalling 1
RHOH | Hs.160673 | Ras homolog gene family, member H
RNASE4 | Hs.283749 | Angiogenin, ribonuclease, RNase A family, 5
RNASE6 | Hs.23262 | Ribonuclease, RNase A family, k6
RNPC1 | Hs.236361 | RNA-binding region (RNP1, RRM) containing 1
RRM2 | Hs.226390 | Ribonucleotide reductase M2 polypeptide
RUNX3 | Hs.170019 | Runt-related transcription factor 3
RYR1 | Hs.466664 | Ryanodine receptor 1 (skeletal)
SALL2 | Hs.134709 | Sal-like 2 (Drosophila)
SCAP1 | Hs.316931 | Src family associated phosphoprotein 1
SCARA3 | Hs.128856 | Scavenger receptor class A, member 3
SCD | Hs.368641 | Stearoyl-CoA desaturase (delta-9-desaturase)
SCOC | Hs.480815 | Short coiled-coil protein
SCYE1 | Hs.480465 | Small inducible cytokine subfamily E, member 1 (endothelial monocyte-activating)
SDC3 | Hs.158287 | Syndecan 3 (N-syndecan)
SDS | Hs.439023 | Serine dehydratase
SERPINA5 | Hs.510334 | Serine (or cysteine) proteinase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 5
SERPING1 | Hs.384598 | Serine (or cysteine) proteinase inhibitor, clade G (C1 inhibitor), member 1, (angioedema, hereditary)
SESN1 | Hs.59554 | Sestrin 1
SFN | Hs.523718 | Stratifin
SFTPB | Hs.512690 | Surfactant, pulmonary-associated protein B
SGCB | Hs.438953 | Sarcoglycan, beta (43kDa dystrophin-associated glycoprotein)
SGCE | Hs.371199 | Sarcoglycan, epsilon
SHD1 | Hs.23642 | SAC3 domain containing 1
SKIIP | Hs.445498 | SKI interacting protein
SLPI | Hs.517070 | Secretory leukocyte protease inhibitor (antileukoproteinase)
SMAD7 | Hs.465087 | SMAD, mothers against DPP homolog 7 (Drosophila)
SMARCAL1 | Hs.516674 | SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily a-like 1
SMARCD3 | Hs.444445 | SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily d, member 3
SMCY | Hs.80358 | Smcy homolog, Y-linked (mouse)
SOC | Hs.145061 | Socius
SPINK1 | Hs.407856 | Serine protease inhibitor, Kazal type 1
SPON1 | Hs.445818 | Spondin 1, extracellular matrix protein
SPP1 | Hs.313 | Secreted phosphoprotein 1 (osteopontin, bone sialoprotein I, early T-lymphocyte activation 1)
SPTBN1 | Hs.503178 | Spectrin, beta, non-erythrocytic 1
SRY | Hs.1992 | Sex determining region Y
SSPN | Hs.183428 | Sarcospan (Kras oncogene-associated gene)
ST5 | Hs.117715 | Suppression of tumorigenicity 5
STAT3 | Hs.463059 | Signal transducer and activator of transcription 3 (acute-phase response factor)
STK6 | Hs.250822 | Serine/threonine kinase 6
STOM | Hs.253903 | Stomatin
STX6 | Hs.518417 | Syntaxin 6
SULF1 | Hs.409602 | Sulfatase 1
SYK | Hs.371720 | Spleen tyrosine kinase
SYT7 | Hs.502730 | Synaptotagmin VII
T1 | Hs.26814 | Tularik gene 1
TBP | Hs.1100 | TATA box binding protein
TCF7 | Hs.519580 | Transcription factor 7 (T-cell specific, HMG-box)
TCF8 | Hs.124503 | Transcription factor 8 (represses interleukin 2 expression)
TFF1 | Hs.162807 | Trefoil factor 1 (breast cancer, estrogen-inducible sequence expressed in)
TFF3 | Hs.82961 | Trefoil factor 3 (intestinal)
TGFA | Hs.170009 | Transforming growth factor, alpha
TIMELESS | Hs.118631 | Timeless homolog (Drosophila)
TLE4 | Hs.444213 | Transducin-like enhancer of split 4 (E(sp1) homolog, Drosophila)
TLR3 | Hs.29499 | Toll-like receptor 3
TMEFF1 | Hs.336224 | Transmembrane protein with EGF-like and two follistatin-like domains 1
TNC | Hs.143250 | Tenascin C (hexabrachion)
TNF | Hs.241570 | Tumor necrosis factor (TNF superfamily, member 2)
TNFAIP2 | Hs.525607 | Tumor necrosis factor, alpha-induced protein 2
TNFAIP6 | Hs.437322 | Tumor necrosis factor, alpha-induced protein 6
TNNT1 | Hs.534085 | Troponin T1, skeletal, slow
TNRC9 | Hs.460789 | Trinucleotide repeat containing 9
TRA1 | Hs.192374 | Tumor rejection antigen (gp96) 1
TREX1 | Hs.344812 | Three prime repair exonuclease 1
TRIP6 | Hs.534360 | Thyroid hormone receptor interactor 6
TRMT1 | Hs.439524 | TRNA (5-methylaminomethyl-2-thiouridylate)-methyltransferase 1
TRO | Hs.434971 | Trophinin
TTK | Hs.169840 | TTK protein kinase
TYROBP | Hs.515369 | TYRO protein tyrosine kinase binding protein
UCC1 | Hs.416007 | Ependymin related protein 1 (zebrafish)
UCHL1 | Hs.518731 | Ubiquitin carboxyl-terminal esterase L1 (ubiquitin thiolesterase)
UCP2 | Hs.80658 | Uncoupling protein 2 (mitochondrial, proton carrier)
UTY | Hs.115277 | Ubiquitously transcribed tetratricopeptide repeat gene, Y-linked
VAV3 | Hs.267659 | Vav 3 oncogene
VEGF | Hs.73793 | Vascular endothelial growth factor
VIL1 | Hs.534364 | Villin 1
VIL2 | Hs.487027 | Villin 2 (ezrin)
WAS | Hs.2157 | Wiskott-Aldrich syndrome (eczema-thrombocytopenia)
WASF1 | Hs.75850 | WAS protein family, member 1
WBP5 | Hs.533287 | WW domain binding protein 5
WNT1 | Hs.248164 | Wingless-type MMTV integration site family, member 1
WT1 | Hs.408453 | Wilms tumor 1
XPR1 | Hs.227656 | Xenotropic and polytropic retrovirus receptor
XTP2 | Hs.494614 | BAT2 domain containing 1
YIF1 | Hs.446445 | Yip1 interacting factor homolog (S. cerevisiae)
ZAK | Hs.444451 | Sterile alpha motif and leucine zipper containing kinase AZK
ZFPM2 | Hs.431009 | Zinc finger protein, multitype 2

Appendix P: Genes differentially expressed between serous LMP and invasive EOC after excluding those involved in cell-cycle regulation and the immune response

Table sorted by increasing mean fold change difference (mean invasive EOC expression ratio divided by mean LMP expression ratio).
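In every row of the table that follows, the "Mean fold change difference" column matches the invasive EOC mean ratio divided by the LMP mean ratio, allowing for rounding. For CLDN10, for example, 0.554 / 7.051 is approximately 0.079. Values below 1 therefore indicate higher expression in LMP tumours. The Python sketch below assumes this interpretation. It recomputes the column for three rows copied from the table and reproduces their order; `fold_change` is a hypothetical helper name. The published values were presumably derived from unrounded means, so recomputing from the three-decimal table values can shift the last digit. SPRY2, for instance, gives 0.351 against the listed 0.352.

```python
# Three rows copied from the table below:
# (symbol, mean expression ratio in invasive EOC, mean expression ratio in LMP)
rows = [
    ("UPK1B", 0.333, 1.783),
    ("CLDN10", 0.554, 7.051),
    ("CHL1", 0.197, 2.129),
]


def fold_change(invasive, lmp):
    """Invasive EOC mean ratio divided by LMP mean ratio, rounded to 3 places."""
    return round(invasive / lmp, 3)


# Ascending order lists the genes most strongly favouring LMP first
ranked = sorted(rows, key=lambda row: fold_change(row[1], row[2]))
for symbol, invasive, lmp in ranked:
    print(symbol, fold_change(invasive, lmp))
```

This prints CLDN10 0.079, CHL1 0.093 and UPK1B 0.187, which matches the first three rows of the table.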

Symbol | Mean expression ratio: invasive EOC | Mean expression ratio: LMP | Mean fold change difference | UniGene symbol | UniGene cluster | UniGene name
CLDN10 | 0.554 | 7.051 | 0.079 | CLDN10 | Hs.534377 | Claudin 10
CHL1 | 0.197 | 2.129 | 0.093 | CHL1 | Hs.148909 | Cell adhesion molecule with homology to L1CAM (close homolog of L1)
UPK1B | 0.333 | 1.783 | 0.187 | UPK1B | Hs.271580 | Uroplakin 1B
TSPAN1 | 0.223 | 1.085 | 0.205 | TSPAN1 | Hs.38972 | Tetraspanin 1
DLEC1 | 0.961 | 4.546 | 0.211 | DLEC1 | Hs.277589 | Deleted in lung and esophageal cancer 1
TFF3 | 0.277 | 1.206 | 0.230 | TFF3 | Hs.82961 | Trefoil factor 3 (intestinal)
KLK11 | 0.586 | 2.374 | 0.247 | KLK11 | Hs.57771 | Kallikrein 11
MUC4 | 0.717 | 2.727 | 0.263 | MUC4 | Hs.369646 | Mucin 4, tracheobronchial
ARHI | 0.346 | 1.252 | 0.276 | ARHI | Hs.194695 | DIRAS family, GTP-binding RAS-like 3
TCF21 | 0.251 | 0.906 | 0.277 | TCF21 | Hs.78061 | Transcription factor 21
ANXA4 | 0.305 | 1.076 | 0.283 | ANXA4 | Hs.422986 | Annexin A4
ARHI | 0.437 | 1.488 | 0.294 | ARHI | Hs.194695 | DIRAS family, GTP-binding RAS-like 3
UCC1 | 0.305 | 1.031 | 0.296 | UCC1 | Hs.416007 | Ependymin related protein 1 (zebrafish)
PROM1 | 0.365 | 1.203 | 0.303 | PROM1 | Hs.479220 | Prominin 1
CAV2 | 0.342 | 1.115 | 0.307 | CAV2 | Hs.212332 | Caveolin 2
FLRT3 | 0.445 | 1.388 | 0.321 | FLRT3 | Hs.41296 | Fibronectin leucine rich transmembrane protein 3
PPL | 1.008 | 3.050 | 0.330 | PPL | Hs.192233 | Periplakin
CDH8 | 0.623 | 1.821 | 0.342 | CDH8 | Hs.368322 | Cadherin 8, type 2
ARGBP2 | 0.432 | 1.246 | 0.346 | ARGBP2 | Hs.481342 | Arg/Abl-interacting protein ArgBP2
PAR1 | 0.482 | 1.388 | 0.347 | PAR1 | Hs.546847 | Prader-Willi/Angelman region-1
SPRY2 | 0.754 | 2.146 | 0.352 | SPRY2 | Hs.18676 | Sprouty homolog 2 (Drosophila)
TACC1 | 0.446 | 1.247 | 0.357 | TACC1 | Hs.279245 | Transforming, acidic coiled-coil containing protein 1
MET | 0.690 | 1.814 | 0.380 | MET | Hs.132966 | Met proto-oncogene (hepatocyte growth factor receptor)
NRCAM | 1.066 | 2.792 | 0.382 | NRCAM | Hs.21422 | Neuronal cell adhesion molecule
CTSL2 | 1.110 | 2.893 | 0.384 | CTSL2 | Hs.87417 | Cathepsin L2
PTPN3 | 0.668 | 1.722 | 0.388 | PTPN3 | Hs.436429 | Protein tyrosine phosphatase, non-receptor type 3
DHX34 | 0.579 | 1.485 | 0.390 | DHX34 | Hs.151706 | DEAH (Asp-Glu-Ala-His) box polypeptide 34
PRKCG | 0.435 | 1.105 | 0.394 | PRKCG | Hs.2890 | Protein kinase C, gamma
INHBB | 0.668 | 1.681 | 0.398 | INHBB | Hs.1735 | Inhibin, beta B (activin AB beta polypeptide)
DDX6 | 0.618 | 1.538 | 0.402 | DDX6 | Hs.408461 | DEAD (Asp-Glu-Ala-Asp) box polypeptide 6
CD44 | 0.521 | 1.285 | 0.405 | CD44 | Hs.502328 | CD44 antigen (homing function and Indian blood group system)
SSPN | 1.095 | 2.696 | 0.406 | SSPN | Hs.183428 | Sarcospan (Kras oncogene-associated gene)
CDH1 | 0.576 | 1.404 | 0.410 | CDH1 | Hs.461086 | Cadherin 1, type 1, E-cadherin (epithelial)
ITGA2 | 0.481 | 1.159 | 0.415 | ITGA2 | Hs.482077 | Integrin, alpha 2 (CD49B, alpha 2 subunit of VLA-2 receptor)
TNFRSF21 | 0.556 | 1.265 | 0.439 | TNFRSF21 | Hs.443577 | Tumor necrosis factor receptor superfamily, member 21
MMP10 | 0.780 | 1.772 | 0.440 | MMP10 | Hs.2258 | Matrix metalloproteinase 10 (stromelysin 2)
PDGFRA | 0.520 | 1.176 | 0.442 | PDGFRA | Hs.74615 | Platelet-derived growth factor receptor, alpha polypeptide
FLRT2 | 0.392 | 0.882 | 0.445 | FLRT2 | Hs.533710 | Fibronectin leucine rich transmembrane protein 2
SOX4 | 0.715 | 1.574 | 0.454 | SOX4 | Hs.357901 | SRY (sex determining region Y)-box 4
PTPRN2 | 0.337 | 0.737 | 0.457 | PTPRN2 | Hs.490789 | Protein tyrosine phosphatase, receptor type, N polypeptide 2
ABCC2 | 0.516 | 1.121 | 0.460 | ABCC2 | Hs.368243 | ATP-binding cassette, sub-family C (CFTR/MRP), member 2
ANXA13 | 0.497 | 1.078 | 0.461 | ANXA13 | Hs.181107 | Annexin A13
PTGS2 | 0.934 | 2.006 | 0.466 | PTGS2 | Hs.196384 | Prostaglandin-endoperoxide synthase 2 (prostaglandin G/H synthase and cyclooxygenase)
CD47 | 1.044 | 2.238 | 0.466 | CD47 | Hs.446414 | CD47 antigen (Rh-related antigen, integrin-associated signal transducer)
MET | 0.686 | 1.462 | 0.469 | MET | Hs.132966 | Met proto-oncogene (hepatocyte growth factor receptor)
NRXN3 | 0.885 | 1.881 | 0.470 | NRXN3 | Hs.368307 | Neurexin 3
SORL1 | 1.040 | 2.193 | 0.474 | SORL1 | Hs.368592 | Sortilin-related receptor, L(DLR class) A repeats-containing
LIFR | 0.630 | 1.321 | 0.477 | LIFR | Hs.133421 | Leukemia inhibitory factor receptor
ACVR1B | 0.729 | 1.510 | 0.483 | ACVR1B | Hs.438918 | Activin A receptor, type IB
VIL2 | 0.653 | 1.350 | 0.484 | VIL2 | Hs.487027 | Villin 2 (ezrin)
MAPK10 | 0.479 | 0.989 | 0.485 | MAPK10 | Hs.25209 | Mitogen-activated protein kinase 10
PTPRU | 0.967 | 1.992 | 0.486 | PTPRU | Hs.19718 | Protein tyrosine phosphatase, receptor type, U
CDH11 | 0.519 | 1.048 | 0.495 | CDH11 | Hs.116471 | Cadherin 11, type 2, OB-cadherin (osteoblast)
FGFR3 | 0.632 | 1.235 | 0.511 | FGFR3 | Hs.1420 | Fibroblast growth factor receptor 3 (achondroplasia, thanatophoric dwarfism)
TM4SF9 | 0.563 | 1.080 | 0.521 | TM4SF9 | Hs.118118 | Tetraspanin 5
LAD1 | 0.592 | 1.126 | 0.526 | LAD1 | Hs.519035 | Ladinin 1
PTK6 | 0.472 | 0.892 | 0.530 | PTK6 | Hs.51133 | PTK6 protein tyrosine kinase 6
KLK6 | 1.171 | 2.171 | 0.539 | KLK6 | Hs.79361 | Kallikrein 6 (neurosin, zyme)
AKAP12 | 0.636 | 1.160 | 0.549 | AKAP12 | Hs.371240 | A kinase (PRKA) anchor protein (gravin) 12
FGFR3 | 0.705 | 1.248 | 0.565 | FGFR3 | Hs.1420 | Fibroblast growth factor receptor 3 (achondroplasia, thanatophoric dwarfism)
PDCD4 | 0.599 | 1.061 | 0.565 | PDCD4 | Hs.232543 | Programmed cell death 4 (neoplastic transformation inhibitor)
WISP3 | 0.941 | 1.666 | 0.565 | WISP3 | Hs.549081 | WNT1 inducible signaling pathway protein 3
ITGA6 | 0.557 | 0.981 | 0.568 | ITGA6 | Hs.133397 | Integrin, alpha 6
PTGES | 0.990 | 1.740 | 0.569 | PTGES | Hs.146688 | Prostaglandin E synthase
TFF1 | 0.521 | 0.914 | 0.570 | TFF1 | Hs.162807 | Trefoil factor 1 (breast cancer, estrogen-inducible sequence expressed in)
FGFBP1 | 0.984 | 1.714 | 0.574 | FGFBP1 | Hs.1690 | Fibroblast growth factor binding protein 1
TM4SF4 | 0.536 | 0.925 | 0.579 | TM4SF4 | Hs.133527 | Transmembrane 4 L six family member 4
CLDN11 | 0.557 | 0.952 | 0.585 | CLDN11 | Hs.31595 | Claudin 11 (oligodendrocyte transmembrane protein)
PTGS2 | 1.058 | 1.805 | 0.586 | PTGS2 | Hs.196384 | Prostaglandin-endoperoxide synthase 2 (prostaglandin G/H synthase and cyclooxygenase)
ACSL5 | 0.526 | 0.892 | 0.589 | ACSL5 | Hs.11638 | Acyl-CoA synthetase long-chain family member 5

BDN

F 0.

549

0.92

8 0.

592

BDN

F H

s.502

182

Bra

in-d

eriv

ed n

euro

troph

ic fa

ctor

AK

AP12

0.

639

1.06

0 0.

603

AKAP

12

Hs.3

7124

0 A

kin

ase

(PR

KA

) anc

hor p

rote

in (g

ravi

n) 1

2 PP

P2R2

B 1.

451

2.39

8 0.

605

PPP2

R2B

Hs.1

9382

5 Pr

otei

n ph

osph

atas

e 2

(for

mer

ly 2

A),

regu

lato

ry su

buni

t B (P

R 5

2), b

eta

isof

orm

IT

GB4

0.

799

1.31

5 0.

607

ITG

B4

Hs.3

7025

5 In

tegr

in, b

eta

4 IT

GA9

0.

557

0.90

1 0.

618

ITG

A9

Hs.1

1315

7 In

tegr

in, a

lpha

9


GPC6 | 0.467 | 0.749 | 0.624 | GPC6 | Hs.444329 | Glypican 6
KRT19 | 0.783 | 1.253 | 0.625 | KRT19 | Hs.514167 | Keratin 19
CDH5 | 0.624 | 0.969 | 0.643 | CDH5 | Hs.76206 | Cadherin 5, type 2, VE-cadherin (vascular epithelium)
LIMS1 | 2.334 | 3.627 | 0.644 | LIMS1 | Hs.469593 | LIM and senescent cell antigen-like domains 1
ANXA3 | 0.684 | 1.062 | 0.644 | ANXA3 | Hs.480042 | Annexin A3
BAG5 | 0.844 | 1.302 | 0.648 | BAG5 | Hs.5443 | BCL2-associated athanogene 5
LAMA3 | 0.754 | 1.151 | 0.655 | LAMA3 | Hs.436367 | Laminin, alpha 3
CERKL | 0.888 | 1.351 | 0.657 | ITGA4 | Hs.440955 | Ceramide kinase-like
PCDHA6 | 0.729 | 1.095 | 0.666 | PCDHA6 | Hs.199343 | Protocadherin alpha 6
LAMB1 | 0.554 | 0.830 | 0.668 | LAMB1 | Hs.489646 | Laminin, beta 1
FGF13 | 1.092 | 0.688 | 1.587 | FGF13 | Hs.6540 | Fibroblast growth factor 13
CAPN9 | 1.092 | 0.688 | 1.587 | CAPN9 | Hs.498021 | Calpain 9
GPC1 | 1.415 | 0.885 | 1.599 | GPC1 | Hs.328232 | Glypican 1
TSTA3 | 1.250 | 0.781 | 1.600 | TSTA3 | Hs.404119 | Tissue specific transplantation antigen P35B
IMP-3 | 1.547 | 0.958 | 1.615 | IMP-3 | Hs.432616 | IGF-II mRNA-binding protein 3
LGALS1 | 0.987 | 0.606 | 1.627 | LGALS1 | Hs.445351 | Lectin, galactoside-binding, soluble, 1 (galectin 1)
PDCD2 | 1.474 | 0.870 | 1.694 | PDCD2 | Hs.367900 | Programmed cell death 2
SLC39A14 | 1.485 | 0.850 | 1.749 | SLC39A14 | Hs.491232 | Solute carrier family 39 (zinc transporter), member 14
SELENBP1 | 1.271 | 0.709 | 1.792 | SELENBP1 | Hs.334841 | Selenium binding protein 1
CDH13 | 2.100 | 1.147 | 1.830 | CDH13 | Hs.436040 | Cadherin 13, H-cadherin (heart)
STEAP | 1.182 | 0.639 | 1.851 | STEAP | Hs.61635 | Six transmembrane epithelial antigen of the prostate 1
TM4SF3 | 1.191 | 0.637 | 1.870 | TM4SF3 | Hs.170563 | Tetraspanin 8
CD9 | 1.976 | 1.034 | 1.911 | CD9 | Hs.114286 | CD9 antigen (p24)
PTPNS1 | 1.476 | 0.755 | 1.955 | PTPNS1 | Hs.128846 | Protein tyrosine phosphatase, non-receptor type substrate 1
LTBP1 | 1.368 | 0.686 | 1.993 | LTBP1 | Hs.49787 | Latent transforming growth factor beta binding protein 1
CDH6 | 2.591 | 1.275 | 2.033 | CDH6 | Hs.171054 | Cadherin 6, type 2, K-cadherin (fetal kidney)
MAD | 1.408 | 0.685 | 2.056 | MAD | Hs.468908 | MAX dimerization protein 1
PLAU | 2.334 | 1.090 | 2.141 | PLAU | Hs.77274 | Plasminogen activator, urokinase


NGFR | 1.306 | 0.596 | 2.192 | NGFR | Hs.415768 | Nerve growth factor receptor (TNFR superfamily, member 16)
MSLN | 1.147 | 0.519 | 2.209 | MSLN | Hs.408488 | Mesothelin
PRSS8 | 1.597 | 0.723 | 2.209 | PRSS8 | Hs.75799 | Protease, serine, 8 (prostasin)
BMP7 | 3.031 | 1.338 | 2.266 | BMP7 | Hs.473163 | Bone morphogenetic protein 7 (osteogenic protein 1)
SNCG | 2.222 | 0.977 | 2.275 | SNCG | Hs.349470 | Synuclein, gamma (breast cancer-specific protein 1)
CSRP2 | 1.958 | 0.833 | 2.350 | CSRP2 | Hs.530904 | Cysteine and glycine-rich protein 2
DEFA4 | 1.357 | 0.559 | 2.427 | DEFA4 | Hs.2582 | Defensin, alpha 4, corticostatin
CEBPA | 1.688 | 0.687 | 2.457 | CEBPA | Hs.76171 | CCAAT/enhancer binding protein (C/EBP), alpha
LCP1 | 1.816 | 0.724 | 2.510 | LCP1 | Hs.381099 | Lymphocyte cytosolic protein 1 (L-plastin)
FABP3 | 1.622 | 0.644 | 2.520 | FABP3 | Hs.112669 | Fatty acid binding protein 3, muscle and heart (mammary-derived growth inhibitor)
CD36 | 1.428 | 0.562 | 2.540 | CD36 | Hs.120949 | CD36 antigen (collagen type I receptor, thrombospondin receptor)
L1CAM | 2.569 | 0.986 | 2.606 | L1CAM | Hs.522818 | L1 cell adhesion molecule
THBS2 | 1.570 | 0.587 | 2.675 | THBS2 | Hs.371147 | Thrombospondin 2
DAB2 | 1.132 | 0.422 | 2.685 | DAB2 | Hs.481980 | Disabled homolog 2, mitogen-responsive phosphoprotein (Drosophila)
PRRX1 | 2.159 | 0.787 | 2.742 | PRRX1 | Hs.283416 | Paired related homeobox 1
STEAP | 1.388 | 0.504 | 2.754 | STEAP | Hs.61635 | Six transmembrane epithelial antigen of the prostate 1
COL4A2 | 1.348 | 0.482 | 2.796 | COL4A2 | Hs.508716 | Collagen, type IV, alpha 2
KRT10 | 3.411 | 1.213 | 2.813 | KRT10 | Hs.99936 | Keratin 10 (epidermolytic hyperkeratosis; keratosis palmaris et plantaris)
FN1 | 1.534 | 0.521 | 2.944 | FN1 | Hs.203717 | Fibronectin 1
PTGS1 | 3.120 | 1.048 | 2.977 | PTGS1 | Hs.201978 | Prostaglandin-endoperoxide synthase 1 (prostaglandin G/H synthase and cyclooxygenase)
DXS9879E | 2.020 | 0.675 | 2.994 | DXS9879E | Hs.444619 | DNA segment on chromosome X (unique) 9879 expressed sequence
DNMT1 | 2.232 | 0.735 | 3.038 | DNMT1 | Hs.202672 | DNA (cytosine-5-)-methyltransferase 1
GAGEB1 | 1.699 | 0.538 | 3.160 | GAGEB1 | Hs.128231 | P antigen family, member 1 (prostate associated)
TNFAIP6 | 2.689 | 0.816 | 3.295 | TNFAIP6 | Hs.437322 | Tumor necrosis factor, alpha-induced protein 6
MEOX1 | 3.147 | 0.935 | 3.364 | MEOX1 | Hs.438 | Mesenchyme homeo box 1
SNX13 | 2.069 | 0.610 | 3.392 | SNX13 | Hs.487648 | Sorting nexin 13


MEOX1 | 2.648 | 0.768 | 3.449 | MEOX1 | Hs.438 | Mesenchyme homeobox 1
KLK5 | 7.933 | 2.276 | 3.485 | KLK5 | Hs.50915 | Kallikrein 5
HOXA5 | 2.104 | 0.595 | 3.537 | HOXA5 | Hs.37034 | Homeobox A5
UCHL1 | 3.108 | 0.831 | 3.740 | UCHL1 | Hs.518731 | Ubiquitin carboxyl-terminal esterase L1 (ubiquitin thiolesterase)
GPNMB | 1.677 | 0.446 | 3.759 | GPNMB | Hs.190495 | Glycoprotein (transmembrane) nmb
MMP15 | 2.140 | 0.550 | 3.888 | MMP15 | Hs.80343 | Matrix metalloproteinase 15 (membrane-inserted)
IGF1 | 1.907 | 0.481 | 3.961 | IGF1 | Hs.160562 | Insulin-like growth factor 1 (somatomedin C)
RAD51 | 2.600 | 0.532 | 4.889 | RAD51 | Hs.446554 | RAD51 homolog (RecA homolog, E. coli) (S. cerevisiae)
BAP1 | 3.428 | 0.559 | 6.134 | BAP1 | Hs.106674 | BRCA1 associated protein-1 (ubiquitin carboxy-terminal hydrolase)
CENPF | 4.000 | 0.637 | 6.281 | CENPF | Hs.497741 | Centromere protein F, 350/400ka (mitosin)
TOP2A | 3.446 | 0.546 | 6.311 | TOP2A | Hs.156346 | Topoisomerase (DNA) II alpha 170kDa
PTTG1 | 4.168 | 0.545 | 7.642 | PTTG1 | Hs.350966 | Pituitary tumor-transforming 1
CRABP2 | 9.172 | 0.806 | 11.382 | CRABP2 | Hs.405662 | Cellular retinoic acid binding protein 2
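The fold change difference column looks like the two mean expression ratios divided by each other (invasive EOC over LMP). For PTGS2, for example, 0.934 / 2.006 ≈ 0.466. The thesis text does not state this; it is inferred from the tabulated values, and small differences in the last digit suggest the original figures came from unrounded means. The Python sketch below recomputes the column for a few rows under that assumption. The function name `fold_change_difference` is hypothetical.

```python
import math


def fold_change_difference(invasive_ratio: float, lmp_ratio: float) -> float:
    """Return the invasive EOC mean ratio divided by the LMP mean ratio.

    Assumption: this quotient is inferred from the table, not stated in the thesis.
    """
    if invasive_ratio <= 0 or lmp_ratio <= 0:
        raise ValueError("mean expression ratios must be positive")
    return invasive_ratio / lmp_ratio


# (symbol, invasive EOC mean ratio, LMP mean ratio, tabulated fold change difference)
ROWS = [
    ("PTGS2", 0.934, 2.006, 0.466),
    ("FGF13", 1.092, 0.688, 1.587),
    ("KLK5", 7.933, 2.276, 3.485),
    ("CRABP2", 9.172, 0.806, 11.382),
]

if __name__ == "__main__":
    for symbol, invasive, lmp, tabulated in ROWS:
        fold = fold_change_difference(invasive, lmp)
        # Inputs are rounded to 3 decimals, so allow a small relative tolerance.
        agrees = math.isclose(fold, tabulated, rel_tol=5e-3)
        # log2 < 0: lower in invasive EOC than in LMP tumours; log2 > 0: higher.
        print(f"{symbol:8s} {fold:7.3f} (table {tabulated:6.3f}) "
              f"log2 {math.log2(fold):+.2f} agrees={agrees}")
```

On the log2 scale, values below 1 in the table (lower in invasive EOC) and values above 1 (higher in invasive EOC) fall symmetrically around zero. This makes the two ends of the list easier to compare.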


Appendix Q: Microarray images from the Gilks et al. study of LMP and invasive EOC

These arrays were randomly selected from the full set (n=23). They illustrate the extent and range of hybridisation patterns in this dataset, which are visible even at this macroscopic level. In particular, feature intensity appears to fade from the top to the bottom of each sub-grid.
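One simple way to quantify this top-to-bottom fading is to fit a regression of each sub-grid row's median log2 intensity against its row index. A negative slope then means spots become dimmer towards the bottom. The Python sketch below is illustrative only and is not the quantification method proposed in this thesis. Two details are hypothetical: the input layout (a sub-grid given as a list of rows of positive spot intensities, top row first) and the function name `row_gradient`.

```python
import math
from statistics import median


def row_gradient(subgrid: list[list[float]]) -> float:
    """Least-squares slope of per-row median log2 intensity against row index.

    Hypothetical helper: `subgrid` holds one list of positive spot intensities
    per printed row, ordered from the top of the sub-grid to the bottom.
    Returns log2 units per row; negative values mean intensity fades downwards.
    """
    if len(subgrid) < 2:
        raise ValueError("need at least two rows to estimate a gradient")
    # The row median damps the influence of a few very bright or dim spots.
    y = [median([math.log2(v) for v in row]) for row in subgrid]
    x = range(len(y))
    x_bar = sum(x) / len(y)
    y_bar = sum(y) / len(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic sub-grid whose intensity halves from one row to the next.
    fading = [[1000.0, 1100.0, 900.0], [500.0, 550.0, 450.0], [250.0, 275.0, 225.0]]
    # Synthetic sub-grid with no systematic trend.
    flat = [[800.0, 820.0], [810.0, 790.0], [805.0, 815.0]]
    print(f"fading sub-grid: {row_gradient(fading):+.2f} log2 units per row")
    print(f"flat sub-grid:   {row_gradient(flat):+.2f} log2 units per row")
```

Running this for every sub-grid on an array would produce a distribution of slopes. Its centre and spread would summarise how strong and how consistent the fading is across the print.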
