

Visualizing Chromosomes as Transcriptome Correlation Maps:

Evidence of Chromosomal Domains Containing Co-expressed

Genes—A Study of 130 Invasive Ductal Breast Carcinomas

Fabien Reyal,1,2 Nicolas Stransky,1 Isabelle Bernard-Pierrot,1 Anne Vincent-Salomon,3 Yann de Rycke,4 Paul Elvin,9 Andrew Cassidy,9 Alexander Graham,9 Carolyn Spraggon,9 Yoann Désille,1 Alain Fourquet,5 Claude Nos,2 Pierre Pouillart,6 Henri Magdelénat,7 Dominique Stoppa-Lyonnet,3 Jérôme Couturier,3 Brigitte Sigal-Zafrani,3 Bernard Asselain,4 Xavier Sastre-Garau,3 Olivier Delattre,8 Jean Paul Thiery,1,7 and François Radvanyi1

1 Unité Mixte de Recherche 144, Centre National de la Recherche Scientifique; Departments of 2 Surgery, 3 Tumor Biology, 4 Biostatistics, 5 Radiotherapy, 6 Medical Oncology, and 7 Translational Research; and 8 U509, Institut National de la Santé et de la Recherche Médicale, Institut Curie, Paris, France; and 9 Astra Zeneca, Alderley Park, United Kingdom

Abstract

Completion of the working draft of the human genome has made it possible to analyze the expression of genes according to their position on the chromosomes. Here, we used a transcriptome data analysis approach in which, for each gene, we calculated the correlation between its expression profile and those of its neighbors. We used the U133 Affymetrix transcriptome data set for a series of 130 invasive ductal breast carcinomas to construct chromosomal maps of gene expression correlation (transcriptome correlation maps). These maps highlighted nonrandom clusters of genes along the genome with correlated expression in tumors. Some of the gene clusters identified by this method probably arose because of genetic alterations, as most of the chromosome arms with the highest percentage of correlated genes (1q, 8p, 8q, 16p, 16q, 17q, and 20q) were also the most frequent sites of genomic alterations in breast cancer. Our analysis showed that several known breast tumor amplicons (at 8p11-p12, 11q13, and 17q12) are located within clusters of genes with correlated expression. Using hierarchical clustering on samples and a Treeview representation of whole chromosome arms, we observed a higher-order organization of correlated genes, sometimes involving very large chromosomal domains that could extend to a whole chromosome arm. Transcriptome correlation maps are a new way of visualizing transcriptome data. They will help to identify new genes involved in tumor progression and new mechanisms of gene regulation in tumors. (Cancer Res 2005; 65(4): 1376-83)

Introduction

Large-scale transcriptome analyses based on DNA microarrays have facilitated the classification of cancers into biologically distinct categories, some of which may explain the clinical behaviors of tumors (1–3). Such analyses may also help to find new prognostic and predictive markers (4, 5). Completion of the initial working draft of the human genome has made it possible to interpret transcriptome data in a new way, by directly assigning the genome-wide, high-throughput gene expression profiles to the human genome sequence (6).

A few recent studies have explored the relationship between the transcriptome and the positions of genes on chromosomes. Using data from serial analysis of gene expression (SAGE) in a range of human normal and tumor tissues, Caron et al. (7) showed that highly expressed genes are often found in clusters in specific chromosomal regions, called regions of increased gene expression. By analyzing expressed sequence tag collections, Zhou et al. (8) compared normal and tumor tissues in 10 different tissue types and showed clusters of genes exhibiting increased expression along chromosomes in tumors. Many of these genomic regions corresponded to known amplicons. By comparing SAGE data for normal bronchial epithelium, adenocarcinomas, and squamous cell carcinomas of the lung, Fujii et al. (9) identified clusters of overexpressed and underexpressed genes. They also showed that in squamous cell carcinomas of the lung, about half of these clusters were located in imbalanced chromosomal regions previously identified by cytogenetic, comparative genomic hybridization, or loss of heterozygosity studies. Two studies have directly explored the relationship between DNA copy number alterations and gene expression in breast tumors using cDNA microarrays and found that 40% to 60% of the highly amplified genes were overexpressed (10, 11). Therefore, clustering of overexpressed genes could be attributable, at least in part, to underlying gene copy number alterations.

The different approaches mentioned above established a relationship between the transcriptome and the chromosomal location of the genes by analyzing the expression of each gene individually. The genes are subsequently grouped into chromosomal domains according to the expression data. A second approach to explore the relationship between the transcriptome and the organization of the genome was developed first on budding yeast transcriptome data (12) and subsequently on Drosophila transcriptome data (13). This transcriptome data analysis approach searches for groups of neighboring genes that show correlated expression profiles. Both groups identified chromosomal domains of similarly expressed neighboring genes. In normal cells, these chromosomal domains of co-regulated genes may represent chromatin-regulated regions or groups of genes regulated by the same transcription factor(s). We developed a similar computational method for the analysis of transcriptome data to identify chromosomal domains containing co-expressed genes in cancer and applied it to a series of 130 invasive ductal breast carcinomas.

Note: F. Reyal, N. Stransky, and I. Bernard-Pierrot contributed equally to this work. B. Sigal-Zafrani contributed on behalf of the Institut Curie Breast Cancer Group (see Acknowledgments).

Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/). The original figures of the article as well as the supplementary data can be found at http://microarrays.curie.fr/publications/oncologie_moleculaire/breast_TCM/.

Requests for reprints: François Radvanyi, Unité Mixte de Recherche 144, Institut Curie-CNRS, 26 rue d'Ulm, 75248 Paris Cedex 05, France. Phone: 33-1-42-34-63-39; Fax: 33-1-42-34-63-49; E-mail: [email protected].

©2005 American Association for Cancer Research.

Research Article. Cancer Res 2005; 65: (4). February 15, 2005. www.aacrjournals.org

Downloaded from cancerres.aacrjournals.org on March 14, 2015.

Materials and Methods

Patients and Breast Tumor Samples. We analyzed the gene expression profiles of 130 infiltrating ductal primary breast carcinomas. These carcinomas were obtained from 130 patients who were included, between 1989 and 1999, in the prospective database initiated in 1981 by the Institut Curie Breast Cancer Group. The flash-frozen tumor samples were stored at -80°C immediately after lumpectomy or mastectomy. All tumor samples contained >50% cancer cells, as assessed by H&E staining of histologic sections adjacent to the samples used for the transcriptome analysis. The clinical data for the patients and the histologic characteristics of the tumors are summarized in Supplementary Table S1. This study was approved by the institutional review boards of Institut Curie.

RNA Extraction and Microarray Data Collection. RNA was extracted from all tumor samples by the cesium chloride protocol (14, 15). The concentration and the integrity/purity of each RNA sample were measured using the RNA 6000 LabChip kit (Agilent Technologies, Palo Alto, CA) and the Agilent 2100 bioanalyzer. The DNA microarrays used in this study were the Human Genome U133 set (HG-U133, Affymetrix, Santa Clara, CA), consisting of two GeneChip arrays (A and B) and containing almost 45,000 probe sets. Each probe set consisted of 22 different oligonucleotides (11 of which are a perfect match with the target transcript and 11 of which harbor a one-nucleotide mismatch in the middle). These 22 oligonucleotides were used to measure the level of a given transcript. Details of the RNA amplification, labeling, and hybridization steps are available from http://www.affymetrix.com. Chips were scanned and the intensities for each probe set were calculated using Affymetrix MAS5.0 default settings. The mean intensity of the probe sets for each array was set to a constant target value (500) by linearly scaling the array signal intensities.
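The linear scaling step described above can be illustrated with a minimal Python sketch. It assumes a plain arithmetic mean over the probe set signals of one array; the MAS5.0 software may compute the mean differently (for example, after trimming extreme values), and the function name and the toy intensities are hypothetical.

```python
from statistics import mean

def scale_array(intensities, target=500.0):
    """Linearly rescale one array so that its mean signal equals `target`.

    Assumption: a plain arithmetic mean is used; the vendor software
    may apply a trimmed mean instead.
    """
    factor = target / mean(intensities)
    return [value * factor for value in intensities]

# Hypothetical probe set signals for a single array
raw = [120.0, 480.0, 900.0, 1500.0]
scaled = scale_array(raw)
```

Because every array is multiplied by a single factor, the ranks of the probe sets within an array are unchanged; only the overall brightness is made comparable between arrays.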

Selection of Probe Sets: Attribution of a Unique Probe Set to Each Gene. The fact that some genes are represented by several probe sets could introduce artifacts when looking for neighboring genes with similar expression patterns because the intensities of these probe sets are highly correlated. To avoid this artifact, probe sets corresponding to uncharacterized expressed sequence tags were removed from the analysis, and when several probe sets corresponded to the same gene (i.e., if different probe sets had the same title, the same GenBank ID, or belonged to the same UniGene Cluster), a single probe set was kept for the analysis. Probe sets with an "_at" extension were preferentially kept because they tend to be more specific according to Affymetrix probe set design algorithms. Probe sets with an "_s_at" extension were the second best choice, followed by all other extensions. When several probe sets with the same extension were available for one gene, the one with the highest median value was kept. From an initial list of ~45,000 probe sets, we kept 16,215 "unique" probe sets, each corresponding to a unique gene. Because of the one-to-one correspondence between these 16,215 probe sets and genes, we use the terms "probe set" and "gene" interchangeably.
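The selection rule above (plain "_at" first, then "_s_at", then any other extension, with ties broken by the highest median signal) could be sketched as follows. This is an illustration only: the probe set IDs, gene names, and the mapping structure are hypothetical, the removal of uncharacterized expressed sequence tags is not shown, and grouping by title, GenBank ID, or UniGene Cluster is assumed to have been done already.

```python
import re
from statistics import median

def suffix_rank(probe_id):
    """Lower rank = preferred: plain '_at' (0), '_s_at' (1), anything else (2)."""
    tagged = re.search(r"_([a-z])_at$", probe_id)
    if tagged is None:
        return 0
    return 1 if tagged.group(1) == "s" else 2

def pick_unique_probe_sets(probe_to_gene, expression):
    """Keep one probe set per gene.

    probe_to_gene: probe set ID -> gene identifier (assumed precomputed).
    expression:    probe set ID -> list of signal values across samples.
    """
    best = {}
    for probe_id, gene in probe_to_gene.items():
        # Sort key: best extension first, then highest median signal
        key = (suffix_rank(probe_id), -median(expression[probe_id]))
        if gene not in best or key < best[gene][0]:
            best[gene] = (key, probe_id)
    return {gene: probe_id for gene, (key, probe_id) in best.items()}

# Hypothetical toy data
probe_to_gene = {
    "1000_at": "GENE_A", "1001_s_at": "GENE_A",
    "2000_s_at": "GENE_B", "2001_x_at": "GENE_B",
    "3000_at": "GENE_C", "3001_at": "GENE_C",
}
expression = {
    "1000_at": [10, 20, 30], "1001_s_at": [500, 600, 700],
    "2000_s_at": [5, 6, 7], "2001_x_at": [50, 60, 70],
    "3000_at": [100, 200, 300], "3001_at": [400, 500, 600],
}
selected = pick_unique_probe_sets(probe_to_gene, expression)
```

In this toy example the extension takes priority over signal level, so "1000_at" is kept for GENE_A even though "1001_s_at" has a much higher median.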

Chromosomal Location of the Probe Sets. Each of the 16,215 probe sets represents a unique gene. Their genomic locations were obtained from the U133 annotation files from Affymetrix. When the position of a probe set was not available, it was obtained either by using the basic local alignment search tool-like alignment tool (BLAT) program (16) to match the probe set-specific target sequence (a ~600-nucleotide sequence from which the probe set oligonucleotides are derived) against the University of California Santa Cruz Human Genome Working Draft sequence, or by using the position of the corresponding UniGene Cluster (Homo sapiens UniGene Build 164). All positions refer to the July 2003 Human Genome Working Draft.

Calculation of Transcriptome Correlation Scores and Identification of Neighboring Genes with Correlated Expression. A method similar to those described in yeast and Drosophila (12, 13), based on transcriptome array data, was developed in our laboratory (by N. Stransky) to evaluate the correlation between the expression profile of each gene and those of its neighbors. For each probe set (gene), we calculated a score, which we called the transcriptome correlation score. This score is the sum of the Spearman rank order correlation values, computed across the tumor samples, between the RNA levels of this gene and the RNA levels of each of the physically nearest 2n genes (n centromeric genes and n telomeric genes).
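As an illustration of the score defined above, the following Python sketch sums Spearman correlations between each gene and its neighbors on either side. It is not the authors' code: the function names are hypothetical, the window is simply truncated at the ends of the gene list (the text does not say how chromosome ends were handled), and profiles are assumed not to be constant across samples.

```python
def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def tc_scores(profiles, n=10):
    """profiles: expression profiles (one list of sample values per gene),
    ordered by chromosomal position. Returns one score per gene."""
    scores = []
    for i, profile in enumerate(profiles):
        lo, hi = max(0, i - n), min(len(profiles), i + n + 1)
        scores.append(sum(spearman(profile, profiles[j])
                          for j in range(lo, hi) if j != i))
    return scores

# Hypothetical toy data: 4 genes measured in 4 samples, window of 1 gene per side
toy_profiles = [[1, 2, 3, 4], [2, 4, 6, 9], [4, 3, 2, 1], [1, 3, 2, 4]]
toy_scores = tc_scores(toy_profiles, n=1)
```

With 2n = 20 neighbors, the score lies between -20 and 20; a gene whose profile tracks those of most of its neighbors receives a high positive score.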

To determine a significance threshold (i.e., a score above which a gene is considered to have an expression pattern similar to that of its neighbors), we created 100 random data sets of the same size by randomly ordering the 16,215 probe sets on the genome. For each random set, transcriptome correlation scores were calculated for each probe set. The significance threshold was the 500th quantile of the pooled distribution of these scores, that is, its 99.8th percentile (the value exceeded by only 1 of 500 probe set scores in the random data sets). Probe sets with a score exceeding the threshold are consequently significantly correlated with their neighbors within a window of 2n probe sets at P < 0.002 and are called "correlated probe sets."
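The randomization procedure above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the genome is treated as a single ordered list (chromosome boundaries are ignored), `score_fn` is a hypothetical callable that returns one score per item of an ordered list, and the function name and parameters are not from the article.

```python
import random

def permutation_threshold(profiles, score_fn, n_random=100, tail=500, seed=0):
    """Estimate the score exceeded by at most 1 in `tail` scores obtained
    after randomly reordering the genes.

    profiles: list of per-gene items in genomic order.
    score_fn: hypothetical callable, ordered list -> list of scores.
    """
    rng = random.Random(seed)
    pooled = []
    for _ in range(n_random):
        shuffled = list(profiles)
        rng.shuffle(shuffled)          # random gene order = null model
        pooled.extend(score_fn(shuffled))
    pooled.sort()
    n_above = len(pooled) // tail      # number of scores allowed above
    return pooled[len(pooled) - n_above - 1]
```

With 100 random data sets of 16,215 probe sets, about 1.6 million scores would be pooled and roughly 3,200 of them would lie above the returned value, which corresponds to P < 0.002 for a single probe set.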

To determine the appropriate number (2n) of neighboring genes needed to calculate the transcriptome correlation score for each gene of our data set, the total number of genes with a score above the threshold was calculated as a function of 2n (13) for several values ranging from 2 to 34 (Supplementary Fig. S1). Above 2n = 20, this number reaches a plateau. Therefore, 20 neighboring genes were used to calculate the transcriptome correlation score.

For each of the 16,215 probe sets used, we calculated the transcriptome correlation score. For each chromosome, we obtained a diagram (the transcriptome correlation map) representing this score for each probe set, organized according to its chromosomal position.

Correlation between Adjacent Groups of Correlated Genes. To determine if there was a correlation between adjacent groups of correlated genes, we analyzed the correlated probe sets by one-way unsupervised hierarchical clustering on samples (17), leaving the probe sets organized according to their chromosomal position.
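One-way clustering, in which only the samples are reordered and the genes stay in chromosomal order, can be illustrated with the small Python sketch below. The distance (1 minus the Pearson correlation) and the average linkage are assumptions chosen for illustration; the text does not state which settings were used in Cluster 3.0. All names and the toy matrix are hypothetical.

```python
def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def cluster_sample_order(matrix):
    """matrix[g][s]: expression of gene g (rows kept in chromosomal order)
    in sample s. Returns sample indices in dendrogram leaf order."""
    n_samples = len(matrix[0])
    columns = [[row[s] for row in matrix] for s in range(n_samples)]
    dist = {(a, b): 1 - pearson(columns[a], columns[b])
            for a in range(n_samples) for b in range(a + 1, n_samples)}
    clusters = [[s] for s in range(n_samples)]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs
                pairs = [dist[(min(a, b), max(a, b))]
                         for a in clusters[i] for b in clusters[j]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0]

# Hypothetical toy matrix: 4 genes (rows) x 4 samples (columns)
toy_matrix = [
    [1, 9, 2, 8],
    [2, 8, 1, 9],
    [9, 1, 8, 2],
    [8, 2, 9, 1],
]
sample_order = cluster_sample_order(toy_matrix)
# Reorder the columns only; the row (gene) order is left untouched
reordered = [[row[s] for s in sample_order] for row in toy_matrix]
```

Displaying `reordered` as a heat map would place samples with similar profiles side by side, so that blocks of adjacent genes with a shared pattern across tumors become visible along the chromosome axis.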

Software Used for Data Analyses. Statistical analysis and all calculations were done using R 1.9.0 (http://www.r-project.org). We used the HKIS software (http://isoft.free.fr/hkis/) to look up gene positions in public databases and for data formatting. We used Java TreeView 1.0.5 (http://jtreeview.sourceforge.net/) to make representations of Eisen clusters (17) obtained with Cluster 3.0 (http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/).

Results

Generation of a Transcriptome Map Based on the Correlated Expression of Neighboring Genes (Transcriptome Correlation Map) in Infiltrating Ductal Breast Carcinoma. The RNA expression data of 130 invasive ductal breast carcinomas were obtained using the Affymetrix U133 set. Of the 45,000 probe sets in the initial set, 16,215 probe sets, each corresponding to a unique gene, were kept to establish chromosomal transcriptome maps (see Materials and Methods). These maps highlight genes that show a correlated expression with their neighbors. The transcriptome correlation maps for genes located on chromosomes 1 and 2 are shown in Fig. 1. The significance threshold was defined as the 500th quantile of the distribution of the resulting 1.6 million transcriptome correlation scores in the random data sets and was equal to 3.38. Examples of a random data set for chromosomes 1 and 2 are given in Fig. 1 (right panels). The transcriptome correlation maps of the different chromosomes, including chromosomes 1 and 2, showed that genes with a transcriptome correlation score above the threshold were not distributed uniformly along the chromosomes: groups of adjacent genes with correlated expression could be seen (Fig. 1 and Supplementary Fig. S2 and Table S2). Interestingly, the genes corresponding to the three well-known amplification regions in breast cancer [8p11-p12 (FGFR1 locus), 11q13 (CCND1 locus), and 17q12 (ERBB2 locus)] were present in chromosomal regions containing genes with transcriptome correlation scores higher than the threshold (Fig. 2).

Figure 1. Transcriptome correlation map of chromosomes 1 and 2 in infiltrating ductal breast carcinoma. The transcriptome correlation score (TC score) for a given gene indicates the strength of the correlation between the expression of this gene and the expression of the 20 neighboring genes (see Materials and Methods). The transcriptome correlation map shows the scores for the different genes as a function of their position along the chromosome. Left, transcriptome correlation maps of chromosome 1 and chromosome 2. Dashed line, significance threshold: genes with scores above this threshold are considered to have a significantly correlated expression with that of their neighbors. This significance threshold was calculated using random data sets (see Materials and Methods). Right, examples of a random data set for chromosome 1 and chromosome 2.

Large Chromosomal Domains of Co-expressed Genes. Overall, 20% of the genes had a significant transcriptome correlation score (i.e., their expression was correlated with that of their neighbors). The percentage of genes with a significant transcriptome correlation score differed significantly between the different chromosome arms (Fig. 3). The chromosome arms with the highest percentages (higher than 30%) were 1q (243 of 750 genes), 8p (82 of 207 genes), 8q (185 of 364 genes), 14q (174 of 527 genes), 16p (179 of 379 genes), 16q (110 of 318 genes), 17q (303 of 704 genes), and 20q (101 of 304 genes). Except for chromosome arm 14q, all these chromosome arms are also the most frequent locations of genomic alterations in breast cancer. We analyzed chromosome arms 1q, 8p, 8q, 16p, 16q, and 17q in more detail (Figs. 4–6 and Supplementary Fig. S4). For this analysis, we considered only genes with a transcriptome correlation score above the threshold (Figs. 4A–6A and Supplementary Fig. S4A). The expression patterns of these genes, ordered according to their chromosomal location, were examined in the 130 breast cancer samples by using an unsupervised hierarchical clustering representation (one-way clustering on samples).

Very similar expression patterns were obtained for the 243 genes on chromosome 1q with a significant transcriptome correlation score (Fig. 4B). These genes extended from PEX11B (1q21.1) to ELYS (1q44), spanning a 100,677-kb region. A group of genes with a similar expression pattern was also apparent on the long arm of chromosome 8 (Fig. 5B) and extended to the centromeric region of chromosome 8p [213 genes, spanning 108,140 kb from FLJ14299 (8p12) to KIAA0014 (8q24.3)]. The 8p11-p12 amplified region containing the FGFR1 locus was located at the edge of this domain. A very different expression pattern was observed for the genes telomeric to the FGFR1 locus [54 genes, spanning 25,203 kb from LOC84549 (8p12) to DKFZp761P0423 (8p23.1)]. The 289 genes on chromosome 16 (Supplementary Fig. S4B) with a transcriptome correlation score higher than the threshold corresponded to two different sets of genes. One was a group of 110 genes all localized on 16q and spanning 43,127 kb from VPS35 (16q11.2) to FANCA (16q24.3), and the second was a group of 179 genes all localized on 16p, spanning 30,901 kb from BCKDK (16p11.2) to RGS11 (16p13.3). In contrast, numerous different expression patterns were observed on chromosome 17q (Fig. 6B). At least six different sets of genes were found: a group of 36 genes (a) spanning 4,034 kb from TNFAIP1 (17q11.2) to ZNF207 (17q11.2), a group of 37 genes (b) spanning 4,110 kb from FLJ22865 (17q12) to SMARCE1 (17q21.2), a group of 64 genes (c) spanning 7,045 kb from KRTAP4-15 (17q21.2) to SCAP1 (17q21.32), a group of four genes (d) spanning 78 kb from HOXB2 (17q21.32) to HOXB7 (17q21.32), a group of 74 genes (e) spanning 19,006 kb from KIAA0924 (17q21.32) to LOC51321 (17q24.2), and a group of 88 genes (f) spanning 14,219 kb from SLC16A6 (17q24.2) to MGC4368 (17q25.3) (Fig. 6).

Discussion

Microarray technology makes it possible to monitor theexpression of thousand of genes simultaneously. Completion ofthe initial working draft of the human genome has made itpossible to analyze the expression of the genes according to

their position on the genome. Comparison of the expressionpatterns of adjacent genes in different expression array data setsin yeast (12, 18), Drosophila (13), and worms (19) has led to the

identification of groups of physically adjacent genes that sharesimilar expression profiles. In this work, we used this newapproach to analyze large-scale transcriptome data concerning

cancer.To identify systematically co-expressed adjacent genes along the

chromosomes, we calculated an expression correlation score foreach given gene. This score was the sum of the correlation values

between the RNA expression levels of this gene and the RNAexpression levels of each of the 20 neighboring genes (10centromeric genes and 10 telomeric genes). Using the U133Affymetrix transcriptome data for a series of 130 invasive ductal

Figure 2. Transcriptome correlation maps of regions localized on chromosomes 8, 11, and 17 containing known amplicons in infiltrating ductal breast carcinoma.The transcriptome correlation maps of chromosome 8 (region p21.2 to q11.21; left), chromosome 11 (region q11 to q14.1; middle ), and chromosome 17 (region q11.2to q21.31; right ) are shown. Vertical bars, position of the centromeres. Bold squares, transcriptome correlation scores (TC score) for FGFR1, CCND1,and ERBB2.

Transcriptome Correlation Map of Breast Carcinomas

www.aacrjournals.org 1379 Cancer Res 2005; 65: (4). February 15, 2005

Research. on March 14, 2015. © 2005 American Association for Cancercancerres.aacrjournals.org Downloaded from

Page 5: Visualizing Chromosomes as Transcriptome Correlation Maps: Evidence of Chromosomal Domains Containing Co-expressed Genes—A Study of 130 Invasive Ductal Breast Carcinomas

breast carcinomas, we constructed a genome-wide map that wecalled the transcriptome correlation map. This map highlights the‘‘correlated genes’’ (i.e., genes that share a similar expression profilewith their neighbors).We have found in this series of breast tumors that f20% of

the genes showed a significant correlation with their neighbors.The transcription correlation maps of the different chromo-somes revealed regions with a high percentage of correlatedgenes separated by regions containing genes not significantlycorrelated with their neighbors. The precise physical definition

of the correlated regions was not always straightforward as acontinuum between the correlated regions was observed insome cases.We addressed the effect of the gene densities on the percentage

of correlated genes in the transcription correlation map. Unlikefor the regions of increased gene expression described by Caronet al. (7), we observed no systematic correlation between genedensity and the percentage of genes that are over the thresholdon the transcription correlation map (Supplementary Fig. S3).Genetic or nongenetic mechanisms could account for thecorrelation in expression between neighboring genes. Aneusomywill affect in the same way the expression of a series of adjacentgenes not subjected to gene dosage compensation. This has beendescribed both for DNA losses and gains in yeast (20) and inhumans (10, 11, 21, 22). Nongenetic mechanisms could also affectthe expression of neighboring genes. Several models or combina-tions of models have been proposed: long range effect oftranscription factors, chromatin structure modification, andincreased concentration of components of the transcriptionalmachinery due to a particular subnuclear location of chromo-somal segments (23). Genomic alterations most likely explain partof the correlation between neighboring genes in breast tumorsbecause, except for chromosome arm 14q, the chromosome armspresenting the highest percentage of correlated genes (8q, 51%;16p, 47%; 17q, 43%; 8p, 40%; 16q, 35%; 20q, 33%; 1q, 32%) werealso known to harbor frequent chromosome imbalance, as shownby karyotypic, comparative genomic hybridization, or loss ofheterozygosity studies (24–30). Two well-characterized amplicons,the FGFR1 amplicon at 8p11-p12 and the ERBB2 amplicon at17q12, corresponded to regions presenting a high percentage ofcorrelated genes. Very recent data on breast cancer cell lines

suggest that FGFR1 is not the driving oncogene in the 8p11-p12amplicon. Several new candidate oncogenes located in this regionhave been suggested to play a causal role in breast cancerprogression in tumors with an amplification in the 8p11-p12

Figure 3. Percentage of genes with atranscriptome correlation score (TC score)higher than the threshold for eachchromosomal arm in infiltrating ductalbreast carcinoma. For each chromosomalarm, the number of genes with atranscriptome correlation score higher thanthe significant correlation score dividedby the total number of considered geneson the chromosome arm was calculatedand expressed as a percentage.

Figure 4. Chromosomal domains of co-expressed genes on chromosome 1q. A ,transcriptome correlation map of chromosome 1. B , unsupervised hierarchicalcluster analysis of 130 infiltrating ductal breast carcinomas using the genes onthe long arm of chromosome 1 with a significant transcriptome correlationscore (TC score ). Each row corresponds to one tumor sample and each columncorresponds to one gene. The genes are arranged in cytogenetic order fromthe centromere to the q telomere: red, high level of expression relative to the meanexpression level in the 130 tumor samples; green, low level of gene expression.The genes under line a correspond to the genes contained in rectangle a of A .


region. Of the 15 genes with a significant transcriptome correlation score from this region, nine (HTPAP, KIAA0725, FGFR1, BAG4, LSM1, RCP, BRF2, PROSC, and FLJ14299) have already been proposed to be potential oncogenes (31). The ERBB2 region at 17q12 is of particular interest in breast cancer. Amplification and overexpression of ERBB2 are associated with poor clinical outcome and are found in 15% of infiltrating breast carcinomas. Moreover, ERBB2 gene amplification determines the response to specific antibody-based therapy (trastuzumab; ref. 32). All seven of the genes located in the minimal ERBB2 amplification region (280 kb) that presented an overexpression associated with amplification (STARD3, PNMT, TCAP, CAB2, ERBB2, MGC14832, and GRB7; ref. 33) corresponded to correlated genes in the 17q12 region. The 11q13 region is often amplified in breast cancer, and CCND1 was initially thought to be the main oncogene in this region. Although CCND1 was one of the correlated genes, many other genes in this region had similar or higher transcriptome correlation scores, possibly because multiple mechanisms for increasing the expression of CCND1 occur frequently in addition to amplification (33) and/or due to the complexity of the 11q13 amplification region, which probably contains several different amplicons (34). It should be noted that within the different regions rich in correlated genes, in particular 11q13 and 17q12, ~50% of the genes were found to be correlated, meaning that the other 50% of genes are not correlated in these regions. This percentage is in good agreement with the results of Hyman et al. (10) and Pollack et al. (11), who showed that ~50% of the genes within an amplified region are overexpressed.

We found chromosomal regions harboring genes with correlated expression in regions known to contain amplicons like 8p11-p12, 11q13, and 17q12; in regions exhibiting frequent single chromosome arm gain like 1q, 8q, and 16p; and in regions exhibiting chromosomal losses like 16q. To determine the exact genetic mechanisms responsible for the correlation in expression between adjacent genes, it will be necessary to obtain, in parallel with transcriptome data, genome data like those obtained by comparative genomic hybridization arrays (35). The combination of comparative genomic hybridization array data and transcription correlation map analysis should help us to identify genes involved in tumor progression. Groups of genes with correlated expression were rarer but also present in regions not affected by genetic alterations in breast cancer like chromosome arms 2p or 9q. Our new approach for analyzing the transcriptome in tumors may pinpoint genes involved in tumor progression, the expression of which is altered by nongenetic mechanisms.

Using hierarchical clustering and a Treeview representation, we were able to show that regions with genes with similar expression patterns can extend over very large chromosomal

Figure 5. Chromosomal domains of co-expressed genes on chromosome 8. A, transcriptome correlation map of chromosome 8. B, unsupervised hierarchical cluster analysis of 130 infiltrating ductal breast carcinomas using the genes of chromosome 8 with a significant transcriptome correlation score (TC score). Each row, one tumor sample; each column, one gene. The genes are arranged in cytogenetic order from the p telomere to the q telomere: red, high level of expression relative to the mean expression level in the 130 tumor samples; green, low level of gene expression. The genes under line a (or b) correspond to the genes contained in rectangle a (or b) of A.

Figure 6. Chromosomal domains of co-expressed genes on chromosome 17q. A, transcriptome correlation map of chromosome 17. B, unsupervised hierarchical cluster analysis of 130 infiltrating ductal breast carcinomas using the genes on the long arm of chromosome 17 with a significant transcriptome correlation score (TC score). Each row corresponds to one tumor sample and each column corresponds to one gene. The genes are arranged in cytogenetic order from the centromere to the q telomere: red, high level of expression relative to the mean expression level in the 130 tumor samples; green, low level of gene expression. The genes under line a (or b, c, d, e, f) correspond to the genes under line a (or b, c, d, e, f) of A.


domains, sometimes involving a whole chromosome arm. This phenomenon could be due to genetic or nongenetic mechanisms. Changes in whole chromosome gene expression due to aneuploidy have already been described in yeast and in humans (20–22). Modification of gene expression involving large chromosomal domains or even entire chromosomes could also be due to epigenetic mechanisms (36).

The "transcriptome correlation map" approach has two strong points: (a) although potentially very useful, knowledge about gene expression in normal tissue is not required. This is particularly useful for tumors for which the normal counterparts are difficult to obtain, such as breast or ovarian carcinomas, or when the cellular origin of the tumor is unknown, like Ewing tumors, synovial sarcomas, or medulloblastomas; (b) it will be possible to compare the transcriptome correlation maps between different groups of samples even if the data for the different groups are obtained on different platforms, because the transcriptome correlation map compares not the gene expression in different groups but the correlation of expression between neighboring genes within a group of samples. Additionally, subsets of genes with a significant correlation score could be used to classify tumors into meaningful anatomoclinical groups. It should be noted that relationships between regions of correlated genes can occur not only between adjacent regions but also between regions on different chromosome arms or even different chromosomes (e.g., higher levels of expression of the genes included in the ERBB2 cluster are associated with lower expression levels of the genes included in the CCND1 cluster). The systematic investigation of such relationships could help to identify combinations of events that occur during tumor progression.

Transcriptome correlation maps are a new way of interpreting transcriptome data. Combined with other molecular data (i.e., chromosomal alteration data obtained by comparative genomic hybridization arrays or large-scale methylation analysis; refs. 35, 37, 38), transcriptome correlation maps can be applied to any tumor type and will help to identify new genes involved in tumor progression and new mechanisms of gene regulation in tumors.
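The idea of scoring each gene by the correlation of its expression profile with those of its chromosomal neighbors, and the reason such scores can be compared across platforms, can be illustrated with a short sketch. The scoring rule used here (sum of Pearson correlations with up to k neighbors on each side), the window size, and the toy profiles are assumptions made for illustration and should not be read as the exact score used in this study.

```python
import math

def pearson(x, y):
    """Pearson correlation; assumes neither profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def tc_scores(profiles, k):
    """Score each gene by its summed correlation with nearby genes.

    profiles: expression profiles (one list per gene, same samples),
              ordered by chromosomal position.
    k: number of neighbors taken on each side (fewer at the ends).
    """
    scores = []
    for i, profile in enumerate(profiles):
        lo, hi = max(0, i - k), min(len(profiles), i + k + 1)
        scores.append(sum(pearson(profile, profiles[j])
                          for j in range(lo, hi) if j != i))
    return scores

# Hypothetical toy data: genes 0-2 co-vary across 5 samples, genes 3-4 do not.
profiles = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 11.0],
    [1.5, 2.5, 3.0, 4.5, 5.5],
    [3.0, 1.0, 4.0, 1.0, 5.0],
    [2.0, 7.0, 1.0, 8.0, 2.0],
]
scores = tc_scores(profiles, k=1)

# A per-gene change of scale and offset leaves every score unchanged.
rescaled = [[2.0 * v + 10.0 for v in p] for p in profiles]
rescaled_scores = tc_scores(rescaled, k=1)
print(scores)
print(rescaled_scores)
```

Because Pearson correlation is unchanged by a positive linear rescaling of either profile, the scores depend on how genes co-vary within a group of samples rather than on absolute intensity units. Real platform differences are not purely linear, so this shows the principle only.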

Acknowledgments

Received 7/29/2004; revised 10/13/2004; accepted 12/9/2004.

Grant support: Centre National de la Recherche Scientifique, Institut Curie Breast Cancer program, European FP5 IST HKIS project, and Comite de Paris Ligue Nationale Contre le Cancer (Laboratoire associe); Ligue Nationale Contre le Cancer fellowship (F. Reyal and I. Bernard-Pierrot) and French Ministry of Education and Research fellowship (N. Stransky).

The Institut Curie Breast Cancer Group: Bernard Asselain, Alain Aurias, Emmanuel Barillot, Francois Campana, Patricia De Cremoux, Olivier Delattre, Veronique Dieras, Jean-Marc Extra, Alain Fourquet, Henri Magdelenat, Martine Meunier, Claude Nos, Thao Palangie, Pierre Pouillart, Marie-France Poupon, Francois Radvanyi, Xavier Sastre-Garau, Brigitte Sigal-Zafrani, Dominique Stoppa-Lyonnet, Anne Tardivon, Fabienne Thibault, Jean Paul Thiery, and Anne Vincent-Salomon.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

References

1. Perou CM, Sorlie T, Eisen MB, et al. Molecular portraits of human breast tumours. Nature 2000;406:747–52.

2. Sorlie T, Perou CM, Tibshirani R, et al. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci U S A 2001;98:10869–74.

3. Sorlie T, Tibshirani R, Parker J, et al. Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc Natl Acad Sci U S A 2003;100:8418–23.

4. van’t Veer LJ, Dai H, van de Vijver MJ, et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature 2002;415:530–6.

5. van de Vijver MJ, He YD, van’t Veer LJ, et al. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med 2002;347:1999–2009.

6. Collins FS, Morgan M, Patrinos A. The Human Genome Project: lessons from large-scale biology. Science 2003;300:286–90.

7. Caron H, van Schaik B, van der Mee M, et al. The human transcriptome map: clustering of highly expressed genes in chromosomal domains. Science 2001;291:1289–92.

8. Zhou Y, Luoh SM, Zhang Y, et al. Genome-wide identification of chromosomal regions of increased tumor expression by transcriptome analysis. Cancer Res 2003;63:5781–4.

9. Fujii T, Dracheva T, Player A, et al. A preliminary transcriptome map of non-small cell lung cancer. Cancer Res 2002;62:3340–6.

10. Hyman E, Kauraniemi P, Hautaniemi S, et al. Impact of DNA amplification on gene expression patterns in breast cancer. Cancer Res 2002;62:6240–5.

11. Pollack JR, Sorlie T, Perou CM, et al. Microarray analysis reveals a major direct role of DNA copy number alteration in the transcriptional program of human breast tumors. Proc Natl Acad Sci U S A 2002;99:12963–8.

12. Cohen BA, Mitra RD, Hughes JD, Church GM. A computational analysis of whole-genome expression data reveals chromosomal domains of gene expression. Nat Genet 2000;26:183–6.

13. Spellman PT, Rubin GM. Evidence for large domains of similarly expressed genes in the Drosophila genome. J Biol 2002;1:5.

14. Coombs LM, Pigott D, Proctor A, Eydmann M, Denner J, Knowles MA. Simultaneous isolation of DNA, RNA, and antigenic protein exhibiting kinase activity from small tumor samples using guanidine isothiocyanate. Anal Biochem 1990;188:338–43.

15. Chirgwin JM, Przybyla AE, MacDonald RJ, Rutter WJ. Isolation of biologically active ribonucleic acid from sources enriched in ribonuclease. Biochemistry 1979;18:5294–9.

16. Kent WJ. BLAT—the BLAST-like alignment tool.Genome Res 2002;12:656–64.

17. Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A 1998;95:14863–8.

18. Kruglyak S, Tang H. Regulation of adjacent yeast genes. Trends Genet 2000;16:109–11.

19. Lercher MJ, Blumenthal T, Hurst LD. Coexpression of neighboring genes in Caenorhabditis elegans is mostly due to operons and duplicate genes. Genome Res 2003;13:238–43.

20. Hughes TR, Roberts CJ, Dai H, et al. Widespread aneuploidy revealed by DNA microarray expression profiling. Nat Genet 2000;25:333–7.

21. Virtaneva K, Wright FA, Tanner SM, et al. Expression profiling reveals fundamental biological differences in acute myeloid leukemia with isolated trisomy 8 and normal cytogenetics. Proc Natl Acad Sci U S A 2001;98:1124–9.

22. Phillips JL, Hayward SW, Wang Y, et al. The consequences of chromosomal aneuploidy on gene expression profiles in a cell line model for prostate carcinogenesis. Cancer Res 2001;61:8143–9.

23. Oliver B, Parisi M, Clark D. Gene expression neighborhoods. J Biol 2002;1:4.

24. Tirkkonen M, Tanner M, Karhu R, Kallioniemi A, Isola J, Kallioniemi OP. Molecular cytogenetics of primary breast cancer by CGH. Genes Chromosomes Cancer 1998;21:177–84.

25. Forozan F, Mahlamaki EH, Monni O, et al. Comparative genomic hybridization analysis of 38 breast cancer cell lines: a basis for interpreting complementary DNA microarray data. Cancer Res 2000;60:4519–25.

26. Cingoz S, Altungoz O, Canda T, Saydam S, Aksakoglu G, Sakizli M. DNA copy number changes detected by comparative genomic hybridization and their association with clinicopathologic parameters in breast tumors. Cancer Genet Cytogenet 2003;145:108–14.

27. Janssen EA, Baak JP, Guervos MA, van Diest PJ, Jiwa M, Hermsen MA. In lymph node-negative invasive breast carcinomas, specific chromosomal aberrations are strongly associated with high mitotic activity and predict outcome more accurately than grade, tumour diameter, and oestrogen receptor. J Pathol 2003;201:555–61.

28. Jong YJ, Li LH, Tsou MH, et al. Chromosomal comparative genomic hybridization abnormalities in early- and late-onset human breast cancers: correlation with disease progression and TP53 mutations. Cancer Genet Cytogenet 2004;148:55–65.

29. Kirchweger R, Zeillinger R, Schneeberger C, Speiser P, Louason G, Theillet C. Patterns of allele losses suggest the existence of five distinct regions of LOH on chromosome 17 in breast cancer. Int J Cancer 1994;56:193–9.

30. Dutrillaux B, Gerbault-Seureau M, Zafrani B. Characterization of chromosomal anomalies in human breast cancer. A comparison of 30 paradiploid cases with few chromosome changes. Cancer Genet Cytogenet 1990;49:203–17.

31. Ray ME, Yang ZQ, Albertson D, et al. Genomic and expression analysis of the 8p11-12 amplicon in human breast cancer cell lines. Cancer Res 2004;64:40–7.

32. Nahta R, Hung MC, Esteva FJ. The HER-2-targeting antibodies trastuzumab and pertuzumab synergistically inhibit the survival of breast cancer cells. Cancer Res 2004;64:2343–6.

33. Kauraniemi P, Kuukasjarvi T, Sauter G, Kallioniemi A. Amplification of a 280-kilobase core region at the ERBB2 locus leads to activation of two hypothetical proteins in breast cancer. Am J Pathol 2003;163:1979–84.

34. Ormandy CJ, Musgrove EA, Hui R, Daly RJ, Sutherland RL. Cyclin D1, EMS1 and 11q13 amplification in breast cancer. Breast Cancer Res Treat 2003;78:323–35.

35. Pinkel D, Segraves R, Sudar D, et al. High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nat Genet 1998;20:207–11.

36. Grewal SI, Moazed D. Heterochromatin and epigenetic control of gene expression. Science 2003;301:798–802.

37. Zardo G, Tiirikainen MI, Hong C, et al. Integrated genomic and epigenomic analyses pinpoint biallelic gene inactivation in tumors. Nat Genet 2002;32:453–8.

38. Huang TH, Perry MR, Laux DE. Methylation profiling of CpG islands in human breast cancer cells. Hum Mol Genet 1999;8:459–70.


Fabien Reyal, Nicolas Stransky, Isabelle Bernard-Pierrot, et al. Cancer Res 2005;65:1376-1383.

Updated version: http://cancerres.aacrjournals.org/content/65/4/1376

Supplementary material: http://cancerres.aacrjournals.org/content/suppl/2005/10/12/65.4.1376.DC1.html
