Visualizing Chromosomes as Transcriptome Correlation Maps: Evidence of Chromosomal Domains Containing Co-expressed Genes—A Study of 130 Invasive Ductal Breast Carcinomas
Fabien Reyal,1,2 Nicolas Stransky,1 Isabelle Bernard-Pierrot,1 Anne Vincent-Salomon,3 Yann de Rycke,4 Paul Elvin,9 Andrew Cassidy,9 Alexander Graham,9 Carolyn Spraggon,9 Yoann Desille,1 Alain Fourquet,5 Claude Nos,2 Pierre Pouillart,6 Henri Magdelenat,7 Dominique Stoppa-Lyonnet,3 Jerome Couturier,3 Brigitte Sigal-Zafrani,3 Bernard Asselain,4 Xavier Sastre-Garau,3 Olivier Delattre,8 Jean Paul Thiery,1,7 and Francois Radvanyi1
1Unite Mixte de Recherche 144, Centre National de la Recherche Scientifique; Departments of 2Surgery, 3Tumor Biology, 4Biostatistics, 5Radiotherapy, 6Medical Oncology, and 7Translational Research; and 8U509, Institut National de la Sante et de la Recherche Medicale, Institut Curie, Paris, France and 9Astra Zeneca, Alderley Park, United Kingdom
Abstract
Completion of the working draft of the human genome has made it possible to analyze the expression of genes according to their position on the chromosomes. Here, we used a transcriptome data analysis approach involving, for each gene, the calculation of the correlation between its expression profile and those of its neighbors. We used the U133 Affymetrix transcriptome data set for a series of 130 invasive ductal breast carcinomas to construct chromosomal maps of gene expression correlation (transcriptome correlation map). This highlighted nonrandom clusters of genes along the genome with correlated expression in tumors. Some of the gene clusters identified by this method probably arose because of genetic alterations, as most of the chromosome arms with the highest percentage of correlated genes (1q, 8p, 8q, 16p, 16q, 17q, and 20q) were also the most frequent sites of genomic alterations in breast cancer. Our analysis showed that several known breast tumor amplicons (at 8p11-p12, 11q13, and 17q12) are located within clusters of genes with correlated expression. Using hierarchical clustering on samples and a Treeview representation of whole chromosome arms, we observed a higher-order organization of correlated genes, sometimes involving very large chromosomal domains that could extend to a whole chromosome arm. Transcriptome correlation maps are a new way of visualizing transcriptome data. They will help to identify new genes involved in tumor progression and new mechanisms of gene regulation in tumors. (Cancer Res 2005; 65(4): 1376-83)
Introduction
Large-scale transcriptome analyses based on DNA microarrays have facilitated the classification of cancers into biologically distinct categories, some of which may explain the clinical behaviors of tumors (1–3). Such analyses may also help to find new prognostic and predictive markers (4, 5). Completion of the initial working draft of the human genome has made it possible to interpret transcriptome data in a new way, by directly assigning the genome-wide, high-throughput gene expression profiles to the human genome sequence (6). A few recent studies have explored the relationship between the transcriptome and the positions of genes on chromosomes. Using data from serial analysis of gene expression (SAGE) in a range of human normal and tumor tissues, Caron et al. (7) showed that highly expressed genes are often found in clusters in specific chromosomal regions, called regions of increased gene expression. By analyzing expressed sequence tag collection data, Zhou et al. (8) compared normal and tumor tissues in 10 different tissue types and showed clusters of genes exhibiting increased expression along chromosomes in tumors. Many of these genomic regions corresponded to known amplicons. By comparing SAGE data for normal bronchial epithelium, adenocarcinomas, and squamous cell carcinomas of the lung, Fujii et al. (9) identified clusters of overexpressed and underexpressed genes. They also showed that in squamous cell carcinomas of the lung, about half of these clusters were located in imbalanced chromosomal regions previously identified by cytogenetic, comparative genomic hybridization, or loss of heterozygosity studies. Two studies have directly explored the relationship between DNA copy number alterations and gene expression in breast tumors using cDNA microarrays and found that 40% to 60% of the highly amplified genes were overexpressed (10, 11).
Therefore, clustering of overexpressed genes could be attributable, at least in part, to underlying gene copy number alterations. The different approaches mentioned above established a relationship between the transcriptome and the chromosomal location of the genes by analyzing the expression of each gene individually; the genes are subsequently grouped into chromosomal domains according to the expression data. A second approach to exploring the relationship between the transcriptome and the organization of the genome was developed first on budding yeast transcriptome data (12) and subsequently on Drosophila transcriptome data (13). This approach searches for groups of neighboring genes that show correlated expression profiles. Both studies identified chromosomal domains of similarly expressed neighboring genes. In normal cells, these chromosomal domains of co-regulated genes may represent chromatin-regulated regions or groups of genes regulated by the same transcription factor(s). We developed a similar computational method for the analysis of transcriptome data to identify chromosomal domains containing co-expressed genes in cancer and applied it to a series of 130 invasive ductal breast carcinomas.
Note: F. Reyal, N. Stransky, and I. Bernard-Pierrot contributed equally to this work. B. Sigal-Zafrani contributed on behalf of the Institut Curie Breast Cancer Group (see Acknowledgments).
Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/). The original figures of the article as well as the supplementary data can be found at http://microarrays.curie.fr/publications/oncologie_moleculaire/breast_TCM/.
Requests for reprints: Francois Radvanyi, Unite Mixte de Recherche 144, Institut Curie-CNRS, 26 rue d'Ulm, 75248 Paris Cedex 05, France. Phone: 33-1-42-34-63-39; Fax: 33-1-42-34-63-49; E-mail: [email protected].
©2005 American Association for Cancer Research.
Cancer Res 2005; 65: (4). February 15, 2005 1376 www.aacrjournals.org
Materials and Methods
Patients and Breast Tumor Samples. We analyzed the gene expression profiles of 130 infiltrating ductal primary breast carcinomas. These carcinomas were obtained from 130 patients who were included, between 1989 and 1999, in the prospective database initiated in 1981 by the Institut Curie Breast Cancer Group. The flash-frozen tumor samples were stored at −80°C immediately after lumpectomy or mastectomy. All tumor samples contained >50% cancer cells, as assessed by H&E staining of histologic sections adjacent to the samples used for the transcriptome analysis. The clinical data for the patients and the histologic characteristics of the tumors are summarized in Supplementary Table S1. This study was approved by the institutional review boards of Institut Curie.
RNA Extraction and Microarray Data Collection. RNA was extracted from all tumor samples by the cesium chloride protocol (14, 15). The concentration and the integrity/purity of each RNA sample were measured using the RNA 6000 LabChip kit (Agilent Technologies, Palo Alto, CA) and the Agilent 2100 bioanalyzer. The DNA microarrays used in this study were the Human Genome U133 set (HG-U133, Affymetrix, Santa Clara, CA), consisting of two GeneChip arrays (A and B) and containing almost 45,000 probe sets. Each probe set consisted of 22 different oligonucleotides (11 of which are a perfect match with the target transcript and 11 of which harbor a one-nucleotide mismatch in the middle). These 22 oligonucleotides were used to measure the level of a given transcript. Details of the RNA amplification, labeling, and hybridization steps are available from http://www.affymetrix.com. Chips were scanned and the intensities for each probe set were calculated using Affymetrix MAS5.0 default settings. The mean intensity of the probe sets for each array was set to a constant target value (500) by linearly scaling the array signal intensities.
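The linear scaling step can be illustrated with a minimal Python sketch. It assumes a plain arithmetic mean over all probe-set signals of one array; the MAS 5.0 software may treat extreme values differently (for example by trimming them before averaging), so this shows the idea and is not a reimplementation. The function name `scale_to_target` and the toy signal values are hypothetical.

```python
def scale_to_target(signals, target=500.0):
    """Linearly rescale one array's signals so that their mean equals `target`.

    Assumption: a plain arithmetic mean is used; the vendor software may
    exclude extreme values before averaging.
    """
    mean_signal = sum(signals) / len(signals)
    factor = target / mean_signal
    return [value * factor for value in signals]


# Hypothetical raw signals for one array (not data from the study).
raw_signals = [120.0, 480.0, 900.0, 2500.0]
scaled_signals = scale_to_target(raw_signals)
scaled_mean = sum(scaled_signals) / len(scaled_signals)
```

Because every signal on an array is multiplied by the same factor, ratios between probe sets within that array are unchanged; only the overall level is made comparable between arrays.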
Selection of Probe Sets: Attribution of a Unique Probe Set to Each Gene. The fact that some genes are represented by several probe sets could introduce artifacts when looking for neighboring genes with similar expression patterns because the intensities of these probe sets are highly correlated. To avoid this artifact, probe sets corresponding to uncharacterized expressed sequence tags were removed from the analysis, and when several probe sets corresponded to the same gene (i.e., if different probe sets had the same title, the same GenBank ID, or belonged to the same UniGene Cluster), a single probe set was kept for the analysis. Probe sets with an "_at" extension were preferentially kept because they tend to be more specific according to Affymetrix probe set design algorithms. Probe sets with an "_s_at" extension were the second best choice, followed by all other extensions. When several probe sets with the same extension were available for one gene, the one with the highest median value was kept. From an initial list of ~45,000 probe sets, we kept 16,215 "unique" probe sets, each corresponding to a unique gene. Because of the one-to-one correspondence between these 16,215 probe sets and genes, we use the terms "probe set" and "gene" interchangeably.
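The selection rule described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the probe-set identifiers, gene names, and expression values are invented, the probe-to-gene mapping is taken as given, and "highest median value" is interpreted as the median signal across samples.

```python
import re
from statistics import median


def suffix_rank(probe_id):
    """Preference rank of a probe-set ID: 0 for plain '_at', 1 for '_s_at', 2 otherwise."""
    match = re.search(r"_([a-z])_at$", probe_id)
    if match is None:
        return 0 if probe_id.endswith("_at") else 2
    return 1 if match.group(1) == "s" else 2


def pick_unique_probe_sets(probe_to_gene, expression):
    """Keep one probe set per gene: best suffix first, then highest median signal."""
    best = {}
    for probe_id, gene in probe_to_gene.items():
        # Smaller key wins: lower suffix rank, then larger median (hence the minus sign).
        key = (suffix_rank(probe_id), -median(expression[probe_id]))
        if gene not in best or key < best[gene][0]:
            best[gene] = (key, probe_id)
    return {gene: probe_id for gene, (key, probe_id) in best.items()}


# Hypothetical identifiers and signals, for illustration only.
probe_to_gene = {
    "100_at": "GENE_A", "101_s_at": "GENE_A",
    "102_s_at": "GENE_B", "103_x_at": "GENE_B",
    "104_s_at": "GENE_C", "105_s_at": "GENE_C",
}
expression = {
    "100_at": [10, 20, 30], "101_s_at": [500, 600, 700],
    "102_s_at": [5, 6, 7], "103_x_at": [900, 900, 900],
    "104_s_at": [50, 60, 70], "105_s_at": [80, 90, 100],
}
chosen = pick_unique_probe_sets(probe_to_gene, expression)
```

In this toy input the suffix decides for GENE_A and GENE_B even though the discarded probe sets have higher signals, and the median only breaks the tie for GENE_C, where both candidates carry the same extension.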
Chromosomal Location of the Probe Sets. Each of the 16,215 probe sets represents a unique gene. Their genomic locations were obtained from the U133 annotation files from Affymetrix. When the position of a probe set was not available, it was obtained either by using the BLAST-like alignment tool (BLAT) program (16) for sequence-matching searches of probe set–specific target sequences (an ~600-nucleotide sequence from which probe set oligonucleotides are derived) against the University of California Santa Cruz Human Genome Working Draft sequence, or by using the position of the corresponding UniGene Cluster (Homo sapiens UniGene Build 164). All positions refer to the July 2003 Human Genome Working Draft.
Calculation of Transcriptome Correlation Scores and Identification of Neighboring Genes with Correlated Expression. A method similar to those described in yeast and Drosophila (12, 13), based on transcriptome array data, was developed in our laboratory (by N. Stransky) to evaluate the correlation between the expression profile of each gene and those of its neighbors. For each probe set (gene), we calculated a score, which we called the transcriptome correlation score. This score is the sum of the Spearman rank order correlation values, computed across the tumor samples, between the RNA levels of this gene and the RNA levels of each of the physically nearest 2n genes (n centromeric genes and n telomeric genes).
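The score defined above can be sketched in plain Python. This is a minimal illustration, not the authors' R implementation. It assumes that the profiles of one chromosome are already sorted by position, and that genes near a chromosome end simply use whichever neighbors exist on each side; how the original analysis treated chromosome ends and centromeres is not stated in the text. All function names and the toy profiles are hypothetical.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a constant profile contributes no correlation
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def tc_scores(profiles, n=10):
    """Sum of Spearman correlations with up to n neighbors on each side.

    `profiles` holds one expression profile (one value per tumor sample)
    per gene, ordered by chromosomal position.
    """
    scores = []
    for i, profile in enumerate(profiles):
        lo, hi = max(0, i - n), min(len(profiles), i + n + 1)
        scores.append(sum(spearman(profile, profiles[j])
                          for j in range(lo, hi) if j != i))
    return scores


# Four hypothetical genes measured in five samples, one neighbor per side.
toy_profiles = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10],
    [5, 4, 3, 2, 1],
    [1, 3, 2, 5, 4],
]
toy_scores = tc_scores(toy_profiles, n=1)
```

With 2n = 20 neighbors the score can range from -20 to +20, which gives a sense of scale for a threshold expressed in the same units.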
To determine a significance threshold (i.e., a score above which a gene is considered to have an expression pattern similar to that of its neighbors), we created 100 random data sets of the same size by randomly ordering the 16,215 probe sets on the genome. For each random set, transcriptome correlation scores were calculated for each probe set. The significance threshold was set at the upper 1/500 quantile of the resulting score distribution (i.e., the value exceeded by only 1 of every 500 probe-set scores in the random data sets). Probe sets with a score exceeding the threshold are consequently significantly correlated with their 2n neighboring probe sets at P < 0.002 and are called "correlated probe sets."
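The randomization procedure can be sketched as follows. The sketch is generic: `score_fn` stands for whatever routine computes neighbor-correlation scores from an ordered list of profiles, and the toy scorer used in the demonstration (absolute difference from the next item) is only a hypothetical stand-in so that the example runs on its own. Function names, the seed, and the toy data are assumptions.

```python
import random


def pooled_null_scores(profiles, score_fn, n_random=100, seed=0):
    """Score `n_random` random orderings of the profiles and pool all scores."""
    rng = random.Random(seed)
    pooled = []
    for _ in range(n_random):
        shuffled = list(profiles)
        rng.shuffle(shuffled)  # random gene order destroys positional structure
        pooled.extend(score_fn(shuffled))
    return pooled


def significance_threshold(null_scores, one_in=500):
    """Value exceeded by at most 1 of every `one_in` null scores.

    Ties at the cutoff can make fewer scores exceed it.
    """
    ordered = sorted(null_scores)
    n_above = len(ordered) // one_in
    return ordered[len(ordered) - n_above - 1]


def toy_neighbor_scores(ordered):
    """Hypothetical stand-in scorer: absolute difference from the next item (wrapping)."""
    return [abs(ordered[i] - ordered[(i + 1) % len(ordered)])
            for i in range(len(ordered))]


toy_profiles = list(range(50))
null_scores = pooled_null_scores(toy_profiles, toy_neighbor_scores, n_random=100, seed=1)
threshold = significance_threshold(null_scores)
n_exceeding = sum(score > threshold for score in null_scores)
```

Pooling 100 random orderings of 16,215 probe sets gives about 1.6 million null scores, so a 1-in-500 cutoff rests on roughly 3,200 scores in the upper tail.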
To determine the appropriate number (2n) of neighboring genes needed to calculate the transcriptome correlation score for each gene of our data set, the total number of genes with a score above the threshold was calculated as a function of 2n (13) for several values ranging from 2 to 34 (Supplementary Fig. S1). Above 2n = 20, this number reaches a plateau. Therefore, 20 neighboring genes were used to calculate the transcriptome correlation score.
For each of the 16,215 probe sets used, we calculated the transcriptome correlation score. For each chromosome, we obtained a diagram (the transcriptome correlation map) representing this score for each probe set, organized according to its chromosomal position.
Correlation between Adjacent Groups of Correlated Genes. To determine whether there was a correlation between adjacent groups of correlated genes, we analyzed the correlated probe sets by one-way unsupervised hierarchical clustering on samples (17), leaving the probe sets organized according to their chromosomal position.
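One-way clustering, in which only the samples are reordered and the genes keep their chromosomal order, can be sketched as below. The distance (1 minus Pearson correlation) and the average-linkage rule are assumptions chosen for illustration; the text does not state which metric or linkage was used in Cluster 3.0. The function names and the toy matrix are hypothetical.

```python
def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def cluster_sample_order(matrix):
    """Agglomerative average-linkage clustering of rows (samples).

    matrix[s][g] is the expression of gene g in sample s. Returns the
    sample indices in the order in which the merged clusters list them.
    """
    n = len(matrix)
    dist = [[1.0 - pearson(matrix[a], matrix[b]) for b in range(n)] for a in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                mean_dist = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or mean_dist < best[0]:
                    best = (mean_dist, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters[0]


# Hypothetical data: rows are samples, columns are genes in chromosomal order.
expression = [
    [5, 6, 7, 1, 1],
    [1, 2, 1, 8, 9],
    [6, 7, 7, 2, 1],
    [2, 1, 1, 9, 8],
]
sample_order = cluster_sample_order(expression)
# Rows are reordered; columns (genes) are left in their original order.
reordered = [expression[s] for s in sample_order]
```

Keeping the columns fixed is what allows blocks of similarly expressed neighboring genes to appear as contiguous vertical bands in the resulting heat map.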
Software Used for Data Analyses. Statistical analysis and all calculations were done using R 1.9.0 (http://www.r-project.org). We used the HKIS software (http://isoft.free.fr/hkis/) to look up gene positions in public databases and for data formatting. We used Java TreeView 1.0.5 (http://jtreeview.sourceforge.net/) to make representations of Eisen clusters (17) obtained with Cluster 3.0 (http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/).
Results
Generation of a Transcriptome Map Based on the Correlated Expression of Neighboring Genes (Transcriptome Correlation Map) in Infiltrating Ductal Breast Carcinoma. The RNA expression data of 130 invasive ductal breast carcinomas were obtained using the Affymetrix U133 set. Of the ~45,000 probe sets in the initial set, 16,215, each corresponding to a unique gene, were kept to establish chromosomal transcriptome maps (see Materials and Methods). These maps highlight genes that show a correlated expression with their neighbors. The transcriptome correlation maps for genes located on chromosomes 1 and 2 are shown in Fig. 1. The significance threshold was defined as the upper 1/500 quantile of the distribution of the resulting 1.6 million transcriptome correlation scores in the random data sets and was equal to 3.38. Examples of a random data set for chromosomes 1 and 2 are given in Fig. 1 (right panels). The transcriptome correlation maps of the different chromosomes, including chromosomes 1 and 2, showed that genes with a transcriptome correlation score above the threshold were not distributed uniformly along the chromosomes: groups of adjacent genes with correlated expression could be seen (Fig. 1 and Supplementary Fig. S2 and Table S2). Interestingly, the genes corresponding to the three well-known amplification regions in breast cancer [8p11-p12 (FGFR1 locus), 11q13 (CCND1 locus), and 17q12 (ERBB2 locus)] were present in chromosomal regions containing genes with transcriptome correlation scores higher than the threshold (Fig. 2).
Large Chromosomal Domains of Co-expressed Genes.
Overall, 20% of the genes had a significant transcriptome correlation score (i.e., their expression was correlated with that of their neighbors). The percentage of genes with a significant transcriptome correlation score differed significantly between the different chromosome arms (Fig. 3). The chromosome arms with the highest percentages (higher than 30%) were 1q (243 of 750 genes), 8p (82 of 207 genes), 8q (185 of 364 genes), 14q (174 of 527 genes), 16p (179 of 379 genes), 16q (110 of 318 genes), 17q (303 of 704 genes), and 20q (101 of 304 genes). Except for chromosome arm 14q, all these chromosome arms are also the most frequent locations of genomic alterations in breast cancer. We analyzed chromosome arms 1q, 8p, 8q, 16p, 16q, and 17q in more detail (Figs. 4–6 and Supplementary Fig. S4). For this analysis, we considered only genes with a transcriptome correlation score above the threshold (Figs. 4A–6A and Supplementary Fig. S4A). The expression patterns of these genes, ordered according to their chromosomal location, were examined in the 130 breast cancer samples by using an unsupervised hierarchical clustering representation (one-way clustering on samples). Very similar expression patterns were obtained for the 243 genes on chromosome 1q with a significant transcriptome correlation score (Fig. 4B). These genes extended from PEX11B (1q21.1) to ELYS (1q44), spanning a 100,677-kb region. A group of genes with a similar expression pattern was also apparent on the long arm of chromosome 8 (Fig. 5B) and extended to the centromeric region of chromosome 8p [213 genes, spanning 108,140 kb from FLJ14299 (8p12) to KIAA0014 (8q24.3)]. The 8p11-p12 amplified region containing the FGFR1 locus was located at the edge of this domain. A very different expression pattern was observed for the genes telomeric to the FGFR1 locus [54 genes, spanning 25,203 kb from LOC84549 (8p12) to DKFZp761P0423 (8p23.1)]. The 289 genes on chromosome 16 (Supplementary Fig. S4B) with a transcriptome correlation score higher than the threshold corresponded to two different sets of genes. One was a group of 110 genes all localized on 16q and spanning 43,127 kb from VPS35 (16q11.2) to FANCA (16q24.3), and the second was a group of 179 genes all localized on 16p, spanning 30,901 kb from BCKDK (16p11.2) to RGS11 (16p13.3). In contrast, numerous different expression patterns were observed on chromosome 17q (Fig. 6B). At least six different sets of genes were found: a group of 36 genes (a) spanning 4,034 kb from TNFAIP1 (17q11.2) to ZNF207 (17q11.2), a group of 37 genes (b) spanning 4,110 kb from FLJ22865 (17q12) to SMARCE1 (17q21.2), a group of 64 genes (c) spanning 7,045 kb from KRTAP4-15 (17q21.2) to SCAP1 (17q21.32), a group of four genes (d) spanning 78 kb from HOXB2 (17q21.32) to HOXB7 (17q21.32), a group of 74 genes (e) spanning 19,006 kb from KIAA0924 (17q21.32) to LOC51321 (17q24.2), and a group of 88 genes (f) spanning 14,219 kb from SLC16A6 (17q24.2) to MGC4368 (17q25.3; Fig. 6).
Figure 1. Transcriptome correlation map of chromosomes 1 and 2 in infiltrating ductal breast carcinoma. The transcriptome correlation score (TC score) for a given gene indicates the strength of the correlation between the expression of this gene and the expression of the 20 neighboring genes (see Materials and Methods). The transcriptome correlation map shows the scores for the different genes as a function of their position along the chromosome. Left, transcriptome correlation maps of chromosome 1 and chromosome 2. Dashed line, significance threshold: genes with scores above this threshold are considered to have an expression significantly correlated with that of their neighbors. This significance threshold was calculated using random data sets (see Materials and Methods). Right, examples of random data sets for chromosome 1 and chromosome 2.
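The per-arm percentages can be recomputed directly from the gene counts reported in this section. The short Python check below uses only those counts; the variable names are arbitrary.

```python
# (correlated genes, total genes considered) per chromosome arm, as reported in the text.
arm_counts = {
    "1q": (243, 750),
    "8p": (82, 207),
    "8q": (185, 364),
    "14q": (174, 527),
    "16p": (179, 379),
    "16q": (110, 318),
    "17q": (303, 704),
    "20q": (101, 304),
}

# Percentage of correlated genes per arm, rounded to the nearest integer.
arm_percent = {arm: round(100 * hits / total) for arm, (hits, total) in arm_counts.items()}

# Arms sorted from the highest to the lowest percentage.
ranked_arms = sorted(arm_percent, key=arm_percent.get, reverse=True)
```

All eight arms come out above 30%, with 8q highest at 51%. The counts are also internally consistent: the 8p and 8q totals (82 + 185 = 267) equal the two chromosome 8 domains (213 + 54), and the six 17q groups (36 + 37 + 64 + 4 + 74 + 88) sum to 303.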
Discussion
Microarray technology makes it possible to monitor the expression of thousands of genes simultaneously. Completion of the initial working draft of the human genome has made it possible to analyze the expression of the genes according to their position on the genome. Comparison of the expression patterns of adjacent genes in different expression array data sets in yeast (12, 18), Drosophila (13), and worms (19) has led to the identification of groups of physically adjacent genes that share similar expression profiles. In this work, we used this new approach to analyze large-scale transcriptome data concerning cancer.
To identify systematically co-expressed adjacent genes along the chromosomes, we calculated an expression correlation score for each given gene. This score was the sum of the correlation values between the RNA expression levels of this gene and the RNA expression levels of each of the 20 neighboring genes (10 centromeric genes and 10 telomeric genes). Using the U133 Affymetrix transcriptome data for a series of 130 invasive ductal breast carcinomas, we constructed a genome-wide map that we called the transcriptome correlation map. This map highlights the "correlated genes" (i.e., genes that share a similar expression profile with their neighbors).
Figure 2. Transcriptome correlation maps of regions localized on chromosomes 8, 11, and 17 containing known amplicons in infiltrating ductal breast carcinoma. The transcriptome correlation maps of chromosome 8 (region p21.2 to q11.21; left), chromosome 11 (region q11 to q14.1; middle), and chromosome 17 (region q11.2 to q21.31; right) are shown. Vertical bars, position of the centromeres. Bold squares, transcriptome correlation scores (TC score) for FGFR1, CCND1, and ERBB2.
We found in this series of breast tumors that ~20% of the genes showed a significant correlation with their neighbors. The transcriptome correlation maps of the different chromosomes revealed regions with a high percentage of correlated genes separated by regions containing genes not significantly correlated with their neighbors. The precise physical definition of the correlated regions was not always straightforward, as a continuum between the correlated regions was observed in some cases.
We addressed the effect of gene density on the percentage of correlated genes in the transcriptome correlation map. Unlike for the regions of increased gene expression described by Caron et al. (7), we observed no systematic correlation between gene density and the percentage of genes that are over the threshold on the transcriptome correlation map (Supplementary Fig. S3).
Genetic or nongenetic mechanisms could account for the correlation in expression between neighboring genes. Aneusomy will affect in the same way the expression of a series of adjacent genes not subjected to gene dosage compensation. This has been described both for DNA losses and gains in yeast (20) and in humans (10, 11, 21, 22). Nongenetic mechanisms could also affect the expression of neighboring genes. Several models or combinations of models have been proposed: long-range effects of transcription factors, chromatin structure modification, and increased concentration of components of the transcriptional machinery due to a particular subnuclear location of chromosomal segments (23). Genomic alterations most likely explain part of the correlation between neighboring genes in breast tumors because, except for chromosome arm 14q, the chromosome arms presenting the highest percentage of correlated genes (8q, 51%; 16p, 47%; 17q, 43%; 8p, 40%; 16q, 35%; 20q, 33%; 1q, 32%) were also known to harbor frequent chromosome imbalance, as shown by karyotypic, comparative genomic hybridization, or loss of heterozygosity studies (24–30). Two well-characterized amplicons, the FGFR1 amplicon at 8p11-p12 and the ERBB2 amplicon at 17q12, corresponded to regions presenting a high percentage of correlated genes. Very recent data on breast cancer cell lines
suggest that FGFR1 is not the driving oncogene in the 8p11-p12 amplicon. Several new candidate oncogenes located in this region have been suggested to play a causal role in breast cancer progression in tumors with an amplification in the 8p11-p12 region. Of the 15 genes with a significant transcriptome correlation score from this region, nine (HTPAP, KIAA0725, FGFR1, BAG4, LSM1, RCP, BRF2, PROSC, and FLJ14299) have already been proposed to be potential oncogenes (31). The ERBB2 region at 17q12 is of particular interest in breast cancer. Amplification and overexpression of ERBB2 is associated with poor clinical outcome and is found in 15% of infiltrating breast carcinomas. Moreover, ERBB2 gene amplification determines the response to specific antibody-based therapy (trastuzumab; ref. 32). All seven of the genes located in the minimal ERBB2 amplification region (280 kb) that presented an overexpression associated with amplification (STARD3, PNMT, TCAP, CAB2, ERBB2, MGC14832, and GRB7; ref. 33) corresponded to correlated genes in the 17q12 region. The 11q13 region is often amplified in breast cancer, and CCND1 was initially thought to be the main oncogene in this region. Although CCND1 was one of the correlated genes, many other genes in this region had similar or higher transcriptome correlation scores, possibly because multiple mechanisms for increasing the expression of CCND1 occur frequently in addition to amplification (33) and/or due to the complexity of the 11q13 amplification region, which probably contains several different amplicons (34). It should be noted that within the different regions rich in correlated genes, in particular 11q13 and 17q12, ~50% of the genes were found to be correlated, meaning that the other 50% of genes are not correlated in these regions. This percentage is in good agreement with the results of Hyman et al. (10) and Pollack et al. (11), who showed that ~50% of the genes within an amplified region are overexpressed.
Figure 3. Percentage of genes with a transcriptome correlation score (TC score) higher than the threshold for each chromosomal arm in infiltrating ductal breast carcinoma. For each chromosomal arm, the number of genes with a transcriptome correlation score higher than the significance threshold divided by the total number of considered genes on the chromosome arm was calculated and expressed as a percentage.
Figure 4. Chromosomal domains of co-expressed genes on chromosome 1q. A, transcriptome correlation map of chromosome 1. B, unsupervised hierarchical cluster analysis of 130 infiltrating ductal breast carcinomas using the genes on the long arm of chromosome 1 with a significant transcriptome correlation score (TC score). Each row corresponds to one tumor sample and each column corresponds to one gene. The genes are arranged in cytogenetic order from the centromere to the q telomere: red, high level of expression relative to the mean expression level in the 130 tumor samples; green, low level of gene expression. The genes under line a correspond to the genes contained in rectangle a of A.
We found chromosomal regions harboring genes with correlated expression in regions known to contain amplicons, like 8p11-p12, 11q13, and 17q12; in regions exhibiting frequent single chromosome arm gain, like 1q, 8q, and 16p; and in regions exhibiting chromosomal losses, like 16q. To determine the exact genetic mechanisms responsible for the correlation in expression between adjacent genes, it will be necessary to obtain, in parallel with transcriptome data, genome data like those obtained by comparative genomic hybridization arrays (35). The combination of comparative genomic hybridization array data and transcriptome correlation map analysis should help us to identify genes involved in tumor progression. Groups of genes with correlated expression were rarer but also present in regions not affected by genetic alterations in breast cancer, like chromosome arms 2p or 9q. Our new approach for analyzing the transcriptome in tumors may pinpoint genes involved in tumor progression, the expression of which is altered by nongenetic mechanisms.
Using hierarchical clustering and a Treeview representation,
we were able to show that regions with genes with similar expression patterns can extend over very large chromosomal domains, sometimes involving a whole chromosome arm. This phenomenon could be due to genetic or nongenetic mechanisms. Changes in whole chromosome gene expression due to aneuploidy have already been described in yeast and in humans (20–22). Modification of gene expression involving large chromosomal domains or even entire chromosomes could also be due to epigenetic mechanisms (36).
Figure 5. Chromosomal domains of co-expressed genes on chromosome 8. A, transcriptome correlation map of chromosome 8. B, unsupervised hierarchical cluster analysis of 130 infiltrating ductal breast carcinomas using the genes of chromosome 8 with a significant transcriptome correlation score (TC score). Each row, one tumor sample; each column, one gene. The genes are arranged in cytogenetic order from the p telomere to the q telomere: red, high level of expression relative to the mean expression level in the 130 tumor samples; green, low level of gene expression. The genes under line a (or b) correspond to the genes contained in rectangle a (or b) of A.
Figure 6. Chromosomal domains of co-expressed genes on chromosome 17q. A, transcriptome correlation map of chromosome 17. B, unsupervised hierarchical cluster analysis of 130 infiltrating ductal breast carcinomas using the genes on the long arm of chromosome 17 with a significant transcriptome correlation score (TC score). Each row corresponds to one tumor sample and each column corresponds to one gene. The genes are arranged in cytogenetic order from the centromere to the q telomere: red, high level of expression relative to the mean expression level in the 130 tumor samples; green, low level of gene expression. The genes under line a (or b, c, d, e, f) correspond to the genes under line a (or b, c, d, e, f) of A.
The "transcriptome correlation map" approach has two strong
points: (a) although potentially very useful, knowledge about gene expression in normal tissue is not required. This is particularly useful for tumors for which the normal counterparts are difficult to obtain, such as breast or ovarian carcinomas, or when the cellular origin of the tumor is unknown, as for Ewing tumors, synovial sarcomas, or medulloblastomas; (b) it will be possible to compare the transcriptome correlation maps between different groups of samples even if the data for the different groups are obtained on different platforms, because the transcriptome correlation map does not compare gene expression between groups but the correlation of expression between neighboring genes within a group of samples. Additionally, subsets of genes with a significant correlation score could be used to classify tumors into meaningful anatomoclinical groups. It should be noted that relationships between regions of correlated genes can occur not only between adjacent regions but also between regions on different chromosome arms or even different chromosomes (e.g., higher levels of expression of the genes included in the ERBB2 cluster are associated with lower expression levels of the genes included in the CCND1 cluster). The systematic investigation of such relationships could help to identify combinations of events that occur during tumor progression.
Transcriptome correlation maps are a new way of interpreting
transcriptome data. Combined with other molecular data (e.g., chromosomal alteration data obtained by comparative genomic hybridization arrays or large-scale methylation analysis; refs. 35, 37, 38), transcriptome correlation maps can be applied to any tumor type and will help to identify new genes involved in tumor progression and new mechanisms of gene regulation in tumors.
Acknowledgments
Received 7/29/2004; revised 10/13/2004; accepted 12/9/2004.
Grant support: Centre National de la Recherche Scientifique, Institut Curie Breast Cancer program, European FP5 IST HKIS project, and Comite de Paris Ligue Nationale Contre le Cancer (Laboratoire associe); Ligue Nationale Contre le Cancer fellowship (F. Reyal and I. Bernard-Pierrot) and French Ministry of Education and Research fellowship (N. Stransky).
The Institut Curie Breast Cancer Group: Bernard Asselain, Alain Aurias, Emmanuel Barillot, Francois Campana, Patricia De Cremoux, Olivier Delattre, Veronique Dieras, Jean-Marc Extra, Alain Fourquet, Henri Magdelenat, Martine Meunier, Claude Nos, Thao Palangie, Pierre Pouillart, Marie-France Poupon, Francois Radvanyi, Xavier Sastre-Garau, Brigitte Sigal-Zafrani, Dominique Stoppa-Lyonnet, Anne Tardivon, Fabienne Thibault, Jean Paul Thiery, and Anne Vincent-Salomon.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
References
1. Perou CM, Sorlie T, Eisen MB, et al. Molecular portraits of human breast tumours. Nature 2000;406:747–52.
2. Sorlie T, Perou CM, Tibshirani R, et al. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci U S A 2001;98:10869–74.
3. Sorlie T, Tibshirani R, Parker J, et al. Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc Natl Acad Sci U S A 2003;100:8418–23.
4. van’t Veer LJ, Dai H, van de Vijver MJ, et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature 2002;415:530–6.
5. van de Vijver MJ, He YD, van’t Veer LJ, et al. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med 2002;347:1999–2009.
6. Collins FS, Morgan M, Patrinos A. The Human Genome Project: lessons from large-scale biology. Science 2003;300:286–90.
7. Caron H, van Schaik B, van der Mee M, et al. The human transcriptome map: clustering of highly expressed genes in chromosomal domains. Science 2001;291:1289–92.
8. Zhou Y, Luoh SM, Zhang Y, et al. Genome-wide identification of chromosomal regions of increased tumor expression by transcriptome analysis. Cancer Res 2003;63:5781–4.
9. Fujii T, Dracheva T, Player A, et al. A preliminary transcriptome map of non-small cell lung cancer. Cancer Res 2002;62:3340–6.
10. Hyman E, Kauraniemi P, Hautaniemi S, et al. Impact of DNA amplification on gene expression patterns in breast cancer. Cancer Res 2002;62:6240–5.
11. Pollack JR, Sorlie T, Perou CM, et al. Microarray analysis reveals a major direct role of DNA copy number alteration in the transcriptional program of human breast tumors. Proc Natl Acad Sci U S A 2002;99:12963–8.
12. Cohen BA, Mitra RD, Hughes JD, Church GM. A computational analysis of whole-genome expression data reveals chromosomal domains of gene expression. Nat Genet 2000;26:183–6.
13. Spellman PT, Rubin GM. Evidence for large domains of similarly expressed genes in the Drosophila genome. J Biol 2002;1:5.
14. Coombs LM, Pigott D, Proctor A, Eydmann M, Denner J, Knowles MA. Simultaneous isolation of DNA, RNA, and antigenic protein exhibiting kinase activity from small tumor samples using guanidine isothiocyanate. Anal Biochem 1990;188:338–43.
15. Chirgwin JM, Przybyla AE, MacDonald RJ, Rutter WJ. Isolation of biologically active ribonucleic acid from sources enriched in ribonuclease. Biochemistry 1979;18:5294–9.
16. Kent WJ. BLAT—the BLAST-like alignment tool. Genome Res 2002;12:656–64.
17. Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A 1998;95:14863–8.
18. Kruglyak S, Tang H. Regulation of adjacent yeast genes. Trends Genet 2000;16:109–11.
19. Lercher MJ, Blumenthal T, Hurst LD. Coexpression of neighboring genes in Caenorhabditis elegans is mostly due to operons and duplicate genes. Genome Res 2003;13:238–43.
20. Hughes TR, Roberts CJ, Dai H, et al. Widespread aneuploidy revealed by DNA microarray expression profiling. Nat Genet 2000;25:333–7.
21. Virtaneva K, Wright FA, Tanner SM, et al. Expression profiling reveals fundamental biological differences in acute myeloid leukemia with isolated trisomy 8 and normal cytogenetics. Proc Natl Acad Sci U S A 2001;98:1124–9.
22. Phillips JL, Hayward SW, Wang Y, et al. The consequences of chromosomal aneuploidy on gene expression profiles in a cell line model for prostate carcinogenesis. Cancer Res 2001;61:8143–9.
23. Oliver B, Parisi M, Clark D. Gene expression neighborhoods. J Biol 2002;1:4.
24. Tirkkonen M, Tanner M, Karhu R, Kallioniemi A, Isola J, Kallioniemi OP. Molecular cytogenetics of primary breast cancer by CGH. Genes Chromosomes Cancer 1998;21:177–84.
25. Forozan F, Mahlamaki EH, Monni O, et al. Comparative genomic hybridization analysis of 38 breast cancer cell lines: a basis for interpreting complementary DNA microarray data. Cancer Res 2000;60:4519–25.
26. Cingoz S, Altungoz O, Canda T, Saydam S, Aksakoglu G, Sakizli M. DNA copy number changes detected by comparative genomic hybridization and their association with clinicopathologic parameters in breast tumors. Cancer Genet Cytogenet 2003;145:108–14.
27. Janssen EA, Baak JP, Guervos MA, van Diest PJ, Jiwa M, Hermsen MA. In lymph node-negative invasive breast carcinomas, specific chromosomal aberrations are strongly associated with high mitotic activity and predict outcome more accurately than grade, tumour diameter, and oestrogen receptor. J Pathol 2003;201:555–61.
28. Jong YJ, Li LH, Tsou MH, et al. Chromosomal comparative genomic hybridization abnormalities in early- and late-onset human breast cancers: correlation with disease progression and TP53 mutations. Cancer Genet Cytogenet 2004;148:55–65.
29. Kirchweger R, Zeillinger R, Schneeberger C, Speiser P, Louason G, Theillet C. Patterns of allele losses suggest the existence of five distinct regions of LOH on chromosome 17 in breast cancer. Int J Cancer 1994;56:193–9.
30. Dutrillaux B, Gerbault-Seureau M, Zafrani B. Characterization of chromosomal anomalies in human breast cancer. A comparison of 30 paradiploid cases with few chromosome changes. Cancer Genet Cytogenet 1990;49:203–17.
31. Ray ME, Yang ZQ, Albertson D, et al. Genomic and expression analysis of the 8p11-12 amplicon in human breast cancer cell lines. Cancer Res 2004;64:40–7.
32. Nahta R, Hung MC, Esteva FJ. The HER-2-targeting antibodies trastuzumab and pertuzumab synergistically inhibit the survival of breast cancer cells. Cancer Res 2004;64:2343–6.
33. Kauraniemi P, Kuukasjarvi T, Sauter G, Kallioniemi A. Amplification of a 280-kilobase core region at the ERBB2 locus leads to activation of two hypothetical proteins in breast cancer. Am J Pathol 2003;163:1979–84.
34. Ormandy CJ, Musgrove EA, Hui R, Daly RJ, Sutherland RL. Cyclin D1, EMS1 and 11q13 amplification in breast cancer. Breast Cancer Res Treat 2003;78:323–35.
35. Pinkel D, Segraves R, Sudar D, et al. High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nat Genet 1998;20:207–11.
36. Grewal SI, Moazed D. Heterochromatin and epigenetic control of gene expression. Science 2003;301:798–802.
37. Zardo G, Tiirikainen MI, Hong C, et al. Integrated genomic and epigenomic analyses pinpoint biallelic gene inactivation in tumors. Nat Genet 2002;32:453–8.
38. Huang TH, Perry MR, Laux DE. Methylation profiling of CpG islands in human breast cancer cells. Hum Mol Genet 1999;8:459–70.