Journal of Molecular and Cellular Cardiology 43 (2007) 93–106
www.elsevier.com/locate/yjmcc
Original article
Comprehensive identification and characterization of novel cardiac genes in mouse
Inju Park, Seong-Eui Hong, Tae-Wan Kim, Jiae Lee, Jungsu Oh, Eunyoung Choi, Cecil Han, Hoyong Lee, Do Han Kim, Chunghee Cho ⁎
Department of Life Science, Gwangju Institute of Science and Technology (GIST), Gwangju 500-712, Korea
Received 25 February 2007; received in revised form 14 May 2007; accepted 15 May 2007
Available online 24 May 2007
Keywords: Cardiac muscle; Gene expression; Microarray; Systems biology; Transcript; UniGene
1. Introduction
The heart is central to the cardiovascular system. This muscular organ beats about 100,000 times rhythmically and spontaneously each day throughout a person's life. Complicated gene networks regulate the mechanism of heart contraction [1], and subtle changes in genes expressed in cardiac muscle can cause serious heart disease.
A comprehensive understanding of the molecular and physiological events occurring in cardiac muscle requires identification and characterization of genes expressed in this tissue. To date, a number of high-throughput studies have investigated gene expression profiles in heart tissues. Most of them have focused on differential gene expression when normal and disease conditions were compared [2–12]. Although these studies have provided important information on the expression profiles of a number of cardiac genes, the identities and characteristics of unknown cardiac genes with unassigned function remain elusive.
An inclusive approach to the discovery of novel genes expressed in a given cell or tissue is the analysis of a database containing cell-specific or tissue-specific transcriptomes. The UniGene database provides information on tissue-specific gene cluster expression. The expressed sequence tags (ESTs) in UniGene are organized into clusters and each cluster is composed of sequences overlapping with at least one other member of the same cluster, but not with members of any other cluster. Thus, each cluster is likely to contain sequences corresponding to a single gene. Currently, UniGene is a large and widely used EST database and contains a large amount of unanalyzed information [13]. Furthermore, the UniGene database, combined with other computational bioinformatics databases, provides a great deal of information to assist in an understanding of characteristics of genes. Such in silico gene identification and analysis is a powerful tool of modern molecular biology [14–16].
Of 24 mouse heart and muscle libraries deposited in the UniGene database at the National Center for Biotechnology Information (NCBI), we used the adult mouse cardiac muscle library (Lib.8901) with 827 UniGene entries. Our analysis showed that 671 entries are known genes. These genes had previously been named and potential functions had been assigned to them. The other 156 entries were potentially novel genes (unnamed genes with unknown or unassigned functions). We focused on these novel gene candidates. Initially, we examined whether these gene candidates are genuine novel genes. Through various expression analyses, we sought obvious, significant expression of these genes in mouse cardiac muscle. With authentic genes, we used various in vitro and in silico analyses to obtain comprehensive expression information at both the gene and protein levels. Our study, presented here, is unique in that we offer systematic identification of previously uncharacterized genes that show cardiac expression, providing a new basis for studies to uncover molecular mechanisms underlying cardiac function.
2. Materials and methods
Total RNA samples were isolated from various tissues of adult mice and from hearts of mice of different ages. Our work conformed to the Guide for the Care and Use of Laboratory Animals published by the US National Institutes of Health (NIH Publication No. 85-23, revised in 1996). Complementary DNAs (cDNAs) were synthesized by random hexamer and oligo(dT) (Promega) priming using Omniscript reverse transcriptase (QIAGEN). Gene-specific primers are listed in Table 1. For PCR, the following conditions were used. The initial treatment for 5 min at 94 °C was followed by 30 cycles of 94 °C for 20 s, 55 °C for 40 s and 72 °C for 1 min. The cDNA levels were calibrated by glyceraldehyde-3-phosphate dehydrogenase (Gapdh).
2.2. Northern blot analysis
Total RNA and poly (A)+ RNA samples were incubated at 65 °C for 5 min and separated on 1.2% (w/v) agarose gels containing 1.8% (w/v) formaldehyde. Separated RNAs were transferred to Hybond-XL membrane (Amersham) using a capillary method. The RNA bands were crosslinked using the Stratalinker reagent (Stratagene, La Jolla, CA). The blots were pre-hybridized with Rapid-Hyb buffer (Amersham) for 30 min at 68 °C and then hybridized with fresh buffer containing 25 ng of probes, derived from gene-specific PCR products, at 68 °C for 2 h. Probe DNA was labeled with [α-32P]dCTP (PerkinElmer) using NucTrap® Probe Purification Columns and the Prime-It random priming kit (Stratagene). The blot was rinsed twice with 2× SSC/0.05% (w/v) SDS at room temperature for 10 min and twice with 0.1× SSC/0.1% (w/v) SDS at 68 °C for 10 min. The blots were developed with Hyperfilm (Amersham), using intensifying screens, at −80 °C.
2.3. Cell culture and transient transfection
The atrially derived cardiac muscle cell line HL-1 (a generous gift from Dr. W. C. Claycomb, Louisiana State University Medical Center) was plated in gelatin/fibronectin-coated culture vessels and maintained in Claycomb medium (JRH Biosciences) supplemented with 10% (v/v) fetal bovine serum (JRH, Lot No. 3J0229, no heat inactivation), 0.1 mM norepinephrine, 2 mM L-glutamine, 100 U/ml penicillin and 100 U/ml streptomycin [17]. The C2C12 and COS-7 cell lines (ATCC, Manassas, VA) were routinely propagated in a growth medium consisting of Dulbecco's modified Eagle's medium (GIBCO) with 10% (v/v) fetal bovine serum (GIBCO), 100 U/ml of penicillin and 100 μg/ml of streptomycin. Cells were maintained in a 95% humidified chamber at 37 °C with 5% (v/v) CO2. Full-length cDNAs of the novel genes were inserted into the pEGFP-N2 vector (Clontech). The cloning was designed to generate a C-terminal green fluorescent protein (GFP) fusion in each fusion protein. Transient transfection of clones was achieved using the Lipofectamine™ 2000 and Lipofectamine™ LTX transfection reagents. As controls, cells were transfected with the cDNA-pEGFP construct of calnexin (NM_007597) or stained with a mitochondrion-selective probe, MitoTracker® (Molecular Probe™). After 18–36 h, cells on the coverslip were fixed with 4% (v/v) paraformaldehyde and mounted on slides.
2.4. Calcium overlay assay
A PCR product corresponding to the entire coding region of the Mm.20818 gene was generated using gene-specific primers with a 5′ EcoRI site and a 3′ XhoI site. After restriction digestion, the amplified product was ligated into the pGEX-5X-2 vector (Pharmacia). The resulting construct was expressed in E. coli BL21. The calcium overlay technique was previously described [18]. In brief, proteins were transferred from gels to nitrocellulose membranes. The membranes were then washed three times in a buffer (60 mM KCl, 5 mM MgCl2, 10 mM imidazole–HCl, pH 6.8) for 30 min, followed by probing with 45[Ca]Cl2 at concentrations of 0.2–2.0 μM for up to 1 h. After probing, the membranes were washed in 67% (v/v) ethanol, dried and subjected to autoradiography with X-ray film for 1–3 days at −80 °C.
2.5. In silico analysis
The cDNA sequences of novel genes were subjected to BLAST analysis using software available in the database of NCBI Mouse Genome Resources. The University of California Santa Cruz (UCSC) Genome Bioinformatics Center (available on the World Wide Web at genome.ucsc.edu/) provides working draft assemblies for a large collection of genomes. The data include chromosomal locations, exon–intron structures and analyses of comparative genomics. The programs Goblet (available on the World Wide Web at goblet.molgen.mpg.de/cgi-bin/GOblet.cgi) and AmiGO (available on the World Wide Web at godatabase.org/cgi-bin/go.cgi) were used to predict gene ontology. Amino acid sequences deduced from cDNA sequences of novel genes were analyzed using several computational bioinformatics tools. The SignalP and TMHMM (available on the World Wide Web at www.cbs.dtu.dk/services/) and InterProScan (available on the World Wide Web at www.ebi.ac.uk/InterProScan/) programs were used.

Table 1
List of novel gene candidates and gene-specific primers designed for RT-PCR analysis
No. | UniGene ID | GenBank ID | PCR primers | Product size (bp)
2.6. Preprocessing of heart-related microarray data and identification of putative neighbor genes
Gene Expression Omnibus (GEO) is a public repository for microarray studies, in which the novel genes are included in microarray data from diverse organs and tissues. Data on 15 heart- and muscle-related microarray studies were obtained from GEO: GDS40 (cardiac development, maturation and aging), GDS41 (cardiac hypertrophy induced by pressure overload), GDS144 (cardiac hypertrophy, exercise induced), GDS246 (hypoxia), GDS251 (pulmonary fibrosis), GDS404 (circadian oscillations and cardiovascular function), GDS488 (myocardial infarction time course), GDS496 (heart failure in cardiac-specific TNF alpha transgenics), GDS627 (cardiac development in embryo), GDS638 (dystrophin-deficient mdx diaphragm muscle development time course), GDS639 (dystrophin-deficient mdx hindlimb muscle development time course), GDS641 (dystrophin-deficient mdx, mdx5cv and wild type skeletal muscle profiles), GDS1032 (coxsackievirus B3 infection effect on hearts: time course), GDS1303 (aldosterone effect on heart: time course) and GDS1306 (MAP kinase activation effect on heart: time course). The redundant expression profiles for any one gene were recalculated to obtain representative expression profiles. Then, UniGene identifiers were assigned using chip information from GEO. To obtain a correlation coefficient reflecting gene expression reliability across the 15 microarray data sets, a procedure merging the data sets into a single expression profile was followed. Here, we used a bypass to reject all genes having uneven distributions in the different microarray studies and we obtained accurate correlation coefficients as follows. Using the 15 heart-related microarray studies, the Pearson's correlation coefficient (PCC) between the expression levels of gene pairs was calculated to help identify novel cardiac genes on the basis of tissue-specific co-expression. The formula used was:

$$
r = \frac{n\sum_{i=1}^{n} X_i Y_i - \sum_{i=1}^{n} X_i \sum_{i=1}^{n} Y_i}{\sqrt{\left[\, n\sum_{i=1}^{n} X_i^{2} - \left(\sum_{i=1}^{n} X_i\right)^{2} \right]\left[\, n\sum_{i=1}^{n} Y_i^{2} - \left(\sum_{i=1}^{n} Y_i\right)^{2} \right]}}
$$

where r is PCC, X and Y are measures of gene expression and n represents the number of experiments from which data are taken. The PCC values were calculated in two different contexts. First, individual PCC values, from each microarray study, were calculated to reflect dynamic gene expression relationships under different heart-related conditions. Second, overall PCC values from integrated expression profiles across the 15 microarray studies were calculated to obtain static gene expression relationships in heart tissue. To validate the significance of PCC data, χ² analyses were performed in comparison with randomly constructed expression profiles created by artificial, arbitrary selection of genes and experimental conditions in microarray data. P-values of <10^−5 were considered significant. The final selection criteria for genes with related expression levels were as follows. First, the overall PCC, indicating a static functional gene relationship in the heart, was >0.8. Second, a strong relationship, defined as a PCC >0.9, was present in at least 25% of the microarray studies. Third, the minimum PCC (>0.7) identifying a functional relationship was satisfied in at least 75% of the microarray studies. Finally, at least 50% of the PCC values from each microarray study were >0.8 to support the notion that valid functional relationships were under study.

Fig. 1. Gene ontology profiles of genes expressed in cardiac muscle. Genes from the mouse cardiac muscle UniGene library (Lib.8901) were grouped into gene ontology categories belonging to the three main classes of molecular function (A), cellular component (B) and biological process (C). This classification is based on UniGene build 161 (January 2007).

Table 2
Classification and selection of genes in the cardiac muscle UniGene library
Genes | Number
Total entries | 827
Known | 671
Unknown | 156
Full-length genes | 110
Genes with GO and domain/motif | 58
Genes with cardiac expression (RT-PCR) | 41
Genes with similar or higher expression level in heart, compared to other tissue (RT-PCR) | 26
Genes with abundant expression (Northern blot) | 15
Genes in the cardiac muscle UniGene library (as of January 2005) were classified into known and unknown genes. Of unknown genes, partial length genes were excluded. Then, genes with gene ontology (GO) data, i.e. information about the molecular function, biological process and cellular component of gene products, or with known domains or motifs, were selected. These genes were analyzed in vitro to select authentic genes with abundant cardiac expression.
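As an illustration of the correlation analysis described in Section 2.6, the following minimal Python sketch computes Pearson's correlation coefficient in its raw-sum form and applies the four selection thresholds to a gene pair. It is not the script used in this study. The function names are hypothetical, and the reading of the fourth criterion (at least half of the per-study PCC values above 0.8) is an assumption, since the text leaves its exact scope open.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient in the computational (raw-sum) form."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    var_term = (n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2)
    if var_term <= 0:
        # A constant profile has no variance, so r is undefined.
        raise ValueError("PCC is undefined for a constant profile")
    return (n * sum_xy - sum_x * sum_y) / sqrt(var_term)


def fraction_above(values, threshold):
    """Fraction of values strictly greater than the threshold."""
    return sum(v > threshold for v in values) / len(values)


def passes_selection(overall_pcc, study_pccs):
    """Apply the four thresholds of Section 2.6 to one gene pair.

    overall_pcc: PCC from the merged expression profile.
    study_pccs:  one PCC per microarray study.
    The last test reflects an assumed reading of the fourth criterion.
    """
    return (
        overall_pcc > 0.8
        and fraction_above(study_pccs, 0.9) >= 0.25
        and fraction_above(study_pccs, 0.7) >= 0.75
        and fraction_above(study_pccs, 0.8) >= 0.50
    )
```

A pair would be scored by calling `pearson` once per study and once on the merged profile, then passing the results to `passes_selection`. The significance test against randomized profiles is not covered by this sketch.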
3. Results
3.1. The cardiac muscle library and in silico analysis to selectnovel gene candidates
Lib.8901, one of the UniGene libraries deposited in the UniGene database at NCBI (www.ncbi.nlm.nih.gov), is a collection of genes expressed in mouse cardiac muscle and contains 827 UniGene entries (as of January 2005). To gain an insight into the characteristics of the cardiac muscle library, we analyzed the genes in the library based on gene ontology codes that provide information about cellular component, molecular function and biological process (Fig. 1). The overall gene ontology feature of these genes was similar to that of genome-wide genes in mouse [19], except for the notable expansion in the category of mitochondrial genes. This reflects the active aerobic metabolism required for muscle contraction and is consistent with the abundance (25%) of this organelle in the cardiac muscle cells.
To identify and investigate novel genes expressed in cardiac muscle, we sorted genes in the library based on the following criteria. Genes previously named with known or potential functions were considered to be known genes, while unnamed genes with unassigned functions were regarded as unknown or novel genes. This classification of the 827 gene entries disclosed that 671 (81%) are known genes, and 156 genes (19%) are unknown or novel (Tables 2 and 3). To select more reliable genes, we further examined whether the 156 genes have mRNAs with poly (A). As a result, we identified 110 gene entries with potential full-length transcript sequences. Gene ontology data and indications of putative protein domains and motifs were also used as criteria for more effective selection of novel genes. Using these processes, the potential 110 novel genes were narrowed down to 58 candidates. These genes were further analyzed in vitro (Tables 1–3).
3.2. Cardiac expression of the novel genes
To determine whether the candidates selected from the UniGene library are true novel genes with cardiac expression, we performed various expression analyses (Tables 1–3). RT-PCR analysis showed that 41 of the 58 candidates are expressed with the expected sizes in mouse heart. However, no PCR products were detected for the other 17 candidates in the heart. Thus, they were excluded from further analysis. It should be noted that the PCR assays were designed to be similar among the candidates in primer properties and product size, and the reaction conditions were the same for all the candidates. The 41 gene candidates were further examined for tissue distribution using mouse cDNAs from multiple tissues. Among these, we discarded 15 candidates with the lowest expression levels in cardiac tissue, compared to other tissues examined (data not shown). For each of the remaining 26 genes, at least one other tissue showed a lower expression level than heart (Fig. 2). One gene candidate (Mm.19961) was expressed specifically in the heart and two candidates (Mm.158769 and Mm.133444) were either muscle-specific or were expressed predominantly in the heart. Expression of the other putative genes was widely distributed (Fig. 2A). To establish the expression patterns of the 26 novel genes during development, RT-PCR analysis was carried out using mouse heart cDNAs obtained at different time periods after birth. During development, the transcriptional levels of 12 genes did not change (group 1), and 14 genes showed increased transcriptional levels (group 2) (Fig. 2B).

Abbreviations: F, gene with full-length transcript sequences; G/D/M, gene with gene ontology, domain or motif data; P1, genes with cardiac expression by PCR; P2, genes with expression level in heart similar to or higher than that in at least one other tissue; N, genes with signal by Northern blot analysis.
3.3. Northern blot analysis
To determine the transcript sizes of the 26 novel gene candidates, Northern blot analysis was performed. Fifteen of the 26 genes showed significant transcriptional signals in heart tissue (Fig. 3), whereas the analysis resulted in no signals for the other 11 genes. It should be noted that the relative expression levels in different tissues are not comparable between the RT-PCR and Northern blot results for some of the genes (Figs. 2A and 3). This may be due to a limitation in reliable quantification during the non-exponential phase of the PCR reaction. Alternatively, our Northern blot assay may not be as accurate as the PCR assay. In this case, the Northern blot analysis should be regarded as an assay only for the qualitative evaluation of transcripts. Transcript sizes ranged from ∼0.5 kb (Mm.269736) to ∼5 kb (Mm.23928). For 12 genes, transcript sizes determined by Northern blotting were comparable to those estimated from the UniGene database, while for the other three genes, significant differences in transcript size (>0.5 kb) were evident between Northern blots and UniGene database sequences (Fig. 3). Thus, the transcript sequences for the 12 genes can be regarded with confidence as full-length cDNAs or sequences containing the majority of entire cDNA sequences. Mm.261329 produces transcripts of more than a single length, both in heart and skeletal muscle. This suggests that multiple transcript isoforms of this gene may be formed by alternative splicing. Taken together, our in silico and in vitro analyses of the 156 potential genes identified 15 authentic genes abundantly transcribed in mouse heart (Tables 2 and 3). It should be noted that Mm.19961 (named as natriuretic precursor type A) has been reported during the course of our study [20]. Thus, 14 genes excluding Mm.19961 were analyzed further.
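The stepwise narrowing of candidates reported in Sections 3.1 to 3.3 can be checked with simple arithmetic. The short Python sketch below uses only the counts stated in the text and in Table 2. The step labels and helper names are illustrative and not part of the original analysis.

```python
# Candidate counts reported in Sections 3.1-3.3 and Table 2.
funnel = [
    ("UniGene entries in Lib.8901", 827),
    ("unknown (unnamed) entries", 156),
    ("potential full-length transcripts", 110),
    ("gene ontology or domain/motif data", 58),
    ("cardiac expression by RT-PCR", 41),
    ("heart expression not lowest among tissues", 26),
    ("Northern blot signal in heart", 15),
]


def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to an integer."""
    return round(100 * part / whole)


def dropped_per_step(steps):
    """Number of entries removed at each successive filtering step."""
    counts = [count for _, count in steps]
    return [before - after for before, after in zip(counts, counts[1:])]


total = funnel[0][1]
unknown = funnel[1][1]
known = total - unknown  # entries already named with assigned functions

print(f"known: {known} ({percent(known, total)}%)")
print(f"unknown: {unknown} ({percent(unknown, total)}%)")
for (label, count), dropped in zip(funnel[1:], dropped_per_step(funnel)):
    print(f"{label}: {count} (removed {dropped})")
```

The last three removals (17, 15 and 11 candidates) sum to the 43 genes that separate the 58 in silico candidates from the 15 genes confirmed by Northern blot.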
3.4. In silico analysis of genomic, transcript and proteincharacteristics
To characterize the genomic, transcript and protein natures of the 14 novel genes, we performed various database searches. Fig. 4 shows chromosomal locations, genomic structures, exon organization, transcript sizes, numbers of amino acids, specific domain/motif and gene ontology of the predicted proteins encoded by the novel genes. The novel genes are widely distributed on mouse chromosomes. The sizes of the genes range from 420 bp (Mm.269736) to 4812 bp (Mm.23928). The exon numbers in the genes are also variable, ranging from 1 (Mm.138832) to 26 (Mm.23928) exons. For the three genes (Mm.138832, Mm.38813 and Mm.158769) predicted to have additional transcript sequences, based on the Northern blot result (Fig. 3), gene size and exon number could be larger than the present estimation. The protein-coding region of each gene was defined by selecting the longest amino acid sequence terminating before a polyadenylation signal (if there is one present). Gene products were predicted to contain various domains and motifs and were found to have potential gene ontology codes (Fig. 4). It is noteworthy in relation to the in silico prediction of the characteristics of the novel genes that seven of the genes have been recently named in the UniGene database. They are Mm.23928 (FCH domain only 2), Mm.41877 (solute carrier family 25, member 33), Mm.27886 (NADH dehydrogenase 1 alpha subcomplex 12), Mm.20818 (penta-EF hand domain containing 1), Mm.30113 (NADH dehydrogenase Fe-S protein 3), Mm.29342 (R3H domain containing 2) and Mm.272253 (inner mitochondrial membrane peptidase-like). Nonetheless, in vitro expression and functional data for any of the novel genes have not been reported. Based on the in silico information, proteins encoded by the novel genes might be implicated in calcium metabolism (Mm.261329 and Mm.20818), mitochondrial functions (Mm.41877, Mm.27886, Mm.269736, Mm.41719, Mm.30113 and Mm.272253), transcription and/or nuclear activity (Mm.138832, Mm.38813 and Mm.29342), cell structure (Mm.158769) and transmembrane transport (Mm.121122).

Fig. 2. Tissue distribution and developmental expression patterns of 26 novel gene candidates. (A) Tissue distribution of the genes by RT-PCR analysis in various tissues of adult male mice. B, brain; H, heart; K, kidney; Li, liver; Lu, lung; Ov, ovary; Sk, skeletal muscle; Sp, spleen; T, testis. (B) Expression of the candidates in heart during development. Gene expression was analyzed by PCR using cDNAs prepared from postnatal hearts obtained at different days during mouse development. Fifteen of these 26 genes, found to show significant Northern signals (Fig. 3) and thus analyzed further in this study, are indicated by vertical bars. To ensure the use of equivalent amounts of cDNA templates among different tissues (A) and different developmental stages (B), template amounts were normalized prior to PCR of the novel genes, based on the expression levels of the glyceraldehyde-3-phosphate dehydrogenase (Gapdh) gene.
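The coding-region rule used in Section 3.4, namely taking the longest open reading frame deduced from each cDNA, can be illustrated with a minimal Python sketch. It is a simplified stand-in rather than the procedure used in this study. It scans only the three forward frames, requires an ATG start codon (an assumption), and does not look for a polyadenylation signal. The function names and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(cdna):
    """Return (start, end, orf) for the longest ATG...stop open reading frame.

    Only the three forward reading frames are scanned. `end` is exclusive and
    the returned ORF includes its stop codon. An empty ORF is returned when
    no complete ORF is found.
    """
    seq = cdna.upper()
    best = (0, 0, "")
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first in-frame ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                end = pos + 3
                if end - start > len(best[2]):
                    best = (start, end, seq[start:end])
                start = None  # look for the next ORF in this frame
    return best


def protein_length(orf):
    """Number of amino acids encoded by an ORF that ends with a stop codon."""
    return max(len(orf) // 3 - 1, 0)


# Toy sequence with a short ORF in frame 2 and a longer one in frame 0.
toy_cdna = "GGATGAAATAGCATGCCCGGGAAATGATT"
start, end, orf = longest_orf(toy_cdna)
print(start, end, orf, protein_length(orf))
```

For a real cDNA, the sequence upstream of any polyadenylation signal would be passed in, so that the selected reading frame terminates before that signal as described above.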
3.5. Expression and localization of the novel proteins
To explore the characteristics of the novel genes at the protein and cellular levels [21], GFP-tagged full-length gene sequences were transiently transfected into various cell lines such as HL-1 (mouse cardiac muscle cells), C2C12 (mouse skeletal muscle myoblast cells) and COS-7 (monkey kidney cells) [17,22,23]. We observed GFP signals from most of the novel genes, except for Mm.29342 (Fig. 5). The cDNA sequence for Mm.29342 may not code for a protein, or expression of Mm.29342 could be highly transient, very low in amount or delayed. Proteins tagged with GFP were found to distribute to various subcellular compartments, such as nucleus (Mm.261329, Mm.138832, Mm.41877, Mm.27886, Mm.269736, Mm.20818, Mm.38813 and Mm.30113), cytoplasm (Mm.261329, Mm.23928, Mm.41877, Mm.27886, Mm.269736, Mm.20818, Mm.41719, Mm.38813, Mm.30113, Mm.272253, Mm.158769 and Mm.121122), mitochondria (Mm.41719 and Mm.272253), cytoskeleton (Mm.30113) and endoplasmic reticulum (Mm.158769) (Fig. 5). It is important to note that localization patterns are highly consistent between cell types for the majority of the genes. Another notable feature in the distribution patterns is that several gene products are located in more than one compartment. We next compared bioinformatic information (Fig. 4) with the actual localizations of the gene products (Fig. 5). The in silico information and localization data were in agreement for six of the 13 novel genes analyzed, namely Mm.261329, Mm.138832, Mm.20818, Mm.41719, Mm.38813 and Mm.121122.

Fig. 3. Transcript analysis by Northern blot hybridization. Total RNA or poly (A) RNA from adult mouse heart (H), skeletal muscle (S) and liver (L) were hybridized with cDNA probes of the genes. Agarose gels were stained with ethidium bromide to visualize 28S and 18S RNAs as a control to ensure loading of the same amount of RNA in each lane. Transcript sizes from the UniGene database (DB) and transcripts with significant differences in size between the Northern blots and DB are indicated below the blots.
3.6. Identification of genes correlated with the novel genes
To extend our systematic investigation into the novel genes, we attempted to identify genes exhibiting transcriptional correlation with the novel genes in heart and muscle, using available cardiac or muscle microarray data from GEO. We used the Python scripting language, a valuable tool in the management of microarray and genome annotation data. Based on the PCC values, clustering algorithms were established to discover groups of genes which exhibited similar expression profiles across the data sets. Using the strict PCC values in 15 GEO data sets (see Materials and methods), we found correlated and co-regulated genes for seven novel genes (Table 4). These were Mm.23928, Mm.41877, Mm.269736, Mm.20818, Mm.272253, Mm.158769 and Mm.121122. The number of correlated genes varied for each novel gene and their gene ontology also appeared diverse. It is noteworthy that some of the genes correlated with Mm.269736 are involved in mitochondrial functions and Mm.269736 is also predicted to have a similar activity (Table 4 and Fig. 4). Taken together, these data on the correlated genes provide a basis for characterization of a gene network involving the novel genes.
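The search for correlated genes described above can be pictured with a small Python sketch that scans a set of expression profiles for partners of one query gene. The text does not specify the clustering algorithm that was used, so this is only an assumed, simplified stand-in based on a single PCC threshold. The gene labels and expression values are made up for illustration and are not data from this study.

```python
from math import sqrt


def pcc(x, y):
    """Pearson's correlation coefficient in mean-centred form."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile is treated as uncorrelated
    return cov / (sd_x * sd_y)


def correlated_partners(query, profiles, threshold=0.8):
    """List (gene, PCC) pairs whose profile correlates with the query gene.

    profiles maps a gene label to its expression values across experiments.
    Results are sorted from the strongest to the weakest correlation.
    """
    target = profiles[query]
    hits = []
    for gene, profile in profiles.items():
        if gene == query:
            continue
        r = pcc(target, profile)
        if r > threshold:
            hits.append((gene, r))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Made-up profiles over five experiments (placeholder labels, not real genes).
example_profiles = {
    "query": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneA": [2.0, 4.0, 6.0, 8.0, 10.5],
    "geneB": [5.0, 4.0, 3.0, 2.0, 1.0],
    "geneC": [1.0, 3.0, 2.0, 4.0, 5.0],
}
partners = correlated_partners("query", example_profiles)
print([gene for gene, _ in partners])
```

In the study itself a partner also had to satisfy the per-study and overall criteria given in Materials and methods. The single threshold used here only conveys the general idea.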
3.7. Characterization of Mm.20818
As an initial step towards the functional characterization of the novel genes, we analyzed the protein characteristics of one of the novel genes, Mm.20818. The protein encoded by Mm.20818 is predicted to have the EF hand motif (Fig. 4) and is found to be located in the nucleus and cytoplasm (Fig. 5). The EF hand is a helix–turn–helix structural motif with 30 amino acids. Two α-helices in the motif often bind to Ca2+ ions through a conformational change of the loop [24]. However, nearly one-third of all known EF-hand motifs do not bind to Ca2+. We performed a 45Ca2+ overlay assay with the purified GST-Mm.20818 protein. As shown in Fig. 6, the recombinant Mm.20818 protein was found to have a Ca2+-binding activity. The GST protein itself did not bind to Ca2+. A protein sample from sarcoplasmic reticulum from mouse heart, containing cardiac calsequestrin as the major protein, showed a calsequestrin-specific signal. Thus, the result provides the first evidence that Mm.20818 encodes a protein with Ca2+-binding activity.
4. Discussion
Cardiac muscle is specialized to pump blood throughout the body continuously and rhythmically. Cardiac muscle facilitates efficient blood pumping without control by the central nervous system. Unlike other tissues excited by membrane depolarization, cardiac excitation and contraction coupling occurs upon polarization. When the pacemaker produces an excitation signal, the atria and ventricles contract sequentially. To maintain pumping at a constant rate, the heart provides its own energy. Many protein complexes are required for the physiological, energetic and mechanical cardiac functions. This suggests that many cardiac genes elaborately regulate cardiac behavior and function. To understand the heart in depth, identification of systematic networks in cardiac regulation is required at both the gene and protein levels.

Fig. 4. Genomic, transcript and protein characteristics of the novel genes in silico. The chromosomal locations of the novel genes, their structure and intron–exon organization were determined by genome database searches. Vertical bars and connecting horizontal lines show exons and introns, respectively. The directions of transcription are indicated by arrows. Coding regions were determined by selecting the longest open reading frames deduced from the cDNA sequences and the predicted coding regions are shaded. Putative domains or motifs are indicated and shaded. FCH, Fes/CIP4 (Cdc42-interacting protein 4) homology domain; Mito_carr, mitochondrial carrier protein; Complex1_17_2 kDa, NADH dehydrogenase; FRQ1, Frequenin 1; Q9CZX4, mitochondrial 18 kDa protein (MTP18); KRAB, Krueppel-associated box; Complex1_30 kDa, NADH dehydrogenase; Encore_like_R3H, the R3H domain containing an invariant arginine and a highly conserved histidine; Peptidase_S26, signal peptidase I; ABC transport, ATP-binding cassette.
Our inclusive approach to the discovery of novel genes from cardiac muscle uses databases and computational bioinformatics tools. UniGene provides information on gene-oriented clusters and also on tissue specificities of gene expression. UniGene is a powerful tool, offering an organized view of the transcriptome, with frequent updates and convenient linkages to other bioinformatics tools. Lib.8901, selected from the UniGene database including 15 heart-related and 9 muscle-related mouse libraries, is the only cardiac muscle library. Our analysis of Lib.8901 revealed that the library has 827 UniGene entries and that a significant proportion (19%) of the genes are unknown or uncharacterized. Of these entries, many consist of only partial ESTs, and there is therefore a risk that the genes may be removed from the UniGene database or may be amalgamated with other gene entries. We thus selected 110 gene candidates with full-length sequences, and subsequently 58 candidates with available data on gene ontology and domain/motif structure, since gene ontology and domain/motif data assist the direction of future work.

Fig. 5. Subcellular localization of novel gene products in HL-1, C2C12 and COS-7 cell lines. The cells were transfected with novel cDNA-GFP fusions and GFP-tagged proteins were visualized under fluorescent light and protein locations determined. Regions stained with Hoechst 33258 dye (blue) indicate nucleus. Micrographs in the first column show cells expressing GFP alone. Micrographs in the last two columns represent positive controls for localization in endoplasmic reticulum and mitochondrion. Cells were transfected with the cDNA-GFP construct of calnexin known to be localized to endoplasmic reticulum or stained with a mitochondrion-selective probe. N, nucleus; C, cytoplasm; M, mitochondrion; Cs, cytoskeleton; E, endoplasmic reticulum; Mit. probe, mitochondrion-selective probe. Scale bar = 10 μm.
Through in vitro expressional analyses, the 58 candidategenes were conclusively reduced to 15 authentic genes. Theother 43 genes were excluded from consideration because theirtranscripts were not detected in the heart (17 candidates),observed at a low level, compared to other tissues (15 can-didates) or not detected in the Northern blot analysis (11 can-didates). Most of the 15 genes were widely expressed in varioustissues. The developmental profile showed that a number of thenovel genes are expressed even in the very early stage of
Table 4. List of genes correlated with the novel genes in microarray data

Novel gene    Correlated gene    Annotation
Mm.41877      Mm.27578           Regulation from pol II
Mm.269736     Mm.29867           Precursor metabolites
              Mm.275780          Electron transport
              Mm.20841           ATP synthesis
              Mm.21501           Electron transport
              Mm.29683           Tricarboxylic acid cycle
              Mm.34775           Acetyl-CoA metabolism
              Mm.29939           Sodium ion transport
              Mm.248046          Protein folding
              Mm.6635            Electron transport
              Mm.158231
              Mm.27499
              Mm.141157
              Mm.41926
              Mm.425538
Mm.20818      Mm.20914           Phosphorylation
              Mm.305535          Arginine biosynthesis
              Mm.3759            Negative regulation of proliferation
              Mm.28858           Chemotaxis
              Mm.122725          Metabolism
              Mm.925             S phase
              Mm.276503
Mm.272253     Mm.126534          Regulation of translation
              Mm.4154            Blood coagulation
              Mm.5390            Electron transport
              Mm.24848
              Mm.41583
              Mm.49689
Mm.158769     Mm.5189            Phospholipid metabolism
              Mm.30088           Protein folding
Mm.121122     Mm.1963            Nuclear mRNA splicing
              Mm.329083          Male gonad development
              Mm.290628          Regulation of transcription
              Mm.285906          Regulation of transcription
              Mm.19992
              Mm.119162
              Mm.281086
              Mm.34087
              Mm.390367

Expression profiles of the novel genes were compared to those of other genes, using a variety of heart- and muscle-related microarray data (see Materials and methods). Genes with transcriptional correlation, showing expression profiles similar to those of the novel genes, are listed.
Fig. 6. Calcium-binding assay of the Mm.20818 protein. GST-fusion proteins were overexpressed in E. coli, resolved by SDS–PAGE and stained with Coomassie brilliant blue (CBB). An autoradiogram of the 45Ca2+ overlay is shown at the right. The band indicated by an arrow corresponds to the expected size of the fusion protein (∼46 kDa). The GST-Mm.20818 protein was found to be insoluble, preventing affinity purification of the fusion protein. Thus, total proteins including the fusion protein were used for analysis. Lane M, size marker; Lane 1, proteins from sarcoplasmic reticulum of mouse heart; Lane 2, purified GST protein; Lane 3, total cell lysates with GST (only) expressed from the transfection vector (a control); Lane 4, total cell lysates from cells where the transfection vector carried a non-induced GST-Mm.20818 gene; Lane 5, as Lane 4, but with an induced GST-Mm.20818 gene.
development. As the heart develops, some genes increase in expression while others do not. In mice, the number of cardiomyocytes increases continuously for up to 3 weeks after birth. Although hyperplasia ends early, a long-lasting capacity for DNA replication causes binucleated cardiomyocytes to form and finally induces hypertrophy. Genes with differential expression during growth might have special functions at specific developmental stages. Northern blot analysis, critical but usually excluded in large-scale studies, revealed important characteristics of the genes at the transcription level. New information about transcript size and isoform distribution was obtained. The data convincingly indicate that the putative genes are both authentic and significant.
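The candidate counts reported above can be checked with simple arithmetic. The short Python sketch below only restates numbers given in the text (827 entries, 19% unknown, 58 candidates, and the three exclusion categories); the variable names and category labels are illustrative and are not taken from the authors' scripts.

```python
# Counts quoted in the Discussion; names below are illustrative only.
library_entries = 827      # UniGene entries in Lib.8901
unknown_fraction = 0.19    # proportion reported as unknown or uncharacterized
candidates_tested = 58     # candidates carried into in vitro analysis

# Reasons for exclusion and the number of candidates in each category.
excluded = {
    "not detected in heart": 17,
    "low level in heart compared to other tissues": 15,
    "not detected by Northern blot": 11,
}

total_excluded = sum(excluded.values())
authentic = candidates_tested - total_excluded

# Approximate number of unknown entries implied by the rounded 19% figure.
approx_unknown = round(library_entries * unknown_fraction)

print(f"excluded: {total_excluded}, authentic: {authentic}")
print(f"approx. unknown entries in library: {approx_unknown}")
```

The three exclusion categories sum to the 43 excluded genes, leaving the 15 authentic genes. The estimate of unknown entries is only approximate because the 19% figure is rounded.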
Mammalian cells are highly compartmentalized, and a protein's localization is tightly related to its function. Despite rapid expansion of transcript databases, bioinformatic analyses alone are not sufficient to assign function to novel genes [21]. We examined the intracellular localization of the novel proteins using the GFP fusion protein approach. For six of the novel genes, the localization data matched the information obtained from in silico analysis. For the other genes, whose localization data contradicted the in silico predictions, the bioinformatic analysis may be unable to make accurate predictions. Alternatively, GFP may have abnormal effects on the targeting signals of the proteins. Collectively, our transient transfection results provided critical data on the cellular localization of proteins encoded by the novel genes and a firm basis for further study of these genes.
To gain insight into the gene network involving the novel genes and to acquire detailed information on gene function in cells, we selected neighboring genes, defined as genes that are co-regulated under a particular set of conditions. To define such genes, we used the GEO database, an online resource for gene expression data browsing, query and retrieval. Using GEO microarray data and scripts written in the Python language, we detected several genes with similar expression patterns under various conditions. Python scripting is a powerful tool for the systematic management of large amounts of data, such as those found in the UniGene and GEO databases. From correlation analyses, we showed that several significant genes were related to the novel genes. These genes may function similarly in various complexes. Alternatively, the genes may share common upstream processes.
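As an illustration of this kind of correlation analysis, the following minimal Python sketch ranks genes by the similarity of their expression profiles to that of a query gene. It is not the authors' script. The use of the Pearson coefficient, the 0.9 threshold, the function names and the toy expression values are all assumptions made for the example; the actual procedure is described in Materials and methods.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        # A flat profile has no defined correlation; treat it as uncorrelated.
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(sxx * syy)


def correlated_genes(query, profiles, threshold=0.9):
    """Return (gene, r) pairs with r >= threshold, highest correlation first."""
    q = profiles[query]
    hits = [(g, pearson(q, p)) for g, p in profiles.items() if g != query]
    return sorted((h for h in hits if h[1] >= threshold), key=lambda h: -h[1])


# Invented expression values across five conditions (hypothetical gene names).
profiles = {
    "novel_1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gene_a": [2.1, 4.0, 6.2, 8.1, 9.9],  # rises together with novel_1
    "gene_b": [5.0, 4.0, 3.0, 2.0, 1.0],  # falls as novel_1 rises
    "gene_c": [3.0, 3.0, 3.0, 3.0, 3.0],  # unchanged across conditions
}

hits = correlated_genes("novel_1", profiles)
for gene, r in hits:
    print(f"{gene}\t{r:.3f}")
```

In this toy data set only gene_a passes the threshold. With real GEO series, each profile would hold one value per sample, and the threshold would need to be chosen with the number of samples in mind.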
We attempt to categorize the novel genes based on our in vitro and in silico findings. It should be noted that the in vitro
data confirm the bioinformatic predictions for some but not all of the novel genes and should extend the in silico data in future investigations. The first group (Mm.261329 and Mm.20818) consists of potential calcium-binding proteins with EF-hand motifs. Indeed, we performed a Ca2+ overlay assay with the Mm.20818 protein and found for the first time that this novel protein binds Ca2+. Ca2+, present in all cellular compartments, plays an essential role in the regulation of a wide variety of cellular functions in the heart [25,26]. In muscle cells, the sarcoplasmic reticulum stores Ca2+, regulates intracellular Ca2+ levels and thus is central to the excitation-contraction process [27]. The novel genes may be involved in cardiac contraction or signal transduction. Consistent with this, the human ortholog of Mm.261329 has been named myosin regulatory light chain MRLC2. The second-group genes (Mm.41877, Mm.27886, Mm.269736, Mm.41719, Mm.30113 and Mm.272253) are likely to be related to mitochondrial regulation. Mitochondria have an important role in cardiac contraction; hence, these genes may be required for mitochondrial metabolism, perhaps as components of the electron transfer complex [28]. In fact, four of them (Mm.41877, Mm.27886, Mm.30113 and Mm.272253) have recently been annotated in the UniGene database as components localized in mitochondria (see Results). In addition, the human homolog of Mm.269736 has been given the gene name ubiquinol–cytochrome c reductase complex 7.2-kDa protein. However, they were found to be located in the nucleus and/or cytoplasm in this study. Thus, further investigation will be required to determine whether the in silico annotations are imprecise or whether the ability of the proteins to localize correctly is abolished in the GFP fusion proteins. The third-group genes (Mm.138832, Mm.38813 and Mm.29342) may have functions in gene transcriptional regulation or nuclear integrity. The other three novel genes should arguably not be clustered into a group, as the genes share no obvious expression or computational pattern. Two of them, Mm.158769 and Mm.121122, are predicted to be implicated in cell structure and transmembrane transport, respectively.
In conclusion, identification of genes expressed in cardiac muscle is crucial to understanding the molecular basis of cardiac function. Our systematic and integrative approaches explored the genomic, transcriptional and protein expression characteristics of these genes and of correlated genes. Our full-scale study of novel cardiac genes should provide a large resource and a firm basis for future functional characterization of the genes, leading to the elucidation of various mechanisms underlying cardiac function.
Acknowledgments
This work was supported by a Korean Systems Biology Research grant (M10503010001-06N0301-00110) from the Korea Ministry of Science and Technology.
References
[1] Schaub MC, Hefti MA, Zaugg M. Integration of calcium with the signaling network in cardiac myocytes. J Mol Cell Cardiol 2006;41:183–214.
[2] Liew CC, Hwang DM, Wang RX, Ng SH, Dempsey A, Wen DH, et al. Construction of a human heart cDNA library and identification of cardiovascular based genes (CVBest). Mol Cell Biochem 1997;172:81–7.
[3] Bortoluzzi S, d'Alessi F, Danieli GA. A computational reconstruction of the adult human heart transcriptional profile. J Mol Cell Cardiol 2000;32:1931–8.
[4] Plageman Jr TF, Yutzey KE. Microarray analysis of Tbx5-induced genes expressed in the developing heart. Dev Dyn 2006;235:2868–80.
[5] Nanni L, Romualdi C, Maseri A, Lanfranchi G. Differential gene expression profiling in genetic and multifactorial cardiovascular diseases. J Mol Cell Cardiol 2006;41:934–48.
[6] Roy S, Khanna S, Kuhn DE, Rink C, Williams WT, Zweier JL, et al. Transcriptome analysis of the ischemia-reperfused remodeling myocardium: temporal changes in inflammation and extracellular matrix. Physiol Genomics 2006;25:364–74.
[7] Lamirault G, Gaborit N, Le Meur N, Chevalier C, Lande G, Demolombe S, et al. Gene expression profile associated with chronic atrial fibrillation and underlying valvular heart disease in man. J Mol Cell Cardiol 2006;40:173–84.
[8] Kittleson MM, Hare JM. Molecular signature analysis: using the myocardial transcriptome as a biomarker in cardiovascular disease. Trends Cardiovasc Med 2005;15:130–8.
[9] Kittleson MM, Minhas KM, Irizarry RA, Ye SQ, Edness G, Breton E, et al. Gene expression analysis of ischemic and nonischemic cardiomyopathy: shared and distinct genes in the development of heart failure. Physiol Genomics 2005;21:299–307.
[10] Ueno S, Ohki R, Hashimoto T, Takizawa T, Takeuchi K, Yamashita Y, et al. DNA microarray analysis of in vivo progression mechanism of heart failure. Biochem Biophys Res Commun 2003;307:771–7.
[11] Anisimov SV, Boheler KR. Aging-associated changes in cardiac gene expression: large scale transcriptome analysis. Adv Gerontol 2003;11:67–75.
[12] Zhao M, Chow A, Powers J, Fajardo G, Bernstein D. Microarray analysis of gene expression after transverse aortic constriction in mice. Physiol Genomics 2004;19:93–105.
[13] Pontius JU, Wagner L, Schuler GD. National Center for Biotechnology Information. Bethesda, MD: National Library of Medicine, 2003.
[14] Hong S, Choi I, Woo JM, Oh J, Kim T, Choi E, et al. Identification and integrative analysis of 28 novel genes specifically expressed and developmentally regulated in murine spermatogenic cells. J Biol Chem 2005;280:7685–93.
[15] Oh J, Lee J, Woo JM, Choi E, Park I, Han C, et al. Systematic identification and integrative analysis of novel genes expressed specifically or predominantly in mouse epididymis. BMC Genomics 2006;7:314.
[16] Lopez C, Jorge V, Piegu B, Mba C, Cortes D, Restrepo S, et al. A unigene catalogue of 5700 expressed genes in cassava. Plant Mol Biol 2004;56:541–54.
[17] Claycomb WC, Lanson Jr NA, Stallworth BS, Egeland DB, Delcarpio JB, Bahinski A, et al. HL-1 cells: a cardiac muscle cell line that contracts and retains phenotypic characteristics of the adult cardiomyocyte. Proc Natl Acad Sci U S A 1998;95:2979–84.
[18] Maruyama K, Mikawa T, Ebashi S. Detection of calcium binding proteins by 45Ca autoradiography on nitrocellulose membrane after sodium dodecyl sulfate gel electrophoresis. J Biochem 1984;95:511–9.
[19] Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, et al. Initial sequencing and comparative analysis of the mouse genome. Nature 2002;420:520–62.
[20] Cea LB. Natriuretic peptide family: new aspects. Curr Med Chem 2005;3:87–98.
[21] Simpson JC, Wellenreuther R, Poustka A, Pepperkok R, Wiemann S. Systematic subcellular localization of novel proteins identified by large-scale cDNA sequencing. EMBO Rep 2000;1:287–92.
[22] Sun L, Trausch-Azar JS, Ciechanover A, Schwartz AL. Ubiquitin-proteasome-mediated degradation, intracellular localization, and protein synthesis of MyoD and Id1 during muscle differentiation. J Biol Chem 2005;280:26448–56.
[23] Barbosa MS, Wettstein FO. Identification and characterization of the CRPV E7 protein expressed in COS-7 cells. Virology 1988;165:134–40.
[24] Grabarek Z. Structural basis for diversity of the EF-hand calcium-binding proteins. J Mol Biol 2006;359:509–25.