Laboratory of Genome Database Laboratory of Sequence Analysis

Owing to continuous developments of high-throughput experimental technologies, ever-increasing amounts of data are being generated in functional genomics and proteomics. We are developing a new generation of databases and computational technologies, beyond the traditional genome databases and sequence analysis tools, for making full use of such large-scale data in biomedical applications, especially for elucidating cellular functions as behaviors of complex interaction systems.

1. Comprehensive repository for community genome annotation

Toshiaki Katayama, Mari Watanabe and Minoru Kanehisa

KEGG DAS is an advanced genome database system providing DAS (Distributed Annotation System) service for all organisms in the GENOME and GENES databases in KEGG (Kyoto Encyclopedia of Genes and Genomes). Currently, KEGG DAS contains genome sequences of over 300 organisms. The KEGG DAS server provides gene annotations linked to the KEGG PATHWAY and LIGAND databases, as well as the SSDB database containing paralog, ortholog and motif information. In addition to the coding genes, information on non-coding RNAs predicted using the Rfam database is also provided to fill in the annotation of the intergenic regions of the genome. We have been developing the server based on several open source software packages, including BioRuby, BioPerl, BioDAS and GMOD/GBrowse, to make the system consistent with the existing open standards. The contents of the KEGG DAS database can be accessed graphically in a web browser using the GBrowse GUI (graphical user interface) and also programmatically by the DAS protocol. DAS, which is an XML over HTTP data retrieval protocol, enables the user to write various kinds of automated programs for analyzing genome sequences and annotations. For example, by combining KEGG DAS with KEGG API, a program to retrieve upstream sequences of a given set of genes that have similar expression patterns on the same pathway can be written very easily. GBrowse, the graphical interface, enables the user to browse, search, zoom and visualize a particular region of the genome. Moreover, users are also able to add their own annotations onto the GBrowse view by providing another

Human Genome Center

Laboratory of Genome Database
Laboratory of Sequence Analysis

Professor Minoru Kanehisa, Ph.D.
Research Associate Toshiaki Katayama, M.Sc.
Research Associate Shuichi Kawashima, M.Sc.
Lecturer Tetsuo Shibuya, Ph.D.
Research Associate Michihiro Araki, Ph.D.


DAS server or by simply uploading their own data as a file. This functionality enables researchers to add various annotations to the genome, and by sharing their annotations with the community they can continuously refine the genome annotation, so-called “community annotation.” KEGG DAS is updated weekly and is freely available at http://das.hgc.jp/.
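As a minimal sketch of what programmatic access over DAS can look like, the Python below builds a features request URL and parses a features response. It assumes the generic DAS 1 layout (a server prefix, then /das/, a data source name, and the features command with a segment argument). The data source name "demo", the exact path on das.hgc.jp, and the sample XML are illustrative assumptions, not taken from the KEGG DAS server.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def das_features_url(server, source, segment, start, stop):
    """Build a DAS 'features' request URL for one segment range.

    The '/das/<source>/features' layout is assumed from the generic DAS 1
    convention; the real path on a given server may differ.
    """
    query = urlencode({"segment": f"{segment}:{start},{stop}"}, safe=":,")
    return f"{server.rstrip('/')}/das/{source}/features?{query}"


def parse_features(xml_text):
    """Extract a list of feature dictionaries from a DASGFF document."""
    root = ET.fromstring(xml_text)
    features = []
    for segment in root.iter("SEGMENT"):
        for feat in segment.findall("FEATURE"):
            features.append({
                "segment": segment.get("id"),
                "id": feat.get("id"),
                "type": feat.findtext("TYPE"),
                "start": int(feat.findtext("START")),
                "end": int(feat.findtext("END")),
                "strand": feat.findtext("ORIENTATION"),
            })
    return features


# Hand-written sample response (hypothetical data) so the sketch runs offline.
SAMPLE_RESPONSE = """
<DASGFF>
  <GFF version="1.0" href="http://example.org/das/demo/features">
    <SEGMENT id="chr" start="1" stop="5000">
      <FEATURE id="geneA" label="geneA">
        <TYPE id="CDS">CDS</TYPE>
        <START>100</START>
        <END>400</END>
        <ORIENTATION>+</ORIENTATION>
      </FEATURE>
      <FEATURE id="rnaB" label="rnaB">
        <TYPE id="ncRNA">ncRNA</TYPE>
        <START>900</START>
        <END>980</END>
        <ORIENTATION>-</ORIENTATION>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>
"""

if __name__ == "__main__":
    print(das_features_url("http://das.hgc.jp/", "demo", "chr", 1, 5000))
    for feature in parse_features(SAMPLE_RESPONSE):
        print(feature)
```

In a real client the XML text would come from an HTTP GET on the generated URL; the parsing step stays the same.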

2. SOAP/WSDL interface for the KEGG system

Shuichi Kawashima, Toshiaki Katayama and Minoru Kanehisa

We have continued to develop KEGG API, a web service that improves the usability of the KEGG system. KEGG is a suite of databases and associated software, integrating our current knowledge of molecular interaction networks in biological processes (PATHWAY database), information about the genomic space (GENES database), and information about the chemical space (LIGAND database). KEGG API provides a valuable means of retrieving various kinds of information stored in KEGG and has become an increasingly popular mode of access. Recent key changes include the following: (1) Several new methods to retrieve information from the LIGAND and GLYCAN databases have been added. (2) Methods to retrieve information concerning KO (KEGG Orthology) have been reorganized. (3) The bconv method has been added to convert external database IDs (e.g. NCBI Gene ID) to KEGG IDs. (4) Methods to utilize “entry” element information in KGML (KEGG Markup Language) have been added. (5) A client for the C# programming language is now supported. The KEGG API is available at http://www.genome.jp/kegg/soap/.

3. High performance database retrieval system

Kazutomo Ushijima, Chiharu Kawagoe, Toshiaki Katayama, Shuichi Kawashima, Kenta Nakai, Minoru Kanehisa

The number of entries in biological databases has been increasing exponentially year by year. For example, there were 10,106,023 entries in the GenBank database in the year 2000, which has now grown to 49,498,755 (Release 150). In order for such a vast amount of data to be searched at high speed, we have developed a high performance database entry retrieval system named HiGet. The HiGet system is built on HiRDB, a commercial ORDBMS (Object-oriented Relational Database Management System) developed by Hitachi, Ltd. It is publicly accessible through the Web page at http://higet.hgc.jp/ or as a SOAP-based web service at http://higet.hgc.jp/soap/. HiGet can execute full text search on various biological databases. In addition to the original plain format, the system holds the data in XML format in order to provide a field-specific search facility. When a complicated search condition is issued to the system, the search is executed efficiently by combining several types of indices to reduce the number of records to be processed within the system. The currently searchable databases are GenBank, UniProt, Prosite, OMIM, PDB and RefSeq. We are planning to include other valuable databases and to develop an inter-database search interface and a complex search facility combining keyword search and sequence similarity search.
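The idea of combining several indices to shrink the set of records that must be examined can be illustrated with a toy inverted index. This is a sketch of the general technique only, not the HiGet or HiRDB implementation; the records, field names and function names are made up for illustration.

```python
def build_index(records, field):
    """Map each lower-cased word in `field` to the set of record IDs containing it."""
    index = {}
    for rec_id, rec in records.items():
        for token in rec[field].lower().split():
            index.setdefault(token, set()).add(rec_id)
    return index


def search(indices, conditions):
    """Return IDs of records satisfying every (field, word) condition.

    Candidate sets are intersected smallest first, so the working set
    shrinks as early as possible and the loop can stop once it is empty.
    """
    candidate_sets = [indices[field].get(word.lower(), set())
                      for field, word in conditions]
    if not candidate_sets:
        return set()
    candidate_sets.sort(key=len)
    result = set(candidate_sets[0])
    for ids in candidate_sets[1:]:
        result &= ids
        if not result:
            break
    return result


# Hypothetical records standing in for database entries.
RECORDS = {
    "R1": {"organism": "Homo sapiens", "definition": "tumor protein p53"},
    "R2": {"organism": "Mus musculus", "definition": "tumor protein p53"},
    "R3": {"organism": "Homo sapiens", "definition": "insulin precursor"},
}
INDICES = {field: build_index(RECORDS, field)
           for field in ("organism", "definition")}

if __name__ == "__main__":
    print(search(INDICES, [("organism", "sapiens"), ("definition", "p53")]))
```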

4. EGassembler: web server for large-scale clustering and assembling ESTs and genomic DNA fragments

Ali Masoudi-Nejad, Shuichi Kawashima, Koichiro Tonomura, Masanori Suzuki, Minoru Kanehisa

EST sequencing has proven to be an economically feasible alternative for gene discovery in species lacking a draft genome sequence. Ongoing large-scale EST sequencing projects need bioinformatics tools that facilitate uniform handling of ESTs. This gives renewed importance to a universal tool for processing and functional annotation of large sets of ESTs in order to cover the complete transcriptome of an organism. EGassembler (http://egassembler.hgc.jp/) is a web server that provides an automated as well as a user-customized analysis tool for cleaning, repeat masking, vector trimming, organelle masking, clustering and assembling of ESTs and genomic fragments. It is also designed to serve as a standalone web application for each one of those processes. The web server is freely and publicly available and provides the community with a unique all-in-one online application for large-scale clustering and assembly of ESTs and genomic DNA, especially for EST processing and annotation projects.

5. SSS: a new sequence similarity search service

Toshiaki Katayama, Kazuhiro Ohi, Minoru Kanehisa

There are various services in the world for finding similar sequences in a database, such as the well-known BLAST service provided at NCBI. However, in those services neither the search method nor the database to be searched can be added from outside. To provide our supercomputer resources at the Human Genome Center to the research community, we started to develop a new service for sequence similarity search, SSS. In SSS, the user can select the search algorithm from BLAST, FASTA, SSEARCH, TRANS and EXONERATE. This variety of options is unique among the public services. The user is then prompted to select an appropriate database depending on the algorithm selected, and the search is executed. On the back end, we implemented the search system on the Sun Grid Engine to make efficient use of resources on distributed computers. As a result, we are able to provide time-consuming services such as TRANS and EXONERATE in addition to the popular algorithms. The SSS service is freely available at http://sss.hgc.jp/.

6. Integrative analysis of chemical and genomic information on the biosynthetic circuits of medicinal natural products

Michihiro Araki, Tetsuo Shibuya, Kohichi Suematsu, and Minoru Kanehisa

Medicinal natural products have been the major sources of bioactive compounds with diverse pharmacological activities, and are enzymatically synthesized as secondary metabolites for specific biological purposes. In order to make full use of the potential of natural products as research tools as well as drug leads in the context of synthetic biology, it is of great importance to understand their biosynthetic strategies through integrative computational analyses of chemical and genomic information. An increasing amount of such information is becoming available, allowing us to extract the design principles of natural products encoded in genomic information. Natural products are composed of a series of molecular building blocks, such as fatty acids and amino acids, which can be regarded as minimal units hierarchically organized into the biosynthetic circuits. We define the molecular building blocks required for describing the chemical information of natural products. Each natural product is then expressed as a combination of molecular building blocks, with the corresponding enzymatic information as links between building blocks, and collected in a database. The knowledge database constructed from various resources enables us to identify the system structures of the biosynthetic circuits. We are also developing a computational method to extract distinctive network structures consisting of both chemical and genomic information, which will be useful for designing biosynthetic circuits to help metabolic engineering of novel bioactive compounds.

7. Development of algorithms for biosynthetic process analysis

Kohichi Suematsu, Tetsuo Shibuya, Michihiro Araki, and Minoru Kanehisa

We are developing algorithms for identifying the biosynthetic process of a given medicinal product by utilizing the database of identified building blocks in biosynthetic processes. Given a set of building block graphs, the problem is to find the most reasonable decomposition of a graph in which each decomposed subgraph is the same as or similar to one of the building block graphs. Although this problem is NP-hard, we have developed new efficient algorithms for it and tools based on those algorithms.
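To make the problem statement concrete, here is a deliberately simplified sketch. It assumes that candidate occurrences of building blocks in the target graph have already been found, so the hard subgraph-matching step is skipped, and it then searches exhaustively for the smallest set of non-overlapping occurrences that covers every node. The exhaustive search, the "fewest blocks" criterion, and all names and data are illustrative assumptions; they are not the algorithms developed in this work.

```python
def min_decomposition(nodes, occurrences):
    """Partition `nodes` into the fewest non-overlapping block occurrences.

    nodes       -- iterable of node IDs of the target graph
    occurrences -- list of (block_name, frozenset_of_node_ids) pairs,
                   assumed to come from a separate matching step
    Returns the chosen list of occurrences, or None if no partition exists.
    """
    best = None

    def extend(remaining, chosen):
        nonlocal best
        # Prune branches that cannot beat the best partition found so far.
        if best is not None and len(chosen) >= len(best):
            return
        if not remaining:
            best = list(chosen)
            return
        pivot = min(remaining)  # always cover the smallest uncovered node next
        for name, block_nodes in occurrences:
            if pivot in block_nodes and block_nodes <= remaining:
                chosen.append((name, block_nodes))
                extend(remaining - block_nodes, chosen)
                chosen.pop()

    extend(frozenset(nodes), [])
    return best


# Hypothetical target with six nodes and five candidate occurrences.
OCCURRENCES = [
    ("A", frozenset({1, 2})),
    ("B", frozenset({3, 4})),
    ("C", frozenset({5, 6})),
    ("D", frozenset({1, 2, 3})),
    ("E", frozenset({4, 5, 6})),
]

if __name__ == "__main__":
    print(min_decomposition(range(1, 7), OCCURRENCES))
```

The running time of this search grows exponentially with the number of candidate occurrences, which is consistent with the hardness noted above and is why it is only suitable for tiny examples.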

8. Development of algorithms for protein structure indexing

Tetsuo Shibuya

Protein structure analysis is one of the most important research issues in the post-genomic era, and faster and more accurate query data structures for such 3-D structures are highly desired for research on proteins. We proposed a new data structure for indexing protein 3-D structures. There are many efficient indexing structures for strings, but it has been considered very difficult to design such sophisticated data structures for 3-D structures like proteins. By using the data structure, we can search efficiently for all substructures of the indexed proteins whose RMSD (root mean square distance) or URMSD (unit-vector root mean square distance) to a given query 3-D structure is not larger than a given bound. Our data structure can be stored in O(n) space, where n is the sum of the lengths of the set of proteins. We propose an efficient construction algorithm for it and a quasi-linear time search algorithm. Further algorithms for more flexible structure searching, function prediction, clustering and motif finding based on the data structure are also under development.
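For reference, the search criterion can be written out as follows. This uses the standard definition of RMSD between two equal-length coordinate sequences, minimized over rigid motions, which is assumed here to be the measure intended; the symbols P, Q, m, R, t and b are introduced only for illustration.

```latex
% P = (p_1, ..., p_m): coordinates of a candidate substructure
% Q = (q_1, ..., q_m): coordinates of the query structure
% R: rotation matrix, t: translation vector, b: the given bound
\mathrm{RMSD}(P, Q) \;=\; \min_{R,\, t}
  \sqrt{\frac{1}{m} \sum_{i=1}^{m} \bigl\lVert p_i - (R\, q_i + t) \bigr\rVert^{2}},
\qquad
P \text{ is reported when } \mathrm{RMSD}(P, Q) \le b .
```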

9. Construction of a knowledge base for tracing drug evolution

Michihiro Araki and Minoru Kanehisa

Current drugs are mostly derived by modification of known drug structures or from lead structures to be optimized for targeting new molecules or obtaining improved efficacy. Recent computational approaches have analyzed the chemical properties of drugs, lead compounds and seed compounds to provide different kinds of chemical rules for ‘drug-likeness’. The chemical rules based on chemical property distributions are very useful for filtering drug-like molecules out of empirically synthesized chemical libraries, but do not really explain how to modify known drugs or lead structures into new drug candidates. To explore design rules in drug development, it is necessary to focus on the empirical modification processes and to construct a knowledge base for tracing this chemical evolution. We have started collecting data on drug evolution from databases and the literature, and part of the data has already been implemented in the drug structure maps in the KEGG DRUG database.

Publications

Tamori A, Yamanishi Y, Kawashima S, Kanehisa M, Enomoto M, Tanaka H, Kubo S, Shiomi S, Nishiguchi S. Alteration of gene expression in human hepatocellular carcinoma with integrated hepatitis B virus DNA. Clin. Cancer Res. 11: 5821-5826, 2005

Yamada, T., Kawashima, S., Mamitsuka, H., Goto, S., Kanehisa, M. Comprehensive analysis and prediction of synthetic lethality using subcellular locations. Genome Inform Ser Workshop Genome Inform. 2005, 16(1): 150-158, 2005

Okuda S, Kawashima S, Goto S, Kanehisa M. Conservation of gene co-regulation between two prokaryotes: Bacillus subtilis and Escherichia coli. Genome Inform Ser Workshop Genome Inform. 2005, 16(1): 116-124, 2005

Honda, W., Kawashima, S., Kanehisa, M. Autoimmune diseases and peptide variations. Genome Inform. Ser. Workshop Genome Inform. 2005, 16(1): 272-280, 2005

Kanehisa, M., Goto, S., Hattori, M., Aoki-Kinoshita, K.F., Itoh, M., Kawashima, S., Katayama, T., Araki, M., Hirakawa, M. From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res. 34: D354-357, 2006

Okuda, S., Katayama, T., Kawashima, S., Goto, S., Kanehisa, M. ODB: a database of operons accumulating known operons across multiple genomes. Nucleic Acids Res. 34: D358-362, 2006

Hashimoto, K., Goto, S., Kawano, S., Aoki-Kinoshita, K.F., Ueda, N., Hamajima, M., Kawasaki, T., Kanehisa, M. KEGG as a glycome informatics resource. Glycobiology, in press

Shibuya, T., Indexing Structures for Biomolecular Structures, The First Japan-Taiwan Bilateral Symposium on Bioinformatics, to appear.

Shibuya, T., Geometric Suffix Tree: A New Index Structure for Protein 3-D Structures, IPSJ SIG Notes SIGAL 105, to appear.

Sakai, H., Murakami, H., Aburatani, S., Shibuya, T., Horimoto, K., Kanehisa, M. Bayesian Approach for Sequence Pattern Search in Tissue Specific Alternative Splicing. The 9th World Multi-Conference on Systemics, Cybernetics and Informatics, Vol. VIII: 25-30, 2005


The aim of the research at this laboratory is to establish computational methodologies for discovering and interpreting information in nucleic acid sequences, proteins and other experimental data arising from research in Genome Science. Our current concern is focused on Computational Systems Biology and its related computational techniques. Apart from the research activity, the laboratory has been providing bioinformatics software tools and has been taking a leading part in organizing an international forum for Genome Informatics.

1. Computational Systems Biology

a. Estimating gene regulatory networks and protein-protein interactions of Saccharomyces cerevisiae from multiple genome-wide data

Naoki Nariai, Yoshinori Tamada, Seiya Imoto, Satoru Miyano

Biological processes in cells are carried out properly through gene regulation, signal transduction and interactions between proteins. To understand such molecular networks, we proposed a statistical method to estimate gene regulatory networks and protein-protein interaction networks simultaneously from DNA microarray data, protein-protein interaction data and other genome-wide data. We unify Bayesian networks and Markov networks for estimating gene regulatory networks and protein-protein interaction networks according to the reliability of each biological information source. Through the simultaneous construction of gene regulatory networks and protein-protein interaction networks of the Saccharomyces cerevisiae cell cycle, we predict the roles of several genes whose functions are currently unknown. By using our probabilistic model, we can detect false positives in high-throughput data, such as yeast two-hybrid data. In a genome-wide experiment, we find possible gene regulatory relationships and protein-protein interactions between large protein complexes that underlie complex regulatory mechanisms of biological processes.

b. Utilizing evolutionary information and gene expression data for estimating gene networks with Bayesian network models

Yoshinori Tamada, Hideo Bannai1, Seiya Imoto, Toshiaki Katayama, Minoru Kanehisa, Satoru Miyano: 1Kyushu University

Since microarray gene expression data do not contain sufficient information for estimating accurate gene networks, other biological information has been considered to improve the estimated networks. Recent studies have revealed that highly conserved proteins that exhibit similar expression patterns in different organisms have almost the same function in each organism. Such conserved proteins are also known to play similar roles in terms of the regulation of genes. Therefore, this evolutionary information can be used to refine regulatory relationships among genes, which are estimated from gene expression data. We proposed a statistical method for estimating gene networks from gene expression data by utilizing evolutionarily conserved relationships between genes. Our method simultaneously estimates two gene networks of two distinct organisms, with a Bayesian network model utilizing the evolutionary information so that gene expression data of one organism help to estimate the gene network of the other. We show the effectiveness of the method through the analysis of Saccharomyces cerevisiae and Homo sapiens cell cycle gene expression data. Our method was successful in estimating gene networks that capture many known relationships as well as several unknown relationships which are likely to be novel.

Human Genome Center

Laboratory of DNA Information Analysis

Professor Satoru Miyano, Ph.D.
Assistant Professor Seiya Imoto, Ph.D.
Assistant Professor Masao Nagasaki, Ph.D.
Assistant Professor Ryo Yoshida, Ph.D.

c. Estimating gene networks from expression data and binding location data via Boolean networks

Osamu Hirose, Naoki Nariai, Yoshinori Tamada, Hideo Bannai1, Seiya Imoto, Satoru Miyano

We proposed a computational method for estimating gene networks with the Boolean network model. Boolean networks have some practical problems in analyzing DNA microarray gene expression data. One is the choice of the threshold value for discretizing gene expression data, since expression data take continuous values. The other is that the optimal gene network is often not determined uniquely, and it is difficult to choose the optimal one from the candidates by using expression data only. To solve these problems, we use the binding location data produced by Lee et al. (Science 298: 799-804, 2002) together with expression data and illustrate a strategy to decide the optimal threshold and gene network. To show the effectiveness of the proposed method, we analyze Saccharomyces cerevisiae cell cycle gene expression data as an application.
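The two practical problems named above, thresholding and non-uniqueness, can be seen in a small sketch. It discretizes a toy time series with a fixed threshold, enumerates every regulator set for which some Boolean function reproduces the target gene's next state, and then keeps only the sets supported by binding evidence. The expression values, gene names, threshold and binding set are invented for illustration, and this brute-force enumeration is not the strategy proposed in this work.

```python
from itertools import combinations


def discretize(series, threshold):
    """Turn continuous expression values into 0/1 states."""
    return [1 if value > threshold else 0 for value in series]


def consistent_rules(states, target, max_inputs=2):
    """List (inputs, truth_table) pairs that explain `target` without conflict.

    A regulator set is consistent when target[t + 1] is a function of the
    regulators' states at time t for every observed time step.
    """
    genes = sorted(states)
    steps = len(states[target]) - 1
    found = []
    for size in range(1, max_inputs + 1):
        for inputs in combinations(genes, size):
            table = {}
            for t in range(steps):
                key = tuple(states[g][t] for g in inputs)
                nxt = states[target][t + 1]
                if table.setdefault(key, nxt) != nxt:
                    break  # same input state led to two different outputs
            else:
                found.append((inputs, table))
    return found


def filter_by_binding(rules, bound_regulators):
    """Keep only rules whose regulators all have binding evidence for the target."""
    return [(inputs, table) for inputs, table in rules
            if set(inputs) <= set(bound_regulators)]


# Hypothetical expression time courses for three genes.
EXPRESSION = {
    "A": [0.1, 0.9, 0.8, 0.2, 0.1],
    "B": [0.2, 0.1, 0.9, 0.7, 0.1],
    "C": [0.3, 0.8, 0.6, 0.4, 0.9],
}
STATES = {gene: discretize(values, 0.5) for gene, values in EXPRESSION.items()}

if __name__ == "__main__":
    rules = consistent_rules(STATES, "B")
    print([inputs for inputs, _ in rules])  # several candidates fit equally well
    print([inputs for inputs, _ in filter_by_binding(rules, {"A"})])
```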

d. Error tolerant model for incorporating biological knowledge with expression data in estimating gene networks

Seiya Imoto, Tomoyuki Higuchi2, Takao Goto, Satoru Miyano: 2Institute of Statistical Mathematics

We proposed a novel statistical method for estimating gene networks based on microarray gene expression data together with information from biological knowledge databases. Although a large amount of gene regulation information has already been stored in some biological databases, there are still errors and missing facts due to experimental problems and human errors. Therefore, we cannot blindly use them for understanding gene regulation, and a robust procedure with a statistical model for using such database information is required. By using gene expression data, we provide a probabilistic framework of a joint learning model that simultaneously repairs database information and estimates a gene network based on dynamic Bayesian networks. To show the effectiveness of the proposed method, we analyze Saccharomyces cerevisiae cell-cycle gene expression data together with KEGG information.

2. Drug Target Gene Discovery with Gene Networks

a. Computational strategy for discovering druggable gene networks from genome-wide RNA expression profiles

Seiya Imoto, Yoshinori Tamada, Hiromitsu Araki3, Kaori Yasuda3, Cristin G. Print4, Stephen D. Sharnock-Jones5, Deborah Sanders5, Christopher J. Savoie3, Kousuke Tashiro1, Satoru Kuhara1, Satoru Miyano: 3Gene Networks International, 4University of Auckland, 5Cambridge University

We proposed a computational strategy for discovering gene networks affected by a chemical compound. Two kinds of DNA microarray data are assumed to be used. One dataset is short time-course data that measure the responses of genes following an experimental treatment. The other dataset is obtained from several hundred single gene knock-downs. These two datasets provide three kinds of information: (i) a gene network estimated from the time-course data by the dynamic Bayesian network model, (ii) relationships between the knocked-down genes and their regulatees estimated directly from the knock-down microarrays, and (iii) a gene network estimated from the gene knock-down data alone using the Bayesian network model. We proposed a method that combines these three kinds of information to provide an accurate gene network that most strongly relates to the mode of action of the chemical compound in cells. This information plays an essential role in pharmacogenomics. We illustrate this method with an actual example in which human endothelial cell gene networks were generated from a novel time course of gene expression following treatment with the drug fenofibrate and from 270 novel gene knock-downs. Finally, we succeeded in inferring the gene network related to PPAR-α, which is a known target of fenofibrate.

b. Identifying drug active pathways from gene networks estimated by gene expression data

Yoshinori Tamada, Seiya Imoto, Kousuke Tashiro1, Satoru Kuhara1, Satoru Miyano

We present a computational method for identifying genes and their regulatory pathways influenced by a drug, using microarray gene expression data collected from single gene disruptions and drug responses. The automatic identification of such genes and pathways in organisms’ cells is an important problem for pharmacogenomics and tailor-made medication. Our method estimates regulatory relationships between genes as a gene network from microarray data of gene disruptions with a Bayesian network model, then identifies the drug-affected genes and their regulatory pathways on the estimated network with time-course drug response microarray data. Compared to the existing method, our proposed method can identify not only the drug-affected genes and the druggable genes, but also the drug responses of the pathways. To evaluate the proposed method, we conducted simulations based on artificial networks and expression data. Our method succeeded in identifying the pseudo drug-affected genes and pathways with a coverage greater than 80%. We also applied our method to Saccharomyces cerevisiae drug response microarray data. In this real example, we identified the genes and the pathways that are potentially influenced by a drug. These computational experiments indicate that our method successfully identifies the drug-activated genes and pathways, and is capable of predicting undesirable side effects of the drug, identifying novel drug target genes, and understanding the unknown mechanisms of the drug.

3. Modeling and Simulation of Biological Pathways

a. Automatic drawing of biological networks using cross cost and subcomponent data

Mitsuru Kato, Masao Nagasaki, Atsushi Doi, Satoru Miyano

An automatic graph drawing function for biopathways is indispensable for biopathway databases and software tools. We proposed a new grid-based algorithm for biopathway layout that considers (a) edge-edge crossings, (b) node-edge crossings and (c) distance measures between nodes as its costs, and (d) subcellular localization information from Gene Ontology as its constraints. For this algorithm, we newly define cost functions, devise an efficient method for computing the costs (a)-(c) by employing a matrix representing the difference between two layouts, and adopt a steepest descent method for searching for locally optimal solutions and a multi-step layout method for finding better solutions. We implemented this algorithm in Cell Illustrator, a biopathway modeling and simulation software tool. The algorithm is applied to a signal transduction pathway of apoptosis induced by Fas ligand. We compare our layout with that of the grid-based algorithm by Li and Kurata (Bioinformatics 21 (9): 2036-2042, 2005). The result shows that our algorithm reduces edge-edge crossings and node-edge crossings, and solves the “isolated island problem”, in which some groups of nodes are placed apart from other nodes in the layout. As a result, the biological understandability of the layout is improved.
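The following sketch illustrates only cost (a), the number of edge-edge crossings, together with a plain steepest descent over single-node moves on a grid. The cost weights, the difference-matrix speed-up, costs (b) and (c), and the localization constraints (d) are all omitted, and the function names and the toy layout are assumptions made for illustration rather than the algorithm implemented in Cell Illustrator.

```python
def orient(p, q, r):
    """Sign of the turn p -> q -> r: 1 counter-clockwise, -1 clockwise, 0 collinear."""
    value = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (value > 0) - (value < 0)


def segments_cross(a, b, c, d):
    """True if segments ab and cd cross at a single interior point."""
    return (orient(a, b, c) * orient(a, b, d) < 0
            and orient(c, d, a) * orient(c, d, b) < 0)


def crossing_count(pos, edges):
    """Count crossing pairs among edges that do not share an endpoint."""
    count = 0
    for i in range(len(edges)):
        for j in range(i + 1, len(edges)):
            (u1, v1), (u2, v2) = edges[i], edges[j]
            if {u1, v1} & {u2, v2}:
                continue
            if segments_cross(pos[u1], pos[v1], pos[u2], pos[v2]):
                count += 1
    return count


def descend(pos, edges, width, height):
    """Repeatedly apply the single-node move to a free cell that lowers the cost most."""
    pos = dict(pos)
    cost = crossing_count(pos, edges)
    while True:
        occupied = set(pos.values())
        best_cost, best_node, best_cell = cost, None, None
        for node in pos:
            for x in range(width):
                for y in range(height):
                    if (x, y) in occupied:
                        continue
                    trial = dict(pos)
                    trial[node] = (x, y)
                    trial_cost = crossing_count(trial, edges)
                    if trial_cost < best_cost:
                        best_cost, best_node, best_cell = trial_cost, node, (x, y)
        if best_node is None:
            return pos, cost  # local optimum: no move improves the cost
        pos[best_node] = best_cell
        cost = best_cost


# Toy layout: two edges drawn as crossing diagonals of a unit square.
POSITIONS = {"a": (0, 0), "b": (1, 1), "c": (1, 0), "d": (0, 1)}
EDGES = [("a", "b"), ("c", "d")]

if __name__ == "__main__":
    print(crossing_count(POSITIONS, EDGES))
    print(descend(POSITIONS, EDGES, width=3, height=2))
```

Because every accepted move strictly lowers an integer cost, the loop always terminates, but like any steepest descent it may stop in a local optimum.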

c. Simulation based validation of the p53 transcriptional activity with hybrid functional Petri net

Masao Nagasaki, Atsushi Doi, Hiroshi Matsuno6, Satoru Miyano

MDM2 and p19ARF are essential proteins in cancer pathways, forming a complex with protein p53 to control the transcriptional activity of protein p53. It has been confirmed that protein p53 loses its transcriptional activity by forming the functional dimer with protein MDM2. However, it is still unclear whether protein p53 keeps its transcriptional activity when it forms the trimer with proteins MDM2 and p19ARF. We have observed the mutual behaviors of genes p53, MDM2, p19ARF and their products on a computational model with a hybrid functional Petri net (HFPN), which is constructed based on information described in the literature. The simulation results suggested that protein p53 should retain transcriptional activity in the form of the trimer of proteins p53, MDM2 and p19ARF. This paper also discusses the advantages of the HFPN-based modeling method in terms of pathway description for simulations.


d. New regulatory interactions suggested by simulations for the circadian genetic control mechanism in mammals

Hiroshi Matsuno6, Shin-Ichi T. Inouye6, Yasuki Okitsu6, Yasushi Fujii6, Satoru Miyano: 6Yamaguchi University

We employed hybrid functional Petri nets to analyze the circadian genetic control mechanism, which consists of loops of clock genes and generates endogenous near-24-hour rhythms in mammals. Based on the available biological data, we constructed a model and, by using Cell Illustrator, performed computer simulations of the time courses of clock gene transcription and translation. Although the initial model successfully reproduced most of the circadian genetic control mechanisms, two discrepancies remained despite a wide selection of the parameters. We found that adding a hypothetical path to the initial model successfully simulated the time courses and phase relations among clock genes. This also demonstrates the usefulness of the hybrid functional Petri net approach to biological system analysis.

e. Prediction of debacle points for robustness of biological pathways by using recurrent neural networks

Hironori Kitakaze7, Hiroshi Matsuno6, Nobuhiko Ikeda6, Satoru Miyano: 7Oshima College of Maritime Technology, 8Tokuyama College of Technology

Living organisms have ingenious control mechanisms in which many molecular interactions work to maintain normal activities against disturbances from inside and outside. At the same time, however, such a control mechanism has debacle points at which its stability can be broken easily. We proposed a new method that uses a recurrent neural network for predicting debacle points in a hybrid functional Petri net model of a biological pathway. Evaluation on an apoptosis signaling pathway indicates that 96.5% of debacle points and 65.5% of non-debacle points can be predicted by the proposed method.

f. Petri net modeling of biological pathways

Masao Nagasaki, Atsushi Doi, Hiroshi Matsuno6, Satoru Miyano

We have developed a software tool called Cell Illustrator (CI) for modeling and simulating biological pathways based on the concept of Petri nets, together with an XML format called Cell System Markup Language (CSML) for describing biological pathways for simulation. This paper shows the concepts behind CI and CSML and presents our computational strategy with them for systems biology.
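For readers unfamiliar with Petri nets, the sketch below shows the basic discrete firing rule on which such pathway models are built: a transition is enabled when each of its input places holds enough tokens, and firing it moves tokens from inputs to outputs. It covers only this discrete core and does not model the continuous places or functional arcs of hybrid functional Petri nets, nor the CSML format. The place names echo the p53 and MDM2 binding mentioned in this report, while the token counts and function names are arbitrary illustrative choices.

```python
def enabled(marking, transition):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking.get(place, 0) >= weight
               for place, weight in transition["consume"].items())


def fire(marking, transition):
    """Return the marking after firing `transition` once."""
    if not enabled(marking, transition):
        raise ValueError(f"transition {transition['name']} is not enabled")
    new_marking = dict(marking)
    for place, weight in transition["consume"].items():
        new_marking[place] -= weight
    for place, weight in transition["produce"].items():
        new_marking[place] = new_marking.get(place, 0) + weight
    return new_marking


def run(marking, transitions, max_steps=100):
    """Fire the first enabled transition repeatedly until none is enabled."""
    for _ in range(max_steps):
        ready = [t for t in transitions if enabled(marking, t)]
        if not ready:
            break
        marking = fire(marking, ready[0])
    return marking


# One binding reaction: p53 + MDM2 -> p53_MDM2 (token counts are arbitrary).
BINDING = {
    "name": "bind",
    "consume": {"p53": 1, "MDM2": 1},
    "produce": {"p53_MDM2": 1},
}
INITIAL = {"p53": 3, "MDM2": 2, "p53_MDM2": 0}

if __name__ == "__main__":
    print(run(INITIAL, [BINDING]))
```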

4. Algorithmic and Statistical Methods for Bioinformatics

a. Prediction of transcriptional terminators in Bacillus subtilis and related species

Michiel J.L. de Hoon, Yuko Makita, Kenta Nakai, Satoru Miyano

In prokaryotes, genes belonging to the same operon are transcribed in a single mRNA molecule. Transcription starts as the RNA polymerase binds to the promoter and continues until it reaches a transcriptional terminator. Some terminators rely on the presence of the Rho protein, whereas others function independently of Rho. Such Rho-independent terminators consist of an inverted repeat followed by a stretch of thymine residues, allowing us to predict their presence directly from the DNA sequence. Unlike in Escherichia coli, the Rho protein is dispensable in Bacillus subtilis, suggesting a limited role for Rho-dependent termination in this organism and possibly in other Firmicutes. We analyzed 463 experimentally known terminating sequences in B. subtilis and found a decision rule to distinguish Rho-independent transcriptional terminators from non-terminating sequences. The decision rule allowed us to find the boundaries of operons in B. subtilis with a sensitivity and specificity of about 94%. Using the same decision rule, we found an average sensitivity of 94% for 57 bacteria belonging to the Firmicutes phylum, and a considerably lower sensitivity for other bacteria. Our analysis shows that Rho-independent termination is dominant for Firmicutes in general, and that the properties of the transcriptional terminators are conserved. Terminator prediction can be used to reliably predict the operon structure in these organisms, even in the absence of experimentally known operons.
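The sequence signature described above, an inverted repeat followed by a run of thymines, can be searched for with a very small scanner. The sketch below is only a toy pattern match with perfect stems; the stem length, loop range, T-run length and window size are arbitrary illustrative parameters, and it is not the decision rule derived in this study.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def find_terminators(seq, stem=5, loops=range(3, 9), min_t=4, tail=8):
    """Report (start, end) of perfect hairpins followed by a T-run.

    start -- index of the first base of the left stem arm
    end   -- index just past the right stem arm
    A hit requires `min_t` consecutive T's within `tail` bases after `end`.
    """
    hits = []
    for i in range(len(seq) - 2 * stem):
        left_arm = seq[i:i + stem]
        wanted = reverse_complement(left_arm)
        for loop in loops:
            j = i + stem + loop
            if seq[j:j + stem] != wanted:
                continue
            downstream = seq[j + stem:j + stem + tail]
            if "T" * min_t in downstream:
                hits.append((i, j + stem))
                break
    return hits


# Hypothetical sequence: flank, 5-base stem, 4-base loop, matching stem, T-run.
EXAMPLE = "GGTG" + "GCCGC" + "TTCG" + "GCGGC" + "TTTTTT" + "AA"

if __name__ == "__main__":
    print(find_terminators(EXAMPLE))
```

A practical predictor would also have to tolerate mismatches and bulges in the stem and weigh the stability of the hairpin, which this toy version ignores.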

b. ArrayCluster: an analytic tool for clustering, data visualization and module finder on gene expression profiles

Ryo Yoshida, Tomoyuki Higuchi2, Seiya Imoto, Satoru Miyano

One significant challenge of gene expression profiling based on microarray technology is to find unknown subtypes of diseases at the molecular level. This task can be addressed by grouping the gene expression patterns of the collected samples on the basis of a large number of genes. Application of commonly used clustering methods to such a dataset, however, is likely to fail due to overlearning, because the number of samples to be grouped is much smaller than the data dimension, which is equal to the number of genes involved in the profiling. To overcome this difficulty, we developed a novel model-based clustering method, referred to as the mixed factors analysis. ArrayCluster is freely available software that performs the mixed factors analysis. It provides analytic tools for clustering DNA microarray experiments, data visualization, and automatic detection of transcriptional module genes that are relevant to the calibrated molecular subtypes.

c. Statistical model selection method to analyze combinatorial effects of SNPs and environmental factors for binary disease

Reiichiro Nakamichi, Seiya Imoto, Satoru Miyano

We proposed a model selection method to estimate the relation between multiple SNPs, environmental factors and a binary disease trait. We combined logistic regression with a genetic algorithm for this study. The logistic regression model can capture the continuous effects of environmental factors without categorization, which causes loss of information. To construct an accurate prediction rule for the binary trait, we adopted Akaike’s information criterion (AIC) to find the most effective set of SNPs and environmental factors. That is, the set of SNPs and environmental factors that gives the smallest AIC is chosen as the optimal set. Since the number of combinations of SNPs and environmental factors is usually huge, we proposed the use of the genetic algorithm for choosing the optimal SNPs and environmental factors in the sense of AIC. We show the effectiveness of the proposed method through the analysis of case/control populations of diabetes, Alzheimer’s disease and obesity patients. We succeeded in finding an efficient set to predict types of diabetes, as well as some SNPs that interact strongly with age although they are not significant as single loci.
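The search strategy described above, minimizing AIC over subsets of variables with a genetic algorithm, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the GA operators, the parameter values and the toy scoring function `toy_aic` (which stands in for fitting a logistic regression) are all assumptions.

```python
# Sketch: genetic algorithm searching for the variable subset with smallest AIC.
# AIC = 2k - 2 log L, where k is the number of fitted parameters.
import random


def aic(log_likelihood, n_params):
    """Akaike's information criterion for a fitted model."""
    return 2 * n_params - 2 * log_likelihood


def ga_select(n_features, score, pop_size=30, generations=40,
              p_mut=0.05, seed=0):
    """Return the inclusion mask (tuple of bools) with the lowest score seen."""
    rng = random.Random(seed)
    null_mask = (False,) * n_features           # model with intercept only
    pop = [null_mask] + [
        tuple(rng.random() < 0.5 for _ in range(n_features))
        for _ in range(pop_size - 1)
    ]
    best = min(pop, key=score)
    for _ in range(generations):
        parents = sorted(pop, key=score)[:pop_size // 2]   # truncation selection
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)             # one-point crossover
            child = a[:cut] + b[cut:]
            child = tuple((not g) if rng.random() < p_mut else g
                          for g in child)                  # bit-flip mutation
            children.append(child)
        pop = children
        best = min(pop + [best], key=score)                # keep the best so far
    return best


def toy_aic(mask):
    """Hypothetical stand-in for 'fit logistic regression, return its AIC'.

    Variables 1 and 4 are assumed informative; the others add almost nothing.
    """
    gains = [0.2, 20.0, 0.2, 0.2, 15.0, 0.2, 0.2, 0.2]
    log_lik = -100.0 + sum(g for g, used in zip(gains, mask) if used)
    return aic(log_lik, n_params=1 + sum(mask))           # +1 for the intercept


if __name__ == "__main__":
    chosen = ga_select(8, toy_aic)
    print(chosen, toy_aic(chosen))
```

In a real analysis `toy_aic` would be replaced by a function that fits the logistic model on the selected SNPs and environmental factors and returns its AIC; caching scores per mask would avoid refitting the same model.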

d. A weighted profile based method for protein-RNA interacting residue prediction

Euna Jeong, Satoru Miyano

The prediction of putative RNA-interacting residues in proteins is an important problem in the field of molecular recognition. We suggest a weighted profile based method for predicting RNA-interacting residues, which utilizes a trained neural network. Most neural networks have a learning rule that allows the network to adjust its connection weights in order to correctly classify the training data. We focus on the network weights, which depend on the training data set and give evidence of which inputs were more influential in the network. A large set of network weights trained on sequence profiles is analyzed and qualified. We explore the feasibility of utilizing the qualified information to improve the prediction performance for protein-RNA interaction. Our proposed method, which has been applied to the profiles of the PSI-BLAST alignment, shows a considerable improvement. Results for predictions using alternative representations of the profile are included for comparison.

Publications

Ando, T., Konishi, S., Imoto, S. Nonlinear regression modeling via regularized radial basis function networks. J. Statistical Planning and Inference. In press.

De Hoon, M.J.L., Makita, Y., Nakai, K., Miyano, S. Prediction of transcriptional terminators in Bacillus subtilis and related species. PLoS Computational Biology. 1(3): e25, 2005.

Doi, A., Nagasaki, M., Matsuno, H., Miyano, S. Simulation based validation of the p53 transcriptional activity with hybrid functional Petri net. In Silico Biology. In press.

Heinrich, T., DeLisi, C., Kanehisa, M., Miyano, S. Genome Informatics 16(1) (Universal Academy Press). 2005.

Heinrich, R., Mamitsuka, H., Kanehisa, M., Miyano, S., Takagi, T. Genome Informatics 16(2) (Universal Academy Press). 2005.

Hirose, O., Nariai, N., Tamada, Y., Bannai, H., Imoto, S., Miyano, S. Estimating gene networks from expression data and binding location data via Boolean networks. Proc. First International Workshop on Data Mining and Bioinformatics (DMBIO2005). Lecture Notes in Computer Science. 3482: 349-356, 2005.

Imoto, S., Higuchi, T., Goto, T., Miyano, S. Error tolerant model for incorporating biological knowledge with expression data in estimating gene networks. Statistical Methodology. 3(1): 1-16, 2006.

Imoto, S., Tamada, Y., Araki, H., Yasuda, K., Print, C.G., Charnock-Jones, S.D., Sanders, D., Savoie, C.J., Tashiro, K., Kuhara, S., Miyano, S. Computational strategy for discovering druggable gene networks from genome-wide RNA expression profiles. Pacific Symposium on Biocomputing. 11, In press.

Imoto, S., Matsuno, H., Miyano, S. Gene networks: estimation, modeling and simulation. In R. Eils and A. Kriete (Eds.), Computational Systems Biology, Academic Press, 205-228, 2005.

Imoto, S., Tamada, Y., Savoie, C.J., Miyano, S. Analysis of gene networks for drug target discovery and validation. In J. Walker and M. Sioud (Eds.), Target Discovery and Validation (a volume of “Methods in Molecular Biology” series), Humana Press, USA. In press.

Jeong, E., Miyano, S. A weighted profile based method for protein-RNA interacting residue prediction. Transactions on Computational Systems Biology. In press.

Kato, M., Nagasaki, M., Doi, A., Miyano, S. Automatic drawing of biological networks using cross cost and subcomponent data. Genome Informatics. 16(2): 22-31, 2005.

Kitakaze, H., Matsuno, H., Ikeda, N., Miyano, S. Prediction of debacle points for robustness of biological pathways by using recurrent neural networks. Genome Informatics. 16(1): 192-202, 2005.

Li, C., Suzuki, S., Ge, Q.-W., Nakata, M., Matsuno, H., Miyano, S. On modeling and analyzing signaling pathways with inhibitory interactions based on Petri net. Proc. The 2005 International Joint Conference of InCoB, AASBi and KSBI (BIOINFO 2005), 348-353, 2005.

Makita, Y., De Hoon, M.J., Ogasawara, N., Miyano, S., Nakai, K. Bayesian joint prediction of associated transcription factors in Bacillus subtilis. Pacific Symposium on Biocomputing. 10: 507-518, 2005.

Matsuno, H., Inouye, S.-T., Okitsu, Y., Fujii, Y., Miyano, S. A new regulatory interaction suggested by simulations for circadian genetic control mechanism in mammals. J. Bioinformatics and Computational Biology. In press.

Miyano, S., Mesirov, J.P., Kasif, S., Istrail, S., Pevzner, P.A., Waterman, M.S. Proc. 9th Annual International Conference on Research in Computational Molecular Biology (RECOMB 2005), Lecture Notes in Bioinformatics (Springer). Vol. 3500, 2005.

Nagasaki, M., Doi, A., Matsuno, H., Miyano, S. Petri net modeling of biological pathways. Proc. Algebraic Biology 2005 (Universal Academy Press). 19-31, 2005.

Nagasaki, M., Doi, A., Matsuno, H., Miyano, S. Computational modeling of biological processes with Petri net based architecture. In “Bioinformatics Technologies” (Y.P. Chen, ed) (Springer Press). 179-243, 2005.

Nakamichi, R., Imoto, S., Miyano, S. Statistical model selection method to analyze combinatorial effects of SNPs and environmental factors for binary disease. International Journal on Artificial Intelligence Tools. In press.

Nariai, N., Tamada, Y., Imoto, S., Miyano, S. Estimating gene regulatory networks and protein-protein interactions of Saccharomyces cerevisiae from multiple genome-wide data. Bioinformatics. 21: ii206-ii212, 2005.

Ohtsubo, S., Iida, A., Nitta, K., Tanaka, T., Yamada, R., Ohnishi, Y., Maeda, S., Tsunoda, T., Takei, T., Obara, W., Akiyama, F., Ito, K., Honda, K., Uchida, K., Tsuchiya, K., Yumura, W., Ujiie, T., Nagane, Y., Miyano, S., Suzuki, Y., Narita, I., Gejyo, F., Fujioka, T., Nihei, H., Nakamura, Y. Association of a single-nucleotide polymorphism in the immunoglobulin mu-binding protein 2 gene with immunoglobulin A nephropathy. J. Hum. Genet. 50(1): 30-35, 2005.

Ott, S., Hansen, A., Kim, S.-Y., Miyano, S. Superiority of network motifs over optimal networks and an application to the revelation of gene network evolution. Bioinformatics. 21(2): 227-238, 2005.

Tamada, Y., Bannai, H., Imoto, S., Katayama, T., Kanehisa, M., Miyano, S. Utilizing evolutionary information and gene expression data for estimating gene networks with Bayesian network models. J. Bioinformatics and Computational Biology. 3(6): 1295-1313, 2005.

Tamada, Y., Imoto, S., Tashiro, K., Kuhara, S., Miyano, S. Identifying drug active pathways from gene networks estimated by gene expression data. Genome Informatics. 16(1): 182-191, 2005.

Yoshida, R., Higuchi, T., Imoto, S., Miyano, S. ArrayCluster: an analytic tool for clustering, data visualization and module finder on gene expression profiles. Bioinformatics. In press.

Yoshida, R., Higuchi, T., Imoto, S. Estimating time-dependent gene networks from time series DNA microarray data by dynamic linear model with Markov switching. Proc. IEEE 4th Computational Systems Bioinformatics. 289-298, 2005.

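Yoshida, R., Imoto, S., Higuchi, T. A penalized likelihood estimation on transcriptional module-based clustering. Proc. First International Workshop on Data Mining and Bioinformatics (DMBIO 2005). Lecture Notes in Computer Science. 3482: 389-401, 2005.

Nagasaki, M., Doi, A., Miyano, S. Cell System Markup Language (CSML), a dynamic pathway modeling language. Tanpakushitsu Kakusan Koso [Protein, Nucleic Acid and Enzyme]. 50(16 Suppl.): 2269-2274. (In Japanese)

Fujii, Y., Matsuno, H., Miyano, S., Inouye, S.-T. Modeling and simulation of the mammalian clock gene mechanism with hybrid functional Petri net. Jikan Seibutsugaku [Chronobiology]. 11: 8-16, 2005. (In Japanese)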


Human Genome Center

Laboratory of Molecular Medicine
Laboratory of Genome Technology
Division of Advanced Clinical Proteomics

Professor Yusuke Nakamura, M.D., Ph.D.
Associate Professor Toyomasa Katagiri, Ph.D.
Associate Professor Yataro Daigo, M.D., Ph.D.
Assistant Professor Hidewaki Nakagawa, M.D., Ph.D.
Assistant Professor Koichi Matsuda, M.D., Ph.D.
Assistant Professor Hitoshi Zembutsu, M.D., Ph.D.
Assistant Professor Ryuji Hamamoto, Ph.D.

The major goal of the Human Genome Project is to identify genes predisposing to diseases, and to develop new diagnostic and therapeutic tools. We have been attempting to isolate genes involved in carcinogenesis and also those causing or predisposing to other diseases such as IgA nephropathy and Crohn’s disease. By means of technologies developed through the genome project, including a high-resolution SNP map, large-scale DNA sequencing, and the cDNA microarray method, we have isolated a number of biologically and/or medically important genes.

1. Genes associated with common diseases

a. IgA nephropathy

Shigeru Ohtsubo, Aritoshi Iida2, Kosaku Nitta1, Toshihiro Tanaka3, Ryo Yamada4, Yozo Ohnishi3, Shiro Maeda5, Tatsuhiko Tsunoda6, Takashi Takei1, Wataru Obara7, Fumihiro Akiyama8, Kyoko Ito1, Kazuho Honda1, Keiko Uchida1, Ken Tsuchiya1, Wako Yumura1, Takashi Ujiie9, Yutaka Nagane10, Satoru Miyano, Yasushi Suzuki7, Ichiei Narita8, Fumitake Gejyo8, Tomoaki Fujioka9, Hiroshi Nihei1 and Yusuke Nakamura: 1Department of Medicine, Kidney Center, Tokyo Women’s Medical University, Tokyo, Japan, 2Laboratory for Genotyping, 3Laboratory for Cardiovascular Diseases, 4Laboratory for Rheumatic Diseases, 5Laboratory for Diabetic Nephropathy, and 6Laboratory for Medical Informatics, SNP Research Center, The Institute of Physical and Chemical Research (RIKEN), Tokyo, Japan, 7Department of Urology, Iwate Medical University, Iwate, Japan, 8Division of Clinical Nephrology and Rheumatology, Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan, 9Department of Urology, Iwate Prefectural Ofunato Hospital, Iwate, Japan, 10Department of Urology, Sanai Hospital, Iwate, Japan

Immunoglobulin A (IgA) nephropathy is the most common form of primary glomerulonephritis worldwide. The pathogenesis of IgA nephropathy is unknown, but it is certain that some genetic factors are involved in susceptibility to the disease. Employing a large-scale, case-control association study using gene-based single-nucleotide polymorphism (SNP) markers, we previously reported three candidate genes. We report here an additional significant association between IgA nephropathy and a SNP located in the gene encoding immunoglobulin µ-binding protein 2 (IGHMBP2) at chromosome 11q13.2-q13.4. The association (χ² = 17.1, p = 0.00003; odds ratio of 1.85 with 95% confidence interval of 1.39-2.50 in a dominant association model) was found using DNA from 465 affected individuals and 634 controls. The SNP (G34448A) caused an amino-acid substitution from glutamic acid to lysine (E928K). As the gene product is involved in immunoglobulin-class switching and patients with the A allele revealed higher serum levels of IgA (p = 0.048), the amino-acid change might influence a class-switch to increase serum IgA levels, resulting in a higher risk of IgA nephropathy.
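For readers unfamiliar with the statistics quoted above, the following sketch shows how an odds ratio, its 95% confidence interval and a chi-square p-value are typically derived from a 2×2 case-control table. The genotype counts behind the reported values are not given in this report, so the counts below are purely hypothetical, and the Woolf (log) interval and uncorrected Pearson chi-square are assumed choices rather than the study's documented procedure.

```python
# Odds ratio, Woolf 95% CI and Pearson chi-square (1 d.f.) for a 2x2 table.
# The counts used in the example are hypothetical, not data from the study.
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """a, b: carriers / non-carriers among cases; c, d: same among controls."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of log(OR)
    low = math.exp(math.log(odds_ratio) - z * se_log)
    high = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, low, high


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square without continuity correction, and its p-value."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail, 1 degree of freedom
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical dominant-model table: carriers of the risk allele vs not.
    cases_carrier, cases_non = 100, 400
    ctrl_carrier, ctrl_non = 60, 540
    print(odds_ratio_ci(cases_carrier, cases_non, ctrl_carrier, ctrl_non))
    print(chi_square_2x2(cases_carrier, cases_non, ctrl_carrier, ctrl_non))
```

In a dominant model, carriers are individuals with one or two copies of the risk allele, which is why each group collapses into two columns.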

So far, we have identified five candidate genes that may be related to susceptibility to IgA nephropathy. On the basis of that information, we propose potential mechanisms of IgA nephropathy. The onset of IgA nephropathy could be associated with antigens such as viruses, fungi, bacteria or food that are processed and presented to T cells. HLA-DR, which regulates immune responses against protein antigens, is of great importance in the selection and activation of CD4-positive T cells; we identified the gene encoding HLA-DR earlier as a candidate susceptibility gene (Akiyama et al. 2002). HLA-DR molecules with the V724L substitution might account for individual differences in immune responses of T cells, which activate antibody-producing B cells. For its part, as noted above, the 928K variant of IGHMBP2 might influence a class-switch leading to increased serum IgA levels.

The third of our candidates, PIGR, is an integral membrane secretory component localized on the basolateral surface of secretory epithelial cells, where it is thought to mediate the transepithelial transport of polymeric IgA. We showed earlier that a genetic variation in the promoter region of the PIGR gene caused an A580V substitution associated with IgA nephropathy, and suggested that the V allele might affect binding of polymeric IgA to PIGR and cause deposition of mesangial IgA. IgA deposits in the kidney can trigger production of a variety of cytokines and growth factors by renal cells and by circulating inflammatory cells, leading to the characteristic histopathological features of mesangial-cell proliferation and depositions of immunoglobulin and complement in mesangial regions.

SELL and SELE genes encode cell-cell adhesion molecules involved in the leukocyte-endothelial cell interaction required for extravasation at sites of tissue injury. SELE is expressed predominantly in cytokine-activated endothelium, and SELL is present in circulating leukocytes. We reported that Y468H in the SELE gene, as well as P238S-SELL and a SNP in the promoter region of SELL, were strongly associated with IgA nephropathy, and suggested that these substitutions could affect the quality and/or quantity of gene products and possibly play a significant role in inflammatory changes leading to renal fibrosis and ultimately renal failure.

Although functional studies must be undertaken to determine how these genetic variations, now including E928K-IGHMBP2, can affect the onset and development of IgA nephropathy, the results of our genetic studies have suggested several potential mechanisms for investigation.

b. Crohn’s disease

Keiko Yamazaki, Masakazu Takazoe1, Torao Tanaka1, Toshiki Ichimori1, Susumu Saito2, Aritoshi Iida2, Yoshihiro Onouchi2, Akira Hata2 and Yusuke Nakamura: 1Department of Medicine, Division of Gastroenterology, Social Insurance Chuo General Hospital, Tokyo, Japan, 2SNP Research Center, the Institute of Physical and Chemical Research (RIKEN), Kanagawa, Japan

The inflammatory bowel diseases (IBD), Crohn’s disease (CD) and ulcerative colitis (UC), are chronic inflammatory disorders of the digestive tract. The pathogenesis of IBD is complicated, and it is widely accepted that immunologic, environmental and genetic components contribute to its etiology. In order to identify genetic susceptibility factors in CD, we performed a genome-wide association study in Japanese patients and controls using nearly 80,000 gene-based single nucleotide polymorphism (SNP) markers, and investigated the haplotype structure of the candidate locus in Japanese and European patients. We identified highly significant associations (p = 1.71 × 10⁻¹⁴ with odds ratio of 2.17) of SNPs and haplotypes within the TNFSF15 gene (encoding tumor necrosis factor superfamily, member 15) in Japanese CD patients. The association was confirmed in the study of two European IBD cohorts. Interestingly, a core TNFSF15 haplotype showing the association with increased risk of the disease was common to the two ethnic groups. Our results suggest that genetic variations in the TNFSF15 gene contribute to the susceptibility to IBD in the Japanese and European populations.

2. Genes playing significant roles in human cancer

a. Genes that are inducible by p53

Park Woong Ryeon, Chizu Tanikawa, Koichi Matsuda and Yusuke Nakamura

The p53 tumor suppressor gene is more frequently mutated in human cancers than any other cancer-associated gene yet identified; p53 mutations are found in more than half of all cancers examined. The wild-type active form of its product exerts its tumor-suppressing functions either by regulating cell-cycle arrest and DNA repair, or by inducing apoptosis, depending on the specific transcriptional targets that are activated. Although the selection of transcriptional targets seems to depend on the level of cellular stress and to differ by cell type, p53 protein binds to DNA in a sequence-specific manner to activate transcription of genes encoding, for example, p21WAF1, p53R2, MDM2, p53DINP1, p53AIP1, Bax, and GADD45. Modification of the p53 molecule is considered to be important in the process of selecting transcriptional targets, but the mechanism for protein modification is still not well understood. Phosphorylation of p53 at Ser-15 and Ser-20 has been shown to be involved in activating p53. Although the roles of these modifications are not fully characterized, the ATM and CHK2 proteins are candidate kinases responsible for phosphorylation of the Ser-15 and Ser-20 residues of p53, respectively (8, 9). After a low level of DNA damage, p53 is phosphorylated at Ser-15 and Ser-20, which promotes binding of p53 to promoters of genes involved in G1 arrest and DNA repair. However, if DNA damage is severe, Ser-46 of p53 is phosphorylated and the modified p53 leads to induction of apoptosis-related genes, such as p53AIP1. Although dozens of p53-target genes involved in p53-dependent tumor suppression, i.e. growth arrest, DNA repair, and apoptosis, have been reported to date, the genetic mechanisms responsible for p53-dependent cell survival after exposure to various genotoxic stresses remain to be elucidated.

We reported this year the isolation of a novel p53-target gene, designated p53-inducible cell-survival factor (p53CSV). p53CSV contains a p53-binding site within its second exon; reduction of its expression by small interfering RNA (siRNA) enhanced apoptosis, whereas its over-expression protected cells from apoptosis caused by DNA damage. p53CSV is induced significantly when cells are exposed to a low level of genotoxic stress, but not when DNA damage is severe. p53CSV can modulate apoptotic pathways through interaction with Hsp70, which probably inhibits the activity of Apaf-1. Our results imply that under specific conditions of stress, p53 regulates transcription of p53CSV and that p53CSV is one of the important players in p53-mediated cell survival.

b. Colon, Liver, and Gastric cancers

Ryuji Hamamoto, Masataka Tsuge, Pittella Fabio, Natini Jinawath, Kazutaka Obama, Yoichi Furukawa, and Yusuke Nakamura

We previously reported that up-regulation of SMYD3, a histone H3 lysine-4 specific methyltransferase, plays a key role in the proliferation of colorectal carcinoma (CRC) and hepatocellular carcinoma (HCC). In this study, we reveal that SMYD3 expression is also elevated in the great majority of breast cancer tissues. As in CRC and HCC, silencing of SMYD3 by siRNA inhibited the growth of breast cancer cells, suggesting that increased SMYD3 expression is also essential for the proliferation of breast cancer cells. Moreover, we show here that SMYD3 could promote breast carcinogenesis by directly regulating the expression of the proto-oncogene WNT10B. These data imply that augmented SMYD3 plays a crucial role in breast carcinogenesis, and that inhibition of SMYD3 should be a novel therapeutic strategy for treatment of breast cancer.

We also found a significant association of a genetic polymorphism (VNTR of a “CCGCC” unit) with an increased risk of colorectal cancer (χ² = 17.86, p = 3.8 × 10⁻⁵, odds ratio = 2.20), hepatoma (χ² = 25.39, p = 4.7 × 10⁻⁷, odds ratio = 2.74) and also breast cancer (χ² = 38.91, p = 3.4 × 10⁻⁹, odds ratio = 3.79), but not with that of gastric cancer. This polymorphic region was proven to be a binding site of the transcription factor E2F-1. A reporter assay showed that the reporter plasmid containing three tandem repeats of the binding motif (corresponding to the risk allele) had significantly higher reporter activity than the one containing two tandem repeats (the low-risk allele). These data suggest that the SMYD3 polymorphism enhancing its promoter activity is a common susceptibility factor for human cancer.

Among the genes that were up-regulated in tumors, we selected a gene encoding peptidyl-prolyl isomerase like 1 (PPIL1), a cyclophilin-related protein, because it showed a growth-promoting effect on NIH3T3 and HEK293 cells. Moreover, transfection of short-interfering RNA specific to PPIL1 into SNUC4 and SNUC5 cells effectively reduced expression of the gene and suppressed growth of those colon-cancer cells. In addition, we documented interaction between PPIL1 protein and stathmin. Since stathmin is up-regulated in various types of malignancy, interaction between these two proteins may play an important role in cell proliferation. The findings reported here may offer new insight into colonic carcinogenesis and should contribute to development of new molecular strategies for treatment of human colorectal tumors.

c. cDNA microarray analysis of cancers

Toyomasa Katagiri, Yataro Daigo, Hidewaki Nakagawa, Yoichi Furukawa, Takefumi Kikuchi, Soji Kakiuchi, Toru Nakamura, Koichi Okada, Satoshi Nagayama, Shingo Ashida, Toshihiko Nishidate, Chie Suzuki, Nobuhisa Ishikawa, Ryo Takata, Tatsuya Kato, Akira Togashi, Satoshi Hayama, Megumi Iiizumi, Keisuke Taniuchi, and Yusuke Nakamura

(1) Chemosensitivity

Neoadjuvant chemotherapy for invasive bladder cancer, involving a regimen of methotrexate, vinblastine, doxorubicin, and cisplatin (M-VAC), can improve the resectability of larger neoplasms for some patients and offer a better prognosis. However, some patients suffer severe adverse drug reactions without any benefit, and no method yet exists for predicting the response of an individual patient to chemotherapy. Our purpose in this study is to establish a method for predicting response to M-VAC therapy. We analyzed gene-expression profiles of biopsy materials from 27 invasive bladder cancers using a cDNA microarray consisting of 27,648 genes, after populations of cancer cells had been purified by laser-microbeam microdissection. We identified dozens of genes that were expressed differently between nine “responder” and nine “non-responder” tumors; from that list we selected the 14 “predictive” genes that showed the most significant differences and devised a numerical prediction-scoring system that clearly separated the responder group from the non-responder group. This system accurately predicted the drug responses of eight of nine test cases that were reserved from the original 27 cases. As real-time RT-PCR data were highly concordant with the cDNA microarray data for those 14 genes, we developed a quantitative RT-PCR-based prediction system that could be feasible for routine clinical use. Our results suggest that the sensitivity of an invasive bladder cancer to M-VAC neoadjuvant chemotherapy can be predicted by expression patterns in this set of genes, a step toward achievement of “personalized therapy” for treatment of this disease.
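The report does not spell out how the numerical prediction score is computed. As a rough illustration of the general idea only, the sketch below uses a weighted-vote scheme of the kind often applied to expression classifiers: each predictive gene votes for "responder" or "non-responder" according to which group mean the sample lies closer to, weighted by how well the gene separates the training groups. The function names, the weight formula and the toy expression values are all assumptions, not the study's method or data.

```python
# Weighted-vote prediction score from a small set of predictive genes.
# Illustrative scheme only; the study's actual scoring system is not given here.
from statistics import mean, stdev


def fit_weights(responders, non_responders):
    """Each argument is a list of samples; a sample is a list of per-gene values."""
    weights, midpoints = [], []
    for g in range(len(responders[0])):
        r = [sample[g] for sample in responders]
        n = [sample[g] for sample in non_responders]
        # Signal-to-noise weight: positive if the gene is higher in responders.
        weights.append((mean(r) - mean(n)) / (stdev(r) + stdev(n)))
        midpoints.append((mean(r) + mean(n)) / 2)
    return weights, midpoints


def prediction_score(sample, weights, midpoints):
    """Score in [-100, 100]; positive values favor the responder class."""
    votes = [w * (x - m) for x, w, m in zip(sample, weights, midpoints)]
    for_responder = sum(v for v in votes if v > 0)
    for_non_responder = -sum(v for v in votes if v < 0)
    return 100 * (for_responder - for_non_responder) / (
        for_responder + for_non_responder)


if __name__ == "__main__":
    # Toy log-expression values for two hypothetical genes.
    responders = [[2.0, 0.1], [2.2, 0.3], [1.8, 0.2]]
    non_responders = [[0.5, 1.5], [0.7, 1.7], [0.6, 1.9]]
    w, m = fit_weights(responders, non_responders)
    print(prediction_score([2.1, 0.2], w, m))
    print(prediction_score([0.5, 1.8], w, m))
```

Held-out test cases, like the nine reserved cases mentioned above, would be scored with weights fitted on the training tumors only.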

Serum levels of amphiregulin (AREG) and transforming growth factor-alpha (TGFA), which were previously identified as being expressed at high levels in non-small cell lung cancers (NSCLCs) with poor response to gefitinib, were examined by ELISA using blood samples taken from 50 patients with advanced NSCLCs. Of 14 cases with serum AREG above the cut-off level, twelve responded poorly (PD) to gefitinib, whereas 18 of the 36 cases below the cut-off showed partial response (PR) or stable disease (SD) (P = 0.026). Thirteen of 15 patients who were positive for TGFA responded poorly to gefitinib, while 18 of the 35 patients with negative TGFA levels turned out to be relatively good responders (P = 0.014). Of 22 patients with positive values for either or both markers, 19 were poor responders. On the other hand, among 28 patients negative for both markers, 17 were classified into the PR or SD groups (P = 0.001). Gefitinib-treated NSCLC patients whose serum AREG or TGFA was positive showed poorer tumor-specific survival (P = 0.037 and 0.002, respectively, by univariate analysis) than those whose serum AREG or TGFA concentrations were negative. Multivariate analysis showed an independent association between positivity for TGFA and shorter survival times among NSCLC patients treated with gefitinib (P = 0.034). AREG or TGFA positivity in NSCLC tissues was significantly higher in males, non-adenocarcinomas, and smokers. Our data suggest that the status of AREG and TGFA in serum can be an important predictor of resistance to gefitinib among patients with advanced NSCLC.
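The response counts in the paragraph above form 2×2 tables (marker positive or negative versus poor or good response). The report does not name the statistical test behind its P values; the sketch below assumes a two-sided Fisher's exact test, implemented with exact integer arithmetic, and applies it to the AREG counts quoted in the text (12 of 14 poor responders above the cut-off, 18 of 36 below it). The function name is hypothetical.

```python
# Two-sided Fisher's exact test for a 2x2 table, using exact integers.
# Assumption: the report's P values may come from a different test.
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Table rows: (a, b) and (c, d). Returns the two-sided p-value."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def weight(k):
        # Numerator of the hypergeometric probability of seeing k in cell a.
        return comb(col1, k) * comb(n - col1, row1 - k)

    k_min = max(0, row1 - (n - col1))
    k_max = min(row1, col1)
    observed = weight(a)
    # Sum over all tables at least as unlikely as the observed one.
    extreme = sum(weight(k) for k in range(k_min, k_max + 1)
                  if weight(k) <= observed)
    return extreme / comb(n, row1)


if __name__ == "__main__":
    # AREG above cut-off: 12 poor (PD), 2 PR/SD; below cut-off: 18 PD, 18 PR/SD.
    print(fisher_exact_two_sided(12, 2, 18, 18))
    # Either marker positive: 19 poor, 3 good; both negative: 11 poor, 17 good.
    print(fisher_exact_two_sided(19, 3, 11, 17))
```

Comparing the integer weights rather than floating-point probabilities avoids rounding problems when deciding which tables count as at least as extreme as the observed one.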

(2) Lung cancer

We have been investigating genes involved in pulmonary carcinogenesis by examining genome-wide gene-expression profiles of non-small cell lung cancers (NSCLCs), to identify molecules that might serve as diagnostic markers or targets for development of new molecular therapies. Distant metastasis is one of the crucial parameters that determine the type of treatment and the prognosis of patients. Although a number of previous reports identified important factors involved in multiple steps of metastasis, the precise mechanisms of metastasis still remain to be clarified. To identify genes associated with this complicated biological feature of cancer, we analyzed expression profiles of 16 metastatic brain tumors derived from primary lung adenocarcinoma (ADC) using a cDNA microarray representing 23,040 genes. We applied a bioinformatic algorithm to compare the expression data of these 16 brain metastatic loci with those of 37 primary NSCLCs including 22 ADCs, and found that metastatic tumor cells have gene expression patterns very different from those of primary ones. The 244 genes that showed significantly different expression levels between the two groups included plasma membrane-bound proteins, cellular antigens, and cytoskeletal proteins that might play important roles in altering cell-cell communication, attachment, and cell motility, and enhance the metastatic ability of cancer cells. Our results provide valuable information for development of predictive markers as well as novel therapeutic target molecules for metastatic brain tumors of ADC of the lung.

We found that human ANLN, a homologue of anillin, an actin-binding protein in Drosophila, was transactivated in lung-cancer cells and appeared to play a significant role in pulmonary carcinogenesis. Induction of small interfering RNAs (siRNAs) against ANLN in NSCLC cells suppressed its expression and resulted in growth suppression; moreover, siRNA treatment yielded cells with larger morphology and multiple nuclei, which subsequently died. On the other hand, induction of exogenous expression of ANLN enhanced the migrating ability of mammalian cells by interacting with RHOA, a small GTPase, and inducing actin stress fibers. Interestingly, inhibition of PI3K/AKT activity in NSCLC cells decreased the stability of ANLN and caused reduction of the nuclear ANLN level. Immunohistochemical staining of nuclear ANLN (n-ANLN) on lung-cancer tissue microarrays was associated with poor survival of NSCLC patients, indicating that this molecule might serve as a prognostic indicator. Our data imply that up-regulation of ANLN is a common feature of the carcinogenetic process in lung tissue, and suggest that selective suppression of ANLN could be a promising approach for developing a new strategy to treat lung cancers.

We reported evidence that a member of the armadillo protein family, plakophilin 3 (PKP3), is a potential molecular target for treatment of lung cancers and might also serve as a prognostic indicator. We documented elevated expression of PKP3 in the great majority of NSCLC samples examined. Treatment of NSCLC cells with small interfering RNAs (siRNAs) of PKP3 suppressed growth of the cancer cells; on the other hand, induction of exogenous expression of PKP3 conferred growth-promoting activity on COS-7 cells and enhanced their mobility in vitro. To investigate its function, we searched for PKP3-interacting proteins and identified dynamin 1-like (DNM1L), which was also activated in NSCLC. In addition, a high level of PKP3 expression was associated with poor survival as well as disease stage and node status for patients with lung adenocarcinoma (ADC), suggesting an important role of the protein in development and progression of this disease. As our data imply that up-regulation of PKP3 is a frequent and important feature of lung carcinogenesis, we suggest that targeting the PKP3 molecule might hold promise for development of a new therapeutic and diagnostic strategy for clinical management of lung cancers.

An increased level of dihydrouridine in transfer RNA(Phe) was found in human malignant tissues nearly three decades ago, but its biological significance in carcinogenesis has remained unclear. Through analysis of genome-wide gene-expression profiles among non-small cell lung carcinomas (NSCLCs), we identified over-expression of a novel human gene, termed hDUS2, encoding a protein that shared structural features with tRNA-dihydrouridine synthases (DUS). The deduced 493-amino-acid sequence showed 39% homology to the Dus2 enzyme (dihydrouridine synthase 2) of Saccharomyces cerevisiae, and contained a conserved double-strand RNA-binding motif (DSRM). We found that hDUS2 protein had tRNA-dihydrouridine synthase activity and that it physically interacted with EPRS, a glutamyl-prolyl tRNA synthetase, and was likely to enhance translational efficiencies. An siRNA against hDUS2 transfected into NSCLC cells suppressed expression of the gene, reduced the amount of dihydrouridine in tRNA molecules, and suppressed growth. Immunohistochemical analysis demonstrated significant association between higher levels of hDUS2 in tumors and poorer prognosis of lung-cancer patients. Our data imply that up-regulation of hDUS2 is a relatively common feature of pulmonary carcinogenesis, and that selective suppression of hDUS2 enzyme activity and/or inhibition of formation of the hDUS2-tRNA synthetase complex could be a promising therapeutic strategy for treatment of many lung cancers.

(3) Pancreatic cancer

Through functional analysis of genes that were transactivated in pancreatic ductal adenocarcinomas (PDACs), we identified RAB6KIFL as a good candidate for development of drugs to treat PDACs at the molecular level. Knockdown of endogenous RAB6KIFL expression in PDAC cell lines by siRNA drastically attenuated growth of those cells, suggesting an essential role for the gene product in maintaining viability of PDAC cells. RAB6KIFL belongs to the kinesin superfamily of motor proteins, which have critical functions in trafficking of molecules and organelles. Proteomics analyses using a polyclonal anti-RAB6KIFL antibody identified one of the cargoes transported by RAB6KIFL as discs large homolog 5 (DLG5), a scaffolding protein that may link the vinexin-β-catenin complex at sites of cell-cell contact. Like RAB6KIFL, DLG5 was up-regulated in PDACs, and knockdown of endogenous DLG5 by siRNA significantly suppressed the growth of PDAC cells as well. Decreased levels of endogenous RAB6KIFL in PDAC cells altered the sub-cellular localization of DLG5 from cytoplasmic membranes to cytoplasm. Our results imply that collaboration of RAB6KIFL and DLG5 is likely to be involved in pancreatic carcinogenesis. These molecules should be promising targets for development of new therapeutic strategies for ductal adenocarcinomas of the pancreas.

P-cadherin/CDH3 belongs to the family of classical cadherins that are engaged in various cellular activities including motility, invasion, and signaling of tumor cells, in addition to cell adhesion. However, the biological roles of P-cadherin itself are not fully characterized. Based on information derived from a previous genome-wide cDNA microarray analysis of microdissected PDACs, we focused on P-cadherin as one of the genes most strongly over-expressed in the great majority of PDACs. To investigate the consequences of over-expression of P-cadherin in terms of pancreatic carcinogenesis and tumor progression, we used a P-cadherin-deficient PDAC cell line, Panc1, to construct a cell line (Panc1-CDH3) that stably over-expressed P-cadherin. Induction of P-cadherin in Panc1-CDH3 increased the motility of the cancer cells, but a blocking antibody against P-cadherin suppressed the motility in vitro. Over-expression of P-cadherin was strongly associated with cytoplasmic accumulation of one of the catenins, p120ctn, and cadherin-switching in PDAC cells. Moreover, P-cadherin-dependent activation of cell motility was associated with activation of Rho GTPases, Rac1 and Cdc42, through accumulation of p120ctn in cytoplasm and cadherin-switching. These findings suggest that over-expression of P-cadherin is likely to be related to the biological aggressiveness of PDACs; blocking of P-cadherin activity or its associated signaling could be a novel therapeutic approach for treatment of aggressive pancreatic cancers.

(4) Prostate cancer

Through genome-wide cDNA microarray analysis coupled with microdissection of prostate cancer cells, we identified a novel gene, PCOTH (Prostate Collagen Triple Helix), showing over-expression in prostate cancer cells and their precursor cells, PINs (prostatic intraepithelial neoplasia). Immunohistochemical analysis using polyclonal anti-PCOTH antibody confirmed elevated expression of PCOTH, a 100-amino-acid protein containing collagen triple-helix repeats, in tumor cells. Knockdown of its expression using siRNA resulted in drastic attenuation of prostate cancer cell growth. Concordantly, LNCaP-derivative cells that were designed to express PCOTH highly and stably demonstrated a higher growth rate than LNCaP cells transfected with mock vector. Using 2D-DIGE analysis as well as subsequent western blotting and in-gel kinase assay, we found that the phosphorylation level of oncoprotein TAF-Iβ/SET was significantly higher in LNCaP cells transfected with PCOTH than in control LNCaP cells. Furthermore, knockdown of endogenous TAF-Iβ expression using siRNA also attenuated viability of prostate cancer cells. These findings suggest that PCOTH is involved in growth and survival of prostate cancer cells through, in part, the TAF-Iβ pathway, and that this molecule should be a promising target for development of new therapeutic strategies for prostate cancers.

(5) Renal cancer

To identify molecules to serve as diagnostic markers for renal cell carcinoma (RCC) and as targets for novel therapeutic drugs, we investigated genome-wide expression profiles of RCCs using a cDNA microarray. We subsequently confirmed that hypoxia-inducible protein-2 (HIG2) was expressed exclusively in RCCs and fetal kidney. Introduction of HIG2 cDNA into COS7 cells led to secretion of the gene product into culture media and resulted in enhancement of cell growth. Small interfering RNA (siRNA) effectively inhibited expression of HIG2 in human RCC cells that endogenously expressed high levels of the protein, and significantly suppressed cell growth. Moreover, addition of polyclonal anti-HIG2 antibody into culture media induced apoptosis in RCC-derived cell lines. By binding to an extracellular domain of frizzled homolog 10 (FZD10), HIG2 protein enhanced oncogenic Wnt-signaling and its own transcription, suggesting that this product is likely to function as an autocrine growth factor. ELISA analysis of clinical samples identified secretion of HIG2 protein into the plasma of RCC patients even at an early stage of tumor development, whereas it was detected at significantly lower levels in healthy volunteers or patients with chronic glomerulonephritis. The combined evidence suggests that this molecule represents a promising candidate for development of molecular-targeting therapy and could serve as a prominent diagnostic tumor-marker for patients with renal carcinomas.

Publications

1 S. Ohtsubo, A. Iida, K. Nitta, T. Tanaka, R. Yamada, Y. Ohnishi, S. Maeda, T. Tsunoda, T. Takei, W. Obara, F. Akiyama, K. Ito, K. Honda, K. Uchida, K. Tsuchiya, W. Yumura, T. Ujiie, Y. Nagane, S. Miyano, Y. Suzuki, I. Narita, F. Gejyo, T. Fujioka, H. Nihei and Y. Nakamura: Association of a single-nucleotide polymorphism in the immunoglobulin µ-binding protein 2 gene with immunoglobulin A nephropathy. Journal of Human Genetics, 50: 30-35, 2005

2 F.P. Silva, R. Hamamoto, Y. Nakamura, and Y. Furukawa: WDRPUH, a novel WD-repeat containing protein, is highly expressed in human hepatocellular carcinoma and involved in cell proliferation. Neoplasia, 7: 348-355, 2005

3 A. Iida, K. Ozaki, T. Tanaka, and Y. Nakamura: Fine-scale SNP map of an 11-kb genomic region at 22q13.1 containing the galectin-1 gene. Journal of Human Genetics, 50: 42-45, 2005

4 W.-R. Park and Y. Nakamura: p53CSV, a novel p53-inducible gene involved in the p53-dependent cell-survival pathway. Cancer Research, 65: 1197-1206, 2005

5 K. Taniuchi, H. Nakagawa, T. Nakamura, H. Eguchi, H. Ohigashi, O. Ishikawa, T. Katagiri, and Y. Nakamura: Over-expressed P-cadherin/CDH3 Promotes Motility of Pancreatic Cancer Cells by Interacting with p120ctn and Activating Rho-Family GTPases. Cancer Research, 65: 3092-3099, 2005

6 R. Takata, T. Katagiri, M. Kanehira, T. Tsunoda, T. Shuin, T. Miki, M. Namiki, K. Kohri, Y. Matsushita, T. Fujioka and Y. Nakamura: Predicting response to M-VAC neoadjuvant chemotherapy for bladder cancers through genome-wide gene expression profiling. Clinical Cancer Research, 11: 2625-2636, 2005

7 H. Kizawa, I. Kou, A. Iida, A. Sudo, Y. Miyamoto, A. Fukuda, A. Mabuchi, A. Kotani, A. Kawakami, S. Yamamoto, N. Uchida, K. Nakamura, K. Notoya, Y. Nakamura and S. Ikegawa: An aspartic acid repeat polymorphism in asporin negatively affects chondrogenesis and increases susceptibility to osteoarthritis. Nature Genetics, 37: 138-144, 2005

8 M. Nishiu, Y. Tomita, S. Nakatsuka, T. Takakuwa, N. Iizuka, Y. Hoshida, J. Ikeda, K. Iuchi, R. Yanagawa, Y. Nakamura and K. Aozasa: Distinct pattern of gene expression in pyothorax-associated lymphoma (PAL), a lymphoma developing in long-standing inflammation. Cancer Science, 95: 828-834, 2005

9 K. Taniuchi, H. Nakagawa, T. Nakamura, H. Eguchi, H. Ohigashi, O. Ishikawa, T. Katagiri, and Y. Nakamura: Down-regulation of RAB6KIFL/KIF20A, a Kinesin Involved with Membrane Trafficking of Discs Large Homolog 5, Can Attenuate Growth of Pancreatic Cancer Cells. Cancer Research, 65: 105-112, 2005

10 A. Iida, Y. Nakamura: Identification of 156 novel SNPs in 29 genes encoding G-protein coupled receptors. Journal of Human Genetics, 50: 182-191, 2005

11 Y. Anazawa, H. Nakagawa, M. Furihara, S. Ashida, K. Tamura, H. Yoshioka, T. Shuin, T. Fujioka, T. Katagiri, and Y. Nakamura: PCOTH, a Novel Gene Over-expressed in Prostate Cancers, Promotes Prostate Cancer Cell Growth Through Phosphorylation of Oncoprotein TAF-Iβ/SET. Cancer Research, 65: 4578-4586, 2005

12 A. Togashi, T. Katagiri, S. Ashida, T. Fujioka, O. Maruyama, Y. Wakumoto, Y. Sakamoto, M. Fujime, Y. Kawachi, T. Shuin and Y. Nakamura: Hypoxia-inducible protein 2 (HIG2), a novel diagnostic marker for renal cell carcinoma (RCC) and potential target for molecular therapy. Cancer Research, 65: 4817-4826, 2005

13 S. Nagayama, C. Fukukawa, T. Katagiri, T. Okamoto, T. Aoyama, N. Oyaizu, M. Imamura, J. Toguchida and Y. Nakamura: Therapeutic potential of antibodies against FZD10, a cell-surface protein, for synovial sarcomas. Oncogene, 24: 6201-6212, 2005

14 H. Mototani, A. Mabuchi, S. Saito, M. Fujioka, A. Iida, Y. Takatori, A. Kotani, T. Kubo, K. Nakamura, A. Sekine, Y. Murakami, T. Tsunoda, K. Notoya, Y. Nakamura and S. Ikegawa: A functional single nucleotide polymorphism in the core promoter region of CALM1 is associated with hip osteoarthritis in Japanese. Human Molecular Genetics, 14: 1009-1017, 2005

15 K. Obama, K. Ura, M. Li, T. Katagiri, T. Tsunoda, A. Nomura, S. Satoh, Y. Nakamura, and Y. Furukawa: Genome-wide analysis of gene expression in human intrahepatic cholangiocarcinomas. Hepatology, 41: 1339-1348, 2005

16 K. Obama, K. Ura, S. Satoh, Y. Nakamura, and Y. Furukawa: Up-regulation of PSF2, a member of multiprotein complex GINS, involved in cholangiocarcinogenesis. Oncology Reports, 14: 701-706, 2005

17 T. Watanabe, T. Suda, T. Tsunoda, N. Uchida, K. Ura, T. Kato, S. Hasegawa, S. Satoh, S. Ohgi, H. Tahara, Y. Furukawa and Y. Nakamura: Identification of Immunoglobulin superfamily 11 (IGSF11) as a novel target for cancer immunotherapy of gastro-intestinal cancers. Cancer Science, 96: 498-506, 2005

18 S. Maeda, S. Tsukada, A. Kanazawa, A. Sekine, T. Tsunoda, D. Koya, H. Maegawa, A. Kashiwagi, T. Babazono, M. Matsuda, Y. Tanaka, T. Fujioka, H. Hirose, T. Eguchi, Y. Ohno, C. Groves, A. Hattersley, G. Hitman, M. Walker, K. Kaku, Y. Iwamoto, R. Kawamori, R. Kikkawa, N. Kamatani, M. McCarthy, and Y. Nakamura: Genetic variations in the gene encoding TFAP2B are associated with type 2 diabetes. Journal of Human Genetics, 50: 283-292, 2005

19 S. Shimazaki, Y. Kawamura, A. Kanazawa, A. Sekine, S. Saito, T. Tsunoda, D. Koya, T. Babazono, Y. Tanaka, M. Matsuda, K. Kawai, T. Iiizumi, M. Imanishi, T. Shinosaki, T. Yanagimoto, M. Ikeda, S. Omachi, A. Kashiwagi, K. Kaku, Y. Iwamoto, R. Kawamori, R. Kikkawa, M. Nakajima, Y. Nakamura, and S. Maeda: Genetic variations in the gene encoding engulfment and cell motility 1 (ELMO1) are associated with susceptibility to diabetic nephropathy. Diabetes, 54: 1171-1178, 2005

20 M. Akahoshi, K. Obara, T. Hirota, A. Matsuda, K. Hasegawa, N. Takahashi, M. Shimizu, K. Nakashima, S. Doi, H. Fujiwara, A. Miyatake, K. Fujita, N. Higashi, M. Taniguchi, T. Enomoto, X.Q. Mao, K. Nakashima, C.N. Adra, Y. Nakamura, M. Tamari, and T. Shirakawa: A functional promoter polymorphism in the TBX21 gene is associated with aspirin-induced asthma. Human Genetics, 117: 16-26, 2005

21 T. Kato, Y. Daigo, S. Hayama, N. Ishikawa, T. Yamabuki, T. Ito, M. Miyamoto, S. Kondo and Y. Nakamura: A novel human tRNA-dihydrouridine synthase involved in pulmonary carcinogenesis. Cancer Research, 65: 5638-5646, 2005

22 K. Asamura, S. Abe, Y. Imamura, A. Aszodi, N. Suzuki, S. Hashimoto, Y. Takumi, T. Hayashi, R. Fassler, Y. Nakamura, and S. Usami: Type IX collagen is crucial for normal hearing. Neuroscience, 132: 493-500, 2005

23 R. Kawaida, R. Yamada, K. Kobayashi, S. Tokuhiro, A. Suzuki, Y. Kochi, X. Chang, A. Sekine, T. Tsunoda, T. Sawada, H. Furukawa, Y. Nakamura, and K. Yamamoto: CUL1, a component of E3 ubiquitin ligase, alters lymphocyte signal transduction with possible effect on rheumatoid arthritis. Genes Immun, 6: 194-202, 2005

24 S. Seki, Y. Kawaguchi, K. Chiba, Y. Mikami, H. Kizawa, T. Oya, F. Mio, M. Mori, Y. Miyamoto, I. Masuda, T. Tsunoda, M. Kamata, T. Kubo, Y. Toyama, T. Kimura, Y. Nakamura and S. Ikegawa: A functional SNP in cartilage intermediate layer protein (CILP) is associated with susceptibility to lumbar disc disease. Nature Genetics, 37: 607-612, 2005

25 Y. Kochi, R. Yamada, A. Suzuki, J.B. Harley, S. Shirasawa, T. Sawada, S.C. Bae, S. Tokuhiro, X. Chang, A. Sekine, A. Takahashi, T. Tsunoda, Y. Ohnishi, K.M. Kaufman, C.P. Kang, C. Kang, S. Otsubo, W. Yumura, A. Mimori, T. Koike, Y. Nakamura, T. Sasazuki and K. Yamamoto: A functional variant in FCRL3, encoding Fc receptor-like 3, is associated with rheumatoid arthritis and several autoimmunities. Nature Genetics, 37: 478-485, 2005

26 C. Furukawa, Y. Daigo, N. Ishikawa, T. Kato, T. Ito, E. Tsuchiya, S. Sone, and Y. Nakamura: PKP3 oncogene as prognostic marker and therapeutic target for lung cancer. Cancer Research, 65: 7102-7110, 2005

27 M. Tsuge, R. Hamamoto, F.P. Silva, Y. Ohnishi, K. Chayama, Y. Furukawa, and Y. Nakamura: VNTR Polymorphism of E2F-1 binding element in the 5’ flanking region of SMYD3 is a risk factor for human cancers. Nature Genetics, 37: 1104-1107, 2005

28 M. Sakai, T. Shimokawa, T. Kobayashi, S. Matsushima, Y. Yamada, Y. Nakamura, and Y. Furukawa: Elevated expression of C10orf3 (Chromosome 10 open reading frame 3) is involved in the growth of human colon tumor. Oncogene, in press, 2005

29 M. Takahashi, K. Obama, Y. Nakamura, and Y. Furukawa: Identification of SP5 as a downstream gene of the beta-catenin/Tcf pathway and its enhanced expression in human colon cancer. International Journal of Oncology, 27: 1483-1487, 2005

30 N. Ishikawa, Y. Daigo, A. Takano, M. Taniwaki, T. Kato, S. Hayama, H. Murakami, Y. Takeshima, K. Inai, H. Nishimura, E. Tsuchiya, N. Kohno, and Y. Nakamura: Increases of amphiregulin and transforming growth factor-alpha in serum as predictors of poor response to Gefitinib among patients with advanced non-small cell lung cancers. Cancer Research, 65: 9176-9184, 2005

31 C. Suzuki, Y. Daigo, N. Ishikawa, T. Kato, S. Hayama, T. Ito, E. Tsuchiya, and Y. Nakamura: ANLN plays a critical role in human lung carcinogenesis through activation of RHOA and by involvement in PI3K/AKT pathway. Cancer Research, in press, 2005

32 R. Hamamoto, F.P. Silva, M. Tsuge, T. Nishidate, T. Katagiri, Y. Nakamura and Y. Furukawa: Enhanced SMYD3 expression is essential for the growth of breast cancer cells. Cancer Science, in press, 2005

33 K. Asamura, S. Abe, H. Fukuoka, Y. Nakamura, S. Usami: Mutation analysis of COL9A3, a gene highly expressed in the cochlea, in hearing loss patients. Auris Nasus Larynx, 32: 113-117, 2005

34 A. Kanazawa, Y. Kawamura, A. Sekine, A. Iida, T. Tsunoda, A. Kashiwagi, Y. Tanaka, T. Babazono, M. Matsuda, K. Kawai, T. Iiizumi, T. Fujioka, M. Imanishi, K. Kaku, Y. Iwamoto, R. Kawamori, R. Kikkawa, Y. Nakamura, and S. Maeda: Single nucleotide polymorphisms in the gene encoding Kruppel-like factor 7 are associated with type 2 diabetes. Diabetologia, 48: 1315-1322, 2005

35 C. Furukawa, Y. Nakamura and T. Katagiri: Molecular target therapy of synovial sarcoma. Future Oncology, in press, 2005

36 T. Watanabe, M. Suzuki, Y. Yamasaki, S. Okuno, H. Hishigaki, T. Ono, K. Oga, A. Mizoguchi-Miyakita, A. Tsuji, N. Kanemoto, S. Wakitani, T. Takagi, Y. Nakamura, and A. Tanigami: Mutated G-protein-coupled receptor GPR10 is responsible for the hyperphagia/dyslipidaemia/obesity locus of Dmo1 in the OLETF rat. Clinical and Experimental Pharmacology and Physiology, 32: 355-366, 2005

37 K. Yamazaki, D. McGovern, J. Ragoussis, M. Paolucci, H. Butler, D. Jewell, L. Cardon, M. Takazoe, T. Tanaka, T. Ichimori, S. Saito, A. Sekine, A. Iida, A. Takahashi, T. Tsunoda, M. Lathrop and Y. Nakamura: Single nucleotide polymorphisms in TNFSF15 confer susceptibility to Crohn’s disease. Human Molecular Genetics, 14: 3499-3506, 2005

38 The International HapMap Consortium: A haplotype map of the human genome. Nature, 437: 1299-1320, 2005

39 T. Ishibe, T. Nakayama, T. Okamoto, T. Aoyama, K. Nishijo, K. Shibata, Y. Shima, S. Nagayama, T. Katagiri, Y. Nakamura, T. Nakamura, and J. Toguchida: Disruption of fibroblast growth factor signal pathway inhibits the growth of synovial sarcomas: potential application of signal inhibitor to molecular target therapy. Clinical Cancer Research, 11: 2702-2712, 2005

40 K. Obama, T. Kato, S. Hasegawa, S. Satoh, Y. Nakamura, and Y. Furukawa: Overexpression of peptidyl-prolyl isomerase like-1 is associated with the growth of colon cancer cells. Clinical Cancer Research, in press, 2005

41 A. Iida, S. Saito, A. Sekine, A. Takahashi, N. Kamatani, and Y. Nakamura: Japanese SNP database for 267 possible drug-related genes. Cancer Science, in press, 2005

42 T. Kikuchi, Y. Daigo, N. Ishikawa, T. Katagiri, T. Tsunoda, S. Yoshida, and Y. Nakamura: Expression profiles of metastatic brain tumor from lung adenocarcinomas on cDNA microarray. International Journal of Oncology, in press, 2005

43 T. Mushiroda, Y. Ohnishi, S. Saito, A. Takahashi, Y. Kikuchi, S. Saito, H. Shimomura, Y. Wanibuchi, T. Suzuki, N. Kamatani, and Y. Nakamura: Association of VKORC1 and CYP2C9 polymorphisms with warfarin dose requirements in Japanese patients. Journal of Human Genetics, in press, 2006


The mission of our laboratory is to conduct computational (“in silico”) studies on the functional aspects of genome information. Roughly speaking, genome information represents what kinds of proteins/RNAs are synthesized under what conditions. Thus, our study includes the structural analysis of the molecular function of each gene product as well as the analysis of its regulatory information, which will lead us to an understanding of its cellular role as represented by the networks of inter-gene interaction.

1. DBTGR: A Database for Comparative Analysis of Tunicate Promoter Sequences

Nicolas Sierro, Takehiro Kusakabe1, Keun-Joon Park, Riu Yamashita, Kengo Kinoshita and Kenta Nakai: 1Graduate School of Life Science, University of Hyogo

The two ascidians Ciona intestinalis and Ciona savignyi belong to the tunicate subphylum, which is particularly interesting because it shares many developmental and physiological characteristics, as well as basic gene repertoires, with the vertebrates. The rapid development of a fertilized ascidian egg into a transparent tadpole-like larva having a body plan similar to its vertebrate counterpart, and the availability of a well-established cell lineage, have made these organisms a favored tool to elucidate the genetic regulatory systems underlying the developmental and physiological processes of vertebrates. In order to understand the regulation of the numerous genes identified after the recent release of the C. intestinalis and C. savignyi genomes, a database was created containing information on the regulation of tunicate genes collected from the literature, as well as predicted binding sites for the identified transcription factors. The information contained in the DataBase of Tunicate Gene Regulation (DBTGR, http://dbtgr.hgc.jp) originates from two sources. The first is information on gene regulation, transcription factors and their binding sites obtained by searching the published literature. The second is C. intestinalis and C. savignyi promoter sequences extracted from the most current genome releases, either by using the information provided in the literature and with the genome releases, or by sequence alignment and homology searches. Additionally, the recognition sites of the reported transcription factors were used to identify new potential binding sites within the promoters. More importantly, DBTGR provides an alignment between corresponding C. intestinalis and C. savignyi gene promoter sequences, facilitating the identification of actual regulatory elements and of regions conserved in both promoters.

2. Motif Analysis of Tissue-specific Promoters in Ciona intestinalis

Keun-Joon Park, Nicolas Sierro, Takehiro Kusakabe1, Riu Yamashita, Kengo Kinoshita and Kenta Nakai

Human Genome Center

Laboratory of Functional Analysis In Silico
機能解析イン・シリコ分野

Professor Kenta Nakai, Ph. D.
Associate Professor Kengo Kinoshita, Ph. D.

教授 理学博士 中井謙太
助教授 理学博士 木下賢吾


DBTGR is a database of tunicate promoters and their regulatory elements. While constructing the DBTGR database, we also investigated promoters leading to tissue-specific expression in Ciona (tunicates). We found muscle-specific motifs in the Ciona intestinalis dataset constructed from DBTGR and obtained a PSSM (position-specific scoring matrix) for each motif. After finding muscle-specific motifs in the restricted DBTGR Ciona intestinalis dataset, we searched for the same motifs in the complete Ciona intestinalis genome sequence (JGI version 1.95). To find potential muscle-specific genes in the genome sequence, we constructed several muscle-specific motif combination (element) models based on the results of the DBTGR dataset analysis. These models were used to detect new muscle-specific genes in the whole genome. After prediction of TATA-box sequences in the upstream region of each gene, significant motif combinations matching the models were detected in the genome sequence. To improve these muscle-specific promoter element models, and to discover potentially new ones, we are going to analyze the new Ciona intestinalis genome version 2.0 and to carry out comparative genomics analysis using the Ciona savignyi genome sequence.
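As an illustration of how a PSSM can be derived from known sites and then used to scan a longer sequence, the following is a minimal Python sketch. It is not the procedure used for DBTGR: the example sites, the uniform background, the pseudocount and the score threshold are all assumptions chosen for the example, and the function names build_pssm and scan are hypothetical.

```python
import math

def build_pssm(sites, background=None, pseudocount=1.0):
    """Build a log-odds PSSM from equal-length aligned sites (illustrative only)."""
    alphabet = "ACGT"
    background = background or {b: 0.25 for b in alphabet}  # assumed uniform background
    total = len(sites) + pseudocount * len(alphabet)
    pssm = []
    for i in range(len(sites[0])):
        column = [s[i] for s in sites]
        pssm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background[b])
            for b in alphabet
        })
    return pssm

def scan(sequence, pssm, threshold):
    """Return (start, score) for every window scoring at or above the threshold."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(b not in "ACGT" for b in window):
            continue  # skip windows containing ambiguous bases
        score = sum(pssm[i][b] for i, b in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits

# Hypothetical TATA-like sites, not taken from DBTGR.
sites = ["TATAAA", "TATAAA", "TATATA", "TATAAG"]
pssm = build_pssm(sites)
hits = scan("GGCGTATAAAGGC", pssm, threshold=5.0)
print(hits)
```

In practice the threshold would be calibrated against a background set of promoters, and combinations of several such motifs (the element models described above) would be required before a gene is called muscle-specific.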

3. Comparative Analysis of Firmicute Promoters

Nicolas Sierro and Kenta Nakai

With the rapid increase in the number of completely sequenced bacterial genomes, systematic function analysis projects have started to decipher the total gene activity of these organisms. Because genes showing a similar expression profile are probably co-regulated by a common transcription factor, investigation of their promoter regions is an important step towards the understanding of global cell regulation networks. The previously constructed database of transcriptional regulation in Bacillus subtilis (DBTBS) focuses on known transcription factors, their recognition sequences and the genes they control in B. subtilis. However, in recent years, more than 60 other firmicute genomes, including medically and industrially important species such as Streptococcus pyogenes, Staphylococcus aureus or Lactococcus lactis, have been completely sequenced. Using the information available from these genomes, as well as regulatory events reported in the literature, new insight into the evolution of regulatory networks could be obtained. For instance, although the heat shock response is essential for the survival of all bacteria, and as such is likely to have appeared early in evolution, the interaction between two of its known regulation pathways, controlled by the HrcA and CtsR transcription factors, varies depending on species and subspecies. This illustrates the importance of extending the comprehensive B. subtilis information with that obtained from other related bacteria in order to provide a more accurate picture of bacterial gene regulation.

4. Update of DataBase of Transcription Start Sites

Riu Yamashita, Yutaka Suzuki2, Sumio Sugano, and Kenta Nakai: 2Graduate School of Frontier Sciences

DBTSS was first constructed in 2002 based on precise, experimentally determined 5’-end clones. Several major updates and additions have been made since the last report. First, the number of human clones has drastically increased, going from 190,964 to 1,359,000. We checked the reliability of these data against ChIP-on-chip experiments of the ENCODE project. Around 54% of the transcription start sites (TSSs) were observed in ChIP-on-chip positive (ratio ≥ 2) regions. Moreover, these TSSs correspond to 90% of the 5’-end clones in the ENCODE regions; therefore, our data are consistent with other experimental work. Second, information about potential alternative promoters is presented, because the number of 5’-end clones is now sufficient to determine several promoters for one gene. Namely, we defined putative promoter groups by clustering TSSs separated by less than 500 bases. 8,308 human genes and 4,276 mouse genes were found to have putative multiple promoters. To verify these putative alternative promoters, we collected 138 alternative promoters (64 genes) identified by other experimental methods. 74% of them corresponded to our DBTSS alternative promoters. Finally, we have added TSS information for zebrafish, the malaria parasite, and schyzon (a model red alga). DBTSS is accessible at http://dbtss.hgc.jp.
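The grouping of TSSs into putative promoter groups can be pictured with a short Python sketch. It assumes single-linkage grouping along one strand of one chromosome, i.e. a TSS joins the current group when it lies less than 500 bases from the previous TSS; the report does not state the exact clustering procedure, so this is only one reading of it, and the function name cluster_tss and the coordinates are hypothetical.

```python
def cluster_tss(positions, max_gap=500):
    """Group TSS coordinates; adjacent members of a group are < max_gap apart."""
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] < max_gap:
            clusters[-1].append(pos)   # close enough to the previous TSS
        else:
            clusters.append([pos])     # start a new putative promoter group
    return clusters

# Hypothetical TSS coordinates for one gene on one strand.
tss = [1000, 1120, 1490, 2500, 2510, 9000]
groups = cluster_tss(tss)
print(groups)  # [[1000, 1120, 1490], [2500, 2510], [9000]]
```

Under this reading, a gene with more than one group, such as the three groups in the example, would be counted as having putative multiple promoters.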

5. Comparative Analysis of Alternative Promoters between Human and Mouse Genes

Katsuki Tsuritani3, Yutaka Suzuki2, Koichi Kimura4, Ai Wakamatsu4, Riu Yamashita, Takao Isogai4, Sumio Sugano2, and Kenta Nakai: 3Taisho Pharmaceutical Co. Ltd., 4REPRORI Co. Ltd.

It is gradually becoming clear that a large proportion of human genes are regulated by more than one alternative promoter (AP). In this report, we focus on the comparative analysis of human/mouse APs. We extracted orthologous genes that have more than two APs in either human or mouse from DBTSS (http://dbtss.hgc.jp/) and analyzed their putative promoter regions using the local alignment program LALIGN. We classified the extracted genes into five categories, ‘0-0’, ‘1-1’, ‘1-m’, ‘m-1’, and ‘m-m’, based on the conservation of their promoter regions: ‘0-0’ is the case in which there are no conserved regions; ‘1-m’ and ‘m-1’ contain some redundant orthologous promoters in mouse and human, respectively; ‘m-m’ is the case in which both ‘1-m’ and ‘m-1’ hold; and ‘1-1’ is the remaining case, which indicates that the promoter is not duplicated. The numbers of genes in categories ‘0-0’, ‘1-1’, ‘1-m’, ‘m-1’, and ‘m-m’ were 538, 4364, 821, 334, and 224, respectively. We extracted 523 genes with more than two reciprocally conserved promoter regions from the major category ‘1-1’ and defined them as the AP core dataset. Among them, the genes that have the most conserved promoter regions were ‘neural precursor cell expressed, developmentally down-regulated 4-like’ and ‘regulator of G-protein signaling 3’, both having five conserved regions. By analyzing their Gene Ontology annotation, we found that genes related to ‘signal transduction’ are significantly enriched in the AP core dataset. Our results suggest that APs contributing to the diversity of cellular signal transduction are well conserved throughout evolution.
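The five categories can be made concrete with a small Python sketch. It rests on an interpretation that the report does not spell out: each orthologous gene pair is represented by the list of (human promoter, mouse promoter) pairs found to be conserved, ‘1-m’ is taken to mean that some human promoter matches several mouse promoters, and ‘m-1’ the reverse. The function name classify and the promoter labels are hypothetical.

```python
from collections import Counter

def classify(conserved_pairs):
    """Assign an orthologous gene pair to one of the five categories.

    conserved_pairs: iterable of (human_promoter_id, mouse_promoter_id) tuples
    for promoter regions judged conserved (e.g. by local alignment).
    """
    pairs = set(conserved_pairs)
    if not pairs:
        return "0-0"                       # no conserved promoter regions
    human_hits = Counter(h for h, _ in pairs)
    mouse_hits = Counter(m for _, m in pairs)
    one_to_many = any(n > 1 for n in human_hits.values())  # redundancy in mouse
    many_to_one = any(n > 1 for n in mouse_hits.values())  # redundancy in human
    if one_to_many and many_to_one:
        return "m-m"
    if one_to_many:
        return "1-m"
    if many_to_one:
        return "m-1"
    return "1-1"

print(classify([("h1", "m1"), ("h2", "m2")]))  # 1-1
print(classify([("h1", "m1"), ("h1", "m2")]))  # 1-m
```

With this representation, the number of reciprocally conserved regions of a ‘1-1’ gene is simply the number of pairs in its list.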

6. Comparative Sequence Analysis of Human and Mouse Promoter Regions

Hirokazu Chiba, Riu Yamashita, Kengo Kinoshita, and Kenta Nakai

Computational sequence analysis of promoter regions is essential to elucidate the mechanism of transcriptional regulation. The accumulation of experimentally validated TSS data has made large-scale promoter comparison not only possible but also an effective approach. Based on the database of transcriptional start sites (DBTSS) developed in our laboratory, we carried out the most comprehensive comparison to date of promoter regions of human and mouse genes, aiming at elucidating the relationship between gene function and promoter conservation. For 70% of the orthologous gene pairs, promoters were detected to be evolutionarily related when compared to promoters from unrelated genes. Conservation levels vary widely from gene to gene. The conservation level of genes with specific functions was examined based on GO slim categories, and new functional categories with high promoter conservation levels were identified in addition to the ones reported previously. Furthermore, a similar analysis regarding protein conservation levels was carried out, and evolutionary constraints on genes were discussed from the points of view of protein sequence and promoter sequence. For signal transducers, evolutionary constraints tend to be on promoter sequences rather than protein sequences, while for enzymes, they are on protein sequences rather than promoter sequences.

7. Comprehensive Analysis of Triplet Repeats in Vertebrate Genomes

Shigeo Okada, Riu Yamashita, Kengo Kinoshita, and Kenta Nakai

About 3% of the human genome is composed of triplet repeats, and their expansion in specific genes is associated with at least 42 human diseases. However, the definition of triplet repeats has not been settled, since a repeat may contain one or more potentially important interruptions. For example, the Huntington's disease gene has one CAA interruption in its CAG repeat region. According to the current Repeat Evolution Model, interruptions may have appeared by point mutations during evolution. In addition, the possible functions of the interruptions are to energetically stabilize repeats and to prevent their expansion. However, interruptions are not considered in conventional in silico studies. Therefore, we reinvestigated triplet repeats taking interruptions into account. We defined repeats with interruptions using triplet repeat disease genes and statistical significance. While there are 6,015 amino acid repeats without interruptions in human coding sequences (CDSs), we can find 22,581 repeats with interruptions. From the investigation of repeats in human and mouse orthologous genes, 4,996 repeats exist only in human genes, and 3,522 repeats only in mouse genes. The percentage of a specific amino acid repeat and the average length of the repeats do not vary much between the two species. We also find that the distribution of the differences between mouse and human glutamine-repeat lengths from orthologous genes is almost symmetrical. We investigated the mutation rate of triplet repeats between human, chimpanzee and mouse CDSs. Interruptions caused by one point mutation are the most abundant (>40%). We also find that the distribution of point mutations resulting in an interruption does not vary much between species, but varies widely depending on the repeated triplet. This indicates that the formation of interruptions depends on the repeated triplet, but is independent of the species and triplet repeat length. In summary, it appears that repeats and interruptions are formed at the same rate, independently of the species and despite the diversification of the genomes. Our results are in agreement with the Repeat Evolution Model and imply that interruptions could play some role in the repeats because they are relatively frequent. We expect that our results will be of great help in understanding the relationship between triplet repeats and diseases.
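To make the idea of a repeat "with interruptions" concrete, the following Python sketch finds in-frame tracts of a repeated codon while tolerating short runs of other codons inside the tract. It is an assumption-laden illustration, not the definition used in this study (which was derived from disease genes and statistical significance): the function name find_repeat_tracts, the minimum of five repeat units and the tolerance of one interrupting codon are all hypothetical choices.

```python
def find_repeat_tracts(cds, unit="CAG", min_units=5, max_gap=1):
    """Find in-frame tracts of `unit`, allowing up to `max_gap` consecutive
    interrupting codons between two repeat units.

    Returns a list of (first_codon, last_codon, n_units, n_interruptions).
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    tracts = []
    start = last = None
    units = 0

    def close():
        # Keep the current tract only if it has enough repeat units.
        if start is not None and units >= min_units:
            tracts.append((start, last, units, (last - start + 1) - units))

    for idx, codon in enumerate(codons):
        if codon != unit:
            continue
        if start is None or idx - last - 1 > max_gap:
            close()                    # gap too long: finish the previous tract
            start, units = idx, 0
        last = idx
        units += 1
    close()
    return tracts

# Toy CDS: four CAG codons, one CAA interruption, three more CAG codons.
cds = "ATG" + "CAG" * 4 + "CAA" + "CAG" * 3 + "TGA"
print(find_repeat_tracts(cds))              # [(1, 8, 7, 1)]
print(find_repeat_tracts(cds, max_gap=0))   # []
```

The toy example shows why the counts differ so strongly with and without interruptions: the same sequence yields one tract of seven units when a single interruption is tolerated and none when it is not.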

8. ATTED-II: A Database of Co-expressed Genes and cis-elements for Identifying Co-Regulated Gene Groups in Arabidopsis

Takeshi Obayashi5,7, Kenta Nakai, Daisuke Shibata6, Kazuki Saito7, Hiroyuki Ohta5: 5Tokyo Institute of Technology, 6Kazusa DNA Research Institute, 7Chiba University

Identifying sets of co-expressed genes is valuable for a wide variety of experimental designs, such as selecting genes for functional identification and investigating possible cis-elements in promoter sequences. Here, we report the construction of the Arabidopsis thaliana trans-factor and cis-element prediction database (ATTED-II), which provides co-regulated gene relationships based on co-expressed genes deduced from microarray data together with predicted cis-elements. ATTED-II includes the following features: (i) lists of co-expressed genes calculated from 771 publicly available microarray data sets in A. thaliana or from subsets of these data, (ii) prediction of cis-regulatory elements in the 200-bp region upstream of the transcription start site to estimate co-regulated genes from the co-expressed genes, (iii) prediction of subcellular localizations and functional groups of proteins to support the estimation of the co-regulated genes. ATTED-II can thus provide clues for researchers to clarify the function and regulation of particular genes and the networks of gene-to-gene relationships (http://www.atted.bio.titech.ac.jp).

9. Computational Analysis of microRNA Recognition Site

Keishin Nishida, Riu Yamashita, Kengo Kinoshita, and Kenta Nakai

microRNAs (miRNAs) are noncoding RNAs of about 22 nucleotides that suppress translation of target genes by binding to their mRNAs, and thus have a role in gene regulation. Recently, Lim et al. (2005, Nature) transfected miRNAs into human cells and used microarrays to examine changes in the messenger RNA profile. These microarray profiles indicate that the 3’-untranslated regions of down-regulated messages have a significant propensity to pair to the 5’-region of the miRNAs, as expected if many of these messages are the direct targets of the miRNAs. However, miRNA target prediction* by searching for sequences complementary to the Seed (SeedCS) yields many false positives. We need to define the Seed more finely and to create a target prediction algorithm that reduces false positives. Here we present a visualization method that uses computational analysis to identify the location of the miRNA recognition site and its required length, together with a method to reduce false positives in target prediction. The sequence data of miRNAs were obtained from Lim et al., and the mRNA sequence data were obtained from the Ensembl database. The complementarity between each position of an mRNA sequence and a miRNA was considered. The length and the start position within the miRNA sequence of any complementary nucleotide stretch longer than 2 bases were recorded. We thereby obtained length vs. position matrices for each miRNA-mRNA pair. Matrices obtained with the same miRNA were added together, yielding general miRNA matrices. The microarray data were obtained from NCBI GEO (GSE2075). miRNA-specific gene expression profiles were linked to Ensembl transcripts based on the mapping of microarray oligonucleotides onto Ensembl mRNAs. To take both sensitivity and specificity into account, we used the maximum Matthews correlation coefficient as the score. The scores are plotted as dot colors arranged in the shape of the matrix. The plot indicates that stretches of length 8 at position 0 and of length 7 at position 1 of the miRNA are important for recognizing mRNAs. Next, to predict significantly down-regulated mRNAs, we picked the mRNAs that were most strongly down-regulated in the microarray experiment. These have many SeedCSs compared with ambiguously down-regulated mRNAs. We then picked mRNAs that have multiple SeedCSs, which reduced the false positives. mRNAs that have more than two Seed complementary sequences are highly down-regulated and can be easily predicted.

*miRNA target prediction: prediction of mRNAs that are degraded by miRNAs
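The length vs. position matrix and the Matthews correlation coefficient used as the score can be sketched in Python as follows. This is a simplified reading of the description above, not the actual implementation: only Watson-Crick pairs are counted (no G:U wobble), only maximal stretches are recorded, positions are 0-based from the 5’ end of the miRNA, and the sequences in the example are toy data. The names stretch_matrix and mcc are hypothetical.

```python
import math
from collections import Counter

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    return "".join(COMPLEMENT[b] for b in reversed(rna))

def stretch_matrix(mirna, mrna, min_len=3):
    """Count maximal perfectly paired stretches of at least min_len bases,
    keyed by (length, start position within the miRNA)."""
    target = reverse_complement(mirna)   # target[j] pairs with mirna[n - 1 - j]
    n = len(mirna)
    counts = Counter()
    for offset in range(-(n - 1), len(mrna)):      # every relative alignment
        run = 0
        for j in range(n + 1):                      # j == n flushes the last run
            i = offset + j
            paired = j < n and 0 <= i < len(mrna) and mrna[i] == target[j]
            if paired:
                run += 1
            else:
                if run >= min_len:
                    counts[(run, n - j)] += 1       # start position in the miRNA
                run = 0
    return counts

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Toy data: the mRNA contains a site complementary to miRNA positions 0-7.
mirna = "UAGCUUAUCA"
mrna = "CCCAUAAGCUACCC"
matrix = stretch_matrix(mirna, mrna)
print(matrix)  # Counter({(8, 0): 1})
```

Summing such counters over all mRNAs for one miRNA gives the general matrix for that miRNA; for each (length, position) cell, the presence of a stretch can then be compared with the observed down-regulation to obtain true and false positives and negatives, from which mcc gives the score for that cell.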


10. Sequence-Based Analyses of Biosynthesis Rate Limiting Factors in Wheat Germ Cell-Free System

Naoya Fujita, Motoaki Seki8, Kazuo Shinozaki8, Kengo Kinoshita, Tatsuya Sawasaki9, Yaeta Endo9, Kenta Nakai: 8RIKEN, 9Faculty of Eng., Ehime Univ.

The production of proteins is essential for their structural and functional analyses in the post-genome era. A wheat germ cell-free system is a helpful method for that purpose, as it is able to produce proteins from mRNAs of various sources using the translational machinery of wheat germ. Only one 5’-UTR was used, with the aim of obtaining similar biosynthesis yields regardless of the coding sequence. However, the range of yields observed is very wide. We therefore investigated the causes of the yield variation based on the protein sequences. The dataset used consists of 425 protein kinases from Arabidopsis thaliana. Four candidates were considered as yield variation factors: disordered regions, coiled coil structures, codon usage and mRNA secondary structure. Disordered regions and coiled coil structures were predicted using DISOPRED2 and COILS, respectively. A specific negative correlation between N-terminal disorder and yield was found: if a protein is highly disordered, its yield is low. However, the yield range of proteins with a lower disorder percentage remains wide. Focusing on the proteins with less than 20% disorder and considering coiled coil structures, we found that if a protein kinase has coiled coils, its biosynthesis yield is low. mRNA structure also relates to the yield decrease, consistent with what has been observed for in vivo gene expression in E. coli. Codon usage has no influence in our dataset, although this lack of effect may be restricted to the protein kinase family. We interpreted the disorder and coiled coil factors and proposed an entanglement model in which potential interactions and entanglements between neighboring synthesized polypeptides could occur, resulting in lower protein production. Further investigations regarding biosynthesis yields of other protein families as well as protein kinases from other organisms are currently underway.

11. Construction and Analyses of A Non-interacting Sites Database for Protein-protein Interaction

Miho Higurashi, Kenta Nakai, and Kengo Kinoshita

Many cellular events are regulated through protein-protein interactions. The recent growth in the number of PDB entries enables us to reveal characteristics of protein-protein interacting sites from the viewpoint of structural genomics. In past studies, protein-protein interacting sites were compared with the whole protein surface to extract their features. However, the features of protein-protein interacting sites may be obscured in such comparisons because the whole surface itself contains interacting sites. To extract the characteristics of protein-protein interacting sites, they should instead be compared with non-interacting sites. By referring to the protein-protein interacting sites of homologous proteins in PDB, we constructed a database of sites with which, as far as we know, nothing interacts.

12. Modeling Tertiary Structure of Complementarity-Determining Region of Antibodies

Toru Hosokawa, Kenta Nakai, and Kengo Kinoshita

The tertiary structure of a protein is important for predicting protein function and achieving rational drug design. Therefore, computational methods for protein tertiary structure prediction need to be established. We focused on improving the prediction accuracy for the complementarity-determining regions (CDRs) of antibodies. It has been reported that CDRs can be modeled by using templates from other antibodies. However, precise prediction of the position of the CDR and selection of an appropriate template for the target CDR are necessary. We developed a prediction method which uses a hidden Markov model and achieved almost 100% prediction accuracy. In previous works, several relations between the amino acid sequence and the canonical structure (called “rules”) were suggested. While validating these rules, we found that for 35% of CDR structures a template within 1 Å RMSD cannot be found by the known rules. Therefore, we created new rules by clustering structures and sequence patterns. By implementing a new prediction method for localization of the CDR and for rule matching, we successfully automated the structure prediction process for antibodies and improved the overall accuracy of structure prediction from 2.5 Å to 1.9 Å RMSD. A web server which predicts the tertiary structure of antibodies from the amino acid sequence was built based on these results and an existing homology modeling method.
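The RMSD values quoted above measure how far a predicted structure lies from a reference. A minimal sketch of the calculation is given below, assuming the two coordinate sets are already superposed and list equivalent atoms in the same order; the report does not state which atoms or superposition procedure were used, so those details are left out.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z).

    Assumption: the structures are already superposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical two-atom example: every atom is displaced by 1 Å along z.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
deviation = rmsd(predicted, reference)
```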


13. Identification of The Ligand Binding Sites On The Molecular Surface of Proteins

Kengo Kinoshita and Haruki Nakamura10: 10Osaka Univ.

Identification of protein biochemical functions based on their three-dimensional structures is now required in the post-genome-sequencing era. Ligand binding is one of the major biochemical functions of proteins, and thus the identification of ligands and their binding sites is the starting point for function identification. Here we describe our trial of structure-based function prediction, based on similarity searches of molecular surfaces against the functional site database, eF-site.

14. PreDs: A Server for Predicting dsDNA-binding Site on Protein Molecular Surfaces

Yuko Tsuchiya10, Kengo Kinoshita, Haruki Nakamura10

PreDs is a WWW server that predicts the dsDNA-binding sites on protein molecular surfaces generated from atomic coordinates in PDB format. The prediction is done by evaluating the electrostatic potential, the local curvature and the global curvature on the surfaces. Results of the prediction can be interactively checked with our original surface viewer.

15. P-cats: prediction of catalytic residues in proteins from their tertiary structures.

Kengo Kinoshita and Motonori Ota5

P-cats is a web server that predicts the catalytic residues in proteins from the atomic coordinates. P-cats receives a coordinate file of the tertiary structure and sends out analytical results via e-mail. The reply contains a summary and two URLs to allow the user to examine the conserved residues: one for interactive images of the prediction results and the other for a graphical view of the multiple sequence alignment.

16. Analysis of the three-dimensional structure of the “C-X-G-X-C” motif in the CMGCC and CAGYC regions of α- and β-subunits of human chorionic gonadotropin: Importance of Glycine Residue (G) in the Motif

Kengo Kinoshita, Masami Kusunoki10, and Kiyoshi Miyai10

The “C-X-G-X-C” motif of the human glycoprotein hormones hTSH, hLH, hFSH and hCG is strictly conserved. These proteins form dimers consisting of an identical α-subunit and a β-subunit specific to each glycoprotein hormone. Some mutational studies have shown the importance of the Gly residue in the β-subunit. In this study, using the recently solved structure of hCG, we analyzed the role of the glycine residue in the α- and β-subunits by conformational energy calculation with a loop closure algorithm. As a result, in the α-subunit, only a Gly residue is allowed at the third site due to steric hindrance within the subunit. In the β-subunit, on the other hand, an Ala residue is also acceptable in the monomer structure, but Ala is forbidden when the dimer structure is formed. This different role of Gly in each subunit is a possible explanation for the importance of the Gly residue in the β-subunit, as in the case of TSH deficiency disease, which is caused by a mutation from Gly to Ala in the β-subunit.

Publications

Kato, K., Yamashita, R., Matoba, R., Monden, M., Noguchi, S., Takagi, T., and Nakai, K. Cancer gene expression database (CGED): a database for gene expression profiling with accompanying clinical information of human cancer tissues, Nucl. Acids Res., 33: D533-D536, 2005.

Makita, Y., De Hoon, M.J.L., Ogasawara, N., Miyano, S., and Nakai, K. Bayesian joint prediction of associated transcription factors in Bacillus subtilis, Pacific Symposium on Biocomputing 2005 (Altman et al. ed.), 507-518, World Scientific, 2005.

Poluliakh, N., Konno, M., Horton, P., and Nakai, K. Parameter landscape analysis for common motif discovery programs, in Eskin, E. & Workman, C. (eds.), Regulatory Genomics, RECOMB 2004 International Workshop, RRG 2004, San Diego, CA, USA, March 26-27, 2004, Revised Selected Papers. Lecture Notes in Computer Science 3318, pp. 79-87, Springer, 2005.

Yamashita, R., Suzuki, Y., Sugano, S., and Nakai, K. Genome-wide analysis reveals strong correlation between CpG islands with nearby transcription start sites of genes and their tissue-specificity, Gene, 350(2): 129-136, 2005.


Nakao, M., Barrero, R.A., Mukai, Y., Motono, C., Suwa, M., and Nakai, K. Large-scale analysis of human alternative protein isoforms: pattern classification and correlation with subcellular localization signals, Nucl. Acids Res., 33(8): 2355-2363, 2005.

De Hoon, M.J.L., Makita, Y., Nakai, K., and Miyano, S. Prediction of transcriptional terminators in Bacillus subtilis and related species, PLoS Comput. Biol., 1(3): e25, 2005.

Horton, P., Park, K.-J., Obayashi, T., and Nakai, K. Protein subcellular localization prediction with WoLF PSORT, Proc. APBC, in press.

Kimura, K., Watanabe, A., Suzuki, Y., Ota, T., Nishikawa, T., Yamashita, R., Yamamoto, J., Sekine, M., Tsuritani, K., Ishii, S., Sugiyama, T., Saito, K., Isono, Y., Irie, R., Kushida, N., Yoneyama, T., Otsuka, R., Kanda, K., Yokoi, T., Kondo, H., Wagatsuma, M., Murakawa, K., Ishida, S., Ishibashi, T., Takahashi-Fujii, A., Tanase, T., Nagai, K., Kikuchi, H., Nakai, K., Isogai, T., and Sugano, S. Diversification of transcriptional modulation: large-scale identification and characterization of putative alternative promoters of human genes, Genome Res., in press.

Sierro, N., Kusakabe, T., Park, K.-J., Yamashita, R., Kinoshita, K., and Nakai, K. DBTGR: a database of tunicate promoters and their regulatory elements, Nucl. Acids Res., in press.

Yamashita, R., Suzuki, Y., Wakaguri, H., Tsuritani, K., Nakai, K., and Sugano, S. DBTSS: Database of Human Transcription Start Sites, Progress Report 2006, Nucl. Acids Res., in press.

Kinoshita, K. and Nakamura, H. Identification of the ligand binding sites on the molecular surface of proteins. Protein Sci 14: 711-718, 2005.

Tsuchiya, Y., Kinoshita, K., Nakamura, H. PreDs: a server for predicting dsDNA-binding site on protein molecular surfaces. Bioinformatics 21: 1721-1723, 2005.

Kinoshita, K. and Ota, M. P-cats: prediction of catalytic residues in proteins from their tertiary structures. Bioinformatics 21: 3570-3571, 2005.

Kinoshita, K., Kusunoki, M., Miyai, K. Analysis of the three-dimensional structure of the “C-X-G-X-C” motif in the CMGCC and CAGYC regions of α- and β-subunits of human chorionic gonadotropin: Importance of Glycine Residue (G) in the Motif, Endocrine J., in press.

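Nakai, K. and Norizawa, K. Genome analysis and terahertz technology, in Omori, T. (supervising ed.), Terahertz Technology: Generation, Measurement, Applied Technology, and Prospects, NTS, pp. 405-415, 2005 (in Japanese).

Nakai, K. Genome-wide analysis of human gene transcription start sites, in Kohara, Sugano, Ogasawara, Takagi, Fujiyama, and Tsuji (eds.), From Genome to Life Systems, Tanpakushitsu Kakusan Koso (Protein, Nucleic Acid and Enzyme), special issue, 50: p. 2083, 2005 (in Japanese).

Kinoshita, K. Protein function prediction by protein informatics, Journal of the Crystallographic Society of Japan, 47: 341-347, 2005 (in Japanese).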


Human Genome Center

Laboratory of Biostatistics (Biostatistics Training Unit)

Professor Katsuhisa Horimoto, Ph.D.
Research Associate Sachiyo Aburatani, Ph.D.

The main projects of our laboratory are to reveal new biological meanings at the molecular level by various statistical approaches, and to train researchers in the appropriate use of statistical techniques. Our investigation focuses on devising methods for inferring associations between biological phenomena from gene expression profiles by statistical models. In addition, algebraic approaches are applied to some issues in theoretical biology.

1. Association Inference by Statistical Models

a. A graphical chain model for inferring regulatory systems network from gene expression profiles.

Sachiyo Aburatani, Shigeru Saito1, Katsuhisa Horimoto: 1INFOCOM CORP.

A procedure for graphical chain modeling has been designed for analyzing the expression profiles of genes that can be classified into several blocks in a natural order. Since gene expression profiles often share similar patterns, the genes within a block are grouped into clusters as a prerequisite for the modeling. Then, the clusters in the naturally ordered blocks are regarded as variables. Finally, the associations of the variables within and between blocks are inferred by covariance selection in graphical Gaussian modeling. The newly designed procedure for graphical chain modeling was applied to 619 expression profiles of cell-cycle related genes in yeast, which were selected from 792 genes experimentally identified as being transcribed in the order of four cell-cycle phases, G1, S, G2, and M. By applying the procedure, the 619 genes were classified into 50 clusters, and a chain graph was fitted for the 50 clusters in the four phases. By focusing on the clusters including transcription factors, characteristic relationships between the clusters emerged from the associations of the clusters within and between the four phases; one of the remarkable features is the distinctive relationships of the clusters between neighboring and non-neighboring phases. The merits and pitfalls of the graphical chain model are discussed in terms of its application to the field of molecular biology.

b. Algorithm for predicting co-expression genes by improvement of path consistency algorithm.

Shigeru Saito1, Sachiyo Aburatani, Katsuhisa Horimoto: 1INFOCOM CORP.

We designed a simple algorithm to predict co-expressed genes from expression profiles. The path consistency (PC) algorithm, which is one of the constraint-based methods for inferring a causal graph, is modified by considering the nature of actual expression profiles, and further improved by incorporating biological knowledge of operons. The algorithm was applied to the expression profiles of known operons and of all genes in Escherichia coli, and about 90% and 70% of known operons, respectively, were correctly detected with small errors. In addition, a reasonable operon candidate in terms of gene function was presented.

c. Orchestration of Gene Systems Inferred from Expression Profiles in Hepatocellular Carcinoma.

Sachiyo Aburatani, Shigeru Saito1, Masao Honda2, Shu-ichi Kaneko2, Katsuhisa Horimoto: 1INFOCOM CORP., 2Kanazawa Univ.

Hepatitis C virus (HCV) is the major etiologic agent of non-A non-B hepatitis and chronically infects about 170 million people worldwide. Many HCV carriers develop chronic hepatitis C (CH-C), which is finally complicated by hepatocellular carcinoma (HCC) arising in a liver with advanced-stage CH-C.

Here, we analyzed the gene expression profiles of HCC and its background liver with advanced-stage CH-C, by a statistical method recently devised to infer the gene systems network from gene expression profiles, based on the graphical Gaussian model.

Recently, we have developed an approach to infer a regulatory network, which is based on graphical Gaussian modeling (GGM). Our method provides a framework of gene regulatory relationships by inferring the relationships between clusters, and provides clues toward estimating the global relationships between genes on a genomic scale. We have also devised a procedure, named ASIAN (Automatic System for Inferring A Network), to apply GGM to gene expression profiles in combination with hierarchical clustering. In this study, the previous version of the ASIAN web server has been improved to facilitate its utilization.

The ASIAN system is composed of four parts: 1) the calculation of a correlation coefficient matrix for the raw data, 2) the hierarchical clustering, 3) the estimation of cluster boundaries, and 4) the application of GGM to the clusters. In GGM, the network is inferred by calculating a partial correlation coefficient matrix from the correlation coefficient matrix, and the partial correlation coefficient matrix can only be obtained if the correlation coefficient matrix is regular (i.e., invertible). Since gene expression profiles on a genomic scale often include many profiles sharing similar expression patterns, the correlation coefficient matrix is not always regular. Therefore, the first three parts, 1) to 3), are prerequisites for analyzing the redundant data, which include many similar patterns of expression profiles, by the last part, 4), the network inference by GGM. All calculations were performed with the ASIAN web site and “Auto Net Finder”, the PC version of ASIAN, from INFOCOM CORPORATION.
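The step from a correlation matrix to a partial correlation matrix can be sketched as below. This is a minimal pure-Python illustration, not the ASIAN code: it uses the standard identity that the partial correlation between variables i and j given all others is -w_ij / sqrt(w_ii * w_jj), where w is the inverse of the correlation matrix, and it raises an error when the matrix is not regular. The function names and the toy matrix are assumptions made for the example.

```python
import math


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting; raises if the matrix is singular."""
    n = len(matrix)
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular (not regular)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(corr):
    """Partial correlation of each pair of variables given all the others."""
    w = invert(corr)
    n = len(w)
    return [
        [1.0 if i == j else -w[i][j] / math.sqrt(w[i][i] * w[j][j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical cluster-level correlation matrix: clusters 1 and 2 correlate
# only through cluster 0 (0.25 = 0.5 * 0.5), so their partial correlation vanishes.
corr = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.25],
    [0.5, 0.25, 1.0],
]
pcor = partial_correlations(corr)
```

Two identical profiles give a correlation matrix such as `[[1, 1], [1, 1]]`, for which `invert` raises `ValueError`; this is the situation that the clustering steps 1) to 3) are meant to avoid.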

The expression profiles of 8516 genes were monitored in 17 HCC samples. By applying ASIAN to the profiles, a graph of the gene systems was inferred, and it provides a snapshot of the orchestration of the gene systems in HCC.

d. Causal inference of gene systems network in hepatocellular carcinoma progression by graphical chain model.

Sachiyo Aburatani, Shigeru Saito1, Masao Honda2, Shu-ichi Kaneko2, Katsuhisa Horimoto: 1INFOCOM CORP., 2Kanazawa Univ.

We analyzed the gene expression profiles of HCC and its background liver with advanced-stage CH-C, by a statistical method recently devised to infer the gene systems network from gene expression profiles, based on the graphical chain model (GCM).

The GCM infers causal relationships between variables that can be naturally grouped into blocks and ordered from prior knowledge. In a GCM, any direct association between two variables in the same block is assumed to be non-causal, and any direct association between two variables from different blocks is assumed to be potentially causal. Thus, the GCM is one of the suitable models for inferring the causal network between gene systems in distinct biological stages.
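The block rule described above can be illustrated with a short sketch. It only shows how already-inferred associations are interpreted given a block order; the inference of the associations themselves is not shown. The function name, cluster names and edges are hypothetical.

```python
def orient_edges(edges, block_of, block_order):
    """Split associations into non-causal (same block) and potentially causal (across blocks).

    edges:       iterable of (u, v) pairs with a direct association
    block_of:    mapping from variable name to block name
    block_order: block names listed from earliest to latest
    """
    rank = {block: i for i, block in enumerate(block_order)}
    undirected, directed = [], []
    for u, v in edges:
        rank_u, rank_v = rank[block_of[u]], rank[block_of[v]]
        if rank_u == rank_v:
            undirected.append(tuple(sorted((u, v))))  # same block: non-causal
        elif rank_u < rank_v:
            directed.append((u, v))  # earlier block -> later block
        else:
            directed.append((v, u))
    return undirected, directed


# Hypothetical gene systems in two ordered stages.
block_of = {"chc_1": "CH-C", "chc_2": "CH-C", "hcc_1": "HCC"}
edges = [("chc_2", "chc_1"), ("hcc_1", "chc_1")]
undirected, directed = orient_edges(edges, block_of, ["CH-C", "HCC"])
```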

The procedure is as follows: 1) The genes that are expressed characteristically in distinct stages are selected from all genes; 2) In each stage, the profiles of the genes are subjected to a clustering analysis, and the gene groups (systems) are defined in terms of biological function; 3) The gene systems are subjected to GCM to infer the causal network between gene systems in different stages; 4) The causal network between gene systems is evaluated against biological knowledge.

The expression profiles of 8516 genes were monitored in 27 CH-C samples and 17 HCC samples. By applying GCM to the profile data, a causal graph for the gene systems in CH-C and HCC progression was inferred.


f. ASIAN: a web server for inferring a regulatory network framework from gene expression profiles.

Shigeru Saito1, Kousuke Goto1, Sachiyo Aburatani, Hiroyuki Toh2, Katsuhisa Horimoto: 1INFOCOM CORP., 2Kyushu University

The standard workflow in gene expression profile analysis to identify gene function is clustering by various metrics and techniques, followed by further analyses such as sequence analyses of upstream regions. A more challenging analysis is the inference of a gene regulatory network, and computational methods to deduce gene regulatory networks have been intensively developed. Here, we describe our web server for inferring a framework of regulatory networks from a large number of gene expression profiles, based on graphical Gaussian modeling (GGM) in combination with hierarchical clustering (http://eureka.ims.u-tokyo.ac.jp/asian). GGM is based on a simple mathematical structure, namely the calculation of the inverse of the correlation coefficient matrix between variables, and therefore our server can analyze a wide variety of data within a reasonable computational time. The server allows users to input expression profiles, and it outputs the dendrogram of genes produced by several hierarchical clustering techniques, the cluster number estimated by a stopping rule for hierarchical clustering, and the network between the clusters inferred by GGM, with the respective graphical presentations. Thus, the ASIAN web server provides an initial basis for inferring regulatory relationships, in that the clustering serves as the first step toward identifying gene function.

2. Algebraic Approaches

a. Symbolic-numeric optimization for biological kinetics by quantifier elimination.

Shigeo Orii1, Hazuhiro Anai1, Katsuhisa Horimoto: 1FUJITSU LTD.

We introduce a new approach to optimization for biological kinetics that deals with numerical data by symbolic quantifier elimination (QE). In this study, we illustrate the feasibility of the symbolic-numeric method in comparison with previous numerical methods.

The symbolic-numeric approach is applied to an optimization problem of estimating five reaction parameters so that a signal simulated with the five parameters fits an observed one, in a model described by ODEs for the mechanism of irreversible inhibition of HIV proteinase.

The reaction parameters k″ (k″22, k″3, k″42, k″52, k″6) with minimum SSq are of the same magnitude as those obtained by the numerical optimization methods in previous studies. Furthermore, the present method has the following merits: 1) The model parameters ‾k′(i, j) and ‾k″ are estimated with a few points (e.g. two points) of the observed signal. 2) Feasible ranges of ‾k′(i, j) and ‾k″(i, J) are selected because an infeasible region can be confirmed exactly by the result “false” obtained by QE. 3) Our method enables us to estimate exactly how large the uncertainties of numerical simulation and observation should be so that the constraints become feasible. 4) The symbolic-numeric approach provides feasible ranges of reaction parameters.

b. On conditions for morphogenetic diversity of multicellular organisms.

Hiroshi Yoshida, Shigeo Orii1, Hazuhiro Anai1, Katsuhisa Horimoto: 1FUJITSU LTD.

In a multicellular organism, a single cell (an egg) or a group of cells develops into a certain pattern with a variety of cell types. The variety of cell types is created through cell differentiation; the differentiation starts from an initial type, and then the initial type changes into several intermediate types before differentiating into the final type. The theoretical study of cell differentiation and morphogenesis was pioneered by Alan Turing, who showed that a reaction-diffusion system can produce an inhomogeneous stable pattern. Turing’s theory provides a dynamical-systems basis for morphogenesis and the potentiality of cell differentiation. Embryogenesis with increasing cell numbers was, however, not studied. By considering Turing’s study and intracellular dynamics, together with the cell division process that increases cell numbers, Kaneko and Yomo have proposed isologous diversification. This allows spontaneous cell differentiation through cell division processes and cell-cell interactions. These studies provide a basis for the morphogenetic diversity of multicellular organisms. However, the relevance of proliferation rates and transition rates between cell types to morphogenetic diversity has not been studied. In this paper, we address this question by constructing a model based on a probabilistic Lindenmayer system with interaction and by using quantifier elimination (abbreviated to QE).
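To make the notion of a probabilistic Lindenmayer system concrete, the sketch below rewrites every cell in parallel according to probabilistic rules for proliferation and type transition. It is a simplified illustration only: cell-cell interaction and the quantifier elimination analysis of the actual study are omitted, and the cell types, rule table and function names are hypothetical.

```python
import random


def step(cells, rules, rng):
    """Rewrite every cell once, in parallel.

    rules maps a cell type to a list of (probability, successors), where
    successors is a tuple of cell types; types without a rule are kept as-is.
    """
    next_cells = []
    for cell in cells:
        options = rules.get(cell)
        if not options:
            next_cells.append(cell)
            continue
        weights = [p for p, _ in options]
        successors = [s for _, s in options]
        next_cells.extend(rng.choices(successors, weights=weights, k=1)[0])
    return next_cells


def simulate(initial, rules, steps, seed=0):
    """Apply `step` repeatedly, using a seeded generator for reproducibility."""
    rng = random.Random(seed)
    cells = list(initial)
    for _ in range(steps):
        cells = step(cells, rules, rng)
    return cells


# Hypothetical rules: a stem-like type "A" either divides into two "A" cells
# (proliferation) or turns into a differentiated type "B" (transition).
stochastic_rules = {"A": [(0.7, ("A", "A")), (0.3, ("B",))]}
population = simulate(["A"], stochastic_rules, steps=5, seed=1)

# Deterministic special case: "A" always divides asymmetrically into "A" and "B".
asymmetric_rules = {"A": [(1.0, ("A", "B"))]}
lineage = simulate(["A"], asymmetric_rules, steps=2)
```

Varying the probabilities attached to the division and transition rules corresponds to varying the proliferation and transition rates whose effect on cell-type diversity is the question raised in the text.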


Publications

Aburatani, S., Goto, K., Saito, S., Toh, H. and Horimoto, K. ASIAN: A Web Server for Inferring a Regulatory Network Framework from Gene Expression Profiles. Nucleic Acids Res., 33, W659-W664, 2005.

Aburatani, S., Sakai, H., Murakami, H. and Horimoto, K. Elucidation of the relationships among the LexA-regulated genes in SOS response. Proceedings of the 9th World Multi-Conference on Systemics, Cybernetics and Informatics, 8, 1-5, 2005.

Aburatani, S., Sakai, H., Murakami, H. and Horimoto, K. Elucidation of the relationships among the LexA-regulated genes in SOS response. Genome Informatics, 16, 95-105, 2005.

Aburatani, S., Saito, S., Toh, H. and Horimoto, K. Graphical Models for Gene Expression Analyses. Algebraic Biology 2005, 1, 157-171, 2005.

Aburatani, S., Saito, S., Toh, H. and Horimoto, K. A Graphical Chain Modeling Approach for Analyzing Gene Expression Profiles. Stat. Method. (in press).

Aburatani, S. and Horimoto, K. Statistical analysis of the relationships between LexA-regulated genes from expression profiles. Res. Commun. Biochem. Cell & Mol. Biol. (in press).

Dukka Bahadur K.C., Tomita, E., Suzuki, J., Horimoto, K. and Akutsu, T. Protein Threading with Profiles and Distance Constraints Using Clique Based Algorithms. J. Bioinfo. Comput. Biol. (in press).

Murakami, H., Sakai, H., Aburatani, S. and Horimoto, K. Relationship between Segmental Duplications and Repeat Sequences in Human Chromosome 7. Genome Informatics, 16, 13-21, 2005.

Murakami, H., Sakai, H., Aburatani, S. and Horimoto, K. Relationship between Segmental Duplications and Repeat Sequences in Human Chromosome 7. Proceedings of the 9th World Multi-Conference on Systemics, Cybernetics and Informatics, 8, 10-14, 2005.

Orii, S., Anai, H. and Horimoto, K. Symbolic-Numeric Optimization by Quantifier Elimination: an Application to Biological Kinetic Model. Proceedings of the 9th World Multi-Conference on Systemics, Cybernetics and Informatics, 8, 15-20, 2005.

Orii, S., Anai, H. and Horimoto, K. Symbolic-numeric estimation of parameters in biochemical models by quantifier elimination. Bioinfo 2005, 272-277, 2005.

Orii, S., Horimoto, K. and Anai, H. A New Approach for Symbolic-Numeric Optimization in Biological Kinetic Models. Algebraic Biology 2005, 1, 85-95, 2005.

Sakai, H., Murakami, H., Aburatani, S., Horimoto, K. and Kanehisa, M. Bayesian Approach for Sequence Pattern Search in Tissue Specific Alternative Splicing. Proceedings of the 9th World Multi-Conference on Systemics, Cybernetics and Informatics, 8, 25-30, 2005.

Saito, S., Aburatani, S. and Horimoto, K. Network Inference Tool on Personal Computer. Proceedings of the 9th World Multi-Conference on Systemics, Cybernetics and Informatics, 8, 21-24, 2005.

Sato, K. and Horimoto, K. Comparison between Profile-Profile Methods based on Hidden Markov Models and Multiple Alignments. Proceedings of the 9th World Multi-Conference on Systemics, Cybernetics and Informatics, 8, 31-37, 2005.

Yoshida, H., Anai, H., Orii, S. and Horimoto, K. Inquiry into conditions for cell-type diversity of multicellular organisms by quantifier elimination. Algebraic Biology 2005, 1, 105-113, 2005.

