
Research Article

The Prediction of Drug-Disease Correlation Based on Gene Expression Data

Hui Cui (1,2,3), Menghuan Zhang (2,3), Qingmin Yang (3,4), Xiangyi Li (3), Michael Liebman (3,5), Ying Yu (6), and Lu Xie (3)

1 School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
2 Institute for Nutritional Sciences, Shanghai Institutes for Biological Sciences, University of Chinese Academy of Sciences, Shanghai 200031, China
3 Shanghai Center for Bioinformation Technology, Shanghai Academy of Science and Technology, Shanghai 201203, China
4 College of Food Science and Technology, Shanghai Ocean University, No. 999 Hu Cheng Huan Road, Shanghai 201306, China
5 IPQ Analytics, LLC, Strategic Medicine, Philadelphia, PA, USA
6 Department of Pharmacology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 30007, China

Correspondence should be addressed to Ying Yu (yuying@sibs.ac.cn) and Lu Xie (luxiex2017@outlook.com)

Received 11 November 2017; Revised 18 January 2018; Accepted 11 February 2018; Published 25 March 2018

Academic Editor: Jialiang Yang

Copyright © 2018 Hui Cui et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The explosive growth of high-throughput experimental methods and the resulting data yields both opportunity and challenge for selecting the correct drug to treat both a specific patient and their individual disease. Ideally, it would be useful and efficient if computational approaches could be applied to help achieve optimal drug-patient-disease matching, but current efforts have met with limited success. Current approaches have primarily utilized the measurable effect of a specific drug on target tissue or cell lines to identify the potential biological effect of such treatment. While these efforts have met with some level of success, there exists much opportunity for improvement. This specifically follows the observation that, for many diseases, in light of actual patient response, there is an increasing need for treatment with combinations of drugs rather than single-drug therapies. Only a few previous studies have yielded computational approaches for predicting the synergy of drug combinations by analyzing high-throughput molecular datasets. However, these computational approaches focused on the characteristics of the drug itself without fully accounting for disease factors. Here, we propose an algorithm to specifically predict synergistic effects of drug combinations on various diseases by integrating the data characteristics of disease-related gene expression profiles with drug-treated gene expression profiles. We have demonstrated its utility through application to transcriptome data, including microarray and RNA-Seq data, and the drug-disease prediction results were validated using existing publications and drug databases. The method is also applicable to other quantitative profiling data, such as proteomics data. We also provide an interactive web interface to allow our Prediction of Drug-Disease method to be readily applied to user data. While our studies represent a preliminary exploration of this critical problem, we believe that the algorithm can provide the basis for further refinement towards addressing a large clinical need.

1 Introduction

As we know, many diseases, for example, most cancers and diabetes, are not resolved by treatment with a single drug. At the time of diagnosis and staging, many aberrant genes can be observed, either involving mutation or modification or exhibiting altered levels of expression, yielding perturbations to signaling pathways. This is the reality of complex diseases, which complicates their treatment, particularly because of the difficulty in distinguishing potential driver genes from passenger genes. Therefore, the traditional "one drug-one target" therapeutic approach often shows limited efficacy because of inappropriate targeting, development of adverse events, and potential resistance [1]. As a result, it has become necessary to develop combination drug therapies [2].

Combined drug therapy typically involves administering two or more drugs simultaneously or sequentially. Within the past two decades, combination therapies have been used successfully in clinical experiments and have attracted tremendous attention as promising treatments for complex disorders, especially those with multifactorial pathogenic mechanisms [3]. For example, the combination of fluticasone propionate and salmeterol provides better asthma control than increasing the dose of either single drug alone, while simultaneously reducing the frequency of exacerbations [4]. In the past 5 years, an increasing number of combination drugs have been marketed as commercial products with a fixed dosage of each component and with approval of the Food and Drug Administration (FDA), especially for complex diseases such as type II diabetes, HIV infections, and cancer. In cancer therapy specifically, the first combination was approved by the FDA in January 2014 to treat melanoma with BRAF V600E or V600K mutations [2]. Currently, approximately 50 combination therapies without fixed component dosage have been referred by the FDA for the treatment of different cancer subtypes.

Hindawi, BioMed Research International, Volume 2018, Article ID 4028473, 6 pages, https://doi.org/10.1155/2018/4028473

Pharmacologically, a drug combination may produce a synergistic, additive, or antagonistic (or even suppressive) effect when the combined effect is greater than, equal to, or less than the effect expected from the individual drugs, respectively [5]. Synergistic effects are typically the most desirable because of enhanced efficacy, the potential for decreasing dosage with an equal or increased level of efficacy, or delayed development of drug resistance [6]. Therefore, identification of synergistic agents presents a significant opportunity to better deal with complex diseases, even though it is a highly challenging task [7]. The synergy of drugs can be assayed by testing the inhibition of tumor cell growth by individual drugs and their combinations in vitro, followed by a mathematical formulation such as Loewe additivity or Bliss independence [1, 8]. However, it is not practical to test the synergistic effect of all possible drug combinations experimentally because of the large number of drugs approved by the FDA. Computational methods for predicting the effects of drug combinations can therefore play an essential role in the systematic screening of combinatorial treatment regimens [9].
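As a concrete illustration of Bliss independence, one of the two reference models named above (it is not part of the algorithm proposed in this paper), the following minimal Python sketch computes the expected combined effect of two independently acting drugs, E_AB = E_A + E_B − E_A·E_B. It assumes that effects are expressed as fractional growth inhibition between 0 and 1; the function names are hypothetical.

```python
def bliss_expected(effect_a: float, effect_b: float) -> float:
    """Expected combined inhibition if drugs A and B act independently (Bliss)."""
    for e in (effect_a, effect_b):
        # Assumption: effects are fractional inhibitions in [0, 1].
        if not 0.0 <= e <= 1.0:
            raise ValueError("effects must be fractional inhibitions in [0, 1]")
    return effect_a + effect_b - effect_a * effect_b


def bliss_excess(observed_ab: float, effect_a: float, effect_b: float) -> float:
    """Observed minus expected inhibition: > 0 suggests synergy, < 0 antagonism."""
    return observed_ab - bliss_expected(effect_a, effect_b)


if __name__ == "__main__":
    # Two drugs that each inhibit growth by 50%: independence predicts 75%.
    print(bliss_expected(0.5, 0.5))
    # An observed 90% inhibition exceeds that expectation by about 0.15.
    print(round(bliss_excess(0.9, 0.5, 0.5), 2))
```

Note that the Bliss expectation is not the plain sum of the two effects, which is why "expected effect" rather than "sum" is the appropriate comparison point.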

Previous studies have proposed a handful of computational approaches that analyze high-throughput molecular datasets to predict the synergy of drug combinations. Recently, Zhao et al. introduced a model to predict the efficacies of drug combinations by integrating molecular and pharmacological data, but its dependence on feature patterns specifically enriched in approved drug combinations severely limited its potential application [10]. Similarly, Wu et al. proposed a network-analysis-based model that used gene expression profiles following individual treatments to predict the gene expression changes induced by drug combinations, which were then used to estimate the effectiveness of the combinations [7]. Another model, the enhanced Petri-Net model, was established to recognize the synergism of drug combinations and provided informative insight into the mechanisms of drug action [11], but its requirement of a gene expression profile for every drug pair limited its application.

However, these computational approaches consider only the characteristics of the drug itself, without taking into account an equivalent characterization of the disease. A drug that is effective in a specified cell line may not be effective against the actual disease as it presents in patients. To account for this, here we propose an algorithm to specifically predict synergistic effects of drug combinations on various diseases by integrating the data characteristics of disease-related gene expression profiles with drug-treated gene expression profiles. We have demonstrated its utility through application to transcriptome data, including microarray and RNA-Seq data, and the drug-disease prediction results were validated using existing publications and drug databases. The method is also applicable to other quantitative profiling data, such as proteomics data. We also provide an interactive web interface (https://www.scbit.org/PEDD) to allow our Prediction of Drug-Disease method to be readily applied to user data.

2 Methods

In this research, we developed a disease-drug prediction algorithm using transcriptome data. We describe both the data aggregation and our algorithm in detail below.

2.1 Data Aggregation. First, gene expression data of drug-treated samples and disease-related gene expression datasets were identified and qualified from the literature and public domain databases.

2.1.1 Gene Expression Data following Drug Treatment. The GSE51068 dataset (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE51068), which contains gene expression data of 282 drug-treated samples, was downloaded from the GEO database. We selected the high-throughput expression profiles of the OCI-Ly3 cell line treated with 14 different known drugs at 2 different concentrations and profiled at 6, 12, and 24 hours after treatment. For our initial study, the profiles obtained after 6 hours of treatment were chosen. Summary information about the 14 known drugs is shown in Table S1.

2.1.2 Disease-Related Gene Expression Data. We developed our method so that it can be applied not only to microarray data but also to RNA-Seq data. Thus, both data types were identified and collected.

We established the following requirements for microarray data in this study: the experimental group consists of human disease samples, the control group consists of nondisease samples, and the number of experimental samples is greater than 50. Six microarray datasets (GSE9476, GSE33615, GSE22529, GSE26049, GSE19429, and GSE47552), covering 9 blood cell and bone marrow related malignancies and diseases, were selected from the GEO database (https://www.ncbi.nlm.nih.gov/geo) (Table S2).
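The three inclusion requirements above can be written as a single predicate. The sketch below is illustrative only: the `MicroarrayDataset` record and its fields are hypothetical (they are not a GEO API), and it assumes that "greater than 50" is a strict inequality on the number of disease samples.

```python
from dataclasses import dataclass


@dataclass
class MicroarrayDataset:
    """Hypothetical summary record for a candidate GEO series."""
    accession: str
    organism: str
    n_disease_samples: int      # experimental group: human disease samples
    n_nondisease_samples: int   # control group: nondisease samples


def meets_criteria(ds: MicroarrayDataset, min_disease_samples: int = 50) -> bool:
    """Human disease vs. nondisease design with more than 50 disease samples."""
    return (
        ds.organism == "Homo sapiens"
        and ds.n_nondisease_samples > 0
        # Assumption: strict "greater than", as stated in the text.
        and ds.n_disease_samples > min_disease_samples
    )
```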

The additional disease-related gene expression data consist of RNA-Seq data. Here, four cancer types were chosen: breast cancer, liver cancer, lung adenocarcinoma, and lung squamous cell carcinoma. We extracted these cancer-related RNA-Seq data, which are provided by TCGA, from UCSC Xena (https://xenabrowser.net/datapages/?host=https://tcga.xenahubs.net).


Figure 1: The algorithm flow. (Diagram labels: disease-related, normal-related, Drug 2-treated, and nontreated transcriptome data; R limma; t-test/z-test; one-sided Pearson method; scoring method; "disease-genes," "drug 1-genes," "drug 2-genes," "combined drugs-genes," "drugs-genes," and "diseases-drugs.")

2.2 Algorithm Design and Implementation. Our goal is to predict the effects of drugs on various diseases when used in combination. The algorithm is implemented in the following steps (Figure 1).

Step 1. Differentially expressed genes (DEGs) were identified within the disease-related gene expression dataset. For microarray data, the "limma" package in R was used to identify DEGs at a Benjamini-Hochberg adjusted p value threshold of 0.01. For RNA-Seq data, the "limma" package in R was also used, with a Benjamini-Hochberg adjusted p value threshold of 0.05. Additionally, for both microarray and RNA-Seq data, a gene was required to show at least a twofold change in expression (higher or lower) in the experimental group relative to the control group.
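The thresholding in Step 1 can be sketched as follows. This is a minimal Python illustration, not the authors' code: it assumes that per-gene raw p values and log2 fold changes have already been computed (for example, by limma), applies the Benjamini-Hochberg adjustment itself, and interprets "at least twofold higher or lower" as |log2 fold change| ≥ 1. The function names and the strict "<" comparison with the cutoff are assumptions.

```python
import numpy as np


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    # p_(r) * n / r for the sorted p values
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity: running minimum from the largest rank downward.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


def select_degs(genes, pvals, log2fc, alpha):
    """Genes passing the adjusted-p cutoff (assumed strict) and |log2FC| >= 1."""
    adj = bh_adjust(pvals)
    keep = (adj < alpha) & (np.abs(np.asarray(log2fc, dtype=float)) >= 1.0)
    return [g for g, k in zip(genes, keep) if k]


# Hypothetical toy input; the text uses alpha = 0.01 (microarray) and 0.05 (RNA-Seq).
genes = ["g1", "g2", "g3", "g4"]
print(select_degs(genes, [0.01, 0.04, 0.03, 0.005], [1.5, -2.0, 0.5, -1.0], alpha=0.05))
# ['g1', 'g2', 'g4']
```

Here g3 passes the adjusted-p cutoff but is dropped because its fold change is below twofold.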

Step 2. DEGs were identified for the 14 drugs. A t-test was performed to obtain the observed test statistics for the genes in the drug-treated group compared with the control group. The observed test statistics were then converted into z-scores:

$$z_i = \Phi^{-1}\left(P(t_i)\right) \qquad (1)$$

where $t_i$ denotes the observed test statistic for gene $i$, $P(t_i)$ is its cumulative probability under the null distribution of the test statistic, and $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution. If the z-score is greater than 1.96, the gene is considered upregulated after drug treatment; if the z-score is lower than −1.96, the gene is considered downregulated after drug treatment.
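Step 2 can be sketched in Python with SciPy as below. The text does not specify which t-test variant or degrees of freedom were used; this sketch assumes a pooled-variance two-sample t-test (df = n_treated + n_control − 2), and the function names are hypothetical. Equation (1) is evaluated through the upper tail of |t| so that very large statistics do not round P(t) to exactly 1; because both the t and normal distributions are symmetric, this gives the same value.

```python
import numpy as np
from scipy import stats


def t_to_z(t_stat, df):
    """Eq. (1): z = Phi^{-1}(P(t)), with P the CDF of Student's t on df degrees of freedom."""
    t_stat = np.asarray(t_stat, dtype=float)
    # Same value as norm.ppf(t.cdf(t, df)), computed via the upper tail for precision.
    return np.sign(t_stat) * stats.norm.isf(stats.t.sf(np.abs(t_stat), df))


def gene_z_scores(treated, control):
    """Per-gene z-scores; rows = genes, columns = samples.

    Assumption: pooled-variance (Student) two-sample t-test.
    """
    t_stat = stats.ttest_ind(treated, control, axis=1).statistic
    df = treated.shape[1] + control.shape[1] - 2
    return t_to_z(t_stat, df)


def regulation_call(z, cutoff=1.96):
    """+1 = upregulated, -1 = downregulated, 0 = not a DEG after drug treatment."""
    z = np.asarray(z, dtype=float)
    return np.where(z > cutoff, 1, np.where(z < -cutoff, -1, 0))
```

With only a few replicates per group, the t distribution has heavy tails, so the conversion to z shrinks the statistic (for example, t ≈ 2.23 on 10 degrees of freedom maps to z ≈ 1.96).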

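Table 1: The matching coefficient $I$ (rows: direction of the gene in the disease; columns: direction of the gene after drug treatment).

| Disease \ Drug      | Upregulated gene | Downregulated gene |
|---------------------|------------------|--------------------|
| Upregulated gene    | −1               | +1                 |
| Downregulated gene  | +1               | −1                 |

Step 3. DEGs were identified for the 91 drug combinations. The 14 drugs generate 91 unique pairwise combinations ($C_{14}^{2} = 91$). To compute the combined effect of two drugs on each gene, a one-sided Pearson's method was used to combine the z-scores of the two drugs: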

$$p_i^{s} = P\left(\chi^{2}_{4} < -2 \sum_{j=1}^{2} \ln\left(1 - \Phi(z_{ij})\right)\right) \qquad (2)$$

where $z_{ij}$ ($j = 1, 2$) denotes the z-score of gene $i$ for each of the two drugs.

The combined z-score was then calculated:

$$z'_i = \Phi^{-1}\left(p_i^{s}\right) \qquad (3)$$
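Equations (2) and (3), together with the enumeration of the 91 drug pairs, can be sketched with SciPy as follows. This is an illustrative reimplementation under the stated formulas, not the authors' code; `combine_z_pearson` is a hypothetical name. It uses `norm.logsf` for ln(1 − Φ(z)) and evaluates Eq. (3) on whichever chi-square tail is smaller, so extreme z-scores do not collapse to ±infinity through rounding.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def combine_z_pearson(z_drug1, z_drug2):
    """Combined per-gene z-score of two drugs following Eqs. (2) and (3)."""
    z = np.vstack([np.asarray(z_drug1, dtype=float), np.asarray(z_drug2, dtype=float)])
    # -2 * sum_j ln(1 - Phi(z_ij)); norm.logsf(z) = ln(1 - Phi(z)) without underflow
    statistic = -2.0 * stats.norm.logsf(z).sum(axis=0)
    df = 2 * z.shape[0]  # 4 degrees of freedom for two drugs
    lower = stats.chi2.cdf(statistic, df)  # p^s of Eq. (2)
    upper = stats.chi2.sf(statistic, df)   # 1 - p^s, kept for numerical precision
    # Eq. (3): z' = Phi^{-1}(p^s), evaluated on the smaller tail
    return np.where(lower < 0.5, stats.norm.ppf(lower), stats.norm.isf(upper))


# All unordered pairs of 14 drugs: C(14, 2) = 91 combinations.
drug_pairs = list(combinations(range(14), 2))
print(len(drug_pairs))  # 91
```

With this one-sided construction, concordant z-scores reinforce each other (two z-scores of 3 combine to roughly 4), whereas two null z-scores of 0 combine to a slightly negative value (about −0.24), so the combined scale is not exactly symmetric around zero.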

Step 4. Drug-related and disease-related DEGs were matched by comparing the direction of change of each shared gene (the matching coefficient in Table 1). The p value of the "drug-disease" relationship is then calculated using the following formula:

$$p_k = \Phi\left(\frac{\sum_{i \in k} \left|\Phi^{-1}\left(p_i^{s}\right)\right| I_i}{\sqrt{\sum_{i \in k} \left|I_i\right|}}\right) \qquad (4)$$

where the sums run over the $k$ genes that can be matched between the drug and the disease, and $I_i$ is the matching coefficient of gene $i$ (Table 1). If a gene is upregulated in the disease and downregulated after drug treatment, $I$ is +1. If a gene is downregulated in the disease and upregulated after drug treatment, $I$ is also +1. Otherwise, $I$ is −1.
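A minimal Python sketch of the matching step and Eq. (4) follows. Because $\Phi^{-1}(p_i^{s})$ is simply the combined z-score $z'_i$ of Eq. (3), the function takes drug z-scores directly (single-drug $z_i$ or combined $z'_i$). Two details are assumptions not fixed by the text and are marked in the code: a gene counts as "matched" only if it is a disease DEG and also passes the |z| > 1.96 drug cutoff, and an empty match returns a neutral $p_k$ of 0.5. Function names are hypothetical.

```python
import math

from scipy.stats import norm


def matching_coefficient(disease_dir, drug_dir):
    """Table 1: +1 if the drug reverses the disease direction, otherwise -1."""
    return 1 if disease_dir == -drug_dir else -1


def drug_disease_p(disease_dirs, drug_z, cutoff=1.96):
    """Eq. (4). disease_dirs: gene -> +1/-1; drug_z: gene -> z (or z') score.

    Returns (p_k, k).
    """
    weighted_sum = 0.0
    k = 0
    for gene, disease_dir in disease_dirs.items():
        z = drug_z.get(gene)
        # Assumption: a gene is matched only if it is also a drug DEG (|z| > cutoff).
        if z is None or abs(z) <= cutoff:
            continue
        drug_dir = 1 if z > 0 else -1
        weighted_sum += abs(z) * matching_coefficient(disease_dir, drug_dir)
        k += 1
    if k == 0:
        return 0.5, 0  # assumption: no matched genes gives a neutral p_k
    # |I| = 1 for every matched gene, so the denominator of Eq. (4) is sqrt(k).
    return norm.cdf(weighted_sum / math.sqrt(k)), k


# Hypothetical example: the drug reverses genes A and B; C and D are not matched.
disease = {"A": 1, "B": -1, "C": 1}
drug = {"A": -3.0, "B": 2.5, "C": 0.5, "D": 3.0}
p_k, k = drug_disease_p(disease, drug)
print(k)  # 2
```

Since $|I_i| = 1$, Eq. (4) has the form of a Stouffer-type combination: reversed genes push $p_k$ toward 1 and genes changed in the same direction as the disease push it toward 0.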


Figure 2: The relationship between drug and disease using microarray data (a) and RNA-Seq data (b). Drugs are represented by triangles and diseases by circles. The thickness of each linking edge is directly related to the magnitude of the score between the drug and the disease.

Step 5. An indicator score was calculated from the matching results to evaluate the effect of the drug on the disease, as follows:

$$\text{Score} = \Phi^{-1}\left(p_k\right) \times \frac{k}{N} \qquad (5)$$

where $k$ represents the number of genes that can be matched between the drug and the disease, $N$ is the total number of DEGs in each disease, and $p_k$ is the value calculated in Step 4.
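Equation (5) is a one-line computation; the sketch below adds input checks. Returning 0 when no genes are matched is an assumption made here for illustration, and `indicator_score` is a hypothetical name.

```python
from scipy.stats import norm


def indicator_score(p_k, k, n_disease_degs):
    """Eq. (5): Score = Phi^{-1}(p_k) * k / N."""
    if n_disease_degs <= 0:
        raise ValueError("N, the number of disease DEGs, must be positive")
    if k == 0:
        return 0.0  # assumption: no matched genes means no evidence either way
    return norm.ppf(p_k) * k / n_disease_degs


# Hypothetical example: p_k = Phi(2.0), 3 matched genes out of 10 disease DEGs.
print(round(indicator_score(norm.cdf(2.0), 3, 10), 6))  # 0.6
```

The factor k/N scales the z-value from Step 4 by the fraction of the disease signature that the drug covers, so a strong but narrow match ranks below an equally strong match spanning more of the disease DEGs.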

2.3 Literature and Database Validation. For any two drugs (A and B) and any specific disease, three scores can be generated, indicating the relationship between drug A and the disease, between drug B and the disease, and between the A + B drug combination and the disease. Here, we considered the option with the highest score to be the most effective. In addition, the score must be greater than 0, suggesting that the drug has an enhanced treatment effect on the disease. If the score of the drug combination is higher than that of either single drug, we define the drug combination to be more effective. We excluded drugs that were not in DrugBank. Finally, the results were validated by reviewing both the published literature and drug-related databases, including DrugBank (https://www.drugbank.ca/releases/latest) [12], FDA (https://www.fda.gov), DCDB (http://www.cls.zju.edu.cn/dcdb) [13], and PubMed (https://www.ncbi.nlm.nih.gov/pubmed).
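The selection rule above can be expressed as a small function over the three scores for one drug pair and one disease. This is a sketch of the stated rule only; the function name and the returned labels are hypothetical, and the DrugBank filtering and literature validation steps are not modeled.

```python
def compare_pair(score_a, score_b, score_ab):
    """Apply the Section 2.3 rule to one drug pair (A, B) and one disease.

    Returns a hypothetical label describing which option is predicted most effective.
    """
    best = max(score_a, score_b, score_ab)
    if best <= 0:
        return "no predicted benefit"  # the best score must be greater than 0
    if score_ab > max(score_a, score_b):
        return "combination more effective"
    return "single drug scores highest"


print(compare_pair(0.5, 0.8, 1.2))  # combination more effective
```

A tie between the combination and the better single drug is reported as "single drug scores highest", since the text requires the combination score to be strictly higher.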

3 Results

3.1 Relation between Drug and Disease. As a result of our analysis, relationships between drugs and diseases were established; they are shown in Figure 2(a) for microarray data.

The drug combination most closely related to acute adult T-cell leukemia is camptothecin (CA) and mitomycin C (MC), followed by camptothecin (CA) and etoposide (EP) and then by etoposide (EP) and mitomycin C (MC). These drug combinations were also closely related to chronic adult T-cell leukemia, which may be due to the similar pathophysiologic characteristics of the two diseases.

Similarly, relationships between drugs and other cancers are shown in Figure 2(b) for RNA-Seq data. The drug combination most closely related to breast cancer is aclacinomycin A (AA) and doxorubicin (DH), followed by doxorubicin (DH) and etoposide (EP) and then by etoposide (EP) and rapamycin (RP). The combination most closely related to liver cancer is doxorubicin (DH) and etoposide (EP), followed by aclacinomycin A (AA) and doxorubicin (DH) and then by etoposide (EP) and rapamycin (RP). The drug combination most closely related to lung adenocarcinoma is aclacinomycin A (AA) and doxorubicin (DH), followed by doxorubicin (DH) and etoposide (EP) and then by doxorubicin (DH) and rapamycin (RP). In lung squamous cell carcinoma, the most closely related drug combination is etoposide (EP) and rapamycin (RP), followed by doxorubicin (DH) and etoposide (EP) and then by doxorubicin (DH) and rapamycin (RP).

3.2 Further Validation. Using our filtering algorithm (see Methods), a total of 105 relationships between drugs and diseases were identified from microarray data and a total of 67 relationships from RNA-Seq data. These results were then validated through review of the published literature and drug-related databases.

The review found 36 of these relationships (microarray) and 41 (RNA-Seq) reported in previous studies (Tables S3 and S4). Moreover, 39 synergistic drugs (microarray) and 18 synergistic drugs (RNA-Seq) had also been identified by previous studies (Tables S5 and S6).

3.3 Web Interface. We have further implemented the proposed approach as an interactive web tool named "Predicting the Effect of the Drug on Disease (PEDD)" (https://www.scbit.org/PEDD). This web tool is intuitive and can easily be applied to similar analyses, using user-provided drug-treated and disease-related gene expression data to predict relationships between drugs and diseases. We continue to refine the algorithm and the selection of datasets, for example, both experimental data and disease subtypes, in ongoing studies.

4 Discussion

Due to the complexity of disease, the frequent lack of response to targeted therapies, and the emergence of drug resistance, interest in drug combination therapy has increased [14]. Both computational and experimental methods have been applied to screen synergistic drugs. An optimal approach would use computational screening to broaden the study of potential component drugs for combination therapy and to better direct experimental validation, which can lead to more rapid and effective means of screening and identifying candidate drug combinations. Synergistic drug prediction models have been studied previously. For example, Jin et al. built an enhanced Petri-net (EPN) model to predict the synergistic effect of pairwise drug combinations from genome-wide transcriptional expression data by applying Petri-nets to identify specific drug-targeted signaling networks [11]. Sun et al. constructed a model called the Ranking-system of Anticancer Synergy (RACS), based on semisupervised learning, which ranks drug pairs according to their similarity to labeled samples in a specified multifeature space [15]. However, these computational approaches considered only the characteristics of the drug itself, without taking into account potentially valuable disease observations. The resulting predictions may be applicable to the cell line but not readily extendable to disease as it appears in humans. For these reasons, we developed an algorithm that expands on these earlier works and predicts the effects of drugs on various diseases by integrating gene expression data generated from disease tissues and drug-treated cell lines.

The workflow is as follows. First, up- and downregulated genes were calculated from the disease-related gene expression data. Second, using the gene expression data of the drug-treated cell line, we calculated up- and downregulated genes for single drugs and drug combinations. Next, the disease-related up- and downregulated genes were matched with the drug-related up- and downregulated genes according to our matching principle. Finally, scores representing the effect of each drug on various diseases were calculated from the matching results by our scoring method. The implementation of our algorithm as an interactive web tool makes the proposed approach easily accessible to scientists in general. Researchers can find potential drugs for diseases according to the calculated scores.

In this study, our algorithm outputs scores for both a drug combination and each of its single drugs for a disease; thus, it is applicable not only to drug combination prediction but also to drug repositioning. According to the score ranking, a drug combination may be defined as more effective than single drugs if it has the highest score. Moreover, the algorithm is applicable not only to transcriptomics data but also to other quantitative profiling data, such as proteomics data.

The results showed that the effect of combination drugs may be higher than the effect of the individual component drugs in some diseases. For example, the effect of the combination of camptothecin and monastrol was predicted to be greater than the effect of camptothecin or monastrol individually in acute adult T-cell leukemia and chronic adult T-cell leukemia. In contrast, the effect of combination drugs may be lower than the effect of the individual component drugs in other diseases. For example, the combination of camptothecin and monastrol was predicted to have reduced efficacy in multiple myeloma and polycythemia vera. In general, we believe that this analytic approach can contribute to drug research and screening studies, and we use this preliminary study to show its potential value.

However, in our algorithm, all differentially expressed genes bear equal weights, whereas changes in some key genes may have a larger effect. For example, both gene sequence variations and expression changes are important molecular phenotypes in human disease, especially cancer, and they should be assigned different weights. However, determining the key genes and assigning appropriate weights to them is difficult, as we use only gene expression profile data in this study. Future research should examine this aspect in more depth and consider more factors; for example, we may use multilevel omics expression data and drug targets to find the key genes and assign differential weights to them. Furthermore, we recognize that the disease classes used in this study, for example, "breast cancer," are likely subject to further stratification, for example, ductal carcinoma in situ (DCIS). We are currently studying the application of this approach to such refinements.

With the rapid development of next-generation sequencing (NGS) technology and the accumulation of histological data [16], many databases, such as FDA and DrugBank [12], can be used to screen single drugs or synergistic drugs. However, a comprehensive database of "drug-cancer relationships" that covers both single drugs and combination drugs together with cancer-related information has not yet been established. We believe such a database will become available in the future by collecting information from current public databases and the published literature. It would provide an important assessment criterion for "drug-cancer" predictions and an important reference for the design of antitumor combination therapy strategies. While our studies represent a preliminary exploration of this critical direction, we believe that the algorithm can provide the basis for further refinement towards addressing a large clinical need in antitumor combination therapy.

Conflicts of Interest

The authors declare that they have no conflicts of interest

Authors' Contributions

Hui Cui and Menghuan Zhang contributed equally to this work and should be considered co-first authors.

Acknowledgments

This work was supported by the National Key Research and Development Program of China [2016YFC0904101], the National Natural Science Foundation of China [31570831], the National Hi-Tech Program [2015AA020101], and the Chinese Human Proteome Projects [CNHPP 2014DFB30020, 2014DFB30030].

Supplementary Materials

Table S1: drug information. Table S2: microarray data information. Table S3: drug-disease relations identified by previous studies from microarray data. Table S4: drug-disease relations identified by previous studies from RNA-Seq data. Table S5: synergistic drugs identified by previous studies from microarray data. Table S6: synergistic drugs identified by previous studies from RNA-Seq data. (Supplementary Materials)

References

[1] J. A. Curtin, J. Fridlyand, T. Kageshita et al., "Distinct sets of genetic alterations in melanoma," The New England Journal of Medicine, vol. 353, no. 20, pp. 2135–2147, 2005.

[2] Z. Sheng, Y. Sun, Z. Yin, K. Tang, and Z. Cao, "Advances in computational approaches in identifying synergistic drug combinations," Briefings in Bioinformatics, 2017.

[3] J. Jia, X. Ma, Z. W. Cao, Y. X. Li, and Y. Z. Chen, "Erratum: Mechanisms of drug combinations: interaction and network perspectives (Nature Reviews Drug Discovery (2009) vol. 8 (111–128) 10.1038/nrd2683)," Nature Reviews Drug Discovery, vol. 8, no. 6, p. 516, 2009.

[4] J. Yang, H. Tang, Y. Li et al., "DIGRE: drug-induced genomic residual effect model for successful prediction of multidrug effects," CPT: Pharmacometrics & Systems Pharmacology, vol. 4, no. 2, pp. 91–97, 2015.

[5] P. B. Chapman et al., "Improved survival with vemurafenib in melanoma with BRAF V600E mutation," The New England Journal of Medicine, vol. 364, no. 26, pp. 2507–2516, 2011.

[6] H. S. Nelson, "Advair: combination treatment with fluticasone propionate/salmeterol in the treatment of asthma," The Journal of Allergy and Clinical Immunology, vol. 107, no. 2, pp. 397–416, 2001.

[7] Z. Wu, X. Zhao, and L. Chen, "A systems biology approach to identify effective cocktail drugs," BMC Systems Biology, vol. 4, no. Suppl 2, p. S7, 2010.

[8] M. A. Held, C. G. Langdon, J. T. Platt et al., "Genotype-selective combination therapies for melanoma identified by high-throughput drug screening," Cancer Discovery, vol. 3, no. 1, pp. 52–67, 2013.

[9] Q. Xu, Y. Xiong, H. Dai et al., "PDC-SGB: prediction of effective drug combinations using a stochastic gradient boosting algorithm," Journal of Theoretical Biology, vol. 417, pp. 1–7, 2017.

[10] X. Zhao, M. Iskar, G. Zeller, M. Kuhn, V. van Noort, and P. Bork, "Prediction of drug combinations by integrating molecular and pharmacological data," PLoS Computational Biology, vol. 7, no. 12, Article ID e1002323, 2011.

[11] G. Jin, H. Zhao, X. Zhou, and S. T. C. Wong, "An enhanced Petri-Net model to predict synergistic effects of pairwise drug combinations from gene microarray data," Bioinformatics, vol. 27, no. 13, pp. i310–i316, 2011.

[12] D. S. Wishart, C. Knox, A. C. Guo et al., "DrugBank: a comprehensive resource for in silico drug discovery and exploration," Nucleic Acids Research, vol. 34, pp. D668–D672, 2006.

[13] Y. Liu, Q. Wei, G. Yu, W. Gai, Y. Li, and X. Chen, "DCDB 2.0: a major update of the drug combination database," Database, vol. 2014, Article ID bau124, 2014.

[14] N. Borisov et al., "A method of gene expression data transfer from cell lines to cancer patients for machine-learning prediction of drug efficiency," Cell Cycle, pp. 1–6, 2017.

[15] Y. Sun, Z. Sheng, C. Ma et al., "Combining genomic and network characteristics for extended capability in predicting synergistic drugs for cancer," Nature Communications, vol. 6, article 9481, 2015.

[16] J. Reuter, D. V. Spacek, and M. Snyder, "High-throughput sequencing technologies," Molecular Cell, vol. 58, no. 4, pp. 586–597, 2015.

Hindawiwwwhindawicom

International Journal of

Volume 2018

Zoology

Hindawiwwwhindawicom Volume 2018

Anatomy Research International

PeptidesInternational Journal of



Journal of Parasitology Research

GenomicsInternational Journal of


Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018


BioinformaticsAdvances in

Marine BiologyJournal of



Neuroscience Journal


BioMed Research International

Cell BiologyInternational Journal of



Biochemistry Research International

ArchaeaHindawiwwwhindawicom Volume 2018


Genetics Research International


Advances in

Virolog y Stem Cells International



Enzyme Research



MicrobiologyHindawiwwwhindawicom

Nucleic AcidsJournal of

Volume 2018

Submit your manuscripts atwwwhindawicom


used successfully in clinical experiments and have attractedtremendous attention as promising treatments for complexdisorders especially those with multifactorial pathogenicmechanisms [3] For example the combination treatment offluticasone and propionate provides better asthma controlthan increasing the dose of either single drug alone whilesimultaneously reducing the frequency of exacerbations [4]It is noted that an increasing number of combination drugsare being marketed as commercial products with a fixeddosage of each component and with approval of the Food andDrug Administration (FDA) in the past 5 years especially forthose complex diseases such as type II diabetes HIV infec-tions and cancer In the particular area of cancer therapythe first combination was granted in January 2014 by FDA totreat melanoma with BRAF V600E or V600K mutations [2]Currently approximately 50 combination therapies withoutfixed component dosage have been referred by FDA to treatdifferent cancer subtypes

Pharmacologically a drug combination may producesynergistic additive antagonistic or even suppressive effectif the combined effect is greater than equal to or less thanthe sum of each individual drug [5] Synergistic effects aretypically the most desirable because of enhanced efficacypotential for decreasing dosage with equal or increasedlevel of efficacy or delayed development of drug resistance[6] Therefore identification of synergistic agents presents asignificant opportunity to better deal with complex diseaseseven though it is a highly challenging task [7]The synergy ofdrugs can be assayed by testing the inhibition of tumor cellgrowth by individual drugs and their combinations in vitrofollowed by a mathematical formulation by Loewe additivityor Bliss independence [1 8] However it is not practicalto test the synergistic effect of all possible combinationsof drugs through experiments due to the large number ofdrugs approved by FDA The development of computationalmethods for predicting effects of drug combination canplay an essential role in developing systematic screening ofcombinatorial treatment regiments [9]

Previous studies have proposed a handful of compu-tational approaches to analyze high-throughput moleculardatasets for predicting the synergy of drug combinationsRecently Zhao et al introduced a model to predict theefficacies of drug combinations by integrating molecular andpharmacological data But its dependence on the featurepattern specifically enriched in approved drug combinationsseverely limited its potential application [10] Similarly Wuet al proposed a network-analysis-based model that utilizedgene expression profiles following individual treatments topredict gene expression changes induced by drug combina-tions which were then used to estimate the effectiveness ofthe combinations [7] Another model named the enhancedPetri-Netmodel provided informative insight into themech-anisms of drug actions which was established to recognizethe synergism of drug combinations [11] But its requirementof a gene expression profile for every drug pair limited itsapplication

However these computational approaches only considerthe characteristics of the drug itself without taking intoaccount an equivalent characterization of the disease The

effectiveness of the drug may be applicable for the specifiedcell line but not applicable for the actual disease as itpresents in patients To account for this here we proposean algorithm to specifically predict synergistic effects ofdrug combinations on various diseases by integrating thedata characteristics of disease-related gene expression profileswith drug-treated gene expression profiles We have demon-strated utility through its application to transcriptome dataincludingmicroarray andRNASeq data and the drug-diseaseprediction results were validated using existing publicationsand drug databases It is also applicable to other quantitativeprofiling data such as proteomics data We also provide aninteractive web interface (httpswwwscbitorgPEDD) toallow our Prediction of Drug-Disease method to be readilyapplied to user data

2 Methods

In this study, we developed a disease-drug prediction algorithm using transcriptome data. We describe the data aggregation and the algorithm in detail below.

2.1 Data Aggregation. First, gene expression data from drug-treated samples and disease-related gene expression datasets were identified and selected from the literature and public databases.

2.1.1 Gene Expression Data following Drug Treatment. The GSE51068 dataset (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE51068), which contains gene expression data for 282 drug-treated samples, was downloaded from the GEO database. We selected the high-throughput expression profiles of the OCI-Ly3 cell line treated with 14 known drugs at 2 different concentrations and profiled at 6, 12, and 24 hours after treatment. For this initial study, the profiles obtained after 6 hours of treatment were used. Summary information on the 14 drugs is given in Table S1.
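Selecting one time point from a multi-dose, multi-time design amounts to filtering the sample annotations and grouping them by treatment condition. The sketch below assumes each sample has already been annotated with drug, concentration, and time point; the `Sample` record, its fields, and the function `select_time_point` are hypothetical and do not reflect the actual GEO metadata layout of GSE51068.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Sample(NamedTuple):
    sample_id: str      # hypothetical sample identifier
    drug: str           # compound name
    concentration: str  # dose label, e.g. "low" or "high" (illustrative)
    hours: int          # time after treatment at which the sample was profiled


def select_time_point(samples: Iterable[Sample],
                      hours: int = 6) -> Dict[Tuple[str, str], List[str]]:
    """Keep samples profiled at `hours` and group their IDs by (drug, concentration)."""
    grouped: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for s in samples:
        if s.hours == hours:
            grouped[(s.drug, s.concentration)].append(s.sample_id)
    return dict(grouped)


# Hypothetical annotations for illustration only (not taken from GSE51068).
demo = [
    Sample("s1", "drug_a", "low", 6),
    Sample("s2", "drug_a", "low", 12),
    Sample("s3", "drug_a", "high", 6),
    Sample("s4", "drug_b", "low", 6),
    Sample("s5", "drug_b", "low", 24),
]
print(select_time_point(demo))
# {('drug_a', 'low'): ['s1'], ('drug_a', 'high'): ['s3'], ('drug_b', 'low'): ['s4']}
```

Keying on (drug, concentration) keeps the two dose levels of each drug apart, so each 6-hour condition can later be compared with its own controls.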

2.1.2 Disease-Related Gene Expression Data. We developed the method so that it can be applied to both microarray and RNA-seq data; accordingly, both data types were identified and collected.

We required the microarray data in this study to meet the following criteria: the experimental group consists of human disease samples, the control group consists of non-disease samples, and the number of experimental samples is greater than 50. Six microarray datasets (GSE9476, GSE33615, GSE22529, GSE26049, GSE19429, and GSE47552) were selected from the GEO database (https://www.ncbi.nlm.nih.gov/geo), covering 9 blood cell and bone marrow related malignancies and diseases (Table S2).
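The three inclusion requirements for microarray datasets can be expressed as a single predicate. This is a minimal sketch: the `CandidateDataset` record and the function `meets_microarray_criteria` are hypothetical, and "greater than 50" is read as a strict inequality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateDataset:
    accession: str               # dataset identifier (e.g. a GEO series ID)
    case_is_human_disease: bool  # experimental group consists of human disease samples
    control_is_nondisease: bool  # control group consists of non-disease samples
    n_case_samples: int          # number of experimental (disease) samples


def meets_microarray_criteria(ds: CandidateDataset, min_cases: int = 50) -> bool:
    """Return True only if a dataset satisfies all three inclusion requirements.

    'Greater than 50' is treated as strict, so exactly 50 cases fails.
    """
    return (ds.case_is_human_disease
            and ds.control_is_nondisease
            and ds.n_case_samples > min_cases)


# Hypothetical candidates; identifiers and counts are invented for illustration.
candidates = [
    CandidateDataset("hypothetical-1", True, True, 64),
    CandidateDataset("hypothetical-2", True, True, 50),
    CandidateDataset("hypothetical-3", True, False, 120),
]
print([c.accession for c in candidates if meets_microarray_criteria(c)])
# ['hypothetical-1']
```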

The additional disease-related gene expression data are RNA-seq data. Four cancer types were chosen: breast cancer, liver cancer, lung adenocarcinoma, and lung squamous cell carcinoma. We extracted these cancer-related RNA-seq data, generated by TCGA, from UCSC Xena (https://xenabrowser.net/datapages/?host=https://tcga.xenahubs.net).











[Figure: workflow with panels (a) and (b); recoverable labels: "R limma", "T-test/Z-test", "Scoring method".]

$$z_i = \Phi^{-1}\big(P(t_i)\big) \qquad (1)$$

$$\cdots\,\ln\big(1 - \Phi(z_{ij})\big)\cdots \qquad (2)$$
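The relation z_i = Φ^{-1}(P(t_i)) labeled (1) above maps each gene's test p-value onto the standard normal scale through the inverse normal CDF. A minimal Python sketch using the standard library's `statistics.NormalDist` follows. The function names and the clipping constant `eps` are assumptions added for illustration (`inv_cdf` rejects p = 0 and p = 1); the sketch takes the formula literally, so small p-values map to large negative z, and whether the original method applies a different tail or sign convention cannot be determined from this text.

```python
from statistics import NormalDist
from typing import Iterable, List

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def p_to_z(p: float, eps: float = 1e-12) -> float:
    """Return z = Phi^{-1}(p) for a single p-value.

    p is clipped into [eps, 1 - eps] because NormalDist.inv_cdf raises
    StatisticsError for p <= 0 or p >= 1.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"p-value must lie in [0, 1], got {p}")
    clipped = min(max(p, eps), 1.0 - eps)
    return _STD_NORMAL.inv_cdf(clipped)


def p_values_to_z(p_values: Iterable[float]) -> List[float]:
    """Apply p_to_z to every gene's p-value (one z_i per gene i)."""
    return [p_to_z(p) for p in p_values]


print([round(z, 3) for z in p_values_to_z([0.5, 0.975, 0.025])])
# [0.0, 1.96, -1.96]
```

Clipping keeps genes with reported p-values of exactly 0 or 1 finite (about ±7.03 with the default `eps`) instead of raising an error.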






3 Results









4 Discussion














Acknowledgments




References



















