PACIFIC SYMPOSIUM ON BIOCOMPUTING 2017
ABSTRACT BOOK
Poster Presenters: Poster space is assigned by abstract page number. Please find the page that your abstract is on and put your poster on the poster board with the corresponding number (e.g., if your abstract is on page 50, put your poster on board #50).
Proceedings papers with oral presentations #2-39 are not assigned poster space.
Papers are organized first by session, then the last name of the first author. Presenting authors’ names are underlined.
TABLE OF CONTENTS

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS

COMPUTATIONAL APPROACHES TO UNDERSTANDING THE EVOLUTION OF MOLECULAR FUNCTION ... 1
IDENTIFICATION AND ANALYSIS OF BACTERIAL GENOMIC METABOLIC SIGNATURES ... 2
Nathan Bowerman, Nathan Tintle, Matthew DeJongh, Aaron A. Best
WHEN SHOULD WE NOT TRANSFER FUNCTIONAL ANNOTATION BETWEEN SEQUENCE PARALOGS? ... 3
Mengfei Cao, Lenore J. Cowen
PROSNET: INTEGRATING HOMOLOGY WITH MOLECULAR NETWORKS FOR PROTEIN FUNCTION PREDICTION ... 4
Sheng Wang, Meng Qu, Jian Peng
ON THE POWER AND LIMITS OF SEQUENCE SIMILARITY BASED CLUSTERING OF PROTEINS INTO FAMILIES ... 5
Christian Wiwie, Richard Röttger

IMAGING GENOMICS ... 6
INTEGRATIVE ANALYSIS FOR LUNG ADENOCARCINOMA PREDICTS MORPHOLOGICAL FEATURES ASSOCIATED WITH GENETIC VARIATIONS ... 7
Chao Wang, Hai Su, Lin Yang, Kun Huang
IDENTIFICATION OF DISCRIMINATIVE IMAGING PROTEOMICS ASSOCIATIONS IN ALZHEIMER'S DISEASE VIA A NOVEL SPARSE CORRELATION MODEL ... 8
Jingwen Yan, Shannon L. Risacher, Kwangsik Nho, Andrew J. Saykin, Li Shen
ENFORCING CO-EXPRESSION IN MULTIMODAL REGRESSION FRAMEWORK ... 9
Pascal Zille, Vince D. Calhoun, Yu-Ping Wang

METHODS TO ENSURE THE REPRODUCIBILITY OF BIOMEDICAL RESEARCH ... 10
EXPLORING THE REPRODUCIBILITY OF PROBABILISTIC CAUSAL MOLECULAR NETWORK MODELS ... 11
Ariella Cohain, Aparna A. Divaraniya, Kuixi Zhu, Joseph R. Scarpa, Andrew Kasarskis, Jun Zhu, Rui Chang, Joel T. Dudley, Eric E. Schadt
REPRODUCIBLE DRUG REPURPOSING: WHEN SIMILARITY DOES NOT SUFFICE ... 12
Emre Guney
EMPOWERING MULTI-COHORT GENE EXPRESSION ANALYSIS TO INCREASE REPRODUCIBILITY ... 13
Winston A. Haynes, Francesco Vallania, Charles Liu, Erika Bongen, Aurelie Tomczak, Marta Andres-Terrè, Shane Lofgren, Andrew Tam, Cole A. Deisseroth, Matthew D. Li, Timothy E. Sweeney, Purvesh Khatri
RABIX: AN OPEN-SOURCE WORKFLOW EXECUTOR SUPPORTING RECOMPUTABILITY AND INTEROPERABILITY OF WORKFLOW DESCRIPTIONS ... 14
Gaurav Kaushik, Sinisa Ivkovic, Janko Simonovic, Nebojsa Tijanic, Brandi Davis-Dusenbery, Deniz Kural
DATA SHARING AND CLINICAL GENETIC TESTING: SUCCESSES AND CHALLENGES ... 15
Shan Yang, Melissa Cline, Can Zhang, Benedict Paten, Stephen E. Lincoln
PATTERNS IN BIOMEDICAL DATA – HOW DO WE FIND THEM? ... 16
LEARNING ATTRIBUTES OF DISEASE PROGRESSION FROM TRAJECTORIES OF SPARSE LAB VALUES ... 17
Vibhu Agarwal, Nigam H. Shah
COMPUTER AIDED IMAGE SEGMENTATION AND CLASSIFICATION FOR VIABLE AND NON-VIABLE TUMOR IDENTIFICATION IN OSTEOSARCOMA ... 18
Harish Babu Arunachalam, Rashika Mishra, Bogdan Armaselu, Ovidiu Daescu, Maria Martinez, Patrick Leavey, Dinesh Rakheja, Kevin Cederberg, Anita Sengupta, Molly Ni'Suilleabhain
MISSING DATA IMPUTATION IN THE ELECTRONIC HEALTH RECORD USING DEEPLY LEARNED AUTOENCODERS ... 19
Brett K. Beaulieu-Jones, Jason H. Moore, The Pooled Resource Open-Access ALS Clinical Trials Consortium
DEVELOPMENT AND PERFORMANCE OF TEXT-MINING ALGORITHMS TO EXTRACT SOCIOECONOMIC STATUS FROM DE-IDENTIFIED ELECTRONIC HEALTH RECORDS ... 20
Brittany M. Hollister, Nicole A. Restrepo, Eric Farber-Eger, Dana C. Crawford, Melinda C. Aldrich, Amy Non
DEMO DASHBOARD: VISUALIZING AND UNDERSTANDING GENOMIC SEQUENCES USING DEEP NEURAL NETWORKS ... 21
Jack Lanchantin, Ritambhara Singh, Beilun Wang, Yanjun Qi
PREDICTIVE MODELING OF HOSPITAL READMISSION RATES USING ELECTRONIC MEDICAL RECORD-WIDE MACHINE LEARNING: A CASE-STUDY USING MOUNT SINAI HEART FAILURE COHORT ... 22
Khader Shameer, Kipp W. Johnson, Alexandre Yahi, Riccardo Miotto, Li Li, Doran Ricks, Jebakumar Jebakaran, Patricia Kovatch, Partho P. Sengupta, Annetine Gelijns, Alan Moskovitz, Bruce Darrow, David L. Reich, Andrew Kasarskis, Nicholas P. Tatonetti, Sean Pinney, Joel T. Dudley
METHODS FOR CLUSTERING TIME SERIES DATA ACQUIRED FROM MOBILE HEALTH APPS ... 23
Nicole Tignor, Pei Wang, Nicholas Genes, Linda Rogers, Steven G. Hershman, Erick R. Scott, Micol Zweig, Yu-Feng Yvonne Chan, Eric E. Schadt
A NEW RELEVANCE ESTIMATOR FOR THE COMPILATION AND VISUALIZATION OF DISEASE PATTERNS AND POTENTIAL DRUG TARGETS ... 24
Modest von Korff, Tobias Fink, Thomas Sander
DISCOVERY OF FUNCTIONAL AND DISEASE PATHWAYS BY COMMUNITY DETECTION IN PROTEIN-PROTEIN INTERACTION NETWORKS ... 25
Stephen J. Wilson, Angela D. Wilkins, Chih-Hsu Lin, Rhonald C. Lua, Olivier Lichtarge

PRECISION MEDICINE: FROM GENOTYPES AND MOLECULAR PHENOTYPES TOWARDS IMPROVED HEALTH AND THERAPIES ... 26
OPENING THE DOOR TO THE LARGE SCALE USE OF CLINICAL LAB MEASURES FOR ASSOCIATION TESTING: EXPLORING DIFFERENT METHODS FOR DEFINING PHENOTYPES ... 27
Christopher R. Bauer, Daniel Lavage, John Snyder, Joseph Leader, J. Matthew Mahoney, Sarah A. Pendergrass
TEMPORAL ORDER OF DISEASE PAIRS AFFECTS SUBSEQUENT DISEASE TRAJECTORIES: THE CASE OF DIABETES AND SLEEP APNEA ... 28
Mette Beck, David Westergaard, Leif Groop, Soren Brunak
HUMAN KINASES DISPLAY MUTATIONAL HOTSPOTS AT COGNATE POSITIONS WITHIN CANCER ... 29
Jonathan Gallion, Angela D. Wilkins, Olivier Lichtarge
MUSE: A MULTI-LOCUS SAMPLING-BASED EPISTASIS ALGORITHM FOR QUANTITATIVE GENETIC TRAIT PREDICTION ... 30
Dan He, Laxmi Parida
DIFFERENTIAL PATHWAY DEPENDENCY DISCOVERY ASSOCIATED WITH DRUG RESPONSE ACROSS CANCER CELL LINES ... 31
Gil Speyer, Divya Mahendra, Hai J. Tran, Jeff Kiefer, Stuart L. Schreiber, Paul A. Clemons, Harshil Dhruv, Michael Berens, Seungchan Kim
A METHYLATION-TO-EXPRESSION FEATURE MODEL FOR GENERATING ACCURATE PROGNOSTIC RISK SCORES AND IDENTIFYING DISEASE TARGETS IN CLEAR CELL KIDNEY CANCER ... 32
Jeffrey A. Thompson, Carmen J. Marsit
DE NOVO MUTATIONS IN AUTISM IMPLICATE THE SYNAPTIC ELIMINATION NETWORK ... 33
Guhan Ram Venkataraman, Chloe O'Connell, Fumiko Egawa, Dorna Kashef-Haghighi, Dennis Paul Wall
IDENTIFYING GENETIC ASSOCIATIONS WITH VARIABILITY IN METABOLIC HEALTH AND BLOOD COUNT LABORATORY VALUES: DIVING INTO THE QUANTITATIVE TRAITS BY LEVERAGING LONGITUDINAL DATA FROM AN EHR ... 34
Shefali S. Verma, Anastasia M. Lucas, Daniel R. Lavage, Joseph B. Leader, Raghu Metpally, Sarathbabu Krishnamurthy, Frederick Dewey, Ingrid Borecki, Alexander Lopez, John Overton, John Penn, Jeffrey Reid, Sarah A. Pendergrass, Gerda Breitwieser, Marylyn D. Ritchie
STRATEGIES FOR EQUITABLE PHARMACOGENOMIC-GUIDED WARFARIN DOSING AMONG EUROPEAN AND AFRICAN AMERICAN INDIVIDUALS IN A CLINICAL POPULATION ... 35
Laura Wiley, Jacob VanHouten, David Samuels, Melinda Aldrich, Dan Roden, Josh Peterson, Joshua Denny

SINGLE-CELL ANALYSIS AND MODELLING OF CELL POPULATION HETEROGENEITY ... 36
PRODUCTION OF A PRELIMINARY QUALITY CONTROL PIPELINE FOR SINGLE NUCLEI RNA-SEQ AND ITS APPLICATION IN THE ANALYSIS OF CELL TYPE DIVERSITY OF POST-MORTEM HUMAN BRAIN NEOCORTEX ... 37
Brian Aevermann, Jamison McCorrison, Pratap Venepally, Rebecca Hodge, Trygve Bakken, Jeremy Miller, Mark Novotny, Danny N. Tran, Francisco Diez-Fuertes, Lena Christiansen, Fan Zhang, Frank Steemers, Roger S. Lasken, Ed Lein, Nicholas Schork, Richard H. Scheuermann
TRACING CO-REGULATORY NETWORK DYNAMICS IN NOISY, SINGLE-CELL TRANSCRIPTOME TRAJECTORIES ... 38
Pablo Cordero, Joshua M. Stuart
AN UPDATED DEBARCODING TOOL FOR MASS CYTOMETRY WITH CELL TYPE-SPECIFIC AND CELL SAMPLE-SPECIFIC STRINGENCY ADJUSTMENT ... 39
Kristin I. Fread, William D. Strickland, Garry P. Nolan, Eli R. Zunder
PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

IMAGING GENOMICS ... 40
ADAPTIVE TESTING OF SNP-BRAIN FUNCTIONAL CONNECTIVITY ASSOCIATION VIA A MODULAR NETWORK ANALYSIS ... 41
Chen Gao, Junghi Kim, Wei Pan
EXPLORING BRAIN TRANSCRIPTOMIC PATTERNS: A TOPOLOGICAL ANALYSIS USING SPATIAL EXPRESSION NETWORKS ... 42
Zhana Kuncheva, Michelle L. Krishnan, Giovanni Montana

PATTERNS IN BIOMEDICAL DATA – HOW DO WE FIND THEM? ... 43
A DEEP LEARNING APPROACH FOR CANCER DETECTION AND RELEVANT GENE IDENTIFICATION ... 44
Padideh Danaee, Reza Ghaeini, David Hendrix
GENOME-WIDE INTERACTION WITH SELECTED TYPE 2 DIABETES LOCI REVEALS NOVEL LOCI FOR TYPE 2 DIABETES IN AFRICAN AMERICANS ... 45
Jacob M. Keaton, Jacklyn N. Hellwege, Maggie C.Y. Ng, Nicholette D. Palmer, James S. Pankow, Myriam Fornage, James G. Wilson, Adolfo Correa, Laura J. Rasmussen-Torvik, Jerome I. Rotter, Yii-Der I. Chen, Kent D. Taylor, Stephen S. Rich, Lynne E. Wagenknecht, Barry I. Freedman, Donald W. Bowden
META-ANALYSIS OF CONTINUOUS PHENOTYPES IDENTIFIES A GENE SIGNATURE THAT CORRELATES WITH COPD DISEASE STATUS ... 46
Madeleine Scott, Francesco Vallania, Purvesh Khatri
LEARNING PARSIMONIOUS ENSEMBLES FOR UNBALANCED COMPUTATIONAL GENOMICS PROBLEMS ... 47
Ana Stanescu, Gaurav Pandey
NETWORK MAP OF ADVERSE HEALTH EFFECTS AMONG VICTIMS OF INTIMATE PARTNER VIOLENCE ... 48
Kathleen Whiting, Larry Y. Liu, Mehmet Koyutürk, Gunnur Karakurt

PRECISION MEDICINE: FROM GENOTYPES AND MOLECULAR PHENOTYPES TOWARDS IMPROVED HEALTH AND THERAPIES ... 49
A POWERFUL METHOD FOR INCLUDING GENOTYPE UNCERTAINTY IN TESTS OF HARDY-WEINBERG EQUILIBRIUM ... 50
Andrew Beck, Alexander Luedtke, Keli Liu, Nathan Tintle
MICRORNA-AUGMENTED PATHWAYS (MIRAP) AND THEIR APPLICATIONS TO PATHWAY ANALYSIS AND DISEASE SUBTYPING ... 51
Diana Diaz, Michele Donato, Tin Nguyen, Sorin Draghici
FREQUENT SUBGRAPH MINING OF PERSONALIZED SIGNALING PATHWAY NETWORKS GROUPS PATIENTS WITH FREQUENTLY DYSREGULATED DISEASE PATHWAYS AND PREDICTS PROGNOSIS ... 52
Arda Durmaz, Tim A.D. Henderson, Douglas Brubaker, Gurkan Bebek
CERNA SEARCH METHOD IDENTIFIED A MET-ACTIVATED SUBGROUP AMONG EGFR DNA AMPLIFIED LUNG ADENOCARCINOMA PATIENTS ... 53
Halla Kabat, Leo Tunkle, Inhan Lee
IMPROVED PERFORMANCE OF GENE SET ANALYSIS ON GENOME-WIDE TRANSCRIPTOMICS DATA WHEN USING GENE ACTIVITY STATE ESTIMATES ... 54
Thomas Kamp, Micah Adams, Craig Disselkoen, Nathan Tintle
METHYLDMV: SIMULTANEOUS DETECTION OF DIFFERENTIAL DNA METHYLATION AND VARIABILITY WITH CONFOUNDER ADJUSTMENT ... 55
Pei Fen Kuan, Junyan Song, Shuyao He
IDENTIFY CANCER DRIVER GENES THROUGH SHARED MENDELIAN DISEASE PATHOGENIC VARIANTS AND CANCER SOMATIC MUTATIONS ... 56
Meng Ma, Changchang Wang, Benjamin Glicksberg, Eric E. Schadt, Shuyu Li, Rong Chen
IDENTIFYING CANCER SPECIFIC METABOLIC SIGNATURES USING CONSTRAINT-BASED MODELS ... 57
André Schultz, Sanket Mehta, Chenyue W. Hu, Fieke W. Hoff, Terzah M. Horton, Steven M. Kornblau, Amina A. Qutub

SINGLE-CELL ANALYSIS AND MODELLING OF CELL POPULATION HETEROGENEITY ... 58
MAPPING NEURONAL CELL TYPES USING INTEGRATIVE MULTI-SPECIES MODELING OF HUMAN AND MOUSE SINGLE CELL RNA SEQUENCING ... 59
Travis Johnson, Zachary Abrams, Yan Zhang, Kun Huang
A SPATIOTEMPORAL MODEL TO SIMULATE CHEMOTHERAPY REGIMENS FOR HETEROGENEOUS BLADDER CANCER METASTASES TO THE LUNG ... 60
Kimberly R. Kanigel Winner, James C. Costello
SCALABLE VISUALIZATION FOR HIGH-DIMENSIONAL SINGLE-CELL DATA ... 61
Juho Kim, Nate Russell, Jian Peng

POSTER PRESENTATIONS

COMPUTATIONAL APPROACHES TO UNDERSTANDING THE EVOLUTION OF MOLECULAR FUNCTION ... 62
CLUSTER-BASED GENOTYPE-ENVIRONMENT-PHENOTYPE CORRELATION ALGORITHM ... 63
Ernesto Borrayo, Ryoko Machida-Hirano
QUANTITATING TRANSLATIONAL CONTROL: MRNA ABUNDANCE-DEPENDENT AND INDEPENDENT CONTRIBUTIONS ... 64
Jingyi Jessica Li, Guo-Liang Chew, Mark D. Biggin
PROSNET: INTEGRATING HOMOLOGY WITH MOLECULAR NETWORKS FOR PROTEIN FUNCTION PREDICTION ... 65
Sheng Wang, Meng Qu, Jian Peng

GENERAL ... 66
IDENTIFICATION OF DIFFERENTIALLY PHOSPHORYLATED MODULES IN PROTEIN INTERACTION NETWORKS ... 67
Marzieh Ayati, Danica Wiredja, Daniela Schlatzer, Goutham Narla, Mark Chance, Mehmet Koyutürk
CLUSTERING METHOD FOR PRIORITIZING BREAST CANCER RISK GENES AND MIRNAS ... 68
Yongsheng Bai, Naureen Aslam, Ali Salman
FUSIONDB: ASSESSING MICROBIAL DIVERSITY AND ENVIRONMENTAL PREFERENCES VIA FUNCTIONAL SIMILARITY ... 69
Chengsheng Zhu, Yannick Mahlich, Yana Bromberg
THE GEORGE M. O’BRIEN KIDNEY TRANSLATIONAL CORE CENTER AT THE UNIVERSITY OF MICHIGAN ... 70
Frank C. Brosius, Wenjun Ju, Keith Bellovich, Zeenat Bhat, Crystal Gadegbeku, Debbie Gipson, Jennifer Hawkins, Julia Herzog, Susan Massengill, Richard C. McEachin, Subramaniam Pennathur, Kalyani Perumal, Roger Wiggins, Matthias Kretzler
MINING DIRECTIONAL DRUG INTERACTION EFFECTS ON MYOPATHY USING THE FAERS DATABASE ... 71
Danai Chasioti, Xiaohui Yao, Pengyue Zhang, Xia Ning, Lang Li, Li Shen
DECIPHERING NEURONAL BROAD HISTONE H3K4ME3 DOMAINS ASSOCIATED WITH GENE-REGULATORY NETWORKS AND CONSERVED EPIGENOMIC LANDSCAPES IN THE HUMAN BRAIN ... 72
Aslihan Dincer, Eric E. Schadt, Bin Zhang, Joel T. Dudley, David Gavin, Schahram Akbarian
NORMALIZATION TECHNIQUES AND MACHINE LEARNING CLASSIFICATION FOR ASSIGNING MOLECULAR SUBSETS IN AUTOIMMUNE DISEASE AND CANCER ... 73
Jennifer M. Franks, Guoshuai Cai, Jaclyn N. Taroni, Michael L. Whitfield
MULTI-OMICS DATA INTEGRATION TO STRATIFY POPULATION IN HEPATOCELLULAR CARCINOMA ... 74
Kumardeep Chaudhary, Olivier Poirion, Liangqun Lu, Lana Garmire
TOWARDS STANDARDS-BASED CLINICAL DATA WEB APPLICATION LEVERAGING SHINY R AND HL7 FHIR ... 75
Na Hong, Naresh Prodduturi, Chen Wang, Guoqian Jiang
A DATA LAKE PLATFORM OF CONTEXTUAL BIOLOGICAL INFORMATION FOR AGILE TRANSLATIONAL RESEARCH ... 76
Austin Huang, Dmitri Bichko, Mathieu Boespflug, Edsko de Vries, Facundo Dominguez, Daniel Ziemek
GENOME READ IN-MEMORY (GRIM) FILTER: FAST LOCATION FILTERING IN DNA READ MAPPING USING EMERGING MEMORY TECHNOLOGIES ... 77
Jeremie Kim, Damla Senol, Hongyi Xin, Donghyuk Lee, Mohammed Alser, Hasan Hassan, Oguz Ergin, Can Alkan, Onur Mutlu
BCL-2 FAMILY MEMBERS AS REGULATORS OF RESPONSIVENESS TO BORTEZOMIB IN A MULTIPLE MYELOMA MODEL ... 78
Melissa E. Ko, Charis Teh, Christopher S. Playter, Eli R. Zunder, Daniel H. Gray, Wendy J. Fantl, Sylvia K. Plevritis, Garry P. Nolan
BIOMEDICAL TEXT-MINING APPLICATIONS FOR THE SYSTEM DEEPDIVE ... 79
Emily K. Mallory, Chris Re, Russ B. Altman
PROFILING ADAPTIVE IMMUNE REPERTOIRES ACROSS MULTIPLE HUMAN TISSUES BY RNA SEQUENCING ... 80
Serghei Mangul, Igor Mandric, Harry Taegyun Yang, Dennis Montoya, Nicolas Strauli, Jeremy Rotman, Benjamin Statz, Will Van Der Wey, Alex Zelikovsky, Roberto Spreafico, Maura Rossetti, Sagiv Shifman, Mark Ansel, Noah Zaitlen, Eleazar Eskin
THE CMH VARIANT WAREHOUSE - A CATALOG OF GENETIC VARIATION IN PATIENTS OF A CHILDREN'S HOSPITAL ... 81
Neil Miller, Greyson Twist, Byunggil Yoo, Andrea Gaedigk
MUTPRED2 AND ITS APPLICATION TO THE INFERENCE OF MOLECULAR SIGNATURES OF DISEASE ... 82
Vikas Pejaver, Lilia M. Iakoucheva, Sean D. Mooney, Predrag Radivojac
HIV-TRACE: MONITORING THE HIV EPIDEMIC IN NEAR REAL TIME USING LARGE NATIONAL AND GLOBAL SCALE MOLECULAR EPIDEMIOLOGY ... 83
Sergei Pond, Steven Weaver, Joel Wertheim, Andrew J. Leigh Brown
THE EXTREME MEMORY® CHALLENGE: A SEARCH FOR THE HERITABLE FOUNDATIONS OF EXCEPTIONAL MEMORY ... 84
Mary A. Pyc, Emily Giron, Philip Cheung, Douglas Fenger, J. Steven de Belle, Tim Tully
RESCUE THE MISSING VARIANTS - LESSONS LEARNED FROM LARGE SEQUENCING PROJECTS ... 85
Yingxue Ren, Joseph S. Reddy, Vivekananda Sarangi, Jason P. Sinnwell, Steve G. Younkin, Nilüfer Ertekin-Taner, Owen A. Ross, Rosa Rademakers, Shannon K. McDonnell, Joanna M. Biernacka, Yan W. Asmann
TOWARD EFFECTIVE MICRORNA QUANTIFICATION FROM SMALL RNA-SEQ ... 86
Pamela Russell, Richard Radcliffe, Brian Vestal, Wen Shi, Pratyaydipta Rudra, Laura Saba, Katerina Kechris
NANOPORE SEQUENCING TECHNOLOGY AND TOOLS: COMPUTATIONAL ANALYSIS OF THE CURRENT STATE, BOTTLENECKS AND FUTURE DIRECTIONS ... 87
Damla Senol, Jeremie Kim, Saugata Ghose, Can Alkan, Onur Mutlu
DETECTING OUTLIERS FROM MULTIDIMENSIONAL DATA WITH APPLICATION IN CANCER ... 88
Kyle Smith, Subhajyoti De, Debashis Ghosh
HUEMR: INTUITIVE MINING OF ELECTRONIC MEDICAL RECORDS ... 89
Abiodun Otolorin, Nana Osafo, William Southerland
DECIPHERING LUNG ADENOCARCINOMA MORPHOLOGY AND PROGNOSIS BY INTEGRATING OMICS AND HISTOPATHOLOGY ... 90
Kun-Hsing Yu, Gerald J. Berry, Daniel L. Rubin, Christopher Ré, Russ B. Altman, Michael Snyder
EXPLORING DEEP LEARNING FOR COPY NUMBER VARIATION DETECTION WITH NGS DATA ... 91
Yao-zhong Zhang, Rui Yamaguchi, Seiya Imoto, Satoru Miyano

IMAGING GENOMICS ... 92
PERIPHERAL EPIGENETIC ASSOCIATIONS WITH BRAIN GRAY MATTER IN SCHIZOPHRENIA ... 93
Dongdong Lin, Vince D. Calhoun, Juan R. Bustillo, Nora Perrone-Bizzozero, Jingyu Liu
THE INTERPLAY BETWEEN OLIGO-TARGET SPECIFIC AND GENOME-WIDE OFF-TARGET INTERACTIONS ... 94
Olga V. Matveeva, Nafisa N. Nazipova, Aleksey Y. Ogurtsov, Svetlana A. Shabalina

PATTERNS IN BIOMEDICAL DATA – HOW DO WE FIND THEM? ... 95
WARS2 IMPLICATED AS A COMMON MODIFIER OF METFORMIN METABOLITE BIOMARKERS IN A BIOBANK COHORT ... 96
Alyssa I. Clay, Richard M. Weinshilboum, K. Sreekumaran Nair, Rima F. Kaddurah-Daouk, Liewei Wang, Matthew K. Breitenstein
ESTIMATION OF FALSE NEGATIVE RATES VIA EMBEDDING SIMULATED EVENTS ... 97
Stephen V. Gliske, Katy L. Lau, Benjamin H. Brinkman, Greg A. Worrell, Cris G. Fink, William C. Stacey
INTEGRATIVE, INTERPRETABLE DEEP LEARNING FRAMEWORKS FOR REGULATORY GENOMICS AND EPIGENOMICS ... 98
Chuan Sheng Foo, Avanti Shrikumar, Johnny Israeli, Peyton Greenside, Chris Probert, Anna Scherbina, Rahul Mohan, Nathan Boley, Anshul Kundaje
VISUALIZATION OF COMPLEX DISEASES AND RELATED GENE SETS ... 99
Modest von Korff, Tobias Fink, Thomas Sander
PRECISION MEDICINE: FROM GENOTYPES AND MOLECULAR PHENOTYPES TOWARDS IMPROVED HEALTH AND THERAPIES ... 100
FINDINGS FROM THE FOURTH CRITICAL ASSESSMENT OF GENOME INTERPRETATION, A COMMUNITY EXPERIMENT TO EVALUATE PHENOTYPE PREDICTION ... 101
Steven E. Brenner, Gaia Andreoletti, Roger A Hoskins, John Moult, CAGI Participants
ASTROLABE: EXPANSION TO CYP2C9 AND CYP2C1 ... 102
Andrea Gaedigk, Greyson P. Twist, Sarah Soden, Emily G. Farrow, Neil A. Miller
HUMAN KINASES DISPLAY MUTATIONAL HOTSPOTS AT COGNATE POSITIONS WITHIN CANCER ... 103
Jonathan Gallion, Angela D. Wilkins, Olivier Lichtarge
SCOTCH: A NOVEL METHOD TO DETECT INSERTIONS AND DELETIONS FROM NGS DATA ... 104
Rachel Goldfeder, Euan Ashley
MAYO OMICS REPOSITORY FOR TRANSLATIONAL MEDICINE ... 105
Iain Horton, Jeanette Eckel-Passow, Steven Hart, Shannon McDonnell, David Mead, Gay Reed, Greg Dougherty, Jason Ross, Julie Swank, Mark Myers, Mathieu Wiepert, Rama Volety, Tony Stai, Yaxiong Lin, Robert Freimuth
PHARMACOGENOMICS CLINICAL ANNOTATION TOOL (PHARMCAT) ... 106
T.E. Klein, M. Whirl-Carrillo, R.M. Whaley, M. Woon, K. Sangkuhl, Lester G. Carter, H.M. Dunnenberger, P.E. Empey, A.T. Frase, R.R. Freimuth, A. Gaedigk, A. Gordon, C. Haidar, J.K. Hicks, J.M. Hoffman, M.T. Lee, N. Miller, S.D. Mooney, T.N. Person, J.F. Peterson, M.V. Relling, S.A. Scott, G. Twist, A. Verma, M.S. Williams, C. Wu, W. Yang, M.D. Ritchie
PCSK9 MODULATING VARIANTS IN FAMILIAL HYPERCHOLESTEROLEMIA ... 107
Sarathbabu Krishnamurthy, Diane Smelser, Manickam Kandamurugu, Joseph Leader, Noura S. Abul-Husn, Alan R. Shuldiner, David H. Ledbetter, Frederick E. Dewey, David J. Carey, Michael F. Murray, Raghu P.R. Metpally
INTEGRATIVE NETWORK ANALYSIS OF PROSTATE TISSUE LINCRNA-MRNA EXPRESSION PROFILES REVEALS POTENTIAL REGULATORY MECHANISMS OF PROSTATE CANCER RISK LOCI ... 108
Nicholas B. Larson, Shannon McDonnell, Zach Fogarty, Melissa Larson, John Cheville, Shaun Riska, Saurabh Baheti, Asha A. Nair, Daniel O’Brien, Jaime Davila, Daniel Schaid, Stephen N. Thibodeau
INTEGRATED ANALYSIS OF GENOMICS, PROTEOMICS, AND PHOSPHOPROTEOMICS IN CELLS AND TUMOR SAMPLES ... 109
Jason E. McDermott, Tao Liu, Samuel Payne, Vladislav Petyuk, Richard Smith, Philipp Mertins, Steven Carr, Karin Rodland
NETDX: PATIENT CLASSIFICATION USING INTEGRATED PATIENT SIMILARITY NETWORKS ... 110
Shraddha Pai, Shirley Hui, Ruth Isserlin, Hussam Kaka, Gary D. Bader
PREVALENCE AND DETECTION OF LOW-ALLELE-FRACTION VARIANTS IN CLINICAL CANCER SAMPLES ... 111
Hyun-Tae Shin, Jae Won Yun, Nayoung K.D. Kim, Yoon-La Choi, Woong-Yang Park, Peter J. Park
A METHYLATION-TO-EXPRESSION FEATURE MODEL FOR GENERATING ACCURATE PROGNOSTIC RISK SCORES AND IDENTIFYING DISEASE TARGETS ... 112
Jeffrey A. Thompson, Carmen J. Marsit
CYP2D6 DIPLOTYPE CALLING FROM WGS USING ASTROLABE: UPDATE ... 113
Andrea Gaedigk, Greyson P. Twist, Sarah Soden, Emily G. Farrow, Neil A. Miller
INTEGRATION, INTERPRETATION AND DISPLAY OF MULTI-OMIC DATA FOR PRECISION MEDICINE ... 114
David S. Wishart, Ana Marcu, An Chi Guo, Ash Anwar, Solveig Johannessen, Craig Knox, Michael Wilson, Christoph H. Borchers, Pieter Cullis, Robert Fraser
BIOTHINGS APIS: LINKED HIGH-PERFORMANCE APIS FOR BIOLOGICAL ENTITIES ... 115
Jiwen Xin, Cyrus Afrasiabi, Sebastien Lelong, Ginger Tsueng, Sean D. Mooney, Andrew I. Su, Chunlei Wu

SINGLE-CELL ANALYSIS AND MODELLING OF CELL POPULATION HETEROGENEITY ... 116
SINGLE CELL SIGNALING STATES REVEAL INDUCTION OF NON-GENETIC VARIATION IN RESISTANCE TO TRAIL-INDUCED APOPTOSIS ... 117
Reema Baskar, Harris Fienberg, Garry Nolan, Sean Bendall
A NOVEL K-NEAREST NEIGHBORS APPROACH TO COMPARE MULTIPLE BIOLOGICAL CONDITIONS IN SINGLE CELL DATA ... 118
Tyler J. Burns, Garry P. Nolan, Nikolay Samusik
SINGLE-CELL RNA SEQUENCING IN PRIMARY GLIOBLASTOMA: IMPROVING ANALYSIS OF HETEROGENEOUS SAMPLES BY INCORPORATING QUANTIFICATION OF UNCERTAINTY ... 119
Wendy Marie Ingram, Debdipto Misra, Nicholas F. Marko, Marylyn Ritchie
REGISTRATION OF FLOW CYTOMETRY DATA USING SWIFT CLUSTER TEMPLATES TO REMOVE CHANNEL-SPECIFIC OR CLUSTER-SPECIFIC VARIATION ... 120
Jonathan A. Rebhahn, Sally A. Quataert, Gaurav Sharma, Tim R. Mosmann

WORKSHOP: NO BOUNDARY THINKING IN BIOINFORMATICS ... 121
ENABLING RICHER DATA INTEGRATION FOR GENOMIC EPIDEMIOLOGY ... 122
E. Griffiths, D. Dooley, C. Bertelli, J. Adam, F. Bristow, T. Matthews, A. Petkau, M. Courtot, J.A. Carriço, A. Keddy, R. Beiko, L.M. Schriml, E. Taboada, M. Graham, G. Van Domselaar, W. Hsiao, F. Brinkman

AUTHOR INDEX ... 123
1
COMPUTATIONAL APPROACHES TO UNDERSTANDING THE EVOLUTION OF MOLECULAR FUNCTION
PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS
2
IDENTIFICATION AND ANALYSIS OF BACTERIAL GENOMIC METABOLIC SIGNATURES
Nathan Bowerman (1), Nathan Tintle (2), Matthew DeJongh (3), Aaron A. Best (1)
(1) Department of Biology, Hope College; (2) Department of Mathematics and Statistics, Dordt College; (3) Department of Computer Science, Hope College
Presenting author: Aaron Best
With continued rapid growth in the number and quality of fully sequenced and accurately annotated bacterial genomes, we have unprecedented opportunities to understand metabolic diversity. We selected 101 diverse and representative completely sequenced bacteria and implemented a manual curation effort to identify 846 unique metabolic variants present in these bacteria. The presence or absence of these variants acts as a metabolic signature for each of the bacteria, which can then be used to understand similarities and differences between and across bacterial groups. We propose a novel and robust method of summarizing metabolic diversity using metabolic signatures and use this method to generate a metabolic tree, clustering metabolically similar organisms. Analysis of the resulting metabolic tree confirms strong associations with well-established biological results and gives direct insight into the particular metabolic variants that are most predictive of metabolic diversity. The positive results of this manual curation effort and novel method development suggest that future work is needed to further expand the set of bacteria to which this approach is applied and to use the resulting tree to test broad questions about metabolic diversity and complexity across the bacterial tree of life.
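The idea of a presence/absence signature feeding a tree can be illustrated with a small sketch. The abstract does not state which distance or tree-building method the authors use, so the Jaccard distance and greedy average-linkage merging below are assumptions chosen for illustration, and the organism and variant names are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of variant identifiers."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage_tree(signatures):
    """Greedy agglomerative clustering; returns a nested tuple of organism names.

    signatures maps an organism name to the set of metabolic variants present.
    Assumption: average linkage over Jaccard distances, not the authors' method.
    """
    # Each cluster is (subtree, list of member organism names).
    clusters = [(name, [name]) for name in sorted(signatures)]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            members_i, members_j = clusters[i][1], clusters[j][1]
            total = sum(
                jaccard_distance(signatures[x], signatures[y])
                for x in members_i
                for y in members_j
            )
            mean = total / (len(members_i) * len(members_j))
            if best is None or mean < best[0]:
                best = (mean, i, j)
        _, i, j = best
        merged = (
            (clusters[i][0], clusters[j][0]),
            clusters[i][1] + clusters[j][1],
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Hypothetical signatures: which variants are present in each organism.
signatures = {
    "org_A": {"v1", "v2", "v3"},
    "org_B": {"v1", "v2", "v4"},
    "org_C": {"v7", "v8"},
}
tree = average_linkage_tree(signatures)
```

Here org_A and org_B share two of four variants, so they merge first and org_C joins last. A real analysis over 101 genomes and 846 variants would more likely use an established clustering library than this quadratic loop.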
3
WHEN SHOULD WE NOT TRANSFER FUNCTIONAL ANNOTATION BETWEEN SEQUENCE PARALOGS?
Mengfei Cao, Lenore J. Cowen
Tufts University
Presenting author: Lenore Cowen
Current automated computational methods to assign functional labels to unstudied genes often involve transferring annotation from orthologous or paralogous genes; however, such genes can evolve divergent functions, making such transfer inappropriate. We consider the problem of determining when it is correct to make such an assignment between paralogs. We construct a benchmark dataset of two types of similar paralogous pairs of genes in the well-studied model organism S. cerevisiae: one set of pairs where single deletion mutants have very similar phenotypes (implying similar functions), and another set of pairs where single deletion mutants have very divergent phenotypes (implying different functions). State-of-the-art methods for this problem determine the evolutionary history of the paralogs with reference to multiple related species. Here, we ask a first and simpler question: we explore to what extent any computational method with access only to data from a single species can solve this problem. We consider divergence data (at both the amino acid and nucleotide levels) and network data (based on the yeast protein-protein interaction network, as captured in BioGRID), and ask if we can extract features from these data that can distinguish between these sets of paralogous gene pairs. We find that the best features come from measures of sequence divergence; however, simple network measures based on degree, centrality, shortest path, diffusion state distance (DSD), or shared neighborhood in the yeast protein-protein interaction (PPI) network also contain some signal. One should, in general, not transfer function if sequence divergence is too high. Further improvements in classification will need to come from more computationally expensive but much more powerful evolutionary methods that incorporate ancestral states and measure evolutionary divergence over multiple species based on evolutionary trees.
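The rule of thumb that annotation should not be transferred when sequence divergence is too high can be sketched as follows. The abstract does not specify the divergence measure or any cutoff, so the proportion of mismatched aligned positions (p-distance) and the `max_divergence` value of 0.5 are assumptions for illustration only, as are the toy sequences.

```python
def p_distance(aligned_a, aligned_b):
    """Fraction of mismatched positions between two pre-aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return mismatches / compared


def should_transfer(aligned_a, aligned_b, max_divergence=0.5):
    """Hypothetical decision rule: transfer only below a divergence cutoff."""
    return p_distance(aligned_a, aligned_b) <= max_divergence


# Toy aligned amino acid sequences for a paralogous pair.
divergence = p_distance("MKT-LV", "MRTALV")
```

In this toy pair, five ungapped columns are compared and one differs. Any real cutoff would have to be calibrated against a benchmark such as the phenotype-based paralog sets described in the abstract.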
Page 14
4
PROSNET: INTEGRATING HOMOLOGY WITH MOLECULAR NETWORKS FOR PROTEIN FUNCTION PREDICTION
Sheng Wang, Meng Qu, Jian Peng
University of Illinois Urbana-Champaign
Presenting author: Sheng Wang
Automated annotation of protein function has become a critical task in the post-genomic era. Network-based approaches and homology-based approaches have been widely used and recently tested in large-scale community-wide assessment experiments. It is natural to integrate network data with homology information to further improve the predictive performance. However, integrating these two heterogeneous, high-dimensional and noisy datasets is non-trivial. In this work, we introduce a novel protein function prediction algorithm, ProSNet. An integrated heterogeneous network is first built to include molecular networks of multiple species and link together homologous proteins across multiple species. Based on this integrated network, a dimensionality reduction algorithm is introduced to obtain compact low-dimensional vectors to encode proteins in the network. Finally, we develop machine learning classification algorithms that take the vectors as input and make predictions by transferring annotations both within each species and across different species. Extensive experiments on five major species demonstrate that our integration of homology with molecular networks substantially improves the predictive performance over existing approaches.
Page 15
5
ON THE POWER AND LIMITS OF SEQUENCE SIMILARITY BASED CLUSTERING OF PROTEINS INTO FAMILIES
Christian Wiwie, Richard Röttger
University of Southern Denmark
Presenting author: Richard Röttger
Over the last decades, we have observed an ongoing tremendous growth of available sequencing data fueled by the advancements in wet-lab technology. The sequencing information is only the beginning of the actual understanding of how organisms survive and prosper. It is, for instance, equally important to also unravel the proteomic repertoire of an organism. A classical computational approach for detecting protein families is a sequence-based similarity calculation coupled with a subsequent cluster analysis. In this work we have intensively analyzed various clustering tools on a large scale. We used the data to investigate the behavior of the tools' parameters, underlining the diversity of the protein families. Furthermore, we trained regression models for predicting the expected performance of a clustering tool for an unknown dataset and aimed to also suggest optimal parameters in an automated fashion. Our analysis demonstrates the benefits and limitations of the clustering of proteins with low sequence similarity, indicating that each protein family requires its own distinct set of tools and parameters. All results, a tool prediction service, and additional supporting material are also available online at http://proteinclustering.compbio.sdu.dk/
Page 16
6
IMAGING GENOMICS
PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS
Page 17
7
INTEGRATIVE ANALYSIS FOR LUNG ADENOCARCINOMA PREDICTS MORPHOLOGICAL FEATURES ASSOCIATED WITH GENETIC VARIATIONS
Chao Wang¹, Hai Su², Lin Yang², Kun Huang¹
¹The Ohio State University, ²University of Florida
Presenting author: Kun Huang
Lung cancer is one of the most deadly cancers and lung adenocarcinoma (LUAD) is the most common histological type of lung cancer. However, LUAD is highly heterogeneous due to genetic differences as well as phenotypic differences such as cellular and tissue morphology. In this paper, we systematically examine the relationships between histological features and gene transcription. Specifically, we calculated 283 morphological features from histology images for 201 LUAD patients from the TCGA project and identified the morphological feature with strong correlation with patient outcome. We then modeled the morphology feature with multiple co-expressed gene clusters using Lasso regression. Many of the gene clusters are highly associated with genetic variations, specifically DNA copy number variations, implying that genetic variations play important roles in the development of cancer morphology. As far as we know, our finding is the first to directly link the genetic variations and functional genomics to LUAD histology. These observations will lead to new insight on lung cancer development and potential new integrative biomarkers for prediction of patient prognosis and response to treatments.
Page 18
8
IDENTIFICATION OF DISCRIMINATIVE IMAGING PROTEOMICS ASSOCIATIONS IN ALZHEIMER'S DISEASE VIA A NOVEL SPARSE CORRELATION MODEL
Jingwen Yan, Shannon L. Risacher, Kwangsik Nho, Andrew J. Saykin, Li Shen
Indiana University
Presenting author: Jingwen Yan
Brain imaging and protein expression, from both cerebrospinal fluid and blood plasma, have been found to provide complementary information in predicting the clinical outcomes of Alzheimer's disease (AD). But the underlying associations that contribute to such a complementary relationship have not been previously studied. In this work, we perform an imaging proteomics association analysis to explore how they are related to each other. While traditional association models, such as Sparse Canonical Correlation Analysis (SCCA), cannot guarantee the selection of only disease-relevant biomarkers and associations, we propose a novel discriminative SCCA (denoted as DSCCA) model with new penalty terms to account for the disease status information. Given brain imaging, proteomic and diagnostic data, the proposed model can perform a joint association and multi-class discrimination analysis, such that we can not only identify disease-relevant multimodal biomarkers, but also reveal strong associations between them. Based on a real imaging proteomic dataset, the empirical results show that DSCCA and traditional SCCA have comparable association performances. But in a further classification analysis, canonical variables of imaging and proteomic data obtained in DSCCA demonstrate much more discrimination power toward multiple pairs of diagnosis groups than those obtained in SCCA.
Page 19
9
ENFORCING CO-EXPRESSION IN MULTIMODAL REGRESSION FRAMEWORK
Pascal Zille¹, Vince D. Calhoun², Yu-Ping Wang¹
¹Tulane University, ²University of New Mexico
Presenting author: Pascal Zille
We consider the problem of multimodal data integration for the study of complex neurological diseases (e.g. schizophrenia). Among the challenges arising in such situations, estimating the link between genetic and neurological variability within a population sample has been a promising direction. A wide variety of statistical models arose from such applications. For example, Lasso regression and its multitask extension are often used to fit a multivariate linear relationship between given phenotype(s) and associated observations. Other approaches, such as canonical correlation analysis (CCA), are widely used to extract relationships between sets of variables from different modalities. In this paper, we propose an exploratory multivariate method combining these two methods. More specifically, we rely on a 'CCA-type' formulation in order to regularize the classical multimodal Lasso regression problem. The underlying motivation is to extract discriminative variables that are also co-expressed across modalities. We first evaluate the method on a simulated dataset, and further validate it using Single Nucleotide Polymorphisms (SNP) and functional Magnetic Resonance Imaging (fMRI) data for the study of schizophrenia.
Page 20
10
METHODS TO ENSURE THE REPRODUCIBILITY OF BIOMEDICAL RESEARCH
PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS
Page 21
11
EXPLORING THE REPRODUCIBILITY OF PROBABILISTIC CAUSAL MOLECULAR NETWORK MODELS
Ariella Cohain, Aparna A. Divaraniya, Kuixi Zhu, Joseph R. Scarpa, Andrew Kasarskis, Jun Zhu, Rui Chang, Joel T. Dudley, Eric E. Schadt
Icahn Institute and Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai
Presenting author: Ariella Cohain
Network reconstruction algorithms are increasingly being employed in biomedical and life sciences research to integrate large-scale, high-dimensional data informing on living systems. One particular class of probabilistic causal networks being applied to model the complexity and causal structure of biological data is Bayesian networks (BNs). BNs provide an elegant mathematical framework not only for inferring causal relationships among many different molecular and higher order phenotypes, but also for incorporating highly diverse priors that provide an efficient path for incorporating existing knowledge. While significant methodological developments have broadly enabled the application of BNs to generate and validate meaningful biological hypotheses, the reproducibility of BNs in this context has not been systematically explored. In this study, we aim to determine the criteria for generating reproducible BNs in the context of transcription-based regulatory networks. We utilize two unique tissues from independent datasets, whole blood from the GTEx Consortium and liver from the Stockholm-Tartu Atherosclerosis Reverse Network Engineering Team (STARNET) study. We evaluated the reproducibility of the BNs by creating networks on data subsampled at different levels from each cohort and comparing these networks to the BNs constructed using the complete data. To help validate our results, we used simulated networks at varying sample sizes. Our study indicates that reproducibility of BNs in biological research is an issue worthy of further consideration, especially in light of the many publications that now employ findings from such constructs without appropriate attention paid to reproducibility. We find that while edge-to-edge reproducibility is strongly dependent on sample size, identification of more highly connected key driver nodes in BNs can be carried out with high confidence across a range of sample sizes.
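One simple way to picture the edge-to-edge comparison between a subsampled network and the full-data network is as an overlap of directed edge sets. The sketch below is an assumption-laden illustration, not the authors' procedure: the function name, the recall/precision framing, and the example edges are all hypothetical.

```python
def edge_reproducibility(full_edges, sub_edges):
    """Compare directed edges of a subsampled network with the full-data network.

    Returns (recall, precision):
      recall    = share of full-data edges also found in the subsampled network
      precision = share of subsampled edges also found in the full-data network
    """
    full_edges, sub_edges = set(full_edges), set(sub_edges)
    shared = full_edges & sub_edges
    recall = len(shared) / len(full_edges) if full_edges else 0.0
    precision = len(shared) / len(sub_edges) if sub_edges else 0.0
    return recall, precision


# Hypothetical networks: each edge is a (parent, child) pair of gene names.
full_network = {("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")}
subsampled_network = {("A", "B"), ("B", "C"), ("D", "C")}  # one edge reversed

recall, precision = edge_reproducibility(full_network, subsampled_network)
```

Because edges are treated as directed, the reversed edge ("D", "C") does not count as a match; an undirected comparison would be a different, more lenient choice.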
Page 22
12
REPRODUCIBLE DRUG REPURPOSING: WHEN SIMILARITY DOES NOT SUFFICE
Emre Guney
Joint IRB-BSC-CRG Program in Computational Biology - Institute for Research in Biomedicine (IRB) Barcelona
Presenting author: Emre Guney
Repurposing existing drugs for new uses has attracted considerable attention over the past years. To identify potential candidates that could be repositioned for a new indication, many studies make use of chemical, target, and side effect similarity between drugs to train classifiers. Despite promising prediction accuracies of these supervised computational models, their use in practice, such as for rare diseases, is hindered by the assumption that there are already known and similar drugs for a given condition of interest. In this study, using publicly available data sets, we question the prediction accuracies of supervised approaches based on drug similarity when the drugs in the training and the test set are completely disjoint. We first build a Python platform to generate reproducible similarity-based drug repurposing models. Next, we show that, while a simple chemical, target, and side effect similarity based machine learning method can achieve good performance on the benchmark data set, the prediction performance drops sharply when the drugs in the folds of the cross validation are not overlapping and the similarity information within the training and test sets is used independently. These intriguing results suggest revisiting the assumptions underlying the validation scenarios of similarity-based methods and underline the need for unsupervised approaches to identify novel drug uses inside the unexplored pharmacological space. We make the digital notebook containing the Python code to replicate our analysis, which involves the drug repurposing platform based on machine learning models and the proposed disjoint cross fold generation method, freely available at github.com/emreg00/repurpose.
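The idea of drug-disjoint cross-validation folds can be sketched in a few lines. This is a hypothetical illustration and not the code from the repository named above: the function name, the round-robin assignment, and the example drug-disease pairs are assumptions.

```python
import random


def disjoint_folds(pairs, n_folds, seed=0):
    """Split (drug, disease) pairs into folds so that no drug spans two folds."""
    drugs = sorted({drug for drug, _ in pairs})
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(drugs)
    # Assign whole drugs to folds in round-robin order.
    fold_of = {drug: i % n_folds for i, drug in enumerate(drugs)}
    folds = [[] for _ in range(n_folds)]
    for drug, disease in pairs:
        folds[fold_of[drug]].append((drug, disease))
    return folds


# Hypothetical drug-disease pairs.
pairs = [
    ("drug_1", "disease_a"), ("drug_1", "disease_b"),
    ("drug_2", "disease_a"), ("drug_3", "disease_c"),
    ("drug_4", "disease_b"), ("drug_5", "disease_c"),
    ("drug_6", "disease_a"),
]

folds = disjoint_folds(pairs, n_folds=3)
```

With a conventional split, pairs would be shuffled individually, so the same drug could appear in both training and test folds; assigning whole drugs to folds rules that out by construction.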
Page 23
13
EMPOWERING MULTI-COHORT GENE EXPRESSION ANALYSIS TO INCREASE REPRODUCIBILITY
Winston A. Haynes, Francesco Vallania, Charles Liu, Erika Bongen, Aurelie Tomczak, Marta Andres-Terrè, Shane Lofgren, Andrew Tam, Cole A. Deisseroth, Matthew D. Li, Timothy E. Sweeney, Purvesh Khatri
Stanford University
Presenting author: Winston Haynes
A major contributor to the scientific reproducibility crisis has been that the results from homogeneous, single-center studies do not generalize to heterogeneous, real-world populations. Multi-cohort gene expression analysis has helped to increase reproducibility by aggregating data from diverse populations into a single analysis. To make the multi-cohort analysis process more feasible, we have assembled an analysis pipeline which implements rigorously studied meta-analysis best practices. We have compiled and made publicly available the results of our own multi-cohort gene expression analysis of 103 diseases, spanning 615 studies and 36,915 samples, through a novel and interactive web application. As a result, we have made both the process of and the results from multi-cohort gene expression analysis more approachable for non-technical users.
Page 24
14
RABIX: AN OPEN-SOURCE WORKFLOW EXECUTOR SUPPORTING RECOMPUTABILITY AND INTEROPERABILITY OF WORKFLOW DESCRIPTIONS
Gaurav Kaushik, Sinisa Ivkovic, Janko Simonovic, Nebojsa Tijanic, Brandi Davis-Dusenbery, Deniz Kural
Seven Bridges Genomics
Presenting author: Gaurav Kaushik
As biomedical data has become increasingly easy to generate in large quantities, the methods used to analyze it have proliferated rapidly. Reproducible and reusable methods are required to learn from large volumes of data reliably. To address this issue, numerous groups have developed workflow specifications or execution engines, which provide a framework with which to perform a sequence of analyses. One such specification is the Common Workflow Language, an emerging standard which provides a robust and flexible framework for describing data analysis tools and workflows. In addition, reproducibility can be furthered by executors or workflow engines which interpret the specification and enable additional features, such as error logging, file organization, optimizations to computation and job scheduling, and allow for easy computing on large volumes of data. To this end, we have developed the Rabix Executor, an open-source workflow engine for the purposes of improving reproducibility through reusability and interoperability of workflow descriptions.
Page 25
15
DATA SHARING AND CLINICAL GENETIC TESTING: SUCCESSES AND CHALLENGES
Shan Yang¹, Melissa Cline², Can Zhang², Benedict Paten², Stephen E. Lincoln¹
¹Invitae, ²University of California Santa Cruz
Presenting author: Stephen Lincoln
Open sharing of clinical genetic data promises to both monitor and eventually improve the reproducibility of variant interpretation among clinical testing laboratories. A significant public data resource has been developed by the NIH ClinVar initiative, which includes submissions from hundreds of laboratories and clinics worldwide. We analyzed a subset of ClinVar data focused on specific clinical areas and we find high reproducibility (>90% concordance) among labs, although challenges for the community are clearly identified in this dataset. We further review results for the commonly tested BRCA1 and BRCA2 genes, which show even higher concordance, although the significant fragmentation of data into different silos presents an ongoing challenge now being addressed by the BRCA Exchange. We encourage all laboratories and clinics to contribute to these important resources.
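To make the notion of inter-laboratory concordance concrete, the sketch below computes the share of variants on which all submitting laboratories agree. The definition used here (full agreement among labs, restricted to variants with at least two submissions), the lab names, and the classifications are assumptions for illustration; the abstract does not spell out how concordance was calculated.

```python
def concordance(submissions):
    """Fraction of multiply-submitted variants whose labs all give the same class.

    submissions maps variant ID -> {lab name: classification}.
    Variants seen by fewer than two labs are ignored; returns None if
    no variant has at least two submissions.
    """
    shared = [labs for labs in submissions.values() if len(labs) >= 2]
    if not shared:
        return None
    agreeing = sum(1 for labs in shared if len(set(labs.values())) == 1)
    return agreeing / len(shared)


# Hypothetical submissions from three laboratories.
submissions = {
    "variant_1": {"lab_A": "pathogenic", "lab_B": "pathogenic"},
    "variant_2": {"lab_A": "benign", "lab_B": "uncertain", "lab_C": "benign"},
    "variant_3": {"lab_A": "benign"},  # single submitter, not counted
    "variant_4": {"lab_B": "benign", "lab_C": "benign"},
}

rate = concordance(submissions)
```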
Page 26
16
PATTERNS IN BIOMEDICAL DATA – HOW DO WE FIND THEM?
PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS
Page 27
17
LEARNING ATTRIBUTES OF DISEASE PROGRESSION FROM TRAJECTORIES OF SPARSE LAB VALUES
Vibhu Agarwal¹, Nigam H. Shah²
¹Biomedical Informatics Training Program, Stanford University, ²The Center for Biomedical Informatics Research, Stanford University
Presenting author: Vibhu Agarwal
There is heterogeneity in the manifestation of diseases; therefore, it is essential to understand the patterns of progression of a disease in a given population for disease management as well as for clinical research. Disease status is often summarized by repeated recordings of one or more physiological measures. As a result, historical values of these physiological measures for a population sample can be used to characterize disease progression patterns. We use a method for clustering sparse functional data for identifying sub-groups within a cohort of patients with chronic kidney disease (CKD), based on the trajectories of their Creatinine measurements. We demonstrate through a proof-of-principle study how the two sub-groups that display distinct patterns of disease progression may be compared on clinical attributes that correspond to the maximum difference in progression patterns. The key attributes that distinguish the two sub-groups appear to have support in published literature and clinical practice related to CKD.
Page 28
18
COMPUTER AIDED IMAGE SEGMENTATION AND CLASSIFICATION FOR VIABLE AND NON-VIABLE TUMOR IDENTIFICATION IN OSTEOSARCOMA
Harish Babu Arunachalam¹, Rashika Mishra¹, Bogdan Armaselu¹, Ovidiu Daescu¹, Maria Martinez¹, Patrick Leavey¹, Dinesh Rakheja², Kevin Cederberg², Anita Sengupta², Molly Ni'Suilleabhain²
¹University of Texas at Dallas, ²University of Texas Southwestern Medical Center
Presenting author: Harish Babu Arunachalam
Osteosarcoma is one of the most common types of bone cancer in children. To gauge the extent of cancer treatment response in the patient after surgical resection, the H&E stained image slides are manually evaluated by pathologists to estimate the percentage of necrosis, a time-consuming process prone to observer bias and inaccuracy. Digital image analysis is a potential method to automate this process, thus saving time and providing a more accurate evaluation. The slides are scanned in Aperio Scanscope, converted to digital Whole Slide Images (WSIs) and stored in SVS format. These are high resolution images, of the order of 10^9 pixels, allowing up to 40X magnification factor. This paper proposes an image segmentation and analysis technique for segmenting tumor and non-tumor regions in histopathological WSIs of osteosarcoma datasets. Our approach is a combination of pixel-based and object-based methods which utilize tumor properties such as nuclei cluster, density, and circularity to classify tumor regions as viable or non-viable. A K-Means clustering technique is used for tumor isolation using color normalization, followed by a multi-threshold Otsu segmentation technique to further classify tumor regions as viable or non-viable. Then a Flood-fill algorithm is applied to cluster similar pixels into cellular objects and compute cluster data for further analysis of regions under study. To the best of our knowledge this is the first comprehensive solution that is able to produce such a classification for Osteosarcoma cancer. The results are very conclusive in identifying viable and non-viable tumor regions. In our experiments, the accuracy of the discussed approach is 100% in viable tumor and coagulative necrosis identification while it is around 90% for fibrosis and acellular/hypocellular tumor osteoid, for all the sampled datasets used. We expect the developed software to lead to a significant increase in accuracy and decrease in inter-observer variability in assessment of necrosis by the pathologists and a reduction in the time spent by the pathologists in such assessments.
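For readers unfamiliar with Otsu thresholding, the sketch below shows the single-threshold version on a flat list of grayscale intensities. It is a simplification offered under stated assumptions: the abstract describes a multi-threshold variant applied after K-Means clustering and color normalization, none of which is reproduced here, and the pixel values are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes between-class variance.

    Pixels with intensity <= t form one class, the rest the other.
    `pixels` is an iterable of integers in the range [0, levels).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_score = 0, -1.0
    count_low, sum_low = 0, 0
    for t in range(levels):
        count_low += hist[t]
        sum_low += t * hist[t]
        count_high = total - count_low
        if count_low == 0:
            continue  # lower class still empty
        if count_high == 0:
            break  # upper class empty from here on
        mean_low = sum_low / count_low
        mean_high = (sum_all - sum_low) / count_high
        # Between-class variance, up to a constant factor of 1 / total**2.
        score = count_low * count_high * (mean_low - mean_high) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


# Hypothetical image: half dark pixels, half bright pixels.
pixels = [10] * 50 + [200] * 50
threshold = otsu_threshold(pixels)
dark = [p for p in pixels if p <= threshold]
bright = [p for p in pixels if p > threshold]
```

A multi-threshold extension searches over several cut points at once so that more than two intensity classes can be separated.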
Page 29
19
MISSING DATA IMPUTATION IN THE ELECTRONIC HEALTH RECORD USING DEEPLY LEARNED AUTOENCODERS
Brett K. Beaulieu-Jones¹, Jason H. Moore², The Pooled Resource Open-Access ALS Clinical Trials Consortium
¹Genomics and Computational Biology Graduate Group, Computational Genetics Lab, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania; ²Computational Genetics Lab, Institute for Biomedical Informatics, University of Pennsylvania
Presenting author: Brett Beaulieu-Jones
Electronic health records (EHRs) have become a vital source of patient outcome data but the widespread prevalence of missing data presents a major challenge. Different causes of missing data in the EHR data may introduce unintentional bias. Here, we compare the effectiveness of popular multiple imputation strategies with a deeply learned autoencoder using the Pooled Resource Open-Access ALS Clinical Trials Database (PRO-ACT). To evaluate performance, we examined imputation accuracy for known values simulated to be either missing completely at random or missing not at random. We also compared ALS disease progression prediction across different imputation models. Autoencoders showed strong performance for imputation accuracy and contributed to the strongest disease progression predictor. Finally, we show that despite clinical heterogeneity, ALS disease progression appears homogeneous with time from onset being the most important predictor.
Page 30
20
DEVELOPMENT AND PERFORMANCE OF TEXT-MINING ALGORITHMS TO EXTRACT SOCIOECONOMIC STATUS FROM DE-IDENTIFIED ELECTRONIC HEALTH RECORDS
Brittany M. Hollister¹, Nicole A. Restrepo², Eric Farber-Eger³, Dana C. Crawford², Melinda C. Aldrich⁴, Amy Non⁵
¹Vanderbilt Genetic Institute, Vanderbilt University; ²Institute for Computational Biology and Department of Epidemiology and Biostatistics, Case Western Reserve University; ³Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University; ⁴Department of Thoracic Surgery and Division of Epidemiology, Vanderbilt University Medical Center; ⁵Department of Anthropology, University of California San Diego
Presenting author: Brittany Hollister
Socioeconomic status (SES) is a fundamental contributor to health, and a key factor underlying racial disparities in disease. However, SES data are rarely included in genetic studies due in part to the difficulty of collecting these data when studies were not originally designed for that purpose. The emergence of large clinic-based biobanks linked to electronic health records (EHRs) provides research access to large patient populations with longitudinal phenotype data captured in structured fields as billing codes, procedure codes, and prescriptions. SES data, however, are often not explicitly recorded in structured fields, but rather recorded in the free text of clinical notes and communications. The content and completeness of these data vary widely by practitioner. To enable gene-environment studies that consider SES as an exposure, we sought to extract SES variables from racial/ethnic minority adult patients (n=9,977) in BioVU, the Vanderbilt University Medical Center biorepository linked to de-identified EHRs. We developed several measures of SES using information available within the de-identified EHR, including broad categories of occupation, education, insurance status, and homelessness. Two hundred patients were randomly selected for manual review to develop a set of seven algorithms for extracting SES information from de-identified EHRs. The algorithms consist of 15 categories of information, with 830 unique search terms. SES data extracted from manual review of 50 randomly selected records were compared to data produced by the algorithm, resulting in positive predictive values of 80.0% (education), 85.4% (occupation), 87.5% (unemployment), 63.6% (retirement), 23.1% (uninsured), 81.8% (Medicaid), and 33.3% (homelessness), suggesting some categories of SES data are easier to extract in this EHR than others. The SES data extraction approach developed here will enable future EHR-based genetic studies to integrate SES information into statistical analyses. Ultimately, incorporation of measures of SES into genetic studies will help elucidate the impact of the social environment on disease risk and outcomes.
Page 31
21
DEMO DASHBOARD: VISUALIZING AND UNDERSTANDING GENOMIC SEQUENCES USING DEEP NEURAL NETWORKS
Jack Lanchantin, Ritambhara Singh, Beilun Wang, Yanjun Qi
University of Virginia
Presenting author: Jack Lanchantin
Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding site (TFBS) classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard) which provides a suite of visualization strategies to extract motifs, or sequence patterns, from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models: convolutional, recurrent, and convolutional-recurrent networks. Our first visualization method is finding a test sequence's saliency map, which uses first-order derivatives to describe the importance of each nucleotide in making the final prediction. Second, considering recurrent models make predictions in a temporal manner (from one end of a TFBS sequence to the other), we introduce temporal output scores, indicating the prediction score of a model over time for a sequential input. Lastly, a class-specific visualization strategy finds the optimal input sequence for a given TFBS positive class via stochastic gradient optimization. Our experimental results indicate that a convolutional-recurrent architecture performs the best among the three architectures. The visualization techniques indicate that CNN-RNN makes predictions by modeling both motifs and dependencies among them.
Page 32
22
PREDICTIVE MODELING OF HOSPITAL READMISSION RATES USING ELECTRONIC MEDICAL RECORD-WIDE MACHINE LEARNING: A CASE-STUDY USING MOUNT SINAI HEART FAILURE COHORT
Khader Shameer¹,², Kipp W. Johnson¹,², Alexandre Yahi⁷, Riccardo Miotto¹,², Li Li¹,², Doran Ricks³, Jebakumar Jebakaran⁴, Patricia Kovatch¹,⁴, Partho P. Sengupta⁵, Annetine Gelijns⁸, Alan Moskovitz⁸, Bruce Darrow⁵, David L. Reich⁶, Andrew Kasarskis¹, Nicholas P. Tatonetti⁷, Sean Pinney⁵, Joel T. Dudley¹,²,⁸*
¹Department of Genetics and Genomics, Icahn Institute of Genomics and Multiscale Biology; ²Institute of Next Generation Healthcare, Mount Sinai Health System, NY; ³Decision Support, Mount Sinai Health System, NY; ⁴Mount Sinai Data Warehouse, Icahn Institute of Genomics and Multiscale Biology, NY; ⁵Zena and Michael A. Wiener Cardiovascular Institute, Icahn School of Medicine at Mount Sinai, NY; ⁶Department of Anesthesiology, Icahn School of Medicine at Mount Sinai, NY; ⁷Departments of Biomedical Informatics, Systems Biology and Medicine, Columbia University Medical Center, NY; ⁸Population Health Science and Policy, Mount Sinai Health System, NY
*Corresponding Author, Email: joel.dudley@mssm.edu
Presenting author: Khader Shameer
Reduction of preventable hospital readmissions that result from chronic or acute conditions like stroke, heart failure, myocardial infarction and pneumonia remains a significant challenge for improving the outcomes and decreasing the cost of healthcare delivery in the United States. Patient readmission rates are relatively high for conditions like heart failure (HF) despite the implementation of high-quality healthcare delivery operation guidelines created by regulatory authorities. Multiple predictive models are currently available to evaluate potential 30-day readmission rates of patients. Most of these models are hypothesis driven and repetitively assess the predictive abilities of the same set of biomarkers as predictive features. In this manuscript, we discuss our attempt to develop a data-driven, electronic medical record-wide (EMR-wide) feature selection approach and subsequent machine learning to predict readmission probabilities. We have assessed a large repertoire of variables from electronic medical records of heart failure patients in a single center. The cohort included 1,068 patients, of whom 178 were readmitted within a 30-day interval (16.66% readmission rate). A total of 4,205 variables were extracted from EMR including diagnosis codes (n=1,763), medications (n=1,028), laboratory measurements (n=846), surgical procedures (n=564) and vital signs (n=4). We designed a multistep modeling strategy using the Naïve Bayes algorithm. In the first step, we created individual models to classify the cases (readmitted) and controls (non-readmitted). In the second step, features contributing to predictive risk from independent models were combined into a composite model using a correlation-based feature selection (CFS) method. All models were trained and tested using a 5-fold cross-validation method, with 70% of the cohort used for training and the remaining 30% for testing. Compared to existing predictive models for HF readmission rates (AUCs in the range of 0.6-0.7), results from our EMR-wide predictive model (AUC=0.78; Accuracy=83.19%) and phenome-wide feature selection strategies are encouraging and reveal the utility of such data-driven machine learning. Fine tuning of the model, replication using multi-center cohorts and a prospective clinical trial to evaluate the clinical utility would help the adoption of the model as a clinical decision system for evaluating readmission status.
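The cohort figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch uses only numbers stated in the abstract; the variable names are illustrative.

```python
# Cohort size and 30-day readmissions as stated in the abstract.
cohort_size = 1068
readmitted = 178
readmission_rate = readmitted / cohort_size * 100  # percent

# Variable counts per EMR category as stated in the abstract.
variables = {
    "diagnosis codes": 1763,
    "medications": 1028,
    "laboratory measurements": 846,
    "surgical procedures": 564,
    "vital signs": 4,
}
total_variables = sum(variables.values())
```

Here 178 of 1,068 is exactly one sixth, which the abstract reports as 16.66%, and the five category counts sum to the stated total of 4,205 variables.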
Page 33
23
METHODS FOR CLUSTERING TIME SERIES DATA ACQUIRED FROM MOBILE HEALTH APPS
Nicole Tignor¹, Pei Wang¹, Nicholas Genes¹, Linda Rogers¹, Steven G. Hershman², Erick R. Scott¹, Micol Zweig¹, Yu-Feng Yvonne Chan¹, Eric E. Schadt¹
¹Icahn School of Medicine at Mount Sinai, ²LifeMap Solutions
Presenting author: Nicole Tignor
In our recent Asthma Mobile Health Study (AMHS), thousands of asthma patients across the country contributed medical data through the iPhone Asthma Health App on a daily basis for an extended period of time. The collected data included daily self-reported asthma symptoms, symptom triggers, and real time geographic location information. The AMHS is just one of many studies occurring in the context of now many thousands of mobile health apps aimed at improving wellness and better managing chronic disease conditions, leveraging the passive and active collection of data from mobile, handheld smart devices. The ability to identify patient groups or patterns of symptoms that might predict adverse outcomes such as asthma exacerbations or hospitalizations from these types of large, prospectively collected data sets would be of significant general interest. However, conventional clustering methods cannot be applied to these types of longitudinally collected data, especially survey data actively collected from app users, given heterogeneous patterns of missing values due to: 1) varying survey response rates among different users, 2) varying survey response rates over time of each user, and 3) non-overlapping periods of enrollment among different users. To handle such complicated missing data structure, we proposed a probability imputation model to infer missing data. We also employed a consensus clustering strategy in tandem with the multiple imputation procedure. Through simulation studies under a range of scenarios reflecting real data conditions, we identified favorable performance of the proposed method over other strategies that impute the missing value through low-rank matrix completion. When applying the proposed new method to study asthma triggers and symptoms collected as part of the AMHS, we identified several patient groups with distinct phenotype patterns. Further validation of the methods described in this paper might be used to identify clinically important patterns in large data sets with complicated missing data structure, improving the ability to use such data sets to identify at-risk populations for potential intervention.
Page 34
24
A NEW RELEVANCE ESTIMATOR FOR THE COMPILATION AND VISUALIZATION OF DISEASE PATTERNS AND POTENTIAL DRUG TARGETS
Modest von Korff, Tobias Fink, Thomas Sander
Research Information Management, Actelion Pharmaceuticals Ltd.
Presenting author: Modest von Korff
A new computational method is presented to extract disease patterns from heterogeneous and text-based data. For this study, 22 million PubMed records were mined for co-occurrences of gene name synonyms and disease MeSH terms. The resulting publication counts were transferred into a matrix M_data. In this matrix, a disease was represented by a row and a gene by a column. Each field in the matrix represented the publication count for a co-occurring disease–gene pair. A second matrix with identical dimensions, M_relevance, was derived from M_data. To create M_relevance, the values from M_data were normalized. The normalized values were multiplied by the column-wise calculated Gini coefficient. This multiplication resulted in a relevance estimator for every gene in relation to a disease. From M_relevance the similarities between all row vectors were calculated. The resulting similarity matrix S_relevance related 5,000 diseases by the relevance estimators calculated for 15,000 genes. Three diseases were analyzed in detail for the validation of the disease patterns and the relevant genes. Cytoscape was used to visualize and to analyze M_relevance and S_relevance together with the genes and diseases. Summarizing the results, it can be stated that the relevance estimator introduced here was able to detect valid disease patterns and to identify genes that encoded key proteins and potential targets for drug discovery projects.
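The construction of the relevance matrix can be sketched as follows. This is a hedged illustration, not the authors' implementation: the abstract does not say how the counts were normalized, so dividing each disease row by its row sum is an assumption, as are the function names and the toy counts. The Gini coefficient is computed per gene column with the standard formula for non-negative values.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = evenly spread)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def relevance_matrix(m_data):
    """Row-normalize the counts, then scale each column by its Gini coefficient.

    m_data[d][g] is the publication count for disease d and gene g.
    Row-sum normalization is an assumption made for this sketch.
    """
    n_genes = len(m_data[0])
    column_gini = [gini([row[g] for row in m_data]) for g in range(n_genes)]
    relevance = []
    for row in m_data:
        row_sum = sum(row)
        normalized = [count / row_sum if row_sum else 0.0 for count in row]
        relevance.append([normalized[g] * column_gini[g] for g in range(n_genes)])
    return relevance


# Hypothetical counts: 3 diseases (rows) by 2 genes (columns).
# Gene 0 is published on for one disease only; gene 1 equally for all three.
m_data = [
    [10, 5],
    [0, 5],
    [0, 5],
]
m_relevance = relevance_matrix(m_data)
```

In this toy example the evenly mentioned gene receives a Gini coefficient of zero and therefore zero relevance for every disease, while the gene mentioned for a single disease keeps a positive relevance for that disease.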
Page 35
25
DISCOVERY OF FUNCTIONAL AND DISEASE PATHWAYS BY COMMUNITY DETECTION IN PROTEIN-PROTEIN INTERACTION NETWORKS
Stephen J. Wilson, Angela D. Wilkins, Chih-Hsu Lin, Rhonald C. Lua, Olivier Lichtarge
Baylor College of Medicine
Presenting author: Stephen Wilson
Advances in cellular, molecular, and disease biology depend on the comprehensive characterization of gene interactions and pathways. Traditionally, these pathways are curated manually, limiting their efficient annotation and, potentially, reinforcing field-specific bias. Here, in order to test objective and automated identification of functionally cooperative genes, we compared a novel algorithm with three established methods to search for communities within gene interaction networks. Communities identified by the novel approach and by one of the established methods overlapped significantly (q<0.1) with control pathways. With respect to disease, these communities were biased to genes with pathogenic variants in ClinVar (p<<0.01), and often genes from the same community were co-expressed, including in breast cancers. The interesting subset of novel communities, defined by poor overlap to control pathways, also contained co-expressed genes, consistent with a possible functional role. This work shows that community detection based on topological features of networks suggests new, biologically meaningful groupings of genes that, in turn, point to health and disease relevant hypotheses.
Page 36
26
PRECISION MEDICINE: FROM GENOTYPES AND MOLECULAR PHENOTYPES TOWARDS IMPROVED HEALTH AND THERAPIES
PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS
Page 37
27
OPENING THE DOOR TO THE LARGE SCALE USE OF CLINICAL LAB MEASURES FOR ASSOCIATION TESTING: EXPLORING DIFFERENT METHODS FOR DEFINING PHENOTYPES
Christopher R. Bauer, Daniel Lavage, John Snyder, Joseph Leader, J. Matthew Mahoney, Sarah A. Pendergrass
Geisinger Health System, University of Vermont
Presenting author: Christopher Bauer
The past decade has seen exponential growth in the numbers of sequenced and genotyped individuals and a corresponding increase in our ability to collect and catalogue phenotypic data for use in the clinic. We now face the challenge of integrating these diverse data in new ways that can provide useful diagnostics and precise medical interventions for individual patients. One of the first steps in this process is to accurately map the phenotypic consequences of the genetic variation in human populations. The most common approach for this is the genome wide association study (GWAS). While this technique is relatively simple to implement for a given phenotype, the choice of how to define a phenotype is critical. It is becoming increasingly common for each individual in a GWAS cohort to have a large profile of quantitative measures. The standard approach is to test for associations with one measure at a time; however, there are many justifiable ways to define a set of phenotypes, and the genetic associations that are revealed will vary based on these definitions. Some phenotypes may only show a significant genetic association signal when considered together, such as through principal components analysis (PCA). Combining correlated measures may increase the power to detect association by reducing the noise present in individual variables and reduce the multiple hypothesis testing burden. Here we show that PCA and k-means clustering are two complementary methods for identifying novel genotype-phenotype relationships within a set of quantitative human traits derived from the Geisinger Health System electronic health record (EHR). Using a diverse set of approaches for defining phenotype may yield more insights into the genetic architecture of complex traits and the findings presented here highlight a clear need for further investigation into other methods for defining the most relevant phenotypes in a set of variables. As the data of EHR continue to grow, addressing these issues will become increasingly important in our efforts to use genomic data effectively in medicine.
Page 38
28
TEMPORAL ORDER OF DISEASE PAIRS AFFECTS SUBSEQUENT DISEASE TRAJECTORIES: THE CASE OF DIABETES AND SLEEP APNEA
Mette Beck¹, David Westergaard¹, Leif Groop², Soren Brunak¹
¹Novo Nordisk Foundation Center for Protein Research; ²Lund University Diabetes Centre, Department of Clinical Sciences
Mette Beck
Most studies of disease etiologies focus on one disease only and not the full spectrum of multimorbidities that many patients have. Some disease pairs have shared causal origins, others represent common follow-on diseases, while yet other co-occurring diseases may manifest themselves in random order of appearance. We discuss these different types of disease co-occurrences, and use the two diseases “sleep apnea” and “diabetes” to showcase the approach, which can otherwise be applied to any disease pair. We benefit from seven million electronic medical records covering the entire population of Denmark for more than 20 years. Sleep apnea is the most common sleep-related breathing disorder and it has previously been shown to be bidirectionally linked to diabetes, meaning that each disease increases the risk of acquiring the other. We confirm that there is no significant temporal relationship, as approximately half of patients with both diseases are diagnosed with diabetes first. However, we also show that patients diagnosed with diabetes before sleep apnea have a higher disease burden compared to patients diagnosed with sleep apnea before diabetes. The study clearly demonstrates that it is not only the diagnoses in the patient’s disease history that are important, but also the specific order in which these diagnoses are given that matters in terms of outcome. We suggest that this should be considered for patient stratification.
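The directionality check described in this abstract (whether one diagnosis tends to precede the other) can be illustrated with a small sketch. This is not the authors' pipeline: the function names, the toy patient records and the use of an exact two-sided binomial test against a 50/50 split are all assumptions made for illustration.

```python
from math import comb

def first_diagnosis_counts(patients):
    """Count patients by which of two diagnoses was recorded first.

    `patients` maps a (hypothetical) patient id to a pair
    (date_of_disease_a, date_of_disease_b) of ISO date strings,
    which compare correctly as plain strings. Ties are ignored.
    """
    a_first = b_first = 0
    for date_a, date_b in patients.values():
        if date_a < date_b:
            a_first += 1
        elif date_b < date_a:
            b_first += 1
    return a_first, b_first

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k under Binomial(n, p).
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    total = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, total)

# Toy data, invented for illustration only.
toy = {
    "p1": ("2001-03-01", "2005-06-10"),  # diabetes first
    "p2": ("2010-01-15", "2008-02-20"),  # sleep apnea first
    "p3": ("1999-11-30", "2003-04-02"),  # diabetes first
    "p4": ("2012-07-07", "2011-09-09"),  # sleep apnea first
}
a_first, b_first = first_diagnosis_counts(toy)
p_value = binomial_two_sided_p(a_first, a_first + b_first)
```

A p-value near 1, as in this balanced toy example, corresponds to the abstract's observation that roughly half of the patients receive either diagnosis first.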
Page 39
29
HUMAN KINASES DISPLAY MUTATIONAL HOTSPOTS AT COGNATE POSITIONS WITHIN CANCER
Jonathan Gallion, Angela D. Wilkins, Olivier Lichtarge
Baylor College of Medicine
Jonathan Gallion
The discovery of driver genes is a major pursuit of cancer genomics, usually based on observing the same mutation in different patients. But the heterogeneity of cancer pathways plus the high background mutational frequency of tumor cells often cloud the distinction between less frequent drivers and innocent passenger mutations. Here, to overcome these disadvantages, we grouped together mutations from close kinase paralogs under the hypothesis that cognate mutations may functionally favor cancer cells in similar ways. Indeed, we find that kinase paralogs often bear mutations to the same substituted amino acid at the same aligned positions and with a large predicted Evolutionary Action. Functionally, these high Evolutionary Action, non-random mutations affect known kinase motifs, but strikingly, they do so differently among different kinase types and cancers, consistent with differences in selective pressures. Taken together, these results suggest that cancer pathways may flexibly distribute a dependence on a given functional mutation among multiple close kinase paralogs. The recognition of this “mutational delocalization” of cancer drivers among groups of paralogs is a new phenomenon that may help better identify relevant mechanisms and therefore eventually guide personalized therapy.
Page 40
30
MUSE: A MULTI-LOCUS SAMPLING-BASED EPISTASIS ALGORITHM FOR QUANTITATIVE GENETIC TRAIT PREDICTION
Dan He, Laxmi Parida
IBM Thomas J. Watson Research Center
Dan He
Quantitative genetic trait prediction based on high-density genotyping arrays plays an important role in plant and animal breeding, as well as in genetic epidemiology, such as the study of complex diseases. The prediction can be very helpful in developing breeding strategies and is crucial for translating the findings in genetics to precision medicine. Epistasis, the phenomenon where SNPs interact with each other, has been studied extensively in Genome Wide Association Studies (GWAS) but has received relatively less attention for quantitative genetic trait prediction. As the number of possible interactions is generally extremely large, even handling pairwise interactions is very challenging. To our knowledge, there is no solid solution yet that utilizes epistasis to improve genetic trait prediction. In this work, we studied the multi-locus epistasis problem, where interactions among more than two SNPs are considered. We developed an efficient algorithm, MUSE, to improve genetic trait prediction with the help of multi-locus epistasis. MUSE is sampling-based and we proposed a few different sampling strategies. Our experiments on real data showed that MUSE is not only efficient but also effective in improving genetic trait prediction. MUSE also achieved very significant improvements on a real plant dataset as well as a real human dataset.
Page 41
31
DIFFERENTIAL PATHWAY DEPENDENCY DISCOVERY ASSOCIATED WITH DRUG RESPONSE ACROSS CANCER CELL LINES
Gil Speyer¹, Divya Mahendra¹, Hai J. Tran¹, Jeff Kiefer¹, Stuart L. Schreiber², Paul A. Clemons², Harshil Dhruv¹, Michael Berens¹, Seungchan Kim¹
¹The Translational Genomics Research Institute, ²Broad Institute of Harvard and MIT
Seungchan Kim
The effort to personalize treatment plans for cancer patients involves the identification of drug treatments that can effectively target the disease while minimizing the likelihood of adverse reactions. In this study, the gene-expression profiles of 810 cancer cell lines and their response data to 368 small molecules from the Cancer Therapeutics Research Portal (CTRP) are analyzed to identify pathways with significant rewiring between genes, or differential gene dependency, between sensitive and non-sensitive cell lines. Identified pathways and their corresponding differential dependency networks are further analyzed to discover essentiality and specificity mediators of cell line response to drugs/compounds. For analysis we use the previously published method EDDY (Evaluation of Differential DependencY). EDDY first constructs likelihood distributions of gene-dependency networks, aided by known gene-gene interactions, for two given conditions, for example, sensitive cell lines vs. non-sensitive cell lines. These sets of networks yield a divergence value between two distributions of network likelihoods that can be assessed for significance using permutation tests. Resulting differential dependency networks were then further analyzed to identify genes, termed mediators, which may play important roles in biological signaling in certain cell lines that are sensitive or non-sensitive to the drugs. Establishing statistical correspondence between compounds and mediators can improve understanding of known gene dependencies associated with drug response while also discovering new dependencies. Millions of compute hours resulted in thousands of these statistical discoveries. EDDY identified 8,811 statistically significant pathways leading to 26,822 compound-pathway-mediator triplets. By incorporating STITCH and STRING databases, we could construct evidence networks for 14,415 compound-pathway-mediator triplets for support. The results of this analysis are presented in a searchable website to aid researchers in studying potential molecular mechanisms underlying cells’ drug response as well as in designing experiments for the purpose of personalized treatment regimens.
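The abstract states that a divergence value between two conditions is assessed for significance with permutation tests. The sketch below shows only that generic permutation step. It does not implement EDDY: the statistic used here (absolute difference in group means) is a stand-in assumption for EDDY's network-likelihood divergence, and all names and values are hypothetical.

```python
import random

def permutation_p_value(group_a, group_b, statistic, n_permutations=1000, seed=0):
    """Estimate how often a label-shuffled statistic reaches the observed one.

    `statistic` is any function of two lists that returns a number,
    where larger values mean the two groups differ more.
    """
    rng = random.Random(seed)
    observed = statistic(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign condition labels at random
        if statistic(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)

def abs_mean_difference(a, b):
    """Stand-in statistic (an assumption, not EDDY's divergence)."""
    return abs(sum(a) / len(a) - sum(b) / len(b))

# Invented scores for "sensitive" and "non-sensitive" cell lines.
sensitive = [10, 11, 12, 13, 14, 15, 16, 17]
non_sensitive = [0, 1, 2, 3, 4, 5, 6, 7]
p_value = permutation_p_value(sensitive, non_sensitive, abs_mean_difference)
```

Fixing the seed makes the estimate reproducible; in practice the number of permutations limits how small a p-value can be resolved.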
Page 42
32
A METHYLATION-TO-EXPRESSION FEATURE MODEL FOR GENERATING ACCURATE PROGNOSTIC RISK SCORES AND IDENTIFYING DISEASE TARGETS IN CLEAR CELL KIDNEY CANCER
Jeffrey A. Thompson¹, Carmen J. Marsit²
¹Dartmouth College, ²Emory University
Jeffrey Thompson
Many researchers now have available multiple high-dimensional molecular and clinical datasets when studying a disease. As we enter this multi-omic era of data analysis, new approaches that combine different levels of data (e.g., at the genomic and epigenomic levels) are required to fully capitalize on this opportunity. In this work, we outline a new approach to multi-omic data integration, which combines molecular and clinical predictors as part of a single analysis to create a prognostic risk score for clear cell renal cell carcinoma. The approach integrates data in multiple ways and yet creates models that are relatively straightforward to interpret and have a high level of performance. Furthermore, the proposed process of data integration captures relationships in the data that represent highly disease-relevant functions.
Page 43
33
DE NOVO MUTATIONS IN AUTISM IMPLICATE THE SYNAPTIC ELIMINATION NETWORK
Guhan Ram Venkataraman¹, Chloe O'Connell¹, Fumiko Egawa², Dorna Kashef-Haghighi¹, Dennis Paul Wall¹
¹Stanford University, ²St. George's University
Fumiko Egawa
Autism has been shown to have a major genetic risk component; documented autism in families has repeatedly been shown to be passed down for generations. While inherited risk plays an important role in the autistic nature of children, de novo (germline) mutations have also been implicated in autism risk. Here we find that autism de novo variants verified and published in the literature are Bonferroni-significantly enriched in a gene set implicated in synaptic elimination. Additionally, several of the genes in this synaptic elimination set that were enriched in protein-protein interactions (CACNA1C, SHANK2, SYNGAP1, NLGN3, NRXN1, and PTEN) have been previously confirmed as genes that confer risk for the disorder. The results demonstrate that autism-associated de novos are linked to proper synaptic pruning and density, hinting at the etiology of autism and suggesting pathophysiology for downstream correction and treatment.
Page 44
34
IDENTIFYING GENETIC ASSOCIATIONS WITH VARIABILITY IN METABOLIC HEALTH AND BLOOD COUNT LABORATORY VALUES: DIVING INTO THE QUANTITATIVE TRAITS BY LEVERAGING LONGITUDINAL DATA FROM AN EHR
Shefali S. Verma¹, Anastasia M. Lucas¹, Daniel R. Lavage¹, Joseph B. Leader¹, Raghu Metpally², Sarathbabu Krishnamurthy¹, Frederick Dewey¹, Ingrid Borecki¹, Alexander Lopez³, John Overton³, John Penn³, Jeffrey Reid³, Sarah A. Pendergrass¹, Gerda Breitwieser², Marylyn D. Ritchie¹
¹Department of Biomedical and Translational Informatics, Geisinger Health System, Danville, PA; ²Department of Functional and Molecular Genomics, Geisinger Health System, Danville, PA; ³Regeneron Genetics Center, Tarrytown, NY
Shefali Setia Verma
A wide range of patient health data is recorded in Electronic Health Records (EHR). This data includes diagnoses, surgical procedures, clinical laboratory measurements, and medication information. Together this information reflects the patient’s medical history. Many studies have efficiently used this data from the EHR to find associations that are clinically relevant, either by utilizing International Classification of Diseases, version 9 (ICD-9) codes or laboratory measurements, or by designing phenotype algorithms to extract case and control status with accuracy from the EHR. Here we developed a strategy to utilize longitudinal quantitative trait data from the EHR at Geisinger Health System, focusing on outpatient metabolic and complete blood panel data as a starting point. The Comprehensive Metabolic Panel (CMP) and Complete Blood Counts (CBC) are parts of routine care and provide a comprehensive picture from high level screening of patients’ overall health and disease. We randomly split our data into two datasets to allow for discovery and replication. We first conducted a genome-wide association study (GWAS) with median values of 25 different clinical laboratory measurements to identify variants from Human Omni Express Exome beadchip data that are associated with these measurements. We identified 687 variants that were associated and replicated with the tested clinical measurements at p < 5 × 10⁻⁸. Since longitudinal data from the EHR provides a record of a patient’s medical history, we utilized this information to further investigate the ICD-9 codes that might be associated with differences in variability of the measurements in the longitudinal dataset. We identified low and high variance patients by looking at changes within their individual longitudinal EHR laboratory results for each of the 25 clinical lab values (thus creating 50 groups: a high variance and a low variance group for each lab variable). We then performed a PheWAS analysis with ICD-9 diagnosis codes, separately in the high variance group and the low variance group for each lab variable. We found 717 PheWAS associations that replicated at a p-value less than 0.001. Next, we evaluated the results of this study by comparing the association results between the high and low variance groups. For example, we found 39 SNPs (in multiple genes) associated with ICD-9 250.01 (Type-I diabetes) in patients with high variance of plasma glucose levels, but not in patients with low variance in plasma glucose levels. Another example is the association of 4 SNPs in UMOD with chronic kidney disease in patients with high variance for aspartate aminotransferase (discovery p-value: 8.71 × 10⁻⁹ and replication p-value: 2.03 × 10⁻⁶). In general, we see a pattern of many more statistically significant associations from patients with high variance in the quantitative lab variables, in comparison with the low variance group, across all of the 25 laboratory measurements. This study is one of the first of its kind to utilize quantitative trait variance from longitudinal laboratory data to find associations among genetic variants and clinical phenotypes obtained from an EHR, integrating laboratory values and diagnosis codes to understand the genetic complexities of common diseases.
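The grouping of patients into high and low variance groups for a single lab variable can be sketched as follows. The abstract does not state how the two groups were delimited, so the median split used here is purely an assumption, as are the function name, the patient ids and the glucose values.

```python
from statistics import median, pvariance

def split_by_variance(measurements):
    """Split patients into high and low variance groups for one lab test.

    `measurements` maps a patient id to that patient's repeated values.
    Patients with fewer than two values are skipped because a variance
    over a single value is not informative. The cutoff is the median of
    the per-patient variances (an assumption, not the study's rule).
    """
    variances = {
        pid: pvariance(values)
        for pid, values in measurements.items()
        if len(values) >= 2
    }
    cutoff = median(variances.values())
    high = sorted(pid for pid, v in variances.items() if v > cutoff)
    low = sorted(pid for pid, v in variances.items() if v <= cutoff)
    return high, low

# Invented plasma glucose series (mg/dL), for illustration only.
glucose = {
    "p1": [90, 91, 89],
    "p2": [80, 140, 200],
    "p3": [100, 100, 100],
    "p4": [70, 130, 95],
    "p5": [88],  # single measurement, skipped
}
high_variance, low_variance = split_by_variance(glucose)
```

Repeating this for each of the 25 lab variables would give the 50 groups mentioned in the abstract, each of which can then be analyzed separately.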
Page 45
35
STRATEGIES FOR EQUITABLE PHARMACOGENOMIC-GUIDED WARFARIN DOSING AMONG EUROPEAN AND AFRICAN AMERICAN INDIVIDUALS IN A CLINICAL POPULATION
Laura Wiley¹, Jacob VanHouten², David Samuels², Melinda Aldrich³, Dan Roden², Josh Peterson², Joshua Denny²
¹University of Colorado, ²Vanderbilt University, ³Vanderbilt University Medical Center
Laura Wiley
The blood thinner warfarin has a narrow therapeutic range and high inter- and intra-patient variability in therapeutic doses. Several studies have shown that pharmacogenomic variants help predict stable warfarin dosing. However, retrospective and randomized controlled trials that employ dosing algorithms incorporating pharmacogenomic variants underperform in African Americans. This study sought to determine if: 1) including additional variants associated with warfarin dose in African Americans, 2) predicting within single ancestry groups rather than a combined population, or 3) using percentage African ancestry rather than observed race, would improve warfarin dosing algorithms in African Americans. Using BioVU, the Vanderbilt University Medical Center biobank linked to electronic medical records, we compared 25 modeling strategies to existing algorithms using a cohort of 2,181 warfarin users (1,928 whites, 253 blacks). We found that approaches incorporating additional variants increased model accuracy, but not in clinically significant ways. Race stratification increased model fidelity for African Americans, but the improvement was small and not likely to be clinically significant. Use of percent African ancestry improved model fit in the context of race misclassification.
Page 46
36
SINGLE-CELL ANALYSIS AND MODELLING OF CELL POPULATION HETEROGENEITY
PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS
Page 47
37
PRODUCTION OF A PRELIMINARY QUALITY CONTROL PIPELINE FOR SINGLE NUCLEI RNA-SEQ AND ITS APPLICATION IN THE ANALYSIS OF CELL TYPE DIVERSITY OF POST-MORTEM HUMAN BRAIN NEOCORTEX
Brian Aevermann¹, Jamison McCorrison¹, Pratap Venepally¹, Rebecca Hodge², Trygve Bakken², Jeremy Miller², Mark Novotny¹, Danny N. Tran¹, Francisco Diez-Fuertes³, Lena Christiansen⁴, Fan Zhang⁴, Frank Steemers⁴, Roger S. Lasken¹, Ed Lein², Nicholas Schork¹, Richard H. Scheuermann¹
¹J. Craig Venter Institute, ²Allen Institute for Brain Science, ³Instituto de Salud Carlos III, ⁴Illumina, Inc.
Richard Scheuermann
Next generation sequencing of the RNA content of single cells or single nuclei (sc/nRNA-seq) has become a powerful approach to understand the cellular complexity and diversity of multicellular organisms and environmental ecosystems. However, because the procedure begins with a relatively small amount of starting material, thereby pushing the limits of the laboratory procedures required, careful approaches for sample quality control (QC) are essential to reduce the impact of technical noise and sample bias in downstream analysis applications. Here we present a preliminary framework for sample level quality control that is based on the collection of a series of quantitative laboratory and data metrics that are used as features for the construction of QC classification models using random forest machine learning approaches. We applied this initial framework to a dataset composed of 2272 single nuclei RNA-seq results and determined that ~79% of samples were of high quality. Removal of the poor quality samples from downstream analysis was found to improve the cell type clustering results. In addition, this approach identified quantitative features related to the proportion of unique or duplicate reads and the proportion of reads remaining after quality trimming as useful features for pass/fail classification. The construction and use of classification models for the identification of poor quality samples provides for an objective and scalable approach to sc/nRNA-seq quality control.
Page 48
38
TRACING CO-REGULATORY NETWORK DYNAMICS IN NOISY, SINGLE-CELL TRANSCRIPTOME TRAJECTORIES
Pablo Cordero, Joshua M. Stuart
UC Santa Cruz Genomics Institute, University of California, Santa Cruz
Pablo Cordero
The availability of gene expression data at the single cell level makes it possible to probe the molecular underpinnings of complex biological processes such as differentiation and oncogenesis. Promising new methods have emerged for reconstructing a progression 'trajectory' from static single-cell transcriptome measurements. However, it remains unclear how to adequately model the appreciable level of noise in these data to elucidate gene regulatory network rewiring. Here, we present a framework called Single Cell Inference of MorphIng Trajectories and their Associated Regulation (SCIMITAR) that infers progressions from static single-cell transcriptomes by employing a continuous parametrization of Gaussian mixtures in high-dimensional curves. SCIMITAR yields rich models from the data that highlight genes with expression and co-expression patterns that are associated with the inferred progression. Further, SCIMITAR extracts regulatory states from the implicated trajectory-evolving co-expression networks. We benchmark the method on simulated data to show that it yields accurate cell ordering and gene network inferences. Applied to the interpretation of a single-cell human fetal neuron dataset, SCIMITAR finds progression-associated genes in cornerstone neural differentiation pathways missed by standard differential expression tests. Finally, by leveraging the rewiring of gene-gene co-expression relations across the progression, the method reveals the rise and fall of co-regulatory states and trajectory-dependent gene modules. These analyses implicate new transcription factors in neural differentiation including putative co-factors for the multi-functional NFAT pathway.
Page 49
39
AN UPDATED DEBARCODING TOOL FOR MASS CYTOMETRY WITH CELL TYPE-SPECIFIC AND CELL SAMPLE-SPECIFIC STRINGENCY ADJUSTMENT
Kristin I. Fread¹, William D. Strickland², Garry P. Nolan³, Eli R. Zunder¹
¹Department of Biomedical Engineering, University of Virginia; ²Department of Biomedical Sciences, University of Virginia; ³Department of Microbiology and Immunology, Stanford University
Eli Zunder
Pooled sample analysis by mass cytometry barcoding carries many advantages: reduced antibody consumption, increased sample throughput, removal of cell doublets, reduction of cross-contamination by sample carryover, and the elimination of tube-to-tube variability in antibody staining. A single-cell debarcoding algorithm was previously developed to improve the accuracy and yield of sample deconvolution, but this method was limited to using fixed parameters for debarcoding stringency filtering, which could introduce cell-specific or sample-specific bias to cell yield in scenarios where barcode staining intensity and variance are not uniform across the pooled samples. To address this issue, we have updated the algorithm to output debarcoding parameters for every cell in the sample-assigned FCS files, which allows for visualization and analysis of these parameters via flow cytometry analysis software. This strategy can be used to detect cell type-specific and sample-specific effects on the underlying cell data that arise during the debarcoding process. An additional benefit to this strategy is the decoupling of barcode stringency filtering from the debarcoding and sample assignment process. This is accomplished by removing the stringency filters during sample assignment, and then filtering after the fact with 1- and 2-dimensional gating on the debarcoding parameters which are output with the FCS files. These data exploration strategies serve as an important quality check for barcoded mass cytometry datasets, and allow cell type and sample-specific stringency adjustment that can remove bias in cell yield introduced during the debarcoding process.
Page 50
40
IMAGING GENOMICS
PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS
Page 51
41
ADAPTIVE TESTING OF SNP-BRAIN FUNCTIONAL CONNECTIVITY ASSOCIATION VIA A MODULAR NETWORK ANALYSIS
Chen Gao, Junghi Kim, Wei Pan
Division of Biostatistics, School of Public Health, University of Minnesota
Wei Pan
Due to its high dimensionality and high noise levels, analysis of a large brain functional network may be neither powerful nor easy to interpret; instead, decomposition of a large network into smaller subcomponents called modules may be more promising, as suggested by some empirical evidence. For example, alteration of brain modularity is observed in patients suffering from various types of brain malfunctions. Although several methods exist for estimating brain functional networks, such as the sample correlation matrix or graphical lasso for a sparse precision matrix, it is still difficult to extract modules from such network estimates. Motivated by these considerations, we adapt a weighted gene co-expression network analysis (WGCNA) framework to resting-state fMRI (rs-fMRI) data to identify modular structures in brain functional networks. Modular structures are identified by using topological overlap matrix (TOM) elements in hierarchical clustering. We propose applying a new adaptive test built on the proportional odds model (POM) that can be applied to a high-dimensional setting, where the number of variables (p) can exceed the sample size (n), in addition to the usual p < n setting. We applied our proposed methods to the ADNI data to test for associations between a genetic variant and either the whole brain functional network or its various subcomponents using various connectivity measures. We uncovered several modules based on the control cohort, and some of them were marginally associated with the APOE4 variant and several other SNPs; however, due to the small sample size of the ADNI data, larger studies are needed.
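The topological overlap matrix (TOM) that this abstract feeds into hierarchical clustering can be sketched with the standard unsigned WGCNA definition, TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), where a_ij is the adjacency, l_ij sums a_iu * a_uj over third nodes u, and k_i is the connectivity of node i. The code below is a plain-Python illustration on a toy adjacency matrix; the function name and input are assumptions, and the authors' actual rs-fMRI processing is not shown.

```python
def topological_overlap(adjacency):
    """Compute the unsigned topological overlap matrix.

    `adjacency` is a symmetric list-of-lists with entries in [0, 1]
    and zeros on the diagonal. The diagonal of the result is set to 1.
    """
    n = len(adjacency)
    # Connectivity k_i: total adjacency of node i to all other nodes.
    connectivity = [
        sum(adjacency[i][u] for u in range(n) if u != i) for i in range(n)
    ]
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # l_ij: strength of connections shared through third nodes.
            shared = sum(
                adjacency[i][u] * adjacency[u][j]
                for u in range(n)
                if u != i and u != j
            )
            a_ij = adjacency[i][j]
            denominator = min(connectivity[i], connectivity[j]) + 1 - a_ij
            tom[i][j] = (shared + a_ij) / denominator
    return tom

# Toy network: region 0 is connected to regions 1 and 2,
# which are not directly connected to each other.
toy_adjacency = [
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
]
tom = topological_overlap(toy_adjacency)
```

In this toy case regions 1 and 2 receive an overlap of 0.5 despite having no direct edge, because they share region 0 as a neighbor. A dissimilarity such as 1 - TOM is what would typically be passed to hierarchical clustering.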
Page 52
42
EXPLORING BRAIN TRANSCRIPTOMIC PATTERNS: A TOPOLOGICAL ANALYSIS USING SPATIAL EXPRESSION NETWORKS
Zhana Kuncheva¹, Michelle L. Krishnan², Giovanni Montana²
¹Imperial College London, ²King's College London
Zhana Kuncheva
Characterizing the transcriptome architecture of the human brain is fundamental in gaining an understanding of brain function and disease. A number of recent studies have investigated patterns of brain gene expression obtained from an extensive anatomical coverage across the entire human brain using experimental data generated by the Allen Human Brain Atlas (AHBA) project. In this paper, we propose a new representation of a gene's transcription activity that explicitly captures the pattern of spatial co-expression across different anatomical brain regions. For each gene, we define a Spatial Expression Network (SEN), a network quantifying co-expression patterns amongst several anatomical locations. Network similarity measures are then employed to quantify the topological resemblance between pairs of SENs and identify naturally occurring clusters. Using network-theoretical measures, three large clusters have been detected featuring distinct topological properties. We then evaluate whether topological diversity of the SENs reflects significant differences in biological function through a gene ontology analysis. We report on evidence suggesting that one of the three SEN clusters consists of genes specifically involved in the nervous system, including genes related to brain disorders, while the remaining two clusters are representative of immunity, transcription and translation. These findings are consistent with previous studies showing that brain gene clusters are generally associated with one of these three major biological processes.
Page 53
43
PATTERNS IN BIOMEDICAL DATA – HOW DO WE FIND THEM?
PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS
Page 54
44
A DEEP LEARNING APPROACH FOR CANCER DETECTION AND RELEVANT GENE IDENTIFICATION
Padideh Danaee, Reza Ghaeini, David Hendrix
Oregon State University
Padideh Danaee
Cancer detection from gene expression data continues to pose a challenge due to the high dimensionality and complexity of these data. After decades of research there is still uncertainty in the clinical diagnosis of cancer and the identification of tumor-specific markers. Here we present a deep learning approach to cancer detection, and to the identification of genes critical for the diagnosis of breast cancer. First, we used a Stacked Denoising Autoencoder (SDAE) to deeply extract functional features from high dimensional gene expression profiles. Next, we evaluated the performance of the extracted representation through supervised classification models to verify the usefulness of the new features in cancer detection. Lastly, we identified a set of highly interactive genes by analyzing the SDAE connectivity matrices. Our results and analysis illustrate that these highly interactive genes could be useful cancer biomarkers for the detection of breast cancer that deserve further study.
Page 55
45
GENOME-WIDE INTERACTION WITH SELECTED TYPE 2 DIABETES LOCI REVEALS NOVEL LOCI FOR TYPE 2 DIABETES IN AFRICAN AMERICANS
Jacob M. Keaton¹, Jacklyn N. Hellwege¹, Maggie C. Y. Ng¹, Nicholette D. Palmer¹, James S. Pankow², Myriam Fornage³, James G. Wilson⁴, Adolfo Correa⁴, Laura J. Rasmussen-Torvik⁵, Jerome I. Rotter⁶, Yii-Der I. Chen⁶, Kent D. Taylor⁶, Stephen S. Rich⁷, Lynne E. Wagenknecht¹, Barry I. Freedman¹, Donald W. Bowden¹
¹Wake Forest School of Medicine, ²University of Minnesota, ³University of Texas Health Science Center at Houston, ⁴University of Mississippi Medical Center, ⁵Northwestern University Feinberg School of Medicine, ⁶Harbor-UCLA Medical Center, ⁷University of Virginia
Jacob Keaton
Type 2 diabetes (T2D) is the result of metabolic defects in insulin secretion and insulin sensitivity, yet most T2D loci identified to date influence insulin secretion. We hypothesized that T2D loci, particularly those affecting insulin sensitivity, can be identified through interaction with known T2D loci implicated in insulin secretion. To test this hypothesis, single nucleotide polymorphisms (SNPs) nominally associated with acute insulin response to glucose (AIRg), a dynamic measure of first-phase insulin secretion, and previously associated with T2D in genome-wide association studies (GWAS) were identified in African Americans from the Insulin Resistance Atherosclerosis Family Study (IRASFS; n = 492 subjects). These SNPs were tested for interaction, individually and jointly as a genetic risk score (GRS), using GWAS data from five cohorts (ARIC, CARDIA, JHS, MESA, WFSM; n = 2,725 cases, 4,167 controls) with T2D as the outcome. In single variant analyses, suggestively significant (P_interaction < 5 × 10⁻⁶) interactions were observed at several loci including DGKB (rs978989), CDK18 (rs12126276), CXCL12 (rs7921850), HCN1 (rs6895191), FAM98A (rs1900780), and MGMT (rs568530). Notable beta-cell GRS interactions included two SNPs at the DGKB locus (rs6976381; rs6962498). These data support the hypothesis that additional genetic factors contributing to T2D risk can be identified by interactions with insulin secretion loci.
Page 56
46
META-ANALYSIS OF CONTINUOUS PHENOTYPES IDENTIFIES A GENE SIGNATURE THAT CORRELATES WITH COPD DISEASE STATUS
Madeleine Scott¹, Francesco Vallania², Purvesh Khatri³
¹Stanford Medical School, Stanford University, Stanford, California; ²Stanford Institute for Immunity, Transplantation, and Infection, Stanford University, Stanford, California; ³Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, California
Purvesh Khatri
The utility of multi-cohort two-class meta-analysis to identify robust differentially expressed gene signatures has been well established. However, many biomedical applications, such as gene signatures of disease progression, require one-class analysis. Here we describe an R package, MetaCorrelator, that can identify a reproducible transcriptional signature that is correlated with a continuous disease phenotype across multiple datasets. We successfully applied this framework to extract a pattern of gene expression that can predict lung function in patients with chronic obstructive pulmonary disease (COPD) in both peripheral blood mononuclear cells (PBMCs) and tissue. Our results point to a dysregulation in the oxidation state of the lungs of patients with COPD, as well as underscore the classically recognized inflammatory state that underlies this disease.
Page 57
47
LEARNING PARSIMONIOUS ENSEMBLES FOR UNBALANCED COMPUTATIONAL GENOMICS PROBLEMS
Ana Stanescu, Gaurav Pandey
Icahn School of Medicine at Mount Sinai
Gaurav Pandey
Prediction problems in biomedical sciences are generally quite difficult, partially due to incomplete knowledge of how the phenomenon of interest is influenced by the variables and measurements used for prediction, as well as a lack of consensus regarding the ideal predictor(s) for specific problems. In these situations, a powerful approach to improving prediction performance is to construct ensembles that combine the outputs of many individual base predictors, which have been successful for many biomedical prediction tasks. Moreover, selecting a parsimonious ensemble can be of even greater value for biomedical sciences, where it is not only important to learn an accurate predictor, but also to interpret what novel knowledge it can provide about the target problem. Ensemble selection is a promising approach for this task because of its ability to select a collectively predictive subset, often a relatively small one, of all input base predictors. One of the most well-known algorithms for ensemble selection, CES (Caruana et al.'s Ensemble Selection), generally performs well in practice, but faces several challenges due to the difficulty of choosing the right values of its various parameters. Since the choices made for these parameters are usually ad-hoc, good performance of CES is difficult to guarantee for a variety of problems or datasets. To address these challenges with CES and other such algorithms, we propose a novel heterogeneous ensemble selection approach based on the paradigm of reinforcement learning (RL), which offers a more systematic and mathematically sound methodology for exploring the many possible combinations of base predictors that can be selected into an ensemble. We develop three RL-based strategies for constructing ensembles and analyze their results on two unbalanced computational genomics problems, namely the prediction of protein function and splice sites in eukaryotic genomes. We show that the resultant ensembles are indeed substantially more parsimonious as compared to the full set of base predictors, yet still offer almost the same classification power, especially for larger datasets. The RL ensembles also yield a better combination of parsimony and predictive performance as compared to CES.
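For orientation, the baseline that this abstract compares against, greedy forward ensemble selection in the style of CES, can be sketched as below. This is a simplified illustration and not the authors' RL-based method: the stopping rule (stop when the validation score no longer improves), the accuracy metric and all names and scores are assumptions.

```python
def accuracy(scores, labels, threshold=0.5):
    """Fraction of examples whose thresholded score matches the 0/1 label."""
    hits = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return hits / len(labels)

def greedy_ensemble_selection(predictions, labels, rounds=5, metric=accuracy):
    """Greedy forward selection with replacement over base predictors.

    `predictions` maps a predictor name to its validation-set scores.
    In each round the predictor whose addition gives the best score of
    the averaged ensemble is added; a predictor may be added repeatedly.
    Selection stops early when no addition improves the score.
    """
    chosen = []
    totals = [0.0] * len(labels)  # running sum of chosen predictors' scores
    best_overall = -1.0
    for _ in range(rounds):
        best_name, best_score = None, -1.0
        for name in sorted(predictions):
            averaged = [
                (t + p) / (len(chosen) + 1)
                for t, p in zip(totals, predictions[name])
            ]
            score = metric(averaged, labels)
            if score > best_score:
                best_name, best_score = name, score
        if best_score <= best_overall:
            break
        chosen.append(best_name)
        totals = [t + p for t, p in zip(totals, predictions[best_name])]
        best_overall = best_score
    return chosen, best_overall

# Invented validation scores for three hypothetical base predictors.
labels = [1, 1, 0, 0]
base_predictions = {
    "a": [0.9, 0.4, 0.2, 0.1],
    "b": [0.4, 0.9, 0.1, 0.2],
    "c": [0.1, 0.2, 0.9, 0.8],
}
ensemble, score = greedy_ensemble_selection(base_predictions, labels)
```

In this toy example two predictors that each misclassify a different example are combined into an ensemble that classifies all four correctly, while the third predictor is left out, which is the kind of parsimony the abstract refers to.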
Page 58
48
NETWORK MAP OF ADVERSE HEALTH EFFECTS AMONG VICTIMS OF INTIMATE PARTNER VIOLENCE
Kathleen Whiting¹, Larry Y. Liu², Mehmet Koyutürk², Gunnur Karakurt²
¹Uniformed Services University, ²Case Western Reserve University
Gunnur Karakurt
Intimate partner violence (IPV) is a serious problem with devastating health consequences. Screening procedures may overlook relationships between IPV and negative health effects. To identify IPV-associated women’s health issues, we mined national, aggregated de-identified electronic health record data and compared female health issues of domestic abuse (DA) versus non-DA records, identifying terms significantly more frequent for the DA group. After coding these terms into 28 broad categories, we developed a network map to determine strength of relationships between categories in the context of DA, finding that acute conditions are strongly connected to cardiovascular, gastrointestinal, gynecological, and neurological conditions among victims.
Page 59
49
PRECISION MEDICINE: FROM GENOTYPES AND MOLECULAR PHENOTYPES TOWARDS IMPROVED HEALTH AND THERAPIES
PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS
Page 60
50
A POWERFUL METHOD FOR INCLUDING GENOTYPE UNCERTAINTY IN TESTS OF HARDY-WEINBERG EQUILIBRIUM
Andrew Beck¹, Alexander Luedtke², Keli Liu³, Nathan Tintle⁴
¹University of Michigan, ²University of California-Berkeley, ³Harvard University, ⁴Dordt College
Nathan Tintle
The use of posterior probabilities to summarize genotype uncertainty is pervasive across genotyping, sequencing and imputation platforms. Prior work in many contexts has shown the utility of incorporating genotype uncertainty (posterior probabilities) in downstream statistical tests. Typical approaches to incorporating genotype uncertainty when testing Hardy-Weinberg equilibrium tend to lack calibration in the type I error rate, especially as genotype uncertainty increases. We propose a new approach in the spirit of genomic control that properly calibrates the type I error rate, while yielding improved power to detect deviations from Hardy-Weinberg equilibrium. We demonstrate the improved performance of our method on both simulated and real genotypes.
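As background for this abstract, the classical one-degree-of-freedom chi-square test of Hardy-Weinberg equilibrium can be sketched as follows, together with a helper that sums posterior genotype probabilities into expected counts. Feeding such expected counts into the classical test is one of the typical approaches that the abstract describes as poorly calibrated; the authors' proposed correction is not reproduced here. Function names and inputs are assumptions.

```python
from math import erfc, sqrt

def expected_genotype_counts(posteriors):
    """Sum per-sample posterior probabilities into expected genotype counts.

    Each element of `posteriors` is a triple (P(AA), P(Aa), P(aa)).
    """
    return tuple(sum(sample[g] for sample in posteriors) for g in range(3))

def hwe_chi_square(n_hom_ref, n_het, n_hom_alt):
    """Classical chi-square test of Hardy-Weinberg equilibrium (1 df).

    Returns the chi-square statistic and its p-value. Counts may be
    fractional, for example expected counts from posterior probabilities.
    """
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)  # frequency of allele A
    q = 1 - p
    observed = (n_hom_ref, n_het, n_hom_alt)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with one degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value

# Counts exactly in Hardy-Weinberg proportions, and a sample with no
# heterozygotes at all (both invented for illustration).
chi2_balanced, p_balanced = hwe_chi_square(25, 50, 25)
chi2_no_het, p_no_het = hwe_chi_square(50, 0, 50)
```

The first call gives a statistic of zero and a p-value of one, while the second gives a large statistic and a very small p-value, the expected behavior under a strong heterozygote deficit.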
Page 61
51
MICRORNA-AUGMENTED PATHWAYS (MIRAP) AND THEIR APPLICATIONS TO PATHWAY ANALYSIS AND DISEASE SUBTYPING
Diana Diaz¹, Michele Donato², Tin Nguyen¹, Sorin Draghici¹
¹Wayne State University, ²Stanford University Medical Center
Sorin Draghici
MicroRNAs play important roles in the development of many complex diseases. Because of their importance, the analysis of signaling pathways including miRNA interactions holds the potential for unveiling the mechanisms underlying such diseases. However, current signaling pathway databases are limited to interactions between genes and ignore miRNAs. Here, we use the information on miRNA targets to build a database of miRNA-augmented pathways (mirAP), and we show its application in the contexts of integrative pathway analysis and disease subtyping. Our miRNA-mRNA integrative pathway analysis pipeline incorporates a topology-aware approach that we previously implemented. Our integrative disease subtyping pipeline takes into account survival data, gene and miRNA expression, and knowledge of the interactions among genes. We demonstrate the advantages of our approach by analyzing nine sample-matched datasets that provide both miRNA and mRNA expression. We show that integrating miRNAs into pathway analysis results in greater statistical power, and provides a more comprehensive view of the underlying phenomena. We also compare our disease subtyping method with the state-of-the-art integrative analysis by analyzing a colorectal cancer database from TCGA. The colorectal cancer subtypes identified by our approach are significantly different in terms of their survival expectation. These miRNA-augmented pathways offer a more comprehensive view and a deeper understanding of biological pathways. A better understanding of the molecular processes associated with patients' survival can contribute to a better prognosis and an appropriate treatment for each subtype.
Page 62
52
FREQUENT SUBGRAPH MINING OF PERSONALIZED SIGNALING PATHWAY NETWORKS GROUPS PATIENTS WITH FREQUENTLY DYSREGULATED DISEASE PATHWAYS AND PREDICTS PROGNOSIS
Arda Durmaz, Tim A. D. Henderson, Douglas Brubaker, Gurkan Bebek
Case Western Reserve University
Gurkan Bebek
Motivation: Large scale genomics studies have generated comprehensive molecular characterization of numerous cancer types. Subtypes for many tumor types have been established; however, these classifications are based on molecular characteristics of small gene sets with limited power to detect dysregulation at the patient level. We hypothesize that frequent graph mining of pathways to gather pathways functionally relevant to tumors can characterize tumor types and provide opportunities for personalized therapies.
Results: In this study we present an integrative omics approach to group patients based on their altered pathway characteristics and show prognostic differences within breast cancer (p < 9.57E−10) and glioblastoma multiforme (p < 0.05) patients. We were able to validate this approach in secondary RNA-Seq datasets with p < 0.05 and p < 0.01 respectively. We also performed pathway enrichment analysis to further investigate the biological relevance of dysregulated pathways. We compared our approach with network-based classifier algorithms and showed that our unsupervised approach generates more robust and biologically relevant clustering, whereas previous approaches failed to report specific functions for similar patient groups or classify patients into prognostic groups.
Conclusions: These results could serve as a means to improve prognosis for future cancer patients, and to provide opportunities for improved treatment options and personalized interventions. The proposed novel graph mining approach is able to integrate PPI networks with gene expression in a biologically sound approach and cluster patients into clinically distinct groups. We have utilized breast cancer and glioblastoma multiforme datasets from microarray and RNA-Seq platforms and identified disease mechanisms differentiating samples.
Page 63
53
CERNA SEARCH METHOD IDENTIFIED A MET-ACTIVATED SUBGROUP AMONG EGFR DNA AMPLIFIED LUNG ADENOCARCINOMA PATIENTS
Halla Kabat, Leo Tunkle, Inhan Lee
miRcore
Inhan Lee
Given the diverse molecular pathways involved in tumorigenesis, identifying subgroups among cancer patients is crucial in precision medicine. While most targeted therapies rely on DNA mutation status in tumors, responses to such therapies vary due to the many molecular processes involved in propagating DNA changes to proteins (which constitute the usual drug targets). Though RNA expressions have been extensively used to categorize tumors, identifying clinically important subgroups remains challenging given the difficulty of discerning subgroups within all possible RNA-RNA networks. It is thus essential to incorporate multiple types of data. Recently, RNA was found to regulate other RNA through a common microRNA (miR). These regulating and regulated RNAs are referred to as competing endogenous RNAs (ceRNAs). However, global correlations between mRNA and miR expressions across all samples have not reliably yielded ceRNAs. In this study, we developed a ceRNA-based method to identify subgroups of cancer patients combining DNA copy number variation, mRNA expression, and microRNA (miR) expression data with biological knowledge. Clinical data is used to validate identified subgroups and ceRNAs. Since ceRNAs are causal, ceRNA-based subgroups may present clinical relevance. Using lung adenocarcinoma data from The Cancer Genome Atlas (TCGA) as an example, we focused on EGFR amplification status, since a targeted therapy for EGFR exists. We hypothesized that global correlations between mRNA and miR expressions across all patients would not reveal important subgroups and that clustering of potential ceRNAs might define molecular pathway-relevant subgroups. Using experimentally validated miR-target pairs, we identified EGFR and MET as potential ceRNAs for miR-133b in lung adenocarcinoma. The EGFR-MET up and miR-133b down subgroup showed a higher death rate than the EGFR-MET down and miR-133b up subgroup. Although transactivation between MET and EGFR has been identified previously, our result is the first to propose ceRNA as one of its underlying mechanisms. Furthermore, since MET amplification was seen in the case of resistance to EGFR-targeted therapy, the EGFR-MET up and miR-133b down subgroup may fall into the drug non-response group and thus preclude EGFR target therapy.
Page 64
54
IMPROVED PERFORMANCE OF GENE SET ANALYSIS ON GENOME-WIDE TRANSCRIPTOMICS DATA WHEN USING GENE ACTIVITY STATE ESTIMATES
Thomas Kamp, Micah Adams, Craig Disselkoen, Nathan Tintle
Dordt College
Nathan Tintle
Gene set analysis methods continue to be a popular and powerful method of evaluating genome-wide transcriptomics data. These approaches require a priori grouping of genes into biologically meaningful sets, and then conducting downstream analyses at the set (instead of gene) level of analysis. Gene set analysis methods have been shown to yield more powerful statistical conclusions than single-gene analyses due to both reduced multiple testing penalties and potentially larger observed effects due to the aggregation of effects across multiple genes in the set. Traditionally, gene set analysis methods have been applied directly to normalized, log-transformed, transcriptomics data. Recently, efforts have been made to transform transcriptomics data to scales yielding more biologically interpretable results. For example, recently proposed models transform log-transformed transcriptomics data to a confidence metric (ranging between 0 and 100%) that a gene is active (roughly speaking, that the gene product is part of an active cellular mechanism). In this manuscript, we demonstrate, on both real and simulated transcriptomics data, that tests for differential expression between sets of genes are typically more powerful when using gene activity state estimates as opposed to log-transformed gene expression data. Our analysis suggests further exploration of techniques to transform transcriptomics data to meaningful quantities for improved downstream inference.
Page 65
55
METHYLDMV: SIMULTANEOUS DETECTION OF DIFFERENTIAL DNA METHYLATION AND VARIABILITY WITH CONFOUNDER ADJUSTMENT
Pei Fen Kuan, Junyan Song, Shuyao He
Stony Brook University
Pei Fen Kuan
DNA methylation has emerged as a promising epigenetic marker for disease diagnosis. Both the differential mean (DM) and differential variability (DV) in methylation have been shown to contribute to transcriptional aberration and disease pathogenesis. The presence of confounding factors in large scale EWAS may affect the methylation values and hamper accurate marker discovery. In this paper, we propose a flexible framework called methylDMV which allows for adjustment for confounding factors and enables simultaneous characterization and identification of CpGs exhibiting DM only, DV only and both DM and DV. The proposed framework also allows for prioritization and selection of candidate features to be included in the prediction algorithm. We illustrate the utility of methylDMV in several TCGA datasets. An R package methylDMV implementing our proposed method is available at http://www.ams.sunysb.edu/~pfkuan/softwares.html#methylDMV.
Page 66
56
IDENTIFY CANCER DRIVER GENES THROUGH SHARED MENDELIAN DISEASE PATHOGENIC VARIANTS AND CANCER SOMATIC MUTATIONS
Meng Ma1, Changchang Wang2, Benjamin Glicksberg1, Eric E. Schadt1, Shuyu Li1, Rong Chen1
1Icahn School of Medicine at Mount Sinai, 2Anhui University
Shuyu Li
Genomic sequencing studies in the past several years have yielded a large number of cancer somatic mutations. There remains a major challenge in delineating a small fraction of somatic mutations that are oncogenic drivers from a background of predominantly passenger mutations. Although computational tools have been developed to predict the functional impact of mutations, their utility is limited. In this study, we applied an alternative approach to identify potentially novel cancer drivers as those somatic mutations that overlap with known pathogenic mutations in Mendelian diseases. We hypothesize that those shared mutations are more likely to be cancer drivers because they have the established molecular mechanisms to impact protein functions. We first show that the overlap between somatic mutations in COSMIC and pathogenic genetic variants in HGMD is associated with high mutation frequency in cancers and is enriched for known cancer genes. We then attempted to identify putative tumor suppressors based on the number of distinct HGMD/COSMIC overlapping mutations in a given gene, and our results suggest that ion channels, collagens and Marfan syndrome associated genes may represent new classes of tumor suppressors. To elucidate potentially novel oncogenes, we identified those HGMD/COSMIC overlapping mutations that are not only highly recurrent but also mutually exclusive from previously characterized oncogenic mutations in each specific cancer type. Taken together, our study represents a novel approach to discover new cancer genes from the vast amount of cancer genome sequencing data.
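The overlap-counting idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the (gene, protein change) tuple representation, the function name, and the toy records are all assumptions made for illustration, and no real COSMIC or HGMD data are used.

```python
from collections import defaultdict


def overlapping_mutations_per_gene(somatic, pathogenic):
    """Count distinct mutations per gene that occur in both collections.

    Each mutation is a hypothetical (gene, protein_change) tuple.
    """
    shared = set(somatic) & set(pathogenic)
    counts = defaultdict(int)
    for gene, _change in shared:
        counts[gene] += 1
    return dict(counts)


# Made-up records standing in for somatic and Mendelian pathogenic variants.
somatic_like = [
    ("GENE_A", "p.R100H"),
    ("GENE_A", "p.G200D"),
    ("GENE_B", "p.L10P"),
]
pathogenic_like = [
    ("GENE_A", "p.R100H"),
    ("GENE_A", "p.G200D"),
    ("GENE_C", "p.V5M"),
]

per_gene = overlapping_mutations_per_gene(somatic_like, pathogenic_like)
print(per_gene)
```

Genes with many distinct shared mutations would be the candidates of interest under this simplified reading; any ranking or statistical filtering is left out of the sketch.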
Page 67
57
IDENTIFYING CANCER SPECIFIC METABOLIC SIGNATURES USING CONSTRAINT-BASED MODELS
André Schultz1, Sanket Mehta1, Chenyue W. Hu1, Fieke W. Hoff2, Terzah M. Horton3, Steven M. Kornblau2, Amina A. Qutub1
1Rice University, 2University of Texas MD Anderson Cancer Center, 3Baylor College of Medicine and Texas Children's Hospital
André Schultz
Cancer metabolism differs remarkably from the metabolism of healthy surrounding tissues, and it is extremely heterogeneous across cancer types. While these metabolic differences provide promising avenues for cancer treatments, much work remains to be done in understanding how metabolism is rewired in malignant tissues. To that end, constraint-based models provide a powerful computational tool for the study of metabolism at the genome scale. To generate meaningful predictions, however, these generalized human models must first be tailored for specific cell or tissue sub-types. Here we first present two improved algorithms for (1) the generation of these context-specific metabolic models based on omics data, and (2) Monte-Carlo sampling of the metabolic model flux space. By applying these methods to generate and analyze context-specific metabolic models of diverse solid cancer cell line data, and primary leukemia pediatric patient biopsies, we demonstrate how the methodology presented in this study can generate insights into the rewiring differences across solid tumors and blood cancers.
Page 68
58
SINGLE-CELL ANALYSIS AND MODELLING OF CELL POPULATION HETEROGENEITY
PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS
Page 69
59
MAPPING NEURONAL CELL TYPES USING INTEGRATIVE MULTI-SPECIES MODELING OF HUMAN AND MOUSE SINGLE CELL RNA SEQUENCING
Travis Johnson, Zachary Abrams, Yan Zhang, Kun Huang
Ohio State University
Travis Johnson
Mouse brain transcriptomic studies are important in the understanding of the structural heterogeneity in the brain. However, it is not well understood how cell types in the mouse brain relate to human brain cell types on a cellular level. We propose that it is possible with single cell granularity to find concordant genes between mouse and human and that these genes can be used to separate cell types across species. We show that a set of concordant genes can be algorithmically derived from a combination of human and mouse single cell sequencing data. Using this gene set, we show that similar cell types shared between mouse and human cluster together. Furthermore, we find that previously unclassified human cells can be mapped to the glial/vascular cell type by integrating mouse cell type expression profiles.
Page 70
60
A SPATIOTEMPORAL MODEL TO SIMULATE CHEMOTHERAPY REGIMENS FOR HETEROGENEOUS BLADDER CANCER METASTASES TO THE LUNG
Kimberly R. Kanigel Winner1, James C. Costello2
1Computational Bioscience Program, Department of Pharmacology, University of Colorado Cancer Center; 2University of Colorado Anschutz Medical Campus
Kimberly Kanigel Winner
Tumors are composed of heterogeneous populations of cells. Somatic genetic aberrations are one form of heterogeneity that allows clonal cells to adapt to chemotherapeutic stress, thus providing a path for resistance to arise. In silico modeling of tumors provides a platform for rapid, quantitative experiments to inexpensively study how compositional heterogeneity contributes to drug resistance. Accordingly, we have built a spatiotemporal model of a lung metastasis originating from a primary bladder tumor, incorporating in vivo drug concentrations of first-line chemotherapy, resistance data from bladder cancer cell lines, vascular density of lung metastases, and gains in resistance in cells that survive chemotherapy. In metastatic bladder cancer, a first-line drug regimen includes six cycles of gemcitabine plus cisplatin (GC) delivered simultaneously on day 1, and gemcitabine on day 8 in each 21-day cycle. The interaction between gemcitabine and cisplatin has been shown to be synergistic in vitro, and results in better outcomes in patients. Our model shows that during simulated treatment with this regimen, GC synergy does begin to kill cells that are more resistant to cisplatin, but repopulation by resistant cells occurs. Post-regimen populations are mixtures of the original, seeded resistant clones, and/or new clones that have gained resistance to cisplatin, gemcitabine, or both drugs. The emergence of a tumor with increased resistance is qualitatively consistent with the five-year survival of 6.8% for patients with metastatic transitional cell carcinoma of the urinary bladder treated with a GC regimen. The model can be further used to explore the parameter space for clinically relevant variables, including the timing of drug delivery to optimize cell death, and patient-specific data such as vascular density, rates of resistance gain, disease progression, and molecular profiles, and can be expanded for data on toxicity. The model is specific to bladder cancer, which has not previously been modeled in this context, but can be adapted to represent other cancers.
Page 71
61
SCALABLE VISUALIZATION FOR HIGH-DIMENSIONAL SINGLE-CELL DATA
Juho Kim, Nate Russell, Jian Peng
University of Illinois at Urbana-Champaign
Juho Kim
Single-cell analysis can uncover the mysteries in the state of individual cells and enable us to construct new models about the analysis of heterogeneous tissues. State-of-the-art technologies for single-cell analysis have been developed to measure the properties of single cells and detect hidden information. They are able to provide the measurements of dozens of features simultaneously in each cell. However, due to the high dimensionality, heterogeneous complexity and sheer enormity of single-cell data, its interpretation is challenging. Thus, new methods to overcome high dimensionality are necessary. Here, we present a computational tool that allows efficient visualization of high-dimensional single-cell data onto a low-dimensional (2D or 3D) space while preserving the similarity structure between single cells. We first construct a network that can represent the similarity structure between the high-dimensional representations of single cells, and then embed this network into a low-dimensional space through an efficient online optimization method based on the idea of negative sampling. Using this approach, we can preserve the high-dimensional structure of single-cell data in an embedded low-dimensional space that facilitates visual analyses of the data.
Page 72
62
COMPUTATIONAL APPROACHES TO UNDERSTANDING THE EVOLUTION OF MOLECULAR FUNCTION
POSTER PRESENTATIONS
Page 73
63
CLUSTER-BASED GENOTYPE-ENVIRONMENT-PHENOTYPE CORRELATION ALGORITHM
Ernesto Borrayo, Ryoko Machida-Hirano
Gene Research Center, University of Tsukuba
Ernesto Borrayo
The interactions between genotype and environment give rise to phenotypic plasticity. However, these interactions are dynamic and complex. What is considered a phenotype at one evaluation can be considered an environmental condition at another, as that previous phenotype will affect particular conditions for the new one. Also, under a specific perspective, a given genetic material can be considered an environmental condition for other loci. These concepts show that the “one gene, one trait” rationale is the exception rather than the rule, and in order to adequately predict the possible phenotype expected at any biological level, the specific interaction between environment and genotype should be analyzed carefully. In order to infer the degree of influence of both a genotype and an environment over certain phenotypic traits, we developed a cluster-based algorithm that renders the way phenotypical traits can be explained by either that genotype or such environmental conditions. Although this approach is still far from being able to consider all possible aspects that may explain a phenotypic condition, it is a first approach to successfully analyzing the mentioned genotype-environment-phenotype interactions in a comprehensive manner. To test the algorithm, real genetic, environmental and agromorphological traits of Theobroma cacao and Sechium edule were analyzed along with synthetic data. We expect that further exploration of different classifiers will help to adequately predict phenotypic expression at different biological levels (with significant applications in diverse fields such as crop improvement, genomics, clinical diagnosis/prognosis/treatment and metabolomics) and that it will enhance our understanding of genomics, metabolomics and adaptation/evolutionary processes.
Page 74
64
QUANTITATING TRANSLATIONAL CONTROL: MRNA ABUNDANCE-DEPENDENT AND INDEPENDENT CONTRIBUTIONS
Jingyi Jessica Li1, Guo-Liang Chew2, Mark D. Biggin3
1Department of Statistics and Department of Human Genetics, UCLA; 2Computational Biology Program, Fred Hutchinson Cancer Research Center; 3Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory
Jingyi Jessica Li
Translation rate per mRNA molecule correlates positively with mRNA abundance. As a result, protein levels do not scale linearly with mRNA levels, but instead scale with the abundance of mRNA raised to the power of an “amplification exponent”. Here we show that to quantitate translational control it is necessary to decompose the translation rate into two components. One component, TRmD, depends on the mRNA level and defines the amplification exponent. The other component, TRmIND, is independent of mRNA amount and impacts the correlation coefficient between protein and mRNA levels. We show that in S. cerevisiae TRmD represents ~30% of the variance in translation and results in an amplification exponent of ~1.20–1.27. TRmIND constitutes the remaining 70% of the variance in translation and explains <5% of the variance in protein expression. When protein degradation is also considered, the correlation between the abundances of protein and mRNA is R²prot–RNA > 0.92. We also investigate which mRNA sequence elements explain the variance in TRmD and TRmIND. We find that TRmIND is most strongly determined by the length of the open reading frame, while TRmD is more strongly determined by an A-rich, highly unfolded element that spans nucleotides -35 to +28 relative to the initiating AUG codon, implying that TRmIND is under different evolutionary selective pressures than TRmD. Our work introduces methods for correctly scaling mRNA and protein abundance data using internally controlled standards. It provides quite different, more accurate estimates of translational control than any previous estimates. By decomposing translation rates, we also provide insights into the mRNA sequence dependencies of translation that would not be apparent otherwise.
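The power-law relationship described above (protein level proportional to mRNA level raised to an amplification exponent) becomes a straight line on a log-log scale, so the exponent can be read off as a slope. The Python sketch below illustrates only that relationship on synthetic numbers. It is not the authors' decomposition into TRmD and TRmIND, and the function name and toy data are assumptions.

```python
import math


def amplification_exponent(mrna, protein):
    """Least-squares slope of log(protein) against log(mRNA)."""
    x = [math.log(v) for v in mrna]
    y = [math.log(v) for v in protein]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Synthetic, noise-free abundances generated with a known exponent of 1.2.
mrna = [1.0, 2.0, 4.0, 8.0, 16.0]
protein = [3.0 * m ** 1.2 for m in mrna]

exponent = amplification_exponent(mrna, protein)
print(round(exponent, 3))
```

With real measurements, noise in both variables biases a simple regression slope, which is one reason careful scaling of the abundance data matters.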
Page 75
65
PROSNET: INTEGRATING HOMOLOGY WITH MOLECULAR NETWORKS FOR PROTEIN FUNCTION PREDICTION
Sheng Wang, Meng Qu, Jian Peng
University of Illinois Urbana-Champaign
Sheng Wang
Automated annotation of protein function has become a critical task in the post-genomic era. Network-based approaches and homology-based approaches have been widely used and recently tested in large-scale community-wide assessment experiments. It is natural to integrate network data with homology information to further improve the predictive performance. However, integrating these two heterogeneous, high-dimensional and noisy datasets is non-trivial. In this work, we introduce a novel protein function prediction algorithm, ProSNet. An integrated heterogeneous network is first built to include molecular networks of multiple species and link together homologous proteins across multiple species. Based on this integrated network, a dimensionality reduction algorithm is introduced to obtain compact low-dimensional vectors to encode proteins in the network. Finally, we develop machine learning classification algorithms that take the vectors as input and make predictions by transferring annotations both within each species and across different species. Extensive experiments on five major species demonstrate that our integration of homology with molecular networks substantially improves the predictive performance over existing approaches.
Page 76
66
GENERAL
POSTER PRESENTATIONS
Page 77
67
IDENTIFICATION OF DIFFERENTIALLY PHOSPHORYLATED MODULES IN PROTEIN INTERACTION NETWORKS
Marzieh Ayati, Danica Wiredja, Daniela Schlatzer, Goutham Narla, Mark Chance, Mehmet Koyutürk
Case Western Reserve University
Mehmet Koyutürk
Advances in high-throughput omics technologies revolutionized our understanding of the genomic underpinnings of cancer. However, many challenges remain in understanding how patients with common driver mutations may display diverging phosphoproteomic responses to the same treatment. Thus, an examination of the signaling landscape will provide essential molecular information for modeling personalized patient treatment design. However, integrative bioinformatics approaches to identify phosphoproteomics-based molecular states are in their infancy. To address this challenge, we adapt our algorithm MoBaS, which was originally developed to identify phenotype-associated subnetworks in the context of genome-wide association studies. MoBaS takes as input a PPI network and a score for each protein indicating the protein’s differential phosphorylation level. It then identifies protein subnetworks that are (i) composed of densely interacting proteins, and (ii) enriched in proteins with high scores. MoBaS also assesses the statistical significance of the identified subnetworks using permutation tests that effectively handle multiple hypothesis testing. We apply MoBaS to compare and contrast the drug-induced global signaling alterations of two KRAS mutated non-small cell lung cancer (NSCLC) cell lines, A549 and H358, treated with a novel activator of the tumor suppressor Protein Phosphatase 2A (PP2A) versus DMSO control. Applying kinase enrichment analysis on identified subnetworks, we identify AuroraKB as a key kinase differentially regulated between the two cell lines in response to our compound. Further corroborating this finding, we show that AuroraKB is downregulated at the protein and mRNA levels with our treatment in A549 but not in H358.
Page 78
68
CLUSTERING METHOD FOR PRIORITIZING BREAST CANCER RISK GENES AND MIRNAS
Yongsheng Bai, Naureen Aslam, Ali Salman
Indiana State University
Yongsheng Bai
Background: MicroRNAs (miRNA) are short nucleotides that interact with their target mRNAs through 3’ untranslated regions (UTRs). The Cancer Genome Atlas (TCGA) project, initiated in 2006, has sequenced tissue collections with matched tumor and normal samples from 11,000 patients in 33 cancer types and subtypes, including 10 rare cancers. There is an urgent need to develop innovative methodologies and tools that can cluster mRNA-miRNA interaction pairs into groups and characterize functional consequences of cancer risk genes while analyzing the tumor and normal samples simultaneously. Rationale: An undirected graph can be used to represent gene and miRNA relationships in an interaction network. Specifically, interactions between genes and miRNAs are rendered as a bipartite graph with genes or miRNAs as vertices and their calculated correlation as edges. Our hypothesis is: if a highly scored gene/miRNA cluster in a given tumor sample shows a significantly altered regulation relative to a similar gene/miRNA cluster in the corresponding non-tumor sample, the cluster is biologically significant. Results: We developed a powerful mathematical model to identify clusters of significant mRNA and miRNA interaction pairs and decipher the mRNA and miRNA regulation network using TCGA miRNA sequencing and mRNA sequencing data. We ran the cluster detection algorithm, implemented in Python 3, on TCGA Breast Invasive Carcinoma (BRCA) transcriptome (both RNA-Seq and miRNA-Seq) datasets. Using a different cluster size (or bin) and a different selection of miRNA and mRNA pairs for creating clusters will generate a different topology of clusters, and therefore different numbers of common clusters between tumor and normal samples as well. We ran 1,000 different random selections of target pairs to generate different cluster topologies and combined all results together to obtain 105,850 distinctive candidate clusters for prioritization. Conclusions: We think our methodology for identifying cancer driver genes in personal genomes, with which clinicians seek to develop better treatment strategies, is valuable to the field. Our proposed method should be applicable across a range of diseases and cancers.
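The bipartite representation described in the Rationale can be sketched as follows: each candidate miRNA-gene pair becomes an edge weighted by the correlation of the two expression profiles. This is an illustrative Python sketch under stated assumptions, not the authors' implementation. Pearson correlation is assumed as the correlation measure, and the identifiers and expression values are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def build_bipartite_edges(mirna_expr, mrna_expr, target_pairs):
    """Map each (miRNA, gene) pair to the correlation of their profiles."""
    return {
        (mirna, gene): pearson(mirna_expr[mirna], mrna_expr[gene])
        for mirna, gene in target_pairs
    }


# Hypothetical expression profiles across four samples.
mirna_expr = {"miR-x": [1.0, 2.0, 3.0, 4.0]}
mrna_expr = {"GENE_A": [8.0, 6.0, 4.0, 2.0]}

edges = build_bipartite_edges(mirna_expr, mrna_expr, [("miR-x", "GENE_A")])
print(edges)
```

Building one such edge set for tumor samples and one for matched normal samples would give the two graphs whose clusters the abstract proposes to compare.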
Page 79
69
FUSIONDB: ASSESSING MICROBIAL DIVERSITY AND ENVIRONMENTAL PREFERENCES VIA FUNCTIONAL SIMILARITY
Chengsheng Zhu1, Yannick Mahlich1,2,3,4, Yana Bromberg1,4
1Department of Biochemistry and Microbiology, School of Environmental and Biological Sciences, Rutgers University, New Brunswick, NJ, USA; 2Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), TUM, Garching, Germany; 3Department of Informatics, Bioinformatics & Computational Biology - I12, TUM, Garching, Germany; 4Institute of Advanced Study (TUM-IAS), Garching, Germany
Yana Bromberg
Summary: Microbial functional diversification is driven by environmental factors. In some cases, microbes differ more across environments than across taxa. Here we introduce fusionDB, a novel database of microbial functional similarities, indexed by available environmental preferences. fusionDB entries represent nearly fourteen hundred taxonomically-distinct bacteria annotated with available metadata: habitat, temperature, and oxygen use. Each microbe is encoded as a set of functions represented by its proteome, and individual microbes are connected via common functions. Database searches produce easily visualizable XML-formatted network files of selected organisms, along with their shared functions. fusionDB thus provides a fast means of associating specific environmental factors with organism functions. Availability: http://bromberglab.org/databases/fusiondb and as a sql-dump by request. Contact: [email protected], [email protected]
Page 80
70
THE GEORGE M. O’BRIEN KIDNEY TRANSLATIONAL CORE CENTER AT THE UNIVERSITY OF MICHIGAN
Frank C. Brosius1, Wenjun Ju1, Keith Bellovich2, Zeenat Bhat3, Crystal Gadegbeku4, Debbie Gipson1, Jennifer Hawkins1, Julia Herzog1, Susan Massengill5, Richard C. McEachin1, Subramaniam Pennathur1, Kalyani Perumal6, Roger Wiggins1, Matthias Kretzler1
1University of Michigan, 2Renaissance Renal Research Institute, 3Wayne State University, 4Temple University, 5Levine Children’s Hospital, 6University of Illinois at Chicago
Richard McEachin
Recent advances have allowed the development of molecular maps to define chronic kidney disease (CKD) in new, accurate and personalized ways. These developments make possible the prediction of outcomes and response to therapy and the identification of key molecular targets for treatment of CKD in individual patients. Identification of such targets entails close collaboration between teams of investigators to collect and annotate samples from well characterized CKD subjects. In addition, technologies are needed that support information exchange, robust databanks, and data integration to define key pathways driving CKD pathogenesis. The O'Brien Kidney Translational Core Center at the University of Michigan provides such biobanking, databank structure and bioinformatic support to basic and clinical investigators to allow them to pursue critical precision medicine investigations of humans with CKD. The Clinical Phenotyping and Biobank Core has enrolled over 1200 patients with CKD from 5 sites and banked their samples and clinical information, providing a valuable resource for efficient discovery. Multiple specific research studies have now successfully utilized these resources. The Applied Systems Biology Core and its online analytical tool, Nephroseq, have assisted hundreds of investigators around the world in approaches to the analysis of large transcriptomic datasets and other systems-level, biological studies of patients with CKD. The Center’s Bioinformatics Core provides access to computational applications and skilled professional support in bioinformatics and biostatistics and will now be providing back-end maintenance of Nephroseq. The Administrative Core directs pilot and small grants, student training and discount programs with the goal of helping new and established researchers utilize systems biological and translational research tools. Together these cores provide comprehensive translational research support for novel research into classification and treatment of chronic kidney diseases. All interested academic investigators around the world are invited to make use of these services and to contact us for information and consultation.
Page 81
71
MINING DIRECTIONAL DRUG INTERACTION EFFECTS ON MYOPATHY USING THE FAERS DATABASE
Danai Chasioti1, Xiaohui Yao1, Pengyue Zhang2, Xia Ning3, Lang Li2, Li Shen4
1IUPUI School of Informatics and Computing; 2Center for Computational Biology and Bioinformatics, Department of Medical and Molecular Genetics, Indiana University School of Medicine; 3IUPUI Department of Computer Science; 4Center for Neuroimaging, Department of Radiology and Imaging Sciences, Indiana University School of Medicine
Li Shen
Background: Mining high-order drug-drug interaction (DDI) induced adverse drug effects (ADEs) from electronic health record (EHR) databases is an emerging area, and very few studies have explored the relationships between DDIs. To bridge this gap, we study a novel pharmacovigilance problem for mining directional drug interaction effects on myopathy using the FDA Adverse Event Reporting System (FAERS) database. Method: The analysis was performed on a case–control dataset extracted from the FAERS database. The dataset contains 1,763 drugs, and includes 136,860 myopathy events and 3,940,587 control events. Given two sets of drug combinations D1 and D2 (a superset of D1), we define the directional ADE effect from D1 to D2 as the altered ADE risk associated with the change from taking D1 to taking D2. The ADE risks were estimated using odds ratios (ORs). To address both computational and statistical challenges, this study was focused on computing ORs for frequent D2’s (i.e., the number of occurrences is at least a user-specified minimum support). The Apriori algorithm was employed to identify frequent D2’s. Results: Using the minimum support of 1000, we identified 764 frequent drugs, 7036 frequent 2-drug combinations, and 4280 frequent 3-drug combinations. The top ten ADE ORs for single drugs range from 4.1 to 5.6, for two-drug combinations from 12.6 to 21.5, and for three-drug combinations from 14.8 to 19.5. The top ten directional ADE ORs between one drug and two drugs range from 13.5 to 28.2; those between one drug and three drugs range from 13.1 to 20.3; and those between two drugs and three drugs range from 11.3 to 34.4. Multiple promising directional ADE findings were identified. For example, the risk of myopathy is 28.2 times higher when adding Gadopentetate dimeglumine on top of Gadobenate dimeglumine. Both drugs are Gadolinium-based contrast agents (GBCAs) used in magnetic resonance imaging. GBCAs have been shown to be associated with Nephrogenic systemic fibrosis (NSF), which may present as progressive myopathy. Conclusion: The directional drug interactions capture the ADE risks introduced by additional drugs taken on top of a set of baseline drugs, and provide novel and valuable pharmacovigilance knowledge with potential to impact clinical decision support. Mining frequent patterns using Apriori is a promising approach for effective discovery of high-order directional drug interaction effects.
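One plausible reading of the directional odds ratio is sketched below in Python. The abstract does not spell out the exact 2x2 table, so the choice here is an assumption: reports containing all of D2 are treated as exposed, and reports containing D1 but not all of D2 are treated as the reference group. The function names and toy reports are hypothetical, and frequent-itemset mining with Apriori is not shown.

```python
def odds_ratio(exposed_cases, exposed_controls, ref_cases, ref_controls):
    """Odds ratio of a 2x2 table; all four counts must be non-zero."""
    return (exposed_cases * ref_controls) / (exposed_controls * ref_cases)


def directional_odds_ratio(reports, d1, d2):
    """OR of the event for reports on D2 versus reports on D1 without all of D2.

    `reports` is an iterable of (set_of_drugs, is_case) pairs.
    """
    d1, d2 = frozenset(d1), frozenset(d2)
    if not d1 < d2:
        raise ValueError("D2 must be a proper superset of D1")
    a = b = c = d = 0
    for drugs, is_case in reports:
        drugs = set(drugs)
        if d2 <= drugs:          # takes every drug in D2
            if is_case:
                a += 1
            else:
                b += 1
        elif d1 <= drugs:        # takes D1 but not all of D2
            if is_case:
                c += 1
            else:
                d += 1
    return odds_ratio(a, b, c, d)


# Made-up reports: (drugs taken, whether the adverse event was reported).
reports = (
    [({"X", "Y"}, True)] * 3
    + [({"X", "Y"}, False)] * 1
    + [({"X"}, True)] * 1
    + [({"X"}, False)] * 3
)

directional_or = directional_odds_ratio(reports, {"X"}, {"X", "Y"})
print(directional_or)
```

Restricting the computation to combinations that meet a minimum support, as the abstract describes, also keeps the cell counts away from zero, where the ratio is undefined.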
Page 82
72
DECIPHERING NEURONAL BROAD HISTONE H3K4ME3 DOMAINS ASSOCIATED WITH GENE-REGULATORY NETWORKS AND CONSERVED EPIGENOMIC LANDSCAPES IN THE HUMAN BRAIN
Aslihan Dincer1, Eric E. Schadt2, Bin Zhang2, Joel T. Dudley2, Davin Gavin3, Schahram Akbarian4
1Department of Neuroscience, Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York; 2Department of Genetics and Genomic Sciences, Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York; 3Department of Psychiatry, Jesse Brown Veterans Affairs Medical Center, Chicago; 4Department of Psychiatry, Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York
Aslihan Dincer
Only a few histone modifications have been mapped in human brain. Trimethylation of histone H3 at lysine 4 (H3K4me3) is a chromatin modification known to mark the transcription start sites (TSS) of active gene promoters. Regulators of the H3K4me3 mark are significantly associated with the genetic risk architecture of common neurodevelopmental disease, including schizophrenia and autism. Here, through integrative computational analysis of epigenomic and transcriptomic data based on next generation sequencing, we investigated H3K4me3 landscapes of FACS sorted neuronal and non-neuronal nuclei in human postmortem, non-human primate (chimpanzee and macaque) and mouse prefrontal cortex (PFC), and blood. We characterized the broad H3K4me3 histone domains from human PFC in the context of cell-type specific regulation, association with neuronal and non-neuronal gene expression and potential implications for normal and diseased development. We first addressed the occurrence and the biological significance of the broad H3K4me3 histone domains in three different cell types, including NeuN+ PFC neurons, NeuN- PFC cells, and nucleated blood cells, and then identified novel regulators of these three different cell types by focusing on the top 5% broadest H3K4me3 peaks (length in base pairs). In PFC neurons, broadest peaks ranged in size from 3.9 to 12 kb, with extremely broad peaks (~10 kb or broader) related to synaptic function and GABAergic signaling (DLX1, ELFN1, GAD1, LINC00966). Broadest neuronal peaks showed distinct motif signatures, and were centrally positioned in prefrontal gene Bayesian regulatory networks. Approximately 120 of the broadest H3K4me3 peaks in human PFC neurons, including many genes related to glutamatergic and dopaminergic signaling, were fully conserved in chimpanzee, macaque and mouse cortical neurons. Exploration of spread and breadth of lysine methylation markings in specific cell types could provide novel insights into epigenetic mechanisms of normal and diseased brain development, aging and evolution of neuronal genomes.
Page 83
73
NORMALIZATION TECHNIQUES AND MACHINE LEARNING CLASSIFICATION FOR ASSIGNING MOLECULAR SUBSETS IN AUTOIMMUNE DISEASE AND CANCER
Jennifer M. Franks1,2, Guoshuai Cai1, Jaclyn N. Taroni3,4, Michael L. Whitfield1,2
1Department of Molecular and Systems Biology; 2Program in Quantitative Biomedical Sciences, Geisel School of Medicine at Dartmouth; 3Department of Systems Pharmacology and Translational Therapeutics; 4Institute for Translational Medicine and Therapeutics, University of Pennsylvania Perelman School of Medicine
Jennifer Franks
Systemic sclerosis (SSc) is a complex connective tissue disease involving skin and internal organ fibrosis, vascular damage, and immunologic abnormalities. To characterize disease heterogeneity and molecular pathogenesis, transcriptomics have elucidated common biological processes in subsets of SSc patients using intrinsic gene expression analyses. Four intrinsic subsets characterized by distinct molecular signatures have been validated by multiple independent cohorts. Technical biases inherent to different gene expression profiling platforms present a unique problem when analyzing data generated from multiple studies. While microarray and RNA-seq data have been shown to have a high correlation, differences in overall processing and quantification result in distinct data distributions. Here, we introduce an accurate and reproducible classifier for SSc molecular subtypes and have developed a method to normalize data when platform-specific artifacts arise. We used three independent, well-characterized and validated experimental microarray datasets (Hinchcliff et al., 2013; Milano et al., 2008; Pendergrass et al., 2012) to train a supervised classifier using three-fold cross-validation repeated ten times, performing at an average of >88% accuracy. Data from other platforms, including RNA-seq, are analyzed for platform-based bias using guided PCA analysis (Reese et al., 2013). We developed a method to eliminate platform bias by normalizing on a gene-by-gene basis using the microarray training data as the target distribution. We find that this method successfully removes platform-specific effects from the data. Following normalization, each sample is assigned to a molecular subset based on support vector machine (SVM) classification. Our preliminary analyses find that these methods work extremely well on a validation RNA-seq dataset in SSc (100% accuracy, n=12, Li et al., in preparation). We also applied our methods to breast cancer DNA microarray and RNA-seq data from The Cancer Genome Atlas (TCGA) (Cancer Genome Atlas, 2012) where five intrinsic gene expression subsets have been previously identified and described with PAM50 (Parker et al., 2009). Tumor and tumor-adjacent normal biopsies of breast cancer, for which intrinsic subtype information was available, were used to train and test an SVM and evaluate our normalization technique. We achieve 93% accuracy in assigning subtypes for normalized RNA-seq data using our classifier trained exclusively on microarray data. Until recently, clinical trials and diagnosing physicians have not considered molecular heterogeneity in the context of immunosuppressive therapy, which may explain improvement in select SSc patients (Martyanov & Whitfield, 2016). Advancing personalized medicine by using intrinsic molecular subsets may prove particularly beneficial to this field. With our newly developed techniques, we can successfully leverage information from validated expression data in new analyses despite different platforms used for gene expression profiling.
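Gene-by-gene normalization toward a target distribution can take several forms, and the abstract does not say which one was used. The Python sketch below assumes the simplest variant: each gene in the new platform's data is standardized and then rescaled to the mean and standard deviation of the same gene in the training data. The function names and values are hypothetical.

```python
from statistics import mean, pstdev


def match_gene_to_target(values, target_values):
    """Rescale one gene's values to the target's mean and standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    target_mu, target_sigma = mean(target_values), pstdev(target_values)
    if sigma == 0:
        # A constant gene carries no spread to rescale; map it to the target mean.
        return [target_mu for _ in values]
    return [(v - mu) / sigma * target_sigma + target_mu for v in values]


def normalize_to_target(new_data, target_data):
    """Apply the per-gene rescaling to every gene present in both datasets."""
    return {
        gene: match_gene_to_target(values, target_data[gene])
        for gene, values in new_data.items()
        if gene in target_data
    }


# Made-up expression values for one gene on two platforms.
rnaseq_like = {"GENE_A": [10.0, 20.0, 30.0]}
microarray_like = {"GENE_A": [1.0, 2.0, 3.0]}

normalized = normalize_to_target(rnaseq_like, microarray_like)
print(normalized)
```

After such a step, a classifier trained only on the target platform sees inputs on the scale it was trained on. Quantile-based matching would be an alternative when the distribution shapes differ as well as the location and scale.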
Page 84
74
MULTI-OMICS DATA INTEGRATION TO STRATIFY POPULATION IN HEPATOCELLULAR CARCINOMA
Kumardeep Chaudhary, Olivier Poirion, Liangqun Lu, Lana Garmire
University of Hawaii Cancer Center, Honolulu
Lana Garmire
The high mortality rate of hepatocellular carcinoma (HCC) is in part due to the vast heterogeneity of the cancer. Identifying robust molecular subgroups of HCC helps to guide precise targeted therapeutics. This could be realized by integrating different layers of omics datasets from the same cohort. To achieve this, we present a deep learning (DL) based method to inspect the different subpopulations of patients within HCC from TCGA. We obtained the information of 360 HCC patients available in TCGA with 3 omics data types (RNA-seq, miRNA-seq and methylation). To identify the different subpopulations, our pipeline implements a DL-based autoencoder, identifies hidden layers linked to survival, and performs k-means clustering using these new features. To assign new samples to the identified subpopulations, a supervised classification procedure was conducted using Support Vector Machine (SVM). To assess the performance of the model, we used a 5-fold cross-validation scheme to estimate the c-index and Brier scores. We also used a 60:40 ratio to split the data in 10 folds in order to assess the significance of the Cox-PH regression in the test dataset. Finally, we inferred the cluster labels of two external cohorts based on the gene expression data. The autoencoder framework was used to combine the 3 omics as input features (~40,000) and to produce 100 transformed new features. Among these new features, we identified 36 features significantly linked with survival, which were further used to infer 2 optimal clusters of patients with significant survival differences. Using the cross-validation procedure, we obtained average c-index and Brier score values of 0.70 and 0.20 respectively, for the test sets. Also, the Cox-PH regression shows significant survival estimation when using the test samples. Finally, our framework is validated on two external datasets: 221 HCC samples from a GEO study and 230 HCC samples from the LIRI-JP (RIKEN) cohort. Moreover, we proved that each of the individual omic feature sets can be used successfully to infer the 2 survival profiles. However, the combination of the 3 omics is more powerful. We also compared the DL methodology with new features produced by PCA instead. The clinical and molecular profiles (in terms of survival, pathways, and driver mutation profiles) differed significantly between the two subpopulations. This is the first study to employ deep learning as a robust framework to identify non-linear combinations of multi-omics features linked to identification of subclasses of HCC patients. Using multi-omics datasets, our pipeline successfully combines these different features and identifies two HCC subpopulations exhibiting different survival profiles. We then used this model in combination with supervised machine-learning approaches to predict HCC subpopulation assignment for test and validation datasets.
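For readers unfamiliar with the c-index reported above, the following is a small illustrative sketch of Harrell's concordance index in plain Python. It is not the authors' evaluation code; the function name and the toy survival data are hypothetical.

```python
def concordance_index(times, events, risk):
    """Harrell's c-index: share of comparable patient pairs ranked correctly.

    times  : observed follow-up time per patient
    events : 1 if the event (death) was observed, 0 if censored
    risk   : predicted risk score (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # ties in risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical toy cohort of four patients; the third one is censored.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.6, 0.7, 0.1]

c_index = concordance_index(times, events, risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives some context for the average test-set value of 0.70 quoted in the abstract.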
Page 85
75
TOWARDS STANDARDS-BASED CLINICAL DATA WEB APPLICATION LEVERAGING SHINY R AND HL7 FHIR
Na Hong, Naresh Prodduturi, Chen Wang, Guoqian Jiang
Department of Health Sciences Research, Mayo Clinic, Rochester, MN
Guoqian Jiang
Introduction: The Fast Healthcare Interoperability Resources (FHIR) is an emerging clinical data standard developed at HL7, which enables the representation and exchange of electronic health record (EHR) data in a standard structure. FHIR has strong executable ability based on the RESTful service architecture and multiple flexible data exchange formats. Shiny is a web application framework with a simplified web deployment mechanism that enables powerful R functions to support graphical and interactive analysis. Therefore, with the goal of building reusable and extensible clinical statistics and analysis applications, we aim to design, develop and evaluate a flexible framework using the HL7 FHIR standard and the R-powered web application, Shiny.
Methods: We first established a local FHIR server to manage our clinical data. This part of the work focused on the analysis and implementation of the FHIR data models (i.e., core resources), data exchange formats (e.g., XML and JSON) and invoking an open source HAPI FHIR API. Second, we designed two analysis workflows that are focused on patient-centered data analysis and cohort-based data analysis respectively. According to the workflow design, we developed an open application platform known as Shiny FHIR using the Shiny web framework and the established FHIR server.
Results: We built a local FHIR server using the HAPI DSTU2 API. In total, 140 patient records, 476 observation records, 496 condition records and 107 procedure records were populated into the FHIR server for testing. With the support of R packages, including ‘jsonlite’, ‘dygraph’ and ‘timeline’, our platform can be used for a variety of use cases of clinical data analysis, including patient blood pressure observation timeline analysis, patient cohort gender/age distribution statistics, etc. The results of the experiment show that the Shiny FHIR integration approach offers the feasibility of web-based interactive statistics analysis on standardized FHIR-based clinical data.
Discussions: The implementations of FHIR have already attracted a lot of interest from healthcare practitioners. Our Shiny FHIR implementation provides a useful framework that would be complementary to other FHIR-based applications (e.g., SMART on FHIR). Shiny FHIR is designed to visualize FHIR-conformant data through capturing the user experiences and habits, and offers rapid support for clinical research while combining the limitless statistical power of R. However, there are several issues that need to be solved in the future, such as the support of the FHIR extensions and custom models and the system performance enhancement. In this study, we described our efforts in building a standardized clinical statistics and analysis application leveraging Shiny. We consider that the designed workflows can be applied to other EHR data that follow the FHIR standard, and other publicly available FHIR servers can be used to validate the utility of our framework.
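The blood pressure timeline use case mentioned above can be illustrated with a short sketch. The authors' platform is built in R with Shiny; the Python below is only an assumed, simplified illustration of how systolic and diastolic components might be pulled out of FHIR Observation resources in a JSON Bundle. The helper names and sample values are hypothetical; the two LOINC codes are the standard ones for systolic and diastolic blood pressure.

```python
import json

# LOINC codes for the two blood pressure components.
SYSTOLIC = "8480-6"
DIASTOLIC = "8462-4"

def blood_pressure_timeline(bundle):
    """Return (date, systolic, diastolic) tuples from a Bundle, sorted by date."""
    rows = []
    for entry in bundle.get("entry", []):
        obs = entry.get("resource", {})
        if obs.get("resourceType") != "Observation":
            continue  # skip Patient, Condition, Procedure, ...
        when = obs.get("effectiveDateTime")
        reading = {}
        for comp in obs.get("component", []):
            codes = {c.get("code") for c in comp.get("code", {}).get("coding", [])}
            value = comp.get("valueQuantity", {}).get("value")
            if SYSTOLIC in codes:
                reading["systolic"] = value
            elif DIASTOLIC in codes:
                reading["diastolic"] = value
        if when is not None and len(reading) == 2:
            rows.append((when, reading["systolic"], reading["diastolic"]))
    return sorted(rows)

def make_bp_observation(date, systolic, diastolic):
    """Build a minimal, hypothetical blood pressure Observation for the demo."""
    def component(code, value):
        return {"code": {"coding": [{"system": "http://loinc.org", "code": code}]},
                "valueQuantity": {"value": value, "unit": "mmHg"}}
    return {"resource": {"resourceType": "Observation",
                         "effectiveDateTime": date,
                         "component": [component(SYSTOLIC, systolic),
                                       component(DIASTOLIC, diastolic)]}}

# Round-trip through JSON text, as the data would arrive from a FHIR server.
bundle_text = json.dumps({
    "resourceType": "Bundle",
    "entry": [
        make_bp_observation("2016-03-01", 130, 85),
        {"resource": {"resourceType": "Patient", "id": "example"}},
        make_bp_observation("2016-01-05", 120, 80),
    ],
})
timeline = blood_pressure_timeline(json.loads(bundle_text))
```

The resulting list of dated readings is the kind of table a plotting layer could turn into an interactive timeline.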
Page 86
76
A DATA LAKE PLATFORM OF CONTEXTUAL BIOLOGICAL INFORMATION FOR AGILE TRANSLATIONAL RESEARCH
Austin Huang1, Dmitri Bichko1, Mathieu Boespflug2, Edsko de Vries3, Facundo Dominguez2, Daniel Ziemek1
1Pfizer, 2Tweag I/O, 3Well-Typed
Austin Huang
Researchers need to aggregate contextual biological information in order to interpret experimental and clinical study results. These needs vary greatly depending on the scientific question. Creating large-scale, structured data repositories requires substantial investment that is not amenable to the rapidly-evolving needs of translational research. On the other hand, while performing data analyses using ad hoc collections of local data files (Excel sheets, CSV tables, etc.) allows rapid and flexible execution, it also creates technical debt. In the long term, these workflows result in missed opportunities to accumulate institutional knowledge and are associated with poor reproducibility. We have implemented a data platform that can achieve the benefits of a more principled handling of data persistence with minimal analyst overhead. This is achieved by automating schema inference, metadata curation, versioning, and RESTful service production through a simple, Git-like ingestion tool. Data scientists can retrieve data via familiar client language APIs such as dplyr in R. The platform is built on open source database (Postgres, with an architecture that allows alternative backends) and functional programming (Haskell, PostgREST) technologies. Our objective is to accelerate data sharing/discoverability on analyst teams and drastically reduce the effort of persisting data in a systematic mechanism. We therefore provide a technology foundation for rapid data service production and improving reproducibility and reusability in data analyses.
Page 87
77
GENOME READ IN-MEMORY (GRIM) FILTER: FAST LOCATION FILTERING IN DNA READ MAPPING USING EMERGING MEMORY TECHNOLOGIES
Jeremie Kim1, Damla Senol1, Hongyi Xin2, Donghyuk Lee1,3, Mohammed Alser4, Hasan Hassan5, Oguz Ergin5, Can Alkan4, Onur Mutlu1,6
1Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA; 2Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA; 3NVIDIA Research, Austin, TX; 4Department of Computer Engineering, Bilkent University, Ankara, Turkey; 5Department of Computer Engineering, TOBB University of Economics and Technology, Söğütözü, Ankara, Turkey; 6Department of Computer Science, Systems Group, ETH Zürich, Switzerland
Jeremie Kim
High-throughput sequencing (HTS) technology has resulted in a massive influx of available genetic data. Using HTS technology, genomes are sequenced relatively quickly and result in many short DNA sequences (reads) that are used to analyze the donor’s genome across multiple days when using state-of-the-art methods. The first step of genome analysis, read mapping, determines origins for billions of reads within a reference genome to identify the donor’s genomic variants. Hash-table based read mappers are a common type of comprehensive read mappers. They operate by fetching from a pre-generated hash-table, potential mapping locations of a read in the reference genome, which are verified by local alignment, a computationally-expensive dynamic programming algorithm that determines similarity between the read and the potential mapping segment of the reference genome. Alignment has traditionally been the computational bottleneck of read mapping, but recently, many works have been proposing a new step called Location-Filtering in order to alleviate this bottleneck.
Location-Filtering is a critical step where many incorrect potential locations from the hash-table are discarded before local alignment verifies such locations. FastHASH, SHD, and GateKeeper propose variations of Location-Filtering that discard only incorrect locations to reduce end-to-end runtime of hash-table based read mapping. Location-Filtering is now the computational bottleneck of read mapping.
Our goal is to create an efficient Location-Filter that quickly discards as many false negative locations as possible before alignment, while retaining a zero false positive rate. Efficiently filtering incorrect mappings before alignment significantly improves throughput and latency of hash-table based read mapping. We propose a novel filtering algorithm that quickly eliminates from consideration reference genome segments where alignment would yield no matches. Our algorithm’s novelty mainly stems from its design to exploit 3D-stacked memory systems. 3D-stacked memory is an emerging technology that tightly integrates computation and high-capacity memory in a single die stack, thereby enabling concurrent processing of large data chunks at low latency and high bandwidth. The key ideas of our design consist of 1) a new representation of coarse-grained reference genome segments such that the genome can be operated on in parallel using bitwise operations and 2) exploiting the parallel computation capability of 3D-stacked memory to run massively-parallel in-memory operations on the new genome representation. We call our resulting filter the GRIM-Filter.
This work shows how GRIM-Filter can be used with any hash-table based read mapping algorithm and how it effectively exploits processing-in-memory capabilities of 3D-stacked memory. We show that when running with 5% error tolerance, GRIM-Filter reduces false positive locations by 5.59x-6.41x and provides a 1.81x-3.65x end-to-end speedup over the state-of-the-art read mapper mrFAST with FastHASH.
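The first key idea in this abstract, a coarse-grained bit representation of reference segments that can be queried with bitwise operations, can be sketched in software as follows. This is a simplified illustration under stated assumptions, not the authors' in-memory design: the bin size, the k-mer length, the use of a Python integer as a bitvector, and all function names are hypothetical. The threshold follows the standard q-gram lemma, assuming the read lies within a single bin segment.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def kmer_index(kmer):
    """Encode a k-mer over ACGT as an integer, 2 bits per base."""
    idx = 0
    for base in kmer:
        idx = (idx << 2) | BASE_CODE[base]
    return idx

def build_bin_bitvectors(reference, bin_size, k):
    """One bitvector per reference bin; bit i is set if k-mer i occurs in the bin."""
    bitvectors = []
    for start in range(0, len(reference), bin_size):
        # Extend by k-1 bases so k-mers straddling the bin boundary are recorded.
        segment = reference[start:start + bin_size + k - 1]
        bits = 0
        for i in range(len(segment) - k + 1):
            bits |= 1 << kmer_index(segment[i:i + k])
        bitvectors.append(bits)
    return bitvectors

def min_shared_kmers(read_len, k, max_errors):
    """q-gram lemma: each error destroys at most k of the read's k-mers."""
    return max(0, read_len - k + 1 - k * max_errors)

def candidate_bins(read, bitvectors, k, max_errors):
    """Keep only bins that contain enough of the read's k-mers to possibly match."""
    kmers = [kmer_index(read[i:i + k]) for i in range(len(read) - k + 1)]
    threshold = min_shared_kmers(len(read), k, max_errors)
    keep = []
    for bin_id, bits in enumerate(bitvectors):
        hits = sum((bits >> idx) & 1 for idx in kmers)
        if hits >= threshold:
            keep.append(bin_id)
    return keep

# Hypothetical 30-base reference split into three bins of 10 bases.
reference = "ACGTACGTAC" + "GGGGGGGGGG" + "TTTTTCCCCC"
bitvectors = build_bin_bitvectors(reference, bin_size=10, k=3)
bins_to_align = candidate_bins("ACGTAC", bitvectors, k=3, max_errors=0)
```

Only the bins returned by `candidate_bins` would be passed on to alignment. Each bin's test is independent of the others, which is the property that makes the check a natural fit for massively parallel bitwise hardware.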
Page 88
78
BCL-2 FAMILY MEMBERS AS REGULATORS OF RESPONSIVENESS TO BORTEZOMIB IN A MULTIPLE MYELOMA MODEL
Melissa E. Ko1,2, Charis Teh3,4, Christopher S. Playter5, Eli R. Zunder6, Daniel H. Gray4,7, Wendy J. Fantl8, Sylvia K. Plevritis9, Garry P. Nolan2
1Cancer Biology Program, Stanford School of Medicine, Stanford, CA; 2Baxter Laboratory for Stem Cell Biology, Stanford School of Medicine, Stanford, CA; 3Molecular Genetics of Cancer Division, Immunology Division, The Walter and Eliza Hall Institute, Parkville, VIC, Australia; 4Department of Medical Biology, The University of Melbourne, Parkville, VIC, Australia; 5Department of Biological Sciences, Purdue University, Lafayette, IN; 6Department of Biomedical Engineering, University of Virginia, Charlottesville, VA; 7The Walter and Eliza Hall Institute, Parkville, VIC, Australia; 8Department of Obstetrics and Gynecology, Stanford School of Medicine, Stanford, CA; 9Department of Radiology, Stanford School of Medicine, Stanford, CA
Melissa Ko
Survival rates for B cell malignancies have steadily improved over the last five decades reaching levels of over 50% as a result of therapeutic agents such as dexamethasone, bortezomib, and lenalidomide. However, despite their success in producing clinical responses, the cellular mechanisms by which these agents kill tumor cells are poorly understood. We hypothesized that the Bcl-2 family of proteins, which are known to control initiation of apoptosis and are frequently dysregulated in cancerous B cells such as multiple myeloma, can influence responsiveness to these therapeutic agents. Thus, with a focus on multiple myeloma, we aimed to comprehensively profile individual cells for their expression levels of Bcl-2 family members simultaneously with activated intracellular signaling proteins upon exposure of cells to drugs used to treat B-cell malignancies. We applied single-cell mass cytometry to investigate the interplay of pro-survival and pro-apoptotic Bcl-2 family members in MM1S B lymphoblastic cells exposed to different drugs. This dataset was analyzed with FLOW-MAP, a computational tool developed in the Nolan Lab that organizes high-dimensional single-cell data into an interpretable 2D graph structure. FLOW-MAP enabled the apoptotic progression of individual cells to be visualized and showed changes in expression levels of Bcl-2 family members and signaling factors across cells with different drug sensitivities. Our extensive study revealed heterogeneous responses of cell subsets to therapeutic agents used to treat multiple myeloma patients. For example, our results showed that bortezomib, a proteasome inhibitor approved for treatment of multiple myeloma, potently induces apoptosis within 24 hours to a greater extent compared to other treatments. Induction of apoptosis in single cells treated with bortezomib coincided with a selective reduction of a subset of pro-survival Bcl-2 members. Furthermore, our analysis suggests that a metric that reflects the balance of pro-survival and pro-apoptotic Bcl-2 proteins may best separate and predict cells with differential sensitivity to bortezomib. This paradigm is supported by statistical modeling wherein we developed a classifier of bortezomib-resistant vs. sensitive cells using Bcl-2 family information or a single Bcl-2 score with significant accuracy. Our study provides a general framework for understanding differential sensitivity of tumor populations to anti-cancer drugs. Our results are likely to identify previously unknown death-inducing mechanisms as well as pinpoint potential synergies between standard-of-care therapies and newly developed therapies, such as Bcl-2 family inhibitors.
Page 89
79
BIOMEDICAL TEXT-MINING APPLICATIONS FOR THE SYSTEM DEEPDIVE
Emily K. Mallory, Chris Re, Russ B. Altman
Stanford University
Emily Mallory
A complete repository of biomedical relationships is key for understanding the processes underlying both human disease and drug response. After decades of experimental research, the majority of known biomedical relationships exist solely in textual form in the literature and are thus computationally inaccessible. While curated databases have experts manually annotate relevant relationships or interactions from text, these databases struggle to keep up with the rapid growth of the biomedical literature. To address the need for biomedical relationship extraction, there have been numerous biological entity and relationship extraction challenges; however, extraction systems in the biomedical space tend to be task specific and do not provide a general framework for quickly developing future systems to address new extraction tasks. In this work, we developed multiple entity and relationship applications (called “extractors”) for the system DeepDive to extract biomedical relationships from full text articles. DeepDive is a trained system for extracting information from a variety of sources, including text. Application developers create features and training examples, and DeepDive assigns a probability that a given entity or relationship is correct or true in the original sentence. We developed entity extractors for genes, drugs, and diseases; and relationship extractors for gene-gene, gene-disease, and gene-drug relationships. We evaluated the gene-gene work previously with a corpus of articles from three PLOS journals, and we are currently evaluating the other two relationship extractors on a corpus from PubMed Central. The precision of our entity extractors ranged from 80 to 90%. For the task of extracting gene-gene relationships, our system achieved 76% precision and 49% recall in extracting direct and indirect interactions previously curated by the Database of Interacting Proteins (DIP). For randomly curated extractions, the system achieved between 62% and 83% precision based on direct or indirect interactions, as well as sentence-level and document-level precision. Our current gene-disease and gene-drug extractors achieved over 70% precision on a random subset of documents from over 340,000 full text articles in the PubMed Central Open Access Subset. We are currently tuning these extractors to increase performance. This work will enable not only full text literature extraction for biomedical relationships, but also computational methods development based on these relationships.
Page 90
80
PROFILING ADAPTIVE IMMUNE REPERTOIRES ACROSS MULTIPLE HUMAN TISSUES BY RNA SEQUENCING
Serghei Mangul1, Igor Mandric2, Harry Taegyun Yang1, Dennis Montoya1, Nicolas Strauli3, Jeremy Rotman1, Benjamin Statz1, Will Van Der Wey1, Alex Zelikovsky2, Roberto Spreafico1, Maura Rossetti1, Sagiv Shifman1, Mark Ansel3, Noah Zaitlen3, Eleazar Eskin1
1University of California Los Angeles, 2Georgia State University, 3University of California San Francisco
Serghei Mangul
Assay-based approaches provide a detailed view of the adaptive immune system by profiling T- and B-cell receptors. However, these methods come at a high cost and lack the scale of regular RNA sequencing (RNA-seq). We developed ImReP, a novel computational method that utilizes RNA-seq data to profile the adaptive immune repertoire. ImReP is able to quantify individual immune responses from RNA-seq data based on a recombination landscape of genes encoding B- and T-cell receptors. We applied ImReP to 8,555 samples from 544 individuals and 53 diverse human tissues, and constructed the complementarity determining region 3 (CDR3), which is the most variable part of the antigen-binding site. We assembled 3.8 million distinct CDR3 sequences. Analyzing this dataset, we identified the normal, healthy, adaptive immune profile for different tissues. We describe the variation in immune profiles, and the distribution of clonal lineages across individuals and tissues. Based on the immune profiles generated by ImReP, we were able to identify inflammation and various diseases, as confirmed from the histological images. The atlas of T and B cell repertoires, freely available at https://sergheimangul.wordpress.com/atlas-of-t-and-b-cell-repertoires/, is the largest resource in terms of the number of CDR3 sequences and tissue types involved. We anticipate this resource to enhance future studies in areas such as immunology and advance development of therapies for human diseases. ImReP is freely available at https://sergheimangul.wordpress.com/imrep/.
Page 91
81
THE CMH VARIANT WAREHOUSE - A CATALOG OF GENETIC VARIATION IN PATIENTS OF A CHILDREN'S HOSPITAL
Neil Miller1, Greyson Twist1, Byunggil Yoo1, Andrea Gaedigk2
1Center for Pediatric Genomic Medicine, Children's Mercy, Kansas City; 2Division of Clinical Pharmacology & Therapeutic Innovation, Children's Mercy, Kansas City, School of Medicine, University of Missouri-Kansas City
Neil Miller
Advances in high-throughput DNA sequencing have enabled the comprehensive identification of individual genetic variation on an unprecedented scale, powering the diagnosis of disease and personalized treatment. As the ability to detect genetic variation has grown, clinicians and researchers struggle to interpret the functional significance of the millions of variants found in each individual genome. The Variant Warehouse at the Center for Pediatric Genomic Medicine at Children’s Mercy, Kansas City, is a resource containing a record of over 160 million genomic variants detected in more than 5000 patients sequenced by the Center since 2011. Each variant has been characterized by the CPGM’s Rapid Understanding of Nucleotide Effect Software (RUNES) pipeline, which records database cross references, predicted functional consequences and a variant classification score (1-5) based on preliminary guidelines from the American College of Medical Genetics and Genomics (ACMG). Additionally, a local allele frequency is calculated for each variant every 6 hours, enabling clinicians and researchers to rapidly identify rare variants. Despite extensive cross-referencing with databases such as dbSNP, ClinVar, ExAC and COSMIC, the CMH variant warehouse contains a significant number of novel variants not present in external databases. 59% of the total variants in the warehouse are novel with a local allele frequency of less than 0.25%. Of these, 1% are category 1-3 variants expected to have some functional impact. We have observed 82,578 variants among a panel of 58 pharmacogenes (including CPIC genes), of which 59% are novel and 2% are category 1-3 variants. The amount of novelty observed in this patient population suggests that efforts to comprehensively catalog human variation remain a work in progress and that working with variant data will require some level of interpretation of novel variants for the foreseeable future. These observations are increasingly relevant in pharmacogenomics applications where drug compatibility is determined through association to known haplotypes; in this context, the presence of novel and rare variants must be anticipated and accounted for in automated haplotype determination. The CMH variant warehouse is publicly available at http://warehouse.cmh.edu. Tools to search and view variants by gene, category and allele frequency are provided as well as bulk downloads of data. Programmatic access to data is provided through implementations of the Global Alliance for Genomics and Health variant annotation API.
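The local allele frequency described above can be illustrated with a short calculation. This is a hedged sketch only: it assumes diploid, autosomal genotypes encoded as alternate-allele counts, which the abstract does not specify, and the function names are hypothetical. The 0.25% cutoff is the figure quoted in the abstract.

```python
def local_allele_frequency(genotypes):
    """Alternate allele frequency in a cohort.

    genotypes: alternate-allele count per patient (0, 1 or 2),
               with None for patients without a call at this site.
    Assumes diploid autosomal sites (an assumption of this sketch).
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return None  # frequency is undefined without any called genotype
    return sum(called) / (2 * len(called))

def is_rare(freq, cutoff=0.0025):
    """True when the local frequency is below the cutoff (default 0.25%)."""
    return freq is not None and freq < cutoff

# Hypothetical cohort: one heterozygous carrier among 500 called patients.
cohort = [0] * 499 + [1]
freq = local_allele_frequency(cohort)
rare = is_rare(freq)
```

Recomputing this value on a schedule, as the warehouse does every 6 hours, keeps the frequency in step with newly sequenced patients.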
Page 92
82
MUTPRED2 AND ITS APPLICATION TO THE INFERENCE OF MOLECULAR SIGNATURES OF DISEASE
Vikas Pejaver1, Lilia M. Iakoucheva2, Sean D. Mooney3, Predrag Radivojac1
1Department of Computer Science and Informatics, School of Informatics and Computing, Indiana University Bloomington; 2Department of Psychiatry, University of California San Diego; 3Department of Biomedical Informatics and Medical Education, University of Washington Seattle
Predrag Radivojac
Over the past decade, several methods have been developed for the computational prioritization of missense mutations. However, the identification of the effects of such mutations on protein structure and function still remains a major challenge. Previously, we developed MutPred, a random forest-based model for the classification of pathogenic missense variants and the automated inference of molecular mechanisms of disease. Here, we build on our previous work and present MutPred2 as an improved approach for these tasks. For pathogenicity prediction, MutPred2 particularly benefits from a larger and heterogeneous training set, the inclusion of new features, the encoding of local sequence context and the use of a neural network ensemble. Through cross-validation experiments and a test on an independent dataset, we show that MutPred2 outperforms MutPred and other state-of-the-art methods. In particular, we observe that MutPred2 predicts fewer pathogenic mutations than PolyPhen-2, when applied to homozygous mutations from healthy individuals. Additionally, MutPred2 has over 50 built-in structural and functional property predictors, which greatly increase the number of possible downstream consequences that can be associated with a given amino acid substitution. We introduce a novel ranking approach that utilizes a positive-unlabeled learning framework to derive posterior probabilities for the disruption of these properties and, thus, infer the most likely molecular mechanism of pathogenicity. We then demonstrate the utility of MutPred2 in two situations. First, we identify prominent structural and functional signatures in a dataset of mostly Mendelian diseases (from MutPred2’s training set) and recapitulate known associations between these diseases and ordered and structured regions of proteins. We also make novel predictions about the role of allosteric residues in such diseases. Second, we apply MutPred2 to a dataset of de novo mutations from patients diagnosed with neuropsychiatric disorders, along with healthy siblings as controls. On this dataset, MutPred2 pathogenicity scores alone are sufficient to distinguish between neuropsychiatric cases and controls, without any additional gene-based or variant-based filtering. We also observe that disruptions in protein-protein interactions (PPIs), phosphorylation and acetylation are frequent mechanisms, suggesting that neuropsychiatric disorders are largely characterized by a breakdown in molecular signaling. Finally, we identify candidate mutations predicted to disrupt PPIs and validate them experimentally.
Page 93
83
HIV-TRACE: MONITORING THE HIV EPIDEMIC IN NEAR REAL TIME USING LARGE NATIONAL AND GLOBAL SCALE MOLECULAR EPIDEMIOLOGY
Sergei Pond1, Steven Weaver1, Joel Wertheim2, Andrew J. Leigh Brown3
1Temple University, 2University of California San Diego, 3University of Edinburgh
Sergei Pond
Many pathogens, including HIV, propagate along sexual and social contact networks. It is now clear that HIV transmission networks belong to the scale-free family and the spread of infections in scale-free networks is critically enhanced by highly connected individuals or “hubs”. The structure of the transmission network has major implications for interrupting an epidemic. Since pathogen transmission networks are not observed directly, they are inferred and characterized based on indirect measurements, and methods to do this properly remain an open research challenge. Because of their rapid and host-specific evolution and chronic disease states, HIV sequence isolates are essentially unique to each infected person. This sequence uniqueness can be used to confirm or reject the hypothesis that two individuals are “linked” by a recent transmission or belong to the same transmission cluster. There are ~1,000,000 HIV sequences isolated from different individuals over the last 4 decades. National and international surveillance and drug resistance programs are generating high resolution sequencing data on hundreds of thousands of isolates annually. We developed HIV Transmission Cluster Engine (HIV-TRACE) in order to make the process of cluster (and network) inference automated, fast, convenient, and more robust. It is an efficient open-source application designed to scale well and enable near real-time inference and analysis of large networks: it can process 100,000 sequences in ~15-30 minutes on a 64 core backend system. HIV-TRACE (hiv-trace.org) is an open-source web application built on robust and popular modern libraries. User interaction and result visualization are done entirely in the browser; processing is done asynchronously on a server backend. Components and versions of HIV-TRACE are used by the CDC (VARS, HICSB), Canadian public health officials, NYC Department of Public Health, San Diego primary infection cohort, and the UK Drug Resistance Database. We illustrate the utility of HIV-TRACE on four real-world examples of essential questions in public health and epidemiology of HIV-1: 1) Are there rapidly growing transmission clusters, and what is driving their growth? 2) How does HIV spread at different geographic scales, and among different risk groups? 3) How can treatment and intervention be deployed in optimal ways to reduce incidence and prevalence? 4) Can vaccine and prevention efficacy be measured more accurately using network-level information?
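The idea of linking individuals whose viral sequences are nearly identical, then reading clusters off the resulting network, can be sketched as follows. This is an illustrative simplification and not HIV-TRACE's implementation: the plain proportion of mismatched sites stands in for whatever evolutionary distance the tool actually computes, the threshold value is arbitrary, and the sequences and identifiers are made up.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def transmission_clusters(seqs, threshold):
    """Link pairs within `threshold` distance and return connected components.

    seqs: {person_id: aligned_sequence}. Singletons are not reported.
    """
    ids = list(seqs)
    parent = {i: i for i in ids}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if p_distance(seqs[a], seqs[b]) <= threshold:
                parent[find(a)] = find(b)  # merge the two components

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(c) for c in clusters.values() if len(c) > 1)

# Hypothetical 20-base aligned sequences from four individuals.
sequences = {
    "P1": "ACGT" * 5,
    "P2": "ACGT" * 4 + "ACGA",  # 1 difference from P1
    "P3": "ACGT" * 4 + "ACAA",  # 1 difference from P2, 2 from P1
    "P4": "T" * 20,             # unrelated
}
clusters = transmission_clusters(sequences, threshold=0.075)
```

Here P1 and P3 are not directly linked, yet they fall in the same cluster through P2, which mirrors how clusters emerge from pairwise links. The all-pairs loop is quadratic in the number of sequences, so this sketch says nothing about how the tool achieves the throughput quoted in the abstract.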
Page 94
84
THE EXTREME MEMORY® CHALLENGE: A SEARCH FOR THE HERITABLE FOUNDATIONS OF EXCEPTIONAL MEMORY
Mary A. Pyc, Emily Giron, Philip Cheung, Douglas Fenger, J. Steven de Belle, Tim Tully
Dart NeuroScience
Douglas Fenger
We are interested in discovering new candidate targets for drug therapies to enhance cognitive vitality in humans throughout life, and to remediate memory deficits associated with brain injury and brain-related diseases such as Alzheimer’s and Parkinson’s. To achieve our goal we need a comprehensive and objective understanding of the human genome contribution to variation in memory performance in healthy individuals. We are implementing a Genome-Wide Association Study (GWAS) to identify genetic loci varying among individuals who possess exceptional and normal memory abilities. These genes and those in associated networks will inform drug discovery and development. Our first step is to identify exceptional members of the population. Thus, we have created an online memory test, the Extreme Memory Challenge (XMC, accessible at http://www.extremememorychallenge.com), to conveniently screen through an unlimited number of subjects to find individuals with exceptional memory consolidation abilities. Identified subjects (1) are validated by a battery of secondary memory tasks, and (2) provide saliva samples from which we can isolate DNA for GWAS. Ten pilot experiments were conducted to parameterize the XMC screen. Participants learned face-name pairs for a delayed recall test. After initial study, each name was presented and participants were asked to select the correct face among four (distracters were other faces paired with different names). One day later participants completed a final test trial. We are primarily interested in forgetting across sessions, as this provides an estimate of consolidation across a 24-hour time interval. Pilot studies indicated the optimal protocol should include 30 face-name pairs, presented at a 4 second rate. To date, 17,849 participants from 176 nations have been screened in the XMC. Of these, 11,311 have completed both sessions. Individuals in our sample are most frequently Caucasians (55%), post-secondary school-educated (63%), reported being most alert in the morning (51%), and right handed (89.5%). The average age was 34, and the gender distribution was split evenly. The forgetting rate (decrease in performance from day 1 to day 2) was 10%. We have identified 49 individuals with perfect performance on day 2 of the test and 24 with exceptional consolidation abilities (defined as 3 SDs from the mean). We have begun the genomics phase of the study with 33 individuals who have completed additional behavioral testing.
Page 95
85
RESCUE THE MISSING VARIANTS - LESSONS LEARNED FROM LARGE SEQUENCING PROJECTS
Yingxue Ren1, Joseph S. Reddy1, Vivekananda Sarangi2, Jason P. Sinnwell2, Steve G. Younkin3, Nilüfer Ertekin-Taner3, Owen A. Ross3, Rosa Rademakers3, Shannon K. McDonnell2, Joanna M. Biernacka2, Yan W. Asmann1
1Department of Health Sciences Research, Mayo Clinic, Jacksonville, FL; 2Department of Health Sciences Research, Mayo Clinic, Rochester, MN; 3Department of Neuroscience, Mayo Clinic, Jacksonville, FL
Yingxue Ren
Identifying novel disease variants through next generation sequencing (NGS) has been a fruitful practice in medical research in recent years, leading to the discoveries of new disease mechanisms as well as therapeutic strategies. The GATK best practices have since been established to provide general recommendations on core processing steps required to go from raw reads to final variant callsets. However, with the sample size drastically increasing in today’s sequencing experiments, many default variant calling strategies and the choice of tools call for a closer examination. Our study utilized the whole exome sequencing data provided by the Alzheimer's Disease Sequencing Project (ADSP) to test different variant calling strategies and tools involved in the variant discovery workflow in the context of sample sizes. We first investigated the impact of using different sequence aligners on variant callsets while keeping the default GATK settings of the variant calling and QC steps identical. We selected 1952 samples to align by both BWA and NovoAlign, and compared the variant callsets in 50, 100, 200, 500, 1000 and 1952 samples. We discovered that the percentage of variants unique to one aligner increased dramatically with increasing sample sizes. At a sample size of 1952, the unique variants generated by BWA and NovoAlign account for more than 20% of total called variants. These unique variants have good variant quality metrics: ~80% have a Genotype Quality (GQ) score of 60 or above, and their distribution of B allele concentration (BAC) centers around 0.5 and 1, consistent with what is expected of diploid genomes. What’s more, over 96% of the unique variants have a population B allele frequency (BAF) of less than 0.01, indicating that these variants are rare in the population. All these metrics suggest that these unique variants are important to include in downstream variant analysis. In addition to the aligner comparison, we also evaluated single-sample variant calling versus the default strategy of single-sample variant calling followed by joint multi-sample genotyping in 50, 100, 500, 2000, and 5000 samples. Our data showed that, with increasing sample sizes, the single-sample calling strategy added an increasing percentage of unique variants. At a sample size of 5000, single-sample calling added 58,884 variants, accounting for 5.55% of total variants called by both strategies. 7331 of these unique variants passed Variant Quality Score Recalibration (VQSR) and have GQ of 60 or above in at least 5 samples. Our study identified a large number of good-quality variants from the ADSP exome sequencing project that were missed by using one aligner or using the multi-sample genotyping strategy alone. Our findings revealed the relationships between bioinformatics pipelines and biomedical research results, and suggested that alternative variant calling strategies may be beneficial for optimal variant discovery in the face of today’s large sequencing scale.
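The comparison of callsets described above comes down to set operations on variant keys. The sketch below is an assumed illustration, not the authors' pipeline: variants are keyed by a (chromosome, position, reference allele, alternate allele) tuple, and the function name and the toy variants are hypothetical.

```python
def compare_callsets(calls_a, calls_b):
    """Summarize the overlap between two sets of variant keys.

    Each variant is a (chrom, pos, ref, alt) tuple.
    """
    shared = calls_a & calls_b
    only_a = calls_a - calls_b
    only_b = calls_b - calls_a
    total = len(calls_a | calls_b)
    unique = len(only_a) + len(only_b)
    return {
        "shared": len(shared),
        "only_a": len(only_a),
        "only_b": len(only_b),
        # Share of all called variants seen by exactly one of the two pipelines.
        "unique_fraction": unique / total if total else 0.0,
    }

# Hypothetical callsets from two aligners run on the same samples.
aligner_a = {("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"),
             ("chr2", 500, "G", "A"), ("chr3", 42, "T", "C")}
aligner_b = {("chr1", 2000, "C", "T"), ("chr2", 500, "G", "A"),
             ("chr3", 42, "T", "C"), ("chr4", 77, "A", "T")}

summary = compare_callsets(aligner_a, aligner_b)
```

Repeating this summary at each sample size would trace how the unique fraction changes as the cohort grows, which is the trend the abstract reports.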
Page 96
86
TOWARDEFFECTIVEMICRORNAQUANTIFICATIONFROMSMALLRNA-SEQ
Pamela Russell1, Richard Radcliffe2, Brian Vestal1, Wen Shi1, Pratyaydipta Rudra1, Laura Saba2, Katerina Kechris1
1Department of Biostatistics and Informatics, Colorado School of Public Health; 2Department of Pharmaceutical Sciences, University of Colorado Skaggs School of Pharmacy and Pharmaceutical Sciences
Pamela Russell
Extensive work has led to robust quantification methods for RNA-seq data primarily derived from large RNAs. Many studies have used these methods “out of the box” to estimate microRNA (miRNA) expression from small RNA-seq data. However, these methods do not effectively address issues particular to miRNAs. First of all, reference bias is amplified due to the small size of sequencing reads derived from miRNAs (~22 nt). That is, with shorter reads, a true mismatch between a sample and the reference can lead to incorrect alignments or inability to align reads at all, creating a count bias toward those samples with the reference allele. With longer reads, single mismatches have less impact on alignment algorithms. Second, any bias for individual miRNAs is more impactful overall due to the relatively small repertoire of miRNAs compared to mRNAs. Inaccurate counts for a handful of miRNAs can significantly alter overall library counts and thus affect normalization. We refer to this issue as repertoire bias. Also, most miRNA studies seek to identify functional mature miRNA molecules regardless of the position in the genome that they are originally transcribed from or small non-functional differences between miRNAs of the same family. Tools designed for large RNAs do not address the repetitive nature and family structure of miRNAs, by default returning estimated counts for multiple targets that should be considered equivalent by typical miRNA study paradigms. Genome-based methods often map miRNA reads to multiple loci encoding the same mature miRNA. Methods based on mapping directly to a miRNA database do not suffer from multiple alignments due to identical regions of the genome but do typically distinguish among members of each miRNA family. Both sources of multiple mappings can lead to misleading counts when the goal is to elucidate function. Here we explore all these issues in the context of commonly used methods. We then propose a new high-throughput approach that (1) incorporates individual genetic variation into the reference sequence used for alignment, reducing reference bias, and (2) assigns each read to a single functional group such as a miRNA family. We demonstrate the accuracy of this approach compared to other popular methods using a data set derived from 206 mouse brain samples. Funded by NIH/NIAAA AA016597, R01 AA021131 and R24 AA013162.
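The second step of the proposed approach, assigning reads to a single functional group, can be illustrated with a minimal Python sketch that sums per-miRNA counts into one count per family. The miRNA names, the family table, and the counts are hypothetical illustrations, not data or code from the study, and the sketch works on counts rather than on individual reads.

```python
from collections import defaultdict

# Hypothetical mapping from mature miRNA name to functional family (illustration only).
FAMILY_OF = {
    "mmu-let-7a-5p": "let-7",
    "mmu-let-7b-5p": "let-7",
    "mmu-miR-29a-3p": "miR-29",
    "mmu-miR-29b-3p": "miR-29",
}

def collapse_to_families(counts, family_of):
    """Sum per-miRNA read counts into one count per functional group.

    miRNAs missing from the table are kept as their own single-member group.
    """
    family_counts = defaultdict(int)
    for mirna, n in counts.items():
        family_counts[family_of.get(mirna, mirna)] += n
    return dict(family_counts)

# Hypothetical per-miRNA counts, as a database-based method might report them.
example_counts = {
    "mmu-let-7a-5p": 120,
    "mmu-let-7b-5p": 80,
    "mmu-miR-29a-3p": 40,
    "mmu-miR-124-3p": 300,
}
family_totals = collapse_to_families(example_counts, FAMILY_OF)
```

Collapsing leaves the library total unchanged while removing the distinction between family members that a functional analysis would treat as equivalent.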
Page 97
87
NANOPORE SEQUENCING TECHNOLOGY AND TOOLS: COMPUTATIONAL ANALYSIS OF THE CURRENT STATE, BOTTLENECKS AND FUTURE DIRECTIONS
Damla Senol1, Jeremie Kim1, Saugata Ghose1, Can Alkan2, Onur Mutlu1,3
1Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, USA; 2Department of Computer Engineering, Bilkent University, Bilkent, Ankara, Turkey; 3Department of Computer Science, Systems Group, ETH Zürich, Switzerland
Damla Senol
Nanopore sequencing, a promising single-molecule DNA sequencing technology, exhibits many attractive qualities and, in time, could potentially surpass current sequencing technologies. Nanopore sequencing promises higher throughput, lower cost, and increased read length, and it does not require a prior amplification step. Nanopore sequencers rely solely on the electrochemical structure of the different nucleotides for identification and measure the change in the ionic current as long strands of DNA (ssDNA) pass through the nano-scale protein pores.
Biological nanopores for DNA sequencing were first proposed in the 1990s, but the technology was made commercially available only recently, in May 2014, by Oxford Nanopore Technologies (ONT). The first commercial nanopore sequencing device, MinION, is an inexpensive, pocket-sized, portable, high-throughput sequencing apparatus that produces real-time data. These properties enable new potential applications of genome sequencing, such as rapid surveillance of Ebola, Zika or other epidemics, near-patient testing, and other applications that require real-time data analysis. In addition, this technology is capable of generating very long reads (~50,000 bp) with minimal sample preparation. Despite all these advantageous characteristics, it has one major drawback: high error rates. In order to provide higher accuracy and higher speed, in May 2016, ONT released a new version of MinION with a new nanopore chemistry called R9, which replaced the previous version R7. Although R9 chemistry improves the data accuracy, the tools used for nanopore sequence analysis are of critical importance as they should overcome the high error rates of the technology.
Our goal in this work is to comprehensively analyze tools for nanopore sequence analysis, with a focus on understanding the advantages, disadvantages, and bottlenecks of the various tools. To this end, we rigorously examine multiple steps in the nanopore genome analysis pipeline. The first step, basecalling, translates the raw signal output of MinION into nucleotides to generate DNA sequences. Currently, Nanocall and Nanonet are publicly available nanopore basecallers. The second step performs genome assembly with assemblers for noisy long reads. Using only the basecalled DNA reads, assemblers generate longer contiguous fragments called draft assemblies. Currently, Canu and Miniasm are the commonly used long-read assemblers. After this step, an improved consensus sequence is generated from the draft assembly with Nanopolish, and a complete whole genome is obtained.
We analyze the five aforementioned nanopore sequencing tools in terms of their speed and accuracy, with the goals of determining their bottlenecks and finding improvements to these tools. We also discuss potential future work in nanopore basecallers and assemblers, to take better advantage of nanopore sequencing and to overcome its current disadvantage of high error rates.
Page 98
88
DETECTING OUTLIERS FROM MULTIDIMENSIONAL DATA WITH APPLICATION IN CANCER
Kyle Smith1, Subhajyoti De2, Debashis Gosh1
1University of Colorado, 2Rutgers University
Kyle Smith
Outliers, which are very different from the typical cases in a cohort, bring in unexpected challenges for decision making in many different disciplines. The issue is more acute in oncology, since most types of cancer are highly heterogeneous diseases. Even within any cancer subtype, patients show extensive variation in their molecular profiles and clinical outcomes. Even within a cohort of cancer patients who have apparently the same biomarkers and received identical treatment, there are exceptional responders and exceptional non-responders, who are outliers. It is suspected that their atypical molecular and clinical profiles contribute to their exceptional response. While identifying such outlier cases can benefit precision medicine initiatives, methods to detect them from multidimensional data have received limited attention. Here, we propose a novel framework to identify outlier cancer patients with atypical profiles from multidimensional genomic data. We argue that detection of outlier patients with atypical profiles can help identify exceptional responders and tailor precision medicine in oncology initiatives.
Page 99
89
HUEMR: INTUITIVE MINING OF ELECTRONIC MEDICAL RECORDS
Abiodun Otolorin1, Nana Osafo2, William Southerland2
1Department of Community and Family Medicine, Howard University, Washington, DC; 2Department of Biochemistry & Molecular Biology and the Center for Computational Biology and Bioinformatics, Howard University, Washington, DC
William Southerland
Despite the widespread adoption of electronic medical record systems and advances in genomics, a major barrier to research endeavors is the lack of intuitive, user-friendly interactive tools that enable researchers to access and analyze data readily. In light of this, innovative tools have been developed to address the problem. However, we hypothesized that an interactive data visualization tool that is capable of stand-alone or plugin functionality and that also leverages common data query methodologies would contribute to research efforts requiring interrogation of clinical research databases. Howard University Hospital (HUH) is a tertiary academic medical center with over 50,000 emergency department visits and 8,000 inpatient admissions per year and primarily provides care to the minority population in the District of Columbia metropolitan area. Using de-identified HUH electronic medical records data, a HUH clinical research database was developed. Additionally, the Howard University electronic Medical Records (HUeMR) query tool was developed as a web-based client-server application using JavaScript and PHP. HUeMR may function in stand-alone or plugin mode. Its graphical interface was built using Google Charts, an interactive open source visualization library. HUeMR supports complex Boolean search operations specified by an interactive query tool. Ontology is presented using linked dropdown menus and query construction is displayed in natural language form. Data is displayed using editable interactive charts. Multiple rows of charts may be created that contain different types of data concepts. Queries may be refined by clicking on the charts followed by selection of one or more additional query parameters. Diagnosis based on ICD codes or keywords may also be searched. These features are illustrated in a diabetes use-case investigation. In summary, HUeMR is a secure data analytics tool that can be used in stand-alone or plugin mode to query clinical research databases. It has a highly interactive user interface that allows rapid data analysis for cohort discovery. This work was supported by grant #5G12MD007597 from the National Institute on Minority Health and Health Disparities from the NIH.
Page 100
90
DECIPHERING LUNG ADENOCARCINOMA MORPHOLOGY AND PROGNOSIS BY INTEGRATING OMICS AND HISTOPATHOLOGY
Kun-Hsing Yu1, Gerald J. Berry2, Daniel L. Rubin1, Christopher Ré3, Russ B. Altman1, Michael Snyder4
1Biomedical Informatics Program, Stanford University; 2Department of Pathology, Stanford University; 3Department of Computer Science, Stanford University; 4Department of Genetics, Stanford University
Kun-Hsing Yu
Adenocarcinoma accounts for more than 40% of lung malignancy, and microscopic pathology evaluation is indispensable to its diagnosis. However, how histopathology findings relate to molecular abnormalities remains largely unknown. To address this problem, we obtained hematoxylin and eosin stained whole-slide histopathology images, pathology reports, RNA-sequencing, and proteomics data of 538 lung adenocarcinoma patients from The Cancer Genome Atlas. We profiled gene expression, protein expression and modifications, and extracted more than 9,000 objective features from the histopathology images of each patient. We successfully predicted histology grade with transcriptomics and proteomics signatures (area under curve > 0.75) and identified the associated molecular pathways, such as cell cycle regulation, which provide biological insights into tumor cell differentiation grades. We further built an integrative histopathology-transcriptomics model to generate superior prognostic predictions for stage I patients (P < 0.01) compared with gene expression or histopathology analysis alone. These results suggest that the integration of histopathology and omics studies can reveal the molecular mechanisms of pathology findings and enhance clinical prognostic prediction, which will contribute to the development of precision cancer medicine. Our methods are generalizable to other types of malignancy or diseases.
Page 101
91
EXPLORING DEEP LEARNING FOR COPY NUMBER VARIATION DETECTION WITH NGS DATA
Yao-zhong Zhang, Rui Yamaguchi, Seiya Imoto, Satoru Miyano
Institute of Medical Science, University of Tokyo
Yao-zhong Zhang
Copy number variations (CNVs) are an important type of genetic variation widely used for profiling cancer and other complex diseases. Accurate detection and summarization of CNVs help identify oncotargets and cancer subtypes for precision medicine. When NGS data are used for CNV detection, various heterogeneous biases, such as GC-content bias, and other noise need to be properly handled. This becomes especially important for CNV detection on single-cell NGS data. In this study, we extend traditional HMM approaches for CNV detection with deep learning. We extract a feature representation, which integrates the information from read counts and observable genomic sequences, as the new observable sequence of genomic bins and iteratively train a DNN-HMM model for CNV detection. We compare our method with other HMM-based CNV detection methods.
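For context, the traditional HMM formulation that this work extends can be sketched as Viterbi decoding of copy-number states over genomic bins. This is a plain Poisson-emission HMM, not the DNN-HMM described in the abstract; the state set, the expected depth per copy, and the transition probability are assumptions chosen purely for illustration.

```python
import math

STATES = (1, 2, 3)       # copy-number states: loss, neutral, gain (assumed)
DEPTH_PER_COPY = 10.0    # assumed expected reads per bin per DNA copy
STAY = 0.9               # assumed probability of keeping the same state

def log_poisson(k, lam):
    """Log probability of observing k reads under a Poisson(lam) model."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def log_trans(a, b):
    """Log transition probability from state a to state b."""
    return math.log(STAY if a == b else (1.0 - STAY) / (len(STATES) - 1))

def viterbi(read_counts):
    """Return the most likely copy-number state for each genomic bin."""
    score = {
        s: -math.log(len(STATES)) + log_poisson(read_counts[0], DEPTH_PER_COPY * s)
        for s in STATES
    }
    back = []  # one dict of back-pointers per bin after the first
    for k in read_counts[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + log_trans(p, s))
            pointers[s] = best
            score[s] = prev[best] + log_trans(best, s) + log_poisson(k, DEPTH_PER_COPY * s)
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical read counts per bin: a gain in the middle four bins.
counts = [20, 21, 19, 20, 31, 29, 30, 32, 20, 19]
calls = viterbi(counts)
```

In a DNN-HMM, the hand-specified Poisson emission term would be replaced by scores derived from a neural network; the decoding step stays the same.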
Page 102
92
IMAGING GENOMICS
POSTER PRESENTATIONS
Page 103
93
PERIPHERAL EPIGENETIC ASSOCIATIONS WITH BRAIN GRAY MATTER IN SCHIZOPHRENIA
Dongdong Lin1, Vince D. Calhoun2, Juan R. Bustillo3, Nora Perrone-Bizzozero4, Jingyu Liu1
1The Mind Research Network and Lovelace Biomedical and Environmental Research Institute, Albuquerque; 2Dept. of Electronic and Computer Engineering, University of New Mexico, Albuquerque; 3Dept. of Psychiatry, University of New Mexico, Albuquerque; 4Dept. of Neurosciences, University of New Mexico, Albuquerque
Jingyu Liu
Epigenetic regulation by DNA methylation and histone modification has been increasingly recognized for its relevance to schizophrenia (SZ). Beyond the genetic variation, epigenetics through regulation of gene transcription and expression can potentially explain the ‘missing’ heritability and mediate the effect of genetic risks in disease. Specific to DNA methylation, recent studies have demonstrated that 6-7% of CpG sites across the genome show significant correspondence between brain and blood, supporting the investigation of easily accessible tissues for brain and mental disorders. In this study, we analyzed DNA methylation of 163 CpG sites from saliva and whole brain gray matter density of 108 SZ patients and 105 healthy controls. We are aware of cellularity differences between blood and saliva, and to the best of our knowledge no detailed saliva-brain correspondence study has been done except a general comparison of overall patterns, which indicates that saliva may be a closer indicator of brain than blood is. The 163 CpG sites are located within the 108 schizophrenia risk regions reported by the Psychiatric Genomics Consortium schizophrenia working group, and also showed strong cross-tissue similarity based on the genome-wide methylation study of blood and brain tissues by Hannon et al. Quality control and normalization for methylation data were implemented using the minfi R package to remove batch effects and cell type proportion effects. Gray matter density maps were segmented by SPM12 with a smoothing kernel of 8 mm³. We applied independent component analysis to both brain imaging data and methylation data, and extracted 25 gray matter networks and 15 methylation components. Among them, two methylation components were significantly correlated with three gray matter networks (false discovery rate < 0.05). The first methylation component comprised two CpG sites within and near gene ZSCAN12, and was associated with a bilateral middle/superior temporal network (r = 0.25) and a bilateral superior frontal network (r = -0.24). The higher the methylation component, the lower the gray matter density in the superior frontal gyrus and the higher the density in the middle temporal gyrus. Moreover, SZ patients showed significant gray matter reduction in the superior frontal gyrus (p = 7.9 × 10⁻⁵). The second methylation component consisted of CpG sites from two chromosome regions (Chr. 10 AS3MT and NT5C2 genes, and Chr. 12 ARL6IP4 and OGFOD2 genes), and was associated with caudate and thalamus regions. All analyses were controlled for age and gender. Although we did not find SZ-specific methylation differences within SZ risk regions, our results suggest that DNA methylation patterns in saliva are associated with brain gray matter variation, and some of this variation is related to schizophrenia. The main limitations of this study include 1) the lack of replication data to verify the findings, and 2) the lack of direct saliva and brain tissue correspondence verification.
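The false discovery rate threshold applied to the component-network correlations can be illustrated with the Benjamini-Hochberg procedure. The abstract does not state which FDR method was used, so that choice is an assumption, and the p-values below are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking p-values significant at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Everything ranked at or below the cutoff is declared significant.
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant

# Hypothetical p-values for a handful of methylation-component / network pairs.
p_vals = [0.0002, 0.009, 0.04, 0.2, 0.6]
flags = benjamini_hochberg(p_vals)
```

Note that a raw p-value of 0.04 does not pass here, because its rank-scaled threshold (3/5 × 0.05 = 0.03) is stricter than the nominal 0.05.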
Page 104
94
THE INTERPLAY BETWEEN OLIGO-TARGET SPECIFIC AND GENOME-WIDE OFF-TARGET INTERACTIONS
Olga V. Matveeva1, Nafisa N. Nazipova2, Aleksey Y. Ogurtsov3, Svetlana A. Shabalina3
1Biopolymer Design LLC, Acton, MA; 2Institute of Mathematical Problems of Biology, Pushchino, Moscow Region, Russia; 3National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD
Svetlana Shabalina
Many techniques of molecular biology involve interaction of specific oligonucleotides with DNA or RNA as a basic step. DNA targeting of single-guide (sg) RNAs for genome editing procedures, oligonucleotide array gene expression monitoring or anti-sense-mediated gene down-regulation, and the Genomic Comparison Hybridization (GCH) array experiments are examples of techniques involving RNA-DNA and DNA-DNA interactions. RNAi approaches with siRNA and shRNA molecules are based on RNA-RNA interactions. The main problem of any oligo-probe experiment is that the specific oligo-target interaction, based on a fully paired duplex, is usually combined with non-specific parallel reactions, in which the oligo-probe can interact with many partially paired DNA or RNA sequences. The interplay between specific and genome-wide off-target interactions is poorly studied despite its crucial role in the efficacy of these techniques. In this study, we investigated which oligo-probe characteristics are responsible for the interplay and which most improve the oligo-probe design. We defined specificity of interaction as a ratio between oligo-target specific and genome-wide off-target interactions. Microarray databases, derived from the GCH experiments using the Affymetrix platforms and containing two different types of probes, were used for the analysis based on the thermodynamic features and nucleotide sequences of oligo-probes. Probes of the first type do not have a specific target on the genome, and their hybridization signals are derived from genome-wide cross-hybridization alone. The second type includes oligonucleotides that have a specific target on the genomic DNA, and their signals are derived from specific and cross-hybridization components combined together in a total signal. The analysis has revealed that hybridization specificity was negatively affected by low stability of the fully paired oligo-target duplex, stable probe self-folding, G-rich content, including GGG motifs, and a low sequence Symmetrical Complexity (SC) score. The SC-score characterizes nucleotide composition symmetry and a probe’s vulnerability to off-target interactions. Filtering out the probes with these characteristics significantly increases hybridization specificity by decreasing genome-wide cross-hybridization or by increasing specific interactions. Selected oligo-probes have three times higher hybridization specificity on average, compared to the probes that were filtered out from the analysis by applying suggested cut-off thresholds to the described parameters. Multiple regression models with the described parameters were successfully applied for predictions of interaction specificity and off-target effects and supported parameter choice (P < 0.001). We also compared probe characteristics selected for the analysis in microarray databases with applicable features of siRNA/shRNA design from our earlier studies. We applied all selected oligonucleotide features and described parameters to new sets of sgRNAs. Our study examined the thermodynamics and sequence-intrinsic properties of sgRNA-DNA duplexes and analyzed additional selection criteria that are critical for guide efficacy. Finally, we identify universal features of oligo-probes, si/shRNAs and guides for optimal design, including the SC-score.
Page 105
95
PATTERNS IN BIOMEDICAL DATA – HOW DO WE FIND THEM?
POSTER PRESENTATIONS
Page 106
96
WARS2 IMPLICATED AS A COMMON MODIFIER OF METFORMIN METABOLITE BIOMARKERS IN A BIOBANK COHORT
Alyssa I. Clay1, Richard M. Weinshilboum2, K. Sreekumaran Nair3, Rima F. Kaddurah-Daouk4, Liewei Wang2, Matthew K. Breitenstein1
1Division of Epidemiology, Mayo Clinic; 2Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic; 3Division of Endocrinology, Mayo Clinic; 4Duke University
Matthew Breitenstein
Background: Metformin is one of the most widely prescribed drugs worldwide and a first-line treatment for type 2 diabetes mellitus (T2D). Metformin has many mechanisms of action, with varying levels of understanding. Metformin is being evaluated as a potential chemoprevention agent for cancer treatment, with inhibition of angiogenesis as one effect of metformin being strongly pursued. However, contradictory evidence exists for a potential mechanism of angiogenesis inhibition (Carcinogenesis 2014; (35)5). Building on our prior work that identified a stratum of statistically correlated metabolites, we aimed to identify overlapping metformin pharmacogenomic (PGx) SNP associations, using pharmacometabolomics-informed PGx paired with an agnostic computational approach.
Methods: To elucidate overlapping PGx signals of metformin exposure, we included metabolites (n=5) with correlated plasma concentration, adjusted for metformin exposure, in a biobank cohort-based, case-control study. Cases (n=274) had T2D and were exposed to metformin monotherapy; healthy controls (n=274) had no known drug exposures. Cases and controls were matched by age and gender, and adjusted for BMI and batch. Concentrations of a panel of amino acid metabolites (n=42) were quantitatively measured using tandem liquid chromatography-mass spectrometry from fasting platelet-poor plasma samples collected in EDTA. Genotyping was performed using the 700k SNP Illumina OmniExpress array platform from 250 ng of DNA. Normalized metabolite concentrations were utilized as endpoints to inform genome-wide associations.
Results: Increased plasma metabolite concentrations for leucine (t=4.47, p<0.0001), isoleucine (t=4.63, p<0.0001), and valine (t=4.48, p<0.0001) were observed with exposure to metformin. Variant rs17023164 (MAF=0.31), in the Tryptophanyl tRNA Synthetase 2, Mitochondrial (WARS2) gene region of chromosome 1 and an eQTL for WARS2 in fibroblasts, was a common downward modifier of leucine (β=-11.69, p=1.79e-7), isoleucine (β=-6.99, p=2.40e-6), and valine (β=-14.55, p=1.04e-5) with metformin exposure. No SNPs in neighboring gene regions were in high LD (R^2>0.5) with rs17023164.
Conclusion: Increased plasma metabolite concentrations for leucine, valine, and isoleucine were observed with metformin exposure. A common variant, rs17023164 in WARS2, was identified as a strong downward modifier of these metabolites with metformin exposure. Independently, WARS2 is proposed as a determinant of angiogenesis (Nat Com 2016; (7)12061). We posit a hypothesis: modification of metabolite biomarker concentration associated with metformin exposure by WARS2 variants is a potential link between metformin and angiogenesis. Functional characterization of a potential mechanism for metformin inhibition of angiogenesis, modified by WARS2, is ongoing.
Page 107
97
ESTIMATION OF FALSE NEGATIVE RATES VIA EMBEDDING SIMULATED EVENTS
Stephen V. Gliske1, Katy L. Lau1, Benjamin H. Brinkman2, Greg A. Worrell2, Cris G. Fink3, William C. Stacey1
1University of Michigan, 2Mayo Clinic, 3Ohio Wesleyan University
Stephen Gliske
Automated event detection is the result of many types of data-driven pattern recognition methods. One of the general challenges to these analyses is the quantification of and correction for false negative detections, i.e., cases where the event (pattern) is present in the data but was not detected. Estimating the false positive rate is much easier, as human review of a subsample of detected events is sufficient. However, determining the false negative rate by human review would require manually searching through the raw data, which is impractical, if not completely infeasible. This challenge is not unique to biomedical data and is commonly addressed in high energy physics. The approach is called embedding. It is applicable to any analysis where at least one of the signal or background can be modeled well by simulations. By placing specific events at known locations, one can then run the automated detector and report the fraction of embedded events that were detected. We present the first application of embedding to neurological data, specifically the automated detection of a biomarker of epilepsy (high frequency oscillations) recorded in intracranial electroencephalogram (EEG) data. The false negative rate is found to be consistent across both recording channels and patients.
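The embedding procedure can be sketched in a few lines: simulated events are placed at known locations in a background signal, the automated detector is run, and the fraction of embedded events that go undetected estimates the false negative rate. The Gaussian background, the sinusoidal burst, and the RMS threshold detector below are stand-ins chosen for illustration; they are not the high frequency oscillation detector or the simulation used in the study.

```python
import math
import random

def embed_events(background, starts, amplitude, length=50, period=10):
    """Add a sinusoidal burst (the simulated event) at each known start index."""
    signal = list(background)
    for s in starts:
        for t in range(length):
            signal[s + t] += amplitude * math.sin(2 * math.pi * t / period)
    return signal

def detect(signal, threshold, window=50):
    """Toy detector: start indices of non-overlapping windows whose RMS exceeds threshold."""
    hits = []
    for start in range(0, len(signal) - window + 1, window):
        chunk = signal[start:start + window]
        rms = math.sqrt(sum(x * x for x in chunk) / window)
        if rms > threshold:
            hits.append(start)
    return hits

def false_negative_rate(starts, hits):
    """Fraction of embedded events with no detection at their known location."""
    missed = sum(1 for s in starts if s not in hits)
    return missed / len(starts)

rng = random.Random(0)
background = [rng.gauss(0.0, 1.0) for _ in range(5000)]
starts = [500, 1500, 2500, 3500, 4500]   # embedding sites, aligned to detector windows

strong = embed_events(background, starts, amplitude=5.0)
weak = embed_events(background, starts, amplitude=0.1)
fnr_strong = false_negative_rate(starts, detect(strong, threshold=2.5))
fnr_weak = false_negative_rate(starts, detect(weak, threshold=2.5))
```

Sweeping the embedded amplitude between these two extremes would trace out how the detector's sensitivity depends on event strength, which is the kind of information human review of detections alone cannot provide.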
Page 108
98
INTEGRATIVE, INTERPRETABLE DEEP LEARNING FRAMEWORKS FOR REGULATORY GENOMICS AND EPIGENOMICS
Chuan Sheng Foo, Avanti Shrikumar, Johnny Israeli, Peyton Greenside, Chris Probert, Anna Scherbina, Rahul Mohan, Nathan Boley, Anshul Kundaje
Stanford University
Anshul Kundaje
We present generalizable and interpretable supervised deep learning frameworks to predict regulatory and epigenetic state of putative functional genomic elements by integrating raw DNA sequence with diverse chromatin assays such as ATAC-seq, DNase-seq or MNase-seq. First, we develop novel multi-channel, multi-modal CNNs that integrate DNA sequence and chromatin accessibility profiles (DNase-seq or ATAC-seq) to predict in-vivo binding sites of a diverse set of transcription factors (TF) across cell types with high accuracy. Our integrative models provide significant improvements over other state-of-the-art methods including recently published deep learning TF binding models. Next, we train multi-task, multi-modal deep CNNs to simultaneously predict multiple histone modifications and combinatorial chromatin state at regulatory elements by integrating DNA sequence, RNA-seq and ATAC-seq or a combination of DNase-seq and MNase-seq. Our models achieve high prediction accuracy even across cell types, revealing a fundamental predictive relationship between chromatin architecture and histone modifications. Finally, we develop DeepLIFT (Deep Linear Importance Feature Tracker), a novel interpretation engine for extracting predictive and biologically meaningful patterns from deep neural networks (DNNs) for diverse genomic data types. DeepLIFT can integrate the combined effects of multiple cooperating filters and compute importance scores accounting for redundant patterns. We apply DeepLIFT on our models to obtain unified TF sequence affinity models, infer high resolution point binding events of TFs, dissect regulatory sequence grammars involving homodimeric and heterodimeric binding with co-factors, learn predictive chromatin architectural features and unravel the sequence and architectural heterogeneity of regulatory elements.
Page 109
99
VISUALIZATION OF COMPLEX DISEASES AND RELATED GENE SETS
Modest von Korff, Tobias Fink, Thomas Sander
Actelion Pharmaceuticals Ltd., Allschwil, Switzerland
Modest von Korff
The relations between genes and diseases form complex patterns. Visualization of these patterns enables the scientist to obtain an overview of the most important gene–disease relations. These gene–disease relations are of high importance in drug discovery. Proteins encoded by disease-related genes are potential targets for new drugs or may become biomarkers for disease diagnosis. Both a novel drug target and a biomarker should be highly specific for the targeted disease. In our publication for this conference, we introduce a relevance estimator. This relevance estimator is a measure of the specificity of a gene–disease relationship that also takes into consideration all other known gene–disease relationships. We analyzed gene–disease relationships from 22 million PubMed records and obtained a matrix with relevance estimators for about 5,000 diseases and 15,000 genes. This relevance matrix enabled us to express the similarity between diseases with simple vector-based distance measures. A meaningful disease–gene–disease visualization, consisting of several layers, was derived from these disease–disease similarity measures and the relevance estimators. The multidimensional visualizations presented here give an overview of complex diseases like asthma, Alzheimer's disease and hypertension.
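A vector-based disease similarity of the kind described above can be sketched with cosine similarity between rows of a relevance matrix. The abstract does not specify which distance measure was used, so cosine similarity is an assumption, and the disease labels, gene labels, and relevance values below are placeholders made up for illustration.

```python
import math

# Placeholder relevance matrix: rows are diseases, columns are genes (toy values).
GENES = ["gene_1", "gene_2", "gene_3", "gene_4"]
RELEVANCE = {
    "disease_A": [0.9, 0.8, 0.0, 0.0],
    "disease_B": [0.7, 0.9, 0.0, 0.1],
    "disease_C": [0.0, 0.0, 0.9, 0.8],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two relevance vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Diseases sharing relevant genes score close to 1; disjoint gene sets score 0.
sim_close = cosine_similarity(RELEVANCE["disease_A"], RELEVANCE["disease_B"])
sim_far = cosine_similarity(RELEVANCE["disease_A"], RELEVANCE["disease_C"])
```

A pairwise matrix of such similarities is the sort of input from which a layered disease–gene–disease layout could be derived.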
Page 110
100
PRECISION MEDICINE: FROM GENOTYPES AND MOLECULAR PHENOTYPES TOWARDS IMPROVED HEALTH AND THERAPIES
POSTER PRESENTATIONS
Page 111
101
FINDINGS FROM THE FOURTH CRITICAL ASSESSMENT OF GENOME INTERPRETATION, A COMMUNITY EXPERIMENT TO EVALUATE PHENOTYPE PREDICTION
Steven E. Brenner1, Gaia Andreoletti1, Roger A Hoskins1, John Moult2, CAGI Participants
1University of California, Berkeley; 2IBBR, University of Maryland, Rockville, MD
Steven Brenner
The Critical Assessment of Genome Interpretation (CAGI, \'kā-jē\) is a community experiment to objectively assess computational methods for predicting the phenotypic impacts of genomic variation. CAGI participants are provided genetic variants and make predictions of resulting phenotype. These predictions are evaluated against experimental characterizations by independent assessors.
The fourth CAGI experiment concluded this year. It included 11 challenges which reflected: non-synonymous variants and their biochemical impact measured by targeted assays; noncoding regulatory variants and their impact on gene expression; research exomes for prediction of complex traits; personal genomes and trait profiles; and clinical sequences and associated referring indications.
There were notable discoveries throughout the CAGI experiment, and general themes emerged. The independent assessment found that top missense prediction methods are highly statistically significant, but individual variant accuracy is limited. Moreover, missense methods tend to correlate better with each other than with experiments (for reasons that may reflect the predictive methods and the assays themselves). However, there might be potential for missense interpretation at the extreme of the distribution. Structure-based missense methods excel in a few cases, while evolutionary-based methods have more consistent performance. Bespoke approaches often enhance performance.
On the clinical studies, predictors were able to identify causal variants that were overlooked by the clinical laboratory, and it appears that physicians may not always order the most relevant genetic test for their patients. CAGI data show that running multiple uncalibrated methods and considering their consensus often provides undue confidence in their correlation; we therefore advise against running multiple uncalibrated variant interpretation tools in clinical analysis.
The results showed that predicting complex traits from exomes is fraught. Interpretation of non-coding variants shows promise but is not at the level of missense. Beyond this, creating a genetic study that provides a reliable gold standard is remarkably difficult. However, there were notable improvements in the ability to match genomes to trait profiles.
Complete information about CAGI may be found at https://genomeinterpretation.org.
Page 112
102
ASTROLABE: EXPANSION TO CYP2C9 AND CYP2C19
Andrea Gaedigk1, Greyson P. Twist2, Sarah Soden2, Emily G. Farrow2, Neil A. Miller2
1Division of Clinical Pharmacology & Therapeutic Innovation, Children's Mercy, Kansas City, School of Medicine, University of Missouri-Kansas City; 2Center for Pediatric Genomic Medicine, Children's Mercy, Kansas City
Andrea Gaedigk
Background: CYP2C9 and CYP2C19 are highly polymorphic pharmacogenes metabolizing numerous drugs. Both are genes with CPIC guidelines underscoring their clinical relevance. To facilitate haplotype calling and translation into phenotype, we have developed a probabilistic scoring system, Astrolabe (initially called Constellation; Twist et al 2016, Gen Med 1:15007), that enables automated CYP2D6 diplotype calling from whole genome sequencing. We report here the extension of Astrolabe to CYP2C9 and CYP2C19.
Methods: The study was approved by the Institutional Review Board of Children’s Mercy and included 85 subjects (7 HapMap; 78 patients/parents). Allele definitions are according to the P450 Nomenclature Database (cypalleles.ki.se/) with some modifications. Exons and 100 bp of flanking introns were used for Astrolabe calls as well as -2990 to -440 of CYP2C9 and -1063 to -180 of CYP2C19 harboring SNPs defining CYP2C9*8 and CYP2C19*27, respectively. All but 3 subjects were genotyped for CYP2C9*2, *3, *5 and *8 and CYP2C19*2-*4, *17, *27 and *35 using TaqMan assays to validate Astrolabe calls. WGS data were reanalyzed with the DRAGEN Bio-IT processor (Edico Genome) to improve variation call quality. To account for haplotype and diplotype combinations not observed in our sample set, simulations of all possible diplotype combinations were performed using the ART read simulator and DRAGEN analysis pipeline. Astrolabe is available at https://www.childrensmercy.org/genomesoftwareportal/
Results: To maximize Astrolabe call accuracy, intron regions were adjusted to include informative SNPs while excluding those that occur on numerous haplotypes and/or are not part of a defined allele. The CYP2C9 exon 1 region, for example, was limited to 57 bp of intron 1 to exclude 251T>C, which is present in 1155/3540 subjects (CMH variant warehouse database). This SNP defines CYP2C9*29, but interfered with Astrolabe calls by overcalling CYP2C9*29 in the absence of its key SNP (33437C>A). Optimized calling target regions were then used to compare Astrolabe with genotype calls. Astrolabe correctly called 68/75 (90.67%) and 71/75 (94.67%) of subjects for CYP2C9 and CYP2C19, respectively. Among the alleles detected by Astrolabe and genotyping were CYP2C9*2, *3 and *8 and CYP2C19*2, *17, *27 and *35. Astrolabe also identified subjects carrying the rare CYP2C9*9 and *11 and CYP2C19*15 alleles, which were not covered by genotyping. Astrolabe correctly called 1077/1128 simulated CYP2C19 diplotypes (95% recall; 45 missed and 6 multiple calls). All missed calls were *12 called as *1. For CYP2C9, Astrolabe correctly called 2186/2278 simulated diplotypes (95% recall; 61 missed and 31 multiple calls). All missed calls were *25 called as *1.
Discussion: Astrolabe’s functionality was successfully expanded to CYP2C9 and CYP2C19. Phenotype prediction based on Astrolabe was superior to that derived from a limited genotype panel. Continued improvement and expansion of the nomenclature definitions will allow us to resolve the miscalled haplotypes represented in the simulation set and improve Astrolabe calling across all diplotypes.
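The reported call rates can be rechecked directly from the counts given in the abstract. The short Python sketch below only repeats that arithmetic; the helper name and data layout are ours, not part of Astrolabe.

```python
def percent(correct, total):
    """Percentage of correct calls, rounded to two decimals."""
    return round(100.0 * correct / total, 2)

# Simulated diplotypes as reported: (correct, missed, multiple calls, total).
SIMULATED = {
    "CYP2C19": (1077, 45, 6, 1128),
    "CYP2C9": (2186, 61, 31, 2278),
}
# Comparison with TaqMan genotyping as reported: (correct, total).
GENOTYPED = {"CYP2C9": (68, 75), "CYP2C19": (71, 75)}

simulated_recall = {}
for gene, (correct, missed, multiple, total) in SIMULATED.items():
    # The three outcome classes should account for every simulated diplotype.
    assert correct + missed + multiple == total
    simulated_recall[gene] = percent(correct, total)

genotyped_rate = {gene: percent(c, t) for gene, (c, t) in GENOTYPED.items()}
```

Both simulated recalls come out between 95% and 96%, consistent with the "95% recall" quoted for each gene, and the outcome counts add up to the stated totals.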
Page 113
103
HUMAN KINASES DISPLAY MUTATIONAL HOTSPOTS AT COGNATE POSITIONS WITHIN CANCER
Jonathan Gallion, Angela D. Wilkins, Olivier Lichtarge
Baylor College of Medicine
Jonathan Gallion
The discovery of driver genes is a major pursuit of cancer genomics, usually based on observing the same mutation in different patients. But the heterogeneity of cancer pathways plus the high background mutational frequency of tumor cells often cloud the distinction between less frequent drivers and innocent passenger mutations. Here, to overcome these disadvantages, we grouped together mutations from close kinase paralogs under the hypothesis that cognate mutations may functionally favor cancer cells in similar ways. Indeed, we find that kinase paralogs often bear mutations to the same substituted amino acid at the same aligned positions and with a large predicted Evolutionary Action. Functionally, these high Evolutionary Action, non-random mutations affect known kinase motifs, but strikingly, they do so differently among different kinase types and cancers, consistent with differences in selective pressures. Taken together, these results suggest that cancer pathways may flexibly distribute a dependence on a given functional mutation among multiple close kinase paralogs. The recognition of this “mutational delocalization” of cancer drivers among groups of paralogs is a new phenomenon that may help better identify relevant mechanisms and therefore eventually guide personalized therapy.
Page 114
104
SCOTCH: A NOVEL METHOD TO DETECT INSERTIONS AND DELETIONS FROM NGS DATA
Rachel Goldfeder, Euan Ashley
Stanford University
Rachel Goldfeder
Clinical-grade genome sequencing and interpretation require accurate and complete genotype calls across the entire genome. While single nucleotide variant detection is highly accurate and consistent, these variants explain only a small fraction of disease risk. Other types of variation that disrupt the open reading frame, such as insertions and deletions (INDELs), are more likely to be harmful. However, current methods have low sensitivity for larger (>= five bases) INDELs, primarily due to challenges surrounding aligning sequence reads that span INDELs. We present Scotch, a novel INDEL detection method that leverages signatures of poor read alignment, read depth information, and machine learning approaches to accurately identify INDELs from next-generation DNA sequencing data. Using biologically realistic simulated genomes and sequence reads with technologically representative error profiles (generated by ART), we evaluate Scotch and several currently available INDEL callers. We show that Scotch has higher sensitivity than current methods, particularly for larger INDELs. Finally, we validate INDELs that Scotch discovered in one individual, NA12878, and show that Scotch has high positive predictive value. This method will enable researchers and clinicians to more accurately identify INDELs associated with previously unexplained genetic conditions.
Page 115
105
MAYO OMICS REPOSITORY FOR TRANSLATIONAL MEDICINE
Iain Horton, Jeanette Eckel-Passow, Steven Hart, Shannon McDonnell, David Mead, Gay Reed, Greg Dougherty, Jason Ross, Julie Swank, Mark Myers, Mathieu Wiepert, Rama Volety, Tony Stai, Yaxiong Lin, Robert Freimuth
Mayo Clinic
Iain Horton
The Mayo Clinic Genomic Data Warehouse has established the infrastructure foundation, processes, and applications to meet the translational needs of the Mayo Clinic Center for Individualized Medicine (CIM). Through the streamlined and automated data pipeline, the next-gen sequencing (NGS) results are loaded and integrated with clinical data, providing the foundation for the development of revolutionary solutions and discovery in the clinical practice and genomic research. Initiated in 2012, with production data ingestion beginning in early 2014, Mayo Clinic's Translational Research Center (TRC) has provided the cornerstone platform for data-centric activities within CIM. Data generated from both the clinical pipeline and research pipeline are automatically loaded into TRC, with each new bit adding value and power to the system. Two key solutions with significant potential of impacting patient care and scientific discovery have been built on this genomic data warehouse. First is the Molecular Decision Support system, a rule-based pharmacogenomics system that enables Mayo Clinic clinicians to integrate actionable information based on a patient's genotype information at the point of care using NGS data. Second is the Mayo Variant Summary application, a cloud-native system which empowers Mayo Clinic researchers to identify rare and actionable genomic variants through dynamic filtering and grouping of subject phenotype and specimen metadata.
106
PHARMACOGENOMICS CLINICAL ANNOTATION TOOL (PHARMCAT)
T.E. Klein1, M. Whirl-Carrillo1, R.M. Whaley1, M. Woon1, K. Sangkuhl1, Lester G. Carter1, H.M. Dunnenberger2, P.E. Empey3, A.T. Frase4, R.R. Freimuth5, A. Gaedigk6, A. Gordon7, C. Haidar8, J.K. Hicks9, J.M. Hoffman8, M.T. Lee10, N. Miller11, S.D. Mooney12, T.N. Person13, J.F. Peterson14, M.V. Relling8, S.A. Scott15, G. Twist11, A. Verma13, M.S. Williams10, C. Wu16, W. Yang8, M.D. Ritchie4,13
1Dept Genetics, Stanford Univ, Stanford, CA; 2Center for Molecular Medicine, NorthShore University HealthSystem, Evanston IL; 3Department of Pharmacy and Therapeutics, School of Pharmacy, University of Pittsburgh; 4Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA; 5Department of Health Sciences Research, Mayo Clinic, Rochester MN; 6Division of Clinical Pharmacology, Toxicology & Therapeutic Innovation, Children’s Mercy-Kansas City, Kansas City, MO; 7Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA; 8St. Jude Children's Research Hospital, Memphis, TN; 9DeBartolo Family Personalized Medicine Institute, H. Lee Moffitt Cancer Center, Tampa, FL; 10Genomic Medicine Institute, Geisinger Health System, Danville, PA; 11Center for Pediatric Genomic Medicine, Children’s Mercy, Kansas City, MO; 12Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA; 13Biomedical and Translational Informatics, Geisinger Health System, Danville, PA; 14Vanderbilt University Medical Center, Nashville, TN; 15Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY; 16Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA
Teri Klein
Pharmacogenomics (PGx) decision support and return of results is an active area of genomic medicine implementation at many health care organizations and academic medical centers. The Clinical Pharmacogenetics Implementation Consortium (CPIC) has established guidelines surrounding gene-drug pairs that can and should lead to prescribing modifications based on genetic variant(s). One of the challenges in implementing PGx is extracting genomic variants and assigning haplotypes (including star-alleles) from genetic data derived from sequencing and genotyping technologies in order to apply the prescribing recommendations of CPIC guidelines. In a collaboration between the PGRN Statistical Analysis Resource (P-STAR), the Pharmacogenomics Knowledgebase (PharmGKB), the Clinical Genome Resource (ClinGen), and CPIC, we are developing a software tool to extract all variants from CPIC level-A genes, with the exception of G6PD and HLA, from a genetic dataset resulting from sequencing or genotyping technologies (represented as a .vcf), interpret the variant alleles, infer diplotypes, and generate an interpretation report based on CPIC guidelines. The CPIC pipeline report can then be used to inform prescribing decisions. We assembled a focus group of thought leaders in PGx to brainstorm the issues and to design the software pipeline. We hosted a one-week Hackathon at the PharmGKB at Stanford University to bring together computer programmers with scientific curators to implement the first version of this tool. Through this process, we have uncovered many of the challenges surrounding PGx implementation. For example, the inference of diplotypes is challenging for several CPIC level-A genes. This software pipeline will be made available under the Mozilla Public License (MPL 2.0) and disseminated in Github for the scientific and clinical community to test, explore, and improve. PharmCAT will give sites implementing PGx a way to more consistently interpret genomic results and link those results to published clinical guidelines. Furthermore, we are assembling (and will be maintaining) the translation tables that underlie the tool, which will significantly reduce the effort required to implement PGx clinically and ensure more uniform interpretations of PGx knowledge. As precision medicine continues to move into clinical practice, implementation workflows for PGx, like PharmCAT, would enable standardized and consistent implementation of PGx genes.
107
PCSK9 MODULATING VARIANTS IN FAMILIAL HYPERCHOLESTEROLEMIA
Sarathbabu Krishnamurthy1, Diane Smelser1, Manickam Kandamurugu1, Joseph Leader1, Noura S. Abul-Husn2, Alan R. Shuldiner2, David H. Ledbetter1, Frederick E. Dewey2, David J. Carey1, Michael F. Murray1, Raghu P.R. Metpally1
1Geisinger Health System; 2Regeneron Genetics Center
Sarathbabu Krishnamurthy
BACKGROUND: Highly penetrant autosomal dominant familial hypercholesterolemia (FH) is known to be caused by pathogenic loss of function (LOF) variants in LDLR and gain of function variants in PCSK9 and APOB genes. In addition to its causative role in FH, PCSK9 LOF variants are associated with lowering of serum low density lipoprotein cholesterol (LDL-C) and total cholesterol. The aims of this study were to (1) identify rare novel PCSK9 gene variants that lead to complete or partial loss of protein function in the DiscovEHR cohort; (2) explore the prevalence of PCSK9 LOF variants in a subset of FH patients; and (3) examine whether FH patients carrying PCSK9 LOFs show an association with lower plasma LDL-C and cardiovascular risk.
METHODS: We analyzed whole exome sequences from 51,289 individuals in the DiscovEHR cohort, who consented to participate in the Geisinger Health System’s MyCode Community Health Initiative. Rare missense and predicted loss of function (pLOF) coding variants in PCSK9 were identified by integrating bioinformatics and evaluating LDL-C and total cholesterol measures from the electronic health records (EHR).
RESULTS: In the overall DiscovEHR cohort, we identified 20 missense and 13 pLOF (2 splice donor, 6 stop gained and 5 frameshift) rare variants in PCSK9, including 15 novel variants that were associated with lower LDL-C and total cholesterol levels. LDL-C in pLOF carriers was significantly lower than in missense carriers with presumed partial loss of function (p<0.0012). Patients with PCSK9 rare missense variants with presumed partial LOF, or with LOF variants, had a significant reduction in the incidence of coronary events compared to the control group (p<0.0001). In FH patients, the LDL-lowering PCSK9 R46L variant, previously reported at 3% prevalence, was found to be enriched at 9.6% and was associated with lower LDL-C compared to FH patients not carrying an R46L allele. A novel PCSK9 missense variant (G316S) was also present in FH patients with a prevalence of 0.8% and also showed an LDL-lowering phenotypic effect in an imputed family pedigree.
CONCLUSIONS: Overall, 11.8% of the FH patients in the DiscovEHR cohort were identified to also carry a PCSK9 variant which modulates their LDL-C and serum cholesterol levels.
108
INTEGRATIVE NETWORK ANALYSIS OF PROSTATE TISSUE LINCRNA-MRNA EXPRESSION PROFILES REVEALS POTENTIAL REGULATORY MECHANISMS OF PROSTATE CANCER RISK LOCI
Nicholas B. Larson1, Shannon McDonnell1, Zach Fogarty1, Melissa Larson1, John Cheville2, Shaun Riska1, Saurabh Baheti1, Asha A. Nair1, Daniel O’Brien1, Jaime Davila1, Daniel Schaid1, Stephen N. Thibodeau2
1Department of Health Sciences Research, Mayo Clinic, Rochester, MN; 2Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN
Nicholas Larson
Large-scale genome-wide association studies have identified 146 loci associated with risk of developing prostate cancer (PRCA). However, most of these loci do not lie in close proximity to protein coding genes and are presumed to be regulatory in nature. Downstream regulation of protein coding genes related to PRCA development may be mediated by cis-acting regulation of nearby transcripts, also known as cis-mediated trans-eQTLs. This cis-mediator causal relationship comprises a regulatory variant, a nearby cis-regulated gene, and the downstream regulated trans target gene. Cis-mediators may include transcription factors, signaling proteins, and long intergenic non-coding RNAs (lincRNAs). LincRNAs correspond to a host of regulatory functions such as chromatin remodeling and transcriptional co-activation, and have previously been identified as diagnostic and prognostic biomarkers for a number of cancers. However, their role in cancer development and progression is poorly understood. To explore the hypothesis that cis-mediated trans-eQTLs may play a role in PRCA risk, we leveraged an eQTL dataset of 471 samples of normal prostate tissue from prostate/bladder cancer patients with available RNA-Seq and imputed Illumina Infinium 2.5M genotype data. We first conducted an initial transcriptome-wide eQTL screening of all lincRNAs and mRNAs with 8,073 SNPs in high linkage disequilibrium (r2>0.5) with previously identified PRCA risk-associated variants, identifying approximately 5000 transcripts (FDR<0.10) to be putatively associated (cis or trans). We then constructed an undirected Gaussian graphical regulatory network from the expression profiles of this transcript subset, identifying 87,468 connections. To identify candidate cis-mediator node-pairs in the expression network, we isolated a subset of cis-associated transcripts (lincRNA or mRNA) at a strict Bonferroni significance threshold. We then identified all mRNA nodes connected to these cis-nodes that were distal to the cis-variant (>1 Mb) and had evidence of a trans-association with the cis variant (P<1E-04), resulting in 9 candidate cis-mediator trios. Finally, we applied causal mediation analysis to test the proportion of the trans-association that is mediated by the cis-regulated transcript, resulting in 7/9 significant cis-mediator relationships. Transcription factor HNF1B was identified to be a significant mediator in the trans-associations between rs11263762 and three mRNAs: SRC, MIA2, and SEMA6A. All three exhibited concomitant upregulation with HNF1B. Notably, HNF1A has been shown to stimulate SRC expression via an alternative promoter, while MIA2 is also a known HNF1A target. Dysregulation of SEMA6A has been observed in PRCA metastases and plays a potential role in angiogenesis interacting with VEGFR2. MSMB and NDRG1 both demonstrate androgen-stimulated expression in prostate tissue, and indicated a recessive pattern of expression dysregulation with rs10993994. Despite a small sample size, we replicated multiple trans-eQTLs from these cis-mediator trios in the GTEx prostate tissue eQTL dataset (P<0.05). Together, our findings suggest dysregulation of RNA expression may play a role in genetic predisposition to PRCA.
109
INTEGRATED ANALYSIS OF GENOMICS, PROTEOMICS, AND PHOSPHOPROTEOMICS IN CELLS AND TUMOR SAMPLES
Jason E. McDermott1, Tao Liu1, Samuel Payne1, Vladislav Petyuk1, Richard Smith1, Philipp Mertins2, Steven Carr2, Karin Rodland1
1Pacific Northwest National Laboratory, 2Broad Institute
Jason McDermott
As part of the Clinical Proteomic Tumor Analysis Consortium (CPTAC), we have recently published the first large-scale proteomic and phosphoproteomic analysis of high-grade serous ovarian tumors. We observed that phosphorylation status was an excellent indicator of pathway activity and could discriminate between patient survival times. In the current work we have combined this data with comparable data from breast cancer tumors and cancer cell lines treated with kinase inhibitors, to answer several fundamental questions about the role of phosphorylation in cellular processes and cancer. The total dataset comprised over 150 samples with very deep proteomic coverage (>20,000 phosphopeptides confidently identified). We first found that the correlation between kinase protein abundance and abundance of phosphorylated target peptides was very low, indicating that kinase abundance is not a good proxy for phosphorylation status overall. However, highly correlated kinase-substrate pairs were significantly more likely to be true relationships (from existing knowledge), demonstrating that this method could be used to predict novel kinase targets in some cases. We used this analysis to identify several novel kinase-substrate relationships that were differential between tumor subtypes, and that correlated with pathways where phosphorylation was affected by drug treatment. These relationships are currently under investigation as potential novel targets for therapeutic intervention. To better analyze cancer-relevant pathway activity we developed a novel approach that characterizes correlation, differential abundance, and statistical interactions between components to analyze multiple omics types in the context of signaling and functional pathways. We used this approach, called the Layered Enrichment Analysis of Pathways (LEAP), to identify active pathways in molecular subtypes of ovarian and breast cancer, and several novel subpopulations of patients displaying uniquely dysregulated pathways. Our results show that integration of multiple omics types has great potential in the area of development of novel therapeutic approaches for personalized medicine.
110
NETDX: PATIENT CLASSIFICATION USING INTEGRATED PATIENT SIMILARITY NETWORKS
Shraddha Pai, Shirley Hui, Ruth Isserlin, Hussam Kaka, Gary D. Bader
The Donnelly Centre, University of Toronto
Shraddha Pai
Patient classification has widespread biomedical and clinical applications, including diagnosis, prognosis, disease subtyping and treatment response prediction. A general purpose and clinically relevant prediction algorithm should be accurate, generalizable, be able to integrate diverse data types (e.g. clinical, genomic, metabolomic, imaging), handle sparse data and be intuitive to interpret. We describe netDx, a supervised patient classification framework based on patient similarity networks, that meets the above criteria (Ref 1). netDx models input data as patient networks, and uses network integration and machine learning for feature selection. We demonstrate the utility of netDx by integrating gene expression and copy number variants to classify breast cancer tumours as being of the Luminal A subtype (N=348 tumours; Ref 2). Using gene expression data, netDx performed as well as or better than established state of the art machine learning methods, achieving a mean accuracy of 89% (2% s.d.) in classifying Luminal A. In the second application, we predict case/control status in autism spectrum disorders based on the occurrence of rare copy number deletions in metabolic pathways (N=3,291 patients; Ref 3); this predictor achieved better performance than previously published methods. netDx uses pathway features to aid biological interpretability and results can be visualized as an integrated patient similarity network to aid clinical interpretation. Upon publication, netDx software will be made publicly available via github; the software provides worked examples and easy-to-use functions for design of custom predictor workflows. More at http://netdx.org
References:
1. netDx preprint: http://dx.doi.org/10.1101/084418
2. The Cancer Genome Atlas (2012) Nature 490:61.
3. Pinto et al. (2014). Am J Hum Gen. 94(5):677.
111
PREVALENCE AND DETECTION OF LOW-ALLELE-FRACTION VARIANTS IN CLINICAL CANCER SAMPLES
Hyun-Tae Shin1,2, Jae Won Yun1,2, Nayoung K.D. Kim1, Yoon-La Choi2,3, Woong-Yang Park1,2,4, Peter J. Park5
1Samsung Genome Institute, Samsung Medical Center, Seoul, Korea; 2Samsung Advanced Institute of Health Science and Technology, Sungkyunkwan University, Seoul, Korea; 3Department of Pathology & Translational Genomics, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea; 4Department of Molecular Cell Biology, Sungkyunkwan University School of Medicine, Seoul, Korea; 5Department of Biomedical Informatics, Harvard Medical School, Boston, MA
Hyun-Tae Shin
Clinical application of sequencing-based assays requires high sensitivity and specificity for detecting genomic alterations. Our analysis of more than 5000 cancer samples reveals that a significant fraction of clinically-actionable somatic variants may have low variant allele fractions (VAF), indicating the importance of very high coverage sequencing for these patients. As a case study, we describe refractory cancer patients with clinical response to therapies that target low VAF alterations.
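The link between low VAF and the need for high coverage can be illustrated with a simple binomial model. The sketch below is not the authors' method: it assumes that reads at a site are sampled independently, that each read carries the variant with probability equal to the VAF, that sequencing error is ignored, and that a caller needs at least `min_alt_reads` supporting reads (the default of 5 is a hypothetical threshold chosen only for illustration).

```python
from math import comb


def detection_probability(depth, vaf, min_alt_reads=5):
    """Probability of seeing at least min_alt_reads variant-supporting reads.

    Assumes the number of variant reads follows Binomial(depth, vaf);
    min_alt_reads is an illustrative caller threshold, not a published value.
    """
    # Probability of observing fewer supporting reads than the threshold.
    p_below = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


# Compare sequencing depths for a hypothetical variant at 2% VAF.
for depth in (100, 500, 1000):
    print(depth, round(detection_probability(depth, vaf=0.02), 3))
```

Under these assumptions the detection probability rises steeply with depth, which is consistent with the abstract's emphasis on very high coverage sequencing for low-VAF alterations.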
112
A METHYLATION-TO-EXPRESSION FEATURE MODEL FOR GENERATING ACCURATE PROGNOSTIC RISK SCORES AND IDENTIFYING DISEASE TARGETS
Jeffrey A. Thompson1, Carmen J. Marsit2
1Dartmouth College, 2Emory University
Jeffrey Thompson
Many researchers now have available multiple high-dimensional molecular and clinical datasets when studying a disease. As we enter this multi-omic era of data analysis, new approaches that combine different levels of data (e.g. at the genomic and epigenomic levels) are required to fully capitalize on this opportunity. In this work, we outline a new approach to multi-omic data integration, which creates a model of methylation dysregulation and its effect on gene expression and then combines this molecular information with clinical predictors as part of a single analysis to create a prognostic risk score for clear cell renal cell carcinoma. The approach integrates data in multiple ways and yet creates models that are relatively straightforward to interpret and with a high level of performance. Over 100 random splits of the data into training and testing sets, our model had the highest median C-index of any method we tried, at 0.792. Furthermore, we demonstrated that our molecular risk predictor is independent of clinical covariates and that the combined model results in statistically significantly higher accuracy than either data type alone. Additionally, the proposed process of data integration itself captures relationships in the data that represent highly disease-relevant functions. The gene signature we identify for clear cell renal cell carcinoma prognosis is enriched for genes that are central nodes in a protein-protein interaction network associated with the JAK-STAT signaling cascade, which itself is a known factor in kidney cancer progression. Our signature is also enriched for genes in pathways involved in immune response, which are increasingly targeted by novel cancer therapies. We call this model the methylation-to-expression feature model (M2EFM). Although one of the other approaches we considered also resulted in a highly accurate model, M2EFM performed better with a far more parsimonious model that sheds light on the potential relationship between abnormal gene regulation and cancer prognosis. Given our results, we think that further development of this approach is warranted.
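For readers unfamiliar with the C-index reported above, the following minimal sketch computes Harrell's concordance index with a plain pairwise loop. It assumes the usual definition (the abstract does not say which variant was used), and the survival times, event flags, and risk scores are invented toy values, not data from the study.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs in which the subject with the
    shorter observed survival time also has the higher predicted risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's event or censoring time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: months of follow-up, event flag (1 = death observed, 0 = censored),
# and a model's risk score for four hypothetical patients.
times = [5, 12, 20, 30]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.6, 0.1]
print(concordance_index(times, events, risk))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading the reported median of 0.792.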
113
CYP2D6 DIPLOTYPE CALLING FROM WGS USING ASTROLABE: UPDATE
Andrea Gaedigk1, Greyson P. Twist2, Sarah Soden2, Emily G. Farrow2, Neil A. Miller2
1Division of Clinical Pharmacology & Therapeutic Innovation, Children's Mercy, Kansas City, School of Medicine, University of Missouri-Kansas City; 2Center for Pediatric Genomic Medicine, Children's Mercy, Kansas City
Greyson Twist
Background: To facilitate haplotype calling and translation into phenotype, we have previously developed a probabilistic scoring system, Astrolabe (initially called Constellation; Twist et al 2016, Gen Med 1:15007), enabling automated CYP2D6 diplotype calling from whole genome sequencing. We have implemented a series of improvements to increase call accuracy as well as ease of use.
Methods: The study was approved by the Institutional Review Board of Children’s Mercy Kansas City and included a total of 85 subjects (7 HapMap; 78 patients/parents). WGS data were reanalyzed with the DRAGEN Bio-IT processor (Edico Genome) to improve the quality of variation calls. The Astrolabe CYP2D6 allele definition table was expanded to include a) additional variants available through the P450 Nomenclature Database; b) variants characterized by our laboratory, but not available through the Nomenclature Database; c) resequencing of some alleles (e.g. *10, *17) for which only exons are annotated by the Nomenclature Database. Programming errors in the scoring algorithm were repaired and unit tested, and support for a broad range of variant file input types was added (vcf, gvcf, tabix, .gz). Improvements also include versioning of the Astrolabe tool and the nomenclature data from which calls are generated. To account for haplotype and diplotype combinations not observed in our sample set, simulations of all possible diplotype combinations were performed using the ART read simulator and DRAGEN analysis pipeline. Astrolabe is available at https://www.childrensmercy.org/genomesoftwareportal/.
Results: To maximize Astrolabe call accuracy, we removed CYP2D6*1E, *3B, *4A-L, *4N, *6D, *10C-D, and *45B from the call set, because of incomplete allele definitions (based on exons only), or SNP(s) that are not unique to an allele. For example, 1749A>G is part of the CYP2D6*3B and *103 definitions, but also appears to be present on some *1 subvariants. Likewise, 3288A>G is not limited to CYP2D6*6D as implied by the nomenclature database, thus causing erroneous Astrolabe calls. Calls with our revised definitions were compared with those obtained by genotyping. Astrolabe also accurately identified subjects with copy number variations including the CYP2D6*5 deletion (n=5) and gene duplications (n=2). Also, increased variant calling accuracy of the DRAGEN pipeline improved the calling of several samples (n=). Astrolabe correctly called 7731/8128 simulated diplotypes (95% recall; 133 missed and 264 multiple calls). Of the missed calls, 124 were due to *38 called as *1.
Discussion: The series of improvements to Astrolabe increased call accuracy and minimized the number of no calls. Phenotype prediction based on Astrolabe was superior to that derived from a limited genotype panel. Continued refinement of existing allele definitions and the inclusion of novel haplotype definitions will further improve the Astrolabe tool. We are currently applying Astrolabe to other NGS data sets including exomes and targeted NGS panels.
114
INTEGRATION, INTERPRETATION AND DISPLAY OF MULTI-OMIC DATA FOR PRECISION MEDICINE
David S. Wishart1, Ana Marcu1, An Chi Guo1, Ash Anwar2, Solveig Johannessen3, Craig Knox4, Michael Wilson4, Christoph H. Borchers5, Pieter Cullis6, Robert Fraser2
1University of Alberta, 2Molecular You Inc., 3Educe Design Inc., 4OMx Inc., 5University of Victoria, 6University of British Columbia
David Wishart
The goal of precision medicine is to use advanced multi-omic technologies to improve the accuracy of medical diagnoses and enhance the individualization of medical treatment. The fundamental challenge in precision medicine is not in the measurement or collection of multi-omic data but in its delivery. In particular, the integration, interpretation and display of multi-omic data has proven to be particularly problematic. Here we describe some of our experiences in tackling this problem and outline a number of important findings that we believe are worth sharing. Our most important finding was the need to use high quality, quantitative ‘omics data. Measuring absolutely quantitative ‘omics data ensures greater reproducibility and permits direct comparisons to well-established clinical reference values. Several ‘omics laboratories offering quantitative services have been identified and these are described here. Second, we discovered that custom databases containing biomarker-disease data are essential. Very few of these kinds of databases exist, but they are necessary for the comparison and full integration of multi-omic data. In particular, they provide the information needed to integrate multi-omic measures and to determine disease risk. A brief description of a few of these biomarker-disease databases is provided. Third, we discovered that color-coded graphs, which are hyperlinked to detailed textual explanations, are necessary for the facile interpretation of the multi-omic data, both by patients and physicians. An example of a well-designed, web-enabled “dashboard” is shown to highlight these findings. Finally we found that comprehensive databases of actionable responses must be prepared so that detailed, customizable medical, lifestyle, diet or pharmacological guidance can be provided to treat or prevent conditions detected by these multi-omic measurements. Examples of several omics-derived, actionable responses are provided to clarify this point. These findings, along with several associated software tools and databases, have recently been integrated into an automatic workflow that allows a wide range of multi-omic measurements to be integrated, interpreted and displayed for precision or personalized medicine applications.
115
BIOTHINGS APIS: LINKED HIGH-PERFORMANCE APIS FOR BIOLOGICAL ENTITIES
Jiwen Xin1, Cyrus Afrasiabi1, Sebastien Lelong1, Ginger Tsueng1, Sean D. Mooney2, Andrew I. Su1, Chunlei Wu1
1The Scripps Research Institute, 2The University of Washington
Chunlei Wu
The accumulation of biological knowledge and the advance of web and cloud technology are growing in parallel. Recently, many biological data providers have started to provide web-based APIs (Application Programming Interfaces) for accessing data in a simple and reliable manner, in addition to the traditional raw flat-file downloads. Web APIs provide many benefits over traditional file downloads. For instance, users can request specific data such as a list of genes of interest without having to download the entire dataset, thereby providing the latest data on demand and reducing computation and data transfer times. This means that programmers can spend less time on wrangling data, and more time on analysis and discovery. Building and deploying scalable and high-performance web APIs requires sophisticated software engineering techniques. We previously developed high-performance and scalable web APIs for gene and genetic variant annotations, accessible at MyGene.info and MyVariant.info. These two services are a tangible implementation of our expertise and collectively serve over 4 million requests every month from thousands of unique users. Crucially, the underlying design and implementation of these systems are in fact not specific to genes or variants, but rather can be easily adapted to other biomedical data types such as drugs, diseases, pathways, species, genomes, domains and interactions. We are currently expanding the scope of our platform to other biological entities. Collectively, we refer to them as “BioThings APIs” (http://biothings.io). We also applied JSON-LD (JSON for Linking Data) technology in the development of BioThings APIs. JSON-LD provides a standard way to add semantic context to the existing JSON data structure, for the purpose of enhancing the interoperability between APIs. We have demonstrated the applications of JSON-LD with BioThings APIs, including data discrepancy checks as well as the cross-linking between APIs.
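The idea of requesting only the data of interest rather than downloading a whole dataset can be sketched as follows. The abstract names MyGene.info but gives no request format, so the endpoint path (`/v3/query`) and the parameter names (`q`, `fields`, `species`) below are assumptions based on the service's public query interface and should be checked against its current documentation. The `fetch_gene` helper needs network access and is defined but not called here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; not stated in the abstract.
MYGENE_QUERY_URL = "https://mygene.info/v3/query"


def build_gene_query(symbol, fields=("symbol", "name", "entrezgene"), species="human"):
    """Build a query URL that asks for only the listed fields of one gene."""
    params = {
        "q": f"symbol:{symbol}",
        "fields": ",".join(fields),
        "species": species,
    }
    return f"{MYGENE_QUERY_URL}?{urlencode(params)}"


def fetch_gene(symbol):
    """Fetch and decode the JSON response (requires network access)."""
    with urlopen(build_gene_query(symbol), timeout=10) as response:
        return json.load(response)


# Only the URL is built here; nothing is sent over the network.
print(build_gene_query("CDK2"))
```

Restricting the response with a field list is what keeps the transfer small compared with a flat-file download, which is the benefit the abstract describes.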
116
SINGLE-CELL ANALYSIS AND MODELLING OF CELL POPULATION HETEROGENEITY
POSTER PRESENTATIONS
117
SINGLE CELL SIGNALING STATES REVEAL INDUCTION OF NON-GENETIC VARIATION IN RESISTANCE TO TRAIL-INDUCED APOPTOSIS
Reema Baskar, Harris Fienberg, Garry Nolan, Sean Bendall
Stanford University
Reema Baskar
TNF alpha-related apoptosis-inducing ligand (TRAIL) has been shown to specifically target cancer cells; however, rampant resistance has curtailed its efficacy as a drug. Cell-to-cell variation has been previously linked to resistance to TRAIL-induced apoptosis. We further investigate non-genetic phenotypic variation as a novel mode of drug resistance. Using mass cytometry, we captured high-dimensional, single-cell signaling states of different cancer types over the course of TRAIL treatment. For the first time, we provide a comprehensive single cell overview of TRAIL signaling dynamics and provide population metrics to quantify heterogeneity within resistance phenotypes. We demonstrate that while all cells respond to TRAIL, a subset of them persist in transient resistant states and do not progress to apoptosis. Our methods show correlation between heterogeneity of response to TRAIL and persistence of non-apoptotic, viable cancer cells in drug. We also show that combinatorial therapies designed to inhibit implicated pathways in conserved resistant states do not eradicate resistance and in fact can induce new states of resistance. This study presents experimental and computational tools to investigate non-genetic phenotypic variation as a novel mode of drug resistance in cancer and demonstrates their utility in understanding resistance to TRAIL-induced apoptosis.
118
A NOVEL K-NEAREST NEIGHBORS APPROACH TO COMPARE MULTIPLE BIOLOGICAL CONDITIONS IN SINGLE CELL DATA
Tyler J. Burns1, Garry P. Nolan2, Nikolay Samusik2
1Stanford University School of Medicine, Dept. of Cancer Biology; 2Stanford University School of Medicine, Baxter Laboratory for Stem Cell Biology
Tyler Burns
High dimensional single-cell data is routinely visualized in two dimensions using dimension reduction algorithms like t-SNE, Principal Component Analysis (PCA), or force-directed graphs. When comparing levels of intracellular proteins in basal versus perturbed cells, clustering must be used to visualize changes in specific markers in a single graph. However, discretizing a dataset does not allow one to understand subtle, rare, and/or continuous biological changes across the original manifold. Herein, we present an algorithm that represents each cell’s information content as its average across k-nearest neighbors. This allows for comparisons to be made between biological conditions on a per-cell basis. We use this to produce detailed t-SNE maps depicting biological change, and correlation analysis to enumerate signaling responses to perturbation.
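One possible reading of the k-nearest-neighbor averaging described above is sketched below. It is an interpretation rather than the authors' algorithm: cells from both conditions are assumed to be pooled in a shared feature space, each cell's neighborhood includes the cell itself, and the per-cell statistic is the difference between the mean marker value of perturbed and basal neighbors. The function name, the condition labels, and the toy data are all hypothetical.

```python
from math import dist


def knn_condition_difference(coords, marker, condition, k=3):
    """Per-cell difference in mean marker level (perturbed minus basal)
    among each cell's k nearest neighbors, the cell itself included."""
    differences = []
    for center in coords:
        # Indices of the k cells closest to this one in feature space.
        nearest = sorted(range(len(coords)), key=lambda j: dist(center, coords[j]))[:k]
        basal = [marker[j] for j in nearest if condition[j] == "basal"]
        perturbed = [marker[j] for j in nearest if condition[j] == "perturbed"]
        if basal and perturbed:
            differences.append(sum(perturbed) / len(perturbed) - sum(basal) / len(basal))
        else:
            differences.append(None)  # neighborhood lacks one of the conditions
    return differences


# Toy data: two well-separated groups of three cells each.
coords = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
marker = [1.0, 3.0, 1.0, 2.0, 2.0, 2.0]
condition = ["basal", "perturbed", "basal", "basal", "perturbed", "perturbed"]
print(knn_condition_difference(coords, marker, condition, k=3))
```

Because every cell receives its own value, the result can be overlaid on a two-dimensional embedding without first discretizing the data into clusters. The brute-force neighbor search is quadratic in the number of cells, so a real dataset would need an indexed nearest-neighbor search instead.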
119
SINGLE-CELL RNA SEQUENCING IN PRIMARY GLIOBLASTOMA: IMPROVING ANALYSIS OF HETEROGENEOUS SAMPLES BY INCORPORATING QUANTIFICATION OF UNCERTAINTY
Wendy Marie Ingram, Debdipto Misra, Nicholas F. Marko, Marylyn Ritchie
Geisinger Health System
Wendy Ingram
Background: Glioblastoma (GBM) is the most common and deadly brain cancer in adults. The associated lethality may be attributable to the intrinsic heterogeneity of micro-invasive tumor cells, some of which are unavoidably left behind following tumor resection. The transcriptomic heterogeneity may contribute to the survival and subsequent proliferation of a small subset of cells that are resistant to radiation and chemotherapy. It has long been hypothesized that investigations into these tumors at a single cell level will allow for better molecular understanding of treatment resistance and the development of novel therapeutic approaches. Recently, advances in single cell capture and sequencing technology have become available and allow for these studies to be conducted. However, there are many technical and computational challenges inherent to single cell transcriptomics that are not addressed by traditional RNA-seq analysis tools. These challenges include uncertainty of technical and biological variance and must be carefully considered in order for biologically and therapeutically relevant conclusions to be reached.
Methods: Tumor tissue from two GBM patients undergoing surgical resection as part of standard of care therapy was collected at the time of surgery. We used the Fluidigm C1 microfluidics platform to capture single cells followed by RNA sequencing (RNA-seq) of these cells and a bulk population of ~10,000 cells from each tumor. We compared two different transcriptomic alignment tools, Bowtie and kallisto, and analyzed the single cell transcriptional heterogeneity of cells within and between tumors using the recently developed analysis tool, sleuth. To the best of our knowledge, we are the first to utilize this single cell capture method and perform single cell RNA-seq analysis using the newly developed kallisto and sleuth programs for primary GBM tissue samples.
Results: We show that the Fluidigm C1 microfluidics single cell capture method produces high quality transcriptomic material for RNA-seq and may have benefits over alternative methods (e.g. fluorescence-activated cell sorting) such as shorter preparation time. The kallisto-sleuth analysis programs provide improved estimation of gene expression variability and more reliable clustering of single cells by leveraging the unique features of equivalency groups and bootstrap estimates of kallisto. Cluster analysis demonstrates that certain cells from both tumors cluster together and share some common expression patterns, but the remaining cells cluster in tumor-specific groups or do not group with other cells. We observe marked intertumor and intratumor transcriptional variability and note that average expression from single cells does not reliably correlate with the bulk cell RNA-seq abundance estimates. Taken together, we have shown that the combination of Fluidigm C1 and the kallisto-sleuth analysis programs is a useful and reliable way to obtain and analyze high quality single cell RNA-seq data for the investigation of primary tumor tissues.
120
REGISTRATION OF FLOW CYTOMETRY DATA USING SWIFT CLUSTER TEMPLATES TO REMOVE CHANNEL-SPECIFIC OR CLUSTER-SPECIFIC VARIATION
Jonathan A. Rebhahn1, Sally A. Quataert1, Gaurav Sharma2, Tim R. Mosmann1
1Center for Vaccine Biology and Immunology, University of Rochester Medical Center; 2Department of Electrical and Computer Engineering, University of Rochester
Tim Mosmann
Standardization between flow cytometry experiments performed at different times is difficult because variations in cell parameters can be caused by many factors, including changes in antibody reagents, staining protocols, cell handling, different cytometers, and cytometer settings such as photomultiplier amplification voltages. These variations may overwhelm the genuine biological differences being investigated, such as genetic or disease-specific variations between subjects. Technical variations can be partly reduced by manually adjusting analysis gates, but this is subjective and time-consuming. Previous methods for semi-automated adjustment have relied on histogram peaks or manual gating to identify anchor populations. We have now developed fully-automated methods for registering flow cytometry samples, i.e. normalizing the fluorescence intensity of each cell in all channels. We take advantage of the high-resolution cluster templates derived by clustering reference samples by the SWIFT algorithm. These templates represent Gaussian model descriptions of the multidimensional data. If samples to be registered are at least moderately similar to the target/reference sample, assignment of the test sample to the template results in most cells being assigned to the appropriate cluster, but clusters that have shifted in the test sample then have altered median values in one or more channels. This high-resolution positional information is used for two types of registration:
Rigid, or per-channel registration compares cluster locations between the target and the test sample to be registered, and the best-fit registration adjustments are determined for each channel and applied incrementally, reassigning the cells at each step to improve the final fit. This objectively uses positional information from all clusters, regardless of cluster size variation, and successfully corrects global artifacts such as staining or cytometer settings that cause ‘batch’ differences between assay days.
Fluid, or per-cluster registration calculates the registration adjustment required for each cluster in the test sample to overlap fully with its corresponding cluster in the reference sample. This registers clusters more completely, and can remove individual variation (due to e.g. genetic or disease-specific effects). Fluid registration removes most positional information; this is desirable if the main experimental outcome is expected to be variations of the number of cells of different types.
This method has been applied to datasets that include changes due to assay dates, flow cytometers, subjects, and sequential blood samples. Most variation occurred between cytometers and assay days, less between subjects, and the least between different bleeds from the same person. Registration substantially improved correlations between cluster medians. The number of cells per cluster also showed increased correlation, suggesting that unmodified samples assigned to the cluster templates sometimes had cells assigned to an inappropriate cluster. Thus the SWIFT cluster-based registration can improve subsequent flow cytometry analysis. Registered samples can be analyzed by a variety of manual or automated procedures.
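The per-channel ("rigid") registration described above can be sketched in a simplified form. This is not the SWIFT implementation: it assumes a purely additive offset per channel, estimates that offset as the median of the differences between matched cluster medians, and applies it in a single step without the incremental reassignment of cells that the abstract describes. The function names, cluster identifiers, and numbers are hypothetical.

```python
from statistics import median


def per_channel_shift(reference_medians, test_medians):
    """Estimate one additive offset per channel from matched cluster medians.

    Both arguments map a cluster id to a list of per-channel median values.
    Every shared cluster contributes equally, whatever its cell count.
    """
    shared = sorted(set(reference_medians) & set(test_medians))
    if not shared:
        raise ValueError("no clusters in common")
    n_channels = len(reference_medians[shared[0]])
    return [
        median(reference_medians[c][ch] - test_medians[c][ch] for c in shared)
        for ch in range(n_channels)
    ]


def register(cells, shift):
    """Apply the per-channel offsets to every cell of the test sample."""
    return [[value + s for value, s in zip(cell, shift)] for cell in cells]


# Toy example: channel 0 is offset by +0.5 in every test cluster, while
# channel 1 differs in only one cluster (a cluster-specific effect).
reference = {"A": [1.0, 5.0], "B": [4.0, 2.0], "C": [7.0, 7.0]}
test = {"A": [1.5, 5.0], "B": [4.5, 2.0], "C": [7.5, 9.0]}
shift = per_channel_shift(reference, test)
print(shift)
print(register([[1.5, 5.0]], shift))
```

Using the median across clusters means a single shifted cluster does not move the whole channel, so a cluster-specific difference such as the one in channel 1 is left in place. Removing that kind of difference is the job of the per-cluster ("fluid") registration.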
121
WORKSHOP: NO BOUNDARY THINKING IN BIOINFORMATICS
POSTER PRESENTATION
122
ENABLING RICHER DATA INTEGRATION FOR GENOMIC EPIDEMIOLOGY
E. Griffiths1, D. Dooley2, C. Bertelli1, J. Adam3, F. Bristow3, T. Matthews3, A. Petkau3, M. Courtot4, J.A. Carriço5, A. Keddy6, R. Beiko6, L.M. Schriml7, E. Taboada8, M. Graham3, G. Van Domselaar3, W. Hsiao2, F. Brinkman1
1SFU, Burnaby, BC, Canada; 2BC Centre for Disease Control, Vancouver, BC, Canada; 3PHAC, Winnipeg, MB, Canada; 4EBI, Hinxton, Cambridge, UK; 5Univ. of Lisbon, Lisbon, Portugal; 6Dalhousie Univ., Halifax, NS, Canada; 7Univ. of Maryland School of Medicine, Baltimore, MD, USA; 8PHAC, Lethbridge, AB, Canada
Fiona Brinkman

One barrier to effectively capitalizing on whole genome sequence data is efficient, robust annotation and integration of associated contextual data (metadata). Whether human, microbial or other organismal genomic sequence, frequently such contextual data is too unorganized, in free text format, to enable effective integration for answering more sophisticated questions. Approaches to help overcome this barrier are illustrated here with the Integrated Rapid Infectious Diseases Analysis (IRIDA.ca) Project and Genomic Epidemiology Ontology (GenEpiO.org) Consortium.

Microbial pathogen whole genome sequencing provides the highest resolution molecular “fingerprint” for infectious disease epidemiology and is transforming public health practice, enabling more rapid identification of disease outbreaks, their sources, and potential control measures. However, such microbial genomic data (like human ‘omic data) must be combined with epidemiological/clinical/laboratory/other health care data (“contextual data”) to be meaningfully interpreted for clinical and public health questions/actions. Furthermore, information must be shared between different agencies to efficiently assess and manage risks to human health across jurisdictions. Currently, terminologies describing public health data cannot be easily mapped across functionally-similar software systems without intricate intervention by specialists, resulting in data exchange systems that are static and fragile.

To promote efficient data exchange and intelligence sharing, we propose an intuitive platform for searching, identifying, and verifying the fundamental health care entity elements (ontology terms) to map to institutional application data formats, starting with genomic and public health contextual data. Key innovations are the proposed Genomic Epidemiology Entity Mart (GE2M) that allows users to inspect term definitions, labeling, and database cross references in a user-friendly format, plus a software system allowing different jurisdictions to use the terms suitable for them, essentially choosing from a “shopping cart” of options mapped between jurisdictions/organizations. A very preliminary prototype of this concept has been established as part of the IRIDA.ca project and the GenEpiO Consortium (a consortium of 70 researchers from 15 countries interested in contributing to this effort). We hypothesize that a common and accessible ontology entity mart can be developed, if appropriate tools for interfacing domain experts with this mart are developed and the mart is first applied to practical microbial genomic epidemiology data sharing needs between select public health systems (with consultation involving a larger consortium).

In addition, new genomic data visualization approaches are being developed for integration into the IRIDA software platform, to enable more interactive, flexible visualization of genomic data with different levels or views of contextual data (from finely detailed comparisons of genomic islands and other features between genomes, to examining genomic data in the context of geographical data). IRIDA is being used in Canada’s public health agency, and this open source software is also being installed in other countries interested in co-developing this resource and using a federated data sharing approach.
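The “shopping cart” idea, in which each jurisdiction keeps its own field names but maps them to shared ontology terms, can be sketched as follows. This is an illustration under stated assumptions, not the GE2M or IRIDA design: the term identifiers, agency names, and field names are all hypothetical placeholders and do not come from GenEpiO.

```python
# Hypothetical shared term IDs with human-readable labels (not real GenEpiO terms).
SHARED_TERMS = {
    "TERM:0001": "specimen collection date",
    "TERM:0002": "geographic location of collection",
}

# Each jurisdiction's "cart": local field name -> shared term ID (all hypothetical).
CARTS = {
    "agency_a": {"collection_dt": "TERM:0001", "province": "TERM:0002"},
    "agency_b": {"SampleDate": "TERM:0001", "Region": "TERM:0002"},
}


def translate(record, source, target):
    """Re-key a record from the source agency's fields to the target's.

    Fields are matched through the shared term ID. Fields with no shared
    term, or with no counterpart in the target cart, are returned separately
    so that nothing is dropped silently.
    """
    to_term = CARTS[source]
    from_term = {term: field for field, term in CARTS[target].items()}
    translated, unmapped = {}, []
    for field, value in record.items():
        term = to_term.get(field)
        if term in from_term:
            translated[from_term[term]] = value
        else:
            unmapped.append(field)
    return translated, unmapped
```

Because both carts point at the same term IDs, adding a third jurisdiction only requires one new cart rather than a pairwise mapping to every existing system.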
Page 133
123
AUTHOR INDEX
A
Abrams, Zachary · 59
Abul-Husn, Noura S. · 107
Adam, J. · 122
Adams, Micah · 54
Aevermann, Brian · 37
Afrasiabi, Cyrus · 115
Agarwal, Vibhu · 17
Akbarian, Schahram · 72
Aldrich, Melinda C. · 20, 35
Alkan, Can · 77, 87
Alser, Mohammed · 77
Altman, Russ B. · 79, 90
Andreoletti, Gaia · 101
Andres-Terrè, Marta · 13
Ansel, Mark · 80
Anwar, Ash · 114
Armaselu, Bogdan · 18
Arunachalam, Harish Babu · 18
Ashley, Euan · 104
Aslam, Naureen · 68
Asmann, Yan W. · 85
Ayati, Marzieh · 67
B
Bader, Gary D. · 110
Baheti, Saurabh · 108
Bai, Yongsheng · 68
Bakken, Trygve · 37
Baskar, Reema · 117
Bauer, Christopher R. · 27
Beaulieu-Jones, Brett K. · 19
Bebek, Gurkan · 52
Beck, Andrew · 50
Beck, Mette · 28
Beiko, R. · 122
Bellovich, Keith · 70
Bendall, Sean · 117
Berens, Michael · 31
Berry, Gerald J. · 90
Bertelli, C. · 122
Best, Aaron A. · 2
Bhat, Zeenat · 70
Bichko, Dmitri · 76
Biernacka, Joanna M. · 85
Biggin, Mark D. · 64
Boespflug, Mathieu · 76
Boley, Nathan · 98
Bongen, Erika · 13
Borchers, Christoph H. · 114
Borecki, Ingrid · 34
Borrayo, Ernesto · 63
Bowden, Donald W. · 45
Bowerman, Nathan · 2
Breitenstein, Matthew K. · 96
Breitwieser, Gerda · 34
Brenner, Steven E. · 101
Brinkman, Benjamin H. · 97
Brinkman, F. · 122
Bristow, F. · 122
Bromberg, Yana · 69
Brosius, Frank C. · 70
Brown, Andrew J. Leigh · 83
Brubaker, Douglas · 52
Brunak, Soren · 28
Burns, Tyler J. · 118
Bustillo, Juan R. · 93
C
Cai, Guoshuai · 73
Calhoun, Vince D. · 9, 93
Cao, Mengfei · 3
Carey, David J. · 107
Carr, Steven · 109
Carriço, J.A. · 122
Carter, Lester G. · 106
Cederberg, Kevin · 18
Chan, Yu-Feng Yvonne · 23
Chance, Mark · 67
Chang, Rui · 11
Chasioti, Danai · 71
Chaudhary, Kumardeep · 74
Chen, Rong · 56
Chen, Yii-Der I. · 45
Cheung, Philip · 84
Cheville, John · 108
Chew, Guo-Liang · 64
Choi, Yoon-La · 111
Christiansen, Lena · 37
Clay, Alyssa I. · 96
Clemons, Paul A. · 31
Cline, Melissa · 15
Cohain, Ariella · 11
Cordero, Pablo · 38
Correa, Adolofo · 45
Costello, James C. · 60
Courtot, M. · 122
Cowen, Lenore J. · 3
Crawford, Dana C. · 20
Cullis, Pieter · 114
D
Daescu, Ovidiu · 18
Danaee, Padideh · 44
Darrow, Bruce · 22
Page 134
124
Davila, Jaime · 108
Davis-Dusenbery, Brandi · 14
de Belle, J. Steven · 84
De, Subhajyoti · 88
Deisseroth, Cole A. · 13
DeJongh, Matthew · 2
Denny, Joshua · 35
de Vries, Edsko · 76
Dewey, Frederick E. · 34, 107
Dhruv, Harshil · 31
Diaz, Diana · 51
Diez-Fuertes, Francisco · 37
Dincer, Aslihan · 72
Disselkoen, Craig · 54
Divaraniya, Aparna A. · 11
Dominguez, Facundo · 76
Domselaar, G. Van · 122
Donato, Michele · 51
Dooley, D. · 122
Dougherty, Greg · 105
Draghici, Sorin · 51
Dudley, Joel T. · 11, 22, 72
Dunnenberger, H.M. · 106
Durmaz, Arda · 52
E
Eckel-Passow, Jeanette · 105
Egawa, Fumiko · 33
Empey, P.E. · 106
Ergin, Oguz · 77
Ertekin-Taner, Nilüfer · 85
Eskin, Eleazar · 80
F
Fantl, Wendy J. · 78
Farber-Eger, Eric · 20
Farrow, Emily G. · 102, 113
Fienberg, Harris · 117
Fink, Cris G. · 97
Fink, Tobias · 24, 99
Fogarty, Zach · 108
Foo, Chuan Sheng · 98
Fornage, Myriam · 45
Franks, Jennifer M. · 73
Frase, A.T. · 106
Fraser, Robert · 114
Fread, Kristin I. · 39
Freedman, Barry I. · 45
Freimuth, R.R. · 106
Freimuth, Robert · 105
G
Gadegbeku, Crystal · 70
Gaedigk, A. · 106
Gaedigk, Andrea · 81, 102, 113
Gallion, Jonathan · 29, 103
Gao, Chen · 41
Garmire, Lana · 74
Gavin, Davin · 72
Gelijns, Annetine · 22
Genes, Nicholas · 23
Ghaeini, Reza · 44
Ghose, Saugata · 87
Gipson, Debbie · 70
Giron, Emily · 84
Glicksberg, Benjamin · 56
Gliske, Stephen V. · 97
Goldfeder, Rachel · 104
Gordon, A. · 106
Gosh, Debashis · 88
Graham, M. · 122
Gray, Daniel H. · 78
Greenside, Peyton · 98
Griffiths, E. · 122
Groop, Leif · 28
Guney, Emre · 12
Guo, An Chi · 114
H
Haidar, C. · 106
Hart, Steven · 105
Hassan, Hasan · 77
Hawkins, Jennifer · 70
Haynes, Winston A. · 13
He, Dan · 30
He, Shuyao · 55
Hellwege, Jacklyn N. · 45
Henderson, Tim A.D. · 52
Hendrix, David · 44
Hershman, Steven G. · 23
Herzog, Julia · 70
Hicks, J.K. · 106
Hodge, Rebecca · 37
Hoff, Fieke W. · 57
Hoffman, J.M. · 106
Hollister, Brittany M. · 20
Hong, Na · 75
Horton, Iain · 105
Horton, Terzah M. · 57
Hoskins, Roger A. · 101
Hsiao, W. · 122
Hu, Chenyue W. · 57
Huang, Austin · 76
Huang, Kun · 7, 59
Hui, Shirley · 110
I
Iakoucheva, Lilia M. · 82
Imoto, Seiya · 91
Ingram, Wendy Marie · 119
Page 135
125
Israeli, Johnny · 98
Isserlin, Ruth · 110
Ivkovic, Sinisa · 14
J
Jebakaran, Jebakumar · 22
Jiang, Guoqian · 75
Johannessen, Solveig · 114
Johnson, Kipp W. · 22
Johnson, Travis · 59
Ju, Wenjun · 70
K
Kabat, Halla · 53
Kaddurah-Daouk, Rima F. · 96
Kaka, Hussam · 110
Kamp, Thomas · 54
Kandamurugu, Manickam · 107
Kanigel Winner, Kimberly R. · 60
Karakurt, Gunnur · 48
Kasarskis, Andrew · 11, 22
Kashef-Haghighi, Dorna · 33
Kaushik, Gaurav · 14
Keaton, Jacob M. · 45
Kechris, Katerina · 86
Keddy, A. · 122
Khatri, Purvesh · 13, 46
Kiefer, Jeff · 31
Kim, Jeremie · 77, 87
Kim, Juho · 61
Kim, Junghi · 41
Kim, Nayoung K.D. · 111
Kim, Seungchan · 31
Klein, T.E. · 106
Knox, Craig · 114
Ko, Melissa E. · 78
Kornblau, Steven M. · 57
Kovatch, Patricia · 22
Koyutürk, Mehmet · 48, 67
Kretzler, Matthias · 70
Krishnamurthy, Sarathbabu · 34, 107
Krishnan, Michelle L. · 42
Kuan, Pei Fen · 55
Kuncheva, Zhana · 42
Kundaje, Anshul · 98
Kural, Deniz · 14
L
Lanchantin, Jack · 21
Larson, Melissa · 108
Larson, Nicholas B. · 108
Lasken, Roger S. · 37
Lau, Katy L. · 97
Lavage, Daniel R. · 27, 34
Leader, Joseph B. · 27, 34, 107
Leavey, Patrick · 18
Ledbetter, David H. · 107
Lee, Donghyuk · 77
Lee, Inhan · 53
Lee, M.T. · 106
Lein, Ed · 37
Lelong, Sebastien · 115
Li, Jingyi Jessica · 64
Li, Lang · 71
Li, Li · 22
Li, Matthew D. · 13
Li, Shuyu · 56
Lichtarge, Olivier · 25, 29, 103
Lin, Chih-Hsu · 25
Lin, Dongdong · 93
Lin, Yaxiong · 105
Lincoln, Stephen E. · 15
Liu, Charles · 13
Liu, Jingyu · 93
Liu, Keli · 50
Liu, Larry Y. · 48
Liu, Tao · 109
Lofgren, Shane · 13
Lopez, Alexander · 34
Lu, Liangqun · 74
Lua, Rhonald C. · 25
Lucas, Anastasia M. · 34
Luedtke, Alexander · 50
M
Ma, Meng · 56
Machida-Hirano, Ryoko · 63
Mahendra, Divya · 31
Mahlich, Yannick · 69
Mahoney, J. Matthew · 27
Mallory, Emily K. · 79
Mandric, Igor · 80
Mangul, Serghei · 80
Marcu, Ana · 114
Marko, Nicholas F. · 119
Marsit, Carmen J. · 32, 112
Martinez, Maria · 18
Massengill, Susan · 70
Matthews, T. · 122
Matveeva, Olga V. · 94
McCorrison, Jamison · 37
McDermott, Jason E. · 109
McDonnell, Shannon K. · 85, 105, 108
McEachin, Richard C. · 70
Mead, David · 105
Mehta, Sanket · 57
Mertins, Philipp · 109
Metpally, Raghu P.R. · 34, 107
Miller, Jeremy · 37
Miller, Neil · 81, 102, 106, 113
Page 136
126
Miotto, Riccardo · 22
Mishra, Rashika · 18
Misra, Debdipto · 119
Miyano, Satoru · 91
Mohan, Rahul · 98
Montana, Giovanni · 42
Montoya, Dennis · 80
Mooney, Sean D. · 82, 106, 115
Moore, Jason H. · 19
Moskovitz, Alan · 22
Mosmann, Tim R. · 120
Moult, John · 101
Murray, Michael F. · 107
Mutlu, Onur · 77, 87
Myers, Mark · 105
N
Nair, Asha A. · 108
Nair, K. Sreekumaran · 96
Narla, Goutham · 67
Nazipova, Nafisa N. · 94
Ng, Maggie C.Y. · 45
Nguyen, Tin · 51
Nho, Kwangsik · 8
Ni'Suilleabhain, Molly · 18
Ning, Xia · 71
Nolan, Garry P. · 39, 78, 117, 118
Non, Amy · 20
Novotny, Mark · 37
O
O'Connell, Chloe · 33
O’Brien, Daniel · 108
Ogurtsov, Aleksey Y. · 94
Osafo, Nana · 89
Otolorin, Abiodun · 89
Overton, John · 34
P
Pai, Shraddha · 110
Palmer, Nicholette D. · 45
Pan, Wei · 41
Pandey, Gaurav · 47
Pankow, James S. · 45
Parida, Laxmi · 30
Park, Peter J. · 111
Park, Woong-Yang · 111
Paten, Benedict · 15
Payne, Samuel · 109
Pejaver, Vikas · 82
Pen, Jian · 65
Pendergrass, Sarah A. · 27, 34
Peng, Jian · 4, 61
Penn, John · 34
Pennathur, Subramaniam · 70
Perrone-Bizzozero, Nora · 93
Person, T.N. · 106
Perumal, Kalyani · 70
Peterson, Josh · 35, 106
Petkau, A. · 122
Petyuk, Vladislav · 109
Pinney, Sean · 22
Playter, Christopher S. · 78
Plevritis, Sylvia K. · 78
Poirion, Olivier · 74
Pond, Sergei · 83
Probert, Chris · 98
Prodduturi, Naresh · 75
Pyc, Mary A. · 84
Q
Qi, Yanjun · 21
Qu, Meng · 4, 65
Quataert, Sally A. · 120
Qutub, Amina A. · 57
R
Radcliffe, Richard · 86
Rademakers, Rosa · 85
Radivojac, Predrag · 82
Rakheja, Dinesh · 18
Rasmussen-Torvik, Laura J. · 45
Ré, Christopher · 79, 90
Rebhahn, Jonathan A. · 120
Reddy, Joseph S. · 85
Reed, Gay · 105
Reich, David L. · 22
Reid, Jeffrey · 34
Relling, M.V. · 106
Ren, Yingxue · 85
Restrepo, Nicole A. · 20
Rich, Stephen S. · 45
Ricks, Doran · 22
Risacher, Shannon L. · 8
Riska, Shaun · 108
Ritchie, Marylyn D. · 34, 106, 119
Roden, Dan · 35
Rodland, Karin · 109
Rogers, Linda · 23
Ross, Jason · 105
Ross, Owen A. · 85
Rossetti, Maura · 80
Rotman, Jeremy · 80
Rotter, Jerome I. · 45
Röttger, Richard · 5
Rubin, Daniel L. · 90
Rudra, Pratyaydipta · 86
Russell, Nate · 61
Russell, Pamela · 86
Page 137
127
S
Saba, Laura · 86
Salman, Ali · 68
Samuels, David · 35
Samusik, Nikolay · 118
Sander, Thomas · 24, 99
Sangkuhl, K. · 106
Sarangi, Vivekananda · 85
Saykin, Andrew J. · 8
Scarpa, Joseph R. · 11
Schadt, Eric E. · 11, 23, 56, 72
Schaid, Daniel · 108
Scherbina, Anna · 98
Scheuermann, Richard H. · 37
Schlatzer, Daniela · 67
Schork, Nicholas · 37
Schreiber, Stuart L. · 31
Schriml, L.M. · 122
Schultz, André · 57
Scott, Erick R. · 23
Scott, Madeleine · 46
Scott, S.A. · 106
Sengupta, Anita · 18
Sengupta, Partho P. · 22
Senol, Damla · 77, 87
Shabalina, Svetlana A. · 94
Shah, Nigam H. · 17
Shameer, Khader · 22
Sharma, Gaurav · 120
Shen, Li · 8, 71
Shi, Wen · 86
Shifman, Sagiv · 80
Shin, Hyun-Tae · 111
Shrikumar, Avanti · 98
Shuldiner, Alan R. · 107
Simonovic, Janko · 14
Singh, Ritambhara · 21
Sinnwell, Jason P. · 85
Smelser, Diane · 107
Smith, Kyle · 88
Smith, Richard · 109
Snyder, John · 27
Snyder, Michael · 90
Soden, Sarah · 102, 113
Song, Junyan · 55
Southerland, William · 89
Speyer, Gil · 31
Spreafico, Roberto · 80
Stacey, William C. · 97
Stai, Tony · 105
Stanescu, Ana · 47
Statz, Benjamin · 80
Steemers, Frank · 37
Strauli, Nicolas · 80
Strickland, William D. · 39
Stuart, Joshua M. · 38
Su, Andrew I. · 115
Su, Hai · 7
Swank, Julie · 105
Sweeney, Timothy E. · 13
T
Taboada, E. · 122
Tam, Andrew · 13
Taroni, Jaclyn N. · 73
Tatonetti, Nicholas P. · 22
Taylor, Kent D. · 45
Teh, Charis · 78
Thibodeau, Stephen N. · 108
Thompson, Jeffrey A. · 32, 112
Tignor, Nicole · 23
Tijanic, Nebojsa · 14
Tintle, Nathan · 2, 50, 54
Tomczak, Aurelie · 13
Tran, Danny N. · 37
Tran, Hai J. · 31
Tsueng, Ginger · 115
Tully, Tim · 84
Tunkle, Leo · 53
Twist, Greyson P. · 81, 102, 106, 113
V
Vallania, Francesco · 13, 46
Van Der Wey, Will · 80
Van Houten, Jacob · 35
Venepally, Pratap · 37
Venkataraman, Guhan Ram · 33
Verma, A. · 106
Verma, Shefali S. · 34
Vestal, Brian · 86
Volety, Rama · 105
von Korff, Modest · 24, 99
W
Wagenknecht, Lynne E. · 45
Wall, Dennis Paul · 33
Wang, Beilun · 21
Wang, Changchang · 56
Wang, Chao · 7
Wang, Chen · 75
Wang, Liewei · 96
Wang, Pei · 23
Wang, Sheng · 4, 65
Wang, Yu-Ping · 9
Weaver, Steven · 83
Weinshilboum, Richard M. · 96
Wertheim, Joel · 83
Westergaard, David · 28
Whaley, R.M. · 106
Whirl-Carrillo, M. · 106
Whitfield, Michael L. · 73
Whiting, Kathleen · 48
Wiepert, Mathieu · 105
Page 138
128
Wiggins, Roger · 70
Wiley, Laura · 35
Wilkins, Angela D. · 25, 29, 103
Williams, M.S. · 106
Wilson, James G. · 45
Wilson, Michael · 114
Wilson, Stephen J. · 25
Wiredja, Danica · 67
Wishart, David S. · 114
Wiwie, Christian · 5
Woon, M. · 106
Worrell, Greg A. · 97
Wu, Chunlei · 106, 115
X
Xin, Hongyi · 77
Xin, Jiwen · 115
Y
Yahi, Alexandre · 22
Yamaguchi, Rui · 91
Yan, Jingwen · 8
Yang, Harry Taegyun · 80
Yang, Lin · 7
Yang, Shan · 15
Yang, W. · 106
Yao, Xiaohui · 71
Yoo, Byunggil · 81
Younkin, Steve G. · 85
Yu, Kun-Hsing · 90
Yun, Jae Won · 111
Z
Zaitlen, Noah · 80
Zelikovsky, Alex · 80
Zhang, Bin · 72
Zhang, Can · 15
Zhang, Fan · 37
Zhang, Pengyue · 71
Zhang, Yan · 59
Zhang, Yao-zhong · 91
Zhu, Chengsheng · 69
Zhu, Jun · 11
Zhu, Kuixi · 11
Ziemek, Daniel · 76
Zille, Pascal · 9
Zunder, Eli R. · 39, 78
Zweig, Micol · 23