PACIFIC SYMPOSIUM ON BIOCOMPUTING 2020
ABSTRACT BOOK
Poster Presenters: Poster space is assigned by abstract page number. Please find the page that your abstract is on and put your poster on the poster board with the corresponding number (e.g., if your abstract is on page 50, put your poster on board #50).
Proceedings papers with oral presentations #2-39 are not assigned poster space.
Abstracts are organized first by session, then the last name of the first author. Presenting authors' names are underlined in the Table of Contents and in bold text on the abstracts.
PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS

ARTIFICIAL INTELLIGENCE FOR ENHANCING CLINICAL MEDICINE ..... 1
PREDICTING LONGITUDINAL OUTCOMES OF ALZHEIMER'S DISEASE VIA A TENSOR-BASED JOINT CLASSIFICATION AND REGRESSION MODEL ..... 2
Lodewijk Brand, Kai Nichols, Hua Wang, Heng Huang, Li Shen, for the ADNI
ROBUSTLY EXTRACTING MEDICAL KNOWLEDGE FROM EHRS: A CASE STUDY OF LEARNING A HEALTH KNOWLEDGE GRAPH ..... 3
Irene Y. Chen, Monica Agrawal, Steven Horng, David Sontag
INCREASING CLINICAL TRIAL ACCRUAL VIA AUTOMATED MATCHING OF BIOMARKER CRITERIA ..... 4
Jessica W. Chen, Christian A. Kunder, Nam Bui, James L. Zehnder, Helio A. Costa, Henning Stehr
ADDRESSING THE CREDIT ASSIGNMENT PROBLEM IN TREATMENT OUTCOME PREDICTION USING TEMPORAL DIFFERENCE LEARNING ..... 5
Sahar Harati, Andrea Crowell, Helen Mayberg, Shamim Nemati
FROM GENOME TO PHENOME: PREDICTING MULTIPLE CANCER PHENOTYPES BASED ON SOMATIC GENOMIC ALTERATIONS VIA THE GENOMIC IMPACT TRANSFORMER ..... 6
Yifeng Tao, Chunhui Cai, William W. Cohen, Xinghua Lu
AUTOMATED PHENOTYPING OF PATIENTS WITH NON-ALCOHOLIC FATTY LIVER DISEASE REVEALS CLINICALLY RELEVANT DISEASE SUBTYPES ..... 7
Maxence Vandromme, Tomi Jun, Ponni Perumalswami, Joel T. Dudley, Andrea Branch, Li Li
MONITORING ICU MORTALITY RISK WITH A LONG SHORT-TERM MEMORY RECURRENT NEURAL NETWORK ..... 8
Ke Yu, Mingda Zhang, Tianyi Cui, Milos Hauskrecht

INTRINSICALLY DISORDERED PROTEINS (IDPS) AND THEIR FUNCTIONS ..... 9
DISORDERED FUNCTION CONJUNCTION: ON THE IN-SILICO FUNCTION ANNOTATION OF INTRINSICALLY DISORDERED REGIONS ..... 10
Sina Ghadermarzi, Akila Katuwawala, Christopher J. Oldfield, Amita Barik, Lukasz Kurgan
DE NOVO ENSEMBLE MODELING SUGGESTS THAT AP2-BINDING TO DISORDERED REGIONS CAN INCREASE STERIC VOLUME OF EPSIN BUT NOT EPS15 ..... 11
N. Suhas Jagannathan, Christopher W. V. Hogue, Lisa Tucker-Kellogg
MODULATION OF P53 TRANSACTIVATION DOMAIN CONFORMATIONS BY LIGAND BINDING AND CANCER-ASSOCIATED MUTATIONS ..... 12
Xiaorong Liu, Jianhan Chen
EXPLORING RELATIONSHIPS BETWEEN THE DENSITY OF CHARGED TRACTS WITHIN DISORDERED REGIONS AND PHASE SEPARATION ..... 13
Ramiz Somjee, Diana M. Mitrea, Richard W. Kriwacki

MUTATIONAL SIGNATURES ..... 14
PHYSIGS: PHYLOGENETIC INFERENCE OF MUTATIONAL SIGNATURE DYNAMICS ..... 15
Sarah Christensen, Mark D. M. Leiserson, Mohammed El-Kebir
TRACKSIGFREQ: SUBCLONAL RECONSTRUCTIONS BASED ON MUTATION SIGNATURES AND ALLELE FREQUENCIES ..... 16
Caitlin F. Harrigan, Yulia Rubanova, Quaid Morris, Alina Selega
DNA REPAIR FOOTPRINT UNCOVERS CONTRIBUTION OF DNA REPAIR MECHANISM TO MUTATIONAL SIGNATURES ..... 17
Damian Wojtowicz, Mark D. M. Leiserson, Roded Sharan, Teresa M. Przytycka

PATTERN RECOGNITION IN BIOMEDICAL DATA: CHALLENGES IN PUTTING BIG DATA TO WORK ..... 18
CLINICAL CONCEPT EMBEDDINGS LEARNED FROM MASSIVE SOURCES OF MULTIMODAL MEDICAL DATA ..... 19
Andrew L. Beam, Benjamin Kompa, Allen Schmaltz, Inbar Fried, Griffin Weber, Nathan Palmer, Xu Shi, Tianxi Cai, Isaac S. Kohane
ASSESSMENT OF IMPUTATION METHODS FOR MISSING GENE EXPRESSION DATA IN META-ANALYSIS OF DISTINCT COHORTS OF TUBERCULOSIS PATIENTS ..... 20
Carly A. Bobak, Lauren McDonnell, Matthew D. Nemesure, Justin Lin, Jane E. Hill
TOWARDS IDENTIFYING DRUG SIDE EFFECTS FROM SOCIAL MEDIA USING ACTIVE LEARNING AND CROWD SOURCING ..... 21
Sophie Burkhardt, Julia Siekiera, Josua Glodde, Miguel A. Andrade-Navarro, Stefan Kramer
MICROVASCULAR DYNAMICS FROM 4D MICROSCOPY USING TEMPORAL SEGMENTATION ..... 22
Shir Gur, Lior Wolf, Lior Golgher, Pablo Blinder
USING TRANSCRIPTIONAL SIGNATURES TO FIND CANCER DRIVERS WITH LURE ..... 23
David Haan, Ruikang Tao, Verena Friedl, Ioannis N. Anastopoulos, Christopher K. Wong, Alana S. Weinstein, Joshua M. Stuart
PAGE-NET: INTERPRETABLE AND INTEGRATIVE DEEP LEARNING FOR SURVIVAL ANALYSIS USING HISTOPATHOLOGICAL IMAGES AND GENOMIC DATA ..... 24
Jie Hao, Sai Chandra Kosaraju, Nelson Zange Tsaku, Dae Hyun Song, Mingon Kang
MACHINE LEARNING ALGORITHMS FOR SIMULTANEOUS SUPERVISED DETECTION OF PEAKS IN MULTIPLE SAMPLES AND CELL TYPES ..... 25
Toby Dylan Hocking, Guillaume Bourque
GRAPH-BASED INFORMATION DIFFUSION METHOD FOR PRIORITIZING FUNCTIONALLY RELATED GENES IN PROTEIN-PROTEIN INTERACTION NETWORKS ..... 26
Minh Pham, Olivier Lichtarge
A LITERATURE-BASED KNOWLEDGE GRAPH EMBEDDING METHOD FOR IDENTIFYING DRUG REPURPOSING OPPORTUNITIES IN RARE DISEASES ..... 27
Daniel N. Sosa, Alexander Derry, Margaret Guo, Eric Wei, Connor Brinton, Russ B. Altman
TWO-STAGE ML CLASSIFIER FOR IDENTIFYING HOST PROTEIN TARGETS OF THE DENGUE PROTEASE ..... 28
Jacob T. Stanley, Alison R. Gilchrist, Alex C. Stabell, Mary A. Allen, Sara L. Sawyer, Robin D. Dowell
ENHANCING MODEL INTERPRETABILITY AND ACCURACY FOR DISEASE PROGRESSION PREDICTION VIA PHENOTYPE-BASED PATIENT SIMILARITY LEARNING ..... 29
Yue Wang, Tong Wu, Yunlong Wang, Gao Wang

PRECISION MEDICINE: ADDRESSING THE CHALLENGES OF SHARING, ANALYSIS, AND PRIVACY AT SCALE ..... 30
INTEGRATED CANCER SUBTYPING USING HETEROGENEOUS GENOME-SCALE MOLECULAR DATASETS ..... 31
Suzan Arslanturk, Sorin Draghici, Tin Nguyen
ASSESSMENT OF COVERAGE FOR ENDOGENOUS METABOLITES AND EXOGENOUS CHEMICAL COMPOUNDS USING AN UNTARGETED METABOLOMICS PLATFORM ..... 32
Sek Won Kong, Carles Hernandez-Ferrer
COVERAGE PROFILE CORRECTION OF SHALLOW-DEPTH CIRCULATING CELL-FREE DNA SEQUENCING VIA MULTI-DISTANCE LEARNING ..... 33
Nicholas B. Larson, Melissa C. Larson, Jie Na, Carlos P. Sosa, Chen Wang, Jean-Pierre Kocher, Ross Rowsey
PGXMINE: TEXT MINING FOR CURATION OF PHARMGKB ..... 34
Jake Lever, Julia M. Barbarino, Li Gong, Rachel Huddart, Katrin Sangkuhl, Ryan Whaley, Michelle Whirl-Carrillo, Mark Woon, Teri E. Klein, Russ B. Altman
THE POWER OF DYNAMIC SOCIAL NETWORKS TO PREDICT INDIVIDUALS' MENTAL HEALTH ..... 35
Shikang Liu, David Hachen, Omar Lizardo, Christian Poellabauer, Aaron Striegel, Tijana Milenkovic
IMPLEMENTING A CLOUD BASED METHOD FOR PROTECTED CLINICAL TRIAL DATA SHARING ..... 36
Gaurav Luthria, Qingbo Wang
PATHWAY AND NETWORK EMBEDDING METHODS FOR PRIORITIZING PSYCHIATRIC DRUGS ..... 37
Yash Pershad, Margaret Guo, Russ B. Altman
ROBUST-ODAL: LEARNING FROM HETEROGENEOUS HEALTH SYSTEMS WITHOUT SHARING PATIENT-LEVEL DATA ..... 38
Jiayi Tong, Rui Duan, Ruowang Li, Martijn J. Scheuemie, Jason H. Moore, Yong Chen
COMPUTATIONALLY EFFICIENT, EXACT, COVARIATE-ADJUSTED GENETIC PRINCIPAL COMPONENT ANALYSIS BY LEVERAGING INDIVIDUAL MARKER SUMMARY STATISTICS FROM LARGE BIOBANKS ..... 39
Jack Wolf, Martha Barnard, Xueting Xia, Nathan Ryder, Jason Westra, Nathan Tintle

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

ARTIFICIAL INTELLIGENCE FOR ENHANCING CLINICAL MEDICINE ..... 40
MULTICLASS DISEASE CLASSIFICATION FROM MICROBIAL WHOLE-COMMUNITY METAGENOMES ..... 41
Saad Khan, Libusha Kelly
LITGEN: GENETIC LITERATURE RECOMMENDATION GUIDED BY HUMAN EXPLANATIONS ..... 42
Allen Nie, Arturo L. Pineda, Matt W. Wright, Hannah Wand, Bryan Wulf, Helio A. Costa, Ronak Y. Patel, Carlos D. Bustamante, James Zou
MULTILEVEL SELF-ATTENTION MODEL AND ITS USE ON MEDICAL RISK PREDICTION ..... 43
Xianlong Zeng, Yunyi Feng, Soheil Moosavinasab, Deborah Lin, Simon Lin, Chang Liu
IDENTIFYING TRANSITIONAL HIGH COST USERS FROM UNSTRUCTURED PATIENT PROFILES WRITTEN BY PRIMARY CARE PHYSICIANS ..... 44
Haoran Zhang, Elisa Candido, Andrew S. Wilton, Raquel Duchen, Liisa Jaakkimainen, Walter Wodchis, Quaid Morris
OBTAINING DUAL-ENERGY COMPUTED TOMOGRAPHY (CT) INFORMATION FROM A SINGLE-ENERGY CT IMAGE FOR QUANTITATIVE IMAGING ANALYSIS OF LIVING SUBJECTS BY USING DEEP LEARNING ..... 45
Wei Zhao, Tianling Lv, Rena Lee, Yang Chen, Lei Xing

INTRINSICALLY DISORDERED PROTEINS (IDPS) AND THEIR FUNCTIONS ..... 46
MANY-TO-ONE BINDING BY INTRINSICALLY DISORDERED PROTEIN REGIONS ..... 47
Wei-Lun Alterovitz, Eshel Faraggi, Christopher J. Oldfield, Jingwei Meng, Bin Xue, Fei Huang, Pedro Romero, Andrzej Kloczkowski, Vladimir N. Uversky, A. Keith Dunker

MUTATIONAL SIGNATURES ..... 48
IMPACT OF MUTATIONAL SIGNATURES ON MICRORNA AND THEIR RESPONSE ELEMENTS ..... 49
Eirini Stamoulakatou, Pietro Pinoli, Stefano Ceri, Rosario Piro
GENOME GERRYMANDERING: OPTIMAL DIVISON OF THE GENOME INTO REGIONS WITH CANCER TYPE SPECIFIC DIFFERENCES IN MUTATION RATES ..... 50
Adamo Young, Jacob Chmura, Yoonsik Park, Quaid Morris, Gurnit Atwal

PATTERN RECOGNITION IN BIOMEDICAL DATA: CHALLENGES IN PUTTING BIG DATA TO WORK ..... 51
LEARNING A LATENT SPACE OF HIGHLY MULTIDIMENSIONAL CANCER DATA ..... 52
Benjamin Kompa, Beau Coker
SCALING STRUCTURAL LEARNING WITH NO-BEARS TO INFER CAUSAL TRANSCRIPTOME NETWORKS ..... 53
Hao-Chih Lee, Matteo Danieletto, Riccardo Miotto, Sarah T. Cherng, Joel T. Dudley
PATHFLOWAI: A HIGH-THROUGHPUT WORKFLOW FOR PREPROCESSING, DEEP LEARNING AND INTERPRETATION IN DIGITAL PATHOLOGY ..... 54
Joshua J. Levy, Lucas A. Salas, Brock C. Christensen, Aravindhan Sriharan, Louis J. Vaickus
IMPROVING SURVIVAL PREDICTION USING A NOVEL FEATURE SELECTION AND FEATURE REDUCTION FRAMEWORK BASED ON THE INTEGRATION OF CLINICAL AND MOLECULAR DATA* ..... 55
Lisa Neums, Richard Meier, Devin C. Koestler, Jeffrey A. Thompson
BAYESIAN SEMI-NONNEGATIVE MATRIX TRI-FACTORIZATION TO IDENTIFY PATHWAYS ASSOCIATED WITH CANCER PHENOTYPES ..... 56
Sunho Park, Nabhonil Kar, Jae-Ho Cheong, Tae Hyun Hwang
TREE-WEIGHTING FOR MULTI-STUDY ENSEMBLE LEARNERS ..... 57
Maya Ramchandran, Prasad Patil, Giovanni Parmigiani
PTREXPLORER: AN APPROACH TO IDENTIFY AND EXPLORE POST TRANSCRIPTIONAL REGULATORY MECHANISMS USING PROTEOGENOMICS ..... 58
Arunima Srivastava, Michael Sharpnack, Kun Huang, Parag Mallick, Raghu Machiraju
NETWORK REPRESENTATION OF LARGE-SCALE HETEROGENEOUS RNA SEQUENCES WITH INTEGRATION OF DIVERSE MULTI-OMICS, INTERACTIONS, AND ANNOTATIONS DATA ..... 59
Nhat Tran, Jean Gao
HADOOP AND PYSPARK FOR REPRODUCIBILITY AND SCALABILITY OF GENOMIC SEQUENCING STUDIES ..... 60
Nicholas R. Wheeler, Penelope Benchek, Brian W. Kunkle, Kara L. Hamilton-Nelson, Mike Warfe, Jeremy R. Fondran, Jonathan L. Haines, William S. Bush
CERENKOV3: CLUSTERING AND MOLECULAR NETWORK-DERIVED FEATURES IMPROVE COMPUTATIONAL PREDICTION OF FUNCTIONAL NONCODING SNPS ..... 61
Yao Yao, Stephen A. Ramsey

PRECISION MEDICINE: ADDRESSING THE CHALLENGES OF SHARING, ANALYSIS, AND PRIVACY AT SCALE ..... 62
ANOMIGAN: GENERATIVE ADVERSARIAL NETWORKS FOR ANONYMIZING PRIVATE MEDICAL DATA ..... 63
Ho Bae, Dahuin Jung, Hyun-Soo Choi, Sungroh Yoon
FREQUENCY OF CLINVAR PATHOGENIC VARIANTS IN CHRONIC KIDNEY DISEASE PATIENTS SURVEYED FOR RETURN OF RESEARCH RESULTS AT A CLEVELAND PUBLIC HOSPITAL ..... 64
Dana C. Crawford, John Lin, Jessica N. Cooke Bailey, Tyler Kinzy, John R. Sedor, John F. O'Toole, William S. Bush
NETWORK-BASED MATCHING OF PATIENTS AND TARGETED THERAPIES FOR PRECISION ONCOLOGY ..... 65
Qingzhi Liu, Min Jin Ha, Rupam Bhattacharyya, Lana Garmire, Veerabhadran Baladandayuthapani
PHENOME-WIDE ASSOCIATION STUDIES ON CARDIOVASCULAR HEALTH AND FATTY ACIDS CONSIDERING PHENOTYPE QUALITY CONTROL PRACTICES FOR EPIDEMIOLOGICAL DATA ..... 66
Kristin Passero, Xi He, Jiayan Zhou, Bertram Mueller-Myhsok, Marcus E. Kleber, Winfried Maerz, Molly A. Hall
ATEMPO: PATHWAY-SPECIFIC TEMPORAL ANOMALIES FOR PRECISION THERAPEUTICS ..... 67
Christopher Michael Pietras, Liam Power, Donna K. Slonim
FEATURE SELECTION AND DIMENSION REDUCTION OF SOCIAL AUTISM DATA ..... 68
Peter Washington, Kelley Marie Paskov, Haik Kalantarian, Nathaniel Stockham, Catalin Voss, Aaron Kline, Ritik Patnaik, Brianna Chrisman, Maya Varma, Qandeel Tariq, Kaitlyn Dunlap, Jessey Schwartz, Nick Haber, Dennis P. Wall

POSTER PRESENTATIONS

ARTIFICIAL INTELLIGENCE FOR ENHANCING CLINICAL MEDICINE ..... 69
PRIORITIZING COPY NUMBER VARIANTS USING PHENOTYPE AND GENE FUNCTIONAL SIMILARITY ..... 70
Azza Althagafi, Jun Chen, Robert Hoehndorf
INFERRING THE REWARD FUNCTIONS THAT GUIDE CANCER PROGRESSION ..... 71
John Kalantari, Heidi Nelson, Nicholas Chia
PREDICTING DISEASE-ASSOCIATED MUTATION OF METAL-BINDING SITES IN PROTEINS USING A DEEP LEARNING APPROACH ..... 72
Mohamad Koohi-Moghadam, Haibo Wang, Yuchuan Wang, Xinming Yang, Hongyan Li, Junwen Wang, Hongzhe Sun

GENERAL ..... 73
RANKING RAS PATHWAY MUTATIONS USING EVOLUTIONARY HISTORY OF MEK1 ..... 74
Katia Andrianova, Igor Jouline
INTEGRATIVE ANALYSIS OF COPD AND LUNG CANCER METADATA REVEALS SHARED ALTERATIONS IN IMMUNE RESPONSE, PTEN AND PI3K-AKT PATHWAYS ..... 75
Dannielle Skander, Arda Durmaz, Mohammed Orloff, Gurkan Bebek
INVESTIGATING SOURCES OF IRREPRODUCIBILITY IN ANALYSIS OF GENE EXPRESSION DATA ..... 76
Carly A. Bobak, Jane E. Hill
ETHEREUM AND MULTICHAIN BLOCKCHAINS AS SECURE TOOLS FOR INDIVIDUALIZED MEDICINE ..... 77
Charlotte Brannon, Gamze Gursoy, Sarah Wagner, Mark Gerstein
GENOMIC PREDICTORS OF L-ASPARAGINASE-INDUCED PANCREATITIS IN PEDIATRIC CANCER PATIENTS ..... 78
Britt Drogemoller, Galen E. B. Wright, Shahrad Rassekh, Shinya Ito, Bruce Carleton, Colin Ross, The Canadian Pharmacogenomics Network for Drug Safety Consortium
NITECAP: A NOVEL METHOD AND INTERFACE FOR THE IDENTIFICATION OF CIRCADIAN BEHAVIOR IN HIGHLY PARALLEL TIME-COURSE DATA ..... 79
Thomas G. Brooks, Cris W. Lawrence, Nicholas F. Lahens, Soumyashant Nayak, Dimitra Sarantopoulou, Garret A. FitzGerald, Gregory R. Grant
THE INTERPLAY OF OBESITY AND RACE/ETHNICITY ON MAJOR PERINATAL COMPLICATIONS ..... 80
Yaadira Brown, MPH; Olubode A. Olufajo, MD, MPH; Edward E. Cornwell III, MD; William Southerland, PhD
A COMPARISON OF PHARMACOGENOMIC INFORMATION IN FDA-APPROVED DRUG LABELS AND CPIC GUIDELINES ..... 81
Katherine I. Carrillo, Teri E. Klein
XTEA: A TRANSPOSABLE ELEMENT INSERTION ANALYZER FOR GENOME SEQUENCING DATA FROM MULTIPLE TECHNOLOGIES ..... 82
Chong Chu, Rebeca Monroy, Soohyun Lee, E. Alice Lee, Peter J. Park
GO GET DATA (GGD): SIMPLE, REPRODUCIBLE ACCESS TO SCIENTIFIC DATA ..... 83
Michael Cormier, Jon Belyeu, Brent Pedersen, Joe Brown, Johannes Koster, Aaron R. Quinlan
GLOBAL EPIGENOMIC REGULATION OF GENE EXPRESSION AND CELLULAR PROLIFERATION IN T-CELL LEUKEMIA ..... 84
Sinisa Dovat, Yali Ding, Bo Zhang, Jonathon L. Payne, Feng Yue
A PHARMACOGENOMIC INVESTIGATION OF THE CARDIAC SAFETY PROFILE OF ONDANSETRON IN CHILDREN AND IN PREGNANT WOMEN ..... 85
Galen E. B. Wright, Britt I. Drögemöller, Jessica Trueman, Kaitlyn Shaw, Michelle Staub, Shahnaz Chaudhry, Sholeh Ghayoori, Fudan Miao, Michelle Higginson, Gabriella S. S. Groeneweg, James Brown, Laura A. Magee, Simon D. Whyte, Nicholas West, Sonia Brodie, Geert ’t Jong, Howard Berger, Shinya Ito, Shahrad R. Rassekh, Shubhayan Sanatani, Colin J. D. Ross, Bruce C. Carleton
TREND: A PLATFORM FOR EXPLORING PROTEIN FUNCTION IN PROKARYOTES USING PHYLOGENETICS, DOMAIN ARCHITECTURES, AND GENE NEIGHBORHOODS INFORMATION ..... 86
Vadim M. Gumerov, Igor B. Zhulin
TRACKSIGFREQ: SUBCLONAL RECONSTRUCTIONS BASED ON MUTATION SIGNATURES AND ALLELE FREQUENCIES ..... 87
Caitlin F. Harrigan, Yulia Rubanova, Quaid Morris, Alina Selega
A FLEXIBLE PIPELINE FOR THE PREDICTION OF BIOMARKERS RELEVANT TO DRUG SENSITIVITY ..... 88
V. Keith Hughitt, Sayeh Gorjifard, Aleksandra M. Michalowski, John K. Simmons, Ryan Dale, Eric C. Polley, Jonathan J. Keats, Beverly A. Mock
CREATING A METABOLIC SYNDROME RESEARCH RESOURCE (METSRR) ..... 89
Willysha Jenkins, Christian Richardson, ClarLynda Williams-DeVane PhD
UTILIZING COHORT INFORMATION TO FIND CAUSATIVE VARIANTS ..... 90
Senay Kafkas, Robert Hoehndorf
INTEGRATED ANALYSIS OF JAK-STAT PATHWAY IN HOMEOSTASIS, SIMULATED INFLAMMATION AND TUMOUR ..... 91
Milica Krunic, Anzhelika Karjalainen, Mojoyinola Joanna Ola, Stephen Shoebridge, Sabine Macho-Maschler, Caroline Lassnig, Andrea Poelzl, Matthias Farlik, Nikolaus Fortelny, Christoph Bock, Birgit Strobl, Mathias Mueller
BEERS2: THE NEXT GENERATION OF RNA-SEQ SIMULATOR ..... 92
Nicholas F. Lahens, Thomas G. Brooks, Dimitra Sarantopoulou, Soumyashant Nayak, Cris Lawrence, Anand Srinivasan, Jonathan Schug, Garret A. FitzGerald, John B. Hogenesch, Yoseph Barash, Gregory R. Grant
EFFECT MODIFICATION BY AGE ON A DIAGNOSTIC THREE-GENE-SIGNATURE IN PATIENTS WITH ACTIVE TUBERCULOSIS ..... 93
Lauren McDonnell, Carly Bobak, Matthew Nemesure, Justin Lin, Jane Hill
CLASSIFICATION AND MUTATION PREDICTION FROM GASTROINTESTINAL CANCER HISTOPATHOLOGY IMAGES USING DEEP LEARNING ..... 94
Sung Hak Lee, Hyun-Jong Jang
MAPPING THE EMERGENCE AND MIGRATION OF HEMATOPOIETIC STEM CELLS AND PROGENITORS DURING HUMAN DEVELOPMENT AT SINGLE CELL RESOLUTION ..... 95
Feiyang Ma, Vincenzo Calvanese, Sandra Capellera-Garcia, Sophia Ekstrand, Matteo Pellegrini, Hanna K. A. Mikkola
LARGE-SCALE MACHINE LEARNING AND GRAPH ANALYTICS FOR FUNCTIONAL PREDICTION OF PATHOGEN PROTEINS ..... 96
Jason McDermott, Song Feng, William Nelson, Joon-Yong Lee, Sayan Ghosh, Ariful Khan, Mahantesh Halappanavar, Justine Nguyen, Jonathan Pruneda, David Baltrus, Joshua Adkins
GENE-SET ANALYSIS USING GWAS SUMMARY STATISTICS AND GTEX DATABASE ..... 97
Masahiro Nakatochi
TARGETING CANCER VIA SIGNALING PATHWAYS: A NOVEL APPROACH TO THE DISCOVERY OF GENE CCDC191'S DOUBLE-AGENT FUNCTION USING DIFFERENTIAL GENE EXPRESSION, HEAT MAP ANALYSES THROUGH AI DEEP LEARNING, AND MATHEMATICAL MODELING ..... 98
Annie Ostojic
RFEX: SIMPLE RANDOM FOREST MODEL AND SAMPLE EXPLAINER FOR NON-MACHINE LEARNING EXPERTS ..... 99
Dragutin Petkovic, Ali Alavi, DanDan Cai, Jizhou Yang, Sabiha Barlaskar
APPARENT BIAS TOWARD LONG GENE MISREGULATION IN MECP2 SYNDROMES DISAPPEARS AFTER CONTROLLING FOR BASELINE VARIATIONS ..... 100
Ayush T. Raman, Amy E. Pohodich, Ying-Wooi Wan, Hari Krishna Yalamanchili, William E. Lowry, Huda Y. Zoghbi, Zhandong Liu
PREDICTION OF CHRONOLOGICAL AND BIOLOGICAL AGE FROM LABORATORY DATA ..... 101
Luke Sagers, Luke Melas-Kyriazi, Chirag J. Patel, Arjun K. Manrai
WHOLE GENOME SEQUENCING ANALYSIS OF INFLUENZA C VIRUS IN KOREA ..... 102
Sooyeon Lim, Han Sol Lee, Ji Yun Noh, Joon Young Song, Hee Jin Cheong, Woo Joo Kim
MINING THE HUMUHUMUNUKUNUKUAPUAA AND THE SHAKA OF AUTISM WITH BIG DATA BIOMEDICAL DATA SCIENCE ..... 103
Peter Washington, Brianna Chrisman, Kaiti Dunlap, Aaron Kline, Arman Husic, Michael Ning, Kelley Paskov, Nate Stockham, Maya Varma, Emilie LeBlanc, Jack Kent, Yordan Penev, Min Woo Sun, Jae-Yoon Jung, Catalin Voss, Nick Haber, Dennis P. Wall
DEVELOPMENT OF A RECURRENCE PREDICTION MODEL FOR EARLY LUNG ADENOCARCINOMA USING RADIOMICS-BASED ARTIFICIAL INTELLIGENCE ..... 104
Hee Chul Yang, Gunseok Park, Ji Eun Oh
DRLPC: DIMENSION REDUCTION OF SEQUENCING DATA USING LOCAL PRINCIPAL COMPONENTS ..... 105
Yun Joo Yoo, Fatemeh Yavartanu, Shelley B. Bull
META-ANALYSIS IN EXHAUSTED T CELLS FROM HOMO SAPIENS AND MUS MUSCULUS PROVIDES NOVEL TARGETS FOR IMMUNOTHERAPY ..... 106
Lin Zhang, Yicheng Guo, Hafumi Nishi

INTRINSICALLY DISORDERED PROTEINS (IDPS) AND THEIR FUNCTIONS ..... 107
DISORDERED FUNCTION CONJUNCTION: ON THE IN-SILICO FUNCTION ANNOTATION OF INTRINSICALLY DISORDERED REGIONS ..... 108
Sina Ghadermarzi, Akila Katuwawala, Christopher J. Oldfield, Amita Barik, Lukasz Kurgan

MUTATIONAL SIGNATURES ..... 109
TRANSCRIPTION-ASSOCIATED REGIONAL MUTATION RATES AND SIGNATURES IN REGULATORY ELEMENTS ACROSS 2,500 WHOLE CANCER GENOMES ..... 110
Jüri Reimand
COMPLEX MOSAIC STRUCTURAL VARIATIONS IN HUMAN FETAL BRAINS ..... 111
Shobana Sekar, Livia Tomasini, Maria Kalyva, Taejeong Bae, Logan Manlove, Bo Zhou, Jessica Mariani, Fritz Sedlazeck, Alexander E. Urban, Christos Proukakis, Flora M. Vaccarino, Alexej Abyzov
PATTERN RECOGNITION IN BIOMEDICAL DATA: CHALLENGES IN PUTTING BIG DATA TO WORK ..... 112
STRATIFICATION OF KIDNEY TRANSPLANT RECIPIENTS BASED ON TEMPORAL DISEASE TRAJECTORIES ..... 113
Isabella Friis Jørgensen PhD, Søren Schwartz Sørensen PhD, Søren Brunak PhD
MODELING GENE EXPRESSION LEVELS FROM EPIGENETIC MARKERS USING A DYNAMICAL SYSTEMS APPROACH ..... 114
James Brunner, Jacob Kim, Kord M. Kober
TRANSLATING BIG DATA NEUROIMAGING FINDINGS INTO MEASUREMENTS OF INDIVIDUAL VULNERABILITY ..... 115
Peter Kochunov, Paul Thompson, Neda Jahanshad, Elliot Hong
AUTOMATING NEW-USER COHORT CONSTRUCTION WITH INDICATION EMBEDDINGS ..... 116
Rachel D. Melamed
REPRODUCIBILITY-OPTIMIZED STATISTICAL TESTING FOR OMICS STUDIES ..... 117
Tomi Suomi, Laura Elo
DATA INTEGRATION EXPECTATION MAPS: TOWARDS MORE INFORMED 'OMIC DATA INTEGRATION ..... 118
Tia Tate, Christian Richardson, ClarLynda Williams-DeVane

PRECISION MEDICINE: ADDRESSING THE CHALLENGES OF SHARING, ANALYSIS, AND PRIVACY AT SCALE ..... 119
INTEGRATED OMICS DATA MINING OF SYNERGISTIC GENE PAIRS FOR CANCER PRECISION MEDICINE ..... 120
Euna Jeong, Choa Park, Sukjoon Yoon
THE POWER OF DYNAMIC SOCIAL NETWORKS TO PREDICT INDIVIDUALS' MENTAL HEALTH ..... 121
Shikang Liu, David Hachen, Omar Lizardo, Christian Poellabauer, Aaron Striegel, Tijana Milenkovic
ROBUST-ODAL: LEARNING FROM HETEROGENEOUS HEALTH SYSTEMS WITHOUT SHARING PATIENT-LEVEL DATA ..... 122
Jiayi Tong, Rui Duan, Ruowang Li, Martijn J. Scheuemie, Jason H. Moore, Yong Chen
PHARMGKB: AUTOMATED LITERATURE ANNOTATIONS ..... 123
Michelle Whirl-Carrillo, Li Gong, Rachel Huddart, Katrin Sangkuhl, Ryan Whaley, Mark Woon, Julia Barbarino, Jake Lever, Russ B. Altman, Teri E. Klein

WORKSHOPS WITH POSTER PRESENTATIONS

PACKAGING BIOCOMPUTING SOFTWARE TO MAXIMIZE DISTRIBUTION AND REUSE ..... 124
APOLLO PROVIDES COLLABORATIVE GENOME ANNOTATION EDITING WITH THE POWER OF JBROWSE ..... 125
Nathan Dunn, Colin Diesh, Robert Buels, Helena Rasche, Anthony Bretaudeau, Nomi Harris, Ian Holmes
G:PROFILER - ONE FUNCTIONAL ENRICHMENT ANALYSIS TOOL, MANY INTERFACES SERVING LIFE SCIENCE COMMUNITIES ..... 126
Liis Kolberg, Uku Raudvere, Ivan Kuzmin, Jaak Vilo, Hedi Peterson
INCREASING USABILITY AND DISSEMINATION OF THE PATHFX ALGORITHM USING WEB APPLICATIONS AND DOCKER SYSTEMS ..... 127
Jennifer Wilson, Nicholas Stepanov, Ajinkya Chalke, Mike Wong, Dragutin Petkovic, Russ B. Altman

TRANSLATIONAL BIOINFORMATICS WORKSHOP: BIOBANKS IN THE PRECISION MEDICINE ERA ..... 128
IDENTIFICATION OF BIOMARKERS RELATED TO AUTISM SPECTRUM DISORDER USING GENOMIC INFORMATION ..... 129
Leena Sait, Martha Gizaw, and Iosif Vaisman
A PAN-CANCER 3-GENE SIGNATURE TO PREDICT DORMANCY ..... 130
Ivy Tran, Anchal Sharma, Subhajyoti De

AUTHOR INDEX ..... 131
1
ARTIFICIAL INTELLIGENCE FOR ENHANCING CLINICAL MEDICINE
PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS
2
Artificial Intelligence for Enhancing Clinical Medicine
Predicting Longitudinal Outcomes of Alzheimer's Disease via a Tensor-Based Joint Classification and Regression Model
Lodewijk Brand¹, Kai Nichols¹, Hua Wang¹, Heng Huang², Li Shen³, for the ADNI
¹Colorado School of Mines, ²University of Pittsburgh, ³University of Pennsylvania
Hua Wang
Alzheimer's disease (AD) is a serious neurodegenerative condition that affects millions of people across the world. Recently, machine learning models have been used to predict the progression of AD, although they frequently do not take advantage of the longitudinal and structural components associated with multi-modal medical data. To address this, we present a new algorithm that uses the multi-block alternating direction method of multipliers to optimize a novel objective that combines longitudinal clinical data of various modalities to simultaneously predict the cognitive scores and diagnoses of the participants in the Alzheimer's Disease Neuroimaging Initiative cohort. Our new model is designed to leverage the structure associated with clinical data that is not incorporated into standard machine learning optimization algorithms. This new approach shows state-of-the-art predictive performance and validates a collection of brain and genetic biomarkers that have been reported previously in the AD literature.
3
Robustly Extracting Medical Knowledge from EHRs: A Case Study of Learning a Health Knowledge Graph
Irene Y. Chen¹, Monica Agrawal¹, Steven Horng², David Sontag¹
¹Massachusetts Institute of Technology, ²Beth Israel Deaconess Medical Center
Irene Chen
Increasingly large electronic health records (EHRs) provide an opportunity to algorithmically learn medical knowledge. In one prominent example, a causal health knowledge graph could learn relationships between diseases and symptoms and then serve as a diagnostic tool to be refined with additional clinical input. Prior research has demonstrated the ability to construct such a graph from over 270,000 emergency department patient visits. In this work, we describe methods to evaluate a health knowledge graph for robustness. Moving beyond precision and recall, we analyze for which diseases and for which patients the graph is most accurate. We identify sample size and unmeasured confounders as major sources of error in the health knowledge graph. We introduce a method to leverage non-linear functions in building the causal graph to better understand existing model assumptions. Finally, to assess model generalizability, we extend to a larger set of complete patient visits within a hospital system. We conclude with a discussion on how to robustly extract medical knowledge from EHRs.
Page 12
4
Artificial Intelligence for Enhancing Clinical Medicine
Increasing Clinical Trial Accrual via Automated Matching of Biomarker Criteria
Jessica W. Chen, Christian A. Kunder, Nam Bui, James L. Zehnder, Helio A. Costa, Henning Stehr
Stanford University School of Medicine
Successful implementation of precision oncology requires both the deployment of nucleic acid sequencing panels to identify clinically actionable biomarkers and the efficient screening of patient biomarker eligibility for on-going clinical trials and therapies. This process is typically performed manually by biocurators, geneticists, pathologists, and oncologists; however, it is time-intensive and inconsistent amongst healthcare providers. We present the development of a feature matching algorithmic pipeline that identifies patients who meet eligibility criteria of precision medicine clinical trials via genetic biomarkers and apply it to patients undergoing treatment at the Stanford Cancer Center. This study demonstrates, through our patient eligibility screening algorithm that matches clinical sequencing derived biomarkers with precision medicine clinical trials, the successful use of an automated algorithmic pipeline as a feasible, accurate and effective alternative to the traditional manual clinical trial curation.
Page 13
5
Artificial Intelligence for Enhancing Clinical Medicine
Addressing the Credit Assignment Problem in Treatment Outcome Prediction using Temporal Difference Learning
Sahar Harati1, Andrea Crowell2, Helen Mayberg3, Shamim Nemati4
1Stanford University, 2Emory University, 3Mount Sinai, 4University of California San Diego
Sahar Harati
Mental health patients often undergo a variety of treatments before finding an effective one. Improved prediction of treatment response can shorten the duration of trials. A key challenge of applying predictive modeling to this problem is that often the effectiveness of a treatment regimen remains unknown for several weeks, and therefore immediate feedback signals may not be available for supervised learning. Here we propose a Machine Learning approach to extracting audio-visual features from weekly video interview recordings for predicting the likely outcome of Deep Brain Stimulation (DBS) treatment several weeks in advance. In the absence of immediate treatment-response feedback, we utilize a joint state-estimation and temporal difference learning approach to model both the trajectory of a patient's response and the delayed nature of feedback. Our results based on longitudinal recordings from 12 patients with depression show that the learned state values are predictive of the long-term success of DBS treatments. We achieve an area under the receiver operating characteristic curve of 0.88, beating all baseline methods.
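The idea of learning state values from delayed feedback can be illustrated with a minimal tabular TD(0) sketch. This is not the authors' model: the integer state ids, the toy episodes, and the `td0_values` helper are all hypothetical, and the abstract's joint state estimation from audio-visual features is not reproduced here. The only point shown is that a reward observed at the end of a trajectory propagates back to earlier states.

```python
def td0_values(episodes, n_states, alpha=0.1, gamma=1.0, sweeps=200):
    """Estimate state values with tabular TD(0).

    episodes: list of (states, final_reward) pairs, where states is the
    sequence of integer state ids visited (e.g. one per week) and the
    reward is only observed after the last state (delayed feedback).
    """
    values = [0.0] * n_states
    for _ in range(sweeps):
        for states, final_reward in episodes:
            for t, s in enumerate(states):
                last = t == len(states) - 1
                reward = final_reward if last else 0.0
                next_value = 0.0 if last else values[states[t + 1]]
                # TD(0) update: move V(s) toward reward + gamma * V(next state)
                values[s] += alpha * (reward + gamma * next_value - values[s])
    return values


# Hypothetical toy data: trajectory 0 -> 1 ends in response (reward 1),
# trajectory 0 -> 2 ends in non-response (reward 0).
episodes = [([0, 1], 1.0), ([0, 2], 0.0)]
values = td0_values(episodes, n_states=3)
```

In this toy setting the value of state 1 approaches 1, the value of state 2 stays at 0, and the shared starting state settles near the middle, which is the sense in which learned state values can be read as early outcome predictions.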
Page 14
6
Artificial Intelligence for Enhancing Clinical Medicine
From genome to phenome: Predicting multiple cancer phenotypes based on somatic genomic alterations via the genomic impact transformer
Yifeng Tao1, Chunhui Cai2, William W. Cohen1, Xinghua Lu2
1Carnegie Mellon University, 2University of Pittsburgh
Yifeng Tao
Cancers are mainly caused by somatic genomic alterations (SGAs) that perturb cellular signaling systems and eventually activate oncogenic processes. Therefore, understanding the functional impact of SGAs is a fundamental task in cancer biology and precision oncology. Here, we present a deep neural network model with encoder-decoder architecture, referred to as genomic impact transformer (GIT), to infer the functional impact of SGAs on cellular signaling systems through modeling the statistical relationships between SGA events and differentially expressed genes (DEGs) in tumors. The model utilizes a multi-head self-attention mechanism to identify SGAs that likely cause DEGs, or in other words, to differentiate potential driver SGAs from passenger ones in a tumor. The GIT model learns a vector (gene embedding) as an abstract representation of functional impact for each SGA-affected gene. Given SGAs of a tumor, the model can instantiate the states of the hidden layer, providing an abstract representation (tumor embedding) reflecting characteristics of perturbed molecular/cellular processes in the tumor, which in turn can be used to predict multiple phenotypes. We apply the GIT model to 4,468 tumors profiled by The Cancer Genome Atlas (TCGA) project. The attention mechanism enables the model to better capture the statistical relationship between SGAs and DEGs than conventional methods, and distinguishes cancer drivers from passengers. The learned gene embeddings capture the functional similarity of SGAs perturbing common pathways. The tumor embeddings are shown to be useful for tumor status representation, and phenotype prediction including patient survival time and drug response of cancer cell lines.
Page 15
7
Artificial Intelligence for Enhancing Clinical Medicine
Automated phenotyping of patients with non-alcoholic fatty liver disease reveals clinically relevant disease subtypes
Maxence Vandromme, Tomi Jun, Ponni Perumalswami, Joel T. Dudley, Andrea Branch, Li Li
Icahn School of Medicine at Mount Sinai, Sema4
Maxence Vandromme
Non-alcoholic fatty liver disease (NAFLD) is a complex heterogeneous disease which affects more than 20% of the population worldwide. Some subtypes of NAFLD have been clinically identified using hypothesis-driven methods. In this study, we used data mining techniques to search for subtypes in an unbiased fashion. Using electronic signatures of the disease, we identified a cohort of 13,290 patients with NAFLD from a hospital database. We gathered clinical data from multiple sources and applied unsupervised clustering to identify five subtypes among this cohort. Descriptive statistics and survival analysis showed that the subtypes were clinically distinct and were associated with different rates of death, cirrhosis, hepatocellular carcinoma, chronic kidney disease, cardiovascular disease, and myocardial infarction. Novel disease subtypes identified in this manner could be used to risk-stratify patients and guide management.
Page 16
8
Artificial Intelligence for Enhancing Clinical Medicine
Monitoring ICU Mortality Risk with A Long Short-Term Memory Recurrent Neural Network
Ke Yu1, Mingda Zhang2, Tianyi Cui2, Milos Hauskrecht2
1Intelligent Systems Program, University of Pittsburgh; 2Department of Computer Science, University of Pittsburgh
Ke Yu
In intensive care units (ICU), mortality prediction is a critical factor not only for effective medical intervention but also for allocation of clinical resources. Structured electronic health records (EHR) contain valuable information for assessing mortality risk in ICU patients, but current mortality prediction models usually require laborious human-engineered features. Furthermore, substantial missing data in EHR is a common problem for both the construction and implementation of a prediction model. Inspired by language-related models, we design a new framework for dynamic monitoring of patients’ mortality risk. Our framework uses the bag-of-words representation for all relevant medical events based on the most recent history as inputs. By design, it is robust to missing data in EHR and can be easily implemented as an instant scoring system to monitor the medical development of all ICU patients. Specifically, our model uses latent semantic analysis (LSA) to encode the patients’ states into low-dimensional embeddings, which are further fed to long short-term memory networks for mortality risk prediction. Our results show that the deep learning based framework performs better than the existing severity scoring system, SAPS-II. We observe that bidirectional long short-term memory demonstrates superior performance, probably due to the successful capture of both forward and backward temporal dependencies.
Page 17
9
INTRINSICALLY DISORDERED PROTEINS (IDPS) AND THEIR FUNCTIONS
PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS
Page 18
10
Intrinsically Disordered Proteins (IDPs) and Their Functions
Disordered Function Conjunction: On the in-silico function annotation of intrinsically disordered regions
Sina Ghadermarzi, Akila Katuwawala, Christopher J. Oldfield, Amita Barik, Lukasz Kurgan
Department of Computer Science, Virginia Commonwealth University, 401 West Main Street, Richmond, VA 23284, USA
Lukasz Kurgan
Intrinsically disordered regions (IDRs) lack a stable structure, yet perform biological functions. The functions of IDRs include mediating interactions with other molecules, including proteins, DNA, or RNA, and entropic functions, including domain linkers. Computational predictors provide residue-level indications of function for disordered proteins, which contrasts with the need to functionally annotate the thousands of experimentally and computationally discovered IDRs. In this work, we investigate the feasibility of using residue-level prediction methods for region-level function predictions. For an initial examination of the multiple function region-level prediction problem, we constructed a dataset of (likely) single function IDRs in proteins that are dissimilar to the training datasets of the residue-level function predictors. We find that available residue-level prediction methods are only modestly useful in predicting multiple region-level functions. Classification is enhanced by simultaneous use of multiple residue-level function predictions and is further improved by inclusion of amino acid content extracted from the protein sequence. We conclude that multifunction prediction for IDRs is feasible and benefits from the results produced by current residue-level function predictors; however, it has to accommodate inaccuracy in functional annotations.
Page 19
11
Intrinsically Disordered Proteins (IDPs) and Their Functions
De novo ensemble modeling suggests that AP2-binding to disordered regions can increase steric volume of Epsin but not Eps15
N. Suhas Jagannathan1, Christopher W. V. Hogue2, Lisa Tucker-Kellogg3
1Duke-NUS Medical School; 2National University of Singapore, 600 Epic Way Unit 345 San Jose CA 95134; 3Cancer & Stem Cell Biology and Centre for Computational Biology Duke-NUS Medical School
Lisa Tucker-Kellogg
Proteins with intrinsically disordered regions (IDRs) have large hydrodynamic radii, compared with globular proteins of equivalent weight. Recent experiments showed that IDRs with large radii can create steric pressure to drive membrane curvature during Clathrin-mediated endocytosis (CME). Epsin and Eps15 are two CME proteins with IDRs that contain multiple motifs for binding the adaptor protein AP2, but the impact of AP2-binding on these IDRs is unknown. Some IDRs acquire binding-induced function by forming a folded quaternary structure, but we hypothesize that the IDRs of Epsin and/or Eps15 acquire binding-induced function by increasing their steric volume. We explore this hypothesis in silico by generating conformational ensembles of the IDRs of Epsin (4 million structures) or Eps15 (3 million structures), then estimating the impact of AP2-binding on Radius of Gyration (RG). Results show that the ensemble of Epsin IDR conformations that accommodate AP2 binding has a right-shifted distribution of RG (larger radii) relative to the unbound Epsin ensemble. In contrast, the ensemble of Eps15 IDR conformations has comparable RG distribution between AP2-bound and unbound. We speculate that AP2 triggers the Epsin IDR to function through binding-induced-expansion, which could increase steric pressure and membrane bending during CME.
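For readers unfamiliar with the Radius of Gyration used above, a minimal sketch of the quantity follows. It assumes equal masses for all points and uses two made-up five-point chains; the `radius_of_gyration` helper and the coordinates are illustrative only and have no connection to the authors' ensemble generation.

```python
import math


def radius_of_gyration(coords):
    """Root-mean-square distance of points from their centroid.

    coords: list of (x, y, z) tuples; all points are assumed to have
    equal mass, which is a simplification.
    """
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    mean_sq = sum(
        (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords
    ) / n
    return math.sqrt(mean_sq)


# Hypothetical conformations: a fully extended chain and a compact one.
extended = [(float(i), 0.0, 0.0) for i in range(5)]
compact = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0),
           (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

rg_extended = radius_of_gyration(extended)
rg_compact = radius_of_gyration(compact)
```

Computing this value for every conformation in an ensemble gives a distribution; a "right-shifted" distribution means the conformations are, on the whole, more extended.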
Page 20
12
Intrinsically Disordered Proteins (IDPs) and Their Functions
Modulation of p53 Transactivation Domain Conformations by Ligand Binding and Cancer-Associated Mutations
Xiaorong Liu, Jianhan Chen
University of Massachusetts Amherst
Jianhan Chen
Intrinsically disordered proteins (IDPs) are important functional proteins, and their deregulation is linked to numerous human diseases including cancers. Understanding how disease-associated mutations or drug molecules can perturb the sequence-disordered ensemble-function-disease relationship of IDPs remains challenging, because it requires detailed characterization of the heterogeneous structural ensembles of IDPs. In this work, we combine the latest atomistic force field a99SB-disp, the enhanced sampling technique replica exchange with solute tempering, and GPU-accelerated molecular dynamics simulations to investigate how four cancer-associated mutations, K24N, N29K/N30D, D49Y, and W53G, and binding of an anti-cancer molecule, epigallocatechin gallate (EGCG), modulate the disordered ensemble of the transactivation domain (TAD) of tumor suppressor p53. Through extensive sampling, in excess of 1.0 μs per replica, well-converged structural ensembles of wild-type and mutant p53-TAD as well as WT p53-TAD in the presence of EGCG were generated. The results reveal that mutants could induce local structural changes and affect secondary structural properties. Interestingly, both EGCG binding and N29K/N30D could also induce long-range structural reorganizations and lead to more compact structures that could shield key binding sites of p53-TAD regulators. Further analysis reveals that the effects of EGCG binding are mainly achieved through nonspecific interactions. These observations are generally consistent with on-going NMR studies and binding assays. Our studies suggest that induced conformational collapse of IDPs may be a general mechanism for shielding functional sites, thus inhibiting recognition of their targets. The current study also demonstrates that atomistic simulations provide a viable approach for studying the sequence-disordered ensemble-function-disease relationships of IDPs and developing new drug design strategies targeting regulatory IDPs.
Page 21
13
Intrinsically Disordered Proteins (IDPs) and Their Functions
Exploring Relationships between the Density of Charged Tracts within Disordered Regions and Phase Separation
Ramiz Somjee1,2, Diana M. Mitrea1, Richard W. Kriwacki1,3
1St. Jude Children's Research Hospital; 2Rhodes College, 3University of Tennessee Health Sciences Center
Ramiz Somjee
Biomolecular condensates form through a process termed phase separation and play diverse roles throughout the cell. Proteins that undergo phase separation often have disordered regions that can engage in weak, multivalent interactions; however, our understanding of the sequence grammar that defines which proteins phase separate is far from complete. Here, we show that proteins that display a high density of charged tracts within intrinsically disordered regions are likely to be constituents of electrostatically organized biomolecular condensates. We scored the human proteome using an algorithm termed ABTdensity that quantifies the density of charged tracts and observed that proteins with more charged tracts are enriched in particular Gene Ontology annotations and, based upon analysis of interaction networks, cluster into distinct biomolecular condensates. These results suggest that electrostatically-driven, multivalent interactions involving charged tracts within disordered regions serve to organize certain biomolecular condensates through phase separation.
Page 22
14
MUTATIONAL SIGNATURES
PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS
Page 23
15
Mutational Signatures
PhySigs: Phylogenetic Inference of Mutational Signature Dynamics
Sarah Christensen1, Mark D.M. Leiserson2, Mohammed El-Kebir1
1University of Illinois at Urbana-Champaign, 2University of Maryland
Sarah Christensen
Distinct mutational processes shape the genomes of the clones comprising a tumor. These processes result in distinct mutational patterns, summarized by a small number of mutational signatures. Current analyses of clone-specific exposures to mutational signatures do not fully incorporate a tumor’s evolutionary context, either inferring identical exposures for all tumor clones, or inferring exposures for each clone independently. Here, we introduce the Tree-constrained Exposure problem to infer a small number of exposure shifts along the edges of a given tumor phylogeny. Our algorithm, PhySigs, solves this problem and includes model selection to identify the number of exposure shifts that best explain the data. We validate our approach on simulated data and identify exposure shifts in lung cancer data, including at least one shift with a matching subclonal driver mutation in the mismatch repair pathway. Moreover, we show that our approach enables the prioritization of alternative phylogenies inferred from the same sequencing data. PhySigs is publicly available at https://github.com/elkebir-group/PhySigs
Page 24
16
Mutational Signatures
TrackSigFreq: subclonal reconstructions based on mutation signatures and allele frequencies
Caitlin F. Harrigan1,2,4, Yulia Rubanova1,2,4, Quaid Morris1,2,3,4,5,6, Alina Selega2,4
1Department of Computer Science, University of Toronto, Toronto, Canada; 2Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Canada; 3Department of Molecular Genetics, University of Toronto, Toronto, Canada; 4Vector Institute, Toronto, Canada; 5Ontario Institute for Cancer Research, Toronto, Canada; 6Memorial Sloan Kettering Cancer Centre, New York, USA (pending)
Cait Harrigan
Mutational signatures are patterns of mutation types, many of which are linked to known mutagenic processes. Signature activity represents the proportion of mutations a signature generates. In cancer, cells may gain advantageous phenotypes through mutation accumulation, causing rapid growth of that subpopulation within the tumour. The presence of many subclones can make cancers harder to treat and have other clinical implications. Reconstructing changes in signature activities can give insight into the evolution of cells within a tumour. Recently, we introduced a new method, TrackSig, to detect changes in signature activities across time from a single bulk tumour sample. By design, TrackSig is unable to identify mutation populations with different frequencies but little to no difference in signature activity. Here we present an extension of this method, TrackSigFreq, which enables trajectory reconstruction based on both observed density of mutation frequencies and changes in mutational signature activities. TrackSigFreq preserves the advantages of TrackSig, namely optimal and rapid mutation clustering through segmentation, while extending it so that it can identify distinct mutation populations that share similar signature activities.
Page 25
17
Mutational Signatures
DNA Repair Footprint Uncovers Contribution of DNA Repair Mechanism to Mutational Signatures
Damian Wojtowicz1, Mark D.M. Leiserson2, Roded Sharan3, Teresa M. Przytycka1
1NIH, 2University of Maryland, 3Tel Aviv University
Teresa Przytycka
Cancer genomes accumulate a large number of somatic mutations resulting from imperfection of DNA processing during normal cell cycle as well as from carcinogenic exposures or cancer related aberrations of DNA maintenance machinery. These processes often lead to distinctive patterns of mutations, called mutational signatures. Several computational methods have been developed to uncover such signatures from catalogs of somatic mutations. However, cancer mutational signatures are the end-effect of several interplaying factors including carcinogenic exposures and potential deficiencies of the DNA repair mechanism. To fully understand the nature of each signature, it is important to disambiguate the atomic components that contribute to the final signature. Here, we introduce a new descriptor of mutational signatures, DNA Repair FootPrint (RePrint), and show that it can capture common properties of deficiencies in repair mechanisms contributing to diverse signatures. We validate the method with published mutational signatures from cell lines targeted with CRISPR-Cas9-based knockouts of DNA repair genes.
Page 26
18
PATTERN RECOGNITION IN BIOMEDICAL DATA: CHALLENGES IN PUTTING BIG DATA TO WORK
PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS
Page 27
19
Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work
Clinical Concept Embeddings Learned from Massive Sources of Multimodal Medical Data
Andrew L. Beam1, Benjamin Kompa2, Allen Schmaltz1, Inbar Fried3, Griffin Weber2, Nathan Palmer2, Xu Shi1, Tianxi Cai1, Isaac S. Kohane3
1Harvard T.H. Chan School of Public Health, 2Harvard Medical School, 3University of North Carolina School of Medicine
Benjamin Kompa
Word embeddings are a popular approach to unsupervised learning of word relationships that are widely used in natural language processing. In this article, we present a new set of embeddings for medical concepts learned using an extremely large collection of multimodal medical data. Leaning on recent theoretical insights, we demonstrate how an insurance claims database of 60 million members, a collection of 20 million clinical notes, and 1.7 million full text biomedical journal articles can be combined to embed concepts into a common space, resulting in the largest ever set of embeddings for 108,477 medical concepts. To evaluate our approach, we present a new benchmark methodology based on statistical power specifically designed to test embeddings of medical concepts. Our approach, called cui2vec, attains state-of-the-art performance relative to previous methods in most instances. Finally, we provide a downloadable set of pre-trained embeddings for other researchers to use, as well as an online tool for interactive exploration of the cui2vec embeddings.
Page 28
20
Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work
Assessment of Imputation Methods for Missing Gene Expression Data in Meta-Analysis of Distinct Cohorts of Tuberculosis Patients
Carly A. Bobak, Lauren McDonnell, Matthew D. Nemesure, Justin Lin, Jane E. Hill
Dartmouth College
Carly Bobak
The growth of publicly available repositories, such as the Gene Expression Omnibus, has allowed researchers to conduct meta-analysis of gene expression data across distinct cohorts. In this work, we assess eight imputation methods for their ability to impute gene expression data when values are missing across an entire cohort of Tuberculosis (TB) patients. We investigate how varying proportions of missing data (across 10%, 20%, and 30% of patient samples) influence the imputation results, and test for significantly differentially expressed genes and enriched pathways in patients with active TB. Our results indicate that truncating to common genes observed across cohorts, which is the current method used by researchers, results in the exclusion of important biology and suggest that LASSO and LLS imputation methodologies can reasonably impute genes across cohorts when total missingness rates are below 20%.
Page 29
21
Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work
Towards identifying drug side effects from social media using active learning and crowd sourcing
Sophie Burkhardt, Julia Siekiera, Josua Glodde, Miguel A. Andrade-Navarro, Stefan Kramer
University of Mainz
Sophie Burkhardt
Motivation: Social media is a largely untapped source of information on side effects of drugs. Twitter in particular is widely used to report on everyday events and personal ailments. However, labeling this noisy data is a difficult problem because labeled training data is sparse and automatic labeling is error-prone. Crowd sourcing can help in such a scenario to obtain more reliable labels, but is expensive in comparison because workers have to be paid. To remedy this, semi-supervised active learning may reduce the number of labeled data needed and focus the manual labeling process on important information.
Results: We extracted data from Twitter using the public API. We subsequently use Amazon Mechanical Turk in combination with a state-of-the-art semi-supervised active learning method to label tweets with their associated drugs and side effects in two stages. Our results show that our method is an effective way of discovering side effects in tweets with an improvement from 53% F-measure to 67% F-measure as compared to a one stage work flow. Additionally, we show the effectiveness of the active learning scheme in reducing the labeling cost in comparison to a non-active baseline.
Page 30
22
Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work
Microvascular Dynamics from 4D Microscopy Using Temporal Segmentation
Shir Gur, Lior Wolf, Lior Golgher, Pablo Blinder
Tel Aviv University
Lior Wolf
Recently developed methods for rapid continuous volumetric two-photon microscopy facilitate the observation of neuronal activity in hundreds of individual neurons and changes in blood flow in adjacent blood vessels across a large volume of living brain at unprecedented spatio-temporal resolution. However, the high imaging rate necessitates fully automated image analysis, whereas tissue turbidity and photo-toxicity limitations lead to extremely sparse and noisy imagery. In this work, we extend a recently proposed deep learning volumetric blood vessel segmentation network, such that it supports temporal analysis. With this technology, we are able to track changes in cerebral blood volume over time and identify spontaneous arterial dilations that propagate towards the pial surface. This new capability is a promising step towards characterizing the hemodynamic response function upon which functional magnetic resonance imaging (fMRI) is based.
Page 31
23
Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work
Using Transcriptional Signatures to Find Cancer Drivers with LURE
David Haan, Ruikang Tao, Verena Friedl, Ioannis N. Anastopoulos, Christopher K. Wong, Alana S. Weinstein, Joshua M. Stuart
Dept. of Biomolecular Engineering and UC Santa Cruz Genomics Institute, University Of California Santa Cruz, Santa Cruz, CA 95064 USA
David Haan
Cancer genome projects have produced multidimensional datasets on thousands of samples. Yet, depending on the tumor type, 5-50% of samples have no known driving event. We introduce a semi-supervised method called Learning UnRealized Events (LURE) that uses a progressive label learning framework and minimum spanning analysis to predict cancer drivers based on their altered samples sharing a gene expression signature with the samples of a known event. We demonstrate the utility of the method on the TCGA Pan-Cancer Atlas dataset for which it produced a high-confidence result relating 59 new connections to 18 known mutation events including alterations in the same gene, family, and pathway. We give examples of predicted drivers involved in TP53, telomere maintenance, and MAPK/RTK signaling pathways. LURE identifies connections between genes with no known prior relationship, some of which may offer clues for targeting specific forms of cancer. Code and Supplemental Material are available on the LURE website: https://sysbiowiki.soe.ucsc.edu/lure.
Page 32
24
Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work
PAGE-Net: Interpretable and Integrative Deep Learning for Survival Analysis Using Histopathological Images and Genomic Data
Jie Hao1, Sai Chandra Kosaraju2, Nelson Zange Tsaku3, Dae Hyun Song4, Mingon Kang2
1University of Pennsylvania, 2University of Nevada Las Vegas, 3Kennesaw State University, 4Gyeongsang National University Changwon Hospital
Jie Hao
The integration of multi-modal data, such as histopathological images and genomic data, is essential for understanding cancer heterogeneity and complexity for personalized treatments, as well as for enhancing survival predictions in cancer studies. Histopathology, as a clinical gold-standard tool for diagnosis and prognosis in cancers, allows clinicians to make precise decisions on therapies, whereas high-throughput genomic data have been investigated to dissect the genetic mechanisms of cancers. We propose a biologically interpretable deep learning model (PAGE-Net) that integrates histopathological images and genomic data, not only to improve survival prediction, but also to identify genetic and histopathological patterns that cause different survival rates in patients. PAGE-Net consists of pathology/genome/demography-specific layers, each of which provides comprehensive biological interpretation. In particular, we propose a novel patch-wise texture-based convolutional neural network, with a patch aggregation strategy, to extract global survival-discriminative features, without manual annotation for the pathology-specific layers. We adapted the pathway-based sparse deep neural network, named Cox-PASNet, for the genome-specific layers. The proposed deep learning model was assessed with the histopathological images and the gene expression data of Glioblastoma Multiforme (GBM) at The Cancer Genome Atlas (TCGA) and The Cancer Imaging Archive (TCIA). PAGE-Net achieved a C-index of 0.702, which is higher than the results achieved with only histopathological images (0.509) and Cox-PASNet (0.640). More importantly, PAGE-Net can simultaneously identify histopathological and genomic prognostic factors associated with patients’ survival. The source code of PAGE-Net is publicly available at https://github.com/DataX-JieHao/PAGE-Net
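The C-index reported above can be read as the fraction of comparable patient pairs that a model ranks in the right order. The following is a minimal sketch of Harrell's concordance index on made-up data. The `concordance_index` helper, the tie handling (0.5 credit for tied risk scores), and the toy values are assumptions for illustration; this is not the evaluation code of PAGE-Net.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times: observed follow-up times
    events: 1 if the event (death) was observed, 0 if censored
    risk_scores: higher score means higher predicted risk
    A pair (i, j) is comparable when i has an observed event
    strictly before the time recorded for j.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # assumed convention for tied scores
    return concordant / comparable


# Hypothetical toy cohort: the third patient is censored.
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 1]
perfect = concordance_index(times, events, [4.0, 3.0, 2.0, 1.0])
reversed_ranking = concordance_index(times, events, [1.0, 2.0, 3.0, 4.0])
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which gives a scale for reading the reported 0.509, 0.640, and 0.702.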
Page 33
25
Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work
Machine learning algorithms for simultaneous supervised detection of peaks in multiple samples and cell types
Toby Dylan Hocking1, Guillaume Bourque2
1Northern Arizona University, 2McGill University
Toby Hocking
Joint peak detection is a central problem when comparing samples in epigenomic data analysis, but current algorithms for this task are unsupervised and limited to at most two sample types. We propose PeakSegPipeline, a new genome-wide multi-sample peak calling pipeline for epigenomic data sets. It performs peak detection using a constrained maximum likelihood segmentation model with essentially only one free parameter that needs to be tuned: the number of peaks. To select the number of peaks, we propose to learn a penalty function based on user-provided labels that indicate genomic regions with or without peaks in specific samples. In comparisons with state-of-the-art peak detection algorithms, PeakSegPipeline achieves similar or better accuracy, and a more interpretable model with overlapping peaks that occur in exactly the same positions across all samples. Our novel approach is able to learn that predicted peak sizes vary by experiment type.
Page 34
26
Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work
Graph-based information diffusion method for prioritizing functionally related genes in protein-protein interaction networks
Minh Pham, Olivier Lichtarge
Baylor College of Medicine
Minh Pham
Shortest path length methods are routinely used to validate whether genes of interest are functionally related to each other based on biological network information. However, the methods are computationally intensive, impeding extensive utilization of network information. In addition, the non-weighted shortest path length approach, which is more frequently used, treats all network connections equally without taking into account the confidence levels of the associations. On the other hand, the graph-based information diffusion method, which employs both the presence and confidence weights of network edges, can efficiently explore large networks and has previously detected meaningful biological patterns. Therefore, in this study, we hypothesized that the graph-based information diffusion method could prioritize genes with relevant functions more efficiently and accurately than the shortest path length approaches. We demonstrated that the graph-based information diffusion method substantially differentiated not only genes participating in the same biological pathways (p << 0.0001) but also genes associated with specific human drug-induced clinical symptoms (p << 0.0001) from random. Furthermore, the diffusion method prioritized these functionally related genes faster and more accurately than the shortest path length approaches (pathways: p = 2.7e-28, clinical symptoms: p = 0.032). These data show the graph-based information diffusion method can be routinely used for robust prioritization of functionally related genes, facilitating efficient network validation and hypothesis generation, especially for human phenotype-specific genes.
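The abstract does not specify the diffusion scheme, so the following sketch substitutes a common one, random walk with restart, purely to illustrate how edge confidence weights can shape a gene ranking where an unweighted shortest path cannot. The `diffuse` helper, the restart probability, and the three-node network are all hypothetical.

```python
def diffuse(edges, seeds, restart=0.5, iters=100):
    """Random walk with restart on a weighted undirected graph.

    edges: dict mapping node -> {neighbor: confidence weight}; every
           edge is assumed to be listed in both directions.
    seeds: set of nodes where the walk restarts (genes of interest).
    Returns a dict of stationary visiting scores per node.
    """
    nodes = sorted(edges)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(start)
    for _ in range(iters):
        nxt = {n: restart * start[n] for n in nodes}
        for u in nodes:
            total = sum(edges[u].values())
            if total == 0:
                continue
            for v, w in edges[u].items():
                # Spread the score of u to neighbors in proportion to edge weight
                nxt[v] += (1.0 - restart) * score[u] * w / total
        score = nxt
    return score


# Hypothetical network: A-B is a high-confidence edge, A-C a low-confidence one.
edges = {
    "A": {"B": 0.9, "C": 0.1},
    "B": {"A": 0.9},
    "C": {"A": 0.1},
}
scores = diffuse(edges, seeds={"A"})
```

In this toy network B and C are both one step from A, so an unweighted shortest path cannot separate them, while the diffusion score ranks B above C because of the edge weights.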
Page 35
27
Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work
A Literature-Based Knowledge Graph Embedding Method for Identifying Drug Repurposing Opportunities in Rare Diseases
Daniel N. Sosa, Alexander Derry, Margaret Guo, Eric Wei, Connor Brinton, Russ B. Altman
Stanford University
Daniel Sosa
Millions of Americans are affected by rare diseases, many of which have poor survival rates. However, the small market size of individual rare diseases, combined with the time and capital requirements of pharmaceutical R&D, have hindered the development of new drugs for these cases. A promising alternative is drug repurposing, whereby existing FDA-approved drugs might be used to treat diseases different from their original indications. In order to generate drug repurposing hypotheses in a systematic and comprehensive fashion, it is essential to integrate information from across the literature of pharmacology, genetics, and pathology. To this end, we leverage a newly developed knowledge graph, the Global Network of Biomedical Relationships (GNBR). GNBR is a large, heterogeneous knowledge graph comprising drug, disease, and gene (or protein) entities linked by a small set of semantic “themes” derived from the abstracts of biomedical literature. We apply a knowledge graph embedding method that explicitly models the uncertainty associated with literature-derived relationships and uses link prediction to generate drug repurposing hypotheses. This approach achieves high performance on a gold-standard test set of known drug indications (AUROC = 0.89) and is capable of generating novel repurposing hypotheses, which we independently validate using external literature sources and protein interaction networks. Finally, we demonstrate the ability of our model to produce explanations of its predictions.
Page 36
28
Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work
Two-stage ML Classifier for Identifying Host Protein Targets of the Dengue Protease
Jacob T. Stanley, Alison R. Gilchrist, Alex C. Stabell, Mary A. Allen, Sara L. Sawyer, Robin D. Dowell
Department of Molecular, Cellular and Developmental Biology; BioFrontiers Institute; University of Colorado Boulder (all authors have the same affiliation)
Jacob Stanley
Flaviviruses such as dengue encode a protease that is essential for viral replication. The protease functions by cleaving well-conserved positions in the viral polyprotein. In addition to the viral polyprotein, the dengue protease cleaves at least one host protein involved in immune response. This raises the question, what other host proteins are targeted and cleaved? Here we present a new computational method for identifying putative host protein targets of the dengue virus protease. Our method relies on biochemical and secondary structure features at the known cleavage sites in the viral polyprotein in a two-stage classification process to identify putative cleavage targets. The accuracy of our predictions scaled inversely with evolutionary distance when we applied it to the known cleavage sites of several other flaviviruses, a good indication of the validity of our predictions. Ultimately, our classifier identified 257 human protein sites possessing both a similar target motif and accessible local structure. These proteins are promising candidates for further investigation. As the number of viral sequences expands, our method could be adopted to predict host targets of other flaviviruses.
Page 37
29
Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work
Enhancing Model Interpretability and Accuracy for Disease Progression Prediction via Phenotype-Based Patient Similarity Learning
Yue Wang1, Tong Wu1,2, Yunlong Wang1, Gao Wang3
1IQVIA Inc., 2University of Minnesota, 3University of Chicago
Yue Wang
Models have been proposed to extract temporal patterns from longitudinal electronic health records (EHR) for clinical predictive models. However, the common relations among patients (e.g., receiving the same medical treatments) were rarely considered. In this paper, we propose to learn patient similarity features as phenotypes from the aggregated patient-medical service matrix using non-negative matrix factorization. On real-world medical claim data, we show that the learned phenotypes are coherent within each group, and also explanatory and indicative of targeted diseases. We conducted experiments to predict the diagnoses for Chronic Lymphocytic Leukemia (CLL) patients. Results show that the phenotype-based similarity features can improve prediction over multiple baselines, including logistic regression, random forest, convolutional neural network, and more.
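As a rough illustration of factorizing a patient-by-service matrix into non-negative factors, the sketch below uses the classic multiplicative update rules for the squared-error objective. The choice of update rule, the rank, the helper names, and the 4x4 toy matrix are assumptions; the abstract does not say which NMF variant or settings the authors used.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frob_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Factor V (patients x services) into non-negative W (patients x rank)
    and H (rank x services) using multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Hypothetical matrix: patients 0-1 use services 0-1, patients 2-3 use services 2-3.
v = [[1.0, 1.0, 0.0, 0.0],
     [1.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0],
     [0.0, 0.0, 1.0, 1.0]]
w, h = nmf(v, rank=2)
error = frob_error(v, w, h)
```

Each row of `w` can then be read as a patient's loading on the learned phenotypes, and each row of `h` as the services that characterize a phenotype.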
Page 38
30
PRECISION MEDICINE: ADDRESSING THE CHALLENGES OF SHARING, ANALYSIS, AND PRIVACY AT SCALE
PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS
Page 39
31
Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale
Integrated Cancer Subtyping using Heterogeneous Genome-Scale Molecular Datasets
Suzan Arslanturk1, Sorin Draghici1, Tin Nguyen2
1Wayne State University, 2University of Nevada
Sorin Draghici
Vast repositories of heterogeneous data from existing sources present unique opportunities. Taken individually, each of the datasets offers solutions to important domain- and source-specific questions. Collectively, they represent complementary views of related data entities with an aggregate information value often well exceeding the sum of its parts. Integration of heterogeneous data is therefore paramount to i) obtain a more unified picture and comprehensive view of the relations, ii) achieve more robust results, iii) improve the accuracy and integrity, and iv) illuminate the complex interactions among data features. In this paper, we have proposed a data integration methodology to identify subtypes of cancer using multiple data types (mRNA, methylation, microRNA and somatic variants) and different data scales that come from different platforms (microarray, sequencing, etc.). The Cancer Genome Atlas (TCGA) dataset is used to build the data integration and cancer subtyping framework. The proposed data integration and disease subtyping approach accurately identifies novel subgroups of patients with significantly different survival profiles. With the current availability of vast genomic and variant data for cancer, the proposed data integration system will better differentiate cancer and patient subtypes for risk and outcome prediction and targeted treatment planning without additional cost and precious lost time.
Page 40
32
Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale
Assessment of coverage for endogenous metabolites and exogenous chemical compounds using an untargeted metabolomics platform
Sek Won Kong1, Carles Hernandez-Ferrer2
1Computational Health Informatics Program, Boston Children’s Hospital, 300 Longwood Avenue Boston, MA 02115, USA; 2Department of Pediatrics, Harvard Medical School, Boston, MA 02115, USA
Sek Won Kong
Physiological status and pathological changes in an individual can be captured by a metabolic state that reflects the influence of both genetic variants and environmental factors such as diet, lifestyle and gut microbiome. The totality of environmental exposure throughout a lifetime (i.e., the exposome) is difficult to measure with current technologies. However, targeted measurement of exogenous chemicals and untargeted profiling of endogenous metabolites have been widely used to discover biomarkers of pathophysiologic changes and to understand functional impacts of genetic variants. To investigate the coverage of chemical space and interindividual variation related to demographic and pathological conditions, we profiled 169 plasma samples using an untargeted metabolomics platform. On average, 1,009 metabolites were quantified in each individual (range 906–1,038) out of 1,244 total chemical compounds detected in our cohort. Of note, age was positively correlated with the total number of detected metabolites in both males and females. Using the robust Qn estimator, we found metabolite outliers in each sample (mean 22, range from 7 to 86). A total of 50 metabolites were outliers in a patient with phenylketonuria, including those known for the phenylalanine pathway, suggesting that multiple metabolic pathways were perturbed in this patient. The largest number of outliers (N=86) was found in a 5-year-old boy with alpha-1-antitrypsin deficiency who was waiting for liver transplantation due to cirrhosis. Xenobiotics including drugs, diets and environmental chemicals were significantly correlated with diverse endogenous metabolites, and the use of antibiotics significantly changed gut microbial products detected in host circulation. Several challenges such as annotation of features, reference range and variance for each feature per age group and gender, and population-scale reference data sets need to be addressed; however, untargeted metabolomics could be immediately deployed as a biomarker discovery platform and to evaluate the impact of genomic variants and exposures on metabolic pathways for some diseases.
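The outlier screen in this abstract rests on the Qn scale estimator of Rousseeuw and Croux. As a rough illustration only, the sketch below computes Qn for a single metabolite across samples and flags values that sit far from the median. The function names, the cutoff of 3 and the toy values are assumptions made for this example and are not taken from the abstract; the small-sample correction factor is omitted.

```python
from itertools import combinations
from statistics import median


def qn_scale(values):
    """Qn scale estimate (Rousseeuw & Croux), without finite-sample correction."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    h = n // 2 + 1
    k = h * (h - 1) // 2  # k = C(h, 2)
    # All pairwise absolute differences, sorted ascending (O(n^2) memory).
    diffs = sorted(abs(a - b) for a, b in combinations(values, 2))
    # 2.2219 makes Qn consistent with the standard deviation under normality.
    return 2.2219 * diffs[k - 1]


def flag_outliers(values, cutoff=3.0):
    """Indices whose distance from the median exceeds cutoff * Qn (assumed rule)."""
    scale = qn_scale(values)
    if scale == 0:
        return []
    center = median(values)
    return [i for i, v in enumerate(values) if abs(v - center) / scale > cutoff]


# Hypothetical abundances of one metabolite in six samples.
abundances = [1, 2, 3, 4, 5, 100]
outlier_indices = flag_outliers(abundances)
```

Because Qn is built from order statistics of pairwise differences, a single extreme sample does not inflate the scale estimate the way it would inflate a standard deviation.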
Page 41
33
Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale
Coverage profile correction of shallow-depth circulating cell-free DNA sequencing via multi-distance learning
Nicholas B. Larson, Melissa C. Larson, Jie Na, Carlos P. Sosa, Chen Wang, Jean-Pierre Kocher, Ross Rowsey
Mayo Clinic College of Medicine and Sciences
Nicholas Larson
Shallow-depth whole-genome sequencing (WGS) of circulating cell-free DNA (ccfDNA) is a popular approach for non-invasive genomic screening assays, including liquid biopsy for early detection of invasive tumors as well as non-invasive prenatal screening (NIPS) for common fetal trisomies. In contrast to nuclear DNA WGS, ccfDNA WGS exhibits extensive inter- and intra-sample coverage variability that is not fully explained by typical sources of variation in WGS, such as GC content. This variability may inflate false positive and false negative screening rates of copy-number alterations and aneuploidy, particularly if these features are present at a relatively low proportion of total sequenced content. Herein, we propose an empirically-driven coverage correction strategy that leverages prior annotation information in a multi-distance learning context to improve within-sample coverage profile correction. Specifically, we train a weighted k-nearest neighbors-style method on non-pregnant female donor ccfDNA WGS samples, and apply it to NIPS samples to evaluate coverage profile variability reduction. We additionally characterize improvement in the discrimination of positive fetal trisomy cases relative to normal controls, and compare our results against a more traditional regression-based approach to profile coverage correction based on GC content and mappability. Under cross-validation, performance measures indicated benefit to combining the two feature sets relative to either in isolation. We also observed substantial improvement in coverage profile variability reduction in leave-out clinical NIPS samples, with variability reduced by 26.5-53.5% relative to the standard regression-based method as quantified by median absolute deviation. Finally, we observed improved discrimination of screening-positive trisomy cases, indicating that the method reduces ccfDNA WGS coverage variability while additionally improving NIPS trisomy screening assay performance. Overall, our results indicate that machine learning approaches can substantially improve ccfDNA WGS coverage profile correction and downstream analyses.
Page 42
34
Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale
PGxMine: Text mining for curation of PharmGKB
Jake Lever1, Julia M. Barbarino2, Li Gong2, Rachel Huddart2, Katrin Sangkuhl2, Ryan Whaley2, Michelle Whirl-Carrillo2, Mark Woon2, Teri E. Klein2,3, Russ B. Altman1,2,3
1Department of Bioengineering, Stanford University, Stanford, CA, 94305; 2Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305; 3Department of Medicine, Stanford University, Stanford, CA, 94305
Jake Lever
Precision medicine tailors treatment to individuals' personal data, including differences in their genome. The Pharmacogenomics Knowledgebase (PharmGKB) provides highly curated information on the effect of genetic variation on drug response and side effects for a wide range of drugs. PharmGKB’s scientific curators triage, review and annotate a large number of papers each year, but the task is challenging. We present the PGxMine resource, a text-mined resource of pharmacogenomic associations from all accessible published literature to assist in the curation of PharmGKB. We developed a supervised machine learning pipeline to extract associations between a variant (DNA and protein changes, star alleles and dbSNP identifiers) and a chemical. PGxMine covers 452 chemicals and 2,426 variants and contains 19,930 mentions of pharmacogenomic associations across 7,170 papers. An evaluation by PharmGKB curators found that 57 of the top 100 associations not found in PharmGKB led to 83 curatable papers and a further 24 associations would likely lead to curatable papers through citations. The results can be viewed at https://pgxmine.pharmgkb.org/ and code can be downloaded at https://github.com/jakelever/pgxmine.
Page 43
35
Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale
The power of dynamic social networks to predict individuals' mental health
Shikang Liu1, David Hachen1, Omar Lizardo2, Christian Poellabauer1, Aaron Striegel1, Tijana Milenkovic1
1University of Notre Dame, 2University of California Los Angeles
Shikang Liu
Precision medicine has received attention both in and outside the clinic. We focus on the latter, by exploiting the relationship between individuals' social interactions and their mental health to predict one's likelihood of being depressed or anxious from rich dynamic social network data. Existing studies differ from our work in at least one aspect: they do not model social interaction data as a network; they do so but analyze static network data; they examine "correlation" between social networks and health but without making any predictions; or they study other individual traits but not mental health. In a comprehensive evaluation, we show that our predictive model that uses dynamic social network data is superior to its static network as well as non-network equivalents when run on the same data.
Page 44
36
Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale
Implementing a Cloud Based Method for Protected Clinical Trial Data Sharing
Gaurav Luthria, Qingbo Wang
Harvard University
Gaurav Luthria
Clinical trials generate a large amount of data that have been underutilized due to obstacles that prevent data sharing, including risks to patient privacy, data misrepresentation, and invalid secondary analyses. In order to address these obstacles, we developed a novel data sharing method which ensures patient privacy while also protecting the interests of clinical trial investigators. Our flexible and robust approach involves two components: (1) an advanced cloud-based querying language that allows users to test hypotheses without direct access to the real clinical trial data and (2) corresponding synthetic data for the query of interest that allows for exploratory research and model development. Both components can be modified by the clinical trial investigator depending on factors such as the type of trial or number of patients enrolled. To test the effectiveness of our system, we first implement a simple and robust permutation-based synthetic data generator. We then use the synthetic data generator coupled with our querying language to identify significant relationships among variables in a realistic clinical trial dataset.
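The abstract does not say how its permutation-based generator shuffles the data, so the following is only a minimal sketch of one plausible scheme: permuting every column independently. This keeps each variable's marginal distribution intact while breaking the link between a record and any one patient, and it also removes relationships between variables. The function name, the seed parameter and the toy records are hypothetical.

```python
import random


def permute_columns(rows, seed=None):
    """Return synthetic rows by shuffling every column independently.

    Assumption for illustration: independent column-wise permutation.
    Marginals are preserved; cross-variable structure is not.
    """
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*rows)]
    for col in columns:
        rng.shuffle(col)
    return [tuple(vals) for vals in zip(*columns)]


# Hypothetical trial records: (age, treatment arm, responded).
trial = [(61, "A", 1), (47, "B", 0), (35, "A", 0), (72, "B", 1)]
synthetic = permute_columns(trial, seed=0)
```

A generator like this is useful for developing and debugging a query, whose final answer would then come from running the same query against the protected data.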
Page 45
37
Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale
Pathway and network embedding methods for prioritizing psychiatric drugs
Yash Pershad1, Margaret Guo2, Russ B. Altman3
1Stanford University Department of Bioengineering, 2Stanford University Biomedical Informatics Program, 3Stanford University Departments of Bioengineering, Genetics, & Medicine
Yash Pershad
One in five Americans experiences mental illness, and roughly 75% of psychiatric prescriptions do not successfully treat the patient’s condition. Extensive evidence implicates genetic factors and signaling disruption in the pathophysiology of these diseases. Changes in transcription often underlie this molecular pathway dysregulation; individual patient transcriptional data can improve the efficacy of diagnosis and treatment. Recent large-scale genomic studies have uncovered shared genetic modules across multiple psychiatric disorders, providing an opportunity for an integrated multi-disease approach for diagnosis. Moreover, network-based models informed by gene expression can represent pathological biological mechanisms and suggest new genes for diagnosis and treatment. Here, we use patient gene expression data from multiple studies to classify psychiatric diseases, integrate knowledge from expert-curated databases and publicly available experimental data to create augmented disease-specific gene sets, and use these to recommend disease-relevant drugs. From Gene Expression Omnibus, we extract expression data from 145 cases of schizophrenia, 82 cases of bipolar disorder, 190 cases of major depressive disorder, and 307 shared controls. We use pathway-based approaches to predict psychiatric disease diagnosis with a random forest model (78% accuracy) and derive important features to augment available drug and disease signatures. Using protein-protein-interaction networks and embedding-based methods, we build a pipeline to prioritize treatments for psychiatric diseases that achieves a 3.4-fold improvement over a background model. Thus, we demonstrate that gene-expression-derived pathway features can diagnose psychiatric diseases and that molecular insights derived from this classification task can inform treatment prioritization for psychiatric diseases.
Page 46
38
Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale
Robust-ODAL: Learning from heterogeneous health systems without sharing patient-level data
Jiayi Tong1, Rui Duan1, Ruowang Li1, Martijn J. Schuemie2, Jason H. Moore1, Yong Chen1
1University of Pennsylvania, 2Janssen Research and Development LLC
Jiayi Tong
Electronic Health Records (EHR) contain extensive patient data on various health outcomes and risk predictors, providing an efficient and wide-reaching source for health research. Integrated EHR data can provide a larger sample size of the population to improve estimation and prediction accuracy. To overcome the obstacle of sharing patient-level data, distributed algorithms were developed to conduct statistical analyses across multiple clinical sites through sharing only aggregated information. However, the heterogeneity of data across sites is often ignored by existing distributed algorithms, which leads to substantial bias when studying the association between the outcomes and exposures. In this study, we propose a privacy-preserving and communication-efficient distributed algorithm which accounts for the heterogeneity caused by a small number of the clinical sites. We evaluated our algorithm through a systematic simulation study motivated by real-world scenarios and applied our algorithm to multiple claims datasets from the Observational Health Data Sciences and Informatics (OHDSI) network. The results showed that the proposed method performed better than the existing distributed algorithm ODAL and a meta-analysis method.
Page 47
39
Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale
Computationally efficient, exact, covariate-adjusted genetic principal component analysis by leveraging individual marker summary statistics from large biobanks
Jack Wolf1, Martha Barnard1, Xueting Xia2, Nathan Ryder3, Jason Westra4, Nathan Tintle4
1St. Olaf College, 2Texas Tech University, 3Colorado State University, 4Dordt University
Nathan Tintle
The popularization of biobanks provides an unprecedented amount of genetic and phenotypic information that can be used to research the relationship between genetics and human health. Despite the opportunities these datasets provide, they also pose many problems associated with computational time and costs, data size and transfer, and privacy and security. The publishing of summary statistics from these biobanks, and the use of them in a variety of downstream statistical analyses, alleviates many of these logistical problems. However, major questions remain about how to use summary statistics in all but the simplest downstream applications. Here, we present a novel approach to utilize basic summary statistics (estimates from single marker regressions on single phenotypes) to evaluate more complex phenotypes using multivariate methods. In particular, we present a covariate-adjusted method for conducting principal component analysis (PCA) utilizing only biobank summary statistics. We validate exact formulas for this method, as well as provide a framework of estimation when specific summary statistics are not available, through simulation. We apply our method to a real data set of fatty acid and genomic data.
Page 48
40
ARTIFICIAL INTELLIGENCE FOR ENHANCING CLINICAL MEDICINE
PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS
Page 49
41
Artificial Intelligence for Enhancing Clinical Medicine
Multiclass Disease Classification from Microbial Whole-Community Metagenomes
Saad Khan, Libusha Kelly
Albert Einstein College of Medicine
Saad Khan
The microbiome, the community of microorganisms living within an individual, is a promising avenue for developing non-invasive methods for disease screening and diagnosis. Here, we utilize 5643 aggregated, annotated whole-community metagenomes to implement the first multiclass microbiome disease classifier of this scale, able to discriminate between 18 different diseases and healthy. We compared three different machine learning models: random forests, deep neural nets, and a novel graph convolutional architecture which exploits the graph structure of phylogenetic trees as its input. We show that the graph convolutional model outperforms deep neural nets in terms of accuracy (achieving 75% average test-set accuracy), receiver-operator-characteristics (92.1% average area-under-ROC (AUC)), and precision-recall (50% average area-under-precision-recall (AUPR)). Additionally, the convolutional net's performance complements that of the random forest, showing a lower propensity for Type-I errors (false-positives) while the random forest makes fewer Type-II errors (false-negatives). Lastly, we are able to achieve over 90% average top-3 accuracy across all of our models. Together, these results indicate that there are predictive, disease-specific signatures across microbiomes that can be used for diagnostic purposes.
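Top-3 accuracy, quoted in this abstract, counts a prediction as correct when the true class is among the three highest-scoring classes. The sketch below shows the calculation on made-up scores for a four-class problem; the function name and the numbers are illustrative assumptions, not values from the paper.

```python
def top_k_accuracy(scores, labels, k=3):
    """Fraction of samples whose true class index is among the k top-scoring classes."""
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and equal in length")
    hits = 0
    for row, true_idx in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if true_idx in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Hypothetical per-class scores for three samples and four classes.
scores = [
    [0.10, 0.50, 0.25, 0.15],
    [0.70, 0.15, 0.10, 0.05],
    [0.25, 0.20, 0.40, 0.15],
]
labels = [0, 1, 2]
top1 = top_k_accuracy(scores, labels, k=1)
top3 = top_k_accuracy(scores, labels, k=3)
```

In a screening setting with many candidate diseases, a short ranked list of this kind can still narrow follow-up testing even when the single top prediction is wrong.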
Page 50
42
Artificial Intelligence for Enhancing Clinical Medicine
LitGen: Genetic Literature Recommendation Guided by Human Explanations
Allen Nie1, Arturo L. Pineda1, Matt W. Wright1, Hannah Wand1, Bryan Wulf1, Helio A. Costa1, Ronak Y. Patel2, Carlos D. Bustamante1, James Zou1
1Stanford University, 2Baylor College of Medicine
Allen Nie
As genetic sequencing costs decrease, the lack of clinical interpretation of variants has become the bottleneck in using genetics data. A major rate-limiting step in clinical interpretation is the manual curation of evidence in the genetic literature by highly trained biocurators. What makes curation particularly time-consuming is that the curator needs to identify papers that study variant pathogenicity using different types of approaches and evidence (e.g., biochemical assays or case-control analysis). In collaboration with the Clinical Genome Resource (ClinGen), the flagship NIH program for clinical curation, we propose the first machine learning system, LitGen, that can retrieve papers for a particular variant and filter them by specific evidence types used by curators to assess for pathogenicity. LitGen uses semi-supervised deep learning to predict the type of evidence provided by each paper. It is trained on papers annotated by ClinGen curators and systematically evaluated on new test data collected by ClinGen. LitGen further leverages rich human explanations and unlabeled data to gain 7.9%-12.6% relative performance improvement over models learned only on the annotated papers. It is a useful framework to improve clinical variant curation.
Page 51
43
Artificial Intelligence for Enhancing Clinical Medicine
Multilevel Self-Attention Model and its Use on Medical Risk Prediction
Xianlong Zeng1,2, Yunyi Feng1,2, Soheil Moosavinasab2, Deborah Lin2, Simon Lin2, Chang Liu1
1School of Electrical Engineering and Computer Science, Ohio University, Athens, OH, USA; 2The Research Institute at Nationwide Children’s Hospital, Columbus, OH, USA
Xianlong Zeng
Various deep learning models have been developed for different healthcare predictive tasks using Electronic Health Records and have shown promising performance. In these models, medical codes are often aggregated into a visit representation without considering their heterogeneity, e.g., the same diagnosis might imply different healthcare concerns with different procedures or medications. Then the visits are often fed into deep learning models, such as recurrent neural networks, sequentially without considering the irregular temporal information and dependencies among visits. To address these limitations, we developed a Multilevel Self-Attention Model (MSAM) that can capture the underlying relationships between medical codes and between medical visits. We compared MSAM with various baseline models on two predictive tasks, i.e., future disease prediction and future medical cost prediction, with two large datasets, i.e., MIMIC-3 and PFK. In the experiments, MSAM consistently outperformed baseline models. Additionally, for future medical cost prediction, we used disease prediction as an auxiliary task, which not only guides the model to achieve a stronger and more stable financial prediction, but also allows managed care organizations to provide better care coordination.
Page 52
44
Artificial Intelligence for Enhancing Clinical Medicine
Identifying Transitional High Cost Users from Unstructured Patient Profiles Written by Primary Care Physicians
Haoran Zhang1,2,3, Elisa Candido3, Andrew S. Wilton3, Raquel Duchen3, Liisa Jaakkimainen3, Walter Wodchis3,4,5, Quaid Morris1,2,6,7
1Department of Computer Science, University of Toronto; 2Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada; 3ICES, Toronto, Ontario, Canada; 4Institute of Health Policy, Management, and Evaluation, University of Toronto; 5Institute for Better Health, Trillium Health Partners, Mississauga, Ontario, Canada; 6Terrence Donnelly Center for Cellular and Biomolecular Research, University of Toronto; 7Department of Molecular Genetics, University of Toronto
Haoran Zhang
Identification and subsequent intervention of patients at risk of becoming High Cost Users (HCUs) presents the opportunity to improve outcomes while also providing significant savings for the healthcare system. In this paper, the 2016 HCU status of patients was predicted using free-form text data from the 2015 cumulative patient profiles within the electronic medical records of family care practices in Ontario. These unstructured notes make substantial use of domain-specific spellings and abbreviations; we show that word embeddings derived from the same context provide more informative features than pre-trained ones based on Wikipedia, MIMIC, and Pubmed. We further demonstrate that a model using features derived from aggregated word embeddings (EmbEncode) provides a significant performance improvement over the bag-of-words representation (82.48±0.35% versus 81.85±0.36% held-out AUROC, p=3.2E-4), using far fewer input features (5,492 versus 214,750) and fewer non-zero coefficients (1,177 versus 4,284). The future HCUs of greatest interest are the transitional ones who are not already HCUs, because they provide the greatest scope for interventions. Predicting these new HCUs is challenging because most HCUs recur. We show that removing recurrent HCUs from the training set improves the ability of EmbEncode to predict new HCUs, while only slightly decreasing its ability to predict recurrent ones.
Page 53
45
Artificial Intelligence for Enhancing Clinical Medicine
Obtaining dual-energy computed tomography (CT) information from a single-energy CT image for quantitative imaging analysis of living subjects by using deep learning
Wei Zhao1, Tianling Lv2, Rena Lee3, Yang Chen2, Lei Xing1
1Stanford University, 2Southeast University, 3Ewha Womans University
Lei Xing
Computed tomography (CT) is a fundamental imaging modality to generate cross-sectional views of internal anatomy in a living subject or interrogate material composition of an object, and it has been routinely used in clinical applications and nondestructive testing. In a standard CT image, pixels having the same Hounsfield Units (HU) can correspond to different materials, and it is therefore challenging to differentiate and quantify materials. Dual-energy CT (DECT) is desirable to differentiate multiple materials, but the costly DECT scanners are not as widely available as single-energy CT (SECT) scanners. Recent advancement in deep learning provides an enabling tool to map images between different modalities with incorporated prior knowledge. Here we develop a deep learning approach to perform DECT imaging by using the standard SECT data. The end point of the approach is a model capable of providing the high-energy CT image for a given input low-energy CT image. The feasibility of the deep learning-based DECT imaging method using SECT data is demonstrated using contrast-enhanced DECT images and evaluated using clinically relevant indexes. This work opens new opportunities for numerous DECT clinical applications with standard SECT data and may enable significantly simplified hardware design, reduced scanning dose, and reduced imaging cost for future DECT systems.
Page 54
46
INTRINSICALLY DISORDERED PROTEINS (IDPS) AND THEIR FUNCTIONS
PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS
Page 55
47
Intrinsically Disordered Proteins (IDPs) and Their Functions
Many-to-one binding by intrinsically disordered protein regions
Wei-Lun Alterovitz1*, Eshel Faraggi1,2,3*, Christopher J. Oldfield1, Jingwei Meng1, Bin Xue1, Fei Huang1, Pedro Romero1, Andrzej Kloczkowski2, Vladimir N. Uversky1, A. Keith Dunker1
1Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, 410 W. 10th St, HS 5000, Indianapolis, IN 46202, USA ([email protected]); 2Battelle Center for Mathematical Medicine, and the Nationwide Children’s Hospital, Department of Pediatrics, The Ohio State University, Columbus, OH 43210, USA; 3Research and Information Systems, LLC, 1620 E. 72nd St. Indianapolis, IN 46240 USA
*Contributed equally ([email protected], [email protected])
Keith Dunker
Disordered binding regions (DBRs), which are embedded within intrinsically disordered proteins or regions (IDPs or IDRs), enable IDPs or IDRs to mediate multiple protein-protein interactions. DBR-protein complexes were collected from the Protein Data Bank for which two or more DBRs having different amino acid sequences bind to the same (100% sequence identical) globular protein partner, a type of interaction herein called many-to-one binding. Two distinct binding profiles were identified: independent and overlapping. For the overlapping binding profiles, the distinct DBRs interact by means of almost identical binding sites (herein called “similar”), or the binding sites contain both common and divergent interaction residues (herein called “intersecting”). Further analysis of the sequence and structural differences among these three groups indicates how IDP flexibility allows different segments to adjust to similar, intersecting, and independent binding pockets.
Page 56
48
MUTATIONAL SIGNATURES
PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS
Page 57
49
Mutational Signatures
Impact of mutational signatures on microRNA and their response elements
Eirini Stamoulakatou1, Pietro Pinoli1, Stefano Ceri1, Rosario Piro2
1Politecnico di Milano, 2Freie Universität Berlin
Eirini Stamoulakatou
MicroRNAs are a class of small non-coding RNA molecules with great importance for regulating a large number of diverse biological processes in health and disease, mostly by binding to complementary microRNA response elements (MREs) on protein-coding messenger RNAs and other non-coding RNAs and subsequently inducing their degradation. A growing body of evidence indicates that the dysregulation of certain microRNAs may either drive or suppress oncogenesis. The seed region of a microRNA is of crucial importance for its target recognition. Mutations in these seed regions may disrupt the binding of microRNAs to their target genes. In this study, we investigate the theoretical impact of cancer-associated mutagenic processes and their mutational signatures on microRNA seeds and their MREs. To our knowledge, this is the first study which provides a probabilistic framework for microRNA and MRE sequence alteration analysis based on mutational signatures and computationally assessing the disruptive impact of mutational signatures on human microRNA–target interactions.
Page 58
50
Mutational Signatures
Genome Gerrymandering: optimal division of the genome into regions with cancer-type-specific differences in mutation rates
Adamo Young, Jacob Chmura, Yoonsik Park, Quaid Morris, Gurnit Atwal
University of Toronto
Adamo Young
The activity of mutational processes differs across the genome, and is influenced by chromatin state and spatial genome organization. At the scale of one megabase-pair (Mb), regional mutation density correlates strongly with chromatin features, and mutation density at this scale can be used to accurately identify cancer type. Here, we explore the relationship between genomic region and mutation rate by developing an information-theory-driven, dynamic programming algorithm for dividing the genome into regions with differing relative mutation rates between cancer types. Our algorithm improves mutual information when compared to the naive approach, effectively reducing the average number of mutations required to identify cancer type. Our approach provides an efficient method for associating regional mutation density with mutation labels, and has future applications in exploring the role of somatic mutations in a number of diseases.
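To make the mutual-information criterion concrete, the sketch below scores a candidate segmentation by the mutual information between the region a mutation falls in and the cancer type it came from. It illustrates only the scoring step under assumed inputs; the dynamic programming search over boundaries described in the abstract is not shown, and the function names and counts are hypothetical.

```python
from math import log2


def mutual_information(counts):
    """Mutual information in bits for a regions-by-cancer-types count table."""
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("count table is empty")
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    mi = 0.0
    for i, row in enumerate(counts):
        for j, c in enumerate(row):
            if c:  # zero cells contribute nothing
                mi += (c / total) * log2(c * total / (row_sums[i] * col_sums[j]))
    return mi


def merge_bins(bin_counts, boundaries):
    """Sum consecutive bins into regions; boundaries are exclusive end indices."""
    regions, start = [], 0
    for end in boundaries:
        block = bin_counts[start:end]
        regions.append([sum(col) for col in zip(*block)])
        start = end
    return regions


# Hypothetical mutation counts: four genomic bins by two cancer types.
bins = [[10, 0], [10, 0], [0, 10], [0, 10]]
good_split = mutual_information(merge_bins(bins, [2, 4]))
poor_split = mutual_information(merge_bins(bins, [1, 4]))
```

With these toy counts, placing the boundary where the cancer-type mix changes yields 1 bit, while a misplaced boundary yields about 0.31 bits, which is the kind of difference a boundary search would exploit.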
Page 59
51
PATTERN RECOGNITION IN BIOMEDICAL DATA: CHALLENGES IN PUTTING BIG DATA TO WORK
PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS
Page 60
52
Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work
Learning a Latent Space of Highly Multidimensional Cancer Data
Benjamin Kompa1, Beau Coker2
1Harvard Medical School, 2Harvard School of Public Health
Benjamin Kompa
We introduce a Unified Feature Disentanglement Network (UFDN) trained on The Cancer Genome Atlas (TCGA), which we refer to as UFDN-TCGA. We demonstrate that UFDN-TCGA learns a biologically relevant, low-dimensional latent space of high-dimensional gene expression data by applying our network to two classification tasks of cancer status and cancer type. UFDN-TCGA performs comparably to random forest methods. The UFDN allows for continuous, partial interpolation between distinct cancer types. Furthermore, we perform an analysis of differentially expressed genes between skin cutaneous melanoma (SKCM) samples and the same samples interpolated into glioblastoma (GBM). We demonstrate that our interpolations consist of relevant metagenes that recapitulate known glioblastoma mechanisms.
Page 61
53
Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work
Scaling structural learning with NO-BEARS to infer causal transcriptome networks
Hao-Chih Lee1,3, Matteo Danieletto1,2,3, Riccardo Miotto1,2,3, Sarah T. Cherng1,3, Joel T. Dudley1,2,3
1Institute for Next Generation Healthcare, 2Hasso Plattner Institute for Digital Health, 3Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10065, USA
Hao-Chih Lee
Constructing gene regulatory networks is a critical step in revealing disease mechanisms from transcriptomic data. In this work, we present NO-BEARS, a novel algorithm for estimating gene regulatory networks. The NO-BEARS algorithm is built on the basis of the NO-TEARS algorithm with two improvements. First, we propose a new constraint and its fast approximation to reduce the computational cost of the NO-TEARS algorithm. Next, we introduce a polynomial regression loss to handle non-linearity in gene expressions. Our implementation utilizes modern GPU computation that can decrease the time of hours-long CPU computation to seconds. Using synthetic data, we demonstrate improved performance, both in processing time and accuracy, on inferring gene regulatory networks from gene expression data.
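The abstract does not spell out the new NO-BEARS constraint, so the sketch below shows only the acyclicity function of the original NO-TEARS algorithm that it builds on: h(W) = tr(exp(W∘W)) − d, which is zero exactly when the weighted graph W has no directed cycle. For a d×d matrix the power series is truncated at order d here, which is enough to detect any cycle; the helper names and example graphs are assumptions for illustration.

```python
from math import factorial


def matmul(a, b):
    """Plain-Python product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def acyclicity(w):
    """Truncated NO-TEARS function: sum over k = 1..d of tr((W∘W)^k) / k!."""
    d = len(w)
    squared = [[x * x for x in row] for row in w]  # elementwise square, entries >= 0
    power = squared
    total = 0.0
    for k in range(1, d + 1):
        # tr(A^k) is positive only if some closed walk of length k exists.
        total += sum(power[i][i] for i in range(d)) / factorial(k)
        power = matmul(power, squared)
    return total


# Hypothetical 3-gene networks: a chain (acyclic) and a 3-cycle.
chain = [[0, 1.5, 0], [0, 0, -2.0], [0, 0, 0]]
cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
h_chain = acyclicity(chain)
h_cycle = acyclicity(cycle)
```

Each evaluation needs repeated d×d matrix products, which is the kind of cost that becomes a bottleneck as the number of genes grows and that a cheaper constraint would aim to reduce.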
Page 62
54
Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work
PathFlowAI: A High-Throughput Workflow for Preprocessing, Deep Learning and Interpretation in Digital Pathology
Joshua J. Levy1, Lucas A. Salas1, Brock C. Christensen1, Aravindhan Sriharan2, Louis J. Vaickus2
1Geisel School of Medicine at Dartmouth, 2Dartmouth Hitchcock Medical Center
Joshua Levy
The diagnosis of disease often requires analysis of a biopsy. Many diagnoses depend not only on the presence of certain features but on their location within the tissue. Recently, a number of deep learning diagnostic aids have been developed to classify digitized biopsy slides. Clinical workflows often involve processing of more than 500 slides per day. However, clinical use of deep learning diagnostic aids would require a preprocessing workflow that is cost-effective, flexible, scalable, rapid, interpretable, and transparent. Here, we present such a workflow, optimized using Dask and mixed precision training via APEX, capable of handling any patch-level or slide-level classification and prediction problem. The workflow uses a flexible and fast preprocessing and deep learning analytics pipeline, incorporates model interpretation and has a highly storage-efficient audit trail. We demonstrate the utility of this package on the analysis of a prototypical anatomic pathology specimen, liver biopsies for evaluation of hepatitis from a prospective cohort. The preliminary data indicate that PathFlowAI may become a cost-effective and time-efficient tool for clinical use of Artificial Intelligence (AI) algorithms.
Page 63
55
PatternRecognitioninBiomedicalData:ChallengesinPuttingBigDatatoWork
Improvingsurvivalpredictionusinganovelfeatureselectionandfeaturereductionframeworkbasedontheintegrationofclinicalandmoleculardata*
LisaNeums,RichardMeier,DevinC.Koestler,JeffreyA.Thompson
DepartmentofBiostatisticsandDataScience,UniversityofKansasMedicalCenter,andUniversityofKansasCancerCenter
Lisa Neums
The accurate prediction of a cancer patient’s risk of progression or death can guide clinicians in the selection of treatment and help patients in planning personal affairs. Predictive models based on patient-level data represent a tool for determining risk. Ideally, predictive models will use multiple sources of data (e.g., clinical, demographic, molecular, etc.). However, there are many challenges associated with data integration, such as overfitting and redundant features. In this paper we aim to address those challenges through the development of a novel feature selection and feature reduction framework that can handle correlated data. Our method begins by computing a survival distance score for gene expression, which in combination with a score for clinical independence, results in the selection of highly predictive genes that are non-redundant with clinical features. The survival distance score is a measure of variation of gene expression over time, weighted by the variance of the gene expression over all patients. Selected genes, in combination with clinical data, are used to build a predictive model for survival. We benchmark our approach against commonly used methods, namely lasso- as well as ridge-penalized Cox proportional hazards models, using three publicly available cancer data sets: kidney cancer (521 samples), lung cancer (454 samples) and bladder cancer (335 samples). Across all data sets, our approach built on the training set outperformed the clinical data alone in the test set in terms of predictive power with a c.Index of 0.773 vs 0.755 for kidney cancer, 0.695 vs 0.664 for lung cancer and 0.648 vs 0.636 for bladder cancer. Further, we were able to show increased predictive performance of our method compared to lasso-penalized models fit to both gene expression and clinical data, which had a c.Index of 0.767, 0.677, and 0.645, as well as increased or comparable predictive power compared to ridge models, which had a c.Index of 0.773, 0.668 and 0.650 for the kidney, lung, and bladder cancer data sets, respectively. Therefore, our score for clinical independence improves prognostic performance as compared to modeling approaches that do not consider combining non-redundant data. Future work will concentrate on optimizing the survival distance score in order to achieve improved results for all types of cancer.
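For readers unfamiliar with the c.Index (concordance index) used to compare the models above, the following is a minimal sketch of Harrell's concordance index for right-censored survival data. It is an illustration only, not the authors' evaluation code; the function name and argument layout are assumptions made for this example.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable patient pairs ranked correctly by risk.

    times       -- observed follow-up time per patient
    events      -- 1 if the event was observed, 0 if the patient was censored
    risk_scores -- model output; a higher score means higher predicted risk
    (All names here are hypothetical, chosen for this sketch.)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only when the times differ and the shorter
        # time corresponds to an observed event (not a censored patient).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier event had the higher risk score
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which figures such as 0.773 vs 0.755 are read.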
Page 64
56
Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work
Bayesian semi-nonnegative matrix tri-factorization to identify pathways associated with cancer phenotypes
Sunho Park1, Nabhonil Kar1, Jae-Ho Cheong2, Tae Hyun Hwang1
1Cleveland Clinic, 2Yonsei University College of Medicine
Sunho Park
Accurate identification of pathways associated with cancer phenotypes (e.g., cancer subtypes and treatment outcome) could lead to discovering reliable prognostic and/or predictive biomarkers for better patient stratification and treatment guidance. In our previous work, we have shown that non-negative matrix tri-factorization (NMTF) can be successfully applied to identify pathways associated with specific cancer types or disease classes as a prognostic and predictive biomarker. However, one key limitation of non-negative factorization methods, including various non-negative bi-factorization methods, is their limited ability to handle negative input data. For example, many molecular data that consist of real values containing both positive and negative values (e.g., normalized/log transformed gene expression data where a negative value represents down-regulated expression of genes) are not suitable input for these algorithms. In addition, most previous methods provide just a single point estimate and hence cannot deal with uncertainty effectively. To address these limitations, we propose a Bayesian semi-nonnegative matrix tri-factorization method to identify pathways associated with cancer phenotypes from a real-valued input matrix, e.g., gene expression values. Motivated by semi-nonnegative factorization, we allow one of the factor matrices, the centroid matrix, to be real-valued so that each centroid can express either the up- or down-regulation of the member genes in a pathway. In addition, we place structured spike-and-slab priors (which are encoded with the pathways and a gene-gene interaction (GGI) network) on the centroid matrix so that even a set of genes that is not initially contained in the pathways (due to the incompleteness of the current pathway database) can be involved in the factorization in a stochastic way, specifically if those genes are connected to the member genes of the pathways on the GGI network. We also present update rules for the posterior distributions in the framework of variational inference. As a full Bayesian method, our proposed method has several advantages over the current NMTF methods, which are demonstrated using synthetic datasets in experiments. Using The Cancer Genome Atlas (TCGA) gastric cancer and metastatic gastric cancer immunotherapy clinical-trial datasets, we show that our method could identify biologically and clinically relevant pathways associated with the molecular subtypes and immunotherapy response, respectively. Finally, we show that those pathways identified by the proposed method could be used as prognostic biomarkers to stratify patients with distinct survival outcome in two independent validation datasets. Additional information and code can be found at https://github.com/parks-cs-ccf/BayesianSNMTF.
Page 65
57
Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work
Tree-Weighting for Multi-Study Ensemble Learners
Maya Ramchandran1, Prasad Patil1,2, Giovanni Parmigiani1,2
1Department of Biostatistics, Harvard T.H. Chan School of Public Health; 2Department of Data Sciences, Dana-Farber Cancer Institute
Maya Ramchandran
Multi-study learning uses multiple training studies, separately trains classifiers on each, and forms an ensemble with weights rewarding members with better cross-study prediction ability. This article considers novel weighting approaches for constructing tree-based ensemble learners in this setting. Using Random Forests as a single-study learner, we compare weighting each forest to form the ensemble with extracting the individual trees trained by each Random Forest and weighting them directly. We find that incorporating multiple layers of ensembling in the training process by weighting trees increases the robustness of the resulting predictor. Furthermore, we explore how ensembling weights correspond to tree structure, to shed light on the features that determine whether weighting trees directly is advantageous. Finally, we apply our approach to genomic datasets and show that weighting trees improves upon the basic multi-study learning paradigm. Code and supplementary material are available at https://github.com/m-ramchandran/tree-weighting.
Page 66
58
Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work
PTRExplorer: An approach to identify and explore Post Transcriptional Regulatory mechanisms using proteogenomics
Arunima Srivastava1, Michael Sharpnack1, Kun Huang2, Parag Mallick3, Raghu Machiraju1
1The Ohio State University, 2Indiana University School of Medicine, 3Stanford University
Arunima Srivastava
Integration of transcriptomic and proteomic data should reveal multi-layered regulatory processes governing cancer cell behaviors. Traditional correlation-based analyses have demonstrated limited ability to identify the post-transcriptional regulatory (PTR) processes that drive the non-linear relationship between transcript and protein abundances. In this work, we propose an integrative approach to explore the variety of post-transcriptional mechanisms that dictate relationships between genes and corresponding proteins. The proposed workflow utilizes the intuitive technique of scatterplot diagnostics, or scagnostics, to characterize and examine the diverse scatterplots built from transcript and protein abundances in a proteogenomic experiment. The workflow includes representing gene-protein relationships as scatterplots, clustering on geometric scagnostic features of these scatterplots, and finally identifying and grouping the potential gene-protein relationships according to their disposition to various PTR mechanisms. Our study verifies the efficacy of the implemented approach to uncover possible regulatory mechanisms by utilizing comprehensive tests on a synthetic dataset. We also propose a variety of 2D pattern-specific downstream analysis methodologies, such as mixture modeling and mapping miRNA post-transcriptional effects, to explore each mechanism further. This work suggests that the proposed methodology has the potential for discovering and categorizing post-transcriptional regulatory mechanisms manifesting in proteogenomic trends. These trends subsequently provide evidence for cancer specificity, miRNA targeting, and identification of regulation impacted by biological functionality and different types of degradation.
Page 67
59
Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work
Network Representation of Large-Scale Heterogeneous RNA Sequences with Integration of Diverse Multi-omics, Interactions, and Annotations Data
Nhat Tran, Jean Gao
The University of Texas at Arlington
Jean Gao
Long non-coding RNA (lncRNA), microRNA, and messenger RNA enable key regulations of various biological processes through a variety of diverse interaction mechanisms. Identifying the interactions and cross-talk between these heterogeneous RNA classes is essential in order to uncover the functional role of individual RNA transcripts, especially for unannotated and sparsely discovered RNA sequences with no known interactions. Recently, sequence-based deep learning and network embedding methods are gaining traction as high-performing and flexible approaches that can either predict RNA-RNA interactions from sequence or infer missing interactions from patterns that may exist in the network topology. However, most of the current methods have several limitations, e.g., the inability to perform inductive predictions, to distinguish the directionality of interactions, or to integrate various sequence, interaction, expression, and genomic annotation datasets. We propose a novel deep learning framework, rna2rna, which learns from RNA sequences to produce a low-dimensional embedding that preserves proximities in both the interaction topology and the functional affinity topology. In this proposed embedding space, the two-part "source and target contexts" capture the receptive fields of each RNA transcript to encapsulate heterogeneous cross-talk interactions between lncRNAs and microRNAs. The proximity between RNAs in this embedding space also uncovers the second-order relationships that allow for accurate inference of novel directed interactions or functional similarities between any two RNA sequences. In a prospective evaluation, our method exhibits superior performance compared to state-of-the-art approaches at predicting missing interactions from several RNA-RNA interaction databases. Additional results suggest that our proposed framework can capture a manifold for heterogeneous RNA sequences to discover novel functional annotations.
Page 68
60
Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work
Hadoop and PySpark for reproducibility and scalability of genomic sequencing studies
Nicholas R. Wheeler1, Penelope Benchek1, Brian W. Kunkle2, Kara L. Hamilton-Nelson2, Mike Warfe1, Jeremy R. Fondran1, Jonathan L. Haines1, William S. Bush1
1Case Western Reserve University, 2University of Miami
William Bush
Modern genomic studies are rapidly growing in scale, and the analytical approaches used to analyze genomic data are increasing in complexity. Genomic data management poses logistic and computational challenges, and analyses are increasingly reliant on genomic annotation resources that create their own data management and versioning issues. As a result, genomic datasets are increasingly handled in ways that limit the rigor and reproducibility of many analyses. In this work, we examine the use of the Spark infrastructure for the management, access, and analysis of genomic data in comparison to traditional genomic workflows on typical cluster environments. We validate the framework by reproducing previously published results from the Alzheimer’s Disease Sequencing Project. Using the framework and analyses designed using Jupyter notebooks, Spark provides improved workflows, reduces user-driven data partitioning, and enhances the portability and reproducibility of distributed analyses required for large-scale genomic studies.
Page 69
61
Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work
CERENKOV3: Clustering and molecular network-derived features improve computational prediction of functional noncoding SNPs
Yao Yao, Stephen A. Ramsey
Oregon State University
Yao Yao
Identification of causal noncoding single nucleotide polymorphisms (SNPs) is important for maximizing the knowledge dividend from human genome-wide association studies (GWAS). Recently, diverse machine learning-based methods have been used for functional SNP identification; however, this task remains a fundamental challenge in computational biology. We report CERENKOV3, a machine learning pipeline that leverages clustering-derived and molecular network-derived features to improve prediction accuracy of regulatory SNPs (rSNPs) in the context of post-GWAS analysis. The clustering-derived feature, locus size (number of SNPs in the locus), derives from our locus partitioning procedure and represents the sizes of clusters based on SNP locations. We generated two molecular network-derived features from representation learning on a network representing SNP-gene and gene-gene relations. Based on empirical studies using a ground-truth SNP dataset, CERENKOV3 significantly improves rSNP recognition performance in AUPRC, AUROC, and AVGRANK (a locus-wise rank-based measure of classification accuracy we previously proposed).
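To make the locus size feature concrete, here is a small sketch that groups SNPs on one chromosome into loci and reports, for each SNP, how many SNPs share its locus. The abstract does not describe the actual locus partitioning procedure, so the gap-based rule below (start a new locus when the distance to the previous SNP exceeds `max_gap`) and all names are assumptions made purely for illustration.

```python
def locus_sizes(positions, max_gap):
    """Map each SNP position to the number of SNPs in its locus.

    Assumption for this sketch: consecutive SNPs (sorted by position on a
    single chromosome) belong to the same locus when they lie within
    `max_gap` base pairs of each other. This is a stand-in rule, not the
    CERENKOV3 partitioning procedure.
    """
    loci = []
    for pos in sorted(positions):
        if loci and pos - loci[-1][-1] <= max_gap:
            loci[-1].append(pos)   # close enough: extend the current locus
        else:
            loci.append([pos])     # too far away: start a new locus
    return {pos: len(locus) for locus in loci for pos in locus}


sizes = locus_sizes([100, 150, 5000, 5020, 5040, 90000], max_gap=1000)
```

Here `sizes[5020]` is 3 and `sizes[90000]` is 1; a value like this would then be attached to each SNP as one numeric feature alongside the others.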
Page 70
62
PRECISION MEDICINE: ADDRESSING THE CHALLENGES OF SHARING, ANALYSIS, AND PRIVACY AT SCALE
PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS
Page 71
63
Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale
AnomiGAN: Generative Adversarial Networks for Anonymizing Private Medical Data
Ho Bae, Dahuin Jung, Hyun-Soo Choi, Sungroh Yoon
Seoul National University
Ho Bae
Typical personal medical data contains sensitive information about individuals. Storing or sharing the personal medical data is thus often risky. For example, a short DNA sequence can provide information that can identify not only an individual, but also his or her relatives. Nonetheless, most countries and researchers agree on the necessity of collecting personal medical data. This stems from the fact that medical data, including genomic data, are an indispensable resource for further research and development regarding disease prevention and treatment. To prevent personal medical data from being misused, techniques to reliably preserve sensitive information should be developed for real world applications. In this paper, we propose a framework called anonymized generative adversarial networks (AnomiGAN), to preserve the privacy of personal medical data, while also maintaining high prediction performance. We compared our method to state-of-the-art techniques and observed that our method preserves the same level of privacy as differential privacy (DP) and provides better prediction results. We also observed that there is a trade-off between privacy and prediction results that depends on the degree of preservation of the original data. Here, we provide a mathematical overview of our proposed model and demonstrate its validation using UCI machine learning repository datasets in order to highlight its utility in practice. The code is available at https://github.com/hobae/AnomiGAN/
Page 72
64
Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale
Frequency of ClinVar pathogenic variants in chronic kidney disease patients surveyed for return of research results at a Cleveland public hospital
Dana C. Crawford1,2,3, John Lin1, Jessica N. Cooke Bailey1,2, Tyler Kinzy1, John R. Sedor4,5, John F. O'Toole5, William S. Bush1,2,3
1Cleveland Institute for Computational Biology, 2Departments of Population and Quantitative Health Sciences, and 3Genetics and Genome Sciences, Case Western Reserve University; 4Department of Physiology and Biophysics, Case Western Reserve University; and 5Department of Nephrology and Hypertension, Glickman Urology and Kidney and Lerner Research Institute, Cleveland Clinic
Dana Crawford
Return of results is not common in research settings as standards are not yet in place for what to return, how to return, and to whom. As a pioneer of large-scale return of research results, the Precision Medicine Initiative Cohort, now known as All of Us, plans to return pharmacogenomic results and variants of clinical significance to its participants starting late 2019. To better understand the local landscape of possibilities regarding return of research results, we assessed the frequency of pathogenic variants and APOL1 renal risk variants in a small diverse cohort of chronic kidney disease (CKD) patients ascertained from a public hospital in Cleveland, Ohio, genotyped on the Illumina Infinium MegaEX. Of the 23,720 ClinVar-designated variants directly assayed by the MegaEX, 8,355 (35%) had at least one alternate allele in the 130 participants genotyped. Of these, 18 ClinVar variants deemed pathogenic by multiple submitters with no conflicts in interpretation were distributed across 27 participants. The majority of these pathogenic ClinVar variants (14/18) were associated with autosomal recessive disorders. Of note were four African American carriers of TTR rs76992529 associated with amyloidogenic transthyretin amyloidosis, otherwise known as familial transthyretin amyloidosis (FTA). FTA, an autosomal dominant disorder with variable penetrance, is more common among African-descent populations compared with European-descent populations. Also common in this CKD population were APOL1 renal risk alleles G1 (rs73885319) and G2 (rs71785313) with 60% of the study population carrying at least one renal risk allele. Both pathogenic ClinVar variants and APOL1 renal risk alleles were distributed among participants who wanted actionable genetic results returned, wanted genetic results returned regardless of actionability, and wanted no results returned. Results from this local genetic study highlight challenges in which variants to report, how to interpret them, and the participant’s potential for follow-up, only some of the challenges in return of research results likely facing larger studies such as All of Us.
Page 73
65
Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale
Network-Based Matching of Patients and Targeted Therapies for Precision Oncology
Qingzhi Liu1, Min Jin Ha2, Rupam Bhattacharyya1, Lana Garmire3, Veerabhadran Baladandayuthapani1
1Department of Biostatistics, University of Michigan; 2Department of Biostatistics, The University of Texas MD Anderson Cancer Center; 3Department of Computational Medicine and Bioinformatics, University of Michigan
Qingzhi Liu
The extensive acquisition of high-throughput molecular profiling data across model systems (human tumors and cancer cell lines) and drug sensitivity data makes precision oncology possible, allowing clinicians to match the right drug to the right patient. Current supervised models for drug sensitivity prediction often use cell lines as exemplars of patient tumors and for model training. However, these models are limited in their ability to accurately predict drug sensitivity of individual cancer patients to a large set of drugs, given the paucity of patient drug sensitivity data used for testing and high variability across different drugs. To address these challenges, we developed a multilayer network-based approach to impute individual patients’ responses to a large set of drugs. This approach considers the triplet of patients, cell lines and drugs as one inter-connected holistic system. We first use the omics profiles to construct a patient-cell line network and determine best matching cell lines for patient tumors based on robust measures of network similarity. Subsequently, these results are used to impute the “missing link” between each individual patient and each drug, called Personalized Imputed Drug Sensitivity Score (PIDS-Score), which can be construed as a measure of the therapeutic potential of a drug or therapy. We applied our method to two subtypes of lung cancer patients, matched these patients with cancer cell lines derived from 19 tissue types based on their functional proteomics profiles, and computed their PIDS-Scores to 251 drugs and experimental compounds. We identified the best representative cell lines that conserve lung cancer biology and molecular targets. The PIDS-Score based top sensitive drugs for the entire patient cohort as well as individual patients are highly related to lung cancer in terms of their targets, and their PIDS-Scores are significantly associated with patient clinical outcomes. These findings provide evidence that our method is useful to narrow the scope of possible effective patient-drug matchings for implementing evidence-based personalized medicine strategies.
Page 74
66
Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale
Phenome-wide association studies on cardiovascular health and fatty acids considering phenotype quality control practices for epidemiological data
Kristin Passero1, Xi He1, Jiayan Zhou1, Bertram Mueller-Myhsok2,3,4, Marcus E. Kleber5, Winfried Maerz5,6,7, Molly A. Hall1
1Penn State; 2Max Planck Institute of Psychiatry; 3Munich Cluster of Systems Biology; 4University of Liverpool; 5Heidelberg University; 6SYNLAB Academy; 7Medical University of Graz
Kristin Passero
Phenome-wide association studies (PheWAS) allow agnostic investigation of common genetic variants in relation to a variety of phenotypes, but preserving the power of PheWAS requires careful phenotypic quality control (QC) procedures. While QC of genetic data is well-defined, no established QC practices exist for multi-phenotypic data. Manually imposing sample size restrictions, identifying variable types/distributions, and locating problems such as missing data or outliers is arduous in large, multivariate datasets. In this paper, we perform two PheWAS on epidemiological data and, utilizing the novel software CLARITE (CLeaning to Analysis: Reproducibility-based Interface for Traits and Exposures), showcase a transparent and replicable phenome QC pipeline which we believe is a necessity for the field. Using data from the Ludwigshafen Risk and Cardiovascular (LURIC) Health Study we ran two PheWAS, one on cardiac-related diseases and the other on polyunsaturated fatty acid levels. These phenotypes underwent a stringent quality control screen and were regressed on a genome-wide sample of single nucleotide polymorphisms (SNPs). Seven SNPs were significant in association with dihomo-γ-linolenic acid, of which five were within fatty acid desaturases FADS1 and FADS2. PheWAS is a useful tool to elucidate the genetic architecture of complex disease phenotypes within a single experimental framework. However, to reduce computational and multiple-comparisons burden, careful assessment of phenotype quality and removal of low-quality data is prudent. Herein we perform two PheWAS while applying a detailed phenotype QC process, for which we provide a replicable pipeline that is modifiable for application to other large datasets with heterogeneous phenotypes. As investigation of complex traits continues beyond traditional genome-wide association studies (GWAS), such QC considerations and tools such as CLARITE are crucial in the analysis of non-genetic big data such as clinical measurements, lifestyle habits, and polygenic traits.
Page 75
67
Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale
aTEMPO: Pathway-Specific Temporal Anomalies for Precision Therapeutics
Christopher Michael Pietras, Liam Power, Donna K. Slonim
Tufts University
Christopher Pietras
Dynamic processes are inherently important in disease, and identifying disease-related disruptions of normal dynamic processes can provide information about individual patients. We have previously characterized individuals' disease states via pathway-based anomalies in expression data, and we have identified disease-correlated disruption of predictable dynamic patterns by modeling a virtual time series in static data. Here we combine the two approaches, using an anomaly detection model and virtual time series to identify anomalous temporal processes in specific disease states. We demonstrate that this approach can informatively characterize individual patients, suggesting personalized therapeutic approaches.
Page 76
68
Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale
Feature Selection and Dimension Reduction of Social Autism Data
Peter Washington1, Kelley Marie Paskov1, Haik Kalantarian1, Nathaniel Stockham1, Catalin Voss1, Aaron Kline1, Ritik Patnaik2, Brianna Chrisman1, Maya Varma1, Qandeel Tariq1, Kaitlyn Dunlap1, Jessey Schwartz1, Nick Haber1, Dennis P. Wall1
1Stanford University, 2Massachusetts Institute of Technology
Peter Washington
Autism Spectrum Disorder (ASD) is a complex neuropsychiatric condition with a highly heterogeneous phenotype. Following the work of Duda et al., which uses a reduced feature set from the Social Responsiveness Scale, Second Edition (SRS) to distinguish ASD from ADHD, we performed item-level question selection on answers to the SRS to determine whether ASD can be distinguished from non-ASD using a similarly small subset of questions. To explore feature redundancies between the SRS questions, we performed filter, wrapper, and embedded feature selection analyses. To explore the linearity of the SRS-related ASD phenotype, we then compressed the 65-question SRS into low-dimensional representations using PCA, t-SNE, and a denoising autoencoder. We measured the performance of a multi-layer perceptron (MLP) classifier with the top-ranking questions as input. Classification using only the top-rated question resulted in an AUC of over 92% for SRS-derived diagnoses and an AUC of over 83% for dataset-specific diagnoses. High redundancy of features has implications towards replacing the social behaviors that are targeted in behavioral diagnostics and interventions, where digital quantification of certain features may be obfuscated due to privacy concerns. We similarly evaluated the performance of an MLP classifier trained on the low-dimensional representations of the SRS, finding that the denoising autoencoder achieved slightly higher performance than the PCA and t-SNE representations.
Page 77
69
ARTIFICIAL INTELLIGENCE FOR ENHANCING CLINICAL MEDICINE
POSTER PRESENTATIONS
Page 78
70
Artificial Intelligence for Enhancing Clinical Medicine
Prioritizing Copy Number Variants using Phenotype and Gene Functional Similarity
Azza Althagafi, Jun Chen, Robert Hoehndorf
Computer, Electrical & Mathematical Science and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, 23955-6900, Thuwal, Kingdom of Saudi Arabia
Azza Althagafi
There are many types of genetic variation in the human genome, ranging from large chromosome anomalies to Single Nucleotide Variants (SNVs). It is becoming necessary to develop methods for distinguishing disease-causing variants from the large number of neutral genetic variants in an individual. This problem is also relevant to Copy Number Variants (CNVs), which are a class of genetic variation where large segments of the genome differ in copy number amongst various individuals. Over the past several years, much progress has been made in the area of CNV detection and understanding their role in human diseases. We now understand that CNVs account for much of human variability. Correspondingly, there have been several methods introduced to find disease-associated genes and SNVs. Different methods have been developed for predicting and prioritizing pathogenicity of SNVs found within a genome. Constructing similar methods for CNVs is challenging due to the heterogeneity in variant size, type and the possibility of multiple genes being affected by large CNVs. CNV impact prediction methods should consider these factors in order to robustly prioritize pathogenic variants. We have built a method that incorporates biological background knowledge about the relation between phenotypes resulting from a loss of function in mouse genes, gene functions as described using the Gene Ontology (GO), as well as the anatomical site of gene expression, along with a score that predicts the pathogenicity of CNVs (SVScore). We use this information to build a machine learning model that ranks CNVs based on their predicted pathogenicity and the relation between genes affected by the CNV and the phenotype we observe in affected individuals. Additionally, our approach considers several genomic features of each CNV, such as the length of the coding sequence overlapping with the CNV, haploinsufficiency and triplosensitivity scores to measure the dosage-sensitivity for genes/regions, and GC content. Our results show that incorporating this information leads to improvement over a baseline model which uses only similarity scores between gene-phenotype associations and disease-associated phenotypes, as well as improvement over using only pathogenicity prediction methods for CNVs. Our method achieves an F-score of 80.85%, with 82.05% precision and 79.67% recall in our evaluation set. The results demonstrate that incorporating phenotype, functional, and gene expression information may be utilized to identify causative CNVs. Future work is required to evaluate and improve our model using patient-derived WGS data.
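As a quick consistency check on the figures above, and assuming the reported F-score is the balanced F1 measure (the abstract does not say which variant is meant), the harmonic mean of precision P and recall R can be worked out directly:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 82.05 \times 79.67}{82.05 + 79.67}
    = \frac{13073.85}{161.72}
    \approx 80.84
```

This agrees with the reported 80.85% to within the rounding of the precision and recall values.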
Page 79
71
Artificial Intelligence for Enhancing Clinical Medicine
Inferring the Reward Functions that Guide Cancer Progression
John Kalantari1, Heidi Nelson2, Nicholas Chia3
1Microbiome Program, Center for Individualized Medicine, Mayo Clinic, Rochester, MN, USA; 2Colon and Rectal Surgery, Mayo Clinic, Rochester, MN, USA; 3Division of Surgical Research, Department of Surgery, Mayo Clinic, Rochester, MN, USA
John Kalantari
Cancer can occur in patients with different genetic backgrounds via a multi-step evolutionary process, i.e., driven by modification and selection, that can accumulate different genetic alterations. Despite these differences, many cancer subtypes are unified by similar mechanisms or types of genetic changes. In other words, there are multiple etiological paths tied together by specific events that share commonality in their causal mechanism. Understanding these common mechanisms will enable the development of better therapies and preventative measures. It will also enable improved prediction of recurrence and metastatic advancement of cancer, directly impacting the 606,880 annual cancer deaths in the United States alone. Our work is built upon the central proposition that the Markov Decision Process (MDP) can better represent the process by which cancer arises and progresses. More specifically, by encoding a cancer cell's complex behavior as an MDP, we seek to model the series of genetic changes, or evolutionary trajectory, that leads to cancer as an optimal decision process. We posit that using an Inverse Reinforcement Learning (IRL) approach will enable us to reverse engineer an optimal policy and reward function based on a set of expert demonstrations extracted from the DNA of patient tumors. The inferred reward function and optimal policy can subsequently be used to extrapolate the evolutionary trajectory of any tumor. We introduce a novel data-agnostic artificial intelligence framework which can infer reward functions describing the causal mechanisms that best explain the observed behavior of an 'optimally-behaving agent', the cancer cell. Using multi-omic data from 27 colorectal cancer (CRC) patients as proof-of-principle, we show that IRL provides a systematic and scalable approach to formally stating and solving the problem of cancer evolution. By providing a lineage path (i.e., sequences of alterations) obtained via subclonal reconstruction for each tumor, we are able to reduce this complex problem to the recovery of an associated reinforcement learning reward function. These reward functions have the potential to model unknown molecular mechanisms driving intratumor heterogeneity and to elucidate cancer etiologies.
Page 80
72
Artificial Intelligence for Enhancing Clinical Medicine
Predicting disease-associated mutation of metal-binding sites in proteins using a deep learning approach
Mohamad Koohi-Moghadam, Haibo Wang, Yuchuan Wang, Xinming Yang, Hongyan Li, Junwen Wang, Hongzhe Sun
Department of Chemistry, The University of Hong Kong, Hong Kong, China; Department of Health Sciences, Mayo Clinic, Scottsdale, AZ, USA; Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Scottsdale, AZ, USA; Center for Individualized Medicine, Mayo Clinic, Scottsdale, AZ, USA; College of Health Solutions, Arizona State University, Scottsdale, AZ, USA
Junwen Wang
Metalloproteins play important roles in many biological processes. Mutations at the metal-binding sites may functionally disrupt metalloproteins, initiating severe diseases; however, there has been no effective approach to predict such mutations until now. Here we develop a deep learning approach to successfully predict disease-associated mutations that occur at the metal-binding sites of metalloproteins. We generate energy-based affinity grid maps and physicochemical features of the metal binding pockets (obtained from different databases as spatial and sequential features) and subsequently implement these features into a multichannel convolutional neural network. After training the model, the network can successfully predict disease-associated mutations that occur at the first and second coordination spheres of zinc-binding sites with an area under the curve of 0.90 and an accuracy of 0.82. Our approach is the first deep learning approach for the prediction of disease-associated metal-relevant site mutations in metalloproteins, providing a new platform to tackle human diseases.
Page 81
73
Artificial Intelligence for Enhancing Clinical Medicine
GENERAL
POSTER PRESENTATIONS
Page 82
74
General
Ranking RAS pathway mutations using evolutionary history of MEK1
Katia Andrianova, Igor Jouline
Ohio State University, Department of Microbiology, Columbus, Ohio 43210
Katia Andrianova
The Ras/MAPK (rat sarcoma/mitogen-activated protein kinase) signaling pathway is involved in essentially all aspects of organismal development, from the first cell divisions in the early embryo to postnatal development and growth. Given its critical function, it is not surprising that deregulated Ras/MAPK signaling, resulting from either genetic or environmental perturbations, can lead to cancer and developmental abnormalities. A large class of such abnormalities, known as RASopathies, is associated with activating germ-line mutations in many components of the Ras pathway. Over the past decade, when next generation sequencing (NGS) has become a valuable and cost-effective tool for research applications and clinical diagnostics of Mendelian diseases, simultaneous sequencing of multiple genes in MAPK signaling pathways has yielded many reports with hundreds of mutations possibly associated with RASopathies and cancer. In particular, multiple new mutations were identified in MEK1 kinase. The majority of newly discovered coding variations have neither been described in other individuals nor been studied or functionally analyzed in cellular or animal models, thus leaving clinicians to rely on in silico predictions of the consequences of “variants of uncertain significance” with computational software, such as PolyPhen and SIFT. Automated sequence searches used in these methods do not distinguish possible duplication events in the genes’ histories, hence multiple sequence alignment (MSA) sets usually include both ortholog and paralog copies. As purifying selection treads on one of the duplicate copies, it can become associated with a different phenotype compared to its paralogous sibling and/or to the parental gene. In most cases of Mendelian diseases only one specific duplicate of the gene in the human genome turns out to be associated with a disease. This indicates the importance of considering both common ancestors and any gene’s duplication history for variant interpretation. The presence of seven human MEK proteins increases the chances of including paralogs in the analysis, and therefore substantially limits mutation interpretation. In this study we established the first precise description of an evolutionary history of MEK kinases and identified potential duplication events. We determined that MEK1 is an ancestor of the entire MEK family. In-depth analysis of the orthologous proteins showed that essentially all experimentally proven pathogenic mutations were predicted as “damaging” by our approach. By comparing our results with the predictions made by PolyPhen-2 and SIFT we showed how careful analysis of an evolutionary history of a gene may improve accuracy of the prediction of missense mutation outcomes.
Page 83
75
General
Integrative Analysis of COPD and Lung Cancer Metadata Reveals Shared Alterations in Immune Response, PTEN and PI3K-AKT Pathways
Dannielle Skander1, Arda Durmaz1, Mohammed Orloff2, Gurkan Bebek1
1Case Western Reserve University, 2University of Arkansas for Medical Sciences
Gurkan Bebek
Chronic obstructive pulmonary disease (COPD) and lung cancer are among the leading causes of death worldwide. While it is believed the two diseases are related, the mechanisms behind this relationship remain unclear. We investigate the relationship between COPD and lung cancer using an integrative-omics approach. Integration of epigenetic and mRNA gene expression data allows us to discover the functionally relevant genes, i.e., the genes crucial for disease development. Using this approach, our study suggests that the mechanisms driving the development of both diseases are related to the interleukin immune response (IL4 and IL17), PTEN and PI3K-AKT pathways. Understanding this relationship between COPD and lung cancer is crucial for future prevention and treatment options of both COPD and lung cancer.
Page 84
76
General
Investigatingsourcesofirreproducibilityinanalysisofgeneexpressiondata
CarlyA.Bobak,JaneE.Hill
DartmouthCollege
Carly Bobak
The use of big data promises to change the landscape of biomedical research; however, irreproducibility of results remains a problem. In this work, we set out to investigate proposed methods to increase reproducibility of gene expression results. Specifically, we test the following three hypotheses: (1) results from pathway enrichment will be more similar across datasets than results on differentially expressed (DE) genes; (2) similarity across smaller datasets will be lower than similarity in larger datasets; (3) results from multi-cohort data will be more similar than results from single cohort data. We selected three unique datasets from the Gene Expression Omnibus that include active TB patients, spanning pediatric and adult patients. In each dataset we ranked DE genes as they were associated with TB vs other (healthy controls, other diseases, or latent tuberculosis infection). We then calculated the rank biased overlap (RBO) of the ranked genes across each dataset. RBO is a similarity measure scaled between 0 and 1 and can be interpreted as the average agreement between two lists. Gene set enrichment analysis (GSEA) was performed, and we calculated a rank for the pathway hits and compared RBO for associated pathways between datasets. On average, the RBO increased by a fold change of 1.83×10^4 when comparing similarity of associated pathways to similarity of DE genes. We then divided each dataset in half and repeated the analysis on all sub-datasets. Sub-datasets from the same parent dataset had similar results (mean RBO of 0.60, sd=0.24) as opposed to subsets from a different parent dataset (mean=0.10, sd=0.15). Contradicting our original hypothesis, overall RBO calculated between subsets from different parent datasets did not necessarily decrease compared to the initial RBO calculation; in fact, half of the RBO comparisons increased in the sub-datasets compared to using the whole datasets. To test the final hypothesis, we co-normalized, merged, and then randomly divided datasets into three approximately equal pieces. We repeated the DE analysis on each piece of the merged dataset. Across mixed datasets, the mean RBO was 0.023 (sd=0.43). Heterogeneous datasets were more alike than unique datasets, but less alike than a single divided dataset. However, the RBOs from mixed datasets compared to original datasets were not statistically significantly different from the RBOs comparing results from the original datasets. Thus, we demonstrated that associated pathways are greatly more reproducible than associated genes. Further study is necessary to investigate the conditions under which statistical power and heterogeneity of data influence reproducibility of findings from gene expression studies.
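As an illustration of the similarity measure used above, the following is a minimal sketch of a truncated rank-biased overlap between two ranked lists. The abstract does not say which RBO variant or persistence parameter was used, so the function name `rbo_truncated`, the default `p=0.9`, and the truncation at the shorter list's length are assumptions made for this example only.

```python
def rbo_truncated(ranking_a, ranking_b, p=0.9):
    """Truncated rank-biased overlap of two duplicate-free rankings.

    Computes (1 - p) * sum_{d=1..k} p^(d-1) * |A[:d] & B[:d]| / d,
    where k is the length of the shorter ranking. This is a lower
    bound on the full (infinite-depth) RBO; identical lists of
    length k score 1 - p**k rather than exactly 1.
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    depth = min(len(ranking_a), len(ranking_b))
    seen_a, seen_b = set(), set()
    overlap = 0      # size of the intersection of the two prefixes
    score = 0.0
    for d in range(1, depth + 1):
        item_a, item_b = ranking_a[d - 1], ranking_b[d - 1]
        if item_a == item_b:
            overlap += 1
        else:
            # A new item adds to the overlap if the other list
            # already contained it at a shallower depth.
            if item_a in seen_b:
                overlap += 1
            if item_b in seen_a:
                overlap += 1
        seen_a.add(item_a)
        seen_b.add(item_b)
        score += p ** (d - 1) * overlap / d
    return (1 - p) * score


# Hypothetical gene rankings from two datasets.
ranked_1 = ["geneA", "geneB", "geneC"]
ranked_2 = ["geneB", "geneA", "geneD"]
similarity = rbo_truncated(ranked_1, ranked_2, p=0.5)
```

Smaller values of `p` weight the top of the lists more heavily, which is why agreement among the highest-ranked genes or pathways dominates the score.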
Page 85
77
General
Ethereum and MultiChain blockchains as secure tools for individualized medicine
Charlotte Brannon, Gamze Gursoy, Sarah Wagner, Mark Gerstein
Yale University Computational Biology and Bioinformatics Program
Charlotte Brannon
With the rapidly decreasing cost of genome sequencing and advent of individualized medicine, reliance on individual genomic data will soon be integral to medical treatment decisions. For example, a patient’s personal genomic sequence will provide physicians with information on which to base tests and diagnoses. Similarly, pharmacogenomics data will reveal the most effective prescriptions for a particular patient. Genomic data will need to be shared efficiently among multiple parties. However, because these are sensitive personal data which will directly impact medical treatment decisions, they must be maintained in a secure, high-integrity fashion. Blockchain technology is one way to achieve secure, high-integrity data storage. We present two proof-of-concept solutions, one for storing and querying personal genomic sequence data in a MultiChain blockchain designed for direct sharing with physicians; and one for storing and querying gene-drug interaction data in an Ethereum blockchain smart contract designed for shared access among permissioned researchers and physicians. Despite the high security and integrity that comes with blockchain data storage, there is a trade-off with data access efficiency and storage costs. We overcome these challenges by developing novel storage techniques. When storing personal genomic sequence data, we do not store the actual sequence data but rather a set of meta-data which can be used in combination with a reference genome to reconstruct the original sequences. When storing pharmacogenomics data, we use an index-based, multi-mapping approach to provide time- and space-efficient insertion and querying.
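The idea of storing only what is needed to rebuild a sequence from a reference can be sketched as follows. This is not the authors' storage format, which the abstract does not describe; it is a deliberately simplified, substitution-only illustration, and the function names and the `(position, base)` record layout are assumptions.

```python
def encode_against_reference(reference, sample):
    """Return the positions where `sample` differs from `reference`.

    Simplifying assumption: both sequences have the same length, so
    only substitutions (no insertions or deletions) are represented.
    """
    if len(reference) != len(sample):
        raise ValueError("this sketch only handles equal-length sequences")
    return [
        (position, sample_base)
        for position, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


def reconstruct(reference, differences):
    """Rebuild the original sequence from the reference plus stored differences."""
    bases = list(reference)
    for position, base in differences:
        bases[position] = base
    return "".join(bases)


# Toy sequences; only two small records would need to be stored.
reference = "ACGTACGT"
sample = "ACCTACGA"
stored = encode_against_reference(reference, sample)
restored = reconstruct(reference, stored)
```

Because most of a personal genome matches the reference, the stored records are far smaller than the sequence itself, which is the property that matters when on-chain storage is expensive.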
Page 86
78
General
Genomic predictors of L-asparaginase-induced pancreatitis in pediatric cancer patients
Britt I. Drögemöller, Galen E.B. Wright, Shahrad R. Rassekh, Shinya Ito, Bruce C. Carleton, Colin J.D. Ross, The Canadian Pharmacogenomics Network for Drug Safety Consortium
Faculty of Pharmaceutical Sciences, University of British Columbia, Vancouver, BC, Canada; BC Children’s Hospital Research Institute, University of British Columbia, Vancouver, BC, Canada; Department of Pediatrics, Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada; Clinical Pharmacology and Toxicology, The Hospital for Sick Children, University of Toronto, Toronto, ON, Canada; Pharmaceutical Outcomes Programme, BC Children’s Hospital, Vancouver, BC, Canada
Britt Drögemöller
Background: L-asparaginase is highly effective in the treatment of pediatric acute lymphoblastic leukemia. Unfortunately, the use of this treatment is limited by the occurrence of pancreatitis, a severe and potentially lethal adverse drug reaction, which occurs in 2-18% of patients. As previous studies have been unable to identify strong associations between clinical variables and susceptibility to L-asparaginase-induced pancreatitis, genetic factors are expected to play an important role in this adverse drug reaction. Objectives: We sought to explore the role of these genetic susceptibility factors in L-asparaginase-induced pancreatitis in pediatric cancer patients. Methods: Patients who were treated with L-asparaginase were recruited from 13 pediatric oncology units across Canada (n=284) and extensive clinical data were collected for all patients. Genotyping was performed using the Illumina HumanOmniExpress and Global Screening Arrays and pancreatic gene expression profiles were imputed in these individuals using GTEx v7 and S-PrediXcan. Genome- and transcriptome-wide association studies (GWAS and TWAS) were performed to identify associations with L-asparaginase-induced pancreatitis. Results: GWAS analyses identified significant associations between genetic variants in HLA-DQA1 and -DRB1 and pancreatitis, while TWAS revealed that individuals experiencing L-asparaginase-induced pancreatitis exhibited lower expression levels of HLA-DRB5. Further interrogation of the TWAS data revealed an enrichment in genes involved in the somatic diversification of immune receptors. Conclusions: These analyses uncovered an association between genetic variation in immune-related genes and the development of L-asparaginase-induced pancreatitis. These associations mirror previous associations of the HLA region with (i) pancreatitis induced by other drugs and (ii) L-asparaginase-induced hypersensitivity.
Page 87
79
General
NITECAP: A novel method and interface for the identification of circadian behavior in highly parallel time-course data
Thomas G. Brooks1, Cris W. Lawrence1, Nicholas F. Lahens1, Soumyashant Nayak1, Dimitra Sarantopoulou1, Garret A. FitzGerald1,2, Gregory R. Grant3
1Institute for Translational Medicine and Therapeutics (ITMAT), University of Pennsylvania; 2Systems Pharmacology and Translational Therapeutics; 3Department of Genetics, University of Pennsylvania
Thomas Brooks
We introduce a new tool called NITECAP for the task of identifying circadian behavior in massively parallel measurements of biological entities; for example, finding circadian genes from gene expression time course data measured by RNA-Seq or microarrays. NITECAP employs a permutation-based approach which uses a novel statistic designed to be sensitive to circadian behavior. NITECAP also uses an approach to multiple-testing which produces q-values directly without needing to first generate p-values which then need to be adjusted. Our approach has several advantages, particularly when individual p-values are underpowered or unreliable. Importantly, we have developed an intuitive, user-friendly, web-based interface which enables investigators to perform robust circadian analyses of this type directly without expert informatics support. Users can quickly scroll through time course profiles sorted by effect size, greatly facilitating the choice of significance thresholds that currently require making blind choices of numerical cutoffs. Putting this type of analysis in the hands of the investigators can significantly streamline their research. The website also enables the other standard significance tests such as JTK and ANOVA and provides tools to perform comparative studies, such as finding phase or amplitude differences between different conditions. NITECAP is freely available for public use at: http://www.nitecap.org
Page 88
80
General
The Interplay of Obesity and Race/Ethnicity on Major Perinatal Complications
Yaadira Brown, MPH1; Olubode A. Olufajo, MD, MPH2; Edward E. Cornwell III, MD2; William Southerland, PhD3
1Research Centers in Minority Institutions: Howard University, Howard University College of Medicine; 2Research Centers in Minority Institutions: Howard University, Clive Callender Howard-Harvard Health Sciences Outcomes Research Center; 3Research Centers in Minority Institutions: Howard University
Yaadira Brown
Background: It has been established that a significant disparity exists in the rates of adverse perinatal outcomes across different racial/ethnic groups, with non-Hispanic Black women generally being most impacted. There is also evidence that obesity is associated with adverse perinatal outcomes. Although some studies have examined the impact of race/ethnicity and obesity on adverse perinatal outcomes, most studies have done so using local or statewide data. This study aims to use a national sample to determine the role of obesity in the racial/ethnic disparities seen in adverse perinatal outcomes in the United States. Methods: Data from the National Inpatient Sample were used to select pregnant women admitted for delivery between 2010 and 2014. Demographics (race/ethnicity, insurance type, household income, co-morbidities) and hospital characteristics were extracted. Race/ethnicity was categorized as Non-Hispanic Whites (NHW), Non-Hispanic Blacks (NHB), and Hispanics. Outcomes of interest were gestational diabetes, pre-eclampsia, pre-term birth, and hospital mortality. Multivariate logistic regressions were performed to determine the independent predictors of the outcomes, using two sets of models: one which included obesity as a variable in the model and one which did not. The differences between the two sets of models were compared by performing the Wald Test. Results: Our cohort consisted of 15,561,942 pregnant individuals admitted for delivery. There were 9,247,729 (59.43%) NHW, 2,552,569 (16.4%) NHB, and 3,761,644 (24.17%) Hispanic. Compared to other groups, NHB had significantly higher rates of pre-eclampsia (5.1%), pre-term birth (9.4%), and hospital mortality (0.11%). They also had the highest rates of obesity (9.0%). On multivariate analysis, NHB were more likely to have pre-eclampsia (Adjusted Odds Ratio [aOR] 1.26; 95% Confidence Interval [CI] 1.23-1.29), pre-term birth (aOR 1.38; 95% CI 1.34-1.41), and hospital mortality (aOR 2.05; 95% CI 1.2-3.38) when compared to NHW. However, they had a similar risk for gestational diabetes (aOR 0.94; 95% CI 0.91-0.96) as NHW. Obesity was significantly associated with gestational diabetes (aOR 3.08; 95% CI 3.02-3.15), pre-eclampsia (aOR 2.14; 95% CI 2.09-2.19), and pre-term birth (aOR 1.04; 95% CI 1.01-1.06). Although the differences were minimal, the regression models that included obesity as a variable better predicted the outcomes than those that did not when assessing gestational diabetes, pre-eclampsia, and pre-term birth. Conclusion: These findings further confirm that racial/ethnic disparities exist amongst adverse perinatal outcomes, with NHB being disproportionately affected. They also suggest that obesity plays a significant role in the racial/ethnic disparities that do exist for the adverse perinatal outcomes measured, other than hospital mortality. These data suggest that addressing obesity in the population may be beneficial in improving perinatal outcomes, but they also suggest that more research is needed to identify the major factors that drive the racial/ethnic disparities that exist amongst perinatal outcomes in the United States.
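For readers less familiar with how an odds ratio and its 95% confidence interval are read, here is a small sketch of the unadjusted calculation from a 2×2 table using the standard Wald interval on the log scale. The abstract reports adjusted odds ratios from multivariate logistic regression, which this does not reproduce; the function name and the counts below are hypothetical and are not study data.

```python
import math


def odds_ratio_with_ci(exposed_cases, exposed_noncases,
                       unexposed_cases, unexposed_noncases, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    z=1.96 corresponds to a 95% interval. All four cell counts must
    be positive, otherwise the log-scale standard error is undefined.
    """
    cells = (exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases)
    if min(cells) <= 0:
        raise ValueError("all cell counts must be positive")
    a, b, c, d = cells
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * standard_error)
    upper = math.exp(math.log(odds_ratio) + z * standard_error)
    return odds_ratio, lower, upper


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
estimate, ci_lower, ci_upper = odds_ratio_with_ci(10, 20, 5, 40)
```

An interval that excludes 1 indicates an association at the chosen confidence level; with cohorts in the millions, as in the study above, intervals become very narrow.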
Page 89
81
General
A Comparison of Pharmacogenomic Information in FDA-Approved Drug Labels and CPIC Guidelines
Katherine I. Carrillo1, Teri E. Klein2
1Henry M. Gunn High School, Palo Alto, CA; 2Stanford University, Stanford, CA
Katherine Carrillo
Pharmacogenomics (PGx) is useful in helping to predict a patient’s likely reaction to a medication based on their genotype, allowing for personalized medicine. The FDA maintains a “Table of Pharmacogenomic Biomarkers in Drug Labeling” (https://www.fda.gov/drugs/science-and-research-drugs/table-pharmacogenomic-biomarkers-drug-labeling) consisting of pharmacogenomic information found in the drug labeling. However, many labels on the list do not contain advice for a clinician about how or when to use a patient’s genetic information. Guidelines created by the Clinical Pharmacogenetics Implementation Consortium (CPIC; https://cpicpgx.org/) contain information about how to use patient genetic information when prescribing drugs. Also, CPIC provides guidelines for some drugs not currently on the FDA biomarker list, though it does not provide guidelines for every drug on the biomarker list. Using PharmGKB annotated FDA-approved labels (through October 2019), we evaluated label information to determine (1) which labels contained any kind of prescribing information including a suggested alternate drug, dosing information or special considerations based on the patient’s genotype/metabolizer status, (2) which PharmGKB annotated labels were present on the FDA biomarker list, and (3) what genes were involved. We did not include FDA labels annotated for genetic variation in cancer cells; only germline variation was included. We compared all available CPIC guideline recommendations to the information from the labels. We identified where the labels and guidelines are similar and where they differ. PharmGKB has 223 annotations (not including 82 annotations for cancer cell DNA variation) based on 219 FDA-approved drug labels. Of these, 199 labels are currently on the biomarker list and 17 were on the biomarker list at one time but have been removed by the FDA. Twenty labels have dosing information and 35 recommend an alternate drug based on genotype/metabolizer phenotype. Another 34 labels have some other special consideration, but most labels on the biomarker list (136) have no guidance for clinicians about what to do about the biomarker, if anything. There are 45 drugs with published CPIC guidelines (https://cpicpgx.org/genes-drugs/). Thirty-six of the drugs have a label on the FDA biomarker list but the information on the label does not always match the guideline. Only 21 of the CPIC drugs have labels with guidance. For some drugs, the PGx information on the labels is similar to the CPIC guidelines, but for many others it differs. The FDA biomarker list has more drugs than CPIC has guidelines written, and in some cases the labels tell clinicians when they should test a patient while CPIC doesn’t talk about testing. However, for most drugs, the labels don’t give the clinicians a lot of information about what to do with their patients’ genetic test results. For the drugs with CPIC guidelines, there is more information about how to use genetic test results and why. Funded by NIH/NIGMS R24 GM61374.
Page 90
82
General
xTEA: a transposable element insertion analyzer for genome sequencing data from multiple technologies
Chong Chu1, Rebeca Monroy2, Soohyun Lee1, E. Alice Lee2, Peter J. Park1
1Harvard Medical School, 2Boston Children's Hospital
E. Alice Lee
Transposable elements (TEs) comprise nearly 50% of the human genome. Although most of the TEs are now silent, several types of retrotransposons including LINE-1, Alu, and SVA are still active. Somatic TE insertions have been shown to occur frequently in multiple tumor types [1,2] and at a low rate in neurons of phenotypically normal individuals [3]. Multiple tools have been developed to call TE insertions from genome sequencing data, but an efficient tool that can identify both germline and somatic TE insertions with high sensitivity and specificity is still lacking. Moreover, newer technologies such as 10X Linked-Read and PacBio or Nanopore long read sequencing provide an unprecedented opportunity to study TEs; however, current methods do not take advantage of these data types. Here, we present a new computational tool xTEA, building on our previous algorithm TEA [1]. This tool identifies TE insertions from Illumina paired-end reads, 10X Linked-Reads, long reads, or a combined dataset. xTEA outperforms MELT [4] and TraFiC-mem [5] on normal and tumor Illumina data, respectively. A comparison of different sequencing platforms reveals that the analysis of long reads had greater sensitivity and specificity, especially in repetitive regions. Both 10X Linked-Reads and long reads demonstrated clear advantages over short reads in constructing full length TE insertions. Better performance was achieved on hybrid data compared to single platform data. Using 22 human samples with either PacBio or Nanopore long reads and matched short reads, we uncovered LINE-1 internal SV hotspots and SVA internal VNTR expansion. xTEA is a comprehensive cross-platform TE insertion-calling tool. It can be deployed on a computing cluster, AWS, and Google Cloud, and is efficient for large cohort analysis. xTEA is publicly available at https://github.com/parklab/xTEA.
References
[1] Lee, Eunjung, et al. "Landscape of somatic retrotransposition in human cancers." Science 337.6097 (2012): 967-971.
[2] Rodriguez-Martin, Bernardo, et al. "Pan-cancer analysis of whole genomes reveals driver rearrangements promoted by LINE-1 retrotransposition in human tumours." bioRxiv (2017): 179705.
[3] Evrony, Gilad D., et al. "Cell lineage analysis in human brain using endogenous retroelements." Neuron 85.1 (2015): 49-59.
[4] Gardner, Eugene J., et al. "The Mobile Element Locator Tool (MELT): population-scale mobile element discovery and biology." Genome Research 27.11 (2017): 1916-1929.
[5] Tubio, Jose M.C., et al. "Extensive transduction of nonrepetitive DNA mediated by L1 retrotransposition in cancer genomes." Science 345.6196 (2014): 1251343.
Page 91
83
General
Go Get Data (GGD): simple, reproducible access to scientific data
Michael Cormier1, Jon Belyeu1, Brent Pedersen1, Joe Brown1, Johannes Koster2, Aaron R. Quinlan1
1Department of Human Genetics, University of Utah, Salt Lake City, UT, USA; 2Algorithms for reproducible bioinformatics, Institute of Human Genetics, University of Duisburg-Essen, Essen, NRW, Germany
Aaron Quinlan
Genomics research is complicated by the difficulty of identifying, collecting, and integrating the numerous datasets and annotations germane to our experiments. Furthermore, these data exist in disparate sources, and are stored in diverse, often abused formats pertaining to different genome builds. These complexities waste time, inhibit reproducibility, and curtail research creativity. Inspired by the success of software package managers, we have developed Go Get Data (GGD; https://gogetdata.github.io/) as a fast, reproducible approach to install standardized packages of data and annotations for genomics research.
Page 92
84
General
Global epigenomic regulation of gene expression and cellular proliferation in T-cell leukemia
Sinisa Dovat, Yali Ding, Bo Zhang, Jonathon L. Payne, Feng Yue
Pennsylvania State University College of Medicine, Hershey, PA, USA
Sinisa Dovat
Ikaros encodes a DNA-binding protein that functions as a tumor suppressor in T-cell acute lymphoblastic leukemia (T-ALL). Deletion and/or functional inactivation of Ikaros results in the development of high-risk leukemia. The mechanisms through which Ikaros regulates gene expression and tumor suppression in T-ALL are unknown. Ikaros haplo-knockout mice develop T-ALL with 100% penetrance with arrest of T-cell differentiation. During the process of malignant transformation to T-ALL, Ikaros haploinsufficient thymocytes lose their remaining wild type Ikaros allele. Re-introduction of Ikaros into Ikaros-null T-ALL cells results in cessation of cellular proliferation and induction of T-cell differentiation. Thus, this is an optimal system for studying Ikaros tumor suppressor function because it captures the role of Ikaros in the transition from a malignant state (Ikaros-null T-ALL) to a non-malignant state (following Ikaros re-introduction). We used ATAC-seq and ChIP-seq of H3K4me1, H3K4me3, H3K27ac, and Ikaros to perform dynamic, global epigenomic and gene expression analyses at several time points in Ikaros-null T-ALL and following Ikaros re-introduction in order to determine the mechanisms of Ikaros’ tumor suppressor activity. Expression analysis identified a large number of novel signaling pathways that are directly regulated by Ikaros and Ikaros-induced enhancers, and that are responsible for the cessation of proliferation and induction of T-cell differentiation in T-ALL cells. Epigenomic analysis identified novel Ikaros functions in the epigenetic regulation of gene expression: Ikaros directly regulates de novo formation and depletion of enhancers; de novo formation of active enhancers and activation of poised enhancers; and Ikaros directly induces the formation of super-enhancers. Global analysis of chromatin accessibility revealed that Ikaros binding resulted in the opening of over 3400 previously-inaccessible chromatin sites. This is accompanied by de novo enrichment of H3K4me1 and H3K4me3 modifications and formation of de novo enhancers and promoters. These data demonstrate that Ikaros has pioneer activity and triggers coordinated regulation of gene expression. Ikaros pioneering activity was further determined by direct binding of Ikaros to reconstituted nucleosomes by electromobility shift assay. Dynamic analyses demonstrate the long-lasting effects of Ikaros’ DNA binding on enhancer activation, de novo formation of enhancers and super-enhancers, and chromatin accessibility. In conclusion, our results establish that Ikaros’ tumor suppressor function occurs via global regulation of the enhancer and super-enhancer landscape, along with regulation of chromatin accessibility, and identify novel tumor suppressor regulatory pathways in T-ALL.
Page 93
85
General
A pharmacogenomic investigation of the cardiac safety profile of ondansetron in children and in pregnant women
Galen E.B. Wright, Britt I. Drögemöller, Jessica Trueman, Kaitlyn Shaw, Michelle Staub, Shahnaz Chaudhry, Sholeh Ghayoori, Fudan Miao, Michelle Higginson, Gabriella S.S. Groeneweg, James Brown, Laura A. Magee, Simon D. Whyte, Nicholas West, Sonia Brodie, Geert ’t Jong, Howard Berger, Shinya Ito, Shahrad R. Rassekh, Shubhayan Sanatani, Colin J.D. Ross, Bruce C. Carleton
British Columbia Children’s Hospital Research Institute, Vancouver, British Columbia, Canada; Pharmaceutical Outcomes Programme, British Columbia Children’s Hospital, Vancouver, British Columbia, Canada; Division of Translational Therapeutics, Department of Pediatrics, University of British Columbia, Vancouver, British Columbia, Canada; Faculty of Pharmaceutical Sciences, University of British Columbia, Vancouver, British Columbia, Canada; Clinical Research Unit, Children's Hospital Research Institute of Manitoba, Winnipeg, Manitoba, Canada; Division of Clinical Pharmacology and Toxicology, The Hospital for Sick Children, Toronto, Ontario, Canada; British Columbia Women’s Hospital and Health Centre, Vancouver, British Columbia, Canada; Department of Anesthesiology, Pharmacology and Therapeutics, University of British Columbia, Vancouver, British Columbia, Canada; School of Life Course Sciences, Faculty of Life Sciences and Medicine, King's College, London, United Kingdom; Department of Pediatric Anesthesia, British Columbia Children's Hospital, Vancouver, British Columbia, Canada; Max Rady College of Medicine, Rady Faculty of Health Sciences, University of Manitoba, Winnipeg, Manitoba, Canada; Department of Obstetrics and Gynecology, St. Michael's Hospital, Toronto, Ontario, Canada; EpiMethods Consulting, Toronto, Ontario, Canada; Division of Cardiology, Department of Pediatrics, Children's Heart Centre, BC Children's Hospital, University of British Columbia, Vancouver, Canada
Galen Wright
Background: 5-HT3 receptor antagonists, such as ondansetron, are highly effective medications for the treatment of nausea and vomiting. However, these medications are also associated with prolongation of the QT interval, placing patients at risk of cardiac adverse events. Pharmacogenomic information for therapeutic response to ondansetron exists, particularly pertaining to CYP2D6, but no study has been performed on genetic factors that influence the cardiac safety of this medication. Objectives: Determine ondansetron-induced cardiac electrophysiological changes in three unique patient cohorts and identify pharmacogenomic predictors of QT interval prolongation. Methods: Three patient groups receiving ondansetron for the prevention of nausea and vomiting were recruited and followed prospectively (pediatric post-surgical patients n=101; pediatric oncology patients n=98; pregnant women n=62). Electrocardiograms were conducted at baseline and post-ondansetron administration. Pharmacogenomic associations were then assessed via analyses of comprehensive CYP2D6 genotyping data and genome-wide association analyses. Results: In the entire cohort, 62 patients (24.1%) were defined as cases based on Bazett-corrected QTc values. The most significant shift from baseline occurred at five minutes post-administration (P=9.8×10^-4). Genome-wide analyses identified novel candidate genes for this drug-induced phenotype. The two most significant associations were observed for a missense variant in TLR3 (rs3775291; P=2.00×10^-7) and an eQTL for SLC36A1 (rs34124313; P=1.97×10^-7). These genes are implicated in serotonin- and QT-related traits and therefore likely represent biologically relevant findings. CYP2D6 activity score was not associated with case-control status. Conclusions: The results of this study provide the first step towards understanding the genomic basis of cardiac changes occurring after ondansetron use in children and pregnant women, with the overall goal to improve the safety of these commonly used antiemetic medications.
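The Bazett correction mentioned in the results divides the measured QT interval by the square root of the RR interval in seconds. A minimal sketch follows; the function names and the example measurements are illustrative assumptions, and the abstract does not state the QTc threshold used to define cases, so none is applied here.

```python
import math


def rr_from_heart_rate(beats_per_minute):
    """RR interval in seconds for a given heart rate."""
    if beats_per_minute <= 0:
        raise ValueError("heart rate must be positive")
    return 60.0 / beats_per_minute


def qtc_bazett(qt_ms, rr_s):
    """Bazett-corrected QT interval: QTc = QT / sqrt(RR), RR in seconds."""
    if rr_s <= 0:
        raise ValueError("RR interval must be positive")
    return qt_ms / math.sqrt(rr_s)


# Illustrative values only: a 360 ms QT interval measured at 100 beats per minute.
corrected = qtc_bazett(360.0, rr_from_heart_rate(100.0))
```

At 60 beats per minute the RR interval is 1 s and the correction leaves QT unchanged; at faster heart rates the corrected value is larger than the measured one.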
Page 94
86
General
TREND: a platform for exploring protein function in prokaryotes using phylogenetics, domain architectures, and gene neighborhoods information.
Vadim M. Gumerov, Igor B. Zhulin
The Ohio State University
Vadim Gumerov
Key steps in a computational study of protein function involve analysis of (i) relationships between homologous proteins, (ii) protein domain architecture, and (iii) gene neighborhoods the corresponding proteins are encoded in. Each of these steps requires a separate computational task and sets of tools. Combining the results into a complete analysis is usually done by hand, which is time-consuming and error-prone. Here we present a new platform, TREND (tree-based exploration of neighborhoods and domains), which can perform all the necessary steps in automated fashion and put the derived information into phylogenomic context, thus making evolutionary based protein function analysis more efficient. TREND is freely available at http://trend.zhulinlab.org. TREND consists of two pipelines: (1) Domains, which identifies protein domains, transmembrane regions and low-complexity segments, and maps this information on the phylogenetic tree, and (2) Neighborhoods, which identifies gene neighborhoods for the given set of protein sequences, clusters the genes based on shared domains of the encoded proteins, identifies operons and puts the derived data into phylogenomic context. Locally stored databases of the Pfam profile Hidden Markov models (HMMs) and CDD position-specific scoring matrices are used as a source of models for domains identification. Another source is a rich collection of signal-transduction specific profile HMMs derived from MiST database. The pipelines are highly customizable. On start, both pipelines first align provided proteins and build phylogenetic trees. These steps can be skipped if a researcher already has an alignment or a tree and would like to use them instead. Optionally redundancy of the sequences can be reduced. Instead of protein sequences, protein identifiers can be provided as input; corresponding sequences will be fetched from RefSeq and MiST databases. Results of the pipelines are presented as interactive pictures with cross-links to Pfam, CDD, RefSeq and MiST databases. All produced results can be downloaded for subsequent analysis.
Page 95
87
General
TrackSigFreq: subclonal reconstructions based on mutation signatures and allele frequencies
Caitlin F. Harrigan1,2,4, Yulia Rubanova1,2,4, Quaid Morris1,2,3,4,5,6, Alina Selega2,4
1Department of Computer Science, University of Toronto, Toronto, Canada; 2Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Canada; 3Department of Molecular Genetics, University of Toronto, Toronto, Canada; 4Vector Institute, Toronto, Canada; 5Ontario Institute for Cancer Research, Toronto, Canada; 6Memorial Sloan Kettering Cancer Centre, New York, USA (pending)
Cait Harrigan
Mutational signatures are patterns of mutation types, many of which are linked to known mutagenic processes. Signature activity represents the proportion of mutations a signature generates. In cancer, cells may gain advantageous phenotypes through mutation accumulation, causing rapid growth of that subpopulation within the tumour. The presence of many subclones can make cancers harder to treat and have other clinical implications. Reconstructing changes in signature activities can give insight into the evolution of cells within a tumour. Recently, we introduced a new method, TrackSig, to detect changes in signature activities across time from a single bulk tumour sample. By design, TrackSig is unable to identify mutation populations with different frequencies but little to no difference in signature activity. Here we present an extension of this method, TrackSigFreq, which enables trajectory reconstruction based on both observed density of mutation frequencies and changes in mutational signature activities. TrackSigFreq preserves the advantages of TrackSig, namely optimal and rapid mutation clustering through segmentation, while extending it so that it can identify distinct mutation populations that share similar signature activities.
Page 96
88
General
A Flexible Pipeline for the Prediction of Biomarkers Relevant to Drug Sensitivity
V. Keith Hughitt1, Sayeh Gorjifard1, Aleksandra M. Michalowski1, John K. Simmons2, Ryan Dale1, Eric C. Polley3, Jonathan J. Keats4, Beverly A. Mock1
1NCI, 2Personal Genome Diagnostics, 3Mayo Clinic, Rochester, 4TGen
V. Keith Hughitt
Recent years have seen an explosion in the availability of paired molecular profiling and drug screen data, providing an unprecedented opportunity for the development of targeted therapies based on an individual’s genetic background. Despite a number of recent successes in diseases ranging from cystic fibrosis to cancer, significant hurdles remain in our ability to accurately predict treatments based on molecular profiling data. In particular, few tools exist that allow the integration of heterogeneous data types (e.g. genomic, transcriptomic, and somatic mutations), along with high-throughput drug screen data, to make predictions about treatment efficacy. Here, we describe a generalized open-source pipeline developed for the analysis of precision medicine data, Pharmacogenomics Prediction Pipeline, or “P3”. The modular design of P3 enables the inclusion of arbitrary input data types and the selection from multiple alternative machine learning algorithms, while automated statistical and visualization reporting steps incorporated throughout the pipeline assist in parameter tuning and early detection of problematic data elements. By incorporating external biological annotations from sources such as The Molecular Signatures Database (MSigDB), Drug Signatures Database (DSigDB), and DrugBank, P3 is able to detect important pathways correlated with drug sensitivity, while the inclusion of molecular profiling and clinical data from external patient and cell lines datasets allows P3 to focus its efforts on genes which are most likely to play a role in therapeutic response. To demonstrate the use of P3 for preclinical biomarker prediction, we applied P3 to an unpublished multiple myeloma dataset consisting of exome, RNA-Seq, and drug screen data for 1900 compounds across 45 tumor cell lines. Furthermore, gene expression and clinical data from 20 additional publicly available patient and cell line multiple myeloma datasets (>5,500 samples in total), along with data from the GDSC and CCLE drug sensitivity experiments, were also analyzed, providing a rich source of information with respect to the biological relevance of putative biomarkers detected by the pipeline.
Page 97
89
General
Creating a Metabolic Syndrome Research Resource (MetSRR)
Willysha Jenkins1, Christian Richardson2, ClarLynda Williams-DeVane PhD1
1Fisk University Nashville TN, 2Duke University Durham NC
Willysha Jenkins
Metabolic syndrome (MetS) is a multifaceted syndrome. Risk factors include visceral adiposity, dyslipidemia, hyperglycemia, hypertension, and environmental factors. An established component of chronic disease sequela, MetS leads to an increased risk of cardiovascular disease and type 2 diabetes. MetS also leads to an increased risk of stroke. Comparative studies have identified heterogeneity in the pathology of MetS across groups; however, the etiology of these differences has yet to be elucidated. Despite the presence of public repositories of biological MetS-related data, the ability to access and work with said data has its challenges. The process of querying databases, wrestling with software and wrangling data into workable formats prior to analysis is both cumbersome and time consuming. The Metabolic Syndrome Research Resource (MetSRR) is a curated database that provides access to MetS associated biological and ancillary data. It is an amalgamation of current and potential biomarkers of MetS extracted from relevant National Health and Nutrition Examination Survey (NHANES) data from 1999-2016. Each potential biomarker selection was driven by insights elucidated by the review of over 100 peer-reviewed articles. It includes 28 demographic, survey and known MetS related variables. There are 9 curated categorical variables and 42 potentially novel biomarkers. All measures are captured from over 90,000 individuals. This biocuration effort will provide increased access to curated MetS related data. It will also serve as a hypothesis generation tool for disparate MetS etiology discovery, providing the ability to generate and export ethnic group/race, sex, and age-specific curated datasets. MetSRR seeks to broaden participation in research efforts to identify clinically evaluative disparate MetS biomarkers. To the best of our knowledge, MetSRR is the only MetS specific database targeted at uncovering the disparate etiology of MetS through biocuration.
Page 98
90
General
Utilizing cohort information to find causative variants
Senay Kafkas, Robert Hoehndorf
Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900 Saudi Arabia
Senay Kafkas
Identification of causative variants in genomic data is challenging. Current studies focus on prioritizing variants within individual genomes, or apply statistical methods (e.g. GWAS) to large cohorts. With the rapid advancements and cost decrease in NGS, scientists are able to produce sequence data from large disease cohorts and healthy population. For example, UK Biobank makes available genotype to phenotype relations for >500,000 individuals and whole exome sequencing (WES) data for 50,000 individuals. Patients with the same/similar set of phenotypes may share the same/biologically related genetic abnormalities and risk factors. The availability of these datasets may allow us to stratify individuals by their phenotype and use this information to identify causative variants within large cohorts. We propose a new method that stratifies patients by their phenotypes and identifies the set of causative variants which can explain phenotypes in most individuals within a cohort from WES/WGS. First, we generated and used synthetic disease cohorts to evaluate our method. We used the human genotype-phenotype associations from ClinVar and the sequence data from 1000 Genomes and generated synthetic cohorts with different population sizes for 200 randomly selected diseases from ClinVar. To generate a synthetic disease cohort of size N, first we picked randomly N individuals from 1000 Genomes and then for each individual, we picked randomly one of the variants of the given disease and added it to the genotype of the given individual. We pre-processed the sequence data by annotating with CADD and selecting only the most deleterious variant of a given gene for each individual. Furthermore, we “normalize” pathogenicity scores based on their frequencies within a population in order to account for different distribution within genes based on their length. We then apply our method on UK Biobank. We developed a method that identifies causative variants by utilizing information about shared phenotypes within a cohort and compared it against individually prioritizing variants using WES/WGS data and average gene ranks. Our approach relies on a machine learning model trained on a pathogenicity prediction score (e.g. CADD), the frequency of observing a pathogenicity score above a certain threshold in the same gene within a population, and uses this cohort and phenotype-derived information as features to predict causative variants within individual genome sequences. Our method can identify causative variants in small and medium-sized cohorts (2 to 100 individuals). As the disease becomes more complex (i.e. involving harmful variants in multiple genes), our machine learning model improves over established methods in particular in larger cohorts (>80 individuals). Currently, we applied our method on UK Biobank and suggest candidate causative variants for 1499 complex diseases.
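The synthetic-cohort construction and the per-gene pre-processing described above can be sketched as follows. This is only an illustration of the two steps as stated in the abstract, not the authors' code: the data structures (a dictionary of variant sets per individual, and `(variant, gene, score)` tuples), the function names, and the identifiers in the example are all hypothetical.

```python
import random


def make_synthetic_cohort(genomes, disease_variants, cohort_size, seed=0):
    """Pick `cohort_size` individuals at random and spike one disease variant into each.

    genomes: dict mapping individual id -> set of variant ids (assumed layout).
    disease_variants: list of variant ids associated with one disease.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(genomes), cohort_size)
    cohort = {}
    for individual in chosen:
        spiked = rng.choice(disease_variants)
        cohort[individual] = {
            "variants": set(genomes[individual]) | {spiked},
            "spiked": spiked,  # kept so that predictions can be evaluated later
        }
    return cohort


def most_deleterious_per_gene(scored_variants):
    """Keep, for each gene, the variant with the highest pathogenicity score.

    scored_variants: iterable of (variant_id, gene, score); a higher score is
    assumed to mean more deleterious, as with CADD-style scores.
    """
    best = {}
    for variant_id, gene, score in scored_variants:
        if gene not in best or score > best[gene][1]:
            best[gene] = (variant_id, score)
    return best


# Toy inputs with made-up identifiers.
genomes = {"ind1": {"v1"}, "ind2": {"v2"}, "ind3": set()}
cohort = make_synthetic_cohort(genomes, ["d1", "d2"], cohort_size=2)
per_gene = most_deleterious_per_gene(
    [("v1", "GENE_A", 12.0), ("v2", "GENE_A", 25.0), ("v3", "GENE_B", 3.0)]
)
```

Recording which variant was spiked into each individual is what makes the synthetic cohort usable as a benchmark, since the true causative variant is known by construction.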
Page 99
91
General
Integrated analysis of JAK-STAT pathway in homeostasis, simulated inflammation and tumour
Milica Krunic1, Anzhelika Karjalainen1, Mojoyinola Joanna Ola1, Stephen Shoebridge1, Sabine Macho-Maschler1, Caroline Lassnig1, Andrea Poelzl1, Matthias Farlik2, Nikolaus Fortelny2, Christoph Bock2, Birgit Strobl1, Mathias Mueller1
1Institute of Animal Breeding and Genetics and Biomodels Austria, University of Veterinary Medicine Vienna, Austria; 2CeMM – Center for Molecular Medicine, Austrian Academy of Sciences, Vienna, Austria
Milica Krunic
Janus kinases (JAKs) and signal transducers and activators of transcription (STATs) play a key role in cytokine signalling and in the defence against infection and cancer. JAK-STAT signalling components interact with chromatin remodelling proteins and change chromatin architecture/landscape during cell differentiation and recognition and elimination of pathogens. Using different sequencing approaches (ATAC-Seq, ChIPmentation, single-cell RNA-Seq, RNA-Seq), our goal is to untangle the roles of JAK-STAT proteins in shaping chromatin landscapes of myeloid and lymphoid cells in homeostasis, sterile (simulated) inflammation and within the tumour microenvironment. Additionally, we are investigating how evolutionarily conserved STAT protein isoforms interact with chromatin and co-regulatory proteins to induce cell type- and gene-specific responses. The poster shows our summarised findings as a result of integration of different approaches.
Page 100
92
General
BEERS2: The Next Generation of RNA-Seq Simulator
Nicholas F. Lahens1, Thomas G. Brooks1, Dimitra Sarantopoulou1, Soumyashant Nayak1, Cris W. Lawrence1, Anand Srinivasan2, Jonathan Schug3,4, Garret A. FitzGerald1,5, John B. Hogenesch6, Yoseph Barash4, Gregory R. Grant1,4
1Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA; 2PMACS Enterprise Research Applications and High Performance Computing, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA; 3Institute for Diabetes, Obesity, and Metabolism, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA; 4Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA; 5Department of System Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA; 6Division of Human Genetics, Department of Pediatrics, Center for Chronobiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH
Nicholas Lahens
The accurate interpretation of RNA-Seq data presents a moving target as scientists continue to introduce new experimental techniques and analysis algorithms. This challenge has led researchers to perform a substantial number of benchmarking studies in order to determine best analysis practices. Simulated datasets have proven to be an invaluable tool in these efforts. Despite this strong need for simulated data, only a few RNA-Seq simulators have been released in the public domain, and all of them are based on simplifying assumptions that limit their utility. To address these shortcomings and generate realistic simulated data we are developing the Benchmarker for Evaluating the Effectiveness of RNA-Seq Software (BEERS) 2: an open-source, modular simulator that models each step in the process of converting RNA molecules into sequencing reads. We take an empirical approach to generating realistic RNA samples reflecting biological variability, alternative splicing, and allele-specific expression, which uses real data to train the parameters. Next, we model biochemical reactions and biases from each step in library construction as separate modules. Using an object-oriented paradigm, each module has well-defined inputs and outputs allowing users to easily substitute new modules. This design gives BEERS2 the flexibility to model changes to library construction and sequencing protocols, evolving in parallel with sequencing technology. BEERS2 is open source, freely available, and will be a crucial tool for the community as we continue to develop standards for transcriptome analysis.
Page 101
93
General
Effect Modification by Age on a Diagnostic Three-Gene-Signature in Patients with Active Tuberculosis
Lauren McDonnell1, Carly A. Bobak1,2, Matthew Nemesure1, Justin Lin1, Jane E. Hill1
1Thayer School of Engineering at Dartmouth College, 2Geisel School of Medicine at Dartmouth College
Lauren McDonnell
Introduction: Tuberculosis (TB) is the leading cause of death from a single infectious agent worldwide (1). In 2017, there were 10 million reported cases of TB and another 1.3 million deaths from the disease (1). It is currently the leading killer for individuals who are HIV positive (1). In 2014, the WHO developed the ambitious Sustainable Development Goals (SDGs) which included "End TB", a major program aiming to eradicate the TB epidemic by 2030 (2). Accomplishing this will require more advanced diagnostics that are less invasive and determine the disease status more quickly and more reliably. In our analysis, we aim to model risk factors associated with the development of TB. Here, we are looking at demographic features from multi-cohort studies pulling data from thirty different countries from the Gene Expression Omnibus examining patients with active TB, latent TB, other diseases, and healthy controls. The data is pulled predominantly from developing countries, but also includes samples from developed countries, including the UK, France, Germany, and the United States. In total, the dataset includes 3,096 participants. Meta-analyses of similar datasets have proposed a three-gene score as a "global" tuberculosis metric (3). This type of analysis suggests that all active TB patients, regardless of other factors, will express this gene score. Our hypothesis is that this active TB gene score will be additionally mediated by demographic factors such as age and HIV status that are associated with TB.
Methodology: We performed a multivariate logistic regression analysis to identify demographic features associated with culture-confirmed tuberculosis. The model features included age, HIV status, and gene expression for each gene individually (GBP5, DUSP3, and KLF2), as well as an interaction term for HIV and age with each of the three genes.
Results: The results of our multivariate logistic regression suggest that age modifies all three genes in the proposed global gene signature (p-values of 5.38e-05, 6.75e-05, and 0.01012 for GBP5, KLF2 and DUSP3, respectively). Initial findings also indicate that HIV status is a mediator of the effect of GBP5 (p-value of 0.03437). Knowing that the relationship between the gene expression of these three genes varies by demographics may change the way that a diagnostic is implemented in clinic. Our hope is that this analysis will be used to further refine the three-gene signature for specific demographic groups where it may be most effective in diagnosing active TB.
Citations:
(1) WHO Global Tuberculosis Report 2018. www.who.int/tb/publications/global_report/en/
(2) Ending Tuberculosis by 2030: Can We Do It? A. B. Suthar, R. Zachariah, Harries. https://www.ingentaconnect.com/contentone/iuatld/ijtld/2016/00000020/00000009/art00007?crawler=true
(3) Genome-Wide Expression for Diagnosis of Pulmonary Tuberculosis: a Multicohort Analysis. https://www.ncbi.nlm.nih.gov/pubmed/26907218
Page 102
94
General
Classification and mutation prediction from gastrointestinal cancer histopathology images using deep learning
Sung Hak Lee1, Hyun-Jong Jang2
1Department of Hospital Pathology, Seoul St. Mary’s Hospital, College of Medicine, The Catholic University of Korea, 2Department of Physiology, College of Medicine, The Catholic University of Korea
Sung Hak Lee
BACKGROUND: Although microscopic analysis of tissue slides has been the basis for disease diagnosis for decades, intra- and inter-observer variabilities remain issues to be resolved. The recent introduction of digital scanners has allowed researchers to use deep learning in the analysis of tissue images because many H&E whole slide images (WSIs) are available. In the present study, we investigated the possibility of a deep learning-based, fully automated, computer-aided diagnosis system with WSIs from a gastric adenocarcinoma (STAD) dataset. In addition, we trained the network to predict several commonly mutated genes in STAD. Furthermore, we showed that deep learning can predict MSI directly from H&E images.
MATERIALS AND METHODS: We studied the automatic classification of ‘normal’ and ‘tumor’ regions using a total of 432 H&E-stained WSIs from the TCGA gastric cancer image dataset. The slides were tiled in non-overlapping 360x360 pixel windows at a magnification of 20x. We used 70% of those tiles for training, 15% for validation, and 15% for final testing. Deep learning with convolutional neural networks was performed based on the Inception v3 architecture. To study the prediction of gene mutations from H&E images, average area under the curve (AUC) values for KRAS and SMAD4 mutation (93 and 88 cases, respectively) were calculated using our automatic tumor classification deep-learning approach. To study the prediction of MSI (MSS vs. MSI-H) from H&E images, 383 cases were enrolled using the same approach.
RESULTS: The performance of our method is comparable to that of pathologists, with an AUC of up to 0.999. Furthermore, we trained the network to predict two commonly mutated genes in STAD (KRAS and SMAD4) and investigated whether they can be predicted from pathology H&E images. We found that KRAS and SMAD4 mutation can be predicted from pathology images, with AUCs of 0.711 to 0.737, similar to results from previous studies with non-small cell lung cancer histopathology images using deep learning. For the prediction of MSI, patch-level and patient-level AUCs were 0.843 and 0.912, respectively, which is superior to the previous studies with TCGA-COAD and -STAD histopathology images.
CONCLUSIONS: These findings suggest that deep-learning models can assist pathologists in the detection of cancer subtypes and in the prediction of gene mutations and MSI status. After training on larger datasets and prospective validation, this approach has the potential to provide immunotherapy to a much broader subset of patients with STAD.
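The tiling and data-split steps in the methods above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes tiles are enumerated by their top-left pixel coordinates, that partial tiles at the slide border are dropped, and that tiles are shuffled at random before the 70/15/15 split. None of these details is stated in the abstract, and the function names are hypothetical.

```python
import random


def tile_origins(width, height, tile=360):
    """Top-left corners of non-overlapping tile x tile windows that fit in the image."""
    return [(x, y)
            for y in range(0, height - tile + 1, tile)
            for x in range(0, width - tile + 1, tile)]


def split_tiles(tiles, train=0.70, val=0.15, seed=0):
    """Shuffle the tiles, then split them into train / validation / test lists."""
    tiles = list(tiles)
    random.Random(seed).shuffle(tiles)
    n_train = round(len(tiles) * train)
    n_val = round(len(tiles) * val)
    return (tiles[:n_train],
            tiles[n_train:n_train + n_val],
            tiles[n_train + n_val:])


# A toy 3600 x 1800 pixel region gives 10 x 5 = 50 tiles.
origins = tile_origins(width=3600, height=1800)
train_set, val_set, test_set = split_tiles(origins)
```

In practice the split is often made per slide or per patient rather than per tile, so that tiles from one slide do not appear in both training and test sets; the abstract does not say which variant was used.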
Page 103
95
General
Mapping the Emergence and Migration of Hematopoietic Stem Cells and Progenitors During Human Development at Single Cell Resolution
Feiyang Ma, Vincenzo Calvanese, Sandra Capellera-Garcia, Sophia Ekstrand, Matteo Pellegrini, Hanna K. A. Mikkola
Department of Molecular, Cell and Developmental Biology, UCLA, Los Angeles, CA, USA
Feiyang Ma
Hematopoiesis is established during development through multiple waves of blood cell production, starting with lineage-primed progenitors required for the embryo's needs, and culminating in the generation of self-renewing hematopoietic stem cells (HSCs) for life-long hematopoiesis. Although hematopoietic ontogeny has been studied extensively in mice, we lack knowledge of the anatomical, temporal and molecular map for hematopoietic development in human. Prior studies suggest that HSCs emerge from hemogenic endothelium in the aorta-gonad-mesonephros (AGM) region between 4-6 weeks of human gestation. Extraembryonic sites including the placenta, umbilical and vitelline arteries, and the yolk sac, have been proposed to generate HSCs in the mouse. However, whether the same sites generate HSCs in human is unclear, mainly due to the limited access to developmental tissues and lack of reliable methods to identify developing human HSCs. We created a single-cell transcriptome map of hemato-vascular cells (CD34+ and/or CD31+) from human hematopoietic tissues at 1st and 2nd trimester. Using a molecular signature of self-renewing HSCs defined in our previous molecular and functional studies, we could identify CD34+ Thy1+ RUNX1+ HOXA7+ MLLT3+ HLF+ cells as HSCs throughout development. Analyses of 5-wk AGM revealed a distinct population of newly emerged HSCs that vanished by 7 wks. HSCs colonized the fetal liver by 6 wks, where they expanded and differentiated beyond 15 wks. A small but distinct population expressing HSC molecular markers was reproducibly detected in 5 wk placentas. At this time, the heart, umbilical cord and fetal liver lacked clear HSC populations, implying minimal spreading through circulating blood. Interestingly, preceding HSC colonization, the 5 wk fetal liver already harbored CD34+ Thy1- RUNX1+ HOXA7- MLLT3- HLF- progenitors that co-expressed markers associated with erythro-myeloid and lympho-myeloid potential. Comparable populations were abundant in the yolk sac, suggestive of their origin. This data-set provides an unprecedented resource to dissect the dynamics and molecular pathways governing the emergence and progression of distinct waves of hematopoietic cells during human development, and serves as a reference map for the generation of HSCs in vitro for therapeutic purposes.
Page 104
96
General
Large-scale Machine Learning and Graph Analytics for Functional Prediction of Pathogen Proteins
Jason McDermott1, Song Feng1, William Nelson1, Joon-Yong Lee1, Sayan Ghosh1, Ariful Khan1, Mahantesh Halappanavar1, Justine Nguyen2, Jonathan Pruneda2, David Baltrus3, Joshua Adkins1
1Pacific Northwest National Laboratory, 2Oregon Health & Science University, 3University of Arizona
Jason McDermott
Proteins enact the functionality encoded by genomes and so understanding protein function is critical to many areas of biology. Prediction of protein function from sequence is possible because of evolutionary relationships between proteins with similar functions, and existing algorithms can identify the corresponding sequence similarity. However, many proteins have similar functions but diverse sequences, which thwart existing methods, and, driven by advances in sequencing technology, the number of protein sequences with no known function or similarity to proteins of known function is large and growing rapidly. We use reduced amino acid alphabet mapping and kmer-based protein sequence representation to detect functional similarities between proteins and apply this method to bacterial and viral proteins that mimic eukaryotic ubiquitin ligases and deubiquitinases and classes of bacteriocins. These models allow prediction of novel examples that are not detected by traditional sequence similarity, and can provide insight into active sites or other functional domains for the proteins. To explore sequence space in a more discovery-oriented way we have applied this approach to a very large set of bacterial protein sequences (>20 million sequences) and use a GPU-based algorithm to quickly calculate a similarity graph based on protein features beyond traditional sequence similarity. Exascale graph analytics methods are used to identify groups of closely related sequences from the similarity graph. We show that this method can recapitulate known relationships between proteins, highlight inconsistencies in the underlying protein database, and provide hypotheses for functions of novel proteins, thus providing a large-scale sequence landscape.
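The reduced-alphabet, kmer-based representation mentioned above can be illustrated with a short sketch. The six-group alphabet below is a hypothetical grouping by rough physicochemical class, chosen only for illustration; the abstract does not say which reduced alphabet or which k the authors use.

```python
from collections import Counter

# Hypothetical six-group reduction of the 20 amino acids (an assumption,
# not the authors' mapping).
GROUPS = {
    "AVLIMC": "h",  # hydrophobic / aliphatic
    "FWYH": "a",    # aromatic
    "STNQ": "p",    # polar
    "KR": "b",      # basic
    "DE": "n",      # acidic
    "GP": "s",      # special backbone conformations
}
REDUCE = {aa: code for members, code in GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Map each residue to its group letter; unknown residues become 'x'."""
    return "".join(REDUCE.get(aa, "x") for aa in seq.upper())


def kmer_counts(seq, k=3):
    """Count overlapping k-mers of the reduced-alphabet sequence."""
    reduced = reduce_sequence(seq)
    return Counter(reduced[i:i + k] for i in range(len(reduced) - k + 1))


# Two toy peptides that share no raw 3-mer but have the same reduced profile.
profile_a = kmer_counts("MKVLAAGIT")
profile_b = kmer_counts("LRIVCAGLS")
```

Because both toy peptides reduce to the same string, their k-mer profiles match even though no raw tripeptide is shared, which is the kind of similarity that plain sequence comparison can miss.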
Page 105
97
General
Gene-set analysis using GWAS summary statistics and GTEx database
Masahiro Nakatochi
Department of Nursing, Nagoya University Graduate School of Medicine
Masahiro Nakatochi
Recently, sample sizes of genome-wide association studies (GWASs) have been increasing rapidly. Consequently, many genetic loci associated with traits have been identified. It is difficult to interpret how these many loci identified by GWAS contribute to the traits. One function of a SNP to consider is the regulation of gene expression levels; SNPs with this function are called expression quantitative trait loci (eQTLs). The GTEx project revealed many eQTLs in many human tissues. In this study, I propose a gene set analysis approach using GWAS summary statistics and the GTEx database to investigate how the genetic loci identified by GWAS contribute to the trait. This approach has three steps. First, trait-associated SNPs are identified by GWAS. Second, genes whose expression level is associated with trait-associated SNPs in at least one tissue in the GTEx database are searched. These genes are classified as either positively or negatively correlated genes. Finally, gene set enrichment analyses of positively correlated genes and negatively correlated genes are performed with the modified Fisher’s exact test to identify trait-associated pathways or gene sets. Using this approach, I found serum uric acid (SUA)-associated gene sets based on a SUA GWAS. Gene set enrichment analysis of UniProt terms found that the terms “Williams-Beuren syndrome”, “sodium”, “transport”, “sodium transport”, and “alternative splicing” were enriched for the positively correlated genes. This approach provides another insight into the SNPs identified by GWAS.
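The final enrichment step can be sketched with a one-sided Fisher's exact test, which is equivalent to an upper-tail hypergeometric probability. The abstract refers to a "modified" Fisher's exact test without describing the modification, so the sketch below implements only the standard test; the function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p(overlap, set_size, hits, universe):
    """One-sided Fisher's exact test p-value, P(X >= overlap).

    universe: number of genes considered in total
    set_size: number of genes annotated with the term (the gene set)
    hits:     number of positively (or negatively) correlated genes
    overlap:  number of correlated genes that are also in the gene set
    """
    total = comb(universe, hits)
    upper = min(set_size, hits)
    tail = sum(comb(set_size, k) * comb(universe - set_size, hits - k)
               for k in range(overlap, upper + 1))
    return tail / total


# Toy numbers: 10 genes in total, a 5-gene term, 5 correlated genes, 4 in the term.
p_value = enrichment_p(overlap=4, set_size=5, hits=5, universe=10)
```

The test would be run once per term for the positively correlated genes and once for the negatively correlated genes, typically followed by a multiple-testing correction across terms.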
Page 106
98
General
Targeting Cancer via Signaling Pathways: A Novel Approach to the Discovery of Gene CCDC191's Double-agent Function using Differential Gene Expression, Heat Map Analyses through AI Deep Learning, and Mathematical Modeling
Annie Ostojic
Purdue University
Annie Ostojic
According to a recent Johns Hopkins University study posted in May of 2018, the number of total genes in the genome was recalculated to be 43,162 genes, comprising 21,306 protein-coding genes and 21,856 non-coding genes. With completion of base pair sequencing in the Human Genome Project back in 2003, hope existed for acceleration of new medical treatments and disease intervention. However, earlier bioinformatic processes were unable to produce results quickly enough, so many gene functions remain unknown to date. A need exists to analyze gene functions in pathways to meet a changing medical industry of pharmacogenomics, personalized medicine, and cancer treatments relative to gene expression patterns. New methodology for determining functions of unstudied genes to rapidly extrapolate, classify, and correlate their gene expressions to biological pathways is at the forefront of bioinformatic studies. This research discovered the function of gene CCDC191, a coiled-coil domain-containing protein-coding gene, whose function had been neither fully studied nor defined. A novel approach was utilized to determine the function of CCDC191 by combining gene expression analysis, patient survival analysis, differential gene expression, heat map with AI deep learning, and reverse engineering mathematical modeling. This study presents analyses and insights into gene CCDC191 which have not been performed before, and it provides a replicable methodology which incorporates AI deep learning image classification and reverse engineering mathematical modeling to determine gene functions in pathways and cancer connectedness.
Page 107
99
General
RFEX: Simple Random Forest Model and Sample Explainer for non-Machine Learning experts
Dragutin Petkovic, Ali Alavi, DanDan Cai, Jizhou Yang, Sabiha Barlaskar
San Francisco State University (all authors)
Dragutin Petkovic
Machine Learning (ML) is becoming an increasingly critical technology in many areas. However, its complexity and its frequent “non-transparency” create significant challenges, especially in the biomedical and health areas. One of the critical components in addressing the above challenges is the explainability or transparency of ML systems, which refers to the model (related to the whole data) and sample explainability (related to specific samples). Our research focuses on both model and sample explainability of Random Forest (RF) classifiers. Our RF explainer, RFEX, is designed from the ground up with non-ML experts in mind, and with simplicity and familiarity, e.g. providing a one-page tabular output and measures familiar to most users. In this paper we present significant improvement in RFEX Model explainer compared to the version published previously, a new RFEX Sample explainer that provides explanation of how the RF classifies a particular data sample and is designed to directly relate to RFEX Model explainer, and a RFEX Model and Sample explainer case study from our collaboration with the J. Craig Venter Institute (JCVI). We show that our approach offers a simple yet powerful means of explaining RF classification at the model and sample levels, and in some cases even points to areas of new investigation. RFEX is easy to implement using available RF tools and its tabular format offers easy-to-understand representations for non-experts, enabling them to better leverage the RF technology.
Page 108
100
General
Apparent bias toward long gene misregulation in MeCP2 syndromes disappears after controlling for baseline variations
Ayush T. Raman1,2, Amy E. Pohodich2, Ying-Wooi Wan2, Hari Krishna Yalamanchili2, William E. Lowry3, Huda Y. Zoghbi2, Zhandong Liu2
1Broad Institute of MIT and Harvard, 2Baylor College of Medicine, 3University of California Los Angeles
Ayush Raman
Background: Rett syndrome is a neurodevelopmental disorder caused by mutations in MECP2, a methyl-binding protein whose task is to orchestrate gene expression, and MeCP2 mutations disrupt the expression of several thousand genes. Over the past ten years, a number of studies observed that Rett syndrome and other disorders that affect neuronal synapses seem to preferentially dysregulate genes that are longer than 100 Kb. These length-dependent transcriptional changes in MeCP2-mutant samples are modest, but, given the low sensitivity of high-throughput transcriptome profiling technology, here we re-evaluate the statistical significance of these results.
Results: We develop a robust statistical approach to estimate noise accurately and identify statistically significant gene length-dependent changes. We find that the apparent length-dependent trends previously observed in MeCP2 microarray and RNA-sequencing datasets disappear after estimating baseline variability (i.e., intra-sample differences) from randomized control samples across 17 different publicly available MeCP2 datasets. We show that even MAQC/SEQC Phase-III benchmark datasets, which do not include MeCP2 or its effects on expression, are prone to the long gene bias, suggesting that the bias is not an inherent feature of gene expression following MeCP2 disruption. We hypothesized that PCR amplification, a process shared by both microarray and RNA-seq technologies, might introduce the observed bias in long gene expression. We find no bias with nanoString technology, a technique that does not use PCR amplification, for SEQC/MAQC samples or Mecp2 mutant samples. This confirmed our notion that the previous observations of long-gene bias resulted from amplification-based technologies and the failure to establish a proper baseline.
Conclusions: We conclude that accurate characterization of length-dependent (or other) trends requires establishing a baseline from randomized control samples. We propose that smaller fold changes in transcription observed after PCR amplification lead to an overestimation of long gene expression levels.
Page 109
101
General
Prediction of chronological and biological age from laboratory data
Luke Sagers1, Luke Melas-Kyriazi2, Chirag J. Patel3, Arjun K. Manrai1
1Boston Children’s Hospital Computational Health Informatics Program, 2Harvard University Department of Mathematics, 3Harvard Medical School Department of Biomedical Informatics
Luke Sagers
Aging has pronounced effects on blood laboratory biomarkers used in the clinic. Prior studies have largely investigated a single biomarker or population at a time, limiting a comprehensive view of biomarker variation and aging across different populations. Here we develop a supervised machine learning approach to study the aging process using 356 blood biomarkers measured in 67,536 individuals across demographically diverse populations. Our model predicts age with a mean absolute error (MAE) in held-out data of 4.76 years and an R2 value of 0.92. Age prediction was highly accurate for the pediatric cohort (MAE=0.87, R2=0.94) but inaccurate for ages 65+ (MAE=4.30, R2=0.25). Extensive variability was observed in which biomarkers carry the most predictive power across different age groups, genders, and race/ethnicity groups, and novel candidate biomarkers of aging were identified for specific age ranges (e.g. Vitamin E for ages 18-45). We further show that predictors accurate for one age group may fail to generalize to other groups, and find that nearly a third of all biomarkers exhibit non-linearity near adulthood. As populations worldwide undergo major demographic changes, it will be increasingly important to catalogue biomarker variation across age groups and discover new biomarkers to distinguish chronological and biological aging.
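The two metrics reported above, MAE and R2, can be computed as in the sketch below. The toy ages and predictions are invented for illustration and are not data from the study. They show one general property of the metrics that may contribute to the pattern reported for the 65+ group: for the same MAE, R2 is lower when the true ages span a narrow range, because R2 is measured relative to the variance of the true values.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Invented example: identical errors, different spread of true ages.
wide_true, wide_pred = [10, 30, 50, 70], [14, 30, 50, 66]
narrow_true, narrow_pred = [66, 70, 74, 78], [70, 70, 74, 74]

mae_wide = mean_absolute_error(wide_true, wide_pred)
mae_narrow = mean_absolute_error(narrow_true, narrow_pred)
r2_wide = r_squared(wide_true, wide_pred)
r2_narrow = r_squared(narrow_true, narrow_pred)
```

Both toy groups have an MAE of 2 years, yet R2 is about 0.98 for the wide age range and 0.6 for the narrow one.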
Page 110
102
General
Whole genome sequencing analysis of influenza C virus in Korea
Sooyeon Lim, Han Sol Lee, Ji Yun Noh, Joon Young Song, Hee Jin Cheong, Woo Joo Kim
Division of Infectious Diseases, Department of Internal Medicine, Korea University College of Medicine, Seoul, South Korea; Division of Brain Korea 21 Program for Biomedicine Science, College of Medicine, Korea University, Seoul, South Korea; Asia Pacific Influenza Institute, Korea University College of Medicine, Seoul, South Korea
Sooyeon Lim
Through the Hospital-based Influenza Morbidity and Mortality (HIMM) surveillance system, 973 nasopharyngeal swab specimens from children under 2 years of age were collected and tested for influenza viruses using real-time PCR. Among the tested specimens, 383 were positive for influenza A and/or B virus. Influenza C virus was confirmed in five specimens. In this study, we used five influenza C virus positive specimens and a cell-cultured influenza C virus. Viral RNA was isolated using the QIAamp viral RNA mini kit (Qiagen, Hilden, Germany) following the manufacturer’s instructions. All isolated RNA was finally eluted with 60 µl of distilled water. The reverse transcription reaction was performed with the PrimeScript 1st strand cDNA synthesis kit (Takara, Shiga, Japan) using the uni-5’ primer. The genome-wide amplification of the influenza C virus was performed using Taq polymerase. Libraries were prepared from the amplified gene fragments using the Nextera XT DNA Library Prep kit (Illumina), according to the manufacturer’s protocol. This study was the first report of influenza C virus using NGS analysis in South Korea. In this study, young children with influenza C virus infections had acute respiratory illnesses, such as fever, rhinorrhea, and cough, but no pneumonia or severe respiratory illness was observed. Based on NGS analysis, we can expand our understanding of the various symptoms of influenza C virus.
Page 111
103
General
Mining the Humuhumunukunukuapua and the Shaka of Autism with Big Data Biomedical Data Science
Peter Washington, Brianna Chrisman, Kaiti Dunlap, Aaron Kline, Arman Husic, Michael Ning, Kelley Marie Paskov, Nathaniel Stockham, Maya Varma, Emilie LeBlanc, Jack Kent, Yordan Penev, Min Woo Sun, Jae-Yoon Jung, Catalin Voss, Nick Haber, Dennis P. Wall
Departments of Pediatrics (Systems Medicine) and Biomedical Data Science, Stanford University
Dennis Wall
Mental health is arguably at the core of all health, and early childhood mental health predicts a long-term healthy life course. Yet, finding, treating, and preventing mental health disorders in children is limited by reach and by a lack of scalable methods. Thankfully, advances in AI and ubiquitous technology have ushered in unparalleled opportunities for scalable mobile health. We have constructed a series of mobile solutions that treat and track while simultaneously building novel computer vision libraries for precision models. These solutions function as mobile games that are highly engaging and designed for the individual, encouraging compliance with the required “dose” while passively collecting metrics to measure, and ultimately predict, outcomes. We can quantify or digitize a child’s phenotype through these passively collected data, not just once, but many times, as the child plays our games and learns through playing. These games engender trust and, as they do, we “crowd” build a community of stakeholders that not only shares Phenome data, but also data on their Genome and the Environment. With the 3 modalities, we use data fusion multivariate techniques to resolve the G+E=P equation for autism and set the stage for doing the same in other spectrum disorders across mental health.
Page 112
104
General
Development of a recurrence prediction model for early lung adenocarcinoma using radiomics-based artificial intelligence
Hee Chul Yang, Gunseok Park, Ji Eun Oh
Division of Convergence Technology, National Cancer Center Research Institute
Hee Chul Yang
Purpose: This study aimed at predicting recurrence after curative resection in patients with lung adenocarcinoma (ADC) using the phenotypic radiomics features obtained from CT images.
Material: From January 1, 2010, to December 31, 2015, a total of 604 primary lung ADC patients with a tumor size of 1-3 cm underwent curative resection at a single institution.
Method: A total of 604 patients’ preoperative CT images were used for feature extraction. The final dataset was randomized into a training set (n=424) and a test set (n=180) with a ratio of 7:3. Radiomics features were selected by t-test (P<0.05) and a radiomics signature was classified by the logistic regression model. The optimal model was evaluated through a ROC curve.
Result: In a logistic regression analysis, 6 radiomics features were finally selected from 51 features to build a radiomics signature that was significantly associated with recurrence. The optimal model was built with features associated with the dependent variable. They presented good performance in the prediction of recurrence alone, with an AUC of 76.2% accuracy. The test set validated 72.2% accuracy.
Conclusion: The radiomics signature can be a useful recurrence prediction tool even in small-sized lung ADC.
Page 113
105
General
DRLPC: Dimension Reduction of Sequencing Data using Local Principal Components
Yun Joo Yoo1, Fatemeh Yavartanu1, Shelley B. Bull2
1Seoul National University, 2The Lunenfeld-Tanenbaum Research Institute
Yun Joo Yoo
Genome-wide association studies (GWAS) using single nucleotide polymorphism (SNP) data usually have millions of variables with complex correlation structure resulting from linkage disequilibrium. When multi-SNP joint analysis using multiple regression is applied, a dimension reduction method such as principal component analysis can be considered. Replacing SNP data with principal components can resolve multi-collinearity which often occurs in regression using high-density sequencing or imputed SNP data. However, the principal components constructed from all SNP variables in a region are hard to interpret as a biological entity and are not useful for localization and fine mapping. In this study, we propose an algorithm DRLPC (Dimension Reduction using Local Principal Components) to reduce the dimension for regression analysis by selecting clusters of SNPs in high correlation and replacing each cluster by a local principal component constructed from the SNPs in the cluster. The algorithm aims to resolve multicollinearity between updated variables by considering variance inflation factor (VIF) and removing variables with high VIF. We examined the behaviour of DRLPC by applying the algorithm to the 1000 Genomes Project data. Chromosome 22 SNP sets of three populations (EUR, ASN, AFR) were dimension reduced for each gene region separately, comparing several choices of threshold values for clustering and principal components selection. When averaged across the genes, the ratio of the number of final variables over the number of original variables was 50% for the genes with 5~10 SNPs and as low as 10% for the genes with more than 1,000 SNPs. The reduction rate was smaller for the AFR population compared to the other populations EUR and ASN, possibly due to weaker LD in the African population. We also compared the power of multi-SNP tests constructed based on regression results obtained from the original data and dimension reduced data. These tests include generalized Wald, LC (linear combination) tests, and MLC (Multi-bins linear combination) tests. LC tests and MLC tests are also dimension reduction techniques in the sense that LC combines all individual effects into a one degree of freedom test and MLC combines the individual effects into a linear combination within a bin (cluster) and constructs a test with degrees of freedom equal to the number of clusters. Since DRLPC uses the same clustering algorithm based on clique partitioning as MLC, we compared results of MLC with original data to DRLPC Wald test with processed data under the same clustering threshold and found that they yield similar power. We conclude that DRLPC can provide efficient dimension reduction while resolving multi-collinearity and also lessens the problem of interpretability because these principal components represent smaller sized regions, possibly short haplotypes.
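The variance inflation factor used above is defined for predictor j as VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all other predictors. The sketch below covers only the two-predictor special case, where R_j^2 equals the squared Pearson correlation between the two SNPs. It is meant to show why SNPs in high correlation inflate the VIF, not to reproduce DRLPC; the genotype vectors are invented and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(x, y):
    """VIF of either predictor in a model containing exactly these two."""
    r = pearson(x, y)
    return 1.0 / (1.0 - r * r)


# Invented genotypes coded as 0/1/2 minor-allele counts for 8 individuals.
snp_a = [0, 1, 2, 0, 1, 2, 0, 1]
snp_b = [0, 1, 2, 0, 1, 2, 1, 1]  # nearly identical to snp_a (high LD)
snp_c = [2, 0, 1, 1, 0, 2, 1, 0]  # almost uncorrelated with snp_a

vif_high_ld = vif_two_predictors(snp_a, snp_b)
vif_low_ld = vif_two_predictors(snp_a, snp_c)
```

Here the nearly identical pair gives a VIF above 5, while the almost uncorrelated pair stays close to 1. With more than two predictors the VIF requires a full regression of each variable on the others, and the cut-off for "high" VIF is a tuning choice that the abstract does not specify.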
Page 114
106
General
Meta-analysis in exhausted T cells from Homo sapiens and Mus musculus provides novel targets for immunotherapy
Lin Zhang1, Yicheng Guo2, Hafumi Nishi1
1Tohoku University Graduate School of Information Sciences, 2Columbia University, Department of Systems Biology
Lin Zhang
Using antibody-based immune checkpoint inhibitors to reverse T cell exhaustion is a promising approach for immunotherapy of cancers. However, the therapeutic efficacy is still low for inhibitors of known immune checkpoints, such as PD1 and CTLA4. T cell exhaustion is a state of T cell dysfunction during chronic infections and cancers. It exhibits several characteristic features, such as poor effector functions in a hierarchical manner, impaired memory T cell potential, and sustained upregulation and co-expression of multiple inhibitory receptors. The mechanism and pathways of T cell exhaustion remain to be fully described. In this study, we performed a meta-analysis with 7 datasets from both humans and mice to uncover the molecular mechanism of T cell dysfunction. Through gene set enrichment analysis, the predefined exhaustion gene sets were observed to be significantly enriched in the exhausted T cells. The differential expression analyses showed an overlap of 21 upregulated and 37 downregulated genes shared by exhausted T cells in humans and mice. These genes were significantly enriched in exhaustion response-related pathways, such as signal transduction, immune system process, and regulation of cytokine production. In addition, co-expression analysis identified 175 genes that were highly correlated with the exhaustion trait in humans and mice. Above all, our study revealed that TOX and CD200R1 might be considered as potential and high-efficiency targets for immunotherapy.
Page 115
107
INTRINSICALLY DISORDERED PROTEINS (IDPS) AND THEIR FUNCTIONS
POSTER PRESENTATIONS
Page 116
108
Intrinsically Disordered Proteins (IDPs) and Their Functions
Disordered Function Conjunction: On the in-silico function annotation of intrinsically disordered regions
Sina Ghadermarzi, Akila Katuwawala, Christopher J. Oldfield, Amita Barik, Lukasz Kurgan
Virginia Commonwealth University
Sina Ghadermarzi
Intrinsically disordered regions (IDRs) lack a stable structure, yet perform biological functions. The functions of IDRs include mediating interactions with other molecules, including proteins, DNA, or RNA, and entropic functions, including domain linkers. Computational predictors provide residue-level indications of function for disordered proteins, which contrasts with the need to functionally annotate the thousands of experimentally and computationally discovered IDRs. In this work, we investigate the feasibility of using residue-level prediction methods for region-level function predictions. For an initial examination of the multiple function region-level prediction problem, we constructed a dataset of (likely) single function IDRs in proteins that are dissimilar to the training datasets of the residue-level function predictors. We find that available residue-level prediction methods are only modestly useful in predicting multiple region-level functions. Classification is enhanced by simultaneous use of multiple residue-level function predictions and is further improved by inclusion of amino acid content extracted from the protein sequence. We conclude that multifunction prediction for IDRs is feasible and benefits from the results produced by current residue-level function predictors; however, it has to accommodate inaccuracy in functional annotations.
Page 117
109
MUTATIONAL SIGNATURES
POSTER PRESENTATIONS
Page 118
110
Mutational Signatures
Transcription-associated regional mutation rates and signatures in regulatory elements across 2,500 whole cancer genomes
Jüri Reimand
Ontario Institute for Cancer Research, University of Toronto
Jüri Reimand
The genomes of healthy and cancerous cells accumulate somatic mutations over time with complex variations across tissues and genomic contexts. Certain classes of functional elements of the genome are subject to differential mutation rates due to regionalized activities of mutational processes. To investigate regional mutations, we developed RM4RM, a statistical framework for detecting differential mutation rates and trinucleotide signatures in sets of genomic regulatory elements. To validate our model, we first analyzed CTCF binding sites across >2,500 whole cancer genomes of 39 cancer types of the ICGC-TCGA PCAWG cohort. We found significant mutation enrichments in CTCF sites in liver, esophageal, breast and other cancer types that were primarily driven by T>C/G mutations and multiple rare mutation signatures of unknown etiology. Transcription start sites of protein-coding genes and a broader set of experimentally-defined regulatory elements derived from primary tumors of the TCGA project also showed significantly elevated regional mutation rates in multiple cancer types. TSS-specific regional mutation enrichment was particularly dominant in highly transcribed genes of matching tumors while none was apparent in silenced genes. In contrast, no mutation enrichment dependency on transcript abundance was observed in distal regulatory elements. These data indicate a transcription initiation-coupled mutational process active in multiple cancer types, supported by multiple mutational processes and trinucleotide signatures specifically enriched in highly-transcribed TSSs. Our findings and statistical model enable detailed studies of the mechanisms of somatic mutagenesis and advance our understanding of genetic drivers of disease.
Page 119
111
Mutational Signatures
Complex mosaic structural variations in human fetal brains
Shobana Sekar1, Livia Tomasini2, Maria Kalyva3, Taejeong Bae1, Logan Manlove1, Bo Zhou4, Jessica Mariani2, Fritz Sedlazeck5, Alexander E. Urban4, Christos Proukakis3, Flora M. Vaccarino2, Alexej Abyzov1
1Mayo Clinic, 2Yale University, 3University College London, 4Stanford University, 5Baylor College of Medicine
Alexej Abyzov
Somatic mosaicism in cells of the human brain is common and may have functional consequences that lead to diseases, including neurological ones. Mosaic variations in the brain can be point mutations, insertions of mobile elements, and structural changes. Previously we detected and described 200-400 mosaic point mutations per single-cell clone from cortices of three human fetuses (15 to 21 weeks post conception). Here we describe four mosaic structural variations (SVs) in the same brains. The SVs were of kilobase scale and complex, i.e., consisting of deletion(s) and a few rearranged genomic fragments that sometimes originated from different chromosomes. Sequences at the breakpoints of the rearrangements had microhomologies, suggesting their origin from replication errors. One SV was found in two clones and we timed its origin to ~14 weeks post conception. Our study reveals the existence of mosaic SVs, likely arising from cell proliferation, in the human brain in mid-neurogenesis.
Page 120
112
PATTERN RECOGNITION IN BIOMEDICAL DATA: CHALLENGES IN PUTTING BIG DATA TO WORK
POSTER PRESENTATIONS
Page 121
113
Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work
Stratification of kidney transplant recipients based on temporal disease trajectories
Isabella Friis Jørgensen PhD1, Søren Schwartz Sørensen PhD2, Søren Brunak PhD1
1Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Blegdamsvej 3B, DK-2200 Copenhagen N, Denmark; 2Department of Nephrology, Rigshospitalet, Copenhagen University Hospital, Blegdamsvej 9, DK-2100 Copenhagen Ø, Denmark
Isabella Friis Jørgensen
Organ transplantations often improve the life of chronically sick patients. However, immune-suppressive medication given to transplant recipients increases the risk of complications, especially infections and infection-related death. One in five kidney transplant recipients dies from infection. We want to stratify kidney transplant recipients into groups of patients with different patterns of infectious diseases and mortality to predict which patients have a higher risk of specific infections. We use the Danish National Patient Registry (DNPR), which contains hospital diagnoses for 6.9 million patients from the entire Danish population from 1994 to 2018. We use a previously published method to identify significant time-dependent disease trajectories for all patients with a kidney transplantation. Subsequently, we use hierarchical clustering of Jaccard distances between the disease trajectories to find distinct groups of trajectories from kidney transplant recipients. In the DNPR, we identified 5,644 patients with a kidney transplantation, resulting in 43 significant disease trajectories that consist of three consecutive diseases, including several infection-related diagnoses. More than 87% of the kidney transplantation recipients follow at least one of these trajectories; hence they are diagnosed with the three diseases in the order the trajectory specifies. Clustering reveals two main groups of temporal disease trajectories. We identify patients following the two groups of disease trajectories and discover significant differences in mortality after kidney transplantation between patients following different disease trajectories. This study used previous disease history from large-scale hospital diagnoses to stratify common, temporal disease trajectories into two distinct groups. Depending on the type of trajectory kidney transplantation recipients follow, significant differences in mortality are seen. These methods can be used to guide clinicians about higher risks of certain infections and mortality of certain groups of kidney transplant recipients.
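The clustering step in the abstract above can be illustrated with a small sketch. It is not the authors' pipeline. It assumes each trajectory is represented by the set of its three diagnosis codes, uses average linkage (the abstract does not name a linkage), and runs on four made-up trajectories with illustrative ICD-10-style codes.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two sets: 1 - |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage_clusters(items, n_clusters):
    """Agglomerative clustering of `items` (sets) down to `n_clusters` groups.

    Average linkage is an assumption made for this sketch.
    Returns clusters as sorted lists of item indices.
    """
    dist = {
        (i, j): jaccard_distance(items[i], items[j])
        for i, j in combinations(range(len(items)), 2)
    }

    def cluster_distance(c1, c2):
        pairs = [dist[(min(i, j), max(i, j))] for i in c1 for j in c2]
        return sum(pairs) / len(pairs)

    clusters = [[i] for i in range(len(items))]
    while len(clusters) > n_clusters:
        # Merge the two closest clusters.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical toy trajectories (illustrative codes, not DNPR results).
trajectories = [
    {"N18", "A41", "J18"},
    {"N18", "A41", "N39"},
    {"I10", "E11", "I25"},
    {"I10", "E11", "I50"},
]
groups = average_linkage_clusters(trajectories, n_clusters=2)
print(groups)
```

On real data one would more likely build the distance matrix once and pass it to an established hierarchical-clustering routine; the pure-Python version here only makes the steps explicit.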
Page 122
114
Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work
Modeling Gene Expression Levels from Epigenetic Markers Using a Dynamical Systems Approach
James Brunner1, Jacob Kim2, Kord M. Kober3
1Mayo Clinic, Rochester, MN; 2Columbia University, New York, NY; 3University of California, San Francisco, CA
Kord Kober
Gene regulation is a fundamental biological process that involves a number of complex mechanisms essential for development and adaptation to the environment. Understanding the role of epigenetic changes in gene expression is a fundamental question of molecular biology. Predicting gene expression from epigenetic data is an active area of research, and previous studies have used statistical approaches for building prediction models. Dynamical systems can be used to generate a model to predict gene expression using epigenetic data and a gene regulatory network (GRN). By dynamically simulating hypothesized mechanisms of transcriptional regulation, we provide predictions based directly on these biological hypotheses. Furthermore, a stochastic dynamical system provides us with a distribution of gene expression estimates, representing the possibilities that may occur within the cell. The purpose of this study is to develop and evaluate a stochastic dynamical systems model predicting gene expression levels from epigenetic data for a given GRN. We model gene regulation using a piecewise-deterministic Markov process (PDMP) where transcription factor (TF) binding is a Boolean random variable representing the bound/unbound state of a binding site region of DNA. TF binding is given as the difference of two Poisson jump processes (i.e., binding and unbinding), so that the time between binding and unbinding events is exponentially distributed with propensities taken to be linear functions of the available TF. Epigenetic modification of the TF binding site impacts the binding propensity of the TF and is measured as the percentage of methylated bases (i.e., beta). We use a linear ordinary differential equation based on the underlying GRN to determine the value of the transcript between TF binding or unbinding events. We include baseline transcription and decay and are able to solve exactly between jumps of binding/unbinding events. In a discrete-space, continuous-time Markov process, the equilibrium distribution can be estimated by sampling from a realization of the process. For our continuous-space PDMP we can estimate the equilibrium distribution in a similar manner using kernel density estimation with a Gaussian kernel. We estimate the marginal distributions of various gene variables with a 1-dimensional kernel. We use a GRN assumed to be known to create a model of gene regulation that includes TF binding dynamics. We associate binding sites with the genes that they regulate and use these associations to create a bipartite graph. The GRN and training/testing data are created from publicly available data. The epigenetic parameter is assumed to be measurable. The remaining parameters are estimated using a negative log-likelihood minimization procedure. We can compute a log-likelihood for a set of paired epigenetic and transcription samples by time averaging a sample path against a Gaussian kernel. We report on the design and evaluation of the model’s performance.
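To make the PDMP description above concrete, here is a minimal single-gene sketch. It is an illustration under stated assumptions and not the authors' model: one binding site, a constant TF level, a binding propensity of `k_on * tf * (1 - beta)` (so methylation lowers binding), a constant unbinding propensity `k_off`, and transcript dynamics `dx/dt = base + gain * bound - decay * x` solved exactly between jumps. All parameter names and values are hypothetical.

```python
import math
import random


def simulate_pdmp(beta, tf=1.0, k_on=2.0, k_off=1.0, base=0.5, gain=4.0,
                  decay=1.0, t_end=200.0, dt_sample=0.1, seed=0):
    """Simulate transcript level x(t) for one gene with one TF binding site.

    Returns x sampled on a regular time grid. Between binding/unbinding
    jumps the linear ODE is solved in closed form.
    """
    rng = random.Random(seed)
    t, x, bound = 0.0, base / decay, 0
    next_sample, samples = 0.0, []
    while t < t_end:
        # Exponential waiting time until the next binding or unbinding event.
        rate = k_off if bound else k_on * tf * (1.0 - beta)
        wait = rng.expovariate(rate) if rate > 0 else math.inf
        t_jump = min(t + wait, t_end)
        x_inf = (base + gain * bound) / decay  # fixed point for this state
        # Record the exact solution at grid times inside [t, t_jump].
        while next_sample <= t_jump:
            samples.append(x_inf + (x - x_inf) * math.exp(-decay * (next_sample - t)))
            next_sample += dt_sample
        x = x_inf + (x - x_inf) * math.exp(-decay * (t_jump - t))
        t = t_jump
        bound = 1 - bound  # jump: flip the Boolean binding state
    return samples


def gaussian_kde(samples, x, bandwidth=0.2):
    """One-dimensional Gaussian kernel density estimate evaluated at x."""
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm


low = simulate_pdmp(beta=0.1)   # weakly methylated binding site
high = simulate_pdmp(beta=0.9)  # heavily methylated binding site
print(sum(low) / len(low), sum(high) / len(high))
print(gaussian_kde(low, 3.0), gaussian_kde(high, 3.0))
```

Sampling the path on a regular grid and smoothing with a Gaussian kernel mirrors the time-averaging idea in the abstract. A full model would couple several genes through the GRN and let the TF level itself vary.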
Page 123
115
Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work
Translating Big Data neuroimaging findings into measurements of individual vulnerability
Peter Kochunov1, Paul Thompson2, Neda Jahanshad2, Elliot Hong1
1University of Maryland School of Medicine, Maryland, USA; 2University of Southern California, California, USA
Peter Kochunov
We propose an intuitive, anatomically informed approach to derive an index of similarity between individual brain patterns and the expected patterns of neuropsychiatric disorders based on Big Data neuroimaging studies. Big Data neuroimaging studies, such as those performed by the Enhancing NeuroImaging Genetics Meta Analysis (ENIGMA) consortium, have provided the scientific community with the regional patterns of effect sizes in common neuropsychiatric disorders such as schizophrenia (SZ), bipolar and major depressive disorders (BP and MDD), epilepsy (EP), Alzheimer’s dementia (AD), mild cognitive impairment (MCI) and others. These patterns describe regional deficits using standardized sMRI, dMRI and rsfMRI workflows. They are derived from statistically powerful and inclusive samples and are highly reproducible (r=0.8-0.9) in independent samples. We developed the “Regional Vulnerability Index” (RVI) to measure similarity between an individual and the expected pattern of the patient-control differences. RVI can be calculated for a single imaging modality or across modalities. A single-modality RVI, in this example using the Fractional Anisotropy (FA) measure from dMRI, is calculated as follows. FA for each of the 23 major white matter regions, as defined by the ENIGMA atlas, in an individual is converted to z-values by (A) calculating the residual values after regressing out age and sex effects for this region, (B) subtracting the average value for a region, and (C) dividing by the standard deviation calculated from the healthy controls. This produces a vector of 23 z-values (one per region) for each individual in the sample. RVI is calculated as the correlation coefficient between the 23 region-wise z-values for the subject and the patient-control effect sizes in ENIGMA. RVI takes values from 1 (individual pattern is aligned with the disorder pattern) to -1 (individual pattern is in anti-alignment). For cross-modality research, RVI can be expanded hierarchically by building a combined vector that includes multiple phenotypes. For example, the RVI-White Matter calculation uses a vector of 69 values that combine tract-wise FA, radial (RaD) and axial (AxD) diffusivity values per person. To merge effect sizes across diverse domains, we use a pseudo-ordinary transformation that maps effect sizes between 0 and 1 while preserving the relative distance between them. We first demonstrated that RVI-SZ values are significantly elevated in patients with SZ and are also predictive of treatment resistance. That is, subjects who developed resistance to modern antipsychotic medications had significantly higher RVI-SZ values than those who responded to treatment. We next demonstrated that RVI for SZ was significantly correlated with RVI for AD but not MCI, due to significant overlap in deficit patterns between these disorders. We next showed that calculating RVI across multiple modalities produces vulnerability measures that are more sensitive to patient-control differences in the independent datasets and showed stronger sensitivity to cognitive deficits and negative symptoms. The RVI calculator tools are distributed with the solar-eclipse software (www.solar-eclipse-genetics.org).
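The single-modality RVI calculation described above can be sketched as follows. This is a toy illustration and not the solar-eclipse implementation. It assumes step (A), the age and sex regression, has already been applied upstream; it takes both the regional mean and the standard deviation from healthy controls; and it uses four made-up regions and effect sizes in place of the 23 ENIGMA regions.

```python
import math
import statistics


def z_scores(subject, controls):
    """Steps (B) and (C): centre and scale each regional value using controls."""
    z = []
    for region, value in enumerate(subject):
        column = [c[region] for c in controls]
        z.append((value - statistics.mean(column)) / statistics.stdev(column))
    return z


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rvi(subject, controls, effect_sizes):
    """Correlation between a subject's regional z-values and disorder effect sizes."""
    return pearson(z_scores(subject, controls), effect_sizes)


# Hypothetical residualised FA values for three controls and four regions.
controls = [
    [0.50, 0.40, 0.60, 0.55],
    [0.52, 0.42, 0.58, 0.57],
    [0.48, 0.38, 0.62, 0.53],
]
# Hypothetical patient-control effect sizes (not ENIGMA values).
effect_sizes = [-0.4, -0.2, -0.1, -0.3]

aligned = [0.484, 0.392, 0.596, 0.538]       # deficits follow the disorder pattern
anti_aligned = [0.516, 0.408, 0.604, 0.562]  # the opposite pattern
print(rvi(aligned, controls, effect_sizes))
print(rvi(anti_aligned, controls, effect_sizes))
```

A cross-modality version would concatenate the z-values of several measures (for example FA, RaD and AxD) into one longer vector before computing the correlation.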
Page 124
116
Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work
Automating new-user cohort construction with indication embeddings
Rachel D. Melamed
Department of Computational Biomedicine and Biomedical Data, University of Chicago
Rachel Melamed
The electronic health record is a rising resource for quantifying medical practice and discovering adverse effects of drugs. One of the challenges of healthcare data is the high dimensionality of the health record. Any study of patterns in health data must account for tens of thousands of potentially relevant diagnoses or treatments. In this work, we develop indication embeddings, a way to reduce the dimensionality of health data while capturing the information relevant to treatment decisions. We demonstrate that these embeddings recover therapeutic uses of drugs. Then we use these embeddings as an informative representation of relationships between drugs, between health history events and drug prescriptions, and between patients at a particular time in their health history. We show the application of these embeddings in areas of current research. For drug safety studies, particularly retrospective cohort studies, our low-dimensional representation helps in finding comparator drugs and constructing comparator cohorts. This enables us to develop an automated approach to choose comparator cohorts for a treated population.
Page 125
117
Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work
Reproducibility-optimized statistical testing for omics studies
Tomi Suomi, Laura Elo
Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland
Laura Elo
Differential expression analysis is one of the most common types of analyses performed on various biological and biomedical data, including, e.g., RNA sequencing and mass spectrometry proteomics. It is the process that detects features, such as genes or proteins, showing statistically significant differences between the sample groups under comparison. However, as different test statistics perform well in different datasets, the choice of an appropriate test statistic has remained a major challenge. To address the challenge, our reproducibility-optimized test statistic (ROTS) optimizes the statistic on the basis of the data by maximizing the reproducibility of the top-ranked features through a bootstrap procedure. Finally, it provides a ranking of the features according to their statistical evidence for differential expression between the sample groups. We have shown the robust performance of ROTS in a range of studies from transcriptomics to proteomics, covering both bulk and single-cell measurements. ROTS is freely available as an R package in Bioconductor.
Page 126
118
Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work
Data Integration Expectation Maps: Towards more informed 'omic data integration
Tia Tate1, Christian Richardson2, ClarLynda Williams-DeVane3
1University of North Carolina-Charlotte, 2Duke University, 3Fisk University
ClarLynda Williams-DeVane
Innovative data technologies and decreasing costs have expanded the scope of available data relating to various diseases. A vast amount of -omics data generated at diverse levels (DNA, RNA, protein, metabolite and epigenetic) has revealed relationships of various biological processes. Generally, these diverse data types are considered independently, while combinations of two or more data types are less explored. This narrow approach often fails to identify the intricate interactions responsible for the etiology of complex disease. Complete biological models of complex diseases are only likely to be discovered if the various levels of -omic mechanisms are considered from an integrative perspective. Integrative models often require the integration of biological, computational, mathematical, and statistical domains. However, a well-documented shortage of researchers with a command of multiple domains exists. Thus, we have proposed the use of Data Integration Expectation Maps (DIEMs) as visual tools for facilitating the understanding of integrating various -omic data types to understand complex diseases by filling in gaps in biological knowledge. DIEMs provide a user-friendly format for understanding integrative model development in complex diseases by 1) identifying data formats that can be and/or have been integrated, 2) providing guidance on the best method to integrate the data, and 3) providing an expectation of biological insight to be gained from the integration.
Page 127
119
PRECISION MEDICINE: ADDRESSING THE CHALLENGES OF SHARING, ANALYSIS, AND PRIVACY AT SCALE
POSTER PRESENTATIONS
Page 128
120
Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale
Integrated omics data mining of synergistic gene pairs for cancer precision medicine
Euna Jeong, Choa Park, Sukjoon Yoon
Sookmyung Women's University
Euna Jeong
Current high-throughput technologies enable simultaneous acquisition of multi-level omics and RNAi/chemical screening data in cancers. Production and integration of these data help identify associations of drug targets and synergistic biomarkers (mutations or gene expression), thus accelerating their clinical applications and patient stratification. We have extensively carried out cancer big data mining and phenotypic siRNA library screening to find the optimal combination of targets and biomarkers for advanced cancer therapies such as regulating cancer stem-like cells (CSLCs) and oncogenic transcription factors. Our multiplexed screening dissects phenotypic responses into sensitivity and resistance to the target knockdown. Combined with mutaome and transcriptome data of screened cell lines, targetome-wide knockdown data reveal the functional aspect of synergistic effects between target siRNAs and mutation/transcription signatures, leading to the discovery of novel synthetic lethal gene pairs. Production and integration of these data enabled us to identify target-biomarker combinations for accelerating their clinical applications and patient stratification.
Page 129
121
Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale
The power of dynamic social networks to predict individuals' mental health
Shikang Liu1, David Hachen1, Omar Lizardo2, Christian Poellabauer1, Aaron Striegel1, Tijana Milenkovic1
1University of Notre Dame, 2University of California at Los Angeles
Shikang Liu
Precision medicine has received attention both in and outside the clinic. We focus on the latter, by exploiting the relationship between individuals' social interactions and their mental health to predict one's likelihood of being depressed or anxious from rich dynamic social network data. Existing studies differ from our work in at least one aspect: they do not model social interaction data as a network; they do so but analyze static network data; they examine "correlation" between social networks and health but without making any predictions; or they study other individual traits but not mental health. In a comprehensive evaluation, we show that our predictive model that uses dynamic social network data is superior to its static network as well as non-network equivalents when run on the same data.
Page 130
122
Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale
Robust-ODAL: Learning from heterogeneous health systems without sharing patient-level data
Jiayi Tong1, Rui Duan1, Ruowang Li1, Martijn J. Schuemie2, Jason H. Moore1, Yong Chen1
1University of Pennsylvania, 2Janssen Research and Development LLC
Jiayi Tong
Electronic Health Records (EHR) contain extensive patient data on various health outcomes and risk predictors, providing an efficient and wide-reaching source for health research. Integrated EHR data can provide a larger sample size of the population to improve estimation and prediction accuracy. To overcome the obstacle of sharing patient-level data, distributed algorithms were developed to conduct statistical analyses across multiple clinical sites through sharing only aggregated information. However, the heterogeneity of data across sites is often ignored by existing distributed algorithms, which leads to substantial bias when studying the association between the outcomes and exposures. In this study, we propose a privacy-preserving and communication-efficient distributed algorithm which accounts for the heterogeneity caused by a small number of the clinical sites. We evaluated our algorithm through a systematic simulation study motivated by real-world scenarios and applied our algorithm to multiple claims datasets from the Observational Health Data Sciences and Informatics (OHDSI) network. The results showed that the proposed method performed better than the existing distributed algorithm ODAL and a meta-analysis method.
Page 131
123
Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale
PharmGKB: Automated Literature Annotations
Michelle Whirl-Carrillo1, Li Gong1, Rachel Huddart1, Katrin Sangkuhl1, Ryan Whaley1, Mark Woon1, Julia M. Barbarino2, Jake Lever3, Russ B. Altman4, Teri E. Klein5
1Department of Biomedical Data Science, Stanford University; 2Formerly Department of Biomedical Data Science, Stanford University; 3Department of Bioengineering, Stanford University; 4Departments of Bioengineering, Medicine and Genetics, Stanford University; 5Departments of Biomedical Data Science and Medicine, Stanford University
Michelle Whirl-Carrillo
PharmGKB is the largest publicly available resource for pharmacogenomics (PGx) discovery and implementation. Its mission is to collect, curate, integrate and disseminate knowledge about how human genetic variation influences drug response. PharmGKB scientists manually curate the primary literature to capture details of published pharmacogenomic studies such as variant-gene-drug-phenotype associations, statistical significance, study size and population characteristics. PharmGKB refers to these manually created annotations as “Variant Annotations.”
Page 132
124
PACKAGING BIOCOMPUTING SOFTWARE TO MAXIMIZE DISTRIBUTION AND REUSE
WORKSHOP POSTER PRESENTATIONS
Page 133
125
Workshop: Packaging Biocomputing Software to Maximize Distribution and Reuse
Apollo provides Collaborative Genome Annotation Editing with the power of JBrowse
Nathan Dunn1, Colin Diesh2, Robert Buels2, Helena Rasche3, Anthony Bretaudeau4, Nomi Harris1, Ian Holmes2
1Lawrence Berkeley National Lab, 2UC Berkeley, 3University of Freiburg, 4INRA
Nathan Dunn
Genome annotation projects involve multi-step workflows that are largely automated. However, even with a fully automated annotation pipeline, visual inspection and refinement of diverse types of information, such as genomic and transcriptome alignments and predictive models based on sequence elements, are critical to assure and improve the accuracy of the genome annotations prior to publication. To this end, Apollo (https://github.com/GMOD/Apollo/) is a web application that provides responsive and customizable visualization and editing of genomic elements. Built on top of the JBrowse genome browser (http://jbrowse.org/) and its large registry of plugins (https://gmod.github.io/jbrowse-registry/), Apollo supports efficient annotation curation through drag-and-drop editing, a large suite of automated structural edit operations, the ability to pre-define curator comments and annotation status to maintain consistency, attribution of annotation authors, fine-grained user and group access and edit permissions, and a visual history of revertible annotation edits. Setting up a new genome annotation in Apollo is straightforward. Apollo can be run from Docker or from provided AWS instances, and genomes with feature evidence can be retrieved from an existing JBrowse directory. We have also recently enabled researchers to upload their genome sequence and features (in FASTA, VCF, BAM, or GFF3 format) directly to Apollo, minimizing the need for scripting or server access. It is also possible to create annotations on the fly from BLAT or BLAST search results, which provides a way to initiate a gene previously annotated on a closely related species. Apollo provides a Python library that wraps the web services (https://github.com/galaxy-genome-annotation/python-apollo) so that workflow environments such as Galaxy can be automated: the output of an automated workflow can directly create genome projects, provide evidence, and manage access to an Apollo instance. Apollo supports several popular formats for data export. Structural genome annotations can be exported as FASTA, GFF3, or VCF (if annotating variants) along with any associated metadata. Functional annotations mapped to Gene Ontology terms can be exported in GPAD2 or GPI2 format. Apollo is an open-source tool used in over one hundred genome annotation projects around the world, ranging from the annotation of a single species to lineage-specific efforts supporting the annotation of dozens of genomes. https://github.com/GMOD/Apollo/ https://genomearchitect.readthedocs.io/
Page 134
126
Workshop: Packaging Biocomputing Software to Maximize Distribution and Reuse
g:Profiler - One functional enrichment analysis tool, many interfaces serving life science communities
Liis Kolberg, Uku Raudvere, Ivan Kuzmin, Jaak Vilo, Hedi Peterson
University of Tartu
Making sense of gene lists plays an important role in the majority of biological and biomedical experiments. There are several methods and tools that help scientists carry the computational load of these tasks. One such tool is g:Profiler (https://biit.cs.ut.ee/gprofiler), a widely used toolset for functional interpretation and conversion of gene lists from hundreds of species. g:Profiler has served the community since 2007 and continues to provide life scientists with the most up-to-date data and methods to this day. Keeping the service trustworthy and the results reproducible and transparent has been the main goal of the team developing g:Profiler. Success to this end is indicated by the increasing number of user requests per year, which in 2019 alone is close to 9 million queries. These millions of queries originating across the world reflect the diversity of usage preferences, skill sets and research goals of the scientific community. We, as the developers of g:Profiler, have taken this into account by developing and supporting different access options, which, in hindsight, have been a major factor in the increasing user traffic. On the one hand, the g:Profiler web application provides researchers who want quick and easily interpretable results with visualizations, searchable tables and data export possibilities. On the other hand, there is a large bioinformatics community whose members prefer to analyze gene lists in an automated manner. We support them by offering standardized access through public APIs. And, as R and Python are the most popular programming languages among life scientists with informatics expertise, we have simplified the usage of the APIs by wrapping them into corresponding packages named gprofiler2 and gprofiler-official, respectively. For the users somewhere in between, g:Profiler is also available from the Galaxy platform, which is a popular framework for data-intensive biomedical research pipelines run in a graphical user interface. It is clear that the tools in such an interdisciplinary field need to be flexible in order to fully benefit the research community. However, from our experience, the complexity of providing a widely distributed toolset lies in the maintenance of the services rather than in the development, and this is the core reason for depreciation of tools. In g:Profiler the separate interfaces all use the data and methods from a shared hub, making them reliable and consistent with each other even after the frequent data updates. We are positive that g:Profiler has been able to help thousands of researchers across the life science community because our priorities have been to reuse high-quality and regularly updated data, and to maximize the access options so that we would not leave any life science subcommunity behind.
Page 135
127
Workshop: Packaging Biocomputing Software to Maximize Distribution and Reuse
Increasing usability and dissemination of the PathFX algorithm using web applications and docker systems
Jennifer Wilson1, Nicholas Stepanov2, Ajinkya Chalke2, Mike Wong3, Dragutin Petkovic2, Russ B. Altman4
1Department of Chemical & Systems Biology at Stanford University; 2Computer Science Dept at San Francisco State University; 3COSE Computing for Life Sciences at San Francisco State University; 4Helix Group at Stanford University
Mike Wong
Limited efficacy and unacceptable safety confound therapeutic development. Identifying potential liabilities earlier in drug development could significantly improve success rates. Recently, in collaboration with the US FDA, we developed the PathFX algorithm and the openly available PathFX web application for better understanding pathway-level safety and efficacy phenotypes associated with a drug’s target(s). Running the PathFX algorithm locally would enable improved efficiency, security, and privacy; however, installation of PathFX and its dependencies is challenging for non-computational scientists and prevents dissemination. In addition, while PathFX-web quickly analyzes network associations, the phenotype clustering feature has high computational costs that limit the efficiency of the shared cloud server. To resolve these challenges, we developed the PathFX-web Docker container, which provides an easy-to-install, easy-to-use web interface, a standalone command-line formulation of PathFX, and added security/privacy, and which allows leveraging of the computational power of the user’s hardware.
Page 136
128
TRANSLATIONAL BIOINFORMATICS WORKSHOP: BIOBANKS IN THE PRECISION MEDICINE ERA
WORKSHOP POSTER PRESENTATIONS
Page 137
129
Workshop: TBI workshop
Identification of biomarkers related to autism spectrum disorder using genomic information
Leena Sait, Martha Gizaw, Iosif Vaisman
School of Systems Biology, George Mason University
Leena Sait
Autism spectrum disorder (ASD) is one of the most common neurodevelopmental disorders. Worldwide, ASD tends to have a prevalence of one per 132 persons, with an estimated prevalence of 1 in 59 children, according to CDC’s Autism and Developmental Disabilities Monitoring Network. To date, no effective medical treatments for the core symptoms of ASD exist. However, biomarkers capable of detecting and diagnosing ASD can help to translate experimental research results to bedside clinical practice. Biomarker discovery in ASD is complicated by the diversity of core symptoms, which comprise deficits in social communication, presence of rigid, repetitive and stereotypical behaviors, and comorbid medical (e.g., epilepsy) or psychiatric symptoms. The EU-AIMS Longitudinal European Autism Project (LEAP), the largest consortium, made a great advance in the discovery of biomarkers for ASD. It seeks to identify stratification biomarkers using neurobiological or neurocognitive measures, neuroimaging, electrophysiology, biochemistry and genetics. This work is aimed at the identification of single nucleotide polymorphisms (SNPs) based on SNP genotyping in genomic DNA in a large cohort of ASD patients and unaffected related individuals to help understand the exact genetic causes of ASD. We hypothesized that ranking the genes based on distance in the space of the allele frequencies between affected and unaffected populations can be used to identify new putative biomarkers. The dataset retrieved from the Gene Expression Omnibus database (GSE6754) contains more than 6000 samples from 1,400 families. Our results show that the SNPs that are highly ranked by the distance in three-dimensional genotype count space between all the affected and unaffected subjects in the cohort are more likely to be linked to ASD. These results can open new possibilities for further investigation in identifying the genetic mechanisms of ASD.
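The ranking idea in the abstract above can be sketched in a few lines. This is an illustration only, not the authors' method. It assumes each SNP is summarised by a triple of genotype counts (AA, AB, BB) in the affected and the unaffected group, and that distance means Euclidean distance between the two triples. The SNP names and counts are invented.

```python
import math


def genotype_distance(affected, unaffected):
    """Euclidean distance between two (AA, AB, BB) genotype-count triples."""
    return math.dist(affected, unaffected)


def rank_snps(counts):
    """Return SNP identifiers ordered from largest to smallest distance."""
    return sorted(counts, key=lambda snp: genotype_distance(*counts[snp]), reverse=True)


# Hypothetical genotype counts: snp -> (affected triple, unaffected triple).
counts = {
    "snp_a": ((50, 40, 10), (48, 41, 11)),
    "snp_b": ((70, 25, 5), (40, 45, 15)),
    "snp_c": ((30, 50, 20), (35, 45, 20)),
}
ranking = rank_snps(counts)
print(ranking)
```

If the two groups differ in size, raw counts would mostly reflect group size, so one would probably convert counts to genotype frequencies before computing distances. The abstract does not say how this was handled.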
Page 138
130
Workshop: TBI workshop
A pan-cancer 3-gene signature to predict dormancy
Ivy Tran1, Anchal Sharma2, Subhajyoti De2
1Rutgers University-Camden, 2Rutgers Cancer Institute of New Jersey
Ivy Tran
Tumor dormancy is characterized by the dissemination of hibernating tumor cells that do not proliferate until years after apparently successful removal of patients’ primary cancer, resulting in the late relapse of the cancer. Distinguishing between the risk of early (≤8 months) and late (≥5 years) relapse in cancer patients is important for the targeted treatment of the tumor. In this study, we identified 53 genes that were significantly up-regulated or down-regulated in dormant cells, from which three genes, CD300LG, OCIAD2, and VSIG4, were determined by recursive feature elimination to be the most important features in predicting tumor dormancy. Using this three-gene signature, we trained a Random Forest algorithm on a cross-validated (10-fold, repeated 3 times) dataset (n=422) randomly split into training data (75%) and test data (25%), consisting of seven different tumor types: testicular cancer, breast cancer, glioblastoma multiforme, lung cancer, colorectal cancer, kidney cancer and melanoma. The tuned prediction model yielded 80.19% prediction accuracy using confusion matrix analysis, and 82.74% prediction accuracy when using AUC of a ROC curve as the accuracy metric. When independently testing the model on a validation set (n=44) of liver cancer downloaded from ICGC, confusion matrix analysis yielded a 67.44% accuracy and AUC of a ROC curve yielded a 60.48% accuracy. This identified 3-gene signature can be useful in predicting early or late relapse of cancer in patients in clinical practice.
Page 139
131
AUTHORINDEX
’
’tJong,Geert·85
A
Abyzov,Alexej·111Adkins,Joshua·96
·3Agrawal,MonicaAlavi,Ali·99
·28Allen,MaryA.·47Alterovitz,Wei-Lun
Althagafi,Azza·70·27,34,37,123,127Altman,RussB.
·23Anastopoulos,IoannisN.·21Andrade-Navarro,MiguelA.
Andrianova,Katia·74·31Arslanturk,Suzan
·50Atwal,Gurnit
B
·63Bae,HoBae,Taejeong·111
·65Baladandayuthapani, VeerabhadranBaltrus,David·96Barash,Yoseph·92
·34,123Barbarino,JuliaM.·10,108Barik,Amita
Barlaskar,Sabiha·99·39Barnard,Martha·19Beam,AndrewL.
Bebek,Gurkan·75Belyeu,Jon·83
·60Benchek,PenelopeBerger,Howard·85
·65Bhattacharyya,Rupam·22Blinder,Pablo·20,76,93Bobak,CarlyA.
Bock,Christoph·91·25Bourque,Guillaume
·7Branch,Andrea·2Brand,Lodewijk
Brannon,Charlotte·77Bretaudeau,Anthony·125
·27Brinton,ConnorBrodie,Sonia·85Brooks,ThomasG.·79,92Brown,James·85Brown,Joe·83Brown,Yaadira·80Brunak,Søren·113
Brunner,James·114Buels,Robert·125
·4Bui,NamBull,ShelleyB.·105
·21Burkhardt,Sophie·60,64Bush,WilliamS.
·42Bustamante,CarlosD.
C
·6Cai,ChunhuiCai,DanDan·99
·19Cai,TianxiCalvanese,Vincenzo·95
·44Candido,ElisaCapellera-Garcia,Sandra·95Carleton,BruceC.·78,85Carrillo,KatherineI.·81
·49Ceri,StefanoChalke,Ajinkya·127Chaudhry,Shahnaz·85
·3Chen,IreneY.·4Chen,JessicaW.
·12Chen,JianhanChen,Jun·70
·45Chen,Yang·38,122Chen,Yong
Cheong,HeeJin·102·56Cheong,Jae-Ho·53Cherng,SarahT.
Chia,Nicholas·71·50Chmura,Jacob·63Choi,Hyun-Soo
·68,103Chrisman, Brianna·54Christensen,BrockC.
·15Christensen,SarahChu,Chong·82
·6Cohen,WilliamW.·52Coker,Beau
·64CookeBailey,JessicaN.Cormier,Michael·83CornwellIII,EdwardE.·80
·4,42Costa,HelioA.·64Crawford,DanaC.
·5Crowell,Andrea·8Cui,Tianyi
D
Dale,Ryan·88·53Danieletto,Matteo
De,Subhajyoti·130·27Derry,Alexander
Diesh,Colin·125Ding,Yali·84
Page 140
132
Dovat,Sinisa·84·28Dowell,RobinD.
·31Draghici,SorinDrögemöller,BrittI.·78,85
·38,122Duan,Rui·44Duchen,Raquel·7,53Dudley,JoelT.·47Dunker,A.Keith
Dunlap,Kaiti·103 ·68Dunlap, Kaitlyn
Dunn,Nathan·125Durmaz,Arda·75
E
Ekstrand,Sophia·95·15El-Kebir,Mohammed
Elo,Laura·117
F
·47Faraggi,EshelFarlik,Matthias·91Feng,Song·96
·43Feng,YunyiFitzGerald,GarretA.·79,92
·60Fondran,JeremyR.Fortelny,Matthias·91
·19Fried,Inbar·23Friedl,Verena
G
·59Gao,Jean ·65Garmire, Lana
Gerstein,Mark·77·10,108Ghadermarzi,Sina
Ghayoori,Sholeh·85Ghosh,Sayan·96
·28Gilchrist,AlisonR.Gizaw,Martha·129
·21Glodde,Josua·22Golgher,Lior
·34,123Gong,LiGorjifard,Sayeh·88Grant,GregoryR.·79,92Groeneweg,GabriellaS.S.·85Gumerov,VadimM.·86
·27,37Guo,MargaretGuo,Yicheng·106
·22Gur,ShirGursoy,Gamze·77
H
Ha, Min Jin · 65
Haan, David · 23
Haber, Nick · 68, 103
Hachen, David · 35, 121
Haines, Jonathan L. · 60
Halappanavar, Mahantesh · 96
Hall, Molly A. · 66
Hamilton-Nelson, Kara L. · 60
Hao, Jie · 24
Harati, Sahar · 5
Harrigan, Caitlin F. · 16, 87
Harris, Nomi · 125
Hauskrecht, Milos · 8
He, Xi · 66
Hernandez-Ferrer, Carles · 32
Higginson, Michelle · 85
Hill, Jane E. · 20, 76, 93
Hocking, Toby Dylan · 25
Hoehndorf, Robert · 70, 90
Hogenesch, John B. · 92
Hogue, Christopher W.V. · 11
Holmes, Ian · 125
Hong, Elliot · 115
Horng, Steven · 3
Huang, Fei · 47
Huang, Heng · 2
Huang, Kun · 58
Huddart, Rachel · 34, 123
Hughitt, V. Keith · 88
Husic, Arman · 103
Hwang, Tae Hyun · 56
I
Ito, Shinya · 78, 85
J
Jaakkimainen, Liisa · 44
Jagannathan, N. Suhas · 11
Jahanshad, Neda · 115
Jang, Hyun-Jong · 94
Jenkins, Willysha · 89
Jeong, Euna · 120
Jørgensen, Isabella Friis · 113
Jouline, Igor · 74
Jun, Tomi · 7
Jung, Dahuin · 63
Jung, Jae-Yoon · 103
K
Kafkas, Senay · 90
Kalantari, John · 71
Kalantarian, Haik · 68
Kalyva, Maria · 111
Kang, Mingon · 24
Kar, Nabhonil · 56
Karjalainen, Anzhelika · 91
Katuwawala, Akila · 10, 108
Keats, Jonathan J. · 88
Kelly, Libusha · 41
Kent, Jack · 103
Khan, Ariful · 96
Khan, Saad · 41
Kim, Jacob · 114
Kim, Woo Joo · 102
Kinzy, Tyler · 64
Kleber, Marcus E. · 66
Klein, Teri E. · 34, 81, 123
Kline, Aaron · 68, 103
Kloczkowski, Andrzej · 47
Kober, Kord M. · 114
Kocher, Jean-Pierre · 33
Kochunov, Peter · 115
Koestler, Devin C. · 55
Kohane, Isaac S. · 19
Kolberg, Liis · 126
Kompa, Benjamin · 19, 52
Kong, Sek Won · 32
Koohi-Moghadam, Mohamad · 72
Kosaraju, Sai Chandra · 24
Koster, Johannes · 83
Kramer, Stefan · 21
Kriwacki, Richard W. · 13
Krunic, Milica · 91
Kunder, Christian A. · 4
Kunkle, Brian W. · 60
Kurgan, Lukasz · 10, 108
Kuzmin, Ivan · 126
L
Lahens, Nicholas F. · 79, 92
Larson, Melissa C. · 33
Larson, Nicholas B. · 33
Lassnig, Caroline · 91
Lawrence, Cris W. · 79, 92
LeBlanc, Emilie · 103
Lee, E. Alice · 82
Lee, Han Sol · 102
Lee, Hao-Chih · 53
Lee, Joon-Yong · 96
Lee, Rena · 45
Lee, Soohyun · 82
Lee, Sung Hak · 94
Leiserson, Mark D.M. · 15, 17
Lever, Jake · 34, 123
Levy, Joshua J. · 54
Li, Hongyan · 72
Li, Li · 7
Li, Ruowang · 38, 122
Lichtarge, Olivier · 26
Lim, Sooyeon · 102
Lin, Deborah · 43
Lin, John · 64
Lin, Justin · 20, 93
Lin, Simon · 43
Liu, Chang · 43
Liu, Qingzhi · 65
Liu, Shikang · 35, 121
Liu, Xiaorong · 12
Liu, Zhandong · 100
Lizardo, Omar · 35, 121
Lowry, William E. · 100
Lu, Xinghua · 6
Luthria, Gaurav · 36
Lv, Tianling · 45
M
Ma, Feiyang · 95
Machiraju, Raghu · 58
Macho-Maschler, Sabine · 91
Maerz, Winfried · 66
Magee, Laura A. · 85
Mallick, Parag · 58
Manlove, Logan · 111
Manrai, Arjun K. · 101
Mariani, Jessica · 111
Mayberg, Helen · 5
McDermott, Jason · 96
McDonnell, Lauren · 20, 93
Meier, Richard · 55
Melamed, Rachel D. · 116
Melas-Kyriazi, Luke · 101
Meng, Jingwei · 47
Miao, Fudan · 85
Michalowski, Aleksandra M. · 88
Mikkola, Hanna K.A. · 95
Milenkovic, Tijana · 35, 121
Miotto, Riccardo · 53
Mitrea, Diana M. · 13
Mock, Beverly A. · 88
Monroy, Rebeca · 82
Moore, Jason H. · 38, 122
Moosavinasab, Soheil · 43
Morris, Quaid · 16, 44, 50, 87
Mueller, Mathias · 91
Mueller-Myhsok, Bertram · 66
N
Na, Jie · 33
Nakatochi, Masahiro · 97
Nayak, Soumyashant · 79, 92
Nelson, Heidi · 71
Nelson, William · 96
Nemati, Shamim · 5
Nemesure, Matthew · 93
Nemesure, Matthew D. · 20
Neums, Lisa · 55
Nguyen, Justine · 96
Nguyen, Tin · 31
Nichols, Kai · 2
Nie, Allen · 42
Ning, Michael · 103
Nishi, Hafumi · 106
Noh, Ji Yun · 102
O
O'Toole, John F. · 64
Oh, Ji Eun · 104
Ola, Mojoyinola Joanna · 91
Oldfield, Christopher J. · 10, 47, 108
Olufajo, Olubode A. · 80
Orloff, Mohammed · 75
Ostojic, Annie · 98
P
Palmer, Nathan · 19
Park, Choa · 120
Park, Gunseok · 104
Park, Peter J. · 82
Park, Sunho · 56
Park, Yoonsik · 50
Parmigiani, Giovanni · 57
Paskov, Kelley Marie · 68, 103
Passero, Kristin · 66
Patel, Chirag J. · 101
Patel, Ronak Y. · 42
Patil, Prasad · 57
Patnaik, Ritik · 68
Payne, Jonathon L. · 84
Pedersen, Brent · 83
Pellegrini, Matteo · 95
Penev, Yordan · 103
Pershad, Yash · 37
Perumalswami, Ponni · 7
Peterson, Hedi · 126
Petkovic, Dragutin · 99, 127
Pham, Minh · 26
Pietras, Christopher Michael · 67
Pineda, Arturo L. · 42
Pinoli, Pietro · 49
Piro, Rosario · 49
Poellabauer, Christian · 35, 121
Poelzl, Andrea · 91
Pohodich, Amy E. · 100
Polley, Eric C. · 88
Power, Liam · 67
Proukakis, Christos · 111
Pruneda, Jonathan · 96
Przytycka, Teresa M. · 17
Q
Quinlan, Aaron R. · 83
R
Raman, Ayush T. · 100
Ramchandran, Maya · 57
Ramsey, Stephen A. · 61
Rasche, Helena · 125
Rassekh, Shahrad · 78
Rassekh, Shahrad R. · 85
Raudvere, Uku · 126
Reimand, Jüri · 110
Richardson, Christian · 89, 118
Romero, Pedro · 47
Ross, Colin J.D. · 78, 85
Rowsey, Ross · 33
Rubanova, Yulia · 16, 87
Ryder, Nathan · 39
S
Sagers, Luke · 101
Sait, Leena · 129
Salas, Lucas A. · 54
Sanatani, Shubhayan · 85
Sangkuhl, Katrin · 34, 123
Sarantopoulou, Dimitra · 79, 92
Sawyer, Sara L. · 28
Scheuemie, Martijn J. · 38, 122
Schmaltz, Allen · 19
Schug, Jonathan · 92
Schwartz, Jessey · 68
Sedlazeck, Fritz · 111
Sedor, John R. · 64
Sekar, Shobana · 111
Selega, Alina · 16, 87
Sharan, Roded · 17
Sharma, Anchal · 130
Sharpnack, Michael · 58
Shaw, Kaitlyn · 85
Shen, Li · 2
Shi, Xu · 19
Shoebridge, Stephen · 91
Siekiera, Julia · 21
Simmons, John K. · 88
Skander, Dannielle · 75
Slonim, Donna K. · 67
Somjee, Ramiz · 13
Song, Dae Hyun · 24
Song, Joon Young · 102
Sontag, David · 3
Sørensen, Søren Schwartz · 113
Sosa, Carlos P. · 33
Sosa, Daniel N. · 27
Southerland, William · 80
Sriharan, Aravindhan · 54
Srinivasan, Anand · 92
Srivastava, Arunima · 58
Stabell, Alex C. · 28
Stamoulakatou, Eirini · 49
Stanley, Jacob T. · 28
Staub, Michelle · 85
Stehr, Henning · 4
Stepanov, Nicholas · 127
Stockham, Nathaniel · 68, 103
Striegel, Aaron · 35, 121
Strobl, Birgit · 91
Stuart, Joshua M. · 23
Sun, Hongzhe · 72
Sun, Min Woo · 103
Suomi, Tomi · 117
T
Tao, Ruikang · 23
Tao, Yifeng · 6
Tariq, Qandeel · 68
Tate, Tia · 118
Thompson, Jeffrey A. · 55
Thompson, Paul · 115
Tintle, Nathan · 39
Tomasini, Livia · 111
Tong, Jiayi · 38, 122
Tran, Ivy · 130
Tran, Nhat · 59
Trueman, Jessica · 85
Tsaku, Nelson Zange · 24
Tucker-Kellogg, Lisa · 11
U
Urban, Alexander E. · 111
Uversky, Vladimir N. · 47
V
Vaccarino, Flora M. · 111
Vaickus, Louis J. · 54
Vaisman, Iosif · 129
Vandromme, Maxence · 7
Varma, Maya · 68, 103
Vilo, Jaak · 126
Voss, Catalin · 68, 103
W
Wagner, Sarah · 77
Wall, Dennis P. · 68, 103
Wan, Ying-Wooi · 100
Wand, Hannah · 42
Wang, Chen · 33
Wang, Gao · 29
Wang, Haibo · 72
Wang, Hua · 2
Wang, Junwen · 72
Wang, Qingbo · 36
Wang, Yuchuan · 72
Wang, Yue · 29
Wang, Yunlong · 29
Warfe, Mike · 60
Washington, Peter · 68, 103
Weber, Griffin · 19
Wei, Eric · 27
Weinstein, Alana S. · 23
West, Nicholas · 85
Westra, Jason · 39
Whaley, Ryan · 34, 123
Wheeler, Nicholas R. · 60
Whirl-Carrillo, Michelle · 34, 123
Whyte, Simon D. · 85
Williams-DeVane, ClarLynda · 89, 118
Wilson, Jennifer · 127
Wilton, Andrew S. · 44
Wodchis, Walter · 44
Wojtowicz, Damian · 17
Wolf, Jack · 39
Wolf, Lior · 22
Wong, Christopher K. · 23
Wong, Mike · 127
Woon, Mark · 34, 123
Wright, Galen E.B. · 78, 85
Wright, Matt W. · 42
Wu, Tong · 29
Wulf, Bryan · 42
X
Xia, Xueting · 39
Xing, Lei · 45
Xue, Bin · 47
Y
Yalamanchili, Hari Krishna · 100
Yang, Hee Chul · 104
Yang, Jizhou · 99
Yang, Xinming · 72
Yao, Yao · 61
Yavartanu, Fatemeh · 105
Yoo, Yun Joo · 105
Yoon, Sukjoon · 120
Yoon, Sungroh · 63
Young, Adamo · 50
Yu, Ke · 8
Yue, Feng · 84
Z
Zehnder, James L. · 4
Zeng, Xianlong · 43
Zhang, Bo · 84
Zhang, Haoran · 44
Zhang, Lin · 106
Zhang, Mingda · 8
Zhao, Wei · 45
Zhou, Bo · 111
Zhou, Jiayan · 66
Zhulin, Igor B. · 86
Zoghbi, Huda Y. · 100
Zou, James · 42