Development of a Screening Informatics System at the UNM Center for Molecular Discovery

Powered by: Mesa, OpenEye, OpenBabel, SciTouch

Jeremy Yang, Oleg Ursu, Stephen Mathias, Cristian Bologa, Anna Waller, Annette Evangelisti, Gergely Záhoransky-Köhalmi, and Tudor Oprea
University of New Mexico, Albuquerque, New Mexico, USA
ACS National Meeting, San Diego, March 25-29, 2012

Intro to Flow Cytometry at UNMCMD

References:
1. Flow Cytometry Shifting Gears, Genetic Eng & Biotech News, Nov 15, 2011 (Vol. 31, No. 20), http://www.genengnews.com/gen-articles/flow-cytometry-shifting-gears/3913.
2. Edwards BS, Young SM, Saunders MJ, Bologa C, Oprea TI, Ye RD, Prossnitz ER, Graves SW, Sklar LA. High throughput flow cytometry for drug discovery. Expert Opin. Drug Discov. 2, 685-696, 2007.
3. Haynes MK, Strouse JJ, Waller A, Leitão A, Curpan RF, Bologa C, Oprea TI, Prossnitz ER, Edwards BS, Sklar LA, Thompson TA. Detection of intracellular granularity induction in prostate cancer cell lines by small molecules using the HyperCyt® high throughput flow cytometry system. J. Biomol. Screening, 14, 596-609, 2009.
4. The NIH's Molecular Libraries Program: What's Next? | SLAS Electronic Laboratory Neighborhood, http://www.eln.slas.org/story/1/52-the-nihs-molecular-libraries-program-whats-next.

Multiplexed!

One of the primary motivations for cheminformatics has been drug discovery, which involves bioassay screening and, increasingly, high-throughput screening (HTS).

What is screening informatics?
• Informatics in support of screening for biomolecular discovery, usually pharma discovery: Acquisition, processing and storage of bioassay data for use during projects and for retrospective analyses.
• Searching over molecules, assays, activities, targets, etc. I/O & integration in conformance with contractual, legal/regulatory, business, and scientific requirements.
• Applications and interfaces suited to transdisciplinary audience (biology, chemistry, pharmacology, medicine, etc.).

Screening Informatics ≠ Cheminformatics !!
Cheminformatics is a key part of screening informatics, but biology is primary. Plates, wells, samples, and measurements are physically real and informatically authoritative, while structure data is a model which may be incorrect or imprecise. Chem and bio contexts must be integrated for a successful system. E.g. EC50 = 1.7 µM is about a sample, a well, a plate, an assay, a biological system… eventually, we hope, about a lead compound.

Why screening informatics? Major challenges
• New methodology, such as high-content and multiplex bioassays
• More data, internal and external
• New privacy and collaboration models
• Advances in cheminformatics and bioinformatics methodology
• Development concurrent with ongoing projects and deadlines, requiring a continually operational system.

No shrink-wrapped solutions
Due to the complexity of modern screening informatics, and in particular our novel, highly versatile multiplex flow-cytometry platform (patented, and commercialized as HyperCyt), there cannot be a shrink-wrapped solution providing all needed functionality for all possible experiments.

Solution: hybrid, agile system of apps & APIs
Heterogeneous software components from (1) commercial vendors, (2) open source projects, and (3) custom code developed at UNM.

AEI & Pipeline Pilot & customization
After licensing the Accelrys Accord Enterprise Informatics (AEI) and Pipeline Pilot (PP) software in 2009, efforts began to configure and customize AEI/PP. Accelrys-UNMCMD consultation, customization and training revealed (1) which components could be used with minor configuration effort, and (2) the scope of required custom coding. This experience was essential and decisive in the evolutionary design process.

UNMCMD specialized for flow cytometry

Automating when every assay is special
Flow cytometry generates multiple fluorescence measurements per sample and per target, where multiplex = multi-target. Even “singleplex” assays may employ multiple positive and negative control targets.
Assays can differ greatly in raw data outputs and in the analysis protocols used to calculate a “response” representing a biological outcome (e.g. binding to a target). In some cases, it may seem more appropriate to conceive of an API (programming interface) for recoding each assay analysis than of an informatics system that is flexible but generally constant. In fact, our solution combines elements of both.

Custom code: Using the right tools for the tasks
Custom software development has included: Oracle SQL w/ AEI, Excel macros, Perl, Java, Python, NCBI EntrezUtils apps, custom PP protocols, Prism batch code, and more. Interfaces include command-line apps, web apps, and in-house APIs for rapid development.

Conclusion
The good news is that advances in software and informatics provide choices of solutions and opportunities to effectively manage screening. The complexity of the software landscape is truly both a challenge and an opportunity. It is hoped that our experiences will be helpful to others similarly tasked with designing and implementing a screening informatics system.

Accurate data acquisition key prerequisite
Screening informatics depends on accurate measurements, with additional informatics challenges such as “binning”, i.e. correlating fluorescence data to wells and substances.
c/o Anna Waller, UNMCMD. HyperViewSession_20110603

Microsoft Excel, not going away soon
Excel remains an important tool for scientific data processing, analysis and visualization, at UNMCMD and elsewhere. But it has fundamental limitations and drawbacks, esp. data and code access and version control. E.g. Bcl-2 assay analysis worksheets, UNMCMD, 2007 (PubChem AID=1693).

AEVA (Assay Exploration, Viewing & Analysis) web app

PP protocol, via WebPort, to generate PubChem-compliant depositor upload.

Hit Definition: various assays, various methods
• Response: > (activation) or < (inhibition) cutoff.
• SD: > (activation) or < (inhibition) cutoff SDs from plate mean.
• Custom: custom function specified for assays with “special needs”.
Custom may include counter-targets, multiple +/- controls, etc.
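The three hit-definition methods listed above can be sketched in Python, one of the languages named under custom code. This is a minimal illustration, not the UNMCMD implementation: the function names, the dictionary of well IDs mapped to responses, and the use of the sample standard deviation over all wells on the plate are assumptions made for the example.

```python
from statistics import mean, stdev


def is_hit(value, cutoff, mode):
    """Return True if value passes the cutoff for the given mode."""
    if mode == "activation":
        return value > cutoff
    if mode == "inhibition":
        return value < cutoff
    raise ValueError(f"unknown mode: {mode!r}")


def hits_by_response(plate, cutoff, mode="activation"):
    """Response method: compare each well's response to a fixed cutoff."""
    return sorted(w for w, r in plate.items() if is_hit(r, cutoff, mode))


def hits_by_sd(plate, n_sd, mode="activation"):
    """SD method: the cutoff lies n_sd standard deviations from the plate mean.

    Assumption: sample standard deviation over every well on the plate.
    """
    values = list(plate.values())
    mu, sd = mean(values), stdev(values)
    cutoff = mu + n_sd * sd if mode == "activation" else mu - n_sd * sd
    return hits_by_response(plate, cutoff, mode)


def hits_custom(plate, func):
    """Custom method: func(well, response) decides, e.g. using counter-targets."""
    return sorted(w for w, r in plate.items() if func(w, r))


# Hypothetical plate: well ID -> response value (made-up numbers).
plate = {"A01": 10.0, "A02": 12.0, "A03": 11.0,
         "A04": 9.0, "A05": 48.0, "A06": 10.0}

response_hits = hits_by_response(plate, cutoff=40.0)
sd_hits = hits_by_sd(plate, n_sd=1.5)
custom_hits = hits_custom(plate, lambda well, r: r > 11.0 and well != "A05")
```

Passing the custom rule as a function keeps the Response and SD methods generic while leaving room for assays whose rule depends on counter-targets or multiple controls.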