Drug Discovery Today Volume 14, Numbers 5/6 March 2009 REVIEWS

Novel web-based tools combining chemistry informatics, biology and social networks for drug discovery

Moses Hohman¹, Kellan Gregory¹, Kelly Chibale², Peter J. Smith³, Sean Ekins⁴,⁵,⁶ and Barry Bunin¹

¹ Collaborative Drug Discovery, Inc. 1818 Gilbreth Road, Suite 220, Burlingame, CA 94403, USA
² Institute of Infectious Disease and Molecular Medicine and Department of Chemistry, University of Cape Town, Rondebosch 7701, South Africa
³ Division of Pharmacology, Department of Medicine, University of Cape Town, Medical School, K45, OMB, Groote Schuur Hospital, Observatory, 7925, South Africa
⁴ Collaborations in Chemistry, Jenkintown, PA 19046, USA
⁵ University of Medicine and Dentistry of New Jersey, Robert Wood Johnson Medical School, 675 Hoes Lane, Piscataway, NJ 08854, USA
⁶ Department of Pharmaceutical Sciences, University of Maryland, 20 Penn Street, Baltimore, MD 21201, USA

A convergence of different commercial and publicly accessible chemical informatics, databases and social networking tools is positioned to change the way that research collaborations are initiated, maintained and expanded, particularly in the realm of neglected diseases. A community-based platform that combines traditional drug discovery informatics with Web2.0 features in secure groups is believed to be the key to facilitating richer, instantaneous collaborations involving sensitive drug discovery data and intellectual property. Heterogeneous chemical and biological data from low-throughput or high-throughput experiments are archived, mined and then selectively shared, either securely with specifically designated colleagues or openly on the Internet in standardized formats. We illustrate several case studies for anti-malarial research enabled by this platform, which we suggest could easily be expanded more broadly to pharmaceutical research in general.
The networked revolution
Recent research suggests that open collaborative drug discovery will be the future paradigm of biomedical research [1–3]. Reviews in this journal have provided a perspective on the many publicly accessible, open access chemistry databases and Internet-based collaborative tools [4,5] that are likely to enhance scientific research in the future. Some of these public databases are already being used for structure–activity relationship (SAR) development [6] and rapid lead identification [7]. It takes a combination of biology and chemistry insight, however, to translate molecules into potential drugs, and there has been little, if any, discussion of how collaborations between chemists and biologists are to be facilitated [8]. The challenges associated with bringing chemists and biologists together for virtual drug discovery projects for neglected diseases [8] provide an arena for testing new approaches that can perhaps be expanded more broadly to commercial drug discovery projects. The biological data available for sharing are frequently stored in single document or Excel™ files. Data compilation is sporadic and shallow, with little, if any, standardization of data formats or of crucial information, such as experimental procedures and the statistical analyses that quantify data quality, that would allow reproducibility and comparison between groups. Before collaborations begin, data security and integrity should always be considered, whereas intellectual property arrangements [Materials Transfer and intellectual property (IP) Rights Agreements] are often seen (at least in academia) as necessary but generally a hindrance to progress. As a collaboration progresses, the needs of data users may change, so flexibility is important both in how individual systems are used to track and store data and in moving data between systems [8].
Corresponding author: Bunin, B. ([email protected]). 1359-6446/06/$ - see front matter © 2009 Elsevier Ltd. All rights reserved. doi:10.1016/j.drudis.2008.11.015 www.drugdiscoverytoday.com

Any tool that can tap into a growing community of researchers becomes more valuable as a function of Metcalfe's law, which states that the value of a network is proportional to the square of the number of its users.
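To make the scaling behind Metcalfe's law concrete, the short Python sketch below counts the distinct pairwise connections n(n − 1)/2 among n users and applies the n² proportionality directly. The function names and the arbitrary scale factor `k` are illustrative assumptions, not part of the CDD platform; the point is only that doubling a community roughly quadruples its potential connections.

```python
def pairwise_links(n: int) -> int:
    """Distinct pairwise connections possible among n users: n(n - 1)/2."""
    if n < 0:
        raise ValueError("number of users must be non-negative")
    return n * (n - 1) // 2


def metcalfe_value(n: int, k: float = 1.0) -> float:
    """Metcalfe's law: network value taken as proportional to n**2.

    k is an arbitrary, hypothetical scale factor; only ratios are meaningful.
    """
    return k * n ** 2


if __name__ == "__main__":
    for users in (10, 20, 40):
        print(users, pairwise_links(users), metcalfe_value(users))
```

Going from 10 to 20 users raises the number of possible links from 45 to 190, and the modelled value fourfold.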
CDD Technical details
The CDD web application is hosted on a dual-Xeon, 4 GB RAM server with a RAID-5 SCSI hard drive array with one online spare. In case of machine failure, there is an online failover machine with live replicas of the database and application code. These machines sit behind a hardware firewall allowing in only HTTP/S connections from the Internet. All HTTP requests are redirected to HTTPS, providing transport confidentiality from the user's browser to the server. CDD currently colocates servers at ColoServe in San Francisco (www.coloserve.com), which provides redundant power, HVAC and backbone connections, fire suppression and physical security. The CDD software is written in Ruby on Rails over a MySQL database. Ruby on Rails is a novel web application framework noted for enabling productive, 'quick and clean', well-factored object-oriented software development; strong web standards adherence; thorough automated software testing; and horizontal scaling via a shared-nothing architecture. CDD does a full onsite and offsite backup of the production database on a nightly basis and backs up incremental changes every five minutes, while application code is backed up nightly.
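The box notes that all HTTP requests are redirected to HTTPS. As a rough illustration of that behaviour (not CDD's actual configuration, which would more likely live in the web server or the Rails application), the Python standard-library sketch below answers every plain-HTTP GET or HEAD with a 301 redirect to the same path over HTTPS. The handler name, the `serve_redirects` helper and port 8080 are hypothetical choices for the example, and the sketch assumes HTTPS is served on the default port 443.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def https_redirect_location(host_header: str, path: str) -> str:
    """Build the HTTPS URL for a request, dropping any explicit HTTP port.

    Assumes the HTTPS endpoint listens on the default port 443.
    """
    if host_header.startswith("["):  # IPv6 literal, e.g. "[::1]:8080"
        end = host_header.find("]")
        hostname = host_header if end == -1 else host_header[: end + 1]
    else:
        hostname = host_header.partition(":")[0]
    return f"https://{hostname}{path}"


class RedirectToHTTPSHandler(BaseHTTPRequestHandler):
    """Answers plain-HTTP requests with a permanent redirect to HTTPS."""

    def do_GET(self) -> None:
        host = self.headers.get("Host", "localhost")
        self.send_response(301)  # Moved Permanently
        self.send_header("Location", https_redirect_location(host, self.path))
        self.send_header("Content-Length", "0")
        self.end_headers()

    do_HEAD = do_GET  # HEAD gets the same headers and no body


def serve_redirects(port: int = 8080) -> None:
    """Run the redirecting listener until interrupted (hypothetical helper)."""
    HTTPServer(("", port), RedirectToHTTPSHandler).serve_forever()
```

Calling `serve_redirects()` and requesting `http://localhost:8080/login` should return a 301 whose `Location` header is `https://localhost/login`. For form submissions over POST, a 308 status would preserve the request method; the sketch keeps to GET and HEAD for simplicity.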
hosted collaborative system, with an important advantage over traditional PC-based database systems: it enables secure login to the database from any computer using any common browser (e.g. Firefox, Internet Explorer or Safari). This capability gives users considerable flexibility.
The CDD web-based database architecture (Box 3; Figure 1B) handles a broad array of data types that can be archived and then selectively shared among colleagues or openly shared on the Internet in standardized formats. The CDD platform incorporates ChemAxon's (Budapest, Hungary) Marvin, Calculator Plugins for physicochemical property calculations and the JChem Cartridge for structure searching within the application as the chemistry engine. This allows one to do sophisticated SAR analysis, including chemical pattern recognition (e.g. similarity and substructure searching), physicochemical property calculations, and Boolean search and save capabilities for potency, selectivity, toxicity and other experimentally derived properties. CDD technologies handle heterogeneous data files from instruments and individual experiments, as well as standardized, convertible CSV and SDF file formats that represent the chemical and biological data (compatible with the NIH PubChem initiative). CDD is tailored for common data formats used by biologists, such as Microsoft Excel™ (.xls) and text (.txt) files. The technology can mine against a variety of values including concentration, time, percent, real,
Collaborative research between three different groups sharing chemical structures of interest 'in house' with a biologist half-way around the world.

and sources from around the globe for novel or similar compounds. When one is not particularly interested in novel compositions of matter (as is frequently the case for neglected diseases), the efficiency of the research can be increased by tapping directly into data from the current generation and past generations of scientists.
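Similarity searching of this kind is commonly scored with the Tanimoto coefficient on binary structural fingerprints. The Python sketch below represents each fingerprint as a set of 'on' bit positions and ranks library compounds against a query above a threshold. The fingerprints, compound names and the 0.5 cutoff are hypothetical; the text does not say which fingerprint or similarity metric the ChemAxon engine in CDD uses.

```python
from typing import AbstractSet, Dict, List, Tuple


def tanimoto(a: AbstractSet[int], b: AbstractSet[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_search(
    query: AbstractSet[int],
    library: Dict[str, AbstractSet[int]],
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs at or above threshold, most similar first."""
    hits = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted((h for h in hits if h[1] >= threshold), key=lambda h: h[1], reverse=True)


# Hypothetical fingerprints (on-bit indices) for illustration only.
LIBRARY = {
    "cmpd_A": {1, 2, 3, 4},
    "cmpd_B": {1, 2, 3, 5},
    "cmpd_C": {7, 8},
}

if __name__ == "__main__":
    print(similarity_search({1, 2, 3, 4}, LIBRARY))
```

With the query `{1, 2, 3, 4}`, `cmpd_A` scores 1.0, `cmpd_B` shares three of five distinct bits (0.6), and `cmpd_C` falls below the cutoff.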
The most promising compounds from this three-way collaboration were shipped to the University of Cape Town and tested, identifying novel compounds and several FDA-approved drugs that almost completely reversed chloroquine resistance in resistant strains in human red blood cells (Figure 3).

In this case, there was a known chemotype (a chemical substructure with an aromatic ring four atoms from a secondary nitrogen) that was conserved among chemosensitizers, initially observed in verapamil [19,24] (Figure 4). Because the groups were willing to work collaboratively, the compounds being screened at UCSF by Professor James McKerrow's group were shared in an 'invitation-only', username- and password-protected secure group to maintain IP protection. A substructure search for the known chemosensitizer substructure identified hundreds of compounds for evaluation in the laboratories of Dr. Peter Smith in Cape Town. Leading candidates were identified and sent for evaluation of efficacy in assays using resistant African malarial parasite strains in human red blood cells. Novel compounds that almost entirely reversed the resistance were identified (Figure 3). This process shaves months off a project timeline relative to synthesizing new compounds from scratch.

Resistance reversal experiments in Plasmodium falciparum K1 using molecules derived from a substructure search across multiple datasets through the CDD database. Inset shows example dose-response curves. The highest concentration at which no antimalarial activity was observed was established for each compound, and this concentration of each compound was included in a chloroquine dose-response curve against the chloroquine-resistant strain K1. The ratio of the IC50 in the presence and absence of the compound (RMI) corresponds to the chloroquine reversal activity at the chosen concentrations. Several compounds almost completely reversed chloroquine resistance in vitro (7-fold), including the FDA-approved drugs pimozide, vinblastine, sertraline and dihydroergotamine mesylate.
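The chemotype query described above (an aromatic ring four atoms from a secondary nitrogen) can be sketched as a breadth-first search over a molecular graph. The Python example below assumes that 'four atoms from' means the nearest aromatic atom lies exactly four bonds from a secondary amine nitrogen; that reading, the toy graph encoding and all helper names are assumptions for illustration. A production search, such as the JChem substructure query used in CDD, would instead run a structure pattern through a cheminformatics toolkit.

```python
from collections import deque
from typing import Dict, List, Tuple

# Toy graph encoding (an assumption): atom index -> (element, is_aromatic, hydrogen_count)
Atoms = Dict[int, Tuple[str, bool, int]]
Bonds = List[Tuple[int, int]]


def build_adjacency(atoms: Atoms, bonds: Bonds) -> Dict[int, List[int]]:
    adj: Dict[int, List[int]] = {i: [] for i in atoms}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    return adj


def bond_distances(adj: Dict[int, List[int]], start: int) -> Dict[int, int]:
    """Breadth-first search: shortest path length, in bonds, from start to each reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nbr in adj[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return dist


def is_secondary_amine(atoms: Atoms, adj: Dict[int, List[int]], i: int) -> bool:
    element, aromatic, h_count = atoms[i]
    return element == "N" and not aromatic and h_count == 1 and len(adj[i]) == 2


def has_chemotype(atoms: Atoms, bonds: Bonds, separation: int = 4) -> bool:
    """True if the nearest aromatic atom to some secondary N is exactly `separation` bonds away."""
    adj = build_adjacency(atoms, bonds)
    for i in atoms:
        if is_secondary_amine(atoms, adj, i):
            dist = bond_distances(adj, i)
            aromatic = [d for j, d in dist.items() if atoms[j][1]]
            if aromatic and min(aromatic) == separation:
                return True
    return False


def with_phenyl(chain: Atoms, chain_bonds: Bonds, attach_to: int) -> Tuple[Atoms, Bonds]:
    """Append a phenyl ring bonded to chain atom `attach_to` (chain indices must be 0..n-1)."""
    atoms = dict(chain)
    ring = list(range(len(atoms), len(atoms) + 6))
    for k, idx in enumerate(ring):
        atoms[idx] = ("C", True, 0 if k == 0 else 1)  # ipso carbon carries no H
    bonds = list(chain_bonds) + [(attach_to, ring[0])]
    bonds += [(ring[k], ring[(k + 1) % 6]) for k in range(6)]
    return atoms, bonds


# Hypothetical test molecule CH3-NH-CH2-CH2-CH2-Ph: ring carbon four bonds from the N-H nitrogen
PHENYLPROPYL = with_phenyl(
    {0: ("C", False, 3), 1: ("N", False, 1), 2: ("C", False, 2), 3: ("C", False, 2), 4: ("C", False, 2)},
    [(0, 1), (1, 2), (2, 3), (3, 4)],
    attach_to=4,
)
# Hypothetical test molecule CH3-NH-CH2-Ph: ring carbon only two bonds from the nitrogen
BENZYL = with_phenyl(
    {0: ("C", False, 3), 1: ("N", False, 1), 2: ("C", False, 2)},
    [(0, 1), (1, 2)],
    attach_to=2,
)
```

Using the nearest aromatic atom matters: in the benzyl case a meta ring carbon also sits four bonds from the nitrogen, so testing 'any aromatic atom at distance four' would give a false match.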
The same substructure query was used on the set of known FDA-approved and orphan-approved drug compounds (including structures) provided by Dr. Christopher Lipinski (www.collaborativedrug.com/register). Because these compounds are already approved for other indications, they could be developed rapidly if found to be efficacious. Eighteen compounds containing the conserved substructure were identified, and half a dozen were purchased and shipped to Africa; when tested in the assay, these known drugs almost completely reversed the resistance in human blood cells (7-fold reversal; Figure 4). Because the compounds in the Lipinski-CDD Database are drugs already known to be safe and efficacious in humans, the process could save years on the drug development timeline [25,26]. Others have also recently proposed repurposing old drugs for malaria [27,28], a generally useful strategy that can be applied to other diseases as well.
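The resistance reversal readout (RMI) is a ratio of chloroquine IC50 values measured with and without the test compound. The arithmetic is sketched below in Python; the IC50 values are hypothetical and chosen only so that the result matches the 7-fold reversal quoted in the text. An RMI below 1 means the compound lowered the chloroquine IC50, and its reciprocal is the fold reversal.

```python
def _check_positive(*values: float) -> None:
    if any(v <= 0 for v in values):
        raise ValueError("IC50 values must be positive")


def response_modification_index(ic50_with_compound: float, ic50_chloroquine_alone: float) -> float:
    """RMI = IC50(chloroquine + test compound) / IC50(chloroquine alone)."""
    _check_positive(ic50_with_compound, ic50_chloroquine_alone)
    return ic50_with_compound / ic50_chloroquine_alone


def fold_reversal(ic50_with_compound: float, ic50_chloroquine_alone: float) -> float:
    """How many times lower the chloroquine IC50 became (the reciprocal of RMI)."""
    _check_positive(ic50_with_compound, ic50_chloroquine_alone)
    return ic50_chloroquine_alone / ic50_with_compound


if __name__ == "__main__":
    # Hypothetical K1 IC50 values in nM, not data from the study.
    alone, combined = 140.0, 20.0
    print(round(response_modification_index(combined, alone), 3), fold_reversal(combined, alone))
```

Computing the fold reversal as a direct ratio, rather than as 1 / RMI, avoids an extra floating-point rounding step.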
Temporarily restricted data sharing
A second example of how the CDD platform can be used involves a large set of anti-malarial animal SAR data that was intentionally kept private for 12 months by Professor R. Kiplan Guy (St. Jude Children's Research Hospital) before being released for use by the malaria research community. The data came from a two-volume collection of studies on malarial drugs published by the U.S. Army in 1946 [29]. This publication had contributions from a number of leading researchers of the time and was designed to help researchers develop effective anti-malarial drugs and to serve as a model for how scientists could develop drugs for other infections. The corresponding SAR dataset consisted of over 12 000 hand-drawn molecules with bioactivity relative to known compounds tested in half a dozen animal species. The collection contains other pharmacological data in addition to the compounds' toxicity levels (see Figure 5). Although the original studies are decades old, the data are now, for the first time, accessible in a format suitable for computational model building and direct comparison with recent experimental results. Professor Alex Tropsha's group at the University of North Carolina was able to build new predictive computational models using their combinatorial QSAR modeling techniques [30,31] with this 'new' data. Initially, 131 active and 228 inactive compounds (the inactives being those most chemically similar to the actives) were selected from 3133 compounds screened for anti-P. falciparum (3D7 strain) activity and used to develop preliminary combinatorial QSAR k-nearest neighbors (kNN) classification models with Dragon descriptors. The 383 internally validated models afforded a correct classification rate of 80.7% for an external dataset. Additionally, 674 compounds (log activity −1.52 to 2.78) with in vivo data from Peking ducks inoculated with Plasmodium lophurae malaria were used to generate 283 continuous kNN models (R² = 0.80 for an external test set of 80 molecules). These models enabled virtual screening of compound libraries to find further compounds for in vitro testing, and the results were used to repopulate the CDD database to support the selection of candidates for further in vitro testing.
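To show the core of a kNN classification model and the correct classification rate (CCR) quoted above, here is a minimal Python sketch. It is plain k-nearest-neighbour voting on hypothetical two-dimensional descriptor vectors, without the descriptor selection, Dragon descriptors or applicability-domain checks of the combinatorial QSAR workflow [30,31]. CCR is computed here as the mean of per-class accuracies, one definition often used in QSAR classification; the text does not state which definition produced the 80.7% figure, so plain accuracy is shown alongside.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], str]  # (descriptor vector, class label)


def knn_predict(train: List[Example], query: Sequence[float], k: int = 3) -> str:
    """Majority vote among the k training compounds nearest in Euclidean distance."""
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train: List[Example], test: List[Example], k: int = 3) -> float:
    """Fraction of test compounds whose predicted class matches the observed class."""
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)


def balanced_ccr(train: List[Example], test: List[Example], k: int = 3) -> float:
    """Mean over classes of the fraction of that class predicted correctly."""
    totals, correct = Counter(), Counter()
    for x, y in test:
        totals[y] += 1
        correct[y] += knn_predict(train, x, k) == y
    return sum(correct[c] / totals[c] for c in totals) / len(totals)


# Hypothetical descriptor vectors, not data from the study.
TRAIN = [((0.0, 0.0), "active"), ((0.5, 0.5), "active"), ((1.0, 0.0), "active"), ((0.0, 1.0), "active"),
         ((5.0, 5.0), "inactive"), ((5.5, 4.5), "inactive"), ((4.5, 5.5), "inactive"), ((6.0, 6.0), "inactive")]
TEST = [((0.5, 0.2), "active"), ((0.2, 0.8), "active"), ((4.6, 4.4), "active"),
        ((5.1, 5.0), "inactive"), ((5.8, 5.9), "inactive")]
```

Here the third test compound is labelled active but sits among inactive neighbours, so four of five predictions are correct (accuracy 0.8), while the per-class mean is (2/3 + 1)/2, about 0.83.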
In this case, the group has access to the data only with the permission of the data owners, in order to generate and refine a master combinatorial model. Moreover, the exchange of data is governed by
Community-based anti-malarial animal data. These data were released for general use in a public group following a 12-month escrow period during which the data were held exclusively in a private group.

FIGURE 6
Chemists, biologists and computational scientists can privately or openly share structures, SAR and predictions via the CDD Database as part of a growing community.

Representative drug-centric view with structural information, bioactivity data and calculated properties. Target-centric views are also supported for target validation.
datasets would provide a technical solution enabling them to decide if, when and with whom to share some, all or none of their data. The same model could also be applied to enable greater efficiency for pharmaceutical and biotechnology companies, as well as for academic-driven or foundation-driven drug discovery, in secure, private collaborative groups. In all cases, collaborative researchers can go far beyond what they could normally achieve with just their own limited laboratory networks, ideas and resources.
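The visibility rules described in this section (private groups, explicit sharing with collaborators, and release to a public group after an escrow period such as the 12 months used for the animal SAR data) can be expressed as a small access check. The Python sketch below is hypothetical: the `Dataset` fields, group names and dates are invented for illustration and do not describe CDD's actual permission model.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional, Set


@dataclass
class Dataset:
    owner_group: str
    shared_with: Set[str] = field(default_factory=set)  # invited collaborating groups
    public_after: Optional[date] = None  # end of escrow; None means never public


def can_view(dataset: Dataset, user_groups: Set[str], today: date) -> bool:
    """Owners and explicitly invited groups always see the data; others only after escrow ends."""
    if dataset.owner_group in user_groups or dataset.shared_with & user_groups:
        return True
    return dataset.public_after is not None and today >= dataset.public_after


# Hypothetical example: data private to one lab, shared with a modelling group,
# then released publicly once a made-up escrow date is reached.
ANIMAL_SAR = Dataset("lab_alpha", {"modelling_group"}, public_after=date(2001, 1, 1))
```

Storing the release date on the dataset means the escrow ends without anyone re-sharing the data by hand; whether CDD automates release in this way is not stated in the text.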
Conclusion
What do these trends mean for the future of drug discovery research and development informatics technologies, and where is this field headed? To date, there has not been an extensive assessment of previously developed semantic web tools and their utilization. Yet, even in the absence of such an assessment, web-hosted software-as-a-service (SaaS) applications, already a trend across all industries, will become more prevalent in the drug discovery industry too. Incorporating more private-to-private collaborative features and Web2.0 social networking features would provide an integrated platform (CDD or similar sets of technologies) that serves as a personal e-lab notebook for capturing, organizing and collaborating with other scientists in the growing community, while maintaining the required security and privacy features. These new capabilities provide a useful, secure environment for any research and development organization to tap immediately into collective expertise, whether connecting academic postdoctoral researchers, employees at a research organization half-way around the world with their colleagues in other countries, or employees within a large biopharmaceutical company. Participating laboratories contribute in aggregate to the generation of datasets and predictive models, yet no specific data or approaches need to be exposed to the other research groups. Each group is then able to exploit the model to help guide its own screening activities or explore other scaffolds in silico without revealing any aspects of its intellectual approach.
New informatics tools that incorporate biology and chemistry
with social networking technologies should enable a better, faster,
cheaper mechanism to discover and advance drug candidates in a
collaborative manner, regardless of whether they are for neglected,
orphan or potential ‘blockbuster’ diseases.
Conflicts of interest
Moses Hohman, Kellan Gregory and Barry Bunin are employed by Collaborative Drug Discovery Inc. Sean Ekins is a consultant for Collaborative Drug Discovery Inc. Kelly Chibale and Peter J. Smith have no conflicts of interest to declare.
Acknowledgements
The authors gratefully acknowledge Jim Wikel and Deborah Bunin for comments and the support of the following researchers without whom none of this would have been possible: C. Lipinski (Melior Discovery), J. McKerrow (UCSF), E. Hansell (UCSF), K. Guy (St. Jude CRH), A. Shelat (St. Jude CRH), A. Tropsha (Univ. of North