Page 1
The Changing Face of Scholarly Communication and the Opportunities it
Affords the Bioinformatics/Systems Biology Student
Philip E. Bourne
University of California San Diego
[email protected] http://www.sdsc.edu/pb
Third UCSD Bioinformatics and Systems Biology Expo – 2/28/2011
Page 2
Observation 1: Everyone in this Room is Driven by One Thing Above All Else
Page 3
Observation 2: We Are a Field That Uses/Produces Public On-Line Data Like No Other
Page 4
Observation 3: We Have Shaped the Way Data Are Shared – We Have Had Very Little Impact on
Publications
Page 5
Perhaps it is Time We Thought Less About a Publication as a Reward and More About How it
Can be Presented to Maximize its Use
Page 6
So What Needs to Happen
– We need data and knowledge about that data to interoperate, i.e. we need new kinds of fast, versatile publications and data archives
– We need to be more open with both
– We need to think more about the tools that analyze, visualize and annotate data to maximize knowledge discovery
– Reward systems need to change
– We need scientist management tools
– We need to be less fixated on the big data problems
– We need to unleash the full power of the Internet
Easy Hard
Page 7
One Personal Example of Why This Needs to Happen Now
Page 8
Josh Sommer – A Remarkable Young Man
Co-founder & Executive Director of the Chordoma Foundation
http://sagecongress.org/Presentations/Sommer.pdf
Page 9
Chordoma
• A rare form of brain cancer
• No known drugs
• Treatment – surgical resection followed by intense radiation therapy
http://upload.wikimedia.org/wikipedia/commons/2/2b/Chordoma.JPG
Page 10
http://sagecongress.org/Presentations/Sommer.pdf
Page 11
http://sagecongress.org/Presentations/Sommer.pdf
Page 12
http://sagecongress.org/Presentations/Sommer.pdf
Page 13
Adapted: http://sagecongress.org/Presentations/Sommer.pdf
If I have seen further it is only by standing on the shoulders of giants
Isaac Newton
From Josh’s point of view the climb up just takes too long
> 15 years and > $850M to be more precise
Page 14
http://sagecongress.org/Presentations/Sommer.pdf
Page 15
http://sagecongress.org/Presentations/Sommer.pdf
Page 16
http://fora.tv/2010/04/23/Sage_Commons_Josh_Sommer_Chordoma_Foundation
Page 17
So We Have Seen What Needs to Change and Why. What About the How?
Page 18
We Need Data and Knowledge About That Data to Interoperate
0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB

1. User clicks on content
2. Metadata and webservices to data provide an interactive view that can be annotated
3. Selecting features provides a data/knowledge mashup
4. Analysis leads to new content I can share
The Knowledge and Data Cycle
PLoS Comp. Biol. 2005 1(3) e34
Page 19
We Need Data and Knowledge About That Data to Interoperate – What is Stopping Us?
• Open Access
• Governance – publishers vs. database providers
• Reward
• Metadata standards for provenance, privacy etc.
• Exemplars
• ….
Page 20
A Small Example - The World Wide Protein Data Bank
• The single worldwide repository for data on the structure of biological macromolecules
• Vital for drug discovery and the life sciences
• 39 years old
• Free to all
http://www.wwpdb.org
PLoS Comp. Biol. 2005 1(3) e34
Page 21
The World Wide Protein Data Bank – The Best Case Scenario
• Paper not published unless data are deposited – strong data to literature correspondence
• Highly structured data conforming to an extensive ontology
• DOIs assigned to every structure
http://www.wwpdb.org
PLoS Comp. Biol. 2005 1(3) e34
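As a small illustration of why per-structure DOIs help data and literature interoperate, the sketch below turns a PDB ID into a resolvable link. It assumes the DOI pattern `10.2210/pdb<id>/pdb` and the `https://doi.org/` resolver; neither is stated on the slide, and the function names are hypothetical.

```python
import re

# Assumption: a PDB ID is one digit followed by three alphanumeric characters.
PDB_ID_PATTERN = re.compile(r"^[0-9][A-Za-z0-9]{3}$")


def pdb_doi(pdb_id: str) -> str:
    """Build a DOI string for a four-character PDB ID (assumed pattern)."""
    if not PDB_ID_PATTERN.match(pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"10.2210/pdb{pdb_id.lower()}/pdb"


def doi_url(pdb_id: str) -> str:
    """Return a resolver URL that a paper or web page could link to."""
    return "https://doi.org/" + pdb_doi(pdb_id)


if __name__ == "__main__":
    print(doi_url("1TIM"))
```

A stable identifier of this kind is what lets a journal article cite a structure in a way that software, not just a reader, can follow.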
Page 22
www.rcsb.org/pdb/explore/literature.do?structureId=1TIM
Example Interoperability: The Database View
BMC Bioinformatics 2010 11:220
Page 23
Example Interoperability: The Literature View
http://biolit.ucsd.edu
Nucleic Acids Research 2008 36(S2) W385-389
Page 24
ICTP Trieste, December 10, 2007
Page 25
Semantic Tagging & Widgets are a Powerful Tool to Integrate Data and Knowledge of that
Data, But as Yet Not Used Much
Will Widgets and Semantic Tagging Change Computational Biology? PLoS Comp. Biol. 6(2) e1000673
Page 26
Semantic Tagging of Database Content in The Literature or Elsewhere
http://www.rcsb.org/pdb/static.do?p=widgets/widgetShowcase.jsp
PLoS Comp. Biol. 6(2) e1000673
Semantic Tagging
Page 27
Page 28
The Publishers are Starting to Do It
From Anita de Waard, Elsevier
Page 29
This is Literature Post-processing
Better to Get the Authors Involved
• Authors are the absolute experts on the content
• More effective distribution of labor
• Add metadata before the article enters the publishing process
Page 30
Word 2007 Add-in for authors
• Allows authors to add metadata as they write, before they submit the manuscript
• Authors are assisted by automated term recognition
– OBO ontologies
– Database IDs
• Metadata are embedded directly into the manuscript document via XML tags, OOXML format
– Open
– Machine-readable
• Open source, Microsoft Public License
http://www.codeplex.com/ucsdbiolit
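To make the idea of automated term recognition concrete, here is a minimal sketch that finds PDB IDs in a sentence and wraps them in an XML tag. It is not the add-in's implementation: the `<term>` element, its attributes, and the rule that an ID must follow the word "PDB" are all assumptions chosen for illustration.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical recognition rule: the word "PDB" (optionally "PDB ID"),
# then a digit followed by three alphanumeric characters.
PDB_MENTION = re.compile(r"\bPDB(?: ID)?:? ?([0-9][A-Za-z0-9]{3})\b")


def tag_pdb_ids(text: str) -> str:
    """Wrap recognized PDB IDs in a hypothetical <term> element."""
    parts = []
    last = 0
    for match in PDB_MENTION.finditer(text):
        pdb_id = match.group(1)
        # Escape the untagged text so the result stays well-formed XML.
        parts.append(escape(text[last:match.start(1)]))
        parts.append(
            f"<term source=\"PDB\" id={quoteattr(pdb_id.upper())}>"
            f"{escape(pdb_id)}</term>"
        )
        last = match.end(1)
    parts.append(escape(text[last:]))
    return "".join(parts)


if __name__ == "__main__":
    print(tag_pdb_ids("Triosephosphate isomerase (PDB ID 1TIM) is a dimer."))
```

Requiring the "PDB" prefix trades recall for precision; an author-facing tool can afford the opposite choice because the author is there to confirm or reject each suggestion.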
Page 31
Challenges
• Authors
– Carrot: if one or more publishers fast-tracked a paper that had semantic markup, it might catch on
• Publishers
– Carrot: competitive advantage
Page 32
The Promise – A Hypothetical Example
Immunology Literature
Cardiac Disease Literature
Shared Function
Page 33
High-throughput Biology Requires High-throughput Knowledge Discovery
Consider an Example from Our Own Work…
Roger Chang Will Give You Another Example
Page 34
The TB-Drugome
1. Determine the TB structural proteome
2. Determine all known drug binding sites from the PDB
3. Determine which of the sites found in 2 exist in 1
4. Call the result the TB-drugome
High-throughput Data Requires High-throughput Knowledge
Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
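The four steps above can be sketched as a simple mapping exercise. The slide does not say how binding-site similarity is computed, so the sketch takes precomputed scores and a cutoff as inputs; the function name, the score table, the threshold, and all toy identifiers are hypothetical.

```python
from collections import defaultdict


def build_drugome(tb_proteins, drug_sites, similarity, threshold):
    """Return {drug: set of TB proteins} for site/protein pairs scoring >= threshold.

    tb_proteins: iterable of TB protein names (step 1)
    drug_sites:  dict mapping binding-site ID -> drug name (step 2)
    similarity:  dict mapping (site ID, TB protein) -> score (assumed given)
    """
    drugome = defaultdict(set)
    for site, drug in drug_sites.items():
        for protein in tb_proteins:
            # Step 3: keep a drug site only if a similar site exists in a TB protein.
            if similarity.get((site, protein), 0.0) >= threshold:
                drugome[drug].add(protein)
    # Step 4: the resulting drug-to-protein map is the "drugome".
    return dict(drugome)


# Toy inputs, invented purely for illustration.
tb_proteins = ["tbP1", "tbP2", "tbP3"]
drug_sites = {"site1": "drugA", "site2": "drugA", "site3": "drugB"}
similarity = {
    ("site1", "tbP1"): 0.9,
    ("site2", "tbP2"): 0.7,
    ("site3", "tbP3"): 0.2,
}

drugome = build_drugome(tb_proteins, drug_sites, similarity, threshold=0.5)

if __name__ == "__main__":
    print(drugome)
```

In this toy run drugA maps to two TB proteins and drugB to none, which mirrors the distinction the later slides draw between drugs with one potential target and drugs with several.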
Page 35
1. Determine the TB Structural Proteome
[Figure: Venn diagram of the TB proteome (3,996 proteins), solved structures (284) and homology models (1,446); 2,266 proteins remain outside both]
• High quality homology models from ModBase (http://modbase.compbio.ucsf.edu) increase structural coverage from 7.1% to 43.3%
Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
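The coverage figures can be checked from the counts on the slide. The sketch below assumes the 284 solved structures and 1,446 homology models are disjoint sets within a proteome of 3,996 proteins; that reading is an assumption, although it reproduces both quoted percentages.

```python
# Counts read from the slide's diagram.
TB_PROTEOME = 3996
SOLVED_STRUCTURES = 284
HOMOLOGY_MODELS = 1446  # assumed to be in addition to the solved structures

# Percentage of the proteome covered by solved structures alone.
coverage_solved = 100 * SOLVED_STRUCTURES / TB_PROTEOME

# Percentage covered once homology models are added.
covered = SOLVED_STRUCTURES + HOMOLOGY_MODELS
coverage_total = 100 * covered / TB_PROTEOME

# Proteins with neither a solved structure nor a model.
uncovered = TB_PROTEOME - covered

if __name__ == "__main__":
    print(f"solved only: {coverage_solved:.1f}%")
    print(f"with models: {coverage_total:.1f}%")
    print(f"uncovered:   {uncovered}")
```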
Page 36
2. Determine all Known Drug Binding Sites in the PDB
• Searched the PDB for protein crystal structures bound with FDA-approved drugs
• 268 drugs bound in a total of 931 binding sites
[Figure: histogram of No. of drugs (y-axis, 0 to 140) against No. of drug binding sites (x-axis, 1 to 37); labelled drugs: Methotrexate, Chenodiol, Alitretinoin, Conjugated estrogens, Darunavir, Acarbose]
Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
Page 37
Map 2 onto 1 – The TB-Drugome
http://funsite.sdsc.edu/drugome/TB/
Similarities between the binding sites of M.tb proteins (blue), and binding sites containing approved drugs (red).
Page 38
From a Drug Repositioning Perspective
• Similarities between drug binding sites and TB proteins are found for 61/268 drugs
• 41 of these drugs could potentially inhibit more than one TB protein
[Figure: histogram of No. of drugs (y-axis, 0 to 20) against No. of potential TB targets (x-axis, 1 to 14); labelled drugs: raloxifene, alitretinoin, conjugated estrogens & methotrexate, ritonavir, testosterone, levothyroxine, chenodiol]
Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
Page 39
Top 5 Most Highly Connected Drugs

levothyroxine
– Intended targets: transthyretin, thyroid hormone receptor α & β-1, thyroxine-binding globulin, mu-crystallin homolog, serum albumin
– Indications: hypothyroidism, goiter, chronic lymphocytic thyroiditis, myxedema coma, stupor
– No. of connections: 14
– TB proteins: adenylyl cyclase, argR, bioD, CRP/FNR trans. reg., ethR, glbN, glbO, kasB, lrpA, nusA, prrA, secA1, thyX, trans. reg. protein

alitretinoin
– Intended targets: retinoic acid receptor RXR-α, β & γ, retinoic acid receptor α, β & γ-1&2, cellular retinoic acid-binding protein 1&2
– Indications: cutaneous lesions in patients with Kaposi's sarcoma
– No. of connections: 13
– TB proteins: adenylyl cyclase, aroG, bioD, bpoC, CRP/FNR trans. reg., cyp125, embR, glbN, inhA, lppX, nusA, pknE, purN

conjugated estrogens
– Intended targets: estrogen receptor
– Indications: menopausal vasomotor symptoms, osteoporosis, hypoestrogenism, primary ovarian failure
– No. of connections: 10
– TB proteins: acetylglutamate kinase, adenylyl cyclase, bphD, CRP/FNR trans. reg., cyp121, cysM, inhA, mscL, pknB, sigC

methotrexate
– Intended targets: dihydrofolate reductase, serum albumin
– Indications: gestational choriocarcinoma, chorioadenoma destruens, hydatidiform mole, severe psoriasis, rheumatoid arthritis
– No. of connections: 10
– TB proteins: acetylglutamate kinase, aroF, cmaA2, CRP/FNR trans. reg., cyp121, cyp51, lpd, mmaA4, panC, usp

raloxifene
– Intended targets: estrogen receptor, estrogen receptor β
– Indications: osteoporosis in post-menopausal women
– No. of connections: 9
– TB proteins: adenylyl cyclase, CRP/FNR trans. reg., deoD, inhA, pknB, pknE, Rv1347c, secA1, sigC
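Once the table is in machine-readable form, simple questions become one-liners, which is the point of the talk. The sketch below transcribes the TB protein lists from the table and counts how often each protein recurs across the five drugs. Treating identical labels as the same protein is an assumption, and the variable names are hypothetical.

```python
from collections import Counter

# TB proteins per drug, transcribed from the table above.
top5 = {
    "levothyroxine": [
        "adenylyl cyclase", "argR", "bioD", "CRP/FNR trans. reg.", "ethR",
        "glbN", "glbO", "kasB", "lrpA", "nusA", "prrA", "secA1", "thyX",
        "trans. reg. protein",
    ],
    "alitretinoin": [
        "adenylyl cyclase", "aroG", "bioD", "bpoC", "CRP/FNR trans. reg.",
        "cyp125", "embR", "glbN", "inhA", "lppX", "nusA", "pknE", "purN",
    ],
    "conjugated estrogens": [
        "acetylglutamate kinase", "adenylyl cyclase", "bphD",
        "CRP/FNR trans. reg.", "cyp121", "cysM", "inhA", "mscL", "pknB", "sigC",
    ],
    "methotrexate": [
        "acetylglutamate kinase", "aroF", "cmaA2", "CRP/FNR trans. reg.",
        "cyp121", "cyp51", "lpd", "mmaA4", "panC", "usp",
    ],
    "raloxifene": [
        "adenylyl cyclase", "CRP/FNR trans. reg.", "deoD", "inhA", "pknB",
        "pknE", "Rv1347c", "secA1", "sigC",
    ],
}

# Number of connections per drug; should match the table column.
connections = {drug: len(proteins) for drug, proteins in top5.items()}

# How many of the five drugs each TB protein is connected to.
protein_counts = Counter(p for proteins in top5.values() for p in proteins)

# Proteins connected to every one of the five drugs.
in_all_five = [p for p, n in protein_counts.items() if n == len(top5)]

if __name__ == "__main__":
    print(connections)
    print(protein_counts.most_common(3))
    print(in_all_five)
```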
Page 40
We Need Better Ways to Associate Data and Knowledge and it's More Than Just Text Mining of
PubMed Abstracts – It's About Changing the System
Our Future is in Your Hands!
Page 41
Acknowledgements
• BioLit Team
– Lynn Fink
– Parker Williams
– Marco Martinez
– Rahul Chandran
– Greg Quinn
• Microsoft Scholarly Communications
– Pablo Fernicola
– Lee Dirks
– Savas Parastatidis
– Alex Wade
– Tony Hey
• RCSB PDB team
– Andreas Prlić
– Dimitris Dimitropoulos
• TB Drugome Team
– Lei Xie
– Sarah Kinnings
– Li Xie
http://funsite.sdsc.edu/drugome/TB/
http://biolit.ucsd.edu
http://www.pdb.org
http://www.codeplex.com/ucsdbiolit
Page 42
Questions?
[email protected]