Scholarly Communication for Bioinformatics Students

Post on 24-Jan-2015

DESCRIPTION

Presentation made to the incoming bioinformatics and systems biology students at UCSD on how they could get involved in changing scholarly communication. Given February 28, 2011

Transcript

The Changing Face of Scholarly Communication and the Opportunities it Affords the Bioinformatics/Systems Biology Student

Philip E. Bourne
University of California San Diego

pbourne@ucsd.edu
http://www.sdsc.edu/pb

Third UCSD Bioinformatics and Systems Biology Expo – 2/28/2011

Observation 1: Everyone in this Room is Driven by One Thing Above All Else

Observation 2: We Are a Field That Uses/Produces Public On-Line Data Like No Other

Observation 3: We Have Shaped the Way Data Are Shared – We Have Had Very Little Impact on Publications

Perhaps it is Time We Thought Less About a Publication as a Reward and More About How it Can be Presented to Maximize its Use

So What Needs to Happen

– We need data and knowledge about that data to interoperate, i.e. we need new kinds of fast, versatile publications and data archives
– We need to be more open with both
– We need to think more about the tools that analyze, visualize and annotate data to maximize knowledge discovery
– Reward systems need to change
– We need scientist management tools
– We need to be less fixated on the big data problems
– We need to unleash the full power of the Internet

Easy Hard

One Personal Example of Why This Needs to Happen Now

Josh Sommer – A Remarkable Young Man
Co-founder & Executive Director of the Chordoma Foundation

http://sagecongress.org/Presentations/Sommer.pdf

Chordoma

• A rare form of brain cancer
• No known drugs
• Treatment – surgical resection followed by intense radiation therapy

http://upload.wikimedia.org/wikipedia/commons/2/2b/Chordoma.JPG


Adapted: http://sagecongress.org/Presentations/Sommer.pdf


If I have seen further it is only by standing on the shoulders of giants

Isaac Newton

From Josh’s point of view the climb up just takes too long

> 15 years and > $850M to be more precise

http://sagecongress.org/Presentations/Sommer.pdf


http://fora.tv/2010/04/23/Sage_Commons_Josh_Sommer_Chordoma_Foundation

So We Have Seen What Needs to Change and Why. What About the How?

We Need Data and Knowledge About That Data to Interoperate

0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB, which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB

1. User clicks on content
2. Metadata and web services to data provide an interactive view that can be annotated
3. Selecting features provides a data/knowledge mashup
4. Analysis leads to new content I can share

The Knowledge and Data Cycle

PLoS Comp. Biol. 2005 1(3) e34
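The first step of that cycle, getting from literature text to database content, can be sketched in a few lines of Python. This is only an illustration, not the BioLit implementation: the function names are hypothetical, the pattern is a heuristic for classic four-character PDB IDs, and the link template is copied from the example URL shown later in this talk (it may no longer resolve).

```python
import re

# Classic PDB IDs are four characters: a digit 1-9, then three alphanumerics.
PDB_ID = re.compile(r"\b[1-9][A-Za-z0-9]{3}\b")

# URL pattern taken from the talk's own example link (assumption: still valid).
LINK_TEMPLATE = "http://www.rcsb.org/pdb/explore/literature.do?structureId={pdb_id}"


def find_pdb_ids(text):
    """Return candidate PDB IDs in order of first appearance, uppercased."""
    seen = []
    for match in PDB_ID.finditer(text):
        token = match.group(0)
        # Heuristic: skip pure numbers such as years.
        if not any(ch.isalpha() for ch in token):
            continue
        pdb_id = token.upper()
        if pdb_id not in seen:
            seen.append(pdb_id)
    return seen


def link_ids(text):
    """Map each candidate ID found in the text to a database link."""
    return {pdb_id: LINK_TEMPLATE.format(pdb_id=pdb_id) for pdb_id in find_pdb_ids(text)}


sample = "In 2005 we compared triosephosphate isomerase (1TIM) with hemoglobin (4hhb)."
links = link_ids(sample)
for pdb_id, url in links.items():
    print(pdb_id, url)
```

A pattern match like this will over-match on ordinary tokens that happen to look like an ID, which is one reason the talk argues for authors adding the identifiers themselves.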

We Need Data and Knowledge About That Data to Interoperate – What is Stopping Us?

• Open Access
• Governance – publishers vs. database providers
• Reward
• Metadata standards for provenance, privacy etc.
• Exemplars
• ….

A Small Example - The World Wide Protein Data Bank

• The single worldwide repository for data on the structure of biological macromolecules

• Vital for drug discovery and the life sciences

• 39 years old
• Free to all

http://www.wwpdb.org

PLoS Comp. Biol. 2005 1(3) e34

The World Wide Protein Data Bank – The Best Case Scenario

• Paper not published unless data are deposited – strong data to literature correspondence

• Highly structured data conforming to an extensive ontology

• DOIs assigned to every structure

http://www.wwpdb.org

PLoS Comp. Biol. 2005 1(3) e34

www.rcsb.org/pdb/explore/literature.do?structureId=1TIM

Example Interoperability: The Database View

BMC Bioinformatics 2010 11:220

Example Interoperability: The Literature View

http://biolit.ucsd.edu

Nucleic Acids Research 2008 36(S2) W385-389

ICTP Trieste, December 10, 2007

Semantic Tagging & Widgets are a Powerful Tool to Integrate Data and Knowledge of that Data, But as Yet Not Used Much

Will Widgets and Semantic Tagging Change Computational Biology? PLoS Comp. Biol. 6(2) e1000673


Semantic Tagging of Database Content in The Literature or Elsewhere

http://www.rcsb.org/pdb/static.do?p=widgets/widgetShowcase.jsp

PLoS Comp. Biol. 6(2) e1000673


The Publishers are Starting to Do It

From Anita de Waard, Elsevier

This is Literature Post-processing
Better to Get the Authors Involved

• Authors are the absolute experts on the content

• More effective distribution of labor

• Add metadata before the article enters the publishing process


Word 2007 Add-in for authors

• Allows authors to add metadata as they write, before they submit the manuscript

• Authors are assisted by automated term recognition
– OBO ontologies
– Database IDs

• Metadata are embedded directly into the manuscript document via XML tags, OOXML format
– Open
– Machine-readable

• Open source, Microsoft Public License

http://www.codeplex.com/ucsdbiolit
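To make "metadata embedded via XML tags" concrete, here is a minimal sketch of wrapping a recognized term in a machine-readable tag. It does not reproduce the add-in's actual OOXML markup: the `p` and `term` element names, the `source` and `id` attributes, and the `tag_term` helper are all hypothetical, chosen only to show the idea.

```python
import xml.etree.ElementTree as ET


def tag_term(paragraph_text, term, source, identifier):
    """Wrap the first occurrence of `term` in a <term> element inside a <p>.

    The element and attribute names are illustrative, not a real schema.
    """
    before, found, after = paragraph_text.partition(term)
    p = ET.Element("p")
    if not found:
        # Term not present: return the paragraph unchanged.
        p.text = paragraph_text
        return p
    p.text = before
    tagged = ET.SubElement(p, "term", {"source": source, "id": identifier})
    tagged.text = term
    tagged.tail = after
    return p


paragraph = tag_term(
    "The structure of triosephosphate isomerase was solved.",
    "triosephosphate isomerase",
    "PDB",
    "1TIM",
)
xml_string = ET.tostring(paragraph, encoding="unicode")
print(xml_string)

# A downstream tool can recover the identifiers without any text mining.
parsed = ET.fromstring(xml_string)
ids = [(el.get("source"), el.get("id")) for el in parsed.iter("term")]
print(ids)
```

The visible text is unchanged for the human reader, while the identifier travels with the manuscript for any program that reads it.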


Challenges

• Authors
– Carrot: if one or more publishers fast-tracked a paper that had semantic markup, it might catch on

• Publishers
– Carrot: competitive advantage


The Promise – A Hypothetical Example

Immunology Literature

Cardiac Disease Literature

Shared Function


High-throughput Biology Requires High-throughput Knowledge Discovery

Consider an Example from Our Own Work…

Roger Chang Will Give You Another Example

The TB-Drugome

1. Determine the TB structural proteome

2. Determine all known drug binding sites from the PDB

3. Determine which of the sites found in 2 exist in 1

4. Call the result the TB-drugome
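The four steps can be read as a small matching pipeline. The sketch below runs on toy data only: binding sites are stood in for by sets of made-up feature labels, and the similarity measure is a plain Jaccard overlap with an arbitrary threshold, which is an assumption for illustration and not the binding-site comparison used in the cited paper. All names (`build_drugome`, `proteinA`, `drugX`, and so on) are hypothetical.

```python
def jaccard(a, b):
    """Overlap between two feature sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_drugome(tb_sites, drug_sites, threshold=0.5):
    """Step 3: map each drug to the TB proteins whose site resembles one of its sites."""
    drugome = {}
    for drug, sites in drug_sites.items():
        hits = sorted(
            protein
            for protein, tb_site in tb_sites.items()
            if any(jaccard(tb_site, site) >= threshold for site in sites)
        )
        if hits:
            drugome[drug] = hits
    return drugome


# Step 1 (toy stand-in): one candidate site per protein in the structural proteome.
tb_sites = {
    "proteinA": {"f1", "f2", "f3"},
    "proteinB": {"f4", "f5"},
    "proteinC": {"f1", "f2", "f6"},
}

# Step 2 (toy stand-in): known binding sites for each drug.
drug_sites = {
    "drugX": [{"f1", "f2", "f3", "f7"}],
    "drugY": [{"f8", "f9"}],
}

# Step 4: the resulting drug-to-protein map is the toy "drugome".
drugome = build_drugome(tb_sites, drug_sites)
print(drugome)
```

With these toy inputs only drugX and proteinA pass the threshold; proteinC shares two features with the drugX site but falls below the cutoff.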

High-throughput Data Requires High-throughput Knowledge

Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976

1. Determine the TB Structural Proteome

[Figure: TB proteome of 3,996 proteins, of which 284 have solved structures and 1,446 have homology models, leaving 2,266]

• High quality homology models from ModBase (http://modbase.compbio.ucsf.edu) increase structural coverage from 7.1% to 43.3%
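The coverage percentages follow directly from the counts on this slide. The short check below assumes the figure's numbers map as 3,996 proteins in the TB proteome, 284 with solved structures and 1,446 with homology models; the variable names are illustrative.

```python
proteome = 3996  # proteins in the TB proteome
solved = 284     # proteins with solved structures
models = 1446    # additional proteins covered by homology models

solved_pct = 100 * solved / proteome
combined_pct = 100 * (solved + models) / proteome
uncovered = proteome - solved - models

print(f"solved structures only: {solved_pct:.1f}%")
print(f"with homology models:   {combined_pct:.1f}%")
print(f"proteins without structural coverage: {uncovered}")
```

The results reproduce the 7.1% and 43.3% quoted above, and the remainder matches the fourth number in the figure, 2,266.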

Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976

2. Determine all Known Drug Binding Sites in the PDB

• Searched the PDB for protein crystal structures bound with FDA-approved drugs
• 268 drugs bound in a total of 931 binding sites

[Figure: histogram of No. of drugs (y-axis) against No. of drug binding sites (x-axis); drugs labeled on the chart: methotrexate, chenodiol, alitretinoin, conjugated estrogens, darunavir, acarbose]

Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976

Map 2 onto 1 – The TB-Drugome

http://funsite.sdsc.edu/drugome/TB/

Similarities between the binding sites of M.tb proteins (blue), and binding sites containing approved drugs (red).

From a Drug Repositioning Perspective

• Similarities between drug binding sites and TB proteins are found for 61/268 drugs
• 41 of these drugs could potentially inhibit more than one TB protein

[Figure: histogram of No. of drugs (y-axis) against No. of potential TB targets (x-axis); drugs labeled on the chart: raloxifene, alitretinoin, conjugated estrogens & methotrexate, ritonavir, testosterone, levothyroxine, chenodiol]

Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976

Top 5 Most Highly Connected Drugs

Drug: levothyroxine
Intended targets: transthyretin, thyroid hormone receptor α & β-1, thyroxine-binding globulin, mu-crystallin homolog, serum albumin
Indications: hypothyroidism, goiter, chronic lymphocytic thyroiditis, myxedema coma, stupor
No. of connections: 14
TB proteins: adenylyl cyclase, argR, bioD, CRP/FNR trans. reg., ethR, glbN, glbO, kasB, lrpA, nusA, prrA, secA1, thyX, trans. reg. protein

Drug: alitretinoin
Intended targets: retinoic acid receptor RXR-α, β & γ, retinoic acid receptor α, β & γ-1&2, cellular retinoic acid-binding protein 1&2
Indications: cutaneous lesions in patients with Kaposi's sarcoma
No. of connections: 13
TB proteins: adenylyl cyclase, aroG, bioD, bpoC, CRP/FNR trans. reg., cyp125, embR, glbN, inhA, lppX, nusA, pknE, purN

Drug: conjugated estrogens
Intended targets: estrogen receptor
Indications: menopausal vasomotor symptoms, osteoporosis, hypoestrogenism, primary ovarian failure
No. of connections: 10
TB proteins: acetylglutamate kinase, adenylyl cyclase, bphD, CRP/FNR trans. reg., cyp121, cysM, inhA, mscL, pknB, sigC

Drug: methotrexate
Intended targets: dihydrofolate reductase, serum albumin
Indications: gestational choriocarcinoma, chorioadenoma destruens, hydatidiform mole, severe psoriasis, rheumatoid arthritis
No. of connections: 10
TB proteins: acetylglutamate kinase, aroF, cmaA2, CRP/FNR trans. reg., cyp121, cyp51, lpd, mmaA4, panC, usp

Drug: raloxifene
Intended targets: estrogen receptor, estrogen receptor β
Indications: osteoporosis in post-menopausal women
No. of connections: 9
TB proteins: adenylyl cyclase, CRP/FNR trans. reg., deoD, inhA, pknB, pknE, Rv1347c, secA1, sigC
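The same table can be read in the other direction, asking which TB proteins are connected to the most drugs. The sketch below transcribes the drug-to-protein lists from the table above and inverts them; the variable names are illustrative, and the counts cover only these five drugs, not the full TB-drugome.

```python
from collections import Counter

# Drug -> TB proteins, transcribed from the table of the top 5 connected drugs.
connections = {
    "levothyroxine": [
        "adenylyl cyclase", "argR", "bioD", "CRP/FNR trans. reg.", "ethR",
        "glbN", "glbO", "kasB", "lrpA", "nusA", "prrA", "secA1", "thyX",
        "trans. reg. protein",
    ],
    "alitretinoin": [
        "adenylyl cyclase", "aroG", "bioD", "bpoC", "CRP/FNR trans. reg.",
        "cyp125", "embR", "glbN", "inhA", "lppX", "nusA", "pknE", "purN",
    ],
    "conjugated estrogens": [
        "acetylglutamate kinase", "adenylyl cyclase", "bphD",
        "CRP/FNR trans. reg.", "cyp121", "cysM", "inhA", "mscL", "pknB", "sigC",
    ],
    "methotrexate": [
        "acetylglutamate kinase", "aroF", "cmaA2", "CRP/FNR trans. reg.",
        "cyp121", "cyp51", "lpd", "mmaA4", "panC", "usp",
    ],
    "raloxifene": [
        "adenylyl cyclase", "CRP/FNR trans. reg.", "deoD", "inhA", "pknB",
        "pknE", "Rv1347c", "secA1", "sigC",
    ],
}

# Sanity check against the "No. of connections" column.
connection_counts = {drug: len(proteins) for drug, proteins in connections.items()}

# Invert the mapping: how many of these five drugs connect to each TB protein?
protein_counts = Counter(p for proteins in connections.values() for p in proteins)
shared_by_all = sorted(p for p, n in protein_counts.items() if n == len(connections))

print(connection_counts)
print(protein_counts.most_common(3))
print(shared_by_all)
```

Among these five drugs, only the CRP/FNR transcriptional regulator appears in every list, with adenylyl cyclase in four and inhA in three.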

We Need Better Ways to Associate Data and Knowledge, and it's More than Just Text Mining of PubMed Abstracts – It's About Changing the System

Our Future is in Your Hands!

Acknowledgements

• BioLit Team
– Lynn Fink
– Parker Williams
– Marco Martinez
– Rahul Chandran
– Greg Quinn

• Microsoft Scholarly Communications
– Pablo Fernicola
– Lee Dirks
– Savas Parastatidis
– Alex Wade
– Tony Hey

• RCSB PDB team
– Andreas Prlić
– Dimitris Dimitropoulos

• TB Drugome Team
– Lei Xie
– Sarah Kinnings
– Li Xie

http://funsite.sdsc.edu/drugome/TB/

http://biolit.ucsd.edu
http://www.pdb.org
http://www.codeplex.com/ucsdbiolit

Questions?

pbourne@ucsd.edu
