The Reaming of Life Philip E. Bourne University of California San Diego [email protected] Jim Gray eScience Award Lecture Oct. 12, 2010
Jim Gray Award Lecture

May 10, 2015

Philip Bourne

The Jim Gray Award Lecture presented at the Microsoft Research Symposium on October 12, 2010.
Transcript
Page 1: Jim Gray Award Lecture

Jim Gray eScience Award Lecture

The Reaming of Life

Philip E. Bourne

University of California San Diego

[email protected]

Oct. 12, 2010

Page 2: Jim Gray Award Lecture

Disclaimer

• I am a domain (life) scientist, not a computer or information scientist

• I am fortunate enough to have a major biological resource (the Protein Data Bank) and a major biological journal (PLoS Computational Biology) as my playground

• I am part of the long tail

• I am naïve, but I am the majority

Page 3: Jim Gray Award Lecture

The Reaming of Life - What on Earth is He Talking About?

• A reamer is a tool for turning a roughly punched hole into an accurate and smooth one

• The digital data deluge has punched that rough hole

• For the life {other?} sciences to advance optimally, we need an accurate and smooth conduit through which data can be distilled, analyzed, visualized, distributed and, above all else, comprehended

Page 4: Jim Gray Award Lecture

… and we need to accelerate the process by which this is done

Here is why …

This is just another way of saying what Jim said and what is embodied in The Fourth Paradigm

Page 5: Jim Gray Award Lecture

The Scientific Process is Too Slow to Respond to a Crisis – Either Global or Personal

Motivation

http://knol.google.com/k/plos-currents-influenza#

By the time the paper is published we could all be dead

Page 6: Jim Gray Award Lecture

* http://www.cdc.gov/h1n1flu/estimates/April_March_13.htm

[Figure: Structure Summary page activity for H1N1 influenza-related structures, Jan. 2008 to Jul. 2010]

1RUZ: 1918 H1 Hemagglutinin

3B7E: Neuraminidase of A/Brevig Mission/1/1918 H1N1 strain in complex with zanamivir

In a time of crisis, fast access to accurate data, and to any knowledge of that data, is paramount

Motivation

Page 7: Jim Gray Award Lecture

If that is not enough…

For some people the scientific process may be too slow to save their life

Motivation

Page 8: Jim Gray Award Lecture

Josh Sommer – A Remarkable Young Man

Co-founder & Executive Director of the Chordoma Foundation

http://sagecongress.org/Presentations/Sommer.pdf

Motivation

Page 9: Jim Gray Award Lecture

Chordoma

• A rare cancer that arises in the bones of the skull base and spine

• No known drugs

• Treatment – surgical resection followed by intense radiation therapy

Motivation

http://upload.wikimedia.org/wikipedia/commons/2/2b/Chordoma.JPG

Page 10: Jim Gray Award Lecture

http://sagecongress.org/Presentations/Sommer.pdf

Motivation

Page 11: Jim Gray Award Lecture

http://sagecongress.org/Presentations/Sommer.pdf

Motivation

Page 12: Jim Gray Award Lecture

http://sagecongress.org/Presentations/Sommer.pdf

Motivation

Page 13: Jim Gray Award Lecture

Adapted: http://sagecongress.org/Presentations/Sommer.pdf

Motivation

If I have seen further it is only by standing on the shoulders of giants

Isaac Newton

From Josh’s point of view, the climb up just takes too long

More than 15 years and more than $850M, to be more precise

Page 14: Jim Gray Award Lecture

http://sagecongress.org/Presentations/Sommer.pdf

Motivation

Page 15: Jim Gray Award Lecture

Motivation

http://sagecongress.org/Presentations/Sommer.pdf

Page 16: Jim Gray Award Lecture

http://fora.tv/2010/04/23/Sage_Commons_Josh_Sommer_Chordoma_Foundation

Motivation

Page 17: Jim Gray Award Lecture

Now that we are all, hopefully, motivated, let us break this down into what, in my opinion, actually needs to be done

Here are a few big things …

Page 18: Jim Gray Award Lecture

A Few Things to Accelerate the Rate of Scientific Discovery

• Better communication, data and knowledge access, and new modes of discovery, which means:
  – We need data and knowledge about that data to interoperate, i.e. we need new kinds of fast, versatile publications and data archives
  – We need to be more open with both
  – We need to think more about the tools that analyze, visualize and annotate data to maximize knowledge discovery
  – Reward systems need to change
  – We need scientist management tools
  – We need to be less fixated on the big data problems
  – We need to unleash the full power of the Internet

[Scale: Easy to Hard]

Page 19: Jim Gray Award Lecture

We Need Data and Knowledge About That Data to Interoperate

The Knowledge and Data Cycle

Example with PLoS and the PDB:

0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB, which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB

In general:

1. User clicks on content
2. Metadata and web services to data provide an interactive view that can be annotated
3. Selecting features provides a data/knowledge mashup
4. Analysis leads to new content I can share

PLoS Comp. Biol. 2005 1(3) e34

Page 20: Jim Gray Award Lecture

We Need Data and Knowledge About That Data to Interoperate – What is Stopping Us?

• Governance – publishers vs. database providers

• Reward

• Metadata standards for provenance, privacy etc.

• Exemplars

• …

Caveat: Each discipline is different – I speak very much from a biomedical sciences perspective

Page 21: Jim Gray Award Lecture

Certainly the Argument for Interoperability in the Biomedical Sciences is Strong

• PubMed contains 18,792,257 entries

• ~100,000 papers indexed per month

• In Feb 2009:
  – 67,406,898 interactive searches were done
  – 92,216,786 entries were viewed

• 1078 databases reported in NAR 2008

• MetaBase http://biodatabase.org reports 2,651 entries edited 12,587 times

Data as of April 14, 2009

We need data and knowledge about that data to interoperate

PLoS Comp. Biol. 2005 1(3) e34

Page 22: Jim Gray Award Lecture

A Small Example - The World Wide Protein Data Bank

• The single worldwide repository for data on the structure of biological macromolecules

• Vital for drug discovery and the life sciences

• 39 years old

• Free to all

http://www.wwpdb.org

We need data and knowledge about that data to interoperate

PLoS Comp. Biol. 2005 1(3) e34

Page 23: Jim Gray Award Lecture

The World Wide Protein Data Bank – The Best Case Scenario

• Paper not published unless data are deposited – strong data to literature correspondence

• Highly structured data conforming to an extensive ontology

• DOIs assigned to every structure

http://www.wwpdb.org

We need data and knowledge about that data to interoperate

PLoS Comp. Biol. 2005 1(3) e34

Page 24: Jim Gray Award Lecture

www.rcsb.org/pdb/explore/literature.do?structureId=1TIM

Example Interoperability: The Database View

We need data and knowledge about that data to interoperate

BMC Bioinformatics 2010 11:220

Page 25: Jim Gray Award Lecture

Example Interoperability: The Literature View

http://biolit.ucsd.edu

Nucleic Acids Research 2008 36(S2) W385-389

We need data and knowledge about that data to interoperate

Page 26: Jim Gray Award Lecture

ICTP Trieste, December 10, 2007

We need data and knowledge about that data to interoperate

Page 27: Jim Gray Award Lecture

Semantic Tagging & Widgets are a Powerful Tool to Integrate Data and Knowledge of that Data, But as Yet Not Used Much

Will Widgets and Semantic Tagging Change Computational Biology? PLoS Comp. Biol. 6(2) e1000673

We need data and knowledge about that data to interoperate

Page 28: Jim Gray Award Lecture

Semantic Tagging of Database Content in The Literature or Elsewhere

http://www.rcsb.org/pdb/static.do?p=widgets/widgetShowcase.jsp

PLoS Comp. Biol. 6(2) e1000673

Semantic Tagging

Page 29: Jim Gray Award Lecture

We need data and knowledge about that data to interoperate

Page 30: Jim Gray Award Lecture

The Publishers are Starting to Do It

From Anita de Waard, Elsevier

Page 31: Jim Gray Award Lecture

This is Literature Post-processing

Better to Get the Authors Involved

• Authors are the absolute experts on the content

• More effective distribution of labor

• Add metadata before the article enters the publishing process

We need data and knowledge about that data to interoperate

Page 32: Jim Gray Award Lecture

Word 2007 Add-in for authors

• Allows authors to add metadata as they write, before they submit the manuscript

• Authors are assisted by automated term recognition
  – OBO ontologies
  – Database IDs

• Metadata are embedded directly into the manuscript document via XML tags, OOXML format
  – Open
  – Machine-readable

• Open source, Microsoft Public License

http://www.codeplex.com/ucsdbiolit

We need data and knowledge about that data to interoperate
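
As an illustration only, the sketch below shows the general idea of automated term recognition followed by inline XML tagging. It is not the add-in's code: the `<term>` element, its `source` and `id` attributes, and the function name `tag_pdb_ids` are hypothetical, and the pattern assumes upper-case PDB IDs (a digit from 1 to 9 followed by three alphanumerics, at least one of them a letter, so that years such as 2010 are skipped).

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Assumed shape of a PDB ID: one digit 1-9, then three upper-case alphanumerics,
# at least one of which is a letter (this skips plain numbers such as 2010).
PDB_ID = re.compile(r"\b[1-9](?=[A-Z0-9]{0,2}[A-Z])[A-Z0-9]{3}\b")


def tag_pdb_ids(text: str) -> str:
    """Wrap candidate PDB IDs in a hypothetical <term> element."""

    def wrap(match: re.Match) -> str:
        pdb_id = match.group(0)
        return f'<term source="PDB" id={quoteattr(pdb_id)}>{pdb_id}</term>'

    # Escape &, < and > first so the tagged result stays well-formed XML.
    return PDB_ID.sub(wrap, escape(text))


if __name__ == "__main__":
    print(tag_pdb_ids("Triosephosphate isomerase (1TIM) was discussed in 2010."))
```

A real recognizer would also need ontology lookups (for example OBO terms) and a check that each candidate ID actually exists in the database; the regular expression alone will still tag some tokens that are not PDB entries.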

Page 33: Jim Gray Award Lecture

Challenges

• Authors
  – Carrot: if one or more publishers fast-tracked papers that had semantic markup, it might catch on

• Publishers
  – Carrot: competitive advantage

We need data and knowledge about that data to interoperate

Page 34: Jim Gray Award Lecture

The Promise – A Hypothetical Example

Immunology Literature

Cardiac Disease Literature

Shared Function

We need data and knowledge about that data to interoperate

Page 35: Jim Gray Award Lecture


Page 36: Jim Gray Award Lecture

One Small Example – The Molecular Biology Toolkit (MBT)

• Jmol, VMD … are important, de facto standard tools for rendering biological molecules, but

• They are not versatile, i.e. they do not, for example:
  – Respond to the data they are reading
  – Offer views that match the user’s interests
  – Allow the user to annotate the data
  – Allow those annotations to be shared (published?)

Think More About the Tools

Page 37: Jim Gray Award Lecture

MBT Features

http://mbt.sdsc.edu

• Offers a framework, not an end-user application

• Responds to the data type

• Supports read/write access

• Encourages others to write end-user applications

• Discourages feature creep

Think More About the Tools

Immunome Research, 2007 3(1):3

Immunologists

Medicinal Chemists

BMC Bioinformatics 2005, 6:21.

Page 38: Jim Gray Award Lecture


Page 39: Jim Gray Award Lecture

Reward Systems Need to Change: What is Needed?

• Author disambiguation

• Auditing (identification and metrics) of all scholarship, which means new tools

• Seniors need to promote alternative forms of scholarship

• Juniors need to respond

Reward Systems Need to Change

Ten Simple Rules for Getting Promoted as a Computational Biologist in Academia PLoS Comp Biol to appear

Page 40: Jim Gray Award Lecture

Example Tools

http://pubnet.gersteinlab.org/

http://www.researcherid.com/

http://www.biomedexperts.com

Page 41: Jim Gray Award Lecture

What Are these Alternative Forms of Scholarship?

Research [Grants]

Journal Article

Conference Paper

Poster Session

Reviews

Blogs

Community Service/Data Curation

Reward Systems Need to Change

Page 42: Jim Gray Award Lecture

Reward Systems Need to Change

Page 43: Jim Gray Award Lecture

A Unique Identifier is Going to Happen

• It is DOIs for people

• Some scientists will resist

• The winner is ORCID?

Reward Systems Need to Change

Page 44: Jim Gray Award Lecture

Ideally the ID will be Tagged to Every Piece of Scholarly Communication

I Am Not a Scientist, I Am a Number. PLoS Comp. Biol. 2008 4(12) e1000247

Reward Systems Need to Change

Page 45: Jim Gray Award Lecture


Page 46: Jim Gray Award Lecture

The Truth About My Laboratory

• I have ?? mail folders!

• The intellectual memory of my laboratory is in those folders

• This is an unhealthy hub and spoke mentality

We Need Scientist Management Tools

Page 47: Jim Gray Award Lecture

The Truth About My Laboratory

• I generate way more negative than positive data, but where is it?

• Content management is a mess
  – Slides, posters …
  – Data, lab notebooks …
  – Collaborations, journal clubs …

• Software is open, but where is it?

• Farewell is for the data too

Computational Biology Resources Lack Persistence and Usability. PLoS Comp. Biol. 2008 4(7): e1000136

We Need Scientist Management Tools

http://artbyvida.com/portfolio.php

Page 48: Jim Gray Award Lecture

Many Great Tools Out There

We Need Scientist Management Tools

Taverna

Page 49: Jim Gray Award Lecture

Where I See the Problems

• The long tail is confused

• Lack of interoperability between the options

• The reward (publishing) is still removed from the available tools

We Need Scientist Management Tools

Page 50: Jim Gray Award Lecture


Page 51: Jim Gray Award Lecture

Yes YouTube Can Increase the Rate of Discovery

Unleash the full power of the Internet

Page 52: Jim Gray Award Lecture

The Lab Experiment: Paper + Rich Media

• My students enjoyed the experience

• The shyest student was actually the boldest in front of the camera

• “We will become a generation of ‘science casters’”

• For the most part they liked the exposure: it puts them, rather than the PI, out in front

Unleash the full power of the Internet

Page 53: Jim Gray Award Lecture

Organic Growth

• Some of their work viewed 20,000+ times

• Global audience of researchers, educators and academic/research institutions
  – 60,000 unique visitors & 2M pageviews/month
  – 16,000 registered users & 600 communities
  – 5,000 uploads of video content (about journal articles, conferences, research news and classes)
  – Growing 4-5% monthly

• Sustainability - evolving a business model supporting journals and conferences

3 Years Later

www.scivee.tv

Unleash the full power of the Internet

Page 54: Jim Gray Award Lecture

Products

Application   Product                         Primary Customers
Journals      PubCast                         Journals, publishers, societies
Meetings      PosterCast, SlideCast           Societies, conference orgs.
Comm.         PaperCast, Podcast, SlideCast   Societies, journals
Education     PosterCast, SlideCast           Societies, universities
Books         BookCast                        Publishers, book sellers

What Emerged: SciveeCasts

Unleash the full power of the Internet

Page 55: Jim Gray Award Lecture

Summarizing the Reaming of Life

• By “Life” I mean experiences in the Life Sciences

• By “Reaming” I mean the making of something smooth, fast and accurate

• The Monty Python parody is on conversation cards for getting a dialog going …

• The rest is just a few examples of the small ways we are trying to address big problems, in the hope they will inspire us all to think more deeply about the problem

Page 56: Jim Gray Award Lecture

Acknowledgements

• BioLit Team
  – Lynn Fink
  – Parker Williams
  – Marco Martinez
  – Rahul Chandran
  – Greg Quinn

• MBT
  – John Moreland
  – John Beaver

• Microsoft Scholarly Communications
  – Pablo Fernicola
  – Lee Dirks
  – Savas Parastatidis
  – Alex Wade
  – Tony Hey

• wwPDB team
  – Andreas Prlić
  – Dimitris Dimitropoulos

• SciVee Team
  – Apryl Bailey
  – Leo Chalupa
  – Lynn Fink
  – Marc Friedman (CEO)
  – Ken Liu
  – Alex Ramos
  – Willy Suwanto
  – Ben Yukich

http://www.scivee.tv

http://biolit.ucsd.edu

http://www.pdb.org

http://www.codeplex.com/ucsdbiolit