A Few RDAP Thoughts Based on Experience with The RCSB Protein Data Bank www.rcsb.org Philip E. Bourne UCSD 3/31/11 RDAP Summit 2011 http://www.slideshare.net/pebourne/rdap-033111

Bourne RDAP11 Data Publication Repositories

Nov 12, 2014


ASIS&T

Phil Bourne, Protein Data Bank; Data Publication Repositories; RDAP11 Summit

The 2nd Research Data Access and Preservation (RDAP) Summit
An ASIS&T Summit
March 31-April 1, 2011 Denver, CO
In cooperation with the Coalition for Networked Information
http://asist.org/Conferences/RDAP11/index.html
Transcript
Page 1: Bourne RDAP11 Data Publication Repositories

A Few RDAP Thoughts Based on Experience with

The RCSB Protein Data Bank

www.rcsb.org

Philip E. Bourne, UCSD


3/31/11 RDAP Summit 2011

http://www.slideshare.net/pebourne/rdap-033111

Page 2

Disclaimer

I am not an expert in institutional repositories

I happen to have helped develop and oversee a resource that I use for my own research


Page 3

What is the Protein Data Bank (PDB)?

“Stored collective”

“Consistent with scholarly practice”

Clifford Lynch


Page 4

What is the Protein Data Bank (PDB)?

The single community-owned worldwide repository containing publicly accessible structures of biological macromolecules

A resource used by ~200,000 individuals per month

A resource distributing worldwide the equivalent of ¼ of the Library of Congress each month

A bicoastal resource 1TB


Page 5

[Figure: PDB Total Contents by Year; x-axis: Year; y-axis: Number of released entries]


Page 6

Why Do We Think We Are Successful?

The number of visits and page views is growing faster than the number of unique visitors
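A short way to see why this comparison matters, using symbols introduced here only for illustration: let V(t) be visits (or page views) per month and U(t) unique visitors per month. The average number of visits per unique visitor, r(t), rises exactly when visits grow faster than visitors, which is one hedged reading of the slide's claim that users keep coming back:

```latex
r(t) = \frac{V(t)}{U(t)}, \qquad
\frac{d}{dt}\ln r(t) = \frac{\dot{V}(t)}{V(t)} - \frac{\dot{U}(t)}{U(t)} = g_V(t) - g_U(t)
```

Whenever the growth rate g_V exceeds g_U, r(t) is increasing. The slide gives no underlying numbers, so no particular values are implied.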

Page 7

Metric of Success - A Research Tool for Influenza

[Figure: Structure Summary page activity for H1N1 influenza-related structures, Jan. 2008 to Jul. 2010*]

1RUZ: 1918 H1 Hemagglutinin

3B7E: Neuraminidase of A/Brevig Mission/1/1918 H1N1 strain in complex with zanamivir

* http://www.cdc.gov/h1n1flu/estimates/April_March_13.htm

Page 8

Looking Back Over the Past 12 Years – In General

Everything was harder and took longer than we thought

There are a lot of politics associated with data

Emphasis has shifted from archive, to archive + analytical tool, to archive + analytical tool + educational tool

Consequently, outreach is our most important yet least understood activity today

Staff needed to change accordingly

Policy has changed as well – some support for non-generic tools

Prorated, our budget has decreased

Page 9

Looking Back Over the Past 12 Years – Infrastructure

It took about 5 years to achieve and subsequently sustain 99.99% uptime

We have gone through 3 distinct architectural changes:
– Object model / Perl CGI
– Object-relational model / Enterprise Java
– Redesign: same model, widget-based UI


Bluhm et al. 2011 Quality Assurance doi: 10.1093/database/bar003
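To put the 99.99% uptime figure in perspective, the sketch below converts an uptime target into the downtime it allows over a period. The function name and the 30-day month are illustrative assumptions; the 365.25-day year averages in leap years.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year length, including leap years


def allowed_downtime_minutes(uptime_fraction: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the downtime budget (in minutes) implied by an uptime target."""
    if not 0.0 <= uptime_fraction <= 1.0:
        raise ValueError("uptime_fraction must be between 0 and 1")
    return (1.0 - uptime_fraction) * period_minutes


if __name__ == "__main__":
    yearly = allowed_downtime_minutes(0.9999)
    monthly = allowed_downtime_minutes(0.9999, 30 * 24 * 60)
    print(f"99.99% uptime: {yearly:.1f} min/year, {monthly:.1f} min per 30-day month")
```

At 99.99%, this works out to about 52.6 minutes of downtime per year, or about 4.3 minutes per 30-day month.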

Page 10

Looking Back Over the Past 12 Years – Data & Data Management

About 25% of our budget has been spent on data remediation

Support yearly snapshots and versioning

Our ontology/data model has been a critical component of our workflow and data accuracy

The same model is too complex to facilitate wide adoption by others that use our data

Our data are such that we can retain redundant copies

Data objects are discrete and we assign DOIs

Constantly striving to have the user distinguish raw from derived data
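Because each entry is a discrete object, a DOI can be derived directly from its identifier. The sketch below assumes the pattern 10.2210/pdb&lt;id&gt;/pdb that has been used for PDB entry DOIs, and it accepts only the classic four-character ID shape (a digit 1-9 followed by three letters or digits). Both are assumptions to check against current wwPDB documentation, and the function names are hypothetical.

```python
import re

# Classic four-character PDB ID: a digit 1-9 followed by three letters or digits.
PDB_ID = re.compile(r"[1-9][A-Za-z0-9]{3}")


def pdb_doi(pdb_id: str) -> str:
    """DOI for one PDB entry, assuming the 10.2210/pdb<id>/pdb naming pattern."""
    if not PDB_ID.fullmatch(pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"10.2210/pdb{pdb_id.lower()}/pdb"


def doi_url(doi: str) -> str:
    """Resolvable link through the doi.org proxy."""
    return f"https://doi.org/{doi}"


print(doi_url(pdb_doi("1TIM")))  # https://doi.org/10.2210/pdb1tim/pdb
```

Deriving the DOI mechanically from the identifier keeps the two from drifting apart, which matters when, as the slide notes, yearly snapshots and versions coexist.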


Page 11

Trends Today

Constant demand for better performance

Use of Web services (SOAP and now RESTful) is increasing

Uptake of widgets has been slower than I hoped

Users are hankering after additional annotations of the data – working on database-literature integration

Mobile use is increasing

Web 2.0 services are in demand
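In a RESTful service, each resource (here, one entry) has its own URL and the response is a plain document such as JSON. The sketch below assumes the entry endpoint of today's RCSB Data API and assumes the field names rcsb_id and struct.title; the 2011 services on the slide differed. It parses a hypothetical, heavily trimmed sample so it runs offline.

```python
import json
from typing import Dict, Optional

# Assumed endpoint of today's RCSB Data API; not the 2011 service on the slide.
ENTRY_ENDPOINT = "https://data.rcsb.org/rest/v1/core/entry/{pdb_id}"


def entry_url(pdb_id: str) -> str:
    """RESTful resource URL for one entry: the identifier is part of the path."""
    return ENTRY_ENDPOINT.format(pdb_id=pdb_id.upper())


def summarize_entry(payload: str) -> Dict[str, Optional[str]]:
    """Pick a few fields from a JSON entry document (field names are assumptions)."""
    doc = json.loads(payload)
    return {
        "id": doc.get("rcsb_id"),
        "title": doc.get("struct", {}).get("title"),
    }


# Hypothetical, heavily trimmed response so the sketch runs offline.
SAMPLE = '{"rcsb_id": "1TIM", "struct": {"title": "example title"}}'

print(entry_url("1tim"))
print(summarize_entry(SAMPLE))  # {'id': '1TIM', 'title': 'example title'}
```

Against a live service, the payload would come from urllib.request.urlopen(entry_url("1TIM")).read().decode("utf-8"), assuming the endpoint above is still current.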


Page 12

Website Performance Improvements

Back End
– Back-end tuning and use of multilevel caching in the areas of searches, query results, explorer pages and hierarchical views
– Better performance and a more robust and scalable system
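Multilevel caching can be sketched as a small in-process LRU cache (level 1) in front of a larger shared cache (level 2), with the back-end query as the last resort. This is a generic two-level pattern, not RCSB's actual implementation; the class name is hypothetical, and a plain dict stands in for the shared level.

```python
from collections import OrderedDict
from typing import Any, Callable, Dict


class TwoLevelCache:
    """Level 1: small in-process LRU. Level 2: larger shared store (a dict stands in here)."""

    def __init__(self, loader: Callable[[str], Any], l1_size: int = 128) -> None:
        self.loader = loader                      # slow back-end call, e.g. running a search
        self.l1: "OrderedDict[str, Any]" = OrderedDict()
        self.l1_size = l1_size
        self.l2: Dict[str, Any] = {}              # unbounded here; a real shared cache would evict
        self.backend_calls = 0

    def get(self, key: str) -> Any:
        if key in self.l1:                        # level-1 hit: mark as most recently used
            self.l1.move_to_end(key)
            return self.l1[key]
        if key in self.l2:                        # level-2 hit: promote into level 1
            value = self.l2[key]
        else:                                     # miss at both levels: query the back end
            value = self.loader(key)
            self.backend_calls += 1
            self.l2[key] = value
        self._put_l1(key, value)
        return value

    def _put_l1(self, key: str, value: Any) -> None:
        self.l1[key] = value
        self.l1.move_to_end(key)
        if len(self.l1) > self.l1_size:
            self.l1.popitem(last=False)           # evict the least recently used key


if __name__ == "__main__":
    cache = TwoLevelCache(loader=lambda query: f"results for {query}", l1_size=2)
    for q in ["kinase", "kinase", "protease"]:
        cache.get(q)
    print(cache.backend_calls)  # 2
```

The small first level absorbs repeated requests without any network hop, and the shared second level lets results computed for one user serve others.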

Front End
– Cleaner JavaScript and CSS
– Inline image data
– Compressed content (Gzip + Base64)
– Result: 25–40% increase in render performance
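The two front-end techniques can be illustrated together: inlining an image as a base64 data: URI saves an HTTP request but enlarges those bytes by about a third (4 output characters per 3 input bytes), and gzip then compresses the markup as a whole. The function names and placeholder bytes below are assumptions for illustration; the sketch measures nothing about the RCSB site or its reported 25–40% gain.

```python
import base64
import gzip


def to_data_uri(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Embed image bytes directly in markup as a base64 data: URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def gzip_ratio(text: str) -> float:
    """Gzip-compressed size divided by raw size for a UTF-8 payload."""
    raw = text.encode("utf-8")
    return len(gzip.compress(raw)) / len(raw)


# Placeholder bytes standing in for a small icon; this is not a valid image.
icon = b"\x89PNG\r\n\x1a\n" + bytes(64)
uri = to_data_uri(icon)
page = f'<img src="{uri}" alt="icon">\n' * 20  # repeated markup, as in a results list

print(len(icon), len(uri))          # base64 grows the payload by about a third
print(round(gzip_ratio(page), 3))   # repeated markup compresses well under gzip
```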

Page 13

Literature Integration – The Dream

0. Full text of PLoS papers stored in a database

1. A link brings up figures from the paper

2. Clicking the paper figure retrieves data from the PDB, which is analyzed

3. A composite view of journal and database content results

4. The composite view has links to pertinent blocks of literature text and back to the PDB

The Knowledge and Data Cycle

1. User clicks on content

2. Metadata and web services to data provide an interactive view that can be annotated

3. Selecting features provides a data/knowledge mashup

4. Analysis leads to new content I can share

PLoS Comp. Biol. 2005 1(3) e34

Page 14

Example of Interoperability: The Database View

www.rcsb.org/pdb/explore/literature.do?structureId=1TIM

BMC Bioinformatics 2010 11:220
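The database view is reached through a simple query-string link keyed on the structure ID. A minimal sketch of building that link follows. The http:// scheme and the upper-casing of the ID are assumptions, the function name is hypothetical, and this legacy path may no longer resolve on today's site.

```python
from urllib.parse import urlencode

# Literature-view page from the slide; the scheme is assumed.
LITERATURE_VIEW = "http://www.rcsb.org/pdb/explore/literature.do"


def literature_view_url(structure_id: str) -> str:
    """Build the database-side literature view link for one PDB entry."""
    query = urlencode({"structureId": structure_id.upper()})
    return f"{LITERATURE_VIEW}?{query}"


print(literature_view_url("1tim"))
# http://www.rcsb.org/pdb/explore/literature.do?structureId=1TIM
```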

Page 15

Example of Interoperability – The Literature View

From Anita de Waard, Elsevier

Page 16

Worldwide Protein Data Bank

www.wwpdb.org

Semantic Tagging & Widgets Are a Powerful Tool to Integrate Data and Knowledge of That Data, but as Yet Not Used Much

Will Widgets and Semantic Tagging Change Computational Biology? PLoS Comp. Biol. 6(2) e1000673

Page 17

Semantic Tagging of Database Content in The Literature or Elsewhere

http://www.rcsb.org/pdb/static.do?p=widgets/widgetShowcase.jsp

PLoS Comp. Biol. 6(2) e1000673

Semantic Tagging
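The simplest form of semantic tagging is to recognize database identifiers in running text and link them to the database. The sketch below swaps in a plain regular-expression heuristic, not the RCSB widgets: it matches the classic PDB ID shape and requires at least one letter so that years are skipped. The link target and function names are assumptions, and tokens such as "4GHz" would still be false positives.

```python
import re
from typing import List

# Classic PDB ID shape: a digit 1-9 followed by three letters or digits.
CANDIDATE = re.compile(r"\b[1-9][A-Za-z0-9]{3}\b")
STRUCTURE_PAGE = "https://www.rcsb.org/structure/"  # assumed link target


def _looks_like_pdb_id(token: str) -> bool:
    # Heuristic: require at least one letter so years such as "2005" are skipped.
    return any(ch.isalpha() for ch in token)


def find_pdb_ids(text: str) -> List[str]:
    return [m.group(0).upper() for m in CANDIDATE.finditer(text)
            if _looks_like_pdb_id(m.group(0))]


def tag_pdb_ids(text: str) -> str:
    """Wrap each likely PDB ID in a link to its structure page."""
    def link(match: "re.Match[str]") -> str:
        token = match.group(0)
        if not _looks_like_pdb_id(token):
            return token
        return f'<a href="{STRUCTURE_PAGE}{token.upper()}">{token}</a>'
    return CANDIDATE.sub(link, text)


sentence = "Compare 1RUZ with 3B7E; both were studied after 2005."
print(find_pdb_ids(sentence))  # ['1RUZ', '3B7E']
print(tag_pdb_ids(sentence))
```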

Page 18

PDBMobile

• Fast, low-bandwidth data access
• First version supports iPhone OS
• Future versions will support Android, BlackBerry OS 6 and others
• HTML5-based web application
• Client-side database stores data for offline access
• Tight integration with MyPDB

Objective: PDB Data Access On-The-Go

Page 19

PDBMobile

• Access to saved queries
• Add/delete queries
• Flag interesting entries
• Add personal structure annotations

Tight Integration with MyPDB
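PDBMobile kept data in an HTML5 client-side database for offline access. As a rough analogue only, the sketch below uses Python's sqlite3 to show the same idea: cached entry summaries, user flags and notes that survive a refresh, and saved queries that can be added or deleted. The schema and function names are hypothetical and do not describe MyPDB's actual design.

```python
import sqlite3
from typing import Optional, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS entries (
    pdb_id  TEXT PRIMARY KEY,
    summary TEXT,
    flagged INTEGER NOT NULL DEFAULT 0,
    note    TEXT
);
CREATE TABLE IF NOT EXISTS saved_queries (
    name  TEXT PRIMARY KEY,
    query TEXT NOT NULL
);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def cache_entry(conn: sqlite3.Connection, pdb_id: str, summary: str) -> None:
    """Store or refresh an entry summary without touching the user's flag or note."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT OR IGNORE INTO entries (pdb_id) VALUES (?)", (pdb_id,))
        conn.execute("UPDATE entries SET summary = ? WHERE pdb_id = ?", (summary, pdb_id))


def flag_entry(conn: sqlite3.Connection, pdb_id: str, note: Optional[str] = None) -> None:
    """Flag an entry; keep any existing note unless a new one is given."""
    with conn:
        conn.execute("UPDATE entries SET flagged = 1, note = COALESCE(?, note) WHERE pdb_id = ?",
                     (note, pdb_id))


def offline_lookup(conn: sqlite3.Connection, pdb_id: str) -> Optional[Tuple]:
    return conn.execute("SELECT pdb_id, summary, flagged, note FROM entries WHERE pdb_id = ?",
                        (pdb_id,)).fetchone()


def save_query(conn: sqlite3.Connection, name: str, query: str) -> None:
    with conn:
        conn.execute("INSERT OR REPLACE INTO saved_queries (name, query) VALUES (?, ?)",
                     (name, query))


def delete_query(conn: sqlite3.Connection, name: str) -> None:
    with conn:
        conn.execute("DELETE FROM saved_queries WHERE name = ?", (name,))
```

Refreshing a cached summary with an insert-then-update pair, rather than INSERT OR REPLACE, is a deliberate choice here: replacing the row would silently wipe the user's flag and note.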

Page 20

Future

New views on the data for subclasses of users

New data deposition system – increase speed and accuracy while reducing costs

New types of analysis

Page 21

Acknowledgements

Funding Agencies: NSF, NIGMS, DOE, NLM, NCI, NCRR, NIBIB, NINDS, NIDDK
