DataShare for the UCs 6 February 2014
Nov 11, 2014
From Flickr by Leo Hidalgo
Background Demo of UCSF DataShare Technical details Other details Future plans Q&A
Where we’re going
Goal: Catalyze widespread research data sharing
How: Develop a system that lowers data sharing barriers and builds an engaged user community
Survey of users by Angela Rizk-Jackson (n = 114)
Has your research group provided public access to data? No / Yes
Why? Journal required / Funder required / Other
How? Repository / Website / Other
Repository choices…
Repositories for data:
• General content
• Non-institutional
• Publishers/for-profits
• Short-term projects
• Institutional
• Discipline-specific
Repository choices…
Institutional
• All data associated with a paper
• Tells a story
• Clearinghouse for researcher’s works
Discipline-specific
• Some of the data for a given paper
• Discoverable
• Integrated systems
• Collection policies
Repository choices…
Which should a researcher use? Both
Which is more important? Depends
Institutional
• All data associated with a paper
• Tells a story
• Clearinghouse for researcher’s works
IRs are SO 2002.
From Flickr by Colin ZHU
From Flickr by Ludie Cochrane
From Flickr by johnsons531
From Flickr by Kapil Karekar
… “Federal agencies investing in research and development (more than $100 million in annual expenditures) must have clear and coordinated policies for increasing public access to research products.”
Last year…
IR
From Flickr by wiccked
From Flickr by jackcheng
But…
Not always self-service
Sometimes complicated
Data?
“Old” user interfaces
Simplify data deposit for UC researchers
Simple metadata
Self-service upload and download
Branded for campus
Most Important: Institutional Control Over Data
From Flickr by Leo Hidalgo
Background Demo of UCSF DataShare Technical details Other details Future plans Q&A
Technical goals
• Easy submission
• Persistent citation
• Preservation assurance
• Effective discovery
• Control over terms of use
• All the benefits of a centrally hosted service, while maintaining campus branding and identity
From www.dimensionsinfo.com
From Flickr by Eric Peacock
System components
• Easy submission
• Persistent citation
• Preservation assurance
• Effective discovery
• Control over terms of use
• All the benefits of a centrally hosted service, while maintaining campus branding and identity
UCSF drag-n-drop client
DNS, Apache, CSS, and campus Shibboleth IdPs
datashare.berkeley.edu datashare.ucdavis.edu datashare.uci.edu datashare.ucla.edu …
Data use agreements (DUAs)
Deposit interactions
Components: Researcher (data producer), Campus IdP, DataShare portal (datashare.campus.edu, campus CSS), Data use agreement, Drag-n-drop client, Merritt, SDSC cloud (preservation storage), EZID, Discovery (XTF), DataCite, Data Citation Index, Primo
Protocols: Atom, Shib
Steps:
• Authenticate with campus credentials
• Assemble dataset, add metadata, submit to Merritt
• Request DOI, register metadata
• Assign DOI
• Populate XTF index
• Harvest for A&I discovery
Download interactions
Components: Researcher (data consumer), Campus IdP, DataShare portal (datashare.campus.edu, campus CSS), Data use agreement, Drag-n-drop client, Merritt, SDSC cloud, EZID, Discovery (XTF), DataCite, Data Citation Index, Primo
Steps:
• Faceted search / browse
• Accept DUA terms
• Retrieve data
• Download data: synchronous for small datasets; asynchronous for large (> 500 MB)
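The download rule can be illustrated with a few lines of Python. This is only a sketch: the function name, the return values, and the reading of "500 MB" as 500 × 10^6 bytes are assumptions for illustration, not details taken from Merritt or the DataShare code.

```python
# Hypothetical sketch of the size-based download rule described on the slide.
# Assumption: "500 MB" is read as decimal megabytes (500 * 10**6 bytes).
LARGE_DATASET_BYTES = 500 * 10**6


def delivery_mode(size_bytes: int) -> str:
    """Return how a dataset of the given size would be delivered.

    Datasets strictly larger than the threshold are prepared asynchronously;
    everything else is returned in the same request.
    """
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    return "asynchronous" if size_bytes > LARGE_DATASET_BYTES else "synchronous"


if __name__ == "__main__":
    for size in (10 * 10**6, 500 * 10**6, 2 * 10**9):
        print(size, delivery_mode(size))
```

How an asynchronous request notifies the user when the data is ready is not stated on the slide, so it is left out of the sketch.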
From Flickr by Leo Hidalgo
Background Demo of UCSF DataShare Technical details Other details Future plans Q&A
Roles
Campus Library
• Delivers service to community
• Shapes user interface, URL, branding
• Customizes key components
• Develops help, training
UC3 / CDL
• Guides the campus
• Preserves content in Merritt
• Connects to EZID
• Deploys XTF for discovery
• Works with vendors
SDSC
• Maintains production storage infrastructure
• Holds three independent copies of content
Branding & Customization
• Logo
• URL
• Contact information
• Other…?
From Flickr by Diorama Sky
Cost
• EZID accounts
– Existing campus memberships provide unlimited DOIs
• Merritt recharge proposal (awaiting UCOP approval)
– Pay-as-you-go: $0.40/GB/year
– Paid-up (for 10 years): $2.93/GB
– Threshold pricing: 100, 200, 500 GBs; 1, 2, 5, 10, 20, 50, 100 TBs
From Flickr by Maura Teague
Cost
Anticipated cost of providing all campus ladder-track faculty with 5 GBs for 10 years
Campus         Faculty  Threshold  Paid-up cost
Berkeley       1,260    10 TB      $29,300
Davis          1,240    10 TB      $29,300
Irvine         1,051    10 TB      $29,300
Los Angeles    1,701    10 TB      $29,300
Merced         159      1 TB       $2,930
Riverside      561      5 TB       $14,650
San Diego      1,109    10 TB      $29,300
San Francisco  366      2 TB       $5,860
Santa Barbara  746      5 TB       $14,650
Santa Cruz     485      5 TB       $14,650
Source: http://legacy-its.ucop.edu/uwnews/stat/headcount_fte/oct2013/welcome.html
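The table's figures appear to follow from the proposed prices. The sketch below reproduces them under three assumptions that are not stated on the slides: decimal units (1 TB = 1,000 GB), each campus buying the smallest threshold that covers 5 GB per faculty member, and prices held as whole cents to avoid floating-point rounding. All function and constant names are illustrative.

```python
# Sketch of the cost arithmetic behind the table (names are hypothetical).
PAID_UP_CENTS_PER_GB = 293       # $2.93/GB, paid-up for 10 years
PAYG_CENTS_PER_GB_YEAR = 40      # $0.40/GB/year, pay-as-you-go
GB_PER_FACULTY = 5

# Threshold tiers in GB, assuming 1 TB = 1,000 GB.
THRESHOLDS_GB = [100, 200, 500, 1_000, 2_000, 5_000,
                 10_000, 20_000, 50_000, 100_000]


def threshold_gb(faculty: int) -> int:
    """Smallest pricing threshold that covers 5 GB for every faculty member."""
    needed = faculty * GB_PER_FACULTY
    for tier in THRESHOLDS_GB:
        if tier >= needed:
            return tier
    raise ValueError("storage need exceeds the largest threshold")


def paid_up_cost(faculty: int) -> int:
    """Paid-up (10-year) cost in whole dollars for the chosen threshold."""
    return threshold_gb(faculty) * PAID_UP_CENTS_PER_GB // 100


def payg_cost_10_years(faculty: int) -> int:
    """Pay-as-you-go cost in whole dollars for the same threshold over 10 years."""
    return threshold_gb(faculty) * PAYG_CENTS_PER_GB_YEAR * 10 // 100


if __name__ == "__main__":
    for campus, faculty in [("Berkeley", 1260), ("Merced", 159),
                            ("San Francisco", 366)]:
        print(campus, threshold_gb(faculty), paid_up_cost(faculty),
              payg_cost_10_years(faculty))
```

For example, Berkeley's 1,260 faculty need 6,300 GB, which falls under the 10 TB threshold, and 10,000 GB at $2.93/GB gives the $29,300 shown. At the quoted rates, ten years of pay-as-you-go for the same threshold would come to $4.00/GB.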
Governance & Agreements
Goal: Simplify & Scale Data Use & Deposit Agreements
CDL
UC Campus
Data Depositor
Data User
Terms of service
ODL or similar
Terms of service
ODL or similar
Governance & Agreements
From Flickr by Leo Hidalgo
Background Demo of UCSF DataShare Technical details Other details Next steps & future plans Q&A
Who Decides?
• CDL to work with each campus to implement & shape service
• Campus-to-campus interaction
• Group meetings as needed
• SAG1 check-ins
• Communication (…)
This is a group project
From Flickr by Mischievous
One
Two heads are better than one!
From Flickr by Alice Bartlett
• eScholarship connection
• ORCID
• Altmetrics
• Solr/Blacklight for discovery
• Expand metadata options
• Embargoes
• Restricted access for peer review
• Annotations
• Export to citation managers
• Staging area
• Private storage
• Mapping metadata/GIS support
From Flickr by Emil Nordén
Google Groups Web Forum
Communication
UC3 confluence site confluence.ucop.edu/display/Curation/DataShare+for+UCs
Communication
• Listserv?
• Twitter @DataShareOrg
• …?
Communication
From Flickr by gsagostinho
github.com/CDLUC3/datashare
Communication
DASH: Helping Community Repositories
What Makes DASH Unique:
• Modern, intuitive user interface for superior user experience
• Freely available code for download and use by anyone
• User-friendly API(s) to ensure interoperability with existing repositories (e.g., SWORD for deposit; Atom, OAI-PMH, ResourceSync for populating the discovery index)
• Customizable interfaces that can be altered easily to reflect service provider branding
• Authentication via institutional Identity Management Systems
To be revised
Next Steps – Next 2 Weeks
• details to be established
– who’s interested
– tech contact for interested campuses
– communication lines
From Flickr by Themactep
Next Steps – Next 2 Months
• get DataShare up and running
– Shibboleth configuration & other authentication
– Domains/URLs established
– Customizations – logos etc.
From Flickr by Themactep
Next Steps – Longer term
• in-person meeting?
• CDL camp?
• communication/outreach?
From Flickr by Themactep
Acknowledgements
• Geoffrey Boushey
• Julia Kochi
• Megan Laurence
• Stephen Abrams
• Trisha Cruse
• Carly Strasser
• Perry Willett
• Anirvan Chatterjee
• Angela Rizk-Jackson
• Maninder Kahlon