A DATACITE CASE STUDY FROM THE UK DATA ARCHIVE
TOM ENSOM
UK DATA SERVICE, UK DATA ARCHIVE, UNIVERSITY OF ESSEX
C4D WORKSHOP, JULY 2013, LONDON
UK DATA ARCHIVE
WHO WE ARE
• Established in 1968 - 46 years of selecting, curating, preserving and providing access to social science data
• 6,000 datasets in the collection
• Over 25,000 registered users
• Data and data support services for higher and further education for research, teaching and learning
• Have been registered to ISO 27001 (information security standard) since June 2010
OUR SERVICES
• The UK Data Archive is itself a department of the University of Essex
• Distributed service, the Economic and Social Data Service (ESDS), established 1 January 2003
• New five-year UK Data Service from 2012
WHAT WE DO
• Research & development, innovation
• Promoting best practice in data curation
• Raise standards in data security and awareness of ethical/legal issues
• Raise standards in data management
  • Data management hub
  • We provide guidance to ESRC researchers and anyone else who asks
WE SUPPORT RESEARCHERS
• Popular training materials
  • Managing and Sharing Guide
  • Training Resources
• Website: http://data-archive.ac.uk/create-manage
• Bespoke training events
  • Large and small scale workshops
ENGAGEMENT WITH RDM COMMUNITY
• Recently completed JISC Managing Research Data project with the University of Essex
  • Cross support service, departmental engagement
  • Piloted an RDM infrastructure
  • http://www.data-archive.ac.uk/create-manage/projects/rd-essex
• Outputs of value to the RDM community:
  • Metadata profile for institutional data repositories
  • Research data plugin for EPrints
WHY CITE DATA?
It’s a vital part of a rigorous research process:
• Acknowledges the researcher’s sources
• Gives data creators, authors and data curators proper credit when their work is reused
• Facilitates data resource discovery and access
• Helps track the use and impact of data collections
OUR APPROACH TO CITATION
• Required by our user agreement (End User Licence) for many years
OUR APPROACH TO CITATION
• Should include enough information to ensure the exact version can be located
“University of Essex. Institute for Social and Economic Research and National Centre for Social Research, Understanding Society: Wave 1, 2009-2010 [computer file]. 2nd Edition. Colchester, Essex: UK Data Archive [distributor], November 2011. SN: 6614.”
• No widely agreed standard citation format yet!
• Version information is crucial
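Because the citation must pin down the exact version, it can help to generate it from structured fields rather than type it by hand. The sketch below is a hypothetical helper (the `DataCollection` fields and `format_citation` name are illustrative, not a UKDA tool) that reproduces the Understanding Society example above, with the edition carrying the version information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCollection:
    """Fields needed to cite one version of a data collection (hypothetical model)."""
    authors: str
    title: str
    edition: str        # version information, e.g. "2nd Edition"
    place: str          # location of the distributor
    distributor: str
    release: str        # month and year this edition was released
    study_number: int   # archive study number (SN)


def format_citation(c: DataCollection) -> str:
    """Render a citation in the style of the UKDA example."""
    return (
        f"{c.authors}, {c.title} [computer file]. {c.edition}. "
        f"{c.place}: {c.distributor} [distributor], {c.release}. "
        f"SN: {c.study_number}."
    )


wave1 = DataCollection(
    authors=("University of Essex. Institute for Social and Economic Research "
             "and National Centre for Social Research"),
    title="Understanding Society: Wave 1, 2009-2010",
    edition="2nd Edition",
    place="Colchester, Essex",
    distributor="UK Data Archive",
    release="November 2011",
    study_number=6614,
)
print(format_citation(wave1))
```

Changing only `edition` and `release` yields the citation for another version, which is exactly the information a reader needs to locate the right one.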
PERSISTENT IDENTIFIERS
• Persistent Identifiers (PIDs)
  • A string identifying a clearly defined digital object
  • Persistence must mean enduring
  • Identifiers must be unique
• PIDs have been attached to scientific publications for some time
• Next logical step: data
  • Also being applied to other entities, e.g. people via the ORCID system
CHANGES TO DATA
• Our ‘data collections’ are not discrete digital objects
• Approx. 15% of UKDA data collections are altered within the first year after publication
• Versioning - we need to distinguish between major and minor changes to a data collection
• Integrate processes with:
  • Digital preservation activities
  • Current ingest infrastructure / workflows
MINOR CHANGES – LOW IMPACT
• Publication reference added
• Correction of spelling in variable labels
• Small changes in variable labels
• Removal of (erroneously supplied) admin variables
• Correction of spelling in metadata
• Minor changes in documentation
• New index (keyword) terms
• Additional documentation added (non-fundamental)
• Change in access conditions
MAJOR CHANGES – HIGH IMPACT
• Adding new ‘waves’ in a data series
• New variable added
• New labels/value codes added
• Weighting variables reconstructed
• Wrong data supplied (e.g., March not April)
• Mis-coded data (e.g., Don’t know/Refused mix-up)
• Change in format (file migration)
• Significant changes in documentation
• Change in access conditions
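The two lists above amount to a lookup: a batch of changes triggers a new major version if any single change is high impact. The sketch below encodes that rule; the change-type identifiers are hypothetical labels for the slide items. "Change in access conditions" appears on both lists, so this sketch assumes the cautious reading and treats it as high impact; unknown change types are rejected rather than guessed.

```python
from typing import Iterable

# Hypothetical identifiers for the low impact changes listed above.
LOW_IMPACT = {
    "publication_reference_added",
    "variable_label_spelling",
    "variable_label_small_change",
    "admin_variables_removed",
    "metadata_spelling",
    "documentation_minor_change",
    "index_terms_added",
    "documentation_added_non_fundamental",
}

# Hypothetical identifiers for the high impact changes listed above.
HIGH_IMPACT = {
    "new_waves_added",
    "new_variable",
    "new_labels_or_value_codes",
    "weights_reconstructed",
    "wrong_data_supplied",
    "miscoded_data",
    "format_migration",
    "documentation_significant_change",
    # Listed under both impacts; treated as high impact here (assumption).
    "access_conditions_changed",
}


def classify(changes: Iterable[str]) -> str:
    """Return "high" if any change is high impact, otherwise "low"."""
    changes = list(changes)
    unknown = [c for c in changes if c not in LOW_IMPACT | HIGH_IMPACT]
    if unknown:
        raise ValueError(f"unclassified change types: {unknown}")
    return "high" if any(c in HIGH_IMPACT for c in changes) else "low"
```

For example, `classify(["metadata_spelling", "new_variable"])` returns `"high"`: one high impact change outweighs any number of low impact ones.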
DATACITE DOIs
• 2011: we started working with the British Library and DataCite to develop a permanent, reliable method of citing our data collections
• DataCite
  • Founded by organisations from six countries
  • Established a citation format for research data, including a DOI
  • Works with data publishers, e.g. established data centres and institutional repositories
WHY DATACITE?
Not the only choice, but right for us:
• The DOI framework is an international and persistent standard for identifying digital objects
• Familiar within the research data domain
• Centralised resolution service
• Metadata registry (and thus de facto standard)
• Discovery link up
• API – allowing for automation of minting process (but also manual if you prefer!)
DOI FORMAT
10.5255/UKDA-SN-1-1
• 10.5255 – unique archive identifier
• UKDA – readable archive identifier
• SN – resource identifier type
• 1 – resource identifier
• 1 – resource version
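Because every part of the DOI has a fixed meaning, it can be built and parsed mechanically. A minimal sketch, assuming the components are always upper-case letters (archive, type) and digits (identifier, version) as in the example; the `UkdaDoi` and `parse_doi` names are hypothetical. Note that DOIs are case-insensitive in general, so a production parser would probably normalise case first.

```python
import re
from typing import NamedTuple

PREFIX = "10.5255"  # unique archive identifier, from the slide


class UkdaDoi(NamedTuple):
    archive: str       # readable archive identifier, e.g. "UKDA"
    id_type: str       # resource identifier type, e.g. "SN"
    resource_id: int   # resource identifier
    version: int       # resource version

    def __str__(self) -> str:
        return f"{PREFIX}/{self.archive}-{self.id_type}-{self.resource_id}-{self.version}"


# Assumed shape: prefix / LETTERS-LETTERS-digits-digits
_PATTERN = re.compile(r"^10\.5255/([A-Z]+)-([A-Z]+)-(\d+)-(\d+)$")


def parse_doi(doi: str) -> UkdaDoi:
    """Split a structured DOI into its four named components."""
    match = _PATTERN.match(doi.strip())
    if match is None:
        raise ValueError(f"not a structured UKDA DOI: {doi!r}")
    archive, id_type, resource_id, version = match.groups()
    return UkdaDoi(archive, id_type, int(resource_id), int(version))
```

Round-tripping works in both directions: `str(parse_doi("10.5255/UKDA-SN-1-1"))` gives back `"10.5255/UKDA-SN-1-1"`.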
DOI VERSIONING
• High impact change: 10.5255/UKDA-SN-1-1 becomes 10.5255/UKDA-SN-1-2
  • Increments major version – new DOI
• Low impact change: 10.5255/UKDA-SN-1-1 stays the same
  • Increments minor version – internal
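The versioning rule can be expressed as a two-part version number where only the major part appears in the DOI. A minimal sketch; the `Version` class and `doi_for` helper are hypothetical, and resetting the minor counter on a major change is an assumption the slide does not state.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Version:
    major: int  # appears in the DOI
    minor: int  # internal only, never exposed in the DOI

    def bump(self, impact: str) -> "Version":
        if impact == "high":
            # Assumption: the minor counter restarts at 0 on a new major version.
            return Version(self.major + 1, 0)
        if impact == "low":
            return replace(self, minor=self.minor + 1)
        raise ValueError(f"impact must be 'high' or 'low', got {impact!r}")


def doi_for(resource_id: int, v: Version) -> str:
    # Only the major version is part of the DOI.
    return f"10.5255/UKDA-SN-{resource_id}-{v.major}"


v1 = Version(1, 0)
v1_low = v1.bump("low")     # low impact change: same DOI
v2 = v1_low.bump("high")    # high impact change: new DOI
print(doi_for(1, v1), doi_for(1, v1_low), doi_for(1, v2))
# prints: 10.5255/UKDA-SN-1-1 10.5255/UKDA-SN-1-1 10.5255/UKDA-SN-1-2
```

Keeping the minor counter internal means users only ever see a new DOI when the citable content has genuinely changed.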
CREATING A NEW DOI
• New data collection ‘ingested’
• Structured DOI ‘created’
• Minimal DataCite metadata, including the requested DOI, pushed to the DataCite metadata store via the API
• DataCite API sends back an approval
• Flagged behind the scenes
• New change log
• New citation file
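The "minimal DataCite metadata" step can be illustrated by building a record containing only the properties the DataCite Metadata Schema 3.x treats as mandatory: identifier, creator, title, publisher and publication year. This is a sketch, not the record UKDA actually sends; the kernel-3 namespace is an assumption about the schema version in use, the `schemaLocation` attribute is omitted, and `minimal_metadata` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# DataCite Metadata Schema kernel-3 namespace (assumed schema version).
NS = "http://datacite.org/schema/kernel-3"


def _q(tag: str) -> str:
    """Qualify a tag name with the DataCite namespace."""
    return f"{{{NS}}}{tag}"


def minimal_metadata(doi: str, creator: str, title: str,
                     publisher: str, year: int) -> str:
    """Return an XML record with only the mandatory DataCite properties."""
    ET.register_namespace("", NS)  # serialise as the default namespace
    resource = ET.Element(_q("resource"))

    identifier = ET.SubElement(resource, _q("identifier"), identifierType="DOI")
    identifier.text = doi

    creators = ET.SubElement(resource, _q("creators"))
    creator_el = ET.SubElement(creators, _q("creator"))
    ET.SubElement(creator_el, _q("creatorName")).text = creator

    titles = ET.SubElement(resource, _q("titles"))
    ET.SubElement(titles, _q("title")).text = title

    ET.SubElement(resource, _q("publisher")).text = publisher
    ET.SubElement(resource, _q("publicationYear")).text = str(year)
    return ET.tostring(resource, encoding="unicode")
```

Sending this record and receiving the approval would be done through the DataCite API; that call is deliberately left out here rather than guessing at endpoint details.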
UPDATING A DOI – HIGH IMPACT
• High impact change to data collection
• Incremental DOI version ‘created’
• Minimal DataCite metadata, including the requested DOI, pushed to the DataCite metadata store via the API
• DataCite API sends back an approval
• Flagged behind the scenes
• Update change log
• New citation file
UPDATING A DOI – LOW IMPACT
• Low impact change to data collection
• Update change log
• Minimal DataCite metadata pushed to the DataCite metadata store via the API
• DataCite API sends back an approval
• Flagged behind the scenes
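The high and low impact update workflows differ only in whether a new DOI and citation file are produced; both update the change log and push metadata. The sketch below walks through those steps against a fake client. Everything here is hypothetical: `FakeDataCiteClient` models no real API, and "flagged behind the scenes" is interpreted, as an assumption, as recording the approval on the collection.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class FakeDataCiteClient:
    """Hypothetical stand-in for the DataCite API: records each push
    and always approves. No real endpoint is modelled."""

    def __init__(self) -> None:
        self.pushed: List[str] = []

    def push_metadata(self, doi: str, metadata: Dict[str, str]) -> bool:
        self.pushed.append(doi)
        return True  # stands in for "DataCite API sends back an approval"


@dataclass
class Collection:
    resource_id: int
    major: int = 1   # part of the DOI
    minor: int = 0   # internal only
    change_log: List[str] = field(default_factory=list)
    citation_files: List[str] = field(default_factory=list)
    flagged: bool = False

    @property
    def doi(self) -> str:
        return f"10.5255/UKDA-SN-{self.resource_id}-{self.major}"


def update_collection(coll: Collection, impact: str, note: str,
                      client: FakeDataCiteClient,
                      metadata: Dict[str, str]) -> str:
    """Apply one change following the update steps; return the current DOI."""
    if impact == "high":
        coll.major += 1   # incremental DOI version created
        coll.minor = 0    # assumption: minor counter restarts
    elif impact == "low":
        coll.minor += 1   # DOI unchanged
    else:
        raise ValueError(f"impact must be 'high' or 'low', got {impact!r}")
    coll.change_log.append(f"v{coll.major}.{coll.minor}: {note}")
    if impact == "high":
        coll.citation_files.append(coll.doi)  # new citation file for the new DOI
    # Push minimal metadata and record the approval (interpretation of "flagged").
    coll.flagged = client.push_metadata(coll.doi, metadata)
    return coll.doi
```

In both cases metadata is pushed, so DataCite's record stays current even when the DOI itself does not change.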
THE END RESULT…
Jump page (= change log), linking to each version:
• SN#### Survey Waves 1-13 – DOI: SN-####-1 – instance-specific data and metadata
• SN#### Survey Waves 1-14 – DOI: SN-####-2 – instance-specific data and metadata
• SN#### Survey Waves 1-15 – DOI: SN-####-3 – instance-specific data and metadata (current)
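A jump page like this can be generated directly from the list of releases, one line per major version with the latest marked as current. A minimal sketch; `jump_page` is a hypothetical helper and 1234 is a made-up study number standing in for the slide's "####".

```python
from typing import List


def jump_page(resource_id: int, releases: List[str]) -> str:
    """releases[i] describes major version i + 1; the last entry is current."""
    lines = [f"SN{resource_id}: version history (change log)"]
    for version, label in enumerate(releases, start=1):
        doi = f"10.5255/UKDA-SN-{resource_id}-{version}"
        current = " (current)" if version == len(releases) else ""
        lines.append(f"{doi}  {label}{current}")
    return "\n".join(lines)


print(jump_page(1234, ["Waves 1-13", "Waves 1-14", "Waves 1-15"]))
```

Older DOIs keep resolving to their own instance-specific data and metadata, while the jump page makes it obvious which one is current.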
OUR DOI METADATA
CHALLENGES FOR THE FUTURE
• Citing parts (fragments) of data collections:
  • single files
  • subsets of quantitative data files
  • extracts of textual data
• Still uncertainty over where exactly research data should go: institutional repository (IR), subject-specific repository or data journal?
  • Who should be minting DOIs?
  • Avoid assigning multiple identifiers to an object
ESRC’s CITATION AWARENESS GUIDE
ACKNOWLEDGEMENTS
Thanks to the following UKDA/UKDS staff for their assistance in putting this together:
• Matthew Woollard
• Louise Corti
• John Payne
• Matthew Brumpton
• Sharon Bolton
CONTACT
TOM ENSOM
UK DATA ARCHIVE, UNIVERSITY OF ESSEX, WIVENHOE PARK, COLCHESTER, ESSEX CO4 3SQ
T +44 (0)1206 872974
E [email protected]