Principles for Sustainable Data Curation
Post on 20-Feb-2016
Steven Worley
Computational and Information Systems Laboratory
NCAR
Can Research Library Repositories Benefit from the Federal Lab Experience?
Topics
• My perspective – Research Data Archive @ NCAR
• Principles for Sustainable Data Curation
  • Stable Funding
  • Knowledgeable Staff
  • Robust Digital Storage
  • Protection from Loss
  • Data and Metadata Format
  • Partnerships
• Data Management Evolution
21 March 2012 ARL, Leadership Fellows
My perspective – Research Data Archive @ NCAR
Core Data Categories
• Operational and Reanalysis Model Outputs
• Meteorological and Oceanographic Observations
• Remote Sensing Observations
• Topography, Bathymetry, Vegetation, and Land Use
My perspective – Research Data Archive @ NCAR
Purposes
• Support climate and weather research at NCAR and UCAR universities
• Extend data service worldwide
Basic Metrics
• Established in the 1960s
• 600+ datasets, 4M+ files
• 70+ datasets growing daily to monthly
My perspective – Research Data Archive @ NCAR
[Figure: RDA Total Size and # of Unique Users, 2006 to 2010. Axes: Year, Unique Users, Size in TB. Series: Users, Size.]
[Diagram: roles around the archive]
Users
• US
• International
Requests and Needs
• Data
• Assistance
• Feedback
Stewardship
• Management
• Supervision
• Guidance
• Integrity
• Access
Curation
• Archiving
• Metadata
• Data Integrity
• Preservation
Sustainable Curation – Stable Funding
Permits flexibility:
• Evolution of data management to meet expectations
• Holistic approach – not driven by narrowly defined projects
• Taking advantage of unplanned opportunities
Necessary to keep the collection viable for the long term
Sustainable Curation – Knowledgeable Staff
Data domain knowledge enables staff to:
• Understand data and do integrity checks
• Choose data organization to fit the science discipline
• Design appropriate access systems and do consulting
Consistent staffing levels nurture:
• Professionals dedicated to best practices
• Human-based knowledge, which should not be underestimated
Sustainable Curation – Robust Digital Storage
Keep pace with digital media evolution:
• Expect data migration every 2-5 years (tape, disk capacity, etc.)
• Plan, test, and implement migration carefully
  • Mistakes are irrecoverable!
  • Use knowledgeable staff heavily
Why evolve?
• Users expect more data with faster access
• Media will eventually fail
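A minimal sketch of the "plan, test, and implement migration carefully" point: copy each file to the new storage and confirm the copy by checksum before the old media is retired. The function names (sha256_of, migrate_and_verify) and the choice of SHA-256 are illustrative assumptions, not a description of how the RDA performs its migrations.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_and_verify(source_dir: Path, target_dir: Path) -> list[str]:
    """Copy every file under source_dir to target_dir.

    Returns the relative paths whose copy does not match the original,
    so an empty list means the migration verified cleanly.
    """
    mismatches = []
    for source in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        relative = source.relative_to(source_dir)
        target = target_dir / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copies contents and timestamps
        if sha256_of(source) != sha256_of(target):
            mismatches.append(relative.as_posix())
    return mismatches
```

The source files are left in place on purpose: deleting the old copy is a separate step that should only follow a clean verification.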
Sustainable Curation – Protection from Loss
Create backup data and test disaster recovery
Why?
• Physical failures
  • Environmental: power outage, fire, flood, ...
  • Hardware: disk system failure, tape degradation
• Poor curation practices
  • Metadata loss
  • Accidental data overwrites and deletions
Solutions
• Store backup at a separate physical location
• Treat metadata and data as equals – couple them together
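One way to picture "treat metadata and data as equals" is a sidecar record that travels with each data file and is bound to it by a checksum, so that both metadata loss and accidental overwrites become detectable. This is a sketch under assumptions: the ".meta.json" naming, the record layout, and the function names are hypothetical and are not taken from the talk.

```python
import hashlib
import json
from pathlib import Path


def sidecar_path(data_file: Path) -> Path:
    """Hypothetical naming rule: <data file name>.meta.json in the same directory."""
    return data_file.with_name(data_file.name + ".meta.json")


def write_sidecar(data_file: Path, metadata: dict) -> Path:
    """Store metadata next to the data file, bound to it by a SHA-256 checksum."""
    record = {
        "file": data_file.name,
        "sha256": hashlib.sha256(data_file.read_bytes()).hexdigest(),
        "metadata": metadata,
    }
    sidecar = sidecar_path(data_file)
    sidecar.write_text(json.dumps(record, indent=2))
    return sidecar


def check_pair(data_file: Path) -> str:
    """Return 'ok', 'metadata missing', or 'data changed' for one data file."""
    sidecar = sidecar_path(data_file)
    if not sidecar.exists():
        return "metadata missing"
    record = json.loads(sidecar.read_text())
    current = hashlib.sha256(data_file.read_bytes()).hexdigest()
    return "ok" if current == record["sha256"] else "data changed"
```

Backing up the data file and its sidecar together to the separate location keeps the pair coupled there as well. The sketch reads each file fully into memory, which would need chunked hashing for large archive files.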
Sustainable Curation – Protection from Loss
[Figure: size in TB by year, 2006 to 2010. Series: User Data, Full Archive.]
Sustainable Curation – Protection from Loss
[Figure: size in TB by year, 2006 to 2010. Series: Full Archive, User Data. Annotation: BACKUPS, RDA: 40%.]
Sustainable Curation – Data and Metadata Format
Formats are a serious consideration because:
• Data access must be maintained for the long term
How?
• Insist that data and metadata are in standard formats
• Avoid computer OS-dependent formats
• Worry about application-driven formats, e.g. .xls, .xlsx, .doc, .docx, .ppt, .pptx, etc.
Challenge: scientists are reluctant to help
• Curator's nightmare: never-ending data and metadata format diversity
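As a small illustration of "worry about application-driven formats", a submission could be screened by file extension before it enters the archive. The extension set below is exactly the list named on the slide. The function name and the example filenames are hypothetical, and an extension check is only a first-pass heuristic, not format validation.

```python
from pathlib import Path

# Extensions named on the slide as application-driven formats.
APPLICATION_FORMATS = {".xls", ".xlsx", ".doc", ".docx", ".ppt", ".pptx"}


def flag_application_formats(filenames):
    """Return the filenames whose extension is in APPLICATION_FORMATS (case-insensitive)."""
    return [
        name
        for name in filenames
        if Path(name).suffix.lower() in APPLICATION_FORMATS
    ]


# Hypothetical submission: two of the four files would be flagged for conversion.
submitted = ["obs.nc", "notes.DOCX", "table.xls", "readme.txt"]
flagged = flag_application_formats(submitted)
```

Flagged files would then be a prompt to ask the data provider for a standard-format equivalent rather than a reason to reject the submission outright.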
Sustainable Curation – Partnerships
Science productivity is enhanced by partnerships
• Open sharing of data and metadata
  • Relies heavily on standards
• No one archive or repository can do it all
  • BUT, users need/want it all
• Cost saving by sharing
Data Management Evolution – Person-centric
1960s to 1990s
Data Management Evolution – Metadata-centric
1990s to 2010s
Summary: For Research Library Repositories
Sustainable Data Curation
• Stable Funding
• Knowledgeable Staff
• Robust Digital Storage
• Protection from Loss
• Data/Metadata Format
• Partnerships
Research Data Archive @ NCAR
http://dss.ucar.edu/
worley@ucar.edu