Libbie Stephenson, Data Archivist (Retired)
UCLA Social Science Data Archive
libbie@g.ucla.edu
https://dataverse.harvard.edu/dataverse/ssda_ucla
Data Curation for Quantitative Social Science Research: A Case Study
NISO Virtual Conference: Data Curation – Cultivating Past Research Data for Future Consumption
August 31, 2016
I am retired from UCLA so my comments reflect my own experience and expertise. They do not necessarily reflect the ideas, opinions or practices of anyone at UCLA.
These materials are free for you to use, but please cite accordingly.
NISO - AUGUST 31, 2016
OVERVIEW
About the Archive
About the data we manage
What we are trying to do
What we actually do
Some illustrations
ABOUT THE ARCHIVE
Operating since 1964, before email, PCs, the Internet, laptops, and smartphones; manages survey/quantitative data stored on media ranging from punch cards to the cloud
Staff have library science degrees; statistical and technical expertise; quantitative social science background
Serve all UCLA quantitative researchers: Provide reference, cataloging/metadata, long term archiving; support in data rescue, management, security.
Data Deposit Form signatures and completeness; commitment to share data; privacy and confidentiality
DATA QUALITY REVIEW
Use of statistical packages, emulator, Adobe Pro, Excel, Colectica, Text editor
Verify deposit package and checksums; run frequencies; compare data to documentation
Completeness of codebook, question text, sampling, weighting, recodes, methods
Disclosure analysis, check for personal identifiers and assess privacy/confidentiality of respondents
Documentation converted to PDF/A
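The checksum step in the review above can be sketched as follows. This is a minimal illustration, not the Archive's actual workflow: it assumes the depositor supplies SHA-256 digests, and the file names and contents in `demo` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_deposit(folder, expected):
    """Compare each file's digest with the depositor-supplied value.

    `expected` maps file name -> hex digest.
    Returns file name -> "ok", "mismatch", or "missing".
    """
    results = {}
    for name, want in expected.items():
        path = Path(folder) / name
        if not path.exists():
            results[name] = "missing"
        elif sha256_of(path) == want.lower():
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


def demo():
    # Hypothetical deposit: names and contents are made up for illustration.
    with tempfile.TemporaryDirectory() as folder:
        content = b"5002 01 01\n"
        (Path(folder) / "survey.dat").write_bytes(content)
        expected = {
            "survey.dat": hashlib.sha256(content).hexdigest(),
            "codebook.pdf": "0" * 64,  # listed by the depositor, never delivered
        }
        return verify_deposit(folder, expected)
```

Calling `demo()` reports the data file as verified and flags the codebook as missing, which is the kind of gap the deposit check is meant to catch before any further review.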
EXAMPLE: WHAT KIND OF DATA?
CODEBOOK DOCUMENTS THE COLUMNS
5002 01 01 302000 001 101 10004B121068965
Each item is called a variable. We refer to the numeric content of each item as a value.
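To illustrate how a codebook's column locations turn a raw record into variables and values, here is a short sketch. The record is the one shown above; the variable names (`V1` to `V4`) and column positions are hypothetical, since the actual codebook layout is not reproduced here.

```python
RECORD = "5002 01 01 302000 001 101 10004B121068965"

# Hypothetical layout: (variable name, start column, end column).
# Columns are 1-based and inclusive, the way codebooks usually list them.
LAYOUT = [
    ("V1", 1, 4),
    ("V2", 6, 7),
    ("V3", 9, 10),
    ("V4", 12, 17),
]


def parse_record(record, layout):
    """Slice one fixed-width record into a dict of variable -> value."""
    values = {}
    for name, start, end in layout:
        if end > len(record):
            raise ValueError(f"{name} ends at column {end}, past the record")
        # Convert 1-based inclusive columns to a Python slice.
        values[name] = record[start - 1:end]
    return values


parsed = parse_record(RECORD, LAYOUT)
```

Values are kept as strings so that leading zeros such as `01` survive; converting to numbers is a separate decision that depends on what the codebook says about each variable.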
COMPARE FREQS TO CODEBOOK
Figure labels: VARIABLE; VALUES; VALUE LABELS
RUN MARGINALS/FREQUENCIES
Sex of Respondent
                Frequency  Percent  Valid Percent  Cumulative Percent
Valid   MALE          856     45.1           45.1                45.1
        FEMALE       1041     54.9           54.9               100.0
        Total        1897    100.0          100.0

What is your race - ethnicity
                                                      Frequency  Percent  Valid Percent  Cumulative Percent
Valid   White                                               618     32.6           32.6                32.6
        Hispanic                                            475     25.0           25.0                57.6
        Black                                               474     25.0           25.0                82.6
        Asian or Pacific Islander                           282     14.9           14.9                97.5
        Native American or Alaskan native                    17       .9             .9                98.4
        Identifies more than one of the above groups         20      1.1            1.1                99.4
        DON'T KNOW                                            2       .1             .1                99.5
        REFUSED                                               9       .5             .5               100.0
        Total                                              1897    100.0          100.0
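A frequency run like the one above, together with the comparison against the codebook, can be sketched in a few lines. This is only an illustration: the counts for MALE and FEMALE come from the table, but the numeric codes 1 and 2 are an assumption, since the slide shows only the value labels.

```python
from collections import Counter


def frequency_table(values, labels):
    """Return one row per code with frequency, percent, and cumulative percent."""
    counts = Counter(values)
    total = sum(counts.values())
    rows = []
    cumulative = 0
    for code in sorted(counts):
        n = counts[code]
        cumulative += n
        rows.append({
            "code": code,
            # Codes absent from the codebook are flagged rather than hidden.
            "label": labels.get(code, "** NOT IN CODEBOOK **"),
            "frequency": n,
            "percent": round(100 * n / total, 1),
            "cumulative_percent": round(100 * cumulative / total, 1),
        })
    return rows


def undocumented_codes(values, labels):
    """List codes that occur in the data but have no label in the codebook."""
    return sorted(set(values) - set(labels))


# Assumed coding (1 = MALE, 2 = FEMALE); counts taken from the table above.
SEX_LABELS = {1: "MALE", 2: "FEMALE"}
sex_values = [1] * 856 + [2] * 1041
sex_rows = frequency_table(sex_values, SEX_LABELS)
```

A non-empty result from `undocumented_codes`, or a frequency that differs from the marginals printed in the codebook, is a signal to go back to the depositor before the study is released.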
Use the Data tab to import files from SPSS or STATA formats.
Figure labels: Label; Question text; Numeric values
Variable Details include variable name, label, description or question text, and types of coding.
EXAMPLE DDI FROM COLECTICA
DDI fields are in red; used to create documentation; can be repurposed
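As a rough illustration of how variable-level DDI metadata can be repurposed, the sketch below builds one variable description as XML and then regenerates a codebook entry from it. The element names (`var`, `labl`, `qstn`, `qstnLit`, `catgry`, `catValu`) follow DDI Codebook 2.x conventions as an assumption; Colectica's own export may use a different DDI flavor, and the variable name, question wording, and codes here are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_variable(name, label, question, categories):
    """Build a simplified DDI-Codebook-style <var> element."""
    var = ET.Element("var", {"name": name})
    ET.SubElement(var, "labl").text = label
    qstn = ET.SubElement(var, "qstn")
    ET.SubElement(qstn, "qstnLit").text = question
    for value, value_label in categories:
        catgry = ET.SubElement(var, "catgry")
        ET.SubElement(catgry, "catValu").text = str(value)
        ET.SubElement(catgry, "labl").text = value_label
    return var


def codebook_entry(var):
    """Turn a <var> element back into plain-text documentation."""
    lines = [f"{var.get('name')}: {var.findtext('labl')}"]
    lines.append(f"  Question: {var.findtext('qstn/qstnLit')}")
    for catgry in var.findall("catgry"):
        lines.append(f"  {catgry.findtext('catValu')} = {catgry.findtext('labl')}")
    return "\n".join(lines)


# Hypothetical variable name, question text, and codes.
sex_var = build_variable(
    "SEX",
    "Sex of Respondent",
    "Record the respondent's sex.",
    [(1, "MALE"), (2, "FEMALE")],
)
xml_text = ET.tostring(sex_var, encoding="unicode")
entry = codebook_entry(ET.fromstring(xml_text))
```

Because the description lives in structured fields rather than in a formatted document, the same XML can feed a PDF codebook, a catalog record, or a statistical package setup file.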
PRESERVATION AND CURATION
Continuous monitoring of file formats; migrate to new formats when there is a new operating system, a new version of statistical software, a new mode of file transfer, or a code change
Monitoring of database function; software updates or redesigns
Monitoring of servers, external media health; replace as needed
Data forensics; checksums; validation; authentication; version control; format migration; refresh media; record preservation metadata -- DDI
Review disaster plan and collection policy at regular intervals
Review new or revised regulations for intellectual property; security; data producers/distributors; funding agencies
Review with original depositor, their data management plans, changes in access or user permissions
Focus is on functional-level preservation and long term usability through use of DDI and continuous review.
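The recurring checksum review listed above amounts to comparing a stored manifest with freshly computed digests. The sketch below shows one way to classify the differences; the file names and digest strings are short hypothetical stand-ins, and in practice the digests would come from a hashing tool run against the storage media.

```python
def audit_fixity(recorded, current):
    """Compare a stored checksum manifest with freshly computed digests.

    Both arguments map file name -> digest string.
    """
    report = {"unchanged": [], "changed": [], "missing": [], "unexpected": []}
    for name in sorted(recorded):
        if name not in current:
            report["missing"].append(name)      # file has disappeared
        elif current[name] == recorded[name]:
            report["unchanged"].append(name)    # bits are intact
        else:
            report["changed"].append(name)      # possible corruption or edit
    # Files present now that were never recorded in the manifest.
    report["unexpected"] = sorted(set(current) - set(recorded))
    return report


# Hypothetical manifests; digests are shortened stand-ins.
recorded = {"study01.dat": "aa11", "study01.pdf": "bb22", "study02.dat": "cc33"}
current = {"study01.dat": "aa11", "study01.pdf": "ff99", "study03.dat": "dd44"}
report = audit_fixity(recorded, current)
```

A changed digest is not always damage: a deliberate format migration also changes the bits, which is why each migration should be recorded as preservation metadata alongside the new checksum.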
UNCOMFORTABLE TRUTHS
Data management in institutions requires high level administrative participation; new, sustained funding; and differently trained staff
Data management planning is not a static event but a continuous process to ensure long-term, independently understandable, informed reuse of research
There is an urgent need for standards, tools, and best practice models for many different file formats and disciplines
NEXT STEPS FOR PRACTITIONERS
“Crucial metadata about data are not always being captured or created and linked to data in repositories. Storage and persistence of data submissions isn't enough. We need data archivists and librarians to commit to partnering with researchers to curate data -- to review incoming data for usability, confidentiality, and completeness of descriptive information.”
Ann Green (2016), email communication. Used with permission.
ANY QUESTIONS?
THANK YOU!
Social Science Data Archive, UCLA
Box 951484 Los Angeles, CA 90095-1484 310-825-0716
LINKS
Social Science Data Archive dataverse.harvard.edu/dataverse/ssda_ucla
Data Seal of Approval www.datasealofapproval.org/en/
National Digital Stewardship Alliance ndsa.org/activities/levels-of-digital-preservation/
Open Archival Information System www.oclc.org/research/publications/library/2000/lavoie-oais.html
Social Science Data Archive Policy data-archive.library.ucla.edu/SSDA_collectionAndArchivingPolicy.pdf?_ga=1.3255478.786669706.1378228281
Data Curation Profile datacurationprofiles.org/
Data Management Planning at ICPSR www.icpsr.umich.edu/icpsrweb/content/datamanagement/dmp/index.html
ICPSR Guide to Data Preparation www.icpsr.umich.edu/icpsrweb/content/deposit/guide/