DDI – Metadata for social science data. Wolfgang Zenk-Möltgen, GESIS – Leibniz Institute for the Social Sciences, Wolfgang.Zenk-Moeltgen@gesis.org. DataCite Summer Meeting 2010 – Making datasets visible and accessible, Hannover, 7-8 June 2010
Mar 27, 2015

Page 1:

DDI – Metadata for social science data

Wolfgang Zenk-Möltgen
GESIS – Leibniz Institute for the Social Sciences

Wolfgang.Zenk-Moeltgen@gesis.org

DataCite Summer Meeting 2010 – Making datasets visible and accessible
Hannover, 7-8 June 2010

Page 2:

Topics

• About DDI
• Basic DDI Concepts
• Identification and Citation
• Application Examples

Acknowledgement:

DDI Alliance TIC members, namely Wendy Thomas, Arofan Gregory, Joachim Wackerow

Page 3:

About DDI

DDI – Data Documentation Initiative

“The Data Documentation Initiative (DDI) is an effort to create an international standard for describing social science data. Expressed in XML, the DDI metadata specification now supports the entire life cycle of social science datasets. DDI metadata accompanies and enables data conceptualization, collection, processing, distribution, discovery, analysis, repurposing, and archiving.” (Stefan Kramer)

http://www.ddialliance.org/

Page 4:

History of DDI

• Concept of DDI and definition of needs grew out of the data archival community

• Established in 1995 as a grant funded project, initiated and organized by ICPSR

• February 2003 – Formation of DDI Alliance
  – Membership based alliance
  – Formalized development procedures

Page 5:

Members of DDI

• Initial members
  – Social science data archives
  – Statistical data producers
• Current membership expanded by
  – Research data centers
  – Data producers
  – Commercial organizations

University of Alberta, Canada
Australian Bureau of Statistics (ABS)
Australian Social Science Data Archive (ASSDA)
University of California, Berkeley -- Computer-Assisted Survey Methods Program and UCDATA
University of California, California Digital Library
Centro De Investigaciones Sociologicas (CIS), Spain
CEPS/INSTEAD -- Luxembourg
Cornell University (CISER)
Danish Data Archive
Data Archiving and Networked Services (DANS), The Netherlands
Finnish Social Science Data Archive
German Socio-Economic Panel Study (SOEP)
GESIS - Leibniz Institute for the Social Sciences
University of Guelph
Institute for Quantitative Social Science (IQSS) at Harvard University
Institute for the Study of Labor (IZA)
Inter-university Consortium for Political and Social Research (ICPSR)
Massachusetts Institute of Technology (MIT)
University of Minnesota, Minnesota Population Center
National Opinion Research Center (NORC)
Norwegian Social Science Data Service (NSD)
Open Data Foundation
Princeton University
Research Data Centre of the German Federal Employment Agency, Institute for Employment Research (IAB)
Roper Center
Stanford University
Survey Research Operations, University of Michigan
Swedish National Data Service (SND)
Swiss Foundation for Research in Social Sciences (FORS)
United Kingdom Data Archive
University of Toronto
University of Wisconsin
U.S. Bureau of Labor Statistics (Associate Member)
World Bank, Development Data Group (DECDG)
Yale University

Page 6:

DDI is being used around the world

• Archives and Data Libraries

• Research Institutes and Data Service Centers

• International Organizations and National Statistical Agencies

Page 7:

DDI Versions

• 2000 – DDI 1.0
  – Documentation of simple surveys, microdata only
• 2003 – DDI 2.0 and 2.1
  – Extension to aggregate data
  – Support for geographic material
• 2008 – DDI 3.0
  – Lifecycle model: Shift from the codebook centric / variable centric model to capturing the lifecycle of data
  – Focus on metadata creation and re-use
  – “Machine-actionable” aspects of DDI to support programming
  – CAI instruments supported by expanded description of the questionnaire
  – Data series support (longitudinal surveys, panel studies, etc.)
  – Support comparison by design and comparison-after-the-fact
  – Improved support for describing complex data files
• 2009 – DDI 3.1
  – Correction of bugs
  – Introduction of final URN structure to ensure persistent URNs for all identified elements

Page 8:

Basic DDI 3 Concepts

• Lifecycle Concept
• Re-usable documentation
  – Modules
  – Maintainables, versionables, identifiables
  – Scheme-based (maintainable lists)
• Relations to other standards
• Controlled Vocabularies

Page 9:

The Data Life Cycle

[Figure: data life cycle diagram with the stages Concept, Collection, Processing, Distribution, Discovery, Analysis, Archiving and Repurposing]

Page 10:

DDI 3 versus earlier versions

• Previous versions followed the “codebook” idea of producing documentation for a social science dataset

• DDI 3 with its lifecycle model allows for documentation at all stages, from study conception and data processing to analysis and repurposing of data

• DDI 3 uses XML Schemas instead of an XML Document Type Definition (DTD) to define metadata types more strictly, to make better reuse of content and to reach the goal of “machine actionability”

• A DDI 3 instance now includes the “simple instance” from previous DDI versions. Multiple data products can be included for a single study.

Page 11:

DDI 3.1 Modules

Contain groups of related documentation elements
Some are related to the Lifecycle model, some are technically grouped

• Archive module
• Comparative module
• Conceptual components module
• Data collection module
• Dataset module
• Dublin Core Elements module
• DDI profile module
• Grouping module
• Instance module
• Logical product module
• Physical data product module
  – (plus inline n-cube, normal n-cube, tabular n-cube module and proprietary module)
• Physical instance module
• Reusable module
• Study unit module

Page 12:

Usage of DDI 3 Modules

Study Unit
• Identification
• Coverage
  – Topical
  – Temporal
  – Spatial
• Conceptual Components
  – Universe
  – Concept
  – Representation (optional replication)
• Purpose, Abstract, Proposal, Funding

Data Collection
• Methodology
• Question Scheme
  – Question
  – Response domain
• Instrument
  – using Control Construct Scheme
• Coding Instructions
  – question to raw data
  – raw data to public file
• Interviewer Instructions

Logical Product
• Category Schemes
• Coding Schemes
• Variables
• NCubes
• Variable and NCube Groups
• Data Relationships

Physical Data Structure
• Links to Data Relationships
• Links to Variable or NCube Coordinate
• Description of physical storage structure
  – in-line, fixed, delimited or proprietary

Physical Instance
• One-to-one relationship with a data file
• Coverage constraints
• Variable and category statistics

Archive
• Organization or individual which has control over the metadata
• Lifecycle events
• Archive specific information

etc…

Page 13:

Maintainables, Versionables, Identifiables

Inheritance:

• Maintainables (may be maintained separately, need agency)
• Versionables (may be versioned in the form “1.0.0”)
• Identifiables (may be identified and be referenced, either by ID or URN)
• Other DDI elements
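The inheritance on this slide can be pictured as a small class hierarchy. The following is only a minimal sketch, assuming that a maintainable is a special kind of versionable and a versionable a special kind of identifiable; the Python class and field names are illustrative and are not taken from the DDI schema.

```python
from dataclasses import dataclass


@dataclass
class Identifiable:
    """Can be identified and referenced, either by ID or by URN."""
    id: str


@dataclass
class Versionable(Identifiable):
    """An identifiable that also carries a version of the form "1.0.0"."""
    version: str = "1.0.0"


@dataclass
class Maintainable(Versionable):
    """A versionable that is maintained separately and so needs an agency."""
    agency: str = ""


# Example values borrowed from the URN examples later in the deck.
scheme = Maintainable(id="V_GENDER_SCHEME", version="1.0.0", agency="us.icpsr")
variable = Versionable(id="Gender", version="1.0.0")
```

Here `scheme` is at once a maintainable, a versionable and an identifiable, while `variable` has no agency of its own and would take it from the maintainable that contains it.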

Page 14:

DDI Schemes

Schemes = Lists of elements of one type

Examples
• archive
  – OrganizationScheme
• datacollection
  – QuestionScheme
  – ControlConstructScheme
  – InterviewerInstructionScheme
• conceptualcomponent
  – ConceptScheme
  – UniverseScheme
  – GeographicStructureScheme
  – GeographicLocationScheme
• logicalproduct
  – CategoryScheme
  – CodeScheme
  – VariableScheme
  – NCubeScheme
• physicaldataproduct
  – PhysicalStructureScheme
  – RecordLayoutScheme

Page 15:

Relationship to Other Standards

• Dublin Core
  – Basic bibliographic citation information
  – Basic holdings and format information
• METS
  – Upper level descriptive information for managing digital objects
  – Provides specified structures for domain specific metadata
• OAIS
  – Reference model for the archival lifecycle
• PREMIS
  – Supports and documents the digital preservation process
• ISO 19115 – Geography (FGDC)
  – Metadata structure for describing geographic feature files such as shape, boundary, or map image files and their associated attributes
• ISO/IEC 11179
  – International standard for representing metadata in a Metadata Registry
  – Consists of a hierarchy of “concepts” with associated properties for each concept
• SDMX
  – Exchange of statistical information (time series/indicators)
  – Supports metadata capture as well as implementation of registries

Page 16:

Controlled Vocabularies

• Not part of standard
• Recommendations on:
  – LifeCycleEventType
  – CommonalityTypeCoded
  – TimeMethod
  – ResponseUnit
  – AggregationMethodsType
  – DataType
  – SoftwarePackage
  – CharacterSet
  – CategoryStatistic
  – SummaryStatistic
  – Date@Calendar
  – AnalysisUnit
  – Contributor@Role
  – Publisher@Role
• Example: TimeMethod may be
  – Longitudinal (Cohort or Trend)
  – Panel (Continuous or Interval)
  – TimeSeries (Continuous or Discrete)
  – CrossSectional
  – CrossSectionalAdHocFollowUp
  – Other
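A recommended vocabulary such as TimeMethod can be checked in software with an enumeration. This is a minimal sketch that uses only the top-level terms from the slide and leaves out the sub-types in parentheses (Cohort, Trend and so on); the function name and the choice to map unknown input to Other are assumptions made for illustration.

```python
from enum import Enum


class TimeMethod(Enum):
    # Top-level terms as listed in the TimeMethod example on the slide.
    LONGITUDINAL = "Longitudinal"
    PANEL = "Panel"
    TIME_SERIES = "TimeSeries"
    CROSS_SECTIONAL = "CrossSectional"
    CROSS_SECTIONAL_AD_HOC_FOLLOW_UP = "CrossSectionalAdHocFollowUp"
    OTHER = "Other"


def parse_time_method(text: str) -> TimeMethod:
    """Return the matching term; unknown input falls back to OTHER (assumed policy)."""
    try:
        return TimeMethod(text)
    except ValueError:
        return TimeMethod.OTHER
```

Since the vocabularies are recommendations and not part of the standard, a tool might prefer to warn about an unknown term instead of silently mapping it to Other.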

Page 17:

Identification in DDI 3

• Two possibilities to identify an element:
  – Specify the <ID> Tag
    • Agency and Version are inherited
  – Use the specially-structured URN
    • Agency and Version must be included
• The structured URN approach is preferred
• These IDs/URNs can be referenced
• Both ways need a resolver service that turns the names into locations to make effective re-use possible
• DDI Alliance is currently working on that, based on the DNS (Domain Name System) infrastructure approach
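The difference between the two options can be sketched in code: with a plain ID the agency and version come from the surrounding context, while the URN spells them out. The function below is a hypothetical helper, not part of any DDI tooling, and it assumes the URN layout used in the examples elsewhere in this deck (urn:ddi:agency:MaintainableType.id.version, optionally followed by :ElementType.id.version).

```python
from typing import Optional


def build_urn(agency: str, m_type: str, m_id: str, m_version: str,
              e_type: Optional[str] = None, e_id: Optional[str] = None,
              e_version: Optional[str] = None) -> str:
    """Compose a DDI-style URN from a maintainable and an optional contained element."""
    # Agency and version are written out explicitly instead of being inherited.
    urn = f"urn:ddi:{agency}:{m_type}.{m_id}.{m_version}"
    if e_type is not None:
        urn += f":{e_type}.{e_id}.{e_version}"
    return urn


scheme_urn = build_urn("us.icpsr", "VariableScheme", "V_GENDER_SCHEME", "1.0.0")
element_urn = build_urn("us.icpsr", "DataCollection", "DC_5698", "2.4.0",
                        "TimeMethod", "TM_1", "1.0.0")
```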

Page 18:

URN Identification Examples

URN of a maintainable object
Identifying a variable scheme in DDI 3 via a URN would be as follows:
urn="urn:ddi:us.icpsr:VariableScheme.V_GENDER_SCHEME.1.0.0"

URN of a versionable object
All versionable objects are contained within maintainable objects. Identifying a variable in DDI 3 via a URN would be as follows:
urn="urn:ddi:us.icpsr:VariableScheme.V_GENDER_SCHEME.1.0.0:Variable.Gender.1.0.0"

URN of an identifiable object
An identifiable object may be a direct child of a maintainable object or be contained by a versionable object within a maintainable object. The full path should be provided to facilitate locating the item when referenced.

<DataCollection isMaintainable="true" id="DC_5698" version="2.4.0">
  <Methodology isVersionable="true" id="Meth_Type_1" version="1.0.0">
    <TimeMethod isIdentifiable="true" id="TM_1">

Identifying the identifiable object in the above hierarchy in DDI 3 via a URN would be as follows:
urn="urn:ddi:us.icpsr:DataCollection.DC_5698.2.4.0:TimeMethod.TM_1.1.0.0"

(from the DDI Technical Specification Part I)
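Because the URN is structured, software can take it apart again. The parser below is a minimal sketch that assumes exactly the shape of the examples on this slide: colon-separated parts, a three-part version, and at most one contained element. The names `ObjectRef`, `DdiUrn` and `parse_urn` are illustrative only.

```python
from typing import NamedTuple, Optional


class ObjectRef(NamedTuple):
    obj_type: str
    obj_id: str
    version: str


class DdiUrn(NamedTuple):
    agency: str
    maintainable: ObjectRef
    element: Optional[ObjectRef]


def _parse_object(part: str) -> ObjectRef:
    # "Type.id.major.minor.revision": the type precedes the first dot,
    # the version is the last three dot-separated fields.
    obj_type, rest = part.split(".", 1)
    obj_id, major, minor, revision = rest.rsplit(".", 3)
    return ObjectRef(obj_type, obj_id, f"{major}.{minor}.{revision}")


def parse_urn(urn: str) -> DdiUrn:
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[:2] != ["urn", "ddi"]:
        raise ValueError(f"not a DDI URN of the expected shape: {urn!r}")
    element = _parse_object(parts[4]) if len(parts) == 5 else None
    return DdiUrn(parts[2], _parse_object(parts[3]), element)


parsed = parse_urn("urn:ddi:us.icpsr:DataCollection.DC_5698.2.4.0:TimeMethod.TM_1.1.0.0")
```

For the TimeMethod example, `parsed.agency` is the maintaining agency, `parsed.maintainable` describes the DataCollection and `parsed.element` the TimeMethod inside it.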

Page 19:

Citation in DDI

Page 20:

OtherMaterial

• Elements
  – Citation: holds full citation information for the external object
  – ExternalURLReference: location of the external object
  – ExternalURNReference: URN expression for the external object
  – MIMEType: the standard internet MIME type for applications
  – Relationship: reference to DDI object and description of relation to it
  – Segment: specifies part of external object (e.g. with audio/video files)
  – UserID: unique ID of other types, e.g. DOI

• Attributes
  – Action: used for local overrides in case of inheritance ("Add" | "Update" | "Delete")
  – id: DDI ID of the element
  – isIdentifiable: fixed value of "true"
  – objectSource: source name or location
  – type: required type code for type of the external object
  – urn: DDI URN of the element
  – xml:lang: optional identification of the language of the external object

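To make the element and attribute list concrete, the sketch below builds a simplified OtherMaterial fragment with Python's standard `xml.etree.ElementTree`. It is an illustration only: namespaces are omitted, element order is not checked against the schema, the `type` attribute on UserID is an assumption, and all values (the ID, the type code, the URL and the DOI) are made up.

```python
import xml.etree.ElementTree as ET


def other_material(om_id: str, om_type: str, url: str, doi: str) -> ET.Element:
    """Build a simplified, namespace-free OtherMaterial element (illustrative only)."""
    elem = ET.Element("OtherMaterial",
                      {"id": om_id, "isIdentifiable": "true", "type": om_type})
    # Location of the external object.
    ET.SubElement(elem, "ExternalURLReference").text = url
    ET.SubElement(elem, "MIMEType").text = "application/pdf"
    # An identifier of another kind, here a DOI (the attribute name is assumed).
    user_id = ET.SubElement(elem, "UserID", {"type": "DOI"})
    user_id.text = doi
    return elem


elem = other_material("OM_1", "publication",
                      "http://www.gesis.org/doc/docxyz", "10.1234/example")
xml_text = ET.tostring(elem, encoding="unicode")
```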
Page 21:

DOIs and DDI URNs

• Relationship still unclear
• DDI URN resolution service still needed
• Every identifiable element could be registered with a DOI, but that would result in a huge number of DOIs
• Alternatively, only the study level could be registered with a DOI, e.g. each StudyUnit
• In DDI all registered DOIs should be documented
• Vice versa, each DOI should contain the DDI URN in the metadata
• Various software applications will make use of them

Page 22:

Application Examples

• Enhanced Publications
  – Providing Information to connect Publications with the underlying datasets/variables used
  – Making retrieval of research with specific datasets/variables possible

• Version History of Datasets
  – Documenting errata and correction history
  – Making it easy to cite used data

Page 23:

Supporting Enhanced Publications

Publications with References to Data

DDI 3.1 URN contains: Agency, Object, Version

[Figure: resolving a DDI URN cited in a publication]
• Publication with References (URNs), e.g. urn:ddi:de.gesis:VariableScheme.ZA3811_VarSch.1.0.0:Variable.V8.1.0.0
• DDI Alliance: find agency gesis.de.ddi, return resolver address (http://resolve.gesis.org)
• Resolver: find object, return URL of Documentation and/or Data (http://www.gesis.org/doc/docxyz)
• Request document, return document
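The two-step lookup on this slide (agency to resolver, then object to URL) can be sketched with plain dictionaries standing in for the network services. Nothing here contacts a real service: the tables are hypothetical, and the rule that turns the URN agency de.gesis into the lookup name gesis.de.ddi (reverse the labels, append "ddi") is a guess based on the slide.

```python
EXAMPLE_URN = "urn:ddi:de.gesis:VariableScheme.ZA3811_VarSch.1.0.0:Variable.V8.1.0.0"

# Stand-in for the agency lookup run by the DDI Alliance (hypothetical data).
AGENCY_RESOLVERS = {"gesis.de.ddi": "http://resolve.gesis.org"}

# Stand-in for what each agency's resolver knows (hypothetical data).
OBJECT_URLS = {
    "http://resolve.gesis.org": {
        EXAMPLE_URN: "http://www.gesis.org/doc/docxyz",
    },
}


def lookup_name(agency: str) -> str:
    """Assumed mapping from the URN agency to the DNS-style lookup name."""
    return ".".join(reversed(agency.split("."))) + ".ddi"


def find_resolver(urn: str) -> str:
    """Step 1: find the agency and return its resolver address."""
    agency = urn.split(":")[2]
    return AGENCY_RESOLVERS[lookup_name(agency)]


def resolve(urn: str) -> str:
    """Step 2: ask the agency's resolver for the URL of the object."""
    resolver = find_resolver(urn)
    return OBJECT_URLS[resolver][urn]
```

A client would then request the document at the URL returned by `resolve(EXAMPLE_URN)`.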

Page 24:

Supporting Enhanced Publications

DSDM DDI 3 EPE Simple Export Wizard 1.2.0

Page 25:

Enhancing Publications - DatapluS

A University of Tilburg and Centerdata project, supported by GESIS and the European Values Study

Page 26:

Version History of Datasets

• The GESIS data catalogue holds study descriptions with links to data access
• GESIS is currently introducing a common versioning policy for datasets
• Versions start at 1.0.0, and the major, minor or revision number increases according to the change in the dataset
• A DOI will be created for each published version
• This gives transparency in the history of data processing
• Citation of used datasets will include the specific version to ease replication
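The versioning scheme can be illustrated with a small helper. This is a sketch only: the slide says which of the three numbers increases, but resetting the lower numbers to 0 after a major or minor change is an assumption, and `bump` is a made-up name, not part of any GESIS tool.

```python
def bump(version: str, change: str) -> str:
    """Return the next dataset version for a 'major', 'minor' or 'revision' change."""
    major, minor, revision = (int(part) for part in version.split("."))
    if change == "major":
        return f"{major + 1}.0.0"          # assumed: lower parts reset to 0
    if change == "minor":
        return f"{major}.{minor + 1}.0"    # assumed: revision resets to 0
    if change == "revision":
        return f"{major}.{minor}.{revision + 1}"
    raise ValueError(f"unknown kind of change: {change!r}")


first_correction = bump("1.0.0", "revision")
```

Each version string produced this way would get its own DOI and appear in the citation of the dataset.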

Page 27:

Data Catalogue

Page 28:

Thank you!