Janez Štebe

DDI Experience in ADP (2002)

Arhiv družboslovnih podatkov (ADP), University of Ljubljana

E-mail: arhiv.podatkov@uni-lj.si

URL: http://www.adp.fdv.uni-lj.si

MOST (UNESCO) and GESIS workshop, Berlin, 22-24 February 2002

Topics of the presentation

A brief history of technical standards and their influence on the organisation of data archives

The adoption of DDI in 1999

Advantages and disadvantages of using an existing but still emerging standard

What are XML and DDI?

A quick look inside the DDI DTD document structure

The DDI XML codebook production line at ADP

Discussion

A brief history of data archives' technical standards (Tanenbaum and Taylor, 1990)

Late 1950s – IBM cards

Easily reproduced, recycled – the advent of data archives (DA)

1960s – electronic computers – end of storage standards

The task of data conversion and interchange – DA matured

Beginning of the WWW era in the early 1990s (DDI Committee, 2001)

CSSDA electronic codebook specification

OSIRIS Codebook Dictionary (SRC, ICPSR)

Standard study description

But lack of coordination resulted in incompatible catalogues

“Midwife function” (Scheuch, 1990)

The role of ZA in the late 1960s, when five new archives were established in Europe:

“offers to share experiences, especially of past errors”

“technical information on data storage and retrieval”

The situation in 1997, when ADP was established

“Multiplicity of classificatory languages, search techniques and standards for documenting data” (DDI Committee, 2001)

Every organisation adopted its own dialect of the existing standards

The CESSDA IDC was the lone example of a still-living integration effort

But... DDI was under discussion

March 1999 – the DDI Beta version became operational

ADP applied for a grant that secured six months of intensive learning and practice in producing its own XML codebooks

Results:

1. Successful implementation of the first ten XML codebooks

2. An enhanced production line for routine codebook production.

2000 - 2001

Preparation of our own XSL for XML Codebook presentation on the internet

March 2000 – DDI DTD Version 1.0 was published

Machine conversion of DDI DTD Beta XML Codebooks into version 1.0

Continuing production of XML Codebooks

NESSTAR

Meanwhile, the NESSTAR tool was being refined in parallel, promising to add functionality to the growing collection of XML codebooks

End of 2001 – configuration of the ADP NESSTAR server catalogue

Advantages and disadvantages of using an existing but still emerging standard

There is no need to (re)invent local cataloguing rules

Cooperation in document production (sharing documents between sites)

The danger of being left alone if others do not adopt the same standard

Less scope for adding specific emphasis according to local needs

+/ -

Use of existing and emerging software tools suitable for the standard environment

Virtual catalogue

Conversion tools from SPSS and CAI software files

Dependence on others' timetables for tool production

E.g. NESSTAR was late in fully adopting the UTF-8 convention, which was crucial for us

What is XML?

“XML is to a document’s intellectual content what HTML is to the physical structure of that document” (Thomas and Block, 2001)

Why XML?

XML can be written without professional or expert knowledge (user-friendly)

It is suited to producing presentations in multiple formats, e.g. printed book, internet, etc.

It can be filled in by different authors, each with specialist knowledge of their subject area. All follow the same content structure.

DDI DTD <> XML?

DTD = XML Document Type Definition

DDI DTD = the Data Documentation Initiative's definition of an XML codebook

A codebook XML document must be “well-formed” and “valid”

Well-formed

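Any XML document (e.g. XHTML) must be well-formed, i.e. in accordance with the XML syntax

Main features:

<tags> must be closed</tags>

Names are case-sensitive (“UPPER–lower”)

Only one <tag-name ID="id-entry"> per document: each ID value may occur only once (strictly, this is a validity constraint checked against the DTD)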
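
The first two rules can be tried out with any non-validating XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the helper name `is_well_formed` and the sample strings are illustrative and not part of the ADP tooling.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML, i.e. it is well-formed."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Every opened tag is closed: well-formed.
closed = "<codeBook><docDscr></docDscr></codeBook>"
# <docDscr> is never closed: not well-formed.
unclosed = "<codeBook><docDscr></codeBook>"
# Names are case-sensitive, so </codebook> does not close <codeBook>.
wrong_case = "<codeBook></codebook>"

print(is_well_formed(closed))      # True
print(is_well_formed(unclosed))    # False
print(is_well_formed(wrong_case))  # False
```

A parser of this kind does not read the DTD, so it says nothing about validity.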

Valid = Well-formed +

Conforms to a specific DTD

Example: the path in the DOCTYPE declaration calls ...

<!DOCTYPE codeBook SYSTEM "CONFIG10/CODEBOOK-EN.DTD">

<codeBook>

<docDscr> ...

... a file "CONFIG10/CODEBOOK-EN.DTD"

(Content of the file):

...

<!ELEMENT codeBook (docDscr* , stdyDscr+ , fileDscr* , dataDscr* , otherMat*) >

<!ATTLIST codeBook %a.global; >

...
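
To see what the `<!ELEMENT codeBook ...>` declaration constrains, the content model can be read as a pattern over the names of the direct children: `*` means zero or more, `+` means one or more, and the order is fixed. The sketch below imitates that single rule with a regular expression. It is a simplified stand-in for a validating parser, not a DTD validator, and the function name and sample documents are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Content model taken from the DTD fragment:
# (docDscr*, stdyDscr+, fileDscr*, dataDscr*, otherMat*)
CODEBOOK_MODEL = re.compile(
    r"(docDscr )*(stdyDscr )+(fileDscr )*(dataDscr )*(otherMat )*"
)


def children_match_model(xml_text: str) -> bool:
    """Check only the order and count of codeBook's direct children."""
    root = ET.fromstring(xml_text)
    if root.tag != "codeBook":
        return False
    # Turn the child elements into a string such as "docDscr stdyDscr ".
    sequence = "".join(child.tag + " " for child in root)
    return CODEBOOK_MODEL.fullmatch(sequence) is not None


ok = "<codeBook><docDscr/><stdyDscr/><dataDscr/></codeBook>"
# stdyDscr+ requires at least one study description.
no_study = "<codeBook><docDscr/><dataDscr/></codeBook>"
# docDscr may only appear before stdyDscr.
wrong_order = "<codeBook><stdyDscr/><docDscr/></codeBook>"

print(children_match_model(ok))           # True
print(children_match_model(no_study))     # False
print(children_match_model(wrong_order))  # False
```

All three sample documents are well-formed; only the first would also be valid with respect to this one rule.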

What does it all mean?

You do not have to read the “machine-readable” “codebook.DTD” file to fill in an XML codebook:

An XML editor helps to check well-formedness and document validity

It helps you choose appropriate elements in accordance with the DTD while editing

A “human-readable” Tag Library consists of element definitions with practical examples. It gives guidance on the type and form of information

Let’s look inside the DDI DTD document structure...

Integrates different levels of information in the same document

docDscr (XML document and sources description)

stdyDscr (overall study + study-level references)

fileDscr (physical data files)

dataDscr (variables)

otherMat (additional material for variables documentation)
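
As a rough illustration of how the five sections sit side by side under one root element, the sketch below builds an empty codebook skeleton with Python's standard library. The section names follow the DTD content model quoted earlier (which spells the last one `otherMat`); the function name `empty_codebook` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Top-level sections, in the order given by the codeBook content model.
SECTIONS = ["docDscr", "stdyDscr", "fileDscr", "dataDscr", "otherMat"]


def empty_codebook() -> ET.Element:
    """Build a codeBook element containing one empty element per section."""
    root = ET.Element("codeBook")
    for name in SECTIONS:
        ET.SubElement(root, name)
    return root


skeleton = empty_codebook()
print(ET.tostring(skeleton, encoding="unicode"))
```

Each section would then be filled from a different source, as the production-line slides describe.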

It specifies both...

The content of the catalogue: suitable as input to a virtual catalogue of different sites, produced on various platforms.

The content of the codebook (variable descriptions): suitable as input to a “virtual library of all individual measurements in the studies in a collection”
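
A small sketch may make the two uses concrete: the same set of codebooks can be read once at study level (a catalogue) and once at variable level (an index of individual measurements). The two codebooks below are invented, and the element paths follow my reading of the DDI tag library, so treat them as assumptions to check against the DTD.

```python
import xml.etree.ElementTree as ET

# Two tiny invented codebooks; titles, names and labels are hypothetical.
CODEBOOK_A = """<codeBook>
  <stdyDscr><citation><titlStmt><titl>Study A</titl></titlStmt></citation></stdyDscr>
  <dataDscr>
    <var name="V1"><labl>Trust in parliament</labl></var>
    <var name="V2"><labl>Age</labl></var>
  </dataDscr>
</codeBook>"""

CODEBOOK_B = """<codeBook>
  <stdyDscr><citation><titlStmt><titl>Study B</titl></titlStmt></citation></stdyDscr>
  <dataDscr>
    <var name="Q7"><labl>Trust in parliament</labl></var>
  </dataDscr>
</codeBook>"""


def study_title(root):
    """Return the study title stored in the study description."""
    return root.findtext("stdyDscr/citation/titlStmt/titl")


def study_catalogue(roots):
    """Study-level view: one entry per codebook."""
    return [study_title(root) for root in roots]


def variable_index(roots):
    """Variable-level view: one entry per variable across all codebooks."""
    return [
        (study_title(root), var.get("name"), var.findtext("labl"))
        for root in roots
        for var in root.iterfind("dataDscr/var")
    ]


roots = [ET.fromstring(text) for text in (CODEBOOK_A, CODEBOOK_B)]
print(study_catalogue(roots))
for entry in variable_index(roots):
    print(entry)
```

The variable-level view is what lets a user find the same question asked in different studies.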

The dilemma of the Library vs. Data service concept (Scheuch, 1990)

Library concept: the unit of storage is the “study”

Data service concept: the unit of storage is the variable

In a DDI DTD XML codebook you can integrate meta-information about...

Intellectual content of a study

Its scope

Methodological details

Retrieval and dissemination policies

File location and format

(+) References to accompanying documents, e.g.

Reports on methodology,

Publications,

Classification lists,

Questionnaires and similar,

Computer syntax files,

Tables of results, etc.

(+) Hyperlink cross-references inside and outside the document

The use of ID and IDREFS attributes

The use of URI attributes
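
Internal cross-references work by giving an element an `ID` and naming that ID elsewhere in an attribute that holds one or more references. The sketch below lists references that point at no existing ID. It assumes, for illustration only, that `varGrp` lists its member variables in a `var` attribute; the document and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented fragment: group G2 refers to V9, which is not defined anywhere.
DOC = """<codeBook>
  <dataDscr>
    <varGrp ID="G1" var="V1 V2"/>
    <varGrp ID="G2" var="V2 V9"/>
    <var ID="V1" name="Q1"/>
    <var ID="V2" name="Q2"/>
  </dataDscr>
</codeBook>"""


def dangling_references(xml_text, ref_attr="var"):
    """Return (referring element ID, missing target) pairs."""
    root = ET.fromstring(xml_text)
    # Collect every ID declared in the document.
    known_ids = {el.get("ID") for el in root.iter() if el.get("ID")}
    missing = []
    for el in root.iter():
        # The reference attribute holds a space-separated list of IDs.
        for target in el.get(ref_attr, "").split():
            if target not in known_ids:
                missing.append((el.get("ID"), target))
    return missing


print(dangling_references(DOC))  # [('G2', 'V9')]
```

A validating parser performs this check itself; the sketch only shows what the check amounts to.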

To sum up:

XML is similar to HTML in that it is:

Easy to use,

Broadly accessible,

Hyper-textual

In addition it has:

A document content structure that both computers and humans can read and understand

DDI XML Codebooks production line in ADP

First step:

1. Basic information about the new data set file, the depositor, and accompanying material is first entered in the ADP Inventory book (Access database)

2. After choosing the best-suited predefined XML DDI Codebook template, we extract the information from the Access database into the draft XML Codebook

3. The resulting codebook is moved to an Internet catalogue for quick information about the new study; viewing is supported by the referenced XSL in IE5 or later.
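
The second item, filling a predefined template from an inventory record, could look roughly like the sketch below. The record is a plain dictionary standing in for a row of the inventory database; its field names, the template, and the mapping are all hypothetical, and the element paths reflect my reading of the DDI tag library rather than ADP's actual templates.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal template with empty elements to be filled in.
TEMPLATE = """<codeBook>
  <stdyDscr>
    <citation>
      <titlStmt><titl/><IDNo/></titlStmt>
      <distStmt><depositr/></distStmt>
    </citation>
  </stdyDscr>
</codeBook>"""

# Hypothetical inventory record; in practice it would come from the database.
record = {
    "study_id": "EXAMPLE-0001",
    "title": "Example survey 2001",
    "depositor": "Example Institute",
}

# Which record field goes into which template element.
FIELD_PATHS = {
    "title": "stdyDscr/citation/titlStmt/titl",
    "study_id": "stdyDscr/citation/titlStmt/IDNo",
    "depositor": "stdyDscr/citation/distStmt/depositr",
}


def draft_codebook(template, rec):
    """Copy the record's fields into the matching template elements."""
    root = ET.fromstring(template)
    for field, path in FIELD_PATHS.items():
        root.find(path).text = rec[field]
    return root


draft = draft_codebook(TEMPLATE, record)
print(ET.tostring(draft, encoding="unicode"))
```

Keeping the field-to-element mapping in one table makes it easy to support several templates with the same extraction code.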

Second step: Full Study description

1. The depositor is asked to fill in an MS Word form containing elements that correspond to the DDI DTD study description section

2. The draft XML Codebook from the previous step is edited with the XMetaL® XML editor. Missing pieces of information are added manually

Third step: Codebook data description generated from the SPSS data file

1. The final SPSS data file, if fully labelled, is converted with the NSD XML Generator® into the XML data description section of the DDI Codebook and integrated with the previous study description
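
The sketch below shows the general idea of such a conversion: variable names, variable labels, and value labels become `var`, `labl`, and `catgry` elements inside `dataDscr`, which is then appended to an existing codebook. It is not the NSD XML Generator and does not read SPSS files; the variable dictionary is invented, and the element names follow my reading of the DDI tag library.

```python
import xml.etree.ElementTree as ET

# Hypothetical variable dictionary, standing in for the labels in an SPSS file.
VARIABLES = [
    {
        "name": "Q1",
        "label": "Interest in politics",
        "values": {1: "Very interested", 2: "Not interested"},
    },
    {"name": "AGE", "label": "Age in years", "values": {}},
]


def build_data_description(variables):
    """Build a dataDscr element with one var element per variable."""
    data_dscr = ET.Element("dataDscr")
    for number, variable in enumerate(variables, start=1):
        var = ET.SubElement(data_dscr, "var", ID=f"V{number}", name=variable["name"])
        ET.SubElement(var, "labl").text = variable["label"]
        # One category per labelled value code.
        for code, text in variable["values"].items():
            category = ET.SubElement(var, "catgry")
            ET.SubElement(category, "catValu").text = str(code)
            ET.SubElement(category, "labl").text = text
    return data_dscr


# Integrate with a (here almost empty) study description from the earlier steps.
codebook = ET.fromstring("<codeBook><stdyDscr/></codeBook>")
codebook.append(build_data_description(VARIABLES))
print(ET.tostring(codebook, encoding="unicode"))
```

An unlabelled data file would yield bare `var` elements, which is why the slide stresses that the SPSS file should be fully labelled.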

Fourth step: Codebook data description with full question text

1. For the most important data sets, the full question text is entered into the dataDscr section from the original questionnaire text file

or

2. by using a conversion tool from CAI computer-readable files to DDI XML files.

Finally NESSTAR ®

The final two documents, the Slovene and English language DDI XML Codebooks, are converted into a NESSTAR-compliant format and, together with the data file, published in a NESSTAR catalogue.

[Figure: the ADP codebook production flow. Labels recoverable from the diagram: original paper documents; free-text documents; SPSS data + labels; CAI questionnaires; stdyDscr form filled in by the depositor; conversion tools; Codebook.dtd; Tag Library; Codebook.xml (XML editor) with sections docDscr, stdyDscr, fileDscr, dataDscr, otherMat...; Codebook.xsl; IE view; printed codebook; NESSTAR Catalogue + Data Explorer. Readability annotations: computer readable; human + computer readable; human readable.]

Common issues in DDI XML codebooks production

1. XML editors do not necessarily support Unicode

2. The use of entities in XML documents helps to standardise document production and makes it faster and easier to translate into English
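
Both points can be illustrated together. In the sketch below a recurring phrase is declared once as an entity in the internal DTD subset and referenced wherever it is needed, so switching language means changing one declaration. The document is encoded as UTF-8 so that Slovene characters survive. The entity name `archive`, the template, and the element paths are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# The entity is declared once and referenced in two places.
# Note: the entity value must not contain '"', '%' or '&'.
TEMPLATE = """<!DOCTYPE codeBook [
  <!ENTITY archive "{archive}">
]>
<codeBook>
  <docDscr><citation><prodStmt><producer>&archive;</producer></prodStmt></citation></docDscr>
  <stdyDscr><citation><distStmt><distrbtr>&archive;</distrbtr></distStmt></citation></stdyDscr>
</codeBook>"""


def expanded_names(archive: str):
    """Parse the document and return the text produced by each entity reference."""
    # Encode explicitly as UTF-8 so non-ASCII characters are handled correctly.
    xml_bytes = TEMPLATE.format(archive=archive).encode("utf-8")
    root = ET.fromstring(xml_bytes)
    return (
        root.findtext("docDscr/citation/prodStmt/producer"),
        root.findtext("stdyDscr/citation/distStmt/distrbtr"),
    )


print(expanded_names("Arhiv družboslovnih podatkov"))
print(expanded_names("Social Science Data Archives"))
```

The parser expands `&archive;` at every reference, so the Slovene and English codebooks can share all other markup.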

Conclusions:

The DDI DTD is receiving growing attention in the community, which guarantees the production of new tools that enhance its use

Despite continuing developments and overlapping archival standards, DDI 1.0 as today’s technology promises the longevity of XML Codebook 1.0 documents

The Slovene ADP has used its experience with DDI to guide its organisation.

Main references

DDI Committee (2001): The Data Documentation Initiative (DDI) Version 1.1: The New Specification for Social Science Metadata. Project Description.

Data Documentation Initiative. A Project of a Social Science Community. (2002) http://www.icpsr.umich.edu/DDI

Scheuch, Erwin K. (1990): From a data archive to an infrastructure for the social sciences. International Social Science Journal, No. 123, pp. 93-111.

Tanenbaum, Eric and Marcia Taylor (1990): Developing social science archives. International Social Science Journal, No. 124?.

Thomas, Wendy L. and William C. Block (2001): An Introduction to the Data Documentation Initiative (DDI). ICPSR OR Meeting 2001. http://www.icpsr.umich.edu/DDI/PAPERS/