University of Central Florida
STARS
Faculty Scholarship and Creative Works
11-19-2014
Data documentation & metadata
Sai Deng, University of Central Florida, [email protected]
Part of the Cataloging and Metadata Commons, and the Scholarly Communication Commons
Find similar works at: https://stars.library.ucf.edu/ucfscholar
University of Central Florida Libraries http://library.ucf.edu
This Other Presentation is brought to you for free and open access by STARS. It has been accepted for inclusion in Faculty Scholarship and Creative Works by an authorized administrator of STARS. For more information, please contact [email protected].
Original Citation
Deng, S. (2014). Data documentation and metadata. University of Central Florida graduate students library research workshop: Publishing in the Academy.
Sai Deng, Metadata Librarian
University of Central Florida Libraries
Data Documentation & Metadata
UCF Libraries Research Workshop
Part I: The Survey and Some Data Basics
o The UCF Research Data Management Survey: Data Recording and Analysis Section Results (Q, D)
o Understanding Data, Research Data and Datasets
o Why data documentation? (Q)
Part II: Data Documentation ABC
o Data Documentation: Study-level (E)
o Data Documentation: Data-level (Structured tabular data; Qualitative data) (E)
Part III: Dataset Metadata
o Dataset record examples, their associated standards and data repositories (E, D)
o Data DOIs and Data Citation
o Controlled Vocabularies and Thesauri (Q)
o Curation Tools for Datasets
Part IV: Thoughts and Services
o A Researcher's View vs. a Curator's or Librarian's Perspective on Data Documentation (D)
o Dataset and Metadata Services at UCF
Q: with question; E: with examples; D: with discussion
Key terms (word cloud generated using Tagxedo): Data; Research data; Dataset; Data documentation; Data types; Data formats; Project level; File level; Variable level; Label; Code; Derived data; Data list; SPSS; SAS; R; Access; Spreadsheet; Curation tool; Metadata; Metadata standards; Metadata schemas; Controlled vocabularies; Thesauri; Funding agencies; Research data management; DataCite; DOI; Data citation; Data repository; Dataset Metadata Service
o The UCF Research Data Management (RDM) Survey
o The UCF Research Data Management Survey, November 2013
o Results delivered on Research Computing Day at the Institute for Simulation and Training by Dr. Penny Beile on February 11, 2014
ohttpwwwistucfeduhpcrcdBeile_datahandoutpdf
o Data Recording and Analysis Section: Questions and Results
o 17. Provide any technical details about the tools that you use, or would like to be able to easily use, for your work or research. These can be the name or vendor of the software product, technical requirements of the software, special accelerators like graphical processor units (GPU), etc.
o Provide any technical details about the tools that you use, or would like to be able to easily use, for your work or research.
o If applicable, how are you recording lab data? Please check all that apply.
  o Lab notebooks in paper
  o Excel (or other) files on computers in the lab
  o Electronic lab notebook (ELN) tool. Please specify which one.
o Do you document or record any metadata for your data or dataset?
  o Yes
  o No
o If you record metadata for your dataset, do you use any local, agency-specific, or national standards or guidelines?
  o Yes
  o No
  o Not sure
Processing, analysis and writing: software and databases
AMOS; Ansys/Fluent (2); ArcGIS/GIS (2); AspenTech; CST Microwave Studio; Database with graphical viewing capabilities, basic statistics, filtering, custom output of datasets; DTreg; EndNote; FACTSAGE; G*Power; Gephi; Git/GitHub (2); Interactive Data Language; LimeSurvey; Lumerical FDTD; MathCad (Vensim) (2); MatLab (5); MS Office (2); NVivo (3); Origin; RedCap; REMARK'S OMR software; R-project programs (4); SAS/SAS Enterprise version (6); SciFinder Scholar; SigmaPlot (3); SPSS (5); SQL; Stata (2); Video performance analysis software
Processing, backup and storage: network, server and cloud space
Automated backup internal to UCF system (2); Black Armor RAID backup system; Cloud storage/backup (Dropbox and HIPAA-compliant cloudspace specifically mentioned) (4); DSpace; Personal drives; Replication; STOKES
Hardware
EPSON Workforce Pro GT-550 scanner; Tablets
Thirty-nine (39) respondents listed a variety of technical tools used or needed to perform their research.
More popular tools: SAS/SAS Enterprise version (6), MatLab (5), SPSS (5), R-project programs (4), NVivo (3), SigmaPlot (3), …
Source: httpwwwistucfeduhpcrcdBeile_datahandoutpdf
o 18. If applicable, how are you recording lab data? Please check all that apply.
o The 49 respondents selected multiple answers, with "Excel (or other) files on computers in the lab" the most popular choice with 48 responses (98%). This was followed by "Lab notebooks in paper" (n=29, 59%) and "Electronic lab notebook tool" (n=3, 6%).
o If respondents indicated that they used an electronic lab notebook, they were asked to specify which one. The two ELNs identified were Google Docs and Word with embedded images, storing NMR and other equipment data in a digital format.
Lab notebooks in paper: 29 (59%)
Excel (or other) files on computers in the lab: 48 (98%)
Electronic lab notebook (ELN) tool (please specify which one): 3 (6%)
Source: httpwwwistucfeduhpcrcdBeile_datahandoutpdf
o 19. Do you document or record any metadata for your data or dataset?
o Of the 62 people who responded, 41 (66%) indicated that they do not add metadata to their datasets, while 21 (34%) noted that they do. Respondents who replied in the affirmative were asked about specific standards or guidelines. Those responses are reported in question 20.
Yes: 21 (34%)
No: 41 (66%)
Total: 62 (100%)
Source: httpwwwistucfeduhpcrcdBeile_datahandoutpdf
o 20. If you record metadata for your dataset, do you use any local, agency-specific, or national standards or guidelines?
o Twenty-one (21) respondents indicated in question 19 that they assigned metadata to their data or dataset. Each of these respondents also answered the follow-up question as to the type of standard or guideline applied. Of the responses, 15 (71%) do not use any specific standards or guidelines, five (24%) use identified standards, and one (5%) was not sure.
o The five who use standards or guidelines provided the following types: HIPAA/FERPA; FITS standard; program specific; "librarians are helping us with this"; and "all of the above".
Yes (please specify): 5 (24%)
No: 15 (71%)
I'm not sure: 1 (5%)
Total: 21
Source: httpwwwistucfeduhpcrcdBeile_datahandoutpdf
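The percentages reported for questions 18 to 20 follow directly from the counts. As a minimal sketch (the function name `tabulate` is ours, not part of the survey handout), the arithmetic can be re-checked as follows. Question 18 allowed multiple answers, so its denominator is assumed to be the 49 respondents rather than the sum of the responses.

```python
def tabulate(counts, total=None):
    """Return (label, count, percent) rows; percent is rounded to a whole number.

    If total is None, the sum of the counts is used as the denominator.
    """
    if total is None:
        total = sum(counts.values())
    return [(label, n, round(100 * n / total)) for label, n in counts.items()]


# Counts as reported on the slides for questions 18, 19 and 20.
q18 = {
    "Lab notebooks in paper": 29,
    "Excel (or other) files on computers in the lab": 48,
    "Electronic lab notebook (ELN) tool": 3,
}
q19 = {"Yes": 21, "No": 41}
q20 = {"Yes (please specify)": 5, "No": 15, "I'm not sure": 1}

# Q18 is multi-select, so pass the number of respondents explicitly.
for name, rows in (
    ("Q18", tabulate(q18, total=49)),
    ("Q19", tabulate(q19)),
    ("Q20", tabulate(q20)),
):
    print(name)
    for label, n, pct in rows:
        print(f"  {label}: {n} ({pct}%)")
```

Keeping the counts next to the calculation is itself a small piece of data documentation: anyone reading the table later can see which denominator was used.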
o After all, is data recording and documentation needed or important in your research lifecycle?
o What are the various ways to do data recording, documentation, or analysis?
o Will you consider any standard for data documentation in your research process (e.g., local, agency-specific, or national standards or guidelines)? Is it necessary? What are these standards, and where can you find them?
o What are the typical tools out there that can help with data recording and analysis?
o "Data are numerical quantities or other factual attributes derived from observation, experiment, or calculation."
  - National Research Council, 1992a. Setting priorities for space research: Opportunities and imperatives
o Data are facts, numbers, letters, and symbols that describe an object, idea, condition, situation, or other factors. Data in a database may be characterized as predominantly word oriented (e.g., as in a text
o An XML schema brings documentation into a single document, creates structured content about the data, and allows data interoperability and sharing
o It can document comprehensive variable-level information such as a basic data dictionary, question text, and question routing instructions
o Data Documentation Initiative (DDI): a metadata specification for the social and behavioral sciences. It is an XML metadata standard for documenting numeric data. Detailed information is available at http://www.ddialliance.org
o Projects using the DDI (http://www.ddialliance.org/ddi-at-work/projects)
o DDI-compliant data repositories
  o ICPSR - Inter-university Consortium for Political and Social Research
    o Data deposit form: https://www.icpsr.umich.edu/cgi-bin/ddf2
    o UCF is a member of ICPSR
  o UKDA - UK Data Archive
ICPSR (Inter-university Consortium for Political and Social Research) study record
http://www.icpsr.umich.edu/icpsrweb/NACJD/studies/20363?archive=NACJD&q=%22university+of+central+florida%22&permit%5B0%5D=AVAILABLE&x=-999&y=-84
Field Labels
o Title
o Principal investigator(s)
o Summary
o Access notes
o Dataset(s)
  o DS0 Study-Level Files
    o Documentation: Questionnaire.pdf, User guide.pdf
  o DS1 Female Interviews
    o Documentation: Codebook.pdf
  o …
Field Labels
o Study description
  o Citation
  o Funding
  o Scope of study: Subject terms; Smallest geographic unit; Geographic coverage; Time period; Date of collection; Unit of observation; Universe; Data types; Data collection notes
  o Methodology: Study purpose; Study design; Sample; Mode of data collection; Description of variables; Response rates; Presence of common scales; Extent of processing
Field Labels
o Version(s)
o Related publications
o Variables
o Utilities: Metadata exports; Download statistics
Variables
List all 1682 variables in this study, e.g.:
ID: QUESTIONNAIRE ID NUMBER
ISEX: INTERVIEWER GENDER
START: INTERVIEW START TIME HHMM USE 24 HR CLOCK
Q1A: COUNTRY OF BIRTH
Q1B: STATE OF BIRTH - INITIALS OF STATE
Q1C: CITY OF BIRTH WRITE IN NOT APP
Q1D: YEARS LIVED IN USA
Q1E: RESIDENCY STATUS
CHECK1: CHECKPOINT 1 BORN IN SAME METRO AREA
Q2: HOW LONG LIVED IN THIS AREA
…
(http://www.icpsr.umich.edu/icpsrweb/NACJD/ssvd/studies/20363/variables)
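A variable list like this one is the core of a data dictionary. The sketch below is only an illustration: it stores a few of the name and label pairs shown above, writes them out as a two-column CSV codebook, and reports columns of a data file that have no documented label. The function names and the column `Q99` are hypothetical and are not part of the ICPSR study.

```python
import csv
import io

# A few variable names and labels copied from the study's variable list.
VARIABLES = [
    ("ID", "QUESTIONNAIRE ID NUMBER"),
    ("ISEX", "INTERVIEWER GENDER"),
    ("Q1A", "COUNTRY OF BIRTH"),
    ("Q1D", "YEARS LIVED IN USA"),
    ("Q1E", "RESIDENCY STATUS"),
]


def write_data_dictionary(variables, out):
    """Write (variable, label) pairs as CSV to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(["variable", "label"])
    writer.writerows(variables)


def undocumented(columns, variables):
    """Return the column names that have no entry in the data dictionary."""
    documented = {name for name, _ in variables}
    return [column for column in columns if column not in documented]


buffer = io.StringIO()
write_data_dictionary(VARIABLES, buffer)
print(buffer.getvalue())

# "Q99" is a made-up column used only to show the check.
print(undocumented(["ID", "ISEX", "Q99"], VARIABLES))
```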
http://www.icpsr.umich.edu/icpsrweb/ICPSR/ddi2/studies/20363
docDscr: The Document Description consists of bibliographic information describing the DDI-compliant document itself as a whole.
Included fields:
o citation: titlStmt; prodStmt; verStmt; holdings
stdyDscr: The Study Description consists of information about the data collection, study, or compilation that the DDI-compliant documentation file describes. This section includes information about how the study should be cited, who collected or compiled the data, who distributes the data, keywords about the content of the data, a summary (abstract) of the content of the data, data collection methods and processing, etc.
Included fields:
o citation: titlStmt; rspStmt; prodStmt; fundAg; grantNo; distStmt; biblCit; holdings
o stdyInfo: subject; abstract; sumDscr
o method: dataColl; notes; anlyInfo
o dataAccs: setAvail; useStmt
fileDscr: The Data Files Description gives information about the data file(s) that comprise a collection. This section can be repeated for collections with multiple files.
Included fields:
o fileDscr: fileTxt; fileName
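To make the three sections concrete, here is a minimal sketch that assembles a DDI-like skeleton with Python's standard library. The element names `docDscr`, `stdyDscr`, `fileDscr`, `citation`, `titlStmt`, `prodStmt`, `fileTxt` and `fileName` come from the slides; the root `codeBook` and the leaf elements `titl` and `producer` are our assumption based on DDI Codebook, and the sample values are invented. The output has no namespace and is not validated against the DDI schema, so treat it as an outline rather than a compliant record.

```python
import xml.etree.ElementTree as ET


def build_codebook(title, producer, file_name):
    """Build a small DDI-like tree with document, study and file descriptions."""
    code_book = ET.Element("codeBook")  # assumed root element name

    # docDscr: describes the documentation file itself.
    doc_dscr = ET.SubElement(code_book, "docDscr")
    doc_titl_stmt = ET.SubElement(ET.SubElement(doc_dscr, "citation"), "titlStmt")
    ET.SubElement(doc_titl_stmt, "titl").text = f"Documentation for: {title}"

    # stdyDscr: describes the study (title, producer, and so on).
    stdy_dscr = ET.SubElement(code_book, "stdyDscr")
    citation = ET.SubElement(stdy_dscr, "citation")
    ET.SubElement(ET.SubElement(citation, "titlStmt"), "titl").text = title
    ET.SubElement(ET.SubElement(citation, "prodStmt"), "producer").text = producer

    # fileDscr: one per data file; repeat for collections with several files.
    file_dscr = ET.SubElement(code_book, "fileDscr")
    ET.SubElement(ET.SubElement(file_dscr, "fileTxt"), "fileName").text = file_name

    return code_book


# Invented sample values, for illustration only.
root = build_codebook("Example Interview Study", "Example University", "interviews.csv")
print(ET.tostring(root, encoding="unicode"))
```

In practice a tool such as Nesstar Publisher or Colectica would generate and validate this structure; the sketch only shows how the sections nest.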
o Context and participant details of interviews can be recorded in:
  o A descriptive header or summary page in transcripts or field notes
  o A structured data list
  o XML mark-up of data, for example:
    o Text Encoding Initiative (TEI) to mark up interview transcripts
    o Qualitative Data Exchange Format (QuDEx) for researcher annotations and data linking
o Anonymisation of textual data (e.g., replacing real names of people, organizations, and locations with pseudonyms)
o File naming
  o Use meaningful, short names; identify file types (e.g., interviews, focus groups, field notes, audio recordings); avoid spaces and special characters; avoid long names
o Organizing files in folders: Create uniform and structured folder names based on cases, studies, locations, data types, etc., or on the original, anonymized, coded, or annotated versions of data
o Version control: Version numbering in file names
o Documentation: Methodology description, project plan, interview guidelines, consent form templates, data analyses and manipulation
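The file naming and version control advice above can be sketched in a few lines. The naming pattern used here (study number, file type code, sequence number, version number) and the 40-character limit are assumptions chosen for illustration, not a standard; `make_file_name` and `is_safe_name` are hypothetical helper names.

```python
import re


def make_file_name(study, kind, number, version, ext):
    """Build a short structured name such as '6124_int_001_v02.txt'."""
    stem = "_".join([str(study), kind, f"{number:03d}", f"v{version:02d}"])
    # Replace anything other than letters, digits, hyphen or underscore.
    stem = re.sub(r"[^A-Za-z0-9_-]+", "-", stem)
    return f"{stem}.{ext}"


def is_safe_name(name, max_length=40):
    """True if the name is short and free of spaces and special characters."""
    return (
        len(name) <= max_length
        and re.fullmatch(r"[A-Za-z0-9_.-]+", name) is not None
    )


print(make_file_name(6124, "int", 1, 2, "txt"))
print(is_safe_name("6124_int_001_v02.txt"))
print(is_safe_name("interview 1 (final).txt"))
```

Zero-padded sequence and version numbers keep files in the right order when a folder is sorted by name.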
o Example is from "A NESSTAR FOR QUALITATIVE DATA: BUILDING BLOCKS FOR DIGITAL FUTURES" by Corti, Louise et al., available at http://data-archive.ac.uk/media/376907/digitalfutures_dashish_21nov2012.pdf
o Data List
Interview ID    Text File Name
x001            6124int001
x002            6124int002
…               …
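A data list of this kind is easy to generate and keep in step with the files. The following is a rough sketch that reproduces the two columns shown above and writes them as CSV; the identifier patterns (`x001`, `6124int001`) are taken from the example, while the function name and column keys are our own.

```python
import csv
import io


def build_data_list(study_number, count):
    """Return one row per interview with its ID and transcript file name."""
    return [
        {
            "interview_id": f"x{i:03d}",
            "text_file_name": f"{study_number}int{i:03d}",
        }
        for i in range(1, count + 1)
    ]


rows = build_data_list(6124, 2)

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["interview_id", "text_file_name"])
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

Further columns such as interview date or place can be added as extra keys in each row, provided they are anonymised along with the transcripts.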
o Create and generate metadata for your research data and datasets throughout your research lifecycle to preserve the data in the long run
o Consider what information is needed for the data to be read and interpreted in the future
o Understand your funder's requirements for data documentation and metadata. Funder requirements for NSF, GBMF, IMLS, NEH, NIH, and NOAA can be found at https://dmptool.org/guidance
o Consult available metadata standards in your field. You may refer to Common Metadata Standards and Domain Specific Metadata Standards for details
o Describe data and datasets created in your research lifecycle, and use software programs and tools to assist in data documentation. Assign or capture administrative, descriptive, technical, structural, and preservation metadata for the data. Some potential information to document:
  o Descriptive metadata
    o Name of creator of data set
    o Name of author of document
    o Title of document
    o File name
    o Location of file
    o Size of file
  o Structural metadata
    o File relationships (e.g., child, parent)
  o Technical metadata
    o Format (e.g., text, SPSS, Stata, Excel, tiff, mpeg, 3D, Java, FITS, CIF)
    o Compression or encoding algorithms
    o Encryption and decryption keys
    o Software (including release number) used to create or update the data
    o Hardware on which the data were created
    o Operating systems in which the data were created
    o Application software in which the data were created
  o Administrative metadata
    o Information about data creation (e.g., date)
    o Information about subsequent updates, transformation, versioning, summarization
    o Descriptions of migration and replication
    o Information about other events that have affected the files
  o Preservation metadata
    o File format (e.g., txt, pdf, doc, rtf, xls, xml, spv, jpg, fits)
    o Significant properties
    o Technical environment
    o Fixity information
o Adopt a thesaurus in your field if applicable, or compile a data dictionary for your dataset
o Obtain persistent identifiers (e.g., DOI, PURL) for datasets if possible to ensure data can be found in the future
o For your full data management plan, visit the UCF Libraries Data Management Guide. Also refer to the Digital Curation Centre's Checklist for a Data Management Plan (http://www.dcc.ac.uk/sites/default/files/documents/resource/DMP_Checklist_2013.pdf)
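Several of the items above (file name, location, size, format, and fixity information) can be captured automatically. The sketch below is one possible approach using only the Python standard library. The field names are our own, the format is simply guessed from the file extension, and SHA-256 is assumed as the checksum algorithm for fixity; a repository may require different fields or another algorithm.

```python
import hashlib
import os
import tempfile
from datetime import datetime, timezone


def describe_file(path):
    """Collect basic descriptive, technical and fixity metadata for one file."""
    with open(path, "rb") as handle:
        digest = hashlib.sha256(handle.read()).hexdigest()
    stat = os.stat(path)
    return {
        "file_name": os.path.basename(path),
        "location": os.path.dirname(os.path.abspath(path)),
        "size_bytes": stat.st_size,
        # Format guessed from the extension only; not a true format identification.
        "format": os.path.splitext(path)[1].lstrip(".").lower(),
        "modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
        "fixity_sha256": digest,
    }


# Demonstration on a small throwaway file with made-up content.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "survey_v01.csv")
    with open(sample_path, "wb") as handle:
        handle.write(b"id,value\n1,2\n")
    record = describe_file(sample_path)

for key, value in record.items():
    print(f"{key}: {value}")
```

Recomputing the checksum later and comparing it with the stored value is a simple way to confirm that a file has not changed since it was documented.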
o Common Metadata Standards
o Disciplinary Metadata Standards
o Activity: Choose a dataset or a standard in your field to examine and critique
  o Social Science Dataset
  o Humanities Dataset
  o Biological Sciences Dataset
  o Biotechnology Dataset
  o Geospatial Dataset
  o Earth Science Dataset
  o Physical Science Dataset
  o Other…
o Dublin Core (DC): A general metadata standard for describing a wide range of digital resources
  o Dublin Core Metadata Element Set, Version 1.1 (http://dublincore.org/documents/dces/)
  o 15 Elements: Title, Creator, Subject or keyword, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights
  o DCMI Metadata Terms (http://dublincore.org/documents/dcmi-terms/)
  o DC Qualifiers (http://dublincore.org/documents/usageguide/qualifiers.shtml)
o Encoded Archival Description (EAD)
  o A standard for encoding archival finding aids with XML
o Government Information Locator Service (GILS)
  o The Global Information Locator Service defines a core element set for government information so that it can be more searchable and discoverable by the general public
o ONIX for Books (ONline Information eXchange)
  o An international standard for representing and communicating book industry product information
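As an illustration of how simple a Dublin Core description can be, the sketch below builds a small XML record using the 15 elements of the Dublin Core Metadata Element Set 1.1 and the standard `http://purl.org/dc/elements/1.1/` namespace. The wrapper element `metadata` and the function name are our own choices, and the sample values describe this presentation rather than a real dataset.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]

ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build an XML record from a dict of element name -> value or list of values."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        if isinstance(values, str):
            values = [values]
        # Every Dublin Core element is optional and repeatable.
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return root


record = dublin_core_record({
    "title": "Data documentation and metadata",
    "creator": "Deng, Sai",
    "date": "2014-11-19",
    "subject": ["Metadata", "Research data"],
})
print(ET.tostring(record, encoding="unicode"))
```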
o MARC Code List for Countries: http://www.loc.gov/marc/countries/
o MARC Code List for Languages: http://www.loc.gov/marc/languages/
o MARC Source Codes for Vocabularies, Rules, and Schemes: http://www.loc.gov/marc/sourcecode/form/formsource.html
o For digital and online resources
  o Internet Media Types: www.iana.org/assignments/media-types/index.html
  o MODS Note Types: http://www.loc.gov/standards/mods/mods-notes.html
  o DCMI Type Vocabulary: http://dublincore.org/documents/dcmi-terms/index.shtml#H7
o Subject Thesauri and Ontologies
o AGROVOC (Agricultural Organization of the United Nations Vocabulary)
o Astronomy Thesaurus
o CAB Thesaurus (for life sciences technology and social sciences)
o CIF dictionaries (for Physics)
o Eurovoc (European Union Thesaurus)
o Ethnographic Thesaurus
o Gene Ontology
o GeoNames
o Getty Institute Art and Architecture Thesaurus Online
o Getty Institute Thesaurus of Geographic Names
o ICD (International Classification of Diseases)
o Library of Congress Authorities for subject headings
o Library of Congress Thesaurus for Graphic Materials
o Logical Observation Identifiers Names and Codes (LOINC)
o MESH (Medical Subject Headings)
o Public Health Language
o Rare Books and Manuscripts Section (RBMS) Controlled Vocabularies
o RxNorm (for drugs)
o SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms)
o STW Thesaurus for Economics
o UNBIS Thesaurus
o UNESCO Thesaurus
o USDA National Agricultural Library Agriculture Thesaurus
Question: Have you ever used thesauri in your study and research?
Getty Union List of Artist Names (ULAN)
The ULAN includes proper names and associated information about artists. Artists may be either individuals (persons) or groups of individuals working together (corporate bodies). Artists in the ULAN generally represent creators involved in the conception or production of visual arts and architecture.
Library of Congress Name Authority File (LCNAF)
The LCNAF provides authoritative data for names of persons, organizations, events, places, and titles.
Virtual International Authority File (VIAF)
The VIAF™ (Virtual International Authority File) combines multiple name authority files into a single OCLC-hosted name authority service. The goal of the service is to lower the cost and increase the utility of library authority files by matching and linking widely-used authority files and making that information available on the Web.
Web Ontology Language (OWL)
The OWL 2 Web Ontology Language is an ontology language for the Semantic Web with formally defined meaning. OWL 2 ontologies provide classes, properties, individuals, and data values and are stored as Semantic Web documents. OWL 2 ontologies can be used along with information written in RDF, and OWL 2 ontologies themselves are primarily exchanged as RDF documents.
MADS/RDF
The Metadata Authority Description Schema (MADS) is an XML schema for an element set that may be used to provide metadata about authorized forms of agents (people, organizations), events, and terms (topics, geographics, genres, etc.). MADS/RDF builds on MADS/XML as a knowledge organization system.
Resource Description Framework (RDF)
RDF is a standard model for data interchange on the Web. RDF extends the linking structure of the Web to use URIs to name the relationship between things as well as the two ends of the link (this is usually referred to as a "triple"). Using this simple model, it allows structured and semi-structured data to be mixed, exposed, and shared across different applications.
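A triple is easiest to understand by writing one down. The sketch below models triples as plain Python tuples and prints them in an N-Triples style, without using an RDF library. The `example.org` URIs are hypothetical placeholders; the two predicates are real DCMI Metadata Terms properties. The literal escaping is simplified, so this is a teaching aid rather than a full serializer.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the thing being described
    predicate: str  # URI naming the relationship
    obj: str        # URI of another thing, or a literal value


def is_uri(value):
    """Very rough test: treat http(s) strings as URIs, anything else as a literal."""
    return value.startswith(("http://", "https://"))


def to_ntriples(triple):
    """Serialize one triple as an N-Triples style line (simplified escaping)."""
    if is_uri(triple.obj):
        obj = f"<{triple.obj}>"
    else:
        escaped = triple.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."


DATASET = "http://example.org/dataset/1"  # hypothetical dataset URI

triples = [
    Triple(DATASET, "http://purl.org/dc/terms/title", "Example dataset"),
    Triple(DATASET, "http://purl.org/dc/terms/creator", "http://example.org/person/42"),
]

for triple in triples:
    print(to_ntriples(triple))
```

Because the creator is given as a URI rather than a name string, another application can follow that link to find more statements about the same person, which is the point of linked data.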
SKOS (Simple Knowledge Organization System)
SKOS is a W3C recommendation designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary.
Linked data examples
o FAST: Faceted Application of Subject Terminology
o Dewey Decimal Classification
o Open Metadata Registry (RDA vocabularies)
o Library of Congress Linked Data Service
o …
OpenRefine (formerly Google Refine) is a powerful tool for working with messy data: cleaning it, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase. http://openrefine.org
Nesstar Publisher is a free advanced data management program. It can be used for the preparation of data and metadata. It is DDI compliant. http://www.nesstar.com/software/publisher.html
QualAnon (DSDR Qualitative Data Anonymizer): This free transcript anonymization tool is designed solely to de-identify qualitative interview transcripts. https://www.icpsr.umich.edu/icpsrweb/DSDR/tools/anonymize.jsp
Colectica for Microsoft Excel: A free tool to document your spreadsheet data using the Data Documentation Initiative (DDI) metadata format, the open standard for data documentation. http://www.colectica.com/software/colecticaforexcel
Schematron is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees. It is a structural schema language expressed in XML using a small number of elements and XPath. http://xml.ascc.net/resource/schematron/schematron.html
Altova XMLSpy is an advanced XML editor for modeling, editing, transforming, and debugging XML-related technologies. http://www.altova.com/xmlspy.html
<oXygen/> XML Editor is an XML tool that supports all the XML schema languages. The XSLT and XQuery support is enhanced with powerful debuggers and performance profilers. You can use <oXygen/> XML Editor to work with all XML-based technologies, including XML databases, XProc pipelines, and web services. http://www.oxygenxml.com
LabTrove is a free blogging platform specifically designed for use in a research environment. It aims to serve as a highly flexible electronic notebook and data management system: by integrating with a lab's data-producing instruments, researchers can describe an experiment and associate it with its data output at the time of capture, rather than annotating after the fact. http://www.labtrove.org
Kepler is a scientific workflow modeling and management system that enables users, regardless of programming experience, to set up data analysis pipelines. The software will assemble, execute, and document the […] of services and scripts that scientists with large-scale data use to execute research. https://kepler-project.org
DataCite: The DataCite Consortium provides a number of services to support efforts at increasing the ease and prevalence of data citation. http://www.datacite.org
DMPTool is an online service to enable researchers to create data management plans, now required by many funding agencies, and to receive tailored institutional guidance to help them in the process. https://dmp.cdlib.org
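Data citation, which DataCite supports, comes down to assembling a few metadata fields in a fixed order. The sketch below follows the pattern "Creator (PublicationYear): Title. Version. Publisher. ResourceType. Identifier", which is, to our understanding, the form DataCite has recommended, with version and resource type optional. The function name and all sample values, including the DOI, are invented for illustration.

```python
def format_data_citation(creators, year, title, publisher, identifier,
                         version=None, resource_type=None):
    """Assemble a data citation string from basic dataset metadata."""
    parts = [f"{'; '.join(creators)} ({year}): {title}"]
    if version:
        parts.append(version)
    parts.append(publisher)
    if resource_type:
        parts.append(resource_type)
    parts.append(identifier)
    return ". ".join(parts)


# Invented sample values; the DOI below does not identify a real dataset.
citation = format_data_citation(
    creators=["Doe, Jane", "Roe, Richard"],
    year=2014,
    title="Example survey data",
    publisher="Example University",
    identifier="https://doi.org/10.5072/example",
)
print(citation)
```

Generating the citation from the same fields that are deposited with the dataset keeps the published citation and the repository record consistent.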
o Section II addresses data documentation more from the researcher's view
o Section III interprets data documentation more from a curator's or librarian's perspective
o What do researchers really care about?
o Will each party see the other side's points and emphases?
CDL Curation and Publishing Services (http://www.cdlib.org)
o Create, edit, share, and save data management plans
o Open access scholarly publishing services: papers, journals, books, seminars & more
o Curation repository: store, manage, and share research data
o Create and manage persistent identifiers
o Open source add-in for Microsoft Excel as a data collection tool
o An infrastructure to publish and get credit for sharing research data
This slide is by Joan Starr, California Digital Library: http://www.slideshare.net/joanstarr/dataset-metadata-tools-approaches-for-access-preservation?from_search=1
Data Publication: Data Set Related Services
http://library.ucf.edu/ScholarlyCommunication/UCFResearchLifecycle.pdf
o "Data Set (also called 'Dataset') Metadata" provides researchers consultation on:
  o Project and dataset documentation
  o Metadata standards (Common and Domain Specific)
  o Metadata schemas customization
  o Controlled vocabularies and thesauri
  o Data curation tools and practices
o Assists in describing basic properties of your data and enriching metadata for your datasets
o Supports applying controlled vocabularies or optimizing keywords to enhance the search of your datasets
o Helps to prepare your metadata and data for deposit and preservation
o Scholarly Communication (http://library.ucf.edu/ScholarlyCommunication)
o SC Contact Information (http://library.ucf.edu/ScholarlyCommunication/Contact.php)
o UCF Library Research Guides (http://guides.ucf.edu)
o Metadata Guide (http://guides.ucf.edu/metadata)
o Data Management Guide (http://guides.ucf.edu/data)
o Research and Information Services (http://library.ucf.edu/Reference)
FGDC/CSDGM Metadata
Abstract: United States Geological Survey, Saint Petersburg, Florida, Center for Coastal and Watershed Studies…
Purpose: These data and information are intended for science researchers, students…
Language: eng; USA
Citation
  Title: Biological data of field activity 08CRD01 (B-1-08-VI) in US Virgin Islands from 05/30/2008 to 06/13/2008
  Date
    Date: 2013-03-03
    Date Type: Publication Date
  Organisation Name: US Geological Survey (USGS) <http://www.usgs.gov>, Coastal and Marine Geology (CMG) <http://walrus.wr.usgs.gov>
  Role: Publisher
  Contact Info: …
Point Of Contact: …
Representation Type: Vector
Topic Category
Keyword Collection
  Keyword: EARTH SCIENCE > OCEANS
  Associated Thesaurus: Global Change Master Directory (GCMD)
  Keyword: Marine Geology
  Associated Thesaurus: USGS CMG InfoBank
Spatial Extent
  West Bounding Longitude: -65.75000
  East Bounding Longitude: -63.25000
  North Bounding Latitude: 18.75000
  South Bounding Latitude: 17.25000
Constraints: Please recognize the US Geological Survey (USGS) as the source of this information. Physical materials are under controlled on-site access. Some USGS information accessed through this means may be preliminary in nature and presented without the approval of the Director of the USGS…
Legal Constraints
  Use Constraints: Other Restrictions
  Other Constraints: Use Constraints: Please recognize the US Geological Survey (USGS) as the source of this information. Physical materials are under controlled on-site access…
…
Distribution
  Distribution Format
    Format Name: ASCII
    Format Version:
    File Decompression Technique: No compression applied
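Structured fields such as the spatial extent can be checked by a program, which is one practical benefit of a metadata standard. The sketch below validates a bounding box and tests whether a point falls inside it. The coordinate values are our reading of the record above, assuming the decimal point belongs after the second digit (for example -65.75000), which is consistent with the US Virgin Islands; the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    west: float
    east: float
    north: float
    south: float

    def problems(self):
        """Return a list of human-readable problems; an empty list means OK."""
        issues = []
        for name in ("west", "east"):
            if not -180.0 <= getattr(self, name) <= 180.0:
                issues.append(f"{name} longitude out of range")
        for name in ("north", "south"):
            if not -90.0 <= getattr(self, name) <= 90.0:
                issues.append(f"{name} latitude out of range")
        if self.south > self.north:
            issues.append("south latitude is greater than north latitude")
        return issues

    def contains(self, lon, lat):
        # Assumes the box does not cross the antimeridian (west <= east).
        return self.west <= lon <= self.east and self.south <= lat <= self.north


# Assumed decimal placement; see the note above.
box = BoundingBox(west=-65.75, east=-63.25, north=18.75, south=17.25)
print(box.problems())
print(box.contains(-64.9, 18.3))
```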
oXML schema brings documentation into a single document creates
structured content about the data and allows data interoperability and
sharing
oIt can document comprehensive variable level information such as basic
data dictionary question text and question routing instructions
oData Documentation Initiative (DDI) a metadata specification for the
social and behavioral sciences It is an XML metadata standard for
documenting numeric data Detailed information is available
at httpwwwddiallianceorg
oProjects using the DDI (httpwwwddiallianceorgddi-at-workprojects)
oDDI-compliant data repository
o ICPSR - Inter-university Consortium for Political and Social Research
o Data deposit form httpswwwicpsrumicheducgi-binddf2
o UCF is a member of ICPSR
oUKDA - UK Data Archive
Field Labels
TitlePrincipal investigator(s)
Summary
Access notes
Dataset(s)
httpwwwicpsrumicheduicpsrwebNA
CJDstudies20363archive=NACJDampq=22
university+of+central+florida22amppermit
5B05D=AVAILABLEampx=-999ampy=-84
ICPSR Interuniversity
Consortium for
Political and
Social Research
Dataset(s)
DSO Study-Level Files
Documentation
Questionnairepdf
User guidepdf
DS1 Female Interviews
Documentation
Codebookpdf
hellip
Field Labels
Study description
Citation
Funding
Scope of studybull Subject terms
bull Smallest
geographic unit
bull Geographic
coverage
bull Time period
bull Date of collection
bull Unit of
observation
bull Universe
bull Data types
bull Data collection
notes
Methodologybull Study purpose
bull Study design
Field Labels
bull Sample
bull Mode of data collection
bull Description of variables
bull Response rates
bull Presence of common
scales
bull Extent of processing
Field Labels
Version(s)
Related publications
Variables
Utilities
bull Metadata exports
bull Download statistics
Variables
List all 1682 variables in this study
egID QUESTIONNAIRE ID NUMBER ISEX INTERVIEWER GENDER START INTERVIEW START TIME HHMM USE 24 HR CLOCK Q1A COUNTRY OF BIRTH Q1B STATE OF BIRTH - INITIALS OF STATEQ1C CITY OF BIRTH WRITE IN NOT APPQ1D YEARS LIVED IN USAQ1E RESIDENCY STATUSCHECK1 CHECKPOINT 1 BORN IN SAME METRO AREAQ2 HOW LONG LIVED IN THIS AREA hellip (httpwwwicpsrumicheduicpsrwebNACJDssv
dstudies20363variables)
httpwwwicpsrumicheduicpsrwebICPSRddi2studies20363
docDscrThe Document
Description
consists of
bibliographic
information
describing the
DDI-compliant
document
itself as a
whole
Included Fields
citation
bull titleStmt
bull prodStmt
bull verStmt
bull holdings
Included FieldsCitation
titlStmt
rspStmt
prodStmt
fundAg
grantNo
distStmt
biblCit
Holdings
stdyInfoSubject
Abstract
sumDscr
MethoddataColl
Notes
anlyInfo
dataAccssetAvail
useStmt
stdyDscr The Study
Description consists of
information about the
data collection study
or compilation that the
DDI-compliant
documentation file
describes This section
includes information
about how the study
should be cited who
collected or compiled
the data who
distributes the data
keywords about the
content of the data
summary (abstract) of
the content of the data
data collection methods
and processing etc
Included Fields
fileDscr
fileTxt
fileName
fileDscr
Data Files
Description
Information about
the data file(s)
that comprises a
collection This
section can be
repeated for
collections with
multiple files
oContext and participant details of interviews can be
oA descriptive header or summary page in transcripts or
field notes
oA structured data list
oXML mark-up of data for example
oText Encoding Initiative (TEI) to mark up interview
transcript
oQualitative Data Exchange Format (QuDEx) for
researcher annotations and data linking
oAnonymisation of textual data (eg replacing real names of people
organizations and locations with pseudonyms)
oFile naming
oMeaningful short names identify file types (eg interviews focus groups
field notes audio recordings) avoid space special characters avoid long
names
oOrganizing files in folders Create uniform and structured folder names based
on cases studies locations data types etc or the original anonymized
coded or annotated versions of data
oVersion control Version numbering in file names
oDocumentation Methodology description project plan interview guidelines
consent form templates data analyses and manipulation
o Example is from A NESSTAR FOR QUALITATIVE DATA BUILDING BLOCKS FOR DIGITAL FUTURES By Corti Louise et al available at httpdata-archiveacukmedia376907digitalfutures_dashish_21nov2012pdf
oData List
Interview ID
x001
x002
hellip
Text File Name
6124int001
6124int002
hellip
oCreate and generate metadata for your research data and
datasets in your research lifecycle to preserve the data in the
long run
oConsider what information is needed for the data to be
read and interpreted in the future
oUnderstand your funder requirements for data
documentation and metadata Funder requirements for NSF
GBMF IMLS NEH NIH and NOAA can be found at
httpsdmptoolorgguidance
oConsult available metadata standards in your field You may
refer to Common Metadata Standards and Domain Specific
Metadata Standards for details
o Describe data and datasets created in your research lifecycle, and
use software programs and tools to assist in data documentation.
Assign or capture administrative, descriptive, technical, structural,
and preservation metadata for the data. Some potential information
to document:
o Descriptive metadata
   o Name of creator of data set
   o Name of author of document
   o Title of document
   o File name
   o Location of file
   o Size of file
o Structural metadata
   o File relationships (e.g., child, parent)
o Technical metadata
   o Format (e.g., text, SPSS, Stata, Excel, tiff, mpeg, 3D, Java, FITS, CIF)
   o Compression or encoding algorithms
   o Encryption and decryption keys
   o Software (including release number) used to create or update the data
   o Hardware on which the data were created
   o Operating systems in which the data were created
   o Application software in which the data were created
o Administrative metadata
   o Information about data creation (e.g., date)
   o Information about subsequent updates, transformation, versioning,
   summarization
   o Descriptions of migration and replication
   o Information about other events that have affected the files
o Preservation metadata
   o File format (e.g., txt, pdf, doc, rtf, xls, xml, spv, jpg, fits)
   o Significant properties
   o Technical environment
   o Fixity information
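Several of the items listed above (file name, location, size, format, modification date, fixity) can be captured automatically. The following is a minimal sketch using only the Python standard library; the field names in the returned dictionary and the choice of SHA-256 as the fixity checksum are assumptions made for illustration.

```python
import hashlib
import os
import tempfile
from datetime import datetime, timezone


def describe_file(path: str) -> dict:
    """Collect a few descriptive, technical, and fixity fields for one file."""
    with open(path, "rb") as handle:
        checksum = hashlib.sha256(handle.read()).hexdigest()
    stats = os.stat(path)
    return {
        "file_name": os.path.basename(path),
        "location": os.path.dirname(os.path.abspath(path)),
        "size_bytes": stats.st_size,
        "format": os.path.splitext(path)[1].lstrip(".").lower(),
        "last_modified": datetime.fromtimestamp(stats.st_mtime, tz=timezone.utc).isoformat(),
        "fixity_sha256": checksum,
    }


# Demonstration on a throwaway file in a temporary folder
with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "6124int001.txt")
    with open(sample, "w", encoding="utf-8") as handle:
        handle.write("interview text")
    record = describe_file(sample)

print(record["file_name"], record["size_bytes"], record["format"])
```

Recomputing the checksum later and comparing it with the stored value is one simple way to confirm that a file has not changed.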
o Adopt a thesaurus in your field if applicable, or compile a data dictionary for
your dataset
o Obtain persistent identifiers (e.g., DOI, PURL) for datasets if possible to ensure
data can be found in the future
o For your full data management plan, visit the UCF Libraries Data Management
Guide. Also refer to the Digital Curation Centre's Checklist for a Data
Management Plan (http://www.dcc.ac.uk/sites/default/files/documents/resource/DMP_Checklist_2013.pdf)
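A starting point for a data dictionary can be drafted from the data itself and then completed by hand. The sketch below is an assumption-laden illustration: the column layout of the dictionary, the function name, and the sample values are hypothetical (only the variable names ID, Q1A, and Q1D echo the ICPSR example used elsewhere in this presentation).

```python
def build_data_dictionary(rows: list) -> list:
    """Summarize each column: name, inferred type, missing count, one example value."""
    columns = list(rows[0].keys()) if rows else []
    dictionary = []
    for name in columns:
        values = [row.get(name) for row in rows]
        present = [value for value in values if value not in (None, "")]
        kinds = {type(value).__name__ for value in present}
        if len(kinds) == 1:
            kind = next(iter(kinds))
        elif not kinds:
            kind = "unknown"
        else:
            kind = "mixed"
        dictionary.append({
            "variable": name,
            "type": kind,
            "missing": len(values) - len(present),
            "example": present[0] if present else None,
            "description": "",  # left blank for the researcher to fill in
        })
    return dictionary


# Hypothetical sample rows
sample_rows = [
    {"ID": 1, "Q1A": "USA", "Q1D": 12},
    {"ID": 2, "Q1A": "", "Q1D": 30},
]
for entry in build_data_dictionary(sample_rows):
    print(entry)
```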
o Common Metadata Standards
o Disciplinary Metadata Standards
o Activity: Choose a dataset or a standard in your field to examine and critique
   o Social Science Dataset
   o Humanities Dataset
   o Biological Sciences Dataset
   o Biotechnology Dataset
   o Geospatial Dataset
   o Earth Science Dataset
   o Physical Science Dataset
   o Other…
o Dublin Core (DC): A general metadata standard for describing a wide range of
digital resources
   o Dublin Core Metadata Element Set, Version 1.1
   (http://dublincore.org/documents/dces/)
   o 15 Elements: Title, Creator, Subject or keyword, Description, Publisher,
   Contributor, Date, Type, Format, Identifier, Source, Language, Relation,
   Coverage, Rights
   o DCMI Metadata Terms (http://dublincore.org/documents/dcmi-terms/)
   o DC Qualifiers (http://dublincore.org/documents/usageguide/qualifiers.shtml)
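To make the element set concrete, here is a minimal sketch that writes a few Dublin Core elements as XML with Python's standard library. The wrapper element name `metadata` and the function name are assumptions; the namespace URI is the one defined for the Dublin Core Metadata Element Set 1.1, and the sample values are taken from the USGS record shown later in this presentation.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict) -> str:
    """Serialize a mapping of Dublin Core element names to values as XML text."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for element, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


record_xml = dublin_core_record({
    "title": "Biological data of field activity 08CRD01 (B-1-08-VI) in US Virgin Islands",
    "publisher": "US Geological Survey (USGS)",
    "date": "2013-03-03",
    "language": "eng",
    "format": "ASCII",
    "subject": "Marine Geology",
})
print(record_xml)
```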
o Encoded Archival Description (EAD)
   o A standard for encoding archival finding aids with XML
o Government Information Locator Service (GILS)
   o The Global Information Locator Service defines a core element set for government
   information so that it can be more searchable and discoverable by the general public
o ONIX for Books (ONline Information eXchange)
   o An international standard for representing and communicating book industry product
   information
o MARC Code List for Countries: http://www.loc.gov/marc/countries/
o MARC Code List for Languages: http://www.loc.gov/marc/languages/
o MARC Source Codes for Vocabularies, Rules, and Schemes:
http://www.loc.gov/marc/sourcecode/form/formsource.html
o For digital and online resources
   o Internet Media Types: www.iana.org/assignments/media-types/index.html
   o MODS Note Types: http://www.loc.gov/standards/mods/mods-notes.html
   o DCMI Type Vocabulary: http://dublincore.org/documents/dcmi-terms/index.shtml#H7
o Subject Thesauri and Ontologies
o AGROVOC (Food and Agriculture Organization of the United Nations vocabulary)
o Astronomy Thesaurus
o CAB Thesaurus (for life sciences, technology, and social sciences)
o CIF dictionaries (for Physics)
o Eurovoc (European Union Thesaurus)
o Ethnographic Thesaurus
o Gene Ontology
o GeoNames
o Getty Institute Art and Architecture Thesaurus Online
o Getty Institute Thesaurus of Geographic Names
o ICD (International Classification of Diseases)
o Library of Congress Authorities for subject headings
o Library of Congress Thesaurus for Graphic Materials
o Logical Observation Identifiers Names and Codes (LOINC)
o MeSH (Medical Subject Headings)
o Public Health Language
o Rare Books and Manuscripts Section (RBMS) Controlled Vocabularies
o RxNorm (for drugs)
o SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms)
o STW Thesaurus for Economics
o UNBIS Thesaurus
o UNESCO Thesaurus
o USDA National Agricultural Library Agriculture Thesaurus
Question: Have you ever used thesauri in your study and research?
Getty Union List of Artist Names (ULAN)
The ULAN includes proper names and associated information about artists.
Artists may be either individuals (persons) or groups of individuals working
together (corporate bodies). Artists in the ULAN generally represent creators
involved in the conception or production of visual arts and architecture.
Library of Congress Name Authority File (LCNAF)
The LCNAF provides authoritative data for names of persons,
organizations, events, places, and titles.
Virtual International Authority File (VIAF)
The VIAF™ (Virtual International Authority File) combines multiple
name authority files into a single OCLC-hosted name authority
service. The goal of the service is to lower the cost and increase the
utility of library authority files by matching and linking widely-used
authority files and making that information available on the Web.
Web Ontology Language (OWL)
The OWL 2 Web Ontology Language is an ontology language for the Semantic Web
with formally defined meaning. OWL 2 ontologies provide classes, properties,
individuals, and data values and are stored as Semantic Web documents. OWL 2
ontologies can be used along with information written in RDF, and OWL 2
ontologies themselves are primarily exchanged as RDF documents.
MADS/RDF
The Metadata Authority Description Schema (MADS) is an XML schema for an
element set that may be used to provide metadata about authorized forms of
agents (people, organizations), events, and terms (topics, geographics, genres,
etc.). MADS/RDF builds on MADS/XML as a knowledge organization system.
Resource Description Framework (RDF)
RDF is a standard model for data interchange on the Web. RDF extends
the linking structure of the Web to use URIs to name the relationship
between things as well as the two ends of the link (this is usually
referred to as a "triple"). Using this simple model, it allows structured and
semi-structured data to be mixed, exposed, and shared across different
applications.
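The triple idea can be illustrated without any RDF library: each statement is a (subject, predicate, object) tuple, written out here in N-Triples syntax. The `example.org` URIs and the function name are hypothetical, and this simplified writer handles only URIs and plain string literals (no language tags, datatypes, or blank nodes).

```python
def to_ntriples(triples) -> str:
    """Write (subject, predicate, object) tuples as N-Triples lines."""
    lines = []
    for subject, predicate, obj in triples:
        if obj.startswith("http://") or obj.startswith("https://"):
            # The object is another resource, so it is written as a URI
            obj_text = f"<{obj}>"
        else:
            # Anything else is written as a plain string literal
            escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
            obj_text = f'"{escaped}"'
        lines.append(f"<{subject}> <{predicate}> {obj_text} .")
    return "\n".join(lines)


DATASET = "http://example.org/dataset/1"  # hypothetical identifier
triples = [
    (DATASET, "http://purl.org/dc/elements/1.1/title", "Interview transcripts"),
    (DATASET, "http://purl.org/dc/elements/1.1/creator", "http://example.org/person/1"),
]
print(to_ntriples(triples))
```

The predicate URI names the relationship, which is what lets statements from different sources about the same resource be combined.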
SKOS: Simple Knowledge Organization System for the Web
SKOS is a W3C recommendation designed for representation of
thesauri, classification schemes, taxonomies, subject-heading systems, or any other
type of structured controlled vocabulary.
Linked data examples
• FAST: Faceted Application of Subject Terminology
• Dewey Decimal Classification
• Open Metadata Registry (RDA vocabularies)
• Library of Congress Linked Data Service
…
OpenRefine (ex-Google Refine) is a powerful tool for working with messy data: cleaning it, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase.
http://openrefine.org
Nesstar Publisher is a free advanced data management program. It can be used for the preparation of data and metadata. It's DDI compliant.
http://www.nesstar.com/software/publisher.html
QualAnon: DSDR Qualitative Data Anonymizer
This free transcript anonymization tool is designed solely to de-identify qualitative interview transcripts.
https://www.icpsr.umich.edu/icpsrweb/DSDR/tools/anonymize.jsp
Colectica for Microsoft Excel
A free tool to document your spreadsheet data using the Data Documentation Initiative (DDI) metadata format, the open standard for data documentation.
http://www.colectica.com/software/colecticaforexcel
Schematron is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees. It is a structural schema language expressed in XML using a small number of elements and XPath.
http://xml.ascc.net/resource/schematron/schematron.html
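The rule-based idea behind Schematron (assert that a pattern is present in a given context) can be imitated in plain Python. To be clear, the sketch below is not Schematron and does not use a Schematron processor: it is a rough analogue built on the limited path syntax of `xml.etree.ElementTree`, and the `records`/`record` element names and the rules themselves are hypothetical.

```python
import xml.etree.ElementTree as ET

# Each rule: (context path, path that must exist inside the context, message)
RULES = [
    (".//record", "title", "A record must have a title"),
    (".//record", "creator", "A record must have a creator"),
]


def check(xml_text: str, rules=RULES) -> list:
    """Return one message for every context node that fails an assertion."""
    root = ET.fromstring(xml_text)
    failures = []
    for context, required, message in rules:
        for index, node in enumerate(root.findall(context), start=1):
            if node.find(required) is None:
                failures.append(f"record {index}: {message}")
    return failures


sample = (
    "<records>"
    "<record><title>A</title><creator>B</creator></record>"
    "<record><title>C</title></record>"
    "</records>"
)
print(check(sample))
```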
Altova XMLSpy is an advanced XML editor for modeling, editing, transforming, and debugging XML-related technologies.
http://www.altova.com/xmlspy.html
<oXygen/> XML Editor is an XML tool that supports all the XML schema languages. The XSLT and XQuery support is enhanced with powerful debuggers and performance profilers. You can use <oXygen/> XML Editor to work with all XML-based technologies, including XML databases, XProc pipelines, and web services.
http://www.oxygenxml.com
LabTrove is a free blogging platform specifically designed for use in a research environment. It aims to serve as a highly flexible electronic notebook and data management system. By integrating with a lab's data-producing instruments, researchers can describe an experiment and associate it with its data output at the time of capture, rather than annotating after the fact.
http://www.labtrove.org
Kepler is a scientific workflow modeling and management system that enables users, regardless of programming experience, to set up data analysis pipelines. The software will assemble, execute, and document the […] of services and scripts that scientists with large-scale data use to execute research.
https://kepler-project.org
DataCite
The DataCite Consortium provides a number of services to support efforts at increasing the ease and prevalence of data citation.
http://www.datacite.org
DMPTool is an online service to enable researchers to create data management plans, now required by many funding agencies, and to receive tailored institutional guidance to help them in the process.
https://dmp.cdlib.org
o Section II addresses data documentation more from the
researcher's view
o Section III interprets data documentation more from
a curator's or librarian's perspective
o What do researchers really care about?
o Will each party see the other side's points and
emphases?
o Create, edit, share, and save data management plans
o Open access scholarly publishing services: papers, journals, books, seminars & more
o Curation repository: store, manage, and share research data
o Create and manage persistent identifiers
o Open source add-in for Microsoft Excel as a data collection tool
o An infrastructure to publish and get credit for sharing research data
CDL Curation and Publishing Services
http://www.cdlib.org
This slide is by Joan Starr, California Digital Library: http://www.slideshare.net/joanstarr/dataset-metadata-tools-approaches-for-access-preservation?from_search=1
Data Publication
http://library.ucf.edu/ScholarlyCommunication/UCFResearchLifecycle.pdf
Data Set Related Services
o "Data Set (also called 'Dataset') Metadata" provides
researchers consultation on
   o Project and dataset documentation
   o Metadata standards (Common and Domain Specific)
   o Metadata schemas customization
   o Controlled vocabularies and thesauri
   o Data curation tools and practices
o Assists in describing basic properties of your data and enriching
metadata for your datasets
o Supports applying controlled vocabularies or optimizing keywords
to enhance the search of your datasets
o Helps to prepare your metadata and data for deposit and
preservation
o Scholarly Communication (http://library.ucf.edu/ScholarlyCommunication)
o SC Contact Information (http://library.ucf.edu/ScholarlyCommunication/Contact.php)
o UCF Library Research Guides (http://guides.ucf.edu)
o Metadata Guide (http://guides.ucf.edu/metadata)
o Data Management Guide (http://guides.ucf.edu/data)
o Research and Information Services (http://library.ucf.edu/Reference)
Abstract: United States Geological Survey, Saint Petersburg, Florida, Center for Coastal and Watershed Studies…
Purpose: These data and information are intended for science researchers, students…
Language: eng; USA
Citation
   Title: Biological data of field activity 08CRD01 (B-1-08-VI) in US Virgin Islands from 05/30/2008 to 06/13/2008
   Date
      Date: 2013-03-03
      Date Type: Publication Date
   Organisation Name: US Geological Survey (USGS) <http://www.usgs.gov/>, Coastal and Marine Geology (CMG) <http://walrus.wr.usgs.gov/>
   Role: Publisher
   Contact Info: …
Point Of Contact: …
Representation Type: Vector
Topic Category
Keyword Collection
   Keyword: EARTH SCIENCE > OCEANS
   Associated Thesaurus: Global Change Master Directory (GCMD)
   Keyword: Marine Geology
   Associated Thesaurus: USGS CMG InfoBank
Spatial Extent
   West Bounding Longitude: -65.75000
   East Bounding Longitude: -63.25000
   North Bounding Latitude: 18.75000
   South Bounding Latitude: 17.25000
FGDC/CSDGM Metadata
Constraints: Please recognize the US Geological Survey (USGS) as the source of this information. Physical materials are under controlled on-site access. Some USGS information accessed through this means may be preliminary in nature and presented without the approval of the Director of the USGS…
Legal Constraints
   Use Constraints: Other Restrictions
   Other Constraints: Use Constraints: Please recognize the US Geological Survey (USGS) as the source of this information. Physical materials are under controlled on-site access…
…
Distribution
   Distribution Format
      Format Name: ASCII
      Format Version:
      File Decompression Technique: No compression applied
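Geospatial records like the one above lend themselves to simple automated checks. As a sketch, the function below tests a bounding box given in decimal degrees, reading the example record's extent as west -65.75, east -63.25, south 17.25, north 18.75. The function name is hypothetical, and the west/east comparison assumes the box does not cross the antimeridian.

```python
def check_bounding_box(west: float, east: float, south: float, north: float) -> list:
    """Return a list of problems found in a bounding box given in decimal degrees."""
    problems = []
    for name, value in (("west", west), ("east", east)):
        if not -180.0 <= value <= 180.0:
            problems.append(f"{name} longitude {value} is outside -180..180")
    for name, value in (("south", south), ("north", north)):
        if not -90.0 <= value <= 90.0:
            problems.append(f"{name} latitude {value} is outside -90..90")
    if south > north:
        problems.append("south latitude is greater than north latitude")
    if west > east:
        # Assumption: the box does not cross the antimeridian
        problems.append("west longitude is greater than east longitude")
    return problems


print(check_bounding_box(-65.75, -63.25, 17.25, 18.75))
```

A check of this kind would also flag coordinates whose decimal points were lost in transcription, since the values would fall far outside the valid ranges.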
o An XML schema brings documentation into a single document, creates
structured content about the data, and allows data interoperability and
sharing
o It can document comprehensive variable-level information such as a basic
data dictionary, question text, and question routing instructions
o Data Documentation Initiative (DDI): a metadata specification for the
social and behavioral sciences. It is an XML metadata standard for
documenting numeric data. Detailed information is available
at http://www.ddialliance.org
o Projects using the DDI (http://www.ddialliance.org/ddi-at-work/projects)
o DDI-compliant data repositories
   o ICPSR - Inter-university Consortium for Political and Social Research
      o Data deposit form: https://www.icpsr.umich.edu/cgi-bin/ddf2
      o UCF is a member of ICPSR
   o UKDA - UK Data Archive
Field Labels
Title
Principal investigator(s)
Summary
Access notes
Dataset(s)
http://www.icpsr.umich.edu/icpsrweb/NACJD/studies/20363?archive=NACJD&q=%22university+of+central+florida%22&permit%5B0%5D=AVAILABLE&x=-999&y=-84
ICPSR: Inter-university Consortium for Political and Social Research
Dataset(s)
   DS0: Study-Level Files
      Documentation
         Questionnaire.pdf
         User guide.pdf
   DS1: Female Interviews
      Documentation
         Codebook.pdf
   …
Field Labels
Study description
   Citation
   Funding
Scope of study
   • Subject terms
   • Smallest geographic unit
   • Geographic coverage
   • Time period
   • Date of collection
   • Unit of observation
   • Universe
   • Data types
   • Data collection notes
Methodology
   • Study purpose
   • Study design
   • Sample
   • Mode of data collection
   • Description of variables
   • Response rates
   • Presence of common scales
   • Extent of processing
Version(s)
Related publications
Variables
Utilities
   • Metadata exports
   • Download statistics
Variables
List all 1682 variables in this study, e.g.:
   ID       QUESTIONNAIRE ID NUMBER
   ISEX     INTERVIEWER GENDER
   START    INTERVIEW START TIME HHMM USE 24 HR CLOCK
   Q1A      COUNTRY OF BIRTH
   Q1B      STATE OF BIRTH - INITIALS OF STATE
   Q1C      CITY OF BIRTH WRITE IN NOT APP
   Q1D      YEARS LIVED IN USA
   Q1E      RESIDENCY STATUS
   CHECK1   CHECKPOINT 1 BORN IN SAME METRO AREA
   Q2       HOW LONG LIVED IN THIS AREA
   …
(http://www.icpsr.umich.edu/icpsrweb/NACJD/ssvd/studies/20363/variables)
http://www.icpsr.umich.edu/icpsrweb/ICPSR/ddi2/studies/20363
docDscr
The Document Description consists of bibliographic information describing the
DDI-compliant document itself as a whole.
Included Fields
citation
   • titlStmt
   • prodStmt
   • verStmt
   • holdings

stdyDscr
The Study Description consists of information about the data collection, study,
or compilation that the DDI-compliant documentation file describes. This section
includes information about how the study should be cited, who collected or compiled
the data, who distributes the data, keywords about the content of the data,
summary (abstract) of the content of the data, data collection methods
and processing, etc.
Included Fields
Citation
   titlStmt
   rspStmt
   prodStmt
   fundAg
   grantNo
   distStmt
   biblCit
   Holdings
stdyInfo
   Subject
   Abstract
   sumDscr
Method
   dataColl
   Notes
   anlyInfo
dataAccs
   setAvail
   useStmt

fileDscr
Data Files Description: Information about the data file(s) that comprise a
collection. This section can be repeated for collections with multiple files.
Included Fields
fileDscr
   fileTxt
   fileName
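The three sections described above nest inside a single XML document. The following sketch builds a bare skeleton with Python's standard library to show that nesting. It is illustrative only: it assumes the DDI Codebook root element `codeBook` and the `titl` child of `titlStmt`, omits namespaces and most required content, and is not validated against the DDI schema.

```python
import xml.etree.ElementTree as ET


def add_title(parent: ET.Element, title: str) -> None:
    """Add citation/titlStmt/titl under a description section."""
    citation = ET.SubElement(parent, "citation")
    titl_stmt = ET.SubElement(citation, "titlStmt")
    ET.SubElement(titl_stmt, "titl").text = title


def ddi_skeleton(doc_title: str, study_title: str, file_names: list) -> ET.Element:
    """Build docDscr, stdyDscr, and one fileDscr per data file."""
    code_book = ET.Element("codeBook")
    add_title(ET.SubElement(code_book, "docDscr"), doc_title)
    add_title(ET.SubElement(code_book, "stdyDscr"), study_title)
    for name in file_names:
        # fileDscr is repeated once for each file in the collection
        file_dscr = ET.SubElement(code_book, "fileDscr")
        file_txt = ET.SubElement(file_dscr, "fileTxt")
        ET.SubElement(file_txt, "fileName").text = name
    return code_book


skeleton = ddi_skeleton("Codebook for Study X", "Study X", ["6124int001.txt", "6124int002.txt"])
print(ET.tostring(skeleton, encoding="unicode"))
```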
o Context and participant details of interviews can be documented in:
   o A descriptive header or summary page in transcripts or
   field notes
   o A structured data list
o XML mark-up of data, for example:
   o Text Encoding Initiative (TEI) to mark up interview
   transcripts
   o Qualitative Data Exchange Format (QuDEx) for
   researcher annotations and data linking
o Anonymisation of textual data (e.g., replacing real names of people,
organizations, and locations with pseudonyms)
o File naming
   o Meaningful short names; identify file types (e.g., interviews, focus groups,
   field notes, audio recordings); avoid spaces and special characters; avoid long
   names
o Organizing files in folders: Create uniform and structured folder names based
on cases, studies, locations, data types, etc., or on the original, anonymized,
coded, or annotated versions of data
o Version control: Version numbering in file names
o Documentation: Methodology description, project plan, interview guidelines,
consent form templates, data analyses and manipulation
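The anonymisation step mentioned in the list above, replacing real names with pseudonyms, can be sketched as a simple lookup-and-replace. The names, pseudonyms, and function name below are invented for illustration, and a replacement table alone will not catch misspellings, nicknames, or indirect identifiers, so manual review is still needed.

```python
import re


def pseudonymise(text: str, replacements: dict) -> str:
    """Replace each real name with its pseudonym, matching whole words only."""
    # Longest names first, so a full name is replaced before any shorter name inside it
    for real in sorted(replacements, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(real) + r"\b")
        text = pattern.sub(lambda _match, new=replacements[real]: new, text)
    return text


# Hypothetical replacement table kept separately from the anonymised files
mapping = {
    "Jane Smith": "[Participant 1]",
    "Orlando": "[City A]",
    "Acme Corp": "[Organization 1]",
}
print(pseudonymise("Jane Smith moved to Orlando to work at Acme Corp.", mapping))
```

Storing the replacement table apart from the transcripts keeps the link between pseudonyms and real identities under the researcher's control.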
o Example is from A NESSTAR FOR QUALITATIVE DATA BUILDING BLOCKS FOR DIGITAL FUTURES By Corti Louise et al available at httpdata-archiveacukmedia376907digitalfutures_dashish_21nov2012pdf
oData List
Interview ID
x001
x002
hellip
Text File Name
6124int001
6124int002
hellip
oCreate and generate metadata for your research data and
datasets in your research lifecycle to preserve the data in the
long run
oConsider what information is needed for the data to be
read and interpreted in the future
oUnderstand your funder requirements for data
documentation and metadata Funder requirements for NSF
GBMF IMLS NEH NIH and NOAA can be found at
httpsdmptoolorgguidance
oConsult available metadata standards in your field You may
refer to Common Metadata Standards and Domain Specific
Metadata Standards for details
oDescribe data and datasets created in your research lifecycle and
use software programs and tools to assist in data documentation
Assign or capture administrative descriptive technical structural
and preservation metadata for the data Some potential information
to document
oDescriptive metadata
oName of creator of data set
oName of author of document
oTitle of document
oFile name
oLocation of file
oSize of file
oStructural metadata
oFile relationships (eg child parent)
oTechnical metadata
oFormat (eg text SPSS Stata Excel tiff mpeg 3D Java FITS CIF)
oCompression or encoding algorithms
oEncryption and decryption keys
oSoftware (including release number) used to create or update the data
oHardware on which the data were created
oOperating systems in which the data were created
oApplication software in which the data were created
oAdministrative metadata
o Information about data creation (eg date)
o Information about subsequent updates transformation versioning
summarization
oDescriptions of migration and replication
o Information about other events that have affected the files
oPreservation metadata
oFile format (eg txt pdf doc rtf xls xml spv jpg fits)
oSignificant properties
oTechnical environment
oFixity information
oAdopt a thesauri in your field if applicable or compile a data dictionary for
your dataset
oObtain persistent identifiers (eg doi purl) for datasets if possible to ensure
data can be found in the future
oFor your full data management plan visit UCF Libraries Data Management
Guide Also refer to Digital Curation Centrersquos Checklist for a Data
Management Plan (httpwwwdccacuksitesdefaultfilesdocumentsresourceDMP_Checklist_2013pdf)
oCommon Metadata Standards
oDisciplinary Metadata Standards
oActivity Choose a dataset or a standard in your field to examine and critique
oSocial Science Dataset
oHumanities Dataset
oBiological Sciences Dataset
oBiotechnology Dataset
oGeospatial Dataset
oEarth Science Dataset
oPhysical Science Dataset
oOtherhellip
oDublin Core (DC) A general metadata standard for describing a wide range of
digital resources
o Dublin Core Metadata Element Set Version 11
(httpdublincoreorgdocumentsdces)
o 15 Elements Title Creator Subject or keyword Description Publisher Type Format
Identifier Source Language Relation Coverage Rights
o DCMI Metadata Terms (httpdublincoreorgdocumentsdcmi-terms)
o DC Qualifiers (httpdublincoreorgdocumentsusageguidequalifiersshtml)
o Encoded Archival Description (EAD)
o A standard for encoding archival finding aids with XML
oGovernment Information Locator Service (GILS)
o The Global Information Locator Service defines a core element set for government
information so that it can be more searchable and discoverable by the general public
oONIX for Books (ONline Information eXchange)
o An international standard for representing and communicating book industry product
oMARC Code List for Countries httpwwwlocgovmarccountries
oMARC Code List for Languages httpwwwlocgovmarclanguages
oMARC Source Codes for Vocabularies Rules and Schemes
httpwwwlocgovmarcsourcecodeformformsourcehtml
oFor digital and online resources
oInternet Media Types wwwianaorgassignmentsmedia-
typesindexhtml
oMODS Note Types httpwwwlocgovstandardsmodsmods-
noteshtml
oDCMI Type Vocabulary httpdublincoreorgdocumentsdcmi-
termsindexshtmlH7
o Subject Thesauri and Ontologies
o AGROVOC (Agricultural Organization of the United Nations Vocabulary)
o Astronomy Thesaurus
o CAB Thesaurus (for life sciences technology and social sciences)
o CIF dictionaries (for Physics)
o Eurovoc (European Union Thesaurus)
o Ethnographic Thesaurus
o Gene Ontology
o GeoNames
o Getty Institute Art and Architecture Thesaurus Online
o Getty Institute Thesaurus of Geographic Names
o ICD (International Classification of Diseases)
o Library of Congress Authorities for subject headings
o Library of Congress Thesaurus for Graphic Materials
o Logical Observation Identifiers Names and Codes (LOINC)
o MESH (Medical Subject Headings)
o Public Health Language
o Rare Books and Manuscripts Section (RBMS) Controlled Vocabularies
o RxNorm (for drugs)
o SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms)
o STW Thesaurus for Economics
o UNBIS Thesaurus
o UNESCO Thesaurus
o USDA National Agricultural Library Agriculture Thesaurus
Question Have you ever
used thesauri in your study
and research
Getty Union List of Artist Names
(ULAN)The ULAN includes proper names and
associated information about artists
Artists may be either individuals
(persons) or groups of individuals working
together (corporate bodies) Artists in
the ULAN generally represent creators
involved in the conception or production
of visual arts and architecture
Library of Congress Name
Authority File (LCNAF)
The LCNAF provides authoritative
data for names of persons
organizations events places and
titles
Virtual International
Authority File (VIAF)
The VIAFtrade (Virtual International
Authority File) combines multiple
name authority files into a single
OCLC-hosted name authority
service The goal of the service is to
lower the cost and increase the
utility of library authority files by
matching and linking widely-used
authority files and making that
information available on the Web
Web Ontology Language
(OWL)The OWL 2 Web Ontology Language is an
ontology language for the Semantic Web
with formally defined meaning OWL 2
ontologies provide classes properties
individuals and data values and are stored
as Semantic Web documents OWL 2
ontologies can be used along with
information written in RDF and OWL 2
ontologies themselves are primarily
exchanged as RDF documents
MADSRDFThe Metadata Authority Description
Schema (MADS) is an XML schema for an
element set that may be used to provide
metadata about authorized forms of
agents (people organizations) events
and terms (topics geographics genres
etc) MADSRDF
builds on MADSXML as a knowledge
organization system
Resource Description
Framework (RDF)RDF is a standard model for data
interchange on the Web RDF extends
the linking structure of the Web to use
URIs to name the relationship
between things as well as the two
ends of the link (this is usually
referred to as a ldquotriplerdquo) Using this
simple model it allows structured and
semi-structured data to be mixed
exposed and shared across different
applications
SKOS Simple Knowledge
Organization for the Web SKOS is a W3C recommendation
designed for representation of
thesauri classification
schemes taxonomies subject-
heading systems or any other
type of structured controlled
vocabularyLinked data
examplesbull FAST Faceted
Application of
Subject
Terminology
bull Dewey Decimal
Classification
bull Open Metadata
Registry (RDA
vocabularies)
bull Library of Congress
Linked Data
Service
hellip
OpenRefine (ex-Google Refine) is a powerful tool for working with messy data cleaning it transforming it from one format into another extending it with web services and linking it to databases like Freebasehttpopenrefineorg
Nesstar Publisher is a
free advanced data management program It can be used for the preparation of data and metadata Its DDI complianthttpwwwnesstarcomsoftwarepublisherhtml
QualAnon DSDR
Qualitative Data Anonymizer
This free transcript anonymizationtool is designed solely to de-identify qualitative interview transcriptshttpswwwicpsrumicheduicpsrwebDSDRtoolsanonymizejsp
Colectica for Microsoft Excel
A free tool to document your spreadsheet data using the Data Documentation Initiative (DDI) metadata format the open standard for data documentationhttpwwwcolecticacomsoftwarecolecticaforexcel
Schematron is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees It is a structural schema language expressed in XML using a small number of elements and XPathhttpxmlasccnetresourceschematronschematronhtml
Altova XMLSpy is an advanced XML editor for modeling editing transforming and debugging XML-related
technologieshttpwwwaltovacomxmlspy
html
ltoXygengt XML
Editor is an XML tool that supports all the XML schema languages The XSLT and XQuery support is enhanced with powerful debuggers and performance profilers You can use ltoXygengt XML Editor to work with all XML-based technologies including XML databases XProcpipelines and web serviceshttpwwwoxygenxmlcom
LabTrove is a free blogging
platform specifically designed for use in a research environment It aims to serve as a highly flexible electronic notebook and data management system by integrating with a labrsquos data-producing instruments researchers can describe an experiment and associate it with its data output at the time of capture rather than annotating after the fact httpwwwlabtroveorg
Kepler is a scientific workflow
modeling and management system that enables users regardless of programming experience to set up data analysis pipelines The software will assemble execute and document theof services and scripts that scientists with large-scale data use to execute researchhttpskepler-projectorg
DataCiteThe DataCite Consortium
provides a number of
services to support
efforts at increasing the
ease and prevalence of
data citationhttpwwwdataciteorg
DMPTool is an online service to enable researchers to create data management plans now required by many funding agencies and to receive tailored institutional guidance to help them in the processhttpsdmpcdliborg
oSection II addresses data documentation more from the
researcherrsquos view
oSection III interprets data documentation more from
a curator or librarians perspective
oWhat do researchers really care about
oWill each party see the other sidersquos points and
emphases
Create edit share and save
data management plans
Open access scholarly publishing services
papers journals books seminars amp more
Curation repository store manage and share research data
Create and manage
persistent identifiers
Open source add-in for Microsoft
Excel as a data collection tool
An infrastructure to publish and get credit
for sharing research data
CDL Curation and Publishing Services
httpwwwcdliborg
This slide is by Joan Starr California Digital Library httpwwwslidesharenetjoanstarrdataset-metadata-tools-approaches-for-access-preservationfrom_search=1
Data Publication
httplibraryucfeduScholarlyCommunicationUCFResearchLifecyclepdfData Set Related Services
oldquoData Set (also called lsquoDatasetrsquo) Metadatardquo provides
researchers consultation on
oProject and dataset documentation
oMetadata standards (Common and Domain Specific)
oMetadata schemas customization
oControlled vocabularies and thesauri
oData curation tools and practices
oAssists in describing basic properties of your data and enriching
metadata for your datasets
oSupports applying controlled vocabularies or optimizing keywords
to enhance the search of your datasets
oHelps to prepare your metadata and data for deposit and
preservation
oScholarly Communication (httplibraryucfeduScholarlyCommunication)
oSC Contact Information (httplibraryucfeduScholarlyCommunicationContactphp)
oUCF Library Research Guides (httpguidesucfedu)
oMetadata Guide (httpguidesucfedumetadata)
oData Management Guide (httpguidesucfedudata)
oResearch and Information Services (httplibraryucfeduReference)
Abstract United States Geological Survey Saint Petersburg Florida Center for Coastal and Watershed
Studieshellip
Purpose These data and information are intended for science researchers studentshellip
Language eng USA
Citation
Title Biological data of field activity 08CRD01 (B-1-08-VI) in US Virgin Islands from 05302008 to 06132008
Date
Date 2013-03-03
Date Type Publication Date
Organisation Name US Geological Survey (USGS) lthttpwwwusgsgovgt Coastal and Marine Geology
(CMG) lthttpwalruswrusgsgovgt
Role Publisher
Contact Info hellip
Point Of Contact hellip
Representation Type Vector
Topic Category
Keyword Collection
Keyword EARTH SCIENCE gt OCEANS
Associated Thesaurus Global Change Master Directory (GCMD)
Keyword Marine Geology
Associated Thesaurus USGS CMG InfoBank
Spatial Extent
West Bounding Longitude -6575000
East Bounding Longitude -6325000
North Bounding Latitude 1875000
South Bounding Latitude 1725000
FGDCCSDGM
Metadata
Constraints Please recognize the US Geological Survey (USGS) as the source of this information Physical materials are under controlled on-site access Some USGS information accessed through this means may be preliminary in nature and presented without the approval of the Director of the USGShellip
Legal Constraints
Use Constraints Other Restrictions
Other Constraints Use Constraints Please recognize the US Geological Survey (USGS) as the source of this information Physical materials are under controlled on-site accesshellip
hellip
Distribution
Distribution Format
Format Name ASCII
Format Version
File Decompression Technique No compression applied
oXML schema brings documentation into a single document creates
structured content about the data and allows data interoperability and
sharing
oIt can document comprehensive variable level information such as basic
data dictionary question text and question routing instructions
oData Documentation Initiative (DDI) a metadata specification for the
social and behavioral sciences It is an XML metadata standard for
documenting numeric data Detailed information is available
at httpwwwddiallianceorg
oProjects using the DDI (httpwwwddiallianceorgddi-at-workprojects)
oDDI-compliant data repository
o ICPSR - Inter-university Consortium for Political and Social Research
o Data deposit form httpswwwicpsrumicheducgi-binddf2
o UCF is a member of ICPSR
oUKDA - UK Data Archive
Field Labels
TitlePrincipal investigator(s)
Summary
Access notes
Dataset(s)
httpwwwicpsrumicheduicpsrwebNA
CJDstudies20363archive=NACJDampq=22
university+of+central+florida22amppermit
5B05D=AVAILABLEampx=-999ampy=-84
ICPSR Interuniversity
Consortium for
Political and
Social Research
Dataset(s)
DSO Study-Level Files
Documentation
Questionnairepdf
User guidepdf
DS1 Female Interviews
Documentation
Codebookpdf
hellip
Field Labels
Study description
Citation
Funding
Scope of studybull Subject terms
bull Smallest
geographic unit
bull Geographic
coverage
bull Time period
bull Date of collection
bull Unit of
observation
bull Universe
bull Data types
bull Data collection
notes
Methodologybull Study purpose
bull Study design
Field Labels
bull Sample
bull Mode of data collection
bull Description of variables
bull Response rates
bull Presence of common
scales
bull Extent of processing
Field Labels
Version(s)
Related publications
Variables
Utilities
bull Metadata exports
bull Download statistics
Variables
List all 1682 variables in this study, e.g.:
ID: QUESTIONNAIRE ID NUMBER
ISEX: INTERVIEWER GENDER
START: INTERVIEW START TIME HHMM USE 24 HR CLOCK
Q1A: COUNTRY OF BIRTH
Q1B: STATE OF BIRTH - INITIALS OF STATE
Q1C: CITY OF BIRTH WRITE IN NOT APP
Q1D: YEARS LIVED IN USA
Q1E: RESIDENCY STATUS
CHECK1: CHECKPOINT 1 BORN IN SAME METRO AREA
Q2: HOW LONG LIVED IN THIS AREA
…
(http://www.icpsr.umich.edu/icpsrweb/NACJD/ssvd/studies/20363/variables)
http://www.icpsr.umich.edu/icpsrweb/ICPSR/ddi2/studies/20363
docDscr: The Document Description consists of bibliographic information
describing the DDI-compliant document itself as a whole.
Included Fields
citation
• titlStmt
• prodStmt
• verStmt
• holdings
Included Fields
Citation
• titlStmt
• rspStmt
• prodStmt
• fundAg
• grantNo
• distStmt
• biblCit
• Holdings
stdyInfo
• Subject
• Abstract
• sumDscr
Method
• dataColl
• Notes
• anlyInfo
dataAccs
• setAvail
• useStmt
stdyDscr: The Study Description consists of information about the data
collection, study, or compilation that the DDI-compliant documentation file
describes. This section includes information about how the study should be
cited, who collected or compiled the data, who distributes the data, keywords
about the content of the data, a summary (abstract) of the content of the data,
data collection methods and processing, etc.
Included Fields
fileDscr
• fileTxt
• fileName
fileDscr: Data Files Description. Information about the data file(s) that
comprises a collection. This section can be repeated for collections with
multiple files.
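The three sections listed above (docDscr, stdyDscr, fileDscr) nest inside a single XML document. The following is a minimal sketch in Python of that nesting, not a complete or schema-valid DDI file. It assumes the root element `codeBook` and the child element `titl` from DDI Codebook, omits namespaces, and uses invented placeholder values for the title and file name.

```python
import xml.etree.ElementTree as ET


def build_ddi_skeleton(title, file_name):
    """Return a bare-bones DDI-Codebook-style tree (no namespace, not validated)."""
    code_book = ET.Element("codeBook")

    # docDscr: bibliographic information about the DDI document itself
    doc_dscr = ET.SubElement(code_book, "docDscr")
    doc_citation = ET.SubElement(doc_dscr, "citation")
    doc_titl_stmt = ET.SubElement(doc_citation, "titlStmt")
    ET.SubElement(doc_titl_stmt, "titl").text = "DDI document for: " + title

    # stdyDscr: information about the study that the document describes
    stdy_dscr = ET.SubElement(code_book, "stdyDscr")
    stdy_citation = ET.SubElement(stdy_dscr, "citation")
    stdy_titl_stmt = ET.SubElement(stdy_citation, "titlStmt")
    ET.SubElement(stdy_titl_stmt, "titl").text = title

    # fileDscr: one per data file; repeat it for multi-file collections
    file_dscr = ET.SubElement(code_book, "fileDscr")
    file_txt = ET.SubElement(file_dscr, "fileTxt")
    ET.SubElement(file_txt, "fileName").text = file_name

    return code_book


# Placeholder values, invented for illustration only
root = build_ddi_skeleton("Example Study (placeholder)", "example_data.txt")
print(ET.tostring(root, encoding="unicode"))
```

A real deposit would normally be produced with a DDI-aware tool and validated against the published schema; the sketch only shows how the sections relate to each other.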
o Context and participant details of interviews can be documented in:
  o A descriptive header or summary page in transcripts or
    field notes
  o A structured data list
o XML mark-up of data, for example:
  o Text Encoding Initiative (TEI) to mark up interview
    transcripts
  o Qualitative Data Exchange Format (QuDEx) for
    researcher annotations and data linking
o Anonymisation of textual data (e.g., replacing real names of people,
  organizations, and locations with pseudonyms)
o File naming
o Meaningful short names: identify file types (e.g., interviews, focus groups,
  field notes, audio recordings); avoid spaces and special characters; avoid long
  names
o Organizing files in folders: create uniform and structured folder names based
  on cases, studies, locations, data types, etc., or on the original, anonymized,
  coded, or annotated versions of data
o Version control: version numbering in file names
o Documentation: methodology description, project plan, interview guidelines,
  consent form templates, data analyses and manipulation
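The file naming and version control advice above can be applied consistently with a small helper. The sketch below is one possible convention, not a prescribed one: the function name `make_file_name`, the `_v01` version suffix, and the three-digit sequence number are all assumptions made for illustration.

```python
import re


def make_file_name(study_id, file_type, seq, version, ext="txt"):
    """Build a short file name with no spaces or special characters."""
    # Keep only letters and digits from the file-type code
    clean_type = re.sub(r"[^A-Za-z0-9]", "", file_type)
    # Zero-padded sequence and version numbers keep names sortable
    return f"{study_id}{clean_type}{seq:03d}_v{version:02d}.{ext}"


print(make_file_name(6124, "int", 1, 1))
print(make_file_name(6124, "focus group", 12, 3, ext="rtf"))
```

Putting the version number in the name means an older version is never silently overwritten when a transcript is re-coded or re-anonymized.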
o Example is from "A NESSTAR FOR QUALITATIVE DATA: BUILDING BLOCKS FOR DIGITAL FUTURES" by Corti, Louise, et al., available at http://data-archive.ac.uk/media/376907/digitalfutures_dashish_21nov2012.pdf
o Data List
Interview ID    Text File Name
x001            6124int001
x002            6124int002
…               …
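A structured data list like the one above is easy to keep as a plain CSV file alongside the transcripts. The following minimal sketch uses Python's standard `csv` module; the two column names come from the example, while the function name `write_data_list` is an assumption for illustration.

```python
import csv
import io


def write_data_list(rows):
    """Serialize data-list rows (one dict per interview) to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["Interview ID", "Text File Name"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    {"Interview ID": "x001", "Text File Name": "6124int001"},
    {"Interview ID": "x002", "Text File Name": "6124int002"},
]
print(write_data_list(rows))
```

Further columns (for example interview date or participant details) can be added to `fieldnames` as the project requires.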
o Create and generate metadata for your research data and
  datasets throughout your research lifecycle to preserve the data in the
  long run
o Consider what information is needed for the data to be
  read and interpreted in the future
o Understand your funder's requirements for data
  documentation and metadata. Funder requirements for NSF,
  GBMF, IMLS, NEH, NIH, and NOAA can be found at
  https://dmptool.org/guidance
o Consult available metadata standards in your field. You may
  refer to Common Metadata Standards and Domain Specific
  Metadata Standards for details
o Describe data and datasets created in your research lifecycle, and
  use software programs and tools to assist in data documentation.
  Assign or capture administrative, descriptive, technical, structural,
  and preservation metadata for the data. Some potential information
  to document:
  o Descriptive metadata
    o Name of creator of data set
    o Name of author of document
    o Title of document
    o File name
    o Location of file
    o Size of file
  o Structural metadata
    o File relationships (e.g., child, parent)
  o Technical metadata
    o Format (e.g., text, SPSS, Stata, Excel, tiff, mpeg, 3D, Java, FITS, CIF)
    o Compression or encoding algorithms
    o Encryption and decryption keys
    o Software (including release number) used to create or update the data
    o Hardware on which the data were created
    o Operating systems in which the data were created
    o Application software in which the data were created
  o Administrative metadata
    o Information about data creation (e.g., date)
    o Information about subsequent updates, transformation, versioning,
      summarization
    o Descriptions of migration and replication
    o Information about other events that have affected the files
  o Preservation metadata
    o File format (e.g., txt, pdf, doc, rtf, xls, xml, spv, jpg, fits)
    o Significant properties
    o Technical environment
    o Fixity information
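Several of the items above (file name, location, size, format, modification date, fixity) can be captured automatically rather than typed by hand. The sketch below assumes SHA-256 as the fixity checksum and uses invented dictionary keys; neither choice comes from the slides, and a repository may require a different algorithm or field names.

```python
import hashlib
import os
import tempfile
from datetime import datetime, timezone


def describe_file(path):
    """Collect a few descriptive, technical, and fixity values for one file."""
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large data files do not need to fit in memory
        for chunk in iter(lambda: handle.read(65536), b""):
            sha256.update(chunk)
    stat = os.stat(path)
    return {
        "file_name": os.path.basename(path),
        "location": os.path.dirname(os.path.abspath(path)),
        "size_bytes": stat.st_size,
        "format": os.path.splitext(path)[1].lstrip(".").lower(),
        "last_modified": datetime.fromtimestamp(
            stat.st_mtime, tz=timezone.utc
        ).isoformat(),
        "fixity_sha256": sha256.hexdigest(),
    }


# Demonstration on a throwaway file with placeholder content
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "6124int001.txt")
    with open(demo_path, "w", encoding="utf-8") as out:
        out.write("Interview transcript placeholder\n")
    for key, value in describe_file(demo_path).items():
        print(f"{key}: {value}")
```

Recomputing the checksum later and comparing it with the stored value shows whether a file has changed since it was documented.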
o Adopt a thesaurus in your field, if applicable, or compile a data dictionary for
  your dataset
o Obtain persistent identifiers (e.g., DOI, PURL) for datasets, if possible, to ensure
  data can be found in the future
o For your full data management plan, visit the UCF Libraries Data Management
  Guide. Also refer to the Digital Curation Centre's Checklist for a Data
  Management Plan (http://www.dcc.ac.uk/sites/default/files/documents/resource/DMP_Checklist_2013.pdf)
o Common Metadata Standards
o Disciplinary Metadata Standards
o Activity: Choose a dataset or a standard in your field to examine and critique
  o Social Science Dataset
  o Humanities Dataset
  o Biological Sciences Dataset
  o Biotechnology Dataset
  o Geospatial Dataset
  o Earth Science Dataset
  o Physical Science Dataset
  o Other…
o Dublin Core (DC): A general metadata standard for describing a wide range of
  digital resources
  o Dublin Core Metadata Element Set, Version 1.1
    (http://dublincore.org/documents/dces/)
    o 15 Elements: Title, Creator, Subject or keyword, Description, Publisher, Contributor, Date, Type, Format,
      Identifier, Source, Language, Relation, Coverage, Rights
  o DCMI Metadata Terms (http://dublincore.org/documents/dcmi-terms/)
  o DC Qualifiers (http://dublincore.org/documents/usageguide/qualifiers.shtml)
o Encoded Archival Description (EAD)
  o A standard for encoding archival finding aids with XML
o Government Information Locator Service (GILS)
  o The Global Information Locator Service defines a core element set for government
    information so that it can be more searchable and discoverable by the general public
o ONIX for Books (ONline Information eXchange)
  o An international standard for representing and communicating book industry product
    information
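To make the Dublin Core element set concrete, the following sketch builds a small record with Python's standard library. It assumes the 15-element set of Dublin Core 1.1 and its namespace `http://purl.org/dc/elements/1.1/`; the wrapper element `record` and the function name are hypothetical, and the sample values describe this presentation itself.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core 1.1 element namespace
DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]


def dublin_core_record(values):
    """Build a flat record; `values` maps an element name to a list of strings."""
    unknown = sorted(set(values) - set(DC_ELEMENTS))
    if unknown:
        raise ValueError(f"Not Dublin Core elements: {unknown}")
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")  # hypothetical wrapper, not part of Dublin Core
    for name in DC_ELEMENTS:
        # Every element is optional and repeatable
        for value in values.get(name, []):
            ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return record


record = dublin_core_record({
    "title": ["Data documentation & metadata"],
    "creator": ["Deng, Sai"],
    "date": ["2014-11-19"],
})
print(ET.tostring(record, encoding="unicode"))
```

Rejecting unknown element names early catches typos such as `author` in place of `creator` before a record is shared.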
o MARC Code List for Countries: http://www.loc.gov/marc/countries/
o MARC Code List for Languages: http://www.loc.gov/marc/languages/
o MARC Source Codes for Vocabularies, Rules, and Schemes:
  http://www.loc.gov/marc/sourcecode/form/formsource.html
o For digital and online resources
  o Internet Media Types: www.iana.org/assignments/media-types/index.html
  o MODS Note Types: http://www.loc.gov/standards/mods/mods-notes.html
  o DCMI Type Vocabulary: http://dublincore.org/documents/dcmi-terms/index.shtml#H7
o Subject Thesauri and Ontologies
o AGROVOC (Food and Agriculture Organization of the United Nations Vocabulary)
o Astronomy Thesaurus
o CAB Thesaurus (for life sciences, technology, and social sciences)
o CIF dictionaries (for Physics)
o Eurovoc (European Union Thesaurus)
o Ethnographic Thesaurus
o Gene Ontology
o GeoNames
o Getty Institute Art and Architecture Thesaurus Online
o Getty Institute Thesaurus of Geographic Names
o ICD (International Classification of Diseases)
o Library of Congress Authorities for subject headings
o Library of Congress Thesaurus for Graphic Materials
o Logical Observation Identifiers Names and Codes (LOINC)
o MeSH (Medical Subject Headings)
o Public Health Language
o Rare Books and Manuscripts Section (RBMS) Controlled Vocabularies
o RxNorm (for drugs)
o SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms)
o STW Thesaurus for Economics
o UNBIS Thesaurus
o UNESCO Thesaurus
o USDA National Agricultural Library Agriculture Thesaurus
Question: Have you ever used thesauri in your study and research?
Getty Union List of Artist Names (ULAN)
The ULAN includes proper names and associated information about artists.
Artists may be either individuals (persons) or groups of individuals working
together (corporate bodies). Artists in the ULAN generally represent creators
involved in the conception or production of visual arts and architecture.
Library of Congress Name Authority File (LCNAF)
The LCNAF provides authoritative data for names of persons, organizations,
events, places, and titles.
Virtual International Authority File (VIAF)
The VIAF™ (Virtual International Authority File) combines multiple name
authority files into a single OCLC-hosted name authority service. The goal of
the service is to lower the cost and increase the utility of library authority
files by matching and linking widely-used authority files and making that
information available on the Web.
Web Ontology Language (OWL)
The OWL 2 Web Ontology Language is an ontology language for the Semantic Web
with formally defined meaning. OWL 2 ontologies provide classes, properties,
individuals, and data values and are stored as Semantic Web documents. OWL 2
ontologies can be used along with information written in RDF, and OWL 2
ontologies themselves are primarily exchanged as RDF documents.
MADS/RDF
The Metadata Authority Description Schema (MADS) is an XML schema for an
element set that may be used to provide metadata about authorized forms of
agents (people, organizations), events, and terms (topics, geographics, genres,
etc.). MADS/RDF builds on MADS/XML as a knowledge organization system.
Resource Description Framework (RDF)
RDF is a standard model for data interchange on the Web. RDF extends the
linking structure of the Web to use URIs to name the relationship between
things as well as the two ends of the link (this is usually referred to as a
“triple”). Using this simple model, it allows structured and semi-structured
data to be mixed, exposed, and shared across different applications.
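The "triple" idea can be shown without any RDF library. The sketch below models a triple as a plain Python tuple and prints it in an N-Triples-like line format. The `example.org` URIs are hypothetical, the Dublin Core terms namespace is used only as a familiar source of predicate names, and the helper handles just the simple cases (URIs and plain quoted literals).

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the thing being described
    predicate: str  # URI naming the relationship
    obj: str        # URI of another thing, or a quoted literal


DCT = "http://purl.org/dc/terms/"
DATASET = "http://example.org/dataset/1"  # hypothetical URI
PERSON = "http://example.org/person/1"    # hypothetical URI

triples = [
    Triple(DATASET, DCT + "title", '"Example dataset"'),
    Triple(DATASET, DCT + "creator", PERSON),
]


def to_ntriples_line(triple):
    """Format one triple; literals start with a double quote, URIs do not."""
    if triple.obj.startswith('"'):
        obj = triple.obj
    else:
        obj = f"<{triple.obj}>"
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."


def objects_of(all_triples, subject, predicate):
    """Return every object linked to `subject` through `predicate`."""
    return [
        t.obj for t in all_triples
        if t.subject == subject and t.predicate == predicate
    ]


for triple in triples:
    print(to_ntriples_line(triple))
print(objects_of(triples, DATASET, DCT + "creator"))
```

Because the predicate is itself a URI, two applications that agree on that URI can merge their triples without first agreeing on a shared table layout.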
SKOS: Simple Knowledge Organization for the Web
SKOS is a W3C recommendation designed for representation of thesauri,
classification schemes, taxonomies, subject-heading systems, or any other
type of structured controlled vocabulary.
Linked data examples
• FAST: Faceted Application of Subject Terminology
• Dewey Decimal Classification
• Open Metadata Registry (RDA vocabularies)
• Library of Congress Linked Data Service
…
OpenRefine (ex-Google Refine) is a powerful tool for working with messy data: cleaning it, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase.
http://openrefine.org
Nesstar Publisher is a free advanced data management program. It can be used for the preparation of data and metadata. It is DDI-compliant.
http://www.nesstar.com/software/publisher.html
QualAnon: DSDR Qualitative Data Anonymizer
This free transcript anonymization tool is designed solely to de-identify qualitative interview transcripts.
https://www.icpsr.umich.edu/icpsrweb/DSDR/tools/anonymize.jsp
Colectica for Microsoft Excel
A free tool to document your spreadsheet data using the Data Documentation Initiative (DDI) metadata format, the open standard for data documentation.
http://www.colectica.com/software/colecticaforexcel
Schematron is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees. It is a structural schema language expressed in XML using a small number of elements and XPath.
http://xml.ascc.net/resource/schematron/schematron.html
Altova XMLSpy is an advanced XML editor for modeling, editing, transforming, and debugging XML-related technologies.
http://www.altova.com/xmlspy.html
<oXygen/> XML Editor is an XML tool that supports all the XML schema languages. The XSLT and XQuery support is enhanced with powerful debuggers and performance profilers. You can use <oXygen/> XML Editor to work with all XML-based technologies, including XML databases, XProc pipelines, and web services.
http://www.oxygenxml.com
LabTrove is a free blogging platform specifically designed for use in a research environment. It aims to serve as a highly flexible electronic notebook and data management system. By integrating with a lab's data-producing instruments, researchers can describe an experiment and associate it with its data output at the time of capture, rather than annotating after the fact.
http://www.labtrove.org
Kepler is a scientific workflow modeling and management system that enables users, regardless of programming experience, to set up data analysis pipelines. The software will assemble, execute, and document the … of services and scripts that scientists with large-scale data use to execute research.
https://kepler-project.org
DataCite
The DataCite Consortium provides a number of services to support efforts at increasing the ease and prevalence of data citation.
http://www.datacite.org
DMPTool is an online service to enable researchers to create data management plans, now required by many funding agencies, and to receive tailored institutional guidance to help them in the process.
https://dmp.cdlib.org
o Section II addresses data documentation more from the
  researcher's view
o Section III interprets data documentation more from
  a curator's or librarian's perspective
o What do researchers really care about?
o Will each party see the other side's points and
  emphases?
Create, edit, share, and save data management plans
Open access scholarly publishing services: papers, journals, books, seminars & more
Curation repository: store, manage, and share research data
Create and manage persistent identifiers
Open source add-in for Microsoft Excel as a data collection tool
An infrastructure to publish and get credit for sharing research data
CDL Curation and Publishing Services
http://www.cdlib.org
This slide is by Joan Starr, California Digital Library: http://www.slideshare.net/joanstarr/dataset-metadata-tools-approaches-for-access-preservation?from_search=1
Data Publication
http://library.ucf.edu/ScholarlyCommunication/UCFResearchLifecycle.pdf
Data Set Related Services
o “Data Set (also called ‘Dataset’) Metadata” provides
  researchers consultation on:
  o Project and dataset documentation
  o Metadata standards (Common and Domain Specific)
  o Metadata schemas customization
  o Controlled vocabularies and thesauri
  o Data curation tools and practices
o Assists in describing basic properties of your data and enriching
  metadata for your datasets
o Supports applying controlled vocabularies or optimizing keywords
  to enhance the search of your datasets
o Helps to prepare your metadata and data for deposit and
  preservation
o Scholarly Communication (http://library.ucf.edu/ScholarlyCommunication/)
o SC Contact Information (http://library.ucf.edu/ScholarlyCommunication/Contact.php)
o UCF Library Research Guides (http://guides.ucf.edu)
o Metadata Guide (http://guides.ucf.edu/metadata)
o Data Management Guide (http://guides.ucf.edu/data)
o Research and Information Services (http://library.ucf.edu/Reference)
Abstract: United States Geological Survey, Saint Petersburg, Florida, Center for Coastal and Watershed
Studies…
Purpose: These data and information are intended for science researchers, students…
Language: eng; USA
Citation
  Title: Biological data of field activity 08CRD01 (B-1-08-VI) in US Virgin Islands from 05/30/2008 to 06/13/2008
  Date
    Date: 2013-03-03
    Date Type: Publication Date
  Organisation Name: US Geological Survey (USGS) <http://www.usgs.gov>, Coastal and Marine Geology
  (CMG) <http://walrus.wr.usgs.gov>
  Role: Publisher
  Contact Info: …
Point Of Contact: …
Representation Type: Vector
Topic Category
Keyword Collection
  Keyword: EARTH SCIENCE > OCEANS
  Associated Thesaurus: Global Change Master Directory (GCMD)
  Keyword: Marine Geology
  Associated Thesaurus: USGS CMG InfoBank
Spatial Extent
  West Bounding Longitude: -65.75000
  East Bounding Longitude: -63.25000
  North Bounding Latitude: 18.75000
  South Bounding Latitude: 17.25000
FGDC/CSDGM Metadata
Constraints: Please recognize the US Geological Survey (USGS) as the source of this information. Physical materials are under controlled on-site access. Some USGS information accessed through this means may be preliminary in nature and presented without the approval of the Director of the USGS…
Legal Constraints
  Use Constraints: Other Restrictions
  Other Constraints: Use Constraints: Please recognize the US Geological Survey (USGS) as the source of this information. Physical materials are under controlled on-site access…
…
Distribution
  Distribution Format
    Format Name: ASCII
    Format Version:
    File Decompression Technique: No compression applied
o XML schema brings documentation into a single document, creates
  structured content about the data, and allows data interoperability and
  sharing
oIt can document comprehensive variable level information such as basic
data dictionary question text and question routing instructions
oData Documentation Initiative (DDI) a metadata specification for the
social and behavioral sciences It is an XML metadata standard for
documenting numeric data Detailed information is available
at httpwwwddiallianceorg
oProjects using the DDI (httpwwwddiallianceorgddi-at-workprojects)
oDDI-compliant data repository
o ICPSR - Inter-university Consortium for Political and Social Research
o Data deposit form httpswwwicpsrumicheducgi-binddf2
o UCF is a member of ICPSR
oUKDA - UK Data Archive
Field Labels
TitlePrincipal investigator(s)
Summary
Access notes
Dataset(s)
httpwwwicpsrumicheduicpsrwebNA
CJDstudies20363archive=NACJDampq=22
university+of+central+florida22amppermit
5B05D=AVAILABLEampx=-999ampy=-84
ICPSR Interuniversity
Consortium for
Political and
Social Research
Dataset(s)
DSO Study-Level Files
Documentation
Questionnairepdf
User guidepdf
DS1 Female Interviews
Documentation
Codebookpdf
hellip
Field Labels
Study description
Citation
Funding
Scope of studybull Subject terms
bull Smallest
geographic unit
bull Geographic
coverage
bull Time period
bull Date of collection
bull Unit of
observation
bull Universe
bull Data types
bull Data collection
notes
Methodologybull Study purpose
bull Study design
Field Labels
bull Sample
bull Mode of data collection
bull Description of variables
bull Response rates
bull Presence of common
scales
bull Extent of processing
Field Labels
Version(s)
Related publications
Variables
Utilities
bull Metadata exports
bull Download statistics
Variables
List all 1682 variables in this study
egID QUESTIONNAIRE ID NUMBER ISEX INTERVIEWER GENDER START INTERVIEW START TIME HHMM USE 24 HR CLOCK Q1A COUNTRY OF BIRTH Q1B STATE OF BIRTH - INITIALS OF STATEQ1C CITY OF BIRTH WRITE IN NOT APPQ1D YEARS LIVED IN USAQ1E RESIDENCY STATUSCHECK1 CHECKPOINT 1 BORN IN SAME METRO AREAQ2 HOW LONG LIVED IN THIS AREA hellip (httpwwwicpsrumicheduicpsrwebNACJDssv
dstudies20363variables)
httpwwwicpsrumicheduicpsrwebICPSRddi2studies20363
docDscrThe Document
Description
consists of
bibliographic
information
describing the
DDI-compliant
document
itself as a
whole
Included Fields
citation
bull titleStmt
bull prodStmt
bull verStmt
bull holdings
Included FieldsCitation
titlStmt
rspStmt
prodStmt
fundAg
grantNo
distStmt
biblCit
Holdings
stdyInfoSubject
Abstract
sumDscr
MethoddataColl
Notes
anlyInfo
dataAccssetAvail
useStmt
stdyDscr The Study
Description consists of
information about the
data collection study
or compilation that the
DDI-compliant
documentation file
describes This section
includes information
about how the study
should be cited who
collected or compiled
the data who
distributes the data
keywords about the
content of the data
summary (abstract) of
the content of the data
data collection methods
and processing etc
Included Fields
fileDscr
fileTxt
fileName
fileDscr
Data Files
Description
Information about
the data file(s)
that comprises a
collection This
section can be
repeated for
collections with
multiple files
oContext and participant details of interviews can be
oA descriptive header or summary page in transcripts or
field notes
oA structured data list
oXML mark-up of data for example
oText Encoding Initiative (TEI) to mark up interview
transcript
oQualitative Data Exchange Format (QuDEx) for
researcher annotations and data linking
oAnonymisation of textual data (eg replacing real names of people
organizations and locations with pseudonyms)
oFile naming
oMeaningful short names identify file types (eg interviews focus groups
field notes audio recordings) avoid space special characters avoid long
names
oOrganizing files in folders Create uniform and structured folder names based
on cases studies locations data types etc or the original anonymized
coded or annotated versions of data
oVersion control Version numbering in file names
oDocumentation Methodology description project plan interview guidelines
consent form templates data analyses and manipulation
o Example is from A NESSTAR FOR QUALITATIVE DATA BUILDING BLOCKS FOR DIGITAL FUTURES By Corti Louise et al available at httpdata-archiveacukmedia376907digitalfutures_dashish_21nov2012pdf
oData List
Interview ID
x001
x002
hellip
Text File Name
6124int001
6124int002
hellip
oCreate and generate metadata for your research data and
datasets in your research lifecycle to preserve the data in the
long run
oConsider what information is needed for the data to be
read and interpreted in the future
oUnderstand your funder requirements for data
documentation and metadata Funder requirements for NSF
GBMF IMLS NEH NIH and NOAA can be found at
httpsdmptoolorgguidance
oConsult available metadata standards in your field You may
refer to Common Metadata Standards and Domain Specific
Metadata Standards for details
oDescribe data and datasets created in your research lifecycle and
use software programs and tools to assist in data documentation
Assign or capture administrative descriptive technical structural
and preservation metadata for the data Some potential information
to document
oDescriptive metadata
oName of creator of data set
oName of author of document
oTitle of document
oFile name
oLocation of file
oSize of file
oStructural metadata
oFile relationships (eg child parent)
oTechnical metadata
oFormat (eg text SPSS Stata Excel tiff mpeg 3D Java FITS CIF)
oCompression or encoding algorithms
oEncryption and decryption keys
oSoftware (including release number) used to create or update the data
oHardware on which the data were created
oOperating systems in which the data were created
oApplication software in which the data were created
oAdministrative metadata
o Information about data creation (eg date)
o Information about subsequent updates transformation versioning
summarization
oDescriptions of migration and replication
o Information about other events that have affected the files
oPreservation metadata
oFile format (eg txt pdf doc rtf xls xml spv jpg fits)
oSignificant properties
oTechnical environment
oFixity information
oAdopt a thesauri in your field if applicable or compile a data dictionary for
your dataset
oObtain persistent identifiers (eg doi purl) for datasets if possible to ensure
data can be found in the future
oFor your full data management plan visit UCF Libraries Data Management
Guide Also refer to Digital Curation Centrersquos Checklist for a Data
Management Plan (httpwwwdccacuksitesdefaultfilesdocumentsresourceDMP_Checklist_2013pdf)
oCommon Metadata Standards
oDisciplinary Metadata Standards
oActivity Choose a dataset or a standard in your field to examine and critique
oSocial Science Dataset
oHumanities Dataset
oBiological Sciences Dataset
oBiotechnology Dataset
oGeospatial Dataset
oEarth Science Dataset
oPhysical Science Dataset
oOtherhellip
oDublin Core (DC) A general metadata standard for describing a wide range of
digital resources
o Dublin Core Metadata Element Set Version 11
(httpdublincoreorgdocumentsdces)
o 15 Elements Title Creator Subject or keyword Description Publisher Type Format
Identifier Source Language Relation Coverage Rights
o DCMI Metadata Terms (httpdublincoreorgdocumentsdcmi-terms)
o DC Qualifiers (httpdublincoreorgdocumentsusageguidequalifiersshtml)
o Encoded Archival Description (EAD)
o A standard for encoding archival finding aids with XML
oGovernment Information Locator Service (GILS)
o The Global Information Locator Service defines a core element set for government
information so that it can be more searchable and discoverable by the general public
oONIX for Books (ONline Information eXchange)
o An international standard for representing and communicating book industry product
oMARC Code List for Countries httpwwwlocgovmarccountries
oMARC Code List for Languages httpwwwlocgovmarclanguages
oMARC Source Codes for Vocabularies Rules and Schemes
httpwwwlocgovmarcsourcecodeformformsourcehtml
oFor digital and online resources
oInternet Media Types wwwianaorgassignmentsmedia-
typesindexhtml
oMODS Note Types httpwwwlocgovstandardsmodsmods-
noteshtml
oDCMI Type Vocabulary httpdublincoreorgdocumentsdcmi-
termsindexshtmlH7
o Subject Thesauri and Ontologies
o AGROVOC (Agricultural Organization of the United Nations Vocabulary)
o Astronomy Thesaurus
o CAB Thesaurus (for life sciences technology and social sciences)
o CIF dictionaries (for Physics)
o Eurovoc (European Union Thesaurus)
o Ethnographic Thesaurus
o Gene Ontology
o GeoNames
o Getty Institute Art and Architecture Thesaurus Online
o Getty Institute Thesaurus of Geographic Names
o ICD (International Classification of Diseases)
o Library of Congress Authorities for subject headings
o Library of Congress Thesaurus for Graphic Materials
o Logical Observation Identifiers Names and Codes (LOINC)
o MESH (Medical Subject Headings)
o Public Health Language
o Rare Books and Manuscripts Section (RBMS) Controlled Vocabularies
o RxNorm (for drugs)
o SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms)
o STW Thesaurus for Economics
o UNBIS Thesaurus
o UNESCO Thesaurus
o USDA National Agricultural Library Agriculture Thesaurus
Question Have you ever
used thesauri in your study
and research
Getty Union List of Artist Names
(ULAN)The ULAN includes proper names and
associated information about artists
Artists may be either individuals
(persons) or groups of individuals working
together (corporate bodies) Artists in
the ULAN generally represent creators
involved in the conception or production
of visual arts and architecture
Library of Congress Name
Authority File (LCNAF)
The LCNAF provides authoritative
data for names of persons
organizations events places and
titles
Virtual International
Authority File (VIAF)
The VIAFtrade (Virtual International
Authority File) combines multiple
name authority files into a single
OCLC-hosted name authority
service The goal of the service is to
lower the cost and increase the
utility of library authority files by
matching and linking widely-used
authority files and making that
information available on the Web
Web Ontology Language
(OWL)The OWL 2 Web Ontology Language is an
ontology language for the Semantic Web
with formally defined meaning OWL 2
ontologies provide classes properties
individuals and data values and are stored
as Semantic Web documents OWL 2
ontologies can be used along with
information written in RDF and OWL 2
ontologies themselves are primarily
exchanged as RDF documents
MADSRDFThe Metadata Authority Description
Schema (MADS) is an XML schema for an
element set that may be used to provide
metadata about authorized forms of
agents (people organizations) events
and terms (topics geographics genres
etc) MADSRDF
builds on MADSXML as a knowledge
organization system
Resource Description
Framework (RDF)RDF is a standard model for data
interchange on the Web RDF extends
the linking structure of the Web to use
URIs to name the relationship
between things as well as the two
ends of the link (this is usually
referred to as a ldquotriplerdquo) Using this
simple model it allows structured and
semi-structured data to be mixed
exposed and shared across different
applications
SKOS Simple Knowledge
Organization for the Web SKOS is a W3C recommendation
designed for representation of
thesauri classification
schemes taxonomies subject-
heading systems or any other
type of structured controlled
vocabularyLinked data
examplesbull FAST Faceted
Application of
Subject
Terminology
bull Dewey Decimal
Classification
bull Open Metadata
Registry (RDA
vocabularies)
bull Library of Congress
Linked Data
Service
hellip
OpenRefine (ex-Google Refine) is a powerful tool for working with messy data cleaning it transforming it from one format into another extending it with web services and linking it to databases like Freebasehttpopenrefineorg
Nesstar Publisher is a
free advanced data management program It can be used for the preparation of data and metadata Its DDI complianthttpwwwnesstarcomsoftwarepublisherhtml
QualAnon DSDR
Qualitative Data Anonymizer
This free transcript anonymizationtool is designed solely to de-identify qualitative interview transcriptshttpswwwicpsrumicheduicpsrwebDSDRtoolsanonymizejsp
Colectica for Microsoft Excel
A free tool to document your spreadsheet data using the Data Documentation Initiative (DDI) metadata format the open standard for data documentationhttpwwwcolecticacomsoftwarecolecticaforexcel
Schematron is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees It is a structural schema language expressed in XML using a small number of elements and XPathhttpxmlasccnetresourceschematronschematronhtml
Altova XMLSpy is an advanced XML editor for modeling editing transforming and debugging XML-related
technologieshttpwwwaltovacomxmlspy
html
ltoXygengt XML
Editor is an XML tool that supports all the XML schema languages The XSLT and XQuery support is enhanced with powerful debuggers and performance profilers You can use ltoXygengt XML Editor to work with all XML-based technologies including XML databases XProcpipelines and web serviceshttpwwwoxygenxmlcom
LabTrove is a free blogging
platform specifically designed for use in a research environment It aims to serve as a highly flexible electronic notebook and data management system by integrating with a labrsquos data-producing instruments researchers can describe an experiment and associate it with its data output at the time of capture rather than annotating after the fact httpwwwlabtroveorg
Kepler is a scientific workflow
modeling and management system that enables users regardless of programming experience to set up data analysis pipelines The software will assemble execute and document theof services and scripts that scientists with large-scale data use to execute researchhttpskepler-projectorg
DataCiteThe DataCite Consortium
provides a number of
services to support
efforts at increasing the
ease and prevalence of
data citationhttpwwwdataciteorg
DMPTool is an online service to enable researchers to create data management plans now required by many funding agencies and to receive tailored institutional guidance to help them in the processhttpsdmpcdliborg
oSection II addresses data documentation more from the
researcherrsquos view
oSection III interprets data documentation more from
a curator or librarians perspective
oWhat do researchers really care about
oWill each party see the other sidersquos points and
emphases
Create edit share and save
data management plans
Open access scholarly publishing services
papers journals books seminars amp more
Curation repository store manage and share research data
Create and manage
persistent identifiers
Open source add-in for Microsoft
Excel as a data collection tool
An infrastructure to publish and get credit
for sharing research data
CDL Curation and Publishing Services
httpwwwcdliborg
This slide is by Joan Starr California Digital Library httpwwwslidesharenetjoanstarrdataset-metadata-tools-approaches-for-access-preservationfrom_search=1
Data Publication
httplibraryucfeduScholarlyCommunicationUCFResearchLifecyclepdfData Set Related Services
oldquoData Set (also called lsquoDatasetrsquo) Metadatardquo provides
researchers consultation on
oProject and dataset documentation
oMetadata standards (Common and Domain Specific)
oMetadata schemas customization
oControlled vocabularies and thesauri
oData curation tools and practices
oAssists in describing basic properties of your data and enriching
metadata for your datasets
oSupports applying controlled vocabularies or optimizing keywords
to enhance the search of your datasets
oHelps to prepare your metadata and data for deposit and
preservation
oScholarly Communication (httplibraryucfeduScholarlyCommunication)
oSC Contact Information (httplibraryucfeduScholarlyCommunicationContactphp)
oUCF Library Research Guides (httpguidesucfedu)
oMetadata Guide (httpguidesucfedumetadata)
oData Management Guide (httpguidesucfedudata)
oResearch and Information Services (httplibraryucfeduReference)
FGDC/CSDGM Metadata
Abstract: United States Geological Survey, Saint Petersburg, Florida, Center for Coastal and Watershed Studies…
Purpose: These data and information are intended for science researchers, students…
Language: eng; USA
Citation
  Title: Biological data of field activity 08CRD01 (B-1-08-VI) in US Virgin Islands from 05/30/2008 to 06/13/2008
  Date
    Date: 2013-03-03
    Date Type: Publication Date
  Organisation Name: US Geological Survey (USGS) <http://www.usgs.gov/>, Coastal and Marine Geology (CMG) <http://walrus.wr.usgs.gov/>
  Role: Publisher
  Contact Info: …
Point Of Contact: …
Representation Type: Vector
Topic Category
Keyword Collection
  Keyword: EARTH SCIENCE > OCEANS
  Associated Thesaurus: Global Change Master Directory (GCMD)
  Keyword: Marine Geology
  Associated Thesaurus: USGS CMG InfoBank
Spatial Extent
  West Bounding Longitude: -65.75000
  East Bounding Longitude: -63.25000
  North Bounding Latitude: 18.75000
  South Bounding Latitude: 17.25000
Constraints: Please recognize the US Geological Survey (USGS) as the source of this information. Physical materials are under controlled on-site access. Some USGS information accessed through this means may be preliminary in nature and presented without the approval of the Director of the USGS…
Legal Constraints
  Use Constraints: Other Restrictions
  Other Constraints: Use Constraints: Please recognize the US Geological Survey (USGS) as the source of this information. Physical materials are under controlled on-site access…
…
Distribution
  Distribution Format
    Format Name: ASCII
    Format Version:
    File Decompression Technique: No compression applied
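The spatial extent in a record like the one above can be sanity-checked mechanically before deposit. The following is a minimal sketch in Python, assuming the four bounding values are decimal degrees (the sample values are read from the record with the decimal point restored, which is an assumption). The function name validate_bounding_box is hypothetical and is not part of FGDC CSDGM or any library.

```python
def validate_bounding_box(west, east, north, south):
    """Return a list of problems found in a lat/long bounding box (empty if none)."""
    problems = []
    # Longitudes must fall within -180..180 degrees.
    for name, value in (("west", west), ("east", east)):
        if not -180.0 <= value <= 180.0:
            problems.append(f"{name} longitude {value} is outside -180..180")
    # Latitudes must fall within -90..90 degrees.
    for name, value in (("north", north), ("south", south)):
        if not -90.0 <= value <= 90.0:
            problems.append(f"{name} latitude {value} is outside -90..90")
    # The southern edge cannot lie north of the northern edge.
    if south > north:
        problems.append("south latitude is greater than north latitude")
    return problems


# Sample values from the record, assumed to be decimal degrees.
bbox = {"west": -65.75, "east": -63.25, "north": 18.75, "south": 17.25}
print(validate_bounding_box(**bbox))
```

The check deliberately does not compare west against east, because a box that crosses the antimeridian can legitimately have a west value greater than its east value.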
o An XML schema brings documentation into a single document, creates structured content about the data, and allows data interoperability and sharing.
o It can document comprehensive variable-level information, such as a basic data dictionary, question text, and question routing instructions.
o Data Documentation Initiative (DDI): a metadata specification for the social and behavioral sciences. It is an XML metadata standard for documenting numeric data. Detailed information is available at http://www.ddialliance.org
o Projects using the DDI (http://www.ddialliance.org/ddi-at-work/projects)
o DDI-compliant data repositories
  o ICPSR - Inter-university Consortium for Political and Social Research
    o Data deposit form: https://www.icpsr.umich.edu/cgi-bin/ddf2
    o UCF is a member of ICPSR
  o UKDA - UK Data Archive
Field Labels
Title
Principal investigator(s)
Summary
Access notes
Dataset(s)
http://www.icpsr.umich.edu/icpsrweb/NACJD/studies/20363?archive=NACJD&q=%22university+of+central+florida%22&permit%5B0%5D=AVAILABLE&x=-999&y=-84
ICPSR: Inter-university Consortium for Political and Social Research
Dataset(s)
DS0: Study-Level Files
  Documentation
    Questionnaire.pdf
    User guide.pdf
DS1: Female Interviews
  Documentation
    Codebook.pdf
…
Field Labels
Study description
Citation
Funding
Scope of study
  • Subject terms
  • Smallest geographic unit
  • Geographic coverage
  • Time period
  • Date of collection
  • Unit of observation
  • Universe
  • Data types
  • Data collection notes
Methodology
  • Study purpose
  • Study design
  • Sample
  • Mode of data collection
  • Description of variables
  • Response rates
  • Presence of common scales
  • Extent of processing
Version(s)
Related publications
Variables
Utilities
  • Metadata exports
  • Download statistics
Variables
List all 1682 variables in this study, e.g.:
  ID        QUESTIONNAIRE ID NUMBER
  ISEX      INTERVIEWER GENDER
  START     INTERVIEW START TIME HHMM USE 24 HR CLOCK
  Q1A       COUNTRY OF BIRTH
  Q1B       STATE OF BIRTH - INITIALS OF STATE
  Q1C       CITY OF BIRTH WRITE IN NOT APP
  Q1D       YEARS LIVED IN USA
  Q1E       RESIDENCY STATUS
  CHECK1    CHECKPOINT 1 BORN IN SAME METRO AREA
  Q2        HOW LONG LIVED IN THIS AREA
  …
(http://www.icpsr.umich.edu/icpsrweb/NACJD/ssvd/studies/20363/variables)
http://www.icpsr.umich.edu/icpsrweb/ICPSR/ddi2/studies/20363
docDscr: The Document Description consists of bibliographic information describing the DDI-compliant document itself as a whole.
Included Fields
citation
  • titlStmt
  • prodStmt
  • verStmt
  • holdings
stdyDscr: The Study Description consists of information about the data collection, study, or compilation that the DDI-compliant documentation file describes. This section includes information about how the study should be cited, who collected or compiled the data, who distributes the data, keywords about the content of the data, a summary (abstract) of the content of the data, data collection methods and processing, etc.
Included Fields
Citation
  titlStmt
  rspStmt
  prodStmt
  fundAg
  grantNo
  distStmt
  biblCit
  Holdings
stdyInfo
  Subject
  Abstract
  sumDscr
Method
  dataColl
  Notes
  anlyInfo
dataAccs
  setAvail
  useStmt
fileDscr: Data Files Description. Information about the data file(s) that comprise a collection. This section can be repeated for collections with multiple files.
Included Fields
fileDscr
  fileTxt
    fileName
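To make the three sections concrete, here is a minimal sketch in Python that assembles a simplified codebook skeleton with docDscr, stdyDscr, and fileDscr. It is an illustration under stated assumptions, not a schema-valid DDI instance: the element names titl, dataDscr, var, and labl do not appear on the slides and come from the DDI Codebook vocabulary as I understand it, so check them against the published schema. The helper names, the study title, and the file name are hypothetical. Namespaces and required attributes are omitted.

```python
import xml.etree.ElementTree as ET


def add_path(parent, path, text=None):
    """Create nested child elements along a slash-separated path; return the last one."""
    node = parent
    for tag in path.split("/"):
        node = ET.SubElement(node, tag)
    if text is not None:
        node.text = text
    return node


def build_codebook(title, file_name, variables):
    """Build a simplified codebook tree; `variables` maps variable name to label."""
    root = ET.Element("codeBook")
    # docDscr: describes the documentation file itself.
    add_path(root, "docDscr/citation/titlStmt/titl", f"Documentation for: {title}")
    # stdyDscr: describes the study that produced the data.
    add_path(root, "stdyDscr/citation/titlStmt/titl", title)
    # fileDscr: describes one data file; a real codebook may repeat this section.
    add_path(root, "fileDscr/fileTxt/fileName", file_name)
    # Variable-level documentation (name plus label) for each variable.
    data_dscr = ET.SubElement(root, "dataDscr")
    for name, label in variables.items():
        var = ET.SubElement(data_dscr, "var", name=name)
        ET.SubElement(var, "labl").text = label
    return root


# Hypothetical title and file name; variable labels are taken from the slide.
codebook = build_codebook(
    "Example study",
    "example_data.txt",
    {"ID": "QUESTIONNAIRE ID NUMBER", "Q1A": "COUNTRY OF BIRTH"},
)
print(ET.tostring(codebook, encoding="unicode"))
```

In practice a tool such as Nesstar Publisher or Colectica would generate this structure, and the result would be validated against the DDI schema rather than built by hand.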
o Context and participant details of interviews can be documented in:
  o A descriptive header or summary page in transcripts or field notes
  o A structured data list
  o XML mark-up of data, for example:
    o Text Encoding Initiative (TEI) to mark up interview transcripts
    o Qualitative Data Exchange Format (QuDEx) for researcher annotations and data linking
o Anonymisation of textual data (e.g., replacing real names of people, organizations, and locations with pseudonyms)
o File naming
  o Use meaningful short names; identify file types (e.g., interviews, focus groups, field notes, audio recordings); avoid spaces and special characters; avoid long names
o Organizing files in folders: create uniform and structured folder names based on cases, studies, locations, data types, etc., or on the original, anonymized, coded, or annotated versions of data
o Version control: version numbering in file names
o Documentation: methodology description, project plan, interview guidelines, consent form templates, data analyses and manipulation
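The anonymisation step listed above can be sketched in a few lines of Python. This is a minimal illustration, assuming the researcher maintains a lookup table of real names and their pseudonyms; the function name pseudonymise and the sample names are hypothetical. It only replaces exact, whole-word matches, so it will not catch misspellings, nicknames, or indirect identifiers, and it is not a description of how any particular anonymisation tool works.

```python
import re


def pseudonymise(text, mapping):
    """Replace each real name in `mapping` with its pseudonym, matching whole words only."""
    if not mapping:
        return text
    # Try longer names first so "Jane Smith" is replaced before "Jane".
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(name) for name in names) + r")\b")
    return pattern.sub(lambda match: mapping[match.group(0)], text)


# Hypothetical lookup table kept separately from the anonymised transcripts.
pseudonyms = {
    "Jane Smith": "Participant A",
    "Orlando": "City X",
    "Acme Corp": "Organisation 1",
}
print(pseudonymise("Jane Smith moved to Orlando to work for Acme Corp.", pseudonyms))
```

The lookup table itself identifies participants, so it should be stored securely and apart from the files that are shared.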
o Data List
  Interview ID    Text File Name
  x001            6124int001
  x002            6124int002
  …               …
o Example is from "A Nesstar for Qualitative Data: Building Blocks for Digital Futures" by Louise Corti et al., available at http://data-archive.ac.uk/media/376907/digitalfutures_dashish_21nov2012.pdf
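A data list with consistent file names is easy to generate rather than type by hand. The sketch below is a minimal Python illustration that assumes the pattern in the example is a study number (6124), the type code "int" for interview, and a three-digit sequence number; that reading of the pattern is an inference from the two sample rows. The function names and the "_v01" style version suffix are hypothetical.

```python
def transcript_file_name(study_number, interview_number, version=None):
    """Build a short, space-free file name such as 6124int001 (optionally with a version)."""
    name = f"{study_number}int{interview_number:03d}"
    if version is not None:
        # Hypothetical convention: two-digit version number after an underscore.
        name += f"_v{version:02d}"
    return name


def build_data_list(study_number, count):
    """Return (interview ID, text file name) pairs for interviews 1..count."""
    return [
        (f"x{number:03d}", transcript_file_name(study_number, number))
        for number in range(1, count + 1)
    ]


for interview_id, file_name in build_data_list(6124, 2):
    print(interview_id, file_name)
```

Zero-padded numbers keep the files in the right order when a folder is sorted alphabetically.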
o Create and generate metadata for your research data and datasets throughout your research lifecycle to preserve the data in the long run.
o Consider what information is needed for the data to be read and interpreted in the future.
o Understand your funder's requirements for data documentation and metadata. Funder requirements for NSF, GBMF, IMLS, NEH, NIH, and NOAA can be found at https://dmptool.org/guidance
o Consult available metadata standards in your field. You may refer to Common Metadata Standards and Domain Specific Metadata Standards for details.
o Describe data and datasets created in your research lifecycle, and use software programs and tools to assist in data documentation. Assign or capture administrative, descriptive, technical, structural, and preservation metadata for the data. Some potential information to document:
  o Descriptive metadata
    o Name of creator of data set
    o Name of author of document
    o Title of document
    o File name
    o Location of file
    o Size of file
  o Structural metadata
    o File relationships (e.g., child, parent)
  o Technical metadata
    o Format (e.g., text, SPSS, Stata, Excel, tiff, mpeg, 3D, Java, FITS, CIF)
    o Compression or encoding algorithms
    o Encryption and decryption keys
    o Software (including release number) used to create or update the data
    o Hardware on which the data were created
    o Operating systems in which the data were created
    o Application software in which the data were created
  o Administrative metadata
    o Information about data creation (e.g., date)
    o Information about subsequent updates, transformation, versioning, summarization
    o Descriptions of migration and replication
    o Information about other events that have affected the files
  o Preservation metadata
    o File format (e.g., .txt, .pdf, .doc, .rtf, .xls, .xml, .spv, .jpg, .fits)
    o Significant properties
    o Technical environment
    o Fixity information
o Adopt a thesaurus in your field, if applicable, or compile a data dictionary for your dataset.
o Obtain persistent identifiers (e.g., DOI, PURL) for datasets, if possible, to ensure data can be found in the future.
o For your full data management plan, visit the UCF Libraries Data Management Guide. Also refer to the Digital Curation Centre's Checklist for a Data Management Plan (http://www.dcc.ac.uk/sites/default/files/documents/resource/DMP_Checklist_2013.pdf)
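Several of the items above (file name, location, size, format, and fixity information) can be captured automatically rather than typed in. The following is a minimal sketch in Python using only the standard library. The function name describe_file and the dictionary keys are hypothetical, and SHA-256 is my choice of checksum for the fixity value; the slides do not name an algorithm. The creator and title still have to be supplied by the researcher.

```python
import hashlib
import os
import tempfile
from datetime import datetime, timezone


def describe_file(path, creator, title):
    """Collect basic descriptive, technical, and fixity metadata for one file."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large data files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    stat = os.stat(path)
    return {
        "creator": creator,
        "title": title,
        "file_name": os.path.basename(path),
        "location": os.path.abspath(path),
        "size_bytes": stat.st_size,
        # Format is guessed from the extension only; this is a simplification.
        "format": os.path.splitext(path)[1].lstrip(".").lower(),
        "last_modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
        "fixity": {"algorithm": "SHA-256", "value": digest.hexdigest()},
    }


# Demonstration on a throwaway file with hypothetical content and names.
with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "6124int001.txt")
    with open(sample, "w", encoding="utf-8") as handle:
        handle.write("Interview transcript placeholder\n")
    record = describe_file(sample, "Example Researcher", "Interview 001 transcript")
    for key, value in record.items():
        print(key, value)
```

Recomputing the checksum later and comparing it with the stored value is how a fixity check detects that a file has changed or been corrupted.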
o Common Metadata Standards
o Disciplinary Metadata Standards
o Activity: Choose a dataset or a standard in your field to examine and critique
  o Social Science Dataset
  o Humanities Dataset
  o Biological Sciences Dataset
  o Biotechnology Dataset
  o Geospatial Dataset
  o Earth Science Dataset
  o Physical Science Dataset
  o Other…
o Dublin Core (DC): A general metadata standard for describing a wide range of digital resources
  o Dublin Core Metadata Element Set, Version 1.1 (http://dublincore.org/documents/dces/)
    o 15 Elements: Title, Creator, Subject (or keyword), Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights
  o DCMI Metadata Terms (http://dublincore.org/documents/dcmi-terms/)
  o DC Qualifiers (http://dublincore.org/documents/usageguide/qualifiers.shtml)
o Encoded Archival Description (EAD)
  o A standard for encoding archival finding aids with XML
o Government Information Locator Service (GILS)
  o The Global Information Locator Service defines a core element set for government information so that it can be more searchable and discoverable by the general public
o ONIX for Books (ONline Information eXchange)
  o An international standard for representing and communicating book industry product information
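As an illustration of the Dublin Core element set listed above, the sketch below builds a small Dublin Core record in XML with Python's standard library. It is a minimal example under stated assumptions: the wrapper element name "metadata" and the function name dublin_core_record are hypothetical, and the sample values are abbreviated from the USGS record shown earlier in the presentation. The namespace URI is the one published by DCMI for the 1.1 element set.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def dublin_core_record(fields):
    """Build a simple record; `fields` maps element name to a string or list of strings."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"{name!r} is not one of the 15 Dublin Core elements")
        if isinstance(values, str):
            values = [values]
        # Every Dublin Core element is optional and repeatable.
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return root


record = dublin_core_record({
    "title": "Biological data of field activity 08CRD01 (B-1-08-VI) in US Virgin Islands",
    "publisher": "US Geological Survey (USGS)",
    "date": "2013-03-03",
    "subject": ["EARTH SCIENCE > OCEANS", "Marine Geology"],
    "format": "ASCII",
    "language": "eng",
})
print(ET.tostring(record, encoding="unicode"))
```

Rejecting unknown element names is a design choice for the sketch; a repository that accepts qualified Dublin Core or DCMI Metadata Terms would need a longer list.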
o MARC Code List for Countries: http://www.loc.gov/marc/countries/
o MARC Code List for Languages: http://www.loc.gov/marc/languages/
o MARC Source Codes for Vocabularies, Rules, and Schemes: http://www.loc.gov/marc/sourcecode/form/formsource.html
o For digital and online resources
  o Internet Media Types: www.iana.org/assignments/media-types/index.html
  o MODS Note Types: http://www.loc.gov/standards/mods/mods-notes.html
  o DCMI Type Vocabulary: http://dublincore.org/documents/dcmi-terms/index.shtml#H7
o Subject Thesauri and Ontologies
o AGROVOC (Food and Agriculture Organization of the United Nations vocabulary)
o Astronomy Thesaurus
o CAB Thesaurus (for life sciences technology and social sciences)
o CIF dictionaries (for Physics)
o Eurovoc (European Union Thesaurus)
o Ethnographic Thesaurus
o Gene Ontology
o GeoNames
o Getty Research Institute Art and Architecture Thesaurus Online
o Getty Research Institute Thesaurus of Geographic Names
o ICD (International Classification of Diseases)
o Library of Congress Authorities for subject headings
o Library of Congress Thesaurus for Graphic Materials
o Logical Observation Identifiers Names and Codes (LOINC)
o MeSH (Medical Subject Headings)
o Public Health Language
o Rare Books and Manuscripts Section (RBMS) Controlled Vocabularies
o RxNorm (for drugs)
o SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms)
o STW Thesaurus for Economics
o UNBIS Thesaurus
o UNESCO Thesaurus
o USDA National Agricultural Library Agriculture Thesaurus
Question: Have you ever used thesauri in your study and research?
Getty Union List of Artist Names (ULAN)
The ULAN includes proper names and associated information about artists. Artists may be either individuals (persons) or groups of individuals working together (corporate bodies). Artists in the ULAN generally represent creators involved in the conception or production of visual arts and architecture.
Library of Congress Name Authority File (LCNAF)
The LCNAF provides authoritative data for names of persons, organizations, events, places, and titles.
Virtual International Authority File (VIAF)
The VIAF™ (Virtual International Authority File) combines multiple name authority files into a single OCLC-hosted name authority service. The goal of the service is to lower the cost and increase the utility of library authority files by matching and linking widely-used authority files and making that information available on the Web.
Web Ontology Language (OWL)
The OWL 2 Web Ontology Language is an ontology language for the Semantic Web with formally defined meaning. OWL 2 ontologies provide classes, properties, individuals, and data values and are stored as Semantic Web documents. OWL 2 ontologies can be used along with information written in RDF, and OWL 2 ontologies themselves are primarily exchanged as RDF documents.
MADS/RDF
The Metadata Authority Description Schema (MADS) is an XML schema for an element set that may be used to provide metadata about authorized forms of agents (people, organizations), events, and terms (topics, geographics, genres, etc.). MADS/RDF builds on MADS/XML as a knowledge organization system.
Resource Description Framework (RDF)
RDF is a standard model for data interchange on the Web. RDF extends the linking structure of the Web to use URIs to name the relationship between things as well as the two ends of the link (this is usually referred to as a "triple"). Using this simple model, it allows structured and semi-structured data to be mixed, exposed, and shared across different applications.
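The "triple" idea can be shown without any RDF library. The sketch below is a minimal Python illustration that represents each statement as a (subject, predicate, object) tuple and writes it out in N-Triples style. The helper names are hypothetical, the dataset URI under example.org is invented for illustration, and the literal escaping covers only backslashes, quotes, and newlines, so it is a simplification of the full N-Triples rules. The two predicate URIs are Dublin Core terms.

```python
def uri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value, lang=None):
    """Quote a plain string literal, with an optional language tag."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def to_ntriples(triples):
    """Serialise (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


dataset = uri("http://example.org/dataset/08CRD01")  # hypothetical identifier
triples = [
    # The object is a literal: the link ends in a value.
    (dataset, uri("http://purl.org/dc/elements/1.1/title"),
     literal("Biological data of field activity 08CRD01", "en")),
    # The object is another URI: the link ends in a resource that can be followed.
    (dataset, uri("http://purl.org/dc/terms/publisher"),
     uri("http://www.usgs.gov/")),
]
print(to_ntriples(triples))
```

Because the subject, the relationship, and (in the second statement) the object are all URIs, a second application can merge these statements with its own data about the same resources.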
SKOS: Simple Knowledge Organization System for the Web
SKOS is a W3C recommendation designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary.
Linked data examples
  • FAST: Faceted Application of Subject Terminology
  • Dewey Decimal Classification
  • Open Metadata Registry (RDA vocabularies)
  • Library of Congress Linked Data Service
  …
OpenRefine (formerly Google Refine) is a powerful tool for working with messy data: cleaning it, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase. http://openrefine.org
Nesstar Publisher is a free, advanced data management program. It can be used for the preparation of data and metadata. It is DDI compliant. http://www.nesstar.com/software/publisher.html
QualAnon: DSDR Qualitative Data Anonymizer
This free transcript anonymization tool is designed solely to de-identify qualitative interview transcripts. https://www.icpsr.umich.edu/icpsrweb/DSDR/tools/anonymize.jsp
Colectica for Microsoft Excel
A free tool to document your spreadsheet data using the Data Documentation Initiative (DDI) metadata format, the open standard for data documentation. http://www.colectica.com/software/colecticaforexcel
Schematron is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees. It is a structural schema language expressed in XML using a small number of elements and XPath. http://xml.ascc.net/resource/schematron/schematron.html
Altova XMLSpy is an advanced XML editor for modeling, editing, transforming, and debugging XML-related technologies. http://www.altova.com/xmlspy.html
<oXygen/> XML Editor is an XML tool that supports all the XML schema languages. The XSLT and XQuery support is enhanced with powerful debuggers and performance profilers. You can use <oXygen/> XML Editor to work with all XML-based technologies, including XML databases, XProc pipelines, and web services. http://www.oxygenxml.com
LabTrove is a free blogging platform specifically designed for use in a research environment. It aims to serve as a highly flexible electronic notebook and data management system; by integrating with a lab's data-producing instruments, researchers can describe an experiment and associate it with its data output at the time of capture rather than annotating after the fact. http://www.labtrove.org
Kepler is a scientific workflow modeling and management system that enables users, regardless of programming experience, to set up data analysis pipelines. The software will assemble, execute, and document the [...] of services and scripts that scientists with large-scale data use to execute research. https://kepler-project.org
DataCite
The DataCite Consortium provides a number of services to support efforts at increasing the ease and prevalence of data citation. http://www.datacite.org
DMPTool is an online service to enable researchers to create data management plans, now required by many funding agencies, and to receive tailored institutional guidance to help them in the process. https://dmp.cdlib.org
oSection II addresses data documentation more from the
researcherrsquos view
oSection III interprets data documentation more from
a curator or librarians perspective
oWhat do researchers really care about
oWill each party see the other sidersquos points and
emphases
Create edit share and save
data management plans
Open access scholarly publishing services
papers journals books seminars amp more
Curation repository store manage and share research data
Create and manage
persistent identifiers
Open source add-in for Microsoft
Excel as a data collection tool
An infrastructure to publish and get credit
for sharing research data
CDL Curation and Publishing Services
httpwwwcdliborg
This slide is by Joan Starr California Digital Library httpwwwslidesharenetjoanstarrdataset-metadata-tools-approaches-for-access-preservationfrom_search=1
Data Publication
httplibraryucfeduScholarlyCommunicationUCFResearchLifecyclepdfData Set Related Services
oldquoData Set (also called lsquoDatasetrsquo) Metadatardquo provides
researchers consultation on
oProject and dataset documentation
oMetadata standards (Common and Domain Specific)
oMetadata schemas customization
oControlled vocabularies and thesauri
oData curation tools and practices
oAssists in describing basic properties of your data and enriching
metadata for your datasets
oSupports applying controlled vocabularies or optimizing keywords
to enhance the search of your datasets
oHelps to prepare your metadata and data for deposit and
preservation
oScholarly Communication (httplibraryucfeduScholarlyCommunication)
oSC Contact Information (httplibraryucfeduScholarlyCommunicationContactphp)
oUCF Library Research Guides (httpguidesucfedu)
oMetadata Guide (httpguidesucfedumetadata)
oData Management Guide (httpguidesucfedudata)
oResearch and Information Services (httplibraryucfeduReference)
Abstract United States Geological Survey Saint Petersburg Florida Center for Coastal and Watershed
Studieshellip
Purpose These data and information are intended for science researchers studentshellip
Language eng USA
Citation
Title Biological data of field activity 08CRD01 (B-1-08-VI) in US Virgin Islands from 05302008 to 06132008
Date
Date 2013-03-03
Date Type Publication Date
Organisation Name US Geological Survey (USGS) lthttpwwwusgsgovgt Coastal and Marine Geology
(CMG) lthttpwalruswrusgsgovgt
Role Publisher
Contact Info hellip
Point Of Contact hellip
Representation Type Vector
Topic Category
Keyword Collection
Keyword EARTH SCIENCE gt OCEANS
Associated Thesaurus Global Change Master Directory (GCMD)
Keyword Marine Geology
Associated Thesaurus USGS CMG InfoBank
Spatial Extent
West Bounding Longitude -6575000
East Bounding Longitude -6325000
North Bounding Latitude 1875000
South Bounding Latitude 1725000
FGDCCSDGM
Metadata
Constraints Please recognize the US Geological Survey (USGS) as the source of this information Physical materials are under controlled on-site access Some USGS information accessed through this means may be preliminary in nature and presented without the approval of the Director of the USGShellip
Legal Constraints
Use Constraints Other Restrictions
Other Constraints Use Constraints Please recognize the US Geological Survey (USGS) as the source of this information Physical materials are under controlled on-site accesshellip
hellip
Distribution
Distribution Format
Format Name ASCII
Format Version
File Decompression Technique No compression applied
oXML schema brings documentation into a single document creates
structured content about the data and allows data interoperability and
sharing
oIt can document comprehensive variable level information such as basic
data dictionary question text and question routing instructions
oData Documentation Initiative (DDI) a metadata specification for the
social and behavioral sciences It is an XML metadata standard for
documenting numeric data Detailed information is available
at httpwwwddiallianceorg
oProjects using the DDI (httpwwwddiallianceorgddi-at-workprojects)
oDDI-compliant data repository
o ICPSR - Inter-university Consortium for Political and Social Research
o Data deposit form httpswwwicpsrumicheducgi-binddf2
o UCF is a member of ICPSR
oUKDA - UK Data Archive
Field Labels
TitlePrincipal investigator(s)
Summary
Access notes
Dataset(s)
httpwwwicpsrumicheduicpsrwebNA
CJDstudies20363archive=NACJDampq=22
university+of+central+florida22amppermit
5B05D=AVAILABLEampx=-999ampy=-84
ICPSR Interuniversity
Consortium for
Political and
Social Research
Dataset(s)
DSO Study-Level Files
Documentation
Questionnairepdf
User guidepdf
DS1 Female Interviews
Documentation
Codebookpdf
hellip
Field Labels
Study description
Citation
Funding
Scope of studybull Subject terms
bull Smallest
geographic unit
bull Geographic
coverage
bull Time period
bull Date of collection
bull Unit of
observation
bull Universe
bull Data types
bull Data collection
notes
Methodologybull Study purpose
bull Study design
Field Labels
bull Sample
bull Mode of data collection
bull Description of variables
bull Response rates
bull Presence of common
scales
bull Extent of processing
Field Labels
Version(s)
Related publications
Variables
Utilities
bull Metadata exports
bull Download statistics
Variables
List all 1682 variables in this study
egID QUESTIONNAIRE ID NUMBER ISEX INTERVIEWER GENDER START INTERVIEW START TIME HHMM USE 24 HR CLOCK Q1A COUNTRY OF BIRTH Q1B STATE OF BIRTH - INITIALS OF STATEQ1C CITY OF BIRTH WRITE IN NOT APPQ1D YEARS LIVED IN USAQ1E RESIDENCY STATUSCHECK1 CHECKPOINT 1 BORN IN SAME METRO AREAQ2 HOW LONG LIVED IN THIS AREA hellip (httpwwwicpsrumicheduicpsrwebNACJDssv
dstudies20363variables)
httpwwwicpsrumicheduicpsrwebICPSRddi2studies20363
docDscrThe Document
Description
consists of
bibliographic
information
describing the
DDI-compliant
document
itself as a
whole
Included Fields
citation
bull titleStmt
bull prodStmt
bull verStmt
bull holdings
Included FieldsCitation
titlStmt
rspStmt
prodStmt
fundAg
grantNo
distStmt
biblCit
Holdings
stdyInfoSubject
Abstract
sumDscr
MethoddataColl
Notes
anlyInfo
dataAccssetAvail
useStmt
stdyDscr The Study
Description consists of
information about the
data collection study
or compilation that the
DDI-compliant
documentation file
describes This section
includes information
about how the study
should be cited who
collected or compiled
the data who
distributes the data
keywords about the
content of the data
summary (abstract) of
the content of the data
data collection methods
and processing etc
Included Fields
fileDscr
fileTxt
fileName
fileDscr
Data Files
Description
Information about
the data file(s)
that comprises a
collection This
section can be
repeated for
collections with
multiple files
oContext and participant details of interviews can be
oA descriptive header or summary page in transcripts or
field notes
oA structured data list
oXML mark-up of data for example
oText Encoding Initiative (TEI) to mark up interview
transcript
oQualitative Data Exchange Format (QuDEx) for
researcher annotations and data linking
oAnonymisation of textual data (eg replacing real names of people
organizations and locations with pseudonyms)
oFile naming
oMeaningful short names identify file types (eg interviews focus groups
field notes audio recordings) avoid space special characters avoid long
names
oOrganizing files in folders Create uniform and structured folder names based
on cases studies locations data types etc or the original anonymized
coded or annotated versions of data
oVersion control Version numbering in file names
oDocumentation Methodology description project plan interview guidelines
consent form templates data analyses and manipulation
o Example is from A NESSTAR FOR QUALITATIVE DATA BUILDING BLOCKS FOR DIGITAL FUTURES By Corti Louise et al available at httpdata-archiveacukmedia376907digitalfutures_dashish_21nov2012pdf
oData List
Interview ID
x001
x002
hellip
Text File Name
6124int001
6124int002
hellip
oCreate and generate metadata for your research data and
datasets in your research lifecycle to preserve the data in the
long run
oConsider what information is needed for the data to be
read and interpreted in the future
oUnderstand your funder requirements for data
documentation and metadata Funder requirements for NSF
GBMF IMLS NEH NIH and NOAA can be found at
httpsdmptoolorgguidance
oConsult available metadata standards in your field You may
refer to Common Metadata Standards and Domain Specific
Metadata Standards for details
oDescribe data and datasets created in your research lifecycle and
use software programs and tools to assist in data documentation
Assign or capture administrative descriptive technical structural
and preservation metadata for the data Some potential information
to document
oDescriptive metadata
oName of creator of data set
oName of author of document
oTitle of document
oFile name
oLocation of file
oSize of file
oStructural metadata
oFile relationships (eg child parent)
oTechnical metadata
oFormat (eg text SPSS Stata Excel tiff mpeg 3D Java FITS CIF)
oCompression or encoding algorithms
oEncryption and decryption keys
oSoftware (including release number) used to create or update the data
oHardware on which the data were created
oOperating systems in which the data were created
oApplication software in which the data were created
oAdministrative metadata
o Information about data creation (eg date)
o Information about subsequent updates transformation versioning
summarization
oDescriptions of migration and replication
o Information about other events that have affected the files
oPreservation metadata
oFile format (eg txt pdf doc rtf xls xml spv jpg fits)
oSignificant properties
oTechnical environment
oFixity information
oAdopt a thesauri in your field if applicable or compile a data dictionary for
your dataset
oObtain persistent identifiers (eg doi purl) for datasets if possible to ensure
data can be found in the future
oFor your full data management plan visit UCF Libraries Data Management
Guide Also refer to Digital Curation Centrersquos Checklist for a Data
Management Plan (httpwwwdccacuksitesdefaultfilesdocumentsresourceDMP_Checklist_2013pdf)
oCommon Metadata Standards
oDisciplinary Metadata Standards
oActivity Choose a dataset or a standard in your field to examine and critique
oSocial Science Dataset
oHumanities Dataset
oBiological Sciences Dataset
oBiotechnology Dataset
oGeospatial Dataset
oEarth Science Dataset
oPhysical Science Dataset
oOtherhellip
oDublin Core (DC) A general metadata standard for describing a wide range of
digital resources
o Dublin Core Metadata Element Set Version 11
(httpdublincoreorgdocumentsdces)
o 15 Elements Title Creator Subject or keyword Description Publisher Type Format
Identifier Source Language Relation Coverage Rights
o DCMI Metadata Terms (httpdublincoreorgdocumentsdcmi-terms)
o DC Qualifiers (httpdublincoreorgdocumentsusageguidequalifiersshtml)
o Encoded Archival Description (EAD)
o A standard for encoding archival finding aids with XML
oGovernment Information Locator Service (GILS)
o The Global Information Locator Service defines a core element set for government
information so that it can be more searchable and discoverable by the general public
oONIX for Books (ONline Information eXchange)
o An international standard for representing and communicating book industry product
oMARC Code List for Countries httpwwwlocgovmarccountries
oMARC Code List for Languages httpwwwlocgovmarclanguages
oMARC Source Codes for Vocabularies Rules and Schemes
httpwwwlocgovmarcsourcecodeformformsourcehtml
oFor digital and online resources
oInternet Media Types wwwianaorgassignmentsmedia-
typesindexhtml
oMODS Note Types httpwwwlocgovstandardsmodsmods-
noteshtml
oDCMI Type Vocabulary httpdublincoreorgdocumentsdcmi-
termsindexshtmlH7
o Subject Thesauri and Ontologies
o AGROVOC (Agricultural Organization of the United Nations Vocabulary)
o Astronomy Thesaurus
o CAB Thesaurus (for life sciences technology and social sciences)
o CIF dictionaries (for Physics)
o Eurovoc (European Union Thesaurus)
o Ethnographic Thesaurus
o Gene Ontology
o GeoNames
o Getty Institute Art and Architecture Thesaurus Online
o Getty Institute Thesaurus of Geographic Names
o ICD (International Classification of Diseases)
o Library of Congress Authorities for subject headings
o Library of Congress Thesaurus for Graphic Materials
o Logical Observation Identifiers Names and Codes (LOINC)
o MESH (Medical Subject Headings)
o Public Health Language
o Rare Books and Manuscripts Section (RBMS) Controlled Vocabularies
o RxNorm (for drugs)
o SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms)
o STW Thesaurus for Economics
o UNBIS Thesaurus
o UNESCO Thesaurus
o USDA National Agricultural Library Agriculture Thesaurus
Question Have you ever
used thesauri in your study
and research
Getty Union List of Artist Names
(ULAN)The ULAN includes proper names and
associated information about artists
Artists may be either individuals
(persons) or groups of individuals working
together (corporate bodies) Artists in
the ULAN generally represent creators
involved in the conception or production
of visual arts and architecture
Library of Congress Name
Authority File (LCNAF)
The LCNAF provides authoritative
data for names of persons
organizations events places and
titles
Virtual International
Authority File (VIAF)
The VIAFtrade (Virtual International
Authority File) combines multiple
name authority files into a single
OCLC-hosted name authority
service The goal of the service is to
lower the cost and increase the
utility of library authority files by
matching and linking widely-used
authority files and making that
information available on the Web
Web Ontology Language
(OWL)The OWL 2 Web Ontology Language is an
ontology language for the Semantic Web
with formally defined meaning OWL 2
ontologies provide classes properties
individuals and data values and are stored
as Semantic Web documents OWL 2
ontologies can be used along with
information written in RDF and OWL 2
ontologies themselves are primarily
exchanged as RDF documents
MADSRDFThe Metadata Authority Description
Schema (MADS) is an XML schema for an
element set that may be used to provide
metadata about authorized forms of
agents (people organizations) events
and terms (topics geographics genres
etc) MADSRDF
builds on MADSXML as a knowledge
organization system
Resource Description
Framework (RDF)RDF is a standard model for data
interchange on the Web RDF extends
the linking structure of the Web to use
URIs to name the relationship
between things as well as the two
ends of the link (this is usually
referred to as a ldquotriplerdquo) Using this
simple model it allows structured and
semi-structured data to be mixed
exposed and shared across different
applications
SKOS Simple Knowledge
Organization for the Web SKOS is a W3C recommendation
designed for representation of
thesauri classification
schemes taxonomies subject-
heading systems or any other
type of structured controlled
vocabularyLinked data
examplesbull FAST Faceted
Application of
Subject
Terminology
bull Dewey Decimal
Classification
bull Open Metadata
Registry (RDA
vocabularies)
bull Library of Congress
Linked Data
Service
hellip
OpenRefine (ex-Google Refine) is a powerful tool for working with messy data cleaning it transforming it from one format into another extending it with web services and linking it to databases like Freebasehttpopenrefineorg
Nesstar Publisher is a
free advanced data management program It can be used for the preparation of data and metadata Its DDI complianthttpwwwnesstarcomsoftwarepublisherhtml
QualAnon DSDR
Qualitative Data Anonymizer
This free transcript anonymizationtool is designed solely to de-identify qualitative interview transcriptshttpswwwicpsrumicheduicpsrwebDSDRtoolsanonymizejsp
Colectica for Microsoft Excel
A free tool to document your spreadsheet data using the Data Documentation Initiative (DDI) metadata format the open standard for data documentationhttpwwwcolecticacomsoftwarecolecticaforexcel
Schematron is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees It is a structural schema language expressed in XML using a small number of elements and XPathhttpxmlasccnetresourceschematronschematronhtml
Altova XMLSpy is an advanced XML editor for modeling, editing, transforming, and debugging XML-related technologies. http://www.altova.com/xmlspy.html
<oXygen/> XML Editor is an XML tool that supports all the XML schema languages. The XSLT and XQuery support is enhanced with powerful debuggers and performance profilers. You can use <oXygen/> XML Editor to work with all XML-based technologies, including XML databases, XProc pipelines, and web services. http://www.oxygenxml.com
LabTrove is a free blogging platform specifically designed for use in a research environment. It aims to serve as a highly flexible electronic notebook and data management system. By integrating with a lab's data-producing instruments, researchers can describe an experiment and associate it with its data output at the time of capture, rather than annotating after the fact. http://www.labtrove.org
Kepler is a scientific workflow modeling and management system that enables users, regardless of programming experience, to set up data analysis pipelines. The software will assemble, execute, and document the services and scripts that scientists with large-scale data use to execute research. https://kepler-project.org
DataCite
The DataCite Consortium provides a number of services to support efforts at increasing the ease and prevalence of data citation. http://www.datacite.org
DMPTool is an online service that enables researchers to create data management plans, now required by many funding agencies, and to receive tailored institutional guidance to help them in the process. https://dmp.cdlib.org
o Section II addresses data documentation more from the researcher's view
o Section III interprets data documentation more from a curator's or librarian's perspective
o What do researchers really care about?
o Will each party see the other side's points and emphases?
CDL Curation and Publishing Services
http://www.cdlib.org
o Create, edit, share, and save data management plans
o Open access scholarly publishing services: papers, journals, books, seminars & more
o Curation repository: store, manage, and share research data
o Create and manage persistent identifiers
o Open source add-in for Microsoft Excel as a data collection tool
o An infrastructure to publish and get credit for sharing research data
This slide is by Joan Starr, California Digital Library: http://www.slideshare.net/joanstarr/dataset-metadata-tools-approaches-for-access-preservation?from_search=1
Data Publication
http://library.ucf.edu/ScholarlyCommunication/UCFResearchLifecycle.pdf
Data Set Related Services
o "Data Set (also called 'Dataset') Metadata" provides researchers consultation on:
  o Project and dataset documentation
  o Metadata standards (Common and Domain Specific)
  o Metadata schemas customization
  o Controlled vocabularies and thesauri
  o Data curation tools and practices
o Assists in describing basic properties of your data and enriching metadata for your datasets
o Supports applying controlled vocabularies or optimizing keywords to enhance the search of your datasets
o Helps to prepare your metadata and data for deposit and preservation
o Scholarly Communication (http://library.ucf.edu/ScholarlyCommunication)
o SC Contact Information (http://library.ucf.edu/ScholarlyCommunication/Contact.php)
o UCF Library Research Guides (http://guides.ucf.edu)
o Metadata Guide (http://guides.ucf.edu/metadata)
o Data Management Guide (http://guides.ucf.edu/data)
o Research and Information Services (http://library.ucf.edu/Reference)
FGDC/CSDGM Metadata
Abstract: United States Geological Survey, Saint Petersburg, Florida, Center for Coastal and Watershed Studies…
Purpose: These data and information are intended for science researchers, students…
Language: eng; USA
Citation
  Title: Biological data of field activity 08CRD01 (B-1-08-VI) in US Virgin Islands from 05/30/2008 to 06/13/2008
  Date
    Date: 2013-03-03
    Date Type: Publication Date
  Organisation Name: US Geological Survey (USGS) <http://www.usgs.gov/>, Coastal and Marine Geology (CMG) <http://walrus.wr.usgs.gov/>
  Role: Publisher
  Contact Info: …
Point Of Contact: …
Representation Type: Vector
Topic Category
Keyword Collection
  Keyword: EARTH SCIENCE > OCEANS
  Associated Thesaurus: Global Change Master Directory (GCMD)
  Keyword: Marine Geology
  Associated Thesaurus: USGS CMG InfoBank
Spatial Extent
  West Bounding Longitude: -65.75000
  East Bounding Longitude: -63.25000
  North Bounding Latitude: 18.75000
  South Bounding Latitude: 17.25000
Constraints: Please recognize the US Geological Survey (USGS) as the source of this information. Physical materials are under controlled on-site access. Some USGS information accessed through this means may be preliminary in nature and presented without the approval of the Director of the USGS…
Legal Constraints
  Use Constraints: Other Restrictions
  Other Constraints: Use Constraints: Please recognize the US Geological Survey (USGS) as the source of this information. Physical materials are under controlled on-site access…
…
Distribution
  Distribution Format
    Format Name: ASCII
    Format Version:
    File Decompression Technique: No compression applied
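The Spatial Extent fields in the record above are what make a dataset findable by location. The following Python sketch shows one way such a bounding box might be checked and queried. It assumes the four bounding coordinates are decimal degrees (for example, a west longitude of -65.75); the `BoundingBox` class and its method are hypothetical names, and boxes that cross the 180° meridian are deliberately not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Spatial extent in decimal degrees."""
    west: float
    east: float
    south: float
    north: float

    def __post_init__(self):
        # Reject swapped or out-of-range coordinates at creation time.
        if not (-180 <= self.west <= self.east <= 180):
            raise ValueError("need -180 <= west <= east <= 180")
        if not (-90 <= self.south <= self.north <= 90):
            raise ValueError("need -90 <= south <= north <= 90")

    def contains(self, lon, lat):
        """True if the point (lon, lat) falls inside or on the box."""
        return (self.west <= lon <= self.east
                and self.south <= lat <= self.north)


# Assumed decimal-degree reading of the extent in the example record.
extent = BoundingBox(west=-65.75, east=-63.25, south=17.25, north=18.75)

print(extent.contains(-64.75, 17.75))
print(extent.contains(-80.0, 25.0))
```

Validating the coordinates when the record is created catches common documentation mistakes, such as swapping north and south or dropping a minus sign from a western longitude.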
o XML schema brings documentation into a single document, creates structured content about the data, and allows data interoperability and sharing
o It can document comprehensive variable-level information such as basic data dictionary, question text, and question routing instructions
o Data Documentation Initiative (DDI): a metadata specification for the social and behavioral sciences. It is an XML metadata standard for documenting numeric data. Detailed information is available at http://www.ddialliance.org
o Projects using the DDI (http://www.ddialliance.org/ddi-at-work/projects)
o DDI-compliant data repository
  o ICPSR - Inter-university Consortium for Political and Social Research
    o Data deposit form: https://www.icpsr.umich.edu/cgi-bin/ddf2
    o UCF is a member of ICPSR
  o UKDA - UK Data Archive
ICPSR: Inter-university Consortium for Political and Social Research
http://www.icpsr.umich.edu/icpsrweb/NACJD/studies/20363?archive=NACJD&q=%22university+of+central+florida%22&permit%5B0%5D=AVAILABLE&x=-999&y=-84
Field Labels
• Title
• Principal investigator(s)
• Summary
• Access notes
• Dataset(s)
  • DS0: Study-Level Files
    Documentation: Questionnaire.pdf, User guide.pdf
  • DS1: Female Interviews
    Documentation: Codebook.pdf
  • …
Field Labels
Study description
Citation
Funding
Scope of study
• Subject terms
• Smallest geographic unit
• Geographic coverage
• Time period
• Date of collection
• Unit of observation
• Universe
• Data types
• Data collection notes
Methodology
• Study purpose
• Study design
• Sample
• Mode of data collection
• Description of variables
• Response rates
• Presence of common scales
• Extent of processing
Version(s)
Related publications
Variables
Utilities
• Metadata exports
• Download statistics
Variables
List all 1682 variables in this study, e.g.:
ID: QUESTIONNAIRE ID NUMBER
ISEX: INTERVIEWER GENDER
START: INTERVIEW START TIME HHMM USE 24 HR CLOCK
Q1A: COUNTRY OF BIRTH
Q1B: STATE OF BIRTH - INITIALS OF STATE
Q1C: CITY OF BIRTH WRITE IN NOT APP
Q1D: YEARS LIVED IN USA
Q1E: RESIDENCY STATUS
CHECK1: CHECKPOINT 1 BORN IN SAME METRO AREA
Q2: HOW LONG LIVED IN THIS AREA
…
(http://www.icpsr.umich.edu/icpsrweb/NACJD/ssvd/studies/20363/variables)
http://www.icpsr.umich.edu/icpsrweb/ICPSR/ddi2/studies/20363
docDscr
The Document Description consists of bibliographic information describing the DDI-compliant document itself as a whole.
Included Fields
citation
• titlStmt
• prodStmt
• verStmt
• holdings

stdyDscr
The Study Description consists of information about the data collection, study, or compilation that the DDI-compliant documentation file describes. This section includes information about how the study should be cited, who collected or compiled the data, who distributes the data, keywords about the content of the data, summary (abstract) of the content of the data, data collection methods and processing, etc.
Included Fields
Citation
• titlStmt
• rspStmt
• prodStmt
• fundAg
• grantNo
• distStmt
• biblCit
• Holdings
stdyInfo
• Subject
• Abstract
• sumDscr
Method
• dataColl
• Notes
• anlyInfo
dataAccs
• setAvail
• useStmt

fileDscr
Data Files Description: Information about the data file(s) that comprise a collection. This section can be repeated for collections with multiple files.
Included Fields
fileDscr
• fileTxt
• fileName
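To make the three sections above concrete, here is a minimal Python sketch that assembles a skeleton document with one docDscr, one stdyDscr, and a repeated fileDscr. The section and field names come from the slides; the `codeBook` root and the `titl` element are assumptions based on DDI Codebook conventions, the `build_codebook` function is hypothetical, and the output is not validated against the DDI schema.

```python
import xml.etree.ElementTree as ET


def build_codebook(title, file_names):
    """Build a bare skeleton with the document, study, and file sections."""
    # Root element name is an assumption (DDI Codebook convention).
    codebook = ET.Element("codeBook")

    # docDscr: bibliographic information about the DDI document itself.
    doc_citation = ET.SubElement(ET.SubElement(codebook, "docDscr"), "citation")
    doc_titl = ET.SubElement(ET.SubElement(doc_citation, "titlStmt"), "titl")
    doc_titl.text = f"Documentation for: {title}"

    # stdyDscr: information about the study being described.
    stdy_citation = ET.SubElement(ET.SubElement(codebook, "stdyDscr"), "citation")
    stdy_titl = ET.SubElement(ET.SubElement(stdy_citation, "titlStmt"), "titl")
    stdy_titl.text = title

    # fileDscr: repeated once per data file in the collection.
    for name in file_names:
        file_txt = ET.SubElement(ET.SubElement(codebook, "fileDscr"), "fileTxt")
        ET.SubElement(file_txt, "fileName").text = name

    return codebook


root = build_codebook("Example interview study", ["females.txt", "males.txt"])
print(ET.tostring(root, encoding="unicode"))
```

In practice a tool such as Nesstar Publisher or Colectica would generate and validate this structure; the sketch only shows how the sections nest.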
o Context and participant details of interviews can be documented in:
  o A descriptive header or summary page in transcripts or field notes
  o A structured data list
o XML mark-up of data, for example:
  o Text Encoding Initiative (TEI) to mark up interview transcripts
  o Qualitative Data Exchange Format (QuDEx) for researcher annotations and data linking
o Anonymisation of textual data (e.g. replacing real names of people, organizations, and locations with pseudonyms)
o File naming
  o Use meaningful short names; identify file types (e.g. interviews, focus groups, field notes, audio recordings); avoid spaces and special characters; avoid long names
o Organizing files in folders: Create uniform and structured folder names based on cases, studies, locations, data types, etc., or on the original, anonymized, coded, or annotated versions of data
o Version control: Version numbering in file names
o Documentation: Methodology description, project plan, interview guidelines, consent form templates, data analyses and manipulation
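The anonymisation step in the list above (replacing real names with pseudonyms) can be sketched as follows. This is a simplified illustration, not a substitute for a dedicated tool or for manual review; the names, the pseudonym key, and the `pseudonymise` function are all hypothetical.

```python
import re

# Hypothetical pseudonym key. In a real project this mapping is itself
# sensitive and should be stored securely, apart from the transcripts.
PSEUDONYMS = {
    "Maria Lopez": "Participant A",
    "Orlando": "City X",
    "Acme Clinic": "Organization 1",
}


def pseudonymise(text, mapping):
    """Replace each real name with its pseudonym, matching whole words only."""
    if not mapping:
        return text
    # Try longer names first so a full name wins over a shorter overlap.
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in names) + r")\b"
    )
    return pattern.sub(lambda match: mapping[match.group(1)], text)


transcript = "Maria Lopez said she moved to Orlando to work at Acme Clinic."
print(pseudonymise(transcript, PSEUDONYMS))
```

Exact-match replacement misses nicknames, misspellings, and indirect identifiers, so the output still needs to be read by a person before the data are shared.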
o Example is from "A NESSTAR FOR QUALITATIVE DATA: BUILDING BLOCKS FOR DIGITAL FUTURES" by Corti, Louise et al., available at http://data-archive.ac.uk/media/376907/digitalfutures_dashish_21nov2012.pdf
o Data List

Interview ID    Text File Name
x001            6124int001
x002            6124int002
…               …
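A data list like the one above can be generated rather than typed by hand, which keeps file names consistent with the naming and versioning advice. The sketch below is one possible approach; the `_v01` version suffix, the `.txt` extension, and the `file_name` and `data_list` functions are assumptions for illustration and are not taken from the cited example.

```python
import csv
import io
import re


def file_name(study, kind, number, version, ext="txt"):
    """Build a short, space-free name such as '6124int001_v01.txt'."""
    name = f"{study}{kind}{number:03d}_v{version:02d}.{ext}"
    # Allow only letters, digits, underscore, hyphen, and dot.
    if not re.fullmatch(r"[A-Za-z0-9_.-]+", name):
        raise ValueError(f"unsafe characters in file name: {name!r}")
    return name


def data_list(study, interview_count):
    """Return a CSV data list linking interview IDs to text file names."""
    rows = [("Interview ID", "Text File Name")]
    for n in range(1, interview_count + 1):
        rows.append((f"x{n:03d}", file_name(study, "int", n, version=1)))
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


print(data_list("6124", 2))
```

Zero-padded numbers keep files in the right order when sorted by name, and putting the version in the name makes it visible without opening the file.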
o Create and generate metadata for your research data and datasets in your research lifecycle to preserve the data in the long run
o Consider what information is needed for the data to be read and interpreted in the future
o Understand your funder requirements for data documentation and metadata. Funder requirements for NSF, GBMF, IMLS, NEH, NIH, and NOAA can be found at https://dmptool.org/guidance
o Consult available metadata standards in your field. You may refer to Common Metadata Standards and Domain Specific Metadata Standards for details
o Describe data and datasets created in your research lifecycle, and use software programs and tools to assist in data documentation. Assign or capture administrative, descriptive, technical, structural, and preservation metadata for the data. Some potential information to document:
  o Descriptive metadata
    o Name of creator of data set
    o Name of author of document
    o Title of document
    o File name
    o Location of file
    o Size of file
  o Structural metadata
    o File relationships (e.g. child, parent)
  o Technical metadata
    o Format (e.g. text, SPSS, Stata, Excel, tiff, mpeg, 3D, Java, FITS, CIF)
    o Compression or encoding algorithms
    o Encryption and decryption keys
    o Software (including release number) used to create or update the data
    o Hardware on which the data were created
    o Operating systems in which the data were created
    o Application software in which the data were created
  o Administrative metadata
    o Information about data creation (e.g. date)
    o Information about subsequent updates, transformation, versioning, summarization
    o Descriptions of migration and replication
    o Information about other events that have affected the files
  o Preservation metadata
    o File format (e.g. .txt, .pdf, .doc, .rtf, .xls, .xml, .spv, .jpg, .fits)
    o Significant properties
    o Technical environment
    o Fixity information
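Several of the items above (file name, location, size, format, and fixity information) can be captured automatically. The following Python sketch records them for one file and later re-checks the checksum. The choice of SHA-256 and the `describe_file` and `verify_fixity` names are assumptions for illustration; a repository may require a different algorithm or record layout.

```python
import hashlib
import tempfile
from pathlib import Path


def describe_file(path, chunk_size=65536):
    """Capture basic descriptive, technical, and fixity metadata for a file."""
    path = Path(path)
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large data files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "file_name": path.name,
        "location": str(path.parent),
        "size_bytes": path.stat().st_size,
        "format": path.suffix.lstrip(".").lower(),
        "sha256": digest.hexdigest(),
    }


def verify_fixity(path, recorded):
    """True if the file still matches the checksum recorded earlier."""
    return describe_file(path)["sha256"] == recorded["sha256"]


with tempfile.TemporaryDirectory() as folder:
    sample = Path(folder) / "6124int001.txt"
    sample.write_bytes(b"interview text")
    record = describe_file(sample)
    print(record["file_name"], record["size_bytes"], verify_fixity(sample, record))
```

Storing the checksum alongside the other metadata lets a future user confirm that the file has not been altered or corrupted since it was documented.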
o Adopt a thesaurus in your field if applicable, or compile a data dictionary for your dataset
o Obtain persistent identifiers (e.g. DOI, PURL) for datasets if possible to ensure data can be found in the future
o For your full data management plan, visit the UCF Libraries Data Management Guide. Also refer to the Digital Curation Centre's Checklist for a Data Management Plan (http://www.dcc.ac.uk/sites/default/files/documents/resource/DMP_Checklist_2013.pdf)
o Common Metadata Standards
o Disciplinary Metadata Standards
o Activity: Choose a dataset or a standard in your field to examine and critique
  o Social Science Dataset
  o Humanities Dataset
  o Biological Sciences Dataset
  o Biotechnology Dataset
  o Geospatial Dataset
  o Earth Science Dataset
  o Physical Science Dataset
  o Other…
o Dublin Core (DC): A general metadata standard for describing a wide range of digital resources
  o Dublin Core Metadata Element Set, Version 1.1 (http://dublincore.org/documents/dces/)
  o 15 Elements: Title, Creator, Subject or keyword, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights
  o DCMI Metadata Terms (http://dublincore.org/documents/dcmi-terms/)
  o DC Qualifiers (http://dublincore.org/documents/usageguide/qualifiers.shtml)
o Encoded Archival Description (EAD)
  o A standard for encoding archival finding aids with XML
o Government Information Locator Service (GILS)
  o The Global Information Locator Service defines a core element set for government information so that it can be more searchable and discoverable by the general public
o ONIX for Books (ONline Information eXchange)
  o An international standard for representing and communicating book industry product information
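As an illustration of the Dublin Core element set listed above, the Python sketch below builds a simple XML record and rejects any field that is not one of the 15 elements. The namespace URI is the published one for Dublin Core 1.1; the `metadata` wrapper element, the sample values, and the `dublin_core_record` function are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

ET.register_namespace("dc", DC_NS)


def dublin_core_record(values):
    """Build a record from a dict mapping element name to a list of values."""
    unknown = sorted(set(values) - set(DC_ELEMENTS))
    if unknown:
        raise ValueError(f"not Dublin Core elements: {unknown}")
    # Wrapper element name is an assumption, not part of the element set.
    record = ET.Element("metadata")
    for name in DC_ELEMENTS:
        # Every element is optional and repeatable.
        for value in values.get(name, []):
            ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return record


record = dublin_core_record({
    "title": ["Example interview dataset"],
    "creator": ["Example Research Group"],
    "subject": ["Migration", "Interviews"],
    "type": ["Dataset"],
})
print(ET.tostring(record, encoding="unicode"))
```

Because all 15 elements are optional and repeatable, a record can start with only a title and creator and be enriched later.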
o MARC Code List for Countries: http://www.loc.gov/marc/countries/
o MARC Code List for Languages: http://www.loc.gov/marc/languages/
o MARC Source Codes for Vocabularies, Rules, and Schemes: http://www.loc.gov/marc/sourcecode/form/formsource.html
o For digital and online resources
  o Internet Media Types: www.iana.org/assignments/media-types/index.html
  o MODS Note Types: http://www.loc.gov/standards/mods/mods-notes.html
  o DCMI Type Vocabulary: http://dublincore.org/documents/dcmi-terms/index.shtml#H7
o Subject Thesauri and Ontologies
o AGROVOC (Food and Agriculture Organization of the United Nations vocabulary)
o Astronomy Thesaurus
o CAB Thesaurus (for life sciences, technology, and social sciences)
o CIF dictionaries (for Physics)
o Eurovoc (European Union Thesaurus)
o Ethnographic Thesaurus
o Gene Ontology
o GeoNames
o Getty Institute Art and Architecture Thesaurus Online
o Getty Institute Thesaurus of Geographic Names
o ICD (International Classification of Diseases)
o Library of Congress Authorities for subject headings
o Library of Congress Thesaurus for Graphic Materials
o Logical Observation Identifiers Names and Codes (LOINC)
o MeSH (Medical Subject Headings)
o Public Health Language
o Rare Books and Manuscripts Section (RBMS) Controlled Vocabularies
o RxNorm (for drugs)
o SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms)
o STW Thesaurus for Economics
o UNBIS Thesaurus
o UNESCO Thesaurus
o USDA National Agricultural Library Agriculture Thesaurus
Question: Have you ever used thesauri in your study and research?
Getty Union List of Artist Names (ULAN)
The ULAN includes proper names and associated information about artists. Artists may be either individuals (persons) or groups of individuals working together (corporate bodies). Artists in the ULAN generally represent creators involved in the conception or production of visual arts and architecture.
Library of Congress Name Authority File (LCNAF)
The LCNAF provides authoritative data for names of persons, organizations, events, places, and titles.
Virtual International Authority File (VIAF)
The VIAF™ (Virtual International Authority File) combines multiple name authority files into a single OCLC-hosted name authority service. The goal of the service is to lower the cost and increase the utility of library authority files by matching and linking widely-used authority files and making that information available on the Web.
Web Ontology Language (OWL)
The OWL 2 Web Ontology Language is an ontology language for the Semantic Web with formally defined meaning. OWL 2 ontologies provide classes, properties, individuals, and data values and are stored as Semantic Web documents. OWL 2 ontologies can be used along with information written in RDF, and OWL 2 ontologies themselves are primarily exchanged as RDF documents.
MADS/RDF
The Metadata Authority Description Schema (MADS) is an XML schema for an element set that may be used to provide metadata about authorized forms of agents (people, organizations), events, and terms (topics, geographics, genres, etc.). MADS/RDF builds on MADS/XML as a knowledge organization system.
Resource Description
Framework (RDF)RDF is a standard model for data
interchange on the Web RDF extends
the linking structure of the Web to use
URIs to name the relationship
between things as well as the two
ends of the link (this is usually
referred to as a ldquotriplerdquo) Using this
simple model it allows structured and
semi-structured data to be mixed
exposed and shared across different
applications
SKOS Simple Knowledge
Organization for the Web SKOS is a W3C recommendation
designed for representation of
thesauri classification
schemes taxonomies subject-
heading systems or any other
type of structured controlled
vocabularyLinked data
examplesbull FAST Faceted
Application of
Subject
Terminology
bull Dewey Decimal
Classification
bull Open Metadata
Registry (RDA
vocabularies)
bull Library of Congress
Linked Data
Service
hellip
OpenRefine (ex-Google Refine) is a powerful tool for working with messy data cleaning it transforming it from one format into another extending it with web services and linking it to databases like Freebasehttpopenrefineorg
Nesstar Publisher is a
free advanced data management program It can be used for the preparation of data and metadata Its DDI complianthttpwwwnesstarcomsoftwarepublisherhtml
QualAnon DSDR
Qualitative Data Anonymizer
This free transcript anonymizationtool is designed solely to de-identify qualitative interview transcriptshttpswwwicpsrumicheduicpsrwebDSDRtoolsanonymizejsp
Colectica for Microsoft Excel
A free tool to document your spreadsheet data using the Data Documentation Initiative (DDI) metadata format the open standard for data documentationhttpwwwcolecticacomsoftwarecolecticaforexcel
Schematron is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees It is a structural schema language expressed in XML using a small number of elements and XPathhttpxmlasccnetresourceschematronschematronhtml
Altova XMLSpy is an advanced XML editor for modeling editing transforming and debugging XML-related
technologieshttpwwwaltovacomxmlspy
html
ltoXygengt XML
Editor is an XML tool that supports all the XML schema languages The XSLT and XQuery support is enhanced with powerful debuggers and performance profilers You can use ltoXygengt XML Editor to work with all XML-based technologies including XML databases XProcpipelines and web serviceshttpwwwoxygenxmlcom
LabTrove is a free blogging
platform specifically designed for use in a research environment It aims to serve as a highly flexible electronic notebook and data management system by integrating with a labrsquos data-producing instruments researchers can describe an experiment and associate it with its data output at the time of capture rather than annotating after the fact httpwwwlabtroveorg
Kepler is a scientific workflow
modeling and management system that enables users regardless of programming experience to set up data analysis pipelines The software will assemble execute and document theof services and scripts that scientists with large-scale data use to execute researchhttpskepler-projectorg
DataCiteThe DataCite Consortium
provides a number of
services to support
efforts at increasing the
ease and prevalence of
data citationhttpwwwdataciteorg
DMPTool is an online service to enable researchers to create data management plans now required by many funding agencies and to receive tailored institutional guidance to help them in the processhttpsdmpcdliborg
oSection II addresses data documentation more from the
researcherrsquos view
oSection III interprets data documentation more from
a curator or librarians perspective
oWhat do researchers really care about
oWill each party see the other sidersquos points and
emphases
Create edit share and save
data management plans
Open access scholarly publishing services
papers journals books seminars amp more
Curation repository store manage and share research data
Create and manage
persistent identifiers
Open source add-in for Microsoft
Excel as a data collection tool
An infrastructure to publish and get credit
for sharing research data
CDL Curation and Publishing Services
httpwwwcdliborg
This slide is by Joan Starr California Digital Library httpwwwslidesharenetjoanstarrdataset-metadata-tools-approaches-for-access-preservationfrom_search=1
Data Publication
httplibraryucfeduScholarlyCommunicationUCFResearchLifecyclepdfData Set Related Services
oldquoData Set (also called lsquoDatasetrsquo) Metadatardquo provides
researchers consultation on
oProject and dataset documentation
oMetadata standards (Common and Domain Specific)
oMetadata schemas customization
oControlled vocabularies and thesauri
oData curation tools and practices
oAssists in describing basic properties of your data and enriching
metadata for your datasets
oSupports applying controlled vocabularies or optimizing keywords
to enhance the search of your datasets
oHelps to prepare your metadata and data for deposit and
preservation
oScholarly Communication (httplibraryucfeduScholarlyCommunication)
oSC Contact Information (httplibraryucfeduScholarlyCommunicationContactphp)
oUCF Library Research Guides (httpguidesucfedu)
oMetadata Guide (httpguidesucfedumetadata)
oData Management Guide (httpguidesucfedudata)
oResearch and Information Services (httplibraryucfeduReference)
Abstract United States Geological Survey Saint Petersburg Florida Center for Coastal and Watershed
Studieshellip
Purpose These data and information are intended for science researchers studentshellip
Language eng USA
Citation
Title Biological data of field activity 08CRD01 (B-1-08-VI) in US Virgin Islands from 05302008 to 06132008
Date
Date 2013-03-03
Date Type Publication Date
Organisation Name US Geological Survey (USGS) lthttpwwwusgsgovgt Coastal and Marine Geology
(CMG) lthttpwalruswrusgsgovgt
Role Publisher
Contact Info hellip
Point Of Contact hellip
Representation Type Vector
Topic Category
Keyword Collection
Keyword EARTH SCIENCE gt OCEANS
Associated Thesaurus Global Change Master Directory (GCMD)
Keyword Marine Geology
Associated Thesaurus USGS CMG InfoBank
Spatial Extent
West Bounding Longitude -6575000
East Bounding Longitude -6325000
North Bounding Latitude 1875000
South Bounding Latitude 1725000
FGDCCSDGM
Metadata
Constraints Please recognize the US Geological Survey (USGS) as the source of this information Physical materials are under controlled on-site access Some USGS information accessed through this means may be preliminary in nature and presented without the approval of the Director of the USGShellip
Legal Constraints
Use Constraints Other Restrictions
Other Constraints Use Constraints Please recognize the US Geological Survey (USGS) as the source of this information Physical materials are under controlled on-site accesshellip
hellip
Distribution
Distribution Format
Format Name ASCII
Format Version
File Decompression Technique No compression applied
oXML schema brings documentation into a single document creates
structured content about the data and allows data interoperability and
sharing
oIt can document comprehensive variable level information such as basic
data dictionary question text and question routing instructions
oData Documentation Initiative (DDI) a metadata specification for the
social and behavioral sciences It is an XML metadata standard for
documenting numeric data Detailed information is available
at httpwwwddiallianceorg
oProjects using the DDI (httpwwwddiallianceorgddi-at-workprojects)
oDDI-compliant data repository
o ICPSR - Inter-university Consortium for Political and Social Research
o Data deposit form httpswwwicpsrumicheducgi-binddf2
o UCF is a member of ICPSR
oUKDA - UK Data Archive
Field Labels
TitlePrincipal investigator(s)
Summary
Access notes
Dataset(s)
httpwwwicpsrumicheduicpsrwebNA
CJDstudies20363archive=NACJDampq=22
university+of+central+florida22amppermit
5B05D=AVAILABLEampx=-999ampy=-84
ICPSR Interuniversity
Consortium for
Political and
Social Research
Dataset(s)
DSO Study-Level Files
Documentation
Questionnairepdf
User guidepdf
DS1 Female Interviews
Documentation
Codebookpdf
hellip
Field Labels
Study description
Citation
Funding
Scope of studybull Subject terms
bull Smallest
geographic unit
bull Geographic
coverage
bull Time period
bull Date of collection
bull Unit of
observation
bull Universe
bull Data types
bull Data collection
notes
Methodologybull Study purpose
bull Study design
Field Labels
bull Sample
bull Mode of data collection
bull Description of variables
bull Response rates
bull Presence of common
scales
bull Extent of processing
Field Labels
Version(s)
Related publications
Variables
Utilities
bull Metadata exports
bull Download statistics
Variables
List all 1682 variables in this study
egID QUESTIONNAIRE ID NUMBER ISEX INTERVIEWER GENDER START INTERVIEW START TIME HHMM USE 24 HR CLOCK Q1A COUNTRY OF BIRTH Q1B STATE OF BIRTH - INITIALS OF STATEQ1C CITY OF BIRTH WRITE IN NOT APPQ1D YEARS LIVED IN USAQ1E RESIDENCY STATUSCHECK1 CHECKPOINT 1 BORN IN SAME METRO AREAQ2 HOW LONG LIVED IN THIS AREA hellip (httpwwwicpsrumicheduicpsrwebNACJDssv
dstudies20363variables)
httpwwwicpsrumicheduicpsrwebICPSRddi2studies20363
docDscrThe Document
Description
consists of
bibliographic
information
describing the
DDI-compliant
document
itself as a
whole
Included Fields
citation
bull titleStmt
bull prodStmt
bull verStmt
bull holdings
Included FieldsCitation
titlStmt
rspStmt
prodStmt
fundAg
grantNo
distStmt
biblCit
Holdings
stdyInfoSubject
Abstract
sumDscr
MethoddataColl
Notes
anlyInfo
dataAccssetAvail
useStmt
stdyDscr The Study
Description consists of
information about the
data collection study
or compilation that the
DDI-compliant
documentation file
describes This section
includes information
about how the study
should be cited who
collected or compiled
the data who
distributes the data
keywords about the
content of the data
summary (abstract) of
the content of the data
data collection methods
and processing etc
Included Fields
fileDscr
fileTxt
fileName
fileDscr
Data Files
Description
Information about
the data file(s)
that comprises a
collection This
section can be
repeated for
collections with
multiple files
oContext and participant details of interviews can be
oA descriptive header or summary page in transcripts or
field notes
oA structured data list
oXML mark-up of data for example
oText Encoding Initiative (TEI) to mark up interview
transcript
oQualitative Data Exchange Format (QuDEx) for
researcher annotations and data linking
oAnonymisation of textual data (eg replacing real names of people
organizations and locations with pseudonyms)
oFile naming
oMeaningful short names identify file types (eg interviews focus groups
field notes audio recordings) avoid space special characters avoid long
names
oOrganizing files in folders Create uniform and structured folder names based
on cases studies locations data types etc or the original anonymized
coded or annotated versions of data
oVersion control Version numbering in file names
oDocumentation Methodology description project plan interview guidelines
consent form templates data analyses and manipulation
o Example is from A NESSTAR FOR QUALITATIVE DATA BUILDING BLOCKS FOR DIGITAL FUTURES By Corti Louise et al available at httpdata-archiveacukmedia376907digitalfutures_dashish_21nov2012pdf
oData List
Interview ID
x001
x002
hellip
Text File Name
6124int001
6124int002
hellip
oCreate and generate metadata for your research data and
datasets in your research lifecycle to preserve the data in the
long run
oConsider what information is needed for the data to be
read and interpreted in the future
oUnderstand your funder requirements for data
documentation and metadata Funder requirements for NSF
GBMF IMLS NEH NIH and NOAA can be found at
httpsdmptoolorgguidance
oConsult available metadata standards in your field You may
refer to Common Metadata Standards and Domain Specific
Metadata Standards for details
oDescribe data and datasets created in your research lifecycle and
use software programs and tools to assist in data documentation
Assign or capture administrative descriptive technical structural
and preservation metadata for the data Some potential information
to document
oDescriptive metadata
oName of creator of data set
oName of author of document
oTitle of document
oFile name
oLocation of file
oSize of file
oStructural metadata
oFile relationships (eg child parent)
oTechnical metadata
oFormat (eg text SPSS Stata Excel tiff mpeg 3D Java FITS CIF)
oCompression or encoding algorithms
oEncryption and decryption keys
oSoftware (including release number) used to create or update the data
oHardware on which the data were created
oOperating systems in which the data were created
oApplication software in which the data were created
oAdministrative metadata
o Information about data creation (eg date)
o Information about subsequent updates transformation versioning
summarization
oDescriptions of migration and replication
o Information about other events that have affected the files
oPreservation metadata
oFile format (eg txt pdf doc rtf xls xml spv jpg fits)
oSignificant properties
oTechnical environment
oFixity information
oAdopt a thesauri in your field if applicable or compile a data dictionary for
your dataset
oObtain persistent identifiers (eg doi purl) for datasets if possible to ensure
data can be found in the future
oFor your full data management plan visit UCF Libraries Data Management
Guide Also refer to Digital Curation Centrersquos Checklist for a Data
Management Plan (httpwwwdccacuksitesdefaultfilesdocumentsresourceDMP_Checklist_2013pdf)
oCommon Metadata Standards
oDisciplinary Metadata Standards
oActivity Choose a dataset or a standard in your field to examine and critique
oSocial Science Dataset
oHumanities Dataset
oBiological Sciences Dataset
oBiotechnology Dataset
oGeospatial Dataset
oEarth Science Dataset
oPhysical Science Dataset
oOtherhellip
oDublin Core (DC) A general metadata standard for describing a wide range of
digital resources
o Dublin Core Metadata Element Set Version 11
(httpdublincoreorgdocumentsdces)
o 15 Elements Title Creator Subject or keyword Description Publisher Type Format
Identifier Source Language Relation Coverage Rights
o DCMI Metadata Terms (httpdublincoreorgdocumentsdcmi-terms)
o DC Qualifiers (httpdublincoreorgdocumentsusageguidequalifiersshtml)
o Encoded Archival Description (EAD)
o A standard for encoding archival finding aids with XML
oGovernment Information Locator Service (GILS)
o The Global Information Locator Service defines a core element set for government
information so that it can be more searchable and discoverable by the general public
oONIX for Books (ONline Information eXchange)
o An international standard for representing and communicating book industry product
oMARC Code List for Countries httpwwwlocgovmarccountries
oMARC Code List for Languages httpwwwlocgovmarclanguages
oMARC Source Codes for Vocabularies Rules and Schemes
httpwwwlocgovmarcsourcecodeformformsourcehtml
oFor digital and online resources
oInternet Media Types wwwianaorgassignmentsmedia-
typesindexhtml
oMODS Note Types httpwwwlocgovstandardsmodsmods-
noteshtml
oDCMI Type Vocabulary httpdublincoreorgdocumentsdcmi-
termsindexshtmlH7
o Subject Thesauri and Ontologies
  o AGROVOC (Food and Agriculture Organization of the United Nations vocabulary)
  o Astronomy Thesaurus
  o CAB Thesaurus (for life sciences, technology, and social sciences)
  o CIF dictionaries (for Physics)
  o Eurovoc (European Union Thesaurus)
  o Ethnographic Thesaurus
  o Gene Ontology
  o GeoNames
  o Getty Institute Art and Architecture Thesaurus Online
  o Getty Institute Thesaurus of Geographic Names
  o ICD (International Classification of Diseases)
  o Library of Congress Authorities for subject headings
  o Library of Congress Thesaurus for Graphic Materials
  o Logical Observation Identifiers Names and Codes (LOINC)
  o MeSH (Medical Subject Headings)
  o Public Health Language
  o Rare Books and Manuscripts Section (RBMS) Controlled Vocabularies
  o RxNorm (for drugs)
  o SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms)
  o STW Thesaurus for Economics
  o UNBIS Thesaurus
  o UNESCO Thesaurus
  o USDA National Agricultural Library Agriculture Thesaurus
Question: Have you ever used thesauri in your study and research?
Getty Union List of Artist Names (ULAN)
The ULAN includes proper names and associated information about artists. Artists may be either individuals (persons) or groups of individuals working together (corporate bodies). Artists in the ULAN generally represent creators involved in the conception or production of visual arts and architecture.
Library of Congress Name Authority File (LCNAF)
The LCNAF provides authoritative data for names of persons, organizations, events, places, and titles.
Virtual International Authority File (VIAF)
The VIAF™ (Virtual International Authority File) combines multiple name authority files into a single OCLC-hosted name authority service. The goal of the service is to lower the cost and increase the utility of library authority files by matching and linking widely-used authority files and making that information available on the Web.
Web Ontology Language (OWL)
The OWL 2 Web Ontology Language is an ontology language for the Semantic Web with formally defined meaning. OWL 2 ontologies provide classes, properties, individuals, and data values and are stored as Semantic Web documents. OWL 2 ontologies can be used along with information written in RDF, and OWL 2 ontologies themselves are primarily exchanged as RDF documents.
MADS/RDF
The Metadata Authority Description Schema (MADS) is an XML schema for an element set that may be used to provide metadata about authorized forms of agents (people, organizations), events, and terms (topics, geographics, genres, etc.). MADS/RDF builds on MADS/XML as a knowledge organization system.
Resource Description Framework (RDF)
RDF is a standard model for data interchange on the Web. RDF extends the linking structure of the Web to use URIs to name the relationship between things as well as the two ends of the link (this is usually referred to as a “triple”). Using this simple model, it allows structured and semi-structured data to be mixed, exposed, and shared across different applications.
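The triple idea can be sketched in a few lines of Python. This is an illustration only, not an RDF library: the `example.org` URIs are hypothetical, the `to_ntriples` helper is invented for this sketch, and treating any value that starts with `http://` or `https://` as a URI (and everything else as a plain literal) is a simplifying assumption. The output follows the basic line shape of the N-Triples syntax.

```python
# Dublin Core terms namespace, used here for the predicates.
DCTERMS = "http://purl.org/dc/terms/"


def is_uri(value):
    """Simplifying assumption: anything that looks like an HTTP(S) URL is a URI."""
    return value.startswith(("http://", "https://"))


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples as N-Triples lines."""
    lines = []
    for subject, predicate, obj in triples:
        if is_uri(obj):
            rendered = f"<{obj}>"
        else:
            # Escape backslashes and double quotes inside plain literals.
            escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
            rendered = f'"{escaped}"'
        lines.append(f"<{subject}> <{predicate}> {rendered} .")
    return "\n".join(lines)


dataset = "http://example.org/dataset/1"  # hypothetical dataset URI
triples = [
    # The predicate URI names the relationship; subject and object are the two ends.
    (dataset, DCTERMS + "title", "Interview transcripts, wave 1"),
    (dataset, DCTERMS + "creator", "http://example.org/person/42"),
]
print(to_ntriples(triples))
```

Because each statement stands alone, triples from different sources can be mixed simply by concatenating them, which is the property the paragraph above refers to.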
SKOS: Simple Knowledge Organization for the Web
SKOS is a W3C recommendation designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary.
Linked data examples
• FAST: Faceted Application of Subject Terminology
• Dewey Decimal Classification
• Open Metadata Registry (RDA vocabularies)
• Library of Congress Linked Data Service
• …
OpenRefine (ex-Google Refine) is a powerful tool for working with messy data: cleaning it, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase. http://openrefine.org
Nesstar Publisher is a free advanced data management program. It can be used for the preparation of data and metadata. It is DDI compliant. http://www.nesstar.com/software/publisher.html
QualAnon: DSDR Qualitative Data Anonymizer. This free transcript anonymization tool is designed solely to de-identify qualitative interview transcripts. https://www.icpsr.umich.edu/icpsrweb/DSDR/tools/anonymize.jsp
Colectica for Microsoft Excel: A free tool to document your spreadsheet data using the Data Documentation Initiative (DDI) metadata format, the open standard for data documentation. http://www.colectica.com/software/colecticaforexcel
Schematron is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees. It is a structural schema language expressed in XML using a small number of elements and XPath. http://xml.ascc.net/resource/schematron/schematron.html
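The rule-based idea behind Schematron can be imitated in plain Python. The sketch below is not Schematron and does not use a Schematron processor; it only mirrors the pattern of "for every context node, assert that something is present", using the limited XPath subset supported by `xml.etree.ElementTree`. The element names (`studies`, `study`, `title`, `creator`) and the rule messages are hypothetical.

```python
import xml.etree.ElementTree as ET

# Each rule: (context path, path that must exist under the context, message).
RULES = [
    (".//study", "title", "A study must have a title."),
    (".//study", "creator", "A study must name at least one creator."),
]


def validate(xml_text, rules=RULES):
    """Return the messages of all failed assertions, in rule order."""
    root = ET.fromstring(xml_text)
    failures = []
    for context_path, test_path, message in rules:
        # Visit every node matching the context, as a Schematron rule would.
        for node in root.findall(context_path):
            if node.find(test_path) is None:
                failures.append(message)
    return failures


doc = "<studies><study><title>Wave 1</title></study></studies>"
print(validate(doc))  # ['A study must name at least one creator.']
```

Real Schematron rules are themselves written in XML and can use full XPath expressions, so they can state conditions that this sketch cannot, such as comparisons between sibling values.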
Altova XMLSpy is an advanced XML editor for modeling, editing, transforming, and debugging XML-related technologies. http://www.altova.com/xmlspy.html
<oXygen/> XML Editor is an XML tool that supports all the XML schema languages. The XSLT and XQuery support is enhanced with powerful debuggers and performance profilers. You can use <oXygen/> XML Editor to work with all XML-based technologies, including XML databases, XProc pipelines, and web services. http://www.oxygenxml.com
LabTrove is a free blogging platform specifically designed for use in a research environment. It aims to serve as a highly flexible electronic notebook and data management system: by integrating with a lab’s data-producing instruments, researchers can describe an experiment and associate it with its data output at the time of capture, rather than annotating after the fact. http://www.labtrove.org
Kepler is a scientific workflow modeling and management system that enables users, regardless of programming experience, to set up data analysis pipelines. The software will assemble, execute, and document the services and scripts that scientists with large-scale data use to execute research. https://kepler-project.org
DataCite
The DataCite Consortium provides a number of services to support efforts at increasing the ease and prevalence of data citation. http://www.datacite.org
DMPTool is an online service that enables researchers to create data management plans, now required by many funding agencies, and to receive tailored institutional guidance to help them in the process. https://dmp.cdlib.org
o Section II addresses data documentation more from the researcher’s view
o Section III interprets data documentation more from a curator’s or librarian’s perspective
o What do researchers really care about?
o Will each party see the other side’s points and emphases?
Create, edit, share, and save data management plans
Open access scholarly publishing services: papers, journals, books, seminars & more
Curation repository: store, manage, and share research data
Create and manage persistent identifiers
Open source add-in for Microsoft Excel as a data collection tool
An infrastructure to publish and get credit for sharing research data
CDL Curation and Publishing Services
http://www.cdlib.org
This slide is by Joan Starr, California Digital Library: http://www.slideshare.net/joanstarr/dataset-metadata-tools-approaches-for-access-preservation?from_search=1
Data Publication
http://library.ucf.edu/ScholarlyCommunication/UCFResearchLifecycle.pdf
Data Set Related Services
o “Data Set (also called ‘Dataset’) Metadata” provides researchers consultation on:
  o Project and dataset documentation
  o Metadata standards (Common and Domain Specific)
  o Metadata schemas customization
  o Controlled vocabularies and thesauri
  o Data curation tools and practices
o Assists in describing basic properties of your data and enriching metadata for your datasets
o Supports applying controlled vocabularies or optimizing keywords to enhance the search of your datasets
o Helps to prepare your metadata and data for deposit and preservation
o Scholarly Communication (http://library.ucf.edu/ScholarlyCommunication/)
o SC Contact Information (http://library.ucf.edu/ScholarlyCommunication/Contact.php)
o UCF Library Research Guides (http://guides.ucf.edu)
  o Metadata Guide (http://guides.ucf.edu/metadata)
  o Data Management Guide (http://guides.ucf.edu/data)
o Research and Information Services (http://library.ucf.edu/Reference)
FGDC/CSDGM Metadata
Abstract: United States Geological Survey, Saint Petersburg, Florida, Center for Coastal and Watershed Studies…
Purpose: These data and information are intended for science researchers, students…
Language: eng; USA
Citation
  Title: Biological data of field activity 08CRD01 (B-1-08-VI) in US Virgin Islands from 05/30/2008 to 06/13/2008
  Date
    Date: 2013-03-03
    Date Type: Publication Date
  Organisation Name: US Geological Survey (USGS) <http://www.usgs.gov/>, Coastal and Marine Geology (CMG) <http://walrus.wr.usgs.gov/>
  Role: Publisher
  Contact Info: …
Point Of Contact: …
Representation Type: Vector
Topic Category
Keyword Collection
  Keyword: EARTH SCIENCE > OCEANS
  Associated Thesaurus: Global Change Master Directory (GCMD)
  Keyword: Marine Geology
  Associated Thesaurus: USGS CMG InfoBank
Spatial Extent
  West Bounding Longitude: -65.75000
  East Bounding Longitude: -63.25000
  North Bounding Latitude: 18.75000
  South Bounding Latitude: 17.25000
Constraints: Please recognize the US Geological Survey (USGS) as the source of this information. Physical materials are under controlled on-site access. Some USGS information accessed through this means may be preliminary in nature and presented without the approval of the Director of the USGS…
Legal Constraints
  Use Constraints: Other Restrictions
  Other Constraints: Use Constraints: Please recognize the US Geological Survey (USGS) as the source of this information. Physical materials are under controlled on-site access…
…
Distribution
  Distribution Format
    Format Name: ASCII
    Format Version:
    File Decompression Technique: No compression applied
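The spatial extent in a record like this is easy to check mechanically. The sketch below reads the four bounding coordinates of the example record as decimal degrees (west -65.75, east -63.25, north 18.75, south 17.25), which is an assumption about how the extracted values should be read. The `BoundingBox` class, its method names, and the test point are hypothetical, and the containment check assumes the box does not cross the antimeridian.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """A geographic extent in decimal degrees."""
    west: float
    east: float
    north: float
    south: float

    def problems(self):
        """Return a list of human-readable problems; empty if the box looks sane."""
        issues = []
        if not (-180 <= self.west <= 180 and -180 <= self.east <= 180):
            issues.append("longitude outside [-180, 180]")
        if not (-90 <= self.south <= 90 and -90 <= self.north <= 90):
            issues.append("latitude outside [-90, 90]")
        if self.south > self.north:
            issues.append("south latitude is greater than north latitude")
        return issues

    def contains(self, lon, lat):
        """True if the point lies inside the box (no antimeridian crossing)."""
        return self.west <= lon <= self.east and self.south <= lat <= self.north


# Coordinates as read from the example record (assumed decimal degrees).
box = BoundingBox(west=-65.75, east=-63.25, north=18.75, south=17.25)
print(box.problems())             # []
print(box.contains(-64.9, 18.3))  # True for this hypothetical point
```

A check of this kind catches swapped north/south values or a lost decimal point before the record is deposited.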
o XML schema brings documentation into a single document, creates structured content about the data, and allows data interoperability and sharing
o It can document comprehensive variable-level information such as a basic data dictionary, question text, and question routing instructions
o Data Documentation Initiative (DDI): a metadata specification for the social and behavioral sciences. It is an XML metadata standard for documenting numeric data. Detailed information is available at http://www.ddialliance.org
o Projects using the DDI (http://www.ddialliance.org/ddi-at-work/projects)
o DDI-compliant data repositories
  o ICPSR - Inter-university Consortium for Political and Social Research
    o Data deposit form: https://www.icpsr.umich.edu/cgi-bin/ddf2
    o UCF is a member of ICPSR
  o UKDA - UK Data Archive
ICPSR: Inter-university Consortium for Political and Social Research
http://www.icpsr.umich.edu/icpsrweb/NACJD/studies/20363?archive=NACJD&q=%22university+of+central+florida%22&permit%5B0%5D=AVAILABLE&x=-999&y=-84
Field Labels
Title
Principal investigator(s)
Summary
Access notes
Dataset(s)
• DS0: Study-Level Files
  Documentation: Questionnaire (PDF), User guide (PDF)
• DS1: Female Interviews
  Documentation: Codebook (PDF)
• …
Field Labels
Study description
Citation
Funding
Scope of study
• Subject terms
• Smallest geographic unit
• Geographic coverage
• Time period
• Date of collection
• Unit of observation
• Universe
• Data types
• Data collection notes
Methodology
• Study purpose
• Study design
• Sample
• Mode of data collection
• Description of variables
• Response rates
• Presence of common scales
• Extent of processing
Field Labels
Version(s)
Related publications
Variables
Utilities
• Metadata exports
• Download statistics
Variables
List all 1682 variables in this study, e.g.:
ID: QUESTIONNAIRE ID NUMBER
ISEX: INTERVIEWER GENDER
START: INTERVIEW START TIME HHMM USE 24 HR CLOCK
Q1A: COUNTRY OF BIRTH
Q1B: STATE OF BIRTH - INITIALS OF STATE
Q1C: CITY OF BIRTH WRITE IN NOT APP
Q1D: YEARS LIVED IN USA
Q1E: RESIDENCY STATUS
CHECK1: CHECKPOINT 1 BORN IN SAME METRO AREA
Q2: HOW LONG LIVED IN THIS AREA
…
(http://www.icpsr.umich.edu/icpsrweb/NACJD/ssvd/studies/20363/variables)
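A variable list like the one above is the core of a data dictionary. As a minimal sketch, the Python below stores a few of those variable names with their labels, writes them out as a two-column CSV, and reports dataset columns that have no entry. The split of each item into a name and a label is a reading of the example, and the function names and the `Q99` column are hypothetical.

```python
import csv
import io

# A few (variable name, label) pairs taken from the example variable list.
VARIABLES = [
    ("ID", "QUESTIONNAIRE ID NUMBER"),
    ("ISEX", "INTERVIEWER GENDER"),
    ("Q1A", "COUNTRY OF BIRTH"),
    ("Q1D", "YEARS LIVED IN USA"),
]


def data_dictionary_csv(variables):
    """Render (name, label) pairs as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["variable", "label"])
    writer.writerows(variables)
    return buffer.getvalue()


def undocumented_columns(columns, variables):
    """Return dataset columns that have no entry in the data dictionary."""
    documented = {name for name, _label in variables}
    return [column for column in columns if column not in documented]


print(data_dictionary_csv(VARIABLES))
print(undocumented_columns(["ID", "Q1A", "Q99"], VARIABLES))  # ['Q99']
```

A fuller data dictionary would usually add columns for value codes, units, and missing-value conventions.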
http://www.icpsr.umich.edu/icpsrweb/ICPSR/ddi2/studies/20363
docDscr: Document Description
The Document Description consists of bibliographic information describing the DDI-compliant document itself as a whole.
Included Fields
citation
• titlStmt
• prodStmt
• verStmt
• holdings
stdyDscr: Study Description
The Study Description consists of information about the data collection, study, or compilation that the DDI-compliant documentation file describes. This section includes information about how the study should be cited, who collected or compiled the data, who distributes the data, keywords about the content of the data, a summary (abstract) of the content of the data, data collection methods and processing, etc.
Included Fields
Citation
• titlStmt
• rspStmt
• prodStmt
• fundAg
• grantNo
• distStmt
• biblCit
• Holdings
stdyInfo
• Subject
• Abstract
• sumDscr
Method
• dataColl
• Notes
• anlyInfo
dataAccs
• setAvail
• useStmt
fileDscr: Data Files Description
Information about the data file(s) that comprise a collection. This section can be repeated for collections with multiple files.
Included Fields
fileDscr
fileTxt
fileName
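The three sections described above can be sketched as a skeleton XML document. This is a simplified illustration, not a schema-valid DDI instance: it omits the DDI namespace, attributes, and most required content. The root element `codeBook` and the `titl` element inside `titlStmt` are taken from the DDI Codebook vocabulary rather than from the slides, and the function name, titles, and file names are hypothetical.

```python
import xml.etree.ElementTree as ET


def add_citation(parent, title):
    """Add citation/titlStmt/titl under the given section element."""
    citation = ET.SubElement(parent, "citation")
    titl_stmt = ET.SubElement(citation, "titlStmt")
    ET.SubElement(titl_stmt, "titl").text = title


def build_codebook(doc_title, study_title, file_names):
    """Build a bare-bones codebook with docDscr, stdyDscr, and fileDscr sections."""
    code_book = ET.Element("codeBook")

    # docDscr describes the DDI document itself.
    add_citation(ET.SubElement(code_book, "docDscr"), doc_title)

    # stdyDscr describes the study the data come from.
    add_citation(ET.SubElement(code_book, "stdyDscr"), study_title)

    # fileDscr is repeated once per data file in the collection.
    for name in file_names:
        file_dscr = ET.SubElement(code_book, "fileDscr")
        file_txt = ET.SubElement(file_dscr, "fileTxt")
        ET.SubElement(file_txt, "fileName").text = name

    return code_book


# Hypothetical titles and file names, for illustration only.
book = build_codebook(
    "Codebook for the example study",
    "Example study of interview data",
    ["interviews_part1.txt", "interviews_part2.txt"],
)
print(ET.tostring(book, encoding="unicode"))
```

In practice a tool such as Nesstar Publisher or Colectica would generate and validate this structure; the sketch only shows how the sections nest.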
o Context and participant details of interviews can be documented in:
  o A descriptive header or summary page in transcripts or field notes
  o A structured data list
o XML mark-up of data, for example:
  o Text Encoding Initiative (TEI) to mark up interview transcripts
  o Qualitative Data Exchange Format (QuDEx) for researcher annotations and data linking
o Anonymisation of textual data (e.g., replacing real names of people, organizations, and locations with pseudonyms)
o File naming
  o Meaningful short names: identify file types (e.g., interviews, focus groups, field notes, audio recordings); avoid spaces and special characters; avoid long names
  o Organizing files in folders: create uniform and structured folder names based on cases, studies, locations, data types, etc., or on the original, anonymized, coded, or annotated versions of data
  o Version control: version numbering in file names
o Documentation: methodology description, project plan, interview guidelines, consent form templates, data analyses and manipulation
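The anonymisation step mentioned in the list above can be sketched as a simple search-and-replace with a pseudonym key. This is a minimal illustration, not the method of any particular tool: the names, places, and pseudonyms are invented, and the `pseudonymize` function is hypothetical. It matches whole words only and tries longer names first so that a full name is replaced before a first name alone.

```python
import re


def pseudonymize(text, replacements):
    """Replace each real name in `replacements` with its pseudonym."""
    if not replacements:
        return text
    # Longest names first, so "Maria Lopez" wins over "Maria".
    names = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in names) + r")\b"
    )
    return pattern.sub(lambda match: replacements[match.group(1)], text)


# Hypothetical pseudonym key; keep it separate from the shared transcripts.
key = {
    "Maria Lopez": "[Participant A]",
    "Maria": "[Participant A]",
    "Orlando": "[City 1]",
}
print(pseudonymize("Maria Lopez moved to Orlando. Maria likes it.", key))
# [Participant A] moved to [City 1]. [Participant A] likes it.
```

Exact matching will miss misspellings, nicknames, and indirect identifiers, so a manual read-through of each transcript is still needed.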
o Example is from A NESSTAR FOR QUALITATIVE DATA: BUILDING BLOCKS FOR DIGITAL FUTURES by Louise Corti et al., available at http://data-archive.ac.uk/media/376907/digitalfutures_dashish_21nov2012.pdf
o Data List
Interview ID | Text File Name
x001 | 6124int001
x002 | 6124int002
… | …
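A data list of this kind can be generated rather than typed by hand. The sketch below assumes that file names such as `6124int001` are made of a study number, a short type code, and a zero-padded sequence number; that reading of the pattern, the function names, and the 32-character length limit are assumptions for illustration.

```python
import re

# Letters, digits, underscore, and hyphen only: no spaces or special characters.
SAFE_NAME = re.compile(r"^[A-Za-z0-9_-]+$")


def text_file_name(study_number, kind, sequence):
    """Compose a name like 6124int001 from its assumed parts."""
    return f"{study_number}{kind}{sequence:03d}"


def build_data_list(study_number, count, kind="int"):
    """Return one row per interview, pairing an interview ID with a file name."""
    rows = []
    for number in range(1, count + 1):
        rows.append({
            "interview_id": f"x{number:03d}",
            "text_file_name": text_file_name(study_number, kind, number),
        })
    return rows


def is_safe_name(name, max_length=32):
    """True if the name is short and free of spaces and special characters."""
    return len(name) <= max_length and SAFE_NAME.match(name) is not None


for row in build_data_list(6124, 2):
    print(row["interview_id"], row["text_file_name"])
print(is_safe_name("interview 1 (final)"))  # False
```

The same row structure can be extended with columns for interview date, location, or participant characteristics.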
o Create and generate metadata for your research data and datasets in your research lifecycle to preserve the data in the long run
o Consider what information is needed for the data to be read and interpreted in the future
o Understand your funder requirements for data documentation and metadata. Funder requirements for NSF, GBMF, IMLS, NEH, NIH, and NOAA can be found at https://dmptool.org/guidance
o Consult available metadata standards in your field. You may refer to Common Metadata Standards and Domain Specific Metadata Standards for details
o Describe data and datasets created in your research lifecycle, and use software programs and tools to assist in data documentation. Assign or capture administrative, descriptive, technical, structural, and preservation metadata for the data. Some potential information to document:
  o Descriptive metadata
    o Name of creator of data set
    o Name of author of document
    o Title of document
    o File name
    o Location of file
    o Size of file
  o Structural metadata
    o File relationships (e.g., child, parent)
  o Technical metadata
    o Format (e.g., text, SPSS, Stata, Excel, tiff, mpeg, 3D, Java, FITS, CIF)
    o Compression or encoding algorithms
    o Encryption and decryption keys
    o Software (including release number) used to create or update the data
    o Hardware on which the data were created
    o Operating systems in which the data were created
    o Application software in which the data were created
  o Administrative metadata
    o Information about data creation (e.g., date)
    o Information about subsequent updates, transformation, versioning, summarization
    o Descriptions of migration and replication
    o Information about other events that have affected the files
  o Preservation metadata
    o File format (e.g., txt, pdf, doc, rtf, xls, xml, spv, jpg, fits)
    o Significant properties
    o Technical environment
    o Fixity information
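Several of the items above (file name, location, size, format, fixity) can be captured automatically. The following is a minimal sketch using only Python's standard library; the function names and the dictionary keys are hypothetical, and SHA-256 is used as one common choice of checksum rather than a requirement of any standard.

```python
import hashlib
from pathlib import Path


def file_fixity(path, algorithm="sha256", chunk_size=65536):
    """Compute a checksum of the file contents, reading in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def describe_file(path):
    """Collect basic descriptive, technical, and fixity information for one file."""
    target = Path(path)
    return {
        "file_name": target.name,
        "location": str(target.parent),
        "size_bytes": target.stat().st_size,
        "format": target.suffix.lstrip(".").lower(),
        "sha256": file_fixity(target),
    }


def verify_fixity(path, expected_sha256):
    """True if the file still matches a previously recorded checksum."""
    return file_fixity(path) == expected_sha256
```

Recording the checksum at deposit time and re-running `verify_fixity` later shows whether a file has changed or been corrupted during storage or migration.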
o Adopt a thesaurus in your field if applicable, or compile a data dictionary for your dataset
o Obtain persistent identifiers (e.g., DOI, PURL) for datasets if possible to ensure data can be found in the future
o For your full data management plan, visit the UCF Libraries Data Management Guide. Also refer to the Digital Curation Centre’s Checklist for a Data Management Plan (http://www.dcc.ac.uk/sites/default/files/documents/resource/DMP_Checklist_2013.pdf)
o Common Metadata Standards
o Disciplinary Metadata Standards
o Activity: Choose a dataset or a standard in your field to examine and critique
  o Social Science Dataset
  o Humanities Dataset
  o Biological Sciences Dataset
  o Biotechnology Dataset
  o Geospatial Dataset
  o Earth Science Dataset
  o Physical Science Dataset
  o Other…
o Dublin Core (DC): A general metadata standard for describing a wide range of digital resources
  o Dublin Core Metadata Element Set, Version 1.1 (http://dublincore.org/documents/dces/)
o 15 Elements Title Creator Subject or keyword Description Publisher Type Format
Identifier Source Language Relation Coverage Rights
o DCMI Metadata Terms (httpdublincoreorgdocumentsdcmi-terms)
o DC Qualifiers (httpdublincoreorgdocumentsusageguidequalifiersshtml)
o Encoded Archival Description (EAD)
o A standard for encoding archival finding aids with XML
oGovernment Information Locator Service (GILS)
o The Global Information Locator Service defines a core element set for government
information so that it can be more searchable and discoverable by the general public
oONIX for Books (ONline Information eXchange)
o An international standard for representing and communicating book industry product
oMARC Code List for Countries httpwwwlocgovmarccountries
oMARC Code List for Languages httpwwwlocgovmarclanguages
oMARC Source Codes for Vocabularies Rules and Schemes
httpwwwlocgovmarcsourcecodeformformsourcehtml
oFor digital and online resources
oInternet Media Types wwwianaorgassignmentsmedia-
typesindexhtml
oMODS Note Types httpwwwlocgovstandardsmodsmods-
noteshtml
oDCMI Type Vocabulary httpdublincoreorgdocumentsdcmi-
termsindexshtmlH7
o Subject Thesauri and Ontologies
o AGROVOC (Agricultural Organization of the United Nations Vocabulary)
o Astronomy Thesaurus
o CAB Thesaurus (for life sciences technology and social sciences)
o CIF dictionaries (for Physics)
o Eurovoc (European Union Thesaurus)
o Ethnographic Thesaurus
o Gene Ontology
o GeoNames
o Getty Institute Art and Architecture Thesaurus Online
o Getty Institute Thesaurus of Geographic Names
o ICD (International Classification of Diseases)
o Library of Congress Authorities for subject headings
o Library of Congress Thesaurus for Graphic Materials
o Logical Observation Identifiers Names and Codes (LOINC)
o MESH (Medical Subject Headings)
o Public Health Language
o Rare Books and Manuscripts Section (RBMS) Controlled Vocabularies
o RxNorm (for drugs)
o SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms)
o STW Thesaurus for Economics
o UNBIS Thesaurus
o UNESCO Thesaurus
o USDA National Agricultural Library Agriculture Thesaurus
Question Have you ever
used thesauri in your study
and research
Getty Union List of Artist Names
(ULAN)The ULAN includes proper names and
associated information about artists
Artists may be either individuals
(persons) or groups of individuals working
together (corporate bodies) Artists in
the ULAN generally represent creators
involved in the conception or production
of visual arts and architecture
Library of Congress Name
Authority File (LCNAF)
The LCNAF provides authoritative
data for names of persons
organizations events places and
titles
Virtual International
Authority File (VIAF)
The VIAFtrade (Virtual International
Authority File) combines multiple
name authority files into a single
OCLC-hosted name authority
service The goal of the service is to
lower the cost and increase the
utility of library authority files by
matching and linking widely-used
authority files and making that
information available on the Web
Web Ontology Language
(OWL)The OWL 2 Web Ontology Language is an
ontology language for the Semantic Web
with formally defined meaning OWL 2
ontologies provide classes properties
individuals and data values and are stored
as Semantic Web documents OWL 2
ontologies can be used along with
information written in RDF and OWL 2
ontologies themselves are primarily
exchanged as RDF documents
MADSRDFThe Metadata Authority Description
Schema (MADS) is an XML schema for an
element set that may be used to provide
metadata about authorized forms of
agents (people organizations) events
and terms (topics geographics genres
etc) MADSRDF
builds on MADSXML as a knowledge
organization system
Resource Description
Framework (RDF)RDF is a standard model for data
interchange on the Web RDF extends
the linking structure of the Web to use
URIs to name the relationship
between things as well as the two
ends of the link (this is usually
referred to as a ldquotriplerdquo) Using this
simple model it allows structured and
semi-structured data to be mixed
exposed and shared across different
applications
SKOS Simple Knowledge
Organization for the Web SKOS is a W3C recommendation
designed for representation of
thesauri classification
schemes taxonomies subject-
heading systems or any other
type of structured controlled
vocabularyLinked data
examplesbull FAST Faceted
Application of
Subject
Terminology
bull Dewey Decimal
Classification
bull Open Metadata
Registry (RDA
vocabularies)
bull Library of Congress
Linked Data
Service
hellip
OpenRefine (ex-Google Refine) is a powerful tool for working with messy data cleaning it transforming it from one format into another extending it with web services and linking it to databases like Freebasehttpopenrefineorg
Nesstar Publisher is a
free advanced data management program It can be used for the preparation of data and metadata Its DDI complianthttpwwwnesstarcomsoftwarepublisherhtml
QualAnon DSDR
Qualitative Data Anonymizer
This free transcript anonymizationtool is designed solely to de-identify qualitative interview transcriptshttpswwwicpsrumicheduicpsrwebDSDRtoolsanonymizejsp
Colectica for Microsoft Excel
A free tool to document your spreadsheet data using the Data Documentation Initiative (DDI) metadata format the open standard for data documentationhttpwwwcolecticacomsoftwarecolecticaforexcel
Schematron is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees It is a structural schema language expressed in XML using a small number of elements and XPathhttpxmlasccnetresourceschematronschematronhtml
Altova XMLSpy is an advanced XML editor for modeling editing transforming and debugging XML-related
technologieshttpwwwaltovacomxmlspy
html
ltoXygengt XML
Editor is an XML tool that supports all the XML schema languages The XSLT and XQuery support is enhanced with powerful debuggers and performance profilers You can use ltoXygengt XML Editor to work with all XML-based technologies including XML databases XProcpipelines and web serviceshttpwwwoxygenxmlcom
LabTrove is a free blogging
platform specifically designed for use in a research environment It aims to serve as a highly flexible electronic notebook and data management system by integrating with a labrsquos data-producing instruments researchers can describe an experiment and associate it with its data output at the time of capture rather than annotating after the fact httpwwwlabtroveorg
Kepler is a scientific workflow
modeling and management system that enables users regardless of programming experience to set up data analysis pipelines The software will assemble execute and document theof services and scripts that scientists with large-scale data use to execute researchhttpskepler-projectorg
DataCiteThe DataCite Consortium
provides a number of
services to support
efforts at increasing the
ease and prevalence of
data citationhttpwwwdataciteorg
DMPTool is an online service to enable researchers to create data management plans now required by many funding agencies and to receive tailored institutional guidance to help them in the processhttpsdmpcdliborg
oSection II addresses data documentation more from the
researcherrsquos view
oSection III interprets data documentation more from
a curator or librarians perspective
oWhat do researchers really care about
oWill each party see the other sidersquos points and
emphases
Create edit share and save
data management plans
Open access scholarly publishing services
papers journals books seminars amp more
Curation repository store manage and share research data
Create and manage
persistent identifiers
Open source add-in for Microsoft
Excel as a data collection tool
An infrastructure to publish and get credit
for sharing research data
CDL Curation and Publishing Services
httpwwwcdliborg
This slide is by Joan Starr California Digital Library httpwwwslidesharenetjoanstarrdataset-metadata-tools-approaches-for-access-preservationfrom_search=1
Data Publication
httplibraryucfeduScholarlyCommunicationUCFResearchLifecyclepdfData Set Related Services
oldquoData Set (also called lsquoDatasetrsquo) Metadatardquo provides
researchers consultation on
oProject and dataset documentation
oMetadata standards (Common and Domain Specific)
oMetadata schemas customization
oControlled vocabularies and thesauri
oData curation tools and practices
oAssists in describing basic properties of your data and enriching
metadata for your datasets
oSupports applying controlled vocabularies or optimizing keywords
to enhance the search of your datasets
oHelps to prepare your metadata and data for deposit and
preservation
oScholarly Communication (httplibraryucfeduScholarlyCommunication)
oSC Contact Information (httplibraryucfeduScholarlyCommunicationContactphp)
oUCF Library Research Guides (httpguidesucfedu)
oMetadata Guide (httpguidesucfedumetadata)
oData Management Guide (httpguidesucfedudata)
oResearch and Information Services (httplibraryucfeduReference)
Abstract United States Geological Survey Saint Petersburg Florida Center for Coastal and Watershed
Studieshellip
Purpose These data and information are intended for science researchers studentshellip
Language eng USA
Citation
Title Biological data of field activity 08CRD01 (B-1-08-VI) in US Virgin Islands from 05302008 to 06132008
Date
Date 2013-03-03
Date Type Publication Date
Organisation Name US Geological Survey (USGS) lthttpwwwusgsgovgt Coastal and Marine Geology
(CMG) lthttpwalruswrusgsgovgt
Role Publisher
Contact Info hellip
Point Of Contact hellip
Representation Type Vector
Topic Category
Keyword Collection
Keyword EARTH SCIENCE gt OCEANS
Associated Thesaurus Global Change Master Directory (GCMD)
Keyword Marine Geology
Associated Thesaurus USGS CMG InfoBank
Spatial Extent
West Bounding Longitude -6575000
East Bounding Longitude -6325000
North Bounding Latitude 1875000
South Bounding Latitude 1725000
FGDCCSDGM
Metadata
Constraints Please recognize the US Geological Survey (USGS) as the source of this information Physical materials are under controlled on-site access Some USGS information accessed through this means may be preliminary in nature and presented without the approval of the Director of the USGShellip
Legal Constraints
Use Constraints Other Restrictions
Other Constraints Use Constraints Please recognize the US Geological Survey (USGS) as the source of this information Physical materials are under controlled on-site accesshellip
hellip
Distribution
Distribution Format
Format Name ASCII
Format Version
File Decompression Technique No compression applied
oXML schema brings documentation into a single document creates
structured content about the data and allows data interoperability and
sharing
oIt can document comprehensive variable level information such as basic
data dictionary question text and question routing instructions
oData Documentation Initiative (DDI) a metadata specification for the
social and behavioral sciences It is an XML metadata standard for
documenting numeric data Detailed information is available
at httpwwwddiallianceorg
oProjects using the DDI (httpwwwddiallianceorgddi-at-workprojects)
oDDI-compliant data repository
o ICPSR - Inter-university Consortium for Political and Social Research
o Data deposit form httpswwwicpsrumicheducgi-binddf2
o UCF is a member of ICPSR
oUKDA - UK Data Archive
Field Labels
TitlePrincipal investigator(s)
Summary
Access notes
Dataset(s)
httpwwwicpsrumicheduicpsrwebNA
CJDstudies20363archive=NACJDampq=22
university+of+central+florida22amppermit
5B05D=AVAILABLEampx=-999ampy=-84
ICPSR Interuniversity
Consortium for
Political and
Social Research
Dataset(s)
DSO Study-Level Files
Documentation
Questionnairepdf
User guidepdf
DS1 Female Interviews
Documentation
Codebookpdf
hellip
Field Labels
Study description
Citation
Funding
Scope of studybull Subject terms
bull Smallest
geographic unit
bull Geographic
coverage
bull Time period
bull Date of collection
bull Unit of
observation
bull Universe
bull Data types
bull Data collection
notes
Methodologybull Study purpose
bull Study design
Field Labels
bull Sample
bull Mode of data collection
bull Description of variables
bull Response rates
bull Presence of common
scales
bull Extent of processing
Field Labels
Version(s)
Related publications
Variables
Utilities
bull Metadata exports
bull Download statistics
Variables
List all 1682 variables in this study
egID QUESTIONNAIRE ID NUMBER ISEX INTERVIEWER GENDER START INTERVIEW START TIME HHMM USE 24 HR CLOCK Q1A COUNTRY OF BIRTH Q1B STATE OF BIRTH - INITIALS OF STATEQ1C CITY OF BIRTH WRITE IN NOT APPQ1D YEARS LIVED IN USAQ1E RESIDENCY STATUSCHECK1 CHECKPOINT 1 BORN IN SAME METRO AREAQ2 HOW LONG LIVED IN THIS AREA hellip (httpwwwicpsrumicheduicpsrwebNACJDssv
dstudies20363variables)
httpwwwicpsrumicheduicpsrwebICPSRddi2studies20363
docDscrThe Document
Description
consists of
bibliographic
information
describing the
DDI-compliant
document
itself as a
whole
Included Fields
citation
bull titleStmt
bull prodStmt
bull verStmt
bull holdings
Included FieldsCitation
titlStmt
rspStmt
prodStmt
fundAg
grantNo
distStmt
biblCit
Holdings
stdyInfoSubject
Abstract
sumDscr
MethoddataColl
Notes
anlyInfo
dataAccssetAvail
useStmt
stdyDscr The Study
Description consists of
information about the
data collection study
or compilation that the
DDI-compliant
documentation file
describes This section
includes information
about how the study
should be cited who
collected or compiled
the data who
distributes the data
keywords about the
content of the data
summary (abstract) of
the content of the data
data collection methods
and processing etc
Included Fields
fileDscr
fileTxt
fileName
fileDscr
Data Files
Description
Information about
the data file(s)
that comprises a
collection This
section can be
repeated for
collections with
multiple files
oContext and participant details of interviews can be
oA descriptive header or summary page in transcripts or
field notes
oA structured data list
oXML mark-up of data for example
oText Encoding Initiative (TEI) to mark up interview
transcript
oQualitative Data Exchange Format (QuDEx) for
researcher annotations and data linking
oAnonymisation of textual data (eg replacing real names of people
organizations and locations with pseudonyms)
oFile naming
oMeaningful short names identify file types (eg interviews focus groups
field notes audio recordings) avoid space special characters avoid long
names
oOrganizing files in folders Create uniform and structured folder names based
on cases studies locations data types etc or the original anonymized
coded or annotated versions of data
oVersion control Version numbering in file names
oDocumentation Methodology description project plan interview guidelines
consent form templates data analyses and manipulation
o Example is from A NESSTAR FOR QUALITATIVE DATA BUILDING BLOCKS FOR DIGITAL FUTURES By Corti Louise et al available at httpdata-archiveacukmedia376907digitalfutures_dashish_21nov2012pdf
oData List
Interview ID
x001
x002
hellip
Text File Name
6124int001
6124int002
hellip
oCreate and generate metadata for your research data and
datasets in your research lifecycle to preserve the data in the
long run
oConsider what information is needed for the data to be
read and interpreted in the future
oUnderstand your funder requirements for data
documentation and metadata Funder requirements for NSF
GBMF IMLS NEH NIH and NOAA can be found at
httpsdmptoolorgguidance
oConsult available metadata standards in your field You may
refer to Common Metadata Standards and Domain Specific
Metadata Standards for details
o Describe data and datasets created in your research lifecycle, and use software programs and tools to assist in data documentation. Assign or capture administrative, descriptive, technical, structural, and preservation metadata for the data. Some potential information to document:
  o Descriptive metadata
    o Name of creator of data set
    o Name of author of document
    o Title of document
    o File name
    o Location of file
    o Size of file
  o Structural metadata
    o File relationships (e.g., child, parent)
  o Technical metadata
    o Format (e.g., text, SPSS, Stata, Excel, tiff, mpeg, 3D, Java, FITS, CIF)
    o Compression or encoding algorithms
    o Encryption and decryption keys
    o Software (including release number) used to create or update the data
    o Hardware on which the data were created
    o Operating systems in which the data were created
    o Application software in which the data were created
  o Administrative metadata
    o Information about data creation (e.g., date)
    o Information about subsequent updates, transformation, versioning, summarization
    o Descriptions of migration and replication
    o Information about other events that have affected the files
  o Preservation metadata
    o File format (e.g., txt, pdf, doc, rtf, xls, xml, spv, jpg, fits)
    o Significant properties
    o Technical environment
    o Fixity information
o Adopt a thesaurus in your field, if applicable, or compile a data dictionary for your dataset
o Obtain persistent identifiers (e.g., DOI, PURL) for datasets, if possible, to ensure data can be found in the future
o For your full data management plan, visit the UCF Libraries Data Management Guide. Also refer to the Digital Curation Centre's Checklist for a Data Management Plan (http://www.dcc.ac.uk/sites/default/files/documents/resource/DMP_Checklist_2013.pdf)
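Several of the items listed above (file name, location, size, format, operating system, fixity information) can be captured automatically. The following is a minimal sketch using only the Python standard library; the describe_file function and its field names are hypothetical, and SHA-256 is chosen here as one common checksum for fixity, not one prescribed by the source.

```python
import hashlib
import os
import platform
import tempfile
from datetime import datetime, timezone


def describe_file(path: str) -> dict:
    """Collect a few technical and preservation metadata values for one file."""
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large data files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            sha256.update(chunk)
    stat = os.stat(path)
    return {
        "file_name": os.path.basename(path),
        "location": os.path.dirname(os.path.abspath(path)),
        "size_bytes": stat.st_size,
        "format": os.path.splitext(path)[1].lstrip(".").lower(),
        "modified_utc": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
        "operating_system": platform.system(),
        "fixity_sha256": sha256.hexdigest(),
    }


# Demonstration on a throwaway file with a hypothetical name.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "6124int001.txt")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write("interview text")
    record = describe_file(demo_path)
    print(record)
```

Recomputing the checksum later and comparing it with the stored value is a simple way to confirm that a file has not changed or been corrupted.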
o Common Metadata Standards
o Disciplinary Metadata Standards
o Activity: Choose a dataset or a standard in your field to examine and critique
  o Social Science Dataset
  o Humanities Dataset
  o Biological Sciences Dataset
  o Biotechnology Dataset
  o Geospatial Dataset
  o Earth Science Dataset
  o Physical Science Dataset
  o Other…
o Dublin Core (DC): a general metadata standard for describing a wide range of digital resources
  o Dublin Core Metadata Element Set, Version 1.1 (http://dublincore.org/documents/dces/)
  o 15 elements: Title, Creator, Subject (or keyword), Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights
  o DCMI Metadata Terms (http://dublincore.org/documents/dcmi-terms/)
  o DC Qualifiers (http://dublincore.org/documents/usageguide/qualifiers.shtml)
o Encoded Archival Description (EAD)
  o A standard for encoding archival finding aids with XML
o Government Information Locator Service (GILS)
  o The Global Information Locator Service defines a core element set for government information so that it can be more searchable and discoverable by the general public
o ONIX for Books (ONline Information eXchange)
  o An international standard for representing and communicating book industry product information
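To make the Dublin Core element set above more concrete, here is a small sketch that writes a few of the 15 elements as XML using the Python standard library. The namespace URI is the standard one for the Dublin Core Metadata Element Set 1.1; the "metadata" wrapper element, the dublin_core_record function, and the sample values are hypothetical choices for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, Version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def dublin_core_record(values: dict) -> str:
    """Serialize a dict of Dublin Core element names and values as XML."""
    unknown = set(values) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"Not Dublin Core elements: {sorted(unknown)}")
    root = ET.Element("metadata")  # assumed wrapper element, not part of DC
    for name in DC_ELEMENTS:
        if name in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = values[name]
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample values.
dc_xml = dublin_core_record({
    "title": "Interview transcripts, study 6124",
    "creator": "Example Researcher",
    "type": "Dataset",
    "format": "text/plain",
    "language": "eng",
})
print(dc_xml)
```

All Dublin Core elements are optional and repeatable; this sketch allows only one value per element to stay short.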
o MARC Code List for Countries: http://www.loc.gov/marc/countries/
o MARC Code List for Languages: http://www.loc.gov/marc/languages/
o MARC Source Codes for Vocabularies, Rules, and Schemes: http://www.loc.gov/marc/sourcecode/form/formsource.html
o For digital and online resources
  o Internet Media Types: www.iana.org/assignments/media-types/index.html
  o MODS Note Types: http://www.loc.gov/standards/mods/mods-notes.html
  o DCMI Type Vocabulary: http://dublincore.org/documents/dcmi-terms/index.shtml#H7
o Subject Thesauri and Ontologies
o AGROVOC (Food and Agriculture Organization of the United Nations vocabulary)
o Astronomy Thesaurus
o CAB Thesaurus (for life sciences, technology, and social sciences)
o CIF dictionaries (for Physics)
o Eurovoc (European Union Thesaurus)
o Ethnographic Thesaurus
o Gene Ontology
o GeoNames
o Getty Institute Art and Architecture Thesaurus Online
o Getty Institute Thesaurus of Geographic Names
o ICD (International Classification of Diseases)
o Library of Congress Authorities for subject headings
o Library of Congress Thesaurus for Graphic Materials
o Logical Observation Identifiers Names and Codes (LOINC)
o MeSH (Medical Subject Headings)
o Public Health Language
o Rare Books and Manuscripts Section (RBMS) Controlled Vocabularies
o RxNorm (for drugs)
o SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms)
o STW Thesaurus for Economics
o UNBIS Thesaurus
o UNESCO Thesaurus
o USDA National Agricultural Library Agriculture Thesaurus
Question: Have you ever used thesauri in your study and research?
Getty Union List of Artist Names (ULAN)
The ULAN includes proper names and associated information about artists. Artists may be either individuals (persons) or groups of individuals working together (corporate bodies). Artists in the ULAN generally represent creators involved in the conception or production of visual arts and architecture.
Library of Congress Name Authority File (LCNAF)
The LCNAF provides authoritative data for names of persons, organizations, events, places, and titles.
Virtual International Authority File (VIAF)
The VIAF™ (Virtual International Authority File) combines multiple name authority files into a single OCLC-hosted name authority service. The goal of the service is to lower the cost and increase the utility of library authority files by matching and linking widely-used authority files and making that information available on the Web.
Web Ontology Language (OWL)
The OWL 2 Web Ontology Language is an ontology language for the Semantic Web with formally defined meaning. OWL 2 ontologies provide classes, properties, individuals, and data values and are stored as Semantic Web documents. OWL 2 ontologies can be used along with information written in RDF, and OWL 2 ontologies themselves are primarily exchanged as RDF documents.
MADS/RDF
The Metadata Authority Description Schema (MADS) is an XML schema for an element set that may be used to provide metadata about authorized forms of agents (people, organizations), events, and terms (topics, geographics, genres, etc.). MADS/RDF builds on MADS/XML as a knowledge organization system.
Resource Description Framework (RDF)
RDF is a standard model for data interchange on the Web. RDF extends the linking structure of the Web to use URIs to name the relationship between things as well as the two ends of the link (this is usually referred to as a "triple"). Using this simple model, it allows structured and semi-structured data to be mixed, exposed, and shared across different applications.
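The "triple" idea can be illustrated without any RDF library. The sketch below models a triple as subject, predicate, and object, and writes it in the line-based N-Triples syntax. The example.org dataset URI and the sample values are hypothetical; the two predicate URIs are Dublin Core terms. Literal escaping is simplified to backslashes and double quotes.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str                  # URI naming the thing described
    predicate: str                # URI naming the relationship
    obj: str                      # URI of the other end, or literal text
    obj_is_literal: bool = False


def to_ntriples(triple: Triple) -> str:
    """Serialize one triple as a single N-Triples line."""
    if triple.obj_is_literal:
        # Simplified escaping: backslashes and double quotes only.
        escaped = triple.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{triple.obj}>"
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."


# Hypothetical dataset URI; the predicates are real Dublin Core terms.
DATASET = "http://example.org/dataset/6124"
triples = [
    Triple(DATASET, "http://purl.org/dc/terms/title", "Interview transcripts", True),
    Triple(DATASET, "http://purl.org/dc/terms/creator", "http://example.org/person/1"),
]
for triple in triples:
    print(to_ntriples(triple))
```

Because the creator is itself a URI, another dataset or application can make statements about the same person, which is how data from different sources gets linked.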
SKOS: Simple Knowledge Organization for the Web
SKOS is a W3C recommendation designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary.
Linked data examples
• FAST: Faceted Application of Subject Terminology
• Dewey Decimal Classification
• Open Metadata Registry (RDA vocabularies)
• Library of Congress Linked Data Service
…
OpenRefine (formerly Google Refine) is a powerful tool for working with messy data: cleaning it, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase. http://openrefine.org
Nesstar Publisher is a free, advanced data management program. It can be used for the preparation of data and metadata. It is DDI compliant. http://www.nesstar.com/software/publisher.html
QualAnon: DSDR Qualitative Data Anonymizer
This free transcript anonymization tool is designed solely to de-identify qualitative interview transcripts. https://www.icpsr.umich.edu/icpsrweb/DSDR/tools/anonymize.jsp
Colectica for Microsoft Excel
A free tool to document your spreadsheet data using the Data Documentation Initiative (DDI) metadata format, the open standard for data documentation. http://www.colectica.com/software/colecticaforexcel
Schematron is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees. It is a structural schema language expressed in XML using a small number of elements and XPath. http://xml.ascc.net/resource/schematron/schematron.html
Altova XMLSpy is an advanced XML editor for modeling, editing, transforming, and debugging XML-related technologies. http://www.altova.com/xmlspy.html
<oXygen/> XML Editor is an XML tool that supports all the XML schema languages. The XSLT and XQuery support is enhanced with powerful debuggers and performance profilers. You can use <oXygen/> XML Editor to work with all XML-based technologies, including XML databases, XProc pipelines, and web services. http://www.oxygenxml.com
LabTrove is a free blogging platform specifically designed for use in a research environment. It aims to serve as a highly flexible electronic notebook and data management system: by integrating with a lab's data-producing instruments, researchers can describe an experiment and associate it with its data output at the time of capture, rather than annotating after the fact. http://www.labtrove.org
Kepler is a scientific workflow modeling and management system that enables users, regardless of programming experience, to set up data analysis pipelines. The software will assemble, execute, and document the services and scripts that scientists with large-scale data use to execute research. https://kepler-project.org
DataCite
The DataCite Consortium provides a number of services to support efforts at increasing the ease and prevalence of data citation. http://www.datacite.org
DMPTool is an online service to enable researchers to create data management plans, now required by many funding agencies, and to receive tailored institutional guidance to help them in the process. https://dmp.cdlib.org
o Section II addresses data documentation more from the researcher's view
o Section III interprets data documentation more from a curator's or librarian's perspective
o What do researchers really care about?
o Will each party see the other side's points and emphases?
• Create, edit, share, and save data management plans
• Open access scholarly publishing services: papers, journals, books, seminars & more
• Curation repository: store, manage, and share research data
• Create and manage persistent identifiers
• Open source add-in for Microsoft Excel as a data collection tool
• An infrastructure to publish and get credit for sharing research data
CDL Curation and Publishing Services: http://www.cdlib.org
This slide is by Joan Starr, California Digital Library: http://www.slideshare.net/joanstarr/dataset-metadata-tools-approaches-for-access-preservation?from_search=1
Data Publication
http://library.ucf.edu/ScholarlyCommunication/UCFResearchLifecycle.pdf
Data Set Related Services
o "Data Set (also called 'Dataset') Metadata" provides researchers consultation on
  o Project and dataset documentation
  o Metadata standards (Common and Domain Specific)
  o Metadata schema customization
  o Controlled vocabularies and thesauri
  o Data curation tools and practices
o Assists in describing basic properties of your data and enriching metadata for your datasets
o Supports applying controlled vocabularies or optimizing keywords to enhance the search of your datasets
o Helps to prepare your metadata and data for deposit and preservation
o Scholarly Communication (http://library.ucf.edu/ScholarlyCommunication)
o SC Contact Information (http://library.ucf.edu/ScholarlyCommunication/Contact.php)
o UCF Library Research Guides (http://guides.ucf.edu)
o Metadata Guide (http://guides.ucf.edu/metadata)
o Data Management Guide (http://guides.ucf.edu/data)
o Research and Information Services (http://library.ucf.edu/Reference)
FGDC/CSDGM Metadata
Abstract: United States Geological Survey, Saint Petersburg, Florida, Center for Coastal and Watershed Studies…
Purpose: These data and information are intended for science researchers, students…
Language: eng; USA
Citation
  Title: Biological data of field activity 08CRD01 (B-1-08-VI) in US Virgin Islands from 05/30/2008 to 06/13/2008
  Date
    Date: 2013-03-03
    Date Type: Publication Date
  Organisation Name: US Geological Survey (USGS) <http://www.usgs.gov/>, Coastal and Marine Geology (CMG) <http://walrus.wr.usgs.gov/>
  Role: Publisher
  Contact Info: …
Point Of Contact: …
Representation Type: Vector
Topic Category
Keyword Collection
  Keyword: EARTH SCIENCE > OCEANS
  Associated Thesaurus: Global Change Master Directory (GCMD)
  Keyword: Marine Geology
  Associated Thesaurus: USGS CMG InfoBank
Spatial Extent
  West Bounding Longitude: -65.75000
  East Bounding Longitude: -63.25000
  North Bounding Latitude: 18.75000
  South Bounding Latitude: 17.25000
Constraints: Please recognize the US Geological Survey (USGS) as the source of this information. Physical materials are under controlled on-site access. Some USGS information accessed through this means may be preliminary in nature and presented without the approval of the Director of the USGS…
Legal Constraints
  Use Constraints: Other Restrictions
  Other Constraints: Use Constraints: Please recognize the US Geological Survey (USGS) as the source of this information. Physical materials are under controlled on-site access…
…
Distribution
  Distribution Format
    Format Name: ASCII
    Format Version:
    File Decompression Technique: No compression applied
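Spatial extent values like those in the record above are easy to get wrong, so a quick plausibility check can help before deposit. The sketch below reads the four bounding values as decimal degrees (west -65.75, east -63.25, north 18.75, south 17.25), which is an assumption about the record; the BoundingBox class and the test point are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    west: float
    east: float
    north: float
    south: float

    def problems(self) -> list:
        """Return human-readable problems; an empty list means plausible."""
        issues = []
        for name in ("west", "east"):
            if not -180.0 <= getattr(self, name) <= 180.0:
                issues.append(f"{name} longitude outside -180..180")
        for name in ("north", "south"):
            if not -90.0 <= getattr(self, name) <= 90.0:
                issues.append(f"{name} latitude outside -90..90")
        if self.south > self.north:
            issues.append("south latitude is greater than north latitude")
        return issues

    def contains(self, lon: float, lat: float) -> bool:
        """True if the point is inside the box (no antimeridian crossing)."""
        return self.west <= lon <= self.east and self.south <= lat <= self.north


# Values from the record, read as decimal degrees (an assumption).
extent = BoundingBox(west=-65.75, east=-63.25, north=18.75, south=17.25)
print(extent.problems())
print(extent.contains(-64.9, 18.3))  # hypothetical point of interest
```

The check deliberately does not flag west greater than east, since that is legitimate for an extent that crosses the 180-degree meridian.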
o An XML schema brings documentation into a single document, creates structured content about the data, and allows data interoperability and sharing
o It can document comprehensive variable-level information, such as a basic data dictionary, question text, and question routing instructions
o Data Documentation Initiative (DDI): a metadata specification for the social and behavioral sciences. It is an XML metadata standard for documenting numeric data. Detailed information is available at http://www.ddialliance.org
o Projects using the DDI (http://www.ddialliance.org/ddi-at-work/projects)
o DDI-compliant data repositories
  o ICPSR - Inter-university Consortium for Political and Social Research
    o Data deposit form: https://www.icpsr.umich.edu/cgi-bin/ddf2
    o UCF is a member of ICPSR
  o UKDA - UK Data Archive
ICPSR: Inter-university Consortium for Political and Social Research
http://www.icpsr.umich.edu/icpsrweb/NACJD/studies/20363?archive=NACJD&q=%22university+of+central+florida%22&permit%5B0%5D=AVAILABLE&x=-999&y=-84
Field Labels
Title
Principal investigator(s)
Summary
Access notes
Dataset(s)
  DS0: Study-Level Files
    Documentation: Questionnaire.pdf, User guide.pdf
  DS1: Female Interviews
    Documentation: Codebook.pdf
  …
Field Labels
Study description
Citation
Funding
Scope of study
• Subject terms
• Smallest geographic unit
• Geographic coverage
• Time period
• Date of collection
• Unit of observation
• Universe
• Data types
• Data collection notes
Methodology
• Study purpose
• Study design
• Sample
• Mode of data collection
• Description of variables
• Response rates
• Presence of common scales
• Extent of processing
Version(s)
Related publications
Variables
Utilities
• Metadata exports
• Download statistics
Variables
List all 1682 variables in this study, e.g.:
  ID: QUESTIONNAIRE ID NUMBER
  ISEX: INTERVIEWER GENDER
  START: INTERVIEW START TIME HHMM USE 24 HR CLOCK
  Q1A: COUNTRY OF BIRTH
  Q1B: STATE OF BIRTH - INITIALS OF STATE
  Q1C: CITY OF BIRTH WRITE IN NOT APP
  Q1D: YEARS LIVED IN USA
  Q1E: RESIDENCY STATUS
  CHECK1: CHECKPOINT 1 BORN IN SAME METRO AREA
  Q2: HOW LONG LIVED IN THIS AREA
  …
(http://www.icpsr.umich.edu/icpsrweb/NACJD/ssvd/studies/20363/variables)
http://www.icpsr.umich.edu/icpsrweb/ICPSR/ddi2/studies/20363
docDscr
The Document Description consists of bibliographic information describing the DDI-compliant document itself as a whole.
Included Fields
citation
• titlStmt
• prodStmt
• verStmt
• holdings
Included Fields
Citation
  titlStmt
  rspStmt
  prodStmt
  fundAg
  grantNo
  distStmt
  biblCit
  Holdings
stdyInfo
  Subject
  Abstract
  sumDscr
Method
  dataColl
  Notes
  anlyInfo
dataAccs
  setAvail
  useStmt
stdyDscr
The Study Description consists of information about the data collection, study, or compilation that the DDI-compliant documentation file describes. This section includes information about how the study should be cited, who collected or compiled the data, who distributes the data, keywords about the content of the data, summary (abstract) of the content of the data, data collection methods and processing, etc.
Included Fields
fileDscr
  fileTxt
    fileName
fileDscr
Data Files Description: information about the data file(s) that comprise a collection. This section can be repeated for collections with multiple files.
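The docDscr, stdyDscr, and fileDscr sections described above nest inside a single codeBook element in DDI Codebook. The following sketch builds a bare skeleton with the Python standard library. It is illustrative only: it omits the DDI namespace and most required content, it has not been validated against the DDI schema, and the ddi_codebook function, study title, and file name are hypothetical. The dataDscr section with var and labl elements comes from the DDI Codebook specification, not from the slides; the variable names and labels are from the ICPSR example above.

```python
import xml.etree.ElementTree as ET


def ddi_codebook(title: str, file_name: str, variables: dict) -> str:
    """Build a minimal, unvalidated DDI Codebook-style skeleton."""
    code_book = ET.Element("codeBook")

    # docDscr: describes the DDI document itself.
    doc_citation = ET.SubElement(ET.SubElement(code_book, "docDscr"), "citation")
    doc_title = ET.SubElement(ET.SubElement(doc_citation, "titlStmt"), "titl")
    doc_title.text = f"DDI documentation for: {title}"

    # stdyDscr: describes the study (citation, abstract, methods, access).
    study_citation = ET.SubElement(ET.SubElement(code_book, "stdyDscr"), "citation")
    ET.SubElement(ET.SubElement(study_citation, "titlStmt"), "titl").text = title

    # fileDscr: one per data file; repeat for collections with several files.
    file_txt = ET.SubElement(ET.SubElement(code_book, "fileDscr"), "fileTxt")
    ET.SubElement(file_txt, "fileName").text = file_name

    # dataDscr: variable-level documentation (a basic data dictionary).
    data_dscr = ET.SubElement(code_book, "dataDscr")
    for name, label in variables.items():
        var = ET.SubElement(data_dscr, "var", name=name)
        ET.SubElement(var, "labl").text = label

    return ET.tostring(code_book, encoding="unicode")


ddi_xml = ddi_codebook(
    title="Example interview study",        # hypothetical title
    file_name="DS1_female_interviews.txt",  # hypothetical file name
    variables={"Q1A": "COUNTRY OF BIRTH", "Q1D": "YEARS LIVED IN USA"},
)
print(ddi_xml)
```

In practice a tool such as Nesstar Publisher or Colectica would produce this markup; the sketch is only meant to show how the sections fit together.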
o Context and participant details of interviews can be
  o A descriptive header or summary page in transcripts or field notes
  o A structured data list
  o XML mark-up of data, for example
    o Text Encoding Initiative (TEI) to mark up interview transcripts
    o Qualitative Data Exchange Format (QuDEx) for researcher annotations and data linking
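As a rough illustration of TEI mark-up for an interview transcript, the sketch below wraps speaker turns in TEI's u (utterance) element with a who attribute, inside a minimal header. It uses the standard TEI namespace, but the result has not been validated against a TEI schema; the transcript_to_tei function, the speaker identifiers, and the sample text are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def tei(tag: str) -> str:
    """Qualify a tag name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def transcript_to_tei(title: str, turns: list) -> str:
    """Mark up (speaker, words) pairs as TEI utterances."""
    root = ET.Element(tei("TEI"))

    # Minimal header: title, publication, and source statements.
    file_desc = ET.SubElement(ET.SubElement(root, tei("teiHeader")), tei("fileDesc"))
    ET.SubElement(ET.SubElement(file_desc, tei("titleStmt")), tei("title")).text = title
    ET.SubElement(ET.SubElement(file_desc, tei("publicationStmt")), tei("p")).text = (
        "Unpublished research data"
    )
    ET.SubElement(ET.SubElement(file_desc, tei("sourceDesc")), tei("p")).text = (
        "Transcribed from an audio recording"
    )

    # Body: one <u> element per speaker turn.
    body = ET.SubElement(ET.SubElement(root, tei("text")), tei("body"))
    for speaker, words in turns:
        utterance = ET.SubElement(body, tei("u"), who=f"#{speaker}")
        utterance.text = words

    return ET.tostring(root, encoding="unicode")


tei_xml = transcript_to_tei(
    "Interview x001",  # ID pattern borrowed from the example data list
    [("interviewer", "How long have you lived here?"),
     ("x001", "About ten years.")],
)
print(tei_xml)
```

Marking each turn with its speaker keeps context and participant details machine-readable, so turns can later be filtered or linked to a data list.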
oAnonymisation of textual data (eg replacing real names of people
organizations and locations with pseudonyms)
oFile naming
oMeaningful short names identify file types (eg interviews focus groups
field notes audio recordings) avoid space special characters avoid long
names
oOrganizing files in folders Create uniform and structured folder names based
on cases studies locations data types etc or the original anonymized
coded or annotated versions of data
oVersion control Version numbering in file names
oDocumentation Methodology description project plan interview guidelines
consent form templates data analyses and manipulation
o Example is from A NESSTAR FOR QUALITATIVE DATA BUILDING BLOCKS FOR DIGITAL FUTURES By Corti Louise et al available at httpdata-archiveacukmedia376907digitalfutures_dashish_21nov2012pdf
oData List
Interview ID
x001
x002
hellip
Text File Name
6124int001
6124int002
hellip
oCreate and generate metadata for your research data and
datasets in your research lifecycle to preserve the data in the
long run
oConsider what information is needed for the data to be
read and interpreted in the future
oUnderstand your funder requirements for data
documentation and metadata Funder requirements for NSF
GBMF IMLS NEH NIH and NOAA can be found at
httpsdmptoolorgguidance
oConsult available metadata standards in your field You may
refer to Common Metadata Standards and Domain Specific
Metadata Standards for details
oDescribe data and datasets created in your research lifecycle and
use software programs and tools to assist in data documentation
Assign or capture administrative descriptive technical structural
and preservation metadata for the data Some potential information
to document
oDescriptive metadata
oName of creator of data set
oName of author of document
oTitle of document
oFile name
oLocation of file
oSize of file
oStructural metadata
oFile relationships (eg child parent)
oTechnical metadata
oFormat (eg text SPSS Stata Excel tiff mpeg 3D Java FITS CIF)
oCompression or encoding algorithms
oEncryption and decryption keys
oSoftware (including release number) used to create or update the data
oHardware on which the data were created
oOperating systems in which the data were created
oApplication software in which the data were created
oAdministrative metadata
o Information about data creation (eg date)
o Information about subsequent updates transformation versioning
summarization
oDescriptions of migration and replication
o Information about other events that have affected the files
oPreservation metadata
oFile format (eg txt pdf doc rtf xls xml spv jpg fits)
oSignificant properties
oTechnical environment
oFixity information
oAdopt a thesauri in your field if applicable or compile a data dictionary for
your dataset
oObtain persistent identifiers (eg doi purl) for datasets if possible to ensure
data can be found in the future
oFor your full data management plan visit UCF Libraries Data Management
Guide Also refer to Digital Curation Centrersquos Checklist for a Data
Management Plan (httpwwwdccacuksitesdefaultfilesdocumentsresourceDMP_Checklist_2013pdf)
oCommon Metadata Standards
oDisciplinary Metadata Standards
oActivity Choose a dataset or a standard in your field to examine and critique
oSocial Science Dataset
oHumanities Dataset
oBiological Sciences Dataset
oBiotechnology Dataset
oGeospatial Dataset
oEarth Science Dataset
oPhysical Science Dataset
oOtherhellip
oDublin Core (DC) A general metadata standard for describing a wide range of
digital resources
o Dublin Core Metadata Element Set Version 11
(httpdublincoreorgdocumentsdces)
o 15 Elements Title Creator Subject or keyword Description Publisher Type Format
Identifier Source Language Relation Coverage Rights
o DCMI Metadata Terms (httpdublincoreorgdocumentsdcmi-terms)
o DC Qualifiers (httpdublincoreorgdocumentsusageguidequalifiersshtml)
o Encoded Archival Description (EAD)
o A standard for encoding archival finding aids with XML
oGovernment Information Locator Service (GILS)
o The Global Information Locator Service defines a core element set for government
information so that it can be more searchable and discoverable by the general public
oONIX for Books (ONline Information eXchange)
o An international standard for representing and communicating book industry product
oMARC Code List for Countries httpwwwlocgovmarccountries
oMARC Code List for Languages httpwwwlocgovmarclanguages
oMARC Source Codes for Vocabularies Rules and Schemes
httpwwwlocgovmarcsourcecodeformformsourcehtml
oFor digital and online resources
oInternet Media Types wwwianaorgassignmentsmedia-
typesindexhtml
oMODS Note Types httpwwwlocgovstandardsmodsmods-
noteshtml
oDCMI Type Vocabulary httpdublincoreorgdocumentsdcmi-
termsindexshtmlH7
o Subject Thesauri and Ontologies
o AGROVOC (Agricultural Organization of the United Nations Vocabulary)
o Astronomy Thesaurus
o CAB Thesaurus (for life sciences technology and social sciences)
o CIF dictionaries (for Physics)
o Eurovoc (European Union Thesaurus)
o Ethnographic Thesaurus
o Gene Ontology
o GeoNames
o Getty Institute Art and Architecture Thesaurus Online
o Getty Institute Thesaurus of Geographic Names
o ICD (International Classification of Diseases)
o Library of Congress Authorities for subject headings
o Library of Congress Thesaurus for Graphic Materials
o Logical Observation Identifiers Names and Codes (LOINC)
o MESH (Medical Subject Headings)
o Public Health Language
o Rare Books and Manuscripts Section (RBMS) Controlled Vocabularies
o RxNorm (for drugs)
o SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms)
o STW Thesaurus for Economics
o UNBIS Thesaurus
o UNESCO Thesaurus
o USDA National Agricultural Library Agriculture Thesaurus
Question Have you ever
used thesauri in your study
and research
Getty Union List of Artist Names
(ULAN)The ULAN includes proper names and
associated information about artists
Artists may be either individuals
(persons) or groups of individuals working
together (corporate bodies) Artists in
the ULAN generally represent creators
involved in the conception or production
of visual arts and architecture
Library of Congress Name
Authority File (LCNAF)
The LCNAF provides authoritative
data for names of persons
organizations events places and
titles
Virtual International
Authority File (VIAF)
The VIAFtrade (Virtual International
Authority File) combines multiple
name authority files into a single
OCLC-hosted name authority
service The goal of the service is to
lower the cost and increase the
utility of library authority files by
matching and linking widely-used
authority files and making that
information available on the Web
Web Ontology Language
(OWL)The OWL 2 Web Ontology Language is an
ontology language for the Semantic Web
with formally defined meaning OWL 2
ontologies provide classes properties
individuals and data values and are stored
as Semantic Web documents OWL 2
ontologies can be used along with
information written in RDF and OWL 2
ontologies themselves are primarily
exchanged as RDF documents
MADSRDFThe Metadata Authority Description
Schema (MADS) is an XML schema for an
element set that may be used to provide
metadata about authorized forms of
agents (people organizations) events
and terms (topics geographics genres
etc) MADSRDF
builds on MADSXML as a knowledge
organization system
Resource Description
Framework (RDF)RDF is a standard model for data
interchange on the Web RDF extends
the linking structure of the Web to use
URIs to name the relationship
between things as well as the two
ends of the link (this is usually
referred to as a ldquotriplerdquo) Using this
simple model it allows structured and
semi-structured data to be mixed
exposed and shared across different
applications
SKOS Simple Knowledge
Organization for the Web SKOS is a W3C recommendation
designed for representation of
thesauri classification
schemes taxonomies subject-
heading systems or any other
type of structured controlled
vocabularyLinked data
examplesbull FAST Faceted
Application of
Subject
Terminology
bull Dewey Decimal
Classification
bull Open Metadata
Registry (RDA
vocabularies)
bull Library of Congress
Linked Data
Service
hellip
OpenRefine (ex-Google Refine) is a powerful tool for working with messy data cleaning it transforming it from one format into another extending it with web services and linking it to databases like Freebasehttpopenrefineorg
Nesstar Publisher is a
free advanced data management program It can be used for the preparation of data and metadata Its DDI complianthttpwwwnesstarcomsoftwarepublisherhtml
QualAnon DSDR
Qualitative Data Anonymizer
This free transcript anonymizationtool is designed solely to de-identify qualitative interview transcriptshttpswwwicpsrumicheduicpsrwebDSDRtoolsanonymizejsp
Colectica for Microsoft Excel
A free tool to document your spreadsheet data using the Data Documentation Initiative (DDI) metadata format the open standard for data documentationhttpwwwcolecticacomsoftwarecolecticaforexcel
Schematron is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees It is a structural schema language expressed in XML using a small number of elements and XPathhttpxmlasccnetresourceschematronschematronhtml
Altova XMLSpy is an advanced XML editor for modeling editing transforming and debugging XML-related
technologieshttpwwwaltovacomxmlspy
html
ltoXygengt XML
Editor is an XML tool that supports all the XML schema languages The XSLT and XQuery support is enhanced with powerful debuggers and performance profilers You can use ltoXygengt XML Editor to work with all XML-based technologies including XML databases XProcpipelines and web serviceshttpwwwoxygenxmlcom
LabTrove is a free blogging
platform specifically designed for use in a research environment It aims to serve as a highly flexible electronic notebook and data management system by integrating with a labrsquos data-producing instruments researchers can describe an experiment and associate it with its data output at the time of capture rather than annotating after the fact httpwwwlabtroveorg
Kepler is a scientific workflow
modeling and management system that enables users regardless of programming experience to set up data analysis pipelines The software will assemble execute and document theof services and scripts that scientists with large-scale data use to execute researchhttpskepler-projectorg
DataCiteThe DataCite Consortium
provides a number of
services to support
efforts at increasing the
ease and prevalence of
data citationhttpwwwdataciteorg
DMPTool is an online service to enable researchers to create data management plans now required by many funding agencies and to receive tailored institutional guidance to help them in the processhttpsdmpcdliborg
oSection II addresses data documentation more from the
researcherrsquos view
oSection III interprets data documentation more from
a curator or librarians perspective
oWhat do researchers really care about
oWill each party see the other sidersquos points and
emphases
Create edit share and save
data management plans
Open access scholarly publishing services
papers journals books seminars amp more
Curation repository store manage and share research data
Create and manage
persistent identifiers
Open source add-in for Microsoft
Excel as a data collection tool
An infrastructure to publish and get credit
for sharing research data
CDL Curation and Publishing Services
httpwwwcdliborg
This slide is by Joan Starr California Digital Library httpwwwslidesharenetjoanstarrdataset-metadata-tools-approaches-for-access-preservationfrom_search=1
Data Publication
httplibraryucfeduScholarlyCommunicationUCFResearchLifecyclepdfData Set Related Services
oldquoData Set (also called lsquoDatasetrsquo) Metadatardquo provides
researchers consultation on
oProject and dataset documentation
oMetadata standards (Common and Domain Specific)
oMetadata schemas customization
oControlled vocabularies and thesauri
oData curation tools and practices
oAssists in describing basic properties of your data and enriching
metadata for your datasets
oSupports applying controlled vocabularies or optimizing keywords
to enhance the search of your datasets
oHelps to prepare your metadata and data for deposit and
preservation
oScholarly Communication (httplibraryucfeduScholarlyCommunication)
oSC Contact Information (httplibraryucfeduScholarlyCommunicationContactphp)
oUCF Library Research Guides (httpguidesucfedu)
oMetadata Guide (httpguidesucfedumetadata)
oData Management Guide (httpguidesucfedudata)
oResearch and Information Services (httplibraryucfeduReference)
Abstract United States Geological Survey Saint Petersburg Florida Center for Coastal and Watershed
Studieshellip
Purpose These data and information are intended for science researchers studentshellip
Language eng USA
Citation
Title Biological data of field activity 08CRD01 (B-1-08-VI) in US Virgin Islands from 05302008 to 06132008
Date
Date 2013-03-03
Date Type Publication Date
Organisation Name US Geological Survey (USGS) lthttpwwwusgsgovgt Coastal and Marine Geology
(CMG) lthttpwalruswrusgsgovgt
Role Publisher
Contact Info hellip
Point Of Contact hellip
Representation Type Vector
Topic Category
Keyword Collection
Keyword EARTH SCIENCE gt OCEANS
Associated Thesaurus Global Change Master Directory (GCMD)
Keyword Marine Geology
Associated Thesaurus USGS CMG InfoBank
Spatial Extent
West Bounding Longitude -6575000
East Bounding Longitude -6325000
North Bounding Latitude 1875000
South Bounding Latitude 1725000
FGDCCSDGM
Metadata
Constraints Please recognize the US Geological Survey (USGS) as the source of this information Physical materials are under controlled on-site access Some USGS information accessed through this means may be preliminary in nature and presented without the approval of the Director of the USGShellip
Legal Constraints
Use Constraints Other Restrictions
Other Constraints Use Constraints Please recognize the US Geological Survey (USGS) as the source of this information Physical materials are under controlled on-site accesshellip
hellip
Distribution
Distribution Format
Format Name ASCII
Format Version
File Decompression Technique No compression applied
oXML schema brings documentation into a single document creates
structured content about the data and allows data interoperability and
sharing
oIt can document comprehensive variable level information such as basic
data dictionary question text and question routing instructions
oData Documentation Initiative (DDI) a metadata specification for the
social and behavioral sciences It is an XML metadata standard for
documenting numeric data Detailed information is available
at httpwwwddiallianceorg
oProjects using the DDI (httpwwwddiallianceorgddi-at-workprojects)
oDDI-compliant data repository
o ICPSR - Inter-university Consortium for Political and Social Research
o Data deposit form httpswwwicpsrumicheducgi-binddf2
o UCF is a member of ICPSR
oUKDA - UK Data Archive
Field Labels
TitlePrincipal investigator(s)
Summary
Access notes
Dataset(s)
httpwwwicpsrumicheduicpsrwebNA
CJDstudies20363archive=NACJDampq=22
university+of+central+florida22amppermit
5B05D=AVAILABLEampx=-999ampy=-84
ICPSR Interuniversity
Consortium for
Political and
Social Research
Dataset(s)
DSO Study-Level Files
Documentation
Questionnairepdf
User guidepdf
DS1 Female Interviews
Documentation
Codebookpdf
hellip
Field Labels
Study description
Citation
Funding
Scope of studybull Subject terms
bull Smallest
geographic unit
bull Geographic
coverage
bull Time period
bull Date of collection
bull Unit of
observation
bull Universe
bull Data types
bull Data collection
notes
Methodologybull Study purpose
bull Study design
Field Labels
bull Sample
bull Mode of data collection
bull Description of variables
bull Response rates
bull Presence of common
scales
bull Extent of processing
Field Labels
Version(s)
Related publications
Variables
Utilities
bull Metadata exports
bull Download statistics
Variables
List all 1682 variables in this study, e.g.:
ID: QUESTIONNAIRE ID NUMBER
ISEX: INTERVIEWER GENDER
START: INTERVIEW START TIME HHMM USE 24 HR CLOCK
Q1A: COUNTRY OF BIRTH
Q1B: STATE OF BIRTH - INITIALS OF STATE
Q1C: CITY OF BIRTH WRITE IN NOT APP
Q1D: YEARS LIVED IN USA
Q1E: RESIDENCY STATUS
CHECK1: CHECKPOINT 1 BORN IN SAME METRO AREA
Q2: HOW LONG LIVED IN THIS AREA
…
(http://www.icpsr.umich.edu/icpsrweb/NACJD/ssvd/studies/20363/variables)
http://www.icpsr.umich.edu/icpsrweb/ICPSR/ddi2/studies/20363
docDscr: The Document Description consists of bibliographic information describing the DDI-compliant document itself as a whole.
Included Fields
citation
• titlStmt
• prodStmt
• verStmt
• holdings
Included Fields
Citation
• titlStmt
• rspStmt
• prodStmt
• fundAg
• grantNo
• distStmt
• biblCit
• Holdings
stdyInfo
• Subject
• Abstract
• sumDscr
Method
• dataColl
• Notes
• anlyInfo
dataAccs
• setAvail
• useStmt
stdyDscr: The Study Description consists of information about the data collection, study, or compilation that the DDI-compliant documentation file describes. This section includes information about how the study should be cited; who collected or compiled the data; who distributes the data; keywords about the content of the data; a summary (abstract) of the content of the data; data collection methods and processing; etc.
Included Fields
fileDscr
• fileTxt
• fileName
fileDscr: Data Files Description. Information about the data file(s) that comprise a collection. This section can be repeated for collections with multiple files.
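The three sections named above (docDscr, stdyDscr, fileDscr) can be pictured as nested XML elements. The Python sketch below builds a minimal skeleton with the standard library only. It is an illustration under assumptions, not a schema-valid DDI instance: the root element name `codeBook` and the child element `titl` are assumed from the DDI Codebook vocabulary, the study title and file names are made-up placeholders, and namespaces and required attributes are left out.

```python
import xml.etree.ElementTree as ET


def build_ddi_skeleton(study_title, file_names):
    """Return a minimal DDI-style element tree (illustrative, not schema-valid)."""
    root = ET.Element("codeBook")  # assumed root element name

    # docDscr: describes the DDI document itself
    doc_citation = ET.SubElement(ET.SubElement(root, "docDscr"), "citation")
    doc_titl = ET.SubElement(ET.SubElement(doc_citation, "titlStmt"), "titl")
    doc_titl.text = "Documentation for: " + study_title

    # stdyDscr: describes the study that the documentation covers
    stdy_citation = ET.SubElement(ET.SubElement(root, "stdyDscr"), "citation")
    stdy_titl = ET.SubElement(ET.SubElement(stdy_citation, "titlStmt"), "titl")
    stdy_titl.text = study_title

    # fileDscr: repeated once per data file in the collection
    for name in file_names:
        file_txt = ET.SubElement(ET.SubElement(root, "fileDscr"), "fileTxt")
        ET.SubElement(file_txt, "fileName").text = name
    return root


# Placeholder values, chosen only for the example
skeleton = build_ddi_skeleton(
    "Example Study", ["ds1_female_interviews.txt", "ds2_example.txt"]
)
xml_text = ET.tostring(skeleton, encoding="unicode")
print(xml_text)
```

A real deposit would be prepared with a DDI-aware tool and validated against the published schema; the sketch only shows how the sections nest and why fileDscr repeats.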
o Context and participant details of interviews can be documented in:
    o A descriptive header or summary page in transcripts or field notes
    o A structured data list
    o XML mark-up of data, for example:
        o Text Encoding Initiative (TEI) to mark up interview transcripts
        o Qualitative Data Exchange Format (QuDEx) for researcher annotations and data linking
o Anonymisation of textual data (e.g., replacing real names of people, organizations, and locations with pseudonyms)
o File naming
    o Use meaningful short names; identify file types (e.g., interviews, focus groups, field notes, audio recordings); avoid spaces and special characters; avoid long names
o Organizing files in folders: create uniform and structured folder names based on cases, studies, locations, data types, etc., or on the original, anonymized, coded, or annotated versions of data
o Version control: version numbering in file names
o Documentation: methodology description, project plan, interview guidelines, consent form templates, data analyses and manipulation
o Example is from "A Nesstar for Qualitative Data: Building Blocks for Digital Futures" by Louise Corti et al., available at http://data-archive.ac.uk/media/376907/digitalfutures_dashish_21nov2012.pdf
o Data List

Interview ID    Text File Name
x001            6124int001
x002            6124int002
…               …
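The file naming and data list practices above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed convention: the pattern (study number, a short type code such as `int` for interviews, a zero-padded sequence number) is inferred from the example names in the data list, and the optional `_vNN` version suffix is an assumption added for the sketch.

```python
import csv
import io
import re


def make_file_name(study_id, file_type, sequence, version=None):
    """Build a short file name such as '6124int001' or '6124int001_v02'."""
    name = f"{study_id}{file_type}{sequence:03d}"
    if version is not None:
        name += f"_v{version:02d}"  # assumed version-numbering style
    # Reject spaces and special characters
    if not re.fullmatch(r"[A-Za-z0-9_-]+", name):
        raise ValueError(f"unsafe file name: {name!r}")
    return name


def build_data_list(study_id, count):
    """Return CSV text pairing each interview ID with its text file name."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Interview ID", "Text File Name"])
    for i in range(1, count + 1):
        writer.writerow([f"x{i:03d}", make_file_name(study_id, "int", i)])
    return buffer.getvalue()


data_list = build_data_list("6124", 2)
print(data_list)
```

Generating the names and the data list from one function keeps the two in step, so a renamed or added file cannot drift away from its entry in the list.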
o Create and generate metadata for your research data and datasets throughout your research lifecycle to preserve the data in the long run
o Consider what information is needed for the data to be read and interpreted in the future
o Understand your funder's requirements for data documentation and metadata. Funder requirements for NSF, GBMF, IMLS, NEH, NIH, and NOAA can be found at https://dmptool.org/guidance
o Consult available metadata standards in your field. You may refer to Common Metadata Standards and Domain Specific Metadata Standards for details
o Describe data and datasets created in your research lifecycle, and use software programs and tools to assist in data documentation. Assign or capture administrative, descriptive, technical, structural, and preservation metadata for the data. Some potential information to document:
    o Descriptive metadata
        o Name of creator of data set
        o Name of author of document
        o Title of document
        o File name
        o Location of file
        o Size of file
    o Structural metadata
        o File relationships (e.g., child, parent)
    o Technical metadata
        o Format (e.g., text, SPSS, Stata, Excel, tiff, mpeg, 3D, Java, FITS, CIF)
        o Compression or encoding algorithms
        o Encryption and decryption keys
        o Software (including release number) used to create or update the data
        o Hardware on which the data were created
        o Operating systems in which the data were created
        o Application software in which the data were created
    o Administrative metadata
        o Information about data creation (e.g., date)
        o Information about subsequent updates, transformation, versioning, summarization
        o Descriptions of migration and replication
        o Information about other events that have affected the files
    o Preservation metadata
        o File format (e.g., txt, pdf, doc, rtf, xls, xml, spv, jpg, fits)
        o Significant properties
        o Technical environment
        o Fixity information
o Adopt a thesaurus in your field if applicable, or compile a data dictionary for your dataset
o Obtain persistent identifiers (e.g., DOI, PURL) for datasets if possible to ensure data can be found in the future
o For your full data management plan, visit the UCF Libraries Data Management Guide. Also refer to the Digital Curation Centre's Checklist for a Data Management Plan (http://www.dcc.ac.uk/sites/default/files/documents/resource/DMP_Checklist_2013.pdf)
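Several of the items listed above (file name, location, size, format, operating system, fixity) can be captured automatically. The sketch below collects them for a single file using only the Python standard library. The field names in the returned dictionary are illustrative assumptions rather than terms from any metadata standard, and SHA-256 is just one common choice of fixity algorithm.

```python
import hashlib
import os
import platform
import tempfile
from datetime import datetime, timezone


def describe_file(path, creator, title):
    """Collect a small metadata record for one file (field names are illustrative)."""
    # Reads the whole file at once; large files would be hashed in chunks
    with open(path, "rb") as handle:
        digest = hashlib.sha256(handle.read()).hexdigest()
    stat = os.stat(path)
    return {
        # Descriptive metadata
        "creator": creator,
        "title": title,
        "file_name": os.path.basename(path),
        "location": os.path.abspath(path),
        "size_bytes": stat.st_size,
        # Technical metadata
        "format": os.path.splitext(path)[1].lstrip(".").lower(),
        "operating_system": platform.system(),
        # Administrative metadata
        "last_modified": datetime.fromtimestamp(
            stat.st_mtime, tz=timezone.utc
        ).isoformat(),
        # Preservation metadata: fixity information
        "fixity": {"algorithm": "SHA-256", "value": digest},
    }


# Usage on a throwaway file with placeholder creator and title
with tempfile.TemporaryDirectory() as folder:
    sample_path = os.path.join(folder, "6124int001.txt")
    with open(sample_path, "w", encoding="utf-8") as out:
        out.write("hello")
    record = describe_file(sample_path, creator="A. Researcher", title="Interview 1")

print(record["file_name"], record["size_bytes"], record["fixity"]["value"])
```

Recomputing the checksum later and comparing it with the stored value is what makes the fixity entry useful: a mismatch signals that the file has changed or been corrupted.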
o Common Metadata Standards
o Disciplinary Metadata Standards
o Activity: Choose a dataset or a standard in your field to examine and critique
    o Social Science Dataset
    o Humanities Dataset
    o Biological Sciences Dataset
    o Biotechnology Dataset
    o Geospatial Dataset
    o Earth Science Dataset
    o Physical Science Dataset
    o Other…
o Dublin Core (DC): A general metadata standard for describing a wide range of digital resources
    o Dublin Core Metadata Element Set, Version 1.1 (http://dublincore.org/documents/dces/)
    o 15 Elements: Title, Creator, Subject or keyword, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights
    o DCMI Metadata Terms (http://dublincore.org/documents/dcmi-terms/)
    o DC Qualifiers (http://dublincore.org/documents/usageguide/qualifiers.shtml)
o Encoded Archival Description (EAD)
    o A standard for encoding archival finding aids with XML
o Government Information Locator Service (GILS)
    o The Global Information Locator Service defines a core element set for government information so that it can be more searchable and discoverable by the general public
o ONIX for Books (ONline Information eXchange)
    o An international standard for representing and communicating book industry product information
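As a small illustration of how a general-purpose standard such as Dublin Core is applied, the Python sketch below serializes a handful of fields as simple Dublin Core XML. The namespace URI is that of the Dublin Core Metadata Element Set 1.1; the wrapper element name `metadata` and all field values are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]


def dublin_core_record(fields):
    """Serialize {element: value or list of values} as simple Dublin Core XML."""
    unknown = set(fields) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name in DC_ELEMENTS:
        values = fields.get(name, [])
        if isinstance(values, str):
            values = [values]
        for value in values:  # every element is optional and repeatable
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder values for a hypothetical dataset
record_xml = dublin_core_record({
    "title": "Example interview dataset",
    "creator": "A. Researcher",
    "subject": ["interviews", "migration"],
    "date": "2014-01-01",
    "format": "text/plain",
})
print(record_xml)
```

Rejecting unknown element names up front is a design choice for the sketch: it catches typos such as `author` before they end up in a record that a repository would silently ignore.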
o MARC Code List for Countries: http://www.loc.gov/marc/countries/
o MARC Code List for Languages: http://www.loc.gov/marc/languages/
o MARC Source Codes for Vocabularies, Rules, and Schemes: http://www.loc.gov/marc/sourcecode/form/formsource.html
o For digital and online resources
    o Internet Media Types: www.iana.org/assignments/media-types/index.html
    o MODS Note Types: http://www.loc.gov/standards/mods/mods-notes.html
    o DCMI Type Vocabulary: http://dublincore.org/documents/dcmi-terms/index.shtml#H7
o Subject Thesauri and Ontologies
    o AGROVOC (Food and Agriculture Organization of the United Nations vocabulary)
    o Astronomy Thesaurus
    o CAB Thesaurus (for life sciences, technology, and social sciences)
    o CIF dictionaries (for Physics)
    o Eurovoc (European Union Thesaurus)
    o Ethnographic Thesaurus
    o Gene Ontology
    o GeoNames
    o Getty Institute Art and Architecture Thesaurus Online
    o Getty Institute Thesaurus of Geographic Names
    o ICD (International Classification of Diseases)
    o Library of Congress Authorities for subject headings
    o Library of Congress Thesaurus for Graphic Materials
    o Logical Observation Identifiers Names and Codes (LOINC)
    o MeSH (Medical Subject Headings)
    o Public Health Language
    o Rare Books and Manuscripts Section (RBMS) Controlled Vocabularies
    o RxNorm (for drugs)
    o SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms)
    o STW Thesaurus for Economics
    o UNBIS Thesaurus
    o UNESCO Thesaurus
    o USDA National Agricultural Library Agriculture Thesaurus
Question: Have you ever used thesauri in your study and research?
Getty Union List of Artist Names (ULAN)
The ULAN includes proper names and associated information about artists. Artists may be either individuals (persons) or groups of individuals working together (corporate bodies). Artists in the ULAN generally represent creators involved in the conception or production of visual arts and architecture.
Library of Congress Name Authority File (LCNAF)
The LCNAF provides authoritative data for names of persons, organizations, events, places, and titles.
Virtual International Authority File (VIAF)
The VIAF™ (Virtual International Authority File) combines multiple name authority files into a single OCLC-hosted name authority service. The goal of the service is to lower the cost and increase the utility of library authority files by matching and linking widely-used authority files and making that information available on the Web.
Web Ontology Language (OWL)
The OWL 2 Web Ontology Language is an ontology language for the Semantic Web with formally defined meaning. OWL 2 ontologies provide classes, properties, individuals, and data values and are stored as Semantic Web documents. OWL 2 ontologies can be used along with information written in RDF, and OWL 2 ontologies themselves are primarily exchanged as RDF documents.
MADS/RDF
The Metadata Authority Description Schema (MADS) is an XML schema for an element set that may be used to provide metadata about authorized forms of agents (people, organizations), events, and terms (topics, geographics, genres, etc.). MADS/RDF builds on MADS/XML as a knowledge organization system.
Resource Description Framework (RDF)
RDF is a standard model for data interchange on the Web. RDF extends the linking structure of the Web to use URIs to name the relationship between things as well as the two ends of the link (this is usually referred to as a "triple"). Using this simple model, it allows structured and semi-structured data to be mixed, exposed, and shared across different applications.
SKOS (Simple Knowledge Organization System)
SKOS is a W3C recommendation designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary.
Linked data examples
• FAST (Faceted Application of Subject Terminology)
• Dewey Decimal Classification
• Open Metadata Registry (RDA vocabularies)
• Library of Congress Linked Data Service
• …
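To make the "triple" idea concrete, the Python sketch below describes one controlled-vocabulary concept as three subject-predicate-object statements and prints them in an N-Triples-like form. It uses no RDF library. The SKOS and RDF namespace URIs are the published W3C ones; the base URI `http://example.org/vocab/`, the concept identifiers, and the label are hypothetical, and literal escaping is deliberately left out to keep the sketch short.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/vocab/"  # hypothetical vocabulary namespace


def concept_triples(concept_id, pref_label, broader_id=None, lang="en"):
    """Return (subject, predicate, object) triples describing one SKOS concept."""
    subject = f"<{BASE}{concept_id}>"
    triples = [
        (subject, f"<{RDF_TYPE}>", f"<{SKOS}Concept>"),
        (subject, f"<{SKOS}prefLabel>", f'"{pref_label}"@{lang}'),
    ]
    if broader_id is not None:
        # Link to the parent term, as a thesaurus hierarchy would
        triples.append((subject, f"<{SKOS}broader>", f"<{BASE}{broader_id}>"))
    return triples


def to_ntriples(triples):
    """Write one 'subject predicate object .' statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


triples = concept_triples("marine-geology", "Marine geology", broader_id="oceans")
print(to_ntriples(triples))
```

Because every subject, predicate, and object apart from the label is a URI, statements produced by different vocabularies can be merged simply by concatenating them, which is the property that linked data services rely on.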
OpenRefine (ex-Google Refine) is a powerful tool for working with messy data: cleaning it, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase. http://openrefine.org
Nesstar Publisher is a free, advanced data management program. It can be used for the preparation of data and metadata. It is DDI compliant. http://www.nesstar.com/software/publisher.html
QualAnon: DSDR Qualitative Data Anonymizer. This free transcript anonymization tool is designed solely to de-identify qualitative interview transcripts. https://www.icpsr.umich.edu/icpsrweb/DSDR/tools/anonymize.jsp
Colectica for Microsoft Excel: A free tool to document your spreadsheet data using the Data Documentation Initiative (DDI) metadata format, the open standard for data documentation. http://www.colectica.com/software/colecticaforexcel
Schematron is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees. It is a structural schema language expressed in XML using a small number of elements and XPath. http://xml.ascc.net/resource/schematron/schematron.html
Altova XMLSpy is an advanced XML editor for modeling, editing, transforming, and debugging XML-related technologies. http://www.altova.com/xmlspy.html
<oXygen/> XML Editor is an XML tool that supports all the XML schema languages. The XSLT and XQuery support is enhanced with powerful debuggers and performance profilers. You can use <oXygen/> XML Editor to work with all XML-based technologies, including XML databases, XProc pipelines, and web services. http://www.oxygenxml.com
LabTrove is a free blogging platform specifically designed for use in a research environment. It aims to serve as a highly flexible electronic notebook and data management system. By integrating with a lab's data-producing instruments, researchers can describe an experiment and associate it with its data output at the time of capture, rather than annotating after the fact. http://www.labtrove.org
Kepler is a scientific workflow modeling and management system that enables users, regardless of programming experience, to set up data analysis pipelines. The software will assemble, execute, and document the […] of services and scripts that scientists with large-scale data use to execute research. https://kepler-project.org
DataCite: The DataCite Consortium provides a number of services to support efforts at increasing the ease and prevalence of data citation. http://www.datacite.org
DMPTool is an online service to enable researchers to create data management plans, now required by many funding agencies, and to receive tailored institutional guidance to help them in the process. https://dmp.cdlib.org
o Section II addresses data documentation more from the researcher's view
o Section III interprets data documentation more from a curator's or librarian's perspective
o What do researchers really care about?
o Will each party see the other side's points and emphases?
CDL Curation and Publishing Services (http://www.cdlib.org)
• Create, edit, share, and save data management plans
• Open access scholarly publishing services: papers, journals, books, seminars & more
• Curation repository: store, manage, and share research data
• Create and manage persistent identifiers
• Open source add-in for Microsoft Excel as a data collection tool
• Data Publication: an infrastructure to publish and get credit for sharing research data
This slide is by Joan Starr, California Digital Library: http://www.slideshare.net/joanstarr/dataset-metadata-tools-approaches-for-access-preservation?from_search=1
Data Set Related Services
http://library.ucf.edu/ScholarlyCommunication/UCFResearchLifecycle.pdf
o "Data Set (also called 'Dataset') Metadata" provides researchers consultation on:
    o Project and dataset documentation
    o Metadata standards (Common and Domain Specific)
    o Metadata schema customization
    o Controlled vocabularies and thesauri
    o Data curation tools and practices
o Assists in describing basic properties of your data and enriching metadata for your datasets
o Supports applying controlled vocabularies or optimizing keywords to enhance the search of your datasets
o Helps to prepare your metadata and data for deposit and preservation
o Scholarly Communication (http://library.ucf.edu/ScholarlyCommunication)
o SC Contact Information (http://library.ucf.edu/ScholarlyCommunication/Contact.php)
o UCF Library Research Guides (http://guides.ucf.edu)
o Metadata Guide (http://guides.ucf.edu/metadata)
o Data Management Guide (http://guides.ucf.edu/data)
o Research and Information Services (http://library.ucf.edu/Reference)
Abstract: United States Geological Survey, Saint Petersburg, Florida, Center for Coastal and Watershed Studies…
Purpose: These data and information are intended for science researchers, students…
Language: eng; USA
Citation
    Title: Biological data of field activity 08CRD01 (B-1-08-VI) in US Virgin Islands from 05/30/2008 to 06/13/2008
    Date
        Date: 2013-03-03
        Date Type: Publication Date
    Organisation Name: US Geological Survey (USGS) <http://www.usgs.gov/>, Coastal and Marine Geology (CMG) <http://walrus.wr.usgs.gov/>
    Role: Publisher
    Contact Info: …
Point Of Contact: …
Representation Type: Vector
Topic Category
Keyword Collection
    Keyword: EARTH SCIENCE > OCEANS
    Associated Thesaurus: Global Change Master Directory (GCMD)
    Keyword: Marine Geology
    Associated Thesaurus: USGS CMG InfoBank
Spatial Extent
    West Bounding Longitude: -65.75000
    East Bounding Longitude: -63.25000
    North Bounding Latitude: 18.75000
    South Bounding Latitude: 17.25000
FGDC/CSDGM Metadata
Constraints: Please recognize the US Geological Survey (USGS) as the source of this information. Physical materials are under controlled on-site access. Some USGS information accessed through this means may be preliminary in nature and presented without the approval of the Director of the USGS…
Legal Constraints
    Use Constraints: Other Restrictions
    Other Constraints: Use Constraints: Please recognize the US Geological Survey (USGS) as the source of this information. Physical materials are under controlled on-site access…
…
Distribution
    Distribution Format
        Format Name: ASCII
        Format Version:
        File Decompression Technique: No compression applied
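Structured fields such as the Spatial Extent in the record above lend themselves to simple automated checks before deposit. The Python sketch below models a bounding box and reports obvious problems. The class and method names are hypothetical; the coordinates are read from the record on the assumption that the decimal points were lost in extraction (for example, -65.75000 for the west longitude); and the test point is an arbitrary location chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpatialExtent:
    west: float
    east: float
    north: float
    south: float

    def problems(self):
        """Return a list of readable problems; empty when the box looks sane."""
        found = []
        for label, value in (("west", self.west), ("east", self.east)):
            if not -180.0 <= value <= 180.0:
                found.append(f"{label} longitude out of range: {value}")
        for label, value in (("north", self.north), ("south", self.south)):
            if not -90.0 <= value <= 90.0:
                found.append(f"{label} latitude out of range: {value}")
        if self.south > self.north:
            found.append("south latitude is greater than north latitude")
        return found

    def contains(self, longitude, latitude):
        """True when the point lies inside the box (no antimeridian handling)."""
        return (
            self.west <= longitude <= self.east
            and self.south <= latitude <= self.north
        )


# Assumed decimal reading of the bounding coordinates
extent = SpatialExtent(west=-65.75, east=-63.25, north=18.75, south=17.25)
print(extent.problems(), extent.contains(-64.9, 18.3))
```

A range check of this kind would flag a value such as -6575000 immediately, which is one practical benefit of recording spatial coverage in dedicated fields rather than in free text.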
o XML schema brings documentation into a single document, creates structured content about the data, and allows data interoperability and sharing
oIt can document comprehensive variable level information such as basic
data dictionary question text and question routing instructions
oData Documentation Initiative (DDI) a metadata specification for the
social and behavioral sciences It is an XML metadata standard for
documenting numeric data Detailed information is available
at httpwwwddiallianceorg
oProjects using the DDI (httpwwwddiallianceorgddi-at-workprojects)
oDDI-compliant data repository
o ICPSR - Inter-university Consortium for Political and Social Research
o Data deposit form httpswwwicpsrumicheducgi-binddf2
o UCF is a member of ICPSR
oUKDA - UK Data Archive
Field Labels
TitlePrincipal investigator(s)
Summary
Access notes
Dataset(s)
httpwwwicpsrumicheduicpsrwebNA
CJDstudies20363archive=NACJDampq=22
university+of+central+florida22amppermit
5B05D=AVAILABLEampx=-999ampy=-84
ICPSR Interuniversity
Consortium for
Political and
Social Research
Dataset(s)
DSO Study-Level Files
Documentation
Questionnairepdf
User guidepdf
DS1 Female Interviews
Documentation
Codebookpdf
hellip
Field Labels
Study description
Citation
Funding
Scope of studybull Subject terms
bull Smallest
geographic unit
bull Geographic
coverage
bull Time period
bull Date of collection
bull Unit of
observation
bull Universe
bull Data types
bull Data collection
notes
Methodologybull Study purpose
bull Study design
Field Labels
bull Sample
bull Mode of data collection
bull Description of variables
bull Response rates
bull Presence of common
scales
bull Extent of processing
Field Labels
Version(s)
Related publications
Variables
Utilities
bull Metadata exports
bull Download statistics
Variables
List all 1682 variables in this study
egID QUESTIONNAIRE ID NUMBER ISEX INTERVIEWER GENDER START INTERVIEW START TIME HHMM USE 24 HR CLOCK Q1A COUNTRY OF BIRTH Q1B STATE OF BIRTH - INITIALS OF STATEQ1C CITY OF BIRTH WRITE IN NOT APPQ1D YEARS LIVED IN USAQ1E RESIDENCY STATUSCHECK1 CHECKPOINT 1 BORN IN SAME METRO AREAQ2 HOW LONG LIVED IN THIS AREA hellip (httpwwwicpsrumicheduicpsrwebNACJDssv
dstudies20363variables)
httpwwwicpsrumicheduicpsrwebICPSRddi2studies20363
docDscrThe Document
Description
consists of
bibliographic
information
describing the
DDI-compliant
document
itself as a
whole
Included Fields
citation
bull titleStmt
bull prodStmt
bull verStmt
bull holdings
Included FieldsCitation
titlStmt
rspStmt
prodStmt
fundAg
grantNo
distStmt
biblCit
Holdings
stdyInfoSubject
Abstract
sumDscr
MethoddataColl
Notes
anlyInfo
dataAccssetAvail
useStmt
stdyDscr The Study
Description consists of
information about the
data collection study
or compilation that the
DDI-compliant
documentation file
describes This section
includes information
about how the study
should be cited who
collected or compiled
the data who
distributes the data
keywords about the
content of the data
summary (abstract) of
the content of the data
data collection methods
and processing etc
Included Fields
fileDscr
fileTxt
fileName
fileDscr
Data Files
Description
Information about
the data file(s)
that comprises a
collection This
section can be
repeated for
collections with
multiple files
oContext and participant details of interviews can be
oA descriptive header or summary page in transcripts or
field notes
oA structured data list
oXML mark-up of data for example
oText Encoding Initiative (TEI) to mark up interview
transcript
oQualitative Data Exchange Format (QuDEx) for
researcher annotations and data linking
oAnonymisation of textual data (eg replacing real names of people
organizations and locations with pseudonyms)
oFile naming
oMeaningful short names identify file types (eg interviews focus groups
field notes audio recordings) avoid space special characters avoid long
names
oOrganizing files in folders Create uniform and structured folder names based
on cases studies locations data types etc or the original anonymized
coded or annotated versions of data
oVersion control Version numbering in file names
oDocumentation Methodology description project plan interview guidelines
consent form templates data analyses and manipulation
o Example is from A NESSTAR FOR QUALITATIVE DATA BUILDING BLOCKS FOR DIGITAL FUTURES By Corti Louise et al available at httpdata-archiveacukmedia376907digitalfutures_dashish_21nov2012pdf
oData List
Interview ID
x001
x002
hellip
Text File Name
6124int001
6124int002
hellip
oCreate and generate metadata for your research data and
datasets in your research lifecycle to preserve the data in the
long run
oConsider what information is needed for the data to be
read and interpreted in the future
oUnderstand your funder requirements for data
documentation and metadata Funder requirements for NSF
GBMF IMLS NEH NIH and NOAA can be found at
httpsdmptoolorgguidance
oConsult available metadata standards in your field You may
refer to Common Metadata Standards and Domain Specific
Metadata Standards for details
oDescribe data and datasets created in your research lifecycle and
use software programs and tools to assist in data documentation
Assign or capture administrative descriptive technical structural
and preservation metadata for the data Some potential information
to document
oDescriptive metadata
oName of creator of data set
oName of author of document
oTitle of document
oFile name
oLocation of file
oSize of file
oStructural metadata
oFile relationships (eg child parent)
oTechnical metadata
oFormat (eg text SPSS Stata Excel tiff mpeg 3D Java FITS CIF)
oCompression or encoding algorithms
oEncryption and decryption keys
oSoftware (including release number) used to create or update the data
oHardware on which the data were created
oOperating systems in which the data were created
oApplication software in which the data were created
oAdministrative metadata
o Information about data creation (eg date)
o Information about subsequent updates transformation versioning
summarization
oDescriptions of migration and replication
o Information about other events that have affected the files
oPreservation metadata
oFile format (eg txt pdf doc rtf xls xml spv jpg fits)
oSignificant properties
oTechnical environment
oFixity information
oAdopt a thesauri in your field if applicable or compile a data dictionary for
your dataset
oObtain persistent identifiers (eg doi purl) for datasets if possible to ensure
data can be found in the future
oFor your full data management plan visit UCF Libraries Data Management
Guide Also refer to Digital Curation Centrersquos Checklist for a Data
Management Plan (httpwwwdccacuksitesdefaultfilesdocumentsresourceDMP_Checklist_2013pdf)
oCommon Metadata Standards
oDisciplinary Metadata Standards
oActivity Choose a dataset or a standard in your field to examine and critique
oSocial Science Dataset
oHumanities Dataset
oBiological Sciences Dataset
oBiotechnology Dataset
oGeospatial Dataset
oEarth Science Dataset
oPhysical Science Dataset
oOtherhellip
oDublin Core (DC) A general metadata standard for describing a wide range of
digital resources
o Dublin Core Metadata Element Set Version 11
(httpdublincoreorgdocumentsdces)
o 15 Elements Title Creator Subject or keyword Description Publisher Type Format
Identifier Source Language Relation Coverage Rights
o DCMI Metadata Terms (httpdublincoreorgdocumentsdcmi-terms)
o DC Qualifiers (httpdublincoreorgdocumentsusageguidequalifiersshtml)
o Encoded Archival Description (EAD)
o A standard for encoding archival finding aids with XML
oGovernment Information Locator Service (GILS)
o The Global Information Locator Service defines a core element set for government
information so that it can be more searchable and discoverable by the general public
oONIX for Books (ONline Information eXchange)
o An international standard for representing and communicating book industry product
oMARC Code List for Countries httpwwwlocgovmarccountries
oMARC Code List for Languages httpwwwlocgovmarclanguages
oMARC Source Codes for Vocabularies Rules and Schemes
httpwwwlocgovmarcsourcecodeformformsourcehtml
oFor digital and online resources
oInternet Media Types wwwianaorgassignmentsmedia-
typesindexhtml
oMODS Note Types httpwwwlocgovstandardsmodsmods-
noteshtml
oDCMI Type Vocabulary httpdublincoreorgdocumentsdcmi-
termsindexshtmlH7
o Subject Thesauri and Ontologies
o AGROVOC (Agricultural Organization of the United Nations Vocabulary)
o Astronomy Thesaurus
o CAB Thesaurus (for life sciences technology and social sciences)
o CIF dictionaries (for Physics)
o Eurovoc (European Union Thesaurus)
o Ethnographic Thesaurus
o Gene Ontology
o GeoNames
o Getty Institute Art and Architecture Thesaurus Online
o Getty Institute Thesaurus of Geographic Names
o ICD (International Classification of Diseases)
o Library of Congress Authorities for subject headings
o Library of Congress Thesaurus for Graphic Materials
o Logical Observation Identifiers Names and Codes (LOINC)
o MESH (Medical Subject Headings)
o Public Health Language
o Rare Books and Manuscripts Section (RBMS) Controlled Vocabularies
o RxNorm (for drugs)
o SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms)
o STW Thesaurus for Economics
o UNBIS Thesaurus
o UNESCO Thesaurus
o USDA National Agricultural Library Agriculture Thesaurus
Question Have you ever
used thesauri in your study
and research
Getty Union List of Artist Names
(ULAN)The ULAN includes proper names and
associated information about artists
Artists may be either individuals
(persons) or groups of individuals working
together (corporate bodies) Artists in
the ULAN generally represent creators
involved in the conception or production
of visual arts and architecture
Library of Congress Name
Authority File (LCNAF)
The LCNAF provides authoritative
data for names of persons
organizations events places and
titles
Virtual International
Authority File (VIAF)
The VIAFtrade (Virtual International
Authority File) combines multiple
name authority files into a single
OCLC-hosted name authority
service The goal of the service is to
lower the cost and increase the
utility of library authority files by
matching and linking widely-used
authority files and making that
information available on the Web
Web Ontology Language
(OWL)The OWL 2 Web Ontology Language is an
ontology language for the Semantic Web
with formally defined meaning OWL 2
ontologies provide classes properties
individuals and data values and are stored
as Semantic Web documents OWL 2
ontologies can be used along with
information written in RDF and OWL 2
ontologies themselves are primarily
exchanged as RDF documents
MADSRDFThe Metadata Authority Description
Schema (MADS) is an XML schema for an
element set that may be used to provide
metadata about authorized forms of
agents (people organizations) events
and terms (topics geographics genres
etc) MADSRDF
builds on MADSXML as a knowledge
organization system
Resource Description
Framework (RDF)RDF is a standard model for data
interchange on the Web RDF extends
the linking structure of the Web to use
URIs to name the relationship
between things as well as the two
ends of the link (this is usually
referred to as a ldquotriplerdquo) Using this
simple model it allows structured and
semi-structured data to be mixed
exposed and shared across different
applications
SKOS Simple Knowledge
Organization for the Web SKOS is a W3C recommendation
designed for representation of
thesauri classification
schemes taxonomies subject-
heading systems or any other
type of structured controlled
vocabularyLinked data
examplesbull FAST Faceted
Application of
Subject
Terminology
bull Dewey Decimal
Classification
bull Open Metadata
Registry (RDA
vocabularies)
bull Library of Congress
Linked Data
Service
hellip
OpenRefine (ex-Google Refine) is a powerful tool for working with messy data cleaning it transforming it from one format into another extending it with web services and linking it to databases like Freebasehttpopenrefineorg
Nesstar Publisher is a
free advanced data management program It can be used for the preparation of data and metadata Its DDI complianthttpwwwnesstarcomsoftwarepublisherhtml
QualAnon DSDR
Qualitative Data Anonymizer
This free transcript anonymizationtool is designed solely to de-identify qualitative interview transcriptshttpswwwicpsrumicheduicpsrwebDSDRtoolsanonymizejsp
Colectica for Microsoft Excel
A free tool to document your spreadsheet data using the Data Documentation Initiative (DDI) metadata format the open standard for data documentationhttpwwwcolecticacomsoftwarecolecticaforexcel
Schematron is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees It is a structural schema language expressed in XML using a small number of elements and XPathhttpxmlasccnetresourceschematronschematronhtml
Altova XMLSpy is an advanced XML editor for modeling editing transforming and debugging XML-related
technologieshttpwwwaltovacomxmlspy
html
ltoXygengt XML
Editor is an XML tool that supports all the XML schema languages The XSLT and XQuery support is enhanced with powerful debuggers and performance profilers You can use ltoXygengt XML Editor to work with all XML-based technologies including XML databases XProcpipelines and web serviceshttpwwwoxygenxmlcom
LabTrove is a free blogging
platform specifically designed for use in a research environment It aims to serve as a highly flexible electronic notebook and data management system by integrating with a labrsquos data-producing instruments researchers can describe an experiment and associate it with its data output at the time of capture rather than annotating after the fact httpwwwlabtroveorg
Kepler is a scientific workflow
modeling and management system that enables users regardless of programming experience to set up data analysis pipelines The software will assemble execute and document theof services and scripts that scientists with large-scale data use to execute researchhttpskepler-projectorg
DataCiteThe DataCite Consortium
provides a number of
services to support
efforts at increasing the
ease and prevalence of
data citationhttpwwwdataciteorg
DMPTool is an online service to enable researchers to create data management plans now required by many funding agencies and to receive tailored institutional guidance to help them in the processhttpsdmpcdliborg
oSection II addresses data documentation more from the
researcherrsquos view
oSection III interprets data documentation more from
a curator or librarians perspective
oWhat do researchers really care about
oWill each party see the other sidersquos points and
emphases
Create edit share and save
data management plans
Open access scholarly publishing services
papers journals books seminars amp more
Curation repository store manage and share research data
Create and manage
persistent identifiers
Open source add-in for Microsoft
Excel as a data collection tool
An infrastructure to publish and get credit
for sharing research data
CDL Curation and Publishing Services
httpwwwcdliborg
This slide is by Joan Starr California Digital Library httpwwwslidesharenetjoanstarrdataset-metadata-tools-approaches-for-access-preservationfrom_search=1
Data Publication
httplibraryucfeduScholarlyCommunicationUCFResearchLifecyclepdfData Set Related Services
oldquoData Set (also called lsquoDatasetrsquo) Metadatardquo provides
researchers consultation on
oProject and dataset documentation
oMetadata standards (Common and Domain Specific)
oMetadata schemas customization
oControlled vocabularies and thesauri
oData curation tools and practices
oAssists in describing basic properties of your data and enriching
metadata for your datasets
oSupports applying controlled vocabularies or optimizing keywords
to enhance the search of your datasets
oHelps to prepare your metadata and data for deposit and
preservation
oScholarly Communication (httplibraryucfeduScholarlyCommunication)
oSC Contact Information (httplibraryucfeduScholarlyCommunicationContactphp)
oUCF Library Research Guides (httpguidesucfedu)
oMetadata Guide (httpguidesucfedumetadata)
oData Management Guide (httpguidesucfedudata)
oResearch and Information Services (httplibraryucfeduReference)
FGDC/CSDGM Metadata
Abstract: United States Geological Survey, Saint Petersburg, Florida, Center for Coastal and Watershed Studies…
Purpose: These data and information are intended for science researchers, students…
Language: eng; USA
Citation
  Title: Biological data of field activity 08CRD01 (B-1-08-VI) in US Virgin Islands from 05/30/2008 to 06/13/2008
  Date
    Date: 2013-03-03
    Date Type: Publication Date
  Organisation Name: US Geological Survey (USGS) <http://www.usgs.gov>, Coastal and Marine Geology (CMG) <http://walrus.wr.usgs.gov>
  Role: Publisher
  Contact Info: …
Point Of Contact: …
Representation Type: Vector
Topic Category
Keyword Collection
  Keyword: EARTH SCIENCE > OCEANS
  Associated Thesaurus: Global Change Master Directory (GCMD)
  Keyword: Marine Geology
  Associated Thesaurus: USGS CMG InfoBank
Spatial Extent
  West Bounding Longitude: -65.75000
  East Bounding Longitude: -63.25000
  North Bounding Latitude: 18.75000
  South Bounding Latitude: 17.25000
Constraints: Please recognize the US Geological Survey (USGS) as the source of this information. Physical materials are under controlled on-site access. Some USGS information accessed through this means may be preliminary in nature and presented without the approval of the Director of the USGS…
Legal Constraints
  Use Constraints: Other Restrictions
  Other Constraints: Use Constraints: Please recognize the US Geological Survey (USGS) as the source of this information. Physical materials are under controlled on-site access…
…
Distribution
  Distribution Format
    Format Name: ASCII
    Format Version:
    File Decompression Technique: No compression applied
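The Spatial Extent in the record above describes a rectangle. As a minimal sketch, assuming the four values are decimal degrees (west -65.75, east -63.25, north 18.75, south 17.25; the decimal points are inferred, since the transcript shows digits only), the following Python checks that the box is well-formed and that a point falls inside it. The `BoundingBox` class and the sample point are illustrative and not part of any FGDC tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Spatial extent in decimal degrees (hypothetical helper, not an FGDC API)."""
    west: float
    east: float
    north: float
    south: float

    def is_valid(self) -> bool:
        # Longitudes must lie in [-180, 180] and latitudes in [-90, 90],
        # and the box must not be inverted. Boxes that cross the
        # antimeridian (west > east) are deliberately not handled here.
        return (-180 <= self.west <= self.east <= 180
                and -90 <= self.south <= self.north <= 90)

    def contains(self, lon: float, lat: float) -> bool:
        return self.west <= lon <= self.east and self.south <= lat <= self.north


# Values from the Spatial Extent of the USGS record, decimal points assumed.
usvi = BoundingBox(west=-65.75, east=-63.25, north=18.75, south=17.25)

# Sample point chosen for illustration (roughly St. Thomas, US Virgin Islands).
print(usvi.is_valid(), usvi.contains(-64.93, 18.34))
```

A check like this can catch swapped or mis-signed coordinates before a record is deposited.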
o An XML schema brings documentation into a single document, creates structured content about the data, and allows data interoperability and sharing
o It can document comprehensive variable-level information, such as a basic data dictionary, question text, and question routing instructions
o Data Documentation Initiative (DDI): a metadata specification for the social and behavioral sciences. It is an XML metadata standard for documenting numeric data. Detailed information is available at http://www.ddialliance.org
o Projects using the DDI (http://www.ddialliance.org/ddi-at-work/projects)
o DDI-compliant data repositories
  o ICPSR - Inter-university Consortium for Political and Social Research
    o Data deposit form: https://www.icpsr.umich.edu/cgi-bin/ddf2
    o UCF is a member of ICPSR
  o UKDA - UK Data Archive
ICPSR: Inter-university Consortium for Political and Social Research
http://www.icpsr.umich.edu/icpsrweb/NACJD/studies/20363?archive=NACJD&q=%22university+of+central+florida%22&permit%5B0%5D=AVAILABLE&x=-999&y=-84
Field Labels
Title
Principal investigator(s)
Summary
Access notes
Dataset(s)
  DS0: Study-Level Files
    Documentation: Questionnaire.pdf, User guide.pdf
  DS1: Female Interviews
    Documentation: Codebook.pdf
  …
Field Labels
Study description
Citation
Funding
Scope of study
• Subject terms
• Smallest geographic unit
• Geographic coverage
• Time period
• Date of collection
• Unit of observation
• Universe
• Data types
• Data collection notes
Methodology
• Study purpose
• Study design
• Sample
• Mode of data collection
• Description of variables
• Response rates
• Presence of common scales
• Extent of processing
Version(s)
Related publications
Variables
Utilities
• Metadata exports
• Download statistics
Variables
List all 1682 variables in this study, e.g.:
ID      QUESTIONNAIRE ID NUMBER
ISEX    INTERVIEWER GENDER
START   INTERVIEW START TIME HH:MM USE 24 HR CLOCK
Q1A     COUNTRY OF BIRTH
Q1B     STATE OF BIRTH - INITIALS OF STATE
Q1C     CITY OF BIRTH WRITE IN NOT APP
Q1D     YEARS LIVED IN USA
Q1E     RESIDENCY STATUS
CHECK1  CHECKPOINT 1 BORN IN SAME METRO AREA
Q2      HOW LONG LIVED IN THIS AREA
…
(http://www.icpsr.umich.edu/icpsrweb/NACJD/ssvd/studies/20363/variables)
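A variable list like the one above is the core of a data dictionary. As a rough sketch, the Python below stores a few of the variable names and labels shown on the slide and writes them out as a two-column CSV. The column headings `variable` and `label` and the helper names are assumptions made for illustration, not an ICPSR format.

```python
import csv
import io

# Variable names and labels copied from the ICPSR variable list above.
VARIABLES = [
    ("ID", "QUESTIONNAIRE ID NUMBER"),
    ("ISEX", "INTERVIEWER GENDER"),
    ("Q1A", "COUNTRY OF BIRTH"),
    ("Q1D", "YEARS LIVED IN USA"),
    ("Q1E", "RESIDENCY STATUS"),
]


def data_dictionary_csv(variables):
    """Return a two-column data dictionary (name, label) as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["variable", "label"])  # header names are illustrative
    writer.writerows(variables)
    return buffer.getvalue()


def lookup(variables, name):
    """Return the label for a variable name, or None if it is not documented."""
    return dict(variables).get(name)


print(data_dictionary_csv(VARIABLES))
```

Keeping the dictionary in a plain, open format such as CSV makes it easy to deposit alongside the data.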
DDI record for the study: http://www.icpsr.umich.edu/icpsrweb/ICPSR/ddi2/studies/20363
docDscr: The Document Description consists of bibliographic information describing the DDI-compliant document itself as a whole.
Included Fields
citation
• titlStmt
• prodStmt
• verStmt
• holdings
stdyDscr: The Study Description consists of information about the data collection, study, or compilation that the DDI-compliant documentation file describes. This section includes information about how the study should be cited, who collected or compiled the data, who distributes the data, keywords about the content of the data, a summary (abstract) of the content of the data, data collection methods and processing, etc.
Included Fields
citation
• titlStmt
• rspStmt
• prodStmt
• fundAg
• grantNo
• distStmt
• biblCit
• holdings
stdyInfo
• subject
• abstract
• sumDscr
method
• dataColl
• notes
• anlyInfo
dataAccs
• setAvail
• useStmt
fileDscr: The Data Files Description holds information about the data file(s) that comprise a collection. This section can be repeated for collections with multiple files.
Included Fields
fileDscr
• fileTxt
• fileName
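To make the three sections above concrete, here is a minimal Python sketch that builds an empty DDI-style skeleton with the standard library. The element names `docDscr`, `stdyDscr`, `fileDscr`, `citation`, `titlStmt`, `fileTxt`, and `fileName` come from the slides; the root element `codeBook` and the `titl` child are assumed from DDI Codebook 2.x, and the title and file name are placeholders. The result is not validated against the DDI schema.

```python
import xml.etree.ElementTree as ET


def build_codebook(title, file_name):
    """Build a bare DDI Codebook-style skeleton (not schema-validated)."""
    code_book = ET.Element("codeBook")  # root name assumed from DDI Codebook 2.x

    # docDscr: describes the DDI document itself.
    doc_citation = ET.SubElement(ET.SubElement(code_book, "docDscr"), "citation")
    doc_titl = ET.SubElement(ET.SubElement(doc_citation, "titlStmt"), "titl")
    doc_titl.text = f"DDI documentation for: {title}"

    # stdyDscr: describes the study (citation, scope, methods, access).
    stdy_citation = ET.SubElement(ET.SubElement(code_book, "stdyDscr"), "citation")
    stdy_titl = ET.SubElement(ET.SubElement(stdy_citation, "titlStmt"), "titl")
    stdy_titl.text = title

    # fileDscr: one per data file; repeat for collections with several files.
    file_txt = ET.SubElement(ET.SubElement(code_book, "fileDscr"), "fileTxt")
    ET.SubElement(file_txt, "fileName").text = file_name
    return code_book


# Placeholder values, chosen only for illustration.
root = build_codebook("Example study", "example_data.txt")
print(ET.tostring(root, encoding="unicode"))
```

In practice a DDI-aware tool would normally generate and validate this structure; the sketch only shows how the sections nest.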
o Context and participant details of interviews can be recorded in:
  o A descriptive header or summary page in transcripts or field notes
  o A structured data list
o XML mark-up of data, for example:
  o Text Encoding Initiative (TEI) to mark up interview transcripts
  o Qualitative Data Exchange Format (QuDEx) for researcher annotations and data linking
o Anonymisation of textual data (e.g. replacing real names of people, organizations, and locations with pseudonyms)
o File naming
  o Meaningful short names: identify file types (e.g. interviews, focus groups, field notes, audio recordings); avoid spaces and special characters; avoid long names
  o Organizing files in folders: create uniform and structured folder names based on cases, studies, locations, data types, etc., or on the original, anonymized, coded, or annotated versions of data
  o Version control: version numbering in file names
o Documentation: methodology description, project plan, interview guidelines, consent form templates, data analyses and manipulation
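The anonymisation step in the list above can be sketched in a few lines of Python. This is only an illustration, assuming a hand-built lookup table of real names and pseudonyms (all names below are invented); real transcripts usually need manual review as well, since a lookup table cannot catch indirect identifiers.

```python
import re

# Hypothetical mapping from real names to pseudonyms; in practice this key
# should be stored securely and separately from the anonymised transcripts.
PSEUDONYMS = {
    "Maria Lopez": "Participant A",
    "Orlando": "City X",
    "Acme Clinic": "Organization 1",
}


def anonymise(text, pseudonyms):
    """Replace each real name with its pseudonym (whole-word, case-sensitive)."""
    # Longest names first so a full name wins over any shorter overlapping entry.
    names = sorted(pseudonyms, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return pattern.sub(lambda match: pseudonyms[match.group(1)], text)


transcript = "Maria Lopez moved to Orlando and works at Acme Clinic."
print(anonymise(transcript, PSEUDONYMS))
# → Participant A moved to City X and works at Organization 1.
```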
o Data List
  Interview ID    Text File Name
  x001            6124int001
  x002            6124int002
  …               …
o Example is from "A Nesstar for Qualitative Data: Building Blocks for Digital Futures" by Corti, Louise, et al., available at http://data-archive.ac.uk/media/376907/digitalfutures_dashish_21nov2012.pdf
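The file-naming advice and the data list above can be combined in a short Python sketch. It assumes that a name such as `6124int001` is a study number, a type code (`int` for interview), and a three-digit sequence number; that reading of the pattern, the `_v02` version suffix, and the function names are assumptions made for illustration.

```python
import re

# Allow only letters, digits, underscores, and hyphens: no spaces or special characters.
NAME_PATTERN = re.compile(r"^[A-Za-z0-9_-]+$")


def file_name(study_number, type_code, sequence, version=None):
    """Build a short file name such as 6124int001 (pattern assumed from the slide)."""
    name = f"{study_number}{type_code}{sequence:03d}"
    if version is not None:
        name += f"_v{version:02d}"  # version suffix is an illustrative convention
    if not NAME_PATTERN.match(name):
        raise ValueError(f"unsafe characters in file name: {name!r}")
    return name


def data_list(study_number, count):
    """Pair interview IDs (x001, x002, ...) with their text file names."""
    return [(f"x{i:03d}", file_name(study_number, "int", i))
            for i in range(1, count + 1)]


for interview_id, text_file in data_list(6124, 2):
    print(interview_id, text_file)
```

Generating names from one function keeps them uniform across a project and makes the data list reproducible.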
o Create and generate metadata for your research data and datasets in your research lifecycle to preserve the data in the long run
o Consider what information is needed for the data to be read and interpreted in the future
o Understand your funder requirements for data documentation and metadata. Funder requirements for NSF, GBMF, IMLS, NEH, NIH, and NOAA can be found at https://dmptool.org/guidance
o Consult available metadata standards in your field. You may refer to Common Metadata Standards and Domain Specific Metadata Standards for details
o Describe data and datasets created in your research lifecycle, and use software programs and tools to assist in data documentation. Assign or capture administrative, descriptive, technical, structural, and preservation metadata for the data. Some potential information to document:
o Descriptive metadata
  o Name of creator of data set
  o Name of author of document
  o Title of document
  o File name
  o Location of file
  o Size of file
o Structural metadata
  o File relationships (e.g. child, parent)
o Technical metadata
  o Format (e.g. text, SPSS, Stata, Excel, tiff, mpeg, 3D, Java, FITS, CIF)
  o Compression or encoding algorithms
  o Encryption and decryption keys
  o Software (including release number) used to create or update the data
  o Hardware on which the data were created
  o Operating systems in which the data were created
  o Application software in which the data were created
o Administrative metadata
  o Information about data creation (e.g. date)
  o Information about subsequent updates, transformation, versioning, summarization
  o Descriptions of migration and replication
  o Information about other events that have affected the files
o Preservation metadata
  o File format (e.g. .txt, .pdf, .doc, .rtf, .xls, .xml, .spv, .jpg, .fits)
  o Significant properties
  o Technical environment
  o Fixity information
o Adopt a thesaurus in your field if applicable, or compile a data dictionary for your dataset
o Obtain persistent identifiers (e.g. DOI, PURL) for datasets if possible to ensure data can be found in the future
o For your full data management plan, visit the UCF Libraries Data Management Guide. Also refer to the Digital Curation Centre's Checklist for a Data Management Plan (http://www.dcc.ac.uk/sites/default/files/documents/resource/DMP_Checklist_2013.pdf)
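Several of the items listed above (file name, location, size, format, fixity information) can be captured automatically. The Python sketch below is one possible approach using only the standard library; the dictionary keys, the choice of SHA-256 as the fixity checksum, and the sample file are assumptions made for illustration.

```python
import hashlib
import os
import tempfile


def describe_file(path):
    """Collect a few descriptive, technical, and fixity values for one file."""
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large data files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            sha256.update(chunk)
    return {
        "file_name": os.path.basename(path),
        "location": os.path.dirname(os.path.abspath(path)),
        "size_bytes": os.path.getsize(path),
        # Format is guessed from the extension only, which is a simplification.
        "format": os.path.splitext(path)[1].lstrip(".").lower(),
        "fixity_sha256": sha256.hexdigest(),
    }


# Demonstration on a throwaway file with illustrative content.
with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "6124int001.txt")
    with open(sample, "wb") as handle:
        handle.write(b"hello")
    record = describe_file(sample)
    print(record["file_name"], record["size_bytes"], record["format"])
```

Recomputing the checksum later and comparing it with the stored value is a simple way to confirm that a file has not changed.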
o Common Metadata Standards
o Disciplinary Metadata Standards
o Activity: Choose a dataset or a standard in your field to examine and critique
  o Social Science Dataset
  o Humanities Dataset
  o Biological Sciences Dataset
  o Biotechnology Dataset
  o Geospatial Dataset
  o Earth Science Dataset
  o Physical Science Dataset
  o Other…
o Dublin Core (DC): A general metadata standard for describing a wide range of digital resources
  o Dublin Core Metadata Element Set, Version 1.1 (http://dublincore.org/documents/dces)
  o 15 elements: Title, Creator, Subject (or keyword), Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights
  o DCMI Metadata Terms (http://dublincore.org/documents/dcmi-terms)
  o DC Qualifiers (http://dublincore.org/documents/usageguide/qualifiers.shtml)
o Encoded Archival Description (EAD)
  o A standard for encoding archival finding aids with XML
o Government Information Locator Service (GILS)
  o The Global Information Locator Service defines a core element set for government information so that it can be more searchable and discoverable by the general public
o ONIX for Books (ONline Information eXchange)
  o An international standard for representing and communicating book industry product information
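As an illustration of the Dublin Core element set listed above, the Python sketch below serializes a handful of elements to simple XML. The values are taken from the USGS record shown earlier in this presentation; the `metadata` wrapper element and the function name are assumptions, and the namespace is the Dublin Core 1.1 elements namespace.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher", "contributor",
    "date", "type", "format", "identifier", "source", "language", "relation",
    "coverage", "rights",
)


def dublin_core_record(fields):
    """Serialize a dict of Dublin Core element -> value into simple XML."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise KeyError(f"not a Dublin Core 1.1 element: {name}")
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


record_xml = dublin_core_record({
    "title": "Biological data of field activity 08CRD01 (B-1-08-VI) in US Virgin Islands",
    "publisher": "US Geological Survey (USGS)",
    "date": "2013-03-03",
    "subject": "Marine Geology",
    "format": "ASCII",
    "language": "eng",
})
print(record_xml)
```

Rejecting unknown element names early helps keep records consistent when several people contribute metadata.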
o MARC Code List for Countries: http://www.loc.gov/marc/countries
o MARC Code List for Languages: http://www.loc.gov/marc/languages
o MARC Source Codes for Vocabularies, Rules, and Schemes: http://www.loc.gov/marc/sourcecode/form/formsource.html
o For digital and online resources
  o Internet Media Types: www.iana.org/assignments/media-types/index.html
  o MODS Note Types: http://www.loc.gov/standards/mods/mods-notes.html
  o DCMI Type Vocabulary: http://dublincore.org/documents/dcmi-terms/index.shtml#H7
o Subject Thesauri and Ontologies
o AGROVOC (Food and Agriculture Organization of the United Nations vocabulary)
o Astronomy Thesaurus
o CAB Thesaurus (for life sciences technology and social sciences)
o CIF dictionaries (for Physics)
o Eurovoc (European Union Thesaurus)
o Ethnographic Thesaurus
o Gene Ontology
o GeoNames
o Getty Institute Art and Architecture Thesaurus Online
o Getty Institute Thesaurus of Geographic Names
o ICD (International Classification of Diseases)
o Library of Congress Authorities for subject headings
o Library of Congress Thesaurus for Graphic Materials
o Logical Observation Identifiers Names and Codes (LOINC)
o MESH (Medical Subject Headings)
o Public Health Language
o Rare Books and Manuscripts Section (RBMS) Controlled Vocabularies
o RxNorm (for drugs)
o SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms)
o STW Thesaurus for Economics
o UNBIS Thesaurus
o UNESCO Thesaurus
o USDA National Agricultural Library Agriculture Thesaurus
Question: Have you ever used thesauri in your study and research?
Getty Union List of Artist Names (ULAN)
The ULAN includes proper names and associated information about artists. Artists may be either individuals (persons) or groups of individuals working together (corporate bodies). Artists in the ULAN generally represent creators involved in the conception or production of visual arts and architecture.
Library of Congress Name Authority File (LCNAF)
The LCNAF provides authoritative data for names of persons, organizations, events, places, and titles.
Virtual International Authority File (VIAF)
The VIAF™ (Virtual International Authority File) combines multiple name authority files into a single OCLC-hosted name authority service. The goal of the service is to lower the cost and increase the utility of library authority files by matching and linking widely-used authority files and making that information available on the Web.
Web Ontology Language (OWL)
The OWL 2 Web Ontology Language is an ontology language for the Semantic Web with formally defined meaning. OWL 2 ontologies provide classes, properties, individuals, and data values and are stored as Semantic Web documents. OWL 2 ontologies can be used along with information written in RDF, and OWL 2 ontologies themselves are primarily exchanged as RDF documents.
MADS/RDF
The Metadata Authority Description Schema (MADS) is an XML schema for an element set that may be used to provide metadata about authorized forms of agents (people, organizations), events, and terms (topics, geographics, genres, etc.). MADS/RDF builds on MADS/XML as a knowledge organization system.
Resource Description Framework (RDF)
RDF is a standard model for data interchange on the Web. RDF extends the linking structure of the Web to use URIs to name the relationship between things as well as the two ends of the link (this is usually referred to as a "triple"). Using this simple model, it allows structured and semi-structured data to be mixed, exposed, and shared across different applications.
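The "triple" idea can be shown without any RDF library. The Python sketch below models a triple as subject, predicate, and object and prints it in N-Triples syntax. The dataset URI under example.org is a placeholder, the predicates come from the Dublin Core elements namespace, and the escaping covers only simple literals, so this is an illustration rather than a full serializer.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the thing being described
    predicate: str  # URI naming the relationship
    obj: str        # URI or literal value at the other end of the link


def to_ntriples(triple, literal=True):
    """Render one triple as an N-Triples line (simple literals and URIs only)."""
    if literal:
        escaped = triple.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{triple.obj}>"
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."


DATASET = "http://example.org/dataset/1"  # placeholder URI, not a real identifier

title = Triple(DATASET, "http://purl.org/dc/elements/1.1/title",
               "Biological data of field activity 08CRD01")
publisher = Triple(DATASET, "http://purl.org/dc/elements/1.1/publisher",
                   "http://www.usgs.gov")

print(to_ntriples(title))
print(to_ntriples(publisher, literal=False))
```

Because every part of the link is named by a URI, statements from different sources about the same dataset can be merged simply by pooling their triples.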
SKOS: Simple Knowledge Organization System for the Web
SKOS is a W3C recommendation designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary.
Linked data examples
• FAST: Faceted Application of Subject Terminology
• Dewey Decimal Classification
• Open Metadata Registry (RDA vocabularies)
• Library of Congress Linked Data Service
• …
OpenRefine (ex-Google Refine) is a powerful tool for working with messy data: cleaning it, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase. http://openrefine.org
Nesstar Publisher is a free advanced data management program. It can be used for the preparation of data and metadata. It is DDI compliant. http://www.nesstar.com/software/publisher.html
QualAnon: DSDR Qualitative Data Anonymizer. This free transcript anonymization tool is designed solely to de-identify qualitative interview transcripts. https://www.icpsr.umich.edu/icpsrweb/DSDR/tools/anonymize.jsp
Colectica for Microsoft Excel: A free tool to document your spreadsheet data using the Data Documentation Initiative (DDI) metadata format, the open standard for data documentation. http://www.colectica.com/software/colecticaforexcel
Schematron is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees. It is a structural schema language expressed in XML using a small number of elements and XPath. http://xml.ascc.net/resource/schematron/schematron.html
Altova XMLSpy is an advanced XML editor for modeling, editing, transforming, and debugging XML-related technologies. http://www.altova.com/xmlspy.html
<oXygen/> XML Editor is an XML tool that supports all the XML schema languages. The XSLT and XQuery support is enhanced with powerful debuggers and performance profilers. You can use <oXygen/> XML Editor to work with all XML-based technologies, including XML databases, XProc pipelines, and web services. http://www.oxygenxml.com
LabTrove is a free blogging platform specifically designed for use in a research environment. It aims to serve as a highly flexible electronic notebook and data management system. By integrating with a lab's data-producing instruments, researchers can describe an experiment and associate it with its data output at the time of capture rather than annotating after the fact. http://www.labtrove.org
Kepler is a scientific workflow modeling and management system that enables users, regardless of programming experience, to set up data analysis pipelines. The software will assemble, execute, and document the services and scripts that scientists with large-scale data use to execute research. https://kepler-project.org
DataCite: The DataCite Consortium provides a number of services to support efforts at increasing the ease and prevalence of data citation. http://www.datacite.org
DMPTool is an online service to enable researchers to create data management plans, now required by many funding agencies, and to receive tailored institutional guidance to help them in the process. https://dmp.cdlib.org
o Section II addresses data documentation more from the researcher's view
o Section III interprets data documentation more from a curator's or librarian's perspective
o What do researchers really care about?
o Will each party see the other side's points and emphases?
Create, edit, share, and save data management plans
Open access scholarly publishing services: papers, journals, books, seminars & more
Curation repository: store, manage, and share research data
Create and manage persistent identifiers
Open source add-in for Microsoft Excel as a data collection tool
An infrastructure to publish and get credit for sharing research data
CDL Curation and Publishing Services
http://www.cdlib.org
This slide is by Joan Starr, California Digital Library: http://www.slideshare.net/joanstarr/dataset-metadata-tools-approaches-for-access-preservation?from_search=1
Data Publication
http://library.ucf.edu/ScholarlyCommunication/UCFResearchLifecycle.pdf
Data Set Related Services
o "Data Set (also called 'Dataset') Metadata" provides researchers consultation on:
   o Project and dataset documentation
   o Metadata standards (Common and Domain Specific)
   o Metadata schemas customization
   o Controlled vocabularies and thesauri
   o Data curation tools and practices
o Assists in describing basic properties of your data and enriching metadata for your datasets
o Supports applying controlled vocabularies or optimizing keywords to enhance the search of your datasets
o Helps to prepare your metadata and data for deposit and preservation
o Scholarly Communication (http://library.ucf.edu/ScholarlyCommunication)
o SC Contact Information (http://library.ucf.edu/ScholarlyCommunication/Contact.php)
o UCF Library Research Guides (http://guides.ucf.edu)
o Metadata Guide (http://guides.ucf.edu/metadata)
o Data Management Guide (http://guides.ucf.edu/data)
o Research and Information Services (http://library.ucf.edu/Reference)
Abstract: United States Geological Survey, Saint Petersburg, Florida, Center for Coastal and Watershed Studies…
Purpose: These data and information are intended for science researchers, students…
Language: eng; USA
Citation
   Title: Biological data of field activity 08CRD01 (B-1-08-VI) in US Virgin Islands from 05/30/2008 to 06/13/2008
   Date
      Date: 2013-03-03
      Date Type: Publication Date
   Organisation Name: US Geological Survey (USGS) <http://www.usgs.gov/>, Coastal and Marine Geology (CMG) <http://walrus.wr.usgs.gov/>
   Role: Publisher
   Contact Info: …
Point Of Contact: …
Representation Type: Vector
Topic Category:
Keyword Collection
   Keyword: EARTH SCIENCE > OCEANS
   Associated Thesaurus: Global Change Master Directory (GCMD)
   Keyword: Marine Geology
   Associated Thesaurus: USGS CMG InfoBank
Spatial Extent
   West Bounding Longitude: -65.75000
   East Bounding Longitude: -63.25000
   North Bounding Latitude: 18.75000
   South Bounding Latitude: 17.25000
FGDC CSDGM Metadata
Constraints: Please recognize the US Geological Survey (USGS) as the source of this information. Physical materials are under controlled on-site access. Some USGS information accessed through this means may be preliminary in nature and presented without the approval of the Director of the USGS…
Legal Constraints
   Use Constraints: Other Restrictions
   Other Constraints: Use Constraints: Please recognize the US Geological Survey (USGS) as the source of this information. Physical materials are under controlled on-site access…
…
Distribution
   Distribution Format
      Format Name: ASCII
      Format Version:
      File Decompression Technique: No compression applied
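A quick sanity check on a record's spatial extent can catch coordinates that lost their decimal point or were entered in the wrong order. The sketch below is only an illustration: the function name is hypothetical, the coordinates are assumed to be decimal degrees, and boxes that cross the 180° meridian are not handled.

```python
def check_bounding_box(west, east, north, south):
    """Return a list of problems; an empty list means the box looks plausible."""
    problems = []
    for name, value in (("west", west), ("east", east)):
        if not -180.0 <= value <= 180.0:
            problems.append(f"{name} longitude {value} is outside -180..180")
    for name, value in (("north", north), ("south", south)):
        if not -90.0 <= value <= 90.0:
            problems.append(f"{name} latitude {value} is outside -90..90")
    if south > north:
        problems.append("south latitude is greater than north latitude")
    if west > east:
        # Simplification: a box crossing the 180-degree meridian would trip this.
        problems.append("west longitude is greater than east longitude")
    return problems


# Extent of the US Virgin Islands record, read as decimal degrees.
print(check_bounding_box(west=-65.75, east=-63.25, north=18.75, south=17.25))
```

Running the same check on values with the decimal point dropped (for example -6575000) reports every coordinate as out of range.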
o XML schema brings documentation into a single document, creates structured content about the data, and allows data interoperability and sharing
o It can document comprehensive variable-level information such as a basic data dictionary, question text, and question routing instructions
o Data Documentation Initiative (DDI): a metadata specification for the social and behavioral sciences. It is an XML metadata standard for documenting numeric data. Detailed information is available at http://www.ddialliance.org
o Projects using the DDI (http://www.ddialliance.org/ddi-at-work/projects)
o DDI-compliant data repositories
   o ICPSR - Inter-university Consortium for Political and Social Research
      o Data deposit form: https://www.icpsr.umich.edu/cgi-bin/ddf2
      o UCF is a member of ICPSR
   o UKDA - UK Data Archive
Field Labels
Title
Principal investigator(s)
Summary
Access notes
Dataset(s)
http://www.icpsr.umich.edu/icpsrweb/NACJD/studies/20363?archive=NACJD&q=%22university+of+central+florida%22&permit%5B0%5D=AVAILABLE&x=-999&y=-84
ICPSR: Inter-university Consortium for Political and Social Research
Dataset(s)
   DS0: Study-Level Files
      Documentation: Questionnaire.pdf, User guide.pdf
   DS1: Female Interviews
      Documentation: Codebook.pdf
   …
Field Labels
Study description
Citation
Funding
Scope of study
• Subject terms
• Smallest geographic unit
• Geographic coverage
• Time period
• Date of collection
• Unit of observation
• Universe
• Data types
• Data collection notes
Methodology
• Study purpose
• Study design
• Sample
• Mode of data collection
• Description of variables
• Response rates
• Presence of common scales
• Extent of processing
Version(s)
Related publications
Variables
Utilities
• Metadata exports
• Download statistics
Variables
List all 1682 variables in this study, e.g.:
ID        QUESTIONNAIRE ID NUMBER
ISEX      INTERVIEWER GENDER
START     INTERVIEW START TIME HHMM USE 24 HR CLOCK
Q1A       COUNTRY OF BIRTH
Q1B       STATE OF BIRTH - INITIALS OF STATE
Q1C       CITY OF BIRTH WRITE IN NOT APP
Q1D       YEARS LIVED IN USA
Q1E       RESIDENCY STATUS
CHECK1    CHECKPOINT 1 BORN IN SAME METRO AREA
Q2        HOW LONG LIVED IN THIS AREA
…
(http://www.icpsr.umich.edu/icpsrweb/NACJD/ssvd/studies/20363/variables)
http://www.icpsr.umich.edu/icpsrweb/ICPSR/ddi2/studies/20363
docDscr
The Document Description consists of bibliographic information describing the DDI-compliant document itself as a whole.
Included Fields
citation
• titlStmt
• prodStmt
• verStmt
• holdings
Included Fields
Citation
   titlStmt
   rspStmt
   prodStmt
   fundAg
   grantNo
   distStmt
   biblCit
   Holdings
stdyInfo
   Subject
   Abstract
   sumDscr
Method
   dataColl
   Notes
   anlyInfo
dataAccs
   setAvail
   useStmt
stdyDscr
The Study Description consists of information about the data collection, study, or compilation that the DDI-compliant documentation file describes. This section includes information about how the study should be cited, who collected or compiled the data, who distributes the data, keywords about the content of the data, summary (abstract) of the content of the data, data collection methods and processing, etc.
Included Fields
fileDscr
   fileTxt
   fileName
fileDscr
Data Files Description: Information about the data file(s) that comprise a collection. This section can be repeated for collections with multiple files.
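The three sections above nest inside one XML document. As a rough sketch of that shape, the code below builds a bare skeleton with Python's standard library. It assumes the DDI Codebook root element `codeBook` and the `titl` child of `titlStmt`, which the slides do not show; it omits namespaces and most fields, and it is not validated against the DDI schema. The study title and file names are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_codebook(study_title, file_names):
    """Build a bare DDI-style skeleton: docDscr, stdyDscr, one fileDscr per file."""
    root = ET.Element("codeBook")  # assumed root element, not shown on the slides

    # docDscr: describes the DDI document itself.
    doc_citation = ET.SubElement(ET.SubElement(root, "docDscr"), "citation")
    doc_title = ET.SubElement(ET.SubElement(doc_citation, "titlStmt"), "titl")
    doc_title.text = f"Documentation for: {study_title}"

    # stdyDscr: describes the study that the data come from.
    stdy_citation = ET.SubElement(ET.SubElement(root, "stdyDscr"), "citation")
    stdy_title = ET.SubElement(ET.SubElement(stdy_citation, "titlStmt"), "titl")
    stdy_title.text = study_title

    # fileDscr: repeated once for each data file in the collection.
    for name in file_names:
        file_txt = ET.SubElement(ET.SubElement(root, "fileDscr"), "fileTxt")
        ET.SubElement(file_txt, "fileName").text = name
    return root


# Hypothetical study title and file names, for illustration only.
codebook = build_codebook("Example interview study", ["ds1_female.txt", "ds2_male.txt"])
print(ET.tostring(codebook, encoding="unicode"))
```

In practice a tool such as Nesstar Publisher or Colectica would produce and validate this structure; the sketch only shows how the sections relate.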
o Context and participant details of interviews can be documented in:
   o A descriptive header or summary page in transcripts or field notes
   o A structured data list
o XML mark-up of data, for example:
   o Text Encoding Initiative (TEI) to mark up interview transcripts
   o Qualitative Data Exchange Format (QuDEx) for researcher annotations and data linking
o Anonymisation of textual data (e.g., replacing real names of people, organizations, and locations with pseudonyms)
o File naming
   o Meaningful short names: identify file types (e.g., interviews, focus groups, field notes, audio recordings); avoid spaces and special characters; avoid long names
o Organizing files in folders: Create uniform and structured folder names based on cases, studies, locations, data types, etc., or the original, anonymized, coded, or annotated versions of data
o Version control: Version numbering in file names
o Documentation: Methodology description, project plan, interview guidelines, consent form templates, data analyses and manipulation
o Example is from "A NESSTAR FOR QUALITATIVE DATA: BUILDING BLOCKS FOR DIGITAL FUTURES" by Corti, Louise, et al., available at http://data-archive.ac.uk/media/376907/digitalfutures_dashish_21nov2012.pdf
o Data List
   Interview ID    Text File Name
   x001            6124int001
   x002            6124int002
   …               …
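The naming and data-list practices above can be scripted so that names stay uniform. The sketch below is one possible convention, not a prescribed one: the function names, the `_v02` style version suffix, and the optional extension are assumptions, while the `6124int001` pattern follows the data list example.

```python
import re


def make_file_name(study_id, kind, number, version=None, ext=None):
    """Build a short file name with no spaces or special characters."""
    stem = f"{study_id}{kind}{number:03d}"
    stem = re.sub(r"[^A-Za-z0-9_-]", "", stem)  # drop spaces and special characters
    if version is not None:
        stem += f"_v{version:02d}"  # assumed convention for version numbering
    return f"{stem}.{ext}" if ext else stem


def data_list(study_id, count):
    """Pair each interview ID with its text file name, as in a structured data list."""
    return [(f"x{n:03d}", make_file_name(study_id, "int", n)) for n in range(1, count + 1)]


for interview_id, file_name in data_list("6124", 2):
    print(interview_id, file_name)
```

Zero-padded numbers keep files in order when sorted by name, and putting the version in the name makes it visible without opening the file.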
o Create and generate metadata for your research data and datasets in your research lifecycle to preserve the data in the long run
o Consider what information is needed for the data to be read and interpreted in the future
o Understand your funder requirements for data documentation and metadata. Funder requirements for NSF, GBMF, IMLS, NEH, NIH, and NOAA can be found at https://dmptool.org/guidance
o Consult available metadata standards in your field. You may refer to Common Metadata Standards and Domain Specific Metadata Standards for details
o Describe data and datasets created in your research lifecycle, and use software programs and tools to assist in data documentation. Assign or capture administrative, descriptive, technical, structural, and preservation metadata for the data. Some potential information to document:
o Descriptive metadata
   o Name of creator of data set
   o Name of author of document
   o Title of document
   o File name
   o Location of file
   o Size of file
o Structural metadata
   o File relationships (e.g., child, parent)
o Technical metadata
   o Format (e.g., text, SPSS, Stata, Excel, tiff, mpeg, 3D, Java, FITS, CIF)
   o Compression or encoding algorithms
   o Encryption and decryption keys
   o Software (including release number) used to create or update the data
   o Hardware on which the data were created
   o Operating systems in which the data were created
   o Application software in which the data were created
o Administrative metadata
   o Information about data creation (e.g., date)
   o Information about subsequent updates, transformation, versioning, summarization
   o Descriptions of migration and replication
   o Information about other events that have affected the files
o Preservation metadata
   o File format (e.g., .txt, .pdf, .doc, .rtf, .xls, .xml, .spv, .jpg, .fits)
   o Significant properties
   o Technical environment
   o Fixity information
o Adopt a thesaurus in your field if applicable, or compile a data dictionary for your dataset
o Obtain persistent identifiers (e.g., DOI, PURL) for datasets if possible to ensure data can be found in the future
o For your full data management plan, visit UCF Libraries' Data Management Guide. Also refer to the Digital Curation Centre's Checklist for a Data Management Plan (http://www.dcc.ac.uk/sites/default/files/documents/resource/DMP_Checklist_2013.pdf)
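Fixity information, listed under preservation metadata, is commonly recorded as a checksum that can be recomputed later to confirm a file has not changed. The sketch below uses Python's standard `hashlib`; the choice of SHA-256 and the function names are assumptions for illustration, and a repository may require a different algorithm.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, recorded_digest):
    """Compare the file's current checksum with the one recorded earlier."""
    return sha256_of_file(path) == recorded_digest
```

The checksum would be stored alongside the other metadata at deposit time and checked again after any migration or replication.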
o Common Metadata Standards
o Disciplinary Metadata Standards
o Activity: Choose a dataset or a standard in your field to examine and critique
   o Social Science Dataset
   o Humanities Dataset
   o Biological Sciences Dataset
   o Biotechnology Dataset
   o Geospatial Dataset
   o Earth Science Dataset
   o Physical Science Dataset
   o Other…
o Dublin Core (DC): A general metadata standard for describing a wide range of digital resources
   o Dublin Core Metadata Element Set, Version 1.1 (http://dublincore.org/documents/dces/)
   o 15 Elements: Title, Creator, Subject or keyword, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights
   o DCMI Metadata Terms (http://dublincore.org/documents/dcmi-terms/)
   o DC Qualifiers (http://dublincore.org/documents/usageguide/qualifiers.shtml)
o Encoded Archival Description (EAD)
   o A standard for encoding archival finding aids with XML
o Government Information Locator Service (GILS)
   o The Global Information Locator Service defines a core element set for government information so that it can be more searchable and discoverable by the general public
o ONIX for Books (ONline Information eXchange)
   o An international standard for representing and communicating book industry product information
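As an illustration of how the Dublin Core elements are used, the sketch below writes a small record as XML with Python's standard library. The element names and namespace are those of the Dublin Core Metadata Element Set 1.1 (fifteen elements, including Contributor and Date); the `record` wrapper element, the function name, and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def dublin_core_record(fields):
    """Serialize a dict of Dublin Core fields; a list value repeats the element."""
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        for item in value if isinstance(value, list) else [value]:
            ET.SubElement(record, f"{{{DC_NS}}}{name}").text = item
    return ET.tostring(record, encoding="unicode")


# Sample values are made up for illustration.
print(dublin_core_record({
    "title": "Interview transcripts, example study",
    "creator": ["Researcher, A.", "Researcher, B."],
    "type": "Dataset",
    "language": "eng",
}))
```

All Dublin Core elements are optional and repeatable, which is why the sketch accepts either a single value or a list for each field.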
o MARC Code List for Countries: http://www.loc.gov/marc/countries/
o MARC Code List for Languages: http://www.loc.gov/marc/languages/
o MARC Source Codes for Vocabularies, Rules, and Schemes: http://www.loc.gov/marc/sourcecode/form/formsource.html
o For digital and online resources
   o Internet Media Types: www.iana.org/assignments/media-types/index.html
   o MODS Note Types: http://www.loc.gov/standards/mods/mods-notes.html
   o DCMI Type Vocabulary: http://dublincore.org/documents/dcmi-terms/index.shtml#H7
o Subject Thesauri and Ontologies
o AGROVOC (Food and Agriculture Organization of the United Nations vocabulary)
o Astronomy Thesaurus
o CAB Thesaurus (for life sciences technology and social sciences)
o CIF dictionaries (for Physics)
o Eurovoc (European Union Thesaurus)
o Ethnographic Thesaurus
o Gene Ontology
o GeoNames
o Getty Research Institute Art and Architecture Thesaurus Online
o Getty Research Institute Thesaurus of Geographic Names
o ICD (International Classification of Diseases)
o Library of Congress Authorities for subject headings
o Library of Congress Thesaurus for Graphic Materials
o Logical Observation Identifiers Names and Codes (LOINC)
o MeSH (Medical Subject Headings)
o Public Health Language
o Rare Books and Manuscripts Section (RBMS) Controlled Vocabularies
o RxNorm (for drugs)
o SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms)
o STW Thesaurus for Economics
o UNBIS Thesaurus
o UNESCO Thesaurus
o USDA National Agricultural Library Agriculture Thesaurus
Question: Have you ever used thesauri in your study and research?
Getty Union List of Artist Names (ULAN)
The ULAN includes proper names and associated information about artists. Artists may be either individuals (persons) or groups of individuals working together (corporate bodies). Artists in the ULAN generally represent creators involved in the conception or production of visual arts and architecture.
Library of Congress Name Authority File (LCNAF)
The LCNAF provides authoritative data for names of persons, organizations, events, places, and titles.
Virtual International Authority File (VIAF)
The VIAF™ (Virtual International Authority File) combines multiple name authority files into a single OCLC-hosted name authority service. The goal of the service is to lower the cost and increase the utility of library authority files by matching and linking widely-used authority files and making that information available on the Web.
Web Ontology Language (OWL)
The OWL 2 Web Ontology Language is an ontology language for the Semantic Web with formally defined meaning. OWL 2 ontologies provide classes, properties, individuals, and data values and are stored as Semantic Web documents. OWL 2 ontologies can be used along with information written in RDF, and OWL 2 ontologies themselves are primarily exchanged as RDF documents.
MADS/RDF
The Metadata Authority Description Schema (MADS) is an XML schema for an element set that may be used to provide metadata about authorized forms of agents (people, organizations), events, and terms (topics, geographics, genres, etc.). MADS/RDF builds on MADS/XML as a knowledge organization system.
Resource Description Framework (RDF)
RDF is a standard model for data interchange on the Web. RDF extends the linking structure of the Web to use URIs to name the relationship between things as well as the two ends of the link (this is usually referred to as a "triple"). Using this simple model, it allows structured and
semi-structured data to be mixed
exposed and shared across different
applications
SKOS Simple Knowledge
Organization for the Web SKOS is a W3C recommendation
designed for representation of
thesauri classification
schemes taxonomies subject-
heading systems or any other
type of structured controlled
vocabularyLinked data
examplesbull FAST Faceted
Application of
Subject
Terminology
bull Dewey Decimal
Classification
bull Open Metadata
Registry (RDA
vocabularies)
bull Library of Congress
Linked Data
Service
hellip
OpenRefine (ex-Google Refine) is a powerful tool for working with messy data cleaning it transforming it from one format into another extending it with web services and linking it to databases like Freebasehttpopenrefineorg
Nesstar Publisher is a
free advanced data management program It can be used for the preparation of data and metadata Its DDI complianthttpwwwnesstarcomsoftwarepublisherhtml
QualAnon DSDR
Qualitative Data Anonymizer
This free transcript anonymizationtool is designed solely to de-identify qualitative interview transcriptshttpswwwicpsrumicheduicpsrwebDSDRtoolsanonymizejsp
Colectica for Microsoft Excel
A free tool to document your spreadsheet data using the Data Documentation Initiative (DDI) metadata format the open standard for data documentationhttpwwwcolecticacomsoftwarecolecticaforexcel
Schematron is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees It is a structural schema language expressed in XML using a small number of elements and XPathhttpxmlasccnetresourceschematronschematronhtml
Altova XMLSpy is an advanced XML editor for modeling editing transforming and debugging XML-related
technologieshttpwwwaltovacomxmlspy
html
ltoXygengt XML
Editor is an XML tool that supports all the XML schema languages The XSLT and XQuery support is enhanced with powerful debuggers and performance profilers You can use ltoXygengt XML Editor to work with all XML-based technologies including XML databases XProcpipelines and web serviceshttpwwwoxygenxmlcom
LabTrove is a free blogging
platform specifically designed for use in a research environment It aims to serve as a highly flexible electronic notebook and data management system by integrating with a labrsquos data-producing instruments researchers can describe an experiment and associate it with its data output at the time of capture rather than annotating after the fact httpwwwlabtroveorg
Kepler is a scientific workflow
modeling and management system that enables users regardless of programming experience to set up data analysis pipelines The software will assemble execute and document theof services and scripts that scientists with large-scale data use to execute researchhttpskepler-projectorg
DataCiteThe DataCite Consortium
provides a number of
services to support
efforts at increasing the
ease and prevalence of
data citationhttpwwwdataciteorg
DMPTool is an online service to enable researchers to create data management plans now required by many funding agencies and to receive tailored institutional guidance to help them in the processhttpsdmpcdliborg
oSection II addresses data documentation more from the
researcherrsquos view
oSection III interprets data documentation more from
a curator or librarians perspective
oWhat do researchers really care about
oWill each party see the other sidersquos points and
emphases
Create edit share and save
data management plans
Open access scholarly publishing services
papers journals books seminars amp more
Curation repository store manage and share research data
Create and manage
persistent identifiers
Open source add-in for Microsoft
Excel as a data collection tool
An infrastructure to publish and get credit
for sharing research data
CDL Curation and Publishing Services
httpwwwcdliborg
This slide is by Joan Starr California Digital Library httpwwwslidesharenetjoanstarrdataset-metadata-tools-approaches-for-access-preservationfrom_search=1
Data Publication
httplibraryucfeduScholarlyCommunicationUCFResearchLifecyclepdfData Set Related Services
oldquoData Set (also called lsquoDatasetrsquo) Metadatardquo provides
researchers consultation on
oProject and dataset documentation
oMetadata standards (Common and Domain Specific)
oMetadata schemas customization
oControlled vocabularies and thesauri
oData curation tools and practices
oAssists in describing basic properties of your data and enriching
metadata for your datasets
oSupports applying controlled vocabularies or optimizing keywords
to enhance the search of your datasets
oHelps to prepare your metadata and data for deposit and
preservation
oScholarly Communication (httplibraryucfeduScholarlyCommunication)
oSC Contact Information (httplibraryucfeduScholarlyCommunicationContactphp)
oUCF Library Research Guides (httpguidesucfedu)
oMetadata Guide (httpguidesucfedumetadata)
oData Management Guide (httpguidesucfedudata)
oResearch and Information Services (httplibraryucfeduReference)
Abstract United States Geological Survey Saint Petersburg Florida Center for Coastal and Watershed
Studieshellip
Purpose These data and information are intended for science researchers studentshellip
Language eng USA
Citation
Title Biological data of field activity 08CRD01 (B-1-08-VI) in US Virgin Islands from 05302008 to 06132008
Date
Date 2013-03-03
Date Type Publication Date
Organisation Name US Geological Survey (USGS) lthttpwwwusgsgovgt Coastal and Marine Geology
(CMG) lthttpwalruswrusgsgovgt
Role Publisher
Contact Info hellip
Point Of Contact hellip
Representation Type Vector
Topic Category
Keyword Collection
Keyword EARTH SCIENCE gt OCEANS
Associated Thesaurus Global Change Master Directory (GCMD)
Keyword Marine Geology
Associated Thesaurus USGS CMG InfoBank
Spatial Extent
West Bounding Longitude -6575000
East Bounding Longitude -6325000
North Bounding Latitude 1875000
South Bounding Latitude 1725000
FGDCCSDGM
Metadata
Constraints Please recognize the US Geological Survey (USGS) as the source of this information Physical materials are under controlled on-site access Some USGS information accessed through this means may be preliminary in nature and presented without the approval of the Director of the USGShellip
Legal Constraints
Use Constraints Other Restrictions
Other Constraints Use Constraints Please recognize the US Geological Survey (USGS) as the source of this information Physical materials are under controlled on-site accesshellip
hellip
Distribution
Distribution Format
Format Name ASCII
Format Version
File Decompression Technique No compression applied
oXML schema brings documentation into a single document creates
structured content about the data and allows data interoperability and
sharing
oIt can document comprehensive variable level information such as basic
data dictionary question text and question routing instructions
oData Documentation Initiative (DDI) a metadata specification for the
social and behavioral sciences It is an XML metadata standard for
documenting numeric data Detailed information is available
at httpwwwddiallianceorg
oProjects using the DDI (httpwwwddiallianceorgddi-at-workprojects)
oDDI-compliant data repository
o ICPSR - Inter-university Consortium for Political and Social Research
o Data deposit form httpswwwicpsrumicheducgi-binddf2
o UCF is a member of ICPSR
oUKDA - UK Data Archive
Field Labels
TitlePrincipal investigator(s)
Summary
Access notes
Dataset(s)
httpwwwicpsrumicheduicpsrwebNA
CJDstudies20363archive=NACJDampq=22
university+of+central+florida22amppermit
5B05D=AVAILABLEampx=-999ampy=-84
ICPSR Interuniversity
Consortium for
Political and
Social Research
Dataset(s)
DSO Study-Level Files
Documentation
Questionnairepdf
User guidepdf
DS1 Female Interviews
Documentation
Codebookpdf
hellip
Field Labels
Study description
Citation
Funding
Scope of studybull Subject terms
bull Smallest
geographic unit
bull Geographic
coverage
bull Time period
bull Date of collection
bull Unit of
observation
bull Universe
bull Data types
bull Data collection
notes
Methodologybull Study purpose
bull Study design
Field Labels
bull Sample
bull Mode of data collection
bull Description of variables
bull Response rates
bull Presence of common
scales
bull Extent of processing
Field Labels
Version(s)
Related publications
Variables
Utilities
bull Metadata exports
bull Download statistics
Variables
List all 1682 variables in this study
egID QUESTIONNAIRE ID NUMBER ISEX INTERVIEWER GENDER START INTERVIEW START TIME HHMM USE 24 HR CLOCK Q1A COUNTRY OF BIRTH Q1B STATE OF BIRTH - INITIALS OF STATEQ1C CITY OF BIRTH WRITE IN NOT APPQ1D YEARS LIVED IN USAQ1E RESIDENCY STATUSCHECK1 CHECKPOINT 1 BORN IN SAME METRO AREAQ2 HOW LONG LIVED IN THIS AREA hellip (httpwwwicpsrumicheduicpsrwebNACJDssv
dstudies20363variables)
httpwwwicpsrumicheduicpsrwebICPSRddi2studies20363
docDscrThe Document
Description
consists of
bibliographic
information
describing the
DDI-compliant
document
itself as a
whole
Included Fields
citation
bull titleStmt
bull prodStmt
bull verStmt
bull holdings
Included FieldsCitation
titlStmt
rspStmt
prodStmt
fundAg
grantNo
distStmt
biblCit
Holdings
stdyInfoSubject
Abstract
sumDscr
MethoddataColl
Notes
anlyInfo
dataAccssetAvail
useStmt
stdyDscr The Study
Description consists of
information about the
data collection study
or compilation that the
DDI-compliant
documentation file
describes This section
includes information
about how the study
should be cited who
collected or compiled
the data who
distributes the data
keywords about the
content of the data
summary (abstract) of
the content of the data
data collection methods
and processing etc
Included Fields
fileDscr
fileTxt
fileName
fileDscr
Data Files
Description
Information about
the data file(s)
that comprises a
collection This
section can be
repeated for
collections with
multiple files
oContext and participant details of interviews can be
oA descriptive header or summary page in transcripts or
field notes
oA structured data list
oXML mark-up of data for example
oText Encoding Initiative (TEI) to mark up interview
transcript
oQualitative Data Exchange Format (QuDEx) for
researcher annotations and data linking
oAnonymisation of textual data (eg replacing real names of people
organizations and locations with pseudonyms)
oFile naming
oMeaningful short names identify file types (eg interviews focus groups
field notes audio recordings) avoid space special characters avoid long
names
oOrganizing files in folders Create uniform and structured folder names based
on cases studies locations data types etc or the original anonymized
coded or annotated versions of data
oVersion control Version numbering in file names
oDocumentation Methodology description project plan interview guidelines
consent form templates data analyses and manipulation
o Example is from A NESSTAR FOR QUALITATIVE DATA BUILDING BLOCKS FOR DIGITAL FUTURES By Corti Louise et al available at httpdata-archiveacukmedia376907digitalfutures_dashish_21nov2012pdf
oData List
Interview ID
x001
x002
hellip
Text File Name
6124int001
6124int002
hellip
oCreate and generate metadata for your research data and
datasets in your research lifecycle to preserve the data in the
long run
oConsider what information is needed for the data to be
read and interpreted in the future
oUnderstand your funder requirements for data
documentation and metadata Funder requirements for NSF
GBMF IMLS NEH NIH and NOAA can be found at
httpsdmptoolorgguidance
oConsult available metadata standards in your field You may
refer to Common Metadata Standards and Domain Specific
Metadata Standards for details
oDescribe data and datasets created in your research lifecycle and
use software programs and tools to assist in data documentation
Assign or capture administrative descriptive technical structural
and preservation metadata for the data Some potential information
to document
oDescriptive metadata
oName of creator of data set
oName of author of document
oTitle of document
oFile name
oLocation of file
oSize of file
oStructural metadata
oFile relationships (eg child parent)
oTechnical metadata
oFormat (eg text SPSS Stata Excel tiff mpeg 3D Java FITS CIF)
oCompression or encoding algorithms
oEncryption and decryption keys
oSoftware (including release number) used to create or update the data
oHardware on which the data were created
oOperating systems in which the data were created
oApplication software in which the data were created
oAdministrative metadata
o Information about data creation (eg date)
o Information about subsequent updates transformation versioning
summarization
oDescriptions of migration and replication
o Information about other events that have affected the files
oPreservation metadata
oFile format (eg txt pdf doc rtf xls xml spv jpg fits)
oSignificant properties
oTechnical environment
oFixity information
oAdopt a thesauri in your field if applicable or compile a data dictionary for
your dataset
oObtain persistent identifiers (eg doi purl) for datasets if possible to ensure
data can be found in the future
oFor your full data management plan visit UCF Libraries Data Management
Guide Also refer to Digital Curation Centrersquos Checklist for a Data
Management Plan (httpwwwdccacuksitesdefaultfilesdocumentsresourceDMP_Checklist_2013pdf)
oCommon Metadata Standards
oDisciplinary Metadata Standards
oActivity Choose a dataset or a standard in your field to examine and critique
oSocial Science Dataset
oHumanities Dataset
oBiological Sciences Dataset
oBiotechnology Dataset
oGeospatial Dataset
oEarth Science Dataset
oPhysical Science Dataset
oOtherhellip
oDublin Core (DC) A general metadata standard for describing a wide range of
digital resources
o Dublin Core Metadata Element Set Version 11
(httpdublincoreorgdocumentsdces)
o 15 Elements Title Creator Subject or keyword Description Publisher Type Format
Identifier Source Language Relation Coverage Rights
o DCMI Metadata Terms (httpdublincoreorgdocumentsdcmi-terms)
o DC Qualifiers (httpdublincoreorgdocumentsusageguidequalifiersshtml)
o Encoded Archival Description (EAD)
o A standard for encoding archival finding aids with XML
oGovernment Information Locator Service (GILS)
o The Global Information Locator Service defines a core element set for government
information so that it can be more searchable and discoverable by the general public
oONIX for Books (ONline Information eXchange)
o An international standard for representing and communicating book industry product
oMARC Code List for Countries httpwwwlocgovmarccountries
oMARC Code List for Languages httpwwwlocgovmarclanguages
oMARC Source Codes for Vocabularies Rules and Schemes
httpwwwlocgovmarcsourcecodeformformsourcehtml
oFor digital and online resources
oInternet Media Types wwwianaorgassignmentsmedia-
typesindexhtml
oMODS Note Types httpwwwlocgovstandardsmodsmods-
noteshtml
oDCMI Type Vocabulary httpdublincoreorgdocumentsdcmi-
termsindexshtmlH7
o Subject Thesauri and Ontologies
o AGROVOC (Agricultural Organization of the United Nations Vocabulary)
o Astronomy Thesaurus
o CAB Thesaurus (for life sciences technology and social sciences)
o CIF dictionaries (for Physics)
o Eurovoc (European Union Thesaurus)
o Ethnographic Thesaurus
o Gene Ontology
o GeoNames
o Getty Institute Art and Architecture Thesaurus Online
o Getty Institute Thesaurus of Geographic Names
o ICD (International Classification of Diseases)
o Library of Congress Authorities for subject headings
o Library of Congress Thesaurus for Graphic Materials
o Logical Observation Identifiers Names and Codes (LOINC)
o MESH (Medical Subject Headings)
o Public Health Language
o Rare Books and Manuscripts Section (RBMS) Controlled Vocabularies
o RxNorm (for drugs)
o SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms)
o STW Thesaurus for Economics
o UNBIS Thesaurus
o UNESCO Thesaurus
o USDA National Agricultural Library Agriculture Thesaurus
Question: Have you ever used thesauri in your study and research?
Getty Union List of Artist Names (ULAN)
The ULAN includes proper names and associated information about artists. Artists may be either individuals (persons) or groups of individuals working together (corporate bodies). Artists in the ULAN generally represent creators involved in the conception or production of visual arts and architecture.
Library of Congress Name Authority File (LCNAF)
The LCNAF provides authoritative data for names of persons, organizations, events, places, and titles.
Virtual International Authority File (VIAF)
The VIAF™ (Virtual International Authority File) combines multiple name authority files into a single OCLC-hosted name authority service. The goal of the service is to lower the cost and increase the utility of library authority files by matching and linking widely-used authority files and making that information available on the Web.
Web Ontology Language (OWL)
The OWL 2 Web Ontology Language is an ontology language for the Semantic Web with formally defined meaning. OWL 2 ontologies provide classes, properties, individuals, and data values, and are stored as Semantic Web documents. OWL 2 ontologies can be used along with information written in RDF, and OWL 2 ontologies themselves are primarily exchanged as RDF documents.
MADS/RDF
The Metadata Authority Description Schema (MADS) is an XML schema for an element set that may be used to provide metadata about authorized forms of agents (people, organizations), events, and terms (topics, geographics, genres, etc.). MADS/RDF builds on MADS/XML as a knowledge organization system.
Resource Description Framework (RDF)
RDF is a standard model for data interchange on the Web. RDF extends the linking structure of the Web to use URIs to name the relationship between things as well as the two ends of the link (this is usually referred to as a "triple"). Using this simple model, it allows structured and semi-structured data to be mixed, exposed, and shared across different applications.
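The triple model can be illustrated with a small sketch in plain Python, without an RDF library. The `Triple` type, the helper functions, the `http://example.org/` namespace, and the sample values are assumptions made for illustration; the Dublin Core Terms and FOAF property URIs are real vocabularies.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate (the named link), object."""
    subject: str
    predicate: str
    obj: str


DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/"  # placeholder namespace for invented resources

# Literal values are written with surrounding double quotes; anything
# else is treated as a URI. This convention is specific to this sketch.
triples = [
    Triple(EX + "dataset/1", DCTERMS + "title", '"Field survey data"'),
    Triple(EX + "dataset/1", DCTERMS + "creator", EX + "person/42"),
    Triple(EX + "person/42", FOAF + "name", '"A. Researcher"'),
]


def to_ntriples(statements):
    """Serialize the statements in an N-Triples-like line format."""
    lines = []
    for t in statements:
        obj = t.obj if t.obj.startswith('"') else f"<{t.obj}>"
        lines.append(f"<{t.subject}> <{t.predicate}> {obj} .")
    return "\n".join(lines)


def objects(statements, subject, predicate):
    """Follow one named relationship from a subject."""
    return [t.obj for t in statements
            if t.subject == subject and t.predicate == predicate]


print(to_ntriples(triples))
```

Because both ends of a link and the link itself are named by URIs, the second and third statements can come from different sources and still join on `http://example.org/person/42`.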
SKOS: Simple Knowledge Organization for the Web
SKOS is a W3C recommendation designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary.
Linked data examples
• FAST: Faceted Application of Subject Terminology
• Dewey Decimal Classification
• Open Metadata Registry (RDA vocabularies)
• Library of Congress Linked Data Service
…
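As a rough sketch of what a structured controlled vocabulary looks like as data, the Python below models a few concepts with the ideas behind SKOS preferred labels, alternative labels, and broader links. The `Concept` class, the helper functions, and the three sample concepts are invented for illustration and are not taken from any published thesaurus.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Concept:
    pref_label: str                                       # like skos:prefLabel
    alt_labels: List[str] = field(default_factory=list)   # like skos:altLabel
    broader: Optional[str] = None                         # key of the broader concept


# Hypothetical mini-vocabulary keyed by a local identifier.
VOCABULARY = {
    "earth-science": Concept("Earth science", ["Geoscience"]),
    "oceans": Concept("Oceans", ["Seas"], broader="earth-science"),
    "marine-geology": Concept("Marine geology", ["Submarine geology"],
                              broader="oceans"),
}


def find_concept(term):
    """Map a free-text keyword to a concept key, ignoring case."""
    wanted = term.strip().lower()
    for key, concept in VOCABULARY.items():
        labels = [concept.pref_label] + concept.alt_labels
        if wanted in (label.lower() for label in labels):
            return key
    return None


def broader_chain(key):
    """List preferred labels from a concept up to its top concept."""
    labels = []
    while key is not None:
        concept = VOCABULARY[key]
        labels.append(concept.pref_label)
        key = concept.broader
    return labels


print(find_concept("seas"))
print(broader_chain("marine-geology"))
```

Mapping variant keywords to one preferred label is what lets a search for either form retrieve the same datasets.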
OpenRefine (ex-Google Refine) is a powerful tool for working with messy data: cleaning it, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase. http://openrefine.org
Nesstar Publisher is a free advanced data management program. It can be used for the preparation of data and metadata. It is DDI compliant. http://www.nesstar.com/software/publisher.html
QualAnon: DSDR Qualitative Data Anonymizer. This free transcript anonymization tool is designed solely to de-identify qualitative interview transcripts. https://www.icpsr.umich.edu/icpsrweb/DSDR/tools/anonymize.jsp
Colectica for Microsoft Excel: a free tool to document your spreadsheet data using the Data Documentation Initiative (DDI) metadata format, the open standard for data documentation. http://www.colectica.com/software/colecticaforexcel
Schematron is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees. It is a structural schema language expressed in XML using a small number of elements and XPath. http://xml.ascc.net/resource/schematron/schematron.html
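The idea of rule-based assertions about patterns in an XML tree can be imitated in a few lines of Python. This is not Schematron itself: the rule tuples, the `validate` function, and the sample document are assumptions made for illustration, and the standard library supports only a limited subset of XPath.

```python
import xml.etree.ElementTree as ET

# Each rule: (context path, child path that must be present, message).
# The paths use the limited XPath subset supported by ElementTree.
RULES = [
    ("./record", "title", "A record must have a title"),
    ("./record", "creator", "A record must have a creator"),
]


def validate(xml_text, rules=RULES):
    """Return one message per context node that fails an assertion."""
    root = ET.fromstring(xml_text)
    failures = []
    for context, required, message in rules:
        for node in root.findall(context):
            if node.find(required) is None:
                failures.append(f"{message} (id={node.get('id')})")
    return failures


# Invented sample document; the second record lacks a creator.
sample = """<records>
  <record id="r1"><title>Survey</title><creator>Doe</creator></record>
  <record id="r2"><title>Interviews</title></record>
</records>"""

for failure in validate(sample):
    print(failure)
```

Keeping the rules as data rather than code is the part of the design that resembles Schematron: new checks can be added without touching the validation loop.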
Altova XMLSpy is an advanced XML editor for modeling, editing, transforming, and debugging XML-related technologies. http://www.altova.com/xmlspy.html
<oXygen/> XML Editor is an XML tool that supports all the XML schema languages. The XSLT and XQuery support is enhanced with powerful debuggers and performance profilers. You can use <oXygen/> XML Editor to work with all XML-based technologies, including XML databases, XProc pipelines, and web services. http://www.oxygenxml.com
LabTrove is a free blogging platform specifically designed for use in a research environment. It aims to serve as a highly flexible electronic notebook and data management system: by integrating with a lab's data-producing instruments, researchers can describe an experiment and associate it with its data output at the time of capture rather than annotating after the fact. http://www.labtrove.org
Kepler is a scientific workflow modeling and management system that enables users, regardless of programming experience, to set up data analysis pipelines. The software will assemble, execute, and document the […] of services and scripts that scientists with large-scale data use to execute research. https://kepler-project.org
DataCite: The DataCite Consortium provides a number of services to support efforts at increasing the ease and prevalence of data citation. http://www.datacite.org
DMPTool is an online service to enable researchers to create data management plans, now required by many funding agencies, and to receive tailored institutional guidance to help them in the process. https://dmp.cdlib.org
o Section II addresses data documentation more from the researcher's view
o Section III interprets data documentation more from a curator's or librarian's perspective
o What do researchers really care about?
o Will each party see the other side's points and emphases?
CDL Curation and Publishing Services (http://www.cdlib.org)
o Create, edit, share, and save data management plans
o Open access scholarly publishing services: papers, journals, books, seminars & more
o Curation repository: store, manage, and share research data
o Create and manage persistent identifiers
o Open source add-in for Microsoft Excel as a data collection tool
o An infrastructure to publish and get credit for sharing research data
This slide is by Joan Starr, California Digital Library: http://www.slideshare.net/joanstarr/dataset-metadata-tools-approaches-for-access-preservation?from_search=1
Data Publication
http://library.ucf.edu/ScholarlyCommunication/UCFResearchLifecycle.pdf
Data Set Related Services
o "Data Set (also called 'Dataset') Metadata" provides researchers consultation on:
  o Project and dataset documentation
  o Metadata standards (Common and Domain Specific)
  o Metadata schemas customization
  o Controlled vocabularies and thesauri
  o Data curation tools and practices
o Assists in describing basic properties of your data and enriching metadata for your datasets
o Supports applying controlled vocabularies or optimizing keywords to enhance the search of your datasets
o Helps to prepare your metadata and data for deposit and preservation
o Scholarly Communication (http://library.ucf.edu/ScholarlyCommunication)
o SC Contact Information (http://library.ucf.edu/ScholarlyCommunication/Contact.php)
o UCF Library Research Guides (http://guides.ucf.edu)
o Metadata Guide (http://guides.ucf.edu/metadata)
o Data Management Guide (http://guides.ucf.edu/data)
o Research and Information Services (http://library.ucf.edu/Reference)
FGDC/CSDGM Metadata
Abstract: United States Geological Survey, Saint Petersburg, Florida, Center for Coastal and Watershed Studies…
Purpose: These data and information are intended for science researchers, students…
Language: eng; USA
Citation
  Title: Biological data of field activity 08CRD01 (B-1-08-VI) in US Virgin Islands from 05/30/2008 to 06/13/2008
  Date
    Date: 2013-03-03
    Date Type: Publication Date
  Organisation Name: US Geological Survey (USGS) <http://www.usgs.gov>, Coastal and Marine Geology (CMG) <http://walrus.wr.usgs.gov>
  Role: Publisher
  Contact Info: …
Point Of Contact: …
Representation Type: Vector
Topic Category
Keyword Collection
  Keyword: EARTH SCIENCE > OCEANS
  Associated Thesaurus: Global Change Master Directory (GCMD)
  Keyword: Marine Geology
  Associated Thesaurus: USGS CMG InfoBank
Spatial Extent
  West Bounding Longitude: -65.75000
  East Bounding Longitude: -63.25000
  North Bounding Latitude: 18.75000
  South Bounding Latitude: 17.25000
Constraints: Please recognize the US Geological Survey (USGS) as the source of this information. Physical materials are under controlled on-site access. Some USGS information accessed through this means may be preliminary in nature and presented without the approval of the Director of the USGS…
Legal Constraints
  Use Constraints: Other Restrictions
  Other Constraints: Use Constraints: Please recognize the US Geological Survey (USGS) as the source of this information. Physical materials are under controlled on-site access…
…
Distribution
  Distribution Format
    Format Name: ASCII
    Format Version:
    File Decompression Technique: No compression applied
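The spatial extent in a record like this one can be checked mechanically. The sketch below is a minimal Python example that reads the four bounding coordinates as decimal degrees (for example, -65.75 for the west longitude); that reading, the `BoundingBox` class, and the test point are assumptions made for illustration, and the containment check assumes the box does not cross the antimeridian.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Bounding coordinates in decimal degrees."""
    west: float
    east: float
    north: float
    south: float

    def problems(self):
        """Return a list of plausibility problems (empty if none)."""
        issues = []
        for name in ("west", "east"):
            if not -180.0 <= getattr(self, name) <= 180.0:
                issues.append(f"{name} longitude out of range")
        for name in ("north", "south"):
            if not -90.0 <= getattr(self, name) <= 90.0:
                issues.append(f"{name} latitude out of range")
        if self.south > self.north:
            issues.append("south is above north")
        return issues

    def contains(self, lon, lat):
        """True if the point lies inside the box (no antimeridian crossing)."""
        return (self.west <= lon <= self.east
                and self.south <= lat <= self.north)


# Bounding coordinates from the record, read as decimal degrees.
extent = BoundingBox(west=-65.75, east=-63.25, north=18.75, south=17.25)

print(extent.problems())
print(extent.contains(-64.75, 17.75))  # hypothetical point of interest
```

A check of this kind catches swapped or mis-signed coordinates before a record is deposited.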
o An XML schema brings documentation into a single document, creates structured content about the data, and allows data interoperability and sharing
o It can document comprehensive variable-level information, such as a basic data dictionary, question text, and question routing instructions
o Data Documentation Initiative (DDI): a metadata specification for the social and behavioral sciences. It is an XML metadata standard for documenting numeric data. Detailed information is available at http://www.ddialliance.org
o Projects using the DDI (http://www.ddialliance.org/ddi-at-work/projects)
o DDI-compliant data repositories
  o ICPSR - Inter-university Consortium for Political and Social Research
    o Data deposit form: https://www.icpsr.umich.edu/cgi-bin/ddf2
    o UCF is a member of ICPSR
  o UKDA - UK Data Archive
ICPSR: Inter-university Consortium for Political and Social Research
http://www.icpsr.umich.edu/icpsrweb/NACJD/studies/20363?archive=NACJD&q=%22university+of+central+florida%22&permit%5B0%5D=AVAILABLE&x=-999&y=-84
Field Labels
o Title
o Principal investigator(s)
o Summary
o Access notes
o Dataset(s)
  o DS0: Study-Level Files
    o Documentation: Questionnaire.pdf, User guide.pdf
  o DS1: Female Interviews
    o Documentation: Codebook.pdf
  o …
o Study description
  o Citation
  o Funding
  o Scope of study
    • Subject terms
    • Smallest geographic unit
    • Geographic coverage
    • Time period
    • Date of collection
    • Unit of observation
    • Universe
    • Data types
    • Data collection notes
  o Methodology
    • Study purpose
    • Study design
    • Sample
    • Mode of data collection
    • Description of variables
    • Response rates
    • Presence of common scales
    • Extent of processing
o Version(s)
o Related publications
o Variables
o Utilities
  • Metadata exports
  • Download statistics
Variables
List all 1682 variables in this study, e.g.:
  ID: QUESTIONNAIRE ID NUMBER
  ISEX: INTERVIEWER GENDER
  START: INTERVIEW START TIME HHMM USE 24 HR CLOCK
  Q1A: COUNTRY OF BIRTH
  Q1B: STATE OF BIRTH - INITIALS OF STATE
  Q1C: CITY OF BIRTH WRITE IN NOT APP
  Q1D: YEARS LIVED IN USA
  Q1E: RESIDENCY STATUS
  CHECK1: CHECKPOINT 1 BORN IN SAME METRO AREA
  Q2: HOW LONG LIVED IN THIS AREA
  …
(http://www.icpsr.umich.edu/icpsrweb/NACJD/ssvd/studies/20363/variables)
http://www.icpsr.umich.edu/icpsrweb/ICPSR/ddi2/studies/20363
docDscr
The Document Description consists of bibliographic information describing the DDI-compliant document itself as a whole.
Included Fields
o citation
  • titlStmt
  • prodStmt
  • verStmt
  • holdings
stdyDscr
The Study Description consists of information about the data collection, study, or compilation that the DDI-compliant documentation file describes. This section includes information about how the study should be cited, who collected or compiled the data, who distributes the data, keywords about the content of the data, summary (abstract) of the content of the data, data collection methods and processing, etc.
Included Fields
o Citation
  • titlStmt
  • rspStmt
  • prodStmt
  • fundAg
  • grantNo
  • distStmt
  • biblCit
  • Holdings
o stdyInfo
  • Subject
  • Abstract
  • sumDscr
o Method
  • dataColl
  • Notes
  • anlyInfo
o dataAccs
  • setAvail
  • useStmt
fileDscr
Data Files Description: information about the data file(s) that comprise a collection. This section can be repeated for collections with multiple files.
Included Fields
o fileDscr
  • fileTxt
  • fileName
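The three sections can be pictured as a skeleton document. The Python sketch below nests a few of the elements named above (`docDscr`, `stdyDscr`, `fileDscr`, `citation`, `titlStmt`, `fileTxt`, `fileName`) under a `codeBook` root with `titl` for the titles, following DDI Codebook naming. It is a simplified assumption: it omits namespaces and required attributes, it has not been validated against the DDI schema, and the helper names and sample values are invented.

```python
import xml.etree.ElementTree as ET


def add_path(parent, path, text=None):
    """Create nested child elements along `path` and return the last one."""
    node = parent
    for tag in path.split("/"):
        node = ET.SubElement(node, tag)
    if text is not None:
        node.text = text
    return node


def build_codebook(doc_title, study_title, file_names):
    """Assemble a minimal codebook skeleton with the three sections."""
    codebook = ET.Element("codeBook")
    # docDscr: describes the documentation file itself.
    add_path(codebook, "docDscr/citation/titlStmt/titl", doc_title)
    # stdyDscr: describes the study the documentation is about.
    add_path(codebook, "stdyDscr/citation/titlStmt/titl", study_title)
    # fileDscr: repeated once per data file in the collection.
    for name in file_names:
        add_path(codebook, "fileDscr/fileTxt/fileName", name)
    return codebook


# Invented sample values, for illustration only.
codebook = build_codebook(
    "Codebook for an example interview study",
    "Example interview study",
    ["female_interviews.csv", "male_interviews.csv"],
)
print(ET.tostring(codebook, encoding="unicode"))
```

Repeating `fileDscr` per file mirrors the note above that the section can be repeated for collections with multiple files.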
o Context and participant details of interviews can be documented in:
  o A descriptive header or summary page in transcripts or field notes
  o A structured data list
  o XML mark-up of data, for example:
    o Text Encoding Initiative (TEI) to mark up interview transcripts
    o Qualitative Data Exchange Format (QuDEx) for researcher annotations and data linking
o Anonymisation of textual data (e.g., replacing real names of people, organizations, and locations with pseudonyms)
o File naming
  o Meaningful short names: identify file types (e.g., interviews, focus groups, field notes, audio recordings); avoid spaces and special characters; avoid long names
o Organizing files in folders: create uniform and structured folder names based on cases, studies, locations, data types, etc., or the original, anonymized, coded, or annotated versions of data
o Version control: version numbering in file names
o Documentation: methodology description, project plan, interview guidelines, consent form templates, data analyses and manipulation
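Two of the practices above, replacing real names with pseudonyms and building short, versioned file names, can be sketched in Python as follows. The function names, the `_v02` version suffix, the replacement table, and the sample sentence are assumptions made for illustration; simple find-and-replace is not a complete de-identification method and still needs manual review.

```python
import re


def pseudonymize(text, replacements):
    """Replace each listed real name with its pseudonym (whole words only)."""
    for real, alias in replacements.items():
        text = re.sub(rf"\b{re.escape(real)}\b", alias, text)
    return text


def build_file_name(study_id, kind, number, version):
    """Build a short name such as 6124int001_v02.

    `kind` identifies the file type (for example "int" for interview).
    The version suffix format is an assumption of this sketch.
    """
    name = f"{study_id}{kind}{number:03d}_v{version:02d}"
    # Reject spaces and special characters.
    if not re.fullmatch(r"[A-Za-z0-9_-]+", name):
        raise ValueError(f"invalid characters in file name: {name!r}")
    return name


# Invented sample transcript line and replacement table.
table = {
    "Maria": "Person A",
    "Lopez": "Person B",
    "Acme Corp": "Organization 1",
    "Orlando": "City 1",
}
line = "Maria met Dr. Lopez at Acme Corp in Orlando."

print(pseudonymize(line, table))
print(build_file_name(6124, "int", 1, 2))
```

Keeping the replacement table in a separate, access-controlled file lets the original names be recovered by authorized staff if that is ever required.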
o Example is from "A Nesstar for Qualitative Data: Building Blocks for Digital Futures" by Corti, Louise, et al., available at http://data-archive.ac.uk/media/376907/digitalfutures_dashish_21nov2012.pdf
o Data List
  Interview ID    Text File Name
  x001            6124int001
  x002            6124int002
  …               …
o Create and generate metadata for your research data and datasets in your research lifecycle to preserve the data in the long run
  o Consider what information is needed for the data to be read and interpreted in the future
o Understand your funder requirements for data documentation and metadata. Funder requirements for NSF, GBMF, IMLS, NEH, NIH, and NOAA can be found at https://dmptool.org/guidance
o Consult available metadata standards in your field. You may refer to Common Metadata Standards and Domain Specific Metadata Standards for details
o Describe data and datasets created in your research lifecycle, and use software programs and tools to assist in data documentation. Assign or capture administrative, descriptive, technical, structural, and preservation metadata for the data. Some potential information to document:
  o Descriptive metadata
    o Name of creator of data set
    o Name of author of document
    o Title of document
    o File name
    o Location of file
    o Size of file
  o Structural metadata
    o File relationships (e.g., child, parent)
  o Technical metadata
    o Format (e.g., text, SPSS, Stata, Excel, tiff, mpeg, 3D, Java, FITS, CIF)
    o Compression or encoding algorithms
    o Encryption and decryption keys
    o Software (including release number) used to create or update the data
    o Hardware on which the data were created
    o Operating systems in which the data were created
    o Application software in which the data were created
  o Administrative metadata
    o Information about data creation (e.g., date)
    o Information about subsequent updates, transformation, versioning, summarization
    o Descriptions of migration and replication
    o Information about other events that have affected the files
  o Preservation metadata
    o File format (e.g., txt, pdf, doc, rtf, xls, xml, spv, jpg, fits)
    o Significant properties
    o Technical environment
    o Fixity information
o Adopt a thesaurus in your field, if applicable, or compile a data dictionary for your dataset
o Obtain persistent identifiers (e.g., DOI, PURL) for datasets, if possible, to ensure data can be found in the future
o For your full data management plan, visit UCF Libraries Data Management Guide. Also refer to Digital Curation Centre's Checklist for a Data Management Plan (http://www.dcc.ac.uk/sites/default/files/documents/resource/DMP_Checklist_2013.pdf)
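Some of the file-level information listed above can be captured automatically. The following Python sketch records a file's name, location, size, format (from its extension), last-modified date, and a SHA-256 checksum as fixity information. The `describe_file` function, the field names, and the sample file are assumptions made for illustration; the creator name still has to be supplied by a person.

```python
import hashlib
import os
import tempfile
from datetime import datetime, timezone


def describe_file(path, creator):
    """Collect basic descriptive, technical, and fixity metadata for a file."""
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large data files do not exhaust memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            sha256.update(chunk)
    stat = os.stat(path)
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return {
        "creator": creator,
        "file_name": os.path.basename(path),
        "location": os.path.dirname(os.path.abspath(path)),
        "size_bytes": stat.st_size,
        "format": os.path.splitext(path)[1].lstrip(".").lower(),
        "modified": modified.isoformat(),
        "fixity_sha256": sha256.hexdigest(),
    }


# Demonstration on a small temporary file with invented content.
sample_bytes = b"id,value\n1,2\n"
with tempfile.TemporaryDirectory() as folder:
    sample_path = os.path.join(folder, "6124int001.csv")
    with open(sample_path, "wb") as out:
        out.write(sample_bytes)
    record = describe_file(sample_path, creator="A. Researcher")

for key, value in record.items():
    print(f"{key}: {value}")
```

Recomputing the checksum later and comparing it with the stored value shows whether the file has changed since it was documented.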
o Common Metadata Standards
o Disciplinary Metadata Standards
o Activity: Choose a dataset or a standard in your field to examine and critique
  o Social Science Dataset
  o Humanities Dataset
  o Biological Sciences Dataset
  o Biotechnology Dataset
  o Geospatial Dataset
  o Earth Science Dataset
  o Physical Science Dataset
  o Other…
o Dublin Core (DC): A general metadata standard for describing a wide range of digital resources
  o Dublin Core Metadata Element Set, Version 1.1 (http://dublincore.org/documents/dces/)
o 15 Elements Title Creator Subject or keyword Description Publisher Type Format
Identifier Source Language Relation Coverage Rights
o DCMI Metadata Terms (httpdublincoreorgdocumentsdcmi-terms)
o DC Qualifiers (httpdublincoreorgdocumentsusageguidequalifiersshtml)
o Encoded Archival Description (EAD)
o A standard for encoding archival finding aids with XML
oGovernment Information Locator Service (GILS)
o The Global Information Locator Service defines a core element set for government
information so that it can be more searchable and discoverable by the general public
oONIX for Books (ONline Information eXchange)
o An international standard for representing and communicating book industry product
oMARC Code List for Countries httpwwwlocgovmarccountries
oMARC Code List for Languages httpwwwlocgovmarclanguages
oMARC Source Codes for Vocabularies Rules and Schemes
httpwwwlocgovmarcsourcecodeformformsourcehtml
oFor digital and online resources
oInternet Media Types wwwianaorgassignmentsmedia-
typesindexhtml
oMODS Note Types httpwwwlocgovstandardsmodsmods-
noteshtml
oDCMI Type Vocabulary httpdublincoreorgdocumentsdcmi-
termsindexshtmlH7
o Subject Thesauri and Ontologies
o AGROVOC (Agricultural Organization of the United Nations Vocabulary)
o Astronomy Thesaurus
o CAB Thesaurus (for life sciences technology and social sciences)
o CIF dictionaries (for Physics)
o Eurovoc (European Union Thesaurus)
o Ethnographic Thesaurus
o Gene Ontology
o GeoNames
o Getty Institute Art and Architecture Thesaurus Online
o Getty Institute Thesaurus of Geographic Names
o ICD (International Classification of Diseases)
o Library of Congress Authorities for subject headings
o Library of Congress Thesaurus for Graphic Materials
o Logical Observation Identifiers Names and Codes (LOINC)
o MESH (Medical Subject Headings)
o Public Health Language
o Rare Books and Manuscripts Section (RBMS) Controlled Vocabularies
o RxNorm (for drugs)
o SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms)
o STW Thesaurus for Economics
o UNBIS Thesaurus
o UNESCO Thesaurus
o USDA National Agricultural Library Agriculture Thesaurus
Question Have you ever
used thesauri in your study
and research
Getty Union List of Artist Names
(ULAN)The ULAN includes proper names and
associated information about artists
Artists may be either individuals
(persons) or groups of individuals working
together (corporate bodies) Artists in
the ULAN generally represent creators
involved in the conception or production
of visual arts and architecture
Library of Congress Name
Authority File (LCNAF)
The LCNAF provides authoritative
data for names of persons
organizations events places and
titles
Virtual International
Authority File (VIAF)
The VIAFtrade (Virtual International
Authority File) combines multiple
name authority files into a single
OCLC-hosted name authority
service The goal of the service is to
lower the cost and increase the
utility of library authority files by
matching and linking widely-used
authority files and making that
information available on the Web
Web Ontology Language
(OWL)The OWL 2 Web Ontology Language is an
ontology language for the Semantic Web
with formally defined meaning OWL 2
ontologies provide classes properties
individuals and data values and are stored
as Semantic Web documents OWL 2
ontologies can be used along with
information written in RDF and OWL 2
ontologies themselves are primarily
exchanged as RDF documents
MADSRDFThe Metadata Authority Description
Schema (MADS) is an XML schema for an
element set that may be used to provide
metadata about authorized forms of
agents (people organizations) events
and terms (topics geographics genres
etc) MADSRDF
builds on MADSXML as a knowledge
organization system
Resource Description
Framework (RDF)RDF is a standard model for data
interchange on the Web RDF extends
the linking structure of the Web to use
URIs to name the relationship
between things as well as the two
ends of the link (this is usually
referred to as a ldquotriplerdquo) Using this
simple model it allows structured and
semi-structured data to be mixed
exposed and shared across different
applications
SKOS Simple Knowledge
Organization for the Web SKOS is a W3C recommendation
designed for representation of
thesauri classification
schemes taxonomies subject-
heading systems or any other
type of structured controlled
vocabularyLinked data
examplesbull FAST Faceted
Application of
Subject
Terminology
bull Dewey Decimal
Classification
bull Open Metadata
Registry (RDA
vocabularies)
bull Library of Congress
Linked Data
Service
hellip
OpenRefine (ex-Google Refine) is a powerful tool for working with messy data cleaning it transforming it from one format into another extending it with web services and linking it to databases like Freebasehttpopenrefineorg
Nesstar Publisher is a
free advanced data management program It can be used for the preparation of data and metadata Its DDI complianthttpwwwnesstarcomsoftwarepublisherhtml
QualAnon DSDR
Qualitative Data Anonymizer
This free transcript anonymizationtool is designed solely to de-identify qualitative interview transcriptshttpswwwicpsrumicheduicpsrwebDSDRtoolsanonymizejsp
Colectica for Microsoft Excel
A free tool to document your spreadsheet data using the Data Documentation Initiative (DDI) metadata format the open standard for data documentationhttpwwwcolecticacomsoftwarecolecticaforexcel
Schematron is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees It is a structural schema language expressed in XML using a small number of elements and XPathhttpxmlasccnetresourceschematronschematronhtml
Altova XMLSpy is an advanced XML editor for modeling editing transforming and debugging XML-related
technologieshttpwwwaltovacomxmlspy
html
ltoXygengt XML
Editor is an XML tool that supports all the XML schema languages The XSLT and XQuery support is enhanced with powerful debuggers and performance profilers You can use ltoXygengt XML Editor to work with all XML-based technologies including XML databases XProcpipelines and web serviceshttpwwwoxygenxmlcom
LabTrove is a free blogging
platform specifically designed for use in a research environment It aims to serve as a highly flexible electronic notebook and data management system by integrating with a labrsquos data-producing instruments researchers can describe an experiment and associate it with its data output at the time of capture rather than annotating after the fact httpwwwlabtroveorg
Kepler is a scientific workflow
modeling and management system that enables users regardless of programming experience to set up data analysis pipelines The software will assemble execute and document theof services and scripts that scientists with large-scale data use to execute researchhttpskepler-projectorg
DataCiteThe DataCite Consortium
provides a number of
services to support
efforts at increasing the
ease and prevalence of
data citationhttpwwwdataciteorg
DMPTool is an online service to enable researchers to create data management plans now required by many funding agencies and to receive tailored institutional guidance to help them in the processhttpsdmpcdliborg
oSection II addresses data documentation more from the
researcherrsquos view
oSection III interprets data documentation more from
a curator or librarians perspective
oWhat do researchers really care about
oWill each party see the other sidersquos points and
emphases
Create edit share and save
data management plans
Open access scholarly publishing services
papers journals books seminars amp more
Curation repository store manage and share research data
Create and manage
persistent identifiers
Open source add-in for Microsoft
Excel as a data collection tool
An infrastructure to publish and get credit
for sharing research data
CDL Curation and Publishing Services
httpwwwcdliborg
This slide is by Joan Starr California Digital Library httpwwwslidesharenetjoanstarrdataset-metadata-tools-approaches-for-access-preservationfrom_search=1
Data Publication
httplibraryucfeduScholarlyCommunicationUCFResearchLifecyclepdfData Set Related Services
oldquoData Set (also called lsquoDatasetrsquo) Metadatardquo provides
researchers consultation on
oProject and dataset documentation
oMetadata standards (Common and Domain Specific)
oMetadata schemas customization
oControlled vocabularies and thesauri
oData curation tools and practices
oAssists in describing basic properties of your data and enriching
metadata for your datasets
oSupports applying controlled vocabularies or optimizing keywords
to enhance the search of your datasets
oHelps to prepare your metadata and data for deposit and
preservation
oScholarly Communication (httplibraryucfeduScholarlyCommunication)
oSC Contact Information (httplibraryucfeduScholarlyCommunicationContactphp)
oUCF Library Research Guides (httpguidesucfedu)
oMetadata Guide (httpguidesucfedumetadata)
oData Management Guide (httpguidesucfedudata)
oResearch and Information Services (httplibraryucfeduReference)
Abstract United States Geological Survey Saint Petersburg Florida Center for Coastal and Watershed
Studieshellip
Purpose These data and information are intended for science researchers studentshellip
Language eng USA
Citation
Title Biological data of field activity 08CRD01 (B-1-08-VI) in US Virgin Islands from 05302008 to 06132008
Date
Date 2013-03-03
Date Type Publication Date
Organisation Name US Geological Survey (USGS) lthttpwwwusgsgovgt Coastal and Marine Geology
(CMG) lthttpwalruswrusgsgovgt
Role Publisher
Contact Info hellip
Point Of Contact hellip
Representation Type Vector
Topic Category
Keyword Collection
Keyword EARTH SCIENCE gt OCEANS
Associated Thesaurus Global Change Master Directory (GCMD)
Keyword Marine Geology
Associated Thesaurus USGS CMG InfoBank
Spatial Extent
West Bounding Longitude -6575000
East Bounding Longitude -6325000
North Bounding Latitude 1875000
South Bounding Latitude 1725000
FGDCCSDGM
Metadata
Constraints Please recognize the US Geological Survey (USGS) as the source of this information Physical materials are under controlled on-site access Some USGS information accessed through this means may be preliminary in nature and presented without the approval of the Director of the USGShellip
Legal Constraints
Use Constraints Other Restrictions
Other Constraints Use Constraints Please recognize the US Geological Survey (USGS) as the source of this information Physical materials are under controlled on-site accesshellip
hellip
Distribution
Distribution Format
Format Name ASCII
Format Version
File Decompression Technique No compression applied
oXML schema brings documentation into a single document creates
structured content about the data and allows data interoperability and
sharing
oIt can document comprehensive variable level information such as basic
data dictionary question text and question routing instructions
oData Documentation Initiative (DDI) a metadata specification for the
social and behavioral sciences It is an XML metadata standard for
documenting numeric data Detailed information is available
at httpwwwddiallianceorg
oProjects using the DDI (httpwwwddiallianceorgddi-at-workprojects)
oDDI-compliant data repository
o ICPSR - Inter-university Consortium for Political and Social Research
o Data deposit form httpswwwicpsrumicheducgi-binddf2
o UCF is a member of ICPSR
oUKDA - UK Data Archive
Field Labels
TitlePrincipal investigator(s)
Summary
Access notes
Dataset(s)
httpwwwicpsrumicheduicpsrwebNA
CJDstudies20363archive=NACJDampq=22
university+of+central+florida22amppermit
5B05D=AVAILABLEampx=-999ampy=-84
ICPSR Interuniversity
Consortium for
Political and
Social Research
Dataset(s)
DSO Study-Level Files
Documentation
Questionnairepdf
User guidepdf
DS1 Female Interviews
Documentation
Codebookpdf
hellip
Field Labels
Study description
Citation
Funding
Scope of studybull Subject terms
bull Smallest
geographic unit
bull Geographic
coverage
bull Time period
bull Date of collection
bull Unit of
observation
bull Universe
bull Data types
bull Data collection
notes
Methodologybull Study purpose
bull Study design
Field Labels
bull Sample
bull Mode of data collection
bull Description of variables
bull Response rates
bull Presence of common
scales
bull Extent of processing
Field Labels
Version(s)
Related publications
Variables
Utilities
bull Metadata exports
bull Download statistics
Variables
List all 1682 variables in this study
egID QUESTIONNAIRE ID NUMBER ISEX INTERVIEWER GENDER START INTERVIEW START TIME HHMM USE 24 HR CLOCK Q1A COUNTRY OF BIRTH Q1B STATE OF BIRTH - INITIALS OF STATEQ1C CITY OF BIRTH WRITE IN NOT APPQ1D YEARS LIVED IN USAQ1E RESIDENCY STATUSCHECK1 CHECKPOINT 1 BORN IN SAME METRO AREAQ2 HOW LONG LIVED IN THIS AREA hellip (httpwwwicpsrumicheduicpsrwebNACJDssv
dstudies20363variables)
httpwwwicpsrumicheduicpsrwebICPSRddi2studies20363
docDscrThe Document
Description
consists of
bibliographic
information
describing the
DDI-compliant
document
itself as a
whole
Included Fields
citation
bull titleStmt
bull prodStmt
bull verStmt
bull holdings
Included FieldsCitation
titlStmt
rspStmt
prodStmt
fundAg
grantNo
distStmt
biblCit
Holdings
stdyInfoSubject
Abstract
sumDscr
MethoddataColl
Notes
anlyInfo
dataAccssetAvail
useStmt
stdyDscr The Study
Description consists of
information about the
data collection study
or compilation that the
DDI-compliant
documentation file
describes This section
includes information
about how the study
should be cited who
collected or compiled
the data who
distributes the data
keywords about the
content of the data
summary (abstract) of
the content of the data
data collection methods
and processing etc
Included Fields
fileDscr
fileTxt
fileName
fileDscr
Data Files
Description
Information about
the data file(s)
that comprises a
collection This
section can be
repeated for
collections with
multiple files
oContext and participant details of interviews can be
oA descriptive header or summary page in transcripts or
field notes
oA structured data list
oXML mark-up of data for example
oText Encoding Initiative (TEI) to mark up interview
transcript
oQualitative Data Exchange Format (QuDEx) for
researcher annotations and data linking
oAnonymisation of textual data (eg replacing real names of people
organizations and locations with pseudonyms)
oFile naming
oMeaningful short names identify file types (eg interviews focus groups
field notes audio recordings) avoid space special characters avoid long
names
oOrganizing files in folders Create uniform and structured folder names based
on cases studies locations data types etc or the original anonymized
coded or annotated versions of data
oVersion control Version numbering in file names
oDocumentation Methodology description project plan interview guidelines
consent form templates data analyses and manipulation
o Example is from A NESSTAR FOR QUALITATIVE DATA BUILDING BLOCKS FOR DIGITAL FUTURES By Corti Louise et al available at httpdata-archiveacukmedia376907digitalfutures_dashish_21nov2012pdf
oData List
Interview ID
x001
x002
hellip
Text File Name
6124int001
6124int002
hellip
oCreate and generate metadata for your research data and
datasets in your research lifecycle to preserve the data in the
long run
oConsider what information is needed for the data to be
read and interpreted in the future
oUnderstand your funder requirements for data
documentation and metadata Funder requirements for NSF
GBMF IMLS NEH NIH and NOAA can be found at
httpsdmptoolorgguidance
oConsult available metadata standards in your field You may
refer to Common Metadata Standards and Domain Specific
Metadata Standards for details
o Describe data and datasets created in your research lifecycle, and
use software programs and tools to assist in data documentation.
Assign or capture administrative, descriptive, technical, structural
and preservation metadata for the data. Some potential information
to document:
  o Descriptive metadata
    o Name of creator of data set
    o Name of author of document
    o Title of document
    o File name
    o Location of file
    o Size of file
  o Structural metadata
    o File relationships (e.g., child, parent)
  o Technical metadata
    o Format (e.g., text, SPSS, Stata, Excel, tiff, mpeg, 3D, Java, FITS, CIF)
    o Compression or encoding algorithms
    o Encryption and decryption keys
    o Software (including release number) used to create or update the data
    o Hardware on which the data were created
    o Operating systems in which the data were created
    o Application software in which the data were created
  o Administrative metadata
    o Information about data creation (e.g., date)
    o Information about subsequent updates, transformation, versioning,
    summarization
    o Descriptions of migration and replication
    o Information about other events that have affected the files
  o Preservation metadata
    o File format (e.g., txt, pdf, doc, rtf, xls, xml, spv, jpg, fits)
    o Significant properties
    o Technical environment
    o Fixity information
o Adopt a thesaurus in your field if applicable, or compile a data dictionary for
your dataset
o Obtain persistent identifiers (e.g., DOI, PURL) for datasets if possible to ensure
data can be found in the future
o For your full data management plan, visit the UCF Libraries Data Management
Guide. Also refer to the Digital Curation Centre's Checklist for a Data
Management Plan (http://www.dcc.ac.uk/sites/default/files/documents/resource/DMP_Checklist_2013.pdf)
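Several of the items listed above (file name, location, size, format, operating system, fixity information) can be captured automatically. The following is a minimal sketch using only the Python standard library. The function name describe_file and the dictionary keys are assumptions chosen for this example, and SHA-256 is just one possible fixity algorithm.

```python
import hashlib
import os
import platform


def describe_file(path):
    """Collect a few descriptive, technical and preservation metadata values."""
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large data files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            sha256.update(chunk)
    return {
        "file_name": os.path.basename(path),
        "location": os.path.dirname(os.path.abspath(path)),
        "size_bytes": os.path.getsize(path),
        # The extension is only a hint at the format, not a format identification.
        "format": os.path.splitext(path)[1].lstrip(".").lower(),
        "operating_system": platform.system(),
        "fixity_sha256": sha256.hexdigest(),
    }
```

Recomputing the checksum later and comparing it with the stored value shows whether the file has changed since it was documented.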
o Common Metadata Standards
o Disciplinary Metadata Standards
o Activity: Choose a dataset or a standard in your field to examine and critique
  o Social Science Dataset
  o Humanities Dataset
  o Biological Sciences Dataset
  o Biotechnology Dataset
  o Geospatial Dataset
  o Earth Science Dataset
  o Physical Science Dataset
  o Other…
o Dublin Core (DC): a general metadata standard for describing a wide range of
digital resources
  o Dublin Core Metadata Element Set, Version 1.1
  (http://dublincore.org/documents/dces/)
  o 15 elements: Title, Creator, Subject (or keyword), Description, Publisher, Contributor, Date, Type, Format,
  Identifier, Source, Language, Relation, Coverage, Rights
  o DCMI Metadata Terms (http://dublincore.org/documents/dcmi-terms/)
  o DC Qualifiers (http://dublincore.org/documents/usageguide/qualifiers.shtml)
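To make the element set concrete, here is a small sketch that writes a few Dublin Core elements as XML with Python's standard library. The namespace URI is the one for the Dublin Core Metadata Element Set 1.1. The wrapper element name "metadata", the function name and the sample values are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, Version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"


def dublin_core_record(fields):
    """Serialize a mapping of Dublin Core element names to values as XML."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # hypothetical wrapper element
    for element, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


record = dublin_core_record({
    "title": "Interview transcripts, study 6124",  # sample values only
    "creator": "Example Researcher",
    "type": "Dataset",
    "format": "text/plain",
})
print(record)
```

A repository will usually have its own wrapper and required elements, so treat this as a starting shape rather than a deposit-ready record.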
o Encoded Archival Description (EAD)
  o A standard for encoding archival finding aids with XML
o Government Information Locator Service (GILS)
  o The Global Information Locator Service defines a core element set for government
  information so that it can be more searchable and discoverable by the general public
o ONIX for Books (ONline Information eXchange)
  o An international standard for representing and communicating book industry product information
o MARC Code List for Countries: http://www.loc.gov/marc/countries/
o MARC Code List for Languages: http://www.loc.gov/marc/languages/
o MARC Source Codes for Vocabularies, Rules, and Schemes:
http://www.loc.gov/marc/sourcecode/form/formsource.html
o For digital and online resources
  o Internet Media Types: www.iana.org/assignments/media-types/index.html
  o MODS Note Types: http://www.loc.gov/standards/mods/mods-notes.html
  o DCMI Type Vocabulary: http://dublincore.org/documents/dcmi-terms/index.shtml#H7
o Subject Thesauri and Ontologies
o AGROVOC (Food and Agriculture Organization of the United Nations vocabulary)
o Astronomy Thesaurus
o CAB Thesaurus (for life sciences, technology and social sciences)
o CIF dictionaries (for physics)
o Eurovoc (European Union Thesaurus)
o Ethnographic Thesaurus
o Gene Ontology
o GeoNames
o Getty Institute Art and Architecture Thesaurus Online
o Getty Institute Thesaurus of Geographic Names
o ICD (International Classification of Diseases)
o Library of Congress Authorities for subject headings
o Library of Congress Thesaurus for Graphic Materials
o Logical Observation Identifiers Names and Codes (LOINC)
o MeSH (Medical Subject Headings)
o Public Health Language
o Rare Books and Manuscripts Section (RBMS) Controlled Vocabularies
o RxNorm (for drugs)
o SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms)
o STW Thesaurus for Economics
o UNBIS Thesaurus
o UNESCO Thesaurus
o USDA National Agricultural Library Agriculture Thesaurus
Question: Have you ever used thesauri in your study and research?
Getty Union List of Artist Names (ULAN)
The ULAN includes proper names and associated information about artists. Artists may be either individuals (persons) or groups of individuals working together (corporate bodies). Artists in the ULAN generally represent creators involved in the conception or production of visual arts and architecture.
Library of Congress Name Authority File (LCNAF)
The LCNAF provides authoritative data for names of persons, organizations, events, places, and titles.
Virtual International Authority File (VIAF)
The VIAF™ (Virtual International Authority File) combines multiple name authority files into a single OCLC-hosted name authority service. The goal of the service is to lower the cost and increase the utility of library authority files by matching and linking widely-used authority files and making that information available on the Web.
Web Ontology Language (OWL)
The OWL 2 Web Ontology Language is an ontology language for the Semantic Web with formally defined meaning. OWL 2 ontologies provide classes, properties, individuals, and data values, and are stored as Semantic Web documents. OWL 2 ontologies can be used along with information written in RDF, and OWL 2 ontologies themselves are primarily exchanged as RDF documents.
MADS/RDF
The Metadata Authority Description Schema (MADS) is an XML schema for an element set that may be used to provide metadata about authorized forms of agents (people, organizations), events, and terms (topics, geographics, genres, etc.). MADS/RDF builds on MADS/XML as a knowledge organization system.
Resource Description Framework (RDF)
RDF is a standard model for data interchange on the Web. RDF extends the linking structure of the Web to use URIs to name the relationship between things as well as the two ends of the link (this is usually referred to as a "triple"). Using this simple model, it allows structured and semi-structured data to be mixed, exposed, and shared across different applications.
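A triple is easiest to see written out. The sketch below formats one subject, predicate and object as a line of N-Triples, a plain-text RDF serialization. The helper name to_ntriple and the example.org dataset URI are assumptions for this example, and the escaping covers only backslashes, quotes and line breaks. A real project would more likely use an RDF library.

```python
def to_ntriple(subject, predicate, obj, obj_is_uri=False):
    """Format one RDF triple as a line of N-Triples."""
    if obj_is_uri:
        obj_term = f"<{obj}>"
    else:
        # Escape the characters that would break a quoted literal.
        escaped = (
            obj.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        obj_term = f'"{escaped}"'
    return f"<{subject}> <{predicate}> {obj_term} ."


triple = to_ntriple(
    "http://example.org/dataset/6124",        # hypothetical dataset URI
    "http://purl.org/dc/elements/1.1/title",  # Dublin Core "title" property
    "Interview transcripts",
)
print(triple)
```

The URI in the middle names the relationship, which is the point the slide makes about RDF extending the linking structure of the Web.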
SKOS: Simple Knowledge Organization for the Web
SKOS is a W3C recommendation designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary.
Linked data examples
• FAST: Faceted Application of Subject Terminology
• Dewey Decimal Classification
• Open Metadata Registry (RDA vocabularies)
• Library of Congress Linked Data Service
…
OpenRefine (formerly Google Refine) is a powerful tool for working with messy data: cleaning it, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase. http://openrefine.org
Nesstar Publisher is a free advanced data management program. It can be used for the preparation of data and metadata. It is DDI compliant. http://www.nesstar.com/software/publisher.html
QualAnon: DSDR Qualitative Data Anonymizer
This free transcript anonymization tool is designed solely to de-identify qualitative interview transcripts. https://www.icpsr.umich.edu/icpsrweb/DSDR/tools/anonymize.jsp
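The basic idea behind transcript anonymization, replacing real names with pseudonyms, can be sketched as below. This is not how QualAnon works internally. It is a generic find-and-replace illustration in which the function name pseudonymise and the sample names are invented for the example.

```python
import re


def pseudonymise(text, replacements):
    """Replace each real name in `replacements` with its pseudonym."""
    if not replacements:
        return text
    # Try longer names first so "Mary Smith" is matched before "Mary".
    names = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return pattern.sub(lambda match: replacements[match.group(1)], text)


sample = "Mary Smith met Mary in Orlando."
mapping = {"Mary": "P1", "Mary Smith": "P2", "Orlando": "City A"}
print(pseudonymise(sample, mapping))  # P2 met P1 in City A.
```

The mapping from real names to pseudonyms identifies participants, so it should be stored separately from the anonymized transcripts. Exact matching also misses misspellings and nicknames, so a manual read-through is still needed.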
Colectica for Microsoft Excel
A free tool to document your spreadsheet data using the Data Documentation Initiative (DDI) metadata format, the open standard for data documentation. http://www.colectica.com/software/colecticaforexcel
Schematron is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees. It is a structural schema language expressed in XML using a small number of elements and XPath. http://xml.ascc.net/resource/schematron/schematron.html
Altova XMLSpy is an advanced XML editor for modeling, editing, transforming, and debugging XML-related technologies. http://www.altova.com/xmlspy.html
<oXygen/> XML Editor is an XML tool that supports all the XML schema languages. The XSLT and XQuery support is enhanced with powerful debuggers and performance profilers. You can use <oXygen/> XML Editor to work with all XML-based technologies, including XML databases, XProc pipelines, and web services. http://www.oxygenxml.com
LabTrove is a free blogging platform specifically designed for use in a research environment. It aims to serve as a highly flexible electronic notebook and data management system. By integrating with a lab's data-producing instruments, researchers can describe an experiment and associate it with its data output at the time of capture, rather than annotating after the fact. http://www.labtrove.org
Kepler is a scientific workflow modeling and management system that enables users, regardless of programming experience, to set up data analysis pipelines. The software will assemble, execute, and document the services and scripts that scientists with large-scale data use to execute research. https://kepler-project.org
DataCite
The DataCite Consortium provides a number of services to support efforts at increasing the ease and prevalence of data citation. http://www.datacite.org
DMPTool is an online service to enable researchers to create data management plans, now required by many funding agencies, and to receive tailored institutional guidance to help them in the process. https://dmp.cdlib.org
o Section II addresses data documentation more from the
researcher's view
o Section III interprets data documentation more from
a curator's or librarian's perspective
o What do researchers really care about?
o Will each party see the other side's points and
emphases?
Create, edit, share, and save data management plans
Open access scholarly publishing services: papers, journals, books, seminars & more
Curation repository: store, manage, and share research data
Create and manage persistent identifiers
Open source add-in for Microsoft Excel as a data collection tool
An infrastructure to publish and get credit for sharing research data
CDL Curation and Publishing Services
http://www.cdlib.org
This slide is by Joan Starr, California Digital Library: http://www.slideshare.net/joanstarr/dataset-metadata-tools-approaches-for-access-preservation?from_search=1
Data Publication
http://library.ucf.edu/ScholarlyCommunication/UCFResearchLifecycle.pdf
Data Set Related Services
o "Data Set (also called 'Dataset') Metadata" provides
researchers consultation on
  o Project and dataset documentation
  o Metadata standards (Common and Domain Specific)
  o Metadata schema customization
  o Controlled vocabularies and thesauri
  o Data curation tools and practices
o Assists in describing basic properties of your data and enriching
metadata for your datasets
o Supports applying controlled vocabularies or optimizing keywords
to enhance the search of your datasets
o Helps to prepare your metadata and data for deposit and
preservation
o Scholarly Communication (http://library.ucf.edu/ScholarlyCommunication/)
o SC Contact Information (http://library.ucf.edu/ScholarlyCommunication/Contact.php)
o UCF Library Research Guides (http://guides.ucf.edu)
o Metadata Guide (http://guides.ucf.edu/metadata)
o Data Management Guide (http://guides.ucf.edu/data)
o Research and Information Services (http://library.ucf.edu/Reference)
Abstract: United States Geological Survey, Saint Petersburg, Florida, Center for Coastal and Watershed Studies…
Purpose: These data and information are intended for science researchers, students…
Language: eng; USA
Citation
  Title: Biological data of field activity 08CRD01 (B-1-08-VI) in US Virgin Islands from 05/30/2008 to 06/13/2008
  Date
    Date: 2013-03-03
    Date Type: Publication Date
  Organisation Name: US Geological Survey (USGS) <http://www.usgs.gov/>, Coastal and Marine Geology (CMG) <http://walrus.wr.usgs.gov/>
  Role: Publisher
  Contact Info: …
Point Of Contact: …
Representation Type: Vector
Topic Category
Keyword Collection
  Keyword: EARTH SCIENCE > OCEANS
  Associated Thesaurus: Global Change Master Directory (GCMD)
  Keyword: Marine Geology
  Associated Thesaurus: USGS CMG InfoBank
Spatial Extent
  West Bounding Longitude: -65.75000
  East Bounding Longitude: -63.25000
  North Bounding Latitude: 18.75000
  South Bounding Latitude: 17.25000
FGDC/CSDGM Metadata
Constraints: Please recognize the US Geological Survey (USGS) as the source of this information. Physical materials are under controlled on-site access. Some USGS information accessed through this means may be preliminary in nature and presented without the approval of the Director of the USGS…
Legal Constraints
  Use Constraints: Other Restrictions
  Other Constraints: Use Constraints: Please recognize the US Geological Survey (USGS) as the source of this information. Physical materials are under controlled on-site access…
…
Distribution
  Distribution Format
    Format Name: ASCII
    Format Version:
    File Decompression Technique: No compression applied
o An XML schema brings documentation into a single document, creates
structured content about the data, and allows data interoperability and
sharing
o It can document comprehensive variable-level information, such as a basic
data dictionary, question text, and question routing instructions
o Data Documentation Initiative (DDI): a metadata specification for the
social and behavioral sciences. It is an XML metadata standard for
documenting numeric data. Detailed information is available
at http://www.ddialliance.org
o Projects using the DDI (http://www.ddialliance.org/ddi-at-work/projects)
o DDI-compliant data repositories
  o ICPSR - Inter-university Consortium for Political and Social Research
    o Data deposit form: https://www.icpsr.umich.edu/cgi-bin/ddf2
    o UCF is a member of ICPSR
  o UKDA - UK Data Archive
Field Labels
Title
Principal investigator(s)
Summary
Access notes
Dataset(s)
http://www.icpsr.umich.edu/icpsrweb/NACJD/studies/20363?archive=NACJD&q=%22university+of+central+florida%22&permit%5B0%5D=AVAILABLE&x=-999&y=-84
ICPSR: Inter-university Consortium for Political and Social Research
Dataset(s)
DS0 Study-Level Files
  Documentation
    Questionnaire.pdf
    User guide.pdf
DS1 Female Interviews
  Documentation
    Codebook.pdf
…
Field Labels
Study description
Citation
Funding
Scope of study
• Subject terms
• Smallest geographic unit
• Geographic coverage
• Time period
• Date of collection
• Unit of observation
• Universe
• Data types
• Data collection notes
Methodology
• Study purpose
• Study design
• Sample
• Mode of data collection
• Description of variables
• Response rates
• Presence of common scales
• Extent of processing
Version(s)
Related publications
Variables
Utilities
• Metadata exports
• Download statistics
Variables
List all 1682 variables in this study, e.g.:
ID: QUESTIONNAIRE ID NUMBER
ISEX: INTERVIEWER GENDER
START: INTERVIEW START TIME HH:MM USE 24 HR CLOCK
Q1A: COUNTRY OF BIRTH
Q1B: STATE OF BIRTH - INITIALS OF STATE
Q1C: CITY OF BIRTH WRITE IN NOT APP
Q1D: YEARS LIVED IN USA
Q1E: RESIDENCY STATUS
CHECK1: CHECKPOINT 1 BORN IN SAME METRO AREA
Q2: HOW LONG LIVED IN THIS AREA
…
(http://www.icpsr.umich.edu/icpsrweb/NACJD/ssvd/studies/20363/variables)
http://www.icpsr.umich.edu/icpsrweb/ICPSR/ddi2/studies/20363
docDscr
The Document Description consists of bibliographic information describing the DDI-compliant document itself as a whole.
Included Fields
citation
• titlStmt
• prodStmt
• verStmt
• holdings
stdyDscr
The Study Description consists of information about the data collection, study, or compilation that the DDI-compliant documentation file describes. This section includes information about how the study should be cited, who collected or compiled the data, who distributes the data, keywords about the content of the data, a summary (abstract) of the content of the data, data collection methods and processing, etc.
Included Fields
Citation
• titlStmt
• rspStmt
• prodStmt
• fundAg
• grantNo
• distStmt
• biblCit
• Holdings
stdyInfo
• Subject
• Abstract
• sumDscr
Method
• dataColl
• Notes
• anlyInfo
dataAccs
• setAvail
• useStmt
fileDscr
The Data Files Description gives information about the data file(s) that comprise a collection. This section can be repeated for collections with multiple files.
Included Fields
fileDscr
• fileTxt
• fileName
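The three sections described above (docDscr, stdyDscr, and a repeatable fileDscr) can be pictured as a small XML tree. The sketch below builds one with Python's standard library. It uses DDI Codebook element names (codeBook, titl and the names listed above), but it is an unvalidated skeleton with no namespace or required attributes. The function name and sample values are invented for this example.

```python
import xml.etree.ElementTree as ET


def ddi_skeleton(title, file_names):
    """Build a bare-bones DDI Codebook-style tree (not schema-valid)."""
    code_book = ET.Element("codeBook")

    # docDscr: describes the DDI document itself.
    doc_citation = ET.SubElement(ET.SubElement(code_book, "docDscr"), "citation")
    doc_title = ET.SubElement(ET.SubElement(doc_citation, "titlStmt"), "titl")
    doc_title.text = f"DDI documentation for: {title}"

    # stdyDscr: describes the study the data come from.
    study_citation = ET.SubElement(ET.SubElement(code_book, "stdyDscr"), "citation")
    ET.SubElement(ET.SubElement(study_citation, "titlStmt"), "titl").text = title

    # fileDscr: repeated once per data file in the collection.
    for name in file_names:
        file_txt = ET.SubElement(ET.SubElement(code_book, "fileDscr"), "fileTxt")
        ET.SubElement(file_txt, "fileName").text = name

    return code_book


tree = ddi_skeleton("Example study", ["6124int001.txt", "6124int002.txt"])
print(ET.tostring(tree, encoding="unicode"))
```

Tools such as Nesstar Publisher or Colectica produce complete, valid DDI, so hand-building like this is mainly useful for understanding the structure.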
o Context and participant details of interviews can be documented in:
  o A descriptive header or summary page in transcripts or
  field notes
  o A structured data list
o XML mark-up of data, for example:
  o Text Encoding Initiative (TEI) to mark up interview
  transcripts
  o Qualitative Data Exchange Format (QuDEx) for
  researcher annotations and data linking
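As an illustration of XML mark-up for an interview, the sketch below wraps speaker turns in TEI-style elements: a teiHeader carrying the title, and u (utterance) elements with a who attribute. It is a simplified, unvalidated outline that omits the TEI namespace and several header parts a valid TEI file requires. The function name and the sample dialogue are invented for this example.

```python
import xml.etree.ElementTree as ET


def tei_transcript(title, utterances):
    """Build a simplified TEI-style tree for an interview transcript."""
    tei = ET.Element("TEI")

    # Header: a descriptive title for the transcript.
    file_desc = ET.SubElement(ET.SubElement(tei, "teiHeader"), "fileDesc")
    ET.SubElement(ET.SubElement(file_desc, "titleStmt"), "title").text = title

    # Body: one <u> element per speaker turn.
    body = ET.SubElement(ET.SubElement(tei, "text"), "body")
    for speaker, words in utterances:
        turn = ET.SubElement(body, "u", {"who": f"#{speaker}"})
        turn.text = words

    return tei


doc = tei_transcript(
    "Interview x001",
    [
        ("interviewer", "How long have you lived in this area?"),
        ("x001", "About ten years."),
    ],
)
print(ET.tostring(doc, encoding="unicode"))
```

Using the interview ID from the data list as the speaker label keeps the transcript linked to its entry in the data list without naming the participant.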
oAnonymisation of textual data (eg replacing real names of people
organizations and locations with pseudonyms)
oFile naming
oMeaningful short names identify file types (eg interviews focus groups
field notes audio recordings) avoid space special characters avoid long
names
oOrganizing files in folders Create uniform and structured folder names based
on cases studies locations data types etc or the original anonymized
coded or annotated versions of data
oVersion control Version numbering in file names
oDocumentation Methodology description project plan interview guidelines
consent form templates data analyses and manipulation
o Example is from A NESSTAR FOR QUALITATIVE DATA BUILDING BLOCKS FOR DIGITAL FUTURES By Corti Louise et al available at httpdata-archiveacukmedia376907digitalfutures_dashish_21nov2012pdf
oData List
Interview ID
x001
x002
hellip
Text File Name
6124int001
6124int002
hellip
oCreate and generate metadata for your research data and
datasets in your research lifecycle to preserve the data in the
long run
oConsider what information is needed for the data to be
read and interpreted in the future
oUnderstand your funder requirements for data
documentation and metadata Funder requirements for NSF
GBMF IMLS NEH NIH and NOAA can be found at
httpsdmptoolorgguidance
oConsult available metadata standards in your field You may
refer to Common Metadata Standards and Domain Specific
Metadata Standards for details
oDescribe data and datasets created in your research lifecycle and
use software programs and tools to assist in data documentation
Assign or capture administrative descriptive technical structural
and preservation metadata for the data Some potential information
to document
oDescriptive metadata
oName of creator of data set
oName of author of document
oTitle of document
oFile name
oLocation of file
oSize of file
oStructural metadata
oFile relationships (eg child parent)
oTechnical metadata
oFormat (eg text SPSS Stata Excel tiff mpeg 3D Java FITS CIF)
oCompression or encoding algorithms
oEncryption and decryption keys
oSoftware (including release number) used to create or update the data
oHardware on which the data were created
oOperating systems in which the data were created
oApplication software in which the data were created
oAdministrative metadata
o Information about data creation (eg date)
o Information about subsequent updates transformation versioning
summarization
oDescriptions of migration and replication
o Information about other events that have affected the files
oPreservation metadata
oFile format (eg txt pdf doc rtf xls xml spv jpg fits)
oSignificant properties
oTechnical environment
oFixity information
oAdopt a thesauri in your field if applicable or compile a data dictionary for
your dataset
oObtain persistent identifiers (eg doi purl) for datasets if possible to ensure
data can be found in the future
oFor your full data management plan visit UCF Libraries Data Management
Guide Also refer to Digital Curation Centrersquos Checklist for a Data
Management Plan (httpwwwdccacuksitesdefaultfilesdocumentsresourceDMP_Checklist_2013pdf)
oCommon Metadata Standards
oDisciplinary Metadata Standards
oActivity Choose a dataset or a standard in your field to examine and critique
oSocial Science Dataset
oHumanities Dataset
oBiological Sciences Dataset
oBiotechnology Dataset
oGeospatial Dataset
oEarth Science Dataset
oPhysical Science Dataset
oOtherhellip
oDublin Core (DC) A general metadata standard for describing a wide range of
digital resources
o Dublin Core Metadata Element Set Version 11
(httpdublincoreorgdocumentsdces)
o 15 Elements Title Creator Subject or keyword Description Publisher Type Format
Identifier Source Language Relation Coverage Rights
o DCMI Metadata Terms (httpdublincoreorgdocumentsdcmi-terms)
o DC Qualifiers (httpdublincoreorgdocumentsusageguidequalifiersshtml)
o Encoded Archival Description (EAD)
o A standard for encoding archival finding aids with XML
oGovernment Information Locator Service (GILS)
o The Global Information Locator Service defines a core element set for government
information so that it can be more searchable and discoverable by the general public
oONIX for Books (ONline Information eXchange)
o An international standard for representing and communicating book industry product
oMARC Code List for Countries httpwwwlocgovmarccountries
oMARC Code List for Languages httpwwwlocgovmarclanguages
oMARC Source Codes for Vocabularies Rules and Schemes
httpwwwlocgovmarcsourcecodeformformsourcehtml
oFor digital and online resources
oInternet Media Types wwwianaorgassignmentsmedia-
typesindexhtml
oMODS Note Types httpwwwlocgovstandardsmodsmods-
noteshtml
oDCMI Type Vocabulary httpdublincoreorgdocumentsdcmi-
termsindexshtmlH7
o Subject Thesauri and Ontologies
o AGROVOC (Agricultural Organization of the United Nations Vocabulary)
o Astronomy Thesaurus
o CAB Thesaurus (for life sciences technology and social sciences)
o CIF dictionaries (for Physics)
o Eurovoc (European Union Thesaurus)
o Ethnographic Thesaurus
o Gene Ontology
o GeoNames
o Getty Institute Art and Architecture Thesaurus Online
o Getty Institute Thesaurus of Geographic Names
o ICD (International Classification of Diseases)
o Library of Congress Authorities for subject headings
o Library of Congress Thesaurus for Graphic Materials
o Logical Observation Identifiers Names and Codes (LOINC)
o MESH (Medical Subject Headings)
o Public Health Language
o Rare Books and Manuscripts Section (RBMS) Controlled Vocabularies
o RxNorm (for drugs)
o SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms)
o STW Thesaurus for Economics
o UNBIS Thesaurus
o UNESCO Thesaurus
o USDA National Agricultural Library Agriculture Thesaurus
Question Have you ever
used thesauri in your study
and research
Getty Union List of Artist Names
(ULAN)The ULAN includes proper names and
associated information about artists
Artists may be either individuals
(persons) or groups of individuals working
together (corporate bodies) Artists in
the ULAN generally represent creators
involved in the conception or production
of visual arts and architecture
Library of Congress Name
Authority File (LCNAF)
The LCNAF provides authoritative
data for names of persons
organizations events places and
titles
Virtual International
Authority File (VIAF)
The VIAFtrade (Virtual International
Authority File) combines multiple
name authority files into a single
OCLC-hosted name authority
service The goal of the service is to
lower the cost and increase the
utility of library authority files by
matching and linking widely-used
authority files and making that
information available on the Web
Web Ontology Language
(OWL)The OWL 2 Web Ontology Language is an
ontology language for the Semantic Web
with formally defined meaning OWL 2
ontologies provide classes properties
individuals and data values and are stored
as Semantic Web documents OWL 2
ontologies can be used along with
information written in RDF and OWL 2
ontologies themselves are primarily
exchanged as RDF documents
MADSRDFThe Metadata Authority Description
Schema (MADS) is an XML schema for an
element set that may be used to provide
metadata about authorized forms of
agents (people organizations) events
and terms (topics geographics genres
etc) MADSRDF
builds on MADSXML as a knowledge
organization system
Resource Description
Framework (RDF)RDF is a standard model for data
interchange on the Web RDF extends
the linking structure of the Web to use
URIs to name the relationship
between things as well as the two
ends of the link (this is usually
referred to as a ldquotriplerdquo) Using this
simple model it allows structured and
semi-structured data to be mixed
exposed and shared across different
applications
SKOS Simple Knowledge
Organization for the Web SKOS is a W3C recommendation
designed for representation of
thesauri classification
schemes taxonomies subject-
heading systems or any other
type of structured controlled
vocabularyLinked data
examplesbull FAST Faceted
Application of
Subject
Terminology
bull Dewey Decimal
Classification
bull Open Metadata
Registry (RDA
vocabularies)
bull Library of Congress
Linked Data
Service
hellip
OpenRefine (ex-Google Refine) is a powerful tool for working with messy data cleaning it transforming it from one format into another extending it with web services and linking it to databases like Freebasehttpopenrefineorg
Nesstar Publisher is a
free advanced data management program It can be used for the preparation of data and metadata Its DDI complianthttpwwwnesstarcomsoftwarepublisherhtml
QualAnon DSDR
Qualitative Data Anonymizer
This free transcript anonymizationtool is designed solely to de-identify qualitative interview transcriptshttpswwwicpsrumicheduicpsrwebDSDRtoolsanonymizejsp
Colectica for Microsoft Excel
A free tool to document your spreadsheet data using the Data Documentation Initiative (DDI) metadata format the open standard for data documentationhttpwwwcolecticacomsoftwarecolecticaforexcel
Schematron is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees It is a structural schema language expressed in XML using a small number of elements and XPathhttpxmlasccnetresourceschematronschematronhtml
Altova XMLSpy is an advanced XML editor for modeling editing transforming and debugging XML-related
technologieshttpwwwaltovacomxmlspy
html
ltoXygengt XML
Editor is an XML tool that supports all the XML schema languages The XSLT and XQuery support is enhanced with powerful debuggers and performance profilers You can use ltoXygengt XML Editor to work with all XML-based technologies including XML databases XProcpipelines and web serviceshttpwwwoxygenxmlcom
LabTrove is a free blogging
platform specifically designed for use in a research environment It aims to serve as a highly flexible electronic notebook and data management system by integrating with a labrsquos data-producing instruments researchers can describe an experiment and associate it with its data output at the time of capture rather than annotating after the fact httpwwwlabtroveorg
Kepler is a scientific workflow
modeling and management system that enables users regardless of programming experience to set up data analysis pipelines The software will assemble execute and document theof services and scripts that scientists with large-scale data use to execute researchhttpskepler-projectorg
DataCiteThe DataCite Consortium
provides a number of
services to support
efforts at increasing the
ease and prevalence of
data citationhttpwwwdataciteorg
DMPTool is an online service to enable researchers to create data management plans now required by many funding agencies and to receive tailored institutional guidance to help them in the processhttpsdmpcdliborg
oSection II addresses data documentation more from the
researcherrsquos view
oSection III interprets data documentation more from
a curator or librarians perspective
oWhat do researchers really care about
oWill each party see the other sidersquos points and
emphases
Create edit share and save
data management plans
Open access scholarly publishing services
papers journals books seminars amp more
Curation repository store manage and share research data
Create and manage
persistent identifiers
Open source add-in for Microsoft
Excel as a data collection tool
An infrastructure to publish and get credit
for sharing research data
CDL Curation and Publishing Services
httpwwwcdliborg
This slide is by Joan Starr California Digital Library httpwwwslidesharenetjoanstarrdataset-metadata-tools-approaches-for-access-preservationfrom_search=1
Data Publication
httplibraryucfeduScholarlyCommunicationUCFResearchLifecyclepdfData Set Related Services
oldquoData Set (also called lsquoDatasetrsquo) Metadatardquo provides
researchers consultation on
oProject and dataset documentation
oMetadata standards (Common and Domain Specific)
oMetadata schemas customization
oControlled vocabularies and thesauri
oData curation tools and practices
oAssists in describing basic properties of your data and enriching
metadata for your datasets
oSupports applying controlled vocabularies or optimizing keywords
to enhance the search of your datasets
oHelps to prepare your metadata and data for deposit and
preservation
oScholarly Communication (httplibraryucfeduScholarlyCommunication)
oSC Contact Information (httplibraryucfeduScholarlyCommunicationContactphp)
oUCF Library Research Guides (httpguidesucfedu)
oMetadata Guide (httpguidesucfedumetadata)
oData Management Guide (httpguidesucfedudata)
oResearch and Information Services (httplibraryucfeduReference)
Abstract United States Geological Survey Saint Petersburg Florida Center for Coastal and Watershed
Studieshellip
Purpose These data and information are intended for science researchers studentshellip
Language eng USA
Citation
Title Biological data of field activity 08CRD01 (B-1-08-VI) in US Virgin Islands from 05302008 to 06132008
Date
Date 2013-03-03
Date Type Publication Date
Organisation Name US Geological Survey (USGS) lthttpwwwusgsgovgt Coastal and Marine Geology
(CMG) lthttpwalruswrusgsgovgt
Role Publisher
Contact Info hellip
Point Of Contact hellip
Representation Type Vector
Topic Category
Keyword Collection
Keyword EARTH SCIENCE gt OCEANS
Associated Thesaurus Global Change Master Directory (GCMD)
Keyword Marine Geology
Associated Thesaurus USGS CMG InfoBank
Spatial Extent
West Bounding Longitude -6575000
East Bounding Longitude -6325000
North Bounding Latitude 1875000
South Bounding Latitude 1725000
FGDCCSDGM
Metadata
Constraints Please recognize the US Geological Survey (USGS) as the source of this information Physical materials are under controlled on-site access Some USGS information accessed through this means may be preliminary in nature and presented without the approval of the Director of the USGShellip
Legal Constraints
Use Constraints Other Restrictions
Other Constraints Use Constraints Please recognize the US Geological Survey (USGS) as the source of this information Physical materials are under controlled on-site accesshellip
hellip
Distribution
Distribution Format
Format Name ASCII
Format Version
File Decompression Technique No compression applied
oXML schema brings documentation into a single document creates
structured content about the data and allows data interoperability and
sharing
oIt can document comprehensive variable level information such as basic
data dictionary question text and question routing instructions
oData Documentation Initiative (DDI) a metadata specification for the
social and behavioral sciences It is an XML metadata standard for
documenting numeric data Detailed information is available
at httpwwwddiallianceorg
oProjects using the DDI (httpwwwddiallianceorgddi-at-workprojects)
oDDI-compliant data repository
o ICPSR - Inter-university Consortium for Political and Social Research
o Data deposit form httpswwwicpsrumicheducgi-binddf2
o UCF is a member of ICPSR
oUKDA - UK Data Archive
Field Labels
TitlePrincipal investigator(s)
Summary
Access notes
Dataset(s)
httpwwwicpsrumicheduicpsrwebNA
CJDstudies20363archive=NACJDampq=22
university+of+central+florida22amppermit
5B05D=AVAILABLEampx=-999ampy=-84
ICPSR Interuniversity
Consortium for
Political and
Social Research
Dataset(s)
DSO Study-Level Files
Documentation
Questionnairepdf
User guidepdf
DS1 Female Interviews
Documentation
Codebookpdf
hellip
Field Labels
Study description
Citation
Funding
Scope of studybull Subject terms
bull Smallest
geographic unit
bull Geographic
coverage
bull Time period
bull Date of collection
bull Unit of
observation
bull Universe
bull Data types
bull Data collection
notes
Methodologybull Study purpose
bull Study design
Field Labels
bull Sample
bull Mode of data collection
bull Description of variables
bull Response rates
bull Presence of common
scales
bull Extent of processing
Field Labels
Version(s)
Related publications
Variables
Utilities
bull Metadata exports
bull Download statistics
Variables
List all 1682 variables in this study, e.g.:
ID: QUESTIONNAIRE ID NUMBER
ISEX: INTERVIEWER GENDER
START: INTERVIEW START TIME HHMM USE 24 HR CLOCK
Q1A: COUNTRY OF BIRTH
Q1B: STATE OF BIRTH - INITIALS OF STATE
Q1C: CITY OF BIRTH WRITE IN NOT APP
Q1D: YEARS LIVED IN USA
Q1E: RESIDENCY STATUS
CHECK1: CHECKPOINT 1 BORN IN SAME METRO AREA
Q2: HOW LONG LIVED IN THIS AREA
…
(http://www.icpsr.umich.edu/icpsrweb/NACJD/ssvd/studies/20363/variables)
http://www.icpsr.umich.edu/icpsrweb/ICPSR/ddi2/studies/20363
docDscr (Document Description)
The Document Description consists of bibliographic information describing
the DDI-compliant document itself as a whole.
Included Fields
• citation
  • titlStmt
  • prodStmt
  • verStmt
  • holdings
stdyDscr (Study Description)
The Study Description consists of information about the data collection,
study, or compilation that the DDI-compliant documentation file describes.
This section includes information about how the study should be cited, who
collected or compiled the data, who distributes the data, keywords about
the content of the data, a summary (abstract) of the content of the data,
data collection methods and processing, etc.
Included Fields
• citation: titlStmt, rspStmt, prodStmt, fundAg, grantNo, distStmt, biblCit, holdings
• stdyInfo: subject, abstract, sumDscr
• method: dataColl, notes, anlyInfo
• dataAccs: setAvail, useStmt
fileDscr (Data Files Description)
Information about the data file(s) that comprise a collection. This section
can be repeated for collections with multiple files.
Included Fields
• fileDscr
  • fileTxt
    • fileName
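The three DDI sections described above can be sketched as a small XML tree. This is a minimal illustration only, assuming the DDI Codebook root element `codeBook` and the `titl` child of `titlStmt`; the study title, abstract, and file names are hypothetical, and the output is not claimed to validate against the full DDI schema.

```python
import xml.etree.ElementTree as ET


def build_codebook(title: str, abstract: str, file_names: list[str]) -> ET.Element:
    """Build a bare DDI-Codebook-style tree: docDscr, stdyDscr, one fileDscr per file."""
    root = ET.Element("codeBook")

    # docDscr: bibliographic information about the DDI document itself
    doc_citation = ET.SubElement(ET.SubElement(root, "docDscr"), "citation")
    doc_titl = ET.SubElement(ET.SubElement(doc_citation, "titlStmt"), "titl")
    doc_titl.text = f"DDI documentation for: {title}"

    # stdyDscr: citation and content summary for the study
    stdy = ET.SubElement(root, "stdyDscr")
    stdy_citation = ET.SubElement(stdy, "citation")
    ET.SubElement(ET.SubElement(stdy_citation, "titlStmt"), "titl").text = title
    ET.SubElement(ET.SubElement(stdy, "stdyInfo"), "abstract").text = abstract

    # fileDscr: repeated once per data file in the collection
    for name in file_names:
        file_txt = ET.SubElement(ET.SubElement(root, "fileDscr"), "fileTxt")
        ET.SubElement(file_txt, "fileName").text = name

    return root


codebook = build_codebook(
    title="Example interview study",        # hypothetical study title
    abstract="Placeholder abstract text.",  # hypothetical abstract
    file_names=["ds1_interviews.txt", "ds2_interviews.txt"],  # hypothetical files
)
ET.indent(codebook)  # pretty-print; available in Python 3.9+
print(ET.tostring(codebook, encoding="unicode"))
```

Tools such as Nesstar Publisher or Colectica would normally produce this markup; the sketch only shows how the sections nest.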
o Context and participant details of interviews can be documented in:
  o A descriptive header or summary page in transcripts or field notes
  o A structured data list
o XML mark-up of data, for example:
  o Text Encoding Initiative (TEI) to mark up interview transcripts
  o Qualitative Data Exchange Format (QuDEx) for researcher annotations
    and data linking
o Anonymisation of textual data (e.g., replacing real names of people,
  organizations, and locations with pseudonyms)
o File naming
  o Meaningful short names: identify file types (e.g., interviews, focus
    groups, field notes, audio recordings); avoid spaces and special
    characters; avoid long names
o Organizing files in folders: create uniform and structured folder names
  based on cases, studies, locations, data types, etc., or on the original,
  anonymized, coded, or annotated versions of data
o Version control: version numbering in file names
o Documentation: methodology description, project plan, interview
  guidelines, consent form templates, data analyses and manipulation
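The file naming and version numbering advice above can be checked mechanically. The sketch below assumes one possible convention, `<study>_<type><nnn>_v<nn>.<ext>`; the pattern, the type codes, and the 40-character limit are illustrative assumptions, not rules from the slides.

```python
import re

# Assumed convention: <study>_<type><nnn>_v<nn>.<ext>, e.g. "6124_int001_v02.txt"
NAME_PATTERN = re.compile(r"^[A-Za-z0-9]+_(int|fg|notes|audio)\d{3}_v\d{2}\.[a-z0-9]+$")
MAX_LENGTH = 40  # assumed limit for a "short" name


def check_file_name(name: str) -> list[str]:
    """Return the problems found in a file name; an empty list means it passes."""
    problems = []
    if " " in name:
        problems.append("contains spaces")
    # Anything other than letters, digits, dot, underscore, or hyphen is "special"
    if re.search(r"[^A-Za-z0-9._-]", name.replace(" ", "")):
        problems.append("contains special characters")
    if len(name) > MAX_LENGTH:
        problems.append("too long")
    if not NAME_PATTERN.match(name):
        problems.append("does not follow <study>_<type><nnn>_v<nn>.<ext>")
    return problems


for candidate in ["6124_int001_v02.txt", "Interview 1 (final)!.txt"]:
    print(candidate, "->", check_file_name(candidate) or "ok")
```

Running a check like this over a project folder before deposit is one way to catch inconsistent names early.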
o Data List (example from "A Nesstar for Qualitative Data: Building Blocks
  for Digital Futures" by Louise Corti et al., available at
  http://data-archive.ac.uk/media/376907/digitalfutures_dashish_21nov2012.pdf)
Interview ID    Text File Name
x001            6124int001
x002            6124int002
…               …
o Create and generate metadata for your research data and datasets
  throughout your research lifecycle to preserve the data in the long run
o Consider what information is needed for the data to be read and
  interpreted in the future
o Understand your funder's requirements for data documentation and
  metadata. Funder requirements for NSF, GBMF, IMLS, NEH, NIH, and NOAA
  can be found at https://dmptool.org/guidance
o Consult available metadata standards in your field. You may refer to
  Common Metadata Standards and Domain Specific Metadata Standards for
  details
o Describe data and datasets created in your research lifecycle, and use
  software programs and tools to assist in data documentation. Assign or
  capture administrative, descriptive, technical, structural, and
  preservation metadata for the data. Some potential information to
  document:
  o Descriptive metadata
    o Name of creator of data set
    o Name of author of document
    o Title of document
    o File name
    o Location of file
    o Size of file
  o Structural metadata
    o File relationships (e.g., child, parent)
  o Technical metadata
    o Format (e.g., text, SPSS, Stata, Excel, tiff, mpeg, 3D, Java, FITS, CIF)
    o Compression or encoding algorithms
    o Encryption and decryption keys
    o Software (including release number) used to create or update the data
    o Hardware on which the data were created
    o Operating systems in which the data were created
    o Application software in which the data were created
  o Administrative metadata
    o Information about data creation (e.g., date)
    o Information about subsequent updates, transformation, versioning,
      summarization
    o Descriptions of migration and replication
    o Information about other events that have affected the files
  o Preservation metadata
    o File format (e.g., txt, pdf, doc, rtf, xls, xml, spv, jpg, fits)
    o Significant properties
    o Technical environment
    o Fixity information
o Adopt a thesaurus in your field, if applicable, or compile a data
  dictionary for your dataset
o Obtain persistent identifiers (e.g., DOI, PURL) for datasets, if
  possible, to ensure data can be found in the future
o For your full data management plan, visit the UCF Libraries Data
  Management Guide. Also refer to the Digital Curation Centre's Checklist
  for a Data Management Plan
  (http://www.dcc.ac.uk/sites/default/files/documents/resource/DMP_Checklist_2013.pdf)
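Some of the descriptive, technical, and fixity items listed above can be captured automatically. The following is a minimal sketch using only the Python standard library; the dictionary keys and the choice of SHA-256 as the fixity algorithm are assumptions made for illustration, not a prescribed schema.

```python
import hashlib
import json
import platform
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def describe_file(path: Path) -> dict:
    """Collect basic descriptive, technical, and fixity metadata for one file."""
    data = path.read_bytes()  # fine for small files; read in chunks for large ones
    stat = path.stat()
    return {
        "file_name": path.name,
        "location": str(path.resolve().parent),
        "size_bytes": stat.st_size,
        "format": path.suffix.lstrip(".").lower(),
        "last_modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
        "operating_system": platform.system(),
        "fixity": {"algorithm": "SHA-256", "value": hashlib.sha256(data).hexdigest()},
    }


# Demonstration on a throwaway file (hypothetical name and content)
with tempfile.TemporaryDirectory() as folder:
    sample = Path(folder) / "6124int001.txt"
    sample.write_text("Interview transcript placeholder", encoding="utf-8")
    print(json.dumps(describe_file(sample), indent=2))
```

Recomputing the checksum later and comparing it with the stored value is how fixity information shows whether a file has changed.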
o Common Metadata Standards
o Disciplinary Metadata Standards
o Activity: Choose a dataset or a standard in your field to examine and
  critique
  o Social Science Dataset
  o Humanities Dataset
  o Biological Sciences Dataset
  o Biotechnology Dataset
  o Geospatial Dataset
  o Earth Science Dataset
  o Physical Science Dataset
  o Other…
o Dublin Core (DC): A general metadata standard for describing a wide
  range of digital resources
  o Dublin Core Metadata Element Set, Version 1.1
    (http://dublincore.org/documents/dces/)
  o 15 Elements: Title, Creator, Subject (or keyword), Description,
    Publisher, Contributor, Date, Type, Format, Identifier, Source,
    Language, Relation, Coverage, Rights
  o DCMI Metadata Terms (http://dublincore.org/documents/dcmi-terms/)
  o DC Qualifiers (http://dublincore.org/documents/usageguide/qualifiers.shtml)
o Encoded Archival Description (EAD)
  o A standard for encoding archival finding aids with XML
o Government Information Locator Service (GILS)
  o The Global Information Locator Service defines a core element set for
    government information so that it can be more searchable and
    discoverable by the general public
o ONIX for Books (ONline Information eXchange)
  o An international standard for representing and communicating book
    industry product information
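As an illustration of the Dublin Core element set listed above, a simple record can be written as XML in the `http://purl.org/dc/elements/1.1/` namespace. This is a sketch under stated assumptions: the `metadata` container element and all field values are hypothetical, and real repositories usually wrap Dublin Core in their own record format.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core Metadata Element Set 1.1
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict[str, str]) -> ET.Element:
    """Wrap simple Dublin Core element/value pairs in a container element."""
    record = ET.Element("metadata")  # container name is an assumption
    for element, value in fields.items():
        ET.SubElement(record, f"{{{DC_NS}}}{element}").text = value
    return record


record = dublin_core_record({
    "title": "Example interview transcripts",  # hypothetical values throughout
    "creator": "Doe, Jane",
    "date": "2014-11-19",
    "type": "Dataset",
    "format": "text/plain",
    "language": "eng",
    "rights": "Restricted to registered users",
})
print(ET.tostring(record, encoding="unicode"))
```

Values for elements such as Type, Format, and Language are best taken from controlled vocabularies rather than typed freely.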
o MARC Code List for Countries: http://www.loc.gov/marc/countries/
o MARC Code List for Languages: http://www.loc.gov/marc/languages/
o MARC Source Codes for Vocabularies, Rules, and Schemes:
  http://www.loc.gov/marc/sourcecode/form/formsource.html
o For digital and online resources
  o Internet Media Types: www.iana.org/assignments/media-types/index.html
  o MODS Note Types: http://www.loc.gov/standards/mods/mods-notes.html
  o DCMI Type Vocabulary: http://dublincore.org/documents/dcmi-terms/index.shtml#H7
o Subject Thesauri and Ontologies
o AGROVOC (Food and Agriculture Organization of the United Nations vocabulary)
o Astronomy Thesaurus
o CAB Thesaurus (for life sciences, technology, and social sciences)
o CIF dictionaries (for Physics)
o Eurovoc (European Union Thesaurus)
o Ethnographic Thesaurus
o Gene Ontology
o GeoNames
o Getty Institute Art and Architecture Thesaurus Online
o Getty Institute Thesaurus of Geographic Names
o ICD (International Classification of Diseases)
o Library of Congress Authorities for subject headings
o Library of Congress Thesaurus for Graphic Materials
o Logical Observation Identifiers Names and Codes (LOINC)
o MeSH (Medical Subject Headings)
o Public Health Language
o Rare Books and Manuscripts Section (RBMS) Controlled Vocabularies
o RxNorm (for drugs)
o SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms)
o STW Thesaurus for Economics
o UNBIS Thesaurus
o UNESCO Thesaurus
o USDA National Agricultural Library Agriculture Thesaurus
Question: Have you ever used thesauri in your study and research?
Getty Union List of Artist Names (ULAN)
The ULAN includes proper names and associated information about artists.
Artists may be either individuals (persons) or groups of individuals working
together (corporate bodies). Artists in the ULAN generally represent creators
involved in the conception or production of visual arts and architecture.
Library of Congress Name Authority File (LCNAF)
The LCNAF provides authoritative data for names of persons, organizations,
events, places, and titles.
Virtual International Authority File (VIAF)
The VIAF™ (Virtual International Authority File) combines multiple name
authority files into a single OCLC-hosted name authority service. The goal of
the service is to lower the cost and increase the utility of library authority
files by matching and linking widely used authority files and making that
information available on the Web.
Web Ontology Language (OWL)
The OWL 2 Web Ontology Language is an ontology language for the Semantic Web
with formally defined meaning. OWL 2 ontologies provide classes, properties,
individuals, and data values and are stored as Semantic Web documents. OWL 2
ontologies can be used along with information written in RDF, and OWL 2
ontologies themselves are primarily exchanged as RDF documents.
MADS/RDF
The Metadata Authority Description Schema (MADS) is an XML schema for an
element set that may be used to provide metadata about authorized forms of
agents (people, organizations), events, and terms (topics, geographics,
genres, etc.). MADS/RDF builds on MADS/XML as a knowledge organization system.
Resource Description Framework (RDF)
RDF is a standard model for data interchange on the Web. RDF extends the
linking structure of the Web to use URIs to name the relationship between
things as well as the two ends of the link (this is usually referred to as a
"triple"). Using this simple model, it allows structured and semi-structured
data to be mixed, exposed, and shared across different applications.
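The triple model described above can be made concrete with a few lines of plain Python that print statements in N-Triples syntax. This is a simplified sketch: the `example.org` URIs and the literal values are hypothetical, the predicates are taken from DCMI Metadata Terms, and only basic literal escaping is handled. A dedicated RDF library would normally be used instead.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str             # URI of the thing being described
    predicate: str           # URI naming the relationship
    obj: str                 # URI at the other end of the link, or a literal value
    obj_is_uri: bool = True


def to_ntriples(triple: Triple) -> str:
    """Serialize one triple as an N-Triples line (basic escaping only)."""
    if triple.obj_is_uri:
        obj = f"<{triple.obj}>"
    else:
        escaped = (
            triple.obj.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        )
        obj = f'"{escaped}"'
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."


DATASET = "http://example.org/dataset/1"  # hypothetical dataset URI
triples = [
    Triple(DATASET, "http://purl.org/dc/terms/title", "Example interview study", obj_is_uri=False),
    Triple(DATASET, "http://purl.org/dc/terms/creator", "http://example.org/person/jane-doe"),
]
for triple in triples:
    print(to_ntriples(triple))
```

Because the creator is itself a URI, another dataset can point at the same person, which is what lets linked data be mixed and shared across applications.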
SKOS: Simple Knowledge Organization for the Web
SKOS is a W3C recommendation designed for representation of thesauri,
classification schemes, taxonomies, subject-heading systems, or any other
type of structured controlled vocabulary.
Linked data examples
• FAST: Faceted Application of Subject Terminology
• Dewey Decimal Classification
• Open Metadata Registry (RDA vocabularies)
• Library of Congress Linked Data Service
• …
OpenRefine (ex-Google Refine) is a powerful tool for working with messy data:
cleaning it, transforming it from one format into another, extending it with
web services, and linking it to databases like Freebase.
http://openrefine.org
Nesstar Publisher is a free, advanced data management program. It can be used
for the preparation of data and metadata. It is DDI compliant.
http://www.nesstar.com/software/publisher.html
QualAnon: DSDR Qualitative Data Anonymizer. This free transcript
anonymization tool is designed solely to de-identify qualitative interview
transcripts.
https://www.icpsr.umich.edu/icpsrweb/DSDR/tools/anonymize.jsp
Colectica for Microsoft Excel is a free tool to document your spreadsheet
data using the Data Documentation Initiative (DDI) metadata format, the open
standard for data documentation.
http://www.colectica.com/software/colecticaforexcel
Schematron is a rule-based validation language for making assertions about
the presence or absence of patterns in XML trees. It is a structural schema
language expressed in XML using a small number of elements and XPath.
http://xml.ascc.net/resource/schematron/schematron.html
Altova XMLSpy is an advanced XML editor for modeling, editing, transforming,
and debugging XML-related technologies.
http://www.altova.com/xmlspy.html
<oXygen/> XML Editor is an XML tool that supports all the XML schema
languages. The XSLT and XQuery support is enhanced with powerful debuggers
and performance profilers. You can use <oXygen/> XML Editor to work with all
XML-based technologies, including XML databases, XProc pipelines, and web
services.
http://www.oxygenxml.com
LabTrove is a free blogging platform specifically designed for use in a
research environment. It aims to serve as a highly flexible electronic
notebook and data management system: by integrating with a lab's
data-producing instruments, researchers can describe an experiment and
associate it with its data output at the time of capture rather than
annotating after the fact.
http://www.labtrove.org
Kepler is a scientific workflow modeling and management system that enables
users, regardless of programming experience, to set up data analysis
pipelines. The software will assemble, execute, and document the services and
scripts that scientists with large-scale data use to execute research.
https://kepler-project.org
DataCite: The DataCite Consortium provides a number of services to support
efforts at increasing the ease and prevalence of data citation.
http://www.datacite.org
DMPTool is an online service that enables researchers to create data
management plans, now required by many funding agencies, and to receive
tailored institutional guidance to help them in the process.
https://dmp.cdlib.org
o Section II addresses data documentation more from the researcher's view
o Section III interprets data documentation more from a curator's or
  librarian's perspective
o What do researchers really care about?
o Will each party see the other side's points and emphases?
CDL Curation and Publishing Services (http://www.cdlib.org)
• Create, edit, share, and save data management plans
• Open access scholarly publishing services: papers, journals, books,
  seminars & more
• Curation repository: store, manage, and share research data
• Create and manage persistent identifiers
• Open source add-in for Microsoft Excel as a data collection tool
• An infrastructure to publish and get credit for sharing research data
This slide is by Joan Starr, California Digital Library:
http://www.slideshare.net/joanstarr/dataset-metadata-tools-approaches-for-access-preservation?from_search=1
Data Publication
http://library.ucf.edu/ScholarlyCommunication/UCFResearchLifecycle.pdf
Data Set Related Services
o "Data Set (also called 'Dataset') Metadata" provides researchers
  consultation on:
  o Project and dataset documentation
  o Metadata standards (Common and Domain Specific)
  o Metadata schema customization
  o Controlled vocabularies and thesauri
  o Data curation tools and practices
o Assists in describing basic properties of your data and enriching
  metadata for your datasets
o Supports applying controlled vocabularies or optimizing keywords to
  enhance the search of your datasets
o Helps to prepare your metadata and data for deposit and preservation
o Scholarly Communication (http://library.ucf.edu/ScholarlyCommunication/)
o SC Contact Information (http://library.ucf.edu/ScholarlyCommunication/Contact.php)
o UCF Library Research Guides (http://guides.ucf.edu)
o Metadata Guide (http://guides.ucf.edu/metadata)
o Data Management Guide (http://guides.ucf.edu/data)
o Research and Information Services (http://library.ucf.edu/Reference)
FGDC/CSDGM Metadata
Abstract: United States Geological Survey, Saint Petersburg, Florida, Center
for Coastal and Watershed Studies…
Purpose: These data and information are intended for science researchers,
students…
Language: eng; USA
Citation
  Title: Biological data of field activity 08CRD01 (B-1-08-VI) in US Virgin
  Islands from 05/30/2008 to 06/13/2008
  Date
    Date: 2013-03-03
    Date Type: Publication Date
  Organisation Name: US Geological Survey (USGS) <http://www.usgs.gov/>,
  Coastal and Marine Geology (CMG) <http://walrus.wr.usgs.gov/>
  Role: Publisher
  Contact Info: …
Point Of Contact: …
Representation Type: Vector
Topic Category
Keyword Collection
  Keyword: EARTH SCIENCE > OCEANS
  Associated Thesaurus: Global Change Master Directory (GCMD)
  Keyword: Marine Geology
  Associated Thesaurus: USGS CMG InfoBank
Spatial Extent
  West Bounding Longitude: -65.75000
  East Bounding Longitude: -63.25000
  North Bounding Latitude: 18.75000
  South Bounding Latitude: 17.25000
Constraints: Please recognize the US Geological Survey (USGS) as the source
of this information. Physical materials are under controlled on-site access.
Some USGS information accessed through this means may be preliminary in
nature and presented without the approval of the Director of the USGS…
Legal Constraints
  Use Constraints: Other Restrictions
  Other Constraints: Use Constraints: Please recognize the US Geological
  Survey (USGS) as the source of this information. Physical materials are
  under controlled on-site access…
…
Distribution
  Distribution Format
    Format Name: ASCII
    Format Version:
    File Decompression Technique: No compression applied
o XML schema brings documentation into a single document, creates
  structured content about the data, and allows data interoperability and
  sharing
oIt can document comprehensive variable level information such as basic
data dictionary question text and question routing instructions
oData Documentation Initiative (DDI) a metadata specification for the
social and behavioral sciences It is an XML metadata standard for
documenting numeric data Detailed information is available
at httpwwwddiallianceorg
oProjects using the DDI (httpwwwddiallianceorgddi-at-workprojects)
oDDI-compliant data repository
o ICPSR - Inter-university Consortium for Political and Social Research
o Data deposit form httpswwwicpsrumicheducgi-binddf2
o UCF is a member of ICPSR
oUKDA - UK Data Archive
Field Labels
TitlePrincipal investigator(s)
Summary
Access notes
Dataset(s)
httpwwwicpsrumicheduicpsrwebNA
CJDstudies20363archive=NACJDampq=22
university+of+central+florida22amppermit
5B05D=AVAILABLEampx=-999ampy=-84
ICPSR Interuniversity
Consortium for
Political and
Social Research
Dataset(s)
DSO Study-Level Files
Documentation
Questionnairepdf
User guidepdf
DS1 Female Interviews
Documentation
Codebookpdf
hellip
Field Labels
Study description
Citation
Funding
Scope of studybull Subject terms
bull Smallest
geographic unit
bull Geographic
coverage
bull Time period
bull Date of collection
bull Unit of
observation
bull Universe
bull Data types
bull Data collection
notes
Methodologybull Study purpose
bull Study design
Field Labels
bull Sample
bull Mode of data collection
bull Description of variables
bull Response rates
bull Presence of common
scales
bull Extent of processing
Field Labels
Version(s)
Related publications
Variables
Utilities
bull Metadata exports
bull Download statistics
Variables
List all 1682 variables in this study
egID QUESTIONNAIRE ID NUMBER ISEX INTERVIEWER GENDER START INTERVIEW START TIME HHMM USE 24 HR CLOCK Q1A COUNTRY OF BIRTH Q1B STATE OF BIRTH - INITIALS OF STATEQ1C CITY OF BIRTH WRITE IN NOT APPQ1D YEARS LIVED IN USAQ1E RESIDENCY STATUSCHECK1 CHECKPOINT 1 BORN IN SAME METRO AREAQ2 HOW LONG LIVED IN THIS AREA hellip (httpwwwicpsrumicheduicpsrwebNACJDssv
dstudies20363variables)
httpwwwicpsrumicheduicpsrwebICPSRddi2studies20363
docDscrThe Document
Description
consists of
bibliographic
information
describing the
DDI-compliant
document
itself as a
whole
Included Fields
citation
bull titleStmt
bull prodStmt
bull verStmt
bull holdings
Included FieldsCitation
titlStmt
rspStmt
prodStmt
fundAg
grantNo
distStmt
biblCit
Holdings
stdyInfoSubject
Abstract
sumDscr
MethoddataColl
Notes
anlyInfo
dataAccssetAvail
useStmt
stdyDscr The Study
Description consists of
information about the
data collection study
or compilation that the
DDI-compliant
documentation file
describes This section
includes information
about how the study
should be cited who
collected or compiled
the data who
distributes the data
keywords about the
content of the data
summary (abstract) of
the content of the data
data collection methods
and processing etc
Included Fields
fileDscr
fileTxt
fileName
fileDscr
Data Files
Description
Information about
the data file(s)
that comprises a
collection This
section can be
repeated for
collections with
multiple files
oContext and participant details of interviews can be
oA descriptive header or summary page in transcripts or
field notes
oA structured data list
oXML mark-up of data for example
oText Encoding Initiative (TEI) to mark up interview
transcript
oQualitative Data Exchange Format (QuDEx) for
researcher annotations and data linking
oAnonymisation of textual data (eg replacing real names of people
organizations and locations with pseudonyms)
oFile naming
oMeaningful short names identify file types (eg interviews focus groups
field notes audio recordings) avoid space special characters avoid long
names
oOrganizing files in folders Create uniform and structured folder names based
on cases studies locations data types etc or the original anonymized
coded or annotated versions of data
oVersion control Version numbering in file names
oDocumentation Methodology description project plan interview guidelines
consent form templates data analyses and manipulation
o Example is from A NESSTAR FOR QUALITATIVE DATA BUILDING BLOCKS FOR DIGITAL FUTURES By Corti Louise et al available at httpdata-archiveacukmedia376907digitalfutures_dashish_21nov2012pdf
oData List
Interview ID
x001
x002
hellip
Text File Name
6124int001
6124int002
hellip
oCreate and generate metadata for your research data and
datasets in your research lifecycle to preserve the data in the
long run
oConsider what information is needed for the data to be
read and interpreted in the future
oUnderstand your funder requirements for data
documentation and metadata Funder requirements for NSF
GBMF IMLS NEH NIH and NOAA can be found at
httpsdmptoolorgguidance
oConsult available metadata standards in your field You may
refer to Common Metadata Standards and Domain Specific
Metadata Standards for details
oDescribe data and datasets created in your research lifecycle and
use software programs and tools to assist in data documentation
Assign or capture administrative descriptive technical structural
and preservation metadata for the data Some potential information
to document
oDescriptive metadata
oName of creator of data set
oName of author of document
oTitle of document
oFile name
oLocation of file
oSize of file
oStructural metadata
oFile relationships (eg child parent)
oTechnical metadata
oFormat (eg text SPSS Stata Excel tiff mpeg 3D Java FITS CIF)
oCompression or encoding algorithms
oEncryption and decryption keys
oSoftware (including release number) used to create or update the data
oHardware on which the data were created
oOperating systems in which the data were created
oApplication software in which the data were created
oAdministrative metadata
o Information about data creation (eg date)
o Information about subsequent updates transformation versioning
summarization
oDescriptions of migration and replication
o Information about other events that have affected the files
oPreservation metadata
oFile format (eg txt pdf doc rtf xls xml spv jpg fits)
oSignificant properties
oTechnical environment
oFixity information
oAdopt a thesauri in your field if applicable or compile a data dictionary for
your dataset
oObtain persistent identifiers (eg doi purl) for datasets if possible to ensure
data can be found in the future
oFor your full data management plan visit UCF Libraries Data Management
Guide Also refer to Digital Curation Centrersquos Checklist for a Data
Management Plan (httpwwwdccacuksitesdefaultfilesdocumentsresourceDMP_Checklist_2013pdf)
oCommon Metadata Standards
oDisciplinary Metadata Standards
oActivity Choose a dataset or a standard in your field to examine and critique
oSocial Science Dataset
oHumanities Dataset
oBiological Sciences Dataset
oBiotechnology Dataset
oGeospatial Dataset
oEarth Science Dataset
oPhysical Science Dataset
oOtherhellip
oDublin Core (DC) A general metadata standard for describing a wide range of
digital resources
o Dublin Core Metadata Element Set Version 11
(httpdublincoreorgdocumentsdces)
o 15 Elements Title Creator Subject or keyword Description Publisher Type Format
Identifier Source Language Relation Coverage Rights
o DCMI Metadata Terms (httpdublincoreorgdocumentsdcmi-terms)
o DC Qualifiers (httpdublincoreorgdocumentsusageguidequalifiersshtml)
o Encoded Archival Description (EAD)
o A standard for encoding archival finding aids with XML
oGovernment Information Locator Service (GILS)
o The Global Information Locator Service defines a core element set for government
information so that it can be more searchable and discoverable by the general public
oONIX for Books (ONline Information eXchange)
o An international standard for representing and communicating book industry product
oMARC Code List for Countries httpwwwlocgovmarccountries
oMARC Code List for Languages httpwwwlocgovmarclanguages
oMARC Source Codes for Vocabularies Rules and Schemes
httpwwwlocgovmarcsourcecodeformformsourcehtml
oFor digital and online resources
oInternet Media Types wwwianaorgassignmentsmedia-
typesindexhtml
oMODS Note Types httpwwwlocgovstandardsmodsmods-
noteshtml
oDCMI Type Vocabulary httpdublincoreorgdocumentsdcmi-
termsindexshtmlH7
o Subject Thesauri and Ontologies
o AGROVOC (Agricultural Organization of the United Nations Vocabulary)
o Astronomy Thesaurus
o CAB Thesaurus (for life sciences technology and social sciences)
o CIF dictionaries (for Physics)
o Eurovoc (European Union Thesaurus)
o Ethnographic Thesaurus
o Gene Ontology
o GeoNames
o Getty Institute Art and Architecture Thesaurus Online
o Getty Institute Thesaurus of Geographic Names
o ICD (International Classification of Diseases)
o Library of Congress Authorities for subject headings
o Library of Congress Thesaurus for Graphic Materials
o Logical Observation Identifiers Names and Codes (LOINC)
o MESH (Medical Subject Headings)
o Public Health Language
o Rare Books and Manuscripts Section (RBMS) Controlled Vocabularies
o RxNorm (for drugs)
o SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms)
o STW Thesaurus for Economics
o UNBIS Thesaurus
o UNESCO Thesaurus
o USDA National Agricultural Library Agriculture Thesaurus
Question Have you ever
used thesauri in your study
and research
Getty Union List of Artist Names
(ULAN)The ULAN includes proper names and
associated information about artists
Artists may be either individuals
(persons) or groups of individuals working
together (corporate bodies) Artists in
the ULAN generally represent creators
involved in the conception or production
of visual arts and architecture
Library of Congress Name
Authority File (LCNAF)
The LCNAF provides authoritative
data for names of persons
organizations events places and
titles
Virtual International
Authority File (VIAF)
The VIAFtrade (Virtual International
Authority File) combines multiple
name authority files into a single
OCLC-hosted name authority
service The goal of the service is to
lower the cost and increase the
utility of library authority files by
matching and linking widely-used
authority files and making that
information available on the Web
Web Ontology Language
(OWL)The OWL 2 Web Ontology Language is an
ontology language for the Semantic Web
with formally defined meaning OWL 2
ontologies provide classes properties
individuals and data values and are stored
as Semantic Web documents OWL 2
ontologies can be used along with
information written in RDF and OWL 2
ontologies themselves are primarily
exchanged as RDF documents
MADSRDFThe Metadata Authority Description
Schema (MADS) is an XML schema for an
element set that may be used to provide
metadata about authorized forms of
agents (people organizations) events
and terms (topics geographics genres
etc) MADSRDF
builds on MADSXML as a knowledge
organization system
Resource Description
Framework (RDF)RDF is a standard model for data
interchange on the Web RDF extends
the linking structure of the Web to use
URIs to name the relationship
between things as well as the two
ends of the link (this is usually
referred to as a ldquotriplerdquo) Using this
simple model it allows structured and
semi-structured data to be mixed
exposed and shared across different
applications
SKOS Simple Knowledge
Organization for the Web SKOS is a W3C recommendation
designed for representation of
thesauri classification
schemes taxonomies subject-
heading systems or any other
type of structured controlled
vocabularyLinked data
examplesbull FAST Faceted
Application of
Subject
Terminology
bull Dewey Decimal
Classification
bull Open Metadata
Registry (RDA
vocabularies)
bull Library of Congress
Linked Data
Service
hellip
OpenRefine (ex-Google Refine) is a powerful tool for working with messy data cleaning it transforming it from one format into another extending it with web services and linking it to databases like Freebasehttpopenrefineorg
Nesstar Publisher is a
free advanced data management program It can be used for the preparation of data and metadata Its DDI complianthttpwwwnesstarcomsoftwarepublisherhtml
QualAnon DSDR
Qualitative Data Anonymizer
This free transcript anonymizationtool is designed solely to de-identify qualitative interview transcriptshttpswwwicpsrumicheduicpsrwebDSDRtoolsanonymizejsp
Colectica for Microsoft Excel
A free tool to document your spreadsheet data using the Data Documentation Initiative (DDI) metadata format the open standard for data documentationhttpwwwcolecticacomsoftwarecolecticaforexcel
Schematron is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees It is a structural schema language expressed in XML using a small number of elements and XPathhttpxmlasccnetresourceschematronschematronhtml
Altova XMLSpy is an advanced XML editor for modeling editing transforming and debugging XML-related
technologieshttpwwwaltovacomxmlspy
html
ltoXygengt XML
Editor is an XML tool that supports all the XML schema languages The XSLT and XQuery support is enhanced with powerful debuggers and performance profilers You can use ltoXygengt XML Editor to work with all XML-based technologies including XML databases XProcpipelines and web serviceshttpwwwoxygenxmlcom
LabTrove is a free blogging
platform specifically designed for use in a research environment It aims to serve as a highly flexible electronic notebook and data management system by integrating with a labrsquos data-producing instruments researchers can describe an experiment and associate it with its data output at the time of capture rather than annotating after the fact httpwwwlabtroveorg
Kepler is a scientific workflow
modeling and management system that enables users regardless of programming experience to set up data analysis pipelines The software will assemble execute and document theof services and scripts that scientists with large-scale data use to execute researchhttpskepler-projectorg
DataCiteThe DataCite Consortium
provides a number of
services to support
efforts at increasing the
ease and prevalence of
data citationhttpwwwdataciteorg
DMPTool is an online service to enable researchers to create data management plans now required by many funding agencies and to receive tailored institutional guidance to help them in the processhttpsdmpcdliborg
oSection II addresses data documentation more from the
researcherrsquos view
oSection III interprets data documentation more from
a curator or librarians perspective
oWhat do researchers really care about
oWill each party see the other sidersquos points and
emphases
Create edit share and save
data management plans
Open access scholarly publishing services
papers journals books seminars amp more
Curation repository store manage and share research data
Create and manage
persistent identifiers
Open source add-in for Microsoft
Excel as a data collection tool
An infrastructure to publish and get credit
for sharing research data
CDL Curation and Publishing Services
httpwwwcdliborg
This slide is by Joan Starr California Digital Library httpwwwslidesharenetjoanstarrdataset-metadata-tools-approaches-for-access-preservationfrom_search=1
Data Publication
httplibraryucfeduScholarlyCommunicationUCFResearchLifecyclepdfData Set Related Services
oldquoData Set (also called lsquoDatasetrsquo) Metadatardquo provides
researchers consultation on
oProject and dataset documentation
oMetadata standards (Common and Domain Specific)
oMetadata schemas customization
oControlled vocabularies and thesauri
oData curation tools and practices
oAssists in describing basic properties of your data and enriching
metadata for your datasets
oSupports applying controlled vocabularies or optimizing keywords
to enhance the search of your datasets
oHelps to prepare your metadata and data for deposit and
preservation
oScholarly Communication (httplibraryucfeduScholarlyCommunication)
oSC Contact Information (httplibraryucfeduScholarlyCommunicationContactphp)
oUCF Library Research Guides (httpguidesucfedu)
oMetadata Guide (httpguidesucfedumetadata)
oData Management Guide (httpguidesucfedudata)
oResearch and Information Services (httplibraryucfeduReference)
Abstract: United States Geological Survey, Saint Petersburg, Florida, Center for Coastal and Watershed Studies…
Purpose: These data and information are intended for science researchers, students…
Language: eng; USA
Citation
  Title: Biological data of field activity 08CRD01 (B-1-08-VI) in US Virgin Islands from 05/30/2008 to 06/13/2008
  Date
    Date: 2013-03-03
    Date Type: Publication Date
  Organisation Name: US Geological Survey (USGS) <http://www.usgs.gov>, Coastal and Marine Geology (CMG) <http://walrus.wr.usgs.gov>
  Role: Publisher
  Contact Info: …
Point Of Contact: …
Representation Type: Vector
Topic Category
Keyword Collection
  Keyword: EARTH SCIENCE > OCEANS
  Associated Thesaurus: Global Change Master Directory (GCMD)
  Keyword: Marine Geology
  Associated Thesaurus: USGS CMG InfoBank
Spatial Extent
  West Bounding Longitude: -65.75000
  East Bounding Longitude: -63.25000
  North Bounding Latitude: 18.75000
  South Bounding Latitude: 17.25000
FGDC/CSDGM Metadata
Constraints: Please recognize the US Geological Survey (USGS) as the source of this information. Physical materials are under controlled on-site access. Some USGS information accessed through this means may be preliminary in nature and presented without the approval of the Director of the USGS…
Legal Constraints
  Use Constraints: Other Restrictions
  Other Constraints: Use Constraints: Please recognize the US Geological Survey (USGS) as the source of this information. Physical materials are under controlled on-site access…
…
Distribution
  Distribution Format
    Format Name: ASCII
    Format Version:
    File Decompression Technique: No compression applied
o An XML schema brings documentation into a single document, creates
structured content about the data, and allows data interoperability and
sharing
o It can document comprehensive variable-level information, such as a basic
data dictionary, question text, and question routing instructions
o Data Documentation Initiative (DDI): a metadata specification for the
social and behavioral sciences. It is an XML metadata standard for
documenting numeric data. Detailed information is available
at http://www.ddialliance.org
o Projects using the DDI (http://www.ddialliance.org/ddi-at-work/projects)
o DDI-compliant data repositories
  o ICPSR - Inter-university Consortium for Political and Social Research
    o Data deposit form: https://www.icpsr.umich.edu/cgi-bin/ddf2
    o UCF is a member of ICPSR
  o UKDA - UK Data Archive
ICPSR: Inter-university Consortium for Political and Social Research
http://www.icpsr.umich.edu/icpsrweb/NACJD/studies/20363?archive=NACJD&q=%22university+of+central+florida%22&permit%5B0%5D=AVAILABLE&x=-999&y=-84
Field Labels
o Title
o Principal investigator(s)
o Summary
o Access notes
o Dataset(s)
Dataset(s)
o DS0: Study-Level Files
  o Documentation: Questionnaire.pdf, User guide.pdf
o DS1: Female Interviews
  o Documentation: Codebook.pdf
o …
Field Labels
Study description
Citation
Funding
Scope of study
• Subject terms
• Smallest geographic unit
• Geographic coverage
• Time period
• Date of collection
• Unit of observation
• Universe
• Data types
• Data collection notes
Methodology
• Study purpose
• Study design
• Sample
• Mode of data collection
• Description of variables
• Response rates
• Presence of common scales
• Extent of processing
Version(s)
Related publications
Variables
Utilities
• Metadata exports
• Download statistics
Variables
List all 1682 variables in this study, e.g.:
ID: QUESTIONNAIRE ID NUMBER
ISEX: INTERVIEWER GENDER
START: INTERVIEW START TIME HHMM USE 24 HR CLOCK
Q1A: COUNTRY OF BIRTH
Q1B: STATE OF BIRTH - INITIALS OF STATE
Q1C: CITY OF BIRTH WRITE IN NOT APP
Q1D: YEARS LIVED IN USA
Q1E: RESIDENCY STATUS
CHECK1: CHECKPOINT 1 BORN IN SAME METRO AREA
Q2: HOW LONG LIVED IN THIS AREA
…
(http://www.icpsr.umich.edu/icpsrweb/NACJD/ssvd/studies/20363/variables)
http://www.icpsr.umich.edu/icpsrweb/ICPSR/ddi2/studies/20363
docDscr: The Document Description consists of bibliographic information describing the DDI-compliant document itself as a whole.
Included Fields
citation
• titlStmt
• prodStmt
• verStmt
• holdings
Included Fields
o citation: titlStmt, rspStmt, prodStmt, fundAg, grantNo, distStmt, biblCit, holdings
o stdyInfo: subject, abstract, sumDscr
o method: dataColl, notes, anlyInfo
o dataAccs: setAvail, useStmt
stdyDscr: The Study Description consists of information about the data collection, study, or compilation that the DDI-compliant documentation file describes. This section includes information about how the study should be cited, who collected or compiled the data, who distributes the data, keywords about the content of the data, a summary (abstract) of the content of the data, data collection methods and processing, etc.
Included Fields
o fileDscr
  o fileTxt
    o fileName
fileDscr: The Data Files Description gives information about the data file(s) that comprise a collection. This section can be repeated for collections with multiple files.
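The three sections described above (docDscr, stdyDscr, fileDscr) can be sketched as a skeletal XML document. This is a minimal illustration in Python, not a valid DDI instance: it omits the DDI namespace and most elements, and the study title and file names are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_codebook(title, file_names):
    """Return a skeletal DDI-Codebook-style element tree.

    Only the three sections named on the slides are filled in; a real
    record needs the DDI namespace, many more elements, and validation.
    """
    code_book = ET.Element("codeBook")

    # docDscr: bibliographic information about the DDI document itself
    doc_citation = ET.SubElement(ET.SubElement(code_book, "docDscr"), "citation")
    doc_titl = ET.SubElement(ET.SubElement(doc_citation, "titlStmt"), "titl")
    doc_titl.text = "DDI documentation for: " + title

    # stdyDscr: information about the study being described
    stdy_citation = ET.SubElement(ET.SubElement(code_book, "stdyDscr"), "citation")
    stdy_titl = ET.SubElement(ET.SubElement(stdy_citation, "titlStmt"), "titl")
    stdy_titl.text = title

    # fileDscr: repeated once per data file in the collection
    for name in file_names:
        file_txt = ET.SubElement(ET.SubElement(code_book, "fileDscr"), "fileTxt")
        ET.SubElement(file_txt, "fileName").text = name

    return code_book


# Hypothetical study title and file names, for illustration only.
codebook = build_codebook("Example interview study", ["ds1_female.txt", "ds2_male.txt"])
print(ET.tostring(codebook, encoding="unicode"))
```

Repeating fileDscr once per file mirrors the note that the section can be repeated for collections with multiple files.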
o Context and participant details of interviews can be documented in:
  o A descriptive header or summary page in transcripts or
    field notes
  o A structured data list
o XML mark-up of data, for example:
  o Text Encoding Initiative (TEI) to mark up interview
    transcripts
  o Qualitative Data Exchange Format (QuDEx) for
    researcher annotations and data linking
o Anonymisation of textual data (e.g. replacing real names of people,
organizations, and locations with pseudonyms)
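The anonymisation step above (replacing real names with pseudonyms) can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the names, pseudonyms, and transcript sentence are hypothetical, and a literal find-and-replace will miss misspellings, nicknames, and indirect identifiers, so manual review is still needed.

```python
import re


def pseudonymise(text, replacements):
    """Replace each real name with its pseudonym, matching whole words only."""
    # Longest names first, so a full name is handled before any shorter
    # name that happens to be contained in it.
    for real in sorted(replacements, key=len, reverse=True):
        pattern = r"\b" + re.escape(real) + r"\b"
        # A function is used as the replacement so pseudonyms are inserted
        # literally, without backslash-escape processing.
        text = re.sub(pattern, lambda _match, repl=replacements[real]: repl, text)
    return text


# Hypothetical mapping; keep it separate from the shared transcripts.
mapping = {
    "Jane Smith": "[Participant A]",
    "Acme Hospital": "[Hospital 1]",
    "Orlando": "[City 1]",
}

transcript = "Jane Smith said she moved to Orlando to work at Acme Hospital."
print(pseudonymise(transcript, mapping))
```

Keeping the mapping in a separate, restricted file lets the research team re-identify cases if needed while the shared transcripts stay de-identified.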
o File naming
  o Meaningful short names: identify file types (e.g. interviews, focus groups,
    field notes, audio recordings); avoid spaces and special characters; avoid long
    names
o Organizing files in folders: Create uniform and structured folder names based
on cases, studies, locations, data types, etc., or on the original, anonymized,
coded, or annotated versions of data
o Version control: Version numbering in file names
o Documentation: Methodology description, project plan, interview guidelines,
consent form templates, data analyses and manipulation
o Example is from A NESSTAR FOR QUALITATIVE DATA: BUILDING BLOCKS FOR DIGITAL FUTURES, by Corti, Louise, et al., available at http://data-archive.ac.uk/media/376907/digitalfutures_dashish_21nov2012.pdf
o Data List

Interview ID    Text File Name
x001            6124int001
x002            6124int002
…               …
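The file naming, version numbering, and data list conventions above can be combined in a short script. The sketch below is one possible approach, not the method used in the cited example: the `_vNN` version suffix and the CSV layout are assumptions, while the study number 6124 and the `int` prefix follow the data list shown above.

```python
import csv
import io
import re

# Letters, digits, underscore, and hyphen only: no spaces or special characters.
SAFE_NAME = re.compile(r"^[A-Za-z0-9_-]+$")


def make_file_name(study, kind, seq, version):
    """Build a short name such as 6124int001_v01 (the _vNN suffix is an assumption)."""
    name = f"{study}{kind}{seq:03d}_v{version:02d}"
    if not SAFE_NAME.match(name):
        raise ValueError(f"unsafe file name: {name!r}")
    return name


def data_list(study, kind, count, version=1):
    """Return CSV text pairing each interview ID with its text file name."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Interview ID", "Text File Name"])
    for seq in range(1, count + 1):
        writer.writerow([f"x{seq:03d}", make_file_name(study, kind, seq, version)])
    return buffer.getvalue()


print(data_list(6124, "int", 2))
```

Generating names from one function keeps them uniform across a project and rejects names containing spaces or special characters before any file is created.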
o Create and generate metadata for your research data and
datasets in your research lifecycle to preserve the data in the
long run
o Consider what information is needed for the data to be
read and interpreted in the future
o Understand your funder requirements for data
documentation and metadata. Funder requirements for NSF,
GBMF, IMLS, NEH, NIH, and NOAA can be found at
https://dmptool.org/guidance
o Consult available metadata standards in your field. You may
refer to Common Metadata Standards and Domain Specific
Metadata Standards for details
o Describe data and datasets created in your research lifecycle, and
use software programs and tools to assist in data documentation.
Assign or capture administrative, descriptive, technical, structural,
and preservation metadata for the data. Some potential information
to document:
  o Descriptive metadata
    o Name of creator of data set
    o Name of author of document
    o Title of document
    o File name
    o Location of file
    o Size of file
  o Structural metadata
    o File relationships (e.g. child, parent)
  o Technical metadata
    o Format (e.g. text, SPSS, Stata, Excel, tiff, mpeg, 3D, Java, FITS, CIF)
    o Compression or encoding algorithms
    o Encryption and decryption keys
    o Software (including release number) used to create or update the data
    o Hardware on which the data were created
    o Operating systems in which the data were created
    o Application software in which the data were created
  o Administrative metadata
    o Information about data creation (e.g. date)
    o Information about subsequent updates, transformation, versioning,
      summarization
    o Descriptions of migration and replication
    o Information about other events that have affected the files
  o Preservation metadata
    o File format (e.g. txt, pdf, doc, rtf, xls, xml, spv, jpg, fits)
    o Significant properties
    o Technical environment
    o Fixity information
o Adopt a thesaurus in your field if applicable, or compile a data dictionary for
your dataset
o Obtain persistent identifiers (e.g. DOI, PURL) for datasets if possible to ensure
data can be found in the future
o For your full data management plan, visit the UCF Libraries Data Management
Guide. Also refer to the Digital Curation Centre's Checklist for a Data
Management Plan (http://www.dcc.ac.uk/sites/default/files/documents/resource/DMP_Checklist_2013.pdf)
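Several of the items listed above (file name, location, size, format, and fixity information) can be captured automatically rather than typed by hand. The following Python sketch shows one way to do so. The field names are illustrative and not taken from any metadata standard, SHA-256 is just one common choice of checksum, and the sample file is a throwaway created only so the example runs anywhere.

```python
import hashlib
import os
import tempfile
from datetime import datetime, timezone


def describe_file(path, chunk_size=65536):
    """Collect a few descriptive, technical, and fixity facts about one file."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large data files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    stats = os.stat(path)
    return {
        "file_name": os.path.basename(path),
        "location": os.path.dirname(os.path.abspath(path)),
        "size_bytes": stats.st_size,
        "format": os.path.splitext(path)[1].lstrip(".").lower(),
        "modified_utc": datetime.fromtimestamp(stats.st_mtime, timezone.utc).isoformat(),
        "sha256": digest.hexdigest(),  # fixity: recompute later and compare
    }


# Demonstration on a temporary file with placeholder content.
with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "6124int001.txt")
    with open(sample, "wb") as handle:
        handle.write(b"Interview transcript placeholder\n")
    record = describe_file(sample)

for key, value in record.items():
    print(f"{key}: {value}")
```

Recomputing the checksum after a copy or migration and comparing it with the stored value is a simple way to confirm that a file has not changed.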
o Common Metadata Standards
o Disciplinary Metadata Standards
o Activity: Choose a dataset or a standard in your field to examine and critique
  o Social Science Dataset
  o Humanities Dataset
  o Biological Sciences Dataset
  o Biotechnology Dataset
  o Geospatial Dataset
  o Earth Science Dataset
  o Physical Science Dataset
  o Other…
o Dublin Core (DC): A general metadata standard for describing a wide range of
digital resources
  o Dublin Core Metadata Element Set, Version 1.1
    (http://dublincore.org/documents/dces/)
  o 15 Elements: Title, Creator, Subject or keyword, Description, Publisher, Contributor, Date, Type, Format,
    Identifier, Source, Language, Relation, Coverage, Rights
  o DCMI Metadata Terms (http://dublincore.org/documents/dcmi-terms/)
  o DC Qualifiers (http://dublincore.org/documents/usageguide/qualifiers.shtml)
o Encoded Archival Description (EAD)
  o A standard for encoding archival finding aids with XML
o Government Information Locator Service (GILS)
  o The Global Information Locator Service defines a core element set for government
    information so that it can be more searchable and discoverable by the general public
o ONIX for Books (ONline Information eXchange)
  o An international standard for representing and communicating book industry product information
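As an illustration of the Dublin Core element set described above, the sketch below builds a small XML record in Python from a dictionary of values. The `record` wrapper element is an assumption (Dublin Core itself does not prescribe one), and the sample values are taken from the USGS dataset example earlier in this presentation.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def dublin_core_record(fields):
    """Build a flat XML record from a dict of DC element names to values."""
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")  # wrapper element name is an assumption
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise KeyError(f"not one of the 15 DC elements: {name}")
        # Every element is optional and repeatable: accept one value or a list.
        for value in ([values] if isinstance(values, str) else values):
            ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return record


dc_record = dublin_core_record({
    "title": "Biological data of field activity 08CRD01 (B-1-08-VI) in US Virgin Islands",
    "publisher": "US Geological Survey (USGS)",
    "date": "2013-03-03",
    "subject": ["EARTH SCIENCE > OCEANS", "Marine Geology"],
    "language": "eng",
    "format": "ASCII",
})
print(ET.tostring(dc_record, encoding="unicode"))
```

The two subject values show that an element may be repeated, which is how multiple keywords are usually recorded in simple Dublin Core.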
o MARC Code List for Countries: http://www.loc.gov/marc/countries/
o MARC Code List for Languages: http://www.loc.gov/marc/languages/
o MARC Source Codes for Vocabularies, Rules, and Schemes:
http://www.loc.gov/marc/sourcecode/form/formsource.html
o For digital and online resources
  o Internet Media Types: www.iana.org/assignments/media-types/index.html
  o MODS Note Types: http://www.loc.gov/standards/mods/mods-notes.html
  o DCMI Type Vocabulary: http://dublincore.org/documents/dcmi-terms/index.shtml#H7
o Subject Thesauri and Ontologies
o AGROVOC (Food and Agriculture Organization of the United Nations vocabulary)
o Astronomy Thesaurus
o CAB Thesaurus (for life sciences technology and social sciences)
o CIF dictionaries (for Physics)
o Eurovoc (European Union Thesaurus)
o Ethnographic Thesaurus
o Gene Ontology
o GeoNames
o Getty Institute Art and Architecture Thesaurus Online
o Getty Institute Thesaurus of Geographic Names
o ICD (International Classification of Diseases)
o Library of Congress Authorities for subject headings
o Library of Congress Thesaurus for Graphic Materials
o Logical Observation Identifiers Names and Codes (LOINC)
o MeSH (Medical Subject Headings)
o Public Health Language
o Rare Books and Manuscripts Section (RBMS) Controlled Vocabularies
o RxNorm (for drugs)
o SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms)
o STW Thesaurus for Economics
o UNBIS Thesaurus
o UNESCO Thesaurus
o USDA National Agricultural Library Agriculture Thesaurus
Question: Have you ever used thesauri in your study and research?
Getty Union List of Artist Names (ULAN): The ULAN includes proper names and associated information about artists. Artists may be either individuals (persons) or groups of individuals working together (corporate bodies). Artists in the ULAN generally represent creators involved in the conception or production of visual arts and architecture.

Library of Congress Name Authority File (LCNAF): The LCNAF provides authoritative data for names of persons, organizations, events, places, and titles.

Virtual International Authority File (VIAF): The VIAF™ (Virtual International Authority File) combines multiple name authority files into a single OCLC-hosted name authority service. The goal of the service is to lower the cost and increase the utility of library authority files by matching and linking widely-used authority files and making that information available on the Web.
Web Ontology Language (OWL): The OWL 2 Web Ontology Language is an ontology language for the Semantic Web with formally defined meaning. OWL 2 ontologies provide classes, properties, individuals, and data values and are stored as Semantic Web documents. OWL 2 ontologies can be used along with information written in RDF, and OWL 2 ontologies themselves are primarily exchanged as RDF documents.
MADS/RDF: The Metadata Authority Description Schema (MADS) is an XML schema for an element set that may be used to provide metadata about authorized forms of agents (people, organizations), events, and terms (topics, geographics, genres, etc.). MADS/RDF builds on MADS/XML as a knowledge organization system.
Resource Description Framework (RDF): RDF is a standard model for data interchange on the Web. RDF extends the linking structure of the Web to use URIs to name the relationship between things as well as the two ends of the link (this is usually referred to as a "triple"). Using this simple model, it allows structured and semi-structured data to be mixed, exposed, and shared across different applications.
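To make the idea of a "triple" concrete, the sketch below represents two statements about a dataset as subject, predicate, and object, then writes them in the N-Triples text format. It uses only the Python standard library rather than an RDF toolkit, handles plain string literals only, and the example.org URIs are hypothetical; the predicates are the Dublin Core title and publisher elements.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str             # URI naming one end of the link
    predicate: str           # URI naming the relationship
    obj: str                 # URI or literal at the other end of the link
    obj_is_uri: bool = True


def to_ntriples(triples):
    """Serialize triples as N-Triples lines (plain literals, no datatypes)."""
    lines = []
    for t in triples:
        if t.obj_is_uri:
            obj = f"<{t.obj}>"
        else:
            # Escape backslashes and double quotes inside the literal.
            escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        lines.append(f"<{t.subject}> <{t.predicate}> {obj} .")
    return "\n".join(lines)


DC = "http://purl.org/dc/elements/1.1/"
dataset = "http://example.org/dataset/6124"  # hypothetical identifier

graph = [
    Triple(dataset, DC + "title", "Example interview study", obj_is_uri=False),
    Triple(dataset, DC + "publisher", "http://example.org/org/archive"),
]
print(to_ntriples(graph))
```

Because the relationship itself is named by a URI, another application that knows the Dublin Core vocabulary can interpret these statements without knowing anything else about the source system.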
SKOS: Simple Knowledge Organization for the Web. SKOS is a W3C recommendation designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary.

Linked data examples
• FAST: Faceted Application of Subject Terminology
• Dewey Decimal Classification
• Open Metadata Registry (RDA vocabularies)
• Library of Congress Linked Data Service
…
OpenRefine (ex-Google Refine) is a powerful tool for working with messy data: cleaning it, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase. http://openrefine.org

Nesstar Publisher is a free advanced data management program. It can be used for the preparation of data and metadata. It's DDI compliant. http://www.nesstar.com/software/publisher.html

QualAnon: DSDR Qualitative Data Anonymizer. This free transcript anonymization tool is designed solely to de-identify qualitative interview transcripts. https://www.icpsr.umich.edu/icpsrweb/DSDR/tools/anonymize.jsp

Colectica for Microsoft Excel: A free tool to document your spreadsheet data using the Data Documentation Initiative (DDI) metadata format, the open standard for data documentation. http://www.colectica.com/software/colecticaforexcel

Schematron is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees. It is a structural schema language expressed in XML using a small number of elements and XPath. http://xml.ascc.net/resource/schematron/schematron.html
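The rule-based idea behind Schematron (assert that a pattern is present within a given context) can be imitated in a few lines of Python. The sketch below is not Schematron and does not use a Schematron processor; it relies on the limited XPath subset in Python's standard library, and the two rules and the sample document are hypothetical, loosely modeled on the DDI elements shown earlier.

```python
import xml.etree.ElementTree as ET

# Each rule: (context path, path that must match inside the context, message).
RULES = [
    (".//stdyDscr/citation", "titlStmt/titl", "a study citation needs a title"),
    (".//fileDscr/fileTxt", "fileName", "each file description needs a file name"),
]


def check(xml_text, rules=RULES):
    """Return the messages of all failed assertions, in rule order."""
    root = ET.fromstring(xml_text)
    failures = []
    for context_path, required_path, message in rules:
        for context in root.findall(context_path):
            # The assertion fails when the required pattern is absent.
            if context.find(required_path) is None:
                failures.append(message)
    return failures


sample = """
<codeBook>
  <stdyDscr><citation><titlStmt><titl>Example study</titl></titlStmt></citation></stdyDscr>
  <fileDscr><fileTxt></fileTxt></fileDscr>
</codeBook>
"""
for problem in check(sample):
    print("FAILED:", problem)
```

A real Schematron schema expresses the same kind of rule declaratively in XML, so it can be shared alongside a metadata profile and run by any conforming processor.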
Altova XMLSpy is an advanced XML editor for modeling, editing, transforming, and debugging XML-related technologies. http://www.altova.com/xmlspy.html

<oXygen/> XML Editor is an XML tool that supports all the XML schema languages. The XSLT and XQuery support is enhanced with powerful debuggers and performance profilers. You can use <oXygen/> XML Editor to work with all XML-based technologies, including XML databases, XProc pipelines, and web services. http://www.oxygenxml.com

LabTrove is a free blogging platform specifically designed for use in a research environment. It aims to serve as a highly flexible electronic notebook and data management system. By integrating with a lab's data-producing instruments, researchers can describe an experiment and associate it with its data output at the time of capture, rather than annotating after the fact. http://www.labtrove.org

Kepler is a scientific workflow modeling and management system that enables users, regardless of programming experience, to set up data analysis pipelines. The software will assemble, execute, and document the services and scripts that scientists with large-scale data use to execute research. https://kepler-project.org

DataCite: The DataCite Consortium provides a number of services to support efforts at increasing the ease and prevalence of data citation. http://www.datacite.org

DMPTool is an online service to enable researchers to create data management plans, now required by many funding agencies, and to receive tailored institutional guidance to help them in the process. https://dmp.cdlib.org
oSection II addresses data documentation more from the
researcherrsquos view
oSection III interprets data documentation more from
a curator or librarians perspective
oWhat do researchers really care about
oWill each party see the other sidersquos points and
emphases
Create edit share and save
data management plans
Open access scholarly publishing services
papers journals books seminars amp more
Curation repository store manage and share research data
Create and manage
persistent identifiers
Open source add-in for Microsoft
Excel as a data collection tool
An infrastructure to publish and get credit
for sharing research data
CDL Curation and Publishing Services
httpwwwcdliborg
This slide is by Joan Starr California Digital Library httpwwwslidesharenetjoanstarrdataset-metadata-tools-approaches-for-access-preservationfrom_search=1
Data Publication
httplibraryucfeduScholarlyCommunicationUCFResearchLifecyclepdfData Set Related Services
oldquoData Set (also called lsquoDatasetrsquo) Metadatardquo provides
researchers consultation on
oProject and dataset documentation
oMetadata standards (Common and Domain Specific)
oMetadata schemas customization
oControlled vocabularies and thesauri
oData curation tools and practices
oAssists in describing basic properties of your data and enriching
metadata for your datasets
oSupports applying controlled vocabularies or optimizing keywords
to enhance the search of your datasets
oHelps to prepare your metadata and data for deposit and
preservation
oScholarly Communication (httplibraryucfeduScholarlyCommunication)
oSC Contact Information (httplibraryucfeduScholarlyCommunicationContactphp)
oUCF Library Research Guides (httpguidesucfedu)
oMetadata Guide (httpguidesucfedumetadata)
oData Management Guide (httpguidesucfedudata)
oResearch and Information Services (httplibraryucfeduReference)
Abstract United States Geological Survey Saint Petersburg Florida Center for Coastal and Watershed
Studieshellip
Purpose These data and information are intended for science researchers studentshellip
Language eng USA
Citation
Title Biological data of field activity 08CRD01 (B-1-08-VI) in US Virgin Islands from 05302008 to 06132008
Date
Date 2013-03-03
Date Type Publication Date
Organisation Name US Geological Survey (USGS) lthttpwwwusgsgovgt Coastal and Marine Geology
(CMG) lthttpwalruswrusgsgovgt
Role Publisher
Contact Info hellip
Point Of Contact hellip
Representation Type Vector
Topic Category
Keyword Collection
Keyword EARTH SCIENCE gt OCEANS
Associated Thesaurus Global Change Master Directory (GCMD)
Keyword Marine Geology
Associated Thesaurus USGS CMG InfoBank
Spatial Extent
West Bounding Longitude -6575000
East Bounding Longitude -6325000
North Bounding Latitude 1875000
South Bounding Latitude 1725000
FGDCCSDGM
Metadata
Constraints Please recognize the US Geological Survey (USGS) as the source of this information Physical materials are under controlled on-site access Some USGS information accessed through this means may be preliminary in nature and presented without the approval of the Director of the USGShellip
Legal Constraints
Use Constraints Other Restrictions
Other Constraints Use Constraints Please recognize the US Geological Survey (USGS) as the source of this information Physical materials are under controlled on-site accesshellip
hellip
Distribution
Distribution Format
Format Name ASCII
Format Version
File Decompression Technique No compression applied
oXML schema brings documentation into a single document creates
structured content about the data and allows data interoperability and
sharing
oIt can document comprehensive variable level information such as basic
data dictionary question text and question routing instructions
oData Documentation Initiative (DDI) a metadata specification for the
social and behavioral sciences It is an XML metadata standard for
documenting numeric data Detailed information is available
at httpwwwddiallianceorg
oProjects using the DDI (httpwwwddiallianceorgddi-at-workprojects)
oDDI-compliant data repository
o ICPSR - Inter-university Consortium for Political and Social Research
o Data deposit form httpswwwicpsrumicheducgi-binddf2
o UCF is a member of ICPSR
oUKDA - UK Data Archive
Field Labels
TitlePrincipal investigator(s)
Summary
Access notes
Dataset(s)
httpwwwicpsrumicheduicpsrwebNA
CJDstudies20363archive=NACJDampq=22
university+of+central+florida22amppermit
5B05D=AVAILABLEampx=-999ampy=-84
ICPSR Interuniversity
Consortium for
Political and
Social Research
Dataset(s)
DSO Study-Level Files
Documentation
Questionnairepdf
User guidepdf
DS1 Female Interviews
Documentation
Codebookpdf
hellip
Field Labels
Study description
Citation
Funding
Scope of studybull Subject terms
bull Smallest
geographic unit
bull Geographic
coverage
bull Time period
bull Date of collection
bull Unit of
observation
bull Universe
bull Data types
bull Data collection
notes
Methodologybull Study purpose
bull Study design
Field Labels
bull Sample
bull Mode of data collection
bull Description of variables
bull Response rates
bull Presence of common
scales
bull Extent of processing
Field Labels
Version(s)
Related publications
Variables
Utilities
bull Metadata exports
bull Download statistics
Variables
List all 1682 variables in this study
egID QUESTIONNAIRE ID NUMBER ISEX INTERVIEWER GENDER START INTERVIEW START TIME HHMM USE 24 HR CLOCK Q1A COUNTRY OF BIRTH Q1B STATE OF BIRTH - INITIALS OF STATEQ1C CITY OF BIRTH WRITE IN NOT APPQ1D YEARS LIVED IN USAQ1E RESIDENCY STATUSCHECK1 CHECKPOINT 1 BORN IN SAME METRO AREAQ2 HOW LONG LIVED IN THIS AREA hellip (httpwwwicpsrumicheduicpsrwebNACJDssv
dstudies20363variables)
httpwwwicpsrumicheduicpsrwebICPSRddi2studies20363
docDscrThe Document
Description
consists of
bibliographic
information
describing the
DDI-compliant
document
itself as a
whole
Included Fields
citation
bull titleStmt
bull prodStmt
bull verStmt
bull holdings
Included FieldsCitation
titlStmt
rspStmt
prodStmt
fundAg
grantNo
distStmt
biblCit
Holdings
stdyInfoSubject
Abstract
sumDscr
MethoddataColl
Notes
anlyInfo
dataAccssetAvail
useStmt
stdyDscr The Study
Description consists of
information about the
data collection study
or compilation that the
DDI-compliant
documentation file
describes This section
includes information
about how the study
should be cited who
collected or compiled
the data who
distributes the data
keywords about the
content of the data
summary (abstract) of
the content of the data
data collection methods
and processing etc
Included Fields
fileDscr
fileTxt
fileName
fileDscr
Data Files
Description
Information about
the data file(s)
that comprises a
collection This
section can be
repeated for
collections with
multiple files
oContext and participant details of interviews can be
oA descriptive header or summary page in transcripts or
field notes
oA structured data list
oXML mark-up of data for example
oText Encoding Initiative (TEI) to mark up interview
transcript
oQualitative Data Exchange Format (QuDEx) for
researcher annotations and data linking
oAnonymisation of textual data (eg replacing real names of people
organizations and locations with pseudonyms)
oFile naming
oMeaningful short names identify file types (eg interviews focus groups
field notes audio recordings) avoid space special characters avoid long
names
oOrganizing files in folders Create uniform and structured folder names based
on cases studies locations data types etc or the original anonymized
coded or annotated versions of data
oVersion control Version numbering in file names
oDocumentation Methodology description project plan interview guidelines
consent form templates data analyses and manipulation
o Example is from A NESSTAR FOR QUALITATIVE DATA BUILDING BLOCKS FOR DIGITAL FUTURES By Corti Louise et al available at httpdata-archiveacukmedia376907digitalfutures_dashish_21nov2012pdf
oData List
Interview ID
x001
x002
hellip
Text File Name
6124int001
6124int002
hellip
oCreate and generate metadata for your research data and
datasets in your research lifecycle to preserve the data in the
long run
oConsider what information is needed for the data to be
read and interpreted in the future
oUnderstand your funder requirements for data
documentation and metadata Funder requirements for NSF
GBMF IMLS NEH NIH and NOAA can be found at
httpsdmptoolorgguidance
oConsult available metadata standards in your field You may
refer to Common Metadata Standards and Domain Specific
Metadata Standards for details
oDescribe data and datasets created in your research lifecycle and
use software programs and tools to assist in data documentation
Assign or capture administrative descriptive technical structural
and preservation metadata for the data Some potential information
to document
oDescriptive metadata
oName of creator of data set
oName of author of document
oTitle of document
oFile name
oLocation of file
oSize of file
oStructural metadata
oFile relationships (eg child parent)
oTechnical metadata
oFormat (eg text SPSS Stata Excel tiff mpeg 3D Java FITS CIF)
oCompression or encoding algorithms
oEncryption and decryption keys
oSoftware (including release number) used to create or update the data
oHardware on which the data were created
oOperating systems in which the data were created
oApplication software in which the data were created
oAdministrative metadata
o Information about data creation (eg date)
o Information about subsequent updates transformation versioning
summarization
oDescriptions of migration and replication
o Information about other events that have affected the files
oPreservation metadata
oFile format (eg txt pdf doc rtf xls xml spv jpg fits)
oSignificant properties
oTechnical environment
oFixity information
oAdopt a thesauri in your field if applicable or compile a data dictionary for
your dataset
oObtain persistent identifiers (eg doi purl) for datasets if possible to ensure
data can be found in the future
oFor your full data management plan visit UCF Libraries Data Management
Guide Also refer to Digital Curation Centrersquos Checklist for a Data
Management Plan (httpwwwdccacuksitesdefaultfilesdocumentsresourceDMP_Checklist_2013pdf)
oCommon Metadata Standards
oDisciplinary Metadata Standards
oActivity Choose a dataset or a standard in your field to examine and critique
oSocial Science Dataset
oHumanities Dataset
oBiological Sciences Dataset
oBiotechnology Dataset
oGeospatial Dataset
oEarth Science Dataset
oPhysical Science Dataset
oOtherhellip
oDublin Core (DC) A general metadata standard for describing a wide range of
digital resources
o Dublin Core Metadata Element Set Version 11
(httpdublincoreorgdocumentsdces)
o 15 Elements Title Creator Subject or keyword Description Publisher Type Format
Identifier Source Language Relation Coverage Rights
o DCMI Metadata Terms (httpdublincoreorgdocumentsdcmi-terms)
o DC Qualifiers (httpdublincoreorgdocumentsusageguidequalifiersshtml)
o Encoded Archival Description (EAD)
o A standard for encoding archival finding aids with XML
oGovernment Information Locator Service (GILS)
o The Global Information Locator Service defines a core element set for government
information so that it can be more searchable and discoverable by the general public
oONIX for Books (ONline Information eXchange)
o An international standard for representing and communicating book industry product
oMARC Code List for Countries httpwwwlocgovmarccountries
oMARC Code List for Languages httpwwwlocgovmarclanguages
oMARC Source Codes for Vocabularies Rules and Schemes
httpwwwlocgovmarcsourcecodeformformsourcehtml
oFor digital and online resources
oInternet Media Types wwwianaorgassignmentsmedia-
typesindexhtml
oMODS Note Types httpwwwlocgovstandardsmodsmods-
noteshtml
oDCMI Type Vocabulary httpdublincoreorgdocumentsdcmi-
termsindexshtmlH7
o Subject Thesauri and Ontologies
o AGROVOC (Agricultural Organization of the United Nations Vocabulary)
o Astronomy Thesaurus
o CAB Thesaurus (for life sciences technology and social sciences)
o CIF dictionaries (for Physics)
o Eurovoc (European Union Thesaurus)
o Ethnographic Thesaurus
o Gene Ontology
o GeoNames
o Getty Institute Art and Architecture Thesaurus Online
o Getty Institute Thesaurus of Geographic Names
o ICD (International Classification of Diseases)
o Library of Congress Authorities for subject headings
o Library of Congress Thesaurus for Graphic Materials
o Logical Observation Identifiers Names and Codes (LOINC)
o MESH (Medical Subject Headings)
o Public Health Language
o Rare Books and Manuscripts Section (RBMS) Controlled Vocabularies
o RxNorm (for drugs)
o SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms)
o STW Thesaurus for Economics
o UNBIS Thesaurus
o UNESCO Thesaurus
o USDA National Agricultural Library Agriculture Thesaurus
Question Have you ever
used thesauri in your study
and research
Getty Union List of Artist Names
(ULAN)The ULAN includes proper names and
associated information about artists
Artists may be either individuals
(persons) or groups of individuals working
together (corporate bodies) Artists in
the ULAN generally represent creators
involved in the conception or production
of visual arts and architecture
Library of Congress Name
Authority File (LCNAF)
The LCNAF provides authoritative
data for names of persons
organizations events places and
titles
Virtual International
Authority File (VIAF)
The VIAFtrade (Virtual International
Authority File) combines multiple
name authority files into a single
OCLC-hosted name authority
service The goal of the service is to
lower the cost and increase the
utility of library authority files by
matching and linking widely-used
authority files and making that
information available on the Web
Web Ontology Language
(OWL)The OWL 2 Web Ontology Language is an
ontology language for the Semantic Web
with formally defined meaning OWL 2
ontologies provide classes properties
individuals and data values and are stored
as Semantic Web documents OWL 2
ontologies can be used along with
information written in RDF and OWL 2
ontologies themselves are primarily
exchanged as RDF documents
MADSRDFThe Metadata Authority Description
Schema (MADS) is an XML schema for an
element set that may be used to provide
metadata about authorized forms of
agents (people organizations) events
and terms (topics geographics genres
etc) MADSRDF
builds on MADSXML as a knowledge
organization system
Resource Description
Framework (RDF)RDF is a standard model for data
interchange on the Web RDF extends
the linking structure of the Web to use
URIs to name the relationship
between things as well as the two
ends of the link (this is usually
referred to as a ldquotriplerdquo) Using this
simple model it allows structured and
semi-structured data to be mixed
exposed and shared across different
applications
SKOS Simple Knowledge
Organization for the Web SKOS is a W3C recommendation
designed for representation of
thesauri classification
schemes taxonomies subject-
heading systems or any other
type of structured controlled
vocabularyLinked data
examplesbull FAST Faceted
Application of
Subject
Terminology
bull Dewey Decimal
Classification
bull Open Metadata
Registry (RDA
vocabularies)
bull Library of Congress
Linked Data
Service
hellip
OpenRefine (ex-Google Refine) is a powerful tool for working with messy data cleaning it transforming it from one format into another extending it with web services and linking it to databases like Freebasehttpopenrefineorg
Nesstar Publisher is a
free advanced data management program It can be used for the preparation of data and metadata Its DDI complianthttpwwwnesstarcomsoftwarepublisherhtml
QualAnon DSDR
Qualitative Data Anonymizer
This free transcript anonymizationtool is designed solely to de-identify qualitative interview transcriptshttpswwwicpsrumicheduicpsrwebDSDRtoolsanonymizejsp
Colectica for Microsoft Excel
A free tool to document your spreadsheet data using the Data Documentation Initiative (DDI) metadata format the open standard for data documentationhttpwwwcolecticacomsoftwarecolecticaforexcel
Schematron is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees It is a structural schema language expressed in XML using a small number of elements and XPathhttpxmlasccnetresourceschematronschematronhtml
Altova XMLSpy is an advanced XML editor for modeling, editing, transforming, and debugging XML-related technologies.
http://www.altova.com/xmlspy.html
<oXygen/> XML Editor is an XML tool that supports all the XML schema languages. The XSLT and XQuery support is enhanced with powerful debuggers and performance profilers. You can use <oXygen/> XML Editor to work with all XML-based technologies, including XML databases, XProc pipelines, and web services.
http://www.oxygenxml.com
LabTrove is a free blogging platform specifically designed for use in a research environment. It aims to serve as a highly flexible electronic notebook and data management system. By integrating with a lab’s data-producing instruments, researchers can describe an experiment and associate it with its data output at the time of capture, rather than annotating after the fact.
http://www.labtrove.org
Kepler is a scientific workflow modeling and management system that enables users, regardless of programming experience, to set up data analysis pipelines. The software will assemble, execute, and document the services and scripts that scientists with large-scale data use to execute research.
https://kepler-project.org
DataCite
The DataCite Consortium provides a number of services to support efforts at increasing the ease and prevalence of data citation.
http://www.datacite.org
DMPTool is an online service to enable researchers to create data management plans, now required by many funding agencies, and to receive tailored institutional guidance to help them in the process.
https://dmp.cdlib.org
o Section II addresses data documentation more from the researcher’s view
o Section III interprets data documentation more from a curator’s or librarian’s perspective
o What do researchers really care about?
o Will each party see the other side’s points and emphases?
Create, edit, share, and save data management plans
Open access scholarly publishing services: papers, journals, books, seminars & more
Curation repository: store, manage, and share research data
Create and manage persistent identifiers
Open source add-in for Microsoft Excel as a data collection tool
An infrastructure to publish and get credit for sharing research data
CDL Curation and Publishing Services
http://www.cdlib.org
This slide is by Joan Starr, California Digital Library: http://www.slideshare.net/joanstarr/dataset-metadata-tools-approaches-for-access-preservation?from_search=1
Data Publication
http://library.ucf.edu/ScholarlyCommunication/UCFResearchLifecycle.pdf
Data Set Related Services
o “Data Set (also called ‘Dataset’) Metadata” provides researchers consultation on:
  o Project and dataset documentation
  o Metadata standards (Common and Domain Specific)
  o Metadata schemas customization
  o Controlled vocabularies and thesauri
  o Data curation tools and practices
o Assists in describing basic properties of your data and enriching metadata for your datasets
o Supports applying controlled vocabularies or optimizing keywords to enhance the search of your datasets
o Helps to prepare your metadata and data for deposit and preservation
o Scholarly Communication (http://library.ucf.edu/ScholarlyCommunication)
o SC Contact Information (http://library.ucf.edu/ScholarlyCommunication/Contact.php)
o UCF Library Research Guides (http://guides.ucf.edu)
o Metadata Guide (http://guides.ucf.edu/metadata)
o Data Management Guide (http://guides.ucf.edu/data)
o Research and Information Services (http://library.ucf.edu/Reference)
Abstract: United States Geological Survey, Saint Petersburg, Florida, Center for Coastal and Watershed Studies…
Purpose: These data and information are intended for science researchers, students…
Language: eng; USA
Citation
  Title: Biological data of field activity 08CRD01 (B-1-08-VI) in US Virgin Islands from 05/30/2008 to 06/13/2008
  Date
    Date: 2013-03-03
    Date Type: Publication Date
  Organisation Name: US Geological Survey (USGS) <http://www.usgs.gov>, Coastal and Marine Geology (CMG) <http://walrus.wr.usgs.gov>
  Role: Publisher
  Contact Info: …
Point Of Contact: …
Representation Type: Vector
Topic Category
Keyword Collection
  Keyword: EARTH SCIENCE > OCEANS
  Associated Thesaurus: Global Change Master Directory (GCMD)
  Keyword: Marine Geology
  Associated Thesaurus: USGS CMG InfoBank
Spatial Extent
  West Bounding Longitude: -65.75000
  East Bounding Longitude: -63.25000
  North Bounding Latitude: 18.75000
  South Bounding Latitude: 17.25000
FGDC/CSDGM Metadata
Constraints: Please recognize the US Geological Survey (USGS) as the source of this information. Physical materials are under controlled on-site access. Some USGS information accessed through this means may be preliminary in nature and presented without the approval of the Director of the USGS…
Legal Constraints
  Use Constraints: Other Restrictions
  Other Constraints: Use Constraints: Please recognize the US Geological Survey (USGS) as the source of this information. Physical materials are under controlled on-site access…
…
Distribution
  Distribution Format
    Format Name: ASCII
    Format Version:
    File Decompression Technique: No compression applied
o XML schema brings documentation into a single document, creates structured content about the data, and allows data interoperability and sharing
o It can document comprehensive variable-level information such as basic data dictionary, question text, and question routing instructions
o Data Documentation Initiative (DDI): a metadata specification for the social and behavioral sciences. It is an XML metadata standard for documenting numeric data. Detailed information is available at http://www.ddialliance.org
o Projects using the DDI (http://www.ddialliance.org/ddi-at-work/projects)
o DDI-compliant data repository
  o ICPSR - Inter-university Consortium for Political and Social Research
    o Data deposit form: https://www.icpsr.umich.edu/cgi-bin/ddf2
    o UCF is a member of ICPSR
  o UKDA - UK Data Archive
Field Labels
Title
Principal investigator(s)
Summary
Access notes
Dataset(s)
http://www.icpsr.umich.edu/icpsrweb/NACJD/studies/20363?archive=NACJD&q=%22university+of+central+florida%22&permit%5B0%5D=AVAILABLE&x=-999&y=-84
ICPSR: Inter-university Consortium for Political and Social Research
Dataset(s)
DS0: Study-Level Files
  Documentation
    Questionnaire.pdf
    User guide.pdf
DS1: Female Interviews
  Documentation
    Codebook.pdf
…
Field Labels
Study description
Citation
Funding
Scope of study
• Subject terms
• Smallest geographic unit
• Geographic coverage
• Time period
• Date of collection
• Unit of observation
• Universe
• Data types
• Data collection notes
Methodology
• Study purpose
• Study design
Field Labels
• Sample
• Mode of data collection
• Description of variables
• Response rates
• Presence of common scales
• Extent of processing
Field Labels
Version(s)
Related publications
Variables
Utilities
• Metadata exports
• Download statistics
Variables
List all 1682 variables in this study, e.g.:
ID: QUESTIONNAIRE ID NUMBER
ISEX: INTERVIEWER GENDER
START: INTERVIEW START TIME HHMM USE 24 HR CLOCK
Q1A: COUNTRY OF BIRTH
Q1B: STATE OF BIRTH - INITIALS OF STATE
Q1C: CITY OF BIRTH WRITE IN NOT APP
Q1D: YEARS LIVED IN USA
Q1E: RESIDENCY STATUS
CHECK1: CHECKPOINT 1 BORN IN SAME METRO AREA
Q2: HOW LONG LIVED IN THIS AREA
…
(http://www.icpsr.umich.edu/icpsrweb/NACJD/ssvd/studies/20363/variables)
http://www.icpsr.umich.edu/icpsrweb/ICPSR/ddi2/studies/20363
docDscr
The Document Description consists of bibliographic information describing the DDI-compliant document itself as a whole.
Included Fields
citation
• titlStmt
• prodStmt
• verStmt
• holdings
stdyDscr
The Study Description consists of information about the data collection, study, or compilation that the DDI-compliant documentation file describes. This section includes information about how the study should be cited, who collected or compiled the data, who distributes the data, keywords about the content of the data, summary (abstract) of the content of the data, data collection methods and processing, etc.
Included Fields
Citation
  titlStmt
  rspStmt
  prodStmt
  fundAg
  grantNo
  distStmt
  biblCit
  Holdings
stdyInfo
  Subject
  Abstract
  sumDscr
Method
  dataColl
  Notes
  anlyInfo
dataAccs
  setAvail
  useStmt
fileDscr
Data Files Description: Information about the data file(s) that comprise a collection. This section can be repeated for collections with multiple files.
Included Fields
fileDscr
  fileTxt
    fileName
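The three sections above can be sketched as a skeletal XML tree. The following Python example is only an illustration under stated assumptions: the root element name codeBook and the titl element are recalled from the DDI Codebook 2.x vocabulary rather than taken from these slides, namespaces are omitted, the titles and file names are hypothetical, and the output is not validated against the DDI schema.

```python
import xml.etree.ElementTree as ET


def add_title(parent, section_name, title):
    """Add <section>/<citation>/<titlStmt>/<titl> under parent."""
    section = ET.SubElement(parent, section_name)
    citation = ET.SubElement(section, "citation")
    titl_stmt = ET.SubElement(citation, "titlStmt")
    ET.SubElement(titl_stmt, "titl").text = title
    return section


def build_codebook(doc_title, study_title, file_names):
    """Build a skeletal DDI Codebook-style tree (not schema-validated)."""
    code_book = ET.Element("codeBook")  # assumed root element name

    # docDscr: bibliographic information about the DDI document itself.
    add_title(code_book, "docDscr", doc_title)

    # stdyDscr: information about the study the document describes.
    add_title(code_book, "stdyDscr", study_title)

    # fileDscr: repeated once for each data file in the collection.
    for name in file_names:
        file_dscr = ET.SubElement(code_book, "fileDscr")
        file_txt = ET.SubElement(file_dscr, "fileTxt")
        ET.SubElement(file_txt, "fileName").text = name
    return code_book


book = build_codebook(
    "Codebook for a sample study",   # hypothetical document title
    "A sample study",                # hypothetical study title
    ["ds1.txt", "ds2.txt"],          # hypothetical data files
)
print(ET.tostring(book, encoding="unicode"))
```

A real DDI instance would carry many more of the included fields listed above (rspStmt, distStmt, dataColl, and so on); in practice a DDI-aware tool such as Nesstar Publisher or Colectica would generate them.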
o Context and participant details of interviews can be documented in:
  o A descriptive header or summary page in transcripts or field notes
  o A structured data list
o XML mark-up of data, for example:
  o Text Encoding Initiative (TEI) to mark up interview transcript
  o Qualitative Data Exchange Format (QuDEx) for researcher annotations and data linking
o Anonymisation of textual data (e.g., replacing real names of people, organizations, and locations with pseudonyms)
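The pseudonym replacement mentioned in the last bullet can be sketched in a few lines of Python. The names, pseudonyms, and transcript sentence below are hypothetical, and this is a minimal sketch rather than a complete anonymisation procedure: plain string replacement does not catch indirect identifiers, misspellings, or nicknames, so the result still needs manual review.

```python
import re

# Hypothetical mapping from real names to pseudonyms.
PSEUDONYMS = {
    "Jane Smith": "Participant A",
    "Acme Corp": "Organization 1",
    "Orlando": "City X",
}


def anonymize(text, mapping):
    """Replace each real name with its pseudonym, longest names first."""
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in names) + r")\b"
    )
    return pattern.sub(lambda match: mapping[match.group(1)], text)


transcript = "Jane Smith said she joined Acme Corp after moving to Orlando."
print(anonymize(transcript, PSEUDONYMS))
```

Keeping the mapping in one place makes the replacement consistent across all transcripts of a study; the mapping file itself must be stored securely and apart from the shared data.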
o File naming
  o Meaningful short names; identify file types (e.g., interviews, focus groups, field notes, audio recordings); avoid spaces and special characters; avoid long names
o Organizing files in folders: Create uniform and structured folder names based on cases, studies, locations, data types, etc., or the original, anonymized, coded, or annotated versions of data
o Version control: Version numbering in file names
o Documentation: Methodology description, project plan, interview guidelines, consent form templates, data analyses and manipulation
o Example is from “A NESSTAR FOR QUALITATIVE DATA: BUILDING BLOCKS FOR DIGITAL FUTURES” by Corti, Louise, et al., available at http://data-archive.ac.uk/media/376907/digitalfutures_dashish_21nov2012.pdf
o Data List
Interview ID    Text File Name
x001            6124int001
x002            6124int002
…               …
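The naming pattern in the data list above (study number, file type, running number) can be generated programmatically so that names stay uniform. The sketch below is one possible approach; the function names and the optional "_v02"-style version suffix are assumptions for illustration and are not part of the source example.

```python
import csv
import io


def file_name(study_number, kind, sequence, version=None):
    """Build a short name without spaces or special characters.

    Study 6124, kind "int", sequence 1 gives "6124int001". The optional
    version suffix (for example "_v02") is an assumed convention.
    """
    name = f"{study_number}{kind}{sequence:03d}"
    if version is not None:
        name += f"_v{version:02d}"
    return name


def data_list(study_number, count):
    """Return CSV text pairing interview IDs with transcript file names."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Interview ID", "Text File Name"])
    for i in range(1, count + 1):
        writer.writerow([f"x{i:03d}", file_name(study_number, "int", i)])
    return buffer.getvalue()


print(data_list(6124, 2))
```

Zero-padded running numbers keep files in the right order when sorted by name, and putting the version in the file name makes the version history visible without opening the file.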
o Create and generate metadata for your research data and datasets in your research lifecycle to preserve the data in the long run
o Consider what information is needed for the data to be read and interpreted in the future
o Understand your funder requirements for data documentation and metadata. Funder requirements for NSF, GBMF, IMLS, NEH, NIH, and NOAA can be found at https://dmptool.org/guidance
o Consult available metadata standards in your field. You may refer to Common Metadata Standards and Domain Specific Metadata Standards for details
o Describe data and datasets created in your research lifecycle, and use software programs and tools to assist in data documentation. Assign or capture administrative, descriptive, technical, structural, and preservation metadata for the data. Some potential information to document:
  o Descriptive metadata
    o Name of creator of data set
    o Name of author of document
    o Title of document
    o File name
    o Location of file
    o Size of file
  o Structural metadata
    o File relationships (e.g., child, parent)
  o Technical metadata
    o Format (e.g., text, SPSS, Stata, Excel, tiff, mpeg, 3D, Java, FITS, CIF)
    o Compression or encoding algorithms
    o Encryption and decryption keys
    o Software (including release number) used to create or update the data
    o Hardware on which the data were created
    o Operating systems in which the data were created
    o Application software in which the data were created
  o Administrative metadata
    o Information about data creation (e.g., date)
    o Information about subsequent updates, transformation, versioning, summarization
    o Descriptions of migration and replication
    o Information about other events that have affected the files
  o Preservation metadata
    o File format (e.g., .txt, .pdf, .doc, .rtf, .xls, .xml, .spv, .jpg, .fits)
    o Significant properties
    o Technical environment
    o Fixity information
o Adopt a thesaurus in your field if applicable, or compile a data dictionary for your dataset
o Obtain persistent identifiers (e.g., DOI, PURL) for datasets if possible to ensure data can be found in the future
o For your full data management plan, visit UCF Libraries’ Data Management Guide. Also refer to Digital Curation Centre’s Checklist for a Data Management Plan (http://www.dcc.ac.uk/sites/default/files/documents/resource/DMP_Checklist_2013.pdf)
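Fixity information, listed under preservation metadata above, is commonly recorded as a checksum of each file. The following Python sketch shows one way to compute and later verify a SHA-256 checksum with the standard library. The choice of SHA-256, the function names, and the temporary demonstration file are assumptions for illustration; a repository may prescribe a different algorithm.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, recorded_checksum):
    """Return True if the file still matches the recorded checksum."""
    return sha256_of(path) == recorded_checksum


# Demonstration with a temporary file standing in for a data file.
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(b"id,value\n1,42\n")
    demo_path = tmp.name

recorded = sha256_of(demo_path)        # store this with the metadata
unchanged = verify(demo_path, recorded)

with open(demo_path, "ab") as handle:  # simulate an accidental change
    handle.write(b"2,43\n")
changed_detected = not verify(demo_path, recorded)

os.remove(demo_path)
print(unchanged, changed_detected)
```

Recording the checksum at deposit time and re-checking it after each migration or replication gives evidence that the file has not been altered.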
o Common Metadata Standards
o Disciplinary Metadata Standards
o Activity: Choose a dataset or a standard in your field to examine and critique
  o Social Science Dataset
  o Humanities Dataset
  o Biological Sciences Dataset
  o Biotechnology Dataset
  o Geospatial Dataset
  o Earth Science Dataset
  o Physical Science Dataset
  o Other…
o Dublin Core (DC): A general metadata standard for describing a wide range of digital resources
  o Dublin Core Metadata Element Set, Version 1.1 (http://dublincore.org/documents/dces/)
  o 15 Elements: Title, Creator, Subject or keyword, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights
  o DCMI Metadata Terms (http://dublincore.org/documents/dcmi-terms/)
  o DC Qualifiers (http://dublincore.org/documents/usageguide/qualifiers.shtml)
o Encoded Archival Description (EAD)
  o A standard for encoding archival finding aids with XML
o Government Information Locator Service (GILS)
  o The Global Information Locator Service defines a core element set for government information so that it can be more searchable and discoverable by the general public
o ONIX for Books (ONline Information eXchange)
  o An international standard for representing and communicating book industry product information
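As a small illustration of a general-purpose standard in use, the Python sketch below builds a simple Dublin Core record restricted to the fifteen elements of the Dublin Core Metadata Element Set. The namespace URI is the published one for the element set; the wrapper element name "metadata" and the function name are assumptions, and the sample values are borrowed from the USGS record shown earlier in this presentation.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]


def dc_record(fields):
    """Build an XML record from {element name: [values]}.

    Every element is optional and repeatable; unknown names are rejected.
    """
    unknown = sorted(set(fields) - set(DC_ELEMENTS))
    if unknown:
        raise ValueError(f"not Dublin Core elements: {unknown}")
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name in DC_ELEMENTS:
        for value in fields.get(name, []):
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return root


record = dc_record({
    "title": ["Biological data of field activity 08CRD01"],
    "publisher": ["US Geological Survey (USGS)"],
    "date": ["2013-03-03"],
    "subject": ["EARTH SCIENCE > OCEANS", "Marine Geology"],
    "format": ["ASCII"],
})
print(ET.tostring(record, encoding="unicode"))
```

Subject values such as these are best taken from a controlled vocabulary or thesaurus so that records from different sources can be searched together.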
o MARC Code List for Countries: http://www.loc.gov/marc/countries/
o MARC Code List for Languages: http://www.loc.gov/marc/languages/
o MARC Source Codes for Vocabularies, Rules, and Schemes: http://www.loc.gov/marc/sourcecode/form/formsource.html
o For digital and online resources
  o Internet Media Types: www.iana.org/assignments/media-types/index.html
  o MODS Note Types: http://www.loc.gov/standards/mods/mods-notes.html
  o DCMI Type Vocabulary: http://dublincore.org/documents/dcmi-terms/index.shtml#H7
o Subject Thesauri and Ontologies
  o AGROVOC (Food and Agriculture Organization of the United Nations vocabulary)
  o Astronomy Thesaurus
  o CAB Thesaurus (for life sciences, technology, and social sciences)
  o CIF dictionaries (for Physics)
  o Eurovoc (European Union Thesaurus)
  o Ethnographic Thesaurus
  o Gene Ontology
  o GeoNames
  o Getty Institute Art and Architecture Thesaurus Online
  o Getty Institute Thesaurus of Geographic Names
  o ICD (International Classification of Diseases)
  o Library of Congress Authorities for subject headings
  o Library of Congress Thesaurus for Graphic Materials
  o Logical Observation Identifiers Names and Codes (LOINC)
  o MeSH (Medical Subject Headings)
  o Public Health Language
  o Rare Books and Manuscripts Section (RBMS) Controlled Vocabularies
  o RxNorm (for drugs)
  o SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms)
  o STW Thesaurus for Economics
  o UNBIS Thesaurus
  o UNESCO Thesaurus
  o USDA National Agricultural Library Agriculture Thesaurus
Question: Have you ever used thesauri in your study and research?
Getty Union List of Artist Names (ULAN)
The ULAN includes proper names and associated information about artists. Artists may be either individuals (persons) or groups of individuals working together (corporate bodies). Artists in the ULAN generally represent creators involved in the conception or production of visual arts and architecture.
Library of Congress Name Authority File (LCNAF)
The LCNAF provides authoritative data for names of persons, organizations, events, places, and titles.
Virtual International Authority File (VIAF)
The VIAF™ (Virtual International Authority File) combines multiple name authority files into a single OCLC-hosted name authority service. The goal of the service is to lower the cost and increase the utility of library authority files by matching and linking widely-used authority files and making that information available on the Web.
Web Ontology Language (OWL)
The OWL 2 Web Ontology Language is an ontology language for the Semantic Web with formally defined meaning. OWL 2 ontologies provide classes, properties, individuals, and data values and are stored as Semantic Web documents. OWL 2 ontologies can be used along with information written in RDF, and OWL 2 ontologies themselves are primarily exchanged as RDF documents.
MADS/RDF
The Metadata Authority Description Schema (MADS) is an XML schema for an element set that may be used to provide metadata about authorized forms of agents (people, organizations), events, and terms (topics, geographics, genres, etc.). MADS/RDF builds on MADS/XML as a knowledge organization system.
Resource Description
Framework (RDF)RDF is a standard model for data
interchange on the Web RDF extends
the linking structure of the Web to use
URIs to name the relationship
between things as well as the two
ends of the link (this is usually
referred to as a ldquotriplerdquo) Using this
simple model it allows structured and
semi-structured data to be mixed
exposed and shared across different
applications
SKOS Simple Knowledge
Organization for the Web SKOS is a W3C recommendation
designed for representation of
thesauri classification
schemes taxonomies subject-
heading systems or any other
type of structured controlled
vocabularyLinked data
examplesbull FAST Faceted
Application of
Subject
Terminology
bull Dewey Decimal
Classification
bull Open Metadata
Registry (RDA
vocabularies)
bull Library of Congress
Linked Data
Service
hellip
OpenRefine (ex-Google Refine) is a powerful tool for working with messy data cleaning it transforming it from one format into another extending it with web services and linking it to databases like Freebasehttpopenrefineorg
Nesstar Publisher is a
free advanced data management program It can be used for the preparation of data and metadata Its DDI complianthttpwwwnesstarcomsoftwarepublisherhtml
QualAnon DSDR
Qualitative Data Anonymizer
This free transcript anonymizationtool is designed solely to de-identify qualitative interview transcriptshttpswwwicpsrumicheduicpsrwebDSDRtoolsanonymizejsp
Colectica for Microsoft Excel
A free tool to document your spreadsheet data using the Data Documentation Initiative (DDI) metadata format the open standard for data documentationhttpwwwcolecticacomsoftwarecolecticaforexcel
Schematron is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees It is a structural schema language expressed in XML using a small number of elements and XPathhttpxmlasccnetresourceschematronschematronhtml
Altova XMLSpy is an advanced XML editor for modeling editing transforming and debugging XML-related
technologieshttpwwwaltovacomxmlspy
html
ltoXygengt XML
Editor is an XML tool that supports all the XML schema languages The XSLT and XQuery support is enhanced with powerful debuggers and performance profilers You can use ltoXygengt XML Editor to work with all XML-based technologies including XML databases XProcpipelines and web serviceshttpwwwoxygenxmlcom
LabTrove is a free blogging
platform specifically designed for use in a research environment It aims to serve as a highly flexible electronic notebook and data management system by integrating with a labrsquos data-producing instruments researchers can describe an experiment and associate it with its data output at the time of capture rather than annotating after the fact httpwwwlabtroveorg
Kepler is a scientific workflow
modeling and management system that enables users regardless of programming experience to set up data analysis pipelines The software will assemble execute and document theof services and scripts that scientists with large-scale data use to execute researchhttpskepler-projectorg
DataCiteThe DataCite Consortium
provides a number of
services to support
efforts at increasing the
ease and prevalence of
data citationhttpwwwdataciteorg
DMPTool is an online service to enable researchers to create data management plans now required by many funding agencies and to receive tailored institutional guidance to help them in the processhttpsdmpcdliborg
oSection II addresses data documentation more from the
researcherrsquos view
oSection III interprets data documentation more from
a curator or librarians perspective
oWhat do researchers really care about
oWill each party see the other sidersquos points and
emphases
Create edit share and save
data management plans
Open access scholarly publishing services
papers journals books seminars amp more
Curation repository store manage and share research data
Create and manage
persistent identifiers
Open source add-in for Microsoft
Excel as a data collection tool
An infrastructure to publish and get credit
for sharing research data
CDL Curation and Publishing Services
httpwwwcdliborg
This slide is by Joan Starr California Digital Library httpwwwslidesharenetjoanstarrdataset-metadata-tools-approaches-for-access-preservationfrom_search=1
Data Publication
httplibraryucfeduScholarlyCommunicationUCFResearchLifecyclepdfData Set Related Services
oldquoData Set (also called lsquoDatasetrsquo) Metadatardquo provides
researchers consultation on
oProject and dataset documentation
oMetadata standards (Common and Domain Specific)
oMetadata schemas customization
oControlled vocabularies and thesauri
oData curation tools and practices
oAssists in describing basic properties of your data and enriching
metadata for your datasets
oSupports applying controlled vocabularies or optimizing keywords
to enhance the search of your datasets
oHelps to prepare your metadata and data for deposit and
preservation
oScholarly Communication (httplibraryucfeduScholarlyCommunication)
oSC Contact Information (httplibraryucfeduScholarlyCommunicationContactphp)
oUCF Library Research Guides (httpguidesucfedu)
oMetadata Guide (httpguidesucfedumetadata)
oData Management Guide (httpguidesucfedudata)
oResearch and Information Services (httplibraryucfeduReference)
Abstract United States Geological Survey Saint Petersburg Florida Center for Coastal and Watershed
Studieshellip
Purpose These data and information are intended for science researchers studentshellip
Language eng USA
Citation
Title Biological data of field activity 08CRD01 (B-1-08-VI) in US Virgin Islands from 05302008 to 06132008
Date
Date 2013-03-03
Date Type Publication Date
Organisation Name US Geological Survey (USGS) lthttpwwwusgsgovgt Coastal and Marine Geology
(CMG) lthttpwalruswrusgsgovgt
Role Publisher
Contact Info hellip
Point Of Contact hellip
Representation Type Vector
Topic Category
Keyword Collection
Keyword EARTH SCIENCE gt OCEANS
Associated Thesaurus Global Change Master Directory (GCMD)
Keyword Marine Geology
Associated Thesaurus USGS CMG InfoBank
Spatial Extent
West Bounding Longitude -6575000
East Bounding Longitude -6325000
North Bounding Latitude 1875000
South Bounding Latitude 1725000
FGDCCSDGM
Metadata
Constraints Please recognize the US Geological Survey (USGS) as the source of this information Physical materials are under controlled on-site access Some USGS information accessed through this means may be preliminary in nature and presented without the approval of the Director of the USGShellip
Legal Constraints
Use Constraints Other Restrictions
Other Constraints Use Constraints Please recognize the US Geological Survey (USGS) as the source of this information Physical materials are under controlled on-site accesshellip
hellip
Distribution
Distribution Format
Format Name ASCII
Format Version
File Decompression Technique No compression applied
oXML schema brings documentation into a single document creates
structured content about the data and allows data interoperability and
sharing
oIt can document comprehensive variable level information such as basic
data dictionary question text and question routing instructions
oData Documentation Initiative (DDI) a metadata specification for the
social and behavioral sciences It is an XML metadata standard for
documenting numeric data Detailed information is available
at httpwwwddiallianceorg
oProjects using the DDI (httpwwwddiallianceorgddi-at-workprojects)
oDDI-compliant data repository
o ICPSR - Inter-university Consortium for Political and Social Research
o Data deposit form httpswwwicpsrumicheducgi-binddf2
o UCF is a member of ICPSR
oUKDA - UK Data Archive
Field Labels
TitlePrincipal investigator(s)
Summary
Access notes
Dataset(s)
httpwwwicpsrumicheduicpsrwebNA
CJDstudies20363archive=NACJDampq=22
university+of+central+florida22amppermit
5B05D=AVAILABLEampx=-999ampy=-84
ICPSR Interuniversity
Consortium for
Political and
Social Research
Dataset(s)
DSO Study-Level Files
Documentation
Questionnairepdf
User guidepdf
DS1 Female Interviews
Documentation
Codebookpdf
hellip
Field Labels
Study description
Citation
Funding
Scope of studybull Subject terms
bull Smallest
geographic unit
bull Geographic
coverage
bull Time period
bull Date of collection
bull Unit of
observation
bull Universe
bull Data types
bull Data collection
notes
Methodologybull Study purpose
bull Study design
Field Labels
bull Sample
bull Mode of data collection
bull Description of variables
bull Response rates
bull Presence of common
scales
bull Extent of processing
Field Labels
Version(s)
Related publications
Variables
Utilities
bull Metadata exports
bull Download statistics
Variables
List all 1682 variables in this study
egID QUESTIONNAIRE ID NUMBER ISEX INTERVIEWER GENDER START INTERVIEW START TIME HHMM USE 24 HR CLOCK Q1A COUNTRY OF BIRTH Q1B STATE OF BIRTH - INITIALS OF STATEQ1C CITY OF BIRTH WRITE IN NOT APPQ1D YEARS LIVED IN USAQ1E RESIDENCY STATUSCHECK1 CHECKPOINT 1 BORN IN SAME METRO AREAQ2 HOW LONG LIVED IN THIS AREA hellip (httpwwwicpsrumicheduicpsrwebNACJDssv
dstudies20363variables)
httpwwwicpsrumicheduicpsrwebICPSRddi2studies20363
docDscrThe Document
Description
consists of
bibliographic
information
describing the
DDI-compliant
document
itself as a
whole
Included Fields
citation
bull titleStmt
bull prodStmt
bull verStmt
bull holdings
Included FieldsCitation
titlStmt
rspStmt
prodStmt
fundAg
grantNo
distStmt
biblCit
Holdings
stdyInfoSubject
Abstract
sumDscr
MethoddataColl
Notes
anlyInfo
dataAccssetAvail
useStmt
stdyDscr The Study
Description consists of
information about the
data collection study
or compilation that the
DDI-compliant
documentation file
describes This section
includes information
about how the study
should be cited who
collected or compiled
the data who
distributes the data
keywords about the
content of the data
summary (abstract) of
the content of the data
data collection methods
and processing etc
Included Fields
fileDscr
fileTxt
fileName
fileDscr
Data Files
Description
Information about
the data file(s)
that comprises a
collection This
section can be
repeated for
collections with
multiple files
oContext and participant details of interviews can be
oA descriptive header or summary page in transcripts or
field notes
oA structured data list
oXML mark-up of data for example
oText Encoding Initiative (TEI) to mark up interview
transcript
oQualitative Data Exchange Format (QuDEx) for
researcher annotations and data linking
oAnonymisation of textual data (eg replacing real names of people
organizations and locations with pseudonyms)
oFile naming
oMeaningful short names identify file types (eg interviews focus groups
field notes audio recordings) avoid space special characters avoid long
names
oOrganizing files in folders Create uniform and structured folder names based
on cases studies locations data types etc or the original anonymized
coded or annotated versions of data
oVersion control Version numbering in file names
oDocumentation Methodology description project plan interview guidelines
consent form templates data analyses and manipulation
o Example is from A NESSTAR FOR QUALITATIVE DATA BUILDING BLOCKS FOR DIGITAL FUTURES By Corti Louise et al available at httpdata-archiveacukmedia376907digitalfutures_dashish_21nov2012pdf
oData List
Interview ID
x001
x002
hellip
Text File Name
6124int001
6124int002
hellip
oCreate and generate metadata for your research data and
datasets in your research lifecycle to preserve the data in the
long run
oConsider what information is needed for the data to be
read and interpreted in the future
oUnderstand your funder requirements for data
documentation and metadata Funder requirements for NSF
GBMF IMLS NEH NIH and NOAA can be found at
httpsdmptoolorgguidance
oConsult available metadata standards in your field You may
refer to Common Metadata Standards and Domain Specific
Metadata Standards for details
oDescribe data and datasets created in your research lifecycle and
use software programs and tools to assist in data documentation
Assign or capture administrative descriptive technical structural
and preservation metadata for the data Some potential information
to document
oDescriptive metadata
oName of creator of data set
oName of author of document
oTitle of document
oFile name
oLocation of file
oSize of file
oStructural metadata
oFile relationships (eg child parent)
oTechnical metadata
oFormat (eg text SPSS Stata Excel tiff mpeg 3D Java FITS CIF)
oCompression or encoding algorithms
oEncryption and decryption keys
oSoftware (including release number) used to create or update the data
oHardware on which the data were created
oOperating systems in which the data were created
oApplication software in which the data were created
oAdministrative metadata
o Information about data creation (eg date)
o Information about subsequent updates transformation versioning
summarization
oDescriptions of migration and replication
o Information about other events that have affected the files
oPreservation metadata
oFile format (eg txt pdf doc rtf xls xml spv jpg fits)
oSignificant properties
oTechnical environment
oFixity information
oAdopt a thesauri in your field if applicable or compile a data dictionary for
your dataset
oObtain persistent identifiers (eg doi purl) for datasets if possible to ensure
data can be found in the future
oFor your full data management plan visit UCF Libraries Data Management
Guide Also refer to Digital Curation Centrersquos Checklist for a Data
Management Plan (httpwwwdccacuksitesdefaultfilesdocumentsresourceDMP_Checklist_2013pdf)
oCommon Metadata Standards
oDisciplinary Metadata Standards
oActivity Choose a dataset or a standard in your field to examine and critique
oSocial Science Dataset
oHumanities Dataset
oBiological Sciences Dataset
oBiotechnology Dataset
oGeospatial Dataset
oEarth Science Dataset
oPhysical Science Dataset
oOtherhellip
oDublin Core (DC) A general metadata standard for describing a wide range of
digital resources
o Dublin Core Metadata Element Set Version 11
(httpdublincoreorgdocumentsdces)
o 15 Elements Title Creator Subject or keyword Description Publisher Type Format
Identifier Source Language Relation Coverage Rights
o DCMI Metadata Terms (httpdublincoreorgdocumentsdcmi-terms)
o DC Qualifiers (httpdublincoreorgdocumentsusageguidequalifiersshtml)
o Encoded Archival Description (EAD)
o A standard for encoding archival finding aids with XML
oGovernment Information Locator Service (GILS)
o The Global Information Locator Service defines a core element set for government
information so that it can be more searchable and discoverable by the general public
oONIX for Books (ONline Information eXchange)
o An international standard for representing and communicating book industry product
oMARC Code List for Countries httpwwwlocgovmarccountries
oMARC Code List for Languages httpwwwlocgovmarclanguages
oMARC Source Codes for Vocabularies Rules and Schemes
httpwwwlocgovmarcsourcecodeformformsourcehtml
oFor digital and online resources
oInternet Media Types wwwianaorgassignmentsmedia-
typesindexhtml
oMODS Note Types httpwwwlocgovstandardsmodsmods-
noteshtml
oDCMI Type Vocabulary httpdublincoreorgdocumentsdcmi-
termsindexshtmlH7
o Subject Thesauri and Ontologies
  o AGROVOC (Food and Agriculture Organization of the United Nations vocabulary)
  o Astronomy Thesaurus
  o CAB Thesaurus (for life sciences, technology, and social sciences)
  o CIF dictionaries (for Physics)
  o Eurovoc (European Union Thesaurus)
  o Ethnographic Thesaurus
  o Gene Ontology
  o GeoNames
  o Getty Research Institute Art & Architecture Thesaurus Online
  o Getty Research Institute Thesaurus of Geographic Names
  o ICD (International Classification of Diseases)
  o Library of Congress Authorities for subject headings
  o Library of Congress Thesaurus for Graphic Materials
  o Logical Observation Identifiers Names and Codes (LOINC)
  o MeSH (Medical Subject Headings)
  o Public Health Language
  o Rare Books and Manuscripts Section (RBMS) Controlled Vocabularies
  o RxNorm (for drugs)
  o SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms)
  o STW Thesaurus for Economics
  o UNBIS Thesaurus
  o UNESCO Thesaurus
  o USDA National Agricultural Library Agriculture Thesaurus
Question: Have you ever used thesauri in your study and research?
Getty Union List of Artist Names (ULAN)
The ULAN includes proper names and associated information about artists. Artists may be either individuals (persons) or groups of individuals working together (corporate bodies). Artists in the ULAN generally represent creators involved in the conception or production of visual arts and architecture.
Library of Congress Name Authority File (LCNAF)
The LCNAF provides authoritative data for names of persons, organizations, events, places, and titles.
Virtual International Authority File (VIAF)
The VIAF™ (Virtual International Authority File) combines multiple name authority files into a single OCLC-hosted name authority service. The goal of the service is to lower the cost and increase the utility of library authority files by matching and linking widely used authority files and making that information available on the Web.
Web Ontology Language (OWL)
The OWL 2 Web Ontology Language is an ontology language for the Semantic Web with formally defined meaning. OWL 2 ontologies provide classes, properties, individuals, and data values and are stored as Semantic Web documents. OWL 2 ontologies can be used along with information written in RDF, and OWL 2 ontologies themselves are primarily exchanged as RDF documents.
MADS/RDF
The Metadata Authority Description Schema (MADS) is an XML schema for an element set that may be used to provide metadata about authorized forms of agents (people, organizations), events, and terms (topics, geographics, genres, etc.). MADS/RDF builds on MADS/XML as a knowledge organization system.
Resource Description Framework (RDF)
RDF is a standard model for data interchange on the Web. RDF extends the linking structure of the Web to use URIs to name the relationship between things as well as the two ends of the link (this is usually referred to as a "triple"). Using this simple model, it allows structured and semi-structured data to be mixed, exposed, and shared across different applications.
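The "triple" idea can be sketched in a few lines of Python. This is only a toy model that uses plain tuples rather than an RDF library; the `http://example.org/` URIs and the sample values are hypothetical, while the Dublin Core and FOAF property URIs are real vocabulary terms. The N-Triples output is simplified and does not escape special characters.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate (the named relationship), object."""
    subject: str
    predicate: str
    obj: str


DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/"  # hypothetical namespace for the example resources

graph = {
    Triple(EX + "dataset/1", DC + "title", "Example survey dataset"),
    Triple(EX + "dataset/1", DC + "creator", EX + "person/jane-doe"),
    Triple(EX + "person/jane-doe", FOAF + "name", "Jane Doe"),
}


def objects(triples, subject, predicate):
    """Return every object linked to `subject` by `predicate`, sorted."""
    return sorted(t.obj for t in triples
                  if t.subject == subject and t.predicate == predicate)


def to_ntriples_line(triple):
    """Write one triple in a simplified N-Triples style (no escaping)."""
    if triple.obj.startswith("http"):
        obj = f"<{triple.obj}>"   # the object is another resource
    else:
        obj = f'"{triple.obj}"'   # the object is a literal value
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."


# Follow the link from the dataset to its creator, then to the creator's name
creator = objects(graph, EX + "dataset/1", DC + "creator")[0]
print(objects(graph, creator, FOAF + "name"))
for t in sorted(graph):
    print(to_ntriples_line(t))
```

Because the creator is named by a URI rather than by a text string, a second dataset can point to the same person, which is what makes the data linkable across applications.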
SKOS (Simple Knowledge Organization System)
SKOS is a W3C recommendation designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary.
Linked data examples
• FAST (Faceted Application of Subject Terminology)
• Dewey Decimal Classification
• Open Metadata Registry (RDA vocabularies)
• Library of Congress Linked Data Service
…
OpenRefine (ex-Google Refine) is a powerful tool for working with messy data: cleaning it, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase. http://openrefine.org
Nesstar Publisher is a free, advanced data management program. It can be used for the preparation of data and metadata. It is DDI compliant. http://www.nesstar.com/software/publisher.html
QualAnon: DSDR Qualitative Data Anonymizer
This free transcript anonymization tool is designed solely to de-identify qualitative interview transcripts. https://www.icpsr.umich.edu/icpsrweb/DSDR/tools/anonymize.jsp
Colectica for Microsoft Excel
A free tool to document your spreadsheet data using the Data Documentation Initiative (DDI) metadata format, the open standard for data documentation. http://www.colectica.com/software/colecticaforexcel
Schematron is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees. It is a structural schema language expressed in XML using a small number of elements and XPath. http://xml.ascc.net/resource/schematron/schematron.html
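To make the idea of rule-based assertions more concrete, here is a small Python sketch. It is not Schematron itself and does not use a Schematron processor; it only imitates the approach with the limited path syntax of Python's standard XML library. The rules, element names, and sample record are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Each rule: (context path, child that must be present, message if it is absent)
RULES = [
    (".", "title", "A record must have a title"),
    (".", "creator", "A record must name at least one creator"),
    ("./file", "format", "Each file must state its format"),
]


def validate(xml_text, rules=RULES):
    """Return the message of every rule that fails for the given XML text."""
    root = ET.fromstring(xml_text)
    failures = []
    for context_path, required_child, message in rules:
        # "." means the root element itself; other paths select descendants
        nodes = [root] if context_path == "." else root.findall(context_path)
        for node in nodes:
            if node.find(required_child) is None:
                failures.append(message)
    return failures


# Hypothetical sample: no creator, and the file has no format
SAMPLE = (
    "<record>"
    "<title>Example survey dataset</title>"
    "<file><name>data.csv</name></file>"
    "</record>"
)
failures = validate(SAMPLE)
for message in failures:
    print(message)
```

A grammar-based schema checks that elements appear in the right place; rules like these check that the content you care about is actually there.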
Altova XMLSpy is an advanced XML editor for modeling, editing, transforming, and debugging XML-related technologies. http://www.altova.com/xmlspy.html
<oXygen/> XML Editor is an XML tool that supports all the XML schema languages. The XSLT and XQuery support is enhanced with powerful debuggers and performance profilers. You can use <oXygen/> XML Editor to work with all XML-based technologies, including XML databases, XProc pipelines, and web services. http://www.oxygenxml.com
LabTrove is a free blogging platform specifically designed for use in a research environment. It aims to serve as a highly flexible electronic notebook and data management system. By integrating with a lab's data-producing instruments, researchers can describe an experiment and associate it with its data output at the time of capture rather than annotating after the fact. http://www.labtrove.org
Kepler is a scientific workflow modeling and management system that enables users, regardless of programming experience, to set up data analysis pipelines. The software will assemble, execute, and document the services and scripts that scientists with large-scale data use to execute research. https://kepler-project.org
DataCite
The DataCite Consortium provides a number of services to support efforts at increasing the ease and prevalence of data citation. http://www.datacite.org
DMPTool is an online service to enable researchers to create data management plans, now required by many funding agencies, and to receive tailored institutional guidance to help them in the process. https://dmp.cdlib.org
o Section II addresses data documentation more from the researcher's view
o Section III interprets data documentation more from a curator's or librarian's perspective
o What do researchers really care about?
o Will each party see the other side's points and emphases?
CDL Curation and Publishing Services (http://www.cdlib.org)
o Create, edit, share, and save data management plans
o Open access scholarly publishing services: papers, journals, books, seminars & more
o Curation repository: store, manage, and share research data
o Create and manage persistent identifiers
o Open source add-in for Microsoft Excel as a data collection tool
o Data Publication: an infrastructure to publish and get credit for sharing research data
This slide is by Joan Starr, California Digital Library: http://www.slideshare.net/joanstarr/dataset-metadata-tools-approaches-for-access-preservation?from_search=1
http://library.ucf.edu/ScholarlyCommunication/UCFResearchLifecycle.pdf
Data Set Related Services
o "Data Set (also called 'Dataset') Metadata" provides researchers consultation on
  o Project and dataset documentation
  o Metadata standards (Common and Domain Specific)
  o Metadata schemas customization
  o Controlled vocabularies and thesauri
  o Data curation tools and practices
o Assists in describing basic properties of your data and enriching metadata for your datasets
o Supports applying controlled vocabularies or optimizing keywords to enhance the search of your datasets
o Helps to prepare your metadata and data for deposit and preservation
o Scholarly Communication (http://library.ucf.edu/ScholarlyCommunication/)
o SC Contact Information (http://library.ucf.edu/ScholarlyCommunication/Contact.php)
o UCF Library Research Guides (http://guides.ucf.edu)
o Metadata Guide (http://guides.ucf.edu/metadata)
o Data Management Guide (http://guides.ucf.edu/data)
o Research and Information Services (http://library.ucf.edu/Reference)
FGDC/CSDGM Metadata
Abstract: United States Geological Survey, Saint Petersburg, Florida, Center for Coastal and Watershed Studies…
Purpose: These data and information are intended for science researchers, students…
Language: eng; USA
Citation
  Title: Biological data of field activity 08CRD01 (B-1-08-VI) in U.S. Virgin Islands from 05/30/2008 to 06/13/2008
  Date
    Date: 2013-03-03
    Date Type: Publication Date
  Organisation Name: U.S. Geological Survey (USGS) <http://www.usgs.gov/>, Coastal and Marine Geology (CMG) <http://walrus.wr.usgs.gov/>
  Role: Publisher
  Contact Info: …
Point Of Contact: …
Representation Type: Vector
Topic Category
Keyword Collection
  Keyword: EARTH SCIENCE > OCEANS
  Associated Thesaurus: Global Change Master Directory (GCMD)
  Keyword: Marine Geology
  Associated Thesaurus: USGS CMG InfoBank
Spatial Extent
  West Bounding Longitude: -65.75000
  East Bounding Longitude: -63.25000
  North Bounding Latitude: 18.75000
  South Bounding Latitude: 17.25000
Constraints: Please recognize the U.S. Geological Survey (USGS) as the source of this information. Physical materials are under controlled on-site access. Some USGS information accessed through this means may be preliminary in nature and presented without the approval of the Director of the USGS…
Legal Constraints
  Use Constraints: Other Restrictions
  Other Constraints: Use Constraints: Please recognize the U.S. Geological Survey (USGS) as the source of this information. Physical materials are under controlled on-site access…
…
Distribution
  Distribution Format
    Format Name: ASCII
    Format Version:
    File Decompression Technique: No compression applied
o XML schema brings documentation into a single document, creates structured content about the data, and allows data interoperability and sharing
o It can document comprehensive variable-level information, such as a basic data dictionary, question text, and question routing instructions
o Data Documentation Initiative (DDI): a metadata specification for the social and behavioral sciences. It is an XML metadata standard for documenting numeric data. Detailed information is available at http://www.ddialliance.org
o Projects using the DDI (http://www.ddialliance.org/ddi-at-work/projects)
o DDI-compliant data repositories
  o ICPSR - Inter-university Consortium for Political and Social Research
    o Data deposit form: https://www.icpsr.umich.edu/cgi-bin/ddf2
    o UCF is a member of ICPSR
  o UKDA - UK Data Archive
ICPSR (Inter-university Consortium for Political and Social Research)
http://www.icpsr.umich.edu/icpsrweb/NACJD/studies/20363?archive=NACJD&q=%22university+of+central+florida%22&permit%5B0%5D=AVAILABLE&x=-999&y=-84
Field Labels
• Title
• Principal investigator(s)
• Summary
• Access notes
• Dataset(s)
  DS0: Study-Level Files
    Documentation: Questionnaire.pdf, User guide.pdf
  DS1: Female Interviews
    Documentation: Codebook.pdf
  …
Field Labels
Study description
Citation
Funding
Scope of study
• Subject terms
• Smallest geographic unit
• Geographic coverage
• Time period
• Date of collection
• Unit of observation
• Universe
• Data types
• Data collection notes
Methodology
• Study purpose
• Study design
• Sample
• Mode of data collection
• Description of variables
• Response rates
• Presence of common scales
• Extent of processing
Version(s)
Related publications
Variables
Utilities
• Metadata exports
• Download statistics
Variables
List all 1682 variables in this study (http://www.icpsr.umich.edu/icpsrweb/NACJD/ssvd/studies/20363/variables), e.g.:
• ID: QUESTIONNAIRE ID NUMBER
• ISEX: INTERVIEWER GENDER
• START: INTERVIEW START TIME HHMM USE 24 HR CLOCK
• Q1A: COUNTRY OF BIRTH
• Q1B: STATE OF BIRTH - INITIALS OF STATE
• Q1C: CITY OF BIRTH WRITE IN NOT APP
• Q1D: YEARS LIVED IN USA
• Q1E: RESIDENCY STATUS
• CHECK1: CHECKPOINT 1 BORN IN SAME METRO AREA
• Q2: HOW LONG LIVED IN THIS AREA
• …
http://www.icpsr.umich.edu/icpsrweb/ICPSR/ddi2/studies/20363
docDscr
The Document Description consists of bibliographic information describing the DDI-compliant document itself as a whole.
Included Fields
citation
• titlStmt
• prodStmt
• verStmt
• holdings
stdyDscr
The Study Description consists of information about the data collection, study, or compilation that the DDI-compliant documentation file describes. This section includes information about how the study should be cited, who collected or compiled the data, who distributes the data, keywords about the content of the data, summary (abstract) of the content of the data, data collection methods and processing, etc.
Included Fields
Citation
• titlStmt
• rspStmt
• prodStmt
• fundAg
• grantNo
• distStmt
• biblCit
• Holdings
stdyInfo
• Subject
• Abstract
• sumDscr
Method
• dataColl
• Notes
• anlyInfo
dataAccs
• setAvail
• useStmt
fileDscr
Data Files Description: Information about the data file(s) that comprise a collection. This section can be repeated for collections with multiple files.
Included Fields
fileDscr
• fileTxt
• fileName
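The three sections described above (docDscr, stdyDscr, fileDscr) can be pictured as a skeleton XML document. The Python sketch below assembles such a skeleton under simplifying assumptions: it leaves out the XML namespace and most elements, it is not validated against the DDI schema, and the study title, producer, and file names are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_codebook(study_title, producer, file_names):
    """Assemble a minimal DDI-style codebook skeleton (namespace omitted)."""
    code_book = ET.Element("codeBook")

    # docDscr: describes the documentation file itself
    doc_dscr = ET.SubElement(code_book, "docDscr")
    doc_titl_stmt = ET.SubElement(ET.SubElement(doc_dscr, "citation"), "titlStmt")
    ET.SubElement(doc_titl_stmt, "titl").text = f"Documentation for: {study_title}"

    # stdyDscr: describes the study (how to cite it, who produced it, ...)
    stdy_dscr = ET.SubElement(code_book, "stdyDscr")
    stdy_citation = ET.SubElement(stdy_dscr, "citation")
    ET.SubElement(ET.SubElement(stdy_citation, "titlStmt"), "titl").text = study_title
    ET.SubElement(ET.SubElement(stdy_citation, "prodStmt"), "producer").text = producer

    # fileDscr: repeated once per data file in the collection
    for file_name in file_names:
        file_txt = ET.SubElement(ET.SubElement(code_book, "fileDscr"), "fileTxt")
        ET.SubElement(file_txt, "fileName").text = file_name

    return code_book


# Hypothetical values, for illustration only
code_book = build_codebook(
    study_title="Example Interview Study",
    producer="Example University",
    file_names=["interviews_female.txt", "interviews_male.txt"],
)
print(ET.tostring(code_book, encoding="unicode"))
```

In practice a tool such as Nesstar Publisher or Colectica would generate the full structure; the sketch only shows how the pieces nest.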
o Context and participant details of interviews can be documented in
  o A descriptive header or summary page in transcripts or field notes
  o A structured data list
o XML mark-up of data, for example
  o Text Encoding Initiative (TEI) to mark up interview transcripts
  o Qualitative Data Exchange Format (QuDEx) for researcher annotations and data linking
o Anonymisation of textual data (e.g., replacing real names of people, organizations, and locations with pseudonyms)
o File naming
  o Use meaningful short names; identify file types (e.g., interviews, focus groups, field notes, audio recordings); avoid spaces and special characters; avoid long names
o Organizing files in folders: Create uniform and structured folder names based on cases, studies, locations, data types, etc., or on the original, anonymized, coded, or annotated versions of data
o Version control: Version numbering in file names
o Documentation: Methodology description, project plan, interview guidelines, consent form templates, data analyses and manipulation
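The anonymisation step listed above (replacing real names with pseudonyms) can be sketched in Python as follows. This is a simple find-and-replace illustration, not the QualAnon tool; the names and pseudonyms are hypothetical, and a real project would still need a manual check for indirect identifiers.

```python
import re


def pseudonymize(text, replacements):
    """Replace each real name in `text` with its pseudonym.

    `replacements` maps real names to pseudonyms. Longer names are tried
    first so that "Mary Smith" is replaced before the shorter "Mary".
    """
    if not replacements:
        return text
    names = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return pattern.sub(lambda match: replacements[match.group(1)], text)


# Hypothetical names and pseudonyms, for illustration only
key = {
    "Mary Smith": "[Person A]",
    "Mary": "[Person B]",
    "Acme Corp": "[Organization 1]",
    "Orlando": "[City 1]",
}
transcript = "Mary Smith met Mary at Acme Corp in Orlando."
anonymized = pseudonymize(transcript, key)
print(anonymized)
```

The key that links pseudonyms to real names is itself sensitive and should be stored separately from the anonymized transcripts.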
o Example is from "A NESSTAR FOR QUALITATIVE DATA: BUILDING BLOCKS FOR DIGITAL FUTURES" by Corti, Louise, et al., available at http://data-archive.ac.uk/media/376907/digitalfutures_dashish_21nov2012.pdf
o Data List
Interview ID    Text File Name
x001            6124int001
x002            6124int002
…               …
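The file naming, version numbering, and data list ideas above can be combined in a short Python sketch. The pattern (study number, a short type code such as `int` for interview, a three-digit sequence number) is inferred from the two file names in the data list; the `_v02` version suffix and the function names are assumptions made for illustration.

```python
import re


def make_base_name(study_id, type_code, sequence, version=None):
    """Build a short file name without spaces or special characters.

    Example: study "6124", type code "int", sequence 1 -> "6124int001".
    The optional version suffix ("_v02") is a hypothetical convention.
    """
    if not re.fullmatch(r"[A-Za-z0-9]+", study_id + type_code):
        raise ValueError("use letters and digits only in file names")
    name = f"{study_id}{type_code}{sequence:03d}"
    if version is not None:
        name += f"_v{version:02d}"
    return name


def build_data_list(study_id, count, type_code="int"):
    """Return (interview ID, text file name) pairs for a structured data list."""
    return [
        (f"x{n:03d}", make_base_name(study_id, type_code, n))
        for n in range(1, count + 1)
    ]


data_list = build_data_list("6124", 2)
print("Interview ID    Text File Name")
for interview_id, file_name in data_list:
    print(f"{interview_id:<16}{file_name}")
```

Generating names from one function keeps them uniform, which makes files easier to sort, match to the data list, and migrate later.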
o Create and generate metadata for your research data and datasets in your research lifecycle to preserve the data in the long run
o Consider what information is needed for the data to be read and interpreted in the future
o Understand your funder requirements for data documentation and metadata. Funder requirements for NSF, GBMF, IMLS, NEH, NIH, and NOAA can be found at https://dmptool.org/guidance
o Consult available metadata standards in your field. You may refer to Common Metadata Standards and Domain Specific Metadata Standards for details
o Describe data and datasets created in your research lifecycle, and use software programs and tools to assist in data documentation. Assign or capture administrative, descriptive, technical, structural, and preservation metadata for the data. Some potential information to document:
  o Descriptive metadata
    o Name of creator of data set
    o Name of author of document
    o Title of document
    o File name
    o Location of file
    o Size of file
  o Structural metadata
    o File relationships (e.g., child, parent)
  o Technical metadata
    o Format (e.g., text, SPSS, Stata, Excel, tiff, mpeg, 3D, Java, FITS, CIF)
    o Compression or encoding algorithms
    o Encryption and decryption keys
    o Software (including release number) used to create or update the data
    o Hardware on which the data were created
    o Operating systems in which the data were created
    o Application software in which the data were created
  o Administrative metadata
    o Information about data creation (e.g., date)
    o Information about subsequent updates, transformation, versioning, summarization
    o Descriptions of migration and replication
    o Information about other events that have affected the files
  o Preservation metadata
    o File format (e.g., txt, pdf, doc, rtf, xls, xml, spv, jpg, fits)
    o Significant properties
    o Technical environment
    o Fixity information
o Adopt a thesaurus in your field if applicable, or compile a data dictionary for your dataset
o Obtain persistent identifiers (e.g., DOI, PURL) for datasets if possible to ensure data can be found in the future
o For your full data management plan, visit the UCF Libraries Data Management Guide. Also refer to the Digital Curation Centre's Checklist for a Data Management Plan (http://www.dcc.ac.uk/sites/default/files/documents/resource/DMP_Checklist_2013.pdf)
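Some of the descriptive, technical, and fixity information listed above can be captured automatically rather than typed by hand. The Python sketch below records a file's name, location, size, format (taken from the file extension), last modification time, and a SHA-256 checksum as fixity information. The field names in the returned dictionary are hypothetical and do not follow any particular metadata standard.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def describe_file(path, chunk_size=65536):
    """Collect basic descriptive, technical, and fixity metadata for one file."""
    path = Path(path)
    digest = hashlib.sha256()
    # Read in chunks so that large data files do not have to fit in memory
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    stat = path.stat()
    return {
        "file_name": path.name,
        "location": str(path.resolve().parent),
        "size_bytes": stat.st_size,
        "format": path.suffix.lstrip(".").lower(),
        "modified_utc": datetime.fromtimestamp(
            stat.st_mtime, tz=timezone.utc
        ).isoformat(),
        "sha256": digest.hexdigest(),
    }


def is_unchanged(path, recorded):
    """Fixity check: does the file still match its recorded checksum?"""
    return describe_file(path)["sha256"] == recorded["sha256"]
```

Recording the checksum at deposit time and re-running `is_unchanged` later is one simple way to detect whether a file has been altered or corrupted.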
o Common Metadata Standards
o Disciplinary Metadata Standards
o Activity: Choose a dataset or a standard in your field to examine and critique
  o Social Science Dataset
  o Humanities Dataset
  o Biological Sciences Dataset
  o Biotechnology Dataset
  o Geospatial Dataset
  o Earth Science Dataset
  o Physical Science Dataset
  o Other…
o Dublin Core (DC): A general metadata standard for describing a wide range of digital resources
  o Dublin Core Metadata Element Set, Version 1.1 (http://dublincore.org/documents/dces/)