Day 4
Metadata
Statistics Canada
December 1st 2011
SIMPII – Workshop on Information Technology
Apr 01, 2015
23-04-11Statistics Canada • Statistique Canada2
Outline
• What is metadata?
• Standards
• Why is it important?
• Implementation example with Social Surveys Common Tools
What is metadata?
Definition: “Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource”*
*NISO (2004) Understanding Metadata. Bethesda, NISO Press
Describes content, quality, condition and other characteristics about data
What is metadata?
Metadata answers questions about your data:
• What is the concept?
• Where is the input source?
• What is it used for?
• When did it change?
• Who changed the variable last?
Helps to improve communication between:
• Data developers, data users and organizations
Standards
Intended to establish a common understanding of the meaning or semantics of the data
For example, StatCan uses:
• DDI (Data Documentation Initiative): a standard for technical documentation describing social science data
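As an illustration of what a documentation standard buys you, here is a minimal Python sketch that writes one variable description as XML. The element names (codeBook, dataDscr, var, labl, qstn, qstnLit, catgry, catValu) are modelled on DDI Codebook naming and should be treated as an assumption: the fragment is not validated against the DDI schema, and the variable HAS_CELL and its texts are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_variable(name, label, question, categories):
    """Return a <var> element describing one survey variable."""
    var = ET.Element("var", {"name": name})
    ET.SubElement(var, "labl").text = label
    qstn = ET.SubElement(var, "qstn")
    ET.SubElement(qstn, "qstnLit").text = question
    # One <catgry> per answer category: a code value plus its label.
    for code, text in categories:
        cat = ET.SubElement(var, "catgry")
        ET.SubElement(cat, "catValu").text = str(code)
        ET.SubElement(cat, "labl").text = text
    return var


# Hypothetical variable, used only to show the structure.
code_book = ET.Element("codeBook")
data_dscr = ET.SubElement(code_book, "dataDscr")
data_dscr.append(
    build_variable(
        "HAS_CELL",
        "Has a cell phone",
        "Do you have a cell phone?",
        [(1, "Yes"), (2, "No")],
    )
)

ddi_xml = ET.tostring(code_book, encoding="unicode")
print(ddi_xml)
```

Because every producer names the same things the same way, a consumer can read the concept, question text and codes without knowing which system produced the file.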
Why is it important?
• Records basic information about your data
• Provides a common understanding of your data
• Allows for reuse during the Survey Development Life Cycle
• Facilitates connections between systems & services
• Supports archiving & preservation
Example
“dog” vs. “golden retriever puppy”
Clearly, this more specific search term is better. But it only works if someone has taken the time to associate the metadata.
Example
This puppy example illustrates not only the effectiveness of metadata but also the importance of tagging content with metadata.
If users don’t take the time to attach metadata when they create, upload, or edit documents, the benefits will be lost.
[Figure: a document (DOC) tagged with metadata fields]
• document name
• audience
• expiration date
• version
• department
• project
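The tagging idea can be sketched in a few lines of Python. The field names come from the slide; the Document class, the find_documents helper and all sample values are hypothetical and only illustrate how attached metadata makes retrieval precise.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    # Metadata fields listed on the slide.
    name: str
    audience: str
    expiration_date: date
    version: str
    department: str
    project: str


def find_documents(documents, **criteria):
    """Return the documents whose metadata equals every given criterion."""
    return [
        doc
        for doc in documents
        if all(getattr(doc, field) == value for field, value in criteria.items())
    ]


# Hypothetical sample documents.
docs = [
    Document("Survey plan", "staff", date(2012, 12, 31), "1.0", "Informatics", "SSME"),
    Document("Press release", "public", date(2012, 6, 30), "2.1", "Communications", "SSME"),
    Document("Edit specs", "staff", date(2013, 3, 31), "0.9", "Informatics", "SSPE"),
]

matches = find_documents(docs, department="Informatics", project="SSME")
print([doc.name for doc in matches])
```

A search on untagged documents could only match words in the file name or body; with tags, the query above narrows three documents to the one that fits both criteria.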
Enterprise Metadata Classification
Common Tools Logo
Common Tools Technical Architecture
Solution Overview
Social Survey Metadata Environment (SSME)
• Supporting environment of a metadata-driven processing system
Interfaces are developed to access and manipulate the appropriate metadata in support of a particular business process:
• Questionnaire Development (QDT)
• Data Dictionary (DDT)
• Processing and Specifications (PST)
• Derived Variable (DVT)
Solution Overview
Social Survey Processing Environment (SSPE)
• A set of generalized processes that can be used in the processing activities of the Survey Life Cycle.
The purpose of these processes is to allow subject matter and survey support staff to specify and run the processing of a survey in a timely fashion with high quality outputs.
Questionnaire Development Tool screenshot
QDT Auto-generated Report
CELL_Q03
For which of the following reasons did she get her cell phone?
Pour quelles raisons, parmi les suivantes, a-t-elle acquis son téléphone cellulaire?
INTERVIEWER: Read categories to respondent. Mark all that apply.
INTERVIEWEUR : Lisez les catégories au répondant. Choisissez toutes les réponses appropriées.
01      It was a gift                           C'était un cadeau
02      In case of emergency                    En cas d'urgence
03      Peer influence                          Influence des pairs
04      Work requires it                        Requis pour le travail
05      To browse the Internet                  Pour naviguer Internet
06      To replace a regular landline phone     Pour remplacer un téléphone régulier
07      To replace another multimedia player    Pour remplacer un autre appareil multimédia
08      Other                                   Autres
DK, RF                                          NSP, RF
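A report like this can be generated from question metadata rather than typed by hand. The following Python sketch assumes a hypothetical dictionary layout for one question and a hypothetical render_report function; it is not the QDT's actual data model, only an illustration of producing a bilingual report from structured metadata.

```python
# Hypothetical metadata record for one question (first two categories only).
question = {
    "name": "CELL_Q03",
    "text": {
        "en": "For which of the following reasons did she get her cell phone?",
        "fr": "Pour quelles raisons, parmi les suivantes, a-t-elle acquis "
              "son téléphone cellulaire?",
    },
    "categories": [
        ("01", "It was a gift", "C'était un cadeau"),
        ("02", "In case of emergency", "En cas d'urgence"),
    ],
}


def render_report(q):
    """Return a plain-text bilingual report for one question."""
    lines = [q["name"], q["text"]["en"], q["text"]["fr"]]
    for code, english, french in q["categories"]:
        # Pad the English label so the French labels line up in a column.
        lines.append(f"{code}  {english:<24}{french}")
    return "\n".join(lines)


report = render_report(question)
print(report)
```

Since the English and French texts live in one record, a change to a category is made once and appears in every report generated from it.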
Processing Specifications Tool
Data Dictionary Tool output
Code    Answer Categories    Frequencies    Population      %
1       Yes                       22,345     4,746,561     17
2       No                       108,655    23,080,670     82
6       Valid skip                   950       201,801      1
7       Don’t know                     3           637      0
8       Refusal                        1           212      0
9       Not Stated                     5         1,062      0
        Total                    131,959    28,030,943    100
Variable Name: CELL_03A
Length: 1
Position: 5
Question Name: CELL_Q03
Concept: Reasons to get a cell phone – Gift
Question: For which of the following reasons did you get your cell phone? – Gift
Universe: Respondents who answered CELL_1=1
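The totals and the % column of this output can be reproduced from the category counts. The Python sketch below uses the figures from the slide and assumes that % is each category's share of the population estimate, rounded to a whole number; that reading is an assumption, though it matches the published values. The summarize function is a hypothetical helper, not part of the Data Dictionary Tool.

```python
# (code, answer category, frequency, population estimate) from the slide.
categories = [
    (1, "Yes", 22_345, 4_746_561),
    (2, "No", 108_655, 23_080_670),
    (6, "Valid skip", 950, 201_801),
    (7, "Don't know", 3, 637),
    (8, "Refusal", 1, 212),
    (9, "Not stated", 5, 1_062),
]


def summarize(rows):
    """Return per-category rows with a rounded percentage, plus both totals."""
    total_freq = sum(freq for _, _, freq, _ in rows)
    total_pop = sum(pop for _, _, _, pop in rows)
    table = [
        (code, label, freq, pop, round(100 * pop / total_pop))
        for code, label, freq, pop in rows
    ]
    return table, total_freq, total_pop


table, total_freq, total_pop = summarize(categories)
for code, label, freq, pop, pct in table:
    print(f"{code}  {label:<12}{freq:>10,}{pop:>14,}{pct:>5}")
print(f"   {'Total':<12}{total_freq:>10,}{total_pop:>14,}")
```

Recomputing the figures this way is a cheap consistency check between the data file and the documentation that describes it.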
Common Tools Entity Relationship Diagram
Common Tools Portal
SDMX
Statistical Data and Metadata eXchange (born in 2002)
- Standardization for statistical data and metadata access and exchange
- Between NSOs and international organizations
- Within a national statistical system
- Within an organization
- For dissemination
Sponsors: BIS, ECB, EUROSTAT, IMF, OECD, UN, World Bank
1) Technical standards (v1: ISO 17369)
- XML-based message formats (SDMX-ML)
- GESMES and the UN/EDIFACT-based message formats
- Guidelines for SDMX web service implementations
- SDMX registry specification (“yellow pages”)
2) SDMX Content-Oriented Guidelines
- Statistical subject-matter domains (to locate data and working groups)
- Cross-domain concepts/code lists (incl. metadata concepts, mapping if difficult to agree)
- Metadata common vocabulary (terminology)
SDMX Plans for Statistics Canada
• Create SDMX-ML outputs from CANSIM
• Investigate the OECD implementation of SDMX using .STAT software
• Participate in the Statistical Network (Innovation in Dissemination, machine-to-machine transfer stream) with Stats New Zealand and the Australian Bureau of Statistics
• Investigate implementation of the SDMX Reference Infrastructure from Eurostat
Conclusion
• Communication is key to collaboration
• Helps decision making
• Reduces system and data redundancy
• Enables enterprise-wide application development
Jean Labbé
Field IT Manager
Statistical Information System Division
Informatics Branch
(613) [email protected]
Xie xie (thank you)