Linked Statistical Data 101 ESS Workshop on dissemination of official statistics as open data 18-19 January 2017, Malta Oscar Corcho Escuela Técnica Superior de Ingenieros Informáticos Universidad Politécnica de Madrid Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid http://www.oeg-upm.net/ ocorcho@fi.upm.es
Contents
• Foundations of (Linked) Open Data
  - For public administrations, in general
  - For statistical offices, in particular
• Linked Statistical Data by example
  - A use case from IAEST (Aragón Statistical Office)
• A bit of technical background
  - W3C RDF DataCube
• Preparing the discussion on benefits for different types of stakeholders
What is Open Data?
• Open data is data that can be freely used, re-used and redistributed by anyone, subject only, at most, to the requirement to attribute and share alike
• Key aspects:
  - Availability and access: the data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the Internet. The data must also be available in a convenient and modifiable form.
  - Re-use and redistribution: the data must be provided under terms that permit re-use and redistribution, including the intermixing with other datasets.
  - Universal participation: everyone must be able to use, re-use and redistribute. There should be no discrimination against fields of endeavour or against persons or groups.
[source: Open Data Handbook, http://opendatahandbook.org/en/what-is-open-data/ ]
5) Via APIs (semantically enhanced) and linked
  - To be used by systems (and sometimes persons)
  - It allows generating added-value services
  - Standardised formats (JSON, JSON-LD, RDF)
  - Standardised models (vocabularies, ontologies)
Difficult to reuse
√ Reusable. Not open
√ Reusable, open Difficult to link together
√ Reusable, open, complete, easier to link
Data representation formats
And many more: JSON, JSON-LD, Shapefiles, KMZ, KML, PC-Axis, etc.
Recap: The 5-star categorisation from TBL
Why Linked Statistical Data? (I)
• Facilitate data (re)use by developers outside our organisation
  - Data access APIs (according to standards)
  - Do they prefer CSVs, PCAxis, SDMX, RDF?
  - Fine-grained data granularity (refer to specific facts)
• Integration with other data sources from other public or private organisations
  - E.g., Government of Aragón for municipalities
• Allow for queries across datasets
  - E.g., tell me how many municipalities may benefit from this funding that I am making available with these restrictions: number of registered companies lower than 5 and unemployed population higher than 15%
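The cross-dataset query in the last bullet can be sketched in a few lines. This is only an illustration of why shared identifiers matter: the municipality URIs, the figures, and the function name `eligible_municipalities` are all invented for the example and do not come from IAEST data. In practice such a query would more likely be a SPARQL query over the published RDF.

```python
# Two hypothetical datasets, possibly published by different organisations,
# that identify municipalities with the same URIs. All values are invented.
REGISTERED_COMPANIES = {
    "http://example.org/municipality/A": 3,
    "http://example.org/municipality/B": 12,
    "http://example.org/municipality/C": 4,
}
UNEMPLOYMENT_RATE_PERCENT = {
    "http://example.org/municipality/A": 18.5,
    "http://example.org/municipality/B": 21.0,
    "http://example.org/municipality/C": 9.2,
}


def eligible_municipalities(companies, unemployment,
                            max_companies=5, min_unemployment=15.0):
    """Return the municipalities that satisfy both funding restrictions.

    Only municipalities present in both datasets are considered, which is
    what shared identifiers make possible without manual matching.
    """
    shared = companies.keys() & unemployment.keys()
    return sorted(
        uri for uri in shared
        if companies[uri] < max_companies
        and unemployment[uri] > min_unemployment
    )


if __name__ == "__main__":
    matches = eligible_municipalities(REGISTERED_COMPANIES,
                                      UNEMPLOYMENT_RATE_PERCENT)
    print(len(matches), "municipalities may benefit:", matches)
```

The join works here only because both datasets use the same key for each municipality; with plain CSV files using different local codes, a mapping step would be needed first.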
Why Linked Statistical Data? (II)
• Internal benefits as well
  - Codelists are made available and more visible internally
  - Methodology and metadata explicitly described as part of the RDF DataCube data (e.g., reference years in datasets)
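As a rough sketch of what "metadata as part of the data" can look like, the snippet below writes one observation in Turtle using terms from the RDF Data Cube vocabulary (`qb:Observation`, `qb:dataSet`) and the SDMX dimension and measure vocabularies (`refArea`, `refPeriod`, `obsValue`). The helper name `observation_to_turtle`, the `example.org` URIs and the value are assumptions made for illustration; real datasets often model the reference period as a URI rather than a typed literal.

```python
# Namespace prefixes used in the generated Turtle.
PREFIXES = {
    "qb": "http://purl.org/linked-data/cube#",
    "sdmx-dimension": "http://purl.org/linked-data/sdmx/2009/dimension#",
    "sdmx-measure": "http://purl.org/linked-data/sdmx/2009/measure#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def observation_to_turtle(obs_uri, dataset_uri, area_uri, ref_year, value):
    """Serialise a single statistical observation as Turtle text.

    The reference year travels with the observation itself, instead of
    living in a separate methodological note. (Hypothetical helper.)
    """
    lines = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in PREFIXES.items()]
    lines.append("")
    lines.append(f"<{obs_uri}> a qb:Observation ;")
    lines.append(f"    qb:dataSet <{dataset_uri}> ;")
    lines.append(f"    sdmx-dimension:refArea <{area_uri}> ;")
    lines.append(f'    sdmx-dimension:refPeriod "{ref_year}"^^xsd:gYear ;')
    lines.append(f'    sdmx-measure:obsValue "{value}"^^xsd:decimal .')
    return "\n".join(lines)


if __name__ == "__main__":
    # Invented example values.
    print(observation_to_turtle(
        obs_uri="http://example.org/observation/1",
        dataset_uri="http://example.org/dataset/unemployment",
        area_uri="http://example.org/municipality/A",
        ref_year=2016,
        value=18.5,
    ))
```

A dedicated RDF library would normally be used instead of string formatting; plain strings are used here only to keep the sketch dependency-free.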