Linked Statistical Data 101 ESS Workshop on dissemination of official statistics as open data 18-19 January 2017, Malta Oscar Corcho Escuela Técnica Superior de Ingenieros Informáticos Universidad Politécnica de Madrid Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid http://www.oeg-upm.net/ ocorcho@fi.upm.es

Linked Statistical Data 101

Apr 06, 2017


Oscar Corcho
Transcript
Page 1: Linked Statistical Data 101

Linked Statistical Data 101

ESS Workshop on dissemination of official statistics as open data
18-19 January 2017, Malta

Oscar Corcho
Escuela Técnica Superior de Ingenieros Informáticos
Universidad Politécnica de Madrid
Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid

http://www.oeg-upm.net/ ocorcho@fi.upm.es

Page 2: Linked Statistical Data 101

Contents

• Foundations of (Linked) Open Data
  - For public administrations, in general
  - For statistical offices, in particular
• Linked Statistical Data by example
  - A use case from IAEST (Aragón Statistical Office)
• A bit of technical background
  - W3C RDF DataCube
• Preparing the discussion on benefits for different types of stakeholders


Page 4: Linked Statistical Data 101

What is Open Data?

• Open data is data that can be freely used, re-used and redistributed by anyone - subject only, at most, to the requirement to attribute and share alike

• Key aspects:
  - Availability and access: the data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the Internet. The data must also be available in a convenient and modifiable form.
  - Re-use and redistribution: the data must be provided under terms that permit re-use and redistribution including the intermixing with other datasets.
  - Universal participation: everyone must be able to use, re-use and redistribute - there should be no discrimination against fields of endeavour or against persons or groups.

[source: Open Data Handbook, http://opendatahandbook.org/en/what-is-open-data/ ]

Page 5: Linked Statistical Data 101

Relevant Legislation. Europe and Spain

• Open Access Initiative (2001). Scientific information; > 510 organisations
• Aarhus Convention (1998). Right to participate and access; 41 countries and the EU
• PSI (public sector information) Directives. PSI reuse (2003/98/EC and 2013/37/EU)
• Convention about access to official documentation (2009)
  - 12 countries
• Law 37/2007. PSI reuse (transposition of Directive 2003/98/EC)
  - Modified by Law 18/2015 (BOE 10/07/2015, Directive 2013/37/EU)
• Law 11/2007. Citizen access to public services, and rights to good quality services
• RD (Royal Decree) 4/2010. Esquema Nacional de Interoperabilidad (National Interoperability Framework)
  - Open standards, technology neutral, open source
• RD 1495/2011. Develops Law 37/2007 for national agencies
• Norma Técnica de Interoperabilidad (Technical Interoperability Standard) (19/02/2013, BOE 4/3/2013)

[source: based on a presentation from Antonio Rodríguez Pascual (CNIG)]

Page 6: Linked Statistical Data 101

An Explosion of Open Data Portals

Page 7: Linked Statistical Data 101

Open Data and how to publish it

1) On a posterboard
  - For those with a lot of free time available
  - Or those who happen to be there at the right time

Adapted from: Antonio Rodríguez Pascual (IGN)

Page 8: Linked Statistical Data 101

Open Data and how to publish it

2) On a Web page or mobile app
  - For people, but not downloadable

Adapted from: Antonio Rodríguez Pascual (IGN)

Page 9: Linked Statistical Data 101

Open Data and how to publish it

3) In files
  - These can be downloaded and used by humans in information systems (XML, HTML, CSV, GTFS, etc.)
  - Luckily, it is not a scanned PDF

Adapted from: Antonio Rodríguez Pascual (IGN)

Page 10: Linked Statistical Data 101

Open Data and how to publish it

4) Via Web Services
  - They can be used by systems (sometimes persons)
  - They allow generating added value
  - Ease of integration in the application logic

Adapted from: Antonio Rodríguez Pascual (IGN)

Page 11: Linked Statistical Data 101

All together…, Shaken, not stirred…

Page 12: Linked Statistical Data 101

What is Linked Data?

1. Use URIs to identify resources

2. Use HTTP URIs, so that they can be found

3. Use de-referenceable URIs, that is, provide useful data (RDF, JSON, SPARQL)

4. Include links to other URIs.

• http://www.w3.org/DesignIssues/LinkedData.html

Page 13: Linked Statistical Data 101

Open Data and how to publish it

5) Via APIs (semantically enhanced) and linked
  - To be used by systems (and sometimes persons)
  - It allows generating added-value services
  - Standardised formats (JSON, JSON-LD, RDF)
  - Standardised models (vocabularies, ontologies)

Page 14: Linked Statistical Data 101

Data representation formats

[Figure: examples of data formats, labelled from least to most reusable]

• Difficult to reuse
• √ Reusable. Not open
• √ Reusable, open. Difficult to link together
• √ Reusable, open, complete, easier to link

And many more: JSON, JSON-LD, Shapefiles, KMZ, KML, PC-Axis, etc.

Page 15: Linked Statistical Data 101

Recap: The 5-star categorisation from Tim Berners-Lee (TBL)

Page 16: Linked Statistical Data 101

Contents

• Foundations of (Linked) Open Data
  - For public administrations, in general
  - For statistical offices, in particular
• Linked Statistical Data by example
  - A use case from IAEST (Aragón Statistical Office)
• A bit of technical background
  - W3C RDF DataCube
• Preparing the discussion on benefits for different types of stakeholders

Page 17: Linked Statistical Data 101

Which type of data and which (re)users?

[Figure: types of data (infrastructure such as cartography, streets, directories and codes; microdata; macrodata; metadata), marked as public or non-public, set against types of (re)users (analysts, journalists, citizens, researchers)]

[source: Alberto González Yanes (ISTAC)]

Page 18: Linked Statistical Data 101

Our use case: Aragón

• IAEST - Instituto Aragonés de Estadística
• Good open data ecosystem
  - Aragón Open Data
    • http://opendata.aragon.es/
  - Zaragoza
    • http://datos.zaragoza.es/

Page 19: Linked Statistical Data 101

Reports and templates from Oracle BI

Current Web application for local statistics

Statistics about municipalities

Page 21: Linked Statistical Data 101

Reports and templates from Oracle BI

Current Web application for local statistics

What have we done?

Page 22: Linked Statistical Data 101

General architecture

[Diagram labels: Transformation process, Publication process, Linked Data, SPARQL, Elda, API]

This is not the purpose of my talk

https://github.com/aragonopendata/local-data-aragopedia

Page 23: Linked Statistical Data 101

URIs for datasets

• Let’s look for the dataset on “Number of homes per owner per municipality”
  - Número de hogares por tipo de propietario por municipio
• The dataset has a URI
  - http://opendata.aragon.es/recurso/iaest/dataset/01-010013TM
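A client could ask for the data behind such a URI roughly as follows. This is only a sketch: it assumes the server supports HTTP content negotiation and can return Turtle, which the slides do not state, and the function names are illustrative.

```python
from urllib.request import Request, urlopen

# Dataset URI as given on the slide.
DATASET_URI = "http://opendata.aragon.es/recurso/iaest/dataset/01-010013TM"


def build_rdf_request(uri, media_type="text/turtle"):
    """Prepare an HTTP GET that asks for an RDF serialisation of `uri`.

    Assumption: the server honours the Accept header (content negotiation).
    """
    return Request(uri, headers={"Accept": media_type})


def dereference(uri, media_type="text/turtle", timeout=10.0):
    """Fetch the description of `uri` as text. Needs network access."""
    with urlopen(build_rdf_request(uri, media_type), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Only builds and shows the request; call dereference() to go online.
    request = build_rdf_request(DATASET_URI)
    print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Keeping the request construction separate from the network call makes it possible to inspect what would be sent without contacting the server.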

Page 24: Linked Statistical Data 101


What is behind that URI?

This is not the purpose of my talk

Page 25: Linked Statistical Data 101

URIs for each observation

• And now we can point to specific observations in this dataset
  - In 2001, the number of buildings owned by one person in the municipality of Ilche
    • http://opendata.aragon.es/recurso/iaest/observacion/01-010013TM/00794aab-964f-35c7-8e7c-156c9bc60133

Page 26: Linked Statistical Data 101


URIs for each observation

Page 27: Linked Statistical Data 101

And links to other URIs in Aragón

• The municipality of Ilche
  - http://opendata.aragon.es/recurso/territorio/Municipio/Ilche
  - This information is owned by another department of the Government of Aragón

Page 28: Linked Statistical Data 101

And links to codelists

• Types of owners
  - http://opendata.aragon.es/kos/iaest/clase-de-propietario
    • The community
    • A person
    • A society
    • A public organisation

Page 29: Linked Statistical Data 101

SPARQL endpoint

The female population of Zaragoza aged 0-15 grew until 2013 and then declined

  # The qb: prefix must be declared unless the endpoint predefines it
  PREFIX qb: <http://purl.org/linked-data/cube#>

  SELECT DISTINCT ?year ?personas
  WHERE {
    ?x a qb:Observation .
    ?x qb:dataSet <http://opendata.aragon.es/recurso/iaest/dataset/03-030005TM> .
    ?x <http://purl.org/linked-data/sdmx/2009/dimension#refPeriod> ?year .
    # Fix the area, age group and sex dimensions; leave the year free
    ?x <http://purl.org/linked-data/sdmx/2009/dimension#refArea> <http://opendata.aragon.es/recurso/territorio/Municipio/Zaragoza> .
    ?x <http://opendata.aragon.es/def/iaest/dimension#edad-grandes-grupos> <http://opendata.aragon.es/kos/iaest/edad-grandes-grupos/0-a-15> .
    ?x <http://opendata.aragon.es/def/iaest/dimension#sexo> <http://opendata.aragon.es/kos/iaest/sexo/mujeres> .
    ?x <http://opendata.aragon.es/def/iaest/medida#personas> ?personas .
  }
  ORDER BY ?year

Examples at https://github.com/aragonopendata/local-data-aragopedia/blob/master/consultas.md
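The query on this slide could be sent from a program along these lines. This is a sketch under assumptions: the endpoint address below is not given in the slides and is a guess, the endpoint is assumed to follow the SPARQL 1.1 Protocol (query via GET, JSON results), and the function names are illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the slides do not give the endpoint address.
ENDPOINT = "http://opendata.aragon.es/sparql"

# The query from the slide, with the qb: prefix declared explicitly.
QUERY = """
PREFIX qb: <http://purl.org/linked-data/cube#>
SELECT DISTINCT ?year ?personas
WHERE {
  ?x a qb:Observation .
  ?x qb:dataSet <http://opendata.aragon.es/recurso/iaest/dataset/03-030005TM> .
  ?x <http://purl.org/linked-data/sdmx/2009/dimension#refPeriod> ?year .
  ?x <http://purl.org/linked-data/sdmx/2009/dimension#refArea> <http://opendata.aragon.es/recurso/territorio/Municipio/Zaragoza> .
  ?x <http://opendata.aragon.es/def/iaest/dimension#edad-grandes-grupos> <http://opendata.aragon.es/kos/iaest/edad-grandes-grupos/0-a-15> .
  ?x <http://opendata.aragon.es/def/iaest/dimension#sexo> <http://opendata.aragon.es/kos/iaest/sexo/mujeres> .
  ?x <http://opendata.aragon.es/def/iaest/medida#personas> ?personas .
}
ORDER BY ?year
""".strip()


def build_query_request(endpoint, query):
    """Query via GET: the query text travels URL-encoded in the `query` parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows_from_results(document, variables):
    """Flatten a SPARQL JSON results document into tuples of plain string values."""
    bindings = json.loads(document)["results"]["bindings"]
    return [
        tuple(b[v]["value"] if v in b else None for v in variables)
        for b in bindings
    ]


def run_query(endpoint, query, variables, timeout=30.0):
    """Send the query and return its rows. Needs network access."""
    with urlopen(build_query_request(endpoint, query), timeout=timeout) as response:
        return rows_from_results(response.read().decode("utf-8"), variables)
```

Calling `run_query(ENDPOINT, QUERY, ["year", "personas"])` would return one tuple per year, provided the endpoint assumption holds.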

Page 30: Linked Statistical Data 101

Contents

• Foundations of (Linked) Open Data
  - For public administrations, in general
  - For statistical offices, in particular
• Linked Statistical Data by example
  - A use case from IAEST (Aragón Statistical Office)
• A bit of technical background
  - W3C RDF DataCube
• Preparing the discussion on benefits for different types of stakeholders

Page 31: Linked Statistical Data 101

W3C Data Cube


http://www.w3.org/TR/vocab-data-cube/

Page 32: Linked Statistical Data 101

W3C Data Cube


Page 33: Linked Statistical Data 101

DataSets and Observations


Page 34: Linked Statistical Data 101

Observations in a dataset

[Diagram: an example observation, its dataset and its property values]

iaest-data:01-010003M/22001/030-045
  rdf:type qb:Observation
  qb:dataSet iaest:01-010003M
  sdmx:refArea aod:Abiego
  iaest:superficieUtil iaest-codelist:superficie030-045
  iaest:numeroHogares "1"^^xsd:int

iaest:01-010003M
  rdf:type qb:DataSet
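To make the graph on this slide concrete, the sketch below holds the same statements as plain subject, predicate, object tuples and looks them up by pattern. It is a toy model for illustration: the prefixed names are copied from the slide as strings, the arrangement of labels into triples is my reading of the diagram, and a real application would more likely use an RDF library.

```python
# Prefixed names copied from the slide and kept as plain strings.
OBS = "iaest-data:01-010003M/22001/030-045"
DATASET = "iaest:01-010003M"

# Assumption: this arrangement of the slide's labels into triples.
TRIPLES = {
    (OBS, "rdf:type", "qb:Observation"),
    (OBS, "qb:dataSet", DATASET),
    (OBS, "sdmx:refArea", "aod:Abiego"),
    (OBS, "iaest:superficieUtil", "iaest-codelist:superficie030-045"),
    (OBS, "iaest:numeroHogares", '"1"^^xsd:int'),
    (DATASET, "rdf:type", "qb:DataSet"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return, sorted, the triples that agree with every position that is not None."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


def observations_of(triples, dataset):
    """Subjects typed qb:Observation that point at `dataset` through qb:dataSet."""
    typed = {s for s, _, _ in match(triples, predicate="rdf:type", obj="qb:Observation")}
    linked = {s for s, _, _ in match(triples, predicate="qb:dataSet", obj=dataset)}
    return sorted(typed & linked)


if __name__ == "__main__":
    for observation in observations_of(TRIPLES, DATASET):
        for _, predicate, value in match(TRIPLES, subject=observation):
            print(predicate, value)
```

The pattern lookup mirrors what a SPARQL triple pattern does: fixed positions filter, free positions return values.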

Page 35: Linked Statistical Data 101

DataCube Structure Definition


Page 36: Linked Statistical Data 101

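Describing the dataset

[Diagram: how a dataset is linked to its data structure definition and component properties]

Schema level:
  qb:DataSet -- qb:structure --> qb:DataStructureDefinition
  qb:DataStructureDefinition -- qb:component --> qb:ComponentSpecification
  qb:ComponentSpecification -- qb:componentProperty --> qb:ComponentProperty

Example:
  iaest:01-010003M (rdf:type qb:DataSet) -- qb:structure --> iaest--dsd:01-010003M (rdf:type qb:DataStructureDefinition)
  iaest--dsd:01-010003M -- qb:component --> component specifications with:
    qb:dimension sdmx:refArea
    qb:dimension iaest:superficieUtil
    qb:measure iaest:numeroHogares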
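One practical use of a structure definition is checking that an observation carries every declared component. The sketch below does this for the components named on this slide. It is a simplified illustration in the spirit of the Data Cube integrity checks, not an implementation of them, and the dictionary layout and function names are my own.

```python
# Components as labelled on the slide for the structure of iaest:01-010003M.
DSD = {
    "qb:dimension": ("sdmx:refArea", "iaest:superficieUtil"),
    "qb:measure": ("iaest:numeroHogares",),
}

# The example observation from the earlier slide, reduced to property/value pairs.
OBSERVATION = {
    "sdmx:refArea": "aod:Abiego",
    "iaest:superficieUtil": "iaest-codelist:superficie030-045",
    "iaest:numeroHogares": 1,
}


def declared_components(dsd):
    """All component properties the structure declares: dimensions first, then measures."""
    return [prop for kind in ("qb:dimension", "qb:measure") for prop in dsd.get(kind, ())]


def missing_components(observation, dsd):
    """Declared components for which the observation carries no value."""
    return [prop for prop in declared_components(dsd) if prop not in observation]


def undeclared_properties(observation, dsd):
    """Properties used by the observation that the structure does not declare."""
    declared = set(declared_components(dsd))
    return sorted(prop for prop in observation if prop not in declared)


if __name__ == "__main__":
    print("missing:", missing_components(OBSERVATION, DSD))
    print("undeclared:", undeclared_properties(OBSERVATION, DSD))
```

An observation that passes both checks uses exactly the dimensions and measures its dataset promises.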

Page 37: Linked Statistical Data 101

Dimensions

[Diagram: the types and ranges of the dimension and measure properties, drawn over the dataset structure from the previous slide]

Classes: qb:DataSet, qb:DataStructureDefinition, qb:ComponentSpecification, qb:ComponentProperty, qb:DimensionProperty, qb:MeasureProperty, qb:Observation, esadm:Municipio, iaest:SuperficieUtil
Properties shown as nodes: sdmx:refArea, iaest:superficieUtil, iaest:numeroHogares
Datatype: xsd:int
Edge labels: qb:structure, qb:component, qb:componentProperty, qb:dataSet, qb:concept, rdf:type, rdfs:subClassOf, rdfs:range

Page 38: Linked Statistical Data 101

SKOS Codelists

[Diagram: the codelist for iaest:SuperficieUtil expressed with SKOS]

Classes: sdmx:CodeList, skos:ConceptScheme, skos:Concept, iaest:SuperficieUtil
Codelist: iaest-codelist:SuperficieUtil
Concepts: iaest-codelist:superficie030-045, iaest-codelist:superficie046-060, iaest-codelist:superficie180-mas
Edge labels: rdfs:subClassOf, qb:codeList, rdf:type, skos:hasTopConcept
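A codelist also allows coded values to be validated. The sketch below checks an observation's value against the concepts named on this slide. Several things are assumptions: that the dimension property iaest:superficieUtil is the one tied to the codelist through qb:codeList, that every code is a top concept, and the function name. Only the three concepts visible on the slide are listed, so the set is knowingly incomplete.

```python
# Codelist and concepts as labelled on the slide. Only three concepts are
# visible there, so this set is knowingly incomplete.
CONCEPT_SCHEME = "iaest-codelist:SuperficieUtil"
TOP_CONCEPTS = {
    CONCEPT_SCHEME: {
        "iaest-codelist:superficie030-045",
        "iaest-codelist:superficie046-060",
        "iaest-codelist:superficie180-mas",
    }
}

# Assumption: this dimension property points at the codelist via qb:codeList.
CODE_LIST_OF = {"iaest:superficieUtil": CONCEPT_SCHEME}


def invalid_codes(observation, code_list_of, top_concepts):
    """Return (property, value) pairs whose value is not in the property's codelist.

    Properties without a codelist (for example measures) are not checked.
    """
    problems = []
    for prop, value in sorted(observation.items()):
        scheme = code_list_of.get(prop)
        if scheme is not None and value not in top_concepts.get(scheme, set()):
            problems.append((prop, value))
    return problems


if __name__ == "__main__":
    observation = {
        "sdmx:refArea": "aod:Abiego",
        "iaest:superficieUtil": "iaest-codelist:superficie030-045",
        "iaest:numeroHogares": 1,
    }
    print(invalid_codes(observation, CODE_LIST_OF, TOP_CONCEPTS))
```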

Page 39: Linked Statistical Data 101

Contents

• Foundations of (Linked) Open Data
  - For public administrations, in general
  - For statistical offices, in particular
• Linked Statistical Data by example
  - A use case from IAEST (Aragón Statistical Office)
• A bit of technical background
  - W3C RDF DataCube
• Preparing the discussion on benefits for different types of stakeholders

Page 40: Linked Statistical Data 101

Why Linked Statistical Data? (I)

• Facilitate data (re)use by developers outside our organisation
• Data access APIs (according to standards)
• Do they prefer CSVs, PCAxis, SDMX, RDF?
• Fine-grained data granularity (refer to specific facts)
• Integration with other data sources from other public or private organisations
  - E.g., Government of Aragón for municipalities
• Allow for queries across datasets
  - E.g., tell me how many municipalities may benefit from this funding that I am making available with these restrictions: number of registered companies lower than 5 and unemployed population higher than 15%
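The funding example works because both datasets identify municipalities with the same URIs, so the URI serves as the join key. The sketch below shows that join on toy data. Every URI and figure in it is hypothetical and invented for illustration; none of it is IAEST data, and the function name is my own.

```python
# Hypothetical figures for illustration only; they are not IAEST data.
# Keys stand in for shared municipality URIs used by both datasets.
REGISTERED_COMPANIES = {
    "http://example.org/municipio/A": 3,
    "http://example.org/municipio/B": 12,
    "http://example.org/municipio/C": 4,
}
UNEMPLOYMENT_RATE = {
    "http://example.org/municipio/A": 0.18,
    "http://example.org/municipio/B": 0.21,
    "http://example.org/municipio/C": 0.09,
}


def eligible_municipalities(companies, unemployment, max_companies=5, min_rate=0.15):
    """Municipalities in both datasets with fewer than `max_companies`
    registered companies and an unemployment rate above `min_rate`."""
    shared = companies.keys() & unemployment.keys()  # join on the shared URI
    return sorted(
        uri for uri in shared
        if companies[uri] < max_companies and unemployment[uri] > min_rate
    )


if __name__ == "__main__":
    matches = eligible_municipalities(REGISTERED_COMPANIES, UNEMPLOYMENT_RATE)
    print(len(matches), matches)
```

With linked data the same join could be expressed as a single SPARQL query over both datasets, since no identifier mapping is needed first.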

Page 41: Linked Statistical Data 101

Why Linked Statistical Data? (II)

• Internal benefits as well
  - Codelists are made available and more visible internally
  - Methodology and metadata explicitly described as part of the RDF DataCube data (e.g., reference years in datasets)

Page 42: Linked Statistical Data 101

Linked Statistical Data 101

ESS Workshop on dissemination of official statistics as open data
18-19 January 2017, Malta

Oscar Corcho
Escuela Técnica Superior de Ingenieros Informáticos
Universidad Politécnica de Madrid
Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid

http://www.oeg-upm.net/ ocorcho@fi.upm.es