© Information Systems Lab - 2013 http://islab.uom.gr Linked Open Government Data Analytics Evangelos Kalampokis, Efthimios Tambouris, Konstantinos Tarabanis
Linked Open Government Data Analytics

Jan 14, 2017

Page 1: Linked Open Government Data Analytics

© Information Systems Lab - 2013

http://islab.uom.gr

Linked Open Government Data Analytics

Evangelos Kalampokis, Efthimios Tambouris,

Konstantinos Tarabanis

Page 2: Linked Open Government Data Analytics

© Information Systems Lab, University of Macedonia

Aim of the paper

Introduce the concept of Data Analytics on top of distributed statistical linked OGD

Describe the technical prerequisites

Demonstrate the end-user value

Page 3: Linked Open Government Data Analytics

Open Government Data

More than 180 Open Government Data portals around the globe provide data that “can be freely used, reused and redistributed by anyone”

Page 4: Linked Open Government Data Analytics

OGD impact

The majority of existing applications exploit a single dataset and visualize data on a map.

The expected potential of OGD has not yet been realized

Page 5: Linked Open Government Data Analytics

Importance of Data in modern societies

Business Intelligence

Evidence based policy-making

Academia

Page 6: Linked Open Government Data Analytics

Open Statistical Data

A large portion of Open Government Data concerns statistics such as population figures and economic and social indicators

For example, the majority (5867 out of 6098 datasets) of the data published on the EU Open Data Portal are of a statistical nature

Page 7: Linked Open Government Data Analytics

But although OGD gives everyone free access, data is often isolated (e.g. due to the available formats)

Data Silos

http://www.flickr.com/photos/rachelrusinski/526260022

Page 8: Linked Open Government Data Analytics

Vision: Linked Open Government Data Analytics

Combining statistical OGD that were previously locked in disparate sources

Performing data analytics on top of combined data

Gaining unexpected and unexplored insights into different domains and problem areas.

Page 9: Linked Open Government Data Analytics

Combining Statistical Data

Requires effort to:

– Discover data (e.g. datasets that share common joint points and thus allow further analysis)

– Collect data

– Clean data (timely, accurate, relevant data)

– Transform data (common formats)

– Integrate data (interoperability, levels of granularity etc.)

– Visualize and statistically analyze (semi-automatically, according to the type of variables and measures)

We need to shift this effort from end-users to data-providers

http://www.flickr.com/photos/tetsumo/3586864217

Page 10: Linked Open Government Data Analytics

Connecting Data Silos

We need an infrastructure that connects data silos over the Web and thus reduces the effort required for statistical data reuse

This is where Linked Data comes in…

http://www.flickr.com/photos/sethwoodworth/2303531107

Page 11: Linked Open Government Data Analytics

Linked Data

Items in a dataset are identified using URIs

URIs are dereferenceable using HTTP

RDF links to other URIs in other datasets are included
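The second principle, dereferencing a URI over HTTP, can be illustrated with a minimal sketch using only the Python standard library. The constituency URI is hypothetical, and asking for Turtle through the `Accept` header is an assumption about what a Linked Data server would offer, not something the slides specify.

```python
import urllib.request


def build_rdf_request(uri, media_type="text/turtle"):
    """Build an HTTP GET request that asks for an RDF representation of a URI."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


def dereference(uri, media_type="text/turtle", timeout=10):
    """Fetch the RDF description of a resource (this performs a network call)."""
    request = build_rdf_request(uri, media_type)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Hypothetical identifier for a constituency; it is only built here, not fetched.
request = build_rdf_request("http://example.org/id/constituency/cambridge")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Only the request is constructed in the example; calling `dereference` would require a server that actually publishes RDF at the given URI.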

Page 12: Linked Open Government Data Analytics

Technical Prerequisites

Metadata for data discovery

Vocabularies

Code lists, concept schemes and classifications

Typed links (e.g. owl:sameAs) between

– Dimensions definitions

– Values of dimensions

– Categories of measures
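To show why typed links matter for combining datasets, the sketch below groups URIs connected by owl:sameAs links into equivalence classes, so that values of a dimension published under different identifiers can be treated as the same thing. The URIs are hypothetical and the union-find approach is one possible technique, not the method used by the authors.

```python
def canonical_ids(same_as_links):
    """Map every URI to one representative of its owl:sameAs equivalence class."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving keeps lookups short
            uri = parent[uri]
        return uri

    for left, right in same_as_links:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            # The lexicographically smaller URI becomes the representative.
            parent[max(root_left, root_right)] = min(root_left, root_right)
    return {uri: find(uri) for uri in parent}


# Hypothetical links between three datasets describing the same constituency.
links = [
    ("http://a.example.org/constituency/cambridge", "http://b.example.org/area/cambridge"),
    ("http://b.example.org/area/cambridge", "http://c.example.org/place/cambridge"),
]
ids = canonical_ids(links)
print(ids["http://c.example.org/place/cambridge"])
```

Once every URI is mapped to a representative, observations from different sources can be joined on that representative.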

Page 13: Linked Open Government Data Analytics

RDF data cube vocabulary
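As a rough illustration of what a single statistical observation looks like in the RDF Data Cube vocabulary, the sketch below emits N-Triples lines in plain Python. The `qb:` and `sdmx-dimension:` terms come from the published vocabularies; the `example.org` URIs, the measure property and the value 7.5 are assumptions made up for the example.

```python
QB = "http://purl.org/linked-data/cube#"
SDMX_DIM = "http://purl.org/linked-data/sdmx/2009/dimension#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/"  # hypothetical namespace, not from the slides


def iri(uri):
    return f"<{uri}>"


def literal(value, datatype):
    return f'"{value}"^^<{XSD}{datatype}>'


def observation_triples(obs_uri, dataset_uri, area_uri, year, rate):
    """Return N-Triples lines describing one qb:Observation."""
    subject = iri(obs_uri)
    triples = [
        (subject, iri(RDF_TYPE), iri(QB + "Observation")),
        (subject, iri(QB + "dataSet"), iri(dataset_uri)),
        (subject, iri(SDMX_DIM + "refArea"), iri(area_uri)),
        (subject, iri(SDMX_DIM + "refPeriod"), literal(year, "gYear")),
        # Hypothetical measure property for the unemployment rate.
        (subject, iri(EX + "measure/unemploymentRate"), literal(rate, "decimal")),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


lines = observation_triples(
    EX + "obs/cambridge-2010",
    EX + "dataset/unemployment",
    EX + "id/constituency/cambridge",
    2010,
    7.5,  # illustrative value only
)
print("\n".join(lines))
```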

Page 14: Linked Open Government Data Analytics

The UK Elections Case

Objective:

– To gain insights regarding UK elections through OGD

Starting point:

– Data regarding the results of two UK general elections, from 2005 and 2010, at both national and constituency level (Open Data in Guardian)

OGD:

– We need to discover data that could be analyzed together with the election results data (i.e. that share common joint points)

Page 15: Linked Open Government Data Analytics

OGD

Source:

– Data from data.gov.uk

Datasets:

– Unemployment and poverty between 2005 and 2010 in the UK parliament constituencies

– In this paper we concentrate on unemployment due to space limitations

Page 16: Linked Open Government Data Analytics

Data Conditioning: Linked Data Creation

Page 17: Linked Open Government Data Analytics

Linked Data Analytics

Enables semi-automatic visualization and statistical analysis based on:

– Joint points (i.e. variables that are described at a parliament constituency level)

– Type of variables (e.g. regression in the case of continuous variables and classification in the case of categorical ones)
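The two criteria above can be sketched in a few lines of Python: datasets are combined on a shared joint point (here the constituency), and the analysis family follows from the kind of variable. The records are made up, and keying the choice on the dependent variable is an assumption; the slides do not say how the selection is implemented.

```python
def join_on(key, left, right):
    """Inner-join two lists of records (dicts) on a shared key, the joint point."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def choose_analysis(dependent_kind):
    """Pick an analysis family from the kind of the dependent variable."""
    if dependent_kind == "continuous":
        return "regression"
    if dependent_kind == "categorical":
        return "classification"
    raise ValueError(f"unknown variable kind: {dependent_kind!r}")


# Made-up records, only to show the shapes involved.
results = [
    {"constituency": "A", "winner": "Labour"},
    {"constituency": "B", "winner": "Conservative"},
]
unemployment = [
    {"constituency": "A", "rate": 6.1},
    {"constituency": "C", "rate": 2.9},
]

combined = join_on("constituency", results, unemployment)
print(combined, choose_analysis("categorical"))
```

Only constituency "A" appears in both inputs, so it is the only record in the combined result; the winning party is categorical, so the sketch selects classification.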

Page 18: Linked Open Government Data Analytics

Logistic regression Classification Analysis

Measures the relationship between a categorical dependent variable and one or more continuous independent variables by converting the dependent variable to probability scores through the logistic function

Identify the relationship between the unemployment rate of a parliament constituency and the probability P(A) that a particular political party wins the election in that constituency

$$P(A) = \frac{1}{1 + e^{-y}}$$

$$y = c_0 + c_1 x_1 + \dots + c_n x_n$$
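To make the model concrete, here is a minimal sketch that fits P(A) = 1 / (1 + e^(-y)) with y = c0 + c1·x to a handful of points by gradient ascent on the log-likelihood. The data are synthetic and were invented for this illustration (they are not the election data), and the learning rate and number of steps are arbitrary choices; the slides do not say which fitting procedure was used.

```python
import math


def logistic(y):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-y))


def fit_logistic(xs, labels, lr=0.05, steps=20000):
    """Fit P(A) = logistic(c0 + c1 * x) by gradient ascent on the mean log-likelihood."""
    c0 = c1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, label in zip(xs, labels):
            error = label - logistic(c0 + c1 * x)
            g0 += error
            g1 += error * x
        c0 += lr * g0 / n
        c1 += lr * g1 / n
    return c0, c1


# Synthetic toy data: unemployment rate (%) and whether party A won (1) or not (0).
rates = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
won = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]

c0, c1 = fit_logistic(rates, won)
print(c0, c1, logistic(c0 + c1 * 11))
```

With data like these the fitted slope c1 is positive, which is the kind of pattern the slides go on to report for the Labour Party.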

Page 19: Linked Open Government Data Analytics

Visualization Unemployment & Labour Results (2005)

The probability for the Labour Party to win in a constituency increases as the unemployment rate of the constituency increases

In constituencies with unemployment rate > 5% the Labour Party has a high probability of winning

In 2005 the average unemployment rate was 3.35%

Page 20: Linked Open Government Data Analytics

Visualization Unemployment & Labour Results (2010)

The pattern is the same but has shifted to the right.

The average unemployment rate was 3.35% in 2005 and 7.5% in 2010

Page 21: Linked Open Government Data Analytics

Visualization Unemployment & Cons Results (2005)

In 2005 the average unemployment rate was 3.35%

If unemployment rate > 5% then the Conservatives have a very small probability of winning

Page 22: Linked Open Government Data Analytics

Visualization Unemployment & Cons Results (2010)

In 2010 the Conservatives do not win in constituencies with unemployment rate > 13%

However, the average unemployment rate increased from 3.35% to 7.5%

The logistic regression pattern is the same

Page 23: Linked Open Government Data Analytics

Statistical model creation

Logistic function that measures the probability P(A) for a party to win in a specific parliament constituency

For example, consider the Labour Party in the 2010 elections

x is the unemployment rate of the constituency.

In a constituency with 12% unemployment rate the probability for the Labour Party to win is P(A)=0.8

$$P(A) = \frac{1}{1 + e^{-y}}$$

$$y = -3.823 + 0.437\,x$$
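The worked example can be checked numerically. The sketch below reads the model as P(A) = 1 / (1 + e^(-y)) with y = -3.823 + 0.437·x; the signs are an assumption, since they are lost in the transcript, and were chosen because they reproduce the stated P(A) ≈ 0.8 at 12% unemployment. The 50% crossover is derived from the same coefficients and is not a figure given in the slides.

```python
import math

# Coefficients from the slide; the signs are assumed (see the note above).
C0 = -3.823
C1 = 0.437


def p_win(unemployment_rate):
    """Probability that the Labour Party wins, given the unemployment rate in percent."""
    y = C0 + C1 * unemployment_rate
    return 1.0 / (1.0 + math.exp(-y))


# Rate at which y = 0, i.e. where the model gives a 50% chance of winning.
threshold = -C0 / C1

print(round(p_win(12.0), 2))  # 0.81
print(round(threshold, 2))    # 8.75
```

Under this reading, the model gives the Labour Party better than even odds once unemployment in a constituency exceeds roughly 8.75%.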

Page 24: Linked Open Government Data Analytics

Conclusion and Future Work

Significant efforts have gone into developing tools and applications that facilitate Open Government Data (OGD) publishing and reuse

OGD has not yet realized its full potential.

Today, data analytics employs data locked in isolated systems

We claim that the real value of OGD will emerge from performing Data Analytics on top of combined statistical datasets

Linked Open Government Data Analytics shows the road ahead

Future work includes the development of a platform enabling semi-automatic identification of important relations between variables described in distributed datasets

Page 25: Linked Open Government Data Analytics

Acknowledgments

The work presented in the paper is partly funded by