
Data Access Linking and Integration (DALI) - An enabling technology for Data Analytics over large scale, heterogeneous data sets, Martin Stephenson of IBM Smart Cities - Dublinked

Jul 14, 2015


Data Access Linking and Integration (DALI)

An enabling technology for Data Analytics over large-scale, heterogeneous data sets

Martin Stephenson

Lead Architect – Semantic and Reasoning Systems

IBM Research Ireland

©2015 IBM Corporation

Background

• 20 years working in software development and design in various companies:

Banking

Telecoms

Biometric Security

Innovation and development

• Currently work in IBM Research Ireland

Semantics and Reasoning group

Mix of PhD researchers and software engineers

Working with heterogeneous, noisy and semi-structured data

Focus on semantic data integration

• Responsible for the Dublinked backend

Design of metadata

Design, development and maintenance of the backend publishing system

Open data – what and why?

What?

• Cities, governments and their service providers are starting to make data available on the internet

• Generally, this data consists of business or operational data.

• Typically, it has the following characteristics:

Heterogeneous

Semi-Structured

Many different formats (e.g. XLS, CSV, PDF, XML, KML)

• This is in addition to the vast amount of data already on the internet.

All of the above is “Open data”

Why?

• Transparency

• Improved or new private products and services

• Improved effectiveness of services

• Improved efficiency of services

• Innovation

• New knowledge from existing data sources, combined data sources and patterns in large data volumes

How do I expose this data for use?

To leverage and utilise this open data, we need to be able to do two main things: (1) understand the data, and (2) integrate it with our existing corpus of data and systems.

(1) Understanding the data

Problem: Because the data is heterogeneous and noisy, it is difficult to understand.

Solution: “Semantically uplift” the data. This gives the data meaning within a specific context.
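Understanding noisy tabular data usually starts with working out what each column actually holds. The Python sketch below is a hypothetical illustration of that first step and is not DALI's method. It guesses a type for each CSV column and uses a `threshold` parameter so that a few bad cells do not break the guess. The type labels, the checks and the 0.9 default are all assumptions, and only ISO 8601 dates are recognised.

```python
import csv
import io
from datetime import date


def _parses(parser, value: str) -> bool:
    """True if `parser(value)` succeeds without raising ValueError."""
    try:
        parser(value)
        return True
    except ValueError:
        return False


# Hypothetical checks, tried in order; the first type that enough
# non-empty cells satisfy wins. date.fromisoformat accepts ISO 8601 dates.
CHECKS = [
    ("integer", int),
    ("decimal", float),
    ("date", date.fromisoformat),
]


def profile_columns(text: str, threshold: float = 0.9) -> dict:
    """Guess a type for each CSV column, tolerating a few noisy cells.

    A column gets a type if at least `threshold` of its non-empty cells
    parse as that type; otherwise it falls back to "string".
    """
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in reader.fieldnames or []}
    for row in reader:
        for name in columns:
            value = (row.get(name) or "").strip()
            if value:
                columns[name].append(value)

    result = {}
    for name, values in columns.items():
        if not values:
            result[name] = "empty"
            continue
        result[name] = "string"
        for label, parser in CHECKS:
            if sum(_parses(parser, v) for v in values) / len(values) >= threshold:
                result[name] = label
                break
    return result


if __name__ == "__main__":
    sample = "id,score,opened,name\n1,3.5,2015-07-14,Clinic A\n2,4,2014-01-01,Clinic B\n"
    print(profile_columns(sample))
```

The sketch does not require every cell to parse. It accepts the first check that most cells pass, so a numeric column containing the odd "n/a" is still recognised as numeric.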

(2) Integration of the data

Problem: The data is multi-format and semi-structured. A traditional RDBMS approach is complex and very difficult to maintain (essentially, we would need the “data model of everything”), and such a model would also be extremely difficult to populate and update.

Solution: Use linked data to create a dynamic model that can be built incrementally, and expose this linked data in an easy-to-consume way (e.g. RESTful APIs).
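The following is a minimal sketch of the linked-data approach, assuming rows arrive as CSV. The Python code does three things:

- uplifts mapped columns into RDF triples;
- serialises the triples as N-Triples;
- builds the kind of JSON view a RESTful API might return for one resource.

The base URI `http://example.org/dublinked/`, the header-to-property mapping and the sample row are hypothetical, since the slides do not describe DALI's internal model. The FOAF and W3C WGS84 property URIs are real vocabulary terms, used here only as examples.

```python
import csv
import io
import json

# Hypothetical base URI for generated resources (not DALI's actual namespace).
BASE = "http://example.org/dublinked/"

# Hypothetical mapping from raw CSV headers to vocabulary properties (FOAF name
# and W3C WGS84 geo lat/long). Choosing these is the "semantic uplift" step.
PROPERTY_MAP = {
    "Name": "http://xmlns.com/foaf/0.1/name",
    "Lat": "http://www.w3.org/2003/01/geo/wgs84_pos#lat",
    "Long": "http://www.w3.org/2003/01/geo/wgs84_pos#long",
}


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def csv_to_triples(text: str, dataset: str) -> list:
    """Turn each mapped, non-empty cell into a (subject, predicate, value) triple."""
    triples = []
    for row_number, row in enumerate(csv.DictReader(io.StringIO(text)), start=1):
        subject = f"{BASE}{dataset}/row/{row_number}"
        for header, value in row.items():
            predicate = PROPERTY_MAP.get(header)
            if predicate and value:  # skip unmapped columns and empty cells
                triples.append((subject, predicate, value))
    return triples


def to_ntriples(triples: list) -> str:
    """Serialise triples as N-Triples; every object is written as a plain literal."""
    return "\n".join(f'<{s}> <{p}> "{escape_literal(o)}" .' for s, p, o in triples)


def resource_view(triples: list, subject: str) -> str:
    """JSON view of one resource, the kind of payload a RESTful GET might return.

    If a property has several values, only the last one is kept in this sketch.
    """
    properties = {p: o for s, p, o in triples if s == subject}
    return json.dumps({"@id": subject, **properties}, indent=2)


if __name__ == "__main__":
    sample = 'Name,Lat,Long,Notes\n"Station, North",53.35,-6.26,ignored\n'
    triples = csv_to_triples(sample, "fire-stations")
    print(to_ntriples(triples))
    print(resource_view(triples, BASE + "fire-stations/row/1"))
```

Adding a dataset only adds triples, so the model can grow one dataset at a time. That incremental growth is the property the slide contrasts with a single relational “data model of everything”.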

How can DALI help?

Based on the expertise we have gained through working on the research aspects of the Dublinked open data publishing platform, we have developed a prototype system that can:

Analyse tabular-like data and databases

Expose the data as linked data

Semantically annotate / enrich the data

Link the data to well-known vocabularies (e.g. IPSV and DBpedia)

Link the data to other (analysed) data sets and databases

Expose it via RESTful APIs

[Diagram: Data, Context, Links, Views, Insight, Format]

These steps are semi-automatic.
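One plausible design for semi-automatic linking has two parts: the system proposes ranked candidate vocabulary terms for each column, and a person confirms or rejects them. The sketch below scores candidates with `difflib.SequenceMatcher` from the Python standard library. String similarity is a swapped-in technique chosen for illustration, since the slides do not say how DALI finds links. The vocabulary labels and URIs are hypothetical placeholders, not real IPSV or DBpedia identifiers.

```python
from difflib import SequenceMatcher

# Hypothetical vocabulary (label -> URI). Placeholders, not real IPSV or DBpedia IDs.
VOCABULARY = {
    "Fire station": "http://example.org/vocab/FireStation",
    "Hospital": "http://example.org/vocab/Hospital",
    "Public library": "http://example.org/vocab/PublicLibrary",
}


def normalise(label: str) -> str:
    """Lower-case, treat underscores and hyphens as spaces, collapse whitespace."""
    return " ".join(label.lower().replace("_", " ").replace("-", " ").split())


def suggest_links(header: str, vocabulary: dict = VOCABULARY,
                  min_score: float = 0.6, limit: int = 3) -> list:
    """Return up to `limit` (score, label, uri) candidates for a column header.

    The list is meant to be reviewed by a person, who accepts or rejects each
    suggestion; nothing is linked automatically.
    """
    target = normalise(header)
    candidates = []
    for label, uri in vocabulary.items():
        score = SequenceMatcher(None, target, normalise(label)).ratio()
        if score >= min_score:
            candidates.append((round(score, 2), label, uri))
    candidates.sort(key=lambda c: c[0], reverse=True)
    return candidates[:limit]


if __name__ == "__main__":
    for header in ["FIRE_STATIONS", "hospitals", "Bin collection day"]:
        print(header, "->", suggest_links(header))
```

The `min_score` cut-off trades recall for reviewer effort. A lower value surfaces more candidates for the person confirming links to reject.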

How does it work?

DEMO

Use Case – Smarter Care

• Use open data merged with patient data to help provide better outcomes for patients

• Build a “Safety Net” of services that can be used to plan care for an elderly person

• Use open data from New York, including:

Hospital performance scores

Locations of services

Average hospital costs for treatments

• Use IBM Cúram as the social program system, and integrate the data using DALI on the back end. The output from this is fed into planning and ranking analytics.
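The slides only state that DALI's output fed into planning and ranking analytics. The sketch below is a purely hypothetical illustration of that hand-off. It joins two datasets on a shared hospital identifier, then ranks the matches by performance score per 1,000 units of average cost. The field names, the metric and the toy values are invented for illustration. They are not taken from the New York open data or from IBM Cúram.

```python
from dataclasses import dataclass


@dataclass
class Hospital:
    hospital_id: str
    name: str
    performance_score: float  # hypothetical field: higher is better
    average_cost: float       # hypothetical field: average cost of a treatment


def join_linked_records(scores: dict, costs: dict, names: dict) -> list:
    """Join two datasets on a shared hospital identifier.

    Hospitals missing from either dataset, or with a non-positive cost, are
    dropped rather than guessed.
    """
    shared = sorted(scores.keys() & costs.keys())
    return [
        Hospital(hid, names.get(hid, hid), scores[hid], costs[hid])
        for hid in shared
        if costs[hid] > 0
    ]


def rank(hospitals: list) -> list:
    """Best first by score per 1,000 units of cost (a hypothetical metric); ties by id."""
    def value(h: Hospital) -> float:
        return h.performance_score / (h.average_cost / 1000)
    return sorted(hospitals, key=lambda h: (-value(h), h.hospital_id))


if __name__ == "__main__":
    # Toy values for illustration only; not taken from the New York open data.
    scores = {"h1": 80.0, "h2": 90.0, "h3": 70.0}
    costs = {"h1": 2000.0, "h2": 3000.0, "h4": 1000.0}
    names = {"h1": "Hospital A", "h2": "Hospital B"}
    for h in rank(join_linked_records(scores, costs, names)):
        print(h.hospital_id, h.name)
```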

Use Case – Smarter Care

Use Case – Smarter Care

Use Case – Smarter Care

Questions?