Jul 14, 2015
Data Access Linking and Integration (DALI)
An enabling technology for Data Analytics over
large scale, heterogeneous data sets
Martin Stephenson
Lead Architect – Semantic and Reasoning Systems
IBM Research Ireland
©2015 IBM Corporation
Background
• 20 years of software development and design across various companies, in:
Banking
Telecoms
Biometric Security
Innovation and development
• Currently working at IBM Research Ireland
Semantics and Reasoning group
Mix of PhD researchers and software engineers
Working with heterogeneous, noisy and semi-structured data
Focus on semantic data integration
• Responsible for the Dublinked backend
Design of metadata
Design, development and maintenance of backend
publishing system
Open data – what and why?
What?
• Cities, governments and their service providers are starting to make data available on
the internet
• Generally, this data consists of business or operational data.
• Typically, it has the following characteristics:
Heterogeneous
Semi-Structured
Many different formats (e.g. XLS, CSV, PDF, XML, KML, etc.)
• This is in addition to the vast amount of data already on the internet.
All of the above is “Open data”
Why?
• Transparency
• Improved or new private products and services
• Improved effectiveness of services
• Improved efficiency of services
• Innovation
• New knowledge from existing data sources, combined data sources and patterns in
large data volumes
How do I expose this data for use?
To leverage this open data, there are two main things we need to be able to do:
1) Understand the data
2) Integrate it with our existing corpus of data and systems
(1) Understanding the data
Problem: Because the data is heterogeneous and noisy, it is difficult to understand.
Solution: “Semantically uplift” the data. This gives the data meaning within a specific context.
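As a rough illustration of semantic uplift, the sketch below turns each row of a CSV file into (subject, predicate, object) triples by mapping column headers to vocabulary properties. The `uplift` function, the column names, the `example.org` base URI and the column-to-property mapping are hypothetical; the property URIs come from schema.org and the W3C WGS84 geo vocabulary, but a real system would choose them from the data set's context rather than hard-code them.

```python
import csv
import io

# Hypothetical mapping from raw CSV column headers to vocabulary properties.
# The chosen URIs are illustrative; the right ones depend on the data's context.
COLUMN_TO_PROPERTY = {
    "Name": "http://schema.org/name",
    "Lat": "http://www.w3.org/2003/01/geo/wgs84_pos#lat",
    "Long": "http://www.w3.org/2003/01/geo/wgs84_pos#long",
}

BASE_URI = "http://example.org/resource/"  # hypothetical base for minted URIs


def uplift(csv_text, id_column):
    """Turn each CSV row into (subject, predicate, object) triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Mint a subject URI from the row's identifier column.
        subject = BASE_URI + row[id_column].strip().replace(" ", "_")
        for column, value in row.items():
            predicate = COLUMN_TO_PROPERTY.get(column)
            # Columns with no known meaning, and empty cells, are skipped
            # rather than guessed at.
            if predicate is None or not value:
                continue
            triples.append((subject, predicate, value.strip()))
    return triples


sample = "ID,Name,Lat,Long,Notes\nP1,Park A,53.35,-6.26,\nP2,Park B,53.34,-6.27,closed\n"
for triple in uplift(sample, "ID"):
    print(triple)
```

The unmapped `Notes` column is dropped; in a semi-automatic workflow it would instead be surfaced for someone to annotate.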
(2) Integrating the data
Problem: The data is multi-format and semi-structured. A traditional RDBMS approach is complex and very difficult to maintain (essentially, we would need a “data model of everything”), and such a model would also be extremely difficult to populate and keep up to date.
Solution: Use linked data to create a dynamic model that can be built incrementally, and expose this linked data in an easy-to-consume way (e.g. RESTful APIs).
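The contrast with a fixed relational schema can be sketched with a minimal in-memory triple store, assuming nothing about how DALI actually stores its data: each new data set simply adds triples without any schema migration, and a resource can be rendered as JSON, roughly what a RESTful GET on that resource's URI might return. The `TripleStore` class, its `describe` method, the JSON layout (loosely modelled on JSON-LD's `@id` key) and all URIs are hypothetical.

```python
import json
from collections import defaultdict


class TripleStore:
    """Minimal in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        # subject -> predicate -> set of objects
        self._index = defaultdict(lambda: defaultdict(set))

    def add(self, subject, predicate, obj):
        # No schema to migrate: an unseen predicate is simply a new key.
        self._index[subject][predicate].add(obj)

    def describe(self, subject):
        """Render one resource as JSON, as a hypothetical REST GET handler might."""
        body = {"@id": subject}
        for predicate, objects in sorted(self._index.get(subject, {}).items()):
            body[predicate] = sorted(objects)
        return json.dumps(body, indent=2)


store = TripleStore()
hospital = "http://example.org/resource/hospital/1"  # hypothetical URI
# Triples from a first data set (e.g. locations of services).
store.add(hospital, "http://schema.org/name", "General Hospital")
# A later data set adds a property the model has never seen before.
store.add(hospital, "http://example.org/prop/performanceScore", "87")
print(store.describe(hospital))
```

Nothing had to be redesigned when the second data set arrived, which is the incremental property the slide contrasts with the “data model of everything”.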
How can DALI help?
Building on the expertise we have gained from the research aspects of the Dublinked open data publishing platform, we have developed a prototype system that can:
Analyse tabular-like data and databases
Expose the data as linked data
Semantically annotate / enrich the data
Link the data to well-known vocabularies (e.g. IPSV and DBpedia)
Link the data to other (analysed) data sets and databases
Expose it via RESTful APIs
[Diagram: Data, Context, Links, Views, Insight, Format]
These steps are semi-automatic.
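One way to picture the semi-automatic part is a step that proposes, rather than asserts, links: it infers a column's datatype and fuzzily matches cell values against vocabulary labels, flagging weak matches for a person to confirm. This is a minimal sketch assuming plain string similarity (Python's `difflib.SequenceMatcher`) as the matching technique; the vocabulary, its URIs, the 0.8 threshold and the function names are all hypothetical and not taken from DALI, IPSV or DBpedia.

```python
from difflib import SequenceMatcher

# Hypothetical vocabulary: concept URI -> preferred label. A real system would
# load labels from vocabularies such as IPSV or DBpedia instead.
VOCABULARY = {
    "http://example.org/vocab/Hospital": "Hospital",
    "http://example.org/vocab/Library": "Library",
    "http://example.org/vocab/FireStation": "Fire station",
}


def _all_parse(values, cast):
    """True if every value can be converted with `cast`."""
    try:
        for v in values:
            cast(v)
        return True
    except ValueError:
        return False


def infer_type(values):
    """Guess a column's datatype from its non-empty values."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "string"
    if _all_parse(non_empty, int):
        return "integer"
    if _all_parse(non_empty, float):
        return "decimal"
    return "string"


def suggest_link(value, threshold=0.8):
    """Return (best_uri, score, needs_review) for one cell value."""
    best_uri, best_score = None, 0.0
    for uri, label in VOCABULARY.items():
        score = SequenceMatcher(None, value.lower(), label.lower()).ratio()
        if score > best_score:
            best_uri, best_score = uri, score
    # Below the threshold the link is only a suggestion for a human to confirm.
    return best_uri, best_score, best_score < threshold


column = ["Hospital", "Hospitals", "Clinic"]
print(infer_type(column))
for cell in column:
    print(cell, suggest_link(cell))
```

Keeping a human in the loop for low-scoring matches is one reason such a pipeline is semi-automatic rather than fully automatic.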
How does it work?
Use Case – Smarter Care
• Use open data merged with patient data to help provide better outcomes for patients
• Build a “safety net” of services that can be used to plan care for an elderly person
• Use open data from New York, including:
Hospital performance scores
Locations of services
Cost averages from hospitals for treatments
• Use IBM Cúram as the social program system, and integrate the data using DALI on the back end. The output from this is fed into planning and ranking analytics.
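The planning and ranking analytics are not described here in detail. Purely as an illustration, the sketch below combines a patient's location with open hospital data of the kinds listed above (performance score, location, average treatment cost) and ranks hospitals by a weighted score. The `Hospital` fields, the 0-100 score range, the weights, the normalisation and the sample data are all assumptions, not the method used with Cúram.

```python
import math
from dataclasses import dataclass


@dataclass
class Hospital:
    name: str
    lat: float
    lon: float
    performance: float  # open-data performance score, assumed 0-100
    avg_cost: float     # average treatment cost, assumed in USD


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def rank_hospitals(patient_lat, patient_lon, hospitals,
                   w_perf=0.5, w_dist=0.3, w_cost=0.2):
    """Rank hospitals for one patient; higher score is better (hypothetical weights)."""
    if not hospitals:
        return []
    dists = [haversine_km(patient_lat, patient_lon, h.lat, h.lon) for h in hospitals]
    max_dist = max(dists) or 1.0
    max_cost = max(h.avg_cost for h in hospitals) or 1.0
    scored = []
    for h, d in zip(hospitals, dists):
        # Normalise each criterion to [0, 1]; nearer and cheaper score higher.
        score = (w_perf * h.performance / 100
                 + w_dist * (1 - d / max_dist)
                 + w_cost * (1 - h.avg_cost / max_cost))
        scored.append((round(score, 3), h.name))
    return sorted(scored, reverse=True)


# Made-up sample data for illustration only.
hospitals = [
    Hospital("Hospital A", 40.75, -73.99, 90, 1000),
    Hospital("Hospital B", 40.80, -73.95, 60, 2000),
]
print(rank_hospitals(40.75, -73.99, hospitals))
```

Because each criterion is normalised against the candidates being compared, the scores are only meaningful relative to one another for a single patient.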
Questions?