Jan 12, 2016

Page 1: “Solving Data Inconsistencies and Data Integration with a Data Quality Manager”

Presented by Maria del Pilar Angeles, Lachlan M. MacKinnon
School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, EH14 4AS
{pilar,lachlan}@macs.hw.ac.uk

Doctoral Consortium

Page 2: Agenda

• Introduction
• Proposal
• Data Quality Manager Components
  – Reference Model
  – Measurement Model
  – Assessment Model
  – Quality Metadata
• Information Integration Process
  – Classification of Data Sources
  – Selection of Best Data Sources
  – Query Planning
  – Data Fusion
  – Ranking of Query Results
• Questions

Page 3: Introduction

Structural conflicts (Sheth92):

• Domain definition: Naming, Data representation, Data scaling, Data precision, Default value, Attribute integrity constraints
• Entity definition: Database id, Naming, Union compatibility, Schema isomorphism, Missing data item
• Data value: Known inconsistency, Temporal inconsistency, Acceptable inconsistency
• Abstract: Generalization, Aggregation
• Schematic discrepancy: Data value attribute, Attribute entity, Data value entity

Approached by:

• Ontology
• Metadata
• Transformation rules
• Mapping

Page 4: Introduction

DS 1
Emp_no    Name               salary
123987    Alastair Freich    14000
456339    Fernando Lujan     NULL

DS 2
SSN       fullname           sal
123987    A. Freich          20000
789222    Fiona Shaning      15000

DS 3
employe   SFE                salary
123987    Al. Freich         NULL
393765    Lauren MacMillan   14500

Employee_number   Full_name_employee   Salary
123987            Alastair F.          14000
123987            A. Freich            20000
123987            Al. Freich           NULL
456339            Fernando Lujan       NULL
393765            Lauren MacMillan     14500
789222            Fiona Shaning        15000
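The example on this slide can be reproduced with a minimal sketch. The in-memory rows copy DS 1, DS 2 and DS 3 as shown; the mapping dictionary, the `source` tag and the function names are assumptions made for illustration and are not part of the presentation. The sketch renames local columns to the global schema, takes the plain union, and reports the employee numbers whose salary values disagree.

```python
# Rows copied from the three data sources on the slide (NULL becomes None).
DS1 = [{"Emp_no": 123987, "Name": "Alastair Freich", "salary": 14000},
       {"Emp_no": 456339, "Name": "Fernando Lujan", "salary": None}]
DS2 = [{"SSN": 123987, "fullname": "A. Freich", "sal": 20000},
       {"SSN": 789222, "fullname": "Fiona Shaning", "sal": 15000}]
DS3 = [{"employe": 123987, "SFE": "Al. Freich", "salary": None},
       {"employe": 393765, "SFE": "Lauren MacMillan", "salary": 14500}]

# Assumed local-to-global column mappings (hypothetical, for illustration).
MAPPINGS = {
    "DS1": {"Emp_no": "Employee_number", "Name": "Full_name_employee", "salary": "Salary"},
    "DS2": {"SSN": "Employee_number", "fullname": "Full_name_employee", "sal": "Salary"},
    "DS3": {"employe": "Employee_number", "SFE": "Full_name_employee", "salary": "Salary"},
}


def to_global(rows, mapping, source):
    """Rename local columns to global ones and remember the originating source."""
    return [{**{mapping[col]: val for col, val in row.items()}, "source": source}
            for row in rows]


def conflicting_keys(rows):
    """Return the employee numbers whose Salary values differ across rows."""
    salaries = {}
    for row in rows:
        salaries.setdefault(row["Employee_number"], set()).add(row["Salary"])
    return sorted(key for key, values in salaries.items() if len(values) > 1)


integrated = (to_global(DS1, MAPPINGS["DS1"], "DS1")
              + to_global(DS2, MAPPINGS["DS2"], "DS2")
              + to_global(DS3, MAPPINGS["DS3"], "DS3"))

print(len(integrated))               # 6
print(conflicting_keys(integrated))  # [123987]
```

Schema mapping alone resolves the naming differences, but employee 123987 still carries three different salary values (14000, 20000 and NULL), which is the data-value inconsistency the rest of the talk addresses.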

Page 5: Proposal

We propose the development of a Data Quality Manager (DQM) that establishes communication between the information integration process, the user and the application, in order to deal with semantic heterogeneity.

Page 6: Proposal

Figure (system architecture). Components shown:

• Global User 1, Global User 2, Global User 3, … Global User M
• Applications
• Global Schema
• Mediator
• Data Quality Manager
• Export Schema 1, Export Schema 2, … Export Schema N
• Wrapper, Wrapper, … Wrapper
• Local Schema 1, Local Schema 2, … Local Schema N
• Local User 1, Local User 2, … Local User N
• Data Source 1, Data Source 2, … Data Source N

Page 7: DQM Components

• Reference Model: Definition of Quality Criteria

Page 8: DQM Components

• Reference Model: Definition of Quality Criteria
• Measurement Model: Definition of Metrics

Page 9: DQM Components

• Reference Model: Definition of Quality Criteria
• Measurement Model: Definition of Metrics
• Assessment Model: Definition of Assessment methods

Page 10: DQM Components

• Reference Model: Definition of Quality Criteria
• Measurement Model: Definition of Metrics
• Assessment Model: Definition of Assessment methods
• Quality Metadata: Definition of Quality Metadata (QMD)

Page 11: QMD Population

Criterion      Metric
Completeness   # incomplete / # total
Accuracy       # errors / # total
Currency       Age + delivery time – input time

Assessment methods: survey, queries, benchmarks.

Based on DQM components, classify the data sources.

DQM: Data Quality Manager
QMD: Quality Meta Data
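The three metrics listed on this slide can be written down directly. The following is a minimal sketch, assuming that NULL values count as incomplete, that validity is decided by a caller-supplied predicate, and that all three currency terms use the same time unit; the function names are hypothetical.

```python
def _ratio(count, total):
    """Divide count by total, refusing to assess an empty column."""
    if total == 0:
        raise ValueError("cannot assess an empty column")
    return count / total


def incomplete_ratio(values):
    """# incomplete / # total, treating None (NULL) as incomplete."""
    return _ratio(sum(v is None for v in values), len(values))


def error_ratio(values, is_valid):
    """# errors / # total, where is_valid is an assumed validity check."""
    return _ratio(sum(not is_valid(v) for v in values), len(values))


def currency(age, delivery_time, input_time):
    """Age + delivery time - input time, all in the same time unit."""
    return age + delivery_time - input_time


print(incomplete_ratio([14000, None]))                           # 0.5
print(error_ratio([14000, -5, 20000, 15000], lambda v: v > 0))   # 0.25
print(currency(age=2, delivery_time=10, input_time=7))           # 5
```

Both ratios follow the slide literally, so a lower value means better completeness or accuracy. A QMD entry for a data source could store one such value per criterion.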

Page 12: Information Integration Process

Data Quality Manager:

1. Selection of Best Data Sources

Page 13: Information Integration Process

Data Quality Manager:

1. Selection of Best Data Sources
2. Query Planning

Page 14: Information Integration Process

Data Quality Manager:

1. Selection of Best Data Sources
2. Query Planning
3. Fusion of Data Inconsistencies

Page 15: Information Integration Process

Data Quality Manager:

1. Selection of Best Data Sources
2. Query Planning
3. Fusion of Data Inconsistencies
4. Query Integration

Page 16: Information Integration Process

Data Quality Manager:

1. Selection of Best Data Sources
2. Query Planning
3. Fusion of Data Inconsistencies
4. Query Integration
5. Ranking of Query Results

Page 17: Selection of best Data Sources

Figure (numbered steps 1 to 4). Elements shown: User Query, Quality User Priorities, Mapping Local/Global Schemas, Data sources involved in the Query, QMD, Selection of best Data Sources, Ranking of best Data Sources.

1. The quality user priorities are given by the user.
2. The ranking of the best data sources involved in the query is given before execution.
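One way to combine the QMD with the quality user priorities is a weighted score per data source. This is a minimal sketch and not the method of the presentation: the QMD values are invented, every criterion is assumed to be normalised to the range 0 to 1 with higher meaning better, and the weighted average is an assumed scoring rule.

```python
# Hypothetical quality metadata: one normalised score per criterion and source.
QMD = {
    "DS1": {"completeness": 0.5, "accuracy": 0.9, "currency": 0.7},
    "DS2": {"completeness": 1.0, "accuracy": 0.6, "currency": 0.9},
    "DS3": {"completeness": 0.5, "accuracy": 0.8, "currency": 0.4},
}


def rank_sources(qmd, priorities, sources):
    """Rank the sources involved in a query by a priority-weighted average.

    priorities maps a criterion name to a non-negative weight chosen by the user.
    Returns (source, score) pairs, best first.
    """
    total_weight = sum(priorities.values())
    if total_weight <= 0:
        raise ValueError("at least one priority weight must be positive")
    scores = {
        source: sum(weight * qmd[source][criterion]
                    for criterion, weight in priorities.items()) / total_weight
        for source in sources
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# A user who values accuracy three times as much as completeness.
ranking = rank_sources(QMD, {"accuracy": 3, "completeness": 1}, ["DS1", "DS2", "DS3"])
print([source for source, _ in ranking])  # ['DS1', 'DS3', 'DS2']
```

Because the scores depend only on stored metadata and user weights, the ranking is available before any query is executed, in line with point 2 above.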

Page 18: Query Planning

Figure. Elements shown: User Query; Query Partition; sub-queries QueryA, QueryB, QueryC; Plan 1, Plan 2, Plan 3, … Plan N; Top ranking Query Plan; QMD; Quality User Priorities.
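The planning step can be illustrated with a small sketch. It assumes that the user query has already been partitioned into sub-queries, that each sub-query can be answered by a known set of sources, and that each source has a single quality score (for example one derived from the QMD and the user priorities). The candidate sets, the scores and the averaging rule are all hypothetical.

```python
from itertools import product


def rank_plans(candidates, source_score):
    """Enumerate one-source-per-sub-query plans and rank them by mean source score.

    candidates maps a sub-query name to the sources able to answer it.
    Returns (score, plan) pairs, best first.
    """
    subqueries = sorted(candidates)
    plans = []
    for choice in product(*(candidates[q] for q in subqueries)):
        plan = dict(zip(subqueries, choice))
        score = sum(source_score[s] for s in choice) / len(choice)
        plans.append((score, plan))
    # Sort on the score only, so plans (dicts) are never compared.
    plans.sort(key=lambda item: item[0], reverse=True)
    return plans


candidates = {"QueryA": ["DS1", "DS2"], "QueryB": ["DS2", "DS3"], "QueryC": ["DS3"]}
source_score = {"DS1": 0.8, "DS2": 0.6, "DS3": 0.7}

plans = rank_plans(candidates, source_score)
print(len(plans))   # 4
print(plans[0][1])  # {'QueryA': 'DS1', 'QueryB': 'DS3', 'QueryC': 'DS3'}
```

Exhaustive enumeration is only practical for a handful of sub-queries and sources; it is used here to keep the idea of a top ranking query plan visible.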

Page 19: Data Fusion

Figure. Elements shown: Execute Query Plan; ResultX, ResultY, ResultZ; Inconsistent Query Result; Data Inconsistencies Detection; Data fusion; QMD; Quality user priorities; Consistent Query Result.

Because the DQM stores where the data comes from, it is possible to make decisions at data fusion time.
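A minimal sketch of such a fusion decision follows. It assumes that every row carries the name of the source it came from and that each source has one quality score; the rows reuse the salary values of the introductory example, while the scores and the rule "take the non-NULL value from the best-scored source" are assumptions for illustration.

```python
def fuse(rows, source_score, key="Employee_number", attr="Salary"):
    """Resolve conflicting values per key using the score of the originating source.

    Returns {key: (value, source)}; keys with only NULL values map to (None, None).
    """
    best = {}
    for row in rows:
        if row[attr] is None:
            continue  # a NULL never wins over a known value
        current = best.get(row[key])
        if current is None or source_score[row["source"]] > source_score[current["source"]]:
            best[row[key]] = row
    fused = {}
    for row in rows:
        chosen = best.get(row[key])
        fused[row[key]] = (chosen[attr], chosen["source"]) if chosen else (None, None)
    return fused


# Inconsistent query result, tagged with the originating source.
rows = [
    {"Employee_number": 123987, "Salary": 14000, "source": "DS1"},
    {"Employee_number": 123987, "Salary": 20000, "source": "DS2"},
    {"Employee_number": 123987, "Salary": None, "source": "DS3"},
    {"Employee_number": 456339, "Salary": None, "source": "DS1"},
    {"Employee_number": 393765, "Salary": 14500, "source": "DS3"},
    {"Employee_number": 789222, "Salary": 15000, "source": "DS2"},
]
source_score = {"DS1": 0.8, "DS2": 0.6, "DS3": 0.7}  # hypothetical scores

fused = fuse(rows, source_score)
print(fused[123987])  # (14000, 'DS1')
print(fused[456339])  # (None, None)
```

Keeping the source name next to each fused value preserves the provenance, so a later step can still explain or rank the result.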

Page 20: Query Integration and Ranking

Figure. Elements shown: Data Fusion; Consistent Query Result; ResultJ, ResultK, ResultL; Query Integration; Query Result; Ranking; QMD; Quality user priorities; Ranking Query Result.

Page 21: Conclusion

Using the Data Quality Manager we can:

• Approach data value level inconsistencies during the Information Integration Process, using data quality properties.
• Let the user demand different quality priorities at query time.
• Manage user quality priorities and data quality properties to give the query result quality expected by the user.

What we need to do now:

• Identify tools for measurement and assessment, and develop a QMD.
• Store the quality of the data sources involved in the heterogeneous system.
• Identify techniques for:
  – Ranking of data sources and plans involved in the query
  – Inconsistency detection
  – Fusing data using data source and data level properties
  – Ranking of query results

Page 22: Questions?

Page 23: Thanks!