
An introduction to Data Quality Services (DQS)

Dec 24, 2014

Speaker: Koen Verbeeck

Download SQL Server 2012: http://www.microsoft.com/sqlserver/en/us/get-sql-server/try-it.aspx
Transcript
Page 1: An introduction to Data Quality Services (DQS)

AN INTRODUCTION TO DATA QUALITY SERVICES

koen verbeeck, BI consultant

Page 2

WHO AM I

• BI consultant @ Ordina

• member of SQLUG.be

• MCTS, MCITP in SQL Server 2008

• working with Microsoft BI for over 2 years

• beer and comic books enthusiast

• married with children…

Page 3

INTRODUCTION

data quality?

• achieved through people, technology & processes
• can be measured with various dimensions
  o accuracy
  o consistency
  o completeness
  o duplicates (uniqueness)
  o timeliness
  o validity

• bad data = bad business

Data are of high quality "if they are fit for their intended uses in operations, decision making and planning" (J. M. Juran). - Wikipedia on Data Quality

Page 4

INTRODUCTION

Data Quality

• Standard: are data elements consistently defined and understood?
  o sample data problem: gender code = M, F, U in one system and gender code = 0, 1, 2 in another system
• Complete: is all necessary data present?
  o sample data problem: the last name of 20% of customers is blank, 50% of zip codes are 99999
• Accurate: does the data accurately represent reality or a verifiable source?
  o sample data problem: a supplier is listed as ‘Active’ but went out of business six years ago
• Valid: do data values fall within acceptable ranges?
  o sample data problem: temperature recordings should be between -100°C and +100°C
• Unique: does the same entity appear several times?
  o sample data problem: Prince, The Artist Formerly Known As Prince, The Artist, … are they the same person?
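Most of these dimensions can be expressed as simple automated checks. The sketch below is plain Python, not DQS: the column names, the placeholder values, the assumed 0/1/2 to M/F/U mapping and the sample rows are hypothetical and only mirror the sample problems above. Accuracy is left out because it needs a trusted reference source to compare against.

```python
from collections import Counter

# Hypothetical mapping: one system uses M/F/U, the other 0/1/2.
# Assumption: 0 = M, 1 = F, 2 = U; the real mapping must come from the source systems.
GENDER_MAP = {"M": "M", "F": "F", "U": "U", "0": "M", "1": "F", "2": "U"}


def standardize_gender(code):
    """Standard: map both coding schemes onto one; None means 'unknown code'."""
    return GENDER_MAP.get(str(code).strip().upper())


def completeness(rows, column, missing=("", None, "99999")):
    """Complete: share of rows whose value is neither blank nor a known placeholder."""
    if not rows:
        return 1.0
    present = sum(1 for row in rows if row.get(column) not in missing)
    return present / len(rows)


def is_valid_temperature(celsius, low=-100.0, high=100.0):
    """Valid: the recorded value must fall inside the accepted range."""
    return low <= celsius <= high


def exact_duplicates(values):
    """Unique: values that occur more than once (exact matches only)."""
    return [value for value, count in Counter(values).items() if count > 1]


if __name__ == "__main__":
    rows = [
        {"last_name": "Verbeeck", "zip": "2000"},
        {"last_name": "", "zip": "99999"},
        {"last_name": "Doe", "zip": "10001"},
        {"last_name": "Doe", "zip": "99999"},
    ]
    print(completeness(rows, "last_name"))  # 0.75
    print(completeness(rows, "zip"))        # 0.5
    print(standardize_gender("1"))          # F
    print(is_valid_temperature(150))        # False
```

Exact duplicate detection will never recognize Prince and The Artist as the same person; that takes knowledge (synonyms) or fuzzy matching, which is what a knowledge base and a matching policy add.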

Page 5

INTRODUCTION

• Monitoring: tracking and monitoring the state of quality activities and the quality of the data.

• Cleansing: amend, remove or enrich data that is incorrect or incomplete. This includes correction, standardization and enrichment.

• Profiling: analysis of the data source to provide insight into the quality of the data and help to identify data quality issues.

• Matching: identifying, linking or merging related entries within or across sets of data.

Page 6

OUTLINE

• introduction

• overview of data quality services

• building a knowledge base

• data cleansing & matching

• SSIS integration

• conclusion

Page 7

OVERVIEW OF DQS

Data Quality Services (DQS) is a knowledge-driven data quality solution, enabling IT pros and data stewards to easily improve the quality of their data.

Page 8

OVERVIEW OF DQS

• Knowledge-Driven: based on a Data Quality Knowledge Base (DQKB)
• Semantics: data domains capture the semantics of your data
• Knowledge Discovery: acquires additional knowledge the more you use it
• Open and Extendible: supports user-generated knowledge and IP from 3rd party reference data providers
• Easy to use: compelling user experience designed for increased productivity

Page 9

OVERVIEW OF DQS

• easy installation
• pre-installation checks
  o SQL Server 2012 database engine (server)
  o .NET 4.0 & IE 6.0 or higher (client)
• installation of DQS using SQL Server setup
• post-installation tasks
  o run DQSInstaller.exe
  o grant DQS roles to users
  o enable TCP/IP

Page 11

BUILDING A KNOWLEDGE BASE

[diagram: the knowledge base at the centre; knowledge management builds and manages knowledge (discover / explore data / connect), DQ projects use it to correct & standardize and to match & de-dupe, reference data comes from cloud services and enterprise data; integrated profiling, notifications and progress status accompany the process]

Page 12

BUILDING A KNOWLEDGE BASE

[diagram: structure of a knowledge base]

• domains: represent the data type
  o values
  o rules & relations
  o 3rd party reference data
• composite domains
• matching policy (defined on domains)

Page 13

DEMO: our first knowledge base

Page 15

BUILDING A KNOWLEDGE BASE

• iterative process
• knowledge discovery
• gather knowledge from
  o Excel
  o SQL Server
• profiling of data
  o not the same as the SSIS Data Profiling task!

• automatically detects anomalies
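To make knowledge discovery a little more concrete, here is a tiny profiler in plain Python: it counts missing and distinct values and flags rare values that look like misspellings of a frequent one. This is an illustrative heuristic (Python's difflib with an arbitrary 0.85 cutoff, applied to a hypothetical city column), not the algorithm DQS uses.

```python
from collections import Counter
from difflib import get_close_matches


def _clean(values):
    """Drop None and blank strings, strip surrounding whitespace."""
    return [v.strip() for v in values if v is not None and v.strip()]


def profile(values):
    """Basic column profile: rows, missing values, distinct values and their frequencies."""
    present = _clean(values)
    freq = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(freq),
        "frequencies": freq.most_common(),
    }


def possible_typos(values, max_count=1, cutoff=0.85):
    """Map rare values to a frequent value they closely resemble (likely misspellings)."""
    freq = Counter(_clean(values))
    frequent = [v for v, n in freq.items() if n > max_count]
    flagged = {}
    for value, count in freq.items():
        if count <= max_count:
            match = get_close_matches(value, frequent, n=1, cutoff=cutoff)
            if match:
                flagged[value] = match[0]
    return flagged


if __name__ == "__main__":
    cities = ["Antwerp", "Antwerp", "Brussels", "Brussels", "Antwerpp", "Gent", None, ""]
    print(profile(cities)["distinct"])  # 4
    print(possible_typos(cities))       # {'Antwerpp': 'Antwerp'}
```

"Gent" is rare too but resembles no frequent value, so it is left alone: it may simply be new knowledge rather than an error.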

Page 16

BUILDING A KNOWLEDGE BASE

• domain management
• knowledge about fields is kept in domains
• the data steward can
  o create rules
  o assign synonyms and corrections
  o create term-based relations (str. --> street)
  o link domains together into composite domains
• import knowledge from
  o reference data (e.g. Azure Marketplace)
  o other knowledge bases
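The kinds of knowledge a data steward adds to a domain can be pictured as a small data structure. The classes below are a hypothetical, simplified model written for this example (they are not the DQS object model): a domain holds known values, synonyms pointing to a leading value, known errors with their correction, term-based relations and validation rules, and a composite domain chains several domains.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Domain:
    """Hypothetical, simplified domain; not the DQS object model."""
    name: str
    values: Set[str] = field(default_factory=set)                 # known correct values
    synonyms: Dict[str, str] = field(default_factory=dict)        # synonym -> leading value
    corrections: Dict[str, str] = field(default_factory=dict)     # known error -> correct value
    term_relations: Dict[str, str] = field(default_factory=dict)  # term -> replacement term
    rules: List[Callable[[str], bool]] = field(default_factory=list)

    def apply_terms(self, value: str) -> str:
        """Replace whole terms inside a value (case-insensitive), e.g. 'str.' -> 'street'."""
        for term, replacement in self.term_relations.items():
            pattern = rf"(?<!\w){re.escape(term)}(?!\w)"
            value = re.sub(pattern, lambda _m: replacement, value, flags=re.IGNORECASE)
        return value

    def normalize(self, value: str) -> str:
        """Apply term relations, then known corrections, then synonyms."""
        value = self.apply_terms(value.strip())
        value = self.corrections.get(value, value)
        return self.synonyms.get(value, value)

    def is_valid(self, value: str) -> bool:
        """A value is valid when it satisfies every domain rule."""
        return all(rule(value) for rule in self.rules)


@dataclass
class CompositeDomain:
    """Hypothetical composite domain: one value per member domain, in order."""
    name: str
    domains: List[Domain]

    def normalize(self, values: List[str]) -> List[str]:
        if len(values) != len(self.domains):
            raise ValueError(f"{self.name} expects {len(self.domains)} values")
        return [d.normalize(v) for d, v in zip(self.domains, values)]


if __name__ == "__main__":
    street = Domain("Street", term_relations={"str.": "street"})
    city = Domain("City", synonyms={"NY": "New York"})
    address = CompositeDomain("Address", [street, city])
    print(address.normalize(["High str. 13", "NY"]))  # ['High street 13', 'New York']
```

A rule such as `lambda v: v.isdigit() and len(v) == 4` (a hypothetical Belgian zip code check) would go in `rules`, so that a placeholder like 99999 is reported as invalid.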

Page 18

DATA CLEANSING & MATCHING

• why cleansing?
  o identifies incomplete or incorrect data
  o standardizes and enriches data by using domain values, domain rules and reference data
• DQS cleansing
  o create a knowledge base or select an existing one
  o create a data quality project
  o 2-step process
    – computer-assisted cleansing
    – interactive cleansing
  o export results

• St. --> street (corrected)
• Microsot --> Microsoft (corrected)
• john.doe@hotmail (invalid)
• 0472/34672 (invalid)
• Verbeek --> Verbeeck (suggested)
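The outcomes in these examples can be mimicked in a few lines of Python. This is a sketch under stated assumptions, not the DQS cleansing engine: known errors come from an explicit correction list, suggestions come from difflib string similarity with an arbitrary 0.8 cutoff, and the e-mail rule and the phone rule (which assumes a hypothetical 0xxx/xxxxxx format) are made up for this example.

```python
import re
from difflib import get_close_matches

# Hypothetical domain rules for this example only.
EMAIL_RULE = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}").fullmatch
PHONE_RULE = re.compile(r"0\d{3}/\d{6}").fullmatch  # assumed 0xxx/xxxxxx format


def cleanse(value, known_values=(), corrections=None, rule=None, suggest_cutoff=0.8):
    """Return (status, output value) for a single value."""
    corrections = corrections or {}
    if value in known_values:
        return "correct", value
    if value in corrections:                  # known error stored in the knowledge base
        return "corrected", corrections[value]
    if rule is not None and not rule(value):  # violates a domain rule
        return "invalid", value
    close = get_close_matches(value, list(known_values), n=1, cutoff=suggest_cutoff)
    if close:                                 # similar to a known value: a person decides
        return "suggested", close[0]
    return "new", value                       # unknown but plausible: candidate new knowledge


if __name__ == "__main__":
    print(cleanse("St.", {"street"}, {"St.": "street"}))                # ('corrected', 'street')
    print(cleanse("Microsot", {"Microsoft"}, {"Microsot": "Microsoft"}))  # ('corrected', 'Microsoft')
    print(cleanse("john.doe@hotmail", rule=EMAIL_RULE))                 # ('invalid', 'john.doe@hotmail')
    print(cleanse("0472/34672", rule=PHONE_RULE))                       # ('invalid', '0472/34672')
    print(cleanse("Verbeek", {"Verbeeck"}))                             # ('suggested', 'Verbeeck')
```

The split between "corrected" and "suggested" is the interesting design choice: only knowledge someone already confirmed is applied automatically, while mere similarity is left for interactive cleansing.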

Page 19

DATA CLEANSING & MATCHING

• why matching?
  o identify duplicates within the data source
  o create a consolidated view of the data
• DQS matching
  o build a matching policy in the KB
  o train the matching policy
  o create a matching project
  o choose survivors

[screenshot: DQ Client – Match Results]

• Prince / The Artist Formerly Known As Prince / The Artist
• Jon Doe, High Street 13, NY, [email protected] / John Doe, High Str, NY, [email protected]
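The Jon Doe / John Doe pair shows why matching needs similarity rather than equality. The sketch below scores record pairs with a weighted average of per-field string similarities and keeps the pairs that reach a minimum score, loosely modeled on a matching rule that weights domains and sets a minimum matching score. The field weights, the 0.8 threshold and the use of Python's difflib are assumptions for illustration; only name, street and city are compared.

```python
from difflib import SequenceMatcher

# Hypothetical matching policy: field -> weight (weights sum to 1.0).
POLICY = {"name": 0.5, "street": 0.3, "city": 0.2}
MIN_MATCHING_SCORE = 0.8  # hypothetical threshold


def similarity(a, b):
    """String similarity in [0, 1], ignoring case and surrounding spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_score(rec1, rec2, policy=POLICY):
    """Weighted average of the field similarities defined in the policy."""
    return sum(weight * similarity(rec1[f], rec2[f]) for f, weight in policy.items())


def find_matches(records, policy=POLICY, threshold=MIN_MATCHING_SCORE):
    """Compare every pair once; return (i, j, score) for pairs that reach the threshold."""
    matches = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            score = match_score(records[i], records[j], policy)
            if score >= threshold:
                matches.append((i, j, round(score, 3)))
    return matches


if __name__ == "__main__":
    records = [
        {"name": "Jon Doe", "street": "High Street 13", "city": "NY"},
        {"name": "John Doe", "street": "High Str", "city": "NY"},
        {"name": "Koen Verbeeck", "street": "Main Street 1", "city": "Antwerp"},
    ]
    print(find_matches(records))  # [(0, 1, 0.885)]
```

Normalizing "High Str" to "High Street" before matching raises the street similarity from about 0.73 to 0.88, which is why cleansing the data first pays off. String similarity alone will never link Prince to The Artist; that pair needs synonyms in the knowledge base.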

Page 20

DEMO: cleanse data and use a matching policy to find duplicates

Page 21

DATA CLEANSING & MATCHING

• create a cleansing project

• uses knowledge gathered in a DQS knowledge base

• simple user-friendly process

• profile results

Page 22

DATA CLEANSING & MATCHING

• create a matching project

• uses a matching policy created in a knowledge base

• eliminates duplicates

• profile results

• the more knowledge you add, the better the results will be
  o tip: clean up the data first using a cleansing project

• choose survivors at the end

• export results to .csv or SQL Server
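Choosing survivors and exporting them can be sketched as follows. The "most complete, then longest record" ranking is one plausible survivorship rule picked for this example (an assumption, not necessarily the rule you would configure in DQS), and the export uses Python's standard csv module; the field names and records are hypothetical.

```python
import csv
import io


def completeness_key(record):
    """Rank a record by its number of filled fields, then by their total length."""
    filled = [str(v).strip() for v in record.values() if v is not None and str(v).strip()]
    return (len(filled), sum(len(v) for v in filled))


def choose_survivor(cluster):
    """Pick the most complete (then longest) record of a cluster of duplicates."""
    return max(cluster, key=completeness_key)


def export_survivors(clusters, out, fields):
    """Write one surviving record per cluster as CSV to a file-like object."""
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    for cluster in clusters:
        writer.writerow(choose_survivor(cluster))


if __name__ == "__main__":
    cluster = [
        {"name": "John Doe", "street": "", "city": "NY"},
        {"name": "Jon Doe", "street": "High Street 13", "city": "NY"},
        {"name": "John Doe", "street": "High Str", "city": "NY"},
    ]
    buffer = io.StringIO()
    export_survivors([cluster], buffer, ["name", "street", "city"])
    print(buffer.getvalue())
```

When writing to a real file instead of a `StringIO`, open it with `newline=""`, as the csv module documentation recommends.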

Page 24

SSIS INTEGRATION

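[diagram: in an SSIS package, the data flow reads a source, maps its columns to domains and sends the rows through the data correction component; the component uses the knowledge base (values/rules, reference data definition) on the DQS server, which can call reference data services, and writes correct records, corrections & suggestions, new records and invalid records to the destination]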

Page 25

DEMO: an SSIS cleansing project

Page 26

SSIS INTEGRATION

• cleansing as a batch process

• only cleansing; matching is not (yet?) possible

• composite domains are supported
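Because the SSIS component cleanses rows as a batch, a package can then route each row according to the status DQS assigned to it, for example with a Conditional Split. The Python sketch below only mimics that routing step; the status names, the `record_status` column and the three outputs are hypothetical choices for this example, not the exact columns the component produces.

```python
from collections import defaultdict

# Hypothetical routing table: cleansing status -> output of the data flow.
ROUTES = {
    "correct": "destination",
    "corrected": "destination",
    "suggested": "review",
    "new": "review",
    "invalid": "rejected",
}


def route(rows, status_column="record_status", routes=ROUTES, default="rejected"):
    """Split a batch of cleansed rows into outputs based on their status."""
    outputs = defaultdict(list)
    for row in rows:
        status = str(row.get(status_column, "")).strip().lower()
        outputs[routes.get(status, default)].append(row)
    return dict(outputs)


if __name__ == "__main__":
    batch = [
        {"id": 1, "record_status": "Correct"},
        {"id": 2, "record_status": "Invalid"},
        {"id": 3, "record_status": "Corrected"},
        {"id": 4, "record_status": "New"},
    ]
    for output, rows in route(batch).items():
        print(output, [r["id"] for r in rows])
```

Rows with a missing or unknown status fall into the `default` output, so nothing silently disappears from the batch.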

Page 28

CONCLUSION

• Knowledge-driven
  o rich knowledge base
  o continuous improvement and knowledge acquisition
  o build once, reuse for multiple DQ improvements
• Easy to use
  o focus on productivity and user experience
  o designed for business users
  o out-of-the-box knowledge
• Open & Extendible
  o focus on cloud-based reference data
  o user-generated knowledge
  o integration with SSIS

Page 29

RESOURCES

• DQS Team Blog @ MSDN: http://blogs.msdn.com/b/dqs/

• DQS documentation @ MSDN: http://msdn.microsoft.com/en-us/library/ff877917(v=sql.110).aspx

• SQL Server 2012 Resource Center (nice how-to videos): http://msdn.microsoft.com/en-us/sqlserver/ff898410.aspx

• DQS Forum @ MSDN: http://social.msdn.microsoft.com/Forums/en-US/sqldataqualityservices/threads

• TechEd presentation about DQS by Elad Ziklik: http://channel9.msdn.com/Events/TechEd/NorthAmerica/2011/DBI207

Page 30

THE END: thanks for watching!