Mark Gschwind Enterprise Information Management in SQL Server 2012 Originally Titled “Master Data and Data Quality Management in SQL Server 2012”
Nov 16, 2014
Mark Gschwind
Independent Consultant
• Business Intelligence practitioner, manager since 1995
• Over 50 Business BI projects
• Data Warehousing/Cubing/Reporting/Data Mining/EIM
• MCP, certified in Oracle Essbase, Melissa Data MVP
• Working with clients on EIM since 2008
Find me on www.linkedin.com/in/markgschwind
Blog Site: www.marksbiblog.com
Agenda
• Enterprise Information Management (EIM)
o What is it and why do we need it?
o Microsoft EIM, 3 technologies working together
• DQS
o Capabilities
o Demo
• SSIS
• MDS
o Capabilities
o Demo
• EIM=DQS+MDS+SSIS
• Wrap up
• Questions
Why Do We Need EIM?
Impediments to EIM Success
Enterprise Information Management (EIM) with SQL Server 2012
EIM=DQS+MDS+SSIS
Produce accurate, trustworthy data
Deliver credible, consistent data to the right users with end-to-end data integration, cleansing & data management
DATA QUALITY SERVICES
• Knowledge-based Data Cleansing & Matching
• Standalone & SSIS Integrated
MASTER DATA SERVICES
• Excel UI to Manage Data & Dimensions and create workflow processes
INTEGRATION SERVICES
• Integrated Deployment & Management
• Improved User Experience
+People+Processes
DQS: What is Data Quality?
Data Quality represents the degree to which the data is suitable for business use
Data Quality is built through People + Processes + Technology
Bad Data Bad Business
“Poor data quality can cost companies 15% to 25% (or more) of their operating budget”
- Larry English (International Data Quality Expert)
Common Data Quality Issues

Data Quality Issue | Sample Data Problem
Standard: Are data elements consistently defined and understood? | Gender code = M, F, U in one system and Gender code = 0, 1, 2 in another system
Complete: Is all necessary data present? | 20% of customers’ last names are blank, 50% of zip codes are 99999
Accurate: Does the data accurately represent reality or a verifiable source? | A Supplier is listed as ‘Active’ but went out of business six years ago
Valid: Do data values fall within acceptable ranges? | Salary values should be between 60,000 and 120,000
Unique: Data appears several times | Both John Ryan and Jack Ryan appear in the system; are they the same person?
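The checks in the table above can be sketched in a few lines of Python. This is only an illustration of the idea, not how DQS is implemented: the field names, the sample records and the `check_record` helper are all hypothetical, and the thresholds are taken straight from the slide's examples.

```python
import re

# Hypothetical sample records echoing the slide's examples; not real data.
customers = [
    {"last_name": "Ryan", "zip": "10022", "gender": "M", "salary": 85000},
    {"last_name": "", "zip": "99999", "gender": "1", "salary": 45000},
]


def check_record(rec):
    """Return a list of data quality issues found in one record."""
    issues = []
    # Complete: required values must be present and not placeholders.
    if not rec["last_name"].strip():
        issues.append("Complete: last name is blank")
    if rec["zip"] == "99999" or not re.fullmatch(r"\d{5}(-\d{4})?", rec["zip"]):
        issues.append("Complete: zip code is a placeholder or malformed")
    # Standard: only one coding scheme (M/F/U) is accepted here.
    if rec["gender"] not in ("M", "F", "U"):
        issues.append("Standard: gender code is not in the M/F/U scheme")
    # Valid: value must fall inside the acceptable range from the slide.
    if not 60000 <= rec["salary"] <= 120000:
        issues.append("Valid: salary outside 60,000-120,000")
    return issues


for rec in customers:
    print(rec["last_name"] or "(blank)", check_record(rec))
```

Accuracy and uniqueness are left out on purpose: both need an external reference or a matching step, which simple per-record rules cannot provide.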
Common Issues DQS Addresses
Completeness, Accuracy, Conformity, Consistency, Uniqueness

Before (cleansing)
Name | Gender | Street | House # | Zip code | City | State | D.O.B
John Doe | Male | 60th street | 45 | | New York | New York | 08/12/64
Jane Doe | Male | Jonathan ln | 36 | 10023 | Poughkeepsy | NY | 21-dec-1954

After (cleansing)
Name | Gender | Street | House # | Zip code | City | State | D.O.B
John Doe | Male | E 60th St | 45W | 10022 | New York | NY | 08/12/64
Jane Doe | Female | Jonathan Lane | 36 | 10023 | Poughkeepsie | NY | 12/21/54

Before (matching)
Name | Address | Postal Code | City | State
John Smith | 545 S Valley View Drive # 136 | 34563 | Anytown | New York
Margaret & John smith | 545 Valley View ave unit 136 | 34563-2341 | Anytown | New York
Maggie Smith | 545 S Valley View Dr | | Anytown | New York
John Smith | 545 Valley Drive St. | 34253 | NY | NY

After (matching)
Name | Address | Zip Code | City | State | Cluster
John Smith | 545 S Valley View Drive # 136 | 34563 | Anytown | New York | 1
Margaret & John smith | 545 Valley View ave unit 136 | 34563-2341 | Anytown | New York | 1
Maggie Smith | 545 S Valley View Dr | | Anytown | New York | 1
John Smith | 545 Valley Drive St. | 34253 | NY | NY | 2
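To make the matching example concrete, here is a toy Python sketch that groups the four Smith records into clusters. It is not the DQS matching algorithm (DQS uses a matching policy defined in a knowledge base); it swaps in a simple token-overlap (Jaccard) similarity on address plus city, and the abbreviation map and the 0.5 threshold are assumptions chosen for this illustration.

```python
import re

# Assumed abbreviation map used to standardize street types before matching.
ABBREVIATIONS = {"drive": "dr", "street": "st", "avenue": "ave"}

records = [
    {"name": "John Smith", "address": "545 S Valley View Drive # 136", "city": "Anytown"},
    {"name": "Margaret & John smith", "address": "545 Valley View ave unit 136", "city": "Anytown"},
    {"name": "Maggie Smith", "address": "545 S Valley View Dr", "city": "Anytown"},
    {"name": "John Smith", "address": "545 Valley Drive St.", "city": "NY"},
]


def tokens(record):
    """Lowercase, strip punctuation and standardize abbreviations."""
    text = f"{record['address']} {record['city']}".lower()
    words = re.findall(r"[a-z0-9]+", text)
    return {ABBREVIATIONS.get(w, w) for w in words}


def jaccard(a, b):
    """Share of tokens two records have in common."""
    return len(a & b) / len(a | b)


def cluster(recs, threshold=0.5):
    """Single-linkage clustering: merge any two records above the threshold."""
    token_sets = [tokens(r) for r in recs]
    labels = list(range(len(recs)))  # every record starts in its own cluster
    for i in range(len(recs)):
        for j in range(i + 1, len(recs)):
            if jaccard(token_sets[i], token_sets[j]) >= threshold:
                old, new = labels[j], labels[i]
                labels = [new if lab == old else lab for lab in labels]
    return labels


labels = cluster(records)
for rec, lab in zip(records, labels):
    print(lab, rec["name"], "|", rec["address"])
```

With these assumptions the first three records share a cluster label and the fourth stands alone, mirroring the Cluster column on the slide. A real project would tune weights per field and let a data steward review borderline matches.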
DQS Use Cases
• One-Time cleanups
o Merge/Migrate multiple divisional CRMs into one
• Continuous Process with Steward Intervention
o Vendor master with continuous trickle of data
o Customer master with incomplete data
• Continuous Process with Minimal Intervention
o Database marketing mailing list
DQS Process
[Diagram: Knowledge Management centered on the Knowledge Base. Build: Discover / Explore Data, Manage Knowledge, Connect (Enterprise Data, Reference Data, Cloud Services). Use: Correct & standardize, Match & De-dupe.]
Data Quality Services
Demo
Integrate DQS using SSIS (continuous low-intervention use case)
MDS: What is Master Data?
Master Data is the set of data objects that are at the center of business activities (Customers, Products, Cost Centers, Locations…) requiring:
• Continuous quality management
• Ease of use for business users (not just IT)
• Effective sharing (producing and consuming)
• Centralized maintenance, by different departments
• Changes that keep pace with the business
Master Data contains different attributes for different departments (marketing, finance, operations, business groups…)
The challenge: To make a trusted single source of business data used across multiple systems, applications, and processes
MDS Use Cases

Operational Data Management
Central data records mgmt and consumption sourced by other operational systems
• A company has adopted 6 “best of breed” systems from different vendors. They need to be able to propagate the correct customer information to each system in a consistent way.
• MDS provides a platform for central schema, integration points and validation for SI/ISV/Internal IT to develop a custom solution

Data Warehouse / Data Marts Mgmt
Enable business users to manage the dimensions and hierarchies of DW / Data Marts
• The IT department has built a data warehouse and reporting platform, but business users complain about the correctness of the dimensions and lack of agility in making updates.
• MDS empowers the business users to manage dimensions themselves while IT can govern the changes

Regulatory
Enable security management and auditing of data used for regulatory reporting
• There are 3 G/L systems whose G/L accounts need to be consolidated and rolled up to create financial statements for regulatory reporting to several countries
• MDS enables an approval process for changes with role-based security and transactional auditing of all changes
Where is Master Data (in a DW)?
[Diagram: three locations in the data warehouse marked “Here”]
MDS Capabilities
• Modeling: Entities, Attributes, Hierarchies
• Validation: Authoring business rules to ensure data correctness
• Versioning
• Role-based Security and Transaction Annotation
• Master Data Stewardship: Excel Add-In, Web UI, Workflow / Notifications, Data Matching (DQS Integrated)
• Enabling Integration & Sharing with External (CRM, ..), Excel, DWH:
o Loading batched data through Staging Tables
o Consuming data through Views
o Registering to changes through APIs
MDS Architecture
[Diagram: MDS Database (Entity Based Staging Tables, Subscription Views); MDS Service (IIS Service, WCF); Excel Add-In and Web UI (Silverlight); Cleansing and Matching (DQS); Workflow / Notifications; SSIS connecting External Systems (CRM/ERP) and DWH; consumers: BI, OLAP, Excel, PowerPivot, BizTalk / Others]
Master Data Services
Demo
Business Rules
Business Rules are expressions and actions that can govern the conduct of business processes*
Enable data governance by:
-- Enforcing data standards
-- Alerting users to data quality issues
-- Creating simple workflows
Have limitations, but can be extended
*EIM = DQS+MDS+SSIS+People+Process
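The "expression plus action" shape of a business rule can be sketched in plain Python. This is a conceptual illustration only, not the MDS rule engine or its API: the `Rule` class, the two sample rules and the member fields (`code`, `country`, `zip`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """A business rule: IF condition(member) THEN action(member, messages)."""
    name: str
    condition: Callable[[dict], bool]
    action: Callable[[dict, list], None]


def set_default_country(member, messages):
    # Enforce a data standard by filling in a default value (assumed default).
    member["country"] = "US"


def flag_missing_zip(member, messages):
    # Alert the data steward instead of changing the data.
    messages.append(f"{member['code']}: zip code is required")


RULES = [
    Rule("Default country", lambda m: not m.get("country"), set_default_country),
    Rule("Zip required", lambda m: not m.get("zip"), flag_missing_zip),
]


def apply_rules(member, rules=RULES):
    """Run every rule in order and collect validation messages."""
    messages = []
    for rule in rules:
        if rule.condition(member):
            rule.action(member, messages)
    return messages


member = {"code": "C1", "country": "", "zip": ""}
issues = apply_rules(member)
print(member, issues)
```

Keeping condition and action separate is what lets the same mechanism enforce standards (change a value), alert users (record a message) or start a simple workflow (notify someone).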
Security
• Functional area permissions
• Model/Entity level permissions provide column-level security
• Hierarchy permissions allow row-level security
• Use AD groups, not individual users
• Only use Hierarchy permissions if row-level security is required
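The difference between column-level and row-level security can be illustrated with a small Python sketch. It only models the concept; it does not call MDS, and the group names, permission tables and the `visible_members` helper are hypothetical.

```python
# Hypothetical permission tables keyed by AD group name (illustrative only).
# Column-level: which attributes a group may see.
ATTRIBUTE_ACCESS = {
    "Finance": {"code", "name", "credit_limit"},
    "Marketing": {"code", "name"},
}
# Row-level: which hierarchy nodes a group may see; None means all nodes.
HIERARCHY_ACCESS = {
    "Finance": None,
    "Marketing": {"West"},
}

customers = [
    {"code": "C1", "name": "Contoso", "region": "West", "credit_limit": 5000},
    {"code": "C2", "name": "Fabrikam", "region": "East", "credit_limit": 9000},
]


def visible_members(group, members):
    """Apply the row filter first, then project the permitted columns."""
    columns = ATTRIBUTE_ACCESS[group]
    regions = HIERARCHY_ACCESS[group]
    result = []
    for member in members:
        if regions is not None and member["region"] not in regions:
            continue  # row-level security: member is outside the permitted nodes
        result.append({k: v for k, v in member.items() if k in columns})
    return result


print(visible_members("Marketing", customers))
print(visible_members("Finance", customers))
```

Permissions are attached to groups rather than individuals, in line with the advice above, and the row-level table is the extra moving part you only pay for when row-level security is actually required.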
Enterprise Information Management (EIM) with SQL Server 2012
EIM=DQS+MDS+SSIS
Produce accurate, trustworthy data
Deliver credible, consistent data to the right users with end-to-end data integration, cleansing & data management
DATA QUALITY SERVICES
• Knowledge-based Data Cleansing & Matching
• Standalone & SSIS Integrated
MASTER DATA SERVICES
• Excel UI to Manage Data & Dimensions and create workflows
INTEGRATION SERVICES
• Integrated Deployment & Management
• Improved User Experience
+People+Processes
Key Takeaways
SQL Server has tools to address EIM, the biggest impediment to BI success
EIM is People + Processes enabled by Technology