Page 1: Working With Different Kinds of Data

Integrating Data from Multiple Sources

2015-02-26

David Loshin

Knowledge Integrity, Inc.

[email protected]

© 2015 Knowledge Integrity, Inc. [email protected] (301) 754-6350

Page 2: Working With Different Kinds of Data

Ingesting Data from Multiple Sources

• Continuously streamed data sources can inform business performance analytics that:

– Influence customer satisfaction

– Expose opportunities for revenue generation

– Identify brand risk

– Flag fraud and abuse

– Improve customer profiling and customer experience


Page 3: Working With Different Kinds of Data

Challenges

• Entity identifiability

• Limited or no data governance

• Editorial bias

• Absence of metadata


Page 4: Working With Different Kinds of Data

Entity Identifiability

• Recognizing and resolving identities is challenging even for static, complete data sets

• Entity identifiability becomes more challenging when merging static and streamed information:

– Entity attribute identification

– Entity recognition

– Identity resolution

– Linkage across data sets
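As a rough illustration of why linkage is hard, the following Python sketch matches a streamed record against a static customer list using only a normalized name and city. The record layout (`id`, `name`, `city`), the city-based blocking, and the 0.85 threshold are hypothetical choices made for illustration; production identity resolution typically weighs many attributes and uses probabilistic or rules-based matching rather than a single string-similarity cutoff.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    cleaned = re.sub(r"[^a-z0-9 ]", " ", ascii_only.lower())
    return " ".join(cleaned.split())


def name_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_streamed_record(static_records, streamed, threshold=0.85):
    """Return ids of static records that plausibly match a streamed record.

    A candidate must agree exactly on normalized city (a crude blocking key)
    and reach the (arbitrary, hypothetical) name-similarity threshold.
    """
    matches = []
    for record in static_records:
        if normalize(record["city"]) != normalize(streamed["city"]):
            continue  # different block: skip the name comparison
        if name_similarity(record["name"], streamed["name"]) >= threshold:
            matches.append(record["id"])
    return matches


if __name__ == "__main__":
    customers = [
        {"id": 1, "name": "Jon A. Smith", "city": "Baltimore"},
        {"id": 2, "name": "Mary Jones", "city": "Baltimore"},
        {"id": 3, "name": "Jon Smith", "city": "Denver"},
    ]
    event = {"name": "jon a smith", "city": "BALTIMORE"}
    print(link_streamed_record(customers, event))  # prints [1]
```

`difflib.SequenceMatcher.ratio()` is a general-purpose string similarity measure from the Python standard library, not a purpose-built name matcher, so nicknames ("Jon" vs. "Jonathan") or records filed under a different city may still slip through.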


Is this the same guy?

Page 5: Working With Different Kinds of Data

Limited or No Data Governance

• Little or no knowledge of:

– Defined data quality criteria

– Edits or controls

– Chain of accountability

• Limited shared definitions:

– Typically tabular data dictionaries with nondescript definitions

• Harvested data has no discernible lineage:

– Completely devoid of context or production chain


Page 6: Working With Different Kinds of Data

Editorial Bias

• Creating data sets for external consumption involves editorial decisions and biases

• Choices are made about:

– The physical structure of the data values

– Which data elements are included

– Which are excluded from the final artifact


Selection criteria

Page 7: Working With Different Kinds of Data

Absence of Metadata

• Numerous data sources have little or no metadata at all:

– Dynamically harvested tabular data

– Scraped data

– Human-generated content

– Automata-generated content

– Unstructured data artifacts

– Other data artifacts (graphics, images, video, audio, etc.)


Page 8: Working With Different Kinds of Data

Example: Healthcare Provider Data

• NPPES: Provider First Line Business Mailing Address

• Definition: “provider’s first line business mailing address”

• Open Payments: Recipient_Primary_Business_Street_Address_Line_1

• Definition: “The first line of the primary practice/business street address of the physician or teaching hospital (covered recipient) receiving the payment or other transfer of value.”


• Is “provider” the same as “recipient”?

• Are these conformable data elements?

• Actually, it turns out that the Open Payments data element is sourced from the NPPES data set!
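A quick lexical comparison can flag the two address definitions above as candidates for review, but it also shows why lexical overlap alone cannot settle the question. The Python sketch below computes a token-level Jaccard similarity between the two quoted definitions; the stop-word list and the choice of Jaccard similarity are illustrative assumptions, not part of either data set's documentation.

```python
import re

# A deliberately tiny stop-word list (hypothetical); a real one would be larger.
STOP_WORDS = {"the", "of", "or", "and", "a", "an", "to", "s"}


def tokens(definition: str) -> set:
    """Lowercased alphabetic tokens, minus stop words."""
    words = re.findall(r"[a-z]+", definition.lower())
    return {w for w in words if w not in STOP_WORDS}


def jaccard(a: str, b: str) -> float:
    """Shared distinct tokens divided by all distinct tokens."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


NPPES_DEF = "provider’s first line business mailing address"
OPEN_PAYMENTS_DEF = (
    "The first line of the primary practice/business street address of the "
    "physician or teaching hospital (covered recipient) receiving the payment "
    "or other transfer of value."
)

if __name__ == "__main__":
    print(sorted(tokens(NPPES_DEF) & tokens(OPEN_PAYMENTS_DEF)))
    # prints ['address', 'business', 'first', 'line']
    print(round(jaccard(NPPES_DEF, OPEN_PAYMENTS_DEF), 3))  # prints 0.211
```

The shared tokens are "first", "line", "business", and "address", yet "provider" and "recipient" never overlap. Only the lineage fact (Open Payments sourcing the element from NPPES) actually establishes the connection.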

Page 9: Working With Different Kinds of Data

Preparing to Integrate

• Infer the source data sets’ metadata

• Determine if the data element inventories are structurally conformable

• Determine if the data element inventories are semantically conformable
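The structural check can be partly automated once metadata has been inferred. The Python sketch below compares two data element inventories and keeps only the pairs whose inferred type matches and whose maximum lengths fall within a tolerance. The `InferredElement` shape, the element names, and the tolerance of 10 are hypothetical; semantic conformability still requires comparing definitions, which this code does not attempt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferredElement:
    """Structural metadata inferred (e.g., by profiling) for one data element."""
    name: str
    inferred_type: str  # e.g. "string", "integer", "decimal"
    max_length: int


def structurally_conformable(a, b, length_tolerance=10):
    """First-cut test: same inferred type and max lengths within a tolerance."""
    return (a.inferred_type == b.inferred_type
            and abs(a.max_length - b.max_length) <= length_tolerance)


def candidate_pairs(inventory_a, inventory_b, length_tolerance=10):
    """Every cross-inventory pair that passes the structural test.

    Passing pairs are only candidates; they still need a semantic review.
    """
    return [
        (a.name, b.name)
        for a in inventory_a
        for b in inventory_b
        if structurally_conformable(a, b, length_tolerance)
    ]


if __name__ == "__main__":
    source_a = [InferredElement("addr_line_1", "string", 55),
                InferredElement("zip", "string", 9)]
    source_b = [InferredElement("street_1", "string", 55),
                InferredElement("amount", "decimal", 12)]
    print(candidate_pairs(source_a, source_b))  # prints [('addr_line_1', 'street_1')]
```

Comparing every pair is quadratic in inventory size, which is acceptable for a handful of sources but would call for indexing by inferred type at larger scale.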


Page 10: Working With Different Kinds of Data

Inferring Metadata Using Profiling

• Analysis of data sets, records, data elements, and data values to:

– Infer data element types and sizes

– Identify reference value domains

– Make educated guesses about intent/meaning
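The structural side of profiling is straightforward to sketch. The Python function below infers a dominant type, value lengths, a null count, and a frequency table for one column, and treats a column with few distinct values as a candidate reference value domain. The sample state codes and the `domain_threshold` of 20 are hypothetical; making educated guesses about intent and meaning remains a human task.

```python
import re
from collections import Counter


def infer_type(value: str) -> str:
    """Very small type inferencer; real profilers recognize many more patterns."""
    if re.fullmatch(r"-?\d+", value):
        return "integer"
    if re.fullmatch(r"-?\d+\.\d+", value):
        return "decimal"
    return "string"


def profile_column(values, domain_threshold=20):
    """Profile one column of raw values (None or blank counts as null)."""
    present = [v.strip() for v in values if v is not None and v.strip()]
    type_counts = Counter(infer_type(v) for v in present)
    frequencies = Counter(present)
    return {
        "null_count": len(values) - len(present),
        "dominant_type": type_counts.most_common(1)[0][0] if type_counts else None,
        "min_length": min((len(v) for v in present), default=0),
        "max_length": max((len(v) for v in present), default=0),
        "distinct_count": len(frequencies),
        # Few distinct values hints at a reference domain (e.g., state codes).
        "candidate_domain": (sorted(frequencies)
                             if len(frequencies) <= domain_threshold else None),
        "frequencies": frequencies.most_common(),
    }


if __name__ == "__main__":
    state = ["MD", "VA", "MD", "", None, "DC", "MD"]
    for key, value in profile_column(state).items():
        print(key, value)
```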


[Figure: Profiling. Attribute profile for First, Last, Street, City, and State, with a Value/Count frequency table for State]

Page 11: Working With Different Kinds of Data

Conformable Data Elements

• Data elements are conformable if they:

– Share the same data element concept

– Share the same value domain

– Share the same definition and semantics


• These two data elements are conformable if their definitions are the same!

CountryOfOrigin: 2-character ISO 3166 Country Code

CountryOfManufacture: 2-character ISO 3166 Country Code
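The three conformability criteria translate directly into a comparison over element descriptors. In this Python sketch, `DataElement`, the concept label, and the two definitions are hypothetical, and definitions are compared by normalized string equality, which is far cruder than a real semantic review. It illustrates the point above: both elements share the same value domain, yet they are not conformable because their definitions differ.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    name: str
    concept: str       # the data element concept (hypothetical label)
    value_domain: str  # e.g. "2-character ISO 3166 Country Code"
    definition: str


def _norm(text: str) -> str:
    """Lowercase and collapse whitespace before comparing."""
    return " ".join(text.lower().split())


def conformable(a: DataElement, b: DataElement) -> bool:
    """Same concept, same value domain, and same definition."""
    return (_norm(a.concept) == _norm(b.concept)
            and _norm(a.value_domain) == _norm(b.value_domain)
            and _norm(a.definition) == _norm(b.definition))


ALPHA2 = re.compile(r"[A-Z]{2}")


def well_formed_country_code(code: str) -> bool:
    """Format check only (two uppercase letters); it does not confirm the
    code is actually assigned in ISO 3166."""
    return ALPHA2.fullmatch(code) is not None


DOMAIN = "2-character ISO 3166 Country Code"
origin = DataElement("CountryOfOrigin", "product country", DOMAIN,
                     "Country in which the product was grown or produced")
manufacture = DataElement("CountryOfManufacture", "product country", DOMAIN,
                          "Country in which the product was assembled")

if __name__ == "__main__":
    print(conformable(origin, manufacture))  # prints False
```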

Page 12: Working With Different Kinds of Data

Using Metadata to Test Conformability

• Inferred structural metadata provides the first cut at determining whether two data elements are conformable

• Introduce internal governance and management around external metadata:

– Use a metadata repository to capture inferred metadata

– Define policies for identification, assessment, documentation, and use of external data sources

– Institute stewardship for each external data source for process management, validation, and maintenance

• Select a metadata tool that provides:

– Enterprise-wide metadata visibility

– Integration with data assessment tools

– Historical lineage for metadata capture

– Collaboration among data consumers
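One way to picture a metadata repository that captures inferred metadata with historical lineage is an append-only log of captures per source and element, each tagged with a steward. The Python sketch below is a hypothetical in-memory stand-in, not the API of ER/Studio Team Server or any other metadata tool; the class names, fields, and sample values are illustrative assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class MetadataEntry:
    source: str
    element: str
    attributes: dict  # inferred metadata, e.g. type, length, candidate domain
    steward: str      # person accountable for this external source
    captured_at: datetime


@dataclass
class MetadataRepository:
    """Append-only, in-memory store: every capture is kept as history."""
    history: List[MetadataEntry] = field(default_factory=list)

    def capture(self, source, element, attributes, steward):
        """Record a new capture; attributes are copied so later edits don't leak in."""
        entry = MetadataEntry(source, element, dict(attributes), steward,
                              datetime.now(timezone.utc))
        self.history.append(entry)
        return entry

    def lineage(self, source, element) -> List[MetadataEntry]:
        """All captures for one element, oldest first."""
        return [e for e in self.history
                if e.source == source and e.element == element]

    def current(self, source, element) -> Optional[MetadataEntry]:
        """Most recent capture for one element, or None if never captured."""
        captures = self.lineage(source, element)
        return captures[-1] if captures else None


if __name__ == "__main__":
    repo = MetadataRepository()
    repo.capture("src_a", "addr_line_1", {"type": "string", "max_length": 50}, "steward_1")
    repo.capture("src_a", "addr_line_1", {"type": "string", "max_length": 55}, "steward_1")
    print(repo.current("src_a", "addr_line_1").attributes)
    print(len(repo.lineage("src_a", "addr_line_1")))  # prints 2
```

Keeping every capture rather than overwriting is what makes it possible to see when an external source's inferred structure drifted.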


Page 13: Working With Different Kinds of Data

Questions & Suggestions

• www.knowledge-integrity.com

• www.dataqualitybook.com

• www.decisionworx.com

• If you have questions, comments, or suggestions, please contact me

David Loshin

301-754-6350

[email protected]


Page 14: Working With Different Kinds of Data

EMBARCADERO TECHNOLOGIES

Joy Ruff, Product Marketing Manager | ER/Studio | [email protected]

ER/Studio Team Server Overview

Page 15: Working With Different Kinds of Data


Keeping pace with the rapid growth of data, change and compliance

• Evolving Database Ecosystems

• Volume, Velocity, Variety

• Agile Development Cycles

• Maximizing IT Infrastructure

• Compliance

• Limited Resources

Database Professionals Need the Right Tools


Page 16: Working With Different Kinds of Data


Share Models & Metadata with Business & IT


Team Server

ER Repository

Modeling Teams

• Business Analysts

• Executives

• App and DB Developers

• Data Stewards

• DBAs

Page 17: Working With Different Kinds of Data


• Powerful enterprise glossary & metadata collaboration

• Integrate key business terms and definitions with business systems

• View, store, and manage a single source of business definitions

• Attach business policies to daily workflows with contextual alerts and tips

Page 18: Working With Different Kinds of Data


The Power of Unlimited Involvement

• Use business terms to easily locate and relate information assets

• Maintain enterprise glossaries, terms, and underlying metadata in a central interface

• Enable a consistent flow of information and collaboration around data management

[Diagram: Contributors (Business, Architecture, IT; Definition, Structure, Deployment), Syndication, Collaboration, Integration, and Consumers (Executive, Analyst, Developer)]

Page 19: Working With Different Kinds of Data


Benefit of Relating Metadata to Models

• Expand the depth of information by accessing the underlying framework

• Models and terms seamlessly integrate with one another

Page 20: Working With Different Kinds of Data


The Primary Resource for Data Information


• Manage a single source of business definitions in an enterprise glossary

• Avoid the issue of information stagnation

• Improve productivity and accuracy in data analysis, application, BI and ETL development

Page 21: Working With Different Kinds of Data


Data Source Registry

Page 22: Working With Different Kinds of Data


Unified Glossary and Terms


Page 23: Working With Different Kinds of Data


Empowering the Organization


© 2014 Embarcadero Technologies, Inc. Embarcadero, the Embarcadero Technologies logos, and all other Embarcadero Technologies product or service names are trademarks or registered trademarks of Embarcadero Technologies, Inc. All other trademarks are property of their respective owners. | 102714

Embarcadero Technologies has been committed to developing industry-leading tools in the database management and architecture space for over 20 years. Our ER/Studio Team Server Core environment is the next step on that journey, offering modeling and metadata collaboration and management. Your IT and business users gain deeper visibility into existing data assets, so those assets can be leveraged as the critical decision-making assets they can and should be across the enterprise. If you found Portal to be useful, you’re going to love Team Server Core.

The added functionality of Team Server Core includes unlimited web user read/write access, so that all stakeholders in the company can contribute to and access the critical models, metadata, and the enterprise data dictionary (glossary). Security and data rights management have also been enhanced, so you can have complete confidence that your data is protected and shared with the right people, at the right times, and in the right formats.

Product Feature: Definition

• Inline Definitions: Integrate enterprise business definitions with data management tools and internal web assets into daily workflows

• Privacy and Security Alerts: Adhere to industry regulations and business standards regarding security and privacy by alerting users who view or modify sensitive data within integrated data management tools

• Semantic Mapping: Develop applications and analyses faster by using business terms to easily find data elements

• Mapped Data Source Registry: Generate information maps by relating data models with their data sources and creating a single searchable registry of all available data sources to store information in one place

• Centralized Reporting: Create and share integrated reports using standard templates and a reporting wizard for ad hoc reports

• Team Collaboration: Apply enterprise collaboration capabilities to capture and use corporate knowledge to reduce time identifying and correcting expensive data quality issues

• Model Sharing: Distribute and view models across the organization, and set permissions for visibility of objects

• Enterprise Glossary: View, classify, relate, and centrally store authoritative business definitions in an extensible enterprise glossary

• Custom Extensions: Enhance comprehension of business terms and data elements with custom extensions

• Unlimited Access to Metadata: View, share, and update the enterprise glossaries, business terms, and custom attributes via the web interface, for any business or IT user

• Limit the level of confusion by centralizing glossaries, terms, and object relationships

• Discuss and add to the development of models and metadata

• Track and gain insight into who has changed what information in the environment

Page 24: Working With Different Kinds of Data


Team Collaboration

Page 25: Working With Different Kinds of Data


The Right Tools are Everything

Discover the Benefits of the Ultimate Cross-Platform Database Tools


Page 26: Working With Different Kinds of Data


Thank you!

• Learn more about the ER/Studio product family: http://www.embarcadero.com/data-modeling

• Team Server Hosted Trial: http://www.embarcadero.com/products/er-studio/team-server-hosted-trial

• To arrange a demo, please contact Embarcadero Sales: [email protected], (888) 233-2224
