Page 1: Data Profiling .

• Data Profiling

https://store.theartofservice.com/the-data-profiling-toolkit.html

Page 2: Data Profiling .

Business intelligence - Amount and quality of available data

Before implementation, it is a good idea to profile the data.


Page 3: Data Profiling .

Business intelligence - Amount and quality of available data

Data profiling: check for inappropriate values and for null or empty fields.
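A minimal sketch of such a check in Python, assuming the rows arrive as dictionaries. The `profile_column` helper, the sample `rows`, and the rule that a plausible age lies between 0 and 120 are illustrative assumptions, not something the slides specify.

```python
from typing import Any, Callable, Iterable


def profile_column(
    rows: Iterable[dict[str, Any]],
    column: str,
    is_valid: Callable[[Any], bool],
) -> dict[str, int]:
    """Count null, empty, and inappropriate values in one column."""
    counts = {"total": 0, "null": 0, "empty": 0, "invalid": 0}
    for row in rows:
        counts["total"] += 1
        value = row.get(column)
        if value is None:
            counts["null"] += 1
        elif isinstance(value, str) and value.strip() == "":
            counts["empty"] += 1
        elif not is_valid(value):
            counts["invalid"] += 1
    return counts


def plausible_age(value: Any) -> bool:
    """Hypothetical business rule: an age is an integer from 0 to 120."""
    return isinstance(value, int) and 0 <= value <= 120


# Hypothetical sample data standing in for a source table.
rows = [
    {"age": 34},
    {"age": None},
    {"age": ""},
    {"age": -5},
    {"age": 230},
    {"age": 61},
]

report = profile_column(rows, "age", plausible_age)
print(report)
```

The counts are kept separate because each category usually calls for a different fix: nulls and empties point at missing capture, while invalid values point at a rule or entry problem.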


Page 4: Data Profiling .

Data quality - Overview

Data profiling - initially assessing the data to understand its quality challenges


Page 5: Data Profiling .

Extract, transform, load - Challenges

The range of data values or the data quality in an operational system may exceed the designers' expectations at the time validation and transformation rules are specified. Profiling a source during data analysis can identify the data conditions that the transformation rule specifications must manage. This leads to amendments of the validation rules implemented, explicitly or implicitly, in the ETL process.
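The idea can be sketched in Python as a comparison between a rule as originally specified and what profiling actually observes in the source. The `RangeRule` class, the `quantity` column, and the sample rows are hypothetical and serve only to illustrate the point.

```python
from dataclasses import dataclass


@dataclass
class RangeRule:
    """A validation rule as the designers originally specified it (hypothetical)."""
    column: str
    low: float
    high: float


def profile_range(rows, column):
    """Return the observed (min, max) of a numeric column, ignoring missing values."""
    values = [row[column] for row in rows if row.get(column) is not None]
    return min(values), max(values)


def rule_violations(rows, rule):
    """List the source values that the current rule would reject."""
    return [
        row[rule.column]
        for row in rows
        if row.get(rule.column) is not None
        and not (rule.low <= row[rule.column] <= rule.high)
    ]


# Hypothetical extract from an operational system.
source_rows = [
    {"quantity": 3},
    {"quantity": 12},
    {"quantity": -2},    # e.g. a return booked as a negative quantity
    {"quantity": None},
    {"quantity": 1500},  # e.g. a bulk order nobody anticipated
]

rule = RangeRule(column="quantity", low=0, high=1000)
observed = profile_range(source_rows, "quantity")
rejected = rule_violations(source_rows, rule)

print("observed range:", observed)
print("rejected by current rule:", rejected)
```

Whether the rule is widened or the rejected values are routed to a cleansing step is a design decision; the profile only shows that the original expectation does not hold.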


Page 6: Data Profiling .

Extract, transform, load - Virtual ETL

By using a persistent metadata repository, ETL tools can transition from one-time projects to persistent middleware, performing data harmonization and data profiling consistently and in near-real time.


Page 7: Data Profiling .

Extract, transform, load - Tools

Many ETL vendors now have data profiling, data quality, and metadata capabilities.


Page 8: Data Profiling .

Data profiling

Data profiling is the process of examining the data available in an existing data source (e.g. a database or a file) and collecting statistics and information about that data. The purpose of these statistics may be to:


Page 9: Data Profiling .

Data profiling - Introduction

Thus the purpose of data profiling is both to validate metadata when it is available and to discover metadata when it is not.
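Both uses can be sketched in Python with a small type-inference pass over raw string values. The helper names, the three-type lattice (`int`, `float`, `str`), and the sample rows are assumptions made for illustration; purpose-built profiling tools infer far richer metadata.

```python
def infer_type(values):
    """Infer the narrowest of int, float, or str that fits every non-empty value."""
    def fits(cast):
        for value in values:
            if value == "":
                continue  # empty fields carry no type information
            try:
                cast(value)
            except ValueError:
                return False
        return True

    if fits(int):
        return "int"
    if fits(float):
        return "float"
    return "str"


def discover_metadata(rows):
    """Discover a column -> type mapping when no metadata is available."""
    columns = rows[0].keys()
    return {column: infer_type([row[column] for row in rows]) for column in columns}


def validate_metadata(declared, discovered):
    """Return columns whose declared type disagrees with the discovered one."""
    return {
        column: (declared[column], discovered.get(column))
        for column in declared
        if declared[column] != discovered.get(column)
    }


# Hypothetical rows as they might come out of a CSV file (all strings).
rows = [
    {"id": "1", "price": "9.99", "zip": "02139"},
    {"id": "2", "price": "12", "zip": "N/A"},
    {"id": "3", "price": "", "zip": "10001"},
]

discovered = discover_metadata(rows)
declared = {"id": "int", "price": "float", "zip": "int"}  # hypothetical documentation
mismatches = validate_metadata(declared, discovered)

print(discovered)
print(mismatches)
```

Note that a column containing only empty strings would be reported as `int` by this sketch, since nothing contradicts the narrowest type; a real profile would report it as unknown.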


Page 10: Data Profiling .

Data profiling - How to do Data Profiling

Normally, purpose-built tools are used for data profiling to ease the process.


Page 11: Data Profiling .

Data profiling - When to Conduct Data Profiling

An additional time to conduct data profiling is during the data warehouse development process, after data has been loaded into staging, the data marts, etc.


Page 12: Data Profiling .

Data profiling - Benefits of Data Profiling

Although data profiling is effective, remember to find a suitable balance and do not slip into "analysis paralysis".


Page 13: Data Profiling .

Surveillance - Data mining and profiling

Data profiling can be an extremely powerful tool for psychological and social network analysis.


Page 14: Data Profiling .

Prototype - Data prototyping

To achieve this, a data architect uses a graphical interface to interactively develop and execute transformation and cleansing rules using raw data. The resultant data is then evaluated and the rules refined. Beyond the obvious visual checking of the data on-screen by the data architect, the usual evaluation and validation approaches are to use data profiling software and then to insert the resultant data into a test version of the target application and trial its use.


Page 15: Data Profiling .

Data loading - Virtual ETL

By using a persistent metadata repository, ETL tools can transition from one-time projects to persistent middleware, performing data harmonization and data profiling consistently and in near-real time.


Page 16: Data Profiling .

Angoss - Software

KnowledgeSEEKER is a data mining product. Its features include data profiling, data visualization and decision tree analysis (COMSOL ONLINE - ANGOSS - Knowledge Engineering, http://www.comsol.ch/content.php?si=317id=132anzeige=Angoss%20Products). It was first released in 1990.


Page 17: Data Profiling .

Angoss - Software

KnowledgeSTUDIO is a data mining and predictive analytics suite for the model development and deployment cycle. Its features include data profiling, data visualization, decision tree analysis, predictive modeling, implementation, scoring, validation, monitoring and scorecard development.


Page 18: Data Profiling .

Integration competency center - Central services ICC

It also offers more support for development projects, providing management, development resources, data profiling, data quality, and unit testing.


Page 19: Data Profiling .

IBM Infosphere - IBM InfoSphere software

IBM InfoSphere Information Analyzer, to profile and track data quality (IBM - Data Profiling, Data Rules and Quality Monitoring - InfoSphere Information Analyzer - Software, http://www-01.ibm.com/software/data/infosphere/information-analyzer/).


Page 20: Data Profiling .

Talend - Data management

Talend Open Studio for Data Quality: an open source data profiling tool that examines the content, structure and quality of complex data structures.


Page 21: Data Profiling .

Oracle Warehouse Builder - Features

Further, it offers capabilities for relational, dimensional and metadata data modeling, data profiling, data cleansing and data auditing.


Page 22: Data Profiling .

Oracle Warehouse Builder - History

The 10gR1 release was essentially a certification of the 10g database, and the 10gR2 release (code-named Paris) was a huge release incorporating a wide spectrum of functionality, from dimensional modelling to data profiling and quality.


Page 23: Data Profiling .

Data quality assurance

Data quality assurance is the process of profiling the data to discover inconsistencies and other anomalies in the data, as well as performing data cleansing activities (e.g. removing outliers, missing data interpolation) to improve the data quality.
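The two cleansing activities named above can be sketched in Python as follows. The outlier rule (distance from the median measured in median absolute deviations, with a threshold of 3) and linear interpolation are assumptions chosen for illustration; the slide does not prescribe a particular method, and the sample readings are invented.

```python
from statistics import median


def remove_outliers(values, threshold=3.0):
    """Replace values far from the median with None (MAD-based rule, an assumption)."""
    present = [v for v in values if v is not None]
    med = median(present)
    mad = median([abs(v - med) for v in present])
    if mad == 0:
        return list(values)  # no spread to measure against; leave data untouched
    return [
        None if v is not None and abs(v - med) / mad > threshold else v
        for v in values
    ]


def interpolate_missing(values):
    """Fill interior None gaps by linear interpolation between known neighbours."""
    result = list(values)
    known = [i for i, v in enumerate(result) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (result[right] - result[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = result[left] + step * (i - left)
    return result  # leading or trailing gaps are left as None


# Hypothetical sensor readings with one gap and one implausible spike.
readings = [10.0, 11.0, None, 13.0, 500.0, 15.0, 16.0]

without_outliers = remove_outliers(readings)
cleaned = interpolate_missing(without_outliers)

print(without_outliers)
print(cleaned)
```

Outliers are removed before interpolation so that the spike does not distort the values filled into neighbouring gaps.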


Page 24: Data Profiling .

Data quality assurance - Overview

Data profiling - initially assessing the data to understand its quality challenges


Page 25: Data Profiling .

Data movement - Virtual ETL

By using a persistent metadata repository, ETL tools can transition from one-time projects to persistent middleware, performing data harmonization and data profiling consistently and in near-real time.


Page 26: Data Profiling .

Jumper 2.0 - Features

User-published data profiling


Page 27: Data Profiling .

Information Server - Architecture overview

Understand: data profiling and metadata creation to understand the content, quality, and structure of information as it resides in source systems.


Page 28: Data Profiling .

Information Server - History

The core technologies of an information server are not new. Data integration technologies like extract, transform, and load (ETL), data cleansing and matching (both relational and probabilistic approaches), data profiling, and data federation or replication have been around for many years. Reputable vendors and several discrete but inter-related markets focus on solutions for these differing styles of data integration (ETL, data quality, data replication, data federation, etc.).


Page 29: Data Profiling .

Covert surveillance - Data mining and profiling

Data profiling can be an extremely powerful tool for psychological and social network analysis.
