© 2013 IBM Corporation. Experience Big Data and Data Quality for Yourself: Hands-on Workshop on IBM InfoSphere Information Server. Reto Cavegn
Hands-on Workshop on IBM InfoSphere Information Server · 2013-11-20 · Reto Cavegn · Integrating and transforming data

May 05, 2018

Page 1

© 2013 IBM Corporation

Experience Big Data and Data Quality for Yourself

Hands-on Workshop on IBM InfoSphere Information Server

Reto Cavegn

Page 2

Agenda

12:00 - 12:15 Introduction to IBM Information Server and Data Quality

12:15 - 12:55 Lab 1: Review the Completeness of your Data

13:00 - 13:15 Lab 2: Transforming your data with InfoSphere DataStage and BigInsights

13:15 - 13:30 Review and Q&A

Page 3

Information Server Capabilities Address Each of the Requirements for Information Integration

Business Information Exchange: Understanding & Collaboration
• Information blueprints
• Relationship discovery across data sources
• IT-to-business mapping

Data Quality: Cleansing & Monitoring
• Analysis & validation
• Data cleansing
• Data quality rules & management

Data Integration

Transformation
• Massive scalability
• Power for any complexity
• Total traceability

Delivery
• Data capture at any time
• Delivery anywhere
• Big data readiness

InfoSphere Information Server Enterprise Edition: Integrating and transforming data and content to deliver accurate, consistent, timely and complete information on a single platform unified by a common metadata layer

Page 4

The IBM Big Data Zone Architecture

[Diagram. Labels: Data in Motion; Data at Rest; Data in Many Forms; Decision Management; BI and Predictive Analytics; Navigation and Discovery; Intelligence Analysis; Information Governance, Security and Business Continuity; Real-time Analytics; Streams; Landing, Analytics and Archive; MapReduce; Hadoop; Integrated Exploration; Warehouse / Marts; Ingestion and Integration; ETL, Quality, MDM]

Page 5

Information Server Capabilities Address Each of the Requirements for Information Integration


Page 6

Data Quality: Cleanse Data and Monitor Quality, Turning Data Assets into Trusted Information

Analyze data and control data quality
Cleanse & Monitor Quality

Analyze
Use source system analysis to understand your issues
• Automated discovery of critical data and hidden data relationships

Control & monitor quality
Assess and monitor the quality of your data in any place (database or data flow) and across systems
• Unique capability to align DQ metrics with business & governance objectives

Page 7

Information Server Capabilities Address Each of the Requirements for Information Integration


Page 8

Programming Hadoop

// Name the MapReduce job and run it with a single reducer.
setOptions({conf:{"mapred.job.name":"DataStage BalOp job IS-SERVER.IBM.COM:dstage1 ff_read_hadoop_write_db2_jaql_balopt_join Customer_source2 4_#DSJobInvocationId#"}});
setOptions({conf:{"mapred.reduce.tasks":1}});

// Read the same comma-delimited customer file twice (unquoted and quoted).
PassCustomer = read(del(location='/user/dsadm/customer', delimiter=',',
    quoted=false, schema=schema {CUSTOMER_NUMBER:string, COUNTRY:string, LANGUAGE:string}));

PassCustomer2 = read(del(location='/user/dsadm/customer', delimiter=',',
    quoted=true, schema=schema {CUSTOMER_NUMBER:string, COUNTRY:string, LANGUAGE:string}));

// Join both inputs on CUSTOMER_NUMBER and sort the result ascending.
DSLink15 = join PassCustomer2, PassCustomer
    where PassCustomer2.CUSTOMER_NUMBER == PassCustomer.CUSTOMER_NUMBER
    into {CUSTOMER_NUMBER:PassCustomer2.CUSTOMER_NUMBER, COUNTRY:PassCustomer2.COUNTRY, LANGUAGE:PassCustomer2.LANGUAGE}
    -> sort by [$.CUSTOMER_NUMBER asc];

// Replace null values with empty strings.
DSLink15_3 = DSLink15
    -> transform {CUSTOMER_NUMBER: (if (isnull($.CUSTOMER_NUMBER)) '' else $.CUSTOMER_NUMBER),
                  COUNTRY: (if (isnull($.COUNTRY)) '' else $.COUNTRY),
                  LANGUAGE: (if (isnull($.LANGUAGE)) '' else $.LANGUAGE)};

// Write the result to a temporary HDFS location.
DSLink15_3
    -> write({location:'/tmp/BalOpTmp_2_#DSJobInvocationId#',
              outoptions:{type:'hdfs',
                          adapter:'com.ibm.jaql.io.stream.FileStreamOutputAdapter',
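
For readers unfamiliar with Jaql, the data flow in the listing on this slide (read a delimited file, join on CUSTOMER_NUMBER, sort ascending, replace nulls with empty strings) can be sketched in plain Python. This is only an illustration under assumptions: the function names `read_delimited` and `join_sort_fill` and the sample data are hypothetical, the input is an in-memory string rather than an HDFS file, and empty fields are treated as nulls. It is not what DataStage or BigInsights executes.

```python
import csv
import io

FIELDS = ["CUSTOMER_NUMBER", "COUNTRY", "LANGUAGE"]


def read_delimited(text, quoted):
    """Parse comma-delimited text into dicts; empty fields become None (assumption)."""
    quoting = csv.QUOTE_MINIMAL if quoted else csv.QUOTE_NONE
    reader = csv.reader(io.StringIO(text), delimiter=",", quoting=quoting)
    return [
        {k: (v if v != "" else None) for k, v in zip(FIELDS, row)}
        for row in reader
        if row  # skip blank lines
    ]


def join_sort_fill(left, right):
    """Inner-join on CUSTOMER_NUMBER, keep the left columns, sort, fill nulls."""
    # Count how often each key occurs on the right side; null keys never match.
    right_counts = {}
    for rec in right:
        key = rec["CUSTOMER_NUMBER"]
        if key is not None:
            right_counts[key] = right_counts.get(key, 0) + 1

    joined = []
    for rec in left:
        key = rec["CUSTOMER_NUMBER"]
        if key is None:
            continue
        # One output record per matching right-side record.
        joined.extend(dict(rec) for _ in range(right_counts.get(key, 0)))

    joined.sort(key=lambda rec: rec["CUSTOMER_NUMBER"])
    return [{k: ("" if rec[k] is None else rec[k]) for k in FIELDS} for rec in joined]


if __name__ == "__main__":
    sample = "1003,US,en\n1001,CH,de\n1002,DE,\n"  # hypothetical sample data
    customers = read_delimited(sample, quoted=False)
    for row in join_sort_fill(customers, customers):
        print(row)
```

The sketch joins the input with itself, mirroring the slide, where both inputs read the same location.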

Page 9

Intuitive Job Design for Big Data Processing

Job name on image: uc3_ff_read_write_to_hadoop_jaql_balopt_join

Job name on image: uc3a_ff_read_write_to_hadoop_jaql_balopt_join_Optimized1

The Entire E-T-L Process Transformed into Target JAQL Queries

BDFS stage properties

Discovering the Value of IBM InfoSphere Information Server

Page 10

General Information

Login

• Windows user / password: Administrator / inf0server

• Information Server user / password: isadmin / inf0server

Page 11

Lab 1: Review the Completeness of your Data

Duration: around 45 minutes

Start the Information Analyzer Console by double-clicking on the Information Server Console icon on the Desktop.
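
As background for this lab, "completeness" of a column is commonly understood as the share of records in which that column has a value. The following is a minimal Python sketch of that idea, not a description of how Information Analyzer computes it; the function name `completeness` and the sample records are hypothetical, and both `None` and the empty string are assumed to count as missing.

```python
def completeness(records, columns):
    """Return {column: fraction of records with a non-missing value}.

    Assumption: None and the empty string both count as missing.
    """
    total = len(records)
    result = {}
    for col in columns:
        filled = sum(1 for rec in records if rec.get(col) not in (None, ""))
        result[col] = filled / total if total else 0.0
    return result


if __name__ == "__main__":
    # Hypothetical sample records using the column names from the workshop data.
    customers = [
        {"CUSTOMER_NUMBER": "1001", "COUNTRY": "CH", "LANGUAGE": "de"},
        {"CUSTOMER_NUMBER": "1002", "COUNTRY": "DE", "LANGUAGE": ""},
        {"CUSTOMER_NUMBER": "1003", "COUNTRY": None, "LANGUAGE": "en"},
        {"CUSTOMER_NUMBER": "1004", "COUNTRY": "US", "LANGUAGE": None},
    ]
    for column, share in completeness(customers, ["CUSTOMER_NUMBER", "COUNTRY", "LANGUAGE"]).items():
        print(f"{column}: {share:.0%}")
```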

Page 12

Lab 2: Transforming your data with InfoSphere DataStage and BigInsights

Duration: around 15 minutes

Start at Lab 2 / Chapter 2.1, Step _94, by double-clicking the InfoSphere DS/QS Designer Client icon on the desktop.

The lab ends at Lab 2 / Chapter 2.1, Step _102.

Page 13

Validating Data Rules in InfoSphere Information Server

• Embed Data Rule Definitions in jobs

• Create new data rules through the InfoSphere Information Server Designer

• Enables an integrated and comprehensive development environment for InfoSphere Information Server

Page 14

Data Validation: InfoSphere enables integrated data quality management and monitoring

• Data load jobs use Data Rule Definitions to ensure seamless data quality management.

• You can view exceptions in real-time in your Data Quality Dashboard.
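
The idea of a load job that applies rule definitions and routes failing records to an exception output can be sketched generically in Python. This is an illustration under assumptions, not the InfoSphere rule syntax or API: the rule names, the `RULES` dictionary, and the `validate` function are all hypothetical.

```python
import re

# Hypothetical rule definitions: each maps a rule name to a predicate on a record.
RULES = {
    "customer_number_present": lambda rec: bool(rec.get("CUSTOMER_NUMBER")),
    "country_is_two_letter_code": lambda rec: re.fullmatch(
        r"[A-Z]{2}", rec.get("COUNTRY") or ""
    ) is not None,
}


def validate(records, rules):
    """Split records into those passing all rules and exceptions with the failed rule names."""
    passed, exceptions = [], []
    for rec in records:
        failed = [name for name, check in rules.items() if not check(rec)]
        if failed:
            exceptions.append({"record": rec, "failed_rules": failed})
        else:
            passed.append(rec)
    return passed, exceptions


if __name__ == "__main__":
    # Hypothetical sample records.
    sample = [
        {"CUSTOMER_NUMBER": "1001", "COUNTRY": "CH"},
        {"CUSTOMER_NUMBER": "", "COUNTRY": "DE"},
        {"CUSTOMER_NUMBER": "1003", "COUNTRY": "Switzerland"},
    ]
    good, bad = validate(sample, RULES)
    print(len(good), "passed;", len(bad), "exceptions")
    for exc in bad:
        print(exc["record"], "->", exc["failed_rules"])
```

Keeping the failed rule names with each exception record is what would let a dashboard group exceptions by rule.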

Page 15

Thank You!

Page 16

ibm.com/bigdata