© 2013 IBM Corporation
Experience Big Data and Data Quality First-Hand
Hands-on Workshop on IBM InfoSphere Information Server
Reto Cavegn
Agenda
12:00 - 12:15 Introduction to IBM Information Server and Data Quality
12:15 - 12:55 Lab 1: Review the Completeness of your Data
13:00 - 13:15 Lab 2: Transforming your data with InfoSphere DataStage and BigInsights
13:15 - 13:30 Review and Q&A
Information Server Capabilities Address Each of the Requirements for Information Integration
Business Information Exchange: Understanding & Collaboration
• Information blueprints
• Relationship discovery across data sources
• IT-to-business mapping
Data Quality: Cleansing & Monitoring
• Analysis & validation
• Data cleansing
• Data quality rules & management
Data Integration
Transformation
• Massive scalability
• Power for any complexity
• Total traceability
Delivery
• Data capture at any time
• Delivery anywhere
• Big data readiness
InfoSphere Information Server Enterprise Edition: Integrating and transforming data and content to deliver accurate, consistent, timely and complete information on a single platform unified by a common metadata layer
The IBM Big Data Zone Architecture
[Figure: zone architecture diagram. Labels: Data in Motion; Data at Rest; Data in Many Forms; Real-time Analytics (Streams); Landing, Analytics and Archive (MapReduce, Hadoop); Integrated Exploration; Warehouse / Marts; Ingestion and Integration (ETL, Quality, MDM); Decision Management; BI and Predictive Analytics; Navigation and Discovery; Intelligence Analysis; Information Governance, Security and Business Continuity.]
Data Quality: Cleanse Data and Monitor Quality, Turning Data Assets into Trusted Information
Analyze data and control data quality
Cleanse & Monitor Quality
Analyze: Use source system analysis to understand your issues
• Automated discovery of critical data and hidden data relationships
Control & monitor quality: Assess and monitor the quality of your data in any place (database or data flow) and across systems
• Unique capability to align DQ metrics with business & governance objectives
Programming Hadoop
setOptions({conf:{"mapred.job.name":"DataStage BalOp job IS-SERVER.IBM.COM:dstage1
ff_read_hadoop_write_db2_jaql_balopt_join Customer_source2 4_#DSJobInvocationId#"}});
setOptions({conf:{"mapred.reduce.tasks":1}});

PassCustomer = read(del(location='/user/dsadm/customer', delimiter=',',
    quoted=false, schema=schema {CUSTOMER_NUMBER:string, COUNTRY:string, LANGUAGE:string}));
PassCustomer2 = read(del(location='/user/dsadm/customer', delimiter=',',
    quoted=true, schema=schema {CUSTOMER_NUMBER:string, COUNTRY:string, LANGUAGE:string}));

DSLink15 = join PassCustomer2, PassCustomer
    where PassCustomer2.CUSTOMER_NUMBER == PassCustomer.CUSTOMER_NUMBER
    into {CUSTOMER_NUMBER:PassCustomer2.CUSTOMER_NUMBER, COUNTRY:PassCustomer2.COUNTRY, LANGUAGE:PassCustomer2.LANGUAGE}
    -> sort by [$.CUSTOMER_NUMBER asc];

DSLink15_3 = DSLink15
    -> transform {CUSTOMER_NUMBER: (if(isnull($.CUSTOMER_NUMBER)) '' else $.CUSTOMER_NUMBER),
                  COUNTRY: (if(isnull($.COUNTRY)) '' else $.COUNTRY),
                  LANGUAGE: (if(isnull($.LANGUAGE)) '' else $.LANGUAGE)};

DSLink15_3
    -> write({location:'/tmp/BalOpTmp_2_#DSJobInvocationId#',
              outoptions:{type:'hdfs',
                          adapter:'com.ibm.jaql.io.stream.FileStreamOutputAdapter',
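For readers who do not know Jaql, the visible part of the listing (join two customer inputs on CUSTOMER_NUMBER, sort by that key, replace nulls with empty strings) can be approximated in plain Python. This is only an illustrative sketch: in-memory lists stand in for the HDFS files, the function name join_sort_and_fill is hypothetical, and it assumes that records with a null key do not match in the join. It does not reproduce what DataStage Balanced Optimization actually generates or executes.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]
FIELDS = ("CUSTOMER_NUMBER", "COUNTRY", "LANGUAGE")


def join_sort_and_fill(left: List[Record], right: List[Record]) -> List[Record]:
    """Inner-join left with right on CUSTOMER_NUMBER, sort, and blank out nulls."""
    # Count how often each non-null key occurs on the right-hand side.
    right_counts: Dict[str, int] = {}
    for rec in right:
        key = rec.get("CUSTOMER_NUMBER")
        if key is not None:
            right_counts[key] = right_counts.get(key, 0) + 1

    # Emit one output record per matching pair, keeping the left-hand fields
    # (the listing projects all three fields from PassCustomer2).
    joined: List[Record] = []
    for rec in left:
        key = rec.get("CUSTOMER_NUMBER")
        for _ in range(right_counts.get(key, 0) if key is not None else 0):
            joined.append({field: rec.get(field) for field in FIELDS})

    # Ascending sort on the join key, as in "sort by [$.CUSTOMER_NUMBER asc]".
    joined.sort(key=lambda r: r["CUSTOMER_NUMBER"])

    # Replace nulls with empty strings, as in the transform step.
    return [
        {field: ("" if r[field] is None else r[field]) for field in FIELDS}
        for r in joined
    ]
```

Calling join_sort_and_fill with two small lists of dictionaries returns the matched records in key order, with every missing COUNTRY or LANGUAGE value rendered as an empty string.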
Intuitive Job Design for Big Data Processing
Job name on image: uc3_ff_read_write_to_hadoop_jaql_balopt_join
Job name on image: uc3a_ff_read_write_to_hadoop_jaql_balopt_join_Optimized1
The Entire E-T-L Process Transformed into Target JAQL Queries
BDFS stage properties
General Information
Login
• Windows user / password: Administrator / inf0server
• Information Server user / password: isadmin / inf0server
Lab 1: Review the Completeness of your Data
Duration: around 45 minutes
Start the Information Analyzer Console by double-clicking on the Information Server Console icon on the Desktop.
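As a warm-up for the lab, the idea behind a completeness review can be sketched in a few lines of Python. The sketch assumes that "completeness" means the share of rows in which a column holds a non-null, non-blank value. The function name and the sample column names are hypothetical, and Information Analyzer's column analysis reports considerably more than this single ratio.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def completeness(rows: List[Row], columns: List[str]) -> Dict[str, float]:
    """Return, per column, the fraction of rows with a non-blank value."""
    total = len(rows)
    result: Dict[str, float] = {}
    for col in columns:
        # A value counts as present if it is neither None nor whitespace only.
        filled = sum(1 for row in rows if (row.get(col) or "").strip() != "")
        result[col] = filled / total if total else 0.0
    return result


if __name__ == "__main__":
    sample = [
        {"CUSTOMER_NUMBER": "1", "COUNTRY": "CH", "LANGUAGE": "de"},
        {"CUSTOMER_NUMBER": "2", "COUNTRY": "", "LANGUAGE": None},
    ]
    print(completeness(sample, ["CUSTOMER_NUMBER", "COUNTRY", "LANGUAGE"]))
```

A column with a low ratio is a natural candidate for closer inspection during the lab.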
Lab 2: Transforming your data with InfoSphere DataStage and BigInsights
Duration: around 15 minutes
Start at Lab 2 / Chapter 2.1, Step _94, by double-clicking the InfoSphere DS/QS Designer Client icon on the desktop.
The lab ends at Lab 2 / Chapter 2.1, Step _102.
Validating Data Rules in InfoSphere Information Server
• Embed Data Rule Definitions in jobs
• Create new data rules through the InfoSphere Information Server Designer
• Enables an integrated and comprehensive development environment for InfoSphere Information Server
Data Validation: InfoSphere enables integrated data quality management and monitoring
• Data load jobs use Data Rule Definitions to ensure seamless data quality management.
• You can view exceptions in real time in your Data Quality Dashboard.
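The pattern described on the last two slides (a load job applies a rule to every record and routes failures to an exception output) can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the rule itself, the list of valid languages, and the function names are hypothetical and are not taken from the product or the lab material.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Rule = Callable[[Record], bool]

# Hypothetical reference list used by the example rule.
VALID_LANGUAGES = {"de", "fr", "it", "en"}


def country_and_language_rule(rec: Record) -> bool:
    """Example rule: COUNTRY must be present and LANGUAGE must be a known code."""
    has_country = rec.get("COUNTRY", "").strip() != ""
    known_language = rec.get("LANGUAGE", "").strip().lower() in VALID_LANGUAGES
    return has_country and known_language


def apply_rule(records: List[Record], rule: Rule) -> Tuple[List[Record], List[Record]]:
    """Split records into those that satisfy the rule and the exceptions."""
    passed: List[Record] = []
    exceptions: List[Record] = []
    for rec in records:
        (passed if rule(rec) else exceptions).append(rec)
    return passed, exceptions
```

In this sketch the exceptions list plays the role of the records that a dashboard would surface for review, while the passed list continues to the load target.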