
IBM Infosphere Datastage Introduction Online Training

Jan 12, 2017


Kernel Training

For more details, please contact us: US: +1 718 819 9361; INDIA: +91 8099776681; Email: [email protected]

Welcome to IBM Data Stage 9.1

http://kerneltraining.com/ibm-data-stage/

DATA WAREHOUSE

A data warehouse is a copy of transaction data specifically structured for querying and reporting. This definition of the data warehouse focuses on data storage. An expanded definition of data warehousing also includes business intelligence tools, tools to extract, transform and load (ETL) data into the repository, and tools to manage and retrieve metadata.

A data warehouse can be normalized or denormalized. It can be a relational database, a multidimensional database, a flat file, a hierarchical database, an object database, and so on. Data in a data warehouse is often changed, and a data warehouse often focuses on a specific activity or entity.
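To make "a copy of transaction data structured for reporting" concrete, here is a minimal Python sketch. It assumes two small in-memory tables (CUSTOMERS and ORDERS, both hypothetical) that stand in for a normalized source system, and copies them into one denormalized table so a report can group by region without a join. The function names are illustrative only; this is not how Data Stage itself stores data.

```python
# Hypothetical normalized source tables: customers and orders are stored
# separately and linked by customer_id.
CUSTOMERS = {
    101: {"name": "Acme Corp", "region": "East"},
    102: {"name": "Globex", "region": "West"},
}
ORDERS = [
    {"order_id": 1, "customer_id": 101, "amount": 250.0},
    {"order_id": 2, "customer_id": 102, "amount": 75.5},
    {"order_id": 3, "customer_id": 101, "amount": 120.0},
]


def build_sales_table(customers, orders):
    """Copy transaction data into a denormalized table for reporting.

    Each output row repeats the customer attributes next to the order,
    so reports can filter or group by region without a join.
    """
    rows = []
    for order in orders:
        customer = customers[order["customer_id"]]
        rows.append({
            "order_id": order["order_id"],
            "customer_name": customer["name"],
            "region": customer["region"],
            "amount": order["amount"],
        })
    return rows


def sales_by_region(rows):
    """A typical reporting query run against the denormalized copy."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


if __name__ == "__main__":
    print(sales_by_region(build_sales_table(CUSTOMERS, ORDERS)))
```

The trade-off is the usual one: the copy duplicates customer attributes on every row, which costs storage but makes read-only reporting queries simpler.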


Reasons for Dirty Data

Dummy Values
Absence of Data
Multipurpose Fields
Cryptic Data
Contradicting Data
Inappropriate Use of Address Lines
Violation of Business Rules
Reused Primary Keys, Non-Unique Identifiers
Data Integration Problems
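Several of these problems can be caught with simple profiling rules before data is loaded. The sketch below is a hypothetical Python example covering three of them: dummy values, absence of data, and non-unique identifiers. The DUMMY_VALUES set and the profile_records function are illustrative assumptions, not part of Data Stage or any cleansing product.

```python
from collections import Counter

# Illustrative placeholder values that often mean "unknown"; this set is an
# assumption for the example, not a standard list.
DUMMY_VALUES = {"", "N/A", "UNKNOWN", "999-99-9999", "00000"}


def profile_records(records, key_field, required_fields):
    """Return findings grouped by issue type as lists of (row_index, field)."""
    issues = {"dummy_value": [], "missing_value": [], "non_unique_key": []}
    for i, rec in enumerate(records):
        for field in required_fields:
            value = rec.get(field)
            if value is None:
                # Absence of data: the field is not present at all.
                issues["missing_value"].append((i, field))
            elif str(value).strip().upper() in DUMMY_VALUES:
                # Dummy value: present, but a placeholder rather than real data.
                issues["dummy_value"].append((i, field))
    # Non-unique identifiers: the same key value appears on more than one row.
    key_counts = Counter(rec.get(key_field) for rec in records)
    for i, rec in enumerate(records):
        if key_counts[rec.get(key_field)] > 1:
            issues["non_unique_key"].append((i, key_field))
    return issues


if __name__ == "__main__":
    sample = [
        {"id": 1, "ssn": "999-99-9999", "zip": "10001"},
        {"id": 2, "ssn": "123-45-6789"},
        {"id": 2, "ssn": "987-65-4321", "zip": "n/a"},
    ]
    print(profile_records(sample, "id", ["ssn", "zip"]))
```

For the sample, row 0 is flagged for a dummy SSN, row 1 for a missing ZIP code, row 2 for a dummy ZIP code, and rows 1 and 2 for sharing the key 2.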


Data Cleansing

Source systems contain dirty data that must be cleansed

ETL software contains rudimentary data cleansing capabilities

Specialized data cleansing software is often used; it is important for name and address correction and householding functions

Leading data cleansing vendors include Vality (Integrity), Harte-Hanks (Trillium), and Firstlogic (i.d.Centric)
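Householding means recognizing that several customer records belong to the same household, usually by standardizing addresses and grouping on the result. The following is a minimal Python sketch of that idea, assuming a tiny hypothetical ABBREVIATIONS map; commercial cleansing tools rely on large reference data such as postal directories, which this example does not attempt to reproduce.

```python
import re
from collections import defaultdict

# Illustrative abbreviation map (an assumption for this sketch); real
# cleansing tools use far larger reference data.
ABBREVIATIONS = {"ST": "STREET", "AVE": "AVENUE", "RD": "ROAD", "APT": "APARTMENT"}


def standardize_address(address):
    """Uppercase, replace punctuation with spaces, and expand abbreviations."""
    cleaned = re.sub(r"[^\w\s]", " ", address.upper())
    return " ".join(ABBREVIATIONS.get(word, word) for word in cleaned.split())


def household(customers):
    """Group customer names by standardized address (one group per household)."""
    groups = defaultdict(list)
    for name, address in customers:
        groups[standardize_address(address)].append(name)
    return dict(groups)


if __name__ == "__main__":
    print(household([
        ("John Smith", "12 Main St."),
        ("Mary Smith", "12 MAIN STREET"),
        ("Ann Lee", "4 Oak Ave, Apt 2"),
    ]))
```

Both Smith records standardize to "12 MAIN STREET" and therefore land in the same household, even though their raw address strings differ.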


IBM ETL Overview


Data Stage

In its simplest form, Data Stage performs data transformation and movement from source systems to target systems, in batch and in real time. The data sources may include indexed files, sequential files, relational databases, archives, external data sources, enterprise applications, and message queues.
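The extract, transform, load cycle described here can be illustrated outside Data Stage with plain Python. This is a minimal batch sketch, assuming a small inline CSV (SOURCE_CSV, hypothetical) as the sequential-file source and an in-memory SQLite database as the relational target; it uses only the standard csv and sqlite3 modules and is not Data Stage code.

```python
import csv
import io
import sqlite3

# Stand-in for a sequential (flat) source file; a real job would read a file
# on disk or another source system.
SOURCE_CSV = """id,name,amount
1, alice ,10.50
2,BOB,3
3,carol,
"""


def extract(text):
    """Read rows from a delimited sequential source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Trim and title-case names, convert amounts, drop rows with no amount."""
    out = []
    for row in rows:
        if not row["amount"].strip():
            continue  # reject rows that cannot be loaded
        out.append((int(row["id"]), row["name"].strip().title(), float(row["amount"])))
    return out


def load(conn, rows):
    """Write transformed rows into a relational target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_batch(conn, text):
    """Run one batch: extract, transform, load. Returns the number of rows loaded."""
    rows = transform(extract(text))
    load(conn, rows)
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(run_batch(conn, SOURCE_CSV))
    print(conn.execute("SELECT * FROM sales ORDER BY id").fetchall())
```

Keeping the three steps as separate functions mirrors how a Data Stage job separates source, processing, and target stages, although the product handles parallelism, restart, and logging that this sketch ignores.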


The Data Stage client components are:

Data Stage Administrator

Data Stage Designer

Data Stage Director


Data Stage Administrator

Use Data Stage Administrator to:

Specify general server defaults

Add and delete projects

Set project properties

Access the Data Stage Repository through a command interface


Data Stage Designer

Use Data Stage Designer to:

Specify how the data is extracted

Specify data transformations

Decode (denormalize) data going into the data mart using reference lookups

Aggregate data

Split data into multiple outputs on the basis of defined constraints
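Three of the Designer tasks listed here (decoding with a reference lookup, splitting rows by constraints, and aggregating) can be sketched in plain Python to show what each one does to the data. This is a hypothetical illustration, assuming a PRODUCT_LOOKUP reference table and a "reject" output for rows that match no constraint; in Data Stage these steps are configured in stages, not written as Python.

```python
from collections import defaultdict

# Hypothetical reference (lookup) data: product code -> description.
PRODUCT_LOOKUP = {"P1": "Widget", "P2": "Gadget"}


def decode(rows, lookup):
    """Denormalize rows by adding the value found for each code in a reference lookup."""
    return [{**row, "product": lookup.get(row["product_code"], "UNKNOWN")} for row in rows]


def split(rows, constraints):
    """Route each row to every output whose constraint it satisfies.

    Rows that match no constraint go to an extra 'reject' output.
    """
    outputs = {name: [] for name in constraints}
    outputs["reject"] = []
    for row in rows:
        matched = False
        for name, predicate in constraints.items():
            if predicate(row):
                outputs[name].append(row)
                matched = True
        if not matched:
            outputs["reject"].append(row)
    return outputs


def aggregate(rows, group_key, value_key):
    """Sum value_key for each distinct group_key."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    return dict(totals)


if __name__ == "__main__":
    rows = [
        {"product_code": "P1", "qty": 5},
        {"product_code": "P2", "qty": 50},
        {"product_code": "P9", "qty": -1},
    ]
    decoded = decode(rows, PRODUCT_LOOKUP)
    outs = split(decoded, {
        "small": lambda r: 0 < r["qty"] < 10,
        "large": lambda r: r["qty"] >= 10,
    })
    print({name: len(batch) for name, batch in outs.items()})
    print(aggregate(decoded, "product", "qty"))
```

Routing unmatched rows to a separate output, rather than silently dropping them, keeps bad records visible for later inspection.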


Data Stage Director

Use Data Stage Director to run, schedule, and monitor your Data Stage jobs. You can also gather statistics as a job runs and view job logs for debugging.

The Data Stage Director window is divided into two panes. The left pane, the Job Category pane, lists all of the jobs in the repository. The right pane shows one of three views: Status view, Schedule view, or Log view.


Frequently seen Status

Code  Status
1     Finished
2     Finished (see log)
9     Has been reset
11    Validated OK
12    Validated (see log)
21    Has been reset
99    Compiled
0     Running
3     Aborted
8     Failed validation
13    Failed validation
96    Aborted
97    Stopped
98    Not Compiled
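When job statuses are checked from a script, it helps to translate the numeric codes into the descriptions from this table. The sketch below is a hypothetical Python helper: the JOB_STATUS mapping copies the table exactly (including its repeated descriptions), while describe_status and needs_log_review are illustrative names. How the numeric code is obtained from a running system is outside the scope of this sketch.

```python
# Status codes as listed in the "Frequently seen Status" table.
# Repeated descriptions (e.g. 9 and 21) are reproduced as given there.
JOB_STATUS = {
    0: "Running",
    1: "Finished",
    2: "Finished (see log)",
    3: "Aborted",
    8: "Failed validation",
    9: "Has been reset",
    11: "Validated OK",
    12: "Validated (see log)",
    13: "Failed validation",
    21: "Has been reset",
    96: "Aborted",
    97: "Stopped",
    98: "Not Compiled",
    99: "Compiled",
}


def describe_status(code):
    """Return the table's description for a numeric status code."""
    return JOB_STATUS.get(code, f"Unknown status ({code})")


def needs_log_review(code):
    """True when the description tells the operator to check the log."""
    return "(see log)" in describe_status(code)


if __name__ == "__main__":
    for code in (1, 2, 96, 5):
        print(code, describe_status(code), needs_log_review(code))
```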


Data Stage: Getting Started

Set up a project – Before you can create any Data Stage jobs, you must set up your project by entering information about your data.

Create a job – When a Data Stage project is installed, it is empty and you must create the jobs you need in Data Stage Designer.

Define table definitions

Develop the job – Jobs are designed and developed using the Designer. Each data source, the data warehouse, and each processing step is represented by a stage in the job design. The stages are linked together to show the flow of data.
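Because stages are linked to show the flow of data, a job design is essentially a directed graph: stages are nodes and links are edges. The sketch below assumes a hypothetical job whose stages are named Customers_Seq, Region_Ref, Lookup_Region, Xfm_Clean and DW_Target, and uses Python's standard graphlib module (Python 3.9+) to derive an order in which every stage comes after the stages that feed it. It models the idea only; Data Stage's own compiler and engine decide how a job actually executes.

```python
from graphlib import TopologicalSorter

# A hypothetical job design expressed as links: (source stage, target stage).
JOB_LINKS = [
    ("Customers_Seq", "Lookup_Region"),
    ("Region_Ref", "Lookup_Region"),
    ("Lookup_Region", "Xfm_Clean"),
    ("Xfm_Clean", "DW_Target"),
]


def stage_order(links):
    """Return stages in an order where every stage follows all of its inputs."""
    graph = {}  # stage -> set of stages that feed it
    for source, target in links:
        graph.setdefault(source, set())
        graph.setdefault(target, set()).add(source)
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(stage_order(JOB_LINKS))
```

If the links ever formed a loop, static_order() would raise graphlib.CycleError, which corresponds to a design in which data could never finish flowing from sources to targets.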


Data Stage Designer: Developing a Job


Data Stage Designer: Input Stage


Data Stage Designer: Transformer Stage

The Transformer stage performs any data conversion required before the data is output to another stage in the job design.

When the job design is complete, compile the job and then run it.
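One way to picture what a Transformer stage does is as a set of per-column derivations evaluated for each input row. The following Python sketch assumes hypothetical input columns (CUST_NO, FIRST, LAST, ORD_DT, AMT), a DD/MM/YYYY input date format, and a DERIVATIONS dictionary with one function per output column; none of these names come from Data Stage, and real Transformer derivations are written in the product's own expression language.

```python
from datetime import datetime

# Hypothetical output-column derivations, one function per target column.
# The input column names and the DD/MM/YYYY date format are assumptions.
DERIVATIONS = {
    "customer_id": lambda r: int(r["CUST_NO"]),
    "full_name": lambda r: f'{r["FIRST"].strip()} {r["LAST"].strip()}'.title(),
    "order_date": lambda r: datetime.strptime(r["ORD_DT"], "%d/%m/%Y").date().isoformat(),
    "amount": lambda r: round(float(r["AMT"]), 2),
}


def transform_row(row, derivations=DERIVATIONS):
    """Build one output row by evaluating every derivation against the input row."""
    return {column: derive(row) for column, derive in derivations.items()}


if __name__ == "__main__":
    print(transform_row({
        "CUST_NO": "0042",
        "FIRST": " ada ",
        "LAST": "LOVELACE",
        "ORD_DT": "05/03/2017",
        "AMT": "19.999",
    }))
```

Keeping each derivation independent makes it easy to see, column by column, how the output is computed from the input before the row moves on to the next stage.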


Questions?
