CapGemini_Datastage

Apr 06, 2018

gangadhar1310
  • 8/2/2019 CapGemini_Datastage

    1/122

    1

    Training course Datastage (part 1) V. BEYET

    03/07/2006

  • Summary

    - General presentation (DataStage: what is it?)
    - DataStage: how to use it?
    - The other components (part 2)

  • General presentation. DataStage: what is it?

    - An ETL tool: Extract-Transform-Load
    - A graphical environment
    - A tool integrated into a suite of BI tools
    - Developed by Ascential (IBM)

  • General presentation. DataStage: why use it?

    - Large data volumes
    - Multi-source and multi-target: files, databases (Oracle, SQL Server, Access, ...)
    - Data transformation: select, format, combine, aggregate, sort

  • General presentation. DataStage: how does it work?

    Development is done:
    - in client-server mode,
    - with a graphical design of flows,
    - with simple, basic elements,
    - with a simple language (BASIC).

    Processing is:
    - compiled and run by an engine,
    - written in a UniVerse database.

  • General presentation. The different tools

    - Server
    - Designer
    - Manager
    - Administrator
    - Director

  • Server

    The server contains programs and data.

    Programs:
    - They are called jobs: first as source code and then as executable programs, both written in the UniVerse database.
    - But we can't read the source code.

    Data:
    - May be written in the UniVerse database, but is better kept in server directories.

  • Server: what is a Project in DataStage?

    - A server is organized into different environments called Projects.
    - A Project is a separate environment for jobs, table definitions and routines.
    - A Project can be created at any time.
    - The number of projects is unlimited.
    - The number of jobs is unlimited for each project.
    - But the number of simultaneous client connections is limited.

  • Server: the UniVerse database

    - The UniVerse database is a relational database made of files.
    - Tables are called "hash files".
    - A hash file is an indexed file; it is the central element for using all the possibilities of the DataStage engine.
    - A hash file with incorrectly defined keys may cause disastrous problems.

  • Summary

    - General presentation (DataStage: what is it?)
    - DataStage: how to use it?
    - The other components (part 2)

  • The Designer

    The Designer is used to design jobs (see its icon).

    Jobs are composed of:
    - Stages:
      - active stages: action
      - passive stages: data storage
    - Links: between the stages

  • The Designer: passive stages

    A passive stage is a place for data storage (the data flows from the stage or to the stage).

    - Text File: a sequential file.
    - Hash File: it can be processed only by DataStage (and not by WordPad, ...), but simultaneous access to a hash file is possible.
    - UV Stage: the file is in the UniVerse core (DataStage engine).
    - ODBC Stage, OLEDB, ORAOCI: representation of a database; it gives direct access to a database through an ODBC link.

  • The Designer: active stages

    An active stage represents a transformation applied to the data flow:

    - Sort: sorts a file
    - Aggregator: calculations
    - Transformer: selection, transformation, propagation of properties

  • The Designer: links

    - Between active and passive stages
    - Between passive stages
    - Between active stages

  • The Designer: a job in the Designer

    [Screenshot of a job; callouts: Passive Stage, Active Stage]

  • The Designer: DataStage Designer

    Each job has:
    - one or more sources of data
    - one or more transformations
    - one or more destinations for the data

    The toolbar contains the stage icons used to design the jobs.

    Jobs have to be compiled to create executable programs.

  • The Designer: main window

    [Screenshot; callouts: the repository; the toolbar with stage icons (palette); the button to compile the job; the button to run the job]

  • The Designer: the different stages

    Let's now study the different stages:
    - Sequential Files (text files)
    - Transformer
    - Hash Files
    - Sort
    - Aggregator
    - Routines
    - UV Stages

  • Sequential File stage

    A sequential file:
    - can be read,
    - can be written,
    - can be read and written in the same job,
    - can be written with or without a cache,
    - can be a DOS file or a Unix file,
    - can be read by two jobs at the same time,
    - can't be written by two jobs at the same time.

  • Sequential File

    [Screenshot; callouts: stage name; file type; stage description]

  • Sequential File: output link

    [Screenshot; callouts: output link; stage name (to be written)]

  • Sequential File: data format (output file)

    [Screenshot; callout: always use those values]

  • Sequential File: columns and View Data

    [Screenshot; callouts: button to test the connection and view the data in the file; the different columns of the file (output): type, length; size to display (for View Data)]

  • Sequential File: table definitions

    To describe a file easily, use or create a table definition.

    - Group your table definitions by application.
    - Create or modify the table definitions (for files, databases, transformers, ...).

  • Sequential File: loading a table definition

    A table definition can then be used in different jobs (click on Load to find the right definition).

  • Sequential File: View Data

    [Screenshot of the View Data window]

  • Transformer stage

    - Multi-source and multi-target,
    - Waits for the source of data to be available,
    - Performs lookups between 2 flows (reference),
    - Transforms or propagates the data of each flow,
    - Allows you to select, filter, and create a rejects file.

  • Transformer stage: processing

    Processing can be done with:
    - native BASIC functions, or functions created in the Manager,
    - DataStage functions or DataStage macros,
    - routines (before/after type).

    Or the stage can only propagate columns.

  • Transformer stage: editor

    [Screenshot; callouts: input data; output data; right click: propagate all the columns]

  • Transformer stage: input and output

    [Screenshot; callouts: input data; output data]

  • Exercise 1

    Objective: read a sequential file and create a new one (save the file).

    The catalogue.in file has to be read and the catalogue_save.tmp file has to be written.

    Source file: catalogue.in (in the \in directory)
    Target file: catalogue_save.tmp (in the \tmp directory)

    Steps:
    1- Create a table definition (structure of the Catalogue table)
    2- Design the job with 2 Sequential Files and 1 Transformer
    3- Create the links (data flow)
    4- Save and compile the job
    5- Run the job
    6- Look at the performance statistics (right click)

  • Transformer stage: job performance

    To look at the performance of your job, right-click on the grid and then select "Show performance statistics".

  • Job parameters

    Create the parameters of the job: menu Edit > Job Properties, tab Parameters.

  • Exercise 2

    Objective: use environment variables (job parameters).

    - Create a job parameter: directory.
    - Use it in all the paths of the job from the first exercise (example: #directory#\tmp).
    - Compile.
    - Modify your input file (add your favourite film).
    - Run with different paths (those of the other groups).
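The `#directory#\tmp` notation can be pictured as a plain text substitution done before the job runs. The following is a minimal Python sketch of that idea, not DataStage code: the function name `resolve_parameters` and the example directory are assumptions made for illustration.

```python
import re

def resolve_parameters(template, params):
    """Replace each #name# token with the value of the job parameter `name`."""
    def substitute(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"undefined job parameter: {name}")
        return params[name]
    # A function replacement keeps backslashes in Windows paths literal.
    return re.sub(r"#(\w+)#", substitute, template)

# Hypothetical value for the `directory` parameter of exercise 2.
print(resolve_parameters(r"#directory#\tmp\catalogue_save.tmp",
                         {"directory": r"C:\training\group1"}))
```

Running the same job with another value for `directory` changes every path at once, which is the point of the exercise.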

  • Hash File stage

    - Necessary for a lookup.
    - A hash file is entirely written before it can be read (the FromTrans link must be finished before FromFilmTypeHF can start).
    - Groups multiple records with the same key (suppresses duplicate keys).
    - Can be read by different jobs simultaneously.
    - Can be written by different links simultaneously (in the same job or in different jobs).
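The duplicate-key behaviour can be pictured with a keyed dictionary. This is a minimal Python sketch, not DataStage code; it assumes that a later record with the same key replaces the earlier one, and the function name and sample rows are invented for illustration.

```python
def write_hashed_file(rows, key_columns):
    """Store rows by key; a later row with the same key replaces the earlier one."""
    hashed = {}
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        hashed[key] = row
    return hashed

# Hypothetical film type records; key "1" appears twice.
film_types = [
    {"id_type": "1", "label": "Comedy"},
    {"id_type": "2", "label": "Drama"},
    {"id_type": "1", "label": "Comedy (updated)"},
]
stored = write_hashed_file(film_types, ["id_type"])
print(len(stored))                 # three rows written, two records kept
print(stored[("1",)]["label"])     # the last row written for key "1"
```

This is also why an incorrectly defined key is dangerous: records that should stay distinct silently overwrite each other.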

  • Hash File: stage properties

    [Screenshot; callouts: stage name; account name (DataStage project); file path]

  • Hash File: file name and write cache

    [Screenshot; callouts: file name; options for files to write]

    Write cache check box: select this check box to specify that all records should be cached, rather than written to the hashed file immediately. This is not recommended where your job writes to and reads from the same hashed file in the same stream of execution.

  • Hash File: key

    A key must be defined (it can be a single-column or a multi-column key).

  • Transformer stage: lookup

    - The main flow can be of any type.
    - The secondary flow must come from a hash file to design a lookup (so, very often, you will have to design a temporary hash file).
    - The lookup is done with the key of the secondary flow.
    - The number of records in the main flow can't be higher after the lookup than before it.
    - The lookup is shown with a dotted line.
    - When a lookup is exclusive, the number of records after the lookup can be smaller than the number of records before it.
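The row-count rules above can be illustrated outside DataStage. The following is a minimal Python sketch under stated assumptions: the reference flow is modelled as a dictionary keyed like a hash file, and the names `lookup`, `id_type` and `type_label` are invented for the example.

```python
def lookup(main_rows, reference, key, exclusive=False, default="unknown type"):
    """Enrich each main-flow row with the value found in the reference flow.

    With exclusive=True, rows whose key is not found are dropped,
    so the output can only be as long as the input or shorter.
    """
    output = []
    for row in main_rows:
        found = reference.get(row[key])
        if found is None:
            if exclusive:
                continue  # exclusive lookup: the row is filtered out
            found = default
        output.append({**row, "type_label": found})
    return output

# Hypothetical data: film 3 has a type that is missing from the reference.
film_types = {"1": "Comedy", "2": "Drama"}
films = [
    {"film": "Film A", "id_type": "1"},
    {"film": "Film B", "id_type": "2"},
    {"film": "Film C", "id_type": "9"},
]
print(len(lookup(films, film_types, "id_type")))                  # same count as input
print(len(lookup(films, film_types, "id_type", exclusive=True)))  # one row fewer
```

Because each main-flow row matches at most one reference record (the key is unique), the lookup never multiplies rows.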

  • Transformer stage: lookup (job design)

    [Screenshot; callouts: principal flow (horizontal); reference flow (vertical)]

  • Exercise 3

    Objective: make a lookup between the Catalog file and the Film Type file to put the film type in the output file.

    Source file: catalogue.in (in the \in directory)
    Target file: catalogue.out (in the \out directory)

    Steps:
    1- Create a table definition (structure of the FilmType table)
    2- Modify your job to create a hash file from the FilmType.in file
    3- Create the link to show the lookup (data flow)
    4- Save and compile the job
    5- Run the job
    6- Look at the performance statistics (right click)

  • Exercise 4

    Objective: put the director name and the film name together, separated by a ">". If the film type is not found, put "unknown type" in the output file. What happens when the director name is empty? Find a solution.

  • Exercise 5

    Objective: if the film type is not found (use a constraint), put the film in a rejects file (first a sequential file, then a hash file).

  • Transformer stage: lookup with selection (exclusive lookup)

    Don't forget: a lookup can be designed with an ORAOCI stage or a UV stage, but it works better with hash files.

  • Exercise 6

    Objective: select only the films for which the type is known (that is, for which the lookup succeeds).

  • Exercise 7

    Objective: select all the female clients and put them in an output file. The SEXE column contains M (male) or F (female).

    Then create an annotation for this job (all jobs must have annotations).

  • The Director

    The Director is the job controller. It allows you to:

    - Run jobs: immediately or later, with more options than in the Designer.
    - Control job status: Compiled, Running, Aborted, Validated, Failed validation, ...
    - Monitor jobs: check the number of rows processed by each active stage of a job.

  • The Director: run jobs

    [Screenshot; callouts: select the job and click here; then enter the parameters]

  • The Director: run a job later

    [Screenshot; callouts: click here; then choose the date and time]

  • The Director: Limits tab

    To modify the run parameters of a job, use the Limits tab:
    - Warnings limit: the job stops after x warnings.
    - Rows limit: the job stops after x rows (on each flow).

  • The Director: job status

    Verify the status of jobs with the Director. The statuses are: "Not compiled", "Compiled", "Failed validation", "Validated ok", "Aborted", "Finished", "Running".

  • The Director: list of jobs (example)

    [Screenshot; toolbar callouts: run jobs; stop jobs; run jobs later; view the log; reset job status]

  • The Director: example of a monitor

    For each step, the monitor shows:
    - the number of rows processed (input and output)
    - the start time
    - the execution duration (elapsed time)
    - the status
    - the performance (rows/sec)

    Link types:
    - Pri: principal flow
    - Ref: reference flow (lookup)
    - Out: output flow

    The monitor lets you follow the different stages of a job. Note the importance of good names for the stages and the links!

  • The Director: example of a log

    To look at error messages, choose the job and click on the log button.

    - Green: OK, no problem
    - Yellow: warning
    - Red: blocking problem

    Don't forget: clear the log from time to time (Job>Clear log).

  • The Manager

    The Manager is the tool used to export/import elements from one DataStage project to another DataStage project.

    All the elements (jobs, routines, table definitions) are classified in categories, but each name must be unique within a project.

    - To import or export elements, click on the appropriate button.
    - File>Open Project to change project.
    - Drag and drop an element to change its category.

  • The Manager: export

    Choose what you want to export (this creates a .dsx file):
    - Jobs
    - Routines (always check the Source Code box)
    - Table definitions

    Options:
    - Append to an existing file.
    - Change the selection options: by category or by individual components.

  • The Manager: import

    Choose what you want to import and make your choice. This will create or modify elements in the DataStage project.

  • The Manager: multiple job compile

    With the Manager, you can compile many jobs at the same time:
    1- Choose Tools > Run multiple job compile.
    2- Select the types of jobs you want to compile, select "Show manual selection page" and click on the Next button.
    3- Select the jobs and click on the Next button.
    4- Click on the Start compile button.

  • Sort stage

    The sort criteria are filled in on the Stage tab, Properties tab.

    [Screenshot; callout: modify those parameters if the file to sort has a lot of rows]

  • Exercise 8

    Objective: once you have selected all the women, sort the file in alphabetical order.

  • Aggregator stage

    - Aggregates data into a smaller number of records,
    - Intermediate processing is executed in memory,
    - Can execute a before/after routine (before or after the stage processing, when all the rows have been processed),
    - Performance is better if the data is sorted (Input tab),
    - The aggregator does not sort the records.
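Why sorted input helps can be shown with a streaming group-by. This is a minimal Python sketch, not the engine's actual algorithm: it assumes rows arrive already sorted on the grouping column, so each group can be emitted as soon as the key changes and only one group is held in memory. The column names are invented for the example.

```python
from itertools import groupby
from operator import itemgetter

def aggregate_sorted(rows, key, value):
    """Count and sum `value` per `key`, assuming rows are already sorted on `key`."""
    result = []
    for group_key, group in groupby(rows, key=itemgetter(key)):
        values = [row[value] for row in group]
        result.append({key: group_key, "count": len(values), "sum": sum(values)})
    return result

# Hypothetical hire records, already sorted by cassette number.
hires = [
    {"id_cas": "K1", "days": 2},
    {"id_cas": "K1", "days": 5},
    {"id_cas": "K2", "days": 1},
]
print(aggregate_sorted(hires, "id_cas", "days"))
```

On unsorted input this streaming approach would split a key into several groups, so an aggregation over unsorted data has to keep every group in memory until the last row.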

  • Aggregator stage: Input tab

    [Screenshot; callout: to be set when the input data is sorted]

  • Aggregator stage: Output tab

    [Screenshot; callouts: group by; different functions]

  • Exercise 9

    Objective: create a job which reads location.in and calculates the hit parade of the most hired cassettes (ordered by number of hires, descending). Also output the name of the film and not only the number of the cassette (lookup with catalogue.in).

  • Exercise 9 (job to design)

    [Screenshot of the job design]

  • Exercise 10 (job to design)

    [Screenshot of the job design]

  • Hash File stage: grouping flows

    - We have seen that a hash file is necessary for a lookup.
    - We have also seen that a hash file suppresses duplicate keys.
    - Let's now see how it is useful for grouping different flows.

  • Exercise 11

    Objective: with the job from exercise 10 (use the 2 methods in the same job), create a hash file to put the different results in the same hash file.

    - Column 1: "AVERAGE METHOD 1" or "AVERAGE METHOD 2"
    - Column 2: the result of each method

    In the hash file, you must have 2 rows.

  • Exercise 11 (job to design)

    [Screenshot of the job design]

  • Stage variables

    Simple processing can be done easily with stage variables.

    - A stage variable is a value that remains active for the whole duration of the stage, so you can find a maximum (if the data is sorted), calculate a sum or count something.
    - In the Transformer, right-click and then select "Show Stage variables".

    Example: [screenshot]
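The idea of a value that survives from one row to the next can be sketched in a few lines. This is a Python illustration only, not Transformer syntax: `total` and `count` play the role of stage variables that are initialized once and updated for every row, and the column names are invented.

```python
def run_transformer(rows):
    """Process rows one by one while two 'stage variables' keep their value."""
    total = 0   # stage variable, initialized once before the first row
    count = 0   # stage variable, initialized once before the first row
    output = []
    for row in rows:
        total += row["amount"]
        count += 1
        output.append({**row, "running_total": total,
                       "running_average": total / count})
    return output

# Hypothetical input rows.
rows = [{"amount": 10}, {"amount": 20}, {"amount": 30}]
for line in run_transformer(rows):
    print(line)
```

The last output row carries the sum and the average of the whole flow, which is one way to picture how a sum or a count can be obtained with stage variables.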

  • Stage variables: another example

    [Screenshot]

  • Exercises 12 and 13

    Exercise 12. Objective: try to calculate the average with stage variables.

    Exercise 13. Objective: create a job that creates a file with each client (key) and, in a second column, the list of that client's films (separated by a dot).

  • Exercise 13 (job to design)

    [Screenshot of the job design]

  • Exercise 13 (job to design): notes

    - The order of the stage variables is important: the instructions are executed in the order of the stage variables (to change the order: right click > Stage Properties > Link Ordering tab).
    - The variables must be initialized (right click > Stage Properties > Variables).
    - There must be a hash file after the stage.

  • DataStage variables and link variables

    DataStage variables. Several variables are defined by DataStage:
    - @NULL
    - @INROWNUM, @OUTROWNUM
    - @DATE
    - @TRUE, @FALSE
    - @PATH

    Link variables. The most useful one is NOTFOUND.

  • Routines

    - A routine is source code (written in the BASIC language).
    - It is external to the jobs and can be used many times at many levels.
    - It can be a transform function or a before/after subroutine:
      - a transform function is called for each row;
      - a before subroutine is called before the first row (example: empty a file);
      - an after subroutine is called when all the rows have been processed.

  • Routines (1/3)

    [Screenshot; callouts: type of routine; name of the routine; short description (always fill this in)]

  • Routines (2/3)

    [Screenshot; callout: arguments, to be filled in; they are used in the code]

  • Routines (3/3)

    [Screenshot; callouts: code (use the argument names); Save, Compile and Test buttons of the routine]

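  • Routines: access to a sequential file

    In these skeletons, FicXXX is the file path, xxx is the file variable and LineXXX is a variable holding one line.

    OpenSeq FicXXX To xxx Then ... End Else ... End       (open the file)
    ReadSeq LineXXX From xxx Then ... End Else ... End    (read the next line)
    WriteSeq LineXXX To xxx Then ... End Else ... End     (write a line)
    WeofSeq xxx                                           (to empty the file)
    CloseSeq xxx                                          (close the file)

    [Slide callout: File Header]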

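  • Routines: BASIC syntax reminders

    Control structures:
    - If ... Then ... End Else ... End
    - GoTo
    - For i = ... To ... Next i
    - Loop While ... Repeat
    - Loop Until ... Repeat

    Logging:
    - Call DSLogInfo("Information", "RoutineName")
    - Call DSLogWarn("Warning", "RoutineName")
    - Call DSLogFatal("Abort", "RoutineName")

    Strings and dates:
    - A = "Hello " ; B = "World" ; C = A:B gives C = "Hello World" (the ":" operator concatenates)
    - Field(..., ',', 3, 1): returns the third comma-delimited field of the string
    - Trim(..., " ", "T"): removes the trailing spaces
    - Upcase(...)
    - Iconv("05/27/97", "D2/"): converts a date to the internal format
    - Oconv(10740, "D2/"): converts an internal date to the external format
    - A = "Hello" ; A[1,3] is "Hel"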
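For readers who do not know BASIC, the string and date functions listed above can be mimicked in Python. This is only an illustrative sketch, not the engine's implementation: the helper names `field`, `iconv_date` and `oconv_date` are invented, and the date helpers assume that the internal date is a day count starting at 31 December 1967 and that "D2/" means the MM/DD/YY layout.

```python
from datetime import date, datetime, timedelta

DAY_ZERO = date(1967, 12, 31)  # assumed origin of the internal day count

def field(text, delimiter, occurrence):
    """Return the n-th delimited field (1-based), or "" if it does not exist."""
    parts = text.split(delimiter)
    return parts[occurrence - 1] if 1 <= occurrence <= len(parts) else ""

def iconv_date(text):
    """External MM/DD/YY date to internal day number."""
    return (datetime.strptime(text, "%m/%d/%y").date() - DAY_ZERO).days

def oconv_date(days):
    """Internal day number to external MM/DD/YY date."""
    return (DAY_ZERO + timedelta(days=days)).strftime("%m/%d/%y")

print(field("a,b,c,d", ",", 3))   # third comma-delimited field
print("Hello   ".rstrip(" "))     # like Trim(..., " ", "T")
print("Hello"[0:3])               # like A[1,3]
print(iconv_date("05/27/97"))
print(oconv_date(10740))
```

The two date helpers are inverses of each other, which matches the Iconv/Oconv pair on the slide (05/27/97 and 10740).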

  • Routines: test

    Run the test by double-clicking on the Result column.

    [Screenshot of the test window]

  • Exercise 14, step 1

    Objective: write a routine which calculates the number of days between two dates.
    - If the begin date is null, return 0.
    - If the end date is null, initialize it with today's date.

    Save, compile and test the routine.
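Before writing the routine in BASIC, the expected logic can be checked with a small model. This is a Python sketch of the rules stated in the exercise, not the routine itself; the function name `days_between` and the optional `today` argument (added to make the result repeatable in tests) are assumptions.

```python
from datetime import date

def days_between(begin, end, today=None):
    """Number of days from `begin` to `end`, following the exercise rules."""
    if begin is None:
        return 0                      # rule 1: null begin date gives 0
    if end is None:
        end = today or date.today()   # rule 2: null end date means today
    return (end - begin).days

print(days_between(date(2006, 7, 3), date(2006, 7, 13)))
print(days_between(None, date(2006, 7, 13)))
print(days_between(date(2006, 7, 3), None, today=date(2006, 7, 4)))
```

The same three cases (both dates present, begin date null, end date null) are the ones worth entering in the routine's test window.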

  • Exercise 14, step 2

    Objective: read location.in and generate a file with the hire duration (returned cassettes only).

    Cassettes not returned after 10 days (end date null) are written to a rejects file with the name and address of the client (in order to send them a mail).

  • Exercise 14 (job to be designed)

    [Screenshot of the job design]

  • Exercise 15

    Objective: with a routine (use CASE), calculate the amount for the cassette hire (number of days * hire price * coefficient).

    The coefficient is calculated with this rule:
    =5 and =10 and = 30 days = days * hire price * 3

  • UV stage

    - works with internal hash files (in the DataStage project),
    - can make a Cartesian product,
    - uses SQL queries (select ... from ... where ... order by ...).

  • Exercise 16: execute the Cartesian product of the Clients file and the Cassettes file

    Objective: propose to each client the cassettes he has never hired.
    - Step 1: create the job parameter "account".
    - Step 2: create a job to write the clients hash file and the cassettes hash file in the DS project with the account parameter.
    - Step 3: in a new job, use those hash files to make the Cartesian product.

    Look at your job performance!
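The size of the result explains why the exercise asks you to watch performance. The following Python sketch, which is an illustration and not the UV stage's SQL, builds every (client, cassette) pair and removes the pairs that already exist as hires; the function name and sample identifiers are invented.

```python
from itertools import product

def never_hired(clients, cassettes, hires):
    """All (client, cassette) pairs minus the pairs that were actually hired."""
    hired = set(hires)
    return [pair for pair in product(clients, cassettes) if pair not in hired]

# Hypothetical identifiers.
clients = ["C1", "C2"]
cassettes = ["K1", "K2", "K3"]
hires = [("C1", "K1"), ("C2", "K3")]

proposals = never_hired(clients, cassettes, hires)
print(len(clients) * len(cassettes))  # number of pairs examined
print(proposals)
```

The number of pairs examined is the product of the two file sizes, so the work grows very quickly as the files get larger.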

  • Exercise 16: step 1 and step 2

    [Screenshot of the job design]

  • Exercise 16: step 3

    [Screenshot of the job design]

  • The number of records

    [Screenshot; callout: the number of records]

  • Normalization

    Normalization turns a multi-valued file into a normalized file; un-normalization does the reverse.

    Multi-valued file:
        12 A|B|C|D|E

    Normalized file:
        12 A
        12 B
        12 C
        12 D
        12 E

  • Normalization: requirements

    A multi-valued file must have:
    1- a key
    2- char(253) or @VM as the separator
    3- the "Normalize On" field of the Hash File stage checked
    4- the column(s) to normalize

    [Screenshot with callouts numbered 1 to 4]
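The two directions can be sketched on the slide's own example (key 12, values A to E). This is a Python illustration of the data transformation only, not of the Hash File stage; it uses character 253 as the value mark, as stated above, and the function names are invented.

```python
VM = chr(253)  # value mark used as the separator inside a multi-valued column

def normalize(rows):
    """One (key, multi-valued string) row becomes one row per value."""
    return [(key, value) for key, multi in rows for value in multi.split(VM)]

def unnormalize(rows):
    """Rows sharing a key are merged back into one multi-valued row."""
    grouped = {}
    for key, value in rows:
        grouped.setdefault(key, []).append(value)
    return [(key, VM.join(values)) for key, values in grouped.items()]

multi_valued = [("12", VM.join(["A", "B", "C", "D", "E"]))]
flat = normalize(multi_valued)
print(flat)
print(unnormalize(flat) == multi_valued)
```

Applying `unnormalize` after `normalize` gives back the original file, which is the comparison asked for at the end of the normalization exercise.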

  • Exercise 17: normalization/un-normalization

    - Step 1: create a job which reads the location.in file and writes a hash file (Id_Cli as the key and the list of all Id_Cas separated by @VM): use the Sort stage and stage variables! Then View Data on the input link of the hash file.
    - Step 2: modify the job to add the normalization of this file. Then View Data on the output link of the hash file.
    - Step 3: compare the sequential file with the location.in file.

  • Exercise 17: job to design and View Data

    [Screenshot of the job design and of View Data]

  • The ORAOCI stages

    The version of Oracle used is 9i, so use the ORAOCI9 stage.

    You can:
    - either use a query generated by DataStage,
    - or use a user-defined query,
    - or combine the two.

    Also:
    - The access parameters have to be defined by job parameters.
    - The stage can access one table or several.
    - Different actions can be programmed: read, insert, update, ...
    - You can also use stored procedures.

  • The ORAOCI stages: access parameters

    The access parameters have to be defined by job parameters.

    [Screenshot]

  • The ORAOCI stages: output link

    [Screenshot; callout: query generated by DataStage or user-defined query]

  • The ORAOCI stages: query generated by DataStage

    [Screenshot; callouts: selection of the table(s); selection of the columns; group by clause; sort parameters]

  • The ORAOCI stages: generate SELECT clause from column list; enter other clauses

    [Screenshot]

  • The ORAOCI stages: enter custom SQL statement

    Use this option when you want to add something specific (to format a date, for example).

  • The ORAOCI stages: Output link (table and action)

    [Screenshot; callouts: choose the table; choose the action; important parameters]

  • The ORAOCI stages: Output link (commit interval)

    [Screenshot; callout: number of rows between 2 commits]
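What "number of rows between 2 commits" means can be shown with any SQL database. The sketch below uses Python's built-in SQLite module as a stand-in for Oracle, so it illustrates the principle only and is not the ORAOCI stage's behaviour in detail; the table `hire` and the function `load_rows` are invented for the example.

```python
import sqlite3

def load_rows(connection, rows, rows_per_commit):
    """Insert rows and commit every `rows_per_commit` rows; return the commit count."""
    commits = 0
    pending = 0
    for id_cli, id_cas in rows:
        connection.execute(
            "INSERT INTO hire (id_cli, id_cas) VALUES (?, ?)", (id_cli, id_cas))
        pending += 1
        if pending == rows_per_commit:
            connection.commit()
            commits += 1
            pending = 0
    if pending:               # commit the last, incomplete batch
        connection.commit()
        commits += 1
    return commits

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE hire (id_cli TEXT, id_cas TEXT)")
rows = [("C1", "K1"), ("C1", "K2"), ("C2", "K1"), ("C2", "K3"), ("C3", "K2")]
print(load_rows(connection, rows, rows_per_commit=2))
```

A small interval commits often and limits what is lost when the job aborts; a large interval commits rarely and usually loads faster.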

  • The ORAOCI stages: verify error code (1/3)

    [Screenshot; callout: option to set if the job must abort when there is an SQL error]

  • The ORAOCI stages: verify error code (2/3)

    [Screenshot; callout: to receive the SQL error code]

  • The ORAOCI stages: verify error code (3/3)

    [Screenshot; callouts: process rows one by one; to receive the SQL error code; to select the errors]

  • The ORA Bulk stages

    - Used to insert into a table (like SQL*Loader).
    - Very fast (deactivates the index before the load and reactivates it after the load).
    - But no warning if the index is in the Unusable state after the load (when there are duplicate keys, for example).
    - Few date and time formats are available (DD.MM.YYYY, YYYY-MM-DD, DD-MON-YYYY, MM/DD/YYYY - hh24:mi:ss, hh:mi:ss am).

  • The ORA Bulk stages: properties

    [Screenshot; callouts: DSN; user; password; table name (with oracle.tableName); date and time format; number of rows between 2 commits]

  • How to create a table definition from a table in the database

    In the repository, right-click on Table Definitions, then choose Import, and then Plug-in Meta Data Definitions.

  • How to create a table definition from a table in the database (continued)

    Then choose the table(s) and click on Import.

    The table definitions will be created in the category ODBC.

  • Exercise 18: read a database

    Objective: create a job which reads the table REF_CPTE in the BIODS database.

    - Step 1: create the table definition from the database.
    - Step 2: create the job that reads the table.

  • Exercise 19: write to a database

    Objective: create a job which writes to the table TST_ALADIN_JGV in the BIODS database (only the first 2 columns: the keys).

    Mapping from location.in to TST_ALADIN_JGV:
    - Id_Cli => CHAR1
    - Id_Cas => CHAR2

    In CHAR1, put a letter (different for each group) before the client number (Id_Cli).

    - Step 1: use the ORAOCI stage.
    - Step 2: same exercise with the ORABULK stage.

  • Exercise 20: update a database

    Objective: create a job to update the columns BEGIN_DATE and END_DATE in the table TST_ALADIN_JGV in the BIODS database from the location.in file.

    BEGIN_DATE and END_DATE are defined as timestamps!

  • The Administrator

    The Administrator is used to:
    - create a DataStage project,
    - unlock a job.

    Sometimes, due to server problems, the Designer (or Manager) crashes and some elements may remain locked (jobs, table definitions, routines, ...). In that case, unlock them in the Administrator (with administrator security rights).

  • The Administrator: unlock a job (1/3)

    [Screenshot; callouts: choose your project and click on the Command button; button to create a project]

  • The Administrator: unlock a job (2/3)

    Enter the commands:
        CHDIR C:\Ascential\DataStage\Engine
        LIST.READU

    In the output, search for the device number and the user number.

  • The Administrator: unlock a job (3/3)

    Unlock your job:
    - with the device number,
    - or with the user number (UNLOCK USER UserNumber READULOCK),
    - or unlock everything (UNLOCK ALL).

  • The Administrator: create a project

    [Screenshot; callouts: project name; location for the project]

    The project location holds the jobs, routines, UV hash files, table definitions, ... on the server. It must be different from the location of the data directories!