8/2/2019 CapGemini_Datastage
1/122
Training course Datastage (part 1) V. BEYET
03/07/2006
Summary
General presentation (DataStage: what is it?)
DataStage: how to use it?
The other components (part 2)
General presentation
DataStage: what is it?
An ETL tool: Extract-Transform-Load
A graphic environment
A tool integrated in a suite of BI tools
Developed by Ascential (acquired by IBM)
DataStage: why use it?
Large volumes of data
Multi-source and multi-target: files, databases (Oracle, SQL Server, Access, ...)
Data transformation: select, format, combine, aggregate, sort
DataStage: how does it work?
Development is done:
in client-server mode,
with a graphical design of flows,
with simple, basic elements,
with a simple language (BASIC).
Jobs are:
compiled and run by an engine,
stored in a UniVerse database.
The different tools
Server
Designer, Manager, Administrator, Director
Server
The server contains programs and data.
The programs:
Called jobs: first as source code, then as executable programs, written in the UniVerse database.
But the source code is not meant to be read by humans.
Data:
May be stored in the UniVerse database, but preferably in server directories.
Server
What is a Project for DataStage?
A server is organized in separate environments called Projects.
A Project is a separate environment for jobs, table definitions and routines.
A Project can be created at any time.
The number of projects is unlimited. The number of jobs per project is unlimited. But the number of simultaneous client connections is limited.
Server
UniVerse database:
The UniVerse database is a relational database made of files.
Tables are called "hash files".
A hash file is an indexed file; it is the central element for using all the possibilities of the DataStage engine.
A hash file with incorrectly defined keys may cause disastrous problems.
Summary
General presentation (DataStage: what is it?)
DataStage: how to use it?
The other components (part 2)
The designer
The Designer is used to design jobs: look at the icon.
Jobs are composed of stages:
active stages: actions
passive stages: data storage
links: between the stages
The designer
Passive stages: a place for data storage (the data flows from the stage or to the stage).
Text file: sequential file.
Hash file: can only be handled by DataStage (not by WordPad, ...), but simultaneous access to a hash file is possible.
UV stage: the file is in the UniVerse core (DataStage engine).
ODBC, OLEDB, ORAOCI stages: representation of a database; they allow direct access to a database through an ODBC link.
The designer
Active stages: an active stage represents a transformation on the data flow:
Sort: sorts a file.
Aggregator: calculations.
Transformer: selection, transformation, transport of properties.
The designer
Links:
Between active and passive stages
Between passive stages
Between active stages
A job in the designer
(screenshot: a job with passive and active stages)
DataStage Designer: each job has:
- one or more sources of data,
- one or more transformations,
- one or more destinations for the data.
The toolbar contains the stage icons used to design the jobs.
Jobs have to be compiled to create executable programs.
The repository
The toolbar with stage icons (palette)
To compile the job
To run the job
Let's now study the different stages:
Sequential files (text files), Transformer, hash files, Sort, Aggregator, routines, UV stages.
Sequential file stage:
Can be read.
Can be written.
Can be read and written in the same job.
Can be written cached or not.
Can be a DOS file or a Unix file.
Can be read by two jobs at the same time.
Cannot be written by two jobs at the same time.
Sequential file:
Stage name
File Type
Stage description
Sequential file:
Output link
Stage name (to be written)
Sequential file:
Data format (output file): always use these values.
Sequential file: to test the connection and view the data in the file.
The different columns of the file (output): type, length.
Size to display (for View Data).
Group your table definitions by application.
Create or modify the table definitions (for files, databases, transformers, ...).
Sequential file: to describe a file easily, use or create a table definition.
Then it can be used in different jobs (click Load to find the right definition).
Sequential file: View Data.
Transformer stage:
Multi-source and multi-target.
Waits for the availability of the data source.
Performs lookups between 2 flows (reference).
Transforms or propagates the data of each flow.
Allows selecting, filtering, and creating a rejects file.
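A Transformer's behavior (filter with a constraint, derive new columns, route failing rows to a rejects output) can be sketched in plain Python. This is an illustrative analogy, not DataStage code; the field names `film` and `director` are hypothetical.

```python
# Conceptual sketch of a Transformer stage: apply a constraint (filter),
# derive a new column, and route rows that fail the constraint to rejects.
def transform(rows):
    accepted, rejected = [], []
    for row in rows:
        if row.get("director"):            # constraint: director must be set
            out = dict(row)                # copy, then derive a new column
            out["label"] = row["director"] + ">" + row["film"]
            accepted.append(out)
        else:
            rejected.append(row)           # goes to the rejects file
    return accepted, rejected

rows = [
    {"film": "Alien", "director": "Scott"},
    {"film": "Orphan", "director": ""},
]
ok, ko = transform(rows)
```

The same pattern (one constraint per output link) is what exercises 4 and 5 below ask you to build graphically.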
Transformer stage:
Can perform processing with:
native BASIC functions or functions created in the Manager,
DataStage functions or DataStage macros,
routines (before/after type), or simply propagate columns.
Transformer stage:
Input data / Output data
Right click: propagate all the columns.
Input data
Output data
Transformer stage:
Exercise 1:
Objective: read a sequential file and create a new one (save the file). The catalogue.in file has to be read and the catalogue_save.tmp file has to be written.
Source file: catalogue.in (in the \in directory). Target file: catalogue_save.tmp (in the \tmp directory).
Steps:
1. Create a table definition (structure of the Catalogue table).
2. Design the job with 2 sequential files and 1 transformer.
3. Create the links (data flow).
4. Save and compile the job.
5. Run the job.
6. Look at the performance statistics (right click).
Transformer stage: to look at the performance of your job, right click on the grid, then select Show performance statistics.
Create the parameters of the job: menu Edit > Job Properties, Parameters tab.
Exercise 2:
Objective: use job parameters.
- Create a job parameter: directory.
- Use it in all the paths of the job from the first exercise (example: #directory#\tmp).
- Compile.
- Modify your input file (add your best film).
- Run with a different path (other groups).
Hash file stage:
Necessary for a lookup.
A hash file is entirely written before it can be read (the FromTrans link must be finished before FromFilmTypeHF can start).
Allows grouping multiple records with the same key (suppresses duplicate keys).
Can be read by different jobs simultaneously. Can be written by different links simultaneously (in the same job or in different jobs).
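Why writing to a hash file suppresses duplicate keys can be sketched with a plain Python dict (an analogy of the mechanism, not DataStage code): records are stored by key, so a later record with the same key replaces the earlier one.

```python
# Sketch of hash-file writing: keyed storage, so duplicate keys collapse.
hash_file = {}
records = [("12", "Alien"), ("15", "Brazil"), ("12", "Alien, director's cut")]
for key, value in records:
    hash_file[key] = value   # same key => the record is replaced, not added

# Only one record per key survives, as in a hash file.
```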
Hash file:
Stage name
Account name(DataStage project)
File path
Hash file: file name.
For files to write: select this check box to specify that all records should be cached rather than written to the hashed file immediately. This is not recommended when your job writes and reads the same hashed file in the same stream of execution.
A key must be defined (it can be a single or multiple key)
Transformer stage: lookup.
The main flow can be of any type. The secondary flow must be a hash file to design a lookup (so, very often, you will have to design a temporary hash file). The lookup is done with the key of the secondary flow.
The number of records in the main flow cannot be higher after the lookup than before it. The lookup is shown with a dotted line. When a lookup is exclusive, the number of records after the lookup is smaller than the number of records before the lookup.
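These lookup semantics can be sketched in plain Python (an analogy, not DataStage code; the field names are hypothetical): the reference flow is a dict keyed like the hash file, and each main-flow row is enriched by key. A non-exclusive lookup keeps rows whose key is missing; an exclusive lookup drops them, which is why the output can only shrink or stay the same size.

```python
# Reference flow loaded into a dict, standing in for the hash file.
film_types = {"SF": "Science fiction", "COM": "Comedy"}

main_flow = [
    {"film": "Alien", "type_code": "SF"},
    {"film": "Brazil", "type_code": "XXX"},   # this lookup will fail
]

# Non-exclusive lookup: missing keys get a default ("unknown type").
non_exclusive = [dict(r, type=film_types.get(r["type_code"], "unknown type"))
                 for r in main_flow]

# Exclusive lookup: rows whose key is not found are dropped.
exclusive = [dict(r, type=film_types[r["type_code"]])
             for r in main_flow if r["type_code"] in film_types]
```

Exercises 3 to 6 below are the graphical versions of exactly these two variants.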
Transformer stage: lookup
Principal flow (horizontal)
Reference flow (vertical)
Exercise 3:
Objective: make a lookup between the Catalogue file and the FilmType file, to put the film type in the output file.
Source file: catalogue.in (in the \in directory). Target file: catalogue.out (in the \out directory).
Steps:
1. Create a table definition (structure of the FilmType table).
2. Modify your job to create a hash file from the FilmType.in file.
3. Create the link to show the lookup (data flow).
4. Save and compile the job.
5. Run the job.
6. Look at the performance statistics (right click).
Exercise 4:
Objective: put the director name and the film name together, separated by a ">". If the film type is not found, put "unknown type" in the output file. What happens when the director name is empty? Find a solution.
Exercise 5:
Objective: if the film type is not found (use a constraint), put the film in a rejects file (first a sequential file, then a hash file).
Lookup with selection (exclusive lookup).
Don't forget: a lookup can also be designed with an ORAOCI stage or a UV stage, but it is much better with hash files.
Exercise 6:
Objective: select only the films for which the type is known (that is, the lookup succeeds).
Exercise 7:
Objective: select all the clients who are female and put them in an output file. The SEXE column contains M (male) or F (female).
Then create an annotation for this job (all jobs must have annotations).
The director
The Director is the job controller; it allows you to:
Run jobs: immediately or later, with more options than in the Designer.
Control job status: Compiled, Running, Aborted, Validated, Failed validation, ...
Monitor jobs: control the number of lines processed by each active stage of a job.
Run jobs with Director
Select the job and click here, then enter the parameters.
To run a job later:
Click here, then choose the date and time.
To modify the running parameters of a job: Limits tab.
Warnings limit: the job stops after x warnings.
Rows limit: the job stops after x rows (on each flow).
Verify the status of jobs with Director
The statuses: "Not compiled", "Compiled", "Failed validation", "Validated OK", "Aborted", "Finished", "Running".
Example: list of jobs.
Toolbar buttons: run jobs, stop jobs, run jobs later, view the log, reset job status.
Example of a monitor:
For each stage: the number of processed lines (input and output), the start time, the execution duration (elapsed time), the status, the performance (rows/sec).
Link types: Pri (principal flow), Ref (reference flow, lookup), Out (output flow).
The monitor allows you to follow the different stages of a job. Note the importance of good names for the stages and the links!
Example of a log:
Green: OK, no problem. Yellow: warning. Red: blocking problem.
Don't forget: clear the log from time to time (Job > Clear Log).
To look at error messages, choose the job and click on the log button.
The manager
The Manager is the tool to export/import elements from one DataStage project to another.
All the elements (jobs, routines, table definitions) are organized in categories, but each name must be unique within a project.
To import or export elements, click on the appropriate button.
File > Open Project to change project.
Drag and drop an element to change its category.
Export
Choose what you want to export (this creates a .dsx file):
jobs,
routines (always check the Source Code box),
table definitions.
Options: append to an existing file; change the selection options (by category, or by individual components).
Import
This will create/modify elements in the DataStage project.
Choose what you want to import.
With the Manager, you can compile many jobs at once (multiple job compile):
Tools > Run multiple job compile;
select the type of jobs you want to compile, select "Show manual selection page" and click Next;
select the jobs and click Next;
click the Start compile button.
Sort stage:
The sorting criteria are entered in the Stage tab / Properties tab.
Modify these parameters if the file to sort has many lines.
Exercise 8:
Objective: once you have selected all the women, sort the file in alphabetical order.
Aggregator stage:
- Aggregates data into a smaller number of records.
- Intermediate processing is executed in memory.
- Allows a before/after routine to be executed (before the stage processing, or after it once all the lines have been processed).
- Performance is better if the data is sorted (Input tab).
- The aggregator does not sort the records.
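What the Aggregator computes can be sketched in plain Python (an analogy, not DataStage code): group rows by a key and apply a function (here, a sum acting as a count of hires) per group. Like the stage, this does not sort its output.

```python
from collections import defaultdict

# Each tuple is (cassette id, one hire); ids are illustrative.
hires = [("CAS1", 1), ("CAS2", 1), ("CAS1", 1), ("CAS1", 1)]

counts = defaultdict(int)
for cassette, n in hires:   # "group by" cassette, "sum" over n
    counts[cassette] += n
```

This is the core of exercise 9 below (hit parade of the most hired cassettes).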
Aggregator stage: Input tab
When the input data is sorted.
Aggregator stage: Output tab
Group by
Different functions
Exercise 9:
Objective: create a job which reads location.in and calculates the hit parade of the most hired cassettes (ordered by number of hires, descending). Also include the name of the film, not only the number of the cassette (lookup with catalogue.in).
Exercise 9 (job to design)
Exercise 10 (job to design)
Hash file stage: we have seen that the hash file is necessary for a lookup, and that it allows suppressing duplicate keys. Let's see now how it is useful for grouping different flows.
Exercise 11:
Objective: with the job from exercise 10 (use the 2 methods in the same job), create a hash file to put the different results in the same hash file.
Column 1: "AVERAGE METHOD 1" or "AVERAGE METHOD 2". Column 2: the result of each method. The hash file must contain 2 lines.
Exercise 11 (job to design)
Stage variables: simple processing can be done easily with stage variables.
- A stage variable remains active during the whole processing of the stage, so you can find a max (if the data is sorted), calculate a sum, or count something.
- In the transformer, right click and select Show Stage Variables. Example:
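The role of stage variables can be sketched in plain Python (an analogy, not DataStage code): variables that keep their value from row to row, updated in a fixed order as each row passes through, here accumulating a sum and a count to produce an average.

```python
rows = [10, 20, 30, 60]   # illustrative values flowing through the stage

total = 0   # stage variable: must be initialized
count = 0   # stage variable: must be initialized
for value in rows:        # updated once per row, in declaration order
    total += value
    count += 1

average = total / count
```

This is exactly the mechanism exercise 12 asks you to reproduce graphically.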
Another example:
Exercise 12:
Objective: try to calculate the average with stage variables.
Exercise 13:
Objective: create a job that creates a file with all the clients (key) and, in a second column, the list of their films (separated by a dot).
Exercise 13 (job to design)
Exercise 13 (job to design): the order of the stage variables is important; the instructions are executed in the order of the stage variables! (To change the order: right click > Stage Properties > Link Ordering tab.)
The variables must be initialized (right click > Stage Properties > Variables).
There must be a hash file after the stage.
DataStage variables:
Several system variables are defined by DataStage: @NULL, @INROWNUM, @OUTROWNUM, @DATE, @TRUE, @FALSE, @PATH.
Link variables: the most useful is NOTFOUND.
Routines:
- Source code (written in the BASIC language).
- External to the jobs; can be used many times, at many levels.
- Can be a transform function or a before/after subroutine:
a transform function is called for each line;
a before subroutine is called before the first line (example: empty a file);
an after subroutine is called when all the lines have been processed.
Routines (1/3)
Type of routine
Name of the routine
Always fill in the short description
Routines (2/3)
Arguments: to be filled in; they are used in the code.
Routines (3/3)
Code: use the argument names.
Save, Compile, Test the routine.
Routines: access to a sequential file
OpenSeq Path To FicXXX Then ... End Else ... End
ReadSeq Line From FicXXX Then ... End Else ... End
WriteSeq Line To FicXXX Then ... End Else ... End
WeofSeq FicXXX (to empty the file)
CloseSeq FicXXX
Routines: BASIC language elements
If ... Then ... End Else ... End
GoTo
For i = ... To ... Next i
Loop While ... Repeat
Loop Until ... Repeat
Call DSLogInfo("Information", "RoutineName")
Call DSLogWarn("Warning", "RoutineName")
Call DSLogFatal("Abort", "RoutineName")
Concatenation: A = "Hello" ; B = "World" ; C = A : " " : B gives C = "Hello World"
Field(string, ",", 3, 1) returns the third comma-delimited field
Trim(string, " ", "T") suppresses the trailing spaces
UpCase(string)
Iconv("05/27/97", "D2/") converts a date to internal format
Oconv(10740, "D2/") converts an internal date back to display format
Substring: A = "Hello" gives A[1,3] = "Hel"
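For readers more at home in Python, here are rough equivalents of the BASIC string operations above (a sketch: the BASIC side is as described in the slides, the Python side is an analogy, not a DataStage API):

```python
# BASIC ':' concatenates: C = A : " " : B
a, b = "Hello", "World"
c = a + " " + b

# BASIC Field(s, ",", 3, 1): third comma-delimited field
s = "one,two,three,four"
third = s.split(",")[2]

trailing = "abc   ".rstrip(" ")   # BASIC Trim(s, " ", "T")
upper = "abc".upper()             # BASIC UpCase(s)
hel = "Hello"[0:3]                # BASIC A[1,3] (1-based start, length 3)
```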
Routines: test
Double-click on the Result column.
Exercise 14:
Step 1:
Objective: write a routine which calculates the number of days between two dates.
If the begin date is null, return 0. If the end date is null, initialize it with today's date.
Save, compile and test the routine.
Exercise 14:
Step 2:
Objective: read location.in and generate a file with the hire duration (returned cassettes only). Cassettes not returned after 10 days (end date null) are written to a rejects file with the name and address of the client (in order to send them a mail).
Exercise 14 (job to design)
Exercise 15:
Objective: with a routine (use CASE), calculate the amount for the cassette hire (number of days * hire price * coefficient).
The coefficient depends on the hire duration, with bands at 5, 10 and 30 days (for example, 30 days: days * hire price * 3).
UV stage:
- works with internal hash files (in the DataStage project),
- can make a Cartesian product,
- uses SQL queries (SELECT ... FROM ... WHERE ... ORDER BY ...).
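A Cartesian product (the result of a SELECT over two tables with no join condition) can be sketched in plain Python with itertools (an illustrative analogy, not DataStage code):

```python
from itertools import product

clients = ["C1", "C2"]
cassettes = ["K1", "K2", "K3"]

# Every client paired with every cassette: 2 x 3 = 6 rows.
pairs = list(product(clients, cassettes))
```

Exercise 16 below builds this product graphically; note how quickly the row count grows, which is why the slide warns you to look at your job's performance.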
Exercise 16: execute the Cartesian product of the Clients file and the Cassettes file.
Objective: propose to each client the cassettes he has never hired.
Step 1: create the job parameter "account".
Step 2: create a job to write the clients hash file and the cassettes hash file in the DS project, using the account parameter.
Step 3: in a new job, use those hash files to make the Cartesian product.
Look at your job's performance!
Exercise 16: Step 1 and Step 2
Step 3:
The number of records
Normalization:
A multi-valued record such as "12 A|B|C|D|E" is normalized into five records: "12 A", "12 B", "12 C", "12 D", "12 E".
Normalization goes from the multi-valued file to the normalized file; un-normalization is the reverse operation.
Normalization:
A multi-valued file must have:
1. a key,
2. Char(253) or @VM as the value separator,
3. the "Normalize on" field of the hash file checked,
4. the column(s) to normalize.
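The normalization and un-normalization above can be sketched in plain Python (an analogy of the hash-file mechanism, not DataStage code; "|" stands in for @VM / Char(253) purely for readability):

```python
VM = "|"  # stand-in for the @VM value mark
multi_valued = {"12": "A|B|C|D|E"}   # one record, one multi-valued column

# Normalization: one output record per value.
normalized = [(key, v) for key, vals in multi_valued.items()
              for v in vals.split(VM)]

# Un-normalization: back to one multi-valued record per key.
unnormalized = {}
for key, v in normalized:
    unnormalized[key] = unnormalized.get(key, "")
    unnormalized[key] += (VM if unnormalized[key] else "") + v
```

Exercise 17 below performs the same round trip with a Sort stage, stage variables and a hash file.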
Exercise 17: normalization / un-normalization.
Step 1: create a job which reads the location.in file and writes a hash file (Id_Cli as the key, and the list of all Id_Cas separated by @VM): use a Sort stage and stage variables! Then View Data on the input link of the hash file.
Step 2: modify the job to add normalization of this file. Then View Data on the output link of the hash file.
Step 3: compare the sequential file with the location.in file.
Exercise 17 (job to design and View Data)
The ORAOCI stages:
The version of Oracle used is 9i, so use the ORAOCI9 stage.
You can:
- either use a query generated by DataStage,
- or use a user-defined query,
- or a combination of both.
The access parameters have to be defined as job parameters. The stage can access one table or more. Different actions can be programmed: read, insert, update.
You can also use stored procedures.
The ORAOCI stages: the access parameters have to be defined as job parameters.
The ORAOCI stages: output link.
Query generated by DataStage, or user-defined query.
Query generated by DataStage: selection of the table(s), selection of the columns, Group By clause, sort parameters.
Generate SELECT clause from column list; enter other clauses
Enter custom SQL statement: when you want to add something specific (to format a date, for example).
The ORAOCI stages: output link.
Choose the table
Choose the action
Important parameters
The ORAOCI stages: output link.
Number of lines between 2 commits.
The ORAOCI stages: verify the error code (1/3).
If the job must abort when there is a SQL error.
The ORAOCI stages: verify the error code (2/3).
To receive the SQL error code.
The ORAOCI stages: verify the error code (3/3).
Process the lines one by one.
To receive the SQL error code.
To select the errors.
The ORA Bulk stages:
- Insert into a table (like SQL*Loader).
- Very fast (deactivate the indexes before the load and reactivate them after the load).
- But no warning if an index is left in an Unusable state after the load (when there are duplicate keys, for example).
- Few date and time formats (DD.MM.YYYY, YYYY-MM-DD, DD-MON-YYYY, MM/DD/YYYY; hh24:mi:ss, hh:mi:ss am).
The ORA Bulk stages:
DSN, user, password.
Table name (with oracle.tableName).
Date and time format.
Number of lines between 2 commits.
How to create a table definition from a table in the database?
In the repository, right click on Table Definitions, then choose Import, then Plug-in Meta Data Definitions.
Then choose the table(s) and click Import.
The table definitions will be created in the ODBC category.
Exercise 18: read a database.
Objective: create a job which reads the table REF_CPTE in the BIODS database.
Step 1: create the table definition from the database.
Step 2: create the job that reads the table.
Exercise 19: write to a database.
Objective: create a job which writes into the table TST_ALADIN_JGV in the BIODS database (only the first 2 columns: keys).
location.in => TST_ALADIN_JGV: Id_Cli => CHAR1, Id_Cas => CHAR2.
In CHAR1, put a letter (different for each group) before the client number (Id_Cli).
Step 1: use the ORAOCI stage.
Step 2: same exercise with the ORABULK stage.
Exercise 20: update a database.
Objective: create a job to update the columns BEGIN_DATE and END_DATE in the table TST_ALADIN_JGV in the BIODS database from the location.in file.
BEGIN_DATE and END_DATE are defined as timestamps!
The administrator
The Administrator:
Create a DataStage project.
Unlock a job: sometimes, due to server problems, the Designer (or Manager) crashes and some elements may be left locked (jobs, table definitions, routines, ...). In that case, use the Administrator (with administrator security rights):
Unlock a job (1/3)
Choose your project and click on the Command button.
To create a project
Unlock a job (2/3)
CHDIR C:\Ascential\DataStage\Engine
LIST.READU
Search for the device number and the user number.
Unlock a job (3/3)
Unlock your job with the device number, or with the user number (UNLOCK USER UserNumber READULOCK), or everything (UNLOCK ALL).
Create a project:
Project name.
Location for the project (jobs, routines, UV hash files, table definitions, ...) on the server. It must be different from the location of the data directories!