Steps Involved in building an ETL process…
1. Create Source Definition
2. Create Target Definition
3. Design Mapping with or without Transformation Rule
4. Create Session for each Mapping
5. Create Workflow
6. Execute Workflow
Prerequisites:
1. Creation of User Accounts
Process
1. Create a user in Oracle (any database can be used)
a) Start → Programs → Oracle → Application Development → SQL*Plus
b) Log in with
Username: system
Password: manager
Host String: ORCL
SQL> CREATE USER BATCH7 IDENTIFIED BY target;
SQL> GRANT DBA TO BATCH7;
[Figure: Source DB (username: scott, password: tiger) with tables EMP, DEPT and BONUS; Target DB (username: batch7, password: target) with table Emp.]
SQL> CONNECT BATCH7/target@ORCL;
SQL> CREATE TABLE DIM_EMP
( EMPNO NUMBER(5) PRIMARY KEY,
ENAME VARCHAR2 (10),
SAL NUMBER(7,2),
DNO NUMBER(3) );
2. Create the ODBC connection
ODBC is a middleware interface that provides access to databases.
Start → Settings → Control Panel → Performance & Maintenance → Administrative Tools → Data Sources (ODBC)
For a standalone PC, set up a "User DSN".
For a PC on a network:
Select the System DSN tab → Click Add
-- The Create New Data Source window appears.
Select the driver "Oracle in oradb10g_home" (for Oracle 9i, "Oracle in OraHome90")
Click Finish
-- The Oracle ODBC Driver Configuration window appears.
Data Source Name: Batch7_Source_Oracle
TNS Service Name: ORCL
User ID: scott
Click Test Connection → Enter the password (tiger)
The message "Connection successful" means the user scott is connected through ODBC. If it does not appear, check the configuration settings and try again.
** One more ODBC connection is required for the target. Create it by repeating the process above, naming it BATCH7_TARGET_ORACLE and using the username BATCH7 (password: target) that you created earlier.
3. Starting Services
To start the services, run MSCONFIG from the Run prompt and choose Services in the window that appears. Alternatively, open Services from Control Panel → Administrative Tools.
Connect to the database with the following details:
ODBC data source: the connection name you gave when creating your ODBC connection.
Username: SCOTT
Owner name: SCOTT
Password: tiger
Click Connect
Select the tables you want as source definitions → OK
Repository menu → Save
** Your source definition has now been created and saved in the repository.
Step 2: Create Target Definition
A target definition is created using the Target Designer tool in the Designer client component.
Procedure
1. Tools menu → Target Designer
2. Targets menu → Import from Database
Connect to the database with the following details:
ODBC data source
Username
Password
Click Connect
Select the tables → OK
Repository menu → Save
Step 3: Design a mapping without a Transformation Rule
** A mapping without a transformation rule is called a Simple Pass mapping.
A mapping is created using the Mapping Designer tool. Every mapping is uniquely identified by its name.
Procedure
1. Tools menu → Mapping Designer
2. Mapping menu → Create
3. Enter the mapping name → OK
4. From the Repository Navigator pane, drag the source (EMP) and target (Dim_Emp) table definitions and drop them on the Mapping Designer workspace.
5. From the Source Qualifier (SQ_EMP), connect each column to the corresponding column in the target table definition by dragging. (You can also use auto connect.)
6. Repository menu → Save
Note: By default, every source table definition is associated with a Source Qualifier transformation.
The Source Qualifier transformation prepares the SQL statement that the Integration Service uses for extraction.
Step 4: Creation of Session
Process
1. Open the Workflow Manager client
2. Connect to the repository
3. Select your folder in the Repository pane → Tools menu → Task Developer
4. Task menu → Create
5. Select the task type Session → Enter the name → Click Create
6. Select the mapping → OK
7. Click Done
** Creation of the source connection: (This connection is required for extracting and loading the actual data. The ODBC connection you made earlier is used only for importing the structure of the tables.)
Connection menu → Relational
From the list, select Oracle → New → Enter the details
Name: BATCH7PM_SRC
Username: scott
Password: tiger
Connect String: ORCL
** Create the target connection in the same way.
8. Double-click the session (S_simplepass) → Mapping tab
9. From the left pane, select SQ_EMP → Set Connection to the connection you created for extraction
10. Repeat the process for the target Dim_Emp → Set Connection to the target connection
11. From Properties, set Target load type to "Normal" → Apply → OK
12. Repository menu → Save
Step 5: Creation of Workflow
Process
1. Tools menu → Workflow Designer
2. Workflow menu → Create
3. Enter the workflow name (WKF_simplepass)
4. From the Repository Navigator window, expand the Sessions subfolder → Drag the session and drop it beside the workflow
5. Task menu → Link Task → Drag the link from the workflow and drop it on the session
6. Repository menu → Save
Step 6: Executing Workflow
Process
1. Workflow menu → Start Workflow
** Now start the Workflow Monitor to view your workflow status.
** Informatica PowerCenter also lets you define the target manually and then create it in the target database.
Target Definition: Manual Approach
Process
1. Tools menu → Target Designer
2. Target menu → Create
3. Enter the target table name → Select the database type (Oracle) → Create → Done
4. Double-click the target definition → Columns tab → From the toolbar, click Add new column
The column structure will look like this:
Column name   Data type     Precision   Scale   Not Null   Key type
DEPTNO        Number(p,s)   2           0                  Primary key
DNAME         Varchar2      10          0                  Not a key
SUMSAL        Number(p,s)   7           2                  Not a key
5. Click OK → Done
6. Target menu → Generate/Execute SQL
7. Click Connect → Give the information
ODBC data source
Username
Password
8. Click Connect
9. Select Create Table → Generate/Execute → Close
Transformation
1. (Filter, Rank, Expression)
[Data flow diagram: source (14 rows) → Source Qualifier (14 rows) → Filter (14 in, 6 out) → Rank (6 in, 3 out) → Expression (3 in, 3 out)]
Business Logic: Calculate the tax for the top 3 employees of department number 30.
Ans: First use the Filter transformation to keep only the rows for department number 30, then the Rank transformation to take the top 3 employees of that department, and finally the Expression transformation to calculate the tax for each of them.
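The Filter → Rank → Expression chain can be sketched in plain Python. This is only an illustration of the data flow, not how PowerCenter runs it: the sample rows, ranking by salary, and the 15% tax rate are all assumptions made for the example.

```python
# Hypothetical sample rows standing in for the EMP source.
employees = [
    {"empno": 1, "ename": "EMP_A", "sal": 2850.0, "deptno": 30},
    {"empno": 2, "ename": "EMP_B", "sal": 1600.0, "deptno": 30},
    {"empno": 3, "ename": "EMP_C", "sal": 1250.0, "deptno": 30},
    {"empno": 4, "ename": "EMP_D", "sal": 950.0, "deptno": 30},
    {"empno": 5, "ename": "EMP_E", "sal": 3000.0, "deptno": 20},
]

TAX_RATE = 0.15  # assumed rate, for illustration only


def filter_dept(rows, deptno):
    """Filter step: keep only rows of the given department."""
    return [r for r in rows if r["deptno"] == deptno]


def rank_top(rows, n):
    """Rank step: keep the n rows with the highest salary (assumed rank port)."""
    return sorted(rows, key=lambda r: r["sal"], reverse=True)[:n]


def add_tax(rows):
    """Expression step: derive a TAX column for every row."""
    return [{**r, "tax": round(r["sal"] * TAX_RATE, 2)} for r in rows]


result = add_tax(rank_top(filter_dept(employees, 30), 3))
for row in result:
    print(row["ename"], row["sal"], row["tax"])
```

Each function consumes the output of the previous one, which mirrors how the row counts shrink from stage to stage in the data flow.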
Business Logic: Calculate the total Salary Paid for each department
Ans: The Sorter transformation sorts the data by department, which improves the performance of the Aggregator transformation. The Aggregator transformation then groups the rows by department and sums the salary for each group. The target table has a Dname port, but Dname is not in the Emp table, so a Lookup transformation is used to fetch Dname from the Dept table (Emp.deptno = Dept.deptno).
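As a rough sketch of the same logic outside PowerCenter, the following Python sorts the rows, sums the salary per department from the sorted input, and then looks up the department name. The sample rows and department names are made up for the example.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows standing in for EMP and DEPT.
emp = [
    {"deptno": 30, "sal": 2850.0},
    {"deptno": 10, "sal": 2450.0},
    {"deptno": 20, "sal": 3000.0},
    {"deptno": 10, "sal": 1300.0},
    {"deptno": 20, "sal": 800.0},
]
dept_names = {10: "DEPT_A", 20: "DEPT_B", 30: "DEPT_C"}

# Sorter: order rows by the group-by key.
sorted_rows = sorted(emp, key=itemgetter("deptno"))

# Aggregator with sorted input: groupby only needs to compare adjacent rows.
totals = [
    {"deptno": deptno, "sumsal": sum(r["sal"] for r in group)}
    for deptno, group in groupby(sorted_rows, key=itemgetter("deptno"))
]

# Lookup: fetch DNAME for each aggregated row on deptno = deptno.
target = [{**t, "dname": dept_names.get(t["deptno"])} for t in totals]
print(target)
```

`groupby` relies on the input already being sorted, which is the same reason the Aggregator can process sorted input more cheaply.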
Procedure
1. Create the source and target definitions
2. Create the mapping with the name M_LKP
3. Drop the source and target definitions on the workspace
4. Create transformations of type Sorter and Aggregator.
5. From SQ_EMP, copy the ports (Deptno, Sal) to the Sorter transformation
[Figure: Emp → SQ_EMP → Sorter T/R → Aggregator T/R → Emp_Sum, with a Lookup T/R feeding the target]
6. Double-click the Sorter transformation → Ports tab → For the port Deptno, check the Key checkbox → Apply → OK
7. From the Sorter transformation, copy the ports to the Aggregator transformation
8. Double-click the Aggregator transformation → Ports tab → For the port Deptno, check the Group By checkbox
9. For the port SAL, uncheck the output port (O)
10. From the toolbar, add a new port
Port name   Datatype   Precision   Scale   I   O   V   Expression
Sumsal      decimal    7           2           √       SUM(SAL)
11. Select the Properties tab → Select Sorted Input → Click Apply → OK
12. From the Aggregator, connect the ports to the target definition.
14. Select the Source tab → Select the source table definition Dept → OK
15. From the Aggregator transformation, copy the port Deptno to the Lookup transformation, then double-click the Lookup transformation
16. Select the Condition tab → From the toolbar, click Add a new condition.
Lookup table column   Operator   Transformation port
Deptno                =          Deptno1
17. Click Apply → OK
18. From the Lookup transformation, connect the Dname port to the target.
19. Repository menu → Save
20. Repeat the process of creating the session and workflow, and run the workflow.
Note: The Lookup transformation supports both equi-joins and non-equi-joins.
3. Joiner Transformation
[Data flow diagram: Emp (Empno, Ename, Job, Sal, Deptno) and Dept (Deptno, Dname, Location) → Joiner Transformation → Emp_Dept (Empno, Ename, Job, Sal, Deptno, Dname, Location)]
Business Logic: Merge the tables Emp and Dept.
Ans: Use the Joiner transformation. Copy the ports from both the Emp and Dept tables to the Joiner transformation, then set the join condition on the port that is available in both tables (deptno), just like an equi-join.
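A minimal Python sketch of an equi-join on deptno is shown below, assuming a normal (inner) join in which unmatched rows are dropped. The rows are invented for the example, and the dictionary of Dept rows only loosely mirrors how a joiner caches one side of the join.

```python
# Hypothetical rows standing in for the two sources.
emp = [
    {"empno": 1, "ename": "EMP_A", "deptno": 10},
    {"empno": 2, "ename": "EMP_B", "deptno": 20},
    {"empno": 3, "ename": "EMP_C", "deptno": 99},  # no matching department
]
dept = [
    {"deptno": 10, "dname": "DEPT_A", "location": "CITY_A"},
    {"deptno": 20, "dname": "DEPT_B", "location": "CITY_B"},
]

# Index one side of the join by the join key.
dept_by_no = {d["deptno"]: d for d in dept}

# Equi-join: keep an Emp row only when its deptno exists in Dept.
emp_dept = [
    {**e, "dname": dept_by_no[e["deptno"]]["dname"],
     "location": dept_by_no[e["deptno"]]["location"]}
    for e in emp
    if e["deptno"] in dept_by_no
]
print(emp_dept)
```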
Procedure
1. Create Source and Target definition
2. Create a mapping with the name M_DATA_JOIN
3. Drop the source and target definitions on the workspace
4. Create a transformation of type Joiner
5. From SQ_EMP, copy the following ports to the Joiner transformation (Empno, Ename, Job, Sal, Deptno)
6. From SQ_DEPT, copy the following ports to the Joiner transformation (Deptno, Dname, Location)
7. Double-click the Joiner transformation → Condition tab → From the toolbar, click Add new condition
Master    Operator   Detail
Deptno1   =          Deptno
8. Click Apply → OK
9. From the Joiner transformation, connect the ports to the target definition.
10. Repeat the process for creating the session and workflow, and run the workflow
Router Transformation
[Figure: a Router transformation with one Input group and the output groups State=HR, State=DL, State=KA and Default]
Business Logic: Divide the Emp table records department-wise.
Ans: Use the Router transformation, because it takes one input and provides multiple outputs. Connect each department's output group in the Router transformation to its own target table.
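The routing behaviour can be sketched in Python as follows. It assumes the usual semantics: a row is sent to every group whose filter condition it satisfies, and to the default group when it satisfies none. The group names follow the procedure below; the sample rows are hypothetical.

```python
def route(rows, groups):
    """Send each row to every group whose condition holds, else to DEFAULT."""
    outputs = {name: [] for name in groups}
    outputs["DEFAULT"] = []
    for row in rows:
        matched = False
        for name, condition in groups.items():
            if condition(row):
                outputs[name].append(row)
                matched = True
        if not matched:
            outputs["DEFAULT"].append(row)
    return outputs


# Hypothetical rows standing in for EMP.
emp = [
    {"empno": 1, "deptno": 10},
    {"empno": 2, "deptno": 20},
    {"empno": 3, "deptno": 30},
    {"empno": 4, "deptno": 40},
]

groups = {
    "Dept 10": lambda r: r["deptno"] == 10,
    "Dept 20": lambda r: r["deptno"] == 20,
    "Dept 30": lambda r: r["deptno"] == 30,
}

routed = route(emp, groups)
for name, rows in routed.items():
    print(name, [r["empno"] for r in rows])
```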
Procedure
[Figure: Sales → Sales_SQ → Router with output groups State HR, State DL, State KA and Default]
1. Create the source and target definitions
2. Create a mapping with the name M_Router
3. Drag and drop the source and target definitions onto the Mapping Designer workspace.
4. Create a Router transformation from the Transformation menu.
5. Copy all the ports from the Source Qualifier to the Router transformation
6. Double-click the Router transformation → Groups tab → From the toolbar, click Add new group
Group name   Group filter condition
Dept 10      Deptno=10
Dept 20      Deptno=20
Dept 30      Deptno=30
7. Click Apply → OK
8. Copy the ports from the Dept 10 group to the Emp_dept 10 target. (Repeat the process for the Dept 20 and Dept 30 groups.)
9. Repository menu → Save
10. Repeat the process for creating the session and workflow, and run the workflow.
Union Transformation
Note: All the sources must have the same structure.
Business Logic: Merge the two tables Emp and Employees into one table, Emp_Union.
Ans: Use the Union transformation. It takes multiple inputs and provides one output, but all the sources must have the same structure. Connect the target table to the single output group of the Union transformation.
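A small Python sketch of the idea follows. It first checks that both inputs share the same columns and then concatenates them. Duplicates are kept here, as in SQL's UNION ALL. The rows are invented for the example.

```python
# Hypothetical rows standing in for the two sources.
emp = [
    {"empno": 1, "ename": "EMP_A", "job": "CLERK"},
    {"empno": 2, "ename": "EMP_B", "job": "ANALYST"},
]
employees = [
    {"empno": 2, "ename": "EMP_B", "job": "ANALYST"},  # also present in emp
    {"empno": 3, "ename": "EMP_C", "job": "MANAGER"},
]


def union_all(*sources):
    """Concatenate the inputs after checking they share one structure."""
    columns = None
    merged = []
    for rows in sources:
        for row in rows:
            if columns is None:
                columns = set(row)
            elif set(row) != columns:
                raise ValueError("all sources must have the same structure")
            merged.append(row)
    return merged


emp_union = union_all(emp, employees)
print(len(emp_union))
```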
Procedure
1. Import the metadata of Emp and Employees from the source database using the Source Analyzer.
[Figure: Emp → SQ_Emp → Group 1 and Employees → Employees_SQ → Group 2 of the Union transformation; Output group → Emp_Union]
2. Create the target table Emp_Union
3. Create a mapping with the name M_Union
4. Drag and drop the source and target definitions onto the Mapping Designer workspace.
5. Create a Union transformation with the name Emp_Union
6. Double-click the Union transformation → Groups tab → Add new group → Name the groups Emp and Employees
7. Group Ports tab → Add new port
Name    Datatype   Precision   Scale
Empno   decimal    2           0
Ename   String     10          0
Job     String     10          0
8. Copy the ports from SQ_Emp to the Emp group
9. Copy the ports from Employees_SQ to the Employees group
10. Copy the ports from the output group to the target (Emp_Union)
11. Repeat the process of creating the session and workflow, and run the workflow.
Stored Procedure Transformation
Create the following stored procedure in the database.
CREATE PROCEDURE Annual_tax
(Sal IN NUMBER,
Tax OUT NUMBER)
IS
BEGIN
-- Tax is 15% of the salary passed in.
Tax := Sal * 0.15;
END;
/
Business Logic: Calculate the annual tax of each employee for the new column TAX in the target table.
Ans: This time the annual tax is calculated with a Stored Procedure transformation. The calculation then runs in the database, which can reduce the overhead on the Integration Service and improve performance.
Procedure
1. Create Source and Target Definition
2. Create the mapping with the name M_StoredP
3. Drop the source and target definition on to the Mapping designer Workspace.
4. Create a transformation of type Stored Procedure
5. Give the ODBC connection for the database that holds the stored procedure, that is, the source database or the target database.
If the stored procedure is in the target database, no additional settings are needed. Otherwise, make the following additional settings:
6. Double-click the Stored Procedure transformation → Properties tab → Set "Connection Information" to your relational connection name for the source.
7. While configuring the mapping in the session, also set the connection for the transformation to your relational connection name for the source.
8. From SQ_Emp, connect the SAL port to the Stored Procedure transformation.
9. From the Stored Procedure transformation, connect the TAX port to the target definition.
10. From SQ_Emp, connect the remaining ports to the target definition.
11. Repository menu → Save
12. Repeat the process for the session and workflow, and run the workflow.
Source Qualifier Transformation
In this transformation we generally change the SQL code of the Source Qualifier.
Business Logic: Load into the target only the employees that belong to department numbers 20 and 30, with the records sorted by salary in ascending order.
Ans: Earlier we did this with the Filter transformation, but now we will do it in the Source Qualifier transformation. Because the filtering and sorting then happen in the database, fewer rows reach the Integration Service, which usually makes it more efficient.
Procedure
1. Create Source and Target Definition
2. Create a mapping with the name M_Source_filter
3. Drop the source and target definitions on the Mapping Designer window.
4. From SQ_Emp, connect the required ports to the target table, just like a simple pass mapping.
5. Double-click SQ_Emp → Properties tab → Set "Number of sorted ports" to 1 → Set "sql query" to:
SELECT EMP.EMPNO, EMP.ENAME, EMP.SAL, EMP.DEPTNO FROM EMP WHERE DEPTNO IN (20, 30) ORDER BY EMP.SAL
→ Click Generate SQL → Click Apply → OK
** Here you can also set other properties for your query, such as Distinct, by checking the Distinct checkbox.
** By default the ORDER BY clause is applied to empno, because with "Number of sorted ports" set to 1 the Integration Service takes the ports in sequence starting from empno. With a value of 2 it would sort on empno and ename. Change the SQL query to sort on the column you actually need.
Note: If you set the value for "sql query" without first connecting the ports to the target, you will get an error message, so connect SQ_Emp to the target ports first.
6. Repeat the process for creating the session and workflow, and run the workflow.
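To see what the overriding query returns, it can be tried against a small in-memory table. The sketch below uses Python's built-in sqlite3 module as a stand-in for the Oracle source, which is an assumption made purely for illustration. The rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMP (EMPNO INTEGER PRIMARY KEY, ENAME TEXT, SAL REAL, DEPTNO INTEGER)"
)
# Hypothetical rows; department 10 should be filtered out by the query.
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?, ?)",
    [
        (1, "EMP_A", 3000.0, 20),
        (2, "EMP_B", 1250.0, 30),
        (3, "EMP_C", 2450.0, 10),
        (4, "EMP_D", 800.0, 20),
    ],
)

# Same statement as the Source Qualifier SQL query in step 5.
query = (
    "SELECT EMP.EMPNO, EMP.ENAME, EMP.SAL, EMP.DEPTNO "
    "FROM EMP WHERE DEPTNO IN (20, 30) ORDER BY EMP.SAL"
)
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

Only the rows for departments 20 and 30 come back, ordered by salary, so no Filter or Sorter is needed downstream.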
User Define Join in Source Qualifier Transformation
A user-defined join is possible in the Source Qualifier only when the two sources belong to the same database user account or schema.
Business Logic: Join the two tables Emp and Dept to get the department name and location from the Dept table for each employee in the Emp table.
Ans: Instead of using a Joiner transformation, we will use the User Defined Join option in the Source Qualifier, because both tables are in the same schema (scott). The join is then performed by the database, which usually improves performance as well.
Procedure
1. Create the source and target definitions
2. Create a mapping with the name M_Source_join
3. Drag and drop the source and target definitions onto the Mapping Designer workspace.
4. From the Source Qualifier, connect the ports to the target definition.
5. Double-click the Source Qualifier → Properties tab → Set the value for "User defined join" to Emp.Deptno=Dept.Deptno
6. Click Apply → OK
7. Repository menu → Save
8. Repeat the process for creating the session and workflow, and run the workflow.
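The effect of the user-defined join condition can be approximated with a query that places it in the WHERE clause. The sketch below again uses Python's sqlite3 module in place of Oracle, and the exact SQL that the Source Qualifier generates is not shown in this document, so treat the query text as an assumption. The rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE DEPT (DEPTNO INTEGER PRIMARY KEY, DNAME TEXT, LOCATION TEXT);
    CREATE TABLE EMP (EMPNO INTEGER PRIMARY KEY, ENAME TEXT, DEPTNO INTEGER);
    INSERT INTO DEPT VALUES (10, 'DEPT_A', 'CITY_A'), (20, 'DEPT_B', 'CITY_B');
    INSERT INTO EMP VALUES (1, 'EMP_A', 10), (2, 'EMP_B', 20), (3, 'EMP_C', 20);
    """
)

# The condition entered in the "User defined join" property.
user_defined_join = "EMP.DEPTNO = DEPT.DEPTNO"

query = (
    "SELECT EMP.EMPNO, EMP.ENAME, DEPT.DNAME, DEPT.LOCATION "
    "FROM EMP, DEPT WHERE " + user_defined_join + " ORDER BY EMP.EMPNO"
)
joined = conn.execute(query).fetchall()
for row in joined:
    print(row)
conn.close()
```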
Mapplet
A mapplet is a reusable metadata object that holds business logic built from a set of transformations.
Procedure
1. Tools menu → Mapplet Designer
2. Mapplet menu → Create → Give the name of the mapplet
3. Transformation menu → Create → Mapplet Input → Enter a proper name → Create → Done
4. Transformation menu → Create → Mapplet Output → Enter a proper name → Create → Done
5. Create the Filter transformation and the Expression transformation.
7. From the Mapplet Input, copy the ports to the Filter transformation → Change the data type, precision and scale of the required ports → Define the filter condition Deptno=20 OR Deptno=30
8. From the Filter transformation, copy the ports to the Expression transformation.
9. Create an output port with the name TAX, uncheck its input port checkbox, and develop the expression with the following syntax
IIF(SAL>2000, SAL*15, SAL*20)
10. From the Expression transformation, copy the ports to the Mapplet Output transformation.
11. Repository menu → Save
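In ordinary code, a mapplet plays roughly the role of a reusable function. The Python sketch below wraps the filter and the TAX expression from the steps above in one function that any caller can reuse. The `iif` helper is a hypothetical stand-in for the expression-language function, the multipliers are copied from step 9 as written, and the sample rows are invented.

```python
def iif(condition, when_true, when_false):
    """Hypothetical helper mimicking IIF(condition, value1, value2)."""
    return when_true if condition else when_false


def mapplet_tax(rows):
    """Reusable logic: keep departments 20 and 30, then derive TAX."""
    kept = [r for r in rows if r["deptno"] == 20 or r["deptno"] == 30]
    return [
        {**r, "tax": iif(r["sal"] > 2000, r["sal"] * 15, r["sal"] * 20)}
        for r in kept
    ]


# Hypothetical input rows.
emp = [
    {"empno": 1, "sal": 3000, "deptno": 20},
    {"empno": 2, "sal": 1000, "deptno": 30},
    {"empno": 3, "sal": 5000, "deptno": 10},
]

output = mapplet_tax(emp)
print(output)
```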
Design a Mapping with Mapplet
Business logic: Create a mapping that extracts and loads the data from the table Emp for the employees who belong to department number 20 or 30, and that also calculates their annual tax.
Ans: Use the mapplet created in the exercise above, because it already implements this business logic.
Procedure
1. Create Source and Target Definition
2. Create a mapping with name M_Mapplet
3. Drop the source and target definitions onto the Mapping Designer workspace.
4. From the Mapplets subfolder, drag the mapplet and drop it beside the Source Qualifier.
5. From SQ_Emp, connect the ports to the mapplet input, and from the mapplet output connect the ports to the target definition.
6. Repository menu → Save
7. Repeat the process for creating the session and workflow, and run the workflow.
Constraint Based Load Ordering
Constraint-based load ordering (CBL) is specified when you want to load data into snowflake dimensions that have a primary key and foreign key relationship.
Exercise: Using CBL, load data into the dimensions DEPT and EMP, where deptno is the primary key in the DEPT table and a foreign key in the EMP table.
Procedure
1. Create Source and Target Definition
2. Create Mapping with the name M_CBL
3. Drag and drop the source and target definitions onto the mapping workspace
4. From SQ_Emp_dept, connect the ports to the target definitions.
5. Create a session with the name S_CBL
6. Double-click the session → Config Object tab → Check Constraint Based Load Ordering
7. Select the Mapping tab → Set the relational connection for the source and for each target → Apply → OK
8. Repeat the process for creating the workflow, and run the workflow.
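The ordering idea behind constraint-based loading, parent (primary key) tables before child (foreign key) tables, can be sketched as a topological sort. This example uses `graphlib` from the Python standard library (Python 3.9+). The dependency map is written by hand here, whereas PowerCenter derives the order from the key relationships in the target definitions.

```python
from graphlib import TopologicalSorter

# Each target maps to the set of targets it references through a foreign key.
# EMP.deptno references DEPT.deptno, so DEPT must be loaded first.
foreign_keys = {
    "DEPT": set(),
    "EMP": {"DEPT"},
}

load_order = list(TopologicalSorter(foreign_keys).static_order())
print(load_order)
```

Loading EMP before DEPT would violate the foreign key for any employee whose department row had not been inserted yet, which is what the ordering prevents.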
Scheduling Workflow
A schedule specifies the date and time to run the workflow.
Procedure
1. From the Workflow Manager → Tools menu → Workflow Designer → Create a workflow
2. Select the Scheduler tab → Select the Reusable radio button → Set the values for the scheduler
For Run Options
Run on Integration Service initialization
For Schedule Options, select Run every day
For End Options, select Forever
Set the start date and time
3. Click Apply → OK
4. Repeat the rest of the process of creating the workflow.
Working with Flat files
Procedure
Step 1: Creation of Source Definition
i) Tools menu → Source Analyzer → Sources menu → Import from File
ii) Browse to the location of the flat file → Select the file → OK
iii) In the pop-up window, make these settings
a) Select the flat file type Delimited
b) Check "Import field names from first line" → Click Next
c) Select the delimiter type → Next
d) If required, alter the data types for the source definition → Finish
iv) Repository menu → Save
Step 2: Create the target table in the target database and repeat the process for the target definition
Step 3: Create a mapping → Give it a proper name → From the Source Qualifier, connect the ports to the target definition.
Step 4: Create a session → Give it a proper name
Step 5: Double-click the session → Mapping tab → Select SQ_Customer from the left pane → Set the attributes as follows
Attribute               Value
Source File Directory   D:\Flatfiles
Source File Name        Customer.txt
Source File Type        Direct
Step 6: Set the target settings for loading as usual, with a relational connection.
Step 7: Repeat the process for creating the workflow, and run the workflow.
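Direct and Indirect Communication of the Integration Service with the Source File Type
[Figure: With source file type Direct, the Integration Service reads the data file itself (C:\Flatfile\Customer.txt). With source file type Indirect, it reads a file that contains a list of files (path: D:\Files\Cust.txt).]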
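The difference between the two source file types can be sketched in Python. The function below is a hypothetical illustration, not PowerCenter code: with "Direct" the given path is the data file, and with "Indirect" the given path is a list file whose lines name the data files to read. The demo writes its own temporary files, so none of the paths refer to the ones in the figure.

```python
import os
import tempfile


def read_source(path, source_file_type):
    """Return the data lines reached through a Direct or Indirect source file."""
    if source_file_type == "Direct":
        data_files = [path]
    elif source_file_type == "Indirect":
        # The list file holds one data file path per line.
        with open(path) as list_file:
            data_files = [line.strip() for line in list_file if line.strip()]
    else:
        raise ValueError("source_file_type must be 'Direct' or 'Indirect'")

    rows = []
    for data_file in data_files:
        with open(data_file) as handle:
            rows.extend(line.rstrip("\n") for line in handle)
    return rows


with tempfile.TemporaryDirectory() as tmp:
    file_a = os.path.join(tmp, "customer_a.txt")
    file_b = os.path.join(tmp, "customer_b.txt")
    file_list = os.path.join(tmp, "file_list.txt")
    with open(file_a, "w") as handle:
        handle.write("1,CUST_A\n")
    with open(file_b, "w") as handle:
        handle.write("2,CUST_B\n")
    with open(file_list, "w") as handle:
        handle.write(file_a + "\n" + file_b + "\n")

    direct_rows = read_source(file_a, "Direct")
    indirect_rows = read_source(file_list, "Indirect")

print(direct_rows)
print(indirect_rows)
```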
8. Select the option "User Defined" → Click Browse Events to choose an event → Select the event → Click OK
9. Repository menu → Save
10. Execute the workflow
Workflow with Decision Task
With a Decision task you can enter a condition that determines how the workflow executes, similar to a link condition.
Use a Decision task instead of multiple link conditions in the workflow.
Procedure
1. Create four sessions.
2. Tools menu → Workflow Designer → Workflow menu → Create
3. Enter the workflow name → Events tab → Create new events → Enter the event name → Click OK
[Figure: workflow WKF with sessions S10, S200, S300 and S400, a Decision task with its decision condition, a Command task and an Event Wait task]
4. Create the workflow as shown in the figure above → Task menu → Create → Select the tasks "Decision", "Command" and "Event Wait" → Create → Done
5. Make the links between the tasks → Double-click the Decision task → Select the Properties tab
Attribute       Value
Decision Name   $S10.Status=Succeeded AND $S20.Status=Succeeded AND $S30.Status=Succeeded
6. Click OK
7. Double-click the input link to the Command task → Properties tab → Create the expression
Expression: $Decision.Condition=True
8. Double-click the Command task → Commands tab
Name      Command
Success   copy D:\CMDTASK\RESULT.txt D:\Success
9. Double-click the Event Wait task → Predefined Event → Enter the file name
D:\Success\RESULT.txt
Note: If you want the watch file deleted after the task completes, select the option "Delete Watch File" on the Properties tab of the Event Wait task.
10. Repository menu → Save
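The decision logic above can be read as a simple boolean check. In the Python sketch below, which only illustrates the control flow, the dictionary of session statuses and the function names are hypothetical. The Command task runs only when the decision condition evaluates to true.

```python
def decision_condition(status):
    """True only if S10, S20 and S30 all finished with status Succeeded."""
    return all(status.get(name) == "Succeeded" for name in ("S10", "S20", "S30"))


def tasks_to_run(status):
    """Follow the link to the Command task only when the decision is true."""
    return ["Command"] if decision_condition(status) else []


all_ok = {"S10": "Succeeded", "S20": "Succeeded", "S30": "Succeeded"}
one_failed = {"S10": "Succeeded", "S20": "Failed", "S30": "Succeeded"}

print(tasks_to_run(all_ok))
print(tasks_to_run(one_failed))
```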
Timer Task
With the Timer task you can specify how long the Integration Service waits before it runs the next task in the workflow.
Procedure
1. Create a Timer task from the Task menu of the Workflow Designer.
2. Double-click the Timer task → Timer tab
3. Select Absolute Time → Specify the date and time → Apply → OK
Design a workflow with multiple link conditions (alternative to the Decision task)
[Figure: workflow WKF in which sessions S10 and S20 both link to a Command task, with the link conditions $S10.Status=Succeeded and $S20.Status=Succeeded]
Procedure
1. Design the workflow as shown in the figure above.
2. Double-click the Command task → General tab
Treat Input Links As: AND or OR
3. Commands tab → Give the command
4. Repository menu → Save
Assignment Task
With the Assignment task you can assign a value to a user-defined workflow variable.
To use an Assignment task, first create it and add it to the workflow, then configure it to assign a value or expression to the user-defined variable.
** Weekly and Daily Loading
Procedure
1. Create three sessions.
2. Tools menu → Workflow Designer → Workflow menu → Create
3. Enter the workflow name → Select the Variables tab → From the toolbar, click Add new variable
Name         Datatype   Persistent
$$WKF_RUNS   Integer    √
Enter the default value 0
4. From the Repository Navigator window, drag the session S10 and drop it beside the Start task.
5. Create tasks of type "Decision" and "Assignment"
6. Drag and drop the sessions S20 and S30
7. Make the links between the tasks → Double-click the link between S10 and the Assignment task → Develop the following expression
$S10.Status=Succeeded
8. Double-click the Assignment task → Expressions tab → From the toolbar, click Add a new expression
User-defined variable   Operator   Expression
$$WKF_RUNS              =          $$WKF_RUNS + 1
9. Double-click the link between the Assignment task and the Decision task → Develop the following expression
$Assign_value.Status=Succeeded
10. Double-click the Decision task → Properties tab
Attribute       Value
Decision Name   MOD($$WKF_RUNS, 7) = 0
11. Double-click the link between the Decision task and session S20 → Develop the link condition
$Decision.Condition=True
12. Double-click the link between the Decision task and session S30 → Develop the link condition
$Decision.Condition=False
13. Repository menu → Save
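The counter logic can be simulated in a few lines of Python. The sketch assumes the workflow runs once a day and that the persistent variable keeps its value between runs, so S20 is picked on every seventh run (the weekly load) and S30 on all other runs (the daily load). That reading of which session is weekly is inferred from the link conditions, not stated in the steps.

```python
def run_once(wkf_runs):
    """Simulate one workflow run; returns the new counter and the session chosen."""
    wkf_runs = wkf_runs + 1             # Assignment task: $$WKF_RUNS = $$WKF_RUNS + 1
    decision = (wkf_runs % 7 == 0)      # Decision task: MOD($$WKF_RUNS, 7) = 0
    session = "S20" if decision else "S30"
    return wkf_runs, session


wkf_runs = 0        # default value of the persistent variable
history = []
for _ in range(14):  # fourteen consecutive runs
    wkf_runs, session = run_once(wkf_runs)
    history.append(session)

print(history)
```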
E-mail Task
The E-mail task is used to send an e-mail from within a workflow.
Procedure
1. Develop a workflow with an E-mail task.
2. Double-click the E-mail task → Properties tab → Set the following attributes