Informatica Interview Questions

1. How to join two tables without common columns?
Create a dummy port in both pipelines and assign the same value (e.g. 1) to both ports in an Expression transformation before the Joiner. Then use this dummy port in the join condition to join the two tables. Alternatively, join the tables on a constant key passed through both pipelines; the effect is the same as matching on a constant.
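The dummy-port trick is the Joiner equivalent of joining on a constant in SQL. A minimal SQL illustration of the same idea (the table names are made up):

    SELECT t1.*, t2.*
    FROM   table_one t1
    JOIN   table_two t2 ON 1 = 1;  -- constant condition: every row of t1 pairs with every row of t2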
2. How to generate a sequence of keys or numbers in the target without using the Sequence Generator transformation?
It can be done using the SETVARIABLE function. Add a mapping variable (say $$SEQ_NO) with an initial value of 0. Then, in an Expression transformation, create an output port that increments it, for example:

    OUT_SEQ_NO = SETVARIABLE($$SEQ_NO, $$SEQ_NO + 1)

For every row that passes through, the value is incremented by 1, and the final value is saved to the repository at the end of the run, seeding the next run.
3. How do you load only the duplicate rows into the target table in Informatica?
Use this condition in the SQL override:

    SELECT * FROM table_name
    WHERE  rowid NOT IN (SELECT MAX(rowid) FROM table_name GROUP BY key_column_name);

Or use a Rank transformation: rank on the field that identifies the duplicates, and load only the rows with rank greater than 1 into the target table.
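If the database supports window functions, an alternative form of the override (a different query from the one above; the column names are illustrative) returns every duplicated row, including the first occurrence:

    SELECT *
    FROM  (SELECT t.*,
                  COUNT(*) OVER (PARTITION BY key_column_name) AS cnt
           FROM   table_name t)
    WHERE  cnt > 1;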
4. When we use only an Aggregator transformation in our mapping for approximately 5 million records, it takes 40-42 minutes, but when we use it with a Sorter transformation the time reduces to 12-13 minutes. We also noticed that the throughput of the SELECT statement from the source was much higher. That it aggregates grouped data quickly is fine, but why was the throughput of the SELECT statement also higher with the Sorter transformation?
When an Aggregator transformation is used without a Sorter, it must cache all the rows before performing the grouping operation, so the source reader is throttled by the Aggregator. When a Sorter is placed before the Aggregator, the data arrives already grouped: as soon as a row with a new group-by key appears, the Aggregator knows the previous group is complete and can emit it, so rows flow through instead of piling up in the cache. For example, with the sorted input

    eno  ename
    1    A
    1    C
    2    B

once the record "2 B" reaches the Aggregator, it can emit the group for eno "1" immediately, which is not the case when a Sorter is not used.
5. How to find duplicate records using an Aggregator?
It is similar to the SQL query:

    SELECT col1, col2, ..., COUNT(*)
    FROM   table_name
    GROUP  BY col1, col2, ...
    HAVING COUNT(*) > 1;

Similarly, in the Informatica Aggregator transformation, select Group By for all the columns and add one output port:

    OUT_CNT_RCRDS = count(*)

In the next transformation, use a Router with two groups:

    G1: OUT_CNT_RCRDS = 1  --> TGT_NO_DUPLICATES
    G2: OUT_CNT_RCRDS > 1  --> TGT_DUPLICATES
6. How to delete the first 3 rows and the last 3 rows in the target table in Informatica?
A common form of the answer uses ROWNUM together with the total row count, keeping everything except the first three and last three rows:

    SELECT *
    FROM  (SELECT t.*, ROWNUM rn FROM table_name t)
    WHERE rn > 3
      AND rn <= (SELECT COUNT(1) FROM table_name) - 3;
7. How do you maintain history in an SCD Type 2 dimension? With a flag, a time column, or by mentioning a version.
8. What is the way to add the total number of records read from the source as the last line of the target file?
This can be achieved using an Aggregator transformation. Add an Aggregator with no Group By ports checked and one extra output port:

    OUT_TTL_RECORDS = count(*)

With no group-by ports, the Aggregator returns a single row carrying the total count; pass this port value as the last record of the flat file target.
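As a sketch of the shape of the final file, here is an Oracle-flavored SQL equivalent (names are illustrative) in which the count row trails the data rows:

    SELECT col1, col2 FROM src_table
    UNION ALL
    SELECT 'TOTAL', TO_CHAR(COUNT(*)) FROM src_table;  -- footer row carrying the record count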
9. If there are multiple source flat files with different names but the same file structure, how do we load all those files to the target in one step?
1. Create the mapping as if there were only a single source and target.
2. Create an additional file on the server that lists the multiple source file names along with their paths.
3. Specify the path and name of this list file in "Source File" under the session properties.
4. Now the most important part: set "Source Filetype" to "Indirect" under the session properties.
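For illustration, the list file itself is just plain text with one source file per line (these paths are made up):

    /data/src/sales_east.dat
    /data/src/sales_west.dat
    /data/src/sales_north.dat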
10. Write a query to retrieve the latest records from the target table. That is, if we have used an SCD Type 2 dimension of the version type, retrieve the record with the highest version number. For example, given

    verno  id   loc
    1      100  bang
    2      100  kol
    1      101  bang
    2      101  chen

we have to retrieve 100/kol and 101/chen. How is this possible through a query?

    SELECT * FROM table_name
    WHERE  (id, verno) IN (SELECT id, MAX(verno) FROM table_name GROUP BY id);
11. I have a scenario where ten flat files are to be loaded into a target, but I also need to load the file names into a table at the mapping level. I thought this could be achieved with a Transaction Control transformation, but I could not get it to work. Please advise how to implement the logic (it must be captured at the mapping level only). This question was asked in an interview.
If you are loading a target table from multiple flat files and want to add the source file name as a field in each row of the target table, Transaction Control will not help you here. Load all the source files using the Indirect option at the session level: list all the source file names in one flat file and give that as the input source file. Then, in the PowerCenter Designer, go to the Source Definition and enable the property "Add Currently Processed Flat File Name Port". This adds an additional port to the source definition; pass that port to the target table's filename field.
12. Suppose I have one source linked to 3 targets. When the workflow runs for the first time, only the first target should be populated and the other two should not. When it runs for the second time, only the second target should be populated; when it runs for the third time, only the third.
You can use the 3 target tables as lookups. For each incoming row, check whether it is already in each target, set flags accordingly, and then evaluate the flags in a Router:

    if in target 1, set flag1 = 'Y', else 'N'
    if in target 2, set flag2 = 'Y', else 'N'
    if in target 3, set flag3 = 'Y', else 'N'

    if flag1 = 'N', route to target 1
    if flag1 = 'Y' and flag2 = 'N', route to target 2
    if flag1 = 'Y', flag2 = 'Y' and flag3 = 'N', route to target 3

Of course, this applies only if you are inserting rows into the targets. If you have updates, the logic gets more complicated because you have to check for changed values, but the concept stays the same.
Or declare a workflow variable, say a counter with a default value of 1, and increment it by 1 on each run. On the first run, the counter mod 3 is 1, so load the first target; on the second run, (counter mod 3) = 2, so load the second target; on the third run, (counter mod 3) = 0, so load the third target. The repository automatically persists the counter value when the workflow finishes successfully, and on the next run the recent value is read back from the repository (see the router-condition sketch below).
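A minimal sketch of the Router group conditions for this counter approach, assuming the persisted variable is named $$COUNTER (the name is illustrative):

    Group 1 (load TARGET1): MOD($$COUNTER, 3) = 1
    Group 2 (load TARGET2): MOD($$COUNTER, 3) = 2
    Group 3 (load TARGET3): MOD($$COUNTER, 3) = 0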
In my mapping I have multiple source files and only one target flat file, and I need to implement the logic below. Can anyone suggest how to do it?
Input:

    file1:
    field1 field2 field3
    1      A      B
    2      C      D
    3      E      F

    file2:
    4      G      H
    1      I      J
    5      K      L

    file3:
    4      M      N
    6      O      P

The three files are read in the order file3, file2, file1. The logic needed: if the record for a key such as '1' is present in multiple files, write the record from the first file and discard the records for that key in the rest of the files. My target is a flat file; I tried an Update Strategy but later found that the update concept does not work with flat files. Please suggest another way.
Output:

    6 O P
    4 G H
    5 K L
    1 A B
    2 C D
    3 E F

At the Informatica level we can do this; however, instead of having a fixed number of source pipelines (since the number of files to be placed is not known in advance), it is better to read all the files by indirect listing and then rank based on the source file name port, grouping on field1. With indirect listing we are independent of the number of incoming files and avoid Union operations.
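A SQL sketch of the same rank-and-keep-first idea, assuming the source file name has been captured on each row (table and column names are illustrative; the ORDER BY stands in for whatever file-priority rule is required):

    SELECT field1, field2, field3
    FROM  (SELECT s.*,
                  ROW_NUMBER() OVER (PARTITION BY field1
                                     ORDER BY src_file_name) AS rn
           FROM   staged_rows s)
    WHERE  rn = 1;  -- one row per key, taken from the highest-priority file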
13. Informatica partitioning: see question 18 below, which covers session partitioning in detail.
14. Adding a header and footer in Informatica?
You can get the column heading for a flat file target using the session configuration (the flat file target's header options). That setting produces a file with the header record 'Cust ID,Name,Street #,City,State,ZIP'.
Custom flat file footer: you can likewise get a footer using the session configuration, producing a file with ***** End Of The Report ***** as the last row.
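A sketch of the flat file target's session-level settings involved, using the header and footer text from the example above (the echo commands are illustrative):

    Header Options : Use header command output
    Header Command : echo 'Cust ID,Name,Street #,City,State,ZIP'
    Footer Command : echo '***** End Of The Report *****'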
15. How to read a compressed source file?
Before the file is read, it needs to be unzipped, but we do not need a pre-session script to achieve this; it can be done easily with a session setting. Configure the source with a command: the command writes rows to stdout and the flat file reader reads directly from stdout, which removes the need for staging the uncompressed data.
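A sketch of the source session properties for this, assuming a gzip-compressed file at a made-up path:

    Input Type   : Command
    Command Type : Command generating data
    Command      : zcat /data/src/customer.dat.gz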
16. Reading multiple files / generating a file list.
For reading multiple file sources with the same structure, we use the indirect file method. Indirect file reading is made easy using the File Command property in the session configuration: the command writes a list of file names to stdout, and PowerCenter interprets this as a file list.
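A sketch of the corresponding session settings (the directory and file pattern are illustrative):

    Input Type   : Command
    Command Type : Command generating file list
    Command      : ls /data/src/sales_*.dat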
17. Zip the output target file.
We can zip the target file using a post-session script, but it can also be done without one, via the session configuration: set the target's output type to a command that compresses the rows as they are written.
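A sketch of the flat file target session properties for this (the output path is illustrative):

    Output Type : Command
    Command     : gzip -c > /data/tgt/sales_out.dat.gz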
18. Informatica PowerCenter Partitioning for Parallel Processing
and Faster Delivery
In addition to a good ETL design, it is essential to have a session optimized with no bottlenecks to get the best session performance. After optimizing the session itself, we can further improve performance by exploiting under-utilized hardware power. This means parallel processing, which we can achieve in Informatica PowerCenter using session partitioning.

What is Session Partitioning?
The Informatica PowerCenter Partitioning Option increases the performance of PowerCenter through parallel data processing. The partitioning option lets you split a large data set into smaller subsets that can be processed in parallel for better session performance.

Partitioning Terminology
Let's cover some partitioning terminology before we get into more detail.
Partition: a subset of the data that executes in a single thread.
Number of partitions: we can divide the data set into smaller subsets by increasing the number of partitions. When we add partitions, we increase the number of processing threads, which can improve session performance.
Stage: the portion of a pipeline that is implemented at run time as a thread.
Partition point: the boundary between two stages; partition points divide the pipeline into stages. A partition point is always associated with a transformation.
Partition type: an algorithm for distributing data among partitions, always associated with a partition point. The partition type controls how the Integration Service distributes data among partitions at partition points.
A demo session with three partitions and three partition points illustrates the points above.
Types of Session Partitions
Several partitioning algorithms are available:
Database partitioning: the Integration Service queries the database system for table partition information and reads partitioned data from the corresponding nodes in the database.
Round-robin partitioning: the Integration Service distributes data evenly among all partitions. Use round-robin partitioning when you need to distribute rows evenly and do not need to group data among partitions.
Hash auto-keys partitioning: the Integration Service uses a hash function to group rows of data among partitions, using all grouped or sorted ports as a compound partition key. Use hash auto-keys partitioning at or before Rank, Sorter, and unsorted Aggregator transformations to ensure that rows are grouped properly before they enter these transformations.
Hash user-keys partitioning: the Integration Service uses a hash function to group rows among partitions based on a user-defined partition key; you choose the ports that define the key.
Key range partitioning: you specify one or more ports to form a compound partition key for a source or target, and the Integration Service passes data to each partition depending on the ranges you specify for each port.
Pass-through partitioning: the Integration Service passes all rows at one partition point to the next partition point without redistributing them.

Setting Up Session Partitions
Let's see what is required to set up a session with partitioning enabled.
You can open the session partitioning interface from the session using the menu Mapping -> Partitions. The interface lets you add or modify partitions and partition points and choose the partitioning algorithm. Select any transformation in the mapping and the "Add Partition Point" button lets you add a partition point there; the "Delete Partition Point" and "Edit Partition Point" buttons let you modify existing ones. "Add/Delete/Edit Partition Point" opens an additional window in which you modify the partition and choose the partitioning algorithm.
Example: Business Use Case
Let's consider a business use case to explain the choice of partition algorithms and their configuration. Daily sales data generated from three sales regions needs to be loaded into an Oracle data warehouse. The sales volume from the three regions varies a lot, so the number of records processed per region varies a lot. The warehouse target table is partitioned on product line. The mapping has a simple structure providing this functionality.

Pass-Through Partition
A pass-through partition at the Source Qualifier transformation is used to split the source data into three parallel data sets, one per sales region. Once the partitions are set up at the Source Qualifier, you get an additional Source Filter option per partition to restrict the data belonging to that partition. Be sure to write the filter conditions so that no data is processed by more than one partition and no data is duplicated; there is one Source Filter per partition.
Round-Robin Partition
Since the data volume from the three sales regions is not the same, use the round-robin partition algorithm at the next transformation in the pipeline, so that the data and the processing load are distributed equally among the three partitions.

Hash Auto-Keys Partition
At the Aggregator transformation, the data needs to be redistributed across the partitions to avoid splitting aggregator groups. The hash auto-keys algorithm redistributes the data so that all records with the same key end up in the same partition; it identifies the keys from the group-by ports of the transformation. Processing records of the same aggregator group in different partitions would produce wrong results.
Key Range Partition
Use key range partitioning when you need to distribute records among partitions based on the range of values of one or more ports. Here the target table is range-partitioned on product line, so create a key range partition on the target definition on the PRODUCT_LINE_ID port to get the best write throughput. Click "Edit Keys" to define the ports on which the key range partition is built; a pop-up window lists the ports in the transformation, and you choose the ones the partition key requires. Then give the start and end of the value range for each partition.
We did not need the hash user-keys or database partition algorithms in this use case. Hash user-keys partitioning lets you choose the ports used to group rows among partitions and can be used in most places where hash auto-keys is appropriate. Database partitioning queries the database system for table partition information and reads partitioned data from the corresponding nodes in the database; it can be applied on either the source or the target definition.
19. Change Data Capture
A full Change Data Capture framework is not the recommended way to handle such a project, because the effort required to build the framework may not be justified. Let's discuss a simple, easy approach to Change Data Capture instead, built on Informatica mapping variables. Before we talk about the implementation, let's understand mapping variables.

Informatica Mapping Variable
What is a mapping variable? These are variables created in PowerCenter Designer which you can use in any expression in a mapping, as well as in a Source Qualifier filter, user-defined join, or extract override, and in the Expression Editor of reusable transformations.

Mapping Variable Starting Value
A mapping variable can take its starting value from:
1. The parameter file
2. Pre-session variable assignment
3. The value saved in the repository
4. The initial value
5. The default value
The Integration Service looks for the start value in the order listed above. The value of the mapping variable can be changed within the session using an expression, and the final value of the variable is saved to the repository. The saved value is retrieved in the next session run and used as that run's start value.
Setting the Mapping Variable Value
You can change the mapping variable value within the mapping or session using the Set functions. Which Set function to use depends on the Aggregation Type of the variable, which is chosen when the variable is declared in the mapping.
SetMaxVariable: sets the variable to the maximum value of a group of values. The aggregation type of the mapping variable must be Max.
SetMinVariable: sets the variable to the minimum value of a group of values. The aggregation type of the mapping variable must be Min.
SetCountVariable: increments the variable value by one; that is, it adds one to the variable value when a row is marked for insertion and subtracts one when a row is marked for deletion. The aggregation type of the mapping variable must be Count.
SetVariable: sets the variable to the configured value. At the end of a session, the Integration Service compares the final current value of the variable to the start value of the variable and, based on the aggregation type, saves a final value to the repository.
Change Data Capture Implementation
Now that we understand mapping variables, let's go ahead and build a mapping with Change Data Capture. Here we implement Change Data Capture for a CUSTOMER data load: we need to load any new or changed customer records to a flat file. Since the UPDATE_TS column value changes for any new or updated customer record, we can find the new or changed records using that column. As the first step, create the mapping and add a mapping variable:

    $$M_DATA_END_TIME, type Date/Time
Now bring the source and Source Qualifier into the mapping designer workspace. Open the Source Qualifier and set the filter condition to pull only the latest data from the source:

    STG_CUSTOMER_MASTER.UPDATE_TS > CONVERT(DATETIME, '$$M_DATA_END_TIME')

Note: this filter condition ensures that only the latest data is pulled from the source table on each run; the latest value of the variable $$M_DATA_END_TIME is retrieved from the repository every time the session runs. Now map the UPDATE_TS column to an Expression transformation and create a variable expression:

    SETMAXVARIABLE($$M_DATA_END_TIME, UPDATE_TS)

Note: this expression ensures that the latest value seen in the UPDATE_TS column is stored to the repository after successful completion of the session run.
Now map all the remaining columns to the downstream transformations and complete the rest of the mapping. That is all you need to configure Change Data Capture; create your workflow and run it. In the session log you can see the mapping variable value retrieved from the repository and used in the source SQL. You can also inspect the mapping variable value stored in the repository from Workflow Manager: choose the session in the workspace, right-click and select 'View Persistent Value', and the mapping variable appears in a pop-up window.
20. Difference between STOP and ABORT
Stop: if the Integration Service is executing a Session task when you issue the Stop command, it stops reading data but continues processing, writing, and committing data to the targets. If the Integration Service cannot finish processing and committing data, you can issue the Abort command.
Abort: the Integration Service handles the Abort command like the Stop command, except that it has a timeout period of 60 seconds. If the Integration Service cannot finish processing and committing data within the timeout period, it kills the DTM process and terminates the session.
21. What are the join types in the Joiner transformation?
There are 4 join types: 1) Normal 2) Master Outer 3) Detail Outer 4) Full Outer.
Note: A normal or master outer join performs faster than a full
outer or detail outer join.
Example: In EMP, we have employees with DEPTNO 10, 20, 30 and 50. In DEPT, we have DEPTNO 10, 20, 30 and 40. DEPT will be the MASTER table as it has fewer rows.
Normal Join: With a normal join, the Power Center Server
discards all rows of data from the master and detail source that do
not match, based on the condition. All employees of 10, 20 and 30
will be there as only they are matching.
Master Outer Join: This join keeps all rows of data from the
detail source and the matching rows from the master source. It
discards the unmatched rows from the master source. All data of
employees of 10, 20 and 30 will be there. There will be employees
of DEPTNO 50 and corresponding DNAME and LOC Columns will be
NULL.
Detail Outer Join: This join keeps all rows of data from the
master source and the matching rows from the detail source. It
discards the unmatched rows from the detail source. All employees
of 10, 20 and 30 will be there. There will be one record for DEPTNO
40 and corresponding data of EMP columns will be NULL.
Full Outer Join: a full outer join keeps all rows of data from both the master and detail sources. All data of employees of 10, 20 and 30 will be there; there will be employees of DEPTNO 50 with NULL DNAME and LOC columns, and one record for DEPTNO 40 with NULL EMP columns.
22. How to enter the same record twice into the target table? Give the syntax.
In the mapping, drag the source in twice and make sure the source and target do not have any key constraints. Then add a Union transformation, link both source pipelines to the Union, and link the output ports from the Union to the target.
Or you can use a Normalizer transformation: its "Occurs" option lets you specify the number of times the same source data should be loaded into the target.
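A SQL sketch of the Union approach (source and target names are illustrative):

    INSERT INTO tgt_table
    SELECT * FROM src_table
    UNION ALL
    SELECT * FROM src_table;  -- every source row lands in the target twice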
23. How to get a particular record from a table in Informatica?
We can use the REG_MATCH function, or use SUBSTR and INSTR to match particular records.
24. How to create a primary key only on odd numbers?
Use the MOD function to separate odd and even numbers, then filter the records with odd numbers and use a Sequence Generator to assign the keys.
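Alternatively, a Sequence Generator alone can emit odd values directly: set its Start Value to 1 and Increment By to 2, so the generated keys are 1, 3, 5, and so on. (This is a different technique from the MOD-and-filter approach above.)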
25. Why is the Sorter transformation an active transformation?
The Sorter sorts data in ascending or descending order on a specified field, can be configured for case-sensitive sorting, and can be set to output only distinct rows; with the Distinct option enabled it does not return all input rows. Any transformation with a distinct option can be active, because an active transformation is one that can change the number of output records, and removing duplicates decreases the number of output rows relative to the input. Note also that an active transformation can behave like a passive one.
26. How can we validate all mappings at once?
In the repository client, go to the Tools menu, then Queries; the Query Browser dialog appears. Click New. In the Query Editor, choose the folder name and object type, then execute the query (the blue arrow button); a Query Results window appears. Select a single mapping or all mappings (Ctrl+A), then go to Tools > Validate to validate them.
27. What is the difference between the index cache and the data cache?
INDEX CACHE: stores the key columns, i.e. the port values used in the transformation's condition or group-by (for example, the join condition or sort key).
DATA CACHE: stores the remaining port values, i.e. the output columns that are not part of the condition.
These caches exist purely to improve performance. A cached transformation creates two files, an index cache file and a data cache file. The index file stores only the frequently accessed key columns of the transformation, where most of the I/O and comparisons happen. If Informatica stored all data in a single cache file for a table of, say, 100 columns, the file might be 100 MB, and we would be reading the whole file when we only want 1 key column for joining or sorting; the other 99 columns just have to be passed to the downstream transformation without any operation on them. By splitting into two files, the file holding the 1 key column of the Joiner or Sorter is far smaller (say 10 MB), and reading and comparing against a 10 MB file is much cheaper than a 100 MB one when the other 99 columns are not needed for the comparison.
28. How to format the phone number 9999999999 as (999)999-9999 in Informatica?

    '(' || SUBSTR(sample, 1, 3) || ')' || SUBSTR(sample, 4, 3) || '-' || SUBSTR(sample, 7, 4)
29. Different types of dimensions
Commonly listed dimension types include:
1) Degenerate dimension
2) Junk dimension (a dimension holding low-cardinality values such as flags and indicators)
3) Conformed dimension (a dimension shared by multiple fact tables)
4) Slowly changing dimension (changes over a period of time):
   a) SCD1 (most recent values only in the target)
   b) SCD2 (current + history data)
   c) SCD3 (partial history only)
5) Causal dimension
6) Dirty dimension
30. Difference between a summary filter and a detail filter?
Summary filter: applied to groups of records that share common values (i.e. after grouping).
Detail filter: applied to each and every record in the database.
31. Data movement mode in Informatica: the data movement mode is set on the Integration Service and determines how character data is handled; it can be ASCII (one byte per character) or Unicode (up to two bytes per character).
32. Types of load in Informatica
Incremental (delta) load: suppose we processed 100 records today; in tomorrow's run we extract only the records newly inserted or updated since the previous run, based on the last-updated timestamp.
Normal load: the entire source data is processed into the target with constraint-based checking.
Bulk load: the entire source data is processed into the target without checking constraints in the target.
What is a Cold Start in an Informatica workflow?
Cold Start means the Integration Service restarts a task or workflow without recovery. Recovering a workflow means restarting the processing of the workflow or its tasks from the point of interruption. By default, the recovery strategy for workflow tasks is to fail the task and continue running the workflow; otherwise you need to configure the recovery strategy. To restart a task or workflow without recovery:
1. Select the task or workflow that you want to restart.
2. Right click > Cold Start Task or Cold Start Workflow.
What is a FACTLESS FACT TABLE, and where do we use one?
We know that a fact table is a collection of facts and measures with multiple keys joined to one or more dimension tables, and that facts usually contain numeric, additive fields. A factless fact table is different: it is a fact table that contains no facts. It holds only dimensional keys, and it captures events or relationships at the information level rather than at the calculation level; it is just a record that an event happened over a period.
A factless fact table captures the many-to-many relationships between dimensions but contains no numeric or textual facts. They are often used to record events or coverage information. Common examples include:
- identifying product promotion events (to determine promoted products that didn't sell)
- tracking student attendance or registration events
- tracking insurance-related accident events
- identifying building, facility, and equipment schedules for a hospital or university
Factless fact tables are used for tracking a process or collecting stats; they are called factless because the fact table has no aggregatable numeric values. There are two types: those that describe events and those that describe conditions. Both may play important roles in your dimensional models.
Factless fact tables for events
The first type of factless fact table records an event. Many event-tracking tables in dimensional data warehouses turn out to be factless: a business process occurs that you wish to track, but you find no measurements. In such situations, build a standard transaction-grained fact table that contains no facts. For example, a FACT_LEAVE table can capture the leave taken by an employee: whenever an employee takes leave, a record is created with the relevant dimension keys. Using FACT_LEAVE we can answer questions like the number of leaves taken by an employee, the type of leave an employee takes, and details of the employee who took leave (see the query sketch below).
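A SQL sketch against the FACT_LEAVE table described above (column names are assumed for illustration); since there is no numeric fact to aggregate, counting rows is the natural measure:

    SELECT employee_key, COUNT(*) AS leaves_taken
    FROM   fact_leave
    GROUP  BY employee_key;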
Factless fact tables for conditions
Factless fact tables are also used to model conditions or other important relationships among dimensions. In these cases there are no clear transactions or events. They support negative analysis reports: for example, a store that did not sell a product for a given period. To produce such a report, you need a fact table that captures all the possible combinations; you can then figure out what is missing. For example, FACT_PROMO gives information about products that have promotions, whether or not they sold. This fact answers questions such as: which products have promotions; which promoted products sold; and which promoted products did not sell (a query sketch follows). This kind of factless fact table is used to track conditions, coverage or eligibility; in Kimball terminology, it is called a "coverage table."
Note: one may ask why we cannot include this information in the actual fact table. The problem is that doing so would make the fact table enormous. The factless fact table is crucial in many complex business processes: by applying it you can design a dimensional model for processes that have no clear facts and still produce meaningful information, and the factless fact table itself can be used to generate useful reports.
The different types of ETL Testing are:
1. Requirements Testing
2. Data Validation Testing
3. Integration Testing
4. Report Testing
5. User Acceptance Testing
6. Performance Testing
7. Regression Testing

Requirements Testing phase: Are the requirements complete? Are the requirements testable? Are the requirements clear (is there any ambiguity)?

Data Validation Testing phase: Compare record counts between data sources. Ensure that the ETL application properly rejects invalid data, replaces it with default values, and reports it. Verify that data is transformed correctly according to system requirements and business rules. Compare unique values of key fields between source data and warehouse data. Ensure that all projected data is loaded into the data warehouse without any data loss or truncation. Test the boundaries of each field to find any database limitations.

Integration Testing phase: Verify the sequence and outcome of ETL batch jobs. Verify that ETL processes function with upstream and downstream processes. Verify the initial load of records into the data warehouse. Verify any incremental loading of records at a later date for newly inserted or updated data. Test the rejected records that fail ETL rules. Test error log generation.

Report Testing phase: Verify report data against the data source. Create SQL queries to verify source/target data. Verify field-level data.

User Acceptance Testing (UAT) phase: Verify that the business rules have been met. Confirm that the system is acceptable to the client.

Performance Testing phase: Verify that data loads and queries execute within anticipated time frames. Verify that the maximum anticipated volume of data loads within an acceptable time frame. Verify load times with various amounts of data to predict scalability.

Regression Testing phase: Ensure that current functionality stays intact whenever new code is released.
Informatica Java Transformation Practical Example
Feel the power of the Java programming language to transform data in PowerCenter Informatica. The Java transformation in Informatica can be used in either active or passive mode. Suppose I have a requirement where my source data looks like this:

Source Data
    NAME   CUST_ID  SVC_ST_DT   SVC_END_DT
    TOM    1        31/08/2009  23/03/2011
    DICK   2        01/01/2004  31/05/2010
    HARRY  3        28/02/2007  31/12/2009

Here a service start date and service end date are tied to a customer. Now I want my target table data flattened out like this:

Target Data
    NAME   CUST_ID  SVC_ST_DT   SVC_END_DT
    TOM    1        31/08/2009  31/12/2009
    TOM    1        01/01/2010  31/12/2010
    TOM    1        01/01/2011  23/03/2011
    DICK   2        01/01/2004  31/12/2004
    DICK   2        01/01/2005  31/12/2005
    DICK   2        01/01/2006  31/12/2006
    DICK   2        01/01/2007  31/12/2007
    DICK   2        01/01/2008  31/12/2008
    DICK   2        01/01/2009  31/12/2009
    DICK   2        01/01/2010  31/05/2010
    HARRY  3        28/02/2007  31/12/2007
    HARRY  3        01/01/2008  31/12/2008
    HARRY  3        01/01/2009  31/12/2009

That is, I want to split the service start and end dates on a yearly basis. The first thing that comes to mind here is the Informatica Normalizer, and that is true, but if you think twice you will find that you need to assume or hard-code one thing: a fixed maximum time span. Say the maximum span between the start and end dates is 5 years; knowing that, you set the number of occurrences of the Normalizer accordingly and follow it with an Expression transformation and a Filter. Done that way, the requirement is not satisfied when a customer's tenure exceeds 5 years. Instead, a small portion of Java code, called from Informatica PowerCenter, will do the data transformation. Let's go straight to the mapping and the code.
Find the Java code below (it goes on the On Input Row tab; the date and calendar classes come from java.text and java.util, imported on the Java transformation's Import Packages tab):

    try {
        DateFormat formatter = new SimpleDateFormat("dd/MM/yyyy");
        Calendar cal1 = Calendar.getInstance();
        Calendar cal2 = Calendar.getInstance();
        int st_yr, ed_yr;
        String str;
        Date st_dt = (Date) formatter.parse(SVC_ST_DT);
        Date ed_dt = (Date) formatter.parse(SVC_END_DT);

        cal1.clear();
        cal1.setTime(st_dt);
        cal2.clear();
        cal2.setTime(ed_dt);

        st_yr = cal1.get(Calendar.YEAR);
        ed_yr = cal2.get(Calendar.YEAR);
        do {
            OUT_NAME = NAME;
            OUT_CUST_ID = CUST_ID;
            OUT_SVC_ST_DT = formatter.format(st_dt);
            if (ed_yr != st_yr) {
                // Not the final year: close this row at 31st December.
                str = "31/12/" + st_yr;
                st_dt = (Date) formatter.parse(str);
                cal1.setTime(st_dt);
                OUT_SVC_END_DT = formatter.format(st_dt);
            } else {
                // Final year: close this row at the actual end date.
                OUT_SVC_END_DT = formatter.format(ed_dt);
            }
            generateRow();
            // Advance the start date to 1st January of the next year.
            st_yr = st_yr + 1;
            str = "01/01/" + st_yr;
            st_dt = (Date) formatter.parse(str);
            cal1.setTime(st_dt);
            st_yr = cal1.get(Calendar.YEAR);
        } while (ed_yr >= st_yr);
    } catch (ParseException e) {
        System.out.println(e);
    }
Next, if we want to transform and load the data on a monthly basis instead, the mapping is the same; only the code changes. Note that the loop must compare year and month together, not just the month number, or a period spanning a year boundary is not split correctly; the code below does so:

    try {
        DateFormat formatter = new SimpleDateFormat("dd/MM/yyyy");
        Calendar cal1 = Calendar.getInstance();
        Calendar cal2 = Calendar.getInstance();
        Date st_dt = (Date) formatter.parse(SVC_ST_DT);
        Date ed_dt = (Date) formatter.parse(SVC_END_DT);

        cal1.clear();
        cal1.setTime(st_dt);
        cal2.clear();
        cal2.setTime(ed_dt);

        // Emit one row per month until cal1 reaches the end month.
        while (cal1.get(Calendar.YEAR) != cal2.get(Calendar.YEAR)
                || cal1.get(Calendar.MONTH) != cal2.get(Calendar.MONTH)) {
            OUT_NAME = NAME;
            OUT_CUST_ID = CUST_ID;
            OUT_SVC_ST_DT = formatter.format(cal1.getTime());
            // Close this row at the last day of the current month.
            cal1.set(Calendar.DAY_OF_MONTH, cal1.getActualMaximum(Calendar.DAY_OF_MONTH));
            OUT_SVC_END_DT = formatter.format(cal1.getTime());
            generateRow();
            // Advance to the first day of the next month.
            cal1.add(Calendar.DAY_OF_MONTH, 1);
        }

        // Final partial month: from the current start date to the actual end date.
        OUT_NAME = NAME;
        OUT_CUST_ID = CUST_ID;
        OUT_SVC_ST_DT = formatter.format(cal1.getTime());
        OUT_SVC_END_DT = formatter.format(ed_dt);
        generateRow();
    } catch (ParseException e) {
        System.out.println(e);
    }
Note: You can extend PowerCenter functionality with the Java
transformation which provides a simple native programming interface
to define transformation functionality with the Java programming
language. You can use the Java transformation to quickly define
simple or moderately complex transformation functionality without
advanced knowledge of the Java programming language. For example,
you can define transformation logic to loop through input rows and
generate multiple output rows based on a specific condition. You
can also use expressions, user-defined functions, unconnected
transformations, and mapping variables in the Java code.
Implementing Informatica Incremental Aggregation
Using incremental aggregation, we apply captured changes in the source data (the CDC part) to aggregate calculations in a session. If the source changes incrementally and we can capture those changes, we can configure the session to process only the changes. This allows the Integration Service to update the target incrementally, rather than deleting the previously loaded data, reprocessing the entire source, and recalculating the same aggregations on every run.

Incremental Aggregation
When the session runs with incremental aggregation enabled for the first time, say in the 1st week of January, we use the entire source. This allows the Integration Service to read and store the necessary aggregate data. In the 2nd week of January, when we run the session again, we filter out everything except the CDC records from the source, i.e. the records loaded after the initial load. The Integration Service then processes the new data and updates the target accordingly. Use incremental aggregation when the changes do not significantly alter the target: if processing the incrementally changed source alters more than half the existing target, the session may not benefit from incremental aggregation, and in that case you should drop and recreate the target with the entire source data, recalculating the aggregations. Incremental aggregation can be helpful, for example, when loading monthly facts on a weekly basis.

Sample Mapping
Let us see a sample mapping that implements incremental aggregation.
Look at the Source Qualifier query: it fetches the CDC part using a BATCH_LOAD_CONTROL table that stores the last successful load date for this particular mapping, as sketched below.
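A sketch of such a Source Qualifier override, assuming BATCH_LOAD_CONTROL carries a mapping name and its last successful load date (table and column names are illustrative):

    SELECT s.CUSTOMER_KEY, s.INVOICE_KEY, s.AMOUNT, s.LOAD_DATE
    FROM   sales_source s
    WHERE  s.LOAD_DATE > (SELECT b.LAST_LOAD_DATE
                          FROM   BATCH_LOAD_CONTROL b
                          WHERE  b.MAPPING_NAME = 'm_incr_agg_sales');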
Look at the ports tab of Expression transformation.
Look at the ports tab of Aggregator Transformation.
Now for the most important part: the session properties configuration that enables incremental aggregation. If we want to reinitialize the aggregate cache, say during the first week of every month, we can configure the same session in a new workflow with the "Reinitialize aggregate cache" property checked at the session level.
Example with Data
Now have a look at the source table data:

    CUSTOMER_KEY  INVOICE_KEY  AMOUNT  LOAD_DATE
    1111          5001         100     01/01/2010
    2222          5002         250     01/01/2010
    3333          5003         300     01/01/2010
    1111          6007         200     07/01/2010
    1111          6008         150     07/01/2010
    2222          6009         250     07/01/2010
    4444          1234         350     07/01/2010
    5555          6157         500     07/01/2010

After the first load in the 1st week of January 2010, the data in the target is as follows:

    CUSTOMER_KEY  INVOICE_KEY  MON_KEY  AMOUNT
    1111          5001         201001   100
    2222          5002         201001   250
    3333          5003         201001   300
During the 2nd week's load, only the incremental data in the source is processed, i.e. the records with a load date greater than the last session run date. After the 2nd week's load, incremental aggregation of the incremental source data against the aggregate cache file updates the target table with the following data set:

    CUSTOMER_KEY  INVOICE_KEY  MON_KEY  AMOUNT  Remarks/Operation
    1111          6008         201001   450     cache file updated after aggregation
    2222          6009         201001   500     cache file updated after aggregation
    3333          5003         201001   300     cache file unchanged
    4444          1234         201001   350     new group row inserted in cache file
    5555          6157         201001   500     new group row inserted in cache file
Understanding the Incremental Aggregation Process
The first time we run an incremental aggregation session, the Integration Service processes the entire source. At the end of the session, it stores the aggregate data for that run in two files, the index file and the data file, created in the cache directory specified in the Aggregator transformation properties. On each subsequent run with incremental aggregation, only the incremental source changes are used. For each input record, the Integration Service checks the historical information in the index file for a corresponding group. If it finds one, it performs the aggregate operation incrementally, using the stored aggregate data for that group, and saves the incremental change; if it does not, it creates a new group and saves the record data. When writing to the target, the Integration Service applies the changes to the existing target and saves the modified aggregate data in the index and data files, to be used as historical data the next time the session runs. On each subsequent run, the Integration Service also creates a backup of the incremental aggregation files, so the cache directory for the Aggregator transformation must contain enough disk space for two sets of files. The Integration Service creates new aggregate data, instead of using the historical data, when we configure the session to reinitialize the aggregate cache, delete the cache files, and so on. When the Integration Service rebuilds the incremental aggregation files, the data in the previous files is lost.
Pushdown Optimization, a feature of Informatica PowerCenter, allows developers to balance the data transformation load among servers. This article describes pushdown techniques.

What is Pushdown Optimization?
Pushdown optimization is a way of load-balancing among servers in order to achieve optimal performance. Veteran ETL developers often face the question of where to perform ETL logic. Suppose an ETL job needs to filter out data based on some condition: one can do it either in the database, with a WHERE clause in the SQL query, or inside Informatica, with a Filter transformation. Sometimes we can even "push" some transformation logic to the target database instead of doing it on the source side (especially in ELT rather than ETL scenarios). Such optimization can be crucial for overall ETL performance.

How does Pushdown Optimization work?
One can push transformation logic to the source or target database using pushdown optimization. The Integration Service translates the transformation logic into SQL queries and sends them to the source or target database, which executes them to process the transformations. The amount of transformation logic that can be pushed depends on the database, the transformation logic, and the mapping and session configuration. The Integration Service analyzes which transformation logic it can push, executes the generated SQL against the source or target tables, and itself processes any transformation logic it cannot push to the database.

Using Pushdown Optimization
Use the Pushdown Optimization Viewer to preview the SQL statements and mapping logic that the Integration Service can push to the source or target database, and to view messages related to pushdown optimization. Let us take an example:
Suppose a mapping contains a Filter transformation with the condition DEPTNO > 40, filtering out all employees except those with a DEPTNO greater than 40. The Integration Service can push this transformation logic to the database, generating the following SQL statement:

    INSERT INTO EMP_TGT (EMPNO, ENAME, SAL, COMM, DEPTNO)
    SELECT EMP_SRC.EMPNO, EMP_SRC.ENAME, EMP_SRC.SAL, EMP_SRC.COMM, EMP_SRC.DEPTNO
    FROM   EMP_SRC
    WHERE  (EMP_SRC.DEPTNO > 40);

The Integration Service generates an INSERT ... SELECT statement that filters the data with a WHERE clause; it does not extract data from the database at this time. We can configure pushdown optimization in the following ways:
Source-side pushdown optimization: the Integration Service pushes as much transformation logic as possible to the source database. It analyzes the mapping from the source toward the target until it reaches a downstream transformation it cannot push to the source database, and executes the corresponding SELECT statement.
Target-side pushdown optimization: the Integration Service pushes as much transformation logic as possible to the target database. It analyzes the mapping from the target toward the source until it reaches an upstream transformation it cannot push to the target database, then generates an INSERT, DELETE, or UPDATE statement based on the logic of each transformation it can push, and executes that DML.
Full pushdown optimization: the Integration Service pushes as much transformation logic as possible to both the source and target databases, which must be the same database. If a session is configured for full pushdown optimization and the Integration Service cannot push all the transformation logic to the database, it performs source-side or target-side pushdown optimization instead. It analyzes the mapping starting with the source, transformation by transformation, until it analyzes the target; when it can push all the transformation logic, it generates a single INSERT ... SELECT statement incorporating the logic of all the transformations in the mapping and runs it on the database. If only part of the logic can be pushed, the session does not fail: the Integration Service pushes what it can to the source and target databases and processes the remaining transformation logic itself.

For example, a mapping contains the following transformations:

    SourceDefn -> SourceQualifier -> Aggregator -> Rank -> Expression -> TargetDefn

    Aggregator : SUM(SAL), SUM(COMM) GROUP BY DEPTNO
    Rank       : RANK port on SAL
    Expression : TOTAL = SAL + COMM
The Rank transformation cannot be pushed to the database. If the session is configured for full pushdown optimization, the Integration Service pushes the Source Qualifier and Aggregator transformations to the source, processes the Rank transformation itself, and pushes the Expression transformation and the target to the target database. When we use pushdown optimization, the Integration Service converts the expressions in the transformations and workflow links by finding equivalent operators, variables, and functions in the database. If there is no equivalent operator, variable, or function, the Integration Service processes that transformation logic itself and logs a message in the workflow log and the Pushdown Optimization Viewer; use the message to determine why the expression could not be pushed to the database.
How does the Integration Service handle pushdown optimization?
To push transformation logic to a database, the Integration Service may create temporary objects in it. It creates a temporary sequence object in the database to push Sequence Generator transformation logic, and temporary views when pushing a Source Qualifier transformation or a Lookup transformation with a SQL override, an unconnected relational lookup, or a filtered lookup.
1. To push Sequence Generator transformation logic to a database, the session must be configured for pushdown optimization with Sequence.
2. To enable the Integration Service to create view objects in the database, the session must be configured for pushdown optimization with View.
After the database transaction completes, the Integration Service drops the sequence and view objects it created for pushdown optimization.
Configuring Parameters for Pushdown Optimization
Depending on the database workload, we might want source-side, target-side, or full pushdown optimization at different times; for that we can use the $$PushdownConfig mapping parameter. Its settings override the pushdown optimization settings in the session properties. Create the $$PushdownConfig parameter in the Mapping Designer, select $$PushdownConfig for the Pushdown Optimization attribute in the session properties, and define the parameter in the parameter file. The possible values are:
1. None: the Integration Service itself processes all the transformations.
2. Source [Seq View]
3. Target [Seq View]
4. Full [Seq View]

Using the Pushdown Optimization Viewer
Use the Pushdown Optimization Viewer to examine the transformations that can be pushed to the database. Select a pushdown option or pushdown group in the viewer to see the SQL statement generated for that selection. Selecting an option or group in the viewer does not change the pushdown configuration; to change it, update the pushdown option in the session properties.

Databases that support Informatica pushdown optimization
We can configure sessions for pushdown optimization against databases such as Oracle, IBM DB2, Teradata, Microsoft SQL Server, Sybase ASE, or databases that use ODBC drivers. With native drivers, the Integration Service generates SQL statements using native database SQL; with ODBC drivers, it generates ANSI SQL. The Integration Service can generate a larger set of functions when it generates SQL in the native language rather than ANSI SQL.

Pushdown Optimization Error Handling
When the Integration Service pushes transformation logic to the database, it cannot track errors that occur in the database. When the Integration Service runs a session configured for full pushdown optimization and an error occurs, the database handles the errors, and the Integration Service does not write reject rows to the reject file. If a session configured for full pushdown optimization fails, the Integration Service cannot perform incremental recovery, because the database processed the transformations; instead, the database rolls back the transactions. If the database server fails, it rolls back the transactions when it restarts; if the Integration Service fails, the database server rolls back the transaction.
Aggregation without the Informatica Aggregator
Since Informatica processes data on a row-by-row basis, it is often possible to handle a data aggregation operation even without an Aggregator transformation, and in certain cases you may get a large performance gain using this technique.

General Idea of Aggregation without an Aggregator Transformation
Let us take an example: suppose we want to find the SUM of SALARY for each department of the employee table. The SQL query would be:

    SELECT DEPTNO, SUM(SALARY) FROM EMP_SRC GROUP BY DEPTNO;

Implementing this in Informatica the obvious way is easy: an Aggregator transformation with DEPTNO as the GROUP BY port and one output port SUM(SALARY). But we want to achieve this without an Aggregator, using only an Expression transformation. The trick is the Expression transformation's ability to hold the value of an attribute from the previous row in a variable port. But wait: why would we do this; aren't we complicating things? Yes, but in many cases it has a performance benefit, especially if the input is already sorted or you know the data will not violate the order (for example, you load daily data and want to aggregate it by day). Remember that Informatica holds all the rows in the Aggregator cache for an aggregation operation; this costs time and cache space and suspends the normal row-by-row processing. By replacing the Aggregator with an Expression, we remove the cache space requirement and restore row-by-row flow. The mapping below shows how to do this.
Mapping for aggregation with Expression and Sorter only:
Sorter (SRT_SAL) Ports tab
Expression (EXP_SAL) Ports tab
Sorter (SRT_SAL1) Ports tab
Expression (EXP_SAL2) Ports tab
Filter (FIL_SAL) Properties tab
The Sorter is shown here just to illustrate the concept: if your data already arrives sorted from the source, you need not use it, which increases the performance benefit. This is how we can implement aggregation without the Informatica Aggregator transformation; a sketch of the key expression ports follows.
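A minimal sketch of the key ports in the first Expression (EXP_SAL), assuming the input is sorted by DEPTNO; the port names are illustrative. Variable ports evaluate top to bottom, so V_SAME_DEPT still sees the previous row's DEPTNO before V_PREV_DEPTNO is overwritten:

    V_SAME_DEPT   (variable) = IIF(DEPTNO = V_PREV_DEPTNO, 1, 0)
    V_SUM_SAL     (variable) = IIF(V_SAME_DEPT = 1, V_SUM_SAL + SALARY, SALARY)
    V_PREV_DEPTNO (variable) = DEPTNO
    O_SUM_SAL     (output)   = V_SUM_SAL

Downstream, the second Sorter and the Expression/Filter pair keep only the last row of each department, which carries that department's completed running total.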
Approach to send an email notification when a job runs for a long time
Description: Here is an approach to send an email notification if a desired task is running for a long time, i.e. exceeding a stipulated time. This approach does not send an email notification when the desired task runs normally, within the stipulated time.
Approach: This approach sends an email notification if a task runs for more than a stipulated time, say 20 minutes. In the scenario below, consider an Event-Wait task whose run time is to be checked. Create a workflow variable $$GO_SIGNAL_FOR_EMAIL with nstring as its datatype, set its default value to the character N, and validate it. Create an Assignment task next to the task whose delay is to be reported, and link the Assignment task to that parent task. From the Assignment task, connect to the rest of the tasks in the workflow. Inside the Assignment task, assign the character Y to the workflow variable $$GO_SIGNAL_FOR_EMAIL. Now connect a Timer task to the Start task (or to the task whose delay is to be reported) and set the Timer task with the time it has to wait before a notification is due (the original screenshot of the Timer settings is not reproduced here).
Connect an Email task to the Timer task. In the link between the Timer and Email tasks, define the condition: $Timer.Status = SUCCEEDED AND $$GO_SIGNAL_FOR_EMAIL != 'Y' (note the quotes around the string literal). Validate it, and once the whole workflow is complete, save it and proceed to run it.
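A sketch of the resulting workflow layout (task names illustrative, not from the original screenshots):
Start --> monitored task --> Assignment ($$GO_SIGNAL_FOR_EMAIL = 'Y') --> rest of the workflow
Start --> Timer (waits 20 min) --[link: $Timer.Status = SUCCEEDED AND $$GO_SIGNAL_FOR_EMAIL != 'Y']--> Email
If the monitored task finishes within the stipulated time, the Assignment task sets the variable to 'Y' before the Timer completes, the link condition evaluates to false, and no email is sent; if the task is still running when the Timer completes, the variable still holds 'N' and the email fires.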
Advantages: Does not impact the rest of the workflow. Sends an email notification only when the desired task runs for more than the stipulated time.
Limitations: The overall status of the workflow is shown as Running until the Timer task has succeeded. Note: even though the Timer task always succeeds, the approach sends an email notification only when the desired task exceeds the stipulated time.
How can you complete an unrecoverable session? Under certain
circumstances, when a session does not complete, you need to
truncate the target tables and run the session from the beginning.
Run the session from the beginning when the Informatica server
cannot run recovery or when running recovery might result in
inconsistent data.
How to recover sessions in concurrent batches? If multiple
sessions in a concurrent batch fail, you might want to truncate all
targets and run the batch again. However, if a session in a
concurrent batch fails and the rest of the sessions complete
successfully, you can recover the session as a standalone session.
To recover a session in a concurrent batch: 1. Copy the failed session using Operations - Copy Session. 2. Drag the copied session outside the batch to make it a standalone session. 3. Follow the steps to recover a standalone session. 4. Delete the standalone copy.
Explain about perform recovery? When the Informatica Server
starts a recovery session, it reads the OPB_SRVR_RECOVERY table and
notes the row ID of the last row committed to the target database.
The Informatica Server then reads all sources again and starts
processing from the next row ID. For example, if the Informatica
Server commits 10,000 rows before the session fails, when you run
recovery, the Informatica Server bypasses the rows up to 10,000 and
starts loading with row 10,001. By default, Perform Recovery is
disabled in the Informatica Server setup. You must enable Recovery
in the Informatica Server setup before you run a session so the
Informatica Server can create and/or write entries in the
OPB_SRVR_RECOVERY table.
Explain about Recovering sessions? If you stop a session or if
an error causes a session to stop, refer to the session and error
logs to determine the cause of failure. Correct the errors, and
then complete the session. The method you use to complete the
session depends on the properties of the mapping, session, and
Informatica Server configuration. Use one of the following methods
to complete the session: Run the session again if the Informatica
Server has not issued a commit. Truncate the target tables and run
the session again if the session is not recoverable. Consider
performing recovery if the Informatica Server has issued at least
one commit.
What is the difference between the Stored Procedure transformation and the External Procedure transformation? In the case of the Stored Procedure transformation, the procedure is compiled and executed inside a relational data source; you need a database connection to import the stored procedure into your mapping. In the case of the External Procedure transformation, the procedure or function is executed outside the data source; that is, you need to build it as a DLL to access it in your mapping, and no database connection is required.
What is incremental aggregation? When using incremental aggregation, you apply captured changes in the source to aggregate calculations in a session. If the source changes only incrementally and you can capture those changes, you can configure the session to process only the changes. This allows the Informatica Server to update your target incrementally, rather than forcing it to process the entire source and recalculate the same calculations each time you run the session.
How can you access a remote source in your session? Relational source: to access a relational source situated in a remote place, you need to configure a database connection to the data source.
File source: to access a remote source file, you must configure the FTP connection to the host machine before you create the session.
Heterogeneous: when your mapping contains more than one source type, the server manager creates a heterogeneous session that displays source options for all types.
What are the output files that the Informatica server creates while a session runs? Informatica server log: the Informatica server (on UNIX) creates a log for all status and error messages (default name: pm.server.log). It also creates an error log for error messages. These files are created in the Informatica home directory.
Session log file: the Informatica server creates a session log file for each session. It writes information about the session into the log file, such as the initialization process, creation of SQL commands for the reader and writer threads, errors encountered, and the load summary. The amount of detail in the session log file depends on the tracing level that you set.
Session detail file: this file contains load statistics for each target in the mapping. Session details include information such as the table name and the number of rows written or rejected. You can view this file by double-clicking on the session in the Monitor window.
Performance detail file: this file contains session performance details, which help you see where performance can be improved. To generate this file, select the performance detail option in the session property sheet.
Reject file: this file contains the rows of data that the writer does not write to targets.
Control file: the Informatica server creates a control file and a target file when you run a session that uses the external loader. The control file contains information about the target flat file, such as the data format and loading instructions for the external loader.
Post-session email: post-session email allows you to automatically communicate information about a session run to designated recipients. You can create two different messages: one if the session completes successfully, the other if the session fails.
Indicator file: if you use a flat file as a target, you can configure the Informatica server to create an indicator file. For each target row, the indicator file contains a number to indicate whether the row was marked for insert, update, delete, or reject.
Output file: if the session writes to a target file, the Informatica server creates the target file based on the file properties entered in the session property sheet.
Cache files: when the Informatica server creates a memory cache, it also creates cache files. The Informatica server creates index and data cache files for the following transformations: Aggregator, Joiner, Rank, and Lookup.
What are the necessary tasks to achieve session partitioning? Configure the session to partition the source data, and install the Informatica server on a machine with multiple CPUs.
Describe the two levels at which the update strategy can be set. Within a session: when you configure a session, you can instruct the Informatica Server to either treat all records in the same way (for example, treat all records as inserts), or use instructions coded into the session mapping to flag records for different database operations.
Within a mapping: within a mapping, you use the Update Strategy transformation to flag records for insert, delete, update, or reject.
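A minimal sketch of the mapping-level case (the lookup port name is illustrative): the Update Strategy expression might be
IIF(ISNULL(LKP_CUSTOMER_KEY), DD_INSERT, DD_UPDATE)
where DD_INSERT, DD_UPDATE, DD_DELETE, and DD_REJECT are the built-in constants (numeric values 0, 1, 2, and 3) that flag each row for the corresponding database operation.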
What are the rank caches? During the session, the Informatica server compares an input row with the rows in the data cache. If the input row out-ranks a stored row, the Informatica server replaces the stored row with the input row. The Informatica server stores group information in an index cache and row data in a data cache.
Why do we use partitioning of the session in Informatica? Partitioning improves session performance by reducing the time taken to read the source and load the data into the target. Performance can be improved by processing data in parallel in a single session by creating multiple partitions of the pipeline. The Informatica server achieves high performance by partitioning the pipeline and performing the extract, transformation, and load for each partition in parallel.
Which transformation should we use to normalize COBOL and relational sources? The Normalizer transformation. When you drag a COBOL source into the Mapping Designer workspace, the Normalizer transformation automatically appears, creating input and output ports for every column in the source.
What is the rank index in the Rank transformation? The Designer automatically creates a RANKINDEX port for each Rank transformation. The Informatica Server uses the rank index port to store the ranking position for each record in a group. For example, if you create a Rank transformation that ranks the top 5 salespersons for each quarter, the rank index numbers the salespeople from 1 to 5.
What are the different types of Type2 dimension mapping? Type2 Dimension/Version Data mapping: in this mapping, an updated dimension from the source is inserted into the target along with a new version number, and a newly added dimension in the source is inserted into the target with a primary key.
Type2 Dimension/Flag Current mapping: this mapping is also used for slowly changing dimensions. In addition, it creates a flag value for changed or new dimensions; the flag indicates whether the dimension is new or newly updated. Recent dimensions are saved with the current flag value 1, and updated dimensions are saved with the value 0.
Type2 Dimension/Effective Date Range mapping: this is another flavour of Type2 mapping used for slowly changing dimensions. It also inserts both new and changed dimensions into the target, and changes are tracked by the effective date range for each version of each dimension.
How does the Informatica server sort string values in the Rank transformation? When the Informatica server runs in ASCII data movement mode, it sorts session data using a binary sort order. If you configure the session to use a binary sort order, the Informatica server calculates the binary value of each string and returns the specified number of rows with the highest binary values for the string. When the Informatica server runs in UNICODE data movement mode, it uses the sort order configured in the session properties.
What is a time dimension? Give an example. The time dimension is one of the most important dimensions in a data warehouse: whenever you generate a report, you access the data through the time dimension. Example fields: date key, full date, day of week, day, month, quarter, fiscal year. In a relational data model, for normalization purposes, the year lookup, quarter lookup, month lookup, and week lookup are not merged into a single table. In dimensional data modeling (star schema), these tables are merged into a single table called the TIME DIMENSION, for performance and for slicing data. This dimension helps find the sales done on a daily, weekly, monthly, and yearly basis, and we can do trend analysis by comparing this year's sales with the previous year's, or this week's sales with the previous week's.
A TIME DIMENSION is a table that contains the detailed information about the time at which a particular 'transaction' or 'sale' (event) has taken place. The TIME DIMENSION holds the details of day, week, month, quarter, and year.
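A minimal DDL sketch of such a table, using the fields listed above (table name and datatypes are illustrative):
CREATE TABLE TIME_DIMENSION (
  DATE_KEY      NUMBER PRIMARY KEY,
  FULL_DATE     DATE,
  DAY_OF_WEEK   VARCHAR2(10),
  DAY_OF_MONTH  NUMBER,
  MONTH_NUM     NUMBER,
  QUARTER_NUM   NUMBER,
  FISCAL_YEAR   NUMBER
);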
Can I start and stop a single session in a concurrent batch? Yes, sure: just right-click on the particular session and go to the recovery option, or use Event-Wait and Event-Raise tasks.
Difference between static cache and dynamic cache:
Static cache: you cannot insert into or update the cache. The Informatica server returns a value from the lookup table or cache when the condition is true; when the condition is not true, it returns the default value for connected transformations and NULL for unconnected transformations.
Dynamic cache: you can insert rows into the cache as you pass them to the target. The Informatica server inserts rows into the cache when the condition is false; this indicates that the row is not in the cache or the target table, and you can pass these rows to the target table.
How do you use mapping parameters, and what is their use? In the Designer you will find the mapping parameters and variables options, and you can assign values to them there. As for their use: suppose you are doing incremental extractions daily and your source system contains a day column. Without parameters, every day you would have to go into the mapping and change the day so that the particular data gets extracted, which is tedious manual work. This is where mapping parameters and variables come in: once you assign a value to a mapping variable, it changes between sessions. Mapping parameters and variables make mappings more flexible, avoid the creation of multiple mappings, and help in loading incremental data. They are created in the Mapping Designer via the menu option Mapping --> Parameters and Variables; enter a name for the variable or parameter (it must be preceded by $$), then choose the type (parameter or variable) and the datatype. Once defined, the variable or parameter can be used in any expression, for example in the source filter property of the Source Qualifier transformation: just enter the filter condition. Finally, create a parameter file to assign the value for the variable or parameter and configure it in the session properties. This final step is optional; if the parameter file is not present, the initial value assigned at the time of creating the variable is used.
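A minimal sketch of such a parameter file entry and filter, with illustrative folder, workflow, session, variable, and column names:
[MyFolder.WF:wf_daily_load.ST:s_m_incremental_load]
$$EXTRACT_DATE=2013-03-31
and in the Source Qualifier's source filter: SRC_TABLE.DAY_COLUMN = '$$EXTRACT_DATE'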
What are the options in the target session for the Update Strategy transformation? Insert, Delete, Update (Update as Update, Update as Insert, Update else Insert), and Truncate Table. Update as Insert: this option specifies that all update records from the source are to be flagged as inserts in the target; in other words, instead of updating the records in the target, they are inserted as new records. Update else Insert: this option enables Informatica to flag records for update if they are old, or for insert if they are new records from the source.
How do you create the staging area in your database? A staging area in a data warehouse is used as a temporary space to hold all the records coming from the source systems. So, more or less, it should be an exact replica of the source systems, except for the load strategy, where we use truncate-and-reload options. Create the staging tables using the same layout as your source tables, or use the Generate SQL option in the Warehouse Designer tab. Creating staging tables or the staging area is the work of the data modeler or DBA; the tables are created with plain "create table ..." statements, and they usually carry a name that identifies them as staging, like dwc_tmp_asset_eval, where tmp indicates temporary tables, which are nothing but staging.
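A minimal sketch of the truncate-and-reload pattern for the staging table named above, assuming an illustrative source table src_asset_eval with the same layout:
-- one-time: create the staging table as an empty replica of the source layout
CREATE TABLE dwc_tmp_asset_eval AS SELECT * FROM src_asset_eval WHERE 1 = 0;
-- every load run: empty the staging table, then reload it in full
TRUNCATE TABLE dwc_tmp_asset_eval;
INSERT INTO dwc_tmp_asset_eval SELECT * FROM src_asset_eval;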
What is the difference between connected and unconnected stored procedures? Unconnected: the unconnected Stored Procedure transformation is not connected directly to the flow of the mapping. It either runs before or after the session, or is called by an expression in another transformation in the mapping. Connected: the flow of data through a mapping in connected mode also passes through the Stored Procedure transformation. All data entering the transformation through the input ports affects the stored procedure. You should use a connected Stored Procedure transformation when you need data from an input port sent as an input parameter to the stored procedure, or the results of a stored procedure sent as an output parameter to another transformation.
The typical use cases and the mode each requires:
Run a stored procedure before or after your session: Unconnected.
Run a stored procedure once during your mapping, such as pre- or post-session: Unconnected.
Run a stored procedure every time a row passes through the Stored Procedure transformation: Connected or Unconnected.
Run a stored procedure based on data that passes through the mapping, such as when a specific port does not contain a null value: Unconnected.
Pass parameters to the stored procedure and receive a single output parameter: Connected or Unconnected.
Pass parameters to the stored procedure and receive multiple output parameters: Connected or Unconnected. (Note: to get multiple output parameters from an unconnected Stored Procedure transformation, you must create variables for each output parameter; for details, see Calling a Stored Procedure From an Expression.)
Run nested stored procedures: Unconnected.
Call multiple times within a mapping: Unconnected.
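As an illustrative sketch (procedure and port names assumed, not from the original), an unconnected Stored Procedure transformation is called from an Expression output port with the :SP reference qualifier:
OUT_EMP_NAME --> :SP.SP_GET_EMP_NAME(IN_EMP_ID, PROC_RESULT)
The PROC_RESULT keyword captures the value the stored procedure returns and assigns it to the calling port.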
While running multiple sessions in parallel that load data into the same table, the throughput of each session becomes very low and almost the same for each session. How can we improve the performance (throughput) in such cases? This is largely handled by the database we use: while a load operation on the table is in progress, the table is locked. If we try to load the same table with different partitions, we can run into rowid errors if the database is Oracle 9i, and a patch can be applied to resolve this issue.
How can you delete duplicate rows without using a dynamic lookup? Is there any other way, using a lookup, to delete the duplicate rows? For example, you have a table Emp_Name with two columns, Fname and Lname, and the source table has duplicate rows. In the mapping, create an Aggregator transformation. Edit the Aggregator transformation, select the Ports tab, select Fname, check the GroupBy box and uncheck the output (O) port; then select Lname, check the GroupBy box and uncheck the output (O) port. Then create two new ports, uncheck their input (I) ports, and set an expression on each: in the first new port's expression type Fname, and in the second type Lname. Then close the Aggregator transformation and link it to the target table.
In a Joiner transformation, you should specify the source with fewer rows as the master source. Why? In the Joiner transformation, the Informatica server reads all the records from the master source and builds index and data caches based on the master table rows. After building the caches, the Joiner transformation reads records from the detail source and performs the joins. The Joiner transformation compares each row of the master source against the detail source. The fewer unique rows in the master, the fewer iterations of the join comparison occur, which speeds up the join process.
What are data merging, data cleansing, and sampling? Merging: combining data from multiple sources into a single, consistent set. Cleansing: identifying and removing redundancy and inconsistency. Sampling: taking just a sample of the data by sending a subset of the data from source to target.
What is a tracing level? The tracing level determines the amount of information that the Informatica server writes in a session log; it is the level of detail stored in the session log. The option appears in the Properties tab of transformations. By default it is Normal; it can be set to Terse, Normal, Verbose Initialization, or Verbose Data.
How can we join three databases, like a flat file, Oracle, and DB2, in Informatica? You have to use two Joiner transformations: the first one joins two of the sources, and the next one joins the third source with the output of the first Joiner.
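A sketch of the flow described above (transformation names illustrative):
SQ_FLATFILE + SQ_ORACLE --> JNR_1, then JNR_1 output + SQ_DB2 --> JNR_2 --> target
Each Joiner accepts exactly one master and one detail input, which is why a third source always needs a second Joiner that joins it with the output of the first.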
How do we analyse the data at the database level? Data can be viewed using Informatica's Designer tool: if you want to view the data on the source or target, you can preview it, though with some limitations. We can also use data profiling.
How can we eliminate duplicate rows from a flat file? Place an Aggregator between the Source Qualifier and the target and group by the key field(s); it will eliminate the duplicate records.
What are the indexes you used? What is a bitmap join index? Bitmap indexes are used in a data warehouse environment to increase query response time, since DWH columns typically have low cardinality and low update rates; they are very efficient for WHERE clauses. A bitmap join index is used to join a dimension and a fact table directly instead of reading two different indexes.
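A minimal sketch of a bitmap join index in Oracle, with illustrative table and column names (cust_id is assumed to be the dimension's primary key, which Oracle requires here):
CREATE BITMAP INDEX idx_sales_cust_region
  ON fact_sales (dim_customer.cust_region)
  FROM fact_sales, dim_customer
  WHERE fact_sales.cust_id = dim_customer.cust_id;
A query filtering fact_sales by cust_region can then be answered from the index without joining to the dimension table at query time.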
What is data driven? Data driven is a mode in which rows are inserted, deleted, or updated based on the data itself: it is not predefined whether the data is to be inserted, deleted, or updated; that is known only when the data is processed.
What is a batch? Explain the types of batches. Session: a session is a set of commands that instructs the server to move data to the target. Batch: a batch is a set of one or more tasks (sessions, event-wait, email, command, etc.). There are two types of batches in Informatica: 1. Sequential: data moves from source to target one session after another. 2. Concurrent: all the data moves simultaneously from source to target.
What types of metadata does the repository store?
Global objects, mappings, mapplets, multidimensional metadata, reusable transformations, sessions and batches, shortcuts, source definitions, target definitions, and transformations.
Can you use the mapping parameters or variables created in one mapping in another mapping? No. You can use mapping parameters or variables in any transformation of the same mapping or mapplet in which you created them. You might want to use a workflow parameter/variable if you want it to be visible to other mappings/sessions.
Why did we use stored procedures in our ETL application? Stored procedures play an important role. Suppose you are using an Oracle database and making ETL changes with Informatica: every row of the table has to pass through Informatica and undergo the ETL changes specified in the transformations. If you instead use a stored procedure, i.e. an Oracle PL/SQL package, it runs on the Oracle database (the very database where the changes are needed) and will be faster compared to Informatica, because it runs directly on the Oracle database. Also, some things that we cannot do using the tool we can do using packages. Some jobs can take hours to run, so to save time and database usage we can go for stored procedures.
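For instance, a set-based change expressed as a single SQL statement (table and column names illustrative):
UPDATE emp SET salary = salary * 1.10 WHERE deptno = 10;
runs once inside the Oracle database, instead of every affected row flowing through the ETL tool one at a time.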
What is the default join operation performed by the Lookup transformation? An equi-join.
What is a hash table in Informatica? Use hash partitioning when you want the Integration Service to distribute rows to the partitions by group. For example, you need to sort items by item ID, but you do not know how many items have a particular ID number.
Difference between a cached lookup and an uncached lookup? For a cached lookup, all the rows of the lookup table are put in a buffer and the incoming rows are compared against them; with an uncached lookup, for every input row the lookup queries the lookup table and fetches the rows. So, for performance, go for a cached lookup if the lookup table size is smaller than the number of mapping rows, and go for an uncached lookup if the lookup table size is larger than the number of mapping rows.
What is polling? Polling displays updated information about the session in the Monitor window. The Monitor window displays the status of each session when you poll the Informatica server.
What is the rank cache?
The Integration Service compares an input row with the rows in the data cache; if the input row out-ranks a cached row, the Integration Service replaces the cached row with the input row. If you configure the Rank transformation to rank across multiple groups, the Integration Service ranks incrementally for each group it finds. The Integration Service stores group information in the index cache and row data in the data cache.