Page 1: informatica

Informatica Questions and Answers

What is a Rank transformation? Where can we use it?

 

The Rank transformation is used to select the top or bottom N rows, optionally within each group. For example, if a sales table shows many employees selling the same product and we need the top 5 or 10 employees who sell the most, we can use a Rank transformation (grouping by product and ranking on quantity sold).
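To make the idea concrete, here is a minimal Python sketch of what a Rank transformation with a group-by port computes: the top N rows per group by a ranking column. The row layout (product, employee, units sold) and the function name `top_n_per_group` are hypothetical, chosen only for illustration; ties are kept in input order here, which is an assumption and not necessarily how Informatica orders tied rows.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical row layout: (product, employee, units_sold)
Row = Tuple[str, str, int]


def top_n_per_group(rows: List[Row], n: int) -> Dict[str, List[Row]]:
    """Return the n rows with the highest units_sold for each product."""
    groups: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)  # group-by port: product
    result: Dict[str, List[Row]] = {}
    for product, members in groups.items():
        # sorted() is stable, so tied rows keep their input order
        ranked = sorted(members, key=lambda r: r[2], reverse=True)
        result[product] = ranked[:n]  # keep only the top n
    return result


if __name__ == "__main__":
    sales = [
        ("laptop", "ann", 40), ("laptop", "bob", 55), ("laptop", "cy", 30),
        ("phone", "dee", 70), ("phone", "eve", 90),
    ]
    for product, top in top_n_per_group(sales, 2).items():
        print(product, [employee for _, employee, _ in top])
```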

Where is the cache stored in Informatica?

 

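The cache is stored on the Informatica server machine: in memory, and in cache files under the cache directory configured for the session (by default $PMCacheDir).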

If you want to create indexes after the load process, which transformation do you choose?

A Stored Procedure transformation, configured to run after the target load.

In a Joiner transformation, you should specify the source with fewer rows as the master source. Why?

In a Joiner transformation, the Informatica server first reads all the rows from the master source and builds the index and data caches from them. Only after the caches are built does the Joiner read rows from the detail source and perform the joins. A smaller master source therefore means smaller caches and a faster join.

What happens if you try to create a shortcut to a non-shared folder?

It only creates a copy of the object.

What is a transaction?

 

A transaction is a unit of work made up of one or more DML operations, that is, insertions, updates, or deletions of data performed by users, analysts, or applications. It is either committed or rolled back as a whole.
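A minimal Python sketch using the standard-library `sqlite3` module shows the all-or-nothing behaviour: two DML statements either commit together or are rolled back together. The `accounts` table and the `transfer` and `make_db` functions are hypothetical, for illustration only.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a hypothetical in-memory accounts table with two rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move `amount` between two accounts as a single transaction."""
    # The connection as a context manager commits if the block succeeds
    # and rolls back every statement in it if an exception is raised.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")  # undoes the first UPDATE too
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
```

Calling `transfer(conn, "a", "b", 500)` on the starting data raises `ValueError` and leaves both balances unchanged, because the debit is rolled back along with everything else in the transaction.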

Can anybody write a session parameter file that changes the source and target for every session run, i.e. a different source and target each time the session runs?

You need to define a parameter file, and in it define two parameters: one for the source and one for the target.

For example:

$Src_file = c:\program files\informatica\server\bin\abc_source.txt

$tgt_file = c:\targets\abc_targets.txt

Then define the parameter file like this:

[folder_name.WF:workflow_name.ST:s_session_name]
$Src_file=c:\program files\informatica\server\bin\abc_source.txt
$tgt_file=c:\targets\abc_targets.txt

If it is a relational database, you can even pass an SQL override at the session level as a parameter. Make sure the SQL is on a single line.
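If the source and target change for every run, the parameter file itself can be regenerated before each run. A minimal Python sketch, assuming the section-header format and parameter names shown above with one NAME=value per line; the folder, workflow, and session names, the file name, and the paths are hypothetical placeholders.

```python
import tempfile
from pathlib import Path
from typing import Dict


def write_param_file(path: Path, folder: str, workflow: str, session: str,
                     params: Dict[str, str]) -> None:
    """Write one parameter-file section: a [header] line, then NAME=value lines."""
    lines = [f"[{folder}.WF:{workflow}.ST:{session}]"]
    lines += [f"{name}={value}" for name, value in params.items()]
    path.write_text("\n".join(lines) + "\n")


if __name__ == "__main__":
    run_id = 7  # hypothetical run number used to pick this run's files
    out_dir = Path(tempfile.mkdtemp())
    param_path = out_dir / "s_session_name.par"  # hypothetical file name
    write_param_file(
        param_path, "folder_name", "workflow_name", "s_session_name",
        {"$Src_file": rf"c:\sources\abc_source_{run_id}.txt",
         "$tgt_file": rf"c:\targets\abc_targets_{run_id}.txt"},
    )
    print(param_path.read_text())
```

A scheduling script would call this just before starting each run, so the session always reads the parameter file written for that run.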

Informatica Live Interview Questions


Here are some interview questions I could not answer. Can anybody help with the answers? Thanks in advance.

- Explain grouped cross tab.
- Explain reference cursors.
- What are parallel queries and query hints?
- What are metadata and the system catalog?
- What is a factless fact schema?
- What is a conformed dimension?
- Which kind of index is preferred in a DWH?
- Why do we use a DSS database for OLAP tools?

Conformed dimension: a dimension that is shared by two or more fact tables.

Factless fact table: a fact table without measures that contains only foreign keys. There are two types of factless fact tables: event-tracking tables and coverage tables.

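Bitmap indexes are preferred in data warehousing, because they work well on low-cardinality columns in tables that are mostly read rather than updated.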
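The idea behind a bitmap index can be shown with a toy Python sketch: each distinct value gets one bit per row, and a multi-column filter becomes a cheap bitwise AND. This uses plain integers as bit vectors and is not how Oracle actually stores bitmaps (Oracle compresses them); the `region` and `status` columns are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def build_bitmap_index(column: List[str]) -> Dict[str, int]:
    """Map each distinct value to an int whose bit i is set if row i has that value."""
    index: Dict[str, int] = defaultdict(int)
    for row_id, value in enumerate(column):
        index[value] |= 1 << row_id
    return dict(index)


def rows_from_bitmap(bitmap: int) -> List[int]:
    """Decode a bitmap back into the list of matching row ids."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


region = ["east", "west", "east", "north", "east"]
status = ["open", "open", "closed", "open", "open"]
region_idx = build_bitmap_index(region)
status_idx = build_bitmap_index(status)

# WHERE region = 'east' AND status = 'open' becomes a single bitwise AND
print(rows_from_bitmap(region_idx["east"] & status_idx["open"]))
```

With only a handful of distinct values per column, the index stays small, which is why this structure suits low-cardinality warehouse columns.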

Metadata is data about data. In Informatica, everything, such as mappings, sessions, privileges, and other data, is stored as metadata, and we can see it in the repository.

The system catalog that we use in Cognos also contains data: tables, privileges, predefined filters, etc. Using this catalog, we generate reports.

A grouped cross tab is a type of report in Cognos, where we have to assign three measures to get the result.

What is meant by a junk attribute in Informatica?

 

Junk dimension: a dimension is called a junk dimension if it groups miscellaneous low-cardinality attributes, such as flags and indicators, that are rarely changed or modified and do not belong in any other dimension. For example, in the banking domain, four attributes from the Overall_Transaction_master table could belong to a junk dimension: tput flag, tcmp flag, del flag, and advance flag.
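One common way to build such a junk dimension is to enumerate every combination of the flag values once, give each combination a surrogate key, and store only that key in the fact table. A minimal Python sketch; the Y/N domain assumed for the four flags from the example is hypothetical.

```python
from itertools import product
from typing import Dict, Tuple

FLAGS = ("tput_flag", "tcmp_flag", "del_flag", "advance_flag")
FLAG_VALUES = ("Y", "N")  # assumed domain of each flag


def build_junk_dimension() -> Dict[Tuple[str, ...], int]:
    """Assign a surrogate key (starting at 1) to every combination of flag values."""
    combos = product(FLAG_VALUES, repeat=len(FLAGS))
    return {combo: key for key, combo in enumerate(combos, start=1)}


junk_dim = build_junk_dimension()

# While loading the fact table, replace the four flags with one junk key.
txn = {"tput_flag": "Y", "tcmp_flag": "N", "del_flag": "N", "advance_flag": "Y"}
junk_key = junk_dim[tuple(txn[f] for f in FLAGS)]
print(len(junk_dim), junk_key)
```

Four two-valued flags give only 16 rows in the junk dimension, while the fact table carries a single narrow key instead of four flag columns.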

Can anyone explain incremental aggregation with an example?

When you use an Aggregator transformation, it creates index and data caches to store the data: 1. the index cache stores the group-by columns; 2. the data cache stores the aggregate columns.

Incremental aggregation is used when historical data is already in place and will be reused in the aggregation. It uses the cache that contains the historical data: for each group-by value already present in the cache, it adds the new data value to the corresponding value in the data cache and outputs the row. If an incoming row has no match in the index cache, the new group-by and output values are inserted into the cache.
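The cache behaviour described above can be modelled in a few lines of Python: a dictionary stands in for the index and data caches (group key to running aggregate), and each new batch of source rows only updates or adds entries. This is a simplified sketch assuming SUM and COUNT aggregates; the real caches are files on the server, and the names below are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# group key -> (running sum, running count); stands in for the persisted caches
Cache = Dict[str, Tuple[float, int]]


def incremental_aggregate(cache: Cache, new_rows: Iterable[Tuple[str, float]]) -> Cache:
    """Fold only the new source rows into the historical aggregates."""
    for key, amount in new_rows:
        if key in cache:  # group already in the index cache: update its data
            total, count = cache[key]
            cache[key] = (total + amount, count + 1)
        else:  # no match in the index cache: insert a new group
            cache[key] = (amount, 1)
    return cache


cache: Cache = {}
incremental_aggregate(cache, [("east", 100.0), ("west", 50.0)])  # first run
incremental_aggregate(cache, [("east", 25.0), ("north", 10.0)])  # next run: new rows only
print(cache)
```

The second call never re-reads the first batch; that is the saving incremental aggregation is meant to provide.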

What is the difference between Rank and Dense Rank?

 

Rank: 1, 2 (2nd position), 2 (3rd position), 4, 5

The same rank is assigned to equal totals/numbers, and the next rank follows the position, so a rank is skipped after a tie. Golf rankings usually work this way.

Dense Rank: 1, 2 (2nd position), 2 (3rd position), 3, 4

The same rank is assigned to equal totals/numbers/names, and the next rank follows in serial order, with no gaps.
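Both schemes can be written as short Python functions, assuming higher totals rank first. This is an illustrative sketch, not Informatica's or any database's implementation; it reproduces the two sequences above.

```python
from typing import Dict, List


def rank(values: List[int]) -> List[int]:
    """Standard ranking: ties share a rank and the next rank skips ahead."""
    ordered = sorted(values, reverse=True)
    first_pos: Dict[int, int] = {}
    for pos, v in enumerate(ordered, start=1):
        first_pos.setdefault(v, pos)  # rank = first position of the value
    return [first_pos[v] for v in values]


def dense_rank(values: List[int]) -> List[int]:
    """Dense ranking: ties share a rank and the next rank is consecutive."""
    distinct = sorted(set(values), reverse=True)
    position = {v: i for i, v in enumerate(distinct, start=1)}
    return [position[v] for v in values]


totals = [90, 80, 80, 70, 60]
print(rank(totals))
print(dense_rank(totals))
```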

About Informatica PowerCenter 7:
1) I want to know which mapping properties can be overridden at the Session task level.
2) I want to know what types of permissions are needed to run and schedule workflows.

1) Which mapping properties can be overridden at the Session task level?

You can override any properties other than the sources and targets themselves. Make sure the sources and targets exist in your database if it is relational. If it is a flat file, you can override its properties. You can also override the SQL (for a relational database), the session log, the DTM buffer size, cache sizes, etc.

2) What types of permissions are needed to run and schedule workflows?

You need execute permission on the folder to run or schedule a workflow. Read and write permissions are not enough; you need execute permission as well.

Can anyone explain real-time complex mappings or complex transformations in Informatica, especially in the sales domain?

The most complex logic we use is denormalization. There is no Denormalizer transformation in Informatica, so we have to use an Aggregator followed by an Expression transformation. Apart from this, most of the complex logic goes into Expression transformations with many nested IIF and DECODE statements. Other examples are the Union and Joiner transformations.

How do you create a mapping using multiple Lookup transformations?

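Use an unconnected Lookup transformation if the same lookup repeats multiple times, so that it can be called from several expressions instead of being duplicated.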

If the source has duplicate records and we have two targets, T1 for unique values and T2 only for duplicate values, how do we pass the unique values to T1 and the duplicate values to T2 from the same source in a single mapping?


Soln1: source ---> SQ ---> EXP ---> Sorter (with the Distinct option enabled) ---> T1

                              ---> Aggregator (group by the key columns, with a COUNT port) ---> T2

If you want only the duplicates in T2, use this branch instead:

                              ---> Aggregator (group by the key columns, port = DECODE(COUNT(col), 1, 1, 0)) ---> Filter (condition: port = 0) ---> T2

Soln2: Take two source instances. In the first, enable Select Distinct in the Source Qualifier and connect it to target T1.

In the second source instance, write a query that fetches only the duplicate records and connect it to target T2.

<< If you use an Aggregator as suggested above, you will get the duplicate as well as the distinct records in the second target. >>

Soln3: Use a Sorter transformation, sorting on the key fields by which you want to find duplicates, and then an Expression transformation.

Example:

Input ports: field1, field2

Sorter: field1 (ascending/descending), field2 (ascending/descending)

Expression transformation ports, in this order:

field1 (input)
field2 (input)
v_field1_curr = field1
v_field2_curr = field2
v_dup_flag = IIF(v_field1_curr = v_field1_prev AND v_field2_curr = v_field2_prev, TRUE, FALSE)
o_dup_flag = IIF(v_dup_flag = TRUE, 'Duplicate', 'Not Duplicate')
v_field1_prev = v_field1_curr
v_field2_prev = v_field2_curr

The *_prev ports must come after v_dup_flag, so that the comparison still sees the previous row's values.

Use a Router transformation: send rows with o_dup_flag = 'Duplicate' to T2 and rows with 'Not Duplicate' to T1.

Informatica evaluates rows one at a time, and variable ports keep their values from the previous row. Because the rows arrive sorted, duplicates are adjacent, so comparing the current row with the previous one is enough.
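The same sort-then-compare-with-previous logic, as a short Python sketch: the `prev` variable plays the role of the `v_*_prev` variable ports, and the final split mimics the Router. The rows are hypothetical `(field1, field2)` pairs. As with Soln3, the first occurrence of each key goes to T1 and every repeat goes to T2.

```python
from typing import List, Optional, Tuple

Row = Tuple[str, str]  # (field1, field2)


def split_duplicates(rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Return (T1: first occurrences, T2: repeats) after sorting on the key fields."""
    t1: List[Row] = []  # 'Not Duplicate' rows
    t2: List[Row] = []  # 'Duplicate' rows
    prev: Optional[Row] = None  # like v_field1_prev / v_field2_prev
    for row in sorted(rows):  # the Sorter on field1, field2
        if row == prev:
            t2.append(row)
        else:
            t1.append(row)
        prev = row  # updated after the comparison, like the *_prev ports
    return t1, t2


unique_rows, duplicate_rows = split_duplicates(
    [("b", "2"), ("a", "1"), ("b", "2"), ("a", "1"), ("b", "2"), ("c", "3")]
)
print(unique_rows)
print(duplicate_rows)
```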

What enhancements were made in Informatica 7.1.1 compared to version 6.2.2?

In 7.x versions:

- We can look up a flat file.
- Union and Custom transformations were added.
- There is a propagate option: if we change the data type of a field, all the linked columns reflect that change.
- We can write to an XML target.
- We can use up to 64 partitions.

What is the difference between PowerCenter and PowerMart?

What is the procedure for creating independent data marts from Informatica 7.1?

PowerCenter supports multiple repositories, whereas PowerMart has a single repository (a desktop repository). PowerCenter repositories can also be linked to a global repository to share objects between users.

Feature                 PowerCenter        PowerMart
No. of repositories     n                  n
Applicability           High-end WH        Low- and mid-range WH
Global repository       Supported          Not supported
Local repository        Supported          Supported
ERP support             Available          Not available

What are the Lookup and Update Strategy transformations? Explain with an example.

The Lookup transformation is used to look up data in a relational table, view, synonym, or flat file.

The Informatica server queries the lookup table based on the lookup ports used in the transformation.

It compares the Lookup transformation port values with the lookup table column values based on the lookup condition.

Using a lookup, we can get a related value, perform a calculation, and update slowly changing dimension (SCD) tables.

There are two types of lookups:

Connected

Unconnected

Update Strategy transformation

This is used to control how rows are flagged: for insert, update, delete, or reject.

At the session level, the Treat source rows as option defines how rows are flagged: Insert, Delete, Update, or Data Driven.

For Update, we have three options:

Update as Update

Update as insert

Update else insert
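As an example of the two working together: a Lookup on the target's key tells whether an incoming row already exists, and the Update Strategy flags the row accordingly (with Treat source rows as set to Data Driven). In this Python sketch, new rows are flagged for insert, changed rows for update, and unchanged rows for reject; that reject rule is an assumption for illustration. The numeric values mirror Informatica's DD_INSERT (0), DD_UPDATE (1), DD_DELETE (2), and DD_REJECT (3) constants; the table contents and column names are hypothetical.

```python
from typing import Dict, List, Tuple

# Row-flag constants as used in Informatica Update Strategy expressions
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3

Row = Dict[str, str]


def flag_rows(source: List[Row], target_by_key: Dict[str, Row]) -> List[Tuple[int, Row]]:
    """Flag each source row after looking it up in the target by cust_id."""
    flagged: List[Tuple[int, Row]] = []
    for row in source:
        existing = target_by_key.get(row["cust_id"])  # the Lookup transformation
        if existing is None:
            flagged.append((DD_INSERT, row))  # new customer
        elif existing != row:
            flagged.append((DD_UPDATE, row))  # changed customer
        else:
            flagged.append((DD_REJECT, row))  # unchanged: not written to the target
    return flagged


target = {"C1": {"cust_id": "C1", "city": "Pune"},
          "C3": {"cust_id": "C3", "city": "Chennai"}}
source = [{"cust_id": "C1", "city": "Mumbai"},
          {"cust_id": "C2", "city": "Delhi"},
          {"cust_id": "C3", "city": "Chennai"}]
print([flag for flag, _ in flag_rows(source, target)])
```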


What logic would you implement to load data into one fact table from n dimension tables?

To load data into one fact table from several dimension tables, first create the fact table and the dimension tables. Then load data into each dimension using sources and transformations (Aggregator, Sequence Generator, Lookup) in the Mapping Designer. Finally, for the fact table, connect each dimension's surrogate key to the corresponding foreign key, and the remaining columns from the dimensions to the fact table.

We load the dimension tables before the fact table because the fact rows reference the dimension rows: each foreign key in the fact table must already exist as a key in a dimension table.

Loading data from the dimension tables into the fact table is then simple: treat the dimension tables as source tables and the fact table as the target.
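A minimal Python sketch of this pattern: once the dimensions are loaded, each source transaction looks up every dimension's surrogate key by its natural key, and the fact row stores those keys plus the measures. Table and column names are hypothetical, and setting aside rows whose dimension key is missing is only one possible policy.

```python
from typing import Dict, List, Tuple


def load_fact(transactions: List[Dict],
              customer_keys: Dict[str, int],
              product_keys: Dict[str, int]) -> Tuple[List[Dict], List[Dict]]:
    """Return (fact rows, rejected rows) using natural-key -> surrogate-key lookups."""
    facts: List[Dict] = []
    rejects: List[Dict] = []
    for txn in transactions:
        cust_sk = customer_keys.get(txn["customer_id"])  # lookup on customer dim
        prod_sk = product_keys.get(txn["product_code"])  # lookup on product dim
        if cust_sk is None or prod_sk is None:
            rejects.append(txn)  # dimension row not loaded yet
            continue
        facts.append({"customer_sk": cust_sk, "product_sk": prod_sk,
                      "amount": txn["amount"]})
    return facts, rejects


customers = {"CUST-1": 101, "CUST-2": 102}  # natural key -> surrogate key
products = {"P-9": 7}
txns = [{"customer_id": "CUST-1", "product_code": "P-9", "amount": 10.0},
        {"customer_id": "CUST-3", "product_code": "P-9", "amount": 5.0}]
print(load_fact(txns, customers, products))
```

The rejected row shows why the order matters: a transaction for a customer that is not yet in the dimension has no surrogate key to point to.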

If I use the bulk loading option in a session, can I still recover the session?

If the session is configured for bulk mode, it does not write recovery information to the recovery tables, so bulk loading cannot perform recovery.

No. With a bulk load, the database does not write to its log (redo log), whereas a normal load does. Bulk load improves session performance, but at the cost of recoverability.

How do you configure a mapping in Informatica?

 

You should configure the mapping with the least number of transformations and expressions to do the most amount of work possible. You should minimize the amount of data moved by deleting unnecessary links between transformations.

For transformations that use data cache (such as Aggregator, Joiner, Rank, and Lookup transformations), limit connected input/output or output ports. Limiting the number of connected input/output or output ports reduces the amount of data the transformations store in the data cache.

You can also perform the following tasks to optimize the mapping:

- Configure single-pass reading.
- Optimize datatype conversions.
- Eliminate transformation errors.
- Optimize transformations.
- Optimize expressions.

What is the difference between a dimension table and a fact table, and what are the different kinds of dimension and fact tables?

A fact table contains measurable data; it has fewer columns and many rows.

It contains foreign keys to the dimension tables, and its primary key is usually made up of those keys.

Different types of fact tables:

Additive, non-additive, semi-additive

A dimension table contains textual descriptions of the data; it has many columns and fewer rows.

It contains a primary key (typically a surrogate key) that the fact table references.
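The additivity types listed above can be illustrated with two measures. Daily sales amounts are additive, so summing them across days is meaningful; an account balance is semi-additive, since it can be summed across accounts for one day but not across days (over time you typically take the closing or average balance). A small Python illustration with hypothetical data:

```python
from typing import Dict

# Hypothetical daily snapshot rows: (day, account, sales_amount, balance)
rows = [
    ("2024-01-01", "A", 100, 500),
    ("2024-01-01", "B", 50, 300),
    ("2024-01-02", "A", 20, 520),
    ("2024-01-02", "B", 0, 250),
]

# Additive: sales can be summed over every dimension, including time.
total_sales = sum(r[2] for r in rows)

# Semi-additive: balances add up across accounts for a single day...
balance_by_day: Dict[str, int] = {}
for day, _, _, balance in rows:
    balance_by_day[day] = balance_by_day.get(day, 0) + balance

# ...but across days we take the closing (latest) value instead of a sum.
closing_day = max(balance_by_day)
closing_balance = balance_by_day[closing_day]
print(total_sales, balance_by_day, closing_balance)
```

Summing the balances over both days would give 1570, a number that does not correspond to any real amount of money held.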

What is a worklet, what is it used for, and in which situations can we use it?

A worklet is a set of tasks. If a certain set of tasks has to be reused in many workflows, we use a worklet. To execute a worklet, it has to be placed inside a workflow.

Using a worklet in a workflow is similar to using a mapplet in a mapping.

What are mapping parameters and variables, and in which situations can we use them?

If we need to change certain attributes of a mapping every time the session runs, it is tedious to edit the mapping each time. Instead, we use mapping parameters and variables and define their values in a parameter file; then we only edit the parameter file to change the values. This makes the process simple.

Mapping parameter values remain constant. If we need to change the parameter values then we need to edit the parameter file.

The value of a mapping variable, however, can be changed with variable functions (such as SETVARIABLE or SETMAXVARIABLE). If we need to increment an attribute value by 1 after every session run, we can use a mapping variable.

With a mapping parameter, we need to edit the value manually in the parameter file after every session run.

Explain the use of the Update Strategy transformation.

 


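It is used to maintain history data as well as the most recent changes, by flagging rows for insert, update, delete, or reject (for example, in slowly changing dimensions).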

What is meant by a complex mapping?

 

A complex mapping involves more logic and more business rules. For example, in my bank project I was involved in building a data warehouse. Many customers relocated to another place after taking loans, and it was difficult to maintain both their previous and current addresses. To handle this, I used SCD Type 2. This is a simple example of a complex mapping.
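The address-history case is what an SCD Type 2 dimension handles: when a tracked attribute changes, the current row is closed (end date set, current flag off) and a new row with a new surrogate key is inserted, so both addresses stay queryable. A minimal Python sketch; the column names (customer_sk, valid_from, valid_to, is_current) and the 9999-12-31 open-end date are hypothetical conventions, and real mappings usually do this with a Lookup plus an Update Strategy or with the SCD wizard.

```python
from datetime import date
from typing import Dict, List, Optional

HIGH_DATE = date(9999, 12, 31)  # assumed "open-ended" end date


def apply_scd2(dim: List[Dict], customer_id: str, address: str, as_of: date) -> None:
    """Close the current row if the address changed, then insert a new version."""
    current: Optional[Dict] = next(
        (r for r in dim if r["customer_id"] == customer_id and r["is_current"]), None)
    if current is not None and current["address"] == address:
        return  # no change: nothing to do
    if current is not None:  # expire the old version
        current["valid_to"] = as_of
        current["is_current"] = False
    next_sk = max((r["customer_sk"] for r in dim), default=0) + 1  # new surrogate key
    dim.append({"customer_sk": next_sk, "customer_id": customer_id,
                "address": address, "valid_from": as_of,
                "valid_to": HIGH_DATE, "is_current": True})


dim: List[Dict] = []
apply_scd2(dim, "C100", "Chennai", date(2020, 1, 1))
apply_scd2(dim, "C100", "Bangalore", date(2023, 6, 1))  # customer relocated
for row in dim:
    print(row)
```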

I have a requirement where the column names of a table (Table A) should appear as rows of the target table (Table B), i.e. converting columns to rows. Is it possible through Informatica? If so, how?

If the data in the tables is as follows:

Table A: key_1 char(3); values: 1, 2, 3

Table B: bkey_a char(3), bcode char(1); values: (1, T), (1, A), (1, G), (2, A), (2, T), (2, L), (3, A)

and the required output is:

1, T, A
2, A, T, L
3, A

then the SQL query in the Source Qualifier should be:

select key_1,
       max(decode(bcode, 'T', bcode, null)) t_code,
       max(decode(bcode, 'A', bcode, null)) a_code,
       max(decode(bcode, 'L', bcode, null)) l_code
from a, b
where a.key_1 = b.bkey_a
group by key_1
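To check the query's logic without an Oracle database, here is a Python sketch that runs it against SQLite from the standard library. SQLite has no DECODE, so each DECODE(bcode, 'X', bcode, NULL) is rewritten as the equivalent CASE expression; the sample data is the Table A/Table B example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (key_1 CHAR(3));
    CREATE TABLE b (bkey_a CHAR(3), bcode CHAR(1));
    INSERT INTO a VALUES ('1'), ('2'), ('3');
    INSERT INTO b VALUES ('1','T'), ('1','A'), ('1','G'),
                         ('2','A'), ('2','T'), ('2','L'), ('3','A');
""")

# CASE WHEN ... THEN ... END is the portable form of Oracle's DECODE;
# MAX() ignores NULLs, so each column keeps the one matching code per key.
rows = conn.execute("""
    SELECT key_1,
           MAX(CASE WHEN bcode = 'T' THEN bcode END) AS t_code,
           MAX(CASE WHEN bcode = 'A' THEN bcode END) AS a_code,
           MAX(CASE WHEN bcode = 'L' THEN bcode END) AS l_code
    FROM a JOIN b ON a.key_1 = b.bkey_a
    GROUP BY key_1
    ORDER BY key_1
""").fetchall()
for row in rows:
    print(row)
```

Note that the columns come out in t_code, a_code, l_code order with NULL where a code is absent, and that code G is dropped because the query has no column for it.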


If a session fails after loading 10,000 records into the target, how can you load from the 10,001st record when you run the session next time in Informatica 6.1?

Simple solution: use the session's recovery option (Perform Recovery). The next run then resumes loading after the last committed row instead of starting over.

Can we run a group of sessions without using the Workflow Manager?

Yes, it is possible: use the pmcmd command to run a group of sessions without the Workflow Manager.

What is the difference between stop and abort?

 

The PowerCenter Server handles the abort command for a Session task like the stop command, except that it has a timeout period of 60 seconds. If the PowerCenter Server cannot finish processing and committing data within the timeout period, it kills the DTM process and terminates the session.

Stop: The server stops reading data but continues processing and committing the data it has already read. If the session you want to stop is part of a batch, you must stop the batch; if the batch is part of a nested batch, stop the outermost batch.

Abort: You can issue the abort command; it is similar to the stop command except that it has a 60-second timeout. If the server cannot finish processing and committing data within 60 seconds, it kills the DTM process and terminates the session.

What is the difference between a cached lookup and an uncached lookup?

Can I run a mapping without starting the Informatica server?

The difference is as follows. When you configure a Lookup transformation as cached, it stores all the lookup table data in the cache when the first input record enters the Lookup transformation. The SELECT statement executes only once, and the values of each input record are compared with the values in the cache. In an uncached lookup, the SELECT statement executes for each input record entering the Lookup transformation, so it has to query the database for every new record.
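The difference can be modelled in Python by counting queries against a stand-in for the lookup table: the cached version runs one SELECT to fill the cache and then probes memory, while the uncached version queries once per input row. Purely illustrative; `FakeLookupTable` and the lookup functions are hypothetical.

```python
from typing import Dict, List, Optional


class FakeLookupTable:
    """Stands in for the database table and counts SELECTs issued against it."""

    def __init__(self, data: Dict[str, str]) -> None:
        self._data = data
        self.queries = 0

    def select_all(self) -> Dict[str, str]:
        self.queries += 1
        return dict(self._data)

    def select_one(self, key: str) -> Optional[str]:
        self.queries += 1
        return self._data.get(key)


def cached_lookup(table: FakeLookupTable, keys: List[str]) -> List[Optional[str]]:
    cache = table.select_all()  # one query to build the cache
    return [cache.get(k) for k in keys]  # every row is then matched in memory


def uncached_lookup(table: FakeLookupTable, keys: List[str]) -> List[Optional[str]]:
    return [table.select_one(k) for k in keys]  # one query per input row


if __name__ == "__main__":
    inputs = ["C1", "C2", "C1", "C9"]
    cached_table = FakeLookupTable({"C1": "Pune", "C2": "Delhi"})
    uncached_table = FakeLookupTable({"C1": "Pune", "C2": "Delhi"})
    print(cached_lookup(cached_table, inputs), cached_table.queries)
    print(uncached_lookup(uncached_table, inputs), uncached_table.queries)
```

The trade-off: caching costs memory and an up-front read of the whole table, so an uncached lookup can still win when the lookup table is huge and only a few input rows arrive.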

I want to prepare a questionnaire. The details are as follows:

1. Identify a large company or organization that is a prime candidate for a DWH project (for example, a telecommunications company, an insurance company, or a bank).

2. Give at least four reasons for selecting the organization.

3. Prepare a questionnaire of at least 15 non-trivial questions to collect requirements/information about the organization. This information is required to build the data warehouse.


Can you please tell me what those 15 questions should be, say, for a telecom company?

First of all, meet your sponsors and write a BRD (business requirements document) about what they expect from this data warehouse; the main aim comes from them. For example, they may need the customer billing process. Then go to the business management team; they may ask for metrics from the billing process for their own use. Management may want monthly usage and billing metrics by sales organization and rate plan, to perform sales rep and channel performance analysis and rate plan analysis. So your dimension tables can be:

- Customer (customer ID, name, city, state, etc.)
- Sales rep (sales rep number, name, ID)
- Sales org (sales org ID)
- Bill (bill number, bill date, number)
- Rate plan (rate plan code)

And the fact table can be Billing details (bill number, customer ID, minutes used, call details, etc.). You can follow a star or snowflake schema in this case, depending on the granularity of your data.

Can I start and stop a single session in a concurrent batch?

Just right-click the particular session and go to the recovery option,

or

by using Event-Wait and Event-Raise tasks.

What is MicroStrategy? What is it used for? Can anyone explain it in detail?

MicroStrategy is another BI tool, a HOLAP tool: you can create two-dimensional reports and also cubes in it. It is basically a reporting tool, with a full range of reporting on the web as well as in Windows.

What is the difference between Informatica 7.1 and Ab Initio?

There are many differences between Informatica and Ab Initio.

Ab Initio supports three kinds of parallelism (component, data, and pipeline),

but Informatica supports only one.

Ab Initio has no scheduling option; we schedule manually or with a PL/SQL script,

but Informatica has four scheduling options.

Ab Initio includes the Co>Operating System,

but Informatica does not.

Ramp-up time is much shorter in Ab Initio than in Informatica.

Ab Initio is more user-friendly than Informatica.

What is a mystery dimension?

 


Also known as a junk dimension: it makes sense of the rogue fields in your fact table.

What are the cost-based and rule-based approaches, and what is the difference?

Cost-based and rule-based approaches are optimization techniques used in databases when a SQL query needs to be optimized.

Oracle provides two main types of optimizer (there is in fact a third, but we use only these two because the third has some disadvantages).

Whenever you run a SQL query in Oracle, the Oracle engine internally reads the query and decides the best possible way to execute it. In this process, Oracle uses these optimization techniques.

1. Cost-based optimizer (CBO): if a SQL query can be executed in two different ways (say, path 1 and path 2 for the same query), the CBO calculates the cost of each path, determines which path is cheaper to execute, and executes that one, thereby optimizing query execution.

2. Rule-based optimizer (RBO): this follows a fixed set of rules for executing a query. Depending on which rules apply, the optimizer decides how to run the query.

Use:

If the table you are querying has already been analyzed, Oracle uses the CBO (this applies when the optimizer mode is CHOOSE).

If the table has not been analyzed, Oracle uses the RBO.

The first time, if the table has not been analyzed, Oracle will go with a full table scan.

What are partition points?

Partition points mark the thread boundaries in a source pipeline and divide the pipeline into stages.

How do you append records to a flat file in Informatica? In DataStage we have the options i) overwrite the existing file and ii) append to the existing file.

This is not available in Informatica 7, but I have heard that the latest version, 8.0, lets you append to a flat file; it is about to ship.

Suppose you had to split the source-level key into two separate tables, one as a surrogate key and the other as a primary key. Since Informatica does not guarantee that keys are loaded into those tables in the proper order, what are the different ways you could handle this situation?


foreign key

What is the best way to show metadata (the number of rows at the source, the target, and each transformation level, and error-related data) in a report format?

When your workflow completes, go to the Workflow Monitor, right-click the session, and go to the transformation statistics; there you can see the number of rows at the source and target. In the session properties, you can see data-related errors.

You can also select these details from the repository tables; for example, you can use the view REP_SESS_LOG to get this data.

Two relational tables are connected to an SQ transformation. What errors could be thrown?

We can connect two relational tables to one SQ transformation, and no errors will be thrown, as long as both tables come from the same database.

Without using an Update Strategy transformation or session options, how can we update our target table?

Soln1: You can do this with the "Update Override" option in the target properties.

Soln2: In the session properties, there are options such as

Insert

Update as Update

Update as Insert

Update else Insert

By using these, we can easily solve this.

Soln3: By default, all rows in the session are flagged for insert. You can change this in the session's general properties with Treat source rows as: Update.

All incoming rows are then flagged for update, and you can update the rows in the target table.

Could anyone please tell me the steps required for a Type 2 dimension/version data mapping? How can we implement it?

Go to the Mapping Designer, choose the mapping wizards from the Mappings menu, and select Slowly Changing Dimension.

A new window appears, where you give the mapping name, source table, target table, and the type of slowly changing dimension. When you click Finish, the SCD Type 2 mapping is created.


Go to the Warehouse Designer and generate the target table, then validate the mapping in the Mapping Designer, save it to the repository, and run the session in the Workflow Manager.

Later, update the source table and run the session again; you will see the difference in the target table.

How do you import an Oracle sequence into Informatica?

Create a procedure that uses the sequence, and then call the procedure in Informatica with a Stored Procedure transformation.

What are data merging, data cleansing, and sampling?

Cleansing: identifying and removing redundancy and inconsistency in the data.

Sampling: sending just a sample of the data from the source to the target.

What is an IQD file?

 

An IQD file is an Impromptu Query Definition. It is mainly used in the Cognos Impromptu tool: after creating an IMR (report), we save it as an IQD file, which is then used when creating a cube in PowerPlay Transformer, where we select Impromptu Query Definition as the data source type.

Differences between normalization and the Normalizer transformation.

Normalizer: a transformation mainly used for COBOL sources.

It converts columns into rows: a single row with repeating (multiple-occurring) columns becomes multiple rows, one per occurrence.

Normalization: removing redundancy and inconsistency.
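What the Normalizer transformation does with a repeating group can be sketched in Python: one input row with N repeated columns becomes N output rows, each carrying an occurrence index, similar in spirit to the generated column ID the Normalizer adds. The field names and the `normalize` function are hypothetical.

```python
from typing import Dict, List


def normalize(row: Dict, id_field: str, repeated_fields: List[str]) -> List[Dict]:
    """Turn one row with repeated columns into one output row per occurrence."""
    return [
        {id_field: row[id_field], "occurrence": i, "value": row[field]}
        for i, field in enumerate(repeated_fields, start=1)
    ]


store_sales = {"store_id": "S1", "q1_sales": 100, "q2_sales": 120,
               "q3_sales": 90, "q4_sales": 150}
for out in normalize(store_sales, "store_id",
                     ["q1_sales", "q2_sales", "q3_sales", "q4_sales"]):
    print(out)
```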

How do I import VSAM files from source to target? Do I need a special plugin?

In the Designer we have a direct option to import VSAM files. Navigation: Sources => Import from file => file from COBOL.

What are the steps for implementing versioning if you are already on version 7.x? Any gotchas or precautions?

For version control in the ETL layer using Informatica, after doing anything in the Designer or the Workflow Manager, do the following steps:

1> First, save the changes or new implementations.

2> Then, in the Navigator window, right-click the object you are working on. In the pop-up menu, near the bottom, you will find Versioning -> Check In. A window opens; enter a comment describing what you changed, such as "modified this mapping". Then click OK.

Can anyone explain error handling in Informatica with examples, so that it is easy to explain in an interview?


Go to the session log file. There you will find information about:

the session initialization process,

errors encountered,

the load summary.

By looking at the errors encountered while the session was running, we can resolve them.

If you have four lookup tables in a workflow, how do you troubleshoot to improve performance?

There are many ways to improve a mapping that has multiple lookups.

1) Create an index on the lookup table, if we have permissions (for example, in the staging area).

2) Divide the lookup mapping into two: (a) one dedicated to inserts (source rows not yet in the target): only new rows come into the mapping, so processing is fast; (b) one dedicated to updates (source rows that already exist in the target): only the existing rows come into the mapping.

3) Increase the lookup cache size.

If your workflow is running slowly in Informatica, where do you start troubleshooting, and what steps do you follow?

SOLN1: When the workflow is running slowly, you have to find the bottlenecks in this order:

target

source

mapping

session

system

SOLN2: A workflow may be slow for various reasons, for example alphabetic characters in decimal data or insufficient string lengths. Check for these, and check the SQL override.

How do you handle decimal places when importing a flat file into Informatica?

When importing the flat file, the Flat File Wizard helps you configure the file's properties: select the numeric column and enter the precision and scale. Precision includes the scale. For example, if the number is 98888.654, enter a precision of 8, a scale of 3, and a width of 10 for a fixed-width flat file.
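The arithmetic can be checked with Python's `decimal` module: precision is the total number of digits, and scale is the number of digits after the decimal point. The width rule used here (precision plus one position for the decimal point and one for a sign) is an assumption that matches the example, not a documented Informatica formula; the function name is hypothetical, and only finite numbers are handled.

```python
from decimal import Decimal
from typing import Tuple


def precision_scale_width(literal: str) -> Tuple[int, int, int]:
    """Return (precision, scale, width) for a finite decimal literal such as '98888.654'."""
    _sign, digits, exponent = Decimal(literal).as_tuple()
    scale = max(-exponent, 0)  # digits after the decimal point
    precision = max(len(digits), scale)  # total digits, including the scale
    width = precision + (1 if scale else 0) + 1  # + decimal point + sign position
    return precision, scale, width


print(precision_scale_width("98888.654"))
```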

In a sequential batch, how can we stop a single session?

We can stop it with an Event-Wait task,

and start it again with an Event-Raise task.

Why are dimension tables denormalized in nature?

 

Because in data warehousing historical data should be maintained. Maintaining historical data means, for example, keeping all the details of an employee, such as where he worked previously and where he works now, in one table. If you use the employee ID as the primary key, it will not allow multiple records with the same employee ID. So, to maintain historical data, data warehouses use surrogate keys (for example, generated from an Oracle sequence), which let us keep several versions of the same employee.

Because all the dimensions maintain historical data, they are denormalized: the table holds repeated entries that are not exact duplicates, but additional records for the same employee number.

Can we use an Aggregator or another active transformation after an Update Strategy transformation?

We can, but the update flag will not be retained. Passive transformations, however, can be used safely.

Can anyone comment on the significance of Oracle 9i in Informatica compared to Oracle 8 or 8i?

I mean, how is Oracle 9i advantageous compared to Oracle 8 or 8i when used with Informatica?

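There are a few differences.

Oracle 9i extended user-defined (object) types, for example with type inheritance, which 8i did not support.

LOB types such as BLOB and CLOB, however, already existed in Oracle 8 and 8i, so they are not a 9i-only feature.

Moreover, list partitioning is available only in 9i.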

With mapping parameters and variables, the variable value is saved to the repository after the session completes, and the next time you run the session, the server takes the saved value from the repository and continues from it. For example, I ran a session and at the end it stored a value of 50 in the repository. The next time I run the session, it should start with the value 70, not 51.


How can I do this?

SOLN1: After running the mapping, in the Workflow Manager:

              start --------> session

Right-click the session and open the persistent values; there you will find the last value stored in the repository for the mapping variable. Remove it, put in your desired value, and run the session. Your task should then be done.

SOLN2: By default it takes the value 51, but you can override the saved variable value by defining it in the parameter file. If a parameter file defines the mapping variable, the server uses that value, not the repository value + 1. For example, assign the mapping variable a value of 70 there. In other words, the value in the parameter file takes precedence.
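The precedence behind SOLN2 can be summarized as: a value in the parameter file wins; otherwise the value saved in the repository; otherwise the initial value defined in the mapping; otherwise the datatype's default. This is the order as we understand it from the Informatica documentation. A minimal Python sketch of that rule, with a hypothetical function name and an integer variable whose default is assumed to be 0:

```python
from typing import Optional


def start_value(param_file_value: Optional[int],
                repository_value: Optional[int],
                initial_value: Optional[int],
                datatype_default: int = 0) -> int:
    """Pick a mapping variable's start value by order of precedence."""
    for candidate in (param_file_value, repository_value, initial_value):
        if candidate is not None:
            return candidate
    return datatype_default


# The question above: 50 is saved in the repository, but a
# parameter-file value of 70 takes precedence.
print(start_value(param_file_value=70, repository_value=50, initial_value=1))
print(start_value(param_file_value=None, repository_value=50, initial_value=1))
```

Remember to remove the value from the parameter file afterwards; otherwise every run restarts from it instead of from the saved repository value.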

How do you use mapping parameters, and what is their use?

Mapping parameters and variables make mappings more flexible and avoid creating multiple mappings. They also help when loading incremental data. Mapping parameters and variables are created in the Mapping Designer by choosing the menu option Mappings > Parameters and Variables, entering a name for the variable or parameter (it must be preceded by $$), and choosing the type (parameter or variable) and the data type. Once defined, the variable or parameter can be used in any expression, for example in the Source Filter property on the Properties tab of the Source Qualifier transformation: just enter the filter condition. Finally, create a parameter file to assign the value of the variable or parameter, and configure it in the session properties. The final step is optional: if the parameter is not present in a parameter file, the server uses the initial value assigned when the variable was created.
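To make the parameter file step concrete, here is a Python sketch of how $$ names in a source filter get their values: read the `$$name=value` lines under one session heading, and fall back to the initial value when the file lacks an entry. The heading format `[folder.WF:workflow.ST:session]` is the PowerCenter 7.x convention as I recall it (check your version's documentation); the folder, workflow, session, parameter names and the helper functions are all hypothetical.

```python
import re
from typing import Dict

# Hypothetical parameter file: one [folder.WF:workflow.ST:session] heading,
# then $$name=value lines for that session.
PARAM_FILE = """\
[MyFolder.WF:wf_load_orders.ST:s_m_load_orders]
$$LoadDate=2004-01-31
$$Region=EAST
"""


def read_parameters(text: str, section: str) -> Dict[str, str]:
    """Collect the $$name=value pairs listed under one section heading."""
    params: Dict[str, str] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
        elif current == section and line.startswith("$$") and "=" in line:
            name, value = line.split("=", 1)
            params[name.strip()] = value.strip()
    return params


def expand(expression: str, params: Dict[str, str], initial: Dict[str, str]) -> str:
    """Replace each $$name; fall back to its initial value if the file lacks it."""
    def lookup(match) -> str:
        name = match.group(0)
        return params.get(name, initial.get(name, name))
    return re.sub(r"\$\$\w+", lookup, expression)


params = read_parameters(PARAM_FILE, "MyFolder.WF:wf_load_orders.ST:s_m_load_orders")
source_filter = "ORDERS.REGION = '$$Region' AND ORDERS.ORDER_DATE > '$$LoadDate'"
print(expand(source_filter, params, {"$$Region": "WEST"}))
# ORDERS.REGION = 'EAST' AND ORDERS.ORDER_DATE > '2004-01-31'
```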

How do you delete duplicate rows from a flat file source? Is there any option in Informatica?

Use a Sorter transformation; it has a "Distinct" option. Make use of it.
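Why the Sorter's Distinct option works can be shown with a small Python model: once every row is sorted (with Distinct, all ports act as sort keys, as I understand it), identical rows are adjacent and one pass drops the repeats. `sorter_distinct` and the sample rows are hypothetical.

```python
from typing import Iterable, List, Tuple

Row = Tuple[str, ...]


def sorter_distinct(rows: Iterable[Row]) -> List[Row]:
    """Sort all rows, then keep one copy of each identical row.

    Because the output is sorted, duplicates are adjacent and can be
    dropped by comparing each row with the last one kept.
    """
    result: List[Row] = []
    for row in sorted(rows):
        if not result or result[-1] != row:
            result.append(row)
    return result


rows = [("102", "Smith"), ("101", "Jones"), ("102", "Smith")]
print(sorter_distinct(rows))  # [('101', 'Jones'), ('102', 'Smith')]
```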

What is the use of incremental aggregation? Explain it briefly with an example.

It is a session option. When the Informatica server performs incremental aggregation, it passes new source data through the mapping and uses historical cache data to perform the new aggregation calculations incrementally. We use it for performance.
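A Python sketch of the idea, assuming a plain dict stands in for the aggregate cache files and only SUM is needed: each run folds the new rows into the totals kept from earlier runs instead of re-reading the full history. The regions and amounts are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def incremental_aggregate(cache: Dict[str, float],
                          new_rows: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Update per-key SUM totals using only the rows read in this run."""
    updated = defaultdict(float, cache)    # start from the historical cache
    for key, amount in new_rows:
        updated[key] += amount             # fold in the new source data only
    return dict(updated)


# Run 1: the cache is empty, so all existing rows are aggregated once.
cache = incremental_aggregate({}, [("EAST", 100.0), ("WEST", 50.0)])
# Run 2: only the new rows pass through the mapping.
cache = incremental_aggregate(cache, [("EAST", 25.0), ("NORTH", 10.0)])
print(cache)  # {'EAST': 125.0, 'WEST': 50.0, 'NORTH': 10.0}
```

If the cache is lost (see the note on deleted cache files further down), the model has to start again from an empty dict and re-read everything, which is exactly the full re-aggregation the document warns about.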

What is the procedure to load the fact table? Give details.

SOLN1: We use two wizards, the Getting Started Wizard and the Slowly Changing Dimensions Wizard, to load the fact and dimension tables. Using these two wizards, we can create different types of mappings according to the business requirements and load the star schemas (fact and dimension tables).

SOLN2: First the dimension tables need to be loaded; then the fact tables are loaded according to the specifications. Don't think that fact tables are different when it comes to loading: it is a general mapping, as we do for other tables. The specifications play an important role in loading the fact table.
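The dimension-first order in SOLN2 matters because the fact mapping looks up surrogate keys that the dimension loads created. A Python sketch of that lookup step, assuming dict lookups stand in for Lookup transformations and that unmatched keys map to a hypothetical "unknown" key of -1 (a common convention, not something the answer specifies):

```python
from typing import Dict, List, Tuple


def load_fact(sales: List[Tuple[str, str, float]],
              customer_keys: Dict[str, int],
              product_keys: Dict[str, int],
              unknown_key: int = -1) -> List[Tuple[int, int, float]]:
    """Replace natural keys with dimension surrogate keys, then emit fact rows."""
    facts = []
    for customer_id, product_code, amount in sales:
        facts.append((
            customer_keys.get(customer_id, unknown_key),   # lookup on customer dimension
            product_keys.get(product_code, unknown_key),   # lookup on product dimension
            amount,
        ))
    return facts


# The dimensions were loaded first, so their surrogate keys already exist.
customer_keys = {"C001": 1, "C002": 2}
product_keys = {"P10": 100}
print(load_fact([("C001", "P10", 9.5), ("C003", "P10", 4.0)], customer_keys, product_keys))
# [(1, 100, 9.5), (-1, 100, 4.0)]
```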

How do you look up data on multiple tables?

If you want to look up data on multiple tables at a time, join the tables you want and then look up the joined table. Informatica provides lookups on joined tables.

How do you retrieve the records from a rejected file? Explain with syntax or an example.

SOLN1: There is a utility called the "Reject Loader" with which we can find the rejected records, refine them, and reload them.

SOLN2: During the execution of a workflow, all the rejected rows are stored in bad files (under the directory where your Informatica server is installed, e.g. C:\Program Files\Informatica Power Center 7.1\Server). These bad files can be imported as a flat file source, and then through a direct mapping we can load them in the desired format.

How does the server recognise the source and target databases?

By using an ODBC connection if it is relational, and an FTP connection if it is a flat file. You can check the connections for both sources and targets in the session properties.

What are variable ports and list two situations when they can be used?

There are mainly three types of ports: input, output and variable ports. An input port represents data flowing into the transformation. An output port is used when data is mapped to the next transformation. A variable port is used when mathematical calculations are required.

You can also use one, for example, with price and quantity and a total variable: we can sum total_amt by giving

sum (total_amt)

A variable port is used to break a complex expression into simpler ones,

and also to store intermediate values.
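A variable port keeps its value from one row to the next, which is what makes running totals and stored intermediate values possible. A Python sketch of an Expression transformation with one variable port; the port names (PRICE, QUANTITY, v_total and the output ports) are hypothetical, and the loop stands in for rows arriving one at a time.

```python
from typing import Iterable, List, Tuple


def expression_with_variable_port(rows: Iterable[Tuple[float, int]]) -> List[Tuple[float, float]]:
    """Ports (hypothetical): PRICE, QUANTITY in; v_total variable;
    o_line_amt and o_running_total out."""
    v_total = 0.0                        # a variable port keeps its value across rows
    out = []
    for price, quantity in rows:
        line_amt = price * quantity      # intermediate value, computed once per row
        v_total = v_total + line_amt     # i.e. v_total = v_total + PRICE * QUANTITY
        out.append((line_amt, v_total))
    return out


print(expression_with_variable_port([(2.5, 4), (1.0, 3)]))
# [(10.0, 10.0), (3.0, 13.0)]
```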

What is the difference between the IIF and DECODE functions?

 

You can use nested IIF statements to test multiple conditions. The following example tests for various conditions and returns 0 if sales is zero or negative:

IIF( SALES > 0, IIF( SALES < 50, SALARY1, IIF( SALES < 100, SALARY2, IIF( SALES < 200, SALARY3, BONUS))), 0 )

You can use DECODE instead of IIF in many cases. DECODE may improve readability. The following shows how you can use DECODE instead of IIF:

DECODE( TRUE,

   SALES > 0 and SALES < 50, SALARY1,

   SALES > 49 AND SALES < 100, SALARY2,

   SALES > 99 AND SALES < 200, SALARY3,

   SALES > 199, BONUS)
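The two expressions are not quite identical, and a quick Python model makes the difference visible. The sketch mirrors both forms with string labels standing in for the SALARY1/SALARY2/SALARY3/BONUS ports, assuming whole-number SALES values. DECODE(TRUE, ...) returns the value of the first condition that is true and, with no default argument, returns NULL (modeled as None) when none match. So for SALES <= 0 the IIF returns 0 while the DECODE returns NULL; adding a trailing default (`..., SALES > 199, BONUS, 0)`) would make them match.

```python
def nested_iif(sales: float):
    # IIF( SALES > 0, IIF( SALES < 50, SALARY1, IIF( SALES < 100, SALARY2,
    #      IIF( SALES < 200, SALARY3, BONUS))), 0 )
    if sales > 0:
        if sales < 50:
            return "SALARY1"
        if sales < 100:
            return "SALARY2"
        if sales < 200:
            return "SALARY3"
        return "BONUS"
    return 0


def decode_true(sales: float):
    # DECODE( TRUE, cond1, value1, cond2, value2, ... ): the first true
    # condition wins; with no default argument a non-match returns NULL (None).
    pairs = [
        (0 < sales < 50, "SALARY1"),
        (49 < sales < 100, "SALARY2"),
        (99 < sales < 200, "SALARY3"),
        (sales > 199, "BONUS"),
    ]
    for condition, value in pairs:
        if condition:
            return value
    return None


mismatches = [s for s in range(-5, 301) if nested_iif(s) != decode_true(s)]
print(mismatches)  # [-5, -4, -3, -2, -1, 0]
```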


In dimensional modeling, is the fact table normalized or denormalized? In the case of a star schema, and in the case of a snowflake schema?

There is no concept of normalization in a star schema, but in a snowflake schema the dimension tables must be normalized.

Star schema: denormalized dimensions

Snowflake schema: normalized dimensions

Which is better, the connected or the unconnected Lookup transformation, in Informatica or any other ETL tool?

Comparing the two: a connected lookup can return multiple values, while an unconnected lookup returns one value. A connected lookup is in the same pipeline as the source, and it accepts dynamic caching. An unconnected lookup does not have that facility, but in some special cases we can use it; for example, if the output of one lookup goes as input to another lookup, unconnected lookups are favorable.

I think the better one is the connected lookup, because we can use a dynamic cache with it, and a connected lookup can send multiple columns in a single row, whereas an unconnected lookup has a single return port (as far as Informatica is concerned).

What is the limit on the number of sources and targets you can have in a mapping?

As far as I know, there is no such restriction on the number of sources or targets in a mapping.

The real question is: if you make N tables participate in processing at the same time, what is the state of your database? From an organizational point of view, using N tables at a time is never encouraged; it reduces database and Informatica server performance.

The restriction is only on the database side: how many concurrent threads are you allowed to run on the database server?

Which objects does the debugger require to create a valid debug session?

First, the session must be a valid session.

Source, target, lookups and expressions should be available, and at least one breakpoint should be set for the debugger to debug your session.

An Informatica server object is a must.

What is the procedure to write a query that lists the three highest salaries?

SELECT sal FROM (SELECT sal FROM my_table ORDER BY sal DESC) WHERE ROWNUM < 4;

Since this is Informatica, you might as well use the Rank transformation. Check the help file for how to use it.
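What a Rank transformation with a group-by port does can be sketched in Python: group the rows, order each group on the rank port, and keep the top n. Here the group is a product and the rank port is the sales amount; for the salary query above you would put every row in one group and use n = 3. `rank_top_n` and the sample data are hypothetical, and ties are broken by input order, which may differ from how the real transformation treats them.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def rank_top_n(rows: Iterable[Tuple[str, str, float]],
               n: int) -> Dict[str, List[Tuple[str, float]]]:
    """Top n rows per group: group by product, rank on amount, descending."""
    groups: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for product, employee, amount in rows:
        groups[product].append((employee, amount))
    return {product: sorted(members, key=lambda m: m[1], reverse=True)[:n]
            for product, members in groups.items()}


sales = [("TV", "Ann", 500), ("TV", "Bob", 700), ("TV", "Cy", 300), ("Radio", "Dee", 80)]
print(rank_top_n(sales, 2))
# {'TV': [('Bob', 700), ('Ann', 500)], 'Radio': [('Dee', 80)]}
```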


We are using an Update Strategy transformation in a mapping. How can we know whether the insert, update, reject or delete option was selected while the session ran in Informatica?

In the Designer, while creating the Update Strategy transformation, clear the "Forward Rejected Rows" option. Any rejected rows are then written automatically to the session log file.

Updated or inserted rows can be identified only by checking the target file or table.

Suppose a session is configured with a commit interval of 10,000 rows and the source has 50,000 rows. Explain the commit points for source-based commit and target-based commit. Assume appropriate values wherever required.

Source-based commit commits data to the target based on the commit interval, so it commits every 10,000 source rows.

Target-based commit commits data to the target based on the commit interval and the writer buffer: once the commit interval is reached, the server keeps filling the current writer buffer block and commits when that block is full. Let us assume a buffer block holds 6,000 rows. Then the first commit happens at 12,000 rows (the first full block after 10,000), the later commits fall at 24,000, 30,000 and 42,000 rows, and there is a final commit at 50,000.
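Both commit schemes can be computed with a short Python model. This is a sketch under a simplified rule: for target-based commit, once each multiple of the commit interval is reached, the server keeps filling the current writer buffer block and commits when it is full, plus a final commit at end of data. The buffer is treated as holding a fixed number of rows (6,000, as assumed in the answer); real buffer blocks are sized in bytes, and writer timeouts and filtered rows are ignored. The function names are hypothetical.

```python
from typing import List


def source_based_commits(total_rows: int, interval: int) -> List[int]:
    """Commit every `interval` source rows, plus a final commit at end of data."""
    points = list(range(interval, total_rows + 1, interval))
    if not points or points[-1] != total_rows:
        points.append(total_rows)
    return points


def target_based_commits(total_rows: int, interval: int, block_rows: int) -> List[int]:
    """Once a commit interval is reached, commit at the next full buffer block."""
    points: List[int] = []
    next_interval = interval
    for block_end in range(block_rows, total_rows + 1, block_rows):
        if block_end >= next_interval:
            points.append(block_end)
            while next_interval <= block_end:   # skip intervals this commit covered
                next_interval += interval
    if not points or points[-1] != total_rows:
        points.append(total_rows)               # final commit at end of data
    return points


print(source_based_commits(50_000, 10_000))         # [10000, 20000, 30000, 40000, 50000]
print(target_based_commits(50_000, 10_000, 6_000))  # [12000, 24000, 30000, 42000, 50000]
```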

How do we estimate the number of partitions that a mapping really requires? Is it dependent on the machine configuration?

It depends on the Informatica version we are using: for example, Informatica 6 supports only 32 partitions, whereas Informatica 7 supports 64 partitions.

Can Informatica be used as a Cleansing Tool? If yes give example of transformations that can implement a data cleansing routine.

Yes, we can use Informatica for cleansing data. Sometimes we use staging to cleanse the data; it depends on performance. Otherwise, we can use an Expression transformation to cleanse the data.

For example, if a field X has some values and some null values and is assigned to a target field that is a NOT NULL column, inside an expression we can assign a space or some constant value to avoid session failure.

If the input data is in one format and the target is in another format, we can change the format in an expression.

We can assign default values to the target to represent the complete set of data in the target.
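The two Expression-transformation fixes described above (a default for NULLs going to a NOT NULL column, and a format change) look like this as a Python sketch. The Informatica expressions in the docstrings are my illustrative equivalents, and the function names and the 'YYYYMMDD' to 'DD-MM-YYYY' formats are assumptions chosen for the example.

```python
from datetime import datetime
from typing import Optional


def default_if_null(x: Optional[str], default: str = " ") -> str:
    """Like IIF(ISNULL(X), ' ', X): never pass NULL to a NOT NULL target column."""
    return default if x is None else x


def reformat_date(value: str, in_fmt: str = "%Y%m%d", out_fmt: str = "%d-%m-%Y") -> str:
    """Like TO_CHAR(TO_DATE(value, 'YYYYMMDD'), 'DD-MM-YYYY') in an expression."""
    return datetime.strptime(value, in_fmt).strftime(out_fmt)


print(repr(default_if_null(None)), default_if_null("ABC"))   # ' ' ABC
print(reformat_date("20040131"))                              # 31-01-2004
```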

How do you decide whether you need to do aggregations at the database level or at the Informatica level?

It depends only on our requirement. If you have a database with good processing capacity, you can create an aggregation table or view at the database level; otherwise it's better to use Informatica. Here I explain why we need to use Informatica.

Whatever it may be, Informatica is a third-party tool, so it takes more time to process aggregation than the database. But Informatica has an option called "Incremental aggregation", which updates the current values with current values + new values, so it is not necessary to process the entire set of values again and again. This works only as long as nobody deletes the cache files; if that happens, the total aggregation has to be executed again in Informatica too.

In the database we don't have an incremental aggregation facility.

Identifying bottlenecks in various components of Informatica and resolving them.

The best way to find bottlenecks is to write to a flat file and see where the bottleneck is.

How do you join two tables without using the Joiner transformation?

SOLN1: It is possible to join two or more tables by using the Source Qualifier, provided the tables have a relationship.

When you drag and drop the tables, you get a Source Qualifier for each table. Delete all the Source Qualifiers and add one common Source Qualifier for all of them. Right-click the Source Qualifier and click Edit. On the Properties tab you will find the SQL Query property, in which you can write your SQL.

SOLN2: The Joiner transformation is used to join n (n > 1) tables from the same or different databases, but the Source Qualifier transformation can join n tables only from the same database.

SOLN3: Use the Source Qualifier transformation to join tables in the SAME database. Under its Properties tab, you can specify a user-defined join. Any SELECT statement you can run on the database, you can also run in the Source Qualifier.

Note: you can join only 2 sources with one Joiner transformation, but they can come from different databases.

In a filter expression we want to compare a date field with the DB2 system field CURRENT DATE. Our syntax: datefield = CURRENT DATE (we didn't define it as a port; it's a system field), but this is not valid (PMParser: Missing Operator). Can someone help us?

The DB2 date format is "yyyymmdd", whereas SYSDATE in Oracle gives "dd-mm-yy", so converting the DB2 date format to the local database date format is compulsory; otherwise you will get that type of error.

CURRENT DATE is DB2 SQL syntax, which the Informatica expression parser does not recognize. Use SYSDATE for the current date, or TO_DATE to convert a string to a date.

What do the Expression and Filter transformations do in the Informatica Slowly Growing Target wizard?

The Expression transformation detects and flags the new rows from the source.

The Filter transformation filters out the rows that are not flagged and passes the flagged rows to the Update Strategy transformation.
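The flag, filter and insert flow can be sketched in Python, assuming a set of keys already in the target stands in for the wizard's lookup on the target table. The column names are hypothetical; DD_INSERT is the Update Strategy constant for inserts.

```python
from typing import Dict, List, Set


def slowly_growing_target(source: List[Dict], target_keys: Set[int]) -> List[Dict]:
    """Sketch of the wizard's flow: flag new rows, filter, mark for insert."""
    # Expression: flag rows whose key was not found in the target (lookup miss).
    flagged = [dict(row, new_flag=row["cust_id"] not in target_keys) for row in source]
    # Filter: drop rows that are not flagged as new.
    passed = [row for row in flagged if row["new_flag"]]
    # Update Strategy: every surviving row is inserted (DD_INSERT).
    return [dict(row, row_type="DD_INSERT") for row in passed]


source = [{"cust_id": 1, "name": "Ann"}, {"cust_id": 2, "name": "Bob"}]
print(slowly_growing_target(source, target_keys={1}))
# [{'cust_id': 2, 'name': 'Bob', 'new_flag': True, 'row_type': 'DD_INSERT'}]
```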

How do you create the staging area in your database?

 


A staging area in a DW is used as temporary space to hold all the records from the source system. So it should be more or less an exact replica of the source systems, except for the load strategy, where we use truncate and reload options.

So create it using the same layout as your source tables, or using the Generate SQL option in the Warehouse Designer.

What's the difference between the Informatica PowerCenter server, the repository server and the repository?

The PowerCenter server holds the scheduled runs that determine when data should load from source to target. The repository contains all the definitions of the mappings created in the Designer.

What are the differences between Informatica PowerCenter versions 6.2 and 7.1, and between versions 6.2 and 5.1?

The main difference between Informatica 5.1 and 6.1 is that 6.1 introduced a new component called the repository server, and in place of the Server Manager (5.1) it introduced the Workflow Manager and Workflow Monitor.

In version 7.x you have the option of looking up (lookup) on a flat file.

You can write to an XML target.

Versioning

LDAP authentication

Support for 64-bit architectures

Differences between Informatica 6.2 and Informatica 7.0

Features in 7.1 are :

1. Union and Custom transformations

2. Lookup on flat files

3. Grid servers working on different operating systems can coexist on the same server

4. We can use pmcmdrep

5. We can export independent and dependent repository objects

6. We can move a mapping into any web application

7. Version control

8. Data profiling

What is the difference between connected and unconnected stored procedures?


Use case | Stored Procedure transformation mode
Run a stored procedure before or after your session. | Unconnected
Run a stored procedure once during your mapping, such as pre- or post-session. | Unconnected
Run a stored procedure every time a row passes through the Stored Procedure transformation. | Connected or Unconnected
Run a stored procedure based on data that passes through the mapping, such as when a specific port does not contain a null value. | Unconnected
Pass parameters to the stored procedure and receive a single output parameter. | Connected or Unconnected
Pass parameters to the stored procedure and receive multiple output parameters. | Connected or Unconnected
Run nested stored procedures. | Unconnected
Call multiple times within a mapping. | Unconnected

Note: To get multiple output parameters from an unconnected Stored Procedure transformation, you must create variables for each output parameter. For details, see Calling a Stored Procedure From an Expression.

Discuss which is better among incremental load, normal load and bulk load.

If the database supports the bulk load option from Informatica, using BULK LOAD for the initial load of the tables is recommended.

Depending on the requirement, we should choose between the normal and incremental loading strategies.

If supported by the database, bulk load can load data faster than normal load. (The incremental load concept is different; don't mix it up with bulk load and normal load.)

Compare Data Warehousing Top-Down approach with Bottom-up approach

In the top-down approach, we first build the data warehouse and then the data marts. This needs more cross-functional skills, takes more time, and is also costly.

In the bottom-up approach, we first build the data marts and then the data warehouse. The data mart that is built first serves as a proof of concept for the others. It takes less time and costs less than the top-down approach.

What is the difference between a summary filter and a detail filter?

A summary filter can be applied to a group of rows that contain a common value, whereas a detail filter can be applied to each and every record of the database.
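The distinction is the same as WHERE versus HAVING in SQL, and a Python sketch shows the order of operations: the detail filter is tested on every record before aggregation, the summary filter once per group after it. `report` and the sample figures are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def report(rows: List[Tuple[str, float]], detail_min: float,
           summary_min: float) -> Dict[str, float]:
    # Detail filter: tested against every individual record (like SQL WHERE).
    detail_rows = [(region, amt) for region, amt in rows if amt >= detail_min]
    totals: Dict[str, float] = defaultdict(float)
    for region, amt in detail_rows:
        totals[region] += amt
    # Summary filter: tested once per group, after aggregation (like SQL HAVING).
    return {region: total for region, total in totals.items() if total >= summary_min}


rows = [("EAST", 5), ("EAST", 50), ("WEST", 30), ("WEST", 40), ("NORTH", 20)]
print(report(rows, detail_min=10, summary_min=60))   # {'WEST': 70.0}
```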

What is the difference between a view and a materialized view?

Materialized views are schema objects that can be used to summarize, precompute, replicate, and distribute data. E.g. to construct a data warehouse.


A materialized view provides indirect access to table data by storing the results of a query in a separate schema object. Unlike an ordinary view, which does not take up any storage space or contain any data, a materialized view stores the rows that result from its query.

Can we modify the data in a flat file?

Just open the text file in Notepad and change whatever you want (but the data types should stay the same).

How do you get the first 100 rows from a flat file into the target?

SOLN1: task -----> (link) session (Workflow Manager)

Double-click the link and enter a condition on the session's source success rows variable (SrcSuccessRows, one of the predefined session variables) = 100.

It should automatically stop the session.

SOLN2: 1. Use the Test Load option if you only need this for testing.

2. Put a counter/Sequence Generator in the mapping and filter on its value.
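SOLN2's second option (number every row, then filter on the number) can be sketched in Python. This assumes a Sequence Generator starting at 1 with an increment of 1 feeding a Filter with the condition NEXTVAL <= 100; the function name and the 250-line sample file are hypothetical. Like the mapping, the sketch still reads every row and only passes the first 100.

```python
from itertools import count
from typing import Iterable, List


def first_n_rows(rows: Iterable[str], n: int = 100) -> List[str]:
    """Sequence Generator + Filter: number every row, keep NEXTVAL <= n."""
    sequence = count(start=1)          # NEXTVAL: 1, 2, 3, ...
    kept = []
    for row in rows:
        nextval = next(sequence)
        if nextval <= n:               # Filter condition, e.g. NEXTVAL <= 100
            kept.append(row)
    return kept


lines = [f"row{i}" for i in range(1, 251)]   # a hypothetical 250-line file
top = first_n_rows(lines)
print(len(top), top[0], top[-1])   # 100 row1 row100
```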

Can we look up a table from a Source Qualifier transformation (unconnected lookup)?

No, we can't.

Let me explain why.

1) Unless you connect the output of the Source Qualifier to another transformation or to the target, the field will not be included in the query.

2) The Source Qualifier doesn't have any variable fields to use in an expression.

What is a junk dimension?

 

A "junk" dimension is a collection of random transactional codes, flags and/or text attributes that are unrelated to any particular dimension. The junk dimension is simply a structure that provides a convenient place to store the junk attributes. A good example would be a trade fact in a company that brokers equity trades.

What is the difference between Normal load and Bul...

 

Normal load: Normal load writes information to the database log file, so it is helpful if any recovery is needed. When the source is a text file and you are loading data into a table, you should use normal load only; otherwise the session will fail.

Bulk mode: Bulk load does not write information to the database log file, so if any recovery is needed, we can't do anything. Comparatively, bulk load is much faster than normal load.

At most, how many transformations can be used in a mapping?

There is no such limit on the number of transformations. But from a performance point of view, using too many transformations will reduce session performance.


My idea is: if you need many transformations in a mapping, it's better to go for a stored procedure.

What are the main advantages and purpose of using the Normalizer transformation in Informatica?

The Normalizer transformation is used mainly with COBOL sources, where most of the time data is stored in denormalized format. The Normalizer transformation can also be used to create multiple rows from a single row of data.

How do you convert rows to columns with the Normalizer? Could you explain?

Normally it's used to convert columns to rows, but to convert rows to columns we need an Aggregator and an Expression, and a little coding effort. Denormalization is not possible with a Normalizer transformation.
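Both directions are easy to see in a Python sketch: `normalize` turns one row with repeating columns into one row per occurrence (numbered, similar in spirit to the occurrence number a Normalizer generates), and `denormalize` goes back by grouping on the key and placing each value in its column slot, which is the job the Aggregator plus Expression approach does. The store/quarter layout and both function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def normalize(row: dict, key: str, repeated: List[str]) -> List[Tuple[str, int, float]]:
    """Columns to rows: one output row per repeated column, numbered 1..n."""
    return [(row[key], i, row[col]) for i, col in enumerate(repeated, start=1)]


def denormalize(rows: List[Tuple[str, int, float]], n: int) -> Dict[str, List[Optional[float]]]:
    """Rows to columns, Aggregator-style: group by key and put each value in
    the column slot given by its occurrence number."""
    out: Dict[str, List[Optional[float]]] = {}
    for key, occurrence, value in rows:
        out.setdefault(key, [None] * n)[occurrence - 1] = value
    return out


store = {"store": "S1", "q1": 10.0, "q2": 20.0, "q3": 15.0, "q4": 30.0}
long_rows = normalize(store, "store", ["q1", "q2", "q3", "q4"])
print(long_rows)                  # [('S1', 1, 10.0), ('S1', 2, 20.0), ('S1', 3, 15.0), ('S1', 4, 30.0)]
print(denormalize(long_rows, 4))  # {'S1': [10.0, 20.0, 15.0, 30.0]}
```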

Discuss the advantages & Disadvantages of star & snowflake schema?

In a star schema every dimension has a primary key. In a star schema, a dimension table does not have any parent table, whereas in a snowflake schema a dimension table has one or more parent tables. In a star schema, the hierarchies for the dimensions are stored in the dimension table itself, whereas in a snowflake schema the hierarchies are broken into separate tables. These hierarchies help to drill down the data from the topmost level to the lowermost level.

A star schema consists of a single fact table surrounded by dimension tables. In a snowflake schema, the dimension tables are connected to sub-dimension tables.

In a star schema the dimension tables are denormalized; in a snowflake schema the dimension tables are normalized.

A star schema is used for report generation; a snowflake schema is used for cubes.

The advantage of a snowflake schema is that the normalized tables are easier to maintain. It also saves storage space.

The disadvantage of a snowflake schema is that it reduces the effectiveness of navigation across the tables because of the large number of joins between them.

What is a time dimension? Give an example.

The time dimension is one of the most important dimensions in a data warehouse. Whenever you generate a report, you access all the data through the time dimension. E.g. an employee time dimension with the fields: date key, full date, day of week, day, month, quarter, fiscal year.
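A time dimension is usually generated rather than extracted. This Python sketch builds one row per day with the fields listed above. The YYYYMMDD integer date key, the April fiscal-year start, and naming a fiscal year after the calendar year it ends in are all assumptions for illustration; day names come from a fixed tuple so the output does not depend on locale.

```python
from datetime import date, timedelta
from typing import Dict, List

DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")


def build_time_dimension(start: date, days: int,
                         fiscal_start_month: int = 4) -> List[Dict]:
    """One row per calendar day with date key, full date, day of week, etc."""
    rows = []
    for offset in range(days):
        d = start + timedelta(days=offset)
        # Assumption: a fiscal year is named after the calendar year it ends in.
        if fiscal_start_month > 1 and d.month >= fiscal_start_month:
            fiscal_year = d.year + 1
        else:
            fiscal_year = d.year
        rows.append({
            "date_key": d.year * 10000 + d.month * 100 + d.day,  # e.g. 20040331
            "full_date": d.isoformat(),
            "day_of_week": DAY_NAMES[d.weekday()],
            "day": d.day,
            "month": d.month,
            "quarter": (d.month - 1) // 3 + 1,
            "fiscal_year": fiscal_year,
        })
    return rows


for row in build_time_dimension(date(2004, 3, 31), 2):
    print(row["date_key"], row["day_of_week"], row["quarter"], row["fiscal_year"])
# 20040331 Wednesday 1 2004
# 20040401 Thursday 2 2005
```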

What are connected and unconnected transformations?

Connected transformation is a part of your data flow in the pipeline while unconnected Transformation is not.


Much like calling a program by name versus by reference.

Use unconnected transformations when you want to call the same transformation many times in a single mapping.

An unconnected transformation can't be connected to another transformation, but it can be called inside another transformation.

Unconnected transformations are not directly connected and can be used in as many other transformations as needed. If you are using a transformation several times, use an unconnected one. You get better performance.

How can you create or import a flat file definition into the Warehouse Designer?

You can create a flat file definition in the Warehouse Designer. In the Warehouse Designer, create a new target and select the type as flat file. Save it, and you can enter the various columns for that target by editing its properties. Once the target is created, save it; you can then import it from the Mapping Designer.

You cannot create or import a flat file definition into the Warehouse Designer directly. Instead, you must analyze the file in the Source Analyzer and then drag it into the Warehouse Designer. When you drag the flat file source definition into the Warehouse Designer workspace, the Warehouse Designer creates a relational target definition, not a file definition. If you want to load to a file, configure the session to write to a flat file. When the Informatica server runs the session, it creates and loads the flat file.

What are the tasks that the Load Manager process performs?

Manages session and batch scheduling: When you start the Informatica server, the Load Manager launches and queries the repository for a list of sessions configured to run on the Informatica server. When you configure a session, the Load Manager maintains a list of sessions and session start times. When you start a session, the Load Manager fetches the session information from the repository to perform validations and verifications before starting the DTM process.

Locking and reading the session: When the Informatica server starts a session, the Load Manager locks the session in the repository. Locking prevents you from starting the same session again while it is running.

Reading the parameter file: If the session uses a parameter file, the Load Manager reads the parameter file and verifies that the session-level parameters are declared in the file.

Verifying permissions and privileges: When the session starts, the Load Manager checks whether the user has privileges to run the session.

Creating log files: The Load Manager creates a log file that contains the status of the session.

How do you transfer data from a data warehouse to a flat file?

You can write a mapping with a flat file as the target, using a DUMMY_CONNECTION. A flat file target is built by pulling a source into the target workspace using the Warehouse Designer tool.

Difference between the Informatica Repository Server and the Informatica Server

Informatica Repository Server: It manages connections to the repository from client applications.

Informatica Server: It extracts the source data, performs the data transformation, and loads the transformed data into the target.


Router transformation

 

A Router transformation is similar to a Filter transformation because both transformations allow you to use a condition to test data. A Filter transformation tests data for one condition and drops the rows of data that do not meet the condition. However, a Router transformation tests data for one or more conditions and gives you the option to route rows of data that do not meet any of the conditions to a default output group.

What are the 2 modes of data movement in the Informatica Server?

The data movement mode depends on whether the Informatica Server should process single-byte or multibyte character data. This mode selection can affect the enforcement of code page relationships and code page validation in the Informatica Client and Server.

a) Unicode: the IS allows 2 bytes for each character and uses an additional byte for each non-ASCII character (such as Japanese characters)

b) ASCII: the IS holds all data in a single byte

The IS data movement mode can be changed in the Informatica Server configuration parameters. The change takes effect once you restart the Informatica Server.

How do you read rejected or bad data from the bad file and reload it into the target?

Correct the rejected data and send it to the target relational tables using the reject loader utility. Identify the rejected data by using the column indicators and the row indicator.

Explain the Informatica architecture in detail.

The Informatica server connects to source data and target data using native or ODBC drivers.

It also connects to the repository to run sessions and retrieve metadata information.

source ------> Informatica server ---------> target
                        |
                        |
                   REPOSITORY

[Diagram: repository <- Repository Server -> Repository Server Administration Console; source <- Informatica Server -> target; client tools: Designer, Workflow Manager, Workflow Monitor.]

How can we partition a session in Informatica?

 

The Informatica® PowerCenter® Partitioning option optimizes parallel processing on multiprocessor hardware by providing a thread-based architecture and built-in data partitioning. 


GUI-based tools reduce the development effort necessary to create data partitions and streamline ongoing troubleshooting and performance tuning tasks, while ensuring data integrity throughout the execution process. As the amount of data within an organization expands and real-time demand for information grows, the PowerCenter Partitioning option enables hardware and applications to provide outstanding performance and jointly scale to handle large volumes of data and users.

What is the Load Manager?

While running a workflow, the PowerCenter Server uses the Load Manager process and the Data Transformation Manager (DTM) process to run the workflow and carry out workflow tasks. When the PowerCenter Server runs a workflow, the Load Manager performs the following tasks:

1. Locks the workflow and reads workflow properties.

2. Reads the parameter file and expands workflow variables.

3. Creates the workflow log file.

4. Runs workflow tasks.

5. Distributes sessions to worker servers.

6. Starts the DTM to run sessions.

7. Runs sessions from master servers.

8. Sends post-session email if the DTM terminates abnormally.

When the PowerCenter Server runs a session, the DTM performs the following tasks:

1. Fetches session and mapping metadata from the repository.

2. Creates and expands session variables.

3. Creates the session log file.

4. Validates session code pages if data code page validation is enabled. Checks query conversions if data code page validation is disabled.

5. Verifies connection object permissions.

6. Runs pre-session shell commands.

7. Runs pre-session stored procedures and SQL.

8. Creates and runs mapping, reader, writer, and transformation threads to extract, transform, and load data.

9. Runs post-session stored procedures and SQL.

10. Runs post-session shell commands.

11. Sends post-session email.

What is data cleansing?

    The process of finding and removing or correcting data that is incorrect, out-of-date, redundant, incomplete, or formatted incorrectly.

This is nothing but polishing the data. For example, one subsystem may store gender as M and F, while another stores it as MALE and FEMALE. We need to polish and clean this data before it is added to the data warehouse. Another typical example is addresses: the subsystems that maintain customer addresses may store them differently, so we might need an address cleansing tool to get the customer addresses into a clean and neat form.
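The gender example can be sketched in Python as a lookup from every source spelling to one warehouse code. The code table, the 'U' code for unknown values, and the function name are hypothetical; the DECODE in the docstring is a rough Expression-transformation equivalent, not taken from the original answer.

```python
from typing import Optional

# Hypothetical code table; the real rules come from the business.
GENDER_CODES = {"M": "M", "MALE": "M", "F": "F", "FEMALE": "F"}


def standardize_gender(raw: Optional[str]) -> str:
    """Map each subsystem's spelling to one warehouse code; anything else is 'U'.

    Roughly DECODE(UPPER(LTRIM(RTRIM(GENDER))), 'M','M', 'MALE','M',
    'F','F', 'FEMALE','F', 'U') in an Expression transformation.
    """
    if raw is None:
        return "U"
    return GENDER_CODES.get(raw.strip().upper(), "U")


print([standardize_gender(v) for v in ["M", "female", " Male ", None, "X"]])
# ['M', 'F', 'M', 'U', 'U']
```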

To provide support for mainframe source data, which files are used as source definitions?

COBOL copybook files.

Where should you place the flat file to import the flat file definition into the Designer?


There is no such restriction on where to place the source file. From a performance point of view, it's better to place the file in the server's local source folder. If you need the path, check the server properties available in the Workflow Manager.

That doesn't mean we cannot place it in any other folder, but if we place it in the server source folder, that folder will be selected by default at the time of session creation.

How many ways can you update a relational source definition, and what are they?

Two ways: 1. Edit the definition. 2. Reimport the definition.

Which transformation do you need when using COBOL sources as source definitions?

The Normalizer transformation, which is used to normalize the data, since COBOL sources often consist of denormalized data.

What is a mapplet?

 

For example, suppose we have several fact tables that require a series of dimension keys. Then we can create a mapplet containing a series of Lookup transformations to find each dimension key, and use it in each fact table mapping instead of creating the same lookup logic in each mapping.

What is a transformation?

It is a repository object that generates, modifies or passes data. A transformation is a repository object that passes data to the next stage (i.e. to the next transformation or target) with or without modifying the data.

What are active and passive transformations?

An active transformation can change the number of rows that pass through it. A passive transformation does not change the number of rows that pass through it.

Transformations can be active or passive. An active transformation can change the number of rows that pass through it, such as a Filter transformation that removes rows that do not meet the filter condition.

A passive transformation does not change the number of rows that pass through it, such as an Expression transformation that performs a calculation on data and passes all rows through the transformation.
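The row-count rule is easy to demonstrate with a Python sketch: the Expression-like function adds a port to every row, while the Filter-like function may drop rows. Both functions and the price/qty ports are hypothetical stand-ins for the two transformation types.

```python
from typing import Dict, List

Row = Dict[str, float]


def expression_tx(rows: List[Row]) -> List[Row]:
    """Passive: adds a computed port to every row; the row count never changes."""
    return [dict(r, total=r["price"] * r["qty"]) for r in rows]


def filter_tx(rows: List[Row]) -> List[Row]:
    """Active: drops rows that fail the condition, so the row count can change."""
    return [r for r in rows if r["qty"] > 0]


rows = [{"price": 2.0, "qty": 3}, {"price": 5.0, "qty": 0}]
print(len(expression_tx(rows)), len(filter_tx(rows)))   # 2 1
```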

What r the reusable transforamtions?Reusable transformations can be used in multiple mappings.When u need to incorporate this transformation into maping,U add an instance of it to maping.Later if U change the definition of the transformation ,all instances of it inherit the changes.Since the instance of reusable transforamation is a pointer to that transforamtion,U can change the transforamation in the transformation developer,its instances automatically reflect these changes.This feature can save U great deal of work.What r the methods for creating reusable transforamtions?Two methods 1.Design it in the transformation developer. 2.Promote a standard transformation from the mapping designer.After U add a transformation to the mapping , U can promote it to the status of reusable transformation. Once U promote a standard transformation to reusable status,U can demote it to a standard transformation at any time. If u change the properties of a reusable transformation in mapping,U can revert it to the original reusable transformation properties by clicking the revert button.What r the unsupported repository objects for a mapplet?COBOL source definition Joiner transformations Normalizer transformations Non reusable sequence generator transformations. Pre or post session stored procedures Target defintions

Page 29: informatica

Power mart 3.5 style Look Up functions XML source definitions IBM MQ source definitions  Source definitions. Definitions of database objects (tables, views, synonyms) or files that provide source data.   Target definitions. Definitions of database objects or files that contain the target data.   Multi-dimensional metadata. Target definitions that are configured as cubes and dimensions.   Mappings. A set of source and target definitions along with transformations containing business logic that you build into the transformation. These are the instructions that the Informatica Server uses to transform and move data.   Reusable transformations. Transformations that you can use in multiple mappings.   Mapplets. A set of transformations that you can use in multiple mappings.   Sessions and workflows. Sessions and workflows store information about how and when the Informatica Server moves data. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data. A session is a type of task that you can put in a workflow. Each session corresponds to a single mapping.What r the mapping paramaters and maping variables?Maping parameter represents a constant value that U can define before running a session.A mapping parameter retains the same value throughout the entire session. When u use the maping parameter ,U declare and use the parameter in a maping or maplet.Then define the value of parameter in a parameter file for the session. Unlike a mapping parameter,a maping variable represents a value that can change throughout the session.The informatica server saves the value of maping variable to the repository at the end of session run and uses that value next time U run the session.Can U use the maping parameters or variables created in one maping into another maping?NO. 
You can use mapping parameters or variables in any transformation of the same mapping or mapplet in which you created them.

Can you use the mapping parameters or variables created in one mapping in a reusable transformation? Yes, because a reusable transformation is not contained within any mapplet or mapping.

How can you improve session performance in the Aggregator transformation?

 

Use sorted input:

1. Use a Sorter transformation before the Aggregator.

2. Do not forget to check the Sorted Input option on the Aggregator, which tells it that the input is sorted on the same keys as the group by ports.

The key order is also very important: the data must be sorted in the same order as the group by ports.
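The reason sorted input helps can be shown with a small Python model. This is a conceptual sketch, not Informatica's engine. Rows are plain dicts, and the function and column names are hypothetical. Without sorted input, every group has to stay cached until the last row arrives. With sorted input, a group is finished as soon as the key changes. The last example shows why the sort order must match the group by keys.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_unsorted(rows, key, value):
    # Unsorted input: every group stays in the cache until the last row is read,
    # because a later row may still belong to any group.
    cache = {}
    for row in rows:
        cache[row[key]] = cache.get(row[key], 0) + row[value]
    return list(cache.items())


def aggregate_sorted(rows, key, value):
    # Sorted input: a group is complete as soon as the key changes, so only the
    # current group is held in memory and each total is emitted immediately.
    for group_key, group in groupby(rows, key=itemgetter(key)):
        yield group_key, sum(r[value] for r in group)


rows = [{"dept": "A", "amt": 10}, {"dept": "A", "amt": 5}, {"dept": "B", "amt": 7}]
print(aggregate_unsorted(rows, "dept", "amt"))      # [('A', 15), ('B', 7)]
print(list(aggregate_sorted(rows, "dept", "amt")))  # [('A', 15), ('B', 7)]

# Input that is NOT sorted on the group-by key breaks the sorted-input logic:
unsorted = [{"dept": "A", "amt": 10}, {"dept": "B", "amt": 7}, {"dept": "A", "amt": 5}]
print(list(aggregate_sorted(unsorted, "dept", "amt")))  # [('A', 10), ('B', 7), ('A', 5)]
```

In the sketch, wrongly sorted data splits group A into two partial totals. This is why the Sorter's key order has to match the Aggregator's group by ports.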

What is the aggregate cache in the Aggregator transformation? The Aggregator stores data in the aggregate cache until it completes the aggregate calculations. When you run a session that uses an Aggregator transformation, the Informatica Server creates index and data caches in memory to process the transformation. If the Informatica Server requires more space, it stores overflow values in cache files.

What are the differences between the Joiner transformation and the Source Qualifier transformation? You can join heterogeneous data sources in a Joiner transformation, which you cannot do in a Source Qualifier transformation. You need matching keys (a primary-foreign key relationship) to join two relational sources in a Source Qualifier, whereas a Joiner does not need them. Two relational sources joined in a Source Qualifier must come from the same data source; a Joiner can join relational sources that come from different data sources.

In which conditions can we not use a Joiner transformation (limitations of the Joiner transformation)? You cannot use a Joiner transformation when:
1. Both input pipelines begin with the same original data source.
2. Both input pipelines originate from the same Source Qualifier transformation.
3. Both input pipelines originate from the same Normalizer transformation.
4. Both input pipelines originate from the same Joiner transformation.
5. Either input pipeline contains an Update Strategy transformation.
6. Either input pipeline contains a connected or unconnected Sequence Generator transformation.

What are the settings that you use to configure the Joiner transformation?
1. Master and detail source
2. Type of join
3. Condition of the join

The Joiner transformation supports the following join types, which you set in the Properties tab: Normal (default), Master Outer, Detail Outer, and Full Outer.

What are the join types in the Joiner transformation?

Normal (default): only matching rows from both master and detail.
Master outer: all detail rows and only matching rows from master.
Detail outer: all master rows and only matching rows from detail.
Full outer: all rows from both master and detail (matching or non-matching).
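Here is a minimal Python sketch of the four join types. It models the Joiner the way this document describes it: all master rows are read into a cache keyed on the join column, then detail rows are streamed and probed against that cache. Rows are plain dicts. The function name `joiner` and the join-type strings are hypothetical. The sketch illustrates the semantics only and is not Informatica's implementation.

```python
def joiner(master_rows, detail_rows, key, join_type="normal"):
    # 1. Read every master row into a cache indexed by the join key.
    cache = {}
    for m in master_rows:
        cache.setdefault(m[key], []).append(m)

    out, matched = [], set()
    # 2. Stream the detail rows and probe the cache for each one.
    for d in detail_rows:
        hits = cache.get(d[key], [])
        for m in hits:
            out.append({**m, **d})  # detail values win on duplicate column names
        if hits:
            matched.add(d[key])
        elif join_type in ("master_outer", "full_outer"):
            out.append(dict(d))  # keep unmatched detail row; master ports would be NULL

    # 3. Detail outer / full outer also keep master rows that never matched.
    if join_type in ("detail_outer", "full_outer"):
        for k, ms in cache.items():
            if k not in matched:
                out.extend(dict(m) for m in ms)
    return out


master = [{"id": 1, "name": "Pen"}, {"id": 2, "name": "Ink"}]
detail = [{"id": 1, "qty": 3}, {"id": 3, "qty": 9}]
for jt in ("normal", "master_outer", "detail_outer", "full_outer"):
    print(jt, joiner(master, detail, "id", jt))
```

The cache holds every master row, so its size grows with the master source. In this model, choosing the smaller source as master keeps memory use down.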

To create a Joiner transformation, follow these steps:

1. In the Mapping Designer, choose Transformation-Create. Select the Joiner transformation. Enter a name, click OK.

The naming convention for Joiner transformations is JNR_TransformationName. Enter a description for the transformation. This description appears in the Repository Manager, making it easier for you or others to understand or remember what the transformation does. The Designer creates the Joiner transformation. Keep in mind that you cannot use a Sequence Generator or Update Strategy transformation as a source to a Joiner transformation.

2. Drag all the desired input/output ports from the first source into the Joiner transformation.

The Designer creates input/output ports for the source fields in the Joiner as detail fields by default. You can edit this property later.

3. Select and drag all the desired input/output ports from the second source into the Joiner transformation.

The Designer configures the second set of source fields as master fields by default.

4. Double-click the title bar of the Joiner transformation to open the Edit Transformations dialog box.

5. Select the Ports tab.

6. Click any box in the M column to switch the master/detail relationship for the sources.

Change the master/detail relationship if necessary by selecting the master source in the M column.

Tip: Designating the source with fewer unique records as master increases performance during a join.

7. Add default values for specific ports as necessary.


Certain ports are likely to contain NULL values, since the fields in one of the sources may be empty. You can specify a default value if the target database does not handle NULLs.

8. Select the Condition tab and set the condition.

9. Click the Add button to add a condition. You can add multiple conditions. The master and detail ports must have matching datatypes. The Joiner transformation only supports equivalent (=) joins.

10. Select the Properties tab and enter any additional settings for the transformation.

11. Click OK.

12. Choose Repository-Save to save changes to the mapping.

 

What are the Joiner caches? When a Joiner transformation occurs in a session, the Informatica Server reads all the records from the master source and builds index and data caches based on the master rows.

After building the caches, the Joiner transformation reads records from the detail source and performs the joins.

What is the Lookup transformation? Use a Lookup transformation in your mapping to look up data in a relational table, view, or synonym. The Informatica Server queries the lookup table based on the lookup ports in the transformation. It compares the Lookup transformation port values to lookup table column values based on the lookup condition.

Why use the Lookup transformation? To perform the following tasks:
Get a related value. For example, your source table includes employee ID, but you want to include the employee name in your target table to make your summary data easier to read.
Perform a calculation. Many normalized tables include values used in a calculation, such as gross sales per invoice or sales tax, but not the calculated value (such as net sales).
Update slowly changing dimension tables. You can use a Lookup transformation to determine whether records already exist in the target.

What are the types of lookup?

 

1. Connected lookup
2. Unconnected lookup

Lookup caches can be:
1. Persistent cache
2. Re-cache from database
3. Static cache
4. Dynamic cache
5. Shared cache

Differences between connected and unconnected lookup:

Input: A connected lookup receives input values directly from the pipeline. An unconnected lookup receives input values from the result of a :LKP expression in another transformation.

Cache type: A connected lookup can use a dynamic or static cache. An unconnected lookup can use only a static cache.

Cache contents: For a connected lookup, the cache includes all lookup columns used in the mapping. For an unconnected lookup, the cache includes all lookup/output ports in the lookup condition and the lookup/return port.

Default values: A connected lookup supports user-defined default values. An unconnected lookup does not.

What is meant by lookup caches? The Informatica Server builds a cache in memory when it processes the first row of data in a cached Lookup transformation. It allocates memory for the cache based on the amount you configure in the transformation or session properties. The Informatica Server stores condition values in the index cache and output values in the data cache.

What are the types of lookup caches?

Persistent cache: You can save the lookup cache files and reuse them the next time the Informatica Server processes a Lookup transformation configured to use the cache.

Recache from database: If the persistent cache is not synchronized with the lookup table, you can configure the Lookup transformation to rebuild the lookup cache.

Static cache: You can configure a static, or read-only, cache for any lookup table. By default, the Informatica Server creates a static cache. It caches the lookup table and looks up values in the cache for each row that comes into the transformation. The Informatica Server does not update the cache while it processes the Lookup transformation.

Dynamic cache: If you want to cache the target table and insert new rows into the cache and the target, you can create a Lookup transformation that uses a dynamic cache. The Informatica Server dynamically inserts data into the target table.

Shared cache: You can share the lookup cache between multiple transformations. You can share an unnamed cache between transformations in the same mapping.

Difference between static cache and dynamic cache:

Static cache: You cannot insert or update the cache. The Informatica Server returns a value from the lookup table or cache when the condition is true. When the condition is not true, it returns the default value for connected transformations and NULL for unconnected transformations.

Dynamic cache: You can insert rows into the cache as you pass them to the target. The Informatica Server inserts rows into the cache when the condition is false. This indicates that the row is not in the cache or target table, and you can pass these rows to the target table.
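The static/dynamic difference above can be modelled with two small Python classes. This is a conceptual sketch only: the class names, the "exists"/"insert" return values, and the dict-based rows are all hypothetical. The static cache never changes after it is built. The dynamic cache adds each new key, so a second row with the same key is recognised as already present.

```python
class StaticLookup:
    # Read-only cache: built once from the lookup table and never updated.
    def __init__(self, table, key, default=None):
        self.cache = {row[key]: row for row in table}
        self.default = default

    def lookup(self, k):
        # Condition true: return the cached row. Condition false: return the default.
        return self.cache.get(k, self.default)


class DynamicLookup:
    # Cache of the target table that grows as new rows are passed to the target.
    def __init__(self, target_rows, key):
        self.key = key
        self.cache = {row[key]: row for row in target_rows}

    def lookup(self, row):
        k = row[self.key]
        if k in self.cache:
            return "exists"
        # Condition false: the row is in neither the cache nor the target,
        # so insert it into the cache and let it flow on to the target.
        self.cache[k] = row
        return "insert"


target = [{"cust_id": 1, "name": "Asha"}]
static = StaticLookup(target, "cust_id", default="UNKNOWN")
dynamic = DynamicLookup(target, "cust_id")

incoming = [{"cust_id": 1}, {"cust_id": 2}, {"cust_id": 2}]
print([dynamic.lookup(r) for r in incoming])  # ['exists', 'insert', 'exists']
print(static.lookup(2), static.lookup(2))     # UNKNOWN UNKNOWN
```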

Which transformation should we use to normalize COBOL and relational sources? The Normalizer transformation. When you drag a COBOL source into the Mapping Designer workspace, the Normalizer transformation automatically appears, creating input and output ports for every column in the source.

How does the Informatica Server sort string values in the Rank transformation? When the Informatica Server runs in ASCII data movement mode, it sorts session data using a binary sort order. If you configure the session to use a binary sort order, the Informatica Server calculates the binary value of each string and returns the specified number of rows with the highest binary values for the string.

What are the rank caches? During the session, the Informatica Server compares an input row with rows in the data cache. If the input row out-ranks a stored row, the Informatica Server replaces the stored row with the input row. The Informatica Server stores group information in an index cache and row data in a data cache.

What is the RANKINDEX in the Rank transformation? The Designer automatically creates a RANKINDEX port for each Rank transformation. The Informatica Server uses the RANKINDEX port to store the ranking position for each record in a group. For example, if you create a Rank transformation that ranks the top 5 salespersons for each quarter, the rank index numbers the salespeople from 1 to 5.

What is the Router transformation? A Router transformation is similar to a Filter transformation because both transformations allow you to use a condition to test data. However, a Filter transformation tests data for one condition and drops the rows of data that do not meet the condition. A Router transformation tests data for one or more conditions and gives you the option to route rows of data that do not meet any of the conditions to a default output group.
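The Router/Filter contrast described above can be sketched in Python as follows. The function names, group names, and dict rows are hypothetical, and this is a model of the behaviour, not Informatica code. The router tests every group condition for each row and can send a row to several groups. Rows that meet no condition go to the default group, while the filter simply drops them.

```python
def route(rows, groups):
    # groups: group name -> condition. A row goes to EVERY group whose
    # condition it meets; rows that meet none go to the default group.
    out = {name: [] for name in groups}
    out["DEFAULT"] = []
    for row in rows:
        hit = False
        for name, condition in groups.items():
            if condition(row):
                out[name].append(row)
                hit = True
        if not hit:
            out["DEFAULT"].append(row)
    return out


def filter_rows(rows, condition):
    # A Filter tests one condition and drops everything else.
    return [row for row in rows if condition(row)]


orders = [{"amt": 50}, {"amt": 500}, {"amt": 5000}]
groups = {
    "HIGH": lambda r: r["amt"] >= 1000,
    "MEDIUM_OR_HIGH": lambda r: r["amt"] >= 100,
}
print(route(orders, groups))
print(filter_rows(orders, groups["HIGH"]))  # [{'amt': 5000}]
```

One router pass replaces several filters that would each re-read the same input, which is the reason given for preferring a Router.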
If you need to test the same input data based on multiple conditions, use a Router transformation in a mapping instead of creating multiple Filter transformations to perform the same task.

What are the types of groups in the Router transformation? Input group and output groups.

The Designer copies property information from the input ports of the input group to create a set of output ports for each output group. There are two types of output groups: user-defined groups and the default group. You cannot modify or delete the default group.

Why do we use the Stored Procedure transformation? A Stored Procedure transformation is an important tool for populating and maintaining databases. Database administrators create stored procedures to automate time-consuming tasks that are too complicated for standard SQL statements.

What are the types of data that pass between the Informatica Server and a stored procedure? Three types of data: input/output parameters, return values, and status codes.

What is the status code? The status code provides error handling for the Informatica Server during the session. The stored procedure issues a status code that notifies whether or not the stored procedure completed successfully. The user cannot see this value; only the Informatica Server uses it, to determine whether to continue running the session or stop.

What is the Source Qualifier transformation? What are the tasks that the Source Qualifier performs?

 

When you add a relational or a flat file source definition to a mapping, you need to connect it to a Source Qualifier transformation. The Source Qualifier represents the rows that the Informatica Server reads when it executes a session. It can perform the following tasks:

Join data originating from the same source database. You can join two or more tables with primary-foreign key relationships by linking the sources to one Source Qualifier.
Filter records when the Informatica Server reads source data. If you include a filter condition, the Informatica Server adds a WHERE clause to the default query.
Specify an outer join rather than the default inner join. If you include a user-defined join, the Informatica Server replaces the join information specified by the metadata in the SQL query.
Specify sorted ports. If you specify a number for sorted ports, the Informatica Server adds an ORDER BY clause to the default SQL query.
Select only distinct values from the source. If you choose Select Distinct, the Informatica Server adds a SELECT DISTINCT statement to the default SQL query.
Create a custom query to issue a special SELECT statement for the Informatica Server to read source data. For example, you might use a custom query to perform aggregate calculations or execute a stored procedure.
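The following Python sketch shows how three of the Source Qualifier options listed above change the default query: source filter, number of sorted ports, and Select Distinct. `build_default_query` is a hypothetical helper, not an Informatica API. It leaves column names unqualified for brevity, whereas the real generated SQL may qualify them with table names.

```python
def build_default_query(table, ports, source_filter=None, sorted_ports=0, distinct=False):
    # Select Distinct turns SELECT into SELECT DISTINCT.
    select = "SELECT DISTINCT" if distinct else "SELECT"
    sql = f"{select} {', '.join(ports)} FROM {table}"
    # A source filter becomes a WHERE clause.
    if source_filter:
        sql += f" WHERE {source_filter}"
    # "Number of sorted ports" = how many leading ports go into ORDER BY.
    if sorted_ports:
        sql += " ORDER BY " + ", ".join(ports[:sorted_ports])
    return sql


print(build_default_query("ORDERS", ["ORDER_ID", "CUST_ID", "AMOUNT"],
                          source_filter="AMOUNT > 100", sorted_ports=1, distinct=True))
# SELECT DISTINCT ORDER_ID, CUST_ID, AMOUNT FROM ORDERS WHERE AMOUNT > 100 ORDER BY ORDER_ID
```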

What is the target load order? You specify the target load order based on the Source Qualifiers in a mapping. If you have multiple Source Qualifiers connected to multiple targets, you can designate the order in which the Informatica Server loads data into the targets.

A target load order group is the collection of source qualifiers, transformations, and targets linked together in a mapping.

What is the default join that the Source Qualifier provides? An inner equi-join.


What are the basic needs to join two sources in a Source Qualifier? The two sources should have a primary-foreign key relationship, and they should have matching datatypes.


What is the Update Strategy transformation?

 

The model you choose for handling changes to existing rows constitutes your update strategy. In PowerCenter and PowerMart, you set your update strategy at two different levels:

Within a session. When you configure a session, you can instruct the Informatica Server to either treat all rows in the same way (for example, treat all rows as inserts), or use instructions coded into the session mapping to flag rows for different database operations.

Within a mapping. Within a mapping, you use the Update Strategy transformation to flag rows for insert, delete, update, or reject.

Describe the two levels at which you set the update strategy. Within a session: when you configure a session, you can instruct the Informatica Server either to treat all records in the same way (for example, treat all records as inserts) or to use instructions coded into the session mapping to flag records for different database operations.

Within a mapping: you use the Update Strategy transformation to flag records for insert, delete, update, or reject.

What is the default source option for the Update Strategy transformation? Data driven.

What is data driven? The Informatica Server follows the instructions coded into Update Strategy transformations within the session mapping to determine how to flag records for insert, update, delete, or reject. If you do not choose the data driven option setting, the Informatica Server ignores all Update Strategy transformations in the mapping.

What are the options in the target session of the Update Strategy transformation?
Insert
Delete
Update (Update as update, Update as insert, Update else insert)
Truncate table

Update as Insert:

This option specifies that all update records from the source be flagged as inserts in the target. In other words, instead of updating the records in the target, they are inserted as new records.

Update else Insert:

This option enables Informatica to flag records for update if they already exist in the target, or for insert if they are new records from the source.
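Data-driven flagging with "update else insert" behaviour can be modelled in Python as shown below. The constant names mirror Informatica's update strategy flags DD_INSERT, DD_UPDATE, DD_DELETE, and DD_REJECT. The flagging rules are hypothetical: reject rows without a key, delete rows marked `deleted`, otherwise update or insert. For brevity the sketch also applies each flag to an in-memory target. In Informatica, flagging and writing are separate steps.

```python
# Flag values named after Informatica's update strategy constants.
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3


def flag_row(row, target):
    # Hypothetical data-driven rules for this sketch.
    if row.get("id") is None:
        return DD_REJECT              # no key: cannot be loaded
    if row.get("deleted"):
        return DD_DELETE
    # "Update else insert": update existing keys, insert new ones.
    return DD_UPDATE if row["id"] in target else DD_INSERT


def apply_rows(rows, target, rejects):
    for row in rows:
        flag = flag_row(row, target)
        if flag in (DD_INSERT, DD_UPDATE):
            target[row["id"]] = row
        elif flag == DD_DELETE:
            target.pop(row["id"], None)
        else:
            rejects.append(row)       # would end up in the reject file


target = {1: {"id": 1, "name": "old"}, 2: {"id": 2, "name": "gone"}}
rejects = []
apply_rows(
    [{"id": 1, "name": "new"}, {"id": 3, "name": "added"}, {"id": 2, "deleted": True}, {"id": None}],
    target,
    rejects,
)
print(target)   # {1: {'id': 1, 'name': 'new'}, 3: {'id': 3, 'name': 'added'}}
print(rejects)  # [{'id': None}]
```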

What are the types of mapping wizards provided in Informatica?

Getting Started Wizard: Simple Pass Through and Slowly Growing Target.
Slowly Changing Dimensions Wizard: Type 1 (most recent values), Type 2 (full history, tracked by version, flag, or date), and Type 3 (current and one previous value).

What are the types of mapping in the Getting Started Wizard?

Simple Pass Through mapping: Loads a static fact or dimension table by inserting all rows. Use this mapping when you want to drop all existing data from your table before loading new data.

Slowly Growing Target mapping: Loads a slowly growing fact or dimension table by inserting new rows. Use this mapping to load new data when existing data does not require updates.

What are the mappings that we use for slowly changing dimension tables?

Type 1: Rows containing changes to existing dimensions are updated in the target by overwriting the existing dimension. In the Type 1 Dimension mapping, all rows contain current dimension data. Use the Type 1 Dimension mapping to update a slowly changing dimension table when you do not need to keep any previous versions of dimensions in the table.

Type 2: The Type 2 Dimension/Version Data mapping inserts both new and changed dimensions into the target. Changes are tracked in the target table by versioning the primary key and creating a version number for each dimension in the table. Use the Type 2 Dimension/Version Data mapping to update a slowly changing dimension table when you want to keep a full history of dimension data in the table. Version numbers and versioned primary keys track the order of changes to each dimension.
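The Type 2 Dimension/Version Data behaviour described above can be sketched in Python as follows. The column names `sk` (surrogate key), `nk` (natural key), `attr`, and `version` are hypothetical, as is the function name. The wizard-generated mapping uses its own columns and key-versioning scheme. The sketch follows the same rule: unchanged rows are skipped, while new and changed dimensions are inserted as new rows with a new key and version.

```python
def scd_type2_version(dim, source_rows):
    # dim: list of dicts with keys sk (surrogate key), nk (natural key), attr, version
    next_sk = max((r["sk"] for r in dim), default=0) + 1
    for src in source_rows:
        versions = [r for r in dim if r["nk"] == src["nk"]]
        latest = max(versions, key=lambda r: r["version"], default=None)
        if latest is None:
            # New dimension: insert with the first version number.
            dim.append({"sk": next_sk, "nk": src["nk"], "attr": src["attr"], "version": 0})
            next_sk += 1
        elif latest["attr"] != src["attr"]:
            # Changed dimension: keep the old row, insert a new version.
            dim.append({"sk": next_sk, "nk": src["nk"], "attr": src["attr"],
                        "version": latest["version"] + 1})
            next_sk += 1
        # Unchanged rows are ignored.
    return dim


dim = [{"sk": 1, "nk": "C1", "attr": "Delhi", "version": 0}]
scd_type2_version(dim, [{"nk": "C1", "attr": "Pune"}, {"nk": "C2", "attr": "Goa"},
                        {"nk": "C1", "attr": "Pune"}])
for row in dim:
    print(row)
```

After the run, C1 has two rows (Delhi, version 0 and Pune, version 1), so the full history is preserved. Type 1 would overwrite Delhi, and Type 3 would keep only the current and one previous value in the same row.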

Type 3: The Type 3 Dimension mapping filters source rows based on user-defined comparisons and inserts only those found to be new dimensions into the target. Rows containing changes to existing dimensions are updated in the target. When updating an existing dimension, the Informatica Server saves existing data in different columns of the same row and replaces the existing data with the updates.

What are the different types of Type 2 dimension mapping?

Type 2 Dimension/Version Data mapping: In this mapping, an updated dimension in the source is inserted into the target along with a new version number, and a newly added dimension in the source is inserted into the target with a primary key.

Type 2 Dimension/Flag Current mapping: This mapping is also used for slowly changing dimensions. In addition, it creates a flag value for changed or new dimensions. The flag indicates whether the dimension is new or newly updated: current dimensions are saved with a current flag value of 1, and updated (superseded) dimensions are saved with the value 0.

Type 2 Dimension/Effective Date Range mapping: This is another flavor of Type 2 mapping used for slowly changing dimensions. This mapping also inserts both new and changed dimensions into the target, and changes are tracked by the effective date range for each version of each dimension.

How can you recognize whether newly added rows in the source are inserted into the target? In a Type 2 mapping, there are three options to recognize newly added rows: version number, flag value, and effective date range.

What are the two types of processes that the Informatica Server uses to run a session?
Load Manager process: Starts the session, creates the DTM process, and sends post-session email when the session completes.
DTM process: Creates threads to initialize the session, read, write, and transform data, and handle pre- and post-session operations.

What are the new features of the Server Manager in Informatica 5.0? You can use command line arguments for a session or batch. This allows you to change the values of session parameters, mapping parameters, and mapping variables.

Parallel data processing: This feature is available for PowerCenter only. If you use the Informatica Server on an SMP system, you can use multiple CPUs to process a session concurrently.

Process session data using threads: The Informatica Server runs the session in two processes, as explained in the previous question.

Can you generate reports in Informatica? Informatica is an ETL tool, so you cannot build business reports with it, but you can generate metadata reports, which are not meant for business analysis.

What is the Metadata Reporter? It is a web-based application that enables you to run reports against repository metadata. With the Metadata Reporter, you can access information about your repository without knowledge of SQL, the transformation language, or the underlying tables in the repository.

Define mapping and session.
Mapping: A set of source and target definitions linked by transformation objects that define the rules for transformation.
Session: A set of instructions that describe how and when to move data from sources to targets.

Which tool do you use to create and manage sessions and batches, and to monitor and stop the Informatica Server? The Informatica Server Manager.

What is polling? It displays updated information about the session in the monitor window. The monitor window displays the status of each session when you poll the Informatica Server.

While importing a relational source definition from a database, what source metadata do you import? Source name, database location, column names, datatypes, and key constraints.

What are the Designer tools for creating transformations? The Mapping Designer, the Transformation Developer, and the Mapplet Designer.

How many ways can you create ports? Two ways:
1. Drag the port from another transformation.
2. Click the Add button on the Ports tab.

Why do we partition a session in Informatica? Partitioning improves session performance by reducing the time taken to read the source and load the data into the target.

Performance can be improved by processing data in parallel in a single session by creating multiple partitions of the pipeline.

The Informatica Server can achieve high performance by partitioning the pipeline and performing the extract, transformation, and load for each partition in parallel.
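The partitioning idea above can be sketched in Python, assuming key-range partitions. The function names, the `id`/`amt` columns, and the ranges are hypothetical. Each partition reads only its own key range and transforms it, and all partitions run concurrently, producing one output per partition. Python threads show the structure only. CPU-bound work would need processes to run truly in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def read_partition(rows, lo, hi):
    # Each partition's reader extracts only its own key range [lo, hi).
    return [r for r in rows if lo <= r["id"] < hi]


def transform(rows):
    return [{**r, "amt_x2": r["amt"] * 2} for r in rows]


def process_partition(rows, lo, hi):
    # Extract and transform for one partition; the load step would follow here.
    return transform(read_partition(rows, lo, hi))


def run_partitioned(rows, ranges):
    # One worker per partition, all running concurrently.
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        futures = [pool.submit(process_partition, rows, lo, hi) for lo, hi in ranges]
        # One result list per partition, like one target file per partition.
        return [f.result() for f in futures]


source = [{"id": i, "amt": i} for i in range(10)]
parts = run_partitioned(source, [(0, 5), (5, 10)])
print([len(p) for p in parts])  # [5, 5]
```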

To achieve session partitioning, what are the necessary tasks? Configure the session to partition source data, and install the Informatica Server on a machine with multiple CPUs.

How does the Informatica Server increase session performance by partitioning the source? For relational sources, the Informatica Server creates multiple connections, one for each partition of a single source, and extracts a separate range of data for each connection. The Informatica Server reads multiple partitions of a single source concurrently. Similarly, for loading, the Informatica Server creates multiple connections to the target and loads partitions of data concurrently.

For XML and file sources, the Informatica Server reads multiple files concurrently. For loading the data, the Informatica Server creates a separate file for each partition (of a source file). You can choose to merge the targets.

Why do you use repository connectivity? Each time you edit or schedule a session, the Informatica Server communicates directly with the repository to check whether or not the session and users are valid. All the metadata of sessions and mappings is stored in the repository.

What is the DTM process? After the Load Manager performs validations for the session, it creates the DTM process. The DTM creates and manages the threads that carry out the session tasks. It creates the master thread, and the master thread creates and manages all the other threads.

What are the different threads in the DTM process?

Master thread: Creates and manages all other threads.

Mapping thread: One mapping thread is created for each session. It fetches session and mapping information.

Pre- and post-session threads: These are created to perform pre- and post-session operations.


Reader thread: One thread is created for each partition of a source. It reads data from the source.

Writer thread: It is created to load data into the target.

Transformation thread: It is created to transform data.

What are the data movement modes in Informatica? The data movement mode determines how the Informatica Server handles character data. You choose the data movement mode in the Informatica Server configuration settings. Two data movement modes are available in Informatica: ASCII mode and Unicode mode.

What are the output files that the Informatica Server creates while running a session?

Informatica Server log: The Informatica Server (on UNIX) creates a log for all status and error messages (default name: pm.server.log). It also creates an error log for error messages. These files are created in the Informatica home directory.

Session log file: The Informatica Server creates a session log file for each session. It writes information about the session into the log file, such as the initialization process, creation of SQL commands for reader and writer threads, errors encountered, and the load summary. The amount of detail in the session log file depends on the tracing level that you set.

Session detail file: This file contains load statistics for each target in the mapping. Session details include information such as the table name and the number of rows written or rejected. You can view this file by double-clicking the session in the monitor window.

Performance detail file: This file contains information known as session performance details, which helps you see where performance can be improved. To generate this file, select the performance detail option in the session property sheet.

Reject file: This file contains the rows of data that the writer does not write to targets.

Control file: The Informatica Server creates a control file and a target file when you run a session that uses an external loader. The control file contains information about the target flat file, such as the data format and loading instructions for the external loader.

Post-session email: Post-session email allows you to automatically communicate information about a session run to designated recipients. You can create two different messages: one if the session completes successfully and the other if the session fails.

Indicator file: If you use a flat file as a target, you can configure the Informatica Server to create an indicator file. For each target row, the indicator file contains a number that indicates whether the row was marked for insert, update, delete, or reject.

Output file: If a session writes to a target file, the Informatica Server creates the target file based on the file properties entered in the session property sheet.

Cache files: When the Informatica Server creates a memory cache, it also creates cache files. The Informatica Server creates index and data cache files for the following transformations:

Aggregator transformation, Joiner transformation, Rank transformation, and Lookup transformation.

In which circumstances does the Informatica Server create reject files? When it encounters DD_REJECT in an Update Strategy transformation, when a row violates a database constraint, or when a field in the row was truncated or overflowed.

Can you copy a session to a different folder or repository? Yes. By using the Copy Session Wizard, you can copy a session to a different folder or repository. However, the target folder or repository must contain the mapping of that session. If the target folder or repository does not have the mapping of the session being copied, you have to copy that mapping first, before you copy the session. In addition, you can copy the workflow from the Repository Manager. This automatically copies the mapping, associated sources, targets, and session to the target folder.

What is a batch, and what are the types of batches? A grouping of sessions is known as a batch. Batches are of two types:
Sequential: Runs sessions one after the other.
Concurrent: Runs sessions at the same time.

If you have sessions with source-target dependencies, you have to use a sequential batch to start the sessions one after another. If you have several independent sessions, you can use a concurrent batch, which runs all the sessions at the same time.

How many sessions can you create in a batch? Any number of sessions.

When does the Informatica Server mark a batch as failed? If one of the sessions is configured to "run if previous completes" and that previous session fails.

What command is used to run a batch? pmcmd is used to start a batch.

What are the different options used to configure sequential batches? Two options:

Run the session only if the previous session completes successfully.
Always run the session.

In a sequential batch, can you run a session if the previous session fails? Yes, by setting the option to always run the session.

Can you start a batch within a batch? You cannot. If you want to start a batch that resides in a batch, create a new independent batch and copy the necessary sessions into the new batch.

Can you start a session inside a batch individually? You can start a required session individually only in a sequential batch; in a concurrent batch you cannot.

How can you stop a batch? By using the Server Manager or pmcmd.

What are session parameters? Session parameters, like mapping parameters, represent values you might want to change between sessions, such as database connections or source files.

The Server Manager also allows you to create user-defined session parameters. The following are user-defined session parameters:
Database connections.
Source file names: Use this parameter when you want to change the name or location of the session source file between session runs.
Target file names: Use this parameter when you want to change the name or location of the session target file between session runs.
Reject file names: Use this parameter when you want to change the name or location of the session reject files between session runs.

What is a parameter file? A parameter file defines the values for parameters and variables used in a session. A parameter file is a text file created in an editor such as WordPad or Notepad. You can define the following values in a parameter file: mapping parameters, mapping variables, and session parameters.

For Windows command prompt users, the parameter file name cannot have beginning or trailing spaces. If the name includes spaces, enclose the file name in double quotes:

-paramfile "$PMRootDir\my file.txt"

Note: When you write a pmcmd command that includes a parameter file located on another machine, use the backslash (\) with the dollar sign ($). This ensures that the machine where the variable is defined expands the server variable.

pmcmd startworkflow -uv USERNAME -pv PASSWORD -s SALES:6258 -f east -w wSalesAvg -paramfile '\$PMRootDir/myfile.txt'
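To show what a parameter file holds, here is a small Python reader for one. It assumes a simple layout of bracketed section headings followed by name=value lines, with mapping parameters prefixed by $$ and session parameters by $. The heading text, parameter names, and paths in the sample are hypothetical, and this is not Informatica's own parser.

```python
def parse_param_file(text):
    # Return {section heading: {parameter name: value}}.
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
        elif "=" in line and current is not None:
            # Split on the first '=' only, so values may contain '='.
            name, value = line.split("=", 1)
            current[name.strip()] = value.strip()
    return sections


sample = """
[Sales.WF:wf_daily.ST:s_load_orders]
$$LoadDate=2004-01-01
$InputFile_orders = /data/src/orders.dat
$OutputFile_orders = /data/tgt/orders_out.dat
"""
params = parse_param_file(sample)
print(params["Sales.WF:wf_daily.ST:s_load_orders"]["$InputFile_orders"])  # /data/src/orders.dat
```

Changing only the file, not the session, is what lets each run point at a different source and target.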

How can you access a remote source in your session? Relational source: To access a relational source situated in a remote place, you need to configure a database connection to the data source.

File source: To access a remote source file, you must configure an FTP connection to the host machine before you create the session.

Heterogeneous: When your mapping contains more than one source type, the Server Manager creates a heterogeneous session that displays source options for all types.

What is the difference between partitioning relational targets and partitioning file targets? If you partition a session with a relational target, the Informatica Server creates multiple connections to the target database to write target data concurrently. If you partition a session with a file target, the Informatica Server creates one target file for each partition. You can configure session properties to merge these target files.

What are the transformations that restrict the partitioning of sessions?

Advanced External Procedure transformation and External Procedure transformation: These transformations contain a check box on the Properties tab to allow partitioning.

Aggregator transformation: If you use sorted ports, you cannot partition the associated source.

Joiner transformation: You cannot partition the master source for a Joiner transformation.

Normalizer Transformation

XML targets.

Performance tuning in Informatica? The goal of performance tuning is to optimize session performance so that sessions run during the available load window for the Informatica Server. You can increase session performance in the following ways.

The performance of the Informatica Server is related to network connections. Data generally moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster. Network connections therefore often affect session performance, so avoid them where possible.

Flat files: If your flat files are stored on a machine other than the Informatica Server, move those files to the machine where the Informatica Server runs.

Relational data sources: Minimize the connections between sources, targets and the Informatica Server to improve session performance. Moving the target database onto the server system may improve session performance.

Staging areas: If you use staging areas, you force the Informatica Server to perform multiple data passes. Removing staging areas may improve session performance.

You can run multiple Informatica Servers against the same repository. Distributing the session load across multiple Informatica Servers may improve session performance.

Running the Informatica Server in ASCII data movement mode improves session performance, because ASCII data movement mode stores a character value in one byte, whereas Unicode mode takes 2 bytes to store a character.
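The one-byte versus two-byte arithmetic can be checked with a small Python sketch. It only measures encoded string sizes and says nothing about how the Informatica Server buffers data internally; UTF-16 is used as an assumed stand-in for a two-byte Unicode representation, and the sample value is hypothetical.

```python
def encoded_sizes(text: str) -> dict:
    """Return the size in bytes of `text` under a one-byte and a two-byte encoding."""
    return {
        # ASCII: one byte per character (works only for 7-bit characters)
        "ascii": len(text.encode("ascii")),
        # UTF-16 little-endian, no byte-order mark: two bytes per character here
        "utf-16-le": len(text.encode("utf-16-le")),
    }


sizes = encoded_sizes("CUSTOMER_NAME")  # hypothetical 13-character value
print(sizes)  # {'ascii': 13, 'utf-16-le': 26}
```

For the 13-character sample, the two-byte representation doubles the volume of character data to be moved.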

If a session joins multiple source tables in one Source Qualifier, optimizing the query may improve performance. Also, single-table SELECT statements with an ORDER BY or GROUP BY clause may benefit from optimization such as adding indexes.

We can improve session performance by configuring the network packet size, which controls how much data crosses the network at one time. To do this, go to the Server Manager and choose Server-Configure Database Connections.

If your target has key constraints and indexes, they slow the loading of data. To improve session performance in this case, drop the constraints and indexes before you run the session and rebuild them after the session completes.
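The drop, load and rebuild sequence can be sketched with Python's built-in sqlite3 module standing in for the target database. The table and index names are hypothetical, and in a real session these statements would run against the target database before and after the load rather than from Python.

```python
import sqlite3


def bulk_load(conn: sqlite3.Connection, rows: list) -> None:
    """Drop the index, bulk insert the rows, then rebuild the index."""
    cur = conn.cursor()
    # 1. Drop the index so each INSERT does not have to maintain it
    cur.execute("DROP INDEX IF EXISTS idx_sales_cust")
    # 2. Load all rows in one transaction
    cur.executemany("INSERT INTO sales (cust_id, product) VALUES (?, ?)", rows)
    # 3. Rebuild the index once, after the load completes
    cur.execute("CREATE INDEX idx_sales_cust ON sales (cust_id)")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (cust_id INTEGER, product TEXT)")
conn.execute("CREATE INDEX idx_sales_cust ON sales (cust_id)")
bulk_load(conn, [(1, "A"), (2, "B"), (1, "C")])
print(conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0])  # 3
```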

Running parallel sessions by using concurrent batches also reduces the time taken to load the data, so concurrent batches may also increase session performance.

Partitioning the session improves session performance by creating multiple connections to sources and targets and loading data in parallel pipelines.
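A minimal Python sketch of partitioned, parallel loading: the rows are split into partitions and each partition runs through its own pipeline in a separate thread. The round-robin split and the doubling "transformation" are assumptions made for illustration; they are not how the Informatica Server assigns rows to partitions.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows: list, n: int) -> list:
    """Split rows into n partitions using round-robin assignment (an assumption)."""
    return [rows[i::n] for i in range(n)]


def run_pipeline(part_id: int, rows: list) -> tuple:
    """Stand-in for one partition's pipeline: transform rows, then 'write' them."""
    transformed = [r * 2 for r in rows]  # hypothetical transformation
    # A real pipeline would write `transformed` through its own target connection here.
    return part_id, transformed


rows = list(range(10))
parts = partition(rows, 3)
with ThreadPoolExecutor(max_workers=3) as pool:
    # One worker per partition, all running in parallel
    results = list(pool.map(run_pipeline, range(3), parts))

loaded = sorted(v for _, part in results for v in part)
print(loaded)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```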

In some cases, if a session contains an Aggregator transformation, you can use incremental aggregation to improve session performance.
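A minimal Python sketch of the incremental aggregation idea, assuming the aggregates are a SUM and a COUNT per group: a plain dict stands in for the aggregate values kept from earlier runs, and each run applies only the captured source changes. The function and group names are hypothetical.

```python
def apply_changes(cache: dict, changes: list) -> dict:
    """Fold new (group, amount) rows into stored (sum, count) values per group."""
    for group, amount in changes:
        total, count = cache.get(group, (0.0, 0))
        cache[group] = (total + amount, count + 1)
    return cache


cache = {}
apply_changes(cache, [("east", 100.0), ("west", 50.0)])  # first run: whole source
apply_changes(cache, [("east", 25.0)])                    # later run: changes only
print(cache)  # {'east': (125.0, 2), 'west': (50.0, 1)}
```

SUM and COUNT can be updated this way without re-reading old rows, and an average can be derived as sum divided by count.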

Avoid transformation errors to improve session performance.

If the session contains a Lookup transformation, you can improve session performance by enabling the lookup cache.
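The benefit of a lookup cache is that the lookup table is read once and later lookups are answered from memory instead of querying the database for every row. A minimal Python sketch of a static cache, with sqlite3 standing in for the lookup source; the table, columns and sample rows are hypothetical.

```python
import sqlite3


class CachedLookup:
    """Read the lookup table once and answer every later lookup from memory."""

    def __init__(self, conn: sqlite3.Connection, sql: str):
        # One query builds the whole cache: {lookup key: return value}
        self.cache = dict(conn.execute(sql).fetchall())

    def get(self, key, default=None):
        return self.cache.get(key, default)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])

lookup = CachedLookup(conn, "SELECT emp_id, emp_name FROM employee")
print(lookup.get(2), lookup.get(99, "UNKNOWN"))  # Ravi UNKNOWN
```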

If your session contains a Filter transformation, place that Filter transformation as close to the sources as possible, or use a filter condition in the Source Qualifier.

Aggregator, Rank and Joiner transformations often decrease session performance, because they must group data before processing it. To improve session performance in this case, use the sorted ports option.

What is the difference between a mapplet and a reusable transformation? A mapplet consists of a set of transformations that is reusable. A reusable transformation is a single transformation that can be reused.

If you create variables or parameters in a mapplet, they cannot be used in another mapping or mapplet. In contrast, variables created in a reusable transformation can be used in any other mapping or mapplet.

We cannot include source definitions in reusable transformations, but we can add sources to a mapplet.

The whole transformation logic is hidden in a mapplet, but it is transparent in a reusable transformation.

We cannot use COBOL Source Qualifier, Joiner or Normalizer transformations in a mapplet, whereas we can make them reusable transformations.

Define the Informatica repository. The Informatica repository is a relational database that stores information, or metadata, used by the Informatica Server and Client tools. Metadata can include information such as mappings describing how to transform source data, sessions indicating when you want the Informatica Server to perform the transformations, and connect strings for sources and targets.

The repository also stores administrative information such as usernames and passwords, permissions and privileges, and product version.

Use the Repository Manager to create the repository. The Repository Manager connects to the repository database and runs the code needed to create the repository tables. These tables store metadata in a specific format that the Informatica Server and client tools use.

What are the types of metadata stored in the repository? The following types of metadata are stored in the repository:

Database connections, Global objects, Mappings, Mapplets, Multidimensional metadata, Reusable transformations, Sessions and batches, Shortcuts, Source definitions, Target definitions, Transformations.

Source definitions. Definitions of database objects (tables, views, synonyms) or files that provide source data.

Target definitions. Definitions of database objects or files that contain the target data.

Multidimensional metadata. Target definitions that are configured as cubes and dimensions.

Mappings. A set of source and target definitions along with transformations containing business logic that you build into the transformation. These are the instructions that the Informatica Server uses to transform and move data.

Reusable transformations. Transformations that you can use in multiple mappings.

Mapplets. A set of transformations that you can use in multiple mappings.

Sessions and workflows. Sessions and workflows store information about how and when the Informatica Server moves data. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data. A session is a type of task that you can put in a workflow. Each session corresponds to a single mapping.

What is a PowerCenter repository? The PowerCenter repository allows you to share metadata across repositories to create a data mart domain. In a data mart domain, you can create a single global repository to store metadata used across an enterprise, and a number of local repositories to share the global metadata as needed.

Standalone repository. A repository that functions individually, unrelated and unconnected to other repositories.

Global repository. (PowerCenter only.) The centralized repository in a domain, a group of connected repositories. Each domain can contain one global repository. The global repository can contain common objects to be shared throughout the domain through global shortcuts.

Local repository. (PowerCenter only.)
A repository within a domain that is not the global repository. Each local repository in the domain can connect to the global repository and use objects in its shared folders.

How can you work with a remote database in Informatica? Did you work directly by using remote connections? To work with a remote data source, you need to connect to it with remote connections. However, it is not preferable to work with the remote source directly through remote connections. Instead, bring that source onto the local machine where the Informatica Server resides. If you work directly with a remote source, session performance decreases, because less data can pass across the network in a given time.

You can work with a remote source, but you have to configure the FTP connection details: IP address and user authentication.

What is incremental aggregation? When using incremental aggregation, you apply captured changes in the source to aggregate calculations in a session. If the source changes only incrementally and you can capture changes, you can configure the session to process only those changes. This allows the Informatica Server to update your target incrementally, rather than forcing it to process the entire source and recalculate the same calculations each time you run the session.

What are the scheduling options to run a session? You can schedule a session to run at a given time or interval, or you can run the session manually.

Different options of scheduling

Run only on demand: The Informatica Server runs the session only when the user starts the session explicitly.

Run once: The Informatica Server runs the session only once, at a specified date and time.

Run every: The Informatica Server runs the session at regular intervals, as you configured.

Customized repeat: The Informatica Server runs the session at the dates and times specified in the Repeat dialog box.

What is a tracing level, and what are the types of tracing level? The tracing level represents the amount of information that the Informatica Server writes in a log file. The types of tracing level are Normal, Terse, Verbose Initialization and Verbose Data.

What is the difference between a Stored Procedure transformation and an External Procedure transformation? In a Stored Procedure transformation, the procedure is compiled and executed in a relational data source, and you need a database connection to import the stored procedure into your mapping. In an External Procedure transformation, the procedure or function is executed outside the data source; that is, you need to build it as a DLL to access it in your mapping. No database connection is needed for an External Procedure transformation.

Explain session recovery. If you stop a session or if an error causes a session to stop, refer to the session and error logs to determine the cause of failure. Correct the errors, and then complete the session. The method you use to complete the session depends on the properties of the mapping, session, and Informatica Server configuration. Use one of the following methods to complete the session:
· Run the session again if the Informatica Server has not issued a commit.
· Truncate the target tables and run the session again if the session is not recoverable.
· Consider performing recovery if the Informatica Server has issued at least one commit.
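The choice among these three methods, and the row-skipping idea behind Perform Recovery, can be sketched in Python. The boolean inputs and function names are hypothetical; in practice the Informatica Server itself tracks the last committed row, and whether a session is recoverable depends on the circumstances listed further on.

```python
def choose_completion_method(commit_issued: bool, recoverable: bool) -> str:
    """Pick one of the three ways to complete a failed session."""
    if not commit_issued:
        return "run the session again"
    if not recoverable:
        return "truncate the targets and run the session again"
    return "perform recovery"


def rows_to_reload(source_rows: list, last_committed_row: int) -> list:
    """Re-read the whole source but skip rows up to the last committed row.

    Rows are numbered from 1, so slicing at `last_committed_row` starts at the next row.
    """
    return source_rows[last_committed_row:]


source = list(range(1, 10006))  # hypothetical rows numbered 1..10005
print(choose_completion_method(commit_issued=True, recoverable=True))  # perform recovery
print(rows_to_reload(source, 10000)[0])  # 10001
```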
If a session fails after loading 10,000 records into the target, how can you load the records from the 10,001st record when you run the session next time? As explained above, the Informatica Server has three methods for recovering sessions. Use Perform Recovery to load the records from where the session failed.

Explain Perform Recovery. When the Informatica Server starts a recovery session, it reads the OPB_SRVR_RECOVERY table and notes the row ID of the last row committed to the target database. The Informatica Server then reads all sources again and starts processing from the next row ID. For example, if the Informatica Server commits 10,000 rows before the session fails, when you run recovery, the Informatica Server bypasses the rows up to 10,000 and starts loading with row 10,001. By default, Perform Recovery is disabled in the Informatica Server setup. You must enable Recovery in the Informatica Server setup before you run a session so the Informatica Server can create and/or write entries in the OPB_SRVR_RECOVERY table.

How do you recover a standalone session? A standalone session is a session that is not nested in a batch. If a standalone session fails, you can run recovery using a menu command or pmcmd. These options are not available for batched sessions.

To recover sessions using the menu:
1. In the Server Manager, highlight the session you want to recover.
2. Select Server Requests-Stop from the menu.
3. With the failed session highlighted, select Server Requests-Start Session in Recovery Mode from the menu.

To recover sessions using pmcmd:
1. From the command line, stop the session.
2. From the command line, start recovery.

How can you recover a session in a sequential batch? If you configure a session in a sequential batch to stop on failure, you can run recovery starting with the failed session. The Informatica Server completes the session and then runs the rest of the batch. Use the Perform Recovery session property.

To recover sessions in sequential batches configured to stop on failure:

1. In the Server Manager, open the session property sheet.
2. On the Log Files tab, select Perform Recovery, and click OK.
3. Run the session.
4. After the batch completes, open the session property sheet.
5. Clear Perform Recovery, and click OK.

If you do not clear Perform Recovery, the next time you run the session, the Informatica Server attempts to recover the previous session. If you do not configure a session in a sequential batch to stop on failure, and the remaining sessions in the batch complete, recover the failed session as a standalone session.

How do you recover sessions in concurrent batches? If multiple sessions in a concurrent batch fail, you might want to truncate all targets and run the batch again. However, if a session in a concurrent batch fails and the rest of the sessions complete successfully, you can recover the session as a standalone session. To recover a session in a concurrent batch:
1. Copy the failed session using Operations-Copy Session.
2. Drag the copied session outside the batch to be a standalone session.
3. Follow the steps to recover a standalone session.
4. Delete the standalone copy.

How can you complete unrecoverable sessions? Under certain circumstances, when a session does not complete, you need to truncate the target tables and run the session from the beginning. Run the session from the beginning when the Informatica Server cannot run recovery or when running recovery might result in inconsistent data.

Under what circumstances does the Informatica Server produce an unrecoverable session?
· The Source Qualifier transformation does not use sorted ports.
· You change the partition information after the initial session fails.
· Perform Recovery is disabled in the Informatica Server configuration.
· The sources or targets change after the initial session fails.
· The mapping contains a Sequence Generator or Normalizer transformation.
· A concurrent batch contains multiple failed sessions.

If I make any modifications to my table in the back end, are they reflected in the Informatica warehouse, Mapping Designer or Source Analyzer? No.
Informatica is not at all concerned with the back-end database. It displays all the information that is stored in the repository. If you want back-end changes to be reflected in the Informatica screens, you have to import them again from the back end into Informatica through a valid connection, and replace the existing definitions with the imported ones.

After dragging the ports of three sources (SQL Server, Oracle, Informix) to a single Source Qualifier, can you map these three ports directly to the target? No. Unless you join those three ports in the Source Qualifier, you cannot map them directly.

If you drag three heterogeneous sources and populate the target without any join, you are producing a Cartesian product. Without a join, not only heterogeneous sources but also homogeneous sources show the same error.

If you do not want to use joins at the Source Qualifier level, you can add joins separately (for example, with a Joiner transformation).
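Why a missing join condition produces a Cartesian product can be seen in a few lines of Python. The two row sets stand in for hypothetical tables from two different databases: pairing every row with every row multiplies the row counts, while a join condition on a shared key keeps only matching pairs.

```python
from itertools import product

oracle_rows = [(1, "Asha"), (2, "Ravi")]      # hypothetical (cust_id, name)
sqlserver_rows = [(1, "Pune"), (2, "Delhi")]  # hypothetical (cust_id, city)

# No join condition: every row is paired with every row (Cartesian product)
cartesian = list(product(oracle_rows, sqlserver_rows))

# Join condition on cust_id: only matching rows are paired
joined = [
    (o_id, name, city)
    for (o_id, name), (s_id, city) in product(oracle_rows, sqlserver_rows)
    if o_id == s_id
]

print(len(cartesian))  # 4
print(joined)          # [(1, 'Asha', 'Pune'), (2, 'Ravi', 'Delhi')]
```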

What are the target types on the server? The target types are File, Relational, XML and ERP.

What are the target options on the server? The target options for the File target type are FTP File, Loader and MQ.

There are no target options for the ERP target type.

Target Options for Relational are Insert, Update (as Update), Update (as Insert), Update (else Insert), Delete, and Truncate Table

How do you identify existing rows of data in the target table using lookup transformation?

You can identify existing rows of data using an unconnected Lookup transformation.

You can also use a connected Lookup with a dynamic cache on the target.
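A minimal Python sketch of how a lookup cache built on the target identifies existing rows: a set of key values stands in for the cache, each incoming row is flagged as an insert or an update, and new keys are added to the cache as they are seen, which is what makes the cache dynamic. The key column and the INSERT/UPDATE labels are hypothetical, not Informatica port values.

```python
def flag_rows(target_keys: set, incoming: list, key: str = "cust_id") -> list:
    """Return (flag, row) pairs: INSERT for new keys, UPDATE for existing ones."""
    flagged = []
    for row in incoming:
        if row[key] in target_keys:
            flagged.append(("UPDATE", row))
        else:
            flagged.append(("INSERT", row))
            # Dynamic cache: later rows with the same key now count as existing
            target_keys.add(row[key])
    return flagged


cache = {1, 2}  # keys already in the target table
rows = [{"cust_id": 2}, {"cust_id": 3}, {"cust_id": 3}]
print([flag for flag, _ in flag_rows(cache, rows)])  # ['UPDATE', 'INSERT', 'UPDATE']
```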

What are Aggregate transformation?

Aggregator transform is much like the Group by clause in traditional SQL.

This transformation is a connected, active transformation that takes incoming data from the mapping pipeline, groups it based on the group-by ports specified, and calculates aggregate functions (AVG, SUM, COUNT, STDDEV, etc.) for each group.

From a performance perspective, if your mapping has an Aggregator transformation, use filters and sorters very early in the pipeline if there is any need for them.
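The GROUP BY behaviour, and why sorted input helps, can be sketched in Python. Without sorted input every group has to be held until the last row arrives; with input sorted on the group-by column, each group is complete as soon as the key changes and can be emitted immediately. The sample rows are hypothetical, and this models the idea rather than the Aggregator's actual cache.

```python
from itertools import groupby
from operator import itemgetter

rows = [("east", 10), ("west", 5), ("east", 20), ("west", 15), ("north", 7)]


def aggregate_unsorted(rows: list) -> dict:
    """SUM per group without sorted input: all groups stay in memory until the end."""
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0) + amount
    return totals


def aggregate_sorted(sorted_rows: list):
    """SUM per group with sorted input: a group is finished when the key changes."""
    for region, group in groupby(sorted_rows, key=itemgetter(0)):
        yield region, sum(amount for _, amount in group)


print(aggregate_unsorted(rows))              # {'east': 30, 'west': 20, 'north': 7}
print(list(aggregate_sorted(sorted(rows))))  # [('east', 30), ('north', 7), ('west', 20)]
```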

What are various types of Aggregation?

Various types of aggregation are SUM, AVG, COUNT, MAX, MIN, FIRST, LAST, MEDIAN, PERCENTILE, STDDEV, and VARIANCE.

What is code page compatibility? Compatibility between code pages is used for accurate data movement when the Informatica Server runs in the Unicode data movement mode. If the code pages are identical, there is no data loss. One code page can be a subset or superset of another. For accurate data movement, the target code page must be a superset of the source code page.

Superset - A code page is a superset of another code page when it contains all the characters encoded in the other code page and also contains additional characters not contained in the other code page.

Subset - A code page is a subset of another code page when all characters in the code page are encoded in the other code page.
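The superset rule can be checked with Python's codecs for single-byte code pages: collect every character each code page can represent and compare the sets. Python codec names are used here in place of Informatica's code page names, and multibyte code pages would need a different approach.

```python
def repertoire(codec: str) -> set:
    """Characters a single-byte codec can represent; undecodable bytes are skipped."""
    chars = set()
    for b in range(256):
        try:
            chars.add(bytes([b]).decode(codec))
        except UnicodeDecodeError:
            pass
    return chars


def is_superset(target_codec: str, source_codec: str) -> bool:
    """True if every character in the source code page also exists in the target code page."""
    return repertoire(target_codec) >= repertoire(source_codec)


print(is_superset("latin-1", "ascii"))  # True: ASCII data moves into Latin-1 safely
print(is_superset("ascii", "latin-1"))  # False: characters such as 'é' would be lost
```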

What is Code Page used for?

A code page is used to identify characters that might be in different languages. If you are importing Japanese data into a mapping, you must select the Japanese code page for the source data.

What is a Source Qualifier?

It is a transformation which represents the data Informatica server reads from source.

The Source Qualifier represents the rows that the Informatica Server reads when it executes a session. It represents all data queried from the source.

What are dimensions and the various types of dimensions? A dimension is a set of level properties that describe a specific aspect of a business, used for analyzing the factual measures of one or more cubes that use that dimension. Examples: geography, time, customer and product.

What is the Data Transformation Manager? After the Load Manager performs validations for the session, it creates the DTM process. The DTM process is the second process associated with the session run. The primary purpose of the DTM process is to create and manage threads that carry out the session tasks.

· The DTM allocates process memory for the session and divides it into buffers. This is also known as buffer memory. It creates the main thread, which is called the master thread. The master thread creates and manages all other threads.

· If we partition a session, the DTM creates a set of threads for each partition to allow concurrent processing. When the Informatica Server writes messages to the session log, it includes the thread type and thread ID. The following are the types of threads that the DTM creates:

Master thread - The main thread of the DTM process. Creates and manages all other threads.
Mapping thread - One thread for each session. Fetches session and mapping information.
Pre- and post-session threads - One thread each to perform pre- and post-session operations.
Reader thread - One thread for each partition for each source pipeline.
Writer thread - One thread for each partition, if a target exists in the source pipeline, to write to the target.
Transformation thread - One or more transformation threads for each partition.

What are sessions and batches? Session - A session is a set of instructions that tells the Informatica Server how and when to move data from sources to targets. After creating the session, we can use either the Server Manager or the command line program pmcmd to start or stop the session. Batches - A batch provides a way to group sessions for either serial or parallel execution by the Informatica Server. There are two types of batches:

Sequential - Runs sessions one after the other.
Concurrent - Runs sessions at the same time.
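The difference between sequential and concurrent batches can be sketched in Python with a thread pool. The session names are hypothetical, and each "session" just sleeps briefly instead of moving data.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_session(name: str) -> str:
    """Stand-in for a session run: sleep briefly, then report the session name."""
    time.sleep(0.1)
    return name


def run_sequential(sessions: list) -> list:
    """Sequential batch: each session starts after the previous one finishes."""
    return [run_session(s) for s in sessions]


def run_concurrent(sessions: list) -> list:
    """Concurrent batch: all sessions start at the same time."""
    with ThreadPoolExecutor(max_workers=len(sessions)) as pool:
        return list(pool.map(run_session, sessions))


sessions = ["s_customers", "s_products", "s_sales"]  # hypothetical session names

start = time.perf_counter()
seq_result = run_sequential(sessions)
seq_time = time.perf_counter() - start  # roughly 0.3 s

start = time.perf_counter()
con_result = run_concurrent(sessions)
con_time = time.perf_counter() - start  # roughly 0.1 s
```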

Why do we use Lookup transformations? Lookup transformations can access data from relational tables that are not sources in the mapping. With a Lookup transformation, we can accomplish the following tasks:

Get a related value - Get the employee name from the Employee table based on the employee ID.

Perform a calculation.

Update slowly changing dimension tables - We can use unconnected lookup transformation to determine whether the records already exist in the target or not.

             

ETL Questions and Answers

What is a metadata extension?

Informatica allows end users and partners to extend the metadata stored in the repository by associating information with individual objects in the repository. For example, when you create a mapping, you can store your contact information with the mapping. You associate information with repository metadata using metadata extensions.

Informatica Client applications can contain the following types of metadata extensions:

Vendor-defined. Third-party application vendors create vendor-defined metadata extensions. You can view and change the values of vendor-defined metadata extensions, but you cannot create, delete, or redefine them.

User-defined. You create user-defined metadata extensions using PowerCenter/PowerMart. You can create, edit, delete, and view user-defined metadata extensions. You can also change the values of user-defined extensions.

What is an ODS (operational data store)?

ANS1: ODS - Operational Data Store.

The ODS comes between the staging area and the data warehouse. The data in the ODS is at a low level of granularity.

Once data is populated in the ODS, aggregated data is loaded into the EDW through the ODS.

ANS2: An updatable set of integrated operational data used for enterprise-wide tactical decision making. It contains live data, not snapshots, and retains minimal history.

Can we look up a table from a Source Qualifier transformation, i.e. an unconnected lookup?

You cannot lookup from a source qualifier directly. However, you can override the SQL in the source qualifier to join with the lookup table to perform the lookup.

What are the different Lookup methods used in Informatica?

A Lookup transformation is mainly of two types:

1) Connected lookup 2) Unconnected lookup

Connected lookup:
1) It receives input values directly from the pipeline.
2) It can use either a dynamic or a static cache.
3) It can return multiple values.
4) It supports user-defined default values.

Unconnected lookup:
1) It receives input values from a :LKP expression in another transformation.
2) It can use only a static cache.
3) It returns only a single value.
4) It does not support user-defined default values.

What are parameter files ? Where do we use them?

A parameter file is any text file in which you define values for the parameters used in an Informatica session. The parameter file is referenced in the session properties, and when the Informatica session runs, the parameter values are fetched from the specified file. For example, if $$ABC is defined in the Informatica mapping, its value can be defined in a file called abc.txt as:

[foldername.session_name]
$$ABC=hello world

In the session properties, give abc.txt in the parameter file name field.
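The layout described above, a bracketed heading followed by name=value lines, can be read back with a short Python sketch, for example to check a generated parameter file. The parser only assumes that layout; the folder, session and parameter names in the sample are hypothetical, and the Informatica Server reads the file itself at run time.

```python
def parse_param_file(text: str) -> dict:
    """Parse '[heading]' sections followed by 'name=value' lines into nested dicts."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("[") and line.endswith("]"):
            # New heading: start (or reuse) the section for that folder/session
            current = sections.setdefault(line[1:-1], {})
        elif "=" in line and current is not None:
            name, value = line.split("=", 1)  # value may itself contain '='
            current[name.strip()] = value.strip()
    return sections


sample = """
[MyFolder.s_load_abc]
$$ABC=hello world
$$TGT_DIR=/data/targets
"""
params = parse_param_file(sample)
print(params["MyFolder.s_load_abc"]["$$ABC"])  # hello world
```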

What is a mapping, session, worklet, workflow, mapplet?

Mapping - represents the flow and transformation of data from source to target.
Mapplet - a group of transformations that can be called within a mapping.
Session - a task associated with a mapping to define the connections and other configurations for that mapping.
Workflow - controls the execution of tasks such as commands, emails and sessions.
Worklet - a workflow that can be called within a workflow.

What is the difference between Power Center & Power Mart?

PowerMart is designed for: low-range warehouses, local repositories only, mainly desktop environments. PowerCenter is designed for: high-end warehouses, global as well as local repositories, and ERP support.

Can Informatica load heterogeneous targets from heterogeneous sources?

Yes. It can load from heterogeneous sources into heterogeneous targets.

What are the various tools? - Name a few

The various ETL tools are as follows: Informatica, DataStage, Business Objects Data Integrator, Ab Initio. OLAP tools are as follows: Cognos, Business Objects.

What are snapshots? What are materialized views & where do we use them? What is a materialized view log?

A materialized view is a view whose data is also stored in the database. With an ordinary view, the database stores only the query, and the data is extracted from the base tables each time the view is called. With a materialized view, the query result is stored physically, like a table.

What is partitioning? What are the types of partitioning?

Partitioning is a part of physical data warehouse design that is carried out to improve performance and simplify stored-data management. Partitioning breaks up a large table into smaller, independently manageable components, because it: 1. reduces the work involved in adding new data; 2. reduces the work involved in purging old data. The two types of partitioning are: 1. Horizontal partitioning. 2. Vertical partitioning (reduces efficiency in the context of a data warehouse).
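The two kinds of partitioning can be illustrated on a small in-memory table in Python. The order rows and column groups are hypothetical: horizontal partitioning splits rows by a key value such as year, and vertical partitioning splits columns into separate tables that each keep the key column.

```python
rows = [
    {"order_id": 1, "year": 2003, "amount": 100.0, "notes": "rush"},
    {"order_id": 2, "year": 2004, "amount": 250.0, "notes": ""},
    {"order_id": 3, "year": 2004, "amount": 75.0, "notes": "gift"},
]


def horizontal_partition(rows: list, key: str) -> dict:
    """Split rows into groups by a key value (e.g. one partition per year)."""
    parts = {}
    for row in rows:
        parts.setdefault(row[key], []).append(row)
    return parts


def vertical_partition(rows: list, key_col: str, column_groups: list) -> list:
    """Split columns into separate 'tables'; each keeps the key column for rejoining."""
    return [
        [{c: row[c] for c in (key_col, *cols)} for row in rows]
        for cols in column_groups
    ]


by_year = horizontal_partition(rows, "year")
hot, cold = vertical_partition(rows, "order_id", [("year", "amount"), ("notes",)])
```

With horizontal partitions, purging a year of old data becomes dropping one partition, e.g. `by_year.pop(2003)`, instead of deleting rows one by one.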

What is Full load & Incremental or Refresh load?

Full load is the entire data dump that takes place the very first time. Afterwards, to synchronize the target data with the source data, there are two further techniques:
Refresh load - The existing data is truncated and reloaded completely.
Incremental load - The delta, or difference between target and source data, is loaded at regular intervals. The timestamp of the previous delta load has to be maintained.
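A minimal Python sketch of an incremental load driven by the stored timestamp of the previous load: only rows changed after that timestamp are extracted, and the newest change time becomes the timestamp for the next run. The `updated_at` column and the sample rows are hypothetical.

```python
from datetime import datetime


def extract_delta(source_rows: list, last_load: datetime) -> tuple:
    """Return rows changed after the previous load and the timestamp to store for next time."""
    delta = [r for r in source_rows if r["updated_at"] > last_load]
    # Keep the old timestamp when nothing changed
    new_last_load = max((r["updated_at"] for r in delta), default=last_load)
    return delta, new_last_load


source = [
    {"id": 1, "updated_at": datetime(2004, 1, 1)},
    {"id": 2, "updated_at": datetime(2004, 1, 5)},
    {"id": 3, "updated_at": datetime(2004, 1, 9)},
]
delta, last_load = extract_delta(source, datetime(2004, 1, 3))
print([r["id"] for r in delta], last_load)  # [2, 3] 2004-01-09 00:00:00
```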

   What are the modules in Power Mart?

 

1. Power Mart Designer 2. Server 3. Server Manager 4. Repository 5. Repository Manager

What is a staging area? Do we need it? What is the purpose of a staging area?

A staging area is a place where you hold temporary tables on the data warehouse server. Staging tables are connected to the work area or fact tables. We basically need a staging area to hold the data and perform data cleansing and merging before loading the data into the warehouse.

A staging area is like a large table with data separated from their sources to be loaded into a data warehouse in the required format. If we attempt to load data directly from OLTP, it might mess up the OLTP because of format changes between a warehouse and OLTP. Keeping the OLTP data intact is very important for both the OLTP and the warehouse.

A staging area is a temporary schema used to:
1. Do flat mapping, i.e. dump all the OLTP data into it without applying any business rules. Pushing data into staging takes less time because no business rules or transformations are applied to it.
2. Perform data cleansing and validation using First Logic.

How to determine what records to extract?

The data modeler provides the ETL developer with the tables that are to be extracted from the various sources.

When addressing a table, some dimension key must reflect the need for a record to be extracted. Mostly it will be from the time dimension (e.g. date >= 1st of the current month) or a transaction flag (e.g. Order Invoiced Stat). A foolproof approach is to add an archive flag to each record that is reset when the record changes.

What are the various transformations available?

 

Aggregator Transformation, Expression Transformation, Filter Transformation, Joiner Transformation, Lookup Transformation, Normalizer Transformation, Rank Transformation, Router Transformation, Sequence Generator Transformation, Stored Procedure Transformation, Sorter Transformation, Update Strategy Transformation, XML Source Qualifier Transformation, Advanced External Procedure Transformation, External Procedure Transformation.

What is a three-tier data warehouse?

 

A three-tier data warehouse contains three tiers: a bottom tier, a middle tier and a top tier. The bottom tier deals with retrieving related data or information from various information repositories by using SQL. The middle tier contains two types of servers: 1. ROLAP server 2. MOLAP server. The top tier deals with presentation or visualization of the results. The 3 tiers are:
1. Data tier - bottom tier - consists of the database.
2. Application tier - middle tier - consists of the analytical server.
3. Presentation tier - the tier that interacts with the end user.

Do we need an ETL tool? When do we go for the tools in the market?

 

ETL tools are meant to extract, transform and load data into the data warehouse for decision making. Before the evolution of ETL tools, the ETL process was done manually using SQL code written by programmers. This task was tedious and cumbersome in many cases, since it involved many resources, complex coding and more work hours. On top of that, maintaining the code was a great challenge for the programmers. ETL tools eliminate these difficulties, since they are very powerful and offer many advantages at all stages of the ETL process, from extraction, data cleansing, data profiling, transformation and debugging to loading into the data warehouse, when compared to the old method.

1. ETL stands for Extraction, Transformation and Loading.

2. An ETL tool helps you extract data from different ODSs and databases.

3. If you have a requirement like this, you need an ETL tool; otherwise you do not need one.

   How can we use mapping variables in Informatica? Where do we use them?

 

After creating a variable, we can use it in any expression in a mapping or a mapplet. Variables can also be used in Source Qualifier filters, user-defined joins or extract overrides, and in the expression editor of reusable transformations. Their values can change automatically between sessions.

What are the various methods of getting incremental records or delta records from the source systems?

 

Incremental records can be brought from the source systems to the target by using incremental aggregation, which is a session option rather than a separate transformation.

Techniques of Error Handling - Ignore , Rejecting bad records to a flat file , loading the records and reviewing them (default values)

Records are rejected either by the database, due to a constraint key violation, or by the Informatica Server when writing data into the target table. These rejected records can be found in the bad file folder, where a reject file is created for each session. We can check why a record was rejected: in this bad file, the first column is a row indicator and the second column is a column indicator. The column indicators are of four types: D - valid data, O - overflowed data, N - null data, T - truncated data. Depending on these indicators, we can make changes to load the data successfully into the target.

Can we use procedural logic inside Informatica? If yes, how? If not, how can we use external procedural logic in Informatica?

We can use the External Procedure transformation to use external procedures. Both COM and Informatica procedures are supported by the External Procedure transformation.

Can we override a native sql query within Informatica? Where do we do it? How do we do it?

We can override the SQL query in the SQL override property of a Source Qualifier.

What is latest version of Power Center / Power Mart?

Power Center 7.1

How do we call shell scripts from Informatica?

You can use a Command task to call shell scripts in the following ways:
1. Standalone Command task. You can use a Command task anywhere in the workflow or worklet to run shell commands.
2. Pre- and post-session shell command. You can call a Command task as the pre- or post-session shell command for a Session task.

What is Informatica metadata and where is it stored?

 

Informatica metadata contains all the information about the source tables, target tables and transformations, so that it is easy to perform transformations during the ETL process. Informatica metadata is stored in the Informatica repository.

What are active transformation / Passive transformations?

An active transformation can change the number of rows it outputs. A passive transformation does not change the number of rows: it passes through the same number of rows that were given to it as input.

Transformations can be active or passive. An active transformation can change the number of rows that pass through it, such as a Filter transformation that removes rows that do not meet the filter condition. A passive transformation does not change the number of rows that pass through it, such as an Expression transformation that performs a calculation on data and passes all rows through the transformation

Active transformations: Advanced External Procedure, Aggregator, Application Source Qualifier, Filter, Joiner, Normalizer, Rank, Router, Update Strategy, XML Source Qualifier.

Passive transformations: Expression, External Procedure, Mapplet Input, Lookup, Sequence Generator, Mapplet Output.

When do we analyze the tables? How do we do it?

We need to analyze the tables when the data in the data warehouse changes frequently. Analyzing a table computes or updates its statistics, which helps the optimizer and improves the performance of your SQL. In Oracle, for example, statistics are gathered with DBMS_STATS.GATHER_TABLE_STATS or with the older ANALYZE TABLE ... COMPUTE STATISTICS statement.

Compare ETL & manual development?

Both tool-based ETL and hand-coded ETL have pros and cons. Tool-based ETL provides maintainability, ease of development and a graphical view of the flow. It also reduces the learning curve for the team.

Hand-coded ETL is good when little transformation logic is involved. It is also good when the sources and targets are in the same environment. However, depending on the skill level of the team, it can extend the overall development time.

Primary Key Materialized Views

The following statement creates a primary-key materialized view on the table emp located on a remote database.

SQL> CREATE MATERIALIZED VIEW mv_emp_pk
     REFRESH FAST START WITH SYSDATE
     NEXT SYSDATE + 1/48
     WITH PRIMARY KEY
     AS SELECT * FROM emp@remote_db;

Materialized view created.

Note: When you create a materialized view using the FAST option, you need to create a materialized view log on the master table(s), as shown below:

SQL> CREATE MATERIALIZED VIEW LOG ON emp;

Materialized view log created.

Rowid Materialized Views

The following statement creates a rowid materialized view on the table emp located on a remote database:

SQL> CREATE MATERIALIZED VIEW mv_emp_rowid
     REFRESH WITH ROWID
     AS SELECT * FROM emp@remote_db;

Materialized view created.

Subquery Materialized Views

The following statement creates a subquery materialized view based on the emp and dept tables located on the remote database:

SQL> CREATE MATERIALIZED VIEW mv_empdept
     AS SELECT * FROM emp@remote_db e
     WHERE EXISTS
         (SELECT * FROM dept@remote_db d
          WHERE e.dept_no = d.dept_no);

REFRESH CLAUSE

[refresh [fast|complete|force]
    [on demand | commit]
    [start with date] [next date]
    [with {primary key|rowid}]]

The refresh option specifies:
a. The refresh method Oracle uses to refresh data in the materialized view
b. Whether the view is primary-key based or rowid based
c. The time and interval at which the view is to be refreshed

Refresh Method - FAST Clause

A FAST refresh uses the materialized view logs (as seen above) to send the rows that have changed in the master tables to the materialized view. You should create a materialized view log for the master tables if you specify the REFRESH FAST clause.

SQL> CREATE MATERIALIZED VIEW LOG ON emp;

Materialized view log created.

Materialized views are not eligible for fast refresh if the defining subquery contains an analytic function.

Refresh Method - COMPLETE Clause

A complete refresh re-creates the entire materialized view. If you request a complete refresh, Oracle performs one even if a fast refresh is possible.

Refresh Method - FORCE Clause

When you specify a FORCE clause, Oracle performs a fast refresh if one is possible and a complete refresh otherwise. If you do not specify a refresh method (FAST, COMPLETE, or FORCE), FORCE is the default.

PRIMARY KEY and ROWID Clause

WITH PRIMARY KEY is used to create a primary key materialized view, i.e. the materialized view is based on the primary key of the master table instead of the ROWID (for the ROWID clause). PRIMARY KEY is the default option. To use the PRIMARY KEY clause, you should have defined a PRIMARY KEY on the master table; otherwise you should use ROWID-based materialized views.

Primary key materialized views allow materialized view master tables to be reorganized without affecting the eligibility of the materialized view for fast refresh. Rowid materialized views should have a single master table and cannot contain any of the following:

Distinct or aggregate functions

GROUP BY

Subqueries

Joins and set operations

Timing the refresh

The START WITH clause tells the database when to perform the first replication from the master table to the local base table. It should evaluate to a future point in time. The NEXT clause specifies the interval between refreshes.

SQL> CREATE MATERIALIZED VIEW mv_emp_pk
     REFRESH FAST
     START WITH SYSDATE NEXT SYSDATE + 2
     WITH PRIMARY KEY
     AS SELECT * FROM emp@remote_db;

Materialized view created.

In the above example, the first copy of the materialized view is made at SYSDATE, and the refresh is performed every two days.
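Oracle DATE arithmetic counts in days. So NEXT SYSDATE + 1/48 in the first example means every 30 minutes (1/48 of 24 hours), and NEXT SYSDATE + 2 means every two days. The short Python sketch below only reproduces that arithmetic with `datetime`. `next_refresh` is a hypothetical helper and has nothing to do with how Oracle actually schedules the refresh job.

```python
from datetime import datetime, timedelta
from fractions import Fraction

SECONDS_PER_DAY = 86400


def next_refresh(start: datetime, interval_days: Fraction) -> datetime:
    """Return start + interval_days, mimicking Oracle's DATE + n (n counted in days)."""
    # Multiply as a Fraction first so that 1/48 of a day becomes exactly 1800 seconds.
    return start + timedelta(seconds=float(interval_days * SECONDS_PER_DAY))


start = datetime(2024, 1, 1, 9, 0)
print(next_refresh(start, Fraction(1, 48)))  # 2024-01-01 09:30:00
print(next_refresh(start, Fraction(2)))      # 2024-01-03 09:00:00
```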

 

Informatica Interview Questions - Part 15

What are active and passive transformations? Transformations can be active or passive. An active transformation can change the number of rows that pass through it, such as a Filter transformation that removes rows that do not meet the filter condition.

A passive transformation does not change the number of rows that pass through it, such as an Expression transformation that performs a calculation on data and passes all rows through the transformation.

What is tracing level and what are the types of tracing levels?

Tracing level represents the amount of information that the Informatica server writes to the session log file.

Types of tracing level: Normal, Terse, Verbose Initialization and Verbose Data.

How can you say that the Union transformation is an active transformation?

By definition, an active transformation is one that changes the number of rows that pass through it. A Union transformation merges rows from several input groups into one output group, so the number of output rows differs from the number of rows in any single input group. Like SQL UNION ALL, it does not remove duplicates.
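A quick way to see why the row count changes is to model the Union transformation as SQL UNION ALL over its input groups. In this sketch, plain Python lists stand in for the input groups and the names are illustrative. The output has as many rows as all groups combined, duplicates included, and that count matches neither input group on its own.

```python
from itertools import chain


def union_all(*input_groups):
    """Concatenate all input groups into one output stream, keeping duplicates."""
    return list(chain.from_iterable(input_groups))


group1 = [(10, "A"), (20, "C")]
group2 = [(10, "A"), (30, "D"), (40, "E")]
output = union_all(group1, group2)

print(len(group1), len(group2), len(output))  # 2 3 5
print(output.count((10, "A")))                # 2 (the duplicate is kept)
```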

Is a fact table normalized or denormalized?

A fact table is always a denormalized table. It contains foreign keys that reference the primary keys of the dimension tables, together with the measures.

Informatica Interview Questions - Part 14

What are the different threads in the DTM process?

Master thread: Creates and manages all other threads.

Mapping thread: One mapping thread is created for each session. It fetches session and mapping information.

Pre- and post-session threads: These are created to perform pre- and post-session operations.

Reader thread: One thread will be created for each partition of a source. It reads data from source.

Writer thread: It will be created to load data to the target.

Transformation thread: It will be created to transform data.
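As a rough analogy only (this is not how the DTM is implemented), the reader, transformation and writer threads behave like a producer/consumer pipeline connected by buffers. The Python sketch below uses `threading` and `queue.Queue` to pass rows from a reader stage to a transformation stage, which upper-cases each value, and on to a writer stage. The calling thread plays the master role of starting and joining the others. The stage functions are hypothetical.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the data between stages


def reader(source_rows, out_q):
    """Reader stage: read rows from the 'source' and hand them on."""
    for row in source_rows:
        out_q.put(row)
    out_q.put(SENTINEL)


def transformer(in_q, out_q):
    """Transformation stage: transform each row (here, upper-case it)."""
    while (row := in_q.get()) is not SENTINEL:
        out_q.put(row.upper())
    out_q.put(SENTINEL)


def writer(in_q, target):
    """Writer stage: load transformed rows into the 'target'."""
    while (row := in_q.get()) is not SENTINEL:
        target.append(row)


def run_pipeline(source_rows):
    """Act as the 'master': start every stage thread and wait for all of them."""
    q1, q2 = queue.Queue(), queue.Queue()
    target = []
    threads = [
        threading.Thread(target=reader, args=(source_rows, q1)),
        threading.Thread(target=transformer, args=(q1, q2)),
        threading.Thread(target=writer, args=(q2, target)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return target


print(run_pipeline(["a", "b", "c"]))  # ['A', 'B', 'C']
```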

If we are using an Update Strategy transformation in a mapping, how can we know whether the insert, update, reject or delete option was selected while the session ran?

In the Designer, while creating the Update Strategy transformation, clear the Forward Rejected Rows option. Any rejected rows are then written to the session log file instead of being passed to the next transformation.

Inserted or updated rows can be identified only by checking the target file or table.

How do you join two tables without using the Joiner transformation?

You can join two or more tables with a Source Qualifier, provided the tables are related and come from the same database.

When you drag and drop the tables into the mapping, you get a Source Qualifier for each table. Delete all the Source Qualifiers and add one common Source Qualifier for all the tables. Right-click the Source Qualifier and choose Edit. On the Properties tab you will find the SQL Query property, where you can write your SQL.
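The SQL you type into the SQL Query property is an ordinary join, and the source database executes it before the rows reach the mapping. The sketch below runs such a query against an in-memory SQLite database with hypothetical EMP and DEPT tables. It shows the shape of the statement and the effect of an inner equi-join, the Source Qualifier's default join type: the employee with no department is dropped. In a real override, keep the selected columns in the same order as the Source Qualifier's connected ports.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (dept_no INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE emp  (emp_no INTEGER PRIMARY KEY, ename TEXT, dept_no INTEGER);
    INSERT INTO dept VALUES (10, 'SALES'), (20, 'HR');
    INSERT INTO emp  VALUES (1, 'SMITH', 10), (2, 'JONES', 20), (3, 'KING', NULL);
""")

# The kind of statement you might enter as the Source Qualifier SQL override.
SQL_OVERRIDE = """
    SELECT emp.emp_no, emp.ename, dept.dname
    FROM emp, dept
    WHERE emp.dept_no = dept.dept_no
    ORDER BY emp.emp_no
"""

rows = conn.execute(SQL_OVERRIDE).fetchall()
print(rows)  # [(1, 'SMITH', 'SALES'), (2, 'JONES', 'HR')]
```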

Which is better among incremental load, normal load and bulk load?

It depends on the requirement. In general, incremental load is usually better because it loads only the data that is not already in the target.
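Incremental loading usually relies on a high-water mark: the job remembers the latest change timestamp loaded by the previous run and extracts only newer rows. Here is a minimal Python sketch, assuming each source row carries a hypothetical `updated_at` column. In Informatica the mark would typically be kept in a mapping variable or a control table rather than in a Python variable.

```python
from datetime import datetime


def incremental_extract(source_rows, last_loaded_at):
    """Return the rows changed after the previous run's high-water mark,
    plus the new mark to store for the next run."""
    new_rows = [r for r in source_rows if r["updated_at"] > last_loaded_at]
    new_mark = max((r["updated_at"] for r in new_rows), default=last_loaded_at)
    return new_rows, new_mark


source = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 1, 5)},
    {"id": 3, "updated_at": datetime(2024, 1, 9)},
]
rows, mark = incremental_extract(source, datetime(2024, 1, 3))
print([r["id"] for r in rows], mark)  # [2, 3] 2024-01-09 00:00:00
```

Running it again with the returned mark extracts nothing, which is the point: rows already in the target are not reloaded.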

What is the difference between a summary filter and a detail filter?

A summary filter is applied to a group of rows that share a common value. A detail filter is applied to each individual record in the database.

What tasks does the Load Manager process perform?

Manages session and batch scheduling: When you start the Informatica server, the Load Manager launches and queries the repository for a list of sessions configured to run on the Informatica server. When you configure a session, the Load Manager maintains a list of sessions and session start times. When you start a session, the Load Manager fetches the session information from the repository to perform validations and verifications before starting the DTM process.

Locking and reading the session: When the Informatica server starts a session, the Load Manager locks the session in the repository. Locking prevents the same session from being started again while it is running.

Reading the parameter file: If the session uses a parameter file, the Load Manager reads it and verifies that the session-level parameters are declared in the file.

Verifying permissions and privileges: When the session starts, the Load Manager checks whether the user has privileges to run the session.

Creating log files: The Load Manager creates a log file that contains the status of the session.

Informatica Interview Questions - Part 13

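What is a Router transformation?

A Router transformation lets you use conditions to test data. It is similar to a Filter transformation, but it can test the data against one or more conditions and route each row to the matching output groups. Rows that meet none of the conditions go to a default group.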

What type of metadata is stored in the repository?

Source definitions: Definitions of database objects (tables, views, synonyms) or files that provide source data.

Target definitions: Definitions of database objects or files that contain the target data.

Multi-dimensional metadata: Target definitions that are configured as cubes and dimensions.

Mappings: A set of source and target definitions along with transformations containing business logic that you build into the transformation. These are the instructions that the Informatica Server uses to transform and move data.

Reusable transformations: Transformations that you can use in multiple mappings.

Mapplets: A set of transformations that you can use in multiple mappings.

Sessions and workflows: Sessions and workflows store information about how and when the Informatica Server moves data. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data. A session is a type of task that you can put in a workflow. Each session corresponds to a single mapping.

How to delete duplicate rows in flat files source?

Use a Sorter transformation and enable its Distinct option.

Can you use an Aggregator (or another active transformation) after an Update Strategy transformation?

You can use an Aggregator after an Update Strategy, but there is a catch. Suppose you have flagged some rows for delete and then aggregate all rows with, say, the SUM function. The rows flagged for delete will be subtracted in the Aggregator.

What is the difference between a dimension table and a fact table, and what are the different types of each?

A fact table contains measurable data (measures) and foreign keys to the dimension tables, which often together form its primary key.

Different types of facts: 1. Additive 2. Non-additive 3. Semi-additive

A dimension table contains textual descriptions of the data. It contains a primary key.

Informatica Interview Questions - Part 12

What is meant by lookup cache?

The Informatica server builds a cache in memory when it processes the first row of data in a cached Lookup transformation. It allocates memory for the cache based on the amount you configure in the transformation or session properties. The Informatica server stores condition values in the index cache and output values in the data cache.
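The index/data split can be pictured as a dictionary: the keys are the lookup condition values and the entries hold the output values. The Python class below is a simplified model under that assumption. It uses hypothetical names and ignores memory limits, paging and cache files. It builds the cache only when the first lookup is made, then answers every later lookup from memory. When several lookup rows share a key, it keeps the first one.

```python
class CachedLookup:
    """Simplified model of a cached lookup: condition values -> output values."""

    def __init__(self, lookup_rows, condition_cols, output_cols):
        self._rows = lookup_rows
        self._condition_cols = condition_cols
        self._output_cols = output_cols
        self._cache = None  # not built until the first row is processed

    def _build(self):
        self._cache = {}
        for row in self._rows:
            key = tuple(row[c] for c in self._condition_cols)   # "index cache" part
            value = tuple(row[c] for c in self._output_cols)    # "data cache" part
            self._cache.setdefault(key, value)  # keep the first match for duplicate keys

    def lookup(self, *condition_values):
        if self._cache is None:
            self._build()
        return self._cache.get(tuple(condition_values))  # None when there is no match


customers = [{"cust_id": 1, "name": "Acme"}, {"cust_id": 2, "name": "Globex"}]
lkp = CachedLookup(customers, ["cust_id"], ["name"])
print(lkp.lookup(2))  # ('Globex',)
print(lkp.lookup(9))  # None
```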

Can you use the mapping parameters or variables created in one mapping in any other reusable transformation?

Yes, because a reusable transformation is not contained within any mapplet or mapping.

What are reusable transformations? Reusable transformations are transformations that you can use in multiple mappings. You can create them in two ways:

1. Using the Transformation Developer. 2. Creating a normal transformation and promoting it to reusable.

What is a code page used for?

A code page identifies the character set of the data, so that characters from different languages are interpreted correctly. If you are importing Japanese data into a mapping, you must select a Japanese code page for the source data.

Can you use bulk loading in a session and still recover the session? If the session is configured for bulk mode, it does not write recovery information to the recovery tables, so a bulk-loaded session cannot be recovered.

What are the differences between connected and unconnected lookups?

Connected lookup:
1) Receives input values directly from the pipeline.
2) You can use a dynamic or static cache.
3) The cache includes all lookup columns used in the mapping.
4) Supports user-defined default values.

Unconnected lookup:
1) Receives input values from the result of a :LKP expression in another transformation.
2) You can use only a static cache.
3) The cache includes all lookup output ports in the lookup condition and the lookup/return port.
4) Does not support user-defined default values.

Informatica Interview Questions - Part 11

What are the scheduling options to run a session?

A session can be scheduled to run at a given time or interval, or you can run it manually. The scheduling options are:
Run only on demand: The server runs the session only when the user starts it explicitly.
Run once: The Informatica server runs the session only once, at a specified date and time.
Run every: The Informatica server runs the session at regular intervals, as you configured.
Customized repeat: The Informatica server runs the session at the dates and times specified in the Repeat dialog box.

What is a parameter file?

A parameter file defines the values for parameters and variables used in a session. It is a plain text file created with a text editor such as WordPad or Notepad.

You can define the following values in a parameter file: mapping parameters, mapping variables and session parameters.

What are session parameters?

Session parameters are like mapping parameters: they represent values you might want to change between sessions, such as database connections or source files. Server Manager also lets you create user-defined session parameters. The user-defined session parameters are:
Database connections.
Source file name: Use this parameter when you want to change the name or location of the session source file between session runs.
Target file name: Use this parameter when you want to change the name or location of the session target file between session runs.
Reject file name: Use this parameter when you want to change the name or location of the session reject files between session runs.
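A parameter file is usually organised in sections. Each section heading names the folder, workflow and session, and is followed by name=value lines. Mapping parameters are prefixed by $$ and session parameters by $. The Python sketch below parses that shape into nested dictionaries. The heading style and the folder, workflow, session and parameter names in the sample are assumptions for illustration, so compare them with a parameter file from your own PowerCenter version.

```python
def parse_parameter_file(text):
    """Split a parameter file into {section heading: {parameter name: value}}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
        elif "=" in line and current is not None:  # ignore values before any heading
            name, value = line.split("=", 1)
            current[name.strip()] = value.strip()
    return sections


# Hypothetical sample: one session section with a mapping parameter ($$)
# and two session parameters ($).
SAMPLE = """
[Sales.WF:wf_load_sales.ST:s_m_load_sales]
$$LoadDate=2024-01-31
$InputFile1=/data/in/sales_east.txt
$DBConnection_Target=DW_ORACLE
"""

params = parse_parameter_file(SAMPLE)
print(params["Sales.WF:wf_load_sales.ST:s_m_load_sales"]["$InputFile1"])
```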

In a sequential batch, can you run a session if the previous session fails?

Yes, by setting the option "Always runs the session".

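How can you transform rows into columns (or columns into rows)?

1. Use a Normalizer transformation, which turns repeating columns into separate rows, or
2. Use the PIVOT operator in Oracle (11g and later), which turns rows into columns.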
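The two directions can be sketched in Python with plain dictionaries. `normalize` turns one record with repeating columns into one row per column, which is the Normalizer's direction. `pivot` folds such rows back into columns, which is the PIVOT direction. Both functions and the quarterly sales record are hypothetical illustrations, not Informatica or Oracle APIs.

```python
def normalize(record, key_col, repeating_cols):
    """Columns -> rows: one output row per repeating column (Normalizer-style)."""
    return [
        {key_col: record[key_col], "occurrence": i, "value": record[col]}
        for i, col in enumerate(repeating_cols, start=1)
    ]


def pivot(rows, key_col, name_col, value_col):
    """Rows -> columns: one output row per key, one column per name (PIVOT-style)."""
    out = {}
    for row in rows:
        out.setdefault(row[key_col], {key_col: row[key_col]})[row[name_col]] = row[value_col]
    return list(out.values())


sales = {"store": "S1", "q1": 100, "q2": 150, "q3": 90, "q4": 120}
long_rows = normalize(sales, "store", ["q1", "q2", "q3", "q4"])
print(len(long_rows))  # 4
print(pivot(long_rows, "store", "occurrence", "value"))
```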

What are the basic requirements for joining two sources in a Source Qualifier?

1) Both sources should be in the same database.
2) They should have at least one column in common with the same data type.

Informatica Interview Questions - Part 10

What are the two types of processes that the Informatica server uses to run a session?

Load Manager process: Starts the session, creates the DTM process, and sends post-session email when the session completes.
DTM process: Creates threads to initialize the session, read, write and transform data, and handle pre- and post-session operations.

What are mapping parameters and variables, and in which situations can we use them?

If we need to change certain attributes of a mapping every time the session is run, editing the mapping each time would be very difficult. Instead, we use mapping parameters and variables and define their values in a parameter file. Then we only edit the parameter file to change the attribute values, which makes the process simple.

Mapping parameter values remain constant for the whole session run. To change a parameter value, we need to edit the parameter file.

The value of a mapping variable, however, can be changed during the session by using variable functions. If we need to increment an attribute value by 1 after every session run, we can use a mapping variable.
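The key point is how each value carries over between runs. A mapping variable's final value is saved in the repository after a successful session and picked up as the start value for the next run. A parameter always comes from the parameter file. The Python sketch below simulates that behaviour, with a JSON file standing in for the repository. `run_session` and `$$RunCounter` are hypothetical, and the increment stands in for a variable function such as SETVARIABLE.

```python
import json
import tempfile
from pathlib import Path


def run_session(state_file: Path, initial_value: int = 0) -> int:
    """Simulate one successful session run that adds 1 to a persisted mapping variable."""
    if state_file.exists():
        value = json.loads(state_file.read_text())["$$RunCounter"]  # value saved by the last run
    else:
        value = initial_value  # first run: fall back to the initial value
    value += 1  # stands in for a variable function such as SETVARIABLE
    state_file.write_text(json.dumps({"$$RunCounter": value}))  # "saved to the repository"
    return value


with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "repository_state.json"
    print([run_session(state) for _ in range(3)])  # [1, 2, 3]
```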

With a mapping parameter, we need to edit the value in the parameter file manually after every session run.

What is the method of loading 5 flat files with the same structure into a single target, and which transformations can I use?

There are two methods:
1. Put all the files in one directory and use a file list. Remember to set the source filetype to Indirect in the session.
2. Use a Union transformation to combine the input files into a single target.
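With an Indirect source filetype, the session's source file is a file list: a text file that names one data file per line. The server reads the listed files in turn as if they were one source. The Python sketch below imitates that behaviour with temporary files. The file names and the `read_indirect` helper are hypothetical.

```python
import tempfile
from pathlib import Path


def read_indirect(list_file: Path):
    """Yield rows from every data file named in a file list (one path per line)."""
    for name in list_file.read_text().splitlines():
        name = name.strip()
        if not name:
            continue  # ignore blank lines in the list
        with open(name) as data_file:
            for line in data_file:
                yield line.rstrip("\n")


with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for region in ["east", "west", "north"]:
        p = Path(tmp) / f"sales_{region}.txt"
        p.write_text(f"{region},100\n{region},200\n")
        paths.append(str(p))
    file_list = Path(tmp) / "sales_list.txt"
    file_list.write_text("\n".join(paths) + "\n")
    rows = list(read_indirect(file_list))

print(len(rows))  # 6
```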

In which circumstances does the Informatica server create reject files?

When it encounters DD_REJECT in an Update Strategy transformation.
When a row violates a database constraint.
When a field in the row was truncated or overflowed.

What is the default join that the Source Qualifier provides?

Inner equi-join.

What is the difference between a stored procedure (database level) and a Stored Procedure transformation (Informatica level)? Why should we use the Stored Procedure transformation?

Stored procedures at the database level are series of SQL statements that are stored and compiled on the database server. In Informatica, the Stored Procedure transformation calls these same stored procedures in the database. Stored procedures are used to automate time-consuming tasks that are too complicated for standard SQL statements. If you don't want to use a stored procedure, you have to create an Expression transformation and do all the coding in it.

Informatica Interview Questions - Part 9

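What are variable ports? List two situations when they can be used.

There are mainly three kinds of ports: input, output and variable. An input port represents data flowing into the transformation. An output port is used when data is passed to the next transformation. A variable port holds an intermediate value that is not passed downstream. It is useful in two situations. First, when calculations are required, the result can be reused in other ports. Second, because it keeps its value from one row to the next, it can compare the current row with the previous one, as in the following scenario.

The source has two columns with these rows:
10 A
10 A
20 C
30 D
40 E
20 C

There should be two targets, one showing the duplicate values and another showing the distinct rows:
T1 (duplicates): 10 A, 20 C
T2 (distinct): 10 A, 20 C, 30 D, 40 E

Which transformations can be used to load the data into the targets?

Step 1 (Sorter): Sort the source data on the key.

Step 2 (Expression): Make Flag a variable port placed above the variable port prev_col1, and pass it out through an output port. This way Flag is computed before prev_col1 is overwritten with the current row's value:
Flag = IIF(col1 = prev_col1, 'Y', 'N')
prev_col1 = col1

Step 3 (Router):
1. For duplicate records: condition Flag = 'Y'
2. For distinct records: condition Flag = 'N'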
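The Sorter, Expression and Router steps above can be traced in plain Python. In this sketch, `prev_key` plays the role of the prev_col1 variable port. The flag is computed before `prev_key` is overwritten, which is exactly the port ordering the Expression step depends on. The function name is hypothetical, and the sample data is the source from the scenario.

```python
def route_duplicates(rows, key):
    """Mimic Sorter -> Expression (flag vs previous key) -> Router."""
    distinct_target, duplicate_target = [], []
    prev_key = None  # plays the role of the variable port prev_col1
    for row in sorted(rows, key=key):                    # Sorter
        flag = "Y" if key(row) == prev_key else "N"      # compared before prev_key changes
        prev_key = key(row)                              # then remember this row's key
        (duplicate_target if flag == "Y" else distinct_target).append(row)  # Router
    return distinct_target, duplicate_target


source = [(10, "A"), (10, "A"), (20, "C"), (30, "D"), (40, "E"), (20, "C")]
t_distinct, t_dup = route_duplicates(source, key=lambda r: r)
print(t_distinct)  # [(10, 'A'), (20, 'C'), (30, 'D'), (40, 'E')]
print(t_dup)       # [(10, 'A'), (20, 'C')]
```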

What are the types of lookup caches? 1) Static cache 2) Dynamic cache 3) Persistent cache 4) Reusable cache 5) Shared cache

What are the real-time problems that generally come up while running a mapping or transformation? Explain with examples.

Here are a few real-time examples of problems while running Informatica mappings:

1) Informatica connects to the databases through database connections (for example, ODBC). Production database passwords are often changed periodically. If a new password is not updated on the Informatica side, your mappings will fail with a database connectivity error.
2) If you use an Update Strategy transformation in the mapping, you have to set Treat Source Rows As to Data Driven in the session properties. If you do not, the Informatica server ignores the update flags and only inserts rows.
3) If a mapping loads multiple target tables, you have to set the Target Load Plan in the sequence in which you want them loaded.
4) "Snapshot too old" is a very common error when using Oracle tables. It usually occurs when reading very large tables. Ideally, these loads should be scheduled when the server is not busy (meaning when no other loads are running).
5) Reading from large tables can perform poorly. The source tables should be indexed appropriately and their statistics updated regularly.

Informatica Interview Questions - Part 8

Is sorter an active or passive transformation? What happens if we uncheck the distinct option in sorter? Will it be under active or passive transformation?

The Sorter is an active transformation. If you don't check the Distinct option, it behaves like a passive transformation. The Distinct option is what eliminates duplicate records and changes the row count.

How can we partition a session in Informatica? Partitioning option optimizes parallel processing on multiprocessor hardware by providing a thread-based architecture and built-in data partitioning. GUI-based tools reduce the development effort necessary to create data partitions and streamline ongoing troubleshooting and performance tuning tasks, while ensuring data integrity throughout the execution process. As the amount of data within an organization expands and real-time demand for information grows, the Power Center Partitioning option enables hardware and applications to provide outstanding performance and jointly scale to handle large volumes of data and users.

In an Update Strategy, which target gives more performance, a table or a flat file? Why?

A flat file, with these trade-offs:
Pros: Loading, sorting and merging operations are faster because there are no indexes and the data is in ASCII mode.
Cons: There is no way to update existing records in a flat file, and because there are no indexes, lookups are slower.

What is the difference between constraint-based load ordering and a target load plan?

Constraint-based load ordering

Example: Table 1 --- Master, Table 2 --- Detail

The data in Table 2 (detail) depends on the data in Table 1 (master), so Table 1 should be loaded first. To control the load order of the tables in such cases, we need conditional loading, which is nothing but constraint-based loading. In Informatica this feature is implemented with a single check box at the session level.

A target load plan, by contrast, is set in the Designer. It specifies the order in which the Source Qualifiers (target load order groups) in a mapping are processed.
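Loading parents before children is a topological sort of the foreign-key dependency graph. The Python sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) on a hypothetical star schema. It shows the ordering idea only and is not how the Informatica server computes its load order. Tables with no dependency between them may come out in any order.

```python
from graphlib import TopologicalSorter

# Each target table maps to the tables its foreign keys reference (hypothetical schema).
fk_parents = {
    "sales_fact": {"customer_dim", "product_dim"},
    "customer_dim": set(),
    "product_dim": {"product_category"},
    "product_category": set(),
}

# static_order() yields every table after all of the tables it depends on.
load_order = list(TopologicalSorter(fk_parents).static_order())
print(load_order)
```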

What is parameter file? When you start a workflow, you can optionally enter the directory and name of a parameter file. The Informatica Server runs the workflow using the parameters in the file you specify.

For UNIX shell users, enclose the parameter file name in single quotes: -paramfile '$PMRootDir/myfile.txt'

For Windows command prompt users, the parameter file name cannot have leading or trailing spaces. If the name includes spaces, enclose the file name in double quotes: -paramfile "$PMRootDir\my file.txt"

Note: When you write a pmcmd command that includes a parameter file located on another machine, use the backslash (\) with the dollar sign ($). This ensures that the machine where the variable is defined expands the server variable:
pmcmd startworkflow -UV USERNAME -PV PASSWORD -s SALES:6258 -f east -w wSalesAvg -paramfile '\$PMRootDir/myfile.txt'

Informatica interview questions - Part 7

Define the Informatica repository.

Informatica repository: The Informatica repository is at the center of the Informatica suite. You create a set of metadata tables within the repository database that the Informatica applications and tools access. The Informatica client and server access the repository to save and retrieve metadata.

What are the differences between the Joiner transformation and the Source Qualifier transformation?

The Joiner transformation can join tables from heterogeneous (different) sources, but we still need a common key from both tables. If we join two tables without a common key, we end up with a Cartesian join. The Joiner can join tables from different source systems, whereas the Source Qualifier can join only tables in the same database. Either way, we need a common key to join two tables, no matter whether they are in the same database or in different databases.

How can you improve session performance in an Aggregator transformation?

One way is to supply sorted input to the Aggregator transformation. Where sorted input cannot be supplied, configure the data cache and index cache at the session or transformation level to allocate more space for the aggregation.
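A small Python sketch shows why sorted input helps. Without sorted input, every group is held in memory until the last row has been read. With rows already sorted on the group key, each group can be finished as soon as the key changes. The functions and sample rows are hypothetical. Note that `itertools.groupby` only behaves correctly if the input really is sorted on the key, which is why Sorted Input should only be enabled for data that is genuinely sorted.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_unsorted(rows):
    """Without sorted input: all groups stay in memory until every row is read."""
    totals = {}
    for dept, amount in rows:
        totals[dept] = totals.get(dept, 0) + amount
    return totals


def aggregate_sorted(rows):
    """With input sorted on the group key: emit each group when the key changes,
    so only one group is held at a time."""
    for dept, group in groupby(rows, key=itemgetter(0)):
        yield dept, sum(amount for _, amount in group)


rows = [("HR", 10), ("SALES", 5), ("HR", 7), ("SALES", 3)]
print(aggregate_unsorted(rows))              # {'HR': 17, 'SALES': 8}
print(list(aggregate_sorted(sorted(rows))))  # [('HR', 17), ('SALES', 8)]
```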

What is the difference between connected and unconnected stored procedures?

Unconnected: The unconnected Stored Procedure transformation is not connected directly to the flow of the mapping. It either runs before or after the session, or is called by an expression in another transformation in the mapping.

Connected: In connected mode, the flow of data through the mapping also passes through the Stored Procedure transformation. All data entering the transformation through the input ports affects the stored procedure. You should use a connected Stored Procedure transformation when you need data from an input port sent as an input parameter to the stored procedure, or the results of a stored procedure sent as an output parameter to another transformation.

Informatica interview questions - Part 6

Explain error handling in Informatica with examples.

The bad file, which generally has the format *.bad, contains the records rejected by the Informatica server. It uses two kinds of indicators, one for rows and the other for columns. The row indicators signify which operation was going to take place (i.e. insertion, deletion, updating, etc.). The column indicators show why the column was rejected (such as violation of a not-null constraint, a value error, an overflow, etc.). If you correct the errors in the data in the bad file and then reload it into the target, the table will contain only valid data.

What is a Power Center repository?

Standalone repository: A repository that functions individually, unrelated and unconnected to other repositories.
Global repository: (Power Center only.) The centralized repository in a domain, a group of connected repositories. Each domain can contain one global repository. The global repository can contain common objects to be shared throughout the domain through global shortcuts.
Local repository: (Power Center only.) A repository within a domain that is not the global repository. Each local repository in the domain can connect to the global repository and use objects in its shared folders.

Explain the difference between static and dynamic cache with one example.

Static cache: Once the data is cached, it does not change during the session. Example: an unconnected lookup uses a static cache.
Dynamic cache: The cache is updated to reflect inserts and updates to the table (or source) it refers to. Example: a connected lookup.
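The practical difference shows up when the same new key arrives twice in one run. The Python sketch below uses hypothetical names and tracks keys only. The dynamic cache adds a new key the first time it sees it, so the second occurrence is recognised as existing. The static cache never changes, so it reports both occurrences as inserts. In PowerCenter, a dynamic lookup reports this per row through its NewLookupRow port.

```python
class DynamicLookupCache:
    """Sketch of a dynamic cache: keys not found are added, so later rows see them."""

    def __init__(self, existing_target_keys):
        self._keys = set(existing_target_keys)

    def check_and_insert(self, key):
        """Return 'insert' for a new key (and cache it), or 'existing' otherwise."""
        if key in self._keys:
            return "existing"
        self._keys.add(key)
        return "insert"


def static_check(existing_target_keys, incoming_keys):
    """Static cache: built once and never changed during the run."""
    cache = frozenset(existing_target_keys)
    return ["existing" if k in cache else "insert" for k in incoming_keys]


incoming = ["C1", "C2", "C2"]
dyn = DynamicLookupCache(["C1"])
print([dyn.check_and_insert(k) for k in incoming])  # ['existing', 'insert', 'existing']
print(static_check(["C1"], incoming))               # ['existing', 'insert', 'insert']
```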

What is update strategy transformation? The model you choose constitutes your update strategy, how to handle changes to existing rows. In Power Center and Power Mart, you set your update strategy at two different levels:

Within a session. When you configure a session, you can instruct the Informatica Server to either treat all rows in the same way (for example, treat all rows as inserts), or use instructions coded into the session mapping to flag rows for different database operations. Within a mapping. Within a mapping, you use the Update Strategy transformation to flag rows for insert, delete, update, or reject.

Explain the Informatica server architecture.

The components of the Informatica server are the Load Manager, the Data Transformation Manager (DTM), the reader, temporary storage (memory buffers) and the writer. The load proceeds in four steps. First, the Load Manager starts the session and the DTM. The reader then reads the data from the source into the buffers. The DTM manages the load and passes the data to the writer on a first-in, first-out basis. Finally, the writer takes the data from the buffers and loads it into the target.

What is Data Driven?

With the Data Driven option, the Informatica server follows the instructions coded into Update Strategy transformations within the session mapping to determine how to flag records for insert, update, delete or reject. If you do not choose the Data Driven setting, the Informatica server ignores all Update Strategy transformations in the mapping.

How does the Informatica server sort string values in a Rank transformation?

We can run the Informatica server in either Unicode or ASCII data movement mode.
Unicode mode: The Informatica server sorts the data according to the sort order configured for the session.
ASCII mode: The Informatica server sorts the data in binary order.

When do you use an unconnected lookup and a connected lookup? Or: what is the difference between dynamic and static lookup? Or: why and when do we use dynamic and static lookup?

With a static lookup cache, you cache all the lookup data at the start of the session. Without a cache, the server queries the database to get the lookup value for each record that needs the lookup. Building a static cache adds to the session run time, but it saves time because Informatica does not need to query your database every time it needs a lookup. Depending on how many rows in your mapping need a lookup, you can decide between the two. Also remember that a static lookup cache takes up memory, so select only the columns that are needed. A dynamic cache is different again: it is also built at the start of the session, but it is updated as rows are inserted into or updated in the target.

How do we do unit testing in Informatica? How do we load data in Informatica?

Unit testing in Informatica is of two types: 1. Quantitative testing 2. Qualitative testing

Steps:
1. First validate the mapping.
2. Create a session on the mapping and then run the workflow.

Once the session succeeds, right-click the session and go to the Statistics tab. There you can see how many source rows were applied, how many rows were loaded into the targets and how many rows were rejected. This is called quantitative testing.

Once the rows are loaded successfully, we move on to qualitative testing.

Steps: 1. Take the DATM (the document where all business rules for the corresponding source columns are mentioned) and check whether the data is loaded into the target table according to the DATM. If any data is not loaded according to the DATM, check the code and rectify it.

This is called qualitative testing. This is what a developer does in unit testing.

What are the output files that the Informatica server creates during the session run?

Informatica server log: The Informatica server (on UNIX) creates a log for all status and error messages (default name: pm.server.log). It also creates an error log for error messages. These files are created in the Informatica home directory.

Session log file: The Informatica server creates a session log file for each session. It writes information about the session into the log file, such as the initialization process, the creation of SQL commands for the reader and writer threads, errors encountered and the load summary. The amount of detail in the session log file depends on the tracing level that you set.

Session detail file: This file contains load statistics for each target in the mapping. Session details include information such as the table name and the number of rows written or rejected. You can view this file by double-clicking the session in the monitor window.

Performance detail file: This file contains information known as session performance details, which help you determine where performance can be improved. To generate this file, select the performance detail option in the session property sheet.

Reject file: This file contains the rows of data that the writer does not write to targets.

Control file: The Informatica server creates a control file and a target file when you run a session that uses an external loader. The control file contains information about the target flat file, such as the data format and loading instructions for the external loader.

Post-session email: Post-session email allows you to automatically communicate information about a session run to designated recipients. You can create two different messages: one if the session completed successfully, the other if the session fails.

Indicator file: If you use a flat file as a target, you can configure the Informatica server to create an indicator file. For each target row, the indicator file contains a number to indicate whether the row was marked for insert, update, delete or reject.

Output file: If the session writes to a target file, the Informatica server creates the target file based on the file properties entered in the session property sheet.

Cache files: When the Informatica server creates a memory cache, it also creates cache files.

The Informatica server creates index and data cache files for the following transformations:
1. Aggregator transformation
2. Joiner transformation
3. Rank transformation
4. Lookup transformation

How do you handle decimal places while importing a flat file into Informatica? While importing the flat file definition, specify the scale for the numeric data type. In the mapping, the flat file source supports only the Number data type (not Decimal or Integer). The Source Qualifier (SQ) associated with that source has the Decimal data type for that Number port: source Number port → SQ Decimal port. Integer is not supported, so decimals are handled.

What is the use of incremental aggregation? Explain in brief with an example. It is a session option. When the Informatica server performs incremental aggregation, it passes only the new source data through the mapping and uses historical cache data to perform the new aggregation calculations incrementally. We use it for performance: for example, if new sales rows arrive every day, a daily session can add just that day's rows to the totals kept in the aggregate cache instead of re-aggregating the full history.
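The idea can be sketched in a few lines. This is a toy model under stated assumptions: a JSON file stands in for the aggregate cache (it is not PowerCenter's cache format), and `run_incremental` and the region/amount rows are hypothetical.

```python
# Toy model of incremental aggregation: running totals per group are persisted
# between runs (a JSON file stands in for the aggregate cache), and each run
# folds in only the new source rows. Not PowerCenter's cache format.
import json
from pathlib import Path


def run_incremental(new_rows, cache_path):
    path = Path(cache_path)
    totals = json.loads(path.read_text()) if path.exists() else {}  # historical cache
    for region, amount in new_rows:                                # only the new data
        totals[region] = totals.get(region, 0) + amount
    path.write_text(json.dumps(totals))                            # saved for the next run
    return totals


# Day 1: run_incremental([("north", 100), ("south", 50)], "agg_cache.json")
# Day 2: run_incremental([("north", 25)], "agg_cache.json") -> north 125, south 50
```

The second run reads only one new row, yet the totals cover both days, because the earlier results come from the saved cache.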

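Differences between normalization and the Normalizer transformation? Normalizer: It is a transformation mainly used for COBOL sources; it converts columns into rows, turning a single record with a repeating group of columns (for example, a COBOL OCCURS clause) into multiple rows. Normalization: a design process that removes redundancy and inconsistency.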
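What a Normalizer does with a repeating group can be illustrated with plain Python. This is a sketch of the row-splitting behavior only; the field names (`store`, `q1`..`q4`) are hypothetical, and the `quarter` value plays a role similar to the generated column ID that the Normalizer adds to identify each occurrence.

```python
# One input row with a repeating group (Q1..Q4 sales) becomes four output rows.
# Field names are hypothetical; "quarter" numbers the occurrence within the group.

def normalize(row, repeat_cols=("q1", "q2", "q3", "q4"), key_cols=("store",)):
    out = []
    for occurrence, col in enumerate(repeat_cols, start=1):
        rec = {k: row[k] for k in key_cols}   # carry the non-repeating columns
        rec["quarter"] = occurrence           # which occurrence this row came from
        rec["sales"] = row[col]
        out.append(rec)
    return out


rows = normalize({"store": "S1", "q1": 10, "q2": 20, "q3": 30, "q4": 40})
print(rows[0])  # {'store': 'S1', 'quarter': 1, 'sales': 10}
```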

What is the target load order? You specify the target load order based on the source qualifiers in a mapping. If you have multiple source qualifiers connected to multiple targets, you can designate the order in which the Informatica server loads data into the targets.

What can you do to increase performance or explain Performance tuning in Informatica?

The goal of performance tuning is to optimize session performance so that sessions run during the available load window for the Informatica server. You can increase session performance as follows:

The performance of the Informatica server is related to network connections. Data generally moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster. Thus network connections often affect session performance, so avoid unnecessary network connections.

Flat files: If your flat files are stored on a machine other than the Informatica server, move those files to the machine on which the Informatica server runs.

Relational data sources: Minimize the connections to sources, targets and the Informatica server to improve session performance. Moving the target database onto the server system may improve session performance.

Staging areas: If you use staging areas, you force the Informatica server to perform multiple data passes. Removing staging areas may improve session performance.

You can run multiple Informatica servers against the same repository. Distributing the session load across multiple Informatica servers may improve session performance.

Running the Informatica server in ASCII data movement mode improves session performance, because ASCII data movement mode stores a character value in one byte, whereas Unicode mode takes two bytes to store a character.
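The one-byte versus two-byte difference is easy to see with Python codecs. This is only an illustration: UTF-16-LE is assumed here as a stand-in for a two-byte-per-character Unicode representation, and the Informatica server's internal format is not shown.

```python
# Storage for the same value in a one-byte encoding vs. a two-byte encoding.
# Assumption: UTF-16-LE stands in for the server's Unicode mode.
value = "CUSTOMER_NAME"
ascii_bytes = len(value.encode("ascii"))        # one byte per character
unicode_bytes = len(value.encode("utf-16-le"))  # two bytes per character here
print(ascii_bytes, unicode_bytes)               # 13 26
```

Twice the bytes per character means roughly twice the data to move and buffer for character columns, which is where the ASCII-mode gain comes from.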

If a session joins multiple source tables in one Source Qualifier, optimizing the query may improve performance. Also, single table select statements with an ORDER BY or GROUP BY clause may benefit from optimization such as adding indexes.
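Whether an index helps a GROUP BY or ORDER BY query can be checked from the query plan. The sketch below uses SQLite (via Python's `sqlite3`) as a stand-in for the source database; the table and index names are made up, and other databases have their own EXPLAIN syntax.

```python
# SQLite stands in for the source database; table and index names are made up.
import sqlite3


def plan(conn, sql):
    """Return the detail text of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
query = "SELECT region, SUM(amount) FROM sales GROUP BY region"

before = plan(conn, query)  # scans the table, then sorts with a temporary B-tree
conn.execute("CREATE INDEX idx_sales_region ON sales (region, amount)")
after = plan(conn, query)   # reads rows in index order, so no separate sort step
print(before)
print(after)
```

Here the index on `(region, amount)` already delivers rows grouped by region, so the database can skip the sort that the GROUP BY would otherwise need.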

We can improve session performance by configuring the network packet size, which controls how much data crosses the network at one time. To do this, go to the Server Manager and configure the database connections.

If your target has key constraints and indexes, they slow the loading of data. To improve session performance in this case, drop the constraints and indexes before you run the session and rebuild them after the session completes.
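The drop, load, rebuild pattern can be sketched with SQLite. In PowerCenter this is typically placed in pre- and post-session SQL or a stored procedure; the table, index and function names below are hypothetical.

```python
# "Drop indexes, load, rebuild" sketched with SQLite; names are hypothetical.
import sqlite3


def bulk_load(conn, rows):
    conn.execute("DROP INDEX IF EXISTS idx_orders_cust")   # pre-load: drop the index
    with conn:                                             # load in one transaction
        conn.executemany("INSERT INTO orders (order_id, cust_id) VALUES (?, ?)", rows)
    conn.execute("CREATE INDEX idx_orders_cust ON orders (cust_id)")  # post-load: rebuild


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, cust_id INTEGER)")
conn.execute("CREATE INDEX idx_orders_cust ON orders (cust_id)")
bulk_load(conn, [(i, i % 10) for i in range(1000)])
```

Building the index once over the loaded data is usually cheaper than maintaining it row by row during the insert.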

Running parallel sessions by using concurrent batches also reduces the time needed to load the data, so concurrent batches may also increase session performance.

Partitioning the session improves session performance by creating multiple connections to sources and targets and loading data in parallel pipelines.
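A rough analogy for session partitioning, assuming hash partitioning by key: rows are split into partitions and each partition runs through the same logic in its own worker. This is not Informatica's implementation, and the function names and columns are hypothetical.

```python
# Rough analogy for hash-partitioned processing; not Informatica's implementation.
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n, key):
    """Hash-partition rows so all rows with the same key land in the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts


def transform(part):
    """Stand-in for the mapping logic applied to one partition."""
    return [{**row, "amount_x2": row["amount"] * 2} for row in part]


def run_partitioned(rows, n=4, key="cust_id"):
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(transform, partition(rows, n, key)))
    return [row for part in results for row in part]
```

Python threads mainly help with I/O-bound work such as database connections; the sketch shows the data split and the parallel pipelines, not real CPU speed-up.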

In some cases if a session contains an aggregator transformation, you can use incremental aggregation to improve session performance.

Avoid transformation errors to improve session performance.

If the session contains a Lookup transformation, you can improve session performance by enabling the lookup cache.

If your session contains a Filter transformation, place it as close to the sources as possible, or use a filter condition in the Source Qualifier.

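Aggregator, Rank and Joiner transformations often decrease session performance, because they must group data before processing it. To improve session performance in this case, use the sorted input option where the transformation supports it, feeding it data that is already sorted (for example, by using sorted ports in the Source Qualifier).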
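Why sorted input helps an aggregation: if rows arrive sorted by the group key, each group's result can be emitted as soon as the key changes, so only one group is held in memory; unsorted input needs a cache of every group until the end. A minimal sketch with hypothetical function names:

```python
# Sorted vs. unsorted aggregation; function names are hypothetical.
from itertools import groupby
from operator import itemgetter


def aggregate_sorted(rows):
    """rows: (key, amount) pairs ALREADY sorted by key; one group in memory at a time."""
    for key, group in groupby(rows, key=itemgetter(0)):
        yield key, sum(amount for _, amount in group)


def aggregate_unsorted(rows):
    """Any order: every group is cached until all rows have been read."""
    cache = {}
    for key, amount in rows:
        cache[key] = cache.get(key, 0) + amount
    return list(cache.items())


print(list(aggregate_sorted([("a", 1), ("a", 2), ("b", 5)])))  # [('a', 3), ('b', 5)]
```

The streaming version is only correct when the input really is sorted: fed `[("a", 1), ("b", 5), ("a", 2)]` it emits `a` twice, which is why the sort order must match the group-by ports.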

What is snowflake schema design in a database? The snowflake schema is one of the designs used in database design. It serves the purpose of dimensional modeling in data warehousing. If a dimension table is split into several tables, so that the schema leans towards normalization, the design is a snowflake. It involves deeper joins, because the tables are split further.

Explain the difference between star and snowflake schemas? Star schema: a highly de-normalized technique. A star schema has one fact table associated with numerous dimension tables, and its layout depicts a star.

Snowflake schema: A star schema to which normalization principles have been applied is known as a snowflake schema. Every dimension table is associated with sub-dimension tables.

Differences:

In a star schema, a dimension table does not have a parent table, whereas in a snowflake schema dimension tables have one or more parent tables.

In a star schema, the dimension table itself contains the hierarchies of the dimension, whereas in a snowflake schema the hierarchies are split into different tables. Data can be drilled down from the topmost level of a hierarchy to the lowest.
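A small SQLite example makes the structural difference concrete. The table names and data are illustrative only: the star version keeps the category inside the product dimension, while the snowflake version moves it into a parent table, which costs one extra join for the same answer.

```python
# Minimal star vs. snowflake comparison in SQLite; names and data are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: the product dimension keeps its category hierarchy in one table.
CREATE TABLE dim_product_star (product_key INTEGER PRIMARY KEY, product TEXT, category TEXT);
-- Snowflake: the category level is split into its own (parent) table.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_product_snow (product_key INTEGER PRIMARY KEY, product TEXT,
                               category_key INTEGER REFERENCES dim_category);
CREATE TABLE fact_sales (product_key INTEGER, amount REAL);

INSERT INTO dim_product_star VALUES (1, 'Pen', 'Stationery'), (2, 'Tea', 'Grocery');
INSERT INTO dim_category VALUES (10, 'Stationery'), (20, 'Grocery');
INSERT INTO dim_product_snow VALUES (1, 'Pen', 10), (2, 'Tea', 20);
INSERT INTO fact_sales VALUES (1, 5.0), (1, 7.0), (2, 3.0);
""")

star = conn.execute("""
    SELECT p.category, SUM(f.amount) FROM fact_sales f
    JOIN dim_product_star p ON p.product_key = f.product_key
    GROUP BY p.category ORDER BY p.category""").fetchall()

snowflake = conn.execute("""
    SELECT c.category, SUM(f.amount) FROM fact_sales f
    JOIN dim_product_snow p ON p.product_key = f.product_key
    JOIN dim_category c ON c.category_key = p.category_key   -- the extra join
    GROUP BY c.category ORDER BY c.category""").fetchall()

print(star, snowflake)  # both [('Grocery', 3.0), ('Stationery', 12.0)]
```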

What is the difference between a view and a materialized view? A view is defined by a query that combines data from one or more tables; hence, a view does not hold data of its own. A materialized view, on the other hand, which is usually used in data warehousing, does hold data. This data helps in decision making, performing calculations, etc. The data is stored by computing it beforehand using queries.

When a view is created, no data is stored in the database; the data is produced when a query is run against the view. The data of a materialized view, by contrast, is stored.
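The difference shows up when the base table changes. SQLite has views but no materialized views, so in this sketch a table built with `CREATE TABLE ... AS SELECT` stands in for one (an assumption for illustration); the table names are made up.

```python
# SQLite has no materialized views; a CREATE TABLE ... AS SELECT snapshot stands in.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount REAL);
INSERT INTO sales VALUES ('north', 10), ('south', 5);
-- View: only the query is stored.
CREATE VIEW v_region_totals AS
    SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
-- "Materialized" copy: the query result is computed now and stored.
CREATE TABLE mv_region_totals AS
    SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
""")
conn.execute("INSERT INTO sales VALUES ('north', 20)")  # base data changes afterwards

view_rows = conn.execute("SELECT * FROM v_region_totals ORDER BY region").fetchall()
mv_rows = conn.execute("SELECT * FROM mv_region_totals ORDER BY region").fetchall()
print(view_rows)  # [('north', 30.0), ('south', 5.0)]  recomputed at query time
print(mv_rows)    # [('north', 10.0), ('south', 5.0)]  stored until refreshed
```

Real materialized views add a refresh mechanism; here a "refresh" would mean deleting and re-inserting the snapshot rows.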

What is a junk dimension? A single dimension formed by lumping together a number of small dimensions is called a junk dimension. A junk dimension has unrelated attributes. Grouping random flags and text attributes out of a dimension by moving them into a distinct sub-dimension is related to the junk dimension.
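A junk dimension is often built as every combination of a few unrelated low-cardinality flags, each with its own surrogate key, so a fact row stores one key instead of several flag columns. A minimal sketch; the flag names and values are hypothetical.

```python
# Junk dimension from unrelated low-cardinality flags (names/values are hypothetical).
from itertools import product

FLAGS = {
    "is_gift": ["Y", "N"],
    "payment_type": ["CASH", "CARD"],
    "order_channel": ["WEB", "STORE"],
}


def build_junk_dimension(flags=FLAGS):
    names = list(flags)
    rows = []
    for key, combo in enumerate(product(*flags.values()), start=1):
        rows.append({"junk_key": key, **dict(zip(names, combo))})  # one row per combination
    return rows


junk = build_junk_dimension()
# Lookup used while loading facts: flag values -> junk_key.
lookup = {tuple(r[n] for n in FLAGS): r["junk_key"] for r in junk}
print(len(junk), lookup[("N", "CARD", "STORE")])  # 8 8
```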

What is a degenerate dimension? A degenerate dimension does not have its own dimension table. It is derived from the fact table: it is a column (dimension) that is part of the fact table but does not map to any dimension table, e.g. employee_id.

What are conformed facts and conformed dimensions used for? A conformed fact has the same name and definition in separate tables, so conformed facts can be compared and combined mathematically. Conformed dimensions can be used across multiple data marts. These conformed dimensions have a static structure. Any dimension table that is used by multiple fact tables can be a conformed dimension.

What is the difference between Informatica 7.0 and 8.0? The architecture of PowerCenter 8 has changed a lot:
1. PC8 is service-oriented for modularity, scalability and flexibility.
2. The Repository Service and Integration Service (as replacements for the Repository Server and Informatica Server) can be run on different computers in a network (so-called nodes), even redundantly.
3. Management is centralized: services can be started and stopped on nodes via a central web interface.
4. Client tools access the repository via that centralized machine; resources are distributed dynamically.
5. Running all services on one machine is still possible, of course.
6. It supports unstructured data, including spreadsheets, email, Microsoft Word files, presentations and PDF documents. It provides high availability and seamless failover, eliminating single points of failure.
7. It adds performance improvements: to boost system performance, Informatica has added "pushdown optimization", which moves data transformation processing to the native relational database I/O engine whenever that is most appropriate.
8. Informatica has added more tightly integrated data profiling, cleansing, and matching capabilities.
9. Informatica has added a new web-based administrative console.
10. Ability to write a Custom transformation in C++ or Java.
11. The midstream SQL transformation was added in 8.1.1, not in 8.1.
12. Dynamic configuration of caches and partitioning.
13. The Java transformation is introduced.
14. User-defined functions.
15. PowerCenter 8 has an "Append to Target file" feature.

What is data warehousing? A data warehouse can be considered a storage area where subject-specific or relevant data is stored, irrespective of its source. What is actually required to create a data warehouse can be considered data warehousing. Data warehousing merges data from multiple sources into an easy-to-use, complete form.

What are fact tables and dimension tables? As mentioned, data in a warehouse comes from transactions. A fact table in a data warehouse consists of facts and/or measures, and the data in a fact table is usually numerical. A dimension table, on the other hand, contains fields used to describe the data in fact tables; it provides additional, descriptive information (a dimension) for a field of a fact table. For example, if I want to know the number of resources used for a task, my fact table stores the actual measure (of resources) while my dimension tables store the task and resource details. Hence, the relationship between a dimension table and a fact table is one-to-many: one dimension row can be referenced by many fact rows.

What is the ETL process in data warehousing? ETL stands for extraction, transformation and loading: extracting data from different sources such as flat files, databases or XML data, transforming this data according to the application's needs, and loading it into the data warehouse.
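The three steps can be shown end to end in a few lines. A minimal sketch under stated assumptions: the CSV layout, the column names, the cleanup rules and the `dw_sales` table are all invented, and SQLite stands in for the warehouse.

```python
# Minimal extract -> transform -> load pass; layout, rules and names are assumptions.
import csv
import io
import sqlite3

RAW = """id,name,amount
1, asha ,100.50
2,RAVI,
3,meena,75
"""


def extract(text):
    return list(csv.DictReader(io.StringIO(text)))          # rows as dicts of strings


def transform(rows):
    out = []
    for r in rows:
        if not r["amount"]:                                  # rule: skip rows without amount
            continue
        out.append((int(r["id"]), r["name"].strip().title(), float(r["amount"])))
    return out


def load(conn, rows):
    conn.execute("CREATE TABLE IF NOT EXISTS dw_sales (id INTEGER, name TEXT, amount REAL)")
    with conn:
        conn.executemany("INSERT INTO dw_sales VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW)))
print(conn.execute("SELECT * FROM dw_sales ORDER BY id").fetchall())
# [(1, 'Asha', 100.5), (3, 'Meena', 75.0)]
```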

Explain the difference between data mining and data warehousing? Data mining is a method of analyzing large amounts of data to find patterns. It is normally used for modeling and forecasting: it finds correlations and patterns by sifting through large data repositories using pattern recognition techniques. A data warehouse is the central repository for the data of several business systems in an enterprise. Data from various sources is extracted and selectively organized in the data warehouse for analysis and accessibility.

What is an OLTP system and an OLAP system? OLTP stands for OnLine Transaction Processing. OLTP systems support applications that manage high volumes of transactions. OLTP is based on client-server architecture and supports transactions across networks. OLAP stands for OnLine Analytical Processing. OLAP performs business data analysis and complex calculations, typically with a low volume of transactions over large amounts of historical data. With the support of OLAP, a user can gain insight into data coming from various sources.

What are cubes? Multidimensional data is logically represented by cubes in data warehousing. The dimensions and the data are represented by the edges and the body of the cube, respectively. OLAP environments view the data in the form of a hierarchical cube. A cube typically includes the aggregations that are needed for business intelligence queries.
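The aggregations a cube holds are the totals for every combination of dimensions, including the grand total, similar in spirit to SQL's GROUP BY CUBE. A minimal in-memory sketch; the `cube` function and the sales rows are hypothetical.

```python
# Totals for every subset of dimensions, as a cube would precompute them.
from collections import defaultdict
from itertools import combinations


def cube(rows, dims, measure):
    result = {}
    for r in range(len(dims) + 1):                 # 0 dims = grand total
        for subset in combinations(dims, r):
            totals = defaultdict(float)
            for row in rows:
                totals[tuple(row[d] for d in subset)] += row[measure]
            result[subset] = dict(totals)
    return result


sales = [
    {"region": "north", "year": 2023, "amount": 10.0},
    {"region": "north", "year": 2024, "amount": 20.0},
    {"region": "south", "year": 2024, "amount": 5.0},
]
c = cube(sales, ("region", "year"), "amount")
print(c[("region",)])  # {('north',): 30.0, ('south',): 5.0}
```

With two dimensions there are four groupings (none, region, year, region+year); the number grows as 2^n, which is why cubes are usually built only for the dimensions that queries actually use.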

What is virtual data warehousing? A virtual data warehouse provides a compact view of the data inventory. It contains metadata and uses middleware to build connections to different data sources. It can be fast because it allows users to filter the most important pieces of data from different legacy applications.

What is active data warehousing? An active data warehouse aims to capture data continuously and deliver real-time data. It provides a single integrated view of a customer across multiple business lines, and it is associated with business intelligence systems.

What is the difference between a dependent and an independent data warehouse? A dependent data warehouse stores its data in a central data warehouse. An independent data warehouse, on the other hand, does not use a central data warehouse.

Difference between data modeling and data mining? Data modeling aims to identify all the entities that have data and then defines the relationships between these entities. Data models can be conceptual, logical or physical. Conceptual models are typically used to explore high-level business concepts with stakeholders, logical models are used to explore domain concepts, and physical models are used to explore database design. Data mining is used to examine or explore data using queries, which can be run against the data warehouse. Data mining helps in reporting, planning strategies, finding meaningful patterns, etc. It can be used to turn a large amount of data into a meaningful form.

What is the difference between ER modeling and dimensional modeling? ER modeling produces an ER diagram that represents the entire business or application processes. This diagram can be segregated into multiple dimensional models. That is, an ER model has both a logical and a physical model, while a dimensional model has only a physical model.

What is a data mart? A data mart stores specific data gathered from different sources. That data may belong to a specific community (group of people) or genre. Data marts can be used to focus on specific business needs.

What are the various methods of loading dimension tables?
Conventional load: The data is checked against any table constraints before loading.
Direct or fast load: The data is loaded directly without checking any constraints.

What is the difference between OLAP and a data warehouse? A data warehouse serves as a repository to store historical data that can be used for analysis. OLAP (Online Analytical Processing) can be used to analyze and evaluate data in a warehouse. The warehouse has data coming from varied sources, and OLAP tools help organize the data in the warehouse using multidimensional models.

Describe the foreign key columns in fact tables and dimension tables? The primary keys of entity tables are the foreign keys of dimension tables. The primary keys of dimension tables are the foreign keys of fact tables.

Define the term slowly changing dimensions (SCD)? SCDs are dimensions whose data changes slowly over time. An example is the city of an employee, which changes only occasionally. When such a value changes, the dimension row can be replaced completely without keeping any record of the old value, OR a new row can be inserted, OR the change can be tracked.
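The "insert a new row" option (commonly called SCD Type 2) can be sketched as closing the current row and appending a new current row. The column names (`emp_id`, `city`, `start_date`, `end_date`, `is_current`) and the function are hypothetical; real implementations usually also assign a new surrogate key.

```python
# SCD "new row" approach (commonly called Type 2); column names are hypothetical.
from datetime import date


def scd2_apply(dim_rows, emp_id, new_city, change_date):
    for row in dim_rows:
        if row["emp_id"] == emp_id and row["is_current"]:
            if row["city"] == new_city:
                return dim_rows                 # no change: keep the current row
            row["end_date"] = change_date       # close the old version
            row["is_current"] = False
    dim_rows.append({"emp_id": emp_id, "city": new_city, "start_date": change_date,
                     "end_date": None, "is_current": True})
    return dim_rows


dim = [{"emp_id": 7, "city": "Pune", "start_date": date(2020, 1, 1),
        "end_date": None, "is_current": True}]
scd2_apply(dim, 7, "Delhi", date(2024, 6, 1))
print([(r["city"], r["is_current"]) for r in dim])  # [('Pune', False), ('Delhi', True)]
```

History is preserved: a fact dated before 2024-06-01 can still be joined to the Pune version of the employee.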

What is a star schema? A star schema comprises fact and dimension tables. The fact table contains the facts, or the actual data: usually numerical data, stored in multiple columns and many rows. Dimension tables contain attributes, or smaller granular data. The fact table in a star schema has foreign key references to the dimension tables.

What is the difference between star and snowflake schema? Star schema: A de-normalized technique in which one fact table is associated with several dimension tables. It resembles a star. Snowflake schema: A star schema to which normalization principles are applied is known as a snowflake schema. Every dimension table is associated with sub-dimension tables.

Explain the use of lookup tables and aggregate tables? An aggregate table contains a summarized view of data. Lookup tables, using the primary key of the target, allow records to be updated based on the lookup condition.

What is real-time data warehousing? In real-time data warehousing, the warehouse is updated every time the system performs a transaction, so it reflects the business's real-time information. This means that when a query is run against the warehouse, the state of the business at that moment is returned.

Define non-additive facts? Facts that cannot be summed up across the dimensions present in the fact table are called non-additive facts. These facts can be useful if there are changes in dimensions. For example, profit margin is a non-additive fact, because adding margins up at the account level or the day level has no meaning.
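The profit-margin example can be checked with a little arithmetic. The figures below are invented: summing the daily margins gives a meaningless number, and the correct period margin has to be recomputed from the additive facts (profit and revenue).

```python
# Invented daily figures; profit and revenue are additive, margin is not.
days = [
    {"revenue": 100.0, "profit": 10.0},   # margin 10%
    {"revenue": 300.0, "profit": 60.0},   # margin 20%
]
summed_margins = sum(d["profit"] / d["revenue"] for d in days)   # 0.30: meaningless
true_margin = sum(d["profit"] for d in days) / sum(d["revenue"] for d in days)
print(round(summed_margins, 2), true_margin)  # 0.3 0.175
```

Even averaging the daily margins (0.15) is wrong, because it ignores that the second day had three times the revenue; storing profit and revenue and deriving the margin at query time avoids both errors.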

Define BUS schema? A BUS schema identifies the common dimensions across business processes, that is, it identifies the conformed dimensions. A BUS schema has conformed dimensions and a standardized definition of facts.

What is data cleaning? How can we do that? Data cleaning is the process of identifying erroneous data. The data is checked for accuracy, consistency, typos, etc.

Data cleaning methods:
Parsing - used to detect syntax errors.
Data transformation - confirms that the input data matches the format of the expected data.
Duplicate elimination - gets rid of duplicate entries.
Statistical methods - values such as the mean, standard deviation and range, or clustering algorithms, are used to find erroneous data.
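Two of these methods are easy to sketch: duplicate elimination on a key, and a simple statistical check that flags values far from the mean. The function names, the key and the 2-standard-deviation threshold are assumptions for illustration, not a standard.

```python
# Duplicate elimination and a simple z-score check; threshold and names are assumptions.
from statistics import mean, stdev


def drop_duplicates(rows, key):
    """Keep the first row seen for each key value."""
    seen, out = set(), []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            out.append(row)
    return out


def flag_outliers(values, z=2.0):
    """Return values more than z sample standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    return [v for v in values if s and abs(v - m) / s > z]


print(drop_duplicates([{"id": 1}, {"id": 1}, {"id": 2}], "id"))  # [{'id': 1}, {'id': 2}]
print(flag_outliers([10, 11, 9, 10, 12, 10, 11, 9, 10, 95]))     # [95]
```

The flagged value is a candidate for review, not automatically an error; on small samples a single extreme value also inflates the standard deviation, so the threshold needs care.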

What is the purpose of a factless fact table? Factless fact tables are so called because they contain only keys that refer to the dimension tables. They do not hold real facts (measures) but are commonly used to track events, e.g. to find the number of leaves taken by an employee in a month.

What is the level of granularity of a fact table? A fact table is usually designed at a low level of granularity. This means that we need to find the lowest level of information that can be stored in the fact table. E.g. employee performance is a very high level of granularity, while employee_performance_daily and employee_performance_weekly can be considered lower levels of granularity.

What is a bitmap index? Bitmap indexes use bit arrays (bitmaps) to answer queries by performing bitwise logical operations. They work well with data that has low cardinality, meaning data that takes few distinct values. Bitmap indexes are useful in data warehousing applications, where they have a significant space and performance advantage over other structures for such data. Tables with few insert or update operations are good candidates.

The advantages of bitmap indexes are: they have a highly compressed structure, making them fast to read, and their structure allows the system to combine multiple indexes so that it can access the underlying table faster.

The disadvantage of bitmap indexes is: the overhead of maintaining them is enormous.
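A toy bitmap index shows how the bitwise combination works. One Python integer per distinct value serves as the bitmap, with bit i set when row i has that value; the columns and data are invented, and real databases use compressed on-disk bitmaps.

```python
# Toy bitmap index: one int per distinct value, bit i set if row i has that value.
def build_bitmap_index(column):
    index = {}
    for i, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << i)
    return index


def rows_from_bitmap(bitmap):
    """Positions of the set bits, i.e. the matching row numbers."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


gender = ["F", "M", "F", "F", "M"]
region = ["N", "N", "S", "N", "S"]
g_idx, r_idx = build_bitmap_index(gender), build_bitmap_index(region)
matches = rows_from_bitmap(g_idx["F"] & r_idx["N"])  # gender = 'F' AND region = 'N'
print(matches)  # [0, 3]
```

Each low-cardinality column needs only a few bitmaps, and the AND of two bitmaps answers the combined condition without touching the table. Updating a row means changing bits in several bitmaps, which is the maintenance overhead mentioned above.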

What is Data Cardinality? Cardinality is the term used in database relations to denote the occurrences of data on either side of the relation.

There are 3 basic types of data cardinality:
High data cardinality: Values of a data column are very uncommon, e.g. email IDs and user names.
Normal data cardinality: Values of a data column are somewhat uncommon but never unique, e.g. a data column containing LAST_NAME (there may be several entries with the same last name).
Low data cardinality: Values of a data column are very common, e.g. flag statuses: 0/1.
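A column's data cardinality can be estimated from the ratio of distinct values to rows. A rough sketch: the 0.9 and 0.1 thresholds are arbitrary assumptions, not a standard, and the function name is hypothetical.

```python
# Rough classification by distinct-value ratio; thresholds are arbitrary assumptions.
def classify_cardinality(values, high=0.9, low=0.1):
    ratio = len(set(values)) / len(values)
    if ratio >= high:
        return "high"     # nearly every value is different (e.g. email IDs)
    if ratio <= low:
        return "low"      # a handful of values repeated many times (e.g. 0/1 flags)
    return "normal"       # repeated but varied values (e.g. last names)


print(classify_cardinality(["a@x.com", "b@x.com", "c@x.com"]))   # high
print(classify_cardinality([0, 1] * 50))                         # low
print(classify_cardinality(["Rao", "Rao", "Shah", "Iyer", "Shah"]))  # normal
```

Columns classified as low here are the usual candidates for the bitmap indexes described earlier.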

Determining data cardinality is an important aspect of data modeling. It is used to determine the relationships. Types of cardinalities:
The Link Cardinality - 0:0 relationship
The Sub-type Cardinality - 1:0 relationship
The Physical Segment Cardinality - 1:1 relationship
The Possession Cardinality - 0:M relationship
The Child Cardinality - 1:M mandatory relationship
The Characteristic Cardinality - 0:M relationship
The Paradox Cardinality - 1:M relationship