MANAPPS

V 1.1 2009/01 ETL Benchmarks

Pg 1

ETL Benchmarks

Corrected version V 1.1

Comparing

• DATASTAGE SERVER 7.5
• DATASTAGE PX 7.5
• TALEND OPEN STUDIO 2.4.1
• INFORMATICA 8.1.1
• PENTAHO DATA INTEGRATOR 3.0.0

[email protected]

This document is published under the Creative Commons license:

http://creativecommons.org/licenses/by/3.0/us/

You are free:

to Share — to copy, distribute, display, and perform the work

to Remix — to make derivative works

Under the following conditions:

Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).

• For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to this web page.
• Any of the above conditions can be waived if you get permission from the copyright holder.
• Apart from the remix rights granted under this license, nothing in this license impairs or restricts the author's moral rights.

Table of Contents

You are free:
Under the following conditions:
Table of Contents
General comments
Hardware Configuration
Test 1: File Input Delimited > File Output Delimited
    Scenario:
    Test results:
Test 2: File Input Delimited > Table MySQL Output
    Scenario:
    Test results:
Test 3: Table Oracle Input > File Output Delimited
    Scenario:
    Test results:
Test 4: File Input Delimited > Table Output Oracle BULK
    Scenario:
    Test results:
Test 5: File Input Delimited > Transform > File Output Delimited
    Scenario:
    Test results:
Test 6: Table Input Oracle > Aggregation > Table Output Oracle (ELT)
    Scenario:
    Test results:
Test 7: Tables Input Oracle > Transformation > Tables Output Oracle (ELT)
    Scenario:
    Test results:
Test 8: File Input Delimited > Sort > File Output Delimited
    Scenario:
    Test results:
Test 9: File Input Delimited > Aggregate > File Output Delimited
    Scenario:
    Test results:
Test 10: File Input Delimited > Lookup > File Output Delimited
    Scenario:
    Test results:
Test 11: File Input Delimited > Lookup > File Output Delimited && rejects
    Scenario:
    Test results:

General comments

This document constitutes Version 1.1 of the ETL Benchmark. Version 1.0 showed inaccurate test results for the PowerCenter solution from Informatica, because our tests were carried out with inadequate settings for this product.

An expert from Informatica suggested adapted settings, and the same tests were run again in the same environment, in order to preserve the benchmarking basis across all compared ETL tools.

Using the correct settings for Informatica PowerCenter greatly improves the results obtained by this solution on the same ETL benchmark tests, as detailed in this corrected version of our benchmark.

This Version 1.1 of the benchmark thus includes the updated results and the comparison between all tested tools, and Annex 1 details the changes in the use of the Informatica software.

We are open to comments from all tested vendors, and from other publishers as well, and are ready to give access to our testing conditions so that they can verify the results obtained by their products and suggest applicable best practices.

For the tests with DataStage PX, we used 2 nodes to take advantage of the dual cores and of the parallelization feature of the tool.

Results:

Although it is difficult to summarize this kind of benchmark, and we consider each test to be different, some readers asked us for a global synthesis of the tests.

• Global performance: As requested by some readers after the release of version 1.0 of this ETL Benchmark, we have assigned, for each test, a specific number of points to the tested solutions (5 points to the best, 4 to the second, and so on down to 1 for the fifth). According to this scheme, the results are as follows:

o First: Informatica 8.1.1 (353 points)
o Second: Talend Open Studio 2.4.1 (333 points)
o Third: IBM DataStage PX 7.5 (239 points)
o Fourth: IBM DataStage Server 7.5 (199 points)
o Fifth: Pentaho Data Integration 3.0.0 (148 points)

Below are the detailed results:

Test        TOS 2.4.1   PDI 3.0.0   IBM DS 7.5   IBM DS PX 7.5   INFA PWC 8.1.1
Test1              13           7           19               8               16
Test2               0           0            0               0                0
Test3              13           3            7               9               11
Test4               8           7           12               5               13
Test5              15           4           13              12               18
Test6              15           4           10               5               12
Test7              11           3            7               8               15
Test8              13          12            5              14               16
Test8.2            12          13            4              15               18
Test8.3            12          12            4              15               17
Test9              12           6           15              12               17
Test9.2            16           5           12               9               19
Test9.3            12           8           13              11               16
Test10             20           7           12              10               13
Test10.2           20           6            6              13               16
Test10.3           16           6            6              14               18
Test10.4           12           4            8              17               19
Test11             20           7           10               8               16
Test11.2           20           6            6              12               16
Test11.3           16           6            6              13               19
Test12             20           8           13               6               13
Test12.2           20           7            6              11               16
Test12.3           17           7            5              12               19
Total             333         148          199             239              353

• Open Source ETL & Parallelization: Pentaho Data Integrator claims the first position here, as it is easier to parallelize with PDI. We did, however, find some issues with the way the tool lets you parallelize all the components, and some results are inconsistent.
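The point assignment described under "Global performance" can be illustrated with a short sketch. The Python below is not part of the original benchmark: the function names and the tie rule (tools with equal times share the better rank, and the next tool is pushed down accordingly) are assumptions. With the Test 1 timings reported later in this document, that rule reproduces the Test1 row of the detailed results; other rows may have been scored with a different treatment of ties.

```python
# Illustrative sketch only: the tie rule is an assumption, not documented in the benchmark.
TOOLS = ["TOS 2.4.1", "PDI 3.0.0", "IBM DS 7.5", "IBM DS PX 7.5", "INFA PWC 8.1.1"]

# Test 1 timings per number of lines, in the order of TOOLS (from the Test 1 results).
TEST1_TIMES = {
    100_000: [1.00, 2.00, 2.00, 3.40, 2.00],
    1_000_000: [7.80, 15.50, 4.00, 12.00, 7.00],
    5_000_000: [39.10, 83.80, 12.50, 40.00, 18.00],
    20_000_000: [162.09, 417.80, 66.00, 150.00, 74.00],
}


def award_points(times):
    """Give 5 points to the fastest tool, 4 to the second, down to 1 for the fifth.

    Assumed tie rule: a tool's rank is 1 + the number of strictly faster tools,
    so tied tools share the better rank.
    """
    points = []
    for t in times:
        rank = 1 + sum(1 for other in times if other < t)
        points.append(len(times) + 1 - rank)
    return points


def score_test(times_by_volume):
    """Sum the points of each tool over all data volumes of one test."""
    totals = [0] * len(TOOLS)
    for times in times_by_volume.values():
        for i, p in enumerate(award_points(times)):
            totals[i] += p
    return dict(zip(TOOLS, totals))


test1_scores = score_test(TEST1_TIMES)
print(test1_scores)
```

Because each test is run on several data volumes, a tool can collect more than 5 points per test, which is why the per-test values in the table go up to 20.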

Hardware Configuration

• OS: Windows XP Pro SP2
• CPU: Intel Core2 Duo 2 GHz
• JVM 1.6.0_87
• RAM: 4 GB

Test 1: File Input Delimited > File Output Delimited

Scenario:

Reading X lines from a delimited input file and writing them to a delimited output file.

Extract of the delimited input file:

TALEND OPEN STUDIO

Job name: file_input_delimited__file_output_delimited

Job

Schema of file_input_delimited

PENTAHO DATA INTEGRATION

Job name: file_input_delimited__file_output_delimited

Job

Schema of file_input_delimited

DATASTAGE SERVER

Job name: file_input_delimited__file_output_delimited

Job

Schema of file_input_delimited

DATASTAGE PX

Job name: PX_file_input_delimited__file_output_delimited

Job

Schema of file_input_delimited

INFORMATICA

Job name: file_input_delimited__file_output_delimited

Job

Schema of file_input_delimited

Test results:

Test 1: File Input Delimited > File Output Delimited

Lines                100 000   1 000 000   5 000 000  20 000 000
TOS 2.4.1               1.00        7.80       39.10      162.09
PDI 3.0.0               2.00       15.50       83.80      417.80
IBM DS 7.5              2.00        4.00       12.50       66.00
IBM DS PX 7.5           3.40       12.00       40.00      150.00
INFA PWC 8.1.1          2.00        7.00       18.00       74.00

Statistics:

Ratio compared with TOS 2.4.1 (each tool's result divided by the TOS 2.4.1 result for the same number of lines; a value below 1 means the tool was faster than TOS 2.4.1):

Number of lines   PDI 3.0.0   DataStage 7.5   DataStage PX 7.5   Informatica 8.1.1
100 000                   2               2                3.4                   2
1 000 000              1.99            0.51               1.54                 0.9
5 000 000              2.14            0.32               1.02                0.46
20 000 000             2.58            0.41               0.93                0.47
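The ratios in the statistics are each tool's time divided by the TOS 2.4.1 time for the same number of lines. The Python sketch below recomputes the 1 000 000-line row from the Test 1 results; the function name `ratios_vs_reference` is ours, and rounding to two decimals is an assumption about how the published figures were obtained.

```python
def ratios_vs_reference(times, reference):
    """Return each tool's time divided by the reference tool's time, rounded to 2 decimals."""
    ref = times[reference]
    return {tool: round(t / ref, 2) for tool, t in times.items() if tool != reference}


# Test 1 timings for 1 000 000 lines (from the Test 1 results).
test1_one_million = {
    "TOS 2.4.1": 7.80,
    "PDI 3.0.0": 15.50,
    "IBM DS 7.5": 4.00,
    "IBM DS PX 7.5": 12.00,
    "INFA PWC 8.1.1": 7.00,
}

row = ratios_vs_reference(test1_one_million, "TOS 2.4.1")
print(row)
```

A ratio below 1 means the tool finished faster than TOS 2.4.1 on that volume.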

Test 2: File Input Delimited > Table MySQL Output

Scenario:

Reading X lines from a delimited input file and writing them into a MySQL output table.

Comments:

DataStage 7.5, DataStage PX 7.5 and Informatica 8.1.1 were not tested for this use case. The test was first run with default parameters. To optimize performance, the commit parameter was then adjusted. Finally, the job was parallelized. To parallelize with TOS 2.4.1, we simply split the delimited input file (using the header and limit parameters) and run two sub-jobs in parallel. With PDI 3.0.0, we simply increase the number of copies.
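The splitting approach described for TOS 2.4.1 can be pictured as two readers that each take a contiguous slice of the input file: one starts at the top and stops after the first half, the other skips the first half and reads the rest. The Python sketch below is only an analogy for the header and limit parameters, not Talend-generated code; `read_slice`, the thread pool, and the sample data are assumptions made for illustration.

```python
import io
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def read_slice(text, header, limit):
    """Skip `header` lines, then return at most `limit` lines (None means no limit)."""
    stop = None if limit is None else header + limit
    with io.StringIO(text) as handle:
        return [line.rstrip("\n") for line in islice(handle, header, stop)]


# Hypothetical input: 10 delimited lines standing in for the benchmark file.
data = "".join(f"{i};name{i}\n" for i in range(10))
half = 5

# Two sub-jobs: the first reads lines 0-4, the second skips 5 lines and reads the rest.
with ThreadPoolExecutor(max_workers=2) as pool:
    first = pool.submit(read_slice, data, 0, half)
    second = pool.submit(read_slice, data, half, None)
    parts = first.result() + second.result()

print(len(parts))
```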

TOS 2.4.1 supports extended insert, which is a MySQL feature. This feature limits the number of database accesses and improves performance. With this feature, TOS 2.4.1 is about 6 times faster.
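MySQL's extended insert packs several rows into a single INSERT statement, which is why it reduces the number of database accesses. The Python sketch below only builds such statements as parameterized SQL text; the table name `bench_target`, the column names, and the batch size are hypothetical, and no database connection is made. The `%s` placeholder style is the one used by common Python MySQL drivers.

```python
def extended_inserts(table, columns, rows, batch_size):
    """Yield (sql, params) pairs, each inserting up to `batch_size` rows at once."""
    col_list = ", ".join(columns)
    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        sql = (
            f"INSERT INTO {table} ({col_list}) VALUES "
            + ", ".join([row_placeholder] * len(batch))
        )
        # Flatten the batch so the parameters line up with the placeholders.
        params = [value for row in batch for value in row]
        yield sql, params


# Hypothetical rows; three rows with a batch size of 2 give two statements.
rows = [(1, "Ann"), (2, "Bob"), (3, "Cid")]
statements = list(extended_inserts("bench_target", ["id", "name"], rows, batch_size=2))
for sql, params in statements:
    print(sql, params)
```

With a larger batch size, the same number of rows is sent in fewer statements, which is the effect the extended insert option relies on.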

TALEND OPEN STUDIO

Job name: file_input_delimited__table_output_mysql

Job (Multi-Thread Execution checked on Job Settings)

Schema of file_input_delimited

PENTAHO DATA INTEGRATION

Job name: file_input_delimited__table_output_mysql

Job

Schema of file_input_delimited

Test results:

Test 2: File Input Delimited > Table MySQL Output

Lines                                 100 000   1 000 000   5 000 000
TOS 2.4.1                               15.26      144.50      731.78
PDI 3.0.0                               14.90      151.80      843.90
TOS 2.4.1 with Extended Insert           2.60       25.00      129.00

Statistics:

Ratio compared with TOS 2.4.1:

Number of lines   PDI 3.0.0   TOS 2.4.1 Extended Insert
100 000                0.98                        0.18
1 000 000              1.05                        0.17
5 000 000              1.15                        0.18

Test 3: Table Oracle Input > File Output Delimited

Scenario:

Reading X lines from an Oracle input table and writing them to a delimited output file.

TALEND OPEN STUDIO

Job name: table_input_oracle__file_output_delimited

Job

Schema of table_input_oracle

PENTAHO DATA INTEGRATION

Job name: table_input_oracle__file_output_delimited

Job

SCHEMA VIEWER NOT POSSIBLE

Schema of table_input_oracle

DATASTAGE SERVER

Job name: table_input_oracle__file_output_delimited

Job

Schema of table_input_oracle

DATASTAGE PX

Job name: PX_table_input_oracle__file_output_delimited

Job

Schema of table_input_oracle

INFORMATICA

Job name: table_input_oracle__file_output_delimited

Job

Schema of table_input_oracle

Test results:

Test 3: Table Oracle Input > File Output Delimited

Lines                100 000     500 000   1 000 000
TOS 2.4.1               2.25        6.26       14.25
PDI 3.0.0               4.78       21.20       37.40
IBM DS 7.5              4.00       11.00       19.00
IBM DS PX 7.5           4.00        8.00       15.00
INFA PWC 8.1.1             5           6           9

Statistics:

Ratio compared with TOS 2.4.1:

Number of lines   PDI 3.0.0   DataStage 7.5   DataStage PX 7.5   Informatica 8.1.1
100 000                2.12            1.78               1.78                   2
500 000                3.39            1.76               1.28                0.95
1 000 000              2.62            1.33               1.05                0.63

Test 4: File Input Delimited > Table Output Oracle BULK

Scenario:

Reading X lines from a delimited input file and writing them into an Oracle output table in bulk mode.

TALEND OPEN STUDIO

Job name: file_input_delimited__table_output_oracle_bulk

Job

PENTAHO DATA INTEGRATION

Job name: file_input_delimited__table_output_oracle_bulk

Job

Schema of file_input_delimited

DATASTAGE SERVER

Job name: file_input_delimited__table_output_oracle_bulk

Job

Schema of file_input_delimited

DATASTAGE PX

Job name: PX_file_input_delimited__table_output_oracle_bulk

Job

Schema of file_input_delimited

INFORMATICA

Job name: file_input_delimited__table_output_oracle_bulk

Job

Schema of file_input_delimited

Test results:

Test 4: File Input Delimited > Table Output Oracle BULK

Lines                100 000   1 000 000   2 000 000
TOS 2.4.1               4.36       22.12       49.66
PDI 3.0.0               2.60       30.60       72.70
IBM DS 7.5              3.00       18.00       40.00
IBM DS PX 7.5           6.00       27.00       55.00
INFA PWC 8.1.1             4           7          11

Statistics:

Ratio compared with TOS 2.4.1:

Number of lines   PDI 3.0.0   DataStage 7.5   DataStage PX 7.5   Informatica 8.1.1
100 000                 0.6            0.69               1.38                0.92
1 000 000              1.38            0.81               1.22                0.31
2 000 000              1.46             0.8               1.11                0.22

Test 5: File Input Delimited > Transform > File Output Delimited

Scenario:

Reading X lines from a delimited input file and writing them to a delimited output file after some changes.

Changes list:

• The content of the `rate` field is multiplied by 100.
• The new field `name` is a concatenation (`firstname` + " " + `lastname`).
• The content of the `address` field is converted to uppercase.
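The three changes in this list can be sketched as a row-by-row transformation. The Python below is an illustration, not the job used in the benchmark: the semicolon delimiter, the sample row, and the choice of `rate`, `name` and `address` as the only output columns are assumptions.

```python
import csv
import io


def transform(row):
    """Apply the three changes of Test 5 to one input row (a dict of strings)."""
    return {
        "rate": float(row["rate"]) * 100,                  # rate multiplied by 100
        "name": row["firstname"] + " " + row["lastname"],  # concatenated name
        "address": row["address"].upper(),                 # address in uppercase
    }


# Hypothetical delimited input; the real file layout is not given in the text.
source = io.StringIO("firstname;lastname;address;rate\nJohn;Doe;12 main street;0.25\n")
target = io.StringIO()

reader = csv.DictReader(source, delimiter=";")
writer = csv.DictWriter(target, fieldnames=["rate", "name", "address"],
                        delimiter=";", lineterminator="\n")
writer.writeheader()
for row in reader:
    writer.writerow(transform(row))

output = target.getvalue()
print(output)
```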

Comments:

Pentaho Data Integration has no graphical component for transforming data, so we had to use a custom code component, written in JavaScript. The four other ETL tools provide a transformer component for this. Talend Open Studio also offers custom code components, named tJavaRow and tPerlRow.

TALEND OPEN STUDIO

Job name: file_input_delimited__transformation__file_output_delimited

Job

Schema of file_input_delimited

Schema of file_output_delimited

tMap

PENTAHO DATA INTEGRATION

Job name: file_input_delimited__transformation__file_output_delimited

Job

Schema of file_input_delimited

Schema of file_output_delimited

JavaScript Custom Code

Select Values

Select Values

DATASTAGE SERVER

Job name: file_input_delimited__transformation__file_output_delimited

Job

Schema of file_input_delimited

Schema of file_output_delimited

Transformer

DATASTAGE PX

Job name: PX_file_input_delimited__transformation__file_output_delimited

Job

Schema of file_input_delimited

Schema of file_output_delimited

Transformer

INFORMATICA

Job name: file_input_delimited__transformation__file_output_delimited

Job

Schema of file_input_delimited

Schema of file_output_delimited

Mapping

Test results:

Test 5: File Input Delimited > Transform > File Output Delimited

Lines                100 000   1 000 000   5 000 000  20 000 000
TOS 2.4.1               1.30        8.50       43.10      183.13
PDI 3.0.0               5.30       51.00      259.40     1126.10
IBM DS 7.5              2.00       10.00       56.00      178.00
IBM DS PX 7.5           4.75       11.33       41.00      155.00
INFA PWC 8.1.1          3.00        6.00       17.00       74.00

Statistics:

Ratio compared with TOS 2.4.1:

Number of lines   PDI 3.0.0   DataStage 7.5   DataStage PX 7.5   Informatica 8.1.1
100 000                4.07            1.54               3.65                 2.3
1 000 000                 6            1.18               1.33                 0.7
5 000 000              6.02             1.3               0.95                0.39
20 000 000             6.16            0.97               0.84                 0.4

Test 6: Table Input Oracle > Aggregation > Table Output Oracle (ELT)

Scenario:

Reading X lines from Oracle input tables and writing them into other Oracle output tables (ELT mode).

Comments:

Only Talend Open Studio allows the use of an ELT mode. Informatica offers Pushdown Optimization, but we did not find this feature in the tool.
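In ELT mode the aggregation is executed by the database itself rather than by the ETL engine, so the rows never leave the database. The sketch below illustrates the idea with a single INSERT ... SELECT statement. It uses Python's built-in `sqlite3` module as a stand-in for Oracle, and the table names `person` and `age_count` are hypothetical; only the grouping by age with a count is suggested by the job names of this test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the Oracle instance
conn.execute("CREATE TABLE person (id INTEGER, age INTEGER)")
conn.execute("CREATE TABLE age_count (age INTEGER, nb INTEGER)")
conn.executemany(
    "INSERT INTO person (id, age) VALUES (?, ?)",
    [(1, 30), (2, 30), (3, 41), (4, 30), (5, 41)],
)

# ELT: one statement reads, aggregates and writes entirely inside the database.
conn.execute(
    "INSERT INTO age_count (age, nb) "
    "SELECT age, COUNT(*) FROM person GROUP BY age"
)
conn.commit()

result = conn.execute("SELECT age, nb FROM age_count ORDER BY age").fetchall()
print(result)
conn.close()
```

An ETL-style job would instead fetch every source row into the tool, aggregate there, and send the result back, which is consistent with the gap between TOS 2.4.1 and the other tools growing with the number of lines in this test.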

TALEND OPEN STUDIO

Job name: ELT__table_input_oracle__aggregate_group_by_age_count__table_output_oracle

Job (ELT)

Schema of table_input_oracle

PENTAHO DATA INTEGRATION

Job name: table_input_oracle__aggregate_group_by_age_count__table_output_oracle

Job

SCHEMA VIEWER NOT POSSIBLE

Schema of table_input_oracle

DATASTAGE SERVER

Job name: table_input_oracle__aggregate_group_by_age_count__table_output_oracle

Job

Schema of table_input_oracle

DATASTAGE PX

Job name: PX_table_input_oracle__aggregate_group_by_age_count__table_output_oracle

Job

Schema of table_input_oracle

INFORMATICA

Job name: table_input_oracle__aggregate_group_by_age_count__table_output_oracle

Job

Schema of table_input_oracle

Test results:

Test 6: Table Input Oracle > Aggregation > Table Output Oracle (ELT)

Lines                100 000     500 000   1 000 000
TOS 2.4.1               1.24         1.4        1.69
PDI 3.0.0               4.26       22.26       47.80
IBM DS 7.5              2.40        8.00       13.67
IBM DS PX 7.5           8.00       12.00       17.50
INFA PWC 8.1.1             4           3           4

Statistics:

Ratio compared with TOS 2.4.1:

Number of lines   PDI 3.0.0   DataStage 7.5   DataStage PX 7.5   Informatica 8.1.1
100 000                3.44            1.94               6.45                3.22
500 000                15.9            5.71               8.57                2.14
1 000 000             28.28            8.09              10.36                2.36

Test 7: Tables Input Oracle > Transformation > Tables Output Oracle (ELT)

Scenario:

Reading X lines from Oracle input tables and writing them into other Oracle output tables (ELT mode) after some changes.

TALEND OPEN STUDIO

Job name: table_input_oracle__elt__table_output_oracle

Job (ELT)

Schema of table_lookup_oracle

Schema of table_input_oracle

PENTAHO DATA INTEGRATION

Job name: table_input_oracle__elt__table_output_oracle

Job

SCHEMA VIEWER NOT POSSIBLE

Schema of table_lookup_oracle

SCHEMA VIEWER NOT POSSIBLE

Schema of table_input_oracle


DATASTAGE SERVER

Job name: table_input_oracle__elt__table_output_oracle

Job

Schema of table_lookup_oracle

Schema of table_input_oracle


DATASTAGE PX

Job name: PX_table_input_oracle__elt__table_output_oracle

Job

Schema of table_lookup_oracle

Schema of table_input_oracle


INFORMATICA

Job name: table_input_oracle__elt__table_output_oracle

Job

Schema of table_lookup_oracle


Schema of table_input_oracle

Test results:

Test 7: Tables Input Oracle > Transformation > Tables Output Oracle (ELT)

| Lines | 100 000 | 500 000 | 1 000 000 |
|---|---|---|---|
| TOS 2.4.1 | 5,99 | 23,26 | 52,72 |
| PDI 3.0.0 | 38,35 | 201,60 | 382,60 |
| IBM DS 7.5 | 12,70 | 65,00 | 116,00 |
| IBM DS PX 7.5 | 15,00 | 30,50 | 47,50 |
| INFA PWC 8.1.1 | 5 | 9 | 14 |

Statistics:

Ratio compared with TOS 2.4.1

| Number of lines | PDI 3.0.0 | DataStage 7.5 | DataStage PX 7.5 | Informatica 8.1.1 |
|---|---|---|---|---|
| 100 000 | 6,4 | 2,12 | 2,5 | 0,83 |
| 500 000 | 8,67 | 2,79 | 1,31 | 0,39 |
| 1 000 000 | 7,26 | 2,2 | 0,9 | 0,27 |


Test 8: File Input Delimited > Sort > File Output Delimited

Scenario:

Reading X lines from a delimited input file and writing them, sorted, to a delimited output file.

Sorts list:

• Order by the integer field `age` ASC.

• Order by the string field `firstname` ASC.

• Order by the fields `age` and `firstname` ASC.
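The three sorts listed above can be sketched as follows. This is a minimal in-memory illustration, not the implementation of any of the benchmarked tools; the `;` delimiter, the `id` column and the sample rows are assumptions.

```python
import csv
import io


def sort_rows(rows, keys):
    """Sort dict rows ascending on `keys`; `age` is compared as an integer."""
    def sort_key(row):
        return tuple(int(row[k]) if k == "age" else row[k] for k in keys)
    return sorted(rows, key=sort_key)


# Hypothetical delimited input with the two fields named in the scenario.
sample = "id;firstname;age\n1;Zoe;30\n2;Adam;30\n3;Marc;25\n"
rows = list(csv.DictReader(io.StringIO(sample), delimiter=";"))

by_age = sort_rows(rows, ["age"])
by_firstname = sort_rows(rows, ["firstname"])
by_age_and_firstname = sort_rows(rows, ["age", "firstname"])
```

Converting `age` to an integer before comparing matters: compared as strings, "100" would sort before "25".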

Comments:

With the version used, I could not sort in memory with Pentaho Data Integrator, but the feature is present in the latest version.

On Talend Open Studio, with a large volume (5 000 000 and 20 000 000 lines), we have to use the component tExternalSort, which relies on GNU sort, an external sorting program.
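When a file is too large to sort in memory, an external sort is the usual technique: sort chunks that fit in memory, write each sorted run to disk, then merge the runs. The sketch below shows that general idea in Python. It is not how tExternalSort works internally (the text above only says that it calls GNU sort); the function name, the chunk size and the sample lines are assumptions.

```python
import heapq
import os
import tempfile


def external_sort(lines, key, chunk_size=100_000):
    """Yield text lines in sorted order while holding at most `chunk_size` lines in memory.

    Sorted runs are spilled to temporary files, then merged lazily.
    """
    run_paths = []
    chunk = []

    def spill():
        # Sort the current chunk and write it out as one sorted run.
        chunk.sort(key=key)
        fd, path = tempfile.mkstemp(suffix=".run", text=True)
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.writelines(chunk)
        run_paths.append(path)
        chunk.clear()

    for line in lines:
        chunk.append(line if line.endswith("\n") else line + "\n")
        if len(chunk) >= chunk_size:
            spill()
    if chunk:
        spill()

    handles = [open(path, encoding="utf-8") for path in run_paths]
    try:
        # heapq.merge reads one line per run at a time.
        yield from heapq.merge(*handles, key=key)
    finally:
        for handle in handles:
            handle.close()
        for path in run_paths:
            os.remove(path)


# Tiny demonstration with a chunk size of 2, so that two runs are created.
lines = ["3;Marc;25\n", "1;Zoe;30\n", "2;Adam;19\n"]
age_of = lambda line: int(line.rstrip("\n").split(";")[2])
result = list(external_sort(lines, key=age_of, chunk_size=2))
```

The merge phase reads the runs sequentially, so memory use depends on the chunk size and the number of runs rather than on the total volume.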


TALEND OPEN STUDIO

Job names:

• file_input_delimited__sort_on_age__file_output_delimited

• file_input_delimited__sort_on_firstname__file_output_delimited

• file_input_delimited__sort_on_firstname_and_age__file_output_delimited

Job

Schema of file_input_delimited


PENTAHO DATA INTEGRATION

Job names:

• file_input_delimited__sort_on_age__file_output_delimited

• file_input_delimited__sort_on_firstname__file_output_delimited

• file_input_delimited__sort_on_firstname_and_age__file_output_delimited

Job

Schema of file_input_delimited


DATASTAGE SERVER

Job names:

• file_input_delimited__sort_on_age__file_output_delimited

• file_input_delimited__sort_on_firstname__file_output_delimited

• file_input_delimited__sort_on_firstname_and_age__file_output_delimited

Job

Schema of file_input_delimited


DATASTAGE PX

Job names:

• PX_file_input_delimited__sort_on_age__file_output_delimited

• PX_file_input_delimited__sort_on_firstname__file_output_delimited

• PX_file_input_delimited__sort_on_firstname_and_age__file_output_delimited

Job

Schema of file_input_delimited


INFORMATICA

Job names:

• file_input_delimited__sort_on_age__file_output_delimited

• file_input_delimited__sort_on_firstname__file_output_delimited

• file_input_delimited__sort_on_firstname_and_age__file_output_delimited

Job

Schema of file_input_delimited


Test results:

Test 8: File Input Delimited > Sort > File Output Delimited

Sorted by age

| Lines | 100 000 | 1 000 000 | 5 000 000 | 20 000 000 |
|---|---|---|---|---|
| TOS 2.4.1 | 1,44 | 15,73 | 188,21 | 1016,03 |
| PDI 3.0.0 | 3,63 | 32,85 | 155,95 | 668,20 |
| IBM DS 7.5 | 4,20 | 60,70 | 267,70 | |
| IBM DS PX 7.5 | 4,00 | 16,25 | 64,50 | 492,67 |
| INFA PWC 8.1.1 | 5,00 | 13,00 | 50,00 | 201,00 |

Statistics:

Ratio compared with TOS 2.4.1

| Number of lines | PDI 3.0.0 | DataStage 7.5 | DataStage PX 7.5 | Informatica 8.1.1 |
|---|---|---|---|---|
| 100 000 | 2,51 | 2,92 | 2,78 | 3,47 |
| 1 000 000 | 2,09 | 3,86 | 1,03 | 0,82 |
| 5 000 000 | 0,83 | 1,42 | 0,34 | 0,26 |
| 20 000 000 | 0,66 | +++ | 0,48 | 0,2 |

Test 8: File Input Delimited > Sort > File Output Delimited

Sorted by firstname

| Lines | 100 000 | 1 000 000 | 5 000 000 | 20 000 000 |
|---|---|---|---|---|
| TOS 2.4.1 | 1,69 | 18,05 | 168,46 | 1071,20 |
| PDI 3.0.0 | 3,40 | 31,20 | 157,15 | 739,20 |
| IBM DS 7.5 | 6,00 | 58,00 | 426,00 | |
| IBM DS PX 7.5 | 4,00 | 16,00 | 57,00 | 624,00 |
| INFA PWC 8.1.1 | 4,00 | 13,00 | 51,00 | 223,00 |

Statistics:

Ratio compared with TOS 2.4.1

| Number of lines | PDI 3.0.0 | DataStage 7.5 | DataStage PX 7.5 | Informatica 8.1.1 |
|---|---|---|---|---|
| 100 000 | 2,01 | 3,55 | 2,37 | 2,36 |
| 1 000 000 | 1,73 | 3,21 | 0,89 | 0,72 |
| 5 000 000 | 0,93 | 2,53 | 0,34 | 0,3 |
| 20 000 000 | 0,69 | +++ | 0,58 | 0,21 |

Test 8: File Input Delimited > Sort > File Output Delimited

Sorted by age and firstname

| Lines | 100 000 | 1 000 000 | 5 000 000 | 20 000 000 |
|---|---|---|---|---|
| TOS 2.4.1 | 1,33 | 17,40 | 225,03 | 1007,00 |
| PDI 3.0.0 | 3,22 | 29,27 | 159,10 | 842,20 |
| IBM DS 7.5 | 7,33 | 60,00 | 360,00 | |
| IBM DS PX 7.5 | 4,50 | 16,33 | 59,00 | 582,50 |
| INFA PWC 8.1.1 | 5,00 | 13,00 | 49,00 | 211,00 |

Statistics:

Ratio compared with TOS 2.4.1

| Number of lines | PDI 3.0.0 | DataStage 7.5 | DataStage PX 7.5 | Informatica 8.1.1 |
|---|---|---|---|---|
| 100 000 | 2,42 | 5,51 | 3,38 | 3,75 |
| 1 000 000 | 1,68 | 3,45 | 0,94 | 0,74 |
| 5 000 000 | 0,71 | 1,6 | 0,26 | 0,22 |
| 20 000 000 | 0,84 | +++ | 0,58 | 0,21 |


Test 9: File Input Delimited > Aggregate > File Output Delimited

Scenario:

Reading X lines from a delimited input file, performing an aggregation and writing the result of the operations to a delimited output file.

1 – Group by the field `age`; Operation: COUNT.

2 – Group by the field `age`; Operations: COUNT, SUM(rate), AVG(rate), MIN(rate), MAX(rate).

3 – Group by the field `firstname`; Operations: COUNT.
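The three aggregations above can be sketched with a single pass that keeps one running accumulator per group. This is an illustration under assumptions (in-memory dict rows, numeric `rate` values, a hypothetical `aggregate` function), not the implementation used by any of the tools.

```python
def aggregate(rows, group_field, value_field=None):
    """Group rows by `group_field`.

    COUNT is always computed; SUM, AVG, MIN and MAX of `value_field`
    are added when a value field is given.
    """
    result = {}
    for row in rows:
        stats = result.setdefault(row[group_field], {"count": 0})
        stats["count"] += 1
        if value_field is not None:
            value = row[value_field]
            stats["sum"] = stats.get("sum", 0) + value
            stats["min"] = min(stats.get("min", value), value)
            stats["max"] = max(stats.get("max", value), value)
    if value_field is not None:
        for stats in result.values():
            stats["avg"] = stats["sum"] / stats["count"]
    return result


# Hypothetical sample rows using the field names from the scenario.
rows = [
    {"firstname": "Anna", "age": 30, "rate": 2.0},
    {"firstname": "Bob", "age": 30, "rate": 4.0},
    {"firstname": "Anna", "age": 25, "rate": 1.0},
]
count_by_age = aggregate(rows, "age")
stats_by_age = aggregate(rows, "age", "rate")
count_by_firstname = aggregate(rows, "firstname")
```

Memory use grows with the number of distinct group keys, which is small for `age` but can be large for `firstname`.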

Comments:

When the output flow is too big (here, aggregating by firstname on a large volume), we have to use the tSortedAggregateRow component on Talend Open Studio. This component sorts the rows before the aggregation. In this case, Pentaho Data Integrator failed.
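Sorting the rows before aggregating means that all rows of a group arrive together, so only one group has to be held at a time. The sketch below shows this general technique with `itertools.groupby`; it is an illustration only, and nothing here is taken from the Talend component itself.

```python
from itertools import groupby
from operator import itemgetter


def count_sorted(rows, field):
    """Yield (key, count) pairs from rows that are already sorted on `field`.

    Each group is counted and emitted as soon as the key changes.
    """
    for key, group in groupby(rows, key=itemgetter(field)):
        yield key, sum(1 for _ in group)


# Hypothetical sample; on a large volume the sort itself would be external.
rows = [{"firstname": n} for n in ["Zoe", "Adam", "Zoe", "Marc", "Adam", "Zoe"]]
rows.sort(key=itemgetter("firstname"))
counts = list(count_sorted(rows, "firstname"))
```

The trade-off is the cost of the sort, which may explain part of the longer times measured on the largest volumes, although the source does not break the timings down.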


TALEND OPEN STUDIO

Job names:

• file_input_delimited__aggregate_group_by_age_count__file_output_delimited

• file_input_delimited__aggregate_group_by_age_count_sum_avg_min_max__file_output_delimited

• file_input_delimited__aggregate_group_by_firstname_count__file_output_delimited

Job

Job using the tExternalSortRow component


Schema of file_input_delimited

Schema of file_output_delimited

file_input_delimited__aggregate_group_by_age_count__file_output_delimited


PENTAHO DATA INTEGRATION

Job names:

• file_input_delimited__aggregate_group_by_age_count__file_output_delimited

• file_input_delimited__aggregate_group_by_age_count_sum_avg_min_max__file_output_delimited

• file_input_delimited__aggregate_group_by_firstname_count__file_output_delimited

Job

Schema of file_input_delimited

Schema of file_output_delimited

file_input_delimited__aggregate_group_by_age_count__file_output_delimited


DATASTAGE SERVER

Job names:

• file_input_delimited__aggregate_group_by_age_count__file_output_delimited

• file_input_delimited__aggregate_group_by_age_count_sum_avg_min_max__file_output_delimited

• file_input_delimited__aggregate_group_by_firstname_count__file_output_delimited

Job

Schema of file_input_delimited

Schema of file_output_delimited

file_input_delimited__aggregate_group_by_age_count__file_output_delimited


DATASTAGE PX

Job names:

• PX_file_input_delimited__aggregate_group_by_age_count__file_output_delimited

• PX_file_input_delimited__aggregate_group_by_age_count_sum_avg_min_max__file_output_delimited

• PX_file_input_delimited__aggregate_group_by_firstname_count__file_output_delimited

Job

Schema of file_input_delimited

Schema of file_output_delimited

file_input_delimited__aggregate_group_by_age_count__file_output_delimited


INFORMATICA

Job names:

• file_input_delimited__aggregate_group_by_age_count__file_output_delimited

• file_input_delimited__aggregate_group_by_age_count_sum_avg_min_max__file_output_delimited

• file_input_delimited__aggregate_group_by_firstname_count__file_output_delimited

Job

Schema of file_input_delimited

Schema of file_output_delimited

file_input_delimited__aggregate_group_by_age_count__file_output_delimited


Test results:

Test 9: File Input Delimited > Aggregate > File Output Delimited

Group by Age (Count)

| Lines | 100 000 | 1 000 000 | 5 000 000 | 20 000 000 |
|---|---|---|---|---|
| TOS 2.4.1 | 0,62 | 6,99 | 30,05 | 124,16 |
| PDI 3.0.0 | 2,70 | 26,53 | 134,30 | 466,50 |
| IBM DS 7.5 | 2,00 | 6,00 | 21,00 | 128,00 |
| IBM DS PX 7.5 | 4,00 | 6,50 | 21,33 | 78,00 |
| INFA PWC 8.1.1 | 3,00 | 5,00 | 8,00 | 27,00 |

Statistics:

Ratio compared with TOS 2.4.1

| Number of lines | PDI 3.0.0 | DataStage 7.5 | DataStage PX 7.5 | Informatica 8.1.1 |
|---|---|---|---|---|
| 100 000 | 4,35 | 3,23 | 6,45 | 4,84 |
| 1 000 000 | 3,8 | 0,86 | 0,93 | 0,72 |
| 5 000 000 | 4,47 | 0,7 | 0,71 | 0,27 |
| 20 000 000 | 3,76 | 1,03 | 0,63 | 0,22 |

Test 9: File Input Delimited > Aggregate > File Output Delimited

Group by Age (Count, Sum(Rate), Avg(Rate), Min(Rate), Max(Rate))

| Lines | 100 000 | 1 000 000 | 5 000 000 | 20 000 000 |
|---|---|---|---|---|
| TOS 2.4.1 | 0,84 | 7,44 | 37,61 | 139,12 |
| PDI 3.0.0 | 2,60 | 25,20 | 138,30 | 426,00 |
| IBM DS 7.5 | 2,00 | 11,00 | 50,00 | 184,00 |
| IBM DS PX 7.5 | 11,25 | 15,33 | 33,50 | 254,33 |
| INFA PWC 8.1.1 | 2,00 | 6,00 | 12,00 | 38,00 |

Statistics:

Ratio compared with TOS 2.4.1

| Number of lines | PDI 3.0.0 | DataStage 7.5 | DataStage PX 7.5 | Informatica 8.1.1 |
|---|---|---|---|---|
| 100 000 | 3,1 | 2,38 | 13,39 | 2,38 |
| 1 000 000 | 3,39 | 1,48 | 2,06 | 0,8 |
| 5 000 000 | 3,68 | 1,33 | 0,89 | 0,31 |
| 20 000 000 | 3,06 | 1,32 | 1,91 | 0,27 |

Test 9: File Input Delimited > Aggregate > File Output Delimited

Group by FirstName (Count)

| Lines | 100 000 | 1 000 000 | 5 000 000 | 20 000 000 |
|---|---|---|---|---|
| TOS 2.4.1 | 0,86 | 7,89 | 198,79 | 928,08 |
| PDI 3.0.0 | 2,70 | 29,70 | 162,30 | 544,00 |
| IBM DS 7.5 | 2,00 | 14,00 | 68,00 | 424,00 |
| IBM DS PX 7.5 | 4,50 | 11,00 | 40,00 | 505,00 |
| INFA PWC 8.1.1 | 4 | 9 | 23 | 85 |

Statistics:

Ratio compared with TOS 2.4.1

| Number of lines | PDI 3.0.0 | DataStage 7.5 | DataStage PX 7.5 | Informatica 8.1.1 |
|---|---|---|---|---|
| 100 000 | 3,14 | 2,33 | 5,23 | 4,65 |
| 1 000 000 | 3,76 | 1,77 | 1,39 | 1,14 |
| 5 000 000 | 0,82 | 0,34 | 0,2 | 0,12 |
| 20 000 000 | 0,59 | 0,46 | 0,54 | 0,092 |


Test 10: File Input Delimited > Lookup > File Output Delimited

Scenario:

Reading X lines from a delimited input file and looking up 4 fields in another delimited input file using the id_client column. The join result is written to a delimited output file.
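A lookup of this kind is commonly implemented as a hash join: the lookup file is indexed by `id_client`, then each main row is enriched with the looked-up fields. The sketch below is a minimal illustration; the four lookup field names and the sample rows are hypothetical, and dropping rows without a match is an assumption, since the scenario does not say how they are handled.

```python
def lookup_join(main_rows, lookup_rows, fields, key="id_client"):
    """Index the lookup rows by `key`, then add `fields` to each matching main row."""
    index = {row[key]: row for row in lookup_rows}
    for row in main_rows:
        match = index.get(row[key])
        if match is None:
            continue  # assumption: unmatched rows are dropped (inner join)
        yield {**row, **{field: match[field] for field in fields}}


# Hypothetical sample data; only `id_client` is named in the scenario.
main = [{"id_client": 1, "age": 30}, {"id_client": 3, "age": 41}]
lookup = [
    {"id_client": 1, "city": "Paris", "zip": "75001", "country": "FR", "phone": "0100000000"},
    {"id_client": 2, "city": "Lyon", "zip": "69001", "country": "FR", "phone": "0400000000"},
]
joined = list(lookup_join(main, lookup, fields=["city", "zip", "country", "phone"]))
```

With this pattern the whole lookup file sits in memory while the main file is streamed, which is one plausible reason why the size of the lookup file is varied in the results.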


TALEND OPEN STUDIO

Job name: file_input_delimited__file_lookup_delimited__file_output_delimited

Job

Schema of file_input_delimited

Schema of file_lookup_delimited


Schema of file_output_delimited

tMap Component


PENTAHO DATA INTEGRATION

Job name: file_input_delimited__file_lookup_delimited__file_output_delimited

Job

Schema of file_input_delimited

Schema of file_lookup_delimited


Schema of file_output_delimited

Mapping Component


DATASTAGE SERVER

Job name: file_input_delimited__file_lookup_delimited__file_output_delimited

Job

Schema of file_input_delimited


Schema of file_lookup_delimited

Schema of file_output_delimited


Transformer Component


DATASTAGE PX

Job name: PX_file_input_delimited__file_lookup_delimited__file_output_delimited

Job

Schema of file_input_delimited


Schema of file_lookup_delimited

Schema of file_output_delimited

Transformer Component


INFORMATICA

Job name: file_input_delimited__file_lookup_delimited__file_output_delimited

Job

Schema of file_input_delimited

Schema of file_lookup_delimited


Schema of file_output_delimited

Transformer Component


Test results:

Test 10: File Input Delimited > Lookup > File Output Delimited

Lookup 100 000 rows ~7MB

| Lines | 100 000 | 1 000 000 | 5 000 000 | 20 000 000 |
|---|---|---|---|---|
| TOS 2.4.1 | 1,45 | 6,39 | 28,72 | 108,37 |
| PDI 3.0.0 | 4,14 | 21,40 | 87,60 | 288,90 |
| IBM DS 7.5 | 5,00 | 10,60 | 33,00 | 139,00 |
| IBM DS PX 7.5 | 5,00 | 12,20 | 40,00 | 122,00 |
| INFA PWC 8.1.1 | 5,00 | 11,00 | 32,00 | 116,00 |

Statistics:

Ratio compared with TOS 2.4.1

| Number of lines | PDI 3.0.0 | DataStage 7.5 | DataStage PX 7.5 | Informatica 8.1.1 |
|---|---|---|---|---|
| 100 000 | 2,86 | 3,45 | 3,45 | 3,44 |
| 1 000 000 | 3,35 | 1,66 | 1,91 | 1,72 |
| 5 000 000 | 3,05 | 1,15 | 1,39 | 1,11 |
| 20 000 000 | 2,67 | 1,28 | 1,13 | 1,07 |

Test 10: File Input Delimited > Lookup > File Output Delimited

Lookup 500 000 rows ~34MB

| Lines | 100 000 | 1 000 000 | 5 000 000 | 20 000 000 |
|---|---|---|---|---|
| TOS 2.4.1 | 3,9 | 8,89 | 32,36 | 115,67 |
| PDI 3.0.0 | 7,90 | 24,50 | 97,40 | 291,10 |
| IBM DS 7.5 | 28,00 | 33,00 | 56,00 | 195,00 |
| IBM DS PX 7.5 | 7,00 | 13,00 | 40,00 | 122,00 |
| INFA PWC 8.1.1 | 4,00 | 11,00 | 33,00 | 122,00 |

Statistics:

Ratio compared with TOS 2.4.1

| Number of lines | PDI 3.0.0 | DataStage 7.5 | DataStage PX 7.5 | Informatica 8.1.1 |
|---|---|---|---|---|
| 100 000 | 2,03 | 7,18 | 1,79 | 1,03 |
| 1 000 000 | 2,76 | 3,71 | 1,46 | 1,24 |
| 5 000 000 | 3,01 | 1,73 | 1,24 | 1,02 |
| 20 000 000 | 2,52 | 1,69 | 1,05 | 1,05 |


Test 10: File Input Delimited > Lookup > File Output Delimited

Lookup 1 000 000 rows ~68MB

| Lines | 100 000 | 1 000 000 | 5 000 000 | 20 000 000 |
|---|---|---|---|---|
| TOS 2.4.1 | 9,86 | 14,26 | 38,6 | 121,44 |
| PDI 3.0.0 | 14,50 | 32,20 | 116,60 | 487,25 |
| IBM DS 7.5 | 68,30 | 80,00 | 102,00 | 203,00 |
| IBM DS PX 7.5 | 9,25 | 15,00 | 40,00 | 123,00 |
| INFA PWC 8.1.1 | 5,00 | 12,00 | 35,00 | 142,00 |

Statistics:

Ratio compared with TOS 2.4.1

| Number of lines | PDI 3.0.0 | DataStage 7.5 | DataStage PX 7.5 | Informatica 8.1.1 |
|---|---|---|---|---|
| 100 000 | 1,47 | 6,93 | 0,94 | 0,51 |
| 1 000 000 | 2,26 | 5,61 | 1,05 | 0,84 |
| 5 000 000 | 3,02 | 2,64 | 1,04 | 0,91 |
| 20 000 000 | 4,01 | 1,67 | 1,01 | 1,16 |


Test 10: File Input Delimited > Lookup > File Output Delimited

Lookup 5 000 000 rows ~365MB

| Lines | 100 000 | 1 000 000 | 5 000 000 | 20 000 000 |
|---|---|---|---|---|
| TOS 2.4.1 | 56,51 | 69,1 | 199,26 | 557,1 |
| PDI 3.0.0 | | | | |
| IBM DS 7.5 | 369,00 | 407,00 | 496,00 | 973,00 |
| IBM DS PX 7.5 | 24,00 | 30,00 | 55,00 | 134,00 |
| INFA PWC 8.1.1 | 11,00 | 14,00 | 42,00 | 141,00 |

Statistics:

Ratio compared with TOS 2.4.1

| Number of lines | PDI 3.0.0 | DataStage 7.5 | DataStage PX 7.5 | Informatica 8.1.1 |
|---|---|---|---|---|
| 100 000 | Failed | 6,53 | 0,42 | 0,19 |
| 1 000 000 | Failed | 5,89 | 0,43 | 0,2 |
| 5 000 000 | Failed | 2,49 | 0,28 | 0,21 |
| 20 000 000 | Failed | 1,75 | 0,24 | 0,25 |


Test 11: File Input Delimited > Lookup > File Output Delimited && rejects

Scenario:

Reading X lines from a delimited input file and looking up 4 fields in another delimited input file using the id_client column. The join result is written to a delimited output file and the rejects to other delimited output files.

1 – Filter rejects: `age` content < 18

2 – Filter rejects: `age` content < 18 and inner join reject
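The two reject flows above can be sketched as a single routing step: rows with `age` below 18 go to one reject output, rows without a match in the lookup go to an inner join reject output, and the rest are joined and accepted. This is a minimal illustration; applying the age filter before the lookup, the function name and the sample data are assumptions.

```python
def route_rows(main_rows, lookup_rows, key="id_client", min_age=18):
    """Split main rows into accepted rows, age rejects and inner join rejects."""
    index = {row[key]: row for row in lookup_rows}
    accepted, age_rejects, join_rejects = [], [], []
    for row in main_rows:
        if row["age"] < min_age:
            age_rejects.append(row)       # filter reject: age < 18
            continue
        match = index.get(row[key])
        if match is None:
            join_rejects.append(row)      # inner join reject: no lookup match
        else:
            accepted.append({**row, **match})
    return accepted, age_rejects, join_rejects


# Hypothetical sample data.
main = [
    {"id_client": 1, "age": 30},
    {"id_client": 2, "age": 15},
    {"id_client": 9, "age": 40},
]
lookup = [{"id_client": 1, "city": "Paris"}, {"id_client": 2, "city": "Lyon"}]
accepted, age_rejects, join_rejects = route_rows(main, lookup)
```

In the first variant of the test only the age reject output would be written; the second variant writes both reject outputs.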

Comments:

Talend Open Studio and DataStage Server are the most ergonomic tools for managing expression filter rejects and inner join rejects (with the Transformer component, called tMap on Talend Open Studio). For DataStage PX, Pentaho Data Integrator and Informatica, we have to use filter components.

Talend Open Studio, Informatica and DataStage Server are the most ergonomic tools for managing expression filter rejects and inner join rejects. For DataStage PX and Pentaho Data Integrator, we have to use filter components.


TALEND OPEN STUDIO

Job name:

file_input_delimited__file_lookup_delimited__file_output_delimited__rejects_file_output_delimited

Job

Schema of file_input_delimited

Schema of file_lookup_delimited


Schema of file_output_delimited (age>=18)

Schema of file_output_delimited (age<18) = Schema of file_output_delimited

tMap Component


PENTAHO DATA INTEGRATION

Job name:

file_input_delimited__file_lookup_delimited__file_output_delimited__rejects_file_output_delimited

Job

Schema of file_input_delimited

Schema of file_lookup_delimited


Schema of file_output_delimited

Schema of file_output_delimited (age<18) = Schema of file_output_delimited


Mapping Component

DATASTAGE SERVER

Job name:

file_input_delimited__file_lookup_delimited__file_output_delimited__rejects_file_output_delimited

Job

Schema of file_input_delimited


Schema of file_lookup_delimited

Schema of file_output_delimited

Schema of file_output_delimited (age<18) = Schema of file_output_delimited


Transformer Component


DATASTAGE PX

Job name:

PX_file_input_delimited__file_lookup_delimited__file_output_delimited__rejects_file_output_delimited

Job

Schema of file_input_delimited


Schema of file_lookup_delimited

Schema of file_output_delimited

Schema of file_output_delimited (age<18) = Schema of file_output_delimited


Transformer Component


INFORMATICA

Job name:

file_input_delimited__file_lookup_delimited__file_output_delimited__rejects_file_output_delimited

Job

Schema of file_input_delimited


Schema of file_lookup_delimited

Schema of file_output_delimited

Schema of file_output_delimited (age<18) = Schema of file_output_delimited

Transformer Component


Test results:

Test 11: File Input Delimited > Lookup > File Output Delimited && rejects

Lookup 100 000 rows ~7MB + Filter 18 years

| Lines | 100 000 | 1 000 000 | 5 000 000 | 20 000 000 |
|---|---|---|---|---|
| TOS 2.4.1 | 1,51 | 6,74 | 29,55 | 101,65 |
| PDI 3.0.0 | 3,30 | 17,10 | 78,40 | 305,00 |
| IBM DS 7.5 | 6,00 | 10,50 | 36,00 | 144,00 |
| IBM DS PX 7.5 | 7,00 | 14,00 | 41,00 | 137,00 |
| INFA PWC 8.1.1 | 5,00 | 10,00 | 33,00 | 120,00 |

Statistics:

Ratio compared with TOS 2.4.1

| Number of lines | PDI 3.0.0 | DataStage 7.5 | DataStage PX 7.5 | Informatica 8.1.1 |
|---|---|---|---|---|
| 100 000 | 2,19 | 3,97 | 4,64 | 3,31 |
| 1 000 000 | 2,54 | 1,56 | 2,08 | 1,48 |
| 5 000 000 | 2,65 | 1,22 | 1,39 | 1,12 |
| 20 000 000 | 3 | 1,42 | 1,35 | 1,18 |


Test 11: File Input Delimited > Lookup > File Output Delimited && rejects

Lookup 500 000 rows ~34MB + Filter 18 years

| Lines | 100 000 | 1 000 000 | 5 000 000 | 20 000 000 |
|---|---|---|---|---|
| TOS 2.4.1 | 4,26 | 9,28 | 32,44 | 111,98 |
| PDI 3.0.0 | 7,80 | 20,50 | 81,50 | 310,00 |
| IBM DS 7.5 | 28,60 | 34,00 | 57,00 | 173,00 |
| IBM DS PX 7.5 | 7,50 | 14,25 | 44,67 | 155,20 |
| INFA PWC 8.1.1 | 5,00 | 10,00 | 34,00 | 126,00 |

Statistics:

Ratio compared with TOS 2.4.1

| Number of lines | PDI 3.0.0 | DataStage 7.5 | DataStage PX 7.5 | Informatica 8.1.1 |
|---|---|---|---|---|
| 100 000 | 1,83 | 6,71 | 1,76 | 1,17 |
| 1 000 000 | 2,21 | 3,66 | 1,54 | 1,08 |
| 5 000 000 | 2,51 | 1,76 | 1,38 | 1,05 |
| 20 000 000 | 2,77 | 1,54 | 1,39 | 1,13 |


Test 11: File Input Delimited > Lookup > File Output Delimited && rejects

Lookup 1 000 000 rows ~68MB + Filter 18 years

| Lines | 100 000 | 1 000 000 | 5 000 000 | 20 000 000 |
|---|---|---|---|---|
| TOS 2.4.1 | 10,2 | 15,22 | 38,31 | 126,63 |
| PDI 3.0.0 | 14,10 | 32,35 | 111,35 | 319,05 |
| IBM DS 7.5 | 66,00 | 68,00 | 95,00 | 220,00 |
| IBM DS PX 7.5 | 9,00 | 18,00 | 51,00 | 153,33 |
| INFA PWC 8.1.1 | 6,00 | 14,00 | 34,00 | 130,00 |

Statistics:

Ratio compared with TOS 2.4.1

| Number of lines | PDI 3.0.0 | DataStage 7.5 | DataStage PX 7.5 | Informatica 8.1.1 |
|---|---|---|---|---|
| 100 000 | 1,38 | 6,47 | 0,88 | 0,59 |
| 1 000 000 | 2,13 | 4,47 | 1,18 | 0,92 |
| 5 000 000 | 2,91 | 1,7 | 1,33 | 0,89 |
| 20 000 000 | 2,52 | 1,74 | 1,21 | 1,03 |


TALEND OPEN STUDIO

Job name:

file_input_delimited__file_lookup_delimited__file_output_delimited__rejects_and_innerjoin_rejects_file_output_delimited

Job

Schema of file_input_delimited


Schema of file_lookup_delimited

Schema of file_output_delimited (age>=18)

Schema of file_output_delimited (age<18) = Schema of file_output_delimited

Schema of file_output_delimited (inner join rejects) = Schema of file_output_delimited
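Independently of any tool, the three outputs listed above (age>=18, age<18, inner join rejects) can be produced in a single pass over the main file. The following is a minimal sketch of that logic, not the generated Talend job. The column names `id`, `age` and `city`, the `;` delimiter and the sample rows are hypothetical, since the real schemas only appear as screenshots in the original document.

```python
import csv
import io

def lookup_and_route(main_rows, lookup_rows, key="id"):
    """Join main rows to a lookup table, then route each row to one of three outputs."""
    # Load the lookup side into a dict once, so each main row costs one hash probe.
    lookup = {row[key]: row for row in lookup_rows}
    adults, minors, join_rejects = [], [], []
    for row in main_rows:
        match = lookup.get(row[key])
        if match is None:
            join_rejects.append(row)   # no lookup match: inner join reject
            continue
        merged = {**row, **match}
        if int(merged["age"]) >= 18:
            adults.append(merged)      # main output (age >= 18)
        else:
            minors.append(merged)      # filter reject (age < 18)
    return adults, minors, join_rejects

# Hypothetical sample data standing in for the delimited input files.
main_csv = "id;age\n1;25\n2;12\n3;40\n"
lookup_csv = "id;city\n1;Paris\n2;Lyon\n"
main_rows = list(csv.DictReader(io.StringIO(main_csv), delimiter=";"))
lookup_rows = list(csv.DictReader(io.StringIO(lookup_csv), delimiter=";"))

adults, minors, join_rejects = lookup_and_route(main_rows, lookup_rows)
```

Holding the whole lookup in memory is what makes the lookup size (for example ~68MB for 1 000 000 rows) relevant to the timings.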

tMap Component

PENTAHO DATA INTEGRATION

Job name: file_input_delimited__file_lookup_delimited__file_output_delimited__rejects_and_innerjoin_rejects_file_output_delimited

Job

Schema of file_input_delimited

Schema of file_lookup_delimited

Schema of file_output_delimited

Schema of file_output_delimited (age<18) = Schema of file_output_delimited

Schema of file_output_delimited (inner join rejects) = Schema of file_output_delimited

Mapping Component

DATASTAGE SERVER

Job name: file_input_delimited__file_lookup_delimited__file_output_delimited__rejects_and_innerjoin_rejects_file_output_delimited

Job

Schema of file_input_delimited

Schema of file_lookup_delimited

Schema of file_output_delimited

Schema of file_output_delimited (age<18) = Schema of file_output_delimited

Schema of file_output_delimited (inner join rejects) = Schema of file_output_delimited

Transformer Component

DATASTAGE PX

Job name: PX_file_input_delimited__file_lookup_delimited__file_output_delimited__rejects_and_innerjoin_rejects_file_output_delimited

Job

Schema of file_input_delimited

Schema of file_lookup_delimited

Schema of file_output_delimited

Schema of file_output_delimited (age<18) = Schema of file_output_delimited

Schema of file_output_delimited (inner join rejects) = Schema of file_output_delimited

Transformer Component

INFORMATICA

Job name: file_input_delimited__file_lookup_delimited__file_output_delimited__rejects_and_innerjoin_rejects_file_output_delimited

Job

Schema of file_input_delimited

Schema of file_lookup_delimited

Schema of file_output_delimited

Schema of file_output_delimited (age<18) = Schema of file_output_delimited

Schema of file_output_delimited (inner join rejects) = Schema of file_output_delimited

Transformer Component

Test 12: file_input_delimited >_file_lookup_delimited > file_output_delimited__rejects && innerjoin_rejects_file_output_delimited

Lookup 100 000 rows ~7MB

Lines             100 000    1 000 000    5 000 000    20 000 000
TOS 2.4.1         1,42       5,65         24,63        106,78
PDI 3.0.0         2,60       13,00        59,80        327,60
IBM DS 7.5        6,00       10,00        30,00        137,00
IBM DS PX 7.5     9,00       15,25        47,33        146,00
INFA PWC 8.1.1    4,00       12,00        33,00        121,00

Statistics (ratio compared with TOS 2.4.1):

Number of lines    PDI 3.0.0    DataStage 7.5    DataStage PX 7.5    Informatica 8.1.1
100 000            1,83         4,22             6,34                2,82
1 000 000          2,3          1,77             2,7                 2,12
5 000 000          2,43         1,22             1,92                1,64
20 000 000         3,07         1,28             1,37                1,13

Test 12: file_input_delimited >_file_lookup_delimited > file_output_delimited__rejects && innerjoin_rejects_file_output_delimited

Lookup 500 000 rows ~34MB

Lines             100 000    1 000 000    5 000 000    20 000 000
TOS 2.4.1         4,16       8,74         30,34        120,53
PDI 3.0.0         7,26       19,30        72,25        319,60
IBM DS 7.5        28,00      35,50        63,00        189,50
IBM DS PX 7.5     11,00      16,00        44,00        150,00
INFA PWC 8.1.1    5          11           33           127

Statistics (ratio compared with TOS 2.4.1):

Number of lines    PDI 3.0.0    DataStage 7.5    DataStage PX 7.5    Informatica 8.1.1
100 000            1,75         6,73             6,73                1,2
1 000 000          2,21         4,06             1,83                1,26
5 000 000          2,38         2,08             1,45                1,09
20 000 000         2,65         1,57             1,24                1,05

Test 12: file_input_delimited >_file_lookup_delimited > file_output_delimited__rejects && innerjoin_rejects_file_output_delimited

Lookup 1 000 000 rows ~68MB

Lines             100 000    1 000 000    5 000 000    20 000 000
TOS 2.4.1         10,98      15,18        38,49        126,57
PDI 3.0.0         13,30      27,35        79,00        413,45
IBM DS 7.5        38,49      90,40        108,00       231,00
IBM DS PX 7.5     13,00      19,00        49,00        134,00
INFA PWC 8.1.1    6          13           37           131

Statistics (ratio compared with TOS 2.4.1):

Number of lines    PDI 3.0.0    DataStage 7.5    DataStage PX 7.5    Informatica 8.1.1
100 000            1,21         3,51             1,18                0,55
1 000 000          1,8          5,96             1,25                0,86
5 000 000          2,05         2,81             1,27                0,96
20 000 000         3,27         1,83             1,06                1,04

Annex 1: Informatica settings and results

This annex presents the settings changes made by Informatica and the limitations they found.

Comments and amendments made to the basic PowerCenter 8.1.1 installation:

*** Since the 'benchmark' machine is a tiny laptop with limited resources (XP 32-bit, Core2 Duo CPU and 3,43 GB of RAM), we made the following changes:

- Auto-Memory deactivation: MaxMem set to 0 in the Default Session Config
- High Availability storage deactivation: EnableHAStorage set to No for the Integration Service
- Metadata Manager and Reporting Service deactivation

*** Configuration amendments:

- Unix environment variable INFA_DEFAULT_DOMAIN added
- Custom variable FileRdrTreatNullCharAs added on the Integration Service (NULL characters are encountered in the source data files)

*** Standard Oracle 10g (10.1.0.2.0) Database installation with:

sga_max_size=164MB

pga_aggregate_target=115MB

Comments and "best-practices" for the tests:

Test 1: File Input Delimited > File Output Delimited

- dynamic partitioning set to 2 for more than 5 million rows

This is a disk-bound test.

Test 2: File Input Delimited > Table MySQL Output

Not Applicable

Test 3: Table Oracle Input > File Output Delimited

- no partitioning, as the volume is too small and the run time too short

Test 4: File Input Delimited > Table Output Oracle BULK

- commit size set to 100000
- dynamic partitioning set to 2 with 2 million rows

This is a disk-bound test.

Test 5: File Input Delimited > Transform > File Output Delimited

- function "CONCAT(CONCAT(firstname,' '),lastname)" is replaced by "firstname || ' ' || lastname"

- dynamic partitioning set to 2 for more than 5 million rows

This is a disk-bound test.

Test 6: Table Input Oracle > Aggregation > Table Output Oracle (ELT)

- no partitioning, as the volume is too small and the run time too short

The Oracle database is not tuned for ELT mode.

Test 7: Tables Input Oracle > Transformation > Tables Output Oracle (ELT)

- commit size set to 50000
- no partitioning, as the volume is too small and the run time too short

The Oracle database is not tuned for ELT mode.

Test 8: File Input Delimited > Sort > File Output Delimited

- sorter memory adjustment

At 20 million rows this test is memory-limited (a two-pass sort is required) and sometimes also disk-limited.
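A two-pass sort is generally needed when the rows do not fit in the sorter's memory: sorted runs are first written to disk, then merged. The sketch below illustrates that general technique only. It makes no claim about how the PowerCenter sorter is implemented, and the function names and the `max_lines_in_memory` parameter are illustrative assumptions. Lines are assumed to contain no newline characters.

```python
import heapq
import os
import tempfile

def _write_run(sorted_lines):
    """Spill one sorted chunk to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".run", text=True)
    with os.fdopen(fd, "w") as f:
        f.writelines(line + "\n" for line in sorted_lines)
    return path

def _read_run(path):
    """Stream one run back from disk, one line at a time."""
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")

def two_pass_sort(lines, max_lines_in_memory):
    """Sort lines that do not fit in memory: sorted runs first, then one merge."""
    run_paths, chunk = [], []
    # Pass 1: sort chunks of at most max_lines_in_memory lines and spill them to disk.
    for line in lines:
        chunk.append(line)
        if len(chunk) >= max_lines_in_memory:
            run_paths.append(_write_run(sorted(chunk)))
            chunk = []
    if chunk:
        run_paths.append(_write_run(sorted(chunk)))
    # Pass 2: k-way merge of the runs; only one line per run is held in memory.
    try:
        yield from heapq.merge(*(_read_run(p) for p in run_paths))
    finally:
        for p in run_paths:
            os.remove(p)

result = list(two_pass_sort(["d", "a", "c", "b", "e"], max_lines_in_memory=2))
```

Every row is written to disk and read back once in addition to the normal input and output, which is consistent with the test being sometimes disk-limited as well.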

Test 9: File Input Delimited > Aggregate > File Output Delimited

- dynamic partitioning set to 2 for more than 5 million rows in the source
- aggregator memory adjustment

This is a CPU-bound test.

Test 10: File Input Delimited > Lookup > File Output Delimited

- dynamic partitioning set to 2 for more than 5 million rows in the source or the lookup
- lookup memory adjustment
- lookup in the flow with a hash partitioning point

This is a CPU-bound test.
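The idea behind a hash partitioning point placed before a lookup can be sketched as follows: rows are distributed by a hash of the join key, so every partition only needs the lookup rows that hash to the same partition. This illustrates the general technique with two partitions and is not PowerCenter's implementation. The `id` and `label` fields, the function names and the use of `zlib.crc32` as the hash are assumptions made for the example.

```python
import zlib

def partition_of(key, partitions=2):
    """Map a key to a partition number with a hash that is stable across runs."""
    return zlib.crc32(str(key).encode("utf-8")) % partitions

def hash_partition(rows, key, partitions=2):
    """Split rows into `partitions` lists; equal keys always land in the same list."""
    buckets = [[] for _ in range(partitions)]
    for row in rows:
        buckets[partition_of(row[key], partitions)].append(row)
    return buckets

# Hypothetical source and lookup rows sharing the key "id".
source = [{"id": i % 5, "n": i} for i in range(10)]
lookup = [{"id": i, "label": f"L{i}"} for i in range(5)]

src_parts = hash_partition(source, "id")
lkp_parts = hash_partition(lookup, "id")

joined = []
for src, lkp in zip(src_parts, lkp_parts):
    # Each partition builds a cache from its own slice of the lookup only.
    index = {row["id"]: row["label"] for row in lkp}
    joined += [{**row, "label": index[row["id"]]} for row in src]
```

In a real engine each partition would run on its own thread or process, which is how partitioning can help on a CPU-bound test with a dual-core machine.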

Test 11: File Input Delimited > Lookup > File Output Delimited && rejects

- a router is used in place of filters
- dynamic partitioning set to 2 for more than 5 million rows in the source
- lookup memory adjustment
- lookup in the flow with a hash partitioning point

This is a CPU-bound test.

Test 12: file_input_delimited >_file_lookup_delimited > file_output_delimited__rejects && innerjoin_rejects_file_output_delimited

- a router is used in place of filters
- dynamic partitioning set to 2 for more than 5 million rows in the source
- lookup memory adjustment
- lookup in the flow with a hash partitioning point

This is a CPU-bound test.