Asia's Largest Global Software & Services Company
Confidential
Ab Initio
Introduction

Agenda
* What is data warehousing?
* Why data warehousing?
* ETL process
* Various ETL tools
* Introduction to Ab Initio
* Why Ab Initio
* How Unix is involved with Ab Initio
* GDE window
* EME repository
* Sandboxes: user and standard sandbox
* Ab Initio components
* Creation of simple graphs

Data Warehousing and ETL Process

Data Warehouse
A data warehouse is a collection of logical data marts, each of which is designed for a particular line of business (e.g. Sales, Marketing) and built to facilitate data analysis and reporting.

Why Data Warehousing?

ETL Process
* Data is first stored temporarily in a staging table/area and is called staging data, i.e. data queued for processing.
* The processing tool reads the staged data, performs qualitative processing, filtering and cleansing (as required for OLAP, i.e. reporting/analysis), and finally loads/writes it into the data warehouse.
* All these data flows (both inward and outward) and data processing activities (extraction from the source system, transformation of data by cleansing/filtering, loading into the data warehouse) are performed using an ETL tool, e.g. Ab Initio, Informatica, etc.
* This entire process is called the ETL process.

Extract, Transform, Load (ETL) Functionalities
Extract:
* The first phase of an ETL process is to extract the data from the source systems.
* Each separate system may also use a different data organization/format.
* Common data source formats are relational databases and flat files, but other source formats exist. Extraction converts the data into records and columns.
Transform:
* The transform phase applies a series of rules or functions to the extracted data.
* Examples: derive a new calculated value (e.g. sale_amount = qty * unit_price); summarize multiple rows of data (e.g. total sales for each region).
Load:
* The load phase loads the data into the data warehouse. Depending on the requirements of the organization, this process ranges widely.
* Simple: overwrite old data with new.
* More complex systems: maintain a history and audit trail of all changes to the data.
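
The two transform examples and the simple load case can be sketched in a few lines of Python. This is only an illustration: the rows, the field names and the `warehouse` dictionary are hypothetical stand-ins for real source systems and a real warehouse.

```python
from collections import defaultdict

# Hypothetical extracted rows; the field names are illustrative only.
rows = [
    {"region": "East", "qty": 2, "unit_price": 10.0},
    {"region": "West", "qty": 1, "unit_price": 25.0},
    {"region": "East", "qty": 3, "unit_price": 5.0},
]


def derive_sale_amount(row):
    """Transform: derive a new calculated value from existing fields."""
    return {**row, "sale_amount": row["qty"] * row["unit_price"]}


def total_sales_by_region(rows):
    """Transform: summarize multiple rows into one total per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["sale_amount"]
    return dict(totals)


transformed = [derive_sale_amount(r) for r in rows]
totals = total_sales_by_region(transformed)

# Load (simple case): overwrite the old warehouse contents with the new data.
warehouse = {"sales_by_region": {"East": 0.0}}
warehouse["sales_by_region"] = totals
```

A history-keeping load would append a dated copy of `totals` instead of overwriting the previous value.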

Various Popular ETL Tools
Tool Name                                    Company Name
Informatica                                  Informatica Corporation
DT/Studio                                    Embarcadero Technologies
DataStage                                    IBM
Ab Initio                                    Ab Initio Software Corporation
Talend                                       Talend Corporation
Pentaho                                      Pentaho Corporation
Data Junction                                Pervasive Software
Oracle Warehouse Builder                     Oracle Corporation
Microsoft SQL Server Integration Services    Microsoft

Introduction to Ab Initio
* Data processing tool from Ab Initio Software Corporation (http://www.abinitio.com)
* Latin for "from the beginning"
* Designed to support the largest and most complex business applications
* Graphical, intuitive, and fits the way your business works
Focus: Moving Data
* Move small and large volumes of data in an efficient manner.
* Deal with the complexity associated with business data.
* High-performance, scalable solutions
* Better productivity
Usage:
* Data warehousing
* Batch processing
* Data movement
* Data transformation

Product Constituents
[Figure: Co>Operating System (Co>Ops) and Graphical Development Environment (GDE); connection protocols SSH, REXEC, TELNET, DCOM; server side EME, DB, Conduct>It, CF]

Product            Functionality
GDE                User interface for creating graphs and plans in Ab Initio
Data Profiler      Ab Initio tool for data profiling
Co>Ops             Server component for running deployed Ab Initio programs
EME                Ab Initio technical repository; part of the Co>Ops install
Database           Ab Initio server database components
Conduct>It         Ab Initio server component for running Ab Initio plans
Continuous Flow    Ab Initio server components for running CF programs

* All server components are installed by default.
* AB_HOME refers to the installation location of Ab Initio.
* Various connectors and plugins are installed in AB_HOME/Connectors and AB_HOME/plugins.
* All binaries and library files are available in AB_HOME/bin and AB_HOME/lib respectively.

Ab Initio Product Architecture
[Figure: layered architecture, listed here from the bottom layer up]
* Native Operating System (Unix, Windows, OS/390)
* The Ab Initio Co>Operating System
* Component Library, 3rd Party Components, User-defined Components
* Development Environments: GDE, Shell
* User Applications
* Ab Initio EME

Product Architecture
Unix Shell Script or NT Batch File
* Supplies parameter values to underlying programs through arguments and environment variables
* Controls the flow of data through pipes
* Usually generated using the GDE
GDE
* Ability to graphically design batch programs comprising Ab Initio components, connected by pipes
* Ability to test run the graphical design and monitor its progress
* Ability to generate a shell script or batch file from the graphical design
[Figure: Host Machine 1 and Host Machine 2, each running user programs and the Co>Operating System (Ab Initio built-in component programs such as partitions and transforms) on top of the operating system (Unix, Windows NT)]

Co>Operating System and GDE
Co>Operating System
* Layered on top of the operating system.
* Unites a network of computing resources (CPUs, storage disks, programs, datasets) into a data-processing system with scalable performance.
GDE
* Can talk to the Co>Operating System using several protocols such as Telnet, Rexec and FTP.
* It is the GUI for building applications in Ab Initio.

The Graph Model
Graph
* Is the logical modular unit of an application.
* Consists of several components that form the building blocks of an Ab Initio application.
Start Script (Host Setup)
* Local to the graph
End Script
* Local to the graph
Component
* Is a program that does a specific type of job and can be controlled by its parameter settings, e.g. Join, Reformat, etc.
Component Organizer
* Groups all components under different categories.
Setup Command
* Ab Initio Host (AIH) file
* Builds up the environment to run an Ab Initio application.

Parts of a Typical Graph
* Files
* Formats
* Components
* Flows
* Layouts
* Building with mp job
* Building with mp run

The Graph Model: Naming the Pieces
[Figure: A Sample Graph. Datasets: Customers, Good Customers, Other Customers. Components: Score and Select, with out and deselect ports. Callouts: Dataset Components (Datasets), Flows. Layout markers: L1.]

The Graph Model: A Closer Look
[Figure: A Sample Graph. Callouts: Ports, Record format metadata, Expression metadata, Layout.]

Runtime Environment
* A graph, after development, is deployed to the back-end server as a Unix shell script or Windows NT batch file.
* This becomes the executable that runs at the back end with the help of the Co>Operating System.
* The execution can be done from the GDE itself or manually from the back end.
* The Ab Initio runtime environment is different from the development environment.

Unix and Ab Initio
* Unix serves as the back end for Ab Initio.
* All graphs/jobs in Ab Initio are accessible through Unix (the back end).
* PuTTY connectivity
Environment quick overview:
* $AI_RUN, $AI_BIN: run directory, .ksh scripts
* $AI_PLAN, $AI_SERIAL_
* $AI_DML: record format files
* $AI_XFR: transform files
* $AI_MP: graphs
* $AI_DB: database config files
* $AI_SERIAL: serial source data, other serial data
* $AI_MFS: Ab Initio multifile directory; in training it will also contain partition directories (more about this later!)
* $AI_LOG: a location to place logging files, etc.

Sandboxes and EME
* Sandboxes are work areas used to develop, test or run code associated with a given project.
* Only one version of the code can be held within the sandbox at any time.
* The EME Datastore contains all versions of the code that have been checked into it.
[Figure: check-in and check-out between sandboxes and the EME]

Ab Initio Environment

Ab Initio Environment: Job Run
How a job runs
* The execution of an Ab Initio graph is a job.
* To run a job, you invoke a shell script that the GDE generates from a graph.
* The script process initiates job processes that control the execution of the programs represented by the graph.
* Graph: mp/graph1.mp; shell script: run/graph1.ksh

Ab Initio Environment: Job Run (contd.)
You can invoke the script in two ways:
* From the GDE
* From a command line
To invoke the script from the GDE, click the Run button or choose Run > Start from the GDE menu bar.
To invoke the script from the command line:
* For a bin script: run ksh scriptname.ksh in the bin path.
* To run a graph from the back end: in the run path ($AI_RUN), run Graphname.ksh followed by its parameters (if needed).

Creation of a Graph

Components - Overview

Component Organizer

A Sample Graph

A Sample Korn Shell Script

Dataset Component Properties
Double-click on a component to bring up its Properties page.

Viewing Port Properties
Click on the Ports tab to view the port properties.

DMLs and XFRs
DML
* Ab Initio stores metadata in the form of record formats.
* Metadata can be embedded within a component or can be stored external to the graph in a file with a .dml extension.
XFR
* Data can be transformed with the help of transform functions.
* Transform functions can be embedded within a component or can be stored external to the graph in a file with a .xfr extension.

Data Manipulation Language or DML
DML Syntax
* Record types begin with record and end with end.
* Fields are declared as: data_type(len) field_name;
* Field names consist of letters (a-z, A-Z), digits (0-9) and underscores (_) and are case sensitive.
* Keywords/reserved words include record, end, date.
Some of the data types available
* String
* Decimal
* Integer
* Storing data in binary form
* Date and Datetime
* EBCDIC and ASCII records
* Null in Ab Initio: nonexistence of column values.

Record Format
* In text view (special symbol as delimiter)
* In graphical form (grid view)
DML format created for the following data:
0345John Smith0212Sam Spade0322Elvis Jones0492Sue West0121Mary Forth0221Bill Black
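
To show how a fixed-width record format maps onto a data stream like this one, here is a small Python sketch. The field names and widths (a 4-character id followed by two 6-character names) are assumptions made for illustration, since the padding of the sample data above was lost; the sketch builds its own padded sample rather than parsing that line.

```python
# Hypothetical record format as (field_name, width) pairs. It mirrors a DML
# record such as: decimal(4) id; string(6) first_name; string(6) last_name;
RECORD_FORMAT = [("id", 4), ("first_name", 6), ("last_name", 6)]
RECORD_LENGTH = sum(width for _, width in RECORD_FORMAT)


def parse_records(data):
    """Cut a fixed-width stream into records, then each record into fields."""
    records = []
    for start in range(0, len(data), RECORD_LENGTH):
        chunk = data[start:start + RECORD_LENGTH]
        record, offset = {}, 0
        for name, width in RECORD_FORMAT:
            record[name] = chunk[offset:offset + width].rstrip()
            offset += width
        records.append(record)
    return records


# Illustrative data, padded to the assumed widths.
sample = "".join([
    "0345", "John".ljust(6), "Smith".ljust(6),
    "0212", "Sam".ljust(6), "Spade".ljust(6),
])
parsed = parse_records(sample)
```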

Editing Types in GDE
DML creation: Field name, Field type, Field length

More Record Format Editing
* View Attributes
* Field Type drop-down
* Length can be a delimiter string
* Date format goes here

Auto DML Creation in Table Component

DML Creation: "Use file" Option in Dataset

Transform Functions: XFRs
* User-defined function producing one or more outputs from one or more inputs
* Associated with transform components
* Rules that compute expressions from input values and local variables and assign the results to output objects
Syntax
Functions:
output-records :: function-name (input-records) =
begin
  assignments
end;
Assignments:
Direct mapping without any transformation: out.* :: in.*;

Input File Settings

Input Data: Record View

Input File View: Back End

Output Settings (propagating from input)

Lookup File
* Serial or multifiles
* Held in main memory
* Searching and retrieval is key-based and faster compared to files stored on disk
* Associates key values with corresponding data values to index records and retrieve them
* Lookup parameters: Key, Record Format
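
The idea of a key-based, in-memory lookup can be illustrated with a Python dictionary. This is a conceptual sketch only, not how Ab Initio implements lookup files; `build_lookup`, `lookup` and the sample records are hypothetical.

```python
def build_lookup(records, key):
    """Index records in memory by the value of their key field."""
    return {record[key]: record for record in records}


def lookup(table, key_value):
    """Return the record for key_value, or None when there is no match."""
    return table.get(key_value)


# Illustrative data for a lookup keyed on "id".
last_visits = build_lookup(
    [{"id": 1, "dt": "2004/05/01"}, {"id": 2, "dt": "2004/06/15"}],
    key="id",
)
```

Retrieval is a single hash probe per key, which is why an in-memory lookup can be faster than scanning a file on disk.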

Basic Components
* Filter by Expression
* Reformat
* Redefine Format
* Sort
* Join
* Replicate
* Dedup
* Rollup

Filter by Expression
* Reads a record from the input port
* Evaluates the select_expr
* If the result is true, the record is written to the out port
* If the result is false, the record is written to the deselect port
[Figure: flowchart. Input port, then "expr true?"; Yes goes to the out port, No goes to the deselect port.]
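
The routing rule above can be sketched in Python as follows. The function name and the sample records are illustrative; `select_expr` is modeled as an ordinary Python predicate.

```python
def filter_by_expression(records, select_expr):
    """Route each record to the out list or the deselect list."""
    out, deselect = [], []
    for record in records:
        if select_expr(record):
            out.append(record)        # expression true: out port
        else:
            deselect.append(record)   # expression false: deselect port
    return out, deselect


# Illustrative records and expression.
customers = [{"name": "A", "score": 80}, {"name": "B", "score": 40}]
good, other = filter_by_expression(customers, lambda r: r["score"] >= 50)
```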

Diagnostic Ports
* REJECT: input records that caused an error
* ERROR: associated error message
* LOG: logging records

Filter by Expression

Filtered Output

Reformat
1. Reads a record from the input port
2. The record is passed as an argument to the transform function (xfr)
3. Records are written to the out ports if the function returns a success status
4. Records are written to the reject ports if the function returns a failure status
Parameters of the Reformat Component
* Count
* Transform (Xfr) Function
* Reject-Threshold: Abort, Never Abort, or Use Limit & Ramp
* Limit: number of errors to tolerate
* Ramp: scale of errors to tolerate per input

Reformat Reject Threshold
A drop-down menu specifying the number of errors to tolerate.
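
A Python sketch of a reformat step with a reject threshold follows. It assumes the tolerance rule "abort once rejects exceed limit + ramp * records processed so far"; that formula is not stated on the slide, so treat it as an assumption. The `reformat` and `to_amount` functions and the sample rows are hypothetical.

```python
def reformat(records, transform, limit=0, ramp=0.0):
    """Apply transform to each record; failures go to the reject list.

    Assumed threshold rule: abort when the number of rejects exceeds
    limit + ramp * (records processed so far).
    """
    out, reject = [], []
    for processed, record in enumerate(records, start=1):
        try:
            out.append(transform(record))
        except ValueError:
            reject.append(record)
            if len(reject) > limit + ramp * processed:
                raise RuntimeError("reject threshold exceeded")
    return out, reject


def to_amount(record):
    """Illustrative transform: fails when amount is not numeric."""
    return {"id": record["id"], "amount": int(record["amount"])}


rows = [{"id": 1, "amount": "10"}, {"id": 2, "amount": "x"}, {"id": 3, "amount": "30"}]
out, reject = reformat(rows, to_amount, limit=1)
```

With `limit=0` and `ramp=0.0` the same call aborts on the first reject, which corresponds to the Abort setting.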

Transform Functionality in Reformat

Reformatted Output

Sort
Sort Component
* Reads records from the input port, sorts them by key, and writes the result to the output port
Parameters
* Key
* Max-core
Keys
* A key identifies a field or set of fields used to organize a dataset
* Single field: employee_number
* Multiple-field or composite key: (last_name; first_name)
* Modifiers: employee_number descending
Max-core: maximum memory usage in bytes
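
The three key styles above correspond to the following Python sorts. The employee records are made up for illustration, and the max-core memory limit is not modeled.

```python
# Illustrative records.
employees = [
    {"employee_number": 3, "last_name": "Smith", "first_name": "Zoe"},
    {"employee_number": 1, "last_name": "Smith", "first_name": "Al"},
    {"employee_number": 2, "last_name": "Jones", "first_name": "Bo"},
]

# Single-field key: employee_number
by_number = sorted(employees, key=lambda r: r["employee_number"])

# Composite key: (last_name; first_name)
by_name = sorted(employees, key=lambda r: (r["last_name"], r["first_name"]))

# Key with a modifier: employee_number descending
by_number_desc = sorted(employees, key=lambda r: r["employee_number"], reverse=True)
```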

Sort Functionality

Sorted Output

Join
1. Reads records from multiple input ports
2. Operates on records with matching keys using a multi-input transform function
3. Writes the result to the output port
Ports: in, out, unused, reject (optional), error (optional), log (optional)
Parameters: count, key, override-key, transform, limit, ramp
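
A minimal Python sketch of a two-input inner join follows, with unmatched records sent to per-input "unused" lists. It illustrates the idea only; the real component has more options (join types, more than two inputs), and all names and data here are hypothetical.

```python
def join(in0, in1, key, transform):
    """Inner-join two record lists on key; unmatched records go to unused."""
    index1 = {}
    for rec1 in in1:
        index1.setdefault(rec1[key], []).append(rec1)

    out, unused0, matched_keys = [], [], set()
    for rec0 in in0:
        matches = index1.get(rec0[key], [])
        if not matches:
            unused0.append(rec0)
            continue
        matched_keys.add(rec0[key])
        for rec1 in matches:
            out.append(transform(rec0, rec1))  # multi-input transform

    unused1 = [rec1 for rec1 in in1 if rec1[key] not in matched_keys]
    return out, unused0, unused1


# Illustrative inputs and transform.
customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
orders = [{"id": 1, "amount": 50}, {"id": 3, "amount": 70}]
joined, unused_customers, unused_orders = join(
    customers, orders, "id",
    lambda c, o: {"id": c["id"], "name": c["name"], "amount": o["amount"]},
)
```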

Join Parameters

Joined Output

Rollup
Rollup evaluates a group of input records that have the same key, and then generates records that either summarize each group or select certain information from each group.
Parameters:
* check-sort, sorted input
* limit, ramp
* logging
* log_group
* log_input
* log_intermediate
* log_output
* log_reject
* grouped-input
* error_group
* key
* key-method
* major-key
* max-core

Rollup - Functionality

Rollup Output

Built-in Functions for Rollup
The following aggregation functions are predefined and are only available in the rollup component:
* avg, max, min, count, first, product, last, sum
Multi-stage Transform
* initialize, iterate, finalize, use of variables
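
The initialize / iterate / finalize stages can be pictured with this Python sketch, which computes count, sum and avg per key. It is an analogy under simplifying assumptions (input sorted inside the function, a plain dictionary as the temporary variable), not Ab Initio DML; the field names are hypothetical.

```python
from itertools import groupby


def initialize():
    """Create the temporary variables for a new group."""
    return {"count": 0, "total": 0}


def iterate(temp, record):
    """Fold one input record into the temporary variables."""
    return {"count": temp["count"] + 1, "total": temp["total"] + record["amount"]}


def finalize(temp, key_value):
    """Produce one summary record for the group."""
    return {
        "region": key_value,
        "count": temp["count"],
        "sum": temp["total"],
        "avg": temp["total"] / temp["count"],
    }


def rollup(records, key):
    """Group records by key and summarize each group."""
    ordered = sorted(records, key=lambda r: r[key])
    out = []
    for key_value, group in groupby(ordered, key=lambda r: r[key]):
        temp = initialize()
        for record in group:
            temp = iterate(temp, record)
        out.append(finalize(temp, key_value))
    return out


# Illustrative input.
sales = [
    {"region": "East", "amount": 10},
    {"region": "West", "amount": 5},
    {"region": "East", "amount": 30},
]
summary = rollup(sales, "region")
```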

Rollup Wizard
Note the use of an aggregation function in the expression.

Simple and Complex Components
* In these components the record format metadata typically changes (goes through a transformation) from input to output.
* In these components the record format metadata does not change from input to output.

Priority Assignment
The priority is the order of evaluation of rules in a transform function.
An example: a join component may have a transform function with prioritized rules such as
out.ssn :1: in1.ssn;
out.ssn :2: in2.ssn;
out.ssn :3: "999999999";
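
The effect of prioritized rules can be mimicked in Python by trying each rule in order and keeping the first one that yields a value. This sketch assumes a rule "fails" when its input record or field is missing; the helper names are hypothetical.

```python
def assign_by_priority(*rules):
    """Evaluate rules in priority order; return the first usable value."""
    for rule in rules:
        try:
            value = rule()
        except (KeyError, TypeError):
            continue  # missing field or missing record: try the next rule
        if value is not None:
            return value
    return None


def out_ssn(in1, in2):
    """Mirror of the three prioritized rules for out.ssn."""
    return assign_by_priority(
        lambda: in1["ssn"],      # priority 1
        lambda: in2["ssn"],      # priority 2
        lambda: "999999999",     # priority 3: default value
    )
```

For example, `out_ssn(None, {"ssn": "222"})` falls through to the second rule because the first input is absent.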

Priority Assignment (contd.)

Using Lookup Instead of Join
Using Last-Visits as a lookup file

Using a Lookup File in a Transform Function
Output record format:
record
  decimal(4) id;
  string(8) city;
  decimal(3) amount;
  date("YYYY/MM/DD") dt;
end
Input 0 record format:
record
  decimal(4) id;
  string(6) name;
  string(8) city;
  decimal(3) amount;
end
Transform function:
out :: lookup_info(in) =
begin
  out.id :: in.id;
  out.city :: in.city;
  out.amount :: in.amount;
  out.dt :1: lookup("Last-Visits", in.id).dt;
  out.dt :2: "1900/01/01";
end;

The GDE Debugger
* The GDE has a built-in debugger capability
* To enable the debugger: Debugger > Enable Debugger
* The Debugger toolbar: Enable Debugger, Add Watcher File, Isolate Components, Remove All Watchers

Multistage Transform
* Data transformation in multiple stages following several sets of rules
* Each set of rules forms one transform function
* Information is passed across stages by temporary variables
* Stages include initialization, iteration, finalization and more
* A few multistage components are Aggregate, Rollup and Scan
Aggregate/Rollup/Scan
* Generates summary records for groups of input records

Database Components
* Join with DB
* Truncate Table: deletes all the rows in a specified DB table
* Run SQL: executes SQL statements in a DB

Built-in Functions
Ab Initio built-in functions are DML expressions that
* can manipulate strings, dates, and numbers
* access system properties
Function categories
* Date functions: now(), today(), date_to_int(), ..
* Inquiry and error functions: is_defined(), is_valid(), force_error(), ..
* Lookup functions: lookup(), lookup_local(), ..
* Math functions: ceiling(), floor(), ..
* Miscellaneous functions: decimal_round(), hash_value(), ..
* String functions: string_substring(), is_blank(), ..

Components contd..
Name: Description
* Normalize: Generates multiple data records from each input data record. Separates a data record with a vector field into several individual records, each containing one element of the vector.
* Denormalize Sorted: Consolidates groups of related data records into a single output record with a vector field for each group. Requires grouped input.
* Validate Records: Separates valid data records from invalid data records.
* Check Order: Tests whether data records are sorted according to a key specifier.
* Compare Records: Compares data records from two flows one by one.
* Generate Records: Generates a specified number of data records with fields of specified lengths and types.
* Gather Logs: Collects the output from the log ports of components for analysis of a graph after execution.
* Sample: Selects a specified number of data records at random from one or multiple input flows.
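
Normalize and Denormalize Sorted are roughly inverse operations, which the following Python sketch illustrates. It is a simplified analogy; the function names, parameters and sample records are hypothetical.

```python
from itertools import groupby


def normalize(records, vector_field, element_field):
    """Emit one output record per element of each record's vector field."""
    out = []
    for record in records:
        base = {k: v for k, v in record.items() if k != vector_field}
        for element in record[vector_field]:
            out.append({**base, element_field: element})
    return out


def denormalize_sorted(records, key, element_field, vector_field):
    """Collapse each run of records sharing a key into one record with a vector.

    Like the component, this expects grouped input: records with the same
    key must be adjacent.
    """
    out = []
    for key_value, group in groupby(records, key=lambda r: r[key]):
        out.append({key: key_value, vector_field: [r[element_field] for r in group]})
    return out


# Illustrative data.
orders = [{"id": 1, "items": ["pen", "ink"]}, {"id": 2, "items": ["cap"]}]
flat = normalize(orders, "items", "item")
regrouped = denormalize_sorted(flat, "id", "item", "items")
```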

Parallelism in Ab Initio
* Parallelism is the mechanism by which some or all constituents of an application (datasets and processing modules) are replicated into a number of partitions, each spawning a process.
* This lets Ab Initio process very large volumes of records (in the millions) with optimum usage of the available hardware.
* The power of Ab Initio lies in the fact that it can process data in a parallel runtime environment.
Types of Parallelism
* Component parallelism
* Pipeline parallelism
* Data parallelism

Component Parallelism
Component parallelism is achieved when different instances of the same component run on separate data sets. Component parallelism scales with the number of branches of a graph: the more branches a graph has, the greater the component parallelism. If a graph has only one branch, component parallelism cannot occur.

Pipeline Parallelism
Pipeline parallelism occurs when several connected program components on the same branch of a graph execute simultaneously. In this kind of parallelism, the processing stages of the graph run concurrently.

Data Parallelism
Data parallelism occurs when data is divided into segments or partitions and multiple instances of program components run simultaneously, one on each partition.
[Figure: Expanded View and Linear View of a data-parallel graph]

Data Parallelism: Multifiles
* A multifile is a global view of a set of ordinary files called partitions, usually located on different disks or systems
* Ab Initio provides shell-level utilities called m_ commands for handling multifiles (copy, delete, move, etc.)
* Multifiles reside in multidirectories
* Each is represented using URL notation with mfile as the protocol part:
mfile://pluto.us.com/usr/ed/mfs1/new.dat

A Multifile
A file spanning across partitions on the same or different hosts:
mfile://host1/u/jo/mfs/mydir/myfile.dat
* Control partition: //host1/u1/jo/mfs/mydir/myfile.dat
* Data partition on Host1: //host1/vol4/pA/mydir/myfile.dat
* Data partition on Host2: //host2/vol3/pB/mydir/myfile.dat
* Data partition on Host3: //host3/vol7/pC/mydir/myfile.dat

Data Partitioning Components
Data can be partitioned using
* Partition by Round-robin
* Partition by Key
* Broadcast
* Partition by Expression
* Partition by Range
* Partition by Percentage
* Partition by Load Balance

Round-robin Partition
* Writes records to each partition evenly
* Block-size records go into one partition before moving on to the next.
[Figure: input records distributed in turn across Partition 0, Partition 1 and Partition 2]
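
A short Python sketch of round-robin partitioning with a block size is given below. The function name and its parameters are hypothetical; the records are single letters for readability.

```python
def partition_round_robin(records, partitions, block_size=1):
    """Send block_size consecutive records to one partition, then move on."""
    out = [[] for _ in range(partitions)]
    for index, record in enumerate(records):
        out[(index // block_size) % partitions].append(record)
    return out


parts = partition_round_robin(list("ABCDEFG"), partitions=3)
blocks = partition_round_robin(list("ABCDEFG"), partitions=3, block_size=2)
```

Partition sizes differ by at most one block, regardless of the record contents.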

A Data Parallel Application: The Global View

Partitioning by Key
[Figure: input records distributed across Partition 0, Partition 1 and Partition 2 according to their key]
A hash code computed using the key determines which partition a record will be written to, meaning that records with the same key value will go to the same partition.
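
The hash-based routing can be sketched in Python as follows. CRC32 is used here only because it is a deterministic hash available in the standard library; the hash function Ab Initio actually uses is not given in the source.

```python
import zlib


def partition_by_key(records, key, partitions):
    """Route each record to the partition chosen by hashing its key value."""
    out = [[] for _ in range(partitions)]
    for record in records:
        code = zlib.crc32(str(record[key]).encode("utf-8"))
        out[code % partitions].append(record)
    return out


# Illustrative records keyed on a single letter.
records = [{"k": letter, "n": i} for i, letter in enumerate("ABACDBEA")]
parts = partition_by_key(records, "k", partitions=3)
```

Unlike round-robin, the partitions may be uneven in size, but every record with a given key lands in the same partition.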

Departitioning Components
Gather
* Reads data records from the flows connected to the input port
* Combines the records arbitrarily and writes them to the output
Concatenate
* Appends multiple flow partitions of data records one after another
Merge
* Combines data records from multiple flow partitions that have been sorted on a key
* Maintains the sort order
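
The difference between the three departitioning styles can be seen in this Python sketch. Interleaving is used as one possible "arbitrary" order for gather; the real order depends on which records arrive first. All function names are illustrative.

```python
import heapq
from itertools import chain


def gather(partitions):
    """Combine records in an arbitrary order (here: simple interleaving)."""
    out = []
    iterators = [iter(p) for p in partitions]
    while iterators:
        for it in list(iterators):
            try:
                out.append(next(it))
            except StopIteration:
                iterators.remove(it)
    return out


def concatenate(partitions):
    """Append each partition in full, one after another."""
    return list(chain.from_iterable(partitions))


def merge(partitions, key=None):
    """Combine partitions that are already sorted, keeping the sort order."""
    return list(heapq.merge(*partitions, key=key))


# Each illustrative partition is already sorted.
sorted_parts = [[1, 4, 7], [2, 5], [3, 6, 9]]
gathered = gather(sorted_parts)
concatenated = concatenate(sorted_parts)
merged = merge(sorted_parts)
```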

Factors: Phases & Checkpoints
Phasing:
* Breaking an application into phases limits the contention for main memory and processor(s).
* Breaking an application into phases costs disk space.
Checkpoint
* Purpose: provides the same functionality as a phase
* Additional: provides restart capability
How does it work?
* At job start, output datasets are copied to temporary files (in .WORK-serial or .WORK-parallel directories)
* At checkpoint completion, intermediate datasets and job state are stored in temporary files
* Recovery information is stored in host and vnode directories represented by AB_WORK_DIR, defined in the Ab Initio environment

AB_WORK_DIR
* Directory dedicated to Co>Ops
* Should have enough free space; cannot be NFS or NAS mounted
* Holds internal log files (used in recovery of Ab Initio graphs)
* Used when components are connected via named pipes
Sub-directories of AB_WORK_DIR
* host: holds control node recovery files
* vnode: holds processing node recovery files
* data: holds files for layouts
* cache: holds cache files needed by remote components
Notes
* The host and vnode directories hold important logging information and usually do not contain data files.
* For components with host layouts or database layouts, data is written to the data subdirectory.
* If AB_WORK_DIR fills up, Ab Initio jobs cannot be recovered.

Performance: Debugging Log File
A sample log file ..

Performance: Debugging Log File (contd.)
Reading the log: CPU
* CPU time: total processing for component
* Status: [Running : Finished]
* Skew: among CPU times of each partition
* Vertex: component
Reading the log: DATA
* Data bytes: # processed
* Records: # processed
* Status: [unopened : opened : closed]
* Skew: among data bytes in partitions
* Flow: link between components (data tracking info is displayed on flows in the GDE)
* Vertex: component
* Port: of component
Interpreting the log
* Compute data bytes/sec through a component, in each partition
* Look for serialization: effective CPU = (CPU time) / (elapsed time)
* Compare open vs. closed partitions: processing is serialized when some partitions remain open long after others have closed (data skew)
* Deadlock: no change in record counts over a couple of intervals
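
The three checks under "Interpreting the log" reduce to simple arithmetic, sketched here in Python. The functions take plain numbers rather than parsing a real log, since the log layout is not shown in the source; all names are hypothetical.

```python
def effective_cpu(cpu_seconds, elapsed_seconds):
    """Effective CPU = (CPU time) / (elapsed time)."""
    return cpu_seconds / elapsed_seconds


def throughput(data_bytes, elapsed_seconds):
    """Data bytes per second through a component partition."""
    return data_bytes / elapsed_seconds


def is_stalled(record_counts, intervals=2):
    """True when the record count has not changed over the last intervals."""
    recent = record_counts[-(intervals + 1):]
    return len(recent) == intervals + 1 and len(set(recent)) == 1
```

A low effective CPU on a component that should be busy hints at serialization, and a stalled record count over several intervals is the deadlock symptom listed above.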

Performance: In a Nutshell
* Avoid Sorts, as they consume more memory.
* Avoid components like Join with DB (which hits the database for each and every record).
* Use lookups.
* Use in-memory Join/Rollup.
* Assign the driving port of Join correctly.
* Filter out data that is not required before processing.
* Use phasing.
THANK YOU