Workflow Administration Guide

Informatica PowerCenter® (Version 7.1.1)

Informatica PowerCenter Workflow Administration Guide

Version 7.1.1

August 2004

Copyright (c) 1998–2004 Informatica Corporation. All rights reserved. Printed in the USA.

This software and documentation contain proprietary information of Informatica Corporation. They are provided under a license agreement containing restrictions on use and disclosure and are also protected by copyright law. Reverse engineering of the software is prohibited. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording, or otherwise) without prior consent of Informatica Corporation.

Use, duplication, or disclosure of the Software by the U.S. Government is subject to the restrictions set forth in the applicable software license agreement as provided in DFARS 227.7202-1(a) and 227.7202-3(a) (1995), DFARS 252.227-7013(c)(1)(ii) (OCT 1988), FAR 12.212(a) (1995), FAR 52.227-19, or FAR 52.227-14 (ALT III), as applicable.

The information in this document is subject to change without notice. If you find any problems in the documentation, please report them to us in writing. Informatica Corporation does not warrant that this documentation is error free.

Informatica, PowerMart, PowerCenter, PowerChannel, PowerCenter Connect, MX, and SuperGlue are trademarks or registered trademarks of Informatica Corporation in the United States and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks of their respective owners.

Portions of this software are copyrighted by DataDirect Technologies, 1999-2002.

Informatica PowerCenter products contain ACE (TM) software copyrighted by Douglas C. Schmidt and his research group at Washington University and University of California, Irvine, Copyright (c) 1993-2002, all rights reserved.

Portions of this software contain copyrighted material from The JBoss Group, LLC. Your right to use such materials is set forth in the GNU Lesser General Public License Agreement, which may be found at http://www.opensource.org/licenses/lgpl-license.php. The JBoss materials are provided free of charge by Informatica, “as-is”, without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose.

Portions of this software contain copyrighted material from Meta Integration Technology, Inc. Meta Integration® is a registered trademark of Meta Integration Technology, Inc.

This product includes software developed by the Apache Software Foundation (http://www.apache.org/). The Apache Software is Copyright (c) 1999-2004 The Apache Software Foundation. All rights reserved.

DISCLAIMER: Informatica Corporation provides this documentation “as is” without warranty of any kind, either express or implied, including, but not limited to, the implied warranties of non-infringement, merchantability, or use for a particular purpose. The information provided in this documentation may include technical inaccuracies or typographical errors. Informatica could make improvements and/or changes in the products described in this documentation at any time without notice.

Table of Contents

List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxv

List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxi

Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxv

New Features and Enhancements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxvi

PowerCenter 7.1.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxvi

PowerCenter 7.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxviii

PowerCenter 7.0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xlii

About Informatica Documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xlviii

About this Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xlix

Document Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xlix

Other Informatica Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . l

Visiting Informatica Customer Portal . . . . . . . . . . . . . . . . . . . . . . . . . . . l

Visiting the Informatica Webzine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . l

Visiting the Informatica Web Site . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . l

Visiting the Informatica Developer Network . . . . . . . . . . . . . . . . . . . . . . l

Obtaining Technical Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . li

Chapter 1: Understanding the Server Architecture . . . . . . . . . . . . . . . 1

Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

Workflow Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

Pipeline Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

PowerCenter Server Connectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

Running a Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

Load Manager Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

Managing Workflow Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

Locking and Reading the Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

Reading the Parameter File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

Creating the Workflow Log File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

Running Workflow Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

Distributing Sessions to Worker Servers . . . . . . . . . . . . . . . . . . . . . . . . . . 9

Starting the DTM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

Running Sessions from Master Servers . . . . . . . . . . . . . . . . . . . . . . . . . . 10

Writing Historical Information to the Repository . . . . . . . . . . . . . . . . . . 10

Sending Post-Session Email . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

Data Transformation Manager (DTM) Process . . . . . . . . . . . . . . . . . . . . . . . 11

Reading the Session Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

Expanding Variables and Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

Creating the Session Log File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

Validating Code Pages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

Verifying Connection Object Permissions . . . . . . . . . . . . . . . . . . . . . . . 12

Running Pre-Session Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

Running the Processing Threads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

Running Post-Session Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

Sending Post-Session Email . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

Understanding Processing Threads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

Thread Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

Threads and Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

PowerCenter Server Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

Reading Source Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

Blocking Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

Block Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

System Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

CPU Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

Load Manager Shared Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

DTM Buffer Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

Cache Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

Code Pages and Data Movement Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

ASCII Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

Unicode Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

Output Files and Caches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

PowerCenter Server Log . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

Workflow Log File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

Session Log File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

Session Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

Performance Detail File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

Reject Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

Row Error Logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

Recovery Tables and Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

Control File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

Email . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

Indicator File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

Output File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

Cache Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

Chapter 2: Configuring the Workflow Manager . . . . . . . . . . . . . . . . . 37

Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

Setting the Date/Time Display Format . . . . . . . . . . . . . . . . . . . . . . . . . 38

Customizing the Workflow Manager Options . . . . . . . . . . . . . . . . . . . . . . . . 39

Configuring General Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

Configuring Format Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

Configuring Miscellaneous Options . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

Enabling Enhanced Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

Registering the PowerCenter Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

Server Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

Steps for Registering a PowerCenter Server . . . . . . . . . . . . . . . . . . . . . . 48

Deleting a PowerCenter Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

Configuring Connection Object Permissions . . . . . . . . . . . . . . . . . . . . . . . . 51

Connection Object Permissions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

Setting Up a Relational Database Connection . . . . . . . . . . . . . . . . . . . . . . . 53

Database Connect Strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

Database Connection Code Pages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

Configuring Environment SQL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

Configuring a Relational Database Connection . . . . . . . . . . . . . . . . . . . 56

Deleting Connection Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

Copying a Relational Database Connection . . . . . . . . . . . . . . . . . . . . . . 59

Replacing a Relational Database Connection . . . . . . . . . . . . . . . . . . . . . . . . 62

Chapter 3: Using the Workflow Manager . . . . . . . . . . . . . . . . . . . . . . 65

Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

Workflow Manager Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

Workflow Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

Workflow Manager Windows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

Navigating the Workspace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

Customizing Workflow Manager Windows . . . . . . . . . . . . . . . . . . . . . . 69

Using Toolbars . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

Searching for Items . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

Arranging Objects in the Workspace . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

Zooming the Workspace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

Working with Repository Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

Viewing Object Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

Entering Descriptions for Repository Objects . . . . . . . . . . . . . . . . . . . . . 73

Renaming Repository Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

Checking Out and In Versioned Repository Objects . . . . . . . . . . . . . . . . . . . 74

Checking Out Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

Checking In Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

Searching For Versioned Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

Copying Repository Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

Copying Sessions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

Copying Workflow Segments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

Comparing Repository Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

Steps for Comparing Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

Working with Metadata Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

Creating a Metadata Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

Editing a Metadata Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

Deleting a Metadata Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

Keyboard Shortcuts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

Chapter 4: Working with Workflows . . . . . . . . . . . . . . . . . . . . . . . . . . 87

Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

Workflow Privileges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

Developing Workflows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

Creating a New Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

Adding Tasks to Workflows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

Working with Links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

Using the Expression Editor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

Deleting a Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

Editing a Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

Using the Workflow Wizard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

Step 1. Assign a Name and PowerCenter Server to the Workflow . . . . . . . 99

Step 2. Create a Session . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

Step 3. Schedule a Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

Using Workflow Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

Pre-Defined Workflow Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

User-Defined Workflow Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

Scheduling a Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

Creating a Reusable Scheduler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114

Configuring Scheduler Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114

Editing Scheduler Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

Disabling Workflows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

Validating a Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

Expression Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

Task Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

Workflow Properties Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120

Running Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120

Running the Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122

Selecting a Server to Run the Workflow . . . . . . . . . . . . . . . . . . . . . . . . 122

Assigning the PowerCenter Server to a Workflow . . . . . . . . . . . . . . . . . 122

Running a Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

Running a Part of a Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

Running a Task in the Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

Suspending the Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

Configuring Suspension Email . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128

Stopping or Aborting the Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

Server Handling of Stop and Abort . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

Stopping or Aborting a Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

Chapter 5: Working with Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131

Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

Creating a Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

Creating a Task in the Task Developer . . . . . . . . . . . . . . . . . . . . . . . . . 133

Creating a Task in the Workflow or Worklet Designer . . . . . . . . . . . . . 133

Configuring Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135

Reusable Workflow Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135

AND or OR Input Links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137

Disabling Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137

Failing Parent Workflow or Worklet . . . . . . . . . . . . . . . . . . . . . . . . . . 138

Validating Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139

Working with the Assignment Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140

Working with the Command Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

Using Session Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

Creating a Command Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

Executing Commands in the Command Task . . . . . . . . . . . . . . . . . . . . 145

Working with the Control Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

Working with the Decision Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149

Using the Decision Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149

Creating a Decision Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

Working with Event Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153

Example of User-Defined Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153

Working with Event-Raise Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154

Working With Event-Wait Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156

Working with the Timer Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161

Chapter 6: Working with Worklets . . . . . . . . . . . . . . . . . . . . . . . . . . . 163

Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164

Suspending Worklets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164

Developing a Worklet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165

Creating a Reusable Worklet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165

Creating a Non-Reusable Worklet . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165

Configuring Worklet Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166

Adding Tasks in Worklets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166

Nesting Worklets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167

Using Worklet Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169

Persistent Worklet Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169

Overriding Initial Value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169

Validating Worklets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171

Chapter 7: Working with Sessions . . . . . . . . . . . . . . . . . . . . . . . . . . 173

Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174

Creating a Session Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175

Session Privileges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175

Steps to Create a Session Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175

Editing a Session . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177

Edit Session Privilege . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178

Applying Attributes to All Instances . . . . . . . . . . . . . . . . . . . . . . . . . . 178

Creating a Session Configuration Object . . . . . . . . . . . . . . . . . . . . . . . . . . 183

Using Pre- and Post-Session SQL Commands . . . . . . . . . . . . . . . . . . . . . . . 186

Guidelines for Entering Pre- and Post-Session SQL Commands . . . . . . . 186

Error Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186

Using Pre- or Post-Session Shell Commands . . . . . . . . . . . . . . . . . . . . . . . . 188

Using Server and Session Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . 188

Configuring Non-Reusable Shell Commands . . . . . . . . . . . . . . . . . . . . 189

Configuring Reusable Shell Commands . . . . . . . . . . . . . . . . . . . . . . . . 192

Using Server Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193

Pre-Session Shell Command Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . 193

Using Post-Session Email . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194

Validating a Session . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195

Validating Multiple Sessions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196

Running the Session . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197

Selecting a Server to Run the Session . . . . . . . . . . . . . . . . . . . . . . . . . . 197

Assigning the PowerCenter Server to a Session . . . . . . . . . . . . . . . . . . . 198

Stopping and Aborting a Session . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200

Threshold Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200

Fatal Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200

ABORT Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201

User Command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201

PowerCenter Server Handling for Session Failure . . . . . . . . . . . . . . . . . 201

Mapping Parameters and Variables in Sessions . . . . . . . . . . . . . . . . . . . . . . 203

Handling High Precision Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204

Chapter 8: Working with Sources . . . . . . . . . . . . . . . . . . . . . . . . . . . 207

Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208

Globalization Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208

Source Connections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208

Permissions and Privileges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208

Allocating Buffer Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208

Partitioning Sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209

Configuring Sources in a Session . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210

Configuring Readers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210

Configuring Connections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211

Configuring Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212

Working with Relational Sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214

Selecting the Source Database Connection . . . . . . . . . . . . . . . . . . . . . . 214

Defining the Treat Source Rows As Property . . . . . . . . . . . . . . . . . . . . 214

Configuring the Table Owner Name . . . . . . . . . . . . . . . . . . . . . . . . . . 216

Overriding the SQL Query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216

Working with File Sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218

Configuring Source Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218

Configuring Fixed-Width File Properties . . . . . . . . . . . . . . . . . . . . . . . 220

Configuring Delimited File Properties . . . . . . . . . . . . . . . . . . . . . . . . . 222

Configuring Line Sequential Buffer Length . . . . . . . . . . . . . . . . . . . . . 225

Server Handling for File Sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226

Character Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226

Multibyte Character Error Handling . . . . . . . . . . . . . . . . . . . . . . . . . . 227

Null Character Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227

Row Length Handling for Fixed-Width Flat Files . . . . . . . . . . . . . . . . . 228

Numeric Data Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229

Using a File List . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230

Creating the File List . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230

Configuring a Session to Use a File List . . . . . . . . . . . . . . . . . . . . . . . . 231

Chapter 9: Working with Targets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233

Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234

Globalization Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234

Target Connections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235

Partitioning Targets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235

Permissions and Privileges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235

Configuring Targets in a Session . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236

Configuring Writers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236

Configuring Connections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237

Configuring Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239

Working with Relational Targets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240

Target Database Connection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241

Target Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241

Truncating Target Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245

Deadlock Retry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246

Dropping and Recreating Indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248

Constraint-Based Loading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248

Bulk Loading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252

Table Name Prefix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254

Reserved Words . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255

Working with Target Connection Groups . . . . . . . . . . . . . . . . . . . . . . . . . . 257

Working with Active Sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259

Working with File Targets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261

Configuring Target Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261

Configuring Fixed-Width Properties . . . . . . . . . . . . . . . . . . . . . . . . . . 265

Configuring Delimited Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266

Server Handling for File Targets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268

Writing to Fixed-Width Flat Files with Relational Target Definitions . . 268

Writing to Fixed-Width Files with Flat File Target Definitions . . . . . . . 269

Writing Multibyte Data to Fixed-Width Flat Files . . . . . . . . . . . . . . . . 270

Null Characters in Fixed-Width Files . . . . . . . . . . . . . . . . . . . . . . . . . 272

Character Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272

Writing Metadata to Flat File Targets . . . . . . . . . . . . . . . . . . . . . . . . . 273

Working with Heterogeneous Targets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274

Chapter 10: Understanding Commit Points . . . . . . . . . . . . . . . . . . . 275

Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276

Target-Based Commits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277

Source-Based Commits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278

Determining the Commit Source . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278

Switching from Source-Based to Target-Based Commit . . . . . . . . . . . . . 280

User-Defined Commits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283

Rolling Back Transactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284

Understanding Transaction Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287

Transformation Scope. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287

Understanding Transaction Control Units . . . . . . . . . . . . . . . . . . . . . . 289

Rules and Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290

Setting Commit Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292

Chapter 11: Recovering Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295

Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296

Preparing for Recovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297

Configuring the Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297

Configuring the Session . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297

Configuring the Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298

Configuring the Target Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298

Creating pmcmd Scripts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300

Working with Repeatable Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301

Recovering a Suspended Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305

Recovering a Suspended Workflow with Sequential Sessions . . . . . . . . . 305

Recovering a Suspended Workflow with Concurrent Sessions . . . . . . . . 306

Steps for Recovering a Suspended Workflow . . . . . . . . . . . . . . . . . . . . . 307

Recovering a Failed Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308

Recovering a Failed Workflow with Sequential Sessions . . . . . . . . . . . . . 308

Recovering a Failed Workflow with Concurrent Sessions . . . . . . . . . . . . 309

Steps for Recovering a Failed Workflow . . . . . . . . . . . . . . . . . . . . . . . . 310

Recovering a Session Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311

Recovering Sequential Sessions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311

Recovering Concurrent Sessions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311

Steps for Recovering a Session Task . . . . . . . . . . . . . . . . . . . . . . . . . . . 312

Server Handling for Recovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314

Verifying Recovery Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314

Running Recovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314

Completing Unrecoverable Sessions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316

Chapter 12: Sending Email . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319

Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320

Configuring Email on UNIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321

Configuring Email on Windows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322

Step 1. Verify the Informatica Service Startup Account . . . . . . . . . . . . . 322

Step 2. Configure a Microsoft Outlook User . . . . . . . . . . . . . . . . . . . . 322

Step 3. Configure Logon Network Security . . . . . . . . . . . . . . . . . . . . . 325

Step 4. Create Distribution Lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326

Step 5. Configure the PowerCenter Server Setup . . . . . . . . . . . . . . . . . 327

Working with Email Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328

Email Address Tips and Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . 328

Steps to Create an Email Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329

Working with Post-Session Email . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332

Using Server Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333

Email Variables and Format Tags . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333

Configuring Post-Session Email . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334

Sample Email . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337

Working with Suspension Email . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339

Using Email Tasks in a Workflow or Worklet . . . . . . . . . . . . . . . . . . . . . . . 341

Tips . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 342

Chapter 13: Pipeline Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . 345

Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346

Partition Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346

Number of Partitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348

Partition Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348

Configuring Partitioning Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351

Adding and Deleting Partition Points . . . . . . . . . . . . . . . . . . . . . . . . . 353

Adding and Deleting Partitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356

Entering Partition Descriptions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356

Specifying Partition Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356

Adding Keys and Key Ranges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 358

Cache Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359

Round-Robin Partition Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360

Hash Keys Partition Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361

Hash Auto-Keys . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361

Hash User Keys . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362

Key Range Partition Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363

Adding a Partition Key . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364

Adding Key Ranges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365

Adding Filter Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366

Pass-Through Partition Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367

Database Partitioning Partition Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369

Partitioning Relational Sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371

Entering an SQL Query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371

Entering a Filter Condition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372

Partitioning File Sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 374

Guidelines for Partitioning File Sources . . . . . . . . . . . . . . . . . . . . . . . . 374

Using One Thread to Read a File Source . . . . . . . . . . . . . . . . . . . . . . . 375

Using Multiple Threads to Read a File Source . . . . . . . . . . . . . . . . . . . 375

Configuring for File Partitioning. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375

Partitioning Relational Targets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378

Database Compatibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379

Partitioning File Targets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380

Configuring Connection Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380

Configuring File Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381

Partitioning Joiner Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 384

Partitioning Sorted Joiner Transformations . . . . . . . . . . . . . . . . . . . . . 384

Using Sorted Flat Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385

Using Sorted Relational Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 387

Using Sorter Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389

Optimizing Sorted Joiner Transformations with Partitions . . . . . . . . . . 390

Partitioning Lookup Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 391

Partitioning Sorter Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 392

Configuring Sorter Transformation Work Directories . . . . . . . . . . . . . . 392

Mapping Variables in Partitioned Pipelines. . . . . . . . . . . . . . . . . . . . . . . . . 394

Partitioning Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395

Restrictions on the Number of Partitions . . . . . . . . . . . . . . . . . . . . . . . 395

Partition Restrictions for Editing Objects . . . . . . . . . . . . . . . . . . . . . . . 396

Partition Restrictions for Informatica Application Products . . . . . . . . . . 397

Partitioning Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398

Chapter 14: Monitoring Workflows . . . . . . . . . . . . . . . . . . . . . . . . . . 401

Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 402

Permissions and Privileges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403

Using the Workflow Monitor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404

Opening the Workflow Monitor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404

Connecting to Repositories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405

Connecting to PowerCenter Servers . . . . . . . . . . . . . . . . . . . . . . . . . . . 405

Filtering Tasks and Servers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405

Opening and Closing Folders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407

Viewing Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 408

Viewing Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 408

Customizing Workflow Monitor Options . . . . . . . . . . . . . . . . . . . . . . . . . . 409

Configuring General Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 409

Configuring Gantt Chart View Options . . . . . . . . . . . . . . . . . . . . . . . . 411

Configuring Task View Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 412

Configuring Advanced Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 412

Using Workflow Monitor Toolbars . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415

Working with Tasks and Workflows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 416

Running a Task, Workflow, or Worklet . . . . . . . . . . . . . . . . . . . . . . . . 416

Resuming a Workflow or Worklet . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417

Recovering a Workflow or Worklet . . . . . . . . . . . . . . . . . . . . . . . . . . . 417

Stopping or Aborting Tasks and Workflows . . . . . . . . . . . . . . . . . . . . . 418

Scheduling and Unscheduling Workflows . . . . . . . . . . . . . . . . . . . . . . . 418

Viewing Session Logs and Workflow Logs . . . . . . . . . . . . . . . . . . . . . . 419

Viewing History Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419

Workflow and Task Status . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421

Using the Gantt Chart View . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423

Organizing Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423

Listing Tasks and Workflows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424

Navigating the Time Window in Gantt Chart View . . . . . . . . . . . . . . . 425

Zooming the Gantt Chart View . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 426

Performing a Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427

Opening All Folders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429

Using the Task View . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 430

Filtering in Task View . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431

Opening All Folders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 433

Monitoring Session Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 434

Creating and Viewing Performance Details . . . . . . . . . . . . . . . . . . . . . . . . 436

Enabling Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 436

Viewing Session Performance Details . . . . . . . . . . . . . . . . . . . . . . . . . . 436

Memory Requirement for Performance Details . . . . . . . . . . . . . . . . . . . 437

Understanding Performance Counters . . . . . . . . . . . . . . . . . . . . . . . . . 437

Tips . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 441

Chapter 15: Using Multiple Servers . . . . . . . . . . . . . . . . . . . . . . . . . 443

Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 444

Using Server Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 445

Using a File Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 445

Running Sessions with Cache Files . . . . . . . . . . . . . . . . . . . . . . . . . . . 445

Working with Server Grids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 446

Distributing Sessions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 446

Server Grid Connectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 447

Server Grid Guidelines and Requirements . . . . . . . . . . . . . . . . . . . . . . 448

Configuring Server Grids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 450

Configuring Server Grid Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . 450

Configuring Workflow Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . 450

Configuring Session Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 450

Override Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451

Steps for Creating a Server Grid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451

Chapter 16: Log Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 455

Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 456

Workflow Logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457

Workflow Log Messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458

Configuring Workflow Logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459

Viewing Workflow Logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462

Session Logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463

Session Log Messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463

Load Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 467

Detailed Transformation Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 469

Configuring Session Logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 469

Viewing Session Logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 474

Reject Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 476

Locating Reject Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 476

Reading Reject Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 477

Chapter 17: Row Error Logging . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481

Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 482

Error Log Code Pages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 482

Understanding the Error Log Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 483

PMERR_DATA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 483

PMERR_MSG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485

PMERR_SESS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 486

PMERR_TRANS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 487

Understanding the Error Log File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489

Configuring Error Log Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493

Chapter 18: Session Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495

Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 496

Session Log Parameter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 497

Changing the Session Log Name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 497

Changing the Session Log Name and Location . . . . . . . . . . . . . . . . . . . 498

Steps for Using $PMSessionLogFile . . . . . . . . . . . . . . . . . . . . . . . . . . . 498

Database Connection Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499

Source File Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 502

Changing the Source File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 502

Changing the Source File and Directory . . . . . . . . . . . . . . . . . . . . . . . . 503

Steps for Using a Source File Parameter . . . . . . . . . . . . . . . . . . . . . . . . 503

Target File Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 504

Changing the Target File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 504

Changing the Target File and Directory . . . . . . . . . . . . . . . . . . . . . . . . 505

Steps for Using a Target File Parameter . . . . . . . . . . . . . . . . . . . . . . . . 505

Lookup File Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 506

Changing the Lookup File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 506

Changing the Lookup File and Directory . . . . . . . . . . . . . . . . . . . . . . . 507

Steps for Using a Lookup File Parameter . . . . . . . . . . . . . . . . . . . . . . . 507

Reject File Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 508

Changing the Reject File Name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 508

Changing the Reject File and Directory . . . . . . . . . . . . . . . . . . . . . . . . 509

Steps for Using a Reject File Parameter . . . . . . . . . . . . . . . . . . . . . . . . 509

Tips . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 510

Chapter 19: Parameter Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 511

Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 512

Parameter File Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 513

Guidelines for Creating Parameter Files . . . . . . . . . . . . . . . . . . . . . . . . . . . 515

Sample Parameter File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 517

Configuring the Parameter File Location . . . . . . . . . . . . . . . . . . . . . . . . . . 518

Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 520

Tips . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 521

Chapter 20: External Loading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 523

Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 524

External Loader Permissions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 525

Permissions and Privileges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 525

External Loader Behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 526

Loading Data Using Named Pipes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 526

Staging Data to Flat Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 526

Partitioning Sessions with External Loaders . . . . . . . . . . . . . . . . . . . . . 526

Errors and Error Messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 527

Loading to DB2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 528

Setting DB2 External Loader Operation Modes . . . . . . . . . . . . . . . . . . 528

Configuring Authorities, Privileges, and Permissions . . . . . . . . . . . . . . 528

Configuring DB2 EE External Loader Attributes . . . . . . . . . . . . . . . . . 529

Configuring DB2 EEE External Loader Attributes . . . . . . . . . . . . . . . . 530

Loading to Oracle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 533

Loading Multibyte Data to Oracle . . . . . . . . . . . . . . . . . . . . . . . . . . . . 533

Oracle External Loader Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 533

Reject File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 534

Loading to Sybase IQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 535

Using Sybase IQ External Loader on UNIX . . . . . . . . . . . . . . . . . . . . . 535

Loading Multibyte Data to Sybase IQ . . . . . . . . . . . . . . . . . . . . . . . . . 535

Sybase IQ External Loader Attributes . . . . . . . . . . . . . . . . . . . . . . . . . 536

Loading to Teradata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 538

Overriding the Control File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 539

Teradata MultiLoad External Loader Attributes . . . . . . . . . . . . . . . . . . 540

Teradata TPump External Loader Attributes . . . . . . . . . . . . . . . . . . . . . 542

Teradata FastLoad External Loader Attributes . . . . . . . . . . . . . . . . . . . . 545

Teradata Warehouse Builder External Loader Attributes . . . . . . . . . . . . 547

Creating an External Loader Connection . . . . . . . . . . . . . . . . . . . . . . . . . . 551

Configuring External Loading in a Session . . . . . . . . . . . . . . . . . . . . . . . . . 553

Configuring a Session to Write to a File . . . . . . . . . . . . . . . . . . . . . . . . 553

Configuring File Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 554

Selecting an External Loader Connection . . . . . . . . . . . . . . . . . . . . . . . 555

Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 557

Chapter 21: Using FTP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 559

Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 560

Mainframe Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 560

Creating an FTP Connection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 561

FTP Permissions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 561

Steps for Creating an FTP Connection . . . . . . . . . . . . . . . . . . . . . . . . 562

Creating an FTP Session . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 565

FTP File Sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 565

FTP File Targets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 568

Chapter 22: Using Incremental Aggregation . . . . . . . . . . . . . . . . . . 573

Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 574

PowerCenter Server Processing for Incremental Aggregation . . . . . . . . . . . . 575

Reinitializing the Aggregate Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 576

Moving or Deleting the Aggregate Files . . . . . . . . . . . . . . . . . . . . . . . . . . . 577

Finding Index and Data Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 577

Partitioning Guidelines with Incremental Aggregation . . . . . . . . . . . . . . . . 578

Preparing for Incremental Aggregation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 579

Configuring the Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 579

Configuring the Session . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 579

Chapter 23: Using pmcmd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 581

Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 582

Configuring Environment Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 585

Configuring PM_CODEPAGENAME. . . . . . . . . . . . . . . . . . . . . . . . . 585

Configuring PMTOOL_DATEFORMAT . . . . . . . . . . . . . . . . . . . . . . 585

Configuring Repository Username and Password . . . . . . . . . . . . . . . . . 586

Configuring PM_HOME . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 587

Using the Command Line Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 589

Connecting to the PowerCenter Server in the Command Line Mode . . . 589

pmcmd Return Codes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 590

Using the Interactive Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 592

Connecting to the PowerCenter Server in the Interactive Mode . . . . . . . 592

Setting Defaults in the Interactive Mode . . . . . . . . . . . . . . . . . . . . . . . 593

pmcmd Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 594

Command Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 594

Using Quotation Marks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 595

Syntax Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 595

Aborttask . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 596

Abortworkflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 597

Connect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 597

Disconnect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 598

Exit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 598

Getrunningsessionsdetails . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 598

Getserverdetails . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 599

Getserverproperties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 599

Getsessionstatistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 600

Gettaskdetails . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 601

Getworkflowdetails . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 601

Help . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 602

Pingserver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 602

Quit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 602

Resumeworkflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 603

Resumeworklet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 603

Scheduleworkflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 604

Setfolder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 604

Setnowait . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 605

Setwait . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 605

Showsettings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 605

Shutdownserver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 605

Starttask . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 606

Startworkflow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 607

Stoptask . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 609

Stopworkflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 609

Unscheduleworkflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 610

Unsetfolder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 610

Version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 611

Waittask . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 611

Waitworkflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 611

Chapter 24: Session Caches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 613

Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 614

Memory Cache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 614

Cache Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 615

Determining Cache Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 617

Cache Calculations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 617

Cache Column Sizes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 618

Cache Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 620

Aggregator Caches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 621

Calculating the Aggregator Index Cache . . . . . . . . . . . . . . . . . . . . . . . . 621

Calculating the Aggregator Data Cache . . . . . . . . . . . . . . . . . . . . . . . . 622

Joiner Caches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 624

Calculating the Number of Master Rows . . . . . . . . . . . . . . . . . . . . . . . 625

Calculating the Joiner Index Cache . . . . . . . . . . . . . . . . . . . . . . . . . . . 625

Calculating the Joiner Data Cache . . . . . . . . . . . . . . . . . . . . . . . . . . . . 626

Lookup Caches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 628

Static Cache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 628

Dynamic Cache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 628

Sharing Partitioned Caches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 629

Calculating the Lookup Index Cache . . . . . . . . . . . . . . . . . . . . . . . . . . 629

Calculating the Lookup Data Cache . . . . . . . . . . . . . . . . . . . . . . . . . . 631

Rank Caches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 632

Calculating the Rank Index Cache . . . . . . . . . . . . . . . . . . . . . . . . . . . 632

Calculating the Rank Data Cache . . . . . . . . . . . . . . . . . . . . . . . . . . . . 633

Chapter 25: Performance Tuning . . . . . . . . . . . . . . . . . . . . . . . . . . . . 635

Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 636

Identifying the Performance Bottleneck . . . . . . . . . . . . . . . . . . . . . . . . . . . 637

Identifying Target Bottlenecks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 637

Identifying Source Bottlenecks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 637

Identifying Mapping Bottlenecks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 638

Identifying a Session Bottleneck . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 639

Identifying a System Bottleneck . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 640

Optimizing the Target Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 642

Dropping Indexes and Key Constraints . . . . . . . . . . . . . . . . . . . . . . . . 642

Increasing Checkpoint Intervals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 642

Bulk Loading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 642

External Loading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 643

Increasing Database Network Packet Size . . . . . . . . . . . . . . . . . . . . . . . 643

Optimizing Oracle Target Databases . . . . . . . . . . . . . . . . . . . . . . . . . . 643

Optimizing the Source Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 645

Optimizing the Query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 645

Using tempdb to Join Sybase and Microsoft SQL Server Tables . . . . . . . 646

Using Conditional Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 646

Increasing Database Network Packet Sizes . . . . . . . . . . . . . . . . . . . . . . 646

Connecting to Oracle Source Databases . . . . . . . . . . . . . . . . . . . . . . . . 646

Optimizing the Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 647

Configuring Single-Pass Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 647

Optimizing Datatype Conversions . . . . . . . . . . . . . . . . . . . . . . . . . . . 648

Eliminating Transformation Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . 648

Optimizing Lookup Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . 649

Optimizing Filter Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . 650

Optimizing Aggregator Transformations . . . . . . . . . . . . . . . . . . . . . . . 650

Optimizing Joiner Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . 651

Optimizing Sequence Generator Transformations . . . . . . . . . . . . . . . . . 652

Optimizing Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 652

Optimizing the Session . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 655

Pipeline Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 655

Allocating Buffer Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 655

Increasing the Cache Sizes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 658

Increasing the Commit Interval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 658

Disabling High Precision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 658

Reducing Error Tracing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 659

Removing Staging Areas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 659

Optimizing the System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 660

Improving Network Speed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 660

Using Multiple PowerCenter Servers . . . . . . . . . . . . . . . . . . . . . . . . . . 661

Using Server Grids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 661

Running the PowerCenter Server in ASCII Data Movement Mode . . . . . 661

Using Additional CPUs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 661

Reducing Paging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 662

Using Processor Binding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 662

Pipeline Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 663

Optimizing the Source Database for Partitioning . . . . . . . . . . . . . . . . . 663

Optimizing the Target Database for Partitioning . . . . . . . . . . . . . . . . . 664

Appendix A: Session Properties Reference . . . . . . . . . . . . . . . . . . . 667

General Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 668

Properties Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 670

General Options Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 670

Performance Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 673

Config Object Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 675

Advanced Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 675

Log Options Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 677

Error Handling Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 678

Mapping Tab (Transformations View) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 681

Connections Node . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 681

Sources Node . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 683

Targets Node . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 692

Mapping Tab (Partitions View) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 705

Partition Properties Node . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 705

KeyRange Node . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 706

HashKeys Node . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 706

Partition Points Node . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 706

Non-Partition Points Node . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 709

Components Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 710

Reusable Pre- or Post-Session Commands . . . . . . . . . . . . . . . . . . . . . . 711

Non-Reusable Pre- or Post-Session Commands . . . . . . . . . . . . . . . . . . 712

Reusable Email . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 714

Non-Reusable Email . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 715

Email Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 715

Metadata Extensions Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 718

Appendix B: Workflow Properties Reference . . . . . . . . . . . . . . . . . . 721

General Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 722

Properties Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 724

Scheduler Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 726

Edit Scheduler Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 727

Variables Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 731

Events Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 732

Metadata Extensions Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 733

Appendix C: Session Properties Comparison Reference . . . . . . . . 735

Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 736

General Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 737

General Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 737

Source Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 738

Target Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 743

Session Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 750

Performance Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 752

Source Location Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 754

Time Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 755

Schedule Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 755

Start Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 756

Duration Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 756

Use Absolute Time Option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 757

Log and Error Handling Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 758

Log File Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 758

Parameter File Option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 759

Batch Handling Option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 759

Error Handling Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 759

Transformations Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 761

Partitions Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 762

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 763

List of Figures

Figure 1-1. PowerCenter Server and Data Movement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

Figure 1-2. Partitioned Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

Figure 1-3. PowerCenter Connectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

Figure 1-4. Thread Creation for a Simple Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

Figure 1-5. Thread Creation for a Pass-through Pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

Figure 1-6. Pipeline Stages in a Mapping With an Unsorted Aggregator Transformation . . . . . 17

Figure 1-7. Pipeline Stages in a Mapping with an Additional Partition Point . . . . . . . . . . . . . . 18

Figure 1-8. Thread Creation for a Mapping with Three Partitions . . . . . . . . . . . . . . . . . . . . . . 18

Figure 1-9. Thread Creation with Joiner Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

Figure 1-10. Thread Creation with a Partition Point at a Joiner Transformation . . . . . . . . . . . 20

Figure 1-11. Target Load Order Groups and Source Pipelines . . . . . . . . . . . . . . . . . . . . . . . . . 22

Figure 1-12. Event Viewer Application Log Message . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

Figure 1-13. Application Log Message Detail . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

Figure 2-1. Workflow Manager General Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

Figure 2-2. Workflow Manager Format Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

Figure 2-3. Copy Wizard, Versioning, and Target Load Type Options . . . . . . . . . . . . . . . . . . 43

Figure 3-1. Sample Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

Figure 3-2. Workflow Manager Windows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

Figure 3-3. Check In Workflow Manager Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

Figure 3-4. Query Browser . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

Figure 3-5. Diff Tool Window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

Figure 4-1. Sample Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

Figure 4-2. Sample Workflow With Two Branches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

Figure 4-3. Valid Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

Figure 4-4. Example of a Loop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

Figure 4-5. Setting Link Condition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

Figure 4-6. Displaying Link Condition in the Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

Figure 4-7. Expression Editor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

Figure 4-8. Expression Editor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

Figure 4-9. Expression Using a Pre-Defined Workflow Variable . . . . . . . . . . . . . . . . . . . . . . 107

Figure 4-10. Status Variable Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

Figure 4-11. PrevTaskStatus Variable Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

Figure 4-12. Sample Workflow Using Workflow Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

Figure 4-13. Schedule tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

Figure 4-14. Customized Repeat Dialog Box . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

Figure 4-15. Example Workflow - Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120

Figure 4-16. Running Part of a Workflow - Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

Figure 5-1. General Tab - Edit Tasks Dialog Box . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135

Figure 5-2. Revert Button in Session Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137

Figure 5-3. Run If Previous Completed Option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146

Figure 5-4. Example Workflow Using a Decision Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .150

Figure 5-5. Example Workflow without a Decision Task . . . . . . . . . . . . . . . . . . . . . . . . . . . .150

Figure 5-6. Expanded Example Workflow Using a Decision Task . . . . . . . . . . . . . . . . . . . . . .151

Figure 5-7. Example of User-Defined Event . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .153

Figure 5-8. Example Workflow Using the Timer Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .161

Figure 6-1. Workflow with Multiple Worklets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .167

Figure 6-2. Workflow with Nested Worklets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .168

Figure 6-3. Example of Persistent Worklet Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .169

Figure 7-1. Session Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .177

Figure 7-2. Session Target Object Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .179

Figure 7-3. Connection Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .181

Figure 7-4. Config Object Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .183

Figure 7-5. Session Configuration Browser . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .184

Figure 7-6. Stop or Continue the Session on Pre- or Post-Session SQL Errors . . . . . . . . . . . . .187

Figure 7-7. Make Reusable Option for Pre-Session Shell Commands . . . . . . . . . . . . . . . . . . . .189

Figure 7-8. Stop or Continue the Session on Pre-Session Shell Command Error . . . . . . . . . . . .193

Figure 7-9. Assign Server Dialog Box . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .198

Figure 8-1. Sources Node of the Session Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .210

Figure 8-2. Readers Settings in the Sources Node of the Mapping Tab . . . . . . . . . . . . . . . . . .211

Figure 8-3. Connections Settings in the Sources Node . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .212

Figure 8-4. Properties Settings in the Sources Node of the Mapping Tab . . . . . . . . . . . . . . . . .213

Figure 8-5. Treat Source Rows As Property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .215

Figure 8-6. Source Table Owner Name Property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .216

Figure 8-7. SQL Query Override Property in the Session Properties . . . . . . . . . . . . . . . . . . . .217

Figure 8-8. Properties Settings in the Sources Node for a Flat File Source . . . . . . . . . . . . . . . .219

Figure 8-9. Flat Files Dialog Box . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .221

Figure 8-10. Fixed-Width File Properties Dialog Box . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .221

Figure 8-11. Flat Files Dialog Box . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223

Figure 8-12. Delimited File Properties Dialog Box . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .223

Figure 8-13. Line Sequential Buffer Length Property for File Sources . . . . . . . . . . . . . . . . . . .225

Figure 9-1. Defining Target Properties in the Session Properties . . . . . . . . . . . . . . . . . . . . . . .236

Figure 9-2. Writers Settings on the Mapping Tab of the Session Properties . . . . . . . . . . . . . . .237

Figure 9-3. Connections Settings on the Mapping Tab of the Session Properties . . . . . . . . . . .238

Figure 9-4. Properties Settings on the Mapping Tab of the Session Properties . . . . . . . . . . . . .239

Figure 9-5. Properties Settings on the Mapping Tab for a Relational Target . . . . . . . . . . . . . .242

Figure 9-6. Test Load Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .244

Figure 9-7. Session Retry on Deadlock . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .247

Figure 9-8. Mapping Using Constraint-Based Loading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .250

Figure 9-9. Properties Settings on the Mapping Tab for a Flat File Target . . . . . . . . . . . . . . . .262

Figure 9-10. Test Load Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .264


Figure 9-12. Fixed Width Properties Dialog Box . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .265


Figure 9-14. Delimited File Properties Dialog Box . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267

Figure 10-1. Mapping with a Single Commit Source . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279

Figure 10-2. Mapping with Multiple Commit Sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280

Figure 10-3. Mapping with Targets Connected to a Commit Source . . . . . . . . . . . . . . . . . . . 281

Figure 10-4. Mapping a Custom Transformation with a Commit Source . . . . . . . . . . . . . . . . 282

Figure 10-5. Roll Back on Failed Commit Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286

Figure 10-6. Transaction Control Units . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290

Figure 10-7. Session Commit Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292

Figure 11-1. Mapping You Can Enable for Recovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303

Figure 11-2. Mapping You Cannot Enable for Recovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304

Figure 11-3. Modified Mapping You Can Enable for Recovery . . . . . . . . . . . . . . . . . . . . . . . 304

Figure 11-4. Resuming a Suspended Workflow with Sequential Sessions . . . . . . . . . . . . . . . . 306

Figure 11-5. Resuming a Suspended Workflow with Concurrent Sessions . . . . . . . . . . . . . . . 307

Figure 11-6. Recovering Part of a Workflow With Sequential Sessions . . . . . . . . . . . . . . . . . . 308

Figure 11-7. Recovering Part of a Workflow with Concurrent Sessions . . . . . . . . . . . . . . . . . 309

Figure 11-8. Recovering Concurrent Sessions Individually . . . . . . . . . . . . . . . . . . . . . . . . . . 312

Figure 12-1. Email Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328

Figure 12-2. Post-Session Email Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332

Figure 12-3. Suspension Email . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339

Figure 12-4. Email Task in a Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341

Figure 12-5. Using Post-Session Commands to Generate Reports . . . . . . . . . . . . . . . . . . . . . 342

Figure 12-6. Using Email Variables to Attach Reports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343

Figure 12-7. Sending Email without Microsoft Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343

Figure 13-1. Default Partition Points and Stages in a Sample Mapping . . . . . . . . . . . . . . . . . 347

Figure 13-2. Threads Created for a Sample Mapping with Three Partitions . . . . . . . . . . . . . . 348

Figure 13-3. Sample Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349

Figure 13-4. Session Properties Partitions View on the Mapping Tab . . . . . . . . . . . . . . . . . . 351

Figure 13-5. Edit Partition Point Dialog Box . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352

Figure 13-6. Sample Mapping Showing Valid Partition Points . . . . . . . . . . . . . . . . . . . . . . . 354

Figure 13-7. Mapping where Round-robin Partitioning Can Increase Performance . . . . . . . . . 360

Figure 13-8. Mapping where Hash Partitioning Can Increase Performance . . . . . . . . . . . . . . 361

Figure 13-9. Edit Partition Key Dialog Box . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362

Figure 13-10. Mapping where Key Range Partitioning Can Increase Performance . . . . . . . . . 363

Figure 13-11. Edit Partition Key Dialog Box . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364

Figure 13-12. Adding Key Ranges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365

Figure 13-13. Mapping where Pass-through Partitioning Can Increase Performance . . . . . . . . 367

Figure 13-14. Overriding the SQL Query and Entering a Filter Condition . . . . . . . . . . . . . . 371

Figure 13-15. Properties Settings for Relational Targets in the Session Properties . . . . . . . . . . 378

Figure 13-16. Connections Settings for File Targets in the Session Properties . . . . . . . . . . . . 381

Figure 13-17. Properties Settings for File Targets in the Session Properties . . . . . . . . . . . . . . 382

Figure 13-18. Sorted File Data with 1:n Partitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386

Figure 13-19. Sorted File Data Passed Through a Single Partition . . . . . . . . . . . . . . . . . . . . . 387

Figure 13-20. Sorted Relational Data with 1:n Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . 388

Figure 13-21. Sorted Relational Data Passed Through a Single Partition . . . . . . . . . . . . . . . . .389

Figure 13-22. Using Sorter Transformations with Hash Auto-Keys to Maintain Sort Order . . .390

Figure 13-23. Session Properties - Configuring Sorter Transformations . . . . . . . . . . . . . . . . . .393

Figure 14-1. Workflow Monitor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .403

Figure 14-2. Workflow Monitor Statistics Dialog Box . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .408

Figure 14-3. General Tab for Workflow Monitor Options . . . . . . . . . . . . . . . . . . . . . . . . . . .410

Figure 14-4. Gantt Chart Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .411

Figure 14-5. Task View Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .412

Figure 14-6. Advanced Tab for Workflow Monitor Options . . . . . . . . . . . . . . . . . . . . . . . . . .413

Figure 14-7. Standard Toolbar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .415

Figure 14-8. Server Toolbar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .415

Figure 14-9. View Toolbar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .415

Figure 14-10. Filter Toolbar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .415

Figure 14-11. History Names Dialog Box . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .420

Figure 14-12. Gantt Chart View . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .423

Figure 14-13. Organizing Gantt Chart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .426

Figure 14-14. Zooming the Gantt Chart View . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .427

Figure 14-15. Task View . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .431

Figure 14-16. Session Properties Transformation Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . .434

Figure 15-1. Distributing Sessions in a Server Grid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .446

Figure 15-2. Running a Non-session Task on the Master Server . . . . . . . . . . . . . . . . . . . . . . .447

Figure 16-1. Properties Settings on the Mapping Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .477

Figure 18-1. Using $PMSessionLogFile as the Name of the Session Log . . . . . . . . . . . . . . . . .497

Figure 18-2. Using Parameters to Change the Session Source File . . . . . . . . . . . . . . . . . . . . . .502

Figure 18-3. Using Parameters to Change the Session Target File . . . . . . . . . . . . . . . . . . . . . .504

Figure 18-4. Using Parameters to Change the Session Lookup File . . . . . . . . . . . . . . . . . . . . .506

Figure 18-5. Using Parameters to Change the Reject File Name . . . . . . . . . . . . . . . . . . . . . . .508

Figure 20-1. Control File Editor Dialog Box for Teradata . . . . . . . . . . . . . . . . . . . . . . . . . . .539

Figure 20-2. Writers Settings on the Mapping Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 553

Figure 20-3. Properties Settings on the Mapping Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .554

Figure 20-4. Connections Settings on the Mapping Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . .556

Figure 22-1. Incremental Aggregation Session Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . .580

Figure 25-1. Single-Pass Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .648

Figure A-1. General Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .668

Figure A-2. Properties Tab - General Options Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .670

Figure A-3. Properties Tab - Performance Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .673

Figure A-4. Config Object Tab - Advanced Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .676

Figure A-5. Config Object Tab - Log Option Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .677

Figure A-6. Config Object Tab - Error Handling Settings . . . . . . . . . . . . . . . . . . . . . . . . . . .679

Figure A-7. Mapping Tab - Connections Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .681

Figure A-8. Mapping Tab - Sources Node - Readers Settings . . . . . . . . . . . . . . . . . . . . . . . . .684

Figure A-9. Mapping Tab - Sources Node - Connections Settings . . . . . . . . . . . . . . . . . . . . . .685

Figure A-10. Mapping Tab - Sources Node - Properties Settings . . . . . . . . . . . . . . . . . . . . . . .686

Figure A-11. Flat Files Dialog Box for Sources

Figure A-12. Fixed Width Properties

Figure A-13. Delimited Properties for File Sources

Figure A-14. Mapping Tab - Targets Node - Writers Settings

Figure A-15. Mapping Tab - Targets Node - Connections Settings

Figure A-16. Mapping Tab - Targets Node - Properties Settings (Relational)

Figure A-17. Mapping Tab - Targets Node - File Properties Settings

Figure A-18. Flat Files Dialog Box for Targets

Figure A-19. Fixed-Width Properties for File Targets

Figure A-20. Delimited Properties for File Targets

Figure A-21. Mapping Tab - Transformations Node

Figure A-22. Mapping Tab - Partitions Properties Node

Figure A-23. Mapping Tab - KeyRange Node

Figure A-24. Mapping Tab - Partition Points Node

Figure A-25. Edit Partition Point Dialog Box

Figure A-26. Edit Partition Key Dialog Box

Figure A-27. Components Tab

Figure A-28. Task Browser

Figure A-29. Edit Pre-Session Command Dialog Box

Figure A-30. Email Object Browser

Figure A-31. On-Success or On-Failure Email - General Tab

Figure A-32. On-Success or On-Failure Email - Properties Tab

Figure A-33. Metadata Extensions Tab

Figure B-1. Workflow Properties - General Tab

Figure B-2. Workflow Properties - Properties Tab

Figure B-3. Workflow Properties - Scheduler Tab

Figure B-4. Workflow Properties - Scheduler Tab - Edit Scheduler Dialog Box

Figure B-5. Workflow Properties - Customized Repeat Dialog Box

Figure B-6. Workflow Properties - Variables Tab

Figure B-7. Workflow Properties - Events Tab

Figure B-8. Workflow Properties - Metadata Extensions Tab

Figure C-1. Server Manager General Tab

Figure C-2. Server Manager Source Options Dialog Box for File Sources

Figure C-3. Server Manager Fixed-Width Properties Dialog Box

Figure C-4. Server Manager Delimited File Properties Dialog Box

Figure C-5. Server Manager Source Options Dialog Box (XML Sources)

Figure C-6. Server Manager FTP Properties Dialog Box

Figure C-7. Server Manager Targets Dialog Box

Figure C-8. Server Manager Output Files Dialog Box

Figure C-9. Server Manager External Loader Properties

Figure C-10. Server Manager Fixed-Width Dialog Box (Output Files)

Figure C-11. Server Manager Delimited File Properties Dialog Box (Output Files)

Figure C-12. Server Manager XML Target Dialog Box

Figure C-13. Server Manager Reject File Dialog Box

Figure C-14. Server Manager Pre-Session Commands Dialog Box

Figure C-15. Server Manager Post-Session Commands and Email

Figure C-16. Server Manager Configuration Parameter Dialog Box

Figure C-17. Server Manager Source Location Tab

Figure C-18. Server Manager Time tab

Figure C-19. Server Manager Repeat Dialog Box

Figure C-20. Server Manager Log and Error Handling Tab

Figure C-21. Server Manager Transformations Tab

List of Tables

Table 1-1. PowerCenter Server Connectivity Requirements

Table 1-2. Processing Threads

Table 2-1. Workflow Manager General Options

Table 2-2. Workflow Manager Format Options

Table 2-3. Workflow Manager Miscellaneous Options

Table 2-4. Default Permissions for Connection Objects

Table 2-5. Server Variables

Table 2-6. TCP/IP Settings to Register a Server

Table 2-7. Native Connect String Syntax

Table 2-8. Source and Target Code Page Compatibility

Table 2-9. Relational Database Connection Information

Table 2-10. Relational Database Connection Attributes

Table 3-1. Metadata Extension Attributes in the Workflow Manager

Table 3-2. Workflow Manager Keyboard Shortcuts

Table 3-3. Keyboard Shortcuts for Navigating the Workspace

Table 4-1. Task-Specific Workflow Variables

Table 4-2. Datatype Default Values for User-defined Workflow Variables

Table 4-3. Schedule Tab Settings

Table 4-4. Repeat Dialog Box Options

Table 5-1. Workflow Tasks

Table 5-2. Timer Task Attributes

Table 7-1. Apply All Options

Table 7-2. PowerCenter Server Behavior for Failed Sessions

Table 8-1. Treat Source Rows As Options

Table 8-2. Flat File Source Properties

Table 8-3. Fixed-Width File Properties for File Sources

Table 8-4. Delimited File Properties for File Sources

Table 8-5. Support for ASCII and Unicode Data Movement Modes

Table 8-6. Null Character Handling

Table 9-1. Support for ASCII and Unicode Data Movement Modes

Table 9-2. Relational Target Properties

Table 9-3. Test Load Options

Table 9-4. PowerCenter Server Commands on Supported Databases

Table 9-5. Flat File Target Properties

Table 9-6. Test Load Options

Table 9-7. Writing to a Fixed-Width Target

Table 9-8. Delimited File Properties

Table 9-9. Datatype Modifications for File Target Columns

Table 9-10. Field Length Measurements for Fixed-Width Flat File Targets

Table 9-11. Characters to Include when Calculating Field Length for Fixed-Width Targets

Table 10-1. Transformation Scope Property Values

Table 10-2. Session Commit Properties

Table 11-1. PM_RECOVERY Table Definition

Table 11-2. PM_TGT_RUN_ID Table Definition

Table 11-3. pmcmd Return Codes for Recovery

Table 11-4. Transformations that Output Repeatable Data

Table 12-1. Email Variables for Post-Session Email

Table 12-2. Format Tags for Email Tasks

Table 13-1. Default Partition Points

Table 13-2. Options on Session Properties Partitions View on the Mapping Tab

Table 13-3. Edit Partition Point Dialog Box Options

Table 13-4. Valid Partition Types for Partition Points

Table 13-5. File Properties Settings for File Sources

Table 13-6. Configuring Source File Name for Single-Threaded Reading

Table 13-7. Configuring Source File Name for Multi-Threaded Reading

Table 13-8. Partitioning Relational Target Attributes

Table 13-9. File Targets Connection Options

Table 13-10. Target File Properties

Table 13-11. Variable Value Calculations with Partitioned Sessions

Table 13-12. Restrictions on the Number of Partitions for Transformations

Table 13-13. Partitioning Guidelines for Informatica Application Products

Table 14-1. Workflow Monitor General Options

Table 14-2. Gantt Chart Options

Table 14-3. Advanced Workflow Monitor Options

Table 14-4. Workflow and Task Status

Table 14-5. Session Details on the Transformation Statistics Tab

Table 14-6. Performance Counters

Table 15-1. Losing Connectivity in a Server Grid

Table 15-2. Override Workflow Properties

Table 15-3. Override Server Grid Properties

Table 16-1. Log File Default Locations

Table 16-2. Workflow Log Codes

Table 16-3. Session Log Codes

Table 16-4. Session Log Tracing Levels

Table 16-5. Row Indicators in Reject File

Table 16-6. Column Indicators in Reject File

Table 17-1. PMERR_DATA Table Schema

Table 17-2. PMERR_MSG Table Schema

Table 17-3. PMERR_SESS Table Schema

Table 17-4. PMERR_TRANS Table Schema

Table 17-5. Error Log File Column Headers

Table 17-6. Error Log Options

Table 18-1. Naming Conventions for User-Defined Session Parameters

Table 19-1. Parameters and Variables in Parameter File

Table 19-2. Naming Conventions for User-Defined Session Parameters

Table 20-1. Partitioning Guidelines for External Loaders

Table 20-2. DB2 EE External Loader Attributes

Table 20-3. DB2 EE External Loader Return Codes

Table 20-4. DB2 EEE External Loader Attributes

Table 20-5. Oracle External Loader Attributes

Table 20-6. Sybase IQ External Loader Attributes

Table 20-7. Teradata MultiLoad External Loader Attributes

Table 20-8. Teradata MultiLoad External Loader Attributes Defined at the Session Level

Table 20-9. Teradata TPump External Loader Attributes

Table 20-10. Teradata TPump External Loader Attributes Defined at the Session Level

Table 20-11. Teradata FastLoad External Loader Attributes

Table 20-12. Teradata FastLoad External Loader Attributes Defined at the Session Level

Table 20-13. Teradata Warehouse Builder Operators and Protocol

Table 20-14. Teradata Warehouse Builder External Loader Attributes

Table 20-15. Teradata Warehouse Builder External Loader Attributes Defined at the Session Level

Table 20-16. Properties Settings

Table 21-1. FTP Options

Table 23-1. pmcmd Commands

Table 23-2. Connection Information for the Command Line Mode

Table 23-3. pmcmd Return Codes

Table 23-4. Setting Defaults for the Interactive Mode

Table 23-5. Command Parameters

Table 23-6. pmcmd Syntax Notation

Table 24-1. Caching Storage Overview

Table 24-2. Cache File Names

Table 24-3. Aggregate Cache Calculation

Table 24-7. Column Sizes for Cache Calculations

Table 24-4. Rank Cache Calculation

Table 24-5. Joiner Cache Calculation

Table 24-6. Lookup Cache Calculation

Table 25-1. Session Tuning Parameters

Table A-1. General Tab

Table A-2. Properties Tab - General Options Settings

Table A-3. Properties Tab - Performance Settings

Table A-4. Config Object Tab - Advanced Settings

Table A-5. Config Object Tab - Log Options Settings

Table A-6. Config Object Tab - Error Handling Settings

Table A-7. Mapping Tab - Connections Settings

Table A-8. Mapping Tab - Sources Node - Connections Settings

Table A-9. Mapping Tab - Sources Node - Properties Settings (Relational Sources)

Table A-10. Mapping Tab - Sources Node - Properties Settings (File Sources)

Table A-11. Fixed-Width Properties for File Sources

Table A-12. Delimited Properties for File Sources

Table A-13. Mapping Tab - Targets Node - Writers Settings

Table A-14. Mapping Tab - Targets Node - Connections Settings

Table A-15. Mapping Tab - Targets Node - Properties Settings (Relational)

Table A-16. Mapping Tab - Targets Node - File Properties Settings

Table A-17. Fixed-Width Properties for File Targets

Table A-18. Delimited Properties for File Targets

Table A-19. Mapping Tab - Partition Points Node

Table A-20. Edit Partition Point Dialog Box Options

Table A-21. Components Tab

Table A-22. Components Tab Tasks

Table A-23. Pre- or Post-Session Commands - General Tab

Table A-24. Pre- or Post-Session Commands - Properties Tab

Table A-25. Pre- or Post-Session Commands - Commands Tab

Table A-26. On-Success or On-Failure Emails - General Tab

Table A-27. On-Success or On-Failure Emails - Properties Tab

Table A-28. Metadata Extensions Tab

Table B-1. Workflow Properties - General Tab

Table B-2. Workflow Properties - Properties Tab

Table B-3. Workflow Properties - Scheduler Tab

Table B-4. Workflow Properties - Scheduler Tab - Edit Scheduler Dialog Box

Table B-5. Workflow Properties - Repeat Dialog Box Options

Table B-6. Workflow Properties - Variables Tab

Table B-7. Workflow Properties - Events Tab

Table B-8. Workflow Properties - Metadata Extensions Tab

Table C-1. General Session Options Comparison

Table C-2. Source Options Comparison

Table C-3. File Source Options Comparison

Table C-4. XML Sources Options Comparison

Table C-5. FTP Properties Comparison

Table C-6. Target Options Comparison

Table C-7. Relational Target Options Comparison

Table C-8. File Target Output Options Comparison

Table C-9. XML Target Options Comparison

Table C-10. Reject Files Options Comparison

Table C-11. Pre-Session Commands Comparison

Table C-12. Post-Session Commands and Email Comparison

Table C-13. Performance Options Comparison

Table C-14. Configuration Parameters Comparison

Table C-15. Log File Options Comparison

Table C-16. Error Handling Options Comparison

Table C-17. Transformations Tab Options Comparison

Preface

Welcome to PowerCenter, Informatica’s software product that delivers an open, scalable data integration solution addressing the complete life cycle for all data integration projects including data warehouses and data marts, data migration, data synchronization, and information hubs. PowerCenter combines the latest technology enhancements for reliably managing data repositories and delivering information resources in a timely, usable, and efficient manner.

The PowerCenter metadata repository coordinates and drives a variety of core functions, including extracting, transforming, loading, and managing data. The PowerCenter Server can extract large volumes of data from multiple platforms, handle complex transformations on the data, and support high-speed loads. PowerCenter can simplify and accelerate the process of moving data warehouses from development to test to production.

New Features and Enhancements

This section describes new features and enhancements to PowerCenter 7.1.1, 7.1, and 7.0.

PowerCenter 7.1.1

This section describes new features and enhancements to PowerCenter 7.1.1.

Data Profiling

♦ Data sampling. You can create a data profile for a sample of source data instead of the entire source. You can view a profile from a random sample of data, a specified percentage of data, or a specified number of rows starting with the first row.

♦ Verbose data enhancements. You can specify the type of verbose data you want the PowerCenter Server to write to the Data Profiling warehouse. The PowerCenter Server can write all rows, the rows that meet the business rule, or the rows that do not meet the business rule.

♦ Session enhancement. You can save sessions that you create from the Profile Manager to the repository.

♦ Domain Inference function tuning. You can configure the Data Profiling Wizard to filter the Domain Inference function results. You can configure a maximum number of patterns and a minimum pattern frequency. You may want to narrow the scope of patterns returned to view only the primary domains, or you may want to widen the scope of patterns returned to view exception data.

♦ Row Uniqueness function. You can determine unique rows for a source based on a selection of columns for the specified source.

♦ Define mapping, session, and workflow prefixes. You can define default mapping, session, and workflow prefixes for the mappings, sessions, and workflows generated when you create a data profile.

♦ Profile mapping display in the Designer. The Designer displays profile mappings under a profile mappings node in the Navigator.
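
The effect of the two Domain Inference tuning settings described above can be illustrated with a short sketch. This is not the Data Profiling Wizard's implementation. The function name `filter_patterns`, the treatment of pattern frequency as a percentage of rows, and the order in which the two limits apply are assumptions made only for illustration.

```python
from collections import Counter


def filter_patterns(patterns, max_patterns, min_frequency):
    """Keep the most frequent inferred patterns.

    patterns      -- one inferred pattern string per source row (hypothetical input)
    max_patterns  -- maximum number of patterns to return
    min_frequency -- minimum share of rows, in percent, a pattern must reach
                     (treating frequency as a percentage is an assumption)
    """
    counts = Counter(patterns)
    total = sum(counts.values())
    if total == 0:
        return []
    # Rank patterns by descending row count and convert counts to percentages.
    ranked = [(p, 100.0 * n / total) for p, n in counts.most_common()]
    # Drop rare patterns first, then cap the number of patterns returned.
    kept = [(p, freq) for p, freq in ranked if freq >= min_frequency]
    return kept[:max_patterns]


rows = ["999-9999"] * 6 + ["(999) 999-9999"] * 3 + ["XXXX"] * 1
primary_domains = filter_patterns(rows, max_patterns=2, min_frequency=20.0)
```

Raising the minimum frequency or lowering the maximum number of patterns narrows the results to the primary domains. Lowering the minimum frequency widens the results so that rare patterns, which may be exception data, also appear.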

PowerCenter Server

♦ Code page. PowerCenter supports additional Japanese language code pages, such as JIPSE-kana, JEF-kana, and MELCOM-kana.

♦ Flat file partitioning. When you create multiple partitions for a flat file source session, you can configure the session to create multiple threads to read the flat file source.

♦ pmcmd. You can use parameter files that reside on a local machine with the Startworkflow command in the pmcmd program. When you use a local parameter file, pmcmd passes variables and values in the file to the PowerCenter Server.

♦ SuSE Linux support. The PowerCenter Server runs on SuSE Linux. On SuSE Linux, you can connect to IBM DB2, Oracle, and Sybase sources, targets, and repositories using native drivers. Use ODBC drivers to access other sources and targets.

♦ Reserved word support. If any source, target, or lookup table name or column name contains a database reserved word, you can create and maintain a file, reswords.txt, containing reserved words. When the PowerCenter Server initializes a session, it searches for reswords.txt in the PowerCenter Server installation directory. If the file exists, the PowerCenter Server places quotes around matching reserved words when it executes SQL against the database.

♦ Teradata external loader. When you load to Teradata using an external loader, you can now override the control file. Depending on the loader you use, you can also override the error, log, and work table names by specifying different tables on the same or different Teradata database.
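
The reserved word behavior described above can be sketched as follows. This is only an illustration, not the PowerCenter Server's logic. It assumes a reserved word list with one word per line, case-insensitive matching, and double quotes as the quote character, none of which this section specifies. The helper names are hypothetical.

```python
def load_reserved_words(text):
    """Parse a reserved word list, assuming one word per line (hypothetical format)."""
    return {line.strip().upper() for line in text.splitlines() if line.strip()}


def quote_if_reserved(name, reserved):
    """Wrap a table or column name in double quotes when it is a reserved word."""
    # Case-insensitive matching is an assumption made for this sketch.
    return f'"{name}"' if name.upper() in reserved else name


def build_select(table, columns, reserved):
    """Build a SELECT statement, quoting only the names that match reserved words."""
    column_list = ", ".join(quote_if_reserved(c, reserved) for c in columns)
    return f"SELECT {column_list} FROM {quote_if_reserved(table, reserved)}"


reserved = load_reserved_words("MONTH\nDATE\n")
statement = build_select("ORDERS", ["ID", "MONTH"], reserved)
```

In this sketch only the column named MONTH is quoted, because it is the only name that appears in the reserved word list.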

Repository

♦ Exchange metadata with other tools. You can exchange source and target metadata with other BI or data modeling tools, such as Business Objects Designer. You can export or import multiple objects at a time. When you export metadata, the PowerCenter Client creates a file format recognized by the target tool.

Repository Server

♦ pmrep. You can use pmrep to perform the following functions:

− Remove repositories from the Repository Server cache entry list.

− Enable enhanced security when you create a relational source or target connection in the repository.

− Update a connection attribute value when you update the connection.

♦ SuSE Linux support. The Repository Server runs on SuSE Linux. On SuSE Linux, you can connect to IBM, DB2, Oracle, and Sybase repositories.

Security♦ Oracle OS Authentication. You can now use Oracle OS Authentication to authenticate

database users. Oracle OS Authentication allows you to log on to an Oracle database if you have a logon to the operating system. You do not need to know a database user name and password. PowerCenter uses Oracle OS Authentication when the user name for an Oracle connection is PmNullUser.

Web Services Provider

♦ Attachment support. When you import web service definitions with attachment groups, you can pass attachments through the requests or responses in a service session. The document type you can attach is based on the MIME content of the WSDL file. You can attach document types such as XML, JPEG, GIF, or PDF.

♦ Pipeline partitioning. You can create multiple partitions in a session containing web service source and target definitions. The PowerCenter Server creates a connection to the Web Services Hub based on the number of sources, targets, and partitions in the session.

XML

♦ Multi-level pivoting. You can now pivot more than one multiple-occurring element in an XML view. You can also pivot the view row.

PowerCenter 7.1

This section describes new features and enhancements to PowerCenter 7.1.

Data Profiling

♦ Data Profiling for VSAM sources. You can now create a data profile for VSAM sources.

♦ Support for verbose mode for source-level functions. You can now create data profiles with source-level functions and write data to the Data Profiling warehouse in verbose mode.

♦ Aggregator function in auto profiles. Auto profiles now include the Aggregator function.

♦ Creating auto profile enhancements. You can now select the columns or groups you want to include in an auto profile and enable verbose mode for the Distinct Value Count function.

♦ Purging data from the Data Profiling warehouse. You can now purge data from the Data Profiling warehouse.

♦ Source View in the Profile Manager. You can now view data profiles by source definition in the Profile Manager.

♦ PowerCenter Data Profiling report enhancements. You can now view PowerCenter Data Profiling reports in a separate browser window, resize columns in a report, and view verbose data for Distinct Value Count functions.

♦ Prepackaged domains. Informatica provides a set of prepackaged domains that you can include in a Domain Validation function in a data profile.

Documentation

♦ Web Services Provider Guide. This is a new book that describes the functionality of Real-time Web Services. It also includes information from the version 7.0 Web Services Hub Guide.

♦ XML User Guide. This book consolidates XML information previously documented in the Designer Guide, Workflow Administration Guide, and Transformation Guide.

Licensing

Informatica provides licenses for each CPU and each repository rather than for each installation. Informatica provides licenses for product, connectivity, and options. You store the license keys in a license key file. You can manage the license files using the Repository Server Administration Console, the PowerCenter Server Setup, and the command line program, pmlic.

PowerCenter Server

♦ 64-bit support. You can now run 64-bit PowerCenter Servers on AIX and HP-UX (Itanium).

♦ Partitioning enhancements. If you have the Partitioning option, you can define up to 64 partitions at any partition point in a pipeline that supports multiple partitions.

♦ PowerCenter Server processing enhancements. The PowerCenter Server now reads a block of rows at a time. This improves processing performance for most sessions.

♦ CLOB/BLOB datatype support. You can now read and write CLOB/BLOB datatypes.

PowerCenter Metadata Reporter

Some report names in PowerCenter Metadata Reporter have changed, and it now uses the PowerCenter 7.1 MX views in its schema.

Repository Server

♦ Updating repository statistics. PowerCenter now identifies and updates statistics for all repository tables and indexes when you copy, upgrade, and restore repositories. This improves performance when PowerCenter accesses the repository.

♦ Increased repository performance. You can increase repository performance by skipping information when you copy, back up, or restore a repository. You can choose to skip MX data, workflow and session log history, and deploy group history.

♦ pmrep. You can use pmrep to back up, disable, or enable a repository, delete a relational connection from a repository, delete repository details, truncate log files, and run multiple pmrep commands sequentially. You can also use pmrep to create, modify, and delete a folder.

Repository

♦ Exchange metadata with business intelligence tools. You can export metadata to and import metadata from other business intelligence tools, such as Cognos Report Net and Business Objects.

♦ Object import and export enhancements. You can compare objects in an XML file to objects in the target repository when you import objects.

♦ MX views. MX views have been added to help you analyze metadata stored in the repository. REP_SERVER_NET and REP_SERVER_NET_REF views allow you to see information about server grids. REP_VERSION_PROPS allows you to see the version history of all objects in a PowerCenter repository.

Transformations

♦ Flat file lookup. You can now perform lookups on flat files. When you create a Lookup transformation using a flat file as a lookup source, the Designer invokes the Flat File Wizard. You can also use a lookup file parameter if you want to change the name or location of a lookup between session runs.

♦ Dynamic lookup cache enhancements. When you use a dynamic lookup cache, the PowerCenter Server can ignore some ports when it compares values in lookup and input ports before it updates a row in the cache. Also, you can choose whether the PowerCenter Server outputs old or new values from the lookup/output ports when it updates a row. You might want to output old values from lookup/output ports when you use the Lookup transformation in a mapping that updates slowly changing dimension tables.

♦ Union transformation. You can use the Union transformation to merge multiple sources into a single pipeline. The Union transformation is similar to using the UNION ALL SQL statement to combine the results from two or more SQL statements.

♦ Custom transformation API enhancements. The Custom transformation API includes new array-based functions that allow you to create procedure code that receives and outputs a block of rows at a time. Use these functions to take advantage of the PowerCenter Server processing enhancements.

♦ Midstream XML transformations. You can now create an XML Parser transformation or an XML Generator transformation to parse or generate XML inside a pipeline. The XML transformations enable you to extract XML data stored in relational tables, such as data stored in a CLOB column. You can also extract data from messaging systems, such as TIBCO or IBM MQSeries.
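The comparison between the Union transformation and the UNION ALL statement can be made concrete with a small SQL example. The sketch below uses Python's built-in sqlite3 module and two hypothetical tables, orders_east and orders_west. It shows only the SQL behavior that the Union transformation is compared to, not the transformation itself.

```python
import sqlite3


def merge_sources():
    """Combine two hypothetical source tables with UNION ALL and with UNION."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders_east (order_id INTEGER, amount REAL);
        CREATE TABLE orders_west (order_id INTEGER, amount REAL);
        INSERT INTO orders_east VALUES (1, 10.0), (2, 20.0);
        INSERT INTO orders_west VALUES (2, 20.0), (3, 30.0);
        """
    )
    # UNION ALL keeps every row from both inputs, including duplicates.
    all_rows = conn.execute(
        "SELECT order_id, amount FROM orders_east "
        "UNION ALL "
        "SELECT order_id, amount FROM orders_west"
    ).fetchall()
    # Plain UNION removes duplicate rows from the combined result.
    distinct_rows = conn.execute(
        "SELECT order_id, amount FROM orders_east "
        "UNION "
        "SELECT order_id, amount FROM orders_west"
    ).fetchall()
    conn.close()
    return all_rows, distinct_rows


all_rows, distinct_rows = merge_sources()
```

The row (2, 20.0) appears in both tables, so the UNION ALL result contains it twice while the UNION result contains it once.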

Usability

♦ Viewing active folders. The Designer and the Workflow Manager highlight the active folder in the Navigator.

♦ Enhanced printing. The quality of printed workspace has improved.

Version Control

You can run object queries that return shortcut objects. You can also run object queries based on the latest status of an object. The query can return local objects that are checked out, the latest version of checked in objects, or a collection of all older versions of objects.

Web Services Provider

♦ Real-time Web Services. Real-time Web Services allows you to create services using the Workflow Manager and make them available to web service clients through the Web Services Hub. The PowerCenter Server can perform parallel processing of both request-response and one-way services.

♦ Web Services Hub. The Web Services Hub now hosts Real-time Web Services in addition to Metadata Web Services and Batch Web Services. You can install the Web Services Hub on a JBoss application server.

Note: PowerCenter Connect for Web Services allows you to create sources, targets, and transformations to call web services hosted by other providers. For more information, see the PowerCenter Connect for Web Services User and Administrator Guide.

Workflow Monitor

The Workflow Monitor includes the following performance and usability enhancements:

♦ When you connect to the PowerCenter Server, you no longer distinguish between online and offline mode.

♦ You can open multiple instances of the Workflow Monitor on one machine.

♦ You can simultaneously monitor multiple PowerCenter Servers registered to the same repository.

♦ The Workflow Monitor includes improved options for filtering tasks by start and end time.

♦ The Workflow Monitor displays workflow runs in Task view chronologically with the most recent run at the top. It displays folders alphabetically.

♦ You can remove the Navigator and Output window.

XML Support

PowerCenter XML support now includes the following features:

♦ Enhanced datatype support. You can use XML schemas that contain simple and complex datatypes.

♦ Additional options for XML definitions. When you import XML definitions, you can choose how you want the Designer to represent the metadata associated with the imported files. You can choose to generate XML views using hierarchy or entity relationships. In a view with hierarchy relationships, the Designer expands each element and reference under its parent element. When you create views with entity relationships, the Designer creates separate entities for references and multiple-occurring elements.

♦ Synchronizing XML definitions. You can synchronize one or more XML definitions when the underlying schema changes. You can synchronize an XML definition with any repository definition or file used to create the XML definition, including relational sources or targets, XML files, DTD files, or schema files.

♦ XML workspace. You can edit XML views and relationships between views in the workspace. You can create views, add or delete columns from views, and define relationships between views.

♦ Midstream XML transformations. You can now create an XML Parser transformation or an XML Generator transformation to parse or generate XML inside a pipeline. The XML transformations enable you to extract XML data stored in relational tables, such as data stored in a CLOB column. You can also extract data from messaging systems, such as TIBCO or IBM MQSeries.

♦ Support for circular references. Circular references occur when an element is a direct or indirect child of itself. PowerCenter now supports XML files, DTD files, and XML schemas that use circular definitions.

♦ Increased performance for large XML targets. You can create XML files of several gigabytes in a PowerCenter 7.1 XML session by using the following enhancements:

− Spill to disk. You can specify the size of the cache used to store the XML tree. If the size of the tree exceeds the cache size, the XML data spills to disk in order to free up memory.

− User-defined commits. You can define commits to trigger flushes for XML target files.

− Support for multiple XML output files. You can output XML data to multiple XML targets. You can also define the file names for XML output files in the mapping.
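The spill-to-disk enhancement above follows a general pattern: keep data in memory up to a configured cache size, then move it to disk once that size is exceeded. The following Python sketch illustrates the pattern only. The SpillBuffer class, its cache_size parameter, and the sample XML fragments are hypothetical, and nothing here reflects how the PowerCenter Server stores the XML tree internally.

```python
import io
import tempfile


class SpillBuffer:
    """Hold written bytes in memory until cache_size is exceeded, then use a temp file."""

    def __init__(self, cache_size):
        self.cache_size = cache_size
        self._store = io.BytesIO()
        self.spilled = False

    def write(self, data):
        # When the next write would exceed the cache, copy what is in memory
        # to a temporary file on disk and continue writing there.
        if not self.spilled and self._store.tell() + len(data) > self.cache_size:
            disk = tempfile.TemporaryFile()
            disk.write(self._store.getvalue())
            self._store = disk
            self.spilled = True
        self._store.write(data)

    def read_all(self):
        """Return everything written so far, then restore the write position."""
        self._store.seek(0)
        data = self._store.read()
        self._store.seek(0, io.SEEK_END)
        return data


buf = SpillBuffer(cache_size=16)
buf.write(b"<a>1</a>")          # 8 bytes, stays in memory
in_memory_after_first = not buf.spilled
buf.write(b"<b>22222</b>")      # 12 more bytes exceed the 16-byte cache
contents = buf.read_all()
```

A larger cache size delays the switch to disk at the cost of more memory, which is the trade-off the cache size setting exposes.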

PowerCenter 7.0

This section describes new features and enhancements to PowerCenter 7.0.

Data Profiling

If you have the Data Profiling option, you can profile source data to evaluate source data and detect patterns and exceptions. For example, you can determine implicit data type, suggest candidate keys, detect data patterns, and evaluate join criteria. After you create a profiling warehouse, you can create profiling mappings and run sessions. Then you can view reports based on the profile data in the profiling warehouse.

The PowerCenter Client provides a Profile Manager and a Profile Wizard to complete these tasks.

Data Integration Web Services

You can use Data Integration Web Services to write applications to communicate with the PowerCenter Server. Data Integration Web Services is a web-enabled version of the PowerCenter Server functionality available through Load Manager and Metadata Exchange. It comprises two services for communicating with the PowerCenter Server: Load Manager and Metadata Exchange Web Services, which run on the Web Services Hub.

Documentation

♦ Glossary. The Installation and Configuration Guide contains a glossary of new PowerCenter terms.

♦ Installation and Configuration Guide. The connectivity information in the Installation and Configuration Guide is consolidated into two chapters. This book now contains chapters titled “Connecting to Databases from Windows” and “Connecting to Databases from UNIX.”

♦ Upgrading metadata. The Installation and Configuration Guide now contains a chapter titled “Upgrading Repository Metadata.” This chapter describes changes to repository objects impacted by the upgrade process. The change in functionality for existing objects depends on the version of the existing objects. Consult the upgrade information in this chapter for each upgraded object to determine whether the upgrade applies to your current version of PowerCenter.

Functions

♦ Soundex. The Soundex function encodes a string value into a four-character string. SOUNDEX works for characters in the English alphabet (A-Z). It uses the first character of the input string as the first character in the return value and encodes the remaining three unique consonants as numbers.

♦ Metaphone. The Metaphone function encodes string values. You can specify the length of the string that you want to encode. METAPHONE encodes characters of the English language alphabet (A-Z). It encodes both uppercase and lowercase letters in uppercase.
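To make the Soundex encoding above concrete, here is a minimal Python sketch of the classic American Soundex algorithm. It is an assumption that PowerCenter's SOUNDEX function follows these same rules in every edge case, so treat the code as an illustration of the general technique and check the Transformation Language Reference for the exact behavior.

```python
# Digit assigned to each consonant group in classic Soundex.
CODES = {}
for letters, digit in (
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
):
    for ch in letters:
        CODES[ch] = digit


def soundex(value):
    """Return the four-character classic Soundex code for an English word."""
    letters = [ch for ch in value.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    result = letters[0]                  # the first letter is kept as-is
    prev = CODES.get(letters[0], "")     # code of the most recent letter
    for ch in letters[1:]:
        if ch in "HW":
            continue                     # H and W do not separate equal codes
        code = CODES.get(ch, "")         # vowels and Y have no code
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code
    return result.ljust(4, "0")          # pad short codes with zeros


codes = {name: soundex(name) for name in ("Robert", "Rupert", "Lee")}
```

Names that sound alike, such as Robert and Rupert, receive the same code under these rules, which is what makes the encoding useful for fuzzy matching.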

Installation

♦ Remote PowerCenter Client installation. You can create a control file containing installation information, and distribute it to other users to install the PowerCenter Client. You access the Informatica installation CD from the command line to create the control file and install the product.

PowerCenter Metadata Reporter

PowerCenter Metadata Reporter replaces Runtime Metadata Reporter and Informatica Metadata Reporter. PowerCenter Metadata Reporter includes the following features:

♦ Metadata browsing. You can use PowerCenter Metadata Reporter to browse PowerCenter 7.0 metadata, such as workflows, worklets, mappings, source and target tables, and transformations.

♦ Metadata analysis. You can use PowerCenter Metadata Reporter to analyze operational metadata, including session load time, server load, session completion status, session errors, and warehouse growth.

PowerCenter Server

♦ DB2 bulk loading. You can enable bulk loading when you load to IBM DB2 8.1.

♦ Distributed processing. If you purchase the Server Grid option, you can group PowerCenter Servers registered to the same repository into a server grid. In a server grid, PowerCenter Servers balance the workload among all the servers in the grid.

♦ Row error logging. The session configuration object has new properties that allow you to define error logging. You can choose to log row errors in a central location to help understand the cause and source of errors.

♦ External loading enhancements. When using external loaders on Windows, you can now choose to load from a named pipe. When using external loaders on UNIX, you can now choose to load from staged files.

♦ External loading using Teradata Warehouse Builder. You can use Teradata Warehouse Builder to load to Teradata. You can choose to insert, update, upsert, or delete data. Additionally, Teradata Warehouse Builder can simultaneously read from multiple sources and load data into one or more tables.

♦ Mixed mode processing for Teradata external loaders. You can now use data driven load mode with Teradata external loaders. When you select data driven loading, the PowerCenter Server flags rows for insert, delete, or update. It writes a column in the target file or named pipe to indicate the update strategy. The control file uses these values to determine how to load data to the target.

♦ Concurrent processing. The PowerCenter Server now reads data concurrently from sources within a target load order group. This enables more efficient joins with minimal usage of memory and disk cache.

♦ Real time processing enhancements. You can now use real-time processing in sessions that also process active transformations, such as the Aggregator transformation. You can apply the transformation logic to rows defined by transaction boundaries.

Repository Server

♦ Object export and import enhancements. You can now export and import objects using the Repository Manager and pmrep. You can export and import multiple objects and object types. You can export and import objects with or without their dependent objects. You can also export objects from a query result or object history.

♦ pmrep commands. You can use pmrep to perform change management tasks, such as maintaining deployment groups and labels, checking in, deploying, importing, exporting, and listing objects. You can also use pmrep to run queries. The deployment and object import commands require you to use a control file to define options and resolve conflicts.

♦ Trusted connections. You can now use a Microsoft SQL Server trusted connection to connect to the repository.

Security

♦ LDAP user authentication. You can now use default repository user authentication or Lightweight Directory Access Protocol (LDAP) to authenticate users. If you use LDAP, the repository maintains an association between your repository user name and your external login name. When you log in to the repository, the security module passes your login name to the external directory for authentication. The repository maintains a status for each user. You can now enable or disable users from accessing the repository by changing the status. You do not have to delete user names from the repository.

♦ Use Repository Manager privilege. The Use Repository Manager privilege allows you to perform tasks in the Repository Manager, such as copying objects, maintaining labels, and changing object status. You can perform the same tasks in the Designer and Workflow Manager if you have the Use Designer and Use Workflow Manager privileges.

♦ Audit trail. You can track changes to repository users, groups, privileges, and permissions through the Repository Server Administration Console. The Repository Agent logs security changes to a log file stored in the Repository Server installation directory. The audit trail log contains information such as changes to folder properties, adding or removing a user or group, and adding or removing privileges.

Transformations

♦ Custom transformation. Custom transformations operate in conjunction with procedures you create outside of the Designer interface to extend PowerCenter functionality. The Custom transformation replaces the Advanced External Procedure transformation. You can create Custom transformations with multiple input and output groups, and you can compile the procedure with any C compiler.

You can create templates that customize the appearance and available properties of a Custom transformation you develop. You can specify the icons used for the transformation, the colors, and the properties a mapping developer can modify. When you create a Custom transformation template, distribute the template with the DLL or shared library you develop.

♦ Joiner transformation. You can use the Joiner transformation to join two data streams that originate from the same source.

Version Control

The PowerCenter Client and repository introduce features that allow you to create and manage multiple versions of objects in the repository. Version control allows you to maintain multiple versions of an object, control development on the object, track changes, and use deployment groups to copy specific groups of objects from one repository to another. Version control in PowerCenter includes the following features:

♦ Object versioning. Individual objects in the repository are now versioned. This allows you to store multiple copies of a given object during the development cycle. Each version is a separate object with unique properties.

♦ Check out and check in versioned objects. You can check out and reserve an object you want to edit, and check in the object when you are ready to create a new version of the object in the repository.

♦ Compare objects. The Repository Manager and Workflow Manager allow you to compare two repository objects of the same type to identify differences between them. You can compare Designer objects and Workflow Manager objects in the Repository Manager. You can compare tasks, sessions, worklets, and workflows in the Workflow Manager. The PowerCenter Client tools allow you to compare objects across open folders and repositories. You can also compare different versions of the same object.

♦ Delete or purge a version. You can delete an object from view and continue to store it in the repository. You can recover or undelete deleted objects. If you want to permanently remove an object version, you can purge it from the repository.

♦ Deployment. Unlike copying a folder, copying a deployment group allows you to copy a select number of objects from multiple folders in the source repository to multiple folders in the target repository. This gives you greater control over the specific objects copied from one repository to another.

♦ Deployment groups. You can create a deployment group that contains references to objects from multiple folders across the repository. You can create a static deployment group that you manually add objects to, or create a dynamic deployment group that uses a query to populate the group.

♦ Labels. A label is an object that you can apply to versioned objects in the repository. This allows you to associate multiple objects in groups defined by the label. You can use labels to track versioned objects during development, improve query results, and organize groups of objects for deployment or export and import.

♦ Queries. You can create a query that specifies conditions to search for objects in the repository. You can save queries for later use. You can make a private query, or you can share it with all users in the repository.

♦ Track changes to an object. You can view a history that includes all versions of an object and compare any version of the object in the history to any other version. This allows you to see the changes made to an object over time.

XML Support

PowerCenter contains XML features that allow you to validate an XML file against an XML schema, declare multiple namespaces, use XPath to locate XML nodes, increase performance for large XML files, format your XML file output for increased readability, and parse or generate XML data from various sources. XML support in PowerCenter includes the following features:

♦ XML schema. You can use an XML schema to validate an XML file and to generate source and target definitions. XML schemas allow you to declare multiple namespaces so you can use prefixes for elements and attributes. XML schemas also allow you to define some complex datatypes.

♦ XPath support. The XML wizard allows you to view the structure of an XML schema. You can use XPath to locate XML nodes.

♦ Increased performance for large XML files. When you process an XML file or stream, you can set commits and periodically flush XML data to the target instead of writing all the output at the end of the session. You can choose to append the data to the same target file or create a new target file after each flush.

♦ XML target enhancements. You can format the XML target file so that you can easily view the XML file in a text editor. You can also configure the PowerCenter Server to not output empty elements to the XML target.

Usability

♦ Copying objects. You can now copy objects from all the PowerCenter Client tools using the copy wizard to resolve conflicts. You can copy objects within folders, to other folders, and to different repositories. Within the Designer, you can also copy segments of mappings to a workspace in a new folder or repository.

♦ Comparing objects. You can compare workflows and tasks from the Workflow Manager. You can also compare all objects from within the Repository Manager.

♦ Change propagation. When you edit a port in a mapping, you can choose to propagate changed attributes throughout the mapping. The Designer propagates ports, expressions, and conditions based on the direction that you propagate and the attributes you choose to propagate.

♦ Enhanced partitioning interface. The Session Wizard is enhanced to provide a graphical depiction of a mapping when you configure partitioning.

♦ Revert to saved. You can now revert to the last saved version of an object in the Workflow Manager. When you do this, the Workflow Manager accesses the repository to retrieve the last-saved version of the object.

♦ Enhanced validation messages. The PowerCenter Client writes messages in the Output window that describe why it invalidates a mapping or workflow when you modify a dependent object.

♦ Validate multiple objects. You can validate multiple objects in the repository without fetching them into the workspace. You can save and optionally check in objects that change from invalid to valid status as a result of the validation. You can validate sessions, mappings, mapplets, workflows, and worklets.

♦ View dependencies. Before you edit or delete versioned objects, such as sources, targets, mappings, or workflows, you can view dependencies to see the impact on other objects. You can view parent and child dependencies and global shortcuts across repositories. Viewing dependencies helps you modify objects and composite objects without breaking dependencies.

♦ Refresh session mappings. In the Workflow Manager, you can refresh a session mapping.

About Informatica Documentation

The complete set of documentation for PowerCenter includes the following books:

♦ Data Profiling Guide. Provides information about how to profile PowerCenter sources to evaluate source data and detect patterns and exceptions.

♦ Designer Guide. Provides information needed to use the Designer. Includes information to help you create mappings, mapplets, and transformations. Also includes a description of the transformation datatypes used to process and transform source data.

♦ Getting Started. Provides basic tutorials for getting started.

♦ Installation and Configuration Guide. Provides information needed to install and configure the PowerCenter tools, including details on environment variables and database connections.

♦ PowerCenter Connect® for JMS® User and Administrator Guide. Provides information to install PowerCenter Connect for JMS, build mappings, extract data from JMS messages, and load data into JMS messages.

♦ Repository Guide. Provides information needed to administer the repository using the Repository Manager or the pmrep command line program. Includes details on functionality available in the Repository Manager and Administration Console, such as creating and maintaining repositories, folders, users, groups, and permissions and privileges.

♦ Transformation Language Reference. Provides syntax descriptions and examples for each transformation function provided with PowerCenter.

♦ Transformation Guide. Provides information on how to create and configure each type of transformation in the Designer.

♦ Troubleshooting Guide. Lists error messages that you might encounter while using PowerCenter. Each error message includes one or more possible causes and actions that you can take to correct the condition.

♦ Web Services Provider Guide. Provides information you need to install and configure the Web Services Hub. This guide also provides information about how to use the web services that the Web Services Hub hosts. The Web Services Hub hosts Real-time Web Services, Batch Web Services, and Metadata Web Services.

♦ Workflow Administration Guide. Provides information to help you create and run workflows in the Workflow Manager, as well as monitor workflows in the Workflow Monitor. Also contains information on administering the PowerCenter Server and performance tuning.

♦ XML User Guide. Provides information you need to create XML definitions from XML, XSD, or DTD files, and relational or other XML definitions. Includes information on running sessions with XML data. Also includes details on using the midstream XML transformations to parse or generate XML data within a pipeline.

About this Book

The Workflow Administration Guide is written for developers and administrators who are responsible for creating workflows and sessions, running workflows, and administering the PowerCenter Server. This guide assumes you have knowledge of your operating systems, relational database concepts, and the database engines, flat files, or mainframe systems in your environment. This guide also assumes you are familiar with the interface requirements for your supporting applications.

The material in this book is available for online use.

Document Conventions

This guide uses the following formatting conventions:

If you see...                 It means...

italicized text               The word or set of words are especially emphasized.

boldfaced text                Emphasized subjects.

italicized monospaced text    This is the variable name for a value you enter as part of an operating system command. This is generic text that should be replaced with user-supplied values.

Note:                         The following paragraph provides additional facts.

Tip:                          The following paragraph provides suggested uses.

Warning:                      The following paragraph notes situations where you can overwrite or corrupt data, unless you follow the specified procedure.

monospaced text               This is a code example.

bold monospaced text          This is an operating system command you enter from a prompt to run a task.

Other Informatica Resources

In addition to the product manuals, Informatica provides these other resources:

♦ Informatica Customer Portal

♦ Informatica Webzine

♦ Informatica web site

♦ Informatica Developer Network

♦ Informatica Technical Support

Visiting Informatica Customer Portal

As an Informatica customer, you can access the Informatica Customer Portal site at http://my.informatica.com. The site contains product information, user group information, newsletters, access to the Informatica customer support case management system (ATLAS), the Informatica Knowledgebase, Informatica Webzine, and access to the Informatica user community.

Visiting the Informatica Webzine

The Informatica Documentation team delivers an online journal, the Informatica Webzine. This journal provides solutions to common tasks, detailed descriptions of specific features, and tips and tricks to help you develop data warehouses.

The Informatica Webzine is a password-protected site that you can access through the Customer Portal. The Customer Portal has an online registration form for login accounts to its webzine and web support. To register for an account, go to http://my.informatica.com.

If you have any questions, please email [email protected].

Visiting the Informatica Web Site

You can access Informatica’s corporate web site at http://www.informatica.com. The site contains information about Informatica, its background, upcoming events, and locating your closest sales office. You will also find product information, as well as literature and partner information. The services area of the site includes important information on technical support, training and education, and implementation services.

Visiting the Informatica Developer Network

The Informatica Developer Network is a web-based forum for third-party software developers. You can access the Informatica Developer Network at the following URL:

http://devnet.informatica.com

The site contains information on how to create, market, and support customer-oriented add-on solutions based on Informatica’s interoperability interfaces.

Obtaining Technical Support

There are many ways to access Informatica technical support. You can call or email your nearest Technical Support Center listed below or you can use our WebSupport Service.

WebSupport requires a user name and password. You can request a user name and password at http://my.informatica.com.

North America / South America

Informatica Corporation
2100 Seaport Blvd.
Redwood City, CA 94063
Phone: 866.563.6332 or 650.385.5800
Fax: 650.213.9489
Hours: 6 a.m. - 6 p.m. (PST/PDT)
email: [email protected]

Africa / Asia / Australia / Europe

Informatica Software Ltd.
6 Waltham Park
Waltham Road, White Waltham
Maidenhead, Berkshire
SL6 3TN
Phone: 44 870 606 1525
Fax: +44 1628 511 411
Hours: 9 a.m. - 5:30 p.m. (GMT)
email: [email protected]

Belgium
Phone: +32 15 281 702
Hours: 9 a.m. - 5:30 p.m. (local time)

France
Phone: +33 1 41 38 92 26
Hours: 9 a.m. - 5:30 p.m. (local time)

Germany
Phone: +49 1805 702 702
Hours: 9 a.m. - 5:30 p.m. (local time)

Netherlands
Phone: +31 306 082 089
Hours: 9 a.m. - 5:30 p.m. (local time)

Singapore
Phone: +65 322 8589
Hours: 9 a.m. - 5 p.m. (local time)

Switzerland
Phone: +41 800 81 80 70
Hours: 8 a.m. - 5 p.m. (local time)

Chapter 1

Understanding the Server Architecture

This chapter covers the following subjects:

♦ Overview, 2

♦ PowerCenter Server Connectivity, 5

♦ Running a Workflow, 7

♦ Load Manager Process, 8

♦ Data Transformation Manager (DTM) Process, 11

♦ Understanding Processing Threads, 14

♦ PowerCenter Server Processing, 22

♦ System Resources, 24

♦ Code Pages and Data Movement Modes, 27

♦ Output Files and Caches, 28

Overview

You can register multiple PowerCenter Servers to a repository. The PowerCenter Server moves data from sources to targets based on workflow and mapping metadata stored in a repository. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data. The PowerCenter Server runs workflow tasks according to the conditional links connecting the tasks. You can run a task by placing it in a workflow.

When you have multiple PowerCenter Servers, you can assign a server to start a workflow or a session. This allows you to distribute the workload. You can increase performance by using a server grid to balance the workload. A server grid is a server object that allows you to automate the distribution of sessions across multiple servers. For more information about server grids, see “Working with Server Grids” on page 446.

A session is a type of workflow task. A session is a set of instructions that describes how to move data from sources to targets using a mapping. Other workflow tasks include commands, decisions, timers, pre-session SQL commands, post-session SQL commands, and email notification. For details on workflow tasks, see “Working with Tasks” on page 131.

Use the Designer to import source and target definitions into the repository and to build mappings. A mapping is a set of source and target definitions linked by transformation objects that define the rules for data transformation. Use the Workflow Manager to develop and manage workflows. Use the Workflow Monitor to monitor workflows and stop the PowerCenter Server.

When a workflow starts, the PowerCenter Server retrieves mapping, workflow, and session metadata from the repository to extract data from the source, transform it, and load it into the target. It also runs the tasks in the workflow. The PowerCenter Server uses Load Manager and Data Transformation Manager (DTM) processes to run the workflow.

Figure 1-1 shows the processing path between the PowerCenter Server, repository, source, and target:

Figure 1-1. PowerCenter Server and Data Movement

[Diagram labels: Source, PowerCenter Server, Target, Repository; data flows: Source Data, Transformed Data, Instructions from Metadata]

The PowerCenter Server can combine data from different platforms and source types. For example, you can join data from a flat file and an Oracle source. The PowerCenter Server can also load data to different platforms and target types. For example, you can load transformed data to both a flat file target and a Microsoft SQL Server database in the same session.

Workflow Processes

The PowerCenter Server uses both process memory and system shared memory to perform these tasks. It runs as a daemon on UNIX and a service on Windows. The PowerCenter Server uses the following processes to run a workflow:

♦ The Load Manager process. Starts and locks the workflow, runs workflow tasks, and starts the DTM to run sessions.

♦ The Data Transformation Manager (DTM) process. Performs session validations. Creates threads to initialize the session, read, write, and transform data, and handle pre- and post-session operations.

Pipeline Partitioning

When running sessions, the PowerCenter Server can achieve high performance by partitioning the pipeline and performing the extract, transformation, and load for each partition in parallel. To accomplish this, use the following session and server configuration:

♦ Configure the session with multiple partitions.

♦ Install the PowerCenter Server on a machine with multiple CPUs.

You can configure the partition type at most transformations in the pipeline. The PowerCenter Server can partition data using round-robin, hash, key-range, database partitioning, or pass-through partitioning.

For relational sources, the PowerCenter Server creates multiple database connections to a single source and extracts a separate range of data for each connection. For XML or file sources, the PowerCenter Server reads multiple files concurrently. The files must have the same structure or hierarchy.

When the PowerCenter Server transforms the partitions concurrently, it passes data between the partitions as needed to perform operations such as aggregation. When the PowerCenter Server loads relational data, it creates multiple database connections to the target and loads partitions of data concurrently. When the PowerCenter Server loads data to file targets, it creates a separate file for each partition. You can choose to merge the target files.

Figure 1-2 shows a mapping that contains two partitions:

Figure 1-2. Partitioned Mapping

[Diagram labels: Source, Transformations, Target]

For more information about pipeline partitioning, see “Pipeline Partitioning” on page 345.
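
The following Python sketch illustrates, in general terms, what three of the partition types named above do to a set of rows. It is not PowerCenter code. The function names, the `crc32` hash, and the sample rows are assumptions made for illustration. Database partitioning and pass-through partitioning are omitted.

```python
from itertools import cycle
from zlib import crc32


def round_robin(rows, n):
    """Deal rows out to n partitions in turn, so each gets a near-equal share."""
    parts = [[] for _ in range(n)]
    for target, row in zip(cycle(range(n)), rows):
        parts[target].append(row)
    return parts


def hash_partition(rows, n, key):
    """Send rows that share a key value to the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        # crc32 is a stand-in hash, chosen because it is stable across runs.
        parts[crc32(str(row[key]).encode()) % n].append(row)
    return parts


def key_range(rows, bounds, key):
    """Route each row by comparing its key with ascending upper bounds.

    A row goes to partition i when its key is below bounds[i] and not
    below any earlier bound. Rows at or above the last bound go to one
    extra, final partition.
    """
    parts = [[] for _ in range(len(bounds) + 1)]
    for row in rows:
        index = sum(1 for bound in bounds if row[key] >= bound)
        parts[index].append(row)
    return parts


regions = ["east", "west", "east", "north", "west", "east"]
rows = [{"id": i, "region": r} for i, r in enumerate(regions)]

print([len(p) for p in round_robin(rows, 3)])
print([[r["id"] for r in p] for p in key_range(rows, [2, 4], "id")])
```

Round-robin balances row counts without looking at the data, while hash and key-range keep related rows together, which matters for operations such as aggregation.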


PowerCenter Server Connectivity

The PowerCenter Server connects to the following Informatica platform components:

♦ PowerCenter Client

♦ Other PowerCenter Servers

♦ Repository Server

♦ Repository Agent

♦ Source and target databases

The PowerCenter Server is a repository client application. It connects to the Repository Server and Repository Agent to retrieve workflow and mapping metadata from the repository database. When the PowerCenter Server requests a repository connection from the Repository Server, the Repository Server starts and manages the Repository Agent. The Repository Server then re-directs the PowerCenter Server to connect directly to the Repository Agent. For details on repository connectivity, see “Understanding the Repository” in the Repository Guide.

The Workflow Manager communicates directly with the PowerCenter Server over a TCP/IP connection each time you schedule or edit a workflow, display workflow details, or request workflow and session logs. You create the connection by defining the port number in the Workflow Manager and the PowerCenter Server configuration. Use the Workflow Manager to register the PowerCenter Server in the repository.

In a server grid, the Workflow Manager communicates directly with multiple PowerCenter Servers over TCP/IP connections. Each PowerCenter Server retrieves a server grid object from the repository, which it uses to connect to the other PowerCenter Servers in the grid. When the PowerCenter Servers connect to each other, they maintain a constant line of communication with each other. For more information about creating and using server grids, see “Working with Server Grids” on page 446.

The PowerCenter Server connects to the source or target database using ODBC or native drivers. It uses TCP/IP to connect to the Repository Server. The PowerCenter Server maintains a database connection pool for stored procedures or lookup databases in a workflow. The PowerCenter Server allows an unlimited number of connections to lookup or stored procedure databases. If a database user does not have permission for the number of connections a session requires, the session fails. You can optionally set a parameter to limit the database connections.

For a session, the PowerCenter Server holds the connection as long as it needs to read data from source tables or write data to target tables.

To prevent loss of information during data transfer, the PowerCenter Server, PowerCenter Client, Repository Server, Repository Agent, and repository database must have compatible code pages.

Figure 1-3 shows the PowerCenter Server connectivity:

Figure 1-3. PowerCenter Connectivity

[Diagram labels: PowerCenter Client, PowerCenter Server, Repository Server, Repository Agent, PowerCenter Repository, Sources and Targets; connection labels: TCP/IP, Native/ODBC, Native/ODL]

Table 1-1 summarizes the software you need to connect the PowerCenter Server to the platform components, source databases, and target databases:

Table 1-1. PowerCenter Server Connectivity Requirements

PowerCenter Server Connection    Connectivity Requirement
PowerCenter Client               TCP/IP
Other PowerCenter Servers        TCP/IP
Repository Server                TCP/IP
Repository Agent                 TCP/IP
Source and target databases      Native database drivers or ODBC

Note: Both the Windows and UNIX versions of the PowerCenter Server can use ODBC drivers to connect to databases. However, Informatica recommends using native drivers when possible to improve performance.


Running a Workflow

The PowerCenter Server uses the Load Manager process and the Data Transformation Manager (DTM) process to run the workflow and carry out workflow tasks.

When the PowerCenter Server runs a workflow, the Load Manager performs the following tasks:

1. Locks the workflow and reads workflow properties.

2. Reads the parameter file and expands workflow variables.

3. Creates the workflow log file.

4. Runs workflow tasks.

5. Distributes sessions to worker servers.

6. Starts the DTM to run sessions.

7. Runs sessions from master servers.

8. Sends post-session email if the DTM terminates abnormally.

For details on the Load Manager process, see “Load Manager Process” on page 8.

When the PowerCenter Server runs a session, the DTM performs the following tasks:

1. Fetches session and mapping metadata from the repository.

2. Creates and expands session variables.

3. Creates the session log file.

4. Validates session code pages if data code page validation is enabled. Checks query conversions if data code page validation is disabled.

5. Verifies connection object permissions.

6. Runs pre-session shell commands.

7. Runs pre-session stored procedures and SQL.

8. Creates and runs mapping, reader, writer, and transformation threads to extract, transform, and load data.

9. Runs post-session stored procedures and SQL.

10. Runs post-session shell commands.

11. Sends post-session email.

For details on the DTM process, see “Data Transformation Manager (DTM) Process” on page 11.

Load Manager Process

The Load Manager is the primary PowerCenter Server process. It accepts requests from the PowerCenter Client and from pmcmd. The Load Manager runs and monitors the workflow. It performs the following tasks:

♦ Manages workflow scheduling.

♦ Locks and reads the workflow.

♦ Reads the parameter file.

♦ Creates the workflow log file.

♦ Runs workflow tasks and evaluates the conditional links connecting tasks.

♦ Starts the DTM, which runs the session.

♦ Writes historical run information to the repository.

♦ Sends post-session email in the event of DTM failure.

Managing Workflow Scheduling

The Load Manager manages workflow scheduling in the following situations:

♦ When you start the PowerCenter Server. When you start the PowerCenter Server, the Load Manager launches and queries the repository for a list of workflows configured to run on the PowerCenter Server.

♦ When you save a workflow. When you save a workflow assigned to a PowerCenter Server to the repository, the Load Manager adds the workflow to or removes the workflow from the schedule queue.

Locking and Reading the Workflow

When the PowerCenter Server starts a workflow, the Load Manager requests an execute lock on the workflow from the repository. The execute lock allows the PowerCenter Server to run the workflow and prevents you from starting the workflow again until it completes. If the workflow is already locked, the PowerCenter Server cannot start the workflow. A workflow may be locked if it is already running.

The Load Manager also reads the workflow from the repository at workflow run time. The Load Manager reads all links and tasks in the workflow except sessions and worklet instances. The Load Manager reads session instance information from the repository. The DTM retrieves the session and mapping from the repository at session run time. The Load Manager reads worklets from the repository when the worklet starts.

For more information on locking, see “Repository Security” in the Repository Guide.
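
The execute lock behavior described above can be pictured with a small Python sketch. It is an illustration only: the `ExecuteLocks` class and its methods are hypothetical, and the lock table is held in memory here, whereas the real execute lock is requested from the repository.

```python
class ExecuteLocks:
    """In-memory stand-in for the execute locks held in the repository."""

    def __init__(self):
        self._locked = set()

    def acquire(self, workflow):
        """Take the lock and return True, or return False if already locked."""
        if workflow in self._locked:
            return False
        self._locked.add(workflow)
        return True

    def release(self, workflow):
        """Drop the lock when the workflow run completes."""
        self._locked.discard(workflow)


locks = ExecuteLocks()
first = locks.acquire("w_OrdersBooked")   # first start takes the lock
second = locks.acquire("w_OrdersBooked")  # refused while the workflow runs
locks.release("w_OrdersBooked")
third = locks.acquire("w_OrdersBooked")   # allowed again after completion
print(first, second, third)
```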


Reading the Parameter File

When the workflow starts, the Load Manager checks the workflow properties for use of a parameter file. If the workflow uses a parameter file, the Load Manager reads the parameter file and expands the variable values for the workflow and any worklets invoked by the workflow.

The parameter file can also contain mapping variables, mapping parameters, session parameters, and session variables for sessions in the workflow. When starting the DTM, the Load Manager passes the parameter file name to the DTM.

For more information on the parameter file, see “Session Parameters” on page 495.

Creating the Workflow Log File

The Load Manager creates a log file for the workflow. The workflow log file contains a history of the workflow run, including initialization, workflow task status, and error messages. You can use information in the workflow log file in conjunction with the PowerCenter Server log and session log to troubleshoot system, workflow, or session problems.

You can view the workflow log file in the Workflow Manager or open it in a text editor. The following sample shows the first few lines of a log file:

INFO : LM_36215 : (2076|2224) Starting execution of workflow [w_OrdersBooked].

INFO : LM_36255 : (2076|2224) Link [StartWorkflow --> s_BOOKINGS]: empty expression string, evaluated to TRUE.

INFO : LM_36224 : (2076|2224) Starting execution of session instance [s_BOOKINGS].

INFO : LM_36302 : (2076|2224) Started DTM process [pid = 508] for session instance [s_BOOKINGS].

For more information on workflow log files, see “Log Files” on page 455.
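
If you open the workflow log in a text editor, you may also want to filter it with a script. The following Python sketch splits lines of the shape shown in the sample above into fields. The field names are assumptions: the sample does not say what the two numbers in parentheses mean, so the sketch keeps them together as an uninterpreted `ids` string.

```python
import re

# Pattern inferred from the sample lines only, for example:
# INFO : LM_36215 : (2076|2224) Starting execution of workflow [w_OrdersBooked].
LINE = re.compile(
    r"^(?P<severity>\w+)\s*:\s*"   # INFO
    r"(?P<code>[A-Z]+_\d+)\s*:\s*"  # LM_36215
    r"\((?P<ids>[^)]*)\)\s*"        # (2076|2224), meaning not stated in the sample
    r"(?P<message>.*)$"             # rest of the line
)


def parse_line(line):
    """Return a dict of fields, or None if the line has a different shape."""
    match = LINE.match(line.strip())
    return match.groupdict() if match else None


sample = (
    "INFO : LM_36302 : (2076|2224) Started DTM process [pid = 508] "
    "for session instance [s_BOOKINGS]."
)
entry = parse_line(sample)
print(entry["code"], "-", entry["message"])
```

Lines that do not match return `None`, so a script can skip unexpected content instead of failing.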

Running Workflow Tasks

The Load Manager runs workflow tasks according to the conditional links connecting the tasks. Links define the order of execution for workflow tasks. When a task in the workflow completes, the Load Manager evaluates the completed task according to specified conditions, such as success or failure. Based on the result of the evaluation, the Load Manager runs successive links and tasks.

For more information on workflows and workflow tasks, see “Working with Workflows” on page 87.
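
As a rough illustration of link evaluation, the following Python sketch runs tasks in the order that conditional links allow. It is a simplification, not the Load Manager algorithm: the task names, the status strings, and the rule that a task runs once as soon as any incoming link evaluates to true are assumptions. A link with no condition is treated as true, which mirrors the "empty expression string, evaluated to TRUE" message in the workflow log sample.

```python
from collections import deque


def run_workflow(tasks, links, start):
    """Run tasks reachable from `start` through links whose conditions hold.

    tasks: task name -> callable returning a status string.
    links: list of (from_task, to_task, condition). condition is None or
           a callable that takes the completed task's status.
    """
    status = {}
    queue = deque([start])
    while queue:
        name = queue.popleft()
        if name in status:
            continue  # each task runs at most once in this sketch
        status[name] = tasks[name]()
        for source, target, condition in links:
            if source == name and (condition is None or condition(status[name])):
                queue.append(target)
    return status


tasks = {
    "Start": lambda: "SUCCEEDED",
    "s_load": lambda: "FAILED",
    "s_report": lambda: "SUCCEEDED",
    "notify_failure": lambda: "SUCCEEDED",
}
links = [
    ("Start", "s_load", None),
    ("s_load", "s_report", lambda s: s == "SUCCEEDED"),
    ("s_load", "notify_failure", lambda s: s == "FAILED"),
]
result = run_workflow(tasks, links, "Start")
print(sorted(result))
```

Because `s_load` fails in this example, only the link to `notify_failure` evaluates to true and `s_report` never runs.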

Distributing Sessions to Worker Servers

When you run a workflow in a server grid, the master server distributes session tasks to the worker servers in a round-robin fashion to balance the workload. When the master server distributes a session to a worker server, the Load Manager on the worker server machine starts a DTM process to run the session.

For more information about creating and using server grids, see “Working with Server Grids” on page 446.

Starting the DTM

When the workflow reaches a session, the Load Manager starts the DTM. The Load Manager provides the DTM with session and parameter file information that allows the DTM to retrieve the session and mapping metadata from the repository.

For more information on the DTM process, see “Data Transformation Manager (DTM) Process” on page 11.

Running Sessions from Master Servers

If a PowerCenter Server is part of a server grid, it can run sessions assigned from other master servers. The master server runs tasks in a workflow before it runs sessions assigned from other master servers.

For more information about creating and using server grids, see “Working with Server Grids” on page 446.

Writing Historical Information to the Repository

The Load Manager monitors the status of workflow tasks during the workflow run. When workflow tasks start or finish, the Load Manager writes historical run information to the repository. Historical run information for tasks includes start and completion times and completion status. Historical run information for sessions also includes source read statistics, target load statistics, and number of errors. You can view this information using the Workflow Monitor.

For details on using the Workflow Monitor, see “Monitoring Workflows” on page 401.

Sending Post-Session Email

The Load Manager sends post-session email if the DTM terminates abnormally. The DTM sends post-session email in all other cases. For details on post-session email, see “Sending Email” on page 319.


Data Transformation Manager (DTM) Process

When the workflow reaches a session, the Load Manager starts the DTM process. The DTM process is the process associated with the session task. The Load Manager creates one DTM process for each session in the workflow. The DTM process performs the following tasks:

♦ Reads session information from the repository.

♦ Expands the server, session, and mapping variables and parameters.

♦ Creates the session log file.

♦ Validates source and target code pages.

♦ Verifies connection object permissions.

♦ Runs pre-session shell commands, stored procedures and SQL.

♦ Creates and runs mapping, reader, writer, and transformation threads to extract, transform, and load data.

♦ Runs post-session stored procedures, SQL, and shell commands.

♦ Sends post-session email.

Reading the Session Information

The Load Manager provides the DTM with session instance information when it starts the DTM. The DTM retrieves the mapping and session metadata from the repository.

Expanding Variables and Parameters

If the workflow uses a parameter file, the Load Manager sends the parameter file to the DTM when it starts the DTM. The DTM creates and expands session-level, server-level, and mapping-level variables and parameters. For more information on the parameter file, see “Session Parameters” on page 495.

Creating the Session Log File

The DTM creates a log file for the session. The log file contains a complete history of the session run, including initialization, transformation, status, and error messages. You can use information in the log file in conjunction with the PowerCenter Server log and the workflow log file to troubleshoot system or session problems.

You can view the log file in the Workflow Monitor or open it in a text editor. The following sample shows the first few lines of a log file:

MASTER> CMN_1010 System shared memory [2338661387] allocated for [12000000] bytes.

MASTER> PETL_24000 Parallel Pipeline Engine initializing.

MASTER> PETL_24001 Parallel Pipeline Engine running.

MASTER> PETL_24003 Initializing session run.

MAPPING> TM_6014 Initializing session [s_Customers] at [Tue Nov 04 16:55:06 2003]

For more information on session log files, see “Log Files” on page 455.

Validating Code Pages

When the PowerCenter Server runs in Unicode mode with data code page validation enabled, the DTM validates the following code pages:

♦ Source code pages. Must be a subset of the PowerCenter Server code page.

♦ Target code pages. Must be a superset of the PowerCenter Server code page.

♦ Repository Agent code page. Must be compatible with the PowerCenter Server code page.

♦ Repository Server code page. Must be compatible with the PowerCenter Server code page.

♦ Lookup database code page. Must be compatible with the PowerCenter Server code page.

♦ Stored procedure database code page. Must be compatible with the PowerCenter Server code page.

♦ PowerCenter Server code page. Must be registered with the Workflow Manager.

If the DTM cannot validate the code pages, it writes the error into the session log and fails the session. If you disable data code page validation, the PowerCenter Server does not enforce code page compatibility.
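
The subset and superset rules for source and target code pages can be illustrated with Python's built-in codecs. This is a sketch of the relationship only, not the check the DTM performs: the function names are hypothetical, and the enumeration in `decodable_chars` works only for single-byte code pages.

```python
def decodable_chars(code_page):
    """Characters a single-byte code page defines, found by trying every byte."""
    chars = set()
    for value in range(256):
        try:
            chars.add(bytes([value]).decode(code_page))
        except UnicodeDecodeError:
            pass  # this byte is undefined in the code page
    return chars


def is_subset(inner, outer):
    """True if every character of `inner` can also be encoded in `outer`."""
    for ch in decodable_chars(inner):
        try:
            ch.encode(outer)
        except UnicodeEncodeError:
            return False
    return True


def validate(source, server, target):
    """Return a list of rule violations; an empty list means both rules hold."""
    errors = []
    if not is_subset(source, server):
        errors.append(f"source code page {source} is not a subset of {server}")
    if not is_subset(server, target):
        errors.append(f"target code page {target} is not a superset of {server}")
    return errors


print(validate("ascii", "latin-1", "utf-8"))
print(validate("cp1252", "latin-1", "ascii"))
```

In the second call both rules fail: cp1252 defines characters such as the euro sign that latin-1 lacks, and ascii cannot hold the accented characters of latin-1.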

The PowerCenter Server processes data internally using the UCS-2 character set. When you disable data code page validation, the PowerCenter Server verifies that the source query, target query, lookup database query, and stored procedure call text convert from the source, target, lookup, or stored procedure data code page to the UCS-2 character set without loss of data in conversion. If the PowerCenter Server encounters an error when converting data, it writes an error message to the session log.
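
A lossless conversion to UCS-2 requires that the text decodes cleanly from its code page and that every character fits in the two-byte UCS-2 range (U+0000 through U+FFFF). The following Python sketch shows one way to test both conditions. The function name and return shape are hypothetical, and the sketch does not reproduce the PowerCenter Server's own conversion logic.

```python
def converts_to_ucs2(raw, code_page):
    """Return (ok, reason) for converting bytes in `code_page` to UCS-2."""
    try:
        text = raw.decode(code_page)  # strict mode: undecodable bytes raise
    except UnicodeDecodeError as exc:
        return False, f"cannot decode byte at offset {exc.start}"
    for ch in text:
        if ord(ch) > 0xFFFF:
            return False, f"U+{ord(ch):04X} is outside the UCS-2 range"
    return True, "ok"


query = "SELECT name FROM caf\u00e9".encode("latin-1")
print(converts_to_ucs2(query, "latin-1"))
print(converts_to_ucs2("\U0001F600".encode("utf-8"), "utf-8"))
print(converts_to_ucs2(b"\xff", "ascii"))
```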

For more information about code pages, see “Globalization Overview” and “Code Pages” in the Installation and Configuration Guide.

Verifying Connection Object Permissions

After validating the session code pages, the DTM verifies permissions for connection objects used in the session. The DTM verifies that the user who started the PowerCenter Server and the user who started or scheduled the workflow have execute permissions for connection objects associated with the session.

Running Pre-Session Operations

After verifying connection object permissions, the DTM runs pre-session shell commands. The DTM then runs pre-session stored procedures and SQL commands.


Running the Processing Threads

After initializing the session, the DTM uses reader, transformation, and writer threads to extract, transform, and load data. The number of threads the DTM uses to run the session depends on the number of partitions configured for the session. For a detailed discussion of reader, transformation, and writer threads, see “Understanding Processing Threads” on page 14.

Running Post-Session Operations

After the DTM runs the processing threads, it runs post-session SQL commands and stored procedures. The DTM then runs post-session shell commands.

Sending Post-Session Email

When the session finishes, the DTM composes and sends email reporting session completion or failure. If the DTM terminates abnormally, the Load Manager sends post-session email. For details on post-session email, see “Sending Email” on page 319.

Understanding Processing Threads

The DTM allocates process memory for the session and divides it into buffers. This is also known as buffer memory. The default memory allocation is 12,000,000 bytes. The DTM uses multiple threads to process data. The main DTM thread is called the master thread.

The master thread creates and manages other threads. The master thread for a session can create mapping, pre-session, post-session, reader, transformation, and writer threads. For more information, see “Thread Types” on page 14.

For each target load order group in a mapping, the master thread can create several threads. The types of threads depend on the session properties and the transformations in the mapping. The number of threads depends on the partitioning information for each target load order group in the mapping.

For more information on target load order groups, see “Reading Source Data” on page 22.

Thread Types

The master thread creates different types of threads for a session. The types of threads the master thread creates depend on the following factors:

♦ Pre- and post-session properties

♦ Types of transformations in the mapping

Table 1-2 lists the types of threads that the master thread can create:

Table 1-2. Processing Threads

Thread Type                      Description
Mapping Thread                   One thread for each session. Fetches session and mapping information. Compiles the mapping. Cleans up after session execution.
Pre- and Post-Session Threads    One thread each to perform pre- and post-session operations.
Reader Thread                    One thread for each partition for each source pipeline. Reads from sources. Relational sources use relational reader threads, and file sources use file reader threads.
Transformation Thread            One or more transformation threads for each partition. Processes data according to the transformation logic in the mapping.
Writer Thread                    One thread for each partition, if a target exists in the source pipeline. Writes to targets. Relational targets use relational writer threads, and file targets use file writer threads.


Figure 1-4 shows the threads the master thread creates for a simple mapping that contains one target load order group:

The mapping in Figure 1-4 contains a single partition. In this case, the master thread creates one reader, one transformation, and one writer thread to process the data. The reader thread controls how the PowerCenter Server extracts source data and passes it to the source qualifier, the transformation thread controls how the PowerCenter Server processes the data, and the writer thread controls how the PowerCenter Server loads data to the target.
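
The division of work among one reader, one transformation, and one writer thread can be sketched in Python with two bounded queues standing in for the buffers between stages. This is an analogy under stated assumptions, not DTM code: the function names, the queue size of 4, and the end-of-data sentinel are all hypothetical.

```python
import queue
import threading

DONE = object()  # sentinel that marks the end of the data


def reader(source_rows, out_buffer):
    """Extract rows from the source and place them in the read buffer."""
    for row in source_rows:
        out_buffer.put(row)
    out_buffer.put(DONE)


def transformer(in_buffer, out_buffer, logic):
    """Apply the transformation logic to each row and pass it on."""
    while (row := in_buffer.get()) is not DONE:
        out_buffer.put(logic(row))
    out_buffer.put(DONE)


def writer(in_buffer, target):
    """Take transformed rows from the write buffer and load the target."""
    while (row := in_buffer.get()) is not DONE:
        target.append(row)


def run_session(source_rows, logic):
    """Play the master thread: create the three threads and wait for them."""
    read_buffer = queue.Queue(maxsize=4)
    write_buffer = queue.Queue(maxsize=4)
    target = []
    threads = [
        threading.Thread(target=reader, args=(source_rows, read_buffer)),
        threading.Thread(target=transformer, args=(read_buffer, write_buffer, logic)),
        threading.Thread(target=writer, args=(write_buffer, target)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return target


print(run_session([1, 2, 3], lambda row: row * 10))
```

The bounded queues make a fast stage wait for a slow one, which is the usual reason for placing buffers between pipeline stages.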

When the pipeline contains only a source definition, source qualifier, and a target definition, the data bypasses the transformation threads, proceeding directly from the reader buffers to the writer. This type of pipeline is a pass-through pipeline.

Figure 1-5 shows the threads for a pass-through pipeline with one partition:

Note: The previous examples assume that each session contains a single partition. For information on how partitions and partition points affect thread creation, see “Threads and Partitioning” on page 16.

Reader Threads

The master thread creates reader threads to extract source data. The number of reader threads depends on the partitioning information for each pipeline. The number of reader threads equals the number of partitions. For more information, see “Threads and Partitioning” on page 16.

The PowerCenter Server creates an SQL statement for each reader thread to extract data from a relational source. For file sources, the PowerCenter Server can create multiple threads to read a single source.

Figure 1-4. Thread Creation for a Simple Mapping

[Diagram labels: 1 Reader Thread, 1 Transformation Thread, 1 Writer Thread]

Figure 1-5. Thread Creation for a Pass-through Pipeline

[Diagram labels: 1 Reader Thread, Bypassed Transformation Thread, 1 Writer Thread]

Transformation Threads

The master thread creates transformation threads to transform data received in buffers by the reader thread, move the data from transformation to transformation, and create memory caches when necessary. The number of transformation threads depends on the partitioning information for each pipeline. For more information, see “Threads and Partitioning” on page 16.

The transformation threads store fully-transformed data in a buffer drawn from the memory pool for subsequent access by the writer thread.

If the pipeline contains a Rank, Joiner, Aggregator, Sorter, or a cached Lookup transformation, the transformation thread uses cache memory until it reaches the configured cache size limits. If the transformation thread requires more space, it pages to local cache files to hold additional data.

When the PowerCenter Server runs in ASCII mode, the transformation threads pass character data in single bytes. When the PowerCenter Server runs in Unicode mode, the transformation threads use double bytes to move character data.

Writer Threads

The master thread creates writer threads to load target data. The number of writer threads depends on the partitioning information for each pipeline. If the pipeline contains one partition, the master thread creates one writer thread. If it contains multiple partitions, the master thread creates multiple writer threads. For more information, see “Threads and Partitioning” on page 16.

Each writer thread creates connections to the target databases to load data. If the target is a file, each writer thread creates a separate file. You can configure the session to merge these files.

If the target is relational, the writer thread takes data from buffers and commits it to session targets. When loading targets, the writer commits data based on the commit interval in the session properties. You can configure a session to commit data based on the number of source rows read, the number of rows written to the target, or the number of rows that pass through a transformation that generates transactions, such as a Transaction Control transformation.
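
The effect of a commit interval can be sketched as follows. The Python example models only a commit based on the number of rows written to the target. The function name and the `insert` and `commit` callables are hypothetical stand-ins for a database connection.

```python
def load_with_commit_interval(rows, commit_interval, insert, commit):
    """Insert every row and commit after each `commit_interval` rows."""
    pending = 0
    for row in rows:
        insert(row)
        pending += 1
        if pending == commit_interval:
            commit()
            pending = 0
    if pending:
        commit()  # commit the final, partial batch


written = []
commit_points = []
load_with_commit_interval(
    rows=range(25),
    commit_interval=10,
    insert=written.append,
    commit=lambda: commit_points.append(len(written)),
)
print(commit_points)  # [10, 20, 25]
```

A larger interval means fewer commits, at the cost of more uncommitted rows at risk if the session fails between commits.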

Threads and Partitioning

The master thread creates different numbers of threads for different mappings. The number of threads depends on the partitioning information for each target load order group. This includes the following factors:

♦ The partition points. Controls the thread boundaries and pipeline stages.

♦ The number of partitions. Controls the number of threads the master thread creates for each pipeline stage.

♦ The number of source pipelines. Controls the number of reader threads and the number of transformation threads downstream from the sources.


Partition Points

By default, the Workflow Manager places partition points at certain transformations in each source pipeline. Partition points mark the thread boundaries in a source pipeline and divide the pipeline into stages. A pipeline stage is the section of a pipeline executed between any two partition points. When you set a partition point at a transformation, the new pipeline stage includes that transformation.

The PowerCenter Server can redistribute rows of data at partition points. For example, if you place a partition point at a Sorter transformation and specify multiple partitions, the PowerCenter Server redistributes rows among all partitions before the rows enter the Sorter transformation. The rows stay in the same partitions until they reach the next partition point. For more information, see “Pipeline Partitioning” on page 345.

By default, the Workflow Manager places a partition point at each of the following transformations:

♦ Source qualifier. Marks the reader stage. You cannot delete this partition point.

♦ Rank and unsorted Aggregator transformation. Marks the transformation stage boundaries and creates a new transformation stage. This is necessary to ensure that rows are grouped properly before the Rank and Aggregator transformations process them. You can delete these partition points under certain circumstances. For more information, see “Adding and Deleting Partition Points” on page 353.

♦ Target instance. Marks the writer stage. You cannot delete this partition point.

Figure 1-6 shows the pipeline stages for a mapping that contains an unsorted Aggregator transformation:

The mapping in Figure 1-6 contains four stages by default. The partition point at the source qualifier marks the boundary between the first (reader) and second (transformation) stages. The partition point at the Aggregator transformation marks the boundary between the second and third (transformation) stages. The partition point at the target instance marks the boundary between the third (transformation) and the fourth (writer) stages.

If you use PowerCenter, you can add and delete partition points at other transformations. For information on valid partition points, see “Pipeline Partitioning” on page 345. When you add a partition point, you increase the number of pipeline stages by one. When you remove a partition point, you decrease the number of pipeline stages by one.

Figure 1-6. Pipeline Stages in a Mapping With an Unsorted Aggregator Transformation

[Diagram labels: default partition points marked with asterisks (*); First Stage, Second Stage, Third Stage, Fourth Stage]


Figure 1-7 shows the pipeline stages if you add a partition point at the Filter transformation:

Number of Partitions

The number of threads that process each pipeline stage depends on the number of partitions. A partition is a pipeline stage that executes in a single reader, transformation, or writer thread. The number of partitions in any pipeline stage equals the number of threads in that stage. If you do not specify otherwise, the PowerCenter Server creates one partition in every pipeline stage. If you purchased the partitioning option, you can configure multiple partitions for a single pipeline stage.

You can specify the number of partitions at any partition point. The number of partitions must be consistent across a pipeline. Therefore, if you define two partitions at the source qualifier, the Workflow Manager sets two partitions at all transformations that are partition points, and two partitions at the target instances.

For example, suppose you need to use the mapping in Figure 1-6 on page 17 to read data from three flat files. To do this, you need to specify three partitions at the source qualifier. When you do this, the Workflow Manager sets three partitions at all other partition points in the pipeline.

The master thread creates three sets of threads. Figure 1-8 shows thread creation for a mapping with three partitions:

Figure 1-7. Pipeline Stages in a Mapping with an Additional Partition Point

[Figure: the partition points divide the pipeline into a first, second, third, fourth, and fifth stage.]

Figure 1-8. Thread Creation for a Mapping with Three Partitions

[Figure: threads for Partition #1, Partition #2, and Partition #3 run across the first through fourth stages, giving 3 reader threads, 6 transformation threads, and 3 writer threads.]


When you define three partitions across the mapping in Figure 1-8, the master thread creates three threads at each pipeline stage, for a total of 12 threads. If you need to read data from four file sources, you would specify four partitions at the source qualifier. The master thread would create a fourth thread at each stage, for a total of 16 threads.
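The thread arithmetic above can be checked with a short sketch. The function name `total_threads` is hypothetical and is not part of PowerCenter. It assumes, as the text describes, that the master thread creates one thread per partition in every pipeline stage.

```python
def total_threads(partitions: int, stages: int) -> int:
    """Threads created when every pipeline stage runs one thread per partition."""
    if partitions < 1 or stages < 1:
        raise ValueError("partitions and stages must be positive")
    return partitions * stages


# The mapping in Figure 1-8 has four stages: one reader stage,
# two transformation stages, and one writer stage.
print(total_threads(partitions=3, stages=4))  # 12
print(total_threads(partitions=4, stages=4))  # 16
```

Adding a partition point raises `stages` by one, so it adds one thread per partition.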

The PowerCenter Server processes partitions concurrently. When you run a session with multiple partitions, the threads run as follows:

1. The reader threads run concurrently to extract data from the source.

2. The transformation threads run concurrently in each transformation stage to process data. The PowerCenter Server redistributes data among the partitions at each partition point.

3. The writer threads run concurrently to write data to the target.

Note: Increasing the number of partitions or partition points increases the number of threads. Therefore, increasing the number of partitions or partition points also increases the load on the server machine. If the server machine contains ample CPU bandwidth, processing rows of data in a session concurrently can increase session performance. However, if you create a large number of partitions or partition points in a session that processes large amounts of data, you can overload the system.

Number of Source Pipelines

The master thread creates a reader and transformation thread for each source pipeline in the target load order group. For more information on source pipelines and target load order groups, see “Reading Source Data” on page 22.

When you connect multiple pipelines to a multiple input group transformation, such as a Joiner or Custom transformation, the PowerCenter Server maintains the transformation threads or creates a new transformation thread depending on the partitioning information:

♦ You add a partition point at the multiple input group transformation. The PowerCenter Server creates a new pipeline stage and creates one transformation thread downstream from the partition point. The PowerCenter Server creates one transformation thread regardless of the number of output groups the transformation contains.

♦ You do not add a partition point at the multiple input group transformation. The PowerCenter Server maintains the same number of transformation threads downstream from the partition point until it reaches the next partition point. However, for each partition at the multiple input group transformation and its downstream transformations, only one thread actively processes a row of data at any given time.


Figure 1-9 shows the thread creation for a mapping that contains a Joiner transformation configured for sorted input:

Each source pipeline in Figure 1-9 contains a transformation thread. The Joiner transformation is not a partition point, so both transformation threads can process data at the Joiner and Expression transformations. However, only one transformation thread processes a row at any given time. The target load order group contains one target, so the master thread creates only one writer thread.

Suppose you add a partition point at the Joiner transformation in Figure 1-9. Figure 1-10 shows the mapping in Figure 1-9 with a partition point at the Joiner transformation:

Figure 1-9. Thread Creation with Joiner Transformation

[Figure: labels show 1 reader thread and 1 transformation thread, with the partition points marked.]

Figure 1-10. Thread Creation with a Partition Point at a Joiner Transformation

[Figure: labels show 1 reader thread and 1 transformation thread, plus 1 transformation thread created after the partition point.]


Each source pipeline in Figure 1-10 contains a transformation thread. However, the transformation threads end at the Joiner transformation. The Joiner transformation is a partition point, so the master thread creates a new transformation thread starting at the partition point.

Note: If any source qualifier in either Figure 1-9 or Figure 1-10 feeds a target other than the target associated with the Joiner transformation, the master thread creates an additional writer thread.


PowerCenter Server Processing

When you run a session, the PowerCenter Server reads source data and passes it to the transformations for processing. To help understand PowerCenter Server processing, consider the following PowerCenter Server actions:

♦ Reading source data. The PowerCenter Server reads the sources in a mapping at different times depending on how you configure the sources, transformations, and targets in the mapping. For more information on reading data, see “Reading Source Data” on page 22.

♦ Blocking data. The PowerCenter Server sometimes blocks the flow of data at a transformation in the mapping while it processes a row of data from a different source. For more information on blocking data, see “Blocking Data” on page 23.

♦ Block processing. The PowerCenter Server reads and processes a block of rows at a time. For more information, see “Block Processing” on page 23.

Reading Source Data

You create a session based on a mapping. Mappings contain one or more target load order groups. A target load order group is the collection of source qualifiers, transformations, and targets linked together in a mapping. Each target load order group contains one or more source pipelines. A source pipeline consists of a source qualifier and all of the transformations and target instances that receive data from that source qualifier.

By default, the PowerCenter Server reads sources in a target load order group concurrently, and it processes target load order groups sequentially. You can configure the order that the PowerCenter Server processes target load order groups. For more information on setting the target load order, see “Mappings” in the Designer Guide.

Figure 1-11 shows a mapping that contains two target load order groups and three source pipelines:

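Figure 1-11. Target Load Order Groups and Source Pipelines

[Figure: columns for sources, transformations, and targets. Target Load Order Group 1 contains Pipeline A and Pipeline B (sources A and B, targets T1 and T2). Target Load Order Group 2 contains Pipeline C (source C, target T3).]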


In the mapping shown in Figure 1-11, the PowerCenter Server processes the target load order groups sequentially. It first processes Target Load Order Group 1 by reading Source A and Source B at the same time. When it finishes processing Target Load Order Group 1, the PowerCenter Server begins to process Target Load Order Group 2 by reading Source C.
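The default ordering described above (groups one after another, sources within a group at the same time) can be illustrated with a minimal sketch. It uses Python's standard thread pool as a stand-in for reader threads. The names `read_source` and `run_mapping` are hypothetical, and the sketch does not reflect how the PowerCenter Server is implemented.

```python
from concurrent.futures import ThreadPoolExecutor


def read_source(name: str) -> str:
    # Stand-in for a reader thread; a real reader would extract rows here.
    return f"read {name}"


def run_mapping(groups: list[list[str]]) -> list[list[str]]:
    """Process groups sequentially; read the sources inside a group concurrently."""
    results = []
    for sources in groups:
        # Leaving the "with" block waits for every reader in this group,
        # so the next group cannot start early.
        with ThreadPoolExecutor(max_workers=len(sources)) as pool:
            results.append(list(pool.map(read_source, sources)))
    return results


# Group 1 reads Source A and Source B together; Group 2 then reads Source C.
print(run_mapping([["A", "B"], ["C"]]))
```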

Blocking Data

You can include multiple input group transformations in a mapping. The PowerCenter Server passes data to the input groups concurrently. However, sometimes the transformation logic of a multiple input group transformation requires that the PowerCenter Server block data on one input group while it waits for a row from a different input group.

Blocking is the suspension of the data flow into an input group of a multiple input group transformation. When the PowerCenter Server blocks data, it reads data from the source connected to the input group until it fills the reader and transformation buffers. Once the PowerCenter Server fills the buffers, it does not read more source rows until the transformation logic allows the PowerCenter Server to stop blocking the source. When the PowerCenter Server stops blocking a source, it processes the data in the buffers and continues to read from the source.

The PowerCenter Server blocks data at one input group when it needs a specific row from a different input group to perform the transformation logic. Once the PowerCenter Server reads and processes the row it needs, it stops blocking the source.
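The buffer behavior described above resembles a bounded queue: reading continues until the buffer is full and then pauses. The following is a rough sketch under that assumption. It uses Python's `queue.Queue` as a stand-in for the reader and transformation buffers, and the function name `fill_until_blocked` is hypothetical.

```python
import queue


def fill_until_blocked(rows, buffer_capacity: int) -> int:
    """Read rows into a bounded buffer and stop as soon as the buffer is full.

    Returns the number of rows read before the source is effectively blocked.
    """
    buffer = queue.Queue(maxsize=buffer_capacity)
    rows_read = 0
    for row in rows:
        try:
            buffer.put_nowait(row)
        except queue.Full:
            # Buffer is full: no more source rows are read until it drains.
            break
        rows_read += 1
    return rows_read


print(fill_until_blocked(range(10), buffer_capacity=4))  # 4
```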

Block Processing

The PowerCenter Server reads and processes a block of rows at a time. The number of rows in the block depends on the row size and the DTM buffer size. In the following circumstances, the PowerCenter Server processes one row in a block:

♦ Log row errors. When you log row errors, the PowerCenter Server processes one row in a block.

♦ Connect CURRVAL. When you connect the CURRVAL port in a Sequence Generator transformation, the session processes one row in a block. For optimal performance, Informatica recommends that you connect only the NEXTVAL port in mappings. For more information, see “Sequence Generator Transformation” in the Transformation Guide.

♦ Configure row-based mode for Custom transformation procedure. When you configure the data access mode for a Custom transformation procedure to be row-based, the PowerCenter Server processes one row in a block. By default, the data access mode is array-based, and the PowerCenter Server processes multiple rows in a block. For more information, see “Custom Transformation Functions” in the Transformation Guide.


System Resources

To allocate system resources for read, transformation, and write processing, you should understand how the PowerCenter Server allocates and uses system resources. The PowerCenter Server uses the following system resources:

♦ CPU

♦ Load Manager shared memory

♦ DTM buffer memory

♦ Cache memory

CPU Usage

The PowerCenter Server performs read, transformation, and write processing for a pipeline in parallel. It can process multiple partitions of a pipeline within a session, and it can process multiple sessions in parallel.

If you have a symmetric multi-processing (SMP) platform, you can use multiple CPUs to concurrently process session data or partitions of data. This provides increased performance, as true parallelism is achieved. On a single processor platform, these tasks share the CPU, so there is no parallelism.

The PowerCenter Server can use multiple CPUs to process a session that contains multiple partitions. The number of CPUs used depends on factors such as the number of partitions, the number of threads, the number of available CPUs, and the amount of resources required to process the mapping.

For more information about partitioning, see “Pipeline Partitioning” on page 345.

Load Manager Shared Memory

The Load Manager uses both process and shared memory. The Load Manager keeps a list of workflows and the schedule queue in process memory. The Load Manager shared memory is organized as an array of session slots that store session instance and status information. The DTM retrieves the session object and mapping object from the repository for processing.

Session instance information does not occupy the shared memory slot until session run time. When you start a workflow, the Load Manager retrieves session instance information from the repository with other workflow tasks. At session runtime, the Load Manager places the session instance information into a shared memory slot and starts the DTM. The DTM connects to the shared memory and uses the session instance information to retrieve the session and mapping from the repository. When the session completes, the Load Manager releases the session instance from the shared memory slot and writes session run information to the repository.

If the PowerCenter Server shuts down, it releases all sessions from shared memory.


You can configure three parameters in the PowerCenter Server configuration that control how the Load Manager allocates shared memory to sessions and the number of sessions the PowerCenter Server runs simultaneously:

♦ MaxSessions. The maximum sessions parameter indicates the maximum number of session slots available to the Load Manager at one time for running or repeating sessions. For example, if you select the default MaxSessions of 10, the Load Manager allocates 10 session slots. This parameter helps you control the number of sessions the PowerCenter Server can run simultaneously.

♦ LMSharedMemory. Set the Load Manager shared memory parameter in conjunction with the Maximum Sessions parameter to ensure that the Load Manager has enough memory for each session. The Load Manager requires approximately 200,000 bytes of shared memory for each session slot. The default setting is 2,000,000 bytes. For each increase of 10 sessions in the MaxSessions setting, you need to increase LMSharedMemory by 2,000,000 bytes.

♦ FailSessionIfMaxSessionsReached. The Fail Session If Max Sessions Reached option determines how the Load Manager handles a session when the number of sessions already running equals the number specified for maximum sessions. By default, this option is disabled, and the Load Manager holds sessions waiting to run in a ready queue until a session slot becomes available.
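The relationship between MaxSessions and LMSharedMemory can be sketched as simple arithmetic. The function name `lm_shared_memory` is hypothetical, and the per-slot figure is the approximate value quoted above, so treat the result as a sizing estimate rather than an exact requirement.

```python
BYTES_PER_SESSION_SLOT = 200_000  # approximate figure quoted in the text


def lm_shared_memory(max_sessions: int) -> int:
    """Estimate the LMSharedMemory setting, in bytes, for a MaxSessions value."""
    if max_sessions < 1:
        raise ValueError("max_sessions must be positive")
    return max_sessions * BYTES_PER_SESSION_SLOT


print(lm_shared_memory(10))  # 2000000, the default setting
print(lm_shared_memory(20))  # 4000000, 10 more sessions adds 2,000,000 bytes
```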

DTM Buffer Memory

The Load Manager launches the DTM. The DTM allocates buffer memory to the session based on the DTM Buffer Size setting in the session properties. By default, it allocates 12,000,000 bytes of memory to the session.

The DTM divides the memory into buffer blocks as configured in the Buffer Block Size setting in the session properties (64,000 bytes per block, by default). The reader, transformation, and writer threads use buffer blocks to move data from sources to targets.
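As a rough illustration of how the two settings relate, the sketch below counts the whole buffer blocks that fit in the DTM buffer. It assumes the buffer is divided into whole blocks with no per-block overhead, which the text does not state. The function name `buffer_block_count` is hypothetical.

```python
def buffer_block_count(dtm_buffer_size: int = 12_000_000,
                       block_size: int = 64_000) -> int:
    """Whole buffer blocks that fit in the DTM buffer (assumes no overhead)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return dtm_buffer_size // block_size


print(buffer_block_count())            # 187 with the default settings
print(buffer_block_count(24_000_000))  # 375 after doubling DTM Buffer Size
```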

You can sometimes improve session performance by increasing buffer memory when you run a session handling a large volume of character data and the PowerCenter Server runs in Unicode mode. In Unicode mode, the PowerCenter Server uses double bytes to move characters, so increasing buffer memory might improve session performance.

If the DTM cannot allocate the configured amount of buffer memory for the session, the session cannot initialize. Informatica recommends you allocate no more than 1 GB for DTM buffer memory.


Cache Memory

The DTM process creates in-memory index and data caches to temporarily store data used by the following transformations:

♦ Aggregator transformation (without sorted input)

♦ Rank transformation

♦ Joiner transformation

♦ Lookup transformation (with caching enabled)

You configure memory size for the index and data cache in the transformation properties. By default, the PowerCenter Server allocates 1,000,000 bytes for the index cache and 2,000,000 bytes for the data cache.

By default, the DTM creates cache files in the directory configured for the $PMCacheDir server variable. If the DTM requires more space than it allocates, it pages to local index and data files.

The DTM process also creates an in-memory cache to store data used by a Sorter transformation. You configure the memory size for the cache in the transformation properties. By default, the PowerCenter Server allocates 8,388,608 bytes for the cache, and the DTM creates cache files in the directory configured for the $PMTempDir server variable. If the DTM requires more cache space than it allocates, it pages to local cache files.

When processing large amounts of data, the DTM may create multiple index and data files. The session does not fail if it runs out of cache memory and pages to the cache files. It does fail, however, if the local directory for cache files runs out of disk space.

After the session completes, the DTM releases memory used by the index and data caches and deletes any index and data files. However, if the session is configured to perform incremental aggregation or if a Lookup transformation is configured for a persistent lookup cache, the DTM saves all index and data cache information to disk for the next session run.

For more information about caching, see “Session Caches” on page 613.


Code Pages and Data Movement Modes

You can configure PowerCenter to move multibyte data. The PowerCenter Server can move data in either ASCII or Unicode data movement mode. These modes determine how the PowerCenter Server handles character data. You choose the data movement mode in the PowerCenter Server configuration settings. If you want to move multibyte data, choose Unicode data movement mode.

To ensure that data is not lost during conversion from one machine to another, you must also choose the appropriate code pages for your connections. In the Workflow Manager, you select code pages for the PowerCenter Server and the database connections the PowerCenter Server uses to connect to the source and target machines. The Workflow Manager validates code page compatibility when you add or edit a session.

For more information, see “Globalization Overview” and “Code Pages” in the Installation and Configuration Guide.

ASCII Mode

Use ASCII mode when all sources and targets are 7-bit ASCII or EBCDIC character sets. In ASCII mode, the PowerCenter Server recognizes 7-bit ASCII and EBCDIC characters and stores each character in a single byte. When the PowerCenter Server runs in ASCII mode, it does not validate session code pages. It reads all character data as ASCII characters and does not perform code page conversions. It also treats all numerics as U.S. Standard and all dates as binary data.

Unicode Mode

Use Unicode mode when sources or targets use 8-bit or multibyte character sets and contain character data. In Unicode mode, the PowerCenter Server recognizes multibyte character sets as defined by supported code pages.

If you configure the PowerCenter Server to validate data code pages, the PowerCenter Server validates source and target code page compatibility when you run a session. If you configure the PowerCenter Server for relaxed data code page validation, the PowerCenter Server lifts source and target compatibility restrictions.

When reading a source, the PowerCenter Server converts data from the source character set to Unicode based on the source code page. The PowerCenter Server allots two bytes for each character when moving data through a mapping. The PowerCenter Server converts data from Unicode to the target character set based on the target code page when writing to the target. It also treats all numerics as U.S. Standard and all dates as binary data.
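The storage difference between the two modes can be sketched as follows. The sketch takes the figures in the text at face value: one byte per character in ASCII mode and two bytes per character in Unicode mode. The function name `bytes_to_move` is hypothetical.

```python
def bytes_to_move(text: str, unicode_mode: bool) -> int:
    """Approximate bytes used to move `text` through a mapping."""
    bytes_per_character = 2 if unicode_mode else 1
    return len(text) * bytes_per_character


value = "PowerCenter"  # 11 characters
print(bytes_to_move(value, unicode_mode=False))  # 11
print(bytes_to_move(value, unicode_mode=True))   # 22
```

The doubling is why a session that moves a large volume of character data in Unicode mode may benefit from a larger DTM buffer.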

The PowerCenter Server code page must be compatible with the code pages of the PowerCenter Client.

For details on code page compatibility and validation, see “Globalization Overview” in the Installation and Configuration Guide.


Output Files and Caches

Once launched, the PowerCenter Server logs status and error messages to a UNIX log file or to the Windows Application log. During each workflow run, the PowerCenter Server creates a workflow log file. During each session, the PowerCenter Server creates a session log file and reject file. Depending on transformation cache settings and target types, the PowerCenter Server may create additional files as well.

The PowerCenter Server uses the PowerCenter Server code page to generate log files. When you directly access a log file generated by the PowerCenter Server, it appears in the character set of the PowerCenter Server code page. When you use the Workflow Manager to access a file generated by the PowerCenter Server, such as a session log, the Workflow Manager uses the PowerCenter Client code page to translate and display the session log in the character set of the PowerCenter Client code page.

The PowerCenter Server creates the following output files:

♦ PowerCenter Server log

♦ Workflow log file

♦ Session log file

♦ Session details file

♦ Performance details file

♦ Reject files

♦ Row error logs

♦ Recovery tables and files

♦ Control file

♦ Post-session email

♦ Output file

♦ Cache files

When the PowerCenter Server on UNIX creates any file other than a recovery file, it sets the file permissions according to the umask of the shell that starts the PowerCenter Server. For example, when the umask of the shell that starts the PowerCenter Server is 022, the PowerCenter Server creates files with rw-r--r-- permissions. To change the file permissions, you must change the umask of the shell that starts the PowerCenter Server and then restart it.
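The umask example above follows standard UNIX arithmetic: the permission bits set in the umask are cleared from the mode requested at file creation. The sketch below assumes the requested mode is 0666 (read and write for everyone), which is the conventional default for new files. The text does not state which mode the PowerCenter Server requests, and the function name `mode_after_umask` is hypothetical.

```python
import stat


def mode_after_umask(umask: int, requested: int = 0o666) -> str:
    """Return the permission string of a regular file created under `umask`."""
    effective = requested & ~umask
    return stat.filemode(stat.S_IFREG | effective)


print(mode_after_umask(0o022))  # -rw-r--r--
print(mode_after_umask(0o027))  # -rw-r-----
```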

The PowerCenter Server on UNIX creates recovery files with rw------- permissions.

The PowerCenter Server on Windows creates files with read and write permissions.

PowerCenter Server Log

The PowerCenter Server creates a log for all status and error messages. You can troubleshoot PowerCenter Server problems by examining error messages sent to this log.


On UNIX, the default name of the PowerCenter Server log file is pmserver.log. You configure the PowerCenter Server log file name with the LogFileName option in the PowerCenter Server setup program.

On Windows, the PowerCenter Server logs status and error messages in the event log. Use the Event Viewer to access those messages. You can also configure the PowerCenter Server on Windows to write status and error messages to a file.

PowerCenter Server Messages

The PowerCenter Server associates a message code with the text of every message. The code uses a text prefix, such as LM, CMN, or RR, with a code number, such as CMN_1039. In PowerCenter Server error logs, the codes appear before the text as follows:

LM_34003 Server initialization completed.

LM_36802 Workflow <workflow name> scheduled to run at <time>.

Some message codes are embedded within other codes, for example:

CMN_1050 [LM 2041 Received request to start session]
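When scanning a log with a script, both the leading code and any embedded code can be picked out with a regular expression. The sketch below is based only on the sample lines above. It assumes a code is an uppercase prefix, then an underscore or a space, then a number, and real logs may contain other forms. The names `CODE_PATTERN` and `message_codes` are hypothetical.

```python
import re

# Uppercase prefix, then "_" or a space, then the code number.
CODE_PATTERN = re.compile(r"\b([A-Z]+)[_ ](\d+)\b")


def message_codes(line: str) -> list[tuple[str, int]]:
    """Return every (prefix, number) pair in a log line, outermost code first."""
    return [(prefix, int(number)) for prefix, number in CODE_PATTERN.findall(line)]


print(message_codes("LM_34003 Server initialization completed."))
print(message_codes("CMN_1050 [LM 2041 Received request to start session]"))
```

For the second line, the last pair returned is the embedded code, which is why the message text has to be read to find the true message code.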

You can also configure the PowerCenter Server on Windows to write error messages to the Application Log, which you can view with the Event Viewer. Messages sent from the PowerCenter Server display PowerCenter in the Source column, the code prefix in the Category column, and the code number in the Event column. However, since some message codes are embedded within other codes, to ensure you are viewing the true message code, you must view the text of the message.

Figure 1-12 shows a sample application log:

Figure 1-12. Event Viewer Application Log Message


Figure 1-13 shows how you can view the text of the message by selecting the message and pressing Enter:

Figure 1-13. Application Log Message Detail

Error Messages

Using the listed error code, consult the Troubleshooting Guide for probable causes and actions to correct the problem.

Workflow Log File

The PowerCenter Server creates a workflow log file for each workflow it runs. It writes information in the workflow log such as initialization of processes, workflow task run information, errors encountered, and workflow run summary. Workflow log error messages are categorized into severity levels. You can configure the PowerCenter Server to suppress writing messages to the workflow log file. You can also configure the workflow to write workflow messages to the session log file.

As with PowerCenter Server logs and session logs, the PowerCenter Server enters a code number into the workflow log file message along with message text. You can find information on error messages in the Troubleshooting Guide.

By default, the PowerCenter Server saves workflow logs in a directory entered for the server variable $PMWorkflowLogDir in the PowerCenter Server registration and names the workflow log workflow_name.log.

By default, the PowerCenter Server saves only one workflow log for each workflow. If you want to save multiple logs for different workflow runs, you can configure the workflow to save a workflow log file in two different ways:

♦ By timestamp, permitting an unlimited number of workflow logs.

♦ By cycle, saving the configured number of workflow logs, replacing the older logs with new logs. You can use the server variable $PMWorkflowLogCount to set the number of logs the PowerCenter Server archives for the workflow.

For more information about the workflow log, see “Log Files” on page 455.

Session Log File

The PowerCenter Server creates a session log file for each session it runs. It writes information in the session log such as initialization of processes, session validation, creation of SQL commands for reader and writer threads, errors encountered, and load summary. The amount of detail in the session log depends on the tracing level that you set.

As with PowerCenter Server logs and workflow logs, the PowerCenter Server enters a code number along with message text. You can find information on error messages in the Troubleshooting Guide.

By default, the PowerCenter Server saves session logs in a directory entered for the server variable $PMSessionLogDir in the PowerCenter Server registration and names the session log session_name.log.

By default, the PowerCenter Server saves only one session log for each session. If you want to save multiple logs for different session runs, you can configure the session to save a session log file in two different ways:

♦ By timestamp, permitting an unlimited number of session logs.

♦ By cycle, saving the configured number of session logs, replacing the older logs with new logs. You can use the server variable $PMSessionLogCount to set the number of logs the PowerCenter Server archives for the session.
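The difference between the two options is retention: saving by timestamp keeps every log, while saving by cycle keeps only the most recent ones. The cycle behavior can be pictured with a fixed-length queue, as in the sketch below. This only illustrates "replacing the older logs with new logs". It does not show how the PowerCenter Server names or counts archived logs, and the function name `logs_kept_by_cycle` is hypothetical.

```python
from collections import deque


def logs_kept_by_cycle(run_ids, log_count: int) -> list:
    """Return the runs whose logs remain when only `log_count` logs are kept."""
    kept = deque(maxlen=log_count)  # the oldest entry drops out when full
    for run_id in run_ids:
        kept.append(run_id)
    return list(kept)


# Five runs with three logs retained: the logs for runs 1 and 2 are replaced.
print(logs_kept_by_cycle([1, 2, 3, 4, 5], log_count=3))  # [3, 4, 5]
```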

For more information about the session log, see “Log Files” on page 455.

Session Details

When you run a session, the Workflow Manager creates session details that provide load statistics for each target in the mapping. You can monitor session details during the session or after the session completes. Session details include information such as table name, number of rows written or rejected, and read and write throughput. You can view this information by double-clicking the session in the Workflow Monitor.

For more information on the session details file, see “Monitoring Session Details” on page 434.

Performance Detail File

The PowerCenter Server can create a set of information known as session performance details to help determine where performance can be improved. Performance details provide transformation-by-transformation information on the flow of data through the session. To generate this information for a session, select the performance detail option in the session properties.

You can view performance details in the Workflow Monitor, or open the text file that contains the information in a text editor. The PowerCenter Server names the file session_name.perf, and stores it in the same directory as the session log (in the PowerCenter Server variable directory $PMSessionLogDir, by default).

For more information on performance details, see “Creating and Viewing Performance Details” on page 436.

Reject Files

By default, the PowerCenter Server creates a reject file for each target in the session. The reject file contains rows of data that the writer does not write to targets.

The writer may reject a row in the following circumstances:

♦ It is flagged for reject by an Update Strategy or Custom transformation.

♦ It violates a database constraint, such as a primary key constraint.

♦ A field in the row was truncated or overflowed, and the target database is configured to reject truncated or overflowed data.

By default, the PowerCenter Server saves the reject file in the directory entered for the server variable $PMBadFileDir in the Workflow Manager, and names the reject file target_table_name.bad.
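The default names and locations given so far for the session log, performance details file, and reject file can be collected in one small helper. In this sketch the server variables are left as unexpanded text, because their values depend on the server configuration. The function name `default_output_files` and the sample session and table names are hypothetical.

```python
from pathlib import PurePosixPath


def default_output_files(session_name: str, target_table: str) -> dict[str, str]:
    """Build the default output file paths described in the text."""
    log_dir = PurePosixPath("$PMSessionLogDir")  # server variable, not expanded
    bad_dir = PurePosixPath("$PMBadFileDir")     # server variable, not expanded
    return {
        "session_log": str(log_dir / f"{session_name}.log"),
        "performance_details": str(log_dir / f"{session_name}.perf"),
        "reject_file": str(bad_dir / f"{target_table}.bad"),
    }


for kind, path in default_output_files("s_load_orders", "ORDERS").items():
    print(kind, path)
```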

Note: If you enable row error logging, the PowerCenter Server does not create a reject file.

For more information about the reject file, see “Log Files” on page 455.

Row Error Logs

When you configure a session, you can choose to log row errors in a central location. When a row error occurs, the PowerCenter Server logs error information that allows you to determine the cause and source of the error. The PowerCenter Server logs information such as source name, row ID, current row data, transformation, timestamp, error code, error message, repository name, folder name, session name, and mapping information.

For more information about row error logging, see “Row Error Logging” on page 481.

Recovery Tables and Files

You can recover failed sessions that write to relational targets. The PowerCenter Server creates recovery tables on the target database system when it runs a session enabled for recovery. When you run a session in recovery mode, the PowerCenter Server uses information in the recovery tables to complete the session.

For more information about recovery, see “Recovering Data” on page 295.


Control File

When you run a session that uses an external loader, the PowerCenter Server creates a control file and a target flat file. The control file contains information about the target flat file such as data format and loading instructions for the external loader. The control file has an extension of .ctl. You can view the control file and the target flat file in the target file directory (default: $PMTargetFileDir).

For more information about external loading and control files, see “External Loading” on page 523.

Email

You can compose and send email messages by creating an Email task in the Workflow Designer or Task Developer. You can place the Email task in a workflow, or you can associate it with a session. The Email task allows you to automatically communicate information about a workflow or session run to designated recipients.

Email tasks in the workflow send email depending on the conditional links connected to the task. For post-session email, you can create two different messages, one to be sent if the session completes successfully, the other if the session fails. You can also use variables to generate information about the session name, status, and total rows loaded.

For example, if your database administrator wants to track how long a session takes to complete, you can configure the session to send an email containing the time and date the session starts and completes. Or, if you want to notify your Informatica administrator when a session fails, you can configure the session to send an email only if it fails and attach the session log to the email.

For more information, see “Sending Email” on page 319.

Indicator File

If you use a flat file as a target, you can configure the PowerCenter Server to create an indicator file for target row type information. For each target row, the indicator file contains a number to indicate whether the row was marked for insert, update, delete, or reject. The PowerCenter Server names this file target_name.ind and stores it in the same directory as the target file. For more information about configuring the PowerCenter Server, see the Installation and Configuration Guide.

Output File

If the session writes to a target file, the PowerCenter Server creates the target file based on a file target definition. By default, the PowerCenter Server names the target file based on the target definition name. If a mapping contains multiple instances of the same target, the PowerCenter Server names the target files based on the target instance name.


The PowerCenter Server creates this file in the PowerCenter Server variable directory, $PMTargetFileDir, by default. For more information about working with target files, see “Working with Targets” on page 233.

Cache Files

When the PowerCenter Server creates a memory cache, it also creates cache files. The PowerCenter Server creates index and data cache files for the following transformations in a mapping:

♦ Aggregator transformation

♦ Joiner transformation

♦ Rank transformation

♦ Lookup transformation

♦ Sorter transformation

By default, the DTM creates the index and data files for Aggregator, Rank, Joiner, and Lookup transformations in the directory configured for the $PMCacheDir server variable. The PowerCenter Server names the index file PM*.idx, and the data file PM*.dat. The PowerCenter Server creates the index and data files for the Sorter transformation in the $PMTempDir server variable directory.

The PowerCenter Server writes to the cache files during the session in the following cases:

♦ The mapping contains one or more Aggregator transformations configured without sorted ports.

♦ The session is configured for incremental aggregation.

♦ The mapping contains a Lookup transformation that is configured to use a persistent lookup cache, and the PowerCenter Server runs the session for the first time.

♦ The mapping contains a Lookup transformation that is configured to initialize the persistent lookup cache.

♦ The DTM runs out of cache memory and pages to the local cache files. The DTM may create multiple files when processing large amounts of data. The session fails if the local directory runs out of disk space.

After the session completes, the DTM generally deletes the overflow index and data files. It does not delete the cache files under the following circumstances:

♦ The session is configured to perform incremental aggregation.

♦ The session is configured with a persistent lookup cache.

Incremental Aggregation Files

If the session performs incremental aggregation, the PowerCenter Server saves index and data cache information to disk when the session finishes. The next time the session runs, the PowerCenter Server uses this historical information to perform the incremental aggregation.


The PowerCenter Server names these files PMAGG*.dat and PMAGG*.idx and saves them to the cache directory.

For more information about incremental aggregation, see “Using Incremental Aggregation” on page 573.

Persistent Lookup Cache

If a session uses a Lookup transformation, you can configure the transformation to use a persistent lookup cache. With this option selected, the PowerCenter Server saves the lookup cache to disk the first time it runs the session, then uses this lookup cache during subsequent session runs. These files are saved in the cache directory. If you do not name the files in the transformation properties, these files are named PMLKUP*.idx and PMLKUP*.dat.

For more information about lookup caching, see “Session Caches” on page 613 and “Lookup Transformation” in the Transformation Guide.
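The default naming conventions described above can be used to sort the contents of a cache directory by likely origin. The following Python sketch is illustrative only: the function name classify_cache_file and the sample file names are hypothetical, and it assumes the default file names (that is, no custom name was entered for a persistent lookup cache).

```python
from fnmatch import fnmatchcase

# Default name patterns taken from the text above, most specific first.
PATTERNS = [
    ("incremental aggregation", ("PMAGG*.idx", "PMAGG*.dat")),
    ("persistent lookup cache", ("PMLKUP*.idx", "PMLKUP*.dat")),
    ("other cache", ("PM*.idx", "PM*.dat")),
]


def classify_cache_file(name):
    """Return a label for a cache file name, or None if no pattern matches."""
    for label, globs in PATTERNS:
        if any(fnmatchcase(name, pattern) for pattern in globs):
            return label
    return None


# Hypothetical file names, used only to exercise the patterns.
for sample in ("PMAGG123.idx", "PMLKUP7.dat", "PM55.idx", "orders.out"):
    print(sample, "->", classify_cache_file(sample))
```

Because PMAGG* and PMLKUP* also match the general PM* pattern, the sketch checks the more specific patterns first.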



Chapter 2

Configuring the Workflow Manager

This chapter covers the following topics:

♦ Overview, 38

♦ Customizing the Workflow Manager Options, 39

♦ Registering the PowerCenter Server, 46

♦ Configuring Connection Object Permissions, 51

♦ Setting Up a Relational Database Connection, 53

♦ Replacing a Relational Database Connection, 62


Overview

Before you can use the Workflow Manager to create workflows and sessions, you must configure the Workflow Manager. You can configure display options and connection information in the Workflow Manager. You must register a PowerCenter Server before you can start it or create a workflow to run against it.

You can configure the following information in the Workflow Manager:

♦ Configure Workflow Manager options. You can configure options such as grouping sessions or docking and undocking windows. For details, see “Customizing the Workflow Manager Options” on page 39.

♦ Register PowerCenter Servers. Before you can start a PowerCenter Server, you must register it with the repository. For details, see “Registering the PowerCenter Server” on page 46.

♦ Create a server grid. When you have multiple PowerCenter Servers registered to the same repository, you can create a server grid to balance workloads. For details, see “Working with Server Grids” on page 446.

♦ Create source and target database connections. Create connections to each source and target database. You must create connections to a database before you can create a session that accesses the database. For details, see “Setting Up a Relational Database Connection” on page 53.

♦ Create connection objects. Create connection objects in the repository when you define database, FTP, and external loader connections. For details, see “Configuring Connection Object Permissions” on page 51.

Setting the Date/Time Display Format

The Workflow Manager displays the date and time formats configured in the Windows Control Panel of the PowerCenter Client machine. To modify the date and time formats, open the Control Panel and then open Regional Settings. Set the date and time formats on the Date and Time tabs.

Note: For the Timer task and schedule settings, the Workflow Manager displays the date in short date format and the time in 24-hour format (HH:mm).


Customizing the Workflow Manager Options

You can customize the Workflow Manager default options to control the behavior and look of the Workflow Manager tools.

To configure Workflow Manager options, choose Tools-Options. You can configure the following options:

♦ General. You can configure workspace options, display options, and other general options on the General tab. For more information about the General tab, see “Configuring General Options” on page 39.

♦ Format. You can configure font, color, and other format options on the Format tab. For more information about the Format tab, see “Configuring Format Options” on page 42.

♦ Miscellaneous. You can configure Copy Wizard and Versioning options on the Miscellaneous tab. For more information about the Miscellaneous tab, see “Configuring Miscellaneous Options” on page 43.

♦ Advanced. You can configure enhanced security for connection objects in the Advanced tab. For more information about the Advanced tab, see “Enabling Enhanced Security” on page 44.

Configuring General Options

General options control tool behavior such as whether or not a tool retains its view when you close it, how the Overview window behaves, and where the Workflow Manager stores workspace files.


Figure 2-1 shows the Workflow Manager General Options:

Table 2-1 describes general options you can configure in the Workflow Manager:

Figure 2-1. Workflow Manager General Options

Table 2-1. Workflow Manager General Options

Option Description

Reload Tasks/Workflows When Opening a Folder

Reloads the last view of a tool when you open it. For example, if you have a workflow open when you disconnect from a repository, select this option so that the same workflow displays the next time you open the folder and Workflow Designer. Enabled by default.

Ask Whether to Reload the Tasks/Workflows

Appears only when you select Reload Tasks/Workflows When Opening a Folder. Select this option if you want the Workflow Manager to prompt you to reload tasks, workflows, and worklets each time you open a folder. Disabled by default.

Overview Window Pans Delay

By default, when you drag the focus of the Overview window, the focus of the workbook moves concurrently. When you select this option, the focus of the workspace does not change until you release the mouse button. Disabled by default.

Arrange Workflows/Worklets Vertically By Default

Arranges tasks in workflows vertically by default. Disabled by default.

Allow Invoking In-Place Editing Using the Mouse

By default, you can press F2 to edit objects directly in the workspace instead of opening the Edit Task dialog box. Select this option so you can also click the object name in the workspace to edit the object. Disabled by default.


Open Editor When Task Is Created

Opens the Edit Task dialog box when you create a task. By default, the Workflow Manager creates the task in the workspace. If you do not enable this option, double-click the task to open the Edit Task dialog box. Disabled by default.

Workspace File Directory

The directory for workspace files created by the Workflow Manager. Workspace files maintain the last task or workflow you saved. This directory should be local to the PowerCenter Client to prevent file corruption or overwrites by multiple users. By default, the Workflow Manager creates files in the PowerCenter Client installation directory.

Display Tool Names On Views

Displays the name of the tool in the upper left corner of the workspace or workbook. Enabled by default.

Always Show the Full Name of Selected Task

Shows the full name of a task when you select it. By default, the Workflow Manager abbreviates the task name in the workspace. Enabled by default.

Show the Expression On a Link

Shows the link condition in the workspace. If you do not enable this option, the Workflow Manager abbreviates the link condition in the workspace. Enabled by default.

Launch Workflow Monitor when Workflow is Started

The Workflow Monitor launches when you start a workflow or a task. Enabled by default.

Receive Notifications from Server

Allows you to receive notification messages from the Repository Server. The Repository Server sends notification about actions performed on repository objects. Enabled by default. For details, see “Understanding the Repository” in the Repository Guide.



Configuring Format Options

Format options control colors and fonts. To configure format options, select the appropriate Workflow Manager tool.

Figure 2-2 shows the Workflow Manager Format Options:

Table 2-2 describes the format options for the Workflow Manager:

Figure 2-2. Workflow Manager Format Options

Table 2-2. Workflow Manager Format Options

Option Description

Show Solid Lines for Links

Displays links as solid lines. By default, the Workflow Manager displays links as dotted lines.

Workspace Colors

Displays all items that you can customize in the selected tool. Select an item to change its color.

Color

Choose the color of the selected item in Workspace Colors.

Font Categories

Select the Workflow Manager tool for which you want to customize the display font.

Change Font

Select to change the display font and language script for the Workflow Manager tool you choose from the Categories menu.

Reset All

Resets all format options to their original default values.


Configuring Miscellaneous Options

Copy Wizard options control the display settings and available functions for the Copy Wizard. Versioning options control how the Workflow Manager displays checked out objects. Target loading options control how the PowerCenter Server loads targets. To configure Copy Wizard, Versioning, or Target Load Type options, choose Tools-Options and select the Miscellaneous tab.

Figure 2-3 shows the Workflow Manager Miscellaneous Options:

Figure 2-3. Copy Wizard, Versioning, and Target Load Type Options

Table 2-3 describes the options for the Copy Wizard, Versioning, and Target Load Type:

Table 2-3. Workflow Manager Miscellaneous Options

Option Description

Validate Copied Objects

Validates the copied object. Enabled by default.

Generate Unique Name When Resolved to “Rename”

Generates unique names for copied objects if you select the Rename option. For example, if the workflow wf_Sales has the same name as a workflow in the destination folder, the Rename option generates the unique name wf_Sales1. Enabled by default.

Get Default Object When Resolved to “Choose”

Uses the object with the same name in the destination folder if you select the Choose option.

Show Check Out Image in Navigator

Displays the Check Out icon when an object has been checked out. Enabled by default.

Reset All

Resets all Copy Wizard and Versioning options to their default values.

Target Load Type

Sets the default load type for sessions. You can choose normal or bulk loading. Any change you make takes effect after you restart the Workflow Manager. You can override this setting in the session properties. Default is Bulk. For more information on normal and bulk loading, see Table A-15 on page 697.


Enabling Enhanced Security

The Workflow Manager has an enhanced security option that allows you to specify a default set of privileges that applies to restricted access controls for connection objects.

When you enable enhanced security, the Workflow Manager automatically assigns default permissions for connection objects to the object owner, owner group, and all other users. You can assign read, write, and execute permissions to an object, and specify permission for users and groups you add in the Permissions dialog box when you edit a connection.

Table 2-4 lists the default permissions for a connection object:

Table 2-4. Default Permissions for Connection Objects

User          Default Connection Object Permissions
Owner         Read/Write/Execute
Owner Group   Read/Execute
World         No permissions

If you do not enable enhanced security, the Workflow Manager assigns Read, Write, and Execute permissions to all users or groups for the connection.

Enabling enhanced security does not lock the restricted access settings for connection objects. You can continue to change the permissions for connection objects after enabling enhanced security.

If you delete the Owner from the repository, the Workflow Manager automatically assigns ownership of the object to Administrator.

To enable enhanced security for connection objects:

1. Choose Tools-Options.

2. Click the Advanced tab.

3. Select Enable Enhanced Security.

4. Click OK.
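The effect of the enhanced security option on default permissions can be summarized in a small lookup. The following Python sketch only restates Table 2-4 and the rule for disabled enhanced security. The names ENHANCED_DEFAULTS and default_permissions are hypothetical and do not correspond to any PowerCenter interface.

```python
# Default permissions from Table 2-4 (enhanced security enabled).
ENHANCED_DEFAULTS = {
    "owner": {"read", "write", "execute"},
    "owner_group": {"read", "execute"},
    "world": set(),
}

ALL_PERMISSIONS = {"read", "write", "execute"}


def default_permissions(enhanced_security):
    """Return the default permission sets for a new connection object."""
    if enhanced_security:
        return {who: set(perms) for who, perms in ENHANCED_DEFAULTS.items()}
    # Without enhanced security, all users and groups get all three permissions.
    return {who: set(ALL_PERMISSIONS) for who in ENHANCED_DEFAULTS}


print(default_permissions(True))
print(default_permissions(False))
```

These are defaults only. As noted above, you can still change the permissions of a connection object after you enable enhanced security.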


Registering the PowerCenter Server

Before you can start the PowerCenter Server or create or run workflows, you need to register the PowerCenter Server in the repository. Use the Workflow Manager to register the PowerCenter Server.

To register, edit, or delete the PowerCenter Server, you must have Administer Server, Administrator, or Super User privileges. In addition, to register a PowerCenter Server, you need the following information:

♦ PowerCenter Server name.

♦ Host name.

♦ TCP/IP address used to access the PowerCenter Server.

Use the IP address or host name of the machine on which the PowerCenter Server runs, and the port number the PowerCenter Server uses on that machine.

♦ Code page identifying the character set associated with the PowerCenter Server.

♦ Default directories you want the PowerCenter Server to use for workflow files and caches.

You can perform the following registration tasks for a PowerCenter Server:

♦ Register a PowerCenter Server. When you register a PowerCenter Server, specify information such as the code page and directories for session output. This information is stored in the repository.

When you register multiple PowerCenter Servers, you can choose the PowerCenter Server to run a workflow or a session. You also can create a server grid to distribute workloads across multiple servers.

♦ Edit a PowerCenter Server. When you edit a PowerCenter Server, all workflows and sessions using that PowerCenter Server use the updated server connection information, including the updated code page settings. You do not need to restart the Workflow Manager to use the updated information.

♦ Delete a PowerCenter Server. When you delete a PowerCenter Server, you must assign another PowerCenter Server for the workflows and sessions using the deleted server before you can run the workflow. To assign a PowerCenter Server to a workflow or to a session, choose Connections-Assign.

Server Variables

You can define server variables for each PowerCenter Server you register. Some server variables define the path and directories for workflow output files and caches. By default, the PowerCenter Server places output files in these directories when you run a workflow. Other server variables define server attributes such as log file count. In a server grid, you must use the same server variables for each server.

The installation process creates directories in the location where you install the PowerCenter Server. To use these directories as the default location for the session output files, you must first set the server variable $PMRootDir to define the path to the directories.


By using server variables, you simplify the process of changing the PowerCenter Server that runs a workflow. If each workflow in a folder uses server variables, then when you copy the folder to a production repository, the PowerCenter Server in production can run the workflow using the server variables defined with the PowerCenter Server running against the test repository. The PowerCenter Server reads and writes the files to the directories in the $PMRootDir path. To ensure a workflow successfully completes, relocate any necessary file source or incremental aggregation file to the default directories of the new PowerCenter Server.
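To illustrate how the directory variables build on $PMRootDir, the following Python sketch expands the documented default values (the same defaults listed in Table 2-5) for a given root directory. It is a model for illustration, not the way the PowerCenter Server resolves variables. The function name resolve_directories and the sample paths are hypothetical.

```python
# Documented defaults, expressed relative to $PMRootDir.
DEFAULTS = {
    "$PMSessionLogDir": "$PMRootDir/SessLogs",
    "$PMBadFileDir": "$PMRootDir/BadFiles",
    "$PMCacheDir": "$PMRootDir/Cache",
    "$PMTargetFileDir": "$PMRootDir/TgtFiles",
    "$PMSourceFileDir": "$PMRootDir/SrcFiles",
    "$PMExtProcDir": "$PMRootDir/ExtProc",
    "$PMTempDir": "$PMRootDir/Temp",
    "$PMWorkflowLogDir": "$PMRootDir/WorkflowLogs",
    "$PMLookupFileDir": "$PMRootDir/LkpFiles",
}


def resolve_directories(root_dir, overrides=None):
    """Expand $PMRootDir in each directory variable for one server."""
    settings = dict(DEFAULTS)
    settings.update(overrides or {})
    return {
        name: value.replace("$PMRootDir", root_dir)
        for name, value in settings.items()
    }


# Hypothetical root directories for a test server and a production server.
test_dirs = resolve_directories("/opt/pc_test")
prod_dirs = resolve_directories("/opt/pc_prod", {"$PMTempDir": "/scratch/tmp"})
print(test_dirs["$PMCacheDir"])
print(prod_dirs["$PMCacheDir"])
print(prod_dirs["$PMTempDir"])
```

The same variable name resolves to a different directory on each server, which is why a workflow that refers only to server variables can move between servers without edits.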

Table 2-5 lists the server variables you configure when you register a PowerCenter Server:

Table 2-5. Server Variables

Server Variable Required/Optional Description

$PMRootDir Required A root directory to be used by any or all other server variables. Informatica recommends you use the PowerCenter Server installation directory as the root directory.

$PMSessionLogDir Required Default directory for session logs. Defaults to $PMRootDir/SessLogs.

$PMBadFileDir Required Default directory for reject files. Defaults to $PMRootDir/BadFiles.

$PMCacheDir Required Default directory for the index and data cache files. Defaults to $PMRootDir/Cache. To avoid performance problems, always use a drive local to the PowerCenter Server for the cache directory. Do not use a mapped or mounted drive for cache files.

$PMTargetFileDir Required Default directory for target files. Defaults to $PMRootDir/TgtFiles.

$PMSourceFileDir Required Default directory for source files. Defaults to $PMRootDir/SrcFiles.

$PMExtProcDir Required Default directory for external procedures. Defaults to $PMRootDir/ExtProc.

$PMTempDir Required Default directory for temporary files. Defaults to $PMRootDir/Temp.

$PMSuccessEmailUser Optional Email address to receive post-session email when the session completes successfully. Use to address post-session email. The default value is an empty string. For details, see “Sending Email” on page 319.

$PMFailureEmailUser Optional Email address to receive post-session email when the session fails. The default value is an empty string. Use to address post-session email.

$PMSessionLogCount Optional Number of session logs the PowerCenter Server archives for the session. Use to archive session logs. For details, see “Viewing Session Logs” on page 474. Defaults to 0.

$PMSessionErrorThreshold Optional Number of non-fatal errors the PowerCenter Server allows before failing the session. Non-fatal errors include reader, writer, and DTM errors. If you want to stop the session on errors, enter the number of non-fatal errors you want to allow before stopping the session. The PowerCenter Server maintains an independent error count for each source, target, and transformation. Use to configure the Stop On option in the session properties. Defaults to 0. If you use the default setting, non-fatal errors do not cause the session to stop.

$PMWorkflowLogDir Required Default directory for workflow logs. Defaults to $PMRootDir/WorkflowLogs.

$PMWorkflowLogCount Optional Number of workflow logs the PowerCenter Server archives for the workflow. Defaults to 0.

$PMLookupFileDir Optional Default directory for lookup files. Defaults to $PMRootDir/LkpFiles.

Steps for Registering a PowerCenter Server

You can register one or more PowerCenter Servers with a PowerCenter repository, allowing you to run workflows and sessions on different servers. In a multiple server environment, it is important to enter descriptive server names for each registered server to help users differentiate between servers. When you register multiple servers, each server must have a unique server name and a unique combination of host name and port number in the repository. For more information on using multiple servers, see “Using Multiple Servers” on page 443.

To register the PowerCenter Server:

1. In the Workflow Manager, connect to the repository.

Note: The first time you connect to the repository, use the database user name and password used to create the repository.

2. Choose Server-Server Configuration.

The Server Browser dialog box appears.

3. Click New to register a new server.


The Server dialog box appears.

4. Enter a new server name.

5. Configure the TCP/IP connectivity settings.

6. If you do not know the IP address, enter the host name and use the Resolve Server button to resolve the IP address. You can also enter the IP address in the Host Name/IP Address field and use the Resolve Server button to resolve the host name.

The Workflow Manager can only resolve the host name or IP address if you enter the information in the Host Name/IP Address field.

The Workflow Manager also resolves the host name or IP address when you click OK.

Table 2-6 describes the settings required to register a PowerCenter Server using TCP/IP:

Table 2-6. TCP/IP Settings to Register a Server

TCP/IP Option Required/Optional Description

Server Name Required The name of PowerCenter Server. This name must be unique to the repository.

Host Name or IP Address Required Server host name or IP address of the PowerCenter Server machine.

Resolved IP Address n/a (read-only) The IP address resolved by the Workflow Manager. This is a read-only field.

Port Number Required Port number the PowerCenter Server uses. Must be the same port listed in the PowerCenter Server configuration parameters.


7. For $PMRootDir, enter a valid root directory for the PowerCenter Server platform.

Informatica recommends using the PowerCenter Server installation directory as the root directory because the PowerCenter Server installation creates the default server directories there. If you enter a different root directory, make sure to create the necessary directories.

8. Enter the server variables, as desired.

Do not use trailing delimiters. A trailing delimiter might invalidate the directory used by the PowerCenter Server. For example, enter c:\data\sessionlog, not c:\data\sessionlog\.

See Table 2-5 on page 47 for a list of server variables.

9. Click OK.

The new PowerCenter Server appears in the Navigator below the repository.
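The rule about trailing delimiters in step 8 can be checked before you type a directory into the Server dialog box. The following Python sketch is a hypothetical helper, not part of PowerCenter. It assumes that both / and \ count as delimiters and leaves a bare root such as / or c:\ unchanged.

```python
def normalize_directory(path):
    """Remove trailing / or \\ delimiters from a directory path.

    A bare root ("/" or "c:\\") is returned unchanged, because removing
    its delimiter would change which directory it names.
    """
    stripped = path.rstrip("/\\")
    if not stripped or stripped.endswith(":"):
        return path
    return stripped


# The example from step 8, plus a UNIX-style path.
print(normalize_directory("c:\\data\\sessionlog\\"))
print(normalize_directory("/opt/pc/SessLogs/"))
```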

Deleting a PowerCenter Server

When you delete a PowerCenter Server with associated workflows, assign another server to the workflows. For details, see “Assigning the PowerCenter Server to a Workflow” on page 122.

To delete a PowerCenter Server, you must have one of the following privileges:

♦ Administer Server privilege

♦ Super User privilege

To delete a server:

1. In the Workflow Manager, choose Server-Server Configuration.

2. Select the PowerCenter Server you want to delete.

3. Click Delete.

4. Click OK.

Table 2-6. TCP/IP Settings to Register a Server (continued)

TCP/IP Option Required/Optional Description

Timeout Required Number of seconds the Workflow Manager waits for a response from the PowerCenter Server.

Code Page Required Character set associated with the PowerCenter Server. Select the code page identical to the PowerCenter Server operating system code page. Must be identical to or compatible with the repository code page.


Configuring Connection Object Permissions

You create connection objects in the repository when you define the following connections:

♦ Relational. Database connections for relational source or target databases. For more information about relational database connections, see “Setting Up a Relational Database Connection” on page 53.

♦ Queue. Database connections for message queues. For more information about message queues, see the PowerCenter Connect for IBM MQSeries User and Administrator Guide.

♦ FTP. Connection to access source or target files using File Transfer Protocol (FTP). For more information about using FTP, see “Using FTP” on page 559.

♦ Application. Database connection to access databases such as SAP R/3 and PeopleSoft. For more information, see your PowerCenter Connect documentation.

♦ Loader. Connection to access target databases using external loaders. For more information about using external loaders, see “External Loading” on page 523.

With correct permissions, you can access these objects from all folders in the repository and use them in any session.

Connection Object Permissions

You can configure and manage permissions within each connection object. The Workflow Manager assigns Owner permissions to the user who creates the connection. The Workflow Manager grants Owner Group permissions to the first group in the Group Memberships list of the owner.

The Workflow Manager automatically assigns default permissions for connection objects to the object owner, owner’s group, and all other users if you enable enhanced security. For more information about enhanced security, see “Enabling Enhanced Security” on page 44.

You can specify read, write, and execute permissions for each user and group in the list. You can perform the following types of tasks with different connection object permissions, in combination with user privileges and folder permissions:

♦ Read. View the connection object in the Workflow Manager and Repository Manager. When you have read permission, you can perform tasks in which you view, copy, or edit repository objects associated with the connection object.

♦ Write. Edit the connection object.

♦ Execute. Run sessions that use the connection object.

For information on tasks you can perform with user privileges, folder permissions, and connection object permissions, see “Repository Security” in the Repository Guide.

To manage connection permissions, you must have Super User privileges or be the owner of the connection. If you do not have the privilege to manage connection permissions, the Permissions dialog box is read-only. You can change the owner of the object, add or remove users and groups in the permissions list, and change the permissions for each user or group.


To view or delete a connection, you must have at least read permission for the connection. To edit a connection, you must have read and write permissions for the connection.
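The permission requirements described in this section can be sketched as a simple set comparison. The following Python example models connection object permissions only. It ignores user privileges and folder permissions, which also apply, and the names REQUIRED and can_perform are hypothetical.

```python
# Connection object permissions needed for each task, as described above.
REQUIRED = {
    "view": {"read"},
    "delete": {"read"},
    "edit": {"read", "write"},
    "run_session": {"execute"},
}


def can_perform(action, granted):
    """Return True if the granted permissions cover the given action."""
    return REQUIRED[action] <= set(granted)


# A user in the owner group under enhanced security has read and execute.
owner_group = {"read", "execute"}
for action in REQUIRED:
    print(action, can_perform(action, owner_group))
```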

You add permissions from the Connection Browser dialog box.

To configure permissions for connection objects:

1. Open the Connection Browser dialog box for the connection object. For example, choose Connections-Relational to open the Connection Browser dialog box for a relational database connection.

2. Select the connection object you want to configure in the Connection Browser dialog box.

3. Click Permissions to open the Permissions dialog box.

4. Select the owner and group for the connection object.

5. Add the users or groups to which you want to assign permissions for the connection, and click OK.

Configure permissions for connection objects.


Setting Up a Relational Database Connection

Before the PowerCenter Server can access a source or target database in a session, you must configure the database connections in the Workflow Manager. When you create or modify a session that reads from or writes to a relational database, you can select only configured source and target databases. Database connections are saved in the repository.

When you create a connection, you must have the following information available:

♦ Database name. Name for the connection.

♦ Database type. Type of the source or target database.

♦ Database username. Name of a user who has the appropriate database permissions to read from and write to the database.

♦ Password. Database password (7-bit ASCII only).

♦ Connect string. Connect string used to communicate with the database.

♦ Database code page. Code page associated with the database.

Some database drivers, such as ISG Navigator, do not allow user names and passwords. Since the Workflow Manager requires a database user name and password, PowerCenter provides two reserved words to register databases that do not allow user names and passwords:

♦ PmNullUser

♦ PmNullPasswd

Use the PmNullUser user name if you are using Oracle OS Authentication. Oracle OS Authentication allows you to log on to an Oracle database if you have a logon to the operating system. You do not need to know a database user name and password. PowerCenter uses Oracle OS Authentication when the connection user name is PmNullUser and the connection is for an Oracle database.

You can change connection information at any time. If you edit a Workflow Manager connection used by a workflow, the PowerCenter Server uses the updated connection information the next time the workflow runs. You might use this functionality when moving from test to production.

Tip: If you edit a database connection, all sessions using the named connection then use the updated connection.

To create a database connection, you must have one of the following privileges:

♦ Use Workflow Manager

♦ Super User

Database Connect Strings

When you create a database connection, specify a connect string for that connection. The PowerCenter Server uses connect strings to communicate with a database.
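As an illustration of how native connect strings differ by database, the following Python sketch assembles a connect string from its parts using the syntax that Table 2-7 lists. The TEMPLATES mapping and the function native_connect_string are hypothetical names. Teradata is omitted because its connect string is based on an ODBC data source name rather than on a database or server name.

```python
# Connect string syntax per database, following Table 2-7.
TEMPLATES = {
    "IBM DB2": "{dbname}",
    "Informix": "{dbname}@{servername}",
    "Microsoft SQL Server": "{servername}@{dbname}",
    "Oracle": "{dbname}.world",  # must match the TNSNAMES entry
    "Sybase": "{servername}@{dbname}",
}


def native_connect_string(database, **parts):
    """Fill in the connect string template for the given database type."""
    return TEMPLATES[database].format(**parts)


print(native_connect_string("Informix", dbname="mydatabase", servername="informix"))
print(native_connect_string("Microsoft SQL Server", dbname="mydatabase", servername="sqlserver"))
print(native_connect_string("Oracle", dbname="oracle"))
```

Note that Informix places the database name first, while Microsoft SQL Server and Sybase place the server name first.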


Table 2-7 lists the native connect string syntax for each supported database when you create or update connections:

Table 2-7. Native Connect String Syntax

Database               Connect String Syntax                    Example
IBM DB2                dbname                                   mydatabase
Informix               dbname@servername                        mydatabase@informix
Microsoft SQL Server   servername@dbname                        sqlserver@mydatabase
Oracle                 dbname.world (same as TNSNAMES entry)    oracle.world
Sybase                 servername@dbname                        sambrown@mydatabase
Teradata*              ODBC_data_source_name or                 TeradataODBC
                       ODBC_data_source_name@db_name or         TeradataODBC@mydatabase
                       ODBC_data_source_name@db_user_name       TeradataODBC@jsmith

*Use Teradata ODBC drivers to connect to source and target databases.

Database Connection Code Pages

When you create a database connection, select a code page for that connection. Code pages must be compatible for accurate data movement.

If you configure the PowerCenter Server and PowerCenter Client for data code page validation, the PowerCenter Server enforces code page compatibility at session runtime. Use the following guidelines to determine code page compatibility:

♦ The target database code page must be a superset of the source database code page and the PowerCenter Server code page.

♦ The source database code page must be a subset of the target database code page and the PowerCenter Server code page.

For example, if the source database code page is 7-bit ASCII and the PowerCenter Server code page is Latin 1, the target database code page must be Latin 1, which is a superset of 7-bit ASCII.

Table 2-8 summarizes code page compatibility between the source and target code pages when you configure the PowerCenter Client and PowerCenter Server for data code page validation:

Table 2-8. Source and Target Code Page Compatibility

Component Code Page    Code Page Compatibility
Source                 Subset of target and PowerCenter Server.
Target                 Superset of source and PowerCenter Server. The PowerCenter Server creates external loader data and control files using the target flat file code page.


When you change the code page in a database connection, you must choose one that is compatible with the previous code page. If the code pages are incompatible, the Workflow Manager invalidates all sessions using that database connection.

If you configure the PowerCenter Client and PowerCenter Server for relaxed data code page validation, you can select any supported code page for source and target database connections. If you are familiar with your data and are confident that it will convert safely from one code page to another, you can run sessions with incompatible source and target data code pages. It is your responsibility to ensure your data will convert properly.

For details, see “Globalization Overview” and “Code Pages” in the Installation and Configuration Guide.
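The subset and superset relationships described in this section can be illustrated outside PowerCenter. The following Python sketch is not how PowerCenter validates code pages. It assumes single-byte code pages and uses the Python codec names ascii and latin-1 as stand-ins for 7-bit ASCII and Latin 1. The function names are hypothetical.

```python
def single_byte_repertoire(codec):
    """Return the set of characters a single-byte codec can represent."""
    chars = set()
    for value in range(256):
        try:
            chars.add(bytes([value]).decode(codec))
        except UnicodeDecodeError:
            # This byte value is not assigned in the codec.
            pass
    return chars


def is_subset(source_codec, target_codec):
    """True if every character in the source codec exists in the target codec."""
    return single_byte_repertoire(source_codec) <= single_byte_repertoire(target_codec)


print(is_subset("ascii", "latin-1"))   # True: Latin 1 is a superset of 7-bit ASCII
print(is_subset("latin-1", "ascii"))   # False: accented characters have no ASCII form
```

A source code page that passes this kind of check against the target converts without losing characters, which is the property that data code page validation protects.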

Configuring Environment SQL

For relational databases, you may need to execute some SQL commands in the database environment when you connect to the database. For example, you might want to set isolation levels on the source and target systems to avoid deadlocks.

You configure environment SQL in the database connection. You can use environment SQL for source, target, lookup, and stored procedure connections. If the SQL syntax is not valid, the PowerCenter Server does not connect to the database, and the session fails.

The PowerCenter Server executes the SQL each time it connects to the database. For example, if you configure environment SQL in a target connection, and you configure three partitions for the pipeline, the PowerCenter Server executes the SQL three times, once for each connection to the target database.

Guidelines for Entering Environment SQL

Consider the following guidelines when creating the SQL statements:

♦ You can enter any SQL command that is valid in the database associated with the connection object. The PowerCenter Server does not allow nested comments, even though the database might.

♦ When you enter SQL in the SQL Editor, you manually type in the SQL statements.

♦ Use a semi-colon (;) to separate multiple statements.

♦ The PowerCenter Server ignores semi-colons within single quotes, double quotes, or within /* ...*/.

♦ If you need to use a semi-colon outside of quotes or comments, you can escape it with a backslash (\).

♦ You cannot use session or mapping variables in the environment SQL.

♦ You can configure the table owner name using sqlid in the environment SQL for a DB2 connection. However, the table owner name in the target instance overrides the SET sqlid statement in environment SQL. To use the table owner name specified in the SET sqlid statement, do not enter a name in the target name prefix.
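The semi-colon guidelines above can be sketched as a small statement splitter. This Python example is an illustration of the stated rules only, not the PowerCenter Server parser. The function name split_statements is hypothetical, nested comments are not handled, and dropping the backslash from an escaped semi-colon is an assumption.

```python
def split_statements(sql):
    """Split environment SQL on semi-colons, following the guidelines above."""
    statements, current = [], []
    quote = None          # the quote character currently open, if any
    in_comment = False    # True while inside /* ... */
    i = 0
    while i < len(sql):
        ch = sql[i]
        if in_comment:
            if sql.startswith("*/", i):
                current.append("*/")
                in_comment = False
                i += 2
                continue
            current.append(ch)
        elif quote:
            current.append(ch)
            if ch == quote:
                quote = None
        elif sql.startswith("/*", i):
            current.append("/*")
            in_comment = True
            i += 2
            continue
        elif ch in ("'", '"'):
            quote = ch
            current.append(ch)
        elif ch == "\\" and sql.startswith(";", i + 1):
            # Escaped semi-colon: keep it as text (assumption: backslash is dropped).
            current.append(";")
            i += 2
            continue
        elif ch == ";":
            statements.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
        i += 1
    statements.append("".join(current).strip())
    return [s for s in statements if s]


print(split_statements("SET A = 'x;y'; /* c; d */ SET B = 1"))
```

The semi-colons inside the quoted value and inside the comment do not end a statement, so the example input yields two statements.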


Configuring a Relational Database Connection

Use the following procedure to configure a relational database connection.

To create a relational database connection:

1. In the Workflow Manager, connect to a repository.

2. Choose Connections-Relational.

A dialog box appears, listing all the registered source and target database connections.

3. Select the type of database connection you want to create.

4. Click New.


The Connection Object Definition dialog box appears.

5. For relational database connections, enter the connection information listed in Table 2-9:

Table 2-9. Relational Database Connection Information

Database Connection Option | Required/Optional | Description
Name | Required | Connection name used by the Workflow Manager. Connection name cannot contain spaces or other special characters, except for the underscore.
Type | Required | Type of database.
User Name | Required | Database user name with the appropriate read and write database permissions to access the database. If you are using Oracle OS Authentication, or you are using databases such as ISG Navigator that do not allow user names, enter PmNullUser. For Teradata connections, this overrides the default database user name in the ODBC entry.
Password | Required | Password for the database user name. For Oracle OS Authentication, or for databases such as ISG Navigator that do not allow passwords, enter PmNullPassword. For Teradata connections, this overrides the database password in the ODBC entry. Passwords must be in 7-bit ASCII only.
Connect String | Required for all databases, except Microsoft SQL Server and Sybase | Connect string used to communicate with the database. For syntax, see “Database Connect Strings” on page 53.
Code Page | Required | Specifies the code page the PowerCenter Server uses to read from a source database or write to a target database or file.

6. For each type of relational database connection, enter the attributes listed in Table 2-10:

Table 2-10. Relational Database Connection Attributes

Attribute Name | Relational Database Type | Description
Rollback Segment | Oracle | The name of the rollback segment. A rollback segment records database transactions in the event that you want to undo the transaction.
Enable Parallel Mode | Oracle | Enables parallel processing when loading data into a table in bulk mode.
Environment SQL | All relational databases | Enter SQL commands to set the database environment when you connect to the database.
Database Name | Sybase, Microsoft SQL Server, and Teradata | The name of the database. For Teradata connections, this overrides the default database name in the ODBC entry. Also, if you do not enter a database name here for a Teradata connection, the PowerCenter Server uses the default database name in the ODBC entry.
Data Source Name | Teradata | The name of the Teradata ODBC data source.
Server Name | Sybase and Microsoft SQL Server | Database server name. Used to configure workflows.
Packet Size | Sybase and Microsoft SQL Server | Used to optimize the ODBC connection to Sybase and Microsoft SQL Server.
Domain Name | Microsoft SQL Server | The name of the domain. Used for Microsoft SQL Server on Windows.
Use Trusted Connection | Microsoft SQL Server | If selected, the PowerCenter Server uses Windows authentication to access the Microsoft SQL Server database. The user name that starts the PowerCenter Server must be a valid Windows user with access to the Microsoft SQL Server database.

7. Click OK.

The new database connection appears in the Connection Browser list.

8. To add more database connections, repeat steps 3-7.



9. Click OK to save all changes.
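If you keep a list of planned connection names outside the Workflow Manager, you might check them against the naming rule for the Name option before you create the connections. The following Python sketch is not part of PowerCenter. It assumes that "special characters" means anything other than ASCII letters, digits, and the underscore, and the function name is hypothetical.

```python
import re

# Assumption: only ASCII letters, digits, and underscores are allowed.
_VALID_NAME = re.compile(r"[A-Za-z0-9_]+")


def is_valid_connection_name(name):
    """Return True if the name has no spaces or special characters."""
    return _VALID_NAME.fullmatch(name) is not None


for candidate in ("Dev_Target", "Dev Target", "Dev-Target"):
    print(candidate, is_valid_connection_name(candidate))
```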

Deleting Connection Objects

When you delete relational, queue, FTP, Application, and external loader connections, the Workflow Manager marks all sessions that use these connections invalid. To make the sessions valid, you must edit them and replace the missing connections.

Copying a Relational Database Connection

After you set up a relational database connection, you can make a copy of it by clicking the Copy As button. The Workflow Manager allows you to choose the relational database type when you make a copy of a relational database connection.

When you make a copy of a relational database connection, the Workflow Manager retains the connection properties that apply to the relational database type you select. The copy of the connection is invalid if a required connection property is missing. Edit the connection properties manually to validate the connection.

The Workflow Manager appends an underscore and the first three letters of the relational database type to the name of the new database connection. For example, you make a copy of the Microsoft SQL Server database connection called Dev_Target. You choose Oracle for the type of the new database connection. The Workflow Manager names the new database connection Dev_Target_Ora.
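The naming rule above can be expressed in one line. The following Python sketch only restates the rule as described in this section. The function name is hypothetical, and the sketch does not cover what the Workflow Manager does if the resulting name already exists.

```python
def copied_connection_name(original_name, new_database_type):
    """Append an underscore and the first three letters of the database type."""
    return f"{original_name}_{new_database_type[:3]}"


print(copied_connection_name("Dev_Target", "Oracle"))  # Dev_Target_Ora
```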

To copy a relational database connection:

1. Choose Connections-Relational.

The Relational Connection Browser appears.

2. Choose the relational connection you want to copy.

Tip: Hold the shift key to select more than one connection to copy.


3. Click Copy As.

The Select Subtype dialog box appears.

4. Select a relational database type for the copy of the connection.

5. Click OK.

6. The Workflow Manager retains connection properties that apply to the relational database type.

If a required connection property does not exist, the Workflow Manager displays a warning message.

7. Click OK to close the warning dialog box.

8. The copy of the connection appears in the Relational Connection Browser.


9. If the copied connection is invalid, click the Edit button to enter required connection properties.

10. Click Close to close the Relational Connection Browser dialog box.


Replacing a Relational Database Connection

You can replace a relational database connection with another relational database connection. For example, you might have several sessions that you want to write to another target database. Instead of editing the properties for each session, you can replace the relational database connection for all sessions in the repository that use the connection.

When you replace database connections, the Workflow Manager replaces the relational database connections in the following locations for all sessions using the connection:

♦ Source connection

♦ Target connection

♦ Connection Information property in Lookup and Stored Procedure transformations

♦ $Source Connection Value session property

♦ $Target Connection Value session property

If the repository contains both relational and application connections with the same name, the Workflow Manager only replaces the relational connection when you specified the connection type as relational in all locations in the repository.

For example, you have a relational and an application source, each called ITEMS. In one session, you specified the name ITEMS for a source connection instead of Relational:ITEMS.

When you replace the relational connection ITEMS with another relational connection, the Workflow Manager does not replace any relational connection in the repository because it cannot determine the connection type for the source connection entered as ITEMS.
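The ITEMS example can be modeled as follows. This Python sketch is a simplified model of the behavior described above, not the Workflow Manager implementation. The function name, the list of references, and the application_twin_exists flag are all hypothetical.

```python
def replace_relational_connection(references, old_name, new_name, application_twin_exists):
    """Return connection references as they would look after a replace.

    references: connection names as entered in sessions, either typed
    ("Relational:ITEMS") or untyped ("ITEMS").
    """
    typed_old = f"Relational:{old_name}"
    # An untyped reference is ambiguous when an application connection
    # shares the name, so nothing is replaced in that case.
    if application_twin_exists and old_name in references:
        return list(references)
    replaced = []
    for ref in references:
        if ref == typed_old:
            replaced.append(f"Relational:{new_name}")
        elif ref == old_name:
            replaced.append(new_name)
        else:
            replaced.append(ref)
    return replaced


refs = ["Relational:ITEMS", "ITEMS"]
print(replace_relational_connection(refs, "ITEMS", "ITEMS_NEW", True))
```

With an application connection also named ITEMS, the untyped reference blocks the replacement and the list comes back unchanged. Entering Relational:ITEMS in every location avoids the problem.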

The PowerCenter Server uses the updated connection information the next time the workflow runs.

To replace connections in the Workflow Manager, you must have Super User privilege.

You must first close all folders before replacing a relational database connection.

To replace a relational database connection:

1. Close all folders in the repository.

2. Choose Connections-Replace.


The Replace Connections dialog box appears.

3. Click the Add button to replace a connection.

4. In the From list, choose a relational database connection you want to replace.

5. In the To list, choose the replacement relational database connection.

6. Click Replace.

All sessions in the repository that use the From connection now use the connection you choose in the To list.



Chapter 3

Using the Workflow Manager


♦ Overview, 66

♦ Navigating the Workspace, 69

♦ Working with Repository Objects, 73

♦ Checking Out and In Versioned Repository Objects, 74

♦ Searching For Versioned Objects, 76

♦ Copying Repository Objects, 77

♦ Comparing Repository Objects, 79

♦ Working with Metadata Extensions, 82


Overview

In the Workflow Manager, you define a set of instructions called a workflow to execute mappings you build in the Designer. Generally, a workflow contains a session and any other task you may want to perform when you execute a session. Tasks can include a session, email notification, or scheduling information. You connect each task with links in the workflow.

You can also create a worklet in the Workflow Manager. A worklet is an object that groups a set of tasks. A worklet is similar to a workflow, but without scheduling information. You can execute a batch of worklets inside a workflow.

After you create a workflow, you run the workflow in the Workflow Manager and monitor it in the Workflow Monitor. For details on the Workflow Monitor, see “Monitoring Workflows” on page 401.

Workflow Manager Tools

To create a workflow, you first create tasks such as a session, which contains the mapping you build in the Designer. You then connect tasks with conditional links to specify the order of execution for the tasks you created. The Workflow Manager consists of three tools to help you develop a workflow:

♦ Task Developer. Use the Task Developer to create tasks you want to execute in the workflow.

♦ Workflow Designer. Use the Workflow Designer to create a workflow by connecting tasks with links. You can also create tasks in the Workflow Designer as you develop the workflow.

♦ Worklet Designer. Use the Worklet Designer to create a worklet.

Figure 3-1 shows what a workflow might look like if you want to run a session, perform a shell command after the session completes, and then stop the workflow:

Figure 3-1. Sample Workflow

Workflow Tasks

You can create the following types of tasks in the Workflow Manager:

♦ Assignment. Assigns a value to a workflow variable. For details, see “Working with the Assignment Task” on page 140.

♦ Command. Specifies a shell command to run during the workflow. For details, see “Using Workflow Variables” on page 103.

♦ Control. Stops or aborts the workflow. For details on the Control task, see “Stopping or Aborting the Workflow” on page 129.

♦ Decision. Specifies a condition to evaluate. For details, see “Working with the Decision Task” on page 149.

♦ Email. Sends email during the workflow. For details on the Email task, see “Sending Email” on page 319.

♦ Event-Raise. Notifies the Event-Wait task that an event has occurred. For details, see “Working with Event Tasks” on page 153.

♦ Event-Wait. Waits for an event to occur before executing the next task. For details, see “Working with Event Tasks” on page 153.

♦ Session. Runs a mapping you create in the Designer. For details on the Session task, see “Working with Sessions” on page 173.

♦ Timer. Waits for a timed event to trigger. For details, see “Scheduling a Workflow” on page 112.
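One way to picture how links determine the order of execution is to treat the workflow as a directed graph of tasks. The following Python sketch models the sample workflow of a session, a shell command, and a stop. It is an illustration only and does not use any PowerCenter API. The task names are hypothetical, link conditions are not modeled, and the graphlib module requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must complete before it starts.
# This stands in for the links you draw between tasks.
links = {
    "s_load_items": set(),                       # Session task
    "cmd_archive_file": {"s_load_items"},        # Command task
    "ctl_stop_workflow": {"cmd_archive_file"},   # Control task
}


def execution_order(task_links):
    """Return the tasks in an order that respects every link."""
    return list(TopologicalSorter(task_links).static_order())


print(execution_order(links))
# ['s_load_items', 'cmd_archive_file', 'ctl_stop_workflow']
```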

Workflow Manager Windows

The Workflow Manager displays the following windows to help you create and organize workflows:

♦ Navigator. Allows you to connect to and work in multiple repositories and folders. In the Navigator, the Workflow Manager displays a red icon over invalid objects.

♦ Workspace. Allows you to create, edit, and view tasks, workflows, and worklets.

♦ Output. Contains tabs to display different types of output messages. The Output window contains the following tabs:

− Save. Displays messages when you save a workflow, worklet, or task. The Save tab displays a validation summary when you save a workflow or a worklet.

− Fetch Log. Displays messages when the Workflow Manager fetches objects from the repository.

− Validate. Displays messages when you validate a workflow, worklet, or task.

− Copy. Displays messages when you copy repository objects.

− Server. Displays messages from the PowerCenter Server.

− Notifications. Displays messages from the Repository Server.

♦ Overview. An optional window that allows you to easily view large workflows in the workspace. Outlines the visible area in the workspace and highlights selected objects in color. Choose View-Overview Window to display this window.

You can view a list of open windows and switch from one window to another in the Workflow Manager. To view the list of open windows, choose Window-Windows.

The Workflow Manager also displays a status bar that shows the status of the operation you perform.


Figure 3-2 shows the Workflow Manager windows:

Figure 3-2. Workflow Manager Windows (callouts: Navigator, Workspace, Overview, Output, Status Bar)


Navigating the Workspace

The Workflow Manager allows you to perform the following operations to navigate the workspace:

♦ Customize windows.

♦ Customize toolbars.

♦ Search for tasks, links, events and variables.

♦ Arrange objects in the workspace.

♦ Zoom and pan the workspace.

Customizing Workflow Manager Windows

You can customize the following options for the Workflow Manager windows:

♦ Display a window. From the menu, choose View. Then select the window you want to open.

♦ Close a window. Click the small x in the upper right corner of the window.

♦ Dock or undock a window. Double-click the title bar, or drag the title bar toward or away from the workspace.

Using Toolbars

The Workflow Manager can display the following toolbars to help you select tools and perform operations quickly:

♦ Standard. Contains buttons to connect to and disconnect from repositories and folders, toggle windows, zoom in and out, pan the workspace, and find objects.

♦ Connections. Contains buttons to open connection browsers and to assign servers.

♦ Repository. Contains buttons to connect to, disconnect from, and add repositories, open folders, close tools, save changes to repositories, and print the workspace.

♦ View. Contains buttons to customize toolbars, toggle the status bar and windows, toggle full-screen view, create a new workbook, and view the properties of objects.

♦ Layout. Contains buttons to arrange and restore objects in the workspace, find objects, zoom in and out, and pan the workspace.

♦ Tasks. Contains buttons to create tasks.

♦ Workflow. Contains buttons to edit workflow properties.

♦ Run. Contains buttons to schedule the workflow, start the workflow, or start a task.


You can perform the following operations with toolbars:

♦ Display or hide a toolbar.

♦ Create a new toolbar.

♦ Add or remove buttons.

For details on how to perform these toolbar operations, see “Using the Designer” in the Designer Guide.

Searching for Items

The Workflow Manager includes search features to help you find tasks, links, variables, and events in the workspace as well as text in the Output window. You can search for items in any Workflow Manager tool or Output window.

There are two ways to search for items in the workspace:

♦ Find in Workspace. Searches multiple items at once and returns a list of all task names, link conditions, event names, or variable names that contain the search string.

♦ Find Next. Searches through items one at a time and highlights the first task, link, event, variable, or text string that contains the search string. If you repeat the search, the Workflow Manager highlights the next item that contains the search string.

To find a task, link, event, or variable in the workspace:

1. In any Workflow Manager tool, click the Find in Workspace toolbar button or choose Edit-Find in Workspace.

The Find in Workspace dialog box opens:

2. Choose whether you want to search for tasks, links, variables, or events.

3. Enter a search string, or select a string from the list.

The Workflow Manager saves the last 10 search strings in the list.

4. Specify whether or not to match whole words and whether or not to perform a case-sensitive search.

5. Click Find Now.

The Workflow Manager lists task names, link conditions, event names, or variable names that match the search string at the bottom of the dialog box.

6. Click Close.


To find a single object:

1. To search for a task, link, event, or variable, open the appropriate Workflow Manager tool and click a task, link, or event. To search for text in the Output window, click the appropriate tab in the Output window.

2. Enter a search string in the Find field on the standard toolbar.

The search is not case-sensitive.

3. Choose Edit-Find Next, click the Find Next button on the toolbar, or press Enter or F3 to search for the string.

The Workflow Manager highlights the first task name, link condition, event name, or variable name that contains the search string, or the first string in the Output window that matches the search string.

4. To search for the next item, press Enter or F3 again.

The Workflow Manager alerts you when you have searched through all items in the workspace or Output window before it highlights the same objects a second time.
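The difference between the two search methods can be sketched in a few lines of Python. This is an illustration of the behavior described above, not the Workflow Manager search code. The function names and the sample item names are hypothetical, and treating "match whole words" as a regular-expression word boundary is an assumption.

```python
import re


def find_in_workspace(names, text, whole_word=False, match_case=False):
    """Return every name that contains the search string."""
    flags = 0 if match_case else re.IGNORECASE
    pattern = re.escape(text)
    if whole_word:
        pattern = rf"\b{pattern}\b"
    return [name for name in names if re.search(pattern, name, flags)]


def find_next(names, text):
    """Yield matching names one at a time, ignoring case.

    The generator ends once every item has been searched.
    """
    yield from find_in_workspace(names, text)


items = ["s_load_items", "cmd_Load_audit", "e_wait"]
print(find_in_workspace(items, "load"))   # all matches at once
search = find_next(items, "LOAD")
print(next(search))                       # first match
print(next(search))                       # next match
```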

Arranging Objects in the Workspace

The Workflow Manager can arrange objects in the workspace horizontally or vertically. In the Task Developer, you can also arrange tasks evenly in the workspace by choosing Tile. To arrange objects in the workspace, select Layout-Arrange and choose Horizontal, Vertical, or Tile.

Zooming the Workspace

You can zoom in and out as well as pan the workspace to adjust the view.

Use the following toolbar or Layout menu options to set zoom levels:

♦ Zoom Center In/Out by 10%. Increases or decreases the magnification by 10% increments while maintaining the center of the view.

♦ Zoom Point In/Out by 10%. Uses a point you select as the center point and increases or decreases the magnification by 10% increments.

♦ Zoom Rectangle. Increases the current magnification of a rectangular area you select. Degree of magnification depends upon the size of the area you select, workspace size, and current magnification.

♦ Zoom Normal. Sets the zoom level to 100%.

♦ Scale to Fit. Scales all workspace objects to fit the workspace.


♦ Zoom Percent. Sets the zoom level to the percent you choose while maintaining the center of the view.

To maximize the size of the workspace window, choose View-Full Screen. To go back to normal view, click the Close Full Screen button or press Esc.

To pan the workspace, click Layout-Pan or click the Pan button on the toolbar. Drag the focus of the workspace window and release the mouse button when it is in the appropriate position. Double-click the workspace to stop panning.


Working with Repository Objects

The Workflow Manager allows you to perform the following general operations with repository objects:

♦ View properties for each object.

♦ Enter descriptions for each object.

♦ Rename an object.

To edit any repository object, you must first add a repository in the Navigator so you can access the repository object. To add a repository in the Navigator, choose Repository-Add or click the Add Repository button on the Repository toolbar. Enter the repository name and user name and click OK.

Viewing Object Properties

To view properties of a repository object, first select the repository object in the Navigator. Choose View-Properties to view object properties. Or, right-click the repository object and choose Properties.

You can view properties of a folder, task, worklet, or workflow. For folders, the Workflow Manager displays folder name and whether the folder is shared. Object properties are read-only.

You can also view dependencies for repository objects. For more information about viewing object dependencies, see the Repository Guide.

Entering Descriptions for Repository Objects

When you edit an object in the Workflow Manager, you can enter descriptions and comments for that object. Each description or comment field holds up to 2,000 bytes, so the maximum number of characters you can enter is 2,000/K, where K is the maximum number of bytes a character contains in the selected repository code page. For example, if the repository code page is a Japanese code page where each character can contain up to two bytes (K=2), each description and comment field allows you to enter up to 1,000 characters.
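The character limit is simple arithmetic. The following Python sketch restates it. The function name is hypothetical, and rounding down when 2,000 is not evenly divisible by K is an assumption.

```python
def max_description_chars(max_bytes_per_char, limit_bytes=2000):
    """Return how many characters fit in a description or comment field."""
    # Assumption: a partial character does not fit, so round down.
    return limit_bytes // max_bytes_per_char


print(max_description_chars(1))  # 2000 for a single-byte code page
print(max_description_chars(2))  # 1000 for a code page with K=2
```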

Renaming Repository Objects

You can rename repository objects by clicking the Rename button in the Edit Tasks dialog box or the Edit Workflow dialog box. You can also rename repository objects by clicking the object name in the workspace and typing in the new name.


Checking Out and In Versioned Repository Objects

When you work with versioned objects, you check out an object when you want to change it, and check it in when you want to commit your changes to the repository. Checking in new objects adds a new version to the object history.

For more information, see “Working with Versioned Objects” in the Repository Guide.

Checking Out Objects

When you open an object in the workspace, the repository checks out the object and locks the object for your use. No other user can check out the object. If another user has checked out the object, you can open the object as read-only.

You can view objects you and other users have checked out. You might want to view checkouts to see if an object is available for you to work with, or if you need to check in all of the objects you have worked with.

For more information on viewing object checkouts, see “Working with Versioned Objects” in the Repository Guide.

Checking In Objects

You commit changes to the repository by checking in objects. When you check in an object, the repository creates a new version of the object and assigns it a version number. The repository increments the version number by one each time it creates a new version.

You can check in an object from the Workflow Manager workspace. To do this, select the object and choose Versioning-Check in.

You can check in an object when you review the results of the following tasks:

♦ View object history. You can check in an object from the View History window when you view the history of an object.

♦ View checkouts. You can check in an object from the View Checkouts window when you search for checked out objects.

♦ View query results. You can check in an object from the Query Results window when you search for object dependencies or run an object query.

To check in an object, select the object or objects and choose Versioning-Check in.

Enter text into the comment field in the Check In dialog box.


Figure 3-3 shows the Check In dialog box:

Figure 3-3. Check In Workflow Manager Objects (callout: Apply the check in comment to multiple objects.)

When you check in an object, the repository creates a new version of the object and increments the version number by one.

Searching For Versioned Objects

You can use an object query to search for versioned objects in the repository that meet specified conditions. When you run a query, the repository returns results based on those conditions. You may want to create an object query to perform the following tasks:

♦ Track repository objects during development. You can add Label, User, Last saved, or Comments parameters to queries to track objects during development. For more information about creating object queries, see “Grouping Versioned Objects” in the Repository Guide.

♦ Associate a query with a deployment group. When you create a dynamic deployment group, you can associate an object query with it. For more information about working with deployment groups, see “Copying Folders and Deployment Groups” in the Repository Guide.

To create an object query, choose Versioning-Queries to open the Query Browser.

Figure 3-4 shows the Query Browser:

Figure 3-4. Query Browser (callouts: Create a query; Edit a query; Delete a query; Run a query; Configure permissions.)

From the Query Browser, you can create, edit, and delete queries. You can also configure permissions for each query from the Query Browser. You can run any queries for which you have read permissions from the Query Browser.

For information about working with object queries, see “Grouping Versioned Objects” in the Repository Guide.


Copying Repository Objects

You can copy repository objects (such as workflows, worklets, or tasks) within the same folder, to a different folder, or to a different repository. If you want to copy the object to another folder, you must open the destination folder before you copy the object into the folder.

The Workflow Manager provides a Copy Wizard that allows you to copy objects. When you copy a workflow or a worklet, the Copy Wizard copies all of the worklets, sessions, and tasks in the workflow. You must resolve all conflicts that occur. Conflicts occur when the Copy Wizard finds a workflow or worklet with the same name in the target folder, or when the server connection does not exist in the target repository. If a server connection does not exist, you can skip the conflict and choose a server connection after you copy the workflow. You cannot copy server connections. Conflicts may also occur when you copy Session tasks.

For more details on the Copy Wizard, see “Copying Objects” in the Repository Guide.

You can configure display settings and functions of the Copy Wizard by choosing Tools-Options. For details, see “Configuring Miscellaneous Options” on page 43.

Note: The Workflow Manager provides an Import Wizard that allows you to import objects from an XML file. The Import Wizard provides the same options to resolve conflicts as the Copy Wizard. For details, see “Exporting and Importing Objects” in the Repository Guide.

Copying Sessions

When you copy a Session task, the Copy Wizard looks for the database connection and associated mapping in the destination folder. If the mapping or connection does not exist in the destination folder, you can select a new mapping or connection. If the destination folder does not contain any mapping, you must first copy a mapping to the destination folder in the Designer before you can copy the session.

When you copy a session that has mapping variable values saved in the repository, the Workflow Manager either copies or retains the saved variable values.

Copying Workflow Segments

You can copy segments of workflows and worklets when you want to reuse a portion of workflow or worklet logic. A segment consists of one or more tasks, the links between the tasks, and any condition in the links. You can copy reusable and non-reusable objects when copying and pasting segments. You can copy segments of workflows or worklets into workflows and worklets within the same folder, within another folder, or within a folder in a different repository. You can also paste segments of workflows or worklets into an empty Workflow Designer or Worklet Designer workspace.


To copy a segment from a workflow or worklet:

1. Open the workflow or worklet.

2. Select a segment by highlighting each task you want to copy. You can select multiple reusable or non-reusable objects. You can also select segments by dragging the pointer in a rectangle around objects in the workspace.

3. Choose Edit-Copy or press Ctrl+C to copy the segment to the clipboard.

4. Open the workflow or worklet into which you want to paste the segment. You can also copy the object into the Workflow or Worklet Designer workspace.

5. Choose Edit-Paste or press Ctrl+V.

The Copy Wizard opens, and notifies you if it finds copy conflicts.

Note: You can copy individual non-reusable tasks by selecting the individual task and following the instructions for copying and pasting segments.


Comparing Repository Objects

The Workflow Manager allows you to compare two repository objects of the same type to identify differences between the objects. For example, if you have two similar Email tasks in a folder, you can compare them to see which one contains the attributes you need. When you compare two objects, the Workflow Manager displays their attributes in detail.

You can compare objects across folders and repositories. To do this, you must have both folders open. You can compare a reusable object with a non-reusable object. You can also compare two versions of the same object. For more information about versioned objects, see “Working with Versioned Objects” in the Repository Guide.

To compare objects, you must have read permission on each folder that contains the objects you want to compare.

You can compare the following types of objects:

♦ Tasks

♦ Sessions

♦ Worklets

♦ Workflows

You can also compare instances of the same type. For example, if the workflows you compare contain worklet instances with the same name, you can compare the instances to see if they differ. The Workflow Manager also allows you to compare the following instances and attributes:

♦ Instances of sessions and tasks in a workflow or worklet comparison. For example, when you compare workflows, you can compare task instances that have the same name.

♦ Instances of mappings and transformations in a session comparison. For example, when you compare sessions, you can compare mapping instances.

♦ The attributes of instances of the same type within a mapping comparison. For example, when you compare flat file sources, you can compare attributes, such as file type (delimited or fixed), delimiters, escape characters, and optional quotes.

You can compare schedulers and session configuration objects in the Repository Manager. You cannot compare objects of different types. For example, you cannot compare an Email task with a Session task.

When you compare objects, the Workflow Manager displays the results in the Diff Tool window. The Diff Tool output contains different nodes for different types of objects.
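The idea behind a comparison can be sketched as a difference between two sets of attributes. The following Python example is a conceptual illustration, not the Diff Tool. The function name, the dictionary layout, and the sample Email task attributes are all hypothetical.

```python
def compare_objects(left, right):
    """Return the attributes that differ between two objects of the same type."""
    if left["type"] != right["type"]:
        raise ValueError("cannot compare objects of different types")
    left_attrs, right_attrs = left["attributes"], right["attributes"]
    differences = {}
    for key in sorted(set(left_attrs) | set(right_attrs)):
        if left_attrs.get(key) != right_attrs.get(key):
            # Store both values so the difference can be reviewed side by side.
            differences[key] = (left_attrs.get(key), right_attrs.get(key))
    return differences


email_a = {"type": "Email", "attributes": {"subject": "Load done", "user": "ops"}}
email_b = {"type": "Email", "attributes": {"subject": "Load failed", "user": "ops"}}
print(compare_objects(email_a, email_b))
# {'subject': ('Load done', 'Load failed')}
```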

When you import Workflow Manager objects, you can compare object conflicts. For more information, see “Exporting and Importing Objects” in the Repository Guide.


Steps for Comparing Objects

Use the following procedure to compare objects.

To compare two objects:

1. Open the folders that contain the objects you want to compare.

2. Open the appropriate Workflow Manager tool.

3. Choose Tasks-Compare, Worklets-Compare, or Workflow-Compare.

A dialog box similar to the following one opens:

4. Click Browse to select an object.

5. Click Compare.

Tip: You can also compare objects from the Navigator or workspace. In the Navigator, select the objects, right-click and choose Compare Objects. In the workspace, select the objects, right-click and choose Compare Objects.


Figure 3-5 shows the result of comparing two objects:

You can further compare differences between object properties by clicking the Compare Further icon or by right-clicking the differences.

6. If you want to save the comparison as a text or HTML file, choose File-Save to File.

Figure 3-5. Diff Tool Window (callouts: Drill down to further compare objects. Filter nodes that have the same attribute values. Displays the properties of the node you select. Differences between object properties are marked. Differences between objects are highlighted and the nodes are flagged.)

Working with Metadata Extensions

You can extend the metadata stored in the repository by associating information with individual repository objects. For example, you may wish to store your name with the worklets you create. If you create a session, you can store your telephone extension with that session. You associate information with repository objects using metadata extensions.

Repository objects can contain both vendor-defined and user-defined metadata extensions. You can view and change the values of vendor-defined metadata extensions, but you cannot create, delete, or redefine them. You can create, edit, delete, and view user-defined metadata extensions, as well as change their values.

You can create metadata extensions for the following objects in the Workflow Manager:

♦ Sessions

♦ Workflows

♦ Worklets

You can create both reusable and non-reusable metadata extensions. You associate reusable metadata extensions with all repository objects of a certain type such as all sessions or all worklets. You associate non-reusable metadata extensions with a single repository object such as one workflow. For more information about metadata extensions, see “Metadata Extensions” in the Repository Guide.

To create, edit, and delete user-defined metadata extensions in the Workflow Manager, you must have read and write permissions on the folder.

Creating a Metadata Extension

You can create user-defined, reusable and non-reusable metadata extensions for repository objects using the Workflow Manager. To create a metadata extension, you edit the object for which you want to create the metadata extension, and then add the metadata extension to the Metadata Extensions tab.

If you need to create multiple reusable metadata extensions, it is easier to create them using the Repository Manager. For details, see “Metadata Extensions” in the Repository Guide.

To create a metadata extension:

1. Open the appropriate Workflow Manager tool.

2. Drag the appropriate object into the workspace.

3. Double-click the title bar of the object to edit it.


4. Click the Metadata Extensions tab:

This tab lists the existing user-defined and vendor-defined metadata extensions. User-defined metadata extensions appear in the User Defined Metadata Domain. If they exist, vendor-defined metadata extensions appear in their own domains.

5. Click the Add button.

A new row appears in the User Defined Metadata Extension Domain.

6. Enter the information in Table 3-1:

Table 3-1. Metadata Extension Attributes in the Workflow Manager

| Field | Required/Optional | Description |
|---|---|---|
| Extension Name | Required | Name of the metadata extension. Metadata extension names must be unique for each type of object in a domain. Metadata extension names cannot contain any special characters except underscores and cannot begin with numbers. |
| Datatype | Required | The datatype: numeric (integer), string, or boolean. |
| Precision | Required for string objects | The maximum length for string metadata extensions. |
| Value | Optional | An optional value. For a numeric metadata extension, the value must be an integer between -2,147,483,647 and 2,147,483,647. For a boolean metadata extension, choose true or false. For a string metadata extension, click the Open button in the Value field to enter a value of more than one line, up to 2,147,483,647 bytes. |
| Reusable | Required | Makes the metadata extension reusable or non-reusable. Check to apply the metadata extension to all objects of this type (reusable). Clear to make the metadata extension apply to this object only (non-reusable). Note: If you make a metadata extension reusable, you cannot change it back to non-reusable. The Workflow Manager makes the extension reusable as soon as you confirm the action. |
| UnOverride | Optional | Restores the default value of the metadata extension when you click Revert. This column appears only if the value of one of the metadata extensions was changed. |
| Description | Optional | Description of the metadata extension. |

7. Click OK.

Editing a Metadata Extension

You can edit user-defined, reusable, and non-reusable metadata extensions for repository objects using the Workflow Manager. To edit a metadata extension, you edit the repository object, and then make changes to the Metadata Extensions tab.

What you can edit depends on whether the metadata extension is reusable or non-reusable. You can promote a non-reusable metadata extension to reusable, but you cannot change a reusable metadata extension to non-reusable.

Editing Reusable Metadata Extensions

If the metadata extension you want to edit is reusable and editable, you can change the value of the metadata extension, but not any of its properties. However, if the vendor or user who created the metadata extension did not make it editable, you cannot edit the metadata extension or its value. For details, see “Metadata Extensions” in the Repository Guide.

To edit the value of a reusable metadata extension, click the Metadata Extensions tab and modify the Value field. To restore the default value for a metadata extension, click Revert in the UnOverride column.


Editing Non-Reusable Metadata Extensions

If the metadata extension you want to edit is non-reusable, you can change the value of the metadata extension as well as its properties. You can also promote the metadata extension to a reusable metadata extension.

To edit a non-reusable metadata extension, click the Metadata Extensions tab. You can update the Datatype, Value, Precision, and Description fields. For a description of these fields, see Table 3-1 on page 83.

If you wish to make the metadata extension reusable, check Reusable. If you make a metadata extension reusable, you cannot change it back to non-reusable. The Workflow Manager makes the extension reusable as soon as you confirm the action.

To restore the default value for a metadata extension, click Revert in the UnOverride column.

Deleting a Metadata Extension

You can delete metadata extensions for repository objects. You delete reusable metadata extensions using the Repository Manager. You can delete non-reusable metadata extensions using the Workflow Manager. To do this, edit the repository object, and then delete the metadata extension from the Metadata Extensions tab.
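The naming and value rules listed in Table 3-1 can be checked before you type values into the Metadata Extensions tab. The following Python sketch is only an illustration of those rules and is not part of PowerCenter. The function names are hypothetical, and it assumes that "special characters" means anything other than ASCII letters, digits, and underscores, and that Precision counts characters.

```python
import re
from typing import Optional

# Integer range quoted in Table 3-1 for numeric metadata extensions.
INT_MIN, INT_MAX = -2_147_483_647, 2_147_483_647

# Assumption: "no special characters except underscores" means ASCII letters,
# digits, and underscores only, with a non-digit first character.
_NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_extension_name(name: str) -> bool:
    """Check the Extension Name rules: no special characters, no leading digit."""
    return _NAME_PATTERN.fullmatch(name) is not None


def is_valid_extension_value(datatype: str, value: object,
                             precision: Optional[int] = None) -> bool:
    """Check a value against the Datatype, Precision, and Value rules."""
    if value is None:
        # The Value field is optional.
        return True
    if datatype == "numeric":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return (isinstance(value, int) and not isinstance(value, bool)
                and INT_MIN <= value <= INT_MAX)
    if datatype == "boolean":
        return isinstance(value, bool)
    if datatype == "string":
        # Assumption: precision is compared against the character count.
        return (isinstance(value, str) and precision is not None
                and len(value) <= precision)
    return False
```

For example, a name such as owner_phone_ext passes the name check, while 2nd_owner (leading digit) and owner-phone (hyphen) do not.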


Keyboard Shortcuts

When editing a repository object or maneuvering around the Workflow Manager, use the following keyboard shortcuts to complete operations quickly.

Table 3-2 lists the Workflow Manager keyboard shortcuts for editing a repository object:

Table 3-2. Workflow Manager Keyboard Shortcuts

| To | Press |
|---|---|
| Cancel editing in a cell. | Esc |
| Check and uncheck a check box. | Space Bar |
| Copy text from a cell onto the clipboard. | Ctrl+C |
| Cut text from a cell onto the clipboard. | Ctrl+X |
| Edit the text of a cell. | F2. Then move the cursor to the desired location. |
| Find all combination and list boxes. | Type the first letter on the list. |
| Find tables or fields in the workspace. | Ctrl+F |
| Move around cells in a dialog box. | Ctrl+directional arrows |
| Paste copied or cut text from the clipboard into a cell. | Ctrl+V |
| Select the text of a cell. | F2 |

Table 3-3 lists the Workflow Manager keyboard shortcuts for navigating in the workspace:

Table 3-3. Keyboard Shortcuts for Navigating the Workspace

| To | Press |
|---|---|
| Create links. | Ctrl+F2. Press Ctrl+F2 to select the first task you want to link. Press Tab to select the rest of the tasks you want to link. Press Ctrl+F2 again to link all the tasks you selected. |
| Edit task name in the workspace. | F2 |
| Expand selected node and all its children. | SHIFT + * (use the asterisk on the numeric keypad) |
| Move across Select tasks in the workspace. | Tab |
| Select multiple tasks. | Ctrl+mouse click |


Chapter 4

Working with Workflows


♦ Overview, 88

♦ Developing Workflows, 91

♦ Using the Workflow Wizard, 99

♦ Using Workflow Variables, 103

♦ Scheduling a Workflow, 112

♦ Validating a Workflow, 119

♦ Running the Workflow, 122

♦ Suspending the Workflow, 127

♦ Stopping or Aborting the Workflow, 129


Overview

A workflow is a set of instructions that tells the PowerCenter Server how to execute tasks such as sessions, email notifications, and shell commands. After you create tasks in the Task Developer and Workflow Designer, you connect the tasks with links to create a workflow.

In the Workflow Designer, you can specify conditional links and use workflow variables to create branches in the workflow. The Workflow Manager also provides Event-Wait and Event-Raise tasks so you can control the sequence of task execution in the workflow. You can also create worklets and nest them inside the workflow.

Every workflow contains a Start task, which represents the beginning of the workflow.

Figure 4-1 shows a sample workflow:

You can create workflows with branches to execute tasks concurrently.

Figure 4-1. Sample Workflow (callouts: Start Task, Link, Session Task, Workflow Tasks, Assignment Task, Command Task)

Figure 4-2 shows a sample workflow with two branches:

After you create a workflow, select a PowerCenter Server to run the workflow. You can then start the workflow using the Workflow Manager, Workflow Monitor, or pmcmd.

Use the Workflow Monitor to see the progress of a workflow during its run. The Workflow Monitor can also show the history of a workflow. For more information about the Workflow Monitor, see “Monitoring Workflows” on page 401.

Use the following guidelines when you develop a workflow:

1. Create a new workflow. Create a new workflow in the Workflow Designer. For details on creating a new workflow, see “Creating a New Workflow” on page 91.

2. Add tasks in the workflow. You might have already created tasks in the Task Developer. Or, you can add tasks to the workflow as you develop the workflow in the Workflow Designer. For details on workflow tasks, see “Working with Tasks” on page 131.

3. Connect tasks with links. After you add tasks in the workflow, connect them with links to specify the order of execution in the workflow. For details on links, see “Working with Links” on page 92.

4. Specify conditions for each link. You can specify conditions on the links to create branches and dependencies. For details, see “Working with Links” on page 92.

5. Validate workflow. Validate the workflow in the Workflow Designer to identify errors. For details on validation rules, see “Validating a Workflow” on page 119.

6. Save workflow. When you save the workflow, the Workflow Manager validates the workflow and updates the repository.

7. Run workflow. In the workflow properties, select a PowerCenter Server to run the workflow. Run the workflow from the Workflow Manager, Workflow Monitor, or pmcmd. You can monitor the workflow in the Workflow Monitor. For details on starting a workflow, see “Running the Workflow” on page 122.

For a complete list of workflow properties, see “Workflow Properties Reference” on page 721.

Figure 4-2. Sample Workflow With Two Branches


Workflow Privileges

You need one of the following privileges to create a workflow:

♦ Use Workflow Manager privilege with read and write folder permissions


You need one of the following privileges to run, schedule, and monitor the workflow:

♦ Workflow Operator privilege



Developing Workflows

The first step to develop a workflow is to create a new workflow in the Workflow Designer. A workflow must contain a Start task. The Start task represents the beginning of a workflow. When you create a workflow, the Workflow Designer creates a Start task and adds it to the workflow. You cannot delete the Start task.

After you create a new workflow, the next step is to add tasks to the workflow. The Workflow Manager includes tasks such as the Session task, the Command task, and the Email task so you can design your workflow.

Finally, you connect workflow tasks with links to specify the order of execution in the workflow. You can add conditions to links.

Creating a New Workflow

You must create a workflow before you can add tasks such as a Session, Command, or Email. When adding a session, if the workspace in the Workflow Designer is empty, you can create a workflow automatically.

To create a workflow manually:

1. Open the Workflow Designer.

2. Choose Workflows-Create.

3. Enter a name for the new workflow.

4. Click OK.

The Workflow Designer creates a Start task in the new workflow.

For information on using the Workflow Wizard, see “Using the Workflow Wizard” on page 99.


To create a workflow automatically:

1. Open the Workflow Designer. Close any open workflow.

2. Click the session button on the Tasks toolbar.

3. Click in the Workflow Designer workspace.

The Mappings dialog box displays.

4. Select a mapping to associate with the session and click OK.

The Create Workflow dialog box appears. The Workflow Designer names the workflow wf_MappingName by default. You can rename the workflow or change other workflow properties. For more information on workflow properties, see “Workflow Properties Reference” on page 721.

5. Click OK.

The Workflow Designer creates a workflow for the session.

Adding Tasks to Workflows

After you create a new workflow, you add tasks you want to execute in the workflow. You may already have created tasks in the Task Developer. Or, you may want to create tasks in the Workflow Designer as you develop the workflow.

If you have already created tasks in the Task Developer, add them to the workflow by dragging the tasks from the Navigator window to the Workflow Designer workspace.

To create and add tasks as you develop the workflow, choose Tasks-Create in the Workflow Designer. You can also use the Tasks toolbar: click the button for the task you want to create, and then click in the Workflow Designer workspace to create and add the task.

Tasks you create in the Workflow Designer are non-reusable. Tasks you create in the Task Developer are reusable. For more information about reusable tasks, see “Reusable Workflow Tasks” on page 135.

Working with Links

Use links to connect each workflow task. You can specify conditions with links to create branches in the workflow. The Workflow Manager does not allow you to use links to create loops in the workflow. Each link in the workflow can execute only once.

The workflow in Figure 4-3 is not a loop because each task runs at most once.


Figure 4-3 shows a valid workflow:

The Workflow Manager does not allow you to create a workflow that contains a loop, such as the loop shown in Figure 4-4. Figure 4-4 shows a loop where the three sessions may be run multiple times:
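The no-loop rule amounts to requiring that the tasks and links form a directed graph without cycles. The following Python sketch shows one common way to test for that property with a depth-first search. It is an illustration only and does not describe how the Workflow Manager performs its own check. The function name and the task names in the usage note are hypothetical.

```python
from typing import Dict, List


def has_loop(links: Dict[str, List[str]]) -> bool:
    """Return True if following links from some task can lead back to it.

    `links` maps each task name to the tasks its outgoing links point to.
    """
    not_visited, in_progress, done = 0, 1, 2
    state: Dict[str, int] = {}

    def visit(task: str) -> bool:
        state[task] = in_progress
        for next_task in links.get(task, []):
            next_state = state.get(next_task, not_visited)
            if next_state == in_progress:
                # Reached a task that is still on the current path: a loop.
                return True
            if next_state == not_visited and visit(next_task):
                return True
        state[task] = done
        return False

    return any(
        state.get(task, not_visited) == not_visited and visit(task)
        for task in list(links)
    )
```

A branching workflow such as Start to s_1, s_1 to both s_2 and s_3, and both of those to s_4 has no loop, because each task is reached without returning to an earlier one. Adding a link from s_3 back to s_1 would make `has_loop` return True.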

Figure 4-3. Valid Workflow

Figure 4-4. Example of a Loop

Use the following procedure to link tasks in the Workflow Designer or the Worklet Designer.

To link two tasks:

1. In the Tasks toolbar, click the link button.

2. In the workspace, click the first task you want to connect and drag it to the second task.

3. A link appears between the two tasks.

If you have a number of tasks that you want to link concurrently, you may not wish to connect each link manually. To quickly link tasks concurrently, use the following procedure.

To link several tasks concurrently:

1. In the workspace, click the first task you want to connect.

2. Ctrl-click all other tasks you want to connect.

Note: Do not use Ctrl+A or Edit-Select to choose tasks.

3. Choose Tasks-Link concurrent.

4. A link appears between the first task you selected and each task you added. The first task you selected links to each task concurrently.

If you have a number of tasks that you want to link sequentially, you may not wish to connect each link manually. To quickly link tasks sequentially, use the following procedure.

To link several tasks sequentially:

1. In the workspace, click the first task you want to connect.

2. Ctrl-click the next task you want to connect. Continue to add tasks in the order you want them to run.

3. Choose Tasks-Link sequential.

4. Links appear in sequential order between the first task and each subsequent task you added.
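The difference between the two linking commands is easiest to see as the set of links each one produces from the same selection. The following Python sketch models that outcome under the reading that concurrent linking fans out from the first selected task and sequential linking chains the tasks in selection order. The function names and task names are hypothetical, and the sketch is not Workflow Manager code.

```python
from typing import List, Sequence, Tuple

Link = Tuple[str, str]  # (from_task, to_task)


def link_concurrent(selected: Sequence[str]) -> List[Link]:
    """Link the first selected task to every other selected task."""
    if not selected:
        return []
    first, rest = selected[0], selected[1:]
    return [(first, task) for task in rest]


def link_sequential(selected: Sequence[str]) -> List[Link]:
    """Link each selected task to the one selected after it."""
    return list(zip(selected, selected[1:]))


selection = ["Start", "s_A", "s_B", "s_C"]
concurrent_links = link_concurrent(selection)
sequential_links = link_sequential(selection)
```

With this selection, the concurrent links all start at Start, so s_A, s_B, and s_C can run side by side, while the sequential links form the chain Start, s_A, s_B, s_C.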

Specifying Link Conditions

Once you create links between tasks, you can specify conditions for each link to determine the order of execution in the workflow. If you do not specify conditions for each link, the PowerCenter Server executes the next task in the workflow by default.

You can use pre-defined or user-defined workflow variables in the link condition. If the link condition evaluates to True, the PowerCenter Server executes the next task in the workflow. If the link condition evaluates to False, the PowerCenter Server does not execute the next task in the workflow.

You can view results of link evaluation during workflow runs in the workflow log file.

Example of Link Conditions

You can use link conditions to specify the order of execution in the workflow or to create branches in the workflow. For example, you may have two Session tasks in the workflow, s_STORES_CA and s_STORES_AZ. You want the PowerCenter Server to run the second Session task only if the first Session task has no target failed rows.

To accomplish this, you can set the link condition between the two sessions so that s_STORES_AZ executes only if the number of failed target rows for s_STORES_CA is zero.
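The rule described above can be modeled as a small predicate. The following Python sketch is an illustration of the behavior, not PowerCenter code: a link with no condition always lets the next task run, and a link with a condition lets it run only when the condition is true. The variable name TgtFailedRows and the expression shown in the comment are assumptions, since this section does not spell out the variable name, and the function names are hypothetical.

```python
from typing import Callable, Dict, Optional

# Values recorded per task, for example {"s_STORES_CA": {"TgtFailedRows": 0}}.
TaskVars = Dict[str, Dict[str, object]]
Condition = Callable[[TaskVars], bool]


def should_run_next(condition: Optional[Condition], task_vars: TaskVars) -> bool:
    """Decide whether the task at the end of a link runs."""
    if condition is None:
        # No link condition: the next task runs by default.
        return True
    return bool(condition(task_vars))


def no_failed_target_rows(task_vars: TaskVars) -> bool:
    # Python stand-in for a link condition along the lines of
    #   $s_STORES_CA.TgtFailedRows = 0
    # (variable name assumed for illustration).
    return task_vars["s_STORES_CA"]["TgtFailedRows"] == 0


clean_run = {"s_STORES_CA": {"TgtFailedRows": 0}}
failed_run = {"s_STORES_CA": {"TgtFailedRows": 3}}
run_after_clean = should_run_next(no_failed_target_rows, clean_run)
run_after_failed = should_run_next(no_failed_target_rows, failed_run)
```

In this model s_STORES_AZ would run after the clean run and would be skipped after the run with three failed target rows.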


Figure 4-5 shows how to set the link condition using the target failed rows variable for s_STORES_CA:

After you specify the link condition in the Expression Editor, the Workflow Manager validates the link condition and displays it next to the link in the workflow.

Figure 4-6 shows the link condition displayed in the workspace:

Figure 4-5. Setting Link Condition

Figure 4-6. Displaying Link Condition in the Workflow

To specify a condition for a link:

1. In the Workflow Designer workspace, double-click the link you want to specify, or right-click the link and choose Edit.

The Expression Editor appears.


2. In the Expression Editor, enter the link condition.

The Expression Editor provides pre-defined workflow variables, user-defined workflow variables, variable functions, and boolean and arithmetic operators.

3. Validate the expression using the Validate button. The Workflow Manager displays error messages in the Output window.

Tip: Click and drag the end point of a link to move it from one task to another without losing the link condition.

Using the Expression Editor

The Workflow Manager provides an Expression Editor for any expressions in the workflow. You can enter expressions using the Expression Editor for the following:

♦ Link conditions

♦ Decision task

♦ Assignment task

Figure 4-7 shows the Expression Editor:

The Expression Editor displays system variables, user-defined workflow variables, and pre-defined workflow variables such as $Session.status. For details on workflow variables, see “Using Workflow Variables” on page 103.

The Expression Editor also displays a list of functions. PowerCenter uses a SQL-like language that contains many functions designed to handle common expressions. For example, you can use the ABS function to find the absolute value. For a complete list of functions, see the Transformation Language Reference.

Figure 4-7. Expression Editor


Adding Comments

The Expression Editor also allows you to add comments using -- or // comment indicators. You can use comments to give descriptive information about the expression, or you can specify a valid URL to access business documentation about the expression.

For examples on adding comments to expressions, see “The Transformation Language” in the Transformation Language Reference.

Validating Expressions

You can use the Validate button to validate an expression. If you do not validate an expression, the Workflow Manager validates it when you close the Expression Editor. You cannot run a workflow with invalid expressions.

Expressions in link conditions and Decision task conditions must evaluate to a numerical value. Workflow variables used in expressions must exist in the workflow.

Expression Editor Display

The Expression Editor can display syntax expressions in different colors for better readability. If you have the latest Rich Edit control, riched20.dll, installed on your system, the Expression Editor displays expression functions in blue, comments in grey, and quoted strings in green.

You can resize the Expression Editor. Expand the dialog box by dragging from the borders. The Workflow Manager saves the new size for the dialog box as a client setting.

Deleting a Workflow

You may decide to delete a workflow that you no longer use. When you delete a workflow, you delete all non-reusable tasks and reusable task instances associated with the workflow. Reusable tasks used in the workflow remain in the folder when you delete the workflow.

If you delete a workflow that is running, the PowerCenter Server aborts the workflow. If you delete a workflow that is scheduled to run, the PowerCenter Server removes the workflow from the schedule.

You can delete a workflow in the Navigator window, or you can delete the workflow currently displayed in the Workflow Designer workspace.

♦ To delete a workflow from the Navigator window, open the folder, select the workflow and press the Delete key.

♦ To delete a workflow currently displayed in the Workflow Designer workspace, choose Workflows-Delete.


Editing a Workflow

When you edit a workflow, the repository updates the workflow information when you save the workflow. If a workflow is running when you make edits, the PowerCenter Server uses the updated information the next time you run the workflow.

Viewing Links in Workflow or Worklet

When you edit a workflow or worklet, you can view the forward or backward link paths to other tasks. You can highlight paths to see links in the workflow branch from the Start task to the last task in the branch.

Note: You can configure the color the Workflow Manager uses to display links. When you configure the format options, choose the Link Selection option.

To view link paths:

1. In the Worklet Designer or Workflow Designer, right-click a task and choose Highlight Path.

2. Choose Forward Path, Backward Path, or Both.

The Workflow Manager highlights all links in the branch you select.

Deleting Links in a Workflow or Worklet

When you edit a workflow or worklet, you can delete multiple links at once without deleting the connected tasks.

To delete multiple links:

1. In the Worklet Designer or Workflow Designer, select all links you want to delete.

Tip: You can use the mouse to click and drag the selection, or you can Ctrl-click the tasks and links.

2. Choose Edit-Delete Links.

The Workflow Manager removes all selected links.


Using the Workflow Wizard

You can use the Workflow Wizard to automate the process of creating sessions, adding sessions to a workflow, and linking sessions to create a workflow. The Workflow Wizard creates sessions from mappings and adds them to the workflow. It also creates a Start task and allows you to schedule the workflow. You can add tasks and edit other workflow properties after the Workflow Wizard completes. If you want to create concurrent sessions, use the Workflow Designer to manually build a workflow.

Before you create a workflow, verify that the folder contains a valid mapping for the Session task.

Complete the following steps to build a workflow using the Workflow Wizard:

1. Assign a name and PowerCenter Server to the workflow.

2. Create a session.

3. Schedule the workflow.

Step 1. Assign a Name and PowerCenter Server to the Workflow

In the first step of the Workflow Wizard, you add the name and description of the workflow and choose the PowerCenter Server to run the workflow.

To create the workflow:

1. In the Workflow Manager, open the folder containing the mapping you want to use in the workflow.

2. Open the Workflow Designer.

3. Choose Workflows-Wizard.


The Workflow Wizard appears.

4. Enter a name for the workflow.

The convention for naming workflows is wf_WorkflowName. For a complete list of naming conventions for repository objects, see “Naming Conventions” in Getting Started.

5. Enter a description for the workflow.

6. Choose the PowerCenter Server to run the workflow, and click Next.

The next step is to create a session.

Step 2. Create a Session

In the second step of the Workflow Wizard, you create a session based on a mapping. You can add tasks later in the Workflow Designer workspace. For details on working with tasks, see “Working with Tasks” on page 131.

To create a session:

1. In the second step of the Workflow Wizard, select a valid mapping and click the right arrow button.

The Workflow Wizard creates a Session task in the right pane using the selected mapping and names it s_MappingName by default.


The following figure shows a mapping selected for a session:

2. You can select additional mappings to create more Session tasks in the workflow.

When you add multiple mappings to the list, the Workflow Wizard creates sequential sessions in the order you add them.

3. Use the arrow buttons to change the session order.

4. Specify whether the session should be reusable.

When you create a reusable session, you can use the session in other workflows. For details on reusable sessions, see “Working with Tasks” on page 131.

5. Specify how you want the PowerCenter Server to run the workflow.

You can specify that the PowerCenter Server runs sessions only if previous sessions complete, or you can specify that the PowerCenter Server always runs each session. When you select this option, it applies to all sessions you create using the Workflow Wizard.

Step 3. Schedule a Workflow

In the third step of the Workflow Wizard, you can schedule a workflow to run continuously, repeat at a given time or interval, or start manually. The PowerCenter Server runs a workflow unless the prior workflow run fails. When a workflow fails, the PowerCenter Server removes the workflow from the schedule, and you must reschedule it. You can do this in the Workflow Manager or using pmcmd.


To schedule a workflow:

1. In the third step of the Workflow Wizard, configure the scheduling and run options. For more information about scheduling a workflow, see “Scheduling a Workflow” on page 112.

2. Click Next.

The Workflow Wizard displays the settings for the workflow:

3. Verify the workflow settings and click Finish. To edit settings, click Back.

The completed workflow opens in the Workflow Designer workspace. From the workspace, you can add tasks, create concurrent sessions, add conditions to links, or modify properties.

4. When you finish modifying the workflow, choose Repository-Save.


Using Workflow Variables

You can create and use variables in a workflow to reference values and record information. For example, you can use a variable in a Decision task to determine whether the previous task ran properly. If it did, you can run the next task. If not, you can stop the workflow.

You can use the following types of workflow variables:

♦ Pre-defined workflow variables. The Workflow Manager provides pre-defined workflow variables for tasks within a workflow. For more information, see “Pre-Defined Workflow Variables” on page 105.

♦ User-defined workflow variables. You create user-defined workflow variables when you create a workflow. For more information, see “User-Defined Workflow Variables” on page 108.

You can use workflow variables when you configure the following types of tasks:

♦ Assignment tasks. You can use an Assignment task to assign a value to a user-defined workflow variable. For example, you can increment a user-defined counter variable by setting the variable to its current value plus 1. For information on using workflow variables in Assignment tasks, see “Working with the Assignment Task” on page 140.

♦ Decision tasks. Decision tasks determine how the PowerCenter Server executes a workflow. For example, you can use the Status variable to run a second session only if the first session completes successfully. For information on using workflow variables in Decision tasks, see “Working with the Decision Task” on page 149.

♦ Links. Links connect each workflow task. You can use workflow variables in links to create branches in the workflow. For example, after a Decision task, you can create one link to follow when the decision condition evaluates to true, and another link to follow when the decision condition evaluates to false. For information on using workflow variables in links, see “Working with Links” on page 92.

♦ Timer tasks. Timer tasks specify when the PowerCenter Server begins to execute the next task in the workflow. You can use a user-defined date/time variable to specify the exact time the PowerCenter Server starts to execute the next task. For information on using workflow variables in Timer tasks, see “Working with the Timer Task” on page 161.

You can use the Expression Editor to create an expression that uses variables.


Figure 4-8 shows the Expression Editor:

When you build an expression, you can select pre-defined variables on the Pre-Defined tab. You can select user-defined variables on the User-Defined tab. The Functions tab contains functions that you can use with workflow variables.

Use the point-and-click method to enter an expression using a variable. For information on using the Expression Editor, see “Using the Expression Editor” on page 96.

You can use the following keywords to write expressions for user-defined and pre-defined workflow variables:

♦ AND

♦ OR

♦ NOT

♦ TRUE

♦ FALSE

♦ NULL

♦ SYSDATE

Figure 4-8. Expression Editor (callouts: Select pre-defined variables. Select user-defined variables. Create an expression using variables.)


Pre-Defined Workflow Variables

Each workflow contains a set of pre-defined variables that you can use to evaluate workflow and task conditions. You can use the following types of pre-defined variables:

♦ Task-specific variables. The Workflow Manager provides a set of task-specific variables for each task in the workflow. You can use task-specific variables in a link condition to control the path the PowerCenter Server takes when running the workflow. The Workflow Manager lists task-specific variables under the task name in the Expression Editor.

♦ System variables. You can use the SYSDATE and WORKFLOWSTARTTIME system variables within a workflow. For more information on system variables, see “Variables” in the Transformation Language Reference. The Workflow Manager lists system variables under the Built-in node in the Expression Editor.

Table 4-1 lists the task-specific workflow variables available in the Workflow Manager:

Table 4-1. Task-Specific Workflow Variables

| Task-Specific Variable | Description | Task Types | Datatype |
| --- | --- | --- | --- |
| Condition | Evaluation result of the decision condition expression. If the task fails, the Workflow Manager keeps the condition set to null. | Decision | Integer |
| EndTime | Date and time the associated task ended. | All tasks | Date/time |
| ErrorCode | Last error code for the associated task. If there is no error, the PowerCenter Server sets ErrorCode to 0 when the task completes. | All tasks | Integer |
| ErrorMsg | Last error message for the associated task. If there is no error, the PowerCenter Server sets ErrorMsg to an empty string when the task completes. | All tasks | Nstring* |
| FirstErrorCode | Error code for the first error message in the session. If there is no error, the PowerCenter Server sets FirstErrorCode to 0 when the session completes. | Session | Integer |
| FirstErrorMsg | The first error message in the session. If there is no error, the PowerCenter Server sets FirstErrorMsg to an empty string when the task completes. | Session | Nstring* |
| PrevTaskStatus | Status of the previous task in the workflow that the PowerCenter Server ran. Statuses include ABORTED, FAILED, STOPPED, and SUCCEEDED. Use these keywords when writing expressions to evaluate the status of the previous task. For more information, see “Evaluating Task Status in a Workflow” on page 107. | All tasks | Integer |
| SrcFailedRows | Total number of rows the PowerCenter Server failed to read from the source. | Session | Integer |
| SrcSuccessRows | Total number of rows successfully read from the sources. | Session | Integer |


All pre-defined workflow variables except Status have a default value of null. The PowerCenter Server uses the default value of null when it encounters a pre-defined variable from a task that has not yet run in the workflow. Therefore, expressions and link conditions that depend upon tasks not yet run are valid. The default value of Status is NOTSTARTED.

Using Pre-Defined Workflow Variables in Expressions

When you use a workflow variable in an expression, the PowerCenter Server evaluates the expression and returns True or False. If the condition evaluates to true, the PowerCenter Server runs the next task. The PowerCenter Server writes an entry in the workflow log similar to the following message:

INFO : LM_36506 : (1980|1040) Link [Session2 --> Session3]: condition is TRUE for the expression [$Session2.PrevTaskStatus = SUCCEEDED].

The Expression Editor displays the pre-defined workflow variables on the Pre-defined tab. The Workflow Manager groups task-specific variables by task and lists system variables under the Built-in node. To use a variable in an expression, double-click the variable. The Expression Editor displays task-specific variables in the Expression field in the following format:

$<TaskName>.<Pre-definedVariable>

Table 4-1. Task-Specific Workflow Variables (continued)

| Task-Specific Variable | Description | Task Types | Datatype |
| --- | --- | --- | --- |
| StartTime | Date and time the associated task started. | All tasks | Date/time |
| Status | Status of the previous task in the workflow. Task statuses include ABORTED, DISABLED, FAILED, NOTSTARTED, STARTED, STOPPED, and SUCCEEDED. Use these keywords when writing expressions to evaluate the status of the current task. For more information, see “Evaluating Task Status in a Workflow” on page 107. | All tasks | Integer |
| TgtFailedRows | Total number of rows the PowerCenter Server failed to write to the target. | Session | Integer |
| TgtSuccessRows | Total number of rows successfully written to the targets. | Session | Integer |
| TotalTransErrors | Total number of transformation errors. | Session | Integer |

* Variables of type Nstring can have a maximum length of 600 characters.


Figure 4-9 shows the Expression Editor with an expression using a task-specific workflow variable and keyword:

Evaluating Task Status in a Workflow

You can use Status and PrevTaskStatus in link conditions to test the status of tasks in a workflow. Use Status to test the status of the previous task in the workflow. Use PrevTaskStatus to test the status of the previous task in the workflow that the PowerCenter Server ran.

Use PrevTaskStatus if you disable a task in the workflow. Status and PrevTaskStatus return the same value unless the condition uses a disabled task.

Figure 4-10 shows a workflow with link conditions using Status:

When you run the workflow, the PowerCenter Server evaluates the link condition and returns the value based on the status of Session2.

Figure 4-9. Expression Using a Pre-Defined Workflow Variable

Figure 4-10. Status Variable Example

Callouts: Link condition: $Session2.Status = SUCCEEDED. The PowerCenter Server returns the value based on the previous task in the workflow, Session2. Previous Task in Workflow (Session2).


Figure 4-11 shows a workflow with link conditions using PrevTaskStatus:

When you run the workflow, the PowerCenter Server skips Session2 because the session is disabled. When the PowerCenter Server evaluates the link condition, it returns the value based on the status of Session1.

Tip: If you do not disable Session2, the PowerCenter Server returns the value based on the status of Session2. You do not need to change the link condition when you enable and disable Session2.
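The difference between Status and PrevTaskStatus can be sketched in a few lines of Python. This is an illustrative model only, not PowerCenter code. The Task class and both helper functions are hypothetical names, and the sketch assumes that a disabled task reports a Status of DISABLED and that tasks run in a simple linear order.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    name: str
    disabled: bool = False
    outcome: str = "SUCCEEDED"  # status the task ends with if the server runs it


def status(task: Task) -> str:
    """Model of $Task.Status: the status of the named task itself.

    Assumption: a disabled task reports DISABLED.
    """
    return "DISABLED" if task.disabled else task.outcome


def prev_task_status(tasks: List[Task], index: int) -> Optional[str]:
    """Model of $Task.PrevTaskStatus for tasks[index].

    Walks backward from the named task and returns the status of the
    most recent task that the server actually ran.
    """
    for task in reversed(tasks[: index + 1]):
        if not task.disabled:
            return task.outcome
    return None  # nothing has run yet; the guide describes this default as null


# Session2 is disabled, as in Figure 4-11.
workflow = [Task("Session1"), Task("Session2", disabled=True), Task("Session3")]

# Link Session2 --> Session3
print(status(workflow[1]))            # DISABLED
print(prev_task_status(workflow, 1))  # SUCCEEDED
```

When Session2 is enabled, both functions return the same value, which matches the behavior the Tip describes.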

User-Defined Workflow Variables

You can create your own variables within a workflow. When you create a variable in a workflow, it is valid only in that workflow. You can use the variable in tasks within that workflow. You can edit and delete user-defined workflow variables.

You can use user-defined variables when you need to make a workflow decision based on criteria you specify. For example, suppose you create a workflow to load data to an orders database nightly. You also need to load a subset of this data to headquarters periodically, perhaps every tenth time you update the local orders database. You create separate sessions to update the local database and the one at headquarters. The workflow looks like Figure 4-12:

Figure 4-11. PrevTaskStatus Variable Example

Callouts: Disabled Task (Session2). Previous Task Run (Session1). Link condition: $Session2.PrevTaskStatus = SUCCEEDED. The PowerCenter Server returns the value based on the previous task run, Session1.

Figure 4-12. Sample Workflow Using Workflow Variable


You can use a user-defined variable to determine when to run the session that updates the orders database at headquarters.

To do this, set up the workflow as follows:

1. Create a persistent workflow variable, $$WorkflowCount, to represent the number of times the workflow has run.

2. Add a Start task and both sessions to the workflow.

3. Place a Decision task after the session that updates the local orders database.

Set up the decision condition to check to see if the number of workflow runs is evenly divisible by 10. You can use the modulus (MOD) function to do this.

4. Create an Assignment task to increment the $$WorkflowCount variable by one.

5. Link the Decision task to the session that updates the database at headquarters when the decision condition evaluates to true. Link it to the Assignment task when the decision condition evaluates to false.

When you do this, the session that updates the local database runs every time the workflow runs. The session that updates the database at headquarters runs every 10th time the workflow runs.
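The counting logic in this example can be sketched outside PowerCenter as a short Python simulation. The function names are hypothetical. The sketch assumes that the counter starts at 1 and that the Assignment task increments it on every run, so that the counter keeps advancing after a headquarters load.

```python
def should_load_headquarters(workflow_count: int) -> bool:
    """Decision condition, analogous to MOD($$WorkflowCount, 10) = 0."""
    return workflow_count % 10 == 0


def simulate(runs: int, start_count: int = 1) -> list:
    """Return the run numbers on which the headquarters session would run."""
    workflow_count = start_count  # persistent value carried between runs
    hq_runs = []
    for run in range(1, runs + 1):
        # The local orders session runs on every workflow run.
        if should_load_headquarters(workflow_count):
            hq_runs.append(run)
        workflow_count += 1  # Assignment task: increment the counter by one
    return hq_runs


print(simulate(30))  # [10, 20, 30]
```

If the counter started at 0, the datatype default for an Integer variable, the condition would also be true on the first run, so the choice of starting value matters.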

Start and Current Values

Conceptually, the PowerCenter Server holds two different values for a workflow variable during a workflow run:

♦ Start value of a workflow variable

♦ Current value of a workflow variable

The start value is the value of the variable at the start of the workflow. The start value could be a value defined in the parameter file for the variable, a value saved in the repository from the previous run of the workflow, a user-defined initial value for the variable, or the default value based on the variable datatype.

The PowerCenter Server looks for the start value of a variable in the following order:

1. Value in parameter file

2. Value saved in the repository (if the variable is persistent)

3. User-specified default value

4. Datatype default value

For a list of datatype default values, see Table 4-2 on page 110.

For example, you create a workflow variable in a workflow and enter a default value, but you do not define a value for the variable in a parameter file. The first time the PowerCenter Server runs the workflow, it evaluates the start value of the variable to the user-defined default value.


If you declare the variable as persistent, the PowerCenter Server saves the value of the variable to the repository at the end of the workflow run. The next time the workflow runs, the PowerCenter Server evaluates the start value of the variable as the value saved in the repository.

If the variable is non-persistent, the PowerCenter Server does not save the value of the variable. The next time the workflow runs, the PowerCenter Server evaluates the start value of the variable as the user-specified default value.

If you want to override the value saved in the repository before running a workflow, you need to define a value for the variable in a parameter file. When you define a workflow variable in the parameter file, the PowerCenter Server uses this value instead of the value saved in the repository or the configured initial value for the variable.
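The lookup order for the start value can be sketched as a small Python function. This is a model of the rules described above, not the PowerCenter Server implementation. The function name, its parameters, and the _UNSET sentinel are hypothetical, and the datatype defaults are copied from Table 4-2 as plain Python values.

```python
# Datatype defaults from Table 4-2, represented here as Python values.
DATATYPE_DEFAULTS = {
    "Date/time": "1/1/1753 A.D.",
    "Double": 0,
    "Integer": 0,
    "Nstring": "",
}

_UNSET = object()  # sentinel meaning "no value supplied at this level"


def resolve_start_value(datatype, parameter_file=_UNSET, repository=_UNSET,
                        user_default=_UNSET, persistent=False):
    """Return the start value using the four-step order in this section."""
    if parameter_file is not _UNSET:
        return parameter_file              # 1. value in parameter file
    if persistent and repository is not _UNSET:
        return repository                  # 2. saved value, persistent variables only
    if user_default is not _UNSET:
        return user_default                # 3. user-specified default value
    return DATATYPE_DEFAULTS[datatype]     # 4. datatype default value


# A persistent Integer variable with a saved value and no parameter file entry.
print(resolve_start_value("Integer", repository=7, user_default=5, persistent=True))  # 7
```

Passing parameter_file=9 in the same call returns 9, which mirrors how a parameter file overrides the value saved in the repository.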

The current value is the value of the variable as the workflow progresses. When a workflow starts, the current value of a variable is the same as the start value. The value of the variable can change as the workflow progresses if you create an Assignment task that updates the value of the variable.

If the variable is persistent, the PowerCenter Server saves the current value of the variable to the repository at the end of a successful workflow run. If the workflow fails to complete, the PowerCenter Server does not update the value of the variable in the repository.

The PowerCenter Server states the value saved to the repository for each workflow variable in the workflow log.
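The save rule for persistent variables can be illustrated with a short Python model. The run_workflow function and its parameters are hypothetical, and the sketch treats the variable as an integer counter that tasks in the workflow increment.

```python
def run_workflow(saved_value, increments, persistent, succeeded):
    """Model one workflow run.

    Returns (current_value, value_in_repository_after_the_run).
    """
    current = saved_value      # start value taken from the repository
    current += increments      # for example, Assignment tasks updating the variable
    if persistent and succeeded:
        return current, current      # current value is saved to the repository
    return current, saved_value      # repository keeps the earlier value


saved = 1
_, saved = run_workflow(saved, 1, persistent=True, succeeded=True)   # saved becomes 2
_, saved = run_workflow(saved, 1, persistent=True, succeeded=False)  # saved stays 2
print(saved)  # 2
```

The second run changes the current value during the run, but because the run fails, the next run would still start from 2.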

Datatype Default Values

If the PowerCenter Server cannot determine the start value of a variable by any other means, it uses a default value for the variable based on its datatype. For more information on how the PowerCenter Server determines start values for a variable, see “Start and Current Values” on page 109.

Table 4-2 lists the datatype default values for user-defined workflow variables:

Table 4-2. Datatype Default Values for User-defined Workflow Variables

| Datatype | Workflow Manager Default Value |
| --- | --- |
| Date/time | 1/1/1753 A.D. |
| Double | 0 |
| Integer | 0 |
| Nstring | Empty string |

Creating User-Defined Workflow Variables

You can create workflow variables for a workflow in the workflow properties.


To create a workflow variable:

1. In the Workflow Designer, create a new workflow or edit an existing one.

2. Select the Variables tab.

3. Click Add and enter a name for the variable.

The correct format for a user-defined workflow variable is $$VariableName. Do not use a single $ for a user-defined workflow variable. The single $ is reserved for system variables and pre-defined workflow variables.

Workflow variable names are not case-sensitive.

4. In the Datatype field, select the datatype for the new variable. You can select from the following datatypes:

♦ Date/time

♦ Double

♦ Integer

♦ Nstring

Variables of type Nstring can have a maximum length of 600 characters.

5. Enable the Persistent option if you want the value of the variable retained from one execution of the workflow to the next. For more information, see “Start and Current Values” on page 109.

6. Enter the default value for the variable in the Default field. If the default value is a null value, enable the Is Null option.

7. To validate the default value of the new workflow variable, click the Validate button.

8. Click Apply to save the new workflow variable.

9. Click OK to close the workflow properties.
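The naming and length rules in this procedure can be collected into a small Python check. This is a sketch for illustration, not the validation the Workflow Manager performs. The check_variable function is hypothetical, and treating two names that differ only in case as a conflict is an assumption drawn from the statement that names are not case-sensitive.

```python
NSTRING_MAX_LENGTH = 600
DATATYPES = {"Date/time", "Double", "Integer", "Nstring"}


def check_variable(name, datatype, default=None, existing_names=()):
    """Return a list of problems; an empty list means the definition passes."""
    problems = []
    # User-defined variables use $$; a single $ is reserved.
    if not name.startswith("$$") or len(name) == 2:
        problems.append("name must have the form $$VariableName")
    # Assumption: names that differ only in case refer to the same variable.
    if name.lower() in {n.lower() for n in existing_names}:
        problems.append("name is already in use (names are not case-sensitive)")
    if datatype not in DATATYPES:
        problems.append("datatype must be Date/time, Double, Integer, or Nstring")
    if datatype == "Nstring" and default is not None and len(default) > NSTRING_MAX_LENGTH:
        problems.append("Nstring value exceeds 600 characters")
    return problems


print(check_variable("$$WorkflowCount", "Integer"))  # []
print(check_variable("$WorkflowCount", "Integer"))
```

The second call reports a problem because the name uses a single $.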



Scheduling a Workflow

You can schedule a workflow to run continuously or to repeat at a given time or interval, or you can start a workflow manually. The PowerCenter Server runs a scheduled workflow as configured.

By default, the workflow runs on demand. You can change the schedule settings by editing the scheduler. If you change schedule settings, the PowerCenter Server reschedules the workflow according to the new settings.

Each workflow has an associated scheduler. A scheduler is a repository object that contains a set of schedule settings. You can create a non-reusable scheduler for the workflow. Or, you can create a reusable scheduler so you can use the same set of schedule settings for workflows in the folder.

The Workflow Manager marks a workflow invalid if you delete the scheduler associated with the workflow.

If you choose a different PowerCenter Server for the workflow or restart the PowerCenter Server, it reschedules all workflows. This includes workflows that are scheduled to run continuously but whose start time has passed. You must manually reschedule workflows whose start time has passed if they are not scheduled to run continuously.

The PowerCenter Server does not run the workflow if:

♦ The prior workflow run fails. When a workflow fails, the PowerCenter Server removes the workflow from the schedule, and you must manually reschedule it. You can reschedule the workflow in the Workflow Manager or using pmcmd. In the Workflow Manager Navigator window, right-click the workflow and select Schedule Workflow. For more information about the pmcmd scheduleworkflow command, see “Scheduleworkflow” on page 604.

♦ You remove the workflow from the schedule. You can remove the workflow from the schedule in the Workflow Manager or using pmcmd. In the Workflow Manager Navigator window, right-click the workflow and select Unschedule Workflow. For more information about the pmcmd unscheduleworkflow command, see “Unscheduleworkflow” on page 610.

Note: The PowerCenter Server schedules the workflow in the time zone of the PowerCenter Server machine. For example, the PowerCenter Client is in your current time zone and the PowerCenter Server is in a time zone two hours later. If you schedule the workflow to start at 9 a.m., it starts at 9 a.m. in the time zone of the PowerCenter Server machine and 7 a.m. current time.
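The time zone arithmetic in this note can be checked with Python's standard datetime module. The offsets, the date, and the client_start_time function are hypothetical values chosen to match the example: the server is two hours later than the client.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed offsets: the server is two hours later than the client.
client_tz = timezone(timedelta(hours=0), "client")
server_tz = timezone(timedelta(hours=2), "server")


def client_start_time(server_start: datetime, tz: timezone) -> datetime:
    """Convert a start time on the server machine to the client's time zone."""
    return server_start.astimezone(tz)


# The workflow is scheduled for 9 a.m. in the server's time zone.
scheduled_on_server = datetime(2004, 8, 2, 9, 0, tzinfo=server_tz)
print(client_start_time(scheduled_on_server, client_tz).strftime("%H:%M"))  # 07:00
```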

To schedule a workflow:

1. In the Workflow Designer, open the workflow.

2. Choose Workflows-Edit.

3. In the Scheduler tab, choose Non-reusable if you want to create a non-reusable set of schedule settings for the workflow.

Choose Reusable if you want to select an existing reusable scheduler for the workflow.


Note: If you do not have a reusable scheduler in the folder, you must create one before you choose Reusable. The Workflow Manager displays a warning message if you do not have an existing reusable scheduler.

4. Click the right side of the Scheduler field to edit scheduling settings for the scheduler.

For a complete list of scheduler options, see “Configuring Scheduler Settings” on page 114.

5. If you select Reusable, choose a reusable scheduler from the Scheduler Browser dialog box.

6. Click OK.

To remove a workflow from its schedule, right-click the workflow in the Navigator window and choose Unschedule Workflow.


To reschedule a workflow on its original schedule, right-click the workflow in the Navigator window and choose Schedule Workflow.

Creating a Reusable Scheduler

For each folder, the Workflow Manager allows you to create reusable schedulers so you can reuse the same set of scheduling settings for workflows in the folder. Use a reusable scheduler so you do not need to configure the same set of scheduling settings in each workflow.

When you delete a reusable scheduler, all workflows that use the deleted scheduler become invalid. To make the workflows valid, you must edit them and replace the missing scheduler.

To create a reusable scheduler:

1. In the Workflow Designer, choose Workflows-Schedulers.

2. Click Add to add a new scheduler.

3. In the General tab, enter a name for the scheduler.

4. Configure the scheduler settings in the Scheduler tab. For a complete list of scheduler settings, see Table 4-3 on page 115.

Configuring Scheduler Settings

Configure the Schedule tab of the scheduler to set run options, schedule options, start options, and end options for the schedule.


Figure 4-13 shows the Schedule tab:

Figure 4-13. Schedule tab

Table 4-3 describes the settings on the Schedule tab:

Table 4-3. Schedule Tab Settings

| Scheduler Options | Required/Optional | Description |
| --- | --- | --- |
| Run Options: Run On Server Initialization/Run On Demand/Run Continuously | Optional | Indicates the workflow schedule type. If you select Run On Server Initialization, the PowerCenter Server runs the workflow as soon as the server is initialized. The PowerCenter Server then starts the next run of the workflow according to settings in Schedule Options. If you select Run On Demand, the PowerCenter Server runs the workflow when you start the workflow manually. If you select Run Continuously, the PowerCenter Server runs the workflow as soon as the server initializes. The PowerCenter Server then starts the next run of the workflow as soon as it finishes the previous run. |
| Schedule Options: Run Once/Run Every/Customized Repeat | Optional | Required if you select Run On Server Initialization, or if you do not choose any setting in Run Options. If you select Run Once, the PowerCenter Server runs the workflow once, as scheduled in the scheduler. If you select Run Every, the PowerCenter Server runs the workflow at regular intervals, as configured. If you select Customized Repeat, the PowerCenter Server runs the workflow on the dates and times specified in the Repeat dialog box. When you select Customized Repeat, click Edit to open the Repeat dialog box. The Repeat dialog box allows you to schedule specific dates and times for the workflow run. The selected scheduler appears at the bottom of the page. |
| Start Options: Start Date/Start Time | Optional | Start Date indicates the date on which the PowerCenter Server begins the workflow schedule. Start Time indicates the time at which the PowerCenter Server begins the workflow schedule. |
| End Options: End On/End After/Forever | Required/Optional | Required if the workflow schedule is Run Every or Customized Repeat. If you select End On, the PowerCenter Server stops scheduling the workflow on the selected date. If you select End After, the PowerCenter Server stops scheduling the workflow after the set number of workflow runs. If you select Forever, the PowerCenter Server schedules the workflow as long as the workflow does not fail. |

Customizing Repeat Option

You can schedule the workflow to run once, run at an interval, or customize your own repeat option. Click the Edit button to open the Customized Repeat dialog box.

Figure 4-14 shows the Customized Repeat dialog box:

Figure 4-14. Customized Repeat Dialog Box


Table 4-4 describes options in the Customized Repeat dialog box:

Table 4-4. Repeat Dialog Box Options

| Repeat Option | Required/Optional | Description |
| --- | --- | --- |
| Repeat Every | Required | Enter the numeric interval at which you would like the PowerCenter Server to schedule the workflow, and then select Days, Weeks, or Months, as appropriate. If you select Days, select the appropriate Daily Frequency settings. If you select Weeks, select the appropriate Weekly and Daily Frequency settings. If you select Months, select the appropriate Monthly and Daily Frequency settings. |
| Weekly | Required/Optional | Required to enter a weekly schedule. Select the day or days of the week on which you would like the PowerCenter Server to run the workflow. |
| Monthly | Required/Optional | Required to enter a monthly schedule. If you select Run On Day, select the dates on which you want the workflow scheduled on a monthly basis. The PowerCenter Server schedules the workflow to run on the selected dates. If you select a numeric date exceeding the number of days within a given month, the PowerCenter Server schedules the workflow for the last day of the month, including leap years. For example, if you schedule the workflow to run on the 31st of every month, the PowerCenter Server schedules the workflow on the 30th of the following months: April, June, September, and November. If you select Run On The, select the week(s) of the month, then the day of the week on which you want the workflow to run. For example, if you select Second and Last, then select Wednesday, the PowerCenter Server schedules the workflow to run on the second and last Wednesday of every month. |
| Daily | Optional | Enter the number of times you would like the PowerCenter Server to run the workflow on any day the workflow is scheduled. If you select Run Once, the PowerCenter Server schedules the workflow once on the selected day, at the time entered on the Start Time setting on the Time tab. If you select Run Every, enter Hours and Minutes to define the interval at which the PowerCenter Server runs the workflow. The PowerCenter Server then schedules the workflow at regular intervals on the selected day. The PowerCenter Server uses the Start Time setting for the first scheduled workflow of the day. |

Editing Scheduler Settings

You can edit scheduler settings for both non-reusable and reusable schedulers.

♦ Non-reusable schedulers. When you configure or edit a non-reusable scheduler, check in the workflow to allow the schedule to automatically take effect.

You can update the schedule manually with the workflow checked out. Right-click the workflow in the Navigator, and select Schedule Workflow. Note that the changes are applied only to the latest checked-in version of the workflow.

♦ Reusable schedulers. When you edit settings for a reusable scheduler, the repository creates a new version of the scheduler and increments the version number by one. To update a workflow with the latest schedule, check in the scheduler after you edit it.

When you configure a reusable scheduler for a new workflow, you must check in both the workflow and the scheduler to enable the schedule to take effect. Thereafter, when you check in the scheduler after revising it, the workflow schedule is updated automatically even if it is checked out.

You need to update the workflow schedule manually if you do not check in the scheduler. To update a workflow schedule manually, right-click the workflow in the Navigator, and select Schedule Workflow. Note that the new schedule is implemented only for the latest version of the workflow that is checked in. Workflows that are checked out are not updated with the new schedule.
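The Monthly rules for the Customized Repeat dialog box (Run On Day and Run On The) can be sketched with Python's standard calendar module. This is a model of the behavior the guide describes, not the scheduler's own code, and both function names are hypothetical.

```python
import calendar
from datetime import date


def monthly_run_date(year: int, month: int, day_of_month: int) -> date:
    """Run On Day: use the last day of the month if the selected date does not exist."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day_of_month, last_day))


def weekdays_in_month(year: int, month: int, weekday: int) -> list:
    """Run On The: all dates in the month on the given weekday (Monday is 0)."""
    last_day = calendar.monthrange(year, month)[1]
    return [date(year, month, d) for d in range(1, last_day + 1)
            if date(year, month, d).weekday() == weekday]


print(monthly_run_date(2004, 4, 31))  # 2004-04-30
print(monthly_run_date(2004, 2, 31))  # 2004-02-29

# Second and last Wednesday of August 2004.
wednesdays = weekdays_in_month(2004, 8, calendar.WEDNESDAY)
print(wednesdays[1], wednesdays[-1])  # 2004-08-11 2004-08-25
```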

Disabling Workflows

You may want to disable the workflow while you edit it. This prevents the PowerCenter Server from running the workflow on its schedule. Select the Disable Workflows option on the General tab of the workflow properties. The PowerCenter Server does not run disabled workflows until you clear the Disable Workflows option. Once you clear the Disable Workflows option, the PowerCenter Server reschedules the workflow.


Validating a Workflow

Before you can run a workflow, you must validate it. When you validate the workflow, you validate all task instances in the workflow, including nested worklets.

The Workflow Manager validates the following properties:

♦ Expressions. Expressions in the workflow must be valid.

♦ Tasks. Non-reusable tasks and reusable task instances in the workflow must follow validation rules.

♦ Scheduler. If the workflow uses a reusable scheduler, the Workflow Manager verifies that the scheduler exists.

The Workflow Manager also verifies that you linked each task properly. For example, you must link the Start task to at least one task in the workflow.

Note: The Workflow Manager validates Session tasks separately. If a session is invalid, the workflow may still be valid. For more information about session validation, see “Validating a Session” on page 195.

Expression Validation

The Workflow Manager validates all expressions in the workflow. You can enter expressions in the Assignment task, Decision task, and link conditions. The Workflow Manager writes any error message to the Output window.

Expressions in link conditions and Decision task conditions must evaluate to a numerical value. Workflow variables used in expressions must exist in the workflow.

The Workflow Manager marks the workflow invalid if a link condition is invalid.

Task Validation

The Workflow Manager validates each task in the workflow as you create it. When you save or validate the workflow, the Workflow Manager validates all tasks in the workflow except Session tasks. It marks the workflow invalid if it detects any invalid task in the workflow.

The Workflow Manager verifies that attributes in the tasks follow validation rules. For example, the user-defined event you specify in an Event task must exist in the workflow. The Workflow Manager also verifies that you linked each task properly. For example, you must link the Start task to at least one task in the workflow. For details on task validation rules, see “Validating Tasks” on page 139.

When you delete a reusable task, the Workflow Manager removes the instance of the deleted task from workflows. The Workflow Manager also marks the workflow invalid when you delete a reusable task used in a workflow.

The Workflow Manager verifies that there are no duplicate task names in a folder, and that there are no duplicate task instances in the workflow.


Workflow Properties Validation

The Workflow Manager marks the workflow invalid if the scheduler you specify for the workflow does not exist in the folder.

Running Validation

When you validate a workflow, you validate worklet instances, worklet objects, and all other nested worklets in the workflow. You validate task instances and worklets, regardless of whether you have edited them.

The Workflow Manager validates the worklet object using the same validation rules for workflows. The Workflow Manager validates the worklet instance by verifying attributes in the Parameter tab of the worklet instance. For details on validating worklets, see “Validating Worklets” on page 171.

If the workflow contains nested worklets, you can select a worklet to validate the worklet and all other worklets nested under it. To validate a worklet and its nested worklets, right-click the worklet and choose Validate.

Example

For example, you have a workflow that contains a non-reusable worklet called Worklet_1. Worklet_1 contains a nested worklet called Worklet_a. The workflow also contains a reusable worklet instance called Worklet_2. Worklet_2 contains a nested worklet called Worklet_b.

In the example workflow in Figure 4-15, the Workflow Manager validates links, conditions, and tasks in the workflow. The Workflow Manager validates all tasks in the workflow, including tasks in Worklet_1, Worklet_2, Worklet_a, and Worklet_b.

You can validate a part of the workflow. Right-click Worklet_1 and choose Validate. The Workflow Manager validates all tasks in Worklet_1 and Worklet_a.
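The scope of validation in this example can be modeled as a depth-first walk over nested worklets. The Worklet class and the worklets_validated function are hypothetical names used only for illustration. The sketch shows which objects are visited, not the validation rules themselves, and it models the workflow as the root of the tree.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Worklet:
    name: str
    nested: List["Worklet"] = field(default_factory=list)


def worklets_validated(root: Worklet) -> List[str]:
    """Names of the selected object and every worklet nested under it."""
    names = [root.name]
    for child in root.nested:
        names.extend(worklets_validated(child))
    return names


worklet_1 = Worklet("Worklet_1", [Worklet("Worklet_a")])
worklet_2 = Worklet("Worklet_2", [Worklet("Worklet_b")])
workflow = Worklet("Workflow", [worklet_1, worklet_2])

# Validating only Worklet_1 covers Worklet_1 and Worklet_a.
print(worklets_validated(worklet_1))  # ['Worklet_1', 'Worklet_a']
```

Calling worklets_validated(workflow) visits all four worklets, which corresponds to validating the whole workflow.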

Figure 4-15 shows the example workflow:

Figure 4-15. Example Workflow - Validation

Callouts: Worklet_1: Non-reusable worklet. Contains a nested worklet called Worklet_a. Worklet_2: Reusable worklet. Contains a nested worklet called Worklet_b.

Validating Multiple Workflows

You can validate multiple workflows or worklets without fetching them into the workspace. To validate multiple workflows, you must select and validate the workflows from a query results view or a view dependencies list. When you validate multiple workflows, the validation does not include sessions, nested worklets, or reusable worklet objects in the workflows.

Note: If you are using the Repository Manager, you can select and validate multiple workflows from the Repository Navigator.

You can save and optionally check in workflows that change from invalid to valid status. For more information about validating multiple objects, see “Validating Multiple Objects” in the Repository Guide.

To validate multiple workflows:

1. Select workflows from either a query list or a view dependencies list.

2. Right-click one of the selected workflows and choose Validate.

The Validate Objects dialog box displays.

3. Choose whether to save objects and check in objects that you validate.


Running the Workflow

Before you can run a workflow, you must save changes in the folder and select a PowerCenter Server to run the workflow. You can manually start a workflow configured to run on demand or to run on a schedule. Use the Workflow Manager, Workflow Monitor, or pmcmd to run a workflow. You can choose to run the entire workflow, part of a workflow, or a task in the workflow.

Selecting a Server to Run the Workflow

You must choose a server to run the workflow. If you only register one server, the Workflow Manager lists the single registered PowerCenter Server that runs the workflow. For PowerCenter repositories with multiple servers, the Workflow Manager lists all servers.

To select a server to run a workflow:

1. In the Workflow Designer, open the Workflow.

2. Choose Workflows-Edit. The Edit Workflow dialog box appears.

3. Click the Select Server button on the General tab. A list of registered servers appears.

4. Select the server on which you want to run the workflow.

5. Click OK twice to select the server for the workflow.

Assigning the PowerCenter Server to a Workflow

After you register the PowerCenter Server, you can assign it to workflows you want to run on that server. This allows you to assign the PowerCenter Server to multiple workflows without editing each workflow property individually. To assign the PowerCenter Server to multiple workflows, you must first close all folders in the repository.

You can also choose a PowerCenter Server to run a specific workflow by editing the workflow property. For details, see “Running a Workflow” on page 124.

To assign the PowerCenter Server to workflows, you must have Super User privilege.

To assign the PowerCenter Server:

1. Close all folders in the repository.


2. Choose Server-Assign Server.

or

Right-click the server name in the Navigator and choose Assign Server. The Assign Server dialog box opens.

3. From the Choose Server list, select the server you want to assign.

4. From the Show Folder list, select the folder you want to view. Or, choose All to view workflows in all folders in the repository.

5. Select the Select check box for each workflow you want to run on the PowerCenter Server.

6. Click Assign.

Removing an Assigned Server from a Workflow

You can remove an assigned server from a workflow in the Assign Server dialog box. Perform the following steps to remove an assigned server from a workflow.


To remove an assigned server:



3. From the Choose Server list, select None.


5. Select the workflows from which you want to remove the assigned server.

6. Click Assign.

Running a Workflow

When you choose Workflows-Start, the PowerCenter Server runs the entire workflow.

To run a workflow from pmcmd, use the startworkflow command. For details on using pmcmd, see “Using pmcmd” on page 581.

To start a workflow with the Workflow Manager:

1. Connect to a repository and open the folder containing the workflow.

2. From the Navigator, select the workflow that you want to start.

3. Right-click the workflow in the Navigator and choose Start Workflow.

The PowerCenter Server starts running the entire workflow.

When you choose Start Workflow, the workflow runs on the PowerCenter Server you selected in the workflow properties. You can also use the Choose Server toolbar button to run the workflow on a different server.

After the Workflow Manager sends a request to the PowerCenter Server, the Output window displays the PowerCenter Server response. If an error displays, check the workflow log or session log for error messages.

You can also manually start a workflow by right-clicking in the Workflow Designer workspace and choosing Start Workflow.

Running a Part of a Workflow

You can choose to run only part of the workflow. To run part of the workflow, right-click the task that you want the PowerCenter Server to begin running and choose Start Workflow From Task. The PowerCenter Server runs the workflow from the selected task to the end of the workflow.

When you run a workflow from a selected task, the PowerCenter Server runs the workflow on the registered server you choose in the workflow properties. The PowerCenter Server logs messages in the workflow log when you start a workflow from a task.


To run a part of a workflow from pmcmd, use the startfrom flag of the startworkflow command. For details on using pmcmd, see “Using pmcmd” on page 581.

To run a part of a workflow:

1. Connect to the folder containing the workflow.

2. In the Navigator window, drill down the Workflow folder to show the tasks in the workflow.

or

In the Workflow Designer workspace, select the task from which you want the PowerCenter Server to begin running.

3. Right-click the task from which you want the PowerCenter Server to begin running.

4. Choose Start Workflow From Task.

For example, you have a workflow with multiple tasks. The example workflow in Figure 4-16 contains two branches. If you want to run the tasks commandtask2, e_email2, and command3, you start the workflow from commandtask2. All subsequent tasks in the branch will run.
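The effect of Start Workflow From Task can be pictured as a walk along the links that leave the selected task. The following Python sketch is illustrative only: the figure is not reproduced here, so the link structure in LINKS is a hypothetical layout, and only the task names commandtask2, e_email2, and command3 come from the example above. The names session1 and e_email1 and the function tasks_run_from are assumptions made for the sketch.

```python
from collections import deque

# Hypothetical link structure for a two-branch workflow.
# Only commandtask2, e_email2, and command3 are named in the guide.
LINKS = {
    "Start": ["session1", "commandtask2"],
    "session1": ["e_email1"],
    "e_email1": [],
    "commandtask2": ["e_email2"],
    "e_email2": ["command3"],
    "command3": [],
}


def tasks_run_from(start_task, links):
    """Return the tasks reachable from start_task, in breadth-first order."""
    order = []
    seen = {start_task}
    queue = deque([start_task])
    while queue:
        task = queue.popleft()
        order.append(task)
        for next_task in links.get(task, []):
            if next_task not in seen:
                seen.add(next_task)
                queue.append(next_task)
    return order


print(tasks_run_from("commandtask2", LINKS))
```

The sketch ignores link conditions. It shows only that tasks in the other branch are not reached when the workflow starts from commandtask2.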

Running a Task in the Workflow

When you start a task in the workflow, the Workflow Manager locks the entire workflow so another user cannot start the workflow. The PowerCenter Server runs the selected task. It does not run the rest of the workflow.

To run a task using the Workflow Manager, select the task in the Workflow Designer workspace. Right-click the task and choose Start Task.

You can select a task to start using menu commands in the Workflow Manager. In the Navigator window, drill down the Workflow folder to show the tasks in the workflow you want to start. Right-click the task you want to start and choose Start Task.

Figure 4-16. Running Part of a Workflow - Example

When you start the workflow from commandtask2, the PowerCenter Server runs this portion of the workflow.


To start a task in a workflow from pmcmd, use the starttask command. For details on using pmcmd, see “Using pmcmd” on page 581.


Suspending the Workflow

When a task in the workflow fails, you might want to suspend the workflow, fix the error, and resume or recover the workflow. The PowerCenter Server suspends the workflow if you enable the Suspend On Error option in the workflow properties. You can optionally set a suspension email so the PowerCenter Server sends an email when it suspends a workflow.

When you enable the Suspend On Error option, the PowerCenter Server suspends the workflow when one of the following fails:

♦ Session

♦ Command

♦ Worklet

♦ Email

When a task fails in the workflow, the PowerCenter Server stops running tasks in its path. The PowerCenter Server does not evaluate the output link of the failed task. If no other task is running in the workflow, the Workflow Monitor displays the status of the workflow as “Suspended.”

If one or more tasks are still running in the workflow when a task fails, the PowerCenter Server stops running the failed task and continues running tasks in other paths. The Workflow Monitor displays the status of the workflow as “Suspending.”

When the status of the workflow is “Suspended” or “Suspending,” you can fix the error, such as a target database error, and resume or recover the workflow in the Workflow Monitor. When you resume or recover a workflow, the PowerCenter Server restarts the failed tasks and continues evaluating the rest of the tasks in the workflow. The PowerCenter Server does not run any task that already completed successfully.
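The suspension rules described above can be summarized in a short Python sketch. This is a simplified model for illustration, not PowerCenter code: the function names and the status strings "FAILED", "SUCCEEDED", and "NOT_RUN" are assumptions, and behavior when Suspend On Error is disabled is not modeled.

```python
def workflow_status_on_failure(suspend_on_error, other_tasks_running):
    """Status shown in the Workflow Monitor after a task fails."""
    if not suspend_on_error:
        return None  # not modeled in this sketch
    return "Suspending" if other_tasks_running else "Suspended"


def tasks_considered_on_resume(task_status):
    """Tasks the server may still run on resume or recover.

    Tasks that already completed successfully are never run again.
    """
    return [name for name, status in task_status.items() if status != "SUCCEEDED"]


print(workflow_status_on_failure(True, other_tasks_running=False))
print(tasks_considered_on_resume({"s_load": "SUCCEEDED", "s_agg": "FAILED", "cmd_archive": "NOT_RUN"}))
```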

Note: Do not edit a workflow or the tasks inside a workflow when the PowerCenter Server suspends a workflow.

For details about resuming the workflow, see “Resuming a Workflow or Worklet” on page 417. For details about recovering the workflow, see “Recovering a Workflow or Worklet” on page 417.

To suspend a workflow:


2. Choose Workflows-Edit.


3. In the General tab, enable Suspend On Error.

4. Click OK.

Configuring Suspension Email

You can configure the workflow so that the PowerCenter Server sends an email when it suspends a workflow. Select an existing reusable email task for the suspension email. When a task fails, the PowerCenter Server starts suspending the workflow and sends the suspension email. If another task fails while the PowerCenter Server is suspending the workflow, you do not get the suspension email again.

The PowerCenter Server sends out a suspension email if another task fails after you resume the workflow.

For details on configuring suspension emails, see “Working with Suspension Email” on page 339.


Stopping or Aborting the Workflow

You can specify when and how you want the PowerCenter Server to stop or abort a workflow by using the Control task in the workflow. After you start a workflow, you can stop or abort it through the Workflow Monitor or pmcmd. You can issue the stop or abort command at any time during the execution of a workflow.

You can stop or abort a workflow by performing one of the following actions:

♦ Use a Control task in the workflow. For details, see “Working with the Control Task” on page 147.

♦ Issue a stop or abort command in the Workflow Monitor. For details, see “Monitoring Workflows” on page 401.

♦ Issue a stop or abort command in pmcmd. For details, see “pmcmd Reference” on page 594.

You can also stop or abort a task within a workflow. For details on stopping the Session task, see “Stopping and Aborting a Session” on page 200.

Server Handling of Stop and Abort

When you stop a workflow, the PowerCenter Server tries to stop all the tasks that are currently running in the workflow. If the workflow contains a worklet, the PowerCenter Server also tries to stop all the tasks that are currently running in the worklet. If it cannot stop the workflow, you need to abort the workflow.

The PowerCenter Server can stop the following tasks completely:

♦ Session

♦ Command

♦ Timer

♦ Event-Wait

♦ Worklet

When you stop a Command task that contains multiple commands, the PowerCenter Server finishes executing the current command and does not execute the rest of the commands. The PowerCenter Server cannot stop tasks such as the Email task. For example, if the PowerCenter Server has already started sending an email when you issue the stop command, the PowerCenter Server finishes sending the email before it stops running the workflow.

The PowerCenter Server aborts the workflow if the Repository Server process shuts down.

Stopping or Aborting a Task

You can stop or abort a task within a workflow from the Workflow Monitor. When you stop or abort a task, the PowerCenter Server stops processing the task. The PowerCenter Server does not process other tasks in the path of the stopped or aborted task. The PowerCenter Server continues processing concurrent tasks in the workflow. If the PowerCenter Server cannot stop the task, you can abort the task.

When you abort a task, the PowerCenter Server kills the process on the task. The PowerCenter Server continues processing concurrent tasks in the workflow when you abort a task.

You can also stop or abort a worklet. The PowerCenter Server stops and aborts a worklet in the same way that it stops and aborts a task. The PowerCenter Server stops the worklet while it continues to execute concurrent tasks in the workflow. You can also stop or abort tasks within a worklet.

Stopping or Aborting a Session Task

If the PowerCenter Server is executing a Session task when you issue the stop command, the PowerCenter Server stops reading data. It continues processing and writing data and committing data to targets. If the PowerCenter Server cannot finish processing and committing data, you can issue the abort command.

The PowerCenter Server handles the abort command for the Session task like the stop command, except it has a timeout period of 60 seconds. If the PowerCenter Server cannot finish processing and committing data within the timeout period, it kills the DTM process and terminates the session. For details on stopping or aborting a session, see “Stopping and Aborting a Session” on page 200.
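The difference between stop and abort for a Session task comes down to the timeout. The Python sketch below models that rule under stated assumptions: the function name, the outcome strings, and the treatment of exactly 60 seconds as "within the timeout period" are choices made for illustration, not documented PowerCenter behavior.

```python
ABORT_TIMEOUT_SECONDS = 60  # timeout period stated in this guide


def session_outcome(command, seconds_needed_to_commit):
    """Model how a running session ends after a stop or abort command."""
    if command == "stop":
        # Stop has no timeout: the server keeps processing and committing.
        return "finishes processing and committing, then stops"
    if command == "abort":
        if seconds_needed_to_commit <= ABORT_TIMEOUT_SECONDS:
            return "finishes processing and committing, then stops"
        return "DTM process killed, session terminated"
    raise ValueError("command must be 'stop' or 'abort'")


print(session_outcome("abort", seconds_needed_to_commit=90))
```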


Chapter 5

Working with Tasks


♦ Overview, 132

♦ Creating a Task, 133

♦ Configuring Tasks, 135

♦ Validating Tasks, 139

♦ Working with the Assignment Task, 140

♦ Working with the Command Task, 143

♦ Working with the Control Task, 147

♦ Working with the Decision Task, 149

♦ Working with Event Tasks, 153

♦ Working with the Timer Task, 161


Overview

The Workflow Manager contains many types of tasks to help you build workflows and worklets. You can create reusable tasks in the Task Developer. Or, create and add tasks in the Workflow or Worklet Designer as you develop the workflow.

Table 5-1 summarizes workflow tasks available in the Workflow Manager:

Table 5-1. Workflow Tasks

| Task Name | Tool | Reusable | Description |
|---|---|---|---|
| Assignment | Workflow Designer, Worklet Designer | No | Assigns a value to a workflow variable. For details, see “Working with the Assignment Task” on page 140. |
| Command | Task Developer, Workflow Designer, Worklet Designer | Yes | Specifies shell commands to run during the workflow. You can choose to run the Command task only if the previous task in the workflow completes. For details, see “Working with the Command Task” on page 143. |
| Control | Workflow Designer, Worklet Designer | No | Stops or aborts the workflow. For details, see “Working with the Control Task” on page 147. |
| Decision | Workflow Designer, Worklet Designer | No | Specifies a condition to evaluate in the workflow. Use the Decision task to create branches in a workflow. For details, see “Working with the Decision Task” on page 149. |
| Email | Task Developer, Workflow Designer, Worklet Designer | Yes | Sends email during the workflow. For details, see “Sending Email” on page 319. |
| Event-Raise | Workflow Designer, Worklet Designer | No | Represents the location of a user-defined event. The Event-Raise task triggers the user-defined event when the PowerCenter Server runs the Event-Raise task. For details, see “Working with Event Tasks” on page 153. |
| Event-Wait | Workflow Designer, Worklet Designer | No | Waits for a user-defined or a pre-defined event to occur. Once the event occurs, the PowerCenter Server completes the rest of the workflow. For details, see “Working with Event Tasks” on page 153. |
| Session | Task Developer, Workflow Designer, Worklet Designer | Yes | Set of instructions to run a mapping. For details, see “Working with Sessions” on page 173. |
| Timer | Workflow Designer, Worklet Designer | No | Waits for a specified period of time to run the next task. For details, see “Working with the Timer Task” on page 161. |

The Workflow Manager validates task attributes and links. If a task is invalid, the workflow becomes invalid. Workflows containing invalid sessions may still be valid. For details on validating tasks, see “Validating Tasks” on page 139.


Creating a Task

You can create tasks in the Task Developer, or you can create them in the Workflow Designer or the Worklet Designer as you develop the workflow or worklet. Tasks you create in the Task Developer are reusable. Tasks you create in the Workflow Designer and Worklet Designer are non-reusable by default.

For details on reusable tasks, see “Reusable Workflow Tasks” on page 135.

Creating a Task in the Task Developer

You can create the following three types of tasks in the Task Developer:

♦ Command

♦ Session

♦ Email

Perform the following steps to create tasks in the Task Developer.

To create a task in the Task Developer:

1. In the Task Developer, choose Tasks-Create. The Create Task dialog box appears.

2. Select the task type you want to create: Command, Session, or Email.

3. Enter a name for the task.

4. For Session tasks, select the mapping you want to associate with the session.

5. Click Create.

The Task Developer creates the workflow task.

6. Click Done to close the Create Task dialog box.

Creating a Task in the Workflow or Worklet Designer

You can create and add tasks in the Workflow Designer or Worklet Designer as you develop the workflow or worklet. You can create any type of task in the Workflow Designer or Worklet Designer. Tasks you create in the Workflow Designer or Worklet Designer are non-reusable. Edit the General tab of the task properties to promote a non-reusable task to a reusable task.


Perform the following steps to create tasks in the Workflow Designer or Worklet Designer.

To create tasks in the Workflow Designer or Worklet Designer:

1. In the Workflow Designer or Worklet Designer, open a workflow or worklet.

2. Choose Tasks-Create.

3. Select the type of task you want to create.

4. Enter a name for the task.

5. Click Create.

The Workflow Designer or Worklet Designer creates the task and adds it to the workspace.

6. Click Done.

You can also use the Tasks toolbar to create and add tasks to the workflow. Click the button on the Tasks toolbar for the task you want to create. Click again in the Workflow Designer or Worklet Designer workspace to create and add the task. The Workflow Designer or Worklet Designer creates the task with a default task name when you use the Tasks toolbar.


Configuring Tasks

After you create the task, you can configure general task options on the General tab. For each task instance in the workflow, you can configure how the PowerCenter Server runs the task and the other objects associated with the selected task. You can also disable the task so you can run the rest of the workflow without the selected task.

Figure 5-1 displays the General tab in the Edit Tasks dialog box:

When you use a task in the workflow, you can edit the task in the Workflow Designer and configure the following task options in the General tab:

♦ Treat input link as AND or OR. Choose to have the PowerCenter Server run the task when all or one of the input link conditions evaluates to True.

♦ Disable this task. Choose to disable the task so you can run the rest of the workflow without the task.

♦ Fail parent if this task fails. Choose to fail the workflow or worklet containing the task if the task fails.

♦ Fail parent if this task does not run. Choose to fail the workflow or worklet containing the task if the task does not run.

Reusable Workflow Tasks

Workflows can contain reusable task instances and non-reusable tasks. Non-reusable tasks exist within a single workflow. Reusable tasks can be used in multiple workflows in the same folder.

Figure 5-1. General Tab - Edit Tasks Dialog Box


You have the option to create any task as non-reusable or reusable. Tasks you create in the Task Developer are reusable. Tasks you create in the Workflow Designer are non-reusable by default. However, you can edit the general properties of a task to promote it to a reusable task.

The Workflow Manager stores each reusable task separate from the workflows that use the task. You can view a list of reusable tasks in the Tasks node in the Navigator window. You can see a list of all reusable Session tasks in the Sessions node in the Navigator window.

To promote a non-reusable workflow task:

1. In the Workflow Designer, double-click the task you want to make reusable.

2. In the General tab of the Edit Task dialog box, check the Make Reusable option.

3. When prompted whether you are sure you want to promote the task, click Yes.

4. Click OK to return to the workflow.

5. Choose Repository-Save.

The newly promoted task appears in the list of reusable tasks in the Tasks node in the Navigator window.

Instances and Inherited Changes

When you add a reusable task to a workflow, you add an instance of the task. The definition of the task exists outside the workflow, while an instance of the task exists in the workflow.

You can edit the task instance in the Workflow Designer. Changes you make in the task instance exist only in the workflow. The task definition remains unchanged in the Task Developer.

When you make changes to a reusable task definition in the Task Developer, the changes are reflected in the instance of the task in the workflow only if you have not edited the instance.

Reverting Changes in Reusable Task Instances

When you edit an instance of a reusable task in the workflow, you can revert to the settings in the task definition. The Revert button appears after you change settings in the task instance and override task properties. You cannot use the Revert button for settings that are read-only or locked by another user.
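The relationship between a reusable task definition and its instances resembles a set of overrides layered on a shared definition. The Python sketch below illustrates that idea. It is a conceptual model only: the class TaskInstance, the setting name "timeout", and the assumption that inheritance is tracked per setting rather than per instance are all hypothetical and are not taken from PowerCenter.

```python
class TaskInstance:
    """An instance of a reusable task inside one workflow."""

    def __init__(self, definition):
        self.definition = definition  # shared reusable task definition
        self.overrides = {}           # edits that exist only in this workflow

    def get(self, name):
        # An edited setting wins; otherwise the instance inherits the definition.
        return self.overrides.get(name, self.definition[name])

    def set(self, name, value):
        self.overrides[name] = value

    def revert(self, name):
        # Revert discards the override and returns to the definition value.
        self.overrides.pop(name, None)


definition = {"timeout": 10}
instance = TaskInstance(definition)
definition["timeout"] = 20      # unedited instance inherits the change
instance.set("timeout", 5)      # editing the instance overrides the definition
definition["timeout"] = 30      # no longer inherited by the edited instance
instance.revert("timeout")      # instance follows the definition again
print(instance.get("timeout"))
```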


Figure 5-2 displays the Revert button in the Mapping tab of a Session task:

AND or OR Input Links

For each task, you can choose to treat the input link as an AND link or an OR link. When a task has one input link, the PowerCenter Server processes the task when the previous object completes and the link condition evaluates to True. If you have multiple links going into one task, you can choose to have an AND input link so that the PowerCenter Server runs the task when all the link conditions evaluate to True. Or, you can choose to have an OR input link so that the PowerCenter Server runs the task as soon as any link condition evaluates to True.

To set the type of input links, double-click the task to open the Edit Tasks dialog box. Select AND or OR for the input link type. For details on working with links and link conditions, see “Working with Links” on page 92.
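The AND and OR input link types correspond to the logical "all" and "any" of the incoming link conditions. A minimal Python sketch, assuming each link condition has already been evaluated to True or False (the function name should_run_task is hypothetical, and the timing aspect of OR links, where the task runs as soon as one condition is True, is not modeled):

```python
def should_run_task(link_results, link_type="AND"):
    """Decide whether a task runs, given its evaluated input link conditions."""
    if link_type == "AND":
        return all(link_results)
    if link_type == "OR":
        return any(link_results)
    raise ValueError("link_type must be 'AND' or 'OR'")


print(should_run_task([True, False, True], "AND"))
print(should_run_task([True, False, True], "OR"))
```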

Disabling Tasks

In the Workflow Designer, you can disable a workflow task so that the PowerCenter Server runs the workflow without the disabled task. The status of a disabled task is DISABLED.

Disable a task in the workflow by selecting the Disable This Task option in the Edit Tasks dialog box.

Figure 5-2. Revert Button in Session Properties


Failing Parent Workflow or Worklet

You can choose to fail the workflow or worklet if a task fails or does not run. The workflow or worklet that contains the task instance is called the parent. A task might not run when the input condition for the task evaluates to False.

To fail the parent workflow or worklet if the task fails, double-click the task and select the Fail Parent If This Task Fails option in the General tab. When you select this option and a task fails, it does not prevent the other tasks in the workflow or worklet from running. Instead, the PowerCenter Server marks the status of the workflow or worklet as failed. If you have a session nested within multiple worklets, you must select the Fail Parent If This Task Fails option for each worklet instance to see the failure at the workflow level.

To fail the parent workflow or worklet if the task does not run, double-click the task and select the Fail Parent If This Task Does Not Run option in the General tab. When you choose this option, the PowerCenter Server fails the parent workflow if a task did not run.

Note: The PowerCenter Server does not fail the parent workflow if you disable a task.
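The two Fail Parent options and the note about disabled tasks can be combined into one rule. The Python sketch below is a simplified model: the dictionary keys and the status strings "FAILED", "NOT_RUN", "SUCCEEDED", and "DISABLED" are assumptions chosen for illustration.

```python
def parent_is_failed(tasks):
    """Return True if any task setting causes the parent to be marked failed."""
    for task in tasks:
        if task["status"] == "DISABLED":
            continue  # a disabled task never fails the parent
        if task["status"] == "FAILED" and task.get("fail_parent_if_fails"):
            return True
        if task["status"] == "NOT_RUN" and task.get("fail_parent_if_not_run"):
            return True
    return False


tasks = [
    {"name": "s_load", "status": "SUCCEEDED", "fail_parent_if_fails": True},
    {"name": "cmd_cleanup", "status": "NOT_RUN", "fail_parent_if_not_run": True},
]
print(parent_is_failed(tasks))
```

For a session nested within several worklets, the same rule would need to hold at each level, because each worklet instance is itself a task in its parent.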


Validating Tasks

You can validate reusable tasks in the Task Developer. Or, you can validate task instances in the Workflow Designer. When you validate a task, the Workflow Manager validates task attributes and links. For example, the user-defined event you specify in an Event task must exist in the workflow.

The Workflow Manager uses the following rules to validate tasks:

♦ Assignment. The Workflow Manager validates the expression you enter for the Assignment task. For example, the Workflow Manager verifies that you assigned a matching datatype value to the workflow variable in the assignment expression.

♦ Command. The Workflow Manager does not validate the shell command you enter for the Command task.

♦ Event-Wait. If you choose to wait for a pre-defined event, the Workflow Manager verifies that you specified a file to watch. If you choose to use the Event-Wait task to wait for a user-defined event, the Workflow Manager verifies that you specified an event.

♦ Event-Raise. The Workflow Manager verifies that you specified a user-defined event for the Event-Raise task.

♦ Timer. The Workflow Manager verifies that the variable you specified for the Absolute Time setting has the Date/Time datatype.

♦ Start. The Workflow Manager verifies that you linked the Start task to at least one task in the workflow.

When a task instance is invalid, the workflow using the task instance becomes invalid. When a reusable task is invalid, it does not affect the validity of the task instance used in the workflow. However, if a Session task instance is invalid, the workflow may still be valid. The Workflow Manager validates sessions differently. For details, see “Validating a Session” on page 195.

To validate a task, select the task in the workspace and choose Tasks-Validate. Or, right-click the task in the workspace and choose Validate.


Working with the Assignment Task

The Assignment task allows you to assign a value to a user-defined workflow variable. To use an Assignment task in the workflow, first create and add the Assignment task to the workflow. Then configure the Assignment task to assign values or expressions to user-defined variables. After you assign a value to a variable using the Assignment task, the PowerCenter Server uses the assigned value for the variable during the remainder of the workflow.

You must create a variable before you can assign values to it. You cannot assign values to pre-defined workflow variables.

To create an Assignment task:

1. In the Workflow Designer, click the Assignment icon on the Tasks toolbar.

or

Choose Tasks-Create. Select Assignment Task for the task type.

2. Enter a name for the Assignment task. Click Create. Then click Done.

The Workflow Designer creates and adds the Assignment task to the workflow.

3. Double-click the Assignment task to open the Edit Task dialog box.

4. On the Expressions tab, click Add to add an assignment.

5. Click the Open button in the User Defined Variables field.



The Select Variable dialog box appears.

6. Select the variable for which you want to assign a value. Click OK.

7. Click the Edit button in the Expression field to open the Expression Editor.

The Expression Editor shows pre-defined workflow variables, user-defined workflow variables, variable functions, and boolean and arithmetic operators.

8. Enter the value or expression you want to assign. For example, if you want to assign the value 500 to the user-defined variable $$custno1, enter the number 500 in the Expression Editor.

Validate the expression before you close the Expression Editor.


9. Repeat steps 4-8 to add more variable assignments as necessary. Use the up and down arrows in the Expressions tab to change the order of the variable assignments.

10. Click OK.


Working with the Command Task

The Command task allows you to specify one or more shell commands to run during the workflow. For example, you can specify shell commands in the Command task to delete reject files, copy a file, or archive target files.

You can use a Command task in the following ways:

♦ Standalone Command task. You can use a Command task anywhere in the workflow or worklet to run shell commands.

♦ Pre- and post-session shell command. You can call a Command task as the pre- or post-session shell command for a Session task. For more information about specifying pre-session and post-session shell commands, see “Using Pre- or Post-Session Shell Commands” on page 188.

Note: You can use server variables or session variables in pre- and post-session shell commands. You cannot use server variables or session variables in standalone Command tasks. The PowerCenter Server does not expand server variables or session variables in standalone Command tasks.

Use any valid UNIX command or shell script for UNIX servers, or any valid DOS command or batch file for Windows servers.

For example, you might use a shell command to copy a file from one directory to another. For a Windows server you would use the following shell command to copy the SALES_ADJ file from the source drive, L, to the target drive, H:

copy L:\sales\sales_adj H:\marketing\

For a UNIX server, you would use the following command to perform a similar operation:

cp sales/sales_adj marketing/

Each shell command runs in the same environment (UNIX or Windows) as the PowerCenter Server. Environment settings in one shell command script do not carry over to other scripts. To run all shell commands in the same environment, call a single shell script that invokes other scripts.
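The following Python sketch demonstrates, on a UNIX system, why environment settings do not carry over between separately launched shell commands. It is an illustration only and does not use PowerCenter: the variable name PC_SKETCH_DIR and the helper run_in_new_shell are hypothetical, and the sketch assumes that sh is available and that the variable is not already set in the calling environment.

```python
import subprocess


def run_in_new_shell(command):
    """Run one command line in its own sh process and return its output."""
    result = subprocess.run(["sh", "-c", command], capture_output=True, text=True)
    return result.stdout.strip()


# Two separate commands: the export made in the first shell is lost,
# because the second command starts in a fresh shell.
run_in_new_shell("export PC_SKETCH_DIR=/tmp/demo")
separate = run_in_new_shell('echo "${PC_SKETCH_DIR:-unset}"')

# One shell that performs both steps: the setting is still visible.
combined = run_in_new_shell('export PC_SKETCH_DIR=/tmp/demo; echo "${PC_SKETCH_DIR:-unset}"')

print(separate)
print(combined)
```

The second case corresponds to calling a single shell script that invokes the other scripts, so that all of them share one environment.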

Using Session Parameters

You can use session parameters in pre- or post-session shell commands. For example, you might use an input file parameter instead of hard-coding the name of a source file.

Creating a Command Task

Perform the following steps to create a Command task.

To create a Command task:

1. In the Workflow Designer or the Task Developer, click the Command Task icon on the Tasks toolbar.

or

Choose Tasks-Create. Select Command Task for the task type.

2. Enter a name for the Command task. Click Create. Then click Done.

3. Double-click the Command task in the workspace to open the Edit Tasks dialog box.

4. In the Commands tab, click the Add button to add a command.

5. In the Name field, enter a name for the new command.



6. In the Command field, click the Edit button to open the Command Editor.

7. Enter the command you want to perform. Enter only one command in the Command Editor.

8. Click OK to close the Command Editor.

9. Repeat steps 4-8 to add more commands in the task.

10. Click OK.

If you specify non-reusable shell commands for a session, you can promote the non-reusable shell commands to a reusable Command task. For details, see “Creating a Reusable Command Task from Pre- or Post-Session Commands” on page 191.

Executing Commands in the Command Task

The PowerCenter Server processes the shell commands in the order you specify them. You can choose to run a command only if the previous command completed successfully. Or, you can choose to run all commands in the Command task, regardless of the result of the previous command. If you configure multiple commands in a Command task to run on UNIX, each command runs in a separate shell.

To run the next command only if the previous command completes successfully, select the “Run If Previous Completed” option in the Properties tab of the Command task.

If you select the Run If Previous Completed option, when one of the commands in the Command task fails, the PowerCenter Server stops running the rest of the commands and fails the task. If you do not select the Run If Previous Completed option, the PowerCenter Server runs all the commands in the Command task and treats the task as completed, even if a command fails.
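The effect of the Run If Previous Completed option can be modeled with a short Python sketch. Each command is represented as a name paired with a flag that says whether it succeeds. The function name, the command names, and the status strings "FAILED" and "COMPLETED" are assumptions made for illustration.

```python
def run_command_task(commands, run_if_previous_completed):
    """Return (names of commands executed, resulting task status).

    commands is a list of (name, succeeds) pairs in execution order.
    """
    executed = []
    for name, succeeds in commands:
        executed.append(name)
        if not succeeds and run_if_previous_completed:
            # Remaining commands are skipped and the task fails.
            return executed, "FAILED"
    # Without the option, every command runs and the task completes.
    return executed, "COMPLETED"


commands = [("copy_file", True), ("archive", False), ("cleanup", True)]
print(run_command_task(commands, run_if_previous_completed=True))
print(run_command_task(commands, run_if_previous_completed=False))
```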


Figure 5-3 shows the Run If Previous Completed option:

Figure 5-3. Run If Previous Completed Option


Working with the Control Task

You can use the Control task to stop, abort, or fail the top-level workflow or the parent workflow based on an input link condition. A parent workflow or worklet is the workflow or worklet that contains the Control task.

To create a Control task:

1. In the Workflow Designer, click the Control Task icon on the Tasks toolbar.

or

Choose Tasks-Create. Select Control Task for the task type.

2. Enter a name for the Control task. Click Create. Then click Done.

The Workflow Manager creates and adds the Control task to the workflow.

3. Double-click the Control task in the workspace to open it.


4. Configure control options on the Properties tab.

You can choose from the following control options:

| Control Option | Description |
|---|---|
| Fail Me | Marks the Control task as “Failed.” The PowerCenter Server fails the Control task if you choose this option. If you choose Fail Me in the Properties tab and choose Fail Parent If This Task Fails in the General tab, the PowerCenter Server fails the parent workflow. |
| Fail Parent | Marks the status of the workflow or worklet that contains the Control task as failed after the workflow or worklet completes. |
| Stop Parent | Stops the workflow or worklet that contains the Control task. |
| Abort Parent | Aborts the workflow or worklet that contains the Control task. |
| Fail Top-Level Workflow | Fails the workflow that is running. |
| Stop Top-Level Workflow | Stops the workflow that is running. |
| Abort Top-Level Workflow | Aborts the workflow that is running. |


Working with the Decision Task

The Decision task allows you to enter a condition that determines the execution of the workflow, similar to a link condition. The Decision task has a pre-defined variable called $Decision_task_name.condition that represents the result of the decision condition. The PowerCenter Server evaluates the condition in the Decision task and sets the pre-defined condition variable to True (1) or False (0).

You can specify one decision condition per Decision task.

After the PowerCenter Server evaluates the Decision task, you can use the pre-defined condition variable in other expressions in the workflow to help you develop the workflow.

Depending on the workflow, you might use link conditions instead of a Decision task. However, the Decision task simplifies the workflow. For details on link conditions, see “Working with Links” on page 92.

If you do not specify a condition in the Decision task, the PowerCenter Server evaluates the Decision task to True.

Using the Decision Task

You can use the Decision task instead of multiple link conditions in a workflow. Instead of specifying multiple link conditions, use the pre-defined Condition variable in a Decision task to simplify link conditions.

Example

For example, you have a Command task that depends on the status of the three sessions in the workflow. You want the PowerCenter Server to run the Command task when any of the three sessions fails. To accomplish this, use a Decision task with the following decision condition:

$Q1_session.status = FAILED OR $Q2_session.status = FAILED OR $Q3_session.status = FAILED

You can then use the pre-defined condition variable in the input link condition of the Command task. Configure the input link with the following link condition:

$Decision.condition = True
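The logic of this example can be mirrored in a few lines of Python. The sketch below is not PowerCenter expression syntax. It models the decision condition as a function over session statuses that returns 1 (True) or 0 (False), as the pre-defined condition variable does, and models the link condition as a comparison against True. The function names and the "SUCCEEDED" status string are assumptions.

```python
def decision_condition(session_status):
    """Return 1 if any of the three sessions failed, otherwise 0."""
    failed = (
        session_status["Q1_session"] == "FAILED"
        or session_status["Q2_session"] == "FAILED"
        or session_status["Q3_session"] == "FAILED"
    )
    return int(failed)


def command_task_runs(condition):
    """Link condition on the Command task: $Decision.condition = True."""
    return condition == True  # noqa: E712, mirrors the link condition as written


statuses = {"Q1_session": "SUCCEEDED", "Q2_session": "FAILED", "Q3_session": "SUCCEEDED"}
condition = decision_condition(statuses)
print(condition, command_task_runs(condition))
```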


Figure 5-4 shows the example workflow using a Decision task:

You can configure the same logic in the workflow without the Decision task. Without the Decision task, you need to use three link conditions and treat the input links to the Command task as OR links.

Figure 5-5 shows the example workflow without the Decision task:

You can further expand the example workflow in Figure 5-4. In Figure 5-4, the PowerCenter Server runs the Command task if any of the three Session tasks fails. Suppose now you want the PowerCenter Server to also run an Email task if all three Session tasks succeed.

Figure 5-4. Example Workflow Using a Decision Task

Figure 5-5. Example Workflow without a Decision Task


To do this, add an Email task and use the decision condition variable in the link condition. Figure 5-6 shows the expanded example workflow using a Decision task:

Creating a Decision Task

Perform the following steps to create a Decision task.

To create a Decision task:

1. In the Workflow Designer, click the Decision Task icon on the Tasks toolbar.

or

Choose Tasks-Create. Select Decision Task for the task type.

2. Enter a name for the Decision task. Click Create. Then click Done.

The Workflow Designer creates and adds the Decision task to the workspace.

Figure 5-6. Expanded Example Workflow Using a Decision Task

$Decision.condition = True

$Decision.condition = False

Decision Task Icon

Working with the Decision Task 151

3. Double-click the Decision task to open it.

4. Click the Open button in the Value field to open the Expression Editor.

5. In the Expression Editor, enter the condition you want the PowerCenter Server to evaluate.

Validate the expression before you close the Expression Editor.

6. Click OK.


Working with Event Tasks

You can define events in the workflow to specify the sequence of task execution. The event is triggered based on the completion of the sequence of tasks. Use the following tasks to help you use events in the workflow:

♦ Event-Raise task. Event-Raise task represents a user-defined event. When the PowerCenter Server runs the Event-Raise task, the Event-Raise task triggers the event. Use the Event-Raise task with the Event-Wait task to define events.

♦ Event-Wait task. The Event-Wait task waits for an event to occur. Once the event triggers, the PowerCenter Server continues executing the rest of the workflow.

To coordinate the execution of the workflow, you may specify the following types of events for the Event-Wait and Event-Raise tasks:

♦ Pre-defined event. A pre-defined event is a file-watch event. For pre-defined events, use an Event-Wait task to instruct the PowerCenter Server to wait for the specified indicator file to appear before continuing with the rest of the workflow. When the PowerCenter Server locates the indicator file, it starts the next task in the workflow.

♦ User-defined event. A user-defined event is the sequence of tasks in the branch from the Start task leading to the Event-Raise task. Use an Event-Raise task to specify the location of the user-defined event in the workflow.

When all the tasks in the branch from the Start task to the Event-Raise task complete, the Event-Raise task triggers the event. The Event-Wait task waits for the Event-Raise task to trigger the event before continuing with the rest of the tasks in its branch.

Example of User-Defined Events

Say you have four sessions you want to run in a workflow. You want Q1_session and Q2_session to run concurrently to save time. You also want to run Q3_session after Q1_session completes. You want to run Q4_session only when Q1_session, Q2_session, and Q3_session complete.

Figure 5-7 shows how to accomplish this using the Event-Raise and Event-Wait tasks:

Figure 5-7. Example of User-Defined Event (user-defined event: Q1Q3_Complete)


Perform the following steps to configure the workflow shown in Figure 5-7:

1. Link Q1_session and Q2_session concurrently.

2. Add Q3_session after Q1_session.

3. Declare an event called Q1Q3_Complete in the Events tab of the workflow properties.

4. In the workspace, add an Event-Raise task after Q3_session.

5. Specify the Q1Q3_Complete event in the Event-Raise task properties. This allows the Event-Raise task to trigger the event when Q1_session and Q3_session complete.

6. Add an Event-Wait task after Q2_session.

7. Specify the Q1Q3_Complete event for the Event-Wait task.

8. Add Q4_session after the Event-Wait task. When the PowerCenter Server processes the Event-Wait task, it waits until the Event-Raise task triggers Q1Q3_Complete before it runs Q4_session.

The PowerCenter Server runs the workflow shown in Figure 5-7 in the following order:

1. The PowerCenter Server runs Q1_session and Q2_session concurrently.

2. When Q1_session completes, the PowerCenter Server runs Q3_session.

3. The PowerCenter Server finishes executing Q2_session.

4. The Event-Wait task waits for the Event-Raise task to trigger the event.

5. The PowerCenter Server completes Q3_session.

6. The Event-Raise task triggers the event, Q1Q3_Complete.

7. The PowerCenter Server runs Q4_session because the event, Q1Q3_Complete, has been triggered.

8. The PowerCenter Server runs the Email task.
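The ordering above can be imitated with two threads and a shared event object. The following Python sketch is an analogy only, not PowerCenter code: the sleep durations and helper names are assumptions, and the Email task is left out. One branch runs Q1_session and Q3_session and then raises the event, while the other branch runs Q2_session, waits for the event, and then runs Q4_session.

```python
import threading
import time

# Stands in for the declared user-defined event Q1Q3_Complete.
q1q3_complete = threading.Event()
run_order = []
order_lock = threading.Lock()

def run_session(name, seconds):
    """Placeholder for real session work; records the completion order."""
    time.sleep(seconds)
    with order_lock:
        run_order.append(name)

def branch_one():
    run_session("Q1_session", 0.05)
    run_session("Q3_session", 0.05)
    q1q3_complete.set()       # Event-Raise task triggers the event

def branch_two():
    run_session("Q2_session", 0.02)
    q1q3_complete.wait()      # Event-Wait task blocks until the event triggers
    run_session("Q4_session", 0.01)

threads = [threading.Thread(target=branch_one), threading.Thread(target=branch_two)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(run_order)  # Q4_session is always last
```

Note that threading.Event remembers that it was set, so a wait that starts late still returns. That behavior resembles the Enable Past Events option more than the default Event-Wait behavior.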

Working with Event-Raise Tasks

The Event-Raise task represents the location of a user-defined event. A user-defined event is the sequence of tasks in the branch from the Start task to the Event-Raise task. When the PowerCenter Server runs the Event-Raise task, the Event-Raise task triggers the user-defined event.

To use an Event-Raise task, you must first declare the user-defined event. Then, create an Event-Raise task in the workflow to represent the location of the user-defined event you just declared. In the Event-Raise task properties, specify the name of a user-defined event.


Declaring a User-Defined Event

Perform the following steps to declare a name for a user-defined event.

To declare a user-defined event:

1. In the Workflow Designer, select Workflow-Edit to open the workflow properties.

2. Select the Events tab in the Edit Workflow dialog box.

3. Click Add to add an event name. The event name is not case-sensitive.

4. Click OK.

Using the Event-Raise Task For a User-Defined Event

After you declare a user-defined event, use the Event-Raise task to represent the location of the event and to trigger the event.

Perform the following steps to use an Event-Raise task.

To use an Event-Raise task:

1. In the Workflow Designer workspace, create an Event-Raise task and place it in the workflow to represent the user-defined event you want to trigger. A user-defined event is the sequence of tasks in the branch from the Start task to the Event-Raise task.



2. Double-click the Event-Raise task to open it.

3. Click the Open button in the Value field on the Properties tab to open the Events Browser for user-defined events.

4. Choose an event in the Events Browser.

5. Click OK twice to return to the workspace.

Working With Event-Wait Tasks

The Event-Wait task waits for a pre-defined event or a user-defined event. A pre-defined event is a file-watch event. When you use the Event-Wait task to wait for a pre-defined event, you specify an indicator file for the PowerCenter Server to watch. The PowerCenter Server waits for the indicator file to appear. Once the indicator file appears, the PowerCenter Server continues executing tasks after the Event-Wait task.

Do not use the Event-Raise task to trigger the event when you wait for a pre-defined event.

You can also use the Event-Wait task to wait for a user-defined event. To use the Event-Wait task for a user-defined event, you specify the name of the user-defined event in the Event-Wait task properties. The PowerCenter Server waits for the Event-Raise task to trigger the user-defined event. Once the user-defined event is triggered, the PowerCenter Server continues running tasks after the Event-Wait task.

Waiting for User-Defined Events

You can use the Event-Wait task to wait for a user-defined event. A user-defined event is triggered by the Event-Raise task. To wait for a user-defined event, you must first use an Event-Raise task to trigger the user-defined event.

To wait for a user-defined event:

1. In the workflow, create an Event-Wait task and double-click the Event-Wait task to open the Edit Task dialog box.

2. In the Events tab of the Edit Tasks dialog box, select User-Defined.



3. Click the Event button to open the Events Browser dialog box.

4. Select the user-defined event that you want the PowerCenter Server to wait for.

5. Click OK twice.

Waiting for Pre-Defined Events

To use a pre-defined event, you need a shell command, script, or batch file to create an indicator file. The file must be created or sent to a directory local to the PowerCenter Server. The file can be any format recognized by the PowerCenter Server operating system. You can choose to have the PowerCenter Server delete the indicator file after it detects the file, or you can manually delete the indicator file. The PowerCenter Server marks the status of the Event-Wait task as failed if it cannot delete the indicator file.

When you specify the indicator file in the Event-Wait task, enter the directory in which the file will appear and the name of the indicator file. You must provide the absolute path for the file. The directory must be local to the PowerCenter Server. If you only specify the file name and not the directory, the PowerCenter Server looks for the indicator file in the system directory. For example, on Windows 2000, the system directory is c:\winnt\system32.

You can enter the actual name of the file or use server variables to specify the location of the files. For more information on server variables, see “Server Variables” on page 46.

The PowerCenter Server writes the time the file appears in the workflow log.

Note: Do not use a source or target file name as the indicator file name.
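As an illustration of the kind of script that can create an indicator file, here is a minimal Python sketch. It is not part of PowerCenter: the function names, the file name orders_ready.ind, and the polling interval are assumptions. The second function only imitates a file watch so that the example can be run on its own; it does not describe how the PowerCenter Server detects the file internally.

```python
import os
import tempfile
import time

def create_indicator_file(path):
    """Create an empty indicator file at the absolute path `path`."""
    if not os.path.isabs(path):
        raise ValueError("use an absolute path for the indicator file")
    with open(path, "w"):
        pass  # the file content does not matter, only its presence

def wait_for_indicator_file(path, delete_after=False, poll_seconds=0.1, timeout=5.0):
    """Poll until the file appears; optionally delete it once detected."""
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
    if delete_after:
        os.remove(path)  # comparable to selecting Delete Filewatch File
    return True

# Hypothetical file name in a directory local to the machine running the script.
indicator = os.path.join(tempfile.gettempdir(), "orders_ready.ind")
create_indicator_file(indicator)
print(wait_for_indicator_file(indicator, delete_after=True))  # True
```

Choose an indicator file name that cannot collide with a source or target file name.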

Perform the following steps to wait for a pre-defined event in the workflow.

To wait for a pre-defined event:

1. Create an Event-Wait task and double-click the Event-Wait task to open it.


2. In the Events tab of the Edit Task dialog box, select Pre-defined.

3. Enter the path of the indicator file.

4. If you want the PowerCenter Server to delete the indicator file after it detects the file, select the Delete Filewatch File option in the Properties tab.

5. Click OK.

Enabling Past Events

By default, the Event-Wait task waits for the Event-Raise task to trigger the event and does not check whether the event already occurred. You can select the Enable Past Events option so that the PowerCenter Server checks whether the event has already occurred.


When you select Enable Past Events, the PowerCenter Server continues executing the next tasks if the event already occurred.

Select the Enable Past Events option in the Properties tab of the Event-Wait task.
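The difference between the default behavior and Enable Past Events can be modeled in a few lines. The following Python sketch is a simplified illustration: the classes and the trigger counter are assumptions made for the example, not PowerCenter objects.

```python
class UserDefinedEvent:
    """Hypothetical model of a declared workflow event."""

    def __init__(self, name):
        self.name = name
        self.trigger_count = 0

    def trigger(self):
        """Called by the Event-Raise task in this model."""
        self.trigger_count += 1


class EventWait:
    """Hypothetical model of an Event-Wait task."""

    def __init__(self, event, enable_past_events=False):
        self.event = event
        self.enable_past_events = enable_past_events
        self.seen_at_start = 0

    def start(self):
        # Remember how many triggers happened before the wait began.
        self.seen_at_start = self.event.trigger_count

    def can_continue(self):
        if self.enable_past_events and self.event.trigger_count > 0:
            return True
        # Default: only a trigger that arrives after the wait started counts.
        return self.event.trigger_count > self.seen_at_start


event = UserDefinedEvent("Q1Q3_Complete")
event.trigger()                      # the event occurs before either wait starts

default_wait = EventWait(event)
past_wait = EventWait(event, enable_past_events=True)
default_wait.start()
past_wait.start()

print(default_wait.can_continue())   # False: the earlier trigger is ignored
print(past_wait.can_continue())      # True: the past event is honored
```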


Working with the Timer Task

The Timer task allows you to specify the period of time to wait before the PowerCenter Server runs the next task in the workflow. You can choose to start the next task in the workflow at an exact time and date. You can also choose to wait a period of time after the start time of another task, workflow, or worklet before starting the next task.

The Timer task has two types of settings:

♦ Absolute time. You specify the exact time that the PowerCenter Server starts running the next task in the workflow. You may specify the exact date and time, or you can choose a user-defined workflow variable to specify the exact time.

♦ Relative time. You instruct the PowerCenter Server to wait for a specified period of time after the Timer task, the parent workflow, or the top-level workflow starts.

For example, you may have two sessions in the workflow. You want the PowerCenter Server to wait ten minutes after the first session completes before it runs the second session. Use a Timer task after the first session. In the Relative Time setting of the Timer task, specify ten minutes from the start time of the Timer task.

Figure 5-8 shows the example workflow using the Timer task:

Figure 5-8. Example Workflow Using the Timer Task

You can use a Timer task anywhere in the workflow after the Start task.

To create a Timer task:

1. In the Workflow Designer, click the Timer task icon on the Tasks toolbar.

or

Choose Tasks-Create. Select Timer Task for the task type.

2. Double-click the Timer task to open it.

3. On the General tab, enter a name for the Timer task.

4. Click the Timer tab to specify when the PowerCenter Server starts the next task in the workflow.

Specify attributes for Absolute Time or Relative Time described in Table 5-2:

Table 5-2. Timer Task Attributes

Timer Attribute | Description
Absolute Time: Specify the exact time to start | The PowerCenter Server starts the next task in the workflow at the exact date and time you specify.
Absolute Time: Use this workflow date-time variable to calculate the wait | Specify a user-defined date-time workflow variable. The PowerCenter Server starts the next task in the workflow at the time you choose. The Workflow Manager verifies that the variable you specify has the Date/Time datatype. The Timer task fails if the date-time workflow variable evaluates to NULL.
Relative time: Start after | Specify the period of time the PowerCenter Server waits to start executing the next task in the workflow.
Relative time: from the start time of this task | Choose this option to wait a specified period of time after the start time of the Timer task to run the next task.
Relative time: from the start time of the parent workflow/worklet | Choose this option to wait a specified period of time after the start time of the parent workflow/worklet to run the next task.
Relative time: from the start time of the top-level workflow | Choose this option to wait a specified period of time after the start time of the top-level workflow to run the next task.
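The arithmetic behind the Timer task settings can be sketched as follows. This Python example is an illustration under stated assumptions: the function names, the dictionary keys, and the sample timestamps are hypothetical, and the sketch computes only the time at which the next task would start.

```python
from datetime import datetime, timedelta

def absolute_start(date_time_variable):
    """Absolute time: start at the value of a date-time workflow variable."""
    if date_time_variable is None:
        # Per Table 5-2, a NULL date-time variable makes the Timer task fail.
        raise ValueError("date-time workflow variable evaluates to NULL")
    return date_time_variable

def relative_start(wait, relative_to, start_times):
    """Relative time: start `wait` after one of three reference start times."""
    if relative_to not in ("timer_task", "parent", "top_level_workflow"):
        raise ValueError(f"unknown reference: {relative_to}")
    return start_times[relative_to] + wait

# Hypothetical start times for one workflow run.
start_times = {
    "timer_task": datetime(2004, 8, 2, 9, 30),
    "parent": datetime(2004, 8, 2, 9, 0),
    "top_level_workflow": datetime(2004, 8, 2, 8, 0),
}

# Ten minutes from the start time of the Timer task, as in the example above.
next_task = relative_start(timedelta(minutes=10), "timer_task", start_times)
print(next_task)  # 2004-08-02 09:40:00
```

Because the Timer task in the example follows the first session, its start time approximates the completion time of that session, which is why the ten-minute wait is measured from the Timer task.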


Chapter 6

Working with Worklets


♦ Overview

♦ Developing a Worklet

♦ Using Worklet Variables

♦ Validating Worklets

Overview

A worklet is an object that represents a set of tasks. It can contain any task available in the Workflow Manager. You can run worklets inside a workflow. The workflow that contains the worklet is called the parent workflow. You can also nest a worklet in another worklet.

Create a worklet when you want to reuse a set of workflow logic in several workflows. Use the Worklet Designer to create and edit worklets.

When the PowerCenter Server runs a worklet, it expands the worklet. The PowerCenter Server then runs the worklet as it would any other workflow, executing tasks and evaluating links in the worklet.

The worklet does not contain any scheduling or server information. To run a worklet, include the worklet in a workflow. The worklet runs on the PowerCenter Server you choose for the workflow. The Workflow Manager does not provide a parameter file or log file for worklets. The PowerCenter Server writes information about worklet execution in the workflow log.

Suspending Worklets

When you choose Suspend On Error for the parent workflow, the PowerCenter Server also suspends the worklet if a task in the worklet fails. When a task in the worklet fails, the PowerCenter Server stops executing the failed task and other tasks in its path. If no other task is running in the worklet, the worklet status is “Suspended.” If one or more tasks are still running in the worklet, the worklet status is “Suspending.” The PowerCenter Server suspends the parent workflow when the status of the worklet is “Suspended” or “Suspending.”

For details on suspending workflows, see “Suspending the Workflow” on page 127.
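The suspension rule can be restated as a small decision function. The Python sketch below is only a model of the rule described in this section, with hypothetical function names. It assumes that Suspend On Error is selected for the parent workflow and that a task in the worklet has already failed.

```python
def worklet_status_on_failure(other_tasks_running):
    """Status of a worklet after one of its tasks fails under Suspend On Error."""
    return "Suspending" if other_tasks_running > 0 else "Suspended"

def parent_workflow_suspends(worklet_status):
    """The parent workflow suspends for either worklet status."""
    return worklet_status in ("Suspended", "Suspending")

print(worklet_status_on_failure(other_tasks_running=0))  # Suspended
print(worklet_status_on_failure(other_tasks_running=2))  # Suspending
```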


Developing a Worklet

To develop a worklet, you must first create a worklet. After you create a worklet, configure worklet properties and add tasks to the worklet. You can create reusable worklets in the Worklet Designer. You can also create non-reusable worklets in the Workflow Designer as you develop the workflow.

Creating a Reusable Worklet

Create reusable worklets in the Worklet Designer. You can view a list of reusable worklets in the Navigator Worklets node.

To create a reusable worklet:

1. In the Worklet Designer, choose Worklets-Create. The Create Worklet dialog box appears.

2. Enter a name for the worklet.

3. Click OK.

The Worklet Designer creates a Start task in the worklet.

Creating a Non-Reusable Worklet

You can create non-reusable worklets in the Workflow Designer as you develop the workflow. Non-reusable worklets only exist in the workflow. You cannot use a non-reusable worklet in another workflow. After you create the worklet in the Workflow Designer, open the worklet to edit it in the Worklet Designer.


You can promote non-reusable worklets to reusable worklets by selecting the Reusable option in the worklet properties. To rename non-reusable worklets, open the worklet properties in the Workflow Designer.

To create a non-reusable worklet:

1. In the Workflow Designer, open a workflow.

2. Choose Tasks-Create.

3. Select Worklet for the Task type.

4. Enter a name for the worklet.

5. Click Create.

The Workflow Designer creates the worklet and adds it to the workspace.

6. Click Done.

Configuring Worklet Properties

When you use a worklet in a workflow, you can configure the same set of general task settings on the General tab as any other task. For example, you can make a worklet reusable, disable a worklet, configure the input link to the worklet, or fail the parent workflow based on the worklet. For details on these task settings, see “Configuring Tasks” on page 135.

In addition to general task settings, you can configure the following worklet properties:

♦ Worklet variables. Use worklet variables to reference values and record information. You use worklet variables the same way you use workflow variables. You can assign a workflow variable to a worklet variable to override its initial value.

For details on worklet variables, see “Using Worklet Variables” on page 169.

♦ Events. To use the Event-Wait and Event-Raise tasks in the worklet, you must first declare an event in the worklet properties.

♦ Metadata extension. Extend the metadata stored in the repository by associating information with repository objects. For details, see “Working with Metadata Extensions” on page 82.

Adding Tasks in Worklets

After you create a new worklet, add tasks by opening the worklet in the Worklet Designer. A worklet must contain a Start task. The Start task represents the beginning of a worklet. When you create a worklet, the Worklet Designer automatically creates a Start task for you.

To add tasks to a non-reusable worklet:

1. Create a non-reusable worklet in the Workflow Designer workspace.

2. Right-click the worklet and choose Open Worklet.


3. The Worklet Designer opens so you can add tasks in the worklet.

4. Add tasks in the worklet by using the Tasks toolbar or choose Tasks-Create in the Worklet Designer.

5. Connect tasks with links.

Declaring Events in Worklets

Similar to workflows, you can use Event-Wait and Event-Raise tasks in a worklet. To use the Event-Raise task, you first declare a user-defined event in the worklet. Events in one instance of a worklet do not affect events in other instances of the worklet. You cannot specify worklet events in the Event tasks in the parent workflow.

For more information about using event tasks, see “Working with Event Tasks” on page 153.

Viewing Links in a Worklet

When you edit a workflow or worklet, you can view the forward or backward link paths to other tasks. You can highlight paths to see links in the workflow branch from the Start task to the last task in the branch. For details, see “Developing Workflows” on page 91.

Nesting Worklets

You can nest a worklet within another worklet. When you run a workflow containing nested worklets, the PowerCenter Server runs the nested worklet from within the parent worklet. You can group several worklets together by function or simplify the design of a complex workflow when you nest worklets.

You might choose to nest worklets to load data to fact and dimension tables. Create a nested worklet to load fact and dimension data into a staging area. Then, create a nested worklet to load the fact and dimension data from the staging area to the data warehouse.

You might choose to nest worklets to simplify the design of a complex workflow. Nest worklets that can be grouped together within one worklet. In the workflow in Figure 6-1, two worklets relate to regional sales and two worklets relate to quarterly sales.

Figure 6-1 shows a workflow that uses multiple worklets:

Figure 6-1. Workflow with Multiple Worklets

The workflow in Figure 6-2 is the same workflow with the worklets grouped and nested in parent worklets.

Figure 6-2 shows a workflow that uses nested worklets:

Figure 6-2. Workflow with Nested Worklets

Creating Nested Worklets

From the Worklet Designer, open the parent worklet. To nest an existing reusable worklet, choose Tasks-Insert Worklet. To create a non-reusable nested worklet, choose Tasks-Create, and select Worklet.


Using Worklet Variables

Worklet variables are similar to workflow variables. A worklet has the same set of pre-defined variables as any task. You can also create user-defined worklet variables. Like user-defined workflow variables, user-defined worklet variables can be persistent or non-persistent. For details on workflow variables, see “Using Workflow Variables” on page 103.

You cannot use variables from the parent workflow in the worklet. Similarly, you cannot use user-defined worklet variables in the parent workflow. However, you can use pre-defined worklet variables in the parent workflow, just as you can use pre-defined variables for other tasks in the workflow.

Persistent Worklet Variables

User-defined worklet variables can be persistent or non-persistent. To create a persistent worklet variable, select Persistent when you create the variable. When you create a persistent worklet variable, the worklet variable retains its value the next time the PowerCenter Server executes the worklet instance in the parent workflow.

For example, you might have a worklet with a persistent variable. Use two instances of the worklet in a workflow to run the worklet twice. You name the first instance of the worklet Worklet1 and the second instance Worklet2.

Figure 6-3 shows the example workflow:

Figure 6-3. Example of Persistent Worklet Variable

When you run the example workflow shown in Figure 6-3, the persistent worklet variable retains its value from Worklet1 and becomes the initial value in Worklet2. After the PowerCenter Server executes Worklet2, it retains the value of the persistent variable in the repository and uses the value the next time you run the workflow.

Worklet variables only persist when you run the same workflow. A worklet variable does not retain its value when you use instances of the worklet in different workflows.

Overriding Initial Value

For each worklet instance, you can override the initial value of the worklet variable by assigning a workflow variable to it.

To override the initial value of a worklet variable:

1. Double-click the worklet instance in the Workflow Designer workspace.

2. On the Parameters tab, click the Add button.

3. Click the Open button in the User-Defined Worklet Variables field to select a worklet variable.

4. Click the Open button in the Parent Workflow Variable field to select a workflow variable to assign to the worklet variable.

5. Click Apply.

The worklet variable in this worklet instance now has the selected workflow variable as its initial value.
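The persistence and override behavior described in this section can be modeled with a dictionary that stands in for the repository. The Python sketch below is a simplified illustration: the function, the workflow and worklet names, and the increment are assumptions, and the way an override interacts with a previously saved value is a modeling choice rather than documented behavior.

```python
# Hypothetical stand-in for persistent variable values saved in the repository.
repository = {}

def run_worklet(workflow, worklet, increment, default=0, override=None):
    """Run one worklet instance and return the final value of its variable."""
    key = (workflow, worklet)        # values persist within one workflow only
    if override is not None:
        initial = override           # parent workflow variable sets the initial value
    else:
        initial = repository.get(key, default)
    final = initial + increment      # placeholder for work that changes the variable
    repository[key] = final          # persistent: keep the value for the next run
    return final

# Worklet1 and Worklet2 are two instances of the same worklet in one workflow.
print(run_worklet("wf_sales", "wl_load", increment=1))   # 1 (Worklet1)
print(run_worklet("wf_sales", "wl_load", increment=1))   # 2 (Worklet2 starts from 1)

# The same worklet in a different workflow does not see the saved value.
print(run_worklet("wf_other", "wl_load", increment=1))   # 1

# Overriding the initial value with a parent workflow variable.
print(run_worklet("wf_sales", "wl_load", increment=1, override=10))  # 11
```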



Validating Worklets

The Workflow Manager validates worklets when you save the worklet in the Worklet Designer. In addition, when you use worklets in a workflow, the PowerCenter Server validates the workflow according to the following validation rules at runtime:

♦ You cannot run two instances of the same worklet concurrently in the same workflow.

♦ You cannot run two instances of the same worklet concurrently across two different workflows.

♦ Each worklet instance in the workflow can run only once.

When a worklet instance is invalid, the workflow using the worklet instance remains valid. For details on workflow validation rules, see “Validating a Workflow” on page 119.

The Workflow Manager displays a red invalid icon if the worklet object is invalid. The Workflow Manager validates the worklet object using the same validation rules for workflows. The Workflow Manager displays a blue invalid icon if the worklet instance in the workflow is invalid. The worklet instance may be invalid when any of the following conditions occurs:

♦ The parent workflow or worklet variable you assign to the user-defined worklet variable does not have a matching datatype.

♦ The user-defined worklet variable you used in the worklet properties does not exist.

♦ You do not specify the parent workflow or worklet variable you want to assign.

For non-reusable worklets, you may see both red and blue invalid icons displayed over the worklet icon in the Navigator.
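The three conditions that can make a worklet instance invalid can be expressed as a simple check. The Python sketch below is an illustration only: the data structures, variable names, and datatype strings are assumptions and do not reflect how the Workflow Manager stores or validates metadata.

```python
def worklet_instance_problems(worklet_vars, parent_vars, assignments):
    """Return a list of problems for the variable assignments of one instance.

    worklet_vars and parent_vars map variable names to datatypes.
    assignments maps a worklet variable name to a parent variable name or None.
    """
    problems = []
    for worklet_var, parent_var in assignments.items():
        if worklet_var not in worklet_vars:
            problems.append(f"{worklet_var}: worklet variable does not exist")
        elif not parent_var:
            problems.append(f"{worklet_var}: no parent variable specified")
        elif parent_vars.get(parent_var) != worklet_vars[worklet_var]:
            # An unknown parent variable is also reported as a mismatch here.
            problems.append(f"{worklet_var}: datatype does not match {parent_var}")
    return problems

# Hypothetical variables and assignments.
worklet_vars = {"row_count": "integer", "load_date": "date/time"}
parent_vars = {"wf_count": "integer", "wf_name": "string"}
assignments = {
    "row_count": "wf_count",   # valid
    "load_date": "wf_name",    # datatype mismatch
    "batch_id": "wf_count",    # worklet variable does not exist
}
for problem in worklet_instance_problems(worklet_vars, parent_vars, assignments):
    print(problem)
```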



Chapter 7

Working with Sessions


♦ Overview

♦ Creating a Session Task

♦ Editing a Session

♦ Creating a Session Configuration Object

♦ Using Pre- and Post-Session SQL Commands

♦ Using Pre- or Post-Session Shell Commands

♦ Using Post-Session Email

♦ Validating a Session

♦ Running the Session

♦ Stopping and Aborting a Session

♦ Mapping Parameters and Variables in Sessions

♦ Handling High Precision Data

Overview

A session is a set of instructions that tells the PowerCenter Server how and when to move data from sources to targets. A session is a type of task, similar to other tasks available in the Workflow Manager. In the Workflow Manager, you configure a session by creating a Session task. To run a session, you must first create a workflow to contain the Session task.

When you create a Session task, you enter general information such as the session name, session schedule, and the PowerCenter Server to run the session. You can also select options to execute pre-session shell commands, send On-Success or On-Failure email, and use FTP to transfer source and target files.

Using session properties, you can also override parameters established in the mapping, such as source and target location, source and target type, error tracing levels, and transformation attributes. When you assign a server in a server grid to a session, the server you specify at the session level overrides the server you specify at the workflow level.

You can run as many sessions in a workflow as you need. You can run the Session tasks sequentially or concurrently, depending on your needs.

The PowerCenter Server creates several files and in-memory caches depending on the transformations and options used in the session. For more details on session output files and caches, see “Output Files and Caches” on page 28.


Creating a Session Task

You create a Session task for each mapping you want the PowerCenter Server to run. The PowerCenter Server uses the instructions configured in the session to move data from sources to targets.

You can create a reusable Session task in the Task Developer. You can also create non-reusable Session tasks in the Workflow Designer as you develop the workflow. After you create the session, you can edit the session properties at any time.

Note: Before you create a Session task, you must configure the Workflow Manager to communicate with databases and the PowerCenter Server. You must assign appropriate permissions for any database, FTP, or external loader connections you configure. For details on configuring the Workflow Manager, see “Configuring the Workflow Manager” on page 37.

Session Privileges

To create sessions, you must have one of the following sets of privileges and permissions:

♦ Use Workflow Manager privilege with read, write, and execute permissions


You must have read permission for connection objects associated with the session in addition to the above privileges and permissions.

PowerCenter allows you to set a read-only privilege for sessions. The Workflow Operator privilege allows a user to view, start, stop, and monitor sessions without being able to edit session properties.

Steps to Create a Session Task

Create the Session task in the Task Developer or the Workflow Designer. Session tasks created in the Task Developer are reusable. For more information about reusable tasks and other general information about workflow tasks, see “Reusable Workflow Tasks” on page 135.

To create a Session task:

1. In the Workflow Designer, click the Session Task icon on the Tasks toolbar.

or

Choose Tasks-Create. Select Session Task for the task type.

2. Enter a name for the Session task.

3. Click Create. The Mappings dialog box appears.


4. Select the mapping you want to use in the Session task and click OK.

5. Click Done. The Session task appears in the workspace.


Editing a Session

After you create a session, you can edit it. For example, you might need to adjust the buffer and cache sizes, modify the update strategy, or clear a variable value saved in the repository.

Double-click the Session task to open the session properties. The session has the following tabs, and each of those tabs has multiple settings:

♦ General tab. Enter the session name, mapping name, and a description for the Session task, specify a PowerCenter Server override, and configure additional task options.

♦ Properties tab. Enter session log information, test load settings, and performance configuration.

♦ Config Object tab. Enter advanced settings, log options, and error handling configuration.

♦ Mapping tab. Enter source and target information, override transformation properties, and configure the session for partitioning.

♦ Components tab. Configure pre- or post-session shell commands and emails.

♦ Metadata Extension tab. Configure metadata extension options.

For a detailed description of the session properties tabs and associated options, see “Session Properties Reference” on page 667.

Figure 7-1 shows the session properties:

Figure 7-1. Session Properties


You can edit session properties at any time. The repository updates the session properties immediately.

If the session is running when you edit the session, the repository updates the session when the session completes. If the mapping changes, the Workflow Manager might issue a warning that the session is invalid. The Workflow Manager then allows you to continue editing the session properties. After you edit the session properties, the PowerCenter Server validates the session and reschedules the session as necessary. For details on session validation, see “Validating a Session” on page 195.

Edit Session Privilege

To edit a session, you must have one of the following sets of privileges and permissions:

♦ Use Workflow Manager privilege with read and write permissions on the folder


Applying Attributes to All Instances

When you edit the session properties, you can apply source, target, and transformation settings to all instances of the same type in the session. You can also apply settings to all partitions in a pipeline. You can apply reader or writer settings, connection settings, and properties settings.

For example, you might need to change a relational connection from a test to a production database for all the target instances in a session. You can change the connection value for one target in a session and apply the connection to the other relational target objects.


Figure 7-2 shows the writers, connections, and properties settings for a target instance in a session:

Figure 7-2. Session Target Object Settings

For a target instance, you can change writers, connections, and properties settings.

Table 7-1 shows the options you can use to apply attributes to objects in a session. You can apply different options depending on whether the setting is a reader or writer, connection, or an object property.

Table 7-1. Apply All Options

Setting | Option | Description
Reader, Writer | Apply Type to All Instances | Applies a reader or writer type to all instances of the same object type in the session. For example, you can apply a relational reader type to all the other readers in your session.
Reader, Writer | Apply Type to All Partitions | Applies a reader or writer type to all the partitions in a pipeline. For example, if you have four partitions, you can change the writer type in one partition for a target instance. Then you can use this option to apply the change to the other three partitions.
Connections | Apply Connection Type | Applies the same type of connection to all instances. Connection types are relational, FTP, queue, application, or external loader.


Applying Connection Settings

When you apply connection settings, you can apply the connection type, connection value, and connection attributes. You can only apply a connection value that is valid for a connection type unless you choose the Apply All Connection Information option. For example, if a target instance uses an FTP connection, you can only choose an FTP connection value to apply to it. The Apply All Connection Information option enables you to apply a new connection type, connection value, and connection attributes.

Connections | Apply Connection Value | Applies a connection value to all instances or partitions. The connection value defines a specific connection that you can view in the connection browser. You can only apply a connection value that is valid for the existing connection type.

Connections | Apply Connection Attributes | Applies only the connection attribute values to all instances or partitions. Each type of connection has different attributes. You can apply connection attributes separately from connection values. To view sample connection attributes, see Figure 7-3 on page 181.

Connections | Apply Connection Data | Applies the connection value and its connection attributes to all the other instances that have the same connection type. This option combines the Apply Connection Value option and the Apply Connection Attributes option.

Connections | Apply All Connection Information | Applies the connection value and its attributes to all the other instances even if they do not have the same connection type. This option is similar to Apply Connection Data, but it allows you to change the connection type.

Properties | Apply Attribute to all Instances | Applies an attribute value to all instances of the same object type in the session. For example, if you have a relational target you can choose to truncate a table before you load data. You can apply the attribute value to all the relational targets in your session.

Properties | Apply Attribute to all Partitions | Applies an attribute value to all partitions in a pipeline. For example, you can change the reject file name in one partition for a target instance, then apply the file name change to the other partitions.
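The difference between Apply Connection Data and Apply All Connection Information can be pictured with a small model. The following Python sketch is an illustration only: the Instance class, its field names, and both functions are hypothetical and are not part of the Workflow Manager.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical stand-in for a source or target instance in a session."""
    name: str
    connection_type: str      # for example "relational" or "ftp"
    connection_value: str     # the specific connection name
    attributes: dict = field(default_factory=dict)


def apply_connection_data(source, others):
    """Copy value and attributes only to instances with the same connection type."""
    for inst in others:
        if inst.connection_type == source.connection_type:
            inst.connection_value = source.connection_value
            inst.attributes = dict(source.attributes)


def apply_all_connection_information(source, others):
    """Copy type, value, and attributes to every instance, whatever its type."""
    for inst in others:
        inst.connection_type = source.connection_type
        inst.connection_value = source.connection_value
        inst.attributes = dict(source.attributes)
```

In this model, an FTP target keeps its FTP connection when you apply connection data from a relational target, and changes to the relational connection only when you apply all connection information.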



Figure 7-3 illustrates the connection options by showing where they display on a connection browser:

Applying Attributes to Partitions or Instances

When you apply attributes to all instances or partitions in a session, you must open the session and edit one of the session objects. You apply attributes or properties to other instances by choosing an attribute in that object and selecting to apply its value to the other instances or partitions.

To apply attributes to all instances or partitions:

1. Open a session in the workspace.

2. Click the Mappings tab.

3. Choose a source, target, or transformation instance from the Navigator. Settings for properties, connections, and readers or writers might display, depending on the object you choose.

Figure 7-3. Connection Options

The connection type can be relational, FTP, queue, application, or external loader.

The connection value defines a specific connection.

Connection attributes are different for each connection type.


4. Right-click a reader, writer, property, or connection value. A list of options displays.

5. Select an option from the list and choose to apply it to all instances or all partitions.

6. Click OK to apply the attribute or property.


Creating a Session Configuration Object

The Config Object tab in the session properties includes commit and load settings, log options, and error handling settings. The Workflow Manager allows you to create a reusable set of attributes for the Config Object tab. When you configure attributes in the Config Object tab, you can specify a session configuration object you already created. Or, you can specify the default session configuration object called default_session_config. Override the attributes in the session configuration object in the Config Object tab.

Figure 7-4 shows the Config Object tab of the session properties:

Click the Browse button in the Config Name field to choose a session configuration. Select a user-defined or default session configuration object from the browser.

To create a session configuration object:

1. In the Workflow Manager, click Tasks-Session Configuration.

Figure 7-4. Config Object Tab

Select a session configuration object.


The Session Configuration Browser appears.

Figure 7-5 shows the Session Configuration Browser:

2. Click New to create a new session configuration object.

3. Enter a name for the session configuration object.

Figure 7-5. Session Configuration Browser


4. In the Properties tab, configure advanced settings, log options, and error handling options.

5. Click OK.

For session configuration object settings descriptions, see “Config Object Tab” on page 675.


Using Pre- and Post-Session SQL Commands

You can specify pre- and post-session SQL in the Source Qualifier transformation and the target instance when you create a mapping. When you create a Session task in the Workflow Manager you can override the SQL commands on the Mapping tab. You might want to use these commands to drop indexes on the target before the session runs, and then recreate them when the session completes.

The PowerCenter Server executes pre-session SQL commands before it reads the source. It executes post-session SQL commands after it writes to the target.

Guidelines for Entering Pre- and Post-Session SQL Commands

Remember the following guidelines when creating the SQL statements:

♦ You can use any command that is valid for the database type. However, the PowerCenter Server does not allow nested comments, even though the database might.

♦ You can use mapping parameters and variables in SQL executed against the source, but not the target.

♦ Use a semicolon (;) to separate multiple statements.

♦ The PowerCenter Server ignores semicolons within single quotes, double quotes, or within /* ...*/.

♦ If you need to use a semicolon outside of quotes or comments, you can escape it with a backslash (\).

♦ The Workflow Manager does not validate the SQL.
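The semicolon guidelines above can be checked against a small model before you enter SQL in the session properties. The following Python sketch is an illustration of those guidelines only, not the PowerCenter Server parser. The function name split_sql_statements is hypothetical, and the sketch assumes that the backslash is dropped when an escaped semicolon is kept.

```python
def split_sql_statements(sql: str) -> list:
    """Split SQL text on semicolons, following the guidelines above.

    Semicolons inside single quotes, double quotes, or /* ... */ comments
    do not separate statements. A backslash-escaped semicolon is kept as a
    literal semicolon (assumption: the backslash itself is dropped).
    """
    statements = []
    current = []
    quote = None          # the quote character that is currently open, if any
    in_comment = False    # True while inside /* ... */
    i = 0
    while i < len(sql):
        ch = sql[i]
        pair = sql[i:i + 2]
        if in_comment:
            current.append(ch)
            if pair == "*/":
                current.append("/")
                i += 1
                in_comment = False
        elif quote:
            current.append(ch)
            if ch == quote:
                quote = None
        elif pair == "/*":
            current.append(pair)
            i += 1
            in_comment = True
        elif pair == "\\;":
            current.append(";")   # escaped semicolon stays in the statement
            i += 1
        elif ch in ("'", '"'):
            quote = ch
            current.append(ch)
        elif ch == ";":
            statements.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
        i += 1
    statements.append("".join(current).strip())
    return [s for s in statements if s]
```

For example, the text `DROP INDEX idx_a; UPDATE t SET note = 'a;b'` splits into two statements, because the second semicolon sits inside single quotes.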

Error Handling

You can configure error handling on the Config Object tab. You can choose to stop or continue the session if the PowerCenter Server encounters an error issuing the pre- or post-session SQL command.

Figure 7-6 shows how to configure error handling for pre- or post-session SQL commands:

Figure 7-6. Stop or Continue the Session on Pre- or Post-Session SQL Errors

Stop or continue the session on pre- or post- session SQL error.


Using Pre- or Post-Session Shell Commands

The PowerCenter Server can perform shell commands at the beginning of the session or at the end of the session. Shell commands are operating system commands. You can use pre- or post-session shell commands, for example, to delete a reject file or session log, or to archive target files before the session begins.

The Workflow Manager provides the following types of shell commands for each Session task:

♦ Pre-session command. The PowerCenter Server performs pre-session shell commands at the beginning of a session. You can configure a session to stop or continue if a pre-session shell command fails.

♦ Post-session success command. The PowerCenter Server performs post-session success commands only if the session completed successfully.

♦ Post-session failure command. The PowerCenter Server performs post-session failure commands only if the session failed to complete.
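The order in which the three command types run can be summarized in a short model. The following Python sketch is an illustration under stated assumptions: the function run_session and its parameters are hypothetical, each command is represented by a callable, and the sketch assumes that no post-session command runs when a failed pre-session command stops the session, which this guide does not state.

```python
from typing import Callable


def run_session(pre_command: Callable[[], bool],
                session: Callable[[], bool],
                on_success: Callable[[], None],
                on_failure: Callable[[], None],
                stop_on_pre_command_error: bool = True) -> str:
    """Model of pre- and post-session command ordering.

    pre_command and session return True when they complete successfully.
    """
    # The pre-session command runs at the beginning of the session.
    if not pre_command() and stop_on_pre_command_error:
        return "stopped"      # assumption: no post-session command runs here
    # The post-session success command runs only if the session succeeds.
    if session():
        on_success()
        return "succeeded"
    # The post-session failure command runs only if the session fails.
    on_failure()
    return "failed"
```

Passing stop_on_pre_command_error=False models a session that is configured to continue when the pre-session shell command fails.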

Use the following guidelines to call a shell command:

♦ Use any valid UNIX command or shell script for UNIX servers, or any valid DOS command or batch file for Windows servers.

♦ Configure the session to execute the pre- or post-session shell commands.

The Workflow Manager provides a task called the Command task that allows you to specify shell commands anywhere in the workflow. You can choose a reusable Command task for the pre- or post-session shell command. Or, you can create non-reusable shell commands for the pre- or post-session shell commands. For details on the Command task, see “Working with the Command Task” on page 143.

If you create a non-reusable pre- or post-session shell command, you can make it into a reusable Command task.

The Workflow Manager allows you to choose from the following options when you configure shell commands:

♦ Create non-reusable shell commands. Create a non-reusable set of shell commands for the session. Other sessions in the folder cannot use this set of shell commands.

♦ Use an existing reusable Command task. Select an existing Command task to run as the pre- or post-session shell command.

Configure pre- and post-session shell commands in the Components tab of the session properties.

Using Server and Session Variables

You can include any server variable, such as $PMTargetFileDir, or session variables in pre- and post-session shell commands. When you use a server variable instead of entering a specific directory, you can run the same workflow on different PowerCenter Servers without changing session properties. You cannot use server variables or session variables in standalone Command tasks in the workflow. The PowerCenter Server does not expand server variables or session variables used in standalone Command tasks.
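To see why a server variable keeps a command portable, consider the following Python sketch. It only imitates variable expansion with the standard string.Template class. The function expand_command and the directory values are hypothetical, and the sketch does not reproduce how the PowerCenter Server performs the expansion.

```python
from string import Template


def expand_command(command: str, server_variables: dict) -> str:
    """Replace $Name references with the values configured for one server."""
    # safe_substitute leaves unknown variables untouched instead of raising.
    return Template(command).safe_substitute(server_variables)


# Hypothetical values of $PMTargetFileDir on two different servers.
unix_server = {"PMTargetFileDir": "/data/targets"}
test_server = {"PMTargetFileDir": "/tmp/targets"}

command = "rm $PMTargetFileDir/orders.bad"
unix_command = expand_command(command, unix_server)
test_command = expand_command(command, test_server)
```

The same command text produces a different path on each server, so the session properties do not need to change when the workflow moves.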

Configuring Non-Reusable Shell Commands

When you create non-reusable pre- or post-session shell commands, the commands are only visible in session properties. The Workflow Manager does not create Command tasks from these non-reusable commands. You can make non-reusable shell commands into a reusable Command task.

Figure 7-7 shows the Make Reusable option for a pre-session shell command:

Perform the following steps to create pre- or post-session shell commands for a specific session.

Figure 7-7. Make Reusable Option for Pre-Session Shell Commands

Make this shell command reusable.


To create non-reusable shell commands:

1. In the Components tab of the session properties, select Non-reusable for pre- or post-session shell command.

2. Click the Edit button in the Value field to open the Edit Pre- or Post-Session Command dialog box.

3. Enter a name for the command in the General tab.



4. If you want the PowerCenter Server to perform the next command only if the previous command completed successfully, select Run If Previous Completed in the Properties tab.

5. In the Commands tab, click the Add button to add shell commands.

Enter one command for each line.

6. Click OK.
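The effect of the Run If Previous Completed option on a list of commands can be modeled as follows. This Python sketch is an illustration only. The functions run_shell and run_commands are hypothetical, and the sketch treats a zero exit code as successful completion.

```python
import subprocess


def run_shell(command: str) -> bool:
    """Run one operating system command; True means exit code 0."""
    return subprocess.run(command, shell=True).returncode == 0


def run_commands(commands, run_if_previous_completed, runner=run_shell):
    """Run commands in order and return the list of commands that were started.

    When run_if_previous_completed is True, the next command runs only if
    the previous command completed successfully.
    """
    started = []
    for command in commands:
        started.append(command)
        succeeded = runner(command)
        if not succeeded and run_if_previous_completed:
            break
    return started
```

The runner parameter exists so that you can try the logic with a stand-in function instead of real shell commands.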

Creating a Reusable Command Task from Pre- or Post-Session Commands

If you create non-reusable pre- or post-session shell commands, you can make them into a reusable Command task. Once you make the pre- or post-session shell commands into a reusable Command task, you cannot revert the change.



To create a Command task from non-reusable pre- or post-session shell commands, click the Edit button to open the Edit dialog box for the shell commands. In the General tab, select the Make Reusable check box.

After you select the Make Reusable check box and click OK, a new Command task appears in the Tasks folder in the Navigator window. You can use this Command task in other workflows, just as you do with any other reusable workflow task.

Configuring Reusable Shell Commands

Perform the following steps to call an existing reusable Command task as the pre- or post-session shell command for the Session task.

To select an existing Command task as the pre- or post-session shell command:

1. In the Components tab of the session properties, click Reusable for the pre- or post-session shell command.

2. Click the Edit button in the Value field to open the Task Browser dialog box.

3. Select the Command task you want to run as the pre- or post-session shell command.

4. Click the Override button in the Task Browser dialog box if you want to change the order of the commands, or if you want to specify whether to run the next command when the previous command fails.

Changes you make to the Command task from the session properties only apply to the session. In the session properties, you cannot edit the commands in the Command task.

5. Click OK to select the Command task for the pre- or post-session shell command.

The name of the Command task you select appears in the Value field for the shell command.


Using Server Variables

You can include any server variable, such as $PMTargetFileDir, in pre- or post-session shell commands. When you use a server variable instead of entering a specific directory, you can run the same workflow on different PowerCenter Servers without changing session properties.

Pre-Session Shell Command Errors

You can configure the session to stop or continue if a pre-session shell command fails. If you select Stop, the PowerCenter Server stops the session, but continues with the rest of the workflow. If you select Continue, the PowerCenter Server ignores the errors and continues the session. By default, the PowerCenter Server stops the session upon shell command errors.

Configure the session to stop or continue if a pre-session shell command fails in the Error Handling settings on the Config Object tab.

Figure 7-8 shows how to configure the session to stop or continue when a pre-session shell command fails:

Figure 7-8. Stop or Continue the Session on Pre-Session Shell Command Error

Stop or continue the session on pre-session shell command error.


Using Post-Session Email

The PowerCenter Server can send emails after the session completes. You can send an email when the session completes successfully. Or, you can send an email when the session fails. The PowerCenter Server can send the following types of emails for each Session task:

♦ On-Success Email. The PowerCenter Server sends the email when the session completes successfully.

♦ On-Failure Email. The PowerCenter Server sends the email when the session fails.

You can also use an Email task to send email anywhere in the workflow. If you already created a reusable Email task, you can select it as the On-Success or On-Failure email for the session. Or, you can create non-reusable emails that exist only within the Session task.

For more information about sending post-session emails, see “Sending Email” on page 319.


Validating a Session

The Workflow Manager validates a Session task when you save it. You can also manually validate Session tasks and session instances. Validate reusable Session tasks in the Task Developer. Validate non-reusable sessions and reusable session instances in the Workflow Designer.

The Workflow Manager marks a reusable session or session instance invalid if you perform one of the following tasks:

♦ Edit the mapping in a way that might invalidate the session. You can edit the mapping used by a session at any time. When you edit and save a mapping, the repository might invalidate sessions that already use the mapping. The PowerCenter Server does not execute invalid sessions.

You must reconnect to the folder to see the effect of mapping changes on Session tasks. For details on validating mappings, see “Mappings” in the Designer Guide.

When you edit a session based on an invalid mapping, the Workflow Manager displays a warning message:

The mapping [mapping_name] associated with the session [session_name] is invalid.

♦ Delete a database, FTP, or external loader connection used by the session.

♦ Leave session attributes blank. For example, the session is invalid if you do not specify the source file name.

♦ Change the code page of a session database connection to an incompatible code page.

If you delete objects associated with a Session task such as session configuration object, Email, or Command task, the Workflow Manager marks a reusable session invalid. However, the Workflow Manager does not mark a non-reusable session invalid if you delete an object associated with the session.

If you delete a shortcut to a source or target from the mapping, the Workflow Manager does not mark the session invalid.

The Workflow Manager does not validate SQL overrides or filter conditions entered in the session properties when you validate a session. You must validate SQL override and filter conditions in the SQL Editor.

If a reusable session task is invalid, the Workflow Manager displays an invalid icon over the session task in the Navigator and in the Task Developer workspace. This does not affect the validity of the session instance and the workflows using the session instance.

If a reusable or non-reusable session instance is invalid, the Workflow Manager marks it invalid in the Navigator and in the Workflow Designer workspace. Workflows using the session instance remain valid.

To validate a session, select the session in the workspace and choose Tasks-Validate. Or, right-click the session instance in the workspace and choose Validate.


Validating Multiple Sessions

You can validate multiple sessions without fetching them into the workspace. You must select and validate the sessions from a query results view or a view dependencies list. You can save and optionally check in sessions that change from invalid to valid status. For more information about validating multiple objects, see “Validating Multiple Objects” in the Repository Guide.

Note: If you are using the Repository Manager, you can select and validate multiple sessions from the Navigator.

To validate multiple sessions:

1. Select sessions from either a query list or a view dependencies list.

2. Right-click one of the selected sessions and choose Validate.

The Validate Objects dialog box displays.

3. Choose whether to save objects and check in objects that you validate.


Running the Session

By default, the PowerCenter Server you assign to a workflow runs all tasks. If you register multiple servers to a repository, you can override the PowerCenter Server at the session level.

In a server grid, the master server distributes the sessions to available worker servers. You can assign a PowerCenter Server to a session. The session always runs on the server you assigned to it. For more information about how a server grid distributes sessions, see “Distributing Sessions” on page 446.

Selecting a Server to Run the Session

You can choose a server to run the session. If you only register one server, the Workflow Manager lists the single registered PowerCenter Server that runs the workflow and session. For PowerCenter repositories with multiple servers, the Workflow Manager lists all servers.

To select a server to run a session:

1. Open a session in a workflow.

2. Double-click the session in the workflow. The Edit Tasks dialog box appears.

3. Click the Select Server button on the General tab. A list of registered servers appears.

4. Select a server to run the session.



5. Click OK twice to select the server for the session.

Instead of choosing a server for each session in the folder, you can assign multiple sessions to a server.

Assigning the PowerCenter Server to a Session

After you register the PowerCenter Server, you can assign it to sessions you want to run on that server. This allows you to assign the PowerCenter Server to multiple sessions without editing each session property individually. To assign the PowerCenter Server to multiple sessions, you must first close all folders in the repository.

To assign the PowerCenter Server to sessions, you must have the Super User privilege.

Figure 7-9 shows the Assign Server dialog box:

To assign the PowerCenter Server:



or

Right-click the server name in the Navigator and choose Assign Server. The Assign Server dialog box opens.

3. From the Choose Server list, select the server you want to assign.

Figure 7-9. Assign Server Dialog Box

Show sessions.

Assign a server to a session.

Select a server to assign.

Select a folder.



5. Select the Show Sessions check box.

6. Select each session you want to run on the PowerCenter Server.

7. Click Assign.

You can remove an assigned server from a session in the Assign Server dialog box. Perform the following steps to remove an assigned server from a session.

To remove an assigned server:



3. From the Choose Server list, select None.


5. Select the sessions from which you want to remove the assigned server.

6. Click Assign.


Stopping and Aborting a Session

You can stop or abort a session just as you can stop or abort any task. You can also abort a session by using the ABORT() function in the mapping logic. Session errors can cause the PowerCenter Server to stop a session early. You can control the stopping point by setting an error threshold in a session, using the ABORT function in mappings, or requesting the PowerCenter Server to stop the session. You cannot control the stopping point when the PowerCenter Server encounters fatal errors, such as loss of connection to the target database.

If a session fails as a result of an error, consider performing session recovery. For more information on recovery, see “Recovering a Session Task” on page 311. For more information on row error logging, see “Overview” on page 482.

Threshold Errors

You can choose to stop a session on a designated number of non-fatal errors. A non-fatal error is an error that does not force the session to stop on its first occurrence. Establish the error threshold in the session properties with the Stop On option. When you enable this option, the PowerCenter Server counts non-fatal errors that occur in the reader, writer, and transformation threads.

The PowerCenter Server maintains an independent error count when reading sources, transforming data, and writing to targets. The PowerCenter Server counts the following non-fatal errors when you set the Stop On option in the session properties:

♦ Reader errors. Errors encountered by the PowerCenter Server while reading the source database or source files. Reader threshold errors can include alignment errors while running a session in Unicode mode.

♦ Writer errors. Errors encountered by the PowerCenter Server while writing to the target database or target files. Writer threshold errors can include key constraint violations, loading nulls into a not null field, and database trigger responses.

♦ Transformation errors. Errors encountered by the PowerCenter Server while transforming data. Transformation threshold errors can include conversion errors, and any condition set up as an ERROR, such as null input.

When you create multiple partitions in a pipeline, the PowerCenter Server maintains a separate error threshold for each partition. When the PowerCenter Server reaches the error threshold for any partition, it stops the session. The writer may continue writing data from one or more partitions, but it does not affect your ability to perform a successful recovery.
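The threshold behavior described above can be modeled with a simple counter. The following Python sketch is an illustration only. The class ErrorThreshold is hypothetical, and the sketch assumes that the count is kept separately for each combination of partition and error category, which is one reading of the description above.

```python
from collections import Counter


class ErrorThreshold:
    """Model of the Stop On error threshold for non-fatal errors."""

    CATEGORIES = ("reader", "writer", "transformation")

    def __init__(self, stop_on: int):
        if stop_on < 1:
            raise ValueError("stop_on must be at least 1 in this model")
        self.stop_on = stop_on
        # Assumption: one independent count per (partition, category) pair.
        self.counts = Counter()

    def record(self, partition: int, category: str) -> bool:
        """Count one non-fatal error; return True when the session stops."""
        if category not in self.CATEGORIES:
            raise ValueError(f"unknown error category: {category}")
        self.counts[(partition, category)] += 1
        return self.counts[(partition, category)] >= self.stop_on
```

With a threshold of 2, one reader error in each of two partitions does not stop the session in this model, but a second reader error in either partition does.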

Note: If alignment errors occur in a non line-sequential VSAM file, the PowerCenter Server sets the error threshold to 1 and stops the session.

Fatal Error

A fatal error occurs when the PowerCenter Server cannot access the source, target, or repository. This can include loss of connection or target database errors, such as lack of database space to load data. If the session uses a Normalizer or Sequence Generator transformation and the PowerCenter Server loses the connection to the repository, it cannot update the sequence values in the repository, and a fatal error occurs.

If the session does not use a Normalizer or Sequence Generator transformation, and the PowerCenter Server loses connection to the repository, the PowerCenter Server does not stop the session. The session completes, but the PowerCenter Server cannot log session statistics into the repository.

ABORT Function

Use the ABORT function in the mapping logic to abort a session when the PowerCenter Server encounters a designated transformation error.

For more information about ABORT, see “Functions” in the Transformation Language Reference.

User Command

You can stop or abort the session from the Workflow Manager. You can also stop the session using pmcmd.

PowerCenter Server Handling for Session Failure

The PowerCenter Server handles session errors in different ways, depending on the error or event that causes the session to fail.

Table 7-2 describes the PowerCenter Server behavior when a session fails:

Table 7-2. PowerCenter Server Behavior for Failed Sessions

Cause for Session Errors:
- Error threshold met due to reader errors
- Stop command using Workflow Manager or pmcmd

PowerCenter Server Behavior: The PowerCenter Server performs the following tasks:
- Stops reading.
- Continues processing data.
- Continues writing and committing data to targets.
If the PowerCenter Server cannot finish processing and committing data, you need to issue the Abort command to stop the session.

Cause for Session Errors:
- Abort command using Workflow Manager

PowerCenter Server Behavior: The PowerCenter Server performs the following tasks:
- Stops reading.
- Continues processing data.
- Continues writing and committing data to targets.
If the PowerCenter Server cannot finish processing and committing data within 60 seconds, it kills the PowerCenter Server process.

Cause for Session Errors:
- Fatal error from database
- Error threshold met due to writer errors

PowerCenter Server Behavior: The PowerCenter Server performs the following tasks:
- Stops reading and writing.
- Rolls back all data not committed to the target database.
If the session stops due to fatal error, the commit or rollback may or may not be successful.

Cause for Session Errors:
- Error threshold met due to transformation errors
- ABORT( )
- Invalid evaluation of transaction control expression

PowerCenter Server Behavior: The PowerCenter Server performs the following tasks:
- Stops reading.
- Flags the row as an abort row and continues processing data.
- Continues to write to the target database until it hits the abort row.
- Issues commits based on commit intervals.
- Rolls back all data not committed to the target database.


Mapping Parameters and Variables in Sessions

You can use mapping parameters in the session properties to alter certain mapping attributes. For example, you can use a mapping parameter in a transformation override to override a filter or user-defined join in a Source Qualifier transformation.

If you use mapping variables in a session, you can clear any of the variable values saved in the repository by editing the session. When you clear the variable values, the PowerCenter Server uses the values in the parameter file the next time you run a session. If the session does not use a parameter file, the PowerCenter Server uses the initial values defined in the mapping. For more information on mapping variables, see “Mapping Parameters and Variables” in the Designer Guide.

To view or delete values for mapping variables saved in the repository:

1. In the Navigator window of the Workflow Manager, right-click the Session task and select View Persistent Values.

2. Click Delete Values to delete existing variable values.

3. To save changes, click OK.


Handling High Precision Data

The PowerCenter Server processes decimal values as Doubles or Decimals. When you create a session, you choose to enable the Decimal datatype or let the PowerCenter Server process the data as a Double (precision of 15).

To enable high precision data handling:

♦ Use the Decimal datatype with a precision of 16 to 28 in the mapping.

♦ Select Enable High Precision in the session properties.

The precision attributed to a number also includes the scale of the number. For example, the value 11.47 has a precision of 4 and a scale of 2.

For example, you might have a mapping with Decimal (20,0) that passes the number 40012030304957666903. If you enable high precision, the PowerCenter Server passes the number as is. If you do not enable high precision, the PowerCenter Server passes 4.00120303049577 x 10^19.

If you want to process a Decimal value with a precision greater than 28 digits, the PowerCenter Server automatically treats it as a Double value. For example, if you want to process the number 2345678904598383902092.1927658, which has a precision of 29 digits, the PowerCenter Server automatically treats this number as a Double value of 2.34567890459838 x 10^21.
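You can reproduce the two examples above with any language that offers both an exact decimal type and a double. The following Python sketch is an illustration only. The function process_decimal is hypothetical, it derives the precision from the digits of the value instead of from a port definition, and it assumes that values of up to 28 digits stay exact when high precision is enabled.

```python
from decimal import Decimal


def process_decimal(text: str, high_precision: bool):
    """Return the value as an exact Decimal or as a double (Python float)."""
    digits = sum(ch.isdigit() for ch in text)   # precision includes the scale
    if high_precision and digits <= 28:
        return Decimal(text)                    # passed as is
    return float(text)                          # roughly 15 significant digits


exact = process_decimal("40012030304957666903", high_precision=True)
rounded = process_decimal("40012030304957666903", high_precision=False)
too_wide = process_decimal("2345678904598383902092.1927658", high_precision=True)

# Show 15 significant digits, as in the examples above.
rounded_text = f"{rounded:.14e}"     # 4.00120303049577e+19
too_wide_text = f"{too_wide:.14e}"   # 2.34567890459838e+21
```

The 29-digit value becomes a double even with high precision enabled, which matches the behavior described above for precision greater than 28.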

To use high precision data handling in a session:

1. In the Workflow Manager, open the session properties.


2. On the Properties tab, select Enable High Precision.

3. Click OK twice to save changes.

Enable High Precision



Chapter 8

Working with Sources


♦ Overview, 208

♦ Configuring Sources in a Session, 210

♦ Working with Relational Sources, 214

♦ Working with File Sources, 218

♦ Server Handling for File Sources, 226


♦ Using a File List, 230


Overview

In the Workflow Manager, you can create sessions with the following sources:

♦ Relational. You can extract data from any relational database that the PowerCenter Server can connect to. When extracting data from relational sources and Application sources, you must configure the database connection to the data source prior to configuring the session.

♦ File. You can create a session to extract data from a flat file, COBOL, or XML source. The PowerCenter Server can extract data from any local directory or FTP connection for the source file. If the file source requires an FTP connection, you need to configure the FTP connection to the host machine before you create the session.

♦ Heterogeneous. You can extract data from multiple sources in the same session. You can extract from multiple relational sources, such as Oracle and SQL Server. Or, you can extract from multiple source types, such as relational and flat file. When you configure a session with heterogeneous sources, configure each source instance separately.

Globalization Features

You can choose a code page that you want the PowerCenter Server to use for relational sources and flat files. You specify code pages for relational sources when you configure database connections in the Workflow Manager. You can set the code page for file sources in the session properties. For more information about code pages, see “Globalization Overview” in the Installation and Configuration Guide.

Source Connections

Before you can extract data from a source, you must configure the connection properties the PowerCenter Server uses to connect to the source file or database. You can configure source database and FTP connections in the Workflow Manager.

For more information on creating database connections, see “Configuring the Workflow Manager” on page 37. For more information on creating FTP connections, see “Using FTP” on page 559.

Permissions and Privileges

You must have read permissions for the connections you use in the session. For example, if the source requires database connections or FTP connections, you must have permission to read those connections in the session.

Allocating Buffer Memory

When the PowerCenter Server initializes a session, it allocates blocks of memory to hold source and target data. The PowerCenter Server allocates at least two blocks for each source and target partition. Sessions that use a large number of sources or targets might require additional memory blocks. If the PowerCenter Server cannot allocate enough memory blocks to hold the data, it fails the session.
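The two-blocks rule gives a quick lower bound on the memory a session needs. The following Python sketch is a rough illustration only. The function names and the buffer_bytes and block_bytes parameters are hypothetical, and the sketch ignores any blocks that transformations might need.

```python
def minimum_buffer_blocks(source_partitions: int, target_partitions: int) -> int:
    """At least two blocks for each source partition and each target partition."""
    return 2 * (source_partitions + target_partitions)


def can_initialize(buffer_bytes: int, block_bytes: int,
                   source_partitions: int, target_partitions: int) -> bool:
    """True if the buffer holds the minimum number of blocks for the session."""
    available_blocks = buffer_bytes // block_bytes
    return available_blocks >= minimum_buffer_blocks(source_partitions,
                                                     target_partitions)
```

For example, a session with three source partitions and two target partitions needs at least 2 x (3 + 2) = 10 blocks in this model.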

For more information on allocating buffer memory, see “Optimizing the Session” on page 655.
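The two-blocks-per-partition rule can be turned into a quick lower-bound check. The following is a minimal sketch, assuming the rule means two blocks for every source partition and every target partition; the function names are hypothetical and are not part of PowerCenter.

```python
def minimum_buffer_blocks(source_partitions: int, target_partitions: int) -> int:
    """Lower bound on memory blocks: two per source partition and two per target partition.

    This reflects one reading of the guide's rule; the real server may need more.
    """
    if source_partitions < 0 or target_partitions < 0:
        raise ValueError("partition counts must be non-negative")
    return 2 * (source_partitions + target_partitions)


def can_initialize(available_blocks: int, source_partitions: int, target_partitions: int) -> bool:
    """Return True when the available blocks cover the minimum requirement."""
    return available_blocks >= minimum_buffer_blocks(source_partitions, target_partitions)
```

For example, a session with three source partitions and two target partitions would need at least ten blocks under this reading. With fewer available, the sketch reports that initialization would not succeed.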

Partitioning Sources

You can create multiple partitions for relational, Application, and file sources. For relational or Application sources, the PowerCenter Server creates a separate connection to the source database for each partition you set in the session properties. For file sources, you can configure the session to read the source with one thread or multiple threads.

For more information on partitioning data, see “Pipeline Partitioning” on page 345.


Configuring Sources in a Session

Configure source properties for sessions in the Sources node of the Mapping tab of the session properties. When you configure source properties for a session, you define properties for each source instance in the mapping.

Figure 8-1 shows the Sources node on the Mapping tab:

The Sources node lists the sources used in the session and displays their settings. To view and configure settings for a source, select the source from the list. You can configure the following settings for a source:

♦ Readers

♦ Connections

♦ Properties

Configuring Readers

Click the Readers settings on the Sources node to view the reader the PowerCenter Server uses with each source instance. The Workflow Manager specifies the necessary reader for each source instance.

Figure 8-1. Sources Node of the Session Properties


Figure 8-2 shows the Readers settings in the Sources node of the Mapping tab:

Configuring Connections

Click the Connections settings on the Sources node to define source connection information.

Figure 8-2. Readers Settings in the Sources Node of the Mapping Tab


Figure 8-3 shows the Connections settings in the Sources node of the Mapping tab:

For relational sources, choose a configured database connection in the Value column for each relational source instance. By default, the Workflow Manager displays the source type for relational sources. For details on configuring database connections, see “Selecting the Source Database Connection” on page 214.

For flat file and XML sources, choose one of the following source connection types in the Type column for each source instance:

♦ FTP. If you want to read data from a flat file or XML source using FTP, you must specify an FTP connection when you configure source options. You must define the FTP connection in the Workflow Manager prior to configuring the session.

You must have read permission for any FTP connection you want to associate with the session. The user starting the session must have execute permission for any FTP connection associated with the session. For details on using FTP, see “Using FTP” on page 559.

♦ None. Choose None when you want to read from a local flat file or XML file.

Figure 8-3. Connections Settings in the Sources Node

Configuring Properties

Click the Properties settings in the Sources node to define source property information. The Workflow Manager displays properties, such as source file name and location for flat file, COBOL, and XML source file types. You do not need to define any properties on the Properties settings for relational sources.

Figure 8-4 shows the Properties settings in the Sources node of the Mapping tab:

For more information on configuring sessions with relational sources, see “Working with Relational Sources” on page 214. For more information on configuring sessions with flat file sources, see “Working with File Sources” on page 218. For more information on configuring sessions with XML sources, see the XML User Guide.

Figure 8-4. Properties Settings in the Sources Node of the Mapping Tab


Working with Relational Sources

When you configure a session to read data from a relational source, you can configure the following properties for sources:

♦ Source database connection. Select the database connection for each relational source. For more information, see “Selecting the Source Database Connection” on page 214.

♦ Treat source rows as. Define how the PowerCenter Server treats each source row as it reads it from the source table. For more information, see “Defining the Treat Source Rows As Property” on page 214.

♦ Table owner name. Define the table owner name for each relational source. For more information, see “Configuring the Table Owner Name” on page 216.

♦ Override SQL query. You can override the default SQL query to extract source data. For more information, see “Overriding the SQL Query” on page 216.

Selecting the Source Database Connection

Before you can run a session to read data from a source database, the PowerCenter Server must connect to the source database. Database connections must exist in the repository to appear on the source database list. You must define them prior to configuring a session. For details on configuring a database connection, see “Setting Up a Relational Database Connection” on page 53.

On the Connections settings in the Sources node, select the database connection from the list. You must have read permission for the source database connection to configure the session to use it. The user starting the configured session must have execute permission for source database connections.

Defining the Treat Source Rows As Property

When the PowerCenter Server reads a source, it marks each row with an indicator to specify which operation to perform when the row reaches the target. You can define how the PowerCenter Server marks each row using the Treat Source Rows As property in the General Options settings on the Properties tab.


Figure 8-5 shows the Treat Source Rows As property on the General Options settings:

Table 8-1 describes the options you can choose for the Treat Source Rows As property:

Once you determine how to treat all rows in the session, you also need to set update strategy options for individual targets. For more information on setting the target update strategy options, see “Target Properties” on page 241.

For more information on setting the update strategy for a session, see “Update Strategy Transformation” in the Transformation Guide.

Figure 8-5. Treat Source Rows As Property

Table 8-1. Treat Source Rows As Options

Treat Source Rows As Option | Description
Insert | The PowerCenter Server marks all rows to insert into the target.
Delete | The PowerCenter Server marks all rows to delete from the target.
Update | The PowerCenter Server marks all rows to update the target. You can further define the update operation in the target options. For more information, see “Target Properties” on page 241.
Data Driven | The PowerCenter Server uses the Update Strategy transformations in the mapping to determine the operation on a row-by-row basis. You define the update operation in the target options. If the mapping contains an Update Strategy transformation, this option defaults to Data Driven. You can also use this option when the mapping contains Custom transformations configured to set the update strategy.
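The difference between the fixed options and Data Driven can be illustrated with a small simulation. This is only a sketch of the row-marking idea described in Table 8-1, not PowerCenter code. The `mark_rows` function and the `update_strategy` callback are hypothetical stand-ins for the session property and the Update Strategy transformation.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple

Row = Dict[str, object]

# Session-level options that mark every row with the same operation.
FIXED_OPTIONS = {"Insert": "insert", "Delete": "delete", "Update": "update"}


def mark_rows(
    rows: Iterable[Row],
    treat_source_rows_as: str,
    update_strategy: Optional[Callable[[Row], str]] = None,
) -> Iterator[Tuple[str, Row]]:
    """Yield (operation, row) pairs according to the Treat Source Rows As option."""
    if treat_source_rows_as not in FIXED_OPTIONS and treat_source_rows_as != "Data Driven":
        raise ValueError(f"unknown option: {treat_source_rows_as!r}")
    if treat_source_rows_as == "Data Driven" and update_strategy is None:
        raise ValueError("Data Driven needs a per-row update strategy")
    for row in rows:
        if treat_source_rows_as == "Data Driven":
            # The operation is decided row by row, as an Update Strategy would.
            yield update_strategy(row), row
        else:
            # Every row receives the same operation.
            yield FIXED_OPTIONS[treat_source_rows_as], row
```

With `"Delete"`, every row is marked for deletion. With `"Data Driven"`, the callback decides per row, for example marking rows with an `active` flag for update and the rest for deletion.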


Configuring the Table Owner Name

You can define the owner name of the source table in the session properties. For some databases such as DB2, tables can have different owners. If the database user specified in the database connection is not the owner of the source tables in a session, specify the table owner for each source instance. A session can fail if the database user is not the owner and you do not specify the table owner name.

Specify the table owner name in the Owner Name field in the Properties settings in the Sources node.

Figure 8-6 shows the Properties settings where you define the table owner name for relational sources:

Overriding the SQL Query

You can alter or override the default query in the mapping by entering an SQL override in the Properties settings in the Sources node. You can enter any SQL statement supported by the source database.

The Workflow Manager does not validate the SQL override. The following errors could cause the session to fail, and possibly cause data errors:

♦ Fields with incompatible datatypes or unknown fields

♦ Typing mistakes or other errors

Figure 8-6. Source Table Owner Name Property



Figure 8-7 shows the Properties settings in the Sources node where you can override the SQL query:

To override the default query for a relational source:

1. In the Workflow Manager, open the session properties.

2. Click the Mapping tab and open the Transformations view.

3. Click the Sources node and open the Properties settings.

4. Click the Open button in the SQL Query field to open the SQL Editor.

5. Enter the SQL override.

6. Click OK to return to the session properties.

Figure 8-7. SQL Query Override Property in the Session Properties


Working with File Sources

You can create a session to extract data from flat file or COBOL sources. When you create a session to read data from a flat file or COBOL file, you can configure the following information in the session properties:

♦ Source properties. You can define source properties on the Properties settings in the Sources node, such as source file options. For more information, see “Configuring Source Properties” on page 218.

♦ Flat file properties. You can edit fixed-width and delimited source file properties. For more information, see “Configuring Fixed-Width File Properties” on page 220 and “Configuring Delimited File Properties” on page 222.

♦ Line sequential buffer length. You can change the buffer length for flat files on the Advanced settings on the Config Object tab. For more information, see “Configuring Line Sequential Buffer Length” on page 225.

♦ Treat source rows as. Define how the PowerCenter Server treats each source row as it reads it from the source. For more information, see “Defining the Treat Source Rows As Property” on page 214.

Configuring Source Properties

You can define session source properties on the Properties settings in the Sources node.


Figure 8-8 shows the flat file source properties you define in the Properties settings of the Sources node on the Mapping tab:

Figure 8-8. Properties Settings in the Sources Node for a Flat File Source
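Table 8-2 states that the PowerCenter Server concatenates the Source File Directory field with the Source Filename field. The sketch below mirrors that rule with plain string concatenation, using the guide's own example values. The function name is hypothetical, and the guard against entering a path in both fields is this sketch's own check based on the advice to clear the directory field, not documented server behavior.

```python
def resolve_source_path(source_file_directory: str, source_filename: str) -> str:
    """Join the two session fields the way the guide describes: by concatenation."""
    filename_has_path = any(sep in source_filename for sep in ("\\", "/"))
    if source_file_directory and filename_has_path:
        # The guide advises clearing the directory field in this situation.
        raise ValueError(
            "Source Filename already contains a path; clear Source File Directory"
        )
    # No separator is inserted, so the directory should end with one.
    return source_file_directory + source_filename
```

Because nothing is inserted between the two values, a directory entered without a trailing separator would produce an unintended path in this sketch. The guide's example, “C:\data\”, includes the trailing backslash.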


Table 8-2 describes the properties you define on the Properties settings for flat file source definitions:

Table 8-2. Flat File Source Properties

File Source Options | Required/Optional | Description
Source File Directory | Optional | Enter the directory name in this field. By default, the PowerCenter Server looks in the server variable directory, $PMSourceFileDir, for file sources. If you specify both the directory and file name in the Source Filename field, clear this field. The PowerCenter Server concatenates this field with the Source Filename field when it runs the session. You can also use the $InputFileName session parameter to specify the file directory. For details on session parameters, see “Session Parameters” on page 495.
Source Filename | Required | Enter the file name, or file name and path. Optionally use the $InputFileName session parameter for the file name. The PowerCenter Server concatenates this field with the Source File Directory field when it runs the session. For example, if you have “C:\data\” in the Source File Directory field, then enter “filename.dat” in the Source Filename field. When the PowerCenter Server begins the session, it looks for “C:\data\filename.dat”. By default, the Workflow Manager enters the file name configured in the source definition. For details on session parameters, see “Session Parameters” on page 495.
Source Filetype | Required | Allows you to configure multiple file sources using a file list. Indicates whether the source file contains the source data, or whether it contains a list of files with the exact same file properties. Choose Direct if the source file contains the source data. Choose Indirect if the source file contains a list of files. When you select Indirect, the PowerCenter Server finds the file list and reads each listed file when it runs the session. For details on file lists, see “Using a File List” on page 230.
Set File Properties link | Optional | Opens a dialog box that allows you to override source file properties. By default, the Workflow Manager displays file properties as configured in the source definition. For more information, see “Configuring Fixed-Width File Properties” on page 220 and “Configuring Delimited File Properties” on page 222.

Configuring Fixed-Width File Properties

When you read data from a fixed-width file, you can edit file properties in the session, such as the null character or code page. You can configure fixed-width properties for non-reusable sessions in the Workflow Designer and for reusable sessions in the Task Developer. You cannot configure fixed-width properties for instances of reusable sessions in the Workflow Designer.

Click Set File Properties to open the Flat Files dialog box.


Figure 8-9 shows the Flat Files dialog box:

To edit the fixed-width properties, select Fixed Width and click Advanced. The Fixed-Width Properties dialog box appears. By default, the Workflow Manager displays file properties as configured in the mapping. Edit these settings to override those configured in the source definition.

Figure 8-10 shows the Fixed-Width Properties dialog box:

Figure 8-9. Flat Files Dialog Box

Figure 8-10. Fixed-Width File Properties Dialog Box
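One of the fixed-width options described in Table 8-3, Number of Bytes to Skip Between Records, is easier to see with data. The following is a rough sketch of how a reader might step through a byte stream using that setting. The function name, the column widths, and the sample data are assumptions made for illustration, and the sketch ignores a short final record rather than applying the row length rules described later in this chapter.

```python
from typing import List, Sequence


def read_fixed_width(
    data: bytes,
    widths: Sequence[int],
    skip_between_records: int = 0,
    encoding: str = "ascii",
) -> List[List[str]]:
    """Split a byte stream into fixed-width records and columns."""
    record_length = sum(widths)
    rows: List[List[str]] = []
    position = 0
    while position + record_length <= len(data):
        record = data[position:position + record_length]
        fields = []
        offset = 0
        for width in widths:
            fields.append(record[offset:offset + width].decode(encoding))
            offset += width
        rows.append(fields)
        # Step over the record plus any bytes between records,
        # such as a carriage return and line feed on Windows.
        position += record_length + skip_between_records
    return rows
```

For a Windows file where each eight-byte record ends with a carriage return and line feed, passing `skip_between_records=2` keeps the two terminator bytes out of the next record.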


Table 8-3 describes options you can define in the Fixed Width Properties dialog box for file sources:

Table 8-3. Fixed-Width File Properties for File Sources

Fixed-Width Properties Options | Required/Optional | Description
Text/Binary | Required | Indicates the character representing a null value in the file. This can be any valid character in the file code page, or any binary value from 0 to 255. For more information about specifying null characters, see “Null Character Handling” on page 227.
Repeat Null Character | Optional | If selected, the PowerCenter Server reads repeated null characters in a single field as a single null value. If you do not select this option, the PowerCenter Server reads a single null character at the beginning of a field as a null field. Important: For multibyte code pages, Informatica recommends that you specify a single-byte null character if you are using repeating non-binary null characters. This ensures that repeating null characters fit into the column exactly. For more information about specifying null characters, see “Null Character Handling” on page 227.
Code Page | Required | Select the code page of the fixed-width file. The default setting is the client code page.
Number of Initial Rows to Skip | Optional | The PowerCenter Server skips the specified number of rows before reading the file. Use this to skip header rows. One row may contain multiple records. If you select the Line Sequential File Format option, the PowerCenter Server ignores this option.
Number of Bytes to Skip Between Records | Optional | The PowerCenter Server skips the specified number of bytes between records. For example, you have an ASCII file on Windows with one record on each line, and a carriage return and line feed appear at the end of each line. If you want the PowerCenter Server to skip these two single-byte characters, enter 2. If you have an ASCII file on UNIX with one record for each line, ending in a line feed, skip the single character by entering 1.
Strip Trailing Blanks | Optional | If selected, the PowerCenter Server strips trailing blank spaces from records before passing them to the Source Qualifier transformation.
Line Sequential File Format | Optional | Select this option if the file uses a carriage return at the end of each record, shortening the final column.

Configuring Delimited File Properties

When you read data from a delimited file, you can edit file properties in the session, such as the delimiter or code page. You can configure delimited properties for non-reusable sessions in the Workflow Designer and for reusable sessions in the Task Developer. You cannot configure delimited properties for instances of reusable sessions in the Workflow Designer.

Click Set File Properties to open the Flat Files dialog box.



To edit the delimited properties, select Delimited and click Advanced. The Delimited File Properties dialog box appears. By default, the Workflow Manager displays file properties as configured in the mapping. Edit these settings to override those configured in the source definition.

Figure 8-12 shows the Delimited File Properties dialog box:


Figure 8-12. Delimited File Properties Dialog Box
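The delimiter and quote options described in Table 8-4 interact in ways that are easier to follow with a small parser. The following sketch approximates three of the documented rules: pairs of delimiters read as null, consecutive delimiters optionally read as one, and an optional quote character that is only recognized as the first character of a field. It is an illustration only. The function name is hypothetical, and escape characters are not handled.

```python
from typing import List, Optional


def split_delimited(
    line: str,
    delimiter: str = ",",
    quote: Optional[str] = None,
    consecutive_as_one: bool = False,
) -> List[Optional[str]]:
    """Split one record into fields; empty fields become None (null)."""
    fields: List[str] = []
    buffer: List[str] = []
    in_quotes = False
    at_field_start = True
    i, n = 0, len(line)
    while i < n:
        ch = line[i]
        if in_quotes:
            if ch == quote:
                if i + 1 < n and line[i + 1] == quote:
                    # Two quote characters inside a quoted string read as one.
                    buffer.append(ch)
                    i += 2
                    continue
                in_quotes = False
            else:
                buffer.append(ch)
        elif quote is not None and ch == quote and at_field_start:
            # A quote opens a quoted string only as the first character of a field.
            in_quotes = True
            at_field_start = False
        elif ch == delimiter:
            fields.append("".join(buffer))
            buffer = []
            at_field_start = True
            if consecutive_as_one:
                while i + 1 < n and line[i + 1] == delimiter:
                    i += 1
        else:
            buffer.append(ch)
            at_field_start = False
        i += 1
    fields.append("".join(buffer))
    return [field if field != "" else None for field in fields]
```

With the default settings, `56,,,Jane Doe` yields four columns with two nulls. With `consecutive_as_one=True`, it yields two columns. With a single quote selected, a row such as `342-3849,'Smith, Jenna','Rockville, MD',6` yields four fields instead of six.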


Table 8-4 describes options you can define in the Delimited File Properties dialog box for file sources:

Table 8-4. Delimited File Properties for File Sources

Delimited File Properties Options | Required/Optional | Description
Delimiters | Required | Character used to separate columns of data in the source file. Use the button to the right of this field to enter a different delimiter. Delimiters can be either printable or single-byte unprintable characters, and must be different from the escape character and the quote character (if selected). You cannot select unprintable multibyte characters as delimiters. The delimiter must be in the same code page as the flat file code page.
Treat Consecutive Delimiters as One | Optional | By default, the PowerCenter Server reads pairs of delimiters as a null value. If selected, the PowerCenter Server reads any number of consecutive delimiter characters as one. For example, a source file uses a comma as the delimiter character and contains the following record: 56, , , Jane Doe. By default, the PowerCenter Server reads that record as four columns separated by three delimiters: 56, NULL, NULL, Jane Doe. If you select this option, the PowerCenter Server reads the record as two columns separated by one delimiter: 56, Jane Doe.
Optional Quotes | Required | Select No Quotes, Single Quote, or Double Quotes. If you select a quote character, the PowerCenter Server ignores delimiter characters within the quote characters. Therefore, the PowerCenter Server uses quote characters to escape the delimiter. For example, a source file uses a comma as a delimiter and contains the following row: 342-3849, ‘Smith, Jenna’, ‘Rockville, MD’, 6. If you select the optional single quote character, the PowerCenter Server ignores the commas within the quotes and reads the row as four fields. If you do not select the optional single quote, the PowerCenter Server reads six separate fields. When the PowerCenter Server reads two optional quote characters within a quoted string, it treats them as one quote character. For example, the PowerCenter Server reads the following quoted string as I’m going tomorrow: 2353, ‘I’’m going tomorrow.’, MD. Additionally, if you select an optional quote character, the PowerCenter Server only reads a string as a quoted string if the quote character is the first character of the field. Note: You can improve session performance if the source file does not contain quotes or escape characters.
Code Page | Required | Select the code page of the delimited file. The default setting is the client code page.
Escape Character | Optional | Character immediately preceding a delimiter character embedded in an unquoted string, or immediately preceding the quote character in a quoted string. When you specify an escape character, the PowerCenter Server reads the delimiter character as a regular character (called escaping the delimiter or quote character). Note: You can improve session performance for mappings containing Sequence Generator transformations if the source file does not contain quotes or escape characters.
Remove Escape Character From Data | Optional | This option is selected by default. Clear this option to include the escape character in the output string.
Number of Initial Rows to Skip | Optional | The PowerCenter Server skips the specified number of rows before reading the file. Use this to skip title or header rows in the file.

Configuring Line Sequential Buffer Length

You can configure the line buffer length for file sources. By default, the PowerCenter Server reads a file record into a buffer that holds 1024 bytes. If the source file records are larger than 1024 bytes, increase the Line Sequential Buffer Length property in the session properties accordingly.

Figure 8-13 shows the Advanced settings on the Config Object tab in the session properties where you define the line buffer length:

Figure 8-13. Line Sequential Buffer Length Property for File Sources
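Before changing the Line Sequential Buffer Length property, it can help to measure the longest record in a sample of the source file. The sketch below does this for records supplied as byte strings. The function names are hypothetical, and counting the line terminator toward the record length is a conservative assumption, since the guide does not say whether the terminator occupies buffer space.

```python
from typing import Iterable

# Default buffer size stated in this guide.
DEFAULT_LINE_BUFFER_BYTES = 1024


def longest_record_bytes(lines: Iterable[bytes]) -> int:
    """Return the length in bytes of the longest record, terminator included."""
    return max((len(line) for line in lines), default=0)


def suggested_buffer_length(
    lines: Iterable[bytes], default: int = DEFAULT_LINE_BUFFER_BYTES
) -> int:
    """Keep the default unless some record is larger than it."""
    return max(default, longest_record_bytes(lines))
```

To apply this to a file, open it in binary mode and pass the file object, which yields one `bytes` line at a time. A result above 1024 indicates that the property likely needs to be increased to at least that value.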


Server Handling for File Sources

When you configure a session with file sources, you might take these additional features into account when creating mappings with file sources:

♦ Character set

♦ Multibyte character error handling

♦ Null character handling

♦ Row length handling for fixed-width flat files

♦ Numeric data handling

♦ Tab handling

Character Set

You can configure the PowerCenter Server to run sessions in either ASCII or Unicode data movement mode.

Table 8-5 describes source file formats supported by each data movement mode in PowerCenter:

If you configure a session to run in ASCII data movement mode, delimiters, escape characters, and null characters must be valid in the ISO Western European Latin 1 code page. Any 8-bit characters you specified in previous versions of PowerCenter are still valid. In Unicode data movement mode, delimiters, escape characters, and null characters must be valid in the specified code page of the flat file.

For more information about configuring and working with data movement modes, see “Globalization Overview” in the Installation and Configuration Guide.

Table 8-5. Support for ASCII and Unicode Data Movement Modes

Character Set | Unicode mode | ASCII mode
7-bit ASCII | Supported | Supported
US-EBCDIC (COBOL sources only) | Supported | Supported
8-bit EBCDIC (COBOL sources only) | Supported | Supported
ASCII-based MBCS | Supported | PowerCenter Server generates a warning message.
EBCDIC-based MBCS | Supported | Not supported. The PowerCenter Server terminates the session.


Multibyte Character Error Handling

Misalignment of multibyte data in a file causes session errors. Data becomes misaligned when you place column breaks incorrectly in a file, resulting in multibyte characters that extend beyond the last byte in a column.

When you import a fixed-width flat file, you can create, move, or delete column breaks using the Flat File Wizard. Incorrect positioning of column breaks can create alignment errors when you run a session containing multibyte characters.

The PowerCenter Server handles alignment errors in fixed-width flat files according to the following guidelines:

♦ Non-line sequential file. The PowerCenter Server skips rows containing misaligned data and resumes reading the next row. The skipped row appears in the session log with a corresponding error message. If an alignment error occurs at the end of a row, the PowerCenter Server skips both the current row and the next row, and writes them to the session log.

♦ Line sequential file. The PowerCenter Server skips rows containing misaligned data and resumes reading the next row. The skipped row appears in the session log with a corresponding error message.

♦ Reader error threshold. You can configure a session to stop after a specified number of non-fatal errors. A row containing an alignment error increases the error count by 1. The session stops if the number of rows containing errors reaches the threshold set in the session properties. Errors and corresponding error messages appear in the session log file.
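The interaction between alignment errors and the reader error threshold can be sketched as follows. This is a simplified simulation, not the server's reader: it uses UTF-8 as a stand-in multibyte code page, treats a column break that splits a character as an alignment error, and assumes that a threshold of 0 means no limit. All names are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple


class AlignmentError(Exception):
    """Raised when a column break falls inside a multibyte character."""


def split_columns(record: bytes, widths: Sequence[int], encoding: str = "utf-8") -> List[str]:
    """Cut a record at fixed byte offsets and decode each column."""
    fields, offset = [], 0
    for width in widths:
        chunk = record[offset:offset + width]
        try:
            fields.append(chunk.decode(encoding))
        except UnicodeDecodeError as exc:
            raise AlignmentError(f"misaligned data in column at byte {offset}") from exc
        offset += width
    return fields


def read_records(
    records: Iterable[bytes], widths: Sequence[int], error_threshold: int = 0
) -> Tuple[List[List[str]], List[Tuple[int, str]], bool]:
    """Return (good rows, skipped rows with messages, stopped flag)."""
    good: List[List[str]] = []
    skipped: List[Tuple[int, str]] = []
    for row_number, record in enumerate(records, start=1):
        try:
            good.append(split_columns(record, widths))
        except AlignmentError as exc:
            # Each misaligned row adds 1 to the error count.
            skipped.append((row_number, str(exc)))
            if error_threshold and len(skipped) >= error_threshold:
                return good, skipped, True
    return good, skipped, False
```

In this sketch, a record such as `b"ab\xc3\xa9cd"` read with two three-byte columns is skipped because the first column ends in the middle of a two-byte character. With a threshold of 1, reading stops at that row.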

Fixed-width COBOL sources are always byte-oriented and can be line sequential. The PowerCenter Server handles COBOL files according to the following guidelines:

♦ Line sequential files. The PowerCenter Server skips rows containing misaligned data and writes the skipped rows to the session log. The session stops if the number of error rows reaches the error threshold.

♦ Non-line sequential files. The session stops at the first row containing misaligned data.

Null Character Handling

You can specify single-byte or multibyte null characters for fixed-width flat files. The PowerCenter Server uses these characters to determine if a column is null.


Table 8-6 describes how the PowerCenter Server uses the Null Character and Repeat Null Character properties to determine if a column is null:

Table 8-6. Null Character Handling

Null Character | Repeat Null Character | PowerCenter Server Behavior
Binary | Disabled | A column is null if the first byte in the column is the binary null character. The PowerCenter Server reads the rest of the column as text data only to determine the column alignment and track the shift state for shift sensitive code pages. If data in the column is misaligned, the PowerCenter Server skips the row and writes the skipped row and a corresponding error message to the session log.
Non-binary | Disabled | A column is null if the first character in the column is the null character. The PowerCenter Server reads the rest of the column only to determine the column alignment and track the shift state for shift sensitive code pages. If data in the column is misaligned, the PowerCenter Server skips the row and writes the skipped row and a corresponding error message to the session log.
Binary | Enabled | A column is null if it contains only the specified binary null character. The next column inherits the initial shift state of the code page.
Non-binary | Enabled | A column is null if the repeating null character fits into the column exactly, with no bytes leftover. For example, a five-byte column is not null if you specify a two-byte repeating null character. In shift-sensitive code pages, shift bytes do not affect the null value of a column. A column is still null if it contains a shift byte at the beginning or end of the column. Informatica recommends you specify a single-byte null character if you use repeating non-binary null characters. This ensures that repeating null characters fit into a column exactly.

Row Length Handling for Fixed-Width Flat Files

For fixed-width flat files, data in a row can be shorter than the row length in the following situations:

♦ The file is fixed-width line-sequential with a carriage return or line feed that appears sooner than expected.

♦ The file is fixed-width non-line sequential, and the last line in the file is shorter than expected.

In these cases, the PowerCenter Server reads the data but does not append any blanks to fill the remaining bytes. The PowerCenter Server reads subsequent fields as NULL. Fields containing repeating null characters that do not fill the entire field length are not considered NULL.
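The null-character rules in Table 8-6 can be approximated by a single predicate over the raw bytes of a column. This is a sketch under simplifying assumptions: the null character is passed as its encoded bytes, shift bytes in shift-sensitive code pages are ignored, and the function name is hypothetical.

```python
def column_is_null(column: bytes, null_char: bytes, repeat_null_character: bool) -> bool:
    """Decide whether a fixed-width column holds a null value."""
    if not column or not null_char:
        return False
    if not repeat_null_character:
        # Repeat disabled: only the first byte or character is checked.
        return column.startswith(null_char)
    # Repeat enabled: the null character must fill the column exactly.
    repetitions, leftover = divmod(len(column), len(null_char))
    return leftover == 0 and column == null_char * repetitions
```

For example, with a two-byte null character, a five-byte column can never be null when Repeat Null Character is enabled, because one byte is always left over. A six-byte column filled with three repetitions is null.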


Numeric Data Handling

Sometimes, file sources contain non-numeric data in numeric columns. When the PowerCenter Server reads non-numeric data, it treats the row differently, depending on the source type. When the PowerCenter Server reads non-numeric data from numeric columns in a flat file source or an XML source, it drops the row and writes the row to the session log. When the PowerCenter Server reads non-numeric data for numeric columns in a COBOL source, it reads a null value for the column.
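The difference between the source types can be sketched with a small reader for a single numeric column. This is an illustration only: the function name and the source type labels are hypothetical, and Python's `Decimal` parser stands in for whatever numeric validation the server performs.

```python
from decimal import Decimal, InvalidOperation
from typing import List, Optional, Sequence, Tuple


def read_numeric_column(
    values: Sequence[str], source_type: str
) -> Tuple[List[Optional[Decimal]], List[str]]:
    """Return (values read, rows dropped) for one numeric column."""
    if source_type not in ("flat_file", "xml", "cobol"):
        raise ValueError(f"unknown source type: {source_type!r}")
    read: List[Optional[Decimal]] = []
    dropped: List[str] = []
    for raw in values:
        try:
            read.append(Decimal(raw))
        except InvalidOperation:
            if source_type == "cobol":
                # COBOL sources: the column is read as a null value.
                read.append(None)
            else:
                # Flat file and XML sources: the row is dropped and logged.
                dropped.append(raw)
    return read, dropped
```

Given the values `12`, `x7`, and `3.5`, the flat file case keeps two rows and reports `x7` as dropped, while the COBOL case keeps all three rows with a null in the second position.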


Using a File List

You can create a session to run multiple source files for one source instance in the mapping. You might use this feature if, for example, your company collects data at several locations which you then want to move through the same session. When you create a mapping to use multiple source files for one source instance, the properties of all files must exactly match the source definition.

To use multiple source files, you create a file containing the names and directories of each source file you want the PowerCenter Server to use. This file is referred to as a file list.

When you configure the session properties, enter the file name of the file list in the Source Filename field and enter the location of the file list in the Source File Directory field. When the session starts, the PowerCenter Server reads the file list, then locates and reads the first file source in the list. After the PowerCenter Server reads the first file, it locates and reads the next file in the list.

The PowerCenter Server writes the path and name of the file list to the session log. If the PowerCenter Server encounters an error while accessing a source file, it logs the error in the session log and stops the session.

Note: When you use a file list and the session performs incremental aggregation, the PowerCenter Server performs incremental aggregation across all listed source files.

Creating the File List

The file list contains the names of all the source files you want the PowerCenter Server to use for the source instance in the session. Create the file list in an editor appropriate to the PowerCenter Server platform and save it as a text file. For example, you can create a file list for a PowerCenter Server on Windows with any text editor then save it as ASCII.

The PowerCenter Server interprets the file list using the PowerCenter Server code page. Each file in the list must use the user-defined code page configured in the source definition. This code page must be a subset of the repository code page.

Each file in the file list must share the same file properties as configured in the source definition or as entered for the source instance in the session property sheet. You can enter different paths for each file in the list, but for the session to complete successfully, the paths must be local to the PowerCenter Server machine. Map the drives on a PowerCenter Server on Windows or mount the drives on a PowerCenter Server on UNIX, as necessary. If you do not specify a path for a file, the PowerCenter Server assumes the file is in the same directory as the file list.

The file list must follow these guidelines:

♦ Text file

♦ One file name, or path and file name, for each line


The PowerCenter Server skips blank lines and ignores leading blank spaces. Any characters indicating a new line, such as \n in ASCII files, must be valid in the code page of the PowerCenter Server.

The following example shows a valid file list created for a PowerCenter Server on Windows. Each of the listed drives is mapped on the server machine. The western_trans.dat file is located in the same directory as the file list.

western_trans.dat

d:\data\eastern_trans.dat

e:\data\midwest_trans.dat

f:\data\canada_trans.dat

Once you create the file list, place it in a directory local to the PowerCenter Server.
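The file list rules above (one entry per line, blank lines skipped, leading blanks ignored, and entries without a path resolved against the directory of the file list) can be sketched as a short parser. The function name and the `c:\lists` directory used in the usage note are hypothetical. The sketch assumes Windows-style paths, as in the example, and does not check that the files exist.

```python
from pathlib import PureWindowsPath
from typing import List


def parse_file_list(text: str, file_list_directory: str) -> List[str]:
    """Return the source file paths named in a file list."""
    base = PureWindowsPath(file_list_directory)
    paths: List[str] = []
    for line in text.splitlines():
        entry = line.lstrip(" ")  # leading blank spaces are ignored
        if not entry:
            continue  # blank lines are skipped
        path = PureWindowsPath(entry)
        if len(path.parts) == 1:
            # No path given: assume the same directory as the file list.
            path = base / path
        paths.append(str(path))
    return paths
```

Applied to the example list stored in a hypothetical `c:\lists` directory, the first entry resolves to `c:\lists\western_trans.dat`, and the other three entries keep the paths given in the list.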

Configuring a Session to Use a File List

After you create a file list for multiple source files, you can configure the session to access those files.

To use multiple source files for one source instance in a session:

1. In the Workflow Manager, open the session properties.

2. Click the Mapping tab and open the Transformations view.


3. Click the Properties settings in the Sources node.

4. In the Source Filetype field, choose Indirect.

5. In the Source Filename field, replace the file name with the name of the file list.

If necessary, also enter the path in the Source File Directory field.

If you enter only a file name in the Source Filename field, and you have specified a path in the Source File Directory field, the PowerCenter Server looks for the named file in the listed directory.

If you enter only a file name in the Source Filename field, and you do not specify a path in the Source File Directory field, the PowerCenter Server looks for the named file in the directory where the PowerCenter Server is installed on UNIX or in the system directory on Windows.

6. Click OK.



Chapter 9

Working with Targets


♦ Overview, 234

♦ Configuring Targets in a Session, 236

♦ Working with Relational Targets, 240

♦ Working with Target Connection Groups, 257

♦ Working with Active Sources, 259

♦ Working with File Targets, 261

♦ Server Handling for File Targets, 268

♦ Working with Heterogeneous Targets, 274


Overview

In the Workflow Manager, you can create sessions with the following targets:

♦ Relational. You can load data to any relational database that the PowerCenter Server can connect to. When loading data to relational targets, you must configure the database connection to the target before you configure the session.

♦ File. You can load data to a flat file or XML target. The PowerCenter Server can load data to any local directory or FTP connection for the target file. If the file target requires an FTP connection, you need to configure the FTP connection to the host machine before you create the session.

♦ Heterogeneous. You can output data to multiple targets in the same session. You can output to multiple relational targets, such as Oracle and Microsoft SQL Server. Or, you can output to multiple target types, such as relational and flat file. For more information, see “Working with Heterogeneous Targets” on page 274.

Globalization Features

You can configure the PowerCenter Server to run sessions in either ASCII or Unicode data movement mode.

Table 9-1 describes target character sets supported by each data movement mode in PowerCenter:

PowerCenter allows you to work with targets that use multibyte character sets. You can choose a code page that you want the PowerCenter Server to use for relational objects and flat files. You specify code pages for relational objects when you configure database connections in the Workflow Manager. The code page for a database connection used as a target must be a superset of the repository code page.

When you change the database connection code page to one that is not two-way compatible with the old code page, the Workflow Manager generates a warning and invalidates all sessions that use that database connection.

Table 9-1. Support for ASCII and Unicode Data Movement Modes

Character Set | Unicode Mode | ASCII Mode
ASCII-based MBCS | Supported | PowerCenter Server generates a warning message, but does not terminate the session.
UTF-8 | Supported (targets only) | PowerCenter Server generates a warning message, but does not terminate the session.


Code pages you select for a file represent the code page of the data contained in these files. If you are working with flat files, you can also specify delimiters and null characters supported by the code page you have specified for the file.

Target code pages must be a superset of the repository code page. They must also be a superset of the source code page and the PowerCenter Server code page.

However, if you configure the PowerCenter Server and Client for relaxed code page validation, you can select any code page supported by PowerCenter for the target database connection. When using relaxed code page validation, select compatible code pages for the source and target data to prevent data inconsistencies. For more information about code page compatibility, see “Globalization Overview” in the Installation and Configuration Guide.

If the target contains multibyte character data, configure the PowerCenter Server to run in Unicode mode. When the PowerCenter Server runs a session in Unicode mode, it uses the database code page to translate data.

If the target contains only single-byte characters, configure the PowerCenter Server to run in ASCII mode. When the PowerCenter Server runs a session in ASCII mode, it does not validate code pages.
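As a rough illustration of the choice between the two data movement modes, the following Python sketch inspects sample target data and reports whether any character needs more than one byte in a given code page. It is an assumption-laden aid, not a PowerCenter feature: the function name `suggest_data_movement_mode` is hypothetical, and the code page names are Python codec names rather than PowerCenter code page names.

```python
def suggest_data_movement_mode(samples, code_page):
    """Suggest "Unicode" if any sample character is multibyte in code_page.

    Raises UnicodeEncodeError if a character cannot be represented in the
    code page at all, which points to an incompatible code page.
    """
    for text in samples:
        for ch in text:
            if len(ch.encode(code_page)) > 1:
                return "Unicode"
    return "ASCII"
```

With these assumptions, Western European text checked against the latin-1 codec yields "ASCII", while Japanese text checked against the shift_jis codec yields "Unicode".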

Target Connections

Before you can load data to a target, you must configure the connection properties the PowerCenter Server uses to connect to the target file or database. You can configure target database and FTP connections in the Workflow Manager.

For details on creating database connections, see “Setting Up a Relational Database Connection” on page 53. For details on creating FTP connections, see “Using FTP” on page 559.

Partitioning Targets

When you create multiple partitions in a session with a relational target, the PowerCenter Server creates multiple connections to the target database to write target data concurrently. When you create multiple partitions in a session with a file target, the PowerCenter Server creates one target file for each partition. You can configure the session properties to merge these target files.

For details on configuring a session for pipeline partitioning, see “Pipeline Partitioning” on page 345.

Permissions and Privileges

You must have execute permissions for connection objects associated with the session. For example, if the target requires database connections or FTP connections, you must have read permission on the connections to configure the session, and execute permission to run the session.


Configuring Targets in a Session

Configure target properties for sessions in the Transformations view on the Mapping tab of the session properties. Click the Targets node to view the target properties. When you configure target properties for a session, you define properties for each target instance in the mapping.

Figure 9-1 shows where you define target properties in a session:

The Targets node contains the following settings where you define properties:

♦ Writers

♦ Connections

♦ Properties

Configuring Writers

Click the Writers settings in the Transformations view to define the writer to use with each target instance.

Figure 9-1. Defining Target Properties in the Session Properties (callouts: Transformations View, Targets Node, Writers Settings, Connections Settings, Properties Settings)


Figure 9-2 shows where you define the writer to use with each target instance:

When the mapping target is a flat file, an XML file, an SAP BW target, or an IBM MQSeries target, the Workflow Manager specifies the necessary writer in the session properties. However, when the target in the mapping is relational, you can change the writer type to File Writer if you plan to use an external loader.

Note: You can change the writer type for non-reusable sessions in the Workflow Designer and for reusable sessions in the Task Developer. You cannot change the writer type for instances of reusable sessions in the Workflow Designer.

When you override a relational target to use the file writer, the Workflow Manager changes the properties for that target instance on the Properties settings. It also changes the connection options you can define in the Connections settings.

After you override a relational target to use a file writer, define the file properties for the target. Click Set File Properties and choose the target to define. For more information, see “Configuring Fixed-Width Properties” on page 265 and “Configuring Delimited Properties” on page 266.

Configuring Connections

View the Connections settings on the Mapping tab to define target connection information.

Figure 9-2. Writers Settings on the Mapping Tab of the Session Properties


Figure 9-3 shows the Connections settings on the Mapping tab of the session properties:

For relational targets, the Workflow Manager displays Relational as the target type by default. In the Value column, choose a configured database connection for each relational target instance. For details on configuring database connections, see “Target Database Connection” on page 241.

For flat file and XML targets, choose one of the following target connection types in the Type column for each target instance:

♦ FTP. If you want to load data to a flat file or XML target using FTP, you must specify an FTP connection when you configure target options. FTP connections must be defined in the Workflow Manager prior to configuring sessions.

You must have read permission for any FTP connection you want to associate with the session. The user starting the session must have execute permission for any FTP connection associated with the session. For details on using FTP, see “Using FTP” on page 559.

♦ Loader. You can use the external loader option to improve the load speed to Oracle, DB2, Sybase IQ, or Teradata target databases.

To use this option, you must use a mapping with a relational target definition and choose File as the writer type on the Writers settings for the relational target instance. The PowerCenter Server uses an external loader to load target files to the Oracle, DB2, Sybase IQ, or Teradata database. You cannot choose external loader if the target is defined in the mapping as a flat file, XML, MQ, or SAP BW target.

Figure 9-3. Connections Settings on the Mapping Tab of the Session Properties (callouts: Choose a connection; Edit a connection)

For details on using the external loader feature, see “External Loading” on page 523.

♦ Queue. Choose Queue when you want to output to an IBM MQSeries message queue. For details, see the PowerCenter Connect for IBM MQSeries User and Administrator Guide.

♦ None. Choose None when you want to write to a local flat file or XML file.

Configuring Properties

View the Properties settings on the Mapping tab to define target property information. The Workflow Manager displays different properties for the different target types: relational, flat file, and XML.

Figure 9-4 shows the Properties settings on the Mapping tab:

For more information on relational target properties, see “Working with Relational Targets” on page 240. For more information on flat file target properties, see “Working with File Targets” on page 261. For more information on XML target properties, see “Working with Heterogeneous Targets” on page 274.

For more information on configuring sessions with multiple target types, see “Working with Heterogeneous Targets” on page 274.

Figure 9-4. Properties Settings on the Mapping Tab of the Session Properties


Working with Relational Targets

When you configure a session to load data to a relational target, you define most properties in the Transformations view on the Mapping tab. You also define some properties on the Properties tab and the Config Object tab.

You can configure the following properties for relational targets:

♦ Target database connection. Define database connection information. For more information, see “Target Database Connection” on page 241.

♦ Target properties. You can define target properties such as target load type, target update options, and reject options. For more information, see “Target Properties” on page 241.

♦ Truncate target tables. The PowerCenter Server can truncate target tables before loading data. For more information, see “Truncating Target Tables” on page 245.

♦ Deadlock retry. You can configure the session to retry deadlocks when writing to targets. For more information, see “Deadlock Retry” on page 246.

♦ Drop and recreate indexes. Use pre- and post-session SQL to drop and recreate an index on a relational target table to optimize query speed. For more information, see “Dropping and Recreating Indexes” on page 248.

♦ Constraint-based loading. The PowerCenter Server can load data to targets based on primary key-foreign key constraints and active sources in the session mapping. For more information, see “Constraint-Based Loading” on page 248.

♦ Bulk loading. You can specify bulk mode when loading to DB2, Microsoft SQL Server, Oracle, and Sybase databases. For more information, see “Bulk Loading” on page 252.

You can define the following properties in the session and override the properties you define in the mapping:

♦ Table name prefix. You can specify the target owner name or prefix in the session properties to override the table name prefix in the mapping. For more information, see “Table Name Prefix” on page 254.

♦ Pre-session SQL. You can create SQL commands and execute them in the target database before loading data to the target. For example, you might want to drop the index for the target table before loading data into it. For more information, see “Using Pre- and Post-Session SQL Commands” on page 186.

♦ Post-session SQL. You can create SQL commands and execute them in the target database after loading data to the target. For example, you might want to recreate the index for the target table after loading data into it. For more information, see “Using Pre- and Post-Session SQL Commands” on page 186.

If any target table or column name contains a database reserved word, you can create and maintain a reserved words file containing database reserved words. When the PowerCenter Server executes SQL against the database, it places quotes around the reserved words. For more information, see “Reserved Words” on page 255.
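The quoting behavior can be pictured with a short Python sketch. Everything in it is illustrative: the sample reserved words, the helper names `quote_reserved` and `build_insert`, the use of double quotes, and the case-insensitive match are assumptions, and the sketch does not read or model the actual reserved words file.

```python
# Hypothetical sample of database reserved words.
RESERVED_WORDS = {"MONTH", "DATE", "INTERVAL"}


def quote_reserved(name, reserved=RESERVED_WORDS):
    """Wrap a table or column name in quotes if it is a reserved word."""
    return f'"{name}"' if name.upper() in reserved else name


def build_insert(table, columns):
    """Build a parameterized INSERT statement with reserved words quoted."""
    cols = ", ".join(quote_reserved(c) for c in columns)
    marks = ", ".join("?" for _ in columns)
    return f"INSERT INTO {quote_reserved(table)} ({cols}) VALUES ({marks})"
```

Under these assumptions, a target T_SALES with columns ID and MONTH would produce a statement in which only MONTH is quoted.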

When the PowerCenter Server runs a session with at least one relational target, it performs database transactions per target connection group. For example, it commits all data to targets in a target connection group at the same time. For more information, see “Working with Target Connection Groups” on page 257.

Target Database Connection

Before you can run a session to load data to a target database, the PowerCenter Server must connect to the target database. Database connections must exist in the repository to appear on the target database list. You must define them prior to configuring a session. For details on configuring a database connection, see “Configuring the Workflow Manager” on page 37.

You can choose the target connections in the Transformations view of the Mapping tab. Click either the Targets or Connections node and select the database connection from the list for each target instance. You must have read permission for the target database connection to configure the session to use it. The user starting the configured session must have execute permission for target database connections.

Target Properties

You can configure session properties for relational targets in the Transformations view on the Mapping tab, and in the General Options settings on the Properties tab. Define the properties for each target instance in the session.

When you click the Transformations view on the Mapping tab, you can view and configure the settings of a specific target. Select the target under the Targets node.


Figure 9-5 shows the relational target properties you define in the Properties settings on the Mapping tab:

Table 9-2 describes the properties available in the Properties settings on the Mapping tab of the session properties:

Figure 9-5. Properties Settings on the Mapping Tab for a Relational Target (callout: Edit settings for a particular target)

Table 9-2. Relational Target Properties

Target Property | Required/Optional | Description
Target Load Type | Required | You can choose Normal or Bulk. If you select Normal, the PowerCenter Server loads targets normally. You can only choose Bulk when you load to DB2, Sybase, Oracle, or Microsoft SQL Server. If you specify Bulk for other database types, the PowerCenter Server reverts to a normal load. Note: Choose Normal mode if the mapping contains an Update Strategy transformation. For more information, see “Bulk Loading” on page 252.
Insert* | Optional | If selected, the PowerCenter Server inserts all rows flagged for insert. By default, this option is selected.
Update (as Update)* | Optional | If selected, the PowerCenter Server updates all rows flagged for update. By default, this option is selected.
Update (as Insert)* | Optional | If selected, the PowerCenter Server inserts all rows flagged for update. By default, this option is not selected.
Update (else Insert)* | Optional | If selected, the PowerCenter Server updates rows flagged for update if they exist in the target, then inserts any remaining rows marked for insert. By default, this option is not selected.
Delete* | Optional | If selected, the PowerCenter Server deletes all rows flagged for delete. By default, this option is selected.
Truncate Table | Optional | If selected, the PowerCenter Server truncates the target before loading. By default, this option is not selected. For details on this feature, see “Truncating Target Tables” on page 245.
Reject File Directory | Optional | Enter the directory name in this field. By default, the PowerCenter Server writes all reject files to the server variable directory, $PMBadFileDir. If you specify both the directory and file name in the Reject Filename field, clear this field. The PowerCenter Server concatenates this field with the Reject Filename field when it runs the session. You can also use the $BadFileName session parameter to specify the file directory. For details on session parameters, see “Session Parameters” on page 495.
Reject Filename | Required | Enter the file name, or file name and path. By default, the PowerCenter Server names the reject file after the target instance name: target_name.bad. Optionally use the $BadFileName session parameter for the file name. The PowerCenter Server concatenates this field with the Reject File Directory field when it runs the session. For example, if you have “C:\reject_file\” in the Reject File Directory field, and enter “filename.bad” in the Reject Filename field, the PowerCenter Server writes rejected rows to C:\reject_file\filename.bad. For details on session parameters, see “Session Parameters” on page 495.

*For details on target update strategies, see “Update Strategy Transformation” in the Transformation Guide.
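The row-handling options in Table 9-2 can be summarized in a small Python sketch that maps a row flag to the statement the writer would issue. This is one reading of the table, not the PowerCenter implementation: the function name `write_action` and its parameters are hypothetical, the three update options are modeled as a single `update_mode` value, Update (else Insert) is read as "update if the row exists, otherwise insert", and returning None for a deselected option is an assumption because the table does not say what happens to such rows.

```python
def write_action(flag, *, insert=True, update_mode="update",
                 delete=True, row_exists=False):
    """Return "INSERT", "UPDATE", "DELETE", or None for a flagged row.

    Defaults mirror the defaults in Table 9-2: Insert, Update (as Update),
    and Delete selected.
    """
    if flag == "insert":
        return "INSERT" if insert else None
    if flag == "delete":
        return "DELETE" if delete else None
    if flag == "update":
        if update_mode == "update":             # Update (as Update)
            return "UPDATE"
        if update_mode == "insert":             # Update (as Insert)
            return "INSERT"
        if update_mode == "update_else_insert":  # Update (else Insert)
            return "UPDATE" if row_exists else "INSERT"
        return None                             # no update option selected
    raise ValueError(f"unknown row flag: {flag!r}")
```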



Figure 9-6 shows the test load options in the General Options settings on the Properties tab:

Table 9-3 describes the test load options on the General Options settings on the Properties tab:

Figure 9-6. Test Load Options

Table 9-3. Test Load Options

Property | Required/Optional | Description
Enable Test Load | Optional | You can configure the PowerCenter Server to perform a test load. With a test load, the PowerCenter Server reads and transforms data without writing to targets. The PowerCenter Server generates all session files, and performs all pre- and post-session functions, as if running the full session. The PowerCenter Server writes data to relational targets, but rolls back the data when the session completes. For all other target types, such as flat file and SAP BW, the PowerCenter Server does not write data to the targets. Enter the number of source rows you want to test in the Number of Rows to Test field. You cannot perform a test load on sessions using XML sources. Note: You can perform a test load for relational targets when you configure a session for normal mode. If you configure the session for bulk mode, the session fails.
Number of Rows to Test | Optional | Enter the number of source rows you want the PowerCenter Server to test load. The PowerCenter Server reads the exact number you configure for the test load.
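The write-then-roll-back behavior described for relational targets can be imitated with a short Python sketch. It uses an in-memory SQLite database as a stand-in for the target, which is an assumption made purely for illustration. The table t_orders and the function `run_test_load` are hypothetical.

```python
import sqlite3


def run_test_load(conn, source_rows, rows_to_test):
    """Write the first rows_to_test rows, then roll the writes back."""
    processed = 0
    for row in source_rows[:rows_to_test]:
        conn.execute(
            "INSERT INTO t_orders (order_id, amount) VALUES (?, ?)", row
        )
        processed += 1
    conn.rollback()   # the target is left unchanged after the test
    return processed


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_orders (order_id INTEGER, amount REAL)")
processed = run_test_load(conn, [(1, 9.5), (2, 20.0), (3, 7.25)], 2)
remaining = conn.execute("SELECT COUNT(*) FROM t_orders").fetchone()[0]
```

In this sketch the load exercises the insert path for exactly the requested number of rows, and the target table is empty again once the rollback completes.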


Truncating Target Tables

The PowerCenter Server can truncate target tables before running a session. You can choose to truncate tables on a target-by-target basis. If you have more than one target instance, you only have to select the truncate target table option for one target instance.

Depending on the target database and primary key-foreign key relationships in the session target, the PowerCenter Server might issue a delete or truncate command.

Table 9-4 lists the commands that the PowerCenter Server issues for each database:

If the PowerCenter Server issues a truncate target table command and the target table instance specifies a table name prefix, the PowerCenter Server verifies the database user privileges for the target table by issuing a truncate command. If the database user is not specified as the target owner name or does not have the database privilege to truncate the target table, the PowerCenter Server automatically issues a delete command instead and writes the following error message to the session log:

WRT_8208 Error truncating target table <target table name> trying DELETE FROM query.

If the PowerCenter Server issues a delete command and the database has logging enabled, the database saves all deleted records to the log for rollback. If you do not want to save deleted records for rollback, you can disable logging to improve the speed of the delete.

For all databases, if the PowerCenter Server fails to truncate or delete any selected table because the user lacks the necessary privileges, the session fails.
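The command selection in Table 9-4 and the fallback to a delete command can be sketched as a lookup in Python. This is a simplified model, not PowerCenter code: the names `COMMANDS`, `clear_command`, and `fallback_command` are hypothetical, the table footnotes (DB2 on AS/400 and the Microsoft SQL Server ODBC driver) are not modeled, and the fallback ignores the table name prefix condition described above.

```python
# Commands from Table 9-4, keyed by target database. Each value is
# (command when a foreign key references the table's primary key,
#  command when no foreign key references it).
COMMANDS = {
    "DB2": ("truncate table {t}", "truncate table {t}"),
    "Informix": ("delete from {t}", "delete from {t}"),
    "ODBC": ("delete from {t}", "delete from {t}"),
    "Oracle": ("delete from {t} unrecoverable", "truncate table {t}"),
    "Microsoft SQL Server": ("delete from {t}", "truncate table {t}"),
    "Sybase 11.x": ("truncate table {t}", "truncate table {t}"),
}


def clear_command(database, table, pk_referenced):
    """Return the command used to empty the target table."""
    with_fk, without_fk = COMMANDS[database]
    return (with_fk if pk_referenced else without_fk).format(t=table)


def fallback_command(command, table, can_truncate):
    """Model the WRT_8208 case: a truncate the user cannot run becomes a delete."""
    if command.startswith("truncate") and not can_truncate:
        return f"delete from {table}"
    return command
```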

Table 9-4. PowerCenter Server Commands on Supported Databases

Target Database | Table contains a primary key referenced by a foreign key | Table does not contain a primary key referenced by a foreign key
DB2 | truncate table <table_name>* | truncate table <table_name>*
Informix | delete from <table_name> | delete from <table_name>
ODBC | delete from <table_name> | delete from <table_name>
Oracle | delete from <table_name> unrecoverable | truncate table <table_name>
Microsoft SQL Server | delete from <table_name> | truncate table <table_name>**
Sybase 11.x | truncate table <table_name> | truncate table <table_name>

*If you use a DB2 database on AS/400, the PowerCenter Server issues a clrpfm command.
**If you use the Microsoft SQL Server ODBC driver, the PowerCenter Server issues a delete statement.

If you use truncate target tables with one of the following functions, the PowerCenter Server does not truncate target tables for the session:

♦ Incremental aggregation. When you enable both truncate target tables and incremental aggregation in the session properties, the Workflow Manager issues a warning that you cannot enable truncate target tables and incremental aggregation in the same session.

♦ Test load. When you enable both truncate target tables and test load, the PowerCenter Server disables the truncate table function, runs a test load session, and writes the following message to the session log:

WRT_8105 Truncate target tables option turned off for test load session.

To truncate a target table:


2. Click the Mapping tab, and then click the Transformations view.

3. Click the Targets node.

4. In the Properties settings, select Truncate Target Table Option for each target table you want the PowerCenter Server to truncate before it runs the session.

5. Click OK.

Deadlock Retry

Select the Session Retry on Deadlock option in the session properties if you want the PowerCenter Server to retry target writes on a deadlock. A deadlock might occur when the PowerCenter Server attempts to take control of the same lock for a row when loading partitioned targets or when running two sessions simultaneously to the same target.



If the PowerCenter Server encounters a deadlock when it tries to write to a target, the deadlock only affects targets in the same target connection group. The PowerCenter Server still writes to targets in other target connection groups.

Encountering deadlocks can slow session performance. To improve session performance, you can increase the number of target connection groups the PowerCenter Server uses to write to the targets in a session. To use a different target connection group for each target in a session, use a different database connection name for each target instance. If you want, you can specify the same connection information for each connection name. For more information, see “Working with Target Connection Groups” on page 257.

You can only retry sessions on deadlock for targets configured for normal load. If you select this option and configure a target for bulk mode, the PowerCenter Server does not retry target writes on a deadlock for that target. You can also configure the PowerCenter Server to set the number of deadlock retries and the deadlock sleep time period. For more information on configuring the PowerCenter Server, see the Installation and Configuration Guide.
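The retry behavior, with a configurable number of retries and a sleep period between attempts, follows a common pattern that can be sketched in Python. The names here are hypothetical: `DeadlockError` stands in for whatever error the database raises, and `write_with_retry` is not a PowerCenter function.

```python
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for a database deadlock error."""


def write_with_retry(write, retries, sleep_seconds, sleep=time.sleep):
    """Call write(), retrying up to `retries` times when a deadlock occurs."""
    attempts = 0
    while True:
        try:
            return write()
        except DeadlockError:
            if attempts >= retries:
                raise             # retries exhausted, surface the deadlock
            attempts += 1
            sleep(sleep_seconds)  # wait before the next attempt
```

A larger sleep period gives the competing writer more time to release its lock, at the cost of a slower session when deadlocks are frequent.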

To retry a session on deadlock, click the Properties tab in the session properties and then scroll down to the Performance settings.

Figure 9-7 shows how to retry sessions on deadlock:

Figure 9-7. Session Retry on Deadlock



Dropping and Recreating Indexes

After you insert significant amounts of data into a target, you normally need to drop and recreate indexes on that table to optimize query speed. You can drop and recreate indexes by:

♦ Using pre- and post-session SQL. The preferred method for dropping and recreating indexes is to define a SQL statement in the Pre SQL property that drops indexes before loading data to the target. You can use the Post SQL property to recreate the indexes after loading data to the target. Define the Pre SQL and Post SQL properties for relational targets in the Transformations view on the Mapping tab in the session properties. For more information, see “Using Pre- and Post-Session SQL Commands” on page 186.

♦ Using the Designer. The same dialog box you use to generate and execute DDL code for table creation can drop and recreate indexes. However, this process is not automatic. Every time you run a session that modifies the target table, you need to launch the Designer and use this feature.
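The pre- and post-session SQL approach amounts to a drop, load, recreate sequence. The Python sketch below shows that sequence against an in-memory SQLite database, which is an assumption chosen so the example runs anywhere. The table t_orders, the index idx_orders_customer, and the function `load_with_index_rebuild` are hypothetical, and the exact DROP INDEX and CREATE INDEX syntax varies by database.

```python
import sqlite3

# Statements of the kind you might enter in the Pre SQL and Post SQL
# properties (SQLite syntax, hypothetical object names).
PRE_SQL = "DROP INDEX IF EXISTS idx_orders_customer"
POST_SQL = "CREATE INDEX idx_orders_customer ON t_orders (customer_id)"


def load_with_index_rebuild(conn, rows):
    """Drop the index, load the rows, then recreate the index."""
    conn.execute(PRE_SQL)
    conn.executemany(
        "INSERT INTO t_orders (order_id, customer_id) VALUES (?, ?)", rows
    )
    conn.execute(POST_SQL)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_orders (order_id INTEGER, customer_id INTEGER)")
conn.execute(POST_SQL)
load_with_index_rebuild(conn, [(1, 10), (2, 20)])
```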

Constraint-Based Loading

In the Workflow Manager, you can specify constraint-based loading for a session. When you select this option, the PowerCenter Server orders the target load on a row-by-row basis. For every row generated by an active source, the PowerCenter Server loads the corresponding transformed row first to the primary key table, then to any foreign key tables. Constraint-based loading depends on the following requirements:

♦ Active source. Related target tables must have the same active source.

♦ Key relationships. Target tables must have key relationships.

♦ Target connection groups. Targets must be in one target connection group.

♦ Treat rows as insert. Use this option when you insert into the target. You cannot use updates with constraint-based loading.

Active Source

When target tables receive rows from different active sources, the PowerCenter Server reverts to normal loading for those tables, but loads all other targets in the session using constraint-based loading when possible. For example, a mapping contains three distinct pipelines. The first two contain a source, source qualifier, and target. Since these two targets receive data from different active sources, the PowerCenter Server reverts to normal loading for both targets. The third pipeline contains a source, Normalizer, and two targets. Since these two targets share a single active source (the Normalizer), the PowerCenter Server performs constraint-based loading: loading the primary key table first, then the foreign key table.

For more information on active sources, see “Working with Active Sources” on page 259.

Key Relationships

When target tables have no key relationships, the PowerCenter Server does not perform constraint-based loading. Similarly, when target tables have circular key relationships, the PowerCenter Server reverts to a normal load. For example, you have one target containing a primary key and a foreign key related to the primary key in a second target. The second target also contains a foreign key that references the primary key in the first target. The PowerCenter Server cannot enforce constraint-based loading for these tables. It reverts to a normal load.

Target Connection Groups

The PowerCenter Server enforces constraint-based loading for targets in the same target connection group. If you want to specify constraint-based loading for multiple targets that receive data from the same active source, you must verify the tables are in the same target connection group. If the tables with the primary key-foreign key relationship are in different target connection groups, the PowerCenter Server cannot enforce constraint-based loading when you run the workflow.

To verify that all targets are in the same target connection group, perform the following tasks:

♦ Verify all targets are in the same target load order group and receive data from the same active source.

♦ Use the default partition properties and do not add partitions or partition points.

♦ Define the same target type for all targets in the session properties.

♦ Define the same database connection name for all targets in the session properties.

♦ Choose normal mode for the target load type for all targets in the session properties.

For more information, see “Working with Target Connection Groups” on page 257.
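The verification tasks above can be expressed as a simple check, shown here as a Python sketch. The `Target` record and the function `same_connection_group` are hypothetical, and "default partition properties" is reduced to a single partition count, which is a simplification of the actual partitioning settings.

```python
from dataclasses import dataclass


@dataclass
class Target:
    """Hypothetical summary of one target instance in a session."""
    name: str
    load_order_group: int
    active_source: str
    target_type: str
    connection: str
    load_type: str
    partitions: int = 1   # 1 stands for the default partition properties


def same_connection_group(targets):
    """Return True if every target meets the conditions listed above."""
    first = targets[0]
    return all(
        t.load_order_group == first.load_order_group
        and t.active_source == first.active_source
        and t.target_type == first.target_type
        and t.connection == first.connection
        and t.load_type == "Normal"
        and t.partitions == 1
        for t in targets
    )
```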

Treat Rows as Insert

Use constraint-based loading only when the session option Treat Source Rows As is set to Insert. You might get inconsistent data if you select a different Treat Source Rows As option and you configure the session for constraint-based loading.

When the mapping contains Update Strategy transformations and you need to load data to a primary key table first, split the mapping using one of the following options:

♦ Load primary key table in one mapping and dependent tables in another mapping. You can use constraint-based loading to load the primary table.

♦ Perform inserts in one mapping and updates in another mapping.

For more information about update strategies, see “Update Strategy Transformation” in the Transformation Guide.

Constraint-based loading does not affect the target load ordering of the mapping. Target load ordering defines the order the PowerCenter Server reads the sources in each target load order group in the mapping. A target load order group is a collection of source qualifiers, transformations, and targets linked together in a mapping. Constraint-based loading establishes the order in which the PowerCenter Server loads individual targets within a set of targets receiving data from a single source qualifier.


Example

The session for the mapping in Figure 9-8 is configured to perform constraint-based loading. In the first pipeline, target T_1 has a primary key, and T_2 and T_3 contain foreign keys referencing the T_1 primary key. T_3 has a primary key that T_4 references as a foreign key.

Since these four tables receive records from a single active source, SQ_A, the PowerCenter Server loads rows to the target in the following order:

♦ T_1

♦ T_2 and T_3 (in no particular order)

♦ T_4

The PowerCenter Server loads T_1 first because it has no foreign key dependencies and contains a primary key referenced by T_2 and T_3. The PowerCenter Server then loads T_2 and T_3, but since T_2 and T_3 have no dependencies, they are not loaded in any particular order. The PowerCenter Server loads T_4 last, because it has a foreign key that references a primary key in T_3.
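The ordering in this example is a topological ordering of the targets by their key dependencies, which can be sketched in Python with the standard graphlib module (Python 3.9 or later). The function `load_order` is hypothetical and only models the ordering of targets, not the row-by-row loading itself. Returning None for circular key relationships mirrors the reversion to a normal load described earlier.

```python
from graphlib import CycleError, TopologicalSorter


def load_order(foreign_keys):
    """Group targets into load levels based on key dependencies.

    foreign_keys maps each target to the set of targets whose primary
    keys it references. Returns None for circular key relationships.
    """
    sorter = TopologicalSorter(foreign_keys)
    try:
        sorter.prepare()
    except CycleError:
        return None
    levels = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # targets in a level have no set order
        levels.append(ready)
        sorter.done(*ready)
    return levels
```

For the first pipeline, passing T_1 with no dependencies, T_2 and T_3 depending on T_1, and T_4 depending on T_3 yields three levels: T_1, then T_2 and T_3 together, then T_4.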

After loading the first set of targets, the PowerCenter Server begins reading source B. If there are no key relationships between T_5 and T_6, the PowerCenter Server reverts to a normal load for both targets.

If T_6 has a foreign key that references a primary key in T_5, since T_5 and T_6 receive data from a single active source, the Aggregator AGGTRANS, the PowerCenter Server loads rows to the tables in the following order:

♦ T_5

♦ T_6

Figure 9-8. Mapping Using Constraint-Based Loading


T_1, T_2, T_3, and T_4 are in one target connection group if you use the same database connection for each target, and you use the default partition properties. T_5 and T_6 are in another target connection group together if you use the same database connection for each target and you use the default partition properties. The PowerCenter Server includes T_5 and T_6 in a different target connection group because they are in a different target load order group from the first four targets.

To enable constraint-based loading:

1. In the General Options settings of the Properties tab, choose Insert for the Treat Source Rows As property.



2. Click the Config Object tab. In the Advanced settings, select Constraint Based Load Ordering.

3. Click OK.

Bulk Loading

You can enable bulk loading when you load to DB2, Sybase, Oracle, or Microsoft SQL Server.

If you enable bulk loading for other database types, the PowerCenter Server reverts to a normal load. Bulk loading improves the performance of a session that inserts a large amount of data to the target database. Configure bulk loading on the Mapping tab.

When bulk loading, the PowerCenter Server invokes the database bulk utility and bypasses the database log, which speeds performance. Without writing to the database log, however, the target database cannot perform rollback. As a result, you may not be able to perform recovery. Therefore, you must weigh the importance of improved session performance against the ability to recover an incomplete session.

For more information on increasing session performance when bulk loading, see “Bulk Loading” on page 642.

Note: When loading to DB2, Microsoft SQL Server, and Oracle targets, you must specify a normal load for data driven sessions. When you specify bulk mode and data driven, the PowerCenter Server reverts to normal load.
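The reversion rules in this section can be summarized in a small decision function. This Python sketch is a reading aid under stated assumptions, not server code: the function name and the string values are hypothetical, and because the note above names only DB2, Microsoft SQL Server, and Oracle for data driven sessions, the sketch leaves Sybase unchanged in that case.

```python
# Database types the section lists as supporting bulk loading.
BULK_CAPABLE = {"DB2", "Sybase", "Oracle", "Microsoft SQL Server"}
# Database types the note names as requiring normal load when data driven.
DATA_DRIVEN_NORMAL_ONLY = {"DB2", "Microsoft SQL Server", "Oracle"}


def effective_load_type(database, requested, treat_source_rows_as="Insert"):
    """Return the load type the session would effectively use (illustrative)."""
    if requested != "Bulk":
        return "Normal"
    if database not in BULK_CAPABLE:
        return "Normal"  # other database types revert to a normal load
    if treat_source_rows_as == "Data Driven" and database in DATA_DRIVEN_NORMAL_ONLY:
        return "Normal"  # bulk mode plus data driven reverts to normal load
    return "Bulk"


print(effective_load_type("Oracle", "Bulk"))
print(effective_load_type("Oracle", "Bulk", "Data Driven"))
print(effective_load_type("Informix", "Bulk"))
```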

Committing Data

When bulk loading to Sybase and DB2 targets, the PowerCenter Server ignores the commit interval you define in the session properties and commits data when the writer block is full.

When bulk loading to Microsoft SQL Server and Oracle targets, the PowerCenter Server commits data at each commit interval. Also, Microsoft SQL Server and Oracle start a new bulk load transaction after each commit.

Tip: When bulk loading to Microsoft SQL Server or Oracle targets, define a large commit interval to reduce the number of bulk load transactions and increase performance.
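To see why a larger commit interval helps, consider the rough arithmetic below. This Python sketch assumes, for illustration only, that the server commits exactly once per commit interval and that each commit starts one new bulk load transaction; actual commit points may differ.

```python
import math


def bulk_load_transactions(total_rows, commit_interval):
    """Approximate number of bulk load transactions for a session (illustrative)."""
    return math.ceil(total_rows / commit_interval)


# One million rows with two different commit intervals.
print(bulk_load_transactions(1_000_000, 10_000))
print(bulk_load_transactions(1_000_000, 100_000))
```

Under these assumptions, raising the interval from 10,000 to 100,000 rows cuts the transaction count by a factor of ten.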

Oracle Guidelines

Oracle allows bulk loading for the following software versions:

♦ Oracle server version 8.1.5 or higher

♦ Oracle client version 8.1.7.2 or higher

You can use the Oracle client 8.1.7 if you install the Oracle Threaded Bulk Mode patch.

Use the following guidelines when bulk loading to Oracle:

♦ Do not define CHECK constraints in the database.

♦ Do not define primary and foreign keys in the database. However, you can define primary and foreign keys for the target definitions in the Designer.

♦ To bulk load into indexed tables, choose non-parallel mode. To do this, you must disable the Enable Parallel Mode option. For more information, see “Configuring a Relational Database Connection” on page 56.

Note that when you disable parallel mode, you cannot load multiple target instances, partitions, or sessions into the same table.

To bulk load in parallel mode, you must drop indexes and constraints in the target tables before running a bulk load session. After the session completes, you can rebuild them. If you use bulk loading with the session on a regular basis, you can use pre- and post-session SQL to drop and rebuild indexes and key constraints.

♦ When you use the LONG datatype, verify it is the last column in the table.

♦ Specify the Table Name Prefix for the target when you use Oracle client 9i. If you do not specify the table name prefix, the PowerCenter Server uses the database login as the prefix.

For more information, see your Oracle documentation.

DB2 Guidelines

Use the following guidelines when bulk loading to DB2:

♦ You must drop indexes and constraints in the target tables before running a bulk load session. After the session completes, you can rebuild them. If you use bulk loading with the session on a regular basis, you can use pre- and post-session SQL to drop and rebuild indexes and key constraints.


♦ You cannot use source-based or user-defined commit when you run bulk load sessions on DB2.

♦ If you create multiple partitions for a DB2 bulk load session, you must use database partitioning for the target partition type. If you choose any other partition type, the PowerCenter Server reverts to normal load and writes the following message to the session log:

ODL_26097 Only database partitioning is support for DB2 bulk load. Changing target load type variable to Normal.

♦ When you bulk load to DB2, the DB2 database writes non-fatal errors and warnings to a message log file in the session log directory. The message log file name is <session_log_name>.<target_instance_name>.<partition_index>.log. You can check both the message log file and the session log when you troubleshoot a DB2 bulk load session.

For more information, see your DB2 documentation.
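When troubleshooting, it can help to compute the expected message log file name from the pattern given above. The Python sketch below simply fills in that pattern; the function name and the sample values are hypothetical, and the starting value of the partition index is not stated in this section.

```python
def db2_message_log_name(session_log_name, target_instance_name, partition_index):
    """Build <session_log_name>.<target_instance_name>.<partition_index>.log."""
    return f"{session_log_name}.{target_instance_name}.{partition_index}.log"


# Hypothetical session log name, target instance, and partition index.
print(db2_message_log_name("s_load_orders.log", "T_ORDERS", 1))
```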

Table Name Prefix

The table name prefix is the owner of the target table. For some databases, such as DB2, tables can have different owners. If the database user specified in the database connection is not the owner of the target tables in a session, specify the table owner for each target instance. A session can fail if the database user is not the owner and you do not specify the table owner name.

You can specify the table owner name in the target instance or in the session properties. When you specify the table owner name in the session properties, you override the table owner name in the transformation properties. For more information about specifying the table owner name in the mapping properties, see “Mappings” in the Designer Guide.

Note: When you specify the table owner name and you set the sqlid for a DB2 database in the environment SQL, the PowerCenter Server uses the table owner name in the target instance. To use the table owner name specified in the SET sqlid statement, do not enter a name in the Table Name Prefix field.

To specify the target owner name or prefix at the session level:

1. In the Workflow Manager, open the session properties and click the Transformations view on the Mapping tab.

2. Select the target instance under the Targets node.


3. In the Properties settings, enter the table owner name or prefix in the Table Name Prefix field, and click OK.
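The override rule for the table owner name can be sketched as follows. This Python fragment is a simplified illustration with hypothetical names: it assumes the session-level prefix wins over the one set in the target instance, and that no prefix at all leaves the table name unqualified so the database resolves the owner.

```python
def qualified_target_name(table, session_prefix=None, instance_prefix=None):
    """Return the table name as it might appear in generated SQL (illustrative)."""
    owner = session_prefix or instance_prefix  # session properties override the instance
    return f"{owner}.{table}" if owner else table


print(qualified_target_name("ORDERS", session_prefix="SALES", instance_prefix="DEV"))
print(qualified_target_name("ORDERS", instance_prefix="DEV"))
print(qualified_target_name("ORDERS"))
```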

Reserved Words

If any table name or column name contains a database reserved word, such as MONTH or YEAR, the session fails with database errors when the PowerCenter Server executes SQL against the database. You can create and maintain a reserved words file, reswords.txt, in the PowerCenter Server installation directory. When the PowerCenter Server initializes a session, it searches for reswords.txt. If the file exists, the PowerCenter Server places quotes around matching reserved words when it executes SQL against the database.

Use the following rules and guidelines when working with reserved words.

♦ The PowerCenter Server searches the reserved words file when it generates SQL to connect to source, target, and lookup databases.

♦ If you override the SQL for a source, target, or lookup, you must enclose any reserved word in quotes.

♦ You may need to enable some databases, such as Microsoft SQL Server and Sybase, to use SQL-92 standards regarding quoted identifiers. You can use environment SQL to issue the command. For example, with Microsoft SQL Server, you can use the following command:

SET QUOTED_IDENTIFIER ON

Sample reswords.txt File

To use a reserved words file, create a file named reswords.txt and place it in the PowerCenter Server installation directory. Create a section for each database that you need to store reserved words for. Add reserved words used in any table or column name. You do not need to store all reserved words for a database in this file. Database names and reserved words in reswords.txt are not case sensitive.

Following is a sample reswords.txt file:

[Teradata]

MONTH

DATE

INTERVAL

[Oracle]

OPTION

START

[DB2]

[SQL Server]

CURRENT

[Informix]

[ODBC]

MONTH

[Sybase]
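To make the file layout concrete, the following Python sketch parses text in the format of the sample above and quotes matching identifiers. It is an illustration, not the server's implementation: the function names are hypothetical, and the use of double quotes as the quoting character is an assumption based on the SQL-92 quoted identifier convention.

```python
SAMPLE = """
[Teradata]
MONTH
DATE
INTERVAL
[Oracle]
OPTION
START
[DB2]
[SQL Server]
CURRENT
"""


def parse_reswords(text):
    """Map each lowercased database section name to its set of reserved words."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip().lower()
            sections.setdefault(current, set())
        elif current is not None:
            sections[current].add(line.upper())
    return sections


def quote_if_reserved(identifier, database, sections):
    """Wrap the identifier in double quotes (assumed) if it is listed for the database."""
    words = sections.get(database.lower(), set())
    return f'"{identifier}"' if identifier.upper() in words else identifier


sections = parse_reswords(SAMPLE)
print(quote_if_reserved("month", "Teradata", sections))
print(quote_if_reserved("month", "Oracle", sections))
```

Lowercasing section names and uppercasing words mirrors the statement that database names and reserved words in the file are not case sensitive.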


Working with Target Connection Groups

When you create a session with at least one relational target, SAP BW target, or dynamic MQSeries target, you need to consider target connection groups. A target connection group is a group of targets that the PowerCenter Server uses to determine commits and loading. When the PowerCenter Server performs a database transaction, such as a commit, it performs the transaction to all targets in a target connection group.

The PowerCenter Server performs the following database transactions per target connection group:

♦ Deadlock retry. If the PowerCenter Server encounters a deadlock when it writes to a target, the deadlock only affects targets in the same target connection group. The PowerCenter Server still writes to targets in other target connection groups. For more information, see “Deadlock Retry” on page 246.

♦ Constraint-based loading. The PowerCenter Server enforces constraint-based loading for targets in a target connection group. If you want to specify constraint-based loading, you must verify the primary table and foreign table are in the same target connection group. For more information, see “Constraint-Based Loading” on page 248.

Targets in the same target connection group meet the following criteria:

♦ Belong to the same partition.

♦ Belong to the same target load order group.

♦ Have the same target type in the session.

♦ Have the same database connection name for relational targets, and Application connection name for SAP BW targets. For more information, see the PowerCenter Connect for SAP BW User and Administrator Guide.

♦ Have the same target load type, either normal or bulk mode.

For example, suppose you create a session based on a mapping that reads data from one source and writes to two Oracle target tables. In the Workflow Manager, you do not create multiple partitions in the session. You use the same Oracle database connection for both target tables in the session properties. You specify normal mode for the target load type for both target tables in the session properties. The targets in the session belong to the same target connection group.

Suppose you create a session based on the same mapping. In the Workflow Manager, you do not create multiple partitions. However, you use one Oracle database connection name for one target, and you use a different Oracle database connection name for the other target. You specify normal mode for the target load type for both target tables. The targets in the session belong to different target connection groups.

Note: When you define the target database connections for multiple targets in a session using session parameters, the targets may or may not belong to the same target connection group. The targets belong to the same target connection group if all session parameters resolve to the same target connection name. For example, you create a session with two targets and specify the session parameter $DBConnection1 for one target, and $DBConnection2 for the other target. In the parameter file, you define $DBConnection1 as Sales1 and you define $DBConnection2 as Sales1 and run the workflow. Both targets in the session belong to the same target connection group.
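The grouping criteria above amount to grouping targets by a composite key. The Python sketch below illustrates this with a hypothetical dictionary layout for targets and an optional parameter map; it is a model of the rules in this section, not the server's internal logic.

```python
from collections import defaultdict


def connection_groups(targets, parameters=None):
    """Group target names by the five criteria listed in this section."""
    parameters = parameters or {}
    groups = defaultdict(list)
    for t in targets:
        # Resolve a session parameter such as $DBConnection1 to its connection name.
        connection = parameters.get(t["connection"], t["connection"])
        key = (t["partition"], t["load_order_group"], t["target_type"],
               connection, t["load_type"])
        groups[key].append(t["name"])
    return list(groups.values())


def oracle_target(name, connection):
    """Hypothetical helper: a relational target in partition 1, normal load."""
    return {"name": name, "partition": 1, "load_order_group": 1,
            "target_type": "relational", "connection": connection,
            "load_type": "normal"}


same = [oracle_target("T_A", "Sales1"), oracle_target("T_B", "Sales1")]
different = [oracle_target("T_A", "Sales1"), oracle_target("T_B", "Sales2")]
by_param = [oracle_target("T_A", "$DBConnection1"), oracle_target("T_B", "$DBConnection2")]
params = {"$DBConnection1": "Sales1", "$DBConnection2": "Sales1"}

print(connection_groups(same))
print(connection_groups(different))
print(connection_groups(by_param, params))
```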


Working with Active Sources

An active source is an active transformation the PowerCenter Server uses to generate rows. An active source can be any of the following transformations:

♦ Aggregator

♦ Application Source Qualifier

♦ Custom, configured as an active transformation

♦ Joiner

♦ MQ Source Qualifier

♦ Normalizer (VSAM or pipeline)

♦ Rank

♦ Sorter

♦ Source Qualifier

♦ XML Source Qualifier

♦ Mapplet, if it contains any of the above transformations

Note: Although the Filter, Router, Transaction Control, and Update Strategy transformations are active transformations, the PowerCenter Server does not use them as active sources in a pipeline.

Active sources affect how the PowerCenter Server processes a session when you use any of the following transformations or session properties:

♦ XML targets. The PowerCenter Server can load data from different active sources to an XML target when each input group receives data from one active source. For more information on XML targets, see “Working with XML Targets” in the XML User Guide.

♦ Transaction generators. Transaction generators, such as Transaction Control transformations, become ineffective for downstream transformations or targets if you put a transaction control point after them. Transaction control points are transaction generators and active sources that generate commits. For more information on effective and ineffective transaction generators, see “Transaction Control Transformation” in the Transformation Guide. For a list of transaction control points, see “Transformation Scope” on page 287.

♦ Mapplets. An Input transformation must receive data from a single active source. For more information on connecting mapplets to active sources in mappings, see “Mapplets” in the Designer Guide.

♦ Source-based commit. Some active sources generate commits. When you run a source-based commit session, the PowerCenter Server generates a commit from these active sources at every commit interval. For more information on source-based commit sessions, see “Source-Based Commits” on page 278.

♦ Constraint-based loading. To use constraint-based loading, you must connect all related targets to the same active source. The PowerCenter Server orders the target load on a row-by-row basis based on rows generated by an active source. For more information on constraint-based loading, see “Constraint-Based Loading” on page 248.

♦ Row error logging. If an error occurs downstream from an active source that is not a source qualifier, the PowerCenter Server cannot identify the source row information for the logged error row. For more information on logging errors, see “Overview” on page 482.


Working with File Targets

You can output data to a flat file in either of the following ways:

♦ Use a flat file target definition. Create a mapping with a flat file target definition. Create a session using the flat file target definition. When the PowerCenter Server runs the session, it creates the target flat file based on the flat file target definition.

♦ Use a relational target definition. Use a relational definition to write to a flat file when you want to use an external loader to load the target. Create a mapping with a relational target definition. Create a session using the relational target definition. Configure the session to output to a flat file by specifying the File Writer in the Writers settings on the Mapping tab. For details on using the external loader feature, see “External Loading” on page 523.

You can configure the following properties for flat file targets:

♦ Target properties. You can define target properties such as partitioning options, output file options, and reject options. For more information, see “Configuring Target Properties” on page 261.

♦ Flat file properties. You can choose to create delimited or fixed-width files, and define their properties. For more information, see “Configuring Fixed-Width Properties” on page 265 and “Configuring Delimited Properties” on page 266.

Configuring Target Properties

You can configure session properties for flat file targets in the Properties settings on the Mapping tab, and in the General Options settings on the Properties tab. Define the properties for each target instance in the session.

Figure 9-9 shows the flat file target properties you define in the Properties settings on the Mapping tab in the session properties:

Table 9-5 describes the properties you define in the Properties settings for flat file target definitions:

Figure 9-9. Properties Settings on the Mapping Tab for a Flat File Target

Table 9-5. Flat File Target Properties

Target Properties | Required/Optional | Description
Merge Partitioned Files | Optional | When selected, the PowerCenter Server merges the partitioned target files into one file when the session completes, and then deletes the individual output files. If the PowerCenter Server fails to create the merged file, it does not delete the individual output files. You cannot merge files if the session uses FTP, an external loader, or a message queue. For details on configuring a session for partitioning, see “Pipeline Partitioning” on page 345.
Merge File Directory | Optional | Enter the directory name in this field. By default, the PowerCenter Server writes the merged file in the server variable directory, $PMTargetFileDir. If you enter a full directory and file name in the Merge File Name field, clear this field.
Merge File Name | Optional | Name of the merge file. Default is target_name.out. This property is required if you select Merge Partitioned Files.
Output File Directory | Optional | Enter the directory name in this field. By default, the PowerCenter Server writes output files in the server variable directory, $PMTargetFileDir. If you specify both the directory and file name in the Output Filename field, clear this field. The PowerCenter Server concatenates this field with the Output Filename field when it runs the session. You can also use the $OutputFileName session parameter to specify the file directory. For details on session parameters, see “Session Parameters” on page 495.
Output Filename | Required | Enter the file name, or file name and path. By default, the Workflow Manager names the target file based on the target definition used in the mapping: target_name.out. If the target definition contains a slash character, the Workflow Manager replaces the slash character with an underscore. When you use an external loader to load to an Oracle database, you must specify a file extension. If you do not specify a file extension, the Oracle loader cannot find the flat file and the PowerCenter Server fails the session. For more information about external loading, see “Loading to Oracle” on page 533. Optionally use the $OutputFileName session parameter for the file name. The PowerCenter Server concatenates this field with the Output File Directory field when it runs the session. For details on session parameters, see “Session Parameters” on page 495. Note: If you specify an absolute path file name when using FTP, the PowerCenter Server ignores the Default Remote Directory specified in the FTP connection. When you specify an absolute path file name, do not use single or double quotes.
Reject File Directory | Optional | Enter the directory name in this field. By default, the PowerCenter Server writes all reject files to the server variable directory, $PMBadFileDir. If you specify both the directory and file name in the Reject Filename field, clear this field. The PowerCenter Server concatenates this field with the Reject Filename field when it runs the session. You can also use the $BadFileName session parameter to specify the file directory. For details on session parameters, see “Session Parameters” on page 495.
Set File Properties Link | Optional | Opens a dialog box that allows you to define flat file properties. For more information, see “Configuring Fixed-Width Properties” on page 265 and “Configuring Delimited Properties” on page 266. When you output to a flat file using a relational target definition in the mapping, make sure you define the flat file properties by clicking the Set File Properties link.


Figure 9-10 shows the test load options in the General Options settings on the Properties tab:

Table 9-6 describes the test load options in the General Options settings on the Properties tab:

Figure 9-10. Test Load Options

Table 9-6. Test Load Options

Property | Required/Optional | Description
Enable Test Load | Optional | You can configure the PowerCenter Server to perform a test load. With a test load, the PowerCenter Server reads and transforms data without writing to targets. The PowerCenter Server generates all session files and performs all pre- and post-session functions, as if running the full session. The PowerCenter Server writes data to relational targets, but rolls back the data when the session completes. For all other target types, such as flat file and SAP BW, the PowerCenter Server does not write data to the targets. Enter the number of source rows you want to test in the Number of Rows to Test field. You cannot perform a test load on sessions using XML sources. Note: You can perform a test load for relational targets when you configure a session for normal mode. If you configure the session for bulk mode, the session fails.
Number of Rows to Test | Optional | Enter the number of source rows you want the PowerCenter Server to test load. The PowerCenter Server reads the number you configure for the test load.

Configuring Fixed-Width Properties

When you output data to a fixed-width file, you can edit file properties in the session properties, such as the null character or code page. You can configure fixed-width properties for non-reusable sessions in the Workflow Designer and for reusable sessions in the Task Developer. You cannot configure fixed-width properties for instances of reusable sessions in the Workflow Designer.

In the Transformations view on the Mapping tab, click the Targets node and then click Set File Properties to open the Flat Files dialog box.


To edit the fixed-width properties, select Fixed Width and click Advanced.

Figure 9-12 shows the Fixed Width Properties dialog box:


Figure 9-12. Fixed Width Properties Dialog Box


Table 9-7 describes the options you define in the Fixed Width Properties dialog box:

Configuring Delimited Properties

When you output data to a delimited file, you can edit file properties in the session properties, such as the delimiter or code page. You can configure delimited properties for non-reusable sessions in the Workflow Designer and for reusable sessions in the Task Developer. You cannot configure delimited properties for instances of reusable sessions in the Workflow Designer.

In the Transformations view on the Mapping tab, click the Targets node and then click Set File Properties to open the Flat Files dialog box.


To edit the delimited properties, select Delimited and click Advanced.

Table 9-7. Writing to a Fixed-Width Target



Null Character | Required | Enter the character you want the PowerCenter Server to use to represent null values. You can enter any valid character in the file code page. For more information about using null characters for target files, see “Null Characters in Fixed-Width Files” on page 272.
Repeat Null Character | Optional | Select this option to indicate a null value by repeating the null character to fill the field. If you do not select this option, the PowerCenter Server enters a single null character at the beginning of the field to represent a null value. For more information about specifying null characters for target files, see “Null Characters in Fixed-Width Files” on page 272.




Figure 9-14 shows the Delimited File Properties dialog box:

Table 9-8 describes the options you can define in the Delimited File Properties dialog box:

Figure 9-14. Delimited File Properties Dialog Box

Table 9-8. Delimited File Properties

Delimiters | Required | Character used to separate columns of data. Use the button to the right of this field to enter a non-printable delimiter. Delimiters can be either printable or single-byte unprintable characters, and must be different from the escape character and the quote character (if selected). You cannot select unprintable multibyte characters as delimiters.
Optional Quotes | Required | Select None, Single, or Double. If you select a quote character, the PowerCenter Server does not treat delimiter characters within the quote characters as a delimiter. For example, suppose an output file uses a comma as a delimiter and the PowerCenter Server receives the following row: 342-3849, 'Smith, Jenna', 'Rockville, MD', 6. If you select the optional single quote character, the PowerCenter Server ignores the commas within the quotes and writes the row as four fields. If you do not select the optional single quote, the PowerCenter Server writes six separate fields.
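The four-field versus six-field outcome in the Optional Quotes example can be reproduced with Python's standard csv module. This sketch only illustrates how a quote character changes the way delimiters are counted; it does not represent how the PowerCenter Server writes files, and the helper name is hypothetical.

```python
import csv

# The sample row from the Optional Quotes description.
line = "342-3849, 'Smith, Jenna', 'Rockville, MD', 6"


def split_row(text, quote_char=None):
    """Split one delimited row, optionally honoring a quote character."""
    if quote_char is None:
        reader = csv.reader([text], delimiter=",", quoting=csv.QUOTE_NONE,
                            skipinitialspace=True)
    else:
        reader = csv.reader([text], delimiter=",", quotechar=quote_char,
                            skipinitialspace=True)
    return next(reader)


print(split_row(line, "'"))    # commas inside single quotes are kept in the field
print(len(split_row(line)))    # every comma acts as a delimiter
```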



Server Handling for File Targets

When you configure a session to write to file targets, you need to know how the PowerCenter Server loads data. In the mapping, you must correctly configure your flat file target definitions and the relational target definitions you use to write to flat files. The PowerCenter Server loads data to flat files based on the following criteria:

♦ Writing to fixed-width flat files from relational target definitions. The PowerCenter Server adds spaces to target columns based on transformation datatype.

♦ Writing to fixed-width flat files from flat file target definitions. You must configure the precision and field width for flat file target definitions to accommodate the total length of the target field.

♦ Writing multibyte data to fixed-width files. You must configure the precision of string columns to accommodate character data. When writing shift-sensitive data to a fixed-width flat file target, the PowerCenter Server adds shift characters and spaces to meet file requirements.

♦ Null characters in fixed-width files. The PowerCenter Server writes repeating or non-repeating null characters to fixed-width target file columns differently depending on whether the characters are single- or multibyte.

♦ Character set. You can write ASCII or Unicode data to a flat file target.

♦ Writing metadata to flat file targets. You can configure the PowerCenter Server to write the column header information when you write to flat file targets.

Writing to Fixed-Width Flat Files with Relational Target Definitions

When you want to output to a fixed-width file based on a relational target definition in the mapping, consider how the PowerCenter Server handles spacing in the target file.

When the PowerCenter Server writes to a fixed-width flat file based on a relational target definition in the mapping, it adds spaces to columns based on the transformation datatype connected to the target. This allows the PowerCenter Server to write optional symbols necessary for the datatype, such as a negative sign or decimal point, without sending the row to the reject file.

For example, you connect a transformation Integer(10) port to a Number(10) column in a relational target definition. In the session properties, you override the relational target definition to use the File Writer and you specify to output a fixed-width flat file. In the target flat file, the PowerCenter Server appends an additional byte to the Number(10) column to allow for negative signs that might be associated with Integer data.
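The Integer(10) example can be generalized with the per-datatype byte counts that this section lists in Table 9-9. The Python sketch below is illustrative arithmetic only; the function name is hypothetical, and treating datatypes absent from the table as adding no bytes is an assumption.

```python
# Bytes added per transformation datatype, as listed in Table 9-9.
BYTES_ADDED = {
    "Decimal": 2, "Double": 7, "Float": 7, "Integer": 1,
    "Money": 2, "Numeric": 2, "Real": 7,
}


def file_column_width(transformation_datatype, target_precision):
    """Width in bytes of the fixed-width column after the added bytes."""
    # Assumption: datatypes not listed in the table add no extra bytes.
    return target_precision + BYTES_ADDED.get(transformation_datatype, 0)


# Integer(10) port connected to a Number(10) column: one extra byte for the sign.
print(file_column_width("Integer", 10))
```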


Table 9-9 describes the number of bytes the PowerCenter Server adds to the target column and optional characters it uses for each datatype:

Writing to Fixed-Width Files with Flat File Target Definitions

When you want to output to a fixed-width flat file based on a flat file target definition, you must configure precision and field width for the target field to accommodate the total length of the target field. If the data for a target field is too long for the total length of the field, the PowerCenter Server performs one of the following actions:

♦ Truncates the row for string columns

♦ Writes the row to the reject file for numeric and datetime columns

Note: When the PowerCenter Server writes a row to the reject file, it writes a message in the session log.

When a session writes to a fixed-width flat file based on a fixed-width flat file target definition in the mapping, the PowerCenter Server defines the total length of a field by the precision or field width defined in the target.

Fixed-width files are byte-oriented, which means the total length of a field is measured in bytes.

Table 9-9. Datatype Modifications for File Target Columns

Transformation Datatype Connected to Fixed-Width Flat File Target Column | Bytes Added by PowerCenter Server | Optional Characters for the Datatype
Decimal | 2 | Negative sign (-) for the mantissa; decimal point (.)
Double | 7 | Negative sign for the mantissa; decimal point; negative sign, e, and three digits for the exponent, for example, -4.2-e123
Float | 7 | Negative sign for the mantissa; decimal point; negative sign, e, and three digits for the exponent
Integer | 1 | Negative sign for the mantissa
Money | 2 | Negative sign for the mantissa; decimal point
Numeric | 2 | Negative sign for the mantissa; decimal point
Real | 7 | Negative sign for the mantissa; decimal point; negative sign, e, and three digits for the exponent

Table 9-10 describes how the PowerCenter Server measures the total field length for fields in a fixed-width flat file target definition:

Table 9-11 lists the characters you must accommodate when you configure the precision or field width for flat file target definitions to accommodate the total length of the target field:

When you edit the flat file target definition in the mapping, define the precision or field width great enough to accommodate both the target data and the characters in Table 9-11.

For example, suppose you have a mapping with a fixed-width flat file target definition. The target definition contains a number column with a precision of 10 and a scale of 2. You use a comma as the decimal separator and a period as the thousands separator. You know some rows of data might have a negative value. Based on this information, you know the longest possible number is formatted with the following format:

-NN.NNN.NNN,NN

Open the flat file target definition in the mapping and define the field width for this number column as a minimum of 14 bytes.
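The 14-byte figure follows from counting every character in the longest formatted value. The Python sketch below reproduces that count; the function and parameter names are hypothetical, and it assumes single-byte separators and one thousands separator per group of three integer digits.

```python
def number_field_width(precision, scale, thousands_separators=True, negative=True):
    """Minimum field width in bytes for a formatted number (illustrative)."""
    integer_digits = precision - scale
    width = precision                            # all digits
    if scale > 0:
        width += 1                               # decimal separator
    if thousands_separators and integer_digits > 0:
        width += (integer_digits - 1) // 3       # one separator per group of three
    if negative:
        width += 1                               # negative sign
    return width


# Precision 10, scale 2, formatted as -NN.NNN.NNN,NN
print(number_field_width(10, 2))
```

For precision 10 and scale 2, the count is 10 digits, 1 decimal separator, 2 thousands separators, and 1 sign.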

For more information on formatting numeric and datetime values, see “Working with Flat Files” in the Designer Guide.

Writing Multibyte Data to Fixed-Width Flat Files

If you plan to load multibyte data into a fixed-width flat file, configure the precision to accommodate the multibyte data. Fixed-width files are byte-oriented, not character-oriented. So, when you configure the precision for a fixed-width target, you need to consider the number of bytes you load into the target, rather than the number of characters.

Table 9-10. Field Length Measurements for Fixed-Width Flat File Targets

Datatype | Target Field Property That Determines Total Field Length
Number | Field width
String | Precision
Datetime | Field width

Table 9-11. Characters to Include when Calculating Field Length for Fixed-Width Targets

Datatype | Characters to Accommodate
Number | Decimal separator; thousands separators; negative sign (-) for the mantissa
String | Multibyte data; shift-in and shift-out characters. For more information, see “Writing Multibyte Data to Fixed-Width Flat Files” on page 270.
Datetime | Date and time separators, such as slashes (/), dashes (-), and colons (:). For example, the format MM/DD/YYYY HH24:MI:SS has a total length of 19 bytes.


For string columns, the PowerCenter Server truncates the data if the precision is not large enough to accommodate the multibyte data.

You might work with the following types of multibyte data:

♦ Non shift-sensitive multibyte data. The file contains all multibyte data. Configure the precision in the target definition to allow for the additional bytes.

For example, you know that the target data contains four double-byte characters, so you define the target definition with a precision of 8 bytes.

If you configure the target definition with a precision of 4, the PowerCenter Server truncates the data before writing to the target.

♦ Shift-sensitive multibyte data. The file contains single-byte and multibyte data. When writing to a shift-sensitive flat file target, the PowerCenter Server adds shift characters and spaces to meet file requirements. You must configure the precision in the target definition to allow for the additional bytes and the shift characters. For more information, see “Writing Shift-Sensitive Multibyte Data” on page 271.

Note: Delimited files are character-oriented, and you do not need to allow for additional precision for multibyte data.
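The byte counting in the non shift-sensitive example can be checked with a short Python sketch. The EUC-JP code page and the sample characters are assumptions chosen for illustration, and the sketch assumes truncation happens on whole characters; the section does not state where the server cuts the data.

```python
def truncate_to_precision(text, precision, encoding):
    """Keep as many whole characters as fit in `precision` bytes (assumed behavior)."""
    out = b""
    for ch in text:
        encoded = ch.encode(encoding)
        if len(out) + len(encoded) > precision:
            break
        out += encoded
    return out.decode(encoding)


# Four double-byte characters, written with escapes to stay encoding-neutral.
sample = "\u65e5\u672c\u8a9e\u7248"

print(len(sample.encode("euc_jp")))                       # bytes needed for the full value
print(len(truncate_to_precision(sample, 4, "euc_jp")))    # characters kept at precision 4
```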

Writing Shift-Sensitive Multibyte Data

When writing to a shift-sensitive flat file target, the PowerCenter Server adds shift characters and spaces if the data going into the target does not meet file requirements. You need to allow at least two extra bytes in each data column containing multibyte data so the output data precision matches the byte width of the target column.

The PowerCenter Server writes shift characters and spaces in the following ways:

♦ If a column begins or ends with a double-byte character, the PowerCenter Server adds shift characters so the column begins and ends with a single-byte shift character.

♦ If the data is shorter than the column width, the PowerCenter Server pads the rest of the column with spaces.

♦ If the data is longer than the column width, the PowerCenter Server truncates the data so the column ends with a single-byte shift character.
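The three rules above can be modeled with a small Python sketch. It is an illustrative model, not the PowerCenter Server implementation. It assumes that "A" stands for a double-byte character, that any other character is single-byte, and that each shift character occupies one byte; "-o" and "-i" are used as printable stand-ins for shift-out and shift-in.

```python
SHIFT_OUT, SHIFT_IN = "-o", "-i"  # printable stand-ins for one-byte shift characters


def write_column(data, width):
    """Lay out `data` in a fixed-width column of `width` bytes.

    'A' is a double-byte character; any other character is single-byte.
    """
    out, used, in_double = [], 0, False
    for ch in data:
        if ch == "A":
            # 2 bytes for the character, plus 1 for shift-out when entering
            # double-byte mode; always reserve 1 byte for the closing shift-in.
            cost = 2 + (0 if in_double else 1)
            if used + cost + 1 > width:
                break  # truncate: no room for the character and its shift-in
            if not in_double:
                out.append(SHIFT_OUT)
                in_double = True
        else:
            # 1 byte for the character, plus 1 for shift-in when leaving
            # double-byte mode.
            cost = 1 + (1 if in_double else 0)
            if used + cost > width:
                break  # truncate
            if in_double:
                out.append(SHIFT_IN)
                in_double = False
        out.append(ch)
        used += cost
    if in_double:
        out.append(SHIFT_IN)  # the column must end with a single-byte shift character
        used += 1
    out.append(" " * (width - used))  # pad the rest of the column with spaces
    return "".join(out)


print(repr(write_column("AAAA", 8)))
print(repr(write_column("AAAA", 10)))
print(repr(write_column("aaaa", 4)))
```

In this model, an eight-byte column holds only three of four double-byte characters, because two bytes go to the shift characters. Widening the column by two bytes lets all four characters through.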

To illustrate how the PowerCenter Server handles a fixed-width file containing shift-sensitive data, say you want to output the following data to the target:

SourceCol1 SourceCol2

AAAA aaaa

The following table describes the notation used in this example:

Notation Description

A Double-byte character

a Single-byte character

-o Shift-out character

-i Shift-in character

The first target column contains eight bytes and the second target column contains four bytes.

The PowerCenter Server must add shift characters to handle shift-sensitive data. Since the first target column can only handle eight bytes, the PowerCenter Server truncates the data before it can add the shift characters. It writes the following data to the target:

TargetCol1 TargetCol2

-oAAA-i aaaa

For the first target column, the PowerCenter Server writes only three of the double-byte characters to the target. It cannot write any additional double-byte characters to the output column because the column must end in a single-byte character. If you add two more bytes to the first target column definition, then the PowerCenter Server can add shift characters and write all the data without truncation.

For the second target column, the PowerCenter Server writes all four single-byte characters to the target. It does not write shift characters to the column because the column begins and ends with single-byte characters.

Null Characters in Fixed-Width Files

You can specify any valid single-byte or multibyte character as a null character for a fixed-width target. You can also use a space as a null character.

The null character can be repeating or non-repeating. If the null character is repeating, the PowerCenter Server writes as many null characters as possible into a target column. If you specify a multibyte null character and there are extra bytes left after writing null characters, the PowerCenter Server pads the column with single-byte spaces. If a column is smaller than the multibyte character specified as the null character, the session fails at initialization.

Character Set

You can configure the PowerCenter Server to run sessions with flat file targets in either ASCII or Unicode data movement mode.

If you configure a session with a flat file target to run in Unicode data movement mode, the target file code page must be a superset of the PowerCenter Server code page and the source code page. Delimiters, escape, and null characters must be valid in the specified code page of the flat file.

If you configure a session to run in ASCII data movement mode, delimiters, escape, and null characters must be valid in the ISO Western European Latin1 code page. Any 8-bit character you specified in previous versions of PowerCenter is still valid.

For more information about configuring and working with data movement modes and code pages, see “Globalization Overview” in the Installation and Configuration Guide.

Writing Metadata to Flat File Targets

When you write to flat file targets, you can configure the PowerCenter Server to write the column header information. When you enable the Output Metadata For Flat File Target option, the PowerCenter Server writes column headers to flat file targets. It writes the target definition port names to the flat file target in the first line, starting with the # symbol. By default, this option is disabled.

When writing to fixed-width files, the PowerCenter Server truncates the target definition port name if it is longer than the column width.

For example, you have the following fixed-width flat file target definition:

The column width for ITEM_ID is six. When you enable the Output Metadata For Flat File Target option, the PowerCenter Server writes the following text to a flat file:

#ITEM_ITEM_NAME PRICE

100001Screwdriver 9.50

100002Hammer 12.90

100003Small nails 3.00
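The header truncation in this output can be reproduced with a short Python sketch. It assumes that the # symbol counts toward the width of the first column, which is inferred from the six-character "#ITEM_" in the sample. The widths for ITEM_NAME and PRICE are hypothetical, because the target definition figure is not reproduced here.

```python
def header_line(columns):
    """Build the metadata header for a fixed-width file.

    columns: list of (port_name, column_width) pairs.
    Each port name is truncated or space-padded to its column width.
    """
    fields = []
    for index, (name, width) in enumerate(columns):
        # Assumption: the leading '#' is written inside the first column.
        text = "#" + name if index == 0 else name
        fields.append(text[:width].ljust(width))
    return "".join(fields).rstrip()


# ITEM_ID width (6) is from the text; the other widths are assumptions.
columns = [("ITEM_ID", 6), ("ITEM_NAME", 12), ("PRICE", 6)]
header = header_line(columns)
print(header)
```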

For information about configuring the PowerCenter Server to output flat file metadata, see the Installation and Configuration Guide.


Working with Heterogeneous Targets

You can output data to multiple targets in the same session. When the target types or database types of those targets differ from each other, you have a session with heterogeneous targets.

To create a session with heterogeneous targets, you can create a session based on a mapping with heterogeneous targets. Or, you can create a session based on a mapping with homogeneous targets and select different database connections.

A heterogeneous target has one of the following characteristics:

♦ Multiple target types. You can create a session that writes to both relational and flat file targets.

♦ Multiple target connection types. You can create a session that writes to a target on an Oracle database and to a target on a DB2 database. Or, you can create a session that writes to multiple targets of the same type, but you specify different target connections for each target in the session.

All database connections you define in the Workflow Manager are unique to the PowerCenter Server, even if you define the same connection information. For example, you define two database connections, Sales1 and Sales2. You define the same user name, password, connect string, code page, and attributes for both Sales1 and Sales2. Even though both Sales1 and Sales2 define the same connection information, the PowerCenter Server treats them as different database connections. When you create a session with two relational targets and specify Sales1 for one target and Sales2 for the other target, you create a session with heterogeneous targets.
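The rule can be expressed as a small check in Python. This is a sketch only: the Target record and the target names are hypothetical, and connections are compared by name because the PowerCenter Server treats each connection object as distinct even when its settings are identical.

```python
from typing import NamedTuple


class Target(NamedTuple):
    name: str          # hypothetical target instance name
    target_type: str   # for example "relational" or "flat file"
    connection: str    # connection object name, or "" if none applies


def is_heterogeneous(targets):
    """A session is heterogeneous when its targets differ in type or use
    differently named connection objects."""
    types = {t.target_type for t in targets}
    connections = {t.connection for t in targets if t.connection}
    return len(types) > 1 or len(connections) > 1


same_connection = [
    Target("T_ORDERS", "relational", "Sales1"),
    Target("T_ITEMS", "relational", "Sales1"),
]
different_connections = [
    Target("T_ORDERS", "relational", "Sales1"),
    Target("T_ITEMS", "relational", "Sales2"),
]

print(is_heterogeneous(same_connection), is_heterogeneous(different_connections))
```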

You can create a session with heterogeneous targets in one of the following ways:

♦ Create a session based on a mapping with targets of different types or different database types. In the session properties, keep the default target types and database types.

♦ Create a session based on a mapping with the same target types. However, in the session properties, specify different target connections for the different target instances, or override the target type to a different type.

You can override the target type in the session properties. However, you can only perform certain overrides. You can specify the following target type overrides in a session:

♦ Relational target to flat file.

♦ Relational target to any other relational database type. Verify the datatypes used in the target definition are compatible with both databases.

♦ SAP BW target to a flat file target type.

Note: When the PowerCenter Server runs a session with at least one relational target, it performs database transactions per target connection group. For example, it orders the target load for targets in a target connection group when you enable constraint-based loading. For more information, see “Working with Target Connection Groups” on page 257.


Chapter 10

Understanding Commit Points


♦ Overview, 276

♦ Target-Based Commits, 277

♦ Source-Based Commits, 278

♦ User-Defined Commits, 283

♦ Understanding Transaction Control, 287

♦ Setting Commit Properties, 292


Overview

A commit interval is the interval at which the PowerCenter Server commits data to targets during a session. The commit point can be a factor of the commit interval, the commit interval type, and the size of the buffer blocks. The commit interval is the number of rows you want to use as a basis for the commit point. The commit interval type is the type of rows that you want to use as a basis for the commit point. You can choose between the following commit types:

♦ Target-based commit. The PowerCenter Server commits data based on the number of target rows and the key constraints on the target table. The commit point also depends on the buffer block size, the commit interval, and the PowerCenter Server configuration for writer timeout.

♦ Source-based commit. The PowerCenter Server commits data based on the number of source rows. The commit point is the commit interval you configure in the session properties.

♦ User-defined commit. The PowerCenter Server commits data based on transactions defined in the mapping properties. You can also configure some commit and rollback options in the session properties.

Source-based and user-defined commit sessions have partitioning restrictions. If you configure a session with multiple partitions to use source-based or user-defined commit, you can only choose pass-through partitioning at certain partition points in a pipeline. For more information, see “Specifying Partition Types” on page 356.


Target-Based Commits

During a target-based commit session, the PowerCenter Server commits rows based on the number of target rows and the key constraints on the target table. The commit point depends on the following factors:

♦ Commit interval. The number of rows you want to use as a basis for commits. Configure the target commit interval in the session properties.

♦ Writer wait timeout. The amount of time the writer waits before it issues a commit. Configure the writer wait timeout in the PowerCenter Server setup.

♦ Buffer blocks. Blocks of memory that hold rows of data during a session. You can configure the buffer block size in the session properties, but you cannot configure the number of rows the block holds.

When you run a target-based commit session, the PowerCenter Server may issue a commit before, on, or after, the configured commit interval. The PowerCenter Server uses the following process to issue commits:

♦ When the PowerCenter Server reaches a commit interval, it continues to fill the writer buffer block. When the writer buffer block fills, the PowerCenter Server issues a commit.

♦ If the writer buffer fills before the commit interval, the PowerCenter Server writes to the target, but waits to issue a commit. It issues a commit when one of the following conditions is true:

− The writer is idle for the amount of time specified by the PowerCenter Server writer wait timeout option.

− The PowerCenter Server reaches the commit interval and fills another writer buffer.
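The process above can be simulated with a simplified Python model. The model makes several assumptions that are not stated in this guide: every buffer block holds the same number of rows, the writer is never idle long enough for the writer wait timeout to trigger, and any remaining rows are committed when the session ends. The row counts are illustrative.

```python
def target_based_commit_points(total_rows, commit_interval, rows_per_block):
    """Return the cumulative target row counts at which commits are issued."""
    commits = []
    written = 0
    threshold = commit_interval  # next commit interval to reach
    while written < total_rows:
        # The writer flushes one buffer block (the last one may be partial).
        written = min(written + rows_per_block, total_rows)
        # Commit once the interval is reached and the current block is full,
        # or when the session runs out of rows (assumption).
        if written >= threshold or written == total_rows:
            commits.append(written)
            threshold = (written // commit_interval + 1) * commit_interval
    return commits


points = target_based_commit_points(
    total_rows=40_000, commit_interval=10_000, rows_per_block=7_500
)
print(points)
```

In this model the first buffer fills at 7,500 rows, before the commit interval, so no commit is issued. The interval is reached while the second block is filling, so the first commit lands at 15,000 rows rather than at 10,000.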

For more information about configuring the writer wait timeout, see “Installing and Configuring the PowerCenter Server on Windows” or “Installing and Configuring the PowerCenter Server on UNIX” in the Installation and Configuration Guide.

Note: When you choose target-based commit for a session containing an XML target, the Workflow Manager disables the On Commit session property on the Transformations view of the Mapping tab.


Source-Based Commits

During a source-based commit session, the PowerCenter Server commits data to the target based on the number of rows from some active sources in a target load order group. These rows are referred to as source rows.

When the PowerCenter Server runs a source-based commit session, it identifies the commit source for each pipeline in the mapping. The PowerCenter Server generates a commit row from these active sources at every commit interval. The PowerCenter Server writes the name of the transformation used for source-based commit intervals into the session log:

Source-based commit interval based on... TRANSFORMATION_NAME

The PowerCenter Server might commit fewer rows to the target than the number of rows produced by the active source. For example, you have a source-based commit session that passes 10,000 rows through an active source, and 3,000 rows are dropped due to transformation logic. The PowerCenter Server issues a commit to the target when the 7,000 remaining rows reach the target.

The number of rows held in the writer buffers does not affect the commit point for a source-based commit session. For example, you have a source-based commit session that passes 10,000 rows through an active source. When those 10,000 rows reach the targets, the PowerCenter Server issues a commit. If the session completes successfully, the PowerCenter Server issues commits after 10,000, 20,000, 30,000, and 40,000 source rows.
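The counting rule can be sketched in Python. The model assumes a single active source and a row-level filter that stands in for the transformation logic; the function name and the filter are hypothetical.

```python
def source_based_commits(source_rows, commit_interval, keep_row):
    """Return (source_row_count, target_row_count) at each commit.

    Commits are driven by the number of source rows, not by how many
    rows survive the transformation logic.
    """
    commits = []
    target_rows = 0
    for count, row in enumerate(source_rows, start=1):
        if keep_row(row):
            target_rows += 1
        if count % commit_interval == 0:
            commits.append((count, target_rows))
    return commits


# 10,000 source rows; the filter drops 3 of every 10 rows (3,000 in total).
commits = source_based_commits(
    range(10_000), commit_interval=10_000, keep_row=lambda r: r % 10 >= 3
)
print(commits)
```

The single commit is triggered by the 10,000th source row, even though only 7,000 rows reach the target.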

If the targets are in the same transaction control unit, the PowerCenter Server commits data to the targets at the same time. If the session fails or aborts, the PowerCenter Server rolls back all uncommitted data in a transaction control unit to the same source row.

If the targets are in different transaction control units, the PowerCenter Server performs the commit when each target receives the commit row. If the session fails or aborts, the PowerCenter Server rolls back each target to the last commit point. It might not roll back to the same source row for targets in separate transaction control units. For more information on transaction control units, see “Understanding Transaction Control Units” on page 289.

Note: Source-based commit may slow session performance if the session uses a one-to-one mapping. A one-to-one mapping is a mapping that moves data from a Source Qualifier, XML Source Qualifier, or Application Source Qualifier transformation directly to a target. For more information about performance, see “Performance Tuning” on page 635.

Determining the Commit Source

When you run a source-based commit session, the PowerCenter Server generates commits at all source qualifiers and transformations that do not propagate transaction boundaries. This includes the following active sources:

♦ Source Qualifier

♦ Application Source Qualifier

♦ MQ Source Qualifier


♦ XML Source Qualifier when you only connect ports from one output group

♦ Normalizer (VSAM)

♦ Aggregator with the All Input transformation scope

♦ Joiner with the All Input transformation scope

♦ Rank with the All Input transformation scope

♦ Sorter with the All Input transformation scope

♦ Custom with one output group and with the All Input transformation scope

♦ A multiple input group transformation with one output group connected to multiple upstream transaction control points

♦ Mapplet, if it contains one of the above transformations

For more information on transformation scope and transaction control, see “Understanding Transaction Control” on page 287. For more information on active sources, see “Working with Active Sources” on page 259.

A mapping can have one or more target load order groups, and a target load order group can have one or more active sources that generate commits. The PowerCenter Server uses the commits generated by the active source that is closest to the target definition. This is known as the commit source.

For example, you have the mapping in Figure 10-1:

The mapping contains a Source Qualifier transformation and an Aggregator transformation with the All Input transformation scope. The Aggregator transformation is closer to the targets than the Source Qualifier transformation and is therefore used as the commit source for the source-based commit session.

[Figure 10-1. Mapping with a Single Commit Source. Figure callout: Transformation Scope property is All Input.]


Also, suppose you have the mapping in Figure 10-2:

The mapping contains a target load order group with one source pipeline that branches from the Source Qualifier transformation to two targets. One pipeline branch contains an Aggregator transformation with the All Input transformation scope, and the other contains an Expression transformation. The PowerCenter Server identifies the Source Qualifier transformation as the commit source for t_monthly_sales and the Aggregator as the commit source for T_COMPANY_ALL. It performs a source-based commit for both targets, but uses a different commit source for each.
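The "closest to the target" rule can be sketched in Python. Each pipeline branch is written as a list of transformations from source to target, flagged by whether the transformation generates commits. The transformation names are hypothetical; only the target names come from the example.

```python
def commit_source(path):
    """Return the commit-generating transformation closest to the target.

    path: list of (transformation_name, generates_commits) pairs,
    ordered from the source toward the target.
    """
    source = None
    for name, generates_commits in path:
        if generates_commits:
            source = name  # a later entry is closer to the target
    return source


# Two branches of one pipeline, as in the Figure 10-2 description.
paths = {
    "t_monthly_sales": [("SQ_source", True), ("EXP_branch", False)],
    "T_COMPANY_ALL": [("SQ_source", True), ("AGG_all_input", True)],
}

for target, path in paths.items():
    print(target, "->", commit_source(path))
```

The Expression transformation passes commits through without generating its own, so the Source Qualifier remains the commit source for that branch.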

Switching from Source-Based to Target-Based Commit

If the PowerCenter Server identifies a target in the target load order group that does not receive commits from an active source that generates commits, it reverts to target-based commit for that target only.

The PowerCenter Server writes the name of the transformation used for source-based commit intervals into the session log. When the PowerCenter Server switches to target-based commit, it writes a message in the session log.

A target might not receive commits from a commit source in the following circumstances:

♦ The target receives data from the XML Source Qualifier transformation, and you connect multiple output groups from an XML Source Qualifier transformation to downstream transformations. An XML Source Qualifier transformation does not generate commits when you connect multiple output groups downstream.

♦ The target receives data from an active source with multiple output groups other than an XML Source Qualifier transformation. For example, the target receives data from a Custom transformation that you do not configure to generate transactions. Multiple output group active sources neither generate nor propagate commits.

[Figure 10-2. Mapping with Multiple Commit Sources. Figure callout: Transformation Scope property is All Input.]


Connecting XML Sources in a Mapping

An XML Source Qualifier transformation does not generate commits when you connect multiple output groups downstream. When you use an XML Source Qualifier transformation in a mapping, the PowerCenter Server can use different commit types for targets in this session depending on the transformations used in the mapping:

♦ You put a commit source between the XML Source Qualifier transformation and the target. The PowerCenter Server uses source-based commit for the target because it receives commits from the commit source. The active source is the commit source for the target.

♦ You do not put a commit source between the XML Source Qualifier transformation and the target. The PowerCenter Server uses target-based commit for the target because it receives no commits.

Suppose you have the mapping in Figure 10-3:

This mapping contains an XML Source Qualifier transformation with multiple output groups connected downstream. Because you connect multiple output groups downstream, the XML Source Qualifier transformation does not generate commits. You connect the XML Source Qualifier transformation to two relational targets, T_STORE and T_PRODUCT. Therefore, these targets do not receive any commit generated by an active source. The PowerCenter Server uses target-based commit when loading to these targets.

However, the mapping includes an active source that generates commits, AGG_Sales, between the XML Source Qualifier transformation and T_YTD_SALES. The PowerCenter Server uses source-based commit when loading to T_YTD_SALES.

Figure 10-3. Mapping with Targets Connected to a Commit Source

Figure callouts:

♦ Transformation Scope = All Input.

♦ Connected to an XML Source Qualifier transformation with multiple connected output groups. PowerCenter Server uses target-based commit when loading to these targets.

♦ Connected to an active source that generates commits, AGG_Sales. PowerCenter Server uses source-based commit when loading to this target.


Connecting Multiple Output Group Custom Transformations in a Mapping

Multiple output group Custom transformations that you do not configure to generate transactions neither generate nor propagate commits. Therefore, the PowerCenter Server can use different commit types for targets in this session depending on the transformations used in the mapping:

♦ You put a commit source between the Custom transformation and the target. The PowerCenter Server uses source-based commit for the target because it receives commits from the active source. The active source is the commit source for the target.

♦ You do not put a commit source between the Custom transformation and the target. The PowerCenter Server uses target-based commit for the target because it receives no commits.

Suppose you have the mapping in Figure 10-4:

The mapping contains a multiple output group Custom transformation, CT_XML_Parser, which drops the commits generated by the Source Qualifier transformation. Therefore, targets T_store_name and T_store_addr do not receive any commits generated by an active source. The PowerCenter Server uses target-based commit when loading to these targets.

However, the mapping includes an active source that generates commits, AGG_store_orders, between the Custom transformation and T_store_orders. The PowerCenter Server uses source-based commit when loading to T_store_orders.
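The per-target decision in this example can be sketched in Python. Each transformation on the path to a target is tagged by how it treats commits: it generates them, propagates them, or drops them. The tags and the Source Qualifier name are hypothetical labels for illustration; the other names come from the example.

```python
GENERATES, PROPAGATES, DROPS = "generates", "propagates", "drops"


def commit_type(path):
    """Return (commit_type, commit_source) for one target.

    path: list of (transformation_name, behavior) pairs from source to target.
    """
    source = None
    for name, behavior in path:
        if behavior == GENERATES:
            source = name   # closest commit generator so far
        elif behavior == DROPS:
            source = None   # upstream commits never reach the target
    if source is None:
        return ("target-based", None)
    return ("source-based", source)


upstream = [("SQ_store", GENERATES), ("CT_XML_Parser", DROPS)]
paths = {
    "T_store_name": upstream,
    "T_store_addr": upstream,
    "T_store_orders": upstream + [("AGG_store_orders", GENERATES)],
}

for target, path in paths.items():
    print(target, commit_type(path))
```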

Note: You can configure a Custom transformation to generate transactions when the Custom transformation procedure outputs transactions. When you do this, configure the session for user-defined commit. For more information on user-defined commit sessions, see “User-Defined Commits” on page 283.

Figure 10-4. Mapping a Custom Transformation with a Commit Source

Figure callouts:

♦ Transformation Scope is All Input.

♦ Connected to a multiple output group active source, CT_XML_Parser. PowerCenter Server uses target-based commit when loading to these targets.

♦ Connected to an active source that generates commits, AGG_store_orders. PowerCenter Server uses source-based commit when loading to this target.


User-Defined Commits

During a user-defined commit session, the PowerCenter Server commits and rolls back transactions based on a row or set of rows that pass through a Transaction Control transformation. The PowerCenter Server evaluates the transaction control expression for each row that enters the transformation. The return value of the transaction control expression defines the commit or rollback point.

You can also create a user-defined commit session when the mapping contains a Custom transformation configured to generate transactions. When you do this, the procedure associated with the Custom transformation defines the transaction boundaries.

When the PowerCenter Server evaluates a commit row, it commits all rows in the transaction to the target or targets. When it evaluates a rollback row, it rolls back all rows in the transaction from the target or targets. The PowerCenter Server writes a message to the session log at each commit and rollback point. The session details are cumulative. The following message is a sample commit message from the session log:

WRITER_1_1_1> WRT_8317

USER-DEFINED COMMIT POINT Wed Oct 15 08:15:29 2003

===================================================

WRT_8036 Target: TCustOrders (Instance Name: [TCustOrders])

WRT_8038 Inserted rows - Requested: 1003 Applied: 1003 Rejected: 0 Affected: 1023

When the PowerCenter Server writes all rows in a transaction to all targets, it issues commits sequentially for each target.

The PowerCenter Server rolls back data based on the return value of the transaction control expression or error handling configuration. If the transaction control expression returns a rollback value, the PowerCenter Server rolls back the transaction. If an error occurs, you can choose to roll back or commit at the next commit point.

If the transaction control expression evaluates to a value other than commit, rollback, or continue, the PowerCenter Server fails the session. For more information about valid values, see “Transaction Control Transformation” in the Transformation Guide.

When the session completes, the PowerCenter Server may write data to the target that was not bound by commit rows. You can choose to commit at end of file or to roll back that open transaction.
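The commit, rollback, and open-transaction behavior described in this section can be modeled in Python. The sketch uses the simplified outcomes "commit", "rollback", and "continue" in place of the actual transaction control expression values, and it assumes that the boundary row belongs to the transaction it closes. It is a model of the behavior, not PowerCenter code.

```python
VALID_OUTCOMES = ("commit", "rollback", "continue")


def run_user_defined_commit(rows, commit_on_end_of_file=True):
    """Return the rows that end up committed to the target.

    rows: list of (row_value, outcome) pairs, where outcome is the
    simplified result of the transaction control expression for that row.
    """
    committed, open_transaction = [], []
    for value, outcome in rows:
        if outcome not in VALID_OUTCOMES:
            # Any other value fails the session.
            raise ValueError(f"invalid transaction control value: {outcome!r}")
        open_transaction.append(value)
        if outcome == "commit":
            committed.extend(open_transaction)
            open_transaction = []
        elif outcome == "rollback":
            open_transaction = []
    # Rows after the last boundary form an open transaction.
    if open_transaction and commit_on_end_of_file:
        committed.extend(open_transaction)
    return committed


rows = [(1, "continue"), (2, "commit"), (3, "continue"), (4, "rollback"), (5, "continue")]
print(run_user_defined_commit(rows, commit_on_end_of_file=True))
print(run_user_defined_commit(rows, commit_on_end_of_file=False))
```

Row 5 is never bound by a commit or rollback row, so whether it reaches the target depends only on the end-of-file setting.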

Note: If you use bulk loading with a user-defined commit session, the target may not recognize the transaction boundaries. If the target connection group does not support transactions, the PowerCenter Server writes the following message to the session log:

WRT_8234 Warning: Target Connection Group’s connection doesn’t support transactions. Targets may not be loaded according to specified transaction boundaries rules.


Rolling Back Transactions

The PowerCenter Server rolls back transactions in the following circumstances:

♦ Rollback evaluation. The transaction control expression returns a rollback value.

♦ Open transaction. You choose to roll back at the end of file.

♦ Roll back on error. You choose to roll back commit transactions if the PowerCenter Server encounters a non-fatal error.

♦ Roll back on failed commit. If any target connection group in a transaction control unit fails to commit, the PowerCenter Server rolls back all uncommitted data to the last successful commit point.

For more information on transaction control units, see “Understanding Transaction Control Units” on page 289.

Rollback Evaluation

If the transaction control expression returns a rollback value, the PowerCenter Server rolls back the transaction and writes a message to the session log indicating that the transaction was rolled back. It also indicates how many rows were rolled back.

The following message is a sample message that the PowerCenter Server writes to the session log when the transaction control expression returns a rollback value:

WRITER_1_1_1> WRT_8326 User-defined rollback processed

WRITER_1_1_1> WRT_8331 Rollback statistics

WRT_8162 ===================================================

WRT_8330 Rolled back [333] inserted, [0] deleted, [0] updated rows for the target [TCustOrders]

Roll Back Open Transaction

If the last row in the transaction control expression evaluates to TC_CONTINUE_TRANSACTION, the session completes with an open transaction. If you choose to roll back that open transaction, the PowerCenter Server rolls back the transaction and writes a message to the session log indicating that the transaction was rolled back.

The following message is a sample message indicating that Commit on End of File is disabled in the session properties:

WRITER_1_1_1> WRT_8168 End loading table [TCustOrders] at: Wed Nov 05 10:21:56 2003

WRITER_1_1_1> WRT_8325 Final rollback executed for the target [TCustOrders] at end of load

The following message is a sample message indicating that Commit on End of File is enabled in the session properties:

WRITER_1_1_1> WRT_8143

Commit at end of Load Order Group Wed Nov 05 08:15:29 2003


Roll Back on Error

You can choose to roll back a transaction at the next commit point if the PowerCenter Server encounters a non-fatal error. When the PowerCenter Server encounters a non-fatal error, it processes the error row and continues processing the transaction. If the transaction boundary is a commit row, the PowerCenter Server rolls back the entire transaction and writes it to the reject file.

The following table describes row indicators in the reject file for rolled-back transactions:

Row Indicator Description

4 Rolled-back insert

5 Rolled-back update

6 Rolled-back delete

Note: The PowerCenter Server does not roll back a transaction if it encounters an error before it processes any row through the Transaction Control transformation.

Roll Back on Failed Commit

When the PowerCenter Server reaches the commit point for all targets in a transaction control unit, it issues commits sequentially for each target. If the commit fails for any target connection group within a transaction control unit, the PowerCenter Server rolls back all data to the last successful commit point. The PowerCenter Server cannot roll back committed transactions, but it does write the transactions to the reject file.

For example, use the mapping in Figure 10-5 on page 286 to read through the following scenario. This mapping has one transaction control unit and three target connection groups. The target names contain information about the target connection group. For example, TCG1_T1 represents the first target connection group and the first target.

1. The PowerCenter Server reaches the third commit point for all targets.

2. It begins to issue commits sequentially for each target.

3. The PowerCenter Server successfully commits to TCG1_T1 and TCG1_T2.

4. The commit fails for TCG2_T3.

5. The PowerCenter Server does not issue a commit for TCG3_T4.

6. The PowerCenter Server rolls back TCG2_T3 and TCG3_T4 to the second commit point, but it cannot roll back TCG1_T1 and TCG1_T2 to the second commit point because it successfully committed at the third commit point.

7. The PowerCenter Server writes the rows to the reject file from TCG2_T3 and TCG3_T4. These are the rollback rows associated with the third commit point.

8. The PowerCenter Server writes the rows to the reject file from TCG1_T1 and TCG1_T2. These are the commit rows associated with the third commit point.
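The failed-commit scenario in this section can be traced with a short Python sketch. It models one transaction control unit whose targets commit sequentially and assumes that every row in the transaction is an insert, so the reject file indicators are 7 (committed insert) and 4 (rolled-back insert). The function name is hypothetical.

```python
COMMITTED_INSERT, ROLLED_BACK_INSERT = 7, 4  # reject file row indicators


def commit_transaction_control_unit(targets, failing_group):
    """Commit targets in order until a connection group fails.

    targets: list of (target_name, connection_group) in commit order.
    Returns (committed, rolled_back, reject_indicators).
    """
    committed, rolled_back = [], []
    failed = False
    for name, group in targets:
        if group == failing_group:
            failed = True
        if failed:
            # The failing target and every target after it return to the
            # last successful commit point.
            rolled_back.append(name)
        else:
            committed.append(name)
    # All rows of the failed commit point are written to the reject file.
    rejects = {name: COMMITTED_INSERT for name in committed}
    rejects.update({name: ROLLED_BACK_INSERT for name in rolled_back})
    return committed, rolled_back, rejects


targets = [
    ("TCG1_T1", "TCG1"),
    ("TCG1_T2", "TCG1"),
    ("TCG2_T3", "TCG2"),
    ("TCG3_T4", "TCG3"),
]
committed, rolled_back, rejects = commit_transaction_control_unit(targets, "TCG2")
print(committed, rolled_back, rejects)
```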


Figure 10-5 illustrates PowerCenter Server behavior when it rolls back on a failed commit:

Figure 10-5. Roll Back on Failed Commit Example

Figure callouts (numbers in parentheses refer to the steps in the scenario):

♦ Third commit is successful (3). Rows appear in the reject file (8).

♦ Third commit fails (4). PowerCenter Server rolls back to second commit (6). Rows appear in reject file (7).

♦ PowerCenter Server does not issue third commit (5). It rolls back to second commit (6). Rows appear in reject file (7).

The following table describes row indicators in the reject file for committed transactions in a failed transaction control unit:

Row Indicator Description

7 Committed insert

8 Committed update

9 Committed delete


Understanding Transaction Control

PowerCenter allows you to define transactions that the PowerCenter Server uses when it processes transformations, and when it commits and rolls back data at a target. You can define a transaction based on a varying number of input rows. A transaction is a set of rows bound by commit or rollback rows, the transaction boundaries. Some rows may not be bound by transaction boundaries. This set of rows is an open transaction. You can choose to commit at end of file or to roll back open transactions when you configure the session. For more information on the Commit On End of File session property, see “Setting Commit Properties” on page 292.

The PowerCenter Server can process a transformation one row at a time, for all rows in a transaction, or for all source rows together. Processing a transformation for all rows in a transaction allows you to include transformations, such as an Aggregator, in a real-time session. For more information on configuring how the PowerCenter Server processes a transformation, see “Transformation Scope” on page 287.

Transaction boundaries originate from transaction control points. A transaction control point is a transformation that defines or redefines the transaction boundary in the following ways:

♦ Generates transaction boundaries. The transformations that define transaction boundaries differ, depending on the session commit type:

− Target-based and user-defined commit. Transaction generators generate transaction boundaries. A transaction generator is a transformation that generates both commit and rollback rows. The Transaction Control and Custom transformation are transaction generators.

− Source-based commit. Some active sources generate commits. They do not generate rollback rows. Also, transaction generators generate commit and rollback rows. For a list of active sources that generate commits, see “Determining the Commit Source” on page 278.

♦ Drops incoming transaction boundaries. When a transformation drops incoming transaction boundaries, and does not generate commits, the PowerCenter Server outputs all rows into an open transaction. All active sources that generate commits and transaction generators drop incoming transaction boundaries.

For a list of transaction control points, see Table 10-1 on page 288.

Transformation Scope

You can configure how the PowerCenter Server applies the transformation logic to incoming data with the Transformation Scope transformation property. When the PowerCenter Server processes a transformation, it either drops transaction boundaries or preserves transaction boundaries, depending on the transformation scope and the mapping configuration.

You can choose one of the following values for the transformation scope:

♦ Row. Applies the transformation logic to one row of data at a time. Choose Row when a row of data does not depend on any other row. When you choose Row for a transformation connected to multiple upstream transaction control points, the PowerCenter Server drops transaction boundaries and outputs all rows from the transformation as an open transaction. When you choose Row for a transformation connected to a single upstream transaction control point, the PowerCenter Server preserves transaction boundaries.

♦ Transaction. Applies the transformation logic to all rows in a transaction. Choose Transaction when a row of data depends on all rows in the same transaction, but does not depend on rows in other transactions. When you choose Transaction, the PowerCenter Server preserves incoming transaction boundaries. It resets any cache, such as an aggregator or lookup cache, when it receives a new transaction.

When you choose Transaction for a multiple input group transformation, you must connect all input groups to the same upstream transaction control point.

♦ All Input. Applies the transformation logic on all incoming data. When you choose All Input, the PowerCenter Server drops incoming transaction boundaries and outputs all rows from the transformation as an open transaction. Choose All Input when a row of data depends on all rows in the source.
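The three scope values can be illustrated with a small simulation. The following Python sketch is not PowerCenter code: the function names `apply_scope` and `total`, and the representation of transactions as lists of integers, are hypothetical. It assumes the transformation receives data from a single upstream transaction control point, so Row scope preserves transaction boundaries.

```python
from typing import Callable, List

Rows = List[int]


def apply_scope(transactions: List[Rows], scope: str,
                logic: Callable[[Rows], Rows]) -> List[Rows]:
    """Apply `logic` to incoming data according to a transformation scope.

    `transactions` is a list of transactions, each a list of rows.
    The return value is the list of outgoing transactions.
    """
    if scope == "Row":
        # Logic sees one row at a time; boundaries are preserved
        # (assumes a single upstream transaction control point).
        return [[out for row in txn for out in logic([row])]
                for txn in transactions]
    if scope == "Transaction":
        # Logic sees all rows of one transaction; any cache starts
        # fresh for each new transaction, and boundaries are preserved.
        return [logic(txn) for txn in transactions]
    if scope == "All Input":
        # Incoming boundaries are dropped; all rows are processed
        # together and leave as one open transaction.
        all_rows = [row for txn in transactions for row in txn]
        return [logic(all_rows)]
    raise ValueError(f"unknown transformation scope: {scope}")


def total(rows: Rows) -> Rows:
    """Aggregator-like logic: one output row holding the sum."""
    return [sum(rows)]


incoming = [[1, 2, 3], [10, 20]]
per_row = apply_scope(incoming, "Row", total)
per_transaction = apply_scope(incoming, "Transaction", total)
all_input = apply_scope(incoming, "All Input", total)
```

With Transaction scope the sum restarts at each boundary, while All Input produces a single result over every row, which is why All Input cannot preserve the incoming transaction boundaries.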

Table 10-1 lists the transformation scope values available for each transformation:

Table 10-1. Transformation Scope Property Values

Transformation | Row | Transaction | All Input
Aggregator | | Optional. | Default. Transaction control point.
Application Source Qualifier | n/a. Transaction control point. | |
Custom* | Optional. Transaction control point or when configured to generate commits. | Optional. Transaction control point or when configured to generate commits. | Default. Transaction control point when it has one output group or when configured to generate commits.
Expression | Default. Does not display. | |
External Procedure | Default. Does not display. | |
Filter | Default. Does not display. | |
Joiner | | Optional. | Default. Transaction control point.
Lookup | Default. Does not display. | |
MQ Source Qualifier | n/a. Transaction control point. | |
Normalizer (VSAM) | n/a. Transaction control point. | |
Normalizer (relational) | Default. Does not display. | |
Rank | | Optional. | Default. Transaction control point.
Router | Default. Does not display. | |
Sorter | | Optional. | Default. Transaction control point.
Sequence Generator | Default. Does not display. | |
Source Qualifier | n/a. Transaction control point. | |
Stored Procedure | Default. Does not display. | |
Transaction Control | Default. Does not display. Transaction control point. | |
Union | Default. Does not display. | |
Update Strategy | Default. Does not display. | |
XML Generator | | Optional. Transaction when the flush on commit is set to create a new document. | Default. Does not display.
XML Parser | Default. Does not display. | |
XML Source Qualifier | n/a. Transaction control point. | |

*For more information on how the Transformation Scope property affects the Custom transformation, see “Custom Transformation” in the Transformation Guide.

Understanding Transaction Control Units

A transaction control unit is the group of targets connected to an active source that generates commits or an effective transaction generator. A transaction control unit may contain multiple target connection groups. For more information on target connection groups, see “Working with Target Connection Groups” on page 257.

When the PowerCenter Server reaches the commit point for all targets in a transaction control unit, it issues commits sequentially for each target.

Figure 10-6 illustrates transaction control units with a Transaction Control transformation:

Figure 10-6. Transaction Control Units
Callouts: Target Connection Group 1. Transaction Control Unit 1. Transaction Control Unit 2.

Note that T5_ora1 uses the same connection name as T1_ora1 and T2_ora1. Because T5_ora1 is connected to a separate Transaction Control transformation, it is in a separate transaction control unit and target connection group. If you connect T5_ora1 to tc_TransactionControlUnit1, it will be in the same transaction control unit as all targets, and in the same target connection group as T1_ora1 and T2_ora1.

Rules and Guidelines

Consider the following rules and guidelines when you work with transaction control:

♦ Transformations with Transaction transformation scope must receive data from a single transaction control point.

♦ The PowerCenter Server uses the transaction boundaries defined by the first upstream transaction control point for transformations with Transaction transformation scope.

♦ Transaction generators can be effective or ineffective for a target. The PowerCenter Server uses the transaction generated by an effective transaction generator when it loads data to a target. For more information on effective and ineffective transaction generators, see “Transaction Control Transformation” in the Transformation Guide.

♦ The Workflow Manager prevents you from using incremental aggregation in a session with an Aggregator transformation with Transaction transformation scope.

♦ Transformations with All Input transformation scope cause a transaction generator to become ineffective for a target in a user-defined commit session. For more information on using transaction generators in mappings, see “Transaction Control Transformation” in the Transformation Guide.

♦ The PowerCenter Server resets any cache at the beginning of each transaction for Aggregator, Joiner, Rank, and Sorter transformations with Transaction transformation scope.

♦ You can only choose the Transaction transformation scope for Joiner transformations when you use sorted input.

♦ When you add a partition point at a transformation with Transaction transformation scope, the Workflow Manager uses the pass-through partition type by default. You cannot change the partition type.


Setting Commit Properties

When you create a session, you can configure commit properties. The properties you set depend on the type of mapping and the type of commit you want the PowerCenter Server to perform.

Figure 10-7 shows the session commit properties that you set in the General Options settings of the Properties tab:

Figure 10-7. Session Commit Properties
Callouts: Commit Type. Commit Interval. Roll Back Transactions on Error. Commit on End of File.

Table 10-2 describes the session commit properties that you set in the General Options settings of the Properties tab:

Table 10-2. Session Commit Properties

Property | Target-Based | Source-Based | User-Defined
Commit Type | Selected by default if no transaction generator or only ineffective transaction generators are in the mapping. | Choose for source-based commit if no transaction generator or only ineffective transaction generators are in the mapping. | Selected by default if effective transaction generators are in the mapping.
Commit Interval* | Default is 10,000. | Default is 10,000. | n/a
Commit on End of File | Commits data at the end of the file. Enabled by default. You cannot disable this option. | Commits data at the end of the file. Clear this option if you want the PowerCenter Server to roll back open transactions. | Commits data at the end of the file. Clear this option if you want the PowerCenter Server to roll back open transactions.
Roll Back Transactions on Errors | n/a | If the PowerCenter Server encounters a non-fatal error, you can choose to roll back the transaction at the next commit point. When the PowerCenter Server encounters a transformation error, it only rolls back the transaction if the error occurs after the effective transaction generator for the target. | If the PowerCenter Server encounters a non-fatal error, you can choose to roll back the transaction at the next commit point. When the PowerCenter Server encounters a transformation error, it only rolls back the transaction if the error occurs after the effective transaction generator for the target.

*Tip: When you bulk load to Microsoft SQL Server or Oracle targets, define a large commit interval. Microsoft SQL Server and Oracle start a new bulk load transaction after each commit. Increasing the commit interval reduces the number of bulk load transactions and increases performance.
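The default Commit Type selection described in Table 10-2 can be expressed as a one-line rule. The following Python sketch is illustrative only: the function name `default_commit_type` and the use of a list of booleans (one per transaction generator in the mapping, true when the generator is effective) are assumptions, not part of the product.

```python
from typing import Iterable


def default_commit_type(generators_effective: Iterable[bool]) -> str:
    """Return the commit type selected by default for a session.

    Each item states whether one transaction generator in the mapping
    is effective. An empty iterable means the mapping has no
    transaction generators.
    """
    # User-defined commit is the default only when at least one
    # effective transaction generator is in the mapping.
    if any(generators_effective):
        return "User-Defined"
    # No generators, or only ineffective ones: target-based is the
    # default, and source-based can be chosen instead.
    return "Target-Based"


no_generators = default_commit_type([])
only_ineffective = default_commit_type([False, False])
one_effective = default_commit_type([False, True])
```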


Chapter 11

Recovering Data


♦ Overview, 296

♦ Preparing for Recovery, 297

♦ Recovering a Suspended Workflow, 305

♦ Recovering a Failed Workflow, 308

♦ Recovering a Session Task, 311

♦ Server Handling for Recovery, 314

♦ Completing Unrecoverable Sessions, 316


Overview

If you stop a session or if an error causes a session to stop unexpectedly, refer to the session logs to determine the cause of the failure. Correct the errors, and then complete the session. The method you use to complete the session depends on the configuration of the mapping and the session, the specific failure, and how much progress the session made before it failed. If the PowerCenter Server did not commit any data, run the session again. If the session issued at least one commit and is recoverable, consider running the session in recovery mode.

Recovery allows you to restart a failed session and complete it as if the session had run without pause. When the PowerCenter Server runs in recovery mode, it continues to commit data from the point of the last successful commit. For more information on PowerCenter Server processing during recovery, see “Server Handling for Recovery” on page 314.

All recovery sessions run as part of a workflow. When you recover a session, you also have the option to run part of the workflow. Consider the configuration and design of the workflow and the status of other tasks in the workflow before you choose a method of recovery. Depending on the configuration and status of the workflow and session, you can choose one or more of the following recovery methods:

♦ Recover a suspended workflow. If the workflow suspends due to session failure, you can recover the failed session and resume the workflow. For details, see “Recovering a Suspended Workflow” on page 305.

♦ Recover a failed workflow. If the workflow fails as a result of session failure, you can recover the session and run the rest of the workflow. For details, see “Recovering a Failed Workflow” on page 308.

♦ Recover a session task. If the workflow completes, but a session fails, you can recover the session alone without running the rest of the workflow. You can also use this method to recover multiple failed sessions in a branched workflow. For details, see “Recovering a Session Task” on page 311.
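The choices described in this overview can be summarized as two small decision functions. The following Python sketch is a reading aid, not product behavior: the function names, the status labels "suspended", "failed", and "completed", and the returned strings are all hypothetical.

```python
def completion_method(committed_data: bool, recoverable: bool) -> str:
    """Suggest how to complete a session that stopped or failed."""
    if not committed_data:
        # Nothing was committed, so a plain rerun is sufficient.
        return "run the session again"
    if recoverable:
        # At least one commit was issued and the session is recoverable.
        return "consider running the session in recovery mode"
    # Data was committed but the session cannot be recovered.
    return "complete the session as an unrecoverable session"


def recovery_method(workflow_status: str) -> str:
    """Pick a recovery method from the state the workflow ended in."""
    methods = {
        "suspended": "recover a suspended workflow",
        "failed": "recover a failed workflow",
        "completed": "recover a session task",
    }
    if workflow_status not in methods:
        raise ValueError(f"unexpected workflow status: {workflow_status}")
    return methods[workflow_status]


first_choice = completion_method(committed_data=False, recoverable=True)
second_choice = completion_method(committed_data=True, recoverable=True)
method = recovery_method("suspended")
```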

For more information on session failure, see “Stopping and Aborting a Session” on page 200.


Preparing for Recovery

Before you perform recovery, you must configure the mapping, session, workflow, and target database to ensure that the recovery session will consistently read, transform, and write data as though the session had not failed.

Under certain circumstances, you cannot recover the session and must run it again. For more information on completing unrecoverable sessions, see “Completing Unrecoverable Sessions” on page 316.

Configuring the Mapping

When you design a mapping, consider requirements for session recovery. Configure the mapping so that the PowerCenter Server can extract, transform, and load data with the same results each time it runs the session.

Use the following guidelines when you configure the mapping:

♦ Sort the data from the source. This guarantees that the PowerCenter Server always receives source rows in the same order. You can do this by configuring the Sorted Ports option in the Source Qualifier or Application Source Qualifier transformation or by adding a Sorter transformation configured for distinct output rows to the mapping after the source qualifier.

♦ Verify all targets receive data from transformations that produce repeatable data. Some transformations produce repeatable data. You can enable a session for recovery in the Workflow Manager when all targets in the mapping receive data from transformations that produce repeatable data. For more information on repeatable data, see “Working with Repeatable Data” on page 301.

Also, to perform consistent data recovery, the source, target, and transformation properties for the recovery session must be the same as those for the failed session. Do not change the properties of objects in the mapping before you run the recovery session.

Configuring the Session

To perform recovery on a failed session, the session must meet the following criteria:

♦ The session is enabled for recovery.

♦ The previous session run failed and the recovery information is accessible.

To enable recovery, select the Enable Recovery option in the Error Handling settings of the Configuration tab in the session properties.

If you enable recovery and also choose to truncate the target for a relational normal load session, the PowerCenter Server does not truncate the target when you run the session in recovery mode.

Use the following guidelines when you enable recovery for a partitioned session:


♦ The Workflow Manager configures all partition points to use the default partitioning scheme for each transformation when you enable recovery.

♦ The Workflow Manager sets the partition type to pass-through unless the transformation receiving the data is either an Aggregator transformation, a Rank transformation, or a sorted Joiner transformation.

♦ You can only enable recovery for unsorted Joiner transformations with one partition.

♦ For Custom transformations, you can enable recovery only for transformations with one input group.

The PowerCenter Server disables test load when you enable the session for recovery.

To perform consistent data recovery, the session properties for the recovery session must be the same as the session properties for the failed session. This includes the partitioning configuration and the session sort order.

Configuring the Workflow

The recovery method you choose for the workflow depends on the design and configuration of the workflow. As with sessions, you can configure a workflow so that you can correct errors and complete the workflow as though it ran without error.

If other tasks or workflows in your environment depend on the successful completion of a session, configure the workflow containing the session to suspend on error. This is useful for sequential and concurrent sessions because it prevents the PowerCenter Server from continuing the workflow after the session fails. This is also useful if multiple concurrent sessions fail or if other workflows depend on the successful completion of the workflow. For details on recovering a suspended workflow, see “Recovering a Suspended Workflow” on page 305.

If you do not want to configure the workflow to suspend on error, you can configure recoverable sessions to fail the workflow if the session fails. This prevents the PowerCenter Server from continuing to run the workflow after the session fails. In this case, you may want to perform recovery by running the part of the workflow that did not yet run. For more information, see “Recovering a Failed Workflow” on page 308.

You can also allow the workflow to complete even if sessions or other tasks fail. You can then choose to recover only the failed session tasks. This allows you to recover the sessions without running previously successful tasks. For more information, see “Recovering a Session Task” on page 311.

Configuring the Target Database

When the PowerCenter Server runs a session in recovery mode, it uses information in recovery tables that it creates on the target database system. The PowerCenter Server creates the recovery tables when it runs a session enabled for recovery. If the tables already exist, the PowerCenter Server writes information to them.


The PowerCenter Server creates the following recovery tables in the target database:

♦ PM_RECOVERY. This table records target load information during the session run. The PowerCenter Server removes the information from this table after each successful session and initializes the information at the beginning of subsequent sessions.

♦ PM_TGT_RUN_ID. This table records information the PowerCenter Server uses to identify each target on the database. The information remains in the table between session runs.

If you want the PowerCenter Server to create the recovery tables, you must grant table creation privileges to the database user name for the target database connection. If you do not want the PowerCenter Server to create the recovery tables, you must create the recovery tables manually.

Do not edit or drop the recovery tables while recovery is enabled. If you disable recovery, the PowerCenter Server does not remove the recovery tables from the target database. You must manually remove the recovery tables.

Table 11-1 describes the format of PM_RECOVERY:

Table 11-1. PM_RECOVERY Table Definition

Column Name | Datatype
REP_GID | VARCHAR(240)
WFLOW_ID | NUMBER
SUBJ_ID | NUMBER
TASK_INST_ID | NUMBER
TGT_INST_ID | NUMBER
PARTITION_ID | NUMBER
TGT_RUN_ID | NUMBER
RECOVERY_VER | NUMBER
CHECK_POINT | NUMBER
ROW_COUNT | NUMBER

Table 11-2 describes the format of PM_TGT_RUN_ID:

Table 11-2. PM_TGT_RUN_ID Table Definition

Column Name | Datatype
LAST_TGT_RUN_ID | NUMBER

Note: If you manually create the PM_TGT_RUN_ID table, you must specify a value other than zero in the LAST_TGT_RUN_ID column to ensure that the session runs successfully in recovery mode.
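If you create the recovery tables manually, the definitions in Table 11-1 and Table 11-2 translate into two CREATE TABLE statements and one seed row. The following Python sketch uses an in-memory SQLite database purely as a stand-in so that the statements can be run anywhere. The function name `create_recovery_tables` and the seed value 1 are assumptions, and the datatypes, nullability, and any keys required on a real target database may differ from what is shown here.

```python
import sqlite3


def create_recovery_tables(conn: sqlite3.Connection) -> None:
    """Create PM_RECOVERY and PM_TGT_RUN_ID using the documented columns."""
    conn.executescript(
        """
        CREATE TABLE PM_RECOVERY (
            REP_GID      VARCHAR(240),
            WFLOW_ID     NUMBER,
            SUBJ_ID      NUMBER,
            TASK_INST_ID NUMBER,
            TGT_INST_ID  NUMBER,
            PARTITION_ID NUMBER,
            TGT_RUN_ID   NUMBER,
            RECOVERY_VER NUMBER,
            CHECK_POINT  NUMBER,
            ROW_COUNT    NUMBER
        );
        CREATE TABLE PM_TGT_RUN_ID (
            LAST_TGT_RUN_ID NUMBER
        );
        -- The guide requires a value other than zero; 1 is an
        -- arbitrary choice made for this example.
        INSERT INTO PM_TGT_RUN_ID (LAST_TGT_RUN_ID) VALUES (1);
        """
    )


connection = sqlite3.connect(":memory:")
create_recovery_tables(connection)
recovery_columns = [row[1] for row in
                    connection.execute("PRAGMA table_info(PM_RECOVERY)")]
last_run_id = connection.execute(
    "SELECT LAST_TGT_RUN_ID FROM PM_TGT_RUN_ID").fetchone()[0]
```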


Creating pmcmd Scripts

You can use pmcmd to perform recovery from the command line or in a script. When you use pmcmd commands in a script, pmcmd indicates the success or failure of the command with a return code.

Table 11-3 describes the return codes for pmcmd that relate to recovery:

Table 11-3. pmcmd Return Codes for Recovery

Code | Description
12 | The PowerCenter Server cannot start recovery because the session or workflow is scheduled, suspending, waiting for an event, waiting, initializing, aborting, stopping, disabled, or running.
19 | The PowerCenter Server cannot start the session in recovery mode because the workflow is configured to run continuously.

For details on additional pmcmd return codes, see “pmcmd Return Codes” on page 590.
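A script that captures the pmcmd return code can translate the recovery-related codes in Table 11-3 into readable messages. The following Python sketch assumes the calling script has already obtained the return code as an integer. It does not show a pmcmd command line, and the names `RECOVERY_RETURN_CODES` and `explain_recovery_code` are hypothetical.

```python
# Recovery-related pmcmd return codes, as listed in Table 11-3.
RECOVERY_RETURN_CODES = {
    12: ("Cannot start recovery: the session or workflow is scheduled, "
         "suspending, waiting for an event, waiting, initializing, "
         "aborting, stopping, disabled, or running."),
    19: ("Cannot start the session in recovery mode: the workflow is "
         "configured to run continuously."),
}


def explain_recovery_code(code: int) -> str:
    """Return a message for a pmcmd return code seen during recovery."""
    if code in RECOVERY_RETURN_CODES:
        return RECOVERY_RETURN_CODES[code]
    # Other codes are outside the scope of this table.
    return (f"Return code {code} is not specific to recovery; "
            "see the full list of pmcmd return codes.")


message_for_12 = explain_recovery_code(12)
message_for_other = explain_recovery_code(7)
```

A wrapper script might log the message and stop before it retries, because both codes describe a workflow state or configuration that must change before recovery can start.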


Working with Repeatable Data

You can enable a session for recovery in the Workflow Manager when all targets in the mapping receive data from transformations that produce repeatable data. All transformations have a property that determines when the transformation produces repeatable data. For most transformations, this property is hidden. However, you can write the Custom transformation procedure to output repeatable data, and then configure the Custom transformation Output Is Repeatable property to match the procedure behavior.

Transformations can produce repeatable data under the following circumstances:

♦ Never. The order of the output data is inconsistent between session runs. This is the default for active Custom transformations.

♦ Based on input order. The output order is consistent between session runs when the input data order for all input groups is consistent between session runs. This is the default for passive Custom transformations.

♦ Always. The order of the output data is consistent between session runs even if the order of the input data is inconsistent between session runs.

♦ Based on transformation configuration. The transformation produces repeatable data depending on how you configure the transformation. You can always enable the session for recovery, but you may get inconsistent results depending on how you configure the transformation.

Table 11-4 lists which transformations produce repeatable data:

Table 11-4. Transformations that Output Repeatable Data

Transformation | Output is Repeatable
Source Qualifier (relational) | Based on transformation configuration. Use sorted ports to produce repeatable data. Or, add a transformation that produces repeatable data immediately after the Source Qualifier transformation. If you do neither, you might get inconsistent results.
Source Qualifier (flat file) | Always.
Application Source Qualifier | Based on transformation configuration. Use sorted ports for relational sources, such as Siebel sources, to produce repeatable data. Or, add a transformation that produces repeatable data immediately after the Application Source Qualifier transformation. If you do neither, you might get inconsistent results.
MQ Source Qualifier | Always.
XML Source Qualifier | Always.
Aggregator | Always.
Custom | Based on transformation configuration. Configure the Output is Repeatable property according to the Custom transformation procedure behavior.
Expression | Based on input order.
External Procedure | Based on input order.
Filter | Based on input order.
Joiner | Based on input order.
Lookup | Based on input order.
Normalizer (VSAM) | Always. You can enable the session for recovery. However, you might get inconsistent results if you run the session in recovery mode. The Normalizer transformation generates source data in the form of primary keys. Recovering a session might generate different values than if the session completed successfully. However, the PowerCenter Server continues to produce unique key values.
Normalizer (pipeline) | Based on input order.
Rank | Always.
Router | Based on input order.
Sequence Generator | Based on transformation configuration. You must reset the sequence value to the value set in the failed session run. If you do not, you might get inconsistent results.
Sorter, configured for distinct output rows | Always.
Sorter, not configured for distinct output rows | Based on input order.
Stored Procedure | Based on input order.
Transaction Control | Based on input order.
Union | Never.
Update Strategy | Based on input order.
XML Generator | Always.
XML Parser | Always.

To run a session in recovery mode, you must first enable the failed session for recovery. Before it lets you enable a session for recovery, the Workflow Manager verifies that all targets in the mapping receive data from transformations that produce repeatable data. The Workflow Manager uses the values in Table 11-4 to determine whether or not you can enable a session for recovery.

However, the Workflow Manager cannot verify whether you have correctly configured some transformations, such as the Sequence Generator transformation, so it always allows you to enable these sessions for recovery. You may get inconsistent results if you do not configure these transformations correctly.


You cannot enable a session for recovery in the Workflow Manager under the following circumstances:

♦ You connect a transformation that never produces repeatable data directly to a target. To enable this session for recovery, you can add a transformation that always produces repeatable data between the transformation that never produces repeatable data and the target.

♦ You connect a transformation that never produces repeatable data directly to a transformation that produces repeatable data based on input order. To enable this session for recovery, you can add a transformation that always produces repeatable data immediately after the transformation that never produces repeatable data.

When a mapping contains a transformation that never produces repeatable data, you can add a transformation that always produces repeatable data immediately after it.
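The rules above amount to propagating a single yes or no state along the pipeline. The following Python sketch models that propagation for a simple linear chain of transformations. It is an illustration under stated assumptions, not the Workflow Manager's actual validation: the function name and constants are hypothetical, branches and multiple input groups are not modeled, and configuration-based transformations are treated as repeatable because the Workflow Manager always allows them.

```python
from typing import List, Tuple

NEVER = "never"
INPUT_ORDER = "based on input order"
ALWAYS = "always"
CONFIGURED = "based on transformation configuration"


def target_receives_repeatable_data(chain: List[Tuple[str, str]]) -> bool:
    """Walk a linear chain of (name, setting) pairs from source to target."""
    repeatable = False  # nothing is guaranteed before the first transformation
    for name, setting in chain:
        if setting in (ALWAYS, CONFIGURED):
            # Output order is fixed regardless of the input order.
            repeatable = True
        elif setting == NEVER:
            # Output order is lost, whatever came before.
            repeatable = False
        elif setting != INPUT_ORDER:
            raise ValueError(f"unknown setting for {name}: {setting}")
        # INPUT_ORDER passes the incoming state through unchanged.
    return repeatable


recoverable = [
    ("Source Qualifier (flat file)", ALWAYS),
    ("Aggregator", ALWAYS),
    ("Lookup", INPUT_ORDER),
    ("Expression", INPUT_ORDER),
]
blocked = [
    ("Source Qualifier (flat file)", ALWAYS),
    ("Union", NEVER),
    ("Lookup", INPUT_ORDER),
]
# Adding a transformation that always produces repeatable data
# immediately after the Union restores repeatability.
fixed = blocked[:2] + [("Sorter (distinct output rows)", ALWAYS)] + blocked[2:]

can_enable_recoverable = target_receives_repeatable_data(recoverable)
can_enable_blocked = target_receives_repeatable_data(blocked)
can_enable_fixed = target_receives_repeatable_data(fixed)
```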

Note: In some cases, you might get inconsistent data if you run some sessions in recovery mode. For a description of circumstances that might lead to inconsistent data, see “Completing Unrecoverable Sessions” on page 316.

Figure 11-1 illustrates a mapping you can enable for recovery:

Figure 11-1. Mapping You Can Enable for Recovery

The mapping contains an Aggregator transformation that always produces repeatable data. The Aggregator transformation provides data for the Lookup and Expression transformations. Lookup and Expression transformations produce repeatable data if they receive repeatable data. Therefore, the target receives repeatable data, and you can enable this session for recovery.

Figure 11-2 illustrates a mapping you cannot enable for recovery:

Figure 11-2. Mapping You Cannot Enable for Recovery

The mapping contains two Source Qualifier transformations that produce repeatable data. However, the mapping contains a Union and Custom transformation downstream that never produce repeatable data. The Lookup transformation only produces repeatable data if it receives repeatable data. Therefore, the target does not receive repeatable data, and you cannot enable this session for recovery.

You can modify this mapping to enable the session for recovery by adding a Sorter transformation configured for distinct output rows immediately after transformations that never output repeatable data. Since the Union transformation is connected directly to another transformation that never produces repeatable data, you only need to add a Sorter transformation after the Custom transformation, as shown in the mapping in Figure 11-3:

Figure 11-3. Modified Mapping You Can Enable for Recovery

Callouts in Figures 11-2 and 11-3: Never produces repeatable data. Produces repeatable data based on input order. Configured for distinct output rows. Always produces repeatable data.


Recovering a Suspended Workflow

You can configure the workflow to suspend if a task fails. If a session that is enabled for recovery fails, you can correct the error that caused the session to fail and resume the suspended workflow in recovery mode. When the PowerCenter Server resumes the workflow, it runs the failed session in recovery mode. If the recovery session succeeds, the PowerCenter Server runs the rest of the workflow.

You can recover a suspended workflow with sequential or concurrent sessions. For workflows with either sequential or concurrent sessions, suspending the workflow on error is useful if successive tasks in the workflow depend on the success of the previous sessions. For a workflow with concurrent sessions, resuming a suspended workflow in recovery mode also allows you to simultaneously recover concurrent failed sessions.

You can resume a suspended workflow in recovery mode only if a session that is enabled for recovery fails. If a session that is not enabled for recovery fails, you can resume the workflow normally. When you resume the workflow, the PowerCenter Server restarts the session. If the session succeeds, the PowerCenter Server runs the rest of the workflow.

To configure the workflow to suspend on error, enable the Suspend On Error option on the General tab of the workflow properties. For more information about suspending the workflow, see “Suspending the Workflow” on page 127.

For steps on recovering a suspended workflow, see “Steps for Recovering a Suspended Workflow” on page 307.

Recovering a Suspended Workflow with Sequential Sessions

When a sequential session enabled for recovery fails, the PowerCenter Server places the workflow in a suspended state. While the workflow is suspended, you can correct the error that caused the session to fail.

After you correct the error, you can resume the workflow in recovery mode. When it resumes the workflow, the PowerCenter Server starts the failed session in recovery mode.

If the recovery session succeeds, the PowerCenter Server runs the rest of the workflow. If the recovery session fails, the PowerCenter Server suspends the workflow again.

Example

Suppose the workflow w_ItemOrders contains two sequential sessions. In this workflow, s_ItemSales is enabled for recovery, and the workflow is configured to suspend on error.

Figure 11-4 illustrates w_ItemOrders:

Figure 11-4. Resuming a Suspended Workflow with Sequential Sessions
Callouts: Session enabled for recovery. Workflow configured to suspend on error.

Suppose s_ItemSales fails, and the PowerCenter Server suspends the workflow. You correct the error and resume the workflow in recovery mode. The PowerCenter Server recovers the session successfully, and then runs s_UpdateOrders.

If s_UpdateOrders also fails, the PowerCenter Server suspends the workflow again. You correct the error, but you cannot resume the workflow in recovery mode because you did not enable the session for recovery. Instead, you resume the workflow. The PowerCenter Server starts s_UpdateOrders from the beginning, completes the session successfully, and then runs the StopWorkflow control task.

Recovering a Suspended Workflow with Concurrent Sessions

When a concurrent session enabled for recovery fails, the PowerCenter Server places the workflow in a suspending state while it completes any other concurrently running tasks. After concurrent tasks succeed or fail, the PowerCenter Server places the workflow in a suspended state. While the workflow is suspended, you can correct the error that caused the session to fail. If concurrent tasks failed, you can also correct those errors.

After you correct the error, you can resume the workflow in recovery mode. The PowerCenter Server runs the failed session in recovery mode. If multiple concurrent sessions failed, the PowerCenter Server starts all failed sessions enabled for recovery in recovery mode, and restarts other concurrent tasks or sessions not enabled for recovery.

After successful recovery or completion of all failed sessions and tasks, the PowerCenter Server completes the rest of the workflow. If a recovery session or task fails again, the PowerCenter Server suspends the workflow.

Example

Suppose you have the workflow w_ItemsDaily, containing three concurrent sessions, s_SupplierInfo, s_PromoItems, and s_ItemSales. In this workflow, s_SupplierInfo and s_PromoItems are enabled for recovery, and the workflow is configured to suspend on error.


Figure 11-5 illustrates w_ItemsDaily:

Suppose s_SupplierInfo fails while the PowerCenter Server is running the three sessions. The PowerCenter Server places the workflow in a suspending state and continues running the other two sessions. s_PromoItems and s_ItemSales also fail, and the PowerCenter Server then places the workflow in a suspended state.

You correct the errors that caused each session to fail and then resume the workflow in recovery mode. The PowerCenter Server starts s_SupplierInfo and s_PromoItems in recovery mode. Since s_ItemSales is not enabled for recovery, it restarts the session from the beginning. The PowerCenter Server runs the three sessions concurrently.

After all sessions succeed, the PowerCenter Server runs the Command task.

Steps for Recovering a Suspended WorkflowYou can use the Workflow Monitor to resume a workflow in recovery mode. If the workflow or session is currently scheduled, waiting, or disabled, the PowerCenter Server cannot run the session in recovery mode. You must stop or unschedule the workflow or stop the session.

To resume a workflow or worklet in recovery mode:

1. In the Navigator, select the suspended workflow you want to resume.

2. Choose Task-Resume/Recover.

The PowerCenter Server resumes the workflow.

You can also use pmcmd to resume a workflow in recovery mode. For more information, see “Using pmcmd” on page 581.

Figure 11-5. Resuming a Suspended Workflow with Concurrent Sessions

Sessions enabled for recovery.

Workflow configured to suspend on error.


Recovering a Failed Workflow

You can configure a session to fail the workflow if the session fails. If the session is also enabled for recovery, you can correct the error that caused the session to fail and recover the workflow from the failed session. When the PowerCenter Server recovers the workflow from the failed session, it runs the failed session in recovery mode. If the recovery session succeeds, the PowerCenter Server runs the rest of the workflow.

You can recover a workflow from a failed sequential or concurrent session. You might want to fail a workflow as a result of session failure if successive tasks in the workflow depend on the success of the previous sessions.

To configure a session to fail the workflow if the session fails, enable the Fail Parent If This Task Fails option on the General tab of the session properties. For more information, see “Working with Tasks” on page 131.

For steps on recovering a failed workflow, see “Steps for Recovering a Failed Workflow” on page 310.

Recovering a Failed Workflow with Sequential Sessions

When a sequential session that is enabled for recovery and configured to fail the workflow fails, the PowerCenter Server fails the workflow. You can correct the error that caused the session to fail and recover the workflow from the failed session. When the PowerCenter Server recovers the workflow from the session, it runs the session in recovery mode.

If the recovery session succeeds, the PowerCenter Server runs the rest of the workflow. If the recovery session fails, the PowerCenter Server fails the workflow again.

Example

Suppose the workflow w_ItemOrders contains two sequential sessions. s_ItemSales is enabled for recovery and also configured to fail the parent workflow if it fails.

Figure 11-6 illustrates w_ItemOrders:

Figure 11-6. Recovering Part of a Workflow With Sequential Sessions

Session enabled for recovery.

Sessions configured to fail workflow if either session fails.


Suppose s_ItemSales fails, and the PowerCenter Server fails the workflow. You correct the error and recover the workflow from s_ItemSales. The PowerCenter Server successfully recovers the session, and then runs the next task in the workflow, s_UpdateOrders.

Suppose s_UpdateOrders also fails, and the PowerCenter Server fails the workflow again. You correct the error, but you cannot recover the workflow from the session, because s_UpdateOrders is not enabled for recovery. Instead, you start the workflow from the session. The PowerCenter Server starts s_UpdateOrders from the beginning, completes the session successfully, and then runs the StopWorkflow control task.

Recovering a Failed Workflow with Concurrent Sessions

When a concurrent session that is enabled for recovery and configured to fail the workflow fails, the PowerCenter Server fails the workflow. You can then correct the error that caused the session to fail and recover the workflow from the failed session. When the PowerCenter Server recovers the workflow, it runs the session in recovery mode. If the recovery session succeeds, the PowerCenter Server runs successive tasks in the workflow in the same path as the session. The PowerCenter Server does not recover or restart concurrent tasks when you recover a workflow from a failed session.

If multiple concurrent sessions that are enabled for recovery and configured to fail the workflow fail, the PowerCenter Server fails the workflow when the first session fails. The other concurrent sessions continue to run until they succeed or fail. After all concurrent sessions complete, you can correct the errors that caused failures.

After you correct the errors, you can recover the workflow. If multiple sessions enabled for recovery fail, individually recover all but one failed session. You can then recover the workflow from the remaining failed session. This ensures that the PowerCenter Server recovers all concurrent failed sessions before it runs the rest of the workflow. For details on recovering a session individually, see “Recovering a Session Task” on page 311.

Example

Suppose the workflow w_ItemsDaily contains three concurrent sessions, s_SupplierInfo, s_PromoItems, and s_ItemSales. In this workflow, s_SupplierInfo and s_PromoItems are enabled for recovery, and each session is configured to fail the parent workflow if the session fails.


Figure 11-7. Recovering Part of a Workflow with Concurrent Sessions

Sessions enabled for recovery.

Sessions configured to fail parent workflow if the session fails.


Suppose s_SupplierInfo fails while the three concurrent sessions are running, and the PowerCenter Server fails the workflow. s_PromoItems and s_ItemSales also fail. You correct the errors that caused each session to fail.

In this case, you must combine two recovery methods to run all sessions before completing the workflow. You recover s_PromoItems individually. You cannot recover s_ItemSales because it is not enabled for recovery, so you start that session from the beginning instead. After the PowerCenter Server successfully completes s_PromoItems and s_ItemSales, you recover the workflow from s_SupplierInfo. The PowerCenter Server runs the session in recovery mode, and then runs the Command task.
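The ordering in this example follows a simple rule: handle every failed concurrent session except one individually, then recover the workflow from the one that remains. The sketch below models that rule in Python. The function name and the step labels ("recover task", "start task", "recover workflow from task") are hypothetical and chosen for illustration. Which recoverable session is kept for the final step is arbitrary; this sketch keeps the first one listed so that the output matches the example.

```python
def plan_failed_workflow_recovery(failed_sessions):
    """Order the manual steps for a failed workflow with concurrent sessions.

    failed_sessions maps each failed session name to True if the session
    is enabled for recovery and False otherwise.
    """
    recoverable = [name for name, enabled in failed_sessions.items() if enabled]
    restart_only = [name for name, enabled in failed_sessions.items() if not enabled]
    if not recoverable:
        raise ValueError("no failed session is enabled for recovery")

    # Keep one recoverable session back; the workflow is recovered from it last.
    resume_from, *recover_individually = recoverable

    steps = [("recover task", name) for name in recover_individually]
    steps += [("start task", name) for name in restart_only]
    steps.append(("recover workflow from task", resume_from))
    return steps


steps = plan_failed_workflow_recovery(
    {"s_SupplierInfo": True, "s_PromoItems": True, "s_ItemSales": False}
)
for action, session in steps:
    print(f"{action}: {session}")
```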

Steps for Recovering a Failed Workflow

You can use the Workflow Manager or Workflow Monitor to recover a failed workflow. If the workflow or session is currently scheduled, waiting, or disabled, the PowerCenter Server cannot run the session in recovery mode. You must stop or unschedule the workflow or stop the session.

To recover a failed workflow using the Workflow Manager:

1. Select the failed session in the Navigator or in the Workflow Designer workspace.

2. Right-click the failed session and choose Recover Workflow From Task.

The PowerCenter Server runs the failed session in recovery mode, and then runs the rest of the workflow.

To recover a failed workflow using the Workflow Monitor:

1. Select the failed session in the Navigator.

2. Right-click the session and choose Recover Workflow From Task.

or

Choose Task-Recover Workflow From Task.

The PowerCenter Server runs the session in recovery mode.

You can also use pmcmd to recover a failed workflow. For more information, see “Using pmcmd” on page 581.


Recovering a Session Task

If you do not configure the workflow to suspend on error, and you do not configure the workflow to fail if sessions or tasks fail, the PowerCenter Server completes the workflow even if it encounters errors. If a session fails, but other tasks in the workflow complete successfully, you may want to recover only the failed session. When the PowerCenter Server recovers a session, it runs the session in recovery mode.

You can recover sequential or concurrent sessions. For workflows with sequential sessions, individually recovering a session is useful if the rest of the workflow succeeded and you need to recover the failed session. This allows you to recover the session without restarting successful tasks.

For workflows with concurrent sessions, this method is useful if multiple concurrent sessions fail and also cause the workflow to fail. You can individually recover concurrent sessions and individually start subsequent tasks in the workflow paths until the paths converge at a single task.

In other complex, branched workflows, individually recovering multiple failed sessions allows you to specify the order in which the sessions run.

Recovering Sequential Sessions

When a sequential session enabled for recovery fails, and the workflow is not configured to suspend or fail on error, the PowerCenter Server continues to run the workflow. You can correct the error that caused the session to fail.

After you correct the error, you can individually recover the failed session. When the PowerCenter Server individually recovers a session, it runs the session in recovery mode. It does not run other tasks in the workflow.

Recovering Concurrent Sessions

When a concurrent session enabled for recovery fails, the PowerCenter Server continues to run the workflow. Other tasks and the workflow may succeed. You can correct the error that caused the session to fail. If concurrent tasks failed, you can also correct those errors. After you correct the errors, you can individually recover each session without running the rest of the workflow.

If multiple concurrent sessions that are enabled for recovery and configured to fail the workflow on session failure fail, the PowerCenter Server fails the workflow. You can correct the errors that caused the sessions to fail. After you correct the errors, you can individually recover each session. Once all concurrent tasks are recovered or complete, you can start the workflow from a task where the concurrent paths converge.


Example

Suppose the workflow w_ItemsDaily contains three concurrently running sessions. Each session is enabled for recovery and configured to fail the workflow if the session fails.


Suppose s_ItemSales fails and the PowerCenter Server fails the workflow. s_PromoItems and s_SupplierInfo also fail. You correct the errors that caused the sessions to fail.

After you correct the errors, you individually recover each failed session. The PowerCenter Server successfully recovers the sessions. The workflow paths after the sessions converge at the Command task, allowing you to start the workflow from the Command task and complete the workflow.

Alternatively, after you correct the errors, you could also individually recover two of the three failed sessions. After the PowerCenter Server successfully recovers the sessions, you can recover the workflow from the third session. The PowerCenter Server then recovers the third session and, on successful recovery, runs the rest of the workflow.

Steps for Recovering a Session Task

You can use the Workflow Manager or Workflow Monitor to recover a failed session in a workflow. If the workflow or session is currently scheduled, waiting, or disabled, the PowerCenter Server cannot run the session in recovery mode. You must stop or unschedule the workflow or stop the session.

To recover a failed session using the Workflow Manager:

1. Select the failed session in the Navigator or in the Workflow Designer workspace.

2. Right-click the failed session and choose Recover Task.


To recover a failed session using the Workflow Monitor:

1. Select the failed session in the Navigator.

Figure 11-8. Recovering Concurrent Sessions Individually

Sessions enabled for recovery.

Sessions configured to fail parent workflow if the session fails.


2. Right-click the session and choose Recover Task.

or

Choose Task-Recover Task.


You can also use pmcmd to recover a failed session. For more information, see “Using pmcmd” on page 581.


Server Handling for Recovery

The PowerCenter Server writes recovery data to relational target databases when you run a session enabled for recovery. If the session fails, the PowerCenter Server uses the recovery data to determine the point at which it continues to commit data during the recovery session.

Verifying Recovery Tables

The PowerCenter Server creates recovery information in cache files for all sessions enabled for recovery. It also creates recovery tables on the target database for relational targets during the initial session run.

If the session is enabled for recovery, the PowerCenter Server creates recovery information in cache files during the normal session run. The PowerCenter Server stores the cache files in the directory specified for $PMCacheDir. The PowerCenter Server generates file names in the format PMGMD_METADATA_*.dat. Do not alter these files or remove them from the PowerCenter Server cache directory. The PowerCenter Server cannot run the recovery session if you delete the recovery cache files.
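Because a recovery session cannot run without these files, it can be useful to confirm they are present before you start recovery. The following Python sketch lists files that match the documented name format in a given directory. It is a read-only check under stated assumptions: the function name is hypothetical, the directory must be supplied by you (the value of $PMCacheDir is a server variable and is not read here), and the demonstration uses a temporary directory with a made-up file name.

```python
import tempfile
from pathlib import Path

RECOVERY_CACHE_PATTERN = "PMGMD_METADATA_*.dat"  # file name format given in the text


def find_recovery_cache_files(cache_dir):
    """Return the recovery cache files found directly in cache_dir, sorted by name."""
    return sorted(Path(cache_dir).glob(RECOVERY_CACHE_PATTERN))


# Demonstration with a throwaway directory standing in for the cache directory.
with tempfile.TemporaryDirectory() as demo_dir:
    (Path(demo_dir) / "PMGMD_METADATA_1_2_3.dat").touch()  # made-up file name
    (Path(demo_dir) / "unrelated.dat").touch()
    found = [p.name for p in find_recovery_cache_files(demo_dir)]

if found:
    print("Recovery cache files:", found)
else:
    print("No recovery cache files found; a recovery session would fail.")
```

The check only reads the directory. It never modifies or removes the cache files.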

If the session writes to a relational database and is enabled for recovery, the PowerCenter Server also verifies the recovery tables on the target database for all relational targets at the beginning of a normal session run. If the tables do not exist, the PowerCenter Server creates them. If the database user name the PowerCenter Server uses to connect to the target database does not have permission to create the recovery tables, you must manually create them. For information about recovery table structure, see “Configuring the Target Database” on page 298.

During the session run, the PowerCenter Server writes target load information for normal load targets into the recovery tables. If the session fails, the PowerCenter Server uses this information to complete the session in recovery mode. If the session is configured to write to relational targets in bulk mode, the PowerCenter Server does not write recovery information to the recovery tables.

If the session completes successfully, the PowerCenter Server deletes all recovery cache files and removes recovery table entries that are related to the session. The PowerCenter Server initializes the information in the recovery tables at the beginning of the next session run.

The PowerCenter Server also uses the recovery cache files to store messages from real-time sources. For more information, see your PowerCenter Connect documentation.

Running Recovery

If a session enabled for recovery fails, you can run the session in recovery mode. The PowerCenter Server moves a recovery session through the states of a normal session: scheduled, waiting, running, succeeded, and failed. When the PowerCenter Server starts the recovery session, it runs all pre-session tasks.


For relational normal load targets, the PowerCenter Server performs incremental load recovery. It uses the recovery information created during the normal session run to determine the point at which the session stopped committing data to the target. It then continues writing data to the target. On successful recovery, the PowerCenter Server removes the recovery information from the tables.

For example, if the PowerCenter Server commits 10,000 rows before the session fails, when you run the session in recovery mode, the PowerCenter Server bypasses the rows up to 10,000 and starts loading with row 10,001.
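The arithmetic in this example can be sketched as follows. This is a simplified Python model, not the PowerCenter Server implementation: the function name is hypothetical, and the committed row count is passed in directly rather than read from the recovery tables. The sketch also assumes the source rows arrive in the same order as in the failed run.

```python
from itertools import islice


def rows_for_recovery(source_rows, committed_count):
    """Skip the rows already committed and yield the rows still to be loaded."""
    return islice(source_rows, committed_count, None)


# Source rows numbered 1 through 10,005; the failed run committed the first 10,000.
source_rows = range(1, 10_006)
remaining = list(rows_for_recovery(source_rows, 10_000))

print(remaining[0])    # first row loaded during recovery: 10001
print(len(remaining))  # rows left to load: 5
```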

If the session writes to a relational target in bulk mode, the PowerCenter Server performs the entire writer run. If the Truncate Target Table option is enabled in the session properties, the PowerCenter Server truncates the target before loading data.

If the session writes to a flat file or XML file, the PowerCenter Server performs full load recovery. It overwrites the existing output file and performs the entire writer run. If the session writes to heterogeneous targets, the PowerCenter Server performs incremental load recovery for all relational normal load targets and full load recovery for all other target types.
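The per-target rules above can be summarized as a small decision function. The following Python sketch is illustrative: the TargetType enumeration, the function name, and the target names are hypothetical. It treats a relational target in bulk mode as a full load, based on the statement that the PowerCenter Server performs the entire writer run for such targets.

```python
from enum import Enum


class TargetType(Enum):
    RELATIONAL = "relational"
    FLAT_FILE = "flat file"
    XML = "xml"


def recovery_type(target_type, bulk_mode=False):
    """Return "incremental" or "full" for one target of a recovery session."""
    if target_type is TargetType.RELATIONAL and not bulk_mode:
        return "incremental"
    # Bulk-mode relational targets, flat files, and XML files:
    # the entire writer run is performed again.
    return "full"


# A session with heterogeneous targets (target names are made up).
targets = {
    "T_ORDERS": (TargetType.RELATIONAL, False),
    "T_ORDERS_BULK": (TargetType.RELATIONAL, True),
    "orders.out": (TargetType.FLAT_FILE, False),
    "orders.xml": (TargetType.XML, False),
}
recovery_plan = {
    name: recovery_type(kind, bulk) for name, (kind, bulk) in targets.items()
}
for name, kind in recovery_plan.items():
    print(f"{name}: {kind} load recovery")
```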

On successful recovery, the PowerCenter Server deletes recovery cache files associated with the session. It also performs all post-session tasks.


Completing Unrecoverable Sessions

In some cases, you cannot perform recovery for a session. There may also be circumstances that cause a recovery session to fail or produce inconsistent data. If you cannot recover a session, you can run the session again.

You cannot run sessions in recovery mode under the following circumstances:

♦ You change the number of partitions. If you change the number of partitions after the session fails, the recovery session fails.

♦ Recovery table is empty or missing from the target database. The PowerCenter Server fails the recovery session under the following circumstances:

− You deleted the table after the PowerCenter Server created it.

− The session enabled for recovery succeeded, and the PowerCenter Server removed the recovery information from the table.

♦ Recovery cache file is missing. The PowerCenter Server fails the recovery session if the recovery cache file is missing from the PowerCenter Server cache directory.

♦ The PowerCenter Server performing recovery is on a different operating system. The operating system of the PowerCenter Server that runs the recovery session must be the same as the operating system of the PowerCenter Server that ran the failed session.

You might get inconsistent data if you perform recovery under the following circumstances:

♦ You change the partitioning configuration. If you change any partitioning options after the session fails, you may get inconsistent data.

♦ Source data is not sorted. To perform a successful recovery, the PowerCenter Server must process source rows during recovery in the same order it processes them during the initial session. Use the Sorted Ports option in the Source Qualifier transformation or add a Sorter transformation directly after the Source Qualifier transformation.

♦ The sources or targets change after the initial session failure. If you drop or create indexes, or edit data in the source or target tables before recovering a session, the PowerCenter Server may return missing or repeated rows.

♦ The session writes to a relational target in bulk mode, but the session is not configured to truncate the target table. The PowerCenter Server may load duplicate rows to the target during the recovery session.

♦ The mapping uses a Normalizer transformation. The Normalizer transformation generates source data in the form of primary keys. Recovering a session might generate different values than if the session completed successfully. However, the PowerCenter Server will continue to produce unique key values.

♦ The mapping uses a Sequence Generator transformation. The Sequence Generator transformation generates source data in the form of sequence values. Recovering a session might generate different values than if the session completed successfully.

If you want to ensure the same sequence data is generated during the recovery session, you can reset the value specified as the Current Value in the Sequence Generator transformation properties to the same value used when you ran the failed session. If you do not reset the Current Value, the PowerCenter Server will continue to generate unique Sequence values.

♦ The session performs incremental aggregation and the PowerCenter Server stops unexpectedly. If the PowerCenter Server stops unexpectedly while running an incremental aggregation session, the recovery session cannot use the incremental aggregation cache files. Rename the backup cache files for the session from PMAGG*.idx.bak and PMAGG*.dat.bak to PMAGG*.idx and PMAGG*.dat before you perform recovery.

♦ The PowerCenter Server data movement mode changes after the initial session failure. If you change the data movement mode before recovering the session, the PowerCenter Server might return incorrect data.

♦ The PowerCenter Server code page or source and target code pages change after the initial session failure. If you change the source, target, or PowerCenter Server code pages, the PowerCenter Server might return incorrect data. You can perform recovery if the new code pages are two-way compatible with the original code pages.

♦ The PowerCenter Server runs in Unicode mode and you change the session sort order. When the PowerCenter Server runs in Unicode mode, it sorts character data based on the sort order selected for the session. Do not perform recovery if you change the session sort order after the session fails.
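For the incremental aggregation item in the list above, the rename of the backup cache files can be scripted. The following Python sketch is one possible approach under stated assumptions: the function name and the dry_run parameter are hypothetical, the directory that holds the cache files must be supplied by you, and the demonstration uses a temporary directory with made-up file names. With dry_run left at its default, the function only reports the renames it would perform.

```python
import tempfile
from pathlib import Path

BACKUP_PATTERNS = ("PMAGG*.idx.bak", "PMAGG*.dat.bak")


def restore_aggregation_backups(cache_dir, dry_run=True):
    """Rename PMAGG*.idx.bak and PMAGG*.dat.bak files by dropping ".bak".

    Returns a list of (old_name, new_name) pairs. Files are renamed only
    when dry_run is False.
    """
    planned = []
    for pattern in BACKUP_PATTERNS:
        for backup in sorted(Path(cache_dir).glob(pattern)):
            restored = backup.with_suffix("")  # strips only the trailing ".bak"
            planned.append((backup.name, restored.name))
            if not dry_run:
                # replace() overwrites an existing file with the target name.
                backup.replace(restored)
    return planned


# Demonstration with a throwaway directory and made-up file names.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("PMAGG_demo.idx.bak", "PMAGG_demo.dat.bak"):
        (Path(demo_dir) / name).touch()
    renames = restore_aggregation_backups(demo_dir, dry_run=False)
    remaining = sorted(p.name for p in Path(demo_dir).iterdir())

for old, new in renames:
    print(f"{old} -> {new}")
```

Running the function once with dry_run=True and reviewing the reported pairs before renaming is the safer order, because the rename replaces any existing file that already has the target name.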



Chapter 12

Sending Email


♦ Overview, 320

♦ Configuring Email on UNIX, 321

♦ Configuring Email on Windows, 322

♦ Working with Email Tasks, 328

♦ Working with Post-Session Email, 332

♦ Working with Suspension Email, 339

♦ Using Email Tasks in a Workflow or Worklet, 341

♦ Tips, 342


Overview

You can send email to designated recipients when the PowerCenter Server runs a workflow. For example, if you want to track how long a session takes to complete, you can configure the session to send an email containing the time and date the session starts and completes. Or, if you want the PowerCenter Server to notify you when a workflow suspends, you can configure the workflow to send email when it suspends.

When you create a workflow or worklet, you can include the following types of email:

♦ Email task. You can include reusable and non-reusable Email tasks anywhere in the workflow or worklet. For more information, see “Using Email Tasks in a Workflow or Worklet” on page 341.

♦ Post-session email. You can configure the session so the PowerCenter Server sends an email when the session completes or fails. You create an Email task and use it for post-session email. For more information, see “Working with Post-Session Email” on page 332.

When you configure the subject and body of post-session email, you can use email variables to include information about the session run, such as session name, status, and the total number of records loaded. You can also use email variables to attach the session log or other files to email messages. For more information, see “Email Variables and Format Tags” on page 333.

♦ Suspension email. You can configure the workflow so the PowerCenter Server sends an email when the workflow suspends. You create an Email task and use it for suspension email. For more information, see “Working with Suspension Email” on page 339.

Before you can configure a session or workflow to send email, you need to create an Email task. For more information, see “Working with Email Tasks” on page 328.

The PowerCenter Server on Windows sends email in MIME format. This allows you to include characters in the subject and body that are not in 7-bit ASCII. For more information on the MIME format or the MIME decoding process, see your email documentation.

Before creating Email tasks, configure the PowerCenter Server to send email. For more information, see “Configuring Email on UNIX” on page 321 and “Configuring Email on Windows” on page 322.


Configuring Email on UNIX

The PowerCenter Server on UNIX uses rmail to send email. To send email, the repository user who starts the PowerCenter Server must have the rmail tool installed in the path.

If you want to send email to more than one person, separate the email address entries with a comma. Do not put spaces between addresses.

To verify the rmail tool is accessible on AIX:

1. Log on to the UNIX system as the Informatica user who starts the PowerCenter Server.

2. Type the following lines at the prompt and press Enter:

rmail <your fully qualified email address>,<second fully qualified email address>

From <your_user_name>

3. To indicate the end of the message, type ^D.

You should receive a blank email from the email account of the user you specify in the From line. If not, locate the directory where rmail resides and add that directory to the path.

To verify the rmail tool is accessible on all other UNIX machines:

1. Log on to the UNIX system as the Informatica user who starts the PowerCenter Server.

2. Type the following line at the prompt and press Enter:

rmail <your fully qualified email address>,<second fully qualified email address>

3. To indicate the end of the message, type . on a line of its own and press Enter. Or, type ^D.

You should receive a blank email from the email account of the Informatica user. If not, locate the directory where rmail resides and add that directory to the path.

Once you verify that rmail is installed correctly, you can send email. For more information on configuring email, see “Working with Email Tasks” on page 328.
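Before sending a test message, you can check whether rmail is on the path at all. The following Python sketch uses the standard library function shutil.which for that lookup. The find_rmail function name is hypothetical. For the result to be meaningful, run the check as the same user who starts the PowerCenter Server, because the path differs between users. The check does not send any email.

```python
import shutil


def find_rmail(search_path=None):
    """Return the full path of the rmail executable, or None if it is not found.

    search_path is a PATH-style string; when it is None, the current
    user's PATH environment variable is searched.
    """
    return shutil.which("rmail", path=search_path)


location = find_rmail()
if location:
    print(f"rmail found at {location}")
else:
    print("rmail is not on the path; add its directory to the path.")
```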


Configuring Email on Windows

The PowerCenter Server on Windows uses Microsoft Outlook to send email using the MAPI interface. You must meet the following requirements to send email on a PowerCenter Server on Windows:

♦ Install the Microsoft Outlook mail client on the PowerCenter Server machine.

♦ Run Microsoft Outlook on a Microsoft Exchange Server.

♦ Create a Windows user account that has Log on as a service rights and a Microsoft Outlook profile.

To configure the PowerCenter Server on Windows to send email, you must perform the following steps:

1. Verify the Informatica Service startup account.

2. Configure a Microsoft Outlook profile for the Informatica Service startup account.

3. Configure Logon network security.

4. Create distribution lists in the Personal Address Book in Microsoft Outlook.

5. Configure the PowerCenter Server to send email using the Microsoft Outlook profile you created in step 2.

Step 1. Verify the Informatica Service Startup Account

You must have an Informatica Service startup account, which grants a user the Log on as a service right to start the Informatica Service. Verify the Informatica Service startup account so that you can create a Microsoft Outlook profile for the user who has the Log on as a service right for that account.

For details on verifying service rights, see the Troubleshooting section of “Installing and Configuring the PowerCenter Server on Windows” in the Installation and Configuration Guide.

Step 2. Configure a Microsoft Outlook User

You must set up a Microsoft Outlook user for the Informatica Service startup account before configuring the PowerCenter Server to send email. The user profile must contain the following services:

♦ Microsoft Exchange Server

♦ Personal Address Book

Use the same log on name for both the Microsoft Outlook account you create and the user you grant Log on as a service rights in the Informatica Service startup account.

Note: If you do not already have a Microsoft Outlook mailbox for the Informatica Service startup account user, ask your network administrator to create one.


To configure a Microsoft Outlook user:

1. Open the Control Panel on the machine running the PowerCenter Server.

2. Double-click the Mail (or Mail and Fax) icon.

3. On the Services tab of the user Properties dialog box, click Show Profiles.

The Mail dialog box displays the list of profiles configured for the computer.

4. If you have a Microsoft Outlook profile set up for the Informatica Service startup account, skip to “Step 3. Configure Logon Network Security” on page 325. If you do not already have a Microsoft Outlook profile set up for the Informatica Service startup account, continue to the next step.

5. Click Add in the mail properties window.

The Microsoft Outlook Setup Wizard appears.


6. Select Use The Following Information Services and then select Microsoft Exchange Server. Click Next.

7. Enter a profile name. You can enter any name, but Informatica recommends that you enter a text string that matches the Informatica Service startup account. Click Next.


8. Enter the name of the Microsoft Exchange Server. Enter your mailbox name. Click Next.

9. Indicate whether you travel with your computer. Click Next.

10. Enter the path to your personal address book. Click Next.

11. Indicate whether you want to run Outlook when you start Windows. Click Next.

12. The Setup Wizard indicates that you have successfully configured an Outlook profile.

13. Click Finish.

Step 3. Configure Logon Network Security

You must configure the Logon Network Security before you run the Microsoft Exchange Server Service.

To configure Logon Network Security for the Microsoft Exchange Server:

1. Open the Control Panel on the machine running the PowerCenter Server.

2. Double-click the Mail (or Mail and Fax) icon. The User Properties sheet appears.


3. On the Services tab, select Microsoft Exchange Server and click Properties.

4. Click the Advanced tab. Set the Logon network security option to NT Password Authentication.

5. Click OK.

Step 4. Create Distribution Lists

When the PowerCenter Server runs on Windows, you can enter only one email address in the Workflow Manager. If you want to send email to multiple recipients, create a distribution list containing these addresses in the Personal Address Book in Microsoft Outlook. Enter the distribution list name as the recipient when configuring email.

For more information about working with your Personal Address Book, refer to Microsoft Outlook documentation.



Step 5. Configure the PowerCenter Server Setup

After you create the Microsoft Outlook profile, configure the PowerCenter Server to send email as that Microsoft Outlook user.

To configure the PowerCenter Server as a Microsoft Outlook user:

1. From the PowerCenter Server Setup, click the Configuration tab.

2. In the MS Exchange Profile field, enter the name of the Microsoft Outlook profile you created for the Informatica Service startup account.



Working with Email Tasks

The Workflow Manager provides an Email task that allows you to send email during a workflow. You can create reusable Email tasks in the Task Developer for any type of email. Or, you can create non-reusable Email tasks in the Workflow and Worklet Designer.

You can use Email tasks in any of the following locations:

♦ Session properties. You can configure the session to send email when the session completes or fails. For more information, see “Working with Post-Session Email” on page 332.

♦ Workflow properties. You can configure the workflow to send email when the workflow suspends. For more information, see “Working with Suspension Email” on page 339.

♦ Workflow or worklet. You can include an Email task anywhere in the workflow or worklet to send email based on a condition you define. For more information, see “Using Email Tasks in a Workflow or Worklet” on page 341.

Figure 12-1 shows the Edit Tasks dialog box for an Email task in the Task Developer:

Figure 12-1. Email Task

Email Address Tips and Guidelines

Consider the following tips and guidelines when you enter the email address in an Email task:

♦ Enter the email address using 7-bit ASCII characters only.

♦ You can enter either the $PMSuccessEmailUser or $PMFailureEmailUser server variable for post-session email. For more information, see “Using Server Variables” on page 333.


♦ If the PowerCenter Server runs on Windows, you can enter a Microsoft Exchange Profile name. The mail recipient must have an entry in the Global Address book of the Microsoft Outlook profile.

♦ If the PowerCenter Server runs on Windows, you can send email to multiple recipients by creating a distribution list in your Personal Address book. All recipients must also be in the Global Address book. You cannot enter multiple addresses separated by commas or semi-colons.

♦ If the PowerCenter Server runs on UNIX, you can enter multiple email addresses separated by a comma. Do not include spaces between email addresses.
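Several of these guidelines can be checked before you type a value into the Email User Name field. The following Python sketch is a hypothetical helper, not a PowerCenter feature: the function name and the "unix" and "windows" labels are assumptions made for illustration. It rejects non-ASCII addresses, joins multiple addresses with commas and no spaces for a PowerCenter Server on UNIX, and refuses more than one entry for a PowerCenter Server on Windows, where a distribution list name is used instead.

```python
def format_recipients(addresses, server_platform):
    """Build the recipient value for an Email task from a list of addresses.

    server_platform is "unix" or "windows" (labels chosen for this sketch).
    """
    cleaned = [a.strip() for a in addresses if a.strip()]
    if not cleaned:
        raise ValueError("at least one recipient is required")
    for address in cleaned:
        if not address.isascii():
            raise ValueError(f"address is not 7-bit ASCII: {address!r}")

    if server_platform == "unix":
        # Comma-separated, with no spaces between addresses.
        return ",".join(cleaned)
    if server_platform == "windows":
        if len(cleaned) > 1:
            raise ValueError(
                "enter one address or a distribution list name on Windows"
            )
        return cleaned[0]
    raise ValueError(f"unknown platform: {server_platform!r}")


unix_value = format_recipients(["admin@example.com", " ops@example.com "], "unix")
print(unix_value)
```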

Steps to Create an Email Task

You can create Email tasks in the Task Developer, Worklet Designer, and Workflow Designer.

Use the following steps to create an Email task.

To create an Email task in the Task Developer:

1. In the Task Developer, choose Tasks-Create. The Create Task dialog box appears.

2. Select an Email task and enter a name for the task. Click Create.

The Workflow Manager creates an Email task in the workspace.

3. Click Done.


4. Double-click the Email task in the workspace. The Edit Tasks dialog box appears.

5. Click Rename to enter a name for the task.

6. You can optionally enter a description for the task in the Description field.

7. Click the Properties tab.

8. Enter the fully qualified email address of the mail recipient in the Email User Name field.

For more information on entering the email address, see “Email Address Tips and Guidelines” on page 328.



9. Enter the subject of the email in the Email Subject field. Or, you can leave this field blank.

10. Click the Open button in the Email Text field to open the Email Editor.

11. Enter the text of the email message in the Email Editor.

When you use the Email task, you can incorporate format tags in your message. For more information, see “Email Variables and Format Tags” on page 333.

You can leave the Email Text field blank.

12. Click OK twice to save your changes.


Working with Post-Session Email

You can configure a session so the PowerCenter Server sends email to someone when it fails or completes a session. You can create two Email tasks, one the PowerCenter Server sends if it completes the session, and the other if it fails the session.

The PowerCenter Server sends post-session email at the end of a session, after executing post-session shell commands or stored procedures. When the PowerCenter Server encounters an error sending the email, it writes a message to the server or event log. It does not fail the session.

The Workflow Manager includes the following session properties to send post-session email:

♦ On-Success Email

♦ On-Failure Email

Figure 12-2 shows the On-Success and On-Failure email properties on the Components tab of the session properties:

You can specify a reusable Email task you create in the Task Developer for either success email or failure email. Or, you can create a non-reusable Email task for each session property. When you create a non-reusable Email task for the session property, you create the Email task for that session only. You cannot use the Email task in the workflow or worklet.

Figure 12-2. Post-Session Email Properties

Select a reusable Email task.

Edit the non-reusable Email task.

Use a reusable Email task.

Use a non-reusable Email task.


You cannot specify a non-reusable Email task you create in the Workflow or Worklet Designer for post-session email.

Tip: When you configure an Email task for post-session email, use the email server variables, $PMSuccessEmailUser or $PMFailureEmailUser, for the email recipient. Verify you specify the values of the server variables for the PowerCenter Server that runs the session.

Using Server Variables

You can use server variables to address post-session email. When you register the PowerCenter Server, you can configure its server variables. You can use the following server variables for sending post-session email:

♦ $PMSuccessEmailUser. Email address of the user to receive email when the session completes successfully. Use this variable for the Email User Name for success email only. The PowerCenter Server does not expand this variable when you use it for any other email type.

♦ $PMFailureEmailUser. Email address of the user to receive email when the session fails to complete. Use this variable for the Email User Name for failure email only. The PowerCenter Server does not expand this variable when you use it for any other email type.

When you use one of these server variables, the PowerCenter Server sends email to the address configured for the server variable.

You might use this functionality when you have an administrator who troubleshoots all failed sessions. Instead of entering the administrator email address for each session, you can use the email variable $PMFailureEmailUser. If the administrator changes, you can correct all sessions by editing the $PMFailureEmailUser server variable, instead of editing the email address in each session.

You might also use this functionality when you have different administrators for different PowerCenter Servers. If you deploy a folder from one repository to another or otherwise change the PowerCenter Server that runs the session, the new server automatically sends email to users associated with the new server when you use server variables instead of hard-coded email addresses.

Note: $PMSuccessEmailUser and $PMFailureEmailUser are optional server variables. Verify you define a variable before using it to address email.
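The following Python sketch models the behavior described above: each server variable is expanded only for its own email type, and the address comes from the server that runs the session. The function resolve_recipient, the dictionaries, and the addresses are hypothetical. Raising an error for an undefined variable is an assumption made for illustration, because this guide does not state what the PowerCenter Server does in that case.

```python
def resolve_recipient(email_user, email_type, server_variables):
    """Return the address used for a post-session email.

    email_type is "success" or "failure". server_variables maps variable
    names to addresses for the server that runs the session.
    """
    allowed = {
        "$PMSuccessEmailUser": "success",
        "$PMFailureEmailUser": "failure",
    }
    if email_user not in allowed:
        return email_user  # hard-coded address, used as entered
    if allowed[email_user] != email_type:
        # The variable is not expanded for any other email type.
        return email_user
    if email_user not in server_variables:
        # Assumption: treat an undefined optional variable as an error.
        raise KeyError(f"{email_user} is not defined for this server")
    return server_variables[email_user]


development = {"$PMFailureEmailUser": "dev-admin@example.com"}
production = {"$PMFailureEmailUser": "prod-admin@example.com"}

# The same session setting resolves differently on each server.
print(resolve_recipient("$PMFailureEmailUser", "failure", development))
print(resolve_recipient("$PMFailureEmailUser", "failure", production))
```

Because the session stores only the variable name, moving the session to another server changes the recipient without editing the session.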

Email Variables and Format Tags

You can use email variables and format tags in an email message for post-session emails. You can use some email variables in the subject of the email. With email variables, you can include important session information in the email, such as the number of rows loaded, the session completion time, or read and write statistics. You can also attach the session log or other relevant files to the email. Use format tags in the body of the message to make the message easier to read.


Note: The PowerCenter Server does not limit the type or size of attached files. However, since large attachments can cause problems with your email system, avoid attaching excessively large files, such as session logs generated using verbose tracing. The PowerCenter Server generates an error message in the email if an error occurs attaching the file.

Table 12-1 describes the email variables you can use in a post-session email:

Table 12-2 lists the format tags you can use in an Email task:

Configuring Post-Session Email

You can configure post-session email to use a reusable or non-reusable Email task.

Table 12-1. Email Variables for Post-Session Email

Email Variable Description

%s Session name.

%e Session status.

%b Session start time.

%c Session completion time.

%i Session elapsed time (session completion time minus session start time).

%l Total rows loaded.

%r Total rows rejected.

%t Source and target table details, including read throughput in bytes per second and write throughput in rows per second. The PowerCenter Server includes all information displayed in the session detail dialog box.

%m Name of the mapping used in the session.

%n Name of the folder containing the session.

%d Name of the repository containing the session.

%g Attach the session log to the message.

%a<filename> Attach the named file. The file must be local to the PowerCenter Server. The following are valid file names: %a<c:\data\sales.txt> or %a</users/john/data/sales.txt>. Note: The file name cannot include the greater than character (>) or a line break.

Note: The PowerCenter Server ignores %a, %g, or %t when you include them in the email subject. Include these variables in the email message only.

Table 12-2. Format Tags for Email Tasks

Formatting Format Tag

tab \t

new line \n


Using a Reusable Email Task

Use the following steps to configure post-session email to use a reusable Email task.

To configure post-session email to use a reusable Email task:

1. Open the session properties and click the Components tab.

2. Select Reusable in the Type column for the success email or failure email field.

3. Click the Open button in the Value column to select the reusable Email task.


4. Select the Email task in the Object Browser dialog box and click OK.

5. You can optionally edit the Email task for this session property by clicking the Edit button in the Value column.

If you edit the Email task for either success email or failure email, the edits only apply to this session.

6. Click OK to close the session properties.

Using a Non-Reusable Email Task

Follow these steps to configure success email or failure email to use a non-reusable Email task.

To configure success email or failure email to use a non-reusable Email task:

1. Open the session properties and click the Components tab.

2. Select Non-Reusable in the Type column for the success email or failure email field.


3. Open the email editor using the Open button.

4. Edit the Email task and click OK. For more information on editing Email tasks, see “Working with Email Tasks” on page 328.

5. Click OK to close the session properties.

Sample Email

The following is user-entered text from a sample post-session email configuration using variables:

Session complete.

Session name: %s

%l

%r

%e

%b

%c

%i

%g

The following is sample output from the configuration above:

Session complete.

Session name: sInstrTest

Total Rows Loaded = 1

Total Rows Rejected = 0

Completed


Start Time: Tue Nov 17 12:26:31 2003

Completion Time: Tue Nov 17 12:26:41 2003

Elapsed time: 0:00:10 (h:m:s)
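The substitution shown in the sample can be pictured with a short Python sketch. It is an illustration only and not how the PowerCenter Server is implemented. The function expand_email_text and the values dictionary are hypothetical, and the sketch does not attach files for %g or %a. Variables with no value are left as text, which also mirrors what happens when you use session variables in an Email task outside a session.

```python
import re


def expand_email_text(text, values):
    """Replace %<letter> variables with values; leave unknown ones as text."""
    def substitute(match):
        letter = match.group(1)
        return values.get(letter, match.group(0))

    return re.sub(r"%([a-z])", substitute, text)


# Hypothetical values taken from the sample output above.
values = {
    "s": "sInstrTest",
    "l": "Total Rows Loaded = 1",
    "r": "Total Rows Rejected = 0",
    "e": "Completed",
}

print(expand_email_text("Session name: %s\n%l\n%r\n%e", values))
```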


Working with Suspension Email

You can configure a workflow to send email when the PowerCenter Server suspends the workflow. For example, when a task fails, the PowerCenter Server suspends the workflow and sends the suspension email. You can fix the error and resume the workflow.

If another task fails while the PowerCenter Server is suspending the workflow, you do not get the suspension email again. However, the PowerCenter Server sends another suspension email if another task fails after you resume the workflow.

For more information, see “Suspending the Workflow” on page 127.
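The rule for repeated failures can be summarized as a small state model. The Python class below is a hypothetical sketch of that rule, not PowerCenter code: one suspension email per suspension, and a new email only after you resume the workflow and another task fails.

```python
class SuspensionNotifier:
    """Counts suspension emails according to the rule described above."""

    def __init__(self):
        self.suspended = False
        self.emails_sent = 0

    def task_failed(self):
        if not self.suspended:
            # First failure: suspend the workflow and send one email.
            self.suspended = True
            self.emails_sent += 1
        # Further failures while suspended send no additional email.

    def resume(self):
        self.suspended = False


workflow = SuspensionNotifier()
workflow.task_failed()  # suspension email sent
workflow.task_failed()  # already suspended, no email
workflow.resume()
workflow.task_failed()  # suspended again, second email sent
print(workflow.emails_sent)  # prints 2
```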

Configure suspension email on the General tab of the workflow properties.

Figure 12-3 shows the Suspension Email workflow options:

To configure suspension email:


2. Choose Workflows-Edit to open the workflow properties.

3. On the General tab, select Suspend on Error.

Figure 12-3. Suspension Email

Select a reusable Email task.

Remove the reusable Email task.

Select Suspend On Error.


4. Click the Browse Emails button to select a reusable Email task.

Note: The Workflow Manager returns an error message if you do not have any reusable Email tasks in the folder. Create a reusable Email task in the folder before you configure suspension email.

5. Choose a reusable Email task and click OK.

6. Click OK to close the workflow properties.


Using Email Tasks in a Workflow or Worklet

You can use Email tasks anywhere in a workflow or worklet. For example, you can include an Email task in a workflow after a Command task that executes a shell script. You can configure the links in the workflow or worklet so the PowerCenter Server sends you email if the Command task fails.

You might want the PowerCenter Server to generate a report during a workflow and email the report to you after generating it.

Note: When you use an Email task outside of a Session task, the PowerCenter Server reads variables related to the session as text. For example, if you use the variable %s in an Email task in the workflow, the PowerCenter Server cannot provide a session name, as it is not within a session.

Figure 12-4 shows a workflow that performs this operation:

Configure the gen_report Command task to execute a shell script that generates the report. Verify the shell script saves the report to a directory local to the PowerCenter Server. Configure the em_report Email task to attach the file generated from the shell script.
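To attach the generated report, the email text of em_report needs a %a<filename> variable. The Python sketch below builds that variable and enforces the file name restriction stated in Table 12-1 (no greater than character and no line break). The function attachment_variable is hypothetical, and the path is the example path from Table 12-1.

```python
def attachment_variable(path):
    """Build a %a<filename> email variable for a file local to the server."""
    if ">" in path or "\n" in path or "\r" in path:
        raise ValueError("file name cannot contain '>' or a line break")
    return f"%a<{path}>"


print(attachment_variable("/users/john/data/sales.txt"))
# prints %a</users/john/data/sales.txt>
```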

Figure 12-4. Email Task in a Workflow


Tips

The following suggestions can extend the capabilities of Email tasks.

Create a generic user for sending email.

Often there are multiple users who can start sessions on a PowerCenter Server. If you want to avoid entering the Microsoft Outlook profile each time the PowerCenter user changes, create a generic Microsoft Outlook profile, such as “PowerCenter,” then grant each PowerCenter user rights to send mail through this profile.

Use server variables to address post-session emails.

When the server variables $PMSuccessEmailUser and $PMFailureEmailUser are configured for the PowerCenter Server, use them to address post-session emails. This allows you to change the recipient of post-session emails for all sessions the server runs by editing the server variables. It can also make deploying sessions into production easier when the variables are defined for both development and production servers.

Generate and send post-session reports.

You can use a post-session success command to generate a report file and attach that file to a success email. For example, you create a batch file called Q3rpt.bat that generates a sales report, and you are running Microsoft Outlook on Windows.

Figure 12-5 shows how you can configure the post-session success command to generate a report:

Figure 12-5. Using Post-Session Commands to Generate Reports


Figure 12-6 shows how you can configure success email to attach a report file:

Use other mail programs.

If you do not have Microsoft Outlook, you can use a post-session success command to invoke a command line email program, such as WindMail. In this case, you do not have to enter the email user name or subject, since your recipients, email subject, and body text will be contained in the batch file, sendmail.bat.

Figure 12-7 shows how you can configure the post-session success command to invoke a command line email program:

Figure 12-6. Using Email Variables to Attach Reports

Figure 12-7. Sending Email without Microsoft Outlook

Use email variable %a to attach the report.



Chapter 13

Pipeline Partitioning

This chapter covers the following subjects:

♦ Overview, 346

♦ Configuring Partitioning Information, 351

♦ Cache Partitioning, 359

♦ Round-Robin Partition Type, 360

♦ Hash Keys Partition Types, 361

♦ Key Range Partition Type, 363

♦ Pass-Through Partition Type, 367

♦ Database Partitioning Partition Type, 369

♦ Partitioning Relational Sources, 371

♦ Partitioning File Sources, 374

♦ Partitioning Relational Targets, 378

♦ Partitioning File Targets, 380

♦ Partitioning Joiner Transformations, 384

♦ Partitioning Lookup Transformations, 391

♦ Partitioning Sorter Transformations, 392

♦ Mapping Variables in Partitioned Pipelines, 394

♦ Partitioning Rules, 395


Overview

You create a session for each mapping you want the PowerCenter Server to run. Every mapping contains one or more source pipelines. A source pipeline consists of a source qualifier and all the transformations and targets that receive data from that source qualifier.

If you purchase the Partitioning option, you can specify partitioning information for each source pipeline in a mapping. The partitioning information for a pipeline controls the following factors:

♦ The number of reader, transformation, and writer threads that the master thread creates for the pipeline. For more information, see “Understanding Processing Threads” on page 14.

♦ How the PowerCenter Server reads data from the source, including the number of connections to the source.

♦ How the PowerCenter Server distributes rows of data to each transformation as it processes the pipeline.

♦ How the PowerCenter Server writes data to the target, including the number of connections to each target in the pipeline.

You can specify partitioning information for a pipeline by setting the following attributes:

♦ Location of partition points. Partition points mark the thread boundaries in a pipeline and divide the pipeline into stages. The PowerCenter Server sets partition points at several transformations in a pipeline by default. If you have the Partitioning option, you can define other partition points. When you add partition points, you increase the number of transformation threads, which can improve session performance. The PowerCenter Server can redistribute rows of data at partition points, which can also improve session performance. For more information on partition points, see “Partition Points” on page 346.

♦ Number of partitions. A partition is a pipeline stage that executes in a single thread. If you purchase the Partitioning option, you can set the number of partitions at any partition point. When you add partitions, you increase the number of processing threads, which can improve session performance. For more information, see “Number of Partitions” on page 348.

♦ Partition types. The PowerCenter Server specifies a default partition type at each partition point. If you purchase the Partitioning option, you can change the partition type. The partition type controls how the PowerCenter Server redistributes data among partitions at partition points. For more information, see “Partition Types” on page 348.

Partition Points

By default, the PowerCenter Server sets partition points at various transformations in the pipeline. Partition points mark thread boundaries as well as divide the pipeline into stages. A stage is a section of a pipeline between any two partition points. When you set a partition point at a transformation, the new pipeline stage includes that transformation.


Table 13-1 lists the partition points that the Workflow Manager creates by default:

If you purchase the Partitioning option, you can add partition points at other transformations and delete some partition points.

Figure 13-1 shows the default partition points and pipeline stages for a simple mapping with one source pipeline:

The mapping in Figure 13-1 contains four stages. The partition point at the source qualifier marks the boundary between the first (reader) and second (transformation) stages. The partition point at the Aggregator transformation marks the boundary between the second and third (transformation) stages. The partition point at the target instance marks the boundary between the third (transformation) and fourth (writer) stages.

When you add a partition point, you increase the number of pipeline stages by one. Similarly, when you delete a partition point, you reduce the number of stages by one. For more information, see “Understanding Processing Threads” on page 14.

Besides marking stage boundaries, partition points also mark the points in the pipeline where the PowerCenter Server can redistribute data across partitions. For example, if you place a partition point at a Filter transformation and define multiple partitions, the PowerCenter Server can redistribute rows of data among the partitions before the Filter transformation processes the data. The partition type you set at this partition point controls the way in which the PowerCenter Server passes rows of data to each partition. For more information, see “Partition Types” on page 348.

For more information on adding and deleting partition points, see “Adding and Deleting Partition Points” on page 353.

Table 13-1. Default Partition Points

Transformation (Partition Point) | Default Partition Type | Description

Source Qualifier or Normalizer transformation | Pass-through | Controls how the PowerCenter Server reads data from the source and passes data into the source qualifier.

Rank and unsorted Aggregator transformations | Hash auto-keys | Ensures that the PowerCenter Server groups rows properly before it sends them to the transformation.

Target instances | Pass-through | Controls how the target instances pass data to the targets.

Figure 13-1. Default Partition Points and Stages in a Sample Mapping

Stage labels in the figure: First Stage, Second Stage, Third Stage, Fourth Stage.


Number of Partitions

A partition is a pipeline stage that executes in a single reader, transformation, or writer thread. By default, the PowerCenter Server defines a single partition in the source pipeline. If you purchase the Partitioning option, you can increase the number of partitions. This increases the number of processing threads, which can improve session performance.

For example, you need to use the mapping in Figure 13-1 to extract data from three flat files of various sizes. To do this, you define three partitions at the source qualifier to read the data simultaneously. When you do this, the Workflow Manager defines three partitions in the pipeline.

Figure 13-2 shows the threads that the master thread creates for this mapping:

By default, the PowerCenter Server sets the number of partitions to one. You can generally define up to 64 partitions at any partition point. However, there are situations in which you can define only one partition in the pipeline. For more information, see “Restrictions on the Number of Partitions” on page 395.

Note: Increasing the number of partitions or partition points increases the number of threads. Therefore, increasing the number of partitions or partition points also increases the load on the server machine. If the server machine contains ample CPU bandwidth, processing rows of data in a session concurrently can increase session performance. However, if you create a large number of partitions or partition points in a session that processes large amounts of data, you can overload the system.
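The relationship between partition points, stages, and threads can be worked out with simple arithmetic. The Python sketch below assumes a single source pipeline in which the first stage is the reader stage, the last stage is the writer stage, and every stage in between is a transformation stage, as in Figures 13-1 and 13-2. The function pipeline_threads is hypothetical. It does not count the master thread and does not model the cases where only one partition is allowed.

```python
def pipeline_threads(partition_points, partitions):
    """Return (stages, reader, transformation, writer) thread counts."""
    if partition_points < 2:
        raise ValueError("expected at least the source qualifier and target partition points")
    if not 1 <= partitions <= 64:
        raise ValueError("the number of partitions must be between 1 and 64")
    stages = partition_points + 1          # each partition point adds one stage
    reader = partitions                    # first stage
    writer = partitions                    # last stage
    transformation = (stages - 2) * partitions
    return stages, reader, transformation, writer


# Three default partition points and three partitions, as in the sample mapping.
print(pipeline_threads(3, 3))  # prints (4, 3, 6, 3)
```

Under these assumptions, adding one partition point to this mapping adds three more transformation threads, which shows how quickly partitions and partition points together increase the load on the server machine.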

For more information on adding and deleting partitions, see “Adding and Deleting Partitions” on page 356.

Partition Types

When you configure the partitioning information for a pipeline, you must specify a partition type at each partition point in the pipeline. The partition type determines how the PowerCenter Server redistributes data across partition points.

Figure 13-2. Threads Created for a Sample Mapping with Three Partitions

The figure shows the threads for partitions 1, 2, and 3 across the first through fourth stages: 3 reader threads, 6 transformation threads, and 3 writer threads.


The Workflow Manager allows you to specify the following partition types:

♦ Round-robin. The PowerCenter Server distributes data evenly among all partitions. Use round-robin partitioning where you want each partition to process approximately the same number of rows. For more information, see “Round-Robin Partition Type” on page 360.

♦ Hash. The PowerCenter Server applies a hash function to a partition key to group data among partitions. If you select hash auto-keys, the PowerCenter Server uses all grouped or sorted ports as the partition key. If you select hash user keys, you specify a number of ports to form the partition key. Use hash partitioning where you want to ensure that the PowerCenter Server processes groups of rows with the same partition key in the same partition. For more information, see “Hash Keys Partition Types” on page 361.

♦ Key range. You specify one or more ports to form a compound partition key. The PowerCenter Server passes data to each partition depending on the ranges you specify for each port. Use key range partitioning where the sources or targets in the pipeline are partitioned by key range. For more information, see “Key Range Partition Type” on page 363.

♦ Pass-through. The PowerCenter Server passes all rows at one partition point to the next partition point without redistributing them. Choose pass-through partitioning where you want to create an additional pipeline stage to improve performance, but do not want to change the distribution of data across partitions. For more information, see “Pass-Through Partition Type” on page 367.

♦ Database partitioning. The PowerCenter Server queries the IBM DB2 system for table partition information and loads partitioned data to the corresponding nodes in the target database. Use database partitioning with IBM DB2 targets stored on a multi-node tablespace. For more information, see “Database Partitioning Partition Type” on page 369.

You can specify different partition types at different points in the pipeline.
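The following Python sketch illustrates how round-robin, hash, and key range partitioning distribute rows. It is a simplified model, not the PowerCenter Server implementation: the CRC32 hash, the half-open key ranges, and the error for a row outside every range are all assumptions made for illustration. Pass-through partitioning needs no function because it leaves each row in the partition it is already in, and database partitioning is not modeled.

```python
import zlib


def round_robin(rows, n):
    """Distribute rows evenly: row i goes to partition i mod n."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def hash_keys(rows, key, n):
    """Send rows with the same key value to the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        # Assumption: CRC32 stands in for the server's hash function.
        digest = zlib.crc32(str(row[key]).encode("utf-8"))
        parts[digest % n].append(row)
    return parts


def key_range(rows, key, ranges):
    """Route rows by (start, end) ranges, one range per partition.

    Assumption: start is inclusive and end is exclusive.
    """
    parts = [[] for _ in ranges]
    for row in rows:
        for i, (start, end) in enumerate(ranges):
            if start <= row[key] < end:
                parts[i].append(row)
                break
        else:
            raise ValueError(f"no key range contains {row[key]!r}")
    return parts


items = [
    {"id": 1, "description": "bolt"},
    {"id": 2, "description": "nut"},
    {"id": 3, "description": "bolt"},
    {"id": 4, "description": "washer"},
]
print([len(p) for p in round_robin(items, 3)])  # prints [2, 1, 1]
print(hash_keys(items, "description", 3))
print(key_range(items, "id", [(1, 3), (3, 5)]))
```

In the hash example, both "bolt" rows always land in the same partition, which is the property that grouping transformations such as the Aggregator depend on.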

Figure 13-3 shows a mapping where you can specify different partition types to increase session performance:

The mapping in Figure 13-3 reads data about items and calculates average wholesale costs and prices. The mapping must read item information from three flat files of various sizes, and then filter out discontinued items. It sorts the active items by description, calculates the average prices and wholesale costs, and writes the results to a relational database in which the target tables are partitioned by key range.

When you use this mapping in a session, you can increase session performance by specifying different partition types at the following partition points in the pipeline:

♦ Source qualifier. To read data from the three flat files concurrently, you must specify three partitions at the source qualifier. Accept the default partition type, pass-through.

Figure 13-3. Sample Mapping


♦ Filter transformation. Since the source files vary in size, each partition processes a different amount of data. Set a partition point at the Filter transformation, and choose round-robin partitioning to balance the load going into the Filter transformation.

♦ Sorter transformation. To eliminate overlapping groups in the Sorter and Aggregator transformations, use hash auto-keys partitioning at the Sorter transformation. This causes the PowerCenter Server to group all items with the same description into the same partition before the Sorter and Aggregator transformations process the rows. You can delete the default partition point at the Aggregator transformation.

♦ Target. Since the target tables are partitioned by key range, specify key range partitioning at the target to optimize writing data to the target.

For more information on specifying partition types, see “Specifying Partition Types” on page 356.


Configuring Partitioning Information

When you create or edit a session, you can change the partitioning information for each pipeline in a mapping. If the mapping contains multiple pipelines, you can specify multiple partitions in some pipelines and single partitions in others. You update partitioning information using the Partitions view on the Mapping tab in the session properties.

You can configure the following information in the Partitions view on the Mapping tab:

♦ Add and delete partition points.

♦ Enter a description for each partition.

♦ Specify the partition type at each partition point.

♦ Add a partition key and key ranges for certain partition types.

Figure 13-4 shows the configuration options on the Partitions view on the Mapping tab:

Figure 13-4. Session Properties Partitions View on the Mapping Tab

Selected Partition Point

Add a partition point.

Delete a partition point.

Edit Keys

Specify key ranges.

Partitioning Workspace

Edit the selected partition point.

Click to display Partitions view.


Table 13-2 describes the configuration options for the Partitions view on the Mapping tab:

You can configure the following information when you edit or add a partition point:

♦ Specify the partition type at the partition point.

♦ Add and delete partitions.


Figure 13-5 shows the configuration options in the Edit Partition Point dialog box:

Table 13-2. Options on Session Properties Partitions View on the Mapping Tab

Partitions View Option Description

Add Partition Point Click to add a new partition point in the mapping. When you add a partition point, the transformation name appears under the Partition Points node.

Delete Partition Point Click to delete the selected partition point. You cannot delete certain partition points. For details, see “Adding and Deleting Partition Points” on page 353.

Edit Partition Point Click to edit the selected partition point. This opens the Edit Partition Point dialog box. For more information on the options in this dialog box, see Table 13-3 on page 353.

Key Range Displays the key and key ranges for the partition point, depending on the partition type. For key range partitioning, you specify the key ranges. For hash user keys partitioning, this field displays the partition key. The Workflow Manager does not display this area for other partition types.

Edit Keys Click to add or remove the partition key for key range or hash user keys partitioning. You cannot create a partition key for hash auto-keys, round-robin, or pass-through partitioning.

Figure 13-5. Edit Partition Point Dialog Box

Selected Partition Point

Add a partition.

Select a partition.

Delete a partition.

Specify the partition type.

Enter the partition description.


Table 13-3 describes the configuration options in the Edit Partition Point dialog box:

Adding and Deleting Partition Points

When you create a session, the Workflow Manager creates one partition point at the following transformations in the pipeline:

♦ Source Qualifier or Normalizer. This partition point controls how the PowerCenter Server extracts data from the source and passes it to the source qualifier. You cannot delete this partition point.

♦ Rank and unsorted Aggregator transformations. These partition points ensure that the PowerCenter Server groups rows properly before it sends them to the transformation. You can delete these partition points if the pipeline contains only one partition or if the PowerCenter Server passes all rows in a group to a single partition before they enter the transformation.

For example, in the mapping in Figure 13-3 on page 349, you can delete the default partition point at the Aggregator transformation because hash auto-keys partitioning at the Sorter transformation sends all rows that contain items with the same description to the same partition. Therefore, the Aggregator transformation receives data for all items with the same description in one partition and can calculate the average costs and prices for this item correctly.

♦ Target instances. This partition point controls how the writer passes data to the targets. You cannot delete this partition point.

Rules for Adding and Deleting Partition Points

You can add and delete partition points at other transformations in the pipeline according to the following rules:

♦ You cannot create partition points at source instances.

♦ You cannot create partition points at Sequence Generator transformations or unconnected transformations.

Table 13-3. Edit Partition Point Dialog Box Options

Partition Options Description

Select Partition Type Changes the partition type.

Partition Names Selects individual partitions from this dialog box to configure.

Add a Partition Adds a partition. You can add up to 64 partitions at any partition point. The number of partitions must be consistent across the pipeline. Therefore, if you define three partitions at one partition point, the Workflow Manager defines three partitions at all partition points in the pipeline.

Delete a Partition Deletes the selected partition. Each partition point must contain at least one partition.

Description Enter an optional description for the current partition.


♦ You can add partition points at any other transformation provided that no partition point receives input from more than one pipeline stage.

Figure 13-6 shows the valid partition points in a mapping:

In this mapping, the Workflow Manager creates partition points at the source qualifier and target instance by default. You can place an additional partition point at Expression transformation EXP_3.

If you place a partition point at EXP_3 and define one partition, the master thread creates the following threads:

In this case, each partition point receives data from only one pipeline stage, so EXP_3 is a valid partition point.

The following transformations are not valid partition points:

Figure 13-6. Sample Mapping Showing Valid Partition Points

Transformation Reason

Source This is a source instance.

Asterisks (*) in Figure 13-6 mark the valid partition points. With a partition point at EXP_3, the master thread creates a reader thread (first stage), transformation threads (second and third stages), and a writer thread (fourth stage).


For more information about processing threads, see “Understanding Processing Threads” on page 14.
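The rule that no transformation may receive input from more than one pipeline stage can be checked mechanically. The Python sketch below is a simplified model, not Workflow Manager logic. It assumes the topology implied by the discussion of Figure 13-6: the source qualifier (called SQ here) feeds EXP_1 and EXP_2, both feed EXP_3, and EXP_3 feeds the target (called TGT here). The names SQ and TGT, the function check_partition_points, and the omission of the source instance and SG_1 are all assumptions. Each stage is labeled by the partition point that starts it.

```python
from graphlib import TopologicalSorter  # Python 3.9 or later


def check_partition_points(upstream, partition_points):
    """Return the stage of each transformation, or raise ValueError.

    upstream maps each transformation to the transformations that feed it.
    """
    stage = {}
    for node in TopologicalSorter(upstream).static_order():
        feeding = {stage[u] for u in upstream.get(node, [])}
        if len(feeding) > 1:
            raise ValueError(f"{node} receives data from stages {sorted(feeding)}")
        if node in partition_points or not feeding:
            stage[node] = node           # a partition point starts a new stage
        else:
            stage[node] = feeding.pop()  # same stage as its upstream data
    return stage


upstream = {
    "EXP_1": ["SQ"],
    "EXP_2": ["SQ"],
    "EXP_3": ["EXP_1", "EXP_2"],
    "TGT": ["EXP_3"],
}

# A partition point at EXP_3 is valid.
print(check_partition_points(upstream, {"SQ", "EXP_3", "TGT"}))

# A partition point at EXP_1 makes EXP_3 receive data from two stages.
try:
    check_partition_points(upstream, {"SQ", "EXP_1", "TGT"})
except ValueError as error:
    print(error)
```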

Steps for Adding Partition Points

You add partition points from the Mapping tab of the session properties.

To add a partition point:

1. On the Partitions view of the Mapping tab, select a transformation that is not already a partition point, and click the Add a Partition Point button.

Tip: You can select a transformation from the Non-Partition Points node.

2. Select the partition type for the partition point or accept the default value. For information on specifying a valid partition type, see “Specifying Partition Types” on page 356.

3. Click OK.

The transformation appears in the Partition Points node in the Partitions view on the Mapping tab of the session properties.

SG_1 This is a Sequence Generator transformation.

EXP_1 and EXP_2 If you could place a partition point at EXP_1 or EXP_2, you would create an additional pipeline stage that processes data from the source qualifier to EXP_1 or EXP_2. In this case, EXP_3 would receive data from two pipeline stages, which is not allowed.



Adding and Deleting Partitions

In general, you can define up to 64 partitions at any partition point in a source pipeline. In certain circumstances, the number of partitions in the pipeline must be set to one. For more information, see “Restrictions on the Number of Partitions” on page 395.

The number of partitions you specify equals the number of connections to the source or target. If the pipeline contains a relational source or target, the number of partitions at the source qualifier or target instance equals the number of connections to the database. If the pipeline contains file sources, you can configure the session to read the source with one thread or with multiple threads. For more information on connecting to relational sources and targets, see “Partitioning Relational Sources” on page 371 and “Partitioning Relational Targets” on page 378. For more information on connecting to file sources and targets, see “Partitioning File Sources” on page 374 and “Partitioning File Targets” on page 380.

The number of partitions you specify remains consistent throughout the pipeline. So if you specify three partitions at any partition point, the PowerCenter Server creates three partitions at all other partition points in the pipeline.

Entering Partition Descriptions

You can enter a description for each partition you create. To enter a description, select the partition in the Edit Partition Point dialog box, and then enter the description in the Description field.

Specifying Partition Types

The Workflow Manager sets a default partition type for each partition point in the pipeline. At the source qualifier and target instance, the Workflow Manager specifies pass-through partitioning. For Rank and unsorted Aggregator transformations, for example, the Workflow Manager specifies hash auto-keys partitioning when the transformation scope is All Input. When you create a new partition point, the Workflow Manager sets the partition type to the default partition type for that transformation. You can change the default type.

You must specify pass-through partitioning for all transformations that are downstream from a transaction generator or an active source that generates commits, and upstream from a target or a transformation with Transaction transformation scope. Also, if you configure the session to use constraint-based loading, you must specify pass-through partitioning for all transformations that are downstream from the last active source.


Table 13-4 lists valid partition types and the default partition type for different partition points in the pipeline:

Table 13-4. Valid Partition Types for Partition Points

Round-Robin    Hash Auto-Keys    Hash User Keys    Key Range    Pass-Through    Database Partitioning    Default Partition Type

Source definition                        Not a valid partition point
Source Qualifier (relational sources)    X X          Pass-through
Source Qualifier (flat file sources)     X            Pass-through
XML Source Qualifier                     X            Pass-through
Normalizer (COBOL sources)               X            Pass-through
Normalizer (relational)                  X X X X      Pass-through
Aggregator (sorted)                      X            Pass-through
Aggregator (unsorted)                    X X          Based on transformation scope*
Custom                                   X X X X      Pass-through
Expression                               X X X X      Pass-through
External Procedure                       X X X X      Pass-through
Filter                                   X X X X      Pass-through
Joiner                                   X X          Based on transformation scope*
Lookup                                   X X X X X    Pass-through
Rank                                     X X          Based on transformation scope*
Router                                   X X X X      Pass-through
Sequence Generator                       Not a valid partition point
Sorter                                   X X X        Based on transformation scope*
Stored Procedure                         X X X X      Pass-through
Transaction Control                      X X X X      Pass-through
Union                                    X X X X      Pass-through
Update Strategy                          X X X X      Pass-through
Unconnected transformation               Not a valid partition point
Relational target definition             X X X X X (DB2 targets only)    Pass-through. The default for DB2 targets is database partitioning.
Flat file target definition              X X X X      Pass-through
XML target definition                    Not a valid partition point

* The default partition type is pass-through when the transformation scope is Transaction, and hash auto-keys when the transformation scope is All Input.

Adding Keys and Key Ranges

If you select key range or hash user keys partitioning at any partition point, you need to specify a partition key. The PowerCenter Server uses the key to pass rows to the appropriate partition.

For example, if you specify key range partitioning at a Source Qualifier transformation, the PowerCenter Server uses the key and ranges to create the WHERE clause when it selects data from the source. Therefore, you can have the PowerCenter Server pass all rows that contain customer IDs less than 135000 to one partition and all rows that contain customer IDs greater than or equal to 135000 to another partition. For more information, see “Key Range Partition Type” on page 363.

If you specify hash user keys partitioning at a transformation, the PowerCenter Server uses the key to group data based on the ports you select as the key. For example, if you specify ITEM_DESC as the hash key, the PowerCenter Server distributes data so that all rows that contain items with the same description go to the same partition. For more information, see “Hash Keys Partition Types” on page 361.

Cache Partitioning

When you create a session with multiple partitions, the PowerCenter Server can partition caches for the Aggregator, Joiner, Lookup, and Rank transformations. It creates a separate cache for each partition, and each partition works with only the rows needed by that partition. As a result, the PowerCenter Server requires only a portion of total cache memory for each partition. When you run a session, the PowerCenter Server accesses the cache in parallel for each partition.

After you configure the session for partitioning, you can configure memory requirements and cache directories for each transformation in the Transformations view on the Mapping tab of the session properties. To configure the memory requirements, calculate the total requirements for a transformation, and divide by the number of partitions. To further improve performance, you can configure separate directories for each partition.
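The memory calculation described above is simple arithmetic. The following Python sketch is only an illustration: the function name and the byte figure are hypothetical, and rounding up is an assumption made here so that the per-partition sizes never add up to less than the total.

```python
def cache_per_partition(total_cache_bytes: int, partitions: int) -> int:
    """Return the cache size to configure for each partition.

    Hypothetical helper: divides the total requirement for a transformation
    by the number of partitions, rounding up (an assumption of this sketch).
    """
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    # Ceiling division using integer arithmetic only.
    return -(-total_cache_bytes // partitions)


# Example: a 24,000,000-byte total requirement split across three partitions.
per_partition = cache_per_partition(24_000_000, 3)
print(per_partition)
```

You would enter the resulting value as the cache size for the transformation in each partition.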

The guidelines for cache partitioning are different for each cached transformation:

♦ Aggregator transformation. The PowerCenter Server uses cache partitioning for any multi-partitioned session with an Aggregator transformation. You do not have to set a partition point at the Aggregator transformation.

♦ Joiner transformation. The PowerCenter Server uses cache partitioning when you create a partition point at the Joiner transformation. For more information about partitioning with Joiner transformations, see “Partitioning Joiner Transformations” on page 384.

♦ Lookup transformation. The PowerCenter Server uses cache partitioning when you create a hash auto-keys partition point at the Lookup transformation. For more information about partitioning with Lookup transformations, see “Partitioning Lookup Transformations” on page 391.

♦ Rank transformation. The PowerCenter Server uses cache partitioning for any multi-partitioned session with a Rank transformation. You do not have to set a partition point at the Rank transformation.

For more caching information, see “Session Caches” on page 613.


Round-Robin Partition Type

In round-robin partitioning, the PowerCenter Server distributes rows of data evenly to all partitions. Each partition processes approximately the same number of rows.

Table 13-4 on page 357 lists the partition points where you can specify round-robin partitioning.

Use round-robin partitioning when you need to distribute rows evenly and do not need to group data among partitions. In a pipeline that reads data from file sources of different sizes, you can use round-robin partitioning to ensure that each partition receives approximately the same number of rows.

Figure 13-7 shows a mapping where round-robin partitioning helps distribute rows before they enter a Filter transformation:

The session based on this mapping reads item information from three flat files of different sizes:

♦ Source file 1: 80,000 rows

♦ Source file 2: 5,000 rows

♦ Source file 3: 15,000 rows

When the PowerCenter Server reads the source data, the first partition begins processing 80% of the data, the second partition processes 5% of the data, and the third partition processes 15% of the data.

To distribute the workload more evenly, set a partition point at the Filter transformation and set the partition type to round-robin. The PowerCenter Server distributes the data so that each partition processes approximately one third of the data.

Figure 13-7. Mapping where Round-robin Partitioning Can Increase Performance

Round-robin partitioning distributes data evenly at the Filter transformation.
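To make the effect of round-robin distribution concrete, the following Python sketch deals rows to three partitions in turn. It is a conceptual model only, not PowerCenter code. The function name is hypothetical, and the file sizes are chosen to match the 80%, 5%, and 15% shares described above.

```python
from itertools import cycle


def round_robin(rows, partitions):
    """Deal rows to partitions one at a time, in rotation."""
    buckets = [[] for _ in range(partitions)]
    # zip stops when the rows run out; cycle keeps rotating the buckets.
    for bucket, row in zip(cycle(buckets), rows):
        bucket.append(row)
    return buckets


# Three source files of different sizes (80%, 5%, and 15% of the data).
file_sizes = [80_000, 5_000, 15_000]
rows = [
    (file_no, row_no)
    for file_no, size in enumerate(file_sizes, start=1)
    for row_no in range(size)
]

counts = [len(bucket) for bucket in round_robin(rows, 3)]
print(counts)
```

Each partition ends up with roughly one third of the rows, regardless of which file a row came from.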


Hash Keys Partition Types

In hash partitioning, the PowerCenter Server uses a hash function to group rows of data among partitions. The PowerCenter Server groups the data based on a partition key.

Use hash partitioning when you want the PowerCenter Server to distribute rows to the partitions by group. For example, you need to sort items by item ID, but you do not know how many items have a particular ID number.

There are two types of hash partitioning:

♦ Hash auto-keys. The PowerCenter Server uses all grouped or sorted ports as a compound partition key. You may need to use hash auto-keys partitioning at Rank, Sorter, and unsorted Aggregator transformations.

♦ Hash user keys. You specify a number of ports to generate the partition key.

Table 13-4 on page 357 lists the partition points where you can specify hash partitioning.

Hash Auto-Keys

You can use hash auto-keys partitioning at or before Rank, Sorter, Joiner, and unsorted Aggregator transformations to ensure that rows are grouped properly before they enter these transformations.

Figure 13-8 shows a mapping where hash auto-keys partitioning causes the PowerCenter Server to distribute rows to each partition according to group before they enter the Sorter and Aggregator transformations:

In this mapping, the Sorter transformation sorts items by item description. If items with the same description exist in more than one source file, each partition will contain items with the same description. Without hash auto-keys partitioning, the Aggregator transformation might calculate average costs and prices for each item incorrectly.

To prevent errors in the cost and price calculations, set a partition point at the Sorter transformation and set the partition type to hash auto-keys. When you do this, the PowerCenter Server redistributes the data so that all items with the same description reach the Sorter and Aggregator transformations in a single partition.

Figure 13-8. Mapping where Hash Partitioning Can Increase Performance

Hash auto-keys partitioning groups data at the Sorter.
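The grouping guarantee of hash partitioning can be sketched in a few lines of Python. This is a conceptual illustration, not the PowerCenter Server implementation: the function name and sample rows are hypothetical, and `zlib.crc32` is used only as a stand-in for a hash function that the guide does not specify.

```python
import zlib


def hash_partition(rows, key, partitions):
    """Send each row to the partition chosen by hashing its key value."""
    buckets = [[] for _ in range(partitions)]
    for row in rows:
        # Stand-in hash function (an assumption of this sketch).
        index = zlib.crc32(str(row[key]).encode("utf-8")) % partitions
        buckets[index].append(row)
    return buckets


# Hypothetical item rows; the same description appears more than once.
rows = [
    {"ITEM_DESC": "flashlight", "PRICE": 10.0},
    {"ITEM_DESC": "tent", "PRICE": 120.0},
    {"ITEM_DESC": "flashlight", "PRICE": 14.0},
    {"ITEM_DESC": "stove", "PRICE": 45.0},
    {"ITEM_DESC": "tent", "PRICE": 100.0},
]
buckets = hash_partition(rows, "ITEM_DESC", 3)

# Record which partition(s) each description landed in.
home = {}
for index, bucket in enumerate(buckets):
    for row in bucket:
        home.setdefault(row["ITEM_DESC"], set()).add(index)
print(home)
```

Because equal key values always hash to the same index, every description maps to exactly one partition, so a per-group calculation such as an average sees all rows for its group.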


Hash User Keys

In hash user keys partitioning, the PowerCenter Server uses a hash function to group rows of data among partitions based on a user-defined partition key. You choose the ports that define the partition key.

In the mapping in Figure 13-8 on page 361, if you specify hash auto-keys partitioning, the Sorter transformation receives rows of data grouped by the sort key, such as ITEM_DESC. If the item descriptions are long, and you know that each item has a unique ID number, you can specify hash user keys partitioning at the Sorter transformation and select ITEM_ID as the hash key. This may improve the performance of the session since the hash function usually processes numerical data more quickly than string data.

Adding a Hash Key

If you select hash user keys partitioning at any partition point, you must specify a hash key. The PowerCenter Server uses the hash key to distribute rows to the appropriate partition according to group.

To specify the hash key, select the partition point on the Partitions view of the Mapping tab, and click Edit Keys. This displays the Edit Partition Key dialog box. The Available Ports list displays the connected input and input/output ports in the transformation. To specify the hash key, select one or more ports from this list, and then click Add.

Figure 13-9 shows one port selected as the hash key for a Filter transformation:

To rearrange the order of the ports that make up the key, select a port in the Selected Ports list and click the up or down arrow.

Figure 13-9. Edit Partition Key Dialog Box

Rearrange selected ports.


Key Range Partition Type

With key range partitioning, the PowerCenter Server distributes rows of data based on a port or set of ports that you specify as the partition key. For each port, you define a range of values. The PowerCenter Server uses the key and ranges to send rows to the appropriate partition.

Table 13-4 on page 357 lists the partition points where you can specify key range partitioning.

Use key range partitioning in mappings where the source and target tables are partitioned by key range.

Figure 13-10 shows a mapping where key range partitioning can optimize writing to the target table:

The target table in the database is partitioned by ITEM_ID as follows:

♦ Partition 1: 0001–2999

♦ Partition 2: 3000–5999

♦ Partition 3: 6000–9999

To optimize writing to the target table, perform the following tasks:

1. Set the partition type at the target instance to key range.

2. Create three partitions.

3. Choose ITEM_ID as the partition key.

The PowerCenter Server uses this key to pass data to the appropriate partition.

4. Set the key ranges as follows:

ITEM_ID         Start Range    End Range
Partition #1    (blank)        3000
Partition #2    3000           6000
Partition #3    6000           (blank)

When you do this, the PowerCenter Server sends all items with IDs less than 3000 to the first partition. It sends all items with IDs between 3000 and 5999 to the second partition. Items with IDs greater than or equal to 6000 go to the third partition. For more information on key ranges, see “Adding Key Ranges” on page 365.

Figure 13-10. Mapping where Key Range Partitioning Can Increase Performance

Key range partitioning at the target optimizes writing to the target tables.

Adding a Partition Key

To specify the partition key for key range partitioning, select the partition point on the Partitions view of the Mapping tab, and click Edit Keys. This displays the Edit Partition Key dialog box. The Available Ports list displays the connected input and input/output ports in the transformation. To specify the partition key, select one or more ports from this list, and then click Add.

Figure 13-11 shows one port selected as the partition key for the target table T_ITEM_PRICES:

To rearrange the order of the ports that make up the partition key, select a port in the Selected Ports list and click the up or down arrow.

In key range partitioning, the order of the ports does not affect how the PowerCenter Server redistributes rows among partitions, but it can affect session performance. For example, you might configure the following compound partition key:

Selected Ports
ITEMS.DESCRIPTION
ITEMS.DISCONTINUED_FLAG

Since boolean comparisons are usually faster than string comparisons, the session may run faster if you arrange the ports in the following order:

Selected Ports
ITEMS.DISCONTINUED_FLAG
ITEMS.DESCRIPTION

Figure 13-11. Edit Partition Key Dialog Box


Adding Key Ranges

After you identify the ports that make up the partition key, you must enter the ranges for each port on the Partitions view of the Mapping tab.

Figure 13-12 shows where you enter key ranges on the Partitions view of the Mapping tab:

You can leave the start or end range blank for a partition. When you leave the start range blank, the PowerCenter Server uses the minimum data value as the start range. When you leave the end range blank, the PowerCenter Server uses the maximum data value as the end range.

For example, you can add the following ranges for a key based on CUSTOMER_ID in a pipeline that contains two partitions:

CUSTOMER_ID     Start Range    End Range
Partition #1    (blank)        135000
Partition #2    135000         (blank)

When the PowerCenter Server reads the Customers table, it sends all rows that contain customer IDs less than 135000 to the first partition, and all rows that contain customer IDs equal to or greater than 135000 to the second partition. The PowerCenter Server eliminates rows that contain null values or values that fall outside the key ranges.

Figure 13-12. Adding Key Ranges

When you configure a pipeline to load data to a relational target, if a row contains null values in any column that makes up the partition key or if a row contains a value that falls outside all of the key ranges, the PowerCenter Server sends that row to the first partition.

When you configure a pipeline to read data from a relational source, the PowerCenter Server reads rows that fall within the key ranges. It does not read rows with null values in any partition key column.

If you want to read rows with null values in the partition key, use pass-through partitioning and create a SQL override.
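The target-side routing rules described above can be modeled with a short Python sketch. It is an illustration under stated assumptions, not PowerCenter code. The function name is hypothetical. `None` stands for a blank bound or a null key, and the start bound is treated as inclusive and the end bound as exclusive, which matches the ITEM_ID and CUSTOMER_ID examples in this section.

```python
def route_to_partition(value, ranges):
    """Return the zero-based partition index for a single-port key value.

    ranges is a list of (start, end) pairs, one per partition. A blank
    bound is written as None. Start is inclusive, end is exclusive.
    """
    if value is not None:
        for index, (start, end) in enumerate(ranges):
            above_start = start is None or value >= start
            below_end = end is None or value < end
            if above_start and below_end:
                return index
    # Relational target behavior: null keys and values outside every
    # range are sent to the first partition.
    return 0


# Key ranges from the ITEM_ID example.
item_ranges = [(None, 3000), (3000, 6000), (6000, None)]
print([route_to_partition(v, item_ranges) for v in (2999, 3000, 5999, 6000, None)])
```

This models the relational target case only. When reading from a relational source, rows with null keys or keys outside the ranges are not read at all.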

Consider the following guidelines when you create key ranges:

♦ The partition key must contain at least one port.

♦ You must specify a range for each port.

♦ Use the standard PowerCenter date format to enter dates in key ranges.

♦ The Workflow Manager does not validate overlapping string or numeric ranges.

♦ The Workflow Manager does not validate gaps or missing ranges.
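Because the Workflow Manager does not validate overlapping ranges or gaps, you may want to check numeric ranges yourself before you enter them. The following Python sketch shows one way to do that outside PowerCenter. The function name and message wording are hypothetical, and it assumes a single numeric port, ranges listed in ascending order, an inclusive start, and an exclusive end.

```python
import math


def check_key_ranges(ranges):
    """Report overlaps and gaps between consecutive (start, end) ranges.

    A blank bound is written as None and is treated as unbounded.
    """
    problems = []
    for index in range(len(ranges) - 1):
        end = ranges[index][1]
        next_start = ranges[index + 1][0]
        # Treat blank bounds as the minimum or maximum possible value.
        end = math.inf if end is None else end
        next_start = -math.inf if next_start is None else next_start
        if next_start < end:
            problems.append(f"partitions {index + 1} and {index + 2} overlap")
        elif next_start > end:
            problems.append(f"gap between partitions {index + 1} and {index + 2}")
    return problems


print(check_key_ranges([(None, 3000), (3000, 6000), (6000, None)]))
print(check_key_ranges([(None, 3000), (2500, 6000)]))
print(check_key_ranges([(None, 3000), (4000, None)]))
```

An empty list means that consecutive ranges meet exactly, with no overlap and no gap.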

Adding Filter Conditions

If you specify key range partitioning for a relational source, you can specify optional filter conditions or override the SQL query. For details, see “Partitioning Relational Sources” on page 371.


Pass-Through Partition Type

In pass-through partitioning, the PowerCenter Server processes data without redistributing rows among partitions. Therefore, all rows in a single partition stay in that partition after crossing a pass-through partition point.

When you add a partition point to a pipeline, the master thread creates an additional pipeline stage. Use pass-through partitioning when you want to increase data throughput, but you cannot or do not want to increase the number of partitions.

You can specify pass-through partitioning at any valid partition point in a pipeline.

Figure 13-13 shows a mapping where pass-through partitioning can increase data throughput:

Figure 13-13. Mapping where Pass-through Partitioning Can Increase Performance

By default, this mapping contains partition points only at the source qualifier and target instance. Since this mapping contains an XML target, you can configure only one partition at any partition point.

In this case, the master thread creates one reader thread to read data from the source, one transformation thread to process the data, and one writer thread to write data to the target:

Reader Thread (First Stage), Transformation Thread (Second Stage), Writer Thread (Third Stage)

Each pipeline stage processes the rows as follows:

Source Qualifier (First Stage)    Transformations (Second Stage)    Target Instance (Third Stage)
Row Set 1                         -                                 -
Row Set 2                         Row Set 1                         -
Row Set 3                         Row Set 2                         Row Set 1
Row Set 4                         Row Set 3                         Row Set 2
...                               ...                               ...
Row Set n                         Row Set n-1                       Row Set n-2

Time increases from the top row to the bottom row.

Because the pipeline contains three stages, the PowerCenter Server can process three sets of rows concurrently.

If the Expression transformations are very complicated, processing the second (transformation) stage can take a long time and cause low data throughput. To improve performance, set a partition point at Expression transformation EXP_2 and set the partition type to pass-through. This creates an additional pipeline stage. The master thread creates an additional transformation thread:

Reader Thread (First Stage), Transformation Threads (Second Stage and Third Stage), Writer Thread (Fourth Stage)

The PowerCenter Server can now process four sets of rows concurrently as follows:

Source Qualifier (First Stage)    FIL_1 & EXP_1 Transformations (Second Stage)    EXP_2 & LKP_1 Transformations (Third Stage)    Target Instance (Fourth Stage)
Row Set 1                         -                                               -                                              -
Row Set 2                         Row Set 1                                       -                                              -
Row Set 3                         Row Set 2                                       Row Set 1                                      -
Row Set 4                         Row Set 3                                       Row Set 2                                      Row Set 1
...                               ...                                             ...                                            ...
Row Set n                         Row Set n-1                                     Row Set n-2                                    Row Set n-3

Time increases from the top row to the bottom row.

By adding an additional partition point at Expression transformation EXP_2, you replace one long running transformation stage with two shorter running transformation stages. Data throughput depends on the longest running stage. So in this case, data throughput increases.

For more information about processing threads, see “Understanding Processing Threads” on page 14.
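The staggered processing of row sets across pipeline stages can be reproduced with a small Python sketch. It is a conceptual model only. The function name is hypothetical, and it assumes that every stage takes one time step per row set, which real sessions do not guarantee.

```python
def pipeline_schedule(stages, row_sets):
    """Return, for each time step, the row set number each stage works on.

    None means the stage is idle at that time step.
    """
    steps = []
    for t in range(row_sets + stages - 1):
        step = []
        for s in range(stages):
            # Stage s lags the first stage by s time steps.
            row_set = t - s + 1
            step.append(row_set if 1 <= row_set <= row_sets else None)
        steps.append(step)
    return steps


# Four stages (after adding the partition point at EXP_2), five row sets.
for step in pipeline_schedule(stages=4, row_sets=5):
    print(step)
```

Once the pipeline is full, every stage is busy in each time step, which is why the number of row sets processed concurrently equals the number of stages.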


Database Partitioning Partition Type

When you load to an IBM DB2 table stored on a multi-node tablespace, you can optimize session performance by using the database partitioning partition type instead of the pass-through partition type for IBM DB2 targets.

When you use database partitioning, the PowerCenter Server queries the DB2 system for table partition information and loads partitioned data to the corresponding nodes in the target database.

You can only specify database partitioning for relational targets.

You can specify database partitioning for the target partition type with any number of pipeline partitions and any number of database nodes. However, you can improve load performance further when the number of pipeline partitions equals the number of database nodes.

Use the following rules and guidelines when you use database partitioning:

♦ By default, the PowerCenter Server fails the session when you use database partitioning for non-DB2 targets. However, you can configure the PowerCenter Server to default to pass-through partitioning when you use database partitioning for non-DB2 relational targets:

− On Windows. Select the Treat Database Partitioning as Pass-Through option on the Configuration tab of the PowerCenter Server setup. By default, this option is disabled.

− On UNIX. Add the following entry to the file pmserver.cfg:

TreatDBPartitionAsPassThrough=Yes

♦ You cannot use database partitioning when you configure the session to use source-based or user-defined commit, constraint-based loading, or session recovery.

♦ The target table must contain a partition key. Also, you must link all not-null partition key columns in the target instance to a transformation in the mapping.

♦ You must use high precision mode when the IBM DB2 table partitioning key uses a Bigint field. The PowerCenter Server fails the session when the IBM DB2 table partitioning key uses a Bigint field and you use low precision mode.

♦ If you create multiple partitions for a DB2 bulk load session, you must use database partitioning for the target partition type. If you choose any other partition type, the PowerCenter Server reverts to normal load and writes the following message to the session log:

ODL_26097 Only database partitioning is support for DB2 bulk load. Changing target load type variable to Normal.

If you configure a session for database partitioning, the PowerCenter Server reverts to pass-through partitioning under the following circumstances:

♦ The DB2 target table is stored on one node.

♦ You run the session in debug mode using the Debugger.


♦ You configure the PowerCenter Server to treat the database partitioning partition type as pass-through partitioning and you used database partitioning for a non-DB2 relational target.


Partitioning Relational Sources

When you run a session that partitions relational or Application sources, the PowerCenter Server creates a separate connection to the source database for each partition. It then creates an SQL query for each partition. You can customize the query for each source partition by entering filter conditions in the Transformations view on the Mapping tab. You can also override the SQL query for each source partition using the Transformations view on the Mapping tab.

Figure 13-14 shows where you can override the SQL query for each source partition:

For more information about partitioning Application sources, refer to the PowerCenter Connect documentation.

Entering an SQL Query

You can enter an SQL override if you want to customize the SELECT statement in the SQL query. The SQL statement you enter on the Transformations view of the Mapping tab overrides any customized SQL query that you set in the Designer when you configure the Source Qualifier transformation. For more information, see “Source Qualifier Transformation” in the Transformation Guide.

Figure 13-14. Overriding the SQL Query and Entering a Filter Condition

The SQL query also overrides any key range and filter condition that you enter for a source partition. So, if you also enter a key range and source filter, the PowerCenter Server uses the SQL query override to extract source data.

If you create a key that contains null values, you can extract the nulls by creating another partition and entering an SQL query or filter to extract null values.

To enter an SQL query for each partition, click the Browse button in the SQL Query field. Enter the query in the SQL Editor dialog box, and then click OK.

If you entered an SQL query in the Designer when you configured the Source Qualifier transformation, that query appears in the SQL Query field for each partition. To override this query, click the Browse button in the SQL Query field, revise the query in the SQL Editor dialog box, and then click OK.

Entering a Filter Condition

If you specify key range partitioning at a relational source qualifier, you can enter an additional filter condition. When you do this, the PowerCenter Server generates a WHERE clause that includes the filter condition you enter in the session properties.

The filter condition you enter on the Transformations view of the Mapping tab overrides any filter condition that you set in the Designer when you configure the Source Qualifier transformation. For more information, see “Source Qualifier Transformation” in the Transformation Guide.

If you use key range partitioning, the filter condition works in conjunction with the key ranges. For example, you want to select data based on customer ID, but you do not want to extract information for customers outside the USA. Define the following key ranges:

CUSTOMER_ID     Start Range    End Range
Partition #1    (blank)        135000
Partition #2    135000         (blank)

If you know that the IDs for customers outside the USA fall within the range for a particular partition, you can enter a filter in that partition to exclude them. Therefore, you enter the following filter condition for the second partition:

CUSTOMERS.COUNTRY = 'USA'

When the session runs, the following queries for the two partitions appear in the session log:

READER_1_1_1> RR_4010 SQ instance [SQ_CUSTOMERS] SQL Query [SELECT CUSTOMERS.CUSTOMER_ID, CUSTOMERS.COMPANY, CUSTOMERS.LAST_NAME FROM CUSTOMERS WHERE CUSTOMERS.CUSTOMER_ID < 135000]
[...]
READER_1_1_2> RR_4010 SQ instance [SQ_CUSTOMERS] SQL Query [SELECT CUSTOMERS.CUSTOMER_ID, CUSTOMERS.COMPANY, CUSTOMERS.LAST_NAME FROM CUSTOMERS WHERE CUSTOMERS.COUNTRY = 'USA' AND 135000 <= CUSTOMERS.CUSTOMER_ID]
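To see how a key range and a source filter combine into one WHERE clause, consider the following Python sketch. It only mimics the shape of the queries shown in the sample session log and is not how the PowerCenter Server builds SQL internally. The function name and parameters are hypothetical.

```python
def build_where(column, start=None, end=None, source_filter=None):
    """Combine an optional source filter with key range bounds.

    The filter comes first, the start bound is inclusive, and the end
    bound is exclusive, following the sample session log.
    """
    conditions = []
    if source_filter:
        conditions.append(source_filter)
    if start is not None:
        conditions.append(f"{start} <= {column}")
    if end is not None:
        conditions.append(f"{column} < {end}")
    return " AND ".join(conditions)


# Partition #1: blank start range, end range 135000, no filter.
print(build_where("CUSTOMERS.CUSTOMER_ID", end=135000))

# Partition #2: start range 135000, blank end range, plus the country filter.
print(build_where(
    "CUSTOMERS.CUSTOMER_ID",
    start=135000,
    source_filter="CUSTOMERS.COUNTRY = 'USA'",
))
```

The filter applies only to the partition where you enter it, so the first partition still reads every customer whose ID is below 135000.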


To enter a filter condition, click the Browse button in the Source Filter field. Enter the filter condition in the SQL Editor dialog box, and then click OK.

If you entered a filter condition in the Designer when you configured the Source Qualifier transformation, that filter condition appears in the Source Filter field for each partition. To override this filter, click the Browse button in the Source Filter field, change the filter condition in the SQL Editor dialog box, and then click OK.


Partitioning File Sources

When a session uses a file source, you can configure it to read the source with one thread or with multiple threads. The PowerCenter Server creates one connection to the file source when you configure the session to read with one thread, and it creates multiple concurrent connections to the file source when you configure the session to read with multiple threads.

Configure the source file name property for partitions 2-n to specify single- or multi-threaded reading. To configure for single-threaded reading, pass empty data through partitions 2-n. To configure for multi-threaded reading, leave the source file name blank for partitions 2-n. For more information about configuring file properties with multiple partitions, see “Configuring for File Partitioning” on page 375.

Guidelines for Partitioning File Sources

Use the following guidelines when you configure a file source session with multiple partitions:

♦ You can use pass-through partitioning at the source qualifier.

♦ You can use single- or multi-threaded reading with flat file or COBOL sources.

♦ You can use single-threaded reading with XML sources.

♦ You cannot use multi-threaded reading if the source files are non-disk files, such as FTP files or IBM MQSeries sources.

♦ If you use a shift-sensitive code page, you can use multi-threaded reading only if the following conditions are true:

− The file is fixed-width.

− The file is not line sequential.

− You did not enable user-defined shift state in the source definition.

♦ If you configure a session for multi-threaded reading, and the PowerCenter Server cannot create multiple threads to a file source, it writes a message to the session log and reads the source with one thread.

♦ When the PowerCenter Server uses multiple threads to read a source file, it may not read the rows in the file sequentially. If sort order is important, configure the session to read the file with a single thread. For example, sort order may be important if the mapping contains a sorted Joiner transformation and the file source is the sort origin.

♦ You can also use a combination of direct and indirect files to balance the load.

♦ Session performance for multi-threaded reading is optimal with large source files. Although the PowerCenter Server can create multiple connections to small source files, performance may not be optimal.


Using One Thread to Read a File Source

When the PowerCenter Server uses one thread to read a file source, it creates one connection to the source. The PowerCenter Server reads the rows in the file or file list sequentially. You can configure single-threaded reading for direct or indirect file sources in a session:

♦ Reading direct files. You can configure the PowerCenter Server to read from one or more direct files. If you configure the session with more than one direct file, the PowerCenter Server creates a concurrent connection to each file. It does not create multiple connections to a file.

♦ Reading indirect files. When the PowerCenter Server reads an indirect file, it reads the file list and reads the files in the list sequentially. If the session has more than one file list, the PowerCenter Server reads the file lists concurrently, and it reads the files in the list sequentially.

Using Multiple Threads to Read a File Source

When the PowerCenter Server uses multiple threads to read a source file, it creates multiple concurrent connections to the source. The PowerCenter Server may or may not read the rows in a file sequentially. You can configure multi-threaded reading for direct or indirect file sources in a session:

♦ Reading direct files. When the PowerCenter Server reads a direct file, it creates multiple reader threads to read the file concurrently. You can configure the PowerCenter Server to read from one or more direct files. For example, if a session reads from two files and you create five partitions, the PowerCenter Server may distribute one file among two partitions and one file among three partitions.

♦ Reading indirect files. When the PowerCenter Server reads an indirect file, it creates multiple threads to read the file list concurrently. It also creates multiple threads to read the files in the list concurrently. The PowerCenter Server may use more than one thread to read a single file.

Configuring for File Partitioning

After you create partition points and configure partitioning information, you can configure source connection settings and file properties on the Transformations view of the Mapping tab. Click the source instance name you want to configure under the Sources node. When you click the source instance name for a file source, the Workflow Manager displays connection and file properties in the session properties.

You can configure the source file names and directories for each source partition. The Workflow Manager generates a file name and location for each partition.

Partitioning File Sources 375

Table 13-5 describes the file properties settings for file sources in a mapping:

Configuring Sessions to Use a Single Thread

To configure a session to read a file with a single thread, pass empty data through partitions 2-n. To pass empty data, create a file with no data, such as “empty.txt,” and put it in the source file directory. Then, use “empty.txt” as the source file name.

Table 13-6 describes the session configuration and the PowerCenter Server behavior when it uses a single thread to read source files:

If you use FTP to access source files, you can choose a different connection for each direct file. For more information about using FTP to access source files, see “Using FTP” on page 559.

Configuring Sessions to Use Multiple Threads

To configure a session to read a file with multiple threads, leave the source file name blank for partitions 2-n. The PowerCenter Server uses partitions 2-n to read a portion of the previous partition file or file list. The PowerCenter Server ignores the directory field of that partition.

Table 13-5. File Properties Settings for File Sources

Attribute Value Description

Source File Directory: Enter the local source file directory. The default location is $PMSourceFileDir.

Source File Name: Enter the local source file name. You can also use the session variable, $InputFileName, as defined in the parameter file. If you use a file list, enter the name of the list. By default, the Workflow Manager uses the source file name for each partition. Edit the file name property for partitions 2-n based on how you want the PowerCenter Server to read the files.

Source File Type: Choose Direct to use source files or Indirect to use a file list.

Table 13-6. Configuring Source File Name for Single-Threaded Reading

Source File Name Value (Partition #1, Partition #2, Partition #3): PowerCenter Server Behavior

ProductsA.txt, empty.txt, empty.txt: The PowerCenter Server creates one thread to read ProductsA.txt. It reads rows in the file sequentially. After it reads the file, it passes the data to three partitions in the transformation pipeline.

ProductsA.txt, empty.txt, ProductsB.txt: The PowerCenter Server creates two threads. It creates one thread to read ProductsA.txt, and it creates one thread to read ProductsB.txt. It reads the files concurrently, and it reads rows in the files sequentially.


Table 13-7 describes the session configuration and the PowerCenter Server behavior when it uses multiple threads to read source files:

Table 13-7. Configuring Source File Name for Multi-Threaded Reading

Attribute Value (Partition #1, Partition #2, Partition #3): PowerCenter Server Behavior

ProductsA.txt, <blank>, <blank>: The PowerCenter Server creates three threads to concurrently read ProductsA.txt.

ProductsA.txt, <blank>, ProductsB.txt: The PowerCenter Server creates three threads to read ProductsA.txt and ProductsB.txt concurrently. Two threads read ProductsA.txt and one thread reads ProductsB.txt.
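
The rows of Table 13-6 and Table 13-7 follow one pattern, which the Python sketch below tries to capture. It is a simplified model rather than the PowerCenter Server's actual logic: the function name reader_threads is hypothetical, a blank name is modeled as an empty string, and the sketch assumes you tell it which file names refer to empty placeholder files.

```python
from collections import Counter


def reader_threads(file_names, empty_files=frozenset({"empty.txt"})):
    """Map each data file to the number of threads that read it.

    file_names lists the Source File Name value for partitions 1..n.
    A blank name ("") shares the file of the previous partition.
    Names in empty_files are placeholders that carry no data.
    """
    counts = Counter()
    previous = None
    for name in file_names:
        if name == "":
            if previous is None:
                raise ValueError("partition 1 needs a source file name")
            name = previous  # blank: read a portion of the previous file
        previous = name
        if name not in empty_files:
            counts[name] += 1
    return dict(counts)


# Single-threaded configurations from Table 13-6.
one_reader = reader_threads(["ProductsA.txt", "empty.txt", "empty.txt"])
two_readers = reader_threads(["ProductsA.txt", "empty.txt", "ProductsB.txt"])

# Multi-threaded configurations from Table 13-7.
three_on_a = reader_threads(["ProductsA.txt", "", ""])
split_a_b = reader_threads(["ProductsA.txt", "", "ProductsB.txt"])
```

In this model, empty placeholder files keep a partition in the pipeline without adding a reader for data, while blank names add readers to the preceding file.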


Partitioning Relational Targets

When you configure a pipeline to load data to a relational target, the PowerCenter Server creates a separate connection to the target database for each partition at the target instance. It concurrently loads data for each partition into the target database.

Configure partition attributes for targets in the pipeline on the Transformations view of the Mapping tab in the session properties. For relational targets, you configure the reject file names and directories. The PowerCenter Server creates one reject file for each target partition.

Figure 13-15 shows the Properties settings for relational targets:

Figure 13-15. Properties Settings for Relational Targets in the Session Properties

[Figure callouts: Selected Target Instance; Properties Settings; Enter reject file directories; Enter reject file names.]



Table 13-8 describes the partitioning attributes for relational targets in a pipeline:

Database Compatibility

When you configure a session with multiple partitions at the target instance, the PowerCenter Server creates one connection to the target for each partition. If you configure multiple target partitions in a session that loads to a database or ODBC target that does not support multiple concurrent connections to tables, the session fails.

When you create multiple target partitions in a session that loads data to an Informix database, you must create the target table with row-level locking. If you insert data from a session with multiple partitions into an Informix target configured for page-level locking, the session fails and returns the following message:

WRT_8206 Error: The target table has been created with page level locking. The session can only run with multi partitions when the target table is created with row level locking.

Sybase IQ does not allow multiple concurrent connections to tables. If you create multiple target partitions in a session that loads to Sybase IQ, the PowerCenter Server loads all of the data in one partition.

Table 13-8. Partitioning Relational Target Attributes

Attribute: Description

Reject File Directory: Location for the target reject files. Default is $PMBadFileDir.

Reject File Name: Name of reject file. Default is target name partition number.bad. You can also use the session variable, $BadFileName, as defined in the parameter file.

Partitioning Relational Targets 379

Partitioning File Targets

When you configure a session to write to a file target, the PowerCenter Server writes the output to a separate file for each partition at the target instance. When you run the session, the PowerCenter Server writes to the files concurrently.

You can configure connection settings and file properties for each target partition. You configure these settings in the Transformations view on the Mapping tab.

Configuring Connection Settings

The Connections settings in the Transformations view on the Mapping tab allow you to configure the connection type for all target partitions. You can choose different connection objects for each partition, but they must all be of the same type.

You can use one of the following connection types with target files:

♦ Local. Write the partitioned target files to the local machine.

♦ FTP. Transfer the partitioned target files to another machine. You can transfer the files to any machine to which the PowerCenter Server can connect. For more information about using FTP to load to target files, see “Using FTP” on page 559.

♦ Loader. Use an external loader that can load from multiple output files. This option appears if the pipeline loads data to a relational target and you choose a file writer in the Writers settings on the Mapping tab. If you choose a loader that cannot load from multiple output files, the PowerCenter Server fails the session. For more information about configuring external loaders for partitioning, see “Partitioning Sessions with External Loaders” on page 526.

♦ Message Queue. Transfer the partitioned target files to an IBM MQSeries message queue. For more information about loading to message queues, refer to the PowerCenter Connect for IBM MQSeries User and Administrator Guide.

You can merge target files only if you choose local connections for all target partitions.
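
The two connection rules in this section (one connection type for all partitions, and merging only with local connections) can be expressed as a small check. The Python sketch below is illustrative only: the function name and the type labels "local", "ftp", "loader", and "queue" are hypothetical and do not correspond to Workflow Manager identifiers.

```python
def check_target_connections(connection_types):
    """Validate per-partition connection types for a file target.

    Raises ValueError if the partitions mix connection types.
    Returns True when the target files can be merged, which this
    section allows only when every partition uses a local connection.
    """
    kinds = set(connection_types)
    if len(kinds) > 1:
        raise ValueError(
            "all target partitions must use the same connection type, "
            "found: " + ", ".join(sorted(kinds))
        )
    return kinds == {"local"}


can_merge_local = check_target_connections(["local", "local", "local"])
can_merge_ftp = check_target_connections(["ftp", "ftp"])
```

Different connection objects of the same type, such as two FTP connections to different hosts, pass the check but still cannot be merged.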


Figure 13-16 shows the Connections settings for file targets:

Table 13-9 describes the connection options for file targets in a mapping:

Configuring File Properties

The Properties settings in the Transformations view on the Mapping tab allow you to configure file properties such as the reject file names and directories, the output file names and directories, and whether to merge the target files.

Figure 13-16. Connections Settings for File Targets in the Session Properties

[Figure callouts: Selected Target Instance; Connections Settings; Connection Type.]

Table 13-9. File Targets Connection Options

Connection Type: Choose a local, FTP, external loader, or message queue connection. Select None for a local connection. The connection type is the same for all partitions.

Value: For an FTP, external loader, or message queue connection, click the button in this field to select the connection object. You can specify a different connection object for each partition.


Partitioning File Targets 381

Figure 13-17 shows the Properties settings for file targets:

Table 13-10 describes the file properties for file targets in a mapping:

Figure 13-17. Properties Settings for File Targets in the Session Properties

[Figure callouts: Selected Target Instance; Properties Settings; Select to merge target files; Enter output file directories; Enter output file names; Enter reject file directories; Enter reject file names.]

Table 13-10. Target File Properties

Merge Partitioned Files: If you select this option, the PowerCenter Server merges the partitioned target files into one file when the session completes, and then deletes the individual output files. It does not delete the individual files if it fails to create the merged file. You cannot merge files if the session uses FTP, an external loader, or an MQSeries message queue.

Merge File Directory: Location for the merge file. Default is $PMTargetFileDir.

Merge File Name: Name of the merge file. Default is target name.out.

Output File Directory: Location for the target file. Default is $PMTargetFileDir.

Output File Name: Name of target file. Default is target name partition number.out. You can also use the session variable, $OutputFileName, as defined in the parameter file.

Reject File Directory: Location for the target reject files. Default is $PMBadFileDir.

Reject File Name: Name of reject file. Default is target name partition number.bad.
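
The Merge Partitioned Files behavior (merge, then delete the individual files only if the merge succeeds) can be sketched in Python as follows. This is an illustration under assumptions, not the PowerCenter Server implementation: the function name is hypothetical, and the sketch assumes the partition files are concatenated in partition order, which this guide does not state.

```python
import os
import shutil


def merge_partition_files(part_paths, merge_path):
    """Concatenate partition output files into one merge file.

    Returns True and deletes the partition files on success.
    Returns False and leaves the partition files in place on failure.
    Assumption: files are appended in the order given.
    """
    tmp_path = merge_path + ".tmp"
    try:
        with open(tmp_path, "wb") as merged:
            for path in part_paths:
                with open(path, "rb") as part:
                    shutil.copyfileobj(part, merged)
        os.replace(tmp_path, merge_path)  # publish the finished merge file
    except OSError:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # discard the incomplete merge file
        return False
    for path in part_paths:
        os.remove(path)  # delete the individual files only after success
    return True
```

Writing to a temporary file first means a failed merge never leaves a partial merge file behind, and the individual output files remain available for inspection.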



Partitioning Joiner Transformations

When you create a partition point at the Joiner transformation, the Workflow Manager sets the partition type to hash auto-keys when the transformation scope is All Input. The Workflow Manager sets the partition type to pass-through when the transformation scope is Transaction.

You must create the same number of partitions for the master and detail source. If you configure the Joiner transformation for sorted input, you can change the partition type to pass-through. See the Transformation Guide for more information about configuring the Joiner transformation for sorted input.

To use cache partitioning with a Joiner transformation, you must create a partition point at the Joiner transformation. This allows you to create multiple partitions for both the master and detail source of a Joiner transformation. For more information about cache partitioning, see “Cache Partitioning” on page 359.

Note: If you do not create a partition point at the Joiner transformation, you can create n partitions for the detail source, but only one partition for the master source (1:n).

Partitioning Sorted Joiner Transformations

When you include a Joiner transformation that uses sorted input in the mapping, you must verify the Joiner transformation receives sorted data. If your sources contain large amounts of data, you may want to configure partitioning to improve performance. However, partitions that redistribute rows can rearrange the order of sorted data, so it is important to configure partitions to maintain sorted data.

For example, when you use a hash auto-keys partition point, the PowerCenter Server uses a hash function to determine the best way to distribute the data among the partitions. However, it does not maintain the sort order, so you must follow specific partitioning guidelines to use this type of partition point.

When you join data, you can partition data for the master and detail pipelines in the following ways:

♦ 1:n. Use one partition for the master source and multiple partitions for the detail source. The PowerCenter Server maintains the sort order because it does not redistribute master data among partitions.

♦ n:n. Use an equal number of partitions for the master and detail sources. When you use n:n partitions, the PowerCenter Server processes multiple partitions concurrently. You may need to configure the partitions to maintain the sort order depending on the type of partition you use at the Joiner transformation.

Note: When you use 1:n partitions, do not add a partition point at the Joiner transformation. If you add a partition point at the Joiner transformation, the Workflow Manager adds an equal number of partitions to both master and detail pipelines.

Use different partitioning guidelines, depending on where you sort the data:


♦ Using sorted flat files. Use one of the following partitioning configurations:

− Use 1:n partitions when you have one flat file in the master pipeline and multiple flat files in the detail pipeline. Configure the session to use one reader-thread for each file.

− Use n:n partitions when you have one large flat file in the master and detail pipelines. Configure partitions to pass all sorted data in the first partition, and pass empty file data in the other partitions.

♦ Using sorted relational data. Use one of the following partitioning configurations:

− Use 1:n partitions for the master and detail pipeline.

− Use n:n partitions. If you use a hash auto-keys partition, configure partitions to pass all sorted data in the first partition.

♦ Using the Sorter transformation. Use n:n partitions. If you use a hash auto-keys partition at the Joiner transformation, configure each Sorter transformation to use hash auto-keys partition points as well.

Note: Add only pass-through partition points between the sort origin and the Joiner transformation.

Using Sorted Flat Files

Use 1:n partitions when you have one flat file in the master pipeline and multiple flat files in the detail pipeline. When you use 1:n partitions, the PowerCenter Server maintains the sort order because it does not redistribute data among partitions. When you have one large flat file in each master and detail pipeline, you can use n:n partitions and add a pass-through or hash auto-keys partition at the Joiner transformation. When you add a hash auto-keys partition point, you must configure partitions to pass all sorted data in the first partition to maintain the sort order.

Using 1:n Partitions

If the session uses one flat file in the master pipeline and multiple flat files in the detail pipeline, you can use one partition for the master source and n partitions for the detail file sources (1:n). Add a pass-through partition point at the detail Source Qualifier transformation. Do not add a partition point at the Joiner transformation. The PowerCenter Server maintains the sort order when you create one partition for the master source because it does not redistribute sorted data among partitions.

When you have multiple files in the detail pipeline that have the same structure, pass the files to the Joiner transformation using the following guidelines:

♦ Configure the mapping with one source and one Source Qualifier transformation in each pipeline.

♦ Specify the path and file name for each flat file in the Properties settings of the Transformations view on the Mapping tab of the session properties.

♦ Each file must use the same file properties as configured in the source definition.

Partitioning Joiner Transformations 385

♦ The range of sorted data in the flat files can overlap. You do not need to use a unique range of data for each file.

Figure 13-18 shows sorted file data joined using 1:n partitioning:

The Joiner transformation may output unsorted data depending on the join type. If you use a full outer or detail outer join, the PowerCenter Server processes unmatched master rows last, which can result in unsorted data.

Using n:n Partitions

If the session uses sorted flat file data, you can use n:n partitions for the master and detail pipelines. You can add a pass-through partition or hash auto-keys partition at the Joiner transformation. If you add a pass-through partition at the Joiner transformation, follow instructions in the Transformation Guide for maintaining the sort order in mappings.

If you add a hash auto-keys partition point at the Joiner transformation, you can maintain the sort order by passing all sorted data to the Joiner transformation in a single partition. When you pass sorted data in one partition, the PowerCenter Server maintains the sort order when it redistributes data using a hash function.

To allow the PowerCenter Server to pass all sorted data in one partition, configure the session to use the sorted file for the first partition and empty files for the remaining partitions.

The PowerCenter Server redistributes the rows among multiple partitions and joins the sorted data.

Figure 13-18. Sorted File Data with 1:n Partitions

[Figure labels: Flat File; Source Qualifier; Flat File 1, Flat File 2, Flat File 3; Source Qualifier with pass-through partition; Joiner transformation; Sorted Data; Sorted output depends on join type.]


Figure 13-19 shows sorted file data passed through a single partition to maintain sort order:

The example in Figure 13-19 shows sorted data passed in a single partition to maintain the sort order. The first partition contains sorted file data while all other partitions pass empty file data. At the Joiner transformation, the PowerCenter Server distributes the data among all partitions while maintaining the order of the sorted data.
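
The effect of passing all sorted data in a single partition can be illustrated with a small Python simulation. This is a sketch under stated assumptions, not PowerCenter behavior: the hash function (key modulo the partition count) and the strict interleaving of rows from two upstream partitions are both hypothetical stand-ins chosen to make the outcome reproducible.

```python
def hash_redistribute(rows, partition_count):
    """Route each key to a partition in arrival order.

    Assumption: key % partition_count stands in for the real hash function.
    """
    partitions = [[] for _ in range(partition_count)]
    for key in rows:
        partitions[key % partition_count].append(key)
    return partitions


sorted_rows = list(range(1, 13))

# Case 1: all sorted data arrives through the first partition.
from_single = hash_redistribute(sorted_rows, 3)

# Case 2: the sorted data is split across two upstream partitions, and
# rows from the two partitions arrive interleaved (an assumed schedule).
first_half, second_half = sorted_rows[:6], sorted_rows[6:]
interleaved = [key for pair in zip(first_half, second_half) for key in pair]
from_two = hash_redistribute(interleaved, 3)

single_stays_sorted = all(p == sorted(p) for p in from_single)
two_stay_sorted = all(p == sorted(p) for p in from_two)
```

In this simulation, single_stays_sorted is true and two_stay_sorted is false: redistribution preserves the relative order of rows from one stream, but it cannot restore order across rows that arrive from different streams.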

Using Sorted Relational Data

When you join relational data, you can use 1:n partitions for the master and detail pipeline. When you use 1:n partitions, you cannot add a partition point at the Joiner transformation. If you use n:n partitions, you can add a pass-through or hash auto-keys partition at the Joiner transformation. If you use a hash auto-keys partition point, you must configure partitions to pass all sorted data in the first partition to maintain sort order.

Using 1:n Partitions

If the session uses sorted relational data, you can use one partition for the master source and n partitions for the detail source (1:n). Add a key-range or pass-through partition point at the Source Qualifier transformation. Do not add a partition point at the Joiner transformation. The PowerCenter Server maintains the sort order when you create one partition for the master source because it does not redistribute data among partitions.

Figure 13-19. Sorted File Data Passed Through a Single Partition

[Figure labels: Source Qualifier (master and detail pipelines); Joiner transformation with hash auto-keys partition point; Sorted Data; No Data.]


Figure 13-20 shows sorted relational data with 1:n partitioning:

The Joiner transformation may output unsorted data depending on the join type. If you use a full outer or detail outer join, the PowerCenter Server processes unmatched master rows last, which can result in unsorted data.

Using n:n Partitions

If the session uses sorted relational data, you can use n:n partitions for the master and detail pipelines and add a pass-through or hash auto-keys partition point at the Joiner transformation. When you use a pass-through partition at the Joiner transformation, follow instructions in the Transformation Guide for maintaining sorted data in mappings.

When you use a hash auto-keys partition point, you maintain the sort order by passing all sorted data to the Joiner transformation in a single partition. Add a key-range partition point at the Source Qualifier transformation that contains all source data in the first partition. When you pass sorted data in one partition, the PowerCenter Server redistributes data among multiple partitions using a hash function and joins the sorted data.

Figure 13-20. Sorted Relational Data with 1:n Partitioning

[Figure labels: Relational Source (two sources); Source Qualifier transformation; Source Qualifier transformation with key-range or pass-through partition point; Joiner transformation; Sorted Data; Unsorted Data; Sorted output depends on join type.]


Figure 13-21 shows sorted relational data passed through a single partition to maintain the sort order:

The example in Figure 13-21 shows sorted relational data passed in a single partition to maintain the sort order. The first partition contains sorted relational data while all other partitions pass empty data. After the PowerCenter Server joins the sorted data, it redistributes data among multiple partitions.

Using Sorter Transformations

If the session uses the Sorter transformations to sort data, you can use n:n partitions for the master and detail pipelines. Use a hash auto-keys partition point at the Sorter transformation to group the data. You can add a pass-through or hash auto-keys partition point at the Joiner transformation.

The PowerCenter Server groups data into partitions of the same hash values, and the Sorter transformation sorts the data before passing it to the Joiner transformation. When the PowerCenter Server processes the Joiner transformation configured with a hash auto-keys partition, it maintains the sort order by processing the sorted data using the same partitions it uses to route the data from each Sorter transformation.
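
The group-then-sort-then-join sequence can be sketched in Python. The sketch is a simplified model, not the PowerCenter Server's algorithm: all function names are hypothetical, key modulo the partition count stands in for the real hash function, and the join assumes unique keys in the master data.

```python
def hash_partition(rows, partition_count):
    """Group (key, value) rows so equal keys share a partition."""
    partitions = [[] for _ in range(partition_count)]
    for row in rows:
        partitions[row[0] % partition_count].append(row)
    return partitions


def merge_join(master, detail):
    """Join two key-sorted row lists; master keys are assumed unique."""
    joined = []
    i = 0
    for key, detail_value in detail:
        while i < len(master) and master[i][0] < key:
            i += 1  # skip master rows with smaller keys
        if i < len(master) and master[i][0] == key:
            joined.append((key, master[i][1], detail_value))
    return joined


master_rows = [(3, "c"), (1, "a"), (2, "b"), (4, "d")]
detail_rows = [(4, "w"), (1, "x"), (3, "y"), (1, "z"), (5, "q")]

# Group by hash first, then sort inside each partition (the Sorter step).
master_parts = [sorted(p) for p in hash_partition(master_rows, 2)]
detail_parts = [sorted(p) for p in hash_partition(detail_rows, 2)]

# Join partition i of the master with partition i of the detail.
joined_parts = [merge_join(m, d) for m, d in zip(master_parts, detail_parts)]
```

Because both pipelines use the same grouping, every detail row meets its matching master row in the same partition, and each partition is joined on data that is already sorted.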

Figure 13-21. Sorted Relational Data Passed Through a Single Partition

[Figure labels: Relational Source (two sources); Source Qualifier transformation with key-range partition point (master and detail pipelines); Joiner transformation with hash auto-keys partition point; Sorted Data; No Data.]


Figure 13-22 shows Sorter transformations used with hash auto-keys to maintain sort order:

Note: For best performance, use sorted flat files or sorted relational data. You may want to calculate the processing overhead for adding Sorter transformations to your mapping.

Optimizing Sorted Joiner Transformations with Partitions

When you use partitions with a sorted Joiner transformation, you may optimize performance by grouping data and using n:n partitions.

Add a Hash Auto-keys Partition Upstream of the Sort Origin

To obtain expected results and get best performance when partitioning a sorted Joiner transformation, you must group and sort data. To group data, ensure that rows with the same key value are routed to the same partition. The best way to ensure that data is grouped and distributed evenly among partitions is to add a hash auto-keys or key-range partition point before the sort origin. Placing the partition point before you sort the data ensures that you maintain grouping and sort the data within each group.

Use n:n Partitions

You may be able to improve performance for a sorted Joiner transformation by using n:n partitions. When you use n:n partitions, the Joiner transformation reads master and detail rows concurrently and does not need to cache all of the master data. This reduces memory usage and speeds processing. When you use 1:n partitions, the Joiner transformation caches all the data from the master pipeline and writes the cache to disk if the memory cache fills. When the Joiner transformation receives the data from the detail pipeline, it must then read the data from disk to compare the master and detail pipelines.

Figure 13-22. Using Sorter Transformations with Hash Auto-Keys to Maintain Sort Order

[Figure labels: Source with unsorted data (two sources); Sorter transformation with hash auto-keys partition point (master and detail pipelines); Joiner transformation with hash auto-keys or pass-through partition point; Unsorted Data; Sorted Data.]




Partitioning Lookup Transformations

You can use cache partitioning for static and dynamic caches, and named and unnamed caches. When you create a partition point at a connected Lookup transformation, you can use cache partitioning under the following conditions:

♦ You use the hash auto-keys partition type for the Lookup transformation.

♦ The lookup condition contains only equality operators.

♦ The database is configured for case-sensitive comparison.

For example, if the lookup condition contains a string port and the database is not configured for case-sensitive comparison, the PowerCenter Server does not perform cache partitioning and writes the following message to the session log:

CMN_1799 Cache partitioning requires case sensitive string comparisons. Lookup will not use partitioned cache as the database is configured for case insensitive string comparisons.
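
The three conditions for lookup cache partitioning can be summarized as a single predicate. The Python sketch below is only a restatement of the conditions listed above: the function name, the parameter names, and the string labels for partition types and operators are hypothetical.

```python
def can_partition_lookup_cache(partition_type, condition_operators,
                               case_sensitive_database):
    """Return True when all three cache partitioning conditions hold."""
    return (
        partition_type == "hash auto-keys"
        and all(op == "=" for op in condition_operators)
        and case_sensitive_database
    )


eligible = can_partition_lookup_cache("hash auto-keys", ["=", "="], True)
wrong_operator = can_partition_lookup_cache("hash auto-keys", ["=", ">="], True)
case_insensitive = can_partition_lookup_cache("hash auto-keys", ["="], False)
```

The last call corresponds to the situation in which the PowerCenter Server writes message CMN_1799 to the session log.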

For more information about cache partitioning, see “Cache Partitioning” on page 359.

Partitioning Lookup Transformations 391

Partitioning Sorter Transformations

If you configure multiple partitions in a session that uses a Sorter transformation, the PowerCenter Server sorts data in each partition separately. The Workflow Manager allows you to choose hash auto-keys, key-range, or pass-through partitioning when you add a partition point at the Sorter transformation.

Use hash auto-keys partitioning when you place the Sorter transformation before an Aggregator transformation configured to use sorted input. Hash auto-keys partitioning groups rows with the same values into the same partition based on the partition key. After grouping the rows, the PowerCenter Server passes the rows through the Sorter transformation. The PowerCenter Server processes the data in each partition separately, but hash auto-keys partitioning accurately sorts all of the source data because rows with matching values are processed in the same partition.

Use key-range partitioning when you want to send all rows in a partitioned session from multiple partitions into a single partition for sorting. When you merge all rows into a single partition for sorting, the PowerCenter Server can process all of your data together.

Use pass-through partitioning if you already used hash partitioning in the pipeline. This ensures that the data passing into the Sorter transformation is correctly grouped among the partitions. Pass-through partitioning increases session performance without increasing the number of partitions in the pipeline.

For more information on Sorter transformations, see “Sorter Transformation” in the Transformation Guide.

Configuring Sorter Transformation Work Directories

The PowerCenter Server creates temporary files for each Sorter transformation in a pipeline. It reads and writes data to these files while it performs the sort. The PowerCenter Server stores these files in the Sorter transformation work directories.

By default, the Workflow Manager sets the work directories for all partitions at Sorter transformations to $PMTempDir. You can specify a different work directory for each partition in the session properties.


Figure 13-23 shows where you specify the work directories in the session properties:

Figure 13-23. Session Properties - Configuring Sorter Transformations

[Figure callouts: Selected Sorter Transformation; Enter Sorter transformation work directories.]

Partitioning Sorter Transformations 393

Mapping Variables in Partitioned Pipelines

When you specify multiple partitions in a target load order group that uses mapping variables, the PowerCenter Server evaluates the value of a mapping variable in each partition separately. The PowerCenter Server uses the following process to evaluate variable values:

1. It updates the current value of the variable separately in each partition according to the variable function used in the mapping.

2. After loading all the targets in a target load order group, the PowerCenter Server combines the current values from each partition into a single final value based on the aggregation type of the variable.

3. If there is more than one target load order group in the session, the final current value of a mapping variable in a target load order group becomes the current value in the next target load order group.

4. When the PowerCenter Server completes loading the last target load order group, the final current value of the variable is saved into the repository.

For more information about mapping variables, see “Mapping Parameters and Variables” in the Designer Guide. For more information about target load order groups, see “Reading Source Data” on page 22.

Use one of the following variable functions in the mapping to set the variable value:

♦ SetCountVariable

♦ SetMaxVariable

♦ SetMinVariable

For more information about the variable functions, see “Functions” in the Transformation Language Reference.

Table 13-11 describes how the PowerCenter Server calculates variable values across partitions:

Note: You should use the SetVariable function only once for each mapping variable in a pipeline. When you create multiple partitions in a pipeline, the PowerCenter Server uses multiple threads to process that pipeline. If you use this function more than once for the same variable, the current value of the mapping variable may be nondeterministic.

Table 13-11. Variable Value Calculations with Partitioned Sessions

Variable Function: Variable Value Calculation Across Partitions

SetCountVariable: PowerCenter Server calculates the final count values from all partitions.

SetMaxVariable: PowerCenter Server compares the final variable value for each partition and saves the highest value.

SetMinVariable: PowerCenter Server compares the final variable value for each partition and saves the lowest value.
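
The combination step in Table 13-11 can be sketched in Python. The max and min cases follow the table directly. The count case is an assumption: the table does not say how the per-partition counts are combined, so the sketch adds each partition's change relative to the starting value. The function name and the aggregation labels are hypothetical.

```python
def combine_partition_values(aggregation, start_value, partition_values):
    """Combine per-partition final values into one final value."""
    if not partition_values:
        raise ValueError("at least one partition value is required")
    if aggregation == "max":
        return max(partition_values)
    if aggregation == "min":
        return min(partition_values)
    if aggregation == "count":
        # Assumption: each partition started from start_value, so the
        # final count is the start plus every partition's net change.
        return start_value + sum(v - start_value for v in partition_values)
    raise ValueError("unknown aggregation type: " + aggregation)


highest = combine_partition_values("max", 0, [5, 9, 7])
lowest = combine_partition_values("min", 0, [5, 9, 7])
total = combine_partition_values("count", 100, [103, 101, 104])
```

Under these assumptions, the value carried into the next target load order group would be the single combined value, not any one partition's value.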


Partitioning Rules

You can create multiple partitions in a pipeline if the PowerCenter Server can maintain data consistency when it processes the partitioned data. When you create a session, the Workflow Manager validates each pipeline for partitioning. You can change the partitioning information for a pipeline as long as it conforms to the rules and restrictions listed in this section.

There are several types of partitioning rules and restrictions. These include restrictions on the number of partitions, partitioning restrictions when you change a mapping, restrictions that apply to other Informatica products, and general guidelines.

Restrictions on the Number of Partitions

In general, you can create up to 64 partitions at any partition point in each pipeline in a mapping. Under certain circumstances, however, the number of partitions should or must be limited.

Restrictions for Numerical Functions

The numerical functions CUME, MOVINGSUM, and MOVINGAVG calculate running totals and averages on a row-by-row basis. Depending on how you partition a pipeline, the order in which rows of data pass through a transformation containing one of these functions can change. Therefore, a session with multiple partitions that uses CUME, MOVINGSUM, or MOVINGAVG functions may not always return the same calculated result.
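
A running total shows why row order matters. The Python sketch below uses itertools.accumulate as a stand-in for a CUME-style calculation; it is not the PowerCenter implementation, and the row values are made up for illustration.

```python
from itertools import accumulate


def running_total(values):
    """Return the cumulative sum after each row, in arrival order."""
    return list(accumulate(values))


# The same three rows, arriving in two different orders.
in_source_order = running_total([10, 20, 30])
in_changed_order = running_total([30, 10, 20])
```

Here in_source_order is [10, 30, 60] and in_changed_order is [30, 40, 60]. The last value matches, but the per-row results differ, which is the kind of variation a partitioned session can introduce.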

Restrictions for Relational Targets

When you configure a session to load data to relational targets, the PowerCenter Server can create one or more connections to each target. If you configure multiple target partitions in a session that writes to a database or ODBC target that does not support multiple connections, the session fails.

When you create multiple target partitions in a session that loads data to an Informix database, you must create the target table with row-level locking.

For more information, see “Database Compatibility” on page 379.

Sybase IQ does not allow multiple concurrent connections to tables. If you create multiple target partitions in a session that loads to Sybase IQ, the PowerCenter Server loads all of the data in one partition.

Restrictions for Transformations

Some restrictions on the number of partitions depend on the types of transformations in the pipeline. These restrictions apply to all transformations, including reusable transformations, transformations created in mappings and mapplets, and transformations, mapplets, and mappings referenced by shortcuts.


Table 13-12 describes the restrictions on the number of partitions for transformations:

Table 13-12. Restrictions on the Number of Partitions for Transformations

Transformation | Restrictions
Custom transformation | By default, you can specify only one partition if the pipeline contains a Custom transformation. However, this transformation contains an option on the Properties tab to allow multiple partitions. If you enable this option, you can specify multiple partitions at this transformation. Do not select Is Partitionable if the Custom transformation procedure processes all of the input data together, such as in data cleansing.
External Procedure transformation | By default, you can specify only one partition if the pipeline contains an External Procedure transformation. This transformation contains an option on the Properties tab to allow multiple partitions. If this option is enabled, you can specify multiple partitions at this transformation.
Joiner transformation | You can specify only one partition if the pipeline contains the master source for a Joiner transformation and you do not add a partition point at the Joiner transformation.
XML target instance | You can specify only one partition if the pipeline contains XML targets.

Sequence numbers generated by Normalizer and Sequence Generator transformations might not be sequential for a partitioned source, but they are unique.

Restrictions when Running the Debugger

You can run the Debugger on a session if all pipelines in the mapping contain one partition.

Partition Restrictions for Editing Objects

When you edit object properties, you can impact your ability to create multiple partitions in a session or to run an existing session with multiple partitions.

Before You Create a Session

When you create a session, the Workflow Manager checks the mapping properties. Mappings dynamically pick up changes to shortcuts, but not to reusable objects, such as reusable transformations and mapplets. Therefore, if you edit a reusable object in the Designer after you save a mapping and before you create a session, you must open and resave the mapping for the Workflow Manager to recognize the changes to the object.

After You Create a Session with Multiple Partitions

When you edit a mapping after you create a session with multiple partitions, the Workflow Manager does not invalidate the session even if the changes violate partitioning rules. The PowerCenter Server fails the session the next time it runs unless you edit the session so that it no longer violates partitioning rules.


The following changes to mappings can cause session failure:

♦ You delete a transformation that was a partition point.

♦ You add a transformation that is a default partition point.

♦ You move a transformation that is a partition point to a different pipeline.

♦ You change a transformation that is a partition point in any of the following ways:

− The existing partition type is invalid.

− The transformation can no longer support multiple partitions.

− The transformation is no longer a valid partition point.

♦ You disable partitioning in an External Procedure transformation after you create a pipeline with multiple partitions.

♦ You switch the master and detail source for the Joiner transformation after you create a pipeline with multiple partitions.

Partition Restrictions for Informatica Application Products

You can specify multiple partitions in Informatica Application products, but there are some additional restrictions with these products.

Table 13-13 describes the partitioning restrictions that apply to Informatica Application products:

Table 13-13. Partitioning Guidelines for Informatica Application Products

Product | Restrictions
PowerCenter Connect for PeopleSoft | If the pipeline contains an Application Source Qualifier transformation for PeopleSoft that is connected to or associated with a PeopleSoft tree, you can specify only one partition and the partition type must be pass-through.
PowerCenter Connect for IBM MQSeries | For MQSeries sources, you can specify multiple partitions only if there is no associated source qualifier in the pipeline. You cannot merge output files from sessions with multiple partitions if you use an MQSeries message queue as the target connection type.
PowerCenter Connect for SAP R/3 | If the mapping contains hierarchies or IDOCs, you can specify only one partition and the partition type must be pass-through. If you generate the ABAP program using exec SQL, you can specify only one partition and the partition type must be pass-through. You must use the Informatica default date format to enter dates in key ranges.
PowerCenter Connect for SAP BW | You can specify only one partition when the target load order group contains an SAP BW target.
PowerCenter Connect for Siebel | When you use a source filter in a join override, always use the following syntax for Siebel business components: SiebelBusinessComponentName.SiebelFieldName. When you create a source filter for a Siebel business component, always use the following syntax: SiebelBusinessComponentName.SiebelFieldName
PowerCenter Connect SDK | If the mapping contains a multi-group target that receives data from more than one pipeline, you can specify only one partition. If the mapping contains a multi-group target that receives data from multiple groups, the partition type must be pass-through.

For more information about these products, see the product documentation.

Partitioning Guidelines

This section summarizes the other guidelines that appear throughout this chapter.

Guidelines for Adding and Deleting Partition Points

The following guidelines apply to adding and deleting partition points:

♦ You cannot delete a partition point at a Source Qualifier transformation, a Normalizer transformation for COBOL sources, or a target instance.

♦ You cannot create a partition point at a source instance.

♦ You cannot create a partition point at a Sequence Generator transformation or an unconnected transformation.

♦ You can add a partition point at any other transformation provided that no partition point receives input from more than one pipeline stage.

For more information, see “Adding and Deleting Partition Points” on page 353.

Guidelines for Specifying the Partition Type

You must choose pass-through partitioning at certain partition points in a pipeline if the session uses a source-based commit or constraint-based loading, or if the mapping contains a transaction generator, such as a Transaction Control transformation. For more information, see Table 13-4 on page 357.

If recovery is enabled, the Workflow Manager sets pass-through as the partition type unless the partition point is either an Aggregator transformation or a Rank transformation.

Guidelines for Adding and Deleting Partition Keys

The following guidelines apply to creating and deleting partition keys:

♦ A partition key must contain at least one port.


♦ If you choose key range partitioning at any partition point, you must specify a range for each port in the partition key.

♦ If you choose key range partitioning and need to enter a date range for any port, use the standard PowerCenter date format. For details on the default date format, see “Dates” in the Transformation Language Reference.

♦ The Workflow Manager does not validate overlapping string ranges, overlapping numeric ranges, gaps, or missing ranges.

♦ If a row contains a null value in any column that makes up the partition key, or if a row contains values that fall outside all of the key ranges, the PowerCenter Server sends that row to the first partition.

For more information, see “Adding Key Ranges” on page 365.
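Because the Workflow Manager does not validate overlaps or gaps, it can help to check key ranges before entering them. The following Python sketch models the routing rule described above for a single-port partition key. The function names are hypothetical, and treating each range as start-inclusive and end-exclusive is an assumption of this sketch, not a statement about how the PowerCenter Server evaluates ranges.

```python
def route_row(key_value, key_ranges):
    """Return the 0-based partition index for a row.

    key_ranges holds one (start, end) tuple per partition. Ranges are
    treated as start-inclusive and end-exclusive (an assumption).
    """
    if key_value is not None:
        for index, (start, end) in enumerate(key_ranges):
            if start <= key_value < end:
                return index
    # Null keys and values outside every range go to the first partition.
    return 0

def find_range_problems(key_ranges):
    """List overlaps and gaps between neighboring ranges, sorted by start."""
    problems = []
    ordered = sorted(key_ranges)
    for (s1, e1), (s2, e2) in zip(ordered, ordered[1:]):
        if s2 < e1:
            problems.append(("overlap", (s1, e1), (s2, e2)))
        elif s2 > e1:
            problems.append(("gap", (s1, e1), (s2, e2)))
    return problems

ranges = [(1, 1000), (1000, 2000), (2500, 3000)]

print(route_row(1500, ranges))      # 1
print(route_row(None, ranges))      # 0 (null key)
print(route_row(2200, ranges))      # 0 (falls in the gap)
print(find_range_problems(ranges))  # [('gap', (1000, 2000), (2500, 3000))]
```

Rows that land in a gap silently join the first partition, so an unnoticed gap can skew the load toward that partition.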

Guidelines for Partitioning File Sources and Targets

The following guidelines apply to partitioning file sources and targets:

♦ When connecting to file sources or targets, you must choose the same connection type for all partitions. You may choose different connection objects as long as each object is of the same type. For more information, see “Partitioning File Sources” on page 374 and “Partitioning File Targets” on page 380.

♦ You cannot merge output files from sessions with multiple partitions if you use FTP, an external loader, or an MQSeries message queue as the target connection type. For more information, see “Partitioning File Targets” on page 380.



Chapter 14

Monitoring Workflows


♦ Overview, 402

♦ Using the Workflow Monitor, 404

♦ Customizing Workflow Monitor Options, 409

♦ Using Workflow Monitor Toolbars, 415

♦ Working with Tasks and Workflows, 416

♦ Workflow and Task Status, 421

♦ Using the Gantt Chart View, 423

♦ Using the Task View, 430

♦ Monitoring Session Details, 434

♦ Creating and Viewing Performance Details, 436

♦ Tips, 441


Overview

You can monitor workflows and tasks in the Workflow Monitor. View details about a workflow or task in Gantt Chart view or Task view. You can run, stop, abort, and resume workflows from the Workflow Monitor.

The Workflow Monitor displays workflows that have run at least once. The Workflow Monitor continuously receives information from the PowerCenter Server and Repository Server. It also fetches information from the repository to display historic information.

The Workflow Monitor consists of the following windows:

♦ Navigator window. Displays monitored repositories, servers, and repository objects.

♦ Output window. Displays messages from the PowerCenter Server and the Repository Server.

♦ Time window. Displays progress of workflow runs.

♦ Gantt Chart view. Displays details about workflow runs in chronological (Gantt Chart) format.

♦ Task view. Displays details about workflow runs in a report format, organized by workflow run.

The Workflow Monitor displays time relative to the time configured on the PowerCenter Server machine. For example, a folder contains two workflows. One workflow runs on a PowerCenter Server in your local time zone, and the other runs on a PowerCenter Server in a time zone two hours later. If you start both workflows at 9 a.m. local time, the Workflow Monitor displays the start time as 9 a.m. for one workflow and as 11 a.m. for the other workflow.
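The time zone example above can be reproduced with a short Python sketch. The UTC offsets and the date are hypothetical values chosen only so that the second server is two hours ahead of the local zone. The sketch shows the arithmetic, not how the Workflow Monitor is implemented.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed offsets: the local zone, and a server zone two hours ahead.
local_zone = timezone(timedelta(hours=-8))
server_zone = timezone(timedelta(hours=-6))

# Both workflows start at the same instant: 9 a.m. local time.
start = datetime(2004, 8, 2, 9, 0, tzinfo=local_zone)

# Each start time is shown relative to the clock of the server that ran it.
shown_for_local_server = start.astimezone(local_zone).strftime("%H:%M")
shown_for_remote_server = start.astimezone(server_zone).strftime("%H:%M")

print(shown_for_local_server)   # 09:00
print(shown_for_remote_server)  # 11:00
```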


Figure 14-1 shows the Workflow Monitor in Gantt Chart view:

Toggle between Gantt Chart view and Task view by clicking the tabs on the bottom of the Workflow Monitor.

Note: You can view and hide the Output window in the Workflow Monitor. To toggle back and forth, choose View-Output.

Permissions and Privileges

To use the Workflow Monitor, you must have one of the following sets of permissions and privileges:

♦ Use Workflow Manager privilege with the execute permission on the folder

♦ Workflow Operator privilege with the read permission on the folder


You must also have execute permission for connection objects to restart, resume, stop, or abort a workflow containing a session.

For more information on permissions and privileges necessary to use the Workflow Monitor, see “Permissions and Privileges by Task” in the Repository Guide.
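As a reading aid, the two permission sets and the extra requirement for controlling workflows that contain sessions can be expressed as a small Python check. The function names and the string labels are hypothetical. The sketch restates the rules above and is not a repository API.

```python
def can_use_workflow_monitor(privileges, folder_permissions):
    """True if either documented privilege and permission pair is satisfied."""
    as_manager_user = (
        "Use Workflow Manager" in privileges and "execute" in folder_permissions
    )
    as_operator = (
        "Workflow Operator" in privileges and "read" in folder_permissions
    )
    return as_manager_user or as_operator

def can_control_session_workflow(privileges, folder_permissions, connection_permissions):
    """Restart, resume, stop, or abort also needs execute on connection objects."""
    return (
        can_use_workflow_monitor(privileges, folder_permissions)
        and "execute" in connection_permissions
    )

print(can_use_workflow_monitor({"Workflow Operator"}, {"read"}))                # True
print(can_control_session_workflow({"Workflow Operator"}, {"read"}, {"read"}))  # False
```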

Figure 14-1. Workflow Monitor (callouts: Navigator window, Time window, Output window, Gantt Chart view, Task view)


Using the Workflow Monitor

The Workflow Monitor provides options to view information about workflow runs. After you open the Workflow Monitor and connect to a repository, you can view dynamic information about workflow runs by connecting to a PowerCenter Server.

You can customize the Workflow Monitor display by configuring the maximum days or workflow runs the Workflow Monitor shows. You can also filter tasks and servers in both Gantt Chart and Task view.

Complete the following steps to monitor workflows:

1. Open the Workflow Monitor.

2. Connect to the repository containing the workflow.

3. Connect to the PowerCenter Server.

4. Select the workflow you want to monitor.

5. Choose Gantt Chart view or Task view.

Opening the Workflow Monitor

You can open the Workflow Monitor in the following ways:

♦ From the Windows Start menu

♦ From the Workflow Manager Navigator

♦ Automatically, by configuring the Workflow Manager to open the Workflow Monitor when you run a workflow from the Workflow Manager

You can open multiple instances of the Workflow Monitor on one machine using the Windows Start menu.

To open the Workflow Monitor when you start a workflow:

1. In the Workflow Manager, choose Tools-Options.

2. In the General tab, select Launch Workflow Monitor When Workflow Is Started.

To open the Workflow Monitor from the Workflow Manager:


2. In the Navigator, right-click a server or a repository and choose Run Monitor.

The Workflow Monitor appears.


Connecting to Repositories

When you open the Workflow Monitor, you must connect to a repository to monitor the objects in it. Connect to repositories by choosing Repository-Connect. Enter the repository name and connection information.

Once you connect to a repository, the Workflow Monitor displays a list of servers available for the repository. The Workflow Monitor can monitor multiple repositories, PowerCenter Servers, and workflows at the same time.

Note: If you are not connected to a repository, you can remove the repository from the Navigator. Select the repository in the Navigator and choose Edit-Delete. The Workflow Monitor displays a message verifying that you want to remove the repository from the Navigator list. Click Yes to remove the repository. You can connect to the repository again at any time.

Connecting to PowerCenter Servers

When you connect to a repository, the Workflow Monitor displays all registered PowerCenter Servers and deleted PowerCenter Servers. To monitor tasks and workflows that run on a server, you must connect to the server. In the Navigator, the Workflow Monitor displays a red icon over deleted servers.

To connect to a server, right-click it and choose Connect. When you connect to a server, you can view all folders that you have read permission on. You can disconnect from a server by right-clicking it and selecting Disconnect. When you disconnect from a server, or when the Workflow Monitor cannot connect to a server, the Workflow Monitor displays disconnected for the server status.

You can also verify whether a PowerCenter Server is running by pinging it. Right-click the server in the Navigator and select Ping Server. You can view the ping response time in the Output window.

Note: You can also open a PowerCenter Server node in the Navigator without connecting to it. When you open a PowerCenter Server, the Workflow Monitor gets workflow run information stored in the repository. It does not get dynamic workflow run information from currently running workflows.

Filtering Tasks and Servers

You can filter tasks and servers in both Gantt Chart view and Task view. Use the Filters menu to hide tasks and servers you do not want to view in the Workflow Monitor.

Filtering Tasks

You can view all or some workflow tasks. You can filter out tasks to view only tasks you want. For example, if you want to view only Session tasks, you can hide all other tasks. You can view all tasks at any time.


You can also filter deleted tasks. To filter deleted tasks, choose Filters-Deleted Tasks.

To filter tasks:

1. Choose Filters-Tasks.

The Filter Tasks dialog box appears.

2. Clear the tasks you want to hide, and select the tasks you want to view.

3. Click OK.

Note: When you filter a task, the Gantt Chart view displays a red link between tasks to indicate a filtered task. You can double-click the link to view the tasks you hid.
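The effect of a task filter can be pictured as a simple selection over the task list. The following Python sketch is illustrative only: the `visible_tasks` function, the dictionary keys, and the task names are hypothetical and do not correspond to any Workflow Monitor interface.

```python
def visible_tasks(tasks, shown_types, show_deleted=True):
    """Keep tasks whose type is selected, optionally hiding deleted tasks."""
    return [
        task for task in tasks
        if task["type"] in shown_types and (show_deleted or not task["deleted"])
    ]

tasks = [
    {"name": "s_load_orders", "type": "Session", "deleted": False},
    {"name": "cmd_archive", "type": "Command", "deleted": False},
    {"name": "s_old_load", "type": "Session", "deleted": True},
]

# View only Session tasks and hide deleted tasks.
sessions_only = visible_tasks(tasks, shown_types={"Session"}, show_deleted=False)
print([task["name"] for task in sessions_only])  # ['s_load_orders']
```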

Filtering Servers

When you connect to a repository, the Workflow Monitor displays a list of registered servers and deleted servers. When you register multiple servers, you can filter out servers to view only servers you want to monitor.

When you hide a server, the Workflow Monitor hides the server from the Navigator for both Gantt Chart and Task view. You can show the server at any time.

You can hide unconnected servers. When you hide a connected server, the Workflow Monitor asks if you want to disconnect from the server and then filter it. You must disconnect from a server before hiding it.

To filter servers:

1. In the Navigator, right-click a repository and select Filter Servers.

or

Choose Filters-Servers.


The Filter Servers dialog box appears.

2. Select the servers you want to view, and clear the servers you want to filter. Click OK.

If you are connected to a server that you clear, the Workflow Monitor prompts you to disconnect from the server before filtering.

3. Click Yes to disconnect from the server and filter it.

The Workflow Monitor hides the server from the Navigator.

Click No to remain connected to the server. If you click No, you cannot filter the server.

Tip: You can also filter a server in the Navigator by right-clicking it and selecting Filter Server.

Opening and Closing Folders

You can choose which folders to open and close in the Workflow Monitor. When you open a folder, the Workflow Monitor displays the number of workflow runs that you configured in the Workflow Monitor options. For more information, see “Configuring General Options” on page 409.

You can open and close folders in both Gantt Chart and Task view. When you open a folder, it opens in both views. To open a folder, right-click it in the Navigator and select Open. Or, you can double-click the folder.

To view folder contents in the Workflow Monitor, you must have one of the following sets of permissions and privileges:

♦ Workflow Operator privilege with read permission on the folder



Viewing Statistics

You can view statistics about the objects you monitor in the Workflow Monitor by choosing View-Statistics. The Statistics dialog box displays the following information:

♦ Number of opened repositories. Number of repositories you are connected to in the Workflow Monitor.

♦ Number of connected servers. Number of servers you connected to since you opened the Workflow Monitor.

♦ Number of fetched tasks. Number of tasks the Workflow Monitor fetched from the repository during the period specified in the Time window.

Figure 14-2 shows the Statistics dialog box:

Viewing Properties

You can view properties for the following items:

♦ Tasks. You can view properties such as task name, start time, and status.

♦ Sessions. You can view properties about the Session task and session run, such as mapping name and number of rows successfully loaded. You can also view load statistics about the session run. For more information on session details, see “Monitoring Session Details” on page 434. You can also view performance details about the session run. For more information, see “Creating and Viewing Performance Details” on page 436.

♦ Workflows. You can view properties such as start time, status, and run type.

♦ Links. When you double-click a link between tasks in Gantt Chart view, you can view the tasks you hid.

♦ Servers. You can view properties such as server version and startup time. You can also view the sessions and workflows running on the PowerCenter Server.

♦ Folders. You can view properties such as the number of workflow runs displayed in the Time window.

To view properties for any object, right-click the object and select Properties. You can right-click items in the Navigator or the Time window in either Gantt Chart view or Task view.

To view link properties, double-click the link in the Time window of Gantt Chart view. When you view link properties, you can double-click a task in the Link Properties dialog box to view the properties for the filtered task.

Figure 14-2. Workflow Monitor Statistics Dialog Box


Customizing Workflow Monitor Options

You can configure how the Workflow Monitor displays general information, workflows, and tasks. You can configure general options such as the maximum number of days or runs that the Workflow Monitor displays. You can also configure options specific to Gantt Chart and Task view.

Choose Tools-Options to configure Workflow Monitor options.

You can configure the following options in the Workflow Monitor:

♦ General. Customize general options such as the maximum number of workflow runs to display and whether to receive messages from the Workflow Manager. See “Configuring General Options” on page 409.

♦ Gantt Chart view. Configure Gantt Chart view options such as workspace color, status colors, and time format. See “Configuring Gantt Chart View Options” on page 411.

♦ Task view. Configure which columns to display in Task view. See “Configuring Task View Options” on page 412.

♦ Advanced. Configure advanced options such as the number of workflow runs the Workflow Monitor holds in memory for each server. See “Configuring Advanced Options” on page 412.

Configuring General Options

You can customize general options such as the maximum number of days to display and which text editor to use for viewing session and workflow logs.


Figure 14-3 shows the General Options tab:

Figure 14-3. General Tab for Workflow Monitor Options

Table 14-1 describes the options you can configure on the General tab:

Table 14-1. Workflow Monitor General Options

Setting | Description
Maximum Days | Specifies the maximum number of days for which the Workflow Monitor displays tasks. The default is 5.
Maximum Workflow Runs per Folder | Specifies the maximum number of workflow runs the Workflow Monitor displays for each folder. The default is 200.
Receive Messages from Workflow Manager | Select this option to receive messages from the Workflow Manager. The Workflow Manager sends messages when you start or schedule a workflow in the Workflow Manager. The Workflow Monitor displays these messages in the Output window.
Receive Notifications from Repository Server | Select this option to receive notifications from the Repository Server. Notifications from the Repository Server display in the Output window Notifications tab.
Log File Editor | Enter the path and file name of the text editor to view and edit workflow and session logs. You can browse to select an editor. By default, the Workflow Monitor uses WordPad.
Location | The location where the Workflow Monitor stores temporary versions of log files when you open session or workflow logs from the Workflow Monitor.
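The following Python sketch shows one way the Maximum Days and Maximum Workflow Runs per Folder settings could limit what is displayed. The function name is hypothetical, and the way the two limits combine here (drop runs older than the day limit, then keep the newest runs up to the per-folder limit) is an assumption. The guide does not state how the Workflow Monitor applies them together.

```python
from datetime import datetime, timedelta

def runs_to_display(runs, now, max_days=5, max_runs_per_folder=200):
    """Pick the runs to show for one folder.

    runs is a list of (start_time, run_name) tuples for a single folder.
    """
    cutoff = now - timedelta(days=max_days)
    recent = [run for run in runs if run[0] >= cutoff]
    recent.sort(key=lambda run: run[0], reverse=True)  # newest first
    return recent[:max_runs_per_folder]

now = datetime(2004, 8, 10, 12, 0)
runs = [(now - timedelta(days=d), f"run_{d}") for d in range(8)]

visible = runs_to_display(runs, now, max_days=5, max_runs_per_folder=3)
print([name for _, name in visible])  # ['run_0', 'run_1', 'run_2']
```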


Configuring Gantt Chart View Options

You can configure Gantt Chart view options such as workspace color, status colors, and time format.

Figure 14-4 shows the Gantt Chart Options tab:

Figure 14-4. Gantt Chart Options

Table 14-2 describes the options you can configure on the Gantt Chart Options tab:

Table 14-2. Gantt Chart Options

Gantt Chart Option | Description
Status Color | Choose a status and configure the color for the status. The Workflow Monitor displays tasks with the selected status in the colors you choose. You can choose two colors to display a gradient.
Recovery Color | Configure the color for recovery sessions. The Workflow Monitor uses the status color for the body of the status bar, and it uses the recovery color as a gradient in the status bar.
Workspace Color | Choose a color for each workspace component.
Time Format | Select a display format for the Time window.


Configuring Task View Options

You can choose the columns you want to display in Task view. You can also reorder the columns and specify a default column width.

Figure 14-5 shows the Task View Options tab:

Figure 14-5. Task View Options

Configuring Advanced Options

You can configure advanced options such as the number of workflow runs the Workflow Monitor holds in memory for each server.

Figure 14-6 shows the Advanced Options tab:

Figure 14-6. Advanced Tab for Workflow Monitor Options

Table 14-3 describes the options you can configure on the Advanced tab:

Table 14-3. Advanced Workflow Monitor Options

Setting | Description
Expand Running Workflows Automatically | Expands running workflows in the Navigator.
Hide Folders/Workflows That Do Not Contain Any Runs When Filtering By Running/Schedule Runs | Hides folders or workflows under the Workflow Run column in the Time window when you filter running or scheduled tasks.
Highlight the Entire Row When an Item Is Selected | Highlights the entire row in the Time window for selected items. When you disable this option, the Workflow Monitor highlights the item in the Workflow Run column in the Time window.
Open Latest 20 Runs At a Time | Sets the number of workflow runs to open at a time. The default is 20.
Minimum Number of Workflow Runs (Per Server) the Workflow Monitor Will Accumulate in Memory | Specifies the minimum number of workflow runs per server that the Workflow Monitor holds in memory before it starts releasing older runs from memory. When you connect to a server, the Workflow Monitor fetches the number of workflow runs specified on the General tab for each folder you connect to. When the number of runs is less than the number specified in this option, the Workflow Monitor stores new runs in memory until it reaches this number. Then it releases the oldest run from memory when it fetches a new run. When the number of workflow runs the Workflow Monitor initially fetches exceeds the number specified in this option, the Workflow Monitor stores all those runs and then releases the oldest run from memory when it fetches a new run.

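The memory behavior described for the Minimum Number of Workflow Runs option can be modeled with a bounded queue. The following Python sketch is a simplified model for one server. The `RunCache` class and the run names are hypothetical and do not reflect the actual Workflow Monitor implementation.

```python
from collections import deque

class RunCache:
    """Holds workflow runs for one server and releases the oldest when full."""

    def __init__(self, initial_runs, minimum_runs):
        # The cache grows to the configured minimum, or keeps every run
        # from the initial fetch if that fetch was larger.
        self.capacity = max(minimum_runs, len(initial_runs))
        self.runs = deque(initial_runs)

    def add(self, run):
        if self.runs and len(self.runs) >= self.capacity:
            self.runs.popleft()  # release the oldest run from memory
        self.runs.append(run)

# Initial fetch is smaller than the minimum: grow to 3, then rotate.
small = RunCache(["r1", "r2"], minimum_runs=3)
small.add("r3")
small.add("r4")
print(list(small.runs))  # ['r2', 'r3', 'r4']

# Initial fetch exceeds the minimum: keep all 4, then rotate.
large = RunCache(["a", "b", "c", "d"], minimum_runs=3)
large.add("e")
print(list(large.runs))  # ['b', 'c', 'd', 'e']
```

In both cases the number of runs held in memory stops growing once it reaches the larger of the two values.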


Using Workflow Monitor Toolbars

The Workflow Monitor toolbars allow you to select tools and tasks quickly. You can perform the following toolbar operations:

♦ Display or hide a toolbar.

♦ Create a new toolbar.

♦ Add or remove buttons.

For details on how to perform these toolbar operations, see “Using the Designer” in the Designer Guide.

By default, the Workflow Monitor displays the following toolbars:

♦ Standard. Contains buttons to connect to and disconnect from repositories, and to zoom and print the workspace.

Figure 14-7 displays the Standard toolbar:

Figure 14-7. Standard Toolbar

♦ Server. Contains buttons to connect to and disconnect from PowerCenter Servers, to ping the server, and to start and stop workflows, worklets, and tasks.

Figure 14-8 displays the Server toolbar:

Figure 14-8. Server Toolbar

♦ View. Contains buttons to refresh the view and to open workflow and session logs.

Figure 14-9 displays the View toolbar:

Figure 14-9. View Toolbar

♦ Filter. Contains buttons to display most recent runs, and to filter tasks, servers, and folders.

Figure 14-10 displays the Filter toolbar:

Figure 14-10. Filter Toolbar


Working with Tasks and Workflows

You can perform the following tasks with objects in the Workflow Monitor:

♦ Run a task or workflow.

♦ Resume a suspended workflow.

♦ Stop or abort a task or workflow.

♦ Schedule and unschedule a workflow.

♦ View session logs and workflow logs.

♦ View history names.

Running a Task, Workflow, or Worklet

The Workflow Monitor displays workflows that have run at least once. In the Workflow Monitor, you can run a workflow or any task or worklet in the workflow. To run a workflow or part of a workflow, right-click the workflow or task and choose a restart option. When you choose restart, the task, workflow, or worklet runs on the PowerCenter Server you specify in the workflow properties.

You can also run part of a workflow. When you run part of a workflow, the PowerCenter Server runs the workflow from the selected task to the end of the workflow.

For details on running workflows and tasks in the Workflow Manager, see “Running the Workflow” on page 122.

To run a workflow from the Workflow Monitor:

1. In the Navigator, select the workflow you want to run.

2. Right-click the workflow in the Navigator and choose Restart.

or

Choose Tasks-Restart.

The PowerCenter Server runs the workflow you specify.

To run a task from the Workflow Monitor:

1. In the Navigator, select the task or worklet you want to run.

2. Right-click the task or worklet in the Navigator and choose Restart Task.

The PowerCenter Server runs the task or worklet you specify. It does not run the rest of the workflow.

To run a part of a workflow from the Workflow Monitor:

1. In the Navigator, select the task from which you want to run the workflow.


2. Right-click the task and choose Restart Workflow from Task.

or

Choose Tasks-Restart.

The PowerCenter Server runs the workflow starting with the task you specify.
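Running a workflow from a selected task can be pictured as following the links downstream from that task. The following Python sketch assumes that "from the selected task to the end of the workflow" means the selected task and every task reachable from it. The function name, the link dictionary, and the task names are hypothetical.

```python
from collections import deque

def tasks_from(workflow_links, start_task):
    """Return start_task plus every task reachable from it, breadth first."""
    ordered, seen = [], {start_task}
    queue = deque([start_task])
    while queue:
        task = queue.popleft()
        ordered.append(task)
        for successor in workflow_links.get(task, []):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return ordered

# Each key links to the tasks that follow it in the workflow.
links = {
    "Start": ["s_load_customers", "s_load_orders"],
    "s_load_customers": ["s_build_summary"],
    "s_load_orders": ["s_build_summary"],
    "s_build_summary": ["send_email"],
}

print(tasks_from(links, "s_load_orders"))
# ['s_load_orders', 's_build_summary', 'send_email']
```

Tasks on a parallel branch, such as s_load_customers in this example, are not upstream or downstream of the selected task and are not part of the partial run in this model.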

Resuming a Workflow or Worklet

In the workflow properties, you can choose to suspend the workflow or worklet if a task fails. After you fix the failed task, resume the workflow in the Workflow Monitor. When you resume a workflow, the PowerCenter Server finds the failed task, runs the task again, and continues running the rest of the tasks in the workflow path.

For details on suspending a workflow, see “Suspending the Workflow” on page 127.

To resume a workflow or worklet:

1. In the Navigator, select the workflow or worklet you want to resume.

2. Choose Tasks-Resume.

or

Right-click the workflow or worklet in the Navigator and choose Resume.

The Workflow Monitor displays server messages about the resume command in the Output window.
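The resume behavior (find the failed task, run it again, then continue along the path) can be sketched in Python for a single linear path. The function, the status strings, and the task names are hypothetical, and stopping again at a new failure is an assumption of this sketch rather than documented behavior.

```python
def resume_workflow(task_path, statuses, run_task):
    """Rerun the first failed task, then continue along the path.

    task_path: task names in execution order (one linear path).
    statuses:  dict mapping task name to a status string.
    run_task:  callable that runs a task and returns its new status.
    """
    failed = [task for task in task_path if statuses.get(task) == "Failed"]
    if not failed:
        return statuses  # nothing to resume
    start = task_path.index(failed[0])
    for task in task_path[start:]:
        statuses[task] = run_task(task)
        if statuses[task] == "Failed":
            break  # assumption: a new failure halts the path again
    return statuses

path = ["s_extract", "s_transform", "s_load"]
statuses = {"s_extract": "Succeeded", "s_transform": "Failed", "s_load": "Not started"}
executed = []

def run_task(task):
    executed.append(task)
    return "Succeeded"

resume_workflow(path, statuses, run_task)
print(executed)  # ['s_transform', 's_load']
```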

Recovering a Workflow or Worklet

In the workflow properties, you can choose to suspend the workflow or worklet if a session fails. After you fix the errors that caused the session to fail, recover the workflow in the Workflow Monitor. When you recover a workflow, the PowerCenter Server recovers the failed session, and continues running the rest of the tasks in the workflow path.

For details on suspending a workflow, see “Suspending the Workflow” on page 127.

To recover a workflow or worklet:

1. In the Navigator, select the workflow or worklet you want to recover.

2. Choose Tasks-Resume/Recover.

or

Right-click the workflow or worklet in the Navigator and choose Resume/Recover.

The Workflow Monitor displays server messages about the recover command in the Output window.


Stopping or Aborting Tasks and Workflows

You can stop or abort a task, workflow, or worklet in the Workflow Monitor at any time. When you stop a task in the workflow, the PowerCenter Server stops processing the task and all other tasks in its path. The PowerCenter Server continues running concurrent tasks. If the PowerCenter Server cannot stop processing the task, you need to abort the task. When the PowerCenter Server aborts a task, it kills the DTM process and terminates the task.

For details on server handling of stop and abort, see “Server Handling of Stop and Abort” on page 129.

To stop or abort workflows, tasks, or worklets in the Workflow Monitor:

1. In the Navigator, select the task, workflow, or worklet you want to stop or abort.

2. Choose Tasks-Stop or Tasks-Abort.

or

Right-click the task, workflow, or worklet in the Navigator and choose Stop or Abort.

3. The Workflow Monitor displays the status of the stop or abort command in the Output window.

Scheduling and Unscheduling Workflows

You can schedule and unschedule workflows in the Workflow Monitor. You can schedule any workflow that is not configured to run on demand. When you try to schedule a run-on-demand workflow, the Workflow Monitor displays an error message in the Output window.

When you schedule an unscheduled workflow, the workflow uses its original schedule specified in the workflow properties. If you want to specify a different schedule for the workflow, you must edit the scheduler in the Workflow Manager.

To schedule an unscheduled workflow in the Workflow Monitor:

♦ Right-click the workflow and choose Schedule.

The Workflow Monitor displays the workflow status as Scheduled, and displays a message in the Output window.

To unschedule a scheduled workflow in the Workflow Monitor:

♦ Right-click the workflow and choose Unschedule.

The Workflow Monitor displays the workflow status as Unscheduled, and displays a message in the Output window.

For details on scheduling workflows, see “Scheduling a Workflow” on page 112.


Viewing Session Logs and Workflow Logs

You can open and edit session and workflow log files from the Workflow Monitor. To view workflow or session logs, connect to the server. You can view the most recent session or workflow log, or select a particular workflow run and view the log for that run. If a past session or workflow log is not available, the Workflow Monitor opens the most recent log file.

You can view log files in any text editor on the PowerCenter Client. To change the log file editor, choose Tools-Options. Enter the path and file name of the text editor in the Log File Editor field on the General tab.

When you open a session or workflow log, the Workflow Monitor copies the log file from the PowerCenter Server machine to the directory specified on the General tab of the Options dialog box. The Workflow Monitor opens the file from the temporary directory on the client machine. When you open a session or workflow log, you can cancel the operation at any time.

Note: To view past session or workflow log files, you must configure the session or workflow to save logs by timestamp. For more information on workflow and session logs, see “Log Files” on page 455.

Viewing Dynamic Log Files

When you open a session or workflow log, the Workflow Monitor opens the most recent version of the log file, even if the PowerCenter Server is currently writing to the log file. Each time you choose Get Session Log or Get Workflow Log, the Workflow Monitor opens a new text file with the most recent version of the log file. If you choose to open the log file after the session completes, the Workflow Monitor opens the entire log in a new text file.

Steps to View Log Files

Perform the following steps to view a session or workflow log.

To view a session or workflow log file:

1. Right-click a Session task or workflow in the Navigator or Time window.

2. Choose Get Session Log, or choose Get Workflow Log.

The most recent session or workflow log file opens in the log file editor you specify for the Workflow Monitor.

Tip: When the Workflow Monitor retrieves the session or workflow log, you can press the Esc key to cancel the process.

Viewing History Names

If you rename a task, workflow, or worklet, the Workflow Monitor can show a history of names. When you start a renamed task, workflow, or worklet, the Workflow Monitor displays the current name. To view a list of historical names, select the task, workflow, or worklet in the Navigator. Right-click and choose Show History Names.


Figure 14-11 shows the History Names dialog box:

Figure 14-11. History Names Dialog Box


Workflow and Task Status

The Workflow Monitor displays the status of workflows and tasks.

Table 14-4 describes the different statuses for workflow and tasks:

Table 14-4. Workflow and Task Status

Status Name | Status for | Description
Aborted | Workflows, Tasks | The PowerCenter Server aborted the workflow or task. The PowerCenter Server kills the DTM process when you abort a workflow or task.
Aborting | Workflows, Tasks | The PowerCenter Server is in the process of aborting the workflow or task.
Disabled | Workflows, Tasks | You select the Disabled option in the workflow or task properties. The PowerCenter Server does not run the disabled workflow or task until you clear the Disabled option.
Failed | Workflows, Tasks | The PowerCenter Server failed the workflow or task due to errors.
Running | Workflows, Tasks | The PowerCenter Server is running the workflow or task.
Scheduled | Workflows | You schedule the workflow to run at a future date. The PowerCenter Server runs the workflow for the duration of the schedule.
Stopped | Workflows, Tasks | You choose to stop the workflow or task in the Workflow Monitor. The PowerCenter Server stopped the workflow or task.
Stopping | Workflows, Tasks | The PowerCenter Server is in the process of stopping a workflow or task.
Succeeded | Workflows, Tasks | The PowerCenter Server successfully completed the workflow or task.
Suspended | Workflows, Worklets | The PowerCenter Server suspends the workflow because a task fails and no other tasks are running in the workflow. This status is available only when you choose the Suspend on Error option.
Suspending | Workflows, Worklets | A task fails in the workflow when other tasks are still running. The PowerCenter Server stops executing the failed task and continues executing tasks in other paths. This status is available only when you choose the Suspend on Error option.
Terminated | Workflows | The PowerCenter Server terminated unexpectedly when it was running this workflow or task.
Unscheduled | Workflows | You removed a workflow from the schedule. Or, the workflow is scheduled and the PowerCenter Server is about to run the scheduled workflow.
Waiting | Workflows, Tasks | The PowerCenter Server is waiting for available resources so it can execute the workflow or task. For example, you may set the maximum number of concurrent sessions to 10. If the PowerCenter Server is already executing 10 concurrent sessions, all other workflows and tasks have the Waiting status until the PowerCenter Server is free to execute more tasks.
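The difference between the Suspended and Suspending statuses can be sketched in a few lines of Python. This is only an illustration of the rule stated in the table. The function name and its parameters are hypothetical and are not part of any PowerCenter interface.

```python
from typing import Optional


def suspend_status(task_failed: bool,
                   other_tasks_running: bool,
                   suspend_on_error: bool) -> Optional[str]:
    """Return the suspend-related workflow status, or None if neither applies.

    Hypothetical helper: it only restates the table's rule and does not
    model any other status.
    """
    # Both statuses are available only with the Suspend on Error option.
    if not suspend_on_error or not task_failed:
        return None
    # A task failed while other paths are still executing.
    if other_tasks_running:
        return "Suspending"
    # A task failed and nothing else is running in the workflow.
    return "Suspended"
```

For example, a failed task with other tasks still running yields Suspending, and the same failure with no other running tasks yields Suspended.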


To see a list of tasks by status, view the workflow in Task view and sort by status. Or, choose Edit-List Tasks in Gantt Chart view. For details, see “Listing Tasks and Workflows” on page 424.


Using the Gantt Chart View

The Gantt Chart view allows you to view chronological details of workflow runs. The Gantt Chart view displays the following information:

♦ Task name. Name of the task in the workflow.

♦ Duration. The length of time the PowerCenter Server spends running the most recent task or workflow.

♦ Status. The status of the most recent task or workflow. For more information about status, see “Workflow and Task Status” on page 421.

♦ Connection between objects. The Workflow Monitor shows links between objects in the Time window.

Figure 14-12 displays the Gantt Chart view:

Figure 14-12. Gantt Chart View

Organizing Tasks

In Gantt Chart view, you can organize tasks in the Navigator. You can drag and drop tasks within a workflow to change the order they appear in the Navigator.


For example, the Workflow Monitor usually displays the Decision task as the first task in the following workflow:

Decision task displays first.

You can drag and drop the Decision task within the Navigator so the Decision task is in the middle or at the bottom of the list of tasks for that workflow:

Decision task displays between other tasks.

Listing Tasks and Workflows

The Workflow Monitor lists tasks and workflows in all repositories you connect to. You can view tasks and workflows by status, such as failed or succeeded. You can highlight the task in Gantt Chart view by double-clicking the task in the list.


To view a list of tasks and workflows by status:

1. Open the Gantt Chart view and choose Edit-List Tasks. The List Tasks dialog box appears.

2. In the List What field, select the type of task status you want to list.

For example, select Failed to view a list of failed tasks and workflows.

3. Click List to view the list.

Tip: Double-click the task name in the List Tasks dialog box to highlight the task in Gantt Chart view.

Navigating the Time Window in Gantt Chart View

You can scroll through the Time window in Gantt Chart view to monitor the workflow runs. To scroll the Time window, you can use any of the following methods:

♦ Use the scroll bars.

♦ Right-click the task or workflow and choose Go To Next Run, or choose Go To Previous Run.

♦ Choose View-Organize to select the date you want to display.

When you choose View-Organize, the Go To field appears above the Time window. Click the Go To field to view a calendar and select the date you want to display. When you choose a date, the Workflow Monitor displays that date beginning at 12:00 a.m.


Figure 14-13 shows the Go To field:

Figure 14-13. Organizing Gantt Chart

Zooming the Gantt Chart View

You can change the zoom settings in Gantt Chart view. By default, the Workflow Monitor shows the Time window in increments of one hour. You can change the time increments to zoom the Time window.


Figure 14-14 shows the Time window in 30 minute increments:

Figure 14-14. Zooming the Gantt Chart View (callouts: Zoom; 30 Minute Increments; Solid Line For Hour Increments; Dotted Line For Half Hour Increments)

To zoom the Time window in Gantt Chart view, choose View-Zoom and then choose the desired time increment.

You can also choose the time increment in the Zoom button on the toolbar.

Performing a Search

Use the search tool in the Gantt Chart view to search for tasks, workflows, and worklets in all repositories you connect to. The Workflow Monitor searches for the word you specify in task names, workflow names, and worklet names. You can highlight the task in Gantt Chart view by double-clicking the task after searching.


To perform a search:

1. Open the Gantt Chart view and choose Edit-Find. The Find Object dialog box appears.

2. In the Find What field, enter the keyword you want to find.

3. Click Find Now.

The Workflow Monitor displays a list of tasks, workflows, and worklets that match the keyword.

Tip: Double-click the task name in the Find Object dialog box to highlight the task in Gantt Chart view.


Opening All Folders

You can open all folders in a repository for which you have read permission. To open all the folders in the Gantt Chart view, right-click the server you want to view, and then choose Open All Folders. The Workflow Monitor displays workflows and tasks in the folders.


Using the Task View

The Task view displays information about workflow runs in a report format. The Task view provides a convenient way to compare and filter details of workflow runs. Task view displays the following information:

♦ Workflow run list. The list of workflow runs. The workflow run list contains folder, workflow, worklet, and task names. The Workflow Monitor displays workflow runs chronologically with the most recent run at the top. It displays folders and servers alphabetically.

♦ Status. The status of the task or workflow.

♦ Start time. The time that the PowerCenter Server starts executing the task or workflow.

♦ Completion time. The time that the PowerCenter Server finishes executing the task or workflow.

♦ Status message. Message from the PowerCenter Server regarding the status of the task or workflow.

♦ Run type. The method you used to start the workflow. You might manually start the workflow or schedule the workflow to start.

♦ Worker server. The PowerCenter Server that ran the task.

You can perform the following tasks in Task view:

♦ Filter tasks. Use the Filter menu to select the tasks you want to display or hide. For more information on filtering tasks in Task view, see “Filtering in Task View” on page 431.

♦ Hide and view columns. Hide or view an entire column in Task view. For details on hiding and viewing columns in Task view, see “Configuring Task View Options” on page 412.

♦ Hide and view the Navigator. You can hide the Navigator in Task view. Choose View-Navigator to hide or view the Navigator.

To view the tasks in Task view, select the server you want to monitor in the Navigator.


Figure 14-15 displays the Task view:

Filtering in Task View

In Task view, you can view all or some workflow tasks. You can filter tasks in the following ways:

♦ By task type. You can filter out tasks to view only tasks you want. For example, if you want to view only session task types, you can filter out all other tasks. For more information on filtering task types and servers, see “Filtering Tasks and Servers” on page 405.

♦ By nodes in the Navigator. You can filter the workflow runs the Workflow Monitor displays in the Time window by selecting different nodes in the Navigator. For example, when you select a repository name in the Navigator, the Time window displays all workflow runs that ran on the PowerCenter Servers registered to that repository. When you select a folder name in the Navigator, the Time window displays all workflow runs in that folder.

♦ By the most recent runs. To display by the most recent runs, choose Filters-Most Recent Runs and choose the number of runs you want to display.

♦ By Time window columns. You can choose Filters-Auto Filter and filter by properties you specify in the Time window columns.

Figure 14-15. Task View (callouts: Time Window; Navigator Window; Task View; Output Window; Workflow Run List)


To filter by Time window columns:

1. Choose Filters-Auto Filter.

The Filter button appears in some columns of the Time window in Task view:

2. Click the Filter button in a column in the Time Window.

3. Choose the properties you want to filter.

Tip: To view all tasks, select All.

When you click the Filter button in either the Start Time or Completion Time column, you can choose a custom time to filter.

4. Select Custom for either Start Time or Completion Time. The Filter Start Time or Custom Completion Time dialog box appears.

5. Choose to show tasks before, after, or between the time you specify. Select the date and time. Click OK.
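The before, after, and between choices in the custom time filter can be pictured with a short Python sketch. It is an illustration only. The function, the tuple layout of a run, and the treatment of boundary times (exclusive for before and after, inclusive for between) are assumptions, not documented Workflow Monitor behavior.

```python
from datetime import datetime


def filter_by_time(runs, mode, first, second=None):
    """Filter (task_name, timestamp) tuples by a custom time rule.

    Hypothetical helper. mode is 'before', 'after', or 'between'.
    Boundary handling is an assumption made for this sketch.
    """
    if mode == "before":
        return [run for run in runs if run[1] < first]
    if mode == "after":
        return [run for run in runs if run[1] > first]
    if mode == "between":
        if second is None:
            raise ValueError("'between' needs two times")
        return [run for run in runs if first <= run[1] <= second]
    raise ValueError("mode must be 'before', 'after', or 'between'")


# Sample data with made-up task names and start times.
runs = [
    ("s_orders", datetime(2004, 8, 2, 1, 0)),
    ("s_items", datetime(2004, 8, 2, 3, 30)),
    ("s_customers", datetime(2004, 8, 3, 0, 15)),
]
```

With this sample data, filtering for start times after 2:00 a.m. on August 2 keeps s_items and s_customers.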



Opening All Folders

You can open all folders in a repository for which you have read permission. To open all folders in the Task view, right-click the server with the folders you want to view, and then choose Open All Folders. The Workflow Monitor displays workflows and tasks in the folders.


Monitoring Session Details

When the PowerCenter Server runs a Session task, the Workflow Monitor creates session details that provide load statistics for each target in the mapping. You can view session details when the session runs or after the session completes.

To view session details, right-click the session in the Workflow Monitor and choose Properties. Click the Transformation Statistics tab in the Properties dialog box.

Figure 14-16 shows the session details on the Transformation Statistics tab:

Figure 14-16. Session Properties Transformation Statistics

When you create multiple partitions in a session, the PowerCenter Server provides session details for each partition. You can use these details to determine if the data is evenly distributed among the partitions. For example, if the PowerCenter Server moves more rows through one target partition than another, or if the throughput is not evenly distributed, you might want to adjust the data range for the partitions.

When you load data to a target with multiple groups, such as an XML target, the PowerCenter Server provides session details for each group.

Table 14-5 lists the information on the Transformation Statistics tab:

Table 14-5. Session Details on the Transformation Statistics Tab

Session Detail | Description
Instance Name | Name of the source qualifier instance or the target instance in the mapping. If you create multiple partitions in the source or target, the Instance Name displays the partition number. If the source or target contains multiple groups, the Instance Name displays the group name.
Transformation Name | Name of the source qualifier or target.
Applied Rows | For targets, shows the number of rows the PowerCenter Server successfully applied to the target (that is, the target returned no errors). For sources, shows the number of rows the PowerCenter Server successfully read from the source. Note: The number of applied rows equals the number of affected rows for sources.
Affected Rows | For targets, shows the number of rows affected by the specified operation. For example, you have a table with one column called SALES_ID and five rows containing the values 1, 2, 3, 2, and 2. You mark rows for update where SALES_ID is 2. The writer affects three rows, even though there was only one update request. Or, if you mark rows for update where SALES_ID is 4, the writer affects 0 rows. For sources, shows the number of rows the PowerCenter Server successfully read from the source. Note: The number of applied rows equals the number of affected rows for sources.
Rejected Rows | Number of rows the PowerCenter Server dropped when reading from the source, or the number of rows the PowerCenter Server rejected when writing to the target.
Throughput (Rows/Sec) | Rate, in rows per second, at which the PowerCenter Server read rows from the source or wrote data into the target.
Last Error Message | The most recent error message written to the session log. If you view details after the session completes, this field displays the last error message.
Last Error Code | The error message code of the most recent error message written to the session log. If you view details after the session completes, this field displays the last error code.
Start Time | The time the PowerCenter Server started to read from the source or write to the target. The Workflow Monitor displays time relative to the PowerCenter Server.
End Time | The time the PowerCenter Server finished reading from the source or writing to the target. The Workflow Monitor displays time relative to the PowerCenter Server.
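The SALES_ID example in the Affected Rows description can be reproduced with a small Python sketch. It uses an in-memory SQLite database from the standard library purely as a stand-in for a target database. The table name and helper function are hypothetical, and the snippet does not show how the PowerCenter Server writes to targets.

```python
import sqlite3

# Build the five-row table from the example: SALES_ID values 1, 2, 3, 2, 2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (SALES_ID INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?)", [(1,), (2,), (3,), (2,), (2,)])


def affected_rows(sales_id):
    """Issue one update request and return how many rows it touched."""
    # The update rewrites the same value; only the row count matters here.
    cursor = conn.execute(
        "UPDATE sales SET SALES_ID = SALES_ID WHERE SALES_ID = ?", (sales_id,)
    )
    return cursor.rowcount


print(affected_rows(2))  # one request, three affected rows
print(affected_rows(4))  # one request, zero affected rows
```

A single update request therefore counts once toward applied rows while the affected row count depends on how many rows in the target match.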


Creating and Viewing Performance Details

The performance details provide counters that help you understand the session and mapping efficiency. Each source qualifier, target definition, and individual transformation appears in the performance details, along with counters that display performance information about each transformation.

You can view performance details through the Workflow Monitor as the session runs, or you can open the resulting file in a text editor.

You create performance details by selecting Collect Performance Data in the session properties before running the session. By evaluating the final performance details, you can determine where session performance slows down. Monitoring also provides session-specific details that can help tune the following:

♦ Buffer block size

♦ Index and data cache size for Aggregator, Rank, Lookup, and Joiner transformations

♦ Lookup transformations

Before using performance details to improve session performance you must do the following:

♦ Enable monitoring

♦ Increase Load Manager shared memory

♦ Understand performance counters

Enabling Monitoring

To view performance details, you must enable monitoring in the session properties before running the session.

To enable monitoring:

1. In the Workflow Manager, open the selected session properties.

2. In the Performance settings of the Properties tab, select Collect Performance Data, and click OK.

3. Run the session.

Viewing Session Performance Details

You can view session performance details in the Workflow Monitor or by locating and opening the performance details file.

In the Workflow Monitor, you can watch performance details during the session run.


To view performance details in the Workflow Monitor:

1. While the session is running, right-click the session in the Workflow Monitor and choose Properties.

2. Click the Performance tab in the Properties dialog box.

3. Click OK.

To view the performance details file:

1. Locate the performance details file.

The PowerCenter Server names the file session_name.perf, and stores it in the same directory as the session log. If there is no session-specific directory for the session log, the PowerCenter Server saves the file in the default log files directory.

2. Open the file in any text editor.
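As a small aid for step 1, the following Python sketch builds the expected location of the performance details file from the naming rule above (session_name.perf in the session log directory). The function name, the directory, and the session name are hypothetical placeholders. Substitute the actual session log directory for your server.

```python
from pathlib import PurePosixPath


def perf_file_path(session_log_dir: str, session_name: str) -> PurePosixPath:
    """Return the expected path of the performance details file.

    Hypothetical helper: it only applies the session_name.perf naming rule
    and does not check that the file exists.
    """
    return PurePosixPath(session_log_dir) / f"{session_name}.perf"


# Example with a made-up directory and session name.
print(perf_file_path("/pc/SessLogs", "s_load_orders"))
```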

Memory Requirement for Performance Details

When you enable monitoring, you must increase the size of the Load Manager Shared Memory. For each session in shared memory that you configure to create performance details, the Load Manager requires 200,000 bytes of additional shared memory.

If you create performance details for all sessions, multiply the MaxSessions parameter by 200,000 bytes to calculate the additional shared memory requirements.
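The calculation above is simple enough to check by hand, but a short Python sketch makes the arithmetic explicit. The function name is hypothetical. The 200,000 byte figure comes from the text above, and the session count you pass in is whatever your MaxSessions setting (or the number of sessions that collect performance data) happens to be.

```python
BYTES_PER_SESSION = 200_000  # additional shared memory per session, from the guide


def additional_shared_memory(sessions_with_perf_details: int) -> int:
    """Return the extra Load Manager shared memory needed, in bytes."""
    if sessions_with_perf_details < 0:
        raise ValueError("session count cannot be negative")
    return sessions_with_perf_details * BYTES_PER_SESSION


# Example: performance details for all sessions with MaxSessions set to 10.
print(additional_shared_memory(10))  # 2000000
```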

Understanding Performance Counters

All transformations have some basic counters that indicate the number of input rows, output rows, and error rows.

Source Qualifiers, Normalizers, and targets have additional counters that indicate the efficiency of data moving into and out of buffers. You can use these counters to locate performance bottlenecks.


Some transformations have counters specific to their functionality. For example, each Lookup transformation has a counter that indicates the number of rows stored in the lookup cache.

When you read performance details, the first column displays the transformation name as it appears in the mapping, the second column contains the counter name, and the third column holds the resulting number or efficiency percentage.

When you create multiple partitions in a pipeline, the PowerCenter Server generates one set of counters for each partition. The following performance counters illustrate two partitions for an Expression transformation:

Transformation | Counter | Value
EXPTRANS [1] | Expression_input rows | 8
 | Expression_output rows | 8
EXPTRANS [2] | Expression_input rows | 16
 | Expression_output rows | 16

Note: When you increase the number of partitions, the number of aggregate or rank input rows may be different from the number of output rows from the previous transformation.

Table 14-6 lists the counters that may appear in the Session Performance Details dialog box or in the performance details file:

Table 14-6. Performance Counters

Transformation | Counters | Description
Aggregator and Rank Transformations | Aggregator/Rank_inputrows | Number of rows passed into the transformation.
 | Aggregator/Rank_outputrows | Number of rows sent out of the transformation.
 | Aggregator/Rank_errorrows | Number of rows in which the PowerCenter Server encountered an error.
 | Aggregator/Rank_readfromcache | Number of times the PowerCenter Server read from the index or data cache.
 | Aggregator/Rank_writetocache | Number of times the PowerCenter Server wrote to the index or data cache.
 | Aggregator/Rank_readfromdisk | Number of times the PowerCenter Server read from the index or data file on the local disk, instead of using cached data.
 | Aggregator/Rank_writetodisk | Number of times the PowerCenter Server wrote to the index or data file on the local disk, instead of using cached data.
 | Aggregator/Rank_newgroupkey | Number of new groups the PowerCenter Server created.
 | Aggregator/Rank_oldgroupkey | Number of times the PowerCenter Server used existing groups.
Lookup Transformation | Lookup_inputrows | Number of rows passed into the transformation.
 | Lookup_outputrows | Number of rows sent out of the transformation.
 | Lookup_errorrows | Number of rows in which the PowerCenter Server encountered an error.
 | Lookup_rowsinlookupcache | Number of rows stored in the lookup cache.
Joiner Transformation | Joiner_inputMasterRows | Number of rows the master source passed into the transformation.
 | Joiner_inputDetailRows | Number of rows the detail source passed into the transformation.
 | Joiner_outputrows | Number of rows sent out of the transformation.
 | Joiner_errorrows | Number of rows in which the PowerCenter Server encountered an error.
 | Joiner_readfromcache | Number of times the PowerCenter Server read from the index or data cache.
 | Joiner_writetocache | Number of times the PowerCenter Server wrote to the index or data cache.
 | Joiner_readfromdisk* | Number of times the PowerCenter Server read from the index or data files on the local disk, instead of using cached data.
 | Joiner_writetodisk* | Number of times the PowerCenter Server wrote to the index or data files on the local disk, instead of using cached data.
 | Joiner_readBlockFromDisk** | Number of times the PowerCenter Server read from the index or data files on the local disk, instead of using cached data.
 | Joiner_writeBlockToDisk** | Number of times the PowerCenter Server wrote to the index or data cache.
 | Joiner_seekToBlockInDisk** | Number of times the PowerCenter Server accessed the index or data files on the local disk.
 | Joiner_insertInDetailCache* | Number of times the PowerCenter Server wrote to the detail cache. The PowerCenter Server generates this counter only if you join data from a single source.
 | Joiner_duplicaterows | Number of duplicate rows the PowerCenter Server found in the master relation.
 | Joiner_duplicaterowsused | Number of times the PowerCenter Server used the duplicate rows in the master relation.
All Other Transformations | Transformation_inputrows | Number of rows passed into the transformation.
 | Transformation_outputrows | Number of rows sent out of the transformation.
 | Transformation_errorrows | Number of rows in which the PowerCenter Server encountered an error.

*The PowerCenter Server generates this counter when you use sorted input for the Joiner transformation.
**The PowerCenter Server generates this counter when you do not use sorted input for the Joiner transformation.

If you have multiple source qualifiers and targets, evaluate them as a whole. For source qualifiers and targets, a high value is considered 80-100 percent. Low is considered 0-20 percent.
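Because the PowerCenter Server generates one set of counters per partition, it can help to total a counter across partitions when you compare transformations. The Python sketch below does this for the two-partition Expression example shown earlier in this section. It assumes you have already read the three columns (transformation, counter, value) into tuples. The function name is hypothetical, and the sketch does not parse the performance details file itself, whose exact layout is not described here.

```python
from collections import defaultdict

# Rows as (transformation, counter, value), taken from the two-partition
# Expression example. The partition number appears in square brackets.
rows = [
    ("EXPTRANS [1]", "Expression_input rows", 8),
    ("EXPTRANS [1]", "Expression_output rows", 8),
    ("EXPTRANS [2]", "Expression_input rows", 16),
    ("EXPTRANS [2]", "Expression_output rows", 16),
]


def totals_across_partitions(rows):
    """Sum each counter over all partitions of the same transformation."""
    totals = defaultdict(int)
    for transformation, counter, value in rows:
        # Drop the " [n]" partition suffix to get the transformation name.
        name = transformation.split(" [")[0]
        totals[(name, counter)] += value
    return dict(totals)


print(totals_across_partitions(rows))
```

For the example data, both the input and output row counters for EXPTRANS total 24 across the two partitions.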




Tips

Reduce the size of the Time window.

When you reduce the size of the Time window, the Workflow Monitor refreshes the screen faster, reducing flicker.

Use the Repository Manager to truncate the list of workflow logs.

If the Workflow Monitor takes a long time to refresh from the repository or to open folders, truncate the list of workflow logs. When you configure a session or workflow to archive session logs or workflow logs, the PowerCenter Server saves those logs in local directories. The repository also creates an entry for each saved workflow log and session log. If you move or delete a session log or workflow log from the workflow log directory or session log directory, truncate the lists of workflow and session logs to remove the entries from the repository. The repository always retains the most recent workflow log entry for each workflow.



Chapter 15

Using Multiple Servers


♦ Overview, 444

♦ Using Server Variables, 445

♦ Working with Server Grids, 446

♦ Configuring Server Grids, 450


Overview

You can register and run multiple PowerCenter Servers against a local or global repository. When you register multiple PowerCenter Servers to the same repository, you can distribute the workload across the servers to increase performance.

You have the following options to run workflows and sessions using multiple servers:

♦ Use a server grid to run workflows. You can use a server grid to automate the distribution of sessions. A server grid is a server object that distributes sessions in a workflow to servers based on server availability. The grid maintains connections to multiple servers in the grid. For more information about using server grids, see “Working with Server Grids” on page 446.

♦ Change the assigned server for a workflow. When you configure a workflow, you assign a server to run that workflow. Each time the scheduled workflow runs, it runs on the assigned server. You can change the assigned server for a workflow in the workflow properties.

♦ Change the assigned server for a session. When you configure a session, by default it runs on the server assigned to the workflow. You can change the assigned server for a session in the session properties.

♦ Start a workflow on a non-assigned server. By default, each workflow runs on its assigned PowerCenter Server. You can run a workflow on a non-assigned server if the workflow is not currently running. Use the Start Workflow button on the Standard toolbar, and choose a PowerCenter Server.

You can use the Workflow Monitor to monitor workflows running on multiple servers. For server grids, the Workflow Monitor shows the individual status of each server in a grid. You can identify the server grid that a server is assigned to by right-clicking the server in the Workflow Monitor and selecting Properties. For more information about using the Workflow Monitor, see “Monitoring Workflows” on page 401.

Tip: You might want to place the most CPU intensive sessions on the more powerful servers.


Using Server Variables

In a multiple server environment, each server must have access to input files and directories used by the session it runs. You can use server variables to simplify the process of changing the server that runs a session or workflow. Server variables set the paths for files and caches created during a session.

If you override a server variable in a workflow or session, you may need to manually edit the session or workflow properties. If the new PowerCenter Server cannot locate the override directory, it cannot run the session.

Using a File Server

Consider setting up a central location or using a file server accessible to all the PowerCenter Servers. This allows you to run sessions on different servers without moving cache files and input files.

♦ Configure $PMRootDir for each server to point to the central location.

♦ Use the same variables on each machine.

If you do not use a central file server, you need to relocate input files to the default directories of the new PowerCenter Server. Input files can include parameter files, cache files, external procedures, and flat file sources.

Running Sessions with Cache Files

In a multiple server environment, each PowerCenter Server needs access to the index and data cache files created during previous sessions. This can include incremental aggregation files and persistent lookup cache files. If the PowerCenter Server cannot locate the cache files, it rebuilds them.

When the PowerCenter Server rebuilds incremental aggregation files, it loses aggregate history. Use one of the following methods to save aggregate history in a multiple server environment:

♦ Use consistent server variables. Use the same variable for $PMCacheDir for each PowerCenter Server running incremental aggregation sessions.

♦ Run incremental aggregation sessions on the same machine. When you run large incremental aggregation sessions, you might want to consider assigning a server to a session and overriding the server variable to write to a drive local to the assigned PowerCenter Server.

♦ Move incremental aggregation files. If you cannot make files accessible to each PowerCenter Server, or if the files are very large, you must move them to the server running the session.

Note: Since aggregate files can become very large, make sure the directory can accommodate the necessary files.
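One way to check the "use consistent server variables" advice is to compare the values each server uses for variables such as $PMRootDir and $PMCacheDir. The Python sketch below assumes you have collected those values by hand into a dictionary. The function name, server names, and paths are hypothetical, and the sketch does not read anything from the repository or from a PowerCenter Server.

```python
def inconsistent_variables(server_vars, names=("$PMRootDir", "$PMCacheDir")):
    """Return the variables whose values differ between servers.

    server_vars maps a server name to a dict of variable name -> value.
    The result maps each inconsistent variable to its per-server values.
    """
    mismatches = {}
    for name in names:
        values = {server: settings.get(name)
                  for server, settings in server_vars.items()}
        # More than one distinct value means the servers disagree.
        if len(set(values.values())) > 1:
            mismatches[name] = values
    return mismatches


# Made-up example: both servers share a root directory, but one of them
# writes cache files to a local drive.
servers = {
    "ServerA": {"$PMRootDir": "/shared/pc", "$PMCacheDir": "/shared/pc/Cache"},
    "ServerB": {"$PMRootDir": "/shared/pc", "$PMCacheDir": "/local/Cache"},
}
print(inconsistent_variables(servers))
```

In this example only $PMCacheDir is reported, which is the situation in which a server that picks up an incremental aggregation session might not find the existing cache files.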


Working with Server Grids

You can increase workflow performance by using a server grid to balance the server workload. When you create a server grid, you can add PowerCenter Servers to the grid. When you run a workflow against a PowerCenter Server in the grid, that server becomes the master server for the workflow. The master server runs all non-session tasks and assigns session tasks to run on other servers in the grid. The other servers become worker servers for that workflow run.

You can specify server grid distribution options at the server level, workflow level, and session level. PowerCenter Servers specified at the session level override both server level and workflow level properties. For more information about these overrides, see “Configuring Server Grids” on page 450.

Note: You cannot run a single session on multiple servers.

Distributing Sessions

In a server grid, the master server starts the workflow and then distributes sessions to worker servers. The master server is the server that starts a workflow. A worker server is a server that runs sessions assigned to it by a master server. By default, each PowerCenter Server in a server grid is both a master server and a worker server. This means that a server in a grid can distribute sessions to and receive sessions from every server in the grid. The master server distributes sessions that are ready to run to available worker servers in a round-robin fashion based on server availability. The starting point for the session assignment is random.

If a worker server is running the maximum number of concurrent sessions, the master server assigns another worker server to run the session. If all worker servers are running the maximum number of concurrent sessions, the master server places the session in its own ready queue.
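The distribution rules above can be illustrated with a short simulation. This is only a sketch under stated assumptions: the function name, its arguments, and the data structures are hypothetical and are not part of PowerCenter, and the sketch does not model sessions finishing and freeing capacity.

```python
import random


def distribute_sessions(sessions, workers, max_concurrent, start=None):
    """Assign ready sessions to workers round-robin and queue the overflow.

    sessions:       names of sessions that are ready to run
    workers:        names of the worker servers currently available
    max_concurrent: maximum number of concurrent sessions per server
    start:          index of the first worker to try (random when None)
    Returns (assignments, ready_queue).
    """
    if start is None and workers:
        # The guide states that the starting point is random.
        start = random.randrange(len(workers))
    running = {worker: 0 for worker in workers}
    assignments = {}
    ready_queue = []
    cursor = start or 0
    for session in sessions:
        # Try each worker once, beginning at the round-robin cursor.
        for offset in range(len(workers)):
            index = (cursor + offset) % len(workers)
            if running[workers[index]] < max_concurrent:
                assignments[session] = workers[index]
                running[workers[index]] += 1
                cursor = (index + 1) % len(workers)
                break
        else:
            # Every worker is at its limit: the session waits in the
            # master server's ready queue.
            ready_queue.append(session)
    return assignments, ready_queue


if __name__ == "__main__":
    placed, queued = distribute_sessions(
        ["s1", "s2", "s3", "s4"], ["A", "B", "C"], max_concurrent=1, start=1
    )
    print(placed, queued)
```

With three workers that each allow one concurrent session, the first three sessions are spread across the workers and the fourth stays in the ready queue.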

For information about configuring the maximum number of concurrent sessions, see “Installing and Configuring the PowerCenter Server on Windows” and “Installing and Configuring the PowerCenter Server on UNIX” in the Installation and Configuration Guide.

Figure 15-1 shows how a master server distributes the sessions in Workflow1 among the servers in a grid. The server grid contains Server A, Server B, and Server C. Server A is the master server, and Server B and Server C are worker servers.

Figure 15-1. Distributing Sessions in a Server Grid

[Figure not reproduced: Workflow1 with sessions labeled Server B, Server C, and Server A. In Workflow1, Server A is the master server.]


Figure 15-2 shows how a master server distributes sessions in a workflow where a non-session task exists. Server C is the master server, and Server A and Server B are worker servers. Server C runs all non-session tasks it encounters and assigns sessions in a round-robin fashion.

Figure 15-2. Running a Non-session Task on the Master Server

[Figure not reproduced: a workflow with tasks labeled Server A, Server B, and Server C. Server C is the master server.]

Server Grid Connectivity

PowerCenter Servers in a server grid create and maintain a connection to each other. A server grid contains information about other servers in the grid. When you start a PowerCenter Server, it fetches the server grid object and creates a TCP/IP connection to the other servers in the grid.

Each server in the grid monitors the other servers to check connectivity status. As a result, the grid notifies each server when you add, edit, or delete any server in the grid.

You can add servers to a server grid at any time. When a server starts up, it connects to the grid and can run sessions from master servers and distribute sessions to worker servers in the grid. The Workflow Monitor communicates with the master server to monitor progress of workflows, get session statistics, retrieve performance details, and stop or abort the workflow or task instances.

If a PowerCenter Server loses its connection to the grid, it tries to reestablish a connection. You do not need to restart the server for it to connect to the grid. If a PowerCenter Server is not connected to the server grid, the other PowerCenter Servers in the server grid do not send it tasks.

When a PowerCenter Server cannot reestablish a connection to the grid, session and workflow completion depends on factors such as shut down mode and which server loses connectivity.

Table 15-1 lists scenarios where a server grid can lose connectivity:

Table 15-1. Losing Connectivity in a Server Grid

Connectivity loss: Worker server shuts down unexpectedly or you shut it down before it receives a session.
Server behavior: The worker server is not available to the master servers in the server grid. Master servers do not assign a session to the unavailable worker server and proceed with the round-robin distribution of sessions.

Connectivity loss: Worker server shuts down unexpectedly while running a session.
Server behavior: The master server marks the status of the session as terminated. The worker server stops running all sessions. The session settings you specify determine if the workflow fails. For more information about the Fail parent if this task fails option, Fail parent if this task does not run option, or Disable this task option, see “Configuring Tasks” on page 135.

Connectivity loss: You shut down a worker server while it is running a session.
Server behavior: The shut down mode you specify determines how the worker server handles sessions when it shuts down. When you shut down the worker server in complete mode, it continues to run the sessions it started until it completes, but does not accept sessions from master servers. For more information about shut down modes, see “pmcmd Reference” on page 594.

Connectivity loss: Worker server loses its network connection and cannot connect to the server grid.
Server behavior: The worker server continues to run the session and writes its status to the session log. However, the master server marks the status of the session as terminated. You must resume the workflow or resume from the failed task to continue running the workflow and update the session status. If you do not need the session status of the previous run, you can restart the workflow or restart the workflow from a task to start up a new workflow run. For more information, see “Working with Tasks and Workflows” on page 416.

Connectivity loss: Master server shuts down unexpectedly.
Server behavior: Workflow fails. You must restart the workflow on another server or wait for the master server to become available.

Connectivity loss: You shut down the master server while running a workflow or session.
Server behavior: The shut down mode you specify determines how the master server handles workflows and sessions when it shuts down. When you shut down the master server in complete mode, it continues to run the workflows and sessions it started until they complete, but does not accept tasks from other master servers. For more information about shut down modes, see “pmcmd Reference” on page 594.

Connectivity loss: Master server loses its network connection and cannot connect to the server grid.
Server behavior: The master server continues to run workflows as a standalone PowerCenter Server. If a worker server is assigned to a session, the session fails because the master server cannot distribute the session to the worker server. The session settings you specify determine if the workflow fails. For more information about the Fail parent if this task fails option, Fail parent if this task does not run option, or Disable this task option, see “Configuring Tasks” on page 135.

Server Grid Guidelines and Requirements

Informatica recommends that each PowerCenter Server in a server grid uses the same operating system. While you can specify different session log directories, workflow log directories, and temp directories for the PowerCenter Servers, each PowerCenter Server in a server grid must meet the following requirements:

♦ Register each PowerCenter Server to the same repository.

♦ Use the same database connectivity for each PowerCenter Server.

♦ Use the same server variables for each server in a grid, except for the $PMTempDir, $PMSessionLogDir, and $PMWorkflowLogDir variables.

♦ Use the same cache directory.

♦ Configure the following PowerCenter Server parameters the same:

− Fail session if maximum number of concurrent sessions is reached

− PMServer 4.0 date handling compatibility

− Aggregate treat null as zero

− Aggregate treat rows as insert

− Treat CHAR as CHAR on read

− Data Movement Mode

− Validate Data Code Pages

− Output Session Log In UTF8

− Export Session Log Lib Name

− Treat Null in comparison operator as

− Data Display Format

♦ PowerCenter Servers must be the same product version.

♦ DB2 EEE loader must be on the same machine as PowerCenter Server.
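If you keep a record of each server's configuration, you can check the "configure the same" requirement mechanically. The following sketch assumes a hypothetical inventory held as plain Python dictionaries keyed by the parameter names listed above. PowerCenter does not export settings in this form, so how you collect the values is up to you.

```python
# Parameters that the guide says must be configured the same on every
# PowerCenter Server in a grid.
MUST_MATCH = [
    "Fail session if maximum number of concurrent sessions is reached",
    "PMServer 4.0 date handling compatibility",
    "Aggregate treat null as zero",
    "Aggregate treat rows as insert",
    "Treat CHAR as CHAR on read",
    "Data Movement Mode",
    "Validate Data Code Pages",
    "Output Session Log In UTF8",
    "Export Session Log Lib Name",
    "Treat Null in comparison operator as",
    "Data Display Format",
]


def find_mismatches(servers):
    """Return the settings whose values differ between servers.

    servers: dict mapping a server name to a dict of {setting: value}
             (a hypothetical, hand-maintained inventory).
    Returns {setting: {server name: value}} for each mismatched setting.
    A setting missing from one server's dict is reported as None.
    """
    mismatches = {}
    for setting in MUST_MATCH:
        values = {name: config.get(setting) for name, config in servers.items()}
        if len(set(values.values())) > 1:
            mismatches[setting] = values
    return mismatches


if __name__ == "__main__":
    inventory = {
        "ServerA": {"Data Movement Mode": "Unicode"},
        "ServerB": {"Data Movement Mode": "ASCII"},
    }
    for setting, values in find_mismatches(inventory).items():
        print(setting, values)
```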


Configuring Server Grids

When you work with server grids, you can configure properties in the grid, workflow, and session. When you run a session using a server grid, the server grid evaluates session properties first, then workflow properties, and then grid properties.

Configuring Server Grid Properties

By default, each PowerCenter Server you add to the server grid can be both a master server and a worker server. Each server accepts tasks from the grid. You can configure a server to be only a master server by clearing Accept task from Server Grid. A PowerCenter Server that is only a master server does not run sessions from other servers in the grid, but it can distribute sessions to other servers in the grid.

Configuring Workflow Properties

When you configure a workflow, you can configure the following server properties:

♦ You can assign a server to run the workflow. When you assign a server to a workflow, the server becomes the master server for the workflow.

♦ You can configure the entire workflow to run only on the master server. By default, the master server distributes sessions to worker servers. You can configure the session to override this workflow configuration.

Configuring Session Properties

You can assign a server to run a session. When you assign a server to a session, you override workflow and grid server assignments. You might want to assign a server to sessions that use the following features:

♦ Caching. When you run sessions that access large cache files, such as incremental aggregation files, you can increase performance by using a drive local to the PowerCenter Server for the cache directory. Assign a server to a session and override the server variable to write to a drive local to the PowerCenter Server.

♦ External loader. Assign a server to run DB2 EEE external loader sessions. DB2 EEE loaders require that the loader process runs on the PowerCenter Server running the session.

Note: If you assign a session to a server that is not in the grid, and the master server cannot connect to that server, the session fails.
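The precedence among session, workflow, and grid properties can be summarized in a few lines. This is a minimal sketch of the rules as this chapter describes them. The function and parameter names are hypothetical, and the sketch does not model connectivity failures.

```python
def resolve_session_server(master, session_server=None, workflow_master_only=False):
    """Return the server that runs a session, or None if the master
    server is free to distribute the session to the grid.

    master:               server assigned to the workflow (the master server)
    session_server:       server assigned in the session properties, if any
    workflow_master_only: True if the workflow is configured so that all
                          tasks must run on the master server
    """
    # Session properties are evaluated first and override everything else,
    # including a server configured not to accept tasks from the grid.
    if session_server is not None:
        return session_server
    # Workflow properties are evaluated next.
    if workflow_master_only:
        return master
    # Otherwise grid properties apply: the master server distributes the
    # session round-robin among servers that accept tasks from the grid.
    return None


if __name__ == "__main__":
    print(resolve_session_server("Server A", "Server B", workflow_master_only=True))
```

A session assigned to Server B runs on Server B even when the workflow is configured to run all tasks on Server A.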


Override Examples

Table 15-2 shows a configuration where the session properties override the workflow properties. The session runs on Server B, even though you select the workflow option to run all tasks on Server A, because the session is assigned to Server B.

Table 15-2. Override Workflow Properties

Level       Configuration
Grid        - Server A accepts tasks from server grid.
            - Server B accepts tasks from server grid.
Workflow    - Run on Server A.
            - Tasks must run on server.
Session     Run on Server B.

Table 15-3 shows a configuration where the session properties override the server grid properties. The session runs on Server B, even though you configure Server B not to accept tasks from the grid, because you assigned the session to Server B.

Table 15-3. Override Server Grid Properties

Level       Configuration
Grid        - Server A accepts tasks from server grid.
            - Server B does not accept tasks from server grid.
Workflow    - Run on Server A.
            - Tasks can run on other servers in the grid.
Session     Run on Server B.

Steps for Creating a Server Grid

Use the Server Grid Browser to create and edit server grids. When you create or edit a server grid, you can choose servers from the list of available servers. A server is available if it is registered in the same repository and is not part of another server grid. You can add up to 64 PowerCenter Servers in a grid.

Use the following procedure to create a server grid.

To create a server grid:

1. Choose Server-Server Grid.

The Server Grid Browser opens.

2. Click New.

The Server Grid Editor opens with a list of available PowerCenter Servers.

3. Enter a server grid name and description.

4. Select the server you want to include in the server grid, and click Add.

The selected server appears in the Selected Servers column.

5. Clear Accept tasks from Server Grid if you want the server to be only a master server.

6. Repeat steps 4 and 5 until you have chosen all the servers for the grid.



7. Click OK.

The server grid name appears in the Server Grid Browser. Select Show servers in grid to view the servers in the grid.

8. Click Close.



Chapter 16

Log Files

♦ Overview

♦ Workflow Logs

♦ Session Logs

♦ Reject Files

Overview

The PowerCenter Server can create log files for each workflow it runs. These files contain information about the tasks the PowerCenter Server performs, plus statistics about the workflow and all sessions in the workflow. If the writer or target database rejects data during a session run, the PowerCenter Server creates a file that contains the rejected rows.

The PowerCenter Server can create the following types of log files:

♦ Workflow log. Contains information about the workflow run such as workflow name, tasks executed, and workflow errors. By default, the PowerCenter Server writes this information to the server log or Windows Event Log, depending on how you configure the PowerCenter Server. To create a workflow log, enter a workflow log file name in the workflow properties. For more information, see “Workflow Logs” on page 457.

♦ Session log. Contains information about the tasks that the PowerCenter Server performs during a session, plus load summary and transformation statistics. By default, the PowerCenter Server creates one session log for each session it runs. If a workflow contains multiple sessions, the PowerCenter Server creates a separate session log for each session in the workflow. For more information, see “Session Logs” on page 463.

♦ Reject file. Contains rows rejected by the writer or target file during a session run. If the writer or target does not reject any data during a session, the PowerCenter Server does not generate a reject file for that session. For more information, see “Reject Files” on page 476.

By default, the PowerCenter Server saves each type of log file in its own directory. The PowerCenter Server represents these directories using server variables.

Table 16-1 shows the default location for each type of log file:

Table 16-1. Log File Default Locations

Log File Type    Default Directory (Server Variable)    Value
Workflow logs    $PMWorkflowLogDir                      $PMRootDir/WorkflowLogs
Session logs     $PMSessionLogDir                       $PMRootDir/SessLogs
Reject files     $PMBadFileDir                          $PMRootDir/BadFiles

You can change the default directories at the server level by editing the server connection in the Workflow Manager. You can also override these values for individual workflows or sessions by updating the workflow or session properties.
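To see how the defaults in Table 16-1 resolve to real paths, you can expand $PMRootDir outside the product. The sketch below uses Python's string.Template, whose $name syntax happens to match the server variable notation. The helper name and the example root directory are hypothetical, and this is not how the PowerCenter Server itself resolves variables.

```python
from string import Template

# Default values from Table 16-1, keyed by server variable name
# (without the leading dollar sign).
DEFAULT_LOG_DIRS = {
    "PMWorkflowLogDir": "$PMRootDir/WorkflowLogs",
    "PMSessionLogDir": "$PMRootDir/SessLogs",
    "PMBadFileDir": "$PMRootDir/BadFiles",
}


def expand_log_dirs(root_dir, overrides=None):
    """Return each log directory with $PMRootDir replaced by root_dir.

    overrides: optional {variable name: value} that replaces a default,
               mirroring a change made at the server level.
    """
    values = dict(DEFAULT_LOG_DIRS)
    values.update(overrides or {})
    return {
        name: Template(value).substitute(PMRootDir=root_dir)
        for name, value in values.items()
    }


if __name__ == "__main__":
    # "/opt/pcserver" is an example root directory, not a product default.
    for name, path in expand_log_dirs("/opt/pcserver").items():
        print(f"${name} = {path}")
```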


Workflow Logs

You can configure a workflow to create a workflow log. When you do this, the PowerCenter Server writes information such as process initialization, workflow task run information, errors encountered, and workflow run summary to the workflow log.

In general, a workflow log contains the following information about the workflow:

♦ Workflow name

♦ Workflow status

♦ Status of tasks and worklets in the workflow

♦ Start and end times for tasks and worklets

♦ Results of link conditions

♦ Some session messages and errors

♦ Errors encountered during the workflow

The PowerCenter Server categorizes workflow log error messages into severity levels. The PowerCenter Server either writes or does not write an error message to the log file based on the error severity level. You can set the Error Severity Level for Log Files in the PowerCenter Server setup program. For more information, see “Installing and Configuring the PowerCenter Server on Windows” or “Installing and Configuring the PowerCenter Server on UNIX” in the Installation and Configuration Guide. You can also configure the PowerCenter Server to suppress writing messages to the workflow log file completely.

As with PowerCenter Server logs and session logs, the PowerCenter Server enters a code number into the workflow log file message along with message text. You can find information on error messages in the Troubleshooting Guide.

You configure a workflow to create a workflow log by entering a workflow log file name in the workflow properties. If you choose to create a workflow log, the PowerCenter Server saves the workflow log in a directory entered for the server variable $PMWorkflowLogDir in the PowerCenter Server registration. You can override the workflow log directory at the server level or at the workflow level.

By default, the PowerCenter Server saves one workflow log for each workflow. If you want to save multiple logs for different workflow runs, you can configure the workflow to save a workflow log file by timestamp, which permits an unlimited number of workflow logs, or by run, which saves a specified number of logs. To view previous workflow logs, save log files by timestamp.

If you choose not to create workflow logs, the PowerCenter Server writes the workflow log messages to the server log or Windows Event Log, depending on how you configure the PowerCenter Server. For more information on configuring the PowerCenter Server, see “Installing and Configuring the PowerCenter Server on Windows” or “Installing and Configuring the PowerCenter Server on UNIX” in the Installation and Configuration Guide.


Workflow Log Messages

The PowerCenter Server precedes each message in the log file with a code and number. It also precedes some messages with a timestamp. The code defines a group of messages for a specific process. The number defines a specific message. The message can provide general information or it can be an error message.

You can configure the PowerCenter Server to append a time stamp to every message it writes to the workflow log. To do this, enable the Time Stamp Workflow Log option in the PowerCenter Server setup program. For more information, see “Installing and Configuring the PowerCenter Server on Windows” or “Installing and Configuring the PowerCenter Server on UNIX” in the Installation and Configuration Guide.

Workflow Log Codes

You can use the workflow log to determine the cause of workflow problems. To resolve workflow problems, locate the relevant log file codes and text prefixes in the workflow log, then see the Troubleshooting Guide for details. You can find workflow-related server messages in the UNIX server log (default name: pmserver.log) or in the Windows Event Log (viewed with the Event Viewer).

Table 16-2 describes the codes that can appear in workflow logs:

Table 16-2. Workflow Log Codes

Error Code    Description
CMN           Messages related to databases, memory allocation, Lookup and Joiner transformations, and internal errors.
LM            Messages related to the Load Manager.
REP           Messages related to repository functions.
TM            Messages related to Data Transformation Manager (DTM).
VAR           Messages related to mapping variables.

Workflow Log Sample

The following sample is a workflow log from a simple workflow that shows log file codes:

INFO : LM_36315 [Tue Nov 18 11:16:38 2003] : (270|305) Starting execution of workflow [wf_PhoneList].

INFO : LM_36330 [Tue Nov 18 11:16:38 2003] : (270|305) Starting execution of start instance [StartWorkflow].

INFO : LM_36333 [Tue Nov 18 11:16:38 2003] : (270|305) Execution of start instance [StartWorkflow] succeeded.

INFO : LM_36505 : (270|305) Link [StartWorkflow --> s_PhoneList]: empty expression string, evaluated to TRUE.

INFO : LM_36330 [Tue Nov 18 11:16:38 2003] : (270|305) Starting execution of session instance [s_PhoneList].


INFO : LM_36522 : (270|305) Started DTM process [pid = 273] for session instance [s_PhoneList].

INFO : CMN_1760 : (273|255) Message from session: LM_36033 [Connected to repository [SALES] running on server:port [monster]:[5001] user [Administrator]].

INFO : CMN_1760 : (273|255) Message from session: TM_6228 [Writing session output to log file [d:\pcserver\SessLogs\s_PhoneList.log].].

INFO : LM_36333 [Tue Nov 18 11:16:43 2003] : (270|306) Execution of session instance [s_PhoneList] succeeded.

INFO : LM_36318 [Tue Nov 18 11:16:43 2003] : (270|306) Execution of workflow [wf_PhoneList] succeeded.
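If you need to search or summarize workflow logs, you can split each line into its parts. The following sketch is based only on the line layout visible in the sample above: severity, code and number, an optional timestamp, a pair of numbers in parentheses, and the message text. The guide does not define the two numbers in parentheses, so the sketch returns them under the hypothetical names id1 and id2. Timestamp parsing assumes English day and month names.

```python
import re
from datetime import datetime

# Pattern inferred from the sample log lines; not an official grammar.
LINE_RE = re.compile(
    r"^(?P<severity>\w+) : (?P<code>[A-Z]+)_(?P<number>\d+)"
    r"(?: \[(?P<timestamp>[^\]]+)\])?"
    r" : \((?P<id1>\d+)\|(?P<id2>\d+)\) (?P<text>.*)$"
)


def parse_workflow_log_line(line):
    """Return a dict of fields for one workflow log line, or None if the
    line does not follow the layout shown in the sample."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["number"] = int(fields["number"])
    if fields["timestamp"] is not None:
        # Example timestamp: Tue Nov 18 11:16:38 2003
        fields["timestamp"] = datetime.strptime(
            fields["timestamp"], "%a %b %d %H:%M:%S %Y"
        )
    return fields


if __name__ == "__main__":
    sample = (
        "INFO : LM_36318 [Tue Nov 18 11:16:43 2003] : (270|306) "
        "Execution of workflow [wf_PhoneList] succeeded."
    )
    print(parse_workflow_log_line(sample))
```

Lines without a timestamp, such as the LM_36505 and CMN_1760 messages in the sample, parse with the timestamp field set to None.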

Configuring Workflow Logs

You can configure workflow log options in the workflow properties. You can configure the following information for a workflow log:

♦ Location. You can configure the directory where you want the workflow log created. By default, the PowerCenter Server creates the workflow log in the directory configured for the $PMWorkflowLogDir server variable. You can enter a different directory, but if the directory does not exist or is not local to the PowerCenter Server that runs the workflow, the workflow fails.

♦ Name. If you wish to create a workflow log, you can enter a name for the workflow log file. If you do not enter a filename, the PowerCenter Server does not create a workflow log. Instead, the PowerCenter Server writes workflow log messages to the Windows Event Log or UNIX server log.

♦ Archive. You can configure the number of workflow logs you want the PowerCenter Server to archive for each workflow. By default, the PowerCenter Server does not archive workflow logs.

Archiving Workflow Logs

By default, the PowerCenter Server does not save multiple logs for a single workflow. It creates one workflow log for each workflow and overwrites the existing log with the latest workflow log.

If you wish to save multiple logs for a workflow, you can configure the PowerCenter Server to do this. The PowerCenter Server can save workflow logs in two ways:

♦ Save a selected number of logs

♦ Save all logs by timestamp

If you configure the workflow to save a specific number of workflow logs, it names the most recent log filename.log. It then cycles through a closed naming sequence for historical logs as follows: filename.log.0, filename.log.1, filename.log.2, …, filename.log.n-1, where n represents the number of workflow logs. Because the PowerCenter Server cycles through the numeric naming sequence, check the workflow log file timestamp to determine the chronological order of those files.
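The naming sequence can be spelled out with a short helper. This is a sketch with hypothetical function names: the first function lists the file names described above for a given number of saved logs, and the second orders existing files by modification time, which is one way to recover chronological order when the numeric suffixes have cycled.

```python
import os


def archived_log_names(filename, runs):
    """List the log file names used when saving `runs` historical logs.

    The most recent log keeps the plain file name. Historical logs use
    the suffixes .0 through .(runs - 1).
    """
    if runs < 0:
        raise ValueError("runs must be zero or greater")
    return [filename] + [f"{filename}.{index}" for index in range(runs)]


def chronological(paths):
    """Return existing log files ordered from oldest to newest, using the
    file modification time rather than the numeric suffix."""
    return sorted(paths, key=os.path.getmtime)


if __name__ == "__main__":
    # "wf_PhoneList.log" is an example workflow log file name.
    for name in archived_log_names("wf_PhoneList.log", 3):
        print(name)
```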


Instead of entering a specific number of workflow logs to save, you can use the server variable $PMWorkflowLogCount. When you use the $PMWorkflowLogCount server variable, the PowerCenter Server archives the number of workflow logs configured for the server variable. If you use $PMWorkflowLogCount for all workflows, you can increase the number of archived workflow logs for all workflows by changing the server variable.

Note: By default, $PMWorkflowLogCount is set to 0. To archive workflow logs using $PMWorkflowLogCount, configure it for a larger number of workflow logs. For details on configuring server variables, see “Registering the PowerCenter Server” on page 46.

You can also save all workflow logs by configuring a workflow to save logs by timestamp. When timestamping workflow logs, the PowerCenter Server appends the year, month, day, hour, and minute of the workflow completion to the log file. The resulting log file name is filename.log.yyyymmddhhmi, where:

♦ yyyy = year

♦ mm = month, ranging from 1-12

♦ dd = day, ranging from 1-31

♦ hh = hour, ranging from 0-23

♦ mi = minute, ranging from 0-59
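The timestamp suffix maps directly onto a date format string. The sketch below builds and parses names of the form filename.log.yyyymmddhhmi. It assumes that each field is zero-padded to the width shown in the pattern, which the guide implies but does not state, and the function names are hypothetical.

```python
from datetime import datetime

# yyyymmddhhmi expressed as a strftime/strptime format
# (assumes zero-padded fields).
SUFFIX_FORMAT = "%Y%m%d%H%M"


def timestamped_log_name(filename, completed_at):
    """Build the archived log name for a workflow run that completed at
    the given datetime."""
    return f"{filename}.{completed_at.strftime(SUFFIX_FORMAT)}"


def parse_timestamped_log_name(name):
    """Split an archived log name into (base file name, completion time).
    Raises ValueError if the suffix is not a valid timestamp."""
    base, _, suffix = name.rpartition(".")
    return base, datetime.strptime(suffix, SUFFIX_FORMAT)


if __name__ == "__main__":
    name = timestamped_log_name("wf_PhoneList.log", datetime(2003, 11, 18, 11, 16, 43))
    print(name)
    print(parse_timestamped_log_name(name))
```

Because the suffix stops at the minute, two runs that complete within the same minute would map to the same name in this sketch.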

To prevent filling the workflow log directory, periodically delete or back up log files when using the timestamp option.
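One way to keep the directory under control is a scheduled script that finds timestamped logs older than a retention period. The following sketch only lists candidates and leaves deleting or backing up to you. The function name, the retention period, and the directory path in the usage example are hypothetical, and the suffix is assumed to be twelve zero-padded digits.

```python
import re
from datetime import datetime, timedelta
from pathlib import Path

# Matches a trailing .yyyymmddhhmi suffix (assumed zero-padded).
SUFFIX_RE = re.compile(r"\.(\d{12})$")


def expired_logs(log_dir, keep_days, now=None):
    """Return timestamped log files in log_dir whose suffix is older than
    keep_days. Files without a valid timestamp suffix are ignored."""
    cutoff = (now or datetime.now()) - timedelta(days=keep_days)
    expired = []
    for path in sorted(Path(log_dir).iterdir()):
        match = SUFFIX_RE.search(path.name)
        if match is None:
            continue
        try:
            stamp = datetime.strptime(match.group(1), "%Y%m%d%H%M")
        except ValueError:
            continue  # twelve digits that do not form a valid date
        if stamp < cutoff:
            expired.append(path)
    return expired


if __name__ == "__main__":
    # Example directory; substitute the directory behind $PMWorkflowLogDir.
    log_dir = Path("/opt/pcserver/WorkflowLogs")
    if log_dir.is_dir():
        for path in expired_logs(log_dir, keep_days=30):
            print("candidate for deletion or backup:", path)
```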

Note: You can also truncate workflow and session log entries from the repository. For more information, see “Using the Repository Manager” in the Repository Guide.

Steps for Configuring Workflow Logs

You can configure workflow log information on the Properties tab of the workflow properties.

To configure workflow log information:

1. In the Workflow Manager, open the workflow properties.


2. Select the Properties tab.

3. Enter the following workflow log options:

Option: Parameter File Name
Description: Designates the name and directory for the parameter file. Use the parameter file to define workflow parameters. For details on parameter files, see “Parameter Files” on page 511.

Option: Workflow Log File Name
Description: Optionally enter a file name, or a file name and directory. If you leave this field blank, the PowerCenter Server does not create a workflow log. Instead, the PowerCenter Server writes workflow log messages to the server log or Windows Event Log, depending on how you configure the PowerCenter Server. If you fill in this field, the PowerCenter Server appends information in this field to that entered in the Workflow Log File Directory field. For example, if you have "C:\workflow_logs\" in the Workflow Log File Directory field, then enter "logname.txt" in the Workflow Log File Name field, the PowerCenter Server writes logname.txt to the C:\workflow_logs\ directory.

Option: Workflow Log File Directory
Description: Designates a location for the workflow log file. By default, the PowerCenter Server writes the log file in the server variable directory, $PMWorkflowLogDir. If you enter a full directory and file name in the Workflow Log File Name field, clear this field.

Option: Save Workflow Log By
Description: If you select Save Workflow Log by Timestamp, the PowerCenter Server saves all workflow logs, appending a timestamp to each log. If you select Save Workflow Log by Runs, the PowerCenter Server saves a designated number of workflow logs. Configure the number of workflow logs in the Save Workflow Log for These Runs option. For details on these options, see “Archiving Workflow Logs” on page 459. You can also use the $PMWorkflowLogCount server variable to save the configured number of workflow logs for the PowerCenter Server.

Option: Save Workflow Log for These Runs
Description: The number of historical workflow logs you want the PowerCenter Server to save. The PowerCenter Server saves the number of historical logs you specify, plus the most recent workflow log. Therefore, if you specify 5 runs, the PowerCenter Server saves the most recent workflow log, plus historical logs 0 to 4, for a total of 6 logs. You can specify up to 2,147,483,647 historical logs. If you specify 0 logs, the PowerCenter Server saves only the most recent workflow log.

4. Click OK to save the workflow.

Viewing Workflow Logs

Workflow logs are text files that you can open with any text editor. The PowerCenter Server saves workflow logs in the directory you specify in the Workflow Log File Directory field in the workflow properties.

You can also view workflow logs through the Workflow Monitor. When you do this, the Workflow Manager creates a temporary file that stores the workflow log. You can view the temporary file through the Workflow Monitor.

The PowerCenter Server generates the workflow log based on the PowerCenter Server code page. You can specify the language in which you want to view the workflow log based on the locale of the machine hosting the PowerCenter Server.

To use the Workflow Monitor to view the most recent workflow log:

1. In the Navigator window, connect to the server on which the workflow runs.

2. Open the folder that contains the workflow.

3. Right-click the workflow and choose Get Workflow Log.

If you save workflow logs by timestamp, you can also use the Workflow Monitor to view past workflow logs. To do this, right-click the workflow in the Gantt chart view and choose Get Workflow Log.

For more information about the Workflow Monitor, see “Using the Workflow Monitor” on page 404.



Session Logs

The session log file contains information about all tasks the PowerCenter Server performs, plus the load summary and transformation statistics. The amount of detail in the session log depends on the tracing level that you set. You can define the tracing level for each transformation or for the entire session. The session-level tracing overrides any transformation-level tracing levels.

In general, the session log contains the following information about the session:

♦ Allocation of system shared memory

♦ Execution of pre-session commands

♦ Creation of SQL commands for reader and writer threads

♦ Start and end times for target loading

♦ Errors encountered during session

♦ Execution of post-session commands

♦ Load summary of reader, writer, and Data Transformation Manager (DTM) statistics

By default, the PowerCenter Server saves session logs in the directory for the PowerCenter Server variable $PMSessionLogDir, which you define in the Workflow Manager. The default name for the session log is s_mapping name.log. You can override the session log name and location in the session properties.

The PowerCenter Server does not archive session logs by default. Instead, it creates one log for each session and overwrites the existing log with the latest session log. However, you can configure the session to archive session logs. For more information, see “Archiving Session Logs” on page 471.

By default, the PowerCenter Server generates session log files based on the PowerCenter Server code page. However, if you enable the Output Session Log in UTF-8 option on the Configuration tab of the PowerCenter Server setup program, the PowerCenter Server writes to the session log using the UTF-8 character set.

Note: By default, the PowerCenter Server writes row errors to the session log. However, if you enable row error logging in the session properties, the PowerCenter Server does not write dropped rows to the session log. When you enable row error logging, you can configure the PowerCenter Server to write row errors to the session log in addition to the row error log by enabling verbose data tracing.

Session Log Messages

The PowerCenter Server precedes each message in the log file with a thread identification and then a code and number. The code defines a group of messages for a specific process. The number defines a specific message. The message can provide general information or it can be an error message.


You can configure the PowerCenter Server to write session log messages to an external library as well as to the session log. To do this, you can set the Export Session Log Lib Name in the PowerCenter Server setup program. For more information, see “Installing and Configuring the PowerCenter Server on Windows” or “Installing and Configuring the PowerCenter Server on UNIX” in the Installation and Configuration Guide.

Session Log CodesYou can use the session log to determine the cause of session problems. To resolve session problems, locate the relevant log file codes and text prefixes in the session log, then see the Troubleshooting Guide for details. You can find session-related server messages in the UNIX server log (default name: pmserver.log) or in the Windows Event Log (viewed with the Event Viewer).

Table 16-3 describes the codes that can appear in session logs:

Table 16-3. Session Log Codes

Message Code Description

BLKR Messages related to reader process, including Application, relational, or flat file.

CNX Messages related to the Repository Agent connections.

CMN Messages related to databases, memory allocation, Lookup and Joiner transformations, and internal errors.

DBG Messages related to PowerCenter Server loading and debugging.

DBGR Messages related to the Debugger.

EP Messages related to external procedures.

ES Messages related to the Repository Server.

FR Messages related to file sources.

FTP Messages related to File Transfer Protocol operations.

HIER Messages related to reading XML sources.

LM Messages related to the Load Manager.

NTSERV Messages related to Windows server operations.

OBJM Messages related to the Repository Agent.

ODL Messages related to database functions.

PETL Messages related to pipeline partitioning.

PMF Messages related to caching Aggregator, Rank, Joiner, or Lookup transformations.

RAPP Messages related to the Repository Agent.

REP Messages related to repository functions.

RR Messages related to relational sources.

SF Messages related to server framework, used by Load Manager and Repository Server.


Thread IdentificationThe thread identification consists of the thread type and a series of numbers separated by underscores. The numbers following a thread name indicate the following information:

♦ Target load order group number

♦ Partition point number

♦ Partition number

Note: The PowerCenter Server writes an asterisk (*) as the partition point number for writer threads.

The PowerCenter Server prints the thread identification before the log file code and the message text in the session log. The following example illustrates a reader thread from target load order group one, concurrent source set one, source pipeline one, and partition one:

READER_1_1_1> DBG_21438 Reader: Source is [p152636], user [jennie]

For more information on partitioning, see “Pipeline Partitioning” on page 345.

When you configure the PowerCenter Server to read Joiner transformation sources sequentially, the PowerCenter Server writes numbers with the following information after the thread name:

♦ Target load order group number

♦ Concurrent source set number

♦ Partition point number

♦ Partition number

A concurrent source set is the group of sources in a target load order group the PowerCenter Server reads concurrently. A target load order group might contain multiple concurrent source sets if it contains a Joiner transformation and you configure the PowerCenter Server to read Joiner transformation sources sequentially.
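The numbering scheme described above can be sketched in a few lines of Python. This is only an illustration, not part of PowerCenter: the names `ThreadId` and `parse_thread_id` are hypothetical, and the sketch assumes that thread type names contain no underscores and that a four-number suffix always means the PowerCenter Server reads Joiner transformation sources sequentially.

```python
from typing import NamedTuple, Optional


class ThreadId(NamedTuple):
    thread_type: str
    target_load_order_group: Optional[int]
    concurrent_source_set: Optional[int]  # present only with sequential Joiner reads
    partition_point: Optional[str]        # kept as text because writer threads use "*"
    partition: Optional[int]


def parse_thread_id(text: str) -> ThreadId:
    """Split a thread identification such as 'READER_1_1_1>' into its parts."""
    name, *numbers = text.strip().rstrip(">").split("_")
    if not numbers:
        # Threads such as MASTER or MAPPING carry no numbers.
        return ThreadId(name, None, None, None, None)
    if len(numbers) == 3:
        group, point, partition = numbers
        source_set = None
    elif len(numbers) == 4:
        group, source_set, point, partition = numbers
    else:
        raise ValueError(f"unexpected thread identification: {text!r}")
    return ThreadId(
        name,
        int(group),
        None if source_set is None else int(source_set),
        point,
        int(partition),
    )


if __name__ == "__main__":
    for sample in ("READER_1_1_1>", "WRITER_1_*_1>", "MASTER>"):
        print(parse_thread_id(sample))
```

The partition point is kept as a string so that the asterisk written for writer threads survives parsing unchanged.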

Table 16-3. Session Log Codes

Message Code Description

SORT Messages related to the Sorter transformation.

TE Messages related to transformations.

TM Messages related to Data Transformation Manager (DTM).

TT Messages related to transformations.

VAR Messages related to mapping variables.

WRT Messages related to the Writer.

XMLR Messages related to the XML Reader.

XMLW Messages related to the XML Writer.


To configure the PowerCenter Server to read Joiner transformation sources sequentially, enable the PMServer 6.X Joiner source order compatibility option for the PowerCenter Server.

Session Log Sample

The following sample is an excerpt from a session log file that illustrates log file codes and thread identifications:

TM_6703 Session [s_m_SampleSessionLog] is run by PowerCenter Server [sarao].

MASTER> CMN_1688 Allocated [12000000] bytes from process memory for [DTM Buffer Pool].

MASTER> PETL_24000 Parallel Pipeline Engine initializing.

MASTER> PETL_24001 Parallel Pipeline Engine running.

MASTER> PETL_24003 Initializing session run.

MAPPING> TM_6014 Initializing session [s_m_SampleSessionLog] at [Tue Aug 03 11:29:57 2004]

.

.

.

*****START LOAD SESSION*****

Load Start Time: Tue Aug 03 11:30:00 2004

Target tables:

Emp_target

READER_1_1_1> BLKR_16019 Read [1] rows, read [0] error rows for source table [EMP_SRC] instance name [EMP_SRC]

READER_1_1_1> BLKR_16008 Reader run completed.

TRANSF_1_1_1> DBG_21216 Finished transformations for Source Qualifier [SQ_EMP_SRC]. Total errors [0]

WRITER_1_*_1> WRT_8167 Start loading table [Emp_target] at: Tue Aug 03 11:30:00 2004

.

MASTER> PETL_24002 Parallel Pipeline Engine finished.

MASTER> PETL_24012 Session run completed successfully.
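The structure of the sample lines above (an optional thread identification, a log file code, and the message text) can be pulled apart with a regular expression. The following is a minimal sketch under the assumption that every coded line follows the layout shown in this excerpt; `parse_log_line` and `count_by_prefix` are hypothetical helper names, and lines without a code, such as the load banner, are skipped.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple

LOG_LINE = re.compile(
    r"^(?:(?P<thread>[A-Z]+(?:_[0-9*]+)*)>\s+)?"  # optional thread identification
    r"(?P<prefix>[A-Z]+)_(?P<number>\d+)\s+"       # log file code, for example PETL_24000
    r"(?P<text>.*)$"                               # message text
)


def parse_log_line(line: str) -> Optional[Tuple[Optional[str], str, int, str]]:
    """Return (thread, code prefix, code number, message text), or None."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    return match["thread"], match["prefix"], int(match["number"]), match["text"]


def count_by_prefix(lines: Iterable[str]) -> Counter:
    """Count coded messages per code prefix, ignoring lines without a code."""
    parsed = (parse_log_line(line) for line in lines)
    return Counter(entry[1] for entry in parsed if entry is not None)


if __name__ == "__main__":
    sample = [
        "MASTER> PETL_24000 Parallel Pipeline Engine initializing.",
        "READER_1_1_1> BLKR_16008 Reader run completed.",
        "*****START LOAD SESSION*****",
    ]
    print(count_by_prefix(sample))
```

Grouping by prefix makes it easy to look up the relevant row in the session log code table when scanning a long log.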


Some messages are embedded within other messages. For example, message CMN_1039 contains informational messages from Microsoft SQL Server as it switches to the source database used in the session.

Note: If you configure the PowerCenter Server to run in ASCII mode, the session log file reports the sort order as Binary, even if you select a different sort order in the session properties.

Load Summary

The session log includes a load summary that reports the number of rows inserted, updated, deleted, and rejected for each target as of the last commit point. The PowerCenter Server reports the load summary for each session by default. However, you can set the tracing level to Verbose Initialization or Verbose Data to report the load summary for each transformation.

The following sample is an excerpt from a load summary:

*****START LOAD SESSION*****

Load Start Time: Tue Aug 03 11:30:00 2004

Target tables:

Emp_target

Commit on end-of-data Aug 03 11:30:07 2004

===================================================

WRT_8036 Target: Emp_target (Instance Name: [Emp_target])


WRITER_1_*_1> WRT_8035 Load complete time: Tue Aug 03 11:30:07 2004

LOAD SUMMARY

============


WRT_8036 Target: Emp_target (Instance Name: [Emp_target])


.

.

.

WRITER_1_*_1> WRT_8043 *****END LOAD SESSION*****

The PowerCenter Server reports statistics for each of the following operations performed on the target:

♦ Inserted. Shows the number of rows the PowerCenter Server marked for insert into the target. For this operation, the number of affected rows cannot be larger than the number of requested rows.

♦ Updated. Shows the number of rows the PowerCenter Server marked for update in the target. The number of affected rows can be different from the number of requested rows. For example, you have a table with one column called SALES_ID and five rows containing the values: 1, 2, 3, 2, and 2. You mark rows for update where SALES_ID is 2. The writer affects three rows, even though there was only one update request. Or, if you mark rows for update where SALES_ID is 4, the writer affects 0 rows.

♦ Deleted. Shows the number of rows the PowerCenter Server marked to remove from the target. The number of affected rows can be different from the number of requested rows.

♦ Rejected. Shows the number of rows the PowerCenter Server rejected during the writing process. These rows cannot be applied to the target. For the Rejected rows category, the numbers of affected and applied rows are always zero because these rows are not written to the target.

The load summary provides the following statistics:

♦ Requested rows. Shows the number of rows the writer actually received for the specified operation.

♦ Applied rows. Shows the number of rows the writer successfully applied to the target (that is, the target returned no errors).

♦ Affected rows. Shows the number of rows affected by the specified operation. Depending on the operation, the number of affected rows can be different from the number of requested rows. For example, you have a table with one column called SALES_ID and five rows containing the values: 1, 2, 3, 2, and 2. You mark rows for update where SALES_ID is 2. The writer affects three rows, even though there was only one update request. Or, if you mark rows for update where SALES_ID is 4, the writer affects 0 rows.

♦ Rejected rows. Shows the number of rows the writer could not apply to the target. For example, the target database rejects a row if the PowerCenter Server attempts to insert NULL into a not-null field. The PowerCenter Server writes all rejected rows to the session reject file, or to the row error log, depending on how you configure the session.


♦ Mutated from update. Shows the number of rows originally flagged for update that are instead inserted into the target when the session is configured for Update Else Insert.

If the number of rows requested, applied, rejected, and affected are all zero for any of these four operations, the operation does not appear as a line in the load summary. If no data is passed to the target, the writer reports the following message:

No data loaded for this target.
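The difference between requested rows and affected rows in the SALES_ID example above can be reproduced with any relational database. The sketch below uses an in-memory SQLite table as a stand-in for a target; it does not involve the PowerCenter writer, and the function name `affected_rows_for_update` is hypothetical. Each call represents a single update request.

```python
import sqlite3
from typing import Iterable


def affected_rows_for_update(sales_ids: Iterable[int], key: int) -> int:
    """Issue one update request and return how many rows the database affected."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (SALES_ID INTEGER)")
        conn.executemany(
            "INSERT INTO sales (SALES_ID) VALUES (?)",
            [(value,) for value in sales_ids],
        )
        # One request: update every row whose SALES_ID matches the key.
        cursor = conn.execute(
            "UPDATE sales SET SALES_ID = SALES_ID WHERE SALES_ID = ?", (key,)
        )
        return cursor.rowcount
    finally:
        conn.close()


if __name__ == "__main__":
    values = [1, 2, 3, 2, 2]
    print("SALES_ID = 2:", affected_rows_for_update(values, 2))
    print("SALES_ID = 4:", affected_rows_for_update(values, 4))
```

With the five values from the example, the request for SALES_ID 2 touches three rows and the request for SALES_ID 4 touches none, which is why the Affected count in a load summary can differ from the Requested count.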

Detailed Transformation Statistics

The DTM writes transformation statistics to the session log at two tracing levels: Verbose Initialization and Verbose Data. Transformation statistics appear after the load summary in the log file.

The PowerCenter Server reports the following details for each transformation in the mapping:

♦ The name of the transformation

♦ The number of input rows and the name of the input source

♦ The number of output rows and the name of the output transformation or target

♦ The number of rows dropped

The following sample is an excerpt from the transformation statistics in a session log file:

DETAILED TRANSFORMATION ROW STATISTICS

for DSQ [SQ_EMPLOYEES], Partition[1]

---------------------------------

MAPPING>

MAPPING> TT_11031 Transformation [SQ_EMPLOYEES]:

MAPPING> TT_11035 Input - 12 (__READER__)

MAPPING> TT_11037 [T_EMPLOYEES]: Output - 12, Dropped - 0

MAPPING>

.

.

.
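If you need the row counts from this section of the log in a script, the three message codes in the excerpt above are enough to rebuild them. The sketch below is an illustration only: it assumes that TT_11031, TT_11035, and TT_11037 always use the layout shown in this excerpt, and `parse_transformation_stats` is a hypothetical helper name.

```python
import re
from typing import Dict, Iterable, List

TRANSFORMATION = re.compile(r"TT_11031 Transformation \[(?P<name>[^\]]+)\]")
INPUT = re.compile(r"TT_11035 Input - (?P<rows>\d+) \((?P<source>[^)]*)\)")
OUTPUT = re.compile(
    r"TT_11037 \[(?P<target>[^\]]+)\]: "
    r"Output - (?P<rows>\d+), Dropped - (?P<dropped>\d+)"
)


def parse_transformation_stats(lines: Iterable[str]) -> List[Dict]:
    """Collect input, output, and dropped row counts per transformation."""
    stats: List[Dict] = []
    for line in lines:
        found = TRANSFORMATION.search(line)
        if found:
            stats.append({"name": found["name"], "inputs": [], "outputs": []})
            continue
        if not stats:
            continue  # ignore counters that appear before any transformation header
        found = INPUT.search(line)
        if found:
            stats[-1]["inputs"].append((found["source"], int(found["rows"])))
            continue
        found = OUTPUT.search(line)
        if found:
            stats[-1]["outputs"].append(
                (found["target"], int(found["rows"]), int(found["dropped"]))
            )
    return stats


if __name__ == "__main__":
    excerpt = [
        "MAPPING> TT_11031 Transformation [SQ_EMPLOYEES]:",
        "MAPPING> TT_11035 Input - 12 (__READER__)",
        "MAPPING> TT_11037 [T_EMPLOYEES]: Output - 12, Dropped - 0",
    ]
    print(parse_transformation_stats(excerpt))
```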

Configuring Session Logs

Configure session log options in the session properties. You can configure the following information for a session log:

♦ Location. You can configure the directory where you want the session log created. By default, the PowerCenter Server creates the session log in the directory configured for the $PMSessionLogDir server variable. You can enter a different directory, but if the directory does not exist or is not local to the PowerCenter Server that runs the session, the session fails.


♦ Name. You can name the session log or accept the default name. The default name for the session log is s_mapping name.log.

♦ Archive. You can configure the number of session logs you want the PowerCenter Server to archive for each session. By default, the PowerCenter Server does not archive session logs.

♦ Tracing levels. You can control the type of information the PowerCenter Server includes in the session log by setting a tracing level for the session. By default, the PowerCenter Server uses tracing levels configured in the mapping.

Configuring Session Log Locations and Filenames

You can configure the name and location of the session log on the Properties tab of the session properties.

To configure session log information:


2. Select the General Options settings on the Properties tab.

Session Log Filename and Directory


3. Enter the following session log options:

Session Log File Name

By default, the PowerCenter Server uses the session name for the log file name: s_mapping name.log. For a debug session, it uses DebugSession_mapping name.log. Optionally enter a file name, a file name and directory, or use the $PMSessionLogFile session parameter. The PowerCenter Server appends information in this field to that entered in the Session Log File Directory field. For example, if you have “C:\session_logs\” in the Session Log File Directory field and enter “logname.txt” in the Session Log File field, the PowerCenter Server writes logname.txt to the C:\session_logs\ directory. You can also use the $PMSessionLogFile session parameter to represent the name of the session log or the name and location of the session log. For details on session parameters, see “Session Parameters” on page 495.

Session Log File Directory

Location of the log file. Enter a valid directory local to the PowerCenter Server. By default, the PowerCenter Server creates session logs in the directory configured for the $PMSessionLogDir server variable.

4. Click OK to save the session.

Archiving Session Logs

You can archive session logs on a session-by-session basis. The PowerCenter Server can save session logs in the following ways:

♦ Save a selected number of logs

♦ Save all logs by timestamp

By default, the PowerCenter Server does not archive session logs. It creates one session log for each session and overwrites the existing log with the latest session log.

If you configure the session to save a specific number of session logs, it names the most recent log s_mapping name.log. It then cycles through a closed naming sequence for historical logs as follows: s_mapping name.log.0, s_mapping name.log.1, s_mapping name.log.2, …, s_mapping name.log.n-1, where n is the number of session logs. Because the PowerCenter Server cycles through the numeric naming sequence, check the session log file timestamp to determine the chronological order of those files.

Instead of entering a specific number of session logs to save, you can use the server variable $PMSessionLogCount. When you use the $PMSessionLogCount server variable, the PowerCenter Server archives the number of session logs configured for the server variable. If you use $PMSessionLogCount for all sessions, you can increase the number of archived session logs for all sessions by changing the server variable.

Note: By default, $PMSessionLogCount is set to 0. To archive session logs using $PMSessionLogCount, configure it for a larger number of session logs. For details on configuring server variables, see “Registering the PowerCenter Server” in the Installation and Configuration Guide.


You can also save all session logs by configuring a session to save logs by timestamp. When timestamping session logs, the PowerCenter Server appends the year, month, day, hour, and minute of the session completion to the log file. The resulting log file name is s_mapping name.log.yyyymmddhhmi, where:

♦ yyyy = year

♦ mm = month, ranging from 1-12

♦ dd = day, ranging from 1-31

♦ hh = hour, ranging from 0-23

♦ mi = minute, ranging from 0-59

To prevent filling the session log directory, periodically delete or back up log files when using the timestamp option.
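The two archive naming schemes in this section, numbered historical logs and timestamped logs, can be sketched as follows. This is an illustration, not PowerCenter code: the function names are hypothetical, the log name in the usage lines is made up, and the sketch assumes that each timestamp component is zero-padded to the width shown in the yyyymmddhhmi pattern.

```python
from datetime import datetime
from typing import List


def run_archive_names(log_name: str, historical_logs: int) -> List[str]:
    """Names kept when saving by runs: the latest log plus suffixes .0 to .n-1."""
    return [log_name] + [f"{log_name}.{index}" for index in range(historical_logs)]


def timestamp_archive_name(log_name: str, completed: datetime) -> str:
    """Name used when saving by timestamp (zero padding is an assumption)."""
    return f"{log_name}.{completed.strftime('%Y%m%d%H%M')}"


if __name__ == "__main__":
    # Hypothetical session log name used only for illustration.
    for name in run_archive_names("s_m_SampleSessionLog.log", 5):
        print(name)
    print(timestamp_archive_name("s_m_SampleSessionLog.log", datetime(2004, 8, 3, 11, 30)))
```

Because the numbered suffixes are reused in a cycle, the suffix alone does not tell you which historical log is newest; compare file timestamps as described above.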

Note: You can also truncate workflow and session log entries from the repository. For more information, see “Using the Repository Manager” in the Repository Guide.

To specify archiving information:


2. Select the Log Options settings on the Config Object tab.

Log Options Settings


3. Enter the following session log options:

Save Session Log By

If you select Save Session Log by Timestamp, the PowerCenter Server saves all session logs, appending a timestamp to each log. If you select Save Session Log by Runs, the PowerCenter Server saves a designated number of session logs. Configure the number of session logs in the Save Session Log for These Runs option. You can also use the $PMSessionLogCount server variable to save the configured number of session logs for the PowerCenter Server.

Save Session Log for These Runs

The number of historical session logs you want the PowerCenter Server to save. The PowerCenter Server saves the number of historical logs you specify, plus the most recent session log. Therefore, if you specify 5 runs, the PowerCenter Server saves the most recent session log, plus historical logs 0 to 4, for a total of 6 logs. You can specify up to 2,147,483,647 historical logs. If you specify 0 logs, the PowerCenter Server saves only the most recent session log.

Setting Tracing Levels

The amount of detail in the session log depends on the tracing level that you set. You can define tracing levels for each transformation or for the entire session. By default, the PowerCenter Server uses tracing levels configured in the mapping.

Setting a tracing level for the session overrides the tracing levels configured for each transformation in the mapping. If you select the Normal tracing level or higher, the PowerCenter Server writes row errors into the session log, including the transformation in which the error occurred and complete row data. If you configure the session for row error logging, the PowerCenter Server writes row errors to the error log instead of the session log. If you want the PowerCenter Server to write dropped rows to the session log as well, configure the session with Verbose Data tracing level.

Table 16-4 describes the session log tracing levels:

Table 16-4. Session Log Tracing Levels

Tracing Level Description

None The PowerCenter Server uses the tracing level set in the mapping.

Terse PowerCenter Server logs initialization information as well as error messages and notification of rejected data.

Normal PowerCenter Server logs initialization and status information, errors encountered, and skipped rows due to transformation row errors. Summarizes session results, but not at the level of individual rows.

Verbose Initialization In addition to normal tracing, PowerCenter Server logs additional initialization details, names of index and data files used, and detailed transformation statistics.

Verbose Data In addition to verbose initialization tracing, PowerCenter Server logs each row that passes into the mapping. Also notes where the PowerCenter Server truncates string data to fit the precision of a column and provides detailed transformation statistics. When you configure the tracing level to verbose data, the PowerCenter Server writes row data for all rows in a block when it processes a transformation.

You can also enter tracing levels for individual transformations in the mapping. When you enter a tracing level in the session properties, you override tracing levels configured for transformations in the mapping.

To set the tracing level:

1. Select the Error Handling settings on the Config Object tab.

2. Select a tracing level from the Override Tracing list. Table 16-4 on page 473 describes the session log tracing levels.

Viewing Session Logs

Session logs are text files that you can open with any text editor. The PowerCenter Server saves session logs in the directory you specify in the Session Log File Directory field in the session properties.



You can also view session logs through the Workflow Monitor. When you do this, the Workflow Monitor creates a temporary file that stores the session log. You can view the temporary file through the Workflow Monitor.

If a session fails, you can still view the session log file.

The PowerCenter Server generates the session log based on the PowerCenter Server code page. You can specify the language in which you want to view the session log based on the locale of the machine hosting the PowerCenter Server.

To use the Workflow Monitor to view the most recent session log:

1. In the Navigator window, connect to the server on which the workflow runs.

2. Open the folder that contains the workflow.

3. Open the workflow that contains the session whose log you wish to view.

4. Right-click the session and choose Get Session Log.

If you save session logs by timestamp, you can also use the Workflow Monitor to view past session logs. To do this, right-click the session in the Gantt chart view and choose Get Session Log.

For more information about the Workflow Monitor, see “Using the Workflow Monitor” on page 404.


Reject Files

During a session, the PowerCenter Server creates a reject file for each target instance in the mapping. If the writer or the target rejects data, the PowerCenter Server writes the rejected row into the reject file. The reject file and session log contain information that helps you determine the cause of the reject.

Each time you run a session, the PowerCenter Server appends rejected data to the reject file. Depending on the source of the problem, you can correct the mapping and target database to prevent rejects in subsequent sessions.

Note: If you enable row error logging in the session properties, the PowerCenter Server does not create a reject file. It writes the reject rows to the row error tables or file.

Locating Reject Files

The PowerCenter Server creates reject files for each target instance in the mapping. It creates reject files in the session reject file directory, as configured on the Properties settings of the Targets node on the Mapping tab (Transformation view). By default, the PowerCenter Server creates reject files in the $PMBadFileDir server variable directory.

The PowerCenter Server names reject files after the target instance name. The default name for reject files is target instance partition number.bad. You can view or edit reject file names in the session properties. The Workflow Manager replaces slash characters in the target instance name with underscore characters.

To find the location and name of the reject files, view the properties settings of the Targets node on the Mapping tab (Transformation view).


Figure 16-1 shows the properties settings on the Mapping tab:

Figure 16-1. Properties Settings on the Mapping Tab

Reject file directory and filename

When you run a session that contains multiple partitions, the PowerCenter Server creates a separate reject file for each partition.

Reading Reject Files

After you locate a reject file, you can read it using a text editor that supports the reject file code page. Reject files contain rows of data rejected by the writer or the target database. Though the PowerCenter Server writes the entire row in the reject file, the problem generally centers on one column within the row. To help you determine which column caused the row to be rejected, the PowerCenter Server adds row and column indicators to give you more information about each column:

♦ Row indicator. The first column in each row of the reject file is the row indicator. The numeric indicator tells whether the row was marked for insert, update, delete, or reject.

If the session is a user-defined commit session, the row indicator might tell whether the transaction was rolled back due to a non-fatal error or if the committed transaction was in a failed target connection group. For more information about user-defined commit sessions and rejected rows, see “User-Defined Commits” on page 283.

♦ Column indicator. Column indicators appear after every column of data. The alphabetical character indicators tell whether the data was valid, overflow, null, or truncated.

The following sample reject file shows the row and column indicators:

0,D,1921,D,Nelson,D,William,D,415-541-5145,D

0,D,1922,D,Page,D,Ian,D,415-541-5145,D

0,D,1923,D,Osborne,D,Lyle,D,415-541-5145,D

0,D,1928,D,De Souza,D,Leo,D,415-541-5145,D

0,D,2001,D,S. MacDonald,D,Ira,D,415-541-5145,D

Row Indicators

The first column in the reject file is the row indicator. The number listed as the row indicator tells the writer what to do with the row of data.

Table 16-5 describes the row indicators in a reject file:

Table 16-5. Row Indicators in Reject File

Row Indicator    Meaning    Rejected By

0    Insert    Writer or target

1    Update    Writer or target

2    Delete    Writer or target

3    Reject    Writer

4    Rolled-back insert    Writer

5    Rolled-back update    Writer

6    Rolled-back delete    Writer

7    Committed insert    Writer

8    Committed update    Writer

9    Committed delete    Writer

If a row indicator is 3, the writer rejected the row because an update strategy expression marked it for reject.

If a row indicator is 0, 1, or 2, either the writer or the target database rejected the row. To narrow down the reason why rows marked 0, 1, or 2 were rejected, review the column indicators and consult the session log.

Column Indicators

After the row indicator is a column indicator, followed by the first column of data, and another column indicator. Column indicators appear after every column of data and define the type of the data preceding it.

Table 16-6 describes the column indicators in a reject file:

Table 16-6. Column Indicators in Reject File

Column Indicator    Type of data    Writer Treats As

D    Valid data.    Good data. Writer passes it to the target database. The target accepts it unless a database error occurs, such as finding a duplicate key.

O    Overflow. Numeric data exceeded the specified precision or scale for the column.    Bad data, if you configured the mapping target to reject overflow or truncated data.

N    Null. The column contains a null value.    Good data. Writer passes it to the target, which rejects it if the target database does not accept null values.

T    Truncated. String data exceeded a specified precision for the column, so the PowerCenter Server truncated it.    Bad data, if you configured the mapping target to reject overflow or truncated data.

Null columns appear in the reject file with commas marking their column. An example of a null column surrounded by good data appears as follows:

5,D,,N,5,D

Because either the writer or target database can reject a row, and because they can reject the row for a number of reasons, you need to evaluate the row carefully and consult the session log to determine the cause for reject.
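The row and column indicator layout described in this section can be read with a short script. The sketch below is a minimal illustration with hypothetical helper names (`parse_reject_row`, `suspect_columns`). It splits on every comma, which is an assumption: this section does not say how a comma inside column data is written, so such rows would need different handling.

```python
from typing import List, Tuple

ROW_INDICATORS = {
    0: "Insert", 1: "Update", 2: "Delete", 3: "Reject",
    4: "Rolled-back insert", 5: "Rolled-back update", 6: "Rolled-back delete",
    7: "Committed insert", 8: "Committed update", 9: "Committed delete",
}
COLUMN_INDICATORS = {"D": "Valid data", "O": "Overflow", "N": "Null", "T": "Truncated"}


def parse_reject_row(line: str) -> Tuple[int, List[Tuple[str, str]]]:
    """Return the row indicator and a list of (column data, column indicator) pairs."""
    fields = line.rstrip("\r\n").split(",")
    if len(fields) < 2 or len(fields) % 2:
        raise ValueError(f"expected value and indicator pairs: {line!r}")
    row_indicator = int(fields[0])
    # fields[1] is the column indicator written directly after the row indicator.
    columns = list(zip(fields[2::2], fields[3::2]))
    return row_indicator, columns


def suspect_columns(columns: List[Tuple[str, str]]) -> List[int]:
    """Return 1-based positions of columns whose indicator is not D."""
    return [
        position
        for position, (_, indicator) in enumerate(columns, start=1)
        if indicator != "D"
    ]


if __name__ == "__main__":
    row, cols = parse_reject_row("0,D,1921,D,Nelson,D,William,D,415-541-5145,D")
    print(ROW_INDICATORS[row], cols, suspect_columns(cols))
```

A row whose columns are all marked D was most likely rejected by the target database rather than by the writer, so the session log is the next place to look.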



Chapter 17

Row Error Logging

This chapter includes the following topics:

♦ Overview, 482

♦ Understanding the Error Log Tables, 483

♦ Understanding the Error Log File, 489

♦ Configuring Error Log Options, 493


Overview

When you configure a session, you can choose to log row errors in a central location. When a row error occurs, the PowerCenter Server logs error information that allows you to determine the cause and source of the error. The PowerCenter Server logs information such as source name, row ID, current row data, transformation, timestamp, error code, error message, repository name, folder name, session name, and mapping information.

You can log row errors into relational tables or flat files. When you enable error logging, the PowerCenter Server creates the error tables or an error log file the first time it runs the session. Error logs are cumulative. If the error logs exist, the PowerCenter Server appends error data to the existing error logs.

You can choose to log source row data. Source row data includes row data, source row ID, and source row type from the source qualifier where an error occurs. The PowerCenter Server cannot identify the row in the source qualifier that contains an error if the error occurs after a non pass-through partition point with more than one partition or one of the following active sources:

♦ Aggregator

♦ Custom, configured as an active transformation

♦ Joiner

♦ Normalizer (pipeline)

♦ Rank

♦ Sorter

By default, the PowerCenter Server logs transformation errors in the session log and reject rows in the reject file. When you enable error logging, the PowerCenter Server does not generate a reject file or write dropped rows to the session log. Without a reject file, the PowerCenter Server does not log Transaction Control transformation rollback or commit errors. If you want to write rows to the session log in addition to the row error log, you can enable verbose data tracing.

Note: When you log row errors, session performance may decrease because the PowerCenter Server processes one row at a time instead of a block of rows at once.

Error Log Code Pages

The code page for the error log must match the code page for the session log. By default, the error log code page matches the server code page, and you can set the server configuration parameter to use UTF-8. The code page for the relational database where the error tables exist needs to be one-way compatible with the server code page. For more information about code pages, see “Globalization Overview” in the Installation and Configuration Guide.


Understanding the Error Log Tables

When you choose relational database error logging, the PowerCenter Server creates four error tables the first time you run a session. You specify the database connection to the database where the PowerCenter Server creates these tables. If the error tables exist for a session, the PowerCenter Server appends row errors to these tables.

Relational database error logging allows you to collect row errors from multiple sessions in one set of error tables. To do this, you specify the same error log table name prefix for all sessions. You can issue select statements on the generated error tables to retrieve error data for a particular session.
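As a rough illustration of such a select statement, the sketch below filters one error table by session and joins values that span several rows in LINE_NO order. It uses an in-memory SQLite database as a stand-in for the error log database. The prefix `DEMO_`, the sample rows, and the function names are invented for the example, only a few PMERR_DATA columns from the schema in this chapter are created, and grouping by workflow run, transformation name, and row ID is an assumption about what identifies one row error.

```python
import sqlite3
from typing import Dict, Tuple

PREFIX = "DEMO_"  # hypothetical error log table name prefix
TABLE = PREFIX + "PMERR_DATA"


def make_demo_db() -> sqlite3.Connection:
    """Build a throwaway table with a few PMERR_DATA columns and invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"CREATE TABLE {TABLE} ("
        "WORKFLOW_RUN_ID INTEGER, SESS_INST_ID INTEGER, TRANS_NAME TEXT, "
        "TRANS_ROW_ID INTEGER, TRANS_ROW_DATA TEXT, LINE_NO INTEGER)"
    )
    conn.executemany(
        f"INSERT INTO {TABLE} VALUES (?, ?, ?, ?, ?, ?)",
        [
            (100, 7, "EXP_Demo", 1, "second part", 2),  # stored out of order on purpose
            (100, 7, "EXP_Demo", 1, "first part, ", 1),
            (100, 8, "EXP_Other", 5, "other session", 1),
        ],
    )
    return conn


def row_errors_for_session(
    conn: sqlite3.Connection, sess_inst_id: int
) -> Dict[Tuple[int, str, int], str]:
    """Return row data for one session, joining multi-row values by LINE_NO."""
    cursor = conn.execute(
        f"SELECT WORKFLOW_RUN_ID, TRANS_NAME, TRANS_ROW_ID, TRANS_ROW_DATA "
        f"FROM {TABLE} WHERE SESS_INST_ID = ? "
        "ORDER BY WORKFLOW_RUN_ID, TRANS_NAME, TRANS_ROW_ID, LINE_NO",
        (sess_inst_id,),
    )
    merged: Dict[Tuple[int, str, int], str] = {}
    for run_id, trans_name, row_id, chunk in cursor:
        key = (run_id, trans_name, row_id)
        merged[key] = merged.get(key, "") + chunk
    return merged


if __name__ == "__main__":
    print(row_errors_for_session(make_demo_db(), 7))
```

Against a real error log database you would connect with that database's own driver and use the prefix configured for the session instead of the placeholder shown here.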

You can specify a prefix for the error tables. The error table names themselves can have up to eleven characters. Because Oracle, Sybase, and Teradata limit table names to 30 characters, do not specify a prefix that exceeds 19 characters when naming error log tables in these databases.

The PowerCenter Server creates the error tables without specifying primary and foreign keys. However, you can specify key columns.

The PowerCenter Server generates the following tables to help you track row errors:

♦ PMERR_DATA. Stores data and metadata about a transformation row error and its corresponding source row.

♦ PMERR_MSG. Stores metadata about an error and the error message.

♦ PMERR_SESS. Stores metadata about the session.

♦ PMERR_TRANS. Stores metadata about the source and transformation ports, such as name and datatype, when a transformation error occurs.
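The prefix limit mentioned earlier in this section follows from simple arithmetic on the table names listed above: the longest name has eleven characters, which leaves 19 characters of a 30-character limit for the prefix. The following sketch checks a prefix before you use it; the function names are hypothetical and the 30-character limit is the one this guide gives for Oracle, Sybase, and Teradata.

```python
from typing import List

ERROR_TABLES = ("PMERR_DATA", "PMERR_MSG", "PMERR_SESS", "PMERR_TRANS")
MAX_TABLE_NAME = 30  # limit this guide gives for Oracle, Sybase, and Teradata


def max_prefix_length(limit: int = MAX_TABLE_NAME) -> int:
    """Longest prefix that keeps every error table name within the limit."""
    return limit - max(len(name) for name in ERROR_TABLES)


def error_table_names(prefix: str, limit: int = MAX_TABLE_NAME) -> List[str]:
    """Return the prefixed table names, or raise if any name is too long."""
    names = [prefix + name for name in ERROR_TABLES]
    too_long = [name for name in names if len(name) > limit]
    if too_long:
        raise ValueError(f"table names exceed {limit} characters: {too_long}")
    return names


if __name__ == "__main__":
    print(max_prefix_length())
    print(error_table_names("SALES_"))  # hypothetical prefix
```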

PMERR_DATA

When the PowerCenter Server encounters a row error, it inserts an entry into the PMERR_DATA table. This table stores data and metadata about a transformation row error and its corresponding source row.

Table 17-1 describes the structure of the PMERR_DATA table:

Table 17-1. PMERR_DATA Table Schema

Column Name Datatype Description

REPOSITORY_GID Varchar A unique identifier for the repository.

WORKFLOW_RUN_ID Integer A unique identifier for the workflow.

WORKLET_RUN_ID Integer A unique identifier for the worklet. If a session is not part of a worklet, this value is “0”.

SESS_INST_ID Integer A unique identifier for the session.

TRANS_MAPPLET_INST Varchar Name of the mapplet where an error occurred.

TRANS_NAME Varchar Name of the transformation where an error occurred.

TRANS_GROUP Varchar Name of the input group or output group where an error occurred. Defaults to either “input” or “output” if the transformation does not have a group.

TRANS_PART_INDEX Integer Specifies the partition number of the transformation where an error occurred.

TRANS_ROW_ID Integer Specifies the row ID generated by the last active source.

TRANS_ROW_DATA Long Varchar Delimited string containing all column data, including the column indicator. Column indicators are: D - valid, O - overflow, N - null, T - truncated, B - binary, U - data unavailable. The fixed delimiter between column data and column indicator is colon ( : ). The delimiter between the columns is pipe ( | ). You can override the column delimiter in the error handling settings.

The PowerCenter Server converts all column data to text string in the error table. For binary data, the PowerCenter Server uses only the column indicator.

This value can span multiple rows. When the data exceeds 2000 bytes, the PowerCenter Server creates a new row. The line number for each row error entry is stored in the LINE_NO column.

SOURCE_ROW_ID Integer Value that the source qualifier assigns to each row it reads. If the PowerCenter Server cannot identify the row, the value is -1.

SOURCE_ROW_TYPE Integer The row indicator that tells whether the row was marked for insert, update, delete, or reject: 0 - Insert, 1 - Update, 2 - Delete, 3 - Reject.

SOURCE_ROW_DATA Long Varchar Delimited string containing all column data, including the column indicator. Column indicators are: D - valid, O - overflow, N - null, T - truncated, B - binary, U - data unavailable. The fixed delimiter between column data and column indicator is colon ( : ). The delimiter between the columns is pipe ( | ). You can override the column delimiter in the error handling settings.

The PowerCenter Server converts all column data to text string in the error table or error file. For binary data, the PowerCenter Server uses only the column indicator.

This value can span multiple rows. When the data exceeds 2000 bytes, the PowerCenter Server creates a new row. The line number for each row error entry is stored in the LINE_NO column.

LINE_NO Integer Specifies the line number for each row error entry in SOURCE_ROW_DATA and TRANS_ROW_DATA that spans multiple rows.

Informatica recommends using the fields in bold to join tables.

PMERR_MSG

When the PowerCenter Server encounters a row error, it inserts an entry into the PMERR_MSG table. This table stores metadata about the error and the error message.

Table 17-2 describes the structure of the PMERR_MSG table:

Table 17-2. PMERR_MSG Table Schema






MAPPLET_INST_NAME Varchar Mapplet to which the transformation belongs. If the transformation is not part of a mapplet, this value is N/A.





PMERR_SESSWhen you choose relational database error logging, the PowerCenter Server inserts entries into the PMERR_SESS table. This table stores metadata about the session where an error occurred.


TRANS_PART_INDEX Integer Specifies the partition number of the transformation where an error occurred.

TRANS_ROW_ID Integer Specifies the row ID generated by the last active source.

ERROR_SEQ_NUM Integer Counter for the number of errors per row in each transformation group. If a session has multiple partitions, the PowerCenter Server maintains this counter for each partition. For example, if a transformation generates three errors in partition 1 and two errors in partition 2, ERROR_SEQ_NUM generates the values 1, 2, and 3 for partition 1, and values 1 and 2 for partition 2.

ERROR_TIMESTAMP Date/Time Timestamp of the PowerCenter Server when the error occurred.

ERROR_UTC_TIME Integer The Coordinated Universal Time, also known as Greenwich Mean Time, of when an error occurred.

ERROR_CODE Integer The error code that the error generates.

ERROR_MSG Long Varchar Error message, which can span multiple rows. When the data exceeds 2000 bytes, the PowerCenter Server creates a new row. The line number for each row error entry is stored in the LINE_NO column.

ERROR_TYPE Integer The type of error that occurred. The PowerCenter Server uses the following values:1 - Reader error2 - Writer error3 - Transformation error

LINE_NO Integer Specifies the line number for each row error entry in ERROR_MSG that spans multiple rows.
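Because ERROR_MSG can span several rows, a report that reads PMERR_MSG usually has to stitch the fragments back together in LINE_NO order. The following Python sketch shows one way to do this on rows that have already been fetched from the table. The grouping columns passed in `key_columns` and the sample rows are assumptions made for illustration; use the join fields that apply to your own tables.

```python
from collections import defaultdict

def reassemble_error_messages(rows, key_columns):
    """Concatenate ERROR_MSG fragments that share a key, ordered by LINE_NO."""
    fragments = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        fragments[key].append((row["LINE_NO"], row["ERROR_MSG"]))
    # Sorting the (LINE_NO, text) pairs restores the original fragment order.
    return {
        key: "".join(text for _, text in sorted(parts))
        for key, parts in fragments.items()
    }

# Hypothetical rows: one error whose message was split across two rows.
KEY = ("TRANS_PART_INDEX", "TRANS_ROW_ID", "ERROR_SEQ_NUM")
rows = [
    {"TRANS_PART_INDEX": 1, "TRANS_ROW_ID": 5, "ERROR_SEQ_NUM": 1,
     "LINE_NO": 2, "ERROR_MSG": "second part"},
    {"TRANS_PART_INDEX": 1, "TRANS_ROW_ID": 5, "ERROR_SEQ_NUM": 1,
     "LINE_NO": 1, "ERROR_MSG": "first part, "},
]
messages = reassemble_error_messages(rows, KEY)
```

The same approach applies to any other column that spans multiple rows and records its order in LINE_NO.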





Table 17-3 describes the structure of the PMERR_SESS table:

PMERR_TRANS

When the PowerCenter Server encounters a transformation error, it inserts an entry into the PMERR_TRANS table. This table stores metadata, such as the name and datatype of the source and transformation ports.

Table 17-4 describes the structure of the PMERR_TRANS table:

Table 17-3. PMERR_SESS Table Schema






SESS_START_TIME Date/Time Timestamp of the PowerCenter Server when a session starts.

SESS_START_UTC_TIME Integer The Coordinated Universal Time, also known as Greenwich Mean Time, of when the session starts.

REPOSITORY_NAME Varchar The repository name where sessions are stored.

FOLDER_NAME Varchar Specifies the folder where the mapping and session are located.

WORKFLOW_NAME Varchar Specifies the workflow that runs the session being logged.

TASK_INST_PATH Varchar Fully qualified session name that can span multiple rows. The PowerCenter Server creates a new line for the session name. The PowerCenter Server also creates a new line for each worklet in the qualified session name. For example, you have a session named WL1.WL2.S1. Each component of the name appears on a new line:
WL1
WL2
S1
The PowerCenter Server writes the line number in the LINE_NO column.

MAPPING_NAME Varchar Specifies the mapping that the session uses.

LINE_NO Integer Specifies the line number for each row error entry in TASK_INST_PATH that spans multiple rows.


Table 17-4. PMERR_TRANS Table Schema







TRANS_MAPPLET_INST Varchar Specifies the instance of a mapplet.



TRANS_ATTR Varchar Lists the port names and datatypes of the input or output group where the error occurred. Port name and datatype pairs are separated by commas, for example: portname1:datatype, portname2:datatype.

This value can span multiple rows. When the data exceeds 2000 bytes, the PowerCenter Server creates a new row for the transformation attributes and writes the line number in the LINE_NO column.

SOURCE_MAPPLET_INST Varchar Name of the mapplet in which the source resides.

SOURCE_NAME Varchar Name of the source qualifier. N/A appears when a row error occurs downstream of an active source that is not a source qualifier or a non pass-through partition point with more than one partition. For a list of active sources that can affect row error logging, see “Overview” on page 482.

SOURCE_ATTR Varchar Lists the connected field(s) in the source qualifier where an error occurred. When an error occurs in multiple fields, each field name is entered on a new line. Writes the line number in the LINE_NO column.

LINE_NO Integer Specifies the line number for each row error entry in TRANS_ATTR and SOURCE_ATTR that spans multiple rows.
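The TRANS_ATTR format described above (comma-separated portname:datatype pairs) is simple to split once any multi-row value has been put back together. The following Python sketch illustrates the idea. The port names and datatypes in the sample are hypothetical, and the sketch assumes that a datatype never contains a comma.

```python
def parse_trans_attr(value):
    """Split 'port:datatype, port:datatype' into (port, datatype) pairs."""
    pairs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        port, _, datatype = item.partition(":")
        pairs.append((port.strip(), datatype.strip()))
    return pairs

# Hypothetical TRANS_ATTR value.
ports = parse_trans_attr("CUST_ID:integer, CITY_IN:string")
```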





Understanding the Error Log File

You can create an error log file to collect all errors that occur in a session. This error log file is a column delimited line sequential file. By specifying a unique error log file name, you can create a separate log file for each session in a workflow. When you want to analyze the row errors for only one session, use an error log file.

In an error log file, double pipes “||” delimit error logging columns. By default, pipe “|” delimits row data. You can change this row data delimiter by setting the Data Column Delimiter error log option.

The code page for the error file is the same as the code page for the session log file. If the session log uses a UTF-8 code page, the error file also uses a UTF-8 code page. For more information about code pages, see “Globalization Overview” in the Installation and Configuration Guide.

Error log files have the following structure:

[Session Header]

[Column Header]

[Column Data]

♦ Session header. Contains session run information. Information in the session header is like the information stored in the PMERR_SESS table.

♦ Column header. Contains data column names.

♦ Column data. Contains actual row data and error message information.

The following sample error log file contains a session header, column header, and column data:

**********************************************************************

Repository GID: fe4817ab-7d87-465f-9110-354222424df0

Repository: CustomerInfo

Folder: Row_Error_Logging

Workflow: wf_basic_REL_errors_AGG_case

Session: s_m_basic_REL_errors_AGG_case

Mapping: m_basic_REL_errors_AGG_case

Workflow Run ID: 1310

Worklet Run ID: 0

Session Instance ID: 19

Session Start Time: 08/03/2004 16:57:01

Session Start Time (UTC): 1067126221

**********************************************************************


Transformation||Transformation Mapplet Name||Transformation Group||Partition Index||Transformation Row ID||Error Sequence||Error Timestamp||Error UTC Time||Error Code||Error Message||Error Type||Transformation Data||Source Mapplet Name||Source Name||Source Row ID||Source Row Type||Source Data

agg_REL_basic||N/A||Input||1||1||1||08/03/2004 16:57:03||1067126223||11019||Port [CUST_ID_NULL]: Default value is: ERROR(<<Expression Error>> [ERROR]: [AGG] CUST_ID - NULL detected on input.\n... nl:ERROR(s:'[AGG] CUST_ID - NULL detected on input.')).||3||D:1221|N:|N:|N:|D:Kauai Dive Shoppe|D:4-976 Sugarloaf Hwy|D:Kapaa Kauai|D:HI|D:94766|D:[AGG] DEFAULT SID VALUE.|D:01/01/2001 00:00:00||mplt_add_NULLs_to_QACUST3||SQ_QACUST3||1||0||D:1221|D:Kauai Dive Shoppe|D:4-976 Sugarloaf Hwy|D:Kapaa Kauai|D:HI|D:94766

agg_REL_basic||N/A||Input||1||4||1||08/03/2004 16:57:03||1067126223||11019||Port [CITY_IN]: Default value is: ERROR(<<Expression Error>> [ERROR]: [AGG] Null detected for City_IN.\n... nl:ERROR(s:'[AGG] Null detected for City_IN.')).||3||D:1354|N:|N:|D:1354|T:Cayman Divers World|D:PO Box 541|N:|D:Gr|N:|D:[AGG] DEFAULT SID VALUE.|D:01/01/2001 00:00:00||mplt_add_NULLs_to_QACUST3||SQ_QACUST3||4||0||D:1354|D:Cayman Divers World Unlim|D:PO Box 541|N:|D:Gr|N:

agg_REL_basic||N/A||Input||1||5||1||08/03/2004 16:57:03||1067126223||11131||Transformation [agg_REL_basic] had an error evaluating variable column [Var_Divide_by_Price]. Error message is [<<Expression Error>> [/]: divisor is zero\n... f:(f:2 / f:(f:1 - f:TO_FLOAT(i:1)))].||3||D:1356|N:|N:|D:1356|T:Tom Sawyer Diving C|T:632-1 Third Frydenh|D:Christiansted|D:St|D:00820|D:[AGG] DEFAULT SID VALUE.|D:01/01/2001 00:00:00||mplt_add_NULLs_to_QACUST3||SQ_QACUST3||5||0||D:1356|D:Tom Sawyer Diving Centre|D:632-1 Third Frydenho|D:Christiansted|D:St|D:00820
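The three-part structure shown in the sample can be read with a short script. The following Python sketch is one possible approach, not a supported utility. It assumes that the session header is enclosed by lines made up only of asterisks, that the first non-header line is the column header, and that each error record occupies one physical line. An error message that itself contains the "||" sequence would break this simple split.

```python
def parse_error_log(text, column_delimiter="||"):
    """Return (session_header, column_names, rows) for an error log file."""
    header, columns, rows = {}, None, []
    in_header = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if set(line) == {"*"}:          # a rule of asterisks opens or closes the header
            in_header = not in_header
            continue
        if in_header:
            key, _, value = line.partition(":")
            header[key.strip()] = value.strip()
        elif columns is None:
            columns = line.split(column_delimiter)
        else:
            rows.append(dict(zip(columns, line.split(column_delimiter))))
    return header, columns, rows

# Shortened, hypothetical excerpt in the same layout as the sample above.
sample = """\
**********
Repository: CustomerInfo
Session Start Time: 08/03/2004 16:57:01
**********
Transformation||Error Code||Error Type||Transformation Data
agg_REL_basic||11019||3||D:1221|N:|D:HI
"""
header, columns, rows = parse_error_log(sample)
```

Each row is returned as a dictionary keyed by column header, so `rows[0]["Error Code"]` gives the error code of the first record.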

Table 17-5 describes the columns in an error log file:

Table 17-5. Error Log File Column Headers

Log File Column Headers Description

Transformation The name of the transformation used by a mapping where an error occurred.

Transformation Mapplet Name Name of the mapplet that contains the transformation. N/A appears when this information is not available.

Transformation Group Name of the input or output group where an error occurred. Defaults to either “input” or “output” if the transformation does not have a group.

Partition Index Specifies the partition number of the transformation partition where an error occurred.

Transformation Row ID Specifies the row ID for the error row.

Error Sequence Counter for the number of errors per row in each transformation group. If a session has multiple partitions, the PowerCenter Server maintains this counter for each partition. For example, if a transformation generates three errors in partition 1 and two errors in partition 2, the counter generates the values 1, 2, and 3 for partition 1, and values 1 and 2 for partition 2.


Error Timestamp Timestamp of the PowerCenter Server when the error occurred.

Error UTC Time The Coordinated Universal Time, also known as Greenwich Mean Time, when the error occurred.

Error Code The error code that corresponds to the error message.

Error Message Error message.

Error Type The type of error that occurred. The PowerCenter Server uses the following values:
1 - Reader error
2 - Writer error
3 - Transformation error

Transformation Data Delimited string containing all column data, including the column indicator. Column indicators are:
D - valid
O - overflow
N - null
T - truncated
B - binary
U - data unavailable
The fixed delimiter between column data and column indicator is a colon ( : ). The delimiter between the columns is a pipe ( | ). You can override the column delimiter in the error handling settings.

The PowerCenter Server converts all column data to text string in the error file. For binary data, the PowerCenter Server uses only the column indicator.

Source Name Name of the source qualifier. N/A appears when a row error occurs downstream of an active source that is not a source qualifier or a non pass-through partition point with more than one partition. For a list of active sources that can affect row error logging, see “Overview” on page 482.

Source Row ID Value that the source qualifier assigns to each row it reads. If the PowerCenter Server cannot identify the row, the value is -1.




Source Row Type The row indicator that tells whether the row was marked for insert, update, delete, or reject.
0 - Insert
1 - Update
2 - Delete
3 - Reject

Source Data Delimited string containing all column data, including the column indicator. Column indicators are:
D - valid
O - overflow
N - null
T - truncated
B - binary
U - data unavailable
The fixed delimiter between column data and column indicator is a colon ( : ). The delimiter between the columns is a pipe ( | ). You can override the column delimiter in the error handling settings.

The PowerCenter Server converts all column data to text string in the error table or error file. For binary data, the PowerCenter Server uses only the column indicator.
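The Transformation Data and Source Data columns can be decoded by splitting on the column delimiter and then on the first colon of each field. The following Python sketch shows this under the assumption that the default pipe delimiter is in use and that no column value contains the delimiter. The function name is hypothetical.

```python
INDICATORS = {
    "D": "valid",
    "O": "overflow",
    "N": "null",
    "T": "truncated",
    "B": "binary",
    "U": "data unavailable",
}

def parse_row_data(row_data, delimiter="|"):
    """Return a list of (indicator meaning, value) pairs for one row data string."""
    columns = []
    for field in row_data.split(delimiter):
        # Split on the first colon only, so values such as timestamps stay intact.
        indicator, _, value = field.partition(":")
        columns.append((INDICATORS.get(indicator, "unknown"), value))
    return columns

parsed = parse_row_data("D:1221|N:|T:Cayman Divers World|D:01/01/2001 00:00:00")
```

If you override the Data Column Delimiter option, pass the same character as `delimiter`.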




Configuring Error Log Options

You configure error logging for each session in a workflow. You can find error handling options on the Config Object tab of the session properties.

Tip: You can use the Workflow Manager to create a reusable set of attributes for the Config Object tab. For more information on creating a session configuration object, see “Creating a Session Configuration Object” on page 183.

To configure error logging options:

1. Double-click the Session task to open the session properties.

2. Select the Config Object tab.

3. Choose error handling options.

Error Log Options


Table 17-6 describes the error logging settings of the Config Object tab:

4. Click OK.

Table 17-6. Error Log Options

Error Log Options Required/Optional Description

Error Log Type Required Specifies the type of error log to create. You can specify relational database, flat file, or no log. By default, the PowerCenter Server does not create an error log.

Error Log DB Connection Required/Optional Specifies the database connection for a relational log. This option is required when you enable relational database logging.

Error Log Table Name Prefix Optional Specifies the table name prefix for relational logs. The PowerCenter Server appends 11 characters to the prefix name. Oracle and Sybase have a 30 character limit for table names. If a table name exceeds 30 characters, the session fails.

Error Log File Directory Required/Optional Specifies the directory where errors are logged. By default, the error log file directory is $PMBadFilesDir\. This option is required when you enable flat file logging.

Error Log File Name Required/Optional Specifies the error log file name. The character limit for the error log file name is 255. By default, the error log file name is PMError.log. This option is required when you enable flat file logging.

Log Row Data Optional Specifies whether or not to log transformation row data. By default, the PowerCenter Server logs transformation row data. If you disable this property, N/A or -1 appears in transformation row data fields.

Log Source Row Data Optional If you choose not to log source row data, or if source row data is unavailable, the PowerCenter Server writes an indicator such as N/A or -1, depending on the column datatype. If you do not need to capture source row data, consider disabling this option to increase PowerCenter Server performance.

Data Column Delimiter Required Delimiter for string type source row data and transformation group row data. By default, the PowerCenter Server uses a pipe ( | ) delimiter. Verify that you do not use the same delimiter for the row data as the error logging columns. If you use the same delimiter, you may find it difficult to read the error log file.
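The table name limit described for Error Log Table Name Prefix reduces to simple arithmetic: with 11 characters appended and a 30 character limit, a prefix can be at most 19 characters long on Oracle and Sybase. The following Python sketch checks a prefix before you enter it in the session. The function name and the sample prefixes are hypothetical.

```python
APPENDED_CHARACTERS = 11   # characters the PowerCenter Server appends to the prefix
TABLE_NAME_LIMIT = 30      # table name limit stated for Oracle and Sybase

def prefix_fits(prefix, limit=TABLE_NAME_LIMIT):
    """Return True if prefix plus the appended characters stays within the limit."""
    return len(prefix) + APPENDED_CHARACTERS <= limit

max_prefix_length = TABLE_NAME_LIMIT - APPENDED_CHARACTERS
ok = prefix_fits("SALES_NIGHTLY_")                  # 14 characters
too_long = prefix_fits("SALES_NIGHTLY_LOAD_2004_")  # 24 characters
```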


Chapter 18

Session Parameters

This chapter contains information on the following topics:

♦ Overview, 496

♦ Session Log Parameter, 497

♦ Database Connection Parameters, 499

♦ Source File Parameters, 502

♦ Target File Parameters, 504

♦ Lookup File Parameters, 506

♦ Reject File Parameters, 508

♦ Tips, 510


Overview

Session parameters, like mapping parameters, represent values you might want to change between sessions, such as a database connection or source file. Use session parameters in the session properties, and then define the parameters in a parameter file. You can specify the parameter file for the session to use in the session properties. You can also specify it when you use pmcmd to start the session.

The Workflow Manager provides one built-in session parameter, $PMSessionLogFile. With $PMSessionLogFile, you can change the name of the session log generated for the session.

The Workflow Manager also allows you to create user-defined session parameters.

Table 18-1 describes required naming conventions for the session parameters you can define:

Use session parameters to make sessions more flexible. For example, you have the same type of transactional data written to two different databases, and you use the database connections TransDB1 and TransDB2 to connect to the databases. You want to use the same mapping for both tables. Instead of creating two sessions for the same mapping, you can create a database connection parameter, $DBConnectionSource, and use it as the source database connection for the session. When you create a parameter file for the session, you set $DBConnectionSource to TransDB1 and run the session. After the session completes, you set $DBConnectionSource to TransDB2 and run the session again.

You might use several session parameters together to make session management easier. For example, you might use source file and database connection parameters to configure a session to read data from different source files and write the results to different target databases. You can then use reject file parameters to write the session reject files to the target machine. You can use the session log parameter, $PMSessionLogFile, to write to different session logs in the target machine, as well.

When you use session parameters, you must define the parameters in the parameter file. Session parameters do not have default values. When the PowerCenter Server cannot find a value for a session parameter, it fails to initialize the session.

Table 18-1. Naming Conventions for User-Defined Session Parameters

Parameter Type Naming Convention

Database Connection $DBConnectionName

Source File $InputFileName

Target File $OutputFileName

Lookup File $LookupFileName

Reject File $BadFileName
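The naming conventions in Table 18-1 can be checked mechanically before a session runs. The following Python sketch classifies a parameter name by its prefix. It assumes that prefixes are case sensitive and that the remainder of the name consists only of alphanumeric and underscore characters. The function name is hypothetical.

```python
import re

PREFIXES = {
    "$DBConnection": "Database Connection",
    "$InputFile": "Source File",
    "$OutputFile": "Target File",
    "$LookupFile": "Lookup File",
    "$BadFile": "Reject File",
}

def parameter_type(name):
    """Return the parameter type for a session parameter name, or None."""
    if name == "$PMSessionLogFile":
        return "Session Log (built-in)"
    for prefix, kind in PREFIXES.items():
        suffix = name[len(prefix):]
        if name.startswith(prefix) and re.fullmatch(r"[A-Za-z0-9_]*", suffix):
            return kind
    return None

kinds = [parameter_type(n) for n in
         ("$DBConnection_Source", "$BadFileName", "$PMSessionLogFile", "$MyParam")]
```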


Session Log Parameter

The Workflow Manager provides a built-in session parameter named $PMSessionLogFile. Use $PMSessionLogFile in the session properties to change the name or location of the session log between runs. When you use $PMSessionLogFile in the session properties, define the parameter in the parameter file.

Changing the Session Log Name

You can use $PMSessionLogFile to change the session log name between sessions. In the General Options settings of the Properties tab, enter $PMSessionLogFile in the Session Log Filename field. Then define $PMSessionLogFile in the parameter file. When the PowerCenter Server runs the session, it creates a session log in the directory listed in the Session Log File Directory field and names the session log as instructed by the parameter file. If a session log with the same name already exists, the PowerCenter Server overwrites the existing file.

Figure 18-1 illustrates how to use the session log parameter with a directory:

Figure 18-1. Using $PMSessionLogFile as the Name of the Session Log

[Figure callouts: Session Log Parameter, Parameter Filename, Session Log Directory]

For example, in a session, you leave Session Log File Directory set to its default value, the $PMSessionLogDir server variable. For Session Log File Name, you enter the session parameter $PMSessionLogFile. In the parameter file, you set $PMSessionLogFile to “TestRun.txt”. When you registered the PowerCenter Server, you defined $PMSessionLogDir as C:/Program Files/Informatica/PowerCenter Server/SessLogs. When the PowerCenter Server runs the session, it creates a session log named TestRun.txt in the C:/Program Files/Informatica/PowerCenter Server/SessLogs directory.

Changing the Session Log Name and Location

You can also use $PMSessionLogFile to change both the directory and the session log name between sessions. If you do this, you also need to clear the Session Log File Directory field. The PowerCenter Server concatenates both fields to determine where and how to name the session log.

For example, you have one session writing target files to different systems. You want each session log written to the target machine so the local administrator can review the file. In the session, you configure a target file session parameter $PMOutputFile1. You then use $PMSessionLogFile to define the session log file name and clear the Session Log File Directory. In the parameter file, you configure both the target file and session log file parameter to write to the same machine. Set $PMOutputFile1 to E:/target files/Marketing.out, and $PMSessionLogFile to E:/session logs/Marketing.txt. After you run the session, you can edit the parameter file to change the directory and file names for both the target file and session log parameters.

Alternatively, you can create a different parameter file for each target. You can then use pmcmd to specify which parameter file to use when you start the session.
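The way the directory field and the file name field combine can be pictured with a small Python sketch. This is only a model of the behavior described above, not the PowerCenter Server's own logic: in particular, inserting a "/" separator when the directory field does not end with one is an assumption, and the function name is hypothetical.

```python
def session_log_path(directory_field, filename_field):
    """Model how the directory and file name fields combine into one path."""
    if not directory_field:
        # Directory field cleared: the parameter value carries the full path.
        return filename_field
    if directory_field.endswith(("/", "\\")):
        return directory_field + filename_field
    return directory_field + "/" + filename_field  # assumed separator handling

name_only = session_log_path(
    "C:/Program Files/Informatica/PowerCenter Server/SessLogs", "TestRun.txt")
full_path = session_log_path("", "E:/session logs/Marketing.txt")
```

The first call models a parameter that supplies only the file name. The second models a cleared Session Log File Directory field, where the parameter supplies both the directory and the name.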

Steps for Using $PMSessionLogFile

Use $PMSessionLogFile when you want to change the name and/or location of a session log between session runs.

To use the session log parameter:

1. In the session properties, click the General Options settings of the Properties tab.

2. Enter $PMSessionLogFile in the Session Log File field.

3. If you want $PMSessionLogFile to represent both the session log name and directory, clear the Session Log File Directory field.

4. Enter a parameter file and directory in the Parameter File Name field.

5. Click OK.

Before you run the session, create the parameter file in the specified directory and define $PMSessionLogFile. For details, see “Parameter Files” on page 511.


Database Connection Parameters

You can create user-defined database connection session parameters to reuse sessions for different relational sources, targets, or lookups. You can create a database connection parameter in the session properties of any session that uses a relational source, target, or lookup. Name all database connection session parameters with the prefix $DBConnection, followed by any alphanumeric and underscore characters. When you define the parameter in the parameter file, you can reference any database connection in the repository.

For example, you have a session you want to use with two relational sources. You access the first source with a database connection named “Marketing” and the second with a connection named “Sales.” In the session, you create a source database connection parameter named $DBConnection_Source. In the parameter file, you define $DBConnection_Source as Marketing and run the session. After the session completes, you set $DBConnection_Source to Sales in the parameter file, and then run the session.

Alternatively, you can create two different parameter files, one for each source database connection. You can then use pmcmd to specify which parameter file to use when you start the session.
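As a rough illustration of the two-file approach, the following Python sketch builds the text of one parameter file per source connection. The layout used here, a bracketed heading followed by name=value lines, and the folder and session names are assumptions made for this example. See “Parameter Files” on page 511 for the authoritative format.

```python
def parameter_file_text(heading, values):
    """Build parameter file text: an assumed [heading] line, then name=value lines."""
    lines = ["[" + heading + "]"]
    lines += [f"{name}={value}" for name, value in values.items()]
    return "\n".join(lines) + "\n"

# Hypothetical folder and session names; connection names come from the example above.
marketing = parameter_file_text(
    "MyFolder.s_MySession", {"$DBConnection_Source": "Marketing"})
sales = parameter_file_text(
    "MyFolder.s_MySession", {"$DBConnection_Source": "Sales"})
```

Saving each string to its own file gives you one parameter file per source, and you can name the one you want when you start the session.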

If you want to use the same database connection for more than one connection, such as source and target, you can enter the same $DBConnection parameter for both source and target database connection. In the parameter file, enter one default value for the $DBConnection parameter. The PowerCenter Server uses the same DBConnectionName when accessing source and target.

Similarly, heterogeneous sources can use the same $DBConnection parameter.

To configure a database connection parameter:

1. In the session properties, click the Mapping tab (Transformation view) and click Connections settings for the sources or targets node.


2. Click the Open button in the Value field.

3. In the Relational Connection Browser, select Use Connection Variable.

4. Enter a name for the database connection parameter. Name the connection parameter $DBConnectionName.



5. In the General Options settings of the Properties tab, enter a parameter file and directory in the Parameter Filename field.

The directory must be local to the PowerCenter Server.

6. Click OK.

Before you run the session, create the parameter file in the specified directory and define the database connection parameter. For details, see “Parameter Files” on page 511.


Source File Parameters

You can create user-defined source file session parameters. Use a source file parameter when you want to change the name or location of a session source file between session runs. Name all source file session parameters with the prefix $InputFile, followed by any alphanumeric and underscore characters. All source file session parameters within a session must have distinct names. You can create a source file parameter in any session that reads from file sources. When you define the parameter in the parameter file, you can reference any source file local to the PowerCenter Server.

You can use a user-defined source file session parameter in either the Source File Directory or Source Filename session property.

Changing the Source File

You can use a source file parameter to change the name of the source file a session uses. In the Properties settings of the Mapping tab, enter the source file parameter in the Source Filename field. Then define the parameter in a parameter file. When the PowerCenter Server runs the session, it connects to the directory listed in the Source File Directory field and reads the source file listed in the parameter file.

Figure 18-2 shows how to use a source file parameter with a source directory:

Figure 18-2. Using Parameters to Change the Session Source File

Source Filename In the Parameter File



For example, in a session, you leave Source File Directory set to its default, the $PMSourceFileDir server variable. For the source file name, you create a session parameter named $Inputfile_products. In the parameter file, you set $Inputfile_products to “products.txt”. When you registered the PowerCenter Server, you set $PMSourceFileDir to C:/Program Files/Informatica/PowerCenter Server/SrcFiles. When the PowerCenter Server runs the session, it reads the products.txt file in the C:/Program Files/Informatica/PowerCenter Server/SrcFiles directory.

Changing the Source File and Directory

You can use a source file parameter to change both the source file and directory used by a session. When you specify both the source file and directory in the Source Filename field, you need to clear the Source File Directory field. The PowerCenter Server concatenates both fields to determine where to find the indicated source file.

Steps for Using a Source File Parameter

Use a source file parameter when you want to change the source file and/or location between session runs.

To use a source file parameter:

1. Select a source under the Sources node on the Mapping tab.

2. Go to the Properties settings.

3. In the Source Filename field, enter the source file parameter name.

Name all source file parameters $InputFileName.

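4. If you want the parameter to represent both the source file name and location, clear the Source File Directory field.

5. In the General Options settings of the Properties tab, enter a parameter file and directory in the Parameter Filename field.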


6. Click OK.

Before you run the session, create the parameter file in the specified directory and define the source file parameter. For details, see “Parameter Files” on page 511.


Target File Parameters

You can create user-defined target file session parameters. Use a target file parameter when you want to change the name or location of a session target file between session runs. Name all target file session parameters with the prefix $OutputFile, followed by any alphanumeric and underscore characters. All target file session parameters within a session need to have distinct names. You can create a target file parameter in any session that writes to file targets. When you define the parameter in a parameter file, you can write the target file to any directory local to the PowerCenter Server.

You can use a user-defined target file session parameter in either the Output File Directory or Output Filename session property.

Changing the Target File

You can use a target file parameter to change the name of the target file the PowerCenter Server creates when it runs a session. In the Properties settings of the Mapping tab, enter the target file parameter in the Output File Name field. Then define the parameter in a parameter file. When the PowerCenter Server runs the session, it connects to the directory listed in the Output File Directory field and creates the target file listed in the parameter file. If the target file exists, the PowerCenter Server overwrites the existing target file.

Figure 18-3 shows how to use a target file parameter with a target file directory:

Figure 18-3. Using Parameters to Change the Session Target File

Target file directory

Target file name in the parameter file


For example, you want to name the target file based on the month in which the session runs. In the session, you leave the target directory set to its default, the $PMTargetFileDir server variable. For the target file name, you create a session parameter named $OutputFileName. In the parameter file, you set $OutputFileName to “Nov2000.out”. When you registered the PowerCenter Server, you set $PMTargetFileDir to C:/Program Files/Informatica/PowerCenter Server/TgtFiles. When the PowerCenter Server runs the session, it creates Nov2000.out in the C:/Program Files/Informatica/PowerCenter Server/TgtFiles directory.
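If you generate the parameter file with a script, the month-based file name in this example can be derived from the run date. The following Python sketch shows one way to build such a name. The function name is hypothetical, and a fixed month list is used so that the result does not depend on the locale of the machine.

```python
from datetime import date

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def monthly_target_name(run_date, extension=".out"):
    """Build a target file name such as Nov2000.out from the run date."""
    return f"{MONTHS[run_date.month - 1]}{run_date.year}{extension}"

target_name = monthly_target_name(date(2000, 11, 15))
```

The resulting string is the value you would assign to $OutputFileName in the parameter file for that run.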

Changing the Target File and Directory

You can use a target file parameter to change both the target file and directory used by a session. When you specify both the target file and directory in the Output Filename field, you need to clear the Output File Directory field. The PowerCenter Server concatenates both fields to determine where to create the target file.

For example, a session uses a source file parameter to read both internal and external weblogs on different session runs. You want to write the results of the internal weblog session to one system and the external weblog session to another. In the session, you name the target file $OutputFileName and clear the Output File Directory field. In the parameter file, you set $OutputFileName to “E:/internal_weblogs/November_int.txt” to create a target file for the internal weblog session. After the session completes, you change $OutputFileName to “F:/external_weblogs/November_ex.txt” for the external weblog session.

Alternatively, you can create a different parameter file for each target. You can then use pmcmd to specify which parameter file to use when you start the session.

Steps for Using a Target File Parameter

Use a target file parameter when you want to change the name and/or location of a target file between session runs.

To use a target file parameter:

1. Select a target under the Targets node on the Mapping tab.

2. Go to the Properties settings.


3. In the Output Filename field, enter the target file parameter name.

Name all target file parameters $OutputFileName.

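4. If you want the parameter to represent both the target file name and location, clear the Output File Directory field.

5. In the General Options settings of the Properties tab, enter a parameter file and directory in the Parameter Filename field.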


6. Click OK.

Before you run the session, create the parameter file in the specified directory and define the target file parameter you created. For details, see “Parameter Files” on page 511.


Lookup File Parameters

You can create user-defined lookup file session parameters. Use a lookup file parameter when you want to change the name or location of a session lookup file between session runs. Name all lookup file session parameters with the prefix $LookupFile, followed by any alphanumeric and underscore characters. All lookup file session parameters within a session must have distinct names. You can create a lookup file parameter in any session that performs lookups on flat files. When you define the parameter in the parameter file, you can reference any lookup file local to the PowerCenter Server.

You can use a user-defined lookup file session parameter in either the Lookup Source File Directory or Lookup Source Filename session property.

Changing the Lookup File

You can use a lookup file parameter to change the name of the lookup file a session uses. In the Properties settings of the Mapping tab, enter the lookup file parameter in the Lookup Source Filename field. Then define the parameter in a parameter file. When the PowerCenter Server runs the session, it connects to the directory listed in the Lookup Source File Directory field and reads the lookup file listed in the parameter file.

Figure 18-4 shows how to use a lookup file parameter with a lookup directory:

Figure 18-4. Using Parameters to Change the Session Lookup File

Lookup File Directory

Lookup file name in the parameter file


For example, in a session, you leave Lookup File Directory set to its default, the $PMLookupFileDir server variable. For the lookup file name, you create a session parameter named $LookupFile_orders. In the parameter file, you set $LookupFile_orders to “orders.txt”. When you registered the PowerCenter Server, you set $PMLookupFileDir to C:/Program Files/Informatica/PowerCenter Server/LkpFiles. When the PowerCenter Server runs the session, it reads the orders.txt file in the C:/Program Files/Informatica/PowerCenter Server/LkpFiles directory.

Changing the Lookup File and Directory

You can use a lookup file parameter to change both the lookup file and directory used by a session. When you specify both the lookup file and directory in the Lookup Source Filename field, you need to clear the Lookup Source File Directory field. The PowerCenter Server concatenates both fields to determine where to find the indicated lookup file.

Steps for Using a Lookup File Parameter

Use a lookup file parameter when you want to change the lookup file and/or location between session runs.

To use a lookup file parameter:

1. Select a Lookup transformation on the Mapping tab.

2. Go to the Properties settings.


3. In the Lookup Source Filename field, enter the lookup file parameter name.

Name all lookup file parameters $LookupFileName.

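4. If you want the parameter to represent both the lookup file name and location, clear the Lookup Source File Directory field.

5. In the General Options settings of the Properties tab, enter a parameter file and directory in the Parameter Filename field.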


6. Click OK.

Before you run the session, create the parameter file in the specified directory and define the lookup file parameter. For details, see “Parameter Files” on page 511.


Reject File Parameters

You can create user-defined reject file session parameters. Use a reject file parameter when you want to change the name or location of session reject files between session runs. Name all reject file session parameters with the prefix $BadFile, followed by any alphanumeric and underscore characters. All reject file parameters within a session need to have distinct names. You can create a reject file parameter for any target in a session. When you define the parameter in a parameter file, you can reference any directory local to the PowerCenter Server.

You can use a user-defined reject file session parameter in either the Reject File Directory or Reject Filename session property.

Changing the Reject File Name

You can use a reject file parameter to change the name of a reject file a session uses. In the Properties settings of the Mapping tab, enter the reject file parameter in the Reject Filename field. Then define the parameter in the parameter file. When the PowerCenter Server runs the session, it locates the directory listed in the Reject File Directory field and creates the reject file listed in the parameter file. If the reject file already exists, it appends rejected data to the existing reject file.

Figure 18-5 shows how to use a reject file parameter with a reject file directory:

Figure 18-5. Using Parameters to Change the Reject File Name (callouts: reject file directory; reject file name in the parameter file)


For example, you want to rename reject files between sessions to keep rejected data from different session runs in different files. In a session, you leave Reject File Directory set to its default, the $PMBadFileDir server variable. For the reject file name, you create a session parameter named $BadFileName. In the parameter file, you set $BadFileName to “FirstRun.bad”. When you registered the PowerCenter Server, you set $PMBadFileDir to C:/Program Files/Informatica/PowerCenter Server/BadFiles. When the PowerCenter Server runs the session, it creates the FirstRun.bad file in the C:/Program Files/Informatica/PowerCenter Server/BadFiles directory.

Changing the Reject File and Directory

You can use a reject file parameter to change both the directory and name for session reject files. When you specify both the reject file and directory in the Reject Filename field, you need to clear the Reject File Directory field. The PowerCenter Server concatenates both fields to determine where to find the indicated reject file.

For example, you use a database connection parameter to configure a session to write to different target databases. Instead of having the PowerCenter Server append rejected data from all sessions to the same reject file, you want to have a reject file for each target system. In the session, you name the reject file $BadFileName and clear the Reject File Directory field. In the parameter file, you set $BadFileName to the reject filename and directory for the target database used in the session. When you change the database connection parameter to a different database, you can also change the reject filename and directory.

Alternatively, you can create a different parameter file for each target system. You can then use pmcmd to specify which parameter file to use when you start the session.
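One way to prepare a separate parameter file for each target system is to generate the files from a small table of values. The Python sketch below is a minimal illustration under stated assumptions: the folder and session name MyFolder.s_LoadOrders, the connection names, the reject file paths, and the .par file extension are all hypothetical, and $DBConnection_target stands in for whatever database connection parameter the session uses.

```python
from pathlib import Path
import tempfile

# Hypothetical values for two target systems.
TARGETS = {
    "sales": {
        "$DBConnection_target": "sales",
        "$BadFileName": "D:/rejects/sales.bad",
    },
    "marketing": {
        "$DBConnection_target": "marketing",
        "$BadFileName": "D:/rejects/marketing.bad",
    },
}


def write_parameter_files(heading: str, targets: dict, out_dir: Path) -> list:
    """Write one parameter file per target system and return the paths."""
    paths = []
    for name, params in targets.items():
        lines = [f"[{heading}]"]
        # No spaces around '=': extra spaces could become part of the value.
        lines += [f"{key}={value}" for key, value in params.items()]
        path = out_dir / f"{name}.par"  # hypothetical file naming
        path.write_text("\n".join(lines) + "\n")
        paths.append(path)
    return paths


out_dir = Path(tempfile.mkdtemp())
for path in write_parameter_files("MyFolder.s_LoadOrders", TARGETS, out_dir):
    print(path.name)
    print(path.read_text())
```

Each generated file sets the connection and the reject file together, so choosing a parameter file when you start the session switches both at once.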

Steps for Using a Reject File Parameter

Use a reject file parameter when you want to change the reject file and/or location between session runs.

To use a reject file parameter:

1. Go to the Properties settings of the Mapping tab.

2. In the Reject Filename field, enter the reject file parameter name.

Name all reject file parameters $BadFileName.

3. If you want the parameter to represent both the reject file name and location, clear the Reject File Directory field.


5. Click OK.

Before you run the session, create the parameter file in the specified directory and define the reject file parameter. For details, see “Parameter Files” on page 511.


Tips

Use reject file and session log parameters in conjunction with target file or target database connection parameters.

When you use a target file or target database connection parameter with a session, you can keep track of reject files by using a reject file parameter to write the reject file to the target machine. You can also use the session log parameter to write the session log to the target machine.


Chapter 19

Parameter Files


♦ Overview, 512

♦ Parameter File Format, 513

♦ Guidelines for Creating Parameter Files, 515

♦ Sample Parameter File, 517

♦ Configuring the Parameter File Location, 518

♦ Troubleshooting, 520

♦ Tips, 521


Overview

You can use a parameter file to define the values for parameters and variables used in a workflow, worklet, or session. You can create a parameter file using a text editor such as WordPad or Notepad. You list the parameters or variables and their values in the parameter file. Parameter files can contain the following types of parameters and variables:

♦ Workflow variables

♦ Worklet variables

♦ Session parameters

♦ Mapping parameters and variables

When you use parameters or variables in a workflow, worklet, or session, the PowerCenter Server checks the parameter file to determine the start value of the parameter or variable. You can use a parameter file to initialize workflow variables, worklet variables, mapping parameters, and mapping variables. If you do not define start values for these parameters and variables, the PowerCenter Server checks for the start value of the parameter or variable in other places. For more information, see “Using Workflow Variables” on page 103 and “Mapping Parameters and Variables” in the Designer Guide.

You can place parameter files on the PowerCenter Server machine or on a local machine. Use a local parameter file if you do not have access to parameter files on the PowerCenter Server machine. When you use a local parameter file, pmcmd passes variables and values in the file to the PowerCenter Server. Local parameter files are used with the startworkflow pmcmd command. For more information, see “pmcmd Reference” on page 594.

You must define session parameters in a parameter file. Since session parameters do not have default values, when the PowerCenter Server cannot locate the value of a session parameter in the parameter file, it fails to initialize the session.

You can include parameter or variable information for more than one workflow, worklet, or session in a single parameter file by creating separate sections for each object within the parameter file.

You can also create multiple parameter files for a single workflow, worklet, or session and change the file that these tasks use as needed. To specify the parameter file the PowerCenter Server uses with a workflow, worklet, or session, you can do either of the following:

♦ Enter the parameter file name and directory in the workflow, worklet, or session properties.

♦ Start the workflow, worklet, or session using pmcmd and enter the parameter filename and directory in the command line. For details, see “Using pmcmd” on page 581.

If you enter a parameter file name and directory in both the workflow, worklet, or session properties and in the pmcmd command line, the PowerCenter Server uses the information you enter in the pmcmd command line.
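The precedence rule above can be restated as a small Python sketch. The function name and the example paths are hypothetical; the sketch only mirrors the rule in the text and does not call pmcmd.

```python
from typing import Optional


def effective_parameter_file(properties_value: Optional[str],
                             pmcmd_value: Optional[str]) -> Optional[str]:
    """Return the parameter file that takes effect for a run.

    A file named on the pmcmd command line overrides the file entered
    in the workflow, worklet, or session properties.
    """
    return pmcmd_value if pmcmd_value else properties_value


# Only the properties name a file.
print(effective_parameter_file("D:/params/weekly.txt", None))
# Both name a file: the pmcmd value is used.
print(effective_parameter_file("D:/params/weekly.txt", "D:/params/adhoc.txt"))
```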


Parameter File Format

When you enter values in a parameter file, you must precede the entries with a heading that identifies the workflow, worklet, or session whose parameters and variables you want to assign. You assign individual parameters and variables directly below this heading, entering each parameter or variable on a new line. You can list parameters and variables in any order for each task.

You can define the following heading formats:

♦ Workflow variables:

[folder name.WF:workflow name]

♦ Worklet variables:

[folder name.WF:workflow name.WT:worklet name]

♦ Worklet variables in nested worklets:

[folder name.WF:workflow name.WT:worklet name.WT:worklet name...]

♦ Session parameters, plus mapping parameters and variables:

[folder name.WF:workflow name.ST:session name]

or

[folder name.session name]

or

[session name]

Below each heading, you define parameter and variable values as follows:

parameter name=value

parameter2 name=value

variable name=value

variable2 name=value
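The heading formats listed above follow a regular pattern, which the following Python sketch assembles from its parts. The function build_heading is a hypothetical helper, not part of PowerCenter. It rejects combinations that the list above does not show, such as a worklet and a session in the same heading.

```python
from typing import Optional, Sequence


def build_heading(folder: Optional[str] = None,
                  workflow: Optional[str] = None,
                  worklets: Sequence[str] = (),
                  session: Optional[str] = None) -> str:
    """Build a parameter file heading in one of the listed formats."""
    if worklets and not workflow:
        raise ValueError("a worklet heading needs a workflow name")
    if workflow:
        if not folder:
            raise ValueError("a workflow heading needs a folder name")
        if worklets and session:
            raise ValueError("worklet plus session is not a listed format")
        parts = [folder, f"WF:{workflow}"]
        parts += [f"WT:{name}" for name in worklets]  # nested worklets in order
        if session:
            parts.append(f"ST:{session}")
    elif session:
        # [folder name.session name] or [session name]
        parts = [folder, session] if folder else [session]
    else:
        raise ValueError("give a workflow name or a session name")
    return "[" + ".".join(parts) + "]"


print(build_heading("Production", session="s_MonthlyCalculations"))
print(build_heading("ORDERS", "wf_PARAM_FILE",
                    worklets=["WL_PARAM_Lvl_1", "NWL_PARAM_Lvl_2"]))
```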

For example, you have a session, s_MonthlyCalculations, in the Production folder. The session uses a string mapping parameter, $$State, that you want to set to “MA”, and a datetime mapping variable, $$Time. $$Time already has an initial value of “9/30/2000 00:00:00” saved in the repository, but you want to override this value with “10/1/2000 00:00:00”. The session also uses session parameters to connect to source files and target databases, as well as to write the session log to the appropriate session log file.

Table 19-1 shows the parameters and variables that you define in the parameter file:

Table 19-1. Parameters and Variables in Parameter File

Parameter and Variable Type | Parameter and Variable Name | Desired Definition
String Mapping Parameter | $$State | MA
Datetime Mapping Variable | $$Time | 10/1/2000 00:00:00
Source File (Session Parameter) | $InputFile1 | Sales.txt
Database Connection (Session Parameter) | $DBConnection_Target | Sales (database connection)
Session Log File (Session Parameter) | $PMSessionLogFile | d:/session logs/firstrun.txt

The parameter file for the session includes the folder and session name, as well as each parameter and variable:

[Production.s_MonthlyCalculations]

$$State=MA

$$Time=10/1/2000 00:00:00

$InputFile1=sales.txt

$DBConnection_target=sales

$PMSessionLogFile=D:/session logs/firstrun.txt

The next time you run the session, you might edit the parameter file to change the state to MD and delete the $$Time variable. This allows the PowerCenter Server to use the value for the variable that was set in the previous session run.


Guidelines for Creating Parameter Files

Use the following guidelines when creating parameter files:

♦ Capitalize folder and session names as necessary. Folder and session names are case-sensitive in the parameter file.

♦ Enter folder names for non-unique session names. When a session name exists more than once in a repository, enter the folder name to indicate the location of the session.

♦ Create one or more parameter files. You assign parameter files to workflows, worklets, and sessions individually. You can specify the same parameter file for all of these tasks or create several parameter files.

♦ When you want to include parameter and variable information for more than one session in the file, create a new section for each session as follows. The folder name is optional.

[folder_name.session_name]

parameter_name=value

variable_name=value

mapplet_name.parameter_name=value

[folder2_name.session_name]

parameter_name=value

variable_name=value


♦ Specify headings in any order. You can place headings in any order in the parameter file. However, if you define the same parameter or variable more than once in the file, the PowerCenter Server assigns the parameter or variable value using the first instance of the parameter or variable.

♦ Specify parameters and variables in any order. Below each heading, you can specify the parameters and variables in any order.

♦ When defining parameter values, do not use unnecessary line breaks or spaces. The PowerCenter Server might interpret additional spaces as part of the value.

♦ List all necessary mapping parameters and variables. Values entered for mapping parameters and variables become the start value for parameters and variables in a mapping. Mapping parameter and variable names are not case-sensitive.

♦ List all session parameters. Session parameters do not have default values. An undefined session parameter can cause the session to fail. Session parameter names are not case-sensitive.

♦ Use correct date formats for datetime values. When entering datetime values, use the following date formats:

− MM/DD/RR

− MM/DD/RR HH24:MI:SS


− MM/DD/YYYY

− MM/DD/YYYY HH24:MI:SS

♦ Do not enclose parameters or variables in quotes. The PowerCenter Server interprets everything after the equal sign as part of the value.

♦ Precede parameters and variables created in mapplets with the mapplet name as follows:

mapplet_name.parameter_name=value

mapplet2_name.variable_name=value
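The datetime formats listed in the guidelines above can be checked before a parameter file is used. The Python sketch below is one way to do that, under an assumption worth stating: RR is treated as a plain two-digit year (%y), so the sketch validates the shape of a value only and does not model how a two-digit year is mapped to a century. The function name is hypothetical.

```python
from datetime import datetime

# strptime equivalents of the four listed formats.
# Assumption: RR is approximated by %y (two-digit year).
ACCEPTED_FORMATS = (
    "%m/%d/%y",            # MM/DD/RR
    "%m/%d/%y %H:%M:%S",   # MM/DD/RR HH24:MI:SS
    "%m/%d/%Y",            # MM/DD/YYYY
    "%m/%d/%Y %H:%M:%S",   # MM/DD/YYYY HH24:MI:SS
)


def is_accepted_datetime(value: str) -> bool:
    """Return True if the value matches one of the listed formats."""
    for fmt in ACCEPTED_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue  # try the next format
    return False


for candidate in ("10/1/2000 00:00:00", "02/01/00", "2000-10-01"):
    print(candidate, is_accepted_datetime(candidate))
```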


Sample Parameter File

The following text is an excerpt from a parameter file:

[HET_TGTS.WF:wf_TCOMMIT_INST_ALIAS]

$$platform=unix

[HET_TGTS.WF:wf_TGTS_ASC_ORDR.ST:s_TGTS_ASC_ORDR]

$$platform=unix

$DBConnection_ora=qasrvrk2_hp817

[ORDERS.WF:wf_PARAM_FILE.WT:WL_PARAM_Lvl_1]

$$DT_WL_lvl_1=02/01/2000 00:00:00

$$Double_WL_lvl_1=2.2

[ORDERS.WF:wf_PARAM_FILE.WT:WL_PARAM_Lvl_1.WT:NWL_PARAM_Lvl_2]

$$DT_WL_lvl_2=03/01/2000 00:00:00

$$Int_WL_lvl_2=3

$$String_WL_lvl_2=ccccc
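To make the file layout concrete, the following Python sketch reads text in this format into a dictionary keyed by heading. It is a minimal reader written for illustration, not the PowerCenter Server's own parser. It follows two guidelines from this chapter: everything after the equal sign is kept as the value, and the first definition of a repeated name wins. How the server treats lines outside any section is not described here, so the sketch simply skips them.

```python
def parse_parameter_file(text: str) -> dict:
    """Parse parameter file text into {heading: {name: value}}."""
    sections = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("[") and line.endswith("]"):
            # Headings are case-sensitive, so keep them exactly as written.
            current = sections.setdefault(line[1:-1], {})
            continue
        if current is None or "=" not in line:
            continue  # assumption: ignore lines this sketch cannot place
        name, value = line.split("=", 1)
        # Keep the value verbatim; setdefault keeps the first definition.
        current.setdefault(name, value)
    return sections


SAMPLE = """\
[HET_TGTS.WF:wf_TCOMMIT_INST_ALIAS]
$$platform=unix
[HET_TGTS.WF:wf_TGTS_ASC_ORDR.ST:s_TGTS_ASC_ORDR]
$$platform=unix
$DBConnection_ora=qasrvrk2_hp817
[ORDERS.WF:wf_PARAM_FILE.WT:WL_PARAM_Lvl_1]
$$DT_WL_lvl_1=02/01/2000 00:00:00
$$Double_WL_lvl_1=2.2
"""

for heading, params in parse_parameter_file(SAMPLE).items():
    print(heading, params)
```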


Configuring the Parameter File Location

You can specify the parameter filename and directory in the workflow or session properties.

To enter a parameter file in the workflow properties:

1. Select Workflows-Edit.

2. Click the Properties tab.

3. Enter the parameter directory and name in the Parameter Filename field.

You can enter either a direct path or a server variable directory. Use the appropriate delimiter for the PowerCenter Server operating system.

4. Click OK.

To enter a parameter file in the session properties:

1. Click the Properties tab and open the General Options settings.

2. Enter the parameter directory and name in the Parameter Filename field.



3. You can enter either a direct path or a server variable directory. Use the appropriate delimiter for the PowerCenter Server operating system.

4. Click OK.



Troubleshooting

I have a section in a parameter file for a session, but the PowerCenter Server does not seem to read it.

In the parameter file, folder and session names are case-sensitive. Make sure to enter folder and session names exactly as they appear in the Workflow Manager. Also, use the appropriate prefix for all user-defined session parameters.

Table 19-2 describes required naming conventions for user-defined session parameters:

I am trying to use a source file parameter to specify a source file and location, but the PowerCenter Server cannot find the source file.

Make sure to clear the source file directory in the session properties. The PowerCenter Server concatenates the source file directory with the source file name to locate the source file.

Also, make sure to enter a directory local to the PowerCenter Server and to use the appropriate delimiter for the operating system.

I am trying to run a workflow with a parameter file and one of the sessions keeps failing.

The session might contain a parameter that is not listed in the parameter file. The PowerCenter Server uses the parameter file to start all sessions in the workflow. Check the session properties, then verify that all session parameters are defined correctly in the parameter file.
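A quick way to carry out this check is to compare the session parameters a session uses against the names defined in its parameter file section. The Python sketch below assumes you have already collected both lists. The function name and the sample values are hypothetical, and the comparison ignores case because session parameter names are not case-sensitive.

```python
def missing_session_parameters(required, defined):
    """Return the required parameter names that have no definition.

    Names are compared without regard to case, following the guideline
    that session parameter names are not case-sensitive.
    """
    defined_lower = {name.lower() for name in defined}
    return [name for name in required if name.lower() not in defined_lower]


# Hypothetical session: three parameters used, two defined in the file.
required = ["$InputFile1", "$DBConnection_target", "$BadFileName"]
defined = {"$inputfile1": "sales.txt", "$DBConnection_Target": "sales"}

print(missing_session_parameters(required, defined))
```

Any name the function returns is a candidate cause of the failure, since session parameters have no default values.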

Table 19-2. Naming Conventions for User-Defined Session Parameters

Parameter Type | Naming Convention
Database Connection | $DBConnectionName
Reject File | $BadFileName
Source File | $InputFileName
Target File | $OutputFileName
Lookup File | $LookupFileName
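The prefixes in Table 19-2 lend themselves to a simple automated check. The Python sketch below classifies a user-defined session parameter name by its prefix. It assumes that the suffix rule stated for reject file parameters (alphanumeric and underscore characters) applies to the other types as well, and it matches without regard to case. The function name is hypothetical.

```python
import re

# Prefixes taken from Table 19-2.
PREFIXES = {
    "Database Connection": "$DBConnection",
    "Reject File": "$BadFile",
    "Source File": "$InputFile",
    "Target File": "$OutputFile",
    "Lookup File": "$LookupFile",
}


def parameter_type(name: str):
    """Return the parameter type whose prefix the name uses, or None."""
    for kind, prefix in PREFIXES.items():
        # Assumption: the suffix is any run of letters, digits, underscores.
        pattern = re.escape(prefix) + r"[A-Za-z0-9_]*"
        if re.fullmatch(pattern, name, flags=re.IGNORECASE):
            return kind
    return None


for name in ("$BadFile_orders", "$LookupFile_orders", "$RejectFile1"):
    print(name, parameter_type(name))
```

A result of None flags a name, such as $RejectFile1 here, that does not follow any of the user-defined conventions.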


Tips

Use a single parameter file to group parameter information for related sessions.

When sessions are likely to use the same database connection or directory, you might want to include them in the same parameter file. When existing systems are upgraded, you can update information for all sessions by editing one parameter file.

Use pmcmd and multiple parameter files for sessions with regular cycles.

Some sessions cycle through a fixed set of parameter values, reusing the same values on a regular basis. For example, if you run a session against both the sales and marketing databases once a week, you might want to create a separate parameter file for each regular session run. Then, instead of changing the parameter file in the session properties each time you run the session, use pmcmd to specify the parameter file to use when you start the session.



Chapter 20

External Loading


♦ Overview, 524

♦ External Loader Permissions, 525

♦ External Loader Behavior, 526

♦ Loading to DB2, 528

♦ Loading to Oracle, 533

♦ Loading to Sybase IQ, 535

♦ Loading to Teradata, 538

♦ Creating an External Loader Connection, 551

♦ Configuring External Loading in a Session, 553

♦ Troubleshooting, 557


Overview

You can configure a session to use DB2, Oracle, Sybase IQ, and Teradata external loaders to load session target files into the respective databases. External loaders can increase session performance because these databases can load information directly from files faster than they can run the SQL commands to insert the same data into the database.

To use an external loader for a session, you must perform the following tasks:

1. Create an external loader connection in the Workflow Manager and configure the external loader attributes. For details on creating external loader connections, see “Creating an External Loader Connection” on page 551.

2. Configure the session to write to a flat file instead of to a relational database. For more information, see “Configuring a Session to Write to a File” on page 553.

3. Choose an external loader connection for each target file in the session properties. For more information, see “Selecting an External Loader Connection” on page 555.

When you run a session that uses an external loader, the PowerCenter Server creates a control file and a target flat file. The control file contains information about the target flat file such as data format and loading instructions for the external loader. The control file has an extension of .ctl. You can view the control file and the target flat file in the target file directory (default: $PMTargetFileDir).

The PowerCenter Server waits for all external loading to complete before it performs post-session commands, runs external procedures, and sends post-session email.

Before you run external loaders, consider the following issues:

♦ Disable constraints. Normally, you disable constraints built into the tables receiving the data before performing the load. Consult your database documentation for instructions on how to disable constraints.

♦ Performance issues. To preserve high performance, you can increase commit intervals and turn off database logging. However, to perform database recovery on failed sessions, you must have database logging turned on.

♦ Code page requirements. DB2, Oracle, Sybase IQ, and Teradata database servers must run in the same code page as the target flat file code page. The external loaders start in the target flat file code page. The PowerCenter Server creates the control and target flat files using the target flat file code page. If you are using a code page other than 7-bit ASCII for the target flat file, run the PowerCenter Server in Unicode data movement mode.

The PowerCenter Server can use multiple external loaders within one session. For example, if the mapping contains two targets, you can create a session that uses different connection types: one uses an Oracle external loader connection and the other uses a Sybase IQ external loader connection.


External Loader Permissions

You can set external loader connection permissions in the connection object in the Workflow Manager. The Workflow Manager assigns Owner permissions to the user who registers the connection. The Workflow Manager grants Owner Group permissions to the first group in the Group Memberships list of the owner. You can manage external loader permissions if you are the owner of the external loader connection or if you have Super User privileges.

If you want to edit an external loader connection, you must have read and write permissions for the connection. If you want to run sessions that use a target external loader connection, you must have at least execute permission for the connection.

Permissions and Privileges

To create an external loader connection, you must have one of the following privileges:


♦ Super User

To configure a session to use an external loader, you must have one of the following sets of privileges and permissions:

♦ Use Workflow Manager privilege and folder read and write permissions

♦ Super User

If you enabled enhanced security, you must also have read permission for external loader connections associated with the session.


External Loader Behavior

The behavior of the external loader depends on how you choose to load the data. You can load data in the following ways:

♦ Loading to named pipes. When you load data to named pipes, the external loader starts to load data to the target database as soon as the data appears in the named pipe.

♦ Staging data using flat files. When you stage data in flat files, the external loader starts to load data to the target databases only after the PowerCenter Server completes writing to the target flat files.

Loading Data Using Named Pipes

On UNIX, the PowerCenter Server writes to a named pipe, which is named after the configured target file name. The external loader starts to load data to the database as soon as the data appears in the named pipe. When you use external loaders on UNIX, the loader deletes the named pipe as soon as it completes the load.

On Windows, when you load data using named pipes, the PowerCenter Server writes data to a named pipe using the following format: \\.\pipe\<pipename>, where pipename is the same as the configured target name. If the PowerCenter Server finds a file or named pipe that uses the same name as the target flat file, it deletes the file or named pipe and recreates it.

If the PowerCenter Server on UNIX finds a file or named pipe (with the same name as the session target flat file) in the target directory, it deletes the file or named pipe and recreates the named pipe.
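The naming rules for the two platforms can be summarized in a short Python sketch. The function name, the target name orders.out, and the directory /data/tgt are hypothetical. Placing the UNIX pipe in the target file directory is an inference from the paragraph above, not a documented rule.

```python
def named_pipe_path(target_name: str, on_windows: bool,
                    target_dir: str = "") -> str:
    """Return the pipe name the text describes for each platform."""
    if on_windows:
        # Windows format from the text: \\.\pipe\<pipename>
        return "\\\\.\\pipe\\" + target_name
    # UNIX: the pipe is named after the configured target file name.
    # Assumption: it lives in the target file directory.
    if target_dir:
        return target_dir.rstrip("/") + "/" + target_name
    return target_name


print(named_pipe_path("orders.out", on_windows=True))
print(named_pipe_path("orders.out", on_windows=False, target_dir="/data/tgt"))
```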

Tip: You may not be able to create a named pipe or file if another file exists that uses the same name. You can rename the output file in the session that uses the external loader.

Staging Data to Flat Files

When you stage data using flat files, the external loader starts loading data to target databases only after the PowerCenter Server completes writing to the target flat files. The external loader does not delete the target flat files after loading them to the database. Make sure the target file directory can accommodate the size of the target flat files.

If the session contains fatal errors, the PowerCenter Server does not finish writing data to the target files, and the external loader does not start.

Partitioning Sessions with External Loaders

When you configure multiple partitions in a session with a flat file target, the PowerCenter Server creates a separate flat file for each partition. Some external loaders cannot load data from multiple files into the target. When you use an external loader in a session with multiple partitions, you must configure partitioning according to the external loader you use.


When you use an external loader that can load data from multiple files, you can create multiple partitions in the session. You choose an external loader connection for each partition. The PowerCenter Server creates an output file for each partition, and the external loader loads the output from each target file to the database.

If you use a loader that cannot load from multiple files, the session fails.

Table 20-1 lists the external loaders and loader behavior:

Errors and Error Messages

The PowerCenter Server writes external loader initialization and completion messages in the session log. For details on external loader performance, check the external loader log. The loader saves the log in the same directory as the target flat files (default location: $PMTargetFileDir). The default extension for external loader logs is .ldrlog.

Table 20-1. Partitioning Guidelines for External Loaders

External Loader | Load Behavior
DB2 EE db2load | Cannot load from multiple output files.
DB2 EEE autoloader | Cannot load from multiple output files.*
Oracle | Behavior based on parallel load configuration. Disabled: cannot load from multiple output files. Enabled: can load from multiple output files.
Sybase IQ | Cannot load from multiple output files.
Teradata MultiLoad | Cannot load from multiple output files.
Teradata TPump | Can load from multiple output files.
Teradata Fastload | Cannot load from multiple output files.
Teradata Warehouse Builder | Can load from multiple output files.

*The PowerCenter Server cannot pass multiple output files to the DB2 EEE autoloader.
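The guidelines in Table 20-1 can be encoded as a lookup so that a session design is checked before it runs. The Python sketch below is illustrative: the dictionary keys are labels chosen for this example, and the function is hypothetical rather than part of any PowerCenter tool.

```python
# True means the loader can load from multiple output files (Table 20-1).
MULTI_FILE_SUPPORT = {
    "DB2 EE": False,
    "DB2 EEE": False,
    "Oracle (parallel load disabled)": False,
    "Oracle (parallel load enabled)": True,
    "Sybase IQ": False,
    "Teradata MultiLoad": False,
    "Teradata TPump": True,
    "Teradata Fastload": False,
    "Teradata Warehouse Builder": True,
}


def partitioning_allowed(loader: str, partitions: int) -> bool:
    """Return True if the loader can serve a session with this many partitions."""
    if partitions <= 1:
        return True  # a single partition produces a single output file
    return MULTI_FILE_SUPPORT[loader]


print(partitioning_allowed("Teradata TPump", 4))
print(partitioning_allowed("Sybase IQ", 4))
```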


Loading to DB2

The DB2 EE external loader and DB2 EEE external loader can perform insert and replace operations on targets. The external loaders can also restart or terminate load operations.

The DB2 EE external loader invokes the db2load executable located in the PowerCenter Server installation directory. The DB2 EE external loader can load data to a DB2 server on a machine that is remote to the PowerCenter Server.

The DB2 EEE external loader invokes the IBM DB2 Autoloader program to load data. The Autoloader program uses the db2atld executable. The DB2 EEE external loader can partition data and load the partitioned data simultaneously to the corresponding database partitions. When you use the DB2 EEE external loader, the PowerCenter Server and the DB2 EEE server must be on the same machine.

The DB2 external loaders load from a delimited flat file. Verify that the target table columns are wide enough to store all of the data.

If you select a DB2 loader in a session with multiple partitions, the session fails. For more information about partitioning sessions with external loaders, see “Partitioning Sessions with External Loaders” on page 526.

If you configure multiple targets in the same pipeline to use DB2 external loaders, each loader must load to a different tablespace on the target database. For information on selecting external loaders, see “Configuring External Loading in a Session” on page 553.

When you load data to a DB2 database using the DB2 EE or DB2 EEE external loader, you must have the correct authority levels and privileges to load data to the database tables.

Setting DB2 External Loader Operation Modes

DB2 operation modes specify the type of load the external loader runs. You can configure the DB2 EE or DB2 EEE external loader to run in one of the following operation modes:

♦ Insert. Adds loaded data to the table without changing existing table data.

♦ Replace. Deletes all existing data from the table, and inserts the loaded data. The table and index definitions do not change.

♦ Restart. Restarts a previously interrupted load operation.

♦ Terminate. Terminates a previously interrupted load operation and rolls back the operation to the starting point, even if consistency points were passed. The tablespaces return to normal state, and all table objects are made consistent.

Configuring Authorities, Privileges, and Permissions

When you load data to a DB2 database using the DB2 EE or DB2 EEE external loader, you must have the correct authority levels and privileges to load data to the database tables.


DB2 privileges allow you to create or access database resources. Authority levels provide a method of grouping privileges and higher-level database manager maintenance and utility operations. Together, these act to control access to the database manager and its database objects. You can access objects for which you have the required privilege or authority.

To load data into a table, you must have one of the following authorities:

♦ SYSADM authority

♦ DBADM authority

♦ LOAD authority on the database, and one of the following privileges:

− INSERT privilege on the table when the load utility is invoked in INSERT mode, TERMINATE mode (to terminate a previous load insert operation), or RESTART mode (to restart a previous load insert operation)

− INSERT and DELETE privilege on the table when the load utility is invoked in REPLACE mode, TERMINATE mode (to terminate a previous load replace operation), or RESTART mode (to restart a previous load replace operation)
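For a user who holds LOAD authority, the table privileges needed depend on the operation mode and, for TERMINATE and RESTART, on the kind of load being terminated or restarted. The Python sketch below restates that rule. The function name and its arguments are hypothetical, and it does not cover users with SYSADM or DBADM authority.

```python
def required_table_privileges(mode: str, previous_mode: str = "") -> set:
    """Return the table privileges needed alongside LOAD authority.

    mode is INSERT, REPLACE, TERMINATE, or RESTART. For TERMINATE and
    RESTART, previous_mode names the interrupted load (INSERT or REPLACE).
    """
    mode = mode.upper()
    if mode in ("TERMINATE", "RESTART"):
        effective = previous_mode.upper()
    else:
        effective = mode
    if effective == "INSERT":
        return {"INSERT"}
    if effective == "REPLACE":
        return {"INSERT", "DELETE"}
    raise ValueError("mode or previous_mode must be INSERT or REPLACE")


print(sorted(required_table_privileges("INSERT")))
print(sorted(required_table_privileges("RESTART", previous_mode="REPLACE")))
```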

In addition, you must have proper read access and read/write permissions:

♦ The database instance owner must have read access to the external loader input files.

♦ If you run DB2 as a service on Windows, you must configure the service start account with a user account that has read/write permissions to use LAN resources, including drives, directories, and files.

♦ If you load to DB2 EEE, the database instance owner must have write access to the load dump file and the load temporary file.

For more information, consult your IBM DB2 database documentation.

Configuring DB2 EE External Loader Attributes

Table 20-2 describes attributes for DB2 EE external loader connections:

Table 20-2. DB2 EE External Loader Attributes

Attributes | Default Value | Description
Opmode | Insert | The DB2 external loader operation mode. Choose one of the following operation modes: Insert, Replace, Restart, or Terminate. For more information about DB2 operation modes, see “Setting DB2 External Loader Operation Modes” on page 528.
External Loader Executable | db2load | The name of the DB2 EE external loader executable file.
DB2 Server Location | Remote | The location of the DB2 EE database server relative to the PowerCenter Server. Select Local if the DB2 EE database server resides on the PowerCenter Server machine. Select Remote if the DB2 EE Server resides on another machine.
Is Staged | Disabled | The method of loading data. Select Is Staged to load data to a flat file staging area before loading to the database. Otherwise, the data is loaded to the database using a named pipe. For more information, see “Loading Data Using Named Pipes” on page 526 or “Staging Data to Flat Files” on page 526.
Recoverable | Enabled | Sets tablespaces in backup pending state if forward recovery is enabled. If you disable forward recovery, the DB2 tablespace is not set to backup pending state. If the DB2 tablespace is in backup pending state, you must fully back up the database before you perform any other operation on the tablespace.

DB2 EE External Loader Return Codes

The DB2 EE external loader indicates the success or failure of a load operation with a return code. The PowerCenter Server writes the external loader return code to the session log. Return code (0) indicates that the load operation succeeded. The PowerCenter Server writes the following message to the session log if the external loader successfully completes the load operation:

WRT_8029 External loader process <external loader name> exited successfully.

Any other return code indicates that the load operation failed. The PowerCenter Server writes the following error message to the session log:

WRT_8047 Error: External loader process <external loader name> exited with error <return code>.

Table 20-3 describes the return codes for the DB2 EE external loader:

Table 20-3. DB2 EE External Loader Return Codes

Code | Description
0 | The external loader operation completed successfully.
1 | The external loader cannot locate the control file.
2 | The external loader could not open the external loader log file.
3 | The external loader could not access the control file because the control file is locked by another process.
4 | The DB2 database returned an error.

Configuring DB2 EEE External Loader Attributes

You can configure the DB2 EEE external loader to use different loading modes when loading to the database. Loading modes determine how the DB2 EEE external loader loads data across partitions in the database. You can configure the DB2 EEE external loader to use the following loading modes:

♦ Split and load. The DB2 EEE external loader partitions the data and loads it simultaneously on the corresponding database partitions.

♦ Split only. The DB2 EEE external loader partitions the data and writes the output to files in the specified split file directory.

♦ Load only. The DB2 EEE external loader does not partition the data. It loads data in existing split files on the corresponding database partitions.

♦ Analyze. The DB2 EEE external loader generates an optimal partitioning map with even distribution across all database partitions. If you run the external loader in split and load mode after you run it in analyze mode, the external loader uses the optimal partitioning map to partition the data.

For more information about DB2 loading modes, consult your DB2 database documentation.

The DB2 EEE external loader also writes multiple external loader logs. The number of external loader logs depends on the number of database partitions to which the external loader loads data. For each partition, the external loader appends a number corresponding to the partition number to the external loader log file name. The DB2 EEE external loader log file format is file_name.ldrlog.partition_number.

The PowerCenter Server does not archive or overwrite DB2 EEE external loader logs. If an external loader log of the same name exists when the external loader runs, the external loader appends new external loader log messages to the end of the existing external loader log file. You must manually archive or delete the external loader log files. For details on log files generated by DB2 Autoload, consult your DB2 documentation.
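Because the logs are never archived automatically, it can help to know in advance which log files a run produces. The following Python sketch builds the expected names from the file_name.ldrlog.partition_number format described above. The function name and the sample partition numbers are illustrative assumptions, not part of PowerCenter or DB2.

```python
def db2_eee_loader_logs(file_name: str, partition_numbers: list[int]) -> list[str]:
    """Return the expected external loader log names, one per database partition.

    Follows the documented format file_name.ldrlog.partition_number.
    The partition numbers passed in are whatever your DB2 EEE setup uses.
    """
    return [f"{file_name}.ldrlog.{number}" for number in partition_numbers]


# Hypothetical example: a target file named "orders" loaded to partitions 0, 1, and 2.
for log_name in db2_eee_loader_logs("orders", [0, 1, 2]):
    print(log_name)
```

A list like this could drive a manual archive or cleanup step before the next session run.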

For information on DB2 EEE external loader return codes, consult your DB2 documentation.

Table 20-4 describes attributes for DB2 EEE external loader connections:

Table 20-4. DB2 EEE External Loader Attributes

| Attribute | Default Value | Description |
|---|---|---|
| Opmode | Insert | The DB2 external loader operation mode. Choose one of the following operation modes: Insert, Replace, Restart, or Terminate. For more information about DB2 operation modes, see “Setting DB2 External Loader Operation Modes” on page 528. |
| External Loader Executable | db2atld | The name of the DB2 EEE external loader executable file. |
| Split File Location | n/a | The location of the split files. The external loader creates split files if you configure SPLIT_ONLY loading mode. |
| Output Nodes | n/a | The database partitions on which the load operation is to be performed. |
| Split Nodes | n/a | The database partitions that determine how to split the data. If you do not specify this attribute, the external loader automatically determines an optimal splitting method. |
| Mode | Split and load | The loading mode the external loader uses to load the data. Choose one of the following loading modes: Split and load, Split only, Load only, or Analyze. |
| Max Num Splitters | 25 | Maximum number of splitter processes. |
| Force | No | Forces the external loader operation to continue even if it determines at startup time that some target partitions or tablespaces are offline. |
| Status Interval | 100 | Number of megabytes of data the external loader loads before writing a progress message to the external loader log. You can specify a value between 1 and 4,000 MB. |
| Ports | 6000-6063 | The range of TCP ports the external loader uses to create sockets for internal communications with the DB2 server. |
| Check Level | Nocheck | Specifies whether the external loader should check for record truncation during input or output. |
| Map File Input | n/a | The name of the file that specifies the partitioning map. If you want to use a customized partitioning map, you must specify this attribute. You can generate a customized partitioning map when you run the external loader in Analyze loading mode. |
| Map File Output | n/a | The name of the partitioning map when you run the external loader in Analyze loading mode. You must specify this attribute if you want to run the external loader in Analyze loading mode. |
| Trace | 0 | The number of rows the external loader traces when you need to review a dump of the data conversion process and output of hashing values. |
| Date Format | mm/dd/yyyy | The date format. The date format in the Connection Object definition must match the date format you define in the target definition. DB2 supports the following date formats: mm/dd/yyyy, yyyy-mm-dd, and dd.mm.yyyy. |



Loading to Oracle

The Oracle SQL loader can perform insert, update, and delete operations on targets. The target flat file for an Oracle external loader can be fixed-width or delimited.

Loading Multibyte Data to Oracle

When you load multibyte data to Oracle, data precision is measured in bytes for fixed-width files and in characters for delimited files. Make sure the target table columns are wide enough to store all the data without risking data truncation. To widen the columns, increase the column size in the target table definition.

Oracle supports character-oriented datatypes, such as Nchar, where the precision is measured in characters. If you use the Nchar datatype, multiply the maximum number of characters by K, where K is the maximum number of bytes a character contains in the selected target code page. This ensures that the PowerCenter Server does not truncate data before loading the target file.
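The multiplication by K can be sketched in a few lines of Python. This is a minimal illustration only: the function name is hypothetical, and the value of K depends on the target code page you select, so it is passed in rather than assumed.

```python
def nchar_precision_in_bytes(max_chars: int, max_bytes_per_char: int) -> int:
    """Return the byte precision needed for an Nchar column.

    max_chars          -- maximum number of characters the column holds
    max_bytes_per_char -- K, the maximum bytes per character in the target code page
    """
    if max_chars < 0 or max_bytes_per_char < 1:
        raise ValueError("max_chars must be >= 0 and max_bytes_per_char must be >= 1")
    return max_chars * max_bytes_per_char


# Hypothetical example: 20 characters in a code page where K is 3.
print(nchar_precision_in_bytes(20, 3))
```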

Note: If you configure a session to write to an Oracle 8 table in bulk mode with NOT NULL constraints on any columns, the session may write null data into a NOT NULL column.

Oracle External Loader Attributes

Use the following guidelines when you enter attributes for the Oracle external loader connection:

♦ If you select an Oracle external loader, the default external loader executable name is SQLLOAD. This is accurate for most UNIX platforms, but if you use Windows, check your Oracle documentation to find the name of the external loader executable.

♦ Select Do Not Enable Parallel Load to write to a non-partitioned Oracle target table.

♦ To write to a partitioned Oracle target using Direct Path, you must select Enable Parallel Load and Append load mode.

♦ To write to a partitioned Oracle target using Conventional Path, select Enable Parallel Load for best performance.

Tip: For optimal performance, select Direct Path when writing to a partitioned Oracle target. For details, see your Oracle documentation.
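The parallel load guidelines above can be expressed as a small check. The following Python sketch is an illustration under stated assumptions: the function and its parameters are hypothetical, and the string values mirror the option names used in this guide rather than any PowerCenter API.

```python
def check_oracle_loader_settings(target_partitioned: bool, load_method: str,
                                 parallel_load: bool, load_mode: str) -> list[str]:
    """Return a list of guideline violations or recommendations (empty if none)."""
    problems = []
    direct_path = load_method.startswith("Use Direct Path")

    if not target_partitioned:
        # Non-partitioned targets use Do Not Enable Parallel Load.
        if parallel_load:
            problems.append("Non-partitioned target: select Do Not Enable Parallel Load.")
        return problems

    if direct_path:
        # Direct Path to a partitioned target requires both settings.
        if not parallel_load:
            problems.append("Partitioned target with Direct Path: select Enable Parallel Load.")
        if load_mode != "Append":
            problems.append("Partitioned target with Direct Path: select Append load mode.")
    elif not parallel_load:
        # Conventional Path works without it, but the guide recommends enabling it.
        problems.append("Partitioned target with Conventional Path: "
                        "Enable Parallel Load is recommended for best performance.")
    return problems


print(check_oracle_loader_settings(True, "Use Direct Path (Recoverable)", False, "Replace"))
```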


Table 20-5 describes the attributes for Oracle external loader connections:

Table 20-5. Oracle External Loader Attributes

| Attribute | Default Value | Description |
|---|---|---|
| Error Limit | 1 | Number of errors to allow before the external loader stops the load operation. |
| Load Mode | Append | The loading mode the external loader uses to load data. Choose one of the following loading modes: Append, Insert, Replace, or Truncate. |
| Load Method | Use Conventional Path | The method the external loader uses to load data. Choose one of the following load methods: Use Conventional Path, Use Direct Path (Recoverable), or Use Direct Path (Unrecoverable). |
| Enable Parallel Load | Enable Parallel Load | Determines whether the Oracle external loader loads data in parallel to a partitioned Oracle target table. Choose either Enable Parallel Load or Do Not Enable Parallel Load. You can create multiple partitions in a session if you use a loader configured to enable parallel load. Sessions with multiple partitions fail if you use a loader configured not to enable parallel load. For more information, see “Partitioning Sessions with External Loaders” on page 526. |
| Rows Per Commit | 10000 | For Conventional Path load method, this attribute specifies the number of rows in the bind array for load operations. For Direct Path load methods, this attribute specifies the number of rows the external loader reads from the target flat file before it saves the data to the database. |
| External Loader Executable | sqlload | The name of the external loader executable file. |
| Log File Name | n/a | The path and name of the external loader log file. |

Reject File

The Oracle external loader creates a reject file for data rejected by the database. The reject file has an extension of .ldrreject. The loader saves the reject file in the target files directory (default location: $PMTargetFileDir).



Loading to Sybase IQ

The Sybase external loader can perform insert operations on Sybase IQ targets. It cannot perform update or delete operations on targets.

Use the following rules and guidelines when you work with a Sybase IQ external loader:

♦ Ensure that target tables do not violate primary key constraints.

♦ Configure a Sybase IQ user with read/write access before you use a Sybase IQ external loader.

♦ Target flat files for a Sybase IQ external loader can be fixed-width or delimited.

♦ The PowerCenter Server can load multibyte data to Sybase IQ targets.

♦ If you select a Sybase IQ external loader in a session with multiple partitions, the session fails. For more information about partitioning sessions with external loaders, see “Partitioning Sessions with External Loaders” on page 526.

♦ If the PowerCenter Server and Sybase IQ Server are on different machines, map a drive from the machine hosting the PowerCenter Server to the machine hosting the Sybase IQ Server. In a UNIX environment, mount the drive.

Using Sybase IQ External Loader on UNIX

For Sybase IQ external loaders, the PowerCenter Server can write to a named pipe if the PowerCenter Server is local to the Sybase IQ database. Use pmconfig to enable the SybaseIQLocaltoPMServer option. If the PowerCenter Server is not local to the Sybase IQ database server or if you do not enable the option, the PowerCenter Server writes to a flat file.

Loading Multibyte Data to Sybase IQ

When you load multibyte data to Sybase IQ targets, consider the following issues involving data precision and delimiters.

Fixed-Width Flat File Targets

If you plan to load multibyte data into a fixed-width flat file target, configure the precision to accommodate the multibyte data. Fixed-width files are byte-oriented, not character-oriented. So when you configure the precision for a fixed-width target, you need to consider the number of bytes you load into the target, rather than the number of characters. The PowerCenter Server writes the row to the reject file if the precision is not large enough to accommodate the multibyte data.

For more information about writing to flat files, see “Working with File Targets” on page 261.
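The difference between counting characters and counting bytes can be illustrated with a short Python sketch. The function name is hypothetical, and UTF-8 is used only as an example encoding. Substitute the code page of your target.

```python
def fits_fixed_width(value: str, precision_bytes: int, encoding: str = "utf-8") -> bool:
    """Return True if the encoded value fits a byte-oriented fixed-width column."""
    return len(value.encode(encoding)) <= precision_bytes


# Three ASCII characters occupy three bytes in UTF-8.
print(fits_fixed_width("abc", 3))
# Three Japanese characters occupy nine bytes in UTF-8, so a precision of 3 is too small.
print(fits_fixed_width("日本語", 3))
print(fits_fixed_width("日本語", 9))
```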


Delimited Flat File Targets

For delimited flat files, data precision is measured in characters. When you insert multibyte character data in the target, you do not need to allow for additional precision for multibyte data. Sybase IQ does not allow optional quotes. You must choose None for Optional Quotes if you have a delimited target flat file.

When you load multibyte data to Sybase IQ targets, null characters and delimiters can be up to four bytes each. To avoid reading the delimiters as regular characters, each byte of the delimiter must have an ASCII value of less than 0x40. For details on loading multibyte data to targets, see “Working with File Targets” on page 261.
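The delimiter rule above can be checked before you configure a session. The following Python sketch tests that a delimiter is at most four bytes and that each byte is below 0x40. The function name is hypothetical and the default encoding is an assumption. Use the code page of your target.

```python
def is_safe_sybase_iq_delimiter(delimiter: str, encoding: str = "utf-8") -> bool:
    """Return True if the delimiter is 1 to 4 bytes and every byte is below 0x40."""
    raw = delimiter.encode(encoding)
    return 1 <= len(raw) <= 4 and all(byte < 0x40 for byte in raw)


print(is_safe_sybase_iq_delimiter(","))   # comma is 0x2C
print(is_safe_sybase_iq_delimiter("|"))   # pipe is 0x7C
```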

Sybase IQ External Loader Attributes

Use the following guidelines when you enter attributes for the Sybase IQ external loader connection:

♦ The connect string must contain the following attributes:

uid=user ID; pwd=password; eng=Sybase IQ database server name; links=tcpip; (host=host name; port=port number)

♦ The server datafile directory is relative to the database server.

If the directory is in a Windows system, use backslashes (\) in the directory path:

D:\mydirectory\inputfile.out

If the directory is in a UNIX system, use a forward slash (/):

/mydirectory/inputfile.out

♦ When you create a Sybase IQ external loader connection, the Workflow Manager sets the name of the external loader executable file to dbisql by default. If you use an executable file with a different name, for example, dbisqlc, you must update the External Loader Executable field. If the external loader executable file directory is not in the system path, you must enter the file path and file name in this field.

Table 20-6 describes the attributes for Sybase IQ external loader connections:

Table 20-6. Sybase IQ External Loader Attributes

| Attribute | Default Value | Description |
|---|---|---|
| Block Factor | 10000 | The number of records per block in the target Sybase table. The external loader applies the Block Factor attribute to load operations for fixed-width flat file targets only. |
| Block Size | 50000 | The size of blocks used in Sybase database operations. The external loader applies the Block Size attribute to load operations for delimited flat file targets only. |
| Checkpoint | Enabled | If enabled, the Sybase IQ database issues a checkpoint after successfully loading the table. If disabled, the database issues no checkpoints. |
| Notify Interval | 1000 | The number of rows the Sybase IQ external loader loads before it writes a status message to the external loader log. |
| Server Datafile Directory | n/a | The location of the flat file target. You must specify this attribute relative to the database server installation directory. Enter the target file directory path using the syntax for the machine hosting the database server installation. For example, if the PowerCenter Server is on a Windows machine and the Sybase IQ Server is on a UNIX machine, use UNIX syntax. |
| External Loader Executable | dbisql | The name of the Sybase IQ external loader executable. |
| Is Staged | Enabled | The method of loading data. Select Is Staged to load data to a flat file staging area before loading to the database. Otherwise, the data is loaded to the database using a named pipe. For more information, see “Loading Data Using Named Pipes” on page 526 or “Staging Data to Flat Files” on page 526. |

Loading to Teradata

When you load to Teradata, you can use the following external loaders:

♦ MultiLoad. Performs insert, update, delete, and upsert operations for large volume incremental loads. You can use this loader when you run a session with a single partition. MultiLoad acquires table level locks, making it appropriate for offline loading. For more information about configuring the MultiLoad external loader connection object, see “Teradata MultiLoad External Loader Attributes” on page 540.

♦ TPump. Performs insert, update, delete, and upsert operations for relatively low volume updates. You can use this loader when you run a session with multiple partitions. TPump acquires row-hash locks on the table, allowing other users to access the table as TPump loads to it. For more information about configuring the TPump external loader connection object, see “Teradata TPump External Loader Attributes” on page 542.

♦ FastLoad. Performs insert operations for high volume initial loads, or for high volume truncate and reload operations. You can use this loader when you run a session with a single partition. You can only use this loader on empty tables with no secondary indexes. For more information about configuring the FastLoad external loader connection object, see “Teradata FastLoad External Loader Attributes” on page 545.

♦ Warehouse Builder. Performs insert, update, upsert, and delete operations on targets. You can use this loader when you run a session with multiple partitions. You can achieve the functionality of the other loaders based on the operator you use. For more information about configuring the Warehouse Builder external loader connection object, see “Teradata Warehouse Builder External Loader Attributes” on page 547.
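The loader descriptions above can be summarized as a lookup table. The following Python sketch encodes only the operations and partition support stated in the list. The dictionary layout and function name are illustrative assumptions, and other selection factors such as data volume, locking behavior, and the empty-table requirement for FastLoad are not modeled.

```python
ALL_OPERATIONS = {"insert", "update", "delete", "upsert"}

# Capabilities as described in this section: supported operations and
# whether the loader works in a session with multiple partitions.
TERADATA_LOADERS = {
    "MultiLoad": {"operations": ALL_OPERATIONS, "multiple_partitions": False},
    "TPump": {"operations": ALL_OPERATIONS, "multiple_partitions": True},
    "FastLoad": {"operations": {"insert"}, "multiple_partitions": False},
    "Warehouse Builder": {"operations": ALL_OPERATIONS, "multiple_partitions": True},
}


def candidate_loaders(operation: str, partitions: int = 1) -> list[str]:
    """Return the loaders that support the operation for the given partition count."""
    return sorted(
        name
        for name, caps in TERADATA_LOADERS.items()
        if operation in caps["operations"]
        and (partitions == 1 or caps["multiple_partitions"])
    )


print(candidate_loaders("upsert", partitions=4))
```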

If you use a Teradata external loader to perform update or upsert, you can use the Target Update Override option in the Mapping Designer to override the UPDATE statement in the external loader control file. For upsert, the INSERT statement in the external loader control file remains unchanged. For details on using the Target Update Override option, see “Mappings” in the Designer Guide.

Use the following guidelines when you use the Teradata external loaders:

♦ The PowerCenter Server can use Teradata external loaders to load fixed-width flat files to a Teradata database.

♦ The target output file name, including the file extension, must not exceed 27 characters. If the session contains multiple partitions, the target output file name, including the file extension, must not exceed 25 characters.

♦ You cannot use spaces as null characters.

♦ You can use the Teradata external loaders to load multibyte data.

♦ You cannot use the Teradata external loaders to load binary data.

♦ When you load to Teradata using named pipes, set the checkpoint value to 0 to prevent external loaders from performing checkpoint operations.

♦ When you edit a session, you can specify error, log, or work table names, depending on the loader you use. You can also specify error, log, or work database names.


♦ When you edit a session, you can override the control file in the loader connection properties.
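The file name length guideline above is easy to check ahead of time. The following Python sketch applies the 27-character limit for a single partition and the 25-character limit for multiple partitions. The function name and sample file name are hypothetical.

```python
def teradata_output_name_ok(file_name: str, partitions: int = 1) -> bool:
    """Return True if the target output file name, including extension, is short enough."""
    limit = 25 if partitions > 1 else 27
    return len(file_name) <= limit


# Hypothetical 27-character file name: valid for one partition only.
name = "customer_dim_daily_load.out"
print(teradata_output_name_ok(name, partitions=1))
print(teradata_output_name_ok(name, partitions=2))
```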

You can view the Teradata control file in the target directory.

See the Teradata documentation for more information about the loaders.

Overriding the Control File

When you edit the loader connection in a session, you can override the control file. You might want to override the control file to change some loader properties that you cannot edit in the loader connection. For example, you can specify the tracing option in the control file.

When you override the control file, the Workflow Manager saves the control file to the repository. The PowerCenter Server uses the saved control file when you run the session again. If you do not override the control file, the PowerCenter Server generates a control file based on the session and loader properties by default. It saves the control file in the output file directory by default, but it does not use the control file the next time it runs the session.

To override the control file, override the loader connection for the target in the session. Click the Edit button in the Control File Content Override loader property.

Figure 20-1 shows the Control File Editor dialog box where you override the Teradata control file:

In the Control File Editor dialog box, click Generate to create the default control file. The Workflow Manager creates the default control file based on the session and loader properties. Edit the generated control file, and click OK to save your changes.

Note that if you change a target or loader connection setting after you edit the control file, the control file does not include those changes. If you want to include those changes, you must generate the control file again and edit it.

Note: The Workflow Manager does not validate the control file syntax. Teradata verifies the control file syntax when you run a session. If the control file is invalid, the session fails.

Figure 20-1. Control File Editor Dialog Box for Teradata


Teradata MultiLoad External Loader Attributes

You can configure the external loader connection object in the Workflow Manager. You can also override the external loader connection object attributes when you edit a reusable or non-reusable session.

Use the following guidelines when you work with the MultiLoad external loader:

♦ You can perform insert, update, delete, and upsert operations on targets. You can also use data driven mode to perform insert, update, or delete operations based on instructions coded in an Update Strategy or Custom transformation within a mapping.

♦ The MultiLoad external loader cannot load from multiple output files. If you run a session with multiple partitions, the session fails. For more information about partitioning sessions with external loaders, see “Partitioning Sessions with External Loaders” on page 526.

♦ If you invoke a greater number of sessions than the maximum number of concurrent sessions the database allows, the session may hang. You can set the minimum value for Tenacity and Sleep to ensure that sessions fail rather than hang.
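To see how Tenacity and Sleep bound the time a session can wait, the following Python sketch estimates the number of logon attempts for a given pair of values. It is a rough approximation under stated assumptions: the function is hypothetical, it ignores the time each logon attempt itself takes, and it does not model any MultiLoad internals.

```python
def max_logon_attempts(tenacity_hours: float, sleep_minutes: float) -> int:
    """Estimate how many logon attempts fit in the Tenacity window.

    One initial attempt, plus one retry after each Sleep interval
    until the Tenacity period elapses.
    """
    if sleep_minutes <= 0:
        raise ValueError("Sleep must be greater than 0")
    if tenacity_hours <= 0:
        return 1
    return int(tenacity_hours * 60 // sleep_minutes) + 1


# Example values: Tenacity of 4 hours with Sleep of 6 minutes.
print(max_logon_attempts(4, 6))
```

Smaller Tenacity values shorten the window, so a session that cannot log on fails sooner instead of appearing to hang.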

Table 20-7 shows the attributes that you configure for the Teradata MultiLoad external loader:

Table 20-7. Teradata MultiLoad External Loader Attributes

| Attribute | Default Value | Description |
|---|---|---|
| TDPID | n/a | The Teradata database ID. |
| Database Name | n/a | Optional database name. |
| Date Format | n/a | The date format. The date format in the Connection Object definition must match the date format you define in the target definition. The PowerCenter Server supports the following date formats: dd/mm/yyyy, mm/dd/yyyy, yyyy/dd/mm, and yyyy/mm/dd. |
| Error Limit | 0 | The total number of rejected records that MultiLoad can write to the MultiLoad error tables. Uniqueness violations do not count as rejected records. An error limit of 0 means that there is no limit on the number of rejected rows. |
| Checkpoint | 10,000 | The interval between checkpoints. You can set the interval to the following values: 60 or more (MultiLoad performs a checkpoint operation after it processes each multiple of that number of records), 1 to 59 (MultiLoad performs a checkpoint operation at the specified interval, in minutes), or 0 (MultiLoad does not perform any checkpoint operations during the import task). |
| Tenacity | 10,000 | Specifies how long, in hours, MultiLoad tries to log onto the required sessions. If a logon fails, MultiLoad delays for the number of minutes specified in the Sleep attribute, and then retries the logon. MultiLoad keeps trying until the logon succeeds or the number of hours specified in the Tenacity attribute elapses. |
| Load Mode | Upsert | The mode to generate SQL commands: Insert, Delete, Update, Upsert, or Data Driven. When you select Data Driven loading, the PowerCenter Server follows instructions coded in an Update Strategy or Custom transformation within the mapping to determine how to flag rows for insert, delete, or update. The PowerCenter Server writes a column in the target file or named pipe to indicate the update strategy. The control file uses these values to determine how to load data to the target. The PowerCenter Server uses the following values to indicate the update strategy: 0 - Insert, 1 - Update, 2 - Delete. |
| Drop Error Tables | Enabled | Specifies whether to drop the MultiLoad error tables before beginning the next session. Select this option to drop the tables, or clear it to keep them. |
| External Loader Executable | mload | The name and optional file path of the Teradata external loader executable. If the external loader executable directory is not in the system path, you must enter the file path and file name. |
| Max Sessions | 1 | The maximum number of MultiLoad sessions per MultiLoad job. Max Sessions must be between 1 and 32,767. Running multiple MultiLoad sessions causes the client and database to use more resources. Therefore, setting this value to a small number may improve performance. |
| Sleep | 6 | The number of minutes MultiLoad waits before retrying a logon. MultiLoad tries until the logon succeeds or the number of hours specified in the Tenacity attribute elapses. Sleep must be greater than 0. If you specify 0, MultiLoad issues an error message and uses the default value, 6 minutes. |
| Error Database | n/a | The error database name. You can use this attribute to override the default error database name. If you do not specify a database name, the PowerCenter Server uses the target table database. |
| Work Table Database | n/a | The work table database name. You can use this attribute to override the default work table database name. If you do not specify a database name, the PowerCenter Server uses the target table database. |
| Log Table Database | n/a | The log table database name. You can use this attribute to override the default log table database name. If you do not specify a database name, the PowerCenter Server uses the target table database. |
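The Checkpoint attribute in Table 20-7 changes meaning depending on its value. The following Python sketch spells out the three ranges described for the MultiLoad Checkpoint attribute. The function name and the returned wording are illustrative assumptions.

```python
def describe_multiload_checkpoint(value: int) -> str:
    """Explain how MultiLoad interprets a Checkpoint value."""
    if value < 0:
        raise ValueError("Checkpoint cannot be negative")
    if value == 0:
        return "no checkpoints"
    if value < 60:
        # Values from 1 to 59 are a time interval in minutes.
        return f"checkpoint every {value} minutes"
    # Values of 60 or more are a record count.
    return f"checkpoint every {value} records"


for setting in (0, 15, 10000):
    print(setting, "->", describe_multiload_checkpoint(setting))
```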



Table 20-8 shows the attributes that you configure when you edit a session and override the Teradata MultiLoad external loader connection object:

Table 20-8. Teradata MultiLoad External Loader Attributes Defined at the Session Level

| Attribute | Default Value | Description |
|---|---|---|
| Error Table 1 | n/a | The table name for the first error table. You can use this attribute to override the default error table name. If you do not specify an error table name, the PowerCenter Server uses ET_<target_table_name>. |
| Error Table 2 | n/a | The table name for the second error table. You can use this attribute to override the default error table name. If you do not specify an error table name, the PowerCenter Server uses UV_<target_table_name>. |
| Work Table | n/a | The work table name. You can use this attribute to override the default work table name. If you do not specify a work table name, the PowerCenter Server uses WT_<target_table_name>. |
| Log Table | n/a | The log table name. You can use this attribute to override the default log table name. If you do not specify a log table name, the PowerCenter Server uses ML_<target_table_name>. |
| Control File Content Override | n/a | The control file text. You can use this attribute to override the control file the PowerCenter Server uses when it loads to Teradata. For more information, see “Overriding the Control File” on page 539. |

For more information about these attributes, consult your Teradata documentation.

Teradata TPump External Loader Attributes

You can configure the external loader connection object in the Workflow Manager. You can also override the external loader connection object attributes when you edit a reusable or non-reusable session.

You can perform insert, update, delete, and upsert operations on targets. You can also use data driven mode to perform insert, update, or delete operations based on instructions coded in an Update Strategy or Custom transformation within a mapping.

If you run a session with multiple partitions, you can use a TPump external loader to load the output files to a Teradata database. You must select a Teradata TPump external loader for each partition. For information on selecting external loaders, see “Configuring External Loading in a Session” on page 553.

Table 20-9 shows the attributes that you configure for the Teradata TPump external loader:

Table 20-9. Teradata TPump External Loader Attributes

| Attribute | Default Value | Description |
|---|---|---|
| Database Name | n/a | Optional database name. |
| Error Limit | 0 | Limits the number of rows rejected for errors. When the error limit is exceeded, TPump rolls back the transaction that causes the last error. An error limit of 0 causes TPump to stop processing after any error. |
| Checkpoint | 15 | The number of minutes between checkpoints. You must set the checkpoint to a value between 0 and 60. |
| Tenacity | 4 | Specifies how long, in hours, TPump tries to log onto the required sessions. If a logon fails, TPump delays for the number of minutes specified in the Sleep attribute, and then retries the logon. TPump keeps trying until the logon succeeds or the number of hours specified in the Tenacity attribute elapses. To disable Tenacity, set the value to 0. |
| Load Mode | Upsert | The mode to generate SQL commands: Insert, Delete, Update, Upsert, or Data Driven. When you select Data Driven loading, the PowerCenter Server follows instructions coded in an Update Strategy or Custom transformation within the session mapping to determine how to flag rows for insert, delete, or update. The PowerCenter Server writes a column in the target file or named pipe to indicate the update strategy. The control file uses these values to determine how to load data to the database. The PowerCenter Server uses the following values to indicate the update strategy: 0 - Insert, 1 - Update, 2 - Delete. |
| Drop Error Tables | Enabled | Specifies whether to drop the TPump error tables before beginning the next session. Select this option to drop the tables, or clear it to keep them. |
| External Loader Executable | tpump | The name and optional file path of the Teradata external loader executable. If the external loader executable directory is not in the system path, you must enter the file path and file name. |
| Max Sessions | 1 | The maximum number of TPump sessions per TPump job. Each partition in a session starts its own TPump job. Running multiple TPump sessions causes the client and database to use more resources. Therefore, setting this value to a small number may improve performance. |
| Sleep | 6 | The number of minutes TPump waits before retrying a logon. TPump tries until the logon succeeds or the number of hours specified in the Tenacity attribute elapses. |
| Packing Factor | 20 | The number of rows that each session buffer holds. Packing improves network/channel efficiency by reducing the number of sends and receives between the target flat file and the Teradata database. |
| Statement Rate | 0 | The initial maximum rate, per minute, at which the TPump executable sends statements to the Teradata database. If you set this attribute to 0, the statement rate is unspecified. |
| Serialize | Disabled | Determines whether or not operations on a given key combination (row) occur serially. You may want to check this option if the TPump job contains multiple changes to one row. Sessions that contain multiple partitions with the same key range but different filter conditions may cause multiple changes to a single row. In this case, you may want to enable Serialize to prevent locking conflicts in the Teradata database, especially if you set the Pack attribute to a value greater than 1. If you select this option, the PowerCenter Server uses the primary key specified in the target table as the Key column. If no primary key exists in the target table, you must either clear this checkbox or indicate the Key column in the data layout section of the control file. |
| Robust | Disabled | When Robust is not selected, it signals TPump to use simple restart logic. In this case, restarts cause TPump to begin at the last checkpoint. TPump reloads any data that was loaded after the checkpoint. This method does not have the extra overhead of the additional database writes in the robust logic. |
| No Monitor | Enabled | When selected, this attribute prevents TPump from checking for statement rate changes from, or update status information for, the TPump monitor application. |
| Log Table Database | | |

Table 20-10 shows the attributes that you configure when you edit a session and override the Teradata TPump external loader connection object:

Table 20-10. Teradata TPump External Loader Attributes Defined at the Session Level

| Attribute | Default Value | Description |
|---|---|---|
| Error Table | n/a | The error table name. You can use this attribute to override the default error table name. If you do not specify an error table name, the PowerCenter Server uses ET_<target_table_name><partition_number>. |
| Log Table | n/a | The log table name. You can use this attribute to override the default log table name. If you do not specify a log table name, the PowerCenter Server uses LT_<target_table_name><partition_number>. |
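The default table names that the PowerCenter Server uses for MultiLoad and TPump follow fixed prefixes, so you can predict them when you look for leftover tables in the database. The following Python sketch builds the names from the prefixes documented in the session-level attribute tables. The function names, dictionary keys, and sample values are hypothetical, and the exact form of the partition number is an assumption.

```python
def multiload_default_tables(target_table: str) -> dict[str, str]:
    """Default MultiLoad table names when no override is specified."""
    return {
        "error_table_1": f"ET_{target_table}",
        "error_table_2": f"UV_{target_table}",
        "work_table": f"WT_{target_table}",
        "log_table": f"ML_{target_table}",
    }


def tpump_default_tables(target_table: str, partition_number: int) -> dict[str, str]:
    """Default TPump table names for one partition when no override is specified."""
    return {
        "error_table": f"ET_{target_table}{partition_number}",
        "log_table": f"LT_{target_table}{partition_number}",
    }


# Hypothetical target table name and partition number.
print(multiload_default_tables("ORDERS"))
print(tpump_default_tables("ORDERS", 1))
```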







Teradata FastLoad External Loader Attributes

You can configure the external loader connection object in the Workflow Manager. You can also override the external loader connection object attributes when you edit a reusable or non-reusable session.

Use the following guidelines with the FastLoad external loader:

♦ Each FastLoad job loads data to one Teradata database table. If you want to load data to multiple tables using FastLoad, you must create multiple FastLoad jobs.

♦ The FastLoad external loader cannot load from multiple output files. If you run a session with multiple partitions, the session fails. For more information about partitioning sessions with external loaders, see “Partitioning Sessions with External Loaders” on page 526.

♦ The target table must be empty with no defined secondary indexes.

♦ FastLoad does not load duplicate rows from the output file to the target table in the Teradata database if the target table has a primary key.

♦ If you load date values to the target table, you must configure the date format for the column in the target table in the format YYYY-MM-DD.

♦ You cannot use FastLoad to load binary data.

You can view the Teradata FastLoad control file in the target directory.

Table 20-11 shows the attributes that you configure for the Teradata FastLoad external loader:

Table 20-11. Teradata FastLoad External Loader Attributes



Database Name n/a The database name.

Error Limit 1,000,000 The maximum number of rows that FastLoad rejects before it stops loading data to the database table.

Checkpoint 0 The number of rows transmitted to the Teradata database between checkpoints. If processing stops while a FastLoad job is running, you can restart the job at the most recent checkpoint.If you enter 0, FastLoad does not perform checkpoint operations.

Tenacity 4 The number of hours FastLoad tries to log on to the required FastLoad sessions when the maximum number of load jobs are already running on the Teradata database. When FastLoad tries to log on for a new session, and the Teradata database indicates that the maximum number of load sessions is already running, FastLoad logs off all new sessions that were logged on, delays for the number of minutes specified in the Sleep attribute, and then retries the logon. FastLoad keeps trying until it logs on for the required number of sessions or exceeds the number of hours specified in the Tenacity attribute.


Table 20-12 shows the attributes that you configure when you edit a session and override the Teradata FastLoad external loader connection object:


Drop Error Tables Enabled Specifies whether to drop the FastLoad error tables before beginning the next session. FastLoad will not run if non-empty error tables exist from a prior job. Select this option to drop the tables, or clear it to keep them.


fastload The name and optional file path of the Teradata external loader executable. If the external loader executable directory is not in the system path, you must enter the file path and file name.

Max Sessions 1 The maximum number of FastLoad sessions per FastLoad job. Max Sessions must be between 1 and the total number of access module processes (AMPs) on your system.

Sleep 6 The number of minutes FastLoad pauses before retrying a logon. FastLoad tries until the logon succeeds or the number of hours specified in the Tenacity attribute elapses.

Truncate Target Table Disabled Specifies whether to truncate the target database table before beginning the FastLoad job. FastLoad cannot load data to non-empty tables.
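As a rough illustration of how the Tenacity and Sleep attributes interact, the following Python sketch estimates how many logon retries fit into the tenacity window. The function name `max_logon_retries` is hypothetical, and the arithmetic assumes that each retry costs exactly one Sleep interval and that a logon attempt itself takes no time. Actual FastLoad timing may differ.

```python
def max_logon_retries(tenacity_hours: float, sleep_minutes: float) -> int:
    """Estimate how many logon retries fit inside the tenacity window.

    Assumption: each retry is preceded by exactly one Sleep delay and the
    logon attempt itself takes no time.
    """
    if sleep_minutes <= 0:
        raise ValueError("sleep_minutes must be positive")
    if tenacity_hours <= 0:
        return 0
    return int((tenacity_hours * 60) // sleep_minutes)


# Defaults from Table 20-11: Tenacity 4 hours, Sleep 6 minutes.
default_retries = max_logon_retries(4, 6)
```

With the default values, the estimate is 240 minutes divided by 6 minutes, or about 40 retries before FastLoad gives up.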



Table 20-12. Teradata FastLoad External Loader Attributes Defined at the Session Level









Teradata Warehouse Builder External Loader Attributes

You can configure the external loader connection object in the Workflow Manager. You can also override the external loader connection object attributes when you edit a reusable or non-reusable session.

If you run a session with multiple partitions, you can use a Warehouse Builder external loader to load the output files to a Teradata database. You must select a Teradata Warehouse Builder external loader for each partition. For information on selecting external loaders, see “Configuring External Loading in a Session” on page 553.

Teradata Warehouse Builder uses operators to load data. Operators allow the Teradata Warehouse Builder to achieve the functionality of FastLoad, MultiLoad, or TPump. When you use Teradata Warehouse Builder, each operator uses the protocol for a Teradata external loader.

Table 20-13 shows the operators and protocol for each Teradata Warehouse Builder operator:

Each Teradata Warehouse Builder operator has associated attributes. Not all attributes available for FastLoad, MultiLoad, and TPump external loaders are available for Teradata Warehouse Builder.

Table 20-14 shows the attributes that you configure for Teradata Warehouse Builder:

Table 20-13. Teradata Warehouse Builder Operators and Protocol

Operator Protocol

Load Uses FastLoad protocol. Load attributes are described in Table 20-14. For more information about how FastLoad works, see “Teradata FastLoad External Loader Attributes” on page 545.

Update Uses MultiLoad protocol. Update attributes are described in Table 20-14. For more information about how MultiLoad works, see “Teradata MultiLoad External Loader Attributes” on page 540.

Stream Uses TPump protocol. Stream attributes are described in Table 20-14. For more information about how TPump works, see “Teradata TPump External Loader Attributes” on page 542.

Table 20-14. Teradata Warehouse Builder External Loader Attributes



Database Name n/a The database name.

Error Database Name n/a The name of the error database.

Operator Update The Warehouse Builder operator used to load the data. Choose Load, Update, or Stream.

Max instances 4 The maximum number of parallel instances for the defined operator.


Error Limit 0 The maximum number of rows that Warehouse Builder rejects before it stops loading data to the database table.

Checkpoint 0 The number of rows transmitted to the Teradata database between checkpoints. If processing stops while a Warehouse Builder job is running, you can restart the job at the most recent checkpoint. If you enter 0, Warehouse Builder does not perform checkpoint operations.

Tenacity 4 The number of hours Warehouse Builder tries to log on to the Warehouse Builder sessions when the maximum number of load jobs are already running on the Teradata database. When Warehouse Builder tries to log on for a new session, and the Teradata database indicates that the maximum number of load sessions is already running, Warehouse Builder logs off all new sessions that were logged on, delays for the number of minutes specified in the Sleep attribute, and then retries the logon. Warehouse Builder keeps trying until it logs on for the required number of sessions or exceeds the number of hours specified in the Tenacity attribute. To disable Tenacity, set the value to 0.

Load Mode Upsert The mode to generate SQL commands. Choose Insert, Update, Upsert, Delete, or Data Driven. When you use the Update or Stream operators, you can choose Data Driven load mode. When you select data driven loading, the PowerCenter Server follows instructions coded in Update Strategy or Custom transformations within the mapping to determine how to flag rows for insert, delete, or update. The PowerCenter Server writes a column in the target file or named pipe to indicate the update strategy. The control file uses these values to determine how to load data to the database. The PowerCenter Server uses the following values to indicate the update strategy: 0 - Insert; 1 - Update; 2 - Delete.

Drop Error Tables Enabled Specifies whether to drop the Warehouse Builder error tables before beginning the next session. Warehouse Builder will not run if error tables containing data exist from a prior job. Clear the option to keep error tables.

Truncate Target Table Disabled Specifies whether to truncate target tables. Enable this option to truncate the target database table before beginning the Warehouse Builder job.


tbuild The name and optional file path of the Teradata external loader executable file. If the external loader directory is not in the system path, enter the file path and file name.

Max Sessions 4 The maximum number of Warehouse Builder sessions per Warehouse Builder job. Max Sessions must be between 1 and the total number of access module processes (AMPs) on your system.

Sleep 6 The number of minutes Warehouse Builder pauses before retrying a logon. Warehouse Builder tries until the logon succeeds or the number of hours specified in the Tenacity attribute elapses.

Serialize Disabled Specifies whether operations on a column occur serially. Enabled with Update and Stream operators only.




Table 20-15 shows the attributes that you configure when you edit a session and override Teradata Warehouse Builder external loader connection object:

Packing Factor 20 The number of rows that each session buffer holds. Packing improves network/channel efficiency by reducing the number of sends and receives between the target file and the Teradata database. Enabled with Stream operator only.

Robust Disabled The recovery or restart mode. When you disable Robust, the Stream operator uses simple restart logic. The Stream operator reloads any data that was loaded after the last checkpoint. When you enable Robust, Warehouse Builder uses robust restart logic. In robust mode, the Stream operator determines how many rows were processed since the last checkpoint. The Stream operator processes all the rows that were not processed after the last checkpoint. Enabled with Stream operator only.



Work Table Database n/a The work table database name. You can use this attribute to override the default work table database name. If you do not specify a database name, the PowerCenter Server uses the target table database.

Log Table Database


Note: Valid attributes depend upon the operator you select.

Table 20-15. Teradata Warehouse Builder External Loader Attributes Defined at the Session Level




Work Table n/a The work table name. You can use this attribute to override the default work table name. If you do not specify a work table name, the PowerCenter Server uses WT_<target_table_name>.

Log Table n/a The log table name. You can use this attribute to override the default log table name. If you do not specify a log table name, the PowerCenter Server uses RL_<target_table_name>.










Creating an External Loader Connection

The PowerCenter Server uses external loader attributes to create an external loader connection. You enter external loader attributes in the Workflow Manager when you create an external loader connection.

When you configure external loader settings, you may need to consult your DB2, Oracle SQL Loader, Sybase IQ, or Teradata documentation for details.

Tip: If you edit an external loader connection, all sessions using the connection use the updated connection.

To create an external loader connection:

1. In the Workflow Manager, choose Connections-Loader.

The Loader Connection Browser dialog box appears:

2. Click New.


3. Select an external loader type, and then click OK.

4. Enter a name for the external loader connection.

5. Enter the database user name, password, and connect string.

Enter the PmNullUser user name and PmNullPasswd if you use Oracle OS Authentication. PowerCenter uses Oracle OS Authentication when the connection user name is PmNullUser and the connection is with an Oracle database.

Note: When you use Teradata, you can enter PmNullPasswd as the database password to prevent the password from appearing in the control file. When you do this, the PowerCenter Server writes an empty string for the password in the control file.

6. Enter the necessary loader attributes.

7. Click OK.

8. To create additional connections, repeat steps 3-7, and then click Close to save your changes.


Configuring External Loading in a Session

Before using an external loader in a session, you must first configure the necessary connections. For more details, see “Creating an External Loader Connection” on page 551.

To use an external loader during a session, perform the following steps:

1. Configure the session to write to a file.

2. Configure the file properties.

3. Select the external loader connection.

Configuring a Session to Write to a File

When you want to use an external loader to write to a database, create the target definition in the mapping according to the target database type. The session configures a relational target type by default. To select an external loader connection, you must configure the session to write to a file instead of a relational target. To do this, change the writer type from Relational Writer to File Writer. You change the writer type using the Writers settings on the Mapping tab.

Figure 20-2 shows the Writers settings on the Mapping tab:

Figure 20-2. Writers Settings on the Mapping Tab (callouts: Target Instance, Writer Type)

To change the writer type for the target, select the target instance in the Instances list. Change the writer type from Relational Writer to File Writer.

Configuring File Properties

After you configure the session to write to a file, you can set the file properties. You need to specify the output file name and directory, as well as the reject file name and directory. You configure these properties using the Properties settings on the Mapping tab.

Figure 20-3 shows the Properties settings on the Mapping tab:

To set the file properties, select the target instance in the Instances list.

Figure 20-3. Properties Settings on the Mapping Tab (callouts: Target Instance, Properties Settings)


Table 20-16 shows the attributes in Properties settings:

Note: Do not select Merge Partitioned Files or enter a merge file name. You cannot merge partitioned output files when you use an external loader.

Selecting an External Loader Connection

After you configure file properties, you are ready to select the external loader connection. To do this, you must choose the connection type and the connection object. You configure connection options using the Connections settings on the Mapping tab.

Table 20-16. Properties Settings


Output File Directory Enter the directory name in this field. By default, the PowerCenter Server writes output files to the directory $PMTargetFileDir. If you enter a full directory and file name in the Output Filename field, clear this field. External loader sessions may fail if you use double spaces in the path for the output file.

Output Filename Enter the file name, or file name and path. By default, the Workflow Manager names the target file based on the target definition used in the mapping: target_name.out. External loader sessions may fail if you use double spaces in the path for the output file.

Reject File Directory By default, the PowerCenter Server writes all reject files to the directory $PMBadFileDir. If you enter a full directory and file name in the Reject Filename field, clear this field.

Reject Filename Enter the file name, or file name and directory. The PowerCenter Server appends information in this field to that entered in the Reject File Directory field. For example, if you have “C:/reject_file/” in the Reject File Directory field, and enter “filename.bad” in the Reject Filename field, the PowerCenter Server writes rejected rows to C:/reject_file/filename.bad. By default, the PowerCenter Server names the reject file after the target instance name: target_name.bad. You can also enter a reject file session parameter to represent the reject file or the reject file and directory. Name all reject file parameters $BadFileName. For details on session parameters, see “Session Parameters” on page 495.

Set File Properties Opens a dialog box that allows you to define flat file properties. When you use an external loader, you must define the flat file properties by clicking the Set File Properties button. For Oracle external loaders, the target flat file can be fixed-width or delimited. For Sybase IQ external loaders, the target flat file can be fixed-width or delimited. For Teradata external loaders, the target flat file must be fixed-width. For DB2 external loaders, the target flat file must be delimited. For more information, see “Configuring Fixed-Width Properties” on page 265 and “Configuring Delimited Properties” on page 266.
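The file property rules in Table 20-16 can be summarized in a short Python sketch. It is only an illustration of the rules as stated in the table: the dictionary keys and the function names are hypothetical labels, not PowerCenter identifiers, and the sketch does not call any PowerCenter API.

```python
from typing import Tuple

# Flat file formats each external loader accepts, as listed under
# Set File Properties in Table 20-16. The keys are illustrative labels.
ALLOWED_FORMATS = {
    "Oracle": {"fixed-width", "delimited"},
    "Sybase IQ": {"fixed-width", "delimited"},
    "Teradata": {"fixed-width"},
    "DB2": {"delimited"},
}


def check_file_format(loader: str, file_format: str) -> bool:
    """Return True if the loader accepts the given target flat file format."""
    return file_format in ALLOWED_FORMATS[loader]


def default_file_names(target_name: str) -> Tuple[str, str]:
    """Return the default output and reject file names for a target."""
    return f"{target_name}.out", f"{target_name}.bad"


def reject_file_path(reject_dir: str, reject_filename: str) -> str:
    """Append the Reject Filename value to the Reject File Directory value."""
    return reject_dir + reject_filename
```

For example, `reject_file_path("C:/reject_file/", "filename.bad")` yields `C:/reject_file/filename.bad`, which matches the example in the Reject Filename row. Because the values are simply appended, the directory must end with its trailing delimiter.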


Figure 20-4 shows the Connections settings on the Mapping tab:

To select an external loader connection:

1. On the Mapping tab, select the target instance in the Navigator.

2. Select the Loader connection type.

3. Click the Open button in the Value field to select the correct external loader connection object.

4. Choose an external loader connection object, and then click OK.

5. Click OK to save your changes.

If the session contains multiple partitions, and you choose a loader that can load from multiple output files, you can select a different connection for each partition, but each connection must be of the same type. For example, you can select different Teradata TPump external loader connections for each partition, but you cannot select a Teradata TPump connection for one partition and an Oracle connection for another partition.

If the session contains multiple partitions, and you choose a loader that can load from only one output file, the session fails. For more information about running external loader sessions with multiple partitions, see “Partitioning Sessions with External Loaders” on page 526.
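The two partitioning rules above can be expressed as a small pre-flight check. The following Python sketch is an assumption-laden illustration: the loader type strings and the function `check_partition_loaders` are hypothetical, and the set of single-file loaders contains only FastLoad because that is the only loader this chapter names as unable to load from multiple output files.

```python
from typing import List, Tuple

# Loader types that cannot load from multiple output files. This chapter
# names only FastLoad; extend the set from your loader documentation.
SINGLE_FILE_LOADERS = {"Teradata FastLoad"}


def check_partition_loaders(loader_types: List[str]) -> Tuple[bool, str]:
    """Check the loader type chosen for each partition of one target.

    loader_types holds one entry per partition, for example
    ["Teradata TPump", "Teradata TPump"].
    """
    if not loader_types:
        return False, "no loader connection selected"
    if len(loader_types) == 1:
        return True, "single partition"
    if len(set(loader_types)) > 1:
        return False, "all partitions must use the same loader type"
    if loader_types[0] in SINGLE_FILE_LOADERS:
        return False, "loader cannot load from multiple output files"
    return True, "ok"
```

The check compares loader types only. Different connection objects of the same type, such as two separate TPump connections, pass the check.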

Figure 20-4. Connections Settings on the Mapping Tab (callouts: Target Instance, Connection Type and selected Connection Object)


Troubleshooting

I am trying to set up a session to load data to an external loader, but I cannot select an external loader connection in the session properties.

Check your mapping to make sure you did not configure it to load to a flat file target. In order to use an external loader, you must configure the mapping with a DB2, Oracle, Sybase IQ, or Teradata relational target. When you create the session, select a file writer in the Writers settings of the Mapping tab in the session properties. Then open the Connections settings and select an external loader connection.

I am trying to run a session that uses TPump, but the session fails. The session log displays an error saying that the Teradata output file name is too long.

The PowerCenter Server uses the Teradata output file name to generate names for the TPump error and log files, as well as the log table name. To do this, the PowerCenter Server adds a prefix of several characters to the output file name. It adds three characters for sessions with one partition and five characters for sessions with multiple partitions.

Teradata allows log table names of up to 30 characters. Because the PowerCenter Server adds a prefix, if you are running a session with a single partition, specify a target output file name with a maximum of 27 characters, including the file extension. If you are running a session with multiple partitions, specify a target output file name with a maximum of 25 characters, including the file extension.
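The length limits follow directly from the 30-character table name limit minus the prefix length. The Python sketch below works through that arithmetic. The function names are hypothetical, and the prefix lengths are taken from the description above rather than from any Teradata or PowerCenter API.

```python
TERADATA_MAX_LOG_TABLE_NAME = 30  # characters, as stated in this section


def max_output_file_name_length(partitions: int) -> int:
    """Longest output file name (including extension) that keeps the
    generated log table name within the Teradata limit."""
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    # Three prefix characters for one partition, five for several.
    prefix_length = 3 if partitions == 1 else 5
    return TERADATA_MAX_LOG_TABLE_NAME - prefix_length


def output_file_name_fits(file_name: str, partitions: int) -> bool:
    """Return True if the file name is short enough for the session."""
    return len(file_name) <= max_output_file_name_length(partitions)
```

A 27-character file name therefore fits a single-partition session but is too long once the session has two or more partitions.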

I tried to load data to Teradata using TPump, but the session failed. I corrected the error, but the session still fails.

Occasionally, Teradata does not drop the log table when you rerun the session. Check the Teradata database, and manually drop the log table if it exists. Then rerun the session.



Chapter 21

Using FTP


♦ Overview, 560

♦ Creating an FTP Connection, 561

♦ Creating an FTP Session, 565


Overview

The PowerCenter Server can use File Transfer Protocol (FTP) to access source and target files. With both source and target files, you can use FTP to transfer the files directly to the PowerCenter Server or stage them on a local directory.

You can also stage files by creating a pre-session shell command to move the files local to the PowerCenter Server. Accessing files directly with FTP generally provides better session performance than using FTP to stage the files. However, you may want to stage FTP files to keep a local archive.

Before creating an FTP session, you must configure the FTP connection in the Workflow Manager. For details, see “Creating an FTP Connection” on page 561.

When using FTP file sources and targets in a session, you should know the following information:

♦ FTP connection name

♦ Remote file name and exact path

♦ Whether you want to stage the files

Mainframe Notes

Due to mainframe restrictions, the following constraints apply when using FTP with mainframe machines:

♦ You cannot execute sessions concurrently if the sessions use the same FTP source file or target file located on a mainframe.

♦ If you abort a workflow containing a session with a staged FTP source or target from a mainframe, you may need to wait for the connection to time out before you can run the workflow again.


Creating an FTP Connection

The PowerCenter Server can access source and target files on remote machines using FTP. The PowerCenter Server can use FTP to access any machine to which the PowerCenter Server can connect.

Before you create a session using FTP, you must configure the FTP connection in the Workflow Manager.

You must know the following information when you create an FTP connection:

♦ Connection name. The connection name used by the Workflow Manager.

♦ Host name. The name or IP address of the remote machine. Optionally, you can specify a port number between 1 and 65535 inclusive. If you do not specify a port number, the PowerCenter Server uses the port number 21 by default. Use the following syntax for specifying a host name:

hostname:port-number

or

IP address:port-number

When you specify a port number, enable that port number for FTP on the host machine.

♦ Default remote directory. The directory you want the PowerCenter Server to use by default. In the session, when you enter a file name without a directory, the PowerCenter Server appends the file name to this directory. Therefore, this path must be exact and contain the appropriate trailing delimiters. For example, if you enter c:/data/ and in the session specify the file FILENAME, the PowerCenter Server reads the path and file name as c:/data/FILENAME.

If you enter the wrong delimiter for an FTP directory, the Workflow Manager does not correct it. If the FTP host is a mainframe machine, the directory must begin with a single quote and end with the period delimiter, such as: ‘defaultdir. You can override this option in the session properties.
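To make the host name syntax concrete, the following Python sketch splits a value of the form hostname:port-number and applies the default port and the valid port range described above. The function `parse_ftp_host` is a hypothetical helper written for illustration. It is not part of PowerCenter and only mirrors the rules stated in this section.

```python
from typing import Tuple

DEFAULT_FTP_PORT = 21


def parse_ftp_host(value: str) -> Tuple[str, int]:
    """Split 'hostname:port-number' or 'IP address:port-number' into parts."""
    host, sep, port_text = value.strip().partition(":")
    if not host:
        raise ValueError("host name is required")
    if not sep:
        # No port given, so fall back to the default FTP port.
        return host, DEFAULT_FTP_PORT
    port = int(port_text)  # raises ValueError if the port is not a number
    if not 1 <= port <= 65535:
        raise ValueError("port number must be between 1 and 65535")
    return host, port
```

For example, `parse_ftp_host("ftphost")` returns the host with port 21, and `parse_ftp_host("10.1.2.3:2121")` returns port 2121. A value such as `ftphost:70000` raises an error because the port is out of range.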

Depending on the remote machine you access, you might also need to enter the user name and password. The password must be in 7-bit ASCII only. As with database connections, if you edit an FTP connection, all sessions using the FTP connection use the updated connection.

FTP Permissions

If you enable enhanced security, you can set FTP connection permissions in the Workflow Manager. The Workflow Manager assigns Owner permissions to the user who registers the connection. The Workflow Manager grants Owner Group permissions to the first group in the Group Memberships list of the owner. You can manage FTP connection permissions if you are the owner of the connection or if you have Super User privileges.

A registered FTP connection does not appear in the list of FTP connections if you do not have at least read permission for the connection. If you want to edit a connection, you must have read and write permissions for the connection. If you want to run sessions that use a source or target FTP connection, you must have execute permission for the connection.

To create an FTP connection, you must have one of the following privileges:


♦ Super User

Steps for Creating an FTP Connection

Perform the following steps to create an FTP connection.

To create an FTP connection:


2. Choose Connections-FTP. The FTP Object Browser appears.


3. Click New.

4. Enter the connection information in Table 21-1:

Table 21-1. FTP Options

FTP Option Required/Optional Description

Name Required Connection name used by the Workflow Manager.

User Name Optional User name necessary to access the host machine.

Password Optional Password for the user name. Must be in 7-bit ASCII only.

Host Name Required Host name or dotted IP address of the FTP connection. Optionally, you can specify a port number between 1 and 65535, inclusive. If you do not specify a port number, the PowerCenter Server uses 21 by default. Use the following syntax for specifying the host name: hostname:port-number or IP address:port-number. When you specify a port number, enable that port number for FTP on the host machine.

Default Remote Directory Required Enter a valid FTP directory on the host machine. Do not enclose the default remote directory in quotation marks. The default directory name must be exact and include a trailing delimiter. Note: Depending on the FTP server you use, you may have limited options for entering FTP directories. Please see your FTP server documentation for details.


5. Click OK.

6. Repeat steps 3-5 for any other necessary FTP connection, then click Close.


Creating an FTP Session

After defining FTP connections in the Workflow Manager, you can create sessions using FTP file sources and targets. You can use any mapping with the flat file sources or targets.

The steps to create FTP sessions vary for source and target files. You can use FTP to access both source and target files in a session.

To create a session using FTP sources and targets, you must have one of the following sets of privileges and permissions:

♦ Use Workflow Manager privilege with folder read and write permissions


You must have read permission for FTP connections you want to associate with the session in addition to the privileges and permissions listed above.

FTP File Sources

Use FTP to access source files from any machine on your network, including mainframes.

To create a session using FTP source files:


2. In the Connections settings on the Mapping tab, select FTP for Type.


3. Click the Open button in the Value field to select an FTP connection.

4. Click Override and enter the remote file name.

If you enter a file name without a leading slash or drive letter, the PowerCenter Server appends the file name to the Default Remote Directory path entered in the FTP Connection dialog box. For example, if your default remote directory is c:/data/, and you enter a remote file name of FILENAME, the PowerCenter Server connects to the FTP host and looks for c:/data/FILENAME.

If you enter a fully qualified file name in the Remote Filename field, the PowerCenter Server uses the named path rather than the path entered in the Default Remote Directory.


If you enter a mainframe file name for a source file in the default directory, make sure you enter the closing quote. For example, if your default remote directory is:

‘defaultdir.

To access the file, FILENAME, from the default mainframe directory, enter the following in the Remote Filename field:

filename’

When the PowerCenter Server begins the session, it connects to the mainframe host and looks for:

‘defaultdir.filename’

In contrast, if you want to use a file in a different directory, you must enter that directory and file name in the Remote Filename field, like this:

‘overridedir.filename’

Note: Depending on the FTP server you use, you may have limited options for entering FTP directories. Please see your FTP server documentation for details.

5. To store the file in a directory local to the PowerCenter Server, select Is Staged.

When you select this option for a source file, the PowerCenter Server moves the source file from the FTP host to a local directory before the session begins, then uses the local file during the session. If the staged file exists, the PowerCenter Server truncates the staged file before running the session.

The location of the local file differs depending on the information entered in the Properties settings of the Sources tab:


If you have an individual path and file name listed in the Source Filename field, the PowerCenter Server uses that path as the local directory, and names the staged local file after the listed file. For example, if the Source Filename field contains the path, c:/data/sales_info, the PowerCenter Server connects to the FTP host, then moves the file to c:/data, and names the file sales_info.

If the Source Filename field contains only a file name (and no path), the PowerCenter Server names the file as defined in the Source Filename field, and places the file in the directory listed in the Source file directory field. If the directory is not specified, the PowerCenter Server stages the file in the directory where the PowerCenter Server runs on UNIX or in the system directory on Windows.

If you do not stage the source file, the PowerCenter Server accesses the data directly from the FTP host.

6. Repeat steps 3-5 for each FTP source and target in the session, then click OK.

7. Configure the rest of the session, then click OK.
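The remote file name rules in step 4 can be sketched in Python as follows. The function `resolve_remote_file` is hypothetical and only mirrors the behavior described above. Treating a leading single quote as a fully qualified mainframe name is an assumption inferred from the override example, and the code uses straight quotes in place of the typeset quotes.

```python
import re

# A leading slash or a drive letter such as "c:" marks a fully qualified
# name. A leading single quote is assumed to mark a fully qualified
# mainframe name, based on the override example in step 4.
_FULLY_QUALIFIED = re.compile(r"^(/|[A-Za-z]:|')")


def resolve_remote_file(default_remote_dir: str, remote_filename: str) -> str:
    """Return the path that would be looked up on the FTP host."""
    if _FULLY_QUALIFIED.match(remote_filename):
        return remote_filename
    # Plain concatenation: the default directory must already end with the
    # appropriate delimiter ("/" here, "." on a mainframe).
    return default_remote_dir + remote_filename
```

With a default remote directory of `c:/data/`, the remote file name `FILENAME` resolves to `c:/data/FILENAME`. With a mainframe default directory of `'defaultdir.`, the remote file name `filename'` resolves to `'defaultdir.filename'`, which is why the closing quote belongs in the Remote Filename field.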

FTP File Targets

You can use FTP to transfer target files to any machine to which the PowerCenter Server can connect.

To create a session using FTP target files:



2. In the Connections settings on the Mapping tab, select FTP for Type.

3. Click the Open button in the Value field to select an FTP connection.

4. Click Override and enter the remote file name.



If you enter a file name without a leading slash or drive letter, the PowerCenter Server appends the file name to the Default Remote Directory path entered in the FTP Connection dialog. For example, if your default remote directory is c:/data/, and you enter a remote file name of FILENAME, the PowerCenter Server connects to the FTP host and looks for c:/data/FILENAME.

If you enter a fully qualified file name, the PowerCenter Server uses the named path rather than the path entered in the Default Remote Directory. Do not enclose the fully qualified file name in single or double quotation marks. The session may fail if you enclose the fully qualified file name in quotation marks.

When you transfer a target file to a mainframe host, make sure you enter the opening quote. For example, if your default remote directory is defaultdir., you enter the following in the default remote directory field:

‘defaultdir.

Note: Depending on the FTP server you use, you may have limited options for entering FTP directories. Please see your FTP server documentation for details.

5. To store the target file in a directory on the machine where the PowerCenter Server runs, select Is Staged.

When you select this option, the PowerCenter Server writes to the local target file during the session, then moves the file to the FTP host after the session is complete. The location of the local file differs depending on the information entered in the Properties settings of the Mapping tab:


If you have an individual path and file name listed in the Output Filename field, the PowerCenter Server uses that path as the local directory, and names the staged local file after the listed file. For example, if the Output Filename field contains the path, c:/data/t_company_all.out, the PowerCenter Server connects to the FTP host, then moves the file to c:/data, and names the file t_company_all.out.

If the Output Filename field contains only a file name (and no path), the PowerCenter Server names the file as defined in the Output Filename field, and places the file in the directory listed in the Output file directory field. If the directory is not specified, the PowerCenter Server stages the file in the directory where the PowerCenter Server runs on UNIX or the system directory on Windows.

If you do not stage the file, the PowerCenter Server accesses the data directly from the FTP host. The local file and directory are not used.

Select the Merge Partitioned Files option and specify the merge file name and directory when you partition your target. For more information, see “Partitioning File Targets” on page 380.

6. Repeat steps 3-5 for each FTP target in the session, and then click OK.

7. Configure the rest of the session, and then click OK.
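The rules in step 5 for locating the staged local file can be sketched in Python. The function `staged_local_path` and its parameter names are hypothetical. The `server_run_dir` argument stands in for the fallback location (the directory where the PowerCenter Server runs on UNIX, or the system directory on Windows), and the sketch assumes forward-slash paths like those in the examples.

```python
import posixpath


def staged_local_path(filename_field: str, directory_field: str = "",
                      server_run_dir: str = ".") -> str:
    """Return where the staged local file would be written.

    filename_field  -- value of the Source Filename or Output Filename field
    directory_field -- value of the Source or Output file directory field
    server_run_dir  -- stand-in for the fallback directory (hypothetical)
    """
    directory, name = posixpath.split(filename_field)
    if directory:
        # The file name field carries its own path, so that path wins.
        return posixpath.join(directory, name)
    if directory_field:
        return posixpath.join(directory_field, name)
    return posixpath.join(server_run_dir, name)
```

For example, an Output Filename of `c:/data/t_company_all.out` stages the file in `c:/data`, while a bare file name is placed in the file directory field if one is set and in the fallback directory otherwise.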



Chapter 22

Using Incremental Aggregation


♦ Overview, 574

♦ PowerCenter Server Processing for Incremental Aggregation, 575

♦ Reinitializing the Aggregate Files, 576

♦ Moving or Deleting the Aggregate Files, 577

♦ Partitioning Guidelines with Incremental Aggregation, 578

♦ Preparing for Incremental Aggregation, 579


Overview

When using incremental aggregation, you apply captured changes in the source to aggregate calculations in a session. If the source changes only incrementally and you can capture changes, you can configure the session to process only those changes. This allows the PowerCenter Server to update your target incrementally, rather than forcing it to process the entire source and recalculate the same data each time you run the session.

For example, you might have a session using a source that receives new data every day. You can capture those incremental changes because you have added a filter condition to the mapping that removes pre-existing data from the flow of data. You then enable incremental aggregation.

When the session runs with incremental aggregation enabled for the first time on March 1, you use the entire source. This allows the PowerCenter Server to read and store the necessary aggregate data. On March 2, when you run the session again, you filter out all the records except those time-stamped March 2. The PowerCenter Server then processes only the new data and updates the target accordingly.
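The March 1 and March 2 runs above can be pictured with a small Python sketch. It is only an illustration of the idea of capturing new data: the row layout, the `SOURCE_ROWS` sample, and the `capture_new_rows` function are hypothetical and are not part of PowerCenter, where a Filter transformation or stored procedure would do this work.

```python
from datetime import date

# Hypothetical source rows: (store, sale_date, amount). The field names are
# illustrative and do not come from any PowerCenter mapping.
SOURCE_ROWS = [
    ("east", date(2004, 3, 1), 100.0),
    ("west", date(2004, 3, 1), 250.0),
    ("east", date(2004, 3, 2), 40.0),
]


def capture_new_rows(rows, run_date, first_run=False):
    """Return the rows a session run should process.

    The first run uses the entire source. Later runs keep only rows
    time-stamped with the run date, mimicking a filter condition that
    removes pre-existing data from the data flow.
    """
    if first_run:
        return list(rows)
    return [row for row in rows if row[1] == run_date]


# March 1: only the first two rows exist yet, and the whole source is used.
march_1 = capture_new_rows(SOURCE_ROWS[:2], date(2004, 3, 1), first_run=True)
# March 2: everything except rows time-stamped March 2 is filtered out.
march_2 = capture_new_rows(SOURCE_ROWS, date(2004, 3, 2))
print(len(march_1), len(march_2))
```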

Consider using incremental aggregation in the following circumstances:

♦ You can capture new source data. Use incremental aggregation when you can capture new source data each time you run the session. Use a Stored Procedure or Filter transformation to process only new data.

♦ Incremental changes do not significantly change the target. Use incremental aggregation when the changes do not significantly change the target. If processing the incrementally changed source alters more than half the existing target, the session may not benefit from using incremental aggregation. In this case, drop the table and re-create the target with complete source data.

Note: Do not use incremental aggregation if your mapping contains percentile or median functions. The PowerCenter Server uses system memory to process Percentile and Median functions in addition to the cache memory you configure in the session property sheet. As a result, the PowerCenter Server does not store incremental aggregation values for Percentile and Median functions in disk caches.


PowerCenter Server Processing for Incremental Aggregation

The first time you run an incremental aggregation session, the PowerCenter Server processes the entire source. At the end of the session, the PowerCenter Server stores aggregate data from that session run in two files, the index file and the data file. The PowerCenter Server creates the files in a local directory.

Each subsequent time you run the session with incremental aggregation, you use only the incremental source changes in the session.

For each input record, the PowerCenter Server checks historical information in the index file for a corresponding group. If it finds a corresponding group, the PowerCenter Server performs the aggregate operation incrementally, using the aggregate data for that group, and saves the incremental change. If it does not find a corresponding group, the PowerCenter Server creates a new group and saves the record data.

When writing to the target, the PowerCenter Server applies the changes to the existing target. It saves modified aggregate data in the index and data files to be used as historical data the next time you run the session.
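The lookup-and-update behavior described above can be modeled in a few lines of Python. This is a conceptual sketch only, assuming a simple SUM aggregate: the `IncrementalSum` class and its in-memory `history` dictionary are hypothetical stand-ins for the index and data files, and say nothing about how the PowerCenter Server actually stores its caches.

```python
class IncrementalSum:
    """Toy model of incremental aggregation for a SUM grouped by key.

    `history` stands in for the index and data files: it maps each group
    key to the aggregate value saved by the previous run.
    """

    def __init__(self, history=None):
        self.history = dict(history or {})

    def run(self, rows):
        """Apply (group, amount) rows and return the groups that changed."""
        changed = {}
        for group, amount in rows:
            if group in self.history:
                # Existing group: update the stored aggregate incrementally.
                self.history[group] += amount
            else:
                # No corresponding group: create one from the record data.
                self.history[group] = amount
            changed[group] = self.history[group]
        return changed


agg = IncrementalSum()
agg.run([("east", 100.0), ("west", 250.0)])          # first run: entire source
changes = agg.run([("east", 40.0), ("north", 5.0)])  # later run: new rows only
print(changes)
```

Only the groups touched by the second run appear in `changes`, which corresponds to applying changes to the existing target rather than reloading it.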

If the source changes significantly, and you want the PowerCenter Server to continue saving aggregate data for future incremental changes, configure the PowerCenter Server to overwrite existing aggregate data with new aggregate data. For details, see “Reinitializing the Aggregate Files” on page 576.

When you partition a session that uses incremental aggregation, the PowerCenter Server creates one set of cache files for each partition.

The PowerCenter Server creates new aggregate data, instead of using historical data, when you perform one of the following tasks:

♦ Save a new version of the mapping.

♦ Configure the session to reinitialize the aggregate cache.

♦ Move the aggregate files without correcting the configured path or directory for the files in the session property sheet.

♦ Change the configured path or directory for the aggregate files without moving the files to the new location.

♦ Delete cache files.

♦ Decrease the number of partitions.

Note: When the PowerCenter Server rebuilds incremental aggregation files, the data in the previous files is lost.


Reinitializing the Aggregate Files

If the source tables change significantly, you might want to run the session with the entire source data. To do this, you can configure the session to reinitialize the aggregate cache.

For example, you can reinitialize the aggregate cache if the source for a session changes incrementally every day and completely changes once a month. When you receive the new monthly source, you might configure the session to reinitialize the aggregate cache, truncate the existing target, and use the new source table during the session.

After you run a session that reinitializes the aggregate cache, edit the session properties to disable the Reinitialize Aggregate Cache option. If you do not clear Reinitialize Aggregate Cache, the PowerCenter Server overwrites the aggregate cache each time you run the session.

Note: When you move from Windows to UNIX, you must reinitialize the cache. Therefore, you cannot change from a Latin1 code page to an MSLatin1 code page, even though these code pages are compatible.


Moving or Deleting the Aggregate Files

Once you run an incremental aggregation session, avoid moving or modifying the index and data files that store historical aggregate information.

If you do move the files into a different directory, and you want the PowerCenter Server to use the aggregate files, you must also change the path to those files in the session properties. Likewise, if you change the path to the files but do not move the files, the PowerCenter Server rebuilds the files the next time you run the session.

If you change certain session or server properties, the PowerCenter Server cannot use the incremental aggregation files, and it fails the session. To avoid session failure, delete existing incremental aggregation files when you perform any of the following tasks:

♦ Change the PowerCenter Server data movement mode from ASCII to Unicode or from Unicode to ASCII.

♦ Change the PowerCenter Server code page to an incompatible code page.

♦ Change the session sort order when the PowerCenter Server runs in Unicode mode.

♦ Change the Enable High Precision session option.

Finding Index and Data Files

By default, the PowerCenter Server stores the index and data files in the directory entered in the server variable, $PMCacheDir, in the Workflow Manager. The PowerCenter Server names the index file PMAGG*.idx. The PowerCenter Server names the data file PMAGG*.dat.

If you run the session using Verbose Init mode, the PowerCenter Server writes the file names in the session log. To locate the files, look in the previous session log for the TE_7034 and TE_7035 messages that indicate the cache file name and location. The following messages show sample entries in the session log:

MAPPING> TE_7034 Aggregate Information: Index file is [D:\Informatica\InformaticaServer\Cache\PMAGG8_4_2.idx]

MAPPING> TE_7035 Aggregate Information: Data file is [D:\Informatica\InformaticaServer\Cache\PMAGG8_4_2.dat]

If you do not run the session using Verbose Init mode or use an identifiable transformation naming convention, you may have difficulty determining which files belong to each session.

For more information about cache file storage and naming conventions, see “Cache Files” on page 615.
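If you keep session logs as text files, a short script can pull the cache file names out of them. The following Python sketch assumes the log lines have exactly the form of the TE_7034 and TE_7035 sample entries shown above; the `find_cache_files` helper is hypothetical, and real logs may contain variations this pattern does not cover.

```python
import re

# Matches the TE_7034 (index file) and TE_7035 (data file) messages in the
# form shown in the sample log entries.
CACHE_LINE = re.compile(
    r"TE_703(?P<kind>[45]) Aggregate Information: "
    r"(?:Index|Data) file is \[(?P<path>[^\]]+)\]"
)


def find_cache_files(log_lines):
    """Return {'index': [...], 'data': [...]} with the paths found."""
    found = {"index": [], "data": []}
    for line in log_lines:
        match = CACHE_LINE.search(line)
        if match:
            key = "index" if match.group("kind") == "4" else "data"
            found[key].append(match.group("path"))
    return found


sample_log = [
    r"MAPPING> TE_7034 Aggregate Information: Index file is "
    r"[D:\Informatica\InformaticaServer\Cache\PMAGG8_4_2.idx]",
    r"MAPPING> TE_7035 Aggregate Information: Data file is "
    r"[D:\Informatica\InformaticaServer\Cache\PMAGG8_4_2.dat]",
]
files = find_cache_files(sample_log)
print(files)
```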


Partitioning Guidelines with Incremental Aggregation

When you use incremental aggregation in a session with multiple partitions, the PowerCenter Server creates one set of cache files for each partition.

Use the following guidelines when you change the number of partitions or the cache directory:

♦ Change the cache directory for a partition. If you change the directory for a partition and you want the PowerCenter Server to reuse the cache files, you must move the cache files for the partition associated with the changed directory.

− If you change the directory for the first partition, and you do not move the cache files, the PowerCenter Server rebuilds the cache files for all partitions.

− If you change the directory for partitions 2-n, and you do not move the cache files, the PowerCenter Server rebuilds the cache files that it cannot locate.

♦ Decrease the number of partitions. If you delete a partition, and you want the PowerCenter Server to reuse the cache files, you must move the cache files for the deleted partition to the directory configured for the first partition. If you do not move the files to the directory of the first partition, the PowerCenter Server rebuilds the cache files that it cannot locate.

Note: If you increase the number of partitions, the PowerCenter Server realigns the index and data cache files the next time you run a session. It does not need to rebuild the files.

♦ Move cache files. If you move cache files for a partition and you want the PowerCenter Server to reuse the files, you must also change the partition directory. If you do not change the directory, the PowerCenter Server rebuilds the files the next time you run a session.

♦ Delete cache files. If you delete cache files, the PowerCenter Server rebuilds them the next time you run a session.

If you change the number of partitions and the cache directory, you may need to move cache files for both. For example, if you change the cache directory for the first partition, and you decrease the number of partitions, you need to move the cache files for the deleted partition as well as the cache files for the partition associated with the changed directory.
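As a rough aid for reasoning about the directory guidelines above, the following Python sketch encodes one reading of them: changing the first partition's directory without moving the files causes a rebuild for all partitions, while changing the directory of a later partition affects only the files that can no longer be located. The `partitions_rebuilt` function is hypothetical and covers only the directory-change case, not deleted partitions or deleted files.

```python
def partitions_rebuilt(partition_count, dirs_changed_without_move):
    """Return the 1-based partition numbers whose cache files are rebuilt.

    `dirs_changed_without_move` lists partitions whose cache directory was
    changed while the cache files stayed in the old location.
    """
    all_partitions = set(range(1, partition_count + 1))
    changed = set(dirs_changed_without_move)
    if not changed <= all_partitions:
        raise ValueError("partition number out of range")
    if 1 in changed:
        # Changing the first partition's directory affects every partition.
        return all_partitions
    # For partitions 2-n, only the files that cannot be located are rebuilt.
    return changed


print(sorted(partitions_rebuilt(4, [1])))
print(sorted(partitions_rebuilt(4, [3])))
```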


Preparing for Incremental Aggregation

When you use incremental aggregation, you need to configure both mapping and session properties.

♦ Implement mapping logic or filter to remove pre-existing data.

♦ Configure the session for incremental aggregation and verify that the file directory has enough disk space for the aggregate files.

Configuring the Mapping

Before enabling incremental aggregation, you must capture changes in source data. You might do this by:

♦ Using a filter in the mapping. You may be able to remove pre-existing source data during a session with a filter.

♦ Using a stored procedure. You may be able to remove pre-existing source data at the source database with a pre-load stored procedure.

Configuring the Session

Use the following guidelines when you configure the session for incremental aggregation:

♦ Verify the location where you want to store the aggregate files. The index and data files grow in proportion to the source data. When denoting the directory for those files, be sure the directory has enough disk space to store historical data for the session.

When you run multiple sessions with incremental aggregation, decide where you want the files stored. Then enter the appropriate directory for the server variable, $PMCacheDir, in the Workflow Manager. You can enter session-specific directories for the index and data files. However, by using the server variable for all sessions using incremental aggregation, you can easily change the cache directory when necessary by changing $PMCacheDir.

Changing the cache directory without moving the files causes the PowerCenter Server to reinitialize the aggregate cache and gather new aggregate data.

In a server grid, PowerCenter Servers rebuild incremental aggregation files they cannot find. When a PowerCenter Server rebuilds incremental aggregation files, it loses aggregate history. For more information about methods to save aggregate history in a server grid, see “Running Sessions with Cache Files” on page 445.

♦ Configure the session to write file names in the session log. If you want the PowerCenter Server to write the incremental aggregation cache file names in the session log, configure the session with Verbose Init tracing. You can override tracing in the Error Handling settings on the Config Object tab.

♦ Verify the incremental aggregation settings in the session properties. You can configure the session for incremental aggregation in the Performance settings on the Properties tab.


You can also configure the session to reinitialize the aggregate cache. If you choose to reinitialize the cache, the Workflow Manager displays a warning indicating the PowerCenter Server overwrites the existing cache and a reminder to clear this option after running the session.

To configure a session for incremental aggregation:

Figure 22-1 shows the Performance settings on the Properties tab where you configure incremental aggregation options:

Note: You cannot use incremental aggregation when the mapping includes an Aggregator transformation with Transaction transformation scope. The Workflow Manager marks the session invalid.

Figure 22-1. Incremental Aggregation Session Properties [screenshot; callout: Configure incremental aggregation]


Chapter 23

Using pmcmd


♦ Overview, 582

♦ Configuring Environment Variables, 585

♦ Using the Command Line Mode, 589

♦ Using the Interactive Mode, 592

♦ pmcmd Reference, 594


Overview

pmcmd is a program that you can use to communicate with the PowerCenter Server. You can perform some of the tasks that you can also perform in the Workflow Manager such as starting and stopping workflows and tasks.

You can use pmcmd in the following modes:

♦ Command line mode. The command line syntax allows you to write scripts for scheduling workflows. Each command you write in the command line mode must include connection information to the PowerCenter Server.

♦ Interactive mode. You establish and maintain an active connection to the PowerCenter Server. This allows you to issue a series of commands.

You can use repository user names and passwords as environment variables with pmcmd. You can also customize the way pmcmd displays the date and time on the machine running the PowerCenter Server. Before you use pmcmd, configure these variables on the PowerCenter Server. For more information, see “Configuring Environment Variables” on page 585.

Note: To issue the shutdownserver command, you must have the Super User privilege or Administer Server privilege.

Table 23-1 provides a description for the pmcmd commands. For details on command syntax and usage, see “pmcmd Reference” on page 594.

Table 23-1. pmcmd Commands

Command | Mode(s) | Description
--- | --- | ---
aborttask | Command line, Interactive | Aborts a task. Issue this command only after the PowerCenter Server fails to stop the task when you issue the stoptask command. For more information, see “Aborttask” on page 596.
abortworkflow | Command line, Interactive | Aborts a workflow. Issue this command only after the PowerCenter Server fails to stop the workflow when you issue the stopworkflow command. For more information, see “Abortworkflow” on page 597.
connect | Interactive | Connects to the PowerCenter Server in the interactive mode. Use this command in conjunction with connection information. For more information, see “Connect” on page 597.
disconnect | Interactive | Disconnects from the PowerCenter Server in the interactive mode. For more information, see “Disconnect” on page 598.
exit | Interactive | Exits from pmcmd in the interactive mode. For more information, see “Exit” on page 598.
getrunningsessionsdetails | Command line, Interactive | Displays details for sessions currently running on a PowerCenter Server including information for the folder, workflow, and session instance. Displays session status and statistics on each target table and source qualifier. For more information, see “Getrunningsessionsdetails” on page 598.
getserverdetails | Command line, Interactive | Displays details for the PowerCenter Server including server status, information on active workflows, and timestamp information. In a server grid, this command displays the PowerCenter Server that runs each task instance. For more information, see “Getserverdetails” on page 599.
getserverproperties | Command line, Interactive | Displays the PowerCenter Server name, type, and version. It returns the timestamp on the PowerCenter Server and the name of the repository. It also indicates the data movement mode and whether the PowerCenter Server can debug mappings. For more information, see “Getserverproperties” on page 599.
getsessionstatistics | Command line, Interactive | Displays session details including information for the folder, workflow, and task instance. Displays session status and statistics on each target table and source qualifier. In a server grid, this command displays the PowerCenter Server that runs each task instance. For more information, see “Getsessionstatistics” on page 600.
gettaskdetails | Command line, Interactive | Displays details for a task including folder and workflow name. Also displays the task, status, and run mode. In a server grid, this command displays the PowerCenter Server that runs each task instance. For more information, see “Gettaskdetails” on page 601.
getworkflowdetails | Command line, Interactive | Displays details for a workflow including workflow name, status, and run mode. Also displays information on when the workflow was last executed. For more information, see “Getworkflowdetails” on page 601.
help | Command line, Interactive | Displays a list of pmcmd commands and syntax. For more information, see “Help” on page 602.
pingserver | Command line, Interactive | Determines whether the PowerCenter Server is running. For more information, see “Pingserver” on page 602.
quit | Interactive | Quits from pmcmd in the interactive mode. For more information, see “Quit” on page 602.
resumeworkflow | Command line, Interactive | Resumes a suspended workflow. For more information, see “Resumeworkflow” on page 603.
resumeworklet | Command line, Interactive | Resumes a suspended worklet. For more information, see “Resumeworklet” on page 603.
scheduleworkflow | Command line, Interactive | Instructs the PowerCenter Server to schedule a workflow. Use this command to manually reschedule a workflow that has been removed from the schedule. For more information, see “Scheduleworkflow” on page 604.
setfolder | Interactive | Designates a folder as the default folder in which to execute all subsequent commands. For more information, see “Setfolder” on page 604.
setnowait | Interactive | Instructs the PowerCenter Server to execute subsequent commands in the nowait mode. In the nowait mode, you can enter a new pmcmd command after the PowerCenter Server receives the previous command. For more information, see “Setnowait” on page 605.
setwait | Interactive | Instructs the PowerCenter Server to execute subsequent commands in the wait mode. In the wait mode, you can enter a new pmcmd command only after the PowerCenter Server completes the previous command. For more information, see “Setwait” on page 605.
showsettings | Interactive | Displays the settings for the interactive mode, including PowerCenter Server and repository name, username, wait mode, and default folder. For more information, see “Showsettings” on page 605.
shutdownserver | Command line, Interactive | Shuts down the PowerCenter Server. Use this command in conjunction with a shutdownmode option. For more information, see “Shutdownserver” on page 605.
starttask | Command line, Interactive | Starts a task. Use this command in conjunction with a task name. For more information, see “Starttask” on page 606.
startworkflow | Command line, Interactive | Starts a workflow. Use this command in conjunction with a workflow name. For more information, see “Startworkflow” on page 607.
stoptask | Command line, Interactive | Stops a task. Use this command in conjunction with a task name. For more information, see “Stoptask” on page 609.
stopworkflow | Command line, Interactive | Stops a workflow. Use this command in conjunction with a workflow name. For more information, see “Stopworkflow” on page 609.
unscheduleworkflow | Command line, Interactive | Instructs the PowerCenter Server to remove the workflow from the schedule. For more information, see “Unscheduleworkflow” on page 610.
unsetfolder | Interactive | Designates no folder as the default folder. For more information, see “Unsetfolder” on page 610.
version | Command line, Interactive | Displays the PowerCenter version number. For more information, see “Version” on page 611.
waittask | Command line, Interactive | Instructs the PowerCenter Server to wait for the completion of a running task before starting another command. Use this command in conjunction with a task name. For more information, see “Waittask” on page 611.
waitworkflow | Command line, Interactive | Notifies you of the status of a workflow. Use this command in conjunction with a workflow name. For more information, see “Waitworkflow” on page 611.




Configuring Environment Variables

Before you use pmcmd, you can set environment variables that are applied each time you run pmcmd. You can configure the following environment variables to use with pmcmd:

♦ PM_CODEPAGENAME

♦ PMTOOL_DATEFORMAT

♦ Repository USERNAME and PASSWORD

♦ PM_HOME

Configuring PM_CODEPAGENAME

pmcmd uses the code page of the machine hosting pmcmd unless you specify the code page environment variable, PM_CODEPAGENAME, to override it. The code page must be compatible with the PowerCenter Server code page. pmcmd sends commands in Unicode. If the code pages are not compatible, the PowerCenter Server might not find the workflow, session, or task in the repository. For more information about code page compatibility, see “Globalization Overview” and “Code Pages” in the Installation and Configuration Guide.

To configure a code page environment variable in a UNIX environment:

1. If you are in a UNIX C shell environment, type:

setenv PM_CODEPAGENAME <code page name>

If you are in a UNIX Bourne shell environment, type:

PM_CODEPAGENAME=<code page name>

export PM_CODEPAGENAME

To configure a code page as an environment variable on Windows:

1. Enter environment variables in the Windows System Properties.

For information about setting environment variables for your Windows operating system, consult your Windows documentation.

2. Enter a system variable named PM_CODEPAGENAME and set the value to the code page name.

Configuring PMTOOL_DATEFORMAT

Use this environment variable to customize the way pmcmd displays the date and time. The pmcmd program verifies that the string you specify is a valid format. If the format string is not valid, the PowerCenter Server generates a warning message and displays the date in the format DY MON DD HH24:MI:SS YYYY.
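To get a feel for what the default format DY MON DD HH24:MI:SS YYYY looks like, the following Python sketch renders a timestamp with what is assumed to be the closest `strftime` equivalent (abbreviated day name, abbreviated month name, day of month, 24-hour time, four-digit year). The mapping between the two notations is an assumption made for illustration, and pmcmd itself may differ in details such as letter case.

```python
from datetime import datetime

# Assumed strftime equivalent of the default display format
# DY MON DD HH24:MI:SS YYYY.
DEFAULT_FORMAT = "%a %b %d %H:%M:%S %Y"

stamp = datetime(2004, 8, 2, 14, 5, 9)
formatted = stamp.strftime(DEFAULT_FORMAT)
print(formatted)
```

In an English locale this prints `Mon Aug 02 14:05:09 2004`.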


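To configure a date display format as an environment variable on UNIX:

1. If you are in a UNIX C shell environment, type:

setenv PMTOOL_DATEFORMAT <date/time format string>

If you are in a UNIX Bourne shell environment, type:

PMTOOL_DATEFORMAT=<date/time format string>

export PMTOOL_DATEFORMAT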

To configure a date display format as an environment variable on Windows:

1. Enter environment variables in the Windows System Properties.

2. Enter a system or user variable named PMTOOL_DATEFORMAT and set the value to the display format string.

Configuring Repository Username and Password

You can enter your repository user name and password at the command line as environment variables. The password is an encrypted value.

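To configure a username as an environment variable on UNIX:

1. If you are in a UNIX C shell environment, type:

setenv USERNAME YourUsername

If you are in a UNIX Bourne shell environment, type:

USERNAME=YourUsername

export USERNAME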

You can assign the environment variable any valid UNIX name.

To configure a password as an environment variable on UNIX:

1. In a UNIX session, navigate to the directory where the PowerCenter Server is installed.

2. At the shell prompt, type:

pmpasswd YourPassword

This command runs the encryption utility pmpasswd located in the directory where the PowerCenter Server is installed. The encryption utility generates and displays your encrypted password. The following is sample output. In this example, the password entered was “monday.”

Encrypted string -->bX34dqq<--

Will decrypt to -->monday<--

Your encrypted password is bX34dqq.



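3. To set the password environment variable in a UNIX C shell environment, type:

setenv PASSWORD YourEncryptedPassword

If you are in a UNIX Bourne shell environment, type:

PASSWORD=YourEncryptedPassword

export PASSWORD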

You can assign the environment variable any valid UNIX name.
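When you script this procedure, you may want to capture the encrypted value instead of copying it by hand. The following Python sketch assumes the pmpasswd output has exactly the form of the sample shown above; the `extract_encrypted` helper is hypothetical, and the sample text is the same one used in this guide rather than output from a real run.

```python
import re

# Pattern for the "Encrypted string -->...<--" line in the sample output.
ENCRYPTED_LINE = re.compile(r"Encrypted string -->(?P<value>.*)<--")


def extract_encrypted(output):
    """Return the encrypted password from pmpasswd output, or None."""
    for line in output.splitlines():
        match = ENCRYPTED_LINE.search(line)
        if match:
            return match.group("value")
    return None


sample_output = """Encrypted string -->bX34dqq<--
Will decrypt to -->monday<--
"""
encrypted = extract_encrypted(sample_output)
print(encrypted)
```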

To configure a username as an environment variable on Windows:

1. Enter environment variables in the Windows System Properties.

2. Enter the name of the user environment variable in the Variable field. Enter your repository username in the Value field.

You can set these up as either a user or system variable. User variables take precedence over system variables.

To configure a password as an environment variable on Windows:

1. In Windows DOS, navigate to the directory where the PowerCenter Server is installed.

2. At the command line, type:

pmpasswd YourPassword

The encryption utility generates and displays your encrypted password. The following is sample output. In this example, the password entered was “monday.”

Encrypted string -->bX34dqq<--

Will decrypt to -->monday<--

Your encrypted password is bX34dqq.



3. Enter environment variables in the Windows System Properties.

4. Enter the name of your password environment variable in the Variable field. Enter your encrypted password in the Value field.

You can set these up as either a user or system variable. User variables take precedence over system variables.

Configuring PM_HOME

Use the PM_HOME variable to start pmcmd from a directory other than the install directory. On UNIX, point the PM_HOME and PATH environment variables to the PowerCenter Server installation directory. On Windows, include the PowerCenter Server install directory in the environment path.

Warning: If you specify an incorrect directory path for the PM_HOME environment variable, the PowerCenter Server cannot start.

To start pmcmd from any directory on UNIX:

1. Point the PM_HOME environment variable to the installation directory.

If you are in a UNIX C shell environment, type the following to set the PM_HOME variable:

setenv PM_HOME <install directory>

If you are in a UNIX Bourne shell environment, type the following to set the PM_HOME variable:

PM_HOME=<install directory>

export PM_HOME

2. Add the installation directory to the PATH environment variable.

If you are in a UNIX C shell environment, type the following to set the PATH variable:

setenv PATH "<install directory>:$PATH"

If you are in a UNIX Bourne shell environment, type the following to set the PATH variable:

PATH="<install directory>:$PATH"

export PATH

To start pmcmd from any directory on Windows:

In the system properties, add the installation directory to the path variable. For example, on Windows 2000, configure the path variable in System settings. Click the Environment tab to select the path variable and add the installation directory to the variable value.
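If you launch pmcmd from a program rather than an interactive shell, you can prepare the same variables in the child process environment instead of setting them system-wide. The following Python sketch shows one way this might look. The `pmcmd_environment` function, the `/opt/pcserver` directory, and the `Latin1` value are hypothetical examples; only the variable names PM_HOME, PATH, PMTOOL_DATEFORMAT, and PM_CODEPAGENAME come from this section.

```python
import os


def pmcmd_environment(install_dir, date_format=None, code_page=None, base=None):
    """Return a copy of the environment prepared for running pmcmd."""
    env = dict(os.environ if base is None else base)
    env["PM_HOME"] = install_dir
    # Prepend the installation directory so pmcmd is found on the path.
    existing = env.get("PATH", "")
    env["PATH"] = install_dir + (os.pathsep + existing if existing else "")
    if date_format is not None:
        env["PMTOOL_DATEFORMAT"] = date_format
    if code_page is not None:
        env["PM_CODEPAGENAME"] = code_page
    return env


# Hypothetical installation directory and code page name.
env = pmcmd_environment("/opt/pcserver", code_page="Latin1",
                        base={"PATH": "/usr/bin"})
print(env["PM_HOME"], env["PM_CODEPAGENAME"])
```

The resulting dictionary can be passed as the `env` argument of Python's `subprocess.run` when starting pmcmd.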


Using the Command Line Mode

You can use pmcmd commands with operating system scheduling tools like cron or embed pmcmd commands into shell scripts or Perl programs.

Each command must include the connection information to the PowerCenter Server and the PowerCenter repository. For example, to start a workflow named wFlow4 in the command line mode, use the following syntax:

pmcmd startworkflow -s serveraddress:portno -u YourUsername -p YourPassword wFlow4

The following command immediately starts the workflow wSalesAvg, located in the east folder, on the remote PowerCenter Server with host name Sales listening at port 6258:

pmcmd startworkflow -u seller3 -p jackson -s SALES:6258 -f east -wait wSalesAvg

The user, seller3, with the password “jackson” sends the request to start the workflow. When you use the wait option, pmcmd returns to the shell or command prompt when the workflow completes.

For a list of commands you can use in the command line mode, see Table 23-1 on page 582. For details on each command see “pmcmd Reference” on page 594.

Connecting to the PowerCenter Server in the Command Line Mode

When you run pmcmd in the command line mode, you enter connection parameters such as username, password, and server information for each command. If you incorrectly enter or omit one of the required parameters, the command fails and pmcmd returns a non-zero return code. For a description of all the return codes, see “pmcmd Return Codes” on page 590.

There are several options to enter the user and password information. You can enter a username. Or, if you previously defined a username environment variable, you can enter that instead. You can also enter a previously defined password environment variable instead of a password. The following command uses both user and password variables:

pmcmd startworkflow -s serveraddress:portno -uv USERNAME -pv PASSWORD wFlow4

For information on defining username and password environment variables, see “Configuring Repository Username and Password” on page 586.
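When pmcmd is called from a script, it can help to assemble the argument list in one place so that the connection options are always present. The following Python sketch builds a startworkflow command using only the flags shown in the examples above (-s, -u, -p, -uv, -pv, -f, and -wait). The `startworkflow_args` function is a hypothetical helper, and the sketch only builds the list; it does not run pmcmd.

```python
def startworkflow_args(server, workflow, user=None, password=None,
                       user_var=None, password_var=None,
                       folder=None, wait=False):
    """Build the argument list for a pmcmd startworkflow command."""
    args = ["pmcmd", "startworkflow", "-s", server]

    # Username: either an environment variable name or a literal value.
    if user_var is not None:
        args += ["-uv", user_var]
    elif user is not None:
        args += ["-u", user]
    else:
        raise ValueError("a username or username variable is required")

    # Password: either an environment variable name or a literal value.
    if password_var is not None:
        args += ["-pv", password_var]
    elif password is not None:
        args += ["-p", password]
    else:
        raise ValueError("a password or password variable is required")

    if folder is not None:
        args += ["-f", folder]
    if wait:
        args.append("-wait")
    args.append(workflow)
    return args


cmd = startworkflow_args("SALES:6258", "wSalesAvg", user="seller3",
                         password="jackson", folder="east", wait=True)
print(" ".join(cmd))
```

A list built this way can be handed to Python's `subprocess.run`, whose `returncode` attribute then carries the pmcmd return code.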


Table 23-2 describes the connection information you enter each time you write a command in the command line mode:

Table 23-2. Connection Information for the Command Line Mode

Parameter | Flags | Required/Optional | Description
--- | --- | --- | ---
username | -user, -u | Required | Your repository username. Required if userEnvVar is not used.
userEnvVar | -uservar, -uv | Required | Specifies the username environment variable. Required if username is not used. If you do not encrypt your password, you can use -u $username, and run the command from a shell script.
password | -password, -p | Required | Your repository password. Required if passwordEnvVar is not used.
passwordEnvVar | -passwordvar, -pv | Required | Specifies the password environment variable. Required if password is not used. If you do not encrypt your password, you can use -p $password, and run the command from a shell script.
serveraddr | -serveraddr, -s | Required | Server address of the machine hosting the PowerCenter Server.
host | N/A | Optional | Name of the machine hosting the PowerCenter Server. If you do not specify a host name, pmcmd assumes the PowerCenter Server runs on the machine executing pmcmd.
portno | N/A | Required | Port number at which the PowerCenter Server listens.

pmcmd Return Codes

When you work in the command line mode, pmcmd indicates the success or failure of a command with a return code. A return code of zero (0) indicates that the command succeeded. Any other return code indicates that the command failed.

Table 23-3 describes the return codes for command line pmcmd.

Table 23-3. pmcmd Return Codes

Code Description

0 For all commands, a return value of zero indicates that the command ran successfully. You can issue these commands in the wait or nowait mode: starttask, startworkflow, resumeworklet, resumeworkflow, aborttask, and abortworkflow. If you issue a command in the wait mode, a return value of zero indicates the command ran successfully. If you issue a command in the nowait mode, a return value of zero indicates that the request was successfully transmitted to the PowerCenter Server, and it acknowledged the request.

1 The PowerCenter Server is down, or pmcmd cannot connect to the PowerCenter Server. The TCP/IP host name or port number may be incorrect, or a network problem occurred.

2 The specified task name, workflow name, or folder name does not exist.


3 An error occurred in starting or running the workflow or task.

4 Usage error. You passed the wrong parameters to pmcmd.

5 An internal pmcmd error occurred. Contact Informatica Technical Support.

6 An error occurred while stopping the PowerCenter Server. Contact Informatica Technical Support.

7 You used an invalid username or password.

8 You do not have the appropriate permissions or privileges to perform this task.

9 The connection to the PowerCenter Server timed out while sending the request.

12 The PowerCenter Server cannot start recovery because the session or workflow is scheduled, suspending, waiting for an event, waiting, initializing, aborting, stopping, disabled, or running.

13 The username environment variable is not defined.

14 The password environment variable is not defined.

15 The username environment variable is missing.

16 The password environment variable is missing.

17 Parameter file does not exist.

18 The PowerCenter Server found the parameter file, but it did not have the initial values for the session parameters, such as $input or $output.

19 The PowerCenter Server cannot start the session in recovery mode because the workflow is configured to run continuously.

20 A repository error has occurred. Please make sure that the Repository Server and the database are running and the number of connections to the database is not exceeded.

21 PowerCenter Server is shutting down and it is not accepting new requests.

22 The PowerCenter Server cannot find a unique instance of workflow/session you specified. Enter the command again with the folder name and workflow name.

23 There is no data available for your request.

24 Out of memory.

25 Command is cancelled.
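A script that calls pmcmd usually branches on the return code. The following Python sketch maps the codes listed in this table to short messages. The dictionary name, the abbreviated wording, and the describe_return_code helper are illustrative assumptions and are not part of pmcmd.

```python
# Illustrative lookup of the pmcmd return codes listed in Table 23-3.
# Messages are abbreviated from the table; all names here are hypothetical.
PMCMD_RETURN_CODES = {
    0: "Success (in nowait mode: request acknowledged by the server)",
    1: "Server down or connection problem",
    2: "Task, workflow, or folder name does not exist",
    3: "Error starting or running the workflow or task",
    4: "Usage error",
    5: "Internal pmcmd error",
    6: "Error while stopping the PowerCenter Server",
    7: "Invalid username or password",
    8: "Insufficient permissions or privileges",
    9: "Connection timed out while sending the request",
    12: "Cannot start recovery in the current session or workflow state",
    13: "Username environment variable is not defined",
    14: "Password environment variable is not defined",
    15: "Username environment variable is missing",
    16: "Password environment variable is missing",
    17: "Parameter file does not exist",
    18: "Parameter file lacks initial values for session parameters",
    19: "Cannot recover: workflow is configured to run continuously",
    20: "Repository error",
    21: "Server is shutting down and not accepting new requests",
    22: "No unique instance of the workflow or session",
    23: "No data available for the request",
    24: "Out of memory",
    25: "Command is cancelled",
}


def describe_return_code(code: int) -> str:
    """Return a short description, or a fallback for codes not in the table."""
    return PMCMD_RETURN_CODES.get(code, f"Unknown return code {code}")
```

A wrapper script could log describe_return_code(result) after each pmcmd call and stop on any non-zero value.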



Using the Interactive Mode

Use pmcmd in the interactive mode to start and stop workflows and tasks without writing a script. Once you establish a dedicated connection to the PowerCenter Server, you can issue commands without specifying the connection information. For example, to start the workflow wFlow4 in the interactive mode, type the following at the pmcmd prompt:

pmcmd> startworkflow wFlow4

The following commands immediately start the workflow wSalesAvg, located in the east folder:

pmcmd> connect -user seller3 -password jackson -serveraddr SALES:6258

pmcmd> setwait

pmcmd> setfolder east

pmcmd> startworkflow wSalesAvg

The setwait command means that for all subsequent commands, pmcmd returns the command prompt when the workflow completes. The setfolder command means that for all subsequent commands dealing with workflows or tasks, pmcmd uses the specified workflow or task from the east folder.

For a list of commands you can use in the interactive mode, see Table 23-1 on page 582. For details on each command, see “pmcmd Reference” on page 594.

Connecting to the PowerCenter Server in the Interactive Mode

To use pmcmd in the interactive mode, first establish a dedicated connection to the PowerCenter Server.

To start in the interactive mode:

1. In either a Windows DOS session or a UNIX session, navigate to the directory where the PowerCenter Server is installed.

2. At the shell or command prompt, type:

pmcmd

This command returns the PowerCenter version number and the pmcmd prompt.

3. From the pmcmd prompt, type:

connect -u YourUserName -p YourPassword -s ServerName:PortNo

Or, if you use username and password environment variables, type the following at the pmcmd prompt:

connect -uv USERNAME -pv PASSWORD -serveraddr ServerName:PortNo

For information on defining user name and password environment variables, see “Configuring Repository Username and Password” on page 586.


If you omit connection information, pmcmd prompts you to enter the correct information. Once pmcmd successfully connects, you receive the pmcmd prompt. At the pmcmd prompt, you can issue commands without specifying the connection information.

Setting Defaults in the Interactive Mode

Once you connect to a PowerCenter Server using pmcmd interactive mode, you can designate default folders or conditions to use each time the PowerCenter Server executes a command. For example, if you want to issue a series of commands on tasks in the same folder, specify the name of the folder with the setfolder command. All subsequent commands use that folder as the default.

Table 23-4 describes the commands that you can use to set defaults for subsequent commands.

For a list of all the commands that you can use in the interactive mode, see Table 23-1 on page 582.

Table 23-4. Setting Defaults for the Interactive Mode

Command Description

setfolder Designates a folder as the default folder in which to execute all subsequent commands.

setnowait Instructs the PowerCenter Server to execute subsequent commands in the nowait mode. The pmcmd prompt is available after the PowerCenter Server receives the previous command. The nowait mode is the default mode.

setwait Instructs the PowerCenter Server to execute subsequent commands in the wait mode. The pmcmd prompt is available only after the PowerCenter Server completes the previous command.

showsettings Displays the following settings for the interactive mode: the name of the PowerCenter Server and repository to which pmcmd is connected, the username, the wait mode, and the default folder.

unsetfolder Reverses the setfolder command.
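The way these defaults interact can be pictured with a small model. The Python sketch below is only a toy illustration of the behavior described in this section: the class and method names are assumptions, it does not talk to pmcmd, and the rule that an explicit folder overrides the default for one command comes from the setfolder description in the pmcmd Reference.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InteractiveDefaults:
    """Toy model of the interactive-mode defaults (not pmcmd code)."""

    folder: Optional[str] = None  # no default folder until setfolder is issued
    wait: bool = False            # nowait is the default mode

    def setfolder(self, folder: str) -> None:
        self.folder = folder

    def unsetfolder(self) -> None:
        self.folder = None

    def setwait(self) -> None:
        self.wait = True

    def setnowait(self) -> None:
        self.wait = False

    def effective_folder(self, explicit: Optional[str] = None) -> Optional[str]:
        # A folder named in a command overrides the default for that command only.
        return explicit if explicit is not None else self.folder
```

For example, after setfolder("east"), effective_folder() gives "east", while effective_folder("west") gives "west" and leaves the default unchanged.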


pmcmd Reference

pmcmd provides multiple ways to enter some of the parameters. For example, to enter a repository password, use the following syntax:

<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>

You can use -password or -p before entering a password. Or, use -passwordvar or -pv before a password environment variable.

To enter a password, precede the password with either the -password or the -p flag.

-password YourPassword

or

-p YourPassword

If you use a password environment variable, precede the variable name with either the -pv flag or the -passwordvar flag.

-passwordvar PASSWORD

or

-pv PASSWORD

For a list of all the parameters you can use with pmcmd, see Table 23-5 on page 594.
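When a script assembles a pmcmd command, it has to pick one form of the username and one form of the password. The following Python sketch shows one way to do that with the short flags described above. The function name and its keyword arguments are hypothetical; only the -u, -uv, -p, and -pv flags come from this guide.

```python
from typing import List, Optional


def credential_args(username: Optional[str] = None,
                    user_env_var: Optional[str] = None,
                    password: Optional[str] = None,
                    password_env_var: Optional[str] = None) -> List[str]:
    """Return pmcmd flags for exactly one username form and one password form."""
    if (username is None) == (user_env_var is None):
        raise ValueError("give exactly one of username or user_env_var")
    if (password is None) == (password_env_var is None):
        raise ValueError("give exactly one of password or password_env_var")

    # Literal values use -u/-p; environment variable names use -uv/-pv.
    args = ["-u", username] if username is not None else ["-uv", user_env_var]
    args += ["-p", password] if password is not None else ["-pv", password_env_var]
    return args
```

Passing the names of environment variables keeps the literal password out of the script and out of the process list.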

Command Parameters

When you use most parameters, you precede the parameter with a flag. For ease of use, you can use a shortened version for most flags. For example, you can either use -serveraddr or its shortened equivalent, -s.

Table 23-5 describes the parameters used in pmcmd commands and lists the associated flags:

Table 23-5. Command Parameters

Parameter | Flags | Description

folder | -folder, -f | Name of the folder containing the workflow or task. Required if the workflow or task name is not unique in the repository.

host | N/A | The name of the machine hosting the PowerCenter Server. If you do not specify a host name, pmcmd assumes the PowerCenter Server runs on the machine executing pmcmd.

localparamfile | -localparamfile, -lpf | The localparamfile is a parameter file on a local machine that pmcmd uses when you start a workflow. Use in conjunction with the startworkflow command.

paramfile | -paramfile | The paramfile parameter determines which parameter file is used when a task or workflow runs. It overrides the configured parameter file for the workflow or task. Use in conjunction with the starttask or startworkflow commands.

password | -password, -p | Your repository password. Required if passwordEnvVar is not used.

passwordEnvVar | -passwordvar, -pv | Specifies the password environment variable. Required if password is not used.

portno | N/A | Specifies the port number at which the PowerCenter Server listens.

recovery | -recovery | Specifies you want to run the session in recovery mode.

serveraddr | -serveraddr, -s | Server address of the machine hosting the PowerCenter Server.

startfrom | -startfrom | Starts a workflow from a specified task, taskInstancePath. Use the startfrom parameter in conjunction with the startworkflow command. Write the taskInstancePath as a fully qualified string.

taskInstancePath | N/A | Indicates a task and where it appears within the workflow. A task within a workflow is indicated by its task name alone. A task within a worklet is indicated by WorkletName.TaskName.

userEnvVar | -uservar, -uv | Specifies the username environment variable. Required if username is not used.

username | -user, -u | Your repository username. Required if userEnvVar is not used.

workflow | -workflow, -w | Name of the workflow.

Using Quotation Marks

If a command parameter contains spaces, use single or double quotation marks to enclose the parameter. For example, use single quotes in the following syntax to enclose the folder name:

abortworkflow -f 'quarterly sales' -wait Q3workflow

To denote an empty string, use two single quotes ('') or two double quotes (""). Be sure you match an opening quote with a closing quote.

Syntax Notation

Table 23-6 describes the notation used in pmcmd syntax:

Table 23-6. pmcmd Syntax Notation

Convention Description

-z Flag placed before a parameter. This designates the parameter you enter. For example, to enter the username, type -u or -user followed by the username.

< x > Required parameter. If you omit a required parameter, pmcmd returns an error message.

< x | y > Select between required parameters. For the command to run, you must select from the listed parameters. If you omit a required parameter, pmcmd returns an error message.

[ x ] Optional parameter. The command runs whether or not you enter optional parameters. For example, if you want to use the help command, the syntax is as follows: help [command]. If you enter a command, pmcmd returns information on that command only. If you omit the command name, pmcmd returns a list of all commands.

[ x | y ] Select between optional parameters. The command runs whether or not you enter optional parameters. For example, many commands run in either the wait or nowait mode: [-wait|-nowait]. The command runs in the mode you specify. If you do not specify a mode, pmcmd runs the command in the default nowait mode.

<< x | y > | < a | b >> When a set contains subsets, the superset is indicated with bold brackets < >. A bold pipe symbol (|) separates the subsets.

Tip: When you enter commands in pmcmd, type the command name first, followed by the optional parameters in any order.

Aborttask

The aborttask command aborts a task. Issue this command only after the PowerCenter Server fails to stop the task when you issue the stoptask command. For details on how the PowerCenter Server aborts and stops tasks, see “Server Handling of Stop and Abort” on page 129.

In the command line mode, use the following syntax to abort a task:

pmcmd aborttask

<-serveraddr|-s> [host:]portno

<<-user|-u> username|<-uservar|-uv> userEnvVar>

<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>

[<-folder|-f> folder]

<<-workflow|-w> workflow>

[-wait|-nowait]

taskInstancePath

In the interactive mode, enter the following syntax at the pmcmd prompt to abort a task:

aborttask

[<-folder|-f> folder]

<<-workflow|-w> workflow>

[-wait|-nowait]

taskInstancePath

Write the taskInstancePath as a fully qualified string. If the task is within a worklet, write the string as WorkletName.TaskName. If the task is directly within a workflow, use the task name alone.

For information on other parameters used in this command, see Table 23-5 on page 594.
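As an illustration of how the aborttask parameters fit together, the following Python sketch builds the command line as an argument list. The function names are hypothetical, and the sketch assumes the credentials are supplied through environment variables with the -uv and -pv flags. It only assembles the arguments and does not run pmcmd.

```python
from typing import List, Optional


def task_instance_path(task: str, worklet: Optional[str] = None) -> str:
    """Qualify a task name with its worklet when the task is inside one."""
    return f"{worklet}.{task}" if worklet else task


def aborttask_command(serveraddr: str, user_var: str, password_var: str,
                      workflow: str, task: str,
                      worklet: Optional[str] = None,
                      folder: Optional[str] = None,
                      wait: bool = False) -> List[str]:
    """Assemble a pmcmd aborttask command line (hypothetical helper)."""
    cmd = ["pmcmd", "aborttask", "-s", serveraddr,
           "-uv", user_var, "-pv", password_var]
    if folder:
        cmd += ["-f", folder]  # required only when the name is not unique
    cmd += ["-w", workflow,
            "-wait" if wait else "-nowait",
            task_instance_path(task, worklet)]
    return cmd
```

Building the command as a list rather than a single string avoids quoting problems when a folder or task name contains spaces.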

Abortworkflow

The abortworkflow command aborts a workflow. Issue this command only after the PowerCenter Server fails to stop the workflow when you issue the stopworkflow command. For details on how the PowerCenter Server aborts and stops workflows, see “Server Handling of Stop and Abort” on page 129.

In the command line mode, use the following syntax to abort a workflow:

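pmcmd abortworkflow

<-serveraddr|-s> [host:]portno

<<-user|-u> username|<-uservar|-uv> userEnvVar>

<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>

[<-folder|-f> folder]

[-wait|-nowait]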

workflow

In the interactive mode, enter the following syntax at the pmcmd prompt to abort a workflow:

abortworkflow

[<-folder|-f> folder]

[-wait|-nowait]

workflow


Connect

The connect command connects the pmcmd program to the PowerCenter Server in the interactive mode. If you omit connection information, pmcmd prompts you to enter the correct information. Once pmcmd successfully connects, you receive the pmcmd prompt. At the pmcmd prompt, you can issue commands without specifying the connection information.

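connect

<-serveraddr|-s> [host:]portno

<<-user|-u> username|<-uservar|-uv> userEnvVar>

<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>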

Note: You can use this command in the interactive mode only.

Disconnect

The disconnect command disconnects pmcmd from the PowerCenter Server. It does not close the pmcmd program. Use this command when you want to disconnect from a PowerCenter Server and connect to another in the interactive mode.

In the interactive mode, use the following syntax to disconnect pmcmd from a PowerCenter Server:

disconnect

Note: You can use this command only in the pmcmd interactive mode.

Exit

The exit command disconnects pmcmd from the PowerCenter Server and closes the pmcmd program.

In the interactive mode, use the following syntax to exit pmcmd:

exit

Note: You can use this command only in the pmcmd interactive mode.

Getrunningsessionsdetails

The getrunningsessionsdetails command returns the details for all sessions currently running on the PowerCenter Server. Details include startup and current time, folder and workflow names, session instance, master and execution servers, number of successful and failed rows in sources and targets, number of transformation errors, and number of sessions running on the PowerCenter Server.

In the command line mode, use the following syntax to get details about sessions running on the PowerCenter Server:

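pmcmd getrunningsessionsdetails

<-serveraddr|-s> [host:]portno

<<-user|-u> username|<-uservar|-uv> userEnvVar>

<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>

In the interactive mode, enter the following syntax at the pmcmd prompt to get details about sessions running on the PowerCenter Server: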

getrunningsessionsdetails


Getserverdetails

The getserverdetails command returns details about workflows and tasks running on a PowerCenter Server.

♦ Workflow details. Workflow details include the name of the PowerCenter Server, folder, workflow, workflow log file, and user that runs the workflow. It includes workflow run type, start time, run status, and run error code. It also includes the number of active workflows and the number of scheduled workflows.

♦ Task details. In addition to workflow details, task details include folder name, workflow name, task instance name, task type, task start time, task run status, task run error code, and task run mode. When the task is a session, the getserverdetails command also returns master server name, worker server name, server grid name, the number of active sessions, and the number of waiting sessions.

In the command line mode, use the following syntax to get details about the PowerCenter Server:

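pmcmd getserverdetails

<-serveraddr|-s> [host:]portno

<<-user|-u> username|<-uservar|-uv> userEnvVar>

<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>

[-all|-running|-scheduled]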

In the interactive mode, enter the following syntax at the pmcmd prompt to get details about the PowerCenter Server:

getserverdetails

[-all|-running|-scheduled]

Issue the getserverdetails command for all or some of the workflows. The -running option returns status details on active workflows. Active workflows include running, suspending, and suspended workflows. The -scheduled option returns status details on the scheduled workflows. The default option is the -all option, and it returns status details on the scheduled and running workflows.


Getserverproperties

The getserverproperties command returns the PowerCenter Server name, type, and version. It returns the timestamp on the PowerCenter Server, the PowerCenter Server startup time, and the name of the repository. It indicates the data movement mode, the PowerCenter Server code page, and whether the PowerCenter Server can debug mappings. It also specifies the server grid name.

In the command line mode, use the following syntax to see the PowerCenter Server properties:

pmcmd getserverproperties

<-serveraddr|-s> [host:]portno

In the interactive mode, enter the following syntax at the pmcmd prompt to see PowerCenter Server properties:

getserverproperties


Serveraddr is the server name and port number of the PowerCenter Server.

Getsessionstatistics

The getsessionstatistics command returns session details and statistics. The command returns the following information for each partition:

♦ Session details. Session details include the name of the folder, workflow, task instance, and mapping. It includes the task run status, session log file name, first error code and message, the number of transformation errors, and the number of successful and failed rows for the sources and targets. It also includes the name of the master server, worker server, and server grid.

♦ Session statistics. Session statistics include the transformation name, transformation instance name, and the number of applied, affected, and rejected rows. It also includes the throughput, last error code and message, and start and end time for the session.

In the command line mode, use the following syntax to get session statistics:

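pmcmd getsessionstatistics

<-serveraddr|-s> [host:]portno

<<-user|-u> username|<-uservar|-uv> userEnvVar>

<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>

[<-folder|-f> folder]

<<-workflow|-w> workflow>

taskInstancePath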

In the interactive mode, enter the following syntax at the pmcmd prompt to get session statistics:

getsessionstatistics

[<-folder|-f> folder]

<<-workflow|-w> workflow>

taskInstancePath

When using this command, specify the workflow name. Also, write the taskInstancePath as a fully qualified string. If the task is within a worklet, write the string as WorkletName.TaskName. If the task is directly within a workflow, enter only the task name.



Gettaskdetails

The gettaskdetails command returns the folder name, workflow name, task instance name, task type, last execution start time, last execution complete time, task run status, and task run mode. It also returns the run error code and message.

If you issue the gettaskdetails command for a Session task, the command also returns the following additional information: mapping name, session log file name, first error code and message, number of successful and failed rows from the source and target, the number of transformation errors, master server name, worker server name, and server grid name.

In the command line mode, use the following syntax to get details on a task:

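pmcmd gettaskdetails

<-serveraddr|-s> [host:]portno

<<-user|-u> username|<-uservar|-uv> userEnvVar>

<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>

[<-folder|-f> folder]

<<-workflow|-w> workflow>

taskInstancePath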

In the interactive mode, enter the following syntax at the pmcmd prompt to get details on a task:

gettaskdetails

[<-folder|-f> folder]

<<-workflow|-w> workflow>

taskInstancePath

When you use this command, specify the workflow name. Also, write the taskInstancePath as a fully qualified string. If the task is within a worklet, write the string as WorkletName.TaskName. If the task is directly within a workflow, enter only the task name.


Getworkflowdetails

The getworkflowdetails command returns the folder name, workflow name, last start time, last completion time, workflow status, run mode, and the username that ran the last workflow.

In the command line mode, use the following syntax to get details on a workflow:

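pmcmd getworkflowdetails

<-serveraddr|-s> [host:]portno

<<-user|-u> username|<-uservar|-uv> userEnvVar>

<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>

[<-folder|-f> folder]

workflow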

In the interactive mode, enter the following syntax at the pmcmd prompt to get details on a workflow:

getworkflowdetails

[<-folder|-f> folder]

workflow


Help

The help command returns the syntax for the command you specify. If you omit the command name, pmcmd lists each command and syntax.

In the command line mode, use the following command for help with command line commands:

pmcmd help [command]

In the interactive mode, use the following command for help with interactive mode commands:

help [command]

Pingserver

The pingserver command verifies that the PowerCenter Server is running.

In the command line mode, use the following syntax to ping the PowerCenter Server:

pmcmd pingserver

<-serveraddr|-s> [host:]portno

In the interactive mode, enter the following syntax at the pmcmd prompt to ping the PowerCenter Server:

pingserver

Serveraddr is the host name and port number of the PowerCenter Server.

Quit

The quit command disconnects pmcmd from the PowerCenter Server and closes the pmcmd program.

In the interactive mode, use the following syntax to quit pmcmd:

quit

Note: You can use this command in the pmcmd interactive mode only.


Resumeworkflow

The resumeworkflow command resumes suspended workflows. To resume a workflow, specify the folder and workflow name. The PowerCenter Server resumes the workflow from all suspended and failed worklets and all suspended and failed Command, Email, and Session tasks.

In the command line mode, use the following syntax to resume a workflow:

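pmcmd resumeworkflow

<-serveraddr|-s> [host:]portno

<<-user|-u> username|<-uservar|-uv> userEnvVar>

<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>

[<-folder|-f> folder]

[-wait|-nowait]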

[-recovery]

workflow

In the interactive mode, enter the following syntax at the pmcmd prompt to resume a workflow:

resumeworkflow

[<-folder|-f> folder]

[-wait|-nowait]

[-recovery]

workflow


Resumeworklet

The resumeworklet command resumes suspended worklets. To resume the workflow from a specific worklet, specify the taskInstancePath as a fully qualified string. If you do not specify a taskInstancePath, the workflow resumes from the suspended worklet.

In the command line mode, use the following syntax to resume a worklet:

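pmcmd resumeworklet

<-serveraddr|-s> [host:]portno

<<-user|-u> username|<-uservar|-uv> userEnvVar>

<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>

[<-folder|-f> folder]

<<-workflow|-w> workflow>

[-wait|-nowait]

[-recovery]

taskInstancePath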

In the interactive mode, enter the following syntax at the pmcmd prompt to resume a worklet:

resumeworklet

[<-folder|-f> folder]

<<-workflow|-w> workflow>

[-wait|-nowait]

[-recovery]

taskInstancePath


Scheduleworkflow

The scheduleworkflow command instructs the PowerCenter Server to schedule a workflow. Use this command to reschedule a workflow that has been removed from the schedule.

In the command line mode, use the following syntax to schedule a workflow:

pmcmd scheduleworkflow <-serveraddr|-s> [host:]portno

<<-user|-u> username|<-uservar|-uv> user_env_var>

<<-password|-p> password|<-passwordvar|-pv> password_env_var>

[<-folder|-f> folder] workflow

In the interactive mode, enter the following syntax at the pmcmd prompt to schedule a workflow:

scheduleworkflow [<-folder|-f> folder] workflow


Setfolder

The setfolder command designates a folder as the default folder in which to execute all subsequent commands. After issuing this command, you do not need to enter a folder name for workflow, task, and session commands. If you enter a folder name in a command after the setfolder command, that folder name overrides the default folder name for that command only.

In the interactive mode, enter the following syntax at the pmcmd prompt to designate a folder as the default folder:

setfolder folder



Setnowait

The setnowait command instructs the PowerCenter Server to execute subsequent commands in the nowait mode. The nowait mode is the default mode.

In the interactive mode, enter the following syntax at the pmcmd prompt to instruct the PowerCenter Server to execute subsequent commands in the nowait mode:

setnowait

When the nowait mode is set, the pmcmd prompt is available after the PowerCenter Server receives the previous command. No parameters are required for this command.


Setwait

The setwait command instructs the PowerCenter Server to execute subsequent commands in the wait mode. The pmcmd prompt is available only after the PowerCenter Server completes the previous command.

In the interactive mode, enter the following syntax at the pmcmd prompt to instruct the PowerCenter Server to execute subsequent commands in the wait mode:

setwait

No parameters are required for this command.


Showsettings

The showsettings command returns the name of the PowerCenter Server and repository to which pmcmd is connected. It displays the username, wait mode, and default folder. No parameters are required for this command.

In the interactive mode, enter the following syntax at the pmcmd prompt to display interactive mode settings:

showsettings


Shutdownserver

The shutdownserver command stops the PowerCenter Server. You must have the Super User or Administer Server privilege to use this command.

You can shut down the PowerCenter Server in the complete, stop, or abort mode. In the complete mode, pmcmd allows currently running workflows to complete before shutting down the PowerCenter Server. In the stop mode, the PowerCenter Server stops the running workflows. In the abort mode, the PowerCenter Server aborts the running workflows. For more information on the implications of stopping or aborting a workflow, see “Stopping or Aborting the Workflow” on page 129.

In the command line mode, use the following syntax to stop the PowerCenter Server:

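pmcmd shutdownserver

<-serveraddr|-s> [host:]portno

<<-user|-u> username|<-uservar|-uv> userEnvVar>

<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>

<-complete|-stop|-abort>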

In the interactive mode, enter the following syntax at the pmcmd prompt to stop the PowerCenter Server:

shutdownserver

<-complete|-stop|-abort>


Starttask

The starttask command starts a task.

In the command line mode, use the following syntax to start a task:

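pmcmd starttask

<-serveraddr|-s> [host:]portno

<<-user|-u> username|<-uservar|-uv> userEnvVar>

<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>

[<-folder|-f> folder]

<<-workflow|-w> workflow>

[-paramfile paramfile]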

[-wait|-nowait]

[-recovery]

taskInstancePath

In the interactive mode, enter the following syntax at the pmcmd prompt to start a task:

starttask

[<-folder|-f> folder]

<<-workflow|-w> workflow>

[-paramfile paramfile]

[-wait|-nowait]

[-recovery]

taskInstancePath


Write the taskInstancePath as a fully qualified string. If the task is within a worklet, write the string as WorkletName.TaskName. If the task is directly within a workflow, enter only the task name.

Using Parameter Files with Starttask

When you start a task, you can optionally enter the directory and name of a parameter file. The PowerCenter Server runs the task using the parameters in the file you specify.

For UNIX shell users, enclose the parameter file name in single quotes:

-paramfile '$PMRootDir/myfile.txt'

For Windows command prompt users, the parameter file name cannot have leading or trailing spaces. If the name includes spaces, enclose the file name in double quotes:

-paramfile "$PMRootDir\my file.txt"

When you write a pmcmd command that includes a parameter file located on another machine, use the backslash (\) with the dollar sign ($). This ensures that the machine where the variable is defined expands the server variable.

pmcmd starttask -uv USERNAME -pv PASSWORD -s SALES:6258 -f east -w wSalesAvg -paramfile '\$PMRootDir/myfile.txt' taskA
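If a script generates the UNIX command line, it can apply the quoting rules above programmatically. The following Python sketch is one possible approach under stated assumptions: the function name and the remote flag are hypothetical, and it relies on the standard library function shlex.quote, which wraps a value in single quotes when the value contains characters that the shell would interpret.

```python
import shlex


def paramfile_arg_unix(path: str, remote: bool = False) -> str:
    """Build the -paramfile argument text for a UNIX shell command line."""
    if remote:
        # Prefix each dollar sign with a backslash, as described above for
        # parameter files located on another machine.
        path = path.replace("$", "\\$")
    # shlex.quote adds single quotes because $ (or a space) is shell-special.
    return "-paramfile " + shlex.quote(path)
```

With the path $PMRootDir/myfile.txt, this produces the same quoted forms as the two UNIX examples in this section. The sketch does not cover the Windows command prompt, which uses double quotes.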


Startworkflow

The startworkflow command starts a workflow.

In the command line mode, use the following syntax to start a workflow:

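pmcmd startworkflow

<-serveraddr|-s> [host:]portno

<<-user|-u> username|<-uservar|-uv> userEnvVar>

<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>

[<-folder|-f> folder]

[<-startfrom> taskInstancePath]

[-recovery]

[-paramfile paramfile]

[<-localparamfile|-lpf> localparamfile]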

[-wait|-nowait]

workflow

In the interactive mode, enter the following syntax at the pmcmd prompt to start a workflow:

startworkflow

[<-folder|-f> folder]

[<-startfrom> taskInstancePath]

[-recovery]

[-paramfile paramfile]

[<-localparamfile|-lpf> localparamfile]

[-wait|-nowait]

workflow

Use the -startfrom flag to start the workflow at a designated taskInstancePath. Write the taskInstancePath as a fully qualified string. If the task is within a worklet, write the string as WorkletName.TaskName. If the task is directly within a workflow, enter only the task name. If you do not specify a starting point, the workflow starts at the Start task.

Using Parameter Files with Startworkflow

When you start a workflow, you can optionally enter the directory and name of a parameter file. The PowerCenter Server runs the workflow using the parameters in the file you specify. For UNIX shell users, enclose the parameter file name in single quotes. For Windows command prompt users, the parameter file name cannot have leading or trailing spaces. If the name includes spaces, enclose the file name in double quotes.

You can use parameter files located on the following machines:

♦ PowerCenter Server machine. When you use a parameter file located on the PowerCenter Server machine, use the -paramfile option to indicate the location and name of the parameter file.

On UNIX, use the following syntax:

-paramfile '$PMRootDir/myfile.txt'

On Windows, use the following syntax:

-paramfile "$PMRootDir\my file.txt"

♦ Local machine. When you use a parameter file located on the machine where pmcmd is invoked, pmcmd passes variables and values in the file to the PowerCenter Server. When you list a local parameter file, specify the absolute path or relative path to the file. Use the -localparamfile or -lpf option to indicate the location and name of the local parameter file.

On UNIX, use the following syntax:

-lpf 'param_file.txt'

-lpf 'c:\Informatica\parameterfiles\param file.txt'

-localparamfile 'c:\Informatica\parameterfiles\param file.txt'

On Windows, use the following syntax:

-lpf param_file.txt

-lpf "c:\Informatica\parameterfiles\param file.txt"

-localparamfile param_file.txt

♦ Shared network drives. When you use a parameter file located on another machine, use the backslash (\) with the dollar sign ($). This ensures that the machine where the variable is defined expands the server variable.

-paramfile '\$PMRootDir/myfile.txt'


Stoptask

The stoptask command stops a task.

In the command line mode, use the following syntax to stop a task:

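pmcmd stoptask

<-serveraddr|-s> [host:]portno

<<-user|-u> username|<-uservar|-uv> userEnvVar>

<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>

[<-folder|-f> folder]

<<-workflow|-w> workflow>

[-wait|-nowait]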

taskInstancePath

In the interactive mode, enter the following syntax at the pmcmd prompt to stop a task:

stoptask

[<-folder|-f> folder]

<<-workflow|-w> workflow>

[-wait|-nowait]

taskInstancePath



Stopworkflow

The stopworkflow command stops a workflow.

In the command line mode, use the following syntax to stop a workflow:

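pmcmd stopworkflow

<-serveraddr|-s> [host:]portno

<<-user|-u> username|<-uservar|-uv> userEnvVar>

<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>

[<-folder|-f> folder]

[-wait|-nowait]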

workflow

In the interactive mode, enter the following syntax at the pmcmd prompt to stop a workflow:

stopworkflow

[<-folder|-f> folder]

[-wait|-nowait]

workflow


Unscheduleworkflow

The unscheduleworkflow command instructs the PowerCenter Server to remove the workflow from the schedule.

In the command line mode, use the following syntax to remove the workflow from the schedule:

pmcmd unscheduleworkflow <-serveraddr|-s> [host:]portno

<<-user|-u> username|<-uservar|-uv> user_env_var>

<<-password|-p> password|<-passwordvar|-pv> password_env_var>

[<-folder|-f> folder] workflow

In the interactive mode, enter the following syntax at the pmcmd prompt to remove the workflow from the schedule:

unscheduleworkflow [<-folder|-f> folder] workflow


Unsetfolder

The unsetfolder command designates no folder as the default folder. After you issue this command, you must specify a folder name each time you enter a command for a session, workflow, or task.

In the interactive mode, enter the following syntax at the pmcmd prompt to clear the setfolder command:

unsetfolder

No parameters are required for this command.



Version

The version command displays the PowerCenter version and Informatica trademark and copyright information.

In the command line mode, use the following command to verify the PowerCenter version:

pmcmd version

In the interactive mode, enter the following syntax at the pmcmd prompt to verify the PowerCenter version:

version

Waittask

The waittask command instructs the PowerCenter Server to complete the task before returning the pmcmd prompt to the command prompt or shell.

In the command line mode, use the following syntax to set a task in the wait mode:

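pmcmd waittask

<-serveraddr|-s> [host:]portno

<<-user|-u> username|<-uservar|-uv> userEnvVar>

<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>

[<-folder|-f> folder]

<<-workflow|-w> workflow>

taskInstancePath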

In the interactive mode, enter the following syntax at the pmcmd prompt to set a task in the wait mode:

waittask

[<-folder|-f> folder]

<<-workflow|-w> workflow>

taskInstancePath



Waitworkflow

The waitworkflow command notifies you whether the specified workflow has run successfully or is not running. If the workflow is running, pmcmd indicates success with return code 0 after the workflow completes. If the workflow is not running, pmcmd indicates this with return code 3. For more information on pmcmd return codes, see “pmcmd Return Codes” on page 590.

The waitworkflow command returns the pmcmd prompt to the command prompt or shell when a workflow completes.

In the command line mode, use the following syntax to set a workflow to the wait mode:

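pmcmd waitworkflow

<-serveraddr|-s> [host:]portno

<<-user|-u> username|<-uservar|-uv> userEnvVar>

<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>

[<-folder|-f> folder]

workflow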

In the interactive mode, enter the following syntax at the pmcmd prompt to set a workflow to the wait mode:

waitworkflow

[<-folder|-f> folder]

workflow

You can use waitworkflow in conjunction with the startworkflow command if you are running scripts. For example, you may want to check the status of a critical workflow that was previously started. You can use the waitworkflow command to wait for that workflow to complete before you start the next workflow.
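The scripted pattern described above might look like the following Python sketch. It is a minimal illustration, not a supported utility: the function names and the injectable run parameter are assumptions, and it presumes that pmcmd is on the PATH and that its process exit code carries the return codes described in this chapter.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], int]


def run_pmcmd(args: Sequence[str]) -> int:
    """Run pmcmd with the given arguments and return its exit code."""
    return subprocess.run(["pmcmd", *args]).returncode


def start_after(first: str, second: str, connection: List[str],
                folder: str, run: Runner = run_pmcmd) -> int:
    """Wait for workflow `first` to complete, then start workflow `second`."""
    target = ["-f", folder]
    code = run(["waitworkflow", *connection, *target, first])
    if code != 0:
        # For example, 3 means the first workflow is not running.
        # This sketch stops on any non-zero code.
        return code
    return run(["startworkflow", *connection, *target, "-nowait", second])
```

The run parameter exists so that the logic can be exercised without a PowerCenter Server. A production script may want to treat return code 3 differently, for example when the first workflow has already finished.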



Chapter 24

Session Caches

This chapter includes the following topics:

♦ Overview

♦ Determining Cache Requirements

♦ Cache Partitioning

♦ Aggregator Caches

♦ Joiner Caches

♦ Lookup Caches

♦ Rank Caches


Overview

The PowerCenter Server creates index and data caches in memory for Aggregator, Rank, Joiner, and Lookup transformations in a mapping. The PowerCenter Server stores key values in the index cache and output values in the data cache. You configure memory parameters for the index and data cache in the transformation or session properties.

If the PowerCenter Server requires more memory, it stores overflow values in cache files. When the session completes, the PowerCenter Server releases cache memory, and in most circumstances, it deletes the cache files.

The PowerCenter Server creates cache files based on the PowerCenter Server code page.

Table 24-1 gives an overview of the type of information that the PowerCenter Server stores in the index and data caches:

Table 24-1. Caching Storage Overview

Transformation | Index Cache | Data Cache
Aggregator | Stores group values as configured in the group by ports. | Stores calculations based on the group by ports.
Rank | Stores group values as configured in the group by ports. | Stores ranking information based on the group by ports.
Joiner | Stores index values for the master source table as configured in the join condition. | Stores master source rows.
Lookup | Stores lookup condition information. | Stores lookup data that is not stored in the index cache.

Memory Cache

The PowerCenter Server creates a memory cache based on the size configured in the session properties. When you create a mapping, you specify the index and data cache size for each transformation instance. When you create a session, you can override the index and data cache size for each transformation instance in the session properties.

When you configure a session, you calculate the amount of memory the PowerCenter Server needs to process the session. Calculate requirements based on factors such as processing overhead and column size for key and output columns.

By default, the PowerCenter Server allocates 1,000,000 bytes to the index cache and 2,000,000 bytes to the data cache for each transformation instance. If the PowerCenter Server cannot allocate the configured amount of cache memory, it cannot initialize the session and the session fails.

If a server grid has 32-bit and 64-bit servers, and if a session exceeds 2 GB of memory, the master server assigns it to a 64-bit server. For information on server grids, see “Working with Server Grids” on page 446.


When you specify large cache sizes in transformations on 64-bit machines, the PowerCenter Server might run out of physical memory and run more slowly. If the cache size forces the PowerCenter Server to swap virtual memory and to spill to disk, performance decreases.

Note: A PowerCenter Server running on a 32-bit machine cannot run a session if the total size of all the configured session caches is more than 2 GB.
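The 32-bit limit can be checked before you run a session by adding up the configured caches. The following is a minimal sketch, not a PowerCenter utility: the function names are hypothetical, and it assumes that 2 GB means 2 * 1024**3 bytes, which this guide does not state.

```python
# Assumption: "2 GB" is interpreted as 2 * 1024**3 bytes.
TWO_GB = 2 * 1024**3

# Default cache sizes per transformation instance, as stated in this chapter.
DEFAULT_INDEX_CACHE = 1_000_000
DEFAULT_DATA_CACHE = 2_000_000


def total_session_cache(caches):
    """Sum (index_bytes, data_bytes) pairs for every cached transformation."""
    return sum(index + data for index, data in caches)


def fits_32bit_server(caches):
    """Return True if the total configured cache size does not exceed 2 GB."""
    return total_session_cache(caches) <= TWO_GB


# Hypothetical session: three cached transformations left at the defaults.
session = [(DEFAULT_INDEX_CACHE, DEFAULT_DATA_CACHE)] * 3
print(total_session_cache(session))  # 9000000
print(fits_32bit_server(session))    # True
```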

Cache Files

If the PowerCenter Server requires more memory than the configured cache size, it stores overflow values in the cache files. Since paging to disk can slow session performance, try to configure the index and data cache sizes to store data in memory.

The PowerCenter Server creates the index and data cache files by default in the PowerCenter Server variable directory, $PMCacheDir. If you do not define $PMCacheDir, the PowerCenter Server saves the files in the PMCache directory specified in the UNIX configuration file or the cache directory in the Windows registry. If the UNIX PowerCenter Server does not find a directory there, it creates the index and data files in the installation directory. If the PowerCenter Server on Windows does not find a directory there, it creates the files in the system directory.

If a cache file handles more than 2 GB of data, the PowerCenter Server creates multiple index and data files. When creating these files, the PowerCenter Server appends a number to the end of the filename, such as PMAGG*.idx1 and PMAGG*.idx2. The number of index and data files is limited only by the amount of disk space available in the cache directory.

When you run a session, the PowerCenter Server writes a message in the session log indicating the cache file name and the transformation name. When a session completes, the PowerCenter Server typically deletes index and data cache files. However, you may find index and data files in the cache directory under the following circumstances:

♦ The session performs incremental aggregation.

♦ You configure the Lookup transformation to use a persistent cache.

♦ The session does not complete successfully.

The PowerCenter Server uses the following naming convention when it creates cache files:

[<Name Prefix> | <Prefix> <session ID>_<transformation ID>]_[partition index]<suffix>.[overflow index]


Table 24-2 describes the naming convention for cache files that the PowerCenter Server creates:

Table 24-2. Cache File Names

File Name Component | Description
Name Prefix | Cache file name prefix configured in the Lookup transformation.
Prefix | Describes the type of transformation: - Aggregator transformation is PMAGG. - Joiner transformation is PMJNR. - Lookup transformation is PMLKUP. - Rank transformation is PMAGG.
Session ID | Session instance ID number.
Transformation ID | Transformation instance ID number.
Partition Index | If the session contains more than one partition, this identifies the partition number. The partition index is zero-based, so the first partition has no partition index. Partition index 2 indicates a cache file created in the third partition.
Suffix | Identifies the type of file: - Index file is .idx. - Data file is .dat.
Overflow Index | If a cache file handles more than 2 GB of data, the PowerCenter Server creates multiple index and data files. When creating these files, the PowerCenter Server appends an overflow index to the filename, such as PMAGG*.idx.1 and PMAGG*.idx.2. The number of index and data files is limited by the amount of disk space available in the cache directory.

For example, in the file name PMLKUP8_4_2.idx, PMLKUP identifies the transformation type as Lookup, 8 is the session ID, 4 is the transformation ID, and 2 is the partition index.

The cache directory should be local to the PowerCenter Server. You might encounter performance or reliability problems when you cache large quantities of data on a mapped or mounted drive.

For details on tuning the caches, see “Performance Tuning” on page 635.
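When you inspect a cache directory, it can help to split file names into the components listed in Table 24-2. The following sketch is an illustration only. The function name and regular expression are hypothetical, it handles only the default PMAGG, PMJNR, and PMLKUP prefixes (not a name prefix configured in a Lookup transformation), and it assumes the overflow index may appear with or without a leading dot.

```python
import re

# Default prefixes from Table 24-2. Rank caches share the PMAGG prefix.
PREFIXES = {
    "PMAGG": "Aggregator or Rank",
    "PMJNR": "Joiner",
    "PMLKUP": "Lookup",
}

# Assumption: the overflow index is optional and may or may not follow a dot.
CACHE_FILE = re.compile(
    r"^(?P<prefix>PMAGG|PMJNR|PMLKUP)"
    r"(?P<session_id>\d+)_(?P<transformation_id>\d+)"
    r"(?:_(?P<partition_index>\d+))?"
    r"\.(?P<suffix>idx|dat)"
    r"(?:\.?(?P<overflow_index>\d+))?$"
)


def parse_cache_file_name(name):
    """Split a default-prefix cache file name into its components."""
    match = CACHE_FILE.match(name)
    if match is None:
        return None  # not a default-prefix cache file name
    parts = match.groupdict()
    parts["transformation_type"] = PREFIXES[parts["prefix"]]
    parts["file_type"] = "index" if parts["suffix"] == "idx" else "data"
    return parts


info = parse_cache_file_name("PMLKUP8_4_2.idx")
print(info["transformation_type"], info["session_id"],
      info["transformation_id"], info["partition_index"])
# Lookup 8 4 2
```

A missing partition index is returned as None, which matches the rule that the first partition has no partition index.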


Determining Cache Requirements

When you configure a mapping that uses an Aggregator, Rank, Joiner, or Lookup transformation, you configure memory cache on the Properties tab of the transformation. You can override these memory requirements in the session properties. To calculate the index and data cache, you need to consider column and row requirements as well as processing overhead.

The PowerCenter Server requires processing overhead to cache data and index information. Column overhead includes a null indicator, and row overhead can include row ID and key information.

Use the following steps to calculate and configure the cache size required to run a mapping:

1. Add the size requirements for the columns in the cache.

2. Add row or group processing overhead.

3. Multiply by the number of groups or rows.

4. Configure the index and data cache in the transformation properties. You configure cache sizes for each transformation on the Properties tab in the mapping.

The amount of memory you configure depends on the partition properties and how much memory cache and disk cache you want to use. If you use cache partitioning, the PowerCenter Server requires only a portion of total cache memory for each partition. For information on cache partitioning, see “Cache Partitioning” on page 620.

Cache Calculations

To determine cache requirements for a session, first add the total column size in the cache to the row overhead. Multiply the result by the number of groups or rows in the cache. This gives the minimum caching requirements. To determine the maximum requirements for the index cache, you multiply the minimum requirements by two.

The following tables provide the calculations for the minimum cache requirements for each transformation:

Table 24-3. Aggregate Cache Calculation

Cache | Calculation | Columns in Cache
Index | # groups * [(Σ column size) + 17] | Group by columns.
Data | # groups * [(Σ column size) + 7] | - Non group by input ports used in non-aggregate output expression. - Non group by input/output ports. - Local variable ports. - Column containing aggregate function (multiply by three).*

* Each aggregate function has different cache space requirements. As a general rule, you can multiply the column containing the aggregate function by three.

Table 24-4. Rank Cache Calculation

Cache | Calculation | Columns in Cache
Index | # groups * [(Σ column size) + 17] | Group by columns.
Data | # groups * [(# ranks * ((Σ column size) + 10)) + 20] | - Non group by input ports used in non-aggregate output expression. - Non group by input/output ports. - Local variable ports. - Rank ports.

Table 24-5. Joiner Cache Calculation

Cache | Calculation | Columns in Cache
Index | # master rows * [(Σ column size) + 16] | Master column in join conditions.
Data | # master rows * [(Σ column size) + 8] | Master column not in join condition and used for output.

Table 24-6. Lookup Cache Calculation

Cache | Calculation | Columns in Cache
Index (minimum) | 200 * [(Σ column size) + 16] | Columns in lookup condition.
Index (maximum) | # rows in lookup table * [(Σ column size) + 16] * 2 | Columns in lookup condition.
Data | # rows in lookup table * [(Σ column size) + 8] | Connected output ports not in the lookup condition. Return port (for unconnected Lookup transformations).

For more information about each cache, see the separate sections in this chapter.

Cache Column Sizes

When you calculate the column size for each cache, include the size of the data and additional processing requirements.

Table 24-7 gives the column sizes for index and data cache calculations:

Table 24-7. Column Sizes for Cache Calculations

Datatype | Aggregator, Rank | Joiner, Lookup
Binary | precision + 2 | precision + 8, rounded to the nearest multiple of 8
Date/Time | 18 | 24
Decimal, high precision off (all precision) | 10 | 16
Decimal, high precision on (precision <=18) | 18 | 24
Decimal, high precision on (precision >18, <=28) | 22 | 32
Decimal, high precision on (precision >28) | 10 | 16
Decimal, high precision on (negative scale) | 10 | 16
Double | 10 | 16
Real | 10 | 16
Integer | 6 | 16
Small integer | 6 | 16
NString, NText, String, Text | Unicode mode: 2*(precision + 2); ASCII mode: precision + 3 | Unicode mode: 2*(precision + 5); ASCII mode: precision + 9

The column sizes include the bytes required for a null indicator.

Additionally, to increase lookup and join performance, the PowerCenter Server aligns all data for lookup and joiner caches on an eight-byte boundary. So, each Lookup and Joiner column size includes rounding to the nearest multiple of eight.
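The column sizes in Table 24-7 can be expressed as a small lookup function, which is convenient when a mapping has many ports. This is a sketch under stated assumptions, not part of PowerCenter: the function and parameter names are hypothetical, and "round to the nearest multiple of 8" is assumed to mean rounding up, since the rounding exists to align data on an eight-byte boundary.

```python
def round_up_to_8(size):
    """Round a byte count up to the next multiple of eight (assumed direction)."""
    return -(-size // 8) * 8


def column_size(datatype, precision=0, *, joiner_or_lookup=False,
                high_precision=False, negative_scale=False,
                unicode_mode=False):
    """Return the cache column size in bytes according to Table 24-7."""
    kind = datatype.lower()
    if kind == "binary":
        if joiner_or_lookup:
            return round_up_to_8(precision + 8)
        return precision + 2
    if kind == "date/time":
        return 24 if joiner_or_lookup else 18
    if kind in ("double", "real"):
        return 16 if joiner_or_lookup else 10
    if kind in ("integer", "small integer"):
        return 16 if joiner_or_lookup else 6
    if kind == "decimal":
        if high_precision and not negative_scale and precision <= 18:
            return 24 if joiner_or_lookup else 18
        if high_precision and not negative_scale and precision <= 28:
            return 32 if joiner_or_lookup else 22
        # High precision off, precision > 28, or negative scale.
        return 16 if joiner_or_lookup else 10
    if kind in ("nstring", "ntext", "string", "text"):
        if joiner_or_lookup:
            raw = 2 * (precision + 5) if unicode_mode else precision + 9
            return round_up_to_8(raw)
        return 2 * (precision + 2) if unicode_mode else precision + 3
    raise ValueError(f"unknown datatype: {datatype}")


print(column_size("Integer"))                             # 6
print(column_size("String", 15))                          # 18
print(column_size("String", 23, joiner_or_lookup=True))   # 32
print(column_size("Decimal", 10, joiner_or_lookup=True))  # 16
```

The printed values match the sizes used for Integer, String (15), String (23), and Decimal (10) ports in the worked examples later in this chapter.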


Cache Partitioning

When you create a session with multiple partitions, the PowerCenter Server can partition caches for the Aggregator, Joiner, Lookup, and Rank transformations. It creates a separate cache for each partition, and each partition works with only the rows needed by that partition. As a result, the PowerCenter Server requires only a portion of total cache memory for each partition. When you run a session, the PowerCenter Server accesses the cache in parallel for each partition. If you do not use cache partitioning, the PowerCenter Server accesses the cache serially for each partition.

After you configure the session for partitioning, you can configure memory requirements and cache directories for each transformation in the Transformations view on the Mapping tab of the session properties. To configure the memory requirements, calculate the total requirements for a transformation, and divide by the number of partitions. To further improve performance, you can configure separate directories for each partition.

The guidelines for cache partitioning are different for each cached transformation:

♦ Aggregator transformation. The PowerCenter Server uses cache partitioning for any multi-partitioned session with an Aggregator transformation. You do not have to set a partition point at the Aggregator transformation. For more caching information, see “Aggregator Caches” on page 621.

♦ Joiner transformation. The PowerCenter Server uses cache partitioning when you create a partition point at the Joiner transformation. For more caching information, see “Joiner Caches” on page 624.

♦ Lookup transformation. The PowerCenter Server uses cache partitioning when you create a hash auto-keys partition point at the Lookup transformation. For more caching information, see “Lookup Caches” on page 628.

♦ Rank transformation. The PowerCenter Server uses cache partitioning for any multi-partitioned session with a Rank transformation. You do not have to set a partition point at the Rank transformation. For more caching information, see “Rank Caches” on page 632.

For more partitioning information, see “Pipeline Partitioning” on page 345.


Aggregator Caches

When the PowerCenter Server runs a session with an Aggregator transformation, it stores data in memory until it completes the aggregation. The PowerCenter Server uses cache partitioning when you create multiple partitions in a pipeline that contains an Aggregator transformation. It creates one memory cache and one disk cache for each partition and routes data from one partition to another based on group key values of the transformation.

After you configure the partitions in the session, you can configure the memory requirements and cache directories for the Aggregator transformation on the Mapping tab in the session properties. Allocate enough disk space to hold one row in each aggregate group.

If you use incremental aggregation, the PowerCenter Server saves the cache files in the cache file directory. For information about caching with incremental aggregation, see “Partitioning Guidelines with Incremental Aggregation” on page 578.

Note: The PowerCenter Server uses memory to process an Aggregator transformation with sorted ports. It does not use cache memory. You do not need to configure cache memory for Aggregator transformations that use sorted ports.

For more information about the Aggregator transformation, see “Aggregator Transformation” in the Transformation Guide.

Calculating the Aggregator Index Cache

The index cache holds group information from the group by ports. Use the following information to calculate the minimum aggregate index cache size:

Aggregate Index Cache Calculation | Columns in Cache
# groups * [(Σ column size) + 17] | Group by columns.


For example, the following Aggregator transformation, AGG_SalesPerRegionItem, groups by STORE_ID and ITEM.

Use the column sizes in Table 24-7 on page 618 to add the group by columns:

Column Name | Column Type | Datatype | Size
STORE_ID | Group by | Integer | 6
ITEM | Group by | String (15) | 18
TOTAL COLUMN SIZE = 24

You know that there are 36 stores and 2,000 items, so the total number of groups is 72,000. Use the following calculation to determine the minimum index cache requirements:

72,000 * (24 + 17) = 2,952,000

Double the size to determine the maximum index cache requirements:

2,952,000 * 2 = 5,904,000

Therefore, this Aggregator transformation requires an index cache size between 2,952,000 and 5,904,000 bytes.

Calculating the Aggregator Data Cache

The data cache holds row data for variable ports and connected output ports. As a result, the data cache is generally larger than the index cache. To reduce the data cache size, connect only the necessary input/output ports to subsequent transformations. Use the following information to calculate the minimum aggregate data cache size:

Aggregate Data Cache Calculation | Columns in Cache
# groups * [(Σ column size) + 7] | - Non group by input ports used in non-aggregate output expression. - Non group by input/output ports. - Local variable ports. - Port containing aggregate function (multiply by three).*

*The cache space requirements for aggregate functions are different for each function. However, you can multiply the port containing the aggregate function by three for all aggregate functions.

The following figure shows the connected output ports of AGG_SalesPerRegionItem:

Use the column sizes in Table 24-7 on page 618 to add the columns in the data cache:

Column Name | Column Type | Datatype | Size
ORDER_ID | Non group by input/output | Integer | 6
SALES_PER_STORE_ITEMS | Port containing aggregate function | Decimal (12, 2) | 30*
TOTAL COLUMN SIZE = 36

*Remember to multiply the port containing the aggregate function by three. For more information, see Table 24-3 on page 617.

Note that you do not use STORE_ID and ITEM in the data cache calculation. These columns are connected to the target, but you do not use them in the cache calculation because they are group by ports and are used in the index cache calculation.

The total number of groups as calculated for the index cache size is 72,000. Use the following calculation to determine the minimum data cache requirements:

72,000 * (36 + 7) = 3,096,000

Therefore, this Aggregator transformation requires a data cache size of 3,096,000 bytes.
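The AGG_SalesPerRegionItem figures can be reproduced with a few lines of arithmetic. The sketch below uses hypothetical function names and takes the column sizes from the example itself. It applies the rule of thumb that a port containing an aggregate function counts three times.

```python
def aggregator_index_cache(groups, group_by_sizes):
    """Minimum index cache: # groups * (sum of group by column sizes + 17)."""
    return groups * (sum(group_by_sizes) + 17)


def aggregator_data_cache(groups, column_sizes, aggregate_sizes):
    """Minimum data cache: # groups * (sum of column sizes + 7).

    Ports that contain an aggregate function are counted three times.
    """
    total = sum(column_sizes) + 3 * sum(aggregate_sizes)
    return groups * (total + 7)


groups = 36 * 2_000                                   # stores * items
index_min = aggregator_index_cache(groups, [6, 18])   # STORE_ID, ITEM
index_max = 2 * index_min                             # maximum is double
data_min = aggregator_data_cache(groups, [6], [10])   # ORDER_ID, aggregate port

print(index_min, index_max, data_min)  # 2952000 5904000 3096000
```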


Joiner Caches

When the PowerCenter Server runs a session with a Joiner transformation, it reads rows from the master and detail sources concurrently and builds index and data caches based on the master rows. The PowerCenter Server then performs the join based on the detail source data and the cache data.

The number of rows the PowerCenter Server stores in the cache depends on the partitioning scheme, the data in the master source, and whether or not you use sorted input. For more information on how many rows the PowerCenter Server stores, see “Calculating the Number of Master Rows” on page 625.

When you create multiple partitions in a session, the PowerCenter Server processes the Joiner transformation differently when you use n:n partitioning and when you use 1:n partitioning.

♦ Processing master and detail data for outer joins. When you run a multi-partitioned session with a partitioned Joiner transformation, the PowerCenter Server builds one cache per partition. In a single-partitioned master pipeline (1:n), the PowerCenter Server outputs unmatched master rows after it processes all detail partitions. In a multi-partitioned master pipeline (n:n), the PowerCenter Server outputs unmatched master rows after it processes the partition for each detail cache.

♦ Configuring memory requirements. When you run a session with a Joiner transformation, the PowerCenter Server uses n times the memory you specify on the Transformations view of the Mapping tab, where n is the number of partitions. The PowerCenter Server might page to disk if you do not specify enough memory.

When you use 1:n partitioning, each partition requires as much memory as a 1:1 partition session. When you configure the cache for the Joiner transformation, enter the total transformation memory requirements for a single partition.

When you use n:n partitioning, each partition requires only a portion of the memory required by a 1:1 partition session. When you configure the cache, divide the memory requirements for a 1:1 partition session by the number of partitions. Enter that amount for the cache requirements.

For example, you calculate the cache requirements for a Joiner transformation instance and determine that the transformation requires 2,000,000 bytes of memory for the index cache and 4,000,000 bytes of memory for the data cache. You create four partitions for the pipeline. If you use 1:n partitioning, enter 2,000,000 bytes for the index cache and 4,000,000 bytes for the data cache. If you use n:n partitioning, enter 500,000 bytes for the index cache and 1,000,000 bytes for the data cache.
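The rule for 1:n and n:n partitioning reduces to a single division. The following sketch uses a hypothetical helper name. It rounds up when the total does not divide evenly, which is an assumption made here so that the per-partition value never falls short.

```python
def joiner_cache_setting(total_bytes, partitions, partitioning):
    """Return the cache size to enter for a partitioned Joiner transformation.

    partitioning is "1:n" (single-partition master) or "n:n".
    """
    if partitioning == "1:n":
        return total_bytes                    # each partition needs the full amount
    if partitioning == "n:n":
        return -(-total_bytes // partitions)  # divide, rounding up
    raise ValueError("partitioning must be '1:n' or 'n:n'")


for scheme in ("1:n", "n:n"):
    index = joiner_cache_setting(2_000_000, 4, scheme)
    data = joiner_cache_setting(4_000_000, 4, scheme)
    print(scheme, index, data)
# 1:n 2000000 4000000
# n:n 500000 1000000
```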

To increase join performance, the PowerCenter Server aligns all data for joiner caches on an eight-byte boundary.

Note: To use n:n partitioning with a Joiner transformation, you must create a partition point at the Joiner transformation. This allows you to create multiple partitions for both the master and detail source of a Joiner transformation.

For more information about the Joiner transformation, see “Joiner Transformation” in the Transformation Guide.


Calculating the Number of Master Rows

The number of rows the PowerCenter Server stores in the cache depends on the partitioning scheme, the data in the master source, and whether or not you use sorted input.

The PowerCenter Server caches all master rows with a unique key in the index cache, and all master rows in the data cache under any of the following circumstances:

♦ You do not use sorted input.

♦ You use sorted input and 1:n partitioning.

However, when you use sorted input and you use n:n partitioning, the PowerCenter Server caches a different number of rows in the index and data cache:

♦ Index cache. The PowerCenter Server caches 100 master rows with unique keys.

♦ Data cache. The PowerCenter Server caches the master rows in the data cache that correspond to the 100 rows in the index cache. The number of rows it stores in the data cache depends on the data. For example, if every master row contains a unique key, the PowerCenter Server stores 100 rows in the data cache. However, if the master data contains multiple rows with the same key, the PowerCenter Server stores more than 100 rows in the data cache.

Calculating the Joiner Index Cache

The index cache holds rows from the master source that are in the join condition. Use the following information to calculate the minimum joiner index cache size:

Joiner Index Cache Calculation | Columns in Cache
# master rows * [(Σ column size) + 16] | Master column in join condition.


For example, the Joiner transformation JNR_ORDERS_PRODUCTS does not use sorted input, and it joins the sources ORDERS and PRODUCTS on ITEM_NO.

Use the column sizes in Table 24-7 on page 618 to add the columns in the index cache:

Column Name | Column Type | Datatype | Size
ITEM_NO | Master column in join condition | Decimal (10) | 16
TOTAL COLUMN SIZE = 16

PRODUCTS is the master source and has 90,000 rows. Use the following calculation to determine the minimum index cache requirements:

90,000 * (16 + 16) = 2,880,000

Double the size to determine the maximum index cache requirements:

2,880,000 * 2 = 5,760,000

Therefore, this Joiner transformation requires an index cache size between 2,880,000 and 5,760,000 bytes.

Calculating the Joiner Data Cache

The data cache holds rows from the master source until the PowerCenter Server joins the data. Use the following information to calculate the minimum joiner data cache size:

Joiner Data Cache Calculation | Columns in Cache
# master rows * [(Σ column size) + 8] | Master column not in join condition and used for output.


The following figure shows the connected output ports for JNR_ORDERS_PRODUCTS:

Use the column sizes in Table 24-7 on page 618 to add the columns for the data cache:

Column Name | Column Type | Datatype | Size
ITEM_NAME | Master column not in join condition | String (23) | 32
PRODUCT CATEGORY | Master column not in join condition | Decimal (21) | 30
TOTAL COLUMN SIZE = 62

Note that you do not use ITEM_NO in the data cache calculation because it is part of the join condition and is used in the index cache.

The master source has 90,000 rows.

Use the following calculation to determine the minimum data cache requirements:

90,000 * (62 + 8) = 6,300,000

This Joiner transformation requires a data cache size of 6,300,000 bytes.
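The JNR_ORDERS_PRODUCTS calculation follows the same pattern, with the master row count in place of the group count. In this sketch the function names are hypothetical and the column sizes are copied from the example as given, not derived independently.

```python
def joiner_index_cache(master_rows, join_column_sizes):
    """Minimum index cache: # master rows * (sum of join column sizes + 16)."""
    return master_rows * (sum(join_column_sizes) + 16)


def joiner_data_cache(master_rows, output_column_sizes):
    """Minimum data cache: # master rows * (sum of output column sizes + 8)."""
    return master_rows * (sum(output_column_sizes) + 8)


master_rows = 90_000                                   # rows in PRODUCTS
index_min = joiner_index_cache(master_rows, [16])      # ITEM_NO
index_max = 2 * index_min                              # maximum is double
data_min = joiner_data_cache(master_rows, [32, 30])    # sizes from the example

print(index_min, index_max, data_min)  # 2880000 5760000 6300000
```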



Lookup Caches

The PowerCenter Server builds a lookup cache in memory when it processes the first row of data in the transformation. It queries the cache for each row that enters the transformation.

Configure the index and data cache memory for each Lookup transformation. The PowerCenter Server caches data differently for static and dynamic caches and also for sessions that use cache partitioning.

When you run the session, the PowerCenter Server rebuilds a persistent cache if any cache file is missing or invalid.

For more information about configuring the lookup cache and how the PowerCenter Server processes lookup requests, see “Lookup Caches” in the Transformation Guide.

Static Cache

When you use a static lookup cache, the PowerCenter Server creates one memory cache for each partition.

If you use cache partitioning, the PowerCenter Server requires only a portion of the total memory to cache each partition. So, when you configure cache size, you can divide the total memory requirements by the number of partitions.

If you do not use cache partitioning, the PowerCenter Server requires as much memory for each partition as it does for a single partition pipeline. So, when you configure cache size, you enter the total memory requirements for the transformation.

If two Lookup transformations in a mapping share the cache, the PowerCenter Server does not allocate additional memory for shared transformations in the same pipeline stage. For shared transformations in a different pipeline stage, the PowerCenter Server does allocate additional memory.

Static Lookup transformations that use the same data or a subset of data to create a disk cache can share the disk cache. However, the lookup keys may be different, so the transformations must have separate memory caches.

For more information about caching the Lookup transformation, see “Lookup Caches” in the Transformation Guide.

Dynamic Cache

When you use a dynamic lookup cache, the PowerCenter Server creates the memory cache based on whether you use cache partitioning or not.

If you use cache partitioning, the PowerCenter Server creates one memory cache for each partition. It requires only a portion of the total memory to cache each partition. So, when you configure cache size, you can divide the total memory requirements by the number of partitions.


If you do not use cache partitioning, the PowerCenter Server creates one memory cache and one disk cache for each transformation. All partitions share the memory and disk cache. When you configure the cache size, enter the total memory requirements in the transformation or on the Mapping tab in the session properties.

When Lookup transformations share a dynamic cache, the PowerCenter Server updates the memory cache and disk cache. To keep the caches synchronized, the PowerCenter Server must share the disk cache and the corresponding memory cache between the transformations.

Sharing Partitioned Caches

Use the following guidelines when you share partitioned Lookup caches:

♦ Lookup transformations can share a partitioned cache if the transformations meet the following conditions:

− The cache structures are identical. The lookup/output ports for the first shared transformation must match the lookup/output ports for the subsequent transformations.

− The transformations have the same lookup conditions, and the lookup condition columns are in the same order.

♦ You cannot share a partitioned cache with a non-partitioned cache.

♦ When you share Lookup caches across target load order groups, you must configure the target load order groups with the same number of partitions.

Note: If the PowerCenter Server detects a mismatch between Lookup transformations sharing an unnamed cache, it rebuilds the cache files. If the PowerCenter Server detects a mismatch between Lookup transformations sharing a named cache, it fails the session.

Calculating the Lookup Index Cache

The lookup index cache holds data for the columns used in the lookup condition. The formula for the minimum lookup index cache size differs from the formula for the maximum size.

For best session performance, specify the maximum lookup index cache size. If you specify a lookup index cache less than the minimum cache size, the PowerCenter Server fails the session.

Calculating the Minimum Lookup Index Cache

The minimum size for a lookup index cache is independent of the number of source rows. Use the following information to calculate the minimum lookup index cache for both connected and unconnected Lookup transformations:

Lookup Index Cache Calculation | Columns in Cache
200 * [(Σ column size) + 16] | Columns in lookup condition.


Calculating the Maximum Lookup Index Cache

Use the following information to calculate the maximum lookup index cache for both connected and unconnected Lookup transformations:

Lookup Index Cache Calculation | Columns in Cache
# rows in lookup table * [(Σ column size) + 16] * 2 | Columns in lookup condition.

Example

The Lookup transformation, LKP_PROMOS, looks up values based on the ITEM_ID. It uses the following lookup condition:

ITEM_ID = IN_ITEM_ID1

Use the column sizes in Table 24-7 on page 618 to add the columns for the index cache:

Column Name | Column Type | Datatype | Size
ITEM_ID | Column in lookup condition | Integer | 16
TOTAL COLUMN SIZE = 16

The lookup condition uses one column, ITEM_ID, and the table contains 60,000 rows.

Use the following calculation to determine the minimum index cache requirements:

200 * (16 + 16) = 6,400

Use the following calculation to determine the maximum index cache requirements:

60,000 * (16 + 16) * 2 = 3,840,000



Therefore, this Lookup transformation requires an index cache size between 6,400 and 3,840,000 bytes.

Calculating the Lookup Data Cache

In a connected transformation, the data cache contains data for the connected output ports, not including ports used in the lookup condition. In an unconnected transformation, the data cache contains data from the return port.

Use the following information to calculate the minimum data cache requirements for both connected and unconnected Lookup transformations:

Lookup Data Cache Calculation | Columns in Cache
# rows in lookup table * [(Σ column size) + 8] | Connected output ports not in the lookup condition. Use return ports for unconnected transformations.

The following figure shows the connected output ports for LKP_PROMOS:

Use the column sizes in Table 24-7 on page 618 to add the columns for the data cache:

Column Name | Column Type | Datatype | Size
PROMOTION_ID | Connected output port not in lookup condition | Integer | 16
DISCOUNT | Connected output port not in lookup condition | Decimal (10) | 16
TOTAL COLUMN SIZE = 32

The lookup table has 60,000 rows.

Use the following calculation to determine the minimum data cache requirements:

60,000 * (32 + 8) = 2,400,000

This Lookup transformation requires a data cache size of 2,400,000 bytes.
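The lookup index cache has separate minimum and maximum formulas, so the LKP_PROMOS numbers are a useful check. The sketch below uses hypothetical function names and the column sizes from the example.

```python
def lookup_index_cache_min(condition_column_sizes):
    """Minimum index cache: 200 * (sum of condition column sizes + 16)."""
    return 200 * (sum(condition_column_sizes) + 16)


def lookup_index_cache_max(lookup_rows, condition_column_sizes):
    """Maximum index cache: # rows * (sum of condition column sizes + 16) * 2."""
    return lookup_rows * (sum(condition_column_sizes) + 16) * 2


def lookup_data_cache(lookup_rows, output_column_sizes):
    """Minimum data cache: # rows * (sum of output column sizes + 8)."""
    return lookup_rows * (sum(output_column_sizes) + 8)


rows = 60_000                                # rows in the lookup table
print(lookup_index_cache_min([16]))          # 6400     (ITEM_ID)
print(lookup_index_cache_max(rows, [16]))    # 3840000
print(lookup_data_cache(rows, [16, 16]))     # 2400000  (PROMOTION_ID, DISCOUNT)
```

Note that the minimum index size does not depend on the row count, which is why it stays small even for large lookup tables.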



Rank Caches

When the PowerCenter Server runs a session with a Rank transformation, it compares an input row with rows in the data cache. If the input row out-ranks a stored row, the PowerCenter Server replaces the stored row with the input row.

For example, you configure a Rank transformation to find the top three sales. The PowerCenter Server reads the following input data:

SALES

10,000

12,210

5,000

2,455

6,324

The PowerCenter Server caches the first three rows (10,000, 12,210, and 5,000). When the PowerCenter Server reads the next row (2,455) it compares it to the cache values. Since the row is lower in rank than the cached rows, it discards the row with 2,455. The next row (6,324), however, is higher in rank than one of the cached rows. Therefore, the PowerCenter Server replaces the cached row with the higher-ranked input row.
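The replace-or-discard behavior in this example can be simulated in a few lines. The sketch below uses a min-heap from the Python standard library purely as an illustration. It is not a description of how the PowerCenter Server implements the rank cache, and the function name is hypothetical.

```python
import heapq


def top_ranked(values, ranks):
    """Keep only the `ranks` highest values seen so far, like a rank cache."""
    cache = []  # min-heap: cache[0] is always the lowest-ranked cached row
    for value in values:
        if len(cache) < ranks:
            heapq.heappush(cache, value)      # cache is not full yet
        elif value > cache[0]:
            heapq.heapreplace(cache, value)   # replace the lowest cached row
        # otherwise the input row is lower in rank and is discarded
    return sorted(cache, reverse=True)


sales = [10_000, 12_210, 5_000, 2_455, 6_324]
print(top_ranked(sales, 3))  # [12210, 10000, 6324]
```

With the input above, 2,455 is discarded and 6,324 replaces 5,000, which is the lowest of the three cached rows.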

If the Rank transformation is configured to rank across multiple groups, the PowerCenter Server ranks incrementally for each group it finds.

The PowerCenter Server uses cache partitioning when you create multiple partitions in a pipeline that contains a Rank transformation. It creates one memory cache and one disk cache per partition and routes data from one partition to another based on group key values of the transformation.

After you configure the partitions in the session, you can configure the memory requirements and cache directories for the Rank transformation on the Mapping tab in the session properties.

For more information about the Rank transformation, see “Rank Transformation” in the Transformation Guide.

Calculating the Rank Index Cache

The index cache holds group information from the group by ports. Use the following information to calculate the minimum rank index cache size:

Rank Index Cache Calculation | Columns in Cache
# groups * [(Σ column size) + 17] | Group by columns.


For example, the Rank transformation, RNK_TOPTEN, groups by product category.

Use the column sizes in Table 24-7 on page 618 to add the columns in the index cache:

Column Name | Column Type | Datatype | Size
PRODUCT_CATEGORY | Group by | String (21) | 24
TOTAL COLUMN SIZE = 24

There are 10,000 product categories, so the total number of groups is 10,000. Use the following calculation to determine the minimum index cache requirements:

10,000 * (24 + 17) = 410,000

Double the size to determine the maximum index cache requirements:

410,000 * 2 = 820,000

Therefore, this Rank transformation requires an index cache size between 410,000 and 820,000 bytes.

Calculating the Rank Data Cache

The data cache size is proportional to the number of ranks. It holds row data until the PowerCenter Server completes the ranking and is generally larger than the index cache. To reduce the data cache size, connect only the necessary input/output ports to subsequent transformations. Use the following information to calculate the minimum rank data cache size:

Rank Data Cache Calculation: # groups [(# ranks * (Σ column size + 10)) + 20]
Columns in Cache:
- Non group by input ports used in non-aggregate output expression.
- Non group by input/output ports.
- Local variable ports.
- Rank ports.

The following figure shows the connected output ports of RNK_TOPTEN:

Use the column sizes in Table 24-7 on page 618 to add the columns in the data cache:

ITEM_NO     Non group by input/output port   Decimal (10)   10
ITEM_NAME   Non group by input/output port   String (23)    26
PRICE       Rank port                        Decimal (14)   10

RNK_TOPTEN ranks by price, and the total number of ranks is 10. The number of groups is 10,000, and the column sizes total 46 (10 + 26 + 10).

10,000[(10 * (46 + 10)) + 20] = 5,800,000

This Rank transformation requires a data cache size of 5,800,000 bytes.
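The RNK_TOPTEN index and data cache calculations can be reproduced with a short script. This is only a sketch of the two formulas given in this section. The function names are illustrative and are not part of PowerCenter.

```python
def rank_index_cache_bytes(num_groups, group_by_column_sizes):
    """Minimum index cache: # groups * (sum of group by column sizes + 17)."""
    return num_groups * (sum(group_by_column_sizes) + 17)


def rank_data_cache_bytes(num_groups, num_ranks, data_column_sizes):
    """Data cache: # groups * ((# ranks * (sum of column sizes + 10)) + 20)."""
    return num_groups * (num_ranks * (sum(data_column_sizes) + 10) + 20)


# RNK_TOPTEN: 10,000 groups, 10 ranks, column sizes taken from the example.
index_min = rank_index_cache_bytes(10_000, [24])             # PRODUCT_CATEGORY
index_max = index_min * 2                                     # the example doubles the minimum
data_size = rank_data_cache_bytes(10_000, 10, [10, 26, 10])  # ITEM_NO, ITEM_NAME, PRICE

print(index_min, index_max, data_size)  # prints 410000 820000 5800000
```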



Chapter 25

Performance Tuning


♦ Overview

♦ Identifying the Performance Bottleneck

♦ Optimizing the Target Database

♦ Optimizing the Source Database

♦ Optimizing the Mapping

♦ Optimizing the Session

♦ Optimizing the System

♦ Pipeline Partitioning


Overview

The goal of performance tuning is to optimize session performance by eliminating performance bottlenecks. To tune the performance of a session, first you identify a performance bottleneck, eliminate it, and then identify the next performance bottleneck until you are satisfied with the session performance. You can use the test load option to run sessions when you tune session performance.

The most common performance bottleneck occurs when the PowerCenter Server writes to a target database. You can identify performance bottlenecks by the following methods:

♦ Running test sessions. You can configure a test session to read from a flat file source or to write to a flat file target to identify source and target bottlenecks.

♦ Studying performance details. You can create a set of information called performance details to identify session bottlenecks. Performance details provide information such as buffer input and output efficiency. For more information about performance details, see “Creating and Viewing Performance Details” on page 436.

♦ Monitoring system performance. You can use system monitoring tools to view percent CPU usage, I/O waits, and paging to identify system bottlenecks.

Once you determine the location of a performance bottleneck, you can eliminate the bottleneck by following these guidelines:

♦ Eliminate source and target database bottlenecks. Have the database administrator optimize database performance by optimizing the query, increasing the database network packet size, or configuring index and key constraints.

♦ Eliminate mapping bottlenecks. Fine tune the pipeline logic and transformation settings and options in mappings to eliminate mapping bottlenecks.

♦ Eliminate session bottlenecks. You can optimize the session strategy and use performance details to help tune session configuration.

♦ Eliminate system bottlenecks. Have the system administrator analyze information from system monitoring tools and improve CPU and network performance.

If you tune all the bottlenecks above, you can further optimize session performance by increasing the number of pipeline partitions in the session. Adding partitions can improve performance by utilizing more of the system hardware while processing the session.

Because determining the best way to improve performance can be complex, change only one variable at a time, and time the session both before and after the change. If session performance does not improve, you might want to return to your original configurations.


Identifying the Performance Bottleneck

The first step in performance tuning is to identify the performance bottleneck. Performance bottlenecks can occur in the source and target databases, the mapping, the session, and the system. Generally, you should look for performance bottlenecks in the following order:

1. Target

2. Source

3. Mapping

4. Session

5. System

You can identify performance bottlenecks by running test sessions, viewing performance details, and using system monitoring tools.

Identifying Target Bottlenecks

The most common performance bottleneck occurs when the PowerCenter Server writes to a target database. You can identify target bottlenecks by configuring the session to write to a flat file target. If the session performance increases significantly when you write to a flat file, you have a target bottleneck.

If your session already writes to a flat file target, you probably do not have a target bottleneck. You can optimize session performance by writing to a flat file target local to the PowerCenter Server.

Causes for a target bottleneck may include small check point intervals, small database network packet size, or problems during heavy loading operations. For details about eliminating a target bottleneck, see “Optimizing the Target Database” on page 642.

Identifying Source Bottlenecks

Performance bottlenecks can occur when the PowerCenter Server reads from a source database. If your session reads from a flat file source, you probably do not have a source bottleneck. You can improve session performance by setting the number of bytes the PowerCenter Server reads per line if you read from a flat file source.

If the session reads from a relational source, you can use a Filter transformation, a read test mapping, or a database query to identify source bottlenecks.

Using a Filter Transformation

You can use a Filter transformation in the mapping to measure the time it takes to read source data.

Add a Filter transformation in the mapping after each source qualifier. Set the filter condition to false so that no data is processed past the Filter transformation. If the time it takes to run the new session remains about the same, then you have a source bottleneck.

Using a Read Test Session

You can create a read test mapping to identify source bottlenecks. A read test mapping isolates the read query by removing the transformations in the mapping. Use the following steps to create a read test mapping:

1. Make a copy of the original mapping.

2. In the copied mapping, keep only the sources, source qualifiers, and any custom joins or queries.

3. Remove all transformations.

4. Connect the source qualifiers to a file target.

Use the read test mapping in a test session. If the test session performance is similar to the original session, you have a source bottleneck.

Using a Database Query

You can identify source bottlenecks by executing the read query directly against the source database.

Copy the read query directly from the session log. Execute the query against the source database with a query tool such as isql. On Windows, you can direct the result of the query to a file. On UNIX systems, you can direct the result of the query to /dev/null.

Measure the query execution time and the time it takes for the query to return the first row. If there is a long delay between the two time measurements, you can use an optimizer hint to eliminate the source bottleneck.
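The two measurements can be sketched in a few lines of Python. This example uses the standard sqlite3 module and an in-memory table as a stand-in for the real source database. The table, the query, and the time_query helper are illustrative assumptions, not part of PowerCenter or of any database client tool.

```python
import sqlite3
import time


def time_query(connection, sql):
    """Return (seconds to first row, seconds to last row, row count)."""
    cursor = connection.cursor()
    start = time.perf_counter()
    cursor.execute(sql)
    first_row = cursor.fetchone()
    first_row_seconds = time.perf_counter() - start
    row_count = 0 if first_row is None else 1
    # Discard the remaining rows, much like sending the result to /dev/null.
    for _ in cursor:
        row_count += 1
    total_seconds = time.perf_counter() - start
    return first_row_seconds, total_seconds, row_count


# Stand-in source table. In practice, copy the read query from the session log.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (item_no INTEGER, price REAL)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, i * 1.5) for i in range(10_000)])

first, total, rows = time_query(conn, "SELECT item_no, price FROM items ORDER BY price")
print(f"first row after {first:.4f}s, all {rows} rows after {total:.4f}s")
```

With a real source database, substitute the appropriate database driver and connection for the sqlite3 connection.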

Causes for a source bottleneck may include an inefficient query or small database network packet sizes. For details about eliminating source bottlenecks, see “Optimizing the Source Database” on page 645.

Identifying Mapping Bottlenecks

If you determine that you do not have a source or target bottleneck, you might have a mapping bottleneck. You can identify mapping bottlenecks by using a Filter transformation in the mapping.

If you determine that you do not have a source bottleneck, you can add a Filter transformation in the mapping before each target definition. Set the filter condition to false so that no data is loaded into the target tables. If the time it takes to run the new session is the same as the original session, you have a mapping bottleneck.
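The test-session rules for target, source, and mapping bottlenecks can be summarized as a small decision function. This is a sketch only. The guide does not define "significantly" or "about the same", so the 10% tolerance is an assumed cutoff, and the function and parameter names are hypothetical.

```python
def classify_bottleneck(original_seconds, flat_file_target_seconds,
                        source_filter_seconds, target_filter_seconds,
                        tolerance=0.10):
    """Apply the test-session rules in the recommended order.

    tolerance is an assumed cutoff: run times within 10% of the original
    count as "about the same", and a larger drop counts as "significant".
    """
    def about_the_same(test_seconds):
        return abs(test_seconds - original_seconds) <= tolerance * original_seconds

    # 1. Target: the session is significantly faster writing to a flat file.
    if flat_file_target_seconds < original_seconds * (1 - tolerance):
        return "target"
    # 2. Source: a false filter after each source qualifier changes little.
    if about_the_same(source_filter_seconds):
        return "source"
    # 3. Mapping: a false filter before each target changes little.
    if about_the_same(target_filter_seconds):
        return "mapping"
    # Otherwise continue with the session and system checks.
    return "session or system"


print(classify_bottleneck(600, 240, 580, 590))  # prints target
print(classify_bottleneck(600, 590, 200, 590))  # prints mapping
```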


You can also identify mapping bottlenecks by using performance details. High errorrows and rowsinlookupcache counters indicate a mapping bottleneck. For details on eliminating mapping bottlenecks, see “Optimizing the Mapping” on page 647.

High Rowsinlookupcache Counters

Multiple lookups can slow down the session. You might improve session performance by locating the largest lookup tables and tuning those lookup expressions. For details, see “Optimizing Multiple Lookups” on page 650.

High Errorrows Counters

Transformation errors impact session performance. If a session has large numbers in any of the Transformation_errorrows counters, you might improve performance by eliminating the errors. For details, see “Eliminating Transformation Errors” on page 648.

Identifying a Session Bottleneck

If you do not have a source, target, or mapping bottleneck, you may have a session bottleneck. You can identify a session bottleneck by using the performance details. The PowerCenter Server creates performance details when you enable Collect Performance Data in the Performance settings on the Properties tab of the session properties.

Performance details display information about each Source Qualifier, target definition, and individual transformation. All transformations have some basic counters that indicate the number of input rows, output rows, and error rows.

For more information about performance details, see “Creating and Viewing Performance Details” on page 436.

Any value other than zero in the readfromdisk and writetodisk counters for Aggregator, Joiner, or Rank transformations indicates a session bottleneck.

Small cache size, low buffer memory, and small commit intervals can cause session bottlenecks. For details on eliminating session bottlenecks, see “Optimizing the Session” on page 655.

Aggregator, Rank, and Joiner Readfromdisk and Writetodisk Counters

If a session contains Aggregator, Rank, or Joiner transformations, examine each Transformation_readfromdisk and Transformation_writetodisk counter.

If these counters display any number other than zero, you can improve session performance by increasing the index and data cache sizes. The PowerCenter Server uses the index cache to store group information and the data cache to store transformed data, which is typically larger. Therefore, although both the index cache and data cache sizes affect performance, you will most likely need to increase the data cache size more than the index cache size. For further information about configuring cache sizes, see “Session Caches” on page 613.
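If you copy the counter values out of the performance details, a few lines of Python can flag the transformations whose caches spill to disk. This is a sketch under stated assumptions: the dictionary layout and the transformation names in the sample are hypothetical, and only the readfromdisk and writetodisk counter suffixes come from this guide.

```python
def transformations_spilling_to_disk(counters):
    """Return the transformations that have a nonzero disk counter.

    counters maps counter names such as "RNK_TOPTEN_readfromdisk" to values.
    """
    spilling = set()
    for name, value in counters.items():
        for suffix in ("_readfromdisk", "_writetodisk"):
            if name.endswith(suffix) and value != 0:
                spilling.add(name[: -len(suffix)])
    return sorted(spilling)


# Hypothetical counter values for three transformations.
sample = {
    "AGG_Sales_readfromdisk": 0,
    "AGG_Sales_writetodisk": 0,
    "RNK_TOPTEN_readfromdisk": 1250,
    "RNK_TOPTEN_writetodisk": 1310,
    "JNR_Orders_readfromdisk": 0,
    "JNR_Orders_writetodisk": 42,
}
print(transformations_spilling_to_disk(sample))  # prints ['JNR_Orders', 'RNK_TOPTEN']
```

Each transformation in the result is a candidate for larger index and data cache sizes.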


If the session performs incremental aggregation, the PowerCenter Server reads historical aggregate data from the local disk during the session and writes to disk when saving historical data. As a result, the Aggregator_readfromdisk and Aggregator_writetodisk counters display a number other than zero. However, since the PowerCenter Server writes the historical data to a file at the end of the session, you can still evaluate the counters during the session. If the counters show any number other than zero during the session run, you can increase performance by tuning the index and data cache sizes.

To view the session performance details while the session runs, right-click the session in the Workflow Monitor and choose Properties. Click the Properties tab in the details dialog box.

Source and Target BufferInput_efficiency and BufferOutput_efficiency Counters

If the BufferInput_efficiency and the BufferOutput_efficiency counters are low for all sources and targets, increasing the session DTM buffer size may improve performance. For information on when and how to tune this parameter, see “Increasing DTM Buffer Size” on page 656.

Under certain circumstances, tuning the buffer block size may also improve session performance. For details, see “Optimizing the Buffer Block Size” on page 657.

Identifying a System Bottleneck

After you tune the source, target, mapping, and session, you may consider tuning the system. You can identify system bottlenecks by using system tools to monitor CPU usage, memory usage, and paging.

The PowerCenter Server uses system resources to process transformations, run sessions, and read and write data. The PowerCenter Server also uses system memory for other data such as aggregate, joiner, rank, and cached lookup tables. You can use system performance monitoring tools to monitor the amount of system resources the PowerCenter Server uses and identify system bottlenecks.

On Windows, you can use system tools in the Task Manager or Administrative Tools.

On UNIX systems you can use system tools such as vmstat and iostat to monitor system performance.

For details on eliminating system bottlenecks, see “Optimizing the System” on page 660.

Identifying System Bottlenecks on Windows

On Windows, you can view the Performance and Processes tabs in the Task Manager (use Ctrl-Alt-Del and choose Task Manager). The Performance tab in the Task Manager provides a quick look at CPU usage and total memory used. You can view more detailed performance information by using the Performance Monitor on Windows (use Start-Programs-Administrative Tools and choose Performance Monitor).


Use the Windows Performance Monitor to create a chart that provides the following information:

♦ Percent processor time. If you have several CPUs, monitor each CPU for percent processor time. If the processors are utilized at more than 80%, you may consider adding more processors.

♦ Pages/second. If pages/second is greater than five, you may have excessive memory pressure (thrashing). You may consider adding more physical memory.

♦ Physical disks percent time. This is the percent time that the physical disk is busy performing read or write requests. You may consider adding another disk device or upgrading the disk device.

♦ Physical disks queue length. This is the number of users waiting for access to the same disk device. If physical disk queue length is greater than two, you may consider adding another disk device or upgrading the disk device.

♦ Server total bytes per second. This is the number of bytes the server has sent to and received from the network. You can use this information to improve network bandwidth.
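The numeric thresholds in the list above can be applied to sampled counter values with a short check. This is a minimal sketch. The function name and the idea of passing the samples in as plain numbers are assumptions, and the thresholds are the ones stated in the list.

```python
def windows_bottleneck_hints(processor_percent, pages_per_second, disk_queue_length):
    """Compare sampled Performance Monitor values with the guideline thresholds."""
    hints = []
    if processor_percent > 80:
        hints.append("consider adding processors")
    if pages_per_second > 5:
        hints.append("consider adding physical memory")
    if disk_queue_length > 2:
        hints.append("consider adding or upgrading a disk device")
    return hints


# Hypothetical samples: a busy CPU, light paging, and a long disk queue.
print(windows_bottleneck_hints(processor_percent=92, pages_per_second=3,
                               disk_queue_length=4))
```

If you have several CPUs, run the check once for each CPU, since the guideline is to monitor each one.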

Identifying System Bottlenecks on UNIX

You can use UNIX tools to monitor user background processes, system swapping actions, CPU loading, and I/O load operations. When you tune UNIX systems, tune the server for a major database system. Use the following UNIX tools to identify system bottlenecks on the UNIX system:

♦ lsattr -E -l sys0. Use this tool to view current system settings. This tool shows maxuproc, the maximum number of user background processes. You may consider reducing the number of background processes on your system.

♦ iostat. Use this tool to monitor loading operation for every disk attached to the database server. Iostat displays the percentage of time that the disk was physically active. High disk utilization suggests that you may need to add more disks.

If you use disk arrays, use utilities provided with the disk arrays instead of iostat.

♦ vmstat or sar -w. Use these tools to monitor disk swapping actions. Swapping should not occur during the session. If swapping does occur, you may consider increasing your physical memory or reducing the number of memory-intensive applications on the system.

♦ sar -u. Use this tool to monitor CPU loading. This tool provides percent usage on user, system, idle time, and waiting time. If the percent time spent waiting on I/O (%wio) is high, you may consider using other under-utilized disks. For example, if your source data, target data, lookup, rank, and aggregate cache files are all on the same disk, consider putting them on different disks.


Optimizing the Target Database

If your session writes to a flat file target, you can optimize session performance by writing to a flat file target that is local to the PowerCenter Server. If your session writes to a relational target, consider performing the following tasks to increase performance:

♦ Drop indexes and key constraints.

♦ Increase checkpoint intervals.

♦ Use bulk loading.

♦ Use external loading.

♦ Increase database network packet size.

♦ Optimize Oracle target databases.

Dropping Indexes and Key Constraints

When you define key constraints or indexes in target tables, you slow the loading of data to those tables. To improve performance, drop indexes and key constraints before running your session. You can rebuild those indexes and key constraints after the session completes.

If you decide to drop and rebuild indexes and key constraints on a regular basis, you can create pre- and post-load stored procedures to perform these operations each time you run the session.
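The drop, load, and rebuild sequence can be sketched with Python's standard sqlite3 module. SQLite is used here only as a stand-in for the target database. The table and index names are hypothetical, and in a real session the drop and rebuild steps would run as the pre- and post-load stored procedures described above.

```python
import sqlite3


def load_with_index_rebuild(connection, rows):
    """Drop the index, load the rows, then rebuild the index."""
    # Pre-load step: remove the index so inserts do not have to maintain it.
    connection.execute("DROP INDEX IF EXISTS idx_sales_item")
    # The load itself.
    connection.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    # Post-load step: rebuild the index once, over the fully loaded table.
    connection.execute("CREATE INDEX idx_sales_item ON sales (item_no)")
    connection.commit()


# Hypothetical target table with an existing index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item_no INTEGER, amount REAL)")
conn.execute("CREATE INDEX idx_sales_item ON sales (item_no)")

load_with_index_rebuild(conn, [(i, i * 2.0) for i in range(1000)])
print(conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0])  # prints 1000
```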

Note: To optimize performance, use constraint-based loading only if necessary.

Increasing Checkpoint Intervals

The PowerCenter Server performance slows each time it waits for the database to perform a checkpoint. To increase performance, consider increasing the database checkpoint interval. When you increase the database checkpoint interval, you increase the likelihood that the database performs checkpoints as necessary, when the size of the database log file reaches its limit.

For details on specific database checkpoints, checkpoint intervals, and log files, consult your database documentation.

Bulk Loading

You can use bulk loading to improve the performance of a session that inserts a large amount of data to a DB2, Sybase, Oracle, or Microsoft SQL Server database. Configure bulk loading on the Mapping tab.

When bulk loading, the PowerCenter Server bypasses the database log, which speeds performance. Without writing to the database log, however, the target database cannot perform rollback. As a result, you may not be able to perform recovery. Therefore, you must weigh the importance of improved session performance against the ability to recover an incomplete session.

For more information on configuring bulk loading, see “Bulk Loading” on page 252.

External Loading

You can use the External Loader session option to integrate external loading with a session.

If you have a DB2 EE or DB2 EEE target database, you can use the DB2 EE or DB2 EEE external loaders to bulk load target files. The DB2 EE external loader uses the PowerCenter Server db2load utility to load data. The DB2 EEE external loader uses the DB2 Autoloader utility.

If you have a Teradata target database, you can use the Teradata external loader utility to bulk load target files.

If your target database runs on Oracle, you can use the Oracle SQL*Loader utility to bulk load target files. When you load data to an Oracle database using a pipeline with multiple partitions, you can increase performance if you create the Oracle target table with the same number of partitions you use for the pipeline.

If your target database runs on Sybase IQ, you can use the Sybase IQ external loader utility to bulk load target files. If your Sybase IQ database is local to the PowerCenter Server on your UNIX system, you can increase performance by loading data to target tables directly from named pipes.

For details on the External Loader option, see “External Loading” on page 523.

Increasing Database Network Packet Size

You can increase the network packet size in the Informatica Workflow Manager to reduce a target bottleneck. For Sybase and Microsoft SQL Server, increase the network packet size to 8K - 16K. For Oracle, increase the network packet size in tnsnames.ora and listener.ora. If you increase the network packet size in the PowerCenter Server configuration, you also need to configure the database server network memory to accept larger packet sizes.

See your database documentation about optimizing database network packet size.

Optimizing Oracle Target Databases

If your target database is Oracle, you can optimize the target database by checking the storage clause, space allocation, and rollback segments.

When you write to an Oracle database, check the storage clause for database objects. Make sure that tables are using large initial and next values. The database should also store table and index data in separate tablespaces, preferably on different disks.


When you write to Oracle target databases, the database uses rollback segments during loads. Make sure that the database stores rollback segments in appropriate tablespaces, preferably on different disks. The rollback segments should also have appropriate storage clauses.

You can optimize the Oracle target database by tuning the Oracle redo log. The Oracle database uses the redo log to log loading operations. Make sure that redo log size and buffer size are optimal. You can view redo log properties in the init.ora file.

If your Oracle instance is local to the PowerCenter Server, you can optimize performance by using IPC protocol to connect to the Oracle database. You can set up Oracle database connection in listener.ora and tnsnames.ora.

See your Oracle documentation for details on optimizing Oracle databases.


Optimizing the Source Database

If your session reads from a flat file source, you can improve session performance by setting the number of bytes the PowerCenter Server reads per line. By default, the PowerCenter Server reads 1024 bytes per line. If each line in the source file is less than the default setting, you can decrease the Line Sequential Buffer Length setting in the session properties.
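Before you lower the Line Sequential Buffer Length, it helps to know how long the longest line in the source file actually is. The following sketch scans a file and reports that length in bytes. The helper name and the sample file are hypothetical. The guide does not say whether the setting must also cover the line terminator, so treat the result as a lower bound and leave some headroom.

```python
import os
import tempfile


def longest_line_bytes(path):
    """Return the length in bytes of the longest line, excluding the line terminator."""
    longest = 0
    with open(path, "rb") as source_file:
        for line in source_file:
            longest = max(longest, len(line.rstrip(b"\r\n")))
    return longest


# Hypothetical two-line source file.
with tempfile.NamedTemporaryFile("wb", suffix=".dat", delete=False) as sample_file:
    sample_file.write(b"1001,widget,9.99\n1002,large widget,19.99\n")

print(longest_line_bytes(sample_file.name))  # prints 23
os.remove(sample_file.name)
```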

If your session reads from a relational source, review the following suggestions for improving performance:

♦ Optimize the query.

♦ Create tempdb as an in-memory database.

♦ Use conditional filters.

♦ Increase database network packet size.

♦ Connect to Oracle databases using IPC protocol.

Optimizing the Query

If a session joins multiple source tables in one Source Qualifier, you might be able to improve performance by optimizing the query with optimizer hints. Also, single table select statements with an ORDER BY or GROUP BY clause may benefit from optimization such as adding indexes.

Usually, the database optimizer determines the most efficient way to process the source data. However, you might know properties about your source tables that the database optimizer does not. The database administrator can create optimizer hints to tell the database how to execute the query for a particular set of source tables.

The query the PowerCenter Server uses to read data appears in the session log. You can also find the query in the Source Qualifier transformation. Have your database administrator analyze the query, and then create optimizer hints and/or indexes for the source tables.

Use optimizer hints if there is a long delay between when the query begins executing and when PowerCenter receives the first row of data. Configure optimizer hints to begin returning rows as quickly as possible, rather than returning all rows at once. This allows the PowerCenter Server to process rows in parallel with the query execution.

Queries that contain ORDER BY or GROUP BY clauses may benefit from creating an index on the ORDER BY or GROUP BY columns. Once you optimize the query, use the SQL override option to take full advantage of these modifications. For details on using SQL override, see “Source Qualifier Transformation” in the Transformation Guide.

You can also configure the source database to run parallel queries to improve performance. See your database documentation for configuring parallel query.


Using tempdb to Join Sybase and Microsoft SQL Server Tables

When joining large tables on a Sybase or Microsoft SQL Server database, you might improve performance by creating the tempdb as an in-memory database to allocate sufficient memory. Check your Sybase or Microsoft SQL Server manual for details.

Using Conditional Filters

A simple source filter on the source database can sometimes impact performance negatively because of lack of indexes. You can use the PowerCenter conditional filter in the Source Qualifier to improve performance.

Whether you should use the PowerCenter conditional filter to improve performance depends on your session. For example, if multiple sessions read from the same source simultaneously, the PowerCenter conditional filter may improve performance.

However, some sessions may perform faster if you filter the source data on the source database. You can test your session with both the database filter and the PowerCenter filter to determine which method improves performance.

Increasing Database Network Packet Sizes

You can improve the performance of a source database by increasing the network packet size, allowing larger packets of data to cross the network at one time. To do this you must complete the following tasks:

♦ Increase the database server network packet size.

♦ Change the packet size in the Workflow Manager database connection to reflect the database server packet size.

For Oracle, increase the packet size in listener.ora and tnsnames.ora. For other databases, check your database documentation for details on optimizing network packet size.

Connecting to Oracle Source Databases

If your Oracle instance is local to the PowerCenter Server, you can optimize performance by using IPC protocol to connect to the Oracle database. You can set up Oracle database connection in listener.ora and tnsnames.ora.


Optimizing the Mapping

Mapping-level optimization may take time to implement but can significantly boost session performance. Focus on mapping-level optimization only after optimizing on the target and source databases.

Generally, you reduce the number of transformations in the mapping and delete unnecessary links between transformations to optimize the mapping. You should configure the mapping with the fewest transformations and expressions needed to do the most work possible. You should minimize the amount of data moved by deleting unnecessary links between transformations.

For transformations that use data cache (such as Aggregator, Joiner, Rank, and Lookup transformations), limit connected input/output or output ports. Limiting the number of connected input/output or output ports reduces the amount of data the transformations store in the data cache.

You can also perform the following tasks to optimize the mapping:

♦ Configure single-pass reading.

♦ Optimize datatype conversions.

♦ Eliminate transformation errors.

♦ Optimize transformations.

♦ Optimize expressions.

Configuring Single-Pass Reading

Single-pass reading allows you to populate multiple targets with one source qualifier. Consider using single-pass reading if you have several sessions that use the same sources. If you join the separate mappings and use only one source qualifier for each source, the PowerCenter Server then reads each source only once, then sends the data into separate data flows. A particular row can be used by all the data flows, by any combination, or by none, as the situation demands.

For example, you have the PURCHASING source table, and you use that source daily to perform an aggregation and a ranking. If you place the Aggregator and Rank transformations in separate mappings and sessions, you force the PowerCenter Server to read the same source table twice. However, if you join the two mappings, using one source qualifier, the PowerCenter Server reads PURCHASING only once, then sends the appropriate data to the two separate data flows.
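The idea behind the PURCHASING example can be sketched outside PowerCenter: read each source row once and hand it to both data flows, instead of scanning the source once per flow. The row layout, column names, and function name below are hypothetical, and the source is modeled as a generator so that it can only be read once.

```python
import heapq
from collections import defaultdict


def single_pass(rows, top_n=2):
    """Feed an aggregation flow and a ranking flow from one read of the source."""
    totals = defaultdict(float)   # Aggregator flow: total price per category
    rank_input = []               # Rank flow: rows to rank by price
    for row in rows:              # each source row is read exactly once
        totals[row["category"]] += row["price"]
        rank_input.append(row)
    ranked = heapq.nlargest(top_n, rank_input, key=lambda r: r["price"])
    return dict(totals), ranked


def read_purchasing():
    """Stand-in for the PURCHASING source: a generator can be consumed only once."""
    yield {"category": "tools", "price": 10.0}
    yield {"category": "toys", "price": 4.0}
    yield {"category": "tools", "price": 7.5}


totals, ranked = single_pass(read_purchasing())
print(totals)                         # prints {'tools': 17.5, 'toys': 4.0}
print([r["price"] for r in ranked])   # prints [10.0, 7.5]
```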

When changing mappings to take advantage of single-pass reading, you can optimize this feature by factoring out any functions you do on both mappings. For example, if you need to subtract a percentage from the PRICE ports for both the Aggregator and Rank transformations, you can minimize work by subtracting the percentage before splitting the pipeline as shown in Figure 25-1:

Figure 25-1. Single-Pass Reading

Optimizing Datatype Conversions

Forcing the PowerCenter Server to make unnecessary datatype conversions slows performance. For example, if your mapping moves data from an Integer column to a Decimal column, then back to an Integer column, the unnecessary datatype conversion slows performance. Where possible, eliminate unnecessary datatype conversions from mappings.

Some datatype conversions can improve system performance. Use integer values in place of other datatypes when performing comparisons using Lookup and Filter transformations.

For example, many databases store U.S. zip code information as a Char or Varchar datatype. If you convert your zip code data to an Integer datatype, the lookup database stores the zip code 94303-1234 as 943031234. This helps increase the speed of the lookup comparisons based on zip code.

Eliminating Transformation Errors

In large numbers, transformation errors slow the performance of the PowerCenter Server. With each transformation error, the PowerCenter Server pauses to determine the cause of the error and to remove the row causing the error from the data flow. Then the PowerCenter Server typically writes the row into the session log file.

Transformation errors occur when the PowerCenter Server encounters conversion errors, conflicting mapping logic, and any condition set up as an error, such as null input. Check the session log to see where the transformation errors occur. If the errors center around particular transformations, evaluate those transformation constraints.

If you need to run a session that generates a large number of transformation errors, you might improve performance by setting a lower tracing level. However, this is not a recommended long-term response to transformation errors. For details on error tracing and performance, see “Reducing Error Tracing” on page 659.


Optimizing Lookup Transformations

If a mapping contains a Lookup transformation, you can optimize the lookup. Some of the things you can do to increase performance include caching the lookup table, optimizing the lookup condition, or indexing the lookup table.

For more information on the Lookup transformation, see “Lookup Transformation” in the Transformation Guide. For more information on lookup caching, see “Lookup Caches” in the Transformation Guide and “Session Caches” on page 613.

Caching Lookups

If a mapping contains Lookup transformations, you might want to enable lookup caching. In general, you want to cache lookup tables that need less than 300MB.

When you enable caching, the PowerCenter Server caches the lookup table and queries the lookup cache during the session. When this option is not enabled, the PowerCenter Server queries the lookup table on a row-by-row basis. You can increase performance using a shared or persistent cache:

♦ Shared cache. You can share the lookup cache between multiple transformations. You can share an unnamed cache between transformations in the same mapping. You can share a named cache between transformations in the same or different mappings.

♦ Persistent cache. If you want to save and reuse the cache files, you can configure the transformation to use a persistent cache. Use this feature when you know the lookup table does not change between session runs. Using a persistent cache can improve performance because the PowerCenter Server builds the memory cache from the cache files instead of from the database.

For more information on lookup caching options, see “Lookup Transformation” in the Transformation Guide.

Reducing the Number of Cached Rows

Use the Lookup SQL Override option to add a WHERE clause to the default SQL statement. This allows you to reduce the number of rows included in the cache.

Optimizing the Lookup Condition

If you include more than one lookup condition, place the conditions with an equal sign first to optimize lookup performance.
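The ordering rule can be illustrated with a stable sort that moves the equality conditions to the front and otherwise keeps the order you wrote them in. This is a sketch only. Representing each condition as a (lookup port, operator, input port) tuple is an assumption made for the example, and the port names are hypothetical.

```python
def order_lookup_conditions(conditions):
    """Place conditions that use the equal sign first, preserving relative order."""
    # sorted() is stable, and False sorts before True, so "=" conditions lead.
    return sorted(conditions, key=lambda condition: condition[1] != "=")


conditions = [
    ("PRICE", ">=", "IN_MIN_PRICE"),
    ("ITEM_ID", "=", "IN_ITEM_ID"),
    ("REGION", "=", "IN_REGION"),
]
for lookup_port, operator, input_port in order_lookup_conditions(conditions):
    print(lookup_port, operator, input_port)
# prints ITEM_ID = IN_ITEM_ID, then REGION = IN_REGION, then PRICE >= IN_MIN_PRICE
```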

Indexing the Lookup Table

The PowerCenter Server needs to query, sort, and compare values in the lookup condition columns. The index needs to include every column used in a lookup condition. You can improve performance for both cached and uncached lookups:

♦ Cached lookups. You can improve performance by indexing the columns in the lookup ORDER BY. The session log contains the ORDER BY statement.


♦ Uncached lookups. Because the PowerCenter Server issues a SELECT statement for each row passing into the Lookup transformation, you can improve performance by indexing the columns in the lookup condition.

Optimizing Multiple Lookups

If a mapping contains multiple lookups, even with caching enabled and enough heap memory, the lookups can slow performance. By locating the Lookup transformations that query the largest amounts of data, you can tune those lookups to improve overall performance.

To see which Lookup transformations process the most data, examine the Lookup_rowsinlookupcache counters for each Lookup transformation. The Lookup transformations that have a large number in this counter might benefit from tuning their lookup expressions. If those expressions can be optimized, session performance improves. For hints on tuning expressions, see “Optimizing Expressions” on page 652.

Optimizing Filter Transformations

If you filter rows from the mapping, you can improve efficiency by filtering early in the data flow. Instead of using a Filter transformation halfway through the mapping to remove a sizable amount of data, use a source qualifier filter to remove those same rows at the source.

If you cannot move the filter into the source qualifier, move the Filter transformation as close to the source qualifier as possible to remove unnecessary data early in the data flow.

In your filter condition, avoid using complex expressions. You can optimize Filter transformations by using simple integer or true/false expressions in the filter condition.

Use a Filter or Router transformation to drop rejected rows from an Update Strategy transformation if you do not need to keep rejected rows.

Optimizing Aggregator Transformations

Aggregator transformations often slow performance because they must group data before processing it. Aggregator transformations need additional memory to hold intermediate group results. You can optimize Aggregator transformations by performing the following tasks:

♦ Group by simple columns.

♦ Use sorted input.

♦ Use incremental aggregation.

Group By Simple Columns

You can optimize Aggregator transformations when you group by simple columns. When possible, use numbers instead of strings and dates in the columns used for the GROUP BY. You should also avoid complex expressions in the Aggregator expressions.


Use Sorted Input

You can increase session performance by sorting data and using the Aggregator Sorted Input option.

The Sorted Input option decreases the use of aggregate caches. When you use the Sorted Input option, the PowerCenter Server assumes all data is sorted by group. As the PowerCenter Server reads rows for a group, it performs aggregate calculations. When necessary, it stores group information in memory.
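The following Python sketch illustrates why sorted data needs less cache. It is not a description of how the PowerCenter Server is implemented. The function names and sample rows are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

def aggregate_unsorted(rows):
    """Sum amounts per key; every group stays in memory until the input ends."""
    totals = {}
    for key, amount in rows:
        totals[key] = totals.get(key, 0) + amount
    return totals

def aggregate_sorted(rows):
    """Sum amounts per key for input that is already sorted by key.

    Only the current group is held in memory; each total is emitted as soon
    as the key changes.
    """
    for key, group in groupby(rows, key=itemgetter(0)):
        yield key, sum(amount for _, amount in group)

rows = [("east", 10), ("west", 5), ("east", 7), ("west", 1)]
sorted_rows = sorted(rows, key=itemgetter(0))
```

Both functions return the same totals. The sorted version gives correct results only if the input really is sorted by the group key, which matches the assumption the PowerCenter Server makes when Sorted Input is selected.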

The Sorted Input option reduces the amount of data cached during the session and improves performance. Use this option with the Source Qualifier Number of Sorted Ports option to pass sorted data to the Aggregator transformation.

You can benefit from better performance when you use the Sorted Input option in sessions with multiple partitions.

For details about using Sorted Input in the Aggregator transformation, see “Aggregator Transformation” in the Transformation Guide.

Use Incremental Aggregation

If you can capture changes from the source that change less than half the target, you can use Incremental Aggregation to optimize the performance of Aggregator transformations.

When using incremental aggregation, you apply captured changes in the source to aggregate calculations in a session. The PowerCenter Server updates your target incrementally, rather than processing the entire source and recalculating the same calculations every time you run the session.
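The idea can be sketched in Python as follows. The sketch assumes the aggregate is a simple sum and that the captured changes are new rows only. The function names and data are hypothetical.

```python
def full_aggregate(rows):
    """Recompute every total from the entire source."""
    totals = {}
    for key, amount in rows:
        totals[key] = totals.get(key, 0) + amount
    return totals

def apply_increment(saved_totals, new_rows):
    """Fold only the newly captured rows into previously saved totals."""
    totals = dict(saved_totals)  # copy so the saved state is left untouched
    for key, amount in new_rows:
        totals[key] = totals.get(key, 0) + amount
    return totals

history = [("east", 10), ("west", 5)]
changes = [("east", 3), ("north", 2)]

saved = full_aggregate(history)            # first session run
updated = apply_increment(saved, changes)  # later run reads only the changes
```

The incremental result equals a full recalculation over all rows, but the later run reads two rows instead of four.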

For details on using Incremental Aggregation, see “Using Incremental Aggregation” on page 573.

Optimizing Joiner Transformations

Joiner transformations can slow performance because they need additional space at run time to hold intermediate results. You can view Joiner performance counter information to determine whether you need to optimize the Joiner transformations.

Joiner transformations need a data cache to hold the master table rows and an index cache to hold the join columns from the master table. You need to make sure that you have enough memory to hold the data and the index cache so the system does not page to disk. To minimize memory requirements, you can also use the smaller table as the master table or join on as few columns as possible.
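The following Python sketch shows why the choice of master table affects memory use. It is a simplified in-memory join for a normal join, not the PowerCenter Server implementation. The table contents and the hash_join function are hypothetical.

```python
def hash_join(master, detail, key):
    """Cache the master rows by join key, then stream detail rows past the cache.

    Returns the joined rows and the number of master rows that were cached.
    """
    cache = {}
    for row in master:
        cache.setdefault(row[key], []).append(row)
    joined = []
    for row in detail:
        for match in cache.get(row[key], []):
            joined.append({**match, **row})
    return joined, len(master)

regions = [{"region_id": 1, "region": "East"}, {"region_id": 2, "region": "West"}]
orders = [
    {"order_id": 100, "region_id": 1},
    {"order_id": 101, "region_id": 2},
    {"order_id": 102, "region_id": 1},
    {"order_id": 103, "region_id": 3},
]

rows_small_master, cached_small = hash_join(regions, orders, "region_id")
rows_large_master, cached_large = hash_join(orders, regions, "region_id")
```

Both calls return the same three joined rows, but using the smaller table as the master caches two rows instead of four.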

The type of join you use can affect performance. Normal joins are faster than outer joins and result in fewer rows. When possible, use database joins for homogeneous sources.


Optimizing Sequence Generator Transformations

You can optimize Sequence Generator transformations by creating a reusable Sequence Generator and using it in multiple mappings simultaneously. You can also optimize Sequence Generator transformations by configuring the Number of Cached Values property.

The Number of Cached Values property determines the number of values the PowerCenter Server caches at one time. Make sure that the Number of Cached Values is not too small. You may consider configuring the Number of Cached Values to a value greater than 1,000.

For details on configuring Sequence Generator transformation, see “Sequence Generator Transformation” in the Transformation Guide.

Optimizing Expressions

As a final step in tuning the mapping, you can focus on the expressions used in transformations. When examining expressions, focus on complex expressions for possible simplification. Remove expressions one-by-one to isolate the slow expressions.

Once you locate the slowest expressions, take a closer look at how you can optimize those expressions.

Factoring Out Common Logic

If the mapping performs the same task in several places, reduce the number of times the mapping performs the task by moving the task earlier in the mapping. For example, you have a mapping with five target tables. Each target requires a Social Security number lookup. Instead of performing the lookup five times, place the Lookup transformation in the mapping before the data flow splits. Then pass lookup results to all five targets.

Minimizing Aggregate Function Calls

When writing expressions, factor out as many aggregate function calls as possible. Each time you use an aggregate function call, the PowerCenter Server must search and group the data. For example, in the following expression, the PowerCenter Server reads COLUMN_A, finds the sum, then reads COLUMN_B, finds the sum, and finally finds the sum of the two sums:

SUM(COLUMN_A) + SUM(COLUMN_B)

If you factor out the aggregate function call, as below, the PowerCenter Server adds COLUMN_A to COLUMN_B, then finds the sum of both.

SUM(COLUMN_A + COLUMN_B)
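The equivalence of the two forms can be checked with a short Python sketch. The column values are hypothetical, and the sketch assumes neither column contains null values. It does not cover how each form handles nulls.

```python
column_a = [1.5, 2.0, 3.25]
column_b = [10.0, 20.0, 30.0]

# SUM(COLUMN_A) + SUM(COLUMN_B): two aggregate passes, then one addition.
two_aggregates = sum(column_a) + sum(column_b)

# SUM(COLUMN_A + COLUMN_B): one addition per row, then a single aggregate pass.
one_aggregate = sum(a + b for a, b in zip(column_a, column_b))
```

Both variables hold the same total, but the second form calls the aggregate function once.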

Replacing Common Sub-Expressions with Local Variables

If you use the same sub-expression several times in one transformation, you can make that sub-expression a local variable. You can use a local variable only within the transformation, but by calculating the variable only once, you can speed performance. For details, see “Transformations” in the Designer Guide.


Choosing Numeric versus String Operations

The PowerCenter Server processes numeric operations faster than string operations. For example, if you look up large amounts of data on two columns, EMPLOYEE_NAME and EMPLOYEE_ID, configuring the lookup around EMPLOYEE_ID improves performance.

Optimizing Char-Char and Char-Varchar Comparisons

When the PowerCenter Server performs comparisons between CHAR and VARCHAR columns, it slows each time it finds trailing blank spaces in the row. You can use the Treat CHAR as CHAR On Read option in the PowerCenter Server setup so that the PowerCenter Server does not trim trailing spaces from the end of Char source fields. For details, see the Installation and Configuration Guide.

Choosing DECODE versus LOOKUP

When you use a LOOKUP function, the PowerCenter Server must look up a table in a database. When you use a DECODE function, you incorporate the lookup values into the expression itself, so the PowerCenter Server does not have to look up a separate table. Therefore, when you want to look up a small set of unchanging values, using DECODE may improve performance. For details on using a DECODE, see the Transformation Language Reference.

Using Operators Instead of Functions

The PowerCenter Server reads expressions written with operators faster than expressions with functions. Where possible, use operators to write your expressions. For example, if you have an expression that involves nested CONCAT calls such as:

CONCAT( CONCAT( CUSTOMERS.FIRST_NAME, ' '), CUSTOMERS.LAST_NAME)

you can rewrite that expression with the || operator as follows:

CUSTOMERS.FIRST_NAME || ' ' || CUSTOMERS.LAST_NAME

Optimizing IIF Expressions

IIF expressions can return a value as well as an action, which allows for more compact expressions. For example, say you have a source with three Y/N flags: FLG_A, FLG_B, FLG_C, and you want to return values such that: If FLG_A = “Y”, then return = VAL_A. If FLG_A = “Y” AND FLG_B = “Y”, then return = VAL_A + VAL_B, and so on for all the permutations.

One way to write the expression is as follows:

IIF( FLG_A = 'Y' AND FLG_B = 'Y' AND FLG_C = 'Y', VAL_A + VAL_B + VAL_C,
IIF( FLG_A = 'Y' AND FLG_B = 'Y' AND FLG_C = 'N', VAL_A + VAL_B,
IIF( FLG_A = 'Y' AND FLG_B = 'N' AND FLG_C = 'Y', VAL_A + VAL_C,
IIF( FLG_A = 'Y' AND FLG_B = 'N' AND FLG_C = 'N', VAL_A,
IIF( FLG_A = 'N' AND FLG_B = 'Y' AND FLG_C = 'Y', VAL_B + VAL_C,
IIF( FLG_A = 'N' AND FLG_B = 'Y' AND FLG_C = 'N', VAL_B,
IIF( FLG_A = 'N' AND FLG_B = 'N' AND FLG_C = 'Y', VAL_C,
IIF( FLG_A = 'N' AND FLG_B = 'N' AND FLG_C = 'N', 0.0
))))))))

This first expression requires 8 IIFs, 16 ANDs, and at least 24 comparisons.

But if you take advantage of the IIF function’s ability to return a value, you can rewrite that expression as:

IIF(FLG_A='Y', VAL_A, 0.0)+ IIF(FLG_B='Y', VAL_B, 0.0)+ IIF(FLG_C='Y', VAL_C, 0.0)

This results in three IIFs, three comparisons, two additions, and a faster session.
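You can confirm that the two forms return the same result for every flag combination with a short Python sketch. The iif function below is a hypothetical stand-in for the IIF function, and the sketch assumes each flag is either 'Y' or 'N'.

```python
from itertools import product

def iif(condition, value_if_true, value_if_false):
    """Hypothetical stand-in for the IIF function."""
    return value_if_true if condition else value_if_false

def nested_form(a, b, c, val_a, val_b, val_c):
    """One branch for each of the eight flag combinations."""
    return iif(a == "Y" and b == "Y" and c == "Y", val_a + val_b + val_c,
           iif(a == "Y" and b == "Y" and c == "N", val_a + val_b,
           iif(a == "Y" and b == "N" and c == "Y", val_a + val_c,
           iif(a == "Y" and b == "N" and c == "N", val_a,
           iif(a == "N" and b == "Y" and c == "Y", val_b + val_c,
           iif(a == "N" and b == "Y" and c == "N", val_b,
           iif(a == "N" and b == "N" and c == "Y", val_c,
           iif(a == "N" and b == "N" and c == "N", 0.0, None))))))))

def additive_form(a, b, c, val_a, val_b, val_c):
    """One IIF per flag; each contributes its value or zero."""
    return (iif(a == "Y", val_a, 0.0)
            + iif(b == "Y", val_b, 0.0)
            + iif(c == "Y", val_c, 0.0))

def forms_agree(val_a, val_b, val_c):
    """Compare both forms across all eight Y/N combinations."""
    return all(
        nested_form(a, b, c, val_a, val_b, val_c)
        == additive_form(a, b, c, val_a, val_b, val_c)
        for a, b, c in product("YN", repeat=3)
    )
```

Calling forms_agree with any three sample values that add exactly, such as 1.5, 2.25, and 4.0, returns True.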

Evaluating Expressions

If you are not sure which expressions slow performance, the following steps can help isolate the problem.

To evaluate expression performance:

1. Time the session with the original expressions.

2. Copy the mapping and replace half of the complex expressions with a constant.

3. Run and time the edited session.

4. Make another copy of the mapping and replace the other half of the complex expressions with a constant.

5. Run and time the edited session.
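Once you have the three timings, you can compare them as in the following Python sketch. The function name and the sample run times are hypothetical. The sketch assumes that replacing a slow expression with a constant shortens the run.

```python
def slower_half(baseline, first_half_constant, second_half_constant):
    """Return which half of the expressions accounts for more run time.

    Each argument is a measured session run time in seconds. The half whose
    replacement by constants saves more time is the better tuning candidate.
    """
    saved_first = baseline - first_half_constant
    saved_second = baseline - second_half_constant
    if saved_first == saved_second:
        return "neither"
    return "first" if saved_first > saved_second else "second"
```

For example, if the original session runs in 600 seconds, the first edited session in 590 seconds, and the second in 420 seconds, the second half of the expressions is the place to look. You can repeat the steps on that half to narrow the search further.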


Optimizing the Session

Once you optimize your source database, target database, and mapping, you can focus on optimizing the session. You can perform the following tasks to improve overall performance:

♦ Increase the number of partitions.

♦ Reduce error tracing.

♦ Remove staging areas.

♦ Tune session parameters.

Table 25-1 lists the settings and values you can use to improve session performance:

Pipeline Partitioning

If you purchased the partitioning option, you can increase the number of partitions in a pipeline to improve session performance. Increasing the number of partitions allows the PowerCenter Server to create multiple connections to sources and process partitions of source data concurrently.

When you create a session, the Workflow Manager validates each pipeline in the mapping for partitioning. You can specify multiple partitions in a pipeline if the PowerCenter Server can maintain data consistency when it processes the partitioned data.

For details on partitioning sessions, see “Pipeline Partitioning” on page 663.

Allocating Buffer Memory

When the PowerCenter Server initializes a session, it allocates blocks of memory to hold source and target data. The PowerCenter Server allocates at least two blocks for each source and target partition. Sessions that use a large number of sources and targets might require additional memory blocks. If the PowerCenter Server cannot allocate enough memory blocks to hold the data, it fails the session.

Table 25-1. Session Tuning Parameters

Setting            Default Value     Suggested Minimum Value  Suggested Maximum Value
DTM Buffer Size    12,000,000 bytes  6,000,000 bytes          128,000,000 bytes
Buffer block size  64,000 bytes      4,000 bytes              128,000 bytes
Index cache size   1,000,000 bytes   1,000,000 bytes          12,000,000 bytes
Data cache size    2,000,000 bytes   2,000,000 bytes          24,000,000 bytes
Commit interval    10,000 rows       N/A                      N/A
High Precision     Disabled          N/A                      N/A
Tracing Level      Normal            Terse                    N/A

By default, a session has enough buffer blocks for 83 sources and targets. If you run a session that has more than 83 sources and targets, you can increase the number of available memory blocks by adjusting the following session parameters:

♦ DTM Buffer Size. Increase the DTM buffer size found in the Performance settings of the Properties tab. The default setting is 12,000,000 bytes.

♦ Default Buffer Block Size. Decrease the buffer block size found in the Advanced settings of the Config Object tab. The default setting is 64,000 bytes.

To configure these settings, first determine the number of memory blocks the PowerCenter Server requires to initialize the session. Then, based on default settings, you can calculate the buffer size and/or the buffer block size to create the required number of session blocks.

If you have XML sources or targets in your mapping, use the number of groups in the XML source or target in your calculation for the total number of sources and targets.

For example, you create a session that contains a single partition using a mapping that contains 50 sources and 50 targets.

1. You determine that the session requires 200 memory blocks:

[(total number of sources + total number of targets)* 2] = (session buffer blocks)

100 * 2 = 200

2. Next, based on default settings, you determine that you can change the DTM Buffer Size to 15,000,000 (the calculated minimum of 14,222,222 bytes, rounded up), or you can change the Default Buffer Block Size to 54,000:

(session Buffer Blocks) = (.9) * (DTM Buffer Size) / (Default Buffer Block Size) * (number of partitions)

200 = .9 * 14222222 / 64000 * 1

or

200 = .9 * 12000000 / 54000 * 1
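The calculation above can be sketched in Python as follows. The function names are hypothetical. The formulas are the ones given in this section, written with integer arithmetic to avoid rounding surprises.

```python
def required_blocks(sources, targets):
    """Two blocks for each source and target (single partition, as in the example)."""
    return (sources + targets) * 2

def available_blocks(dtm_buffer_size, block_size, partitions=1):
    """Session buffer blocks, following the formula as written in this section."""
    return 9 * dtm_buffer_size * partitions / (10 * block_size)

def min_dtm_buffer_size(blocks, block_size=64_000):
    """Smallest DTM Buffer Size in bytes that yields `blocks` buffer blocks."""
    return -(-(blocks * block_size * 10) // 9)  # ceiling division

def max_block_size(blocks, dtm_buffer_size=12_000_000):
    """Largest buffer block size in bytes that yields `blocks` buffer blocks."""
    return (9 * dtm_buffer_size) // (10 * blocks)

needed = required_blocks(50, 50)            # 200 blocks
default_blocks = available_blocks(12_000_000, 64_000)
```

With the default settings the session gets fewer than the 200 blocks it needs. Either min_dtm_buffer_size(needed) or max_block_size(needed) gives a setting that closes the gap. In practice you round the DTM Buffer Size up to a convenient value such as 15,000,000.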

Increasing DTM Buffer Size

The DTM Buffer Size setting specifies the amount of memory the PowerCenter Server uses as DTM buffer memory. The PowerCenter Server uses DTM buffer memory to create the internal data structures and buffer blocks used to bring data into and out of the PowerCenter Server. When you increase the DTM buffer memory, the PowerCenter Server creates more buffer blocks, which improves performance during momentary slowdowns.

Increasing DTM buffer memory allocation generally causes performance to improve initially and then level off. When you increase the DTM buffer memory allocation, consider the total memory available on the PowerCenter Server system.

If you do not see a significant increase in performance, DTM buffer memory allocation is not a factor in session performance.

Note: Reducing the DTM buffer allocation can cause the session to fail early in the process because the PowerCenter Server is unable to allocate memory to the required processes.


To increase DTM buffer size:

1. Go to the Performance settings of the Properties tab.

2. Increase the setting for DTM Buffer Size, and click OK.

The default for DTM Buffer Size is 12,000,000 bytes. Increase the setting in multiples of the buffer block size, then run and time the session after each increase.

Optimizing the Buffer Block Size

Depending on the session source data, you might need to increase or decrease the buffer block size.

If the session mapping contains a large number of sources or targets, you might need to decrease the buffer block size. For more information, see “Allocating Buffer Memory” on page 655.

If you are manipulating unusually large rows of data, you can increase the buffer block size to improve performance. If you do not know the approximate size of your rows, you can determine the configured row size by following the steps below.

To evaluate needed buffer block size:

1. In the Mapping Designer, open the mapping for the session.

2. Open the target instance.

3. Click the Ports tab.

4. Add the precisions for all the columns in the target.

5. If you have more than one target in the mapping, repeat steps 2-4 for each additional target to calculate the precision for each target.

6. Repeat steps 2-5 for each source definition in your mapping.

7. Choose the largest precision of all the source and target precisions for the total precision in your buffer block size calculation.

The total precision represents the total bytes needed to move the largest row of data. For example, if the total precision equals 33,000, then the PowerCenter Server requires 33,000 bytes in the buffers to move that row. If the buffer block size is 64,000 bytes, the PowerCenter Server can move only one row at a time.

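Ideally, a buffer block should accommodate at least 20 rows at a time. With the default 64,000-byte block, that corresponds to a total precision of 3,200 bytes or less. In particular, if the total precision is greater than 32,000, a default block holds only one row, so increase the size of the buffers to improve performance.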
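The evaluation steps can be sketched in Python as follows. The definition names, the column precisions, and the function names are hypothetical. The 20-row target and the 64,000-byte default come from this section.

```python
def total_precision(column_precisions):
    """Sum the column precisions of one source or target definition."""
    return sum(column_precisions)

def largest_row_size(definitions):
    """Largest total precision across all source and target definitions."""
    return max(total_precision(columns) for columns in definitions.values())

def rows_per_block(row_size, block_size=64_000):
    """Whole rows that fit in one buffer block."""
    return block_size // row_size

def block_size_for(row_size, rows=20):
    """Block size in bytes needed to hold `rows` rows of the largest row."""
    return row_size * rows

definitions = {
    "SRC_ORDERS": [10, 50, 200, 4000],    # hypothetical column precisions
    "TGT_ORDERS": [10, 50, 200, 32740],
}

largest = largest_row_size(definitions)   # 33,000 bytes
```

With a largest row of 33,000 bytes, rows_per_block returns 1 for the default block size. Compare any value returned by block_size_for with the suggested values in Table 25-1 before you change the setting.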

To increase buffer block size:

1. Go to the Advanced settings on the Config Object tab.

2. Increase the setting for Default Buffer Block Size, and click OK.

The default for this setting is 64,000 bytes. Increase this setting in relation to the size of the rows. As with DTM buffer memory allocation, increasing buffer block size should improve performance. If you do not see an increase, buffer block size is not a factor in session performance.

Increasing the Cache Sizes

The PowerCenter Server uses the index and data caches for Aggregator, Rank, Lookup, and Joiner transformations. The PowerCenter Server stores transformed data from Aggregator, Rank, Lookup, and Joiner transformations in the data cache before returning it to the data flow. It stores group information for those transformations in the index cache. If the allocated data or index cache is not large enough to store the data, the PowerCenter Server stores the data in a temporary disk file as it processes the session data. Each time the PowerCenter Server pages to the temporary file, performance slows.

You can see when the PowerCenter Server pages to the temporary file by examining the performance details. The Transformation_readfromdisk or Transformation_writetodisk counters for any Aggregator, Rank, Lookup, or Joiner transformation indicate the number of times the PowerCenter Server must page to disk to process the transformation. Since the data cache is typically larger than the index cache, you should increase the data cache more than the index cache.

For details on calculating the index and data cache size for Aggregator, Rank, Lookup, or Joiner transformations, see “Session Caches” on page 613.

Increasing the Commit Interval

The Commit Interval setting determines the point at which the PowerCenter Server commits data to the target tables. Each time the PowerCenter Server commits, performance slows. Therefore, the smaller the commit interval, the more often the PowerCenter Server writes to the target database, and the slower the overall performance.

If you increase the commit interval, the number of times the PowerCenter Server commits decreases and performance improves.
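As a rough illustration, the following Python sketch estimates how the commit interval changes the number of commits. It assumes one commit per full interval plus one for any remainder. Actual commit points in a session may differ, and the function name is hypothetical.

```python
import math

def commit_count(total_rows, commit_interval=10_000):
    """Approximate number of commits for a given row count and interval."""
    return math.ceil(total_rows / commit_interval)
```

Under this assumption, loading 5,000,000 rows takes about 500 commits at the default interval of 10,000 rows and about 50 commits at an interval of 100,000 rows.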

When you increase the commit interval, consider the log file limits in the target database. If the commit interval is too high, the PowerCenter Server may fill the database log file and cause the session to fail.

Therefore, weigh the benefit of increasing the commit interval against the additional time you would spend recovering a failed session.

Click the General Options settings of the Properties tab to review and adjust the commit interval.

Disabling High Precision

If a session runs with high precision enabled, disabling high precision might improve session performance.


The Decimal datatype is a numeric datatype with a maximum precision of 28. To use a high precision Decimal datatype in a session, configure the PowerCenter Server to recognize this datatype by selecting Enable High Precision in the session properties. However, since reading and manipulating the high precision datatype slows the PowerCenter Server, you can improve session performance by disabling high precision.

When you disable high precision, the PowerCenter Server converts data to a double. The PowerCenter Server reads the Decimal row 3900058411382035317455530282 as 390005841138203 x 10^13. For details on high precision, see “Handling High Precision Data” on page 204.
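The loss of digits can be reproduced in Python, whose float type is a double. This sketch only illustrates double-precision behavior in general. It does not show how the PowerCenter Server performs the conversion.

```python
from decimal import Decimal, getcontext

getcontext().prec = 28  # the maximum Decimal precision named in this section

value = 3900058411382035317455530282

as_decimal = Decimal(value)   # keeps all 28 digits
as_double = float(value)      # keeps roughly 15 to 17 significant digits

digits_lost = int(as_double) != value
leading_digits = str(int(as_double))[:15]
```

The double keeps the leading 15 digits of the value and loses the trailing digits, while the Decimal value round-trips exactly.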

Click the Performance settings on the Properties tab to enable high precision.

Reducing Error Tracing

If a session contains a large number of transformation errors that you have no time to correct, you can improve performance by reducing the amount of data the PowerCenter Server writes to the session log.

To reduce the amount of time spent writing to the session log file, set the tracing level to Terse. You specify Terse tracing if your sessions run without problems and you don’t need session details. At this tracing level, the PowerCenter Server does not write error messages or row-level information for reject data.

To debug your mapping, set the tracing level to Verbose. However, Verbose tracing can significantly slow session performance. Do not use Verbose tracing when you tune performance.

The session tracing level overrides any transformation-specific tracing levels within the mapping. Reducing the tracing level is not recommended as a long-term response to high levels of transformation errors.

For more information about tracing levels, see “Setting Tracing Levels” on page 473.

Removing Staging Areas

When you use a staging area, the PowerCenter Server performs multiple passes on your data. Where possible, remove staging areas to improve performance. The PowerCenter Server can read multiple sources with a single pass, which may alleviate your need for staging areas. For details on single-pass reading, see “Optimizing the Mapping” on page 647.


Optimizing the System

Often performance slows because your session relies on inefficient connections or an overloaded PowerCenter Server system. System delays can also be caused by routers, switches, network protocols, and usage by many users. After you determine from the system monitoring tools that you have a system bottleneck, you can make the following global changes to improve the performance of all your sessions:

♦ Improve network speed. Slow network connections can slow session performance. Have your system administrator determine if your network runs at an optimal speed. Decrease the number of network hops between the PowerCenter Server and databases.

♦ Use multiple PowerCenter Servers. Using multiple PowerCenter Servers on separate systems might double or triple session performance.

♦ Use a server grid. Use a collection of PowerCenter Servers to distribute and process the workload of a workflow. For information on server grids, see “Working with Server Grids” on page 446.

♦ Improve CPU performance. Run the PowerCenter Server and related machines on high performance CPUs, or configure your system to use additional CPUs.

♦ Configure the PowerCenter Server for ASCII data movement mode. When all character data processed by the PowerCenter Server is 7-bit ASCII or EBCDIC, configure the PowerCenter Server for ASCII data movement mode.

♦ Check hard disks on related machines. Slow disk access on source and target databases, source and target file systems, as well as the PowerCenter Server and repository machines can slow session performance. Have your system administrator evaluate the hard disks on your machines.

♦ Reduce paging. When an operating system runs out of physical memory, it starts paging to disk to free physical memory. Configure the physical memory for the PowerCenter Server machine to minimize paging to disk.

♦ Use processor binding. In a multi-processor UNIX environment, the PowerCenter Server may use a large amount of system resources. Use processor binding to control processor usage by the PowerCenter Server.

Improving Network Speed

The performance of the PowerCenter Server is related to network connections. A local disk can move data five to twenty times faster than a network. Consider the following options to minimize network activity and to improve PowerCenter Server performance.

If you use a flat file as a source or target in your session, you can move the files onto the PowerCenter Server system to improve performance. When you store flat files on a machine other than the PowerCenter Server, session performance becomes dependent on the performance of your network connections. Moving the files onto the PowerCenter Server system and adding disk space might improve performance.


If you use relational source or target databases, try to minimize the number of network hops between the source and target databases and the PowerCenter Server. Moving the target database onto a server system might improve PowerCenter Server performance.

When you run sessions that contain multiple partitions, have your network administrator analyze the network and make sure it has enough bandwidth to handle the data moving across the network from all partitions.

Using Multiple PowerCenter Servers

You can run multiple PowerCenter Servers on separate systems against the same repository. Distributing the session load to separate PowerCenter Server systems increases performance. For details on using multiple PowerCenter Servers, see “Using Multiple Servers” on page 443.

Using Server Grids

A server grid allows you to use the combined processing power of multiple PowerCenter Servers to balance the workload of workflows. For more information about creating a server grid, see “Working with Server Grids” on page 446.

In a server grid, a PowerCenter Server distributes sessions across the network of available PowerCenter Servers. You can further improve performance by assigning a more powerful server to run a complicated mapping. For more information about assigning a server to a session, see “Assigning the PowerCenter Server to a Session” on page 198.

Running the PowerCenter Server in ASCII Data Movement Mode

When all character data processed by the PowerCenter Server is 7-bit ASCII or EBCDIC, configure the PowerCenter Server to run in the ASCII data movement mode. In ASCII mode, the PowerCenter Server uses one byte to store each character. When you run the PowerCenter Server in Unicode mode, it uses two bytes for each character, which can slow session performance.

Using Additional CPUs

Configure your system to use additional CPUs to improve performance. Additional CPUs allow the system to run multiple sessions in parallel as well as multiple pipeline partitions in parallel.

However, additional CPUs might cause disk bottlenecks. To prevent disk bottlenecks, minimize the number of processes accessing the disk. Processes that access the disk include database functions and operating system functions. Parallel sessions or pipeline partitions also require disk access.

Reducing Paging

Paging occurs when the PowerCenter Server operating system runs out of memory for a particular operation and uses the local disk for memory. You can free up more memory or increase physical memory to reduce paging and the slow performance that results from paging. Monitor paging activity using system tools.

You might want to increase system memory in the following circumstances:

♦ You run a session that uses large cached lookups.

♦ You run a session with many partitions.

If you cannot free up memory, you might want to add memory to the system.

Using Processor Binding

In a multi-processor UNIX environment, the PowerCenter Server may use a large amount of system resources if you run a large number of sessions. As a result, other applications on the machine may not have enough system resources available. You can use processor binding to control processor usage by the PowerCenter Server.

In a Sun Solaris environment, the system administrator can create and manage a processor set using the psrset command. The system administrator can then use the pbind command to bind the PowerCenter Server to a processor set so the processor set only runs the PowerCenter Server. The Sun Solaris environment also provides the psrinfo command to display details about each configured processor, and the psradm command to change the operational status of processors. For details, see your system administrator and Sun Solaris documentation.

In an HP-UX environment, the system administrator can use the Process Resource Manager utility to control CPU usage in the system. The Process Resource Manager allocates minimum system resources and uses a maximum cap of resources. For details, see your system administrator and HP-UX documentation.

In an AIX environment, system administrators can use the Workload Manager in AIX 5L to manage system resources during peak demands. The Workload Manager can allocate resources and manage CPU, memory, and disk I/O bandwidth. For details, see your system administrator and AIX documentation.


Pipeline Partitioning

Once you have tuned the application, databases, and system for maximum single-partition performance, you may find that your system is under-utilized. At this point, you can reconfigure your session to have two or more partitions. Adding partitions may improve performance by utilizing more of the hardware while processing the session.

Use the following tips when you add partitions to a session:

♦ Add one partition at a time. To best monitor performance, add one partition at a time, and note your session settings before you add each partition.

♦ Set DTM Buffer Memory. For a session with n partitions, this value should be at least n times the value for the session with one partition.

♦ Set cached values for Sequence Generator. For a session with n partitions, there should be no need to use the “Number of Cached Values” property of the Sequence Generator transformation. If you must set this value to a value greater than zero, make sure it is at least n times the original value for the session with one partition.

♦ Partition the source data evenly. Configure each partition to extract the same number of rows.

♦ Monitor the system while running the session. If there are CPU cycles available (twenty percent or more idle time) then this session might see a performance improvement by adding a partition.

♦ Monitor the system after adding a partition. If the CPU utilization does not go up, the wait for I/O time goes up, or the total data transformation rate goes down, then there is probably a hardware or software bottleneck. If the wait for I/O time goes up a significant amount, then check the system for hardware bottlenecks. Otherwise, check the database configuration.

♦ Tune databases and system. Make sure that your databases are tuned properly for parallel ETL and that your system has no bottlenecks.

For details on pipeline partitioning, see “Pipeline Partitioning” on page 345.

Optimizing the Source Database for Partitioning

Usually, each partition on the reader side represents a subset of the data to be processed. But if the database is not tuned properly, the results may not make your session any quicker. This is fairly easy to test. Create a pipeline with one partition. Measure the reader throughput in the Workflow Manager. After you do this, add partitions. Is the throughput scaling linearly? In other words, if you have two partitions, is your reader throughput twice as fast? If this is not true, you probably need to tune your database.
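The scaling test can be expressed as a small Python sketch. The function names, the sample throughput figures, and the 10 percent tolerance are hypothetical choices for illustration and are not PowerCenter settings.

```python
def scaling_efficiency(single_partition_rate, partitioned_rate, partitions):
    """Ratio of measured throughput to ideal linear scaling (1.0 = linear)."""
    ideal = single_partition_rate * partitions
    return partitioned_rate / ideal

def scales_linearly(single_partition_rate, partitioned_rate, partitions,
                    tolerance=0.1):
    """True when measured throughput is within `tolerance` of linear scaling."""
    efficiency = scaling_efficiency(
        single_partition_rate, partitioned_rate, partitions
    )
    return efficiency >= 1.0 - tolerance
```

For example, if one partition reads 5,000 rows per second and two partitions together read 9,600 rows per second, the efficiency is 0.96 and the reader scales close to linearly. If two partitions together read only 6,000 rows per second, the efficiency is 0.6, which suggests the database needs tuning.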

Some databases may have specific options that must be set to enable parallel queries. You should check your individual database manual for these options. If these options are off, the PowerCenter Server runs multiple partition SELECT statements serially.

You can also consider adding partitions to increase the speed of your query. Each database provides an option to separate the data into different tablespaces. If your database allows it, you can use the SQL override feature to provide a query that extracts data from a single partition.

To maximize a single-sorted query on your database, you need to look at options that enable parallelization. There are many options in each database that may increase the speed of your query.

Here are some configuration options to look for in your source database:

♦ Check for configuration parameters that perform automatic tuning. For example, Oracle has a parameter called parallel_automatic_tuning.

♦ Make sure intra-parallelism (the ability to run multiple threads on a single query) is enabled. For example, on Oracle you should look at parallel_adaptive_multi_user. On DB2, you should look at intra_parallel.

♦ Maximum number of parallel processes that are available for parallel executions. For example, on Oracle, you should look at parallel_max_servers. On DB2, you should look at max_agents.

♦ Size for various resources used in parallelization. For example, Oracle has parameters such as large_pool_size, shared_pool_size, hash_area_size, parallel_execution_message_size, and optimizer_percent_parallel. DB2 has configuration parameters such as dft_fetch_size, fcm_num_buffers, and sort_heap.

♦ Degrees of parallelism (may occur as either a database configuration parameter or an option on the table or query). For example, Oracle has parameters parallel_threads_per_cpu and optimizer_percent_parallel. DB2 has configuration parameters such as dft_prefetch_size, dft_degree, and max_query_degree.

♦ Turn off options that may affect your database scalability. For example, disable archive logging and timed statistics on Oracle.

Note: The above examples are not a comprehensive list of all the tuning options available to you on the databases. Check your individual database documentation for all performance tuning configuration parameters available.

Optimizing the Target Database for Partitioning

If you have a mapping with multiple partitions, you want the throughput for each partition to be the same as the throughput for a single-partition session. If you do not see this correlation, then your database is probably inserting rows serially.

To make sure that your database inserts rows in parallel, check the following configuration options in your target database:

♦ Look for a configuration option that needs to be set explicitly to enable parallel inserts. For example, Oracle has db_writer_processes, and DB2 has max_agents (some databases may have this enabled by default).


♦ Consider partitioning your target table. If it is possible, try to have each partition write to a single database partition. You can use the Router transformation to do this. Also, look into having the database partitions on separate disks to prevent I/O contention among the pipeline partitions.

♦ Turn off options that may affect your database scalability. For example, disable archive logging and timed statistics on Oracle.



Appendix A

Session Properties Reference

This appendix contains a listing of settings in the session properties. These settings are grouped by the following tabs:

♦ General Tab, 668

♦ Properties Tab, 670

♦ Config Object Tab, 675

♦ Mapping Tab (Transformations View), 681

♦ Mapping Tab (Partitions View), 705

♦ Components Tab, 710

♦ Metadata Extensions Tab, 718


General Tab

By default, the General tab appears when you edit a session task.

Figure A-1 displays the General tab:

On the General tab you can rename the session task and enter a description for the session task.

Table A-1 describes settings on the General tab:

Figure A-1. General Tab

Table A-1. General Tab

General Tab Options Required/Optional Description


Rename Optional The Rename button allows you to enter a new name for the session task.

Description Optional You can enter a description for the session task in the Description field.

Mapping name Required The name of the mapping associated with the session task.

Server Required The name of the server associated with the session task.

Fail Parent if this task fails*

Optional Fails the parent worklet or workflow if this task fails.


Fail parent if this task does not run*

Optional Fails the parent worklet or workflow if this task does not run.

Disable this task* Optional Disables the task.

Treat the input links as AND or OR*

Required Runs the task when all or one of the input link conditions evaluate to True.

*Appears only in the Workflow Designer.


Properties Tab

On the Properties tab you can configure the following settings:

♦ General Options. General Options settings allow you to configure session log file name, session log file directory, parameter filename and other general session settings. For more information, see “General Options Settings” on page 670.

♦ Performance. The Performance settings allow you to increase memory size, collect performance details, and set configuration parameters. For more information, see “Performance Settings” on page 673.

General Options Settings

You can configure General Options settings on the Properties tab. You can enter session log file name, session log file directory, and other general session settings.

Figure A-2 displays the General Options settings on the Properties tab:

Figure A-2. Properties Tab - General Options Settings


Table A-2 describes the General Options settings on the Properties tab:

Table A-2. Properties Tab - General Options Settings

General Options Settings Required/Optional Description


Session Log File Name

Optional By default, the PowerCenter Server uses the session name for the log file name: s_mapping name.log. For a debug session, it uses DebugSession_mapping name.log. Optionally enter a file name, a file name and directory, or use the $PMSessionLogFile session parameter. The PowerCenter Server appends information in this field to that entered in the Session Log File Directory field. For example, if you have “C:\session_logs\” in the Session Log File Directory field and enter “logname.txt” in the Session Log File Name field, the PowerCenter Server writes logname.txt to the C:\session_logs\ directory. You can also use the $PMSessionLogFile session parameter to represent the name of the session log or the name and location of the session log. For details on session parameters, see “Session Parameters” on page 495.

Session Log File Directory

Required Designates a location for the session log file. By default, the PowerCenter Server writes the log file in the server variable directory, $PMSessionLogFileDir. If you enter a full directory and file name in the Session Log File Name field, clear this field.

Parameter File Name

Optional Designates the name and directory for the parameter file. Use the parameter file to define session parameters. You can also use it to override values of mapping parameters and variables. For details on session parameters, see “Session Parameters” on page 495. For details on mapping parameters and variables, see “Mapping Parameters and Variables” in the Designer Guide.

Enable Test Load Optional You can configure the PowerCenter Server to perform a test load. With a test load, the PowerCenter Server reads and transforms data without writing to targets. The PowerCenter Server generates all session files, and performs all pre- and post-session functions, as if running the full session. The PowerCenter Server writes data to relational targets, but rolls back the data when the session completes. For all other target types, such as flat file and SAP BW, the PowerCenter Server does not write data to the targets. Enter the number of source rows you want to test in the Number of Rows to Test field. You cannot perform a test load on sessions using XML sources. Note: You can perform a test load when you configure a session for normal mode. If you configure the session for bulk mode, the session fails.


Number of Rows to Test Optional Enter the number of source rows you want the PowerCenter Server to test load. The PowerCenter Server reads the exact number you configure for the test load. You cannot perform a test load when you run a session against a mapping that contains XML sources.


$Source Connection Value

Optional Enter the database connection you want the PowerCenter Server to use for the $Source variable. Choose a relational or application database connection. You can also choose a $DBConnection parameter. You can use the $Source variable in Lookup and Stored Procedure transformations to specify the database location for the lookup table or stored procedure. If you use $Source in a mapping, you can specify the database location in this field to ensure the PowerCenter Server uses the correct database connection to run the session. If you use $Source in a mapping, but do not specify a database connection in this field, the PowerCenter Server determines which database connection to use when it runs the session. If it cannot determine the database connection, it fails the session. For more information, see “Lookup Transformation” and “Stored Procedure Transformation” in the Transformation Guide.

$Target Connection Value

Optional Enter the database connection you want the PowerCenter Server to use for the $Target variable. Choose a relational or application database connection. You can also choose a $DBConnection parameter. You can use the $Target variable in Lookup and Stored Procedure transformations to specify the database location for the lookup table or stored procedure. If you use $Target in a mapping, you can specify the database location in this field to ensure the PowerCenter Server uses the correct database connection to run the session. If you use $Target in a mapping, but do not specify a database connection in this field, the PowerCenter Server determines which database connection to use when it runs the session. If it cannot determine the database connection, it fails the session. For more information, see “Lookup Transformation” and “Stored Procedure Transformation” in the Transformation Guide.

Treat Source Rows As

Required Indicates how the PowerCenter Server treats all source rows. If the mapping for the session contains an Update Strategy transformation or a Custom transformation configured to set the update strategy, the default option is Data Driven. When you select Data Driven and you load to either a Microsoft SQL Server or Oracle database, you must use a normal load. If you bulk load, the PowerCenter Server fails the session.

Commit Type Required Determines whether the PowerCenter Server uses a source-based, target-based, or user-defined commit. You can choose source- or target-based commit if the mapping has no Transaction Control transformation or only ineffective Transaction Control transformations. By default, the PowerCenter Server performs a target-based commit. A User-Defined commit is enabled by default if the mapping has effective Transaction Control transformations. For details on Commit Intervals, see “Setting Commit Properties” on page 292.

Commit Interval Required In conjunction with the selected commit interval type, indicates the number of rows. By default, the PowerCenter Server uses a commit interval of 10,000 rows. This option is not available for user-defined commit.
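The way the Session Log File Directory and Session Log File Name fields combine can be sketched in Python. This is an illustration only: the function name is hypothetical, and plain string concatenation with no separator inserted is an assumption based on the example in Table A-2, where the directory value already ends with a backslash.

```python
def resolve_session_log_path(log_file_directory, log_file_name):
    """Combine the two session log fields into one path.

    Assumption: the server simply appends the file-name field to the
    directory field. If the file-name field already holds a full directory
    and file name, the directory field is expected to be cleared (empty).
    """
    return (log_file_directory or "") + log_file_name


if __name__ == "__main__":
    # Values taken from the example in Table A-2.
    print(resolve_session_log_path("C:\\session_logs\\", "logname.txt"))
    # Directory field cleared; hypothetical full path in the file-name field.
    print(resolve_session_log_path("", "D:\\logs\\nightly.log"))
```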





Performance Settings

You can configure performance settings on the Properties tab. In Performance settings you can increase memory size, collect performance details, and set configuration parameters.

Figure A-3 displays the Performance settings on the Properties tab:

Commit On End Of File

Required By default, this option is enabled and the PowerCenter Server performs a commit at the end of the file. Clear this option if you want to roll back open transactions. This option is enabled by default for a target-based commit. You cannot disable it.

Rollback Transactions on Errors

Optional For source-based commit, the PowerCenter Server rolls back the transaction at the next commit point when it encounters a non-fatal writer error. For user-defined commit, the PowerCenter Server rolls back the transaction at the next commit point when it encounters a non-fatal error. This option is not available for target-based commit.

*Tip: When you bulk load to Microsoft SQL Server or Oracle targets, define a large commit interval. Microsoft SQL Server and Oracle start a new bulk load transaction after each commit. Increasing the commit interval reduces the number of bulk load transactions and increases performance.

Figure A-3. Properties Tab - Performance Settings





Table A-3 describes the Performance settings on the Properties tab:

Table A-3. Properties Tab - Performance Settings

Performance Settings Required/Optional Description


DTM Buffer Size Required The amount of memory allocated to the session from the DTM process. By default, the Workflow Manager allocates 12 MB for DTM buffer memory. If a session contains large amounts of character data and you configure it to run in Unicode mode, increase the DTM Buffer size to 24 MB. Note: If a source contains a large binary object with a precision larger than the allocated DTM buffer size, then increase the DTM buffer size to increase the buffer memory. If you do not increase the DTM buffer memory, the session will fail. For information on improving session performance, see “Performance Tuning” on page 635.

Collect Performance Data

Optional When selected, the PowerCenter Server creates session performance details. Use these details to help determine how you can improve session performance. For more information, see “Performance Tuning” on page 635.

Incremental Aggregation

Optional Select the Incremental Aggregation option if you want the PowerCenter Server to perform incremental aggregation. For details, see “Using Incremental Aggregation” on page 573.

Reinitialize Aggregate Cache

Optional Select Reinitialize Aggregate Cache if the session is an incremental aggregation session and you want to overwrite existing aggregate files. After a single session run, to return to a normal incremental aggregation session run, you must clear this option. For details, see “Using Incremental Aggregation” on page 573.

Enable High Precision

Optional When selected, the PowerCenter Server processes the Decimal datatype to a precision of 28. If a session does not use the Decimal datatype, leave this setting clear. For details on using the Decimal datatype with high precision, see “Handling High Precision Data” on page 204.

Session Retry On Deadlock

Optional Select this option if you want the PowerCenter Server to retry target writes on deadlock. You can only use Session Retry on Deadlock for sessions configured for normal load. This option is disabled for bulk mode. You can configure the PowerCenter Server to set the number of deadlock retries and the deadlock sleep time period.

Session Sort Order Required Specify a sort order for the session. The session properties display all sort orders associated with the PowerCenter Server code page. When the PowerCenter Server runs in Unicode mode, it sorts character data in the session using the selected sort order. When the PowerCenter Server runs in ASCII mode, it ignores this setting and uses a binary sort order to sort character data.


Config Object Tab

The Config Object tab displays settings such as session log settings, error handling settings, and other advanced properties. You can override properties in the default session configuration in the Config Object tab. Or, you can choose a session configuration object you already created in the Workflow Manager and override its properties.

Click the Open button in the Config Name field to choose the session configuration object you want to override.

You can configure the following settings in the Config Object tab:

♦ Advanced. Advanced settings allow you to configure constraint-based loading, lookup caches, and buffer sizes. For more information, see “Advanced Settings” on page 675.

♦ Log Options. Log options allow you to configure how you want to save the session log. By default, the PowerCenter Server saves only the current session log. For more information, see “Log Options Settings” on page 677.

♦ Error Handling. Error Handling settings allow you to determine if the session fails or continues when it encounters pre-session command errors, stored procedure errors, or a specified number of session errors. For more information see, “Error Handling Settings” on page 678.

Advanced Settings

Advanced settings allow you to configure constraint-based loading, lookup caches, and buffer sizes.


Figure A-4 displays the Advanced settings on the Config Object tab:

Table A-4 describes the Advanced settings of the Config Object tab:

Figure A-4. Config Object Tab - Advanced Settings

Table A-4. Config Object Tab - Advanced Settings

Advanced Settings Required/Optional Description


Constraint Based Load Ordering

Optional The PowerCenter Server loads targets based on primary key-foreign key constraints where possible.

Cache Lookup() Function

Optional If selected, the PowerCenter Server caches PowerMart 3.5 LOOKUP functions in the mapping, overriding mapping-level LOOKUP configurations. If not selected, the PowerCenter Server performs lookups on a row-by-row basis, unless otherwise specified in the mapping.


Log Options Settings

Log options allow you to configure how you want to save the session log. By default, the PowerCenter Server saves only the current session log.

Figure A-5 displays the Log Options settings on the Config Object tab:

Default Buffer Block Size

Optional This setting is performance related. For details on performance tuning, see “Performance Tuning” on page 635. Note: The session must have enough buffer blocks to initialize. The minimum number of buffer blocks must be greater than the total number of sources (Source Qualifiers, Normalizers for COBOL sources) and targets. The number of buffer blocks in a session = DTM Buffer Size / Buffer Block Size. Default settings create enough buffer blocks for 83 sources and targets. If the session contains more than 83, you might need to increase DTM Buffer Size or decrease Default Buffer Block Size.

Line Sequential Buffer Length

Optional Affects the way the PowerCenter Server reads flat files. Increase this setting from the default of 1024 bytes per line only if source flat file records are larger than 1024 bytes.
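The buffer block formula in the Default Buffer Block Size note can be worked through with a short Python sketch. The function names and the byte values are illustrative assumptions (in particular, the 64,000-byte block size is a placeholder, not a value stated in this table), and the sketch does not try to reproduce the figure of 83 sources and targets, which depends on defaults and server behavior not described here.

```python
def buffer_block_count(dtm_buffer_size, buffer_block_size):
    """Number of buffer blocks = DTM Buffer Size / Buffer Block Size."""
    if buffer_block_size <= 0:
        raise ValueError("buffer block size must be positive")
    return dtm_buffer_size // buffer_block_size


def has_enough_blocks(dtm_buffer_size, buffer_block_size, sources, targets):
    """Check the rule that the block count must be greater than the
    total number of sources and targets."""
    blocks = buffer_block_count(dtm_buffer_size, buffer_block_size)
    return blocks > sources + targets


if __name__ == "__main__":
    # Hypothetical sizes in bytes, chosen only to illustrate the arithmetic.
    dtm_buffer_size = 12_000_000
    buffer_block_size = 64_000
    print(buffer_block_count(dtm_buffer_size, buffer_block_size))
    print(has_enough_blocks(dtm_buffer_size, buffer_block_size, 50, 50))
```

Raising the DTM buffer size or lowering the block size both increase the block count, which matches the two remedies the note suggests.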

Figure A-5. Config Object Tab - Log Option Settings




Table A-5 displays the Log Options settings of the Config Object tab:

Error Handling Settings

Error Handling settings allow you to determine if the session fails or continues when it encounters pre-session command errors, stored procedure errors, or a specified number of session errors.

Table A-5. Config Object Tab - Log Options Settings

Log Options Settings Required/Optional Description

Save Session Log By Required If you select Save Session Log by Timestamp, the PowerCenter Server saves all session logs, appending a timestamp to each log. If you select Save Session Log by Runs, the PowerCenter Server saves a designated number of session logs. Configure the number of sessions in the Save Session Log for These Runs option. You can also use the $PMSessionLogCount server variable to save the configured number of session logs for the PowerCenter Server. For details on these options, see “Configuring Session Logs” on page 469.

Save Session Log for These Runs

Required The number of historical session logs you want the PowerCenter Server to save. The PowerCenter Server saves the number of historical logs you specify, plus the most recent session log. Therefore, if you specify 5 runs, the PowerCenter Server saves the most recent session log, plus historical logs 0-4, for a total of 6 logs. You can specify up to 2,147,483,647 historical logs. If you specify 0 logs, the PowerCenter Server saves only the most recent session log.
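The counting rule for Save Session Log for These Runs is easy to get off by one, so here is a minimal Python sketch of the arithmetic. The function and constant names are hypothetical; only the rule itself (historical logs plus the most recent log, with an upper limit of 2,147,483,647) comes from the table above.

```python
MAX_HISTORICAL_LOGS = 2_147_483_647  # upper limit stated in Table A-5


def total_session_logs(historical_runs):
    """Return how many log files are kept for a given setting.

    The server keeps the requested number of historical logs plus the
    most recent session log.
    """
    if not 0 <= historical_runs <= MAX_HISTORICAL_LOGS:
        raise ValueError("historical_runs is outside the supported range")
    return historical_runs + 1


if __name__ == "__main__":
    print(total_session_logs(5))  # 6: most recent log plus historical logs 0-4
    print(total_session_logs(0))  # 1: only the most recent log
```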


Figure A-6 displays the Error Handling settings on the Config Object tab:

Table A-6 describes the Error handling settings of the Config Object tab:

Figure A-6. Config Object Tab - Error Handling Settings

Table A-6. Config Object Tab - Error Handling Settings

Error Handling Settings Required/Optional Description


Stop On Errors Optional Indicates how many non-fatal errors the PowerCenter Server can encounter before it stops the session. Non-fatal errors include reader, writer, and DTM errors. Enter the number of non-fatal errors you want to allow before stopping the session. The PowerCenter Server maintains an independent error count for each source, target, and transformation. If you specify 0, non-fatal errors do not cause the session to stop. Optionally use the $PMSessionErrorThreshold server variable to stop on the configured number of errors for the PowerCenter Server.

Override Tracing Optional Overrides tracing levels set on a transformation level. Selecting this option enables a menu from which you choose a tracing level: None, Terse, Normal, Verbose Initialization, or Verbose Data. For details on tracing levels, see “Configuring Session Logs” on page 469.


On Stored Procedure Error

Optional Required if the session uses pre- or post-session stored procedures. If you select Stop Session, the PowerCenter Server stops the session on errors executing a pre-session or post-session stored procedure. If you select Continue Session, the PowerCenter Server continues the session regardless of errors executing pre-session or post-session stored procedures. By default, the PowerCenter Server stops the session on Stored Procedure error and marks the session failed.

On Pre-Session Command Task Error

Optional Required if the session has pre-session shell commands. If you select Stop Session, the PowerCenter Server stops the session on errors executing pre-session shell commands. If you select Continue Session, the PowerCenter Server continues the session regardless of errors executing pre-session shell commands. By default, the PowerCenter Server stops the session upon error.

On Pre-Post SQL Error Optional Required if the session uses pre- or post-session SQL. If you select Stop Session, the PowerCenter Server stops the session on errors executing pre-session or post-session SQL. If you select Continue, the PowerCenter Server continues the session regardless of errors executing pre-session or post-session SQL. By default, the PowerCenter Server stops the session upon pre- or post-session SQL error and marks the session failed.

Enable Recovery Optional Enables recovery for the session. For details on recovery, see “Recovering Data” on page 295.

Error Log Type Required Specifies the type of error log to create. You can specify relational, file, or no log. By default, the Error Log Type is set to none.

Error Log DB Connection Optional Specifies the database connection for a relational error log.

Error Log Table Name Prefix

Optional Specifies table name prefix for a relational error log. Oracle and Sybase have a 30 character limit for table names. If a table name exceeds 30 characters, the session fails.

Error Log File Directory Optional Specifies the directory where errors are logged. By default, the error log file directory is $PMBadFilesDir\.

Error Log File Name Optional Specifies error log file name. By default, the error log file name is PMError.log.

Log Row Data Optional Specifies whether or not to log row data. By default, the check box is clear and row data is not logged.

Log Source Row Data Optional Specifies whether or not to log source row data. By default, the check box is clear and source row data is not logged.

Data Column Delimiter Optional Delimiter for string type source row data and transformation group row data. By default, the PowerCenter Server uses a pipe ( | ) delimiter. Verify that you do not use the same delimiter for the row data as the error logging columns. If you use the same delimiter, you may find it difficult to read the error log file.
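The Stop On Errors behavior described in Table A-6 can be pictured with the Python sketch below. It is a simplified model, not the server's implementation: the class name is hypothetical, and it assumes the session stops as soon as any single source, target, or transformation reaches the configured count. The table states only that the counts are independent and that 0 disables stopping.

```python
from collections import Counter


class ErrorThreshold:
    """Track non-fatal errors per component against a Stop On Errors value."""

    def __init__(self, stop_on_errors):
        if stop_on_errors < 0:
            raise ValueError("stop_on_errors cannot be negative")
        self.stop_on_errors = stop_on_errors
        self.counts = Counter()  # one independent count per component

    def record(self, component):
        """Count one non-fatal error; return True if the session should stop."""
        self.counts[component] += 1
        if self.stop_on_errors == 0:
            # A value of 0 means non-fatal errors never stop the session.
            return False
        # Assumption: stop when this component's own count reaches the limit.
        return self.counts[component] >= self.stop_on_errors


if __name__ == "__main__":
    threshold = ErrorThreshold(stop_on_errors=2)
    print(threshold.record("SQ_orders"))  # first error on a hypothetical source
    print(threshold.record("T_orders"))   # first error on a hypothetical target
    print(threshold.record("SQ_orders"))  # second error on the same source
```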




Mapping Tab (Transformations View)

In the Transformations view of the Mapping tab, you can configure settings for connections, sources, targets, and transformations.

You can configure the following nodes:

♦ Connections

♦ Sources

♦ Targets

♦ Transformations

Connections Node

The Connections node displays the source, target, lookup, stored procedure, FTP, external loader, and queue connections. You can choose connection types and connection values. You can also edit connection object values.

Figure A-7 displays the Connections settings on the Mapping tab:

Figure A-7. Mapping Tab - Connections Settings


Table A-7 describes the Connections settings on the Mapping tab:

Table A-7. Mapping Tab - Connections Settings

Connections Node Settings Required/Optional Description


Type Required Enter the connection type for relational and non-relational sources and targets. Specifies Relational for relational sources and targets. You can choose the following connection types for flat file, XML, and MQSeries sources and targets:

- Queue. Select this connection type to access an MQSeries source if you are using MQ Source Qualifiers. For static MQSeries targets, set the connection type to FTP or Queue. For dynamic MQSeries targets, the connection type is set to Queue. MQSeries connections must be defined in the Workflow Manager prior to configuring sessions. For more information, see the PowerCenter Connect for IBM MQSeries User and Administrator Guide.

- Loader. Select this connection type to use the External Loader to load output files to Teradata, Oracle, DB2, or Sybase IQ databases. If you select this option, select a configured loader connection in the Value column. To use this option, you must use a mapping with a relational target definition and choose File as the writer type on the Writers tab for the relational target instance. As the PowerCenter Server completes the session, it uses an external loader to load target files to the Oracle, Sybase IQ, DB2, or Teradata database. You cannot choose external loader for flat file or XML target definitions in the mapping. Note to Oracle 8 users: If you configure a session to write to an Oracle 8 external loader target table in bulk mode with NOT NULL constraints on any columns, the session may write the null character into a NOT NULL column if the mapping generates a NULL output. For details on using the external loader feature, see “External Loading” on page 523.

- FTP. Select this connection type to use FTP to access the source/target directory for flat file and XML sources/targets. If you select this option, select a configured FTP connection in the Value column. FTP connections must be defined in the Workflow Manager prior to configuring sessions. For details on using FTP, see “Using FTP” on page 559.

- None. Choose None when you want to read from a local flat file or XML file, or if you are using an associated source for an MQSeries session.

The Type column also lists the connections in the mapping, such as $Source connection value and $Target connection value. You can also configure connection information for Lookups and Stored Procedures.


Sources Node

The Sources node lists the sources used in the session and displays their settings. If you want to view and configure the settings of a specific source, select the source from the list.

You can configure the following settings:

♦ Readers. The Readers settings display the reader the PowerCenter Server uses with each source instance. For more information, see “Readers Settings” on page 684.

♦ Connections. The Connections settings allow you to configure connections for the sources. For more information, see “Connections Settings” on page 684.

♦ Properties. The Properties settings allow you to configure the source properties. For more information, see “Properties Settings” on page 686.

Partitions N/A Displays the partitions if the session is partitioned.

Value Required Enter a source and target connection based on the value you choose in the Type column. You can also specify the $Source and $Target connection value:

- $Source connection value. Enter the database connection you want the PowerCenter Server to use for the $Source variable. Choose a relational or application database connection. You can also choose a $DBConnection parameter. You can use the $Source variable in Lookup and Stored Procedure transformations to specify the database location for the lookup table or stored procedure. If you use $Source in a mapping, you can specify the database location in this field to ensure the PowerCenter Server uses the correct database connection to run the session. If you use $Source in a mapping, but do not specify a database connection in this field, the PowerCenter Server determines which database connection to use when it runs the session. If it cannot determine the database connection, it fails the session. For more information, see the Transformation Guide.

- $Target connection value. Enter the database connection you want the PowerCenter Server to use for the $Target variable. Choose a relational or application database connection. You can also choose a $DBConnection parameter. You can use the $Target variable in Lookup and Stored Procedure transformations to specify the database location for the lookup table or stored procedure. If you use $Target in a mapping, you can specify the database location in this field to ensure the PowerCenter Server uses the correct database connection to run the session. If you use $Target in a mapping, but do not specify a database connection in this field, the PowerCenter Server determines which database connection to use when it runs the session. If it cannot determine the database connection, it fails the session. For more information, see the Transformation Guide.

You can also specify the lookup and stored procedure location information value, if your mapping has lookups or stored procedures.




Readers Settings

You can view the reader the PowerCenter Server uses with each source instance. The Workflow Manager specifies the necessary reader for each source instance. For relational sources the reader is Relational Reader, and for file sources it is File Reader.

Figure A-8 displays the Readers settings on the Mapping tab (Sources node):

Connections Settings

You can configure the connections the PowerCenter Server uses with each source instance.

Figure A-8. Mapping Tab - Sources Node - Readers Settings


Figure A-9 displays the Connections settings on the Mapping tab (Sources node):

Table A-8 describes the Connections settings on the Mapping tab (Sources node):

Figure A-9. Mapping Tab - Sources Node - Connections Settings

Table A-8. Mapping Tab - Sources Node - Connections Settings



Type Required Enter the connection type for relational and non-relational sources. Specifies Relational for relational sources. You can choose the following connection types for flat file, XML, and MQSeries sources:

- Queue. Select this connection type to access an MQSeries source if you are using MQ Source Qualifiers. MQSeries connections must be defined in the Workflow Manager prior to configuring sessions. For more information, see the PowerCenter Connect for IBM MQSeries User and Administrator Guide.

- FTP. Select this connection type to use FTP to access the source directory for flat file and XML sources. If you want to extract data from a flat file or XML source using FTP, you must specify an FTP connection when you configure source options. If you select this option, select a configured FTP connection in the Value column. FTP connections must be defined in the Workflow Manager prior to configuring sessions. For details on using FTP, see “Using FTP” on page 559.

- None. Choose None when you want to read from a local flat file or XML file, or if you are using an associated source for an MQSeries session.

Value Required Enter a source connection based on the value you choose in the Type column.


Properties Settings

Click the Properties settings to define source property information. The Workflow Manager displays properties for both relational and file sources.

Figure A-10 displays the Properties settings on the Mapping tab (Sources node):

Table A-9 describes Properties settings on the Mapping tab for relational sources:

Figure A-10. Mapping Tab - Sources Node - Properties Settings

Table A-9. Mapping Tab - Sources Node - Properties Settings (Relational Sources)

Relational Source Options


Owner Name Optional Specifies the table owner name.

User Defined Join Optional Specifies the condition used to join data from multiple sources represented in the same Source Qualifier transformation. For more information about user defined join, see “Source Qualifier Transformation” in the Transformation Guide.

Tracing Level N/A Specifies the amount of detail included in the session log when you run a session containing this transformation. You can view the value of this attribute when you click Show all properties. For more information about tracing level, see “Setting Tracing Levels” on page 473.

Select Distinct Optional Selects unique rows.


Table A-10 describes the Properties settings on the Mapping tab for file sources:

Pre SQL Optional Pre-session SQL commands to run against the source database before the PowerCenter Server reads the source. For more information about pre-session SQL, see “Using Pre- and Post-Session SQL Commands” on page 186.

Post SQL Optional Post-session SQL commands to run against the source database after the PowerCenter Server writes to the target. For more information about post-session SQL, see “Using Pre- and Post-Session SQL Commands” on page 186.

Sql Query Optional Defines a custom query that replaces the default query the PowerCenter Server uses to read data from sources represented in this Source Qualifier. A custom query overrides entries for a custom join or a source filter. For more information, see “Overriding the SQL Query” on page 216.

Source Filter Optional Specifies the filter condition the PowerCenter Server applies when querying records. For more information, see “Source Qualifier Transformation” in the Transformation Guide.

Table A-10. Mapping Tab - Sources Node - Properties Settings (File Sources)

File Source Options



Source File Directory Optional Enter the directory name in this field. By default, the PowerCenter Server looks in the server variable directory, $PMSourceFileDir, for file sources. If you specify both the directory and file name in the Source Filename field, clear this field. The PowerCenter Server concatenates this field with the Source Filename field when it runs the session. You can also use the $InputFileName session parameter to specify the file directory. For details on session parameters, see “Session Parameters” on page 495.

Source Filename Required Enter the file name, or file name and path. Optionally use the $InputFileName session parameter for the file name. The PowerCenter Server concatenates this field with the Source File Directory field when it runs the session. For example, if you have “C:\data\” in the Source File Directory field, then enter “filename.dat” in the Source Filename field. When the PowerCenter Server begins the session, it looks for “C:\data\filename.dat”. By default, the Workflow Manager enters the file name configured in the source definition. For details on session parameters, see “Session Parameters” on page 495.




Setting File Properties for Sources

Configure flat file properties by clicking the Set File Properties link in the Sources node. You can define properties for both fixed-width and delimited flat file sources.

You can configure flat file properties for non-reusable sessions in the Workflow Designer and for reusable sessions in the Task Developer.

Figure A-11 shows the Flat Files dialog box that appears when you click Set File Properties:

Select the file type (fixed-width or delimited) you want to configure and click Advanced.

Configuring Fixed-Width Properties for Sources

To edit the fixed-width properties, select Fixed Width in the Flat Files dialog box and click the Advanced button. The Fixed Width Properties dialog box appears.

Source Filetype Required Allows you to configure multiple file sources using a file list. Indicates whether the source file contains the source data, or a list of files with the same file properties. Choose Direct if the source file contains the source data. Choose Indirect if the source file contains a list of files. When you select Indirect, the PowerCenter Server finds the file list then reads each listed file when it executes the session. For details on file lists, see “Using a File List” on page 230.

Set File Properties Optional Allows you to configure the file properties. For more information, see “Setting File Properties for Sources” on page 688.

Datetime Format* N/A Displays the datetime format for datetime fields.

Thousand Separator* N/A Displays the thousand separator for numeric fields.

Decimal Separator* N/A Displays the decimal separator for numeric fields.

*You can view the value of this attribute when you click Show all properties. This attribute is read-only. For more information, see the Designer Guide.
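The way the Source File Directory, Source Filename, and Source Filetype settings combine can be sketched in Python. This is an illustration only, not PowerCenter code: the function names and the `read_text` callback are hypothetical, and the file list format (one file name per line) is an assumption.

```python
from typing import Callable, List


def resolve_source_path(source_file_directory: str, source_filename: str) -> str:
    """Concatenate the directory field and the file name field."""
    return source_file_directory + source_filename


def files_to_read(filetype: str, directory: str, filename: str,
                  read_text: Callable[[str], str]) -> List[str]:
    """Return the data files a session would read.

    Direct: the source file itself holds the data.
    Indirect: the source file holds a list of files
    (assumed here to be one file name per line).
    """
    path = resolve_source_path(directory, filename)
    if filetype == "Direct":
        return [path]
    if filetype == "Indirect":
        listing = read_text(path)  # hypothetical callback that returns the file contents
        return [line.strip() for line in listing.splitlines() if line.strip()]
    raise ValueError("filetype must be 'Direct' or 'Indirect'")
```

Passing the reader as a callback keeps the sketch independent of any particular file system, which also makes it easy to try with in-memory text.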

Figure A-11. Flat Files Dialog Box for Sources




Note: Edit these settings only if you need to override those configured in the source definition.

Figure A-12 displays the Fixed Width Properties dialog box for flat file sources:

Table A-11 describes the options you define in the Fixed Width Properties dialog box for sources:

Figure A-12. Fixed Width Properties

Table A-11. Fixed-Width Properties for File Sources



Null Character: Text/Binary Required Indicates the character representing a null value in the file. This can be any valid character in the file code page, or any binary value from 0 to 255. For more information about specifying null characters, see “Null Character Handling” on page 227.

Repeat Null Character Optional If selected, the PowerCenter Server reads repeat NULL characters in a single field as a single NULL value. If you do not select this option, the PowerCenter Server reads a single null character at the beginning of a field as a null field. Important: For multibyte code pages, Informatica recommends that you specify a single-byte null character if you are using repeating non-binary null characters. This ensures that repeating null characters fit into the column exactly. For more information about specifying null characters, see “Null Character Handling” on page 227.



Optional The PowerCenter Server skips the specified number of rows before reading the file. Use this to skip header rows. One row may contain multiple records. If you select the Line Sequential File Format option, the PowerCenter Server ignores this option. You can enter any integer from zero to 2147483647.


Configuring Delimited File Properties for Sources

To edit the delimited properties, select Delimited in the Flat Files dialog box and click the Advanced button. The Delimited File Properties dialog box appears.

Note: Edit these settings only if you need to override those configured in the source definition.

Figure A-13 displays the Delimited File Properties dialog box for flat file sources:

Number of Bytes to Skip Between Records Optional The PowerCenter Server skips the specified number of bytes between records. For example, you have an ASCII file on Windows with one record on each line, and a carriage return and line feed appear at the end of each line. If you want the PowerCenter Server to skip these two single-byte characters, enter 2. If you have an ASCII file on UNIX with one record for each line, ending in a carriage return, skip the single character by entering 1.

Strip Trailing Blanks Optional If selected, the PowerCenter Server strips trailing blank spaces from records before passing them to the Source Qualifier transformation.

Line Sequential File Format Optional Select this option if the file uses a carriage return at the end of each record, shortening the final column.
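The effect of Number of Bytes to Skip Between Records can be sketched as follows. This is a simplified Python illustration, not the PowerCenter reader: the function name, the column widths, and the sample data are assumptions, and the sketch handles single-byte ASCII data only.

```python
from typing import List, Sequence


def read_fixed_width(data: bytes, column_widths: Sequence[int],
                     bytes_to_skip: int = 0) -> List[List[str]]:
    """Split a byte string into fixed-width records and fields.

    After each record, bytes_to_skip bytes are skipped, for example
    2 for a carriage return plus line feed, or 1 for a single
    end-of-line character.
    """
    record_length = sum(column_widths)
    records = []
    position = 0
    while position + record_length <= len(data):
        raw = data[position:position + record_length]
        fields = []
        offset = 0
        for width in column_widths:
            fields.append(raw[offset:offset + width].decode("ascii"))
            offset += width
        records.append(fields)
        # Jump over the record and the separator bytes that follow it.
        position += record_length + bytes_to_skip
    return records
```

With a Windows-style file, `bytes_to_skip=2` steps over the carriage return and line feed so that the next record starts at the correct offset.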

Figure A-13. Delimited Properties for File Sources





Table A-12 describes the options you can define in the Delimited File Properties dialog box for flat file sources:

Table A-12. Delimited Properties for File Sources



Delimiters Required Character used to separate columns of data in the source file. Use the Browse button to the right of this field to enter a different delimiter. Delimiters can be either printable or single-byte unprintable characters, and must be different from the escape character and the quote character (if selected). You cannot select unprintable multibyte characters as delimiters. The delimiter must be in the same code page as the flat file code page.

Optional Quotes Required Select None, Single, or Double. If you select a quote character, the PowerCenter Server ignores delimiter characters within the quote characters. Therefore, the PowerCenter Server uses quote characters to escape the delimiter. For example, a source file uses a comma as a delimiter and contains the following row: 342-3849, 'Smith, Jenna', 'Rockville, MD', 6. If you select the optional single quote character, the PowerCenter Server ignores the commas within the quotes and reads the row as four fields. If you do not select the optional single quote, the PowerCenter Server reads six separate fields. When the PowerCenter Server reads two optional quote characters within a quoted string, it treats them as one quote character. For example, the PowerCenter Server reads the following quoted string as I'm going tomorrow:

2353, 'I''m going tomorrow.', MD

Additionally, if you select an optional quote character, the PowerCenter Server only reads a string as a quoted string if the quote character is the first character of the field. Note: You can improve session performance if the source file does not contain quotes or escape characters.


Escape Character Optional Character immediately preceding a delimiter character embedded in an unquoted string, or immediately preceding the quote character in a quoted string. When you specify an escape character, the PowerCenter Server reads the delimiter character as a regular character (called escaping the delimiter or quote character). Note: You can improve session performance for mappings containing Sequence Generator transformations if the source file does not contain quotes or escape characters.

Remove Escape Character From Data Optional This option is selected by default. Clear this option to include the escape character in the output string.
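The Optional Quotes examples in Table A-12 can be reproduced with Python's standard `csv` module. This is an analogy, not the PowerCenter parser: the `parse_row` helper is hypothetical, and ignoring the space after each comma (`skipinitialspace=True`) is an assumption made so that the quote character counts as the first character of the field.

```python
import csv
from typing import List, Optional


def parse_row(line: str, quote: Optional[str] = None) -> List[str]:
    """Split one comma-delimited row, optionally honoring a quote character."""
    if quote is None:
        # No optional quote selected: every comma is a delimiter.
        reader = csv.reader([line], delimiter=",", quoting=csv.QUOTE_NONE,
                            skipinitialspace=True)
    else:
        # Quote selected: commas inside quotes are data, and a doubled
        # quote inside a quoted string is read as one quote character.
        reader = csv.reader([line], delimiter=",", quotechar=quote,
                            doublequote=True, skipinitialspace=True)
    return next(reader)


row = "342-3849, 'Smith, Jenna', 'Rockville, MD', 6"
print(len(parse_row(row, quote="'")))   # fields with single quote selected
print(len(parse_row(row)))              # fields with no quote selected
print(parse_row("2353, 'I''m going tomorrow.', MD", quote="'")[1])
```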


Targets Node

The Targets node lists the targets used in the session and displays their settings. If you want to view and configure the settings of a specific target, select the target from the list.

You can configure the following settings:

♦ Writers. The Writers settings display the writer the PowerCenter Server uses with each target instance. For more information, see “Writers Settings” on page 692.

♦ Connections. The Connections settings allow you to configure connections for the targets. For more information, see “Connections Settings” on page 693.

♦ Properties. The Properties settings allow you to configure the target properties. For more information, see “Properties Settings” on page 695.

Writers Settings

You can view and configure the writer the PowerCenter Server uses with each target instance. The Workflow Manager specifies the necessary writer for each target instance. For relational targets the writer is Relational Writer and for file targets it is File Writer.

Treat Consecutive Delimiters as One Optional By default, the PowerCenter Server reads pairs of delimiters as a null value. If selected, the PowerCenter Server reads any number of consecutive delimiter characters as one. For example, a source file uses a comma as the delimiter character and contains the following record: 56, , , Jane Doe. By default, the PowerCenter Server reads that record as four columns separated by three delimiters: 56, NULL, NULL, Jane Doe. If you select this option, the PowerCenter Server reads the record as two columns separated by one delimiter: 56, Jane Doe.


Optional The PowerCenter Server skips the specified number of rows before reading the file. Use this to skip title or header rows in the file.
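The Treat Consecutive Delimiters as One example can be sketched in Python. This is an illustration under stated assumptions, not the PowerCenter implementation: the `split_record` helper is hypothetical, and fields that contain only spaces are treated as empty so that the sample record `56, , , Jane Doe` behaves as the table describes.

```python
from typing import List, Optional


def split_record(record: str, delimiter: str = ",",
                 treat_consecutive_as_one: bool = False) -> List[Optional[str]]:
    """Split a delimited record into columns.

    By default an empty field between two delimiters becomes None (NULL).
    With treat_consecutive_as_one, runs of delimiters collapse into one,
    so empty fields disappear.
    """
    fields = [field.strip() for field in record.split(delimiter)]
    if treat_consecutive_as_one:
        return [field for field in fields if field != ""]
    return [field if field != "" else None for field in fields]


print(split_record("56, , , Jane Doe"))
print(split_record("56, , , Jane Doe", treat_consecutive_as_one=True))
```

Note that this simplification also drops empty fields at the start or end of a record, which the source text does not address.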





Figure A-14 displays the Writers settings on the Mapping tab (Targets node):

Table A-13 describes the Writers settings on the Mapping tab (Targets node):

Connections Settings

You can enter connection types and specific target database connections on the Targets node of the Mapping tab.

Figure A-14. Mapping Tab - Targets Node - Writers Settings

Table A-13. Mapping Tab - Targets Node - Writers Settings

Writers Setting


Writers Required For relational targets, choose Relational Writer or File Writer. When the target in the mapping is a flat file, an XML file, a SAP BW target, or MQ target, the Workflow Manager specifies the necessary writer in the session properties. When you choose File Writer for a relational target you can use an external loader to load data to this target. For more information, see “External Loading” on page 523. When you override a relational target to use the file writer, the Workflow Manager changes the properties for that target instance on the Properties settings. It also changes the connection options you can define on the Connections settings. After you override a relational target to use a file writer, define the file properties for the target. Click Set File Properties and choose the target to define. For more information, see “Configuring Fixed-Width Properties” on page 265 and “Configuring Delimited Properties” on page 266.


Figure A-15 displays the Connections settings on the Mapping tab (Targets node):

Figure A-15. Mapping Tab - Targets Node - Connections Settings


Table A-14 describes the Connections settings on the Mapping tab (Targets node):

Properties Settings

Click the Properties settings to define target property information. The Workflow Manager displays different properties for the different target types: relational, flat file, and XML.

Properties Settings for Relational Targets

You can configure the writer and object instance attributes for a relational target.

Table A-14. Mapping Tab - Targets Node - Connections Settings



Type Required Enter the connection type for non-relational targets. Specifies Relational for relational targets. You can choose the following connection types for flat file, XML, and MQ targets:

- FTP. Select this connection type to use FTP to access the target directory for flat file and XML targets. If you want to load data to a flat file or XML target using FTP, you must specify an FTP connection when you configure target options. If you select this option, select a configured FTP connection in the Value column. FTP connections must be defined in the Workflow Manager prior to configuring sessions. For details on using FTP, see “Using FTP” on page 559.

- External Loader. Select this connection type to use the External Loader to load output files to Teradata, Oracle, DB2, or Sybase IQ databases. If you select this option, select a configured loader connection in the Value column. To use this option, you must use a mapping with a relational target definition and choose File as the writer type on the Writers tab for the relational target instance. As the PowerCenter Server completes the session, it uses an external loader to load target files to the Oracle, Sybase IQ, DB2, or Teradata database. You cannot choose external loader for flat file or XML target definitions in the mapping. Note to Oracle 8 users: If you configure a session to write to an Oracle 8 external loader target table in bulk mode with NOT NULL constraints on any columns, the session may write the null character into a NOT NULL column if the mapping generates a NULL output. For details on using the external loader feature, see “External Loading” on page 523.

- Queue. Choose Queue when you want to output to an MQSeries message queue. If you select this option, select a configured MQ connection in the Value column. For more information, see the PowerCenter Connect for IBM MQSeries User and Administrator Guide.

- None. Choose None when you want to write to a local flat file or XML file.

Partitions N/A Displays the partitions if the session is partitioned.

Value Required Enter a target connection based on the value you choose in the Type column.


Figure A-16 displays the Properties settings on the Mapping tab for relational targets:

Figure A-16. Mapping Tab - Targets Node - Properties Settings (Relational)


Table A-15 describes the Properties settings on the Mapping tab for relational targets:

Table A-15. Mapping Tab - Targets Node - Properties Settings (Relational)


Target Load Type Required You can choose Normal or Bulk. If you select Normal, the PowerCenter Server loads targets normally. You can only choose Bulk when you load to Sybase, Oracle, or Microsoft SQL Server. If you select Bulk for a Sybase, Oracle, or Microsoft SQL Server target, Informatica invokes the bulk API with default settings, bypassing database logging. If you select Bulk for other database types, the PowerCenter Server reverts to a normal load. Loading in bulk mode can improve session performance, but limits your ability to recover because no database logging occurs.

Note: Choose Normal mode if the mapping contains an Update Strategy transformation.

Tip: When you choose Bulk mode for Microsoft SQL Server or Oracle targets, define a large commit interval.

Consider the following database limitations when you choose Bulk mode when loading to Oracle:

- Do not define CHECK constraints in the database.

- Do not define primary-foreign keys in the database. However, you can define primary-foreign keys for the target definitions in the Designer.

- Do not create indexes in the database.

- When you use the LONG datatype, verify it is the last column in the table.

For more information, see your Oracle documentation.

Insert Optional If selected, the PowerCenter Server inserts all rows flagged for insert. By default, this option is selected. For details on target update strategies, see “Update Strategy Transformation” in the Transformation Guide.

Update (as Update) Optional If selected, the PowerCenter Server updates all rows flagged for update. By default, this option is selected. For details on target update strategies, see “Update Strategy Transformation” in the Transformation Guide.

Update (as Insert) Optional If selected, the PowerCenter Server inserts all rows flagged for update. By default, this option is not selected. For details on target update strategies, see “Update Strategy Transformation” in the Transformation Guide.

Update (else Insert) Optional If selected, the PowerCenter Server updates rows flagged for update if they exist in the target, then inserts any remaining rows marked for insert. For details on target update strategies, see “Update Strategy Transformation” in the Transformation Guide.

Delete Optional If selected, the PowerCenter Server deletes all rows flagged for delete. For details on target update strategies, see “Update Strategy Transformation” in the Transformation Guide.

Truncate Table Optional If selected, the PowerCenter Server truncates the target before loading. For details on this feature, see “Truncating Target Tables” on page 245.


Reject File Directory Optional Enter the directory name in this field. By default, the PowerCenter Server writes all reject files to the server variable directory, $PMBadFileDir. If you specify both the directory and file name in the Reject Filename field, clear this field. The PowerCenter Server concatenates this field with the Reject Filename field when it runs the session. You can also use the $BadFileName session parameter to specify the file directory. For details on session parameters, see “Session Parameters” on page 495.


Rejected Truncated/Overflowed rows* Optional Instructs the PowerCenter Server to write the truncated and overflowed rows to the reject file.

Update Override* Optional Override the default UPDATE statement.

Table Name Prefix Optional Specify the owner of the target tables.

Pre SQL Optional You can enter pre-session SQL commands for a target instance in a mapping to execute commands against the target database before the PowerCenter Server reads the source.

Post SQL Optional Enter post-session SQL commands to execute commands against the target database after the PowerCenter Server writes to the target.
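The Insert, Update, and Delete options in Table A-15 decide how a row flagged by the mapping is applied to the target. The Python sketch below models the target as a dictionary keyed by primary key. It is a simplified illustration and not PowerCenter behavior in every detail: the `apply_row` function, the option names, the order in which the update options are checked, and the rejection of inserts on an existing key are all assumptions.

```python
from typing import Any, Dict


def apply_row(target: Dict[Any, Any], key: Any, row: Any, flag: str,
              options: Dict[str, bool]) -> bool:
    """Apply one flagged row to the target. Return True if the target changed."""
    if flag == "insert":
        if options.get("insert") and key not in target:
            target[key] = row
            return True
        return False
    if flag == "update":
        if options.get("update_else_insert"):
            # Update the row if it exists, otherwise insert it.
            target[key] = row
            return True
        if options.get("update_as_insert"):
            # Rows flagged for update are inserted as new rows.
            if key not in target:
                target[key] = row
                return True
            return False
        if options.get("update_as_update"):
            # Rows flagged for update change existing rows only.
            if key in target:
                target[key] = row
                return True
            return False
        return False
    if flag == "delete":
        if options.get("delete") and key in target:
            del target[key]
            return True
        return False
    raise ValueError("flag must be 'insert', 'update', or 'delete'")
```

For example, with only `update_as_update` enabled, a row flagged for update whose key is missing from the target leaves the target unchanged, while `update_else_insert` would add it.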





Properties Settings for Flat File Targets

Figure A-17 displays the Properties settings on the Mapping tab for file targets:

Table A-16 describes the Properties settings on the Mapping tab for file targets:

Figure A-17. Mapping Tab - Targets Node - File Properties Settings

Table A-16. Mapping Tab - Targets Node - File Properties Settings


Merge Partitioned Files Optional When selected, the PowerCenter Server merges the partitioned target files into one file when the session completes, and then deletes the individual output files. If the PowerCenter Server fails to create the merged file, it does not delete the individual output files. You cannot merge files if the session uses FTP, an external loader, or a message queue. For details on configuring a session for partitioning, see “Pipeline Partitioning” on page 345.

Merge File Directory Optional Enter the directory name in this field. By default, the PowerCenter Server writes the merged file in the server variable directory, $PMTargetFileDir. If you enter a full directory and file name in the Merge File Name field, clear this field.

Merge File Name Optional Name of the merge file. Default is target_name.out. This property is required if you select Merge Partitioned Files.


Output File Directory Optional Enter the directory name in this field. By default, the PowerCenter Server writes output files in the server variable directory, $PMTargetFileDir. If you specify both the directory and file name in the Output Filename field, clear this field. The PowerCenter Server concatenates this field with the Output Filename field when it runs the session. You can also use the $OutputFileName session parameter to specify the file directory. For details on session parameters, see “Session Parameters” on page 495.

Output Filename Required Enter the file name, or file name and path. Optionally use the $OutputFileName session parameter for the file name. By default, the Workflow Manager names the target file based on the target definition used in the mapping: target_name.out. If the target definition contains a slash character, the Workflow Manager replaces the slash character with an underscore. The PowerCenter Server concatenates this field with the Output File Directory field when it runs the session. When you use an external loader to load to an Oracle database, you must specify a file extension. If you do not specify a file extension, the Oracle loader cannot find the flat file and the PowerCenter Server fails the session. For more information about external loading, see “Loading to Oracle” on page 533. For details on session parameters, see “Session Parameters” on page 495. Note: If you specify an absolute path file name when using FTP, the PowerCenter Server ignores the Default Remote Directory specified in the FTP connection. When you specify an absolute path file name, do not use single or double quotes.

Reject File Directory Optional Enter the directory name in this field. By default, the PowerCenter Server writes all reject files to the server variable directory, $PMBadFileDir. If you specify both the directory and file name in the Reject Filename field, clear this field. The PowerCenter Server concatenates this field with the Reject Filename field when it runs the session. You can also use the $BadFileName session parameter to specify the file directory. For details on session parameters, see “Session Parameters” on page 495.

Set File Properties Optional Allows you to configure the file properties. For more information, see “Setting File Properties for Targets” on page 701.

Datetime Format* N/A Displays the datetime format selected for datetime fields.
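Two behaviors from Table A-16 lend themselves to a short Python sketch: the default output file name and the merge of partitioned target files. The code is an illustration only, not PowerCenter code. The function names are hypothetical, the slash is assumed to be a forward slash, and the merge order is assumed to follow the order of the input list.

```python
import os
from typing import Sequence


def default_output_filename(target_name: str) -> str:
    """Build target_name.out, replacing any slash with an underscore."""
    return target_name.replace("/", "_") + ".out"


def merge_partition_files(part_paths: Sequence[str], merged_path: str) -> bool:
    """Concatenate the partition files into one merged file.

    The individual files are deleted only after the merged file has been
    written. If writing the merged file fails, the individual files are kept.
    """
    try:
        with open(merged_path, "wb") as merged:
            for path in part_paths:
                with open(path, "rb") as part:
                    merged.write(part.read())
    except OSError:
        return False  # merge failed: leave the individual output files in place
    for path in part_paths:
        os.remove(path)
    return True
```

Deleting the partition files only after a successful write mirrors the rule that a failed merge must not remove the individual output files.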




Setting File Properties for Targets

Click the Set File Properties button on the Mapping tab to configure flat file properties. You can define flat file properties for both fixed-width and delimited flat file targets.

You can configure flat file properties for non-reusable sessions in the Workflow Designer and for reusable sessions in the Task Developer.

Figure A-18 shows the Flat Files dialog box that appears when you click Set File Properties:

Select the file type (fixed-width or delimited) you want to configure and click Advanced.

Configuring Fixed-Width Properties for Targets

To edit the fixed-width properties, select Fixed Width in the Flat Files dialog box and click the Advanced button. The Fixed Width Properties dialog box appears.

Thousand Separator* N/A Displays the thousand separator for numeric fields.

Decimal Separator* N/A Displays the decimal separator for numeric fields.


Figure A-18. Flat Files Dialog Box for Targets




Figure A-19 displays the Fixed-Width Properties dialog box for flat file targets:

Table A-17 describes the options you define in the Fixed Width Properties dialog box:

Configuring Delimited Properties for Targets

To edit the delimited properties, select Delimited in the Flat Files dialog box and click the Advanced button. The Delimited File Properties dialog box appears.

Figure A-20 displays the Delimited File Properties dialog box for flat file targets:

Figure A-19. Fixed-Width Properties for File Targets

Table A-17. Fixed-Width Properties for File Targets



Null Character Required Enter the character you want the PowerCenter Server to use to represent null values. You can enter any valid character in the file code page. For more information about specifying null characters for target files, see “Null Characters in Fixed-Width Files” on page 272.

Repeat Null Character Optional Select this option to indicate a null value by repeating the null character to fill the field. If you do not select this option, the PowerCenter Server enters a single null character at the beginning of the field to represent a null value. For more information about specifying null characters for target files, see “Null Characters in Fixed-Width Files” on page 272.
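The difference between the two null-character settings for fixed-width targets can be sketched in Python. This is an illustration, not PowerCenter output: the `format_null_field` helper and the sample null character `*` are hypothetical, and padding the rest of the field with spaces when the option is cleared is an assumption.

```python
def format_null_field(width: int, null_char: str = "*",
                      repeat_null_character: bool = False) -> str:
    """Render a null value in a fixed-width field of the given width."""
    if len(null_char) != 1:
        raise ValueError("null_char must be a single character")
    if repeat_null_character:
        # Repeat the null character to fill the whole field.
        return null_char * width
    # Single null character at the start of the field, then padding (assumed spaces).
    return null_char + " " * (width - 1)


print(repr(format_null_field(5, "*", repeat_null_character=True)))
print(repr(format_null_field(5, "*")))
```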


Figure A-20. Delimited Properties for File Targets


Table A-18 describes the options you can define in the Delimited File Properties dialog box for flat file targets:

Transformations Node

On the Transformations node, you can override properties that you configure in transformation and target instances in a mapping. The attributes you can configure depend on the type of transformation you select.

Table A-18. Delimited Properties for File Targets

Edit Delimiter Options


Delimiters Required Character used to separate columns of data. Use the Browse button to the right of this field to enter a non-printable delimiter. Delimiters can be either printable or single-byte unprintable characters, and must be different from the escape character and the quote character (if selected). You cannot select unprintable multibyte characters as delimiters.

Optional Quotes Required Select No Quotes, Single Quote, or Double Quotes. If you select a quote character, the PowerCenter Server does not treat delimiter characters within the quote characters as a delimiter. For example, suppose an output file uses a comma as a delimiter and the PowerCenter Server receives the following row: 342-3849, 'Smith, Jenna', 'Rockville, MD', 6. If you select the optional single quote character, the PowerCenter Server ignores the commas within the quotes and writes the row as four fields. If you do not select the optional single quote, the PowerCenter Server writes six separate fields.



Figure A-21 displays the Transformations node on the Mapping tab:

Figure A-21. Mapping Tab - Transformations Node


Mapping Tab (Partitions View)

In the Partitions view of the Mapping tab, you can configure partitions. You can configure partitions for non-reusable sessions in the Workflow Designer and for reusable sessions in the Task Developer.

The following nodes are available in the Partitions view:

♦ Partition Properties. For more information, see “Partition Properties Node” on page 705.

♦ KeyRange. For more information, see “KeyRange Node” on page 706.

♦ HashKeys. For more information, see “HashKeys Node” on page 706.

♦ Partition Points. For more information, see “Partition Points Node” on page 706.

♦ Non-Partition Points. For more information, see “Non-Partition Points Node” on page 709.

Partition Properties Node

The Partition Properties node allows you to configure partitions.

Figure A-22 displays the Mapping tab - Partitions Properties node:

Figure A-22. Mapping Tab - Partitions Properties Node


KeyRange Node

In the KeyRange node, you can configure the partition range for key-range partitioning. Select Edit Keys to edit the partition key. For more information, see “Edit Partition Key” on page 708.

Figure A-23 displays the KeyRange node on the Mapping tab:

HashKeys Node

In the HashKeys node, you can configure hash key partitioning. Select Edit Keys to edit the partition key. For more information, see “Edit Partition Key” on page 708.

Partition Points Node

The Partition Points node displays the mapping with the transformation icons. The Partition Points node lists the partition points in the tree. Select a partition point to configure its attributes.

In the Partition Points node you can configure the following options for each pipeline in a mapping:

♦ Add and delete partition points.

♦ Specify the partition type at each partition point.

♦ Add and delete partitions.

♦ Add keys and key ranges for certain partition types.

Figure A-23. Mapping Tab - KeyRange Node

For more information about partitioning a pipeline, see “Pipeline Partitioning” on page 345.

Figure A-24 displays the Partition Points node on the Mapping tab:

Table A-19 describes the Partition Points node:

Figure A-24. Mapping Tab - Partition Points Node

Table A-19. Mapping Tab - Partition Points Node

Partition Points Node Description

Add Partition Point Click to add a new partition point to the Transformation list. For information on adding partition points, see “Adding and Deleting Partition Points” on page 353.

Delete Partition Point Click to delete the current partition point. You cannot delete certain partition points. For details, see “Adding and Deleting Partition Points” on page 353.

Edit Partition Point Click to edit the current partition point.

Edit Keys Click to add, remove, or edit the key for key range or hash user keys partitioning. This button is not available for auto-hash, round-robin, or pass-through partitioning. For more information on adding keys and key ranges, see “Adding Keys and Key Ranges” on page 358.


Edit Partition Point

The Edit Partition Point dialog box allows you to add and delete partitions, and to select the partition type.

Figure A-25 displays the Edit Partition Points dialog box:

Table A-20 describes the options in the Edit Partition Point dialog box:

Edit Partition Key

When you specify key range or hash user keys partitioning at any partition point, you must specify one or more ports as the partition key. Click Edit Keys to display the Edit Partition Key dialog box.

Figure A-25. Edit Partition Point Dialog Box

Table A-20. Edit Partition Point Dialog Box Options

Edit Partition Point Options Description

Add button Click to add a partition. You can add up to 64 partitions. For more information on adding partitions, see �Adding and Deleting Partitions� on page 356.

Delete button Click to delete the selected partition. For more information on deleting partitions, see �Adding and Deleting Partitions� on page 356.

Name Partition number.

Description Enter a description for the current partition.

Select Partition Type Select a partition type from the list. For more information, see “Specifying Partition Types” on page 356.


Figure A-26 displays the Edit Partition Key dialog box:

You can specify one or more ports as the partition key. To rearrange the order of the ports that make up the key, select a port in the Selected Ports list and click the up or down arrow.

For information on adding a key for key range partitioning, see “Key Range Partition Type” on page 363. For information on adding a key for hash partitioning, see “Hash Keys Partition Types” on page 361.

Non-Partition Points Node

The Non-Partition Points node displays the mapping objects in iconized view. The Partition Points node lists the non-partition points in the tree. You can select a non-partition point and add partitions if you want.

Figure A-26. Edit Partition Key Dialog Box


Components Tab

In the Components tab, you can configure pre-session shell commands, post-session commands, and email messages if the session succeeds or fails.

Figure A-27 displays the Components Tab:

Figure A-27. Components Tab


Table A-21 describes the Components tab options:

Table A-22 describes the tasks available in the Components tab:

Reusable Pre- or Post-Session Commands

Select Reusable in the Type field if you want to select an existing Command task as the pre- or post-session shell command. The Command Object Browser appears when you click the Open button in the Value field.

Table A-21. Components Tab

Components Tab Option

Optional/Required Description

Task n/a Tasks you can perform in the Components tab. You can configure pre- or post-session shell commands and success or failure email messages in the Components tab.

Type Required Select None if you do not want to configure commands and emails in the Components tab. For pre- and post-session commands, select Reusable to call an existing reusable Command task as the pre- or post-session shell command. Select Non-Reusable to create pre- or post-session shell commands for this session task. For success or failure emails, select Reusable to call an existing Email task as the success or failure email. Select Non-Reusable to create email messages for this session task.

Value Optional Use to configure commands or emails.

Table A-22. Components Tab Tasks

Components Tab Tasks


Pre-Session Command Optional Shell commands that the PowerCenter Server performs at the beginning of a session. For details on using pre-session shell commands, see “Using Pre- or Post-Session Shell Commands” on page 188.

Post-Session Success Command Optional Shell commands that the PowerCenter Server performs after the session completes successfully. For details on using post-session shell commands, see “Using Pre- or Post-Session Shell Commands” on page 188.

Post-Session Failure Command Optional Shell commands that the PowerCenter Server performs after the session if the session fails. For details on using post-session shell commands, see “Using Pre- or Post-Session Shell Commands” on page 188.

On Success Email Optional The PowerCenter Server sends the On Success email message if the session completes successfully.

On Failure Email Optional The PowerCenter Server sends the On Failure email message if the session fails.


Figure A-28 displays the Task Browser:

Click the Override button to override the Run If Previous Completed option in the Command task. For details on the Run If Previous Completed option, see Table A-24 on page 714.

Non-Reusable Pre- or Post-Session Commands

Select Non-Reusable in the Type field if you want to create pre- or post-session commands for the session. Non-reusable pre- or post-session commands do not appear as Command tasks in the folder.

Click the Open button in the Value field in the Components tab to edit pre- or post-session shell commands. The Edit Pre-Session Command or Edit Post-Session Command dialog box appears.

Figure A-28. Task Browser


Figure A-29 displays the Edit Pre-Session Command dialog box:

Table A-23 describes the General tab for editing pre- or post-session shell commands:

Figure A-29. Edit Pre-Session Command Dialog Box

Table A-23. Pre- or Post-Session Commands - General Tab

General Tab for Pre- or Post-Session Commands


Name Required Enter a name for the pre- or post-session shell command.

Make Reusable Required Select Make Reusable to create a reusable Command task from the pre- or post-session shell commands. Clear the Make Reusable option if you do not want the Workflow Manager to create a reusable Command task from the shell commands. For details on creating Command tasks from pre- or post-session shell commands, see “Creating a Reusable Command Task from Pre- or Post-Session Commands” on page 191.

Description Optional Enter a description for the pre- or post-session shell command.


Table A-24 describes the Properties tab for editing pre- or post-session commands:

Table A-25 describes the Commands tab for editing pre- or post-session commands:

Reusable Email

Select Reusable in the Type field for the On-Success or On-Failure email if you want to select an existing Email task as the On-Success or On-Failure email. The Email Object Browser appears when you click the right side of the Value field.

Table A-24. Pre- or Post-Session Commands - Properties Tab

Properties Tab for Pre- or Post-Session Commands


Name Required The name of the pre- or post-session shell command.

Run If Previous Completed Required Select this option if you want the PowerCenter Server to perform the next command only if the previous command completed successfully.
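The effect of the Run If Previous Completed option can be sketched in a few lines of Python. This is an illustration of the behavior described in the table, not the PowerCenter Server implementation. The function name `run_session_commands`, the `runner` callback, and the convention that an exit code of 0 means success are all assumptions made for the example.

```python
from typing import Callable, List, Sequence


def run_session_commands(
    commands: Sequence[str],
    run_if_previous_completed: bool,
    runner: Callable[[str], int],
) -> List[int]:
    """Run commands in order and return the exit codes of the commands that ran.

    runner is a hypothetical callback that executes one command and
    returns its exit code (assumed: 0 means success).
    """
    exit_codes: List[int] = []
    for command in commands:
        code = runner(command)
        exit_codes.append(code)
        # With the option selected, a failed command stops the sequence.
        if run_if_previous_completed and code != 0:
            break
    return exit_codes


def fake_runner(command: str) -> int:
    """Stand-in for a real shell: only the command named 'fail' fails."""
    return 1 if command == "fail" else 0


# Option selected: the third command never runs.
print(run_session_commands(["ok", "fail", "ok"], True, fake_runner))
# Option cleared: every command runs regardless of earlier failures.
print(run_session_commands(["ok", "fail", "ok"], False, fake_runner))
```

Passing the runner as a parameter keeps the sketch independent of any particular shell or operating system.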

Table A-25. Pre- or Post-Session Commands - Commands Tab

Commands Tab for Pre- or Post-Session Commands


Name Required The name of the pre- or post-session shell command.

Command Required The shell command you want the PowerCenter Server to perform. Enter only one command on each line. You can use session parameters or server variables in shell commands. If your command contains spaces, enclose the command in quotes. For example, if you want to call c:\program files\myprog.exe, you must enter "c:\program files\myprog.exe", including the quotes.
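If you generate the text for the Command field with a script, the quoting rule above might be applied as in the following sketch. The helper names `quote_command` and `to_command_field` are hypothetical, and the sketch assumes each entry is a single program path, as in the example in the table.

```python
from typing import Iterable


def quote_command(command: str) -> str:
    """Enclose a command in double quotes if it contains spaces.

    A command that is already enclosed in quotes is returned unchanged.
    """
    already_quoted = (
        len(command) >= 2 and command.startswith('"') and command.endswith('"')
    )
    if " " in command and not already_quoted:
        return f'"{command}"'
    return command


def to_command_field(commands: Iterable[str]) -> str:
    """Join commands with newlines so that each line holds one command."""
    return "\n".join(quote_command(c) for c in commands)


print(to_command_field([r"c:\program files\myprog.exe", r"c:\tools\cleanup.bat"]))
```

The second path, `c:\tools\cleanup.bat`, is an invented example that contains no spaces and is therefore left unquoted.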


Figure A-30 displays the Email Object Browser:

Select an Email task to use as On-Success or On-Failure email. Click the Override button to override properties of the email. For more information about email properties, see Table A-27 on page 717.

Non-Reusable Email

Select Non-Reusable in the Type field to create a non-reusable email for the session. Non-reusable emails do not appear as Email tasks in the Task folder. Click the right side of the Value field to edit the properties for the non-reusable On-Success or On-Failure emails. For more information about email properties, see Table A-27 on page 717.

Email Properties

You configure email properties for On-Success or On-Failure emails when you override an existing Email task or when you create a non-reusable email for the session.

Figure A-30. Email Object Browser


Figure A-31 displays the dialog box for editing the On-Success or On-Failure email properties:

Table A-26 describes general settings for editing On-Success or On-Failure emails:

Figure A-31. On-Success or On-Failure Email - General Tab

Table A-26. On-Success or On-Failure Emails - General Tab

Email Settings Required/Optional Description

Name Required Enter a name for the email you want to configure.

Description Required Enter a description for the email you want to configure.


Figure A-32 displays the properties for On-Success or On-Failure emails:

Table A-27 describes the email properties for On-Success or On-Failure emails:

Figure A-32. On-Success or On-Failure Email - Properties Tab

Table A-27. On-Success or On-Failure Emails - Properties Tab

Email Properties Required/Optional Description

Email user name Required Required to send On-Success or On-Failure session email. Enter the email address of the person you want the PowerCenter Server to email after the session completes. The email address must be entered in 7-bit ASCII. For success email, you can enter $PMSuccessEmailUser to send email to the user configured for the server variable. For failure email, you can enter $PMFailureEmailUser to send email to the user configured for the server variable.

Email subject Optional Enter the text you want to appear in the subject header.

Email text Optional Enter the text of the email. You can use several variables when creating this text to convey meaningful information, such as the session name and session status. For details, see “Sending Email” on page 319.
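The two constraints on the Email user name field (it is required, and it must be 7-bit ASCII) can be checked before you enter a value. The following is a minimal sketch. The function names are hypothetical, and the check does not verify that the address is deliverable or well formed.

```python
from typing import List

# Server variables named in Table A-27.
SERVER_VARIABLES = ("$PMSuccessEmailUser", "$PMFailureEmailUser")


def is_server_variable(value: str) -> bool:
    """Return True if the value is one of the two email server variables."""
    return value in SERVER_VARIABLES


def check_email_user(value: str) -> List[str]:
    """Return a list of problems with an Email user name value."""
    problems: List[str] = []
    if not value.strip():
        problems.append("Email user name is required")
    # str.isascii() is True only when every character is in the 7-bit range.
    if not value.isascii():
        problems.append("Email user name must be 7-bit ASCII")
    return problems


print(check_email_user("admin@example.com"))
print(check_email_user("jos\u00e9@example.com"))
print(is_server_variable("$PMFailureEmailUser"))
```

The addresses shown are placeholders. `str.isascii()` requires Python 3.7 or later.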


Metadata Extensions Tab

The Metadata Extensions tab appears in the session property sheet after the Partitions tab.

Figure A-33 displays the Metadata Extensions tab:

The Metadata Extensions tab allows you to create and promote metadata extensions. For information on creating metadata extensions, see “Metadata Extensions” in the Repository Guide.

Table A-28 describes the configuration options for the Metadata Extensions tab:

Figure A-33. Metadata Extensions Tab

Table A-28. Metadata Extensions Tab

Metadata Extensions Tab Options


Extension Name Required Name of the metadata extension. Metadata extension names must be unique in a domain.

Datatype Required The data type: numeric (integer), string, boolean, or XML.


Value Optional Value of the metadata extension. For a numeric metadata extension, the value must be an integer. For a boolean metadata extension, choose true or false. For a string or XML metadata extension, click the button in the Value field to enter a value of more than one line. The Workflow Manager does not validate XML syntax.

Precision Required for string and XML objects The maximum length for string or XML metadata extensions.

Reusable Required Select to make the metadata extension apply to all objects of this type (reusable). Clear to make the metadata extension apply to this object only (non-reusable).

Description Optional Description of the metadata extension.
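The value rules in Table A-28 can be summarized as a small validation routine. The sketch below is illustrative only. The function name `check_extension_value` and the lowercase datatype strings are assumptions, and, like the Workflow Manager, the sketch does not validate XML syntax.

```python
from typing import List, Optional


def check_extension_value(
    datatype: str, value=None, precision: Optional[int] = None
) -> List[str]:
    """Return a list of problems for a metadata extension value.

    A value of None means that no value was entered, which is allowed
    because the Value option is optional.
    """
    problems: List[str] = []
    kind = datatype.lower()
    if kind in ("string", "xml"):
        if precision is None:
            problems.append("Precision is required for string and XML extensions")
        elif value is not None and len(value) > precision:
            problems.append("Value is longer than the precision")
    elif kind == "numeric":
        # bool is a subclass of int in Python, so exclude it explicitly.
        if value is not None and (isinstance(value, bool) or not isinstance(value, int)):
            problems.append("Numeric value must be an integer")
    elif kind == "boolean":
        if value is not None and not isinstance(value, bool):
            problems.append("Boolean value must be true or false")
    else:
        problems.append(f"Unknown datatype: {datatype}")
    return problems


print(check_extension_value("numeric", 5))
print(check_extension_value("numeric", 5.5))
print(check_extension_value("string", "abc", precision=2))
```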



Appendix B

Workflow Properties Reference

This appendix contains a listing of settings in the workflow properties. These settings are grouped by the following tabs:


♦ General Tab, 722

♦ Properties Tab, 724

♦ Scheduler Tab, 726

♦ Variables Tab, 731

♦ Events Tab, 732

♦ Metadata Extensions Tab, 733


General Tab

You can change the workflow name and enter a comment for the workflow on the General tab. By default, the General tab appears when you open the workflow properties.

Figure B-1 displays the General tab of the workflow properties:

Table B-1 describes the settings found on the General tab:

Figure B-1. Workflow Properties - General Tab

Table B-1. Workflow Properties - General Tab

General Tab Options


Name Required The name of the workflow.

Comments Optional Optional comment to describe the workflow.

Server Required Select a registered PowerCenter Server when configuring a workflow.

Tasks must run on Server Optional Requires all workflow tasks to run on the PowerCenter Server that you select.

Suspension Email Optional Select a reusable email task for the suspension email. When a task fails, the PowerCenter Server suspends the workflow and sends the suspension email. For details on suspending workflows, see “Suspending the Workflow” on page 127.

Disabled Optional Select to disable the workflow from the schedule. The PowerCenter Server stops running the workflow until you clear the Disabled option. For details on the Disabled option, see “Disabling Workflows” on page 118.


Suspend On Error Optional If selected, the PowerCenter Server suspends the workflow when a task in the workflow fails. For details on suspending workflows, see “Suspending the Workflow” on page 127.

Web Services Optional If selected, you create a service workflow. Click Config Service to configure service information. For more information on creating web services, see the Web Services Provider Guide.


Properties Tab

Configure parameter file name and workflow log options in the Properties tab.

Figure B-2 displays the Properties tab:

Table B-2 describes the settings found on the Properties tab:

Figure B-2. Workflow Properties - Properties Tab

Table B-2. Workflow Properties - Properties Tab

Properties Tab Options


Parameter File Name Optional Designates the name and directory for the parameter file. Use the parameter file to define workflow parameters. For details on parameter files, see “Parameter Files” on page 511.

Workflow Log File Name Optional Enter a file name, or a file name and directory. If you leave this field blank, the PowerCenter Server does not create a workflow log. Instead, the PowerCenter Server writes workflow log messages to the server log or Windows Event Log, depending on how you configure the PowerCenter Server. If you fill in this field, the PowerCenter Server appends information in this field to that entered in the Workflow Log File Directory field. For example, if you have "C:\workflow_logs\" in the Workflow Log File Directory field, then enter "logname.txt" in the Workflow Log File Name field, the PowerCenter Server writes logname.txt to the C:\workflow_logs\ directory.

Workflow Log File Directory Required Designates a location for the workflow log file. By default, the PowerCenter Server writes the log file in the server variable directory, $PMWorkflowLogDir. If you enter a full directory and file name in the Workflow Log File Name field, clear this field.

Save Workflow Log By Required If you select Save Workflow Log by Timestamp, the PowerCenter Server saves all workflow logs, appending a timestamp to each log. If you select Save Workflow Log by Runs, the PowerCenter Server saves a designated number of workflow logs. Configure the number of workflow logs in the Save Workflow Log for These Runs option. For details on these options, see “Archiving Workflow Logs” on page 459. You can also use the $PMWorkflowLogCount server variable to save the configured number of workflow logs for the PowerCenter Server.

Save Workflow Log For These Runs Required The number of historical workflow logs you want the PowerCenter Server to save. The PowerCenter Server saves the number of historical logs you specify, plus the most recent workflow log. Therefore, if you specify 5 runs, the PowerCenter Server saves the most recent workflow log, plus historical logs 0–4, for a total of 6 logs. You can specify up to 2,147,483,647 historical logs. If you specify 0 logs, the PowerCenter Server saves only the most recent workflow log.
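The way the two log file fields combine, and the arithmetic behind the Save Workflow Log For These Runs option, can be sketched as follows. The function names are hypothetical. The sketch treats the combination as plain concatenation, as the table describes, and does not resolve server variables such as $PMWorkflowLogDir.

```python
from typing import Optional

MAX_HISTORICAL_LOGS = 2_147_483_647  # upper limit stated in Table B-2


def workflow_log_path(log_directory: str, log_file_name: str) -> Optional[str]:
    """Return the workflow log location, or None if no workflow log is created.

    A blank file name means that log messages go to the server log or
    Windows Event Log instead of a workflow log file.
    """
    if not log_file_name:
        return None
    # The file name field is appended to the directory field as entered.
    return log_directory + log_file_name


def total_logs_kept(save_for_these_runs: int) -> int:
    """Return how many logs are kept when saving workflow logs by runs."""
    if not 0 <= save_for_these_runs <= MAX_HISTORICAL_LOGS:
        raise ValueError("number of historical logs is out of range")
    # The historical logs plus the most recent workflow log.
    return save_for_these_runs + 1


print(workflow_log_path("C:\\workflow_logs\\", "logname.txt"))
print(workflow_log_path("", "C:\\other_logs\\logname.txt"))
print(total_logs_kept(5))
```

The second call illustrates clearing the directory field when the file name field already holds a full directory and file name. The path `C:\other_logs\` is an invented example.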


Scheduler Tab

The Scheduler tab allows you to schedule a workflow to run continuously or at a given interval, or to start a workflow manually. For details on scheduling workflows, see “Scheduling a Workflow” on page 112.

Figure B-3 displays the Scheduler tab:

You can configure the following types of scheduler settings:

♦ Non-Reusable. Choose to create a non-reusable scheduler for the workflow.

♦ Reusable. Choose a reusable scheduler for the workflow.

Figure B-3. Workflow Properties - Scheduler Tab



Table B-3 describes the settings found on the Scheduler Tab:

Edit Scheduler Settings

Click the Edit Scheduler Settings button to configure the scheduler. The Edit Scheduler dialog box appears.

Figure B-4 displays the Edit Scheduler dialog box:

Table B-3. Workflow Properties - Scheduler Tab

Scheduler Tab Options Required/Optional Description

Non-Reusable/Reusable Required Indicates the scheduler type. If you select Non-Reusable, the scheduler can only be used by the current workflow. If you select Reusable, choose a reusable scheduler. You can create reusable schedulers by selecting Schedulers.

Scheduler Required Choose a set of scheduler settings for the workflow.

Description Optional Enter a description for the scheduler.

Summary N/A Read-only summary of the selected scheduler settings.

Figure B-4. Workflow Properties - Scheduler Tab - Edit Scheduler Dialog Box


Table B-4 describes the settings on the Edit Scheduler dialog box:

Table B-4. Workflow Properties - Scheduler Tab - Edit Scheduler Dialog Box


Run Options: Run On Server Initialization/Run On Demand/Run Continuously Optional Indicates the workflow schedule type. If you select Run On Server Initialization, the PowerCenter Server runs the workflow as soon as the server is initialized. If you select Run On Demand, the PowerCenter Server only runs the workflow when you start the workflow. If you select Run Continuously, the PowerCenter Server starts the next run of the workflow as soon as it finishes the previous run.

Schedule Options: Run Once/Run Every/Customized Repeat Optional Required if you select Run On Server Initialization in Run Options. Also required if you do not choose any setting in Run Options. If you select Run Once, the PowerCenter Server runs the workflow once, as scheduled in the scheduler. If you select Run Every, the PowerCenter Server runs the workflow at regular intervals, as configured. If you select Customized Repeat, the PowerCenter Server runs the workflow on the dates and times specified in the Repeat dialog box.

Edit Optional Required if you select Customized Repeat in Schedule Options. Opens the Repeat dialog box, allowing you to schedule specific dates and times for the workflow to run. The selected scheduler appears at the bottom of the page. For details about the Repeat dialog box, see “Customizing Repeat Option” on page 116.

Start Date Optional Required if you select Run On Server Initialization in Run Options. Also required if you do not choose any setting in Run Options. Indicates the date on which the PowerCenter Server begins scheduling the workflow.

Start Time Optional Required if you select Run On Server Initialization in Run Options. Also required if you do not choose any setting in Run Options. Indicates the time at which the PowerCenter Server begins scheduling the workflow.

End Options: End On/End After/Forever Optional Required if the workflow schedule is Run Every or Customized Repeat. If you select End On, the PowerCenter Server stops scheduling the workflow on the selected date. If you select End After, the PowerCenter Server stops scheduling the workflow after the set number of workflow runs. If you select Forever, the PowerCenter Server schedules the workflow as long as the workflow does not fail.
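Several settings in Table B-4 are optional in general but become required depending on other selections. The following sketch collects those conditions in one place. It is a reading aid based only on the table, and the function name `required_settings` and the use of `None` for "no setting chosen" are assumptions.

```python
from typing import Optional, Set


def required_settings(
    run_option: Optional[str], schedule_option: Optional[str]
) -> Set[str]:
    """Return the Edit Scheduler settings that become required.

    run_option is "Run On Server Initialization", "Run On Demand",
    "Run Continuously", or None if no Run Options setting is chosen.
    schedule_option is "Run Once", "Run Every", "Customized Repeat", or None.
    """
    required: Set[str] = set()
    if run_option is None or run_option == "Run On Server Initialization":
        required |= {"Schedule Options", "Start Date", "Start Time"}
    if schedule_option == "Customized Repeat":
        required.add("Edit")
    if schedule_option in ("Run Every", "Customized Repeat"):
        required.add("End Options")
    return required


print(sorted(required_settings("Run On Demand", None)))
print(sorted(required_settings(None, "Customized Repeat")))
```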


Customizing Repeat Option

You can schedule the workflow to run once, run at an interval, or customize your own repeat option. Click the Edit button on the Edit Scheduler dialog box to configure Customized Repeat options.

Figure B-5 shows the Customized Repeat dialog box:

Table B-5 describes options in the Customized Repeat dialog box:

Figure B-5. Workflow Properties - Customized Repeat Dialog Box

Table B-5. Workflow Properties - Repeat Dialog Box Options


Repeat Every Required Enter the numeric interval you want to schedule the workflow, then select Days, Weeks, or Months, as appropriate. If you select Days, select the appropriate Daily Frequency settings. If you select Weeks, select the appropriate Weekly and Daily Frequency settings. If you select Months, select the appropriate Monthly and Daily Frequency settings.

Weekly Optional Required to enter a weekly schedule. Select the day or days of the week on which you want to schedule the workflow.


Monthly Optional Required to enter a monthly schedule. If you select Run On Day, select the dates on which you want the workflow scheduled on a monthly basis. The PowerCenter Server schedules the workflow on the selected dates. If you select a numeric date exceeding the number of days within a given month, the PowerCenter Server schedules the workflow for the last day of the month, taking leap years into account. For example, if you schedule the workflow to run on the 31st of every month, the PowerCenter Server schedules the workflow on the 30th of the following months: April, June, September, and November. If you select Run On The, select the week(s) of the month, then day of the week on which you want the workflow to run. For example, if you select Second and Last, then select Wednesday, the PowerCenter Server schedules the workflow on the second and last Wednesday of every month.

Daily Required Enter the number of times you would like the PowerCenter Server to run the workflow on any day the workflow is scheduled. If you select Run Once, the PowerCenter Server schedules the workflow once on the selected day, at the time entered on the Start Time setting on the Time tab. If you select Run Every, enter Hours and Minutes to define the interval at which the PowerCenter Server runs the workflow. The PowerCenter Server then schedules the workflow at regular intervals on the selected day. The PowerCenter Server uses the Start Time setting for the first scheduled workflow of the day.
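The two Monthly options can be worked through with Python's standard `calendar` module. The sketch below reproduces the date arithmetic described in the table and is not the PowerCenter Server scheduler. The function names are hypothetical, and the ordinal names other than Second and Last are assumed for the example.

```python
import calendar
from datetime import date
from typing import List, Sequence

# Only "Second" and "Last" appear in the guide; the other names are assumed.
ORDINALS = {"First": 0, "Second": 1, "Third": 2, "Fourth": 3, "Last": -1}


def run_on_day(year: int, month: int, day: int) -> date:
    """Return the scheduled date, falling back to the last day of a short month."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day, last_day))


def run_on_the(year: int, month: int, weeks: Sequence[str], weekday: int) -> List[date]:
    """Return the dates of the selected occurrences of a weekday in a month."""
    # monthcalendar() uses 0 for days that belong to a neighboring month.
    days = [
        week[weekday]
        for week in calendar.monthcalendar(year, month)
        if week[weekday] != 0
    ]
    return [date(year, month, days[ORDINALS[name]]) for name in weeks]


print(run_on_day(2004, 4, 31))   # April has 30 days
print(run_on_day(2004, 2, 31))   # 2004 is a leap year
print(run_on_the(2004, 8, ["Second", "Last"], calendar.WEDNESDAY))
```

For August 2004, the last call returns the 11th and the 25th, the second and last Wednesdays of that month.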




Variables Tab

Before you can use workflow variables, you must declare them in the Variables tab.

Figure B-6 displays the settings on the Variables tab:

Table B-6 describes the settings found on the Variables Tab:

Figure B-6. Workflow Properties - Variables Tab

Table B-6. Workflow Properties - Variables Tab

Variable Options Required/Optional Description

Name Required The name of the workflow variable.

Datatype Required The datatype of the workflow variable.

Persistent Required Indicates whether the PowerCenter Server maintains the value of the variable from the previous workflow run.

Is Null Required Indicates whether the workflow variable is null.

Default Optional Default value of the workflow variable.

Description Optional Optional details about the workflow variable.


Events Tab

Before you can use the Event-Raise task, declare a user-defined event in the Events tab.

Figure B-7 displays the Events Tab:

Table B-7 describes the settings found on the Events Tab:

Figure B-7. Workflow Properties - Events Tab

Table B-7. Workflow Properties - Events Tab

Events Tab Options


Events Required The name of the event you declare.

Description Optional Optional details to describe the event.


Metadata Extensions Tab

Extend the metadata stored in the repository by associating information with individual repository objects. Create metadata extensions for repository objects by editing the object and then adding the metadata extension to the Metadata Extensions tab.

Figure B-8 displays the Metadata Extensions tab:

The Metadata Extensions tab allows you to create and promote metadata extensions. For information on creating metadata extensions, see “Metadata Extensions” in the Repository Guide.

Table B-8 describes the configuration options for the Metadata Extensions tab:

Figure B-8. Workflow Properties - Metadata Extensions Tab

Table B-8. Workflow Properties - Metadata Extensions Tab



Extension Name Required Name of the metadata extension. Metadata extension names must be unique in a domain.

Datatype Required The datatype: numeric (integer), string, boolean, or XML.

Value Optional An optional value. For a numeric metadata extension, the value must be an integer. For a boolean metadata extension, choose true or false. For a string or XML metadata extension, click the Edit button on the right side of the Value field to enter a value of more than one line. The Workflow Manager does not validate XML syntax.

Precision Required for string and XML objects The maximum length for string or XML metadata extensions.

Reusable Required Select to make the metadata extension apply to all objects of this type (reusable). Clear to make the metadata extension apply to this object only (non-reusable).

UnOverride Optional This column appears only if the value of one of the metadata extensions was changed. To restore the default value, click Revert.

Description Optional Optional description of the metadata extension.





Appendix C

Session Properties Comparison Reference

This appendix covers the following topics:

♦ Overview, 736

♦ General Tab, 737


♦ Source Location Tab, 754

♦ Time Tab, 755

♦ Log and Error Handling Tab, 758

♦ Transformations Tab, 761

♦ Partitions Tab, 762


Overview

The Workflow Manager and Workflow Monitor replace the Server Manager in PowerCenter 5.x and PowerMart 5.x. This appendix compares session properties in the Server Manager with session and workflow options in the Workflow Manager. It lists the session properties as they appeared on the session properties in the Server Manager. It then gives the corresponding options in the Workflow Manager.

The session properties for the Server Manager contain the following tabs:

♦ General tab

♦ Source Location tab

♦ Time tab

♦ Log and Error Handling tab

♦ Transformations tab

♦ Partitions tab


General Tab

In the Server Manager, the General tab appeared when you opened the session properties. In the Workflow Manager, the General tab appears when you open the session properties in the Task Developer or the Workflow Designer.

Figure C-1 shows the Server Manager General tab:

In the Server Manager, you configured the following options from the General tab:

♦ General options

♦ Source options

♦ Target options

♦ Session commands

♦ Performance

General Options

In the Server Manager, you could configure the Session Name field, Server Name, and the Session Enabled option on the General tab of the session properties.

In the Workflow Manager, these options are on either the General tab of the session properties or in the workflow properties.

Figure C-1. Server Manager General Tab


Table C-1 compares general session options for the Server Manager with the corresponding options for Workflow Manager:

Source Options

In the Server Manager, Source options appeared under the Session Name field on the General tab.

In the Workflow Manager, source options appear under the Sources node on the Mapping tab (Transformations view). The Sources node contains connections, properties, and readers settings.

Table C-2 compares Source options for the Server Manager with the corresponding properties for the Workflow Manager:

Source Options Dialog Box for Flat File Sources

In the Server Manager, the Source Options dialog box appeared when you clicked Source Options on the General tab and the mapping used file sources.

In the Workflow Manager, most of the source options for file sources appear when you select Properties from the Sources node on the Mapping tab.

Table C-1. General Session Options Comparison

Server Manager General Tab Properties Property Location in Workflow Manager

Session Name General tab-Rename button.

Server Name General tab-Workflow or session properties.

Add Server Button General tab-Workflow or session properties.

Session Enabled General tab-Disable this task. You can only view this property when you edit the session instance from the Workflow Designer.

Table C-2. Source Options Comparison

Server Manager General Tab-Source Options Properties Property Location in Workflow Manager

Source Type Mapping tab-Transformations view-Sources node-Connections settings.

Treat Rows As Properties tab-General Options settings.

Source Options Button Mapping tab-Transformations view-Sources node-Properties settings. Click Set File Properties.

Source Database Mapping tab-Transformations view-Sources node-Connections settings. Click the Edit button in the Value field.


Figure C-2 shows the Server Manager Source Options Dialog Box for File Sources:

Table C-3 compares source options for file sources for the Server Manager with the corresponding options for the Workflow Manager:

Figure C-2. Server Manager Source Options Dialog Box for File Sources

Table C-3. File Source Options Comparison

Server Manager General Tab-Source Options Properties Property Location in Workflow Manager

Source Directory Mapping tab-Transformations view-Sources node-Properties settings.

File Name Mapping tab-Transformations view-Sources node-Properties settings.

File Type Mapping tab-Transformations view-Sources node-Properties settings. Click Set File Properties.

File List Mapping tab-Transformations view-Sources node-Properties settings. Set Source Filetype property to direct or indirect.

FTP File Mapping tab-Transformations view-Sources node-Connections settings. Choose FTP for Type.

Apply to All Files N/A

Edit File Property Button Mapping tab-Transformations view-Sources node-Properties settings. Click Set File Properties.

Edit FTP Property Button Mapping tab-Transformations view-Sources node-Connections settings. Choose FTP for Type. Click the Edit button on the right side of the Value field to edit FTP properties.

Source Options for Fixed-Width File Sources

In the Server Manager, the Fixed-Width Properties dialog box appeared when you selected a fixed-width file from the File Source Dialog box and then clicked Edit File Property.

In the Workflow Manager, the Fixed-Width Properties dialog box appears when you click Set File Properties from the Sources node on the Mapping tab, select Fixed-Width, and then click Advanced.

Figure C-3 shows the Server Manager Fixed-Width Properties dialog box:

Delimited File Properties

In the Server Manager, the Delimited File Properties dialog box appeared when you selected a delimited file from the File Source Dialog box and then clicked Edit File Property.

In the Workflow Manager, the Delimited Properties dialog box appears when you click Set File Properties from the Sources node on the Mapping tab, select Delimited, and click Advanced.

Figure C-3. Server Manager Fixed-Width Properties Dialog Box


Figure C-4 shows the Server Manager Delimited File Properties dialog box:

Source Options for XML Sources

In the Server Manager, the Source Options for XML sources appeared when you clicked Source Options on the General tab and the mapping used XML sources.

In the Workflow Manager, XML source options appear in the Sources node on the Mapping tab when the mapping uses XML sources.

Figure C-5 shows the Server Manager Source Options dialog box for XML sources:

Figure C-4. Server Manager Delimited File Properties Dialog Box

Figure C-5. Server Manager Source Options Dialog Box (XML Sources)


Table C-4 compares XML source options for the Server Manager with the corresponding options for the Workflow Manager:

FTP Properties

In the Server Manager, the FTP Properties dialog box appeared when you edited FTP properties.

In the Workflow Manager, the FTP Connection Editor appears when you choose FTP as the connection type from the Sources tab, click the Edit button on the right side of the Value field, and then click Override to edit the FTP properties.

Figure C-6 shows the Server Manager FTP Properties dialog box:

Table C-4. XML Sources Options Comparison

Server Manager XML Source Options Properties Property Location in Workflow Manager

Source Directory Mapping tab-Transformations view-Sources node-Properties settings.


Code Page Mapping tab-Transformations view-Sources node-Properties settings. Click Set File Properties, and then click Advanced.

File List Mapping tab-Transformations view-Sources node-Properties settings. Set Source Filetype property to direct or indirect.

FTP File Mapping tab-Transformations view-Sources node-Properties settings. Click Set File Properties.

Edit FTP Property Button Mapping tab-Transformations view-Sources node-Connections settings. Choose FTP for Type. Click the Edit button on the right side of the Value field to edit FTP properties.

Figure C-6. Server Manager FTP Properties Dialog Box


Table C-5 compares FTP properties for the Server Manager with the corresponding options for the Workflow Manager:

Source Options for Relational Sources

In the Server Manager, the Source Options dialog box for relational sources appeared when you clicked Source Options on the General tab and the mapping used relational sources.

In the Workflow Manager, enter a prefix for each source table in the Owner Name field on the Mapping tab-Transformations view-Sources node-Properties settings.

Target Options

In the Server Manager, target options appeared on the General tab. In the target options, you could select the target type for the session, configure reject file names, and create database connection session parameters.

In the Workflow Manager, the Mapping tab-Transformations view-Targets node contains connections, properties, and writers settings.

Table C-6 compares target options for the Server Manager with the corresponding options for Workflow Manager:

Table C-5. FTP Properties Comparison

| Server Manager FTP Properties | Property Location in Workflow Manager |
| --- | --- |
| Connection Name | Mapping tab-Transformations view-Sources node-Connections settings. Click the Edit button on the right side of the Value field. Choose FTP for Type. Click the Edit button on the right side of the Value field to edit FTP properties. Select an FTP connection. |
| Remote File Name | Mapping tab-Transformations view-Sources node-Connections settings. Click the Edit button on the right side of the Value field. Choose FTP for Type. Click the Edit button on the right side of the Value field to edit FTP properties. Click Override in the FTP Object Browser. |
| Stage the FTP Data | Mapping tab-Transformations view-Sources node-Connections settings. Click the Edit button on the right side of the Value field. Choose FTP for Type. Click the Edit button on the right side of the Value field to edit FTP properties. Click Override in the FTP Object Browser. |

Table C-6. Target Options Comparison

| Server Manager General Tab-Target Table Properties | Property Location in Workflow Manager |
| --- | --- |
| Target Type | Mapping tab-Transformations view-Targets node-Writers settings. |
| Target Options Button | Properties in the Target Options dialog box are located on the Mapping tab-Transformations view-Targets node-Properties settings. |

Relational Target Options

In the Server Manager, the Targets dialog box appeared when you selected a relational target type and clicked Target Options on the General tab.

In the Workflow Manager, the target options for relational targets appear when you select the Mapping tab.

Figure C-7 shows the Server Manager Targets dialog box:

Table C-7 compares relational target options for the Server Manager with the corresponding options for the Workflow Manager:

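Table C-6. Target Options Comparison (continued)

| Server Manager General Tab-Target Table Properties | Property Location in Workflow Manager |
| --- | --- |
| Reject Options Button | Properties in the Reject Options dialog box are located on the Mapping tab-Transformations view-Targets node-Properties settings. |
| Target Database | Mapping tab-Transformations view-Targets node-Connections settings. Click the Edit button on the right side of the Value field to choose a target connection. |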

Figure C-7. Server Manager Targets Dialog Box

Table C-7. Relational Target Options Comparison

| Server Manager General Tab-Target Table Options Properties | Workflow Manager Property Location |
| --- | --- |
| Insert | Mapping tab-Transformations view-Targets node-Properties settings. |
| Update (as update) | Mapping tab-Transformations view-Targets node-Properties settings. |
| Update (as insert) | Mapping tab-Transformations view-Targets node-Properties settings. |


Output Files

In the Server Manager, the Output Files dialog box appeared when you selected a file target type, then clicked Target Options on the General tab.

In the Workflow Manager, output file target options appear on the Mapping tab-Transformations view. The Targets node contains connections, properties, and writer settings.

Figure C-8 shows the Server Manager Output Files dialog box:

Table C-7. Relational Target Options Comparison (continued)

| Server Manager General Tab-Target Table Options Properties | Workflow Manager Property Location |
| --- | --- |
| Update (else insert) | Mapping tab-Transformations view-Targets node-Properties settings. |
| Delete | Mapping tab-Transformations view-Targets node-Properties settings. |
| Truncate Table | Mapping tab-Transformations view-Targets node-Properties settings. |
| Normal/Bulk | Mapping tab-Transformations view-Targets node-Properties settings. Choose Normal or Bulk for Target Load Type. |
| Test Load | Properties tab-General Options settings. |
| Number of Rows To Test | Properties tab-General Options settings. |

Figure C-8. Server Manager Output Files Dialog Box

Table C-8 compares output file options for the Server Manager with the corresponding options for the Workflow Manager:

External Loader Properties

In the Server Manager, the External Loader Properties dialog box appeared when you used the Loader option on the Target Options dialog box, and then clicked Edit Object Properties to select the external loader you wanted the PowerCenter Server to use.

In the Workflow Manager, the External Loader Properties dialog box appears when you choose External Loader from the Targets node Connections settings on the Mapping tab, and then click the Edit button on the right side of the Value field.

Table C-8. File Target Output Options Comparison

| Server Manager General Tab-Output Files Properties | Workflow Manager Property Location |
| --- | --- |
| Directory | Mapping tab-Transformations view-Targets node-Properties settings. |
| File Name | Mapping tab-Transformations view-Targets node-Properties settings. |
| FTP file | Mapping tab-Transformations view-Targets node-Connections settings. |
| Loader | Mapping tab-Transformations view-Targets node-Connections settings. |
| Edit Object Properties | Mapping tab-Transformations view-Targets node-Connections settings. Choose the connection type, and then click the Edit button on the right side of the Value field. |
| Fixed Width/Delimited | Mapping tab-Transformations view-Targets node-Connections settings. Click Set File Properties. |
| Edit Null Character Button | Mapping tab-Transformations view-Targets node-Connections settings. Click Set File Properties. Choose Fixed-Width and click the Advanced button. |
| Edit Delimiter Button | Mapping tab-Transformations view-Targets node-Connections settings. Click Set File Properties. Choose Delimited and click the Advanced button. |
| Number of Rows To Test | Properties tab-General Options settings. |
| Merge Targets For Partitioned Sessions | Mapping tab-Transformations view-Targets node-Properties settings. |


Figure C-9 shows the Server Manager External Loader Properties dialog box:

Fixed-Width Properties

In the Server Manager, the Fixed-Width dialog box appeared when you configured a session to write to a fixed-width target file, and then clicked Edit Null Character.

In the Workflow Manager, you can access the Fixed-Width Properties dialog box from the Properties settings of the Mapping tab. Click Set File Properties, and select Fixed-Width.

Figure C-10 shows the Server Manager Fixed-Width dialog box:

Delimited File Properties

In the Server Manager, the Delimited File Properties dialog box appeared when you configured a session to write to a delimited target file, then clicked Edit Delimiter.

In the Workflow Manager, you can access the Delimited Properties dialog box from the Properties settings of the Mapping tab. Click Set File Properties, and select Delimited.

Figure C-9. Server Manager External Loader Properties

Figure C-10. Server Manager Fixed-Width Dialog Box (Output Files)

Figure C-11 shows the Server Manager Delimited File Properties dialog box:

XML Targets

In the Server Manager, the XML Target dialog box appeared when you selected an XML file target type, then clicked Target Options.

In the Workflow Manager, you can access the XML Target dialog box from the Properties settings of the Mapping tab. Click Set File Properties.

Figure C-12 shows the Server Manager XML Target dialog box:

Table C-9 compares XML target options for the Server Manager with the corresponding options for Workflow Manager:

Figure C-11. Server Manager Delimited File Properties Dialog Box (Output Files)

Figure C-12. Server Manager XML Target Dialog Box

Table C-9. XML Target Options Comparison

| Server Manager General Tab-XML Target Properties | Workflow Manager Property Location |
| --- | --- |
| Directory | Mapping tab-Transformations view-Targets node-Properties settings. |



Reject Files

In the Server Manager, the Reject Files dialog box appeared when you clicked Reject Options on the General tab.

In the Workflow Manager, the reject file options appear in the Targets node Properties settings on the Mapping tab.

Figure C-13 shows the Server Manager Reject File dialog box:

Table C-10 compares Reject Files options for the Server Manager with the corresponding options for Workflow Manager:

Table C-9. XML Target Options Comparison (continued)

| Server Manager General Tab-XML Target Properties | Workflow Manager Property Location |
| --- | --- |
| Code Page | Mapping tab-Transformations view-Targets node-Properties settings. Click Set File Properties, and then click Advanced. |
| FTP File | Mapping tab-Transformations view-Targets node-Properties settings. Click Set File Properties. |
| Edit Object Properties | Mapping tab-Transformations view-Targets node-Connections settings. Choose FTP for Type. Click the Edit button on the right side of the Value field to edit FTP properties. |

Figure C-13. Server Manager Reject File Dialog Box

Table C-10. Reject Files Options Comparison

| Server Manager General tab-Reject File Properties | Workflow Manager Property Location |
| --- | --- |
| Reject File Directory | Mapping tab-Transformations view-Targets node-Properties settings. |
| File Name | Mapping tab-Transformations view-Targets node-Properties settings. |

Session Commands

In the Server Manager, session commands appeared under the Server Name field on the General tab. You could enter pre-session shell commands, post-session commands, and separate email messages for session success and session failure.

In the Workflow Manager, session commands appear on the Components tab.

Pre-Session Commands

In the Server Manager, the Pre-Session Commands dialog box appeared when you clicked Pre-Session on the General tab of the session properties.

In the Workflow Manager, pre-session command options appear on the Components tab.

Figure C-14 shows the Server Manager Pre-Session Commands dialog box:

Table C-11 compares session command options for the Server Manager with the corresponding options for the Workflow Manager:

Post-Session Commands and Email

In the Server Manager, the Post-Session Commands and Email dialog box appeared when you clicked Post-session And Email on the General tab of the session properties.

In the Workflow Manager, post-session commands and email options appear on the Components tab.

Figure C-14. Server Manager Pre-Session Commands Dialog Box

Table C-11. Pre-Session Commands Comparison

| Server Manager General Tab-Session Commands Pre-Session Properties | Workflow Manager Property Location |
| --- | --- |
| Description | Components tab. Click the Edit button on the right side of the Value field for Pre-Session Commands. Enter the description in the General tab of the Edit Pre-Session Commands dialog box. |
| Command | Components tab. Click the Edit button on the right side of the Value field for Pre-Session Commands. Enter the command in the Command tab of the Edit Pre-Session Commands dialog box. |


Figure C-15 shows the Server Manager Post-Session Commands and Email dialog box:

Table C-12 compares post-session command and email options for the Server Manager with the corresponding options for the Workflow Manager:

Figure C-15. Server Manager Post-Session Commands and Email

Table C-12. Post-Session Commands and Email Comparison

| Server Manager General Tab-Post-Session Commands And Email Properties | Workflow Manager Property Location |
| --- | --- |
| Description | Components tab. Click the Edit button on the right side of the Value field for Post-Session Commands. Enter the description in the General tab of the Edit Post-Session Commands dialog box. |
| Command | Components tab. Click the Edit button on the right side of the Value field for Post-Session Commands. Enter the command in the Command tab of the Edit Post-Session Commands dialog box. |
| Success | Components tab-On Success Email. |
| Failure | Components tab-On Failure Email. |
| Email User Name | Components tab. Click the Edit button on the right side of the Value field for On Success Email or On Failure Email. Enter the email user name in the Properties tab of the Edit Success Email or Edit Failure Email dialog box. |
| Email Subject | Components tab. Click the Edit button on the right side of the Value field for On Success Email or On Failure Email. Enter the email subject in the Properties tab of the Edit Success Email or Edit Failure Email dialog box. |
| Email Text | Components tab. Click the Edit button on the right side of the Value field for On Success Email or On Failure Email. Enter the email text in the Properties tab of the Edit Success Email or Edit Failure Email dialog box. |

Performance Options

In the Server Manager, Performance options appeared under Session Commands on the General tab. In Performance options, you could increase memory size, select performance details, and set configuration parameters.

In the Workflow Manager, Performance options appear on the Properties tab in the session properties.

Table C-13 compares performance options for the Server Manager with the corresponding options for the Workflow Manager:

Configuration Parameters

In the Server Manager, the Configuration Parameters dialog box appeared when you clicked Advanced Options on the General tab. In the Configuration Parameters dialog box, you could configure the DTM memory parameters, general parameters, reader parameters, and event-based scheduling.

In the Workflow Manager, the configuration parameters options appear on multiple tabs.

Figure C-16 shows the Server Manager Configuration Parameter dialog box:

Table C-13. Performance Options Comparison

| Server Manager General Tab-Performance Properties | Workflow Manager Property Location |
| --- | --- |
| DTM Buffer Pool Size | Properties tab-Performance settings. |
| Collect Performance Data | Properties tab-Performance settings. |
| Advanced Options button | Config Object tab, Mapping tab, and Properties tab. |

Figure C-16. Server Manager Configuration Parameter Dialog Box


Table C-14 compares configuration parameters for the Server Manager with the corresponding options for the Workflow Manager:

Table C-14. Configuration Parameters Comparison

| Server Manager Advanced Option Properties | Workflow Manager Property Location |
| --- | --- |
| Default Buffer Block Size | Config Object tab-Advanced settings. |
| Index Cache Size | Mapping tab-Transformations view-Transformations node-Properties settings for Aggregator, Joiner, Lookup, Rank transformations. |
| Data Cache Size | Mapping tab-Transformations view-Transformations node-Properties settings for Aggregator, Joiner, Lookup, Rank transformations. |
| Line Sequential Buffer Length | Config Object tab-Advanced settings. |
| Source Based Commit Interval | Properties tab-General settings. |
| Target Based Commit Interval | Properties tab-General settings. |
| Commit Interval | Properties tab-General settings. |
| Enable Decimal Arithmetic | Properties tab-Performance settings. The option name is Enable High Precision. |
| Constraint Based Loading | Config Object tab-Advanced settings. |
| Cache LOOKUP( ) Function | Config Object tab-Advanced settings. |
| Event-Based Scheduling-Indicator File To Wait For | Event Wait Task-Events tab-Pre Defined Event. Enter the name of the file to watch. |
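The last row of Table C-14 refers to watching for an indicator file. As a rough illustration of that idea only, the following Python sketch polls until a file exists. It does not reflect how the PowerCenter Server implements the Event Wait task; the function name, the polling interval, and the timeout parameter are all assumptions made for this example.

```python
import os
import time


def wait_for_indicator_file(path, poll_seconds=1.0, timeout_seconds=None):
    """Return True once `path` exists, or False if the timeout expires first.

    Hypothetical helper for illustration; not part of any Informatica product.
    """
    deadline = None
    if timeout_seconds is not None:
        deadline = time.monotonic() + timeout_seconds
    while True:
        # The "event" is simply the appearance of the watched file.
        if os.path.exists(path):
            return True
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
```

A caller would pass the same file name that is entered for the predefined event, and start the dependent work only when the function returns True.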

Source Location Tab

In the Server Manager, the Source Location tab displayed when you created a heterogeneous session. In the Source Name field, you could optionally edit the source database listed for each relational source.

In the Workflow Manager, source database information displays in the Connections settings of the Sources node on the Mapping tab.

Figure C-17 shows the Server Manager Source Location tab:

Figure C-17. Server Manager Source Location Tab


Time Tab

In the Server Manager, the Time tab appeared after the General tab unless the session was heterogeneous. If the session was heterogeneous, the Time tab appeared after the Source Location tab.

In the Workflow Manager, the Schedule tab contains workflow scheduling options. To configure reusable scheduler options, select Workflows-Schedulers from the menu. To configure non-reusable schedule options, select Edit-Workflow to open workflow properties and click the Schedule tab.

Figure C-18 shows the Server Manager Time tab:

In the Server Manager, you configured the following options from the Time tab:

♦ Schedule options

♦ Start options

♦ Duration options

♦ Batch option

Schedule Options

In the Server Manager, you used the Schedule options on the Time tab of the session properties to schedule the frequency of a session run.

Figure C-18. Server Manager Time Tab

In the Workflow Manager, you use the Run Options and Schedule Options on the Schedule tab of the Scheduler properties to schedule the frequency of a workflow run.

Repeat Options

In the Server Manager, the Repeat dialog box appeared when you selected Customized Repeat, then clicked Edit on the Time tab.

In the Workflow Manager, the Customized Repeat dialog box appears when you schedule a session to run on server initialization, select Customized Repeat, and then click Edit.

Figure C-19 shows the Server Manager Repeat dialog box:

Start Options

In the Server Manager, the Start options appeared below the Schedule options on the Time tab. In the Start options, you could select the session start date and session start time.

In the Workflow Manager, the Start options appear on the Schedule tab of the workflow properties.

Duration Options

In the Server Manager, Duration options appeared next to Start options on the Time tab. In Duration options, you could set the end date of a session run, the number of session runs, or schedule a session to run forever as long as it was successful.

In the Workflow Manager, End options appear next to Start options on the Scheduler tab of the workflow properties.

Figure C-19. Server Manager Repeat Dialog Box


Use Absolute Time Option

In the Server Manager, the Use Absolute Time option, or Batch option, appeared under the Start and Duration options on the Time tab. You could use the Use Absolute Time option to use the schedule as set in the session.

In the Workflow Manager, Use Absolute Time appears on the Schedule tab of the Timer object.

Log and Error Handling Tab

In the Server Manager, the Log and Error Handling tab appeared after the Time tab on the session properties.

In the Workflow Manager, log and error handling options appear on the Properties and Config Object tabs on the session properties.

Figure C-20 shows the Server Manager Log and Error Handling tab:

In the Server Manager, on the Log and Error Handling tab you could configure the following options:

♦ Log File options

♦ Parameter File option

♦ Batch Handling option

♦ Error Handling options

Log File Options

In the Server Manager, Log File options appeared at the top of the Log and Error Handling tab. You could enter a session log variable, enter a file name for the session, or indicate how session logs should be archived.

In the Workflow Manager, Log File options appear on the Properties and Config Object tabs.

Figure C-20. Server Manager Log and Error Handling Tab


Table C-15 compares the Log File options for Server Manager with the corresponding options for the Workflow Manager:

Parameter File Option

In the Server Manager, the Parameter File option appeared beneath the Log File options on the Log and Error Handling tab. You could use the Parameter File option to designate a name and directory for a parameter file.

In the Workflow Manager, the Parameter File option appears on the Properties tab-General Options settings.

Batch Handling Option

In the Server Manager, the Batch Handling option appeared under the Parameter File option on the Log and Error Handling tab.

In the Workflow Manager, use link conditions in the Workflow Designer to run a task based on the success or failure of the previous task.
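The idea of running a task based on the outcome of the previous task can be sketched in a few lines of Python. This is a conceptual model only, not Workflow Manager syntax or a PowerCenter API: the status constants, the `run_linked_tasks` function, and the task names are hypothetical, and the sketch assumes the links form no cycles.

```python
from collections import deque

SUCCEEDED = "SUCCEEDED"
FAILED = "FAILED"


def run_linked_tasks(start, tasks, links):
    """Run `start`, then each task reached through a link whose condition holds.

    tasks: dict mapping a task name to a zero-argument callable that
           returns SUCCEEDED or FAILED.
    links: list of (upstream, downstream, condition) tuples; `condition`
           receives the upstream status and returns True to follow the link.
    """
    statuses = {}
    pending = deque([start])
    while pending:
        name = pending.popleft()
        statuses[name] = tasks[name]()
        for upstream, downstream, condition in links:
            # A downstream task is queued only when its link condition is true.
            if upstream == name and condition(statuses[name]):
                pending.append(downstream)
    return statuses


# Hypothetical workflow: report after a successful load, notify after a failure.
links = [
    ("load", "report", lambda status: status == SUCCEEDED),
    ("load", "notify", lambda status: status == FAILED),
]
```

With these links, a successful `load` leads only to `report`, and a failed `load` leads only to `notify`, which mirrors the behavior the Batch Handling option used to control.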

Error Handling Options

In the Server Manager, Error Handling options appeared below the Parameter File option.

In the Workflow Manager, Error Handling options appear on the Config Object tab.

Table C-16 compares the Error Handling options for Server Manager with the corresponding options for the Workflow Manager:

Table C-15. Log File Options Comparison

| Server Manager Log and Error Properties | Property Location in Workflow Manager |
| --- | --- |
| Server Path to Log Files | Properties tab-General Options settings. Enter the path in Session Log File Directory. |
| Session Log File | Properties tab-General Options settings. Enter the log file name in Session Log File Name. |
| Save the Session Log From the Last <number> Session Runs | Config Object tab-Log Options settings. |
| Save Session Log By Timestamp | Config Object tab-Log Options settings. |

Table C-16. Error Handling Options Comparison

| Server Manager Log and Error Handling Properties | Property Location in Workflow Manager |
| --- | --- |
| Stop On | Config Object tab-Error handling settings. |
| Perform Recovery | Config Object tab-Error handling settings. |

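Table C-16. Error Handling Options Comparison (continued)

| Server Manager Log and Error Handling Properties | Property Location in Workflow Manager |
| --- | --- |
| Override Tracing | Config Object tab-Error handling settings. |
| Log and Error Handling tab-On pre-session command errors-Stop session/Continue session | Config Object tab-Error handling settings. |
| Log and Error Handling tab-On stored procedure errors-Stop session/Continue session | Config Object tab-Error handling settings. |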


Transformations Tab

In the Server Manager, the Transformations tab appeared on the session properties after the Log and Error Handling tab.

In the Workflow Manager, the settings for transformations appear on the Mapping tab-Transformations view.

Figure C-21 shows the Server Manager Transformations tab:

Table C-17 compares the Transformations tab options for Server Manager with the corresponding options for the Workflow Manager:

Figure C-21. Server Manager Transformations Tab

Table C-17. Transformations Tab Options Comparison

| Server Manager Transformations Tab Properties | Property Location in Workflow Manager |
| --- | --- |
| Session Level Override Transformations | Mapping tab-Transformations view-Transformations node. |
| Aggregate Behavior | Properties tab-Performance settings. |
| Deadlock Behavior-Retry Session On Deadlock | Properties tab-Performance settings. |
| Sort Order | Properties tab-Performance settings. |

Partitions Tab

In the Server Manager, the Partitions tab appeared in the session properties after the Transformations tab.

In the Workflow Manager, the settings for partitioning appear on the Mapping tab-Partitions view. For more information about partitioning, see “Configuring Partitioning Information” on page 351.


Index

A

ABORT function
    See also Transformation Language Reference
    session failure 200
aborted status 421
aborting
    Control tasks 147
    server handling 129
    sessions 130
    status 421
    tasks 129
    tasks in Workflow Monitor 418
    workflows 129
Aborttask
    pmcmd syntax 596
Abortworkflow
    pmcmd syntax 597
absolute time
    specifying 162
    Timer task 161
active sources
    constraint-based loading 248
    defined 259
    generating commits 278
    row error logging 260
    source-based commit 278
    transaction generators 259
    XML targets 259
adding
    tasks 92
advanced settings
    session properties 675
aggregate caches
    calculating the data cache 622
    calculating the index cache 621
    overview 621
    reinitializing 576, 674
aggregate files
    deleting 577
    moving 577
aggregate function calls
    minimizing 652
Aggregator transformation
    cache options 621
    cache partitioning 621
    caches 26, 34
    data cache 622
    index cache 621
    optimizing performance 650
    optimizing with Sorted Input 651
    partitioning guidelines 347
    performance detail 639
allocating memory
    XML sources 655
AND links 137
archiving
    session logs 471
    workflow logs 459
arrange
    workflows vertically 40
    workspace objects 71
ASCII mode
    See also Installation and Configuration Guide
    See also Unicode mode
    overview 27
    performance 661
    session behavior 16
assigning
    PowerCenter Servers 122, 198
Assignment tasks
    creating 140
    definition 140
    description 132
    using expression editor 96
    variables in 103

B

$BadFile
    definition 508
    naming convention 496, 520
    using 509
blocking
    definition 23
blocking source data
    PowerCenter Server handling 23
buffer block size
    configuring 677
    optimizing 655, 657
buffer memory
    allocating 655
    buffer blocks 25
    DTM process 25
bulk loading
    commit interval 253
    data driven session 252
    DB2 642
    DB2 guidelines 253
    Oracle 643
    Oracle guidelines 253
    session properties 252, 697
    Sybase IQ 643
    targets 642
    test load 244
    using user-defined commit 283

C

cache files
    locating 577
    naming convention 615
    permissions 28
cache partitioning
    Aggregator transformation 621
    described 359
    incremental aggregation 621
    Joiner transformation 624
    Lookup transformation 391
    Rank transformation 620
caches
    Aggregator transformation 621
    calculating Aggregator data cache 622
    calculating Aggregator index cache 621
    calculating Joiner data cache 626
    calculating Joiner index cache 625
    calculating Lookup data cache 631
    calculating Lookup index cache 629
    calculating Rank data cache 633
    calculating Rank index cache 632
    default directory 34
    files for index and data 614
    files, overview 34
    Joiner transformation 624
    Lookup transformation 628
    memory 26, 614
    memory usage 26
    optimizing 658
    overview 28, 614
    resetting with real-time sessions 288
    session cache files 614
    transformation 34
caching
    lookup functions 676
Char datatypes
    removing trailing blanks for optimization 653
check point interval
    optimizing 642
checking in
    versioned objects 74
checking out
    versioned objects 74
COBOL sources
    error handling 227
    numeric data handling 229
code page compatibility
    See also Installation and Configuration Guide
    multiple file sources 230
    targets 235
code pages
    See also Installation and Configuration Guide
    data movement modes 27
    database connections 54, 234
    delimited source 224
    delimited target 267, 703
    external loader files 524
    fixed-width sources 222
    fixed-width target 266, 702
    relaxed validation 55
    validation 12
    viewing the session log 475
color
    setting 42
    workspace 42
command line mode for pmcmd
    connecting 589
    return codes 590
    using 589
command line program
    See pmcmd
Command task
    multiple UNIX commands 145
Command tasks
    creating 143
    definition 143
    description 132
    executing commands 145
    promoting to reusable 145
    Run if Previous Completed 145
    using server variables 188, 193
    using session parameters 143
comments
    adding in Expression Editor 97
commit interval
    bulk loading 253
    configuring 292
    description 276
    optimizing 655, 658
    source- and target-based 276
commit source
    source-based commit 278
commit type
    configuring 672
committing data
    target connect groups 278
    transaction control 283
common logic
    factoring 652
comparing objects
    See also Designer Guide
    See also Repository Guide
    sessions 79
    tasks 79
    workflows 79
    worklets 79
Components tab
    properties 710
concurrent connections
    in partitioned pipelines 379
Config Object tab
    properties 675
configuring
    error handling options 493
connect string
    examples 54
    syntax 54
connection objects
    See also Repository Guide
    assigning permissions 51
    definition 51
    deleting 59
connection settings
    applying to all session instances 180
    targets 695
connections
    copy as 59, 60
    copying a relational database connection 59
    external loader 551
    FTP 561
    multiple targets 274
    relational database 56
    replacing a relational database connection 62
    sources 211
    targets 237
connectivity
    See also Installation and Configuration Guide
    connect string examples 54
    overview 5
    server grids 447
constraint-based loading
    active sources 248
    configuring 248
    enabling 251
    key relationships 248
    session property 676
    target connection groups 249
    Update Strategy transformations 249
control file
    overriding Teradata 539
    overview 33
    permissions 28
Control tasks
    definition 147
    description 132
    options 148
    stopping or aborting the workflow 129
copying
    repository objects 77
counters
    BufferInput_efficiency 640
    BufferOutput_efficiency 640
    overview 437
    Rowsinlookupcache 639
    Transformation_errorrows 639
    Transformation_readfromdisk 639
    Transformation_writetodisk 639
CPU usage
    PowerCenter Server 24
creating
    external loader connections 551
    FTP sessions 565
    server grids 451
    sessions 175
    workflows 91
CUME
    partitioning restrictions 395
Custom transformation
    partitioning guidelines 396
customized repeat
    daily 117
    editing 115
    monthly 117
    options 116
    repeat every 117
    weekly 117

D

data
    capturing incremental source changes 574, 579
data caches
    Aggregator transformation 622
    description 614
    for incremental aggregation 577
    memory usage 26
    optimizing 655, 658
    Rank transformation 633
data driven
    bulk loading 252
data files
    creating directory 579
    finding 577
data flow
    See pipeline
data movement mode
    See also ASCII mode
    See also Installation and Configuration Guide
    See also Unicode mode
    affecting incremental aggregation 577
    overview 27
database connections
    See also Installation and Configuration Guide
    configuring 56
    copying a relational database connection 59
    domain name 58
    packet size 58
    privileges required to create 53
    replacing a relational database connection 62
    rollback segment 58
    session parameter 499
    use trusted connection 58
    using Oracle OS Authentication 53
databases
    connection requirements 57
    connectivity overview 46
    environment SQL 55
    optimizing sources 645
    optimizing targets 642
    selecting code pages 54
    setting up connections 53
datatypes
    See also Designer Guide
    Char 653
    Decimal 269
    Double 269
    Float 269
    Integer 269
    minimizing conversions 648
    Money 269
    Numeric 269
    padding bytes for fixed-width targets 268
    Real 269
    Varchar 653
dates
    configuring 38
    formats 38
DB2
    bulk loading 642
    bulk loading guidelines 253
    commit interval 253
    See IBM DB2
$DBConnection
    definition 499
    naming convention 496, 520
    using 499
deadlock
    retry session 674
deadlock retry
    See also Installation and Configuration Guide
    configuring 246
    target connection groups 257
Debugger
    restrictions in partitioned pipelines 396
decimal arithmetic
    See high precision
Decision tasks
    creating 151
    decision condition variable 149
    definition 149
    description 132
    example 149
    using Expression Editor 96
    variables in 103
DECODE function
    See also Transformation Language Reference
    using for optimization 653
default remote directories
    for FTP connections 561
deleting
    connection objects 59
    servers 50
    workflows 97
delimited flat files
    code page 691
    code page, sources 224
    code page, targets 267
    consecutive delimiters 692
    escape character 691
    escape character, sources 224
    numeric data handling 229
    quote character 691
    quote character, sources 224
    quote character, targets 267
    session properties, sources 222
    session properties, targets 266
    sources 691
delimited sources
    number of rows to skip 692
delimited targets
    session properties 703
delimiter
    session properties, sources 222
    session properties, targets 266
description
    repository objects 73
directories
    for historical aggregate data 579
    server defaults 46
    server variables 46
    workspace file 41
disabled
    status 421
disabling
    tasks 137
    workflows 118
displaying
    customizing windows 69
    date time format 38
    Expression Editor 97
    fonts 42
    options 39
    servers in Workflow Monitor 406
    show solid lines for links 42
    toolbars 69
    workspace color 42
documentation
    conventions xlix
    description xlviii
    online xlix
domain name 58
dropping
    indexes 248
DTM (Data Transformation Manager)
    buffer memory 25
    overview 3
    post-session email 10
    process 7, 11
    running sessions and workflows 7
    transformation statistics example 469
DTM Buffer Pool Size
    optimizing 655
    session property 674
    tuning 656

E

edit
    delimiter 690

edit null characters
    session properties 702

editing
    delimiter 702
    session privileges 178
    sessions 177

emailattaching files 333, 342configuring a user on Windows 322, 342configuring the PowerCenter Server on UNIX 321configuring the PowerCenter Server on Windows 322distribution lists 326email variables 333format tags 333logon network security on Windows 325MIME format 320multiple recipients 326on failure 332on success 332overview 320post-session 332rmail 321server variables 333session properties 714specifying a Microsoft Outlook profile 327suspending workflows 339text message 328tips 342user name 328using other mail programs 343using server variables 333Windows service startup account 322workflows 341worklets 341

Email taskscreating 329description 132overview 328See also email 328suspension email 128

email variablesoverview 333

Enable Past Events option 159

enabling enhanced security 44

end of file
    transaction control 284

end options
    end after 116
    end on 116
    forever 116

enhanced securityenabling 44enabling for connection objects 44

environment SQL
    configuring 55
    guidelines for entering 55

environment variables
    PM_CODEPAGENAME 585
    PM_HOME 587
    PMTOOL_DATEFORMAT 585
    repository username and password 586

error handling 186COBOL sources 227error log files 489fixed-width file 227options 493overview 201PMError_MSG table schema 485PMError_ROWDATA table schema 483PMError_Session table schema 486pre- and post-session SQL 186settings 679transaction control 284

error logoptions 494session errors 201

error log files 489

error log tables
    creating 483
    overview 483

error loggingoverview 482

error logsmessages 29

error messagesexternal loader 527

error threshold$PMSessionErrorThreshold 47pipeline partitioning 200stop on errors 200

errorsSee also Troubleshooting Guideeliminating to improve performance 648fatal 200minimizing tracing level to improve performance 659pre-session shell command 193stopping on 679threshold 200validating in Expression Editor 97

Event-Raise tasksconfiguring 155declaring user-defined event 155definition 153description 132in worklets 167

eventsin worklets 167pre-defined events 153user-defined events 153

Event-Wait tasksdefinition 153description 132for pre-defined events 158for user-defined events 157waiting for past events 159working with 156

Expression Editoradding comments 97displaying 97syntax colors 97using 96validating 119validating expressions using 97

expressionsoptimizing 652validating 97

external loaderbehavior 526code page 524connections 551DB2 528error messages 527loading multibyte data 533, 535on Windows systems 526Oracle 533overview 524performance 643permissions 525PowerCenter Server support 524privileges required to create connection 525session properties 682, 695setting up Workflow Manager 553Sybase IQ 535Teradata 538using with partitioned pipeline 380

External Procedure transformationSee also Designer Guidepartitioning guidelines 396

F

fail parent workflow 138

failed status 421

failing workflows
    failing parent workflows 148
    using Control task 148

fatal errors
    session failure 200

file list
    creating for multiple sources 230
    creating for partitioned sources 375
    using for source file 230

file serverfor multiple PowerCenter Servers 445setting up for multiple servers 445

file sourcesnumeric data handling 229partitioning 374server handling 226, 229session properties 218

file targetspartitioning 380session properties 261

filter conditionsin partitioned pipelines 372

filteringdeleted tasks in Workflow Monitor 406servers in Workflow Monitor 406tasks in Gantt Chart view 405tasks in Task View 431

filtersoptimizing 650

finding objectsWorkflow Manager 70

fixed-width filescode page 689code page, sources 222code page, targets 266error handling 227multibyte character handling 227null character 689null characters, sources 222null characters, targets 266numeric data handling 229padded bytes in fixed-width targets 268source session properties 220target session properties 265writing to 268, 269

fixed-width sourcessession properties 689

fixed-width targetssession properties 702

flat file definitions
    escape character, sources 224
    PowerCenter Server handling, targets 268
    quote character, sources 224
    quote character, targets 267
    session properties, sources 218
    session properties, targets 261

flat filesSee also Designer Guidecode page, sources 222code page, targets 266delimiter, sources 224delimiter, targets 267increasing performance 660multibyte data 270null characters, sources 222null characters, targets 266numeric data handling 229output file session parameter 504output files 33precision 270precision, targets 269shift-sensitive target 271source file session parameter 502

fontssetting 42

format optionschanging the font 42color 42configuring 42date and time 38reset all 42schedule 38show solid lines for links 42Timer task 38

FTP (File Transfer Protocol)accessing source files 565accessing target files 568connecting to file targets 380connection names 561connection options 563creating a session 565defining connections 561defining default remote directory 561defining host names 561mainframe restrictions 560overview 560privileges required to create connections 562session properties 682, 695

functionsSee also Transformation Language Referenceminimizing for optimization 653

G

Gantt Chart
    configuring 411
    filtering 405
    listing tasks and workflows 424
    navigating 425
    opening and closing folders 407
    organizing 425
    overview 402
    searching 427
    using 423
    zooming 426

general optionsarranging workflow vertically 40configuring 39in-place editing 40launching Workflow Monitor 41open editor 41panning windows 40receive notification from server 41reload task or workflow 40session properties 668show expression on a link 41show full name of task 41

General tab in session propertiesFTP properties 742in Server Manager 737in Workflow Manager 668session commands 750source options 738target options 743

General tab of session propertiesgeneral options 737performance options 752

generatingcommits with source-based commit 278

Getrunningsessionsdetailspmcmd syntax 598

Getserverdetailspmcmd syntax 599

Getserverpropertiespmcmd syntax 599

Getsessionstatisticspmcmd syntax 600

Gettaskdetailspmcmd syntax 601

Getworkflowdetailspmcmd syntax 601

globalization
    See also Installation and Configuration Guide
    database connections 234
    overview 234
    targets 234

H

hash partitioning
    adding hash keys 362
    hash auto-keys partitioning 361
    hash user keys partitioning 362
    overview 348, 361

Helppmcmd syntax 602

heterogeneous sourcesdefined 208

heterogeneous targetsoverview 274

high precisiondisabling 658enabling 674handling 204optimizing 655

history namesin Workflow Monitor 419

host namesfor FTP connections 561registering the PowerCenter Server 49

I

IBM DB2
    connect string example 54

icon
    Workflow Monitor 404
    worklet validation 171

IIF expressionsSee also Transformation Language Referenceoptimizing 653

incremental aggregation
    See also Installation and Configuration Guide
    cache partitioning 621
    changing server code page 577
    changing server data movement mode 577
    changing session sort order 577
    configuring 674
    configuring the session 579
    deleting files 577
    files 34
    moving files 577
    overview 574
    partitioning data 578
    performance 651
    preparing to enable 579
    processing 575
    reinitializing cache 576

incremental changescapturing 579

index cachesAggregator transformation 621description 614for incremental aggregation 577memory usage 26optimizing 655, 658Rank transformation 632

indexescreating directory 579dropping for target tables 248finding 577optimizing by dropping 642recreating for target tables 248

indicator filesdescription 33pre-defined events 156session output 33

Informaticadocumentation xlviiiWebzine l

Informixconnect string syntax 54row-level locking 379

in-place editing 40

$InputFile
    definition 502
    naming convention 496, 520
    using 503, 507

interactive mode for pmcmdconnecting 592setting defaults 592

J

joiner cache
    overview 624

Joiner transformation
    cache partitioning 624
    caches 26, 34, 624
    joining sorted flat files 385
    joining sorted relational data 387
    optimizing 651
    optimizing performance 650
    partitioning guidelines 396
    performance detail 639
    threads created 19

K

key constraints
    optimizing by dropping 642

key range partitioning 348, 363

keys
    constraint-based loading 248

L

launch
    Workflow Monitor 41, 404

line sequential buffer length
    configuring 677
    sources 225

linksAND 137condition 92example link condition 94linking tasks concurrently 93linking tasks sequentially 94loops 92OR 137show expression on a link 41show solid lines 42specifying condition 94using Expression Editor 96variables in 103working with 92

List Tasksin Workflow Monitor 424

Load Managercreating log files 11memory usage 24overview 3parameters 25post-session email 10process 7, 8running sessions and workflows 7scheduling workflows 8validating code pages 12

load summarysessions 467

local variablesreplacing sub-expressions 652

Log and Error Handling tabbatch handling option 759error handling option 759log file options 758parameter file option 759Server Manager session properties 758

log filesSee session logs, workflow logsSee also Installation and Configuration Guideeditor for Workflow Monitor 410server variable for 46session log 671

log optionssettings 677

logsserver 28session 31workflow 30

lookup cachecalculating size 629, 631overview 628persistent 35pipeline partitioning 628ports included 628session property 676

lookup cachesSee also Designer Guideenabling 649query created 628

LOOKUP functionSee also Transformation Language Referenceminimizing for optimization 653

Lookup SQL Override optionreducing cache size 649

Lookup transformation See also Designer Guidecache partitioning 391caches 26, 34, 628calculating cache size 628, 629, 631enabling caching 649optimizing 639, 649optimizing lookup condition 649optimizing multiple lookup expressions 650optimizing with indexing 649

loops in workflow 92

M

mapping bottlenecks
    identify 638

mapping parametersSee also Designer Guidein session properties 203overriding 203

mapping threadsdescription 14

mapping variablesSee also Designer Guidein partitioned pipelines 394

mappingsdefinition 2factoring common logic 652identify bottlenecks 638increasing performance 636single-pass reading 647

master servers 446

master thread
    description 14

Maximum Days
    Workflow Monitor 410

maximum sessions
    See also Installation and Configuration Guide
    parameter, description 25

Maximum Workflow RunsWorkflow Monitor 410

memorycaches 614DTM buffer 25increasing to avoid paging 662

merge target filessession properties 699

merging target files 380, 382

message queue
    using with partitioned pipeline 380

metadata extensions
    creating 82
    deleting 85
    editing 84
    overview 82
    session properties 718

Microsoft Accesspipeline partitioning 379

Microsoft Outlookconfiguring an email user 322, 342configuring the PowerCenter Server 322

Microsoft SQL Serverbulk loading 642commit interval 253connect string syntax 54optimizing 646

MIME formatemail 320

monitoringdata flow 639session details 434

MOVINGAVGSee also Transformation Language Referencepartitioning restrictions 395

MOVINGSUMSee also Transformation Language Referencepartitioning restrictions 395

multibyte datacharacter handling 227Oracle external loader 533Sybase IQ external loader 535writing to files 270

multiple serversoverview 444

multiple sessions 196

N

naming convention
    See also Getting Started Guide

naming conventions
    session parameters 496, 520

native connect string
    See connect string

navigating
    workspace 69

network packets
    increasing 643, 646

non-persistent variables 110

non-reusable tasks
    inherited changes 136
    promoting to reusable 136

normal loadingsession properties 697

Normal tracing levelsdefinition 473

Normalizer transformationpartitioning guidelines 347

notificationgeneral option 41

null charactersediting 702file targets 266server handling 227session properties, targets 265targets 702

numeric operationsoptimizing by using 653

numeric valuesreading from sources 229

O

open transaction
    defined 287

operators
    using for optimization 653

optimizing
    block size 657
    buffer block size 655
    choosing numeric vs. string operations 653
    commit interval 655, 658
    data cache 655
    data caches 658
    data flow 440, 637, 639
    disabling high precision 658
    dropping indexes and key constraints 642
    DTM Buffer Pool Size 655
    eliminating transformation errors 648
    expressions 652
    factoring out common logic 652
    filters 650
    high precision 655
    IIF expressions 653
    increasing checkpoint interval 642
    increasing network packet size 646
    index cache 655, 658
    Joiner transformation 651
    Lookup transformation 649, 650
    mapping 647
    minimizing aggregate function calls 652
    minimizing datatype conversions 648
    minimizing error tracing 659
    pipeline partitioning 663
    removing trailing blank spaces 653
    replacing sub-expressions with local variables 652
    sessions 655
    single-pass reading 647
    source database 645
    system-level 660
    target database 642
    Tracing Level 655
    using DECODE vs. LOOKUP expressions 653
    using operators vs. functions 653

optimizing performanceAggregator transformation 650

OR links 137

Oracle
    bulk loading 642
    bulk loading guidelines 253
    commit intervals 253
    connect string syntax 54
    connection with OS Authentication 53

Oracle external loaderattributes 533bulk loading 643connecting with OS Authentication 552data precision 533delimited flat file target 533external loader connections 551external loader support 524, 533fixed-width flat file target 533multibyte data 533null constraint 533partitioned target files 533reject file 534

output filesoverview 28, 33permissions 28session parameter 504session properties 700targets 263

$OutputFiledefinition 504naming convention 496, 520using 505

overrideTeradata loader control file 539tracing levels 473, 679

owner nametruncating target tables 245

P

packet size 58

paging
    eliminating 662

parameter files
    format 513
    location 518
    session 512
    specifying in session 518
    using with pmcmd starttask 607
    using with pmcmd startworkflow 608

parameterssession 496

partition keysadding 358, 362, 364adding key ranges 365

partition pointsadding and deleting 353default 17description 17, 346Joiner transformation 384

partition typesdescription 348

partitioningSee pipeline partitioning

partitioning dataincremental aggregation 578

partitioning restrictions
    Debugger 396
    Informix 379
    numerical functions 395
    PowerCenter Connect for IBM MQSeries restrictions 397
    PowerCenter Connect for PeopleSoft restrictions 397
    PowerCenter Connect for SAP BW 397
    PowerCenter Connect for SAP R/3 397
    PowerCenter Connect for Siebel 398
    relational targets 395
    Sybase IQ 379, 395
    transformations 395
    unconnected transformations 353
    XML targets 396

Partitioning tabin the Server Manager 762in the Workflow Manager 762

Partitionsproperties 352

partitionsadding and deleting 356description 18, 348

Partitions viewsproperties 351

pass-through pipelineoverview 15

performanceSee also optimizingcommit interval 278detail file 31identifying bottlenecks 637monitoring 436server data movement mode 661Sybase IQ 643tuning, overview 636

performance datacollecting 674

performance detail filescreating 436enabling session monitoring 436permissions 28understanding counters 437viewing 436

performance settingssession properties 674

permissionsconnection objects 51creating a session 175database 51deleting a PowerCenter Server 50editing sessions 177external loader 525FTP connections 561FTP session 565output and log files 28recovery files 28scheduling 90Workflow Monitor tasks 403

persistent lookup cachesession output 35

persistent variables 110in worklets 169

pingingpmcmd syntax 602PowerCenter Server in Workflow Monitor 405

Pingserverpmcmd syntax 602

pipeline partitioning
    adding and deleting partitions 356
    adding hash keys 362
    adding key ranges 365
    adding partition points 353
    caching Lookup transformations 628
    concurrent connections 379
    configuring a session 351
    configuring for sorted data 384
    configuring to optimize join performance 384
    database compatibility 379
    description 346
    error threshold 200
    example of use 349
    external loaders 380, 526
    file lists 375
    file sources 374
    file targets 380
    filter conditions 372
    hash auto-keys partitioning 361
    hash partitioning 361
    hash user keys partitioning 362
    Joiner transformation 384
    key range 363
    loading to Informix 379
    mapping variables 394
    merge target files 699
    merging target files 380, 382
    message queues 380
    multiple CPUs 3
    multiple source pipelines 19
    numerical functions restrictions 395
    object validation 396
    optimizing performance 663
    optimizing source databases 663
    optimizing target databases 664
    overview 3
    partition keys 358, 362, 364
    partition types overview 356
    partitioning indirect files 375
    pass-through partitioning 367
    recovery 200
    reject file 476
    relational sources 371
    relational targets 378
    round-robin partitioning 360
    rules and restrictions 395, 398
    session properties 705
    sorted flat files 385
    sorted relational data 387
    Sorter transformation 389, 392
    SQL queries 371
    symmetric processing platform 24
    threads and partitions 18
    threads created 16
    Transaction Control transformation 356

pipelinesSee source pipelinesactive sources 259data flow monitoring 440, 637, 639description 346

PM_CODEPAGENAMEusing with pmcmd 585

PM_RECOVERY tableformat 299

PM_TGT_RUN_ID tableformat 299

pmcmd
    aborttask 596
    abortworkflow 597
    command line mode 589
    command parameters 594
    commands, list 582
    commands, reference 594
    environment variables 585
    getserverdetails 599
    getserverproperties 599
    getsessionstatistics 600
    gettaskdetails 601
    getworkflowdetails 601
    help 602
    interactive mode 592
    overview 582
    parameter files 607, 608
    pingserver 602
    resumeworkflow 603
    return codes 300
    setfolder 604
    setnowait 605
    setwait 605
    showsettings 605
    shutdownserver 605
    starttask 606
    startworkflow 607
    stoptask 609
    stopworkflow 609
    syntax 595
    unsetfolder 610
    version 611
    waittask 611
    waitworkflow 611
    writing scripts 589

PMError_MSG table schema 485

PMError_ROWDATA table schema 483

PMError_Session table schema 486

$PMFailureEmailUser
    definition 333
    tips 342

PmNullPasswdreserved word 53

PmNullUserreserved word 53

pmserverprocess 11

$PMSessionLogCountsaving a number of logs 471

$PMSessionLogDirconfiguring the session log 471definition 469

$PMSessionLogFile
    definition 497
    using 498

$PMSuccessEmailUser
    definition 333
    tips 342

PMTOOL_DATEFORMATusing with pmcmd 585

$PMWorkflowLogDirdefinition 459

$PMWorkflowLogCountsaving a number of logs 460

post-session commandsession properties 711shell command properties 714

post-session emailoverview 33, 332See also emailsession options 716session properties 711

post-session shell commandconfiguring non-reusable 189configuring reusable 192using 188

post-session SQL commands 186

post-session threads
    description 14

PowerCenter Connect for IBM MQSeries
    partitioning restrictions 397

PowerCenter Connect for PeopleSoft
    partitioning restrictions 397

PowerCenter Connect for SAP BW
    partitioning restrictions 397

PowerCenter Connect for SAP R/3
    partitioning restrictions 397

PowerCenter Connect for Siebel
    partitioning restrictions 398

PowerCenter Server 22
    architecture 2
    assigning sessions 198
    assigning workflows 122
    blocking data 23
    changing servers 445
    commit interval overview 276
    configuring for multiple servers 445
    connecting in Workflow Monitor 405
    connectivity overview 5, 46
    creating server grids 451
    data movement modes 27
    deleting 50
    external loader support 524
    filtering in Workflow Monitor 406
    handling file targets 268
    logs 28
    messages 29
    monitoring 436
    multiple servers overview 444
    multiple source file list 230
    online and offline mode 405
    output files 33
    performance detail file 31
    permissions to delete 50
    pinging in Workflow Monitor 405
    privileges required to register 46
    processing data 22
    reading sources 22
    registering 46, 48
    removing assigned sessions 199
    removing assigned workflows 123
    reporting session statistics 468
    server grids overview 446
    system resources 24
    tracing levels 473
    truncating target tables 245
    using FTP 561
    using multiple to increase performance 661
    using server grids to increase performance 661
    variables for 46

pre- and post-session SQLentering 186guidelines 186

precisionflat files 270writing to file targets 269

pre-defined eventswaiting for 158

pre-defined variablesin Decision tasks 149

pre-session shell commandconfiguring non-reusable 189configuring reusable 192errors 193session properties 711using 188

pre-session SQL commands 186

pre-session threads
    description 14

privileges
    See also permissions
    See also Repository Guide
    scheduling 90
    session 175
    workflow 90
    Workflow Monitor tasks 403
    workflow operator 90

Properties tab in session properties
    in Workflow Manager 670

Q

Quit
    pmcmd syntax 602

quoted identifiers
    reserved words 255

R

rank cache
    calculating data cache 633
    calculating index cache 632
    location 632
    overview 632
    size 632

Rank transformationSee also Transformation Guidecache partitioning 620caches 26, 34, 632partitioning guidelines 347performance detail 639

reader threadsdescription 14, 15

readingsources 22

real-time sessionstransformation scope 288

recoveringpipeline partitioning 200

recovery
    completing unrecoverable sessions 316
    configuring mappings 297
    configuring the session 297
    configuring the target database 298
    configuring the workflow 298
    files, permissions 28
    overview 296
    PM_RECOVERY table format 299
    PM_TGT_RUN_ID table format 299
    pmcmd return codes 300
    recover from task 308
    recover task 311
    recovering a failed workflow 308
    recovering a session task 311
    recovering a suspended workflow 305
    recovery table layout 314
    resume/recover 305
    server handling 314

recovery filespermissions 28

recreatingindexes 248

registeringPowerCenter Server 46, 48

registering serverSee also Installation and Configuration Guide

reinitializingaggregate cache 576

reject filechanging names 476column indicators 478locating 456, 476Oracle external loader 534overview 32permissions 28pipeline partitioning 476reading 477row indicators 478session parameter 508session properties 243, 263, 698, 700transaction control 284viewing 476

relational connectionsSee relational databases

relational databasesconfiguring a connection 56copying a relational database connection 59replacing a relational database connection 62rollback segment 58

relational sourcespartitioning 371session properties 214

relational targetspartitioning 378partitioning restrictions 395session properties 240, 697

Relative timespecifying 162Timer task 161

reload task or workflowconfiguring 40

renamerepository objects 73

repositoriesadding 73connecting in Workflow Monitor 405enter description 73

repository objectsconfiguring 73rename 73

Repository Servernotification 41notification in Workflow Monitor 410

requirementsserver grids 448

reserved wordsgenerating SQL with 255resword.txt 255

reserved words filecreating 256

reset all 42

restarting
    in Workflow Monitor 416

Resumeworkflow
    pmcmd syntax 603

Resumeworklet
    pmcmd syntax 603

reusable tasks
    inherited changes 136
    reverting changes 136

reverting changestasks 136

rmailSee also emailconfiguring 321

rollback segment 58

rolling back data
    transaction control 283

round-robin partitioning 348, 360

row error log files
    permissions 28

row error logging
    active sources 260

row indicators
    reject file 478

rows to skip
    delimited files 692

Run if Previous Completed
    in Command Tasks 145
    session command 714

run optionsrun continuously 115run on demand 115server initialization 115

running status 421

running, sessions 197

running, workflows 122

S

saving
    session logs 471
    workflow logs 459

scheduled status 421

scheduling
    configuring 114
    creating reusable scheduler 114
    disabling workflows 118
    editing 117
    end options 116
    error message 113
    permission 90
    run every 115
    run once 115
    run options 115
    schedule options 115
    start date 116
    start time 116
    workflows 112

searchingfor versioned objects in the Workflow Manager 76Workflow Manager 70Workflow Monitor 427

Sequence Generator transformationpartitioning guidelines 353, 396

serverSee PowerCenter ServerSee also database-specific serverselecting 122, 197

server code pageSee also PowerCenter Serveraffecting incremental aggregation 577

Server Grid Browser 453

Server Grid Editor 452

server grids
    connectivity 447
    creating 451
    definition 444
    distributing sessions 446
    increasing performance 661
    master servers 446
    overview 446
    requirements 448
    worker servers 446

server handlingfile targets 268fixed-width targets 269, 270multibyte data to file targets 271shift-sensitive data, targets 271

server logsmessages 29overview 28

Server Manager session propertiesGeneral tab 737Log and Error Handling tab 758Partitioning tab 762Source Location tab 754Time tab 755Transformations tab 761

server variablesdescription 46email 333for multiple servers 445in Command tasks 188, 193list 47log files 46

serversassigned 444non-associated 444

session command settingssession properties 711

session detailsmonitoring sessions 434

session errors 201

session logs
    archiving 471
    changing location 498
    changing locations 471
    changing name 497
    changing names 471
    code page 475
    codes 463
    creation 11
    default name 470
    editing 419
    external loader error messages 527
    generating using UTF-8 463
    load summary 467
    locating 456, 469
    location 671
    log file settings 469, 470, 472, 474
    overview 31
    parameter 497
    permissions 28
    reading 463
    sample 466
    saving 678
    session details 31
    session parameter 497
    thread identification 465
    timestamp 472
    tracing levels 473
    transformation statistics 469
    viewing 474
    viewing dynamically 419
    viewing in Workflow Monitor 419

session outputcache files 34control file 33incremental aggregation files 34indicator file 33performance detail file 31persistent lookup cache 35post-session email 33PowerCenter Server log 28reject file 32session logs 31target output file 33

session parametersdatabase connection parameter 499defining 512in Command tasks 143naming conventions 496, 520overview 496reject file parameter 508session log parameter 497session parameter file 512source file parameter 502target file parameter 504

session properties
    Components tab 710
    Config Object tab 675
    constraint-based loading 251
    delimited files, sources 222
    delimited files, targets 266
    edit delimiter 690, 702
    edit null character 702
    email 332, 714
    external loader 682, 695
    fixed-width files, sources 220
    fixed-width files, targets 265
    FTP files 682, 695
    general settings 668
    General tab 668
    log files 469, 470, 472, 474
    Metadata Extensions tab 718
    null character, targets 265
    on failure email 332
    on success email 332
    output files, flat file 700
    partition attributes 351, 352
    Partitions View 705
    performance settings 674
    post-session email 332
    post-session shell command 714
    Properties tab 670
    reject file, flat file 263, 700
    reject file, relational 243, 698
    relational sources 214
    relational targets 240
    session command settings 711
    session retry on deadlock 246
    sort order 577
    source connections 211
    sources 210
    table name prefix 254
    target connection settings 682, 695
    target connections 237
    target load options 252, 697
    target-based commit 292
    targets 236
    Transformation node 703
    transformations 703

session properties comparisonoverview 736

session retry on deadlockSee also Installation and Configuration Guideoverview 246

sessions
    See also session logs
    See also session properties
    aborting 130, 200
    apply attributes to all instances 178
    assigning PowerCenter Servers 198
    caches 28
    configuring for multiple source files 231
    configuring to optimize join performance 384
    creating 175
    creating a session configuration object 183
    definition 2, 174
    description 132
    distributing in server grids 446
    DTM buffer memory 25
    editing 177
    editing privileges 178
    eliminating paging 621
    email 320
    enabling monitoring 436
    external loading 524, 553
    failure 200
    high-precision data 204
    identifying bottlenecks 639
    metadata extensions in 82
    monitoring counters 437
    multiple source files 230
    optimizing 636, 655
    output files 28
    overview 174
    parameter file 512
    parameters 496
    performance detail file 31
    performance tuning 636
    properties reference 667
    read-only 175
    removing assigned PowerCenter Servers 199
    running 197
    runtime operations overview 7
    session details file 31
    starting 197
    stopping 130, 200
    test load 244, 264
    truncating target tables 245
    using FTP 565
    validating 195
    viewing performance details 436

Setfolderpmcmd syntax 604

Setnowaitpmcmd syntax 605

Setwaitpmcmd syntax 605

shared memoryLoad Manager 24

shell commandsexecuting in Command tasks 145make reusable 191post-session 188post-session properties 714pre-session 188using Command tasks 143using server variables 188, 193using session parameters 143

Showsettingspmcmd syntax 605

Shutdownserverpmcmd syntax 605

single-pass readingdefinition 647

sort orderSee also session propertiesaffecting incremental aggregation 577

sorted flat filespartitioning for optimized join performance 385

sorted portscaching requirements 621

sorted relational datapartitioning for optimized join performance 387

Sorter transformationpartitioning 392partitioning for optimized join performance 389

$Sourcesession properties 672

source bottlenecksusing a database query to identify 638using a read test session to identify 638using filter transformation to identify 637

source datacapturing changes for aggregation 574

source databasesdatabase connection session parameter 499identifying bottlenecks 637optimizing 645optimizing by partitioning 663optimizing the query 645optimizing with conditional filters 646

source filesaccessing through FTP 560, 565configuring for multiple files 230, 231delimited properties 691fixed-width properties 689session parameter 502session properties 220, 687using parameters 502, 506

source locationsession properties 220, 687

Source Location tabin the Workflow Manager 754Server Manager session properties 754

source pipelinesdescription 346pass-through 15reading 22stages 17target load order groups 22threads created 19with Joiner transformations 19

Source Qualifier transformationpartitioning guidelines 347

source-based commitactive sources 278description 278

sources
    code page 224
    code page, flat file 222
    connections 211
    delimiters 224
    escape character 691
    line sequential buffer length 225
    multiple sources in a session 230
    null character 689
    null character handling 227
    null characters 222
    overriding SQL query, session 216
    partitioning 371, 374
    quote character 691
    reading 22
    session properties 210
    specifying code page 689, 691

SQLconfiguring environment SQL 55guidelines for entering environment SQL 55

SQL queriesin partitioned pipelines 371

stagesdescription 17

staging areasremoving to improve performance 659

start date, scheduling 116

Start tasks, definition 88

start time, scheduling 116

starting
    selecting a server 122, 197
    sessions 197
    start from task 124
    starting a part of a workflow 124
    starting tasks 125
    starting workflows using Workflow Manager 124
    Workflow Monitor 404
    workflows 122

Starttaskpmcmd syntax 606using a parameter file 607

Startworkflowpmcmd syntax 607using a parameter file 608

statisticsfor Workflow Monitor 408viewing 408

status
    aborted 421
    aborting 421
    disabled 421
    failed 421
    in Workflow Monitor 421
    running 421
    scheduled 421
    stopped 421
    stopping 421
    succeeded 421
    suspended 127, 421
    suspending 127, 421
    tasks 421
    terminated 421
    unscheduled 421
    waiting 421
    workflows 421

stop on$PMSessionErrorThreshold 47error threshold 200errors 679pre- and post-session SQL errors 186

stopped status 421

stopping
    PowerCenter Server See Installation and Configuration Guide
    in Workflow Monitor 418
    server handling 129
    sessions 130
    tasks 129
    using Control tasks 147
    workflows 129

stopping status 421

Stoptask
    pmcmd syntax 609

Stopworkflow
    pmcmd syntax 609

string operations
    minimizing for performance 653

sub-expressions
    replacing with local variables 652

succeeded status 421

Suspend On Error option 127

suspended status 127, 421

suspending
    behavior 127
    email 128
    resume in Workflow Monitor 417
    status 127
    workflows 127
    worklets 164

suspending status 421

suspension email 339

Sybase
    commit interval 253

Sybase IQ
    partitioning restrictions 379, 395

Sybase IQ external loader
    attributes 536
    bulk loading 643
    connections 551
    data precision 535
    delimited flat file targets 536
    fixed-width flat file targets 535
    multibyte data 535
    optional quotes 535
    overview 535
    support 524

Sybase SQL Server
    bulk loading 642
    connect string example 54
    optimizing 646

symmetric processing platform
    pipeline partitioning 24

system bottlenecks
    identifying 640
    UNIX 641
    Windows 640

system-level optimization
    improving network speed 660
    overview 660
    using additional CPUs 661

T

table name prefix
    target owner 254

table owner name
    session properties 216
    targets 254

$Target
    session properties 672

target connect groups
    committing data 278

target connection group
    Transaction Control transformation 289

target connection groups
    constraint-based loading 249
    defined 257

target connection settings
    session properties 682, 695

target databases
    bulk loading 642
    database connection session parameter 499
    identifying bottlenecks 637
    optimizing 642
    optimizing by partitioning 664
    optimizing Oracle target database 643

target files
    delimited 703
    fixed-width 702

target load order
    constraint-based loading 249
    groups 22

target load order groups
    defined 22

target owner
    table name prefix 254

target properties
    bulk mode 241
    test load 241
    update strategy 241

target tables
    truncating 245

target-based commit
    WriterWaitTimeout 277

target-based commit interval
    description 277

targets
    accessing through FTP 560, 568
    code page 267, 702, 703
    code page compatibility 235
    code page, flat file 266
    connection settings 695
    connections 237
    database connections 234
    delimiters 267
    file writer 236
    globalization features 234
    heterogeneous 274
    load, session properties 252, 697
    merging output files 380, 382
    multiple connections 274
    multiple types 274
    null characters 266
    output files 263
    output files for 33
    partitioning 378, 380
    relational settings 697
    relational writer 236
    session properties 236, 240
    specifying null character 702
    truncating tables 245
    viewing session detail 31
    writers 236

Task Developer
    creating tasks 133
    displaying and hiding tool name 41

Task view
    configuring 412
    customizing 412
    displaying 430
    filtering 431
    hiding 412
    opening and closing folders 407
    overview 402
    using 430

tasks
    aborted 421
    aborting 129, 421
    adding in workflows 92
    arranging 71
    Assignment tasks 140
    Command tasks 143
    configuring 135
    Control task 147
    copying 77
    creating 133
    creating in Task Developer 133
    creating in Workflow Designer 133
    Decision tasks 149
    disabled 421
    disabling 137
    email 328
    Event-Raise tasks 153
    Event-Wait tasks 153
    failed 421
    failing parent workflow 138
    in worklets 166
    inherited changes 136
    instances 136
    list of 132
    non-reusable 92
    overview 132
    promoting to reusable 136
    restarting in Workflow Monitor 416
    reusable 92
    reverting changes 136
    running 421
    show full name 41
    starting 125
    status 421
    stopped 421
    stopping 129, 421
    stopping and aborting in Workflow Monitor 418
    succeeded 421
    Timer tasks 161
    using Tasks toolbar 92
    validating 119

Tasks toolbar
    creating tasks 134

TCP/IP network protocol
    server settings 49

Teradata
    connect string example 54

Teradata external loader
    code page 538
    connections 551
    date format 538
    FastLoad attributes 545
    MultiLoad attributes 540
    overriding the control file 539
    support 524
    Teradata Warehouse Builder attributes 547
    TPump attributes 542

Teradata Warehouse Builder
    attributes 547
    operators 547

terminated status 421

Terse tracing levels
    See also Designer Guide
    defined 473

test load
    bulk loading 244
    enabling 671
    file targets 264
    number of rows to test 671
    relational targets 244

thread identification
    session log file 465

threads
    and partitions 18
    creation 13, 14
    mapping 14
    master 14
    post-session 14
    pre-session 14
    reader 14, 15
    transformation 14, 16
    types 14
    writer 14, 16

time
    configuring 38
    formats 38

Time tab
    duration options 756
    schedule options 755
    Server Manager session properties 755
    start options 756
    use absolute time option 757

Timer tasks
    absolute time 161, 162
    definition 161
    description 132
    example 161
    relative time 161, 162
    variables in 103

timestamps
    session logs 472
    workflow logs 460, 462
    Workflow Monitor 402

tool names
    displaying and hiding 41

toolbars 69
    adding tasks 92
    creating tasks 134
    using 69
    Workflow Monitor 415

Tracing Level
    optimizing 655

tracing levels
    See also Designer Guide
    Normal 473
    overriding 679
    session 473
    Terse 473
    Verbose Data 474
    Verbose Initialization 474

transaction
    defined 287

transaction boundary
    dropping 287
    transaction control 287

transaction control
    bulk loading 283
    end of file 284
    open transaction 287
    overview 287
    PowerCenter Server handling 283
    real-time sessions 287
    reject file 284
    rules and guidelines 290
    transaction control points 287
    transformation error 284
    transformation scope 287
    user-defined commit 283

transaction control point
    defined 287

Transaction Control transformation
    partitioning guidelines 356
    target connection group 289

transaction control unit
    defined 289

transaction generator
    active sources 259
    effective and ineffective 259
    transaction control points 287

transformation scope
    defined 287
    real-time processing 288
    transformations 288

transformation threads
    description 14, 16

transformations
    as partition points 353
    eliminating errors 648
    optimizing 639
    partitioning restrictions 395
    session properties 703
    statistics on 469

Transformations node
    properties 703

Transformations tab
    in the Server Manager 761
    in the Workflow Manager 761

Transformations view
    session properties 681

Treat Source Rows As
    bulk loading 252

Treat Source Rows As property
    overview 214

truncating
    Table Name Prefix 245
    target tables 245

U

unconnected transformations
    partitioning restrictions 353

Unicode mode
    See also Installation and Configuration Guide
    code pages 27
    session behavior 16

UNIX systems
    email 321
    external loader behavior 526
    PowerCenter Server as daemon 3

unscheduled status 421

Unsetfolder
    pmcmd syntax 610

update strategy
    target properties 241

Update Strategy transformation
    constraint-based loading 249

updating
    incrementally 579

URL
    adding through business documentation links 97

user-defined commit
    See also transaction control
    bulk loading 283

user-defined events
    declaring 155
    example 153
    waiting for 157

using multiple servers 444

V

validating 196
    expressions 97, 119
    tasks 119
    workflows 119, 120
    worklets 171

Varchar datatypes
    See also Designer Guide
    removing trailing blanks for optimization 653

variables
    email 333
    server 46
    workflow 103

Verbose Data tracing levels
    See also Designer Guide
    configuring session log 474

Verbose Initialization tracing levels
    See also Designer Guide
    configuring session log 474

Version
    pmcmd syntax 611

versioned objects
    See also Repository Guide
    checking in 74
    checking out 74
    searching for in the Workflow Manager 76

viewing
    reject file 476
    session logs 474
    workflow logs 462

W

waiting status 421

Waittask
    pmcmd syntax 611

Waitworkflow
    pmcmd syntax 611

web links
    adding to expressions 97

webzine l

windows
    customizing 69
    displaying and closing 69
    docking and undocking 69
    Navigator 67
    Output 67
    overview 67
    panning 40
    reloading 40
    Workflow Manager 67
    Workflow Monitor 402
    workspace 67

Windows System Tray
    accessing Workflow Monitor 404

Windows systems
    email 322
    external loader behavior 526
    Informatica service owner 322
    logon network security 325
    PowerCenter Server service 3

worker servers 446

Workflow Designer
    creating tasks 133
    displaying and hiding tool name 41

workflow logs
    archiving 459
    changing locations 461
    changing name 461
    codes 458
    configuring 460
    creation 9
    editing 419
    enabling and disabling 459, 461
    locating 456, 459
    log file settings 459, 460
    overview 30
    permissions 28
    reading 458
    sample 458
    timestamp 460
    viewing 462
    viewing dynamically 419
    viewing in Workflow Monitor 419

Workflow Manager
    adding repositories 73
    arrange 71
    checking out and in versioned objects 74
    configuring for multiple source files 231
    copying 77
    creating external loader connections 551
    customizing options 39
    date and time formats 38
    defining FTP connections 561
    display options 39
    entering object descriptions 73
    format options 42
    general options 39
    increasing network packet size 646
    managing multiple servers 444
    messages to Workflow Monitor 410
    overview 38, 46, 66
    registering the PowerCenter Server 46, 48
    searching for items 70
    searching for versioned objects 76
    setting up database connections 53, 56
    toolbars 69
    tools 66
    validating sessions 195
    windows 67, 69
    zooming the workspace 71

Workflow Monitor
    closing folders 407
    configuring 409
    connecting to repositories 405
    connecting to server 405
    customizing columns 412
    deleted servers 405
    deleted tasks 406
    disconnecting from server 405
    displaying servers 406
    dynamic logs 419
    editing logs 419
    filtering deleted tasks 406
    filtering servers 406
    filtering tasks in Task View 405, 431
    Gantt Chart view 402
    hiding columns 412
    hiding servers 406
    icon 404
    launching 404
    launching automatically 41
    listing tasks and workflows 424
    log file editor 410
    Maximum Days 410
    Maximum Workflow Runs 410
    monitor modes 405
    navigating the Time window 425
    notification from Repository Server 410
    opening folders 407
    overview 402
    performing tasks 416
    permissions and privileges 403
    pinging the PowerCenter Server 405
    receive messages from Workflow Manager 410
    restarting tasks, workflows, and worklets 416
    resuming a workflow or worklet 417
    searching 427
    session details 434
    starting 404
    statistics 408
    stopping or aborting tasks and workflows 418
    switching views 403
    System Tray 404
    Task view 402
    time 402
    toolbars 415
    viewing history names 419
    viewing session logs 419
    viewing workflow logs 419
    workflow and task status 421
    zooming 426

workflow output
    email 33
    workflow logs 30

workflow parameter file 110

workflow properties
    log files 459, 460
    suspension email 339

workflow variables
    creating 110
    datatypes 105, 110
    default values 106, 109, 110
    keywords 104
    non-persistent variables 110
    persistent variables 110
    pre-defined 105
    start and current values 109
    SYSDATE 105
    user-defined 108
    using 103
    using in expressions 106
    WORKFLOWSTARTTIME 105

workflows
    aborted 421
    aborting 129, 421
    adding tasks 92
    assigning PowerCenter Servers 122
    branches 88
    copying 77
    creating 91
    definition 2, 88
    deleting 97
    developing 89, 91
    disabled 421
    disabling 118
    editing 98
    email 341
    events 88
    fail parent workflow 138
    failed 421
    guidelines 89
    links 88
    locking 8
    metadata extensions in 82
    monitor 89
    overview 88
    parameter file 9
    privileges 90
    properties reference 721
    removing assigned PowerCenter Servers 123
    restarting in Workflow Monitor 416
    resuming in Workflow Monitor 417
    running 7, 122, 421
    runtime operations overview 7
    scheduled 421
    scheduling 112
    selecting a server 89
    starting 122
    starting on non-associated server 444
    status 127, 421
    stopped 421
    stopping 129, 421
    stopping and aborting in Workflow Monitor 418
    succeeded 421
    suspended 421
    suspending 127, 421
    suspension email 339
    terminated 421
    unscheduled 421
    using tasks 132
    validating 119
    variables 103
    waiting 421

Worklet Designer
    displaying and hiding tool name 41

worklets
    adding tasks 166
    configuring properties 166
    create non-reusable worklets 165
    create reusable worklets 165
    declaring events 167
    developing 165
    email 341
    fail parent worklet 138
    metadata extensions in 82
    overriding variable value 169
    overview 164
    parameters tab 169
    persistent variable example 169
    persistent variables 169
    restarting in Workflow Monitor 416
    resuming in Workflow Monitor 417
    suspended 421
    suspending 164, 421
    unscheduled 421
    validating 171
    variables 169
    waiting 421

workspace
    color 42
    navigating 69
    setting colors 42
    setting fonts 42
    zooming 71

workspace file directory 41

writer threads
    description 14, 16

writers
    session properties 692

WriterWaitTimeout
    target-based commit 277

writing
    multibyte data to files 270
    to fixed-width files 268, 269

X

XML sources
    allocating memory 655
    numeric data handling 229

XML targets
    active sources 259
    partitioning restrictions 396
    target-based commit 277

Z

zooming
    Workflow Manager 71
    Workflow Monitor 426
