Teradata Utilities-Breaking the Barriers, First Edition
Chapter 2: BTEQ
An Introduction to BTEQ
Why is it called BTEQ?
Why is BTEQ available on every Teradata system ever built? Because the Batch TEradata Query (BTEQ) tool was the original way that SQL was submitted to Teradata as a means of getting an answer set in a desired format. This is the utility that I used for training at Wal*Mart, AT&T, Anthem Blue Cross and Blue Shield, and SouthWestern Bell back in the early 1990s. BTEQ is often referred to as the Basic TEradata Query; it is still used today and continues to be an effective tool.
Here is what is excellent about BTEQ: BTEQ can be used to submit
SQL in either a batch or interactive environment. Interactive users
can submit SQL
and receive an answer set on the screen. Users can also submit
BTEQ jobs from batch scripts, have error checking and conditional
logic, and allow for the work to be done in the background.
BTEQ outputs a report format, whereas Queryman outputs data in a format more like a spreadsheet. This gives BTEQ a great deal of flexibility in formatting data, creating headings, and utilizing Teradata extensions, such as WITH and WITH BY, that Queryman has problems handling.
BTEQ is often used to submit SQL, but is also an excellent tool for importing and exporting data.
o Importing Data: Data can be read from a file on either a mainframe or LAN-attached computer and used for substitution directly into any Teradata SQL using the INSERT, UPDATE or DELETE statements.
o Exporting Data: Data can be written to either a mainframe or LAN-attached computer using a SELECT from Teradata. You can also pick the format you desire, ranging from data files to printed reports to Excel formats.
There are other utilities that are faster than BTEQ for
importing or exporting data. We will talk about these in future
chapters, but BTEQ is still used for smaller jobs.
Logging on to BTEQ
Before you can use BTEQ, you must have user access rights to the client system and privileges to the Teradata DBS. Normal system access privileges include a userid and a password. Some systems may also require additional user identification codes depending on company standards and operational procedures. Depending on the configuration of your Teradata DBS, you may need to include an account identifier (acctid) and/or a Teradata Director Program Identifier (TDPID).
Using BTEQ to submit queries
Submitting SQL in BTEQ's Interactive Mode
Once you log on to Teradata through BTEQ, you are ready to run your queries. Teradata knows the SQL is finished when it finds a semi-colon, so don't forget to put one at the end of your query. Below is an example of a Teradata table to demonstrate BTEQ operations.
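The flow above can be sketched as a minimal interactive session. The TDPID (educ), userid, and password shown are illustrative assumptions, not values from this book:

```sql
.LOGON educ/sql01,mypassword

SELECT *
FROM   Employee_Table;   /* the semi-colon tells Teradata the SQL is finished */

.LOGOFF
.QUIT
```

The answer set is displayed on the screen as soon as the request completes.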
Employee_Table
Figure 2-1
BTEQ execution
Figure 2-2
Submitting SQL in BTEQ's Batch Mode
On network-attached systems, BTEQ can also run in batch mode under UNIX (IBM AIX, Hewlett-Packard HP-UX, NCR MP-RAS, Sun Solaris), DOS, Macintosh, Microsoft Windows and OS/2 operating systems. To submit a job in batch mode do the following:
1. Invoke BTEQ
2. Type in the input file name
3. Type in the location and output file name.
The following example shows how to invoke BTEQ from a DOS
command. In order for this to work, the directory called Program
Files\NCR\Teradata Client\bin must be established in the search
path.
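A hypothetical invocation following those three steps might look like this; the script and output file names are illustrative:

```
C:\> BTEQ < bteq_script.txt > bteq_output.txt
```

The < redirects the input script into BTEQ and the > captures the results in the output file.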
Figure 2-3
Notice that the BTEQ command is immediately followed by the input file name and then the output file name.
Using BTEQ Conditional Logic
Below is a BTEQ batch script example. The initial steps of the script establish the logon and the database, and delete all the rows from the Employee_Table. If the table does not exist, the BTEQ conditional logic will instruct Teradata to create it. However, if the table already exists, then Teradata will move forward and insert data.
Note In script examples, the left panel contains BTEQ base
commands and the right panel provides a brief description of each
command.
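The logic described above can be sketched as follows. This is a hedged reconstruction, not the book's exact script; the logon, column definitions, and sample rows are illustrative:

```sql
.LOGON educ/sql01,mypassword
DATABASE SQL_Class;

DELETE FROM Employee_Table;
/* If the DELETE worked, the table exists; skip the CREATE */
.IF ERRORCODE = 0 THEN .GOTO INSERTEMPS

CREATE TABLE Employee_Table
 (Employee_No INTEGER
 ,Last_name   CHAR(20)
 ,First_name  VARCHAR(12)
 ,Salary      DECIMAL(8,2)
 ,Dept_No     SMALLINT)
UNIQUE PRIMARY INDEX (Employee_No);

.LABEL INSERTEMPS
INSERT INTO Employee_Table VALUES (1232578,'Chambers','Mandee',56177.50,100);
INSERT INTO Employee_Table VALUES (1256349,'Harrison','Herbert',54500.00,400);
.LOGOFF
.QUIT
```

Either way the script reaches the INSERTEMPS label with an empty table ready for the inserts.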
Figure 2-5
Using BTEQ to Export Data
BTEQ allows data to be exported directly from Teradata to a file on a mainframe or network-attached computer. In addition, the BTEQ export function has several export formats that a user can choose depending on the desired output. Generally, users will export data to a flat file format that is composed of a variety of characteristics. These characteristics include: record mode, field mode, indicator mode, or DIF mode. Below is an expanded explanation of the different mode options.
Format of the EXPORT command: .EXPORT { FILE | DDNAME } = <filename> [, LIMIT=n]
Record Mode: (also called DATA mode): This is set by .EXPORT
DATA. This will bring data back as a flat file. Each parcel will
contain a complete record. Since it is not a report, there are no
headers or white space between the data contained in each column
and the data is written to the file (e.g., disk drive file) in
native format. For example, this means that INTEGER data is written
as a 4-byte binary field. Therefore, it cannot be read and
understood using a normal text editor.
Field Mode (also called REPORT mode): This is set by .EXPORT REPORT. This is the default mode for BTEQ and brings the data back as if it were a standard SQL SELECT statement. The output of this BTEQ export returns the column headers for the fields and white space, and expands packed or binary data (for humans to read), so it can be understood using a text editor.
Indicator Mode: This is set by .EXPORT INDICDATA. This mode
writes the data in data mode, but also provides host operating
systems with the means of recognizing missing or unknown data
(NULL) fields. This is important if the data is to be loaded into
another Relational Database System (RDBMS).
The issue is that there is no standard character defined to represent either a numeric or character NULL. So, systems typically substitute a zero for a numeric NULL and a space or blank for a character NULL. If this data is simply loaded into another RDBMS, it is no longer a NULL, but a zero or space.
To remedy this situation, INDICDATA puts a bitmap at the front of
every record written to the disk. This bitmap contains one bit per
field/column. When a Teradata column contains a NULL, the bit for
that field is turned on by setting it to a "1". Likewise, if the
data is not NULL, the bit remains a zero. Therefore, the loading
utility reads these bits as indicators of NULL data and identifies
the column(s) as NULL when data is loaded back into the table,
where appropriate.
Since both DATA and INDICDATA store each column on disk in
native format with known lengths and characteristics, they are the
fastest method of transferring data. However, it becomes imperative
that you be consistent. When it is exported as DATA, it must be
imported as DATA and the same is true for INDICDATA.
Again, this internal processing is automatic and potentially
important. Yet, on a network-attached system, being consistent is
our only responsibility. However, on a mainframe system, you must
account for these bits when defining the LRECL in the Job Control
Language (JCL). Otherwise, your length is too short and the job
will end with an error.
To determine the correct length, the following information is
important. As mentioned earlier, one bit is needed per field output
onto disk. However, computers allocate data in bytes, not bits.
Therefore, if even one bit is needed, a minimum of one byte (8 bits) is allocated. For every eight fields selected, the LRECL becomes 1 byte longer. In other words, for nine columns selected, 2 bytes are added even though only nine bits are needed.
With this being stated, there is one indicator bit per field
selected. INDICDATA mode gives the Host computer the ability to
allocate bits in the form of a byte. Therefore, if one bit is
required by the host system, INDICDATA mode will automatically
allocate eight of them. This means that from one to eight columns
being referenced in the SELECT will add one byte to the length of
the record. When selecting nine to sixteen columns, the output
record will be two bytes longer.
When executing on non-mainframe systems, the record length is automatically maintained. However, when exporting to a mainframe, the JCL (LRECL) must account for this additional length.
DIF Mode: Data Interchange Format, which allows users to export data from Teradata to be directly utilized in spreadsheet applications like Excel, FoxPro and Lotus.
The optional LIMIT tells BTEQ to stop returning rows after a specific number (n) of rows. This might be handy in a test environment to stop BTEQ before the end of transferring rows to the file.
BTEQ EXPORT Example Using Record (DATA) Mode
The following is an example that displays how to utilize the export Record (DATA) option. Notice the periods (.) at the beginning of some of the script lines. A period starting a line indicates a BTEQ command. If there is no period, then the command is an SQL command.
When doing an export on a mainframe or a network-attached (e.g., LAN) computer, there is one primary difference in the .EXPORT command:
Mainframe syntax: .EXPORT DATA DDNAME = data definition statement name (JCL)
LAN syntax: .EXPORT DATA FILE = actual file name
The following example uses a Record (DATA) Mode format. The
output of the exported data will be a flat file.
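A hedged sketch of such a script; the file name and logon values are illustrative:

```sql
.LOGON educ/sql01,mypassword
.EXPORT DATA FILE = employee.dat

SELECT Employee_No
      ,Last_name
      ,First_name
      ,Salary
      ,Dept_No
FROM   Employee_Table;

.EXPORT RESET
.LOGOFF
.QUIT
```

The .EXPORT RESET closes the export file and returns BTEQ output to the screen.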
Employee_Table
Figure 2-6
BTEQ EXPORT Example Using Field (Report) Mode
The following is an example that displays how to utilize the export Field (Report) option. Notice the periods (.) at the beginning of some of the script lines. A period starting a line indicates a BTEQ command and needs no semi-colon. Likewise, if there is no period, then the command is an SQL command and requires a semi-colon.
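A minimal sketch of a Field (Report) mode export, again with illustrative names:

```sql
.LOGON educ/sql01,mypassword
.EXPORT REPORT FILE = employee_report.txt

SELECT Employee_No
      ,Last_name
      ,First_name
      ,Salary
      ,Dept_No
FROM   Employee_Table;

.EXPORT RESET
.LOGOFF
.QUIT
```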
Figure 2-7
After this script has completed, the following report will be
generated on disk.
Employee_No Last_name First_name Salary Dept_No
2000000 Jones Squiggy 32800.50 ?
1256349 Harrison Herbert 54500.00 400
1333454 Smith John 48000.00 200
1121334 Strickling Cletus 54500.00 400
1324657 Coffing Billy 41888.88 200
2341218 Reilly William 36000.00 400
1232578 Chambers Mandee 56177.50 100
1000234 Smythe Richard 64300.00 10
2312225 Larkins Loraine 40200.00 300
I remember when my mom and dad purchased my first Lego set. I
was so excited about building my first space station that I ripped
the box open, and proceeded to follow the instructions to complete
the station. However, when I was done, I was not satisfied with the
design and decided to make changes. So I built another space ship
and constructed another launching station. BTEQ export works in the
same manner, as the basic EXPORT knowledge is acquired, the more we
can build on that foundation.
With that being said, the following is an example that displays
a more robust example of utilizing the Field (Report) option. This
example will export data in Field (Report) Mode format. The output
of the exported data will appear like a standard output of a SQL
SELECT statement. In addition, aliases and a title have been added
to the script.
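A hedged sketch of such a script; the .SET values and TITLE phrases are illustrative assumptions rather than the book's exact figure:

```sql
.LOGON educ/sql01,mypassword
.SET WIDTH 80
.SET FORMAT ON
.SET HEADING 'Employee Profiles'
.EXPORT REPORT FILE = employee_profiles.txt

SELECT Employee_No (TITLE 'Employee Number')
      ,Last_name   (TITLE 'Last Name')
      ,First_name  (TITLE 'First Name')
      ,Salary
      ,Dept_No     (TITLE 'Department Number')
FROM   Employee_Table;

.EXPORT RESET
.LOGOFF
.QUIT
```

The TITLE phrase supplies the alias that appears as the column header in the report.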
Figure 2-8
After this script has been completed, the following report will
be generated on disk.
Employee Profiles
Employee Number Last Name First Name Salary Department
Number
2000000 Jones Squiggy 32800.50 ?
1256349 Harrison Herbert 54500.00 400
1333454 Smith John 48000.00 200
1121334 Strickling Cletus 54500.00 400
1324657 Coffing Billy 41888.88 200
2341218 Reilly William 36000.00 400
1232578 Chambers Mandee 56177.50 100
1000234 Smythe Richard 64300.00 10
2312225 Larkins Loraine 40200.00 300
From the above example, a number of BTEQ commands were added to the export script. Below is a review of those commands. The WIDTH specifies the width of screen displays and printed reports, based on characters per line. The FORMAT command allows the ability to enable/inhibit the page-oriented format option. The HEADING command specifies a header that will appear at the top of every page of a report.
BTEQ IMPORT Example
BTEQ can also read a file from the hard disk and incorporate the data into SQL to modify the contents of one or more tables. In order to do this processing, the name and record description of the file must be known ahead of time. These will be defined within the script file.
Format of the IMPORT command: .IMPORT { FILE | DDNAME } = <filename> [, SKIP=n]
The script below introduces the IMPORT command with the Record (DATA) option. Notice the periods (.) at the beginning of some of the script lines. A period starting a line indicates a BTEQ command. If there is no period, then the command is an SQL command.
The SKIP option is used when you wish to bypass the first n records in a file. For example, a mainframe tape may have header records that should not be processed. Other times, maybe the job started and loaded a few rows into a table with a UPI defined. Loading them again would cause an error, so you can skip over them using this option.
The following example will use a Record (DATA) Mode format. The
input of the imported data will populate the Employee_Table.
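A hedged sketch of the import pattern; the file name, logon, and field definitions are illustrative:

```sql
.LOGON educ/sql01,mypassword
.IMPORT DATA FILE = employee.dat
.QUIET ON
.REPEAT *

USING ( in_Employee_No INTEGER
       ,in_Last_name   CHAR(20)
       ,in_First_name  VARCHAR(12)
       ,in_Salary      DECIMAL(8,2)
       ,in_Dept_No     SMALLINT )
INSERT INTO Employee_Table
VALUES ( :in_Employee_No
        ,:in_Last_name
        ,:in_First_name
        ,:in_Salary
        ,:in_Dept_No );

.QUIT
```

The colon (:) in front of each name tells Teradata to substitute the value read from the file into the INSERT.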
Figure 2-9
From the above example, a number of BTEQ commands were added to the import script. Below is a review of those commands. .QUIET ON limits BTEQ output to reporting only errors and request processing statistics. Note: Be careful how you spell .QUIET; forget the E and it becomes .QUIT, and it will! .REPEAT causes BTEQ to read a specified number of records, or to read until EOF when * is used; the default is one record. Using .REPEAT 10 would perform the loop 10 times. The USING clause defines the input data fields and their associated data types coming from the host.
The following builds upon the IMPORT Record (DATA) example
above. The example below will still utilize the Record (DATA) Mode
format. However, this script will add a CREATE TABLE statement. In
addition, the imported data will populate the newly created
Employee_Profile table.
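A hedged sketch combining the CREATE TABLE with the import; the Employee_Profile column definitions are assumed for illustration:

```sql
.LOGON educ/sql01,mypassword
DATABASE SQL_Class;

CREATE TABLE Employee_Profile
 (Employee_No INTEGER
 ,Last_name   CHAR(20)
 ,First_name  VARCHAR(12)
 ,Salary      DECIMAL(8,2)
 ,Dept_No     SMALLINT)
UNIQUE PRIMARY INDEX (Employee_No);

.IMPORT DATA FILE = employee.dat
.QUIET ON
.REPEAT *

USING ( in_Employee_No INTEGER
       ,in_Last_name   CHAR(20)
       ,in_First_name  VARCHAR(12)
       ,in_Salary      DECIMAL(8,2)
       ,in_Dept_No     SMALLINT )
INSERT INTO Employee_Profile
VALUES ( :in_Employee_No, :in_Last_name, :in_First_name, :in_Salary, :in_Dept_No );

.LOGOFF
.QUIT
```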
Figure 2-10
Notice that some of the scripts have a .LOGOFF and .QUIT. The .LOGOFF is optional because when BTEQ quits, the session is terminated. A logoff makes it a friendly departure and also allows you to logon with a different user name and password.
Determining Output Record Lengths
Some hosts, such as IBM mainframes, require the correct LRECL (Logical Record Length) parameter in the JCL, and will abort if the value is incorrect. The following discusses how to figure out the record lengths.
There are three issues involving record lengths: fixed columns, variable columns, and NULL indicators.
Fixed Length Columns: For fixed length columns you merely count
the length of the column. The lengths are:
INTEGER 4 bytes
SMALLINT 2 bytes
BYTEINT 1 byte
CHAR(10) 10 bytes
CHAR(4) 4 bytes
DATE 4 bytes
DECIMAL(7,2) 4 bytes (packed data, total digits / 2 +1)
DECIMAL(12,2) 8 bytes
Variable columns: Variable length columns should be calculated as the maximum value plus two. These two bytes hold the binary length of the actual data in the field. In reality you can save much space because trailing blanks are not kept. The logical record will assume the maximum and add two bytes as a length field per column.
VARCHAR(8) 10 bytes
VARCHAR(10) 12 bytes
Indicator columns: As explained earlier, the indicators utilize
a single bit for each field. If your record has 8 fields (which
require 8 bits), then you add one extra byte to the total length of
all the fields. If your record has 9-16 fields, then add two
bytes.
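As a worked example, assume a five-column record with the definitions shown below (the column definitions are illustrative, but the byte counts follow the rules above), exported in INDICDATA mode:

```
Employee_No  INTEGER       4 bytes
Last_name    CHAR(20)     20 bytes
First_name   VARCHAR(12)  14 bytes  (12 maximum + 2-byte length field)
Salary       DECIMAL(8,2)  5 bytes  (8 digits / 2 + 1)
Dept_No      SMALLINT      2 bytes
Indicators   5 fields      1 byte   (5 bits rounded up to one byte)
                          --------
             LRECL     =  46 bytes
```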
BTEQ Return Codes
Return codes are two-digit values that BTEQ returns to the user after completing each job or task. The value of the return code indicates the completion status of the job or task as follows:
Return Code  Description
00  Job completed with no errors.
02  User alert to log on to the Teradata DBS.
04  Warning error.
08  User error.
12  Severe internal error.
You can override the standard return codes at the time you terminate BTEQ. This might be handy for debug purposes. The error code or "return code" can be any number you specify using one of the following:
Override Code
.QUIT 15
.EXIT 15
BTEQ Commands
The BTEQ commands in Teradata are designed for flexibility. These commands are not used directly on the data inside the tables. However, these 60 different BTEQ commands are utilized in four areas:
Session Control Commands
File Control Commands
Sequence Control Commands
Format Control Commands
Session Control Commands
Figure 2-11
File Control Commands
These BTEQ commands are used to specify the formatting parameters of incoming and outgoing information. This includes identifying sources and determining I/O streams.
Figure 2-12
Sequence Control Commands
These commands control the sequence in which Teradata commands operate.
Figure 2-13
Format Control Commands
These commands control the formatting for Teradata and present the data in a report mode to the screen or printer.
Figure 2-14
Chapter 3: FastExport
An Introduction to FastExport
Why is it called "FAST" Export
FastExport is known for its lightning speed when it comes to exporting vast amounts of data from Teradata and transferring the data into flat files on either a mainframe or network-attached computer. In addition, FastExport has the ability to accept OUTMOD routines, which provide the user the capability to write, select, validate, and preprocess the exported data. Part of this speed is achieved because FastExport takes full advantage of Teradata's parallelism.
In this book, we have already discovered how BTEQ can be utilized to export data from Teradata in a variety of formats. As the demand to store data increases, so does the requirement for tools that can export massive amounts of data.
This is the reason why FastExport (FEXP) is brilliant by design.
A good rule of thumb is that if you have more than half a million
rows of data to export to either a flat file format or with NULL
indicators, then FastExport is the best choice to accomplish this
task.
Keep in mind that FastExport is designed as a one-way
utility-that is, the sole purpose of FastExport is to move data out
of Teradata. It does this by harnessing the parallelism that
Teradata provides.
FastExport is extremely attractive for exporting data because it
takes full advantage of multiple sessions, which leverages Teradata
parallelism. FastExport can also export from multiple tables during
a single operation. In addition, FastExport utilizes the Support
Environment, which provides a job restart capability from a
checkpoint if an error occurs during the process of executing an
export job.
How FastExport Works
When FastExport is invoked, the utility logs onto the Teradata database, retrieves the rows that are specified in the SELECT statement, and puts them into SPOOL. From there, it must build blocks to send back to the client. In comparison, BTEQ starts sending rows immediately for storage into a file.
If the output data is sorted, FastExport may be required to
redistribute the selected data two times across the AMP processors
in order to build the blocks in the correct sequence. Remember, a
lot of rows fit into a 64K block and both the rows and the blocks
must be sequenced. While all of this redistribution is occurring, BTEQ continues to send rows, and FastExport gets behind in the processing. However, when FastExport starts sending the rows back a block at a time, it quickly overtakes and passes BTEQ's row-at-a-time processing.
The other advantage is that if BTEQ terminates abnormally, all
of your rows (which are in SPOOL) are discarded. You must rerun the
BTEQ script from the beginning. However, if FastExport terminates
abnormally, all the selected rows are in worktables and it can
continue sending them where it left off. Pretty smart and very
fast!
Also, if there is a requirement to manipulate the data before
storing it on the computer's hard drive, an OUTMOD routine can be
written to modify the result set after it is sent back to the
client on either the mainframe or LAN. Just like the BASF commercial states, "We don't make the products you buy, we make the products you buy better." FastExport is designed off the same premise: it does not make the SQL SELECT statement faster, but it does take the SQL SELECT statement and process the request with lightning-fast parallel processing!
FastExport Fundamentals
#1: FastExport EXPORTS data from Teradata. The reason they call it FastExport is because it takes data off of Teradata (exports data). FastExport does not import data into Teradata. Additionally, like BTEQ, it can output multiple files in a single run.
#2: FastExport only supports the SELECT statement. The only DML
statement that FastExport understands is SELECT. You SELECT the
data you want exported and FastExport will take care of the
rest.
#3: Choose FastExport over BTEQ when Exporting Data of more than
half a million+ rows. When a large amount of data is being
exported, FastExport is recommended over BTEQ Export. The only
drawback is the total number of FastLoads, FastExports, and
MultiLoads that can run at the same time, which is limited to 15.
BTEQ Export
does not have this restriction. Of course, FastExport will work
with less data, but the speed may not be much faster than BTEQ.
#4: FastExport supports multiple SELECT statements and multiple
tables in a single run. You can have multiple SELECT statements
with FastExport and each SELECT can join information up to 64
tables.
#5: FastExport supports conditional logic, conditional
expressions, arithmetic calculations, and data conversions.
FastExport is flexible and supports the above conditions,
calculations, and conversions.
#6: FastExport does NOT support error files or error limits.
FastExport does not record particular error types in a table. The
FastExport utility will terminate after a certain number of errors
have been encountered.
#7: FastExport supports user-written INMOD and OUTMOD routines. FastExport allows you to write INMOD and OUTMOD routines so you can select, validate and preprocess the exported data.
FastExport Supported Operating Systems
The FastExport utility is supported on either the mainframe or on LAN. The information below illustrates which operating systems are supported for each environment:
The LAN environment supports the following operating systems: UNIX MP-RAS, Windows 2000, Windows 95, Windows NT, UNIX HP-UX, AIX, Solaris SPARC, and Solaris Intel.
The Mainframe (Channel-Attached) environment supports the following operating systems: MVS and VM.
Maximum of 15 Loads
The Teradata RDBMS will only support a maximum of 15 simultaneous FastLoad, MultiLoad, or FastExport utility jobs. This maximum value is determined and configured by the DBS Control record. This value can be set from 0 to 15. When Teradata is initially installed, this value is set at 5.
The reason for this limitation is that FastLoad, MultiLoad, and FastExport all use large blocks to transfer data. If more than 15 simultaneous jobs were supported, a saturation point could be reached on the availability of resources. In this case, Teradata does an excellent job of protecting system resources by queuing up additional FastLoad, MultiLoad, and FastExport jobs that are attempting to connect.
For example, if the maximum number of utilities on the Teradata system is reached and another job attempts to run, that job does not start. This limitation should be viewed as a safety control feature. A tip for remembering how the load limit applies is this: "If the name of the load utility contains either the word 'Fast' or the word 'Load', then there can be only a total of fifteen of them running at any one time."
BTEQ does not have this load limitation. FastExport is clearly the better choice when exporting data. However, if too many load jobs are running, BTEQ is an alternative choice for exporting data.
FastExport Support and Task Commands
FastExport accepts both FastExport commands and a subset of SQL statements. The FastExport commands can be broken down into support and task activities. The table below highlights the key FastExport commands and their definitions. These commands provide flexibility and control during the export process.
Support Environment Commands (see Support Environment chapter
for details)
Figure 3-1
Task Commands
Figure 3-2
FastExport Supported SQL Commands
FastExport accepts the following Teradata SQL statements. Each has been placed in alphabetic order for your convenience.
SQL Commands
Figure 3-3
A FastExport in its Simplest Form
The hobby of racecar driving can be extremely frustrating, challenging, and rewarding all at the same time. I always remember my driving instructor coaching me during a practice session in a new car around a road course racetrack. He said to me, "Before you can learn to run, you need to learn how to walk." This same philosophy can be applied when working with FastExport. If FastExport is broken into steps, then several things that appear to be complicated are really very simple. With this being stated, FastExport can be broken into the following steps:
1. Logging onto Teradata
2. Retrieving the rows you specify in your SELECT statement
3. Exporting the data to the specified file or OUTMOD routine
4. Logging off of Teradata
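Those steps can be sketched as a bare-bones script; the logtable, logon, and file names are illustrative assumptions:

```sql
.LOGTABLE SQL_Class.Restart_Log_FX;   /* restart log for the Support Environment */
.LOGON educ/sql01,mypassword;
.BEGIN EXPORT SESSIONS 4;
.EXPORT OUTFILE employee.out;

SELECT Employee_No
      ,Last_name
      ,Salary
FROM   SQL_Class.Employee_Table;

.END EXPORT;
.LOGOFF;
```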
Figure 3-4
Sample FastExport Script
Now that the first steps have been taken to understand FastExport, the next step is to journey forward and review another example that builds upon what we have learned. In the script below, Teradata comment lines have been placed inside the script (/* ... */). In addition, FastExport and SQL commands are written in upper case in order to highlight them. Another note is that the column names are listed vertically. The recommendation is to place the comma separator in front of the following column. Coding this way makes reading or debugging the script easier to accomplish.
Figure 3-5
FastExport Modes and Formats
FastExport Modes
FastExport has two modes: RECORD or INDICATOR. In the mainframe world, only use RECORD mode. In the UNIX or LAN environment, RECORD mode is the default, but you can use INDICATOR mode if desired. The difference between the two modes is that INDICATOR mode will set the indicator bits to 1 for column values containing NULLs.
Both modes return data in a client internal format with
variable-length records. Each individual record has a value for all
of the columns specified by the SELECT statement. All
variable-length columns are preceded by a two-byte control value
indicating the length of the column data. NULL columns have a value
that is appropriate for the column data type. Remember, INDICATOR
mode will set bit flags that identify the columns that have a null
value.
FastExport Formats
FastExport has many possible formats in the UNIX or LAN environment. The FORMAT statement specifies the format for each record being exported, which are: FASTLOAD, BINARY, TEXT, and UNFORMAT.
The default FORMAT is FASTLOAD in a UNIX or LAN environment.
FASTLOAD format is a two-byte integer, followed by the data, followed by an end-of-record marker. It is called FASTLOAD because the data is exported in a format ready for FastLoad.
BINARY Format is a two-byte integer, followed by data.
TEXT is an arbitrary number of bytes followed by an
end-of-record marker.
UNFORMAT is exported as it is received from CLIv2 without any
client modifications.
A FastExport Script using Binary Mode
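A hedged sketch of such a script, combining the MODE and FORMAT options described above; the names are illustrative:

```sql
.LOGTABLE SQL_Class.Restart_Log_FX;
.LOGON educ/sql01,mypassword;
.BEGIN EXPORT SESSIONS 4;
.EXPORT OUTFILE employee.bin
        MODE RECORD FORMAT BINARY;   /* two-byte length, then the data */

SELECT Employee_No
      ,Salary
FROM   SQL_Class.Employee_Table;

.END EXPORT;
.LOGOFF;
```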
Figure 3-6
Chapter 4: FastLoad
An Introduction to FastLoad
Why is it called "FAST" Load
FastLoad is known for its lightning-like speed in loading vast amounts of data from flat files from a host into empty tables in Teradata. Part of this speed is achieved because it does not use the Transient Journal. You will see some more of the reasons enumerated below. But, regardless of the reasons that it is fast, know that FastLoad was developed to load millions of rows into a table.
The way FastLoad works can be illustrated by home construction,
of all things! Let's look at three scenarios from the construction
industry to provide an amazing picture of how the data gets
loaded.
Scenario One: Builders prefer to start with an empty lot and
construct a house on it, from the foundation right on up to the
roof. There is no pre-existing construction, just a smooth, graded
lot. The fewer barriers there are to deal with, the quicker the new
construction can progress. Building custom or spec houses this way
is the fastest way to build them. Similarly, FastLoad likes to
start with an empty table, like an empty lot, and then populate it
with rows of data from another source. Because the target table is
empty, this method is typically the fastest way to load data.
FastLoad will never attempt to insert rows into a table that
already holds data.
Scenario Two: The second scenario in this analogy is when
someone buys the perfect piece of land on which to build a home,
but the lot already has a house on it. In this case, the person may
determine that it is quicker and more advantageous just to demolish
the old house and start fresh from the ground up-allowing for brand
new construction. FastLoad also likes this approach to loading
data. It can just 1) drop the existing table, which deletes the
rows, 2) replace its structure, and then 3) populate it with the
latest and greatest data. When dealing with huge volumes of new
rows, this process will run much quicker than using MultiLoad to
populate the existing table. Another option is to DELETE all the
data rows from a populated target table and reload it. This
requires less updating of the Data Dictionary than dropping and
recreating a table. In either case, the result is a perfectly empty
target table that FastLoad requires!
Scenario Three: Sometimes, a customer has a good house already
but wants to remodel a portion of it or to add an additional room.
This kind of work takes more time than the work described in
Scenario One. Such work requires some tearing out of existing
construction in order to build the new section. Besides, the
builder never knows what he will encounter beneath the surface of
the existing home. So you can easily see that remodeling or
additions can take more time than new construction. In the same
way, existing tables with data may need to be updated by adding new
rows of data. To load populated tables quickly with large amounts
of data while maintaining the data currently held in those tables,
you would choose MultiLoad instead of FastLoad. MultiLoad is
designed for this task but, like renovating or adding onto an
existing house, it may take more time.
How FastLoad Works

What makes FastLoad perform so well when it is loading millions or even billions of rows? FastLoad assembles data into 64K blocks (64,000 bytes) for transfer and can use multiple sessions simultaneously, taking further advantage of Teradata's parallel processing.
This is different from BTEQ and TPump, which load data at the
row level. It has been said, "If you have it, flaunt it!" FastLoad
does not like to brag, but it takes full advantage of Teradata's
parallel architecture. In fact, FastLoad will create a Teradata
session for each AMP (Access Module Processor-the software
processor in Teradata responsible for reading and writing data to
the disks) in order to maximize parallel processing. This advantage
is passed along to the FastLoad user in terms of awesome
performance. Teradata is the only data warehouse product in the
world that loads data, processes data and backs up data in
parallel.
FastLoad Has Some Limits

There are more reasons why FastLoad is so fast. Many of them take the form of restrictions, and it is precisely these restrictions that keep it from slowing down. For instance, can you imagine a sprinter wearing cowboy boots in a race? Of course not! Because of its speed, FastLoad, too, must travel light! This means it has limitations that may or may not apply to other load utilities. Remembering this short list will save you much frustration from failed loads and angry colleagues. It may even foster your reputation as a smooth operator!
Rule #1: No Secondary Indexes are allowed on the Target Table. For high performance, FastLoad will only work with Primary Indexes when loading. The reason is that Primary Indexes (UPI and NUPI) are used in Teradata to distribute the rows evenly across the AMPs, and the load builds only data rows. A secondary index is stored in a subtable block, often on a different AMP from the data row. Maintaining it would slow FastLoad down, and they would have to call it, get ready now, HalfFastLoad. Therefore, FastLoad does not support them. If Secondary Indexes already exist, just drop them. You may easily recreate them after completing the load.
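Dropping and later rebuilding a secondary index uses SQL like the following sketch; the table and column names are assumptions for illustration:

```sql
/* Before the FastLoad: remove the secondary index */
DROP INDEX (Dept_No) ON SQL01.Employee_Profile;

/* After the load completes: rebuild it */
CREATE INDEX (Dept_No) ON SQL01.Employee_Profile;
```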
Rule #2: No Referential Integrity is allowed. FastLoad cannot load data into tables that are defined with Referential Integrity (RI), because enforcing referential constraints against another table would require too much system checking. FastLoad does one table only. In short, RI constraints must be dropped from the target table prior to using FastLoad.
Rule #3: No Triggers are allowed at load time. FastLoad is much too focused on speed to pay attention to the needs of other tables, which is what Triggers are all about. Additionally, triggers involve more than one AMP and more than one table, and FastLoad does one table only. Simply ALTER the Triggers to the DISABLED status prior to using FastLoad.
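Disabling and re-enabling a trigger might look like this; the trigger name is an assumption:

```sql
/* Before the FastLoad */
ALTER TRIGGER SQL01.Emp_Raise_Trig DISABLED;

/* After the load completes */
ALTER TRIGGER SQL01.Emp_Raise_Trig ENABLED;
```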
Rule #4: Duplicate Rows (in Multiset Tables) are not supported. Multiset tables are tables that allow duplicate rows, that is, rows in which the values in every column are identical. FastLoad can load data into a multiset table, but whenever it finds duplicate rows, it discards them rather than loading them.
Rule #5: No AMPs may go down (i.e., go offline) while FastLoad
is processing. The down AMP must be repaired before the load
process can be restarted. Other than this, FastLoad can recover
from system glitches and perform restarts. We will discuss Restarts
later in this chapter.
Rule #6: No more than one data type conversion is allowed per column during a FastLoad. Why just one? Data type conversion is a highly resource-intensive job on the system, requiring a "search and replace" effort, and that takes more time. Enough said!
Three Key Requirements for FastLoad to Run

FastLoad can be run from either an MVS/channel-attached (mainframe) or a network-attached (LAN) host. In either case, FastLoad requires three key components: a log table, an empty target table and two error tables. The user must name these at the beginning of each script.
Log Table: FastLoad needs a place to record information on its
progress during a load. It uses the table called Fastlog in the
SYSADMIN database. This table contains one row for every FastLoad
running on the system. In order for your FastLoad to use this
table, you need INSERT, UPDATE and DELETE privileges on that
table.
Empty Target Table: We have already mentioned the absolute need
for the target table to be empty. FastLoad does not care how this
is accomplished. After an initial load of an empty target table,
you are now looking at a populated table that will likely need to
be maintained.
If you require the phenomenal speed of FastLoad, it is usually preferable, both for the sake of speed and for less interaction with the Data Dictionary, just to delete all the rows from that table and then reload it with fresh data. The SQL syntax DELETE FROM <databasename>.<tablename>; should be used for this. But sometimes, as in some of our FastLoad sample scripts below (see Figure 4-1), you want to drop the table and recreate it rather than using the DELETE option. To do this, FastLoad has the ability to run the DDL statements DROP TABLE and CREATE TABLE. The problem with putting DDL in the script is that the script is no longer restartable, and you are required to rerun the FastLoad from the beginning. For that reason, we recommend that you have one script for an initial run and a different script for a restart.
Figure 4-1
Two Error Tables: Each FastLoad requires two error tables. They are populated only if errors occur during the load process. The FastLoad utility requires them and will automatically create them for you; all you must do is name them. The first error table is for any translation errors or constraint violations. For example, a row with a column containing a wrong data type would be reported to the first error table. The second error table is for errors caused by duplicate values for Unique Primary Indexes (UPI). FastLoad will load just one occurrence of every UPI value; the other occurrences are stored in this table. However, if the entire row is a duplicate, FastLoad counts it but does not store the row. These tables may be analyzed later for troubleshooting should errors occur during the load. For specifics on how you can troubleshoot, see the section below titled "What Happens When FastLoad Finishes."
Maximum of 15 Loads

The Teradata RDBMS will only run a maximum of fifteen FastLoads, MultiLoads, or FastExports at the same time. This maximum is determined by a value stored in the DBS Control record. It can be any value from 0 to 15. When Teradata is first installed, this value is set to 5 concurrent jobs.
Because these utilities all move rows in large blocks, there is a saturation point at which Teradata protects the available system resources by not starting additional load jobs. For example, if the maximum number of jobs is currently running on the system and you attempt to run one more, that job will not be started. You should view this limit as a safety control. Here is a tip for remembering how the load limit applies: if the name of the load utility contains either the word "Fast" or the word "Load", then there can be only a total of fifteen of them running at any one time.
FastLoad Has Two Phases

Teradata is famous for its end-to-end use of parallel processing. Both the data and the tasks are divided among the AMPs. Then each AMP tackles its own portion of the task with regard to its portion of the data. This same "divide and conquer" mentality also expedites the load process. FastLoad divides its job into two phases, both designed for speed. They have no fancy names but are typically known simply as Phase 1 and Phase 2, sometimes referred to as the Acquisition Phase and the Application Phase.
PHASE 1: Acquisition

The primary function of Phase 1 is to transfer data from the host computer to the Access Module Processors (AMPs) as quickly as possible. For the sake of speed, the Parsing Engine of Teradata does not take the time to hash each row of data based on the Primary Index. That will be done later. Instead, it does the following:
When the Parsing Engine (PE) receives the INSERT command, it uses one session to parse the SQL just once. The PE is the Teradata software processor responsible for parsing syntax and generating a plan to execute the request. It then opens a Teradata session from the FastLoad client directly to the AMPs. By default, one session is created for each AMP. Therefore, on large systems, it is normally a good idea to limit the number of sessions using the SESSIONS command. This capability is shown below.
Simultaneously, all but one of the client sessions begins
loading raw data in 64K blocks for transfer to an AMP. The first
priority of Phase 1 is to get the data onto the AMPs as fast as
possible. To accomplish this, the rows are packed,
unhashed, into large blocks and sent to the AMPs without any concern for which AMP gets the block. The result is that data rows often arrive on a different AMP from the one where they would live had they been hashed first.
So how do the rows get to the correct AMPs where they will
permanently reside? Following the receipt of every data block, each
AMP hashes its rows based on the Primary Index, and redistributes
them to the proper AMP. At this point, the rows are written to a
worktable on the AMP but remain unsorted until Phase 1 is
complete.
Phase 1 can be compared loosely to the preferred method of
transfer used in the parcel shipping industry today. How do the key
players in this industry handle a parcel? When the shipping company
receives a parcel, that parcel is not immediately sent to its final
destination. Instead, for the sake of speed, it is often sent to a
shipping hub in a seemingly unrelated city. Then, from that hub it
is sent to the destination city. FastLoad's Phase 1 uses the AMPs
in much the same way that the shipper uses its hubs. First, all the
data blocks in the load get rushed randomly to any AMP. This just
gets them to a "hub" somewhere in Teradata country. Second, each
AMP forwards them to their true destination. This is like the
shipping parcel being sent from a hub city to its destination
city!
PHASE 2: Application

Following the scenario described above, the shipping vendor must do more than get a parcel to the destination city. Once the packages arrive at the destination city, they must be sorted by street and zip code, placed onto local trucks, and driven to their final, local destinations.
Similarly, FastLoad's Phase 2 is mission critical for getting
every row of data to its final address (i.e., where it will be
stored on disk). In this phase, each AMP sorts the rows in its
worktable. Then it writes the rows into the table space on disks
where they will permanently reside. Rows of a table are stored on
the disks in data blocks. The AMP uses the block size as defined
when the target table was created. If the table is Fallback
protected, then the Fallback will be loaded after the Primary table
has finished loading. This enables the Primary table to become
accessible as soon as possible. FastLoad is so ingenious, no wonder
it is the darling of the Teradata load utilities!
FastLoad Commands

Here is a table of some key FastLoad commands and their definitions. They are used to provide flexibility in controlling the load process. Consider this your personal ready-reference guide! You will notice that only a few SQL commands may be used with this utility (CREATE TABLE, DROP TABLE, DELETE and INSERT). This keeps FastLoad from becoming encumbered with additional functions that would slow it down.

A FastLoad Example in its Simplest Form

The load utilities often scare people because there are many things that appear complicated. In actuality, the load scripts are very simple. Think of FastLoad as:
Logging onto Teradata
Defining the Teradata table that you want to load (target table)
Defining the INPUT data file
Telling the system to start loading
Telling the system to end loading

This first script example is designed to show FastLoad in its simplest form. The actual script is in the left column and our comments are on the right.
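Because the script figure is an image, here is a sketch of a FastLoad job in its simplest form; the logon credentials, file name, and table and column names are assumptions:

```
SESSIONS 4;                              /* limit the number of sessions  */
LOGON ProdTD/user1,password;             /* tdpid/username,password       */

SET RECORD VARTEXT ",";                  /* comma-delimited input records */

DEFINE Employee_No   (VARCHAR(11))
      ,Last_Name     (VARCHAR(20))
      ,First_Name    (VARCHAR(12))
FILE = EMPS.TXT;                         /* the input flat file           */

BEGIN LOADING SQL01.Employee_Profile
      ERRORFILES SQL01.Emp_Err1, SQL01.Emp_Err2;

INSERT INTO SQL01.Employee_Profile
VALUES (:Employee_No
       ,:Last_Name
       ,:First_Name);

END LOADING;                             /* tells FastLoad to do Phase 2  */
LOGOFF;
```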
Figure 4-2

Sample FastLoad Script

Let's look at an actual FastLoad script that you might see in the real world. In the script below, every comment line is placed inside the normal Teradata comment syntax, /* ... */. FastLoad and SQL commands are written in upper case in order to make them stand out. In reality, Teradata utilities, like Teradata itself, are by default not case sensitive. You will also note that when column names are listed vertically, we recommend placing the comma separator in front of the following column. Coding this way makes reading or debugging the script easier for everyone. The purpose of this script is to update the Employee_Profile table in the SQL01 database. The input file used for the load is named EMPS.TXT. Below the sample script, each step will be described in detail.
Normally it is not a good idea to put DROP and CREATE statements in a FastLoad script. The reason is that when any of the tables that FastLoad is using are dropped, the script cannot be restarted; it can only be rerun from the beginning. Since FastLoad has restart logic built into it, a restart is normally the better solution if the initial load attempt should fail. However, for purposes of this example, the DDL shows the table structure and the description of the data being read.
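The script shown in the figure (an image here) can be sketched roughly as follows; the credentials, column list, and data types are assumptions consistent with the description above:

```
SESSIONS 4;                               /* Step 1: limit sessions        */
LOGON ProdTD/user1,password;              /* Step 2: logon (assumed)       */
SHOW VERSIONS;                            /* report the FastLoad version   */

DROP TABLE SQL01.Employee_Profile;        /* DDL makes this job rerun-only */
DROP TABLE SQL01.Emp_Err1;
DROP TABLE SQL01.Emp_Err2;

CREATE TABLE SQL01.Employee_Profile
   ( Employee_No   INTEGER
    ,Last_Name     CHAR(20)
    ,First_Name    VARCHAR(12)
    ,Salary        DECIMAL(10,2)
    ,Dept_No       SMALLINT )
UNIQUE PRIMARY INDEX (Employee_No);

SET RECORD VARTEXT ",";                   /* Step 3: input record layout   */

DEFINE Employee_No   (VARCHAR(11))        /* Step 4: describe the file     */
      ,Last_Name     (VARCHAR(20))
      ,First_Name    (VARCHAR(12))
      ,Salary        (VARCHAR(12))
      ,Dept_No       (VARCHAR(6))
FILE = EMPS.TXT;

BEGIN LOADING SQL01.Employee_Profile      /* Step 5: target + error tables */
      ERRORFILES SQL01.Emp_Err1, SQL01.Emp_Err2
      CHECKPOINT 100000;                  /* checkpoint every 100,000 rows */

INSERT INTO SQL01.Employee_Profile
VALUES (:Employee_No
       ,:Last_Name
       ,:First_Name
       ,:Salary
       ,:Dept_No);

END LOADING;                              /* Step 6: proceed to Phase 2    */
LOGOFF;                                   /* Step 7: release the sessions  */
```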
Figure 4-4

Step One: Before logging onto Teradata, it is important to specify how many sessions you need. The syntax is [SESSIONS {n}].

Step Two: Next, you LOGON to the Teradata system. You will quickly see that the utility commands in FastLoad are similar to those in BTEQ, since FastLoad commands were designed from the underlying commands in BTEQ. However, unlike BTEQ, most of the FastLoad commands do not allow a dot ["."] in front of them and therefore need a semicolon. At this point we chose to have Teradata tell us which version of FastLoad is being used for the load. Why would we recommend this? Because as FastLoad's capabilities get enhanced with newer versions, the syntax of the scripts may have to be revisited.

Step Three: If the input file is not in FastLoad format, then before you describe the INPUT FILE structure in the DEFINE statement, you must first set the RECORD layout type for the file being passed to FastLoad. We have used VARTEXT in our example with a comma delimiter. The options are FASTLOAD, TEXT, UNFORMATTED or VARTEXT. You need to know this about your input file ahead of time.
Step Four: Next, comes the DEFINE statement. FastLoad must know
the structure and the name of the flat file to be used as the input
FILE, or source file for the load.
Step Five: FastLoad makes no assumptions from the DROP TABLE
statements with regard to what you want loaded. In the BEGIN
LOADING statement, the script must name the target table and the
two error tables for the load. Did you notice that there is no
CREATE TABLE statement for the error tables in this script?
FastLoad will automatically create them for you once you name them
in the script. In this instance, they are named "Emp_Err1" and
"Emp_Err2". Phase 1 uses "Emp_Err1" because it comes first and
Phase 2 uses "Emp_Err2". The names are arbitrary, of course. You
may call them whatever you like. They must, however, be unique within a database, so using a combination of your userid and the target table name helps ensure this uniqueness across multiple FastLoad jobs occurring in the same database.
In the BEGIN LOADING statement we have also included the
optional CHECKPOINT parameter. We included [CHECKPOINT 100000].
Although not required, this optional parameter performs a vital
task with regard to the load. In the old days, children were always
told to focus on the three 'R's' in grade school ("reading, riting,
and rithmatic"). There are two very different, yet equally
important, R's to consider whenever you run FastLoad. They are
RERUN and RESTART. RERUN means that the job is capable of running
all the processing again from the beginning of the load. RESTART
means that the job is capable of running the processing again from
the point
where it left off when the job was interrupted, causing it to
fail. When CHECKPOINT is requested, it allows FastLoad to resume
loading from the first row following the last successful
CHECKPOINT. We will learn more about CHECKPOINT in the section on
Restarting FastLoad.
Step Six: FastLoad focuses on its task of loading data blocks to the AMPs the way little Yorkshire terriers do when playing with a ball! It will not stop unless you tell it to stop. Therefore, it will not proceed to Phase 2 without the END LOADING command.
In reality, this provides a very valuable capability for FastLoad. The table must be empty only at the start of the job, so rows can continue arriving, perhaps from files sent in from different time zones. To accomplish this processing, simply omit the END LOADING from the load job. Then you can run the same FastLoad multiple times, continuing to load the worktables until the last file is received. Finally, run the last FastLoad job with an END LOADING, and you have partitioned your load into smaller segments instead of one huge job. This makes FastLoad even faster!
Of course to make this work, FastLoad must be restartable.
Therefore, you cannot use the DROP or CREATE commands within the
script. Additionally, every script is exactly the same with the
exception of the last one, which contains the END LOADING causing
FastLoad to proceed to Phase 2. That's a pretty clever way to do a
partitioned type of data load.
Step Seven: All that goes up must come down. And all the
sessions must LOGOFF. This will be the last utility command in your
script. At this point the table lock is released and if there are
no rows in the error tables, they are dropped automatically.
However, if a single row is in one of them, you are responsible to
check it, take the appropriate action and drop the table
manually.
Converting Data Types with FastLoad

Converting data is easy. Just define the input data types in the input file. Then FastLoad will compare them to the column definitions in the Data Dictionary and convert the data for you! But the cardinal rule is that only one data type conversion is allowed per column. In the example below, notice how the columns in the input file are converted from one data type to another simply by redefining the data type in the CREATE TABLE statement.
FastLoad allows six kinds of data conversions. Here is a chart that displays them:
Figure 4-5
When we said that converting data is easy, we meant that it is easy for the user. It is actually quite resource intensive, thus increasing the amount of time needed for the load. Therefore, if speed is important, keep the number of columns being converted to a minimum!

A FastLoad Conversion Example

This next script example is designed to show how FastLoad converts data automatically when the INPUT data type differs from the target Teradata table's data type. The actual script is in the left column and our comments are on the right.
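A minimal sketch of such a conversion follows; the table, file, and column names are assumptions. The input file supplies character data, while the (assumed) target table defines Employee_No as INTEGER and Hire_Date as DATE, so each column undergoes exactly one conversion:

```
SET RECORD VARTEXT ",";                  /* all VARTEXT input is character */

DEFINE Employee_No   (VARCHAR(11))       /* will convert to INTEGER        */
      ,Hire_Date     (VARCHAR(10))       /* will convert to DATE           */
FILE = EMPS.TXT;

BEGIN LOADING SQL01.Emp_Hist
      ERRORFILES SQL01.Hist_Err1, SQL01.Hist_Err2;

INSERT INTO SQL01.Emp_Hist               /* one conversion per column      */
VALUES (:Employee_No
       ,:Hire_Date);

END LOADING;
LOGOFF;
```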
Figure 4-5

When You Cannot RESTART FastLoad

There are two types of FastLoad scripts: those that you can restart and those that you cannot without modifying the script. If any of the following conditions are true of the FastLoad script that you are dealing with, it is NOT restartable:
The Error Tables are DROPPED
The Target Table is DROPPED
The Target Table is CREATED

Can you tell from the following sample FastLoad script why it is not restartable?
Figure 4-7
Why might you have to RESTART a FastLoad job, anyway? Perhaps you might experience a system reset or some glitch that stops the job halfway through. Maybe the mainframe went down. For small data loads it is not really a big deal, because FastLoad is so lightning-fast that you could probably just RERUN the job.
However, when you are loading a billion rows, a rerun is not a good idea because it wastes time. So the most common way to deal with these situations is simply to RESTART the job. But what if the normal load takes 4 hours, and the glitch occurs when you already have two thirds of the data rows loaded? In that case, you want to make sure that the job is totally restartable. Let's see how this is done.

When You Can RESTART FastLoad

If all of the following conditions are true, then FastLoad is ALWAYS restartable:
The Error Tables are NOT DROPPED in the script
The Target Table is NOT DROPPED in the script
The Target Table is NOT CREATED in the script
You have defined a checkpoint
So, if you need to drop or create tables, do it in a separate
job using BTEQ. Imagine that you have a table whose data changes so
much that you typically drop it monthly and build it again. Let's
go back to the script we just reviewed above and see how we can
break it into the two parts necessary to make it fully RESTARTABLE.
It is broken up below.
STEP ONE: Run the following SQL statements in Queryman or BTEQ
before you start FastLoad:
Figure 4-8
First, you ensure that the target table and error tables, if
they existed previously, are blown away. If there had been no
errors in the error tables, they would be automatically dropped. If
these tables did not exist, you have not lost anything. Next, if
needed, you create the empty table structure needed to receive a
FastLoad.
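The Figure 4-8 statements (an image here) would look something like the following sketch; the names match our earlier example and are assumptions:

```sql
DROP TABLE SQL01.Employee_Profile;       /* remove the old target, if any  */
DROP TABLE SQL01.Emp_Err1;               /* remove leftover error tables   */
DROP TABLE SQL01.Emp_Err2;

CREATE TABLE SQL01.Employee_Profile      /* fresh, empty target table      */
   ( Employee_No   INTEGER
    ,Last_Name     CHAR(20)
    ,First_Name    VARCHAR(12) )
UNIQUE PRIMARY INDEX (Employee_No);
```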
STEP TWO: Run the FastLoad script

This is the portion of the earlier script that carries out these vital steps:
Defines the structure of the flat file
Tells FastLoad where to load the data and store the errors
Specifies the checkpoint so a RESTART will not go back to row one
Loads the data

If these conditions are met, all you need do is resubmit the FastLoad job, and it starts loading data again with the next record after the last checkpoint. With that said, if you did not request a checkpoint, the output message will normally indicate how many records were loaded. You may optionally use the RECORD command to manually restart on the next record after the one indicated in the message.
Now, if the FastLoad job aborts in Phase 2, you can simply submit a script with only the BEGIN LOADING and END LOADING statements. It will then restart right into Phase 2.

What Happens When FastLoad Finishes

You Receive an Outcome Status

The most important thing to do is verify that FastLoad completed successfully. This is accomplished by looking at the last output in the report and making sure that it is a return code or status code of zero (0). Any other value indicates that something wasn't perfect and needs to be fixed.
Without a successful completion, the locks will not be removed and the error tables will not be dropped. This is because FastLoad assumes it will need them for a restart. Likewise, the lock on the target table will not be released. Realistically, once FastLoad is started you have two choices: get it to run to a successful completion, or rerun it from the beginning. As you can imagine, the best course of action is normally to get it to finish successfully via a restart.
You Receive a Status Report

What happens when FastLoad finishes running? Well, you can expect to see a summary report on the success of the load. Following is an example of such a report.
Figure 4-9
The first line displays the total number of records read from
the input file. Were all of them loaded? Not really. The second
line tells us that there were fifty rows with constraint
violations, so they were not loaded. Corresponding to this, fifty
entries were made in the first error table. Line 3 shows that there
were zero entries into the second error table, indicating that
there were no duplicate Unique Primary Index violations. Line 4
shows that there were 999950 rows successfully loaded into the
empty target table. Finally, there were no duplicate rows. Had
there been any duplicate rows, the duplicates would only have been
counted. They are not stored in the error tables anywhere. When
FastLoad reports on its efforts, the number of rows in lines 2
through 5 should always total the number of records read in line
1.
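Based on the counts described, the Figure 4-9 summary (an image here) would read roughly as follows; the total of 1,000,000 records read is inferred from the surrounding text (999,950 loaded plus 50 error-table entries):

```
Total Records Read        =  1000000
Total Error Table 1       =       50   /* constraint violations */
Total Error Table 2       =        0   /* UPI duplicates        */
Total Inserts Applied     =   999950
Total Duplicate Rows      =        0
```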
Note on duplicate rows: Whenever FastLoad experiences a restart, there will normally be duplicate rows counted. This is because a failure seldom occurs exactly on a checkpoint (a quiet or quiescent point when nothing is happening within FastLoad). Therefore, some rows will be sent to the AMPs again, because the restart begins with the next record after the value stored in the checkpoint. Hence, when a restart occurs, the first row after the checkpoint and some of the consecutive rows are sent a second time. These will be caught as duplicate rows after the sort. This restart logic is the reason that FastLoad will not load duplicate rows into a MULTISET table: it assumes they are duplicates generated by a restart.
You can Troubleshoot

In the example above, we know that the load was not entirely successful. But that is not enough. Now we need to troubleshoot in order to identify the errors and correct them. FastLoad generates two error tables that will enable us to find the culprits. The first error table, which we named Errorfile1, contains just three columns: ErrorCode contains the Teradata FastLoad code number for the corresponding translation or constraint error; ErrorFieldName specifies which column in the table contained the error; and DataParcel contains the row with the problem. The second error table tracks a different type of error and has a different layout, described below.
As a user, you can select from either error table. To check errors in Errorfile1 you would use this syntax:

SELECT DISTINCT ErrorCode, ErrorFieldName FROM Errorfile1;

Corrected rows may be inserted into the target table using another utility that does not require an empty table.
To check errors in Errorfile2 you would use the following syntax:

SELECT * FROM Errorfile2;

The definition of the second error table is exactly the same as the target table, with all the same columns and data types.

Restarting FastLoad: A More In-Depth Look
How the CHECKPOINT Option Works

The CHECKPOINT option defines points in a load job at which the FastLoad utility pauses to record that Teradata has processed a specified number of rows. When the parameter CHECKPOINT [n] is included in the BEGIN LOADING clause, the system will pause loading momentarily at increments of [n] rows.
At each CHECKPOINT, the AMPs will all pause and make sure that
everything is loading smoothly. Then FastLoad sends a checkpoint
report (entry) to the SYSADMIN.Fastlog table. This log contains a
list of all currently running FastLoad jobs and the last
successfully reached checkpoint for each job. Should an error occur
that requires the load to restart, FastLoad will merely go back to
the last successfully reported checkpoint prior to the error. It
will then restart from the record immediately following that
checkpoint and start building the next block of data to load. If
such an error occurs in Phase 1, with CHECKPOINT 0, FastLoad will
always restart from the very first row.
Restarting with CHECKPOINT
Sometimes you may need to restart FastLoad. If the FastLoad script
requests a CHECKPOINT (other than 0), then it is restartable from
the last successful checkpoint.
Therefore, if the job fails, simply resubmit the job. Here are the
two options: Suppose Phase 1 halts prematurely; the Data
Acquisition phase is incomplete. Resubmit the FastLoad script.
FastLoad will begin from RECORD 1 or the first record past the last
checkpoint. If you wish to manually specify where FastLoad should
restart, locate the last successful checkpoint record by referring
to the SYSADMIN.FASTLOG table. To specify where a restart will
start from, use the RECORD command. Normally, it is not necessary
to use the RECORD command-let FastLoad automatically determine
where to restart from.
If the interruption occurs in Phase 2, the Data Acquisition
phase has already completed. We know that the error is in the
Application Phase. In this case, resubmit the FastLoad script with
only the BEGIN and END LOADING Statements. This will restart in
Phase 2 with the sort and building of the target table.
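A Phase 2 restart script, then, might contain little more than the logon, the loading bookends, and the logoff (the logon string and table names are illustrative):

```sql
LOGON tdpid/loaduser,password;
BEGIN LOADING SQL01.Employee_Table
   ERRORFILES SQL01.Errorfile1, SQL01.Errorfile2;
END LOADING;
LOGOFF;
```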
Restarting without CHECKPOINT (i.e., CHECKPOINT 0)
When a failure occurs and the FastLoad script did not utilize
CHECKPOINT (i.e., CHECKPOINT 0), one procedure is to DROP the target
table and error tables and rerun the job. Here are some other
options available to you:
1. Resubmit the job again and hope there is enough PERM space for
all the rows already sent to the unsorted target table plus all the
rows that are going to be sent again to the same target table.
Aside from consuming space, these rows will be rejected as duplicates.
As you can imagine, this is not the most efficient approach, since it
processes many of the same rows twice.
2. If CHECKPOINT wasn't specified, then CHECKPOINT defaults to
100,000. You can perform a manual restart using the RECORD
statement. If the output print file shows that checkpoint 100000
occurred, use something like the following command: RECORD
100001;. This statement will skip records 1 through 100000 and
resume on record 100001.
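As a fragment of the resubmitted script, the RECORD command is placed ahead of the INSERT so the already-loaded records are skipped (the table and column names are illustrative):

```sql
/* resubmitted FastLoad fragment: resume past the last good checkpoint */
RECORD 100001;   /* skip records 1 through 100000 */
INSERT INTO SQL01.Employee_Table VALUES (:Employee_No, :Dept_No);
```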
Using INMODs with FastLoad
When you find that FastLoad does not read the file type you have,
or you wish to control the access for any reason, then it might be
desirable to use an INMOD. An INMOD (Input Module) is fully
compatible with FastLoad in either mainframe or LAN environments,
provided that the appropriate programming languages are used.
However, INMODs replace the normal mainframe DDNAME or LAN-defined
FILE name with the following statement: DEFINE INMOD=. For a more
in-depth discussion of INMODs, see the chapter of this book titled
"INMOD Processing".
Chapter 5: MultiLoad
An Introduction to MultiLoad
Why it is called "Multi" Load
If we were going to be stranded on
an island with a Teradata Data Warehouse and we could only take
along one Teradata load utility, clearly, MultiLoad would be our
choice. MultiLoad has the capability to load multiple tables at one
time from either a LAN or Channel environment. This is in stark
contrast to its fleet-footed cousin, FastLoad, which can only load
one table at a time. And it gets better, yet!
This feature-rich utility can perform multiple types of DML
tasks, including INSERT, UPDATE, DELETE and UPSERT on up to five
(5) empty or populated target tables at a time. These DML functions
may be run either solo or in combinations, against one or more
tables. For these reasons, MultiLoad is the utility of choice when
it comes to loading populated tables in the batch environment. As
the volume of data being loaded or updated in a single block
increases, the performance of MultiLoad improves. MultiLoad shines when it can
impact more than one row in every data block. In other words,
MultiLoad looks at massive amounts of data and says, "Bring it
on!"
Leo Tolstoy once said, "All happy families resemble each other."
Like happy families, the Teradata load utilities resemble each
other, although they may have some differences. You are going to be
pleased to find that you do not have to learn all new commands and
concepts for each load utility. MultiLoad has many similarities to
FastLoad. It has even more commands in common with TPump. The
similarities will be evident as you work with them. Where there are
some quirky differences, we will point them out for you.
Two MultiLoad Modes: IMPORT and DELETE
MultiLoad provides two
types of operations via modes: IMPORT and DELETE. In MultiLoad
IMPORT mode, you have the freedom to "mix and match" up to twenty
(20) INSERTs, UPDATEs or DELETEs on up to five target tables. The
execution of the DML statements is not mandatory for all rows in a
table. Instead, their execution hinges upon the conditions
contained in the APPLY clause of the script. Once again, MultiLoad
demonstrates its user-friendly flexibility. For UPDATEs or DELETEs
to be successful in IMPORT mode, they must reference the Primary
Index in the WHERE clause.
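A minimal IMPORT-mode sketch, with hypothetical table, layout, and field names, shows how an APPLY clause ties a DML label to the input data, and how the UPDATE references the Primary Index (Employee_No here) in its WHERE clause:

```sql
.LOGTABLE SQL01.CDW_Log;
.LOGON tdpid/loaduser,password;
.BEGIN IMPORT MLOAD TABLES SQL01.Employee_Table;
.LAYOUT FILEIN;
.FIELD Employee_No  * CHAR(11);
.FIELD Dept_No      * CHAR(6);
.DML LABEL UPDATEEMP;
UPDATE SQL01.Employee_Table
SET   Dept_No = :Dept_No
WHERE Employee_No = :Employee_No;
.IMPORT INFILE empdata.txt
   LAYOUT FILEIN
   APPLY UPDATEEMP;
.END MLOAD;
.LOGOFF;
```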
The MultiLoad DELETE mode is used to perform a global (all AMP)
delete on just one table. The reason to use .BEGIN DELETE MLOAD is
that it bypasses the Transient Journal (TJ) and can be RESTARTed if
an error causes it to terminate prior to finishing. When performing
in DELETE mode, the DELETE SQL statement cannot reference the
Primary Index in the WHERE clause. This is due to the fact that a
primary index access goes to a specific AMP, while this is a global
operation.
The other factor that makes a DELETE mode operation so good is
that it examines an entire block of rows at a time. Once all the
eligible rows have been removed, the block is written one time and
a checkpoint is written. So, if a restart is necessary, it simply
starts deleting rows from the next block after the checkpoint. This
is a smart way to continue. Remember, when using the TJ, all deleted
rows are put back into the table from the TJ as a rollback. A
rollback can take longer to finish than the delete. MultiLoad does
not do a rollback; it does a restart.
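A DELETE-mode sketch (the table, log, and column names are hypothetical) illustrates the point: the WHERE clause qualifies on a non-Primary-Index column such as a date:

```sql
.LOGTABLE SQL01.CDW_Log;
.LOGON tdpid/loaduser,password;
.BEGIN DELETE MLOAD TABLES SQL01.Sales_Table;
DELETE FROM SQL01.Sales_Table
WHERE Sale_Date < '2002-02-01';   /* not the Primary Index */
.END MLOAD;
.LOGOFF;
```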
[Figure: rotating monthly data through a quarterly table]
In the above diagram, monthly data is being stored in a
quarterly table. To keep the contents limited to four months,
monthly data is rotated in and out. At the end of every month, the
oldest month of data is removed and the new month is added. The
cycle is "add a month, delete a month, add a month, delete a
month." In our illustration, that means that January data must be
deleted to make room for May's data.
Here is a question for you: What if there was another way to
accomplish this same goal without consuming all of these extra
resources? To illustrate, let's consider the following scenario:
Suppose you have TableA that contains 12 billion rows. You want to
delete a range of rows based on a date and then load in fresh data
to replace these rows. Normally, the process is to perform a
MultiLoad DELETE such as DELETE FROM TableA WHERE <date_column> <
'2002-02-01'. The final step would be to INSERT the new rows for
May using MultiLoad IMPORT.
Block and Tackle Approach
MultiLoad never loses sight of the fact
that it is designed for functionality, speed, and the ability to
restart. It tackles the proverbial I/O bottleneck problem like
FastLoad by assembling data rows into 64K blocks and writing them
to disk on the AMPs. This is much faster than writing data one row
at a time like BTEQ. Fallback table rows are written after the base
table has been loaded. This allows users to access the base table
immediately upon completion of the MultiLoad while fallback rows
are being loaded in the background. The benefit is reduced time to
access the data.
Amazingly, MultiLoad has full RESTART capability in all of its
five phases of operation. Once again, this demonstrates its
tremendous flexibility as a load utility. Is it pure magic? No, but
it almost seems so. MultiLoad makes effective use of two error
tables to save different types of errors and a LOGTABLE that stores
built-in checkpoint information for restarting. This is why
MultiLoad does not use the Transient Journal, thus averting
time-consuming rollbacks when a job halts prematurely.
Here is a key difference to note between MultiLoad and FastLoad.
Sometimes an AMP (Access Module Processor) fails and the system
administrators say that the AMP is "down" or "offline." When using
FastLoad, you must restart the AMP to restart the job. MultiLoad,
however, can RESTART when an AMP fails, if the table is fallback
protected. At the same time, you can use the AMPCHECK option to
make it work like FastLoad if you want.
MultiLoad Imposes Limits
Rule #1: Unique Secondary Indexes are
not supported on a Target Table. Like FastLoad, MultiLoad does not
support Unique Secondary Indexes (USIs). But unlike FastLoad, it
does support the use of Non-Unique Secondary Indexes (NUSIs)
because the index subtable row is on the same AMP as the data row.
MultiLoad uses every AMP independently and in parallel. If two AMPs
must communicate, they are not independent. Therefore, a NUSI (same
AMP) is fine, but a USI (different AMP) is not.
Rule #2: Referential Integrity is not supported. MultiLoad will
not load data into tables that are defined with Referential
Integrity (RI). Like a USI, this requires the AMPs to communicate
with each other. So, RI constraints must be dropped from the target
table prior to using MultiLoad.
Rule #3: Triggers are not supported at load time. Triggers cause
actions on related tables based upon what happens in a target
table. Again, this is a multi-AMP operation and to a different
table. To keep MultiLoad running smoothly, disable all Triggers
prior to using it.
Rule #4: No concatenation of input files is allowed. MultiLoad
does not want you to do this because it could impact a restart if
the files were concatenated in a different sequence or data was
deleted between runs.
Rule #5: The host will not process aggregates, arithmetic
functions or exponentiation. If you need data conversions or math,
you might be better off using an INMOD to prepare the data prior to
loading it.
Error Tables, Work Tables and Log Tables
Besides the target table(s),
MultiLoad requires the use of four special tables in order to
function. They consist of two error tables (per target table), one
worktable (per target table), and one log table. In essence, the
Error Tables will be used to store any conversion, constraint or
uniqueness violations during a load. Work Tables are used to
receive and sort data and SQL on each AMP prior to storing them
permanently to disk. A Log Table (also called, "Logtable") is used
to store successful checkpoints during load processing in case a
RESTART is needed.
HINT: Sometimes a company wants all of these load support tables
to be housed in a particular database. When these tables are to be
stored in any database other than the user's own default database,
then you must give them a fully qualified name
(databasename.tablename) in the script or use the DATABASE command
to change the current database.
Where will you find these tables in the load script? The
Logtable is generally identified immediately prior to the .LOGON
command. Worktables and error tables can be named in the BEGIN
MLOAD statement. Do not underestimate the value of these tables.
They are vital to the operation of MultiLoad. Without them a
MultiLoad job cannot run. Now that you have had the "executive
summary," let's look at each type of table individually.
Two Error Tables: Here is another place where FastLoad and
MultiLoad are similar. Both require the use of two error tables per
target table. MultiLoad will automatically create these tables.
Rows are inserted into these tables only when errors occur during
the load process. The first error table is the acquisition Error
Table (ET). It contains all translation and constraint errors that
may occur while the data is being acquired from the source(s).
The second is the Uniqueness Violation (UV) table that stores
rows with duplicate values for Unique Primary Indexes (UPI). Since
a UPI must be unique, MultiLoad can only load one occurrence into a
table. Any duplicate value will be stored in the UV error table.
For example, you might see a UPI error that shows a second employee
number "99." In this case, if the name for employee "99" is Kara
Morgan, you will be glad that the row did not load since Kara
Morgan is already in the Employee table. However, if the name
showed up as David Jackson, then you know that further
investigation is needed, because employee numbers must be
unique.
Each error table does the following:
Identifies errors
Provides some detail about the errors
Stores the actual offending row for debugging
You have the option to name these tables in the MultiLoad script
(shown later). Alternatively, if you do not name them, they default
to ET_ and UV_ followed by the target table name. In either case,
MultiLoad will not accept error table names that are the same as
target table names. Whatever you name them, it is recommended that
you standardize on a naming convention to make it easier for
everyone on your team. For more details on how these error tables
can help you, see the subsection in this chapter titled
"Troubleshooting MultiLoad Errors."
Log Table: MultiLoad requires a LOGTABLE. This table keeps a
record of the results from each phase of the load so that MultiLoad
knows the proper point from which to RESTART. There is one LOGTABLE
for each run. Since MultiLoad will not resubmit a command that has
been run previously, it will use the LOGTABLE to determine the last
successfully completed step.
Work Table(s): MultiLoad will automatically create one worktable
for each target table. This means that in IMPORT mode you could
have one or more worktables. In the DELETE mode, you will only have
one worktable since that mode only works on one target table. The
purpose of worktables is to hold two things:
1. The Data Manipulation Language (DML) tasks
2. The input data that is ready to APPLY to the AMPs
The worktables are created in a database using PERM space. They
can become very large. If the script uses multiple SQL statements
for a single data record, the data is sent to the AMP once for each
SQL statement. This replication guarantees fast performance and
that no SQL statement will ever be done more than once. So, this is
very important. However, there is no such thing as a free lunch;
the cost is space. Later, you will see that using a FILLER field
can help reduce this disk space by not sending unneeded data to an
AMP. In other words, the efficiency of the MultiLoad run is in your
hands.
Supported Input Formats
Data input files come in a variety of formats, but MultiLoad is
flexible enough to handle many of them. MultiLoad supports the
following five format options: BINARY, FASTLOAD, TEXT, UNFORMAT
and VARTEXT.
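For example, a comma-delimited file would be read with the VARTEXT option; note that VARTEXT requires the layout fields to be defined as VARCHAR (the file, layout, and label names are illustrative):

```sql
.LAYOUT FILEIN;
.FIELD Employee_No * VARCHAR(11);
.FIELD Last_Name   * VARCHAR(20);
.IMPORT INFILE empdata.txt
   FORMAT VARTEXT ','
   LAYOUT FILEIN
   APPLY INSERTEMP;
```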
Figure 5-1
MultiLoad Has Five IMPORT Phases
MultiLoad IMPORT has five phases, but don't be fazed by this! Here
is the short list:
Phase 1: Preliminary Phase
Phase 2: DML Transaction Phase
Phase 3: Acquisition Phase
Phase 4: Application Phase
Phase 5: Cleanup Phase
Let's take a look at each phase and see what it contributes to
the overall load process of this magnificent utility. Should you
memorize every detail about each phase? Probably not. But it is
important to know the essence of each phase because sometimes a
load fails. When it does, you need to know in which phase it broke
down since the method for fixing the error to RESTART may vary
depending on the phase. And if you can picture what MultiLoad
actually does in each phase, you will likely write better scripts
that run more efficiently.
Phase 1: Preliminary Phase
The ancient oriental proverb says,
"Measure one thousand times; Cut once." MultiLoad uses Phase 1 to
conduct several preliminary set-up activities whose goal is to
provide a smooth and successful climate for running your load. The
first task is to be sure that the SQL syntax and MultiLoad commands
are valid. After all, why try to run a script when the system will
just find out during the load process that the statements are not
useable? MultiLoad knows that it is much better to identify any
syntax errors, right up front. All the preliminary steps are
automated. No user intervention is required in this phase.
Second, all MultiLoad sessions with Teradata need to be
established. The default is the number of available AMPs. Teradata
will quickly establish this number, using a factor of 16, as the
basis for the number of sessions to create. The general rule of
thumb for the number of sessions to use for smaller systems is the
following: use the number of AMPs plus two more. For larger systems
with hundreds of AMP processors, the SESSIONS option is available
to lower the default. Remember, these sessions are running on your
poor little computer as well as on Teradata.
Each session loads the data to Teradata across the network or
channel. Every AMP plays an essential role in the MultiLoad
process. They receive the data blocks, hash each row and send the
rows to the correct AMP. When the rows come to an AMP, it stores
them in worktable blocks on disk. But, lest we get ahead of
ourselves, suffice it to say that there is ample reason for
multiple sessions to be established.
What about the extra two sessions? Well, the first one is a
control session to handle the SQL and logging. The second is a
backup or alternate for logging. You may have to use some trial and
error to find what works best on your system configuration. If you
specify too few sessions it may impair performance and increase the
time it takes to complete load jobs. On the other hand, too many
sessions will reduce the resources available for other important
database activities.
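On a large system, the SESSIONS option in the .BEGIN MLOAD statement is the place to lower the default (the session count shown is purely illustrative):

```sql
.BEGIN IMPORT MLOAD TABLES SQL01.Employee_Table
   SESSIONS 8;
```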
Third, the required support tables are created. They are the
following:
Figure 5-2
The final task of the Preliminary Phase is to apply utility
locks to the target tables. Initially, access locks are placed on
all target tables, allowing other users to read or write to the
table for the time being. However, this lock does prevent a user
from obtaining an exclusive lock. Although these locks will still
allow the MultiLoad user to drop the table, no one else may DROP or
ALTER a target table while it is locked for loading. This leads us
to Phase 2.
Phase 2: DML Transaction Phase
In Phase 2, all of the SQL Data
Manipulation Language (DML) statements are sent ahead to Teradata.
MultiLoad allows the use of multiple DML functions. Teradata's
Parsing Engine (PE) parses the DML and generates a step-by-step
plan to execute the request. This execution plan is then
communicated to each AMP and stored in the appropriate worktable
for each target table. In other words, each AMP is going to work
off the same page.
Later, during the Acquisition phase the actual input data will
also be stored in the worktable so that it may be applied in Phase
4, the Application Phase. Next, a match tag is assigned to each DML
request that will match it with the appropriate rows of input data.
The match tags will not actually be used until the data has already
been acquired and is about to be applied to the worktable. This is
somewhat like a student who receives a letter from the university
in the summer that lists his courses, professor's names, and
classroom locations for the upcoming semester. The letter is a
"match tag" for the student to his school schedule, although it
will not be used for several months. This matching tag for SQL and
data is the reason that the data is replicated for each SQL
statement using the same data record.
Phase 3: Acquisition Phase
With the proper set-up complete and
the PE's plan stored on each AMP, MultiLoad is now ready to receive
the INPUT data. This is where it gets interesting! MultiLoad now
acquires the data in large, unsorted 64K blocks from the host and
sends it to the AMPs.
At this point, Teradata does not care about which AMP receives
the data block. The blocks are simply sent, one after the other, to
the next AMP in line. For their part, each AMP begins to deal with
the blocks that they have been dealt. It is like a game of
cards-you take the cards that you have received and then play the
game. You want to keep some and give some away.
Similarly, the AMPs will keep some data rows from the blocks and
give some away. The AMP hashes each row on the primary index and
sends it over the BYNET to the proper AMP where it will ultimately
be used. But the row does not get inserted into its target table,
just yet. The receiving AMP must first do some preparation before
that happens. Don't you have to get ready before company arrives at
your house? The AMP puts all of the hashed rows it has received
from other AMPs into the worktables where it assembles them into
the SQL. Why? Because once the rows are reblocked, they can be
sorted into the proper order for storage in the target table. Now
the utility places a load lock on each target table in preparation
for the Application Phase. Of course, there is no Acquisition Phase
when you perform a MultiLoad DELETE task, since no data is being
acquired.
Phase 4: Application Phase
The purpose of this phase is to write,
or APPLY, the specified changes to both the target tables and NUSI
subtables. Once the data is on the AMPs, it is married up to the
SQL for execution. To accomplish this substitution of data into
SQL, the host has already attached some sequence information and
five (5) match tags to each data row when sending the data. Those
match tags are used to join the data with the proper SQL statement
based on its DML label. In addition to associating each row with
the correct DML statement, match tags also guarantee that no row
will be updated more than once, even when a RESTART occurs.
The following five columns are the matching tags: