Teradata Utilities-Breaking the Barriers, First Edition
Chapter 2: BTEQ
An Introduction to BTEQ
Why is it called BTEQ?
Why is BTEQ available on every Teradata system ever built? Because the Batch TEradata Query (BTEQ) tool was the original way that SQL was submitted to Teradata as a means of getting an answer set in a desired format. This is the utility that I used for training at Wal*Mart, AT&T, Anthem Blue Cross and Blue Shield, and SouthWestern Bell back in the early 1990s. BTEQ is often referred to as the Basic TEradata Query; it is still used today and continues to be an effective tool.
Here is what is excellent about BTEQ: BTEQ can be used to submit
SQL in either a batch or interactive environment. Interactive users
can submit SQL
and receive an answer set on the screen. Users can also submit
BTEQ jobs from batch scripts, have error checking and conditional
logic, and allow for the work to be done in the background.
BTEQ outputs a report format, whereas Queryman outputs data in a format more like a spreadsheet. This gives BTEQ a great deal of flexibility in formatting data, creating headings, and utilizing Teradata extensions, such as WITH and WITH BY, that Queryman has problems handling.
BTEQ is often used to submit SQL, but is also an excellent tool for importing and exporting data.
o Importing Data: Data can be read from a file on either a mainframe or LAN-attached computer and used for substitution directly into any Teradata SQL using the INSERT, UPDATE or DELETE statements.
o Exporting Data: Data can be written to either a mainframe or LAN-attached computer using a SELECT from Teradata. You can also pick the format you desire, ranging from data files to printed reports to Excel formats.
There are other utilities that are faster than BTEQ for
importing or exporting data. We will talk about these in future
chapters, but BTEQ is still used for smaller jobs.
Logging on to BTEQ
Before you can use BTEQ, you must have user access rights to the client system and privileges to the Teradata DBS. Normal system access privileges include a userid and a password. Some systems may also require additional user identification codes depending on company standards and operational procedures. Depending on the configuration of your Teradata DBS, you may need to include an account identifier (acctid) and/or a Teradata Director Program Identifier (TDPID).
Using BTEQ to submit queries
Submitting SQL in BTEQ's Interactive Mode
Once you log on to Teradata through BTEQ, you are ready to run your queries. Teradata knows the SQL is finished when it finds a semi-colon, so don't forget to put one at the end of your query. Below is an example of a Teradata table to demonstrate BTEQ operations.
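The flow above can be sketched as a minimal interactive session. The TDPID (educ), userid, and password shown are illustrative assumptions, not values from this book:

```sql
.LOGON educ/sql01,mypassword

SELECT *
FROM   Employee_Table;   /* the semi-colon tells Teradata the SQL is finished */

.LOGOFF
.QUIT
```

The answer set is displayed on the screen as soon as the request completes.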
Employee_Table
Figure 2-1
BTEQ execution
Figure 2-2
Submitting SQL in BTEQ's Batch Mode
On network-attached systems, BTEQ can also run in batch mode under UNIX (IBM AIX, Hewlett-Packard HP-UX, NCR MP-RAS, Sun Solaris), DOS, Macintosh, Microsoft Windows and OS/2 operating systems. To submit a job in batch mode do the following:
1. Invoke BTEQ
2. Type in the input file name
3. Type in the location and output file name.
The following example shows how to invoke BTEQ from a DOS
command. In order for this to work, the directory called Program
Files\NCR\Teradata Client\bin must be established in the search
path.
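A hypothetical invocation following those three steps might look like this; the script and output file names are illustrative:

```
C:\> BTEQ < bteq_script.txt > bteq_output.txt
```

The < redirects the input script into BTEQ and the > captures the results in the output file.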
Figure 2-3
Notice that the BTEQ command is immediately followed by the input file name and then the output file name.
Using BTEQ Conditional Logic
Below is a BTEQ batch script example. The initial steps of the script establish the logon and the database, and delete all the rows from the Employee_Table. If the table does not exist, the BTEQ conditional logic will instruct Teradata to create it. However, if the table already exists, then Teradata will move forward and insert data.
Note In script examples, the left panel contains BTEQ base
commands and the right panel provides a brief description of each
command.
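The logic described above can be sketched as follows. This is a hedged reconstruction, not the book's exact script; the logon, column definitions, and sample rows are illustrative:

```sql
.LOGON educ/sql01,mypassword
DATABASE SQL_Class;

DELETE FROM Employee_Table;
/* If the DELETE worked, the table exists; skip the CREATE */
.IF ERRORCODE = 0 THEN .GOTO INSERTEMPS

CREATE TABLE Employee_Table
 (Employee_No INTEGER
 ,Last_name   CHAR(20)
 ,First_name  VARCHAR(12)
 ,Salary      DECIMAL(8,2)
 ,Dept_No     SMALLINT)
UNIQUE PRIMARY INDEX (Employee_No);

.LABEL INSERTEMPS
INSERT INTO Employee_Table VALUES (1232578,'Chambers','Mandee',56177.50,100);
INSERT INTO Employee_Table VALUES (1256349,'Harrison','Herbert',54500.00,400);
.LOGOFF
.QUIT
```

Either way the script reaches the INSERTEMPS label with an empty table ready for the inserts.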
Figure 2-5
Using BTEQ to Export Data
BTEQ allows data to be exported directly from Teradata to a file on a mainframe or network-attached computer. In addition, the BTEQ export function has several export formats that a user can choose depending on the desired output. Generally, users will export data to a flat file format that is composed of a variety of characteristics. These characteristics include: record mode, field mode, indicator mode, or DIF mode. Below is an expanded explanation of the different mode options.
Format of the EXPORT command: .EXPORT { FILE | DDNAME } = <filename> [, LIMIT=n]
Record Mode: (also called DATA mode): This is set by .EXPORT
DATA. This will bring data back as a flat file. Each parcel will
contain a complete record. Since it is not a report, there are no
headers or white space between the data contained in each column
and the data is written to the file (e.g., disk drive file) in
native format. For example, this means that INTEGER data is written
as a 4-byte binary field. Therefore, it cannot be read and
understood using a normal text editor.
Field Mode (also called REPORT mode): This is set by .EXPORT REPORT. This is the default mode for BTEQ and brings the data back as if it were a standard SQL SELECT statement. The output of this BTEQ export returns the column headers for the fields and white space, and expands packed or binary data (for humans to read), so it can be understood using a text editor.
Indicator Mode: This is set by .EXPORT INDICDATA. This mode
writes the data in data mode, but also provides host operating
systems with the means of recognizing missing or unknown data
(NULL) fields. This is important if the data is to be loaded into
another Relational Database System (RDBMS).
The issue is that there is no standard character defined to represent either a numeric or character NULL. So, systems typically substitute a zero for a numeric NULL and a space or blank for a character NULL. If this data is simply loaded into another RDBMS, it is no longer a NULL, but a zero or space.
To remedy this situation, INDICDATA puts a bitmap at the front of
every record written to the disk. This bitmap contains one bit per
field/column. When a Teradata column contains a NULL, the bit for
that field is turned on by setting it to a "1". Likewise, if the
data is not NULL, the bit remains a zero. Therefore, the loading
utility reads these bits as indicators of NULL data and identifies
the column(s) as NULL when data is loaded back into the table,
where appropriate.
Since both DATA and INDICDATA store each column on disk in
native format with known lengths and characteristics, they are the
fastest method of transferring data. However, it becomes imperative
that you be consistent. When it is exported as DATA, it must be
imported as DATA and the same is true for INDICDATA.
Again, this internal processing is automatic and potentially
important. Yet, on a network-attached system, being consistent is
our only responsibility. However, on a mainframe system, you must
account for these bits when defining the LRECL in the Job Control
Language (JCL). Otherwise, your length is too short and the job
will end with an error.
To determine the correct length, the following information is
important. As mentioned earlier, one bit is needed per field output
onto disk. However, computers allocate data in bytes, not bits.
Therefore, if even one bit is needed, a minimum of one byte (8 bits) is allocated. For every eight fields selected, the LRECL becomes 1 byte longer. In other words, for nine columns selected, 2 bytes are added even though only nine bits are needed.
With this being stated, there is one indicator bit per field
selected. INDICDATA mode gives the Host computer the ability to
allocate bits in the form of a byte. Therefore, if one bit is
required by the host system, INDICDATA mode will automatically
allocate eight of them. This means that from one to eight columns
being referenced in the SELECT will add one byte to the length of
the record. When selecting nine to sixteen columns, the output
record will be two bytes longer.
When executing on non-mainframe systems, the record length is automatically maintained. However, when exporting to a mainframe, the JCL (LRECL) must account for this additional length.
DIF Mode: Data Interchange Format, which allows users to export data from Teradata to be directly utilized in spreadsheet applications like Excel, FoxPro and Lotus.
The optional LIMIT tells BTEQ to stop returning rows after a specific number (n) of rows. This might be handy in a test environment to stop BTEQ before the end of transferring rows to the file.
BTEQ EXPORT Example Using Record (DATA) Mode
The following is an example that displays how to utilize the export Record (DATA) option. Notice the periods (.) at the beginning of some of the script lines. A period starting a line indicates a BTEQ command. If there is no period, then the command is an SQL command.
When doing an export on a mainframe or a network-attached (e.g., LAN) computer, there is one primary difference in the .EXPORT command:
Mainframe syntax: .EXPORT DATA DDNAME = data definition statement name (JCL)
LAN syntax: .EXPORT DATA FILE = actual file name
The following example uses a Record (DATA) Mode format. The
output of the exported data will be a flat file.
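A hedged sketch of such a script; the file name and logon values are illustrative:

```sql
.LOGON educ/sql01,mypassword
.EXPORT DATA FILE = employee.dat

SELECT Employee_No
      ,Last_name
      ,First_name
      ,Salary
      ,Dept_No
FROM   Employee_Table;

.EXPORT RESET
.LOGOFF
.QUIT
```

The .EXPORT RESET closes the export file and returns BTEQ output to the screen.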
Employee_Table
Figure 2-6
BTEQ EXPORT Example Using Field (Report) Mode
The following is an example that displays how to utilize the export Field (Report) option. Notice the periods (.) at the beginning of some of the script lines. A period starting a line indicates a BTEQ command and needs no semi-colon. Likewise, if there is no period, then the command is an SQL command and requires a semi-colon.
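A minimal sketch of a Field (Report) mode export, again with illustrative names:

```sql
.LOGON educ/sql01,mypassword
.EXPORT REPORT FILE = employee_report.txt

SELECT Employee_No
      ,Last_name
      ,First_name
      ,Salary
      ,Dept_No
FROM   Employee_Table;

.EXPORT RESET
.LOGOFF
.QUIT
```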
Figure 2-7
After this script has completed, the following report will be
generated on disk.
Employee_No Last_name First_name Salary Dept_No
2000000 Jones Squiggy 32800.50 ?
1256349 Harrison Herbert 54500.00 400
1333454 Smith John 48000.00 200
1121334 Strickling Cletus 54500.00 400
1324657 Coffing Billy 41888.88 200
2341218 Reilly William 36000.00 400
1232578 Chambers Mandee 56177.50 100
1000234 Smythe Richard 64300.00 10
2312225 Larkins Loraine 40200.00 300
I remember when my mom and dad purchased my first Lego set. I
was so excited about building my first space station that I ripped
the box open, and proceeded to follow the instructions to complete
the station. However, when I was done, I was not satisfied with the
design and decided to make changes. So I built another space ship
and constructed another launching station. BTEQ export works in the
same manner, as the basic EXPORT knowledge is acquired, the more we
can build on that foundation.
With that being said, the following is an example that displays
a more robust example of utilizing the Field (Report) option. This
example will export data in Field (Report) Mode format. The output
of the exported data will appear like a standard output of a SQL
SELECT statement. In addition, aliases and a title have been added
to the script.
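A hedged sketch of such a script; the .SET values and TITLE phrases are illustrative assumptions rather than the book's exact figure:

```sql
.LOGON educ/sql01,mypassword
.SET WIDTH 80
.SET FORMAT ON
.SET HEADING 'Employee Profiles'
.EXPORT REPORT FILE = employee_profiles.txt

SELECT Employee_No (TITLE 'Employee Number')
      ,Last_name   (TITLE 'Last Name')
      ,First_name  (TITLE 'First Name')
      ,Salary
      ,Dept_No     (TITLE 'Department Number')
FROM   Employee_Table;

.EXPORT RESET
.LOGOFF
.QUIT
```

The TITLE phrase supplies the alias that appears as the column header in the report.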
Figure 2-8
After this script has been completed, the following report will
be generated on disk.
Employee Profiles
Employee Number Last Name First Name Salary Department
Number
2000000 Jones Squiggy 32800.50 ?
1256349 Harrison Herbert 54500.00 400
1333454 Smith John 48000.00 200
1121334 Strickling Cletus 54500.00 400
1324657 Coffing Billy 41888.88 200
2341218 Reilly William 36000.00 400
1232578 Chambers Mandee 56177.50 100
1000234 Smythe Richard 64300.00 10
2312225 Larkins Loraine 40200.00 300
From the above example, a number of BTEQ commands were added to the export script. Below is a review of those commands. The WIDTH specifies the width of screen displays and printed reports, based on characters per line. The FORMAT command allows the ability to enable/inhibit the page-oriented format option. The HEADING command specifies a header that will appear at the top of every page of a report.
BTEQ IMPORT Example
BTEQ can also read a file from the hard disk and incorporate the data into SQL to modify the contents of one or more tables. In order to do this processing, the name and record description of the file must be known ahead of time. These will be defined within the script file.
Format of the IMPORT command: .IMPORT { FILE | DDNAME } = <filename> [, SKIP=n]
The script below introduces the IMPORT command with the Record (DATA) option. Notice the periods (.) at the beginning of some of the script lines. A period starting a line indicates a BTEQ command. If there is no period, then the command is an SQL command.
The SKIP option is used when you wish to bypass the first n records in a file. For example, a mainframe tape may have header records that should not be processed. Other times, maybe the job started and loaded a few rows into a table with a UPI defined. Loading them again would cause an error, so you can skip over them using this option.
The following example will use a Record (DATA) Mode format. The
input of the imported data will populate the Employee_Table.
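A hedged sketch of the import pattern; the file name, logon, and field definitions are illustrative:

```sql
.LOGON educ/sql01,mypassword
.IMPORT DATA FILE = employee.dat
.QUIET ON
.REPEAT *

USING ( in_Employee_No INTEGER
       ,in_Last_name   CHAR(20)
       ,in_First_name  VARCHAR(12)
       ,in_Salary      DECIMAL(8,2)
       ,in_Dept_No     SMALLINT )
INSERT INTO Employee_Table
VALUES ( :in_Employee_No
        ,:in_Last_name
        ,:in_First_name
        ,:in_Salary
        ,:in_Dept_No );

.QUIT
```

The colon (:) in front of each name tells Teradata to substitute the value read from the file into the INSERT.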
Figure 2-9
From the above example, a number of BTEQ commands were added to the import script. Below is a review of those commands. .QUIET ON limits BTEQ output to reporting only errors and request processing statistics. Note: Be careful how you spell .QUIET; forget the E and it becomes .QUIT, and it will! .REPEAT causes BTEQ to read a specified number of records, or to read until EOF when * is used; the default is one record. Using .REPEAT 10 would perform the loop 10 times. The USING clause defines the input data fields and their associated data types coming from the host.
The following builds upon the IMPORT Record (DATA) example
above. The example below will still utilize the Record (DATA) Mode
format. However, this script will add a CREATE TABLE statement. In
addition, the imported data will populate the newly created
Employee_Profile table.
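A hedged sketch combining the CREATE TABLE with the import; the Employee_Profile column definitions are assumed for illustration:

```sql
.LOGON educ/sql01,mypassword
DATABASE SQL_Class;

CREATE TABLE Employee_Profile
 (Employee_No INTEGER
 ,Last_name   CHAR(20)
 ,First_name  VARCHAR(12)
 ,Salary      DECIMAL(8,2)
 ,Dept_No     SMALLINT)
UNIQUE PRIMARY INDEX (Employee_No);

.IMPORT DATA FILE = employee.dat
.QUIET ON
.REPEAT *

USING ( in_Employee_No INTEGER
       ,in_Last_name   CHAR(20)
       ,in_First_name  VARCHAR(12)
       ,in_Salary      DECIMAL(8,2)
       ,in_Dept_No     SMALLINT )
INSERT INTO Employee_Profile
VALUES ( :in_Employee_No, :in_Last_name, :in_First_name, :in_Salary, :in_Dept_No );

.LOGOFF
.QUIT
```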
Figure 2-10
Notice that some of the scripts have a .LOGOFF and .QUIT. The .LOGOFF is optional because when BTEQ quits, the session is terminated. A logoff makes it a friendly departure and also allows you to logon with a different user name and password.
Determining Output Record Lengths
Some hosts, such as IBM mainframes, require the correct LRECL (Logical Record Length) parameter in the JCL, and will abort if the value is incorrect. The following discusses how to figure out the record lengths.
There are three issues involving record lengths: fixed columns, variable columns, and NULL indicators.
Fixed Length Columns: For fixed length columns you merely count
the length of the column. The lengths are:
INTEGER 4 bytes
SMALLINT 2 bytes
BYTEINT 1 byte
CHAR(10) 10 bytes
CHAR(4) 4 bytes
DATE 4 bytes
DECIMAL(7,2) 4 bytes (packed data, total digits / 2 +1)
DECIMAL(12,2) 8 bytes
Variable columns: Variable length columns should be calculated as the maximum value plus two. These two bytes hold the binary length of the actual data in the field. In reality you can save much space because trailing blanks are not kept. The logical record will assume the maximum and add two bytes as a length field per column.
VARCHAR(8) 10 bytes
VARCHAR(10) 12 bytes
Indicator columns: As explained earlier, the indicators utilize
a single bit for each field. If your record has 8 fields (which
require 8 bits), then you add one extra byte to the total length of
all the fields. If your record has 9-16 fields, then add two
bytes.
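As a worked example, assume a five-column record with the definitions shown below (the column definitions are illustrative, but the byte counts follow the rules above), exported in INDICDATA mode:

```
Employee_No  INTEGER       4 bytes
Last_name    CHAR(20)     20 bytes
First_name   VARCHAR(12)  14 bytes  (12 maximum + 2-byte length field)
Salary       DECIMAL(8,2)  5 bytes  (8 digits / 2 + 1)
Dept_No      SMALLINT      2 bytes
Indicators   5 fields      1 byte   (5 bits rounded up to one byte)
                          --------
             LRECL     =  46 bytes
```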
BTEQ Return Codes
Return codes are two-digit values that BTEQ returns to the user after completing each job or task. The value of the return code indicates the completion status of the job or task as follows:
Return Code  Description
00  Job completed with no errors.
02  User alert to log on to the Teradata DBS.
04  Warning error.
08  User error.
12  Severe internal error.
You can override the standard return codes at the time you terminate BTEQ. This might be handy for debug purposes. The error code or "return code" can be any number you specify using one of the following:
Override Code
.QUIT 15
.EXIT 15
BTEQ Commands
The BTEQ commands in Teradata are designed for flexibility. These commands are not used directly on the data inside the tables. However, these 60 different BTEQ commands are utilized in four areas:
Session Control Commands
File Control Commands
Sequence Control Commands
Format Control Commands
Session Control Commands
Figure 2-11
File Control Commands
These BTEQ commands are used to specify the formatting parameters of incoming and outgoing information. This includes identifying sources and determining I/O streams.
Figure 2-12
Sequence Control Commands
These commands control the sequence in which Teradata commands operate.
Figure 2-13
Format Control Commands
These commands control the formatting for Teradata and present the data in a report mode to the screen or printer.
Figure 2-14
Chapter 3: FastExport
An Introduction to FastExport
Why is it called "FAST" Export
FastExport is known for its lightning speed when it comes to exporting vast amounts of data from Teradata and transferring the data into flat files on either a mainframe or network-attached computer. In addition, FastExport has the ability to accept OUTMOD routines, which provide the user the capability to write, select, validate, and preprocess the exported data. Part of this speed is achieved because FastExport takes full advantage of Teradata's parallelism.
In this book, we have already discovered how BTEQ can be utilized to export data from Teradata in a variety of formats. As the demand to store data increases, so does the requirement for tools that can export massive amounts of data.
This is the reason why FastExport (FEXP) is brilliant by design.
A good rule of thumb is that if you have more than half a million
rows of data to export to either a flat file format or with NULL
indicators, then FastExport is the best choice to accomplish this
task.
Keep in mind that FastExport is designed as a one-way
utility-that is, the sole purpose of FastExport is to move data out
of Teradata. It does this by harnessing the parallelism that
Teradata provides.
FastExport is extremely attractive for exporting data because it
takes full advantage of multiple sessions, which leverages Teradata
parallelism. FastExport can also export from multiple tables during
a single operation. In addition, FastExport utilizes the Support
Environment, which provides a job restart capability from a
checkpoint if an error occurs during the process of executing an
export job.
How FastExport Works
When FastExport is invoked, the utility logs onto the Teradata database, retrieves the rows that are specified in the SELECT statement, and puts them into SPOOL. From there, it must build blocks to send back to the client. In comparison, BTEQ starts sending rows immediately for storage into a file.
If the output data is sorted, FastExport may be required to
redistribute the selected data two times across the AMP processors
in order to build the blocks in the correct sequence. Remember, a
lot of rows fit into a 64K block and both the rows and the blocks
must be sequenced. While all of this redistribution is occurring, BTEQ continues to send rows, and FastExport gets behind in the processing. However, when FastExport starts sending the rows back a block at a time, it quickly overtakes and passes BTEQ's row-at-a-time processing.
The other advantage is that if BTEQ terminates abnormally, all
of your rows (which are in SPOOL) are discarded. You must rerun the
BTEQ script from the beginning. However, if FastExport terminates
abnormally, all the selected rows are in worktables and it can
continue sending them where it left off. Pretty smart and very
fast!
Also, if there is a requirement to manipulate the data before
storing it on the computer's hard drive, an OUTMOD routine can be
written to modify the result set after it is sent back to the
client on either the mainframe or LAN. Just like the BASF commercial states, "We don't make the products you buy, we make the products you buy better." FastExport is designed off the same premise: it does not make the SQL SELECT statement faster, but it does take the SQL SELECT statement and process the request with lightning-fast parallel processing!
FastExport Fundamentals
#1: FastExport EXPORTS data from Teradata. The reason they call it FastExport is because it takes data off of Teradata (exports data). FastExport does not import data into Teradata. Additionally, like BTEQ, it can output multiple files in a single run.
#2: FastExport only supports the SELECT statement. The only DML
statement that FastExport understands is SELECT. You SELECT the
data you want exported and FastExport will take care of the
rest.
#3: Choose FastExport over BTEQ when Exporting Data of more than
half a million+ rows. When a large amount of data is being
exported, FastExport is recommended over BTEQ Export. The only
drawback is the total number of FastLoads, FastExports, and
MultiLoads that can run at the same time, which is limited to 15.
BTEQ Export
does not have this restriction. Of course, FastExport will work
with less data, but the speed may not be much faster than BTEQ.
#4: FastExport supports multiple SELECT statements and multiple
tables in a single run. You can have multiple SELECT statements
with FastExport and each SELECT can join information up to 64
tables.
#5: FastExport supports conditional logic, conditional
expressions, arithmetic calculations, and data conversions.
FastExport is flexible and supports the above conditions,
calculations, and conversions.
#6: FastExport does NOT support error files or error limits.
FastExport does not record particular error types in a table. The
FastExport utility will terminate after a certain number of errors
have been encountered.
#7: FastExport supports user-written INMOD and OUTMOD routines. FastExport allows you to write INMOD and OUTMOD routines so you can select, validate and preprocess the exported data.
FastExport Supported Operating Systems
The FastExport utility is supported on either the mainframe or on LAN. The information below illustrates which operating systems are supported for each environment:
The LAN environment supports the following operating systems: UNIX MP-RAS, Windows 2000, Windows 95, Windows NT, UNIX HP-UX, AIX, Solaris SPARC, and Solaris Intel.
The Mainframe (Channel-Attached) environment supports the following operating systems: MVS and VM.
Maximum of 15 Loads
The Teradata RDBMS will only support a maximum of 15 simultaneous FastLoad, MultiLoad, or FastExport utility jobs. This maximum value is determined and configured by the DBS Control record. This value can be set from 0 to 15. When Teradata is initially installed, this value is set at 5.
The reason for this limitation is that FastLoad, MultiLoad, and FastExport all use large blocks to transfer data. If more than 15 simultaneous jobs were supported, a saturation point could be reached on the availability of resources. In this case, Teradata does an excellent job of protecting system resources by queuing up additional FastLoad, MultiLoad, and FastExport jobs that are attempting to connect.
For example, if the maximum number of utilities on the Teradata system is reached and another job attempts to run, that job does not start. This limitation should be viewed as a safety control feature. A tip for remembering how the load limit applies is this: "If the name of the load utility contains either the word 'Fast' or the word 'Load', then there can be only a total of fifteen of them running at any one time."
BTEQ does not have this load limitation. FastExport is clearly the better choice when exporting data. However, if too many load jobs are running, BTEQ is an alternative choice for exporting data.
FastExport Support and Task Commands
FastExport accepts both FastExport commands and a subset of SQL statements. The FastExport commands can be broken down into support and task activities. The table below highlights the key FastExport commands and their definitions. These commands provide flexibility and control during the export process.
Support Environment Commands (see Support Environment chapter
for details)
Figure 3-1
Task Commands
Figure 3-2
FastExport Supported SQL Commands
FastExport accepts the following Teradata SQL statements. Each has been placed in alphabetic order for your convenience.
SQL Commands
Figure 3-3
A FastExport in its Simplest Form
The hobby of racecar driving can be extremely frustrating, challenging, and rewarding all at the same time. I always remember my driving instructor coaching me during a practice session in a new car around a road course racetrack. He said to me, "Before you can learn to run, you need to learn how to walk." This same philosophy can be applied when working with FastExport. If FastExport is broken into steps, then several things that appear to be complicated are really very simple. With this being stated, FastExport can be broken into the following steps:
1. Logging onto Teradata
2. Retrieving the rows you specify in your SELECT statement
3. Exporting the data to the specified file or OUTMOD routine
4. Logging off of Teradata
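Those steps can be sketched as a bare-bones script; the logtable, logon, and file names are illustrative assumptions:

```sql
.LOGTABLE SQL_Class.Restart_Log_FX;   /* restart log for the Support Environment */
.LOGON educ/sql01,mypassword;
.BEGIN EXPORT SESSIONS 4;
.EXPORT OUTFILE employee.out;

SELECT Employee_No
      ,Last_name
      ,Salary
FROM   SQL_Class.Employee_Table;

.END EXPORT;
.LOGOFF;
```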
Figure 3-4
Sample FastExport Script
Now that the first steps have been taken to understand FastExport, the next step is to journey forward and review another example that builds upon what we have learned. In the script below, Teradata comment lines have been placed inside the script (/* ... */). In addition, FastExport and SQL commands are written in upper case in order to highlight them. Another note is that the column names are listed vertically. The recommendation is to place the comma separator in front of the following column. Coding this way makes reading or debugging the script easier to accomplish.
Figure 3-5
FastExport Modes and Formats
FastExport Modes
FastExport has two modes: RECORD or INDICATOR. In the mainframe world, only use RECORD mode. In the UNIX or LAN environment, RECORD mode is the default, but you can use INDICATOR mode if desired. The difference between the two modes is that INDICATOR mode will set the indicator bits to 1 for column values containing NULLs.
Both modes return data in a client internal format with
variable-length records. Each individual record has a value for all
of the columns specified by the SELECT statement. All
variable-length columns are preceded by a two-byte control value
indicating the length of the column data. NULL columns have a value
that is appropriate for the column data type. Remember, INDICATOR
mode will set bit flags that identify the columns that have a null
value.
FastExport Formats
FastExport has many possible formats in the UNIX or LAN environment. The FORMAT statement specifies the format for each record being exported, which are: FASTLOAD, BINARY, TEXT, and UNFORMAT.
The default FORMAT is FASTLOAD in a UNIX or LAN environment.
FASTLOAD format is a two-byte integer, followed by the data, followed by an end-of-record marker. It is called FASTLOAD because the data is exported in a format ready for FastLoad.
BINARY Format is a two-byte integer, followed by data.
TEXT is an arbitrary number of bytes followed by an
end-of-record marker.
UNFORMAT is exported as it is received from CLIv2 without any
client modifications.
A FastExport Script using Binary Mode
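A hedged sketch of such a script, combining the MODE and FORMAT options described above; the names are illustrative:

```sql
.LOGTABLE SQL_Class.Restart_Log_FX;
.LOGON educ/sql01,mypassword;
.BEGIN EXPORT SESSIONS 4;
.EXPORT OUTFILE employee.bin
        MODE RECORD FORMAT BINARY;   /* two-byte length, then the data */

SELECT Employee_No
      ,Salary
FROM   SQL_Class.Employee_Table;

.END EXPORT;
.LOGOFF;
```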
Figure 3-6
Chapter 4: FastLoad
An Introduction to FastLoad
Why is it called "FAST" Load
FastLoad is known for its lightning-like speed in loading vast amounts of data from flat files from a host into empty tables in Teradata. Part of this speed is achieved because it does not use the Transient Journal. You will see some more of the reasons enumerated below. But, regardless of the reasons that it is fast, know that FastLoad was developed to load millions of rows into a table.
The way FastLoad works can be illustrated by home construction,
of all things! Let's look at three scenarios from the construction
industry to provide an amazing picture of how the data gets
loaded.
Scenario One: Builders prefer to start with an empty lot and
construct a house on it, from the foundation right on up to the
roof. There is no pre-existing construction, just a smooth, graded
lot. The fewer barriers there are to deal with, the quicker the new
construction can progress. Building custom or spec houses this way
is the fastest way to build them. Similarly, FastLoad likes to
start with an empty table, like an empty lot, and then populate it
with rows of data from another source. Because the target table is
empty, this method is typically the fastest way to load data.
FastLoad will never attempt to insert rows into a table that
already holds data.
Scenario Two: The second scenario in this analogy is when
someone buys the perfect piece of land on which to build a home,
but the lot already has a house on it. In this case, the person may
determine that it is quicker and more advantageous just to demolish
the old house and start fresh from the ground up-allowing for brand
new construction. FastLoad also likes this approach to loading
data. It can just 1) drop the existing table, which deletes the
rows, 2) replace its structure, and then 3) populate it with the
latest and greatest data. When dealing with huge volumes of new
rows, this process will run much quicker than using MultiLoad to
populate the existing table. Another option is to DELETE all the
data rows from a populated target table and reload it. This
requires less updating of the Data Dictionary than dropping and
recreating a table. In either case, the result is a perfectly empty
target table that FastLoad requires!
Scenario Three: Sometimes, a customer has a good house already
but wants to remodel a portion of it or to add an additional room.
This kind of work takes more time than the work described in
Scenario One. Such work requires some tearing out of existing
construction in order to build the new section. Besides, the
builder never knows what he will encounter beneath the surface of
the existing home. So you can easily see that remodeling or
additions can take more time than new construction. In the same
way, existing tables with data may need to be updated by adding new
rows of data. To load populated tables quickly with large amounts
of data while maintaining the data currently held in those tables,
you would choose MultiLoad instead of FastLoad. MultiLoad is
designed for this task but, like renovating or adding onto an
existing house, it may take more time.
How FastLoad Works

What makes FastLoad perform so well when it is loading millions or even billions of rows? FastLoad assembles data into 64K blocks (64,000 bytes) for transfer and can use multiple sessions simultaneously, taking further advantage of Teradata's parallel processing.
This is different from BTEQ and TPump, which load data at the
row level. It has been said, "If you have it, flaunt it!" FastLoad
does not like to brag, but it takes full advantage of Teradata's
parallel architecture. In fact, FastLoad will create a Teradata
session for each AMP (Access Module Processor-the software
processor in Teradata responsible for reading and writing data to
the disks) in order to maximize parallel processing. This advantage
is passed along to the FastLoad user in terms of awesome
performance. Teradata is the only data warehouse product in the
world that loads data, processes data and backs up data in
parallel.
FastLoad Has Some Limits

There are more reasons why FastLoad is so fast. Many of them take the form of restrictions, and it is precisely these restrictions that keep it from slowing down. For instance, can you imagine a sprinter wearing cowboy boots in a race? Of course not! Because of its speed, FastLoad, too, must travel light! This means it has limitations that may or may not apply to other load utilities. Remembering this short list will save you much frustration from failed loads and angry colleagues. It may even foster your reputation as a smooth operator!
Rule #1: No Secondary Indexes are allowed on the Target Table. For high performance, FastLoad will only work with Primary Indexes when loading. The reason is that Primary Indexes (UPI and NUPI) are used in Teradata to distribute the rows evenly across the AMPs, and the load builds only data rows. A secondary index is stored in a subtable block, often on a different AMP from the data row. Maintaining it would slow FastLoad down, and they would have to call it, get ready now, HalfFastLoad. Therefore, FastLoad does not support them. If Secondary Indexes already exist, just drop them. You may easily recreate them after completing the load.
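Dropping and later rebuilding a secondary index uses SQL like the following sketch; the table and column names are assumptions for illustration:

```sql
/* Before the FastLoad: remove the secondary index */
DROP INDEX (Dept_No) ON SQL01.Employee_Profile;

/* After the load completes: rebuild it */
CREATE INDEX (Dept_No) ON SQL01.Employee_Profile;
```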
Rule #2: No Referential Integrity is allowed. FastLoad cannot load data into tables that are defined with Referential Integrity (RI), because enforcing referential constraints against another table would require too much system checking. FastLoad does one table only. In short, RI constraints must be dropped from the target table prior to using FastLoad.
Rule #3: No Triggers are allowed at load time. FastLoad is much too focused on speed to pay attention to the needs of other tables, which is what Triggers are all about. Additionally, triggers involve more than one AMP and more than one table, and FastLoad does one table only. Simply ALTER the Triggers to the DISABLED status prior to using FastLoad.
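Disabling and re-enabling a trigger might look like this; the trigger name is an assumption:

```sql
/* Before the FastLoad */
ALTER TRIGGER SQL01.Emp_Raise_Trig DISABLED;

/* After the load completes */
ALTER TRIGGER SQL01.Emp_Raise_Trig ENABLED;
```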
Rule #4: Duplicate Rows (in Multiset Tables) are not supported. Multiset tables are tables that allow duplicate rows, that is, rows in which the values in every column are identical. FastLoad can load data into a multiset table, but whenever it finds duplicate rows, it discards them rather than loading them.
Rule #5: No AMPs may go down (i.e., go offline) while FastLoad
is processing. The down AMP must be repaired before the load
process can be restarted. Other than this, FastLoad can recover
from system glitches and perform restarts. We will discuss Restarts
later in this chapter.
Rule #6: No more than one data type conversion is allowed per column during a FastLoad. Why just one? Data type conversion is a highly resource-intensive job on the system, requiring a "search and replace" effort, and that takes more time. Enough said!
Three Key Requirements for FastLoad to Run

FastLoad can be run from either an MVS/channel-attached (mainframe) or a network-attached (LAN) host. In either case, FastLoad requires three key components: a log table, an empty target table and two error tables. The user must name these at the beginning of each script.
Log Table: FastLoad needs a place to record information on its
progress during a load. It uses the table called Fastlog in the
SYSADMIN database. This table contains one row for every FastLoad
running on the system. In order for your FastLoad to use this
table, you need INSERT, UPDATE and DELETE privileges on that
table.
Empty Target Table: We have already mentioned the absolute need
for the target table to be empty. FastLoad does not care how this
is accomplished. After an initial load of an empty target table,
you are now looking at a populated table that will likely need to
be maintained.
If you require the phenomenal speed of FastLoad, it is usually preferable, both for the sake of speed and for less interaction with the Data Dictionary, just to delete all the rows from that table and then reload it with fresh data. The SQL syntax DELETE FROM <databasename>.<tablename>; should be used for this. But sometimes, as in some of our FastLoad sample scripts below (see Figure 4-1), you want to drop the table and recreate it rather than using the DELETE option. To do this, FastLoad has the ability to run the DDL statements DROP TABLE and CREATE TABLE. The problem with putting DDL in the script is that the script is no longer restartable, and you are required to rerun the FastLoad from the beginning. For that reason, we recommend that you have one script for an initial run and a different script for a restart.
Figure 4-1
Two Error Tables: Each FastLoad requires two error tables. They are populated only if errors occur during the load process. The FastLoad utility requires them and will automatically create them for you; all you must do is name them. The first error table is for any translation errors or constraint violations. For example, a row with a column containing a wrong data type would be reported to the first error table. The second error table is for errors caused by duplicate values for Unique Primary Indexes (UPI). FastLoad will load just one occurrence of every UPI value; the other occurrences are stored in this table. However, if the entire row is a duplicate, FastLoad counts it but does not store the row. These tables may be analyzed later for troubleshooting should errors occur during the load. For specifics on how you can troubleshoot, see the section below titled "What Happens When FastLoad Finishes."
Maximum of 15 Loads

The Teradata RDBMS will only run a maximum of fifteen FastLoads, MultiLoads, or FastExports at the same time. This maximum is determined by a value stored in the DBS Control record. It can be any value from 0 to 15. When Teradata is first installed, this value is set to 5 concurrent jobs.
Because these utilities all move rows in large blocks, there is a saturation point at which Teradata protects the available system resources by not starting additional load jobs. For example, if the maximum number of jobs is currently running on the system and you attempt to run one more, that job will not be started. You should view this limit as a safety control. Here is a tip for remembering how the load limit applies: if the name of the load utility contains either the word "Fast" or the word "Load", then there can be only a total of fifteen of them running at any one time.
FastLoad Has Two Phases

Teradata is famous for its end-to-end use of parallel processing. Both the data and the tasks are divided among the AMPs. Then each AMP tackles its own portion of the task with regard to its portion of the data. This same "divide and conquer" mentality also expedites the load process. FastLoad divides its job into two phases, both designed for speed. They have no fancy names but are typically known simply as Phase 1 and Phase 2, sometimes referred to as the Acquisition Phase and the Application Phase.
PHASE 1: Acquisition

The primary function of Phase 1 is to transfer data from the host computer to the Access Module Processors (AMPs) as quickly as possible. For the sake of speed, the Parsing Engine of Teradata does not take the time to hash each row of data based on the Primary Index. That will be done later. Instead, it does the following:
When the Parsing Engine (PE) receives the INSERT command, it uses one session to parse the SQL just once. The PE is the Teradata software processor responsible for parsing syntax and generating a plan to execute the request. It then opens a Teradata session from the FastLoad client directly to the AMPs. By default, one session is created for each AMP. Therefore, on large systems, it is normally a good idea to limit the number of sessions using the SESSIONS command. This capability is shown below.
Simultaneously, all but one of the client sessions begins
loading raw data in 64K blocks for transfer to an AMP. The first
priority of Phase 1 is to get the data onto the AMPs as fast as
possible. To accomplish this, the rows are packed,
unhashed, into large blocks and sent to the AMPs without any concern for which AMP gets the block. The result is that data rows often arrive on a different AMP from the one where they would live had they been hashed first.
So how do the rows get to the correct AMPs where they will
permanently reside? Following the receipt of every data block, each
AMP hashes its rows based on the Primary Index, and redistributes
them to the proper AMP. At this point, the rows are written to a
worktable on the AMP but remain unsorted until Phase 1 is
complete.
Phase 1 can be compared loosely to the preferred method of
transfer used in the parcel shipping industry today. How do the key
players in this industry handle a parcel? When the shipping company
receives a parcel, that parcel is not immediately sent to its final
destination. Instead, for the sake of speed, it is often sent to a
shipping hub in a seemingly unrelated city. Then, from that hub it
is sent to the destination city. FastLoad's Phase 1 uses the AMPs
in much the same way that the shipper uses its hubs. First, all the
data blocks in the load get rushed randomly to any AMP. This just
gets them to a "hub" somewhere in Teradata country. Second, each
AMP forwards them to their true destination. This is like the
shipping parcel being sent from a hub city to its destination
city!
PHASE 2: Application

Following the scenario described above, the shipping vendor must do more than get a parcel to the destination city. Once the packages arrive at the destination city, they must be sorted by street and zip code, placed onto local trucks, and driven to their final, local destinations.
Similarly, FastLoad's Phase 2 is mission critical for getting
every row of data to its final address (i.e., where it will be
stored on disk). In this phase, each AMP sorts the rows in its
worktable. Then it writes the rows into the table space on disks
where they will permanently reside. Rows of a table are stored on
the disks in data blocks. The AMP uses the block size as defined
when the target table was created. If the table is Fallback
protected, then the Fallback will be loaded after the Primary table
has finished loading. This enables the Primary table to become
accessible as soon as possible. FastLoad is so ingenious, no wonder
it is the darling of the Teradata load utilities!
FastLoad Commands

Here is a table of some key FastLoad commands and their definitions. They are used to provide flexibility in controlling the load process. Consider this your personal ready-reference guide! You will notice that only a few SQL commands may be used with this utility (CREATE TABLE, DROP TABLE, DELETE and INSERT). This keeps FastLoad from becoming encumbered with additional functions that would slow it down.

A FastLoad Example in its Simplest Form

The load utilities often scare people because there are many things that appear complicated. In actuality, the load scripts are very simple. Think of FastLoad as:
Logging onto Teradata
Defining the Teradata table that you want to load (target table)
Defining the INPUT data file
Telling the system to start loading
Telling the system to end loading

This first script example is designed to show FastLoad in its simplest form. The actual script is in the left column and our comments are on the right.
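Because the script figure is an image, here is a sketch of a FastLoad job in its simplest form; the logon credentials, file name, and table and column names are assumptions:

```
SESSIONS 4;                              /* limit the number of sessions  */
LOGON ProdTD/user1,password;             /* tdpid/username,password       */

SET RECORD VARTEXT ",";                  /* comma-delimited input records */

DEFINE Employee_No   (VARCHAR(11))
      ,Last_Name     (VARCHAR(20))
      ,First_Name    (VARCHAR(12))
FILE = EMPS.TXT;                         /* the input flat file           */

BEGIN LOADING SQL01.Employee_Profile
      ERRORFILES SQL01.Emp_Err1, SQL01.Emp_Err2;

INSERT INTO SQL01.Employee_Profile
VALUES (:Employee_No
       ,:Last_Name
       ,:First_Name);

END LOADING;                             /* tells FastLoad to do Phase 2  */
LOGOFF;
```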
Figure 4-2

Sample FastLoad Script

Let's look at an actual FastLoad script that you might see in the real world. In the script below, every comment line is placed inside the normal Teradata comment syntax, /* ... */. FastLoad and SQL commands are written in upper case in order to make them stand out. In reality, Teradata utilities, like Teradata itself, are by default not case sensitive. You will also note that when column names are listed vertically, we recommend placing the comma separator in front of the following column. Coding this way makes reading or debugging the script easier for everyone. The purpose of this script is to update the Employee_Profile table in the SQL01 database. The input file used for the load is named EMPS.TXT. Below the sample script, each step will be described in detail.
Normally it is not a good idea to put DROP and CREATE statements in a FastLoad script. The reason is that when any of the tables that FastLoad is using are dropped, the script cannot be restarted; it can only be rerun from the beginning. Since FastLoad has restart logic built into it, a restart is normally the better solution if the initial load attempt should fail. However, for purposes of this example, the DDL shows the table structure and the description of the data being read.
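The script shown in the figure (an image here) can be sketched roughly as follows; the credentials, column list, and data types are assumptions consistent with the description above:

```
SESSIONS 4;                               /* Step 1: limit sessions        */
LOGON ProdTD/user1,password;              /* Step 2: logon (assumed)       */
SHOW VERSIONS;                            /* report the FastLoad version   */

DROP TABLE SQL01.Employee_Profile;        /* DDL makes this job rerun-only */
DROP TABLE SQL01.Emp_Err1;
DROP TABLE SQL01.Emp_Err2;

CREATE TABLE SQL01.Employee_Profile
   ( Employee_No   INTEGER
    ,Last_Name     CHAR(20)
    ,First_Name    VARCHAR(12)
    ,Salary        DECIMAL(10,2)
    ,Dept_No       SMALLINT )
UNIQUE PRIMARY INDEX (Employee_No);

SET RECORD VARTEXT ",";                   /* Step 3: input record layout   */

DEFINE Employee_No   (VARCHAR(11))        /* Step 4: describe the file     */
      ,Last_Name     (VARCHAR(20))
      ,First_Name    (VARCHAR(12))
      ,Salary        (VARCHAR(12))
      ,Dept_No       (VARCHAR(6))
FILE = EMPS.TXT;

BEGIN LOADING SQL01.Employee_Profile      /* Step 5: target + error tables */
      ERRORFILES SQL01.Emp_Err1, SQL01.Emp_Err2
      CHECKPOINT 100000;                  /* checkpoint every 100,000 rows */

INSERT INTO SQL01.Employee_Profile
VALUES (:Employee_No
       ,:Last_Name
       ,:First_Name
       ,:Salary
       ,:Dept_No);

END LOADING;                              /* Step 6: proceed to Phase 2    */
LOGOFF;                                   /* Step 7: release the sessions  */
```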
Figure 4-4

Step One: Before logging onto Teradata, it is important to specify how many sessions you need. The syntax is [SESSIONS {n}].

Step Two: Next, you LOGON to the Teradata system. You will quickly see that the utility commands in FastLoad are similar to those in BTEQ, since FastLoad commands were designed from the underlying commands in BTEQ. However, unlike BTEQ, most of the FastLoad commands do not allow a dot ["."] in front of them and therefore need a semicolon. At this point we chose to have Teradata tell us which version of FastLoad is being used for the load. Why would we recommend this? Because as FastLoad's capabilities get enhanced with newer versions, the syntax of the scripts may have to be revisited.

Step Three: If the input file is not in FastLoad format, then before you describe the INPUT FILE structure in the DEFINE statement, you must first set the RECORD layout type for the file being passed to FastLoad. We have used VARTEXT in our example with a comma delimiter. The options are FASTLOAD, TEXT, UNFORMATTED or VARTEXT. You need to know this about your input file ahead of time.
Step Four: Next, comes the DEFINE statement. FastLoad must know
the structure and the name of the flat file to be used as the input
FILE, or source file for the load.
Step Five: FastLoad makes no assumptions from the DROP TABLE
statements with regard to what you want loaded. In the BEGIN
LOADING statement, the script must name the target table and the
two error tables for the load. Did you notice that there is no
CREATE TABLE statement for the error tables in this script?
FastLoad will automatically create them for you once you name them
in the script. In this instance, they are named "Emp_Err1" and
"Emp_Err2". Phase 1 uses "Emp_Err1" because it comes first and
Phase 2 uses "Emp_Err2". The names are arbitrary, of course. You
may call them whatever you like. They must, however, be unique within a database, so using a combination of your userid and the target table name helps ensure this uniqueness across multiple FastLoad jobs occurring in the same database.
In the BEGIN LOADING statement we have also included the
optional CHECKPOINT parameter. We included [CHECKPOINT 100000].
Although not required, this optional parameter performs a vital
task with regard to the load. In the old days, children were always
told to focus on the three 'R's' in grade school ("reading, riting,
and rithmatic"). There are two very different, yet equally
important, R's to consider whenever you run FastLoad. They are
RERUN and RESTART. RERUN means that the job is capable of running
all the processing again from the beginning of the load. RESTART
means that the job is capable of running the processing again from
the point
where it left off when the job was interrupted, causing it to
fail. When CHECKPOINT is requested, it allows FastLoad to resume
loading from the first row following the last successful
CHECKPOINT. We will learn more about CHECKPOINT in the section on
Restarting FastLoad.
Step Six: FastLoad focuses on its task of loading data blocks to the AMPs the way little Yorkshire terriers do when playing with a ball! It will not stop unless you tell it to stop. Therefore, it will not proceed to Phase 2 without the END LOADING command.
In reality, this provides a very valuable capability for FastLoad. The table must be empty only at the start of the job, so rows can continue arriving, perhaps from files sent in from different time zones. To accomplish this processing, simply omit the END LOADING from the load job. Then you can run the same FastLoad multiple times, continuing to load the worktables until the last file is received. Finally, run the last FastLoad job with an END LOADING, and you have partitioned your load into smaller segments instead of one huge job. This makes FastLoad even faster!
Of course to make this work, FastLoad must be restartable.
Therefore, you cannot use the DROP or CREATE commands within the
script. Additionally, every script is exactly the same with the
exception of the last one, which contains the END LOADING causing
FastLoad to proceed to Phase 2. That's a pretty clever way to do a
partitioned type of data load.
Step Seven: All that goes up must come down. And all the
sessions must LOGOFF. This will be the last utility command in your
script. At this point the table lock is released and if there are
no rows in the error tables, they are dropped automatically.
However, if a single row is in one of them, you are responsible to
check it, take the appropriate action and drop the table
manually.
Converting Data Types with FastLoad

Converting data is easy. Just define the input data types in the input file. Then FastLoad will compare them to the column definitions in the Data Dictionary and convert the data for you! But the cardinal rule is that only one data type conversion is allowed per column. In the example below, notice how the columns in the input file are converted from one data type to another simply by redefining the data type in the CREATE TABLE statement.
FastLoad allows six kinds of data conversions. Here is a chart that displays them:
Figure 4-5
When we said that converting data is easy, we meant that it is easy for the user. It is actually quite resource intensive, thus increasing the amount of time needed for the load. Therefore, if speed is important, keep the number of columns being converted to a minimum!

A FastLoad Conversion Example

This next script example is designed to show how FastLoad converts data automatically when the INPUT data type differs from the target Teradata table's data type. The actual script is in the left column and our comments are on the right.
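A minimal sketch of such a conversion follows; the table, file, and column names are assumptions. The input file supplies character data, while the (assumed) target table defines Employee_No as INTEGER and Hire_Date as DATE, so each column undergoes exactly one conversion:

```
SET RECORD VARTEXT ",";                  /* all VARTEXT input is character */

DEFINE Employee_No   (VARCHAR(11))       /* will convert to INTEGER        */
      ,Hire_Date     (VARCHAR(10))       /* will convert to DATE           */
FILE = EMPS.TXT;

BEGIN LOADING SQL01.Emp_Hist
      ERRORFILES SQL01.Hist_Err1, SQL01.Hist_Err2;

INSERT INTO SQL01.Emp_Hist               /* one conversion per column      */
VALUES (:Employee_No
       ,:Hire_Date);

END LOADING;
LOGOFF;
```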
Figure 4-5

When You Cannot RESTART FastLoad

There are two types of FastLoad scripts: those that you can restart and those that you cannot without modifying the script. If any of the following conditions are true of the FastLoad script that you are dealing with, it is NOT restartable:
The Error Tables are DROPPED
The Target Table is DROPPED
The Target Table is CREATED

Can you tell from the following sample FastLoad script why it is not restartable?
Figure 4-7
Why might you have to RESTART a FastLoad job, anyway? Perhaps you might experience a system reset or some glitch that stops the job halfway through. Maybe the mainframe went down. For small data loads it is not really a big deal, because FastLoad is so lightning-fast that you could probably just RERUN the job.
However, when you are loading a billion rows, a rerun is not a good idea because it wastes time. So the most common way to deal with these situations is simply to RESTART the job. But what if the normal load takes 4 hours, and the glitch occurs when you already have two thirds of the data rows loaded? In that case, you want to make sure that the job is totally restartable. Let's see how this is done.

When You Can RESTART FastLoad

If all of the following conditions are true, then FastLoad is ALWAYS restartable:
The Error Tables are NOT DROPPED in the script
The Target Table is NOT DROPPED in the script
The Target Table is NOT CREATED in the script
You have defined a checkpoint
So, if you need to drop or create tables, do it in a separate
job using BTEQ. Imagine that you have a table whose data changes so
much that you typically drop it monthly and build it again. Let's
go back to the script we just reviewed above and see how we can
break it into the two parts necessary to make it fully RESTARTABLE.
It is broken up below.
STEP ONE: Run the following SQL statements in Queryman or BTEQ
before you start FastLoad:
Figure 4-8
First, you ensure that the target table and error tables, if
they existed previously, are blown away. If there had been no
errors in the error tables, they would be automatically dropped. If
these tables did not exist, you have not lost anything. Next, if
needed, you create the empty table structure needed to receive a
FastLoad.
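The Figure 4-8 statements (an image here) would look something like the following sketch; the names match our earlier example and are assumptions:

```sql
DROP TABLE SQL01.Employee_Profile;       /* remove the old target, if any  */
DROP TABLE SQL01.Emp_Err1;               /* remove leftover error tables   */
DROP TABLE SQL01.Emp_Err2;

CREATE TABLE SQL01.Employee_Profile      /* fresh, empty target table      */
   ( Employee_No   INTEGER
    ,Last_Name     CHAR(20)
    ,First_Name    VARCHAR(12) )
UNIQUE PRIMARY INDEX (Employee_No);
```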
STEP TWO: Run the FastLoad script

This is the portion of the earlier script that carries out these vital steps:
Defines the structure of the flat file
Tells FastLoad where to load the data and store the errors
Specifies the checkpoint so a RESTART will not go back to row one
Loads the data

If these conditions are met, all you need do is resubmit the FastLoad job, and it starts loading data again with the next record after the last checkpoint. With that said, if you did not request a checkpoint, the output message will normally indicate how many records were loaded. You may optionally use the RECORD command to manually restart on the next record after the one indicated in the message.
Now, if the FastLoad job aborts in Phase 2, you can simply submit a script with only the BEGIN LOADING and END LOADING statements. It will then restart right into Phase 2.

What Happens When FastLoad Finishes

You Receive an Outcome Status

The most important thing to do is verify that FastLoad completed successfully. This is accomplished by looking at the last output in the report and making sure that it is a return code or status code of zero (0). Any other value indicates that something wasn't perfect and needs to be fixed.
Without a successful completion, the locks will not be removed and the error tables will not be dropped. This is because FastLoad assumes it will need them for a restart. Likewise, the lock on the target table will not be released. Realistically, once FastLoad is started you have two choices: get it to run to a successful completion, or rerun it from the beginning. As you can imagine, the best course of action is normally to get it to finish successfully via a restart.
You Receive a Status Report

What happens when FastLoad finishes running? Well, you can expect to see a summary report on the success of the load. Following is an example of such a report.
Figure 4-9
The first line displays the total number of records read from
the input file. Were all of them loaded? Not really. The second
line tells us that there were fifty rows with constraint
violations, so they were not loaded. Corresponding to this, fifty
entries were made in the first error table. Line 3 shows that there
were zero entries into the second error table, indicating that
there were no duplicate Unique Primary Index violations. Line 4
shows that there were 999950 rows successfully loaded into the
empty target table. Finally, there were no duplicate rows. Had
there been any duplicate rows, the duplicates would only have been
counted. They are not stored in the error tables anywhere. When
FastLoad reports on its efforts, the number of rows in lines 2
through 5 should always total the number of records read in line
1.
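Based on the counts described, the Figure 4-9 summary (an image here) would read roughly as follows; the total of 1,000,000 records read is inferred from the surrounding text (999,950 loaded plus 50 error-table entries):

```
Total Records Read        =  1000000
Total Error Table 1       =       50   /* constraint violations */
Total Error Table 2       =        0   /* UPI duplicates        */
Total Inserts Applied     =   999950
Total Duplicate Rows      =        0
```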
Note on duplicate rows: Whenever FastLoad experiences a restart, there will normally be duplicate rows counted. This is because a failure seldom occurs exactly on a checkpoint (a quiet or quiescent point when nothing is happening within FastLoad). Therefore, some rows will be sent to the AMPs again, because the restart begins with the next record after the value stored in the checkpoint. Hence, when a restart occurs, the first row after the checkpoint and some of the consecutive rows are sent a second time. These will be caught as duplicate rows after the sort. This restart logic is the reason that FastLoad will not load duplicate rows into a MULTISET table: it assumes they are duplicates generated by a restart.
You can Troubleshoot

In the example above, we know that the load was not entirely successful. But that is not enough. Now we need to troubleshoot in order to identify the errors and correct them. FastLoad generates two error tables that will enable us to find the culprits. The first error table, which we named Errorfile1, contains just three columns: ErrorCode contains the Teradata FastLoad code number for the corresponding translation or constraint error; ErrorFieldName specifies which column in the table contained the error; and DataParcel contains the row with the problem. The second error table tracks a different type of error and has a different layout, described below.
As a user, you can select from either error table. To check errors in Errorfile1 you would use this syntax:

SELECT DISTINCT ErrorCode, ErrorFieldName FROM Errorfile1;

Corrected rows may be inserted into the target table using another utility that does not require an empty table.
To check errors in Errorfile2 you would use the following syntax:

SELECT * FROM Errorfile2;

The definition of the second error table is exactly the same as the target table, with all the same columns and data types.

Restarting FastLoad: A More In-Depth Look
How the CHECKPOINT Option Works

The CHECKPOINT option defines points in a load job at which the FastLoad utility pauses to record that Teradata has processed a specified number of rows. When the parameter CHECKPOINT [n] is included in the BEGIN LOADING clause, the system will pause loading momentarily at increments of [n] rows.
At each CHECKPOINT, the AMPs will all pause and make sure that
everything is loading smoothly. Then FastLoad sends a checkpoint
report (entry) to the SYSADMIN.Fastlog table. This log contains a
list of all currently running FastLoad jobs and the last
successfully reached checkpoint for each job. Should an error occur
that requires the load to restart, FastLoad will merely go back to
the last successfully reported checkpoint prior to the error. It
will then restart from the record immediately following that
checkpoint and start building the next block of data to load. If
such an error occurs in Phase 1, with CHECKPOINT 0, FastLoad will
always restart from the very first row.
Restarting with CHECKPOINT
Sometimes you may need to restart FastLoad. If the FastLoad script
requests a CHECKPOINT (other than 0), then it is restartable from
the last successful checkpoint.
Therefore, if the job fails, simply resubmit the job. Here are the
two options: Suppose Phase 1 halts prematurely; the Data
Acquisition phase is incomplete. Resubmit the FastLoad script.
FastLoad will begin from RECORD 1 or the first record past the last
checkpoint. If you wish to manually specify where FastLoad should
restart, locate the last successful checkpoint record by referring
to the SYSADMIN.FASTLOG table. To specify where a restart will
start from, use the RECORD command. Normally, it is not necessary
to use the RECORD command-let FastLoad automatically determine
where to restart from.
If the interruption occurs in Phase 2, the Data Acquisition
phase has already completed. We know that the error is in the
Application Phase. In this case, resubmit the FastLoad script with
only the BEGIN and END LOADING Statements. This will restart in
Phase 2 with the sort and building of the target table.
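A Phase 2 restart script, then, might contain little more than the logon, the loading bookends, and the logoff (the logon string and table names are illustrative):

```sql
LOGON tdpid/loaduser,password;
BEGIN LOADING SQL01.Employee_Table
   ERRORFILES SQL01.Errorfile1, SQL01.Errorfile2;
END LOADING;
LOGOFF;
```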
Restarting without CHECKPOINT (i.e., CHECKPOINT 0)
When a failure occurs and the FastLoad script did not utilize
CHECKPOINT (i.e., CHECKPOINT 0), one procedure is to DROP the target
table and error tables and rerun the job. Here are some other
options available to you:
1. Resubmit the job again and hope there is enough PERM space for
all the rows already sent to the unsorted target table plus all the
rows that are going to be sent again to the same target table.
Aside from consuming space, these rows will be rejected as duplicates.
As you can imagine, this is not the most efficient approach, since it
processes many of the same rows twice.
2. If CHECKPOINT wasn't specified, then CHECKPOINT defaults to
100,000. You can perform a manual restart using the RECORD
statement. If the output print file shows that checkpoint 100000
occurred, use something like the following command: RECORD
100001;. This statement will skip records 1 through 100000 and
resume on record 100001.
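As a fragment of the resubmitted script, the RECORD command is placed ahead of the INSERT so the already-loaded records are skipped (the table and column names are illustrative):

```sql
/* resubmitted FastLoad fragment: resume past the last good checkpoint */
RECORD 100001;   /* skip records 1 through 100000 */
INSERT INTO SQL01.Employee_Table VALUES (:Employee_No, :Dept_No);
```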
Using INMODs with FastLoad
When you find that FastLoad does not read the file type you have,
or you wish to control the access for any reason, then it might be
desirable to use an INMOD. An INMOD (Input Module) is fully
compatible with FastLoad in either mainframe or LAN environments,
provided that the appropriate programming languages are used.
However, INMODs replace the normal mainframe DDNAME or LAN-defined
FILE name with the following statement: DEFINE INMOD=. For a more
in-depth discussion of INMODs, see the chapter of this book titled
"INMOD Processing".
Chapter 5: MultiLoad
An Introduction to MultiLoad
Why it is called "Multi" Load
If we were going to be stranded on
an island with a Teradata Data Warehouse and we could only take
along one Teradata load utility, clearly, MultiLoad would be our
choice. MultiLoad has the capability to load multiple tables at one
time from either a LAN or Channel environment. This is in stark
contrast to its fleet-footed cousin, FastLoad, which can only load
one table at a time. And it gets better, yet!
This feature-rich utility can perform multiple types of DML
tasks, including INSERT, UPDATE, DELETE and UPSERT on up to five
(5) empty or populated target tables at a time. These DML functions
may be run either solo or in combinations, against one or more
tables. For these reasons, MultiLoad is the utility of choice when
it comes to loading populated tables in the batch environment. As
the volume of data being loaded or updated in a single block
increases, the performance of MultiLoad improves. MultiLoad shines when it can
impact more than one row in every data block. In other words,
MultiLoad looks at massive amounts of data and says, "Bring it
on!"
Leo Tolstoy once said, "All happy families resemble each other."
Like happy families, the Teradata load utilities resemble each
other, although they may have some differences. You are going to be
pleased to find that you do not have to learn all new commands and
concepts for each load utility. MultiLoad has many similarities to
FastLoad. It has even more commands in common with TPump. The
similarities will be evident as you work with them. Where there are
some quirky differences, we will point them out for you.
Two MultiLoad Modes: IMPORT and DELETE
MultiLoad provides two
types of operations via modes: IMPORT and DELETE. In MultiLoad
IMPORT mode, you have the freedom to "mix and match" up to twenty
(20) INSERTs, UPDATEs or DELETEs on up to five target tables. The
execution of the DML statements is not mandatory for all rows in a
table. Instead, their execution hinges upon the conditions
contained in the APPLY clause of the script. Once again, MultiLoad
demonstrates its user-friendly flexibility. For UPDATEs or DELETEs
to be successful in IMPORT mode, they must reference the Primary
Index in the WHERE clause.
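A minimal IMPORT-mode sketch, with hypothetical table, layout, and field names, shows how an APPLY clause ties a DML label to the input data, and how the UPDATE references the Primary Index (Employee_No here) in its WHERE clause:

```sql
.LOGTABLE SQL01.CDW_Log;
.LOGON tdpid/loaduser,password;
.BEGIN IMPORT MLOAD TABLES SQL01.Employee_Table;
.LAYOUT FILEIN;
.FIELD Employee_No  * CHAR(11);
.FIELD Dept_No      * CHAR(6);
.DML LABEL UPDATEEMP;
UPDATE SQL01.Employee_Table
SET   Dept_No = :Dept_No
WHERE Employee_No = :Employee_No;
.IMPORT INFILE empdata.txt
   LAYOUT FILEIN
   APPLY UPDATEEMP;
.END MLOAD;
.LOGOFF;
```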
The MultiLoad DELETE mode is used to perform a global (all AMP)
delete on just one table. The reason to use .BEGIN DELETE MLOAD is
that it bypasses the Transient Journal (TJ) and can be RESTARTed if
an error causes it to terminate prior to finishing. When performing
in DELETE mode, the DELETE SQL statement cannot reference the
Primary Index in the WHERE clause. This is due to the fact that a
primary index access goes to a specific AMP, while this is a global
operation.
The other factor that makes a DELETE mode operation so good is
that it examines an entire block of rows at a time. Once all the
eligible rows have been removed, the block is written one time and
a checkpoint is written. So, if a restart is necessary, it simply
starts deleting rows from the next block after the checkpoint. This
is a smart way to continue. Remember, when using the TJ, all deleted
rows are put back into the table from the TJ as a rollback. A
rollback can take longer to finish than the delete. MultiLoad does
not do a rollback; it does a restart.
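A DELETE-mode sketch (the table, log, and column names are hypothetical) illustrates the point: the WHERE clause qualifies on a non-Primary-Index column such as a date:

```sql
.LOGTABLE SQL01.CDW_Log;
.LOGON tdpid/loaduser,password;
.BEGIN DELETE MLOAD TABLES SQL01.Sales_Table;
DELETE FROM SQL01.Sales_Table
WHERE Sale_Date < '2002-02-01';   /* not the Primary Index */
.END MLOAD;
.LOGOFF;
```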
[Figure: rotating monthly data through a quarterly table]
In the above diagram, monthly data is being stored in a
quarterly table. To keep the contents limited to four months,
monthly data is rotated in and out. At the end of every month, the
oldest month of data is removed and the new month is added. The
cycle is "add a month, delete a month, add a month, delete a
month." In our illustration, that means that January data must be
deleted to make room for May's data.
Here is a question for you: What if there was another way to
accomplish this same goal without consuming all of these extra
resources? To illustrate, let's consider the following scenario:
Suppose you have TableA that contains 12 billion rows. You want to
delete a range of rows based on a date and then load in fresh data
to replace these rows. Normally, the process is to perform a
MultiLoad DELETE such as DELETE FROM TableA WHERE <date_column> <
'2002-02-01'. The final step would be to INSERT the new rows for
May using MultiLoad IMPORT.
Block and Tackle Approach
MultiLoad never loses sight of the fact
that it is designed for functionality, speed, and the ability to
restart. It tackles the proverbial I/O bottleneck problem like
FastLoad by assembling data rows into 64K blocks and writing them
to disk on the AMPs. This is much faster than writing data one row
at a time like BTEQ. Fallback table rows are written after the base
table has been loaded. This allows users to access the base table
immediately upon completion of the MultiLoad while fallback rows
are being loaded in the background. The benefit is reduced time to
access the data.
Amazingly, MultiLoad has full RESTART capability in all of its
five phases of operation. Once again, this demonstrates its
tremendous flexibility as a load utility. Is it pure magic? No, but
it almost seems so. MultiLoad makes effective use of two error
tables to save different types of errors and a LOGTABLE that stores
built-in checkpoint information for restarting. This is why
MultiLoad does not use the Transient Journal, thus averting
time-consuming rollbacks when a job halts prematurely.
Here is a key difference to note between MultiLoad and FastLoad.
Sometimes an AMP (Access Module Processor) fails and the system
administrators say that the AMP is "down" or "offline." When using
FastLoad, you must restart the AMP to restart the job. MultiLoad,
however, can RESTART when an AMP fails, if the table is fallback
protected. At the same time, you can use the AMPCHECK option to
make it work like FastLoad if you want.
MultiLoad Imposes Limits
Rule #1: Unique Secondary Indexes are
not supported on a Target Table. Like FastLoad, MultiLoad does not
support Unique Secondary Indexes (USIs). But unlike FastLoad, it
does support the use of Non-Unique Secondary Indexes (NUSIs)
because the index subtable row is on the same AMP as the data row.
MultiLoad uses every AMP independently and in parallel. If two AMPs
must communicate, they are not independent. Therefore, a NUSI (same
AMP) is fine, but a USI (different AMP) is not.
Rule #2: Referential Integrity is not supported. MultiLoad will
not load data into tables that are defined with Referential
Integrity (RI). Like a USI, this requires the AMPs to communicate
with each other. So, RI constraints must be dropped from the target
table prior to using MultiLoad.
Rule #3: Triggers are not supported at load time. Triggers cause
actions on related tables based upon what happens in a target
table. Again, this is a multi-AMP operation and to a different
table. To keep MultiLoad running smoothly, disable all Triggers
prior to using it.
Rule #4: No concatenation of input files is allowed. MultiLoad
does not want you to do this because it could impact a restart if
the files were concatenated in a different sequence or data was
deleted between runs.
Rule #5: The host will not process aggregates, arithmetic
functions or exponentiation. If you need data conversions or math,
you might be better off using an INMOD to prepare the data prior to
loading it.
Error Tables, Work Tables and Log Tables
Besides the target table(s),
MultiLoad requires the use of four special tables in order to
function. They consist of two error tables (per target table), one
worktable (per target table), and one log table. In essence, the
Error Tables will be used to store any conversion, constraint or
uniqueness violations during a load. Work Tables are used to
receive and sort data and SQL on each AMP prior to storing them
permanently to disk. A Log Table (also called, "Logtable") is used
to store successful checkpoints during load processing in case a
RESTART is needed.
HINT: Sometimes a company wants all of these load support tables
to be housed in a particular database. When these tables are to be
stored in any database other than the user's own default database,
then you must give them a fully qualified name
(databasename.tablename) in the script or use the DATABASE command
to change the current database.
Where will you find these tables in the load script? The
Logtable is generally identified immediately prior to the .LOGON
command. Worktables and error tables can be named in the BEGIN
MLOAD statement. Do not underestimate the value of these tables.
They are vital to the operation of MultiLoad. Without them a
MultiLoad job cannot run. Now that you have had the "executive
summary," let's look at each type of table individually.
Two Error Tables: Here is another place where FastLoad and
MultiLoad are similar. Both require the use of two error tables per
target table. MultiLoad will automatically create these tables.
Rows are inserted into these tables only when errors occur during
the load process. The first error table is the acquisition Error
Table (ET). It contains all translation and constraint errors that
may occur while the data is being acquired from the source(s).
The second is the Uniqueness Violation (UV) table that stores
rows with duplicate values for Unique Primary Indexes (UPI). Since
a UPI must be unique, MultiLoad can only load one occurrence into a
table. Any duplicate value will be stored in the UV error table.
For example, you might see a UPI error that shows a second employee
number "99." In this case, if the name for employee "99" is Kara
Morgan, you will be glad that the row did not load since Kara
Morgan is already in the Employee table. However, if the name
showed up as David Jackson, then you know that further
investigation is needed, because employee numbers must be
unique.
Each error table does the following:
Identifies errors
Provides some detail about the errors
Stores the actual offending row for debugging
You have the option to name these tables in the MultiLoad script
(shown later). Alternatively, if you do not name them, they default
to ET_ and UV_ followed by the target table name. In either case,
MultiLoad will not accept error table names that are the same as
target table names. Whatever you name them, it is recommended that
you standardize on a naming convention to make it easier for
everyone on your team. For more details on how these error tables
can help you, see the subsection in this chapter titled
"Troubleshooting MultiLoad Errors."
Log Table: MultiLoad requires a LOGTABLE. This table keeps a
record of the results from each phase of the load so that MultiLoad
knows the proper point from which to RESTART. There is one LOGTABLE
for each run. Since MultiLoad will not resubmit a command that has
been run previously, it will use the LOGTABLE to determine the last
successfully completed step.
Work Table(s): MultiLoad will automatically create one worktable
for each target table. This means that in IMPORT mode you could
have one or more worktables. In the DELETE mode, you will only have
one worktable since that mode only works on one target table. The
purpose of worktables is to hold two things:
1. The Data Manipulation Language (DML) tasks
2. The input data that is ready to APPLY to the AMPs
The worktables are created in a database using PERM space. They
can become very large. If the script uses multiple SQL statements
for a single data record, the data is sent to the AMP once for each
SQL statement. This replication guarantees fast performance and
that no SQL statement will ever be done more than once. So, this is
very important. However, there is no such thing as a free lunch;
the cost is space. Later, you will see that using a FILLER field
can help reduce this disk space by not sending unneeded data to an
AMP. In other words, the efficiency of the MultiLoad run is in your
hands.
Supported Input Formats
Data input files come in a variety of formats, but MultiLoad is
flexible enough to handle many of them. MultiLoad supports the
following five format options: BINARY, FASTLOAD, TEXT, UNFORMAT
and VARTEXT.
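For example, a comma-delimited file would be read with the VARTEXT option; note that VARTEXT requires the layout fields to be defined as VARCHAR (the file, layout, and label names are illustrative):

```sql
.LAYOUT FILEIN;
.FIELD Employee_No * VARCHAR(11);
.FIELD Last_Name   * VARCHAR(20);
.IMPORT INFILE empdata.txt
   FORMAT VARTEXT ','
   LAYOUT FILEIN
   APPLY INSERTEMP;
```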
Figure 5-1
MultiLoad Has Five IMPORT Phases
MultiLoad IMPORT has five phases, but don't be fazed by this! Here
is the short list:
Phase 1: Preliminary Phase
Phase 2: DML Transaction Phase
Phase 3: Acquisition Phase
Phase 4: Application Phase
Phase 5: Cleanup Phase
Let's take a look at each phase and see what it contributes to
the overall load process of this magnificent utility. Should you
memorize every detail about each phase? Probably not. But it is
important to know the essence of each phase because sometimes a
load fails. When it does, you need to know in which phase it broke
down since the method for fixing the error to RESTART may vary
depending on the phase. And if you can picture what MultiLoad
actually does in each phase, you will likely write better scripts
that run more efficiently.
Phase 1: Preliminary Phase
The ancient oriental proverb says,
"Measure one thousand times; Cut once." MultiLoad uses Phase 1 to
conduct several preliminary set-up activities whose goal is to
provide a smooth and successful climate for running your load. The
first task is to be sure that the SQL syntax and MultiLoad commands
are valid. After all, why try to run a script when the system will
just find out during the load process that the statements are not
useable? MultiLoad knows that it is much better to identify any
syntax errors, right up front. All the preliminary steps are
automated. No user intervention is required in this phase.
Second, all MultiLoad sessions with Teradata need to be
established. The default is the number of available AMPs. Teradata
will quickly establish this number, using a factor of 16, as the
basis for the number of sessions to create. The general rule of
thumb for the number of sessions to use for smaller systems is the
following: use the number of AMPs plus two more. For larger systems
with hundreds of AMP processors, the SESSIONS option is available
to lower the default. Remember, these sessions are running on your
poor little computer as well as on Teradata.
Each session loads the data to Teradata across the network or
channel. Every AMP plays an essential role in the MultiLoad
process. They receive the data blocks, hash each row and send the
rows to the correct AMP. When the rows come to an AMP, it stores
them in worktable blocks on disk. But, lest we get ahead of
ourselves, suffice it to say that there is ample reason for
multiple sessions to be established.
What about the extra two sessions? Well, the first one is a
control session to handle the SQL and logging. The second is a
backup or alternate for logging. You may have to use some trial and
error to find what works best on your system configuration. If you
specify too few sessions it may impair performance and increase the
time it takes to complete load jobs. On the other hand, too many
sessions will reduce the resources available for other important
database activities.
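On a large system, the SESSIONS option in the .BEGIN MLOAD statement is the place to lower the default (the session count shown is purely illustrative):

```sql
.BEGIN IMPORT MLOAD TABLES SQL01.Employee_Table
   SESSIONS 8;
```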
Third, the required support tables are created. They are the
following:
Figure 5-2
The final task of the Preliminary Phase is to apply utility
locks to the target tables. Initially, access locks are placed on
all target tables, allowing other users to read or write to the
table for the time being. However, this lock does prevent a user
from obtaining an exclusive lock. Although these locks will still
allow the MultiLoad user to drop the table, no one else may DROP or
ALTER a target table while it is locked for loading. This leads us
to Phase 2.
Phase 2: DML Transaction Phase
In Phase 2, all of the SQL Data
Manipulation Language (DML) statements are sent ahead to Teradata.
MultiLoad allows the use of multiple DML functions. Teradata's
Parsing Engine (PE) parses the DML and generates a step-by-step
plan to execute the request. This execution plan is then
communicated to each AMP and stored in the appropriate worktable
for each target table. In other words, each AMP is going to work
off the same page.
Later, during the Acquisition phase the actual input data will
also be stored in the worktable so that it may be applied in Phase
4, the Application Phase. Next, a match tag is assigned to each DML
request that will match it with the appropriate rows of input data.
The match tags will not actually be used until the data has already
been acquired and is about to be applied to the worktable. This is
somewhat like a student who receives a letter from the university
in the summer that lists his courses, professor's names, and
classroom locations for the upcoming semester. The letter is a
"match tag" for the student to his school schedule, although it
will not be used for several months. This matching tag for SQL and
data is the reason that the data is replicated for each SQL
statement using the same data record.
Phase 3: Acquisition Phase
With the proper set-up complete and
the PE's plan stored on each AMP, MultiLoad is now ready to receive
the INPUT data. This is where it gets interesting! MultiLoad now
acquires the data in large, unsorted 64K blocks from the host and
sends it to the AMPs.
At this point, Teradata does not care about which AMP receives
the data block. The blocks are simply sent, one after the other, to
the next AMP in line. For their part, each AMP begins to deal with
the blocks that they have been dealt. It is like a game of
cards-you take the cards that you have received and then play the
game. You want to keep some and give some away.
Similarly, the AMPs will keep some data rows from the blocks and
give some away. The AMP hashes each row on the primary index and
sends it over the BYNET to the proper AMP where it will ultimately
be used. But the row does not get inserted into its target table,
just yet. The receiving AMP must first do some preparation before
that happens. Don't you have to get ready before company arrives at
your house? The AMP puts all of the hashed rows it has received
from other AMPs into the worktables where it assembles them into
the SQL. Why? Because once the rows are reblocked, they can be
sorted into the proper order for storage in the target table. Now
the utility places a load lock on each target table in preparation
for the Application Phase. Of course, there is no Acquisition Phase
when you perform a MultiLoad DELETE task, since no data is being
acquired.
Phase 4: Application Phase
The purpose of this phase is to write,
or APPLY, the specified changes to both the target tables and NUSI
subtables. Once the data is on the AMPs, it is married up to the
SQL for execution. To accomplish this substitution of data into
SQL, the host has already attached some sequence information and
five (5) match tags to each data row when sending the data. Those
match tags are used to join the data with the proper SQL statement
based on its DML label. In addition to associating each row with
the correct DML statement, match tags also guarantee that no row
will be updated more than once, even when a RESTART occurs.
The following five columns are the matching tags: