27th June 2013
Source: http://www.teradatamagazine.com/v09n02/tech2tech/applied-solutions-3-whats-a-dba-to-do/
A DATA WAREHOUSE FROM TERADATA ELIMINATES MANY MANUAL TASKS
TYPICAL OF OTHER SYSTEMS.
BY ROGER MANN
What do experienced DBAs with an Oracle, IBM or Microsoft background need to know about managing a Teradata system? Basically, much less than they need to know about the others.

The Teradata Database is a shared-nothing massively parallel processing (MPP) relational database management system (RDBMS), making it the only commercially available RDBMS designed from the ground up for data warehousing.

Parallel processing and the automation of many typical DBA functions were created in the DNA of the Teradata Database. Because of that architecture, many functions that are manual or enhanced by wizards in other vendors' systems are managed automatically. Consequently, the roles and responsibilities of the DBA are significantly different. Fewer tasks are required, making the system much easier to manage. Understanding those differences and how to exploit them with the RDBMS is key to driving the success of the organization.
The next sections break down how a data warehouse from Teradata differs from other systems, enabling the DBA to focus on more productive work and less on manually maintaining the data warehouse.
Data warehouse performance is achieved in a parallel database architecture by the divide-and-conquer method. First, the data is divided into small, equal units. Then independent software modules process those units simultaneously (i.e., in parallel) to conquer the problems (i.e., answer the queries).

The units of work and the hardware resources allocated to each parallel software module must be as equal as possible. Like a chain that is only as strong as its weakest link, the overall job cannot conclude until every unit has completed its processing. This balanced processing of workloads in the Teradata Database enables superior query performance.
The DBA's objective, therefore, is to install a balanced processing platform, operating system (OS), database management software and disk subsystems. In a typical environment, these steps require careful planning with time-consuming analysis of user data and targeted queries. Anxious for a quick return on investment (ROI), however, management too often applies pressure to take shortcuts, which can have disastrous consequences.
With a Teradata purpose-built platform, the OS, database and disk subsystems are installed before system delivery. All that is needed is the size of the raw user data, the number of concurrent users and some targeted queries. This information is inputted into a system-sizing calculator, and a platform configuration is produced.
What's a DBA to Do?
SYSTEM INSTALLATION
Classic
Dynamic Views template. Powered by Blogger.
TERADATA
http://tdpank.blogspot.in/search?updated-min=2013-01-01T00:00:00-08...
1 of 45 8/16/2014 12:40 AM
[Table image: http://www.teradatamagazine.com/tdmo_assets/tdmo_images/table.png]
[Figure image: http://www.teradatamagazine.com/tdmo_assets/tdmo_images/chart.png]
Teradata DBAs are free from the burden of having to understand and be responsible for setting the various options and initialization parameters with both the database and OS.
Those runtime database control parameter settings are critical for performance tuning of transaction processing applications, where queries are predetermined and must be tuned. However, a data warehouse is built on the concept that users are able to ask any question of any data at any time. There is no opportunity to know or tune the queries beforehand. This is best left to the Teradata Database to optimize and tune dynamically.

In short, once the Teradata platform is installed, the DBA can immediately define the databases, users and tables, and load data. Users can then leverage the data warehouse as it is intended: to run queries that answer their business intelligence (BI) questions.
Because the Teradata storage subsystem is installed and balanced before delivery, management of the disk subsystem is greatly simplified. The DBA familiar with managing items such as disk groups, logical volumes, node groups, file systems, files and tablespaces will find that those entities and concepts are nonexistent. (See table.)
All disk organization is entirely logical, as opposed to physical. (See figure.) Initially, all space in the system is allocated to a predefined system database called DBC. Using the CREATE DDL command, the DBA will define DATABASE and USER entities. The space parameter on the CREATE statement is not a physical allocation but is simply a size quota. If a database is allocated 5TB of space, that is the maximum amount of space the database is allowed to use. Anytime that database attempts to use more than 5TB, an out-of-space message will result. However, the system is not out of space; it just exceeded its space quota.
A database management system (DBMS) is designed for storing and retrieving data. Typical file-system architectures fragment, and performance degrades over time as inserts, updates and deletes are applied to the data.

Teradata broke all the rules with its file system design. Data is not stored in B-Tree indexes based on data values; rather, the file system is built on raw disk slices. There are no pages, buffer pools, tablespaces, extents, etc. to manage.
STORAGE MANAGEMENT
DISK FILE SYSTEM
The DBA never has to do a re-organization, and performance is optimized on a continual basis and does not degrade with file updates.
Defining users is easy with a Teradata system. There are two types: query users with no workspace capability, and power users who have the capability to create and manipulate tables within their own workspace, based on whatever limitations the DBA placed on their space usage.

The DBA first adds the users with a CREATE USER DDL command, then grants them security rights to database entities. Role-based security is supported for ease of maintenance.
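A minimal sketch of the two steps (names, quota and password hypothetical):

```sql
-- Hypothetical power user with a 10GB workspace quota.
CREATE USER power_user1 FROM DBC
AS PERM = 10000000000, PASSWORD = temp_pw_1;

-- Role-based security: grant rights to a role once,
-- then grant the role to many users.
CREATE ROLE report_role;
GRANT SELECT ON sales_db TO report_role;
GRANT report_role TO power_user1;
```

A query-only user would simply be created with PERM = 0, giving it no workspace of its own.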
DBAs need to resist the temptation to over-index the tables. Because of the powerful parallel architecture of the Teradata platform, it is unnecessary to avoid full-table scans. Therefore, far fewer indexes are needed than in other RDBMSs. In fact, as experienced in several organizations, tables having more than 80 billion rows each can be scanned in less than five minutes.
These recommended steps will help determine the number of indexes needed in the Teradata Database:

The DBA defines a primary index (PI) and a secondary index on any column that will participate as a foreign key in join operations. The PI is for data distribution and keyed access.

Once the indexes are defined, the query workload is run and the query capture facility logs the query activity.

The Teradata Index Wizard uses this information to recommend the addition or removal of indexes based on actual query usage.

This process saves the DBA from having to manually analyze the number of indexes.
Special indexes are available for specific performance needs. The partitioned primary index (PPI), for one, can deliver dramatic results. A large transaction file that is accessed with date-parameter queries is a good candidate for creating a PPI on transaction date. The Optimizer then can eliminate partitions on any date-sensitive queries, with dramatic response-time reductions.

The multi-level PPI, join, aggregate and aggregate join indexes are tools that can turbo-charge certain applications.
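A sketch of a date-partitioned table (table name, columns and date range hypothetical):

```sql
-- Hypothetical PPI: rows are hashed by txn_id but stored in monthly
-- partitions by txn_date, so date-constrained queries only touch
-- the relevant partitions.
CREATE TABLE txn_history
( txn_id    INTEGER
, acct_id   INTEGER
, txn_date  DATE
, amount    DECIMAL(12,2)
)
PRIMARY INDEX (txn_id)
PARTITION BY RANGE_N (
  txn_date BETWEEN DATE '2012-01-01' AND DATE '2013-12-31'
            EACH INTERVAL '1' MONTH );
```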
Teradata Active System Management provides the necessary tools for comprehensive workload management; therefore, no outside tools or resources are needed. The product has three components:

Dynamic Query Manager enables the DBA to classify and govern the query before its execution in the database.

Priority Scheduler defines resource partitions where varying workloads can be controlled and monitored.

Database Query Log provides post-execution performance analysis.

Because Teradata tools administer all performance management and tuning needs, the DBA no longer has to be an expert in the OS, database and third-party tools.
USER MANAGEMENT
INDEX MANAGEMENT
WORKLOAD MANAGEMENT
Expanding the Teradata system is similar to ordering the initial platform. With assistance from Teradata Professional Services, the DBA determines the amount of raw data on the existing Teradata system and the number of concurrent users, as well as a few critical queries. Then those numbers are added to the anticipated growth in each area. These values are input into a system-sizing calculator, which produces the additional platform requirements.

As with the original Teradata system, the additional platform and software will be pre-configured, delivered, installed and connected to the existing platform. The data redistribution utility is then run, which automatically rebalances the data on the system. The tool relocates any data that would have belonged on the new platform nodes had they been installed at the time the data was loaded. Once that data is relocated, the utility removes it from the old nodes. (No data movement occurs between the original nodes.) Redistributing the data requires downtime on the system, but the process normally takes less than a shift to complete.
The features of the Teradata system make it ideal for data warehouse applications. The balanced, purpose-built platform arrives ready to deliver the first application generally in days, instead of weeks or months.

With the automated data management features, DBAs are freed from having to micro-manage the file system and can, therefore, engage in other tasks and responsibilities. For instance, instead of the DBA constantly writing and tuning queries, the query optimizer allows the user to ask any question, anytime. The support and freedom provided by a data warehouse from Teradata empowers DBAs to concentrate on working with the user community to deliver greater business value to their organization.
Posted 27th June 2013 by pankaj agarwal
A BETTER VALUE
WHAT TERADATA DBAs DON'T DO:
With the automatic features included in a Teradata Database, DBAs have fewer tasks and responsibilities for implementing and maintaining the system. As identified in this partial list of duties, Teradata DBAs have never been required to:
Install an operating system (OS)
Understand and set extensive OS tuning parameters
Install the Teradata Database
Understand and set extensive Teradata Database parameters
Write programs/execute utilities that determine how to divide data into partitions
Determine size and physical location of each table and index partition or simple tablespace
Code/allocate/format partitions or underlying file structures
Embed partition assignment into CREATE TABLE statements
Determine level/degrees of parallelism to be assigned to tables/partitions/databases
Assign and manage special buffer pools for parallel processing
Associate tables/queries with parallel degrees
Code/allocate/format temporary work space
25th March 2013
Join Indexes

The join index JOINs two tables together and keeps the result set in the permanent space of Teradata. The join index holds the result set of the two tables, and at the time of a JOIN, the parsing engine will decide whether it is faster to build the result set from the actual BASE tables or from the JOIN index. Users never directly query the JOIN index. In that sense, a JOIN index is the result of joining two tables together, so that the parsing engine can take the result set from the JOIN index instead of doing a manual join on the base tables.

Types of JOIN index:

1. Multi-table Join Index - Suppose we have two BASE tables, Employee and Dept, which hold the data of employees and departments respectively. A JOIN index on these two tables will be somewhat like:

Create Join Index emp_dept as
Select empno, empname, emp_dept, emp_sal, emp_mgr
From employee e inner join dept d
On e.emp_dept = d.deptno
Unique primary index (empno);

This way the JOIN index EMP_DEPT holds the result set of the two BASE tables, and at the time of a JOIN, the PE will decide whether it is faster to join the actual tables or to take the result set from this JOIN index. So always choose a wise list of columns and tables to create a JOIN index.
2. Single-table JOIN Index - A single-table JOIN index duplicates a single table, but changes the primary index. Users will only query the base table, and it is the PE that decides which result set is faster: from the JOIN index or from the actual BASE table. The reason to create the single-table JOIN index is so joins can be performed faster, because no redistribution or duplication needs to occur.
Create Join Index emp_snap as
Select empno, empname, emp_dept
From employee
primary index (emp_dept);
3. Aggregate JOIN Index - An aggregate JOIN index will allow the tracking of aggregates such as SUM and COUNT on any table. This JOIN index is basically used if we need to perform an aggregate function on the data of the table.

Create Join Index AGG_TABLE as
Sel Empno, sum(emp_sal)
2. Users never query them directly; it is the PE that decides which result set to take.
3. Updated when base tables are changed.
4. Can't be loaded with FastLoad or MultiLoad.
Posted 25th March 2013 by pankaj agarwal
21st March 2013
The very mention of changing data on disk implies that space must be managed by the AMP(s) owning the row(s) to modify. Data cannot be changed unless it is read from the disk.

For INSERT operations, a new block might be written or an existing block might be modified to contain the new data row. The choice of which to use depends on whether or not there is sufficient space on the disk to contain the original block plus the number of bytes in the new row.

If the new row causes the block to increase beyond the current number of sectors, the AMP must locate an empty slot with enough contiguous sectors to hold the larger block. Then, it can allocate this new area for the larger block.
A DELETE is going to make one or more blocks shorter. Therefore, it should never have to find a larger slot in which to write the block back to disk. However, it still has to read the existing block, remove the appropriate rows and re-write the smaller block.

The UPDATE is more unpredictable than either the DELETE or the INSERT. This is because an UPDATE might increase the size of the block like the INSERT, decrease the size like the DELETE, or not change the size at all.

A larger block might occur because of one of the following conditions:

A NULL value was compressed and now must be expanded to contain a value. This is the most likely situation.

A longer character literal is stored into a VARCHAR column.
Performance Issues With Data Maintenance
A block size does not change when:

The column is a fixed-length CHAR; regardless of the length of the actual character data value, the length stays at the maximum defined.

All numeric columns are stored in their maximum number of bytes.
There are many reasons for performance gains or losses. Another consideration, which was previously mentioned, is the journal entries in the Transient Journal for recovery and rollback processing. The Transient Journal is mandatory and cannot be disabled. Without it, data integrity cannot be guaranteed.
Using FALLBACK on tables negatively impacts the processing time when changing rows within a table. This is due to the fact that the same change must also be made on the AMP storing the FALLBACK copy of the row(s) involved. These changes involve additional disk I/O operations and the use of two AMPs instead of one for each row INSERT, UPDATE or DELETE. That equates to twice as much I/O activity.
Using PERMANENT JOURNAL logging on tables will also negatively impact the processing time when changing rows within a table. This is due to the fact that the UPDATE processing also inserts a copy of the row into the journal table. If BEFORE journals are used, a copy of the row as it existed before a change is placed into the log table. When AFTER images are requested, a copy of the row is inserted into the journal table that looks exactly like the changed row.
There is another issue to consider for journaling, based on SINGLE or DUAL journaling. DUAL asks for a second (mirror) copy to be inserted. It is the journal's way to provide FALLBACK copies without the table being required to use FALLBACK. The caution here is that if the TABLE is FALLBACK protected, so are the journals. This will further impact the performance of the row modification.
In Teradata, all tables must have a Primary Index (PI). It is a normal and very important part of the storage and retrieval of rows for all tables. Therefore, there is no additional overhead processing involved in an INSERT or DELETE operation for Primary Indices.

However, when using an UPDATE and the data value of a PI is changed, there is more processing required than when changing the content of any other column. This is due to the fact that the original row must be read, literally deleted from the current AMP, and rehashed, redistributed and inserted on another AMP based on
Impact of FALLBACK on Row Modification
Impact of PERMANENT JOURNAL Logging on Row Modification
Impact of Primary Index on Row Modification
the new value. All of this extra work is required to successfully complete the operation when a PI is the column being modified.
In Teradata, a Secondary Index is optional. Currently, a table may have 32 secondary indices. Each index may be a combination of up to 16 columns within a table. Every unique data value in a defined index has a row in the subtable, and potentially one on each AMP for a NUSI (Non-Unique Secondary Index). Additionally, every index has its own subtable.
Using secondary indices on tables may also negatively impact the processing time when changing rows within a table. This is due to the fact that when a column is part of an index and its data value is changed in the base table, the index value must also be changed in the subtable. This normally requires that a row be read, deleted and inserted into a subtable when the column is involved in a USI (Unique Secondary Index). Remember that the delete and insert will probably be on different AMP processors.
For a NUSI, the processing all takes place on the same AMP. This is referred to as AMP Local. At first glance this sounds like a good thing. However, the processing requires a read of the old NUSI, a modification, and a rewrite. Then, most likely it will be necessary to insert an index row into the subtable. However, if the NUSI already exists, Teradata needs to read the existing NUSI, append the new data value to it and re-write it back into the subtable. This is why it is important not to create a Primary Index or a Secondary Index on data that often changes.
The point of this discussion is simple. If secondary indices are used, additional processing is involved when the data value of the index is changed. This is true on an INSERT, a DELETE and an UPDATE. So, if a secondary index is defined, make sure that the SQL is using it to receive the potential access-speed benefit. An EXPLAIN can provide this information. If it is not being used, drop the index.
As an added note to consider, when using composite secondary indices, the same column can be included in multiple indices. When this is the case, any data value change requires multiple subtable changes. The result is that the number of indices in which the column is defined multiplies the previous AMP and subtable-processing overhead. Therefore, it becomes more important to choose columns with a low probability of change.
Posted 21st March 2013 by pankaj agarwal
Impact of Secondary Indices on Row Modification
SELECT * FROM DBC.DBCINFO;
The access language for all modern relational database management systems (RDBMS) is Structured Query Language (SQL). It has evolved over time to be the standard. The ANSI SQL group defines which commands and functionality all vendors should provide within their RDBMS.

There are three levels of compliance within the standard: Entry, Intermediate and Full. The three level definitions are based on specific commands, data types and functionalities. So, it is not that a vendor has incorporated some percentage of the commands; it is more that each command is categorized as belonging to one of the three levels. For instance, most data types are Entry-level compliant. Yet, there are some that fall into the Intermediate and Full definitions.
Since the standard continues to grow with more options being added, it is difficult to stay fully ANSI compliant. Additionally, all RDBMS vendors provide extra functionality and options that are not part of the standard. These extra functions are called extensions because they extend or offer a benefit beyond those in the standard definition.

At the writing of this book, Teradata was fully ANSI Entry-level compliant based on the 1992 Standards document. NCR also provides much of the Intermediate and some of the Full capabilities. This book indicates feature by feature which SQL capabilities are ANSI and which are Teradata-specific, or extensions. It is to NCR's benefit to be as compliant as possible in order to make it easier for customers of other RDBMS vendors to port their data warehouse to Teradata.
As indicated earlier, SQL is used to access, store, remove and modify data stored within a relational database, like Teradata. SQL is actually comprised of three types of statements: Data Definition Language (DDL), Data Control Language (DCL) and Data Manipulation Language (DML). The primary focus of this book is on DML and DDL. Both DDL and DCL are, for the most part, used for administering an RDBMS. Since the SELECT statement is used the vast majority of the time, we are concentrating on its functionality, variations and capabilities.
Everything in the first part of this chapter describes ANSI-standard capabilities of the SELECT command. As the statements become more involved, each capability will be designated as either ANSI or a Teradata Extension.

Using the SELECT has been described like playing the game Jeopardy. The answer
Determining the Release of Your Teradata System:
Fundamental Structured Query Language (SQL)
Basic SELECT Command
case of the statement is not important. The SQL statements can be written using all uppercase, lowercase or a combination; it does not matter to the Teradata PE.

The SELECT is used to return the data value(s) stored in the columns named within the SELECT command. The requested columns must be valid names defined in the table(s) listed in the FROM portion of the SELECT.
The following shows the format of a basic SELECT statement. In this book, the syntax uses expressions like (see Figure 1-1) to represent the location of one or more names required to construct a valid SQL statement.

The structure of the command places all keywords on the left in uppercase and the variable information, such as column and table names, to the right. Like using capital letters, this positioning is to aid in learning SQL. Lastly, although the use of SEL is acceptable in Teradata, with [ECT] in square brackets being optional, it is not ANSI standard.
SEL[ECT] <column-name> FROM <table-name> ;
Both of these SELECT statements produce the same output report, but the above style is easier to read and debug for complex queries. The output display might appear as:

3 Rows Returned

aaaaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbb
cccccccccccccccccc

In the output, the column name becomes the default heading for the report. Then, the data contained in the selected column is displayed once for each row returned.
The next variation of the SELECT statement returns all of the columns defined in the table indicated in the FROM portion of the SELECT.
The output of the above request uses each column name as the heading, and the columns are displayed in the same sequence as they are defined in the table. Depending on the tool used to submit the request, care should be taken, because if the returned display is wider than the media (i.e., terminal=80 and paper=133), it may be truncated.

At times, it is desirable to select the same column twice. This is permitted; to accomplish it, the column name is simply listed in the SELECT column list more than once. This technique might often be used when doing aggregations or calculating a value, both of which are covered in later chapters.

The table below is used to demonstrate the results of various requests. It is a small table with a total of ten rows for easy comparison.
Student Table - contains 10 students

Student_ID  Last_Name   First_name  Class_code  Grade_Pt
    PK                                  FK
123250      Phillips    Martin      SR          3.00
125634      Hanson      Henry       FR          2.88
231222      Wilson      Susie       SO          3.80
234121      Thomas      Wendy       FR          4.00
260000      Johnson     Stanley     ?           ?
280023      McRoberts   Richard     JR          1.90
322133      Bond        Jimmy       JR          3.95
324652      Delaney     Danny       SR          3.35
333450      Smith       Andy        SO          2.00
423400      Larkins     Michael     FR          0.00

Figure 2-1
For example, the next SELECT might be used with Figure 2-1 to display the student number, the last name, first name, the class code and grade point for all of the students in the Student table:

SELECT *
FROM Student_Table ;
10 Rows returned
Student_ID Last_Name First_Name Class_Code Grade_Pt
423400 Larkins Michael FR 0.00
125634 Hanson Henry FR 2.88
280023 McRoberts Richard JR 1.90
260000 Johnson Stanley ? ?
231222 Wilson Susie SO 3.80
234121 Thomas Wendy FR 4.00
324652 Delaney Danny SR 3.35
123250 Phillips Martin SR 3.00
322133 Bond Jimmy JR 3.95
333450 Smith Andy SO 2.00
Notice that Johnson has question marks in the grade point and class code columns. Most client software uses the question mark to represent missing data or an unknown value (NULL). More discussion on this condition will appear throughout this book. The other thing to note is that character data is aligned from left to right, the same as we read it, and numeric data is aligned from right to left, from the decimal point.
This SELECT returns all of the columns except the Student ID from the Student table:
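The statement itself appeared as an image in the original post; judging from the output below, it likely resembled:

```sql
SELECT First_Name
      ,Last_Name
      ,Class_Code
      ,Grade_Pt
FROM   Student_Table ;
```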
10 Rows returned
First_Name Last_Name Class_Code Grade_Pt
Michael Larkins FR 0.00
Henry Hanson FR 2.88
Richard McRoberts JR 1.90
Stanley Johnson ? ?
Susie Wilson SO 3.80
Wendy Thomas FR 4.00
Danny Delaney SR 3.35
Martin Phillips SR 3.00
Jimmy Bond JR 3.95
Andy Smith SO 2.00
There is no shortcut for selecting all columns except one or two. Also, notice that the columns are displayed in the output in the same sequence they are requested in the SELECT statement.

The previous unconstrained SELECT statement returned every row from the table. Since the Teradata database is most often used as a data warehouse, a table might contain millions of rows. So, it is wise to request only certain types of rows for return.

By adding a WHERE clause to the SELECT, a constraint is established to potentially limit which rows are returned, based on a TRUE comparison to specific criteria or a set of conditions.
WHERE Clause
The conditional check in the WHERE can use the ANSI comparison operators (symbols are ANSI / alphabetic forms are a Teradata Extension):

Equal  Not Equal  Less Than  Greater Than  Less Than or Equal  Greater Than or Equal
=      <>         <          >             <=                  >=
EQ     NE         LT         GT            LE                  GE

Figure 2-2
The following SELECT can be used to return the students with a B (3.0) average or better from the Student table:
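The statement appeared as an image in the original post; inferred from the output below, it likely resembled:

```sql
SELECT Student_ID
      ,Last_Name
      ,Grade_Pt
FROM   Student_Table
WHERE  Grade_Pt >= 3.0 ;
```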
5 Rows returned
Student_ID Last_Name Grade_Pt
231222 Wilson 3.80
234121 Thomas 4.00
324652 Delaney 3.35
123250 Phillips 3.00
322133 Bond 3.95
Without the WHERE clause, the AMPs return all of the rows in the table to the user. More and more Teradata user systems are getting to the point where they are storing billions of rows in a single table. There must be a very good reason for needing to see all of them. More simply put, you will always use a WHERE clause whenever you want to see only a portion of the rows in a table.
Compound Comparisons ( AND / OR )
The following is the syntax for using the AND logical operator:

Notice that the column name is listed for each comparison, separated by a logical operator; this will be true even when it is the same column being compared twice. The AND signifies that each individual comparison on both sides of the AND must be true. The final result of the comparison must be TRUE for a row to be returned.

This Truth Table illustrates this point using AND:

First Test Result  AND  Second Test Result  Final Result
True                    True                True
True                    False               False
False                   True                False
False                   False               False
A column can never contain more than a single data value. Therefore, it does not make good sense to issue the next SELECT using an AND on the same column, because no rows will ever be returned.

No rows found

The above SELECT will never return any rows. It is impossible for a column to contain more than one value. No student has a 3.0 grade average AND a 4.0 average. They might have one or the other, but never both at the same time. The AND operator indicates both must be TRUE, and it should never be used between two comparisons on the same column.
By substituting an OR logical operator for the previous AND, rows will now be returned.

The following is the syntax for using OR:
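The OR statement was shown as an image in the original post; inferred from the output below, it likely resembled:

```sql
SELECT Student_ID
      ,Last_Name
      ,First_Name
      ,Grade_Pt
FROM   Student_Table
WHERE  Grade_Pt = 3.0
OR     Grade_Pt = 4.0 ;
```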
2 Rows returned
Student_ID Last_Name First_Name Grade_Pt
234121 Thomas Wendy 4.00
123250 Phillips Martin 3.00
The OR signifies that only one of the comparisons on each side
of the OR needs tobe true for the entire test to result in a true
and the row to be selected.
This Truth Table illustrates the results for the OR:

First Test Result  OR  Second Test Result  Final Result
True                   True                True
True                   False               True
False                  True                True
False                  False               False

Figure 2-4
When using the OR, the same column or different column names may be used. In this case, it makes sense to use the same column because a row is returned when a column contains either of the specified values, as opposed to both values as seen with AND.
It is perfectly legal and common practice to combine the AND with the OR in a single SELECT statement.
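The statement that produced the result below was likely of this form; the predicates are inferred from the discussion that follows, so treat it as a reconstruction, not the original:

```sql
SELECT Student_ID, Last_Name, First_Name, Class_Code, Grade_Pt
FROM   Student_Table
WHERE  Grade_Pt = 3.0
OR     Grade_Pt = 4.0
AND    Class_Code = 'FR';
```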
2 Rows returned
Student_ID Last_Name First_Name Class_Code Grade_Pt
234121 Thomas Wendy FR 4.00
123250 Phillips Martin SR 3.00
At first glance, it appears that the comparison worked correctly. However, upon closer evaluation it is incorrect because Phillips is a senior and not a freshman.
When mixing AND with OR in the same WHERE clause, it is important to know that the AND is evaluated first. The previous SELECT actually returns all rows with a grade point of 3.0. Hence, Phillips was returned. The second comparison returned Thomas with a grade point of 4.0 and a class code of FR.
When it is necessary for the OR to be evaluated before the AND, the use of parentheses changes the priority of evaluation. A different result is seen when doing the OR first. Here is how the statement should be written:
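A sketch of the corrected statement, with parentheses forcing the OR to be evaluated first (names assumed as before):

```sql
SELECT Last_Name, Class_Code, Grade_Pt
FROM   Student_Table
WHERE  (Grade_Pt = 3.0 OR Grade_Pt = 4.0)
AND    Class_Code = 'FR';
```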
1 Row returned
Last_Name Class_Code Grade_Pt
Thomas FR 4.00
Now, only Thomas is returned and the output is correct.
NULL is an SQL reserved word. It represents missing or unknown data in a column. Since NULL is an unknown value, a normal comparison cannot be used to determine whether it is true or false. All comparisons of any value to a NULL result in an unknown; it is neither true nor false. The only valid test for a null uses the keyword NULL without the normal comparison symbols and is explained in this chapter.
When a table is created in Teradata, the default for a column is to allow a NULL value to be stored. So, unless the default is overridden and NULL values are not allowed, it is a good idea to understand how they work.
A SHOW TABLE command (chapter 3) can be used to determine whether a NULL is allowed. If the column contains a NOT NULL constraint, you need not be concerned about the presence of a NULL because it is disallowed.
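For example (table name assumed):

```sql
SHOW TABLE Student_Table;
```

The DDL returned lists every column; any column declared with NOT NULL cannot contain a NULL.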
Impact of NULL on Compound Comparisons
This AND Truth Table must now be used for compound tests when NULL values are allowed:
First Test Result AND Second Test Result Final Result
True Unknown Unknown
Unknown True Unknown
False Unknown False
Unknown False False
Unknown Unknown Unknown
Figure 2-5
This OR Truth Table must now be used for compound tests when NULL values are allowed:
First Test Result OR Second Test Result Final Result
True Unknown True
Unknown True True
False Unknown Unknown
Unknown False Unknown
Unknown Unknown Unknown
Figure 2-6
For most comparisons, an unknown (null) is functionally equivalent to a false because it is not a true. Therefore, when using any comparison symbol, a row is not returned when it contains a NULL.
At the same time, the next SELECT does not return Johnson because all comparisons against a NULL are unknown:
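The statement was presumably of this form (Student_Table assumed); the = comparison against NULL is exactly what the later release shown below refuses to run:

```sql
SELECT Last_Name, First_Name, Grade_Pt
FROM   Student_Table
WHERE  Grade_Pt = NULL;
```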
No rows found
V2R5: *** Failure 3731 The user must use IS NULL or IS NOT NULL to test for NULL values.
As seen in the above Truth tables, a comparison test cannot be used to find a NULL.
To find a NULL, it becomes necessary to make a slight change in the syntax of the conditional comparison. The coding necessary to find a NULL is seen in the next section.
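For reference, the required form looks like this (table and column assumed):

```sql
SELECT Last_Name, First_Name, Grade_Pt
FROM   Student_Table
WHERE  Grade_Pt IS NULL;
```

Likewise, IS NOT NULL returns only the rows that actually contain a value.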
Using NOT in SQL Comparisons
It can be fairly straightforward to request exactly which rows are needed. However, sometimes rows are needed that contain any value other than a specific value. When this is the case, it might be easier to write the SELECT to find what is not needed instead of what is needed, and then convert it to return everything else. This might be the situation when there are 100 potential values stored in the database table and 99 of them are needed. So, it is easier to eliminate the one value than it is to specifically list the desired 99 values individually.
Either of the next two SELECT formats can be used to accomplish the elimination of the one value:
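Sketches of the two formats; the column and value are illustrative only:

```sql
SELECT * FROM Student_Table
WHERE  NOT (Class_Code = 'FR');

SELECT * FROM Student_Table
WHERE  Class_Code <> 'FR';
```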
This second version of the SELECT is normally used when compound conditions are required. This is because it is usually easier to code the SELECT to get what is not wanted, and then to enclose the entire set of comparisons in parentheses and put one NOT in front of it. Otherwise, with a single comparison, it is easier to put NOT in front of the comparison operator without requiring the use of parentheses.
The next SELECT uses the NOT with an AND comparison to display seniors and lower classmen with grade points less than 3.0:
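A plausible form of that statement, with the predicates inferred from the six rows returned below:

```sql
SELECT Last_Name, First_Name, Class_Code, Grade_Pt
FROM   Student_Table
WHERE  NOT (Class_Code = 'FR' AND Grade_Pt >= 3.0);
```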
6 Rows returned
Last_Name First_Name Class_Code Grade_Pt
McRoberts Richard JR 1.90
Hanson Henry FR 2.88
Delaney Danny SR 3.35
Larkins Michael FR 0.00
Phillips Martin SR 3.00
Smith Andy SO 2.00
Without using the above technique of a single NOT, it is necessary to change every individual comparison. The following SELECT shows this approach; notice the other change necessary below: NOT AND is an OR.
Since you cannot have conditions like NOT >= and NOT =, they must be converted: NOT >= becomes < (not greater than and not equal) and NOT = becomes <> (not equal). It returns the same 6 rows, but also notice that the AND is now an OR:
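Under the same assumed predicates, the converted statement would read:

```sql
SELECT Last_Name, First_Name, Class_Code, Grade_Pt
FROM   Student_Table
WHERE  Class_Code <> 'FR'
OR     Grade_Pt < 3.0;
```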
6 Rows returned
Last_Name First_Name Class_Code Grade_Pt
McRoberts Richard JR 1.90
Hanson Henry FR 2.88
Delaney Danny SR 3.35
Phillips Martin SR 3.00
Larkins Michael FR 0.00
Smith Andy SO 2.00
Chart of individual conditions and NOT:
Condition Opposite condition NOT condition
= <> NOT =
<> = NOT <>
< >= NOT <
> <= NOT >
<= > NOT <=
>= < NOT >=
AND OR OR
OR AND AND
Figure 2-7
To maintain the integrity of the statement, all portions of the WHERE must be changed, including AND as well as OR. The following two SELECT statements illustrate the same concept when using an OR:
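The pair might have looked like this; the literal values are assumptions chosen only to match the single row returned below:

```sql
SELECT Last_Name
FROM   Student_Table
WHERE  NOT (Grade_Pt < 2.5 OR Grade_Pt >= 3.0);

SELECT Last_Name
FROM   Student_Table
WHERE  Grade_Pt >= 2.5
AND    Grade_Pt <  3.0;
```

Notice that removing the NOT turned the OR into an AND and each comparison into its opposite.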
1 Row returned
Last_Name
Hanson
In the earlier Truth table, the NULL value returned an unknown when checked with a comparison operator. When looking for specific conditions, an unknown was functionally equivalent to a false, but really it is an unknown.
These two Truth tables can be used together as a tool when mixing AND and OR together in the WHERE clause along with NOT.
This Truth Table helps to gauge returned rows when using NOT with AND:
First Test Result AND Second Test Result Result
NOT(True) = False NOT(Unknown) = Unknown False
NOT(Unknown) = Unknown NOT(True) = False False
NOT(False) = True NOT(Unknown) = Unknown Unknown
NOT(Unknown) = Unknown NOT(False) = True Unknown
NOT(Unknown) = Unknown NOT(Unknown) = Unknown Unknown
Figure 2-8
This Truth Table helps to gauge returned rows when using NOT with OR:
First Test Result OR Second Test Result Result
NOT(True) = False NOT(Unknown) = Unknown Unknown
NOT(Unknown) = Unknown NOT(True) = False Unknown
NOT(False) = True NOT(Unknown) = Unknown True
NOT(Unknown) = Unknown NOT(False) = True True
NOT(Unknown) = Unknown NOT(Unknown) = Unknown Unknown
Figure 2-9
There is an issue associated with using NOT. When a NOT is done on a true condition, the result is a false. Likewise, the NOT of a false is a true. However, when a NOT is done with an unknown, the result is still an unknown. Whenever a NULL appears in the data for any of the columns being compared, the row will never be returned and the answer set will not be what is expected.
Another area where care must be taken is when allowing NULL values to be stored in one or both of the columns. As mentioned earlier, previous versions of Teradata had no concept of unknown, and if a compare didn't result in a true, it was false. With the emphasis on ANSI compatibility, the unknown was introduced.
If NULL values are allowed and there is potential for the NULL to impact the final outcome of compound tests, additional tests are required to eliminate them. One way to eliminate this concern is to never allow a NULL value in any column. However, this may not be appropriate, and it will require more storage space because a NULL can be compressed. Therefore, when a NULL is allowed, the SQL needs to simply check for a NULL.
Therefore, using the expression IS NOT NULL is a good technique when NULL is allowed in a column and the NOT is used with a single or a compound comparison. This does require another comparison and could be written as:
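One plausible form, extending the earlier NOT example so that rows containing a NULL are also returned (predicates assumed):

```sql
SELECT Last_Name, First_Name, Class_Code, Grade_Pt
FROM   Student_Table
WHERE  NOT (Class_Code = 'FR' AND Grade_Pt >= 3.0)
OR     Grade_Pt IS NULL;
```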
7 Rows returned
Last_Name First_Name Class_Code Grade_Pt
Larkins Michael FR 0.00
Hanson Henry FR 2.88
McRoberts Richard JR 1.90
Johnson Stanley ? ?
Delaney Danny SR 3.35
Phillips Martin SR 3.00
Smith Andy SO 2.00
Notice that Johnson came back this time; he did not appear previously because of the NULL values.
Later in this book, the COALESCE will be explored as another way to eliminate NULL values directly in the SQL instead of in the database.
21st March 2013
Although a relational data model uses Primary Keys and Foreign Keys to establish the relationships between tables, that design is a Logical Model. Each vendor uses specialized techniques to implement a Physical Model. Teradata does not use keys in its physical model. Instead, Teradata is implemented using indices, both primary and secondary.
The Primary Index (PI) is the most important index in all of Teradata. The performance of Teradata can be linked directly to the selection of this index. The data value in the PI column(s) is submitted to the hashing function. The resulting row hash value is used to map the row to a specific AMP for data distribution and storage.
To illustrate this concept, I have on several occasions used two decks of cards. Imagine, if you will, fourteen people in a room. To the largest, most powerful-looking man in the room, you give one of the decks of cards. His large hands allow him to hold all fifty-two cards at one time, with some degree of success. The cards are arranged with the ace of spades continuing through the king of spades in ascending order. After the spades are the hearts, then the clubs and last, the diamonds. Each suit is arranged starting with the ace and ascending up to the king. The cards are partitioned by suit.
The other deck of cards is divided among the other thirteen people. Using this procedure, all cards with the same value (i.e., aces) go to the same person. Likewise, all the deuces, treys and subsequent cards each go to one of the thirteen people. Each person's four cards will be in the same order as the suits contained in the single deck that went to the lone man: spades, hearts, clubs and diamonds. Once all the cards have been distributed, each of the thirteen people will be holding four cards of the same value (4*13=52). Now, the game can begin.
The requests in this game come in the form of a give-me of one or more cards.
To make it easy for the lone player, we first request: give-me the ace of spades. The person with four aces finds their ace, as does the lone player with all 52 cards, both on the top of their cards. That was easy!
As the difficulty of the give-me requests increases, the level of difficulty dramatically increases for the lone man. For instance, when the give-me request is for all of the twos, one of the thirteen people holds up all four of their cards and the request is satisfied.
Use of an Index
Another request might be give-me all of the diamonds. For the thirteen people, each person locates and holds up one of their cards and the request is finished. For the lone person with the single deck, the request means finding and holding up the last thirteen cards in their deck of fifty-two. In each of these give-me requests, the lone man had to negotiate all fifty-two cards while the thirteen other people only needed to determine which of their four cards applied to the request, if any. This is the same procedure used by Teradata. It divides up the data like we divided up the cards.
As illustrated, the thirteen people are faster than the lone man. However, the game is not limited to thirteen players. If there were 26 people who wished to play on the same team, the cards simply need to be divided or distributed differently.
When using the value (ace through king) there are only 13 unique values. In order for 26 people to play, we need a way to come up with 26 unique values for 26 people. To make the cards more unique, we might combine the value of the card (i.e., ace) with the color. Therefore, we have two red aces and two black aces, as well as two sets for every other card. Now when we distribute the cards, each of the twenty-six people receives only two cards instead of the original four. The distribution is still based on fifty-two cards (2 times 26).
At the same time, the optimum number of people for the game is not 26. Based on what has been discussed so far, what is the optimum number of people?
If your answer is 52, then you are absolutely correct.
With this many people, each person has one and only one card. Any time a give-me is requested of the participants, their one card either qualifies or it does not. It doesn't get any simpler or faster than this situation.
As easy as this sounds, to accomplish this distribution the value of the card alone is not sufficient to manifest 52 unique values. Neither is using the value and the color. That combination only gives us a distribution of 26 unique values when 52 unique values are desired.
To achieve this distribution we need to establish still more uniqueness. Fortunately, we can use the suit along with the value. Therefore, the ace of spades is different than the ace of hearts, which is different from the ace of clubs and the ace of diamonds. In other words, there are now 52 unique identities to use for distribution.
To relate this distribution to Teradata, one or more columns of a table are chosen to be the Primary Index.
Primary Index
To store the data, the value(s) in the PI are hashed via a calculation to determine which AMP will own the data. The same data values always hash to the same row hash and therefore are always associated with the same AMP.
The advantage to using up to sixteen columns is that row distribution can be very smooth, or even, based on unique values. This simply means that each AMP contains the same number of rows. At the same time, there is a downside to using several columns for a PI. The PE needs every data value for each column as input to the hashing calculation to directly access a particular row. If a single column value is missing, a full table scan will result because the row hash cannot be recreated. Any row retrieval using the PI column(s) is always an efficient, one AMP operation.
Although uniqueness is good in most cases, Teradata does not require that a UPI be used. It also allows for a Non-Unique Primary Index (NUPI, pronounced as new-pea). The potential downside of a NUPI is that if several duplicate values (NUPI dups) are stored, they all go to the same AMP. This can cause an uneven distribution that places more rows on some of the AMPs than on others. This means that any time an AMP with a larger number of rows is involved, it has to work harder than the other AMPs. The other AMPs will finish before the slower AMP. The time to process a single user request is always based on the slowest AMP. Therefore, serious consideration should be used when making the decision to use a NUPI.
Every table must have a PI, and it is established when the table is created. If the CREATE TABLE statement contains UNIQUE PRIMARY INDEX ( ), the value in the column(s) will be distributed to an AMP as a UPI. However, if the statement reads PRIMARY INDEX ( ), the value in the column(s) will be distributed as a NUPI and duplicate values are allowed. Again, all the same values will go to the same AMP.
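A minimal sketch of the two forms; the tables, columns and types are illustrative, not from the original text:

```sql
CREATE TABLE Student_Table
( Student_ID INTEGER NOT NULL
 ,Last_Name  CHAR(20)
 ,Grade_Pt   DECIMAL(5,2) )
UNIQUE PRIMARY INDEX ( Student_ID );

CREATE TABLE Enrollment_Table
( Student_ID INTEGER
 ,Class_Code CHAR(2) )
PRIMARY INDEX ( Student_ID );  -- NUPI: duplicate values allowed, all to the same AMP
```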
If the DDL statement does not specify a PI, but it specifies a PRIMARY KEY (PK), the named column(s) are used as the UPI. Although Teradata does not use primary keys, the DDL may be ported from another vendor's database system.
A UPI is used because a primary key must be unique and cannot be null. By default, both UPIs and NUPIs allow a null value to be stored unless the column definition indicates that null values are not allowed using a NOT NULL constraint.
Now, with that being said, when considering JOIN accesses on the tables, sometimes it is advantageous to use a NUPI. This is because the rows being joined between tables must be on the same AMP. If they are not on the same AMP, one of the rows must be moved to the same AMP as the matching row. Teradata will use one of two different strategies to temporarily move rows. It can copy all needed rows to all AMPs, or it can redistribute them using the hashing mechanism on the column defined as the join domain that is a PI. However, if neither join column is a PI, it might be necessary to redistribute all participating rows from both tables by hashing the join column values.
The logical data model needs to be extended with usage information in order to know the best way to distribute the data rows. This is done during the physical implementation phase before creating tables.
A Secondary Index (SI) is used in Teradata as a way to directly access rows in the data, sometimes called the base table, without requiring the use of PI values. Unlike the PI, an SI does not affect the distribution of the data rows. Instead, it is an alternate read path and allows for a method to locate the PI value using the SI. Once the PI is obtained, the row can be directly accessed using the PI. Like the PI, an SI can consist of up to 16 columns.
In order for an SI to retrieve the data row by way of the PI, it must store and retrieve an index row. To accomplish this, Teradata creates, maintains and uses a subtable. The PI of the subtable is the value in the column(s) that are defined as the SI. The data stored in the subtable row is the previously hashed value of the real PI for the data row or rows in the base table. The SI is a pointer to the real data row desired by the request. An SI can also be unique (USI, pronounced as you-sea) or non-unique (NUSI, pronounced as new-sea).
The rows of the subtable contain the row hash value of the SI, the actual data value(s) of the SI, and the row hash value of the PI as the row ID. Once the row ID of the PI is obtained from the subtable row, using the hashed value of the SI, the last step is to get the actual data row from the AMP where it is stored. The action and hashing for an SI is exactly the same as when starting with a PI. When using a USI, the access of the subtable is a one AMP operation, and then accessing the data row from the base table is another one AMP operation. Therefore, USI accesses are always a two AMP operation based on two separate row hash operations.
When using a NUSI, the subtable access is always an all AMP operation. Since the data is distributed by the PI, NUSI duplicate values may, and probably do, exist on multiple AMPs. So, the best plan is to go to all AMPs and check for the requested NUSI value. To make this more efficient, each AMP scans its subtable. These subtable rows contain the row hash of the NUSI, the value of the data that created the NUSI and one or more row IDs for all the PI rows on that AMP. This is still a fast operation because these rows are quite small and several are stored in a single block. If the AMP determines that it contains no rows for the value of the NUSI requested, it is finished with its portion of the request. However, if an AMP has one or more rows with the NUSI value requested, it then goes and retrieves the data rows into spool space using the index.
With this said, the SQL optimizer may decide that there are too many base table data rows to make index access efficient. When this happens, the AMPs will do a full base table scan to locate the data rows and ignore the NUSI. This situation is called a weakly selective NUSI. Even using old-fashioned indexed sequential files, reading the entire file sequentially was faster than using the index once too many of the records qualified.
Secondary Index
If the SQL does not use a NUSI, you should consider dropping it, due to the fact that the subtable takes up PERM space with no benefit to the users. The Teradata EXPLAIN is covered in this book, and it is the easiest way to determine if your SQL is using a NUSI. Furthermore, the optimizer will never use a NUSI without STATISTICS.
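For instance (names illustrative):

```sql
COLLECT STATISTICS ON Student_Table COLUMN Class_Code;

EXPLAIN
SELECT * FROM Student_Table WHERE Class_Code = 'FR';
```

The EXPLAIN output shows whether the NUSI subtable is used or a full table scan is chosen instead.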
There has been another evolution in the use of NUSI processing, called NUSI Bitmapping. If a table has two different NUSI indices that individually are weakly selective, but together are highly selective, the optimizer can bitmap them together to eliminate most of the non-conforming rows. Therefore, many times it is better to use smaller individual NUSI indices instead of a large composite (more than one column) NUSI.
There is another feature related to NUSI processing that can improve access time when a value range comparison is requested. When using hash values, it is impossible to determine any value within the range. This is because large data values can generate small hash values and small data values can produce large hash values. So, to overcome the issue associated with a hashed value, there is a range feature called Value Ordered NUSIs. At this time, it may only be used with a four byte or smaller numeric data column. Based on its functionality, a Value Ordered NUSI is perfect for date processing. See the DDL chapter in this book for more details on USI and NUSI usage.
Posted 21st March 2013 by pankaj agarwal
21st March 2013
In Teradata, a user is the same as a database with one exception: a user is able to logon to the system and a database cannot. Therefore, to authenticate the user, a password must be established. The password is normally established at the same time that the CREATE USER statement is executed. The password can also be changed using a MODIFY USER command.
Like a database, a user area can contain database objects (tables, views, macros and triggers). A user can have PERM and TEMP space and can also have spool space. On the other hand, a user might not have any of these types of space, exactly the same as a database.
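A minimal sketch of the statements mentioned above; the names, sizes and passwords are illustrative only:

```sql
CREATE USER Student_User FROM Sysdba
AS PERM     = 10000000,
   SPOOL    = 50000000,
   PASSWORD = secret123;

MODIFY USER Student_User AS PASSWORD = newsecret1;
```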
Teradata Users
{ CREATE | MODIFY } DATABASE or USER (in common):
PERMANENT
TEMPORARY
SPOOL
ACCOUNT
FALLBACK
JOURNAL
DEFAULT JOURNAL

{ CREATE | MODIFY } USER (only):
PASSWORD
STARTUP
DEFAULT DATABASE

By no means are these all of the parameters. It is not the intent of this chapter, nor the intent of this book, to teach database administration. There are reference manuals and courses available to use. Teradata administration warrants a book by itself.
http://www.coffingdw.com/sql/tdsqlutp/teradata_users.htm
Posted 21st March 2013 by pankaj agarwal
21st March 2013
Within Teradata, a database is a storage location for database objects (tables, views, macros, and triggers). An administrator can use Data Definition Language (DDL) to establish a database by using a CREATE DATABASE command.
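A minimal sketch of such a statement; the name and space sizes are illustrative:

```sql
CREATE DATABASE Student_DB FROM Sysdba
AS PERM  = 20000000,
   SPOOL = 100000000;
```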
A database may have PERMANENT (PERM) space allocated to it.
A Teradata Database
Teradata allocates PERM space to tables, up to the maximum, as rows are inserted. The space is not pre-allocated. Instead, it is allocated as rows are stored in blocks on disk. The maximum block size is defined either at a system level in the DBS Control Record, at the database level, or individually for each table. Like PERM, the block size is a maximum. Yet, it is only a maximum for blocks that contain multiple rows. By nature, the blocks are variable in length. So, disk space is not pre-allocated; instead, it is allocated on an as-needed basis, one sector (512 bytes) at a time. Therefore, the largest possible wasted disk space in a block is 511 bytes.
A database can also have SPOOL space associated with it. All users who run queries need workspace at some point in time. This SPOOL space is workspace used for the temporary storage of rows during the execution of user SQL statements. Like PERM space, SPOOL is defined as a maximum amount that can be used within a database or by a user. Since PERM is not pre-allocated, unused PERM space is automatically available for use as SPOOL. This maximizes the disk space throughout the system.
It is a common practice in Teradata to have some databases with PERM space that contain only tables. Then, other databases contain only views. These view databases require no PERM space and are the only databases that users have privileges to access. The views in these databases control all access to the real tables in other databases. They insulate the actual tables from user access. There will be more on views later in this book.
The newest type of space allocation within Teradata is TEMPORARY (TEMP) space. A database may or may not have TEMP space; however, it is required if Global Temporary Tables are used. The use of temporary tables is also covered in more detail later in the SQL portion of this book.
A database is defined using a series of parameter values at creation time. The majority of the parameters can easily be changed after a database has been created using the MODIFY DATABASE command. However, when attempting to increase PERM or TEMP space maximums, there must be sufficient disk space available, even though it is not immediately allocated. There may not be more PERM space defined than actual disk on the system.
A number of additional database parameters are listed below along with the user parameters in the next section. These parameters are tools for the database administrator and other experienced users when establishing databases for tables and views.
CREATE / MODIFY DATABASE Parameters
PERMANENT
TEMPORARY
JOURNAL
DEFAULT JOURNAL
Posted 21st March 2013 by pankaj agarwal
21st March 2013
The Teradata database currently runs normally on NCR Corporation's WorldMark Systems in the UNIX MP-RAS environment. Some of these systems consist of a single processing node (computer) while others are several hundred nodes working together in a single system. The NCR nodes are based entirely on industry standard CPU processor chips, standard internal and external bus architectures like PCI and SCSI, and standard memory modules with 4-way interleaving for speed.
At the same time, Teradata can run on any hardware server in the single node environment when the system runs Microsoft NT and Windows 2000. This single node may be any computer from a large server to a laptop.
Whether the system consists of a single node or is a massively parallel system with hundreds of nodes, the Teradata RDBMS uses the exact same components executing on all the nodes in parallel. The only difference between small and large systems is the number of processing components.
When these components exist on different nodes, it is essential that the components communicate with each other at high speed. To facilitate the communications, the multi-node systems use the BYNET interconnect. It is a high-speed, multi-path, dual redundant communications channel. Another amazing capability of the BYNET is that the bandwidth increases with each consecutive node added into the system. There is more detail on the BYNET later in this chapter.
As previously mentioned, Teradata is the superior product today because of its parallel operations based on its architectural design. It is the parallel processing by the major components that provides the power to move mountains of data. Teradata works more like the early Egyptians who built the pyramids without heavy equipment, using parallel, coordinated human efforts. It uses smaller nodes running the major software components: the Parsing Engine Processors, the Access Module Processors and the Message Passing Layer. The role of each component is discussed in the next sections to provide a better understanding of Teradata. Once we understand how Teradata works, we will pursue the SQL that allows storage and access of the data.
Teradata Architecture
Teradata Components
The Parsing Engine Processor (PEP), or Parsing Engine (PE) for short, is one of the two primary types of processing tasks used by Teradata. It provides the entry point into the database for users on mainframe and networked computer systems. It is the primary director task within Teradata.
As users logon to the database they establish a Teradata session. Each PE can manage 120 concurrent user sessions. Within each of these sessions, users submit SQL as a request for the database server to take an action on their behalf. The PE will then parse the SQL statement to establish which database objects are involved. For now, let's assume that the database object is a table. A table is a two-dimensional array that consists of rows and columns. A row represents an entity stored in a table and it is defined using columns. An example of a row might be the sale of an item, and its columns include the UPC, a description and the quantity sold.
Any action a user requests must also go through a security check to validate their privileges as defined by the database administrator. Once their authorization at the object level is verified, the PE will verify that the columns requested actually exist within the objects referenced.
Next, the PE optimizes the SQL to create an execution plan that is as efficient as possible, based on the amount of data in each table, the indices defined, the type of indices, the selectivity level of the indices, and the number of processing steps needed to retrieve the data. The PE is responsible for passing the optimized execution plan to other components as the best way to gather the data.
An execution plan might use the primary index column assigned to
the table, a secondary index or a full table scan. The use of an
index is preferable and will be discussed later in this chapter. For
now, it is sufficient to say that a full table scan means that all
rows in the table must be read and compared to locate the requested
data.
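As a toy illustration of the access-path choice just described, the sketch below prefers an index when the query's predicate column has one and falls back to a full table scan otherwise. The function name and the index dictionary are hypothetical; as the text notes, Teradata's optimizer also weighs row counts, index types and selectivity, not just index existence.

```python
# Toy illustration of the access-path choice described above.
# Hypothetical names; Teradata's real optimizer also weighs data
# volume, index type and selectivity.
def choose_access_path(table_indexes, where_column):
    """Prefer the primary index, then a secondary index, else scan."""
    if where_column == table_indexes.get("primary"):
        return "primary index access"
    if where_column in table_indexes.get("secondary", []):
        return "secondary index access"
    return "full table scan"

indexes = {"primary": "upc", "secondary": ["description"]}
print(choose_access_path(indexes, "upc"))   # primary index access
print(choose_access_path(indexes, "qty"))   # full table scan
```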
Although a full table scan sounds really bad, within the
architecture of Teradata, it is not necessarily a bad thing because
the data is divided up and distributed to multiple, parallel
components throughout the database. We will look next at the AMPs
that perform the parallel disk access using their file system
logic. The AMPs manage all data storage on disks. The PE has no
disks.
Activities of a PE:
Dynamic Views template. Powered by Blogger.
Classic
TERADATA
http://tdpank.blogspot.in/search?updated-min=2013-01-01T00:00:00-08...
38 of 45 8/16/2014 12:40 AM
-
- Optimize the access path(s) to retrieve the rows
- Build an execution plan with necessary steps for row access
- Send the plan steps to the Access Module Processors (AMPs) involved
The next major component of Teradata's parallel architecture is
called an Access Module Processor (AMP). It stores and retrieves the
distributed data in parallel. Ideally, the data rows of each table
are distributed evenly across all the AMPs. The AMPs read and write
data and are the workhorses of the database. Their job is to receive
the optimized plan steps, built by the PE after it completes the
optimization, and execute them. The AMPs are designed to work in
parallel to complete the request in the shortest possible time.
Optimally, every AMP should contain a subset of all the rows
loaded into every table. By dividing up the data, it automatically
divides up the work of retrieving the data. Remember, all work comes
as a result of a user's SQL request. If the SQL asks for a specific
row, that row exists in its entirety (all columns) on a single AMP,
and other rows exist on the other AMPs.
If the user request asks for all of the rows in a table, every
AMP should participate along with all the other AMPs to complete the
retrieval of all rows. This type of processing is called an all-AMP
operation and an all-rows scan. However, each AMP is only
responsible for its rows, not the rows that belong to a different
AMP. As far as each AMP is concerned, it owns all of the rows.
Within Teradata, the AMP environment is a shared-nothing
configuration. The AMPs cannot access each other's data rows, and
there is no need for them to do so.
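The row-distribution idea above can be sketched in a few lines: hash each row's primary index value to pick exactly one AMP, so every row lives in its entirety on a single AMP and no AMP sees another AMP's rows. This is purely illustrative; the 4-AMP system and the names are made up, and Teradata's actual row-hash algorithm and hash map are internal and far more sophisticated.

```python
# Illustrative sketch of shared-nothing row distribution.
# Hypothetical names; not Teradata's real row-hash algorithm.
from collections import defaultdict

NUM_AMPS = 4  # made-up system size

def amp_for_row(primary_index_value):
    """Map a primary index value to exactly one AMP via hashing."""
    return hash(primary_index_value) % NUM_AMPS

def distribute(rows, pi_column):
    """Each row lands, in its entirety, on a single AMP."""
    amps = defaultdict(list)
    for row in rows:
        amps[amp_for_row(row[pi_column])].append(row)
    return amps

sales = [
    {"upc": 1001, "description": "soap", "qty": 3},
    {"upc": 1002, "description": "bread", "qty": 1},
    {"upc": 1003, "description": "milk", "qty": 2},
]
amps = distribute(sales, "upc")
# Every row is on exactly one AMP; the total row count is conserved.
assert sum(len(rows) for rows in amps.values()) == len(sales)
```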
Once the rows have been selected, the last step is to return
them to the client program that initiated the SQL request. Since the
rows are scattered across multiple AMPs, they must be consolidated
before reaching the client. This consolidation process is
accomplished as a part of the transmission to the client so that a
final comprehensive sort of all the rows is never performed.
Instead, all AMPs sort only their rows (at the same time in
parallel) and the Message Passing Layer is used to merge the rows as
they are transmitted from all the AMPs.
Therefore, when a client wishes to sequence the rows of an
answer set, this technique causes the sort of all the rows to be
done in parallel. Each AMP sorts only its subset of the rows at the
same time all the other AMPs sort their rows. Once all of the
individual sorts are complete, the BYNET merges the sorted rows.
Pretty brilliant!
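The sort-then-merge technique can be sketched as follows: each "AMP" sorts only its own subset (here, concurrently via a thread pool), and the pre-sorted streams are merged on the way back, so no single global sort ever happens. The lists, the thread pool and the use of `heapq.merge` are stand-ins for the AMPs and the BYNET merge, not Teradata's implementation.

```python
# Sketch of parallel sort-then-merge: each "AMP" sorts its own
# subset, and the sorted streams are merged as they are returned.
import heapq
from concurrent.futures import ThreadPoolExecutor

amp_subsets = [
    [42, 7, 19],   # rows held by AMP 0
    [3, 88, 15],   # rows held by AMP 1
    [27, 1, 64],   # rows held by AMP 2
]

# Phase 1: every AMP sorts only its own rows, at the same time.
with ThreadPoolExecutor() as pool:
    sorted_subsets = list(pool.map(sorted, amp_subsets))

# Phase 2: merge the pre-sorted streams while returning them to
# the client -- analogous to the BYNET merge; no global sort.
answer_set = list(heapq.merge(*sorted_subsets))
print(answer_set)  # [1, 3, 7, 15, 19, 27, 42, 64, 88]
```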
Activities of the AMP:
- Store and retrieve data rows using the file system
- Sort and format output data
The Message Passing Layer varies depending on the specific
hardware on which the Teradata database is executing. In the latter
part of the 20th century, most Teradata database systems executed
under the UNIX operating system. However, in 1998, Teradata was
released on Microsoft's NT operating system. Today it also executes
under Windows 2000. The initial release of Teradata on the Microsoft
systems is for a single node.
When using the UNIX operating system, Teradata supports up to
512 nodes. This massively parallel system establishes the basis for
Teradata storing and retrieving data from the largest commercial
databases in the world. Today, the largest system in the world
consists of 176 nodes. There is much room for growth as the
databases begin to exceed 40 or 50 terabytes.
For the NCR UNIX systems, the Message Passing Layer is called
the BYNET. The amazing thing about the BYNET is its capacity.
Instead of a fixed bandwidth that is shared among multiple nodes,
the bandwidth of the BYNET increases as the number of nodes
increases. This feat is accomplished as a result of using virtual
circuits instead of using a single fixed cable or a twisted-pair
configuration.
To understand the workings of the BYNET, think of a telephone
switch used by local and long distance carriers. As more and more
people place phone calls, no one needs to speak slower. As one
switch becomes saturated, another switch is automatically used. When
your phone call is routed through a different switch, you do not
need to speak slower. If a natural or other type of disaster occurs
and a switch is destroyed, all subsequent calls are routed through
other switches. The BYNET is designed to work like a telephone
switching network.
An additional aspect of the BYNET is that it is really two
connection paths, like having two phone lines for a business. The
redundancy allows for two different aspects of its performance. The
first aspect is speed. Each path of the BYNET provides bandwidth of
10 megabytes (MB) per second with Version 1 and 60MB per second
with Version 2. Therefore, the aggregate speed of the two
connections is 20MB/second or 120MB/second. However, as mentioned
earlier, the bandwidth grows linearly as more nodes are added. Using
Version 1, any two nodes communicate at 40MB/second (10MB/second * 2
BYNETs * 2 nodes). Therefore, 10 nodes can utilize 200MB/second and
100 nodes have 2000MB/second available between them. When using the
Version 2 BYNET, the same 100 nodes communicate at 12,000MB/second
(60MB/second * 2 BYNETs * 100 nodes).
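The bandwidth arithmetic above reduces to one formula, sketched here with the paragraph's own numbers (the per-path speeds of 10MB/second for Version 1 and 60MB/second for Version 2 come from the text; the function name is invented):

```python
# Aggregate BYNET bandwidth = per-path speed * number of paths * nodes.
# The formula mirrors the worked examples in the text above.
def bynet_bandwidth_mb_per_sec(nodes, per_path_mb=10, paths=2):
    """Total MB/second available across `nodes`, given two BYNET paths."""
    return per_path_mb * paths * nodes

assert bynet_bandwidth_mb_per_sec(2) == 40                        # Version 1, 2 nodes
assert bynet_bandwidth_mb_per_sec(100) == 2000                    # Version 1, 100 nodes
assert bynet_bandwidth_mb_per_sec(100, per_path_mb=60) == 12000   # Version 2, 100 nodes
```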
The second and equally important aspect of the BYNET uses the
two connections for availability. Regardless of the speed associated
with each BYNET connection, if one of the connections should fail,
the second is completely independent and can continue to deliver
messages on its own. Even a single connection is faster than many
normal networks that typically transfer messages at 10MB per second.
All messages going across the BYNET offer guaranteed delivery.
So, any messages not successfully delivered because of a failure on
one connection automatically route across the other connection.
Since half of the BYNET is not working, the bandwidth reduces by
half. However, when the failed connection is returned to service,
its topology is automatically configured back into service and it
begins transferring messages along with the other connection. Once
this occurs, the capacity returns to normal.
Posted 21st March 2013 by pankaj agarwal
21st March 2013
AMPs
The Access Module Processor (AMP) is the heart of the Teradata
RDBMS. The AMP is a virtual processor (vproc) that provides a BYNET
interface and performs many database and file management tasks.
AMPs control the management of the Teradata RDBMS and also provide
control over the disk subsystem, with each AMP being assigned to a
virtual disk.
Each AMP controls the following set of functions:
- BYNET (or Boardless BYNET) interface
- Database manager, which performs:
  1. Locking
  2. Joins
  3. Sorting
  4. Aggregation
  5. Output data conversion
  6. Disk space management
  7. Accounting
  8. Journaling
- File system and disk management
For each request it receives, an AMP will:
1. Lock the table.
2. Execute the operation requested.
3. End the transaction.
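The three steps above can be sketched as a simple transaction wrapper. The lock, the context manager and all names here are invented stand-ins for illustration; real AMP locking is internal to Teradata.

```python
# Illustrative lock / execute / end-transaction wrapper.
# Purely a sketch; AMP locking is internal to Teradata.
from contextlib import contextmanager
from threading import Lock

table_lock = Lock()

@contextmanager
def amp_transaction():
    table_lock.acquire()        # 1. Lock the table.
    try:
        yield                   # 2. Execute the operation requested.
    finally:
        table_lock.release()    # 3. End the transaction.

with amp_transaction():
    result = 2 + 2              # stand-in for the requested operation
```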
Posted 21st March 2013 by pankaj agarwal
20th March 2013
Architecture of Teradata RDBMS
Teradata is designed using a shared-nothing architecture. Each
processing unit processes its own unit of data in parallel. Teradata
systems can be either SMP (Symmetric Multi-Processing) or MPP
(Massively Parallel Processing). In simple words, an SMP system is a
single-node system, whereas an MPP system has two or more nodes
working in parallel.
The Teradata architecture contains the following components:
1) Node
2) VPROC
3) PE
4) AMP
5) BYNET
Architecture Components
Node
The basic building block for a Teradata system, the node is
where the processing occurs for the database. A node is simply a
collection of many hardware and software components.
PDE
The PDE (Parallel Database Extensions) software layer runs on
the operating system of each node. It was created by NCR to support
the parallel environment.
A node includes:
- Operating system software
- Teradata software
- Application software
- System dump space
Teradata database tables are stored on disk arrays, not on the
system disks.
Memory
Vprocs share a free memory pool within a node. A segment of
memory is allocated to a vproc for its use,
then returned to the memory pool for use by another vproc. The
free memory pool is a collection of
memory available to the node.
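The borrow-and-return behaviour of the free memory pool can be sketched minimally. The segment names and the list-based pool are invented for illustration; node memory management is internal to Teradata.

```python
# Minimal sketch of a shared free-memory pool: a vproc borrows a
# segment, uses it, and returns it for other vprocs.
free_pool = ["seg0", "seg1", "seg2"]

def allocate():
    """A vproc takes a segment from the node's free pool."""
    return free_pool.pop()

def release(segment):
    """The segment goes back to the pool for another vproc."""
    free_pool.append(segment)

seg = allocate()
assert len(free_pool) == 2   # segment is in use by one vproc
release(seg)
assert len(free_pool) == 3   # available again to any vproc
```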
Vproc
A virtual processor or a vproc is a group of one or more
software processes running under the operating
system's multi-tasking environment:
- On the UNIX operating system, a vproc is a collection of software processes.
- On the Windows operating systems, a vproc is a single software process.
The two types of Teradata vprocs are:
- AMP (Access Module Processor)
- PE (Parsing Engine)
When vprocs communicate, they use BYNET hardware (on MPP
systems), BYNET software, and PDE.
The BYNET hardware and software carry vproc messages to and from
a particular node. Within a node,
the BYNET and PDE software deliver messages to and from the
participating vprocs.
PE
PEs (Parsing Engines) are vprocs that receive SQL requests from
the client and break the requests into
steps. The PEs send the steps to the AMPs and subsequently
return the answer to the client.
AMP
AMPs (Access Module Processors) are virtual processors (vprocs)
that receive steps from PEs (Parsing
Engines) and perform database functions to retrieve or update
data. Each AMP is associated with one
virtual disk (vdisk), where the data is stored. An AMP manages
only its own vdisk, not the vdisk of any
other AMP.
The vdisk is made up of 1 to 64 pdisks (user slices in UNIX or
partitions in Windows NT, whose size and
configuration vary based on RAID level). The pdisks logically
combine to comprise the AMP's vdisk.
Although an AMP can manage up to 64 pdisks, it controls only one
vdisk.
BYNET
The BYNET (banyan network) is a combination of hardware and
software that provides high performance
networking between the nodes of a Teradata system. A
dual-redundant, bi-directional, multi-staged
network, the BYNET enables the nodes to communicate in a high
speed, loosely-coupled fashion. It is
based on banyan topology, a mathematically defined structure
that has branches reminiscent of a banyan
tree.
The BYNET is a high-speed interconnect (network) that enables
multiple nodes in the system to
communicate.
The BYNET hardware and software handle the communication between
the vprocs.
Hardware: The nodes of an MPP system are connected with the
BYNET hardware, consisting of BYNET boards and cables.
Software: The BYNET software is installed on every node. This
BYNET driver is an interface between the PDE software and the BYNET
hardware.
SMP systems do not contain BYNET hardware. The PDE and BYNET
software emulate BYNET activity in a single-node environment. The
SMP implementation is sometimes called "boardless BYNET."
[http://4.bp.blogspot.com/-14j5RcK_QIo/UUnlAZjUL6I/AAAAAAAADsQ/hl3mDm8W5uU/s1600/TD_Architcture.png]
Posted 20th March 2013 by pankaj agarwal