Accelerating Reporting and Advanced Analytics
Follow the conversation on Twitter: @Kognitio
22 August 2013
Dec 04, 2014
Web Briefing Agenda
• Why Data Warehouses need Acceleration
• The “Achilles Heel” of Teradata Systems
• Demonstration: Analytical Acceleration for Teradata systems – lightning queries
• Complementing the EDW and enabling Hadoop
• Summary, Question & Answer Session
Tera‐Tom Author of over 50 Books
Tera‐Tom books are so popular because even a seven year old boy (raised by wolves) can understand them.
“To have everything is to possess nothing.”‐ Buddha
“To have every database is to possess Nexus.”‐ IT Buddha
What is Parallel Processing?
Two guys were having fun on a Saturday night when one said, “Got to go and do my laundry”. The other said, “What?” The man explained that if he went to the laundromat the next morning he would be lucky to get one machine and would be there all day, but if he went on Saturday night he could get all the machines. Then he could do all his wash and dry in two hours. Now that’s parallel processing mixed in with a little dry humor!
“After enlightenment, the laundry”‐ Zen Proverb
“After parallel processing the laundry, enlightenment!”‐Teradata Zen Proverb
Tera‐Tom’s Parallel Processing Wash and Dry
Start Small and Think Big
Teradata was born to be parallel and with each query a single step is performed in parallel by each AMP. A Teradata system consists of a series of AMPs that will work in parallel to store and process your data. This design allows you to start small and grow infinitely. If your Teradata system provides you with an excellent Return On Investment (ROI) then continue to invest by purchasing more AMPs. Most companies start small, but after seeing what Teradata can do they continue to grow their ROI from the single step of implementing a Teradata system to millions of dollars in profits.
[Diagram: a Parsing Engine connected through BYNET 0 and BYNET 1 to a row of AMPs]
“A Journey of a thousand miles begins with a single step.”‐ Lao Tzu
How Teradata Creates Traditional Tables
Notice the last line of the CREATE TABLE example below: EmpNo is defined as the Primary Index. This means that the rows loaded into the Employee_Table will be hashed and distributed to the AMPs based solely on the value in each row's EmpNo column. EmpNo is therefore responsible for the distribution, and if a user's SQL supplies a specific employee number (EmpNo) then only one AMP will be contacted to find the row.
CREATE TABLE Employee_Table
( EmpNo       INTEGER
 ,Dept_No     INTEGER
 ,First_Name  VARCHAR(12)
 ,Last_Name   CHAR(20)
 ,Salary      DECIMAL(10,2)
) UNIQUE PRIMARY INDEX (EmpNo) ;
When a table is created the Primary Index is defined. 90% of your tables will use this design. Choosing the best column for the Primary Index is your number one strategy.
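The distribution described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four-AMP system size is invented, and `zlib.crc32` with a modulo stands in for Teradata's real hash formula and hash map, which the slides do not describe.

```python
import zlib

NUM_AMPS = 4  # hypothetical system size, chosen for illustration


def amp_for(primary_index_value, num_amps=NUM_AMPS):
    """Map a Primary Index value to an AMP number.

    zlib.crc32 is only a stand-in for Teradata's real hash formula
    and hash map, which the text does not describe.
    """
    row_hash = zlib.crc32(str(primary_index_value).encode("utf-8"))
    return row_hash % num_amps


def distribute(emp_nos, num_amps=NUM_AMPS):
    """Place each row on the AMP chosen by hashing its EmpNo."""
    amps = {amp: [] for amp in range(num_amps)}
    for emp_no in emp_nos:
        amps[amp_for(emp_no, num_amps)].append(emp_no)
    return amps


amps = distribute(range(1, 13))
for amp, rows in amps.items():
    print(f"AMP {amp}: EmpNo values {rows}")

# A lookup on the Primary Index re-hashes the value, so only one AMP is asked.
target = 8
print(f"EmpNo {target} lives on AMP {amp_for(target)}")
```

Because the same function is used for loading and for lookup, a query that supplies EmpNo never has to ask more than one AMP.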
The Table Header is created on each AMP when the table is created. That is why all AMPs have the exact same number of tables. It is like looking into a mirror.
A Table Header is Placed Immediately on Every AMP
[Diagram: three AMPs, each holding its own copy of the Employee_Table Header]
The Table Header is created on each AMP when the table is created. The rows are stored in data blocks when the data is loaded. Both are stored separately on each AMP.
When Data is Loaded it is Separated from the Table Header
[Diagram: three AMPs, each holding part of Employee_Table. First AMP: Rows 1, 4, 7, 10. Second AMP: Rows 2, 5, 8, 11. Third AMP: Rows 3, 6, 9, 12.]
Each AMP stores the rows it owns inside a data block. In the diagram, this AMP is responsible for four rows and those four rows are held in a single data block.
An AMP Stores the Rows of a Table inside a Data Block
[Diagram: one AMP holding the Sales_Table Header and a Sales_Table Data Block containing Row A, Row B, Row C and Row D]
To read data an AMP must transfer the table header and the data block from its disk into its dedicated memory, called the File Segment (FSG) Cache.
To Read a Data Block an AMP Moves the Block into Memory
[Diagram: one AMP with the Sales_Table Header and a data block (Rows A to D) on disk, and a copy of the header and the data block in FSG Cache Memory]
A Full Table Scan means that all AMPs must transfer their data block from their disk into their FSG Cache memory and then each AMP must read each row from the table starting from the first row they own to the last row.
A Full Table Scan Means All AMPs must Read All Rows
[Diagram: AMPs 1 to 4, each holding the Sales_Table Header and one data block on disk (Rows A to D, E to H, I to L and M to P respectively), with each header and data block copied into that AMP's FSG Cache Memory]
To read or write data an AMP must move the data block from disk into its FSG Cache Memory. This is the Achilles heel of the system and it is painfully slow!
The “Achilles Heel” or Slowest Process is Block Transfer
[Diagram: one AMP moving the Sales_Table data block (Rows A to D) from disk into FSG Cache Memory alongside the table header]
A good physical database design limits the block movement and the reading of entire blocks.
How is this done? Read on!
Each table chooses a column to be the Primary Index. When users query a table and use the Primary Index column in their SQL only a “Single AMP” is used.
Each Table has a Primary Index
Each table chooses a column to be the Primary Index. The Primary Index column is used to distribute the rows among the AMPs and it’s how each AMP sorts the rows.
[Diagram: AMP 1 holds the Employee_Table Header, Primary Index (EmpNo), and a data block with EmpNo 1 to 4; AMP 2 holds the same header and a data block with EmpNo 5 to 8]
Choosing a good Primary Index results in only a “Single AMP” being used in the query.
A Query Using the Primary Index is a Single AMP Retrieve.
User: “I need information on employee 8.”
[Diagram: AMP 1 holds the block with EmpNo 1 to 4 and AMP 2 holds the block with EmpNo 5 to 8. Only AMP 2 moves its Employee_Table Header and data block into FSG Cache Memory.]
A Teradata table can have trillions of rows so an individual AMP might have millions or even billions of rows for a single table. As rows of a table are inserted inside a block the block grows. Once a block reaches a maximum size it splits into two smaller blocks.
As Rows are Added a Data Block will Eventually Split
[Diagram: one AMP holding the Sales_Table Header and two data blocks. Sales_Table Data Block 1: Rows A to D. Sales_Table Data Block 2: Rows E to H.]
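A block split can be sketched as follows. The sketch is illustrative only: the four-row limit is hypothetical (real blocks are sized in bytes, not rows), and splitting at the midpoint is an assumption, since the slide says only that a full block splits into two smaller blocks.

```python
import bisect

MAX_ROWS_PER_BLOCK = 4  # hypothetical limit; real blocks are sized in bytes


def insert_row(blocks, row_key):
    """Insert row_key into a list of sorted blocks, splitting an overfull block."""
    if not blocks:
        blocks.append([row_key])
        return
    # Pick the last block whose first key is <= row_key (or the first block).
    first_keys = [block[0] for block in blocks]
    index = max(bisect.bisect_right(first_keys, row_key) - 1, 0)
    block = blocks[index]
    bisect.insort(block, row_key)
    if len(block) > MAX_ROWS_PER_BLOCK:
        # Assumed policy: split at the midpoint into two smaller blocks.
        middle = len(block) // 2
        blocks[index:index + 1] = [block[:middle], block[middle:]]


blocks = []
for key in "ABCDEFGH":
    insert_row(blocks, key)
print(blocks)
```

The rows stay in sorted order across the blocks, which is what later allows an AMP to pick out the one block that can contain a requested row.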
A Full Table Scan means that all AMPs must transfer their data blocks from their disk into their FSG Cache memory and then read each block to evaluate every row, from the first row they own to the last. Each AMP in the diagram processes two blocks, so there is twice the transfer.
A Full Table Scan Means All AMPs must Read All Blocks
[Diagram: AMPs 1 to 3, each holding the Sales_Table Header plus Block 1 and Block 2 on disk, with the header and Block 1 currently in FSG Cache Memory]
Parsing Engine: “Here is the plan, AMPs. This is a Full Table Scan of the Sales_Table. You should each have two blocks. Transfer the blocks to your FSG Cache one at a time and send me the results.”
AMP 2 was contacted and told to only transfer the block that has EmpNo 12. Now you see the importance of each AMP sorting their rows to limit transferring each block.
A Primary Index Query uses a Single AMP and Single Block
User: “I need information on employee 12.”
[Diagram: AMP 1 holds Block 1 (EmpNo 1 to 4) and Block 2 (EmpNo 5 to 8); AMP 2 holds Block 1 (EmpNo 9 to 12) and Block 2 (EmpNo 13 to 16). Only AMP 2 moves the Employee_Table Header and Data Block 1 into FSG Cache Memory.]
Parsing Engine: “Here is the plan, AMP 2. I know you have EmpNo 12 because EmpNo is the Primary Index. You should have two blocks. Only transfer the block holding EmpNo 12 to your FSG Cache and send me the results.”
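The difference in block transfers between the two access paths can be counted with a small Python sketch. The block layout is copied from the slide; `owning_amp` is a hypothetical stand-in for hashing the Primary Index value, because the slides do not give the hash formula.

```python
import bisect

# Layout taken from the slide: two AMPs, two sorted blocks each.
AMPS = {
    1: [[1, 2, 3, 4], [5, 6, 7, 8]],
    2: [[9, 10, 11, 12], [13, 14, 15, 16]],
}


def owning_amp(emp_no):
    """Stand-in for hashing the Primary Index value to an AMP.

    The slide does not give the hash, so this simply searches the layout.
    """
    for amp, blocks in AMPS.items():
        if any(emp_no in block for block in blocks):
            return amp
    raise KeyError(emp_no)


def primary_index_lookup(emp_no):
    """Return (row, blocks_transferred), touching one AMP and one block."""
    blocks = AMPS[owning_amp(emp_no)]
    first_keys = [block[0] for block in blocks]
    index = bisect.bisect_right(first_keys, emp_no) - 1
    block = blocks[index]  # the only block moved into cache
    return (emp_no if emp_no in block else None), 1


def full_table_scan(emp_no):
    """Return (row, blocks_transferred), reading every block on every AMP."""
    found, transferred = None, 0
    for blocks in AMPS.values():
        for block in blocks:
            transferred += 1
            if emp_no in block:
                found = emp_no
    return found, transferred


print(primary_index_lookup(12))  # one block transferred
print(full_table_scan(12))       # all four blocks transferred
```

Keeping the rows sorted is what makes the binary search over the blocks' first keys possible, so the AMP can skip every block that cannot hold the requested row.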
Each AMP has the same table header, but contains different data rows for each table. Some tables are huge, like the Order_Table. As more and more data was loaded it performed 12 block splits. The Customer_Table is smaller and contains only one block.
Each AMP Can Have Many Blocks for a Single Table
[Diagram: AMP 1 through AMP N, each holding an Order_Table Header with many data blocks and a Customer_Table Header with a single data block]
A Full Table Scan means that all AMPs must transfer their data blocks from their disk into their FSG Cache memory and then read each block to evaluate every row, from the first row they own to the last. Each AMP in the diagram processes 24 blocks, so there is far more transfer than in the two-block example.
A Full Table Scan Means All AMPs must Read All Blocks
[Diagram: AMPs 1 to 3, each holding the Order_Table Header and its data blocks on disk, with the header and Block 1 currently in FSG Cache Memory]
Parsing Engine: “Here is the plan, AMPs. This is a Full Table Scan of the Order_Table. You should each have 24 blocks. Transfer the blocks to your FSG Cache one at a time and send me the results.”
How Teradata Creates a PPI Table
In the example below, the first part of the CREATE TABLE statement looks just like the previous example, but the latter part contains the words “Partition By”. This table’s rows will still be distributed among the AMPs via the Primary Index of Order_Number, but the AMPs won’t sort by Order_Number. Each AMP will sort its rows by the partition, which is the month of the Order_Date. Look at the next page to see a visual of the AMPs and their sorting of millions of rows.
A Partitioned Primary Index (PPI) table has a Primary Index that distributes the rows among the AMPs, but the rows are not sorted by the Primary Index. Instead each AMP is instructed to sort the rows it owns by the Partition.
CREATE TABLE Order_Table
( Order_Number     INTEGER
 ,Customer_Number  INTEGER
 ,Order_Date       DATE
 ,Order_Total      DECIMAL(10,2)
) PRIMARY INDEX (Order_Number)
PARTITION BY RANGE_N (Order_Date BETWEEN DATE '2013-01-01' AND DATE '2013-12-31'
                      EACH INTERVAL '1' MONTH) ;
PPI Table Sorting the Rows by Month of Order_Date
Each AMP above sorts their rows by Month (of Order_Date), so if a user queries and only wants to see the orders placed in March then each AMP just transfers the blocks with March orders. This is an all AMP retrieve, but each AMP only has to retrieve from a single partition, which is the March Partition.
[Diagram: AMPs 1 to 4, each holding its Order_Table rows sorted into twelve monthly partitions, January through December]
All AMPs are used to satisfy the query, but each AMP only reads a portion of its rows.
An All AMPs Retrieve By Way of a Single Partition
User: “I need a report on all orders placed in the month of March.”
Parsing Engine: “Calling all AMPs. Do NOT do a Full Table Scan! You should each have 12 blocks (one per month). Move your March Partition block into your FSG Cache. Give me all March Orders.”
[Diagram: AMP 1 through AMP N, each holding the Order_Table Header and twelve monthly blocks (Jan to Dec) on disk. Only the header and the Mar data block are moved into each AMP's FSG Cache Memory.]
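Partition elimination on a PPI table can be sketched as below. Everything in the sample data is invented for illustration (three AMPs, 72 orders, six per month), and the modulo on Order_Number is a stand-in for Teradata's hash of the Primary Index.

```python
from collections import defaultdict
from datetime import date

NUM_AMPS = 3  # hypothetical system size


def load(orders, num_amps=NUM_AMPS):
    """Distribute orders by Order_Number, then group them by month on each AMP.

    The modulo is a stand-in for Teradata's hash of the Primary Index.
    """
    amps = [defaultdict(list) for _ in range(num_amps)]
    for order_number, order_date, total in orders:
        amp = amps[order_number % num_amps]
        amp[order_date.month].append((order_number, order_date, total))
    return amps


def orders_in_month(amps, month):
    """All-AMP retrieve that reads only one partition per AMP."""
    rows, partitions_read = [], 0
    for amp in amps:
        if month in amp:
            partitions_read += 1
            rows.extend(amp[month])
    return rows, partitions_read


# Made-up sample data: six orders in each month of 2013.
orders = [(n, date(2013, (n - 1) // 6 + 1, 1), 100.0) for n in range(1, 73)]
amps = load(orders)
march_rows, partitions_read = orders_in_month(amps, 3)
total_partitions = sum(len(amp) for amp in amps)
print(f"{len(march_rows)} March orders, "
      f"{partitions_read} of {total_partitions} partitions read")
```

Every AMP still takes part in the query, but each one touches a single partition instead of all twelve.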
The two tables above contain the same Employee data, but the bottom example is a Columnar table. Employee_Normal has 3 rows on each AMP with 5 columns. Employee_Columnar is split into 5 containers and each container has one column.
What does a Columnar Table look like?
[Diagram: AMP 1 and AMP 2, each showing an Employee_Normal table on top and an Employee_Columnar table below]
CREATE TABLE Employee_Columnar
( Emp_No      INTEGER
 ,Dept_No     INTEGER
 ,First_Name  VARCHAR(20)
 ,Last_Name   CHAR(20)
 ,Salary      DECIMAL(10,2)
) NO PRIMARY INDEX
PARTITION BY COLUMN ;
The normal table on top is one block containing three rows and five columns. The columnar table below has five blocks each containing one column of three rows. Columnar tables are better when users query just a few columns and not all columns.
A Comparison of Data for Normal Vs. Columnar
AMP 1

Employee_Normal (one block holding whole rows)
Emp_No  Dept_No  First_Name  Last_Name  Salary
101     200      Hitesh      Patel      80000
102     300      Maria       Garcia     75000
106     100      Squiggy     Jones      45000

Employee_Columnar (the same data, with each column held in its own container block)
Emp_No  Dept_No  First_Name  Last_Name  Salary
101     200      Hitesh      Patel      80000
102     300      Maria       Garcia     75000
106     100      Squiggy     Jones      45000
All rows come back, but only two columns. We moved less than half the block volume.
A Columnar Table is Best for Queries with Few Columns
User: “I need a report of only last names and their salaries.”
Parsing Engine: “Calling all AMPs. You should each have 5 container blocks in your table named Employee_Columnar. Move only your Last_Name and Salary container blocks into your FSG Cache. Give me all Last Names and Salaries.”
[Diagram: AMP 1 holds the containers for employees 101 Hitesh Patel, 102 Maria Garcia and 106 Squiggy Jones; AMP N holds the containers for employees 104 Sally Mars, 105 Bobby Kent and 107 Sara Davis. On each AMP only the Employee_Columnar Header and the Last_Name and Salary containers are moved into FSG Cache Memory.]
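The saving can be sketched by counting containers. The rows are the AMP 1 rows from the slide; modelling each container as a Python list and counting containers rather than bytes are simplifications made for this illustration.

```python
# Rows for one AMP, copied from the slide.
ROWS = [
    (101, 200, "Hitesh", "Patel", 80000),
    (102, 300, "Maria", "Garcia", 75000),
    (106, 100, "Squiggy", "Jones", 45000),
]
COLUMNS = ["Emp_No", "Dept_No", "First_Name", "Last_Name", "Salary"]

# Columnar layout: one container per column.
containers = {
    name: [row[i] for row in ROWS] for i, name in enumerate(COLUMNS)
}


def select_columnar(wanted):
    """Read only the containers named in the query and rebuild the rows."""
    moved = {name: containers[name] for name in wanted}
    rows = list(zip(*(moved[name] for name in wanted)))
    return rows, len(moved)


rows, containers_moved = select_columnar(["Last_Name", "Salary"])
print(rows)
print(f"{containers_moved} of {len(containers)} containers moved")
```

With the row layout, the same query would have to move the whole block, including the three columns it never uses.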
Teradata has Secondary Indexes
The Base Table has a Primary Index of Last_Name. The USI was created on EmpNo and the NUSI on First_Name. The USI rows are hashed to different AMPs, but the NUSI rows are AMP local. Both subtables contain the same Base Table Row‐IDs.
[Diagram: three AMPs, each holding part of the Emp_Intl base table (nine rows with Row‐IDs 1,1 through 9,1), a USI subtable that maps EmpNo values 1001 to 1009 to base table Row‐IDs, and a NUSI subtable that maps First_Name values (Rafael, Maria, Charl, Kyle, Rob, Inna, Sushma, Mo) to base table Row‐IDs. USIs are hashed; NUSIs are AMP local.]
Teradata Join Quiz
Do you know which statement below is false?
Which Statement is NOT True?
1. Each Table in Teradata has a Primary Index, unless it is a NoPI table.
2. The Primary Index is the mechanism that allows Teradata to physically distribute the rows of a table across the AMPs using a Hash Formula and the Hash Map.
3. Each AMP Sorts its rows by the Row‐ID, unless it is a Partitioned table, and then it sorts first by the Partition and then by Row‐ID which is actually the Row Key.
4. For two rows to be Joined together Teradata insists that both rows are physically on the same AMP.
5. Teradata will either Redistribute one or both of the tables or Duplicate the smaller table across all AMPs to ensure that the matching rows are on the same AMP in FSG Cache. Once the matching rows are on the same AMP the join can take place.
The CustNo values (1 to 6, shown in red) are the join condition (PK/FK). Each customer has placed one order. The matching join rows are on different AMPs because the tables were distributed by different Primary Indexes. How will Teradata get the joining rows on the same AMP? It will redistribute the Order_Table by CustNo in FSG Cache memory.
The Joining of Two Tables

AMP 1
Customer_Table (data distributed to AMPs by Primary Index CustNo)
1  Acme Products
2  Billy’s Best Choice
3  Carling’s Cars
Order_Table (data distributed to AMPs by Primary Index OrderNo)
1000  3  '2013-01-01'  100.00
1001  5  '2013-01-01'  200.00
1002  6  '2013-01-01'  300.00

AMP n
Customer_Table (data distributed to AMPs by Primary Index CustNo)
4  Dave’s Dogs
5  Ellen’s Earrings
6  Fanny’s Fans
Order_Table (data distributed to AMPs by Primary Index OrderNo)
1003  1  '2013-01-01'  400.00
1004  2  '2013-01-01'  500.00
1005  4  '2013-01-01'  600.00

SELECT C.CustNo
      ,C.CustName
      ,O.Order_Total
FROM Customer_Table AS C
INNER JOIN Order_Table AS O
ON C.CustNo = O.CustNo ;

For a join to take place all joining rows must be on the same AMP together!
On all joins the matching rows must be on the same AMP so hashing is how it is done.
Teradata Moves Joining Rows to the Same AMP
User: “I need a join of the Order_Table and the Customer_Table.”
Parsing Engine: “Move your Customer_Table and Order_Table blocks into FSG Cache. Redistribute the Order_Table over the BYNET by the CustNo column. Now join the matching CustNo rows now that they’re in the same FSG Cache.”
[Diagram: on AMP 1 and AMP n, the Customer_Table and Order_Table headers and blocks are moved into FSG Cache Memory. The Order_Table rows (CustNo and Order_Total: 3 100.00, 5 200.00, 6 300.00, 1 400.00, 2 500.00, 4 600.00) are redistributed by the hash of CustNo so that each lands beside its matching Customer_Table row.]
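The redistribution step can be sketched in Python using the six customers and six orders from the slide. Two AMPs are assumed, and the modulo on CustNo is a stand-in for Teradata's hash, so the placement of individual customers differs from the slide's picture.

```python
NUM_AMPS = 2  # the slide shows AMP 1 and AMP n; two AMPs are assumed here


def amp_for(cust_no):
    """Stand-in for Teradata's hash of CustNo (not given in the text)."""
    return cust_no % NUM_AMPS


# Customer_Table is already distributed by its Primary Index, CustNo.
customers = {
    1: "Acme Products", 2: "Billy's Best Choice", 3: "Carling's Cars",
    4: "Dave's Dogs", 5: "Ellen's Earrings", 6: "Fanny's Fans",
}
# Order_Table rows as (OrderNo, CustNo, Order_Total), distributed by OrderNo.
orders = [
    (1000, 3, 100.00), (1001, 5, 200.00), (1002, 6, 300.00),
    (1003, 1, 400.00), (1004, 2, 500.00), (1005, 4, 600.00),
]

customer_amps = [dict() for _ in range(NUM_AMPS)]
for cust_no, name in customers.items():
    customer_amps[amp_for(cust_no)][cust_no] = name

# Redistribute: re-hash each order row by CustNo instead of OrderNo.
order_amps = [[] for _ in range(NUM_AMPS)]
for order in orders:
    order_amps[amp_for(order[1])].append(order)

# Each AMP now joins only its own local rows.
result = []
for amp in range(NUM_AMPS):
    for order_no, cust_no, total in order_amps[amp]:
        result.append((cust_no, customer_amps[amp][cust_no], total))

for row in sorted(result):
    print(row)
```

The local lookup `customer_amps[amp][cust_no]` never fails because both tables are now placed by the same function of CustNo, which is the point of the redistribution.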
The Join Index looks like an Answer Set, but each row is stored like a normal table in that the rows of the Join Index are spread amongst the AMPs. Users can’t query the Join Index, but the Parsing Engine gets data from the Join Index when it chooses.
Employee_Table
Employee_No  Dept_No  Last_Name   First_Name  Salary
1232578      100      Chambers    Mandee      48850.00
1256349      400      Harrison    Herbert     54500.00
2341218      400      Reilly      William     36000.00
2312225      300      Larkins     Loraine     40200.00
2000000      ?        Jones       Squiggy     32800.50
1000234      10       Smythe      Richard     32800.00
1121334      400      Strickling  Cletus      54500.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00

Department_Table
Dept_No  Department_Name
100      Marketing
200      Research and Dev
300      Sales
400      Customer Support
500      Human Resources

Join Index named EMP_DEPT_IDX
Employee_No  Dept_No  Last_Name   First_Name  Salary    Department_Name
1232578      100      Chambers    Mandee      48850.00  Marketing
1256349      400      Harrison    Herbert     54500.00  Customer Support
2341218      400      Reilly      William     36000.00  Customer Support
2312225      300      Larkins     Loraine     40200.00  Sales
1121334      400      Strickling  Cletus      54500.00  Customer Support
1324657      200      Coffing     Billy       41888.88  Research and Dev
1333454      200      Smith       John        48000.00  Research and Dev
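The rows of the join index on this slide are what an inner join of the two tables on Dept_No produces. A minimal Python sketch using the slide's data reproduces them; `None` represents the null Dept_No shown as “?”. How Teradata stores and maintains the index is not described in the slides and is not modelled here.

```python
# (Employee_No, Dept_No, Last_Name, First_Name, Salary), from the slide.
employees = [
    (1232578, 100, "Chambers", "Mandee", 48850.00),
    (1256349, 400, "Harrison", "Herbert", 54500.00),
    (2341218, 400, "Reilly", "William", 36000.00),
    (2312225, 300, "Larkins", "Loraine", 40200.00),
    (2000000, None, "Jones", "Squiggy", 32800.50),  # "?" means null
    (1000234, 10, "Smythe", "Richard", 32800.00),
    (1121334, 400, "Strickling", "Cletus", 54500.00),
    (1324657, 200, "Coffing", "Billy", 41888.88),
    (1333454, 200, "Smith", "John", 48000.00),
]
departments = {
    100: "Marketing",
    200: "Research and Dev",
    300: "Sales",
    400: "Customer Support",
    500: "Human Resources",
}

# Inner join on Dept_No: rows with no matching department are dropped.
join_index = [
    employee + (departments[employee[1]],)
    for employee in employees
    if employee[1] in departments
]

for row in join_index:
    print(row)
```

Jones (null Dept_No) and Smythe (Dept_No 10, which has no department row) drop out, which is why the join index holds seven rows rather than nine.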
Teradata has a complex and intensive Traffic System
Imagine our highways with only one lane or our roads with no stop signs or lights. Teradata has the most sophisticated traffic system in the industry. Teradata allows for rules, times, delays, green lights to query and red lights to wait. Why put a long‐haul trucker with an oversized load in the fast lane? Marathon runners don’t run at the same speed as sprinters, so you need to give your fastest speeds to your tactical queries and slower speeds to your batch processing. Teradata Active System Management (TASM) controls the query traffic so users can take the route less traveled.
“Two roads diverged in a wood and I took the one less traveled by, and that has made all the difference.”‐ Robert Frost
Teradata Viewpoint
Teradata allows your queries to float like a butterfly and not sting at all! This is because Viewpoint gives the DBA and the users their own view of their Teradata world so everyone knows exactly what is going on with the system.
“A man who views the world at 50 the same as he did at 20 has wasted 30 years of his life.”‐Muhammad Ali
[Screenshot: Teradata Viewpoint “TDEXPRES Health Details” panel at 7:21 PM, with columns for Metric, Value vs Threshold, and Last 30 Min Value]
Metric             Value
CPU UTILIZATION    24.67%
  USER             22.61%
  SYSTEM           2.057%
WAIT IO            19.89%
AMP CPU SKEW       0%
AMP IO SKEW        16.67%
AMP WORKER TASKS   1.724%
DBC DISK SPACE     38.57%
ACTIVE SESSIONS    85.44%
Have A Multi‐Vendor Data Warehouse
Teradata is unique just like every other vendor. Implement your Teradata warehouse, but take advantage of In‐memory vendors, Columnar vendors, Appliances, Amazon’s cloud technology, your existing OLTP and Mainframe systems and even your smaller databases such as MySQL and even Excel. And most importantly implement a Hadoop system in conjunction with all of your other systems. This industry has never moved forward faster than today and every vendor listed above is a serious contender bringing their own unique technology into your enterprise.
“Always remember that you are unique just like everyone else.”‐ Anonymous
[Diagram: the multi‐vendor landscape, showing Teradata, Mainframe, DB2, Oracle, Kognitio, Redshift, Vertica, SQL Server, PDW, MySQL, Hadoop, Excel, Netezza and Greenplum, with Teradata, Kognitio (Memory) and Hadoop highlighted together]
Retail Demo Data

T_RET_SALE
  SALEDATE              DATE
  SALETIME              TIME
  BASKETNO              BIGINT
  PRODNO                SMALLINT
  PRICE                 SMALLINT
  STORENO               SMALLINT
  TILLNO                TINYINT
  SALEWEEK              TINYINT

T_RET_PRODUCT
  PRODNO                SMALLINT
  SECTION_NO            TINYINT
  GROUP_NO              TINYINT
  DEPT_NO               TINYINT
  PRODUCT_NAME          CHAR(30)

T_RET_PROD_SECTION
  SECTION_NO            INTEGER
  SECTION_NAME          CHAR(35)
  GROUP_NO              INTEGER

T_RET_PROD_GROUP
  GROUP_NO              TINYINT
  GROUP_NAME            CHAR(20)
  DEPT_NO               TINYINT

T_RET_PROD_DEPT
  DEPT_NO               TINYINT
  DEPT_NAME             CHAR(25)

T_RET_STORE
  STORENO               TINYINT
  STORENAME             CHAR(20)
  STOREREGION           CHAR(30)
  STORENIGRID           CHAR(6)
  STORELAT              DECIMAL(10,6)
  STORELONG             DECIMAL(10,6)

T_RET_DATES
  SALEDATE              DATE
  SALE_DOW_NO           TINYINT
  SALE_DOW              CHAR
  SALE_WEEK             TINYINT
  SALE_MONTH_NO         TINYINT
  SALE_MONTH            CHAR
  SALE_QUARTER          TINYINT
  SALE_YEAR             SMALLINT
  SALE_YEAR_WEEK_NO     INT
  SALE_YEAR_MONTH_NO    INT
  SALE_YEAR_QUARTER_NO  INT

Demo Data
• UK grocery chain
• EPOS - items in baskets through cash register at stores
Teradata Demo Platform
• 1 Amazon GPU Instance cg1.4xlarge
  – 33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core with hyperthread)
  – 2 x 840GB Disks
  – 23GB RAM
• Teradata Software Release 14.00
  – 276GB 2 AMP system
  – 100GB per AMP
  – 145GB free space used for query processing (spool)
• 525 million POS Transactions
  – 70 Stores
  – 6 months

Kognitio Demo Platform
• 6 HP BL465c-G7 blade system
• Each BL465c blade:
  – 2 x 12 core AMD Opteron 2.2GHz
  – 2 x 600GB Disks
  – 128GB RAM
• End-user capacity
  – 680 GB RAM
• Kognitio Software release 7.2
• 3 billion POS Transactions
  – 140 Stores
  – 18 months
In-Memory Analytical Acceleration
Michael Hiskey
VP of Marketing & Business Development
@mphnyc
Kognitio: Analytical Accelerator for Teradata
Comprehensive
• Real-time, full data volume, new sources, cross-correlation
Engage Big Data and enable Hadoop without changing your environment
Flexible
• Accelerate queries, enable departmental self-service for every departmental need
Universal
• Standardize connections without custom coding
Analytical Platform: “The Golden Layer”
[Diagram: three layers. Application & Client Layer: All BI Tools, All OLAP Clients, Excel, Reporting. Analytical Platform Layer: Kognitio, with optional Near-line Storage (Kognitio Storage). Persistence Layer: Hadoop Clusters, Enterprise Data Warehouses, Legacy Systems, Cloud Storage.]
Performance acceleration with Kognitio
[Chart: Teradata vs. Kognitio. Report speed relative to median Kognitio speed, y-axis from 0 to 1.8. Legend: Max, Median, Min.]
[Chart: SQL Server vs. Kognitio. Query speed relative to median Kognitio speed, y-axis from 0 to 4.]
Bigger is better!
Big Data: Bring the Analytics TO the Data
Kognitio Hadoop Integration
• Kognitio Map/Reduce Agent uploads itself to Hadoop nodes
• Query passes selections, relevant predicates
• Data filtering & projection locally on each node
• Data filtered as it is read from file(s)
• Only data of interest is transferred and loaded into memory via parallel load streams
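The idea in the bullets above, filtering and projecting on each node so that only the data of interest is transferred, can be sketched generically in Python. This is not Kognitio's agent or its API: the function names, record layout and sample data are all hypothetical, and the loop over nodes runs sequentially where the real system uses parallel load streams.

```python
def node_scan(records, predicate, columns):
    """Runs on one node: filter rows as they are read, keep only wanted columns."""
    for record in records:
        if predicate(record):
            yield tuple(record[column] for column in columns)


def run_query(nodes, predicate, columns):
    """Collect the already-filtered rows from every node into memory."""
    loaded = []
    for records in nodes:  # sequential stand-in for parallel load streams
        loaded.extend(node_scan(records, predicate, columns))
    return loaded


# Made-up sample data spread over two nodes.
nodes = [
    [{"store": 1, "product": "tea", "price": 250},
     {"store": 2, "product": "milk", "price": 90}],
    [{"store": 1, "product": "bread", "price": 120},
     {"store": 3, "product": "tea", "price": 260}],
]

loaded = run_query(nodes, lambda record: record["store"] == 1,
                   ["product", "price"])
total_records = sum(len(records) for records in nodes)
print(f"{len(loaded)} of {total_records} records transferred: {loaded}")
```

Because the predicate and the column list travel to the node, the rows and columns the query does not need never leave it.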
Kognitio
Kognitio is focused on providing the premier high‐performance analytical platform to power business insight around the world
• Kognitio invented the in‐memory analytical platform, first taking it to market in 1989
• Privately held
• Labs in the UK ‐ HQ in New York, NY
Analytical Acceleration for Teradata
Analytical Acceleration: www.kognitio.com/accelerate
Nexus Query Chameleon: www.coffingdw.com/software/nexus/
Request an Assessment Meeting: www.kognitio.com/meeting
connect
www.kognitio.com
twitter.com/kognitio
linkedin.com/companies/kognitio
tinyurl.com/kognitio
youtube.com/kognitio
+1 855 KOGNITIO