Ingres/VectorWise Doug Inkster – Ingres Development.
Post on 15-Dec-2015
229 Views
Preview:
Transcript
Abstract
Ingres Corp. recently announced a cooperative project with the database research firm VectorWise of Amsterdam. This session discusses the nature of the relationship between Ingres and VectorWise and its impact on Ingres users
Overview
What is VectorWise Significance of project (to existing users) Column store v.s. row store VectorWise innovations Ingres/VectorWise interface details
VectorWise
Small Dutch startup spun from CWI Currently 6-7 employees Exciting development project based on Ph. D. research of
Marcin Zukowski under guidance of Peter Boncz Ingres provided seed money and currently has exclusive
rights to their technology
New Ingres Storage Type
VectorWise technology will take the form of new Ingres storage type
Column store – but turbo-charged Users will just define tables as usual, but with new
storage type Extremely fast
Performance in Traditional RDBMS
TPC H query 1: 6 million rows MySQL: 26.2 seconds DBMS x: 28.1 seconds Hand coded C program: 0.2 seconds
Initiated thought about why traditional RDBMS is so slow
Row Stores – Storage Model
Row store : data stored on disk and processed by query engine in the form of rows– Ingres and all other commercial RDBMS’ follow same
model– Data stored on disk as rows packed into blocks/pages– Extracted from pages and passed between query plan
operations (joins, sorts, etc.) as rows Techniques improved over time, but same basic
model as research engines of 1970’s (System R, Berkeley Ingres)
Row Stores – Storage Model
Select a, b from c:– Even if 200 columns in c, reads whole row image for all
qualified rows– Inefficient use of disk (excessive I/O)– Inefficient use of memory (large row-sized buffers)
Row Stores – Execution Model
Select a*b from c where d+e >= 250:– Execution model turns column expressions into
pseudo-code (ADF in Ingres)– Pseudo-code evaluation almost always by
interpretation – one row at a time– For each instruction, operand addresses are built
(using known buffers) – all row induced overhead– Big switch on “opcode” (integer addition, float
compare, type coercions, etc.)– Very poor locality (code and data), very inefficient– No benefit from modern hardware features (cache,
instruction pipelines, etc.)
Row Stores - Performance
Poor disk bandwidth High instructions per tuple Poor locality (data and instructions), poor exploitation
of newer hardware – high cycles per instruction Extremely high cycles per tuple!! Designed for OLTP, not large scale analytical
processing
Column Stores – Storage Model
Derived from research in 1990’s One of the earliest was MonetDB – Peter Boncz’s
thesis Data stored in columns (all values from a column in
contiguous storage) Ordinal position in column store dictates row Only columns actually requested are accessed
(compare to “select a, b from c” example) Far more efficient disk usage
Column Stores – Execution Model
Some column stores (Sybase IQ, Vertica) just read columns from disk and re-compose rows in memory
Execution model same as row stores Only deal with I/O problem
MonetDB – Improved Performance
Column store improved I/O performance New execution model improved CPU performance Column wise (not row wise) computation Exploitation of newer hardware, compiler features
MonetDB – Improved Performance
CPU Efficiency depends on “nice” code– out-of-order execution– few dependencies (control,data)– compiler support
Compilers love simple loops over arrays– loop-pipelining– automatic SIMD
MonetDB – Expression Execution
SELECT id, name, (age-30)*50 as bonus
FROM people
WHERE age > 30
/* Returns vector of oid’s satisfying comparison
** and count of entries in vector. */ int
select_gt_float( oid* res,
float* column,
float val, int n)
{
for(int j=0,i=0; i<n; i++)
if (column[i] >val) res[j++] = i;
return j;
}
Compiles into loop that performs many (10, 100, 1000) comparisons in parallel
MonetDB - Problem
Operations required on whole columns at a time Lots of expensive intermediate result materialization Memory requirements Doesn’t scale into very large databases Still way faster than anything before
– Q1: MySQL – 26.2 seconds– MonetDB - 3.7 seconds
MonetDB/X100 - Innovations
Fix MonetDB scaling problem– Columns broken into “chunks” or vectors (sized to fit cache)– Expression operators execute on vectors
Most primitives take just 0.5 (!) to 10 cycles per tuple– 10-100+ times faster than tuple-at-a-time
Other performance improvements– Tuned to disk/memory/cache– Lightweight compression to maximize throughput– Cooperative scans (future)
Q1: MySQL – 26.2 seconds– MonetDB – 3.7 seconds– X100 – 0.6 seconds!
MonetDB/X100 – Hardware Trends
CPU– Increased cache, memory (64-bit)– Instruction pipelining, multiple simultaneous
instructions, SIMD (single instruction, multiple data) Disk
– Increased disk capacity– Increased disk bandwidth– Increased random access speeds – but not as much
as bandwidth– Sequential access increasing in importance
MonetDB/X100 - Innovations
Reduced interpretation overhead– 100x less Function Calls
Good CPU cache use– High locality in the primitives– Cache-conscious data placement
No Tuple Navigation– Primitives only see arrays
Vectorization allows algorithmic optimization CPU and compiler-friendly function bodies
– Multiple work units, loop-pipelining, SIMD…
Ingres/VectorWise Integration
Ingres offers:– Infrastructure– User interfaces, SQL compilation, utilities, etc.– Existing user community
VectorWise offers:– Execution engine (QEF/DMF/ADF equivalent)– Its own relational algebraic language
Ingres/VectorWise Integration
SQL query parsed, optimized in Ingres Ingres builds query plan from OPF-generated QEP
– Query plan contains cross-compiled VectorWise query syntax
QEF passes query to VectorWise engine (and gets out of the way!)
VectorWise passes result set back to Ingres to be returned to caller (array of result rows – mapped to Ingres fetch buffers, many rows at a time)
Ingres/VectorWise Integration
Currently:– Create table including unique/referential constraint
definitions (with structure = …)– Copy table– Drop table– Select– Insert– Update– Delete– Create table … as select …– Insert into … select …– Create index (VectorWise style)
Ingres/VectorWise – Initial Release
Individual queries limited to either VectorWise or Ingres native tables– Probably lifted in the near future
Seamless as possible– Integrated ingstart/ingstop– Same utilities– Same user interfaces– Same recovery processes
top related