Dutch-Belgium DataBase Day University of Antwerp, 2004.12.03 MonetDB/x100 Peter Boncz, Marcin Zukowski, Niels Nes

Dec 22, 2015

Page 1:

Dutch-Belgium DataBase Day, University of Antwerp, 2004.12.03

MonetDB/x100

Peter Boncz, Marcin Zukowski, Niels Nes

Page 2:

Introduction

What is x100?

A new query processing engine developed for MonetDB

Page 3:

Contents

Introduction
- CWI Database Group
- Motivation

MonetDB/x100 Architecture Highlights
- Optimizing CPU performance
- Exploiting cache memories
- Enhancing disk bandwidth

Conclusions

Discussion

Page 4:

CWI Database Group

Database Architecture: DBMS design, implementation, evaluation. A wide area with many sub-areas:
- data structures
- query processing algorithms
- modern computer architectures

MonetDB:
- 1994-2004 at CWI
- open-source high-performance DBMS
- future: X100, MonetDB 5.0

Page 5:

Motivation

Multimedia retrieval:
- TREC Video: 130 hours of news, growing each year
- task: search for a given text (speech recognition) or for video similar to a given image
- 3 TB of data (!)

Page 6:

Motivation

Similar areas:
- data mining
- OLAP, data warehousing
- scientific applications (astronomy, biology…)

Challenge: process really large datasets within DBMS efficiently

Page 7:

x100 Highlights

Use computer architecture to guide this talk

Page 8:

CPU

Actual data processing

Page 9:

CPU

From CISC to hyper-pipelined:
- 1986: 8086: CISC
- 1990: 486: 2 execution units
- 1992: Pentium: 2 x 5-stage pipelined units
- 1996: Pentium3: 3 x 7-stage pipelined units
- 2000: Pentium4: 12 x 20-stage pipelined execution units

Each instruction executes in multiple steps (A -> A1, …, An), which run in (multiple) pipelines.
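To make the pipelining argument concrete, here is a toy cycle-count model. It assumes an idealized in-order pipeline that issues one instruction per cycle and has no hazards other than the ones modelled; the function names are illustrative and not part of x100. The 20 stages in the example echo the Pentium4 line above.

```python
def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Every instruction runs all n_stages steps before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int, stall_cycles: int = 0) -> int:
    """Idealized pipeline: the first instruction needs n_stages cycles and each
    following instruction completes one cycle later. stall_cycles adds bubbles,
    e.g. when an instruction must wait for the result of its predecessor."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1) + stall_cycles


def dependent_chain_cycles(n_instructions: int, n_stages: int) -> int:
    """Worst case (assumed model): each instruction needs the previous result,
    waits (n_stages - 1) extra cycles, and the pipeline gives no benefit."""
    stalls = max(n_instructions - 1, 0) * (n_stages - 1)
    return pipelined_cycles(n_instructions, n_stages, stalls)


# 1000 independent instructions on a 20-stage pipeline vs. a dependent chain
print(pipelined_cycles(1000, 20))        # 1019
print(dependent_chain_cycles(1000, 20))  # 20000
print(unpipelined_cycles(1000, 20))      # 20000
```

Under this model a fully dependent instruction stream degenerates to the unpipelined cost, which is why independence matters so much on deep pipelines.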

Page 10:

CPU

But only if the instructions are independent! Otherwise, the pipeline stalls.

Problems:
- branches in program logic
- accessing recently modified memory

[ailamaki99, …]: DBMSs are bad at filling pipelines
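One common way to remove a data-dependent branch from a selection loop is predication. The sketch below is written in Python for readability, so it does not itself show a speed-up; the point is the loop structure, which in C compiles to a body without a conditional jump on the data. The function names are hypothetical, and this is not claimed to be the x100 code.

```python
from typing import List


def select_lt_branching(col: List[int], bound: int) -> List[int]:
    """Return the positions i with col[i] < bound, using an if-statement.
    In compiled code this is a data-dependent branch: when its outcome is hard
    to predict (selectivity near 50%), mispredictions flush the pipeline."""
    result = []
    for i, v in enumerate(col):
        if v < bound:
            result.append(i)
    return result


def select_lt_branch_free(col: List[int], bound: int) -> List[int]:
    """Same result without a data-dependent branch: always write the candidate
    position, then advance the output cursor by the comparison result (0 or 1)."""
    out = [0] * len(col)  # j <= i at every write, so len(col) slots suffice
    j = 0
    for i, v in enumerate(col):
        out[j] = i
        j += int(v < bound)
    return out[:j]


col = [5, 1, 8, 3, 9, 2]
print(select_lt_branching(col, 4))    # [1, 3, 5]
print(select_lt_branch_free(col, 4))  # [1, 3, 5]
```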

Page 11:

x100: vectorized processing

*(int,int): int   becomes   *(int[],int[]): int[]
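The two signatures can be sketched as follows: a scalar multiply called once per tuple, and a vectorized multiply called once per vector. This is an illustrative Python rendering with hypothetical names, not the x100 primitive library.

```python
from typing import List


def mul_int(a: int, b: int) -> int:
    """Tuple-at-a-time, *(int,int): int. The engine calls this once per tuple,
    paying function-call and interpretation overhead every time."""
    return a * b


def mul_int_vec(a: List[int], b: List[int]) -> List[int]:
    """Vector-at-a-time, *(int[],int[]): int[]. One call handles a whole vector;
    the iterations are independent, so compiler and CPU can overlap them."""
    if len(a) != len(b):
        raise ValueError("input vectors must have the same length")
    return [x * y for x, y in zip(a, b)]


print([mul_int(x, y) for x, y in zip([1, 2, 3], [4, 5, 6])])  # [4, 10, 18], three calls
print(mul_int_vec([1, 2, 3], [4, 5, 6]))                      # [4, 10, 18], one call
```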

Page 12:

x100: vectorized processing

Primitives:
- vector at a time
- very basic functionality
- independent loop iterations
- simple code

Optimization levels:
- compiler: loop pipelining
- CPU: full pipelines

*(int,int): int   becomes   *(int[],int[]): int[]
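A driver that feeds a primitive one vector at a time shows where the savings come from: the per-call overhead is paid once per vector rather than once per tuple. The vector size of 1024 and all names below are assumptions for illustration.

```python
from typing import List, Tuple

VECTOR_SIZE = 1024  # hypothetical; chosen so a few vectors fit in the CPU cache


def mul_int_vec(a: List[int], b: List[int]) -> List[int]:
    """Primitive: a simple loop with independent iterations."""
    return [x * y for x, y in zip(a, b)]


def project_mul(a: List[int], b: List[int],
                vector_size: int = VECTOR_SIZE) -> Tuple[List[int], int]:
    """Apply the primitive vector by vector; return the result and the number
    of primitive calls, i.e. how often the per-call overhead is paid."""
    out: List[int] = []
    calls = 0
    for start in range(0, len(a), vector_size):
        out.extend(mul_int_vec(a[start:start + vector_size], b[start:start + vector_size]))
        calls += 1
    return out, calls


n = 10_000
a = list(range(n))
b = [2] * n
result, calls = project_mul(a, b)
print(calls)  # 10 calls instead of 10000 tuple-at-a-time calls
```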

Page 13:

x100: results (TPC-H Q1)

Few CPU cycles per tuple; e.g., MySQL spends ~100 cycles per tuple for such operators

Page 14:

Main memory

Large, but not unlimited

Page 15:

Cache

Faster, but very limited storage

Page 16:

Cache Memory Bottleneck

Caches hide memory access cost. Different costs at different levels:
- L1 cache access: 1-2 cycles
- L2 cache access: 6-20 cycles
- main-memory access: 100-400 cycles

Consequences:
- random access into main memory is very expensive
- the DBMS must buffer for the CPU cache, not for RAM
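The consequence above follows from a standard average-memory-access-time calculation. The sketch below plugs in the upper ends of the cycle costs from this slide; the hit rates in the example are hypothetical, chosen only to contrast a cache-friendly access pattern with random access into a large table.

```python
def avg_access_cycles(l1_hit: float, l2_hit: float,
                      l1_cost: float = 2, l2_cost: float = 20,
                      mem_cost: float = 400) -> float:
    """Expected cycles per memory access in a two-level cache hierarchy.

    l1_hit: fraction of accesses served by L1.
    l2_hit: fraction of L1 misses that are served by L2.
    """
    l1_miss = 1.0 - l1_hit
    l2_miss = 1.0 - l2_hit
    return l1_cost + l1_miss * (l2_cost + l2_miss * mem_cost)


# Hypothetical hit rates: cache-friendly scan vs. random access
print(round(avg_access_cycles(0.95, 0.9), 1))    # 5.0
print(round(avg_access_cycles(0.05, 0.05), 1))   # 382.0
```

Even with these rough numbers, random access costs close to two orders of magnitude more per access than a cache-friendly pattern.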

Page 17:

Cache Memory Bottleneck

Cache-conscious query processing: MonetDB research [VLDB99,00,02,04]

Page 18:

x100: pipelining

Vectors fill the CPU cache: main-memory access only at the data sources and sinks

[Figure: X100 query processor evaluating a Project( ) with primitives *, +, - and the constant 0.19 inside the CPU cache; layers below: RAM, X100 buffer mgr, disk]

MonetDB uses much more main memory bandwidth
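The difference in main-memory traffic can be sketched by comparing where intermediate results live. Below, the projection price * (1 - discount), a TPC-H Q1-style expression used here as an example, is evaluated column-at-a-time, producing full-length intermediates as MonetDB's column-at-a-time algebra does, and vector-at-a-time as in the x100 pipeline. The vector size of 1024 and the function names are assumptions.

```python
from typing import List, Tuple


def column_at_a_time(price: List[float], disc: List[float]) -> Tuple[List[float], int]:
    """Each operator consumes and produces whole columns; the intermediate
    (1 - disc) has n entries and, for large n, must live in RAM."""
    one_minus = [1.0 - d for d in disc]
    result = [p * f for p, f in zip(price, one_minus)]
    return result, len(one_minus)


def vector_at_a_time(price: List[float], disc: List[float],
                     vector_size: int = 1024) -> Tuple[List[float], int]:
    """Operators exchange small vectors; intermediates never exceed vector_size
    entries, so they can stay in the CPU cache. Only the inputs and the final
    result touch main memory."""
    result: List[float] = []
    peak = 0
    for start in range(0, len(price), vector_size):
        p = price[start:start + vector_size]
        d = disc[start:start + vector_size]
        one_minus = [1.0 - x for x in d]  # cache-sized intermediate
        peak = max(peak, len(one_minus))
        result.extend(a * b for a, b in zip(p, one_minus))
    return result, peak


n = 5000
price = [float(i) for i in range(n)]
disc = [0.5] * n
full, full_peak = column_at_a_time(price, disc)
vec, vec_peak = vector_at_a_time(price, disc)
print(full == vec, full_peak, vec_peak)  # True 5000 1024
```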

Page 19:

x100: pipelining

Vectors fill the CPU cache: main-memory access only at the data input and output

[Figure: the X100 architecture diagram (query processor, CPU cache, RAM, X100 buffer mgr, disk), contrasting x100 with MonetDB]

Page 20:

Disk

Slow, but unlimited () storage

Page 21:

Disk

Random access is hopeless; size grows faster than bandwidth

Page 22:

x100: problem - bandwidth

MonetDB/x100 is too fast for disks: TPC-H queries need 200-600 MB/s

Page 23:

Bandwidth improvements

Three ideas:
- Vertical Fragmentation (MonetDB)
- new: Lightweight Compression
- new: Cooperative Scans

Page 24:

Vertical fragmentation

DBMS disk access in data-intensive applications

Only the relevant columns are read, reducing disk bandwidth requirements
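The saving from vertical fragmentation is simple arithmetic: a row store reads every column of each tuple, while a column store reads only the columns the query touches. The sketch below uses a hypothetical four-column table with assumed byte widths.

```python
from typing import Dict, Iterable


def bytes_scanned_rows(num_rows: int, widths: Dict[str, int]) -> int:
    """Row (N-ary) storage: every column of every tuple is read from disk."""
    return num_rows * sum(widths.values())


def bytes_scanned_columns(num_rows: int, widths: Dict[str, int],
                          used: Iterable[str]) -> int:
    """Vertically fragmented storage: only the columns used by the query are read."""
    return num_rows * sum(widths[c] for c in used)


# Hypothetical table: two 8-byte numeric columns, a 4-byte date, a 50-byte comment
widths = {"price": 8, "discount": 8, "shipdate": 4, "comment": 50}
rows = 1_000_000
print(bytes_scanned_rows(rows, widths))                            # 70000000
print(bytes_scanned_columns(rows, widths, ["price", "discount"]))  # 16000000
```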

Page 25:

Lightweight Compression

Compression is introduced not to reduce storage space but to increase disk bandwidth:
- thanks to efficient code, queries on disk-based data use only a few percent of CPU time
- part of this spare time can be spent on decompressing data

Page 26:

Lightweight Compression

Rationale:
- disk-to-RAM transfer uses DMA and does not need the CPU
- (de)compress only vector-at-a-time, when the data is needed

[Figure: X100 architecture diagram (query processor with Project( ) primitives, CPU cache, RAM, X100 buffer mgr, disk)]

(De)compress on the boundary between RAM and the CPU cache
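As an illustration of decompressing only vector-at-a-time, the sketch below uses frame-of-reference encoding: store a base value once, then each value as a small offset from it. The scheme, the 1-byte offset width and the vector size are assumptions chosen for brevity; the slides do not say which lightweight schemes x100 uses.

```python
from typing import Iterator, List, Tuple


def for_compress(values: List[int]) -> Tuple[int, bytes]:
    """Frame-of-reference: keep min(values) as the base and store every value
    as a 1-byte offset from it (the compressed form stays in RAM)."""
    base = min(values)
    offsets = [v - base for v in values]
    if max(offsets) > 255:
        raise ValueError("value range does not fit in 1-byte offsets")
    return base, bytes(offsets)


def scan_decompressed(base: int, packed: bytes,
                      vector_size: int = 1024) -> Iterator[List[int]]:
    """Decompress one vector at a time, only when the consumer asks for it,
    so the decompressed data is cache-sized rather than column-sized."""
    for start in range(0, len(packed), vector_size):
        yield [base + b for b in packed[start:start + vector_size]]


values = [1000 + (i % 200) for i in range(3000)]
base, packed = for_compress(values)
vectors = list(scan_decompressed(base, packed))
print(len(packed), [len(v) for v in vectors])  # 3000 [1024, 1024, 952]
```

Stored as 4-byte integers this column would take 12000 bytes; with 1-byte offsets it takes 3000 bytes plus the base.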

Page 27:

Lightweight Compression

Standard compression won’t do: it aims for a high compression ratio and is therefore too slow (~100 MB/s)

Research question: devise lightweight (de)compression algorithms

Results so far:
- compression factor relatively small, up to 3.5
- decompression speed: 3 GB/s (!)
- compression speed: 1 GB/s (!!!)
- perceived bandwidth 3 times bigger
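The "perceived bandwidth" result can be reasoned about with a simple model: if disk I/O and decompression overlap, the query sees the disk transfer rate multiplied by the compression factor, capped by how fast the CPU can decompress. The sketch assumes decompression speed is measured in uncompressed (output) MB/s; the disk bandwidth and the heavyweight compression ratio in the example are hypothetical.

```python
def perceived_bandwidth(disk_mb_s: float, compression_ratio: float,
                        decompress_mb_s: float) -> float:
    """Uncompressed MB/s delivered to the query processor.

    disk_mb_s: raw disk bandwidth (compressed bytes per second).
    compression_ratio: uncompressed size / compressed size.
    decompress_mb_s: decompression speed, in uncompressed MB/s.
    Assumes I/O and decompression overlap fully.
    """
    return min(disk_mb_s * compression_ratio, decompress_mb_s)


disk = 100.0  # hypothetical disk bandwidth in MB/s
print(perceived_bandwidth(disk, 3.0, 3000.0))  # 300.0: lightweight, 3x the disk
print(perceived_bandwidth(disk, 6.0, 100.0))   # 100.0: heavyweight (ratio assumed), CPU-bound
```

A higher compression ratio buys nothing once decompression, not the disk, becomes the bottleneck.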

Page 28:

Cooperative Scans

Idea: use the I/O bandwidth to satisfy multiple queries at once

Active Buffer Manager: aware of concurrent scans on the same table

Research question: devise adaptive buffer management strategies

Benefits:
- I/O bandwidth is re-used by multiple queries
- concurrent queries no longer fight for the disk arm
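The I/O saving of sharing a scan can be illustrated with a deliberately simple policy: the buffer manager loads table chunks in circular order, and every active query consumes whichever chunk was just loaded, so a query that arrives mid-scan starts there and wraps around. This is a hypothetical sketch for counting chunk loads, not the adaptive Active Buffer Manager strategy referred to above.

```python
from typing import Dict, List, Tuple


def shared_circular_scan(num_chunks: int,
                         arrivals: Dict[str, int]) -> Tuple[int, Dict[str, List[int]]]:
    """Simulate one shared scan over a table of num_chunks chunks.

    arrivals maps a query id to the time step at which it starts scanning.
    Each step loads at most one chunk; every active query that still needs
    it consumes it. Returns the number of chunk loads and, per query, the
    order in which it saw the chunks.
    """
    pending = {q: set(range(num_chunks)) for q in arrivals}
    seen: Dict[str, List[int]] = {q: [] for q in arrivals}
    loads, step, cursor = 0, 0, 0
    while any(pending.values()):
        active = [q for q, t in arrivals.items() if t <= step and pending[q]]
        if active:
            chunk = cursor
            cursor = (cursor + 1) % num_chunks
            loads += 1
            for q in active:
                if chunk in pending[q]:
                    pending[q].discard(chunk)
                    seen[q].append(chunk)
        step += 1
    return loads, seen


loads, seen = shared_circular_scan(4, {"A": 0, "B": 2})
print(loads)      # 6 chunk loads instead of 8 for two separate scans
print(seen["B"])  # [2, 3, 0, 1]: B joined mid-table and wrapped around
```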

Page 29:

Cooperative Scans

x100 and Cooperative Scans: >30 queries without performance degradation

Page 30:

x100 summary

Original MonetDB was successful in the same application areas; however:
- sub-optimal CPU utilization
- only efficient if the problem fits in RAM

x100 improves the architecture on all levels:
- better CPU utilization
- better cache utilization
- scales to non-memory-resident datasets
- improves I/O bandwidth using compression and cooperative scans

Page 31:

Example results

Performance close to hand-written C functions

TPC-H SF-1   x100    Oracle   MonetDB
Q1           0.54s   30s      9.4s
Q3           0.24s   10s      2.5s
Q6           0.15s   1.5s     2.5s
Q14          0.13s   2s       1.2s

Page 32:

x100 status

First proof-of-concept implemented

Full TPC-H benchmark executes

Future work:
- lots of engineering
- new buffer manager
- more vectorized algorithms
- memory footprint tuning (for small devices)
- SQL front-end

Page 33:

More information

www.cwi.nl/~boncz/x100.html

CIDR’05 paper: “MonetDB/X100: Hyper-pipelining query execution”

Page 34:

Discussion

?