
Welcome [gotocon.com]• Vertica vs. DBMSX vs. Hadoop ... • Netezza (IBM) • ParAccel ... Shared Everything • Oracle RAC/Exadata • IBM purescale • Sybase IQ...

Mar 11, 2018


HoàngTử
Transcript
Page 1: Welcome [gotocon.com]• Vertica vs. DBMSX vs. Hadoop ... • Netezza (IBM) • ParAccel ... Shared Everything • Oracle RAC/Exadata • IBM purescale • Sybase IQ ...gotocon.com/dl/qcon-london-2012/slides/Benjamin... ·

Welcome


It used to be easy…


they all looked pretty much alike


NoSQL BigData MapReduce Graph Document

BigTable Shared Nothing

Column Oriented CAP Eventual

Consistency

ACID BASE Mongo Cloudera Hadoop

Voldemort Cassandra Dynamo Marklogic Redis

Velocity Hbase Hypertable Riak BDB


Now it’s downright

c0nfuZ1nG!


What Happened?


we changed scale


we changed tack


so where does

big data meet

big database?


The world’s largest NoSQL database?


The Internet


So how Big is Big?

Words (0.6)

Web Pages (40)

Everything (5000)

Sizes in Petabytes 0.01%
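
A quick arithmetic check on the figures above. This is only a sketch: it assumes the 0.01% on the slide refers to words as a share of "everything", which the slide itself does not state. The names `SIZES_PB` and `share_of_everything` are illustrative.

```python
# Sizes in petabytes, copied from the slide.
SIZES_PB = {"words": 0.6, "web_pages": 40, "everything": 5000}


def share_of_everything(name: str) -> float:
    """Return the named size as a percentage of 'everything'."""
    return 100.0 * SIZES_PB[name] / SIZES_PB["everything"]


words_pct = share_of_everything("words")
web_pct = share_of_everything("web_pages")
print(f"words: {words_pct:.3f}%   web pages: {web_pct:.1f}%")
```

Under that assumption, words come out at roughly a hundredth of a percent of the total, which is consistent with the rounded figure on the slide.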


Many more Big Sources

mobile

sensors

Logs

video audio

Social data

weather


But it is pretty useful

Marketing

Fraud detection

Tax Evasion

Intelligence

Advertising

Scientific research


Gartner

80% of business is conducted on unstructured information


Big Data is now a new class of economic asset*

*World economic forum 2012


Yet 80% Enterprise Databases < 1TB


Along came the Big Data Movement


MapReduce (2004)

•  Large, distributed, ordered map

•  Fault-tolerant file system

•  Petabyte scaling
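
The MapReduce programming model can be sketched in a few lines. This is a single-process illustration only: real systems spread the map and reduce phases across many machines on top of a fault-tolerant file system, none of which is modelled here. The function names (`map_reduce`, `word_mapper`, `count_reducer`) are hypothetical.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run a map phase, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        # Map: each record yields zero or more (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle: collect all values that share a key.
            groups[key].append(value)
    # Reduce: fold each key's values into one result, in key order.
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


def word_mapper(line):
    for word in line.lower().split():
        yield word, 1


def count_reducer(key, values):
    return sum(values)


lines = ["big data meets big database", "big is big"]
counts = map_reduce(lines, word_mapper, count_reducer)
print(counts)
```

The appeal is that the programmer writes only the mapper and the reducer; partitioning, grouping and (in a real cluster) failure handling belong to the framework.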


Disruptive

Simple

Pragmatic

Solved an insoluble problem

Unencumbered by tradition (good & bad)

Hacker rather than Enterprise culture


A Different Focus

Tradition
•  Global consistency
•  Schema driven
•  Reliable Network
•  Highly Structured

The new wave
•  Local consistency
•  Schemaless / Last
•  Unreliable Network
•  Semi-structured / Unstructured


Novel?

Possibly better put as: A timely and elegant combination of existing ideas, placed together to solve a previously unsolved problem.


Backlash (2009)

Not novel (dates back to the 80’s)

Physical level not the logical level (messy?)

Incompatible with tooling

Lack of integrity (referential) & ACID

MR is brute force, ignoring indexing and skew


All points are reasonable


And they proved it too!

“A Comparison of Approaches to Large-Scale Data Analysis” – SIGMOD 2009

•  Vertica vs. DBMS-X vs. Hadoop

•  Vertica up to 7x faster than Hadoop over benchmarks

Databases faster than Hadoop


But possibly missed the point?


Was MapReduce ever supposed to be a Data Warehousing tool?


If you need more, layer it on top

For example Tenzing & Megastore @ Google


So MapReduce represents a bottom-up approach to accessing very large data sets that is unencumbered by the past.


…and the Database Field knew it had Problems


We Lose: Joe Hellerstein (Berkeley) 2001

“Databases are commoditised and cornered to slow-moving, evolving, structure intensive, applications that require schema evolution.“ … “The internet companies are lost and we will remain in the doldrums of the enterprise space.” … “As databases are black boxes which require a lot of coaxing to get maximum performance”


Yet they do some very cool stuff

Statistically based optimisers, compression, indexing structures, distributed optimisers, their own declarative language


They are an Awesome Tool


They Don’t talk our Language


They Default to Constraint


So NoSurprise with NoSQL then

Simpler Contract

Shared nothing

No joins / ACID

No impedance mismatch

No slow schema evolution

Simple code paths

Just works


The NoSQL Approach

Simple, flexible storage over a diverse range of data structures that will scale almost indefinitely.


Different Flavours


Two Ways In: Key Based Access

Client


Two Ways In: Broadcast to Every Node

Client
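
The two access paths can be sketched with a toy sharded store. This is a minimal illustration under stated assumptions: each shard is an in-process dict standing in for a node, keys are routed with a CRC32 hash, and `ShardedStore` and all of its methods are hypothetical names rather than any product's API.

```python
import zlib


class ShardedStore:
    """Toy shared-nothing store: each shard is a dict standing in for a node."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]
        self.last_nodes_contacted = 0

    def _shard_for(self, key):
        # A stable hash sends the same key to the same shard every time.
        return zlib.crc32(key.encode("utf-8")) % len(self.shards)

    def put(self, key, value):
        self.shards[self._shard_for(key)][key] = value

    def get(self, key):
        # Key-based access: exactly one node is contacted.
        self.last_nodes_contacted = 1
        return self.shards[self._shard_for(key)].get(key)

    def find(self, predicate):
        # Broadcast: every node scans its own shard and the results are merged.
        self.last_nodes_contacted = len(self.shards)
        matches = []
        for shard in self.shards:
            matches.extend(v for v in shard.values() if predicate(v))
        return matches


store = ShardedStore(shard_count=4)
store.put("user:1", {"name": "Ann", "city": "London"})
store.put("user:2", {"name": "Bob", "city": "Paris"})
store.put("user:3", {"name": "Cy", "city": "London"})

by_key = store.get("user:2")
nodes_for_get = store.last_nodes_contacted
londoners = store.find(lambda user: user["city"] == "London")
nodes_for_find = store.last_nodes_contacted
```

A lookup by key touches one node no matter how many shards exist, while a query on any other attribute has to visit all of them.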


So..

A simple bottom-up approach to data storage that scales almost indefinitely.
•  No relations
•  No joins
•  No SQL
•  No Transactions
•  No sluggish schema evolution


The Relational Database


The ‘Relational Camp’ had been busy too

Realisation that the traditional architecture was insufficient for various modern workloads


End of an Era Paper - 2007

“Because RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark, then there is no market where they are competitive. As such, they should be considered as legacy technology more than a quarter of a century in age, for which a complete redesign and re-architecting is the appropriate next step.” – Michael Stonebraker


No Longer a One-Size-Fits-All


Architecting for Different Non-Functionals

In-Memory

Shared Nothing / Disk

Fast Network / SSD

Column Orientation


In-Memory


Distributed In-Memory


Shared Disk Architecture

All machines see all data

Cache sits above whole dataset

Single node can handle any query


Shared Nothing Architecture

•  Autonomy over a shard
•  Divide and conquer (non-key queries hit every node)

Cache over just the shard

Queries hit every node


Vendors polarise over this issue

Shared Nothing
•  Teradata (Aster Data)
•  Netezza (IBM)
•  ParAccel
•  Vertica
•  Greenplum

Shared Everything
•  Oracle RAC/Exadata
•  IBM pureScale
•  Sybase IQ
•  Microsoft SQL Server

(there is some blurring)


Column Oriented Storage

Columns laid contiguously

2-10x compression typical

Indexing becomes less important.

Pinpoint I/O slow (tuple construction)

Bulk read/write faster

Compression >> row-based alternatives
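
A small sketch of why a columnar layout compresses well and favours bulk reads. It assumes a tiny in-memory table and uses run-length encoding as one simple compression scheme; real column stores use a range of encodings, and the helper names here are illustrative.

```python
from itertools import groupby

# A row-oriented table: (date, country, amount).
rows = [
    ("2012-03-01", "UK", 10),
    ("2012-03-01", "UK", 12),
    ("2012-03-01", "US", 7),
    ("2012-03-02", "US", 9),
]


def to_columns(rows):
    """Pivot row tuples into one contiguous list per column."""
    return [list(column) for column in zip(*rows)]


def run_length_encode(column):
    """Collapse runs of repeated adjacent values into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def reconstruct_row(columns, index):
    """Pinpoint read: rebuild one tuple by touching every column."""
    return tuple(column[index] for column in columns)


columns = to_columns(rows)
date_col, country_col, amount_col = columns

encoded_dates = run_length_encode(date_col)  # similar values sit together
total = sum(amount_col)                      # bulk read touches one column only
third_row = reconstruct_row(columns, 2)      # single-row read touches all columns
```

The aggregate reads a single contiguous column, whereas fetching one row means visiting every column, which is the tuple-construction cost noted above.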


Solid State Drives

[Figure: latency scale from 1ms to 1µs, marking HDD seek time and SSD drive]

•  Traditional databases are designed for sequential access over magnetic drives, not random access over SSD.

•  Weakens the columnar/row argument


Faster Networking

[Figure: latency scale from 1ms through 1µs to 1ns, placing HDD seek time, SSD drive, Gigabit Ethernet, 10 Gigabit Ethernet, RDMA and RAM]


The best technologies of the moment are leveraging many of these factors


There is a new and impressive breed

•  Products < 5 years old
•  Shared nothing with SSDs over shards
•  Large address spaces (256GB+)
•  No indexes (column oriented)
•  No referential integrity
•  Surprisingly quick for big queries when compared with incumbent technologies.


TPC-H Benchmarks

Several new contenders with good scores:
– Exasol
– ParAccel
– Vectorwise


TPC-H Benchmarks

•  Exasol has 100GB -> 10TB benchmarks
•  Up to 20x faster than nearest rivals

(But take benchmarks with a pinch of salt)


Relational Approach

Solid data from every angle, bounded in terms of scale, but with a boundary that is rapidly expanding.


Comparisons


At the extreme MapReduce has it

[Figure: chart over data size in TB, on a scale running up to 10,000; bar labels not recoverable]


But there is massive overlap

[Figure: chart over data size in TB, on a scale running up to 10,000; bar labels not recoverable]


It’s not just data volume/velocity


The Dimensions of Data

•  Volume (pure physical size)
•  Velocity (rate of change)
•  Variety (number of different types of data, formats and sources)
•  Static & Dynamic Complexity


Consider the characteristics of data to be integrated, and how that equates to cost


Ability to model data is much more of a gating factor than raw size, particularly when considering new forms of data

Dave Campbell (Microsoft – VLDB Keynote)


It becomes about your data and what you want to do with it

Do you need more than just SQL to process your data?

Does your data change rapidly?

Are you ok with some degree of eventual consistency?

Do isolation and consistency matter?

Do you need to answer questions absolutely or within a tolerance?

Do you want to keep your data in its natural form?

Do you prefer to work bottom up or top down?

How risk averse are you?

Are you willing to pay big vendor prices?


Composite Offerings

Hadoop has Pig & HBase

Mongo offers Query Language, atomicity & MR

Oracle have BigData appliance with Cloudera

IBM have a Map Reduce offering

Sybase (now part of SAP) provides MR natively

EMC acquired Greenplum which has MR support


Complementary Solutions



Relational world has focused on keeping data consistent and well structured so it can be sliced and diced at will


Big data technologies focus on executing code next to data, where that data is held in a more natural form.


So

•  NoSQL has disrupted the database market, questioning the need for constraint and highlighting the power of simple solutions.

•  DB startups are providing some surprisingly fast solutions that drop some traditional database tenets and cleverly leverage new hardware advances.

•  Your problem (and budget) is likely a better guide than the size of the data

•  The market is converging on both sides towards a middle ground and integrated suites of complementary tools.


The right tool for the job

“Attempting to force one technology or tool to satisfy a particular need for which another tool is more effective and efficient is like attempting to drive a screw into a wall with a hammer when a screwdriver is at hand: the screw may eventually enter the wall but at what cost?”

E.F. Codd, 1993


Thanks

http://www.benstopford.com