Transcript
Page 1: ppt

From Moore to Metcalfe: The Network as the Next Database Platform

HPDC June 2007

Michael Franklin
UC Berkeley & Truviso (formerly, Amalgamated Insight)

Page 2: ppt

Outline

• Motivation
• Stream Processing Overview
• Micro-Architecture Issues
• Macro-Architecture Issues
• Conclusions

Page 3: ppt

Moore’s Law vs. Shugart’s: The battle of the bottlenecks

• Moore: Exponential processor and memory improvement.
• Shugart: Similar law for disk capacity.
• The yin and yang of DBMS architecture: “disk-bound” or “memory-bound”?
• OR: are DBMS platforms getting faster or slower relative to the data they need to process?
• Traditionally, the answer dictates where you innovate.

Page 4: ppt

Metcalfe’s Law will drive more profound changes

• Metcalfe: “The value of a network grows with the square of the # of participants.”
• Practical implication: all interesting data-centric applications become distributed.
• Already happening:
  • Service-based architectures (and Grid!)
  • Web 2.0
  • Mobile Computing

Page 5: ppt

Bell’s law will amplify Metcalfe’s

Bell: “Every decade, a new, lower cost, class of computers emerges, defined by platform, interface, and interconnect.”

• Mainframes: 1960s
• Minicomputers: 1970s
• Microcomputers/PCs: 1980s
• Web-based computing: 1990s
• Devices (cell phones, PDAs, wireless sensors, RFID): 2000s

Enabling a new generation of applications for operational visibility, monitoring, and alerting.

Page 6: ppt

The Network as platform: Challenges

[Figure: data arriving from many sources: clickstreams, barcodes/PoS systems, sensors, RFID, telematics, mobile devices, transactional systems, information feeds (e.g., “XYZ 23.2; AAA 19; …”), and blogs/Web 2.0.]

• Data constantly “on-the-move”
• Increased data volume
• Increased heterogeneity & sharing
• Shrinking decision cycles
• Increased data and decision complexity

Page 7: ppt

The Network as platform: Implications

Lots of challenges:
• Integration (or “Dataspaces”)
• Optimization/Planning/Adaptivity
• Consistency/Master Data Mgmt
• Continuity/Disaster Mgmt
• Stream Processing (or data-on-the-move)

My current focus (and thus, the focus of this talk) is the latter.

Page 8: ppt

Stream Processing

My view: Stream Processing will become the 3rd leg of standard IT data management:
• OLAP split off from OLTP for historical reporting.
• OLSA (On-line Stream Analytics) will handle:
  • Monitoring
  • Alerting
  • Transformation
  • Real-time Visibility and Reporting

Note: CEP (Complex Event Processing) is a related, emerging technology.

Page 9: ppt


Stream Processing + Grid?

• On-the-fly stream processing required for high-volume data/event generators.

• Real-time event detection for coordination of distributed observations.

• Wide-area sensing in environmental macroscopes.

Page 10: ppt

Stream Processing - Overview

Page 11: ppt

Turning Query Processing Upside Down

[Figure: two-panel comparison.]

Traditional Database Approach (static batch reports: data is bulk-loaded into a data warehouse, then queries are run against it to produce results)
• Batch ETL & load, query later
• Poor RT monitoring, no replay
• DB size affects query response

Data Stream Processing Approach (live data streams flow through a data stream processor, producing continuous visibility, alerts, and results)
• Always-on data analysis & alerts
• RT Monitor & Replay to optimize
• Consistent sub-second response
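To make the contrast concrete, here is an illustrative sketch (not from the slides) using the windowed SQL syntax shown in the later examples; the table and stream names, the columns, and the Postgres-style timestamp arithmetic are assumptions.

-- Traditional approach (illustrative): re-run a batch query periodically
-- against a loaded warehouse table; latency depends on the load schedule and DB size.
SELECT   symbol, AVG(price * volume)
FROM     trades_warehouse
WHERE    trade_time > now() - interval '5 minutes'
GROUP BY symbol;

-- Streaming approach (illustrative): register one continuous query over the
-- live stream; results are pushed as new data arrives.
SELECT   symbol, AVG(price * volume)
FROM     Trades [RANGE '5 min' SLIDE '1 min']
GROUP BY symbol;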

Page 12: ppt

Example 1: Simple Stream Query

[Figure: raw readings vs. smoothed output over time.]

A SQL smoothing filter to interpolate dropped RFID readings:

SELECT DISTINCT tag_id
FROM   RFID_stream [RANGE ‘5 sec’]
GROUP  BY tag_id

Any tag seen at least once within the 5-second window is reported for the whole window, masking momentary dropped readings.

Page 13: ppt

Example 2 - Stream/Table Join

SELECT   T.symbol, AVG(T.price*T.volume)
FROM     Trades T [RANGE ‘5 sec’ SLIDE ‘3 sec’], SANDP500 S
WHERE    T.symbol = S.symbol AND T.volume > 5000
GROUP BY T.symbol

Every 3 seconds, compute the average transaction value of high-volume trades on S&P 500 stocks, over a 5-second “sliding window”. Here Trades is the stream (carrying the window clause) and SANDP500 is the table.

Note: the output is also a stream (see the illustrative follow-on query below).
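Because the windowed aggregate itself produces a stream, it can be consumed by further continuous queries. A minimal sketch (not from the slides; the view name and the alert threshold are illustrative assumptions):

-- Name the windowed aggregate so downstream queries can consume it.
CREATE VIEW avg_trade_value AS
  SELECT   T.symbol, AVG(T.price * T.volume) AS avg_value
  FROM     Trades T [RANGE '5 sec' SLIDE '3 sec'], SANDP500 S
  WHERE    T.symbol = S.symbol AND T.volume > 5000
  GROUP BY T.symbol;

-- A second continuous query that alerts on the derived stream.
SELECT symbol, avg_value
FROM   avg_trade_value
WHERE  avg_value > 1000000;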

Page 14: ppt

Example 3 - Streaming View

Positive Suspense: Find the top 100 store-skus ordered by their decreasing positive suspense (inventory - sales).

CREATE VIEW StoreSKU (store, sku, sales) AS (
  SELECT   P.store, P.sku, SUM(P.qty) AS sales
  FROM     POSLog P [RANGE ‘1 day’ SLIDE ‘10 min’], Inventory I
  WHERE    P.sku = I.sku AND P.store = I.store AND P.time > I.time
  GROUP BY P.store, P.sku)

SELECT   S.store, S.sku, (I.quantity - S.sales) AS positive_suspense
FROM     StoreSKU S, Inventory I
WHERE    S.store = I.store AND S.sku = I.sku
ORDER BY positive_suspense DESC
LIMIT 100

Page 15: ppt

Application Areas

• Financial Services: Trading/Capital Mkts
• SOA/Infrastructure Monitoring; Security
• Physical (sensor) Monitoring
• Fraud Detection/Prevention
• Risk Analytics and Compliance
• Location-based Services
• Customer Relationship Management/Retail
• Supply chain/Logistics
• …

Page 16: ppt

Real-Time Monitoring

[Figure: a Flex-based dashboard driven by multiple SQL queries.]

Page 17: ppt

The “Jellybean” Argument

Conventional Wisdom: “Can I afford real-time?” Do the benefits justify the cost?

Reality: With stream query processing, real-time is cheaper than batch.
• minimize copies & query start-up overhead
• takes load off expensive back-end systems
• rapid application dev & maintenance

Page 18: ppt

Historical Context and Status

• Early stuff:
  • Data “Push”, Pub/Sub, Adaptive Query Proc.
• Lots of non-SQL approaches
  • Rules systems (e.g., for Fraud Detection)
  • Complex Event Processing (CEP)
• Research projects led to companies
  • TelegraphCQ -> Truviso (Amalgamated)
  • Aurora -> Streambase
  • Streams -> Coral8
• Big guys ready to jump in: BEA, IBM, Oracle, …

Page 19: ppt

Requirements

• High data rates: 1K rec/sec (SOA monitoring) up to 700K rec/sec (option trading)
• # queries: single digits to 10,000’s
• Query complexity
  • Full SQL + windows + events + analytics
• Persistence, replay, historical comparison (see the illustrative sketch below)
• Huge range of Sources and Sinks
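As a rough illustration of “historical comparison” (not from the slides; the archive table trades_history, its precomputed historical_avg column, and the window size are assumptions), a continuous query can join a live window against archived data:

-- Compare the current 5-minute average price per symbol against a
-- long-run average maintained offline in trades_history(symbol, historical_avg).
SELECT   T.symbol,
         AVG(T.price)          AS live_avg,
         MIN(H.historical_avg) AS historical_avg   -- one archived value per symbol
FROM     Trades T [RANGE '5 min'], trades_history H
WHERE    T.symbol = H.symbol
GROUP BY T.symbol;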

Page 20: ppt

Stream QP: Micro-Architecture

Page 21: ppt

Single Node Architecture

[Figure (© 2007, Amalgamated Insight, Inc.): a Continuous Query Engine containing an Adaptive SQL Query Processor, a Streaming SQL Query Processor, a Concurrent Query Planner, Triggers/Rules, Active Data, and a Replay Database. An Ingress stage (connectors and transformations for XML, CSV, MQ, MSMQ, JDBC, .NET) feeds the engine; an Egress stage (connectors and transformations for proprietary APIs, XML, message buses, alerts, pub/sub, and events) delivers results. The engine also talks to an External Archive and to other CQE instances.]

Page 22: ppt

Ingress Issues (performance)

• Must support high data rates
  • 700K ticks/second for FS
  • Wirespeed for networking/security
• Minimal latency
  • FS trading particularly sensitive to this
• Fault tolerance
  • Especially given remote sources
• Efficient (bulk) data transformation
  • XML, text, binary, …
• Work well for both push and pull sources

[Figure: the Ingress stage (connectors and transformations for XML, CSV, MQ, MSMQ, JDBC, .NET).]

Page 23: ppt

Egress Issues (performance)

• Must support high data rates
• Minimal latency
• Fault tolerance
• Efficient (bulk) data transformation
• Buffering/Support for JDBC-style clients
• Interaction with bulk warehouse loaders
• Large-scale dissemination (Pub/Sub)

[Figure: the Egress stage (connectors and transformations for proprietary APIs, XML, message buses, alerts, pub/sub, and events).]

Page 24: ppt

Query Processing (Single)

• Simple approach:
  • Stream inputs are “scan” operators
  • Adapt operator plumbing to push/pull
    • “Exchange” operators / Fjords
• Need to run lots of these concurrently
  • Index the queries?
  • Scheduling, Memory Mgmt.
• Must avoid I/O, cache misses to run at speed
• Predicate push-down - a la Gigascope

[Figure: the Continuous Query Engine components (Adaptive SQL Query Processor, Streaming SQL Query Processor, Concurrent Query Planner, Triggers/Rules, Active Data, Replay Database).]

Page 25: ppt

QP (continued)

• Transactional/Correctness issues:
  • Never-ending queries hold locks forever!
  • Need an efficient heartbeat mechanism to keep things moving forward.
  • Dealing with corrections (e.g., in financial feeds).
  • Out-of-order/missing data
    • “ripples in the stream” can hurt clever scheduling mechanisms.
• Integration with external code:
  • Matlab, R, …, UDFs and UDAs

Page 26: ppt

Query Processing (Shared)

• The previous approach misses a huge opportunity.
• Individual execution leads to linear slowdown
  • Until you fall off the memory cliff!
• Recall that we know all the queries:
  • we know when they will need data
  • we know what data they will need
  • we know what things they will compute
• Why run them individually (as if we didn’t know any of this)? (An illustrative pair of shareable queries follows below.)
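A minimal illustration (not from the slides) of why sharing pays off: the two queries below differ only in predicate and grouping, so a shared engine can scan the Trades stream and maintain the 5-second window once, feeding both plans. The thresholds and the symbol value are assumptions.

-- Query A: rolling average value of large trades, per symbol.
SELECT   symbol, AVG(price * volume)
FROM     Trades [RANGE '5 sec' SLIDE '3 sec']
WHERE    volume > 5000
GROUP BY symbol;

-- Query B: rolling trade count for one symbol, over the same window.
SELECT   COUNT(*)
FROM     Trades [RANGE '5 sec' SLIDE '3 sec']
WHERE    symbol = 'XYZ';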

Page 27: ppt

Shared Processing - The Überquery

[Figure: queries arriving at a Shared Query Engine.]

SELECT   T.symbol, AVG(T.price*T.volume)
FROM     Trades T [RANGE ‘5 sec’ SLIDE ‘3 sec’], SANDP500 S
WHERE    T.symbol = S.symbol AND T.volume > 5000
GROUP BY T.symbol

• A “query plan” is formed from the query text; the new plan enters the system.
• More queries arrive and get compiled into plans.
• Each plan is folded into the global plan.
• No redundant modules = super-linear query scalability.

Page 28: ppt

Shared QP raises lots of new issues

• Scheduling based on data availability/location and work affinity.
• Lots of bit-twiddling: need efficient bitmaps.
• Query “folding”: how to combine (MQO)
• On-the-fly query changes.
• How does shared processing change the traditional architectural tradeoffs?
• How to process across multiple: cores, dies, boxes, racks, rooms?

Refs: NiagaraCQ, CACQ, TelegraphCQ, Sailesh Krishnamurthy’s thesis

Page 29: ppt

Archiving - Huge Area

• Most streaming use-cases want access to historical information.
  • Compliance/Risk: also need to keep the data.
  • Science apps need to keep raw data around too.
• In a high-volume streaming environment, going to disk is an absolute killer.
• Obviously need clever techniques (one rough example follows below):
  • Sampling, index update deferral, load shedding
  • Scheduling based on time-oriented queries
  • Good old buffering/prefetching

[Figure: the External Archive component.]
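As one rough illustration of sampling to reduce archive cost (not from the slides; the raw_readings stream, the 1% rate, and the availability of a Postgres-style random() function in continuous queries are assumptions):

-- Archive only a ~1% sample of the raw stream; full-rate data stays available
-- to live queries, while the sampled view is what gets written to disk.
CREATE VIEW archived_sample AS
  SELECT *
  FROM   raw_readings
  WHERE  random() < 0.01;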

Page 30: ppt

Stream QP: Macro-Architecture

Page 31: ppt

HiFi - Taming the Data Flood

[Figure: a hierarchy from receptors, through dock doors/shelves, warehouses/stores, and regional centers, up to headquarters.]

• Hierarchical Aggregation: Spatial & Temporal
• In-network Stream Query Processing and Storage
• Fast DataPath vs. Slow DataPath

Page 32: ppt


Problem: Sensors are Noisy

• A simple RFID Experiment

• 2 adjacent shelves, 6 ft. wide

• 10 EPC-tagged items each, plus 5 moved between them

• RFID antenna on each shelf

Page 33: ppt

Shelf RFID - Ground Truth

Page 34: ppt


Actual RFID Readings

“Restock every time inventory goes below 5”

Page 35: ppt

“Virtual Device (VICE) API”

The VICE API is a natural place to hide much of the complexity arising from physical devices.

VICE: Virtual Device Interface [Jeffery et al., Pervasive 2006, VLDBJ 07]

Page 36: ppt

Query-based Data Cleaning (stages: Point, Smooth)

CREATE VIEW smoothed_rfid_stream AS (
  SELECT   receptor_id, tag_id
  FROM     cleaned_rfid_stream [range by ’5 sec’, slide by ’5 sec’]
  GROUP BY receptor_id, tag_id
  HAVING   count(*) >= count_T)

Page 37: ppt

Query-based Data Cleaning (stages: Point, Smooth, Arbitrate)

CREATE VIEW arbitrated_rfid_stream AS (
  SELECT   receptor_id, tag_id
  FROM     smoothed_rfid_stream rs [range by ’5 sec’, slide by ’5 sec’]
  GROUP BY receptor_id, tag_id
  HAVING   count(*) >= ALL (SELECT   count(*)
                            FROM     smoothed_rfid_stream [range by ’5 sec’, slide by ’5 sec’]
                            WHERE    tag_id = rs.tag_id
                            GROUP BY receptor_id))

Page 38: ppt


After Query-based Cleaning

“Restock every time inventory goes below 5”
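A rough sketch of how the restocking rule might be written over the cleaned stream (not from the slides; the per-shelf count, the 10-second window, and the threshold of 5 are assumptions based on the caption):

-- Raise a restock alert whenever a shelf’s (receptor’s) visible inventory
-- drops below 5 distinct tags in the current window.
SELECT   receptor_id, count(DISTINCT tag_id) AS items_on_shelf
FROM     arbitrated_rfid_stream [range by '10 sec', slide by '10 sec']
GROUP BY receptor_id
HAVING   count(DISTINCT tag_id) < 5;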

Page 39: ppt

Adaptive Smoothing [Jeffery et al., VLDB 2006]

Page 40: ppt

SQL Abstraction Makes it Easy?

• Soft Sensors - e.g., “LOUDMOUTH” sensor (VLDB 04)
• Quality and lineage
• Optimization (power, etc.)
• Pushdown of external validation information
• Automatic/Adaptive query placement
• Data archiving
• Imperative processing

Page 41: ppt

Some Challenges

• How to run across the full gamut of devices from motes to mainframes?
• What about running *really* in-the-network?
  • Data/query placement and movement
  • Adaptivity is key
  • “Push down” is a small subset of this problem (a rough example follows below).
  • Sharing is also crucial here.
• Security, encryption, compression, etc.
• Lots of issues due to devices and “physical world” problems.
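To make “push down” concrete, a minimal sketch (not from the slides; the sensor_readings stream, its columns, and the 30-degree threshold are assumptions). The selection could be evaluated on the sensing devices themselves so that only qualifying readings are transmitted, while the windowed aggregate runs higher in the hierarchy.

-- The WHERE clause is a candidate for pushdown to the motes;
-- the per-minute aggregate is computed at a parent node.
SELECT   sensor_id, AVG(temperature) AS avg_temp
FROM     sensor_readings [RANGE '1 min' SLIDE '1 min']
WHERE    temperature > 30
GROUP BY sensor_id;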

Page 42: ppt

It’s not just a sensor-net problem

[Figure: the typical enterprise pipeline. Edge devices (PCs, PoS, handhelds, readers) feed enterprise apps (e-com, ERP, CRM, SCM) and transactional OLTP data stores over an integration bus; batch loads move data into an enterprise data warehouse and specialized OLAP data marts, which drive business intelligence, data mining, reports, analytics, operational BI, alerts, dashboards, and portals. Pain points annotated along the way: distributed data, batch latency, exploding data volumes, query latency, and decision latency.]

Page 43: ppt

Data Dissemination (Fan-Out)

• Many applications have large numbers of consumers.
• Lots of interesting questions on large-scale pub/sub technology.
  • Micro-scale: locality, scheduling, sharing, for huge numbers of subscriptions.
  • Macro-scale: dissemination trees, placement, sharing, …

Page 44: ppt

What to measure? (a research opportunity)

• High data rates/throughput
  • rec/sec; record size
• Number of concurrent queries
• Query complexity
• Huge range of sources and sinks
  • transformation and connector performance
• Minimal benchmarking work so far:
  • “Linear Road” from the Aurora group
  • CEP benchmark work by Pedro Bizarro

Page 45: ppt

Conclusions

• Two relevant trends:
  • Metcalfe’s Law ⇒ DB systems need to become more network-savvy.
  • Jim Gray and others have helped demonstrate the value of SQL to science.
• Stream query processing is where these two trends meet in the Grid world.
  • A new (3rd) component of data management infrastructure.
• Lots of open research problems for the HPDC (and DB) community.

Page 46: ppt

Resources

• Research Projects @ Berkeley
  • TelegraphCQ - single-site stream processor
  • HiFi - distributed/hierarchical
  • See www.cs.berkeley.edu/~franklin for links/papers
• Good jumping-off point for CEP and related info: www.complexevents.com
• The company: www.truviso.com