From Moore to Metcalfe: The Network as the Next Database Platform
HPDC, June 2007
Michael Franklin
UC Berkeley & Truviso (formerly, Amalgamated Insight)
Outline
• Motivation
• Stream Processing Overview
• Micro-Architecture Issues
• Macro-Architecture Issues
• Conclusions
Moore’s Law vs. Shugart’s: The Battle of the Bottlenecks
• Moore: exponential processor and memory improvement.
• Shugart: a similar law for disk capacity.
• The yin and yang of DBMS architecture: “disk-bound” or “memory-bound”? Or: are DBMS platforms getting faster or slower relative to the data they need to process?
• Traditionally, the answer dictates where you innovate.
Metcalfe’s Law will drive more profound changes
• Metcalfe: “The value of a network grows with the square of the number of participants.”
• Practical implication: all interesting data-centric applications become distributed.
• Already happening:
  • Service-based architectures (and Grid!)
  • Web 2.0
  • Mobile computing
Bell’s Law will amplify Metcalfe’s
• Bell: “Every decade, a new, lower-cost class of computers emerges, defined by platform, interface, and interconnect.”
• Mainframes: 1960s
• Minicomputers: 1970s
• Microcomputers/PCs: 1980s
• Web-based computing: 1990s
• Devices (cell phones, PDAs, wireless sensors, RFID): 2000s
Enabling a new generation of applications for operational visibility, monitoring, and alerting.
The Network as platform: Challenges
[Diagram: data sources feeding the network: clickstreams, barcodes/PoS systems, sensors, RFID, telematics, mobile devices, transactional systems, information feeds (e.g., “XYZ 23.2; AAA 19; …”), and blogs/Web 2.0.]
• Data constantly “on-the-move”
• Increased data volume
• Increased heterogeneity & sharing
• Shrinking decision cycles
• Increased data and decision complexity
The Network as platform: Implications
Lots of challenges:
• Integration (or “Dataspaces”)
• Optimization/planning/adaptivity
• Consistency/master data management
• Continuity/disaster management
• Stream processing (or data-on-the-move)
My current focus (and thus, the focus of this talk) is the latter.
Stream Processing
My view: stream processing will become the third leg of standard IT data management:
• OLAP split off from OLTP for historical reporting.
• OLSA (On-line Stream Analytics) will handle:
  • Monitoring
  • Alerting
  • Transformation
  • Real-time visibility and reporting
Note: CEP (Complex Event Processing) is a related, emerging technology.
Stream Processing + Grid?
• On-the-fly stream processing required for high-volume data/event generators.
• Real-time event detection for coordination of distributed observations.
• Wide-area sensing in environmental macroscopes.
Stream Processing - Overview
Turning Query Processing Upside Down

Traditional Database Approach: bulk-load data into a data warehouse; queries against the stored data produce static batch reports.
• Batch ETL & load, query later
• Poor real-time monitoring, no replay
• DB size affects query response

Data Stream Processing Approach: live data streams flow through a data stream processor, whose results provide continuous visibility and alerts.
• Always-on data analysis & alerts
• Real-time monitoring & replay to optimize
• Consistent sub-second response
Example 1: Simple Stream Query
A SQL smoothing filter to interpolate dropped RFID readings (raw readings in, smoothed output out):

SELECT DISTINCT tag_id
FROM RFID_stream [RANGE '5 sec']
GROUP BY tag_id
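The effect of the windowed query above can be sketched outside the engine as well. Below is a minimal, illustrative Python version (the class and method names are mine, not any engine's API): a tag stays in the smoothed output as long as at least one raw reading for it falls inside the trailing 5-second window, masking short dropouts.

```python
from collections import deque

class SmoothingFilter:
    """Emits the distinct tag_ids seen within the trailing window,
    masking short dropouts in the raw RFID feed."""

    def __init__(self, window=5.0):   # seconds, matching RANGE '5 sec'
        self.window = window
        self.readings = deque()       # (timestamp, tag_id), oldest first

    def observe(self, ts, tag_id):
        self.readings.append((ts, tag_id))

    def smoothed(self, now):
        # Evict readings that have fallen out of the window.
        while self.readings and self.readings[0][0] <= now - self.window:
            self.readings.popleft()
        return {tag for _, tag in self.readings}
```

A tag read at t=1 still appears in the output at t=4 even if every intermediate read was dropped; it disappears only once the window has slid past its last reading.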
Example 2: Stream/Table Join
Every 3 seconds, compute the average transaction value of high-volume trades on S&P 500 stocks over a 5-second sliding window. Trades is the stream, SANDP500 the table, and [RANGE ... SLIDE ...] is the window clause. Note: the output is also a stream.

SELECT T.symbol, AVG(T.price*T.volume)
FROM Trades T [RANGE '5 sec' SLIDE '3 sec'], SANDP500 S
WHERE T.symbol = S.symbol AND T.volume > 5000
GROUP BY T.symbol
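To make the window semantics concrete, here is a hedged Python sketch of the same computation (function names and the tiny reference table are illustrative, not Truviso's implementation): every `slide` seconds it evaluates the trailing `rng`-second window, having already joined each arriving trade against the table and applied the volume predicate.

```python
from collections import defaultdict, deque

def windowed_join_avg(trades, table, rng=5.0, slide=3.0, min_volume=5000):
    """trades: (ts, symbol, price, volume) tuples in timestamp order.
    Returns [(emit_time, {symbol: avg(price*volume)}), ...], one entry
    per SLIDE boundary, over the trailing RANGE-second window."""
    window = deque()
    results = []
    next_emit = slide
    for ts, sym, price, vol in trades:
        while ts >= next_emit:                  # a slide boundary passed
            results.append((next_emit, _aggregate(window, next_emit, rng)))
            next_emit += slide
        if sym in table and vol > min_volume:   # join + volume predicate
            window.append((ts, sym, price, vol))
    return results

def _aggregate(window, now, rng):
    while window and window[0][0] <= now - rng:  # evict expired trades
        window.popleft()
    sums, counts = defaultdict(float), defaultdict(int)
    for _, sym, price, vol in window:
        sums[sym] += price * vol
        counts[sym] += 1
    return {s: sums[s] / counts[s] for s in sums}
```

Note how the predicate and join are applied on ingest, before a tuple ever enters the window: the stream analogue of predicate push-down.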
Example 3: Streaming View
Positive suspense: find the top 100 store-SKUs, ordered by decreasing positive suspense (inventory minus sales).

CREATE VIEW StoreSKU (store, sku, sales) AS
(SELECT P.store, P.sku, SUM(P.qty) AS sales
 FROM POSLog P [RANGE '1 day' SLIDE '10 min'], Inventory I
 WHERE P.sku = I.sku AND P.store = I.store AND P.time > I.time
 GROUP BY P.store, P.sku)

SELECT (I.quantity - S.sales) AS positive_suspense
FROM StoreSKU S, Inventory I
WHERE S.store = I.store AND S.sku = I.sku
ORDER BY positive_suspense DESC
LIMIT 100
Application Areas
• Financial Services: trading/capital markets
• SOA/infrastructure monitoring; security
• Physical (sensor) monitoring
• Fraud detection/prevention
• Risk analytics and compliance
• Location-based services
• Customer relationship management/retail
• Supply chain/logistics
• …
Real-Time Monitoring
A Flex-based dashboard driven by multiple SQL queries.
The “Jellybean” Argument
Conventional wisdom: “Can I afford real-time?” Do the benefits justify the cost?
Reality: with stream query processing, real-time is cheaper than batch:
• minimizes copies & query start-up overhead
• takes load off expensive back-end systems
• rapid application development & maintenance
Historical Context and Status
• Early work: data “push”, pub/sub, adaptive query processing
• Lots of non-SQL approaches:
  • Rules systems (e.g., for fraud detection)
  • Complex Event Processing (CEP)
• Research projects led to companies:
  • TelegraphCQ -> Truviso (Amalgamated)
  • Aurora -> StreamBase
  • Streams -> Coral8
• Big guys ready to jump in: BEA, IBM, Oracle, …
Requirements
• High data rates: 1K rec/sec (SOA monitoring) up to 700K rec/sec (options trading)
• Number of queries: single digits to 10,000s
• Query complexity: full SQL + windows + events + analytics
• Persistence, replay, historical comparison
• Huge range of sources and sinks
Stream QP: Micro-Architecture
Single Node Architecture
[Diagram: ingress connectors and transformations (XML, CSV, MQ, MSMQ, JDBC, .NET) feed the Continuous Query Engine, which comprises an adaptive SQL query processor, a streaming SQL query processor, a concurrent query planner, triggers/rules, active data, and a replay database, backed by an external archive. Egress connectors and transformations deliver alerts, pub/sub, and events to an XML message bus, proprietary APIs, and other CQE instances. © 2007, Amalgamated Insight, Inc.]
Ingress Issues (performance)
• Must support high data rates: 700K ticks/second for financial services; wirespeed for networking/security
• Minimal latency: financial trading is particularly sensitive to this
• Fault tolerance, especially given remote sources
• Efficient (bulk) data transformation: XML, text, binary, …
• Work well for both push and pull sources
Egress Issues (performance)
• Must support high data rates
• Minimal latency
• Fault tolerance
• Efficient (bulk) data transformation
• Buffering/support for JDBC-style clients
• Interaction with bulk warehouse loaders
• Large-scale dissemination (pub/sub)
Query Processing (Single)
• Simple approach: stream inputs are “scan” operators; adapt operator plumbing to push/pull (“Exchange” operators / Fjords)
• Need to run lots of these concurrently: index the queries? Scheduling, memory management
• Must avoid I/O and cache misses to run at speed
• Predicate push-down, a la Gigascope
QP (continued)
• Transactional/correctness issues: never-ending queries hold locks forever! Need an efficient heartbeat mechanism to keep things moving forward.
• Dealing with corrections (e.g., in financial feeds)
• Out-of-order/missing data: “ripples in the stream” can hurt clever scheduling mechanisms
• Integration with external code: Matlab, R, …, UDFs and UDAs
Query Processing (Shared)
• The previous approach misses a huge opportunity: individual execution leads to linear slowdown, until you fall off the memory cliff!
• Recall that we know all the queries:
  • we know when they will need data
  • we know what data they will need
  • we know what things they will compute
• Why run them individually (as if we didn’t know any of this)?
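A toy illustration of the sharing opportunity, in the spirit of the grouped filters in NiagaraCQ/CACQ (the class below is my sketch, not those systems' code): when many registered queries carry equality predicates on the same attribute, one hash probe per tuple answers all of them at once, instead of one predicate test per query.

```python
from collections import defaultdict

class GroupedEqualityFilter:
    """Shares predicate evaluation across continuous queries: each query
    registers an equality predicate on a common attribute, and a single
    hash-table probe per tuple finds every matching query."""

    def __init__(self):
        self.index = defaultdict(set)  # predicate value -> query ids

    def register(self, query_id, value):
        self.index[value].add(query_id)

    def matches(self, tuple_value):
        # One O(1) probe serves every registered query.
        return self.index.get(tuple_value, set())
```

With 10,000 registered queries, per-tuple work stays constant; running the queries individually would do 10,000 comparisons per tuple.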
Shared Processing: The Überquery
[Diagram: a new query’s text (e.g., the stream/table join from Example 2) is compiled into a query plan; the plan enters the shared query engine, and as more queries arrive, each compiled plan is folded into the single global plan.]
No redundant modules = super-linear query scalability.
Shared QP raises lots of new issues:
• Scheduling based on data availability/location and work affinity
• Lots of bit-twiddling: need efficient bitmaps
• Query “folding”: how to combine plans (multi-query optimization)
• On-the-fly query changes
• How does shared processing change the traditional architectural tradeoffs?
• How to process across multiple cores, dies, boxes, racks, rooms?
Refs: NiagaraCQ, CACQ, TelegraphCQ, Sailesh Krishnamurthy’s thesis
Archiving: a huge area
• Most streaming use cases want access to historical information.
• Compliance/risk: also need to keep the data.
• Science apps need to keep raw data around, too.
• In a high-volume streaming environment, going to disk is an absolute killer.
• Obviously need clever techniques:
  • Sampling, index-update deferral, load shedding
  • Scheduling based on time-oriented queries
  • Good old buffering/prefetching
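Of the techniques listed, sampling is the simplest to sketch. Reservoir sampling is a standard algorithm (the Python below is an illustrative version, not tied to any engine): it keeps a uniform random sample of k items from an unbounded stream in O(k) memory, which bounds the archive write rate regardless of input volume.

```python
import random

def reservoir_sample(stream, k, seed=None):
    """Keeps a uniform random k-item sample of an unbounded stream in
    O(k) memory: one way to archive a representative subset when
    writing every tuple to disk is too slow."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)       # fill the reservoir first
        else:
            j = rng.randint(0, i)     # keep item with probability k/(i+1)
            if j < k:
                sample[j] = item
    return sample
```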
Stream QP: Macro-Architecture
HiFi: Taming the Data Flood
[Diagram: hierarchical aggregation, spatial and temporal, from receptors at dock doors and shelves, up through warehouses and stores, to regional centers and headquarters. In-network stream query processing and storage; fast datapath vs. slow datapath.]
Problem: Sensors are Noisy
A simple RFID experiment:
• 2 adjacent shelves, 6 ft. wide
• 10 EPC-tagged items each, plus 5 moved between them
• RFID antenna on each shelf
Shelf RFID: Ground Truth
Actual RFID Readings
“Restock every time inventory goes below 5”
VICE: Virtual Device Interface [Jeffery et al., Pervasive 2006; VLDB Journal 2007]
The Virtual Device (VICE) API is a natural place to hide much of the complexity arising from physical devices.
Query-based Data Cleaning
The Point and Smooth stages of the cleaning pipeline. Smoothing keeps a (receptor, tag) pair only if it was read at least count_T times within the window:

CREATE VIEW smoothed_rfid_stream AS
(SELECT receptor_id, tag_id
 FROM cleaned_rfid_stream [RANGE BY '5 sec', SLIDE BY '5 sec']
 GROUP BY receptor_id, tag_id
 HAVING count(*) >= count_T)
Query-based Data Cleaning (continued)
The Arbitrate stage assigns each tag to the receptor(s) with the most readings for it in the window:

CREATE VIEW arbitrated_rfid_stream AS
(SELECT receptor_id, tag_id
 FROM smoothed_rfid_stream rs [RANGE BY '5 sec', SLIDE BY '5 sec']
 GROUP BY receptor_id, tag_id
 HAVING count(*) >= ALL
   (SELECT count(*)
    FROM smoothed_rfid_stream [RANGE BY '5 sec', SLIDE BY '5 sec']
    WHERE tag_id = rs.tag_id
    GROUP BY receptor_id))
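The HAVING ... >= ALL logic can be paraphrased procedurally. A hedged Python sketch (names are illustrative; unlike the SQL, which keeps all tied receptors, this version breaks ties by keeping a single winner):

```python
from collections import Counter

def arbitrate(readings):
    """readings: list of (receptor_id, tag_id) pairs within one window.
    Assigns each tag to the receptor that read it most often, resolving
    tags picked up by both shelf antennas."""
    counts = Counter(readings)  # (receptor, tag) -> number of readings
    best = {}                   # tag -> (receptor, count)
    for (receptor, tag), n in counts.items():
        if tag not in best or n > best[tag][1]:
            best[tag] = (receptor, n)
    return {tag: receptor for tag, (receptor, _) in best.items()}
```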
After Query-based Cleaning
“Restock every time inventory goes below 5”
Adaptive Smoothing [Jeffery et al., VLDB 2006]
SQL Abstraction Makes it Easy?
• Soft sensors, e.g., the “LOUDMOUTH” sensor (VLDB 04)
• Quality and lineage
• Optimization (power, etc.)
• Pushdown of external validation information
• Automatic/adaptive query placement
• Data archiving
• Imperative processing
Some Challenges
• How to run across the full gamut of devices, from motes to mainframes?
• What about running *really* in-the-network?
  • Data/query placement and movement; adaptivity is key
  • “Push down” is a small subset of this problem
  • Sharing is also crucial here
• Security, encryption, compression, etc.
• Lots of issues due to devices and “physical world” problems
It’s not just a sensor-net problem
[Diagram: an end-to-end enterprise pipeline. Edge devices (PCs, PoS terminals, handhelds, readers) feed enterprise apps (e-commerce, ERP, CRM, SCM) and transactional OLTP data stores; an integration bus and batch loads carry the data into an enterprise data warehouse and specialized OLAP data marts, which drive reports, analytics, business intelligence, data mining, operational BI portals, alerts, and dashboards. Pain points along the way: distributed data, batch latency, exploding data volumes, query latency, and decision latency.]
Data Dissemination (Fan-Out)
• Many applications have large numbers of consumers.
• Lots of interesting questions on large-scale pub/sub technology:
  • Micro-scale: locality, scheduling, and sharing for huge numbers of subscriptions
  • Macro-scale: dissemination trees, placement, sharing, …
What to measure? (a research opportunity)
• High data rates/throughput: rec/sec; record size
• Number of concurrent queries
• Query complexity
• Huge range of sources and sinks: transformation and connector performance
• Minimal benchmarking work so far:
  • “Linear Road” from the Aurora group
  • CEP benchmark work by Pedro Bizarro
Conclusions
• Two relevant trends:
  • Metcalfe’s Law: DB systems need to become more network-savvy.
  • Jim Gray and others have helped demonstrate the value of SQL to science.
• Stream query processing is where these two trends meet in the Grid world: a new (third) component of data management infrastructure.
• Lots of open research problems for the HPDC (and DB) community.
Resources
• Research projects @ Berkeley:
  • TelegraphCQ: single-site stream processor
  • HiFi: distributed/hierarchical
  • See www.cs.berkeley.edu/~franklin for links/papers.
• Good jumping-off point for CEP and related info: www.complexevents.com
• The company: www.truviso.com