HiFi Systems: Network-Centric Query Processing for the Physical World Michael Franklin UC Berkeley 2.13.04
Page 1: HiFi Systems: Network-Centric Query Processing for the Physical World Michael Franklin UC Berkeley 2.13.04.

HiFi Systems: Network-Centric Query Processing for the Physical World

Michael Franklin

UC Berkeley

2.13.04

Page 2

M. Franklin, UC Berkeley, Feb. 04

Introduction

Continuing improvements in sensor devices

– Wireless motes

– RFID

– Cellular-based telemetry

Cheap devices can monitor the environment at a high rate.

Connectivity enables remote monitoring at many different scales.

Widely different concerns at each of these levels and scales.

Page 3

Plan of Attack

Motivation/Applications/Examples

Characteristics of HiFi Systems

Foundational Components
– TelegraphCQ
– TinyDB

Research Issues

Conclusions

Page 4

The Canonical HiFi System

Page 5

RFID - Retail Scenario

“Smart Shelves” continuously monitor item addition and removal.

Info is sent back through the supply chain.

Page 6

“Extranet” Information Flow

[Figure labels: Retailer A; Retailer B; Manufacturer C; Manufacturer D; Aggregation/Distribution Service]

Page 7

M2M - Telemetry/Remote Monitoring

Energy Monitoring - Demand Response

Traffic

Power Generation

Remote Equipment

Page 8

Time-Shift Trend Prediction

National companies can exploit East Coast/West Coast time differentials to optimize West Coast operations.

Page 9

Virtual Sensors

Sensors don’t have to be physical sensors.

Network Monitoring algorithms for detecting viruses, spam, DoS attacks, etc.

Disease outbreak detection

Page 10

Properties

High Fan-In, globally-distributed architecture.

Large data volumes generated at edges.
– Filtering and cleaning must be done there.

Successive aggregation as you move inwards.
– Summaries/anomalies continually, details later.

Strong temporal focus.

Strong spatial/geographic focus.

Streaming data and stored data.

Integration within and across enterprises.

Page 11

One View of the Design Space

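[Figure: tasks and processing styles laid out along a Time Scale axis running from seconds to years]

Tasks: Filtering, Cleaning, Alerts; Monitoring, Time-series; Data mining (recent history); Archiving (provenance and schema evolution)

Processing styles: On-the-fly processing; Disk-based processing; Combined Stream/Disk Processing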

Page 12

Another View of the Design Space

[Figure: tasks and sites laid out along a Geographic Scope axis running from local to global]

Tasks: Filtering, Cleaning, Alerts; Monitoring, Time-series; Data mining (recent history); Archiving (provenance and schema evolution)

Sites: Several Readers; Regional Centers; Central Office

Page 13

One More View of the Design Space

[Figure: tasks and history levels laid out against Degree of Detail and Aggregate Data Volume]

Tasks: Filtering, Cleaning, Alerts; Monitoring, Time-series; Data mining (recent history); Archiving (provenance and schema evolution)

Levels: Dup Elim (history: hrs); Interesting Events (history: days); Trends/Archive (history: years)

Page 14

Building Blocks

TelegraphCQ

TinyDB

Page 15

TelegraphCQ: Monitoring Data Streams

Streaming Data
– Network monitors
– Sensor Networks
– News feeds
– Stock tickers

B2B and Enterprise apps
– Supply-Chain, CRM, RFID
– Trade Reconciliation, Order Processing etc.

(Quasi) real-time flow of events and data

Must manage these flows to drive business (and other) processes.

Can mine flows to create/adjust business rules or to perform on-line analysis.

Page 16

TelegraphCQ (Continuous Queries)

An adaptive system for large-scale shared dataflow processing.

Based on an extensible set of operators:

1) Ingress (data access) operators
– Wrappers, File readers, Sensor Proxies

2) Non-Blocking Data processing operators
– Selections (filters), XJoins, …

3) Adaptive Routing operators
– Eddies, STeMs, FLuX, etc.

Operators connected through “Fjords”
– queue-based framework unifying push & pull.
– Fjords will also allow us to easily mix and match streaming and stored data sources.
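
The queue-based push & pull idea can be illustrated with a small sketch. This is not the TelegraphCQ Fjords API: `FjordQueue`, `run_selection`, `EMPTY`, and the `budget` parameter are hypothetical names chosen for illustration. A source pushes tuples into a queue when they arrive, and a downstream operator pulls from it without blocking, so it can hand control back when no data is available.

```python
from collections import deque

EMPTY = object()  # sentinel returned when a queue has nothing to offer


class FjordQueue:
    """Hypothetical queue between two operators (not the TelegraphCQ API)."""

    def __init__(self):
        self._items = deque()

    def push(self, item):
        # Push side: a streaming source deposits a tuple whenever it arrives.
        self._items.append(item)

    def pull(self):
        # Pull side: the consumer asks for a tuple but never blocks;
        # it gets EMPTY back and is free to do other work.
        return self._items.popleft() if self._items else EMPTY


def run_selection(source, sink, predicate, budget):
    """Move up to `budget` tuples from source to sink, keeping matches."""
    handled = 0
    while handled < budget:
        item = source.pull()
        if item is EMPTY:
            break  # nothing available right now; return control to the caller
        if predicate(item):
            sink.push(item)
        handled += 1
    return handled
```

Because both streaming sources and stored-data scans can feed the same kind of queue, the selection operator does not need to know which one it is reading from.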

Page 17

Extreme Adaptivity

This is the region that we are exploring in the Telegraph project.

[Figure: spectrum of adaptivity granularity: static plans; late binding; inter-operator; intra-operator; per tuple]

Labels on the figure: current DBMS; ???; Dynamic, Parametric, Competitive; Query Scrambling, MidQuery Re-opt; XJoin, DPHJ, Convergent QP; Eddies, CACQ; PSoup; ???

Traditional query optimization depends on statistical knowledge of the data and a stable environment.

The streaming world has neither.

Page 18

Adaptivity Overview [Avnur & Hellerstein 2000]

• How to order and reorder operators over time?

– Traditionally, use performance, economic/admin feedback

– won’t work for never-ending queries over volatile streams

• Instead, use adaptive record routing.

Reoptimization = change in routing policy

[Figure: a static dataflow over operators A, B, C, D next to an eddy connected to operators A, B, C, D]
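
A minimal sketch of adaptive record routing, under stated assumptions: the operators are plain filters, and the eddy sends each tuple first to the pending filter with the lowest observed pass rate. This greedy policy is a simplification chosen for illustration and is not the routing policy of the cited paper. The names `Filter` and `eddy` are hypothetical.

```python
class Filter:
    """A selection operator that tracks how many tuples it has passed."""

    def __init__(self, name, predicate):
        self.name = name
        self.predicate = predicate
        self.seen = 0
        self.passed = 0

    def pass_rate(self):
        # Unseen filters report 0.0 so that they get explored first.
        return self.passed / self.seen if self.seen else 0.0

    def apply(self, tup):
        self.seen += 1
        ok = bool(self.predicate(tup))
        self.passed += ok
        return ok


def eddy(tuples, filters):
    """Route every tuple through all filters, choosing the order per tuple."""
    out = []
    for tup in tuples:
        pending = list(filters)
        alive = True
        while pending and alive:
            # Routing decision: try the filter most likely to drop the tuple.
            op = min(pending, key=Filter.pass_rate)
            pending.remove(op)
            alive = op.apply(tup)
        if alive:
            out.append(tup)
    return out
```

In this sketch, "reoptimization" never rebuilds a plan. The order in which filters are visited drifts as the observed pass rates change, which is the sense in which a change in routing policy replaces a change of plan.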

Page 19

The TelegraphCQ Architecture

[Figure: TelegraphCQ architecture diagram. Labels: TelegraphCQ Front End (Listener, Parser, Planner, Mini-Executor, Catalog); Proxy; TelegraphCQ Wrapper ClearingHouse (Wrappers); Shared Memory (Query Plan Queue, Eddy Control Queue, Query Result Queues); Shared Memory Buffer Pool; Disk; TelegraphCQ Back End (Modules, Scans, CQEddy, Split), shown twice]

A single CQEddy can encode multiple queries.

Page 20

The StreaQuel Query Language

SELECT projection_list

FROM from_list

WHERE selection_and_join_predicates

ORDERED BY

TRANSFORM…TO

WINDOW…BY

Target language for TelegraphCQ

Windows can be applied to individual streams

Window movement is expressed using a “for loop” construct in the “transform” clause

We’re not completely happy with our syntax at this point.

Page 21

Example Window Query: Landmark

[Figure: Timeline from 0 to 60 in steps of 5, with the Window over stream ST drawn for NOW = 40, NOW = 41, ..., NOW = 45, ..., NOW = 50 (NOW = t)]
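
A rough sketch of a landmark window, where one edge stays fixed and the other follows NOW. The figure's fixed edge is not recoverable from the transcript, so `start` below is an assumption, and the function name `landmark_windows` and the `(timestamp, value)` tuple layout are hypothetical. The loop plays the role of the “for loop” that moves the window.

```python
def landmark_windows(stream, start, first_now, last_now):
    """Yield (now, window) pairs; the window covers start <= timestamp <= now.

    `stream` is a list of (timestamp, value) pairs. The left edge `start`
    is the landmark and never moves; only the right edge advances.
    """
    for now in range(first_now, last_now + 1):
        yield now, [(ts, v) for ts, v in stream if start <= ts <= now]
```

With `start=40`, `first_now=40`, and `last_now=50`, the window grows by one time unit per step, in contrast to a sliding window whose two edges move together.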

Page 22

Current Status - TelegraphCQ

System developed by modifying PostgreSQL.

Initial Version released Aug 03
– Open Source (PostgreSQL license)
– Shared joins with windows and aggregates
– Archived/unarchived streams
– Next major release planned this summer.

Initial users include
– Network monitoring project at LBL (Netlogger)
– Intrusion detection project at Eurecom (France)
– Our own project on Sensor Data Processing
– Class projects at Berkeley, CMU, and ???

Visit http://telegraph.cs.berkeley.edu for more information.

Page 23

Query-based interface to sensor networks

Developed on TinyOS/Motes

Benefits

– Ease of programming and retasking

– Extensible aggregation framework

– Power-sensitive optimization and adaptivity

Sam Madden (Ph.D. Thesis) in collaboration with Wei Hong (Intel).

http://telegraph.cs.berkeley.edu/tinydb

SELECT MAX(mag)
FROM sensors
WHERE mag > thresh
SAMPLE PERIOD 64ms

[Figure labels: App; Query, Trigger; Data; TinyDB; Sensor Network]

Page 24

Declarative Queries in Sensor Nets

“Report the light intensities of the bright nests.”

SELECT nestNo, light
FROM sensors
WHERE light > 400
EPOCH DURATION 1s

Sensors

Epoch  nestNo  Light  Temp  Accel  Sound
0      1       455    x     x      x
0      2       389    x     x      x
1      1       422    x     x      x
1      2       405    x     x      x

Many sensor network applications can be described using query language primitives.
– Potential for tremendous reductions in development and debugging effort.

Page 25

Aggregation Query Example

“Count the number of occupied nests in each loud region of the island.”

SELECT region, CNT(occupied), AVG(sound)
FROM sensors
GROUP BY region
HAVING AVG(sound) > 200
EPOCH DURATION 10s

Regions w/ AVG(sound) > 200:

Epoch  region  CNT(…)  AVG(…)
0      North   3       360
0      South   3       520
1      North   3       370
1      South   3       520
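
The GROUP BY / HAVING logic of this query can be mimicked for a single epoch in a few lines. This is only a sketch of the query's semantics, not how TinyDB evaluates it. The function name `loud_region_counts` and the reading layout (dicts with `region`, `occupied`, and `sound` keys) are hypothetical, and `CNT(occupied)` is assumed to count readings whose `occupied` flag is true.

```python
from collections import defaultdict


def loud_region_counts(readings, threshold=200):
    """Return {region: (occupied_count, avg_sound)} for loud regions only."""
    groups = defaultdict(list)
    for r in readings:                      # GROUP BY region
        groups[r["region"]].append(r)

    result = {}
    for region, rows in groups.items():
        avg_sound = sum(r["sound"] for r in rows) / len(rows)
        if avg_sound > threshold:           # HAVING AVG(sound) > threshold
            cnt = sum(1 for r in rows if r["occupied"])
            result[region] = (cnt, avg_sound)
    return result
```

Running this once per epoch over that epoch's readings would yield one output row per loud region, matching the shape of the result table on the slide.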

Page 26

Query Language (TinySQL)

SELECT <aggregates>, <attributes>
[FROM {sensors | <buffer>}]
[WHERE <predicates>]
[GROUP BY <exprs>]
[SAMPLE PERIOD <const> | ONCE]
[INTO <buffer>]
[TRIGGER ACTION <command>]

Page 27

Sensor Queries @ 10000 Ft

[Figure: network of nodes A, B, C, D, E, F with a Query arrow and the node sets {D,E,F}, {B,D,E,F}, {A,B,C,D,E,F}]

Written in SQL With Extensions For:
• Sample rate
• Offline delivery
• Temporal Aggregation

(Almost) All Queries are Continuous and Periodic

Pages 28-33

In-Network Processing: Aggregation

SELECT COUNT(*) FROM sensors

[Figure, animated over six slides: a grid of Sensor # (1 to 5) against Interval # (4 down to 1) within one Epoch, next to a five-node network. The frames step through Interval 4, Interval 3, Interval 2, Interval 1, and then Interval 4 of the next epoch. The recoverable grid entries are a partial count of 1 in Interval 4, 2 in Interval 3, 1 and 3 in Interval 2, and 5 in Interval 1.]
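
The COUNT example can be sketched as partial counts merged up a routing tree. The tree below is hypothetical (the topology in the figure is not recoverable from the transcript), and the function names are chosen for illustration. `children` maps each node to the nodes that report to it.

```python
def in_network_count(children, node):
    """Partial COUNT for the subtree rooted at `node`."""
    partial = 1  # the node's own reading
    for child in children.get(node, []):
        # Each child sends one partial count; the parent adds it to its own.
        partial += in_network_count(children, child)
    return partial


def messages_with_aggregation(children):
    """Every non-root node sends exactly one partial count to its parent."""
    return sum(len(kids) for kids in children.values())


def messages_without_aggregation(children, node, depth=0):
    """Every reading is forwarded hop by hop to the root: sum of node depths."""
    return depth + sum(
        messages_without_aggregation(children, child, depth + 1)
        for child in children.get(node, [])
    )
```

For a hypothetical five-node tree such as `{1: [2, 3], 3: [4], 4: [5]}` rooted at node 1, the in-network version needs one message per non-root node, while forwarding raw readings costs one message per hop for every reading.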

Page 34

In Network Aggregation: Example Benefits

2500 Nodes

50x50 Grid

Depth = ~10

Neighbors = ~20

[Figure: bar chart “Total Bytes Xmitted vs. Aggregation Function”. x-axis: Aggregation Function (EXTERNAL, MAX, AVERAGE, COUNT, MEDIAN); y-axis: Total Bytes Xmitted, 0 to 100000]

Page 35

Taxonomy of Aggregates

TinyDB insight: classify aggregates according to various functional properties

– Yields a general set of optimizations that can automatically be applied

Property               Examples                                      Affects
Partial State          MEDIAN: unbounded, MAX: 1 record              Effectiveness of TAG
Duplicate Sensitivity  MIN: dup. insensitive, AVG: dup. sensitive    Routing Redundancy
Exemplary vs. Summary  MAX: exemplary, COUNT: summary                Applicability of Sampling, Effect of Loss
Monotonic              COUNT: monotonic, AVG: non-monotonic          Hypothesis Testing, Snooping
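
Two of these properties, partial state size and duplicate sensitivity, can be made concrete with a small sketch. The split of each aggregate into an initializer, a merge function, and an evaluator is an assumption made here for illustration and is not taken from the slide. The names `AGGREGATES` and `aggregate` are hypothetical.

```python
# name -> (init: value -> state, merge: (state, state) -> state, evaluate)
AGGREGATES = {
    # MAX keeps a single value as its partial state.
    "MAX": (lambda v: v, max, lambda s: s),
    # COUNT keeps a single integer.
    "COUNT": (lambda v: 1, lambda a, b: a + b, lambda s: s),
    # AVG keeps a (sum, count) pair and divides only at the end.
    "AVG": (
        lambda v: (v, 1),
        lambda a, b: (a[0] + b[0], a[1] + b[1]),
        lambda s: s[0] / s[1],
    ),
}


def aggregate(name, values):
    """Fold a non-empty list of readings through the named aggregate."""
    init, merge, evaluate = AGGREGATES[name]
    state = init(values[0])
    for v in values[1:]:
        state = merge(state, init(v))
    return evaluate(state)
```

Feeding the same reading in twice, as redundant routing might, leaves MAX unchanged but shifts COUNT and AVG, which is why duplicate sensitivity limits how much routing redundancy an aggregate can tolerate.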

Page 36

Current Status - TinyDB

System built on top of TinyOS (~10K lines embedded C code)

Latest release 9/2003

Several deployments including redwoods at UC Botanical Garden

Visit http://telegraph.cs.berkeley.edu/tinydb for more information.

[Figure: node placement by height, on a scale labeled 36m. 33m: 111; 32m: 110; 30m: 109, 108, 107; 20m: 106, 105, 104; 10m: 103, 102, 101]

[Figure: “Temperature vs. Time” and “Humidity vs. Time” plots for nodes 101, 104, 109, 110, 111. x-axis: Date, 7/7/03 9:40 to 9/7/03 10:03; y-axes: Temperature (C), 8 to 33, and Rel Humidity (%), 35 to 95]

Page 37

Putting It All Together?

TelegraphCQ

TinyDB

Page 38

Ursa - A HiFi Implementation

Current effort towards building an integrated infrastructure that spans the large scale in:
– Time
– Geography
– Resources

Ursa-Minor (TinyDB-based)

Mid-tier (???)

Ursa-Major (TelegraphCQ w/Archiving)

Page 39

TelegraphCQ/TinyDB Integration

Fjords [Madden & Franklin 02] provide the dataflow plumbing necessary to use TinyDB as a data stream.

Main issues revolve around what to run where.
– TCQ is a query processor
– TinyDB is also a query processor
– Optimization criteria include: total cost, response time, answer quality, answer likelihood, power conservation on motes, …

Project on-going, should work by summer.

Related work: Gigascope work at AT&T

Page 40

TCQ-based Overlay Network

TCQ is primarily a single node system
– Flux operators [Shah et al 03] support cluster-based processing.

Want to run TCQ at each internal node.

Primary issue is support for wide-area temporal and geographic aggregation.
– In an adaptive manner, of course

Currently under design.

Related work: Astrolabe, IRISNet, DBIS, …

Page 41

Querying the Past, Present, and Future

Need to handle archived data
– Adaptive compression can reduce processing time.
– Historical queries
– Joins of Live and Historical Data
– Deal with later arriving detail info

Archiving Storage Manager - A Split-stream SM for stream and disk-based processing.

Initial version of new SM running.

Related Work: Temporal and Time-travel DBs

Page 42

XML, Integration, and Other Realities

Eventually need to support XML

Must integrate with existing enterprise apps.

In many areas, standardization well underway

Augmenting moving data

Related Work: YFilter [Diao & Franklin 03], Mutant Queries [Papadimos et al. OGI], 30+ years of data integration research, 10+ years of XML research, …

[Figure labels: High Fan-in; High Fan-out]

Page 43

Conclusions

Sensors, RFIDs, and other data collection devices enable real-time enterprises.

These will create high fan-in systems.

Can exploit recent advances in streaming and sensor data management.

Lots to do!