Data Streams, Message Brokers, Sensor Nets, and Other Strange Places to Run Database Queries Michael Franklin UC Berkeley July 2003.
Data Streams, Message Brokers, Sensor Nets, and Other Strange Places to Run Database Queries
Michael Franklin
UC Berkeley
July 2003
Data Everywhere
• Increasingly ubiquitous networking at all scales: ad hoc sensor nets, wireless, global Internet.
• Explosion in the number, types, and locations of data sources and sinks: mobile devices, P2P networks, data centers.
• Emerging software infrastructure to put it all together: pub/sub, XML, web services, …
Data Management in a Networked World
• Data is the crucial resource for emerging networked applications.
• Database techniques are all about data organization and access. They can be adapted for network-centric environments.
• In particular, query processing can play a central role in a number of non-traditional settings.
“When processing, storage, and transmission cost micro-dollars, the only real value is the data and its organization.” (Jim Gray’s 1998 Turing Award Paper)
Networked Data Management Projects @ UCB-DB Group
• GridDB - Relational interaction model for Scientific Grid Computing. [SIGMOD 03 Demo]
• MobiScope - Distributed processing for Location-based Services. [MDM 03]
• PIER - P2P Data Management. [VLDB 03]
• TelegraphCQ - Adaptive Dataflow Processing for Data Streams. [CIDR 03; SIGMOD 03 Demo]
• TinyDB - Sensor Networks for environmental monitoring. [OSDI 02; SIGMOD 03]
• YFilter - XML Message Brokering. [ICDE 02 Demo; VLDB 03]
Issues: operator placement, data placement, physical operators, caching, replication, synchronization, …
Beyond Emps and Depts
In emerging networked data environments, queries can also be used for:
• Monitoring
• Real-time Analysis
• Actuation
• Routing
• Transformation
• Service Composition
• Definition, Naming, and Access Rights
New QP Scenarios
• Sensor Networks
• Message Brokers
• Data Streams
• Information/Application Integration
New QP Scenarios: Sensor Networks
Many sensor network applications can be described using query language primitives.
Potential for tremendous reductions in development and debugging effort.
Aggregation Query Example
“Count the number of occupied nests in each loud region of the island.”
SELECT region, CNT(occupied), AVG(sound)
FROM sensors
GROUP BY region
HAVING AVG(sound) > 200
EPOCH DURATION 10s

Result (regions w/ AVG(sound) > 200):
Epoch   Region   CNT(…)   AVG(…)
0       North    3        360
0       South    3        520
1       North    3        370
1       South    3        520

[Figure: sensor nodes A, B, C, D, E, F]
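The per-epoch semantics of the aggregation query can be illustrated with a small Python sketch. This is not TinyDB code: it evaluates the GROUP BY / HAVING logic centrally over one epoch of readings, and the function name `aggregate_epoch`, the reading layout, and the sample values are assumptions made for illustration. CNT(occupied) is assumed here to count readings whose `occupied` flag is true.

```python
from collections import defaultdict

def aggregate_epoch(readings, min_avg_sound=200):
    """Evaluate one epoch: group by region, keep regions with AVG(sound) > threshold.

    Returns {region: (count_of_occupied_readings, average_sound)}.
    """
    groups = defaultdict(list)
    for reading in readings:
        groups[reading["region"]].append(reading)

    result = {}
    for region, rows in groups.items():
        occupied_count = sum(1 for r in rows if r["occupied"])
        avg_sound = sum(r["sound"] for r in rows) / len(rows)
        if avg_sound > min_avg_sound:      # the HAVING clause
            result[region] = (occupied_count, avg_sound)
    return result

# Hypothetical readings for a single epoch (values invented for illustration).
READINGS = [
    {"region": "North", "occupied": True, "sound": 350},
    {"region": "North", "occupied": True, "sound": 360},
    {"region": "North", "occupied": True, "sound": 370},
    {"region": "South", "occupied": True, "sound": 500},
    {"region": "South", "occupied": True, "sound": 520},
    {"region": "South", "occupied": True, "sound": 540},
    {"region": "West", "occupied": True, "sound": 100},
    {"region": "West", "occupied": False, "sound": 120},
]

if __name__ == "__main__":
    print(aggregate_epoch(READINGS))
```

With these sample readings the quiet West region is dropped by the threshold, and North and South are reported with a count of 3 each. In a sensor network the same computation would be repeated once per epoch (every 10 s in the query above).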
Sensor Queries @ 10000 Ft
• Insight: the root can provide information that will suppress readings that cannot affect the final aggregate value.
• E.g., tell all the nodes that the MIN is definitely < 50; nodes with value ≥ 50 need not participate.
• Depends on monotonicity.
• How is the hypothesis computed?
  Blind guess
  Statistically informed guess
  Observation over first few levels of tree / rounds of aggregate
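The suppression idea for MIN can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the network is flattened to a list of node values, each reporting node costs one message, and a wrong hypothesis is handled by falling back to a full collection. The function name `collect_min` and the cost model are hypothetical, not taken from the slides.

```python
def collect_min(values, hypothesis=None):
    """Return (min_value, messages_sent) for a MIN aggregate.

    If the root announces "MIN is definitely < hypothesis", nodes whose value
    is >= hypothesis stay silent. This is safe for MIN because a value at or
    above the bound can never lower the minimum (monotonicity).
    """
    if hypothesis is None:
        return min(values), len(values)

    reporting = [v for v in values if v < hypothesis]
    if not reporting:
        # The hypothesis was wrong: no node is below the bound.
        # Assumed fallback: collect from every node without suppression.
        return min(values), len(values)
    return min(reporting), len(reporting)

# Hypothetical sensor values.
VALUES = [72, 45, 88, 51, 39, 64]

if __name__ == "__main__":
    print(collect_min(VALUES))                  # no suppression
    print(collect_min(VALUES, hypothesis=50))   # only values below 50 report
```

With the hypothesis "MIN < 50", only the two nodes holding 45 and 39 report, yet the answer is unchanged. How good the saving is depends on how the hypothesis is chosen, which is exactly the question the slide raises.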
New QP Scenarios: Message Brokers
Web Services/Message Brokers
• A platform for dynamic, loosely-coupled integration of enterprise applications and data.
• Interaction accomplished through exchange of messages in the wide area.
(e.g., Adam Bosworth’s VLDB 02 keynote: http://www.cs.ust.hk/vldb2002/VLDB2002-proceedings/slides/S01P01slides.pdf)
The challenge is to efficiently and quickly match incoming XML documents against the potentially huge set of user profiles.
XQuery-based Subscriptions
A query consists of a constant tag and an FLWR expression:
• A for clause: a variable and a path expression
• An optional where clause: conjunctive predicates
• A return clause: interleaved constant tags and path expressions
• where and return clause paths are relative
<sections>{for $s in document("doc.xml")//section where $s//figure/title = "XML processing" return <section>
For large-scale systems, shared processingshared processing is essential.
YFilter uses an NFA-based approach to share path matching work among queries.
Location step   NFA fragment (transition labels)
/a              a
//a             * a
/*              *
//*             * *
Constructing a Query NFA
Concatenate NFA fragments for location steps in a path expression.
[Figure: the NFA fragments for /a and //b are concatenated to form the NFA for the query “/a//b”, with transition labels a, *, b.]
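The step-by-step concatenation can be sketched as follows. The code splits a path expression into location steps and lists the transition labels of the concatenated fragments, with `*` standing for the wildcard state introduced by a `//` step. The function names `parse_steps` and `fragment_labels` are hypothetical, and only the simple `/name`, `//name`, `/*`, and `//*` step forms named on the slides are handled.

```python
def parse_steps(path):
    """Split a path such as '/a//b' into (axis, name) location steps."""
    steps = []
    i = 0
    while i < len(path):
        if path.startswith("//", i):
            axis, i = "descendant", i + 2
        elif path[i] == "/":
            axis, i = "child", i + 1
        else:
            raise ValueError(f"expected '/' at position {i} in {path!r}")
        j = i
        while j < len(path) and path[j] != "/":
            j += 1
        if j == i:
            raise ValueError(f"empty location step in {path!r}")
        steps.append((axis, path[i:j]))
        i = j
    return steps

def fragment_labels(steps):
    """List the transition labels obtained by concatenating one fragment per step."""
    labels = []
    for axis, name in steps:
        if axis == "descendant":
            labels.append("*")   # wildcard state that skips intermediate elements
        labels.append(name)
    return labels

if __name__ == "__main__":
    steps = parse_steps("/a//b")
    print(steps)
    print(fragment_labels(steps))
```

For the query "/a//b" this yields the label sequence a, *, b, which is the sequence of labels the slide shows for the concatenated NFA.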
Constructing the Combined NFA
Q1 = /a/b
Q2 = /a/c
Q3 = /a/b/c
Q4 = /a//b/c
Q5 = /a/*/b
Q6 = /a//c
Q7 = /a/*/*/c
Q8 = /a/b/c
[Figure: combined NFA for Q1 to Q8, with transitions labeled a, b, c, and *, and accepting states labeled {Q1}, {Q2}, {Q3, Q8}, {Q4}, {Q5}, {Q6}, {Q7}.]
NFA Execution
An XML fragment: <a> <b> <c> </c> </b> </a>
[Figure: the combined NFA with numbered states, alongside the runtime stack of active state sets at each step: initial, read <a>, read <b>, read <c>, read </c>, read </b>, read </a>. Matches shown in the figure: Q1; Q3, Q8; Q5, Q6, Q4.]
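A minimal Python sketch of stack-based execution over a shared NFA is given below. It is an illustration of the general technique, not YFilter's implementation: the `State` class, the `build` and `run` functions, and the event encoding are assumptions, and predicates are not handled. Queries with a common prefix follow the same transitions during construction, so they share states. The example uses a subset of the queries listed on the slide.

```python
import re

class State:
    def __init__(self):
        self.trans = {}         # element name or "*" -> next State
        self.eps = None         # epsilon edge to the self-looping state used for "//"
        self.self_loop = False  # True for the "//" state, which stays active at any depth
        self.accepts = set()    # ids of queries accepted in this state

def build(queries):
    """Build one NFA for all queries; common prefixes reuse existing states."""
    root = State()
    for qid, path in queries.items():
        state = root
        for sep, name in re.findall(r"(//|/)([^/]+)", path):
            if sep == "//":
                if state.eps is None:
                    state.eps = State()
                    state.eps.self_loop = True
                state = state.eps
            state = state.trans.setdefault(name, State())
        state.accepts.add(qid)
    return root

def closure(states):
    """Add the epsilon successor of every state in the set."""
    return set(states) | {s.eps for s in states if s.eps is not None}

def run(root, events):
    """Process ('start'|'end', name) events; return (element, matched ids) per start tag."""
    stack = [closure({root})]
    matches = []
    for kind, name in events:
        if kind == "end":
            stack.pop()              # backtrack to the parent's active states
            continue
        active = set()
        for s in stack[-1]:
            for label in (name, "*"):
                if label in s.trans:
                    active.add(s.trans[label])
            if s.self_loop:
                active.add(s)
        active = closure(active)
        matched = sorted(q for s in active for q in s.accepts)
        matches.append((name, matched))
        stack.append(active)
    return matches

QUERIES = {"Q1": "/a/b", "Q2": "/a/c", "Q3": "/a/b/c", "Q4": "/a//b/c",
           "Q6": "/a//c", "Q7": "/a/*/*/c", "Q8": "/a/b/c"}
EVENTS = [("start", "a"), ("start", "b"), ("start", "c"),
          ("end", "c"), ("end", "b"), ("end", "a")]

if __name__ == "__main__":
    for element, matched in run(build(QUERIES), EVENTS):
        print(element, matched)
```

Pushing the active state set on each start tag and popping it on each end tag is what lets the automaton backtrack as the parser leaves an element, so a single pass over the document serves every query at once.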
Performance Evaluation
[Figure: MQPT (ms) vs. number of queries (x1000) for xfilter(lb), hybrid, and yfilter. Varying number of distinct queries (NITF, D=6, W=0.2, //=0.2).]
With YFilter, path matching is no longer the dominant cost!
• YFilter: prefix sharing
• XFilter (list balance): no sharing
• Hybrid approach: share substrings containing ‘/’ only
• YFilter is significantly faster (around 30 ms for 150K queries)
• Parsing not included: Xerces (168 ms), Java XML Pack (141 ms), Saxon (86 ms).
Message Transformation
• Change YFilter to output streams of “path tuples”.
• Each path tuple contains a sequence of node ids representing the elements that matched the path.
• This output is post-processed using relational-style operators to produce customized messages.
• Three approaches (differing in the extent to which they push work to the engine):
  PathSharing-F: For clause paths only
  PathSharing-FW: For & Where clause paths
  PathSharing-FWR: For, Where & Return clause paths
• Inherent tension between path sharing and result customization!
Message Broker – Wrap Up
Sharing is the key to performance:
• NFA provides excellent scalability/performance.
• PathSharing-FWR performs best, when combined with optimizations based on the queries and DTD.
• When the post-processing is shared, even more scalability can be achieved. This sharing is facilitated by using relational-like query plans.
On-going work - How to deploy in the wide area?
• Distributed Filtering and Content Delivery Network
• Combining distributed query processing and state-of-the-art application-level multicast protocols.
• What semantics can/should be provided?
For more information see: www.cs.berkeley.edu/~daioyl/yfilter
New QP Scenarios: Data Streams
Monitoring (2): Data Streams
• Streaming data: network monitors, news feeds, stock tickers
• B2B and enterprise apps: supply-chain, CRM, trade reconciliation, order processing, etc.
• (Quasi) real-time flow of events and data
• Must manage these flows to drive business (and other) processes.
• Mine flows to create and adjust business rules.
• Can also “tap into” flows for on-line analysis.
TelegraphCQ Overview
An adaptive system for large-scale shared dataflow processing.
Based on an extensible set of operators:
1) Ingress (data access) operators
• A generalization of the symmetric hash join (n-way)
• SteMs maintain intermediate state for multiple joins.
• Use Eddy to route tuples through the necessary modules.
• SteMs + Eddy reduce need for optimizer, increasing adaptivity in volatile streaming environments.
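For reference, the classic two-way symmetric hash join that the slide says is being generalized can be sketched as below. This is not TelegraphCQ code and does not model SteMs or the Eddy: the class name `SymmetricHashJoin` and its `insert` method are hypothetical. The point it illustrates is that each arriving tuple is first stored in its own input's hash table and then probes the other input's table, so results are produced incrementally and neither stream has to be read to completion first.

```python
from collections import defaultdict

class SymmetricHashJoin:
    """Two-way equi-join over streams: build into own table, probe the other."""

    def __init__(self):
        # One hash table of per-input state for each side of the join.
        self.tables = {"left": defaultdict(list), "right": defaultdict(list)}

    def insert(self, side, key, item):
        """Add an arriving tuple and return the join results it produces."""
        other = "right" if side == "left" else "left"
        self.tables[side][key].append(item)           # build step
        partners = self.tables[other].get(key, [])    # probe step
        if side == "left":
            return [(item, p) for p in partners]
        return [(p, item) for p in partners]

if __name__ == "__main__":
    join = SymmetricHashJoin()
    arrivals = [("left", 1, "L1"), ("right", 1, "R1"), ("right", 2, "R2"),
                ("left", 1, "L1b"), ("left", 2, "L2")]
    for side, key, item in arrivals:
        print(side, key, item, "->", join.insert(side, key, item))
```

Because tuples from either side can arrive in any order, the operator never blocks on one input, which is the property that makes this style of join a natural fit for streams.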