The Stanford Data Streams Research Project
Profs. Rajeev Motwani & Jennifer Widom
And a cast of full- and part-time students: Arvind Arasu, Brian Babcock, Shivnath Babu, Mayur Datar, Gurmeet Manku, Liadan O’Callaghan, Justin Rosenstein, Qi Sun, Rohit Varma
Data Streams
• Traditional DBMS -- data stored in finite, persistent data sets
• New applications -- data as multiple, continuous, rapid, time-varying data streams
– Network monitoring and traffic engineering
– Security applications
– Telecom call records
– Financial applications
– Web logs and click-streams
– Sensor networks
– Manufacturing processes
Query Example 3
• Find total connection time for each caller (relational grouping and aggregation)
SELECT O.caller, sum(O.end - O.start)
FROM Outgoing O
GROUP BY O.caller
• Cannot provide result in (append-only) stream: each new call record changes a previously reported total
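To make the append-only problem concrete, here is a minimal Python sketch of the grouping query evaluated incrementally. It is an illustration only, not the STREAM system's implementation: the function name `running_connection_time`, the tuple layout `(caller, start, end)`, and the sample call records are all assumptions made for this example.

```python
from collections import defaultdict


def running_connection_time(outgoing):
    """Yield (caller, total_so_far) each time a caller's total changes.

    `outgoing` is any iterable of (caller, start, end) tuples, standing in
    for the Outgoing stream (hypothetical layout, times in seconds).
    """
    totals = defaultdict(int)
    for caller, start, end in outgoing:
        totals[caller] += end - start
        # The new total supersedes any value emitted earlier for this caller.
        yield caller, totals[caller]


# Hypothetical call records, in arrival order.
calls = [("alice", 0, 60), ("bob", 10, 40), ("alice", 100, 130)]
updates = list(running_connection_time(calls))
print(updates)
```

In this run the caller "alice" appears twice in the output, first with 60 and later with 90. A consumer that treats the output as an append-only stream of final answers would hold a stale row, so the result has to be delivered as something updatable, such as a relation or a stream of revisions.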
stanfordstreamdatamanager 20
Project Goal
Reconsider all aspects of data management and processing in the presence of data streams
Remainder of Talk
• Data stream model
• Queries over data streams
– Language, semantics, evaluation & optimization
• DSMS query processing architecture and system internals
• Results to date
• Ongoing work
• Related work
Data Model
• Database: relations + data streams
• Stream characteristics
– Type of data (schema)
– Data distribution
– Flow rate
– Stability of distribution and flow
– Ordering and other constraints
– Synchronization of multiple streams
– Distributed streams
Some Results to Date
• Algorithms on data streams
– Online clustering [FOCS 2000, ICDE 2002]
– Online quantiles [SIGMOD 98, SIGMOD 99]
– Statistics over sliding windows [SODA 2002]
– Online frequency counting
• Theory of stream query processing
– Memory requirements of stream queries [PODS 2002]
• System design
– STREAM: STanford stREam datA Manager
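As an illustration of what "online frequency counting" involves, the sketch below implements the classic Misra-Gries summary in Python. The slides do not say which algorithm the project used, so this is a swapped-in standard technique rather than the project's own method; the function name `misra_gries` and the sample stream are assumptions for this example.

```python
def misra_gries(stream, k):
    """One-pass frequency summary using at most k - 1 counters.

    After n items, every item occurring more than n / k times is present
    in the summary, and each stored count undercounts the true frequency
    by at most n / k.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all, dropping those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical stream of 12 items; "a" occurs 6 times, above n / k = 4.
stream = ["a"] * 6 + ["b"] * 3 + ["c", "d", "e"]
summary = misra_gries(stream, k=3)
print(summary)
```

Memory use depends only on `k`, not on the stream length, which is the property that matters when the input is unbounded. The stored counts are lower bounds, so a second pass or an error allowance is needed if exact frequencies are required.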
STREAM System Implementation
• Comprehensive DSMS query processor
• Broad suite of operators and synopses
• Sophisticated “developer’s workbench” interface
– Submit queries in extended SQL or algebra
– Submit or edit query plans in XML or GUI
– Query plan execution visualizer
– On-the-fly modification of memory allocation, scheduling policies, etc.
Ongoing Work
• Algebra for streams
• Synopses and algorithmic issues
• Memory management issues
• Exploiting constraints on streams
• Approximation in query processing
• Distributed stream processing
• System development
Ongoing Work -- Constraints
• Exploiting constraints on streams in query