Continuous Data Stream Processing
Music Virtual Channel – extensions
Data Stream Monitoring – tree pattern mining
Continuous Query Processing – sequence queries
Music Virtual Channel Extensions
[Figure: system architecture showing players 1, 2, …, N, the music collections, the Internet, and the components V.C. player, filtering engine, music channel simulator, interface, profile monitor, cluster monitor, channel monitor, favorite channel, cluster coordinator, peer search engine, profile database, MusicXML database, and XML filtering engine.]
Continuous Data Stream Management
An Extension on Virtual Channel
After a player starts a range (or kNN) search:
• It updates its profile periodically
• The search results are continuously maintained
[Figure: bar charts of the genre profiles of the V.C. player issuing the query and of the peer V.C. players. Horizontal axis: POP, BLUE, ROCK, LATIN, JAZZ, DANCE; vertical axis: 0% to 50%.]
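A minimal sketch of how a range or kNN search over player profiles could be re-evaluated after a profile update. It assumes a profile is a vector of genre shares over the six genres in the chart and that Euclidean distance is the similarity measure; the names `profile`, `range_search`, and `knn_search` and all the numbers are hypothetical, and no index is used here.

```python
import math

GENRES = ["POP", "BLUE", "ROCK", "LATIN", "JAZZ", "DANCE"]

def profile(*shares):
    """Build a genre profile from six listening shares (hypothetical layout)."""
    return dict(zip(GENRES, shares))

def distance(p, q):
    """Euclidean distance between two profiles (an assumed metric)."""
    return math.sqrt(sum((p[g] - q[g]) ** 2 for g in GENRES))

def range_search(query, peers, radius):
    """Names of all peers whose profile lies within `radius` of the query."""
    return sorted(name for name, prof in peers.items()
                  if distance(query, prof) <= radius)

def knn_search(query, peers, k):
    """Names of the k peers closest to the query profile, nearest first."""
    return sorted(peers, key=lambda name: distance(query, peers[name]))[:k]

query = profile(0.4, 0.1, 0.3, 0.0, 0.1, 0.1)
peers = {
    "peer1": profile(0.4, 0.1, 0.2, 0.1, 0.1, 0.1),
    "peer2": profile(0.0, 0.1, 0.1, 0.4, 0.3, 0.1),
    "peer3": profile(0.3, 0.2, 0.3, 0.0, 0.1, 0.1),
}

before = knn_search(query, peers, 2)

# A periodic profile update arrives from peer2; the continuous query is
# simply re-evaluated here, which is the naive baseline.
peers["peer2"] = profile(0.4, 0.1, 0.3, 0.0, 0.1, 0.1)
after = knn_search(query, peers, 2)

print(before, after)
```

Re-running the search on every update scans all peers each time, which is the cost that an index for continuous queries would aim to avoid.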
An Extension on Virtual Channel
Compared with the clustering engine
• A flexible definition of “clusters”
• Update is more natural than insertion/deletion
• No need for parameter setting and re-clustering
• Indexing can relieve the pain of frequent updates
Compared with the problem of moving objects
• Movements in a high-dimensional feature space
• In most cases every object is also a query
• Prediction of object movement is possible
An Extension on Favorite Channel
When a music piece is played on a channel:
• The corresponding MusicXML file can be obtained
• A query can be a portion of MusicXML or an XQuery expression
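As a rough illustration of matching a query against the MusicXML of the piece being played, the sketch below uses the limited XPath subset of Python's `xml.etree.ElementTree` as a stand-in for XQuery, which the standard library does not provide. The MusicXML fragment, the query string, and the helper `matches` are hypothetical.

```python
import xml.etree.ElementTree as ET

# A tiny, hand-written MusicXML fragment (hypothetical piece).
MUSICXML = """<score-partwise version="3.1">
  <part-list>
    <score-part id="P1"><part-name>Melody</part-name></score-part>
  </part-list>
  <part id="P1">
    <measure number="1">
      <note><pitch><step>C</step><octave>4</octave></pitch><duration>4</duration></note>
      <note><pitch><step>G</step><octave>4</octave></pitch><duration>4</duration></note>
    </measure>
  </part>
</score-partwise>"""

def matches(document, path):
    """True if at least one element of the document satisfies the path query."""
    root = ET.fromstring(document)
    return root.find(path) is not None

# A listener's standing query: "pieces containing the note G4".
favorite_query = ".//note/pitch[step='G'][octave='4']"
print(matches(MUSICXML, favorite_query))
```

Because the query runs against the score file and not the audio, it can be evaluated without touching the playback path.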
An Extension on Favorite Channel
Compared with query segments
• More musical semantics in a query
• Does not interfere with the music playback
• Matching on complex tree structures
  • Common subqueries are still useful
Research Issues
Peer Search Engine
• An indexing method to support continuous query processing for high-dimensional moving objects
• A prediction-based bounding mechanism to reduce the frequency of profile update
XML Filtering Engine
• An online method to enable tree pattern mining over a data stream
• An indexing mechanism to support XML filtering
Discovering Frequent Tree Patterns over Data Streams
Submitted for publication
Problem Definition
As the query trees stream in, find the subtrees that occur more than θ·N times, where N is the number of trees received so far and 0 ≤ θ ≤ 1.
[Figure: query trees T1, T2, T3 stream into STMer, which outputs the frequent tree patterns.]
Problem Definition (Cont.)
Labeled ordered tree
Induced subtree
[Figure: the tree B with children (D, C) differs from the tree B with children (C, D). A tree pattern is shown next to a query tree A with children (B, E), where B has children (C, D).]
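The two notions on this slide can be sketched as a small matching routine. It assumes a tree is a `(label, children)` tuple, that an induced subtree must preserve parent-child edges and the left-to-right order of siblings, and that siblings may be skipped; the function names are hypothetical and this is not necessarily how STMer matches patterns.

```python
def matches_at(pattern, node):
    """True if `pattern` can be embedded with its root mapped onto `node`."""
    p_label, p_children = pattern
    n_label, n_children = node
    if p_label != n_label:
        return False
    # Match the pattern's children, in order, against a subsequence of the
    # node's children; taking the earliest possible match is sufficient.
    i = 0
    for child in n_children:
        if i < len(p_children) and matches_at(p_children[i], child):
            i += 1
    return i == len(p_children)

def is_induced_subtree(pattern, tree):
    """True if `pattern` occurs somewhere in `tree` as an induced subtree."""
    if matches_at(pattern, tree):
        return True
    return any(is_induced_subtree(pattern, child) for child in tree[1])

def leaf(label):
    return (label, [])

query_tree = ("A", [("B", [leaf("C"), leaf("D")]), leaf("E")])

print(is_induced_subtree(("B", [leaf("C"), leaf("D")]), query_tree))
print(is_induced_subtree(("B", [leaf("D"), leaf("C")]), query_tree))
print(is_induced_subtree(("A", [leaf("C")]), query_tree))
```

The second pattern differs from the first only in sibling order, and the third links A directly to its grandchild C, so neither is an induced subtree of the query tree under these assumptions.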
An Example
Given θ = 0.6
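[Figure: trees streaming into STMer. Below, A(B, C) denotes a node A with children B and C.]
Tree 1 arrives: A(B, C)
Frequent Tree Patterns (occurrence > 0.6*1): A, B, C, A(B), A(C), A(B, C)
Tree 2 arrives: B(D, E)
Frequent Tree Patterns (occurrence > 0.6*2): B
Tree 3 arrives: A(B, F)
Frequent Tree Patterns (occurrence > 0.6*3): A, B, A(B)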
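The example can be checked with an exact, brute-force counter. The sketch reads the figure as three query trees arriving in the order A(B, C), B(D, E), A(B, F), which is an assumption about the garbled figure, and it counts each pattern at most once per tree. The helper names are hypothetical, and exhaustive enumeration is only a baseline for tiny trees, not the STMer method.

```python
from collections import Counter
from itertools import product

def rooted_patterns(node):
    """All induced patterns rooted at `node`, as strings such as 'A(B,C)'."""
    label, children = node
    # For every child: either skip it (None) or keep one of its rooted patterns.
    options = [[None] + rooted_patterns(child) for child in children]
    result = []
    for choice in product(*options):
        kept = [c for c in choice if c is not None]
        result.append(label + ("(" + ",".join(kept) + ")" if kept else ""))
    return result

def all_patterns(tree):
    """The set of induced patterns rooted at any node of `tree`."""
    found = set(rooted_patterns(tree))
    for child in tree[1]:
        found |= all_patterns(child)
    return found

def frequent_after_each_tree(stream, theta):
    """Patterns with occurrence > theta * N after each of the N trees so far."""
    counts, snapshots = Counter(), []
    for n, tree in enumerate(stream, start=1):
        counts.update(all_patterns(tree))
        snapshots.append(sorted(p for p, c in counts.items() if c > theta * n))
    return snapshots

stream = [
    ("A", [("B", []), ("C", [])]),
    ("B", [("D", []), ("E", [])]),
    ("A", [("B", []), ("F", [])]),
]
snapshots = frequent_after_each_tree(stream, 0.6)
for n, patterns in enumerate(snapshots, start=1):
    print(n, patterns)
```

The number of patterns per tree grows exponentially with tree size and the counter keeps every pattern ever seen, which is why an exact approach like this does not carry over to an unbounded stream.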
Main Difficulties
The properties of data streams:
• One pass: traditional tree mining methods fail
• Fast input rate: the efficiency issue is critical
• Incremental: an incremental algorithm is required
• Unbounded: approximate counting is needed
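To make "approximate counting" concrete, here is a sketch of Lossy Counting (Manku and Motwani), a well-known one-pass frequency estimator. The slides do not say which technique is used, so this is a swapped-in illustration and not the authors' method. Items would be canonical pattern strings; the class name, `epsilon`, and the toy stream are assumptions.

```python
import math

class LossyCounter:
    """One-pass approximate counts; undercounts by at most epsilon * n."""

    def __init__(self, epsilon):
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)  # number of items per bucket
        self.n = 0                           # items seen so far
        self.entries = {}                    # item -> [count, max undercount]

    def add(self, item):
        self.n += 1
        bucket = math.ceil(self.n / self.width)
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            # The item may have been seen and pruned in earlier buckets.
            self.entries[item] = [1, bucket - 1]
        if self.n % self.width == 0:
            # End of a bucket: drop items that cannot be frequent so far.
            self.entries = {k: v for k, v in self.entries.items()
                            if v[0] + v[1] > bucket}

    def frequent(self, theta):
        """Items whose true count may reach theta * n (no false negatives)."""
        bound = (theta - self.epsilon) * self.n
        return sorted(k for k, v in self.entries.items() if v[0] >= bound)

counter = LossyCounter(epsilon=0.1)
for i in range(50):
    counter.add("B")            # a pattern that keeps recurring
    counter.add(f"rare{i}")     # one-off patterns
print(counter.frequent(0.4), len(counter.entries))
```

The rare patterns are pruned at bucket boundaries, so memory stays bounded by the error parameter and not by the length of the stream, at the price of counts that may be too low by up to epsilon times the number of items seen.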
An Overview of Our Method