A Generalized Multidimensional Index Structure for Multimedia Data to Support Content-Based Similarity Searches in a Collaborative Environment
Kasturi Chatterjee
Distributed Multimedia Information Systems Laboratory
School of Computing and Information Sciences
Florida International University
Committee Members
• Dr. Shu-Ching Chen (Advisor)
• Dr. Jainendra K. Navlakha
• Dr. Xudong He
• Dr. Keqi Zhang
• Dr. Mei-Ling Shyu
Acknowledgment
School of Computing and Information Sciences
• Continuing Graduate Assistantship (GA, RA)
• Awards recognizing research
Florida International University
• Dissertation Year Fellowship
• Travel Grants (GSA)
Members of DMIS Lab
SCIS staff
• Special thanks to Olga
Outline
i. Motivation
ii. Contributions
a. Generalized Index Structure
b. Query Refinement
c. Visualizing & Analyzing Multimedia Semantic Relationships in Collaborative Environments
A Typical Query
Traditional alpha-numeric queries
SELECT studentName FROM table WHERE studentAge > 20 AND studentMajor = 'Computer Science';
Multimedia queries
SELECT image FROM table WHERE red 'is-close-to' 0.245 AND black 'is-close-to' 0.356 AND red-yellow 'is-close-to' 0.5672 AND …….. AND semanticInterpretation = 'something'….etc.
Existing DBMS framework is not suitable to handle such data types!
What is missing?
i. Suitable data organization (index structure)
ii. Suitable query handling
iii. Suitable handling of semantic contents
[Figure: DBMS architecture. Components shown: Communication Manager, Application Front Ends, SQL Interface, SQL Compiler/Interpreter, Query Evaluation Engine, Query Optimizer, Query Processor, Query Evaluator, Catalog Manager, Transaction Manager, Lock Manager, Buffer Manager, Access Structure Manager, Index Structure, Recovery Manager, Storage Manager, Index Access]
Outline
i. Motivation
ii. Contributions
a. Generalized Index Structure
b. Query Refinement
c. Visualizing & Analyzing Multimedia Semantic Relationships in Collaborative Environments
iii. Discussions
iv. Future Direction
Generalized Index Structure: GeM-Tree [chat09c]
Expectations
i. Provide a single framework to manage different types of multimedia data
• separate index structures for different data types are inefficient to embed into the database kernel
Generalized Index Structure: GeM-Tree
Expectations
ii. Accommodate varied Multidimensional Representation
• existing index structures for database kernels are mostly single-dimensional
• existing multidimensional index structures cannot handle retrieval requirements of multimedia data
• plethora of feature representations call for a flexible structure
Generalized Index Structure: GeM-Tree
Expectations
iii. Accommodate CBR of individual data type along with concept retrievals involving cross-similarity between multimedia data
• query handling needs to consider low-level features & semantic information
• existing index structures cannot handle such retrieval approaches
What has been done so far
First generation index structures
B-Tree [1]
• tree-based index structure
• single-dimensional
• currently used in relational databases
Multi-dimensional index structures
Feature-Based
• feature space indexed based on feature dimension
• KDB-Tree [2], R-Tree [3], Hybrid-Tree [4]
Distance-Based
• metric-space formed from the distances between data objects is indexed
• M-Tree [5], VP-Tree [6]
KDB-Tree
[Figure: example KDB-Tree, showing a data space partitioned into regions 1 to 8 that hold data points A to T, and the corresponding tree]
VP-Tree
[Figure: Data Space Partition for VP-Tree around vantage point V]
• (A,B,C,D) closest to V
• (E,F,G,H) next close
• (I,J,K) farthest
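The partition in the VP-Tree figure can be sketched as one level of splitting: data points are ordered by their distance to the vantage point V and cut into bands (closest, next close, farthest). This is only a minimal illustration; the function name `partition_by_vantage_point`, the use of Euclidean distance, and the equal-sized bands are assumptions, not details taken from the slides or from [6].

```python
import math

def partition_by_vantage_point(vantage, points, n_groups=3):
    """Order points by distance to the vantage point and cut them into bands.

    Hypothetical helper: one level of a vantage-point style partition,
    using Euclidean distance and (roughly) equal-sized bands.
    """
    if not points:
        return []
    ordered = sorted(points, key=lambda p: math.dist(vantage, p))
    size = math.ceil(len(ordered) / n_groups)
    # Consecutive slices: first band is closest to the vantage point.
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]

bands = partition_by_vantage_point(
    (0, 0), [(1, 0), (5, 0), (2, 0), (9, 0), (3, 0), (7, 0)]
)
```

A full VP-Tree would apply such a split recursively inside each band, choosing a new vantage point at every level.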
Issues?
Feature-Based Indexes and Distance-Based Indexes
Semantic Information during CBR
• low-level feature values correlated to semantics
• no existing semantics capturing model embedded into search queries
• Cannot handle the semantic gap!
Different data types
• none designed for handling videos/documents
Seamless solution
• none designed to handle multiple data types from a single framework
• Cannot handle context-based/cross-similarity based retrievals
GeM-Tree: how does it accomplish the goals?
Expectation I
• Using a data-signature to represent multimedia data
Accommodate CBR of individual data type along with concept retrievals involving cross-similarity between multimedia data
• data-signature + EMD + Affinity Relationship [8][9]
• a stochastic construct called Markov Model Mediator [12], extended into HMMM for videos
• determines the closeness of two multimedia objects (affinity) by following the access patterns
• “more frequently two objects are accessed together, greater is their semantic closeness/affinity”
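The access-pattern idea quoted above can be illustrated with a simple co-access count: every time two objects appear in the same access session, their pairwise affinity grows. This is a rough sketch only. The function name `co_access_affinity`, the session format, and the normalization by the total count are assumptions for illustration; it is not the Markov Model Mediator formulation of [8][9].

```python
from collections import Counter
from itertools import combinations

def co_access_affinity(access_sessions):
    """Return a relative affinity for each pair of objects accessed together.

    Hypothetical helper: each session is a list of object ids accessed
    together; affinity is the pair's share of all co-access events.
    """
    counts = Counter()
    for session in access_sessions:
        # Count each unordered pair once per session.
        for a, b in combinations(sorted(set(session)), 2):
            counts[(a, b)] += 1
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()} if total else {}

affinity = co_access_affinity([
    ["img1", "img2"],
    ["img1", "img2", "vid1"],
    ["img2", "vid1"],
])
```

Here `img1` and `img2` are accessed together more often than `img1` and `vid1`, so their affinity is higher.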
How GeM-Tree supports CBR
• Range Search: select all the appropriate database objects within a given range from the query
• k-NN Search: search the entire database to select k database objects most similar to the query
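The two query types can be sketched with a plain linear scan over feature vectors. This is a minimal illustration under stated assumptions: the function names `range_search` and `knn_search` are hypothetical, Euclidean distance stands in for the distance function actually used (the slides mention EMD), and no index is involved, whereas an index structure such as GeM-Tree exists precisely to avoid scanning every object.

```python
import heapq
import math

def range_search(database, query, radius, distance=math.dist):
    """Return ids of all objects whose distance to the query is within radius."""
    return [obj_id for obj_id, features in database.items()
            if distance(features, query) <= radius]

def knn_search(database, query, k, distance=math.dist):
    """Return ids of the k objects closest to the query, nearest first."""
    return heapq.nsmallest(
        k, database, key=lambda obj_id: distance(database[obj_id], query)
    )

# Hypothetical feature vectors keyed by object id.
db = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (5.0, 5.0)}
in_range = range_search(db, (0.9, 0.0), radius=1.0)
nearest = knn_search(db, (0.9, 0.0), k=2)
```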
Node Assignment: Assign filtered nodes to original nodes to maximize overall similarity
Graph Layout: Generate the representative graph
Detailed Algorithm
Step 1: Node Filtering
Pick nodes based on network structure/user choice
• Sample nodes to capture overall network characteristics
• Select nodes representing different groups in the network
• Random sampling approaches which preserve the distribution
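One way to read "random sampling approaches which preserve the distribution" is stratified sampling: draw the same fraction of nodes from every group so that group proportions carry over to the sample. The sketch below assumes that reading. The function name `stratified_node_sample`, the `node_groups` mapping, and the rule of keeping at least one node per group are illustrative assumptions, not the author's procedure.

```python
import random
from collections import defaultdict

def stratified_node_sample(node_groups, fraction, seed=0):
    """Sample the same fraction of nodes from each group.

    node_groups maps node id -> group label (hypothetical input format).
    At least one node is kept per group so every group stays represented.
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for node, group in node_groups.items():
        by_group[group].append(node)
    sample = []
    for members in by_group.values():
        k = min(len(members), max(1, round(fraction * len(members))))
        sample.extend(rng.sample(members, k))
    return sample

groups = {f"a{i}": "A" for i in range(8)}
groups.update({f"b{i}": "B" for i in range(4)})
picked = stratified_node_sample(groups, fraction=0.5)
```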
Detailed Algorithm
Step 2: Determine Node Metric
Calculate structural and semantic metric
Structural metrics
• Adjacency Matrices: edge source & edge terminus
Semantic metrics
• A matrix of scores of different centrality values
Detailed Algorithm
Step 3: Similarity Calculation
Calculate structural & semantic similarity
Structural similarity
• Coupled node-edge score [11]
Semantic similarity
• Euclidean distance between semantic values
$$y_{ij}(k) = x_{s(i)s(j)}(k-1) + x_{t(i)t(j)}(k-1)$$
$$x_{ij}(k) = \sum_{t(k)=i,\ t(l)=j} y_{kl}(k-1) + \sum_{s(k)=i,\ s(l)=j} y_{kl}(k-1)$$
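The coupled node-edge update can be sketched as an iteration over two score tables: `x` holds node-to-node scores and `y` holds edge-to-edge scores, each recomputed from the other's previous values. This is a minimal sketch, assuming edges are given as (source, terminus) pairs, scores start at 1, and both tables are rescaled to unit norm after every step. The normalization, the iteration count, and the function names are assumptions added for illustration.

```python
import math

def _normalize(table):
    """Scale a table of scores to unit Frobenius norm (unchanged if all zero)."""
    norm = math.sqrt(sum(v * v for row in table for v in row))
    return [[v / norm for v in row] for row in table] if norm else table

def coupled_node_edge_scores(n1, edges1, n2, edges2, iterations=20):
    """Iterate node scores x and edge scores y between two directed graphs."""
    x = [[1.0] * n2 for _ in range(n1)]
    y = [[1.0] * len(edges2) for _ in edges1]
    for _ in range(iterations):
        # Edge score: similarity of the two sources plus that of the two termini.
        new_y = [[x[s1][s2] + x[t1][t2] for (s2, t2) in edges2]
                 for (s1, t1) in edges1]
        # Node score: sum of scores of edge pairs that end at (i, j)
        # plus those of edge pairs that start at (i, j).
        new_x = [[0.0] * n2 for _ in range(n1)]
        for p, (s1, t1) in enumerate(edges1):
            for q, (s2, t2) in enumerate(edges2):
                new_x[t1][t2] += y[p][q]
                new_x[s1][s2] += y[p][q]
        x, y = _normalize(new_x), _normalize(new_y)
    return x, y

# Compare the path 0 -> 1 -> 2 with itself.
path = [(0, 1), (1, 2)]
x, y = coupled_node_edge_scores(3, path, 3, path)
```

For this example the middle node scores highest against itself, while the pure source node 0 and the pure sink node 2 receive a score of zero against each other.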
Detailed Algorithm
Step 4: Node Assignment
Assign filtered nodes to original nodes to maximize overall similarity
Hungarian Algorithm
• Pick up m nodes from the set of n nodes which maximizes the total similarity score between the original graph and the sub-graph formed
• Assignment Problem applying Munkres Algorithm
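The assignment objective in Step 4 can be made concrete with a small exhaustive search: try every way of mapping the m filtered nodes onto distinct original nodes and keep the mapping with the highest total similarity. This brute-force version is a stand-in chosen for brevity; it returns the same optimum that the Munkres (Hungarian) algorithm finds in polynomial time, but it is only practical for very small inputs. The function name `best_assignment` and the matrix layout are assumptions.

```python
from itertools import permutations

def best_assignment(similarity):
    """Exhaustively solve the assignment problem for a small m x n matrix.

    similarity[i][j] is the score for mapping filtered node i onto
    original node j, with m <= n (hypothetical layout).
    """
    m, n = len(similarity), len(similarity[0])
    best_score, best_cols = float("-inf"), None
    for cols in permutations(range(n), m):
        score = sum(similarity[i][j] for i, j in enumerate(cols))
        if score > best_score:
            best_score, best_cols = score, cols
    return list(best_cols), best_score

scores = [
    [0.9, 0.1, 0.3],
    [0.8, 0.7, 0.2],
]
mapping, total = best_assignment(scores)
```

Note that a greedy choice per row would map both filtered nodes to original node 0; the assignment formulation forces distinct targets.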
Detailed Algorithm
Step 5: Graph Layout
Generate the representative graph
Shortest Path Approach
• Preserve the ties between nodes
• Consider the overall reach/strength of each node
Connect node i and j with edge_{i,j} when the ratio $SP_{i,j} / \mathrm{Max}_k(SP_{i,j})$ satisfies the threshold condition
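The shortest-path rule can be sketched as follows, with two loudly stated assumptions: the graph is unweighted and undirected, and two selected nodes are connected when their shortest-path length, divided by the longest shortest path among the selected pairs, is at most the threshold. The direction of the comparison and the choice of normalizer are guesses made for illustration, and the function names are hypothetical.

```python
from collections import deque

def shortest_path_lengths(adjacency, source):
    """Breadth-first search distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def layout_edges(adjacency, selected, threshold):
    """Connect selected nodes whose normalized shortest path is <= threshold.

    ASSUMPTION: "<=" and normalization by the longest pairwise shortest path.
    """
    sp = {i: shortest_path_lengths(adjacency, i) for i in selected}
    pairs = [(i, j) for a, i in enumerate(selected)
             for j in selected[a + 1:] if j in sp[i]]
    longest = max((sp[i][j] for i, j in pairs), default=0)
    return [(i, j) for i, j in pairs
            if longest and sp[i][j] / longest <= threshold]

# Path graph 0 - 1 - 2 - 3 - 4, keeping nodes 0, 2 and 4.
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
edges = layout_edges(graph, [0, 2, 4], threshold=0.5)
```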
Evaluation
• Overall structural comparison
• Degree of similarity between connected nodes (dyads)
• Using Euclidean distance between the centrality values
What is Centrality? [10]
• Centrality measures the power/importance of a node with respect to the entire network it belongs to
• Measure of holistic behavior of a node
$E_c$ is computed from the squared differences $\sum_{k=1}^{M}(c_{ik}-c_{jk})^{2}$ between the centrality values of nodes $i$ and $j$, normalized by the maximum ($\max$) of that sum.
Generated Previews
low error value ~ 0.02
How is the Multimedia Data Network utilized?
• identify mutual relationships and the role of a particular multimedia data object in a database
• design decisions of operations of the index structures
Index structure is built on ONLY the low-level features
Semantic relationship was introduced during querying
No existing insertion policies consider the semantic information stored in a data object
Semantic relationships can be introduced into the indexed metric space!
Insertion policies
Use degree centrality
• degree centrality is defined as the number of links incident upon a node (i.e., the number of ties that a node has)
• For a Multimedia Data Network, degree centrality identifies the power/importance of a particular data object in the entire network
[Figure: the image to be inserted is compared against node 1 and node 2 and inserted at the node with higher centrality]
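The insertion rule can be sketched by counting the ties of each candidate node and picking the one with the highest degree centrality. This is an illustrative sketch only: the adjacency-list format and the function names `degree_centrality` and `choose_insertion_node` are assumptions, and how the candidates are obtained from the index is outside its scope.

```python
def degree_centrality(adjacency):
    """Number of distinct ties per node in an adjacency-list graph."""
    return {node: len(set(neighbours)) for node, neighbours in adjacency.items()}

def choose_insertion_node(adjacency, candidates):
    """Pick the candidate node with the highest degree centrality."""
    centrality = degree_centrality(adjacency)
    return max(candidates, key=lambda node: centrality[node])

# Hypothetical Multimedia Data Network: node1 has three ties, node2 has one.
network = {
    "node1": ["a", "b", "c"],
    "node2": ["a"],
    "a": ["node1", "node2"],
    "b": ["node1"],
    "c": ["node1"],
}
target = choose_insertion_node(network, ["node1", "node2"])
```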
Deletion policies
Current Status
• Any delete request from the users is entertained
• That the user, and hence the data, might belong to a collaborative environment is not considered
Deletion policies
Use betweenness centrality
• betweenness centrality measures how many shortest paths between other pairs of vertices pass through a particular node
• For a delete request, if betweenness centrality of the node is high, ask the user to reconsider
Several other decisions can be taken based on such analysis!
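The deletion check can be sketched by computing betweenness centrality and flagging a delete request when the score exceeds a limit. The sketch below uses Brandes' algorithm for an unweighted, undirected graph in which every node appears as a key of the adjacency list. The function name `should_confirm_delete` and the `limit` parameter are hypothetical; the slides do not state how "high" is decided.

```python
from collections import deque

def betweenness_centrality(adjacency):
    """Brandes' algorithm for an unweighted, undirected graph."""
    score = {v: 0.0 for v in adjacency}
    for s in adjacency:
        stack = []
        preds = {v: [] for v in adjacency}
        sigma = {v: 0 for v in adjacency}  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back to s.
        delta = {v: 0.0 for v in adjacency}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}

def should_confirm_delete(adjacency, node, limit):
    """Ask the user to reconsider when the node bridges many shortest paths."""
    return betweenness_centrality(adjacency)[node] > limit

chain = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
```

In the chain `a - b - c`, deleting `b` would disconnect `a` from `c`, so `should_confirm_delete(chain, "b", limit=0.5)` returns `True`, while the same call for `a` returns `False`.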
Assumptions and Limitations
• Assumed that features used for indexing represent the multimedia data well
• Accuracy calculations are not quantitative and may vary from person to person
• Can handle only numeric data
• Only soccer videos were used as test bed; other domains were not checked
Future Direction
• Intelligent multimedia index structure optimizer
• Document indexing
• Support traditional alpha-numeric data
• Query optimizer for multimedia database
• Multimedia data management framework for Collaborative Applications
Publications: Journals & Book Chapters
i. [chat10] Kasturi Chatterjee, Shixia Liu, Shu-Ching Chen, “Social Network Preview using Graph Similarity,” (submitted to ACM Transactions on Information Systems), 2010.
ii. [chat09a] Kasturi Chatterjee, S. Masoud Sadjadi, Shu-Ching Chen, “A Distributed Multimedia Data Management over Grid,” Multimedia Services in Intelligent Environments – Integrated Systems, 2009 (in press).
iii. [chat09b] Kasturi Chatterjee, Shu-Ching Chen, “HAH-tree: Towards a Multidimensional Index Structure Supporting Different Video Modeling Approaches in a Video Database Management System,” IJIDS, vol. 2, no. 2, pp. 188-207, 2010.
iv. [chat09c] Kasturi Chatterjee, Shu-Ching Chen, “A Multimedia Data Management Approach with GeM-Tree,” JMM, 2010 (in press).
v. [chat09d] Shu-Ching Chen, Min Chen, Na Zhao, Shahid Hamid, Kasturi Chatterjee, and Michael Armella, “Florida Public Hurricane Loss Model: Research in Multi-Disciplinary System Integration Assisting Government Policy Making,” Special Issue on Building the Next Generation Infrastructure for Digital Government, Government Information Quarterly, Volume 26, Issue 2, pp. 285-294, April 2009.
vi. [chat07a] Kasturi Chatterjee and Shu-Ching Chen, “A Novel Indexing and Access Mechanism using Affinity Hybrid Tree for Content-Based Image Retrieval in Multimedia Databases,” International Journal of Semantic Computing (IJSC), Vol. 1, Issue 2, pp. 147-170, June 2007.
Publications: Conferences
i. [chat09d] Yudan Li, Kasturi Chatterjee, Shu-Ching Chen, and Keqi Zhang, “A 3-D Traffic Animation System with Storm Surge Response,” accepted for publication, IEEE International Symposium on Multimedia (ISM2009), 2009.
ii. [chat08a] Kasturi Chatterjee and Shu-Ching Chen, “GeM-Tree: Towards a Generalized Multidimensional Index Structure Supporting Image and Video Retrieval,” the Fourth IEEE International Workshop on Multimedia Information Processing and Retrieval (MIPR2008), in conjunction with IEEE International Symposium on Multimedia (ISM2008), 2008.
iii. [chat08c] Kasturi Chatterjee and Shu-Ching Chen, “Hierarchical Affinity-Hybrid Tree: A Multidimensional Index Structure to Organize Videos and Support Content-Based Retrievals,” Proceedings of the 2008 IEEE International Conference on Information Reuse and Integration (IEEE IRI-08), 2008.
iv. [chat08d] Shu-Ching Chen, Min Chen, Na Zhao, Shahid Hamid, Khalid Saleem, and Kasturi Chatterjee, “Florida Public Hurricane Loss Model (FPHLM): Research Experience in System Integration,” the 9th Annual International Conference on Digital Government Research, 2008.
v. [chat08e] Kasturi Chatterjee, Shixia Liu, and Shu-Ching Chen, “Using Graph Similarity for Social Network Analysis,” in 6th LA Grid Summit, (First Place), 2008.
vi. [chat06a] Kasturi Chatterjee and Shu-Ching Chen, “Affinity Hybrid Tree: An Indexing Technique for Content-Based Image Retrieval in Multimedia Databases,” in proceedings of IEEE International Symposium on Multimedia (ISM2006), (Best Paper Award), 2006.
vii. [chat06b] Kasturi Chatterjee, Khalid Saleem, Na Zhao, Min Chen, Shu-Ching Chen, and Shahid Hamid, “Modeling Methodology for Component Reuse and System Integration for Hurricane Loss Projection Application,” in proceedings of IEEE International Conference on Information Reuse and Integration (IEEE IRI-2006), 2006.
Comments/Questions!
References
[1] R. Bayer, “Binary B-Trees for Virtual Memory,” in ACM-SIGFIDET Workshop, San Diego, California, Session 5B, pp. 219-235, 1971.
[2] J. Robinson, “The k-d-b-tree: A search structure for large multidimensional dynamic indexes,” in Proceedings of the 1981 ACM SIGMOD International Conference on Management of Data, Ann Arbor, United States, pp. 10-18, 1981.
[3] A. Guttman, “R-Trees: A Dynamic Index Structure for Spatial Searching,” in Proc. 1984 ACM SIGMOD International Conference on Management of Data, pp. 47-57, 1984.
[4] K. Chakrabarti, S. Mehrotra, “The Hybrid Tree: An Index Structure for High Dimensional Feature Spaces,” in ICDE 1999, pp. 440-447, 1999.
[5] P. Ciaccia, M. Patella, and P. Zezula, “M-tree: An efficient access method for similarity search in metric spaces,” in Proceedings of 23rd VLDB, pp. 426-435, 1997.
[6] P. N. Yianilos, “Data structures and algorithms for nearest neighbor search in general metric spaces,” in Proceedings of the fourth annual ACM-SIAM Symposium on Discrete algorithms, pp. 311-321, 1993.
[7] K. Chakrabarti, et al., “Efficient Query Refinement in Multimedia Databases,” in Proc. International Conference on Data Engineering, pp. 196-200, 2000.
[8] M-L. Shyu, S-C. Chen, M. Chen, C. Zhang, and C-M. Shu, “MMM: A Stochastic Mechanism for Image Database Queries,” Proceedings of the IEEE Fifth International Symposium on Multimedia Software Engineering (MSE2003), pp. 188-195, December 10-12, 2003, Taichung, Taiwan, ROC.
[9] Shu-Ching Chen, Na Zhao, and Mei-Ling Shyu, “Modeling Semantic Concepts and User Preferences in Content-Based Video Retrieval,” International Journal of Semantic Computing (IJSC), Vol. 1, Issue 3, pp. 377-402, September 2007.
[10] L. C. Freeman, “Centrality in Social Networks: Conceptual Clarification,” Social Networks, vol. 1, no. 3, pp. 215-239, 1979.
[12] L. A. Zager, et al., “Graph Similarity Scoring and Matching,” Applied Mathematics Letters, vol. 21, no. 1, pp. 86-94, 2007.