FastMap: Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets
Jan 03, 2016
Abstract
Describe a fast algorithm to map objects into points in some k-dimensional space, such that the dis-similarities are preserved.
Thus, we can subsequently use fine-tuned spatial access methods (SAMs) to answer queries such as “query by example” or “all pairs query”.
Introduction
It is not easy to design k feature-extraction functions that map objects to k-dimensional points.
For instance, for typed English words, what distance function should we use to measure the cost of transforming one string into the other?
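The slide does not name a particular distance function for strings. One common choice, used here purely as an illustration, is the edit (Levenshtein) distance: the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. The function name `edit_distance` is hypothetical.

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance between s and t (unit cost per edit)."""
    # prev[j] holds the distance between the first i-1 chars of s and the first j chars of t.
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]  # turning s[:i] into the empty string takes i deletions
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]
```

A function like this can be passed wherever a distance function D(*, *) is required, with no feature extraction at all.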
Solutions
Old: Multi-Dimensional Scaling (MDS)
  Unsuitable for indexing
Proposed: Fast Algorithm
  Much faster
  Allows indexing
Applications
Image and multimedia databases
Medical databases
String databases, e.g. OCR
Time series, e.g. financial data
Data mining and visualization applications
Desirable types of queries
query-by-example: search a collection of objects to find the ones that are within a user-defined distance from the query object
all pairs query: find the pairs of objects that are within a given distance of each other
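To make the two query types concrete, here is a brute-force sketch that answers them directly from a distance function, without any index. The names `query_by_example`, `all_pairs`, `dist` and `eps` are illustrative assumptions, not taken from the slides. These are the queries a spatial access method is meant to speed up once the objects have been mapped to points.

```python
from itertools import combinations

def query_by_example(objects, query, dist, eps):
    """Return the objects whose distance from `query` is at most eps."""
    return [o for o in objects if dist(o, query) <= eps]

def all_pairs(objects, dist, eps):
    """Return index pairs (i, j), i < j, of objects within distance eps of each other."""
    return [(i, j)
            for (i, a), (j, b) in combinations(enumerate(objects), 2)
            if dist(a, b) <= eps]
```

The first scan costs one distance computation per object; the second costs one per pair, which is what makes it expensive on large collections.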
Benefit of mapping objects
Reduce the search time for queries by employing SAMs such as R*-trees and z-ordering
Help with visualization, clustering and data-mining
Ideal mapping fulfills…
Fast to compute: O(N) or O(N log N), but not O(N^2)
Preserve distances with little discrepancy
Should be very fast to map a new object
MDS
Used to discover the underlying (spatial) structure of a set of data items from the (dis)similarity information
Map objects to a k-dimensional space, so as to minimize the stress function
Stress function: the average difference between the distances of the "images" and the actual distances
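The slide describes stress only informally. The FastMap paper by Faloutsos and Lin defines it as a relative error, $stress = \sqrt{\sum_{i,j} (\hat{d}_{ij} - d_{ij})^2 / \sum_{i,j} d_{ij}^2}$, where $d_{ij}$ is the original dissimilarity and $\hat{d}_{ij}$ is the Euclidean distance between the images. The sketch below computes that quantity; the function and parameter names are assumptions made for illustration.

```python
import math
from itertools import combinations

def stress(objects, images, dist):
    """Relative error between image distances and original distances.

    objects: the original items
    images:  their k-d points (sequences of floats), in the same order
    dist:    the original distance function D(*, *)
    """
    num = 0.0  # sum of squared differences
    den = 0.0  # sum of squared original distances
    for i, j in combinations(range(len(objects)), 2):
        d = dist(objects[i], objects[j])
        d_hat = math.dist(images[i], images[j])  # Euclidean distance, Python 3.8+
        num += (d_hat - d) ** 2
        den += d ** 2
    return math.sqrt(num / den)
```

A stress of 0 means every pairwise distance is preserved exactly; larger values mean the images distort the original distances more.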
Drawbacks of MDS
Requires O(N^2) time, which is impractical for large databases
Fast retrieval is questionable, as MDS is not designed for the “query-by-example” operation
Definitions
The k-d point P_i that corresponds to the object O_i will be called the ‘image’ of object O_i. That is, P_i = (x_{i,1}, x_{i,2}, …, x_{i,k})
The k-d space containing the ‘images’ will be called the target space
Proposed algorithm
Assumption: a domain expert has only provided us with a distance/dis-similarity function D(*, *)
For instance, the Euclidean distance between two feature vectors can serve as the distance function between the corresponding objects
Pretend that the objects are indeed points in some unknown n-dimensional space, and try to project these points on k mutually orthogonal directions
The challenge is to compute these projections from the distance matrix only
Project the objects on a carefully selected “line”
Choose O_a and O_b to be the “pivot objects”
Compute the distance of each point from the pivot points using only information we know, i.e., the distances between objects
By the cosine law, in any triangle O_a O_i O_b:
$d_{b,i}^2 = d_{a,i}^2 + d_{a,b}^2 - 2 x_i d_{a,b}$
where d_{i,j} is shorthand for the distance D(O_i, O_j) and x_i is the coordinate of the projection of O_i on the line O_a O_b, measured from O_a
By simple math manipulation:
$x_i = (d_{a,i}^2 + d_{a,b}^2 - d_{b,i}^2) / (2 d_{a,b})$
We can map objects into points on a line, preserving some of the distance information
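The projection formula needs only three distances, so it can be evaluated without knowing any coordinates of the objects. A minimal sketch follows; the function name `project_on_line` and its parameter names are hypothetical.

```python
def project_on_line(d_ai: float, d_bi: float, d_ab: float) -> float:
    """Coordinate x_i of object O_i on the line through pivots O_a and O_b.

    d_ai, d_bi: distances from O_i to the pivots O_a and O_b
    d_ab:       distance between the two pivots (must be non-zero)
    """
    if d_ab == 0:
        raise ValueError("pivot objects must be at non-zero distance")
    # Cosine law rearranged: x_i = (d_ai^2 + d_ab^2 - d_bi^2) / (2 * d_ab)
    return (d_ai ** 2 + d_ab ** 2 - d_bi ** 2) / (2 * d_ab)
```

As a sanity check, the pivot O_a itself lands at 0 and the pivot O_b lands at d_ab, so the two pivots mark the ends of the new axis.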
Solved for 2-d space
Extend to higher dimensions
Each of the k recursive calls determines the coordinates of the N objects on a new axis
The “pivot objects” of each recursive call are recorded to facilitate queries
The pivot objects are chosen by a heuristic algorithm
All steps are linear in N
Complexity is O(Nk)
Experiments
Compare FastMap with MDS: speed and quality
Illustrate the visualization and clustering abilities: real and synthetic datasets
Comparison with MDS
Response time vs. database size
Response time vs. number of dimensions k
Response time vs. stress
Clustering/visualization properties of FastMap
Conclusion
A fast algorithm to map objects into points in k-d space
Accelerate searching with highly optimized SAMs, e.g. R-trees, R*-trees etc.
Application of the algorithm to multimedia databases, data-mining, clustering and document retrieval etc.
Reference
Christos Faloutsos, King-Ip (David) Lin. FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets
Joseph B. Kruskal, Myron Wish. Multidimensional Scaling