FastMap: Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets

Abstract

Jan 03, 2016

megan-kirk

FastMap: Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets. Abstract: Describe a fast algorithm to map objects into points in some k-dimensional space, such that the dissimilarities are preserved. (PowerPoint PPT presentation)
Transcript
Page 1: Abstract

FastMap: Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets

Page 2: Abstract

Abstract

Describe a fast algorithm to map objects into points in some k-dimensional space, such that the dissimilarities are preserved.

Page 3: Abstract

Abstract

Thus, we can subsequently use fine-tuned spatial access methods (SAMs) to answer queries such as “query by example” or “all pairs query”.

Page 4: Abstract

Introduction

It is not easy to design k feature-extraction functions that map objects to k-dimensional points.

For instance, for typed English words, what distance function should we use to measure the cost of transforming one string into the other?
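One common answer to the string question above is the edit (Levenshtein) distance. The slide does not name a specific function, so the choice here is an assumption, and `edit_distance` is a hypothetical helper name. A minimal sketch:

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn s into t."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to the empty string
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]
```

A function like this gives distances between words without ever producing feature vectors, which is exactly the situation the mapping is meant to handle.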

Page 5: Abstract

Solutions

Old: Multi-Dimensional Scaling (MDS). Unsuitable for indexing.

Proposed: Fast Algorithm. Much faster, and allows indexing.

Page 6: Abstract

Applications

Image and multimedia databases

Medical databases

Page 7: Abstract

Applications

String databases, e.g. OCR

Time series, e.g. financial data

Page 8: Abstract

Applications

Data mining and visualization applications

Page 9: Abstract

Desirable types of queries

Query-by-example: search a collection of objects to find the ones that are within a user-defined distance from the query object.

All pairs query: find the pairs of objects that are within a given distance of each other.
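The two query types can be written directly against a distance function. The sketch below is a brute-force baseline, not an indexed search; the names `query_by_example`, `all_pairs`, `eps` and `dist` are illustrative assumptions:

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def query_by_example(objects: Sequence[T], query: T, eps: float,
                     dist: Callable[[T, T], float]) -> List[T]:
    """Return every object within distance eps of the query object."""
    return [o for o in objects if dist(query, o) <= eps]


def all_pairs(objects: Sequence[T], eps: float,
              dist: Callable[[T, T], float]) -> List[Tuple[T, T]]:
    """Return every unordered pair of objects within distance eps."""
    return [(a, b) for a, b in combinations(objects, 2) if dist(a, b) <= eps]
```

The first function makes N distance computations per query and the second makes N(N-1)/2, which is the cost that a spatial access method over mapped points is meant to reduce.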

Page 10: Abstract

Benefit of mapping objects

Accelerates the search time for queries, by employing SAMs like R*-trees and z-ordering.

Helps with visualization, clustering and data-mining.

Page 11: Abstract

Ideal mapping fulfills…

Fast to compute: $O(N)$ or $O(N \log N)$, but not $O(N^2)$.

Preserves distances with little discrepancy.

Should be very fast to map a new object.

Page 12: Abstract

MDS

Used to discover the underlying (spatial) structure of a set of data items from the (dis)similarity information.

Maps objects to a k-dimensional space, so as to minimize the stress function.

Page 13: Abstract

MDS

Stress function

It is the average difference between the distances of the "images" and the actual distances.
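The slide gives no formula for stress. One common normalized form, used here as an assumption, is the square root of the sum of squared differences between image and original distances, divided by the sum of squared original distances. A minimal sketch, with the hypothetical name `stress`, taking the two lists of pairwise distances in the same pair order:

```python
import math
from typing import Sequence


def stress(original: Sequence[float], image: Sequence[float]) -> float:
    """Normalized stress between original pairwise distances and the
    distances measured between the corresponding images."""
    if len(original) != len(image):
        raise ValueError("both sequences must list the same pairs")
    num = sum((di - do) ** 2 for do, di in zip(original, image))
    den = sum(do ** 2 for do in original)
    if den == 0:
        raise ValueError("all original distances are zero")
    return math.sqrt(num / den)
```

A value of 0 means the mapping preserves every distance exactly; larger values mean larger relative discrepancies.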

Page 14: Abstract

Drawbacks of MDS

Requires $O(N^2)$ time, which is impractical for large databases.

Fast retrieval is questionable, as MDS is not prepared for the “query-by-example” operation.

Page 15: Abstract

Definitions

The k-d point $P_i$ that corresponds to the object $O_i$ will be called the ‘image’ of object $O_i$. That is, $P_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,k})$.

The k-d space containing the ‘images’ will be called the target space.

Page 16: Abstract

Proposed algorithm

Assumption: a domain expert has only provided us with a distance/dissimilarity function $D(*, *)$.

For instance, the Euclidean distance between two feature vectors can serve as the distance function between the corresponding objects.
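As a concrete instance of such a function $D$, the Euclidean distance between two feature vectors can be sketched as follows. The name `euclidean` is illustrative; any symmetric, non-negative function of two objects could take its place:

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

The rest of the algorithm only ever calls the distance function; it never looks inside the objects themselves.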

Page 17: Abstract

Proposed algorithm

Pretend that objects are indeed points in some unknown n-dimensional space, and try to project these points on k mutually orthogonal directions.

The challenge is to compute these projections from the distance matrix only.

Page 18: Abstract

Proposed algorithm

Project the objects on a carefully selected “line”.

Choose $O_a$ and $O_b$ to be the “pivot objects”.

Page 19: Abstract

Proposed algorithm

Compute the distance of each point from the pivot points using only the information we know, i.e., the distances between objects.

Page 20: Abstract

Proposed algorithm

[Figure: triangle $O_a O_i O_b$, with $x_i$ marking the projection of $O_i$ onto the line $O_a O_b$]

Page 21: Abstract

Proposed algorithm

By the cosine law, in any triangle $O_a O_i O_b$:

$d_{b,i}^2 = d_{a,i}^2 + d_{a,b}^2 - 2 x_i d_{a,b}$

where $d_{i,j}$ is shorthand for the distance $D(O_i, O_j)$.

Page 22: Abstract

Proposed algorithm

By simple algebraic manipulation:

$x_i = \dfrac{d_{a,i}^2 + d_{a,b}^2 - d_{b,i}^2}{2 d_{a,b}}$

We can thus map objects into points on a line, preserving some of the distance information.
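The projection formula translates into a few lines of code. In this sketch the function name `project` and its argument names are illustrative; the inputs are the three pairwise distances of the triangle, and no coordinates are needed:

```python
def project(d_ai: float, d_bi: float, d_ab: float) -> float:
    """Coordinate x_i of object O_i on the line through pivots O_a and O_b,
    computed from distances only via the cosine law."""
    if d_ab == 0:
        raise ValueError("pivot objects must be distinct (d_ab > 0)")
    return (d_ai ** 2 + d_ab ** 2 - d_bi ** 2) / (2.0 * d_ab)
```

As a sanity check, the pivot $O_a$ itself maps to 0 and the pivot $O_b$ maps to $d_{a,b}$, so the two pivots sit at the ends of the projected segment.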

Page 23: Abstract

Proposed algorithm

The 2-d case is solved.

Next: extend to higher dimensions.
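The slide does not spell out how the extension works. In the original FastMap paper, the objects are projected onto the hyperplane perpendicular to the pivot line, and the distance within that hyperplane satisfies $D'(O_i, O_j)^2 = D(O_i, O_j)^2 - (x_i - x_j)^2$. A minimal sketch of that step follows; the name `residual_distance` and the clamp at zero are assumptions added here to absorb rounding error and non-Euclidean input:

```python
import math


def residual_distance(d_ij: float, x_i: float, x_j: float) -> float:
    """Distance between O_i and O_j after removing the component already
    captured by their coordinates x_i and x_j on the pivot line."""
    sq = d_ij ** 2 - (x_i - x_j) ** 2
    return math.sqrt(max(sq, 0.0))  # clamp is an assumption, see lead-in
```

Applying the one-dimensional projection again with this residual distance yields the next coordinate, and so on for each of the k axes.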

Page 24: Abstract

Proposed algorithm

Determines the coordinates of the N objects on a new axis after each of the k recursive calls.

The “pivot objects” of each recursive call are recorded to facilitate queries.

The pivot objects are chosen by a heuristic algorithm.
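The steps above can be put together in one sketch. It is written as a loop rather than as k recursive calls, and it assumes the pivot heuristic from the original paper (start from an arbitrary object and repeatedly jump to the farthest one). The names `fastmap`, `map_query`, `pivot_iters` and `_residual_sq` are illustrative, not an established API:

```python
import math
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def _residual_sq(d: float, p: Sequence[float], q: Sequence[float], col: int) -> float:
    """Squared distance left after removing the first `col` image axes."""
    sq = d * d - sum((p[c] - q[c]) ** 2 for c in range(col))
    return max(sq, 0.0)  # clamp: an assumption, absorbs rounding error


def fastmap(objects: Sequence[T], dist: Callable[[T, T], float], k: int,
            pivot_iters: int = 5) -> Tuple[List[List[float]], List[Tuple[int, int]]]:
    """Map each object to a k-d point; also return the pivot index pairs.
    Assumes at least one object and pivot_iters >= 1."""
    n = len(objects)
    coords = [[0.0] * k for _ in range(n)]
    pivots: List[Tuple[int, int]] = []
    for col in range(k):
        def d_sq(i: int, j: int) -> float:
            return _residual_sq(dist(objects[i], objects[j]),
                                coords[i], coords[j], col)
        b = 0  # arbitrary starting object
        for _ in range(pivot_iters):  # heuristic: jump to the farthest object
            a = max(range(n), key=lambda j: d_sq(b, j))
            b = max(range(n), key=lambda j: d_sq(a, j))
        pivots.append((a, b))
        d_ab_sq = d_sq(a, b)
        if d_ab_sq == 0.0:
            continue  # no residual distance left; this coordinate stays 0
        d_ab = math.sqrt(d_ab_sq)
        for i in range(n):  # cosine-law projection onto the pivot line
            coords[i][col] = (d_sq(a, i) + d_ab_sq - d_sq(b, i)) / (2.0 * d_ab)
    return coords, pivots


def map_query(query: T, objects: Sequence[T], coords: Sequence[Sequence[float]],
              pivots: Sequence[Tuple[int, int]],
              dist: Callable[[T, T], float]) -> List[float]:
    """Map a new object using only its distances to the recorded pivots."""
    x = [0.0] * len(pivots)
    for col, (a, b) in enumerate(pivots):
        d_ab_sq = _residual_sq(dist(objects[a], objects[b]), coords[a], coords[b], col)
        if d_ab_sq == 0.0:
            continue
        d_aq_sq = _residual_sq(dist(query, objects[a]), x, coords[a], col)
        d_bq_sq = _residual_sq(dist(query, objects[b]), x, coords[b], col)
        x[col] = (d_aq_sq + d_ab_sq - d_bq_sq) / (2.0 * math.sqrt(d_ab_sq))
    return x
```

Each axis costs a constant number of passes over the N objects, and `map_query` needs only two distance computations per axis, which is why recording the pivots makes query mapping cheap.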

Page 25: Abstract

Proposed algorithm

All steps are linear.

Complexity is $O(Nk)$.

Page 26: Abstract

Experiments

Compare FastMap with MDS: speed and quality.

Illustrate the visualization and clustering abilities: real and synthetic datasets.

Page 27: Abstract

Comparison with MDS

Response time vs. database size

Page 28: Abstract

Comparison with MDS

Response time vs. number of dimensions k

Page 29: Abstract

Comparison with MDS

Response time vs. stress

Page 30: Abstract

Clustering/visualization properties of FastMap

Page 31: Abstract

Clustering/visualization properties of FastMap

Page 32: Abstract

Conclusion

A fast algorithm to map objects into points in k-d space.

Accelerates searching through highly optimized SAMs, e.g. R-trees, R*-trees, etc.

Application of the algorithm to multimedia databases, data-mining, clustering, document retrieval, etc.

Page 33: Abstract

Reference

Christos Faloutsos, King-Ip (David) Lin. FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets.

Joseph B. Kruskal, Myron Wish. Multidimensional Scaling.