Fast Display of Massive, High-Dimensional Data
Alexander Gray, Georgia Institute of Technology
www.fast-lab.org

May 24, 2018
Transcript
Page 1: Fast Display of Massive, High-Dimensional Data (fodava.gatech.edu/files/review2010/Alex-fodava-12.6.10-2.pdf)

Fast Display of Massive, High-Dimensional Data

Alexander Gray

Georgia Institute of Technology

www.fast-lab.org

Page 2:

The FASTlab

Page 3:

Goal: Seeing the Unseeable

Given a collection of data objects, can we show their relationships in a 2-d plot?

Figure: Recovered locations of USPS handwritten digits “3” and “5” given by RankMap. The original data are in 16 × 16 = 256 dimensions.

Page 4:

Basic Approach

1 Choose a notion of distance that defines the relationships between the objects (this defines a graph or kernel matrix), and choose something you want to preserve about those relationships.

2 Perform a computation (graph construction + convex optimization) that defines the relationship-preserving mapping to 2-d points.
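As a concrete (if naive) illustration of step 1, a k-nearest-neighbor graph can be built by brute force directly from pairwise distances. The function `knn_graph` and the toy points below are hypothetical, purely for illustration; later slides replace this O(N²) loop with tree-based search:

```python
import math

def knn_graph(points, k):
    """Brute-force k-nearest-neighbor graph: maps each point index to the
    indices of its k closest other points under Euclidean distance."""
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    graph = {}
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: dist(p, points[j]))
        graph[i] = others[:k]
    return graph

points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
g = knn_graph(points, k=2)   # even the isolated point (5, 5) links back to the cluster
```

The resulting directed graph (or its symmetrized version) is the input that Isomap, LLE, MVU, and the methods below all start from.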

Page 5:

Nonlinear Dimension Reduction: Key Bottlenecks

• Isomap: all-pairs shortest paths, all-nearest-neighbors, SVD

• Locally linear embedding: all-nearest-neighbors, SVD

• Maximum variance unfolding: all-nearest-neighbors, SDP

• RankMap (Ouyang and Gray 2008, ICML): (all-nearest-neighbors), SDP/QP

• Isometric non-negative matrix factorization (Vasiloglou, Gray, and Anderson 2009, SDM): all-nearest-neighbors, SDP

• Isometric separation maps (Vasiloglou, Gray, and Anderson 2009, MLSP): all-nearest-neighbors, SDP

• Density-preserving maps (Ozakin and Gray, in prep): kernel summation, SDP

What about “supervised dimension reduction”?

• Sparse support vector machines: LP/QP/DC/MINLP, kernel summation

Page 6:

Sidebar: Forget about Preserving Distances?

A theorem that goes back to Gauss and Riemann implies that it is impossible to preserve the intrinsic distances between points in an intrinsically curved d-dimensional space by representing the points in d-dimensional Euclidean space. A familiar instance of this theorem is the fact that it is impossible to preserve all the distances between points on the surface of the Earth by representing them on a flat map. Although seemingly (or “extrinsically”) curved, surfaces such as the Swiss roll are intrinsically flat; a sphere, however, is intrinsically curved. Various manifold learning methods, when faced with data on such a curved space, distort the data in various ways upon performing the dimension reduction. In other words: are distances the right things to preserve at all?

Page 7:

How About Preserving Densities?

Ozakin, Vasiloglou and Gray, in prep: You can’t always preserve distances, but you can always preserve densities:

Theorem. Let (M, g_M) and (N, g_N) be two closed, connected, d-dimensional Riemannian manifolds, diffeomorphic to each other, with the same total Riemannian volume. Let X be a random variable on M, i.e., a measurable map X : Ω → M from a probability space (Ω, F, P) to M. Assume that X∗(P), the pushforward measure of P by X, is absolutely continuous with respect to the Riemannian volume measure µ_M on M, with a continuous density f on M. Then there exists a diffeomorphism φ : M → N such that the pushforward measure P_N := φ∗(X∗(P)) is absolutely continuous with respect to the Riemannian volume measure µ_N on N, and the density of P_N is given by f ∘ φ⁻¹.

Page 8:

Density-Preserving Maps: SDP

Density-Preserving Maps (Ozakin, Vasiloglou, and Gray, in prep):

maximize over K:   trace(K)

such that:

  f_i = (1 / (N h_i^d)) Σ_{j ∈ I_i} ε_ij

  ε_ij = 1 − d_ij² / h_i²

  d_ij² = K_ii + K_jj − K_ij − K_ji

  K ⪰ 0,   ε_ij ≥ 0,   Σ_{i,j=1}^n K_ij = 0

Page 9:

Estimating High-Dimensional Densities

Submanifold Kernel Density Estimation (Ozakin and Gray 2009, NIPS) can perform high-dimensional nonparametric density estimation, if the data are on a manifold M.

Theorem. Let f : M → [0, ∞) be a probability density function defined on M (so that the related probability measure is fV), and let K : [0, ∞) → [0, ∞) be a continuous function that vanishes outside [0, 1), is differentiable with a bounded derivative in [0, 1), and satisfies ∫_{‖z‖≤1} K(‖z‖) dⁿz = 1. Assume f is differentiable to second order in a neighborhood of p ∈ M, and for a sample q₁, …, q_m of size m drawn from the density f, define an estimator f_m(p) of f(p) as

  f_m(p) = (1/m) Σ_{j=1}^m (1/h_m^n) K(‖u_p(q_j)‖ / h_m)

where h_m > 0. If h_m satisfies lim_{m→∞} h_m = 0 and lim_{m→∞} m h_m^n = ∞, then there exist non-negative numbers m∗, C_b, and C_V such that for all m > m∗ we have

  MSE[f_m(p)] = E[(f_m(p) − f(p))²] < C_b h_m⁴ + C_V / (m h_m^n).   (1)

If h_m is chosen to be proportional to m^(−1/(n+4)), this gives

  E[(f_m(p) − f(p))²] = O(m^(−4/(n+4)))  as m → ∞.
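For intuition, the estimator of the theorem can be specialized to the flat one-dimensional case (n = 1, so u_p(q) is just p − q), with a hypothetical Epanechnikov-type profile that vanishes outside [0, 1) and integrates to 1, and the bandwidth h_m ∝ m^(−1/(n+4)) that the theorem recommends:

```python
import math
import random

def kernel(t):
    # Epanechnikov-type profile: vanishes outside [0, 1); the constant 3/4
    # makes its integral over |z| <= 1 equal to 1 in one dimension.
    return 0.75 * (1.0 - t * t) if t < 1.0 else 0.0

def kde(p, sample, h):
    # f_m(p) = (1/m) * sum_j (1/h^n) * K(|p - q_j| / h), with n = 1 here.
    m = len(sample)
    return sum(kernel(abs(p - q) / h) for q in sample) / (m * h)

random.seed(0)
m = 20000
sample = [random.uniform(0.0, 1.0) for _ in range(m)]   # true density: 1 on [0, 1]
h = m ** (-1.0 / (1 + 4))                               # h_m proportional to m^(-1/(n+4)), n = 1
estimate = kde(0.5, sample, h)                          # should be close to 1
```

On a curved submanifold the same recipe applies with u_p the Riemann normal coordinates around p, which is what makes the rate depend on the intrinsic dimension n rather than the ambient one.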

Page 10:

Density-Preserving Maps: Example

Figure: a) A punctured sphere data set. b) The data reduced by density-preserving maps (“DPM for punctured sphere”). c) The eigenvalue spectra of the inner product matrices learned by PCA (green, '+'), Isomap (red, '.'), MVU (blue, '*'), and DPM (blue, 'o').

Page 11:

kD-trees

Pages 12-17: kD-trees (figure-only slides continuing the kD-tree illustration).

Page 18:

Cover trees

Pages 19-22: Cover trees (figure-only slides continuing the cover-tree illustration).

Page 23:

Fast All-Nearest-Neighbors

Common graph: Use k-nearest-neighbors. Problem: This is O(N²) naively. Often graph construction is the most expensive step of manifold learning.

Fast solution: 1) Use space-partitioning trees, such as kd-trees or cover trees. This is the fastest approach for exact single-query searches. 2) Traverse two trees simultaneously, via dual-tree algorithms (Gray and Moore 2000, NIPS), for the greatest speed in all-nearest-neighbor searches.

Analysis: Shown to be O(N), i.e. linear-time, in the general bichromatic case, on cover trees (Ram, Lee, March, and Gray 2009, NIPS).
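A minimal single-tree version of the idea (one query descending one kd-tree, pruning subtrees whose splitting hyperplane is farther than the best distance found so far) can be sketched as follows; the dual-tree algorithms above instead traverse a query tree and a reference tree together. The names `build_kdtree` and `nearest` are illustrative, not from the cited papers:

```python
import math

def build_kdtree(points, depth=0):
    # Recursively split the points on alternating coordinate axes,
    # storing the median point at each node.
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build_kdtree(points[:mid], depth + 1),
            "right": build_kdtree(points[mid + 1:], depth + 1)}

def nearest(node, query, best=None):
    # Depth-first descent with pruning: visit the far subtree only when
    # the splitting hyperplane is closer than the best distance so far.
    if node is None:
        return best
    d = math.dist(query, node["point"])
    if best is None or d < best[0]:
        best = (d, node["point"])
    delta = query[node["axis"]] - node["point"][node["axis"]]
    near, far = ("left", "right") if delta < 0 else ("right", "left")
    best = nearest(node[near], query, best)
    if abs(delta) < best[0]:          # other side of the plane may still win
        best = nearest(node[far], query, best)
    return best

tree = build_kdtree([(2.0, 3.0), (5.0, 4.0), (9.0, 6.0),
                     (4.0, 7.0), (8.0, 1.0), (7.0, 2.0)])
best_dist, best_pt = nearest(tree, (9.0, 2.0))
```

The pruning test is what turns the naive O(N) scan per query into a search that, under the dual-tree analysis above, amortizes to O(N) over all N queries at once.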

Page 24:

Fast Nearest-Neighbor in High Dimensions

In general the original data are high-dimensional. Problem: A curse of dimensionality (Hammersley 1950) says that distances approach the same numerical value as dimension goes up. This makes tree algorithms ineffective in very high dimensions.

A recent trend has been to approximate the nearest neighbor by returning a point within (1 + ε) of the true nearest-neighbor distance with high probability, e.g. LSH (Andoni and Indyk, 2006). Problem: In high dimensions, all points could satisfy this criterion, making the results junk.

More accurate solution: rank-approximate nearest-neighbor search (Ram, Lee, Ouyang, and Gray 2009, NIPS), which is in general faster and more accurate than LSH.
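The essential idea of rank approximation (guarantee a neighbor whose true *rank* is at most τ, rather than one whose *distance* is within a factor) can be seen with plain uniform sampling; the actual algorithm of Ram et al. obtains the same kind of guarantee with tree traversal and pruning rather than naive sampling. The helper `rann_sample_size` and the 1-d toy data are hypothetical:

```python
import math
import random

def rann_sample_size(n, tau, alpha):
    """Number of uniform draws so that, with probability >= alpha, at least
    one sampled point has true rank <= tau among all n points."""
    # P(no sampled point lands in the top tau) = (1 - tau/n)^s <= 1 - alpha
    return math.ceil(math.log(1.0 - alpha) / math.log(1.0 - tau / n))

def rank_approx_nn(query, points, tau, alpha, rng):
    # Brute-force search restricted to a random subsample of the right size.
    s = min(len(points), rann_sample_size(len(points), tau, alpha))
    return min(rng.sample(points, s), key=lambda p: abs(p - query))

rng = random.Random(1)
points = list(range(100000))                      # 1-d toy data set
s = rann_sample_size(len(points), tau=100, alpha=0.95)
hit = rank_approx_nn(0.0, points, tau=100, alpha=0.95, rng=rng)
```

Note that the required sample size depends on n only through the ratio τ/n, which is why a rank guarantee stays meaningful in high dimensions where distance ratios concentrate.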

Page 25:

Fast Nearest-Neighbor in High Dimensions

Figure: The traversal paths of the exact and the rank-approximate algorithms in a kd-tree.

Page 26:

Fast Nearest-Neighbor in High Dimensions

Figure: Speedups (log scale on the Y-axis) over the linear search algorithm while finding the NN in the exact case or the (1 + εN)-RANN in the approximate case with ε = 0% (exact), 0.001%, 0.01%, 0.1%, 1%, 10% and a fixed success probability α = 0.95 for every point in the dataset. Datasets on the X-axis: bio, corel, covtype, images, mnist, phy, urand. The first (white) bar for each dataset is the speedup of the exact dual-tree NN algorithm, and the subsequent (dark) bars are the speedups of the approximate algorithm with increasing approximation.

Page 27:

Fast Nearest-Neighbor in High Dimensions

Generally more accurate than LSH, automatically.

Figure: Maximum rank error on the X-axis and query time (in seconds) on the Y-axis, for RANN and LSH on random samples of size 10000. Left: Layout histogram data. Right: MNIST data.

Page 28:

Fast Kernel Summation

For DPM, kernel density estimation (KDE) is needed as the first step: for all x, f(x) = (1/N) Σ_{i=1}^N K_h(‖x − x_i‖). Problem: This is O(N²) naively.

The fastest recent methods for this problem are physics-inspired fast multipole-like methods (Lee and Gray 2006, UAI). Problem: These approaches require computing a number of coefficients that explodes as dimension goes up, and thus are good only in low dimensions.

Better solution for high dimensions: Monte Carlo multipole methods (Lee and Gray 2008, NIPS), which have been shown to be effective in up to 800 dimensions, at the cost of an accuracy guarantee that holds only with high probability.
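The plain Monte Carlo core of the idea, stripped of the multipole machinery, fits in a few lines: estimate the kernel sum from a uniform subsample and scale back up by N/s. The Gaussian kernel and the 5% sampling rate below are arbitrary illustrative choices, and the real methods combine such estimates with tree-based deterministic bounds:

```python
import math
import random

def kernel_sum_exact(x, points, h):
    # Naive evaluation of sum_i K_h(|x - x_i|) at one query point
    # (O(N^2) over all N query points), unnormalized Gaussian kernel.
    return sum(math.exp(-((x - q) / h) ** 2 / 2.0) for q in points)

def kernel_sum_mc(x, points, h, sample_frac, rng):
    # Monte Carlo estimate from a uniform subsample, scaled up by N / s;
    # its standard error shrinks like 1 / sqrt(s).
    n = len(points)
    s = max(1, int(sample_frac * n))
    sub = rng.sample(points, s)
    return (n / s) * sum(math.exp(-((x - q) / h) ** 2 / 2.0) for q in sub)

rng = random.Random(0)
points = [rng.gauss(0.0, 1.0) for _ in range(20000)]
exact = kernel_sum_exact(0.0, points, h=1.0)
approx = kernel_sum_mc(0.0, points, h=1.0, sample_frac=0.05, rng=rng)
```

The error bound from the central limit theorem is what makes the guarantee probabilistic rather than deterministic, exactly the trade-off described above.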

Page 29:

Fast KDE Learning

To learn the optimal bandwidth for KDE, it must effectively be run many times, once for each bandwidth in cross-validation. Problem: This multiplies the cost significantly.

Faster solution: multi-tree Monte Carlo methods (Holmes, Gray and Isbell 2008, UAI) make this scalar sum fast, with accuracy holding with high probability.
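The scalar sum in question is the cross-validation score itself. A naive leave-one-out likelihood scorer, i.e. the O(m²)-per-bandwidth loop that the multi-tree methods accelerate, looks like this; the Gaussian kernel, the bandwidth grid, and the synthetic data are hypothetical stand-ins:

```python
import math
import random

def loo_log_likelihood(sample, h):
    """Leave-one-out log-likelihood of a Gaussian KDE at bandwidth h:
    score(h) = sum_i log f_{-i}(x_i), an O(m^2) double loop."""
    m = len(sample)
    norm = 1.0 / ((m - 1) * h * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i, x in enumerate(sample):
        fx = norm * sum(math.exp(-((x - q) / h) ** 2 / 2.0)
                        for j, q in enumerate(sample) if j != i)
        total += math.log(max(fx, 1e-300))   # guard against log(0) at tiny h
    return total

random.seed(2)
sample = [random.gauss(0.0, 1.0) for _ in range(300)]
bandwidths = [0.05, 0.2, 0.5, 1.0, 2.0]
best_h = max(bandwidths, key=lambda h: loo_log_likelihood(sample, h))
```

Too small a bandwidth is punished by the leave-one-out construction itself (each point is scored without its own kernel), and too large a bandwidth oversmooths, so a moderate value wins.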

Page 30:

Fast Approximate Singular Value Decomposition

For most manifold learning methods, the final computation is a singular value decomposition (SVD). Problem: This is O(N³) for an N × N matrix.

Recent approaches have applied Monte Carlo ideas to linear algebra (cf. NIPS 2009 tutorial). Problem: These methods are driven by theoretical sample complexity bounds, which are not data-dependent, and assume the rank is known.

Faster solution: A Monte Carlo approach based on cosine trees (Holmes, Gray and Isbell 2008, NIPS) samples more efficiently, and stops as soon as the original matrix is well approximated, making it faster as well as automatic.
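The flavor of the approach (grow a basis from sampled columns, and stop once freshly sampled columns are already well approximated, so the rank need not be known in advance) can be sketched with plain uniform column sampling. Actual cosine trees organize the columns hierarchically by direction, which this toy omits; all names here are illustrative:

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def sample_low_rank_basis(columns, tol, rng, max_checks=20):
    """Grow an orthonormal basis from randomly sampled columns, stopping
    once max_checks fresh samples in a row add nothing new."""
    basis, misses = [], 0
    while misses < max_checks:
        col = list(rng.choice(columns))
        for b in basis:                      # Gram-Schmidt: remove the part
            c = dot(col, b)                  # already explained by the basis
            col = [x - c * y for x, y in zip(col, b)]
        r = norm(col)
        if r > tol:
            basis.append([x / r for x in col])
            misses = 0
        else:
            misses += 1
    return basis

rng = random.Random(3)
# A rank-2 matrix given by its columns: every column is a combination of u and v.
u, v = [1.0, 0.0, 2.0, 0.0], [0.0, 1.0, 0.0, 3.0]
columns = [[a * x + b * y for x, y in zip(u, v)]
           for a, b in [(1, 0), (0, 1), (2, 1), (1, 2), (3, 3), (1, 1)]]
rank = len(sample_low_rank_basis(columns, tol=1e-9, rng=rng))
```

Because the stopping rule looks at the data itself (the residual of sampled columns), the number of samples adapts to the matrix at hand instead of coming from a worst-case bound.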

Page 31:

Fast Semidefinite Programming for Manifold Learning

The most recent manifold learning methods result in semidefinite programs (SDPs). Problem: These can be O(N³) or worse.

In (Vasiloglou, Gray, and Anderson 2008, MLSP) it is shown how the Burer-Monteiro method, a relaxation to a non-convex problem that preserves the same optimum, in conjunction with L-BFGS, can make MVU-like methods scalable to nearly a million points.
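To convey the Burer-Monteiro idea only (not the paper's actual formulation, which uses L-BFGS and the MVU constraints): write the Gram matrix as K = Y Yᵀ so that K ⪰ 0 holds by construction, move the constraints into quadratic penalties, and optimize the factor Y directly with plain gradient ascent. The three-point toy below pins two neighbor distances to 1 and maximizes trace(K); all constants are illustrative:

```python
# Maximize trace(Y Y^T) for three points on a line (Y is 3 x 1), with
# penalized constraints |y0 - y1|^2 = 1, |y1 - y2|^2 = 1 and sum(y) = 0.
# With K = Y Y^T, positive semidefiniteness never has to be enforced.
lam, mu, lr = 50.0, 50.0, 0.002
y = [-0.1, 0.0, 0.1]                     # small symmetric starting guess

def gradient(y):
    s = sum(y)
    g = [2.0 * yi - 2.0 * mu * s for yi in y]        # from trace and centering
    for i, j in [(0, 1), (1, 2)]:                    # distance penalties
        d2 = (y[i] - y[j]) ** 2
        c = 4.0 * lam * (d2 - 1.0) * (y[i] - y[j])
        g[i] -= c
        g[j] += c
    return g

for _ in range(5000):
    y = [yi + lr * gi for yi, gi in zip(y, gradient(y))]

trace = sum(yi * yi for yi in y)         # about 2: points settle near (-1, 0, 1)
```

The iterate ends near the maximum-trace embedding (−1, 0, 1), with a small constraint slack of order 1/λ from the finite penalty; the payoff at scale is that the variable is N × r instead of an N × N matrix with an explicit PSD cone constraint.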

Page 32:

Fast Support Vector Machines

Various SVM formulations lead to quite different, and hard, optimization problems. The fastest methods to date include:

• For L2 SVMs with squared hinge loss: our stochastic Frank-Wolfe method for this QP (online nonlinear SVM training) (Ouyang and Gray 2010, SDM; ASA Computational Statistics Student Paper Prize)

• For Lq SVMs with 0 < q < 1: our DC programming method (Guan and Gray 2010, under review)

• For L0 SVMs: our perspective cuts method for this MINLP (Guan and Gray 2010, under review)
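As a flavor of stochastic online SVM training, here is a generic Pegasos-style stochastic subgradient sketch on the regularized hinge loss; this is a standard method, not the stochastic Frank-Wolfe algorithm cited above, and the separable toy data are hypothetical:

```python
import random

def train_linear_svm(data, lam=0.01, epochs=200, rng=None):
    """Stochastic subgradient descent on
    (lam/2)*|w|^2 + mean_i max(0, 1 - y_i * <w, x_i>)   (Pegasos-style)."""
    rng = rng or random.Random(0)
    w = [0.0, 0.0, 0.0]          # last component acts as a bias (x padded with 1)
    t = 0
    for _ in range(epochs):
        for x, ylab in rng.sample(data, len(data)):   # fresh pass order each epoch
            t += 1
            eta = 1.0 / (lam * t)                     # standard decaying step size
            margin = ylab * sum(wi * xi for wi, xi in zip(w, x))
            w = [wi * (1.0 - eta * lam) for wi in w]  # shrink from the regularizer
            if margin < 1.0:                          # hinge subgradient step
                w = [wi + eta * ylab * xi for wi, xi in zip(w, x)]
    return w

data = [([2.0, 2.0, 1.0], +1), ([3.0, 1.0, 1.0], +1),
        ([-2.0, -1.0, 1.0], -1), ([-1.0, -3.0, 1.0], -1)]
w = train_linear_svm(data)
preds = [1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1 for x, _ in data]
```

Each update touches a single example, which is what makes this family of methods online and cheap per step; the formulations on this slide differ mainly in the loss and regularizer plugged into the same template.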

Page 33:

Software
