Page 1: Spacey random walks CMStatistics 2017

Spacey random walks

Austin R. Benson, Cornell University

Joint work with David Gleich (Purdue) & Lek-Heng Lim (U. Chicago)

CMStatistics, London, UK, December 17, 2017

Slides: bit.ly/arb-CMStatistics17

Page 2: Spacey random walks CMStatistics 2017

1. Start with a Markov chain

2. Inquire about the stationary distribution

3. Discover an eigenvector problem on the transition matrix


In general, {Z_t} will be a stochastic process in this talk.

The stationary distribution is the limiting fraction of time spent in each state.

Background. Markov chains, matrices, and eigenvectors have a long-standing relationship.
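The slide's equations are images that do not survive the transcript; the standard statement, for a column-stochastic transition matrix with $P_{i,j} = \Pr(Z_{t+1} = i \mid Z_t = j)$, is

$$ x_i = \sum_j P_{i,j}\, x_j, \quad \text{i.e.,} \quad P x = x, \qquad x \ge 0, \quad \sum_i x_i = 1, $$

so the stationary distribution is an eigenvector of the transition matrix with eigenvalue 1.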

Page 3: Spacey random walks CMStatistics 2017


Higher-order means keeping more history on the same state space.

A better model for several applications:

traffic flow in airport networks [Rosvall+ 14]

web browsing behavior [Pirolli-Pitkow 99; Chierichetti+ 12]

DNA sequences [Borodovsky-McIninch 93; Ching+ 04]

(Figure: a second-order Markov chain, from Rosvall et al., Nature Comm., 2014.)

Background. Higher-order Markov chains are useful for many data problems.

Page 4: Spacey random walks CMStatistics 2017


Transition probability tensor [Li-Cui-Ng 13; Culp-Pearson-Zhang 17], stochastic tensor [Yang-Yang 11], stochastic hypermatrix [Benson-Gleich-Lim 17].

For our purposes, "tensors" are just multi-way arrays of numbers (tensor ⟷ hypermatrix).

A is a third-order n × n × n tensor → A_{i,j,k} is a (real) number, 1 ≤ i, j, k ≤ n.


(A matrix is just a second-order tensor.)

Background. The transition probabilities of higher-order Markov chains can be represented by a tensor.
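Concretely (the slide's formula is an image; this is the standard convention the rest of the talk uses): for a second-order Markov chain on n states,

$$ P_{i,j,k} = \Pr(Z_{t+1} = i \mid Z_t = j,\ Z_{t-1} = k), \qquad \sum_{i=1}^{n} P_{i,j,k} = 1 \ \text{for all } j, k, $$

so each column fiber of the n × n × n tensor P is a probability distribution.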

Page 5: Spacey random walks CMStatistics 2017


A tensor eigenpair for a tensor A is a solution (x, 𝜆) to the following system of polynomial equations [Lim 05; Qi 05].

Technically this is called an ℓ2 or z tensor eigenpair; there are a few types of tensor eigenvectors (see the new Qi-Luo 2017 book!).

Analogous to the matrix case, eigenpairs are stationary points of the Lagrangian for a generalized Rayleigh quotient [Lim 05].

Background. Tensors also have eigenvectors.

(Side-by-side: matrix eigenvector vs. tensor eigenvector.)
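The defining equations are lost in transcription; for a third-order tensor, the standard z-eigenpair definition [Lim 05; Qi 05] reads

$$ \sum_{j,k=1}^{n} A_{i,j,k}\, x_j x_k = \lambda x_i \quad (1 \le i \le n), \qquad \|x\|_2 = 1, $$

often abbreviated $A x^2 = \lambda x$, in analogy with the matrix equation $A x = \lambda x$.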

Page 6: Spacey random walks CMStatistics 2017


However, there are few results connecting tensors and higher-order Markov chains.

Page 7: Spacey random walks CMStatistics 2017


Do tensor eigenvectors tell us anything about higher-order Markov chains?

1. Start with a Markov chain

2. Inquire about stationary dist.

3. Discover a matrix eigenvector problem on the transition matrix

1. Start with a higher-order MC

2. Inquire about stationary dist.

3. Discover a tensor eigenvector problem on the transition tensor

?

Page 8: Spacey random walks CMStatistics 2017


Second-order Markov chains have a stationary distribution on pairs of states. The Li-Ng approximation gives tensor eigenvectors.

The stationary distribution on pairs of states is still a matrix eigenvector, but it requires O(N²) space.

[Li-Ng 14] The rank-1 approximation X_{i,j} = x_i x_j gives a "distribution" on the original states as a tensor eigenvector. It only needs O(N) space.
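Written out (the displayed equations are images in the original; this reconstruction follows the definitions above): viewing the second-order chain as a first-order chain on pairs, the pairwise stationary distribution $X$ satisfies $X_{i,j} = \sum_k P_{i,j,k} X_{j,k}$. Substituting the rank-1 form $X_{i,j} = x_i x_j$ and summing over $j$ (with $\sum_j x_j = 1$) gives

$$ x_i = \sum_{j,k} P_{i,j,k}\, x_j x_k, \qquad \text{i.e.,} \qquad P x^2 = x, $$

a tensor z-eigenvector with eigenvalue 1 that needs only O(N) numbers.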

Page 9: Spacey random walks CMStatistics 2017


Higher-order Markov chains and tensor eigenvectors.

The Li and Ng “stationary distribution”

This tensor eigenvector x has been studied algebraically…

Is nonnegative and sums to 1 ⟶ is stochastic [Li-Ng 14]

Almost always exists [Li-Ng 14]

…but might not be unique

Can sometimes be computed [Chu-Wu 14; Gleich-Lim-Yu 15]

Nagging questions.

1. What is the stochastic process underlying this tensor eigenvector?

2. How can we use it to study data?

Page 10: Spacey random walks CMStatistics 2017


Do tensor eigenvectors tell us anything about higher-order Markov chains?

1. Start with a Markov chain

2. Inquire about stationary dist.

3. Discover a matrix eigenvector problem on the transition matrix

1. Start with a higher-order MC

2. Inquire about stationary dist.

3. Discover a tensor eigenvector problem on the transition tensor of a related stochastic process.

Page 11: Spacey random walks CMStatistics 2017


Alright, so what is the stochastic process whose stationary distribution is the tensor eigenvector Px² = x?

Page 12: Spacey random walks CMStatistics 2017

1. Start with the transition probabilities of a higher-order Markov chain.

2. Upon arriving at state Z_t = j, we space out and forget about coming from Z_{t-1} = k.

3. We still think that we are higher-order, so we draw a random state r from our history and "pretend" that Z_{t-1} = r. (A simulation sketch follows below.)


The spacey random walk.

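A minimal simulation sketch of this process (not the authors' code; for simplicity the guessed state is drawn uniformly from the raw history, whereas the paper seeds the history with pseudo-counts so every state has positive probability):

```python
import numpy as np

def spacey_random_walk(P, num_steps, z0, seed=None):
    """Simulate a spacey random walk on a transition tensor P, where
    P[i, j, k] = Pr(next = i | current = j, guessed previous = k)
    and every column fiber P[:, j, k] sums to 1."""
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    history = [z0]  # every state visited so far
    z = z0
    for _ in range(num_steps):
        y = rng.choice(history)          # space out: guess Z_{t-1} from the history
        z = rng.choice(n, p=P[:, z, y])  # transition as if Z_{t-1} = y
        history.append(z)
    return history
```

The empirical distribution of the returned history approximates the occupancy vector w_t whose limit points the next slides characterize.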

Page 13: Spacey random walks CMStatistics 2017

(Figure: an example walk history with current state Z_t, previous state Z_{t-1}, and guessed state Y_t drawn from the history.)

Theorem [Benson-Gleich-Lim 17]. Limiting distributions of this process are tensor eigenvectors of P (the tensor of higher-order Markov chain transition probabilities).


The spacey random walk.


Page 14: Spacey random walks CMStatistics 2017


w_t(k) = fraction of time spent at state k up to time t.

The spacey random walk is a type of vertex-reinforced random walk.

Vertex-reinforced random walks [Diaconis 88; Pemantle 92, 07; Benaïm 97]

F_t is the 𝜎-algebra generated by the history up to time t, {Z_1, …, Z_t}.

M(w_t) is a column-stochastic transition matrix that depends on w_t.

Spacey random walks come from a particular map M that depends on P.


Page 15: Spacey random walks CMStatistics 2017


Theorem [Benaïm 97], heavily paraphrased. In a discrete VRRW, the long-term behavior of the occupancy distribution w_t follows the long-term behavior of the dynamical system dx/dt = 𝛑(M(x)) − x.

Key idea. We study convergence of the dynamical system for our particular map M.

Stationary distributions of vertex-reinforced random walks follow the trajectories of ODEs.

𝛑 maps a column-stochastic matrix to its stationary distribution.
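For the spacey random walk, the particular map can be written out (a reconstruction from the paper's definitions, since the slide's formulas are images):

$$ [M(w)]_{i,j} = \sum_{k=1}^{n} w_k\, P_{i,j,k}, $$

i.e., M(w) averages the slices of the transition tensor against the occupancy vector w; plugging this map into the theorem gives the dynamical system whose convergence we study.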

Page 16: Spacey random walks CMStatistics 2017


Dynamical system for VRRWs → map for spacey random walks → stationary point → tensor eigenvector! (but not all are attractors)

From continuous-time dynamical systems to tensor eigenvectors.
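Filling in the algebra (the slide's equations are images): a stationary point of $\dot{x} = \boldsymbol{\pi}(M(x)) - x$ satisfies $\boldsymbol{\pi}(M(x)) = x$, so $x$ is the stationary distribution of $M(x)$, i.e., $M(x)\, x = x$. Expanding the map,

$$ x_i = \sum_j [M(x)]_{i,j}\, x_j = \sum_{j,k} P_{i,j,k}\, x_j x_k, $$

which is exactly the tensor eigenvector equation $P x^2 = x$.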

Page 17: Spacey random walks CMStatistics 2017


Our stochastic viewpoint gives a new approach: we simply numerically integrate the dynamical system (this works for our stochastic tensors).

Current tensor eigenvector computation algorithms are algebraic; they look like generalizations of the matrix power method, shifted iteration, and Newton iteration [Lathauwer-Moore-Vandewalle 00; Regalia-Kofidis 00; Li-Ng 13; Chu-Wu 14; Kolda-Mayo 11, 14].

Computing tensor eigenvectors.

Higher-order power method vs. dynamical system: the power method has many known convergence issues!

Empirical observation. Integrating the dynamical system with ODE45() in MATLAB/Julia always converges; tested for a wide variety of synthetic and real-world data (even when state-of-the-art general algorithms diverge!).
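A minimal sketch of this approach in Python with SciPy (the talk used ODE45() in MATLAB/Julia; the function names here are my own, and the stationary-distribution helper assumes M(x) has a unique stationary distribution along the trajectory):

```python
import numpy as np
from scipy.integrate import solve_ivp

def stationary_distribution(M):
    """Stationary distribution of a column-stochastic matrix: M x = x, sum(x) = 1."""
    vals, vecs = np.linalg.eig(M)
    v = np.abs(np.real(vecs[:, np.argmin(np.abs(vals - 1.0))]))
    return v / v.sum()

def srw_ode(t, x, P):
    """Right-hand side dx/dt = pi(M(x)) - x with M(x) = sum_k x_k P[:, :, k]."""
    M = np.einsum('ijk,k->ij', P, x)
    return stationary_distribution(M) - x

def srw_eigenvector(P, t_final=100.0):
    """Integrate from the uniform distribution; if the trajectory converges,
    the limit is a tensor z-eigenvector satisfying P x^2 = x."""
    n = P.shape[0]
    x0 = np.full(n, 1.0 / n)
    sol = solve_ivp(srw_ode, (0.0, t_final), x0, args=(P,), rtol=1e-10, atol=1e-12)
    return sol.y[:, -1]
```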

Page 18: Spacey random walks CMStatistics 2017


1. If the higher-order Markov chain is really just a first-order chain, then the SRW is identical to the first-order chain.

2. SRWs are asymptotically first-order Markovian: w_t converges to w ⟶ the dynamics converge to M(w).

3. Stationary distributions only need O(n) memory, unlike higher-order Markov chains.

4. SRWs generalize Pólya urn processes.

5. Nearly all 2 x 2 x 2 x … x 2 SRWs converge.

6. Some convergence guarantees with forward Euler integration and a new algorithm for computing the eigenvector.

Theory of spacey random walks. [Benson-Gleich-Lim 17]

Page 19: Spacey random walks CMStatistics 2017


Key idea. Reduce the dynamics to a one-dimensional ODE.

Dynamics of two-state spacey random walks.

Unfolding of P.

Then we can just write out our dynamics…
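The explicit formulas are images in the original; a hedged reconstruction from the definitions above: with two states, write x = (p, 1 − p). Then M(x) is the 2 × 2 column-stochastic matrix with off-diagonal entries

$$ a(p) = p\,P_{1,2,1} + (1-p)\,P_{1,2,2}, \qquad b(p) = p\,P_{2,1,1} + (1-p)\,P_{2,1,2}, $$

its stationary distribution is $\big(\tfrac{a}{a+b}, \tfrac{b}{a+b}\big)$, and the dynamics collapse to the scalar ODE

$$ \frac{dp}{dt} = \frac{a(p)}{a(p)+b(p)} - p. $$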

Page 20: Spacey random walks CMStatistics 2017


(Figure: the one-dimensional dynamics, with stable and unstable equilibria marked.)

Theorem [Benson-Gleich-Lim 17]. The dynamics of almost every 2 x 2 x … x 2 spacey random walk (of any order) converge to a stable equilibrium point.

Dynamics of two-state spacey random walks.

Page 21: Spacey random walks CMStatistics 2017

Theorem [Benson-Gleich-Lim 17]. If a < ½, then the dynamical system converges to a unique fixed point, and numerical integration using forward Euler with step size h < (1 – a) / (1 – 2a) converges to this point.


Similar to the PageRank modification to a Markov chain.

1. With probability a, follow the spacey random walk

2. With probability 1 – a, teleport to a random node.

The spacey random surfer offers additional structure.

SRS tensor: P_a = a · P + (1 − a) · E/n, where P is the transition tensor and E is the all-ones tensor (the 1/n keeps each column fiber stochastic).
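A forward Euler sketch of the iteration the theorem analyzes (my illustration, with the teleportation already folded into the stochastic tensor Pa):

```python
import numpy as np

def srw_forward_euler(Pa, h=0.5, max_iters=10000, tol=1e-12):
    """Forward Euler for dx/dt = pi(M(x)) - x, i.e., x <- x + h * (pi(M(x)) - x)."""
    n = Pa.shape[0]
    x = np.full(n, 1.0 / n)
    for _ in range(max_iters):
        M = np.einsum('ijk,k->ij', Pa, x)   # M(x) = sum_k x_k Pa[:, :, k]
        vals, vecs = np.linalg.eig(M)       # pi(M): eigenvector for eigenvalue 1
        v = np.abs(np.real(vecs[:, np.argmin(np.abs(vals - 1.0))]))
        step = v / v.sum() - x
        x = x + h * step
        if np.linalg.norm(step, 1) < tol:   # fixed point: Pa x^2 = x
            break
    return x
```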

Page 22: Spacey random walks CMStatistics 2017


1. Modeling transportation systems [Benson-Gleich-Lim 17]. The SRW describes taxi cab trajectories.

2. Clustering multi-dimensional nonnegative data [Wu-Benson-Gleich 16]. The SRW provides a new spectral clustering methodology.

3. Ranking multi-relational data [Gleich-Lim-Yu 15]. The spacey random surfer is the stochastic process underlying the "multilinear PageRank vector".

4. Population genetics. The spacey random walk traces the lineage of alleles in a random mating model. The stationary distribution is the Hardy–Weinberg equilibrium.

Applications of spacey random walks.

Page 23: Spacey random walks CMStatistics 2017


Example location sequences: 1,2,2,1,5,4,4,…; 1,2,3,2,2,5,5,…; 2,2,3,3,3,3,2,…; 5,4,5,5,3,3,1,…

Model people by locations.

A passenger with location k is drawn at random.

The taxi picks up the passenger at location j.

The taxi drives the passenger to location i with probability P_{i,j,k}.

Approximate the location distribution by the history ⟶ spacey random walk.

(Image credits: Urban Computing, Microsoft Asia; nyc.gov.)

Spacey random walk model for taxi trajectories.

Page 24: Spacey random walks CMStatistics 2017


Observed sequence x(1), x(2), x(3), x(4), …

Maximum likelihood estimation problem: convex objective, linear constraints.

Spacey random walk model for taxi trajectories.
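In symbols (a hedged reconstruction; the slide's optimization problem is an image): with observed states x(1), …, x(T) and empirical occupancy vectors w_t, each transition probability is linear in the unknown tensor, so the estimation problem is

$$ \min_{P}\ -\sum_{t=1}^{T-1} \log\Big( \sum_{k} [w_t]_k \, P_{x(t+1),\,x(t),\,k} \Big) \quad \text{s.t.} \quad P \ge 0, \quad \sum_i P_{i,j,k} = 1 \ \text{for all } j,k. $$

The objective is convex (a negative log of a linear function of P) and the constraints are linear, as the slide notes.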

Page 25: Spacey random walks CMStatistics 2017

One year of 1000 taxi trajectories in NYC.

States are neighborhoods in Manhattan.

Learn the tensor P under the spacey random walk model from training data of 800 taxis.

Evaluation: RMSE on test data of 200 taxis.

NYC taxi data supports the SRW hypothesis.

Page 26: Spacey random walks CMStatistics 2017


Connecting spacey random walks to clustering.

Joint work with Tao Wu, Purdue.

Spacey random walks with stationary distributions are asymptotically Markov chains:

the occupancy vector w_t converges to w ⟶ the dynamics converge to M(w).

This connects to spectral clustering on graphs.

Eigenvectors of the normalized Laplacian of a graph are eigenvectors of the random walk matrix.


General tensor spectral co-clustering for higher-order data, Wu-Benson-Gleich, NIPS, 2016.

Page 27: Spacey random walks CMStatistics 2017


We use the random walk connection to spectral clustering to cluster nonnegative tensor data.

If the data is a symmetric cube (index set {1, …, n}³), we can normalize it to get a transition tensor P; this generalizes the random-walk normalization of a matrix.

If the data is a brick (index set {1, …, n1} × {1, …, n2} × {1, …, n3}), we symmetrize before normalization. [Ragnarsson-Van Loan 2011]

Page 28: Spacey random walks CMStatistics 2017


Input. Nonnegative brick of data.

1. Symmetrize the brick (if necessary)

2. Normalize to a stochastic tensor (see the sketch after this list)

3. Estimate the stationary distribution of the spacey random walk (or a generalization for sparse data—super-spacey random walk)

4. Form the asymptotic Markov model

5. Bisect indices using eigenvector of the asymptotic Markov model

6. Recurse

Output. Partition of indices.

The clustering methodology.

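A sketch of step 2 under simple assumptions (empty column fibers are left at zero here; handling such sparse fibers is exactly what the super-spacey random walk of step 3 is for):

```python
import numpy as np

def normalize_to_stochastic(T):
    """Scale a nonnegative tensor so each column fiber sums to 1:
    P[i, j, k] = T[i, j, k] / sum_i T[i, j, k] wherever the fiber is nonzero."""
    s = T.sum(axis=0, keepdims=True)
    with np.errstate(divide='ignore', invalid='ignore'):
        P = np.where(s > 0, T / s, 0.0)
    return P
```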

Page 29: Spacey random walks CMStatistics 2017


T_{i,j,k} = #(flights between airport i and airport j on airline k)

Clustering airline-airport-airport networks.

(Figure: the unclustered tensor shows no apparent structure; after clustering, diagonal structure is evident.)

Page 30: Spacey random walks CMStatistics 2017


“best” clusters

pronouns & articles (the, we, he, …)

prepositions & link verbs (in, of, as, to, …)

fun 3-gram clusters

{cheese, cream, sour, low-fat, frosting, nonfat, fat-free}

{bag, plastic, garbage, grocery, trash, freezer}

fun 4-gram cluster

{german, chancellor, angela, merkel, gerhard, schroeder, helmut, kohl}

T_{i,j,k} = #(consecutive co-occurrences of words i, j, k in corpus)

T_{i,j,k,l} = #(consecutive co-occurrences of words i, j, k, l in corpus)

Data from the Corpus of Contemporary American English (COCA), www.ngrams.info

Clustering n-grams in natural language.

Page 31: Spacey random walks CMStatistics 2017

Spacey random walks

The spacey random walk: a stochastic process for higher-order data. Austin Benson, David Gleich, and Lek-Heng Lim. SIAM Review, 2017. https://github.com/arbenson/spacey-random-walks

General tensor spectral co-clustering for higher-order data. Tao Wu, Austin Benson, and David Gleich. NIPS, 2016. https://github.com/wutao27/GtensorSC

http://cs.cornell.edu/~arb

@austinbenson

[email protected]

Thanks! Austin R. Benson

1. Spacey random walks are stochastic processes that explain the principal tensor z-eigenvectors of transition probability tensors.

2. They provide a distribution over the original state space and only need O(N) space.

3. Some convergence guarantees (always converges in practice).

4. Data applications in human dynamics and multi-relational clustering.