# Spacey random walks CMStatistics 2017

Jan 28, 2018

## Data & Analytics

1. Spacey random walks. Austin R. Benson, Cornell University. Joint work with David Gleich (Purdue) and Lek-Heng Lim (U. Chicago). CMStatistics, London, UK, December 17, 2017. Slides: bit.ly/arb-CMStatistics17
2. Background. Markov chains, matrices, and eigenvectors have a long-standing relationship.
    1. Start with a Markov chain. In general, $\{Z_t\}$ will be a stochastic process in this talk.
    2. Inquire about the stationary distribution. This is the limiting fraction of time spent in each state.
    3. Discover an eigenvector problem on the transition matrix.
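
As a minimal sketch of steps 2 and 3 on this slide, the stationary distribution of a small first-order chain can be computed by power iteration on its transition matrix. The 2-state matrix `M`, the function name, and the column-stochastic convention (`M[i][j]` is the probability of moving from state `j` to state `i`) are illustrative assumptions, not taken from the slides.

```python
def stationary_distribution(M, iters=1000, tol=1e-12):
    """Power iteration for a column-stochastic matrix M (assumed convention:
    M[i][j] = probability of moving from state j to state i)."""
    n = len(M)
    x = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        y = [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]
        if sum(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x

# Hypothetical 2-state chain; each column sums to 1.
M = [[0.9, 0.5],
     [0.1, 0.5]]
pi = stationary_distribution(M)
print(pi)  # satisfies M pi = pi up to the tolerance, i.e. an eigenvector for eigenvalue 1
```

For this example the fixed-point equations can be solved by hand: 0.1 pi[0] = 0.5 pi[1], so pi = (5/6, 1/6).
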
3. Background. Higher-order Markov chains are useful for many data problems. Higher-order means keeping more history on the same state space. This is a better model for several applications: traffic flow in airport networks [Rosvall+ 14]; web browsing behavior [Pirolli-Pitkow 99; Chierichetti+ 12]; DNA sequences [Borodovsky-McIninch 93; Ching+ 04]. Figure: second-order MC, from Rosvall et al., Nature Comm., 2014.
4. Background. The transition probabilities of higher-order Markov chains can be represented by a tensor. For our purposes, tensors are just multi-way arrays of numbers (tensor, or hypermatrix). $A$ is a third-order $n \times n \times n$ tensor: $A_{i,j,k}$ is a (real) number, $1 \le i, j, k \le n$. (A matrix is just a second-order tensor.) The tensor $P$ of transition probabilities is called a transition probability tensor [Li-Cui-Ng 13; Culp-Pearson-Zhang 17], a stochastic tensor [Yang-Yang 11], or a stochastic hypermatrix [Benson-Gleich-Lim 17].
5. Background. Tensors also have eigenvectors. A tensor eigenpair for a tensor $A$ is a solution $(x, \lambda)$ to the following system of polynomial equations [Lim 05, Qi 05]: $\sum_{j,k} A_{i,j,k} x_j x_k = \lambda x_i$ for all $i$, with $\|x\|_2 = 1$ (tensor eigenvector), compared with $\sum_j A_{i,j} x_j = \lambda x_i$ (matrix eigenvector). Technically this is called an $\ell^2$ or $Z$ tensor eigenpair; there are a few types of tensor eigenvectors (see the new Qi-Luo 2017 book!). Analogous to the matrix case, eigenpairs are stationary points of the Lagrangian for a generalized Rayleigh quotient [Lim 05].
6. However, there are few results connecting tensors and higher-order Markov chains.
7. Do tensor eigenvectors tell us anything about higher-order Markov chains? First-order case: (1) start with a Markov chain; (2) inquire about the stationary distribution; (3) discover a matrix eigenvector problem on the transition matrix. Higher-order case: (1) start with a higher-order MC; (2) inquire about the stationary distribution; (3) discover a tensor eigenvector problem on the transition tensor?
8. Second-order Markov chains have a stationary distribution on pairs of states; the Li-Ng approximation gives tensor eigenvectors. The stationary distribution on pairs of states is still a matrix eigenvector, but it requires $O(N^2)$ space. [Li-Ng 14] The rank-1 approximation $X_{i,j} = x_i x_j$ gives a distribution on the original states as a tensor eigenvector, which needs only $O(N)$ space.
9. Higher-order Markov chains and tensor eigenvectors. The Li and Ng stationary distribution: this tensor eigenvector $x$ has been studied algebraically. It is nonnegative and sums to 1, that is, it is stochastic [Li-Ng 14]; it almost always exists [Li-Ng 14] but might not be unique; and it can sometimes be computed [Chu-Wu 14; Gleich-Lim-Yu 15]. Nagging questions: (1) What is the stochastic process underlying this tensor eigenvector? (2) How can we use it to study data?
10. Do tensor eigenvectors tell us anything about higher-order Markov chains? First-order case: (1) start with a Markov chain; (2) inquire about the stationary distribution; (3) discover a matrix eigenvector problem on the transition matrix. Higher-order case: (1) start with a higher-order MC; (2) inquire about the stationary distribution; (3) discover a tensor eigenvector problem on the transition tensor of a related stochastic process.
11. Alright, so what is the stochastic process whose stationary distribution is the tensor eigenvector $P x^2 = x$, that is, $\sum_{j,k} P_{i,j,k} x_j x_k = x_i$ for every state $i$?
12. The spacey random walk.
    1. Start with the transition probabilities of a higher-order Markov chain.
    2. Upon arriving at state $Z_t = j$, we space out and forget about coming from $Z_{t-1} = k$.
    3. We still think that we are higher-order, so we draw a random state $r$ from our history and pretend that $Z_{t-1} = r$.
13. The spacey random walk. The walk uses the higher-order Markov chain transition probabilities. (Figure: a history of visited states 10, 12, 4, 9, 7, 11, 4, with $Z_{t-1}$, $Z_t$, and $Y_t$ labeled.) Theorem [Benson-Gleich-Lim 17]: limiting distributions of this process are tensor eigenvectors of $P$.
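
The three steps of the spacey random walk can be sketched as a short simulation. This is an illustration under stated assumptions rather than the authors' code: `P[i][j][k]` is taken to be the probability of moving to state `i` given the current state `j` and the remembered previous state `k`; the history starts with one pseudo-visit per state so that every state can be drawn; and the 2 x 2 x 2 tensor is a made-up example.

```python
import random

def spacey_random_walk(P, steps, seed=0):
    """Simulate a spacey random walk and return the occupancy vector.

    Assumed convention: P[i][j][k] = Pr(next = i | current = j, remembered = k).
    """
    rng = random.Random(seed)
    n = len(P)
    counts = [1] * n          # assumption: one pseudo-visit per state
    z = rng.randrange(n)      # arbitrary starting state
    counts[z] += 1
    for _ in range(steps):
        # "Space out": forget the true previous state and draw a state r
        # from the history, proportionally to past visit counts.
        r = rng.choices(range(n), weights=counts)[0]
        # Transition as if the previous state had been r.
        probs = [P[i][z][r] for i in range(n)]
        z = rng.choices(range(n), weights=probs)[0]
        counts[z] += 1
    total = sum(counts)
    return [c / total for c in counts]

# Hypothetical 2-state transition tensor; P[0][j][k] + P[1][j][k] = 1 for all j, k.
P = [[[0.9, 0.5], [0.3, 0.2]],
     [[0.1, 0.5], [0.7, 0.8]]]
w = spacey_random_walk(P, steps=20000)
print(w)  # fraction of time spent in each state
```

For a long walk, the returned occupancy vector can be compared against $P x^2 = x$ by evaluating both sides at `w`.
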
14. The spacey random walk is a type of vertex-reinforced random walk. Vertex-reinforced random walks [Diaconis 88; Pemantle 92, 07; Benaïm 97]: $w_t(k)$ is the fraction of time spent at state $k$ up to time $t$; $\mathcal{F}_t$ is the $\sigma$-algebra generated by the history up to time $t$, $\{Z_1, \ldots, Z_t\}$; $M(w_t)$ is a column-stochastic transition matrix that depends on $w_t$. Spacey random walks come from a particular map $M$ that depends on $P$.
15. Stationary distributions of vertex-reinforced random walks follow the trajectories of ODEs. Theorem [Benaïm 97], heavily paraphrased: in a discrete VRRW, the long-term behavior of the occupancy distribution $w_t$ follows the long-term behavior of the dynamical system $dx/dt = \pi(M(x)) - x$, where $\pi$ maps a column-stochastic matrix to its stationary distribution. Key idea: we study convergence of the dynamical system for our particular map $M$.
16. From continuous-time dynamical systems to tensor eigenvectors. Combine the dynamical system for VRRWs with the map $M$ for spacey random walks: a stationary point is a tensor eigenvector! (But not all are attractors.)
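
A short derivation sketch of why stationary points of the dynamical system are tensor eigenvectors. It assumes that the spacey random walk map is the occupancy-weighted mixture of the slices of $P$, $M(x)_{i,j} = \sum_k P_{i,j,k} x_k$, which this transcript does not spell out, and that $\pi$ maps a column-stochastic matrix to its stationary distribution.

$$
\begin{aligned}
\frac{dx}{dt} &= \pi\big(M(x)\big) - x, \qquad M(x)_{i,j} = \sum_k P_{i,j,k}\, x_k, \\
\frac{dx}{dt} = 0 \;&\Longleftrightarrow\; \pi\big(M(x)\big) = x \;\Longrightarrow\; M(x)\, x = x \\
&\Longleftrightarrow\; \sum_{j,k} P_{i,j,k}\, x_j x_k = x_i \ \text{ for all } i \;\Longleftrightarrow\; P x^2 = x .
\end{aligned}
$$

The middle implication uses only the definition of a stationary distribution: if $x$ is the stationary distribution of the matrix $M(x)$, then $M(x)\,x = x$.
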
17. Computing tensor eigenvectors. Current tensor eigenvector computation algorithms are algebraic and look like generalizations of the matrix power method, shifted iteration, and Newton iteration [Lathauwer-Moore-Vandewalle 00; Regalia-Kofidis 00; Li-Ng 13; Chu-Wu 14; Kolda-Mayo 11, 14]. The higher-order power method has many known convergence issues! Our stochastic viewpoint gives a new approach: we simply numerically integrate the dynamical system (this works for our stochastic tensors). Empirical observation: integrating the dynamical system with ODE45() in MATLAB/Julia has converged in every test on a wide variety of synthetic and real-world data (even when state-of-the-art general algorithms diverge!).
18. Theory of spacey random walks [Benson-Gleich-Lim 17].
    1. If the higher-order Markov chain is really just a first-order chain, then the SRW is identical to the first-order chain.
    2. SRWs are asymptotically first-order Markovian: $w_t$ converges to $w$, so the dynamics converge to $M(w)$.
    3. Stationary distributions only need $O(n)$ memory, unlike higher-order Markov chains.
    4. SRWs generalize Pólya urn processes.
    5. Nearly all $2 \times 2 \times \cdots \times 2$ SRWs converge.
    6. Some convergence guarantees with forward Euler integration and a new algorithm for computing the eigenvector.
19. Dynamics of two-state spacey random walks. Key idea: reduce the dynamics to a 1-dimensional ODE. Take the unfolding of $P$; then we can just write out our dynamics.
20. Dynamics of two-state spacey random walks. Theorem [Benson-Gleich-Lim 17]: the dynamics of almost every $2 \times 2 \times \cdots \times 2$ spacey random walk (of any order) converge to a stable equilibrium point. (Figure: equilibria labeled stable, stable, unstable.)
21. The spacey random surfer offers additional structure. It is similar to the PageRank modification to a Markov chain: (1) with probability $\alpha$, follow the spacey random walk; (2) with probability $1 - \alpha$, teleport to a random node. The SRS tensor is $P_\alpha = \alpha P + (1 - \alpha) E$, where $P$ is the transition tensor and $E$ is the all-ones tensor (up to the normalization that keeps $P_\alpha$ stochastic). Theorem [Benson-Gleich-Lim 17]: if $\alpha < 1/2$, then the dynamical system converges to a unique fixed point, and numerical integration using forward Euler with step size $h < (1 - \alpha) / (1 - 2\alpha)$ converges to this fixed point.
22. Applications of spacey random walks.
    1. Modeling transportation systems [Benson-Gleich-Lim 17]. The SRW describes taxi cab trajectories.
    2. Clustering multi-dimensional nonnegative data [Wu-Benson-Gleich 16]. The SRW provides a new spectral clustering methodology.
    3. Ranking multi-relational data [Gleich-Lim-Yu 15]. The spacey random surfer is the stochastic process underlying the multilinear PageRank vector.
    4. Population genetics. The spacey random walk traces the lineage of alleles in a random mating model. The stationary distribution is the Hardy-Weinberg equilibrium.
23. Spacey random walk model for taxi trajectories. Example trajectories: 1,2,2,1,5,4,4,…; 1,2,3,2,2,5,5,…; 2,2,3,3,3,3,2,…; 5,4,5,5,3,3,1,… Model people by locations: a passenger with location $k$ is drawn at random; the taxi picks up the passenger at location $j$; the taxi drives the passenger to location $i$ with probability $P_{i,j,k}$. Approximating the location distribution by the history gives a spacey random walk. Data sources: Urban Computing, Microsoft Asia; nyc.gov.
24. Spacey random walk model for taxi trajectories. Given an observed trajectory $x(1), x(2), x(3), x(4), \ldots$, the tensor $P$ is learned by solving a maximum likelihood estimation problem with a convex objective and linear constraints.
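
To make the estimation problem concrete, here is a hedged sketch of the log-likelihood of one trajectory under a spacey random walk. It assumes that the probability of the step from `j` to `i` at time `t` is $\sum_k w_t(k) P_{i,j,k}$, that the occupancy vector $w_t$ uses one pseudo-visit per state, and that states are zero-indexed; the function name and example data are hypothetical. Under these assumptions each term is the logarithm of a linear function of $P$, so the negative log-likelihood is convex in $P$, and the requirements that entries are nonnegative and that each column sums to one are linear.

```python
import math

def srw_log_likelihood(P, trajectory):
    """Log-likelihood of a state sequence under a spacey random walk.

    Assumed convention: P[i][j][k] = Pr(next = i | current = j, remembered = k).
    """
    n = len(P)
    counts = [1] * n                  # assumption: one pseudo-visit per state
    counts[trajectory[0]] += 1
    ll = 0.0
    for t in range(1, len(trajectory)):
        j, i = trajectory[t - 1], trajectory[t]
        total = sum(counts)
        # Mix the slices of P by the current occupancy vector w_t = counts / total.
        prob = sum(counts[k] / total * P[i][j][k] for k in range(n))
        ll += math.log(prob)
        counts[i] += 1
    return ll

# Hypothetical 2-state tensor with all transitions equally likely.
P_uniform = [[[0.5, 0.5], [0.5, 0.5]],
             [[0.5, 0.5], [0.5, 0.5]]]
print(srw_log_likelihood(P_uniform, [0, 1, 1, 0, 1]))  # equals 4 * log(0.5) up to rounding
```

Maximizing this quantity over stochastic tensors `P` would require a constrained convex solver, which is outside the scope of this sketch.
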
25. NYC taxi data supports the SRW hypothesis. One year of 1000 taxi trajectories in NYC; states are neighborhoods in Manhattan. Learn the tensor $P$ under the spacey random walk model from training data of 800 taxis. Evaluation: RMSE on test data of 200 taxis.
26. Connecting spacey random walks to clustering. Joint work with Tao Wu, Purdue. Spacey random walks with stationary distributions are asymptotically Markov chains: the occupancy vector $w_t$ converges to $w$, so the dynamics converge to $M(w)$. This connects to spectral clustering on graphs: eigenvectors of the normalized Laplacian of a graph are eigenvectors of the random walk matrix. Reference: General tensor spectral co-clustering for higher-order data, Wu-Benson-Gleich, NIPS, 2016.
27. We use the random walk connection to spectral clustering to cluster nonnegative tensor data. If the data is a symmetric cube, $[i_1, i_2, \ldots, i_n]^3$, we can normalize it to get a transition tensor $P$. If the data is a brick, $[i_1, i_2, \ldots, i_{n_1}] \times [j_1, j_2, \ldots, j_{n_2}] \times [k_1, k_2, \ldots, k_{n_3}]$, we symmetrize before normalization [Ragnarsson-Van Loan 2011]. Generalization of …
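
The normalization step for a symmetric cube can be sketched as below. The choice to divide each column along the first index by its sum, the uniform fallback for all-zero columns, and the example data are assumptions made for illustration; the slide only states that a symmetric cube can be normalized into a transition tensor.

```python
def normalize_to_transition_tensor(T):
    """Scale a nonnegative n x n x n array so that sum_i P[i][j][k] = 1 for all j, k."""
    n = len(T)
    P = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for j in range(n):
        for k in range(n):
            col = sum(T[i][j][k] for i in range(n))
            for i in range(n):
                # Assumption: an all-zero column becomes the uniform distribution.
                P[i][j][k] = T[i][j][k] / col if col > 0 else 1.0 / n
    return P

# Hypothetical symmetric 2 x 2 x 2 cube of nonnegative data.
T = [[[2, 1], [1, 3]],
     [[1, 3], [3, 0]]]
P = normalize_to_transition_tensor(T)
print(P[0][0][0], P[1][0][0])  # the two entries of the (j, k) = (0, 0) column
```
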
28. Input: nonnegative brick of data. 1. S