Poisson–Gamma Dynamical Systems
Aaron Schein (1), Mingyuan Zhou (2), Hanna Wallach (3)

Sequentially observed count vectors
The data are sequentially observed count vectors $y^{(1)}, \ldots, y^{(T)}$, where $y_v^{(t)}$ is the number of times event type $v$ occurred during time step $t$; for example, counts of daily interactions between pairs of countries. [Figure: example count data, interactions between pairs of countries on July 1, 2003.]
Objective: Gibbs-sampling inference.

Poisson–gamma dynamical system
$y_v^{(t)} \sim \mathrm{Pois}\big(\delta^{(t)} \sum_{k=1}^K \phi_{kv}\,\theta_k^{(t)}\big)$  (Poisson matrix factorization, natural for count matrices)
$\theta_k^{(t)} \sim \mathrm{Gam}\big(\tau_0 \sum_{k_2=1}^K \pi_{kk_2}\,\theta_{k_2}^{(t-1)},\ \tau_0\big)$  (a dynamical system of gammas; the gamma is the conjugate prior for the Poisson rate)
$\theta_k^{(1)} \sim \mathrm{Gam}(\tau_0 \nu_k,\ \tau_0)$
$\pi_k \sim \mathrm{Dir}(\nu_1\nu_k, \ldots, \xi\nu_k, \ldots, \nu_K\nu_k)$
$\nu_k \sim \mathrm{Gam}(\gamma_0/K,\ \beta)$
Under this model, $\mathbb{E}[y^{(t)}] = \delta^{(t)} \Phi\,\theta^{(t)}$ and $\mathbb{E}[\theta^{(t)}] = \Pi\,\theta^{(t-1)}$. The columns of the transition matrix $\Pi$ are probability vectors. A shrinkage prior on $\nu_k$ shuts off unneeded model capacity by shrinking the transition probabilities $\pi_{kk_2}$ and the initial value of the chain $\theta_k^{(1)}$.

Challenge: conditional non-conjugacy in the original model means the conditional posteriors are not available in closed form.
Solution: augment the model with auxiliary variables and transform it into a model with closed-form conditional posteriors. Three rules, applied recursively, transform the original model (see the Augment and conquer panel below). A minimal forward simulation of the generative process is sketched next.
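To make the generative model concrete, here is a minimal forward-simulation sketch. It is not part of the original poster, and the dimensions and hyperparameter values (V, K, T, tau0, gamma0, beta, xi, and the delta scale factors) are arbitrary toy choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions and hyperparameters (assumed values, not from the poster).
V, K, T = 20, 4, 50                 # event types, components, time steps
tau0, gamma0, beta, xi = 1.0, 10.0, 1.0, 1.0
delta = np.ones(T)                  # per-step scale factors delta^(t)

# Component weights: nu_k ~ Gam(gamma0 / K, beta), with beta a rate parameter.
nu = rng.gamma(gamma0 / K, 1.0 / beta, size=K)

# Transition matrix: column k is pi_k ~ Dir(nu_1 nu_k, ..., xi nu_k, ..., nu_K nu_k).
Pi = np.empty((K, K))
for k in range(K):
    concentration = nu * nu[k]
    concentration[k] = xi * nu[k]
    Pi[:, k] = rng.dirichlet(concentration)

# Factor loadings: row k of Phi is a distribution over the V event types.
Phi = rng.dirichlet(np.ones(V), size=K)      # shape (K, V)

theta = np.empty((T, K))
y = np.empty((T, V), dtype=np.int64)

# theta^(1) ~ Gam(tau0 * nu_k, tau0);  y^(t) ~ Pois(delta^(t) * Phi^T theta^(t)).
theta[0] = rng.gamma(tau0 * nu, 1.0 / tau0)
y[0] = rng.poisson(delta[0] * (Phi.T @ theta[0]))

for t in range(1, T):
    # theta^(t) ~ Gam(tau0 * (Pi theta^(t-1))_k, tau0), so E[theta^(t)] = Pi theta^(t-1).
    theta[t] = rng.gamma(tau0 * (Pi @ theta[t - 1]), 1.0 / tau0)
    y[t] = rng.poisson(delta[t] * (Phi.T @ theta[t]))

print(y.shape, theta.shape)         # (T, V), (T, K)
```

Note how the shrinkage prior acts in this sketch: a component with a small $\nu_k$ gets small Dirichlet concentrations and a small initial state, so its transition probabilities and activity are pushed toward zero.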
MCMC inference
Setup to BFFS: conditional posteriors for all latent variables are available under one or all of the alternative models. This yields an efficient backward filtering–forward sampling (BFFS) algorithm.

Backward filtering. Input: $\zeta^{(T+1)}$ (default is 0). The auxiliary counts at the horizon, $l_{k_1 k_2}^{(T+1)} \sim \mathrm{Pois}\big(\zeta^{(T+1)} \tau_0 \pi_{k_1 k_2} \theta_{k_2}^{(T)}\big)$, represent future information. Then, for $t = T, \ldots, 2$:
$\zeta^{(t)} := \ln\big(1 + \delta^{(t)} \tau_0^{-1} + \zeta^{(t+1)}\big)$
for $k = 1, \ldots, K$:
  $m_k^{(t)} := y_k^{(t)} + \sum_{k_1=1}^K l_{k_1 k}^{(t+1)}$
  $l_{k\cdot}^{(t)} \sim \mathrm{CRT}\big(m_k^{(t)},\ \tau_0 \sum_{k_2=1}^K \pi_{kk_2}\,\theta_{k_2}^{(t-1)}\big)$
  $\big(l_{kk_2}^{(t)}\big)_{k_2=1}^K \sim \mathrm{Mult}\big(l_{k\cdot}^{(t)},\ \big(\pi_{kk_2}\,\theta_{k_2}^{(t-1)}\big)_{k_2=1}^K\big)$
[Figure: the $K \times K$ matrix of auxiliary counts $l_{k_1 k_2}^{(t)}$; sample its row sums $l_{k_1\cdot}^{(t)}$, allocate them across columns, then sum across rows to obtain the column sums $l_{\cdot k_2}^{(t)}$, which feed the row sums $l_{k_1\cdot}^{(t-1)}$ for time $t-1$.]

Forward sampling.
$\theta_k^{(1)} \sim \mathrm{Gam}\big(m_k^{(1)} + \tau_0 \nu_k,\ \tau_0 + \delta^{(1)} + \zeta^{(2)} \tau_0\big)$
for $t = 2, \ldots, T$:
  $\theta_k^{(t)} \sim \mathrm{Gam}\big(m_k^{(t)} + \tau_0 \sum_{k_2=1}^K \pi_{kk_2}\,\theta_{k_2}^{(t-1)},\ \tau_0 + \delta^{(t)} + \zeta^{(t+1)} \tau_0\big)$

Sampling the transition matrix. For $k = 1, \ldots, K$:
$\pi_k \sim \mathrm{Dir}\big(\nu_1\nu_k + l_{1k}^{(\cdot)}, \ldots, \xi\nu_k + l_{kk}^{(\cdot)}, \ldots, \nu_K\nu_k + l_{Kk}^{(\cdot)}\big)$, where $l_{k_1 k}^{(\cdot)} = \sum_t l_{k_1 k}^{(t)}$.

Interpretable latent structure
All parameters ($\nu_k$, $\pi_{k_1 k_2}$, $\theta_k^{(t)}$) are non-negative and interpretable.
NIPS corpus data: $y_v^{(t)}$ is the number of times word type $v$ was used in NIPS papers during year $t$. GDELT data: $y_v^{(t)}$ is the number of interactions between country pair $v$ during day $t$.
[Figure: inferred time series $\theta_k^{(t)}$ for the NIPS corpus (1988–2000) and for GDELT (January–October 2003), the component weights $\nu_k$, and the $K \times K$ transition matrix $\pi_{k_1 k_2}$; each component $k$ is labeled with the word types or country pairs that have the largest $\phi_{kv}$.]
The three components visualized are those with the largest $\nu_k$ values; for GDELT they are 1. green (Israel–Palestine), 2. blue (Iraq War), and 3. red (six-party talks). Shrinkage promotes diagonal structure in the transition matrix.

Predictive performance
We compared predictive performance on smoothing (predicting missing entries in the input matrix) and forecasting (predicting future data) against two baselines on two country-event data sets (GDELT, ICEWS) and three text data sets (SOTU, DBLP, NIPS).
Baseline models:
Gaussian linear dynamical system (LDS): $y^{(t)} \sim \mathcal{N}\big(\Phi \theta^{(t)},\ D\big)$, $\theta^{(t)} \sim \mathcal{N}\big(\Pi \theta^{(t-1)},\ \Sigma\big)$.
Gamma process dynamic Poisson factor analysis (GP-DPFA): $y_v^{(t)} \sim \mathrm{Pois}\big(\sum_{k=1}^K \lambda_k \phi_{kv}\,\theta_k^{(t)}\big)$, $\theta_k^{(t)} \sim \mathrm{Gam}\big(\theta_k^{(t-1)},\ c^{(t)}\big)$.
We measure burstiness as
$\hat{B} = \frac{1}{V} \sum_{v=1}^{V} \frac{T}{T-1}\, \frac{\sum_{t=1}^{T-1} \big|y_v^{(t+1)} - y_v^{(t)}\big|}{\sum_{t=1}^{T} y_v^{(t)}}$.
[Figure: predictive performance on NIPS (top) versus ICEWS (bottom).] The NIPS corpus is less bursty; the PGDS has a better inductive bias for bursty count data.

Augment and conquer
Rule 1: two independent Poisson random variables are multinomial when conditioned on their sum; the two constructions below are equivalent:
$y \sim \mathrm{Pois}(\theta)$, $l \sim \mathrm{Pois}(\theta)$
$m \sim \mathrm{Pois}(2\theta)$, $y \sim \mathrm{Bin}(m, 0.5)$, $l := m - y$, i.e., $(y, l) \sim \mathrm{Mult}\big(m, (0.5, 0.5)\big)$.
Rule 2: a Poisson with a gamma-distributed rate becomes a negative binomial if its rate is marginalized out:
$\theta \sim \mathrm{Gam}(\alpha, \beta)$, $m \sim \mathrm{Pois}(\theta)$ implies $m \sim \mathrm{NB}\big(\alpha, \tfrac{1}{1+\beta}\big)$.
Rule 3: the magic bivariate count distribution. The same bivariate distribution factorizes in two ways that encode different conditional independencies:
$m \sim \mathrm{NB}\big(\alpha, \tfrac{1}{1+\beta}\big)$, $l \sim \mathrm{CRT}(m, \alpha)$
is equivalent to
$l \sim \mathrm{Pois}\big(\alpha \ln(1 + \beta^{-1})\big)$, $m \sim \mathrm{SumLog}\big(l, \tfrac{1}{1+\beta}\big)$.
Applied recursively, these rules transform the original model into an alternative model with closed-form conditional posteriors: Step 1: augment with a Poisson. Step 2: apply Rule 1. Step 3: apply Rule 2. Step 4: augment with a CRT. Step 5: apply Rule 3. Then recurse. In the accompanying graphical-model diagrams, when a red variable has green arrows leading out of it, we can form its conditional posterior. A numerical check of Rule 3 is sketched below.
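Rule 3 can be checked numerically. The sketch below is not from the poster; it assumes the usual sampling constructions for the two auxiliary distributions: a CRT(m, r) draw as a sum of independent Bernoulli(r / (r + i - 1)) variables for i = 1, ..., m, and a SumLog(l, p) draw as a sum of l independent logarithmic-series variables with parameter p (NumPy's logseries). The values of alpha and beta are arbitrary. If the two factorizations define the same joint distribution over (l, m), their sample moments should agree up to Monte Carlo error.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta, n_samples = 2.0, 0.5, 100_000   # assumed toy values (not from the poster)

def sample_crt(m, r):
    """l ~ CRT(m, r): sum of independent Bernoulli(r / (r + i - 1)) for i = 1..m."""
    if m == 0:
        return 0
    i = np.arange(1, m + 1)
    return int((rng.random(m) < r / (r + i - 1)).sum())

# Factorization 1: theta ~ Gam(alpha, rate beta); m ~ Pois(theta)
# (so m ~ NB(alpha, 1/(1+beta)) by Rule 2); then l ~ CRT(m, alpha).
theta = rng.gamma(alpha, 1.0 / beta, size=n_samples)
m1 = rng.poisson(theta)
l1 = np.array([sample_crt(m, alpha) for m in m1])

# Factorization 2: l ~ Pois(alpha * ln(1 + 1/beta)); m ~ SumLog(l, 1/(1+beta)),
# i.e., the sum of l iid logarithmic-series variables with parameter 1/(1+beta).
p = 1.0 / (1.0 + beta)
l2 = rng.poisson(alpha * np.log1p(1.0 / beta), size=n_samples)
m2 = np.array([rng.logseries(p, size=l).sum() if l > 0 else 0 for l in l2])

# The two factorizations should give matching sample moments.
print("E[l]:", l1.mean(), "vs", l2.mean())   # both approx alpha * ln(1 + 1/beta)
print("E[m]:", m1.mean(), "vs", m2.mean())   # both approx alpha / beta
```

The same CRT sampler is the kind of primitive used in the backward-filtering pass above, where the row sums $l_{k\cdot}^{(t)}$ are drawn from a CRT before being allocated across columns with a multinomial.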