Poisson–Gamma Dynamical Systems
Aaron Schein (1), Mingyuan Zhou (2), Hanna Wallach (3)

Sequentially observed count vectors
The data are sequentially observed count vectors $y^{(1)}, \ldots, y^{(T)}$, where $y_v^{(t)}$ is the number of times event type $v$ occurred during time step $t$; for example, counts of daily interactions between pairs of countries. [Figure: example count data, interactions between pairs of countries on July 1, 2003.]
Objective: Gibbs-sampling inference.

Poisson–gamma dynamical system
$y_v^{(t)} \sim \mathrm{Pois}\big(\delta^{(t)} \sum_{k=1}^K \phi_{kv}\,\theta_k^{(t)}\big)$  (Poisson matrix factorization, natural for count matrices)
$\theta_k^{(t)} \sim \mathrm{Gam}\big(\tau_0 \sum_{k_2=1}^K \pi_{kk_2}\,\theta_{k_2}^{(t-1)},\ \tau_0\big)$  (a dynamical system of gammas; the gamma is the conjugate prior for the Poisson rate)
$\theta_k^{(1)} \sim \mathrm{Gam}(\tau_0 \nu_k,\ \tau_0)$
$\pi_k \sim \mathrm{Dir}(\nu_1\nu_k, \ldots, \xi\nu_k, \ldots, \nu_K\nu_k)$
$\nu_k \sim \mathrm{Gam}(\gamma_0/K,\ \beta)$
Under this model, $\mathbb{E}[y^{(t)}] = \delta^{(t)} \Phi\,\theta^{(t)}$ and $\mathbb{E}[\theta^{(t)}] = \Pi\,\theta^{(t-1)}$. The columns of the transition matrix $\Pi$ are probability vectors. A shrinkage prior on $\nu_k$ shuts off unneeded model capacity by shrinking the transition probabilities $\pi_{kk_2}$ and the initial value of the chain $\theta_k^{(1)}$.

Challenge: conditional non-conjugacy in the original model means the conditional posteriors are not available in closed form.
Solution: augment the model with auxiliary variables and transform it into a model with closed-form conditional posteriors. Three rules, applied recursively, transform the original model (see the Augment and conquer panel below). A minimal forward simulation of the generative process is sketched next.
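To make the generative model concrete, here is a minimal forward-simulation sketch. It is not part of the original poster, and the dimensions and hyperparameter values (V, K, T, tau0, gamma0, beta, xi, and the delta scale factors) are arbitrary toy choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions and hyperparameters (assumed values, not from the poster).
V, K, T = 20, 4, 50                 # event types, components, time steps
tau0, gamma0, beta, xi = 1.0, 10.0, 1.0, 1.0
delta = np.ones(T)                  # per-step scale factors delta^(t)

# Component weights: nu_k ~ Gam(gamma0 / K, beta), with beta a rate parameter.
nu = rng.gamma(gamma0 / K, 1.0 / beta, size=K)

# Transition matrix: column k is pi_k ~ Dir(nu_1 nu_k, ..., xi nu_k, ..., nu_K nu_k).
Pi = np.empty((K, K))
for k in range(K):
    concentration = nu * nu[k]
    concentration[k] = xi * nu[k]
    Pi[:, k] = rng.dirichlet(concentration)

# Factor loadings: row k of Phi is a distribution over the V event types.
Phi = rng.dirichlet(np.ones(V), size=K)      # shape (K, V)

theta = np.empty((T, K))
y = np.empty((T, V), dtype=np.int64)

# theta^(1) ~ Gam(tau0 * nu_k, tau0);  y^(t) ~ Pois(delta^(t) * Phi^T theta^(t)).
theta[0] = rng.gamma(tau0 * nu, 1.0 / tau0)
y[0] = rng.poisson(delta[0] * (Phi.T @ theta[0]))

for t in range(1, T):
    # theta^(t) ~ Gam(tau0 * (Pi theta^(t-1))_k, tau0), so E[theta^(t)] = Pi theta^(t-1).
    theta[t] = rng.gamma(tau0 * (Pi @ theta[t - 1]), 1.0 / tau0)
    y[t] = rng.poisson(delta[t] * (Phi.T @ theta[t]))

print(y.shape, theta.shape)         # (T, V), (T, K)
```

Note how the shrinkage prior acts in this sketch: a component with a small $\nu_k$ gets small Dirichlet concentrations and a small initial state, so its transition probabilities and activity are pushed toward zero.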
MCMC inference
Setup to BFFS: conditional posteriors for all latent variables are available under one or all of the alternative models. This yields an efficient backward filtering–forward sampling (BFFS) algorithm.

Backward filtering. Input: $\zeta^{(T+1)}$ (default is 0). The auxiliary counts at the horizon, $l_{k_1 k_2}^{(T+1)} \sim \mathrm{Pois}\big(\zeta^{(T+1)} \tau_0 \pi_{k_1 k_2} \theta_{k_2}^{(T)}\big)$, represent future information. Then, for $t = T, \ldots, 2$:
$\zeta^{(t)} := \ln\big(1 + \delta^{(t)} \tau_0^{-1} + \zeta^{(t+1)}\big)$
for $k = 1, \ldots, K$:
  $m_k^{(t)} := y_k^{(t)} + \sum_{k_1=1}^K l_{k_1 k}^{(t+1)}$
  $l_{k\cdot}^{(t)} \sim \mathrm{CRT}\big(m_k^{(t)},\ \tau_0 \sum_{k_2=1}^K \pi_{kk_2}\,\theta_{k_2}^{(t-1)}\big)$
  $\big(l_{kk_2}^{(t)}\big)_{k_2=1}^K \sim \mathrm{Mult}\big(l_{k\cdot}^{(t)},\ \big(\pi_{kk_2}\,\theta_{k_2}^{(t-1)}\big)_{k_2=1}^K\big)$
[Figure: the $K \times K$ matrix of auxiliary counts $l_{k_1 k_2}^{(t)}$; sample its row sums $l_{k_1\cdot}^{(t)}$, allocate them across columns, then sum across rows to obtain the column sums $l_{\cdot k_2}^{(t)}$, which feed the row sums $l_{k_1\cdot}^{(t-1)}$ for time $t-1$.]

Forward sampling.
$\theta_k^{(1)} \sim \mathrm{Gam}\big(m_k^{(1)} + \tau_0 \nu_k,\ \tau_0 + \delta^{(1)} + \zeta^{(2)} \tau_0\big)$
for $t = 2, \ldots, T$:
  $\theta_k^{(t)} \sim \mathrm{Gam}\big(m_k^{(t)} + \tau_0 \sum_{k_2=1}^K \pi_{kk_2}\,\theta_{k_2}^{(t-1)},\ \tau_0 + \delta^{(t)} + \zeta^{(t+1)} \tau_0\big)$

Sampling the transition matrix. For $k = 1, \ldots, K$:
$\pi_k \sim \mathrm{Dir}\big(\nu_1\nu_k + l_{1k}^{(\cdot)}, \ldots, \xi\nu_k + l_{kk}^{(\cdot)}, \ldots, \nu_K\nu_k + l_{Kk}^{(\cdot)}\big)$, where $l_{k_1 k}^{(\cdot)} = \sum_t l_{k_1 k}^{(t)}$.

Interpretable latent structure
All parameters ($\nu_k$, $\pi_{k_1 k_2}$, $\theta_k^{(t)}$) are non-negative and interpretable.
NIPS corpus data: $y_v^{(t)}$ is the number of times word type $v$ was used in NIPS papers during year $t$. GDELT data: $y_v^{(t)}$ is the number of interactions between country pair $v$ during day $t$.
[Figure: inferred time series $\theta_k^{(t)}$ for the NIPS corpus (1988–2000) and for GDELT (January–October 2003), the component weights $\nu_k$, and the $K \times K$ transition matrix $\pi_{k_1 k_2}$; each component $k$ is labeled with the word types or country pairs that have the largest $\phi_{kv}$.]
The three components visualized are those with the largest $\nu_k$ values; for GDELT they are 1. green (Israel–Palestine), 2. blue (Iraq War), and 3. red (six-party talks). Shrinkage promotes diagonal structure in the transition matrix.

Predictive performance
We compared predictive performance on smoothing (predicting missing entries in the input matrix) and forecasting (predicting future data) against two baselines on two country-event data sets (GDELT, ICEWS) and three text data sets (SOTU, DBLP, NIPS).
Baseline models:
Gaussian linear dynamical system (LDS): $y^{(t)} \sim \mathcal{N}\big(\Phi \theta^{(t)},\ D\big)$, $\theta^{(t)} \sim \mathcal{N}\big(\Pi \theta^{(t-1)},\ \Sigma\big)$.
Gamma process dynamic Poisson factor analysis (GP-DPFA): $y_v^{(t)} \sim \mathrm{Pois}\big(\sum_{k=1}^K \lambda_k \phi_{kv}\,\theta_k^{(t)}\big)$, $\theta_k^{(t)} \sim \mathrm{Gam}\big(\theta_k^{(t-1)},\ c^{(t)}\big)$.
We measure burstiness as
$\hat{B} = \frac{1}{V} \sum_{v=1}^{V} \frac{T}{T-1}\, \frac{\sum_{t=1}^{T-1} \big|y_v^{(t+1)} - y_v^{(t)}\big|}{\sum_{t=1}^{T} y_v^{(t)}}$.
[Figure: predictive performance on NIPS (top) versus ICEWS (bottom).] The NIPS corpus is less bursty; the PGDS has a better inductive bias for bursty count data.

Augment and conquer
Rule 1: two independent Poisson random variables are multinomial when conditioned on their sum; the two constructions below are equivalent:
$y \sim \mathrm{Pois}(\theta)$, $l \sim \mathrm{Pois}(\theta)$
$m \sim \mathrm{Pois}(2\theta)$, $y \sim \mathrm{Bin}(m, 0.5)$, $l := m - y$, i.e., $(y, l) \sim \mathrm{Mult}\big(m, (0.5, 0.5)\big)$.
Rule 2: a Poisson with a gamma-distributed rate becomes a negative binomial if its rate is marginalized out:
$\theta \sim \mathrm{Gam}(\alpha, \beta)$, $m \sim \mathrm{Pois}(\theta)$ implies $m \sim \mathrm{NB}\big(\alpha, \tfrac{1}{1+\beta}\big)$.
Rule 3: the magic bivariate count distribution. The same bivariate distribution factorizes in two ways that encode different conditional independencies:
$m \sim \mathrm{NB}\big(\alpha, \tfrac{1}{1+\beta}\big)$, $l \sim \mathrm{CRT}(m, \alpha)$
is equivalent to
$l \sim \mathrm{Pois}\big(\alpha \ln(1 + \beta^{-1})\big)$, $m \sim \mathrm{SumLog}\big(l, \tfrac{1}{1+\beta}\big)$.
Applied recursively, these rules transform the original model into an alternative model with closed-form conditional posteriors: Step 1: augment with a Poisson. Step 2: apply Rule 1. Step 3: apply Rule 2. Step 4: augment with a CRT. Step 5: apply Rule 3. Then recurse. In the accompanying graphical-model diagrams, when a red variable has green arrows leading out of it, we can form its conditional posterior. A numerical check of Rule 3 is sketched below.
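Rule 3 can be checked numerically. The sketch below is not from the poster; it assumes the usual sampling constructions for the two auxiliary distributions: a CRT(m, r) draw as a sum of independent Bernoulli(r / (r + i - 1)) variables for i = 1, ..., m, and a SumLog(l, p) draw as a sum of l independent logarithmic-series variables with parameter p (NumPy's logseries). The values of alpha and beta are arbitrary. If the two factorizations define the same joint distribution over (l, m), their sample moments should agree up to Monte Carlo error.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta, n_samples = 2.0, 0.5, 100_000   # assumed toy values (not from the poster)

def sample_crt(m, r):
    """l ~ CRT(m, r): sum of independent Bernoulli(r / (r + i - 1)) for i = 1..m."""
    if m == 0:
        return 0
    i = np.arange(1, m + 1)
    return int((rng.random(m) < r / (r + i - 1)).sum())

# Factorization 1: theta ~ Gam(alpha, rate beta); m ~ Pois(theta)
# (so m ~ NB(alpha, 1/(1+beta)) by Rule 2); then l ~ CRT(m, alpha).
theta = rng.gamma(alpha, 1.0 / beta, size=n_samples)
m1 = rng.poisson(theta)
l1 = np.array([sample_crt(m, alpha) for m in m1])

# Factorization 2: l ~ Pois(alpha * ln(1 + 1/beta)); m ~ SumLog(l, 1/(1+beta)),
# i.e., the sum of l iid logarithmic-series variables with parameter 1/(1+beta).
p = 1.0 / (1.0 + beta)
l2 = rng.poisson(alpha * np.log1p(1.0 / beta), size=n_samples)
m2 = np.array([rng.logseries(p, size=l).sum() if l > 0 else 0 for l in l2])

# The two factorizations should give matching sample moments.
print("E[l]:", l1.mean(), "vs", l2.mean())   # both approx alpha * ln(1 + 1/beta)
print("E[m]:", m1.mean(), "vs", m2.mean())   # both approx alpha / beta
```

The same CRT sampler is the kind of primitive used in the backward-filtering pass above, where the row sums $l_{k\cdot}^{(t)}$ are drawn from a CRT before being allocated across columns with a multinomial.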