Page 1

Efficient Bounds for the Softmax Function: Applications to Inference in Hybrid Models

Guillaume Bouchard
Xerox Research Centre Europe
December 7, 2007

Page 2: Deterministic Inference in Hybrid Graphical Models

- Discrete variables with continuous* parents: no sufficient statistic, no conjugate distribution
- Inference is intractable, so we turn to approximate deterministic inference
- Alternatives: local sampling vs. deterministic approximations (Gaussian quadrature, delta method, Laplace approximation, maximizing a variational lower bound, i.e. minimizing the variational free energy)

[Figure: hybrid graphical model with variables X0 through X5 and Y1 through Y3; legend: discrete vs. continuous variables, observed vs. hidden variables]

*or a large number of discrete parents

Page 3: Variational Inference

- Focus on Bayesian multinomial logistic regression
- Mean-field approximation: the approximate posterior Q belongs to a tractable approximation family (the resulting objective is sketched after the figure below)

[Figure: plate model with inputs X1i and X2i, regression weights β1 and β2, and discrete label Yi, replicated over data points i; legend as on Page 2. The slide annotations ("upper bound?", "max") point to the need for an upper bound on the log-partition terms that arise when optimizing the variational objective.]
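To make the role of the upper bound explicit, here is a brief reconstruction of the standard mean-field argument (my notation; the slide's own equations were not recovered):

    p(y_i | x_i, \beta) = \exp\big( \beta_{y_i}^\top x_i - \log \sum_{k=1}^{K} e^{\beta_k^\top x_i} \big)

    \log p(y | X) \ge \mathbb{E}_{Q}[\log p(y, \beta | X)] - \mathbb{E}_{Q}[\log Q(\beta)]

The right-hand side contains the terms -\mathbb{E}_{Q}[\log \sum_k e^{\beta_k^\top x_i}], so replacing each log-partition term by a tractable upper bound keeps the whole expression a valid lower bound on the evidence.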

Page 4: Bounding the Log-Partition Function (1)

- Binary case (one dimension): classical bound [Jaakkola and Jordan] (written out below)
- We propose its multiclass extension
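For reference, the classical binary bound referred to here is (a standard result, stated in my notation rather than copied from the slide):

    \log(1 + e^{x}) \le \lambda(\xi)\,(x^2 - \xi^2) + \tfrac{1}{2}(x - \xi) + \log(1 + e^{\xi}),
    \qquad \lambda(\xi) = \frac{1}{4\xi}\tanh\big(\tfrac{\xi}{2}\big),

valid for any variational parameter \xi, with equality at x = \pm\xi. Because the right-hand side is quadratic in x, its expectation under a Gaussian is available in closed form.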

Page 5: Bounding the Log-Partition Function (2)

[Figure: log Σ_k e^{x_k} as a function of x, for K = 2 and K = 10, together with its upper bounds: the worst-curvature bound, the bound with optimally tightened parameters, and bounds with parameters optimized on average (values 2, 1, and 0.1)]

Page 6: Other Upper Bounds

- Concavity of the log [e.g., Blei et al.] (see the forms written out below)
- Worst curvature [Böhning]
- Bound using hyperbolic cosines [Jebara]
- Local approximation [Gibbs]: not proved to be an upper bound
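The first two bounds in this list have standard forms; as a reminder (my notation, stated from the literature rather than recovered from the slide):

Concavity of the log (the tangent of \log z at z = a > 0 gives \log z \le z/a + \log a - 1):

    \log \sum_{k=1}^{K} e^{x_k} \le \frac{1}{a} \sum_{k=1}^{K} e^{x_k} + \log a - 1, \qquad a > 0.

Worst curvature (Böhning): a quadratic bound whose curvature matrix dominates the softmax Hessian,

    \log \sum_{k=1}^{K} e^{x_k} \le \log \sum_{k} e^{\psi_k} + (x - \psi)^\top p(\psi) + \tfrac{1}{2} (x - \psi)^\top A (x - \psi),
    \qquad A = \tfrac{1}{2}\big( I_K - \tfrac{1}{K} \mathbf{1}\mathbf{1}^\top \big), \quad p(\psi)_k = e^{\psi_k} / \sum_j e^{\psi_j}.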

Page 7: Proof

- Idea: expand the product of inverted sigmoids
- Upper-bounded by K quadratic upper bounds
- Lower-bounded by a linear function (log-convexity of f)
- Proof: apply Jensen's inequality to [equation not recovered from the slide]; a reconstruction follows below
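A hedged reconstruction of the argument this outline appears to describe (the slide's equations were images; the notation is mine):

    \log \sum_{k=1}^{K} e^{x_k}
      = \alpha + \log \sum_{k=1}^{K} e^{x_k - \alpha}
      \le \alpha + \log \prod_{k=1}^{K} \big(1 + e^{x_k - \alpha}\big)
      = \alpha + \sum_{k=1}^{K} \log\big(1 + e^{x_k - \alpha}\big),

since \sum_k y_k \le \prod_k (1 + y_k) for y_k \ge 0, and each factor 1 + e^{z} = \sigma(-z)^{-1} is an inverted sigmoid. Bounding each softplus term with the quadratic bound of Page 4 then gives a quadratic upper bound in x, with variational parameters \alpha, \xi_1, \dots, \xi_K.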

Page 8: Bounds on the Expectation

- Exponential bound [formula not recovered from the slide]
- Quadratic bound [formula not recovered from the slide]
- Simulations comparing the bounds [figure not recovered from the slide]; see the note on Gaussian expectations below
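The two bound formulas on this slide were not recovered, but the reason both families are convenient is standard: under a Gaussian approximation the required expectations have closed forms. A sketch, assuming x \sim \mathcal{N}(\mu, \Sigma):

    \mathbb{E}[e^{x_k}] = \exp\big( \mu_k + \tfrac{1}{2} \Sigma_{kk} \big)  (used by exponential-type bounds),

    \mathbb{E}\big[ \tfrac{1}{2} x^\top A x - b^\top x + c \big]
      = \tfrac{1}{2}\operatorname{tr}(A \Sigma) + \tfrac{1}{2} \mu^\top A \mu - b^\top \mu + c  (used by quadratic bounds).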

Page 9: Bayesian Multinomial Logistic Regression

- Exponential bound: cannot be maximized in closed form; requires gradient-based optimization or a fixed-point equation (unstable!)
- Quadratic bound: analytic update [formula not recovered from the slide; a generic sketch follows below]
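As a hedged illustration of why a quadratic bound yields an analytic update (this is not the slide's exact formula, which was an equation image; the function and variable names below are mine): once the log-partition terms are replaced by a quadratic upper bound of the form (1/2) beta^T A beta - b^T beta + c, the bounded log-posterior is quadratic in the weights, so the optimal Gaussian factor is obtained by completing the square against the Gaussian prior N(mu0, Sigma0).

import numpy as np

def gaussian_update(mu0, Sigma0, A, b):
    """Closed-form Gaussian variational update under a quadratic bound.

    Assumes the bounded negative log-likelihood contributes the quadratic
    term 0.5 * beta^T A beta - b^T beta, so the bounded log-posterior is
    quadratic in beta; completing the square gives q(beta) = N(m, V).
    """
    prior_precision = np.linalg.inv(Sigma0)
    V = np.linalg.inv(prior_precision + A)      # posterior covariance
    m = V @ (prior_precision @ mu0 + b)         # posterior mean
    return m, V

# Toy usage with made-up bound terms for a 3-dimensional weight vector.
mu0 = np.zeros(3)
Sigma0 = np.eye(3)                              # unit-variance prior, as on the Iris slides
A = np.array([[2.0, 0.1, 0.0],
              [0.1, 1.5, 0.2],
              [0.0, 0.2, 1.0]])
b = np.array([0.3, -0.1, 0.5])
m, V = gaussian_update(mu0, Sigma0, A, b)

In a full algorithm, A and b would be recomputed from the current variational parameters of the bound at each batch update, which is what makes the quadratic bound attractive compared with the gradient-based optimization needed for the exponential bound.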

Page 10: Numerical Experiments

- Iris dataset: 4 dimensions, 3 classes; prior with unit variance
- Experiment: learning with batch updates, compared against an MCMC estimate based on 100K samples
- Error = Euclidean distance between the mean and variance parameters
- Results: the "worst curvature" bound is both faster and better

[Figure: two plots against the number of iterations, comparing the worst-curvature bound with the sigmoid-product bound: (left) error over roughly 100 iterations; (right) variational free energy (scale of 10^5) over roughly 120 iterations]
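A minimal sketch of the stated error metric, assuming each posterior (variational and MCMC) is summarized by a mean vector and a variance vector (the function and argument names are mine):

import numpy as np

def posterior_error(mean_vb, var_vb, mean_mcmc, var_mcmc):
    """Euclidean distance between stacked (mean, variance) parameters."""
    vb = np.concatenate([mean_vb, var_vb])
    mcmc = np.concatenate([mean_mcmc, var_mcmc])
    return np.linalg.norm(vb - mcmc)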

Page 11: Conclusion

- Multinomial links in graphical models are feasible
- Existing bounds work well; we can expect further improvements
- Remark: better bounds are only needed in the Bayesian setting; for MAP estimation, even a loose bound converges
- Future work: application to discriminative learning; mixture-based mean-field approximation


Page 13: Backup Slides

Page 14: Numerical Experiments (backup; verbatim repeat of the Page 10 slide and its figures)

Page 15: Numerical Experiments (backup)

- Same setup as Page 10: Iris dataset (4 dimensions, 3 classes), unit-variance prior, batch updates, comparison against an MCMC estimate based on 100K samples; error = Euclidean distance between the mean and variance parameters

[Figure: error against the number of iterations (values roughly between 0.002 and 0.022 over 100 iterations), comparing the worst-curvature bound with the sigmoid-product bound]

Page 16: (backup)

[Figure: the bound comparison of Page 5 repeated for further values of K (panels labelled K = 3 and K = 100, plus one panel whose label was not recovered): log Σ_k e^{x_k} against x, with the worst-curvature bound, the optimally tightened bound, and bounds with parameters optimized on average (values 2, 1, and 0.1)]

Page 17: Jebara's Bound

- One dimension: hyperbolic cosine bound [formula not recovered from the slide]
- Multi-dimensional case [formula not recovered from the slide]