Efficiently handling discrete structure in machine learning
Stefanie Jegelka, MADALGO summer school

Transcript
Page 1:

Efficiently handling discrete structure in machine learning

Stefanie Jegelka
MADALGO summer school

Page 2:

Overview

• discrete labeling problems (MAP inference)
• (structured) sparse variable selection
• finding informative / influential subsets

Recurrent questions:
• how do we model prior knowledge / assumptions? → structure
• efficient optimization?

Recurrent themes:
• convexity
• submodularity
• polyhedra

Page 3:

Page 4:

Intuition: min vs max

Page 5:

Sensing

Place sensors to monitor temperature

Page 6:

Sensing

Y_s: temperature at location s

X_s: sensor value at location s

X_s = Y_s + noise

[Figure: graphical model linking sensor values x_1, …, x_6 to temperatures y_1, …, y_6]

Where should we measure to maximize information about y? The information gain is a monotone submodular function!
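In symbols (the standard formulation from this line of work; the slide does not spell it out), the objective is the mutual information between the temperatures and the chosen measurements:

```latex
% Sensing objective: mutual information between the temperature field Y
% and the measurements X_S at the chosen set S of locations.
F(S) \;=\; I(Y;\, X_S) \;=\; H(Y) \;-\; H(Y \mid X_S)
% F is monotone, and it is submodular when the readings X_s are
% conditionally independent given Y (as in the model X_s = Y_s + noise).
```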

Page 7:

Maximizing influence

Page 8:

Maximizing diffusion

each node has:
• a monotone submodular activation function and a random threshold
• it is activated once the activation function, evaluated on its active neighbors, reaches the threshold

Theorem (Mossel & Roch 07)
The expected number of nodes active after n steps is a submodular function of the seed set.
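Reconstructing the fragments above in symbols (the slide's own formulas did not survive extraction):

```latex
% General threshold diffusion: node v has a monotone submodular activation
% function f_v and a random threshold \theta_v ~ Uniform[0,1]; v activates once
f_v(\text{currently active neighbors of } v) \;\ge\; \theta_v .
% Theorem (Mossel & Roch 07): the expected spread
\sigma(S) \;=\; \mathbb{E}\!\left[\#\text{ active nodes after } n \text{ steps} \mid \text{seed set } S\right]
% is a monotone submodular function of the seed set S.
```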

Page 9:

Diversity priors

“spread out”

Page 10:

Determinantal point processes

• normalized similarity matrix L
• sample Y with probability proportional to det(L_Y), the determinant of the submatrix indexed by Y: similar items rarely co-occur (repulsion)
• F(Y) = log det(L_Y) is submodular (not monotone)
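A quick numerical sanity check of that last claim; this is my illustration, not part of the slides, and `L` here is just a random positive definite matrix standing in for a similarity kernel:

```python
import numpy as np

# Check diminishing returns of F(Y) = log det(L_Y) for a positive definite L:
# the gain of adding an item can only shrink as the set grows.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
L = A @ A.T + 6 * np.eye(6)            # make L positive definite

def F(Y):
    idx = sorted(Y)
    return np.linalg.slogdet(L[np.ix_(idx, idx)])[1] if Y else 0.0

S, T, e = {0}, {0, 1, 2}, 3            # S is a subset of T, e is in neither
print(F(S | {e}) - F(S) >= F(T | {e}) - F(T))   # True: diminishing returns
```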

Page 11:

Diversity priors

(Kulesza & Taskar 10)

Page 12:

Summarization

(Lin & Bilmes 11)

Relevance + Diversity
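Schematically, the Lin & Bilmes objective combines the two ingredients named on the slide (the exact instantiations of L and R vary):

```latex
% Summarization objective: relevance/coverage plus a diversity reward,
% both chosen monotone submodular, traded off by lambda >= 0.
F(S) \;=\; \underbrace{L(S)}_{\text{relevance}} \;+\; \lambda\, \underbrace{R(S)}_{\text{diversity}}
```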

Page 13:

• assume F is submodular and nonnegative; maximization is NP-hard
• generic case
  – bi-directional greedy (BFNS12)
  – local search (FMV07)
• monotone function (constrained)
  – greedy (NWF78)
  – relaxation (CCPV11)
• exact methods (NW81, GSTT99, KNTB09)

Page 14:

Monotone maximization

greedy algorithm: start with S = ∅ and repeatedly add the element with the largest marginal gain F(S ∪ {e}) − F(S), until |S| = k.
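A minimal sketch of this greedy rule (my code, not the authors'); `F` is any set-function handle, and the NWF78 guarantee applies when F is monotone submodular with F(∅) = 0:

```python
# Greedy for monotone submodular maximization under |S| <= k.
def greedy(F, ground_set, k):
    S = set()
    for _ in range(k):
        fS = F(S)
        # pick the element with the largest marginal gain F(S + e) - F(S)
        best_e = max(ground_set - S, key=lambda e: F(S | {e}) - fS, default=None)
        if best_e is None:
            break
        S.add(best_e)
    return S

# toy usage: coverage, a classic monotone submodular function
areas = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d", "e"}}
F = lambda S: len(set().union(*(areas[e] for e in S))) if S else 0
print(greedy(F, set(areas), k=2))  # -> {1, 3}
```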

Page 15:

Monotone maximization

Theorem (NWF78)
For monotone submodular F with F(∅) = 0, greedy returns S with F(S) ≥ (1 − 1/e) · max_{|T| ≤ k} F(T).

[Figure: sensor placement on an office floor plan (sensors 1–54), and a plot of information gain for greedy vs. optimal placement]

empirically: greedy is nearly optimal

speedup in practice: "lazy greedy" (Minoux '78)
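A sketch of the lazy variant (again my code, not Minoux's): by submodularity, marginal gains only shrink as S grows, so stale gains stored in a max-heap are valid upper bounds and most re-evaluations can be skipped:

```python
import heapq

# Lazy greedy: returns the same set as plain greedy, usually far faster.
def lazy_greedy(F, ground_set, k):
    S, fS = set(), F(set())
    # heap of (-upper_bound_on_gain, element), initialized with gains at S = {}
    heap = [(-(F({e}) - fS), e) for e in ground_set]
    heapq.heapify(heap)
    while len(S) < k and heap:
        _, e = heapq.heappop(heap)
        gain = F(S | {e}) - fS               # refresh this element's gain
        if not heap or gain >= -heap[0][0]:
            S.add(e)                         # still the best: take it
            fS += gain
        else:
            heapq.heappush(heap, (-gain, e)) # stale: reinsert and retry
    return S
```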

Page 16:

More complex constraints

Ground set / Configuration / Sensing quality model

Configuration is feasible if no camera points in two directions at once

Page 17:

Matroids

S is independent if …
• |S| ≤ k (uniform matroid)
• S contains at most one element from each group (partition matroid)
• S contains no cycles (graphic matroid)

• downward closed: S independent, T ⊆ S ⇒ T also independent
• exchange property: S, U independent, |S| > |U| ⇒ some element of S \ U can be added to U while keeping independence
• consequence: all maximal independent sets have the same size


Page 19:

More complex constraints

Ground set / Configuration / Sensing quality model

Configuration is feasible if no camera points in two directions at once

Partition matroid: group the ground set by camera; S is independent if it contains at most one direction per camera

Page 20:

Maximization over matroids

greedy algorithm: repeatedly add the element with the largest marginal gain among those that keep S independent
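A sketch of greedy under a partition matroid, in the spirit of the camera example; the grouping map `part` and the per-group `limit` are illustrative assumptions, not from the slides:

```python
# Greedy over a partition matroid: at most `limit` elements per group.
def matroid_greedy(F, ground_set, part, limit=1):
    """part[e] is the group of element e (e.g. its camera location)."""
    S, used = set(), {}
    remaining = set(ground_set)
    while remaining:
        fS = F(S)
        e = max(remaining, key=lambda x: F(S | {x}) - fS)  # best marginal gain
        remaining.discard(e)
        if used.get(part[e], 0) < limit:   # feasible: e keeps S independent
            S.add(e)
            used[part[e]] = used.get(part[e], 0) + 1
    return S
```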

Page 21:

Maximization over matroids

Theorem (FNW78)
Greedy achieves a 1/2 approximation for maximizing a monotone submodular function over a matroid.

• better: relaxation (continuous greedy) achieves a (1 − 1/e) approximation factor (CCPV11)

Page 22:

Multilinear relaxation vs. Lovász ext.

multilinear extension:
• concave in certain directions
• approximate by sampling

Lovász extension:
• convex
• computable in O(n log n)
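In symbols (standard definitions, not spelled out on the slide), for F: 2^V → ℝ and x ∈ [0,1]^n:

```latex
% Multilinear extension: expectation under independent inclusion of each
% element; in general estimated by sampling S ~ x.
\tilde F(x) \;=\; \mathbb{E}_{S \sim x}[F(S)]
          \;=\; \sum_{S \subseteq V} F(S) \prod_{i \in S} x_i \prod_{j \notin S} (1 - x_j)
% Lovasz extension: convex iff F is submodular. Sort coordinates so that
% x_{(1)} >= ... >= x_{(n)}; then, with F(\emptyset) = 0,
f(x) \;=\; \sum_{k=1}^{n} x_{(k)} \Big( F(\{(1),\dots,(k)\}) - F(\{(1),\dots,(k-1)\}) \Big)
% The sort dominates the cost, hence O(n log n) plus n function evaluations.
```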

Page 23:

• assume F is submodular and nonnegative; maximization is NP-hard
• generic case
  – bi-directional greedy (BFNS12)
  – local search (FMV07)
• monotone function (constrained)
  – greedy (NWF78)
  – relaxation (CCPV11)
• exact methods (NW81, GSTT99, KNTB09)

Page 24:

Non-monotone maximization

[Figure: bi-directional greedy on sets A and B, stepping through elements a, …, f]

Page 25:

Non-monotone maximization

[Figure: a later step of the same bi-directional greedy illustration]

Page 26:

Non-monotone maximization

Theorem (BFNS12)
For nonnegative submodular F, randomized bi-directional greedy returns S with E[F(S)] ≥ (1/2) · max_T F(T).
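A sketch of the randomized double greedy (my rendering of the BFNS12 scheme): grow A from the empty set and shrink B from the ground set, deciding each element by a randomized comparison of the two marginal gains:

```python
import random

# Randomized bi-directional ("double") greedy for unconstrained
# maximization of a nonnegative submodular F; E[F(output)] >= OPT / 2.
def double_greedy(F, ground_set, seed=None):
    rng = random.Random(seed)
    A, B = set(), set(ground_set)
    for e in ground_set:
        a = F(A | {e}) - F(A)        # gain of adding e to A
        b = F(B - {e}) - F(B)        # gain of removing e from B
        a, b = max(a, 0.0), max(b, 0.0)
        if a + b == 0 or rng.random() < a / (a + b):
            A.add(e)                 # keep e
        else:
            B.discard(e)             # drop e
    return A                         # now A == B
```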

Page 27:

Summary

• submodular maximization: NP-hard – ½ approximation
• constrained maximization: NP-hard, mostly constant approximation factors
• submodular minimization: exploit convexity – poly-time
• constrained minimization? special cases poly-time; many cases polynomial lower bounds

Page 28:

Constraints

ground set: edges in a graph

minimum… cut, matching, path, spanning tree

Page 29:

Recall: MAP and cuts

pairwise random field

What's the problem?
minimum cut: prefers a short cut = a short object boundary

[Figure: aim vs. reality of the resulting segmentation]
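For reference, the standard pairwise energy behind this slide (the slide's own formula was lost in extraction); with nonnegative pairwise weights, MAP inference is a minimum cut:

```latex
% Pairwise random field over binary labels y in {0,1}^n:
E(y) \;=\; \sum_{i \in V} \theta_i(y_i) \;+\; \sum_{(i,j) \in E} w_{ij}\, [\![\, y_i \neq y_j \,]\!]
% With w_{ij} >= 0, minimizing E reduces to a minimum s-t cut, and the
% pairwise term is exactly the cut weight -- hence the short-boundary bias.
```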

Page 30:

MAP and cuts

Minimum cut:
minimize a sum of edge weights
implicit criterion: short cut = short boundary

Minimum cooperative cut:
minimize a submodular function of the edges (not a sum of edge weights!)
new criterion: the boundary may be long if it is homogeneous

Page 31:

Reward co-occurrence of edges

sum of weights: use few edges
submodular cost function: use few groups S_i of edges

[Figure: 7 edges, 4 types vs. 25 edges, 1 type]
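A typical way to build such a cost (my reconstruction; the slide gives no formula): apply a concave function to the weight captured in each group, so reusing a group is cheap:

```latex
% Cooperative cut cost over edge groups S_i, with g concave:
F(C) \;=\; \sum_{i} g\big( w(C \cap S_i) \big), \qquad \text{e.g. } g(t) = \sqrt{t}
% A plain sum of weights is the special case g(t) = t; a strictly concave g
% makes a long but homogeneous boundary (few groups) cheaper than a short
% mixed one.
```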

Page 32:

Results

[Figure: segmentations with Graph cut vs. Cooperative cut]

Page 33:

Constrained optimization

minimum… cut, matching, path, spanning tree

convex relaxation, or minimizing a surrogate function → approximate optimization
(Goel et al. '09, Iwata & Nagano '09, Goemans et al. '09, Jegelka & Bilmes '11, Iyer et al. '13, Kohli et al. '13, ...)

approximation bounds depend on F: polynomial – constant – FPTAS

Page 34:

Efficient constrained optimization (JB11, IJB13)

minimize a series of surrogate functions:
1. compute a linear (sum-of-weights) upper bound
2. solve the easy sum-of-weights problem, and repeat

• efficient
• only need to solve sum-of-weights problems
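A minimal sketch of this majorize-minimize loop as I read it from the slide (not the authors' code). `solve_modular` is a placeholder for whichever efficient combinatorial solver fits the constraint (min cut, shortest path, minimum spanning tree, ...), and the bound below is one of the standard tight modular upper bounds:

```python
# Majorize-minimize for constrained submodular minimization: at iterate X,
# build a modular bound m_X with m_X(S) >= F(S) and m_X(X) = F(X), then
# minimize the resulting sum of weights over the constraint family.

def modular_weights(F, X, ground_set):
    """Weights of a standard tight modular upper bound at X."""
    f_empty = F(set())
    w = {}
    for e in ground_set:
        if e in X:
            w[e] = F(X) - F(X - {e})      # marginal of e at X
        else:
            w[e] = F({e}) - f_empty       # singleton gain
    return w

def majorize_minimize(F, ground_set, solve_modular, X0, max_iters=20):
    X = set(X0)
    for _ in range(max_iters):
        w = modular_weights(F, X, ground_set)
        Y = solve_modular(w)              # min sum of w[e] over feasible sets
        if F(Y) >= F(X):                  # no improvement: stop
            return X
        X = set(Y)
    return X
```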

Page 35:

Does it work?

[Figure: majorize-minimize after 1 iteration vs. the optimal solution, on instances from Goemans et al. 2009]

empirical results much better than theoretical worst-case bounds!?

Page 36:

Does it work?

[Figure: approximate solution vs. optimal solution vs. minimum cut solution (Jegelka & Bilmes 2011; Kohli, Osokin, Jegelka 2013)]

Page 37:

Theory and practice

[Table: approximation and learning guarantees vs. worst-case lower bounds, for trees, matchings, and cuts; bounds from Goel et al. '09, Iwata & Nagano '09, Jegelka & Bilmes '11, Goemans et al. '09, Svitkina & Fleischer '08, Balcan & Harvey '12]

Good approximations in practice … BUT not in theory?
theory says: no good approximations possible (in general)

What makes some (practical) problems easier than others?

Page 38:

Curvature

Theorems (IJB 2013). Tightened upper & lower bounds for constrained minimization, approximation, and learning; for submodular max, see (Conforti & Cornuéjols '84, Vondrák '08).

curvature compares an item's marginal cost to its single-item cost

[Figure: worst-case vs. opt cost as curvature goes from small to large]
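The standard definition matching the "marginal cost / single-item cost" fragments above:

```latex
% Curvature of a monotone submodular F (kappa_F in [0,1]):
\kappa_F \;=\; 1 \;-\; \min_{e \in V} \frac{F(e \mid V \setminus \{e\})}{F(\{e\})}
% kappa_F = 0: F is modular (a plain sum of weights, easy);
% kappa_F = 1: fully curved, where the worst-case bounds apply.
```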

Page 39:

Curvature and approximations

smaller curvature is better

Page 40:

If there were more time…
• Learning submodular functions
• Adaptive submodular maximization
• Online learning/optimization
• Distributed algorithms
• Many more applications…

• worst case vs. average practical case

pointers and references: http://www.cs.berkeley.edu/~stefje/madalgo/literature_list.pdf
slides: http://www.cs.berkeley.edu/~stefje/madalgo/

Page 41:

Summary

• discrete labeling problems (MAP inference)
• (structured) sparse variable selection
• finding informative / influential subsets

Recurrent questions:
• how do we model prior knowledge / assumptions? → structure
• efficient optimization?

Recurrent themes:
• convexity
• submodularity
• polyhedra

Page 42:

Submodularity and machine learning



• distributions over labels, sets; often: tractability via submodularity (e.g. "attractive" graphical models, determinantal point processes)
• (convex) regularization; submodularity as "discrete convexity" (e.g. combinatorial sparse estimation)
• submodular phenomena: diffusion processes, covering, rank, connectivity, entropy, economies of scale, summarization, …

submodularity is behind a lot of machine learning!