Max-norm Projections for Factored MDPs Carlos Guestrin Daphne Koller Stanford University Ronald Parr Duke University

Dec 19, 2015


Max-norm Projections for Factored MDPs

Carlos Guestrin

Daphne Koller
Stanford University

Ronald Parr
Duke University


Motivation

MDPs: plan over atomic system states;
Policy: specifies an action at every state;
Polytime algorithms for finding an optimal policy.

Number of states is exponential in the number of state variables.


Motivation: BNs meet MDPs

Real-world MDPs have: hundreds of variables; googols of states.

Can we exploit problem-specific structure? For representation; for planning.

Goal: merge BNs and MDPs for efficient computation.


Factored MDPs [Boutilier et al.]

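Total reward is a sum of sub-rewards: R = R1 + R2.

[Figure: dynamic Bayesian network over state variables X, Y, Z at time t and X', Y', Z' at time t+1, with reward nodes R1 and R2.]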

Actions only change small parts of model.

Value function: Value of policy starting at state s.


Exploiting Structure

Structured value function approach [Boutilier et al. '95]: collapse the value function using a tree; works well only when many states have the same value.

[Figure: value-function tree that splits on X and then on Z, with leaf values 3, 5, and 9.]

Model structure may imply structured value function.


Decomposable Value Functions

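Linear combination of restricted-domain functions [Bellman et al. '63] [Tsitsiklis & Van Roy '96] [Koller & Parr '99, '00]:

$$\tilde{V}(s) = \sum_i w_i h_i(s)$$

Each $h_i$ is the status of some small part(s) of a complex system: status of a machine; inventory of a store.

In matrix form, $\tilde{V} = A w$, where $A$ has one row per state ($2^n$ states) and one column per basis function ($K$ basis functions):

$$A = \begin{pmatrix} h_1(s_1) & h_2(s_1) & \cdots \\ h_1(s_2) & h_2(s_2) & \cdots \\ \vdots & \vdots & \end{pmatrix}$$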
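A small sketch of the matrix form $\tilde{V} = A w$, assuming binary state variables and a hypothetical basis (a constant plus one indicator per variable). The function names, the basis, and the weights are illustrative assumptions, not taken from the slides. It enumerates all $2^n$ states explicitly, which is only feasible for tiny $n$.

```python
from itertools import product

def basis_matrix(n, basis):
    """Rows are all 2^n binary states, columns are the basis functions."""
    states = list(product([0, 1], repeat=n))
    return states, [[h(s) for h in basis] for s in states]

def value(A, w):
    """Compute V~ = A w, one state (row) at a time."""
    return [sum(a * wi for a, wi in zip(row, w)) for row in A]

# Hypothetical basis: a constant plus one indicator per state variable.
n = 3
basis = [lambda s: 1.0] + [lambda s, i=i: float(s[i]) for i in range(n)]
states, A = basis_matrix(n, basis)

w = [0.5, 1.0, 2.0, 3.0]  # assumed weights, for illustration only
V = value(A, w)
```

Each basis function here looks at no more than one variable, so only 4 weights describe a value function over 8 states; the gap between the number of weights and the number of states grows exponentially with $n$.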


Our Approach

Embed structure into value function space a priori: project into the structured vector space of factored value functions; efficiently find the closest approximation to the "true" value.

Linear combination of structured features:

$$\tilde{V} = \sum_k w_k h_k$$


Policy Iteration

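Guess $V$; repeat: $\pi$ = greedy($V$); $V$ = value of acting on $\pi$.

Value of acting on $\pi$:

$$V = R + \gamma P V$$

Here $V$ (value) and $R$ (reward) are $2^n \times 1$ vectors, $P$ is a $2^n \times 2^n$ matrix, and $\gamma P V$ is the discounted expected value.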
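The loop on this slide can be sketched for a tiny explicit MDP. This is a minimal illustration under stated assumptions: the two-state MDP, the discount factor, and all function names are hypothetical, and the value of a policy is obtained by repeatedly applying $V \leftarrow R + \gamma P V$ rather than by solving the linear system directly.

```python
GAMMA = 0.9  # assumed discount factor

def evaluate(P, R, policy, sweeps=2000):
    """Approximate the value of `policy` by iterating V <- R + gamma * P V."""
    n = len(R)
    V = [0.0] * n
    for _ in range(sweeps):
        V = [R[s][policy[s]]
             + GAMMA * sum(P[policy[s]][s][t] * V[t] for t in range(n))
             for s in range(n)]
    return V

def greedy(P, R, V):
    """Pick, in every state, the action with the best one-step lookahead."""
    n = len(R)
    def q(s, a):
        return R[s][a] + GAMMA * sum(P[a][s][t] * V[t] for t in range(n))
    return [max(range(len(P)), key=lambda a: q(s, a)) for s in range(n)]

def policy_iteration(P, R):
    policy = [0] * len(R)
    while True:
        V = evaluate(P, R, policy)
        new_policy = greedy(P, R, V)
        if new_policy == policy:
            return policy, V
        policy = new_policy

# Hypothetical 2-state MDP: action 0 stays put, action 1 switches state.
P = [[[1.0, 0.0], [0.0, 1.0]],   # P[0][s][t]
     [[0.0, 1.0], [1.0, 0.0]]]   # P[1][s][t]
R = [[0.0, 0.0], [1.0, 1.0]]     # R[s][a]: reward 1 only in state 1
policy, V = policy_iteration(P, R)
```

In this toy problem the loop settles on switching out of state 0 and staying in state 1. The explicit vectors have one entry per state, which is exactly what becomes infeasible when the number of states is $2^n$.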


Approximate Policy Iteration

Guess $w_0$; repeat: $\pi_t$ = greedy($A w_t$); $A w_{t+1} \approx$ value of acting on $\pi_t$.

Approximate value determination:

$$A w \approx R + \gamma P A w$$


Approximate Value Determination

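Approximate value determination:

$$A w \approx R + \gamma P A w$$

Need a projection of the value function into the space of the basis functions ($L_d$ projection):

$$w = \arg\min_w \left\| A w - (R_\pi + \gamma P_\pi A w) \right\|_d$$

Previous work uses $L_2$ and weighted-$L_2$ projections [Koller & Parr '99, '00].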


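Analysis of Approx. PI

Let

$$\beta_P = \max_{\tau = 1 \ldots t} \left\| A w_\tau - (R_{\pi_\tau} + \gamma P_{\pi_\tau} A w_\tau) \right\|_\infty$$

Theorem:

$$\left\| A w_t - V^* \right\|_\infty \le \gamma^t \left\| A w_0 - V^* \right\|_\infty + \frac{2 \gamma \beta_P}{(1 - \gamma)^2}$$

We should be doing projections in max-norm!

$$w = \arg\min_w \left\| (A - \gamma P A) w - R \right\|_\infty$$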


Approximate PI: Revisited

Guess $w_0$; repeat: $\pi_t$ = greedy($A w_t$); $A w_{t+1} \approx$ value of acting on $\pi_t$.

Approximate value determination:

$$A w \approx R + \gamma P A w$$

Analysis motivating projections in max-norm;

Efficient algorithm for max-norm projection.


Efficient Max-norm Projection

Computing max-norm for fixed weights;

Cost networks;

Efficient max-norm projection.

$$A w \approx R + \gamma P A w$$

$$w = \arg\min_w \left\| (A - \gamma P A) w - R \right\|_\infty$$

$$w = \arg\min_w \left\| H w - b \right\|_\infty$$
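The step from the approximate fixed-point equation to the generic form with $H$ and $b$ is only implicit on the slide. A short derivation, assuming $H$ and $b$ are simply names for the grouped terms:

```latex
\begin{align*}
A w \approx R + \gamma P A w
  &\iff A w - \gamma P A w \approx R \\
  &\iff (A - \gamma P A)\, w \approx R \\
  &\iff H w \approx b, \qquad H := A - \gamma P A, \quad b := R .
\end{align*}
```

Measuring the residual $H w - b$ in max-norm and minimizing over $w$ gives the projection problem $\arg\min_w \| H w - b \|_\infty$.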


Max over Large State Spaces

For fixed weights $w$, compute the max-norm:

$$\phi = \left\| H w - b \right\|_\infty = \max_s \left| \sum_i w_i h_i(s) - b(s) \right|$$

However, if basis and target are functions of only a few variables, we can do it efficiently!

Cost networks can maximize over large state spaces efficiently when the function is factored:

$$\max_{X_1, \ldots, X_n} \sum_i f_i(C_i), \quad \text{where } C_i \subseteq \{X_1, \ldots, X_n\}$$
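For reference, the naive computation of $\phi$ for fixed weights can be sketched as a loop over every state. The basis functions, target, and weights below are hypothetical and chosen only so the result is easy to check by hand; the enumeration visits all $2^n$ states, which is the cost the factored approach is meant to avoid.

```python
from itertools import product

def max_norm_error(n, basis, w, b):
    """phi = max over all 2^n binary states of |sum_i w_i h_i(s) - b(s)|."""
    return max(
        abs(sum(wi * h(s) for wi, h in zip(w, basis)) - b(s))
        for s in product([0, 1], repeat=n)
    )

# Hypothetical basis and target, each depending on only a few variables.
basis = [lambda s: 1.0, lambda s: float(s[0]), lambda s: float(s[1])]
b = lambda s: 2.0 * s[0] + 1.0 * s[1] * s[2]
w = [0.0, 2.0, 0.5]

phi = max_norm_error(3, basis, w, b)
```

With these numbers the residual simplifies to $0.5\,s_1 - s_1 s_2$, so it depends on only two of the three variables; that kind of locality is what a cost network exploits.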


Cost Networks

Can use variable elimination to maximize over state space [Bertele & Brioschi '72]:

$$\begin{aligned}
& \max_{A,B,C,D} \; f_1(A,B) + f_2(A,C) + f_3(C,D) + f_4(B,D) \\
&= \max_{A,B,C} \; f_1(A,B) + f_2(A,C) + \max_D \left[ f_3(C,D) + f_4(B,D) \right] \\
&= \max_{A,B,C} \; f_1(A,B) + f_2(A,C) + g_1(B,C)
\end{aligned}$$

[Figure: cost network over nodes A, B, C, D, with edges labeled f1, f2, f3, f4.]

As in Bayes nets, maximization is exponential in size of largest factor.

Here we need only 16, instead of 64 sum operations.
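A minimal sketch of variable elimination for maximizing a sum of local functions, using the four-factor example from this slide. The table representation (a scope tuple plus a dictionary), the function names, and the numeric entries of f1 to f4 are assumptions made for illustration; all variables are taken to be binary.

```python
from itertools import product

def eliminate(factors, var):
    """Max out `var`: replace all factors mentioning it by one new factor."""
    touching = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]
    scope = sorted({v for sc, _ in touching for v in sc if v != var})
    table = {}
    for assign in product([0, 1], repeat=len(scope)):
        ctx = dict(zip(scope, assign))
        table[assign] = max(
            sum(t[tuple({**ctx, var: x}[v] for v in sc)] for sc, t in touching)
            for x in (0, 1)
        )
    return rest + [(tuple(scope), table)]

def max_by_elimination(factors, order):
    for var in order:
        factors = eliminate(factors, var)
    # Every remaining factor has an empty scope: a single constant entry.
    return sum(t[()] for _, t in factors)

def max_by_enumeration(factors, variables):
    """Brute-force check: try every joint assignment."""
    best = float("-inf")
    for assign in product([0, 1], repeat=len(variables)):
        ctx = dict(zip(variables, assign))
        best = max(best, sum(t[tuple(ctx[v] for v in sc)] for sc, t in factors))
    return best

def table(fn):
    """Tabulate a function of two binary variables."""
    return {(x, y): fn(x, y) for x, y in product([0, 1], repeat=2)}

factors = [
    (("A", "B"), table(lambda a, b: 3 if a == b else 0)),  # f1 (made-up values)
    (("A", "C"), table(lambda a, c: 2 if a != c else 0)),  # f2
    (("C", "D"), table(lambda c, d: 1 if c == d else 0)),  # f3
    (("B", "D"), table(lambda b, d: 4 if b != d else 0)),  # f4
]
best = max_by_elimination(factors, ["D", "C", "B", "A"])
```

Eliminating D first creates the intermediate factor over (B, C) that the slide calls $g_1$. The elimination order is a free choice; a poor order can create large intermediate factors, which is where the exponential dependence on the largest factor comes from.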


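Max-norm Projection

Algorithm for finding:

$$w^* \in \arg\min_w \left\| H w - b \right\|_\infty$$

Solve by Linear Programming [Cheney '82]:

$$\begin{aligned}
\text{Variables:} \quad & w_1, \ldots, w_k, \phi; \\
\text{Minimize:} \quad & \phi; \\
\text{Subject to:} \quad & \phi \ge \max_s \left( \sum_{i=1}^{k} w_i h_i(s) - b(s) \right) \text{ and} \\
& \phi \ge \max_s \left( b(s) - \sum_{i=1}^{k} w_i h_i(s) \right).
\end{aligned}$$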
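To make the linear program concrete, here is a toy version with a single weight ($k = 1$), so the LP has just two variables, $w$ and $\phi$. As a stand-in for a real LP solver, it uses brute-force vertex enumeration: the optimum of a bounded two-variable LP lies where two constraint lines cross. The function name, the solution method, and the data are illustrative assumptions; this is not how a production solver works.

```python
from itertools import combinations

def max_norm_projection_1d(h, b, eps=1e-9):
    """Minimize phi subject to phi >= +/-(w * h[s] - b[s]) for every state s.

    Returns (w, phi). Assumes a single weight w and at least one nonzero h[s].
    """
    # Write every constraint in the form phi >= a * w + c.
    lines = [(hs, -bs) for hs, bs in zip(h, b)]
    lines += [(-hs, bs) for hs, bs in zip(h, b)]
    best = None
    for (a1, c1), (a2, c2) in combinations(lines, 2):
        if a1 == a2:
            continue  # parallel constraints do not define a vertex
        w = (c2 - c1) / (a1 - a2)
        phi = a1 * w + c1
        feasible = all(phi >= a * w + c - eps for a, c in lines)
        if feasible and (best is None or phi < best[1]):
            best = (w, phi)
    return best

# Hypothetical data: a constant basis function and four target values.
h = [1.0, 1.0, 1.0, 1.0]
b = [1.0, 2.0, 4.0, 7.0]
w_star, phi_star = max_norm_projection_1d(h, b)
```

With a constant basis function, the max-norm projection is the midpoint of the smallest and largest target values (4 here, with error 3), whereas a least-squares fit would return the mean (3.5). The number of constraints is two per state, which motivates the compact constraint representation discussed next in the slides.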


Representing the Constraints

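Explicit representation is exponential ($|S| = 2^n$):

$$\phi \ge \sum_{i=1}^{k} w_i h_i(s) - b(s), \quad s = 1, \ldots, |S|$$

If basis and target are factored, can use cost networks to represent the constraints:

$$\phi \ge \max_{A,B,C} \; f_1(A,B) + f_2(A,C) + \max_D \left[ f_3(C,D) + f_4(B,D) \right]$$

becomes

$$\begin{aligned}
\phi &\ge \max_{A,B,C} \; f_1(A,B) + f_2(A,C) + g_1^{(B,C)} \\
g_1^{(B,C)} &\ge f_3(C,D) + f_4(B,D)
\end{aligned}$$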


Approximate Policy Iteration

Guess $w_0$; repeat: $\pi_t$ = greedy($A w_t$); $A w_{t+1} \approx$ value of acting on $\pi_t$.

Policy improvement: How do we represent the policy? How do we update it efficiently?


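What about the Policy?

Contextual action model:

[Figure: three dynamic Bayesian networks over X, Y, Z at time t and X', Y', Z' at time t+1, labeled default, Action 1, and Action 2.]

Theorem [Koller & Parr '00]: Factored value functions and model $\Rightarrow$ compact policy description.

Policy forms a decision list:

If [condition] then action 1, else if [condition] then action 2, else if [condition] then action 1.

[Conditions omitted; only the fragments "xyz" and "x" are recoverable.]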
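A decision-list policy can be sketched as an ordered list of (condition, action) pairs in which the first matching condition wins. The conditions below are hypothetical placeholders, since the ones on the slide are not recoverable, and the function name and fallback action are likewise assumptions.

```python
def choose_action(decision_list, state, default_action="default"):
    """Return the action of the first rule whose condition matches the state."""
    for condition, action in decision_list:
        if all(state[var] == value for var, value in condition.items()):
            return action
    return default_action

# Hypothetical rules: each condition is a partial assignment to x, y, z.
policy = [
    ({"x": 1, "y": 0}, "action 1"),
    ({"z": 1}, "action 2"),
    ({"x": 0}, "action 1"),
]

a1 = choose_action(policy, {"x": 1, "y": 0, "z": 1})
a2 = choose_action(policy, {"x": 1, "y": 1, "z": 1})
a3 = choose_action(policy, {"x": 1, "y": 1, "z": 0})
```

Each rule mentions only a few variables, so the list can stay short even though it assigns an action to every one of the exponentially many states.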


Factored Policy Iteration: Summary

Guess $V$; repeat: $\pi$ = greedy($V$); $V$ = value of acting on $\pi$.

Structure induces decision-list policy.

Key operations isomorphic to Bayesian network inference.

Time per iteration reduced from $O((2^n)^3)$ to $O(\mathrm{poly}(k, n, C))$:

• C = largest factor in cost net (function of structure)
• k = number of basis functions ($k \ll 2^n$)
• poly = complexity of LP solver, in practice close to linear


Network Management Problem

Computers connected in a network;

Each computer can fail with some probability;

If a computer fails, it increases the probability its neighbors will fail;

At every time step, the sys-admin must decide which computer to fix.

[Figure: network topologies: Bidirectional Ring, Ring and Star, Star, 3 Legs, Ring of Rings; several topologies include a designated Server node.]
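The factored structure of this problem can be sketched for a bidirectional ring. All numbers and modeling details below are assumptions for illustration (the slides give no probabilities): a base failure probability, an extra penalty per failed neighbor, a repaired machine that is working at the next step, and a failed machine that stays failed until fixed.

```python
from itertools import product

def p_working_next(i, state, action, base_fail=0.1, neighbor_penalty=0.3):
    """P(machine i works at t+1), given the ring state at t and the machine fixed."""
    n = len(state)
    if action == i:
        return 1.0  # assumption: the repaired machine is working next step
    if not state[i]:
        return 0.0  # assumption: a failed machine stays failed until fixed
    failed_neighbors = sum(
        1 for j in ((i - 1) % n, (i + 1) % n) if not state[j]
    )
    return 1.0 - min(1.0, base_fail + neighbor_penalty * failed_neighbors)

def transition_prob(state, action, next_state):
    """Joint transition probability as a product of per-machine factors."""
    p = 1.0
    for i, up in enumerate(next_state):
        q = p_working_next(i, state, action)
        p *= q if up else 1.0 - q
    return p

state = (1, 0, 1, 1)  # machine 1 has failed
total = sum(transition_prob(state, 3, nxt) for nxt in product([0, 1], repeat=4))
```

Each machine's next status depends only on itself, its two ring neighbors, and the action, so the full $2^n \times 2^n$ transition matrix never needs to be written down.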


Comparing projections in $L_2$ to $L_\infty$

Max-norm projection also much more efficient: single cost network rather than many, many BN inferences; use of very efficient LP package (CPLEX).

[Figure: relative error (0 to 0.3) versus number of variables (3 to 10) for four series: $L_2$ single basis, $L_\infty$ single basis, $L_\infty$ pair basis, $L_2$ pair basis.]


Results on Larger Problems: Running Time

[Figure: total time in minutes (0 to 500) versus number of states (1E+00 to 1E+14) for the Ring, 3 Legs, and Star topologies.]

Runs in time $O(n^3)$, not $O((2^n)^3)$.


Results on Larger Problems: Error Bounds

[Figure: Bellman error / Rmax (0 to 0.4) versus number of states (1E+00 to 1E+14) for the Ring, 3 Legs, and Star topologies.]

Error remains bounded.
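The slides do not spell out how the plotted quantity is computed. A common definition, assumed here, is the max-norm gap between a value function and its one-step Bellman backup, divided by the largest reward. The sketch below uses an explicit toy MDP; the function name, the MDP, and the discount factor are hypothetical.

```python
def bellman_error(P, R, V, gamma):
    """max_s |V(s) - max_a [R(s,a) + gamma * sum_t P(t|s,a) V(t)]|."""
    n = len(V)
    def backup(s):
        return max(
            R[s][a] + gamma * sum(P[a][s][t] * V[t] for t in range(n))
            for a in range(len(P))
        )
    return max(abs(V[s] - backup(s)) for s in range(n))

# Hypothetical 2-state MDP: action 0 stays put, action 1 switches state.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[0.0, 0.0], [1.0, 1.0]]  # R[s][a]
r_max = max(max(row) for row in R)

err_zero = bellman_error(P, R, [0.0, 0.0], 0.9) / r_max   # poor guess
err_opt = bellman_error(P, R, [9.0, 10.0], 0.9) / r_max   # optimal values
```

Dividing by Rmax makes the error comparable across problems with different reward scales, which is presumably why the plot reports the ratio.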


Conclusions

Max-norm projection directly minimizes error bounds;

Closed-form projection operation provides exponential complexity reduction;

Exploit structure to reduce computation costs!

Solve very large MDPs efficiently.


Future Work

POMDPs (IJCAI’01 workshop paper);

Additional structure: Factored actions; Relational representations; CSI;

Multi-agent systems;

Linear program solution for MDP.