Artificial Agents Play the Beer Game, Eliminate the Bullwhip Effect, and Whip the MBAs

Steven O. Kimbrough, D.-J. Wu, Fang Zhong

FMEC, Philadelphia, June 2000; file: beergameslides.ppt

The MIT Beer Game

• Players

– Retailer, Wholesaler, Distributor and Manufacturer.

• Goal

– Minimize system-wide (chain) long-run average cost.

• Information sharing: Mail.

• Demand: Deterministic.

• Costs

– Holding cost: $1.00/case/week.

– Penalty cost: $2.00/case/week.

• Leadtime: 2 weeks physical delay

Timing

1. New shipments delivered.

2. Orders arrive.

3. Fill orders plus backlog.

4. Decide how much to order.

5. Calculate inventory costs.
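The five timing steps can be sketched for a single player as below. This is a minimal illustration, not the authors' simulator: the `Stage` class, its field names and the `policy` callback are hypothetical, and the caller is assumed to push the supplier's shipment onto `inbound` after each week. Only the two cost rates and the 2-week delay come from the slides.

```python
from collections import deque
from dataclasses import dataclass, field

HOLDING_COST = 1.00   # $/case/week, from the slide
PENALTY_COST = 2.00   # $/case/week, from the slide


@dataclass
class Stage:
    """One player's state. Field names are illustrative, not from the slides."""
    on_hand: int = 0
    backlog: int = 0
    # inbound[0] arrives this week; two slots model the 2-week physical delay
    inbound: deque = field(default_factory=lambda: deque([0, 0]))


def play_week(stage, incoming_order, policy):
    """Run the five timing steps once; return (shipped, order_placed, cost)."""
    # 1. New shipments delivered.
    stage.on_hand += stage.inbound.popleft()
    # 2. Orders arrive.
    owed = stage.backlog + incoming_order
    # 3. Fill orders plus backlog, as far as stock allows.
    shipped = min(stage.on_hand, owed)
    stage.on_hand -= shipped
    stage.backlog = owed - shipped
    # 4. Decide how much to order (never a negative quantity).
    order_placed = max(0, policy(incoming_order))
    # 5. Calculate inventory costs.
    cost = HOLDING_COST * stage.on_hand + PENALTY_COST * stage.backlog
    return shipped, order_placed, cost


retailer = Stage(on_hand=4, inbound=deque([4, 4]))
shipped, order, cost = play_week(retailer, incoming_order=6, policy=lambda x: x)
retailer.inbound.append(order)  # assumes the supplier ships the full order
print(shipped, order, cost)
```

Here the `lambda x: x` policy is the “1-1” rule: the player simply passes on the order it received.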

Game Board

…

The Bullwhip Effect

• Order variability is amplified upstream in the supply chain.

• Industry examples (P&G, HP).
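One common way to quantify the amplification is the ratio of the variance of the orders a stage places to the variance of the demand it faces; a ratio above 1 indicates a Bullwhip effect at that stage. The slides do not state which measure was used, so the function below is an assumed, illustrative choice with made-up numbers.

```python
from statistics import pvariance


def bullwhip_ratio(orders, demand):
    """Variance of orders placed divided by variance of demand faced."""
    demand_var = pvariance(demand)
    if demand_var == 0:
        raise ValueError("demand has zero variance; ratio is undefined")
    return pvariance(orders) / demand_var


demand = [4, 6, 4, 6]   # what the stage's customer ordered (illustrative)
orders = [2, 8, 2, 8]   # what the stage ordered from its supplier (illustrative)
print(bullwhip_ratio(orders, demand))
```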

Observed Bullwhip effect from undergraduates' game playing

[Figure: four panels, Retailer's Order, Wholesaler's Order, Distributor's Order and Factory's Order, each plotting Order (0 to 40) against Week (1 to 29).]

Bullwhip Effect Example (P&G)

Lee et al., 1997, Sloan Management Review

Analytic Results: Deterministic Demand

• Assumptions:

– Fixed lead time.

– Players work as a team.

– Manufacturer has unlimited capacity.

• “1-1” policy is optimal: order whatever amount your customer orders from you.

Analytic Results: Stochastic Demand (Chen, 1999, Management Science)

• Additional assumptions:

– Only the Retailer incurs penalty cost.

– Demand distribution is common knowledge.

– Fixed information lead time.

– Decreasing holding costs upstream in the chain.

• Order-up-to (installation base-stock) policy is optimal.
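An order-up-to rule can be sketched as follows: each week the player orders just enough to raise its local inventory position (on hand, plus on order, minus backlog) to a target level. The target value and argument names are hypothetical; the slides do not give the base-stock levels.

```python
def order_up_to(base_stock, on_hand, on_order, backlog):
    """Order enough to lift the inventory position back to base_stock."""
    inventory_position = on_hand + on_order - backlog
    return max(0, base_stock - inventory_position)


# Illustrative target of 20 cases; not a value from the slides.
print(order_up_to(20, on_hand=5, on_order=8, backlog=0))
print(order_up_to(20, on_hand=0, on_order=8, backlog=3))
```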

Agent-Based Approach

• Agents work as a team.

• No agent has knowledge of the demand distribution.

• No information sharing among agents.

• Agents learn via genetic algorithms.

• Fixed or stochastic leadtime.

Research Questions

• Can the agents track the demand?

• Can the agents eliminate the Bullwhip effect?

• Can the agents discover the optimal policies if they exist?

• Can the agents discover reasonably good policies under complex scenarios where analytical solutions are not available?

Flowchart

Agents Coding Strategy

• Bit-string representation with fixed length n.

• The leftmost bit represents the sign, “+” or “–”.

• The remaining bits represent how much to order.

• Rule “x+1” means “if demand is x then order x+1”.

• Rule search space is $2^{n-1} - 1$.
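A minimal sketch of how such a bit-string might be decoded into a rule. Which bit value stands for “+” is not stated on the slide, so reading 0 as “+” is an assumption, as are the function names and the floor at zero for order quantities.

```python
def decode_rule(bits):
    """Turn a bit-string such as '0011' into a signed offset for rule 'x + offset'."""
    if len(bits) < 2 or set(bits) - {"0", "1"}:
        raise ValueError("expected a string of 0s and 1s with length >= 2")
    sign = 1 if bits[0] == "0" else -1      # assumption: 0 means "+", 1 means "-"
    magnitude = int(bits[1:], 2)            # remaining n-1 bits, read as binary
    return sign * magnitude


def apply_rule(bits, demand):
    """Order quantity for rule 'x + offset'; orders are floored at zero."""
    return max(0, demand + decode_rule(bits))


print(decode_rule("0011"))    # rule "x + 3"
print(apply_rule("0001", 4))  # rule "x + 1" applied to a demand of 4
```

With n = 4 the largest offset magnitude is 2**(4-1) - 1 = 7, so rules range from “x – 7” to “x + 7”.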

Experiment 1a: First Cup

• Environment:

– Deterministic demand with fixed leadtime.

– Fix the policy of Wholesaler, Distributor and Manufacturer to be “1-1”.

– Only the Retailer agent learns.

• Result: Retailer Agent finds “1-1”.

Experiment 1b

• All four Agents learn under the environment of experiment 1a.

• Über rule for the team.

• All four agents find “1-1”.

Result of Experiment 1b

All four agents can find the optimal “1-1” policy

[Figure: one series each for Retailer, Wholesaler, Distributor and Factory, plotted against Week (1 to 35); vertical axis 0 to 9.]

Artificial Agents Whip the MBAs and Undergraduates in Playing the MIT Beer Game

Accumulated Cost Comparison of MBAs and our agents

[Figure: Accumulated Cost (0 to 5000) against Week (1 to 27); series: MBA Group1, MBA Group2, MBA Group3, Agent, UnderGrad Group1, UnderGrad Group2, UnderGrad Group3.]

Stability (Experiment 1b)

• Fix any three agents to be “1-1”, and allow the fourth agent to learn.

• The fourth agent minimizes its own long-run average cost rather than the team cost.

• No agent has any incentive to deviate once the others are playing “1-1”.

• Therefore “1-1” is apparently a Nash equilibrium.
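The stability test above can be illustrated with a small chain simulation. This sketch swaps the genetic-algorithm learner for an exhaustive check over a few offsets, and it rests on several assumptions that are not from the slides: constant demand of 4 cases per week, zero starting inventory with full pipelines, a 35-week horizon, and orders that reach the supplier in the same week they are placed.

```python
from collections import deque

HOLDING, PENALTY = 1.0, 2.0      # $/case/week, from the slides
LEAD_TIME = 2                    # weeks of physical delay, from the slides
DEMAND = 4                       # assumption: constant customer demand per week
WEEKS = 35                       # assumption: horizon length


def own_costs(offsets, weeks=WEEKS, demand=DEMAND):
    """Simulate four stages (0 = Retailer ... 3 = Manufacturer).

    Stage i orders max(0, x + offsets[i]), where x is the order it just received.
    Returns each stage's accumulated cost.
    """
    n = len(offsets)
    on_hand = [0] * n
    backlog = [0] * n
    # Pipelines start full of `demand`, so "1-1" begins in steady state.
    inbound = [deque([demand] * LEAD_TIME) for _ in range(n)]
    costs = [0.0] * n
    for _ in range(weeks):
        incoming = demand                      # the Retailer faces customer demand
        for i in range(n):
            on_hand[i] += inbound[i].popleft()         # 1. shipment delivered
            owed = backlog[i] + incoming               # 2. order arrives
            shipped = min(on_hand[i], owed)            # 3. fill orders plus backlog
            on_hand[i] -= shipped
            backlog[i] = owed - shipped
            if i > 0:
                inbound[i - 1].append(shipped)         # goods travel downstream
            placed = max(0, incoming + offsets[i])     # 4. decide order
            if i == n - 1:
                inbound[i].append(placed)              # unlimited factory capacity
            costs[i] += HOLDING * on_hand[i] + PENALTY * backlog[i]   # 5. cost
            incoming = placed                          # seen upstream at once
    return costs


def best_response(stage, candidates=range(-3, 4)):
    """Offset that minimises `stage`'s own cost while the others play '1-1'."""
    def cost(offset):
        offsets = [0, 0, 0, 0]
        offsets[stage] = offset
        return own_costs(offsets)[stage]
    return min(candidates, key=cost)


for stage, name in enumerate(["Retailer", "Wholesaler", "Distributor", "Manufacturer"]):
    print(name, "best offset:", best_response(stage))
```

Under these assumptions every player's best reply to “1-1” is offset 0, that is, “1-1” itself: ordering less builds a backlog, and ordering more eventually builds excess stock.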

Experiment 2: Second Cup

• Environment:

– Demand uniformly distributed on [0, 15].

– Fixed lead time.

– All four Agents make their own decisions as in experiment 1b.

• Agents eliminate the Bullwhip effect.

• Agents find better policies than “1-1”.

Artificial agents eliminate the Bullwhip effect.

[Figure: Order (0 to 20) against Week (1 to 35); series: Retailer, Wholesaler, Distributor, Factory.]

Artificial agents discover a better policy than “1-1” when facing stochastic demand with penalty costs for all players.

[Figure: Accumulated Cost vs. Week; Accumulated Cost (0 to 5000) against Week (1 to 35); series: Agent Cost, 1-1 Cost.]

Experiment 3: Third Cup

• Environment:

– Lead time uniformly distributed on [0, 4].

– The rest as in experiment 2.

• Agents find better policies than “1-1”.

• No Bullwhip effect.

• The policies discovered by agents are Nash.

Artificial agents discover stable policies that are better than “1-1” when facing stochastic demand and stochastic lead time.

[Figure: two series, 1-1 cost and Agent cost, against Week (1 to 35); vertical axis 0 to 8000.]

Artificial agents are able to eliminate the Bullwhip effect when facing stochastic demand with stochastic lead time.

[Figure: four series, Retailer Order, Wholesaler Order, Distributor Order and Factory Order, against Week (1 to 35); vertical axis 0 to 25.]

Agents learning

Winner Strategies

Generation   Retailer   Wholesaler   Distributor   Manufacturer   Total Cost
0            x – 0      x – 1        x + 4         x + 2          7380
1            x + 3      x – 2        x + 2         x + 5          7856
2            x – 0      x + 5        x + 6         x + 3          6987
3            x – 1      x + 5        x + 2         x + 3          6137
4            x + 0      x + 5        x – 0         x – 2          6129
5            x + 3      x + 1        x + 2         x + 3          3886
6            x – 0      x + 1        x + 2         x + 0          3071
7            x + 2      x + 1        x + 2         x + 1          2694
8            x + 1      x + 1        x + 2         x + 1          2555
9            x + 1      x + 1        x + 2         x + 1          2555
10           x + 1      x + 1        x + 2         x + 1          2555
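The generation-by-generation search can be sketched with a small genetic algorithm over one chromosome that concatenates the four agents' rules. The slides do not specify the operators, so truncation selection, one-point crossover, a 2% mutation rate, elitism and the population size are all assumptions. The `toy_cost` function is a stand-in for a simulated team cost: it merely measures distance from the final row of the table.

```python
import random


def decode(chromosome, n_bits=4):
    """Split into per-agent rules; first bit of each is the sign (0 = '+', assumed)."""
    offsets = []
    for i in range(0, len(chromosome), n_bits):
        gene = chromosome[i : i + n_bits]
        sign = 1 if gene[0] == "0" else -1
        offsets.append(sign * int(gene[1:], 2))
    return offsets


def toy_cost(chromosome):
    # Stand-in cost, NOT the beer-game simulation: distance from (x+1, x+1, x+2, x+1).
    target = [1, 1, 2, 1]
    return sum(abs(o - t) for o, t in zip(decode(chromosome), target))


def evolve(fitness, n_bits=4, n_agents=4, pop_size=20, generations=10, seed=0):
    """Minimise `fitness`; return a (best_chromosome, best_cost) pair per generation."""
    rng = random.Random(seed)
    length = n_bits * n_agents
    pop = ["".join(rng.choice("01") for _ in range(length)) for _ in range(pop_size)]
    history = []
    for _ in range(generations + 1):              # generations 0..generations
        pop.sort(key=fitness)
        history.append((pop[0], fitness(pop[0])))
        parents = pop[: pop_size // 2]            # truncation selection
        children = [pop[0]]                       # elitism: keep the current winner
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)        # one-point crossover
            child = a[:cut] + b[cut:]
            # flip each bit with a small probability
            child = "".join(
                ("1" if c == "0" else "0") if rng.random() < 0.02 else c
                for c in child
            )
            children.append(child)
        pop = children
    return history


history = evolve(toy_cost)
for generation, (winner, cost) in enumerate(history):
    print(generation, decode(winner), cost)
```

Because of elitism the best cost here never increases, unlike the table, where total cost rises between generations 0 and 1.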

The Columbia Beer Game

• Environment:

– Information lead time: (2, 2, 2, 0).

– Physical lead time: (2, 2, 2, 3).

– Initial conditions set as Chen (1999).

• Agents find the optimal policy: order whatever is ordered with time shift, i.e., $Q_1(t) = D(t-1)$, $Q_i(t) = Q_{i-1}(t - l_{i-1})$.
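The time-shift rule can be traced with a few lines of code. This sketch reads l_i as the information lead time of stage i, taken from the first tuple on the slide, and fills weeks before any history exists with `initial`; both readings are assumptions, as is the example demand series.

```python
INFO_LEAD_TIMES = (2, 2, 2, 0)  # from the slide, stages 1..4


def shifted_orders(demand, initial=0):
    """Return four order series, one per stage, under the time-shift rule."""
    weeks = len(demand)

    def lagged(series, t, lag):
        # Before the series starts, fall back to `initial` (an assumption).
        return series[t - lag] if t - lag >= 0 else initial

    orders = [[lagged(demand, t, 1) for t in range(weeks)]]   # Q_1(t) = D(t-1)
    for i in range(1, 4):
        lag = INFO_LEAD_TIMES[i - 1]                            # l_{i-1}
        prev = orders[i - 1]
        orders.append([lagged(prev, t, lag) for t in range(weeks)])
    return orders


demand = [3, 7, 2, 9, 4, 6, 1, 8]   # illustrative demand, not from the slides
for stage, series in enumerate(shifted_orders(demand), start=1):
    print("Q%d" % stage, series)
```

Each stage's order series is the demand series delayed, with no amplification on the way upstream.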

Ongoing Research: More Beer

• Value of information sharing.

• Coordination and cooperation.

• Bargaining and negotiation.

• Alternative learning mechanisms: Classifier systems.

Summary

• Agents are capable of playing the Beer Game

– Track demand.

– Eliminate the Bullwhip effect.

– Discover the optimal policies if they exist.

– Discover good policies under complex scenarios where analytical solutions are not available.

• Intelligent and agile supply chain.

• Multi-agent enterprise modeling.

A framework for multi-agent intelligent enterprise modeling

[Diagram]

• Communities: Executive Community (StrategyFinder), Supply Chain Community (DragonChain), Production Community (LivingFactory), E-Marketplace Community (eBAC).

• Agents: Pricing Agent, Investment Agent, Auction Agent, Bidding Agent, Contracting Agent, Factory Agent, Distributor Agent, Wholesaler Agent, Retailer Agent.
