Page 1

Kevin Forbes

Optimizing Flocking Controllers using Gradient Descent

Page 2

Motivation

• Flocking models can animate complex scenes in a cost-effective way
• But, they are hard to control
  – there are many parameters that interact in non-intuitive ways
  – animators find good values by trial and error

• Can we use machine learning techniques to optimize the parameters instead of setting them by hand?

Page 3

Background – Flocking Model
Reynolds (1987): Flocks, Herds, and Schools: A Distributed Behavioral Model

Reynolds (1999): Steering Behaviors For Autonomous Characters

• Each agent can “see” other agents in its neighbourhood

• Motion derived from weighted combination of force vectors

[Figure: the three flocking forces – Alignment, Cohesion, Separation]
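A minimal sketch of these three forces, assuming 2-D NumPy position/velocity arrays and a hard neighbourhood radius; the names here (positions, velocities, radius) are illustrative, not the simulator's actual API:

```python
import numpy as np

def neighbours(i, positions, radius):
    """Indices of agents within `radius` of agent i (excluding i)."""
    dists = np.linalg.norm(positions - positions[i], axis=1)
    mask = dists < radius
    mask[i] = False
    return np.flatnonzero(mask)

def flocking_forces(i, positions, velocities, radius):
    """Alignment, cohesion, and separation force vectors for agent i."""
    nbrs = neighbours(i, positions, radius)
    dim = positions.shape[1]
    if len(nbrs) == 0:
        return np.zeros(dim), np.zeros(dim), np.zeros(dim)
    # Alignment: match the neighbours' mean velocity.
    alignment = velocities[nbrs].mean(axis=0) - velocities[i]
    # Cohesion: steer toward the neighbours' centroid.
    cohesion = positions[nbrs].mean(axis=0) - positions[i]
    # Separation: steer away from each neighbour, stronger when closer.
    offsets = positions[i] - positions[nbrs]
    d = np.linalg.norm(offsets, axis=1, keepdims=True)
    separation = (offsets / d**2).sum(axis=0)
    return alignment, cohesion, separation
```

The agent's motion is then derived from a weighted combination of these vectors, as the slide states.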

Page 4

Background – Learning Model
Lawrence (2003): Efficient Gradient Estimation for Motor Control Learning

Policy Search: Finds optimal settings of a system’s control parameter vector, as evaluated by some objective function

Stochastic elements in the system result in noisy gradient estimates, but there are techniques to limit their effects.

[Figure: simple 2-parameter example. Axes: values of the control parameters; colour: value of the objective function; blue arrows: negative gradient of the objective function; red line: result of gradient descent]
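Purely to make the picture concrete, here is a toy 2-parameter gradient descent on a hypothetical quadratic objective (not the flocking objective), tracing the kind of "red line" path the figure shows:

```python
import numpy as np

def objective(theta):
    """Toy objective: a quadratic bowl with its minimum at (3, -2)."""
    return (theta[0] - 3.0)**2 + 2.0 * (theta[1] + 2.0)**2

def gradient(theta):
    """Analytic gradient of the toy objective."""
    return np.array([2.0 * (theta[0] - 3.0), 4.0 * (theta[1] + 2.0)])

theta = np.array([-4.0, 4.0])        # starting point
path = [theta.copy()]                # the "red line"
step = 0.1
for _ in range(100):
    theta -= step * gradient(theta)  # follow the negative gradient
    path.append(theta.copy())
print(path[-1])                      # converges near (3, -2)
```

In the flocking setting the gradient is not analytic like this; it must be estimated from stochastic simulations, which is what makes the problem interesting.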

Page 5

Project Steps

1. Define physical agent model

2. Define flocking forces

3. Define objective function

4. Take derivatives of all system elements w.r.t. all control parameters

5. Do policy search

Page 6

1. Agent Model

Recursive definition: the base case is the system’s initial condition. If there are no stochastic forces, the system is deterministic (w.r.t. the initial conditions).

The flock’s policy is defined by the alpha vector.

Position, Velocity and Acceleration defined as in Reynolds (1999):
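The equations themselves were an image and did not survive extraction. As a sketch, the Euler-integrated vehicle model from Reynolds (1999), minus the max-speed/max-force truncations that Page 9 notes must be dropped, would read (the symbols $\mathbf{x}_i$, $\mathbf{v}_i$, $\mathbf{a}_i$, $\mathbf{F}_k$, $\alpha_k$, $m$, $\Delta t$ are my notation, not necessarily the author's):

$$
\mathbf{a}_i(t) = \frac{1}{m}\sum_k \alpha_k\,\mathbf{F}_k^{(i)}(t), \qquad
\mathbf{v}_i(t+\Delta t) = \mathbf{v}_i(t) + \mathbf{a}_i(t)\,\Delta t, \qquad
\mathbf{x}_i(t+\Delta t) = \mathbf{x}_i(t) + \mathbf{v}_i(t+\Delta t)\,\Delta t,
$$

with the base case given by the initial positions and velocities $\mathbf{x}_i(0)$, $\mathbf{v}_i(0)$.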

Page 7

2. Forces
The simulator includes the following forces:

Flocking Forces: Cohesion*, Separation*, Alignment

Single-Agent Forces: Noise, Drag*

Environmental Forces: Obstacle Avoidance, Goal Seeking*

* Implemented with learnable coefficients (so far)
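As a sketch of how the starred, learnable coefficients might enter the model (the names and values below are illustrative, not the simulator's): the net force on an agent is a weighted sum of the raw force vectors, with only some of the weights exposed to the learner.

```python
import numpy as np

# Hypothetical per-force coefficients; starred forces are learnable.
ALPHA = {
    "cohesion":   dict(value=1.0,  learnable=True),
    "separation": dict(value=1.0,  learnable=True),
    "alignment":  dict(value=1.0,  learnable=False),
    "noise":      dict(value=0.0,  learnable=False),
    "drag":       dict(value=0.1,  learnable=True),
    "avoid":      dict(value=1.0,  learnable=False),
    "seek":       dict(value=10.0, learnable=True),
}

def total_force(raw_forces):
    """Weighted combination of raw force vectors, keyed by force name."""
    return sum(ALPHA[name]["value"] * f for name, f in raw_forces.items())
```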

Page 8

3. Objective Function

The exact function used depends upon the goals of the particular animation

I used the following objective function for the flock at time t:
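The equation on this slide was an image and cannot be recovered from the transcript. Purely as an illustration of the general shape (a neighbourhood-weighted penalty on deviation from a target spacing, consistent with the "target distance" experiments on Pages 15–16), such an objective might look like:

$$
J(t) \;=\; \sum_i \sum_{j \ne i} N\!\big(\lVert \mathbf{x}_i(t) - \mathbf{x}_j(t) \rVert\big)\,\big(\lVert \mathbf{x}_i(t) - \mathbf{x}_j(t) \rVert - d^{\ast}\big)^2,
$$

where $N(\cdot)$ is the neighbourhood function and $d^{\ast}$ a target distance. This is a guess at the form, not the author's exact function.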

The neighbourhood function implied here (and in the force calculations) will come back to haunt us on the next slide…

Page 9

4. Derivatives

In order to estimate the gradient of the objective function, the function must be differentiable.

We can build an appropriate neighbourhood function (the “N-function”) by multiplying transformed sigmoids together:
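The slide's own formula was an image; one plausible construction, using my own symbols $k$, $r_{\min}$, $r_{\max}$, is a smooth window built from the product of a rising and a falling logistic sigmoid:

$$
N(d) \;=\; \sigma\!\big(k\,(d - r_{\min})\big)\,\sigma\!\big(k\,(r_{\max} - d)\big),
\qquad \sigma(x) = \frac{1}{1 + e^{-x}}.
$$

As $k \to \infty$ this approaches a hard indicator of $r_{\min} < d < r_{\max}$, but it remains differentiable everywhere for finite $k$.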

Other derivative-related wrinkles:

• Cannot use max/min truncations
• Numerical stability issues
• Increased memory requirements

Page 10

5. Policy Search
Use Monte Carlo to estimate the expected value of the gradient:
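The estimator itself was an image; a standard Monte Carlo form consistent with the surrounding text, with $\xi_n$ the sampled initial conditions, $\boldsymbol\alpha$ the policy vector, and $J$ the objective, would be:

$$
\nabla_{\boldsymbol\alpha}\, \mathbb{E}_{\xi}\big[J(\xi;\boldsymbol\alpha)\big] \;\approx\; \frac{1}{N}\sum_{n=1}^{N} \nabla_{\boldsymbol\alpha} J(\xi_n;\boldsymbol\alpha).
$$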

This assumes that the only random variables are the initial conditions. A less noisy estimate can be made if the distributions of the stochastic forces in the model are taken into account using importance sampling.

Page 11

The Simulator

Features:
• Forward flocking simulation
• Policy learning and mapping
• Optional OpenGL visualization
• Spatial sorting gives good performance (see the sketch below)
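A minimal sketch of the kind of spatial sorting that speeds up neighbourhood queries; a uniform grid (spatial hash) is one common choice, and all names here are illustrative rather than the simulator's actual code. Shown in 2-D for brevity:

```python
from collections import defaultdict
import numpy as np

def build_grid(positions, cell):
    """Hash each agent index into an integer grid cell of side `cell`."""
    grid = defaultdict(list)
    for i, p in enumerate(positions):
        grid[tuple((p // cell).astype(int))].append(i)
    return grid

def nearby(i, positions, grid, cell):
    """Candidate neighbours of agent i from the 3x3 block of cells around it."""
    cx, cy = (positions[i] // cell).astype(int)
    out = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            out.extend(grid.get((cx + dx, cy + dy), []))
    out.remove(i)
    return out
```

With the cell size set to the neighbourhood radius, each query touches only a constant number of cells instead of all agents.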

Limitations:
• Wraparound
• Not all forces are learnable yet
• Buggy neighbourhood function derivative

Page 12

Experimental Method

Simple gradient descent:

1. Initialize flock, assign a random alpha
2. Run the simulation (N times)
3. Step (with annealing) in the negative gradient direction
4. Reset flock

Steps 2–4 are repeated for a fixed number of iterations, as sketched below.
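A runnable sketch of this loop. The helper `simulate_gradient` here is a placeholder standing in for a full flock rollout; in the real simulator it would run the flock forward from sampled initial conditions and return the gradient of the objective w.r.t. alpha. None of these names come from the actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_gradient(alpha, init):
    """Placeholder for one simulated rollout returning dJ/d(alpha).

    A real implementation would simulate the flock from `init` and
    differentiate the objective through the simulation. Here we use
    the gradient of a noisy toy quadratic so the sketch actually runs.
    """
    return 2.0 * (alpha - np.array([5.0, 1.0])) + 0.1 * rng.normal(size=2)

def policy_search(n_steps=100, n_rollouts=10, step0=0.5):
    """Simple gradient descent with annealing: steps 1-4 from the slide."""
    alpha = rng.uniform(0.0, 10.0, size=2)                 # 1. random alpha
    for k in range(n_steps):
        inits = [rng.normal(size=4) for _ in range(n_rollouts)]  # 4. reset
        # 2. Run the simulation N times; Monte Carlo-average the gradients.
        grad = np.mean([simulate_gradient(alpha, x) for x in inits], axis=0)
        # 3. Step (with annealing) in the negative gradient direction.
        alpha -= (step0 / (1.0 + 0.05 * k)) * grad
    return alpha

print(policy_search())   # converges near the toy optimum [5, 1]
```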

Page 13

Results - ia

Simple system test:

• 1 agent
• 2 forces: seek and drag
• Seek target in front of agent; agent initially moving towards target
• Simulate 2 minutes
• No noise
• Take best of 10 descents

Page 14

Results - ib

Simple system test w/noise:

• Same as before, but set the wander force to strength 1
• Used N = 10 for the Monte Carlo estimate

Effects of noise:

• Optimal seek force is larger
• Both the objective surface and the descent path are noisy

Page 15

Results - ii

More complicated system:

• 2 agents
• Seek and drag coefficients fixed at 10 and 0.1
• Learn cohesion and separation
• Seek target orbiting the agents’ start position
• Simulate 2 minutes
• Target distances of 5 and 10
• Noise in initial conditions

Results:

• Target distance does influence the optimized values
• Search often gets caught in foothills (local minima)

Page 16

Results - iii

Higher dimensions:

• 10 agents
• Learn cohesion, separation, seek, and drag
• Otherwise the same as the last test

Results:

• Objective function is being optimized (albeit slowly)!
• Target distance is matched!

Page 17

Conclusion

• Technique shows promise
• Implementation has poor search performance

Future Work

• Implement more learnable parameters
• Fix neighbourhood derivative
• Improve gradient search method
• Use importance sampling!

Demonstrations