Page 1
Trajectory Optimization for Underactuated (and other)
Robots
L_A(x_0, U; μ, λ) = ℓ_f(x_N) + Σ_{k=0}^{N−1} ℓ(x_k, u_k) + Σ_{k=0}^{N} [ ½ c(x_k, u_k)^T I_{μ_k} c(x_k, u_k) + λ_k^T c(x_k, u_k) ]

Q_xx ← Q_xx + (∂c(x,u)/∂x)^T I_μ (∂c(x,u)/∂x)
Q_uu ← Q_uu + (∂c(x,u)/∂u)^T I_μ (∂c(x,u)/∂u) + ρI
Q_xu ← Q_xu + (∂c(x,u)/∂x)^T I_μ (∂c(x,u)/∂u)
Q_x ← Q_x + c(x,u)^T I_μ (∂c(x,u)/∂x) + λ^T (∂c(x,u)/∂x)
Q_u ← Q_u + c(x,u)^T I_μ (∂c(x,u)/∂u) + λ^T (∂c(x,u)/∂u)
Scott Kuindersma — Harvard Agile Robotics Lab
http://agile.seas.harvard.edu — [email protected]
Page 2
Why Dynamic Motion Planning?
• Kinematic planning is simpler
• But many tasks require careful consideration of dynamics
• Underactuated systems highlight this point
Page 3
x = [θ; θ̇],  u ≡ base torque
ẋ = f(x, u)  (nonlinear)
[Figure: pendulum phase portrait, θ̇ vs. θ]
The Simplest Robot
Page 4
How can we choose u=π(x) such that the pendulum stabilizes in the upright position?
Invert gravity!
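As a concrete illustration of "invert gravity," here is a minimal sketch (not from the slides) assuming a unit-mass, unit-length pendulum with a little damping: the policy u = 2mgl·sin(θ) flips the sign of the gravity term, so the upright position θ = π behaves like the stable downward equilibrium of the original system.

```python
import numpy as np

# Damped pendulum: m*l^2*thetadd = u - m*g*l*sin(theta) - b*thetad
# (unit mass/length and damping b are illustrative assumptions).
m, l, g, b = 1.0, 1.0, 9.81, 0.5

def pendulum_step(x, u, h=0.01):
    theta, thetad = x
    thetadd = (u - m * g * l * np.sin(theta) - b * thetad) / (m * l**2)
    return np.array([theta + h * thetad, thetad + h * thetadd])

def invert_gravity(x):
    # u = 2*m*g*l*sin(theta) flips gravity: closed loop is
    # thetadd = +g*sin(theta) - b*thetad, stable about theta = pi.
    return 2.0 * m * g * l * np.sin(x[0])

# Start near upright; damping + inverted gravity pull the state to theta = pi.
x = np.array([np.pi - 0.5, 0.0])
for _ in range(5000):
    x = pendulum_step(x, invert_gravity(x))
```

With the gravity term inverted, the phase portrait is simply mirrored: trajectories that used to spiral into the hanging equilibrium now spiral into the upright one.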
Page 5
[Figure: phase portrait of the gravity-inverted pendulum, θ̇ vs. θ]
Control : reshaping the phase portrait
Invert Gravity
Page 6
[Figure: phase portraits of the passive and gravity-inverted pendulum, θ̇ vs. θ]
Q: minimal effort solution?
Page 7
one actuator here
How do we control this system to the vertical position?
Underactuated… we must exploit actuation and the passive dynamics
to achieve goals!
The Acrobot
Page 8
u=π(x) is not obvious, but we can compute it with
optimization! (more later)
Acrobot Swing-up
Page 9
Acrobot ~ Simple Walker
Page 10
But my robot has lots of motors…
Don’t worry!
The tools we will discuss today are useful for generating dynamic behaviors in all kinds of robots!
Page 11
The optimization view of the world
• There are many ways to generate behaviors in robots:
• Heuristic methods, virtual model control, programming by demonstration, etc.
• Optimization is a powerful and general framework for creating goal-directed behavior in robots
• Encode goals using cost and constraint functions
Page 12
Optimal Control
• Find a policy, u = π(x), that minimizes:
  J = ℓ_f(x_N) + Σ_{k=0}^{N−1} ℓ(x_k, u_k)   (discrete time)
  (“get to the goal”, “minimize energy”, “move quickly”)
• Dynamics constraints:
  subject to x_{k+1} = f(x_k, u_k)
Page 13
A note about time discretization
Continuous time:
J = ℓ_f(x(t_f)) + ∫_0^{t_f} ℓ(x(t), u(t)) dt,   ẋ = f(x, u)

Discrete time:
h = t_f / N   (time step)
J = ℓ_f(x_N) + Σ_{k=0}^{N−1} ℓ(x_k, u_k) h
x_{k+1} = x_k + h · f(x_k, u_k)
(you can use higher-order integrators too)
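The discretization above can be sketched in a few lines. This is an illustrative example (the pendulum dynamics and the quadratic cost are assumptions, not from the slides): the continuous dynamics become a forward-Euler step, and the running-cost integral becomes a sum weighted by h.

```python
import numpy as np

g = 9.81

def f(x, u):
    # Continuous-time pendulum dynamics xdot = f(x, u), x = [theta, thetadot]
    return np.array([x[1], u - g * np.sin(x[0])])

def rollout_cost(x0, U, tf, x_goal):
    N = len(U)
    h = tf / N                      # time step h = tf / N
    x, J = np.array(x0, float), 0.0
    for u in U:
        # running cost l(x, u) weighted by the time step
        J += h * (np.sum((x - x_goal)**2) + 0.1 * u**2)
        x = x + h * f(x, u)         # forward Euler: x_{k+1} = x_k + h*f(x_k, u_k)
    J += 10.0 * np.sum((x - x_goal)**2)   # terminal cost l_f(x_N)
    return J, x

J, xN = rollout_cost([0.0, 0.0], np.zeros(50), tf=1.0,
                     x_goal=np.array([np.pi, 0.0]))
```

Swapping the Euler line for an RK4 step changes only the integrator, not the structure of the discrete-time problem.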
Page 14
Example: Airplane Barrel Roll
• Goal: 180 degree barrel roll
• States: x ∈ R^12
• Input: u ∈ R^4
• Cost: J = (x_N − x_ref)^T Q (x_N − x_ref) + Σ_{k=0}^{N−1} u_k^T R u_k
Page 15
An Intuitive Solution
π(x) = ?
Find a policy, u = π(x), that minimizes cost
Idea: work backwards in time!
Page 16
An Algebraic View
• Cost: J = ℓ_f(x_N) + Σ_{k=0}^{N−1} ℓ(x_k, u_k)
• Define V*(x, n) as the lowest possible cost remaining starting in state x at timestep n
• At timestep N, there’s nothing to do!
  V*(x, N) = ℓ_f(x)
Page 17
An Algebraic View
x′ = f(x, u)
V*(x, N−1) = min_u [ℓ(x, u) + ℓ_f(x′)]
V*(x, N−2) = min_u [ℓ(x, u) + min_{u′} [ℓ(x′, u′) + ℓ_f(x″)]]
           = min_u [ℓ(x, u) + V*(x′, N−1)]
Page 18
An Algebraic View
x′ = f(x, u)
V*(x, N−1) = min_u [ℓ(x, u) + ℓ_f(x′)]
V*(x, N−2) = min_u [ℓ(x, u) + min_{u′} [ℓ(x′, u′) + ℓ_f(x″)]]
           = min_u [ℓ(x, u) + V*(x′, N−1)]
• Key idea… Bellman’s equation:
  V*(x, n) = min_u [ℓ(x, u) + V*(x′, n+1)]
Page 19
Optimal behavior with 1-step lookahead
π*(x, n) = argmin_u [ℓ(x, u) + V*(x′, n+1)]
Page 20
Quiz
• Q1: Suppose my cost is quadratic; is V*(x,n) also quadratic in general?
• Q2: Does Bellman’s equation hold for multiplicative costs?
V*(x, n) = min_u [ℓ(x, u) + V*(x′, n+1)]
A1: No (thanks to nonlinear dynamics)
A2: No… try it on our derivation!
J = Π_{k=0}^{N} ℓ(x_k, u_k)
Page 21
Clearly V*(x,n) is useful, so how can we compute it?
Page 22
Dynamic Programming
• Turn Bellman equation into an update rule
• Simple algorithm:
  • Discretize state space
• Loop over all states
• Apply Bellman update
• Until V* converged
➡ Polynomial in the number of states
➡ …but the number of states grows exponentially in the dimension of the robot
V*(x, n) ← min_u [ℓ(x, u) + V*(f(x, u), n+1)]
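The simple algorithm above can be sketched on a toy problem. This is an illustrative example (the 1-D grid world, costs, and horizon are all assumptions): discretize the states, loop over all of them, and apply the Bellman update backwards from the horizon; the optimal policy then falls out via 1-step lookahead.

```python
import numpy as np

S = np.arange(-5, 6)          # discretized 1-D state space
A = [-1, 0, 1]                # discrete actions (move left/stay/right)
N = 12                        # horizon

def step(s, a):
    # Deterministic dynamics with clipping at the grid boundary
    return int(np.clip(s + a, -5, 5))

cost = lambda s, a: abs(s) + 0.1 * abs(a)   # running cost l(x, u)

V = {n: {} for n in range(N + 1)}
V[N] = {s: float(abs(s)) for s in S}        # V*(x, N) = l_f(x)

# Bellman update, looping over all states, backwards in time
for n in range(N - 1, -1, -1):
    for s in S:
        V[n][s] = min(cost(s, a) + V[n + 1][step(s, a)] for a in A)

# Optimal behavior with 1-step lookahead:
# pi*(x, n) = argmin_u [ l(x, u) + V*(x', n+1) ]
pi0 = {s: min(A, key=lambda a: cost(s, a) + V[1][step(s, a)]) for s in S}
```

Even here the cost of the sweep is |S|·|A| per timestep; discretizing a 12-dimensional robot state the same way is exactly the exponential blowup the next slide describes.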
Page 23
Curse of Dimensionality
• As state dimension increases, discretization results in an exponential blowup
• What if we instead settle for local policies around a trajectory?
• Which trajectory?
• How “local”?
Page 24
Differential Dynamic Programming
1. Input: x_0, u_{0:N−1}
2. Simulate dynamics forward using x_{k+1} = f(x_k, u_k)
3. Compute an approximation to V(x,k) at every state
4. Compute control modifications using Bellman’s equation
5. Go to #2 until convergence
[Figure: trajectory from x_0 to x_goal]
Page 25
Differential Dynamic Programming
• Given initial trajectory x_{0:N}, u_{0:N−1} as input,
• Define: Q(δx, δu) = ℓ(x + δx, u + δu) + V(f(x + δx, u + δu))
• Approximate Q(δx, δu) to second order about δx, δu = 0
[Figure: quadratic approximation of the Q function]
Page 26
Differential Dynamic Programming
Q(δx, δu) ≈ ½ [1; δx; δu]^T [ 0, Q_x^T, Q_u^T ; Q_x, Q_xx, Q_xu ; Q_u, Q_xu^T, Q_uu ] [1; δx; δu],
where Q_xu = ∂²Q/∂x∂u, etc.
[Figure: quadratic approximation of the Q function about δx, δu = 0]
[see, e.g., Tassa thesis 2011]
Page 27
Backwards Pass
• Want to minimize Q backwards in time
• For k=N:0
  • Set derivative of Q with respect to δu equal to zero, solve for δu
  • Substitute δu into Q approximation at time k and compute Q approximation at k−1
δu = −Q_uu^{−1} (Q_ux δx + Q_u) ≡ K δx + a
Page 28
Forwards Pass
• Forward simulate system from x_0 using feedback controller computed in backwards pass:
  x_{k+1} = f(x_k, u_k + K_k δx_k + a_k)
• Repeat until convergence, e.g., max(|δu|) < ε
[Figure: nominal and updated trajectories from x_0 to x_goal]
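The backward/forward structure above can be sketched end to end. This is a compact iLQR sketch under stated assumptions (unit-mass/unit-length pendulum, forward-Euler dynamics, scalar torque, fixed regularization ρ, and a crude backtracking line search), not the full implementation discussed in the talk.

```python
import numpy as np

g, h, N = 9.81, 0.05, 60
Q, R, Qf = np.diag([1.0, 0.1]), 0.01, np.diag([100.0, 10.0])
x_goal = np.array([np.pi, 0.0])          # pendulum swing-up target

def f(x, u):
    # Euler-discretized pendulum: x = [theta, thetadot]
    return x + h * np.array([x[1], u - g * np.sin(x[0])])

def rollout(x0, U, X_ref=None, K=None, a=None, alpha=1.0):
    # Forward pass: x_{k+1} = f(x_k, u_k + alpha*a_k + K_k (x - xbar_k))
    X, Unew, J = [np.asarray(x0, float)], [], 0.0
    for k in range(N):
        u = U[k] if K is None else U[k] + alpha * a[k] + K[k] @ (X[k] - X_ref[k])
        J += 0.5 * (X[k] - x_goal) @ Q @ (X[k] - x_goal) + 0.5 * R * u * u
        Unew.append(u)
        X.append(f(X[k], u))
    J += 0.5 * (X[N] - x_goal) @ Qf @ (X[N] - x_goal)
    return X, Unew, J

def backward_pass(X, U, rho=1e-6):
    # Quadratize Q(dx, du) and solve du = -Quu^{-1}(Qux dx + Qu) = K dx + a
    K, a = [None] * N, [None] * N
    Vx, Vxx = Qf @ (X[N] - x_goal), Qf.copy()
    for k in range(N - 1, -1, -1):
        A = np.array([[1.0, h], [-h * g * np.cos(X[k][0]), 1.0]])  # df/dx
        B = np.array([0.0, h])                                     # df/du
        Qx = Q @ (X[k] - x_goal) + A.T @ Vx
        Qu = R * U[k] + B @ Vx
        Qxx = Q + A.T @ Vxx @ A
        Quu = R + B @ Vxx @ B + rho        # iLQR: no 2nd deriv. of dynamics
        Qux = B @ Vxx @ A
        K[k], a[k] = -Qux / Quu, -Qu / Quu
        Vx = Qx + K[k] * (Quu * a[k]) + K[k] * Qu + Qux * a[k]
        Vxx = Qxx + np.outer(K[k], K[k]) * Quu \
            + np.outer(K[k], Qux) + np.outer(Qux, K[k])
        Vxx = 0.5 * (Vxx + Vxx.T)
    return K, a

x0, U = np.array([0.0, 0.0]), [0.0] * N
X, U, J0 = rollout(x0, U)
J = J0
for _ in range(30):
    K, a = backward_pass(X, U)
    for alpha in (1.0, 0.5, 0.25, 0.1, 0.01):   # accept only improving steps
        Xn, Un, Jn = rollout(x0, U, X, K, a, alpha)
        if Jn < J:
            X, U, J = Xn, Un, Jn
            break
```

Because second derivatives of the dynamics are dropped in Quu/Qux, this is the iLQR variant from the next slide rather than full DDP.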
Page 29
Some DDP Variants
• Most commonly used variant is Iterative LQR (iLQR) [Li et al., 2004]
  • Leave out the 2nd derivative of dynamics in Q
  • Slightly slower convergence (superlinear vs quadratic), otherwise exactly the same algorithm
• Input-constrained DDP to handle torque limits [Tassa et al., ICRA’14]
• Unscented dynamic programming [Manchester & Kuindersma, CDC’16]
  • Derivative-free, Q update based on the Unscented Transform
Page 30
Examples
• Barrel Roll
• Todorov MPC
Page 31
DDP for Model-Predictive Control
Page 32
DDP for Model-Predictive Control
Page 33
So far…
• Computing globally-optimal policies is infeasible for most robots
• Instead settle for locally-optimal trajectories
• DDP is an efficient algorithm for computing trajectories and local feedback controllers, but…
➡ State constraints require care (typically approximated using costs)
➡ Attention to implementation details needed to ensure good numerical performance
➡ Assumes f(x,u) is a smooth function of x,u
Page 34
Trajectory Optimization as an NLP
• We arrived at DDP using Bellman’s equation + local approximation
• We can instead transcribe the optimal control problem as a standard nonlinear program:

minimize_{x_{0:N}, u_{0:N−1}}  ℓ_f(x_N) + Σ_{k=0}^{N−1} ℓ(x_k, u_k)
subject to  x_{k+1} = f(x_k, u_k)   (explicit dynamics constraints)
            g(x_k, u_k) ≤ 0         (“don’t walk through walls”)
            h(x_k, u_k) = 0         (“keep your left foot on the ground”)
Page 35
Direct Transcription
• Why?
✓ Easy to add nonlinear state and input constraints
✓ Exploit fast and reliable off-the-shelf solvers (e.g., SNOPT [1])
• Spend your time designing costs/constraints, not debugging solvers
[1] http://www.sbsi-sol-optimize.com/asp/sol_product_snopt.htm
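To make the transcription concrete, here is an illustrative sketch (my toy example, not from the slides; a real problem would go to an NLP solver such as SNOPT): a double integrator must reach p = 1 at rest. All states x_1..x_N and inputs u_0..u_{N−1} are decision variables, and the Euler dynamics x_{k+1} = A x_k + B u_k appear as explicit equality constraints. With a quadratic cost this is a QP, solved here directly through its KKT system.

```python
import numpy as np

N, hh = 20, 0.1
A = np.array([[1.0, hh], [0.0, 1.0]])     # double integrator, Euler step
B = np.array([[0.0], [hh]])
x0, xg = np.array([0.0, 0.0]), np.array([1.0, 0.0])

n = 2 * N + N                 # z = [x_1..x_N (2N vars), u_0..u_{N-1} (N vars)]
# Cost: (1/2) sum u_k^2, plus a tiny state regularizer for a nonsingular KKT
H = np.diag([1e-6] * (2 * N) + [1.0] * N)
C = np.zeros((2 * N + 2, n))  # 2N dynamics rows + 2 terminal-constraint rows
d = np.zeros(2 * N + 2)

for k in range(N):            # constraint rows: x_{k+1} - A x_k - B u_k = 0
    r = slice(2 * k, 2 * k + 2)
    C[r, 2 * k:2 * k + 2] = np.eye(2)
    if k > 0:
        C[r, 2 * (k - 1):2 * k] = -A
    else:
        d[r] = A @ x0         # x_0 is fixed, so it moves to the right-hand side
    C[r, 2 * N + k] = -B[:, 0]
C[2 * N:, 2 * (N - 1):2 * N] = np.eye(2)  # final state constraint x_N = x_goal
d[2 * N:] = xg

# Solve the equality-constrained QP via its KKT system [[H, C^T],[C, 0]]
KKT = np.block([[H, C.T], [C, np.zeros((C.shape[0], C.shape[0]))]])
sol = np.linalg.solve(KKT, np.concatenate([np.zeros(n), d]))
states = sol[:2 * N].reshape(N, 2)
```

Nonlinear dynamics or inequality constraints break the single-QP structure; that is exactly where the SQP iteration on the next slide takes over.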
Page 36
Intuition: Newton’s Method
• A similar picture as before (ignoring constraints)
• Now updates are performed over the entire trajectory x_0, u_0, …, x_N, rather than sequentially backwards in time
[Figure: Newton step on J over the whole trajectory vs. sequential Q(δx, δu) updates at k = N, k = N − 5, …]
Page 37
Sequential Quadratic Programming
• Many powerful off-the-shelf solvers implement SQP
• High-level idea:
• At each iteration linearize constraints at current trajectory, construct local quadratic cost
• Solve local quadratic program (QP)—this is easy
• Iterate until trajectory converges or max iters
➡ Lots of important practical details you can safely ignore if using an established solver!
[see Nocedal & Wright, 2006]
Page 38
DDP vs. SQP
• Differential Dynamic Programming:
  • Simulates dynamics to compute x_{1:N}
  • “Shooting method”
  • Each iteration solves a sequence of small, 1-step optimizations
• SQP:
  • Satisfies dynamics as explicit constraints (often with smaller N)
  • “Direct transcription”
  • Each iteration solves a single QP over the entire trajectory
Page 39
Two ends of a spectrum
DDP (single shooting): Simulate entire trajectory, 1-step updates
Multiple shooting: Simulate sub-trajectories, enforce defect constraints
Direct transcription: Dynamics are constraints for all timesteps, N-step updates
Page 40
Example: Spring Flamingo
• States: x = [q; q̇] ∈ R^18   Inputs: u ∈ R^4
• Passive spring ankle
• Cost: J = Σ_k (x_k − x_nom)^T Q (x_k − x_nom) + u_k^T R u_k
• Final state constraint: x_N = x_goal
• Dynamics constraints?
Page 41
Handling Contact Dynamics
• Idea: add contact forces, λ_{0:N−1}, as decision variables, then constrain them to be physical
• Approach 1: pre-specify contact sequence ahead of time, assign force variables to states in contact
  • Works fine for two feet, but exponentially many possibilities as # of contact points grows
• Approach 2: Let the solver “discover” contact sequences for you!
Page 42
Contact-Implicit Constraints
• Add contact forces, λ_{0:N−1}, as decision variables at each time step, then constrain them to be physical
• Forces must obey friction limits
• Robot cannot penetrate the ground: φ(q) ≥ 0
• Forces cannot be applied unless on the ground: φ(q) · λ = 0
• Tangential component maximally dissipates energy
Page 43
Contact-Implicit Direct Transcription
minimize_{x_{0:N}, u_{0:N−1}, λ_{0:N−1}}  ℓ_f(x_N) + Σ_{k=0}^{N−1} ℓ(x_k, u_k)
subject to  x_{k+1} = f(x_k, u_k, λ_k)   (dynamics includes forces)
Contact-related constraints:
  φ(q_k) ≥ 0
  φ(q_k) · λ_k = 0
  friction, dissipation constraints
[see: Posa et al. IJRR’13]
Page 44
Complementarity Constraints
λ ≥ 0,  φ ≥ 0,  λ · φ = 0
Useful trick: add slack variable s ≥ 0, relax λ · φ = 0 to λ · φ ≤ s, then penalize s in the cost
Page 45
Spring Flamingo SQP Optimization
Page 46
Spring Flamingo SQP Converged
Page 47
LittleDog Climbing a Stair
Page 49
Tracking Trajectories
• Unlike DDP, direct methods output only the nominal trajectory
• Running this open loop will fail, even in simulation
• Hand-tuned PID can work in some cases
• Linear-quadratic regulator (LQR)
• a slightly more elegant way to derive linear feedback controllers
Page 50
LQR Trajectory Tracking
• Given trajectory: x̄_{0:N}, ū_{0:N−1}
• Linearize dynamics: x̃_{k+1} = A_k x̃_k + B_k ũ_k, where x̃_k = x − x̄_k, ũ_k = u − ū_k
• Define a tracking cost:
  J_lqr = x̃_N^T Q_N x̃_N + Σ_{k=0}^{N−1} (x̃_k^T Q x̃_k + ũ_k^T R ũ_k)
  • Note—in general, different from the trajectory optimization cost
• Desired output: a tracking controller, π(x, k), that works in the neighborhood of the trajectory
Page 51
LQR Trajectory Tracking
• The return of Bellman!
• Thanks to the linearized dynamics, V* will have a quadratic form for all k=1:N
V*(x̃, N) = x̃^T Q_N x̃
V*(x̃, k) = x̃^T Q x̃ + min_ũ [ũ^T R ũ + V*(x̃′, k+1)]
V*(x̃, k) = x̃^T S_k x̃
(skipping ugly algebra)
S_{k−1} = Q + A_k^T S_k A_k − A_k^T S_k B_k (R + B_k^T S_k B_k)^{−1} (B_k^T S_k A_k)
Page 52
Summary of LQR
• Set S_N = Q_N
• For k=N:1
  S_{k−1} = Q + A_k^T S_k A_k − A_k^T S_k B_k (R + B_k^T S_k B_k)^{−1} (B_k^T S_k A_k)
• For k=0:N−1
  K_k = (R + B_k^T S_{k+1} B_k)^{−1} B_k^T S_{k+1} A_k
• Output linear feedback controller: u_k^lqr = −K_k (x − x̄_k) + ū_k
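The recipe above translates almost line-for-line into code. This is an illustrative sketch (my assumption: the same double-integrator linearization A_k = A, B_k = B at every step, with made-up weights): a backward Riccati sweep for S_k, the time-varying gains K_k, and the tracking controller u = −K_k(x − x̄_k) + ū_k.

```python
import numpy as np

N, h = 50, 0.05
A = np.array([[1.0, h], [0.0, 1.0]])      # linearized dynamics (constant here)
B = np.array([[0.0], [h]])
Qc, Rc, QN = np.eye(2), np.array([[0.1]]), 10 * np.eye(2)

# Set S_N = Q_N, then sweep the Riccati recursion backwards for k = N:1
S = [None] * (N + 1)
S[N] = QN
for k in range(N, 0, -1):
    SB = S[k] @ B
    S[k - 1] = Qc + A.T @ S[k] @ A \
        - A.T @ SB @ np.linalg.solve(Rc + B.T @ SB, SB.T @ A)

# Gains for k = 0:N-1: K_k = (R + B^T S_{k+1} B)^{-1} B^T S_{k+1} A
K = [np.linalg.solve(Rc + B.T @ S[k + 1] @ B, B.T @ S[k + 1] @ A)
     for k in range(N)]

def tracking_control(x, k, xbar, ubar):
    # u_lqr = -K_k (x - xbar_k) + ubar_k
    return ubar[k] - K[k] @ (x - xbar[k])

# Example: regulate to the origin (xbar = 0, ubar = 0) from a perturbed start.
x = np.array([1.0, 0.0])
for k in range(N):
    u = tracking_control(x, k, [np.zeros(2)] * N, [np.zeros(1)] * N)
    x = A @ x + B @ u
```

For a real trajectory, A_k and B_k would come from linearizing f along x̄_{0:N}, ū_{0:N−1}; the recursion itself is unchanged.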
Page 53
Control Stabilized Running
Page 54
LQR Variants
• Slightly different formulations for continuous time and infinite horizon
• Note MATLAB command: lqr(A,B,Q,R)
• Can handle equality constraints through projection into constraint nullspace (e.g. [Posa, Kuindersma, Tedrake, ICRA’15])
• “Hybrid” variants for handling impacts and reset maps
• Instantaneously handle input constraints using QPs (e.g. [Kuindersma et al., Auton. Robots 2016])
Page 55
Summary
• Dynamics matters, especially (but not only) for underactuated systems
• Most of these ideas have a long history in the control literature outside of robotics
• Shooting vs direct transcription
• Multiple shooting is a middle-ground
• Linear-quadratic feedback is a powerful tool for tracking planned motions
• Contact constraints are essential for planning manipulation and locomotion behaviors. Many competing strategies exist; I gave you one
• Still work to be done…
Page 56
Real robots still avoid contact, but we’re working on it!