Page 1:

Machine Learning for Control: Experiments with Agile Quadrupeds and Perching UAVs

Russ Tedrake
Assistant Professor

MIT Computer Science and Artificial Intelligence Lab

Page 2:

The State-of-the-Art in Bipedal Walking Robots

Honda’s ASIMO

Page 3:

Performance of Honda’s ASIMO Control

Works well on flat terrain, and even up stairs

Trajectories are constrained by an overly restrictive measure of dynamic balance

Cannot compete with humans in terms of:

Speed (0.83 m/s top speed)

Efficiency (uses roughly 20x the energy, suitably scaled, of a human)

Robustness (no examples on uneven or unmodelled terrain)

Page 4:

The Challenge: Underactuated Systems

Page 5:

Walking is underactuated

Consider a 7-link planar biped:

6 actuators (one at each joint)

Want to control 7+ degrees of freedom.

Note: “Fully”-actuated if we assume that one foot is bolted to the ground (walking robotic arms)
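To make the counting argument concrete, here is a minimal numpy sketch (the matrix layout is an illustrative assumption, not the robot’s actual model): six joint torques enter through an actuation matrix B with only six independent columns, so one of the seven generalized coordinates can never be commanded directly.

```python
import numpy as np

# 7-link planar biped from the slide: 7+ degrees of freedom, 6 joint actuators.
n_dof, n_act = 7, 6

# Hypothetical actuation matrix: torques act on the 6 joints; the remaining
# coordinate (e.g. absolute orientation of the stance foot) is unactuated.
B = np.zeros((n_dof, n_act))
B[1:, :] = np.eye(n_act)

# rank(B) < n_dof means some generalized accelerations cannot be produced.
print(np.linalg.matrix_rank(B), "<", n_dof)  # prints: 6 < 7 -> underactuated
```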

Page 6:

Honda’s ASIMO Control

Honda’s solution: constrain the dynamics

Keep foot flat on the ground (fully actuated)

Estimate danger of foot roll by measuring ground reaction forces

Carefully design desired trajectories

Keep knees bent (avoid singularities)

High-gain PD control (the control law is recalled below)

Same approach used by a large number of “ZMP walkers”
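For reference, the high-gain joint control in the last item is the standard PD tracking law (the notation here is mine, not from the slides):

\tau = K_p\,(q_d - q) + K_d\,(\dot{q}_d - \dot{q})

with gains chosen large enough that tracking error stays small whatever the natural dynamics do; the passive-dynamic approach on the following slides deliberately gives this up.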

Page 7:

Passive Dynamic Walking

[Collins et al., 2001]

Page 8:

Robust and Efficient Bipeds

To achieve high performance, we must relinquish control!

High-gain feedback is not necessary

Passive walker only works downhill, initialized by a “skilled hand”

Challenge: How do you design a minimal control system to push and pull the natural dynamics?

Page 9:

A Machine Learning Approach

Reinforcement learning

Every time the robot runs, give it a score (cost function)

“Learning” algorithm on the robot associates control actions with rewards (a minimal sketch of the loop follows below)

Through trial and error, the robot can learn very advanced skills

Improved algorithms allow the robot to learn more skills in fewer trials

Reinforcement learning is also known as approximate optimal control
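A minimal sketch of the trial-and-error loop described above, written as a generic score-and-perturb policy search (an illustrative stand-in, not the specific algorithm used on the robot):

```python
import numpy as np

def policy_search(run_robot, n_params, trials=400, sigma=0.1, seed=0):
    """Generic score-and-perturb policy search.

    run_robot(w) executes one trial with policy parameters w and returns
    a scalar score -- the per-run cost/reward from the slide.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(n_params)
    best = run_robot(w)
    for _ in range(trials):
        w_try = w + sigma * rng.standard_normal(n_params)  # perturb policy
        score = run_robot(w_try)                           # one robot run
        if score > best:                                   # keep improvements
            w, best = w_try, score
    return w
```

Better estimators of the score gradient are precisely what “learn more skills in fewer trials” refers to.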

Page 16:

Reinforcement learning example

Page 17:

Learning To Walk in 20 minutes

Science, 2005

Page 18:

Learning Results

Learning only on the robot (no simulation)

Algorithm comes with theoretical convergence guarantees

Stochastic gradient descent on the average reward, despite noise and disturbances (the update is written out below)

Converges to a local minimum

Very fast and robust convergence

Learns from a “blank slate” in < 20 minutes

After initial learning, adapts to changes in a few steps

Always learning, always adapting to the terrain as it walks
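The update behind these claims can be written as a weight-perturbation stochastic gradient step (a generic form I am assuming here; the paper’s exact estimator may differ). With policy parameters w, a random perturbation z, step size \eta, and a reward baseline b:

\Delta w = \eta\,\big( R(w + \sigma z) - b \big)\, z

In expectation this step is proportional to \nabla_w R, so the parameters ascend the average reward even though each individual trial is noisy.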

Page 22:

How far can we take this learning idea?

Page 23:

Simple Bipeds on Rough Terrain

Page 24:

Quadrupeds on rough terrain (motion planning)

Page 25:

High-dimensional underactuated motion planning

Formulate underactuated control as a search problem (a.k.a. kinodynamic motion planning)

Current real-time methods limited to relatively low-dimensional problems (simple robots)

Underactuated systems are notoriously difficult (tunnels and tubes)

Big idea: Exploit structure in the equations of motion

H(q)\ddot{q} + C(q,\dot{q})\dot{q} + G(q) = B\tau

Example: Dimensionality reduction for motion planning

Make a high-dimensional underactuated system act like a low-dimensional fully-actuated system

Method either produces the correct torques, or says “can’t do that” (see the sketch below)

Perform a low-dimensional search
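Here is a minimal numpy sketch of the “produce the correct torques, or say ‘can’t do that’” step (function and variable names are mine): given the manipulator-equation terms, solve for the torques that supply the generalized force a desired acceleration requires, and report infeasibility when the residual is large.

```python
import numpy as np

def torques_for_accel(H, C, G, B, qd, qdd_des, tol=1e-9):
    """Realize a desired acceleration on an underactuated system, if possible.

    Dynamics: H(q) qdd + C(q, qd) qd + G(q) = B tau.
    Returns tau when B tau can supply the required generalized force,
    otherwise None ("can't do that").
    """
    f_req = H @ qdd_des + C @ qd + G              # force the acceleration needs
    tau, *_ = np.linalg.lstsq(B, f_req, rcond=None)
    if np.linalg.norm(B @ tau - f_req) > tol:     # outside the actuated subspace
        return None
    return tau
```

A low-dimensional planner can then search only over motions for which this check succeeds.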

Page 26:

LittleDog highlights

Page 27:

Flapping-Winged Flight

Page 29:

Autonomous Flapping-Winged Flight

Page 31:

Control in unsteady fluid dynamics

Page 32:

Two Goals for Flapping Flight

Probably won’t beat a propeller for efficient forward flight

Two goals for outperforming fixed-wing aircraft:

1 Aggressive aerial maneuvers (e.g. landing on a perch)

2 “Harvesting” energy from the air

Page 33:

The Heaving Foil (work with Jun Zhang, NYU)

[Vandenberghe et al., 2006]

Rigid, symmetric wing

Driven vertically

Free to rotate horizontally

Page 34:

Symmetry breaking leads to forward flight

Page 35:

Flow visualization

Page 36:

Prospects for optimization

Previous work considered only sinusoidal trajectories

Control problem: optimize stroke form to maximize efficiency

CFD model [Alben and Shelley, 2005]

Takes approximately 36 hours to simulate 30 flaps

Can we perform the optimization directly in the fluid?

Page 39:

Learning results

Parameterized spline trajectory (a sketch follows after the figure)

Policy gradient learning

Learning improves efficiency 3x in just 15 minutes

Dynamic explanation of optima

Implications:

Optimization in experimental fluid dynamics

Control for the birds

[Figure: reward vs. trial (0–50) for two initial conditions: a sine wave and a smoothed square wave]
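A sketch of the spline parameterization (an assumption about the setup, for illustration only): the stroke is a periodic trajectory through a handful of knot heights, and those heights are the parameters the policy-gradient learner adjusts.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def make_stroke(knot_heights, period=1.0):
    """Periodic heaving stroke z(t) built from learnable spline knots."""
    n = len(knot_heights)
    t_knots = np.linspace(0.0, period, n + 1)
    z_knots = np.append(knot_heights, knot_heights[0])   # close the loop
    return CubicSpline(t_knots, z_knots, bc_type="periodic")

# Roughly sinusoidal initial condition; the learner perturbs these four numbers.
stroke = make_stroke(np.array([0.0, 1.0, 0.0, -1.0]))
z = stroke(np.linspace(0.0, 1.0, 200))   # commanded vertical position, one flap
```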

Page 40:

An airplane that can land on a perch

Conventional aircraft use high-gain feedback and avoid complicated nonlinear dynamics regimes (just like ASIMO)

Higher performance in maneuverability and efficiency if you exploit the nonlinear, unsteady fluid dynamics (just like walking)

A benchmark problem: landing on a perch

Page 41:

Experiment Design

Glider (no propeller)

Dihedral (passive roll stability)

Offboard sensing and control

Page 42:

System Identification

Real flight data

Very high angle-of-attack regimes

Surprisingly good match to theory (the flat-plate model is recalled after the figures)

Vortex shedding

[Figures: lift coefficient (Cl) and drag coefficient (Cd) vs. angle of attack, −20° to 160°, comparing glider data against flat plate theory]
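The “flat plate theory” curves are, in the perching literature, the standard post-stall flat-plate approximations; here is a small sketch for reproducing them (I am assuming these are the formulas behind the slide’s curves):

```python
import numpy as np

def flat_plate_coeffs(alpha_deg):
    """Flat-plate lift/drag model, valid over very large angles of attack:
    Cl = 2 sin(a) cos(a),  Cd = 2 sin^2(a)."""
    a = np.deg2rad(alpha_deg)
    return 2.0 * np.sin(a) * np.cos(a), 2.0 * np.sin(a) ** 2

alpha = np.linspace(-20.0, 160.0, 50)       # range shown in the figures
cl, cd = flat_plate_coeffs(alpha)           # compare against glider flight data
```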

Page 43:

Glider Perching

Enters motion capture @ 6 m/s.

Perch is < 3.5 m away.

Entire trajectory @ 1 second.

Requires separation!
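A back-of-the-envelope check (my arithmetic, not from the slide) shows how aggressive this maneuver is: stopping from 6 m/s within 3.5 m requires an average deceleration of

a \approx \frac{v^2}{2d} = \frac{(6\ \mathrm{m/s})^2}{2 \times 3.5\ \mathrm{m}} \approx 5.1\ \mathrm{m/s^2}

roughly half a g of sustained aerodynamic braking, generated while the wing operates far beyond stall.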

Page 44:

Conclusions

New tools from machine learning can help solve underactuated control problems, such as the control of locomotion, but we must take advantage of:

Mechanical design and plant dynamics

Classical control

Approximate optimal control solutions can exploit the dynamics of walking machines to produce efficient and robust gaits.

Initial evidence suggests that designing control policies for fluid systems (at intermediate Reynolds numbers) might be easier than completely describing their dynamics (birds don’t solve Navier-Stokes)

Page 45:

Alben, S. and Shelley, M. (2005). Coherent locomotion as an attracting state for a free flapping body. Proceedings of the National Academy of Sciences, 102(32):11163–11166.

Collins, S. H., Wisse, M., and Ruina, A. (2001). A three-dimensional passive-dynamic walking robot with two legs and knees. International Journal of Robotics Research, 20(7):607–615.

Vandenberghe, N., Childress, S., and Zhang, J. (2006). On unidirectional flight of a free flapping wing. Physics of Fluids, 18.
