Top Banner
Towards Safe Learning-Based Control Performance & Constraint Satisfaction under Uncertainties Melanie Zeilinger Institute for Dynamic Systems and Control [email protected] Collaboration with: E. Klenske, P. Hennig, B. Schölkopf (MPI) K. Akametalu, J. Fisac C. Tomlin (UC Berkeley)
45

Towards Safe Learning-Based Control

Dec 18, 2021

Download

Documents

dariahiddleston
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Towards Safe Learning-Based Control

Towards Safe Learning-Based ControlPerformance & Constraint Satisfaction under Uncertainties

Melanie ZeilingerInstitute for Dynamic Systems and Control

[email protected]

Collaboration with:E. Klenske, P. Hennig, B. Schölkopf (MPI)K. Akametalu, J. Fisac C. Tomlin (UC Berkeley)

Page 2: Towards Safe Learning-Based Control

Variable speed control of compressorsABB drives control the compressors of the world‘s longest gas export pipeline

theweek.comabb.com control.ee.ethz.ch

PerformanceSafety

Page 3: Towards Safe Learning-Based Control

Performance and SafetyGo Together

no knowledge of safety boundaries

with knowledge of safety boundaries

Video courtesy of Kene Akametalu

Page 4: Towards Safe Learning-Based Control

Variable speed control of compressorsABB drives control the compressors of the world‘s longest gas export pipeline

theweek.comabb.com control.ee.ethz.ch

Model! System dynamics

! Constraints

! Computation

PerformanceSafety

Page 5: Towards Safe Learning-Based Control

The Quest for a Good ModelModel Uncertainty

x = f (x, u, t, d) g(x, u, t, d) � 0

Complexity External effects Variation

tcadsolutions.com

autoguide.com

Page 6: Towards Safe Learning-Based Control

Optimization-based Control High Performance for Constrained Systems

6

mea

sure

men

ts

Dynamical System

control input

minN�

k=0

l(x(k), u(k)) + g(x(N))

Z�[� x(k + 1) = f (x(k), u(k), d(k))

(x(k), u(k)) � X

Standard control

Model predictive control

Page 7: Towards Safe Learning-Based Control

Optimization-based Control High Performance for Constrained Systems

7

mea

sure

men

ts

Dynamical System

control input

minN�

k=0

l(x(k), u(k)) + g(x(N))

Z�[� x(k + 1) = f (x(k), u(k), d(k))

(x(k), u(k)) � X

Rely on system model! High performance! Recursive constraint satisfaction! Stability by design! Automatic synthesis

Page 8: Towards Safe Learning-Based Control

Automatic Synthesis Of High Performance, Safe Controllers

8

Learning&

Controller

Performance &

Safety

Software/Implementation

mea

sure

men

ts

Dynamical System

control input

Environment Human

Online data

Specifications+

Online data

Learning-basedcontroller

Page 9: Towards Safe Learning-Based Control

Previous WorkReal-time Constraints in Optimization-based Control

9

ControllerPerformance

& Safety

Software/Implementation

! Real-time model predictive control [Zeilinger et al., 2008 – 2014]

! High-speed, certified optimization [Domahidi, Z. et al., 2012; Pu, Z. et al., 2014, 2016]

mea

sure

men

ts

Dynamical System

control input

minN�

k=0

l(x(k), u(k)) + g(x(N))

Z�[� x(k + 1) = f (x(k), u(k), d(k))

(x(k), u(k)) � X

Software/Specifications+

Online data

Page 10: Towards Safe Learning-Based Control

Outline

10

mea

sure

men

ts

Dynamical System

control input

Environment Human

Online data

This talk:! Learning for performance: Tailoring optimal controllers online ! Safety guarantees: Constraint satisfaction for any performance controller12

Learning-basedcontroller

Page 11: Towards Safe Learning-Based Control

Outline

11

mea

sure

men

ts

Dynamical System

control input

Environment Human

Online data

This talk:! Learning for performance: Tailoring optimal controllers online ! Safety guarantees: Constraint satisfaction for any performance controller12

minN�

k=0

l(x(k), u(k)) + g(x(N))

Z�[� x(k + 1) = f (x(k), u(k), d(k))

(x(k), u(k)) � X

How to predict complex behavior, quantify & reduce

uncertainties?

Page 12: Towards Safe Learning-Based Control

Quantifying Uncertain Predictions From Online Data

12

estimated RegressionDynamical

Model

Infer function d (t ) online:

measured states

applied inputs

{x(t k)}Kk=0

{u(t k)}Kk=0

�d(t k)

� Kk=0

inferred function d(t )

e.g. x(k + 1 )

= f (x(k ), u(k )) + d(x, u, k )

x(k + 1) = f (x(k), u(k), d(k))

d

x 1 x2

Page 13: Towards Safe Learning-Based Control

Quantifying Uncertain Predictions From Online Data Using Gaussian Process Regression

13

meanfunction

variance

measured states

applied inputs

estimated GPRegression

Dynamical Model

{x(t k)}Kk=0

{u(t k)}Kk=0

d(t )

� 2d (t )

" Predictive distribution

Infer function d (t ) online:

GP prior: mean + covariance fcn

d(t )

2 �d (t )

d(t )

t

�d(t k)

� Kk=0

x(k + 1) = f (x(k), u(k), d(k))

Page 14: Towards Safe Learning-Based Control

Quantifying Uncertain Predictions From Online Data Using Gaussian Process Regression

14

meanfunction

variance

measured states

applied inputs

estimated GPRegression

Dynamical Model

{x(t k)}Kk=0

{u(t k)}Kk=0

d(t )

� 2d (t )

Infer function d (t ) online:

GP prior: mean + covariance fcn

�d(t k)

� Kk=0

Examples:! Mechanical errors! Prediction of loads, efficiencies etc. in energy systems! Unknown nonlinear dynamics (e.g. ground effects)

Gears Cause Periodic Errorsexample: astrophotography

1

" Predictive distribution

x(k + 1) = f (x(k), u(k), d(k))

Page 15: Towards Safe Learning-Based Control

Example:Telescope Guiding

Photograph by Robert Vanderbei, La Palma, 2012

Page 16: Towards Safe Learning-Based Control

Quantifying Uncertain Predictions From Online Data Using Gaussian Process Regression

16

meanfunction

variance

measured states

applied inputs

estimated GPRegression

Dynamical Model

{x(t k)}Kk=0

{u(t k)}Kk=0

d(t )

� 2d (t )

Infer function d (t ) online:

Gaussian Process regression with ‘weakly periodic’ kernel

�d(t k)

� Kk=0

linear system + gear effectx(k + 1 ) = A x(k ) + B u(k ) + d(k )

Page 17: Towards Safe Learning-Based Control

Comparison to State of the ArtOpen Source Software PHD Guiding by Edgar Klenske

17currently own branch on github.com/OpenPHDGuiding/phd2

0 500 1,000 1,500 2,000 2,500 3,000 3,500�2�1

012

errorRA

[px]

Tracking Algorithm of PHD2 Guiding 2.5.0

0 500 1,000 1,500 2,000 2,500 3,000 3,500�2�1

012

errorRA

[px]

Tracking with GP Prediction

RMS(e) = 0.4174

RMS(e) = 0.3080

26% reduction

erro

r RA

[px]

erro

r RA

[px]

26% reduction

time [s]

Tracking Algorithm of PHD2 Guiding 2.5.0

Tracking with GP Prediction

RMS(error) = 0.4174

RMS(error) = 0.3080

Page 18: Towards Safe Learning-Based Control

Tracking in “Darkness”Robustness through Improved Predictions

18

0 200 400 600 800 1,000 1,200 1,400 1,600 1,800 2,0000

10

20

30

40

50

time [s]

accumulated

gear

errorRA

[px]

Tracking with ”cloud”

accu

mula

ted

gear

erro

r RA

[px]

time [s]

Page 19: Towards Safe Learning-Based Control

Outline

19

mea

sure

men

ts

Dynamical System

control input

Environment Human

Online data

This talk:! Learning for performance: Tailoring optimal controllers online ! Safety guarantees: Constraint satisfaction for any performance controller12

minN�

k=0

l(x(k), u(k)) + g(x(N))

Z�[� x(k + 1) = f (x(k), u(k), d(k))

(x(k), u(k)) � X

Page 20: Towards Safe Learning-Based Control

The Issue of The TransientAn Example

20

What if we have to learn while

satisfying constraints?

Stanford Autonomous Helicopter – Tic-toc learning from scratch [Abbeel, Coates, Quigley, Ng, NIPS 2007]

Transients make system traverse through unsafe behavior

Goal: Safe online learning in control

Page 21: Towards Safe Learning-Based Control

Safety GuaranteesInvariance & Stability

21

Constraint satisfactionSatisfy constraints now & in future" Utilize feasible, invariant set

StabilityControl system to output target" Show Lyapunov function

states x � X , inputs u � UModel: x(t) = f (x(t), u(t))

V (x) positive definite, bounded˙V (x) � ��(|x |) �x � X

O = {x � Rn | �u(·) � Uts.t. �� � [ 0,�),

�(� ; x, t, u(·)) � X}

Page 22: Towards Safe Learning-Based Control

Safety GuaranteesInvariance & Stability for Uncertain Systems

22

Robust Invariance Satisfy constraints now & in future" Utilize feasible, robust invariant set

Input-to-State Stability (ISS)Control system to output target" Show ISS Lyapunov function

, disturbances d � DModel: x(t ) = f (x(t ), u(t ), d(t ))

states x � X , inputs u � U

O = {x � R n | �u(· ) � U tZ�[� �� � [ 0 ,�), �d(· ) � D t�(� ; x, t , u(· ), d(· )) � X }

V (x) positive definite, bounded˙V (x, d) � ��(|x |) + �(|d |)

�x � X , d � D

Alternative: Constraint relaxation" Soft constraints with stability guarantees

[Zeilinger et al., TAC 2014]

Page 23: Towards Safe Learning-Based Control

Safety GuaranteesInvariance & Stability for Uncertain Systems

23

Robust Invariance Satisfy constraints now & in future" Utilize feasible, robust invariant set

Input-to-State Stability (ISS)Control system to output target" Show ISS Lyapunov function

, disturbances d � DModel: x(t ) = f (x(t ), u(t ), d(t ))

states x � X , inputs u � U

O = {x � R n | �u(· ) � U tZ�[� �� � [ 0 ,�), �d(· ) � D t�(� ; x, t , u(· ), d(· )) � X }

V (x) positive definite, bounded˙V (x, d) � ��(|x |) + �(|d |)

�x � X , d � D

Invariant set is based on system model" Common practice: Use conservative bound on model uncertainties" Small safe set restricts performance

Page 24: Towards Safe Learning-Based Control

" Distribution for model uncertainties" Estimate of disturbance set

Quantifying Model Uncertainties Using Gaussian Process Regression

24

meanfunction

variance

Infer disturbance function d (x ) online:

measured states

applied inputs

estimated GPRegression

Dynamical Model

{x(t k)}Kk=0

{u(t k)}Kk=0

x = f (x, u, d(x))

�d(xk)

� Kk=0

{x(t k)}Kk=0d(x)

� 2d (x)

ˆD (x) = [ d(x)� m � d (x), d(x) + m � d (x)]

Prior distribution

m chosen according to desired probability p(e.g. m=3 for p=99.7%)

Page 25: Towards Safe Learning-Based Control

Safety Based on Robust InvarianceMain Idea

25

Goal: Guarantee safety for any online controller

Floor

Ceilin

gSafe

Unsafe

Largest controlled invariant set within constraints

Example: Quadrotor tracking up and down reference (e.g.pick-up task)

x = f (x, u) + d(x)

states:vertical position

velocity

control:thrust

model uncertainty

Idea: Utilize robust invariant safe set

Unsafe

Page 26: Towards Safe Learning-Based Control

Safety Based on Reachability Main Idea

26

Goal: Guarantee safety for any online controller

Idea: Utilize robust invariant safe set

1. Inside: Apply performance controller2. On boundary: Apply safety controller

" Control law:

" Guaranteed constraint satisfaction, if model captures true system dynamics

[Gillula & Tomlin, ICRA 2012]

safety controller

u =

�u p (x), PM x PUZPKLu s (x), V[OLY^PZL

Floor

Ceilin

gSafe

Unsafe

Apply safety controller

Page 27: Towards Safe Learning-Based Control

Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3

Velo

city

(m/s

)

-3

-2

-1

0

1

2

3

4

1

2

3

4x = f (x, u) + d1 (x)

d1 (x) � D 1 (x)

Safety Based on Reachability and Online Model Validation

27

Goal: Leverage online data to improve performance and safety properties

1. Learn model from online data and compute safe set

2. On boundary: Apply safety controller

3. Inside: Apply performance controller

[Akametalu, Fernandez-Fisac, Zeilinger, Tomlin CDC 2014]

Idea: Utilize robust invariant safe set

Page 28: Towards Safe Learning-Based Control

Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3

Velo

city

(m/s

)

-3

-2

-1

0

1

2

3

4

1

2

3

4x = f (x, u) + d1 (x)

d1 (x) � D 1 (x)

Safety Based on Reachability and Online Model Validation

28

Goal: Leverage online data to improve performance and safety properties

[Akametalu, Fernandez-Fisac, Zeilinger, Tomlin CDC 2014]

Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3

Velo

city

(m/s

)

-3

-2

-1

0

1

2

3

4

1

2

3

4

Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3

Velo

city

(m/s

)

-3

-2

-1

0

1

2

3

4

1

2

3

4x = f (x, u) + d2(x)

d2(x) � D 2(x)1. Learn model from online data and

compute safe set2. On boundary: Apply safety controller

3. Inside: Apply performance controller

Idea: Utilize robust invariant safe set

Page 29: Towards Safe Learning-Based Control

Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3

Velo

city

(m/s

)

-3

-2

-1

0

1

2

3

4

1

2

3

4

Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3

Velo

city

(m/s

)

-3

-2

-1

0

1

2

3

4

1

2

3

4

Safety Based on Reachability and Online Model Validation

29

Goal: Leverage online data to improve performance and safety properties

[Akametalu, Fernandez-Fisac, Zeilinger, Tomlin CDC 2014]

1. Learn model from online data and compute safe set

2. On boundary: Apply safety controller

3. Inside: Check confidence in model! If confident: Apply performance-

maximizing controller! If unconfident: Contract safe set

and apply safety controller

Idea: Utilize robust invariant safe setx = f (x, u) + d2(x)

d2(x) � D 2(x)

Loss of model confidence

Page 30: Towards Safe Learning-Based Control

Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3

Velo

city

(m/s

)

-3

-2

-1

0

1

2

3

4

1

2

3

4

Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3

Velo

city

(m/s

)

-3

-2

-1

0

1

2

3

4

1

2

3

4

Safety Based on Reachability and Online Model Validation

30

Goal: Leverage online data to improve performance and safety properties

[Akametalu, Fernandez-Fisac, Zeilinger, Tomlin CDC 2014]

Idea: Utilize robust invariant safe setx = f (x, u) + d2(x)

d2(x) � D 2(x)

Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3

Velo

city

(m/s

)

-3

-2

-1

0

1

2

3

4

1

2

3

4

Contracted safe set

Loss of model confidence

1. Learn model from online data and compute safe set

2. On boundary: Apply safety controller

3. Inside: Check confidence in model! If confident: Apply performance-

maximizing controller! If unconfident: Contract safe set

and apply safety controller

x = f (x, u) + d2(x)

d2(x) � D 2(x)

Page 31: Towards Safe Learning-Based Control

Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3

Velo

city

(m/s

)

-3

-2

-1

0

1

2

3

4

1

2

3

4

Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3

Velo

city

(m/s

)

-3

-2

-1

0

1

2

3

4

1

2

3

4

Safety Based on Reachability and Online Model Validation

31

Safety by combining learning and control theoretic techniques

Goal: Leverage online data to improve performance and safety properties

[Akametalu, Fernandez-Fisac, Zeilinger, Tomlin CDC 2014]

Idea: Utilize robust invariant safe setx = f (x, u) + d2(x)

d2(x) � D 2(x)

Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3

Velo

city

(m/s

)

-3

-2

-1

0

1

2

3

4

1

2

3

4

Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3

Velo

city

(m/s

)

-3

-2

-1

0

1

2

3

4

1

2

3

4 1. Learn model from online data and compute safe set

2. On boundary: Apply safety controller

3. Inside: Check confidence in model! If confident: Apply performance-

maximizing controller! If unconfident: Contract safe set

and apply safety controller

Page 32: Towards Safe Learning-Based Control

0

0.5

1

1.5

2

2.5

3

Alti

tud

e (

m)

No Online Validation

0 10 20 30 40 50 600

0.5

1

1.5

2

2.5

3

Alti

tud

e (

m)

Online Disturbance Validation

Time (s)

Safety: increased thrust command

Safety: decreased thrust command

Physical Collision

Reaction to Incorrect ModelWith Model Validation, No Re-computation of Safe Set

32

ground collisions

Detect unmodeleddisturbance" Contract safe set" Wait for

recomputation of safe set

No online validation

Online disturbance validation

Page 33: Towards Safe Learning-Based Control

Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3

Velo

city

(m/s

)

-3

-2

-1

0

1

2

3

4

1

2

3

4

Safe Online Learning With Model Learning and Validation

33

! First updated model is invalid

! Method detects inconsistency, slightly contracts safe set

! Improved tracking after next model update

Initial Inaccurate Improved

System never leaves converged safe set

Page 34: Towards Safe Learning-Based Control

Safety Based on Reachability and Online Model Validation

34

Safety by combining learning and control theoretic techniques

Goal: Leverage online data to improve performance and safety properties

[Akametalu, Fernandez-Fisac, Zeilinger, Tomlin CDC 2014]

Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3

Velo

city

(m/s

)

-3

-2

-1

0

1

2

3

4

1

2

3

4 1. Learn model from online data and compute safe set

2. On boundary: Apply safety controller

3. Inside: Check confidence in model! If confident: Apply performance-

maximizing controller! If unconfident: Contract safe set

and apply safety controller

Idea: Utilize robust invariant safe set

Page 35: Towards Safe Learning-Based Control

The Safe SetHamilton-Jacobi-Isaacs Formulation

35Constraint Set

u � U , d � DModel:

K

x = f (x, u, d)

Goal: Compute safe set = maximum robust control invariant subset

Stay in : Control u vs. disturbance d

K

Page 36: Towards Safe Learning-Based Control

The Safe SetHamilton-Jacobi-Isaacs Formulation

36

l(x )l(x ) : positive in setand negative otherwise

Safe Set

Constraint Set

u � U , d � D

J(x )

� J (x, t )

� t= �min

0 ,ma xu �U

mind �D

� J (x, t )

�x

T

f (x, u, d)

J (x, 0) = l(x)

[Mitchell, Bayen, Tomlin, TAC 2005]

Model:

{x : J (x) � 0 }

K

x = f (x, u, d)

Modified HJI equation:

Stay in : Control u vs. disturbance d

K

Safe Set:= Largest robust control invariant set = Set of states for which disturbance cannot win!

Page 37: Towards Safe Learning-Based Control

The Safe SetHamilton-Jacobi-Isaacs Formulation

37

l(x )

Safe Set:= Largest robust control invariant set = Set of states for which disturbance cannot win!

l(x ) : positive in setand negative otherwise

Safe Set

Constraint Set

u � U , d � D

J(x )

� J (x, t )

� t= �min

0 ,ma xu �U

mind �D

� J (x, t )

�x

T

f (x, u, d)

J (x, 0) = l(x)

[Mitchell, Bayen, Tomlin, TAC 2005]

Model:

{x : J (x) � 0 }

K

x = f (x, u, d)

Safetycontroller: u s (x) = HYNma x

u �Umind �D

� J (x) T

�xf (x, u, d)

Modified HJI equation:

Stay in : Control u vs. disturbance d

K

Page 38: Towards Safe Learning-Based Control

Safety Based on Reachability and Online Model Validation

38

Safety by combining learning and control theoretic techniques

Goal: Leverage online data to improve performance and safety properties

[Akametalu, Fernandez-Fisac, Zeilinger, Tomlin CDC 2014]

Altitude (m)-0.5 0 0.5 1 1.5 2 2.5 3

Velo

city

(m/s

)

-3

-2

-1

0

1

2

3

4

1

2

3

4

Idea: Utilize robust invariant safe set

1. Learn model from online data and compute safe set

2. On boundary: Apply safety controller

3. Inside: Check confidence in model! If confident: Apply performance-

maximizing controller! If unconfident: Contract safe set

and apply safety controller

Page 39: Towards Safe Learning-Based Control

Benefit of HJI FormulationInfinite Number of Invariant Sets

39

� J (x, t )

� t= �min

0 ,ma xu �U

mind �D

� J (x, t )

�x

T

f (x, u, d)

Lemma:

Safe Set: {x : J (x) � 0 }

K

(U` UVUULNH[P]L Z\WLYSL]LS ZL[ VM J (x)�P�L� {x | J (x) � �,� � 0 }�PZ YVI\Z[ JVU[YVS PU]HYPHU[ �Y�[� d � D

x, u, d)

J (x)

Page 40: Towards Safe Learning-Based Control

Safety Strategy with Online Model Validation

40

Given:

! Initialize: Active safe set = biggest candidate set" Critical level JL = 0

! At each time step:Check confidence in If not confident:

Contract safe set to include current state

! Control law

ˆD (x)

ˆD (x), J (x)

Ku =

�u p (x), PM J (x) > J L

u s (x), V[OLY^PZL

J L = ma x {J (x), J L }

Standard:J (x) > 0

J (x)

J LJ L

Page 41: Towards Safe Learning-Based Control

Safety Strategy with Online Model Validation

41

Given:

! Initialize: Active safe set = biggest candidate set" Critical level JL = 0

! At each time step:Check confidence inIf not confident:

Contract safe set to include current state

! Control law

ˆD (x), J (x)

K

ˆD (x)

u =

�u p (x), PM J (x) > J L

u s (x), V[OLY^PZL

J L = ma x {J (x), J L }

J (x)

J L

J L

Page 42: Towards Safe Learning-Based Control

Safety Strategy with Online Model Validation

42

ˆD (x), J (x)

K

Given:

! Initialize: Active safe set = biggest candidate set" Critical level JL = 0

! At each time step:Check confidence inIf not confident:

Contract safe set to include current state

! Control law

ˆD (x)

re-compute

u =

�u p (x), PM J (x) > J L

u s (x), V[OLY^PZL

J L = ma x {J (x), J L }

= biggest candidate set

J (x)

Page 43: Towards Safe Learning-Based Control

Safety Strategy with Online Model ValidationGlobal Guarantees

43

Theorem:

Intuitively: Safety guaranteed, if there exists some superlevel set, such that disturbance is captured on its boundary

K

0M MVY ZVTL Z[H[L z [OL TVKLS JVUÄKLUJL PZ SV^ HUK �J S � [ 0 , J (z )] Z\JO [OH[d(x) � ˆD (x) �x � {x | J (x) = J S }� [OLU [OL JVU[YVS SH^ PZ N\HYHU[LLK [V RLLW [OLZ[H[L ^P[OPU [OL JVUZ[YHPU[Z H[ HSS [PTLZ�

[Akametalu, Fernandez-Fisac, Zeilinger, Tomlin CDC 2014]

Page 44: Towards Safe Learning-Based Control

Safety A First Experiment with Humans

44

Model inconsistency

Safer to stay above this altitude

Page 45: Towards Safe Learning-Based Control

SummaryPerformance & Safety for Learning-based Control

45

! Online model learning provides automatic enhancement of prediction & controller performance

" Important to characterize residual uncertainty

! Invariant sets provide safe exploration space, but require system model

! Model validation critical when learning online" Iteratively identify safe set

[Klenske et al., TCST 2015]

Acknowledgements:Edgar Klenske, Philipp Hennig, Bernhard SchölkopfKene Akametalu, Jaime Fisac, Shahab Kaynama, Claire Tomlin,

Optimal Controller

Dynamical System

External effects(human, environment)

Model Learning

d

[Akametalu, Fisac et al., CDC 2014]