Design of Attitude and Path Tracking Controllers for Quad-Rotor Robots using Reinforcement Learning Sérgio Ronaldo Barros dos Santos Cairo Lúcio Nascimento.

Design of Attitude and Path Tracking Controllers for Quad-Rotor Robots using Reinforcement Learning

Sérgio Ronaldo Barros dos Santos, Cairo Lúcio Nascimento Júnior

Instituto Tecnológico de Aeronáutica (ITA), Brazil

Sidney Nascimento Givigi Júnior

Royal Military College of Canada (RMCC), Canada

Introduction

• Quad-rotor robots have attracted the attention of many researchers in the past few years.

• Examples of applications:

– Military applications: surveillance, border patrolling, crowd control.

– Civilian applications: rescue missions during floods and earthquakes, monitoring of pipelines and electric transmission lines.

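Introduction

A quad-rotor consists of four independent propellers attached to the ends of a cross-shaped frame. Two of the propellers turn clockwise and the other two counter-clockwise, so that their reaction torques cancel when all rotors spin at the same speed.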

Quad-Rotor Dynamics

All rotational and translational movements of a quad-rotor can be achieved by adjusting its rotor speeds.
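As a rough illustration of how rotor speeds determine motion, the sketch below maps the four squared rotor speeds to the collective thrust and the roll, pitch and yaw torques, and inverts that map. It assumes a "+" configuration and uses hypothetical constants B (thrust coefficient), D (drag coefficient) and L (arm length); neither these values nor the sign conventions come from the slides.

```python
import numpy as np

# Hypothetical physical constants (not taken from the slides).
B = 3.13e-5   # thrust coefficient [N s^2]
D = 7.5e-7    # drag (reaction torque) coefficient [N m s^2]
L = 0.23      # arm length [m]

# Mixing matrix for an assumed "+" configuration: maps squared rotor speeds
# [w1^2, w2^2, w3^2, w4^2] to [thrust, roll torque, pitch torque, yaw torque].
# Rotors 1 and 3 spin in one direction, rotors 2 and 4 in the other.
MIX = np.array([
    [B,      B,      B,      B],
    [0.0,   -B * L,  0.0,    B * L],
    [-B * L, 0.0,    B * L,  0.0],
    [-D,     D,     -D,      D],
])

def forces_from_speeds(omega):
    """Thrust and body torques produced by rotor speeds omega [rad/s]."""
    return MIX @ np.square(omega)

def speeds_from_forces(u):
    """Rotor speeds [rad/s] that produce u = [T, tau_roll, tau_pitch, tau_yaw]."""
    omega_sq = np.linalg.solve(MIX, u)
    # Negative squared speeds are not physically achievable, so clip them to zero.
    return np.sqrt(np.clip(omega_sq, 0.0, None))

# Equal rotor speeds give pure thrust with zero net torques (hover).
hover_forces = forces_from_speeds(np.full(4, 400.0))
```

Changing one pair of speeds differentially (for example w2 versus w4) produces a roll torque without changing the total thrust, which is how the attitude and, through it, the translation are controlled.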

Introduction

• Quad-rotor robots are affected by a number of physical effects, such as:

– Aerodynamic effects,

– Gravity effect,

– Ground effect,

– Gyroscopic effect,

– Friction.

• Due to these nonlinear effects, it is difficult to design good controllers for a quad-rotor.

Introduction

• Quad-rotor applications typically use controllers derived from linearized models.

• These controllers perform poorly during fast maneuvers or in the presence of disturbances such as wind and ground effect.

• To perform path tracking in the presence of nonlinear disturbances, a reinforcement learning technique based on Learning Automata (RL-LA) is applied.

Objectives

• To present a solution for testing and evaluation of attitude stabilization and path tracking controllers for quad-rotors.

• To use a Reinforcement Learning algorithm (Learning Automata) to adjust the controllers' parameters in a simulation environment that includes wind and ground effects.

Quad-Rotor Dynamics

• Two reference frames are used: an inertial frame and a body-fixed frame whose origin is at the center of mass of the quad-rotor.
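A sketch of the rotation between the two frames, assuming the Z-Y-X (yaw-pitch-roll) Euler-angle convention that is common for quad-rotors; the slides do not state which convention the authors used, and the function name is illustrative.

```python
import numpy as np

def rotation_body_to_inertial(phi, theta, psi):
    """R = Rz(psi) @ Ry(theta) @ Rx(phi): maps body-frame vectors to the inertial frame.

    phi, theta and psi are the roll, pitch and yaw angles [rad] under an
    assumed Z-Y-X Euler convention.
    """
    cphi, sphi = np.cos(phi), np.sin(phi)
    cth, sth = np.cos(theta), np.sin(theta)
    cpsi, spsi = np.cos(psi), np.sin(psi)
    rx = np.array([[1, 0, 0], [0, cphi, -sphi], [0, sphi, cphi]])
    ry = np.array([[cth, 0, sth], [0, 1, 0], [-sth, 0, cth]])
    rz = np.array([[cpsi, -spsi, 0], [spsi, cpsi, 0], [0, 0, 1]])
    return rz @ ry @ rx

# The rotor thrust acts along the body z axis; express it in the inertial frame.
R = rotation_body_to_inertial(0.1, -0.05, 0.3)
thrust_inertial = R @ np.array([0.0, 0.0, 20.0])
```

Because R is orthonormal, its transpose maps inertial vectors back into the body frame.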

Quad-Rotor Dynamics

• The dynamic model is derived under the following assumptions:

– the vehicle frame is rigid and symmetrical,

– the origin of the body-fixed frame is located at the vehicle's center of mass,

– the propellers are also rigid.

Quad-Rotor Dynamics

• The dynamic model of the quad-rotor can be derived using the Newton-Euler formalism.
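The Newton-Euler model can be sketched as a state-derivative function suitable for numerical integration. This is a minimal rigid-body version under stated assumptions: the mass and inertia values are hypothetical, the Z-Y-X Euler convention is assumed, the inertial z axis points up, and the aerodynamic, ground and gyroscopic propeller effects are deliberately omitted.

```python
import numpy as np

# Hypothetical parameters (not from the slides).
MASS = 0.65                                    # [kg]
GRAVITY = 9.81                                 # [m/s^2]
INERTIA = np.diag([7.5e-3, 7.5e-3, 1.3e-2])    # [kg m^2]

def newton_euler_derivatives(state, thrust, torques):
    """Time derivative of [x, y, z, vx, vy, vz, phi, theta, psi, p, q, r].

    Newton (inertial frame, z up): m * a = R e3 * thrust - m * g * e3
    Euler (body frame):            I * dw/dt = torques - w x (I w)
    """
    vel = state[3:6]
    phi, theta, psi = state[6:9]
    omega = state[9:12]  # body angular rates p, q, r

    # Third column of R = Rz(psi) Ry(theta) Rx(phi): the body z axis in the inertial frame.
    z_body = np.array([
        np.cos(psi) * np.sin(theta) * np.cos(phi) + np.sin(psi) * np.sin(phi),
        np.sin(psi) * np.sin(theta) * np.cos(phi) - np.cos(psi) * np.sin(phi),
        np.cos(theta) * np.cos(phi),
    ])
    acc = z_body * thrust / MASS - np.array([0.0, 0.0, GRAVITY])

    # Euler's rotation equation, solved for the angular acceleration.
    omega_dot = np.linalg.solve(INERTIA, torques - np.cross(omega, INERTIA @ omega))

    # Body rates to Z-Y-X Euler-angle rates (singular at theta = +/-90 deg).
    p, q, r = omega
    euler_dot = np.array([
        p + (q * np.sin(phi) + r * np.cos(phi)) * np.tan(theta),
        q * np.cos(phi) - r * np.sin(phi),
        (q * np.sin(phi) + r * np.cos(phi)) / np.cos(theta),
    ])
    return np.concatenate([vel, acc, euler_dot, omega_dot])

# Hover: thrust equal to the weight and zero torques give zero derivatives.
hover_deriv = newton_euler_derivatives(np.zeros(12), MASS * GRAVITY, np.zeros(3))
```

Integrating this function (for example with a fixed-step Runge-Kutta scheme) gives a baseline simulator; the disturbance effects would be added as extra force and torque terms.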

Robot Controllers

• The control architecture for the robot has two loops: an outer loop for path tracking and an inner loop for attitude stabilization. The roll, pitch and yaw angles are denoted by φ, θ and ψ, respectively.
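One common way to connect the two loops, shown here only as a hypothetical sketch, is for the outer loop to turn x/y position errors into roll and pitch references for the inner loop. The PD gains, the tilt limit and the small-angle inversion around hover are assumptions; the slides do not specify how the authors coupled the loops. The signs follow a Z-Y-X Euler rotation with the z axis pointing up.

```python
import math

GRAVITY = 9.81    # [m/s^2]
MAX_TILT = 0.35   # hypothetical limit on the roll/pitch references [rad]

def clamp(value, limit):
    return max(-limit, min(limit, value))

def attitude_references(pos_err, vel_err, psi, kp=1.2, kd=1.8):
    """Outer loop: map x/y errors (reference minus measured) to (phi_ref, theta_ref).

    A PD law gives desired horizontal accelerations (ax, ay). Near hover,
    ax ~ g*(theta*cos(psi) + phi*sin(psi)) and ay ~ g*(theta*sin(psi) - phi*cos(psi)),
    which is inverted below to obtain the attitude references.
    """
    ax = kp * pos_err[0] + kd * vel_err[0]
    ay = kp * pos_err[1] + kd * vel_err[1]
    theta_ref = (ax * math.cos(psi) + ay * math.sin(psi)) / GRAVITY
    phi_ref = (ax * math.sin(psi) - ay * math.cos(psi)) / GRAVITY
    return clamp(phi_ref, MAX_TILT), clamp(theta_ref, MAX_TILT)

# A 1 m error along +x with zero yaw asks for a positive pitch and no roll.
phi_ref, theta_ref = attitude_references((1.0, 0.0), (0.0, 0.0), psi=0.0)
```

The inner attitude controllers then track phi_ref and theta_ref, which is why they must be faster than the outer loop.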

Robot Controllers

• Three nonlinear control strategies are used:

– Nonlinear PID control,

– Backstepping,

– Sliding Mode Control.
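To show where the sliding mode parameters k and λ enter, the sketch below applies a textbook sliding mode law to a single, decoupled attitude axis I·θ̈ = τ. The decoupled model, the inertia value and the tanh boundary layer used to limit chattering are assumptions; the authors' full control laws are not given in the slides.

```python
import math

def smc_torque(theta, theta_dot, theta_ref, theta_ref_dot, theta_ref_ddot,
               inertia, k, lam, boundary=0.05):
    """Sliding mode torque for a single decoupled axis I * theta_dd = tau.

    Sliding surface: s = e_dot + lam * e, with e = theta - theta_ref.
    The reaching term -k * tanh(s / boundary) drives s to zero; the tanh
    boundary layer replaces sign(s) to reduce chattering (an assumption).
    """
    e = theta - theta_ref
    e_dot = theta_dot - theta_ref_dot
    s = e_dot + lam * e
    return inertia * (theta_ref_ddot - lam * e_dot - k * math.tanh(s / boundary))

def simulate(k=2.0, lam=3.0, inertia=7.5e-3, dt=1e-3, steps=5000):
    """Regulate the axis from 0.3 rad to 0 rad and return the final angle."""
    theta, theta_dot = 0.3, 0.0
    for _ in range(steps):
        tau = smc_torque(theta, theta_dot, 0.0, 0.0, 0.0, inertia, k, lam)
        theta_dot += (tau / inertia) * dt   # semi-implicit Euler integration
        theta += theta_dot * dt
    return theta

final_angle = simulate()
```

Once the state reaches the surface s = 0, the error obeys ė = -λe, so λ sets the convergence rate and k sets how quickly the surface is reached; these are the two values tuned per controller.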

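Robot Controllers

The parameters of the six controllers of each technique are tuned using the RL algorithm. Path tracking uses the x-position and y-position controllers, attitude uses the pitch, roll and yaw controllers, and the sixth controller regulates height.

Technique    | x-position | y-position | Pitch      | Roll       | Yaw        | Height
PID          | kp, ki, kd | kp, ki, kd | kp, ki, kd | kp, ki, kd | kp, ki, kd | kp, ki, kd
Backstepping | α12, α11   | α10, α9    | α4, α3     | α1, α2     | α5, α6     | α7, α8
Sliding Mode | k5, λ5     | k4, λ4     | k2, λ2     | k1, λ1     | k3, λ3     | k6, λ6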
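The slides do not give the exact form of the nonlinear PID law, so the following is only a standard discrete PID with an assumed integral clamp, to show where the tuned gains kp, ki and kd enter each of the six controllers. The instance name height_pid and its gain values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PID:
    """Discrete PID controller; kp, ki and kd are the gains tuned by the RL algorithm."""
    kp: float
    ki: float
    kd: float
    dt: float                         # sample time [s]
    integral_limit: float = 1.0       # anti-windup clamp (assumed value)
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, error: float) -> float:
        """Return the control output for the current error (reference minus measurement)."""
        self.integral += error * self.dt
        self.integral = max(-self.integral_limit, min(self.integral_limit, self.integral))
        # Skip the derivative on the first call to avoid a spurious kick.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

height_pid = PID(kp=2.0, ki=0.5, kd=0.1, dt=0.01)
u0 = height_pid.update(1.0)
```

In the learning process, each of the six instances would receive the (kp, ki, kd) triple selected by its probability vector at the start of a trial.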

Simulation Environment

• A simulation setup is proposed to train and evaluate the quad-rotor controllers under more realistic conditions.

Simulation Environment


Simulation Environment


Simulation Environment

• Using Plane Maker, an X-Plane model of the X3D-BL quad-rotor (manufactured by Ascending Technologies) was created.

Simulation Environment

• The responses of the X-Plane and SIMULINK models are compared for a hovering maneuver.

Reinforcement Learning

• Learning Automata (LA) is an alternative approach that can be used to adjust the parameters of the controllers.

Reinforcement Learning

• Steps of the learning process:

1. Initialize the probability and parameter vectors of each controller;

2. Select the parameters for each controller using its associated probability vector;

3. Execute the desired task, obtain its response and use a cost function to measure its performance;

4. Compute the reinforcement signal;

5. Adjust the probability vectors;

6. Check the probability vectors for convergence; if they have not converged, return to step 2.
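The six steps can be sketched as follows for a discrete set of candidate values per parameter. This is a hypothetical illustration, not the authors' implementation: the flight simulation is replaced by a toy cost with a known optimum, and the reinforcement exp(-J/σ) and the additive update followed by normalization are simple stand-ins rather than the equations used in the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical candidate values for one gain of each of two controllers.
CANDIDATES = {
    "roll_kp": np.linspace(1.0, 4.0, 5),    # [1.0, 1.75, 2.5, 3.25, 4.0]
    "pitch_kp": np.linspace(1.0, 4.0, 5),
}
# Toy optimum of each gain, standing in for the simulated flight (assumption).
OPTIMUM = {"roll_kp": 2.5, "pitch_kp": 3.25}
SIGMA = 0.1  # scale of the stand-in reinforcement exp(-J / SIGMA)

def run_trial(params):
    """Step 3 stand-in: cost J of each controller for one trial (lower is better)."""
    return {name: (value - OPTIMUM[name]) ** 2 + 0.1 for name, value in params.items()}

def learn(max_trials=2000, step=0.1, threshold=0.95):
    # Step 1: one uniform probability vector per controller.
    probs = {name: np.full(len(c), 1.0 / len(c)) for name, c in CANDIDATES.items()}
    trial = 0
    for trial in range(1, max_trials + 1):
        # Step 2: sample a candidate index for each controller.
        chosen = {name: int(rng.choice(len(p), p=p)) for name, p in probs.items()}
        params = {name: CANDIDATES[name][j] for name, j in chosen.items()}
        # Step 3: execute the task and measure each controller's cost.
        costs = run_trial(params)
        for name, j in chosen.items():
            # Step 4: reinforcement in (0, 1], larger for lower cost.
            reinforcement = np.exp(-costs[name] / SIGMA)
            # Step 5: reinforce the selected element, then renormalize.
            probs[name][j] += step * reinforcement
            probs[name] /= probs[name].sum()
        # Step 6: stop once every probability vector has converged.
        if all(p.max() >= threshold for p in probs.values()):
            break
    tuned = {name: float(CANDIDATES[name][np.argmax(p)]) for name, p in probs.items()}
    return tuned, probs, trial

tuned, probs, trials = learn()
```

Each controller keeps its own probability vector, so all parameters are learned in parallel from the same trials, as in steps 2 to 5.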

Reinforcement Learning

• Supervisory level: LA adjusts the parameters of the attitude and path tracking controllers.

Reinforcement Learning

• The controller parameters were learned with the X-Plane model in three stages of increasing difficulty:

– without any external disturbances,

– with wind only,

– with both wind and ground effects.

Reinforcement Learning


Reinforcement Learning


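Reinforcement Learning

• A cost function evaluates the response of each controller (i) for the selected task at the end of each trial (k):

$$J_{ik} = G_e \int_0^{T} e^2(t)\,dt + G_M M_p + G_s E_{ss}$$

where e(t) is the tracking error over the trial duration T, M_p and E_ss denote, in the usual control notation, the maximum overshoot and the steady-state error, and G_e, G_M and G_s are weights.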
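The cost can be evaluated from a sampled trial response as sketched below. The definitions used here are assumptions: e(t) is reference minus response, M_p is the maximum overshoot above the final reference value, E_ss is the absolute error at the last sample, and the weights g_e, g_m and g_s (G_e, G_M, G_s) default to 1 purely for illustration.

```python
import numpy as np

def trial_cost(t, reference, response, g_e=1.0, g_m=1.0, g_s=1.0):
    """J = G_e * integral(e^2 dt) + G_M * M_p + G_s * E_ss for one trial."""
    e = reference - response
    # Trapezoidal integration of the squared tracking error over [0, T].
    integral_sq = float(np.sum(0.5 * (e[1:] ** 2 + e[:-1] ** 2) * np.diff(t)))
    # Maximum overshoot above the final reference value (zero if none).
    overshoot = max(0.0, float(np.max(response - reference[-1])))
    # Steady-state error taken as the absolute error at the last sample.
    steady_state_error = abs(float(e[-1]))
    return g_e * integral_sq + g_m * overshoot + g_s * steady_state_error

# First-order response to a unit step with time constant 0.5 s.
t = np.linspace(0.0, 5.0, 50001)
J = trial_cost(t, np.ones_like(t), 1.0 - np.exp(-t / 0.5))
```

For this example the integral term is about τ/2 = 0.25, there is no overshoot, and the final error is e^-10, so the cost is dominated by the transient.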

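Reinforcement Learning

The reinforcement signal is computed for each controller (i) at the end of each trial (k):

[Equation not recoverable from the extracted text: it compares the trial cost J_ik with J_med and bounds the result with min/max operations to obtain the reinforcement signal.]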

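Reinforcement Learning

1. The element j of the probability vector associated with the selected controller parameter is adjusted:

[Update equation not recoverable from the extracted text; it gives the new probability p_i(j) at trial k+1 in terms of its value at trial k.]

2. The probability vector is then normalized.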

Reinforcement Learning

• Learning to track the desired trajectory with the PID controllers during the first stage.

Results

• Simulation results obtained with the nonlinear PID controllers. The trajectory is formed by the waypoints (0, 0) - (0, 10) - (10, 10) - (10, 0), in meters.
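For reference, the square path can be turned into a time-parameterized reference for the outer loop. The sketch assumes constant-speed travel between consecutive waypoints at a hypothetical 1 m/s and holds the last waypoint afterwards; the slides do not say how the authors timed the trajectory.

```python
import numpy as np

# Waypoints of the square path from the slides, in meters.
WAYPOINTS = np.array([[0.0, 0.0], [0.0, 10.0], [10.0, 10.0], [10.0, 0.0]])

def reference_at(t, speed=1.0):
    """Reference (x, y) at time t [s] for constant-speed travel along the waypoints."""
    segments = np.diff(WAYPOINTS, axis=0)
    lengths = np.linalg.norm(segments, axis=1)
    distance = speed * t
    for start, seg, length in zip(WAYPOINTS[:-1], segments, lengths):
        if distance <= length:
            # Linear interpolation inside the current segment.
            return start + seg * (distance / length)
        distance -= length
    # Past the end of the path: hold the final waypoint.
    return WAYPOINTS[-1].copy()
```

Sampling this function at the controller rate gives the x and y references whose tracking error feeds the cost function.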

Results

• The quad-rotor robot executing a predefined trajectory, visualized in X-Plane.

Results

• Results of the backstepping controllers in the presence of wind and ground effects.

Results

• Path tracking of the quad-rotor with the backstepping controllers in the presence of wind and ground effects, visualized in X-Plane.

Results

• Response of the sliding mode controllers in the presence of wind and ground effects.

Results

• The quad-rotor trajectory obtained with the sliding mode controllers in the presence of wind and ground effects, visualized in X-Plane.

Results

• Evaluation of how well the controllers track the desired path after the learning process.

Conclusions

• The proposed method (Learning Automata) allows one to tune the parameters of different controllers for a quad-rotor aircraft, considering external disturbances such as wind and ground effects.

• It was shown that the proposed simulation framework can be useful for investigating the application of learning algorithms to adjust the control laws of quad-rotors for different flight maneuvers.

Future Research

• Evaluate on real quad-rotors the controllers obtained with LA, the simulated model and the simulation environment.

• Online learning: useful to correct inaccuracies of the simulation (model + environment).

Future Research

• Comparison to other RL methods (e.g., Q-Learning) and other search procedures (e.g., genetic algorithms).

• Limitation of learning: generalization to other tasks.

– Problem: selection of the tasks to be executed during training (analogous to the choice of excitation signal in adaptive control).

Thank you!
