arXiv:2004.14362v1 [eess.SY] 29 Apr 2020

TS-MPC for Autonomous Vehicle using a Learning Approach ⋆

Eugenio Alcala ∗,∗∗  Olivier Sename ∗∗∗  Vicenç Puig ∗,∗∗  Joseba Quevedo ∗∗

∗ Institut de Robòtica i Informàtica Industrial (CSIC-UPC), Carrer Llorens Artigas, 4-6, 08028 Barcelona (email: [email protected]).
∗∗ Supervision, Safety and Automatic Control Research Center (CS2AC) of the Universitat Politècnica de Catalunya, Campus de Terrassa, Gaia Building, Rambla Sant Nebridi, 22, 08222 Terrassa, Barcelona.
∗∗∗ Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, 38000 Grenoble, France.

Abstract: In this paper, Model Predictive Control (MPC) and Moving Horizon Estimator (MHE) strategies using a data-driven approach to learn a Takagi-Sugeno (TS) representation of the vehicle dynamics are proposed to solve autonomous driving control problems in real time. To address the TS modeling, we use the Adaptive Neuro-Fuzzy Inference System (ANFIS) approach to obtain a set of polytopic-based linear representations as well as a set of membership functions that relate the different linear subsystems in a non-linear way. The proposed controller is fed with racing-based references from an external planner and with state estimates from the MHE, offering high driving performance in racing mode. The control-estimation scheme is tested in a simulated racing environment to show the potential of the presented approaches.

Keywords: Takagi-Sugeno approach, Model predictive control, Autonomous vehicles, Data-driven identification, Learning control

1. INTRODUCTION

In recent years, the number of learning-based applications has increased immensely. In the autonomous driving field in particular, we have witnessed new approaches such as end-to-end driving, where the goal is to guide the vehicle by means of learning algorithms and input sensor data. In Sallab et al. (2017), a deep reinforcement learning framework is proposed that takes raw sensor measurements as inputs and outputs driving actions. In Bojarski et al. (2016), the authors use a Convolutional Neural Network (CNN) to obtain the appropriate steering signal from the images of a single front camera.

Nowadays, from a control point of view, several strategies are starting to use learning tools to improve their capabilities while guaranteeing overall system stability. We have witnessed advances in this field, with learning techniques being used to adjust controllers, identify parameters inside models, or even control non-linear systems. In Lefevre et al. (2015b,a), some solutions for controlling the longitudinal velocity of a car based on learning human behaviour are presented. Also, a Model Predictive Control (MPC) technique using a deep CNN to predict cost functions from camera input images is developed in Drews et al. (2017). In Rosolia and Borrelli (2017); Rosolia et al. (2017); Rosolia and Borrelli (2019), the authors propose

⋆ The authors wish to thank the support received by the Spanish national project DEOCS (DPI2016-76493-C3-3-R).

a reference-free iterative MPC strategy able to learn from previous laps' information.

Most of these approaches were based on learning policies to drive the vehicle independently of a physical model. In this work, we are interested in learning a realistic and accurate representation of the system dynamics to improve the control performance. In Kabzan et al. (2019), the authors use a simple initial vehicle model which is enhanced on-line by learning the model error using Gaussian process regression and measured vehicle data.

In this paper, we propose the use of the ANFIS approach, an adaptive neuro-fuzzy inference system, to learn the vehicle model. In the same manner as artificial neural networks, it works as a universal approximator (Jang, 1993). The main purpose of using ANFIS is to learn an input-output mapping from input data. This tool has been widely used in other engineering fields (Ndiaye et al., 2018; Jaleel and Aparna, 2019).

The main contribution of this work is to accurately model a non-linear system as a structured Takagi-Sugeno (TS) representation of the vehicle by means of machine learning tools and input data. In particular, this paper takes advantage of the properties of the ANFIS tool to learn a data-driven TS system which is later used by a predictive optimal controller to solve the driving problem.

The paper is structured as follows. Section 2 presents the testing vehicle used in simulation. Section 3 details the



Fig. 1. Photograph of the vehicle used for simulation

proposed learning-based method and its main components. Section 4 formulates the control and estimation problems. Section 5 introduces its application to a case study to assess the methodology, as well as various performance results. Finally, Section 6 presents several conclusions about the suitability of the method.

2. TESTING VEHICLE

The Berkeley Autonomous Race Car (Gonzales et al., 2016) (BARC 1) is a development platform for autonomous driving that can achieve complex maneuvers. It is a 1/10-scale RWD electric remote control (RC) vehicle (see Figure 1) that has been modified to operate autonomously. Mechanically, it has been fitted with decks to protect the on-board electronics and sensors.

The non-linear model used in this work for simulating the BARC vehicle is presented as

v̇x = a − (Fyf sin δ)/m − µg + ω vy
v̇y = (Fyf cos δ + Fyr)/m − ω vx
ω̇ = (Fyf lf cos δ − Fyr lr)/I
αf = δ − tan⁻¹(vy/vx − lf ω/vx)
αr = −tan⁻¹(vy/vx + lr ω/vx)
Fyf = d sin(c tan⁻¹(b αf))
Fyr = d sin(c tan⁻¹(b αr)),   (1)

where the dynamic vehicle variables vx, vy and ω represent the body-frame velocities, i.e. the linear velocities in x and y and the angular velocity, respectively. The control variables δ and a are the steering angle at the front wheels and the longitudinal acceleration at the rear wheels, respectively. Fyf and Fyr are the lateral forces produced at the front and rear tires, respectively. This considers the simplified "Magic Formula" model for simulating lateral tire forces, where the parameters b, c and d define the shape of the curve. Front and rear slip angles are represented as αf and αr, respectively. m and I represent the vehicle mass and inertia, and lf and lr are the distances from the vehicle center of mass to the front and rear wheel axes, respectively. µ and g are the static friction coefficient and the gravity constant, respectively. All the dynamic vehicle parameters are defined in Table 1.

1 http://www.barc-project.com/

Table 1. Dynamic model parameters

Parameter   Value       Parameter   Value
lf          0.125 m     lr          0.125 m
m           1.98 kg     I           0.03 kg m2
d           7.76        c           1.6
b           6.0         µ           0.1

In this work, with the aim of improving the simulation, Gaussian noise has been introduced in the measured variables as

n(·) ∼ N(0, Co(·))   (2)

where Co(·) is the signal covariance.
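As a minimal sketch (not the authors' simulator), the model (1) and the noise model (2) can be coded as follows, assuming explicit Euler integration at the 30 Hz sampling period and the parameter values of Table 1; the measurement covariances are those given later in (15):

```python
import numpy as np

# Parameters from Table 1
lf, lr = 0.125, 0.125      # CoM-to-axle distances [m]
m, I = 1.98, 0.03          # mass [kg], yaw inertia [kg m^2]
d, c, b = 7.76, 1.6, 6.0   # "Magic Formula" shape parameters
mu, g = 0.1, 9.81          # static friction coefficient, gravity [m/s^2]

def f(x, u):
    """Continuous-time dynamics (1); x = [vx, vy, w], u = [delta, a]."""
    vx, vy, w = x
    delta, a = u
    alpha_f = delta - np.arctan(vy / vx - lf * w / vx)
    alpha_r = -np.arctan(vy / vx + lr * w / vx)
    Fyf = d * np.sin(c * np.arctan(b * alpha_f))
    Fyr = d * np.sin(c * np.arctan(b * alpha_r))
    return np.array([
        a - Fyf * np.sin(delta) / m - mu * g + w * vy,
        (Fyf * np.cos(delta) + Fyr) / m - w * vx,
        (Fyf * lf * np.cos(delta) - Fyr * lr) / I,
    ])

def step(x, u, dt=1.0 / 30.0):
    """One explicit-Euler step (an assumption; any ODE solver works)."""
    return x + dt * f(x, u)

def measure(x, rng):
    """Noisy measurements of vx and w as in (2); vy is not measured."""
    return np.array([x[0], x[2]]) + rng.normal(0.0, np.sqrt([1e-6, 4e-8]))

x = np.array([1.0, 0.0, 0.0])              # [vx, vy, w]
x = step(x, np.array([0.05, 0.5]))         # small steering + acceleration
y = measure(x, np.random.default_rng(0))   # sensor reading [vx, w]
```

With µ = 0.1 the friction term µg dominates small accelerations, so the vehicle decelerates unless a is large enough; this matches the role of µ in (1).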

3. LEARNING THE TS MODEL

In this section, we present the modeling methodology used for obtaining the TS representation of the vehicle dynamic model. The tool ANFIS (Jang, 1993) is an adaptive neuro-fuzzy inference machine used for learning a particular structure from input-output (IO) data. In more detail, this modeling tool configures a neural network that learns the dynamic behaviour of the vehicle from IO data using the backpropagation technique, and also employs the Recursive Least Squares (RLS) method for adjusting additional parameters.

The methodology consists of providing a dataset to the modeling algorithm (ANFIS). This dataset is composed of vehicle states and inputs representing a set of particular driving maneuvers that guarantee sufficiently rich data. Then, after a learning-based procedure, the algorithm provides a set of linear parameters, also known as consequent parameters, and a set of premise parameters, or non-linear parameters, that define the set of membership functions (MF) providing the non-linear relationships between the different linear polynomials. One typical membership function is the generalized Gaussian function.
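A minimal sketch of how such a dataset might be arranged for one MISO sub-system (here the vx one); the array layout and function name are illustrative assumptions, and random data stands in for logged driving maneuvers:

```python
import numpy as np

def build_miso_dataset(states, inputs, target_idx=0):
    """Arrange logged IO data for one MISO sub-system.

    states: (N, 3) array of [vx, vy, w] samples at 30 Hz
    inputs: (N, 2) array of [delta, a] samples
    Returns (X, y): regressors [vx, vy, w, delta, a] at step k and the
    target state at step k+1 (target_idx selects vx, vy or w).
    """
    X = np.hstack([states[:-1], inputs[:-1]])  # scheduling variables at k
    y = states[1:, target_idx]                 # target state at k+1
    return X, y

# Illustrative stand-in data; real data would come from driving maneuvers
rng = np.random.default_rng(0)
states = rng.normal(size=(100, 3))
inputs = rng.normal(size=(100, 2))
X, y = build_miso_dataset(states, inputs, target_idx=0)
```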

However, obtaining the TS representation of a system from these resulting parameters is not trivial. The procedure is based on inverting some of the steps that ANFIS performs internally. To address this problem, we follow a set of reformulation steps. First, since the ANFIS algorithm can only be used for Multi-Input Single-Output (MISO) systems, where just one output variable can be considered, we split the system into MISO sub-systems, obtaining as many sub-systems as the system has state variables. Our dynamic vehicle model is a third-order system, so three sub-systems are obtained and three learning procedures are carried out. Once the algorithm has computed the consequent and premise parameters for each one of the MISO sub-systems, we build the polytopic TS state-space representation for each of these sub-systems. To do this, first, the polynomial representation of each sub-system is formulated as


Fig. 2. Polytopic TS learning scheme for the vx sub-system case

Pi = p1i vx + p2i vy + p3i ω + p4i δ + p5i a + p6i,  ∀i = 1, ..., Nv,   (3)

where Pi stands for a linear polynomial representation of the dynamics of a sub-system at a particular state configuration, pji, ∀j = 1, ..., Nζ, are the consequent parameters obtained from ANFIS, where Nζ stands for the number of scheduling variables (see Figure 2) and Nv represents the number of polytopic vertexes.

Then, simply by reorganising the terms in (3) as

Pi = [ p1i p2i p3i ] x + [ p4i p5i ] u + [ p6i ],   (4)

the polynomial structure is transformed into the discrete-time state-space representation given by

xi+ = Ai x + Bi u + Ci,  ∀i = 1, ..., Nv,   (5)

where, to ease comprehension from a control point of view, Pi is represented as the sub-system variable at the next discrete step (xi+), with the symbol + denoting the k+1 discrete step. Ai, Bi and Ci define the so-called vertex systems, x = [ vx vy ω ]T and u = [ δ a ]T.
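The reorganisation of (3) into (4)-(5) amounts to slicing the consequent-parameter array; a sketch, assuming each row is ordered [p1i ... p6i] as in (3) and an illustrative vertex count of 2^5 (two MFs for each of the five scheduling variables):

```python
import numpy as np

def consequents_to_vertices(P):
    """Split ANFIS consequent parameters into vertex-system terms.

    P: (Nv, 6) array; row i holds [p1i ... p6i] from (3) for one vertex.
    Returns per-vertex rows (A_i, B_i, C_i) of the sub-system's state
    equation xi+ = Ai x + Bi u + Ci as in (5).
    """
    A = P[:, 0:3]   # coefficients multiplying x = [vx, vy, w]
    B = P[:, 3:5]   # coefficients multiplying u = [delta, a]
    C = P[:, 5]     # affine term
    return A, B, C

P = np.arange(32 * 6, dtype=float).reshape(32, 6)  # illustrative values
A, B, C = consequents_to_vertices(P)
```

Doing this for each of the three MISO sub-systems yields the three sets of vertex rows that are later stacked in (10).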

At this point, we use the obtained premise parameters to formulate the membership functions. One of the most used is the generalized Gaussian Bell function (GB). This is defined by three parameters (a, b and c) as follows:

ηm(ζo) = 1 / ( 1 + |(ζo − cmo)/amo|^(2bmo) ),  ∀m = 1, ..., NMF, ∀o = 1, ..., Nζ,   (6)

where ζ represents the ANFIS input vector of variables (from now on referred to as scheduling variables), and NMF and Nζ represent the number of MFs per scheduling variable and the number of scheduling variables, respectively. For a common case where NMF is two, the normalized weights (µNi) are computed following

µi(ζ) = Π_{j=1}^{Nζ} ξij(η0, η1),  ∀i = 1, ..., Nv,   (7)

where ξij(·) corresponds to any of the weighting functions that depend on each rule i. Then, using

µNi(ζ) = µi(ζ) / Σ_{j=1}^{Nv} µj(ζ),  ∀i = 1, ..., Nv,   (8)

Fig. 3. Schematic view of the simulation setup

the normalized weights are obtained. Note that each scheduling variable ζo is known and varies in a defined interval ζo ∈ [ζo_min, ζo_max] ⊂ R. Finally, the polytopic TS model for each sub-system is represented as

xj+ = Σ_{i=1}^{Nv} µNji(ζ) (Aji x + Bji u + Cji),  ∀j = 1, ..., NG,   (9)

where NG is the number of sub-systems.
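The weight computation (6)-(8) can be sketched as follows, assuming two generalized bell MFs per scheduling variable, the product t-norm as the weighting function ξij, and illustrative premise parameters:

```python
import numpy as np
from itertools import product

def gbell(z, a, b, c):
    """Generalized bell MF (6): 1 / (1 + |(z - c)/a|^(2b))."""
    return 1.0 / (1.0 + np.abs((z - c) / a) ** (2 * b))

def normalized_weights(zeta, premise):
    """Rule weights (7) and their normalization (8).

    zeta: (Nz,) scheduling vector; premise: (Nz, 2, 3) holding the
    (a, b, c) parameters of the two MFs of each scheduling variable.
    """
    # Two MF activations per scheduling variable
    eta = np.array([[gbell(z, *premise[o, m]) for m in range(2)]
                    for o, z in enumerate(zeta)])            # (Nz, 2)
    # One rule per vertex: product t-norm over one MF choice per variable
    mu = np.array([np.prod([eta[o, sel[o]] for o in range(len(zeta))])
                   for sel in product(range(2), repeat=len(zeta))])
    return mu / mu.sum()                                      # (2^Nz,)

zeta = np.array([1.5, 0.0, 0.1, 0.02, 0.4])   # [vx, vy, w, delta, a]
premise = np.tile(np.array([[1.0, 2.0, -1.0], [1.0, 2.0, 1.0]]), (5, 1, 1))
muN = normalized_weights(zeta, premise)
```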

Finally, for this work we consider the third-order dynamic system presented in (1), which implies NG = 3, and therefore the overall TS system is represented as

x+ = Σ_{i=1}^{Nv} diag(µN1i, µN2i, µN3i) ( [A1i; A2i; A3i] x + [B1i; B2i; B3i] u + [C1i; C2i; C3i] ).   (10)

From now on, with the aim of easier reading, the system representation in (10) will be expressed as

xk+1 = Σ_{i=1}^{Nv} µNi(ζk) (Ai xk + Bi uk + Ci).   (11)
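A one-step evaluation of (11) can then be sketched as a weighted blend of the vertex systems; the vertex matrices below are random placeholders:

```python
import numpy as np

def ts_step(x, u, muN, A, B, C):
    """Propagate the TS model (11): weighted sum of vertex systems.

    x: (3,), u: (2,), muN: (Nv,) normalized weights summing to 1,
    A: (Nv, 3, 3), B: (Nv, 3, 2), C: (Nv, 3).
    """
    # Blend the vertex matrices with the current normalized weights
    Ak = np.tensordot(muN, A, axes=1)
    Bk = np.tensordot(muN, B, axes=1)
    Ck = muN @ C
    return Ak @ x + Bk @ u + Ck

rng = np.random.default_rng(1)
Nv = 4  # illustrative number of vertices
A = rng.normal(scale=0.1, size=(Nv, 3, 3))
B = rng.normal(scale=0.1, size=(Nv, 3, 2))
C = rng.normal(scale=0.1, size=(Nv, 3))
muN = np.full(Nv, 0.25)
x1 = ts_step(np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.5]), muN, A, B, C)
```

With a one-hot weight vector the blend collapses to a single vertex system, which is a quick sanity check of the polytopic structure.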

4. TS CONTROL AND ESTIMATION

In this section, we present the formulations for the MPCand MHE techniques using the TS formulation.

4.1 TS-MPC Design

Computing the predicted state behaviour over a certain horizon when using a system that depends on scheduling variables (a TS system) can be a challenging task, sometimes leading to errors in the instantiation since the real future behaviour is unknown.

In this work, we propose the use of data coming from two different sources to better approximate the predictive instantiation and avoid a lack of convergence in the optimization procedure. On the one hand, data coming from a planner is used, which represents the desired state behaviour for tracking the desired trajectory. On the other hand, predicted states from the past optimal realisation are also used to improve the TS model instantiation. The model used in this section is the one presented in (11), where the vector of scheduling variables is defined as ζ := [ vx vy ω δ a ]. The use of this model allows formulating the MPC problem as a quadratic optimization problem that is solved at each time k to determine the control actions, considering that the values of xk and uk−1 are known:


min_{∆Uk} Jk = Σ_{i=0}^{Hp−1} ( (rk+i − xk+i)^T Q (rk+i − xk+i) + ∆uk+i^T R ∆uk+i ) + xk+Hp^T P xk+Hp

s.t.  xk+i+1 = Σ_{j=1}^{Nv} µNj(ζk) (Aj xk+i + Bj uk+i + Cj)
      uk+i = uk+i−1 + ∆uk+i
      ∆Uk ∈ ∆Π
      Uk ∈ Π
      xk+Hp ∈ χ
      ye ∈ [ye_min, ye_max]
      xk+0 = xk,   (12)

where Π = {uk | Au uk ≤ bu} and ∆Π = {∆uk | A∆u ∆uk ≤ b∆u} constrain the system inputs and their variations, respectively.

The state vector is x = [ vx vy ω ]T, x̂ is the estimated state vector, r = [ vxr 0 ωr ]T is the reference vector provided by the trajectory planner, u = [ δ a ]T is the control input vector and Hp is the prediction horizon. The tuning matrices Q ∈ R3×3 and R ∈ R2×2 are positive definite in order to obtain a convex cost function. Closed-loop stability is guaranteed by introducing P ∈ R3×3 and χ, which represent the terminal cost and the terminal constraint, respectively. Both are computed following the design presented in Alcala et al. (2019). Note that the time discretization is embedded in the identification procedure, so the learned TS system is already in discrete time.
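The prediction and cost of (12) can be sketched as below, under strong simplifications: the TS model is frozen to a single instantiated (Ak, Bk, Ck) over the horizon, and the constraint sets and terminal ingredients are omitted, leaving an unconstrained least-squares problem in the stacked input increments. The model matrices are illustrative placeholders; the weights follow (16):

```python
import numpy as np

def ts_mpc_unconstrained(Ak, Bk, Ck, x0, u_prev, ref, Q, R, Hp):
    """Unconstrained condensed form of (12) with a frozen TS instance.

    Predicts x_{k+i+1} = Ak x_{k+i} + Bk u_{k+i} + Ck with
    u_{k+i} = u_{k+i-1} + du_{k+i}, and solves for the stacked
    increments dU minimizing the tracking + effort cost.
    """
    nx, nu = Bk.shape
    # x_i = F[i] @ x0 + G[i] @ dU + h[i]
    F = [np.eye(nx)]
    G = [np.zeros((nx, Hp * nu))]
    h = [np.zeros(nx)]
    for i in range(Hp):
        # u_{k+i} = u_prev + sum_{j<=i} du_j  ->  selection matrix S
        S = np.zeros((nu, Hp * nu))
        S[:, :(i + 1) * nu] = np.tile(np.eye(nu), i + 1)
        F.append(Ak @ F[i])
        G.append(Ak @ G[i] + Bk @ S)
        h.append(Ak @ h[i] + Bk @ u_prev + Ck)
    # Stack stage costs as a least-squares problem in dU
    rows, rhs = [], []
    sqQ, sqR = np.sqrt(Q), np.sqrt(R)
    for i in range(Hp):
        rows.append(sqQ @ G[i])                      # tracking error terms
        rhs.append(sqQ @ (ref[i] - F[i] @ x0 - h[i]))
        Ri = np.zeros((nu, Hp * nu))
        Ri[:, i * nu:(i + 1) * nu] = sqR             # increment penalty
        rows.append(Ri)
        rhs.append(np.zeros(nu))
    dU, *_ = np.linalg.lstsq(np.vstack(rows), np.concatenate(rhs), rcond=None)
    return u_prev + dU[:nu]                          # first increment applied

Ak = np.diag([0.95, 0.9, 0.9])                       # placeholder TS instance
Bk = np.array([[0.0, 0.03], [0.01, 0.0], [0.05, 0.0]])
Ck = np.zeros(3)
Q = np.diag([0.4, 1e-6, 0.6])
R = np.diag([0.7, 0.3])
ref = np.tile([2.0, 0.0, 0.0], (6, 1))               # track vx -> 2 m/s
u = ts_mpc_unconstrained(Ak, Bk, Ck, np.array([1.0, 0.0, 0.0]),
                         np.zeros(2), ref, Q, R, 6)
```

A full implementation would re-instantiate (Ak, Bk, Ck) at each prediction step from the scheduling guess and add the input, increment and terminal constraints through a QP solver, as done in the paper with YALMIP and GUROBI.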

4.2 TS-MHE Design

For the vehicle presented in Section 2, the vehicle lateral velocity (vy) is an unmeasurable variable and a necessary state for performing closed-loop control of the vehicle. In this paper, we solve the estimation problem using the MHE approach. The aim of the MHE is to compute the current dynamic states by running a constrained optimization over a set of past data, employing a system model for computing the current state. At this point, using the TS model presented in (11), we can run a quadratic optimization, similar to that of the TS-MPC algorithm, for estimating the current dynamic states as follows:

min_{Xk} Jk = Σ_{i=−Hp}^{0} ( wk+i^T Q wk+i + sk+i^T R sk+i ),

s.t.  x̂k+i+1 = Σ_{j=1}^{Nv} µNj(ζk) (Aj x̂k+i + Bj uk+i + Cj) + wk+i
      yk+i = C x̂k+i + sk+i
      Xk ∈ Xd
      ∀i = −Hp, ..., 0,   (13)

that is solved online for

Xk = [ x̂k−Hp+1 ; x̂k−Hp+2 ; ... ; x̂k+1 ] ∈ R^(Hp×s),   (14)

where Xd is the constraint region for the dynamic states, defined as Xd = {xk | Ax xk ≤ bx}. Hp stands for the past data horizon and s for the number of states. The matrices Q = QT ∈ R3×3 and R = RT ∈ R2×2 are positive definite to generate a convex cost function, and w and s represent the process noise and the measurement noise, respectively. The state and input vectors are x = [ vx vy ω ]T and u = [ δ a ]T. Note that, unlike the MPC technique, the MHE strategy performs an optimization taking into account a window of past vehicle data.
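Once the TS model is frozen per step and the constraint region Xd is dropped, (13) reduces to a linear least-squares problem in the window of states; a sketch, with illustrative placeholder matrices, a measurement matrix C picking the measured vx and ω, and the weights of (18):

```python
import numpy as np

def ts_mhe(A, B, Cc, Cm, U, Y, Q, R, Hp):
    """Least-squares sketch of (13): estimate the state window.

    A, B, Cc: frozen TS model matrices over the window,
    Cm: measurement matrix (vx and w measured, vy not),
    U: (Hp, 2) past inputs, Y: (Hp+1, 2) past measurements,
    Q, R: process / measurement weights. Returns the current state.
    """
    n = A.shape[0]
    nz = (Hp + 1) * n                     # stacked window of states
    rows, rhs = [], []
    sqQ, sqR = np.sqrt(Q), np.sqrt(R)
    for i in range(Hp):                   # process residuals w_i
        M = np.zeros((n, nz))
        M[:, (i + 1) * n:(i + 2) * n] = np.eye(n)
        M[:, i * n:(i + 1) * n] = -A
        rows.append(sqQ @ M)
        rhs.append(sqQ @ (B @ U[i] + Cc))
    for i in range(Hp + 1):               # measurement residuals s_i
        M = np.zeros((Cm.shape[0], nz))
        M[:, i * n:(i + 1) * n] = Cm
        rows.append(sqR @ M)
        rhs.append(sqR @ Y[i])
    z, *_ = np.linalg.lstsq(np.vstack(rows), np.concatenate(rhs), rcond=None)
    return z[-n:]                         # current estimate, includes vy

A = np.diag([0.95, 0.9, 0.9])             # placeholder TS instance
B = np.array([[0.0, 0.03], [0.01, 0.0], [0.05, 0.0]])
Cc = np.zeros(3)
Cm = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
Q = np.diag([0.25, 0.5, 0.25])
R = np.diag([0.5, 0.5])
U = np.zeros((10, 2))
Y = np.tile([1.0, 0.0], (11, 1))
xhat = ts_mhe(A, B, Cc, Cm, U, Y, Q, R, 10)
```

The paper additionally enforces the state polytope (19a) on the window, which turns this into the constrained QP solved by GUROBI.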

5. RESULTS

The data-driven model identification carried out by the proposed approach is used to learn a state-space TS formulation of the real vehicle dynamics. In Figure 4, the membership functions learned for each input after the offline identification procedure are shown on the left-hand side. These represent the fuzzy rules that will be used online for obtaining the current state-space representation.


Fig. 4. Input-Output scheme for the vx sub-system case

Note that, since the discretization time is embedded in the input data of the learning procedure, the selection of a different sampling time is not allowed.

The MPC and MHE strategies using the presented data-driven approach are evaluated through a simulation scenario: a racing situation in which the autonomous scheme presented in Figure 3 is simulated.

First, at every sampling period, i.e. at 30 Hz, the racing planner provides the references for the control strategy such that the vehicle has to behave in a racing driving mode, which directly implies a more challenging control problem. Then, the TS-MHE optimal problem presented in (13) is solved for estimating the current vehicle vector


state using past vehicle measurements. The next step is to instantiate the TS model matrices for the prediction stage using the approach presented in Section 2. Note that both the planner evolution information and the previous optimal prediction are used to achieve a good guess of the future values of the scheduling vector ζ. At this point, the quadratic optimal problem (12) is solved using the estimated state variables and the references coming from the trajectory planner. Once the optimal control actions (δ and a) are computed, they are applied to the simulation vehicle presented in (1). As a consequence, the vehicle changes its state, which is measured by the network of sensors. Besides, with the aim of adding more realistic conditions to the problem, white Gaussian noise is added to the measured states with zero mean and covariances

Co_vx = 1 × 10⁻⁶,  Co_ω = 4 × 10⁻⁸.   (15)

Both the TS-MPC and TS-MHE algorithms are coded in the MATLAB framework. YALMIP and GUROBI (Gurobi Optimization, 2015) are used for solving the quadratic optimization problems, running on a DELL Inspiron 15 (Intel Core i7-8550U CPU @ 1.80GHz × 8). In the controller, the tuning aims to minimize the longitudinal and angular velocity tracking errors while computing smooth control actions. The diagonal terms of the weighting matrices in the cost function and the prediction horizon of (12), found by iterative tuning until the desired performance is achieved, are

Q = 0.65 · diag(0.4, 10⁻⁶, 0.6),
R = 0.35 · diag(0.7, 0.3),
Hp = 6.   (16)

The TS-MPC input constraints are defined as

Au = [ 1 0; −1 0; 0 1; 0 −1 ],  bu = [ 0.249; 0.249; 4; 1 ],   (17a)

A∆u = [ 1 0; −1 0; 0 1; 0 −1 ],  b∆u = [ 0.05; 0.05; 0.5; 0.5 ].   (17b)

In the estimator, the tuning aims to minimize the process noise while guessing the right value of vy by using the TS model. The diagonal terms of the weighting matrices in the cost function and the past horizon of (13), found by iterative tuning until the desired performance is achieved, are

Q = 0.5 · diag(0.25, 0.5, 0.25),
R = 0.5 · diag(0.5, 0.5),
Hp = 10.   (18)

The TS-MHE state region is defined by the polytope

Ax = [ 1 0 0; −1 0 0; 0 1 0; 0 −1 0; 0 0 1; 0 0 −1 ],  bx = [ 2.7; −0.1; 0.12; 0.12; 1.96; 1.96 ].   (19a)

Figure 5 shows both the reference and the response for each one of the velocity variables. Note that the vehicle lateral velocity (vy) cannot be measured; hence, the signal presented is the estimated one. It can be seen that the controller is able to accurately track the proposed

Fig. 5. Vehicle states throughout the simulation. Horizontal red lines represent the upper and lower limits

references, although having some trouble when driving in racing mode, i.e. after 85 seconds. Horizontal red lines represent the polytope boundaries for each one of the scheduling variables, which in this approach coincide with the state and input vehicle variables. Note that these limits are imposed at the learning stage by the maximum and minimum values of the input signals, i.e. the scheduling variables.

In Figure 6, the optimal control actions are shown as well as their discrete-time variations, which are the quantities minimized in the cost function of (12). Note that the steering angle reaches the upper and lower limits at some points, while the rear-wheel acceleration moves in a wider range.

Finally, after observing good tracking performance in the previous figures, we present the elapsed time per iteration of the TS-MPC in Figure 7. It can be seen that, using a prediction horizon of 6 steps, the quadratic solver achieves an average of 4.8 ms per iteration. This is one of the most remarkable results of this approach.

6. CONCLUSIONS

In this paper, a learning-based approach for identifying the dynamics of the vehicle and formulating them as a TS representation has been presented. Then, a TS-MPC strategy has been proposed as the approach to solve autonomous driving control problems under realistic conditions in real time. In addition, using racing-based references provided by an external planner, the controller makes the vehicle perform in racing mode. The control strategy has been tested in simulation, showing high performance potential in both reference tracking and computational time. However, this approach shares the limitation of learning-based procedures: the system can only do what it has learned.


Fig. 6. Control actions and their time-derivative variables throughout the simulation. Horizontal red lines represent the upper and lower limits

Fig. 7. Computational time required by the TS-MPC throughout the simulation

ACKNOWLEDGEMENTS

This work has been funded by the Spanish Ministry of Economy and Competitiveness (MINECO) and FEDER through the projects SCAV (ref. DPI2017-88403-R) and HARCRICS (ref. DPI2014-58104-R). The author is supported by an FI AGAUR grant (ref. 2017 FI B00433).

REFERENCES

Alcala, E., Cayuela, V.P., and Casin, J.Q. (2019). TS-MPC for autonomous vehicles including a TS-MHE-UIO estimator. IEEE Transactions on Vehicular Technology.

Bojarski, M., Del Testa, D., Dworakowski, D., Firner, B., Flepp, B., Goyal, P., Jackel, L.D., Monfort, M., Muller, U., Zhang, J., et al. (2016). End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316.

Drews, P., Williams, G., Goldfain, B., Theodorou, E.A., and Rehg, J.M. (2017). Aggressive deep driving: Model predictive control with a CNN cost model. arXiv preprint arXiv:1707.05303.

Gonzales, J., Zhang, F., Li, K., and Borrelli, F. (2016). Autonomous drifting with onboard sensors. In Advanced Vehicle Control: Proceedings of the 13th International Symposium on Advanced Vehicle Control (AVEC16), September 13-16, 2016, Munich, Germany, 133.

Gurobi Optimization, I. (2015). Gurobi optimizer reference manual. URL http://www.gurobi.com.

Jaleel, E.A. and Aparna, K. (2019). Identification of realistic distillation column using hybrid particle swarm optimization and NARX based artificial neural network. Evolving Systems, 10(2), 149–166.

Jang, J.S. (1993). ANFIS: adaptive-network-based fuzzy inference system. IEEE Transactions on Systems, Man, and Cybernetics, 23(3), 665–685.

Kabzan, J., Hewing, L., Liniger, A., and Zeilinger, M.N. (2019). Learning-based model predictive control for autonomous racing. IEEE Robotics and Automation Letters, 4(4), 3363–3370.

Lefevre, S., Carvalho, A., and Borrelli, F. (2015a). Autonomous car following: A learning-based approach. In 2015 IEEE Intelligent Vehicles Symposium (IV), 920–926. IEEE.

Lefevre, S., Carvalho, A., and Borrelli, F. (2015b). A learning-based framework for velocity control in autonomous driving. IEEE Transactions on Automation Science and Engineering, 13(1), 32–42.

Ndiaye, A., Tankari, M.A., Lefebvre, G., et al. (2018). Adaptive neuro-fuzzy inference system application for the identification of a photovoltaic system and the forecasting of its maximum power point. In 2018 7th International Conference on Renewable Energy Research and Applications (ICRERA), 1061–1067. IEEE.

Rosolia, U. and Borrelli, F. (2017). Learning model predictive control for iterative tasks. A data-driven control framework. IEEE Transactions on Automatic Control, 63(7), 1883–1896.

Rosolia, U. and Borrelli, F. (2019). Learning how to autonomously race a car: a predictive control approach. arXiv preprint arXiv:1901.08184.

Rosolia, U., Carvalho, A., and Borrelli, F. (2017). Autonomous racing using learning model predictive control. In 2017 American Control Conference (ACC), 5115–5120. IEEE.

Sallab, A.E., Abdou, M., Perot, E., and Yogamani, S. (2017). Deep reinforcement learning framework for autonomous driving. Electronic Imaging, 2017(19), 70–76.