"Commande sous-optimale de systèmes non-linéaires en temps discret avec des garanties de stabilité"
(Thèse Mathieu GRANZOTTO)
Résumé :
Artificial intelligence is rich in algorithms for optimal control. These algorithms generate control inputs for dynamical systems so as to minimize a given cost function describing, for example, the energy of the system. They apply to large classes of discrete-time nonlinear systems and have proven themselves in many applications, so their use in control problems is very promising. However, a fundamental question remains to be clarified for this purpose: that of stability. Indeed, these works concentrate on optimality and in most cases ignore the stability of the controlled system, which is at the heart of control theory.

The objective of my thesis is to study the stability of nonlinear systems controlled by such algorithms. The stakes are high, as this would create a new bridge between artificial intelligence and control theory. Stability informs us about the behavior of the system over time and guarantees its robustness in the presence of perturbations or model uncertainties. Artificial intelligence algorithms focus on the optimality of the control and do not exploit the properties of the system dynamics. Stability is desirable not only for the reasons above, but also because it can be exploited to improve these artificial intelligence algorithms themselves.

My research focuses on control techniques derived from (approximate) dynamic programming when a model of the system is known. For this purpose, I identify general conditions under which the stability of the closed-loop system can be guaranteed. Conversely, once stability has been established, it can be exploited to drastically improve the guarantees available in the literature.

My work has focused on two main axes. The first concerns the value iteration algorithm, which is one of the pillars of approximate dynamic programming and is at the heart of many reinforcement learning algorithms. The cost function considered is discounted, i.e. the function to be minimized is weighted by a 'forgetting factor' or 'discount factor' that decreases over time. This type of cost is often considered in dynamic programming, reinforcement learning and optimistic planning, for example, and offers many advantages for the synthesis and the analysis of optimality. On the other hand, the discount factor is a source of difficulties when it comes to stability. Indeed, recent studies demonstrate the technical difficulties induced by the presence of the discount factor, which must be chosen large enough (i.e., a near-negligible discounting), but these results are not adapted to the case of the value iteration algorithm: a finer analysis, closer to the algorithm, is required. Indeed, in value iteration, discounted infinite-horizon cost functions are approximated by discounted finite-horizon cost functions.
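To make the scheme concrete, here is a minimal textbook sketch of discounted value iteration on a hypothetical toy problem (a small finite deterministic system invented for illustration; the states, costs and discount factor are assumptions, not the setting analyzed in the thesis). Each iteration extends the horizon by one step, so the iterates are exactly the discounted finite-horizon costs that approximate the infinite-horizon one:

```python
import numpy as np

# Hypothetical toy system: 3 states (0, 1, 2), 2 actions (0, 1).
# next_state[s][a] gives the successor, stage_cost[s][a] the stage cost.
next_state = [[1, 2], [0, 2], [2, 2]]
stage_cost = [[1.0, 4.0], [2.0, 0.5], [0.0, 0.0]]  # state 2 is cost-free

gamma = 0.9  # discount ('forgetting') factor, chosen in (0, 1)

V = np.zeros(3)  # V_0: discounted cost over a horizon of length 0
for k in range(200):
    # Bellman update: V_{k+1}(s) = min_a [ cost(s,a) + gamma * V_k(f(s,a)) ]
    V = np.array([
        min(stage_cost[s][a] + gamma * V[next_state[s][a]] for a in (0, 1))
        for s in range(3)
    ])

# Greedy control law with respect to the (near) converged value function.
policy = [
    min((0, 1), key=lambda a: stage_cost[s][a] + gamma * V[next_state[s][a]])
    for s in range(3)
]
print(V, policy)  # converges to V = [1.45, 0.5, 0.0], policy = [0, 1, 0]
```

The greedy controller extracted at the end is precisely the kind of closed loop whose stability the thesis investigates: value iteration guarantees near-optimality of `V`, but says nothing by itself about the long-run behavior of the system under `policy`.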
Jury:
- Reviewers (rapporteurs):
  - TABUADA Paulo - Prof. - University of California
  - TRELAT Emmanuel - Prof. - Sorbonne
- Other members (examiners):
  - POSTOYAN Romain - CRAN
  - STOICA MANIU Cristina - Prof. - CentraleSupélec
  - NESIC Dragan - Prof. - University of Melbourne
  - HENRION Didier - Research Director - LAAS-CNRS
  - SCHERRER Bruno