Trainee Project
Dates:
2024/04/11 - 2024/09/30
Student:
Supervisor(s):
Description:
The project covers the development and implementation of state-of-the-art methods for designing optimal (or sub-optimal) control laws for autonomous vehicles. In particular, optimal control design using state feedback approaches [1,2] will be considered for tasks such as trajectory following (point-to-point, line following, etc.), lane following, and obstacle detection and avoidance, both with and without vision (camera) based information.
The internship will address these objectives progressively, starting with model-based feedback control design for dynamical systems that require safe control, followed by learning strategies within the reinforcement learning framework.
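As a minimal illustration of the model-based starting point, the sketch below computes an optimal state feedback gain for a scalar discrete-time linear system by iterating the Riccati recursion. The plant parameters `a` and `b`, the weights `q` and `r`, and the function names are hypothetical choices made for this example; they are not taken from the project or from the QCAR platform.

```python
def scalar_lqr_gain(a, b, q, r, iterations=500, tol=1e-12):
    """Iterate the discrete-time Riccati recursion for x[k+1] = a*x[k] + b*u[k]
    with stage cost q*x^2 + r*u^2. Returns the feedback gain k (u = -k*x)
    and the converged cost-to-go coefficient p."""
    p = q  # initial guess for the cost-to-go coefficient
    for _ in range(iterations):
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        converged = abs(p_next - p) < tol
        p = p_next
        if converged:
            break
    k = a * b * p / (r + b * b * p)
    return k, p


def simulate(a, b, k, x0, steps):
    """Roll out the closed loop x[k+1] = (a - b*k) * x[k] from x0."""
    xs = [x0]
    for _ in range(steps):
        u = -k * xs[-1]
        xs.append(a * xs[-1] + b * u)
    return xs


# Hypothetical open-loop unstable plant (|a| > 1), chosen only for illustration.
a, b = 1.2, 0.5
k, p = scalar_lqr_gain(a, b, q=1.0, r=0.1)
trajectory = simulate(a, b, k, x0=1.0, steps=30)
```

The closed-loop pole is `a - b * k`; the state decays to zero when its magnitude is below one. A vehicle model would require the matrix form of the same recursion.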
Recent advances in adaptive dynamic programming and reinforcement learning (ADP-RL) have led to remarkable results in optimal control design for non-linear systems when system knowledge is completely or partially absent (Kiumarsi et al. 2017). RL is a mature field with well-established mathematical foundations for optimal (or sub-optimal) control of non-linear dynamical systems in both continuous and discrete time (Bertsekas et al. 1995), and it has become one of the most important and useful approaches in control engineering. RL uses a trial-and-error learning process to maximize the total reward that a decision-making agent receives from its environment. Here, optimal control synthesis is largely based on iteratively solving the non-linear Hamilton-Jacobi-Bellman (HJB) equation using neural network-based structures. Such a strategy applies to discrete-time as well as continuous-time systems (Wang, Liu, and Wei 2012; C Mu, Wang, and He 2018; Chaoxu Mu et al. 2016; Lewis 2008). Deep RL approaches typically employ deep neural networks as efficient function approximators for the system states and the value and policy functions, leading to an approximate optimal control solution (Bertsekas and Tsitsiklis 1996; Lillicrap et al. 2015; Dulac-Arnold et al. 2015; Buşoniu et al. 2018). Note that such learning-based solutions typically address systems whose dynamics are unknown.
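For reference, the HJB equation mentioned above can be written as follows for a continuous-time, control-affine system with an infinite-horizon cost. The notation (f, g, Q, R, V*) is assumed here for illustration and is not taken from the project description.

```latex
\dot{x} = f(x) + g(x)\,u, \qquad
V(x_0) = \int_0^{\infty} \big( Q(x) + u^\top R\, u \big)\, dt
\\[4pt]
0 = \min_{u} \Big[ Q(x) + u^\top R\, u + \nabla V^*(x)^\top \big( f(x) + g(x)\,u \big) \Big]
\\[4pt]
u^*(x) = -\tfrac{1}{2}\, R^{-1} g(x)^\top \nabla V^*(x)
```

The last line follows from setting the derivative of the bracketed term with respect to u to zero. ADP-RL methods approximate V* (and hence u*) iteratively, since the equation rarely has a closed-form solution for non-linear f and g.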
The algorithms will be implemented in real time on the 1/10th-scale autonomous car Quanser CAR (QCAR) studio (see information here), available at CRAN (Polytech Nancy). See Annex for more details.
Objectives:
This research project targets the learning of control laws using policy gradient methods, including deep deterministic policy gradient (DDPG).
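Two building blocks of DDPG (Lillicrap et al. 2015) can be sketched in a few lines: the bootstrapped critic target and the soft (Polyak) update of the target networks. This is only a partial sketch, not a full agent. Plain lists of floats stand in for network weights, and the function names and default values of `gamma` and `tau` are assumptions made for this example; an actual implementation would rely on a deep learning framework.

```python
def td_target(reward, next_q, gamma=0.99, done=False):
    """Critic regression target y = r + gamma * Q'(s', mu'(s')).
    next_q is the target critic's value at the next state and the target
    actor's action; no bootstrapping is applied at terminal states."""
    return reward + (0.0 if done else gamma * next_q)


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: theta_target <- tau * theta + (1 - tau) * theta_target.
    Returns the new target parameters as a list."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]


# Hypothetical numbers, for illustration only.
y = td_target(reward=1.0, next_q=2.0, gamma=0.5)
new_target = soft_update(target_params=[0.0, 1.0],
                         online_params=[1.0, 1.0], tau=0.1)
```

The slowly moving target parameters keep the regression target `y` from shifting too quickly while the critic is being trained.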
The high-level objectives include:
Study of existing work (bibliographic survey) on state feedback control design and reinforcement learning for safe control learning of a dynamical system.
Control design for point-to-point, line, trajectory and lane following.
o Hands-on tests on QCAR.
o Implementation of code in MATLAB/Python.
Design of health-aware control by incorporating battery degradation data into the control design.
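To make the line-following objective listed above concrete, the sketch below simulates a kinematic bicycle model tracking the line y = 0 with a saturated state feedback law on the lateral and heading errors. The gains, speed, wheelbase, steering limit and time step are hypothetical values chosen for this example; they are not QCAR specifications, and the kinematic bicycle model is itself an assumption.

```python
import math


def steering_feedback(y, theta, k_y=1.0, k_theta=2.0, max_steer=0.5):
    """State feedback on lateral error y and heading error theta,
    saturated at +/- max_steer radians."""
    delta = -k_y * y - k_theta * theta
    return max(-max_steer, min(max_steer, delta))


def simulate_line_following(y0, theta0, v=1.0, wheelbase=0.256,
                            dt=0.01, steps=2000):
    """Forward-Euler simulation of a kinematic bicycle model following
    the line y = 0 at constant speed v. Returns the final (x, y, theta)."""
    x, y, theta = 0.0, y0, theta0
    for _ in range(steps):
        delta = steering_feedback(y, theta)
        x += v * math.cos(theta) * dt
        y += v * math.sin(theta) * dt
        theta += v / wheelbase * math.tan(delta) * dt
    return x, y, theta


# Start 0.5 m off the line, heading parallel to it (illustrative values).
x_final, y_final, theta_final = simulate_line_following(y0=0.5, theta0=0.0)
```

With these values the lateral and heading errors decay toward zero over the simulated 20 seconds. The same structure extends to general trajectories by measuring the errors relative to the nearest point on the reference path.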
The internship offers opportunities for scientific publication in international conferences and reputable scientific journals.
Keywords:
reinforcement learning, safe learning, control barrier functions
Department(s):
Control Identification Diagnosis