Ph. D. Project

Synthesis of unsupervised control law for reliable operation in the presence of component degradation

Dates:

2021/10/08 - 2024/09/30

Student:

Supervisor(s):

Description:

Context: Dynamic systems must complete their mission before component failures occur, and such failures are inevitable when components undergo progressive degradation [1]. In this recent field of study, known as the "Health Aware Control framework" [2], control law design aims to achieve an optimal compromise between the desired system performance and the Remaining Useful Life (RUL) of critical components. In this context, the following typical characteristics should be highlighted:

- Component degradation models are generally unknown, and mathematical models are rarely available for control law design. Models of the global system are also uncertain, and accurate models are rarely available. Historical failure databases and/or degradation test data must therefore be combined effectively with the overall (uncertain) system model.
- The degradation dynamics are generally very different from those of the overall system. This difference in time scales must be taken into account in the synthesis.
- Degradation is an irreversible physical phenomenon, a monotonic process that requires a specific approach to designing the controls of the affected system.

These elements will be studied in the context of Reinforcement Learning (RL), where the (sub)optimal control policy (law) is obtained by minimizing an appropriate cost function without knowledge of the model [3]. The theory of optimal control is well grounded in dynamic programming and is well suited to addressing systems with uncontrolled dynamics. It is important to note that the exact solution obtained by iterative minimization of a quadratic cost function over an infinite horizon using dynamic programming is equivalent to the one given by the Riccati solution in the classical theory of optimal control, as developed in [4]. Approximate ("inexact") solutions can be obtained within RL using approximate dynamic programming [5]. These latter works will serve as the starting point for the thesis.

Objectives: In a first step, the problem of synthesizing an unsupervised control law that operates safely in the presence of component degradation will be addressed for linear discrete-time systems. The extension to non-linear affine systems will be considered in a second step. To carry out such a synthesis of an unsupervised, optimal (or suboptimal) control law without a physical model of degradation, the research will have to answer the following questions:

- What are effective new architectures for combining the component failure/damage database with the global system model on the basis of unsupervised learning? We will build on our recent result in which the RUL prediction was integrated into the cost function for unsupervised learning in RL using the Q-Learning algorithm [6].
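As a rough illustration of how a RUL prediction can enter the cost minimized by Q-Learning, the following is a minimal sketch on a toy problem. It is not the formulation of [6]: the five-level health index, the two control modes, the wear probabilities, the cost values and the `predicted_rul` function are all assumptions made for this example only.

```python
import random

# All numbers below are illustrative assumptions, not values from [6].
N_HEALTH = 5                                  # levels 0..4 operate, level 5 is failure
GENTLE, AGGRESSIVE = 0, 1
PERF_COST = {GENTLE: 1.0, AGGRESSIVE: 0.2}    # aggressive control performs better
WEAR_PROB = {GENTLE: 0.1, AGGRESSIVE: 1.0}    # but wears the component faster
FAILURE_COST = 100.0
RUL_WEIGHT = 1.0
GAMMA = 0.9


def predicted_rul(h):
    """Hypothetical RUL predictor: health levels left before failure."""
    return N_HEALTH - h


def step(h, action, rng):
    """Monotonic degradation: the health index never decreases."""
    return h + 1 if rng.random() < WEAR_PROB[action] else h


def stage_cost(action, h_next):
    """Performance cost plus a penalty that grows as the predicted RUL shrinks."""
    rul_penalty = RUL_WEIGHT * (1.0 - predicted_rul(h_next) / N_HEALTH)
    failure = FAILURE_COST if h_next == N_HEALTH else 0.0
    return PERF_COST[action] + rul_penalty + failure


def q_learning(episodes=3000, alpha=0.1, epsilon=0.2, max_steps=200, seed=0):
    """Model-free tabular Q-Learning that minimizes the discounted cost."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_HEALTH + 1)]   # last row: failure (terminal)
    for _ in range(episodes):
        h = 0
        for _ in range(max_steps):
            if rng.random() < epsilon:               # explore
                a = rng.choice((GENTLE, AGGRESSIVE))
            else:                                    # exploit: lowest estimated cost
                a = min((GENTLE, AGGRESSIVE), key=lambda u: q[h][u])
            h_next = step(h, a, rng)
            target = stage_cost(a, h_next) + GAMMA * min(q[h_next])
            q[h][a] += alpha * (target - q[h][a])
            h = h_next
            if h == N_HEALTH:                        # component failed, episode ends
                break
    return q


q_table = q_learning()
policy = [min((GENTLE, AGGRESSIVE), key=lambda u: q_table[h][u])
          for h in range(N_HEALTH)]
print("greedy action per health level:", policy)
```

The learner never reads `WEAR_PROB` directly; it only observes transitions and costs, which mirrors the model-free setting described above. Raising `RUL_WEIGHT` shifts the learned trade-off toward preserving component life at the expense of performance.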

System stability in the presence of monotonically increasing degradation should preferably be investigated using contraction model analysis (fixed-point solution) [7], as well as controllability analysis (derivation of reachable and/or controllable state sets). The problem of system dynamics evolving on different time scales will also have to be solved.

- How can the control law be learned efficiently in the presence of a large state space (or degradation database)? It has been established that, for large state spaces, function approximators can be used to approximate the control law effectively [8][9]. We have recently shown that using the RUL prediction in an approximate Q-Learning algorithm improves learning ability [10]. We will build on these recent results to make learning more effective by developing new function approximators suited to our context.

The algorithms developed will be applied and validated on a bearing degradation benchmark available at CRAN (supported by the University's AM2I scientific pole, 2018-2019).

[1] M. S. Jha, G. Dauphin-Tanguy, and B. Ould-Bouamama, "Particle filter based hybrid prognostics for health monitoring of uncertain systems in bond graph framework," Mech. Syst. Signal Process., 2015.
[2] J. C. Salazar, P. Weber, F. Nejjari, R. Sarrate, and D. Theilliol, "System reliability aware model predictive control framework," Reliab. Eng. Syst. Saf., vol. 167, pp. 663-672, 2017.
[3] F. L. Lewis and D. Liu, Reinforcement learning and approximate dynamic programming for feedback control, vol. 17. John Wiley & Sons, 2013.
[4] D. P. Bertsekas, Dynamic programming and optimal control, vol. 1, no. 2. Athena Scientific, Belmont, MA, 1995.
[5] D. P. Bertsekas and J. N. Tsitsiklis, Neuro-dynamic programming, vol. 5. Athena Scientific, Belmont, MA, 1996.
[6] M. S. Jha, P. Weber, D. Theilliol, J.-C. Ponsart, and D. Maquin, "A Reinforcement Learning Approach to Health Aware Control Strategy," in 2019 27th Mediterranean Conference on Control and Automation (MED), 2019, pp. 171-176.
[7] D. P. Bertsekas, Abstract dynamic programming. Athena Scientific, 2018.
[8] V. Mnih et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, p. 529, 2015.
[9] G. Dulac-Arnold et al., "Deep reinforcement learning in large discrete action spaces," arXiv preprint arXiv:1512.07679, 2015.
[10] M. S. Jha, D. Theilliol, G. Biswas, and P. Weber, "Approximate Q-learning approach for Health Aware Control Design," in 4th International Conference on Control and Fault-Tolerant Systems (SYSTOL), 18-20 September 2019, Casablanca, Morocco, 2019.

Keywords:

Optimal control, Reinforcement Learning, Health Aware Control Design

Department(s):

Control Identification Diagnosis