Ph.D. Project
Synthesis of an unsupervised control law for reliable operation in the presence of component degradation
Dates:
2021/10/08 - 2025/08/31
Student:
Supervisor(s):
Description:
Context: Dynamic systems must complete their mission before component failures occur, which is inevitable when components undergo progressive degradation [1]. In this recent field of study, known as the "Health-Aware Control" framework [2], control law design aims to achieve an optimal compromise between the desired system performance and the remaining useful life (RUL) of critical components. In this context, the following typical characteristics should be highlighted:
- Component degradation models are generally unknown, and mathematical models are rarely available for control law design. Models of the global system are also uncertain, and accurate models are rarely available. Historical failure databases and/or degradation test data must therefore be combined effectively with the (uncertain) global system model.
- The degradation dynamics are generally very different from those of the global system. This multiple-time-scale problem must be taken into account in the synthesis.
- Degradation is an irreversible physical phenomenon, a monotonic process that requires a specific approach to the design of the controls of the affected system.
The above elements will be studied in the context of Reinforcement Learning (RL), where a (sub)optimal control policy (law) is obtained by minimizing an appropriate cost function without knowledge of the model [3]. The theory of optimal control is firmly grounded in dynamic programming and is well suited to addressing systems with uncontrolled dynamics. It is important to note that the exact solution obtained by iterative minimization of a quadratic cost function over an infinite horizon using dynamic programming is equivalent to the solution of the Riccati equation from classical optimal control theory, as developed in [4]; a small numerical check of this equivalence is sketched below. Note that approximate ("inexact") solutions can be obtained within RL using approximate dynamic programming [5]. These works will serve as the starting point for the thesis.
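As an illustration of the equivalence mentioned above, the following minimal sketch (the matrices A, B and the weights Q, R are illustrative assumptions, not taken from the project) checks numerically that value iteration on the infinite-horizon quadratic cost converges to the solution of the discrete algebraic Riccati equation (DARE):

    # Minimal sketch: value iteration for a discrete-time LQR problem converges
    # to the DARE solution. A, B, Q, R are illustrative, not from the project.
    import numpy as np
    from scipy.linalg import solve_discrete_are

    A = np.array([[1.0, 0.1], [0.0, 0.9]])  # example state matrix
    B = np.array([[0.0], [0.1]])            # example input matrix
    Q = np.eye(2)                           # state cost weight
    R = np.array([[1.0]])                   # input cost weight

    # Riccati recursion: P <- Q + A'P(A - BK), with K = (R + B'PB)^{-1} B'PA
    P = np.zeros((2, 2))
    for _ in range(1000):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)

    P_dare = solve_discrete_are(A, B, Q, R)
    print(np.max(np.abs(P - P_dare)))  # ~0: iterative and Riccati solutions agree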
Objectives: In a first step, the problem of synthesizing an unsupervised control law that is safe to operate in the presence of component degradation will be addressed for linear systems in discrete time. The extension to nonlinear affine systems will be considered in a second step. To carry out such a synthesis of an unsupervised, optimal (or suboptimal) control law without a physical model of the degradation, the research work will have to answer the following questions:
What new and effective architectures can combine the component failure/damage database with the global system model through unsupervised learning? We will build on our recent result in which the RUL prediction was integrated into the cost function for unsupervised learning within RL, using the Q-learning algorithm [6]. System stability in the presence of monotonically increasing degradation should preferably be investigated using contraction mapping analysis (fixed-point solutions) [7], as well as controllability analysis (derivation of reachable and/or controllable state sets). The problem posed by system dynamics evolving on different time scales will also have to be solved.
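To fix ideas, here is a minimal sketch of RUL-aware Q-learning in the spirit of [6]. The environment, the RUL predictor and the weight w_rul are illustrative assumptions, not the exact formulation of that paper: the stage cost simply adds a penalty that grows as the predicted RUL shrinks, so the learned policy trades tracking performance against component health.

    # Minimal sketch (environment, RUL predictor and w_rul are illustrative
    # assumptions, not the exact formulation of [6]): tabular Q-learning where
    # the stage cost penalizes actions that shorten the predicted RUL.
    import numpy as np

    n_states, n_actions = 50, 5          # discretized degradation / control levels
    alpha, gamma, eps = 0.1, 0.95, 0.1   # learning rate, discount, exploration
    Q = np.zeros((n_states, n_actions))  # table of costs-to-go (we minimize)
    rng = np.random.default_rng(0)

    def rul_estimate(state):
        # Hypothetical RUL predictor learned from degradation data (toy model).
        return 1.0 - state / (n_states - 1)

    def step(state, action):
        # Hypothetical plant: aggressive actions track better but wear faster.
        tracking_cost = 0.1 * (n_actions - 1 - action)
        wear = int(rng.integers(0, action + 1))  # degradation never decreases
        return min(n_states - 1, state + wear), tracking_cost

    w_rul = 2.0  # weight of the health-aware term in the stage cost
    for episode in range(200):           # each episode starts with a fresh component
        s = 0
        for _ in range(100):
            a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmin(Q[s]))
            s2, tracking_cost = step(s, a)
            cost = tracking_cost + w_rul * (1.0 - rul_estimate(s2))  # RUL in the cost
            Q[s, a] += alpha * (cost + gamma * np.min(Q[s2]) - Q[s, a])
            s = s2

The weight w_rul is where the compromise between desired performance and residual life is expressed; it is a tuning choice, not a quantity prescribed by the project.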
How can the control law be learned efficiently in the presence of a large state space (or a large degradation database)? It has been established that, for large state spaces, function approximators can be used to represent the control law effectively [8], [9]. We have recently shown an improvement in learning ability by using the RUL prediction in an approximate Q-learning algorithm [10]. We will build on these recent results to increase the effectiveness of learning by developing new function approximators relevant to our context.
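For illustration, the following minimal sketch (the radial-basis feature map, the plant and the costs are assumptions, not the approximator of [10]) shows semi-gradient Q-learning with a linear function approximator, Q(s, a) ~ W[a] . phi(s), which handles a continuous degradation level without a Q-table:

    # Minimal sketch (feature map, plant and costs are illustrative assumptions,
    # not the approximator of [10]): semi-gradient Q-learning with a linear
    # function approximator over a continuous degradation level in [0, 1].
    import numpy as np

    n_actions = 5
    centers = np.linspace(0.0, 1.0, 10)      # RBF centers over the degradation level

    def phi(s):
        # Radial-basis features of the scalar state (degradation level).
        return np.exp(-((s - centers) ** 2) / 0.02)

    W = np.zeros((n_actions, centers.size))  # one weight vector per action
    alpha, gamma, eps = 0.05, 0.95, 0.1
    rng = np.random.default_rng(1)

    s = 0.0
    for _ in range(5000):
        q = W @ phi(s)
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmin(q))
        s2 = min(1.0, s + 0.01 * a * rng.random())   # aggressive action -> faster wear
        cost = 0.1 * (n_actions - 1 - a) + 2.0 * s2  # tracking + health-aware terms
        td = cost + gamma * np.min(W @ phi(s2)) - (W[a] @ phi(s))
        W[a] += alpha * td * phi(s)                  # semi-gradient TD update
        s = 0.0 if s2 >= 1.0 else s2                 # component replaced at end of life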
The algorithms developed will be applied to and validated on a bearing degradation benchmark available at CRAN (supported by the University's AM2I scientific pole, 2018-2019).
References:
[1] M. S. Jha, G. Dauphin-Tanguy, and B. Ould-Bouamama, "Particle filter based hybrid prognostics for health monitoring of uncertain systems in bond graph framework," Mech. Syst. Signal Process., 2015.
[2] J. C. Salazar, P. Weber, F. Nejjari, R. Sarrate, and D. Theilliol, "System reliability aware model predictive control framework," Reliab. Eng. Syst. Saf., vol. 167, pp. 663-672, 2017.
[3] F. L. Lewis and D. Liu, Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, vol. 17. John Wiley & Sons, 2013.
[4] D. P. Bertsekas, Dynamic Programming and Optimal Control, vol. 1. Athena Scientific, Belmont, MA, 1995.
[5] D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, vol. 5. Athena Scientific, Belmont, MA, 1996.
[6] M. S. Jha, P. Weber, D. Theilliol, J.-C. Ponsart, and D. Maquin, "A reinforcement learning approach to health aware control strategy," in 2019 27th Mediterranean Conference on Control and Automation (MED), 2019, pp. 171-176.
[7] D. P. Bertsekas, Abstract Dynamic Programming. Athena Scientific, 2018.
[8] V. Mnih et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, p. 529, 2015.
[9] G. Dulac-Arnold et al., "Deep reinforcement learning in large discrete action spaces," arXiv preprint arXiv:1512.07679, 2015.
[10] M. S. Jha, D. Theilliol, G. Biswas, and P. Weber, "Approximate Q-learning approach for health aware control design," in 4th International Conference on Control and Fault-Tolerant Systems (SysTol), Casablanca, Morocco, 18-20 September 2019.
Keywords:
Optimal control, Reinforcement Learning, Health Aware Control Design
Department(s):
Control Identification Diagnosis