CRAN - Campus Sciences
BP 70239 - 54506 VANDOEUVRE Cedex
Tel: +33 (0)3 72 74 52 90
Thesis topic: Modelling of environmental systems from measured data
Dates: 2014/10/01 - 2017/09/30
CRAN supervisor(s): Marion GILSON-BAGREL, Vincent LAURAIN
Other supervisor(s): Lorenzo LEIJA-SALAS, Prof., CINVESTAV-IPN, Mexico City
Description:

1 Scientific context

In environmental sciences today, a central concern is the ability to forecast the short-, medium- and long-term impact of current trends. The problem is crucial in many applications, such as global warming and air or water pollution. However, the precision of any answer is closely linked to how well the behaviour of the environmental system under consideration, and the dynamical relationships between its variables, are understood. This behaviour is almost always approximated by models built from physical knowledge, which are then used to forecast outputs or to simulate different scenarios. Obtaining such models is tedious work for specialists, requires considerable a priori knowledge of the studied system, and leads to complex models that are hard to use in practice. Indeed, given the complexity of the structures involved, the number of intricate behaviours and the missing information, it is almost impossible to tune a given model so that its behaviour fits the available data. Forecasts are therefore highly uncertain, which leaves environmentalists with no choice but to simplify the model structures or to create new ones dedicated specifically to the data under study. Such approximations and simplifications are nevertheless not easy, as they depend on the geographic scale of the system, its geological structures and human activities. This leads to the scientific question behind the proposed research topic: how can simplified models be built in an environmental context from measured data?

2 Goal

For data-driven dynamical modelling, an innovative solution is to use system identification theory. Although the underlying theory is well established, several issues remain wide open when it is applied to environmental data. Studying these issues is the main objective of this Ph.D. thesis:

- The automated choice of a suitable model structure: it is well known that environmental systems behave mostly nonlinearly and often in a time-varying manner (due to human activities, seasonal variations, etc.). The immediate question is therefore how to choose a suitable model structure. Moreover, in this environmental context, the model should not only reproduce the behaviour of the data but also be simple and offer physical insight that environmentalists can interpret in order to validate the approach. These constraints call for innovative identification methods able to optimise a model structure based not solely on the data but also on an "interpretability" factor. For example, many approaches have been proposed in the literature to enforce sparsity and therefore simplicity. They are nevertheless not really applicable in the present context because of the second main issue with environmental data: the noise. Indeed, these approaches require restrictive assumptions on the noise, which are usually violated in environmental applications.

- Coping with missing data: unlike in many industrial applications, the noise in environmental data plays a major role and does not stem mainly from measurement quality. Most often, environmentalists observe and measure a few variables of interest but cannot possibly acquire all the data related to the studied phenomenon. Data acquisition therefore does not match the automatic sensor setup used in many industrial processes, on which system identification usually relies. This results in unusual datasets:
- The dataset is only partial (unmeasured variables, noisy or irregularly sampled measurements, etc.).
- The noise behaviour is often dynamical, nonlinear and time-varying.
Since the proposed method will be based on measured data, the solutions will need to cope with these issues, which are general open challenges in system identification.

- Applying the proposed methods to real data: the main application considered in this project is water pollution by nitrogen in agricultural regions. The use of nitrogen fertilisers has become a major concern for the stakeholders in charge of drinking water sources in agricultural regions. It is therefore becoming essential to provide predictions of the water pollution level in order to support their decisions regarding agricultural policies. This Ph.D. thesis aims to propose innovative solutions in a discipline dominated by physical modelling and currently facing strong societal issues.

In conclusion, the final goal of this thesis is to propose and develop new approaches that are robust to the aforementioned problems and able to estimate physically interpretable models of environmental processes from real-life data.
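
To make the structure-selection and missing-data issues concrete, here is a minimal sketch of sparsity-enforcing identification on synthetic data. It is not the method of the thesis. Everything in it is an assumption made for illustration: the first-order ARX system (coefficients 0.8 and 0.5), the white Gaussian noise, the 10% of output samples missing at random, the penalty weight lam, and the helper names build_regressors and lasso_ista. These are precisely the idealised noise and sampling conditions that environmental data usually violate.

```python
import numpy as np

def build_regressors(y, u, n_lags):
    """Candidate regressors for y[k]: y[k-1..k-n_lags] and u[k-1..k-n_lags]."""
    N = len(y)
    cols = [y[n_lags - i:N - i] for i in range(1, n_lags + 1)]
    cols += [u[n_lags - i:N - i] for i in range(1, n_lags + 1)]
    names = [f"y[k-{i}]" for i in range(1, n_lags + 1)]
    names += [f"u[k-{i}]" for i in range(1, n_lags + 1)]
    return np.column_stack(cols), y[n_lags:], names

def lasso_ista(X, t, lam, n_iter=5000):
    """Minimise (1/2n)||t - X theta||^2 + lam*||theta||_1 by iterative soft thresholding."""
    n = len(t)
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the quadratic term
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = theta - step * (X.T @ (X @ theta - t) / n)  # gradient step
        theta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return theta

# Hypothetical "true" system: y[k] = 0.8*y[k-1] + 0.5*u[k-1] + e[k]
rng = np.random.default_rng(0)
N = 3000
u = rng.standard_normal(N)
e = 0.3 * rng.standard_normal(N)
y = np.zeros(N)
for k in range(1, N):
    y[k] = 0.8 * y[k - 1] + 0.5 * u[k - 1] + e[k]

# Assumed measurement campaign: 10% of the output samples are missing at random.
y_meas = y.copy()
y_meas[rng.random(N) < 0.1] = np.nan

# Over-parameterised candidate structure (3 lags of y and u);
# rows touching a missing sample are simply discarded.
X, t, names = build_regressors(y_meas, u, n_lags=3)
keep = ~np.isnan(X).any(axis=1) & ~np.isnan(t)
theta = lasso_ista(X[keep], t[keep], lam=0.02)

for name, coef in zip(names, theta):
    print(f"{name}: {coef:+.3f}")
```

Under these benign assumptions, the L1 penalty tends to shrink the coefficients of the superfluous lags towards zero, which keeps the retained structure simple and readable. Discarding incomplete rows is only defensible when samples are missing at random, and the sketch does nothing about irregular sampling or coloured, time-varying noise, which are the open problems listed above.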

3 Suitable profile

Applicants should have a background in automatic control and, if possible, in system identification. A background in optimisation and/or machine learning is also suitable. Moreover, applicants must have a good level of English, as stays abroad will be strongly encouraged during the Ph.D. thesis. Finally, applicants should be autonomous, rigorous and willing to share with and learn from other disciplines, in this case hydrology and agronomy.
For any further information and publications about ongoing work:
Keywords: System identification, water quality, optimisation, machine learning
Department(s):
Contrôle Identification Diagnostic
Funding: UL doctoral contract of the Ecole Doctorale IAEM Lorraine