Ph. D. Project
Selection and analysis of models for biological networks: use of different heterogeneous knowledge for biological networks disrupted by pathologies
2018/10/01 - 2022/08/31
Other supervisor(s):
Scientific framework:
Biological systems are very complex compared to human-made systems. Developing a dynamic model of a cell in its entirety remains utopian to this day. However, the understanding of the networks operating in a cell for the regulation of gene expression or for signalling makes it possible to better understand the phenomena that lead to a disease.
The control theory viewpoint seems relevant for analysing the structure of biological systems because it consists of breaking down a complex system into a set of subsystems with good local properties, and then studying a posteriori the global properties that arise from the interconnection of these subsystems. Much work has been done in recent years on the construction and simulation of biological networks from experimental data. Various formalisms have been proposed to model these complex biological systems: Boolean networks, Bayesian networks, Petri nets, ordinary differential equations (which can give nonlinear or linear time-varying models), and even systems of stochastic equations. Each formalism is more or less suited to expressing the specific characteristics of a particular type of network (signalling, regulation, or metabolism). Once the formalism has been chosen, a modelling approach can be used to build one or more models from experimental data. The chosen model must, however, be validated before being used for simulation or prediction.
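As a small illustration of the simplest of these formalisms, the sketch below enumerates the attractors of a synchronous Boolean network. The two-gene mutual-inhibition rule and the names `toggle` and `attractors` are assumptions chosen for illustration; they are not taken from the project.

```python
from itertools import product

def toggle(state):
    """Hypothetical two-gene rule: each gene is active iff the other is inactive."""
    a, b = state
    return (not b, not a)

def attractors(update, n):
    """Enumerate the attractors of a synchronous Boolean network with n nodes.

    Every one of the 2**n states is iterated until a state repeats; the
    repeating part of the trajectory is the attractor it falls into.
    """
    found = set()
    for start in product((False, True), repeat=n):
        trajectory = []
        state = start
        while state not in trajectory:
            trajectory.append(state)
            state = update(state)
        cycle = trajectory[trajectory.index(state):]
        # Rotate the cycle so that it starts at its smallest state; this gives
        # a canonical form, so the same cycle is not counted twice.
        pivot = cycle.index(min(cycle))
        found.add(tuple(cycle[pivot:] + cycle[:pivot]))
    return found

if __name__ == "__main__":
    for cycle in sorted(attractors(toggle, 2), key=len):
        kind = "fixed point" if len(cycle) == 1 else f"cycle of length {len(cycle)}"
        print(kind, cycle)
```

Exhaustive enumeration is only feasible for very small networks, since the state space doubles with each added node; it is used here purely to make the notion of attractor concrete.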

In addition to the need to convert one formalism into another, studies focus on the integration of different types of biological networks (as in a cell, for example) where each network is modelled in a specific formalism. It should be noted that differential equations constitute a generic formalism that makes it possible to build signalling, regulation and metabolic networks from experimental data. These equations can be represented as graphs on which a structural or topological analysis can be carried out, making it possible, for example, to estimate the degree or strength of coupling or decoupling of sub-networks, and to determine the number of stable points, the subdivision and the hierarchy of the networks. When applied to gene expression, this type of analysis should lead to the characterization of existing regulations between genes, allowing us to answer in a generic way the problems of direct and reverse control: if one acts on a set of genes, what will be the consequences? If we want to modify the expression of a set of genes, which actions make it possible? Various control strategies, whose objective is to intervene in the control of networks in order to avoid undesirable cell states, or to force the network to converge towards a desired state, have been proposed by taking inspiration from optimal control theory.
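The structural reading of these two questions can be sketched on a toy influence graph, where an edge u -> v means that variable u appears in the differential equation of v. The gene names, the graph and the function names below are hypothetical, and the analysis is purely topological: it says which genes can be affected, not by how much.

```python
from collections import deque

# Hypothetical influence graph (illustrative only): g1, g2 and g3 form a
# feedback loop that drives g4, which in turn drives g5.
INFLUENCES = {
    "g1": ["g2"],
    "g2": ["g3"],
    "g3": ["g1", "g4"],
    "g4": ["g5"],
    "g5": [],
}

def reverse_graph(graph):
    """Return the graph with every edge flipped."""
    reverse = {node: [] for node in graph}
    for node, successors in graph.items():
        for nxt in successors:
            reverse[nxt].append(node)
    return reverse

def downstream(graph, sources):
    """Direct question: which other nodes can be affected by acting on `sources`?"""
    seen = set(sources)
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - set(sources)

def upstream(graph, targets):
    """Reverse question: which other nodes can influence `targets`?"""
    return downstream(reverse_graph(graph), targets)

def strongly_connected_components(graph):
    """Kosaraju's algorithm: groups of nodes that all influence one another."""
    order, seen = [], set()

    def visit(node):
        seen.add(node)
        for nxt in graph[node]:
            if nxt not in seen:
                visit(nxt)
        order.append(node)  # post-order position

    for node in graph:
        if node not in seen:
            visit(node)

    reverse = reverse_graph(graph)
    components, assigned = [], set()
    for node in reversed(order):
        if node in assigned:
            continue
        component, stack = set(), [node]
        assigned.add(node)
        while stack:
            current = stack.pop()
            component.add(current)
            for prev in reverse[current]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        components.append(component)
    return components

if __name__ == "__main__":
    print(downstream(INFLUENCES, ["g4"]))
    print(upstream(INFLUENCES, ["g4"]))
    print(strongly_connected_components(INFLUENCES))
```

The strongly connected components give one possible subdivision into sub-networks, and the order in which they feed one another gives a hierarchy; quantifying the strength of coupling would additionally require the model parameters.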
The complexity of modelling biological networks should not obscure the existence of large amounts of data and annotations in biological databases. Indeed, it is now possible not only to exploit a wide variety of results from past biological experiments, but also to access and use annotations and models that have already been described. Once the resources for a given problem have been identified, a KDD (Knowledge Discovery in Databases) process can be implemented to derive from these resources the knowledge needed to solve the problem. Recent years have seen the rise of linked open data (LOD), especially in the field of biology. This data is represented in semantic web languages (RDF, RDFS) and is described with minimal semantics, which facilitates its integration into Web Ontology Language (OWL) knowledge bases. It is then possible to organize this data in a more expressive formalization of domain knowledge and to apply inference mechanisms in the service of problem solving or decision support.
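To give a concrete idea of such an inference mechanism, here is a minimal sketch of forward chaining over RDF-style triples, restricted to two RDFS entailment rules: transitivity of `rdfs:subClassOf` and inheritance of `rdf:type` along it. The `ex:` identifiers and the function name are hypothetical, and plain Python tuples stand in for a real triple store or reasoner.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# Hypothetical facts, written as (subject, predicate, object) triples.
FACTS = {
    ("ex:ReceptorVariant", TYPE, "ex:OestrogenReceptor"),
    ("ex:OestrogenReceptor", SUBCLASS, "ex:NuclearReceptor"),
    ("ex:NuclearReceptor", SUBCLASS, "ex:TranscriptionFactor"),
}

def rdfs_closure(triples):
    """Apply the two rules repeatedly until no new triple is produced."""
    inferred = set(triples)
    changed = True
    while changed:
        new = set()
        for s, p, o in inferred:
            if p != SUBCLASS:
                continue
            for s2, p2, o2 in inferred:
                # (s subClassOf o) and (o subClassOf o2) => (s subClassOf o2)
                if p2 == SUBCLASS and s2 == o:
                    new.add((s, SUBCLASS, o2))
                # (s2 type s) and (s subClassOf o) => (s2 type o)
                if p2 == TYPE and o2 == s:
                    new.add((s2, TYPE, o))
        changed = not new <= inferred
        inferred |= new
    return inferred

if __name__ == "__main__":
    for triple in sorted(rdfs_closure(FACTS) - FACTS):
        print(triple)
```

The nested loop is quadratic in the number of triples, which is acceptable for a toy example only; dedicated reasoners index triples by predicate and implement the full rule sets.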

Relevance, originality and objectives
This thesis proposal is motivated by two obstacles faced by current approaches to modelling biological networks. The first is that it is difficult to construct a complete descriptive model of a biological network when data is incomplete or uncertain. We propose to introduce the notion of an oriented model: a model whose construction is guided by its specific objective, starting from a number of identified phenomena, possibly given as a set of protagonists and known parameters (genes, proteins, molecules, situations, pathology, environment, treatment, ...) for which experimental observation data is available.
The second obstacle lies in the fact that many candidate models can be built from a single set of experimental data. A manual analysis by biologists, drawing on their expertise in the specific field of study, is then needed to choose which model seems most promising. Examples of interesting work in this direction include model checking techniques for validating, in complex networks, properties of interest from a biologist's point of view, as well as appropriate evaluation methods.

The aim of the PhD thesis is to formalize and evaluate the notion of oriented models with known methods for building biological networks, and to design and test mechanisms for automated model reduction or selection guided by formalized knowledge, using semantic web languages with formal semantics such as RDF(S), OWL 2 EL, OWL 2 QL, and OWL 2 RL.

This doctoral project is both interdisciplinary and ambitious. It will provide dual expertise: in data and knowledge management on the one hand, and in the construction of quantitative models from experimental data and the analysis of their structural properties on the other.

Application framework:
This project can be applied to the study of various types of regulatory and signalling networks of interest in cancer. Building on work already undertaken, it will be possible to begin by modelling the known receptor regulatory networks, in order to better understand those that specifically involve an oestrogen receptor variant described as a marker of poor prognosis but whose regulation and activity remain unknown (or not well described).
The methodology developed to construct and validate oriented models of the genetic regulation network of oestrogen receptors can also be tested with the data relating to the mineralocorticoid receptor (MR), in the case of heart failure in the context of the Hospital Research Project "Fight Heart Failure" in which two LORIA teams are involved.
model selection, Bayesian networks, biological databases, structural analysis
Control Identification Diagnosis
Biology, Signals and Systems in Cancer and Neuroscience
The thesis is co-funded by Federation Charles Hermite and Région Grand-Est