Ph. D. Project
Title:
Formal methods for extracting and reusing knowledge from heterogeneous sources for semantic interoperability of distributed architectures
Dates:
2020/09/01 - 2023/08/31
Supervisor(s): 
Other supervisor(s):
TESTE Laurent (Laurent.Teste@snmsf.com)
Description:
Description of the research theme and the associated thesis topic:
Theme and research problem:
This work focuses on the creation of mathematical models and the implementation of intelligent sensors, or
Cyber-Physical Systems (CPS), to enrich the data layer fed from the field. One of the most relevant scientific
challenges is the lack of formalisation (that is, of a mathematical description) of models of systems and of the
information systems that result from them, as well as of the semantics of the concepts and relationships they
apply. Such a formalisation is needed to ensure their common understanding and to facilitate their
interoperability by minimising semantic losses.
State of the art:
In order to make precise and concrete scientific contributions to this project, which is
already underway, an approach to interoperable systems engineering (Ramos, 2011) and (Morel, 2003) will be
used, which consists of relying on different types and levels of abstraction or models. These models should
express and formalize not only the "structural" aspect of the system components, but also their behaviour (Maier,
1998), which may be limited by the specific requirements of the system domain (business rules). Another type of
constraint may be induced by the interoperability protocol(s), which may impose strict rules to endow
interoperable systems with properties such as autonomy, confidentiality and transparency (Zdravković et al,
2016). The objective of this research project is twofold: on the one hand, to model data from heterogeneous
sources and, on the other hand, to study the problems posed by model-driven engineering in cooperative
systems, which involve cooperation among "systems of actors" willing to interoperate. Collaborative systems are now
organized into networks, or complex systems (Camarinha-Matos, 2014). The complex systems envisaged will be
composed of networks of CPSs (intelligent sensors), which will retrieve data together with its context and thus form
information networks (Cardin, 2016). The scientific challenge is thus to make available languages and modeling
tools adapted to each project of systems with distributed architecture, despite the heterogeneity of business skills
and the multidisciplinarity of the domains. This challenge has two dimensions: on the one hand, that of the
capacity of modelling to provide tools for business processes, which requires the definition and formalisation of
their invariants; on the other hand, the study of the conditions of use of models in practice, which is always
evolving and uncertain. Formal concept analysis (FCA) (Priss, 2006) is a useful and powerful tool for formally
describing the links between any objects (which form a context), especially between knowledge objects. This
method is based on lattice theory (Wille, 2009), which can be used to solve problems of interoperability
assessment between information systems within companies. An extension of the FCA mechanisms was
introduced in (Rouane-Hacene et al. 2013) under the name Relational Concept Analysis (RCA); it focuses on
data sets compatible with Entity-Relationship (ER) models (Chen, 1976) or, alternatively, with the Resource
Description Framework (RDF) (Miller, 1998). RCA is a method for extracting conceptual knowledge from
multi-relational data. Linked Open Data has been recognized as a valuable source of background information
for data mining, and knowledge graphs are a way of formalizing this knowledge (Ristoski, 2016). Information
mining is part of the field of study called data mining (Manning et al, 2008); pieces of information that are
related to each other can be studied with the methods of multi-relational data mining (MRDM) (Džeroski, 2003),
which deals with multi-contextual data. The RCA method is not limited to extracting knowledge from separate contexts: it aims to
express knowledge by interoperating the semantics of different contexts, i.e., in addition to extracting knowledge
from a specific context, the data contained in the other contexts are used to enrich the knowledge extraction.
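
As an illustration of the FCA and RCA notions described above, the following minimal Python sketch computes the formal concepts of a small context and then applies a single existential relational-scaling step between two contexts. The contexts (sensors, sites), the relation (installed_at) and the function names (concepts, scale_exists) are hypothetical and chosen only for the example; the enumeration is brute force and is not the method to be developed in the thesis.

```python
from itertools import combinations


def concepts(context):
    """Return all formal concepts (extent, intent) of a formal context.

    `context` maps each object to its set of attributes. Brute force:
    acceptable for toy data, exponential in the number of objects.
    """
    objects = sorted(context)
    attributes = set().union(*context.values())

    def intent(objs):
        # Attributes shared by every object in objs.
        shared = set(attributes)
        for o in objs:
            shared &= context[o]
        return frozenset(shared)

    def extent(attrs):
        # Objects that carry every attribute in attrs.
        return frozenset(o for o in objects if attrs <= context[o])

    found = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            closed = intent(subset)
            found.add((extent(closed), closed))
    return found


def scale_exists(source, relation, target_concepts, name):
    """One existential relational-scaling step (a single RCA iteration).

    An object of `source` receives the relational attribute
    "exists name: C" when it is linked to at least one object in the
    extent of the target concept C.
    """
    scaled = {obj: set(attrs) for obj, attrs in source.items()}
    for target_extent, target_intent in target_concepts:
        label = f"exists {name}: {sorted(target_intent)}"
        for obj in scaled:
            if relation.get(obj, set()) & target_extent:
                scaled[obj].add(label)
    return scaled


# Hypothetical toy data: two contexts and one relation between them.
sensors = {
    "thermometer": {"temperature", "wireless"},
    "snow_probe": {"temperature", "snow_depth"},
    "weather_station": {"temperature", "snow_depth", "wireless"},
}
sites = {
    "summit": {"high_altitude", "exposed"},
    "valley": {"sheltered"},
}
installed_at = {
    "thermometer": {"valley"},
    "snow_probe": {"summit"},
    "weather_station": {"summit"},
}

sensor_concepts = concepts(sensors)
site_concepts = concepts(sites)
scaled = scale_exists(sensors, installed_at, site_concepts, "installed_at")
enriched_concepts = concepts(scaled)
```

In a complete RCA process, the scaling step would be repeated, with the lattices recomputed after each pass, until no new concepts appear. The sketch stops after one pass to keep the principle visible.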
Scientific obstacles addressed by the thesis:
Faced with this challenge, the scientific obstacles concern:

1. The lack of formalization (that is, of a mathematical description) of the aggregation of information in the
models of systems and the information systems that emerge from them, as well as of the semantics of the
concepts and relationships that they implement, which is needed to ensure their common understanding and to
facilitate their interoperation by minimizing semantic losses;
2. The adaptation (or even extension) of tools of an algebraic and/or geometric nature (lattice theory, category
theory, homological algebra) in the context of formal concept analysis, for the processing of
heterogeneous data in constant evolution. This is a recent approach that has not yet been fully developed (even
from a mathematical point of view) for this type of data.

Objectives and contribution to the SNMSF R&D axes:
The proposed thesis falls within the framework of the Strategic Expertise Domain "Industry 4.0". Previous work
has demonstrated the value of a holistic approach to all information resources and has led to the
development of a methodology focused on the optimization of knowledge management. The present proposal
therefore aims to continue this work by developing a formal method for extracting and reusing
knowledge from heterogeneous sources for the semantic interoperability of distributed architectures. This
method will be integrated as a methodological building block in the process of managing information resources
suitable for decision support. It thus constitutes a structuring element of the large SNMSF platform project
"Mon séjour en Montagne" (My stay in the mountains).
Keywords:
Formalization of knowledge, Multi-relational data mining
Conditions:
Three years, from October 2020 to October 2023.
CIFRE thesis with the Syndicat National des Moniteurs du Ski Français (SNMSF), Grenoble and Nancy.
Remuneration at the level of a computer engineer.
Student with strong skills in algorithms and knowledge formalization.
Department(s): 
Eco-Technic systems engineering
Funds:
The funding comes from a grant paid through the ANR CIFRE scheme and from the SNMSF salary.