Ph.D. Project
Reliable SDN automation of time-sensitive and dependable networks
2019/09/30 - 2022/09/30
General context
Networks are currently managed by administrators, who rely on metrology and supervision information to respond to changes in traffic and to failures of the communication infrastructure. A new trend is to automate network management by deploying network controllers capable of dynamically detecting, and even anticipating, changes in the state of the network and of programming its reconfiguration (Software-Defined Networking, SDN [5, 8]). In the field of time-critical and reliable applications, the network is in fact the only subsystem that has not yet been automated. This is all the more detrimental in the context of the Factory of the Future and Industry 4.0, where the requirements for flexibility, reactivity, operational safety and security are ever more stringent. This thesis therefore addresses the safe automation of the networks that carry the communications of highly constrained applications. As noted in [9], the controller is a key element of SDN and is potentially subject to attacks (e.g., malicious traffic leading to Denial of Service [4]) and to failures. The rule generation process can thus be corrupted, leading to inconsistencies in the control and data planes: links and equipment may become congested, or even continue to be used after they have failed. Although internal mechanisms (intrusion detection, attack prevention, rule conflict detection) can reinforce the security of a single controller, multi-controller architectures (HyperVisor, Onix, ONOS, etc.) have been proposed in the literature (see the surveys [12, 6, 3]); they avoid a single point of failure or intrusion and ensure data plane continuity.
Topic
Multiple controllers are traditionally used either to manage large architectures (scaling, load balancing) by partitioning or distributing them into limited administrative domains, or to improve the robustness and security of network control.
This thesis falls into the latter case: security is approached as a problem of unavailability [12] and of trust in the control, so that reliability and security are considered together. Most existing work considers a cluster of controllers (running heterogeneous systems and languages to limit the impact of attacks) and adds either a decision-making interface (a statistics-based vote that flags as dummy the rules generated by only a minority of controllers), a proxy between the southbound interface and the controller(s), or a broker agent computing a reputation indicator for each controller. However, whether the controllers are organized in a flat or a hierarchical topology, their coordination requires synchronization and the exchange of notifications between them [2]. This East-West interface may be unsecured (encryption is then required), requires active connection maintenance (keep-alive messages [1]), and lacks recognized standard protocols [11]. It can therefore be used to convey malicious information, leading to inconsistent rules across controllers. We therefore propose to investigate passive replication (without East-West synchronization, whether in polling or producer/consumer mode [6]), in which the backup controller takes over when the main controller enters a failed or unavailable state (primary-backup replication). This is an asymmetric strategy in which each controller has a predefined role. To detect an abnormal situation (failure or attack), we propose that the backup controller verify the rules in force in real time and asynchronously, by capturing and analysing network traffic, since it places no trust in the main controller. We focus on an Industry of the Future context, where critical traffic is known, at least partially: period, size, sequence of events or even source/destination, and Quality of Service requirements (throughput, delay, loss rate). The control will be declared healthy if and only if the characteristics of this critical traffic are respected.
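The health criterion above (control is healthy if and only if the characteristics of the critical traffic are respected) can be sketched as a predicate over captured traces. This is a minimal illustration, assuming a hypothetical per-flow requirement record (period with a jitter tolerance, maximum frame size, maximum end-to-end delay, maximum loss rate) and captures that already pair source-side and destination-side timestamps; the names `FlowRequirement`, `Observation` and `control_healthy` are ours and do not come from any existing tool.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FlowRequirement:
    """A-priori knowledge of one critical flow (hypothetical record)."""
    flow_id: str
    period_s: float         # nominal emission period
    period_jitter_s: float  # tolerated deviation between two emissions
    max_size_bytes: int     # largest allowed frame
    max_delay_s: float      # end-to-end delay requirement
    max_loss_rate: float    # tolerated fraction of undelivered frames


@dataclass
class Observation:
    """One captured frame of a flow (hypothetical capture format)."""
    sent_s: float                # timestamp at the source-side capture point
    received_s: Optional[float]  # destination-side timestamp, None if never seen
    size_bytes: int


def flow_respected(req: FlowRequirement, obs: List[Observation]) -> bool:
    """True if the captured frames of one flow satisfy all of its requirements."""
    if not obs:
        return False  # assumption: no evidence at all is treated as a violation
    delivered = [o for o in obs if o.received_s is not None]
    if 1 - len(delivered) / len(obs) > req.max_loss_rate:
        return False
    if any(o.size_bytes > req.max_size_bytes for o in obs):
        return False
    if any(o.received_s - o.sent_s > req.max_delay_s for o in delivered):
        return False
    # Consecutive emissions must stay within period +/- jitter.
    sends = sorted(o.sent_s for o in obs)
    return all(abs((b - a) - req.period_s) <= req.period_jitter_s
               for a, b in zip(sends, sends[1:]))


def control_healthy(reqs: List[FlowRequirement],
                    traces: Dict[str, List[Observation]]) -> bool:
    """Control is declared healthy iff every critical flow respects its requirements."""
    return all(flow_respected(r, traces.get(r.flow_id, [])) for r in reqs)
```

For instance, a 10 ms periodic flow with a 2 ms delay requirement is declared healthy only while every captured frame arrives within 2 ms and none is lost; a single late or missing frame makes `control_healthy` return `False`, which would trigger the takeover.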
It is assumed that this information is known a priori to the backup controller (through its northbound interface), which will allow it to take over from the main controller [11]. We propose here to extend the concept of network observability defined in [10, 7]. The first step is to determine the optimal placement of traffic capture agents (considering both control and transport links). The solution must minimize the number of agents and the load they induce on the network, be robust to control planes based on non-deterministic methods, support a given level of redundancy, and minimize the delay for forwarding the captures to the backup controller. This multi-constraint optimization problem can be modelled as a linear program (real variables), an integer linear program, or a mixed-integer linear program, and a heuristic will be proposed. The next step is to determine from the captured traces whether the flows are correctly served by the main controller. Two types of tests are envisaged: consistency tests and timing tests. For consistency, each event associated with these flows will feed a state model (representing packet scheduling and packet locations), which will be checked by model checking, flow by flow, against the requirements of each flow (expressed, for example, with a tagged signal model) in order to verify that no unacceptable state is reached. For timing, the temporal properties (such as end-to-end delays) will be computed from this model and from a reconstruction of the data and control planes (and ultimately of the switching rules defined by the main controller), using deterministic theories (network calculus or the trajectory approach), and compared with the requirements (such as freshness). This comparison must account for the difference between the time at which the backup controller perceives an event and the time at which the packet actually arrived at a given point in the network; this uncertainty should be bounded analytically when optimizing the placement of the capture agents.
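The kind of deterministic computation mentioned for the timing tests can be illustrated with the classic network calculus result for a token-bucket flow (burst b, rate r) crossing a sequence of rate-latency servers (rate R_i, latency T_i): the servers concatenate into one rate-latency server with R = min R_i and T = sum T_i, so the end-to-end delay is bounded by T + b / R whenever r <= R. This is a sketch under stated assumptions: each node's residual service to the critical flow is assumed to be already known as a rate-latency curve (in practice it would follow from the reconstructed switching rules and the competing traffic), the trajectory approach is not shown, and adding the timestamping uncertainty to the bound is our own conservative choice. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TokenBucket:
    """Arrival curve alpha(t) = b + r*t of a critical flow."""
    burst_bits: float  # b
    rate_bps: float    # r


@dataclass(frozen=True)
class RateLatency:
    """Service curve beta(t) = R * max(t - T, 0) offered to the flow by one node."""
    rate_bps: float    # R
    latency_s: float   # T


def end_to_end_delay_bound(flow: TokenBucket, path: Sequence[RateLatency]) -> float:
    """Delay bound for a token-bucket flow crossing rate-latency nodes.

    The nodes concatenate into one rate-latency curve with R = min(R_i) and
    T = sum(T_i) ("pay bursts only once"), giving D = T + b / R when r <= R.
    """
    if not path:
        raise ValueError("the path must contain at least one node")
    r_min = min(node.rate_bps for node in path)
    if flow.rate_bps > r_min:
        raise ValueError("unstable: flow rate exceeds the bottleneck service rate")
    return sum(node.latency_s for node in path) + flow.burst_bits / r_min


def meets_requirement(flow: TokenBucket, path: Sequence[RateLatency],
                      deadline_s: float, capture_uncertainty_s: float = 0.0) -> bool:
    """Conservative check: bound plus timestamping uncertainty must fit the deadline."""
    return end_to_end_delay_bound(flow, path) + capture_uncertainty_s <= deadline_s
```

As a worked example, a 1500-byte burst (12 000 bits) at 1 Mbit/s crossing three nodes that each guarantee 100 Mbit/s after 10 µs gives 30 µs + 120 µs = 150 µs; with a 200 µs requirement, the check still passes for a capture uncertainty of 10 µs but fails for 60 µs, which shows why that uncertainty has to be bounded when placing the capture agents.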
If the requirements are not met, the backup controller will take over (assuming an OpenFlow Master/Slave connection). A reconfiguration solution will have to be proposed, particularly when the controller communicates with the equipment in-band [6], i.e., over the data transport links rather than over dedicated links. The key point to study here is the management of the transient phase: how to ensure the consistency of the tables (and ultimately preserve message ordering and ensure that no packet is lost) when only a subset of the equipment is migrated to the new controller (each device being controlled by a single controller at any time), especially in case of a packet avalanche. It will therefore be necessary to define a migration order for the equipment that preserves the consistency of the planes. More generally, the scalability and the complexity of the selected algorithms will have to be evaluated.
Experimental platform
These studies will be validated in practice on the ReLans platform (located at CRAN and identified as a research platform in the Digitrust project). Composed of 8 Cisco IE3000 switches (supporting PTP, the Precision Time Protocol) and about 100 µ-PCs, it has just been enriched with 5 IE4000 switches supporting TSN (Time-Sensitive Networking) mechanisms and SDN protocols. Each of the reconfiguration algorithms and protocols will be implemented and evaluated on this platform, under failure generation and attack injection.
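Returning to the migration order required during an in-band takeover, one simple ordering heuristic can be sketched. It is an illustration under explicit assumptions, not the ordering the thesis will define: in-band control traffic is assumed to follow shortest-hop paths towards the switch where the backup controller is attached, and we adopt the hypothetical rule that a switch is migrated only after every switch whose control path crosses it. Because any shortest path from a switch only traverses switches closer to the controller, migrating the subset in decreasing hop distance satisfies that rule. Message ordering, table consistency and packet-avalanche handling are outside the scope of this sketch, and the function names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def hop_depths(adjacency: Dict[str, List[str]], root: str) -> Dict[str, int]:
    """BFS hop count from the switch to which the backup controller is attached."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, []):
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return depth


def migration_order(adjacency: Dict[str, List[str]], root: str,
                    to_migrate: Set[str]) -> List[str]:
    """Deepest-first order over the subset of switches to hand over.

    With shortest-path in-band control, every switch whose control path crosses
    switch s is strictly deeper than s, so it is migrated before s is touched.
    Ties are broken by name so that the order is deterministic.
    """
    depth = hop_depths(adjacency, root)
    unreachable = to_migrate - set(depth)
    if unreachable:
        raise ValueError(f"no in-band control path to: {sorted(unreachable)}")
    return sorted(to_migrate, key=lambda s: (-depth[s], s))
```

On a chain s0-s1-s2-s3 with a branch s1-s4 and the backup controller attached at s0, migrating {s1, s2, s3, s4} yields s3, s2, s4, s1: the switch relaying everyone else's control traffic (s1) is handed over last.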

[1] J. Benabbou, K. Elbaamrani, and N. Idboufker. Security in OpenFlow-based SDN, opportunities and challenges. Photonic Network Communications, Oct 2018.
[2] F. Benamrane, M. Ben Mamoun, and R. Benaini. An east-west interface for distributed SDN control plane: Implementation and evaluation. Computers & Electrical Engineering, 57:162-175, 2017.
[3] T. Hu, Z. Guo, P. Yi, T. Baker, and J. Lan. Multi-controller based software-defined networking: A survey. IEEE Access, 6:15980-15996, 2018.
[4] R. Macedo, R. de Castro, A. Santos, Y. Ghamri-Doudane, and M. Nogueira. Self-organized SDN controller cluster conformations against DDoS attacks effects. In 2016 IEEE Global Communications Conference (GLOBECOM), pages 1-6, Dec 2016.
[5] A. Maleki, M.M. Hossain, J.-P. Georges, E. Rondeau, and T. Divoux. An SDN perspective to mitigate the energy consumption of core networks GEANT2. In SEEDS 17, International Conference on Sustainable Ecological Engineering Design for Society, Leeds, United Kingdom, September 2017.
[6] Y.E. Oktian, S. Lee, H. Lee, and J. Lam. Distributed SDN controller system: A survey on design choice. Computer Networks, 121:100-111, 2017.
[7] D. Petit, J.-P. Georges, T. Divoux, B. Regnier, and P. Miramont. Freshness analysis of functional sequences in launchers. In 4th IFAC Symposium on Telematics Application, Porto Alegre, Brazil, November 2016.
[8] D. Petit, J.-P. Georges, T. Divoux, B. Regnier, and P. Miramont. A strategy to implement a software defined networking controller in a space launcher. In 3rd IFAC Conference on Embedded Systems, Computational Intelligence and Telematics in Control CESCIT 2018, Faro, Portugal, June 2018.
[9] C. Qi, J. Wu, G. Cheng, J. Ai, and S. Zhao. An aware-scheduling security architecture with priority-equal multi-controller for SDN. China Communications, 14(9):144-154, Sept 2017.
[10] J. Robert, J.-P. Georges, T. Divoux, P. Miramont, and B. Rmili. On the observability in switched Ethernet networks in the next generation of space launchers: Problem, challenges and recommendations. In SPACOMM 15, 7th International Conference on Advances in Satellite and Space Communications, pages 13-18, Barcelona, Spain, April 2015.
[11] F. Shang, Y. Li, Q. Fu, W. Wang, J. Feng, and L. He. Distributed controllers multi-granularity security communication mechanism for software-defined networking. Computers & Electrical Engineering, 66:388-406, 2018.
[12] Y. Zhang, L. Cui, W. Wang, and Y. Zhang. A survey on software defined networking with multiple controllers. Journal of Network and Computer Applications, 103:101-118, 2018.
Eco-Technic systems engineering