1. Introduction
The future digitalization of the process industry, which connects more advanced and powerful technologies such as cloud computing and machine learning, is expected to increase production efficiency. In response to the envisioned effects of integrating new technologies into automation systems, several initiatives have been formed, including Industry 4.0, Industrial Cyber-Physical Systems (CPSs), and Industrial IoT systems, to mention a few [1,2,3,4]. Furthermore, research towards closing the large interoperability gap is ongoing, for example, on OPC Unified Architecture (OPC UA) [5,6]. To harvest the envisioned effects, it has to be assumed that any information can easily be accessed from anywhere, regardless of whether it resides in Information Technology (IT) or Operational Technology (OT) systems. Therefore, it is paramount to reduce or even eliminate the boundaries between IT and OT subsystems to pave the way for innovations, new products, and services towards higher automation levels.
However, with the existing traditional architecture, the automation pyramid, the possibilities are limited. The automation pyramid, illustrated in Figure 1, shows a typical example of today’s architecture in existing industrial automation installations. This architecture has been widely adopted, evolved, and implemented over the last 30 years, accompanied by hierarchical, diverse, and domain-specific communication structures. This implies that there is a large installed base in which the automation systems and network infrastructures cannot easily be replaced with the latest technologies in a single maintenance task. To increase the probability of adoption in the installed base, e.g., brown-field installations, a stepwise introduction would be beneficial. The COVID-19 pandemic has further increased the need for stepwise upgrades because not all personnel can be on-site, and many therefore need to carry out their daily tasks remotely.
As the traditional automation pyramid serves its purpose with respect to safety, fault containment, and enforcing traffic types into dedicated levels in order to meet requirements such as availability, deterministic behavior, and high throughput, the integration of new high-level functionality is hard to achieve. New functionality that requires (new) information from the factory floor imposes certain challenges. This is mainly because the networks at the lowest levels have real-time requirements, and in order to guarantee deterministic behavior, other unrestricted traffic is typically not allowed, or not even possible.
The desired network architecture design that accommodates both IT and OT traffic on the same platform is illustrated in Figure 2, where a single bus is shared by different services on time-stringent OT network components (e.g., Field Communication Interface (FCI), dedicated I/O devices, Centralized Network Configuration (CNC), Human Machine Interface (HMI)) and IT network entities (e.g., Enterprise Resource Planning (ERP), Manufacturing Execution Systems (MES)). In fact, customers aim to increase operational efficiency by decreasing downtime, enabling interoperability between best-of-breed solutions, and increasing flexibility and portability. This translates into collapsing multiple single-purpose networks that carry dedicated protocols into one general-purpose network that can accommodate both IT and OT traffic in the same infrastructure.
However, deploying converged networks imposes new challenges and ways of working. In particular, priorities, Quality of Service (QoS) levels, and the avoidance of bottleneck links have to be carefully handled, since all traffic classes and traffic types need to coexist while meeting the sum of all requirements for a sustainable production plant. This task may be difficult even in a green-field scenario, where the latest technology can be installed and commissioned without previous systems that impose constraints. In the brown-field scenario, it is even more challenging, as one may not know what traffic exists and where, or how close to the capacity ceiling the network is. Having actual traffic measurements from actual installations will guide the research in the green-field scenario and is essential in brown-field scenarios. By ensuring the overall network performance, it would be possible to enable new services and business opportunities by unlocking stranded information and letting, for example, mobile operators and maintenance personnel access relevant information from the converged network. This is not an easy task in itself. However, the Return on Investment (RoI) of the installed base has to be considered as well to enable a step-by-step transition towards the future industrial networks.
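To make the notion of a capacity ceiling concrete, the following minimal sketch sums the offered load of periodic flows on each link and flags links above a chosen utilization threshold. The `Flow` fields, the link names, and the 50% default threshold are illustrative assumptions and are not values taken from the case study.

```python
from dataclasses import dataclass

# Per-frame Ethernet overhead that most captures do not show:
# preamble and start-of-frame delimiter (8 bytes) plus inter-frame gap (12 bytes).
WIRE_OVERHEAD_BYTES = 20


@dataclass(frozen=True)
class Flow:
    name: str
    frame_bytes: int   # frame size as captured (headers included)
    period_s: float    # sending period in seconds
    links: tuple       # names of the links the flow traverses (hypothetical)


def link_utilization(flows, capacity_bps):
    """Return {link: fraction of capacity used} for every link used by `flows`."""
    load_bps = {}
    for flow in flows:
        bps = (flow.frame_bytes + WIRE_OVERHEAD_BYTES) * 8 / flow.period_s
        for link in flow.links:
            load_bps[link] = load_bps.get(link, 0.0) + bps
    return {link: load / capacity_bps[link] for link, load in load_bps.items()}


def bottlenecks(utilization, threshold=0.5):
    """Return the links whose utilization exceeds `threshold` (assumed limit)."""
    return sorted(link for link, u in utilization.items() if u > threshold)
```

In a brown-field network, the flow list would have to come from traffic captures, since the existing traffic is usually not documented in this form.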
In addition, the integration of IT and OT in the traditional hierarchical architecture is hard to achieve, since bypassing levels is not supported by the system architecture. Besides, identifying the gap between the new architecture and the traditional one is exceptionally challenging. This makes it difficult for the existing architectures to simply adapt to the evolving opportunities in ICT [7].
This paper aims to discuss challenges that will be encountered through the evolution process of the industrial networks towards converged networks. By analyzing real traffic from a typical installation in the process industry, we present challenges that have to be addressed and solved. Based on the identified challenges and gaps, we shed some light on future research directions to safeguard the installed base and the RoI in the process industry.
The next section presents an analysis of the current state of a control network and the characteristics of the existing traffic flows in a case study. Section 3 gives an overview of the relevant technologies that aim to realize the IT/OT integration. Section 4 discusses the challenges along the way of the network evolution with respect to engineering guidelines, security, engineering tools, reliability, distributed real-time systems, and synchronization. Building on these challenges, Section 5 identifies some research directions that can address them, before Section 6 concludes the paper.
3. Technology Enablers for Future Control Architecture
The fundamental technology drivers that will make the future industrial network possible lie in three main domains: (i) deterministic communication infrastructure, (ii) edge/fog computing, and (iii) security.
Deterministic communication infrastructures are expected to be realized with Time-Sensitive Networking (TSN), a set of standards under development within IEEE. TSN aims to provide a networking solution for closed-loop control systems that supports mixed traffic in critical applications such as transportation (vehicular/automotive networks, railway, and autonomous cars), motion control, power utility automation, and industrial distributed control systems [8]. Another technology driver is fog/edge computing, which extends cloud computing capabilities in terms of storage, computation, and networking to the edge of the network [9]. Fog computing is the leading architectural enabler supporting low latency, low power consumption, high reliability, and location awareness, which might be of interest in the aforementioned critical applications. Security is another crucial enabler of future industrial automation. In traditional industrial networks, the main security concern is usually tied to safety, where the main goal is to protect humans and machines against the consequences of system failures [10]. With the integration of information technology into industrial control systems, protection against cyberattacks has become the major design goal of Industrial IoT systems [11].
In this paper, we focus on the core functionality of enabling IT traffic in the OT networks. The main challenge is to find means to accommodate an unknown amount of IT traffic while preserving the real-time guarantees for OT traffic in the network. TSN is a developing technology that is expected to have a significant impact on the realization of such functionality while providing a deterministic communication infrastructure. Unfortunately, there is still a lack of corresponding technology candidates for edge/fog computing and security solutions. We believe that when IT and OT convergence is in place, the focus can shift to higher-level functionality and new ways of deploying automation systems for process automation, such as edge and cloud solutions. However, history has proven that security cannot easily be retrofitted, and thus, it has to be taken into account from the start.
3.1. TSN
TSN is a set of IEEE 802 Ethernet substandards developed by the Time-Sensitive Networking task group of the IEEE 802.1 working group [12] with the aim of enabling deterministic real-time communication over Ethernet. TSN achieves determinism over Ethernet using a set of tools that can be partitioned into four main domains: (i) time synchronization, (ii) bounded low latency, (iii) ultra reliability, and (iv) resource dedication. Table 1 lists a number of standards developed in each of these four domains. The time synchronization domain provides the network with synchronized clocks; IEEE 802.1AS-Rev is the main standard in this domain. The bounded low latency domain is responsible for determinism: it guarantees that data are available at the expected time by applying different scheduling and forwarding techniques. The ultra reliability domain targets the reliability of the system and focuses on errors, faults, and redundancy techniques to keep the system dependable. The resource dedication and application programming interface (API) domain enables the high-level planning and configuration required to allow system-wide capabilities in heterogeneous networks.
The ongoing work on TSN promises hard real-time capabilities. It is also claimed that TSN can reserve bandwidth exactly according to the application latency requirement and consequently enable the convergence of different networks into one common network that transmits time-sensitive control data together with best-effort data and data with soft real-time requirements. Although this can be seen as a real game-changer for real-time automation applications, it is still unclear how to efficiently select a set of TSN standards and tune the relevant settings.
In order to derive and develop technical solutions that will enable a transition towards an information-centric architecture, we need to consider how the process industries are established and how they evolve to stay profitable and competitive. The most obvious scenario that comes to mind is when a brand new production site is built. In this scenario, the automation system providers compete with their latest and most advanced products and system solutions to meet the site owner’s needs. This is referred to as a green-field installation. Another scenario in process automation, and probably the most common one, is that of a long-established company with one or more large-scale production sites. Such established companies are most often either modernizing or extending their production capacities or product ranges. In this scenario, there is already a large installed base of automation equipment, as well as existing network infrastructures, in place. This is referred to as a brown-field installation. Based on the two scenarios described above, one can identify the need for a technology migration path in order to achieve the market penetration required to justify technology investments. In fact, the transition from the de facto hierarchical system architecture is essential for business and technology success.
Standardization of communication protocols used in hierarchical industrial systems is a critical issue. Today’s control systems widely use Profinet and OPC Unified Architecture (OPC UA) to meet the communication requirements of different levels. OPC UA has its strengths in vertical communication between devices of different levels and in controller-to-controller communication at the control level. At the same time, Profinet meets all the communication requirements in the field network. Today’s network only allows Profinet as the real-time-capable protocol (besides TCP/IP-based traffic). TSN will make it possible to converge different communication means by simultaneously running multiple real-time-capable protocols in a single converged network. The IEC/IEEE 60802 TSN Industry Automation profile proposes a flattening of networks so that Profinet and OPC UA can operate on the same physical network along with IT data communication such as cloud and video traffic. With TSN integration, the field-level device data and diagnostic information can be collected by controllers from the field network using Profinet over TSN. OPC UA over TSN can then deliver the aggregated information to higher-level systems such as ERP, MES, and the cloud to support informed decisions.
To achieve a flattened network architecture, the network infrastructure needs to cope with the aggregated traffic while preserving the requirements of each traffic flow. As the primary goal of process industries is to produce goods, all the functions are required to operate flawlessly with high availability. If one function fails, the most common outcome is that the process cannot continue to operate. This implies that the requirements and characteristics of different types of traffic flows need to be studied in relation to other traffic. Furthermore, a high-level classification of the various traffic flows can be made by grouping them into different traffic types. The traffic types can then be mapped to network functionality that enables the desired characteristics.
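As an illustration of this two-step idea (group flows into traffic types, then map each type to network functionality), consider the following minimal sketch. The type names, the 2 ms limit, and the mapping to mechanisms are hypothetical placeholders chosen for the example; they are not the classification proposed in the standardization work.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Flow:
    name: str
    period_ms: Optional[float]  # None means sporadic (event-driven) traffic
    loss_tolerant: bool


# Hypothetical mapping from traffic type to a candidate network function.
MECHANISM = {
    "isochronous": "scheduled traffic (time-aware gates)",
    "cyclic": "scheduled traffic or strict priority",
    "events": "strict priority",
    "best-effort": "no reservation",
}


def classify(flow, isochronous_limit_ms=2.0):
    """Assign a flow to one of four illustrative traffic types."""
    if flow.period_ms is not None:
        if flow.period_ms <= isochronous_limit_ms and not flow.loss_tolerant:
            return "isochronous"
        return "cyclic"
    return "best-effort" if flow.loss_tolerant else "events"


def group_by_type(flows):
    """Return {traffic type: [flow names]} for the given flows."""
    groups = {}
    for flow in flows:
        groups.setdefault(classify(flow), []).append(flow.name)
    return groups
```

In practice, the classification rules depend on the application, and a single flow attribute pair such as the one used here is unlikely to be sufficient.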
3.2. TSN in the Process Industry
The standardization of network traffic types is still in progress within the joint IEC/IEEE 60802 project. Table 2 shows the most developed proposal for traffic classification presented to IEC/IEEE 60802 via the Industrial Internet Consortium (IIC) [13]. However, the values for period, tolerance to loss, and criticality of each traffic type may vary between applications and may not strictly follow those listed in Table 2. Mapping the various traffic types to TSN mechanisms that fulfill the requirements of each traffic type is also ongoing in IEC/IEEE 60802, with input from the Shapers Initiative via the Open DeviceNet Vendors Association (ODVA). Table 3 aims to derive the specific TSN mechanisms together with the appropriate recommendation type based on the required QoS level.
As can be seen from Table 2 and Table 3, the specification is on its way, but more work is needed to remove ambiguities. For instance, from a process automation perspective, support for brown-field installations seems to be missing. Moreover, as can be seen from the data set captured at Iggesund Mill in Section 2, it is far from trivial to map existing communication flows to dedicated TSN mechanisms. It is by no means a quick exercise, and it requires a lot of domain knowledge to draw the correct conclusions. The final performance of TSN networks will depend heavily on the input parameters, and those parameters might be difficult to derive. Since several parameters have to be derived for each communication flow, the engineering effort quickly becomes infeasible without excellent tool support and engineering guidelines.
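One small part of this parameter derivation can be sketched in code. Assuming that the frames of a single periodic flow have already been isolated from a capture, the function below estimates its period, jitter, and maximum frame size. The function name, the use of the median as period estimate, and the definition of jitter as the largest deviation from that period are assumptions made for this example, not the procedure used in the case study.

```python
from statistics import median


def estimate_flow_parameters(timestamps_s, frame_sizes):
    """Estimate period, jitter, and maximum frame size of one periodic flow.

    timestamps_s: capture timestamps of consecutive frames, in seconds.
    frame_sizes:  size in bytes of each captured frame.
    """
    if len(timestamps_s) < 3 or len(timestamps_s) != len(frame_sizes):
        raise ValueError("need at least three frames with matching sizes")
    gaps = [b - a for a, b in zip(timestamps_s, timestamps_s[1:])]
    period = median(gaps)  # robust against a few delayed frames
    jitter = max(abs(gap - period) for gap in gaps)
    return {
        "period_s": period,
        "jitter_s": jitter,
        "max_frame_bytes": max(frame_sizes),
    }
```

Even this simple estimate presupposes domain knowledge, for example to decide which frames belong to the same flow and whether the flow is periodic at all.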
4. Evolution Challenges
This section focuses on main evolution challenges including engineering guidelines, tools, security, reliability, and time-synchronization issues and describes the real-time challenges in distributed systems. We also elaborate on possible directions for research in the area of IT/OT convergence.
4.1. Engineering Guidelines
Despite all the benefits TSN promises, they cannot be harvested with just a click. Green-field installations can adopt TSN in a revolutionary way, since they have the greatest flexibility to choose from all the available TSN functionalities and devices, starting from a clean-slate network that includes TSN-aware end-points. For brown-field installations, managing all the traffic after the traffic analysis is another challenge, because the legacy end-devices may not support the TSN capabilities or management functions needed to apply the TSN tools directly. Furthermore, engineers need to derive several parameters, e.g., deadlines, jitter, and packet sizes, to create network schedules, regardless of whether the installation is green-field or brown-field. With good tool support, green-field installations can in any case be streamlined compared to brown-field installations. It takes effort to come up with evolutionary technologies for brown-field networks, and these should be compatible with other TSN technologies that may be deployed in the future. Difficulties are also foreseen from the personnel perspective, especially between IT and OT staff. With the integration of the IT and OT networks, they will inevitably interact more often and will probably disagree about the responsibility for some new issues emerging in the integrated network (e.g., reserving time slots, tuning maximum burst rates per link, and setting stream identifiers), as well as about best practices.
4.2. Security Challenges
The transition to more open network architectures, combined with Big Data and Cloud computing, will bring profound opportunities to smart manufacturing systems. At the same time, new security challenges arise with billions of smart devices interconnected in the world of Industrial IoT. The biggest security concern comes from connecting IoT devices, including sensors, actuators, and edge-computing units, with existing controllers and end devices in automation and manufacturing information networks. These challenges come, of course, on top of applying secure protocols like OPC UA and the necessity of securing other protocols like Modbus TCP, which are separate essential topics to address. The existing OT and IT security approaches and policies [14,15] will need to be adapted to embrace these new IoT security challenges. One important direction is authorization management, which assigns different access levels so that only the necessary data from the OT domain can be accessed. From the device perspective, smart industrial devices have much smaller footprints in terms of computing power and operating systems. The convention in the traditional automation network assumes that no software updates or patches are needed once the devices are installed, which makes them an important attack surface that is vulnerable to new types of malware or denial-of-service attacks.
4.3. Engineering Tool Challenges
In TSN, the promised performance relies on traffic engineering and scheduling, which in turn rely on two TSN entities: (i) Centralized User Configuration (CUC) and (ii) Centralized Network Configuration (CNC), as specified in IEEE 802.1Qcc. The CUC is responsible for discovering the end devices, collecting the requirements and properties of every TSN flow from them, e.g., the packet size, cycle time, and end-to-end latency, and sending the collected information to the CNC. The CNC, which holds the view of the network’s physical topology, executes the scheduling and returns the decision of whether a TSN flow can be accepted. For an accepted TSN flow, the corresponding end-to-end path is sent to the CUC together with the schedule along the path. The schedule is also sent from the CNC to the bridges along the end-to-end path via network management protocols, e.g., NETCONF.
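The accept-or-reject decision described above can be illustrated with a strongly simplified sketch. It assumes that a flow request carries the parameters named in the text (packet size, cycle time, end-to-end latency), that the path is a hop-count shortest path, that each hop adds a fixed delay, and that only a fixed fraction of each link may be reserved. The class names, the 75% reservable fraction, and the per-hop delay model are assumptions for this example; an actual CNC would compute gate schedules rather than check bandwidth alone.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRequest:
    talker: str
    listener: str
    frame_bytes: int
    cycle_s: float
    max_latency_s: float


class Cnc:
    """Toy admission control: shortest path, latency bound, bandwidth check."""

    def __init__(self, links, capacity_bps, hop_delay_s, reservable=0.75):
        self.adj = {}
        self.free_bps = {}
        for a, b in links:  # undirected links between node names
            self.adj.setdefault(a, []).append(b)
            self.adj.setdefault(b, []).append(a)
            self.free_bps[frozenset((a, b))] = capacity_bps * reservable
        self.hop_delay_s = hop_delay_s

    def _shortest_path(self, src, dst):
        prev = {src: None}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            if node == dst:
                path = []
                while node is not None:
                    path.append(node)
                    node = prev[node]
                return path[::-1]
            for nxt in self.adj.get(node, []):
                if nxt not in prev:
                    prev[nxt] = node
                    queue.append(nxt)
        return None

    def request(self, req):
        """Return the reserved path for an accepted flow, or None if rejected."""
        path = self._shortest_path(req.talker, req.listener)
        if path is None:
            return None
        hops = [frozenset(pair) for pair in zip(path, path[1:])]
        need_bps = req.frame_bytes * 8 / req.cycle_s
        if len(hops) * self.hop_delay_s > req.max_latency_s:
            return None
        if any(self.free_bps[hop] < need_bps for hop in hops):
            return None
        for hop in hops:  # commit the reservation only after all checks pass
            self.free_bps[hop] -= need_bps
        return path
```

The reservation is committed only after every check has passed, so a rejected request leaves the state of the existing flows untouched.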
The first challenge is the lack of a standardized northbound interface of the CUC, i.e., a standardized set of parameters to characterize each TSN flow, which is necessary to reduce manual input from the operator and to enable plug-and-play functionality of end devices. Another challenge arises from online diagnosis and configuration. After the initial offline scheduling and configuration, the process is expected to operate with very short or even zero downtime. This requires online diagnosis to detect potential problems beforehand, which may in turn result in some reconfiguration of the related TSN flows. Moreover, online configuration is needed when adding new devices or new applications to the operational network. In this case, the CUC sends requests for the new TSN flows to the CNC, which is responsible for calculating schedules and updating the configuration. However, when the network is unable to handle new TSN flows due to a lack of available resources, simply declining the request and returning a notice to upgrade the network is undesirable. Instead, the CNC should provide a good scheduling strategy that accommodates all the new critical TSN flows with a negligible effect on the existing TSN flows, in order to limit the configuration changes on the bridges.
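One way to limit configuration changes is to keep the offsets of all existing flows fixed and search only for a free offset for the new flow. The sketch below does this for a single link whose time is divided into equal slots, with every frame assumed to fit into one slot. The slot model, the function names, and the first-fit search are assumptions for illustration and do not represent a complete TSN scheduler.

```python
from math import gcd


def lcm(a, b):
    return a * b // gcd(a, b)


def occupied_slots(schedule, hyperperiod):
    """schedule maps flow name -> (period_slots, offset_slot)."""
    busy = set()
    for period, offset in schedule.values():
        busy.update(range(offset, hyperperiod, period))
    return busy


def add_flow(schedule, name, period):
    """Return a new schedule including `name`, or None if no offset is free.

    Existing flows keep their offsets, so no bridge needs to change the
    configuration of flows that are already running.
    """
    hyperperiod = period
    for existing_period, _ in schedule.values():
        hyperperiod = lcm(hyperperiod, existing_period)
    busy = occupied_slots(schedule, hyperperiod)
    for offset in range(period):
        if all(slot not in busy for slot in range(offset, hyperperiod, period)):
            new_schedule = dict(schedule)
            new_schedule[name] = (period, offset)
            return new_schedule
    return None
```

When `add_flow` returns `None`, a more capable strategy would have to move some existing flows, which is exactly the case where the number of changed bridge configurations should be kept small.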
From an implementation perspective, the support of YANG models and NETCONF is not entirely in place. Although the YANG data model has been included in the TSN standard and it is agreed that NETCONF will serve as the network management protocol between the CNC and the bridges, the development of YANG models and NETCONF support in many switches progresses rather slowly compared to other TSN functionalities. The lack of efficient configuration is a gap that hinders an automated engineering workflow.
4.4. Reliability Challenges
Providing a reliable system is essential for any type of network, and TSN is no exception. Implementing fault-tolerance solutions is one of the major steps towards system reliability, which is mainly addressed through redundancy mechanisms. In TSN, fault tolerance can be achieved through two main substandards (see Table 1): (i) path control and reservation (IEEE 802.1Qca), which enables the creation of multiple paths in the network, and (ii) frame replication and elimination (IEEE 802.1CB), which allows streams to be replicated and sent over the paths created by IEEE 802.1Qca. However, how these mechanisms are configured depends strongly on the application requirements [16]. One setting, called the decoupled approach, allows arbitrary redundancy protocols to be used by decoupling the stream reservation from the redundancy mechanism. This setting is more appropriate for applications with less stringent reliability requirements. Another setting, called the harmonized approach, integrates the establishment of the reservation with the redundancy requirements at the cost of higher protocol overhead and bandwidth demands.
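The elimination side of frame replication can be illustrated with a simplified duplicate filter that works on per-stream sequence numbers. The sketch is loosely inspired by the idea of sequence recovery with a history window; it is not the algorithm specified in IEEE 802.1CB, and the class name, the window length of 16, and the 16-bit sequence space are assumptions made for this example.

```python
class SequenceRecovery:
    """Pass the first copy of each sequence number and drop later copies."""

    def __init__(self, history=16, seq_space=65536):
        self.history = history      # how far back duplicates are remembered
        self.seq_space = seq_space  # sequence numbers wrap at this value
        self.highest = None
        self.seen = set()

    def accept(self, seq):
        """Return True if the frame should be delivered, False if dropped."""
        if self.highest is None:
            self.highest = seq
            self.seen = {seq}
            return True
        half = self.seq_space // 2
        # Signed distance from the highest number seen so far, with wrap-around.
        delta = (seq - self.highest + half) % self.seq_space - half
        if delta > 0:
            self.highest = seq
            self.seen.add(seq)
            # Forget sequence numbers that fell out of the history window.
            self.seen = {
                s for s in self.seen
                if (self.highest - s) % self.seq_space < self.history
            }
            return True
        if -delta >= self.history or seq in self.seen:
            return False  # too old to judge, or a duplicate
        self.seen.add(seq)  # late but new: out-of-order delivery
        return True
```

With two paths, a frame lost on one path is still delivered once via the other, while the second copy of every frame that arrives on both paths is dropped.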
Another important aspect of process automation systems is that a single point of failure must be avoided in many scenarios. In order to meet that requirement, the end-nodes themselves need to be redundant. From a network perspective, this means that the devices have two independent network ports, which must be connected to two different switches to avoid a single point of failure. From this perspective, the IEEE 802.1CB and IEEE 802.1Qca standards are not sufficient, as they only provide network redundancy and not end-to-end redundancy. Moreover, an update of IEEE 802.1Qca is needed to associate the dual network ports of redundant end-points.
4.5. Distributed Real-Time System Challenges
Distributed real-time systems are commonly deployed in process automation, where the whole process is carried out with multiple control networks, each consisting of a local set of sensors and actuators [17]. The sensors periodically monitor the process conditions, and each set of readings can be regarded as a snapshot of the whole process line. The controllers are responsible for making computations based on the received snapshot and sending out action commands to the actuators within predefined time bounds. To guarantee correct operation, the periodic process snapshot, as well as the functionality of all the controllers, should be available at every controller that needs the data before the control application is executed. For instance, all the controllers belonging to the distributed control system need to act on the same set of information in order to guarantee the correct output. This is a general requirement: the system should be in a consistent state at all times, regardless of whether there are changes in process data, applications, or the network. Changes also have to be coordinated in time in order to preserve the real-time properties during run-time changes and to avoid bringing the ongoing process down.
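The requirement that all controllers act on the same snapshot can be expressed as a simple check. In the sketch below, each controller buffers sensor values per cycle and releases a snapshot only when it is complete, and a helper verifies that all controllers hold identical snapshots for a given cycle. The class, the cycle numbering, and the sensor identifiers are hypothetical and serve only to illustrate the consistency condition.

```python
class SnapshotBuffer:
    """Per-controller buffer that releases a cycle's data only when complete."""

    def __init__(self, sensor_ids):
        self.sensor_ids = frozenset(sensor_ids)
        self.samples = {}  # cycle number -> {sensor_id: value}

    def store(self, cycle, sensor_id, value):
        if sensor_id not in self.sensor_ids:
            raise KeyError(f"unknown sensor: {sensor_id}")
        self.samples.setdefault(cycle, {})[sensor_id] = value

    def snapshot(self, cycle):
        """Return the complete snapshot for `cycle`, or None if data is missing."""
        got = self.samples.get(cycle, {})
        return dict(got) if set(got) == self.sensor_ids else None


def all_consistent(buffers, cycle):
    """True if every controller holds the same complete snapshot for `cycle`."""
    snaps = [buf.snapshot(cycle) for buf in buffers]
    return all(s is not None and s == snaps[0] for s in snaps)
```

A control application would run for a cycle only once `all_consistent` holds, which in a real system must additionally happen within the predefined time bounds.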
4.6. Synchronization
Synchronization of industrial devices and systems with adequate accuracy and precision is an essential part of monitoring and control functions of automation systems. Different synchronization requirements per application, harsh environment, and nondeterministic networks make the synchronization in industrial systems challenging.
With TSN, the IEEE 802.1AS standard is introduced. The revision of this standard, IEEE 802.1AS-Rev, is under discussion and is envisioned to provide fault tolerance and highly accurate time synchronization. Green-field installations would benefit from implementing this feature-rich synchronization profile. However, the comprehensive functional and security performance of a new profile in the industrial environment has yet to be assessed.
In the case of brown-field installations, the automation systems typically require a synchronization accuracy in the range of one to a few thousand milliseconds for most of their applications. Since TSN networks operate with a synchronization accuracy on the order of nanoseconds, integrating legacy industrial devices into the TSN network and thereby achieving deterministic delivery of critical messages is technically challenging.
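The basic arithmetic behind packet-based clock synchronization can be shown with the generic two-way timestamp exchange. The sketch assumes a symmetric path delay and integer nanosecond timestamps, and the function and parameter names are chosen for this example. Note that IEEE 802.1AS measures the delay per link with a peer delay mechanism, so this is a simplified illustration of the principle rather than the procedure of that standard.

```python
def offset_and_delay(t1, t2, t3, t4):
    """Estimate clock offset and one-way delay from four timestamps.

    t1: time the master sends a message (master clock)
    t2: time the slave receives it (slave clock)
    t3: time the slave sends a reply (slave clock)
    t4: time the master receives the reply (master clock)
    Assumes the path delay is the same in both directions.
    """
    delay = ((t2 - t1) + (t4 - t3)) / 2
    offset = ((t2 - t1) - (t4 - t3)) / 2  # slave clock minus master clock
    return offset, delay


# Example: the slave clock is 500 ns ahead and the path delay is 1000 ns.
offset_ns, delay_ns = offset_and_delay(t1=0, t2=1500, t3=2000, t4=2500)
```

Any asymmetry between the two directions shows up directly as an offset error, which is one reason why nanosecond-level accuracy is hard to reach over networks that were not designed for it.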
6. Conclusions
There is an increasing need to bridge the gap between the IT and OT networks in the process industry to take the next leap in productivity and innovation. Our case study at a typical process automation factory is a first step towards characterizing OT traffic and aims to inspire more research and standardization work on IT/OT convergence for process automation. Due to the variety of process automation scenarios, as well as the underlying network topologies, applications, and communication protocols used, more case studies should be conducted to reveal the comprehensive traffic characteristics in process automation. TSN is one promising technology for collapsing the networks. However, in order to deploy TSN in large-scale production facilities, many challenges need to be addressed beforehand. Specifically, further research in the areas of efficient engineering, security, automatic tool support, traffic modeling and profiling, and online monitoring is necessary. Moreover, it is crucial to preserve the performance and characteristics of the distributed real-time systems that are required for process automation. We call for more research efforts on deriving engineering guidelines for brown-field installations, including network performance analysis, as this is the base upon which other functionalities can be added and IT and OT systems can eventually be integrated.