Article

A Hybrid and Modular Integration Concept for Anomaly Detection in Industrial Control Systems

by Christian Goetz 1,2,* and Bernhard G. Humm 1

1 Hochschule Darmstadt—Department of Computer Science, University of Applied Sciences, 64295 Darmstadt, Germany
2 Yaskawa Europe GmbH, Philipp-Reis-Str. 6, 65795 Hattersheim am Main, Germany
* Author to whom correspondence should be addressed.
Submission received: 10 March 2025 / Revised: 10 April 2025 / Accepted: 21 April 2025 / Published: 27 April 2025

Abstract

Effective anomaly detection is essential for realizing modern and secure industrial control systems. However, the direct integration of anomaly detection within such a system is complex due to the wide variety of hardware, different communication protocols, and strict industrial requirements. Many components of an industrial control system allow direct integration, while others are designed as closed systems or lack the required performance. At the same time, the effective usage of available resources and the sustainable use of energy are more important than ever for modern industry. Therefore, in this paper, we present a modular and hybrid concept that enables the integration of efficient and effective anomaly detection while optimizing the use of available resources under consideration of industrial requirements. Because of the modular and hybrid properties, many functionalities can be outsourced to the respective devices, and at the same time, additional hardware can be integrated where required. The resulting flexibility allows the seamless integration of complete anomaly detection into existing and legacy systems without the need for expensive centralized or cloud-based solutions. Through a detailed evaluation within an industrial unit, we demonstrate the performance and versatility of our concept.

1. Introduction

An industrial control system (ICS) can be described as a combination of hardware and software connected by different networks, focusing on controlling an underlying industrial physical process [1]. In an ICS, various control systems, such as supervisory control and data acquisition (SCADA) systems, remote terminal units, and programmable logic controllers (PLCs), are combined with sensors and actuators [2].
In order to meet the growing requirements and to realize the complex processes of modern industry, the integration of the industrial Internet of Things (IoT) and cyber–physical systems (CPSs) within an ICS is also increasing [3]. These usually high-performing and highly networked devices enable an extensive exchange of data within the system [4]. The ongoing development of individual subcomponents of the ICS enhances the potential for integrating advanced functionality, including comprehensive anomaly detection (AD), real-time data collection, independent process adaptation, or rapid reaction to changing requirements and environmental parameters [5].
As processes become increasingly complex and the dependency of the entire production on individual sub-processes rises, implementing effective error detection mechanisms becomes essential [6]. Individual errors can impact the entire system, leading to faulty processes, a breakdown of the entire system, or potential risks to people. One way to identify such errors and enable an early response is AD in the physical process of the ICS [7]. In an ICS, anomalies can be defined as all deviations from a predefined, intended normal process [8]. Integrating effective AD allows for monitoring the entire system, detecting anomalies, and ideally responding immediately to potential errors to prevent or mitigate them. Therefore, integrating effective AD is crucial for achieving modern and secure ICSs [9].
At the same time, integrating AD in an ICS poses enormous challenges. While the underlying physical process is highly complex, each ICS differs in structure and has a high variance of used hardware and software. Furthermore, individual components of the system are usually limited in terms of memory capacity and processing performance [10].
Considering these aspects, a centralized integration approach for AD is usually chosen. All available data are transmitted to a centralized processing unit, such as a cloud server, where the data are processed, and the results are then sent back to the devices. Although this approach addresses the previously mentioned challenges, it also presents several significant drawbacks. These include the additional costs associated with establishing a centralized high-performance computing unit, difficulties in transmitting the large volumes of data generated by the various devices in the ICS [11], concerns about data security during the collection, transmission, and external storage of process data [12], and increased response times due to communication delays during transmission [13].
Due to the current global situation, the shortage of resources, and rising energy prices, as well as advancing developments within ICSs, decentralized approaches are becoming increasingly important [14]. In a decentralized concept, all necessary functionalities are distributed across the available subcomponents. This structure offers several advantages, including reduced cost and energy consumption, as it eliminates the need for a central computing unit and minimizes the required infrastructure. Processing data directly on the respective device helps lower data transmission, latency, and response time [15]. Data security and knowledge protection can be ensured since all generated data are processed and stored within the system rather than transferred to a central or external location.
Although decentralized integration offers several benefits, it requires more implementation effort. For example, AD techniques must be adapted to the performance capabilities of individual devices, communication between sub-devices must be established, and structural limitations must be considered [16]. Additionally, realizing the direct integration of AD for every device within an ICS is often impossible. In particular, legacy systems tend to include a higher number of devices that are either designed as closed systems or lack the resources necessary for direct integration.
In order to ensure the optimum utilization of all available resources and enable complete and effective AD, even for older or closed systems, we present a hybrid and modular integration concept. We organize the various functionalities required for implementing AD into separate modules, establish unified communication, and integrate them into the different devices available in the ICS. While our primary focus is on achieving decentralization, if this is not feasible, functions can be outsourced to adjacent devices, or additional high-performance, cost-effective single-board computers (SBCs) can be implemented. Therefore, in this concept, the term "hybrid" refers to the ability of the integration to enable both decentralized and centralized implementation options. For instance, certain functionalities can be combined and executed at a central location, while others can be simultaneously distributed across multiple devices.
The contributions of this article can be summarized as follows:
  • We provide a detailed specification of the real-world challenges and resulting requirements that arise from integrating hybrid anomaly detection into an ICS.
  • We introduce a novel hybrid and modular integration concept for AD in ICSs and outline the individual modules, their functionalities, and the integration process.
  • We demonstrate the effectiveness of our approach through a comprehensive evaluation of a proof-of-concept implementation against the predefined requirements.
  • We show that achieving high-performing, secure, and efficient anomaly detection is feasible even for resource-constrained legacy ICSs while maximizing the utilization of all available resources.
In this work, we focus on the modular and hybrid concept, its realization, and its integration into an ICS. To remain within the scope of the work, we briefly discuss the AD method but do not describe detailed tests and analyses of the detection performance. A comprehensive evaluation of the various methods for AD, including detailed tests and comparisons, can be found in our previous work in [8].
The remainder of this paper is structured as follows: Section 2 provides an analysis and discussion of related work. Requirements and challenges are outlined in Section 3. Section 4 includes a detailed description of the concept and the individual modules. The prototypical implementation is explained in Section 5. In Section 6, the evaluation based on an experimental integration is described. Finally, Section 7 offers a summary and an outlook for future work.

2. Related Work

Surveys of different AD technologies for ICSs can be found in [17,18,19]. AD methods for ICSs can be divided into two categories based on the information used to detect anomalies. The first uses network-level data, such as network data packets, whereas the second identifies anomalies based on the underlying physical process of the ICS. While the former is mainly designed to detect external intrusions like cyber-attacks [20], the latter identifies all deviations from the regular intended system operation, which can be caused by a cyber-attack, production process errors, or changes in system characteristics, like the wear of mechanical components [21]. Based on this, and given the intended direct integration within the individual devices of the ICS, the subsequent focus will be on identifying anomalies based on the underlying physical process of the ICS.
The literature contains various promising AD implementations for ICSs. In [22], 1D convolutional neural networks (CNNs) are used to detect cyber-attacks on a water treatment plant. The authors emphasize that 1D CNNs are structurally simpler, smaller, and faster than comparable recurrent neural networks (RNNs) when used for time series data, which enables integration into low-power devices. The authors of [23] propose an AD method for ICSs based on a long short-term memory (LSTM) autoencoder. They achieve high detection performance on an ICS dataset by reconstructing current data and predicting future values simultaneously. Ref. [24] introduces a dual attention network in which temporal causality is used to learn the causal relationships between device components to detect anomalies in multivariate time series data of the ICS.
Several studies about hybrid and decentralized approaches for AD in ICSs can be found. In [25], an AD method for ICSs based on a variational autoencoder (VAE) is presented. The individual models are trained by combining all sub-devices and a higher-level cloud server. The method is characterized by its low hardware requirements, fast training time, and high detection rate. A study on detecting anomalies in underground mining was conducted in [26], where the AD algorithm was distributed among the different edge devices to address the issue of insufficient available computing power. The experimental results show that the presented algorithm outperforms comparable algorithms, such as K-means and C-means, even with more devices. The authors of [27] demonstrate that performing distributed AD on individual edge devices can achieve results similar to centralized AD while significantly reducing data traffic. A similar approach based on sensing time series data from IoT edge devices is presented in [28], where the authors also show that distributed AD on individual edge devices can perform just as well as centralized AD.
Resource-efficient approaches and engines are required to effectively process the data streams produced by individual devices. In [29], the authors compare three data streaming engines for performance in resource-constrained devices. Apache Spark (https://spark.apache.org/, accessed 24 November 2024) and Apache Flink (https://flink.apache.org/, accessed 24 November 2024) emerged as the top performers for such applications. Another popular open-source distributed streaming platform is Apache Kafka (https://kafka.apache.org/, accessed 24 November 2024), which offers high scalability, fault tolerance, and real-time data exchange [30,31]. In [32], the authors conduct a comprehensive study of Apache Kafka and demonstrate its ability to handle large amounts of data, while being scalable and high-performing.
Different available studies about hybrid and decentralized frameworks and integration concepts in industrial systems can be found. In [33], a framework for detecting anomalies in the hypertext transfer protocol (HTTP) is presented. For this purpose, the AD process is distributed to the various nodes in the system instead of using a central server. The framework significantly increases detection speed and accuracy compared to centralized models. An agent-based approach for decentralized data analysis in industrial CPSs is shown in [34]. Different agent modules are distributed over cloud, fog, and edge devices in the concept to realize effective and continuous data analysis. In [35], a decentralized IoT architecture is introduced, where part of the services are moved from the cloud to the edge. Therefore, the key component is an IoT gateway, which includes several advanced functionalities and connects the devices to the cloud. Another implementation of an industrial cloud-edge computing platform can be found in [36], where various devices are connected via different communication protocols to a cloud platform. By collecting data in a cloud DB, models can be trained over an interface and transferred back to edge devices.
In summary, only limited references to hybrid and decentralized integration concepts are available. Even fewer are designed to be fully decentralized without needing a central unit, such as a cloud server, or are capable of being completely executed on low-performance devices. Overall, no existing work addresses a modular and hybrid anomaly detection integration concept that can be fully decentralized, considers the various challenges present in an industrial environment, and provides detailed measurements of latency, resource consumption, and energy usage.
Based on the identified limitations of existing methods, our work can be summarized as follows:
  • In comparison to other work, we consider in detail a whole range of real-world challenges that arise from a hybrid and modular integration of anomaly detection in an ICS. These include handling highly dynamic systems, limited resource availability, the complexity of the underlying process, data security and knowledge protection, increasing volumes of data, risk management, and effective anomaly detection. After analyzing all these challenges, we define the respective requirements and address all of them within the developed concept.
  • We present a hybrid and modular integration concept that, compared to other concepts, can be fully decentralized and centrally integrated. Within the concept, both approaches can be flexibly and efficiently combined to optimize resource utilization. The complete modular design facilitates easy expansion, enabling the system to adapt to changes, modifications, and diverse hardware within the ICS.
  • In contrast to other work, we implement the described concept prototypically within a real ICS and utilize existing devices to maximize decentralization and resource efficiency. We demonstrate the effectiveness of the approach through a comprehensive evaluation against the previously defined requirements and present a detailed analysis of the required memory and CPU power, the energy consumption, and the achieved latency.
  • We demonstrate that the integration we present enables effective, secure, and efficient anomaly detection in resource-limited ICSs without relying on a central high-performance unit, which is often necessary in many other approaches. Additionally, the introduced concept can be easily expanded and adapted, allowing the implementation in existing and legacy systems with minimal effort.

3. Challenges

In this section, we outline the challenges and the resulting requirements for developing and integrating such a concept in an industrial environment.
  • Highly dynamic systems: ICSs are highly dynamic, with varying system structures and hardware. Devices from different manufacturers are connected via various communication protocols to realize an underlying physical process [37]. During the lifespan of a system, processes can change, and hardware components may be added, removed, or replaced. This presents a significant challenge, as repeatedly redeveloping AD for the entire system costs time, effort, and money [38]. Consequently, the concept should enable the integration of a wide variety of hardware while remaining flexible and adapting to system changes.
  • Limited available resources: Due to the varying hardware used in such systems, the resources available for a possible integration vary as well [39]. This presents an important challenge, especially in decentralized integration, as not all devices can integrate additional functionality directly. Furthermore, adding hardware is associated with costs and labor [40]. Therefore, a practical and effective integration must adapt to the existing system, address limited capacities, optimize available resource usage, and allow flexible expansion through additional software and hardware when necessary.
  • The complexity of the underlying process: ICSs realize complex industrial processes that often focus on productivity and speed to achieve maximum efficiency. These physical processes naturally exhibit noisy behavior, which affects the recorded data [41]. This poses a specific challenge in identifying anomalies, particularly for the AD models, which can be influenced by poor data quality and varying recording rates [42]. Thus, any possible integration must handle various data types and qualities and differing sampling rates while also considering the natural behavior of industrial data.
  • Data security and knowledge protection: Another essential criterion is data security. Knowledge about the system and the realized process can be found in the sampled data from the ICS [43,44]. This is a crucial factor, especially for small- and medium-sized companies, as knowledge and expertise are key factors for competing against larger competitors. Consequently, companies often do not want to record, store, or transport their data outside their system to mitigate such risks. This can be challenging, as large amounts of data and increased performance are often required, especially when creating and training AD models [45]. Therefore, the resulting integration should process all data internally while simultaneously utilizing the system’s resources to realize all functionalities.
  • Volume of data: The overall volume of data produced in industry is increasing, driven by modern complex processes and new industrial hardware and software [46]. For instance, some field devices already achieve sample rates in the microsecond range. By utilizing these rapidly recorded data for anomaly detection, it is possible to better identify specific anomalies, such as short collisions, while simultaneously improving reaction times [47,48]. However, this poses a distinct challenge for integration, as it must handle large volumes of data and high sample rates within limited available resources [49]. Consequently, the integration needs to be both efficient and flexible to effectively process the resulting data stream, even under resource constraints, thereby enabling rapid reaction times.
  • Risk management: In an ICS, the correct and safe execution of the physical process must always be ensured. This is especially challenging in decentralized integration, where different devices, such as control units, can be actively influenced by the integration of additional software [50,51]. Such integration can lead to an overload or breakdown of the device, which may impact production or, in the worst-case scenario, endanger workers [52]. Therefore, when integrating AD into a system or device, it is crucial to prevent any negative effects on the safety or functionality of the system.
  • Effective anomaly detection: Prompt detection, quick access to results, and reasonable reactions are essential for effective and reliable anomaly detection [53,54]. This is especially critical in cases like collisions or in rapid processes. Immediate detection enables a quick response, which can prevent faults from occurring or propagating through the entire production line [55]. The system should efficiently identify anomalies, provide real-time information, assist users in finding solutions, and respond appropriately when necessary.
Based on these challenges, a hybrid and modular concept was developed to meet all requirements.

4. Hybrid and Modular Integration Concept

Achieving purely decentralized integration is nearly impossible because many devices do not meet the necessary hardware and software requirements or are generally designed as closed systems that do not permit direct modification. Therefore, hybrid and modular integration is proposed. The required functionalities for the AD are divided into modules, which can be integrated into the individual devices of the ICS based on the available computational resources. Each module can be seen as a small independent service, collectively forming a complete AD. If the computational resources of a device are insufficient or integration is impossible, this modularization allows the corresponding modules to be outsourced to an adjacent device. If no appropriate device is available within the system, the hybrid concept enables the incorporation of additional hardware, such as an SBC, to ensure integration.
Figure 1 shows an exemplary implementation of the concept. For each ICS device, one data collection module (DCM) and one production module (PM) are connected to perform AD. A communication module (COM) enables real-time data exchange and unified communication among all modules. The reaction module (RM) records all detected anomalies, generates visualizations, and provides the user with additional information about the anomaly. In the following section, the functionality and structure of each module is described.

4.1. Data Collection Module

Each DCM consists of a collector, transformer, and producer (Figure 2a). By adapting the collector to the specific communication protocol of each device, data can be collected, processed by the transformer (e.g., cleaned, combined, and formatted), and then transmitted to the COM by the producer. The DCM thus enables a unified connection and acts as an interface between the device and the COM.
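As a minimal illustration of this chain, the following Python sketch shows a DCM's collector, transformer, and producer; the dummy device, the queue-based stand-in for the COM connection, and the 8 ms interval are illustrative assumptions rather than the prototype code.

```python
import queue
import random
import time

class DummyDevice:
    """Stand-in for a field device; a real collector would speak the
    device's protocol (e.g., OPC UA or Modbus)."""
    def read(self):
        return {"speed": random.gauss(100, 1), "torque": random.gauss(5, 0.1)}

def transform(raw):
    """Transformer: clean, combine, and format the raw sample."""
    return {k: round(float(v), 3) for k, v in raw.items() if v is not None}

com_stream = queue.Queue()  # stand-in for the connection to the COM

def produce(sample):
    """Producer: transmit the formatted sample to the COM."""
    com_stream.put(sample)

device = DummyDevice()
for _ in range(10):          # collection loop (runs continuously in practice)
    produce(transform(device.read()))
    time.sleep(0.008)        # e.g., an 8 ms sampling interval
```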

4.2. Production Module

The PM (Figure 2b) executes one AD pipeline for one device in the ICS. It comprises one collector, an executor, and a producer. The collector is directly connected to the COM and receives the discrete data stream collected by a DCM. In the executor, the AD is performed. Detected anomalies can be sent back to the COM via the producer and then written to the event database (DB).

4.3. Development Module

The development module (DEV) (Figure 2d) creates a separate AD-Pipeline for each PM, where each pipeline consists of a streamer, preprocessor, and AD model. Each training cycle consists of data analysis, model initialization, training, evaluation, and finally, the export of the AD-Pipeline. The required training data and essential development parameters are imported from the DBs of the COM. A tracking server monitors the complete process and stores the results in the tracking DB of the COM for later analysis and distribution to the PMs.
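The following sketch outlines one such development cycle on synthetic data. scikit-learn serves here purely as a compact stand-in (the prototype in Section 5 uses TensorFlow and ONNX), and the window size, layer sizes, and percentile threshold are illustrative.

```python
import joblib
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import MinMaxScaler

def sliding_windows(series, size, step=1):
    """Streamer: cut the recorded time series into overlapping windows."""
    return np.array([series[i:i + size]
                     for i in range(0, len(series) - size + 1, step)])

# 1) Data analysis: load normal-process data from the time-series DB (stubbed)
normal = np.sin(np.linspace(0, 60, 6000)) + np.random.normal(0, 0.02, 6000)
X = sliding_windows(normal, size=30)

# 2) Preprocessing: scale features into a uniform numerical range
scaler = MinMaxScaler()
Xs = scaler.fit_transform(X)

# 3) Model initialization and training: a small reconstruction autoencoder
ae = MLPRegressor(hidden_layer_sizes=(16, 4, 16), max_iter=200)
ae.fit(Xs, Xs)  # trained to reconstruct its own input

# 4) Evaluation: derive the anomaly threshold from normal reconstruction error
errors = np.mean((ae.predict(Xs) - Xs) ** 2, axis=1)
threshold = np.percentile(errors, 99)

# 5) Export the AD-Pipeline (scaler, model, threshold) for a PM
joblib.dump({"scaler": scaler, "model": ae, "threshold": threshold},
            "ad_pipeline.joblib")
```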

4.4. Reaction Module

In the RM (Figure 2c), the event DB is monitored and analyzed. The collector constantly scans all incoming events. If an anomaly is detected, further time series data are obtained from the time-series database of the COM, and the analyzer evaluates the anomaly. All collected information is then linked to the corresponding event and saved through the producer as a log in the event/log DB. Based on the extended observation period and the additional information gathered, a better reaction, even by non-expert users, can be achieved.
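A minimal sketch of this monitoring loop, assuming the MongoDB-backed COM databases of the later prototype (Section 5) and hypothetical database, collection, and field names:

```python
import time
from pymongo import MongoClient

client = MongoClient("mongodb://com-unit:27017")      # COM host is hypothetical
events = client["event_log_db"]["events"]             # detections from the PMs
ts_data = client["time_series_db"]["device_1"]        # short-term process data
logs = client["event_log_db"]["logs"]

last_seen = None
while True:
    query = {"_id": {"$gt": last_seen}} if last_seen else {}
    for event in events.find(query).sort("_id", 1):   # collector: scan new events
        last_seen = event["_id"]
        t = event["timestamp"]                        # numeric timestamp assumed
        # analyzer: fetch time-series data around the anomaly for evaluation
        context = list(ts_data.find({"timestamp": {"$gte": t - 5, "$lte": t + 5}}))
        # producer: link the gathered information to the event and log it
        logs.insert_one({"event_id": event["_id"], "context_points": len(context)})
    time.sleep(1.0)                                   # polling interval
```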

4.5. Communication Module

The COM, shown in Figure 3, enables real-time data exchange and unified communication among all modules. It includes a data exchange server to which each data-producing device is connected via a DCM, while data-consuming devices are connected via a PM. Additionally, the module comprises several databases (DBs).
  • A time-series DB that facilitates the storage of short-term data from various devices, enabling the development of AD models.
  • An integration DB for storing essential information and structural knowledge about the system, modules, and the development process.
  • An event/log DB for collecting detected anomalies, which can be later retrieved for in-depth analysis by the RM.
  • A tracking DB that records the development results and stores the generated AD model for later distribution to the PMs.

4.6. Process Cycle

To establish the hybrid concept, at least one DCM and one PM, as well as, in the initial phase, one COM and one DEV, must be available. At the same time, a single device can include multiple modules. The simplified UML sequence diagram [56] in Figure 4 illustrates a possible chronological process and the interactions between the modules. In the initial stage, data are collected from the device by the DCM and stored in the time-series DB of the COM. Once enough data have been gathered, the complete data package is transferred to the DEV, where the AD-Pipeline is created and subsequently deployed to the respective PM. During the production stage, a continuous data stream from the DCM is transmitted via the COM to the PM, where the AD-Pipeline is executed. All detections are sent back to the COM, where they are saved in the event/log DB. The RM monitors all incoming events, analyzes the detections, and provides a reaction to the specific device.

5. Prototype Implementation

All modules have been implemented prototypically as containerized microservices. Each module can be customized and adjusted to the available hardware, for example, by selecting the appropriate base image. All information needed to execute a module is predefined and stored in the integration DB of the COM and is imported when the service starts. Important parameters include, e.g., the communication protocol of the device, the hyperparameters of the AD model, and the transformation used in the DCM. A GUI was developed to support the setup process, even for users with limited domain knowledge.
Communication Module: Apache Kafka, an open-source event streaming platform, was selected to achieve efficient and fast communication between the modules. Apache Kafka is a distributed system based on servers and clients connected by a high-performance TCP network protocol. A Kafka server is integrated into the COM, enabling the definition of various data topics. Each DCM can send data to a specific topic, from which a PM can retrieve those data. MongoDB (https://www.mongodb.com, accessed 24 November 2024) was used for the time-series DB, integration DB, and event/log DB. The time-series DB acts as a sink to store the data from individual topics of the Kafka server. All structural information about the system and the integration is stored in the integration DB. Detected anomalies and analyses from the RM are collected in the event/log DB. The tracking DB was realized as a MySQL (https://www.mysql.com/, accessed 24 November 2024) DB to enable tracking of the development process and storage of the models.
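As an illustration of the data path through the COM, the sketch below consumes one device topic from the Kafka server and sinks it into the MongoDB time-series DB. The kafka-python client, host names, and topic names are assumptions; the paper does not prescribe a specific client library.

```python
import json
from kafka import KafkaConsumer   # kafka-python client (one possible choice)
from pymongo import MongoClient

# Sink: copy one device topic from the Kafka server into the time-series DB
consumer = KafkaConsumer(
    "device_1_data",              # topic fed by the corresponding DCM
    bootstrap_servers="com-unit:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)
ts_db = MongoClient("mongodb://com-unit:27017")["time_series_db"]

for record in consumer:           # runs continuously inside the COM
    ts_db["device_1"].insert_one(record.value)
```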
Development Module: For universal integration of different AD models and techniques, ONNX (https://onnx.ai/, accessed 24 November 2024), an open standard for machine learning interoperability, was chosen. Therefore, nearly any machine learning development framework can be used within the DEV to generate the AD, e.g., PyTorch (https://pytorch.org, accessed 24 November 2024), TensorFlow (https://www.tensorflow.org, accessed 24 November 2024), or Scikit-learn (https://scikit-learn.org, accessed 24 November 2024), as long as the resulting model can be transformed into an ONNX model. This approach enables a high degree of flexibility and significantly reduces the effort to adapt the PMs to different ML frameworks. For the prototype, TensorFlow was chosen to enable fast and effective development of the required models. Using the tf2onnx library, the resulting model can be converted to the corresponding ONNX model. A tracking server based on MLflow (https://mlflow.org/, accessed 24 November 2024) was utilized to track the development and enable an easy export of the models.
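A condensed sketch of this export path: a Keras model is converted via tf2onnx, and the run is logged to the MLflow tracking server. The model dimensions and tracking URI are illustrative.

```python
import tensorflow as tf
import tf2onnx
import mlflow

# A small Keras autoencoder as AD model (dimensions are illustrative)
inp = tf.keras.Input(shape=(30,))
enc = tf.keras.layers.Dense(8, activation="relu")(inp)
out = tf.keras.layers.Dense(30, activation="linear")(enc)
model = tf.keras.Model(inp, out)
model.compile(optimizer="adam", loss="mse")
# model.fit(X_train, X_train, ...)  # training on normal-process data omitted

# Convert the trained model to ONNX for framework-independent execution
spec = (tf.TensorSpec((None, 30), tf.float32, name="input"),)
tf2onnx.convert.from_keras(model, input_signature=spec, output_path="ad_model.onnx")

# Track the development cycle and store the exported model via MLflow
mlflow.set_tracking_uri("http://com-unit:5000")   # backed by the tracking DB
with mlflow.start_run(run_name="dev_cycle_pm1"):
    mlflow.log_param("window_size", 30)
    mlflow.log_artifact("ad_model.onnx")
```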
Data Collection Module: Several individual collectors were developed to enable communication between the DCMs and devices using various industrial communication protocols, such as OPC UA, Modbus/Memobus, or remote system calls (RSCs). Additional communication protocols can easily be integrated as further collectors at a later stage to be adaptable to different industrial systems. The transformer was created as an open concept to be flexible to the respective data from the device. An Apache Kafka Producer was implemented to connect the DCM to the COM.
Production Module: ONNX Runtime was chosen as a cross-platform inference machine learning accelerator for the universal integration of different AD models in the PM. A Kafka collector was established to receive a discrete data stream from the corresponding topic in the COM provided by a DCM. The AD-Pipeline, including the developed ONNX model, runs in the executor, and a MongoDB producer sends any detected anomaly to the event DB in the COM.
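A minimal sketch of the executor's inference step with ONNX Runtime; the model path, input layout, and threshold value are illustrative:

```python
import numpy as np
import onnxruntime as ort

# Executor: run the exported ONNX autoencoder on each incoming window
session = ort.InferenceSession("ad_model.onnx")
input_name = session.get_inputs()[0].name

def detect(window, threshold=0.01):
    """Score one preprocessed window; True signals an anomaly event."""
    x = np.asarray(window, dtype=np.float32).reshape(1, -1)
    reconstruction = session.run(None, {input_name: x})[0]
    error = float(np.mean((reconstruction - x) ** 2))
    return error > threshold, error

# In the PM loop, windows arrive via the Kafka collector; every positive
# detection is forwarded by the MongoDB producer to the event DB of the COM.
```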
Reaction Module: The reaction module monitors the event DB of the COM. Any input, e.g., a detected anomaly from one PM, is registered, and corresponding information is gathered. Additionally, data from the specific PM topic of the Kafka server are gathered to reconstruct the time-series data at the precise moment when the anomaly occurred. All data are later saved as a log in the event/log DB to visualize the anomaly for the user.

5.1. GUI

A GUI was developed to facilitate a quick and straightforward setup of the concept, even for complex systems. This involves defining the integration structure, specifying each module, and assigning them to the appropriate hardware. Specific parameters can also be selected for each module, such as the communication interface for the DCMs or the appropriate AD model for a PM. Furthermore, detected anomalies are visualized using the data stored in the event/log DB.

5.2. Anomaly Detection

Devices within an ICS often have limited capacity and processing power, making not all AD models suitable for direct integration. Therefore, a sliding window approach with different types of autoencoders as reconstruction AD models was chosen. As demonstrated in our previous work [8] and by the authors in [22], these models achieve excellent detection results while utilizing minimal resources, making them an ideal choice for integration into low-performance devices in the ICS. Different scalers were implemented to preprocess the data stream from the DCM, for example, by scaling it into a uniform numerical range. The user can select the model and scaler for each PM through the implemented GUI, along with crucial parameters such as the number of layers, layer dimensions, and the loss function. In the initial phase, the DEV imports all the parameters, generates the AD-Pipeline, and automatically defines the threshold.
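The sliding-window streamer can be sketched as follows; the buffer handling and window size are illustrative, while the step size of one reflects the behavior described in Section 6.2.

```python
from collections import deque
import numpy as np

class SlidingWindowStreamer:
    """Splits the discrete data stream into overlapping windows so that
    detection can run even when an arriving data package is smaller than
    the input size of the AD model."""

    def __init__(self, window_size, step=1):
        self.buffer = deque(maxlen=window_size)
        self.window_size, self.step = window_size, step
        self._new_points = 0

    def push(self, values):
        """Feed newly received data points; yield each complete window."""
        for v in values:
            self.buffer.append(v)
            self._new_points += 1
            if len(self.buffer) == self.window_size and self._new_points >= self.step:
                self._new_points = 0
                yield np.array(self.buffer)

streamer = SlidingWindowStreamer(window_size=30)
for window in streamer.push([0.1, 0.2, 0.3] * 20):   # one incoming data package
    pass  # scale the window, reconstruct it, compare the error to the threshold
```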

5.3. Cybersecurity

As modern ICSs are becoming increasingly interconnected with corporate networks and the internet, they are also becoming more vulnerable to cyberattacks [57]. Vulnerabilities can occur at various levels of the ICS architecture [58]. Since the structure, devices, and communication protocols vary in each ICS, potential vulnerabilities at the hardware, firmware, and network levels must be considered individually for each system [59]. At the software level, where the integration of the presented concept occurs, measures can be taken to reduce potential vulnerabilities. By implementing the modules as microservices and subsequently containerizing them, partial isolation from the operating system of the respective device can be achieved. Dividing the entire AD into smaller, independent modules lowers complexity and the amount of programming code in each module. This simplification eases the management of the modules and reduces the likelihood of code errors. Further advantages of containerization include the unique adaptation and restrictions of each microservice, such as limited network access and data access, as well as the straightforward updating and patching of identified vulnerabilities without disrupting the industrial process of the ICS. By tailoring the desired response in the RM, unexpected influences at the process level can be prevented.
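As one possible illustration of such restrictions, the sketch below starts a module container with capped resources and a confined network using the Docker SDK for Python; the image name, limits, and network are hypothetical.

```python
import docker  # Docker SDK for Python

client = docker.from_env()

# Start one module as an isolated, resource-capped container (values illustrative)
client.containers.run(
    image="ad/production-module:latest",   # hypothetical module image
    name="pm_device_1",
    detach=True,
    mem_limit="128m",        # a faulty module cannot exhaust the host's memory
    nano_cpus=250_000_000,   # cap at 0.25 CPU cores
    read_only=True,          # immutable filesystem inside the container
    network="com-net",       # only the segment shared with the COM is reachable
    restart_policy={"Name": "on-failure", "MaximumRetryCount": 3},
)
```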

6. Evaluation

6.1. Experimental Setup

The rotary table dispenser demonstration system ‘Totaru’ from Yaskawa (see Figure 5) was selected to evaluate and test the prototype. The system consists of three robots, two of which are six-axis systems, while the third is a three-axis system. Furthermore, four individual drives are integrated and connected to one or multiple mechanical axes. The system realizes multiple industrial production processes, including transporting components, picking and placing objects in moving containers, and packing and sorting products.
In order to effectively realize these processes, an ICS is utilized within the system, incorporating multiple control elements. These include a robot controller for each six-axis robot, a central motion controller that generates the control commands for the three-axis robot and the individual axes, and two further control units for monitoring and controlling external processes. Additionally, an HMI is used for user input of control commands, and a SCADA system for process monitoring. The system is well suited for experimental integration because it offers unrestricted access to the unit, allowing structural changes without risking production breakdowns and economic losses while facilitating complex industrial processes.

6.2. Experimental Integration

Several modules were integrated into the devices of the experimental system, focusing on incorporating these modules into the existing hardware already used for the production process. For devices without integration capabilities, the modules were shifted to adjacent devices or additional hardware, such as a low-cost single-board computer. To demonstrate the flexibility and adaptability of the concept, a Raspberry Pi 5 was integrated as a communication unit (ComUnit). The Raspberry Pi is well suited for prototyping due to its cost efficiency, flexibility, and available interfaces. Furthermore, a Dell laptop was utilized as an external removable computing device to generate and train the AD models. A comprehensive list of all motion devices, the responsible modules, and the respective hardware identifiers can be found in Table 1. Detailed specifications of the hardware used can be found in Table 2. The experimental integration of the prototype with modules and motion devices is visualized in Figure 6.
A COM and a DEV were established to ensure the basic functionalities of the concept. In addition, a DCM and a PM per motion device were integrated and distributed over the available hardware. The RM tracks and records all identified anomalies from each PM, including details about process steps, detection time, and impact. A GUI was used to easily integrate the concept and visualize all detected anomalies provided by the RM. To establish fast data collection, OPC Data Access was utilized for the DCMs integrated into the ComUnit, and remote system calls (RSCs) were implemented for the DCMs in the control units to receive the current motion data (speed, torque, position) of the respective motion device. Therefore, a unique buffer publish routine for each device was developed, allowing a sample rate of 8 ms with a buffer size of 30 for each variable. Consequently, 120–210 data points were collected by each DCM at each publish interval and transmitted over the COM to the corresponding PM. In the DEV, an AD-Pipeline for each PM, based on the normal process of the system, was generated. As previously mentioned, a sliding window approach was used as a streamer to split the discrete data stream into sliding windows. By setting the step size of the streamer to a small number, e.g., one, detection can be carried out even if the received data packages are smaller than the input size of the AD model. A MinMax Scaler was used as a preprocessor to bring the different features of the respective data package into a uniform numerical range. A multi-layer perceptron autoencoder (MLP-AE) and a 1D-convolutional autoencoder (1D-CAE) were chosen as AD models, as sketched below. To generate valid CPU and memory consumption results that are transferable to other scenarios, default parameter sets (Table 3) for implementing the AD-Pipeline were defined. Based on the training results in the initial stage of the development phase, the best-performing AD-Pipeline was exported to the corresponding PM.
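For illustration, the two model types can be defined in TensorFlow as follows; since the default parameter sets of Table 3 are not reproduced here, the input shape and layer dimensions below are purely illustrative.

```python
import tensorflow as tf

WINDOW, FEATURES = 30, 4  # illustrative shape: buffer size x motion variables

def build_mlp_ae():
    """Compact multi-layer perceptron autoencoder (layer sizes illustrative)."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(WINDOW * FEATURES,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(16, activation="relu"),    # bottleneck
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(WINDOW * FEATURES, activation="linear"),
    ])

def build_1d_cae():
    """Compact 1D-convolutional autoencoder (layer sizes illustrative)."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(WINDOW, FEATURES)),
        tf.keras.layers.Conv1D(16, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling1D(2),
        tf.keras.layers.Conv1D(8, 3, padding="same", activation="relu"),
        tf.keras.layers.UpSampling1D(2),
        tf.keras.layers.Conv1D(FEATURES, 3, padding="same", activation="linear"),
    ])

model = build_mlp_ae()
model.compile(optimizer="adam", loss="mse")  # trained to reconstruct normal windows
```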

6.3. Experimental Results

Every specified module was successfully integrated within the available devices of the experimental setup. All structural details regarding the integration and the generated AD-Pipelines were defined exclusively through the GUI. The DEV automatically generates the AD models during the initial phase from the data of the normal process of the experimental unit. The selected models were exported to the respective previously defined PMs. In order to demonstrate the effectiveness and performance of the prototype, the integration is validated below against the previously defined industrial requirements. Additional detailed latency, resource consumption, and energy usage measurements can be found in Appendix A, Appendix B and Appendix C.
  • Highly dynamic systems: Based on the flexible concept and the customization of the DCMs, multiple devices from different manufacturers could be integrated with various communication protocols (e.g., OPC DA, OPC UA, Modbus, and RSC). By designing the modules as adaptable microservices, seamless integration into the existing devices of the experimental ICS was possible. In addition to the devices used in the experimental setup, the integration of modules into controllers from other manufacturers (e.g., PxC AXC F2152), other SBCs (e.g., Raspberry Pi 4, Nvidia Jetson TX2), and various edge PCs could already be realized. In order to adapt to structural changes in the system, the concept supports the flexible addition, removal, or modification of modules. For example, existing AD-Pipelines can be exchanged during continuous production, ensuring that the ongoing monitoring of the remaining system is not affected. Therefore, the concept accounts for the highly dynamic nature of modern ICSs, enabling the integration of a wide variety of hardware while remaining flexible and adaptable to system changes.
  • Limited available resources: With lightweight modules, the possibility of integration into a wide range of hardware, and the option to outsource functionalities to adjacent devices, the optimal utilization of existing system resources can be ensured. Various modules could be incorporated into existing devices within the experimental setup, such as directly into the control units or the SCADA system. By incorporating a low-cost SBC as an additional hardware component, the system’s expandability was effectively demonstrated. Through the interconnection of multiple COM units, the concept can be further extended to monitor entire production lines. An extended analysis of used resources and energy consumption can be found in Appendix A and Appendix C. Thus, the concept enables practical and effective integration even in resource-constrained environments, optimizes available resource usage, and allows flexible expansion through additional hardware when necessary.
  • Complexity of the underlying process: By individually adapting the models and AD-Pipelines, different types of devices and the unique physical characteristics of the industrial processes could be considered. For instance, it was possible to integrate the various motion devices of the experimental unit (Section 6.1), including robots and linear and rotary axes, while simultaneously addressing the noise and jitter of each device separately by customizing the transformers in the DCMs. Given the individual connections of the DCMs and PMs, asynchronous and varying recording rates of the devices can be accommodated. Consequently, a broad range of devices and processes can be incorporated while taking into account their unique characteristics, data quality, sample rates, and the natural behavior of the industrial process.
  • Data security and knowledge protection: The lightweight and hybrid integration of all necessary modules and functionalities directly within the system ensures that no data need to be recorded, stored, or transported to external locations. Depending on the required resources, additional devices (e.g., a laptop within the experimental setup) can be flexibly incorporated into the integration. Even during the resource-intensive training of the AD models, an external device can be temporarily added and subsequently removed. Consequently, no external connection to a data center or cloud is necessary, resulting in enhanced data security and knowledge protection.
  • Volume of data: With the flexible and hybrid integration of the modules, such as the direct integration within the control units or the possibility of outsourcing functionalities to adjacent devices, the large volume of data generated from high sampling rates can be processed effectively. At the same time, this reduces cost and energy consumption related to transmitting data to an external central location while minimizing latency. In the experimental unit, the data generated could be processed directly on the respective device, resulting in a reduced data stream and achieving high sampling rates, e.g., 8 ms, even with limited computing resources.
  • Risk management: Through the implementation of the modules as microservices and thus the partial decoupling of the functionality from the device’s operating system, the associated risk can be mitigated, even when direct integration within the devices is employed. Furthermore, the maximum CPU and memory consumption of individual services can be defined, which minimizes the effects on the device and the respective process, even in the event of an error within a module. By separating the individual functions, if one module fails, the rest of the system remains stable, allowing all other devices to be continuously monitored. The error-prone module can then be reported, and once rectified, the service can be restarted, allowing full functionality of the AD to resume. It is also possible to implement redundancy, as can be seen in Appendix A, whereby multiple modules can monitor the same device. As a result, the concept reduces risks and minimizes the interference of devices and processes within the ICS, even in direct integration.
  • Anomaly detection: Fast response times can be achieved by integrating functionalities directly within the devices or in adjacent ones and through the individual adaptation of DCMs and PMs. As a result, latency times of 300 ms were achieved with a model execution time of 200 ms within the experimental concept, even on resource-limited devices. A detailed analysis and discussion of the response times achieved with various integration concepts can be found in Appendix B. This also facilitates the early detection of critical anomalies, allowing for rapid responses. Simultaneously, the immediate localization of anomalies can be realized through the separate monitoring of individual devices, which can subsequently assist the user in resolving the anomalies.

7. Conclusions and Future Work

Complete decentralized integration of AD within an ICS is nearly impossible due to the significant variance among different devices. While some devices allow direct integration, others are designed as closed systems or lack the necessary performance. Therefore, in this paper, we introduce a hybrid and modular concept that enables the separation, modularization, and implementation of the required functionalities based on the available performance capabilities of the system. To establish AD, even for devices without direct integration possibilities, corresponding functions can be flexibly outsourced to adjacent devices, or additional hardware can be easily integrated. As a result, complete coverage of all devices in the ICS can be achieved. Furthermore, data processing directly in each device ensures high data security, low latency, and reduced data traffic. By customizing each module for the respective devices, risks can be minimized even during direct implementation. Based on the prototypical implementation within an industrial unit and the validation against all previously defined industrial requirements, we were able to demonstrate the effectiveness and efficiency of our concept. Thus, the concept facilitates the development of an effective, secure, and efficient AD even for resource-constrained legacy ICSs while maximizing the utilization of all available resources.
Despite the promising results of the concept, there are several directions for future research. Until now, the concept has only been tested in one industrial system. By integrating the concept into a complete production line, further tests can be conducted, and the concept can be further improved. Furthermore, by incorporating additional AD models such as LSTMs, RNNs, VAEs, and OC-SVMs, along with communication protocols like EtherCAT, EtherNet/IP, and Mechatrolink, even greater flexibility and universal integrability can be achieved. Moreover, despite the possibilities already described to minimize the implementation risks of individual modules within ICS devices, specific strategies for security, authentication, and redundancy could further enhance the overall cybersecurity of the concept. Finally, by constantly improving the hybrid integration concept to make it more accessible, easier to implement, and simpler to manage, we are focusing on realizing the concept as a framework to enable its use even by non-domain experts.

Author Contributions

Conceptualization, C.G.; methodology, C.G.; software, C.G.; validation, C.G.; formal analysis, C.G.; investigation, C.G.; resources, C.G.; data curation, C.G.; writing—original draft preparation, C.G.; writing—review and editing, C.G. and B.G.H.; visualization, C.G.; supervision, B.G.H.; project administration, C.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data and additional information can be provided upon request.

Conflicts of Interest

Author Christian Goetz was employed by Yaskawa Europe GmbH. The remaining author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Appendix A. Resource Consumption

All integrated modules listed in Table 1 were evaluated for their resource utilization. The required CPU power and memory consumption of each module on the respective device were monitored during production. In the cases of CtrUnit1 and CtrUnit2, it was not possible to track the modules individually due to the specialized hardware. Thus, the total resource consumption for the entire device was recorded instead. Figure A1 shows the separate device statistics, whereby for the ComUnit, the recordings were split between the COM and the integrated DCMs for better visualization. In the figures, the area between the two red dotted lines indicates a production stop with the corresponding standby status of the modules. The area behind the purple line in Figure A1g,h shows the idle state after removing the modules.
Figure A1. CPU and memory usage of the different devices in the experimental integration: (a,b) ComUnit with COM; (c,d) ComUnit with DCMs; (e,f) SCADA with PMs; (g,h) CtrUnit1/CtrUnit2 with PMs/DCMs; red dotted lines: indicate a production stop with the corresponding standby status of the modules; purple dotted line: shows the idle state after removing the modules.
As can be seen in Figure A1e,f, the PMs require the most CPU power to execute the respective AD pipeline. Within the SCADA, the average CPU load was 6% per module, with a memory usage of 80 MB each. For the DCMs within the ComUnit, the average CPU load was around 1.5%, with a memory usage of 45 MB. Due to the lightweight integration, the two PMs and DCMs of the CtrUnits could be redundantly integrated, ensuring continuous monitoring even if one control unit was interrupted. The COM (Figure A1a,b) required a maximum CPU load of 4% with a maximum memory usage of 1500 MB to realize the entire parallel data traffic of the experimental setup. The measurements show that the added SBC can successfully handle both the COM and all DCMs of the experimental integration, with additional resources still available. This means that further devices could be integrated should the system change. The PMs require more resources and consume nearly all the performance of the SCADA system. To integrate additional PMs, outsourcing to the ComUnit or utilizing another device should be considered.
According to the results, the lightweight modules can be flexibly distributed to the various existing devices, even in the CtrUnits, allowing optimum use of system resources. Simultaneously, with the hybrid concept, additional hardware can effortlessly be integrated, and COM modules can be interconnected, enabling fast and scalable integration of further modules and complete systems.

Appendix B. Latency

Several integration scenarios were set up to evaluate the response times of the various integration possibilities of the hybrid concept (Figure A2a–e). To ensure valid and precise measurements, even across multiple devices, artificial anomalies were generated directly by the data-producing device. The measurement was started as soon as the anomaly was generated and concluded when feedback was received on the device. Therefore, the cycle time could be measured in the ms range without inaccuracies caused by measurements across different devices. The scenarios are shown in Figure A2a–e; the achieved latency times for the complete cycle (Cyc), with separately listed model execution times (ME), are shown in Figure A2f.
Figure A2. Latency measurement in different scenarios (MC = Motion Control System): (a) direct integration; (b) cloud integration; (c) single ComUnit integration; (d) fully hybrid; (e) high-performance unit integration; (f) latency measurements: complete detection cycle (Cyc), model execution cycle (ME).
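The measurement principle can be sketched as follows: because the clock is started and stopped on the same device, no cross-device clock synchronization is required. The timer callback below merely simulates the DCM–COM–PM–RM round trip.

```python
import threading
import time

feedback = threading.Event()          # set once the reaction reaches the device

def inject_anomaly():
    """Write an artificial anomaly on the data-producing device; here a
    timer simulates the round trip through DCM, COM, PM, and RM."""
    threading.Timer(0.3, feedback.set).start()

cycles_ms = []
for _ in range(10):
    t0 = time.perf_counter()          # start: anomaly generated on the device
    inject_anomaly()
    feedback.wait()                   # stop: feedback received on the same device
    feedback.clear()
    cycles_ms.append((time.perf_counter() - t0) * 1000.0)

print(f"mean cycle time: {sum(cycles_ms) / len(cycles_ms):.1f} ms")
```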
As can be seen from Figure A2f, the highest model execution time (Scenario_1_ME) is reached when the model is directly integrated into the CtrUnit due to its limited performance. On the other hand, the shortest model execution time is reached in Scenarios 2 and 5 on the most powerful device (an external high-performance PC). In Scenario 4, integration is achieved through a hybrid approach using multiple devices. On average, the lowest latency is achieved in Scenario 3, resulting from the direct integration of all modules into the ComUnit and the shortest transmission times. In Scenario 2, a connection to a central external computing unit is simulated, and the data exchange occurs via remote access. Despite the low model execution times, the latency is the highest and fluctuates significantly, which can be explained by the uncertain transmission delay over several stations.
Based on the results, the hybrid integration achieves faster response times than an external central integration, even with resource-constrained devices, facilitating quick and effective anomaly detection.

Appendix C. Energy Consumption

Various measurements were carried out to determine the energy consumption of the experimental integration. Since the devices used in the experimental unit perform both the AD and the production process, consumption was compared between integration and non-integration of the modules. To record the energy consumption, the library CodeCarbon (https://codecarbon.io/, accessed 24 November 2024) was used for the SCADA, and the system tool vcgencmd was used for the ComUnit. All measurements were checked for accuracy using external power meters (YOJOCK UT003, Tapo P100). Figure A3 shows the measurements.
Figure A3. Energy consumption measurements: (a) SCADAUnit; (b) ComUnit; red dotted line: standby of the system; yellow dotted line: without DCMs; purple dotted line: without DCMs and COM.
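A minimal sketch of such a CodeCarbon measurement; the project name, sampling interval, and the sleep standing in for the running PMs are illustrative.

```python
import time
from codecarbon import EmissionsTracker

# Track the energy drawn while the modules run (workload simulated by a sleep)
tracker = EmissionsTracker(project_name="scada_pms", measure_power_secs=15)
tracker.start()
time.sleep(60)                  # stand-in for one production interval with PMs active
emissions_kg = tracker.stop()   # detailed energy data are written to emissions.csv
print(f"estimated emissions: {emissions_kg:.6f} kg CO2eq")
```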
Within the SCADA unit (Figure A3a), the energy consumption increases by an average of 12 W due to the integration of the six PMs. In the ComUnit, the energy consumption increases by 1.5 W for the six DCMs and 1 W for the COM. In Figure A3a,b, the area between the two red dotted lines indicates a production stop with the corresponding standby status of the modules. In Figure A3b, the yellow line shows when the DCMs are removed, while the purple line shows the shutdown of the COM.
Based on the measurements, the energy consumption of the complete system increases only slightly, even with the full integration of all modules. By implementing the hybrid integration concept, the effective utilization of existing resources can be ensured while maintaining a limited increase in energy consumption.

References

1. Umer, M.A.; Junejo, K.N.; Jilani, M.T.; Mathur, A.P. Machine learning for intrusion detection in industrial control systems: Applications, challenges, and recommendations. Int. J. Crit. Infrastruct. Prot. 2022, 38, 100516.
2. Bhamare, D.; Zolanvari, M.; Erbad, A.; Jain, R.; Khan, K.; Meskin, N. Cybersecurity for industrial control systems: A survey. Comput. Secur. 2020, 89, 101677.
3. Conti, M.; Donadel, D.; Turrin, F. A Survey on Industrial Control System Testbeds and Datasets for Security Research. IEEE Commun. Surv. Tutor. 2021, 23, 2248–2294.
4. Kabore, R.; Kouassi, A.; N'goran, R.; Asseu, O.; Kermarrec, Y.; Lenca, P. Review of anomaly detection systems in industrial control systems using deep feature learning approach. Engineering 2021, 13, 30–44.
5. Kriaa, S.; Pietre-Cambacedes, L.; Bouissou, M.; Halgand, Y. A survey of approaches combining safety and security for industrial control systems. Reliab. Eng. Syst. Saf. 2015, 139, 156–178.
6. Feng, C.; Li, T.; Chana, D. Multi-level Anomaly Detection in Industrial Control Systems via Package Signatures and LSTM Networks. In Proceedings of the 2017 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Denver, CO, USA, 26–29 June 2017; pp. 261–272.
7. Armellin, A.; Caviglia, R.; Gaggero, G.; Marchese, M. A Framework for the Deployment of Cybersecurity Monitoring Tools in the Industrial Environment. IT Prof. 2024, 26, 62–70.
8. Goetz, C.; Humm, B. Decentralized Real-Time Anomaly Detection in Cyber-Physical Production Systems under Industry Constraints. Sensors 2023, 23, 4207.
9. Koay, A.M.; Ko, R.K.L.; Hettema, H.; Radke, K. Machine learning in industrial control system (ICS) security: Current landscape, opportunities and challenges. J. Intell. Inf. Syst. 2023, 60, 377–405.
10. Huč, A.; Šalej, J.; Trebar, M. Analysis of Machine Learning Algorithms for Anomaly Detection on Edge Devices. Sensors 2021, 21, 4946.
11. Tsukada, M.; Kondo, M.; Matsutani, H. A Neural Network-Based On-Device Learning Anomaly Detector for Edge Devices. IEEE Trans. Comput. 2020, 69, 1027–1044.
12. Huong, T.T.; Bac, T.P.; Ha, K.N.; Hoang, N.V.; Hoang, N.X.; Hung, N.T.; Tran, K.P. Federated Learning-Based Explainable Anomaly Detection for Industrial Control Systems. IEEE Access 2022, 10, 53854–53872.
13. Brandalero, M.; Ali, M.; Le Jeune, L.; Hernandez, H.G.M.; Veleski, M.; da Silva, B.; Lemeire, J.; Van Beeck, K.; Touhafi, A.; Goedemé, T.; et al. AITIA: Embedded AI Techniques for Embedded Industrial Applications. In Proceedings of the 2020 International Conference on Omni-layer Intelligent Systems (COINS), Barcelona, Spain, 31 August–2 September 2020; pp. 1–7.
14. Nedeljkovic, D.; Jakovljevic, Z. CNN based method for the development of cyber-attacks detection algorithms in industrial control systems. Comput. Secur. 2022, 114, 102585.
15. Yu, X.; Yang, X.; Tan, Q.; Shan, C.; Lv, Z. An edge computing based anomaly detection method in IoT industrial sustainability. Appl. Soft Comput. 2022, 128, 109486.
16. Lübben, C.; Pahl, M.O. Distributed Device-Specific Anomaly Detection using Deep Feed-Forward Neural Networks. In Proceedings of the NOMS 2023-2023 IEEE/IFIP Network Operations and Management Symposium, Miami, FL, USA, 8–12 May 2023; pp. 1–9.
17. Hu, Y.; Yang, A.; Li, H.; Sun, Y.; Sun, L. A survey of intrusion detection on industrial control systems. Int. J. Distrib. Sens. Netw. 2018, 14, 1550147718794615.
18. Kim, B.; Alawami, M.A.; Kim, E.; Oh, S.; Park, J.; Kim, H. A Comparative Study of Time Series Anomaly Detection Models for Industrial Control Systems. Sensors 2023, 23, 1310.
19. Yan, P.; Abdulkadir, A.; Luley, P.P.; Rosenthal, M.; Schatte, G.A.; Grewe, B.F.; Stadelmann, T. A Comprehensive Survey of Deep Transfer Learning for Anomaly Detection in Industrial Time Series: Methods, Applications, and Directions. IEEE Access 2024, 12, 3768–3789.
20. Nankya, M.; Chataut, R.; Akl, R. Securing Industrial Control Systems: Components, Cyber Threats, and Machine Learning-Driven Defense Strategies. Sensors 2023, 23, 8840.
21. Zhao, X.; Zhang, L.; Cao, Y.; Jin, K.; Hou, Y. Anomaly Detection Approach in Industrial Control Systems Based on Measurement Data. Information 2022, 13, 450.
22. Kravchik, M.; Shabtai, A. Detecting Cyber Attacks in Industrial Control Systems Using Convolutional Neural Networks. In Proceedings of the 2018 Workshop on Cyber-Physical Systems Security and PrivaCy, CPS-SPC'18, Toronto, ON, Canada, 15–19 October 2018; pp. 72–83.
23. Wang, C.; Wang, B.; Liu, H.; Qu, H. Anomaly Detection for Industrial Control System Based on Autoencoder Neural Network. Wirel. Commun. Mob. Comput. 2020, 2020, 8897926.
24. Xu, L.; Wang, B.; Zhao, D.; Wu, X. DAN: Neural network based on dual attention for anomaly detection in ICS. Expert Syst. Appl. 2025, 263, 125766.
25. Sater, R.A.; Hamza, A.B. A federated learning approach to anomaly detection in smart buildings. ACM Trans. Internet Things 2021, 2, 1–23.
26. Liu, C.; Su, X.; Li, C. Edge Computing for Data Anomaly Detection of Multi-Sensors in Underground Mining. Electronics 2021, 10, 302.
27. Schneible, J.; Lu, A. Anomaly detection on the edge. In Proceedings of the MILCOM 2017—2017 IEEE Military Communications Conference (MILCOM), Baltimore, MD, USA, 23–25 October 2017; pp. 678–682.
28. Liu, Y.; Garg, S.; Nie, J.; Zhang, Y.; Xiong, Z.; Kang, J.; Hossain, M.S. Deep Anomaly Detection for Time-Series Data in Industrial IoT: A Communication-Efficient On-Device Federated Learning Approach. IEEE Internet Things J. 2021, 8, 6348–6358.
29. Lee, H.; Oh, J.; Kim, K.; Yeon, H. A data streaming performance evaluation using resource constrained edge device. In Proceedings of the 2017 International Conference on Information and Communication Technology Convergence (ICTC), Jeju, Republic of Korea, 18–20 October 2017; pp. 628–633.
30. Raptis, T.P.; Cicconetti, C.; Passarella, A. Efficient topic partitioning of Apache Kafka for high-reliability real-time data streaming applications. Future Gener. Comput. Syst. 2024, 154, 173–188.
31. Peddireddy, K. Streamlining Enterprise Data Processing, Reporting and Realtime Alerting using Apache Kafka. In Proceedings of the 2023 11th International Symposium on Digital Forensics and Security (ISDFS), Chattanooga, TN, USA, 11–12 May 2023; pp. 1–4.
32. Raptis, T.P.; Passarella, A. A Survey on Networked Data Streaming with Apache Kafka. IEEE Access 2023, 11, 85333–85350.
33. An, Y.; Yu, F.R.; Li, J.; Chen, J.; Leung, V.C.M. Edge Intelligence (EI)-Enabled HTTP Anomaly Detection Framework for the Internet of Things (IoT). IEEE Internet Things J. 2021, 8, 3554–3566.
34. Queiroz, J.; Leitão, P.; Barbosa, J.; Oliveira, E. Agent-Based Approach for Decentralized Data Analysis in Industrial Cyber-Physical Systems. In Proceedings of the Industrial Applications of Holonic and Multi-Agent Systems, Linz, Austria, 26–29 August 2019; Mařík, V., Kadera, P., Rzevski, G., Zoitl, A., Anderst-Kotsis, G., Tjoa, A.M., Khalil, I., Eds.; Springer: Cham, Switzerland, 2019; pp. 130–144.
35. Mocnej, J.; Pekar, A.; Seah, W.K.; Papcun, P.; Kajati, E.; Cupkova, D.; Koziorek, J.; Zolotova, I. Quality-enabled decentralized IoT architecture with efficient resources utilization. Robot. Comput.-Integr. Manuf. 2021, 67, 102001.
36. Gerz, F.; Bastürk, T.R.; Kirchhoff, J.; Denker, J.; Al-Shrouf, L.; Jelali, M. A Comparative Study and a New Industrial Platform for Decentralized Anomaly Detection Using Machine Learning Algorithms. In Proceedings of the 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy, 18–23 July 2022; pp. 1–8.
37. Zamanzadeh Darban, Z.; Webb, G.I.; Pan, S.; Aggarwal, C.; Salehi, M. Deep Learning for Time Series Anomaly Detection: A Survey. ACM Comput. Surv. 2024, 57, 1–42.
38. Pinto, R.; Gonçalves, G.; Delsing, J.; Tovar, E. Enabling data-driven anomaly detection by design in cyber-physical production systems. Cybersecurity 2022, 5, 9.
39. Takyiwaa Acquaah, Y.; Kaushik, R. Normal-Only Anomaly Detection in Environmental Sensors in CPS: A Comprehensive Review. IEEE Access 2024, 12, 191086–191107.
40. Nardi, M.; Valerio, L.; Passarella, A. Centralised vs decentralised anomaly detection: When local and imbalanced data are beneficial. In Proceedings of the Third International Workshop on Learning with Imbalanced Domains: Theory and Applications, Bilbao, Spain, 17 September 2021; Proceedings of Machine Learning Research; Moniz, N., Branco, P., Torgo, L., Japkowicz, N., Woźniak, M., Wang, S., Eds.; PMLR: Cambridge, MA, USA, 2021; Volume 154, pp. 7–20.
41. Adhikari, D.; Jiang, W.; Zhan, J.; Rawat, D.B.; Bhattarai, A. Recent advances in anomaly detection in Internet of Things: Status, challenges, and perspectives. Comput. Sci. Rev. 2024, 54, 100665.
42. Gunes, V.; Peter, S.; Givargis, T.; Vahid, F. A Survey on Concepts, Applications, and Challenges in Cyber-Physical Systems. KSII Trans. Internet Inf. Syst. 2014, 8, 4242–4268.
43. Nasir, Z.U.I.; Iqbal, A.; Qureshi, H.K. Securing Cyber-Physical Systems: A Decentralized Framework for Collaborative Intrusion Detection with Privacy Preservation. IEEE Trans. Ind. Cyber-Phys. Syst. 2024, 2, 303–311.
44. Salam, A.; Abrar, M.; Amin, F.; Ullah, F.; Khan, I.A.; Alkhamees, B.F.; AlSalman, H. Securing Smart Manufacturing by Integrating Anomaly Detection with Zero-Knowledge Proofs. IEEE Access 2024, 12, 36346–36360.
45. Yang, M.; Zhang, J. Data anomaly detection in the internet of things: A review of current trends and research challenges. Int. J. Adv. Comput. Sci. Appl. 2023, 14, 9.
46. Luo, Y.; Xiao, Y.; Cheng, L.; Peng, G.; Yao, D.D. Deep Learning-based Anomaly Detection in Cyber-physical Systems: Progress and Opportunities. ACM Comput. Surv. 2021, 54, 1–36.
47. Ahmed, C.M.; M R, G.R.; Mathur, A.P. Challenges in Machine Learning based approaches for Real-Time Anomaly Detection in Industrial Control Systems. In Proceedings of the 6th ACM on Cyber-Physical System Security Workshop, CPSS '20, Taipei, Taiwan, 6 October 2020; pp. 23–29.
48. Jankov, D.; Sikdar, S.; Mukherjee, R.; Teymourian, K.; Jermaine, C. Real-time High Performance Anomaly Detection over Data Streams: Grand Challenge. In Proceedings of the 11th ACM International Conference on Distributed and Event-Based Systems, DEBS '17, Barcelona, Spain, 19–23 June 2017; pp. 292–297.
49. Iglesias Vázquez, F.; Hartl, A.; Zseby, T.; Zimek, A. Anomaly detection in streaming data: A comparison and evaluation study. Expert Syst. Appl. 2023, 233, 120994.
50. Zhou, X.; Peng, X.; Xie, T.; Sun, J.; Ji, C.; Li, W.; Ding, D. Fault Analysis and Debugging of Microservice Systems: Industrial Survey, Benchmark System, and Empirical Study. IEEE Trans. Softw. Eng. 2021, 47, 243–260.
51. Laigner, R.; Zhou, Y.; Salles, M.A.V.; Liu, Y.; Kalinowski, M. Data management in microservices: State of the practice, challenges, and research directions. Proc. VLDB Endow. 2021, 14, 3348–3361.
52. Zografopoulos, I.; Ospina, J.; Liu, X.; Konstantinou, C. Cyber-Physical Energy Systems Security: Threat Modeling, Risk Assessment, Resources, Metrics, and Case Studies. IEEE Access 2021, 9, 29775–29818.
53. Feng, C.; Tian, P. Time Series Anomaly Detection for Cyber-physical Systems via Neural System Identification and Bayesian Filtering. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, KDD '21, Virtual, 14–18 August 2021; pp. 2858–2867.
54. Dehlaghi-Ghadim, A.; Moghadam, M.H.; Balador, A.; Hansson, H. Anomaly Detection Dataset for Industrial Control Systems. IEEE Access 2023, 11, 107982–107996.
55. Gaddam, A.; Wilkin, T.; Angelova, M.; Gaddam, J. Detecting Sensor Faults, Anomalies and Outliers in the Internet of Things: A Survey on the Challenges and Solutions. Electronics 2020, 9, 511.
56. Micskei, Z.; Waeselynck, H. The many meanings of UML 2 Sequence Diagrams: A survey. Softw. Syst. Model. 2011, 10, 489–514.
57. Yoo, H.; Ahmed, I. Control Logic Injection Attacks on Industrial Control Systems. In Proceedings of the ICT Systems Security and Privacy Protection, Lisbon, Portugal, 25–27 June 2019; Dhillon, G., Karlsson, F., Hedström, K., Zúquete, A., Eds.; Springer: Cham, Switzerland, 2019; pp. 33–48.
58. McLaughlin, S.; Konstantinou, C.; Wang, X.; Davi, L.; Sadeghi, A.R.; Maniatakos, M.; Karri, R. The Cybersecurity Landscape in Industrial Control Systems. Proc. IEEE 2016, 104, 1039–1057.
59. Marulli, F.; Lancellotti, F.; Paganini, P.; Dondossola, G.; Terruggia, R. Towards a novel approach to enhance cyber security assessment of industrial energy control and distribution systems through generative adversarial networks. J. High Speed Netw. 2024.
Figure 1. Exemplary implementation of modules in an ICS. COM: communication module; PM: production module; DCM: data collection module; RM: reaction module; SBC: Single-Board Computer; blue arrow: indicates the integration of the corresponding module into the device; orange arrow: marks the device for which the DCM is collecting data; green arrow: indicates related modules.
Figure 2. Structure of (a) data collection module; (b) production module; (c) reaction module; (d) development module.
Figure 3. Overview of the communication module, including the different integrated DBs. As the central component, the data exchange server enables communication and data transfer between the DCMs and the PMs. The integration DB holds essential information required to generate the modules. The time-series DB stores the data transmitted from the individual devices by the DCMs for generating the AD-Pipelines. The event/log DB collects the detected anomalies, including additional information for later analysis. The tracking DB records the training results of the AD-Pipelines.
Figure 4. Simplified UML sequence diagram of initial and production stages, including interactions of the modules and direct reaction to the devices.
Figure 5. Rotary table dispenser demonstration system 'Totaru' [8].
Figure 6. The experimental integration of the prototype as defined in the GUI.
Table 1. Motion devices of the ICS with responsible integrated modules and the respective hardware identifier.
Motion Device                | Modules | Hardware Identifier
SliderRobot                  | PM0     | Control Unit 1
                             | DCM0    | Control Unit 2
RobotLogistics               | PM1     | SCADA Unit
                             | DCM1    | Communication Unit
SliderLogisticsTray          | PM2     | Control Unit 1
                             | DCM2    | Control Unit 2
RobotProduction              | PM3     | SCADA Unit
                             | DCM3    | Communication Unit
CircularConveyorBelt         | PM4     | SCADA Unit
                             | DCM4    | Communication Unit
RobotPPU (U Axis of RobotPP) | PM5     | SCADA Unit
                             | DCM5    | Communication Unit
RobotPPL (L Axis of RobotPP) | PM6     | SCADA Unit
                             | DCM6    | Communication Unit
RobotPPS (S Axis of RobotPP) | PM7     | SCADA Unit
                             | DCM7    | Communication Unit
RotatingTable                | PM8     | SCADA Unit
                             | DCM8    | Communication Unit
                             | COM     | Communication Unit
                             | DEV     | Development Unit
                             | RM      | SCADA Unit
                             | GUI     | SCADA Unit
Table 2. Detailed specification of used hardware for model integration.
Hardware Identifier | Hardware           | Specification
Development Unit    | Dell Latitude 7490 | Intel i7-8650U 1.90 GHz / 32 GB RAM
Communication Unit  | Raspberry Pi 5     | ARM Cortex-A76 / 8 GB RAM
SCADA Unit          | Intel NUC7i7DNB    | Intel i7-8650U 1.90 GHz / 16 GB RAM
Control Unit 1      | Yaskawa iC9212-EC  | ARM Cortex-A17 1.26 GHz / 2 GB RAM
Control Unit 2      | Yaskawa iC9226-EC  | ARM Cortex-A17 1.26 GHz / 2 GB RAM
Table 3. Default AD-Pipeline parameters.
Parameter            | Set 1                       | Set 2
Model                | 1D-CAE                      | MLP-AE
Window size          | 64                          | 64
Step size            | 1                           | 1
Number of layers     | 5                           | 5
Dimensions of layers | 64/32/16/32/64 (MaxPooling) | 64/32/16/32/64
Loss function        | MAE                         | MAE
Optimizer            | Adam                        | Adam