1. Introduction
Operational technology (OT) devices and their communication protocols are increasingly being adopted into networked applications [1]. This trend is expanding with the increasing adoption of Industrial IoT (IIoT) networks, driven by the demand for greater automation, data sharing, and remote monitoring [
2]. These networks may also rely on time-sensitive applications that may impact the safety of workers, for example, to shut down a production line when an incident is detected, or to protect energy grid systems in the event of a fault. Previously, delivery of communication over serial networks had limited distances to travel and set maximum delays for communication. As a result, these protocols were ideally suited for safety-critical applications. However, the translation to TCP/IP-based networked communication to better integrate and coexist with existing IT protocols has introduced additional latency and cybersecurity concerns for these communication networks.
In many cases, a complete retrofit of existing OT device deployments is not feasible due to the cost, complexity, and potential disruptions associated with upgrading legacy equipment. As a result, there is a growing reliance on solutions that bridge the gap between older OT systems and modern IT networks. Protocol gateways, which act as translators between different communication protocols, have become a common solution in these scenarios. These gateways facilitate the integration of OT systems that use legacy communication protocols, such as Modbus/RTU [
3], with more modern network infrastructures using more sophisticated OT protocols, such as DNP3 [
4], by mapping protocol values through shared memory within the gateway device. However, while these gateways offer flexibility and interoperability, they also introduce new vulnerabilities into OT networks by making communication more accessible to external threats.
The security and timely communication of OT systems are critical, as vulnerabilities in communication protocols can have far-reaching consequences on both operational uptime and worker safety. OT protocols are often designed with limited security features and are sometimes not updated to address modern cybersecurity threats. These vulnerabilities are often mitigated through a defense-in-depth strategy, which involves multiple layers of protection, including network segmentation, firewalls, and intrusion detection systems. However, these measures may not always prevent all vulnerabilities, especially those inherent in a device’s protocol implementation [
5]. The need to monitor and control OT systems across vast geographic areas has further complicated this issue. An example is the utilization of renewable energy resources and decentralized power grids, which requires the deployment of monitoring devices and sensors interconnected over a large geographical footprint. Remote monitoring, often achieved via wireless networks [
6,
7], is becoming essential for maintaining operational efficiency and system resilience. However, this shift introduces new challenges, particularly when communication failures occur between devices located at remote sites. These failures can lead to significant disruptions in system performance, delayed responses to critical events, and increased downtime.
In our previous work, we implemented both physical and emulated testbeds to evaluate the performance of specific OT protocols and their on-device implementations [
8]. These studies revealed significant variability in the performance of the same protocol across different devices, highlighting the potential for device-specific implementation flaws to cause system-wide failures. In some cases, mismanagement of key protocol functions, such as flag handling or message synchronization, led to complete communication breakdowns.
Given the critical need for reliable communication in OT systems, it is essential to address the vulnerabilities introduced by integrating legacy OT protocols with modern IT infrastructure. Our team’s ongoing efforts focus on systematically reviewing OT protocol specifications to identify vulnerabilities and safety risks, especially those stemming from dependencies on lower-layer IT protocols.
Building on our prior capabilities to analyze such vulnerabilities through simulation, emulation, and real-world testbeds, this paper provides a detailed comparison of two widely deployed OT protocols: Modbus TCP and DNP3. We examine how their design choices—particularly their use of TCP/IP—affect communication reliability, especially under conditions such as device faults or malicious disruptions.
The core contributions of this study are twofold:
We identify critical stages in the Modbus TCP and DNP3 specifications where communication recovery mechanisms are defined and analyze how these impact time-sensitive OT operations.
We implement a multi-tiered OT testbed to empirically evaluate the real-world implications of these behaviors. We also evaluate the effectiveness of individual defense strategies against these vulnerabilities.
Our results demonstrate that such dependencies can lead to unintended and potentially hazardous consequences in industrial environments. We also evaluate various mitigation strategies within the testbed to provide practical guidance for improving OT network resilience. The remainder of the paper is organized as follows.
Section 2 covers related works, Section 3 describes our methodology, including possible communication errors and their network implications, Section 4 presents the results of this study, and Section 5 contains a discussion of our obtained results.
2. Related Works
The security of OT protocols has been extensively studied through various methods such as formal verification, simulation, and physical testbeds [
9,
10]. As OT equipment in the field becomes increasingly interconnected with accessible IT data networks, it presents greater opportunities for improved maintenance, safety, and operability. However, these advancements also introduce new security and safety risks, as connected devices can potentially become entry points for malicious actors into a previously well-isolated OT system. The growing integration of OT systems with IT networks expands the attack surface and raises significant challenges in securing these systems.
A comprehensive review by the authors in [
11] examines attacks on DNP3 and other OT protocols over the past 15 years, shedding light on various vulnerabilities that attackers have exploited. Their review includes attacks targeting OT IP filtering systems and emphasizes the need for defense-in-depth strategies, protocol hardening, encryption, and anomaly detection mechanisms to prevent such attacks. In a more specific case, the authors of [
12] propose an Artificial Neural Network (ANN) to detect reconnaissance attacks on the DNP3 protocol. While the approach shows promise, the authors note limitations, particularly the model’s inability to generalize, resulting from the limited size and diversity of the training and test datasets. This issue is common in OT security research, where available datasets are often insufficient to model the full range of attack scenarios.
In [
13], the authors take a different approach, by training a deep neural network to serve as an intrusion detection system (IDS) specifically for detecting attacks against DNP3. Their system demonstrated an impressive 99% accuracy in identifying attacks, showcasing the potential of machine learning to enhance OT security. Building on this work, Dangwal et al. [
14] compared the efficacy of various intrusion detection models, including decision trees, deep neural networks, and transformer neural networks. Their results showed that transformer neural networks achieved the highest accuracy, with a detection rate of 99.56%, which underscores the potential for deep learning models to improve the robustness of OT security systems.
The integration of serial OT protocols with modern communication networks, such as the internet, is another critical area of concern. These communication interfaces provide greater connectivity but also introduce new risks related to interoperability and security. The work in [
15] presents an interworking model between Modbus and IoT devices, validated in a scenario involving solar energy equipment. This research highlights the complexities of integrating legacy OT protocols with emerging technologies, where maintaining both security and seamless functionality is often a delicate balance. Similarly, in [
16], the authors assess the performance of Modbus and DNP3 protocols through both simulation and testbed environments. Their findings indicate that wireless technologies like WiFi, while widely used, are not suitable for meeting the strict latency and security requirements of OT systems.
In [
17], a security enhancement for the Modbus/TCP protocol is proposed through the use of Chaskey-12 Message Authentication Codes (MACs), as defined in IEC 29192-6 [
18]. This cryptographic technique provides an additional layer of security by ensuring the integrity and authenticity of Modbus messages, helping to prevent unauthorized access or manipulation of data during communication. Another important consideration in OT security is the scalability of communication networks, particularly in large-scale deployments. In [
19], the authors demonstrate that, due to the sequential polling method of the Modbus protocol, reliable communication becomes infeasible in environments with higher Bit Error Rates (BERs), where retransmissions are required, even before reaching the maximum number of servers supported by the protocol. This highlights the challenges faced in large-scale OT systems, where communication reliability and scalability are crucial for maintaining system performance and security.
This scalability issue is further addressed in [
20], where the authors design and configure a hardware-in-the-loop testbed based on the 2000-bus Texas synthetic grid. This testbed is monitored by DNP3 using real-world equipment, providing valuable insights into the challenges of scaling OT communication protocols while ensuring both operational efficiency and security.
Similarly, Moldovan and Ayyanar [
21] explore how the larger packet frame of DNP3 can reduce computational and transmission times, particularly in large-scale Distributed Energy Resource (DER) networks. Through network emulation with an OPAL-RT simulator, the study demonstrates that DNP3’s design is well-suited to support high-volume data exchanges in large-scale OT systems, providing both operational efficiency and improved performance in communication-heavy environments.
Recent studies have highlighted vulnerabilities in encrypted communications. One such study, published in [
22], demonstrates that even when Modbus traffic is encrypted using unpadded cipher modes like AES-GCM or RC4, side-channel attacks still manage to exploit variations in packet sizes to infer specific Modbus function codes. This is due to the limited set of Modbus commands and their distinct payload sizes, which can be distinguished through differential packet size analysis. In [
23], the authors demonstrated how the manipulation of Linux’s IP identification assignment can create a side channel for off-path attackers to infer TCP sequence numbers, enabling session hijacking without direct access to packet contents. Similarly, traffic-based and energy-based side-channel attacks in wireless networks have shown that encrypted communications and even device energy patterns can be exploited to leak sensitive information.
In addition to traffic-based side-channel attacks that incorporate packet sizes or timing, recent work by Cao et al. [
24] presents a principled method for discovering TCP side-channel vulnerabilities using a model-checking tool called SCENT. Their work highlights violations of the non-interference property between TCP connections and discovers 12 previously unknown side-channel vulnerabilities in Linux and FreeBSD. These vulnerabilities could, in theory, enable attackers to infer the state of TCP connections even without being on the path of the connection.
While research efforts have been published that provide an overview of attacks on SCADA systems and provide datasets for different attack vectors they investigated, there is no review of mitigation strategies and their outcomes against these attacks with specific OT security mechanisms. For example, in [
25], the authors create a dataset from a testbed utilizing virtual OT equipment such as PLCs and water tank sensors protected by a pfSense firewall and a Suricata IDS. In [
26], the authors present a dataset from industrial control systems controlling gas pipeline and water storage traffic over Modbus. However, our work utilizes hardware designed for OT applications with OT-specific security features.
The primary contribution of our work presented in this paper is the exploration of underlying assumptions when delivering application data over IT protocols, that is, the timely and reliable delivery of data. We show that the resilience of communication utilizing these protocols may be different based on the outlined behavior of each device within the protocol. Specifically, we focus on the impacts of the single-ended method of communication outlined in Modbus and how that affects reliability. We collect extensive results on a multi-tiered testbed to show the implications of communication issues within safety-critical applications.
3. Methodology
In our previous work [
27], we outlined the critical states for the Modbus protocol, and the implications for the protocol’s operation from the lack of defined timeout requirements for any communication errors. With that, however, we must also consider these issues for the underlying layers of TCP/IP that transport our Modbus data. For this method, we carefully reviewed the specifications and investigated the implementation guides for both Modbus TCP and DNP3, and found differing rules for connection loss and connection re-establishment between the two protocols, which would lead to differing performance with communication outages and time-sensitive data.
To lend context to our findings within the broader OT landscape, we also consider the behavior of protocols such as IEC-61850’s [
28] GOOSE and SMV, and the IEEE C37 series. Unlike Modbus TCP and DNP3, GOOSE and SMV are designed for high-speed, event-driven, or time-critical communication directly over Ethernet, bypassing all higher-layer aspects of the TCP/IP stack entirely. SMV, in particular, supports continuous streaming of time-synchronized measurement data with strict timing constraints, making it suitable for protection and control applications that demand deterministic performance. The IEEE C37.118 protocol suite, widely used in synchrophasor communications, also emphasizes low-latency, high-reliability data transfers with explicit handling for error states and quality indicators to maintain operational continuity. These more modern or specialized alternatives highlight the limitations inherent to legacy protocols and underscore the need for well-defined recovery mechanisms in time-sensitive OT environments. This broader perspective emphasizes the relevance of our work for evaluating protocol robustness across a spectrum of industrial communication standards.
3.1. Identification of Possible Communication Failures
The implementation guide for Modbus TCP closely aligns with the traditional operating conditions of Modbus over serial communication media [
29], in particular RS-232 and RS-485. In these conditions, the client is always the only device initiating a connection to the server. We found from our review of the protocol specification that this operating principle was similarly applied to Modbus TCP, in which clients are still the only devices allowed to make TCP connections. Thus, if a device crashes or has a communication failure, it is the responsibility of the client to re-establish its TCP communication, even if the server is able to detect that communication failure and may even have pending data. In the case of a single device disconnecting, we refer to this as a single-ended communication failure. The procedures for handling these scenarios are largely dependent on the underlying transport technology—in this case, TCP/IP. These failures are subsequently remedied by the client re-initiating a handshake. However, failure to respond expands the retry time based on TCP’s exponential backoff. We contrast this to DNP3’s implementation guide [
4], which allows for connection establishment by both the master and outstation. In DNP3’s implementation, there are also rules that connections can be established as soon as data is available. Overall, we found from our review that Modbus TCP’s protocol specification is in large part driven by its serial communication roots, whereas DNP3 more aptly considers its TCP/IP-based operating environment.
The one-sided communication crashes in Modbus TCP can be examined as client-based, as shown in Figure 1, and server-based, as shown in Figure 2. A client crash or communication loss is handled similarly to standard TCP connection establishment, as the client will send a SYN packet to the server to establish a new link. This can, however, cause issues for time-sensitive communications, as the server may not accept the client’s request if ports are reused until the Keep-Alive timeout occurs or the client’s new synchronization packet prompts the server to reset the connection.
Furthermore, our review found that server crashes are more severe in their potential ramifications on OT operations, because they may cause longer delays in re-establishment due to the exponential backoff implemented by Modbus TCP clients. In this case, a client that is unable to connect on its first attempt will increase the time between attempts until the maximum is reached, typically 64 s. This matters because a malicious actor can easily exploit the behavior by sending packets that interrupt communication and thereby force the exponential backoff toward its upper limit. As a result, the connection will remain interrupted for extended periods of time, with significant repercussions for safety-critical applications.
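To make the effect of the backoff concrete, the following minimal sketch computes the wait before each reconnection attempt and the cumulative time spent waiting. The 1 s initial delay and the doubling factor are illustrative assumptions; only the 64 s ceiling comes from the text above, and actual TCP stacks and PLC firmware may use different values.

```python
from itertools import accumulate


def backoff_delays(attempts, initial_s=1.0, factor=2.0, cap_s=64.0):
    """Return the wait (in seconds) before each reconnection attempt.

    initial_s and factor are assumed values for illustration only;
    cap_s is the typical 64 s ceiling mentioned in the text.
    """
    delays = []
    delay = initial_s
    for _ in range(attempts):
        delays.append(min(delay, cap_s))  # never wait longer than the cap
        delay *= factor                   # grow the wait after each failure
    return delays


def cumulative_outage(delays):
    """Total time elapsed after each failed attempt."""
    return list(accumulate(delays))


delays = backoff_delays(8)
print(delays)                     # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 64.0]
print(cumulative_outage(delays))  # [1.0, 3.0, 7.0, 15.0, 31.0, 63.0, 127.0, 191.0]
```

Under these assumptions, a client that fails seven times in a row waits the full 64 s before every further attempt, so each additional interruption at that point can cost up to another minute of lost communication.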
We modeled both client-side and server-side fault conditions using controlled packet injection and reset mechanisms. Packet capture tools such as Wireshark and device-level logging were used to trace connection recovery behavior, latency, and TCP retransmission patterns. Malicious traffic scenarios, which included spoofed SYN and RST packets, were crafted to test the robustness of client reconnection logic and protocol handling. This expanded testing and evaluation methodology ensures that our findings are grounded in both theoretical review and empirical observations.
3.2. Active Network Implications
To identify and evaluate the implications of these findings on an active network, we introduce the testbed depicted in Figure 3. This testbed employs a process-level network utilizing serial Modbus communication for a pump, linear actuator, and temperature sensor. These devices are connected to a Programmable Logic Controller (PLC), which converts Modbus TCP communications into Modbus RTU messages to control the physical processes.
The PLC subsequently communicates with a protocol translation gateway between Modbus TCP and DNP3. This gateway facilitates bi-directional communication, allowing values from the DNP3 network to be mapped to the Modbus network. For this study, the Modbus network reads register values to determine the required position of the motor and linear actuator in the serial network. The gateway is the only master device in the DNP3 network and uses only solicited data, polled at regular intervals.
To evaluate the resistance of each device under threat conditions, we conducted targeted tests on the PLC, gateway, and DNP3 outstation. These tests included SYN, RST, and PSH+ACK floods, with rates increased incrementally from 5 to 50 packets per second in steps of 5. The tests were run with and without various mitigation strategies enabled to assess their effectiveness in reducing or preventing disruptions under increasing network stress.
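The test sweep described above can be enumerated programmatically. The sketch below is one hypothetical way to lay out the runs; it assumes that every combination of target, flood type, and rate is exercised, and it reduces the mitigation strategies to a single on/off flag, neither of which is stated in the text.

```python
from itertools import product

TARGETS = ("PLC", "gateway", "DNP3 outstation")
FLOODS = ("SYN", "RST", "PSH+ACK")
RATES_PPS = tuple(range(5, 51, 5))  # 5, 10, ..., 50 packets per second
MITIGATION = (False, True)          # simplification: one on/off flag


def build_test_matrix():
    """Enumerate every (target, flood, rate, mitigation) run."""
    return [
        {"target": t, "flood": f, "rate_pps": r, "mitigation": m}
        for t, f, r, m in product(TARGETS, FLOODS, RATES_PPS, MITIGATION)
    ]


def inter_packet_gap_s(rate_pps):
    """Spacing between attack packets needed to hit a given rate."""
    return 1.0 / rate_pps


matrix = build_test_matrix()
print(len(matrix))             # 180
print(inter_packet_gap_s(50))  # 0.02
```

Even at the highest rate, an attack packet is sent only every 20 ms, which is far below the volume usually associated with flooding attacks.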
3.3. Mitigation Strategies
The configuration and commissioning of OT networks can incorporate a variety of security strategies to help mitigate failures in specific devices. In this study, we leveraged the built-in security features of the gateway device and introduced additional security components, such as a security gateway that acted as a firewall for both Modbus and DNP3 networks, as well as a software-defined switch to control the physical routing between devices. Each device was evaluated individually to assess its contributions to failure mitigation within the OT network.
For example, the communication gateway includes several security features aimed at mitigating failures. Specifically, we tested an IP whitelist, SYN scan protection, SYN-flood protection, and Denial-of-Service (DoS) prevention. The security gateway functions as a physical firewall between networks. Since both OT networks utilize TCP-based protocols, we implemented two separate firewall filters: one allowing all TCP traffic and another permitting traffic only within a specified port range for current TCP connections.
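The difference between the two firewall filters can be illustrated with a small sketch. The `Packet` type and both predicates are hypothetical and do not reflect the security gateway's actual rule syntax; the sketch also assumes filtering on the destination port only, using the registered Modbus TCP port 502 as the permitted range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    protocol: str  # e.g. "TCP" or "UDP"
    dst_port: int


def allow_all_tcp(pkt):
    """Filter 1: permit any TCP packet, regardless of port."""
    return pkt.protocol == "TCP"


def allow_port_range(pkt, low, high):
    """Filter 2: permit TCP only when the destination port is in [low, high]."""
    return pkt.protocol == "TCP" and low <= pkt.dst_port <= high


MODBUS_TCP_PORT = 502  # registered Modbus TCP port

syn_other_port = Packet("TCP", 8080)              # flood aimed at an unrelated port
syn_modbus_port = Packet("TCP", MODBUS_TCP_PORT)  # flood aimed at the permitted port

print(allow_all_tcp(syn_other_port))                  # True
print(allow_port_range(syn_other_port, 502, 502))     # False
print(allow_port_range(syn_modbus_port, 502, 502))    # True
```

A filter that only checks the transport protocol passes every spoofed TCP packet, while the port-range filter blocks only the packets that do not target a permitted port.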
The final security appliance tested was a software-defined switch, which replaces a standard managed or unmanaged switch more commonly found in traditional OT networks. For this, we utilized an OpenFlow-configurable switch designed for OT network configurations, which is capable of enforcing physical routing restrictions. For testing, each switch was configured to allow communication exclusively between the gateway and its respective OT devices on each bus, namely the DNP3 outstation and the PLC.
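Conceptually, the switch configuration amounts to an allow-list of device pairs per bus. The following sketch models that idea only; it is not OpenFlow syntax, and the switch and device names are hypothetical labels chosen for illustration.

```python
# One permitted device pair per switch (names are illustrative labels).
SWITCH_RULES = {
    "modbus_switch": frozenset({"gateway", "plc"}),
    "dnp3_switch": frozenset({"gateway", "dnp3_outstation"}),
}


def is_forwarded(switch, src, dst):
    """Forward a frame only if src and dst form the pair allowed on this switch."""
    return frozenset({src, dst}) == SWITCH_RULES[switch]


print(is_forwarded("modbus_switch", "plc", "gateway"))       # True
print(is_forwarded("modbus_switch", "attacker", "plc"))      # False
print(is_forwarded("dnp3_switch", "plc", "dnp3_outstation")) # False
```

Because the rule is keyed on the pair of endpoints rather than on packet contents, traffic from any other attached device is dropped no matter how it is crafted.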
Defense-in-depth for OT networks is a strategy that layers multiple security measures to protect from cyberattacks against multiple avenues. The methodology is built around the assumption that no single security feature is foolproof and combines safeguards across different levels. In this paper, we consider on-device security features, external firewalls, intrusion detection systems (IDSs), and software-defined networking for security across multiple layers. We consider and describe these in more detail in the following sections.
Our mitigation strategy evaluation emphasizes both technical feasibility and deployment practicality. We tested configurations such as static route enforcement using software-defined networking (SDN), real-time packet filtering via external firewalls, and protocol-specific security features like SYN-flood prevention. These were assessed under variable traffic loads to understand their behavior under stress.
3.3.1. Device Security Features
At the device level, on-device security features such as IP or MAC address filtering limit the traffic the device processes, in addition to runtime integrity checks, to ensure that each component operates as intended and resists tampering. This layer of defense is designed so that even with compromised devices that have allowed routing to a device, attacks may still be thwarted. These protections are the simplest to deploy, as they can be configured when the device is being commissioned for a standard deployment. However, there are limitations to the protections this provides, as the device will not have any information other than the packet data.
3.3.2. External Firewall or IDS
An external firewall aims to block malicious traffic from being sent to vulnerable devices in the first place. While several off-the-shelf firewalls are in use for IT networks, there are also specifically trained IDSs for applications such as OT networks. Recent research in this area has adopted the utilization of machine learning to improve the adaptability of such traffic monitoring methods. Deployment of these protections, however, can increase the cost, time, and effort for commissioning a network due to additional hardware being required for traffic inspection. In the case of IDSs, special training for a machine learning model may greatly increase the complexity of the deployment process. These protections also have a difficult time detecting traffic from an existing device being compromised on a network.
3.3.3. Software-Defined Networking
Additionally, software-defined networking (SDN) enhances network visibility and control by enabling dynamic segmentation and rigid communication routes, which are valuable in complex OT environments where traditional networking lacks flexibility. These add the most complexity among all of the mitigation strategies we investigated, as complex networks have a large number of connections, which would each have to be configured for specific traffic routes.
Together, these layered controls create a resilient defense framework that significantly reduces the risk of compromise in OT systems.
4. Results
For the initial testing of the exponential backoff behavior, the server was disconnected from the testbed. The backoff, as shown in Figure 4, increased until the server was reconnected; in this case, the last delay was 42 s. However, as we show further below, this could be compounded by malicious traffic that can prolong the reconnection phase.
4.1. Testbed Effects of Malicious Traffic
The focus of the testbed experiments was to validate our protocol review findings regarding the Modbus protocol’s safety implications and also to illustrate the issues with connection re-establishment in Modbus TCP implementations. Towards that goal, we first conducted baseline performance measurements focusing on characterizing latency with and without security features, comparing the use of external security gateways and a software-defined switch. UDP broadcast packets were generated by the attacking agent, emulating additional background traffic from an active network. While each packet was routed to the interfaces of the OT devices, the packets had no data intended for the devices and were expected to be discarded during normal operation. The results, shown in Table 1, demonstrate minimal impact from the additional security features. This is likely due to the low link requirements of OT devices (operating at 100 Mbps) and the efficient processing capabilities of both the security gateway firewall and the software-defined switch.
Since the TCP connection time was the focus of our tests, and the security appliances added limited latency to the communication, the attacking agent was subsequently utilized to imitate new connection requests using SYN messages to the server side, and reset (RST) requests to the client, as shown in Figure 5.
This attack mimics a SYN-flood attack that might be carried out on other TCP connections. However, by changing the IP and MAC addresses listed within the packet, we can successfully trick the gateway into sending the reset packet to the PLC. This, in effect, tells the PLC (client) that the connection has been closed and a new one needs to be initiated. The effect of this can be seen in Figure 6. If these attacks are applied consistently over a long period of time, the communication can effectively be stopped between the two devices, requiring minimal effort by the attacker.
However, many devices include built-in mitigation strategies for this case. An example is the SYN-flood protection available within the gateway’s configuration. This protection, however, seems to only come into effect once 50 or more packets are received per second, which we tested in increments of 5, as shown in Figure 7. This figure shows that even with SYN-flood mitigation attempts that act to prevent Denial-of-Service, access to the network makes communication disruption possible and trivial.
A further interesting observation from this behavior was that the PLC failed to correctly filter the packet for the correct port number. To have the intended effect, the attack packet needed to be timed to arrive while the PLC was awaiting a Modbus response. Any packets received outside of this window were handled correctly. Additional testing revealed that the PLC did not correctly filter these packets based on port numbers, sequences, or acknowledgment numbers. Thus, subsequent connection attempts could also be reset by the same reset packet. However, these packets required the sequence and acknowledgment numbers to increment to avoid being processed as retransmissions.
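For contrast, the sketch below shows the kind of sequence-number check that RFC 5961 recommends before a TCP endpoint honors a reset. It is not the PLC's implementation, which we did not have access to. It assumes the segment has already been matched to a connection by its address and port 4-tuple, and the function names are illustrative.

```python
MOD = 2 ** 32  # TCP sequence numbers wrap at 32 bits


def in_window(seq, rcv_nxt, rcv_wnd):
    """True when seq falls in [rcv_nxt, rcv_nxt + rcv_wnd), modulo 2**32."""
    return (seq - rcv_nxt) % MOD < rcv_wnd


def classify_rst(seq, rcv_nxt, rcv_wnd):
    """Decide how a receiver following RFC 5961 treats an incoming RST."""
    if seq == rcv_nxt:
        return "reset"          # exact match: tear the connection down
    if in_window(seq, rcv_nxt, rcv_wnd):
        return "challenge_ack"  # plausible but not exact: ask the peer to confirm
    return "drop"               # outside the window: ignore silently


print(classify_rst(1000, 1000, 4096))  # reset
print(classify_rst(1500, 1000, 4096))  # challenge_ack
print(classify_rst(999, 1000, 4096))   # drop
```

A receiver applying this check would discard a spoofed reset unless the attacker knew the exact next expected sequence number, which is considerably harder than the timing-only requirement observed on the PLC.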
As previously mentioned, recovery from the SYN-flood attack was significantly complicated by Modbus TCP’s backoff mechanism. When the connection was abruptly terminated, the PLC would increase the time between reconnection attempts. As a result, prolonged attack traffic caused longer outages. During testing, the communication link took anywhere from 20 s to 5 min to re-establish, depending on the length of the SYN-flood attack.
Furthermore, we observed adverse effects on the gateway’s response time, even though the Modbus TCP communication link remained functional. As shown in Figure 8, PSH+ACK packets occasionally triggered large latency spikes of up to 100 ms. These spikes were unpredictable, which could lead to issues in systems requiring low latency. Interestingly, when these spikes were absent, the gateway responded consistently within 3 ms without having to retransmit packets during the study.
Several optional security features were enabled on both the protocol and security gateways to gauge their effectiveness in preventing the aforementioned attacks. These features included SYN-flood protection, SYN-scan protection, and DoS protection on the protocol gateway. The security gateway acted as a physical firewall and was tested by limiting traffic to TCP and restricting specific port ranges. Finally, the software-defined switch was reintroduced with static route declarations to ensure that only the PLC and protocol gateway could communicate over Modbus TCP. A second switch was added to the DNP3 network to ensure that only the DNP3 outstation and protocol gateway could communicate with each other.
The results of these tests are summarized in Table 2. From these results, it is evident that the software-defined switch effectively mitigates external attacks, as any attacks would require physical alteration of a device or access to the switch configuration. However, several security features aimed at addressing the vulnerabilities identified in the tests failed to prevent communication loss. The protocol gateway’s SYN-flood protection, for instance, only triggered when the rate of SYN packets exceeded 50 packets per second. Therefore, attacks with fewer than 50 packets per second were capable of disrupting Modbus TCP communication without triggering the protection. The DoS prevention mechanism encountered similar limitations, as the attack traffic did not meet the threshold required to activate packet blocking.
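The limitation of a rate-based trigger can be reproduced with a simple sliding-window counter. The sketch below is an assumed model, not the gateway's actual algorithm: the one-second window, the `SynRateGuard` class, and the exact comparison at the threshold are all illustrative choices.

```python
from collections import deque


class SynRateGuard:
    """Flags SYN traffic once the count in a sliding window reaches a threshold."""

    def __init__(self, threshold_pps=50, window_s=1.0):
        self.threshold_pps = threshold_pps
        self.window_s = window_s
        self._arrivals = deque()

    def observe(self, timestamp_s):
        """Record one SYN; return True if protection would trigger."""
        self._arrivals.append(timestamp_s)
        # Discard arrivals that have fallen out of the window.
        while timestamp_s - self._arrivals[0] >= self.window_s:
            self._arrivals.popleft()
        return len(self._arrivals) >= self.threshold_pps * self.window_s


def flood_triggers(rate_pps, duration_s=3.0):
    """Replay an evenly spaced SYN flood and report whether the guard ever fires."""
    guard = SynRateGuard()
    packets = int(rate_pps * duration_s)
    return any(guard.observe(i / rate_pps) for i in range(packets))


print(flood_triggers(45))  # False
print(flood_triggers(60))  # True
```

In this model, a flood at 45 packets per second never trips the guard, yet a single well-timed spoofed packet is enough to reset the connection, so a volume threshold alone does not address this attack.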
The security gateway had more success mitigating SYN-flood and RST attacks by filtering traffic to specific ports. However, standard firewall features that restricted traffic to TCP alone failed to prevent SYN attacks, and port filtering only worked if the SYN-flood did not duplicate the restricted ports.
4.2. Limitations
This work focuses on two specific OT protocols: Modbus TCP and DNP3. These are two very widely deployed protocols that are representative of different protocol generations: whereas Modbus TCP reflects legacy constraints, DNP3 offers a more robust and modern behavior. We chose to focus on these two protocols because they share commonalities, but also provide a good contrast of approaches, resulting in divergent behavior despite sharing a common transport mechanism. Other OT protocols, like IEC-61850, PROFINET, and EtherNet/IP, each have distinct architectures, timing models, and fault-handling mechanisms. Since the focus of this study was on two protocols that rely on existing IT protocol stacks, it may not be fully generalizable to all OT communication stacks. Additional protocols will be the future subjects of study in our ongoing efforts into OT protocol cybersecurity.
The multi-tiered testbed developed for this study included both physical hardware and emulated components, which provided a flexible and cost-effective evaluation platform. However, real-world industrial deployments often involve heterogeneous device configurations, aging infrastructure, and environmental challenges such as electromagnetic interference or extreme operating conditions. It is impractical to fully incorporate these factors into cybersecurity studies, as doing so would require testing in actual production environments, which in the energy sector is highly dangerous and could severely impact energy grid operations. Thus, our studies are all conducted in a lab environment under controlled conditions that mimic real-world conditions as closely as possible. As a result, our tests may not reflect edge cases such as cross-vendor integration complexities or equipment aging effects, but we posit that our testbed is nevertheless highly representative and provides valuable insights into OT cybersecurity considerations.
5. Results and Discussion
Table 3 outlines the contribution of our work compared to other studies within this field. As shown in this table, other studies do not perform a comparative analysis of OT-specific security devices and their individual efficacy against common attack vectors. Because the devices we evaluated are off-the-shelf solutions, their measured performance is representative of what can be expected in real-world deployments.
These real-world deployments often include a mix of vendor-specific devices, legacy firmware versions, and environmental constraints such as EMI or thermal variation. While our controlled testbed cannot replicate these factors directly, our previous works have shown that these tests reflect realistic operational topologies and threat models.
Furthermore, it is not feasible for our evaluations to study long-term effects, such as device aging, on the performance metrics we are evaluating here. Nor would we expect OT devices to change their networking behavior due to aging, but rather as a consequence of firmware upgrades or topology changes. Thus, given the intractably large variability of these parameters within real-world production deployments, we consider such an evaluation to be out of scope. Also, while it would be advantageous to conduct our experiments in live production environments, such as actual substations, this is neither safe nor allowed in most such environments. Hence, all of our experiments were conducted in environments that closely mirror such deployments, with devices commonly found in use in such environments, and with configurations that are typically employed with them. We did our utmost to replicate in our controlled lab environment the networking conditions we would encounter in production environments. Thus, we posit that our test results closely mirror the results we would obtain from such environments.
For these reasons, we expect our results, insights, and mitigation strategies to be applicable to actual production environments. They can be applied through firmware modifications as part of regular maintenance efforts. Our future work will test this approach in more detail through our collaborations with OT device manufacturers. We will explore remedial actions in our lab testing environment using actual OT devices and corresponding firmware revisions. Our future work will also incorporate additional high-fidelity production environment replicas through hardware-in-the-loop and emulated traffic traces.
The results of our protocol review and subsequent testbed validation experiments highlight significant disparities in the resilience and responsiveness of Modbus TCP and DNP3 when subjected to communication failures. One of the primary issues identified was the impact of Modbus TCP’s reliance on a single-ended communication model, which delegates all responsibility for connection re-establishment to the client. This architecture, inherited from Modbus’ serial communication roots, lacks the robustness required for time-sensitive operations within modern OT environments, as our experimental results clearly illustrate. We also showed that the exponential backoff mechanism, while a standard feature of TCP/IP to prevent congestion, introduces unacceptable delays in scenarios where prompt data transmission is critical, such as when OT protocols are used in safety-critical environments. Our tests revealed that connection recovery delays can approach or exceed 40 s, which could be catastrophic in industrial contexts requiring real-time actuation or monitoring. For example, excessive latency can cause improper operation of assembly lines [30], resulting in defective products, or damage to smart grid components when cascading failures are not curtailed in time [31].
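To illustrate how exponential backoff translates into recovery delays of tens of seconds, the following sketch computes a retry schedule in which the timeout doubles after every failed attempt. The initial timeout of 1 s, the 64 s cap, and the number of attempts are illustrative assumptions, not parameters measured on the tested devices; actual values depend on the TCP stack and its configuration.

```python
def retry_times(initial_timeout_s=1.0, max_timeout_s=64.0, attempts=8):
    """Return the times (s after the first failed attempt at t = 0) at which retries are sent."""
    times = []
    elapsed = 0.0
    timeout = initial_timeout_s
    for _ in range(attempts):
        elapsed += timeout
        times.append(elapsed)
        # Double the timeout after each failed attempt, up to the cap.
        timeout = min(timeout * 2, max_timeout_s)
    return times


def recovery_delay(link_restored_at_s, **schedule):
    """Return how long the client remains disconnected after the link is restored.

    Returns None if the link comes back only after the last scheduled retry.
    """
    for t in retry_times(**schedule):
        if t >= link_restored_at_s:
            return t - link_restored_at_s
    return None


if __name__ == "__main__":
    print(retry_times()[:6])     # [1.0, 3.0, 7.0, 15.0, 31.0, 63.0]
    print(recovery_delay(32.0))  # 31.0
```

Under these assumptions, a link that is restored 32 s after the first failure stays unused for a further 31 s, because the client's next retry is not scheduled until 63 s. The longer the outage, the longer the additional wait, which is the behavior that makes single-ended recovery problematic for time-sensitive traffic.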
In contrast, DNP3’s more symmetrical approach to connection management, where both the master and outstation may initiate communication, provides enhanced resilience. The specification’s support for reconnection upon data availability ensures quicker recovery from network disruptions, particularly in distributed or resource-constrained environments. These protocol-level behaviors underscore the importance of selecting communication standards based not just on compatibility or performance, but also on fault tolerance and the expected operational environment.
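The benefit of allowing either endpoint to initiate reconnection can be sketched with a simple timing model. This is a hypothetical abstraction rather than an implementation of either protocol: the client is assumed to retry on a fixed schedule, the outstation is assumed to attempt a reconnection whenever new data becomes available, and all times are illustrative.

```python
def single_ended_recovery(link_restored_at_s, client_retry_times_s):
    """Recovery time when only the client may re-establish the connection."""
    return next((t for t in client_retry_times_s if t >= link_restored_at_s), None)


def dual_ended_recovery(link_restored_at_s, client_retry_times_s, outstation_data_times_s):
    """Recovery time when either endpoint may re-establish the connection."""
    attempts = list(client_retry_times_s) + list(outstation_data_times_s)
    # The connection recovers at the first attempt, by either side, after the link is back.
    return min((t for t in attempts if t >= link_restored_at_s), default=None)


if __name__ == "__main__":
    client_retries = [1, 3, 7, 15, 31, 63]     # assumed backoff schedule (s)
    outstation_data = [10, 20, 30, 40, 50]     # assumed times at which new data arrives (s)
    print(single_ended_recovery(32, client_retries))                 # 63
    print(dual_ended_recovery(32, client_retries, outstation_data))  # 40
```

In this model, the dual-ended case can never recover later than the single-ended case, since the set of reconnection opportunities is a superset of the client's retries.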
This raises an important consideration for OT system operators: the tangible benefits of upgrading legacy systems to modern, networked architectures. One key advantage is improved communication performance: not only higher speed, but also reduced latency, increased reliability, and lower message loss rates. These enhancements benefit not just sensors, but also actuators, PLCs, and other critical components that rely on timely and accurate data exchange. Upgrading also enables broader protocol support, improves interoperability across devices, and helps phase out outdated or insecure technologies. While such transitions can introduce new security challenges, they also provide the foundation for greater system visibility, future-proofing, and more robust operational control.
Our testbed simulations further reinforced these findings. The integration of Modbus TCP with DNP3 via a gateway offered a practical bridge between legacy and modern OT systems but also introduced new potential points of failure. In particular, gateway devices, if misconfigured or targeted by malicious traffic, may become bottlenecks or vulnerabilities in the network. However, our implementation of defense-in-depth mitigations, including a dedicated security gateway and software-defined switching, proved to be effective. Although these tools introduce minor additional latency (as shown in Table 1), they can significantly enhance the security posture of OT networks at a negligible performance cost.
Importantly, these mitigations help address a broader concern observed in related works: the lack of intrinsic security in many legacy OT protocols. Unlike newer industrial communication standards, neither Modbus TCP nor DNP3 was originally designed with cybersecurity threats in mind. While DNP3 has evolved to include secure authentication, the default behavior of many deployed systems still reflects legacy assumptions. Therefore, supplementary protection mechanisms, such as whitelisting and SYN-flood prevention, remain essential components in securing these networks.
Our results also align with findings from previous studies. Similar to other studies that identified the performance degradation of Modbus in high-BER environments [19], our observations show that the protocol’s resilience is further reduced under real-world failure conditions. Moreover, the minimal additional latency introduced by modern OT security appliances supports the position advocated by [17] that security enhancements can be applied without sacrificing system performance.
Future Work
All experiments were performed on a lab-scale network, involving a limited number of devices, links, and communication paths. While this scale is sufficient to evaluate protocol behavior and demonstrate specific vulnerabilities, larger-scale systems, such as smart grids, may exhibit slightly different behavior due to load, event rates, and other system stress factors. For example, issues like latency amplification, network-wide retransmission storms, or cascading connection failures in distributed systems were beyond the scope of this work. These scalability dynamics merit further study using high-fidelity, distributed testbeds. The attack models explored were also executed under controlled conditions, focusing on the specific vulnerabilities exposed by the TCP exponential backoff behavior. Our future work will widen the scope and incorporate a broader threat model with both passive and active attack vectors.
One important additional direction for future work is a more detailed examination of the performance–security trade-offs associated with deploying layered defense mechanisms in time-sensitive OT networks. While our study observed and reported only minor latency increases from measures such as SYN-flood protection and port filtering (as shown in Table 1), it did not evaluate whether these delays are acceptable in applications with strict timing constraints, such as the sub-10 ms control loops found in power grid protection, motion control, and high-speed manufacturing systems. Future experiments will focus on measuring end-to-end system responsiveness under these conditions, using hardware-in-the-loop (HIL) setups and real-time process control benchmarks to quantify any disruption to deterministic performance.
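The kind of check such an evaluation would perform can be sketched as a simple latency budget. All numbers below are hypothetical placeholders and are not taken from Table 1; the function name and the additive latency model are likewise assumptions made for illustration.

```python
def fits_deadline(baseline_latency_ms, added_latencies_ms, deadline_ms):
    """Return (total latency, whether it meets the deadline) for a set of security layers."""
    total = baseline_latency_ms + sum(added_latencies_ms.values())
    return total, total <= deadline_ms


if __name__ == "__main__":
    # Hypothetical per-layer latency contributions in milliseconds.
    security_layers = {"syn_flood_protection": 1.5, "port_filtering": 0.5}
    print(fits_deadline(6.0, security_layers, deadline_ms=10.0))  # (8.0, True)
    print(fits_deadline(9.0, security_layers, deadline_ms=10.0))  # (11.0, False)
```

The second case shows that an overhead that is minor in absolute terms can still violate a deadline when the baseline path already consumes most of the budget, which is why the acceptability of these delays has to be assessed per application.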
6. Conclusions
In this study, we examined the behavioral differences and vulnerabilities inherent in two widely used OT protocols, Modbus TCP and DNP3, within the context of communication failure scenarios and security threats. Driven by our protocol specification review, we identified reconnections as a critical consideration and differentiator between these two protocols, especially for safety-critical OT applications. Through a multi-tiered testbed, we then validated our finding that Modbus TCP’s dependence on single-ended recovery mechanisms and TCP-level exponential backoff creates significant latency issues in time-sensitive applications. Conversely, DNP3’s dual-role communication model provides more resilient recovery behavior in the face of disruptions.
The results emphasize the importance of protocol-aware design in OT system architectures, particularly when integrating legacy systems into modern networked infrastructures. Furthermore, we showed that defense-in-depth strategies, such as the deployment of security gateways and software-defined switches, can effectively mitigate common vulnerabilities without imposing considerable overhead.
As the convergence between IT and OT continues, and as Industrial IoT deployments grow in scale and complexity, future work should focus on developing adaptive protocol frameworks that combine the backward compatibility of legacy standards with the fault tolerance, scalability, and security required in today’s industrial environments. Additionally, broader adoption of protocol-aware intrusion detection systems and protocol-hardening techniques will be essential in safeguarding critical infrastructure from evolving threats.