1. Introduction
In vehicle communication systems, research on fault tolerance in large-capacity communication networks is growing rapidly, driven by the increasing volume of vehicle data and the bandwidth that its transmission demands. Although various studies on large-capacity data transmission are underway [1], such research should also consider environments with gigabit-class or higher bandwidth. Meanwhile, modern vehicles are increasingly equipped with systems such as brake and stability control, lane departure warning systems (LDWS), in-vehicle infotainment (IVI) systems offering various convenience features via electronic control units (ECUs), and safety enhancement functions such as parking assistance systems (PAS) [2]. The volume of data required by communication services such as remote vehicle diagnosis and emergency rescue is also increasing. As a result, many companies are turning to synchronous Ethernet-based networks, which are applied to the in-vehicle network (hereinafter IVN) as one of its most important standards [2]. The global automotive industry is actively working to standardize interfaces through automotive software platforms (e.g., AUTOSAR [3], GINI-V [4]) and synchronous Ethernet, starting from open systems for automotive electronics and aiming at a common standard for a collaborative research environment [5].
In particular, applications such as image-based obstacle recognition and automotive black box processing require large amounts of information about the vehicle’s surroundings, which calls for a standardized network backbone. Notably, RAPIEnet [6], which has established itself as an international standard for industrial Ethernet, is widely used in the development and production of vehicle safety systems, including in applications approved as real-time automation protocols for Ethernet [7].
Ethernet has emerged as a critical technology for enabling synchronous communication between electronic control units (ECUs), maintaining a consistent phase relationship between transmission and reception to ensure reliable interactions among multiple devices. With the rapid surge in user demand, the required network bandwidth is increasing at an unprecedented rate [8]. Consequently, existing vehicular communication technologies such as Controller Area Network (CAN) and FlexRay [9] are being re-evaluated worldwide, including through experiments involving novel types of soft errors, in efforts to overcome their limitations in high-capacity data transmission.
Figure 1 shows the in-vehicle communication structure as a hierarchy. It depicts the overall configuration of the communication system inside and outside the vehicle, built on a high-speed backbone network, synchronous Ethernet (Automotive Intelligent Network), at the top. This backbone supports large-capacity data transmission and time-synchronized information exchange between major systems inside the vehicle. At the bottom, the Infotainment Bus and the Control Bus (e.g., CAN, FlexRay) are shown separately. The Infotainment Bus handles user-centered infotainment functions such as car navigation, audio, and media, and the Control Bus reliably handles the real-time data of the control systems (braking, steering, engine, etc.). Once this integrated network architecture has been validated in practice, it can serve as a core enabling technology for implementing intelligent systems in vehicles.
In this paper, we first present a theoretical framework for analyzing a multi-port fault tolerance mechanism. The proposed approach addresses the increasing complexity of in-vehicle networks (IVNs) by enabling robust communication among multiple nodes, even under fault injection scenarios. This analysis serves as a foundational step toward the development of reliable automotive communication protocols. Also, we develop a scalable performance evaluation framework based on a packet-level (i.e., data-level) injection system. This framework supports flexible testing across various data rates and communication scenarios, making it suitable for a broad range of automotive applications. The proposed architecture enables modular scalability and repeatable testing procedures, providing a practical means for validating both the functional reliability and performance efficiency of embedded in-vehicle network systems. Finally, we design and implement a Network Interface Controller (NIC) at the hardware-level using Verilog Hardware Description Language (HDL). The hardware architecture enables protocol-level verification through simulation, ensuring accurate timing behavior and real-time fault recovery. This contribution bridges the gap between theoretical modeling and practical implementation, offering a viable blueprint for the development of fault-tolerant NIC architectures in automotive environments.
Preliminary research suggested that extending a dual-port hardware redundancy system to four or more ports, with the algorithm controlled from the NIC port side, makes the system less stable and mathematically much harder to analyze [11,12,13]. The differences from existing methods and the contributions of this work are summarized as follows.
- (i) We design and implement a fault-tolerant Ethernet-based NIC architecture for in-vehicle networks (IVNs), supporting 1 Gbps throughput and multi-port configurations (≥4 ports), whereas prior works have primarily focused on dual-port redundancy.
- (ii) We propose a novel fault injection and recovery algorithm that integrates CRC-based error detection with real-time correction logic, enabling fast recovery (<1 μs) from injected transmission faults.
- (iii) We provide a comparative evaluation against conventional protocols (e.g., CAN, FlexRay, and commercial NICs such as RAPIEnet and BroadR-Reach), highlighting the proposed system’s superior scalability, deterministic timing, and robustness under fault conditions.
In particular, we evaluate the communication performance of the IVN from a system-wide fault tolerance perspective by utilizing a commercial automotive physical layer solution, the BCM89810, a single monolithic CMOS chip [14,15].
The overall structure of this paper is as follows. Section 2 describes the background and related research. Section 3 presents a theoretical discussion on nodes, ports, and connections of a multi-port NIC card. Section 4 discusses the methodology for verifying the system with a fault-tolerant algorithm. Section 5 presents a network-based performance evaluation by setting up a test environment, and Section 6 presents verification results using different data types for performance evaluation. Section 7 presents performance verification of fault tolerance through hardware design using an actual HDL design method. Finally, Section 8 presents our conclusions.
2. Background and Related Work
Since the early 1960s, engineers have recognized the importance of fault injection in the design of microprocessor-based electronic systems. However, at the time, simulation models were often implemented late in the design process, immediately before final validation, resulting in limited effectiveness in verifying timing-related issues [16]. Early, relatively simple embedded systems could be verified using basic simulations. However, as embedded systems increased in complexity and functionality, the potential impact of deploying defective products grew significantly, leading to increased costs and risks associated with failure. Consequently, it became essential to test system reliability through software simulations before hardware production and board-level integration [17].
Fault injection methodologies are generally classified into three categories: (i) hardware-based approaches, (ii) software-implemented techniques, and (iii) simulation-based methods [18,19]. Simulation-based techniques are typically used in early design stages, whereas hardware fault injection is more commonly applied after manufacturing physical prototypes.
Design engineers frequently utilize high-level modeling techniques such as Petri nets [20], queuing models, and Dependability Parameter Estimation (DPE) to evaluate fault tolerance. However, to achieve precise and meaningful results, it is often necessary to refine these models at a lower level of abstraction [21]. System reliability is increasingly being verified through software-based simulations as a prerequisite for real-world deployment [22].
Although several global companies have already developed and commercialized Ethernet and synchronous Ethernet (SyncE) protocols and PHYs, research and development in the automotive domain remain incomplete. Notably, comprehensive testing methodologies that integrate both hardware and software components are still lacking [23]. This limitation highlights the urgent need for holistic test frameworks that address the unique fault tolerance requirements of vehicular networks.
In this context, the present study proposes a high-level fault-tolerant algorithm and validates its applicability within a synchronous Ethernet-based in-vehicle communication system. Prior research on fault-tolerant communication and SyncE integration is also reviewed to ensure the algorithm’s compatibility and effectiveness under realistic network conditions [23].
Synchronous Ethernet systems are inherently vulnerable to physical-layer failures such as line short circuits, which can severely compromise reliability. To address this, redundant fault-tolerant Ethernet architectures have been proposed, aiming to minimize the probability of communication loss by leveraging partial redundancy [24]. Traditional Ethernet systems transmit packets over a shared medium using a best-effort approach. While this method provides high bandwidth utilization, it does not support deterministic or real-time behavior, which is crucial for automotive applications [25].
Achieving accurate time synchronization is critical in distributed systems that require coordinated data exchange. The IEEE 1588 Precision Time Protocol (PTP) has been proposed as the foundational technology for synchronized communication in such systems [26,27]. Current IEEE standardization efforts on synchronous Ethernet are gaining attention due to their potential to overcome the limitations of asynchronous Ethernet, particularly by enabling deterministic quality-of-service (QoS) guarantees [28].
Recently, as the complexity of in-vehicle communication systems increases, various IEEE-based communication technologies are being introduced. In particular, the IEEE 802.3ch-2020 standard supports high-speed Ethernet communication of 2.5 Gb/s, 5 Gb/s, and 10 Gb/s in a vehicle environment, enabling real-time transmission of high-resolution sensor data and high-speed control signals [27,29].
In addition, the IEEE 802.1AS-2020 standard defines time synchronization for in-vehicle networks, providing precise synchronization at the ≤1 μs level to meet the requirements of real-time control and multimedia applications [30].
Since the introduction of such high-speed communication and precise time synchronization technologies directly affects the capacity and performance of in-vehicle communication networks, it is essential to verify performance considering communication capacity from the design stage. This ensures the stability and reliability of the system and guarantees the scalability and compatibility of future vehicle communication systems. Based on these technological advances, it is important to actively utilize IEEE standards in the design and verification process of vehicle communication systems, which will play a key role in the development of autonomous driving and connected car technologies in the future.
In the automotive domain, manufacturers face numerous challenges in analyzing and controlling internal data flow. During the development of new hardware and software models, system safety must be ensured through rigorous analysis and validation of underlying data structures. In particular, safety-critical applications demand verified system behavior before deployment. Therefore, simplified yet efficient transport protocols capable of handling real-world data within the constraints of embedded controllers must be designed. By incorporating fault tolerance into these communication frameworks, developers can improve reliability while expanding the system’s applicability across a broader range of automotive and industrial use cases [31].
Existing in-vehicle network (IVN) standards, such as CAN and FlexRay, have been widely deployed in automotive control systems due to their reliability and cost efficiency. However, their limited bandwidth (CAN ≤ 1 Mbps, FlexRay ≤ 10 Mbps) cannot accommodate the growing demand for high-throughput data transmission required by modern sensor and infotainment systems. BroadR-Reach (IEEE 802.3bw, 100BASE-T1) addresses wiring complexity by enabling 100 Mbps over a single UTP cable, but it does not natively provide deterministic or fault-tolerant communication. RAPIEnet, originally developed for industrial Ethernet, supports redundancy and Ring topologies with recovery times < 1 ms, making it suitable for safety systems, but its deployment in automotive environments remains limited. Synchronous Ethernet (SyncE) ensures clock-level synchronization but requires additional protocols for error tolerance.
In contrast, the proposed NIC architecture integrates CRC-based error detection, real-time recovery logic (<1 μs), and multi-port scalability (≥4 ports). This design ensures deterministic timing, robust fault tolerance, and scalability to 1 Gbps, surpassing existing IVN technologies. A comparative summary is presented in Table 1.
As shown in Table 1, the proposed NIC outperforms conventional IVN technologies by combining the scalability of Ethernet with built-in fault-tolerant mechanisms. Unlike BroadR-Reach, which prioritizes cost and cabling efficiency, the proposed system achieves both high bandwidth and deterministic timing. Compared to RAPIEnet and SyncE, the proposed NIC provides faster recovery from injected faults (<1 μs) and ensures robust operation under multi-port conditions, making it particularly suitable for safety-critical and real-time automotive applications.
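To make the bandwidth gap concrete, the following sketch computes the ideal serialization time of a single payload at the nominal rates quoted above. It is a back-of-the-envelope illustration only: the payload is assumed to be 65,536 bytes (the largest data size used in the experiments, interpreted here as bytes), and framing, arbitration, and protocol overhead are ignored. The names `NOMINAL_RATE_BPS` and `ideal_transfer_time_us` are hypothetical.

```python
# Nominal peak data rates quoted in the text, in bits per second.
NOMINAL_RATE_BPS = {
    "CAN": 1_000_000,
    "FlexRay": 10_000_000,
    "100BASE-T1": 100_000_000,
    "Proposed NIC (1 Gbps)": 1_000_000_000,
}


def ideal_transfer_time_us(payload_bytes: int, rate_bps: int) -> float:
    """Time in microseconds to serialize the payload, ignoring all overhead."""
    payload_bits = payload_bytes * 8
    return payload_bits * 1_000_000 / rate_bps


if __name__ == "__main__":
    payload = 65_536  # assumed payload size in bytes
    for name, rate in NOMINAL_RATE_BPS.items():
        print(f"{name:>22}: {ideal_transfer_time_us(payload, rate):12.3f} us")
```

Under these assumptions the same payload takes roughly half a second on CAN but about half a millisecond at 1 Gbps, which is the three-orders-of-magnitude gap the comparison relies on.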
3. Theoretical Analysis
In the context of modern high-performance computing systems and data-intensive embedded architectures, the design of scalable and efficient NICs plays a pivotal role in determining the overall communication performance, fault tolerance, and expandability of the network subsystem. To facilitate the theoretical foundation for such NIC architectures, we define a formal model of NIC-port configurations and interconnect structures that serve as the basis for performance analysis and architectural decisions.
Figure 2 illustrates the inter-node communication structure, where each node is modeled as a multi-port module. Using the formal port-to-port mapping $C_{i,j}(k)$, we define a scalable connection model capable of supporting redundancy, bandwidth aggregation, or failover routing, depending on the topology constraints and application context. This architecture enhances fault resilience and improves system availability under link or port failure conditions.
In the proposed multi-port communication model, each node $N_i$ is equipped with $n$ ports, indexed from $i$ to $i + n - 1$. The connection between two nodes $N_i$ and $N_j$ is established via a dedicated port-to-port mapping, denoted as follows:
$$C_{i,j}(k) = P_i(k) \leftrightarrow P_j(k)$$
where $P_i(k)$ and $P_j(k)$ represent the $k$-th port of nodes $N_i$ and $N_j$, respectively, and $C_{i,j}(k)$ denotes the $k$-th connection channel between the two nodes.
The complete connection set $C_{i,j}$ between $N_i$ and $N_j$ can be defined as follows.
Therefore, we can restate the relationship between these two definitions as a mathematical formula. Let each node $N_i$ be equipped with $n$ ports, denoted as $P_i = \{P_{i1}, P_{i2}, \ldots, P_{in}\}$. A directed communication link between port $P_{ia}$ on node $N_i$ and port $P_{jb}$ on node $N_j$ is defined as $C_{i,j}(a,b) = P_{ia} \to P_{jb}$. The complete set of possible connections between the two nodes is, therefore, $C_{i,j} = \{C_{i,j}(a,b) \mid a, b \in [1, n]\}$.
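The set definition above can be enumerated directly. The following is a minimal sketch, assuming ports are labeled by strings of the form `P<node>,<index>`; the helper names `port_set` and `connection_set` are hypothetical and serve only to illustrate that a full mapping between two $n$-port nodes contains $n^2$ directed connections, while a one-to-one mapping contains $n$.

```python
from itertools import product


def port_set(node: int, n_ports: int) -> list[str]:
    """P_i = {P_i1, ..., P_in} as a list of labels."""
    return [f"P{node},{a}" for a in range(1, n_ports + 1)]


def connection_set(i: int, j: int, n_ports: int) -> dict[tuple[int, int], tuple[str, str]]:
    """C_ij = {C_ij(a, b) | a, b in [1, n]}, each entry a directed link P_ia -> P_jb."""
    src = port_set(i, n_ports)
    dst = port_set(j, n_ports)
    return {
        (a, b): (src[a - 1], dst[b - 1])
        for a, b in product(range(1, n_ports + 1), repeat=2)
    }


if __name__ == "__main__":
    full = connection_set(1, 2, n_ports=4)
    # One-to-one subset: port k on node i talks only to port k on node j.
    paired = {key: link for key, link in full.items() if key[0] == key[1]}
    print(len(full), len(paired))
    print(full[(1, 3)])
```

For a quad-port NIC this gives 16 possible directed connections, of which 4 form the port-k-to-port-k mapping.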
This summation is generalizable to NICs with heterogeneous designs, allowing for the inclusion of devices with different bandwidth capacities, signaling standards, or functional roles.
This also raises the need to examine the mathematical model and analysis from the perspective of the multiple ports ($P$) that a single node ($N$) has. Let us therefore revisit the model from a connection ($C$)-oriented interpretation rather than a port-oriented one.
Figure 3 depicts a connection-oriented communication model between two nodes, $N_i$ and $N_j$, each composed of multiple ports. Rather than focusing solely on individual ports, the model highlights the logical connections $C_x \in C_{i,j}$ that represent directed data paths between source and destination ports. This abstraction enables a modular understanding of how data flow across the network layer, especially in systems with multi-port Network Interface Controllers (NICs).
The bold dark-blue arrows represent active and valid one-way transmission paths (e.g., $P_i \to P_j$, $P_i \to P_{j+1}$, or $P_i \to P_n$). These connections are currently engaged in data transmission and are monitored for integrity and consistency. Conversely, the light-blue arrows indicate potential or standby links (e.g., $P_j \to P_i$), which are either inactive or in passive listening mode, serving redundancy, load balancing, or fault recovery purposes.
Each connection $C_x$ is defined formally as follows.
The complete connection set between the two nodes is given by the following:
$$C_{i,j} = \{C_1, C_2, \ldots, C_k\}$$
where $k \le n^2$ depending on whether the network uses full-mesh or selective connections.
The notation C1 ⇒ C4 and C1 ⇒ C2 annotated below the figure suggests the dynamic redirection or logical mapping of connection identifiers based on context or rerouting policies. This highlights the flexibility of the communication layer to adapt connection roles or endpoints under dynamic runtime conditions.
To further describe the port-to-port relationship, consider two arbitrary ports: $P_{ia}$, the $a$-th port on NIC $i$, and $P_{jb}$, the $b$-th port on NIC $j$. These two ports may be physically or logically connected via a communication channel $C_{i,j}(a,b)$, defined as follows.
$$C_{i,j}(a,b) = P_{ia} \to P_{jb}$$
This directional connection indicates that port $a$ on NIC $i$ is transmitting to port $b$ on NIC $j$. In symmetric or full-duplex systems, the reverse link may also be assumed.
The nature of the communication between $P_{ia}$ and $P_{jb}$ is significant in architectural design. For instance, one-to-one deterministic mappings are typically used in latency-critical embedded systems, whereas shared multi-point mappings may be employed in data center fabrics to support flexible routing. The number of such connections, their topological arrangement, and their symmetry properties directly influence routing table size, congestion management, and fault containment strategies.
For a system with $n$ NICs, each with $n$ ports, the full set of pairwise inter-NIC port connections forms a complete bipartite graph in the space.
To analyze the scalability of such NIC interconnection models, particularly in systems where network expansion is anticipated, we consider the asymptotic behavior of inter-NIC connection growth.
This result implies that the marginal increase in interconnection complexity approaches a constant ratio of 1 as n → ∞, which suggests a bounded growth rate of interconnection overhead.
Figure 4 presents a conceptual abstraction of an in-vehicle communication structure. In Figure 4a, a top-down view of the vehicle is shown, indicating the physical layout for control modules. Figure 4b models a logical communication hierarchy in which upper-layer decision-making nodes interface with lower-layer wheel actuators through switches $S_1$ and $S_2$.
Let each connection between a module and switch be represented as follows.
The full set of communication paths in the system is defined as follows:
$$C = \{C_1, C_2, C_3, C_4\}$$
where $C_1$ and $C_2$ represent connections from decision nodes to switch $S_1$, and $C_3$ and $C_4$ represent connections from wheel actuators to switch $S_2$.
For a more detailed analysis, a hierarchical layer interpretation divides the structure into the following three parts.
- (i) Upper Layer (Control Plane): Nodes interfacing through $C_1$ and $C_2$ to switch $S_1$ handle decision making, such as steering logic or velocity planning.
- (ii) Middle Layer (Switching Plane): Switches $S_1$ and $S_2$ form the backbone of communication, enabling data routing and isolation between layers.
- (iii) Lower Layer (Execution Plane): Actuators (wheels) receive control commands through $C_3$ and $C_4$, enabling real-world action such as steering and traction.
In conclusion, by mathematically modeling the connection ($C$)-centric interactions across nodes ($N$) as well as specific ports $P_{ia}$ and $P_{jb}$, designers of such communication systems can simulate communication links, evaluate bandwidth usage, and identify bottlenecks in the overall system. This is especially important when designing multi-port NICs for area-wide automotive networks, multi-host RDMA clusters, or containerized microservice environments, where traffic patterns between nodes or ports are non-uniform and dynamically reconfigured. Such structures are also increasingly used to model real-world automotive Ethernet or ECU communication frameworks based on service-oriented architecture (SoA) [31,32].
4. Methodology
Fault-tolerant high-performance computing via coding approaches has long been studied by many researchers [33,34]. In addition to validation and verification at the whole-system level, some researchers use formal languages to perform formal validation at the chip or gate level [35]. In vehicular communication networks, it is all the more important to check the stability of the system efficiently. In this paper, validation of the entire vehicular communication network, i.e., the IVN, was treated as the priority. To verify the soundness of the system through fault injection results [36], we focused on improving the algorithm [37] that selects an intended cyclic redundancy check (CRC) method to inject faults into the system and check the outcome.
Therefore, verification must be carried out at the system level, and the first question is how to conduct the experiment. As shown in Figure 2 and Figure 3, we must first decide on the initial number of NICs and the data to be transmitted accordingly. The experimental data were divided into integer-type and character-type data for testing purposes, and the number of data items of each type was set to be the same. In the actual experiment, seven data sizes (100, 343, 700, 5000, 10,000, 16,384, and 65,536) were applied to the system simultaneously. Of these, 100, 343, 700, 5000, and 10,000 are empirical values determined by experiment, whereas 16,384 and 65,536 were determined in proportion to image pixel counts, taking a 128 × 128 image (16,384 pixels) as the reference. Here, 128 × 128 denotes the number of image pixels.
As the first step, the port value of each NIC is initialized. The initialization function generates a value between 1 and 4095 and artificially connects a random IP to the point where the error occurs [38].
Next, we explain the process of developing a fault diagnosis and recovery algorithm using a 16-bit cyclic redundancy check (CRC-16). CRC-16 is a widely used error detection algorithm that processes a stream of input bits and computes a fixed-size 16-bit checksum (the remainder of a polynomial division) using a predefined generator polynomial (for example, $x^{16} + x^{15} + x^{2} + 1$; the CCITT variant uses $x^{16} + x^{12} + x^{5} + 1$). The algorithm behaves as follows. CRC computation typically involves, first, the bitwise division of the input message polynomial by a fixed generator polynomial and, second, the use of XOR and shift operations at each step to update a register (often modeled as a Linear Feedback Shift Register, or LFSR). Depending on the implementation, two common variants exist: bitwise CRC, which processes each bit of the input sequentially, and bytewise or table-driven CRC, which processes each byte (or word) using precomputed tables, improving performance via lookup operations.
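The two variants described above can be sketched as follows. This is an illustrative implementation, not the code used in the experiments: it assumes the generator $x^{16} + x^{15} + x^{2} + 1$ (0x8005) in its common reflected form 0xA001 with a zero initial register, a parameter set usually known as CRC-16/ARC. The function names are hypothetical.

```python
# Reflected (bit-reversed) form of the generator x^16 + x^15 + x^2 + 1 (0x8005).
POLY_REFLECTED = 0xA001


def crc16_bitwise(data: bytes, crc: int = 0x0000) -> int:
    """Bitwise variant: one shift/XOR step per input bit."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ POLY_REFLECTED
            else:
                crc >>= 1
    return crc


def _build_table() -> list[int]:
    """Precompute the CRC of every possible byte value (256 entries)."""
    table = []
    for value in range(256):
        crc = value
        for _ in range(8):
            crc = (crc >> 1) ^ POLY_REFLECTED if crc & 1 else crc >> 1
        table.append(crc)
    return table


CRC_TABLE = _build_table()


def crc16_table(data: bytes, crc: int = 0x0000) -> int:
    """Table-driven variant: one lookup and one XOR per input byte."""
    for byte in data:
        crc = (crc >> 8) ^ CRC_TABLE[(crc ^ byte) & 0xFF]
    return crc


if __name__ == "__main__":
    message = b"123456789"
    print(hex(crc16_bitwise(message)), hex(crc16_table(message)))  # both 0xbb3d
```

Both functions return the same value for any input; the table-driven version trades 512 bytes of precomputed state for eight times fewer loop iterations.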
We present a simulation test that exercises five types of errors using QPFT between NIC 1 and NIC 2: injection fault, data loss, injection noise, access fail, and wrong data, each of which is shown visually. The following algorithm simulation screen shows how the system detects these errors and automatically recovers from them.
Assuming that one file contains 5000 data items, we configure the test so that any data item, regardless of its type, can cause a problem in the NIC. This is the stage at which the fundamental fault-based validation approach, which injects an intended fault, begins. The NIC then sets its error state to 0, and during initial loading the fault injection is rejected, based on the research hypothesis that no data are present yet. At this point, we check the health of the entire system through CRC-16, and the first data set produced overall may not exhibit a problem. Each NIC is designed to be connected as shown in Figure 2. $N_0$ and $N_1$ are responsible for transmission to Port 2, and Port 3 is responsible for reception. When data are received from NICs 1, 2, and 3 (the received data being 0 and 1) and the copies held on Port 0 and Port 1 differ, the CRC codes of Port 0 and Port 1 on NIC_0 and NIC_1 are compared, port number by port number, to determine the correct value of the data before the NIC is judged normal. Also, as shown in Figure 3, when the current network structure is designed as a Ring-type topology, the last NIC wraps back to the first NIC, and the NIC information is transmitted.
The current program can always transmit packet data of CRC-18 bytes with CRC-16 or higher, and the padding length can perform the operation. The integer-type data and character-type data used in the experiment have the same integer value as the length of the data generated in the experiment, which is similar to the actual data length of all generated data. If the lengths of the two data types differ, the packet payload size may vary, so in the widely used automotive CAN, FlexRay, and UDP environments, a 20-byte packet size allows the padding function to be implemented as the experiment progresses [39,40].
The time complexity is analyzed as follows. Let $n$ be the total number of bits in the input packet. The complexity of CRC-16 computation can be analyzed for two common implementation approaches: bitwise and bytewise (or table-driven). In the bitwise CRC-16 method, each bit of the input data undergoes a fixed sequence of operations, primarily shifts and XORs. Since each bit is processed individually in constant time, the total number of operations is proportional to the number of bits $n$, resulting in a time complexity of $O(n)$. In contrast, the bytewise or table-driven CRC-16 implementation utilizes precomputed lookup tables to process data one byte at a time. Although only $n/8$ bytes are processed, each byte still requires a constant-time lookup and XOR operation. Therefore, the total operation count is $O(n/8)$, which simplifies to $O(n)$ in terms of asymptotic complexity. Thus, regardless of the implementation method, the CRC-16 algorithm exhibits linear time complexity relative to the input size, denoted as $O(n)$.
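The argument can be restated as a short worked bound. The constants $c_b$ (cost of one bit step) and $c_t$ (cost of one table lookup plus XOR) are hypothetical symbols introduced only for this illustration.

```latex
\[
T_{\text{bit}}(n) = c_b\, n,
\qquad
T_{\text{table}}(n) = c_t \left\lceil \frac{n}{8} \right\rceil
\le c_t \left( \frac{n}{8} + 1 \right)
\quad\Longrightarrow\quad
T_{\text{bit}}(n),\; T_{\text{table}}(n) \in O(n).
\]
```

For the 20-byte packet mentioned above, $n = 160$ bits, so the bitwise variant performs 160 shift/XOR steps while the table-driven variant performs 20 lookups: a constant-factor saving of 8 that leaves the asymptotic order unchanged.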
The algorithm of the existing study [41] used two ports with a dual-port NIC [42]. In the new type of board, however, multiple ports (at least four) are used to build the communication environment for data exchange. In addition to handling errors that occur on their own, we aim to implement a redundant switch to reduce the occurrence of errors. Partial or full circuit redundancy targets one of the main causes of failure and is valuable because it reduces the probability of unstable operation [43].
The newly proposed quad-port NIC has the advantage of high transmission speed on at least two external ports. This overcomes the switching redundancy that can occur in the existing dual-port NIC itself and secures data transmission and system efficiency. Considering the importance of in-vehicle communication (IVN) as well as the redundancy of external communication (OVN), the quad-port NIC design is essential for more stable data transmission. The currently implemented fault-tolerant program is structured as a Mesh topology. The proposed pseudo-code, which builds on the existing two-port implementation shown at the top of the NIC system flow diagram in Figure 4, is implemented for a quad-port (four-port) NIC.
This pseudo-code shows the multi-port NIC communication process, including IP acquisition, data exchange per port, and integrity verification through CRC-16. When a mismatch is detected, an error notification containing the entire data block is broadcast, and all listeners respond via a predefined port, enabling a synchronized recovery or acknowledgement response. This mechanism supports reliable and fault-tolerant communication in embedded or distributed network systems.
The proposed pseudo-code (Algorithm 1) models a fault-tolerant communication scenario between two NICs, each equipped with four ports. Unlike previous dual-port approaches, the proposed quad-port design ensures fast deterministic timing and superior fault tolerance. Communication is established between corresponding ports of NIC $i$ and NIC $j$ (i.e., NIC $i$.port $k$ ↔ NIC $j$.port $k$), enabling parallel and isolated data exchange. A fault is intentionally injected into a specific port at a predefined iteration to evaluate the system’s robustness. CRC16 (CCITT) is used to verify the integrity of transmitted and received data on each port, and if a mismatch is detected, a recovery mechanism is triggered to restore the corrupted data.
For the interpretation related to the CRC, or linear fault detection code, consider Equations (1) to (3). Equation (1) represents the process of calculating the CRC value $C_i$ of the current iteration by applying the generator polynomial $G_{crc}$ to the given message $M_i$. Equations (2) and (3) are used as conditions for verifying the integrity of data by comparing the CRC values between consecutive iterations. That is, if $C_i \neq C_{i-1}$, a data error may have occurred, and if $C_i = C_{i-1}$, the data have been maintained normally.
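Read from the description above, the three relations can be sketched as follows. The notation is an assumption: the function name $\mathrm{CRC}(\cdot)$ and the explicit remainder form with a 16-bit shift are the textbook definition of a pure CRC (without initial value or final XOR) and may differ from the exact form used for Equations (1) to (3).

```latex
\begin{align}
C_i &= \mathrm{CRC}(M_i, G_{crc}) = \bigl( M_i(x)\, x^{16} \bigr) \bmod G_{crc}(x) \tag{1} \\
C_i &\neq C_{i-1} \;\Longrightarrow\; \text{a data error may have occurred} \tag{2} \\
C_i &= C_{i-1} \;\Longrightarrow\; \text{the data have been maintained normally} \tag{3}
\end{align}
```

Conditions (2) and (3) are meaningful only when the same message block is expected in consecutive iterations, as in the repeated-transmission test described here.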
Algorithm 1. Fault-tolerant multi-port NIC communication process.
Step 1. Initialize communication ports and assign IP addresses.
Step 2. Establish port-to-port data exchange across multi-port NICs.
Step 3. Perform CRC-16 verification for transmitted data blocks.
Step 4. If mismatch detected, broadcast error notification.
Step 5. Execute synchronized data recovery procedure using redundant or mirrored blocks.
Step 6. Confirm restored CRC value and resume normal communication.
The proposed algorithm enhances data recovery by simplifying CRC-based validation through the addition of a function that assigns and tracks a unique identifier for each NIC port [44]. This approach enables efficient data verification by associating CRC codes with specific ports. As shown in Equation (1) [45], CRC calculation is fundamentally a division operation. Each message block is encoded into an $n$-bit codeword by dividing a predefined bit string, represented as a polynomial, by a generator polynomial. The resulting remainder becomes the CRC value. This process is efficiently implemented using bitwise shift operations and modular polynomial arithmetic.
The algorithm employs the conventional CCITT CRC-16 library, which computes CRC values based on the unique identifiers assigned to each port [46]. By streamlining CRC computation and tracking, the method supports fast and accurate data integrity checks. In hardware implementations, such as those using Verilog, the CRC-16 algorithm is typically realized using a Linear Feedback Shift Register (LFSR) that updates in parallel with each clock cycle. This allows one byte of input data to be processed per clock cycle, enabling pipelined execution. While the overall latency of the CRC-16 operation remains $O(n)$, where $n$ is the number of input bits, the throughput can be significantly improved through pipelining or parallelization techniques. The CRC-16 computation logic thus maintains linear time complexity $O(n)$ with respect to the input packet size. This computational efficiency, combined with its minimal hardware resource requirements, makes CRC-16 particularly well suited for real-time and embedded systems, such as those found in automotive communication environments.
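The byte-per-cycle behavior of a parallel LFSR can be mimicked in software with a table-driven update, in which each loop iteration consumes one input byte, so the work grows linearly with the packet size. The sketch below is an illustrative software analogue rather than the hardware design; the names and the 0xFFFF seed are assumptions.

```python
POLY = 0x1021  # CCITT generator polynomial x^16 + x^12 + x^5 + 1


def _build_table():
    """Precompute the effect of shifting one full byte through the register."""
    table = []
    for byte in range(256):
        reg = byte << 8
        for _ in range(8):
            if reg & 0x8000:
                reg = ((reg << 1) ^ POLY) & 0xFFFF
            else:
                reg = (reg << 1) & 0xFFFF
        table.append(reg)
    return table


CRC_TABLE = _build_table()


def crc16_bytewise(data: bytes, init: int = 0xFFFF) -> int:
    """One table lookup per input byte: len(data) steps in total."""
    crc = init
    for byte in data:
        crc = ((crc << 8) & 0xFFFF) ^ CRC_TABLE[(crc >> 8) ^ byte]
    return crc
```

The lookup table plays the role of the combinational XOR network that a hardware LFSR would apply in a single clock cycle.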
Moreover, the proposed algorithm addresses error detection and data loss by injecting and comparing data to identify corruption in stored information. Unlike previous approaches, which relied on retrieving and comparing stored CRC values after an error occurred [41], this method eliminates the inefficiencies of historical comparison. Each NIC port and its associated memory perform real-time CRC checks, allowing for the immediate detection of transmission errors without incurring additional computational overhead.
6. Quantitative Evaluation of Transmission Efficiency Based on Data Type Representation
We evaluated the usability of the proposed algorithm as follows. First, we controlled data transmission to verify the behavior of the integer and text data used. The data length may vary depending on the situation; assuming that the code and control data keep generating different data while the actual program is running, the integer range is 0 to 255.
Similarly, assuming that the message data length and content processing speed vary over time, the srand() function is used to seed the generation of random data. Since the generated data are plain character data representing ASCII code values, new values are generated from successive lowercase letters so that the readability of the data can be checked.
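The test data described above could be generated along the following lines. This is a minimal sketch under stated assumptions: a seeded `random.Random` instance stands in for the srand()-seeded generator, integer data are taken to be values in 0 to 255 packed one per byte, and text data are drawn from lowercase ASCII letters. The function names, the seed, and the payload length of 18 bytes used in the example are illustrative choices.

```python
import random
import string


def make_integer_payload(rng: random.Random, length: int) -> bytes:
    """Random integer data: `length` values, each in 0..255 (one byte each)."""
    return bytes(rng.randrange(256) for _ in range(length))


def make_text_payload(rng: random.Random, length: int) -> bytes:
    """Random lowercase ASCII text, kept human-readable for inspection."""
    letters = (rng.choice(string.ascii_lowercase) for _ in range(length))
    return "".join(letters).encode("ascii")


rng = random.Random(2024)            # fixed seed, playing the role of srand()
int_payload = make_integer_payload(rng, 18)
text_payload = make_text_payload(rng, 18)
```

Fixing the seed makes a run repeatable, which is convenient when a fault has to be injected at a predefined iteration and the same data must be regenerated for comparison.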
The content of the data is randomly generated, but the length of the payload data to be stored is optimized so that it can be implemented as a value of 0 in all CAN, FlexRay, and UDP environments. Thus, data are added to 18 bytes or 20 bytes of CRC-16 or higher to perform the operation [13].
The current program does not actually perform network communication, so execution in a real communication environment will take longer than the times reported here. The delay incurred during the network connection process was about 3 μs, and this was modeled by adding a numerical offset for each pass through the NIC switch [13]. In addition, when the experiment was conducted using three NICs, two transmission intervals occurred, resulting in a total delay time of 6 μs. Generalizing to an arbitrary number of NIC switches with a one-way transmission delay, the result is given by Equation (11).
To quantitatively represent the relationship between the number of NICs and the accumulated transmission delay, a delay propagation model is proposed. This model assumes a base delay of 3 μs per NIC handover, excluding the final reception stage. Based on this assumption, the following expressions are derived to estimate the deterministic delay $D_{det}$ and the final transmission time $T_{final}$:
$$D_{det} = (N - 1) \times d_{base}, \qquad T_{final} = D_{det} + T_{rx} \tag{11}$$
where $D_{det}$ represents the cumulative deterministic delay introduced by intermediate NIC switches, calculated as $(N - 1) \times d_{base}$ with $N$ the number of NICs and $d_{base} = 3$ μs the base delay per handover, and $T_{final}$ denotes the final transmission time, obtained by summing $D_{det}$ and the final reception timestamp $T_{rx}$.
This formulation enables accurate estimation of transmission latency in multi-hop Ethernet systems, especially under fault-tolerant or time-critical communication environments. As the number of NICs increases, the additive delay becomes a significant factor in real-time performance analysis and should be carefully considered in system design.
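The delay propagation model can be expressed as a short calculation. The sketch below assumes the stated base delay of 3 μs per NIC handover, with the final reception stage excluded from the handover count; the function and parameter names are illustrative.

```python
BASE_HANDOVER_DELAY_US = 3.0  # base delay per NIC handover, in microseconds


def deterministic_delay_us(num_nics: int,
                           handover_delay_us: float = BASE_HANDOVER_DELAY_US) -> float:
    """Cumulative delay over the (num_nics - 1) handovers between NICs."""
    if num_nics < 1:
        raise ValueError("at least one NIC is required")
    return (num_nics - 1) * handover_delay_us


def final_transmission_time_us(num_nics: int, reception_time_us: float,
                               handover_delay_us: float = BASE_HANDOVER_DELAY_US) -> float:
    """Deterministic delay plus the final reception timestamp."""
    return deterministic_delay_us(num_nics, handover_delay_us) + reception_time_us
```

For three NICs, `deterministic_delay_us(3)` gives 6.0 μs, matching the two transmission intervals observed in the three-NIC experiment.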
Furthermore, equation-based delay estimation offers a theoretical foundation for understanding system latency. To complement this, a packet transmission experiment was conducted to evaluate the actual data size variation according to data type. The results, summarized in Table 2, compare integer-type and character-type file data under identical transmission conditions, providing insight into the overhead caused by data encoding formats.
The experiment compares integer-type and character-type data to evaluate how different data representations affect transmission efficiency under identical network conditions. The EPOCH value was fixed at 3000 to ensure consistent iteration cycles for fair comparison, minimizing variability due to temporal execution differences. This setting allows for an isolated analysis of how data type impacts packet volume and total transmission size across increasing packet counts.
Figure 13 compares the change in data size (bytes) as the number of packets increases for integer-type and character-type data. Character-type data require a larger data size than integer-type data across the entire range, reflecting the fact that character data consume more bytes to represent than integer data. This difference becomes more prominent as the number of packets increases, which suggests that the choice of data type can have a substantial impact on communication efficiency, given that character-type data incur a higher transmission cost under the same EPOCH setting of 3000.
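One way to see why a character representation can cost more bytes than an integer representation is to encode the same values both ways. The sketch below is purely illustrative and does not reproduce the measurement behind Figure 13: it assumes values in 0 to 255 sent either as one raw byte each or as comma-separated decimal ASCII text, and the function names and separator are hypothetical.

```python
def binary_size(values) -> int:
    """Bytes needed when each value in 0..255 is sent as one raw byte."""
    return len(bytes(values))


def text_size(values, sep: str = ",") -> int:
    """Bytes needed when the same values are sent as decimal ASCII text."""
    return len(sep.join(str(v) for v in values).encode("ascii"))


sample = [0, 7, 42, 255]
print(binary_size(sample), text_size(sample))  # prints "4 10"
```

Under these assumptions the per-value overhead of the text form accumulates with every additional packet, so the gap between the two totals widens as the packet count grows.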
The evaluation of packet transmission across varying data sizes serves to enhance the credibility of hardware-level performance assessment. Given the experimental reliance on FPGA-based simulation and loopback trials, it is critical to clarify the inherent limitations of loopback configurations. By emphasizing the controlled nature of the automotive testbed conditions, this methodology strengthens the validity of the experimental results and reinforces the trustworthiness of the proposed architecture under realistic operational constraints. In this study, the loopback configuration was implemented with a fixed buffer size of 1024 bytes and a transmission–reception cycle period of 5 μs per frame. The test environment utilized a deterministic timing window with jitter tolerance below ±0.3 μs to ensure repeatable signal propagation. A total of 10,000 loopback iterations were conducted, with each cycle recording latency, frame integrity, and CRC verification metrics. This quantifiable setup provides a reliable basis for benchmarking transmission stability and fault detection precision in real-time automotive communication systems.
7. Hardware Implementation and Validation of CRC-Corrected Ethernet Architecture
The increasing demand for high-reliability and high-bandwidth communication in modern vehicles has necessitated the development of fault-tolerant NIC architectures suitable for IVN environments. This section presents a detailed implementation and verification of a real NIC system designed to operate under deterministic conditions, providing resilient communication in automotive Ethernet applications.
Achieving real-time determinism in distributed NIC architectures mandates precise clock synchronization across nodes. The IEEE 1588 precision time protocol (PTP) is employed to enable synchronization within a threshold of ≤1 µs [11,27,28]. To implement this, a complete Ethernet MAC layer along with a software communication stack must be realized. The Synopsys Ethernet MAC IP core, which supports FIFO buffering, CRC validation, and traffic prioritization, is integrated and validated using RTL simulation tools [26,30].
To ensure uninterrupted communication under fault conditions, the NIC architecture incorporates the Redundant Ring Protocol (RRP) in compliance with IEC 62439-7. RRP provides seamless path redundancy and immediate failover in Ring topologies, which is essential for critical subsystems in IVNs such as gateway ECUs and sensor clusters. Studies have demonstrated that RRP achieves near-zero recovery time, reinforcing its suitability for real-time automotive networks [29].
In addition, the NIC implementation utilizes the Xilinx Zynq-7000 extensible processing platform (EPP), which combines a dual-core ARM Cortex-A9 processor running at 900 MHz with reconfigurable logic fabric [30,47]. The programmable logic is leveraged to implement custom MAC control modules, CRC generation units, and high-speed recovery logic. AXI interconnections ensure efficient communication between processing and logic domains.
Commercial-grade IVN NIC devices, such as RAPIEnet-based Ethernet switches, demonstrate the practicality of the proposed architecture. Typical configurations include 1 Gbps Ethernet with four-port switching capabilities, integrated TCP/IP recognition, CRC validation, and PWM signal handling [30]. The underlying device driver stack utilizes Synopsys MAC IP cores with tri-level FIFO queues for differentiated quality of service [31]. Beyond this integrated feature set, the design of such NICs is often optimized for deterministic in-vehicle control with minimal latency and fast error recovery within a 1 μs threshold.
In contrast, TSN (Time-Sensitive Networking) architectures, based on IEEE 802.1 standards, provide time-aware scheduling and bounded latency guarantees over standard Ethernet but typically rely on centralized time synchronization and software-based traffic shaping, which can introduce added complexity in distributed systems. Furthermore, TTEthernet (Time-Triggered Ethernet), widely adopted in aerospace and high-integrity automotive systems, offers strict time-slot-based scheduling and fault-tolerant redundancy at the protocol level [33]. However, TTEthernet’s deterministic behavior is achieved through a rigid time-triggered architecture, which can limit scalability and flexibility compared to RAPIEnet’s port-level redundancy and modular integration.
Therefore, while TSN and TTEthernet emphasize synchronization and timing determinism at the network level, RAPIEnet distinguishes itself through hardware-centric fault detection and correction, enabling practical deployment in harsh automotive environments with high resilience and minimal computational overhead. These distinctions suggest that the choice of NIC architecture should be closely aligned with the intended application domain, favoring RAPIEnet for multi-port low-latency automotive use cases, and TSN or TTEthernet for systems requiring global temporal determinism or mixed-criticality scheduling.
We also compare RAPIEnet and BroadR-Reach. Both are used in automotive Ethernet applications [33], but they serve different system-level goals. First, RAPIEnet is well suited to deterministic fault-tolerant systems, such as safety-critical subsystems, thanks to its real-time Ring topology and redundancy support. Second, BroadR-Reach (IEEE 802.3bw) is better suited for non-critical high-volume data applications such as infotainment, camera-based ADAS, and sensor streaming, where cost and weight efficiency are more important than hard real-time guarantees. In summary, RAPIEnet and BroadR-Reach represent two distinct approaches to in-vehicle Ethernet communication: RAPIEnet offers robust fault-tolerant capabilities with deterministic performance for safety-critical and real-time control applications, whereas BroadR-Reach provides a cost-effective lightweight solution optimized for bandwidth-efficient tasks. Selecting the appropriate protocol depends on the system’s real-time requirements, fault recovery needs, and architectural constraints within the automotive network.
The NIC system incorporates a fault detection and correction mechanism composed of three key modules: a data comparison unit for validating cyclic redundancy check (CRC) values across multiple ports, a memory unit for temporarily storing verified data frames, and a response generator designed to issue corrective signals within a response time of 1 μs, as presented in Section 5. Simulation results confirm that the system can detect and recover from data corruption within the 1 μs threshold. These outcomes are validated through waveform analysis using Mentor Graphics ModelSim [48,49].
Therefore, this section demonstrates the design, implementation, and validation of a fault-tolerant NIC architecture suitable for automotive Ethernet environments. The integration of IEEE 1588 synchronization, RRP failover mechanisms, and CRC-based data correction provides a robust foundation for future IVN deployments. The Zynq-7000 EPP platform offers the necessary processing and reconfigurable resources for real-time diagnostics and error recovery.
To further evaluate the temporal stability and quality of service (QoS) under different network configurations, jitter metrics were also analyzed and are summarized in Table S4 (in Supplementary Materials). This complements the end-to-end delay analysis in Table S3 by providing insight into the variance in packet arrival times, which is critical for assessing real-time communication performance in automotive Ethernet environments. Following the end-to-end delay assessment in Table S4, Table S5 provides a complementary analysis of jitter characteristics, which is essential for understanding the consistency of data transmission under the Double Star and Ring topologies.
Table S3 and Table S4 present the experimental results of this study, comparing end-to-end delay and time jitter across Double Star and Ring network topologies to evaluate overall communication performance. While the Ring topology showed slightly better performance for VoIP in terms of both delay and jitter, the Double Star configuration consistently outperformed it in nearly all other application types, particularly in real-time and control-critical transmissions. The key distinction lies in stability and predictability: the Double Star topology provided lower latency and more consistent delivery, making it more suitable for time-sensitive automotive applications.
To further demonstrate the effectiveness of the proposed NIC’s fault detection and correction mechanism under real-time constraints, Figure 14 illustrates the internal state transitions during error injection and subsequent recovery. This simulation snapshot highlights how corrupted data are identified through a CRC mismatch and subsequently repaired within one iteration, verifying the system’s ability to autonomously restore data integrity within the ≤1 µs threshold.
In detail, Figure 14 illustrates the fault-tolerant behavior of the NIC communication system during error occurrence and recovery. In the top section, an error is detected during data transmission, indicated by the program status “.fault” and a mismatched CRC value, i.e., 55714, corresponding to an “injection fault” in the fault state. In the bottom section, the system enters the “.repair” state, and the corrupted data are restored to their correct form, verified by the restored CRC value, i.e., 2570. This confirms the system’s ability to detect transmission errors and autonomously perform data correction to maintain communication integrity.
The waveform in Figure 15 presents a digital timing simulation generated using Mentor Graphics’ ModelSim [48,49], illustrating the behavior of an NIC architecture during data transmission. The CRC outputs of both NIC 1 and NIC 2 are shown to remain in a high-impedance (‘Z’) state, indicating that the CRC computation module was not yet functionally implemented at the time of simulation. In contrast, the bottom section of the waveform confirms the successful reading of external file data into the system, which is subsequently used as dynamic input to the data path logic. This simulation validates the integrity of file-based data injection while concurrently highlighting a critical missing functional component, the CRC logic, for complete transmission verification.
Figure 16 illustrates a Verilog-based test bench structure and data injection logic designed for evaluating fault detection and recovery mechanisms in Network Interface Controllers (NICs). The simulation reads hex-formatted input and controls test progression based on a counter mechanism triggered on each positive clock edge. In Figure 16a, the hierarchy confirms the instantiation of two NIC modules, each containing four distinct communication ports, reflecting a parallelized communication environment. Figure 16b shows the associated behavioral Verilog code, where external input data are sequentially read from a file and injected into the NIC under test on every rising clock edge. This setup enables systematic fault simulation and recovery validation by automating input vector delivery and integrating a controlled test progression counter, thereby facilitating reproducible hardware validation for real-time data path integrity.
The environment used to test the fault tolerance program in this manner verifies the algorithm to be applied in an NIC environment with four ports, without using socket communication. For the actual simulation, the method was changed to use the Verilog code in Figure 16b. In addition, the algorithm that prioritizes and processes the transmission of various types of data, such as integer-type or non-integer-type file data, as well as the control and message data currently being handled, was verified by testing with actual data.
Figure 17 illustrates the functional simulation results of a 16-bit CRC computation module, as observed in a waveform generated using the commercial tool Mentor Graphics ModelSim [25]. The signal data_in [7:0] represents sequential byte-level input data applied synchronously with the rising edge of the system clock (clk), while the crc_en signal enables CRC calculation. The evolving crc_out [15:0] values show valid transitions corresponding to the applied input, confirming the correct behavior of the LFSR-based CRC generator. The waveform also indicates that reset (rst) is held inactive, ensuring uninterrupted register operation, which validates the module’s ability to perform real-time CRC encoding in streaming data environments.
Furthermore, Figure 18 provides a detailed waveform-level view of an NIC-based fault injection simulation, capturing multiple signal transitions during port communication and CRC monitoring. The PORT_ip and PORT_2 signals exhibit synchronized transitions driven by external data injection, confirming the successful sequential loading of test vectors from a memory-mapped file. Notably, the CRC_check lines remain in a high-impedance state (‘Z’), indicating that the CRC generation or comparison logic is either disabled or not yet implemented at this simulation stage. The temporal alignment of the clk, RESET, and read_data signals ensures reproducible timing control, allowing for the precise evaluation of functional behavior under controlled fault scenarios within a digital hardware test bench environment. The synchronization of data loading and clock edges demonstrates deterministic timing behavior for test-bench-controlled input delivery.
Therefore, the experimental results and waveform analyses collectively validate the functional correctness and timing determinism of the proposed fault-tolerant NIC system architecture. Through file-driven input vector injection, LFSR-based CRC logic, and real-time recovery mechanisms, the system demonstrates robustness in detecting and repairing corrupted data across multi-port communication paths. The high-impedance CRC outputs observed during simulation emphasize the need for complete CRC pipeline integration, while the synchronous behavior across clock domains ensures reliable data propagation. Overall, the hardware–software co-simulation framework effectively supports controlled fault evaluation and lays the groundwork for scalable real-time embedded network diagnostics.