The proposed closed-loop HIL simulation framework emulates the real-time dynamics of a semi-truck and replicates vehicle behavior under various driving conditions, including acceleration, braking, road-surface variation, and steering interaction. The framework uses the TruckSim simulator to generate diverse driving conditions, so the test inputs supplied to the ECU closely mirror real-world scenarios and the ECU response can be verified under each of them. The FPGA-based prediction module functions as a virtual counterpart to the physical ABS by implementing a TCN-based virtual wheel speed sensor.
Sensor signals are emulated through signal generators and transmitted to the ECU, while feedback from the ECU is collected via signal-acquisition modules. The ABS input signals of the FPGA design are multiplexed with feedback signals from the ECU using a multiplexer (MUX). The MUX selects between FPGA-generated ABS signals derived from the TruckSim dataset and ABS feedback signals received from the ECU, enabling real-time comparison and validation.
3.1. Simulation Model
The first step focuses on developing the simulation model used for ABS ECU testing and data generation. To validate the performance of the ABS ECU and generate training data for the TCN model, TruckSim 2022 was integrated with MATLAB/Simulink R2022b. This co-simulation platform enables realistic reproduction of vehicle dynamics under different braking, steering, and speed conditions.
TruckSim provides the vehicle dynamics environment, while MATLAB/Simulink generates the input profiles and manages the data-logging process. As shown in Figure 4, the integrated Simulink–TruckSim model receives input signals from MATLAB functions to create diverse driving scenarios. These functions generate key variables such as steering angle, brake command, and initial vehicle speed, using randomized braking intensity and initial-speed values to improve dataset diversity. The generated signals are then applied to the TruckSim “VS_SF” block to simulate vehicle behavior and export ABS-related time-series data for TCN training, validation, and FPGA-based HIL testing. The inputs to the TruckSim “VS_SF” block include:
IMP_SPEED → initial speed (km/h);
IMP_PCON_BK → brake intensity and brake rate (m/s²);
IMP_STEER_SW → initial steering angle and steering rate.
The variability introduced in each test scenario generates a comprehensive dataset for ABS validation and TCN model development. The simulation framework improves dataset diversity by varying critical factors such as initial speed, braking intensity, and steering conditions. This dataset enables the AI-based prediction model to learn representative vehicle dynamics and supports real-time FPGA-based HIL simulation.
The key parameters summarized in Table 1 are designed to simulate diverse driving scenarios based on randomized input generation. The initial vehicle speed is randomly selected in the range of 20 to 120 km/h to reflect different urban and highway driving conditions. Brake-related inputs include the brake delay, brake rate, and brake intensity. Steering angle parameters are constrained to ensure vehicle stability and rollover prevention, particularly under road-friction conditions such as dry asphalt.
Although the dataset is generated synthetically, scenario realism is supported by the use of the TruckSim physics-based vehicle model coupled with MATLAB/Simulink input generation. The randomized input parameters were selected within physically meaningful ranges for initial speed, brake delay, brake rate, brake intensity, steering angle, steering delay, and steering rate. These variations generate diverse braking and steering conditions, including normal braking, partial braking, delayed braking, braking during steering, wheel speed transients, and ABS modulation behavior. In addition, generated simulations were filtered using vehicle-stability constraints, including roll-angle limits, to remove physically unrealistic or unstable cases. Therefore, the dataset is intended to provide broad coverage of representative synthetic ABS operating conditions rather than exhaustive coverage of all possible real-world events.
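The randomized input generation and stability filtering described above can be sketched as follows. This is an illustrative Python sketch rather than the MATLAB functions used in the study: only the 20 to 120 km/h speed range is taken from the text, while the other ranges, the dictionary keys, and the roll-angle limit are hypothetical placeholders and not the values of Table 1.

```python
import random

# Only the initial-speed range comes from the text; every other range
# below is a hypothetical placeholder, not a value from Table 1.
RANGES = {
    "initial_speed_kmh": (20.0, 120.0),
    "brake_delay_s": (0.0, 2.0),        # hypothetical
    "brake_intensity": (0.1, 1.0),      # hypothetical
    "steer_angle_deg": (-30.0, 30.0),   # hypothetical
}

def sample_scenario(rng):
    """Draw one scenario uniformly from the configured ranges."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}

def is_stable(roll_angle_trace_deg, max_roll_deg=5.0):
    """Keep a run only if |roll angle| never exceeds the (hypothetical) limit."""
    return all(abs(a) <= max_roll_deg for a in roll_angle_trace_deg)

rng = random.Random(0)  # fixed seed so the scenario set is reproducible
scenarios = [sample_scenario(rng) for _ in range(100)]
```

In a setup like this, each sampled scenario would be passed to the co-simulation, and `is_stable` would be applied to the exported roll-angle trace before the run is added to the dataset.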
3.2. TCN Learning Model
Recurrent Neural Networks (RNNs) are widely applied to sequential data tasks such as time-series analysis. However, conventional RNN-based models, including Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Nonlinear AutoRegressive with eXogenous input (NARX) networks, can suffer from inefficiencies when processing long sequences, gradient-vanishing problems, and limited parallel processing. TCNs [27] address these issues by employing causal and dilated convolutions, which enable efficient parallel processing while capturing long-range temporal dependencies.
The predictive capability of TCNs was empirically validated through a comparative study with LSTM, GRU, and NARX models. As shown in Table 2, the TCN achieved the lowest MSE and MAE on the ABS ECU dataset, showing the best prediction accuracy among the evaluated sequence models. These results motivated the selection of TCNs for FPGA deployment, where low latency and high throughput are essential for real-time ECU validation.
TruckSim-generated data are used to train the TCN model in Python 3.10 to replicate ABS ECU behavior.
By capturing temporal patterns in vehicle sensor data, the TCN predicts ABS sensor values for different driving conditions used in closed-loop HIL simulations. The dataset is generated by co-simulating TruckSim with Simulink, and TruckSim exports time-series ABS ECU-related signals in CSV format. Using 14 input signals representing driver commands and ECU feedback, the TCN model is trained to predict 17 output signals. The TruckSim parameters used as inputs and outputs for the ABS TCN model are summarized in Table 3.
The TCN is designed for sequential data processing and includes the following features: causal convolution, same-length input and output mapping, receptive field expansion through dilation, and parallel processing. Causal convolution ensures that outputs depend only on past and present inputs, avoiding future information leakage [28]. Same-length input and output mapping supports direct sequential-data prediction [29]. Dilated convolutions expand the receptive field efficiently, capturing long-range dependencies for ABS and steering-control applications [30].
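The causal, same-length, and dilated properties listed above can be illustrated with a minimal single-channel sketch in plain Python. The function name and the kernel taps are illustrative assumptions; the implemented TCN operates on multiple channels with trained weights.

```python
def causal_dilated_conv1d(x, weights, dilation):
    """y[t] = sum_j weights[j] * x[t - j*dilation], with zeros before t = 0."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t - j * dilation
            if idx >= 0:  # implicit left zero-padding keeps the layer causal
                acc += w * x[idx]
        y.append(acc)
    return y

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
w = [0.5, 0.25, 0.125, 0.125]  # kernel size 4, arbitrary illustrative taps
y = causal_dilated_conv1d(x, w, dilation=3)

# Perturbing the sample at t = 5 must leave the outputs for t < 5 unchanged.
x_future = x[:5] + [100.0] + x[6:]
y_future = causal_dilated_conv1d(x_future, w, dilation=3)
```

The output has the same length as the input, and `y[:5]` equals `y_future[:5]`, which is the no-future-leakage property that causal convolution is meant to guarantee.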
The TCN architecture shown in Figure 5 consists of an input layer, encoder, three stacked temporal convolution cells, decoder, and output layer. Although the physical input set contains 14 external signals, the implemented model also uses the previous values of the 17 predicted outputs as autoregressive feedback. Therefore, the TCN receives 31 input channels in total and predicts 17 output channels at each time step. In the implemented model used in this study, the TCN uses three temporal convolution cells, 128 hidden channels, a kernel size of 4, and a base dilation factor of 3.
Although tire–road friction, tire condition, and load transfer strongly influence wheel speed dynamics during braking, these variables were not explicitly included as independent TCN input channels in the current implementation. The objective of this work was to build a virtual sensor using signals available through the implemented ABS ECU HIL interface, rather than relying on internal simulator-only variables. In practical ECU testing, road-friction coefficients and tire-condition parameters are often not directly measured by the ECU and may not be available as real-time input signals. Their effects are instead reflected indirectly through the vehicle response, including wheel speed transients, vehicle speed, yaw rate, lateral acceleration, roll angle, and load-related dynamics.
In the implemented autoregressive TCN, these effects are implicitly captured through the previous 17 predicted output signals that are fed back to the model. Future extensions will evaluate the inclusion of explicit road-friction profiles, tire-condition parameters, and road-surface labels as additional inputs or scenario descriptors to improve generalization under low-friction, split-friction, and abrupt friction-transition braking events.
For this configuration, the effective receptive field is 40 samples. Since the TruckSim data are sampled at 60 Hz, this corresponds to approximately 0.67 s of temporal context. This receptive field was selected to capture short-term ABS dynamics, including wheel speed transients, brake pressure variations, and ABS modulation behavior, while keeping the model compact enough for deterministic low-latency FPGA implementation.
The model was trained for 140 epochs using the Adam optimizer with a learning rate of and a batch size of 16. Before training, all input and output variables were normalized using MinMax scaling. The primary loss function was the mean squared error (MSE). To improve closed-loop stability and reduce high-frequency oscillations in the predicted signals, an additional first-difference loss was added to the TCN objective and weighted by . Gradient clipping with a maximum norm of 0.15 was applied during training. Small Gaussian noise with a standard deviation of was added to the non-Boolean input channels to improve robustness.
No dropout layer was used in the implemented TCN. Instead, closed-loop prediction stability was improved through first-difference regularization, input-noise injection, and gradient clipping.
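The normalization and the combined training objective can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: the first-difference term is taken here to be the MSE between the step-to-step changes of prediction and target, which is one plausible form and is not confirmed by the text, and `diff_weight` is a free parameter standing in for the weighting factor.

```python
def minmax_scale(values):
    """Scale a sequence linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]

def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def first_difference(seq):
    """Step-to-step change of a sequence."""
    return [b - a for a, b in zip(seq, seq[1:])]

def total_loss(pred, target, diff_weight):
    """MSE plus a weighted penalty on mismatched step-to-step changes (assumed form)."""
    diff_term = mse(first_difference(pred), first_difference(target))
    return mse(pred, target) + diff_weight * diff_term

target = minmax_scale([10.0, 12.0, 14.0, 16.0])
smooth = [t + 0.1 for t in target]  # constant offset, same slope as the target
jittery = [t + d for t, d in zip(target, [0.1, -0.1, 0.1, -0.1])]
```

Both candidate predictions have essentially the same plain MSE, but `total_loss` is larger for `jittery`, which shows how a first-difference term of this kind discourages high-frequency oscillation.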
The receptive field $R$ of a TCN determines how many input time steps influence a single output value and is calculated as
$$R = 1 + \sum_{l=1}^{L} (k_l - 1)\, d_l,$$
where $L$ is the number of TCN layers, $k_l$ is the kernel size at layer $l$, and $d_l$ is the dilation factor at layer $l$. In the implemented model, the TCN has three temporal convolution layers, a constant kernel size of $k_l = 4$, and a base dilation factor of 3. Therefore, the dilation factors are $d_1 = 3^0 = 1$, $d_2 = 3^1 = 3$, and $d_3 = 3^2 = 9$. The effective receptive field is therefore
$$R = 1 + (4 - 1)(1 + 3 + 9) = 40.$$
Thus, each prediction uses 40 previous samples. Since the TruckSim data are sampled at 60 Hz, this corresponds to approximately
$$\frac{40}{60\ \text{Hz}} \approx 0.67\ \text{s}.$$
This temporal context was selected to capture short-term ABS dynamics, including wheel speed transients, brake pressure variations, and ABS modulation behavior, while keeping the TCN compact enough for deterministic low-latency FPGA implementation.
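The receptive-field arithmetic can be checked with a short sketch. It assumes the standard relation for stacked dilated convolutions with one convolution per layer, $R = 1 + \sum_{l}(k_l - 1)\,d_l$, and dilation factors that grow as powers of the base dilation; the function and variable names are illustrative.

```python
def receptive_field(kernel_sizes, dilations):
    """R = 1 + sum over layers of (kernel_size - 1) * dilation."""
    return 1 + sum((k - 1) * d for k, d in zip(kernel_sizes, dilations))

num_layers, kernel_size, base_dilation = 3, 4, 3
dilations = [base_dilation ** l for l in range(num_layers)]  # powers of the base
R = receptive_field([kernel_size] * num_layers, dilations)
context_seconds = R / 60.0  # TruckSim data are sampled at 60 Hz
```

With the stated hyperparameters this gives dilations of 1, 3, and 9, a receptive field of 40 samples, and roughly 0.67 s of context.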
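The primary metrics used to evaluate the TCN model are the mean squared error (MSE) and the mean absolute error (MAE). The MSE is defined as
$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2,$$
where $y_i$ is the ground-truth value, $\hat{y}_i$ is the predicted value, and $n$ is the total number of samples. The MAE is defined as
$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|.$$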
MSE penalizes larger prediction errors more strongly, while MAE provides a direct measure of the average absolute prediction deviation.
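The difference between the two metrics can be seen in a small sketch with made-up values, where two predictions share the same MAE but differ in MSE. The data are purely illustrative and unrelated to the reported results.

```python
def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [0.0, 0.5, 1.0, 0.5]
pred_one_large = [0.5, 0.5, 1.0, 0.5]        # a single error of 0.5
pred_spread = [0.125, 0.625, 0.875, 0.375]   # four errors of 0.125

mae_large = mean_absolute_error(y_true, pred_one_large)
mae_spread = mean_absolute_error(y_true, pred_spread)
mse_large = mean_squared_error(y_true, pred_one_large)
mse_spread = mean_squared_error(y_true, pred_spread)
```

Both predictions have an MAE of 0.125, while the MSE is four times larger for the prediction with the single large error.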
TCN Hyperparameter Ablation Study
The ablation results include both 64-channel and 128-channel candidate architectures. The 64-channel models were used to evaluate the effect of temporal depth and dilation factor with lower model capacity, while the 128-channel models were used to examine the effect of increased model capacity. Although TCN-D uses a deeper four-cell architecture and a larger kernel size, it did not outperform the selected TCN architecture. The selected TCN, with 128 hidden channels, three temporal convolution cells, kernel size 4, and base dilation factor 3, achieved the best overall prediction accuracy, with an MSE of and an MAE of 0.0099. Therefore, this configuration was selected for FPGA implementation because it provides the best trade-off between prediction accuracy, temporal context, and deterministic hardware deployment.
Only the selected TCN model was implemented and synthesized on the FPGA. Therefore, the ablation study reports GPU-based prediction accuracy for the candidate architectures, while FPGA latency, resource utilization, and power consumption are reported for the selected architecture. Increasing the number of temporal cells, kernel size, or channel width increases the number of convolution operations, stored activations, and coefficient-memory requirements. Therefore, deeper or wider TCN variants are expected to require higher LUT, DSP, and BRAM usage. The preliminary GPU-based ablation results for the TCN hyperparameters are summarized in Table 4.
3.3. FPGA Implementation
The next stage of the prediction module is the FPGA implementation of the trained TCN model. The FPGA implementation enables real-time sensor-data processing for closed-loop HIL simulation by executing the TCN inference directly on the programmable logic. The trained weights and biases generated by the Python-based model are mapped to FPGA memory resources and used to predict ABS-related sensor values, including wheel speed signals.
Although the physical input set contains 14 external signals, the implemented autoregressive TCN also feeds back the previous values of the 17 predicted outputs. Therefore, the FPGA implementation processes 31 input channels in total and generates 17 output channels at each time step.
Figure 6 shows the FPGA architecture of the TCN prediction module. The design is organized into modular hardware blocks, including input buffering, BRAM-based coefficient memories, convolution blocks, ReLU activation, residual-addition blocks, optimized fixed-point MAC units, and output-interface logic. The TCN finite-state machine (FSM) and inner control blocks schedule the execution of the model layers and coordinate data movement between memory and computation units.
A key step in the hardware implementation is mapping the trained TCN weights and biases to FPGA memory resources. The pre-trained parameters are converted into Q1.15 fixed-point format and stored in coefficient files used to initialize ROM blocks during bitstream generation. During inference, the FSM controller generates the required ROM addresses and supplies the stored coefficients directly to the corresponding MAC units, avoiding external memory access for model parameters.
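The conversion of trained parameters into Q1.15 coefficient words can be sketched as follows. The text states only that coefficient files initialize the ROM blocks, so the one-hex-word-per-line layout, the function names, and the sample weights are assumptions made for illustration.

```python
def to_q15(value):
    """Quantize a float to a signed 16-bit Q1.15 integer, rounding to nearest and saturating."""
    scaled = int(round(value * 32768))
    return max(-32768, min(32767, scaled))

def q15_hex_lines(weights):
    """Format each coefficient as a 4-digit two's-complement hex word (assumed file layout)."""
    return [format(to_q15(w) & 0xFFFF, "04X") for w in weights]

# Illustrative weights, not trained values.
lines = q15_hex_lines([0.5, -0.5, 0.999, -1.0, 1.0])
```

The last entry shows saturation: +1.0 is not representable in Q1.15, so it maps to the largest positive word, `7FFF`.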
The MAC unit is one of the most resource-demanding components in neural network accelerators. Therefore, optimizing the MAC design is essential to reduce FPGA resource utilization while maintaining deterministic real-time performance. The proposed implementation uses an optimized fixed-point MAC unit (OFM) [31]. The OFM consists of a multiplication unit and an accumulator operating on Q1.15 fixed-point operands, where one bit is assigned to the integer/sign part and fifteen bits are assigned to the fractional part. This format provides efficient FPGA resource usage while maintaining sufficient numerical precision for ABS sensor prediction.
Hardware Timing, Quantization, and Scalability Analysis
The FPGA implementation was designed to provide deterministic real-time inference rather than only high average throughput. The TCN datapath is controlled by a synchronous finite-state machine that schedules the encoder, temporal convolution cells, ReLU stages, residual addition, and decoder in a fixed sequence. The control and handshake signals are registered to reduce long combinational paths and simplify timing closure at the 100 MHz operating frequency. The convolution stage is implemented using a parallel multiplier-bank/MAC structure, while the ReLU and residual-addition operations are organized as separate pipeline stages. This organization provides deterministic cycle-by-cycle behavior and avoids uncontrolled memory-access conflicts during inference.
The trained TCN parameters are stored locally in FPGA BRAMs as coefficient memories initialized during bitstream generation. Therefore, inference does not require external memory access for weights or biases. The BRAM-based weight storage supplies the MAC units directly, while the AXI-Stream FIFO interface between the programmable logic and the processing system decouples the TCN pipeline from USB3 communication with the HIL simulator. As a result, communication latency is isolated from the internal TCN computation, supporting the measured PL inference latency of 5.45 μs and the total end-to-end latency of 10.61 μs.
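The reported latencies can be put in context with a short calculation. It assumes that the PL inference runs at the 100 MHz clock mentioned above and that the loop is updated at the 60 Hz sampling rate of the TruckSim data; both pairings are assumptions made here for illustration.

```python
PL_CLOCK_HZ = 100e6      # operating frequency stated for the design
PL_LATENCY_S = 5.45e-6   # measured PL inference latency
END_TO_END_S = 10.61e-6  # measured total end-to-end latency
SAMPLE_RATE_HZ = 60.0    # TruckSim sampling rate (assumed loop rate)

pl_cycles = round(PL_LATENCY_S * PL_CLOCK_HZ)   # clock cycles per inference
sample_period_s = 1.0 / SAMPLE_RATE_HZ
budget_fraction = END_TO_END_S / sample_period_s
```

Under these assumptions one inference takes about 545 clock cycles, and the end-to-end latency occupies well under 0.1% of a 60 Hz sample period.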
The Q1.15 fixed-point format was selected to balance numerical precision and FPGA resource efficiency, following the optimized fixed-point MAC design presented in [31]. Since the input and output variables are MinMax-normalized, most values lie within the representable range of Q1.15. The quantization step is $2^{-15} \approx 3.05 \times 10^{-5}$, which provides sufficient fractional resolution for the normalized ABS-related signals. In the fixed-point MAC, each multiplication produces an intermediate wider product that is scaled back to Q1.15 format using normalization and rounding. These operations reduce truncation error and help limit numerical drift during closed-loop prediction.
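The scale-back of the wide product to Q1.15 can be sketched with integer arithmetic. This is a generic round-to-nearest, saturating Q1.15 multiply-accumulate; it is an assumption about how normalization and rounding might be realized, not the OFM design of [31], and hardware MACs often keep a wider accumulator than shown here.

```python
Q = 15
Q15_MIN, Q15_MAX = -(1 << Q), (1 << Q) - 1

def saturate(v):
    """Clamp an integer to the signed 16-bit Q1.15 range."""
    return max(Q15_MIN, min(Q15_MAX, v))

def q15_mul(a, b):
    """Multiply two Q1.15 integers; the wide product is rounded back to Q1.15."""
    product = a * b                              # wide intermediate product
    rounded = (product + (1 << (Q - 1))) >> Q    # add half an LSB, then shift
    return saturate(rounded)

def q15_mac(acc, a, b):
    """Accumulate one rounded product, saturating the result."""
    return saturate(acc + q15_mul(a, b))

half = 1 << 14      # 0.5 in Q1.15
quarter = 1 << 13   # 0.25 in Q1.15
step = 2.0 ** -Q    # quantization step of the format
```

For example, `q15_mul(half, half)` returns `quarter`, and multiplying the most negative value by itself saturates to `Q15_MAX` instead of overflowing.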
The effect of fixed-point quantization was evaluated by comparing the Python floating-point model, the FPGA floating-point implementation, and the FPGA Q1.15 fixed-point implementation. The detailed fixed-point accuracy results and FPGA resource-utilization comparison are reported in the Results and Discussion section. The Q1.15 implementation was selected to support deterministic low-latency execution and resource-efficient real-time ECU testing.
A complete sensitivity analysis over multiple fixed-point formats, such as Q2.14, Q3.13, or Q4.12, was not performed in the present study because each format requires a separate quantization, FPGA implementation, timing closure, and synthesis flow. Future work will include a systematic fixed-point word-length study to evaluate the trade-off between prediction accuracy, numerical stability, overflow margin, LUT/DSP/BRAM usage, and timing performance.
From a scalability perspective, the fixed-point implementation leaves substantial resource headroom on the ZCU102 platform. The fixed-point MAC version reduces LUT and DSP usage compared with the FP16 implementation, enabling the integration of additional virtual sensors, more wheel speed channels, or larger TCN models. Increasing the TCN channel width, depth, or kernel size affects multiple FPGA resource categories. Additional weights and past activations increase BRAM requirements, while additional convolution operations increase DSP and LUT usage. The architecture can therefore be scaled in two ways: by increasing parallel MAC resources to preserve inference latency, or by reusing the existing MAC units across more cycles when a larger latency budget is acceptable.
3.4. Closed-Loop HIL Simulation
Closed-loop HIL simulation provides real-time interaction between the ECU and the virtual environment. Data are transferred through the signal-generation module to the ECU, and feedback signals from the ECU are transferred from the acquisition module to the prediction module. Signal generation and acquisition components emulate system inputs and outputs, including wheel speed sensors, brake pressure sensors, steering angle sensors, and solenoid valves. The FPGA-based closed-loop ABS ECU testing workflow is shown in Figure 7.
The PS/PL AXI FIFO interface enables low-latency communication between the processor system (PS) and programmable logic (PL) in the FPGA. The PS includes an ARM processor that manages the USB3 connection to the simulator. The USB3 interface provides high-bandwidth communication for real-time exchange of sensor and actuator data between the simulator and the FPGA prediction module.
The FPGA-based prediction module acts as a virtual replica of the physical ABS. The model operates in two phases: an initialization open-loop phase and a prediction closed-loop phase. In the open-loop phase, short sequences of ground-truth values from TruckSim are provided to the FPGA-based TCN model to initialize the simulation context. In the closed-loop phase, the TCN predicts wheel speed sensor values based only on ABS control signals. The predicted sensor values are fed back into the model as inputs, creating a feedback loop that allows continuous prediction without further dependency on ground-truth data.
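The two-phase operation can be sketched as a rollout loop. This is a simplified single-step sketch: the real TCN consumes a window of past samples, and `toy_model`, `rollout`, and the data below are hypothetical stand-ins used only to show how the feedback source switches from ground truth to predictions.

```python
N_EXTERNAL, N_OUTPUTS = 14, 17

def rollout(model, external_inputs, ground_truth, warmup_steps):
    """Open-loop warm-up on ground truth, then closed-loop on the model's own outputs."""
    feedback = [0.0] * N_OUTPUTS
    predictions = []
    for t, ext in enumerate(external_inputs):
        features = list(ext) + list(feedback)  # 14 + 17 = 31 input channels
        y = model(features)
        predictions.append(y)
        # Warm-up: feed back ground truth; afterwards: feed back the prediction.
        feedback = ground_truth[t] if t < warmup_steps else y
    return predictions

def toy_model(features):
    """Hypothetical stand-in for the TCN: decays the fed-back values by 10%."""
    assert len(features) == N_EXTERNAL + N_OUTPUTS
    return [0.9 * v for v in features[N_EXTERNAL:]]

steps = 6
external = [[0.0] * N_EXTERNAL for _ in range(steps)]
truth = [[1.0] * N_OUTPUTS for _ in range(steps)]
preds = rollout(toy_model, external, truth, warmup_steps=3)
```

After the three warm-up steps, each prediction depends only on the external inputs and the previous prediction, so any error made by the model is carried forward into later steps.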
Because the closed-loop phase feeds the predicted outputs back into the TCN input sequence, recursive prediction errors may accumulate over time. To reduce this risk, the implemented TCN was trained with first-difference regularization, small Gaussian input-noise injection, and gradient clipping, and the closed-loop prediction phase is initialized using a short ground-truth context sequence. These mechanisms help reduce high-frequency oscillations and support short-term closed-loop stability during the evaluated ABS scenarios. However, a dedicated long-duration prediction-drift analysis under extended braking maneuvers, abrupt steering transitions, and long continuous closed-loop operation was not performed in the present study. Future work will therefore include long-duration closed-loop experiments and explicit drift metrics over extended prediction horizons to further evaluate recursive prediction stability.