Collaborative Fusion Attention Mechanism for Vehicle Fault Prediction

Jia, Hong; Qian, Dalin; Chen, Fanghua; Zhou, Wei

doi:10.3390/fi17090428


by Hong Jia ^1,2,3, Dalin Qian ^1, Fanghua Chen ^2,3,* and Wei Zhou ^2,3

^1 School of Traffic and Transportation, Beijing Jiaotong University, Beijing 100044, China

^2 Automobile Transportation Research Center, Research Institute of Highway Ministry of Transport, Beijing 100088, China

^3 Key Laboratory of Operation Safety Technology on Transport Vehicles, Research Institute of Highway Ministry of Transport, Beijing 100088, China

^* Author to whom correspondence should be addressed.

Future Internet 2025, 17(9), 428; https://doi.org/10.3390/fi17090428

Submission received: 5 August 2025 / Revised: 3 September 2025 / Accepted: 17 September 2025 / Published: 19 September 2025

(This article belongs to the Topic Big Data and Artificial Intelligence, 3rd Edition)


Abstract

In this study, we investigate a deep learning-based vehicle fault prediction model that aims to predict vehicle faults accurately by analyzing the correlations among different faults and the impact of critical faults on future fault development. To this end, we propose a collaborative modeling approach utilizing multiple attention mechanisms. This approach incorporates a graph attention mechanism for the fusion representation of fault correlation information and employs a novel learning method that combines a Long Short-Term Memory (LSTM) network with an attention mechanism to capture the impact of key faults. Experiments on real-world vehicle fault records show that the model significantly outperforms existing prediction models in fault prediction accuracy.

Keywords:

vehicle maintenance; fault prediction; graph attention mechanism; attention mechanism

1. Introduction

Vehicle fault prediction is a critical component of the contemporary automotive industry and its service ecosystem. By proactively identifying potential failures, fault prediction substantially enhances vehicle safety and reliability, thereby preventing major system failures and the associated hazards. Fault prediction also reduces maintenance expenditure: whereas conventional post-failure repair typically incurs substantial costs, predictive maintenance enables intervention as soon as an incipient fault is detected, which limits severe damage and expensive repairs. Additionally, fault prediction extends vehicle service life through scientifically informed maintenance strategies that mitigate component degradation and wear.

The importance of fault prediction is further underscored by recent advances in prognostics and health management (PHM). As highlighted in [1], accurate estimation of remaining useful life (RUL) provides significant economic benefits by optimizing maintenance strategies and helps avoid potential human casualties. Furthermore, [2] demonstrates how sensorless strategies based on motor driver data can eliminate the need for expensive sensor installation and data management, improving cost-effectiveness across diverse industrial applications. The work in [3] shows that intelligent fault diagnosis and RUL prediction are essential to the reliable operation of mechanical systems, enhancing safety, availability, and productivity in manufacturing industries.

During vehicle fault progression, concurrent faults may occur. For instance, an electronic control unit (ECU) fault might cause simultaneous malfunctions in other vehicle components, such as the engine, transmission, and in-car entertainment system. Likewise, a generator fault might prevent the battery from recharging, affecting vehicle ignition and the operation of all electronic systems. Moreover, certain critical faults significantly influence the development of subsequent faults. For example, a malfunctioning engine temperature sensor can cause engine overheating, which not only degrades engine performance but may also lead to cylinder block damage or increased fuel consumption, creating a cascading effect. Minor faults in the braking system, such as uneven brake pad wear or minor hydraulic leaks, may develop into severe brake failures if not promptly detected and addressed, seriously endangering driving safety. Despite these well-documented patterns of fault propagation, most existing prediction methods have critical limitations: they typically focus on isolated subsystems, neglect the complex interdependencies between different fault types, and fail to model cascading effects in which a single critical fault triggers subsequent failures across multiple vehicle systems.

Building upon this research foundation, this study proposes a vehicle fault prediction framework that employs multiple attention mechanisms to capture inter-fault correlations and the significant influence of critical faults on subsequent failure progression. Accurate prediction of future faults is enabled by the comprehensive modeling and analysis of historical fault patterns utilizing multiple attention mechanisms. The primary contributions are as follows:

To exploit both the correlations between different faults and the impact of key faults on future fault prediction, this study proposes a vehicle fault prediction model based on the collaborative modeling of multiple attention mechanisms.
To fully exploit the correlations between diverse faults, this study proposes a fault correlation information fusion representation module grounded in a graph attention mechanism.
To accurately capture the significant impact of key faults on subsequent faults, this study introduces a novel learning method that combines a Long Short-Term Memory (LSTM) network with an attention mechanism.
Extensive experiments on real-world vehicle fault data show that our model significantly outperforms existing advanced prediction models.

2. Related Work

Concurrent with rapid advancements in the automotive industry, vehicle fault prediction methodologies have evolved substantially. Research has moved from rudimentary statistical prediction methods to sophisticated knowledge-integrated models, and the current proliferation of vehicle condition data has brought machine learning- and deep learning-based predictive maintenance techniques to the forefront, substantially enhancing prediction accuracy. According to the vehicle components involved, vehicle fault prediction techniques can be grouped into power and drivetrain fault prediction, chassis and handling system fault prediction, and electronic and safety system fault prediction, which are reviewed in turn below.

2.1. Power and Drivetrain Fault Prediction

In the domain of power and drivetrain fault prediction, researchers have developed numerous techniques and frameworks to improve fault detection accuracy and efficiency. Wang et al. [4] proposed a diagnostic approach based on Extended Neural Network Type 1 (ENN-1), which was utilized to diagnose vehicle engine faults. ENN-1 exhibits a simple structure, rapid learning capability, and robustness to data noise, demonstrating effectiveness comparable to multi-layer neural networks and k-means classification methods in vibration fault detection. Wu and Kuo [5] developed a vehicle generator fault diagnosis framework based on the Adaptive Neuro-Fuzzy Inference System (ANFIS). This framework employs discrete wavelet analysis for feature extraction and the ANFIS for fault condition classification, with experimental results indicating its potential for generator fault diagnosis. Bafroui and Ohadi [6] explored the application of wavelet energy and Shannon entropy for fault detection in gearboxes under variable-speed conditions, integrating wavelet transform and resampling techniques with non-stationary vibration signals and utilizing a feedforward multi-layer perceptron neural network to classify extracted features, thereby achieving precise gear fault diagnosis. Wong et al. [7] proposed a Probabilistic Committee Machine (PCM) framework that integrates multiple signals and a Sparse Bayesian Extreme Learning Machine (SBELM) for engine fault diagnosis. Diagnostic accuracy is enhanced through a novel probabilistic integration approach capable of detecting both single and simultaneous faults. Mohammadi et al. [8] proposed a methodology based on artificial neural networks and 3D sensitivity modeling for fault diagnosis in Proton Exchange Membrane Fuel Cells (PEMFCs) in automotive applications to discern current and temperature distributions under different fault conditions. Sankavaram et al. [9] examined fault diagnosis in regenerative braking systems of hybrid vehicles, employing a data-driven approach alongside multivariate data reduction techniques to achieve efficient fault isolation within system memory constraints. Yang et al. [10] investigated a battery ESC fault diagnosis method based on fractional-order modeling and Random Forest Classification, enhancing battery fault detection through comparative analysis of prediction accuracy across various models. Wang et al. [11] proposed an engine fault diagnosis model (WPA-ANN) based on acoustic intensity analysis and incomplete wavelet packet analysis, achieving efficient fault detection through noise feature extraction and ANN modeling. Zuo et al. [12] proposed a recurrent neural network (RNN) model incorporating attention mechanisms to enhance prognostic accuracy in proton exchange membrane fuel cells (PEMFCs) under dynamic test conditions. Zhang et al. [13] proposed a hybrid deep belief network (HDBN) model for vehicle drive system fault diagnosis and investigated three data fusion methods to improve diagnostic accuracy. Branco and Fontanela [14] developed a digital twin methodology for predicting the remaining useful life of electric vehicle batteries, thereby optimizing maintenance processes through systematic asset management approaches. Molaie et al. [15] proposed a neural network-based method for estimating gear safety factors from ISO-based simulations, integrating data-driven modeling with standard-based simulation to develop efficient digital twins for gear transmission systems. This hybrid approach reduces computational costs while preserving accuracy, demonstrating the potential of neural networks to support real-time condition monitoring and predictive maintenance in gearbox systems. Chen et al. [16] developed a data-driven Long Short-Term Memory neural network model for insulation fault diagnosis in fuel cell vehicles. Their approach utilizes a robust locally weighted scatterplot smoothing method for data filtering and achieves a high coefficient of determination of 99.78% in identifying insulation resistance value anomalies caused by deionizer failure and other reliability issues.

2.2. Chassis and Handling System Fault Prediction

Recent years have witnessed significant advancements in fault prediction and diagnosis research in vehicle chassis and handling systems. Researchers have explored various techniques and methodologies to enhance vehicle safety, stability, and comfort. Ghimire et al. [17] proposed an integrated model and data-driven fault detection and diagnosis approach for electric power steering (EPS) systems. They developed a physics-based model of the EPS system, conducted fault injection experiments to derive dependencies of faulty sensor measurements, and investigated various fault detection and diagnosis (FDD) schemes to detect and isolate faults. Yin and Huang [18] investigated vehicle suspension system fault detection based on fuzzy empirical C-means clustering and Fisher discriminant analysis techniques. This method requires only accelerometer measurements at the four corners of the vehicle suspension, reducing complexity compared with traditional approaches. Wang and Yin [19] proposed a data-driven fault diagnosis method for fault detection and root-cause isolation in vehicle suspension systems, employing clustering techniques and Fisher discriminant analysis. This approach is characterized by its independence from suspension models or pre-defined fault characteristics, effectively treating different spring coefficients as a single fault. Jegadeeshwaran and Sugumaran [20] achieved 96% classification accuracy using vibration analysis and clonal selection classification algorithms for hydraulic braking system fault diagnosis. This study introduced novel approaches for vibration-based condition monitoring of braking systems. Capriglione et al. [21] investigated soft displacement sensors for rear suspensions in two-wheeled vehicles, focusing on real-time instrumented fault detection and isolation through recursive artificial neural networks and experimental design. Ghimire et al. [22] developed a fault diagnosis method for electric power steering systems based on rough set theory, emphasizing robustness to missing data and enhanced fault classification accuracy. Alamelu Manghai and Jegadeeshwaran [23] employed wavelet features to extract information from vibration signals, utilizing machine learning analysis of signals obtained from hydraulic brake test setups, achieving 99.45% classification accuracy. Zehelein et al. [24] demonstrated the potential of deep learning in vehicle diagnostics by analyzing electronic stability control sensor signals using convolutional neural networks (CNN) to identify defects in vehicle shock absorbers. Jeong et al. [25] proposed a fault diagnosis algorithm based on support vector machines and FIO, designed for quarter-vehicle test rigs to independently detect faults in suspension system sensors, significantly reducing diagnostic algorithm design effort. Siegel et al. [26] developed a densely connected convolutional neural network model for tire condition assessment, enhancing awareness of rubber degradation risks and improving vehicle safety through crack identification via smartphone photography.

2.3. Electronic and Safety Systems Fault Prediction

In recent years, research on fault prediction for electronic and safety systems has garnered considerable attention, yielding significant advancements across multiple domains. Nowaczyk et al. [27] introduced a machine learning-based approach for predicting truck compressor faults using recorded vehicle data. The study addressed limitations of standard attribute–value knowledge representation and proposed a novel method based on Michalski's classical AQ rule induction algorithm, accounting for the composite performance of each truck and offering enhanced flexibility in fault prediction. Cerqueira et al. [28] proposed a data mining pipeline for predictive maintenance of pneumatic systems in heavy trucks, incorporating feature selection, meta-feature engineering, bias sampling, and boosted tree modeling, with experimental results demonstrating that meta-feature engineering and bias sampling are critical to enhancing classifier performance. Rengasamy et al. [29] explored dynamically weighted loss functions for prognostics and health management of sensor systems, demonstrating substantial improvements in prediction and fault detection rates through deep learning models. Fang et al. [30] developed a hybrid framework for fault diagnosis in autonomous vehicles, integrating hybrid data analysis approaches with fuzzy PID control. Their study employed discrete wavelet transform for denoising and feature extraction and utilized an autoencoder based on extreme learning machines for anomaly detection. Biddle and Fallah [31] introduced an algorithm for multi-fault detection, identification, and prediction in autonomous vehicle controllers, leveraging support vector machine (SVM) techniques and validating effectiveness using MATLAB/IPG CarMaker. Safavi et al. [32] explored an architecture for detecting, identifying, isolating, and predicting multi-faults in multi-sensor systems, such as autonomous vehicles, employing two deep neural network architectures to facilitate effective fault management in complex systems. Xu et al. [33] proposed a high-accuracy health prediction model for sensor systems based on improved relevance vector machine-integrated regression, enhancing prediction accuracy by introducing uncertainty quantification in soft sensors for performance variable extraction. Giordano et al. [34] presented a data-driven strategy for predictive maintenance, focusing on oxygen sensor clogging prediction in diesel engines. They introduced the PREPIPE pipeline, which matches state-of-the-art deep learning performance while maintaining interpretability. Ren et al. [35] proposed a high-risk test scenario generation framework for autonomous vehicles at roundabouts using naturalistic driving data. Their approach utilizes a time-series generative adversarial network (TimeGAN) to create realistic safety-critical driving trajectories, addressing the scarcity of critical scenarios in virtual testing and accelerating the testing procedure for autonomous vehicles.

2.4. Critical Synthesis and Research Gap

Existing research has made significant progress in vehicle fault prediction, yet most methods remain confined to isolated diagnosis of specific subsystems (e.g., engines and batteries) or to fault classification based on static features, overlooking both the complex interdependencies within the vehicle as an integrated system and the temporal evolution of faults. Specifically, current approaches often ignore intrinsic correlations among different fault types and therefore fail to model cross-system fault propagation and cascading effects. Moreover, they lack explicit modeling of critical precursor events in historical fault records, limiting their ability to capture early signs of future failures. Although a few recent studies have begun to explore multi-fault scenarios [32], they focus on detection and isolation within specific modules rather than on system-level forecasting based on historical sequences and global fault dependencies.

To address these limitations, this paper proposes a Collaborative Fusion Attention Mechanism (CFATM) for vehicle-level fault prediction. The model explicitly captures global fault correlations using a graph attention network (GAT), learning the weighted relationships among all fault codes. Furthermore, it integrates Long Short-Term Memory (LSTM) networks with a temporal attention mechanism to emphasize critical historical fault events that significantly influence future predictions. By synergizing structural and temporal dependencies, CFATM provides a holistic and accurate approach to fault forecasting, offering a novel solution for comprehensive vehicle health management.

3. Proposed Model

3.1. Framework

The proposed Collaborative Fusion Attention Mechanism for Vehicle Fault Prediction (CFATM) framework for vehicle maintenance demand prediction is presented in Figure 1. It comprises three primary modules:

Fusion representation of fault correlation information. This module employs a graph attention mechanism to learn and integrate correlation features among different faults, constructing a fused representation of their interdependencies.
Key temporal information learning. This module combines a Long Short-Term Memory (LSTM) network with an attention mechanism to extract and weight critical fault information from historical sequences.
Synthesis, integration, and prediction. This final module synthesizes the outputs from the previous two components using an attention mechanism for enhanced integration. The resulting comprehensive representation is subsequently used for fault prediction.

3.2. Fusion Representation of Fault Correlation Information

The complete set of fault combinations observed during vehicle fault progression forms a global fault co-occurrence graph G, which encapsulates the correlation patterns between different vehicle faults. The nodes of G are the faults in the set P = \{p_{i}\}_{i = 1, 2, \dots, |P|}. An edge is established between a fault pair (p_{m}, p_{n}) if the two faults co-occur within the maintenance records of a single vehicle. The co-occurrence frequency t_{mn} of each pair is counted across all vehicle records and determines the edge weight; this count is symmetric, i.e., t_{mn} = t_{nm}. For a given fault p_{m}, we define its relational set Δ_{m} = \{p_{n} \mid t_{mn} > 0, n \neq m\}, and let q_{m} = \sum_{p_{n} \in Δ_{m}} t_{mn} denote the total co-occurrence frequency of p_{m}. The graph G is represented by the co-occurrence matrix A \in \mathbb{R}^{|P| \times |P|}, defined in Equation (1). Because each row is normalized by q_{m}, A itself is generally not symmetric.

A_{mn} = \begin{cases} 0, & \text{if } m = n \\ \dfrac{t_{mn}}{q_{m}}, & \text{otherwise} \end{cases}

(1)
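The construction of A in Equation (1) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: we assume each vehicle is given as a list of maintenance records, each record a set of integer fault indices, and we count a pair once per vehicle whenever both faults appear anywhere in that vehicle's records. The function name build_cooccurrence_matrix is hypothetical.

```python
from itertools import combinations


def build_cooccurrence_matrix(vehicles, num_faults):
    """Build the row-normalized co-occurrence matrix A of Equation (1).

    Assumption (illustrative): `vehicles` is a list with one entry per vehicle,
    each entry a list of maintenance records, and each record a set of fault
    indices in range(num_faults). A pair counts as co-occurring once per
    vehicle if both faults appear anywhere in that vehicle's records.
    """
    # t[m][n]: symmetric co-occurrence frequency t_mn across all vehicles
    t = [[0] * num_faults for _ in range(num_faults)]
    for records in vehicles:
        faults = set().union(*records)  # all faults seen for this vehicle
        for m, n in combinations(sorted(faults), 2):
            t[m][n] += 1
            t[n][m] += 1  # keep t_mn = t_nm

    A = [[0.0] * num_faults for _ in range(num_faults)]
    for m in range(num_faults):
        q_m = sum(t[m])  # total co-occurrence frequency of p_m (diagonal is 0)
        if q_m == 0:
            continue  # fault never co-occurs: leave its row as zeros
        for n in range(num_faults):
            if m != n:
                A[m][n] = t[m][n] / q_m
    return A
```

For example, two vehicles with records [{0, 1}, {2}] and [{0, 2}] give t_{02} = 2 and t_{01} = t_{12} = 1, so the first row of A is [0, 1/3, 2/3]; every row of a fault with at least one co-occurrence sums to 1.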

Leveraging this co-occurrence matrix, the fusion of graph structural information with historical fault data is accomplished using a graph attention network (GAT) [36] followed by a single-head attention mechanism [37]. This process is formalized in Equation (2).

\begin{aligned} E_{t} &= \mathrm{GAT}(A, F_{t}) \\ O_{t} &= \mathrm{Atten}(E_{t}, F_{t}, F_{t}) \end{aligned}

(2)

The attention mechanism used in Equation (2) is defined in Equation (3):

\mathrm{Atten}(Q, K, V) = \mathrm{softmax}\left(\frac{Q W_{q} (K W_{k})^{T}}{\sqrt{d}}\right) V W_{v}

(3)

where d is the attention dimension and W_{q}, W_{k} \in \mathbb{R}^{|P| \times d} and W_{v} \in \mathbb{R}^{|P| \times |P|} are the learnable attention weights.

The fused fault-vector representation S is then obtained by stacking the outputs O_{t} of all maintenance records.
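Equation (3) is standard scaled dot-product attention. The pure-Python sketch below mirrors it with matrices stored as lists of rows; it is an illustration under our own naming (atten, matmul, transpose, and softmax_rows are hypothetical helpers), whereas a practical implementation would use a tensor library. In Equation (2), Q would be the GAT output E_t and K = V = F_t.

```python
import math


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def softmax_rows(X):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    out = []
    for row in X:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def atten(Q, K, V, W_q, W_k, W_v):
    """Scaled dot-product attention of Equation (3):
    softmax(Q W_q (K W_k)^T / sqrt(d)) V W_v, with d the attention dimension."""
    d = len(W_q[0])  # number of columns of W_q
    scores = matmul(matmul(Q, W_q), transpose(matmul(K, W_k)))
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]
    return matmul(softmax_rows(scaled), matmul(V, W_v))
```

When all keys are identical, the attention weights are uniform and the output is simply the mean of the projected value rows, which is a quick sanity check for any implementation.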

3.3. Key Temporal Information Learning

Within the progression of vehicle faults, certain critical faults significantly influence the development of future faults. To more effectively capture this critical information from historical records, we propose a learning method that integrates a Long Short-Term Memory (LSTM) network [38] with an attention mechanism. First, the historical fault sequence is processed by the LSTM network to obtain hidden state representations for each maintenance record, as formalized in Equation (4).

R_{1}, R_{2}, \dots, R_{T} = LSTM (F_{1}, F_{2}, \dots, F_{T})

(4)

This process allows R_{1}, R_{2}, \dots, R_{T} to encapsulate both short-term fluctuations and long-term cumulative information within the sequential fault records, thereby capturing long-range dependencies. These hidden states are then processed by an attention mechanism that strengthens the interactions between historical records and identifies the most critical information influencing future faults, as shown in Equation (5).

B_{1}, B_{2}, \dots, B_{T} = Atten (R, R, R)

(5)

where R = [R_{1}, R_{2}, \dots, R_{T}] denotes the sequence of LSTM hidden states.

The result D of key temporal information learning is then obtained by summing the attention outputs over all T records, as shown in Equation (6).

D = B_{1} + B_{2} + \dots + B_{T}

(6)
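Equations (5) and (6) can be illustrated in the same way. To keep the sketch short, we assume identity projections (W_q = W_k = W_v = I), so d equals the hidden size; in the model these projections are learned. The input R stands for the LSTM hidden states R_1, ..., R_T of Equation (4), which are not computed here, and self_attention_sum is a hypothetical name.

```python
import math


def self_attention_sum(R):
    """Sketch of Equations (5)-(6): B = Atten(R, R, R), D = B_1 + ... + B_T.

    Simplifying assumption: the projections W_q, W_k, W_v are identity
    matrices, so d equals the hidden size of R. R is a list of T hidden-state
    vectors R_1..R_T, e.g. the per-record outputs of an LSTM.
    """
    d = len(R[0])
    B = []
    for r_t in R:
        # scaled attention scores of record t against every record s
        scores = [sum(a * b for a, b in zip(r_t, r_s)) / math.sqrt(d) for r_s in R]
        mx = max(scores)
        weights = [math.exp(s - mx) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # B_t: attention-weighted mixture of all hidden states
        B.append([sum(w * r_s[j] for w, r_s in zip(weights, R)) for j in range(d)])
    # Equation (6): element-wise sum over the T attention outputs
    D = [sum(b_t[j] for b_t in B) for j in range(d)]
    return B, D
```

With two orthogonal hidden states, for instance, each B_t mixes both records and the summed representation D is [1, 1].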

3.4. Comprehensive Fusion Prediction

To fully integrate the outputs from the fault correlation representation (S) and the critical fault information learning (D) for predicting future faults, we employ an attention mechanism to perform enhanced fusion. This process is detailed in Equation (7).

Y = Atten (D, D, S)

(7)

The faults at time step T + 1 are predicted from the correlation-aware fusion result Y, as shown in Equation (8).

\hat{y} = Sigmoid (W_{y} dropout (Y) + b_{y})

(8)

where W_{y} \in \mathbb{R}^{|P| \times d_{y}} and b_{y} \in \mathbb{R}^{|P|} are learnable parameters and d_{y} denotes the dimension of Y. To improve the robustness of the model, dropout is applied to Y before the prediction is made.
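The prediction head in Equation (8) is sketched below. We assume standard inverted dropout (the behavior of PyTorch's nn.Dropout), which zeroes entries with probability p during training, rescales the survivors by 1/(1 - p), and is disabled at inference; the default p = 0.5 follows Section 4.3. The function names are hypothetical.

```python
import math
import random


def dropout(x, p, training, rng):
    """Inverted dropout: zero each element with probability p during training
    and rescale survivors by 1/(1-p); identity at inference time."""
    if not training or p == 0.0:
        return list(x)
    return [0.0 if rng.random() < p else v / (1.0 - p) for v in x]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_faults(Y, W_y, b_y, p=0.5, training=False, rng=None):
    """Equation (8): y_hat = Sigmoid(W_y dropout(Y) + b_y).

    Y has length d_y, W_y is |P| x d_y, b_y has length |P|; each output entry
    is the predicted probability of one fault at record T + 1.
    """
    rng = rng or random.Random(0)
    y = dropout(Y, p, training, rng)
    logits = [sum(w * v for w, v in zip(row, y)) + b for row, b in zip(W_y, b_y)]
    return [sigmoid(z) for z in logits]
```

Because each fault gets its own sigmoid rather than a shared softmax, several faults can be predicted for the same record, which matches the multi-label setting of Section 4.2.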

3.5. Model Optimization

We train the CFATM model to predict the faults in the (T + 1)-th maintenance record of each vehicle. The global objective function is the binary cross-entropy loss in Equation (9).

L = -\sum_{i = 1}^{|P|} \left( y_{i} \log \hat{y}_{i} + (1 - y_{i}) \log (1 - \hat{y}_{i}) \right)

(9)

where \hat{y}_{i} denotes the predicted probability of fault p_{i} and y_{i} \in \{0, 1\} denotes the true label of fault p_{i} in the (T + 1)-th record.
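Equation (9) sums the per-fault binary cross-entropy terms over all |P| fault classes. The sketch below is a direct transcription with a hypothetical function name; the clipping constant eps is our own numerical safeguard rather than part of the equation.

```python
import math


def bce_loss(y_true, y_pred, eps=1e-12):
    """Equation (9): binary cross-entropy summed over all |P| fault classes.

    y_true holds 0/1 labels for record T + 1; y_pred holds sigmoid outputs.
    eps clips predictions away from 0 and 1 to avoid log(0) (an added
    numerical safeguard, not part of Equation (9)).
    """
    loss = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return loss
```

For two faults with labels [1, 0] and predictions [0.5, 0.5], the loss is 2 ln 2 ≈ 1.386.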

4. Experimental Results and Analysis

4.1. Dataset Description

To evaluate the performance of our method, we utilized real vehicle maintenance data obtained from 30 maintenance companies for validation. First, vehicles with missing or outlier values in the data were excluded. Subsequently, the fault records in the remaining dataset were categorized based on vehicle construction theory, ultimately yielding 2240 distinct fault types. Subsequent to initial filtering, data pertaining to vehicles that had undergone two or more maintenance services, along with the corresponding fault information for each service, were retained. The final dataset comprises the complete fault records of 7932 vehicles serviced between April 2011 and December 2020. Detailed statistical information of the dataset is presented in Table 1, while the distribution of vehicle maintenance record data is illustrated in Figure 2.

To facilitate the experimental procedure, the dataset was randomly partitioned by vehicle ID into a training set and a test set. These sets contain fault records from 5500 and 2432 vehicles, respectively. Within our framework, the faults identified in the most recent maintenance record are designated as labels, while the faults of all preceding historical maintenance records are employed as input features.
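The split and label construction described above can be sketched as follows. The record layout (a chronologically ordered list of fault sets per vehicle), the fixed seed, and the function names are assumptions for illustration; with the paper's data, n_train would be 5500 out of 7932 vehicle IDs.

```python
import random


def split_by_vehicle(vehicle_ids, n_train, seed=0):
    """Randomly partition vehicle IDs so that no vehicle appears in both sets."""
    ids = sorted(vehicle_ids)
    random.Random(seed).shuffle(ids)
    return set(ids[:n_train]), set(ids[n_train:])


def make_example(records):
    """Turn one vehicle's chronologically ordered maintenance records
    (each a set of fault codes) into (inputs, label): all earlier records
    form the history F_1..F_T and the most recent record is the label."""
    if len(records) < 2:
        raise ValueError("need at least two maintenance records")
    return records[:-1], records[-1]
```

Splitting by vehicle ID rather than by record keeps every vehicle's full history on one side of the split, so no test vehicle contributes records to training.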

4.2. Baseline Models and Evaluation Metrics

The principal objective of this experiment is to predict the faults in the (T + 1)-th maintenance record of each vehicle from the fault history of its previous T maintenance records, which constitutes a multi-label classification problem. The evaluation metrics are the weighted F_{1} score (w-F_{1}) and recall at k (R@k). The w-F_{1} score is the weighted average of the F_{1} scores of all fault classes, while R@k is the number of true faults that appear among the top k predicted faults divided by the total number of true faults, averaged over all vehicles; it thus measures how completely the highest-ranked predictions cover the actual faults. To benchmark our method against state-of-the-art models, we selected the following methods for comparative analysis: MLP [39], RNN [40], LSTM [38], GRU [41], and Transformer [37]. Additionally, we included typical spatio-temporal graph learning methods such as PDFormer [42] and STAEformer [43]. Furthermore, we compared against Dipole [44], a representative method in the field of medical diagnosis prediction.
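The two metrics can be computed as sketched below. For w-F_1 we assume the common convention of weighting each class's F_1 by its support (the number of true occurrences), as in scikit-learn's average='weighted'; R@k follows the definition given in this section. Faults are integer indices and model outputs are per-fault scores; all function names are hypothetical.

```python
def recall_at_k(true_faults, scores, k):
    """R@k for one vehicle: true faults among the k highest-scoring
    faults, divided by the total number of true faults."""
    top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return len(set(top_k) & true_faults) / len(true_faults)


def mean_recall_at_k(all_true, all_scores, k):
    """Average R@k over vehicles (vehicles without true faults are skipped)."""
    vals = [recall_at_k(t, s, k) for t, s in zip(all_true, all_scores) if t]
    return sum(vals) / len(vals)


def weighted_f1(all_true, all_pred, num_faults):
    """Per-class F1 averaged with weights proportional to class support
    (number of true occurrences); all_true/all_pred are lists of fault sets."""
    total_support, acc = 0, 0.0
    for c in range(num_faults):
        tp = sum(1 for t, p in zip(all_true, all_pred) if c in t and c in p)
        fp = sum(1 for t, p in zip(all_true, all_pred) if c not in t and c in p)
        fn = sum(1 for t, p in zip(all_true, all_pred) if c in t and c not in p)
        support = tp + fn
        if support == 0:
            continue  # classes that never occur get zero weight
        f1 = 2 * tp / (2 * tp + fp + fn)
        acc += support * f1
        total_support += support
    return acc / total_support if total_support else 0.0
```

Note that R@k only needs a ranking of the scores, whereas w-F_1 needs hard predictions, so a decision threshold on the sigmoid outputs must be chosen before computing it.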

4.3. Implementation Details

In our experiments, all model parameters were randomly initialized. Hyperparameters and activation functions were carefully tuned on a separate validation set. The model was trained for 100 epochs using the Adam optimizer with a learning rate of 0.001. The model architecture consists of several key components: a 10-layer LSTM network with a hidden size of 150 units, a self-attention mechanism with a feature dimension of 150, a graph attention network (GAT) layer with input and output dimensions of 2240 and a dropout rate of 0.1, and a scaled dot-product attention module with an attention dimension of 64. The classifier applies dropout with a rate of 0.5 before the final linear projection. All programs were implemented in Python 3.10.0 and PyTorch 1.10.0, utilizing CUDA 11.4 on a system equipped with 64 GB of RAM and an NVIDIA GPU (driver version 472.39). To ensure statistical robustness, each experiment was repeated five times with different random seeds.
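The settings listed above can be collected in a single configuration object. The class and its field names below are our own (hypothetical); the values are those reported in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFATMConfig:
    """Hyperparameters reported in Section 4.3 (field names are our own)."""
    num_faults: int = 2240        # |P|, also the GAT input/output dimension
    lstm_layers: int = 10
    lstm_hidden: int = 150
    self_attn_dim: int = 150
    gat_dropout: float = 0.1
    attn_dim: int = 64            # scaled dot-product attention dimension d
    classifier_dropout: float = 0.5
    epochs: int = 100
    learning_rate: float = 1e-3   # Adam
    num_seeds: int = 5            # independent repetitions
```

A sensitivity run such as the one in Section 4.6 can then override a single field, e.g. dataclasses.replace(CFATMConfig(), classifier_dropout=0.3), while the frozen dataclass prevents accidental in-place changes.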

4.4. Prediction Performance

The experimental results are shown in Table 2. Since the average number of faults is 5.19, we set k = [5, 10, 15, 20, 25, 30, 35] for R@k. The results show that our CFATM model significantly outperforms all baseline models. To verify that the improvements are statistically significant rather than due to random chance, we conducted a paired-sample t-test comparing the results of CFATM against those of the best-performing baseline (Dipole) across the five independent runs. As shown in Table 2, the improvements achieved by CFATM are statistically significant in all metrics, with p < 0.01 (**). In terms of w-F_{1}, CFATM performs well in both precision and recall. Compared with Dipole, CFATM improves the w-F_{1} score by 1.03%, indicating a better balance between precision and recall: the model identifies more of the actual faults while maintaining high precision, which helps to reduce false alarms. This balance allows the model to better anticipate a vehicle's actual condition and to predict maintenance needs more promptly and accurately, thereby helping to prevent safety incidents caused by delayed maintenance. In terms of R@k, CFATM maintains its advantage for every value of k, indicating that it predicts the actual faults accurately in both short and long prediction lists. This stable performance is crucial to practical applications, especially in highly dynamic and complex fault environments.
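The significance test described above can be reproduced with the standard library alone. The sketch computes the paired-sample t statistic over per-seed scores and compares it with the two-sided critical value of Student's t for α = 0.01 and 4 degrees of freedom (five runs), which is 4.604; obtaining an exact p-value would additionally require the t distribution's CDF, for example via SciPy's scipy.stats.ttest_rel. The function names are hypothetical, and the inputs would be the five per-seed results of CFATM and Dipole for one metric.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Paired-sample t statistic for per-seed scores a (model) and b (baseline):
    t = mean(diff) / (stdev(diff) / sqrt(n)), with n - 1 degrees of freedom.
    Raises ZeroDivisionError if all differences are identical (zero spread)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return statistics.mean(diffs) / (sd / math.sqrt(n))


# Two-sided critical value of Student's t for alpha = 0.01 and df = 4 (five runs).
T_CRIT_DF4_ALPHA_001 = 4.604


def significant_at_001(a, b):
    """True if the paired difference is significant at p < 0.01 for five runs."""
    return abs(paired_t_statistic(a, b)) > T_CRIT_DF4_ALPHA_001
```

Pairing by seed removes run-to-run variance shared by both models, which is why five runs can suffice to detect a consistent gap of about one percentage point.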

4.5. Ablation Study

To validate the effectiveness of each module in our model, we compared three model ablation variants with different configurations. The ablation variants are defined as follows:

CFATM-NoGraph: This variant removes the output of the fault correlation information fusion representation module, so that Y = D.
CFATM-NoTemporal: This variant removes the output D of the key temporal information learning module, so that Y = S.
CFATM-NoAttnFusion: To assess the importance of the attention mechanism for integration and prediction, this variant replaces the attention-based fusion with simple summation, so that Y = D + S.

The ablation results in Table 3 validate the contribution of each component of the proposed CFATM model. The full model consistently outperforms all ablated variants, demonstrating that every module plays an important role. The performance degradation of the CFATM-NoGraph variant underscores the importance of incorporating relational information among faults for learning better representations. Similarly, the performance drop of the CFATM-NoTemporal variant confirms that capturing long-range dependencies and weighting historically significant faults are essential to accurate prediction. The clear performance decrease of the CFATM-NoAttnFusion variant shows that adaptively integrating features from different modules with attention is more effective than simple summation. The complete CFATM model benefits from the synergy of all components, leveraging fault correlations, key temporal patterns, and attentive fusion to achieve the best results.

4.6. Parameter Sensitivity Analysis

To investigate the impact of the key parameters of CFATM on its performance, we evaluated its sensitivity to the dropout rate in Equation (8), varying the rate from 0.3 to 0.7. The results in Figure 3 indicate that the model performs best with a dropout rate of 0.5.
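The sweep over dropout rates can be sketched as a simple grid search. The evaluate callback is hypothetical and stands for training CFATM with a given classifier dropout rate and returning a validation score (higher is better); the step of 0.1 between 0.3 and 0.7 is our assumption, since the text gives only the range.

```python
def sweep_dropout(evaluate, rates=(0.3, 0.4, 0.5, 0.6, 0.7)):
    """Evaluate each classifier dropout rate and return (best_rate, scores).

    `evaluate` is a hypothetical callback that trains CFATM with the given
    dropout rate and returns a validation score where higher is better.
    The 0.1 step between rates is an assumption.
    """
    scores = {rate: evaluate(rate) for rate in rates}
    best = max(scores, key=scores.get)
    return best, scores
```

In practice each call to evaluate would itself average over the five random seeds, so that the selected rate is not an artifact of a single lucky initialization.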

5. Conclusions

In this study, we developed a deep learning-based model for vehicle fault prediction. By leveraging multiple attention mechanisms, the model incorporates both the correlations between faults and the critical impact of specific faults on future fault development, thereby enhancing predictive accuracy. Specifically, we developed an information fusion module based on a graph attention network to capture correlations among different faults and combined it with a Long Short-Term Memory (LSTM) network and an attention mechanism to assess the impact of key historical faults on future predictions. Experiments on a large-scale real-world vehicle fault dataset demonstrated that the proposed model significantly outperforms existing baseline prediction models. This improvement in predictive capability can contribute to enhanced vehicle safety, reliability, and service life, and it highlights the value of mining historical maintenance records for predictive insights. Although the model currently operates on historical fault records rather than on real-time CAN bus sensor streams, it provides a useful tool for fleet-level prognostics and health management. It supports strategic decision making, such as optimizing maintenance schedules and spare-parts inventory based on predicted fault probabilities, thereby improving operational efficiency, safety, and vehicle service life.

From an industrial applicability perspective, the proposed CFATM model offers reasonable computational efficiency during inference, making it suitable for deployment on cloud-based or edge-computing maintenance platforms. The graph attention module can be parallelized across fault nodes and the LSTM module across vehicles in a batch, so both can benefit from GPU acceleration and scale to large fleets. However, real-time deployment would require integration with vehicle telematics systems and further optimization for low-latency processing. Future work will compress the model through quantization or knowledge distillation to reduce memory and computation overhead, facilitating its adoption in resource-constrained environments.
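As an illustration of the quantization direction mentioned above, the following sketch applies symmetric per-tensor int8 post-training quantization to a single weight matrix. It is a generic technique, not the authors' compression pipeline; the random 256 × 256 matrix stands in for one hypothetical LSTM or graph attention weight, and a real deployment would normally rely on the quantization tooling of the chosen inference framework.

```python
import numpy as np


def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q,
    with q an int8 array in [-127, 127]."""
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Map int8 codes back to approximate float32 weights."""
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)  # stand-in weight matrix
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print(w.nbytes, q.nbytes)  # 262144 65536
```

Storing int8 codes instead of float32 values reduces weight memory by a factor of four, while rounding to the nearest code bounds the per-weight error by half of `scale`.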

This study also has certain limitations. Because the model was trained on a specific set of vehicles and fault codes, its generalizability to entirely new vehicle types or to data from previously unseen maintenance providers requires further validation. Furthermore, because the predictions are based on statistical correlations between faults, the model does not account for real-time operating conditions or external environmental factors, which could influence the exact timing of a fault. In future work, we will pursue several directions: (1) enhancing the model's interpretability to provide clearer insights into the reasons behind its predictions; (2) integrating continuous time-series sensor data from the CAN bus with historical fault records to build a more comprehensive and accurate prediction system; and (3) evaluating the model's robustness and generalization across diverse vehicle fleets and operating environments. These steps are crucial for moving from a proof-of-concept model to a deployable tool for practical vehicle maintenance.

Author Contributions

H.J. contributed to data analysis, algorithm construction, and the writing and editing of the manuscript. D.Q. and F.C. reviewed and edited the manuscript. W.Z. proposed the idea, contributed to data acquisition, performed supervision and project administration, and reviewed and edited the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

Authors Fanghua Chen and Wei Zhou were employed by the Research Institute of Highway Ministry of Transport. The remaining authors declare that they have no commercial or financial relationships that could be construed as a potential conflict of interest.

Correction Statement

This article has been republished with a minor correction to the Data Availability Statement. This change does not affect the scientific content of the article.

References

Qi, J. Enhanced particle filter and cyclic spectral coherence based prognostics of rolling element bearings. In Proceedings of the PHM Society European Conference, Utrecht, The Netherlands, 3–6 July 2018; Volume 4.
Qi, J.; Chen, Z.; Uhlmann, Y.; Schullerus, G. Sensorless robust anomaly detection of roller chain systems based on motor driver data and deep weighted KNN. IEEE Trans. Instrum. Meas. 2024, 74, 3502613.
Qi, J.; Chen, Z.; Kong, Y.; Qin, W.; Qin, Y. Attention-guided graph isomorphism learning: A multi-task framework for fault diagnosis and remaining useful life prediction. Reliab. Eng. Syst. Saf. 2025, 263, 111209.
Wang, M.-H.; Chao, K.-H.; Sung, W.-T.; Huang, G.-J. Using ENN-1 for fault recognition of automotive engine. Expert Syst. Appl. 2010, 37, 2943–2947.
Wu, J.-D.; Kuo, J.-M. Fault conditions classification of automotive generator using an adaptive neuro-fuzzy inference system. Expert Syst. Appl. 2010, 37, 7901–7907.
Bafroui, H.H.; Ohadi, A. Application of wavelet energy and Shannon entropy for feature extraction in gearbox fault detection under varying speed conditions. Neurocomputing 2014, 133, 437–445.
Wong, P.K.; Zhong, J.; Yang, Z.; Vong, C.M. Sparse Bayesian extreme learning committee machine for engine simultaneous fault diagnosis. Neurocomputing 2016, 174, 331–343.
Mohammadi, A.; Djerdir, A.; Steiner, N.Y.; Bouquain, D.; Khaburi, D. Diagnosis of PEMFC for automotive application. In Proceedings of the 2015 5th International Youth Conference on Energy (IYCE), Pisa, Italy, 27–30 May 2015; pp. 1–6.
Sankavaram, C.; Pattipati, B.; Pattipati, K.; Zhang, Y.; Howell, M.; Salman, M. Data-driven fault diagnosis in a hybrid electric vehicle regenerative braking system. In Proceedings of the 2012 IEEE Aerospace Conference, Big Sky, MT, USA, 3–10 March 2012; pp. 1–11.
Yang, R.; Xiong, R.; He, H.; Chen, Z. A fractional-order model-based battery external short circuit fault diagnosis approach for all-climate electric vehicles application. J. Clean. Prod. 2018, 187, 950–959.
Wang, Y.S.; Liu, N.N.; Guo, H.; Wang, X.L. An engine-fault-diagnosis system based on sound intensity analysis and wavelet packet pre-processing neural network. Eng. Appl. Artif. Intell. 2020, 94, 103765.
Zuo, J.; Lv, H.; Zhou, D.; Xue, Q.; Jin, L.; Zhou, W.; Yang, D.; Zhang, C. Deep learning based prognostic framework towards proton exchange membrane fuel cell for automotive application. Appl. Energy 2021, 281, 115937.
Zhang, T.; Li, Z.; Deng, Z.; Hu, B. Hybrid data fusion DBN for intelligent fault diagnosis of vehicle reducers. Sensors 2019, 19, 2504.
Branco, C.T.N.M.; Fontanela, J.M. A design methodology to employ digital twins for remaining useful lifetime prediction in electric vehicle batteries. In Proceedings of the SAE Brasil 2023 Congress, São Paulo, Brazil, 10–11 October 2024; SAE Technical Paper.
Molaie, M.; Zippo, A.; Pellicano, F. Neural Network-Based Estimation of Gear Safety Factors from ISO-Based Simulations. Symmetry 2025, 17, 1312.
Chen, Y.; Zhang, J.; Zhai, S.; Hu, Z. Data-driven modeling and fault diagnosis for fuel cell vehicles using deep learning. Energy AI 2024, 16, 100345.
Ghimire, R.; Sankavaram, C.; Ghahari, A.; Pattipati, K.; Ghoneim, Y.; Howell, M.; Salman, M. Integrated model-based and data-driven fault detection and diagnosis approach for an automotive electric power steering system. In Proceedings of the 2011 IEEE Autotestcon, Baltimore, MD, USA, 12–15 September 2011; pp. 70–77.
Yin, S.; Huang, Z. Performance monitoring for vehicle suspension system via fuzzy positivistic C-means clustering based on accelerometer measurements. IEEE/ASME Trans. Mechatronics 2014, 20, 2613–2620.
Wang, G.; Yin, S. Data-driven fault diagnosis for an automobile suspension system by using a clustering based method. J. Frankl. Inst. 2014, 351, 3231–3244.
Jegadeeshwaran, R.; Sugumaran, V. Brake fault diagnosis using Clonal Selection Classification Algorithm (CSCA)–A statistical learning approach. Eng. Sci. Technol. Int. J. 2015, 18, 14–23.
Capriglione, D.; Carratù, M.; Liguori, C.; Paciello, V.; Sommella, P. A soft stroke sensor for motorcycle rear suspension. Measurement 2017, 106, 46–52.
Ghimire, R.; Zhang, C.; Pattipati, K.R. A rough set-theory-based fault-diagnosis method for an electric power-steering system. IEEE/ASME Trans. Mechatronics 2018, 23, 2042–2053.
Manghai, T.M.A.; Jegadeeshwaran, R. Vibration based brake health monitoring using wavelet features: A machine learning approach. J. Vib. Control. 2019, 25, 2534–2550.
Zehelein, T.; Hemmert-Pottmann, T.; Lienkamp, M. Diagnosing automotive damper defects using convolutional neural networks and electronic stability control sensor signals. J. Sens. Actuator Netw. 2020, 9, 8.
Jeong, K.; Choi, S.B.; Choi, H. Sensor fault detection and isolation using a support vector machine for vehicle suspension systems. IEEE Trans. Veh. Technol. 2020, 69, 3852–3863.
Siegel, J.E.; Sun, Y.; Sarma, S. Automotive diagnostics as a service: An artificially intelligent mobile application for tire condition assessment. In Proceedings of the Artificial Intelligence and Mobile Services–AIMS 2018: 7th International Conference, Held as Part of the Services Conference Federation, SCF 2018, Seattle, WA, USA, 25–30 June 2018.
Nowaczyk, S.; Prytz, R.; Rögnvaldsson, T.; Byttner, S. Towards a machine learning algorithm for predicting truck compressor faults using logged vehicle data. In Proceedings of the 12th Scandinavian Conference on Artificial Intelligence, Aalborg, Denmark, 20–22 November 2013; pp. 205–214.
Cerqueira, V.; Pinto, F.; Sá, C.; Soares, C. Combining boosted trees with metafeature engineering for predictive maintenance. In Advances in Intelligent Data Analysis XV; Springer: Cham, Switzerland, 2016; pp. 393–397.
Rengasamy, D.; Jafari, M.; Rothwell, B.; Chen, X.; Figueredo, G.P. Deep learning with dynamically weighted loss function for sensor-based prognostics and health management. Sensors 2020, 20, 723.
Fang, Y.; Cheng, C.; Dong, Z.; Min, H.; Zhao, X. A fault diagnosis framework for autonomous vehicles based on hybrid data analysis methods combined with fuzzy PID control. In Proceedings of the 2020 3rd International Conference on Unmanned Systems (ICUS), Harbin, China, 27–28 November 2020; pp. 281–286.
Biddle, L.; Fallah, S. A novel fault detection, identification and prediction approach for autonomous vehicle controllers using SVM. Automot. Innov. 2021, 4, 301–314.
Safavi, S.; Safavi, S.A.; Hamid, H.; Fallah, S. Multi-sensor fault detection, identification, isolation and health forecasting for autonomous vehicles. Sensors 2021, 21, 2547.
Xu, P.; Wei, G.; Song, K.; Chen, Y. High-accuracy health prediction of sensor systems using improved relevant vector-machine ensemble regression. Knowl.-Based Syst. 2021, 212, 106555.
Giordano, D.; Giobergia, F.; Pastor, E.; Macchia, A.L.; Cerquitelli, T.; Baralis, E.; Mellia, M.; Tricarico, D. Data-driven strategies for predictive maintenance: Lesson learned from an automotive use case. Comput. Ind. 2022, 134, 103554.
Ren, D.; Huang, H.; Li, Y.; Jin, J. High-Risk Test Scenario Generation for Autonomous Vehicles at Roundabouts Using Naturalistic Driving Data. Appl. Sci. 2025, 15, 4505.
Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903.
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30.
Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536.
Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
Fu, R.; Zhang, Z.; Li, L. Using LSTM and GRU neural network methods for traffic flow prediction. In Proceedings of the 2016 31st Youth Academic Annual Conference of Chinese Association of Automation (YAC), Wuhan, China, 11–13 November 2016; pp. 324–328.
Jiang, J.; Han, C.; Zhao, W.X.; Wang, J. PDFormer: Propagation delay-aware dynamic long-range transformer for traffic flow prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 4365–4373.
Liu, H.; Dong, Z.; Jiang, R.; Deng, J.; Deng, J.; Chen, Q.; Song, X. Spatio-temporal adaptive embedding makes vanilla transformer SOTA for traffic forecasting. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, Birmingham, UK, 21–25 October 2023; pp. 4125–4129.
Ma, F.; Chitta, R.; Zhou, J.; You, Q.; Sun, T.; Gao, J. Dipole: Diagnosis prediction in healthcare via attention-based bidirectional recurrent neural networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; pp. 1903–1911.

Figure 1. An overview of the CFATM model. It is mainly composed of three modules: (1) fusion representation of fault correlation information; (2) key temporal information learning; (3) synthesis, integration, and prediction.

Figure 2. The distribution of vehicle maintenance record data: (a) the distribution of vehicles by the number of maintenance events; (b) the distribution of maintenance records by the number of faults.

Figure 3. Parameter sensitivity analysis of dropout rate.

Table 1. Experimental dataset statistics.

Description	Value
Total number of vehicles	7932
Unique fault types recorded	2240
Maximum events for a single vehicle	45
Average events per vehicle	3.89
Maximum faults in a single event	51
Average faults per maintenance event	5.19

Table 2. Fault prediction results using w−F₁ (%) and R@k (%). Statistical significance of the improvement of our CFATM model over the strongest baseline in terms of w−F₁ (Dipole) is measured by a paired t-test (**: p < 0.01).

Model	w−F₁	R@5	R@10	R@15	R@20	R@25	R@30	R@35
MLP	31.57 ± 0.42	52.64 ± 0.15	57.65 ± 0.11	62.57 ± 0.13	66.51 ± 0.19	69.73 ± 0.14	70.81 ± 0.10	72.23 ± 0.18
RNN	32.31 ± 0.66	52.94 ± 0.09	58.37 ± 0.20	63.04 ± 0.17	67.34 ± 0.16	70.23 ± 0.15	72.33 ± 0.12	73.53 ± 0.10
LSTM	32.09 ± 0.80	53.30 ± 0.18	57.58 ± 0.17	62.38 ± 0.11	66.82 ± 0.14	70.25 ± 0.16	73.82 ± 0.19	75.66 ± 0.15
GRU	32.20 ± 0.43	53.11 ± 0.20	57.70 ± 0.12	62.89 ± 0.14	67.44 ± 0.13	70.46 ± 0.11	72.76 ± 0.17	74.16 ± 0.09
Dipole	32.59 ± 0.31	52.88 ± 0.21	57.73 ± 0.16	62.66 ± 0.10	67.45 ± 0.18	70.61 ± 0.15	73.66 ± 0.12	75.49 ± 0.13
Transformer	31.63 ± 0.19	52.74 ± 0.12	57.37 ± 0.14	62.64 ± 0.20	66.71 ± 0.16	70.03 ± 0.13	71.34 ± 0.19	73.23 ± 0.11
PDFormer	31.68 ± 0.31	52.78 ± 0.32	57.43 ± 0.41	62.67 ± 0.23	66.78 ± 0.36	70.09 ± 0.31	71.37 ± 0.41	73.29 ± 0.31
STAEformer	31.71 ± 0.43	52.82 ± 0.64	57.46 ± 0.32	62.73 ± 0.27	66.77 ± 0.49	70.12 ± 0.35	71.42 ± 0.58	73.32 ± 0.46
CFATM	33.62 ** ± 0.47	54.38 ** ± 0.58	60.62 ** ± 1.18	65.87 ** ± 1.02	69.66 ** ± 0.89	73.11 ** ± 0.37	78.16 ** ± 0.13	78.16 ** ± 0.23
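Table 2 reports w−F₁ and R@k. Their exact definitions are given in Section 4.2; the sketch below assumes common conventions for multi-label fault prediction: R@k is the fraction of a visit's true faults that appear among the k highest-scoring fault types, averaged over visits, and w−F₁ is the per-fault-type F₁ averaged with weights proportional to each type's support (the convention scikit-learn calls `average='weighted'`). The thresholding and averaging choices in the paper may differ.

```python
import numpy as np


def recall_at_k(y_true, scores, k):
    """Mean over visits of |top-k predicted faults ∩ true faults| / |true faults|.

    y_true: (n, L) multi-hot ground truth; scores: (n, L) predicted scores.
    """
    topk = np.argsort(-scores, axis=1)[:, :k]             # indices of the k highest scores
    hits = np.take_along_axis(y_true, topk, axis=1).sum(axis=1)
    n_true = y_true.sum(axis=1)
    valid = n_true > 0                                    # skip visits with no true fault
    return float(np.mean(hits[valid] / n_true[valid]))


def weighted_f1(y_true, y_pred):
    """Per-label F1 averaged with weights proportional to label support.

    y_true, y_pred: (n, L) binary arrays. Labels with zero support get weight 0.
    """
    tp = (y_true * y_pred).sum(axis=0)
    fp = ((1 - y_true) * y_pred).sum(axis=0)
    fn = (y_true * (1 - y_pred)).sum(axis=0)
    denom = 2 * tp + fp + fn
    f1 = np.divide(2 * tp, denom, out=np.zeros_like(denom, dtype=float), where=denom > 0)
    support = y_true.sum(axis=0)
    return float((f1 * support).sum() / support.sum())


# Toy example with two visits and four fault types
y_true = np.array([[1, 0, 1, 0], [0, 1, 0, 0]])
scores = np.array([[0.9, 0.8, 0.1, 0.0], [0.2, 0.7, 0.1, 0.0]])
y_pred = (scores >= 0.5).astype(int)                      # assumed 0.5 decision threshold
print(recall_at_k(y_true, scores, 1), weighted_f1(y_true, y_pred))
```

Because R@k only checks whether true faults appear in the top-k list, it increases monotonically with k, which is consistent with the rising values across the columns of Table 2.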

Table 3. Performance of CFATM model variants under different ablation settings.

Variant	Ablation Strategy	w−F₁	R@5	R@10	R@15	R@20	R@25	R@30	R@35
CFATM	None	33.62	54.38	60.62	65.87	69.66	73.11	78.16	78.16
CFATM-NoGraph	Remove fault correlation graph	31.70	52.77	57.77	62.69	66.63	69.86	70.93	72.36
CFATM-NoTemporal	Remove LSTM and temporal attention	32.43	53.06	58.49	63.16	67.46	70.35	72.45	73.65
CFATM-NoAttnFusion	Replace attentive fusion with addition	32.21	53.42	57.70	62.50	66.94	70.37	73.94	75.78
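The size of each ablation effect can be quantified directly from Table 3. The short script below uses only the w−F₁ and R@10 values copied from the table and reports each variant's absolute drop (in percentage points) and relative drop (in percent) with respect to the full model; the dictionary layout and function name are illustrative choices.

```python
# w-F1 (%) and R@10 (%) values copied from Table 3
TABLE3 = {
    "CFATM": {"w-F1": 33.62, "R@10": 60.62},
    "CFATM-NoGraph": {"w-F1": 31.70, "R@10": 57.77},
    "CFATM-NoTemporal": {"w-F1": 32.43, "R@10": 58.49},
    "CFATM-NoAttnFusion": {"w-F1": 32.21, "R@10": 57.70},
}


def ablation_drops(table, full="CFATM", metric="w-F1"):
    """Return {variant: (absolute drop in points, relative drop in %)}."""
    ref = table[full][metric]
    return {
        name: (round(ref - row[metric], 2), round(100 * (ref - row[metric]) / ref, 1))
        for name, row in table.items()
        if name != full
    }


for metric in ("w-F1", "R@10"):
    print(metric, ablation_drops(TABLE3, metric=metric))
```

By w−F₁, removing the fault-correlation graph is the most damaging change (about 5.7% relative), whereas at R@10 replacing attentive fusion with addition costs slightly more (about 4.8% versus 4.7%), so the ranking of the modules depends on the metric considered.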

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Jia, H.; Qian, D.; Chen, F.; Zhou, W. Collaborative Fusion Attention Mechanism for Vehicle Fault Prediction. Future Internet 2025, 17, 428. https://doi.org/10.3390/fi17090428

AMA Style

Jia H, Qian D, Chen F, Zhou W. Collaborative Fusion Attention Mechanism for Vehicle Fault Prediction. Future Internet. 2025; 17(9):428. https://doi.org/10.3390/fi17090428

Chicago/Turabian Style

Jia, Hong, Dalin Qian, Fanghua Chen, and Wei Zhou. 2025. "Collaborative Fusion Attention Mechanism for Vehicle Fault Prediction" Future Internet 17, no. 9: 428. https://doi.org/10.3390/fi17090428

APA Style

Jia, H., Qian, D., Chen, F., & Zhou, W. (2025). Collaborative Fusion Attention Mechanism for Vehicle Fault Prediction. Future Internet, 17(9), 428. https://doi.org/10.3390/fi17090428

