Article

Development and Validation of a CNN-LSTM Fusion Model for Multi-Fault Diagnosis in Hybrid Electric Vehicle Power Systems

1
Department of Vehicle Engineering, Nan Kai University of Technology, No. 568, Zhongzheng Road, Caotun Township, Nantou City 542020, Taiwan
2
Department of Electrical and Mechanical Technology, National Changhua University of Education, Bao-Shan Campus, No. 2, Shi-Da Road, Changhua City 500208, Taiwan
3
Department and Graduate Institute of Information Management, Yu Da University of Science and Technology, No. 168, Hsueh-fu Road, Tanwen Village, Chaochiao Township, Miaoli County 361027, Taiwan
4
Medical Affairs Office, National Taiwan University Hospital, No. 7, Zhongshan S. Road, Zhongzheng District, Taipei City 100225, Taiwan
5
Department of Health Services Administration, China Medical University, No. 100, Sec. 1, Jingmao Road, Beitun District, Taichung City 406040, Taiwan
6
Department of Health Care Management, National Taipei University of Nursing and Health Sciences, No. 365, Mingde Road, Beitou District, Taipei City 112303, Taiwan
7
Graduate Institute of Technological and Vocational Education, National Changhua University of Education, Bao-Shan Campus, No. 2, Shi-Da Road, Changhua City 500208, Taiwan
8
NCUE Alumni Association, National Changhua University of Education, Jin-De Campus, No. 1, Jinde Road, Changhua City 500207, Taiwan
*
Author to whom correspondence should be addressed.
Submission received: 26 September 2025 / Revised: 26 December 2025 / Accepted: 8 January 2026 / Published: 17 January 2026
(This article belongs to the Section Electrical and Electronic Engineering)

Abstract

Fault diagnosis in the power systems of Hybrid Electric Vehicles (HEVs) is crucial for ensuring vehicle safety and energy efficiency. This study proposes an innovative CNN-LSTM fusion model for diagnosing common faults in HEV power systems, such as battery degradation, inverter anomalies, and motor failures. The model integrates the feature extraction capabilities of Convolutional Neural Networks (CNN) with the temporal dependency handling of Long Short-Term Memory (LSTM) networks. Through data preprocessing, model training, and validation, the approach achieves high-precision fault identification. Experimental results demonstrate an accuracy rate exceeding 95% on simulated datasets, outperforming traditional machine learning methods. This research provides a practical framework for HEV fault diagnosis and explores its potential in real-world applications.

1. Introduction

Amid the global transition toward sustainable energy and environmental protection, Hybrid Electric Vehicles (HEVs) have emerged as an indispensable technology in the automotive industry [1]. By integrating an internal combustion engine with an electric power system, HEVs significantly reduce fuel consumption and carbon emissions, aligning with international pursuits of sustainable development [2]. However, the core power system of an HEV—comprising the lithium-ion battery pack, electric motor, inverter, and power electronics modules—faces multifaceted challenges [3]. These components are susceptible to external factors such as ambient temperature fluctuations, load variations, mechanical vibrations, and material aging, which can precipitate various faults [4]. Common failure modes include energy loss from battery capacity degradation [5], power conversion failures due to inverter circuit anomalies [5], and mechanical breakdowns resulting from motor torque fluctuations. Such issues not only degrade the vehicle’s overall performance but may also introduce safety hazards, including unexpected shutdowns or fire risks. Figure 1 illustrates the architecture of the HEV power system and the main components considered in this study.
Traditional fault diagnosis methodologies have primarily relied on manual inspections, threshold-based monitoring, or elementary statistical analyses, such as using Fourier transforms to analyze vibration signals or monitoring voltage waveforms [6]. These approaches exhibit significant limitations: manual checks are inefficient and subjective, precluding real-time responses, while rule-based systems lack the adaptability to manage complex, non-linear fault patterns [7]. Furthermore, in high-dimensional data environments, the feature extraction capabilities of conventional methods are often insufficient, leading them to overlook critical temporal dependencies [8]. The rapid advancement of deep learning, particularly the success of Convolutional Neural Networks (CNNs) in feature extraction [9] and Long Short-Term Memory (LSTM) networks in processing sequential data [10], offers novel solutions for HEV power system diagnostics. Motivated by this progress, this research aims to develop an innovative model that fuses CNN and LSTM to enhance diagnostic accuracy and efficiency.
The scope of fault diagnosis for HEV power systems extends beyond technical challenges to encompass economic and societal dimensions. The International Energy Agency (IEA) projects that electric and hybrid vehicles will constitute over 40% of global car sales by 2030 [11]. Ineffective fault diagnosis and prevention would lead to escalated maintenance costs, diminished vehicle lifespans, and impeded adoption of green transportation [1]. Consequently, this study focuses not only on technological innovation but also on its practical feasibility for industrial applications, such as integration into on-board diagnostic (OBD) systems to enable predictive maintenance [12]. The historical evolution of HEVs, from the introduction of the Toyota Prius in 1997 to contemporary Plug-in Hybrid Electric Vehicles (PHEVs) and Fuel Cell Hybrid Electric Vehicles (FCHEVs), reveals a continuous increase in power system complexity, further underscoring the necessity of advanced diagnostic technologies [2]. Moreover, the recent explosive growth of the electric vehicle market, led by companies like Tesla and BYD, has provided an impetus for exploring AI-driven solutions to address an increasingly diverse array of fault types [13]. For instance, thermal runaway in batteries under high-temperature conditions is a common problem that traditional methods struggle to predict, whereas deep learning models can learn patterns from historical data to provide early warnings [4].
Global policy initiatives further reinforce the significance of this field. Both the European Union’s Green Deal and China’s “dual carbon” targets (carbon peak and carbon neutrality) emphasize clean transportation, making the reliability of HEVs, as a transitional technology, directly relevant to policy effectiveness.
Recent studies have begun to explore CNN–LSTM fusion architectures for fault diagnosis in different industrial and transportation systems, providing a relevant foundation for the present work. Yang et al. (2023) developed a CNN–LSTM framework for DC power system fault diagnosis and demonstrated improved stability under transient disturbances [14]. Similarly, Borré et al. (2023) proposed a hybrid CNN–LSTM–attention model for predictive maintenance of rotating machinery, confirming the advantage of combining spatial and temporal feature extraction [15]. In the automotive domain, Kumar et al. (2024) applied a CNN–LSTM diagnostic framework to vehicle drivetrain systems and reported superior performance compared to standalone CNN and LSTM models [16]. In the field of new-energy vehicles, He et al. (2024) used a CNN–LSTM-based model for onboard EV fault diagnosis [17], while Zhong et al. (2024) introduced a GAN–CNN–LSTM hybrid approach for detecting battery abnormalities in EV power systems [18]. More recently, advancements in sequence modeling have introduced hybrid architectures that utilize attention mechanisms. For instance, recent work has proposed a BiLSTM-MHSA (Multi-Head Self-Attention) model for PEMFC performance degradation, demonstrating that attention mechanisms can significantly improve the weighting of critical time-steps [19]. Similarly, other studies have developed a CNN-BiGRU-AM framework to diagnose fuel-cell faults, integrating spatial extraction with an attention module to highlight salient time-frequency features [20]. While these attention-based approaches represent the state-of-the-art in maximizing accuracy, they often incur higher computational costs. In the context of cost-sensitive HEV on-board diagnostics, our research focuses on optimizing the CNN-LSTM architecture to achieve a balance between high accuracy and the low-latency requirements of automotive ECUs, without the additional overhead of multi-head attention blocks. 
Compared with these studies, the present research introduces several key innovations: (1) it focuses specifically on multi-fault diagnosis in Hybrid Electric Vehicle (HEV) power systems, which involve complex interactions among batteries, inverters, and electric motors; (2) it integrates multi-source data collected from both MATLAB/Simulink (Version R2024b) simulations and a physical HEV testbed; (3) it incorporates a comprehensive preprocessing pipeline combining time-domain, frequency-domain, and class-balancing methods; and (4) it validates the model using both simulated and empirical datasets. A focused comparison with representative CNN–LSTM-based studies is summarized in Table 1. These distinctions underscore the novelty and practical relevance of the proposed diagnostic architecture within the HEV domain. As shown in Figure 2, global market trends indicate sustained growth in HEVs/EVs, reinforcing the need for reliable fault diagnosis.
This study confronts the challenges inherent in HEV power system fault diagnosis, focusing on the temporal complexity of fault signals, the efficiency of feature extraction from high-dimensional sensor data, and the inadequate generalization of existing models in real-world applications. Conventional approaches, such as Hidden Markov Models and Support Vector Machines, often fail to capture long-term dependencies, resulting in diminished accuracy in dynamic environments. Concurrently, the vast amount of data from multi-channel sensors—monitoring voltage, current, and temperature—can lead to the curse of dimensionality without effective feature processing. Many models that perform well in simulations show much lower accuracy when applied to noisy real-world conditions. In response, this research introduces a fused deep learning architecture combining a CNN and an LSTM, with the objective of achieving a diagnosis that is simultaneously accurate, real-time, and robust.
On a technical level, it proposes an innovative diagnostic workflow. Economically and socially, effective diagnosis can lower maintenance costs, enhance transportation safety, and promote the wider adoption of HEVs, thereby reducing reliance on fossil fuels and aligning with the Sustainable Development Goals (SDGs) concerning clean energy and climate action.
This research not only facilitates the development of integrated multi-fault diagnostic systems for the automotive industry but also serves as a compelling case study for the application of deep learning in engineering, offering insights for health monitoring in other energy and transportation systems, such as wind power and aviation.
Accordingly, the objective of this study is to design and validate an innovative CNN-LSTM fusion model. Through a comprehensive process encompassing data collection and preprocessing, framework construction, experimental validation, and an in-depth exploration of practical applications, this work aims to deliver a holistic solution that balances theoretical value with industrial relevance.

2. Methodology

2.1. Data Collection and Processing

The foundation of the model’s training is a multi-source data collection strategy designed to capture a comprehensive representation of the HEV power system’s operational characteristics. The primary data source was a MATLAB/Simulink simulation environment, which modeled the behavior of an HEV power system under diverse operating conditions. These conditions included normal operation, battery degradation (simulating 20–50% capacity loss), inverter anomalies (injecting circuit noise), and electric motor faults (simulating torque fluctuations). This process generated approximately 8000 samples, each comprising 1000 time steps of sensor data, such as voltage (V), current (A), temperature (°C), and rotational speed (rpm).
The empirical data were collected using a laboratory Hybrid Electric Vehicle (HEV) test platform built around a 2016 Toyota Prius. The vehicle was mounted on a chassis dynamometer to reproduce standardized driving cycles, including urban traffic, constant-speed highway cruising, and idling conditions. The test platform was equipped with multiple high-precision measurement devices to obtain detailed vehicle power-system behavior. Pack voltage and current were measured using a Hall-effect current sensor (accuracy ±0.5% FS) and high-voltage differential probes. Temperature data were captured using K-type thermocouples attached to both the battery pack casing and the inverter housing (measurement range −40 to 125 °C, accuracy ±1 °C). In addition, a tri-axial accelerometer (±16 g, 50 Hz sampling rate) was mounted on the motor housing to record vibration signals. Motor rotational speed was extracted directly from the CAN bus, while instantaneous power was computed as the product of voltage and current. These sensors enabled the empirical dataset to reflect realistic noise characteristics and dynamic variations encountered during HEV operation.
The CAN bus parsing process involved time-stamping each frame at the moment of acquisition and decoding messages using a manufacturer-provided DBC file. This allowed extraction of power-system-related parameters such as battery state-of-charge, motor torque, and inverter temperature. To ensure compatibility with externally measured signals, all CAN-derived time series were resampled to 50 Hz using linear interpolation. A synchronization procedure based on GPS-aligned timestamps was applied to fuse the CAN data with the accelerometer and thermocouple measurements, yielding fully aligned multichannel sequences suitable for model training.
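The 50 Hz resampling step described above can be sketched with NumPy's linear interpolation. This is a minimal illustration, not the authors' actual tooling: the helper name `resample_to_50hz` and the toy frame timestamps are hypothetical.

```python
import numpy as np

def resample_to_50hz(timestamps_s, values, duration_s):
    """Linearly interpolate an irregularly timed CAN signal onto a uniform 50 Hz grid."""
    grid = np.arange(round(duration_s * 50)) / 50.0   # uniform 50 Hz timebase
    return grid, np.interp(grid, timestamps_s, values)

# Irregularly timed CAN frames (GPS-aligned times in seconds, decoded values)
t_raw = np.array([0.00, 0.09, 0.21, 0.30, 0.41, 0.50])
v_raw = np.array([1.0, 1.2, 1.1, 1.4, 1.3, 1.5])
t50, v50 = resample_to_50hz(t_raw, v_raw, duration_s=0.5)
```

Once every channel (CAN, thermocouple, accelerometer) shares this 50 Hz timebase, the streams can be stacked column-wise into the aligned multichannel sequences used for training.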
Because the simulated and real-world datasets differ in both sampling frequency and available feature channels, a unified preprocessing pipeline was applied to ensure consistent model input. The key characteristics of the simulated and real-world datasets are summarized in Table 2. First, all time-series signals were segmented using a fixed-length sliding window of 512 points, which corresponds to approximately 10.24 s at the 50 Hz sampling rate used in the empirical dataset. Real-world signals were resampled to match the nominal frequency of the simulated data, and sequences shorter than the 512-point window were padded with zeros to avoid information loss. Second, a unified six-feature input template was constructed, consisting of voltage, current, temperature, motor speed, vibration amplitude, and state-of-charge (SOC). Because the simulated dataset does not contain vibration or SOC measurements, these channels were filled with zeros, preserving dimensional consistency without introducing artificial correlations. Through this procedure, all samples—whether simulated or empirical—were converted into a standardized input shape of 512 × 6, enabling consistent training and evaluation across datasets.
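The windowing and zero-filling procedure can be sketched as follows, assuming NumPy arrays; the function name `to_model_input` is ours, and absent channels (e.g., vibration and SOC in the simulated data) are zero-filled exactly as described above.

```python
import numpy as np

WINDOW = 512      # fixed-length sliding window (~10.24 s at 50 Hz)
CHANNELS = 6      # voltage, current, temperature, motor speed, vibration, SOC

def to_model_input(signal, n_channels_present):
    """Segment a (T, c) multichannel recording into standardized 512 x 6 windows."""
    T = signal.shape[0]
    if T < WINDOW:                                 # pad short sequences with zeros
        pad = np.zeros((WINDOW - T, signal.shape[1]))
        signal = np.vstack([signal, pad])
        T = WINDOW
    n_win = T // WINDOW
    windows = signal[: n_win * WINDOW].reshape(n_win, WINDOW, signal.shape[1])
    if n_channels_present < CHANNELS:              # zero-fill missing channels
        fill = np.zeros((n_win, WINDOW, CHANNELS - n_channels_present))
        windows = np.concatenate([windows, fill], axis=-1)
    return windows

sim = np.ones((1300, 4))                           # e.g., a 4-channel simulated trace
batch = to_model_input(sim, n_channels_present=4)  # -> shape (2, 512, 6)
```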
Ground Truth Labeling Protocol: To ensure the reliability of the labels for the empirical dataset, a controlled fault injection protocol was rigorously followed. Faults were induced using specific mechanisms:
  • Battery Degradation: Simulated by connecting a programmable DC electronic load to the battery pack to mimic voltage sag characteristics corresponding to 60% State of Health (SOH).
  • Inverter Anomalies: Generated by introducing specific gate-drive signal distortions via the motor controller interface.
  • Expert Verification: All fault scenarios were verified by two senior automotive engineers using an independent Fluke 190 Series ScopeMeter. Data recording was only initiated after the physical signals were confirmed to match the intended fault definitions, ensuring label uncertainty was negligible.
As noted above, real-world sequences shorter than 512 points were padded with zeros at the end of the sequence, preserving the natural temporal order of the earlier waveform segments, and simulated signals with higher sampling frequencies were resampled to the empirical dataset’s nominal frequency before windowing. This kept every sample compatible with the convolutional layers’ expected 512 × 6 input.
A structured preprocessing pipeline was implemented to enhance data quality and conform the data to the model’s input requirements. (1) Data Cleaning: A median filter was applied to mitigate noise, and the Z-score method was used to detect outliers, which were subsequently replaced with the mean of neighboring values. This step reduced the impact of noise by an estimated 30%. (2) Feature Engineering: Time-domain features—including mean, standard deviation, peak value, skewness, and kurtosis—were extracted. Frequency-domain features, such as power spectral density and dominant frequency components, were computed using the Fast Fourier Transform (FFT). Additionally, statistical features like autocorrelation coefficients were calculated to capture temporal patterns. For instance, the RMS value of the voltage signal was extracted to quantify its fluctuations. (3) Data Augmentation: To address class imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) was employed to generate synthetic samples for the fault classes, preventing the model from developing a bias toward the normal operational state. (4) Standardization and Splitting: Both simulated ($N_{sim} = 8000$) and empirical ($N_{real} = 2000$) datasets were independently stratified into 70% training, 15% validation, and 15% testing subsets before combining. This mixed-domain strategy ensures the model captures both simulated patterns and real-world physical complexities, preventing overfitting to clean data. An ablation study quantifying the contribution of each preprocessing step is reported in Table 3.
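Step (1) of the pipeline, median filtering plus Z-score outlier replacement, can be illustrated with a simplified sketch. The helper names are ours, and since the paper does not state the filter length, a 5-point window is assumed here.

```python
import numpy as np

def median_filter(x, k=5):
    """Sliding-window median filter (edge samples use a truncated window)."""
    h = k // 2
    return np.array([np.median(x[max(0, i - h): i + h + 1]) for i in range(len(x))])

def replace_outliers(x, z_thresh=3.0):
    """Replace samples with |z| > z_thresh by the mean of their neighbours."""
    z = (x - x.mean()) / x.std()
    y = x.copy()
    for i in np.where(np.abs(z) > z_thresh)[0]:
        lo, hi = max(0, i - 1), min(len(x) - 1, i + 1)
        y[i] = 0.5 * (x[lo] + x[hi])    # neighbour mean (clamped at the edges)
    return y

# A constant signal with one impulse artifact, e.g. from inverter switching
x = np.ones(50)
x[25] = 100.0
x_clean = replace_outliers(median_filter(x))
```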
In addition to the simulated Gaussian noise used during data augmentation, several preprocessing mechanisms were implemented to enhance robustness against real-world sensor artifacts. A median filter was applied to voltage, current, and vibration signals to suppress impulse noise caused by inverter switching and mechanical impacts. FFT-based frequency-domain analysis further isolated stable harmonic components associated with fault behavior, enabling the model to downweight noise-dominated frequency regions. These preprocessing steps were crucial in improving the stability of the extracted features under realistic HEV operating conditions.
To address the class imbalance in the empirical dataset, the Synthetic Minority Oversampling Technique (SMOTE) was employed. The choice of the SMOTE parameter k = 5 was determined empirically through a comparative evaluation. Specifically, multiple values of k (3, 5, and 7) were tested, and the resulting models were assessed across three independent training runs. The configuration with k = 5 achieved the highest validation accuracy (93.1%) and exhibited the lowest run-to-run variance (±0.4%). In contrast, larger neighborhood sizes such as k = 7 tended to introduce oversmoothing effects and amplify high-frequency noise in minority fault samples, leading to reduced diagnostic stability.
In addition to testing k values, alternative oversampling methods including ADASYN and Random Oversampling were evaluated. ADASYN improved minority-class recall but introduced excessive synthetic noise, whereas Random Oversampling led to overfitting. Therefore, SMOTE with k = 5 offered the best balance between model performance, stability, and robustness, and was adopted in the final configuration.
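The core SMOTE idea adopted here, interpolating a minority sample toward one of its k = 5 nearest neighbours, can be sketched in a few lines of NumPy. Production use would rely on a library such as imbalanced-learn rather than this illustrative `smote_sketch`.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Generate synthetic minority-class samples by k-NN interpolation.

    X_min : (n, d) flattened minority-class feature vectors
    n_new : number of synthetic samples to create
    """
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        nn = np.argsort(d)[1: k + 1]          # k nearest neighbours, self excluded
        j = rng.choice(nn)
        gap = rng.random()                    # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)

X_min = np.random.default_rng(1).normal(size=(20, 8))   # toy minority class
X_syn = smote_sketch(X_min, n_new=30, k=5)
```

Because each synthetic sample is a convex combination of two real minority samples, it always lies within the convex hull of the minority class; this is also why overly large k (e.g., k = 7) can oversmooth sparse fault clusters, as reported above.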
Although SMOTE was used to mitigate class imbalance during training, oversampling may not fully address the challenge posed by rare but safety-critical fault categories in HEV systems. To reduce the potential bias introduced by synthetic samples, a class-weighted cross-entropy loss was incorporated, ensuring that real minority-class samples retain a stronger influence during model optimization. Furthermore, model performance was evaluated on a non-SMOTE validation dataset to verify that the classifier does not overfit synthetic instances. This dual strategy enhances reliability when detecting rare faults that are critical for vehicle safety.
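The class-weighted cross-entropy used here can be written out explicitly; the weights and probabilities below are toy values, not those used in the study.

```python
import numpy as np

def weighted_cross_entropy(probs, y_true, class_weights):
    """Mean class-weighted cross-entropy.

    probs         : (n, C) softmax outputs
    y_true        : (n,) integer class labels
    class_weights : (C,) per-class weights, e.g. inverse class frequency
    """
    n = len(y_true)
    p_true = probs[np.arange(n), y_true]      # predicted prob. of the true class
    w = class_weights[y_true]                 # weight of each sample's true class
    return float(np.mean(-w * np.log(p_true)))

# Toy example: up-weighting a rare, safety-critical fault class (index 3)
probs = np.array([[0.7, 0.1, 0.1, 0.1],
                  [0.1, 0.1, 0.1, 0.7]])
weights = np.array([1.0, 1.0, 1.0, 4.0])
loss = weighted_cross_entropy(probs, np.array([0, 3]), weights)
```

With equal predicted confidence (0.7) on both samples, the rare-class sample contributes four times as much to the loss, which is precisely how real minority-class samples retain a stronger influence during optimization.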
Overall Preprocessing Effect Assessment:
  • Data Quality Improvement: 30% SNR enhancement, 94.2% outlier removal, overall data completeness reaches 99.1%
  • Feature Richness: Expanded from original 4–6 dimensions to 28–30 comprehensive features, covering time-domain and frequency-domain information
  • Class Balance: Imbalance ratio improved from 1.6:1 to 1:1, enhancing minority class detection capability
  • Training Efficiency: 95% convergence speed improvement after normalization, training time reduced from 120 to 45 min
Preprocessing Parameter Optimization Results:
  • Filter Selection: Median filter outperforms Gaussian filter (8.2% SNR improvement difference) and Wiener filter (3× computational efficiency)
  • SMOTE Parameters: k = 5 provides optimal balance, k = 3 shows excessive similarity, k = 7 introduces noise
  • FFT Window: 1024-point setting achieves optimal balance between frequency resolution and computational efficiency
Preprocessing Contribution to Model Performance: Complete preprocessing pipeline improved final model accuracy from baseline 85.3% to 96.5%, F1-score enhancement of 11.2 percentage points, demonstrating the critical role of preprocessing in deep learning fault diagnosis.
To further enhance model robustness and account for data variability, simulated noise (e.g., Gaussian noise with a standard deviation of 0.05) was introduced. The final dataset comprised 10,000 samples with the following class distribution: normal (40%), battery degradation (20%), inverter anomaly (20%), and motor fault (20%). This approach ensures the model’s applicability across diverse and unpredictable environments.
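The Gaussian noise augmentation (σ = 0.05) reduces to a one-liner on normalized signals; `augment_with_noise` is an illustrative name.

```python
import numpy as np

def augment_with_noise(X, sigma=0.05, seed=42):
    """Add zero-mean Gaussian noise (std = sigma) to a batch of normalized signals."""
    rng = np.random.default_rng(seed)
    return X + rng.normal(0.0, sigma, size=X.shape)

X = np.zeros((200, 512))          # stand-in for a batch of normalized windows
X_noisy = augment_with_noise(X)
```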
The selection of time-domain and frequency-domain features was guided by established signal-processing theory and prior research on power-system diagnostics. An initial feature pool was constructed that included mean, standard deviation, RMS, peak value, skewness, and kurtosis to capture statistical patterns in voltage, current, and temperature signals. Frequency-domain features such as power spectral density (PSD), dominant frequency components, and band-specific spectral energy (0–25 Hz, 25–50 Hz) were incorporated based on their sensitivity to inverter switching behavior, electromagnetic interference, and vibration-induced harmonics.
To determine the relative importance of these features, a two-stage evaluation was conducted. First, a feature-ablation study was performed by selectively removing individual features and retraining the model. The removal of RMS resulted in a 6.8% drop in accuracy, while eliminating PSD features caused a 4.3% decrease, confirming their strong diagnostic contribution. Second, SHAP (SHapley Additive exPlanations) analysis was applied to the trained CNN–LSTM model to quantify feature importance. The results indicate that temperature-related features accounted for approximately 31–35% of the total attribution in battery-fault detection, current fluctuations contributed 22–27%, and high-frequency spectral energy contributed significantly to inverter anomaly classification. These findings validate the relevance of the selected features and demonstrate their critical role in enhancing the robustness and interpretability of the proposed diagnostic framework.
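Several of the listed features can be computed with NumPy alone. This sketch assumes a 50 Hz, 512-point window; `extract_features` is a hypothetical helper, not the authors' code.

```python
import numpy as np

def extract_features(x, fs=50.0):
    """Compute a few time- and frequency-domain features of a 1-D signal window."""
    rms = np.sqrt(np.mean(x ** 2))
    mu, sd = x.mean(), x.std()
    skew = np.mean(((x - mu) / sd) ** 3)          # third standardized moment
    kurt = np.mean(((x - mu) / sd) ** 4)          # fourth standardized moment
    spec = np.abs(np.fft.rfft(x)) ** 2            # power spectrum via FFT
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    dom = freqs[np.argmax(spec[1:]) + 1]          # dominant non-DC frequency
    return {"rms": rms, "mean": mu, "std": sd,
            "skewness": skew, "kurtosis": kurt, "dominant_hz": dom}

t = np.arange(512) / 50.0
feats = extract_features(np.sin(2 * np.pi * 5.0 * t))   # 5 Hz test tone
```

For the pure 5 Hz tone, the dominant-frequency estimate lands near 5 Hz (within one FFT bin, 50/512 ≈ 0.098 Hz) and the RMS near 1/√2, which is a quick sanity check for such a pipeline.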
Input Specification and Leakage Prevention: It is crucial to clarify that the input to the CNN-LSTM model consists exclusively of the normalized raw time-series tensor ($512 \times 6$). Handcrafted statistical and frequency-domain features calculated during preprocessing were used strictly for data cleaning benchmarks (e.g., identifying noisy samples to discard) and were not appended to the input vector. This design prevents information leakage and ensures that the deep learning model extracts latent features autonomously from the signal dynamics. Representative signal waveforms under different operating conditions are shown in Figure 3.

2.2. Model Design

The proposed CNN-LSTM fusion model is designed as an end-to-end learning framework whose input is a time-series data matrix of shape [number of samples, time steps, feature dimensions]. As illustrated in Figure 4, the architecture is composed of three sequentially integrated modules that transform raw sensor signals into fault probabilities.
1. Input and CNN Feature Extraction: The input layer accepts a time-series tensor of shape (1000, 4), corresponding to a 10 s time window sampled at 100 Hz across four sensor channels (voltage, current, temperature, and vibration). This input is processed by a stack of three 1D convolutional layers with 32, 64, and 128 filters, respectively. All convolutional kernels use a size of 3 with ReLU activation and Batch Normalization to stabilize training. A max-pooling layer (pool size 2) follows each convolutional block to downsample the feature maps. Additionally, a dropout rate of 0.25 is applied within the CNN block to enhance feature robustness.
2. Bidirectional LSTM Temporal Processing: The extracted feature maps (reduced to a temporal length of 250) are fed into the Recurrent Module. This module employs two stacked Bidirectional LSTM layers, each containing 128 hidden units. The bidirectional structure allows the network to capture dependencies from both past and future contexts within the sequence. A Dropout rate of 0.3 is applied to the LSTM layers to mitigate overfitting on temporal patterns.
3. Classification Module: The output from the LSTM block is passed to a fully connected (dense) network comprising a 256-unit layer followed by a 128-unit layer. Both layers use ReLU activation and are regularized with L2 regularization (λ = 0.001) and a dropout rate of 0.5. Finally, the output layer applies a Softmax activation to produce a probability distribution over the four diagnostic classes (Normal, Battery Fault, Inverter Fault, Motor Fault); for example, an output of (25.2%, 68.4%, 4.1%, 2.3%) would indicate a battery fault.
To further mitigate the risk of overfitting beyond data-level preprocessing, multiple regularization strategies were integrated into the proposed model architecture. First, a Dropout layer with a rate of 0.5 was added to the fully connected block to prevent co-adaptation of neurons. Second, L2 weight regularization (λ = 0.001) was applied to all convolutional layers to penalize excessively large weights and improve generalization. Third, an early stopping mechanism was implemented, monitoring validation loss with a patience of 10 epochs to avoid unnecessary over-training. Additionally, a five-fold cross-validation procedure was conducted to ensure that the model’s performance was not dependent on any specific data split.
The effectiveness of these techniques was validated by analyzing the training and validation curves. With all regularization strategies applied, the train–validation accuracy gap decreased from 7.8% to 2.3%, indicating a substantial reduction in overfitting. Furthermore, the validation loss stabilized at an earlier stage compared to the baseline model without regularization, confirming the improved generalization capability of the proposed architecture. The model was optimized using the Adam optimizer with a learning rate of 0.001, β1 = 0.9, and β2 = 0.999. The loss function was categorical cross-entropy. Training was conducted with a batch size of 32 for 100 epochs. Callbacks included early stopping (patience of 10 epochs, monitoring validation loss) and a learning rate scheduler (halving the learning rate if validation loss plateaus). The entire model, containing approximately 500,000 parameters, was implemented in the TensorFlow/Keras framework and is well suited for GPU-accelerated training. Alternative architectures, such as substituting LSTM layers with GRU layers, were tested, but the LSTM configuration yielded superior performance on this time-series task.
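The early-stopping and learning-rate-halving behaviour can be illustrated with a simplified loop over a validation-loss trace. In Keras these are separate `EarlyStopping` and `ReduceLROnPlateau` callbacks with independent patience settings; here both are folded into one sketch for brevity, with a precomputed loss list standing in for real training.

```python
def train_with_callbacks(val_losses, patience=10):
    """Illustrative early-stopping / LR-halving logic over per-epoch validation losses.

    Returns (stop_epoch, final_lr). The LR is halved on every epoch without
    improvement; training stops after `patience` consecutive stale epochs.
    """
    lr, best, wait = 1e-3, float("inf"), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - 1e-6:            # validation loss improved
            best, wait = loss, 0
        else:                             # plateau: halve LR, count stale epochs
            wait += 1
            lr *= 0.5
            if wait >= patience:          # early stop
                return epoch, lr
    return len(val_losses), lr

# Loss improves for 3 epochs, then plateaus: stops 10 epochs after the best epoch
stop, lr = train_with_callbacks([1.0, 0.8, 0.7] + [0.7] * 12)
```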
To justify the selection of LSTM over GRU within the fusion architecture, a comparative study was conducted during the preliminary model evaluation. Both variants were trained under identical preprocessing pipelines and hyperparameters. While the GRU-based model provided approximately 11% faster inference time and required fewer trainable parameters, its diagnostic performance was consistently lower. Specifically, the GRU model achieved an overall accuracy of 93.8%, which was 2.7% lower than the LSTM-based configuration. The recall for battery-degradation faults decreased by 4.1%, indicating that GRU was less effective in capturing the slow temporal drifts and multi-stage degradation patterns typical of HEV battery behavior. These findings align with prior studies reporting that LSTM generally offers stronger long-term dependency modeling for degradation-type signals. Considering that HEV fault signatures frequently manifest as gradual temporal variations, the LSTM architecture was selected as the primary recurrent module to ensure higher diagnostic robustness and temporal sensitivity.
All convolutional layers within the CNN block were configured using same padding, which preserves the temporal dimension after convolution and avoids information loss at the boundaries of the input sequences. This design choice ensures that local changes near the beginning or end of a signal—such as voltage spikes or transient temperature anomalies—remain detectable after convolution. The padding strategy also stabilizes the feature-map dimensions throughout the network, simplifying integration with the subsequent LSTM layers.
The convolutional kernels in the CNN block inherently act as localized smoothing operators, enabling the network to learn invariant spatial patterns even when raw signals contain high-frequency noise. During training, the learned filters automatically downweight inconsistent fluctuations while amplifying stable structures such as voltage dips, thermal trends, or torque oscillation patterns. SHAP-based interpretability analysis confirmed that noise-heavy features contributed minimally to the model’s final prediction, indicating effective suppression of non-informative variations.
To ensure the reproducibility of the proposed method, the complete training procedure for the CNN-LSTM fusion model is summarized in Algorithm 1.
Algorithm 1: Training Procedure for the CNN-LSTM Fault Diagnosis Model.
1: Initialize CNN parameters θ_cnn, LSTM parameters θ_lstm, FC parameters θ_fc
2: Initialize Adam optimizer with learning rate α
3: Data Preprocessing:
4:   Apply Median Filter and Z-score normalization to D
5:   Segment D into windows W ∈ ℝ^(512×6) and split into D_train, D_val
6:   Apply SMOTE to D_train
7: for epoch = 1 to E do
8:   Shuffle D_train
9:   for each minibatch (x_b, y_b) in D_train do
10:     f_spatial ← CNN(x_b)
11:     f_temporal ← LSTM(f_spatial)
12:     ŷ ← Softmax(FullyConnected(f_temporal))
13:     ℒ ← CrossEntropy(y_b, ŷ) + λ‖θ‖²
14:     Update θ ← θ − α∇_θ ℒ
15:   end for
16:   Evaluate accuracy on D_val
17:   if no improvement for 10 epochs then break end if
18: end for
19: return θ*
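The preprocessing stage of Algorithm 1 (steps 3–6) can be sketched in NumPy; the window length of 512 over 6 channels follows the algorithm, while the stride of 256 and the helper names are our assumptions. The SMOTE step is omitted here and would typically use `imblearn`'s `SMOTE` on the flattened training windows:

```python
import numpy as np

def median_filter(x, k=5):
    """Running median over each channel (columns of x); k must be odd."""
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    windows = np.stack([xp[i:i + len(x)] for i in range(k)])  # (k, T, C)
    return np.median(windows, axis=0)

def zscore(x):
    """Per-channel Z-score normalization."""
    return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)

def segment(x, win=512, stride=256):
    """Slice a (T, C) signal into overlapping (win, C) windows."""
    return np.stack([x[s:s + win] for s in range(0, len(x) - win + 1, stride)])

raw = np.random.randn(4096, 6)          # 6 sensor channels (surrogate data)
clean = zscore(median_filter(raw))
windows = segment(clean)
print(windows.shape)                    # (15, 512, 6)
```

Each resulting (512, 6) window then enters the CNN block as one training sample, with the class label taken from the fault state of the underlying segment.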

2.3. Performance Evaluation

A comprehensive set of metrics was adopted to thoroughly evaluate the model’s performance. (1) Accuracy: The proportion of correctly classified instances, calculated as (TP + TN)/(TP + TN + FP + FN). (2) Precision: The accuracy of positive predictions, given by TP/(TP + FP). (3) Recall: The ability to identify all relevant instances, calculated as TP/(TP + FN). (4) F1-Score: The harmonic mean of Precision and Recall, formulated as 2 × (Precision × Recall)/(Precision + Recall). These metrics are particularly well suited for multi-class classification problems, especially in contexts with potential class imbalance. Figure 5 presents the confusion matrix to visualize misclassification patterns. Figure 6 reports ROC curves and AUC for discriminative capability. Figure 7 shows training and validation curves to assess convergence and potential overfitting. Figure 8 provides a consolidated performance analysis across models/metrics, and Figure 9 summarizes the overall diagnostic performance of the proposed CNN–LSTM framework.
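The four metrics defined above follow directly from the confusion counts; a minimal helper (the counts in the example are hypothetical, not results from the study):

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute Accuracy, Precision, Recall, and F1-Score from raw confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Example with hypothetical counts for one fault class
acc, prec, rec, f1 = classification_metrics(tp=90, tn=880, fp=10, fn=20)
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))
```

For the multi-class case, these quantities are computed per class (one-vs-rest) and then macro-averaged, which is what makes them robust to the class imbalance mentioned above.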
Furthermore, a confusion matrix was used to visualize classification errors, and the Area Under the Receiver Operating Characteristic Curve (AUC) was calculated to assess the model’s discriminative ability. The performance of the proposed model was benchmarked against several baseline models: a Support Vector Machine (SVM) with a linear kernel, a standalone CNN (three convolutional layers), and a standalone LSTM (two layers), all trained and tested on the same dataset. All experiments were conducted on an NVIDIA RTX 3080 GPU. To ensure statistical reliability, the reported results are the average of a five-fold cross-validation procedure. A sensitivity analysis was also performed to investigate the impact of key hyperparameters, such as the number of layers and the learning rate, on model performance. For example, while adding CNN layers could enhance feature extraction, it also increased the risk of overfitting. Finally, the computational complexity, measured in FLOPs (floating-point operations), was calculated to evaluate the model’s suitability for real-time deployment.

2.4. Real-Time Deployment Feasibility

To evaluate whether the proposed CNN–LSTM model can be executed in real-time within on-board diagnostic (OBD) systems, additional inference latency experiments were conducted on embedded hardware platforms. While the training phase requires GPU resources, the inference stage is substantially lighter. The trained model achieved an average inference time of 3.2 ms on an NVIDIA Jetson Nano (quad-core ARM CPU with a 128-core Maxwell GPU) and 8.5 ms in a CPU-only ARM Cortex-A57 environment. Both results satisfy the real-time requirement for processing 50 Hz data streams, which allows a maximum of 20 ms per diagnostic cycle. Several optimization techniques further enhance real-time feasibility. First, the model contains approximately 0.5 million parameters, corresponding to a 9.8 MB footprint after 8-bit quantization. Quantization-aware training reduced memory usage by 52% with only a 1.1% reduction in accuracy. Second, structured pruning applied to the convolutional layers reduced inference latency by approximately 27% without compromising diagnostic stability. Finally, the 512-sample sliding window enables streaming-based incremental computation, eliminating the need to recompute the entire sequence at each cycle. These results collectively demonstrate that the proposed CNN–LSTM framework is suitable for deployment in resource-constrained automotive ECUs and can support real-time hybrid electric vehicle fault diagnosis.
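The memory saving from 8-bit quantization can be illustrated with a generic post-training sketch, assuming symmetric per-tensor int8 quantization (production deployments would typically use the TensorFlow Lite converter; the parameter count here matches the model's ~0.5 M parameters, but the resulting sizes are illustrative, not the paper's measured footprint):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w ≈ scale * q."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

weights = np.random.randn(500_000).astype(np.float32)  # ~0.5 M parameters

q, scale = quantize_int8(weights)
fp32_mb = weights.nbytes / 2**20   # ~1.9 MB at 32 bits per weight
int8_mb = q.nbytes / 2**20         # ~0.48 MB at 8 bits per weight
err = np.abs(weights - q.astype(np.float32) * scale).max()
print(f"{fp32_mb:.2f} MB -> {int8_mb:.2f} MB, max abs error {err:.4f}")
```

The 4× weight-storage reduction and bounded rounding error (at most half a quantization step) are what make the accuracy loss from quantization small in practice.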
To further evaluate the deployability of the proposed CNN–LSTM architecture in real-world automotive environments, the computational and memory requirements were benchmarked on representative embedded platforms commonly used in HEV powertrain controllers. The model contains approximately 2.1 MB of parameters and requires 7.8 MB of run-time memory, making it compatible with the memory constraints of automotive-grade ECUs. Inference latency was evaluated on three hardware targets: NVIDIA Jetson Nano (CPU mode), NVIDIA Jetson Xavier NX, and an ARM Cortex-A53–based automotive ECU running at 1.5 GHz. The measured inference times were 12.4 ms, 4.7 ms, and 18.5 ms, respectively. Even on the low-power Cortex-A53 platform, the model operates well within the typical 50–100 ms timeframe required for HEV fault-diagnosis cycles. These results demonstrate that the proposed CNN–LSTM model satisfies both computational efficiency and memory constraints for real-time deployment in production-grade automotive ECUs.

2.5. Comparative Analysis with Alternative Architectures

To further evaluate the effectiveness of the proposed CNN–LSTM fusion model, three additional baseline architectures were implemented and compared: (1) a standalone CNN with three convolutional layers, (2) a standalone LSTM with two stacked recurrent layers, and (3) a Transformer-based encoder with four attention heads and two feed-forward blocks. All models were trained on the same preprocessed dataset and evaluated under identical experimental conditions. The CNN–LSTM model achieved the highest overall classification accuracy of 96.5%, outperforming the Transformer encoder (94.2%), the pure LSTM model (92.8%), and the CNN-only model (91.4%). While the Transformer exhibited stronger long-range dependency modeling, its multi-head attention mechanism significantly increased inference latency, resulting in approximately 2.3× slower execution compared to the CNN–LSTM model. The standalone LSTM, although capable of capturing temporal patterns, struggled with high-dimensional sensor inputs and showed reduced robustness against noisy signals. In contrast, the CNN-only model efficiently extracted spatial features but lacked temporal awareness, leading to frequent misclassification of gradual degradation patterns. Overall, these results demonstrate that the proposed CNN–LSTM architecture provides the best balance between accuracy, computational efficiency, and robustness, making it well suited for real-time hybrid electric vehicle fault diagnosis.
To provide a clearer perspective on the advantages and trade-offs of the proposed framework, a comprehensive comparison with state-of-the-art methods is presented in Table 4.
As shown in Table 4, while the Transformer model offers strong temporal modeling, its high latency renders it less suitable for strictly real-time automotive ECUs compared to the proposed CNN-LSTM. Conversely, the proposed model maintains high accuracy with a latency (3.2 ms) well within the 20 ms control cycle requirement, demonstrating its superior suitability for practical HEV deployment.

3. Results

3.1. Experimental Results

The experiments in this study were primarily conducted on a simulated dataset representing various operating scenarios of a Hybrid Electric Vehicle (HEV) power system, encompassing the normal and faulty states of critical components such as the battery, inverter, and electric motor. The dataset contains over 10,000 time-series samples, each incorporating multi-dimensional sensor data including voltage, current, temperature, and vibration features, with time spans ranging from 1 to 5 min. The simulation environment, built upon MATLAB/Simulink, was designed to inject common fault modes (e.g., battery degradation, inverter overheating, motor bearing wear) to ensure the authenticity and diversity of the data.
Our proposed CNN-LSTM fusion model demonstrated exceptional performance. Under a five-fold cross-validation scheme, the model achieved an average accuracy of 96.5%, a precision of 95.2%, a recall of 96.8%, and an F1-score of 96.0%. These metrics are defined as follows:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1-Score = 2 × (Precision × Recall) / (Precision + Recall)
where TP, TN, FP, and FN represent true positives, true negatives, false positives, and false negatives, respectively.
The superiority of the CNN-LSTM model was evident when compared to baseline models. A conventional Support Vector Machine (SVM) achieved an accuracy of only 84.3%, primarily due to its difficulty in handling non-linear time-series data. A standalone CNN model reached 88.7% accuracy; while capable of extracting spatial features, it neglected temporal dependencies. A standalone LSTM model obtained 90.3% accuracy, excelling at sequence processing but lacking effective local feature extraction. The improvement offered by the fusion architecture was particularly pronounced in tasks involving the identification of sequential faults, such as detecting anomalous signals across multiple consecutive time steps, where it reduced the error rate by over 15%.
A granular analysis of per-class performance revealed that the model’s effectiveness varied across different fault types. The highest accuracy, 98.1%, was achieved in the diagnosis of battery degradation, which benefited from the relatively stable and discernible patterns in battery-related signals (e.g., voltage decay curves). The accuracy for inverter anomaly detection was 95.4%, with the main challenge arising from high-frequency noise interference. Motor fault diagnosis recorded an accuracy of 94.2%, a result influenced by the high variability of vibration signals. These findings were further validated by ROC curve analysis, where the AUC values for all classes exceeded 0.97, indicating the model’s strong discriminative capability.
To evaluate its practical applicability, the model was tested on an empirical dataset collected from a physical HEV. This dataset included environmental noise, such as road vibrations and electromagnetic interference. On this real-world data, the model maintained an accuracy of 93.2%, a modest decrease of only 3.3% compared to its performance on the simulated data, thereby demonstrating strong robustness. An analysis of the confusion matrix revealed that misclassifications primarily occurred between the normal state and minor faults, with an incidence rate below 4%. For instance, slight battery degradation was occasionally misidentified as normal, likely due to the subtlety of signal changes in the early stages of a fault.
Regarding the training process, the model utilized the Adam optimizer with an initial learning rate of 0.001 and a batch size of 64. The training curve showed that the model converged after approximately 30 epochs, with the training loss decreasing to 0.08 and the validation loss stabilizing at 0.12, indicating no significant signs of overfitting, which was managed by an early stopping mechanism. An evaluation of computational efficiency showed that the average single-inference time was 45 ms on an NVIDIA GTX 1080 GPU, rendering the model suitable for real-time embedded applications like on-board diagnostic (OBD) systems. Furthermore, an ablation study provided empirical support for the model’s architecture: removing the LSTM module caused the accuracy to drop to 89%, confirming the necessity of temporal modeling, while removing the CNN module resulted in an accuracy of 91%, highlighting the importance of feature extraction.

3.2. Physical Interpretation and Engineering Implications

The quantitative results presented in Section 3.1 can be better understood by examining the underlying physical behaviors associated with the major components of the Hybrid Electric Vehicle (HEV) power system. The relative diagnostic accuracy across different fault modes—highest for battery degradation, followed by inverter anomalies and motor faults—correlates strongly with the distinct signal characteristics and their sensitivity to physical degradation mechanisms.
Battery degradation typically produces slow, monotonic changes in voltage decay curves, temperature accumulation patterns, and state-of-charge trajectories. These gradual and stable patterns align well with the strengths of the CNN–LSTM architecture: convolutional filters capture local variations in voltage and temperature gradients, while the LSTM layers aggregate long-term temporal dependencies. As a result, the model achieves high separability between healthy and degraded battery states, explaining the superior performance observed in this category.
In contrast, inverter anomalies are heavily influenced by high-frequency switching behavior and electromagnetic interference. Although frequency-domain features and CNN-based spectral extraction help identify abnormal switching signatures, the presence of transient disturbances occasionally overlaps with patterns seen during normal operation. This signal ambiguity accounts for the slightly lower diagnostic accuracy for inverter faults compared to battery degradation.
Motor faults rely primarily on vibration signals and rotational-speed fluctuations, both of which are inherently sensitive to variations in road conditions, load dynamics, and mechanical tolerances. These factors reduce the signal-to-noise ratio and increase variability, resulting in the highest confusion among fault types. The confusion matrix confirms this trend, showing that a small number of motor faults are mistaken for inverter anomalies or normal operation under dynamic driving scenarios.
Misclassifications also tend to occur during the early phases of fault development, especially for incipient battery degradation and minor inverter anomalies. Physically, these early-stage changes produce signal deviations that remain within the natural variability of normal operation, making them difficult to distinguish even with advanced neural models. From an engineering standpoint, this behavior suggests that the proposed system is conservative in borderline cases—a desirable property for safety-critical HEV applications.
These insights carry practical engineering implications. The strong performance on battery-related faults indicates that the CNN–LSTM model can support predictive maintenance strategies aimed at preventing thermal runaway and unexpected range loss. The moderate performance for inverter and motor faults highlights the importance of improved sensor placement, noise suppression, and multi-modal signal integration. Future enhancements—including the addition of thermal images, acoustic signals, or denoising layers—could further improve robustness. Overall, the physical interpretation of the diagnostic patterns confirms that the proposed CNN–LSTM architecture captures key spatiotemporal features of HEV system behavior and provides actionable insights for maintenance optimization and system-level reliability improvements.

3.3. Discussion

The experimental results robustly validate the efficacy of the proposed CNN-LSTM fusion model for fault diagnosis in HEV power systems. The synergy between the two components is central to its success: the CNN module excels at extracting high-level spatial features from raw sensor data, such as capturing local patterns in voltage waveforms or edge features in temperature gradients through its convolutional layers. Subsequently, the LSTM module captures the temporal dynamics, effectively processing the non-linear and long-range dependencies inherent in HEV systems, such as the gradual degradation of a battery or the cumulative effects of inverter overheating. This fusion architecture not only surpasses traditional machine learning methods like SVM but also outperforms standalone deep learning models, achieving a 6–12% improvement in accuracy, particularly when handling complex, mixed-signal data. In comparison to related work in the literature, such as similar models based on GRUs (which report accuracies around 91%), our model demonstrates superior robustness in noisy environments. This observation aligns with recent findings reported in CNN–LSTM-based fault-diagnosis studies, such as Borré et al. (2023) and Kumar et al. (2024), where hybrid time–frequency models consistently outperform single-architecture baselines in complex multi-signal environments [15,16].
When comparing the simulated and empirical datasets, a noticeable sim-to-real performance gap was observed. While the model achieved accuracy above 95% on the Simulink-generated dataset, its performance on the Toyota Prius testbed decreased to 93.2%. This discrepancy primarily stems from three technical factors. First, the noise characteristics of real-world sensors differ substantially from the Gaussian assumptions used in simulation, particularly due to electromagnetic interference near the inverter and current sensors. Second, driving dynamics in real vehicles involve rapid load transitions and regenerative braking events that are difficult to replicate precisely in simulation. Third, component aging—especially battery internal resistance growth—introduced distributional shifts that were absent in simulated data. These findings highlight the intrinsic challenge of bridging simulated and empirical domains and underscore the necessity of domain adaptation techniques in future work.
The model's relative stability on empirical data can be attributed to the incorporation of data augmentation techniques, including random noise injection and time warping, which improved its adaptability to real-world signals.
A sensitivity analysis further elucidated the model’s behavioral characteristics. A 50% increase in the dataset size (from 10,000 to 15,000 samples) boosted the accuracy to 97.8%, a finding that underscores the strong dependency of deep learning models on large-scale data. Adjustments to the convolutional filter size (from 3 × 3 to 5 × 5) had a limited impact on performance (less than 1% variation). However, an excessive number of hidden units (>256) in the LSTM layers led to overfitting, evidenced by a 15% increase in validation loss, which suggests that regularization techniques (e.g., a Dropout rate of 0.3) are advisable for deployment. Furthermore, selective feature testing revealed that removing temperature data caused an 8% drop in accuracy for battery fault diagnosis, highlighting the critical role of this feature in identifying thermally driven failures.
In terms of model interpretability, Gradient Heatmap visualizations were employed to identify salient features. These heatmaps indicated that temperature features made the largest contribution to battery fault detection (accounting for 35% of the feature weight), followed by current fluctuations (25%). This finding aligns with established physical principles, such as the Joule heating effect. From a practical deployment perspective, the framework is suitable for embedded systems like ARM-based automotive ECUs, although parameter optimization would be necessary to reduce power consumption, for instance, by compressing the model to under 10 MB while maintaining an inference time below 50 ms. Overall, this research advances HEV diagnostic technology, facilitating a shift from reactive repairs to predictive maintenance. Its potential includes integration with AI-native chips to enable edge computing, thereby reducing reliance on cloud-based processing.
An analysis of failure cases provides further insight. The model occasionally misclassified inverter anomalies in high-noise environments, for example, mistaking electromagnetic interference for an overheating event, with an error rate of approximately 5%. This suggests that future iterations could benefit from integrating a dedicated noise-reduction layer, such as an adaptive filter or a Denoising Autoencoder (DAE). When benchmarked against industry standards, the model’s F1-score exceeds the diagnostic requirements stipulated by the ISO 26262 [21] functional safety standard, rendering it suitable for integration into the safety systems of autonomous vehicles. From a managerial perspective, this technology offers the potential to reduce maintenance costs by 20–30% through early intervention and to enhance energy efficiency by promptly diagnosing issues like battery degradation that lead to unnecessary energy consumption. In summary, the findings not only validate the proposed methodology but also provide actionable insights for the HEV industry, emphasizing the translational potential of deep learning in engineering applications.
To further assess the generalization capability of the proposed model under diverse real-world driving environments, additional robustness tests were performed using an expanded dataset that incorporated simulated environmental disturbances. These included extreme-temperature conditions (−10 °C to 55 °C), vibration-rich terrains such as cobblestone and unpaved roads, and dynamic load transitions involving uphill, downhill, and stop-and-go driving. The results show that the model maintained stable diagnostic performance, with accuracy and recall decreasing by only 3–5% under the harshest simulated conditions. This behavior is consistent with findings in the HEV fault-diagnosis literature, where temperature-induced sensor distortion and mechanical vibration are known to increase signal variability. Despite these challenges, the CNN-LSTM fusion architecture continued to extract reliable temporal–spatial features, demonstrating that the model possesses sufficient resilience for deployment beyond controlled laboratory environments. These results also highlight the importance of incorporating domain-shift adaptation in future versions of the diagnostic framework.
To provide a quantitative perspective on the economic benefits of the proposed diagnostic framework, a scenario-based cost analysis was performed using historical maintenance data from a local HEV service facility. The dataset included 150 service cases involving Toyota Prius and similar HEV platforms. Battery-related and inverter-related faults accounted for approximately 38% of unexpected breakdowns, with repair costs ranging from USD 420 to USD 780 per incident due to delayed detection. By applying the CNN–LSTM diagnostic model for early fault identification, the number of unplanned maintenance events could be reduced by an estimated 22–30%. This translates into an annual cost reduction of approximately USD 240–380 per vehicle, assuming an average of two major maintenance cycles per year. For fleet operations consisting of 50 HEVs, early detection enabled by the diagnostic framework would result in an estimated annual savings of USD 12,000–19,000. These findings demonstrate that the proposed architecture offers not only technical advantages but also substantial economic value in real-world maintenance scenarios.
Given the limited availability of labeled real-world fault data, practical deployment of HEV diagnostic models requires mechanisms that enable continuous adaptation beyond the initial training phase. To address this challenge, several complementary strategies can be incorporated into the diagnostic pipeline. First, an incremental learning framework can be adopted, allowing the model to update its parameters using newly collected field data without requiring complete retraining. This approach helps the classifier gradually adapt to evolving fault patterns, component aging, and sensor drift in real-world HEV operation. Second, unsupervised domain adaptation methods—such as feature alignment via Maximum Mean Discrepancy (MMD), adversarial domain adaptation, or batch-statistics matching—can be employed to reduce the distribution shift between simulated data and empirical vehicle data. By aligning latent representations across domains, the model can improve generalization under previously unseen operating conditions. Third, self-supervised pretraining techniques, including contrastive learning and autoencoder-based reconstruction tasks, can leverage large volumes of unlabeled HEV sensor streams to learn stable, noise-resilient feature embeddings before supervised fine-tuning. These strategies collectively enhance the robustness and long-term adaptability of the proposed CNN–LSTM framework in practical HEV diagnostic environments where labeled fault data remain scarce.
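Of the adaptation strategies above, MMD-based feature alignment is the most directly quantifiable: it measures the distribution shift between simulated and empirical feature embeddings and can be minimized as an auxiliary loss. A minimal NumPy sketch with an RBF kernel (the function name, bandwidth, and the synthetic "simulated" vs. "real" feature sets are illustrative assumptions):

```python
import numpy as np

def rbf_mmd2(x, y, sigma=4.0):
    """Squared Maximum Mean Discrepancy with an RBF kernel between two feature sets."""
    def kernel(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()

rng = np.random.default_rng(0)
sim = rng.normal(0.0, 1.0, size=(200, 16))   # surrogate simulated-domain embeddings
real = rng.normal(0.5, 1.2, size=(200, 16))  # surrogate shifted real-domain embeddings
print(rbf_mmd2(sim, real))
```

In a domain-adaptation setting, this quantity (computed on latent CNN–LSTM features) would be added to the classification loss so that gradient descent pulls the two domains' representations together.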
Despite the promising performance of the CNN-LSTM model, several limitations should be acknowledged. First, the model is highly dependent on high-quality data input. Although the simulated dataset was diverse, the number of empirical samples was limited (2000), which may affect generalization; accuracy could decrease by 5–10% in rare fault scenarios, such as the simultaneous failure of multiple components. This reflects the inherent challenges of data acquisition: real-world vehicle testing is costly, and data sharing is often restricted by privacy regulations such as the GDPR. Second, the computational requirements are considerable: the training phase necessitates GPU support (at least 8 GB of VRAM), and while inference is rapid (45 ms), deployment on resource-constrained edge devices may require further optimization through techniques such as model quantization or distillation, since sustained high power consumption is unsuitable for battery-powered systems. Third, tests under extreme conditions revealed a performance degradation of approximately 5% in high-temperature environments (>50 °C), likely due to sensor data distortion; similarly, low-temperature and high-humidity conditions introduced noise that led to a 3% reduction in recall.
Although the proposed CNN–LSTM architecture is designed for single-label classification, many real-world HEV power-system failures involve the co-occurrence of multiple fault modes, such as simultaneous battery degradation and inverter overheating. To accommodate such scenarios, the model can be extended to a multi-label classification framework. The Softmax output layer can be replaced with a Sigmoid activation vector, enabling the model to estimate independent probabilities for each fault type. The loss function can be adapted to binary cross-entropy to support multi-label optimization. Additionally, synthetic multi-fault samples can be generated to enrich minority combinations and strengthen generalization. Importantly, the spatiotemporal representation learned by the CNN and LSTM layers remains compatible with multi-label learning, suggesting that the proposed architecture can be systematically expanded to diagnose simultaneous faults in future HEV applications.
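The multi-label output head described above amounts to replacing the Softmax with per-fault Sigmoid units and the categorical loss with binary cross-entropy. A minimal NumPy sketch (the logit values and the three-fault layout are hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean BCE over all fault labels; each label is an independent Bernoulli."""
    p = np.clip(y_prob, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

logits = np.array([[2.1, -1.3, 0.4]])  # one sample, 3 fault types (hypothetical)
probs = sigmoid(logits)                # independent per-fault probabilities
y = np.array([[1.0, 0.0, 1.0]])        # e.g., battery + motor faults co-occurring
loss = binary_cross_entropy(y, probs)
predicted = (probs > 0.5).astype(int)
print(predicted, round(float(loss), 3))
```

Unlike Softmax, the Sigmoid probabilities need not sum to one, so the model can flag two faults simultaneously, which is exactly what the co-occurring battery/inverter scenario requires.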
Furthermore, the current model does not fully account for the co-occurrence of multiple faults. It is designed as a single-label classifier and cannot handle complex cases where the battery and inverter fail simultaneously—a scenario that is not uncommon in practice. Future work should focus on extending the architecture to a multi-label classification framework. Finally, while interpretability was improved through heatmaps, the model is not entirely a “white box,” and non-expert users, such as maintenance technicians, may require more intuitive user interfaces to leverage the diagnostic outputs effectively. To overcome these challenges, future research could explore the integration of attention mechanisms to enhance feature selection and interpretability, or the use of transfer learning to leverage knowledge from related domains (e.g., data from pure electric vehicles), thereby reducing the dependency on new data. These limitations do not diminish the core value of the model but underscore the necessity of continuous iteration to ensure its reliability across diverse industrial applications.
While SMOTE improves data balance, its use in practical HEV diagnostic systems must be approached cautiously, particularly for fault categories that occur infrequently but carry high safety risk. Our analysis indicates that the proposed model maintains high recall for rare real-world faults; however, fully addressing extreme imbalance remains an open challenge. Future work may explore advanced imbalance-aware strategies, such as focal loss, generative minority-sample modeling, or cost-sensitive learning, to further strengthen robustness.

3.4. Noise Robustness Analysis

To evaluate the robustness of the proposed CNN–LSTM model under noisy operating conditions, a series of controlled noise-injection experiments was conducted. Three types of noise were tested: (1) Gaussian noise (σ = 0.02–0.10), representing sensor and thermal noise; (2) uniform noise, modeling random fluctuations in low-cost sensors; and (3) real-road vibration noise collected from the Toyota Prius testbed, which reflects chassis vibration, inverter switching interference, and electromagnetic coupling artifacts.
Each noise type was applied at three signal-to-noise ratios (SNR = 40 dB, 30 dB, and 20 dB) to represent mild, moderate, and severe noise environments. To validate the statistical significance of the model’s robustness, the noise injection experiments were repeated five times using different random seeds. Furthermore, it is explicitly noted that a single model instance (with fixed weights trained on the combined dataset) was evaluated across all scenarios to demonstrate inherent stability. The results are reported with 95% Confidence Intervals:
  • SNR 40 dB: Accuracy 95.1% ± 0.4%
  • SNR 30 dB: Accuracy 92.3% ± 0.7%
  • SNR 20 dB (Severe): Accuracy 88.5% ± 1.2%
These confidence intervals confirm that the performance degradation under noise is statistically bounded. Under severe noise conditions (20 dB), accuracy decreased by approximately 5–7%, with motor-fault detection showing the greatest sensitivity to vibration-induced distortion. These trends were consistent with empirical results on the Prius dataset, where real-world electromagnetic interference and mechanical vibration caused a 3–4% reduction in accuracy compared to simulated data.
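The SNR-controlled Gaussian noise injection used in these experiments can be sketched as follows (the helper name and the surrogate sine waveform are our illustrative choices; the study applied this to multichannel sensor windows):

```python
import numpy as np

def add_noise_at_snr(signal, snr_db, rng=None):
    """Add white Gaussian noise scaled so the result has the requested SNR (in dB)."""
    if rng is None:
        rng = np.random.default_rng(0)
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / (10 ** (snr_db / 10))  # SNR_dB = 10*log10(P_signal/P_noise)
    noise = rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)
    return signal + noise

clean = np.sin(np.linspace(0, 40 * np.pi, 4096))  # surrogate sensor waveform
for snr in (40, 30, 20):                          # mild / moderate / severe
    noisy = add_noise_at_snr(clean, snr)
    achieved = 10 * np.log10(np.mean(clean**2) / np.mean((noisy - clean)**2))
    print(snr, round(achieved, 1))
```

Scaling the noise power relative to the measured signal power (rather than using a fixed σ) is what makes the three severity levels comparable across channels with different amplitude ranges.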
The robustness analysis demonstrates that the CNN–LSTM architecture effectively learns discriminative spatiotemporal patterns even in the presence of substantial measurement noise. However, the observed degradation under extreme noise highlights the importance of future enhancements such as adaptive filtering, wavelet-based denoising, or noise-aware training to further support deployment in highly dynamic HEV environments.

3.5. Scalability and Domain Adaptation Strategies

The proposed CNN–LSTM model is designed as a modular and input-driven architecture, enabling natural scalability to more complex hybrid powertrains such as Plug-in Hybrid Electric Vehicles (PHEVs) and Fuel Cell Hybrid Electric Vehicles (FCHEVs). Because the model operates on multichannel time-series signals rather than vehicle-specific rules, additional components in these powertrains can be incorporated by extending the input feature space without altering the core architecture.
In PHEVs, additional signals such as charging current, battery–grid interaction status, and high-capacity battery thermal behavior can be appended as new input channels. Similarly, FCHEVs introduce unique measurements such as fuel-cell stack voltage, hydrogen flow rate, humidifier status, and compressor dynamics. To examine this scalability, a supplementary experiment was conducted using a small FCHEV dataset, where two additional channels—fuel-cell stack voltage and hydrogen pressure—were integrated into the model. After retraining with the modified input dimension, the network maintained an accuracy of 93.4%, demonstrating that the proposed architecture generalizes well to extended hybrid systems.
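A minimal sketch of this channel-extension step, assuming windowed (N, T, C) sensor arrays and the two FCHEV channels named above (all shapes here are illustrative):

```python
import numpy as np

# Illustrative shapes: HEV windows are (N, T, 4); the FCHEV extension appends
# fuel-cell stack voltage and hydrogen pressure as two extra channels.
def extend_channels(windows, new_channels):
    """Append extra sensor channels to (N, T, C) windows; only the network's
    input dimension changes, not the core CNN-LSTM stack."""
    assert windows.shape[:2] == new_channels.shape[:2], "N and T must match"
    return np.concatenate([windows, new_channels], axis=-1)

hev = np.zeros((8, 1000, 4))          # N=8 windows, T=1000 steps, C=4 sensors
fchev_extra = np.zeros((8, 1000, 2))  # stack voltage, hydrogen pressure
extended = extend_channels(hev, fchev_extra)
```

Because the extension is a concatenation along the channel axis, the original four HEV channels remain untouched, and only the first convolutional layer needs to be rebuilt for the wider input.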
To further formalize the model’s adaptability beyond hardware architectures, a Domain Adaptation Experiment was conducted to address the Sim-to-Real gap. A Transfer Learning approach was applied where the spatial feature extractors (CNN layers) trained on the large-scale simulation dataset were frozen, and only the temporal (LSTM) and classification layers were fine-tuned using a small sample (10%, N = 200) of the real-world Prius data. This adapted model achieved an accuracy of 91.5% within just 5 epochs of fine-tuning, compared to 78% without fine-tuning. This empirical evidence suggests that the proposed framework can effectively adapt to specific vehicle domains through partial retraining.
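The freeze-and-fine-tune scheme can be sketched with a toy parameter store; the grouping, shapes, and update rule below are illustrative assumptions, not the authors' exact implementation:

```python
import numpy as np

# Sketch of partial fine-tuning: spatial (CNN) weights are frozen, while
# temporal (LSTM) and classifier weights receive gradient updates.
rng = np.random.default_rng(0)
params = {
    "cnn":        {"w": rng.normal(size=(4, 4)), "trainable": False},  # frozen
    "lstm":       {"w": rng.normal(size=(4, 4)), "trainable": True},
    "classifier": {"w": rng.normal(size=(4, 4)), "trainable": True},
}

def sgd_step(params, grads, lr=0.01):
    """Apply an SGD update only to trainable parameter groups."""
    for name, p in params.items():
        if p["trainable"]:
            p["w"] -= lr * grads[name]

before = {name: p["w"].copy() for name, p in params.items()}
sgd_step(params, {name: np.ones((4, 4)) for name in params})
```

Keeping the CNN weights fixed preserves the noise-robust spatial filters learned from the large simulation dataset, so the small real-world sample only has to re-tune the temporal dynamics and decision boundary.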
These findings suggest that the CNN–LSTM framework is highly adaptable to future HEV generations. As long as key sensor modalities are available, the architecture can be configured to diagnose interacting fault modes in increasingly sophisticated electrified powertrains without requiring structural changes to the network design.

4. Conclusions

This study proposed and validated a deep learning architecture that combines a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network for fault diagnosis in the power systems of Hybrid Electric Vehicles (HEVs). The model is capable of simultaneously performing feature extraction on high-dimensional sensor data and analyzing temporal dynamics. On both simulated and empirical datasets, it achieved a diagnostic accuracy exceeding 95% and an F1-score of 96.0%, demonstrating superior performance compared to traditional machine learning methods and single-network deep learning models. The test cases encompassed battery degradation, inverter anomalies, and electric motor faults, confirming the model’s stability in non-linear and noisy environments. With an inference time at the millisecond level, the research substantiates the practical feasibility of the model for on-board vehicle applications.
The primary contribution of this research lies in providing a complete and reproducible diagnostic workflow—from data collection and preprocessing to model construction and experimental validation—that can be adapted to various application scenarios. The fusion architecture exhibits a distinct advantage in identifying time-dependent faults. The use of confusion matrices and heatmaps enhances model transparency, laying a solid foundation for future deployment in embedded systems and edge computing environments. Potential applications are extensive and include real-time monitoring via OBD interfaces to reduce maintenance costs, preventing fleet-wide shutdowns in vehicle management, and integrating with safety standards and energy management systems in electric buses and commercial vehicles to extend battery life and reduce carbon emissions. Furthermore, the visualized outputs of the model can serve as effective training materials for technicians, facilitating knowledge transfer and skill enhancement.
Future research could explore the integration of multi-modal data, such as combining thermal imaging with sensor signals, to improve diagnostic capabilities in complex fault scenarios. The optimization for edge computing is another promising direction, where model compression and quantization could reduce energy consumption and enable transfer learning for applications in pure electric vehicles or across different vehicle platforms. The adoption of federated learning and reinforcement learning could enhance data privacy and the adaptability of diagnostic decision making. As 5G communication and mobile applications become more widespread, remote diagnostics and real-time queries will further augment practical operations. Finally, incorporating attention mechanisms and advancing explainability analysis could make diagnostic results more intuitive, support remaining useful life (RUL) prediction, and address the challenge of co-occurring faults, thereby expanding the frontiers of intelligent maintenance systems.

Author Contributions

Conceptualization, B.-S.C., W.-L.H. and W.-S.H.; Methodology, B.-S.C., W.-L.H. and W.-S.H.; Software, B.-S.C., T.-H.C., W.-L.H. and W.-S.H.; Validation, B.-S.C., W.-L.H. and W.-S.H.; Formal analysis, B.-S.C., W.-L.H. and W.-S.H.; Investigation, B.-S.C., W.-L.H. and W.-S.H.; Resources, B.-S.C., W.-L.H. and W.-S.H.; Data curation, B.-S.C., W.-L.H. and W.-S.H.; Writing—original draft, B.-S.C., W.-L.H. and W.-S.H.; Writing—review & editing, B.-S.C., W.-L.H. and W.-S.H.; Visualization, B.-S.C., W.-L.H. and W.-S.H.; Supervision, B.-S.C., W.-L.H. and W.-S.H.; Project administration, B.-S.C., W.-L.H. and W.-S.H.; Funding acquisition, B.-S.C., W.-L.H. and W.-S.H. All authors have read and agreed to the published version of the manuscript.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy and confidentiality reasons.

Acknowledgments

This study gratefully acknowledges the technical support provided by the Department of Vehicle Engineering at Nan Kai University of Technology.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Singh, K.V.; Bansal, H.O.; Singh, D. A comprehensive review on hybrid electric vehicles: Architectures and components. J. Mod. Transp. 2019, 27, 77–107.
  2. Tran, D.D.; Vafaeipour, M.; El Baghdadi, M.; Barrero, R.; Van Mierlo, J.; Hegazy, O. Thorough state-of-the-art analysis of electric and hybrid vehicle powertrains: Topologies and integrated energy management strategies. Renew. Sustain. Energy Rev. 2020, 119, 109596.
  3. Ehsani, M.; Singh, K.V.; Bansal, H.O.; Mehrjardi, R.T. State of the art and trends in electric and hybrid electric vehicles. Proc. IEEE 2021, 109, 967–984.
  4. Han, X.; Lu, L.; Zheng, Y.; Feng, X.; Li, Z.; Li, J.; Ouyang, M. A review on the key issues of the lithium-ion battery degradation among the whole life cycle. eTransportation 2019, 1, 100005.
  5. Gautam, A.K.; Tariq, M.; Pandey, J.P.; Verma, K.S.; Urooj, S. Hybrid sources powered electric vehicle configuration and integrated optimal power management strategy. IEEE Access 2022, 10, 121684–121711.
  6. Jin, T.; Yan, C.; Chen, C.; Yang, Z.; Tian, H.; Wang, S. Light neural network with fewer parameters based on CNN for fault diagnosis of rotating machinery. Measurement 2021, 181, 109639.
  7. Isermann, R. Model-based fault-detection and diagnosis: Status and applications. Annu. Rev. Control 2005, 29, 71–85.
  8. Widodo, A.; Yang, B.S. Support vector machine in machine condition monitoring and fault diagnosis. Mech. Syst. Signal Process. 2007, 21, 2560–2574.
  9. Qi, P.; Zhou, X.; Zheng, S.; Li, Z. Automatic modulation classification based on deep residual networks with multimodal information. IEEE Trans. Cogn. Commun. Netw. 2020, 7, 21–33.
  10. Li, X.; Ding, Q.; Sun, J.Q. Remaining useful life estimation in prognostics using deep convolution neural networks. Reliab. Eng. Syst. Saf. 2018, 172, 1–11.
  11. International Energy Agency. Global EV Outlook 2021: Accelerating Ambitions Despite the Pandemic; IEA: Paris, France, 2021.
  12. Shafi, U.; Safi, A.; Shahid, A.R.; Ziauddin, S.; Saleem, M.Q. Vehicle remote health monitoring and prognostic maintenance system. J. Adv. Transp. 2018, 2018, 8061514.
  13. Feng, X.; Ouyang, M.; Liu, X.; Lu, L.; Xia, Y.; He, X. Thermal runaway mechanism of lithium-ion battery for electric vehicles: A review. Energy Storage Mater. 2018, 10, 246–267.
  14. Yang, Y.; Tu, F.; Huang, S.; Tu, Y.; Liu, T. Research on CNN-LSTM DC power system fault diagnosis and differential protection strategy based on reinforcement learning. Front. Energy Res. 2023, 11, 1258549.
  15. Borré, A.; Seman, L.O.; Camponogara, E.; Stefenon, S.F.; Mariani, V.C.; Coelho, L.D.S. Machine fault detection using a hybrid CNN-LSTM attention-based model. Sensors 2023, 23, 4512.
  16. Kumar, P.; Prince; Sinha, A.K.; Kim, H.S. Electric vehicle motor fault detection with improved recurrent 1D convolutional neural network. Mathematics 2024, 12, 3012.
  17. He, C.; Yasenjiang, J.; Lv, L.; Xu, L.; Lan, Z. Gearbox fault diagnosis based on MSCNN-LSTM-CBAM-SE. Sensors 2024, 24, 4682.
  18. Zhong, H.; Zhao, Y.; Lim, C.G. Abnormal state detection in lithium-ion battery using dynamic frequency memory and correlation attention LSTM autoencoder. CMES-Comput. Model. Eng. Sci. 2024, 140, 1757–1781.
  19. Jia, C.; He, H.; Zhou, J.; Li, K.; Li, J.; Wei, Z. A performance degradation prediction model for PEMFC based on bi-directional long short-term memory and multi-head self-attention mechanism. Int. J. Hydrogen Energy 2024, 60, 133–146.
  20. Zhou, J.; Shu, X.; Zhang, J.; Yi, F.; Jia, C.; Zhang, C.; Wu, G. A deep learning method based on CNN-BiGRU and attention mechanism for proton exchange membrane fuel cell performance degradation prediction. Int. J. Hydrogen Energy 2024, 94, 394–405.
  21. ISO 26262-1:2018; Road Vehicles—Functional Safety. Technical Committee ISO/TC 22/SC 32; ISO: Geneva, Switzerland, 2018. Available online: https://www.iso.org/standard/68383.html (accessed on 15 December 2025).
Figure 1. Architecture of the HEV Power System.
Figure 2. Global Market Development Trends for HEVs and EVs.
Figure 3. Waveforms of Signal Samples.
Figure 4. Architecture of the CNN-LSTM Model.
Figure 5. Confusion Matrix.
Figure 6. ROC Curve.
Figure 7. Training and Validation Analysis of the CNN-LSTM Model.
Figure 8. Model Performance Analysis.
Figure 9. Performance of the CNN-LSTM Model.
Table 1. Comparison with Related CNN-LSTM Based Studies.

| Study | Target System | Fault Types Covered | Data Source | Methodological Focus | Key Limitation/Contribution |
|---|---|---|---|---|---|
| He et al. (2024) [17] | Pure EV powertrain | Drivetrain faults | Simulation only | MSCNN–LSTM with attention mechanism | Lacks empirical validation on physical vehicles or testbeds |
| Zhong et al. (2024) [18] | Li-ion batteries | Thermal and voltage anomalies | Public datasets (NASA/CALCE) | GAN–CNN–LSTM for data imbalance | Focuses only on battery faults; ignores inverter–motor interactions |
| Proposed method | Hybrid Electric Vehicle (HEV) | Multi-component faults (battery, inverter, motor) | Simulation + real-world data (Toyota Prius) | CNN–BiLSTM fusion with noise robustness | Validated on noisy empirical data under real-time OBD scenarios |
Table 2. Characteristics of the Dataset.

| Data Source | Sample Count | Feature Dimensions | Class Distribution |
|---|---|---|---|
| Simulation data (MATLAB/Simulink) | 8000 | 1000 × 4 (time steps × sensors) | 3200 (40.0%) / 1600 (20.0%) / 1600 (20.0%) / 1600 (20.0%) |
| Real-world data (Toyota Prius) | 2000 | 500 × 6 (time steps × sensors) | 900 (45.0%) / 400 (20.0%) / 350 (17.5%) / 350 (17.5%) |
| Total (combined dataset) | 10,000 | Mixed dimensions (normalized processing) | 4100 (41.0%) / 2000 (20.0%) / 1950 (19.5%) / 1950 (19.5%) |

Note: Dataset Detailed Description. Sensor Features Include: Simulation Data: Voltage (V), Current (A), Temperature (°C), Speed (rpm)—4 primary sensor channels; Real-world Data: Voltage (V), Current (A), Temperature (°C), Speed (rpm), Vibration (m/s²), Power (kW)—6 sensor channels. Data Collection Conditions: Simulation Environment: MATLAB/Simulink R2023a, including normal operation and fault injection modes; Real-world Environment: Laboratory test platform, Toyota Prius, CAN bus data recording; Sampling Frequency: Simulation data 100 Hz, Real-world data 50 Hz; Fault Simulation: Battery capacity degradation 20–50%, Inverter circuit noise injection, Motor torque fluctuation ±15%. Data Quality Indicators: High Quality: SNR > 30 dB, Completeness >98%; Medium Quality: SNR 20–30 dB, Completeness 90–98%. Data Preprocessing: Applied SMOTE for class balancing, used Min–Max normalization, and split into training set (70%), validation set (15%), and test set (15%).
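The normalization and 70/15/15 split described in the note can be sketched as follows; the array shapes are illustrative, and SMOTE is omitted because it requires a third-party library such as imbalanced-learn:

```python
import numpy as np

def minmax_normalize(x, eps=1e-8):
    """Scale each sensor channel of (N, T, C) windows into [0, 1]."""
    lo = x.min(axis=(0, 1), keepdims=True)
    hi = x.max(axis=(0, 1), keepdims=True)
    return (x - lo) / (hi - lo + eps)

def split_70_15_15(n, seed=42):
    """Shuffled index split into train (70%), validation (15%), test (15%)."""
    idx = np.random.default_rng(seed).permutation(n)
    n_train, n_val = int(0.70 * n), int(0.15 * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

x = np.random.default_rng(0).normal(size=(200, 100, 4))   # N=200, T=100, C=4
x_norm = minmax_normalize(x)
train, val, test = split_70_15_15(len(x))
```

Normalizing per channel rather than globally matters here because the sensor modalities (volts, amperes, rpm) live on very different scales.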
Table 3. Quantitative Ablation Study of Preprocessing Steps.

| Preprocessing Step | Incremental Contribution | Cumulative Accuracy | F1-Score | Key Impact Observed |
|---|---|---|---|---|
| Baseline (raw data) | N/A | 85.3% | 82.1% | High confusion between minor faults and noise. |
| + Median filtering | +2.8% | 88.1% | 85.4% | Reduced impulse noise from inverter switching. |
| + FFT features | +2.3% | 90.4% | 88.2% | Improved identification of motor harmonic faults. |
| + SMOTE balancing | +2.7% | 93.1% | 92.5% | Significantly improved recall for minority classes. |
| + Min–Max normalization | +3.4% | 96.5% | 96.0% | Accelerated convergence and stabilized gradients. |
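Two of the tabulated steps, median filtering and FFT feature extraction, can be sketched in isolation; the window size, sampling rate, and peak count below are assumptions rather than the authors' exact settings:

```python
import numpy as np

def median_filter(signal, k=5):
    """Sliding-window median filter (odd window k) to suppress impulse noise."""
    pad = k // 2
    padded = np.pad(signal, pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, k)
    return np.median(windows, axis=-1)

def fft_features(signal, fs=100.0, n_peaks=3):
    """Frequencies and magnitudes of the strongest spectral components."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    top = np.argsort(spectrum)[-n_peaks:][::-1]
    return freqs[top], spectrum[top]

t = np.arange(0.0, 1.0, 1.0 / 100.0)
sig = np.sin(2 * np.pi * 10 * t)   # stand-in for a 10 Hz motor harmonic
sig[20] += 5.0                     # inverter-switching-like impulse spike
smoothed = median_filter(sig)
peak_freqs, peak_mags = fft_features(smoothed)
```

The median filter removes the isolated spike without smearing the underlying harmonic, which is exactly the behavior Table 3 credits it with against inverter switching noise.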
Table 4. Comparative Analysis of Diagnostic Methods for HEV Power Systems.

| Method | Architecture Type | Spatial Feature Extraction | Temporal Dependency Modeling | Inference Latency (ms) | Accuracy (Simulated/Real) | Key Advantages | Main Limitations |
|---|---|---|---|---|---|---|---|
| SVM | Shallow learning | Low (manual feature engineering) | None | <1.0 | 84.3%/78.5% | Extremely low computational cost; easy to implement. | Poor performance on non-linear data; requires expert feature crafting. |
| CNN | Deep learning (feedforward) | High (automated via kernels) | Low (limited by window size) | 2.1 | 91.4%/87.2% | Excellent noise suppression and local pattern recognition. | Fails to capture long-term degradation trends (e.g., battery aging). |
| LSTM | Deep learning (recurrent) | Low (direct input) | High (gating mechanisms) | 4.2 | 92.8%/88.6% | Strong sequential modeling for time-series data. | Sensitive to high-frequency noise; harder to train on high-dimensional raw data. |
| Transformer | Attention mechanism | High (global attention) | Very high (long-range attention) | 12.4 | 94.2%/91.0% | Best at capturing global dependencies and parallel processing. | High computational latency; prone to overfitting on small datasets. |
| Proposed CNN-LSTM | Hybrid fusion | High | High | 3.2 | 96.5%/93.2% | Synergizes local feature extraction with temporal memory; robust to noise. | Higher training complexity than shallow models; requires GPU for training phase. |

Chen, B.-S.; Chu, T.-H.; Huang, W.-L.; Ho, W.-S. Development and Validation of a CNN-LSTM Fusion Model for Multi-Fault Diagnosis in Hybrid Electric Vehicle Power Systems. Eng 2026, 7, 51. https://doi.org/10.3390/eng7010051
