Novel End-to-End CNN Approach for Fault Diagnosis in Electromechanical Systems Based on Relevant Heating Areas in Thermography

Alvarado-Robles, Gilberto; Perez-Cruz, Angel; Espinosa-Vizcaino, Isac Andres; Jaen-Cuellar, Arturo Yosimar; Saucedo-Dorantes, Juan Jose

doi:10.3390/technologies13120551

Open AccessArticle

Novel End-to-End CNN Approach for Fault Diagnosis in Electromechanical Systems Based on Relevant Heating Areas in Thermography

by

Gilberto Alvarado-Robles

¹

,

Angel Perez-Cruz

^1,2,

Isac Andres Espinosa-Vizcaino

¹

,

Arturo Yosimar Jaen-Cuellar

¹

and

Juan Jose Saucedo-Dorantes

^1,2,*

¹

Engineering Faculty, San Juan del Rio Campus, Autonomous University of Queretaro, Rio Moctezuma 249, San Juan del Rio 76807, Mexico

²

C.A. Mechanical and Automotive Systems Applied to the Management of Conventional and Alternative Energies (UAQ-CA-155), Autonomous University of Queretaro, San Juan del Rio 76806, Mexico

^*

Author to whom correspondence should be addressed.

Technologies 2025, 13(12), 551; https://doi.org/10.3390/technologies13120551

Submission received: 20 September 2025 / Revised: 19 November 2025 / Accepted: 24 November 2025 / Published: 26 November 2025

(This article belongs to the Special Issue Image Analysis and Processing)

Download

Browse Figures

Versions Notes

Abstract

The reliability of electromechanical systems is a critical factor in modern Industry 4.0, as unexpected failures in induction motors or gearboxes can cause costly downtime, productivity losses, and increased maintenance demands. Infrared thermography offers a non-invasive and real-time means of monitoring thermal behavior, yet its effective use for fault diagnosis remains challenging due to sensitivity to noise, environmental variability, and the need for robust feature extraction. This work proposes a novel end-to-end convolutional neural network (CNN) methodology for detecting and classifying faults in electromechanical systems through the processing of infrared thermography images. The method integrates an automatic preprocessing stage that isolates the Relevant Heating Areas (RHAs), preserving their geometric and thermal descriptors while filtering irrelevant background information. A tailored data augmentation strategy, including controlled noise injection, was designed to improve robustness under realistic acquisition conditions. The CNN architecture combines 3 × 3 and 5 × 5 kernels to capture both fine-grained and global heating patterns. Experimental validation is carried out under nine different faulty conditions, achieving 99.7% accuracy and demonstrating strong resilience against Gaussian blur and additive Gaussian noise. The results suggest that the method provides a scalable, interpretable, and efficient approach for fault diagnosis in electromechanical systems within Industry 4.0 environments.

Keywords:

computer vision; image processing; thermography; fault detection; induction motor; gearbox

1. Introduction

Industry 4.0 is changing the manner in which industrial processes are designed, monitored and optimized [1]. In this context, the reliability of electromechanical systems and their components, such as electric motors, gearboxes, shafts, pumps, and couplings, among others, has become a relevant concern [2], since unexpected faults can lead to unplanned downtime, production losses and increased maintenance costs. To face this challenge, modern condition monitoring approaches are developed and based on the use of advanced data acquisition systems [3] that may include different physical magnitudes such as acoustic [4,5], vibration [6,7], electrical [8,9], thermographic [10,11] and/or multi-sensor [12,13] information. Moreover, recent advances have proven that these data sources enable more accurate and scalable fault diagnosis when combined with artificial intelligence and machine/deep learning techniques (ML, DL) [14]. Nevertheless, achieving reliable fault detection and classification in such systems remains a challenging task, mainly due to the complexity of signals, the variability of operating conditions, and the need for robust models capable of generalizing across different scenarios [15].

The above-mentioned physical magnitudes represent feasible tools to be used in the condition assessment of induction motors (IMs) and gearboxes (GBs). In fact, vibration-based fault diagnosis is the most well-known and widely used strategy for this purpose [16,17,18], even for assessing any rotating machine. However, the installation of accelerometers or vibration sensors is invasive and often impractical in systems with compact or intricate configurations, which complicates the mounting process. Furthermore, vibration sensors are primarily used as sensitive sensors in the monitoring of mechanical faults such as unbalance, misalignment, bearing defects, and broken rotor bars, while incipient gearbox wear or localized issues in complex assemblies may be less evident. Likewise, acoustic-based methods offer a non-invasive alternative [19,20,21], as they use microphones and do not require prior installation; nevertheless, their accuracy is strongly affected by noisy environments. Electric-based methods can also be slightly invasive or non-invasive and are capable of detecting multiple faults [22,23,24], but they are limited to punctual electrical issues or faults that introduce effects in electrical signatures; however, their robustness can decrease when the fault assessment is performed under variable load conditions. These limitations are often addressed by multi-input approaches, which correlate signals obtained from two or more sources [25,26,27]. For instance, the fusion of vibrations and stator current signatures has been accepted as a reliable combination that leads to accurate diagnosis. Thus, such strategies can effectively compensate for the individual disadvantages of each method, leading to more reliable fault detection and diagnosis; however, they also increase implementation costs, computational requirements, and system complexity, which may restrict their practical applicability. Although these data sources are widely used in electromechanical fault detection, each sensing system presents specific limitations related to installation, noise sensitivity, or dependency on operating conditions. In this context, thermographic analysis through image processing has emerged as a valuable alternative, since it provides a non-invasive and global view of the thermal behavior of machines, which is directly linked to fault progression.

Infrared thermography (IRT) offers significant advantages for the early detection of faults, providing a non-invasive and real-time assessment of temperature variations in electromechanical systems. Since IRT is commonly represented through images, deep learning (DL) networks have enabled the development of numerous studies leveraging this type of sensing [28,29]. In fact, numerous approaches usually focus on the numerical calculation of statistical features and textural characteristics; for instance, in [30], statistical and GLCM features are extracted from thermal images, the most relevant ones are selected using SVM-RFE, and faults are classified with SVM and k-NN; this approach relies exclusively on numerical statistical and GLCM features extracted from the thermal images, without explicitly considering the spatial distribution of heat or visual patterns of thermal anomalies within the image. On the other hand, some studies perform the selection of the region of interest (ROI) through specific coordinates or manual cropping [31,32,33], but this task is a critical issue that complicates end-to-end automation. Some research proposes thermal image-based fault diagnosis for single-phase IMs and transformers using semantic segmentation and CNNs [34]. Accordingly, fault regions are automatically segmented, and multiple CNN models classify the images, achieving very high accuracy. However, the main disadvantage is the high computational cost and the requirement of large, labeled datasets to train the segmentation and classification models. Despite the advances in thermal image-based fault diagnosis, current approaches still face challenges in real-world applications. Manual or coordinate-based ROI selection complicates end-to-end automation, while statistical feature-based methods may miss subtle spatial patterns of heat. Meanwhile, CNN-based approaches capture complex patterns but require labeled datasets and high computational costs. Additionally, infrared thermography is sensitive to environmental conditions and load variations, which can affect reliability [35]. These limitations highlight the need for robust, efficient, and interpretable fault detection methods.

Finally, regarding new research areas and recent studies, condition monitoring and fault diagnosis demand developing models capable of generalizing across diverse operating conditions, especially when thermal patterns or machine conditions differ from those observed during training. In this sense, domain adaptation techniques based on maximum classifier discrepancy and deep feature alignment have shown great potential for improving diagnostic robustness in such scenarios by learning domain-invariant and discriminative representations under changing distributions [36]. Likewise, advances in end-to-end deep learning architectures in other complex real-world applications—such as contextual speech recognition—demonstrate how task-adapted feature extraction and built-in encoder–decoder mechanisms can significantly improve performance even in noisy or resource-constrained environments [37]. These findings reinforce the need for approaches that directly learn relevant features from raw data, while maintaining generalizability in practical Industry 4.0 scenarios, which motivates the development of the end-to-end CNN framework proposed in this work.

Therefore, to address these challenges, this work proposes a novel end-to-end approach for fault detection and classification in electromechanical systems by means of processing infrared thermography images. The main contributions of the present study include the following:

i.: A fully automatic preprocessing phase that segments regions containing Relevant Heat Areas (RHA), providing numerical, visual, and geometric descriptors of the heating patterns associated with each fault condition, including random noise reduction filtering.
ii.: Data augmentation that enhances the realism of the training set, with the main novelty being the inclusion of noisy data augmentation to reinforce robustness under adverse thermographic acquisition conditions.
iii.: A convolutional neural network focused on extracting local and global features, leveraging both the heat patterns and the geometric distribution of the RHA.

Moreover, this approach bridges the gap between traditional statistical methods and complex DL models, providing an interpretable and scalable solution for reliable fault detection. Test results demonstrated a global accuracy of 99.7%. Robustness tests validated the functionality of the proposed method under different simulated distortion conditions.

2. Materials and Methods

The methodology proposed in this work consists of five main stages that integrate the electromechanical system, dataset acquisition, dataset preprocessing, data augmentation, and DL-based classification. Figure 1 shows the structure of the present approach.

2.1. Electromechanical System

The present work focuses on evaluating and classifying different conditions in an electromechanical system (ES). The ES is a self-designed system used for laboratory tests. It consists of a three-phase 220 VAC 1.49 kW induction motor (IM) manufactured by WEG Electric Corporation, Houston, TX, USA (manufactured part 00236ET3E145T-W22), a 4:1 gearbox (GB), and a DC generator, as shown in Figure 2. These elements (IM, GB, and DC generator) are shaft-to-shaft linked through rigid couplings. In this regard, two cases are considered: The first case includes different faults induced in the IM, which include bearing defects (BD), half-broken rotor bar (1⁄2BRB), fully (one) broken rotor bars (1BRB), unbalance (UNB), and misalignment (MIS). The second case focuses on detecting four levels of uniform wear in the gearbox: 0%, 25%, 50%, and 75% (HLT, GB25, GB50 and GB75). Each condition is tested individually under two operating frequencies (50 Hz and 60 Hz) provided by a variable frequency drive (VFD) producing average rotating speeds in the IM of about 2985 rpm and 3590 rpm, respectively.

2.2. Dataset Acquisition

Data acquisition is performed using an IR camera (FLIR GF320, FLIR Systems, Inc., Wilsonville, OR, USA) positioned in front of the ES to capture its emitted infrared spectrum; the IR camera is placed at the center of the ES, and the distance between the ES and the camera is approximately 1.5 m. During each experimental test, the camera is configured to record thermal images from start-up to the thermal steady state of the ES, capturing 5 images per minute over 100 min of continuous operation. Although transient thermal images were also collected, this study focuses exclusively on images corresponding to the thermal steady state. After removing unusable data that belonged to the thermal transient state, a total of 340 valid thermal images were obtained for each tested condition and operating frequency. The thermal images are acquired as 8-bit grayscale with a resolution of 320 × 240 pixels, where the pixel intensity scale [0, 255] represents the emitted infrared spectrum. Also, the grayscale acquisition is normalized to measure a temperature range of +20 °C to +50 °C. Figure 2 also presents a general representation of the data acquisition process.

For dataset construction, images from each condition and frequency were organized into training, validation, and test sets. A split of 70% for training, 15% for validation, and 15% for testing is taken into account; hence, IR images are randomly assigned for training, validation and testing. Afterward, conditions at 50 Hz and 60 Hz are merged into a single class with a total of 680 images (i.e., GB25 at both frequencies combined into one class); this step is performed to prevent frequency bias within individual classes. As a result, the final datasets for each tested condition consist of 476 images for training and 102 images for validation and testing; this data segregation allows us to ensure a balanced distribution across all classes and operating frequencies for robust model evaluation.

2.3. Data Preprocessing

The proposed data preprocessing algorithm aims to generate output images that preserve only relevant heat areas (RHAs). Thus, this step removes low-temperature regions, allowing the training network to filter out irrelevant background information while simultaneously providing a visual representation of the heat distribution. The resulting heating patterns, characterized by their location, size, and shape, constitute novel features that enhance the distinction between classes. The overall preprocessing stage is illustrated in Figure 3.

The process shown in Figure 3 is applied to each sample in the dataset for each tested condition, including train, test, and validation folders. Each image is denoted as

s_{i} (x, y)

, where

s

is the sample image in the dataset,

(x, y)

represents the image dimensions (

x = 320

,

y = 240

), and

i

indicates the index of the sample within the dataset (

i = 1, \dots, N

). To each

s_{i} (x, y)

, the operation defined in Equation (1) is applied. This operation identifies the maximum grayscale value in the image and uses it to normalize or set the reference intensity for each processed image.

m_{i} = \max_{x, y \in Ω} s_{i} (x, y)

(1)

where

m_{i}

is the maximum grayscale value of the

i

-th image, and

Ω

represents the set of all pixel coordinates in the image. Each single image contains a unique and specific

m_{i}

value. Based on these results, Equation (2) is applied.

b_{i} (x, y) = \{\begin{matrix} 1 & w h e n s_{i} (x, y) \geq α * m_{i} \\ 0 & o t h e r w i s e \end{matrix}

(2)

where

b_{i} (x, y)

is the resulting binary value at pixel

(x, y)

of the

i

-th image, and

α

is a threshold factor that determines the minimum fraction of the maximum grayscale value required for a pixel to be set to 1, thus identifying the RHA. As thermographic images are prone to noise, especially when the image acquisition system is of lower quality, a noise reduction phase is applied using binary morphological reconstruction. First, an initial seed is obtained by eroding the binary mask

b_{i} (x, y)

, as expressed in Equation (3):

{\hat{b}}_{i} (x, y) = b_{i} (x, y) ⊖ B

(3)

where

{\hat{b}}_{i} (x, y)

is the eroded binary mask (this will be used as a seed for the reconstruction), and

B

is a

3 \times 3

square structuring element. Next, the binary mask is reconstructed by dilation, constrained by the original mask; this is expressed in Equation (4).

R_{i} (x, y) = R e c ({\hat{b}}_{i}, b_{i})

(4)

where

R e c ({\hat{b}}_{i}, b_{i})

denotes the morphological reconstruction by dilation. This process restores the larger connected regions in the mask while eliminating small noisy components. Finally, the pixels in

R_{i} (x, y)

that are set to 1 are replaced with the corresponding intensity values from the original image

s_{i} (x, y)

, resulting in a cleaned image where the RHAs retain their original grayscale values; this operation is shown in Equation (5).

{\hat{s}}_{i} (x, y) = \{\begin{matrix} s_{i} (x, y) & w h e n R_{i} (x, y) = 1 \\ 0 & o t h e r w i s e \end{matrix}

(5)

where

{\hat{s}}_{i} (x, y)

is the final output image that contains the original pixels from

s_{i} (x, y)

. This provides an output that highlights the most relevant areas, which can be differentiated not only by temperature and heating patterns, but also by RHA size, shape, and location, as illustrated in Figure 4. Specifically, Figure 4a to Figure 4f illustrate the outcome of the proposed preprocessing applied to one of the acquired thermal images for the conditions: HLT, ½ BRB, 1 BRB, MIS, UNB, and GB50, respectively. The obtained images show that the method not only segments the RHA effectively but also preserves their original grayscale values, thereby maintaining the thermal information useful for fault classification. Therefore, the final images highlight the spatial distribution, shape, and size of the RHAs, while also retaining their intensity variations, which reflect temperature differences. Each condition exhibits distinct heating patterns that provide discriminative features for fault identification, demonstrating that the preprocessing enhances both the clarity and the relevance of the data. By reducing irrelevant background information and emphasizing informative thermal regions, the proposed approach generates improved inputs that can enhance the training performance. Such enriched spatial and intensity features constitute valuable input parameters for training CNNs.

The proposed data processing is a key stage since it leads to preserving only RHAs; in this sense, the threshold factor

α

was heuristically set to 0.35. This value defines the minimum fraction of the maximum temperature required for a pixel to be considered part of the Relevant Heating Area (RHA). The aim of setting

α = 0.35

is to discard low-intensity pixels corresponding to background or thermally irrelevant regions while retaining the main temperature gradients associated with heating patterns. A higher α value would cause an excessive reduction in thermal information, eliminating meaningful heating zones that may represent incipient anomalies or relevant local heat conduction patterns. Conversely, lower values would include a larger portion of the background, thereby reducing the discriminative power of the RHA and introducing noise. Although the selection of 0.35 was arbitrary, it was guided by the objective of preserving sufficient information from the heating patterns rather than performing overly aggressive background suppression, ensuring that the preprocessed images capture the essential thermal features relevant for fault classification.

2.4. Data Augmentation

As can be inferred, thermographic images are relatively simple grayscale representations; consequently, models trained on such data are prone to overfitting, which is why state-of-the-art approaches often incorporate data augmentation. In this context, the present work introduces a set of augmentation functions designed not only to replicate realistic data acquisition conditions but also to prevent the model from memorizing the background, which is consistently assumed to be zero across the training, validation, and test images.

To address the limitations of thermographic images and reduce the risk of overfitting, a set of data augmentation strategies was implemented. In addition to geometric transformations such as small rotations (±15°), translations (10% in width and height), and zoom variations (±15%), brightness adjustments within the range of 0.85–1.15 were also applied to replicate changes in illumination that may occur during data acquisition. Furthermore, a custom noise augmentation function was introduced, which combines Gaussian noise (

σ = 2.0

) with salt-and-pepper noise affecting approximately 10% of the pixels. This aims to mimic sensor noise and acquisition artifacts commonly present in low-quality thermographic cameras, while also preventing the model from relying solely on a homogeneous background, which is consistently zero in the training, validation, and test datasets. Unlike typical augmentation, the inclusion of controlled noise injection explicitly addresses the characteristics of thermographic acquisition, ensuring robustness under both realistic and non-realistic perturbation scenarios. Together, these augmentations enhance the variability and realism of the training data, promoting better generalization of the model.

2.5. CNN Architecture and Implementation

The proposed architecture for the training process is shown in Figure 5. It is based on a CNN, where the augmented dataset is used as the input for model training; the CNN is designed to progressively learn hierarchical features through a sequence of convolutional layers with 3 × 3 and 5 × 5 kernels, batch normalization, ReLU activation, and max-pooling operations. This progression enables the model to capture pixel-level intensity patterns, fine thermal textures, and the global geometry of hot regions, thereby enhancing fault classification performance.

As observed in Figure 5, the input data is used without normalization (grayscale kept in [0, 255]), avoiding the loss of contrast between high- and low-temperature regions. The process goes through the augmentation pipeline to prepare the training set. The CNN is then structured in two phases/stages:

Stage 1 (local feature extraction): Three layers configured as $C o n v 2 D (3 x 3) + B N + R e L u + M a x p o o l i n g 2 D (2 x 2)$ , with 16 filters in the first layer and 32 filters in the subsequent ones. This block extracts local, pixel-level intensity features characterizing the distribution of RHAs.
Stage 2 (global feature extraction): Two layers configured as $C o n v 2 D (5 x 5) + B N + R e L u + M a x p o o l i n g 2 D (2 x 2)$ , with 32 filters each. This block captures higher-level information such as the shape, size, and global distribution of RHAs, while reducing sensitivity to pixel-level distortions.

Finally, a Flatten layer vectorizes the feature maps into a 1D array that integrates local and global descriptors. A dropout layer (

r a t e = 0.5

) is applied before the output stage to reduce overfitting caused not only by the limited dataset size but also by the high variability introduced during augmentation. The number of filters is intentionally kept constant to balance representational power with computational efficiency, considering the relatively low complexity of thermographic images.

In regard to its implementation, during the training of the proposed structure, the Adam optimizer is used with a learning rate of 0.001 and a batch size of 12; the training process is run for 30 epochs. All experiments are conducted on a workstation running Windows 11, equipped with an Intel Core i7-12700H 12th gen (4.7 GHz) and an NVIDIA GeForce RTX 4060 GPU, and the image processing is configured to be performed with the GPU using the TensorFlow-GPU 2.10 package.

3. Test and Results

To evaluate the performance of the proposed method, several experiments are conducted using the test dataset described in Section 2.2. The evaluation focuses on the fault classification accuracy across the nine conditions that are tested in the IM and GB of the ES (HLT, BD, ½ BRB, 1 BRB, MIS, UNB, GB25, GB50, and GB75). Hence, the performance is evaluated through confusion matrices, as well as quantitative metrics such as global accuracy (GA), precision, F1-score and recall. The confusion matrix in Table 1 summarizes the individual classification ratios obtained without any noise perturbation applied to the test images. As expected, the model demonstrates high discriminative ability, with the majority of samples being correctly classified in their respective categories, obtaining a GA = 99.78%. Although the misclassification between faulty conditions is also achieved, this issue is not critical since it can be considered as a false positive that can be confirmed by technicians through inspection.

To further analyze robustness, two different noise perturbations are applied to test images, Gaussian blur and additive Gaussian noise, and the performance of the model is evaluated under these conditions. As stated, these filters are chosen because they simulate real-world degradations commonly present in infrared imaging, such as resolution loss, motion blur, and thermal or electronic noise [38], allowing a more realistic assessment of the model’s ability to handle imperfect or noisy images. The formulas that denote the Gaussian blur are expressed in Equations (6) and (7).

I_{b l u r} (x, y) = \sum_{i = - k}^{k} \sum_{k = - k}^{k} G (i, j; σ) * s_{n} (x - i, y - j)

(6)

G (i, j) = \frac{1}{2 π σ^{2}} \exp (- \frac{i^{2} + j^{2}}{2 σ^{2}})

(7)

where

s_{n} (x, y)

denotes the n-th image from the unprocessed dataset, with spatial coordinates

(x, y)

. The Gaussian kernel

G (i, j; σ)

is defined by the standard deviation

σ

, which controls the amount of blur, and the kernel size is determined by the parameter

k

. In this work,

σ

is not manually tuned; instead, it is set according to the automatic configuration of the Python (version 3.10) implementation used, ensuring reproducibility of the results. Accordingly, Figure 6a,b show a sample image in its original format and the corresponding filtered image with

k = 3

, respectively. As observed, the filtered image presents a clear degradation in terms of image quality in contrast with the original image; thus, it can be expected that the addition of noise may affect the performance of the model if it lacks robustness.

Accordingly, once the test images are filtered using the factors

k = 3

and

k = 5

, which emulate different levels of defocus during image acquisition, the proposed methodology is then applied to these perturbed datasets; hence, the corresponding results achieved by the proposed model are presented in Table 2 and Table 3, respectively. Certainly, Table 2 shows the confusion matrix with the individual classification for each assessed condition; although the test images are filtered by using

k = 3

, the classifier achieves near-perfect recognition across all categories with GA = 99.24%. Some samples are also misclassified, but only minor confusion is observed in a single sample between the GB50 and UNB conditions, as well as six samples between BRB and ½ BRB. In contrast, the confusion matrix summarized in Table 3, with

k = 5

, reveals a degradation in performance, particularly for the GB25 condition, which exhibits 43 misclassifications as MIS. These results indicate that while the method remains robust under moderate blur, higher levels of filtering reduce the discriminative features of specific classes, especially GB25, yet accurate classification is preserved for the remaining categories, leading to GA = 95.21%.

Moreover, to further assess the robustness of the proposed method, additive Gaussian noise is also applied to the test images. Unlike Gaussian blur, which models defocusing, this perturbation represents sensor noise or environmental interference during image acquisition. From a viewpoint of real-world applications in the industry, the acquisition of thermal images is often exposed to multiple factors that affect signal quality, such as the presence of dust, oil vapor, mechanical vibrations, or changes in ambient conditions. These elements introduce random disturbances into the images, manifested as variations in pixel intensity, very similar to the behavior of additive Gaussian noise. These disturbances can originate from infrared camera limitations, electronic interference, or variations in surface emissivity during operation. For this reason, incorporating controlled Gaussian noise leads to a more realistic reproduction of adverse field conditions. From a condition-monitoring perspective, this strategy increases the robustness of the proposed model and ensures reliable performance even when the data comes from noisy and imperfect scenarios typical of Industry 4.0. The mathematical formulation of the additive Gaussian noise is provided in Equations (8) and (9).

I_{n o i s e} (x, y) = I (x, y) + n (x, y)

(8)

n (x, y) ~ N (μ, σ^{2})

(9)

where

I (x, y)

represents the original image intensity at spatial coordinates

(x, y)

, and

n (x, y)

is a random variable drawn from a Gaussian distribution with mean

μ

and variance

σ^{2}

. The resulting image

I_{n o i s e} (x, y)

simulates the effect of sensor noise or environmental disturbances, allowing the evaluation of the method’s robustness under realistic acquisition conditions. Figure 7a,b illustrate an original thermal image from the test data and its corresponding filtered image using

σ = 15

and

μ = 5

, which simulate aggressive noise combined with emissivity variation.

Consequently, the test images are filtered using three conditions:

σ = 5

and

μ = 0,

which represent low- to medium-noise conditions;

σ = 8

and

μ = 0

, representing medium- to high-noise conditions; and

σ = 15

and

μ = 5

to simulate aggressive adverse conditions. These conditions reflect typical adverse scenarios encountered during thermal inspections of electromechanical systems in industrial assets. Hence, the proposed method is then applied to these perturbed test data, and the corresponding results are presented in Table 4, Table 5 and Table 6. Certainly, Table 4 shows the confusion matrix obtained with

σ = 5

and

μ = 0

, where the classifier achieves high recognition accuracy across all assessed conditions with GA = 98.15%, with only minor confusions as observed in a single GB50 sample misclassified as ½ BRB, eight misclassified samples between UNB and 1 BRB, and eight wrongly classified samples between the MIS and UNB conditions. Although misclassified samples are present, there exists a relationship between the involved conditions; that is, the observed confusion between conditions such as ½ BRB, 1BRB, MIS and UNB can be attributed to the fact that these faults tend to generate similar thermal and mechanical phenomena in the ES, specifically in the IM. Both UNB and MIS cause additional mechanical stresses that result in irregular heating patterns in the bearings and stator, while the presence of BRB modifies the current distribution and produces localized temperature increases in the rotor area. Due to this overlap in thermal manifestations, it is expected that the proposed model based on IR images will exhibit some degree of confusion between these categories, which reflects a more physical proximity between the failure mechanisms than a critical limitation of the method.

On the other hand, for Table 5 with

σ = 8

and

μ = 0

, a notable degradation in performance leads to obtaining GA = 89.76%; specifically, the UNB condition is the most affected condition since twenty-nine samples are misclassified as MAL and nine samples are classified as GB75. The rest of the misclassified samples are between the conditions BD, ½ BRB, 1 BRB and GB50; these issues may be caused by similarities between patterns associated with the studied faults, as previously explained. Finally, Table 6 with

σ = 15

and

μ = 5

reveals a more severe impact, achieving GA = 90.09%, where the BD and BRB classes present considerable misclassifications into other categories, highlighting a significant reduction in robustness under stronger noise conditions. These results demonstrate that while the proposed method maintains high accuracy under moderate additive Gaussian noise, its discriminative power progressively declines as the noise variance increases, particularly affecting classes with closer thermal patterns such as GB25, BD, and BRB. The results show that the proposed method preserves reliable classification under moderate distortions, while the expected degradation at higher perturbation levels mainly affects classes with closely related thermal patterns, yet overall robustness is maintained.

As a summary, Table 7 shows the results achieved through different tests that demonstrate the robustness of the proposed method under varying levels of image perturbation. For the unaltered dataset, the model achieved near-perfect performance, with an accuracy of 99.78% and an F1-score of 1.00, confirming its effectiveness under ideal conditions. When additive Gaussian noise was introduced, the performance gradually decreased as the noise intensity increased: with

σ = 5

and

μ = 0

(low to medium noise), the model maintained high accuracy (98.15%) and F1-score (0.98), indicating resilience to minor sensor or environmental disturbances. Higher noise levels,

σ = 8

and

μ = 0

(medium to high) and

σ = 15

and

μ = 5

(aggressive), caused a more pronounced reduction in performance (accuracy ~90%, F1 ~0.90), yet the model still reliably classified most samples. For Gaussian blur, representing defocus effects during acquisition, the method achieved 99.24% accuracy for

k = 3

and 95.21% for

k = 5

, illustrating robustness to moderate blur but a sensitivity to stronger defocusing. Overall, these results confirm that the proposed method remains effective across a range of realistic adverse conditions commonly encountered in thermal inspections of electromechanical systems.

Therefore, to ensure the reliability of thermographic images in industrial environments, it is advisable to perform acquisition under controlled conditions that reduce the influence of external factors. These include a constant and known distance between the camera and the assessed machinery, proper calibration of the material’s emissivity, absence of air currents or vapors that alter infrared radiation, stable ambient lighting, and surfaces free of dust, grease, or dirt that could alter thermal reflectivity. These ideal conditions favor obtaining more consistent and representative measurements of the actual machinery conditions.

To further evaluate the efficiency and scalability of the proposed approach, a comparative study was conducted against representative deep learning architectures. The selected architectures include ResNet-18 [39], VGG-16 [40], and ResNet-50 [41], all trained on the same dataset without applying the RHA preprocessing stage, but incorporating a standard data augmentation procedure to prevent overfitting. The applied augmentations consisted of small rotations (±15°), width and height shifts (10%), zoom variations (±15%), and brightness adjustments within the range of 0.85–1.15. The results, summarized in Table 8, demonstrate that the proposed method achieved the highest accuracy (98.15%) and F1-score (0.98), confirming the discriminative advantage provided by the RHA extraction. In terms of computational efficiency, the training time (10.73 min) and prediction time (≈154 ms per image) remain competitive and suitable for real-time condition monitoring. The main limitation is the increased memory usage (≈2.2 GB), mainly attributed to the preservation of full-scale thermal intensity values and the use of dual-kernel convolutional blocks (3 × 3 and 5 × 5). Nonetheless, this design provides richer spatial and thermal information, justifying the higher memory requirement for industrial applications where diagnostic accuracy and interpretability are prioritized. Therefore, this confirms that the proposed approach is not only accurate and robust but also computationally efficient, supporting its potential deployment in Industry 4.0 scenarios where local, on-device fault evaluation is preferred to reduce latency and cloud dependency.

Additionally, a brief comparison with previous reported works in the state of the art is also performed in order to emphasize the contribution of the proposed end-to-end diagnosis method. Thus, by comparing the proposed method with other approaches in terms of global accuracy (GA), it is considered that studies that incorporate different preprocessing and learning strategies are most effective, specifically, methods that require a preprocessing stage with ROI selection and a CNN for training [42], the combination of XGBoost and a fuzzy inference system (FIS) [33] and the end-to-end approach that integrates a semantic segmentation preprocessing phase [34]. The comparison is based on terms of GA values reported by the respective authors, as summarized in Table 9.

From the comparison, it can be observed that the proposed method achieves the highest GA among the reviewed approaches. While the other methods report GA values ranging from 95% to 96.49%, the proposed approach demonstrates a significant improvement, reaching 99.7%. This improvement can be attributed to the preprocessing phase, which eliminates irrelevant background from the thermal images while preserving the Relevant Heating Areas (RHAs). This not only highlights the thermal patterns but also retains geometric information such as size, shape, and relative location of the heated zones, enhancing the input data for model training. Additionally, data augmentation, including the introduction of noisy data, prevents the model from memorizing the black background or specific patterns. The use of both 3 × 3 and 5 × 5 kernels further supports robust feature extraction: 3 × 3 kernels capture fine-grained, pixel-level details, while 5 × 5 kernels capture broader, spatial patterns. This combination allows the model to learn both local and global features without depending on a single scale of information.

4. Ablation Study

A relevant aspect to assess is the contribution of the key components in the proposed methodology, namely the preprocessing stage, the combination of 3 × 3 and 5 × 5 kernels, and the inclusion of noisy data augmentation. To this end, an ablation study is carried out by systematically removing one component at a time and evaluating the resulting performance. For a more realistic assessment, the analysis is conducted using test images perturbed with additive Gaussian noise (

σ = 5

), since in thermographic applications, unprocessed images may yield artificially high accuracy values that do not adequately reflect the robustness of the method. This strategy enables the quantification of the individual impact of each contribution and highlights the added value of the complete proposed architecture compared to simplified baseline configurations. Accordingly, the results of this analysis are summarized in Table 10.

Table 10 shows that removing the preprocessing step leads to the most significant performance drop, with accuracy falling to 80.17% and the F1-score down to 0.75. This confirms that the preprocessing stage is essential to filter irrelevant background information and enhance the discriminative features of the thermal patterns. The exclusive use of 3 × 3 kernels also reduces performance compared to the proposed mixed-kernel design, achieving 93.9% accuracy, which indicates that the inclusion of 5 × 5 kernels is beneficial for capturing larger-scale thermal structures. Finally, omitting noise-based augmentation results in lower generalization capacity (87.15%), highlighting the role of this component in improving robustness. In summary, the ablation study demonstrates that each of the proposed components contributes significantly to the overall performance, with the preprocessing step being the most critical, while kernel diversity and noise augmentation provide complementary improvements that strengthen the model’s accuracy and robustness.

5. Discussion

The results obtained in this work demonstrate that the proposed methodology achieves a classification performance comparable to or even superior to state-of-the-art approaches in infrared thermography-based fault diagnosis. While several existing studies rely on handcrafted features or manual selection of regions of interest, the present method provides a fully automatic, end-to-end pipeline that reduces preprocessing complexity while enhancing scalability and reproducibility. An important contribution of this study is the introduction of the Relevant Heating Area (RHA) preprocessing step, which not only eliminates irrelevant background information but also provides descriptors of the size, shape, and distribution of heat regions. These descriptors contribute to the interpretability of the model, offering a visual explanation of fault patterns that is often missing in deep learning approaches.

Robustness analysis further validated the applicability of the proposed method under realistic operating conditions. When subjected to Gaussian blur and additive Gaussian noise, the model maintained strong recognition performance under moderate perturbations, although some degradation was observed at higher noise levels, particularly for classes with closely related thermal patterns such as GB25. This behavior is expected, since adverse conditions inevitably reduce discriminative features in thermographic images; nevertheless, the method preserved high accuracy in most categories, confirming its resilience in practical scenarios. Moreover, it should also be considered that different faults tend to produce similar patterns that can be masked between them and produce a performance reduction.

The ablation study provides additional insights into the relative importance of the proposed components. Results revealed that the preprocessing stage plays the most critical role in preserving discriminative thermal patterns, while the combination of 3 × 3 and 5 × 5 kernels enhances the capture of both local and global features. The inclusion of noisy data augmentation proved beneficial in improving generalization, particularly under distorted image conditions. Together, these elements justify the design choices of the proposed architecture and highlight the added value of the integrated methodology.

Despite the promising results, some limitations must be acknowledged. The dataset used in this study was collected under controlled laboratory conditions, which may not fully replicate the variability encountered in industrial environments. Additionally, the fault conditions analyzed in this work represent a specific subset of common failures, and further research is required to extend the methodology to a broader range of fault scenarios, even combined faults in the same ES. Future work may explore the deployment of this approach in real-time monitoring systems and its implementation on edge devices, where the low computational complexity of the model could provide significant advantages. Moreover, combining thermographic data with other sensing modalities, such as vibration, stator currents, stray flux and/or acoustic signals, may enhance diagnostic robustness while preserving the interpretability benefits of the proposed preprocessing strategy.

6. Conclusions

In this work, we proposed an end-to-end CNN methodology for fault diagnosis in electromechanical systems (ES). The proposal is based on detecting and extracting the Relevant Heating Areas from IR thermographic images. The results of this work demonstrate that the proposed methodology provides an effective and robust solution for fault diagnosis in different elements (the induction motor and gearbox) of an ES through infrared thermography. The introduction of the RHA as a preprocessing stage proved to be the most decisive component, significantly enhancing the discrimination of thermal patterns, while the combined use of 3 × 3 and 5 × 5 kernels improved the extraction of both local and global features. Furthermore, the inclusion of noise-based data augmentation increased the model’s generalization capacity under adverse acquisition conditions. With accuracy levels exceeding 99% and sustained robustness against distortions, the proposed approach is validated as a practical tool for condition monitoring. Nevertheless, the experiments are conducted under controlled laboratory conditions and with a limited range of faults, which represents a constraint, but through the proposed method, it is possible to perform the detection of nine different conditions. Future work should focus on validating the method in real industrial environments, extending the fault scenarios, and exploring integration with complementary sensing modalities. Overall, the proposed approach not only achieves high accuracy and robustness but also offers improved interpretability, which are key attributes for its implementation in Industry monitoring systems. Finally, the use of IR cameras represents a suitable solution that can be considered to perform condition monitoring in industrial assets due to being a contactless sensing technology.

Author Contributions

Conceptualization, G.A.-R. and I.A.E.-V.; Methodology, G.A.-R.; Software, A.Y.J.-C.; Validation, G.A.-R., I.A.E.-V. and A.Y.J.-C.; Formal analysis, J.J.S.-D.; Investigation, A.P.-C. and J.J.S.-D.; Resources, A.P.-C.; Data curation, G.A.-R. and J.J.S.-D.; Writing—original draft, G.A.-R.; Writing—review & editing, J.J.S.-D.; Visualization, A.P.-C.; Supervision, J.J.S.-D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially supported by the “Fondo para el Fortalecimiento de la Investigación, Vinculación y Extensión (FONFIVE-UAQ 2025)” under project grant FIN202556. Moreover, this work was also supported by the Mexican secretary of science, humanities, technology and innovation (SECIHTI) SNII grants, 666566, 558497, and 487599.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to privacy.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Drakaki, M.; Karnavas, Y.L.; Tzionas, P.; Chasiotis, I.D. Recent Developments Towards Industry 4.0 Oriented Predictive Maintenance in Induction Motors. Procedia Comput. Sci. 2021, 180, 943–949. [Google Scholar] [CrossRef]
Stockton, C.A.; McElveen, R.F.; Chastain, E. The Integral Role of Electric Motors in Achieving Sustainability. IEEE Trans. Ind. Appl. 2024, 60, 7949–7957. [Google Scholar] [CrossRef]
Gundewar, S.K.; Kane, P.V. Condition Monitoring and Fault Diagnosis of Induction Motor. J. Vib. Eng. Technol. 2021, 9, 643–674. [Google Scholar] [CrossRef]
Phuong, N.C. Condition Monitoring for Induction Motor Overload Using Sound; Lecture Notes in Electrical Engineering; Springer: Berlin/Heidelberg, Germany, 2021; Volume 739, pp. 133–141. [Google Scholar] [CrossRef]
Glowacz, A.; Sulowicz, M.; Zielonka, J.; Li, Z.; Glowacz, W.; Kumar, A. Acoustic Fault Diagnosis of Three-Phase Induction Motors Using Smartphone and Deep Learning. Expert Syst. Appl. 2025, 262, 125633. [Google Scholar] [CrossRef]
Wang, Z.; Shi, D.; Xu, Y.; Zhen, D.; Gu, F.; Ball, A.D. Early Rolling Bearing Fault Diagnosis in Induction Motors Based on On-Rotor Sensing Vibrations. Measurement 2023, 222, 113614. [Google Scholar] [CrossRef]
Safiullin, R.A. Vibration Diagnostics of Induction Motors. In Proceedings of the ICOECS 2021: 2021 International Conference on Electrotechnical Complexes and Systems, Ufa, Russia, 16–18 November 2021; pp. 228–232. [Google Scholar] [CrossRef]
Allal, A.; Khechekhouche, A. Diagnosis of Induction Motor Faults Using the Motor Current Normalized Residual Harmonic Analysis Method. Int. J. Electr. Power Energy Syst. 2022, 141, 108219. [Google Scholar] [CrossRef]
Vo, T.T.; Liu, M.K.; Tran, M.Q. Harnessing Attention Mechanisms in a Comprehensive Deep Learning Approach for Induction Motor Fault Diagnosis Using Raw Electrical Signals. Eng. Appl. Artif. Intell. 2024, 129, 107643. [Google Scholar] [CrossRef]
Jeffali, F.; Ouariach, A.; El Kihel, B.; Nougaoui, A. Diagnosis of Three-Phase Induction Motor and the Impact on the Kinematic Chain Using Non-Destructive Technique of Infrared Thermography. Infrared Phys. Technol. 2019, 102, 102970. [Google Scholar] [CrossRef]
Al-Musawi, A.K.; Anayi, F.; Packianather, M. Three-Phase Induction Motor Fault Detection Based on Thermal Image Segmentation. Infrared Phys. Technol. 2020, 104, 103140. [Google Scholar] [CrossRef]
Choudhary, A.; Mishra, R.K.; Fatima, S.; Panigrahi, B.K. Multi-Input CNN Based Vibro-Acoustic Fusion for Accurate Fault Diagnosis of Induction Motor. Eng. Appl. Artif. Intell. 2023, 120, 105872. [Google Scholar] [CrossRef]
Jimenez-Guarneros, M.; Morales-Perez, C.; Rangel-Magdaleno, J.D.J. Diagnostic of Combined Mechanical and Electrical Faults in ASD-Powered Induction Motor Using MODWT and a Lightweight 1-D CNN. IEEE Trans. Ind. Inform. 2022, 18, 4688–4697. [Google Scholar] [CrossRef]
Ertargin, M.; Yildirim, O.; Orhan, A. Mechanical and Electrical Faults Detection in Induction Motor across Multiple Sensors with CNN-LSTM Deep Learning Model. Electr. Eng. 2024, 106, 6941–6951. [Google Scholar] [CrossRef]
Kumar, R.R.; Andriollo, M.; Cirrincione, G.; Cirrincione, M.; Tortella, A. A Comprehensive Review of Conventional and Intelligence-Based Approaches for the Fault Diagnosis and Condition Monitoring of Induction Motors. Energies 2022, 15, 8938. [Google Scholar] [CrossRef]
Jang, J.G.; Noh, C.M.; Kim, S.S.; Shin, S.C.; Lee, S.S.; Lee, J.C. Vibration Data Feature Extraction and Deep Learning-Based Preprocessing Method for Highly Accurate Motor Fault Diagnosis. J. Comput. Des. Eng. 2023, 10, 204–220. [Google Scholar] [CrossRef]
Junior, R.F.R.; Areias, I.A.d.S.; Campos, M.M.; Teixeira, C.E.; da Silva, L.E.B.; Gomes, G.F. Fault Detection and Diagnosis in Electric Motors Using 1d Convolutional Neural Networks with Multi-Channel Vibration Signals. Measurement 2022, 190, 110759. [Google Scholar] [CrossRef]
Wang, S.; Wang, Q.; Xiao, Y.; Liu, W.; Shang, M. Research on Rotor System Fault Diagnosis Method Based on Vibration Signal Feature Vector Transfer Learning. Eng. Fail. Anal. 2022, 139, 106424. [Google Scholar] [CrossRef]
AlShorman, O.; Alkahatni, F.; Masadeh, M.; Irfan, M.; Glowacz, A.; Althobiani, F.; Kozik, J.; Glowacz, W. Sounds and Acoustic Emission-Based Early Fault Diagnosis of Induction Motor: A Review Study. Adv. Mech. Eng. 2021, 13, 1–19. [Google Scholar] [CrossRef]
Lucas, G.B.; De Castro, B.A.; Rocha, M.A.; Andreoli, A.L. A New Acoustic Emission-Based Approach for Supply Disturbances Evaluation in Three-Phase Induction Motors. IEEE Trans. Instrum. Meas. 2021, 70, 3508410. [Google Scholar] [CrossRef]
Bórnea, Y.P.; Vitor, A.L.O.; Goedtel, A.; Castoldi, M.F.; Souza, W.A.; Barbara, G.V. A Novel Method for Detecting Bearing Faults in Induction Motors Using Acoustic Sensors and Feature Engineering. Appl. Acoust. 2025, 234, 110627. [Google Scholar] [CrossRef]
Niu, G.; Dong, X.; Chen, Y. Motor Fault Diagnostics Based on Current Signatures: A Review. IEEE Trans. Instrum. Meas. 2023, 72, 3520919. [Google Scholar] [CrossRef]
Guan, B.; Bao, X.; Qiu, H.; Yang, D. Enhancing Bearing Fault Diagnosis Using Motor Current Signals: A Novel Approach Combining Time Shifting and CausalConvNets. Measurement 2024, 226, 114049. [Google Scholar] [CrossRef]
Choi, Y.R.; Joe, I. Motor Fault Diagnosis and Detection with Convolutional Autoencoder (CAE) Based on Analysis of Electrical Energy Data. Electronics 2024, 13, 3946. [Google Scholar] [CrossRef]
Shifat, T.A.; Hur, J.W. An Effective Stator Fault Diagnosis Framework of BLDC Motor Based on Vibration and Current Signals. IEEE Access 2020, 8, 106968–106981. [Google Scholar] [CrossRef]
Al-Haddad, L.A.; Jaber, A.A.; Hamzah, M.N.; Fayad, M.A. Vibration-Current Data Fusion and Gradient Boosting Classifier for Enhanced Stator Fault Diagnosis in Three-Phase Permanent Magnet Synchronous Motors. Electr. Eng. 2024, 106, 3253–3268. [Google Scholar] [CrossRef]
Qian, L.; Li, B.; Chen, L. CNN-Based Feature Fusion Motor Fault Diagnosis. Electronics 2022, 11, 2746. [Google Scholar] [CrossRef]
Chennana, A.; Bessous, N.; Megherbi, A.C.; Saidi, L.; Sbaa, S.; Sayadi, M.; Teta, A. Deep-Shallow Features Fusion for Induction Motor Fault Diagnosis through Infrared Thermography Images. In Proceedings of the 2025 International Conference on Control, Automation and Diagnosis (ICCAD), Barcelona, Spain, 1–3 July 2025; pp. 1–6. [Google Scholar] [CrossRef]
Calderon-Uribe, U.; Lizarraga-Morales, R.A.; Guryev, I.V. Fault Diagnosis in Induction Motors through Infrared Thermal Images Using Convolutional Neural Network Feature Extraction. Machines 2024, 12, 497. [Google Scholar] [CrossRef]
Atif, M.; Azmat, S.; Khan, F.; Albogamy, F.R.; Khan, A. AI-Driven Thermography-Based Fault Diagnosis in Single-Phase Induction Motor. Results Eng. 2024, 24, 103493. [Google Scholar] [CrossRef]
Alvarado-Hernandez, A.I.; Zamudio-Ramirez, I.; Jaen-Cuellar, A.Y.; Osornio-Rios, R.A.; Donderis-Quiles, V.; Antonino-Daviu, J.A. Infrared Thermography Smart Sensor for the Condition Monitoring of Gearbox and Bearings Faults in Induction Motors. Sensors 2022, 22, 6075. [Google Scholar] [CrossRef] [PubMed]
Resendiz-Ochoa, E.; Trejo-Chavez, O.; Saucedo-Dorantes, J.J.; Morales-Hernandez, L.A.; Cruz-Albarran, I.A. Application of Thermography and Convolutional Neural Network to Diagnose Mechanical Faults in Induction Motors and Gearbox Wear. Appl. Syst. Innov. 2024, 7, 123. [Google Scholar] [CrossRef]
Pohakar, P.; Gandhi, R.; Hans, S.; Sharma, G.; Bokoro, P.N. Analysis of Multiple Faults in Induction Motor Using Machine Learning Techniques. e-Prime—Adv. Electr. Eng. Electron. Energy 2025, 12, 101007. [Google Scholar] [CrossRef]
Aslan, B.; Balci, S.; Kayabasi, A. Fault Diagnosis in Thermal Images of Transformer and Asynchronous Motor through Semantic Segmentation and Different CNN Models. Appl. Therm. Eng. 2025, 265, 125599. [Google Scholar] [CrossRef]
Pan, D.; Mo, T.; Jiang, Z.; Duan, Y.; Li, Z.; Maldague, X.; Gui, W. Interference Factors and Compensation Methods When Using Infrared Thermography for Temperature Measurement: A Review. IEEE Trans. Instrum. Meas. 2025, 74, 1015124. [Google Scholar] [CrossRef]
Li, J.; Deng, W.; Dand, X.; Zhao, H. Cross-Domain Adaptation Fault Diagnosis with Maximum Classifier Discrepancy and Deep Feature Alignment Under Variable Working Conditions. IEEE Trans. Reliab. 2025, 74, 4106–4115. [Google Scholar] [CrossRef]
Guo, D.; Zhang, S.; Zhang, J.; Yang, B.; Lin, Y. Exploring Contextual Knowledge-Enhanced Speech Recognition in Air Traffic Control Communication: A Comparative Study. IEEE Trans. Neural Netw. Learn. Syst. 2025, 36, 16085–16099. [Google Scholar] [CrossRef]
Wu, S.; Gao, B.; Woo, W.L.; Yu, Y.; Yang, Y. Defect Super-Resolution Algorithm Based on Infrared Thermal Imaging Physical Kernel. NDTE Int. 2025, 154, 103368. [Google Scholar] [CrossRef]
Hejazi, S.; Packianather, M.; Liu, Y. Novel Preprocessing of Multimodal Condition Monitoring Data for Classifying Induction Motor Faults Using Deep Learning Methods. In Proceedings of the 2022 IEEE 2nd International Symposium on Sustainable Energy, Signal Processing and Cyber Security, (iSSSC 2022), Gunupur, India, 15–17 December 2022. [Google Scholar]
Zhu, X.; Hu, S.; Bing, Z.; Zhao, D. Infrared Thermal Image Fault Detection of Electric Motor Based on Improved VGG16 Network Model. In Proceedings of the 2024 6th International Conference on Communications, Information System and Computer Engineering, (CISCE 2024), Guangzhou, China, 10–12 May 2024; pp. 856–862. [Google Scholar]
Sakalli, G.; Koyuncu, H. Categorization of Asynchronous Motor Situations in Infrared Images: Analyses with ResNet50. In Proceedings of the 2022 International Conference on Data Analytics for Business and Industry, (ICDABI 2022), Online, 25–26 October 2022; pp. 114–118. [Google Scholar]
Trejo-Chavez, O.; Cruz-Albarran, I.A.; Resendiz-Ochoa, E.; Salinas-Aguilar, A.; Morales-Hernandez, L.A.; Basurto-Hurtado, J.A.; Perez-Ramirez, C.A. A CNN-Based Methodology for Identifying Mechanical Faults in Induction Motors Using Thermography. Machines 2023, 11, 752. [Google Scholar] [CrossRef]

Figure 1. Flow chart of the proposed end-to-end methodology applied to assess different faults in an electromechanical system.

Figure 2. Representation of the laboratory electromechanical systems and data acquisition.

Figure 3. Dataset preprocessing pipeline.

Figure 4. Obtained images after applying the proposed processing stage for preserving RHAs, for the conditions (a) HLT, (b) ½ BRB, (c) 1 BRB, (d) MIS, (e) UNB, and (f) GB50.

Figure 5. Block-diagram representation of the proposed CNN architecture.

Figure 6. Grayscale IR images: (a) original image obtained from the test folder and (b) its Gaussian blur filtering with k = 5.

Figure 7. Grayscale IR images extracted from the test folder: (a) original image and (b) its Gaussian blur filtering with

σ

= 15 and

μ = 5

.

Figure 7. Grayscale IR images extracted from the test folder: (a) original image and (b) its Gaussian blur filtering with

σ

= 15 and

μ = 5

.

Table 1. Confusion matrix obtained by applying the proposed end-to-end methodology to the test images.

		Target Class
		HLT	GB25	GB50	GB75	BD	MIS	UNB	½ BRB	1 BRB
Predicted class	HLT	102	0	0	0	0	0	0	0	0
	GB25	0	102	0	0	0	0	0	0	0
	GB50	0	0	101	0	0	0	0	0	0
	GB75	0	0	0	102	0	0	0	0	0
	BD	0	0	0	0	102	0	0	1	0
	MIS	0	0	0	0	0	102	0	0	0
	UNB	0	0	0	0	0	0	102	0	0
	½ BRB	0	0	1	0	0	0	0	101	0
	1 BRB	0	0	0	0	0	0	0	0	102

Table 2. Resulting confusion matrix achieved by evaluating the filtered test data through the proposed method, with Gaussian blur

k = 3

.

Table 2. Resulting confusion matrix achieved by evaluating the filtered test data through the proposed method, with Gaussian blur

k = 3

.

		Target Class
		HLT	GB25	GB50	GB75	BD	MIS	UNB	½ BRB	1BRB
Predicted class	HLT	102	0	0	0	0	0	0	0	0
	GB25	0	102	0	0	0	0	0	0	0
	GB50	0	0	101	0	0	0	0	0	0
	GB75	0	0	0	102	0	0	0	0	0
	BD	0	0	0	0	102	0	0	0	0
	MIS	0	0	0	0	0	102	0	0	0
	UNB	0	0	0	0	0	0	102	0	0
	½ BRB	0	0	1	0	0	0	0	102	6
	1 BRB	0	0	0	0	0	0	0	0	96

Table 3. Resulting confusion matrix achieved by evaluating the filtered test data through the proposed method, with Gaussian blur

k = 5

.

Table 3. Resulting confusion matrix achieved by evaluating the filtered test data through the proposed method, with Gaussian blur

k = 5

.

		Target Class
		HLT	GB25	GB50	GB75	BD	MIS	UNB	½ BRB	1 BRB
Predicted class	HLT	102	0	0	0	0	0	0	0	0
	GB25	0	59	0	0	0	0	0	0	0
	GB50	0	0	101	0	0	0	0	0	0
	GB75	0	0	0	102	0	0	0	0	0
	BD	0	0	0	0	102	0	0	0	0
	MIS	0	43	0	0	0	102	0	0	0
	UNB	0	0	0	0	0	0	102	0	0
	½ BRB	0	0	1	0	0	0	0	102	0
	1 BRB	0	0	0	0	0	0	0	0	102

Table 4. Obtained confusion matrix by using the noisy test data, with additive Gaussian noise

σ

= 5 and

μ = 0

.

Table 4. Obtained confusion matrix by using the noisy test data, with additive Gaussian noise

σ

= 5 and

μ = 0

.

		Target Class
		HLT	GB25	GB50	GB75	BD	MIS	UNB	½ BRB	1 BRB
Predicted class	HLT	102	0	0	0	0	0	0	0	0
	GB25	0	102	0	0	0	0	0	0	0
	GB50	0	0	101	0	0	0	0	0	0
	GB75	0	0	0	102	0	0	0	0	8
	BD	0	0	0	0	102	0	0	0	0
	MIS	0	0	0	0	0	102	8	0	0
	UNB	0	0	0	0	0	0	94	0	0
	½ BRB	0	0	1	0	0	0	0	102	0
	1 BRB	0	0	0	0	0	0	0	0	94

Table 5. Resulting confusion matrix obtained by using the noisy test data, with additive Gaussian noise

σ

= 8 and

μ = 0

.

Table 5. Resulting confusion matrix obtained by using the noisy test data, with additive Gaussian noise

σ

= 8 and

μ = 0

.

		Target Class
		HLT	GB25	GB50	GB75	BD	MIS	UNB	½ BRB	1 BRB
Predicted class	HLT	102	0	0	0	0	0	0	0	0
	GB25	0	92	0	0	0	0	0	0	0
	GB50	0	0	101	0	7	0	0	1	0
	GB75	0	0	0	102	0	0	9	0	9
	BD	0	0	0	0	93	0	0	0	0
	MIS	0	10	0	0	0	93	29	12	0
	UNB	0	0	0	0	0	0	64	6	0
	½ BRB	0	0	0	0	2	0	0	84	0
	1 BRB	0	0	1	0	0	9	0	0	93

Table 6. Confusion matrix achieved by using the noisy test data, with additive Gaussian noise

σ

= 15 and

μ = 5

.

Table 6. Confusion matrix achieved by using the noisy test data, with additive Gaussian noise

σ

= 15 and

μ = 5

.

		Target Class
		HLT	GB25	GB50	GB75	BD	MIS	UNB	½ BRB	1 BRB
Predicted class	HLT	102	11	0	0	0	0	0	0	0
	GB25	0	89	0	0	0	8	0	0	0
	GB50	0	0	102	0	5	0	0	1	0
	GB75	0	0	0	102	0	0	0	0	22
	BD	0	0	0	0	79	0	0	0	0
	MIS	0	2	0	0	0	94	0	0	23
	UNB	0	0	0	0	0	0	101	1	8
	½ BRB	0	0	0	0	18	0	0	101	0
	1BRB	0	0	0	0	0	0	1	0	57

Table 7. Metrics obtained under different perturbation conditions, including additive Gaussian noise and Gaussian blur, to evaluate the robustness of the proposed method.

Test Dataset Condition	Metrics
Test Dataset Condition	GA	Precision	Recall	F1-Score
Normal (no filter)	99.78%	1.00	1.00	1.00
Gaussian additive ( $μ = 0$ , $σ = 5$ )	98.15%	0.98	0.98	0.98
Gaussian additive ( $μ = 0$ , $σ = 8$ )	89.76%	0.91	0.90	0.90
Gaussian additive ( $μ = 5$ , $σ = 15$ )	90.09%	0.91	0.90	0.90
Gaussian blur ( $k = 3$ )	99.24%	0.99	0.99	0.99
Gaussian blur ( $k = 5$ )	95.21%	0.97	0.95	0.95

Table 8. Comparative performance of the proposed methodology and state-of-the-art models in terms of accuracy, precision, recall, F1-score, training and prediction time, and memory usage.

Setup	Metrics
Setup	GA	Precision	Recall	F1-Score	Training Time (min)	Prediction Time (ms)	Memory Usage
ResNet-18	94.44%	0.96	0.94	0.94	11.17	470.95	443 MB
VGG-16	90.52%	0.93	0.91	0.89	46.56	703.72	753 MB
ResNet-50	61.44%	0.81	0.61	0.58	22.77	525.4	680 MB
Proposed method	98.15%	0.98	0.98	0.98	10.73	154.7	2217.34 MB

Table 9. Comparative results of GA for fault detection and classification.

Dataset Condition	GA	Preprocessing	Augmentation
Reference [33]	95%	Normalization	No
Reference [42]	95%	No	Yes, restricted to salt and pepper
Reference [34]	96.49%	Semantic segmentation	Yes, limited to reflection, rotation, scale and translation
Proposed method	99.7%	Location of Relevant Heating Areas (RHA).	Rotation, translation, zoom variations, brightness adjustments, Gaussian blur, and Gaussian additive

Table 10. Results of the ablation study showing the impact of key components of the proposed methodology.

Setup	Metrics
Setup	GA	Precision	Recall	F1-Score
Without noise augmentation	87.15%	0.88	0.87	0.87
Pure (3 × 3) layers	93.90%	0.94	0.94	0.94
Without preprocessing step	80.17%	0.86	0.80	0.75
Proposed method	98.15%	0.98	0.98	0.98

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Alvarado-Robles, G.; Perez-Cruz, A.; Espinosa-Vizcaino, I.A.; Jaen-Cuellar, A.Y.; Saucedo-Dorantes, J.J. Novel End-to-End CNN Approach for Fault Diagnosis in Electromechanical Systems Based on Relevant Heating Areas in Thermography. Technologies 2025, 13, 551. https://doi.org/10.3390/technologies13120551

AMA Style

Alvarado-Robles G, Perez-Cruz A, Espinosa-Vizcaino IA, Jaen-Cuellar AY, Saucedo-Dorantes JJ. Novel End-to-End CNN Approach for Fault Diagnosis in Electromechanical Systems Based on Relevant Heating Areas in Thermography. Technologies. 2025; 13(12):551. https://doi.org/10.3390/technologies13120551

Chicago/Turabian Style

Alvarado-Robles, Gilberto, Angel Perez-Cruz, Isac Andres Espinosa-Vizcaino, Arturo Yosimar Jaen-Cuellar, and Juan Jose Saucedo-Dorantes. 2025. "Novel End-to-End CNN Approach for Fault Diagnosis in Electromechanical Systems Based on Relevant Heating Areas in Thermography" Technologies 13, no. 12: 551. https://doi.org/10.3390/technologies13120551

APA Style

Alvarado-Robles, G., Perez-Cruz, A., Espinosa-Vizcaino, I. A., Jaen-Cuellar, A. Y., & Saucedo-Dorantes, J. J. (2025). Novel End-to-End CNN Approach for Fault Diagnosis in Electromechanical Systems Based on Relevant Heating Areas in Thermography. Technologies, 13(12), 551. https://doi.org/10.3390/technologies13120551

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Novel End-to-End CNN Approach for Fault Diagnosis in Electromechanical Systems Based on Relevant Heating Areas in Thermography

Abstract

1. Introduction

2. Materials and Methods

2.1. Electromechanical System

2.2. Dataset Acquisition

2.3. Data Preprocessing

2.4. Data Augmentation

2.5. CNN Architecture and Implementation

3. Test and Results

4. Ablation Study

5. Discussion

6. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI