Article

Thermal Imaging-Based Defect Detection Method for Aluminum Foil Sealing Using EAC-Net

School of Mechanical Engineering and Automation, Wuhan Textile University, Wuhan 430200, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(18), 9964; https://doi.org/10.3390/app15189964
Submission received: 8 July 2025 / Revised: 22 August 2025 / Accepted: 9 September 2025 / Published: 11 September 2025
(This article belongs to the Section Applied Thermal Engineering)

Abstract

Aluminum foil sealing is widely employed in industrial packaging, and the quality of sealing plays a crucial role in ensuring product integrity and safety. Thermal infrared images frequently exhibit non-uniform heat distribution and indistinct boundaries within the sealing region. Additionally, variations in thermal response and local structural characteristics are observed across different defect types. Thus, traditional detection methods exhibit limitations regarding their stability and adaptability. In this paper, a novel thermal image recognition algorithm called EAC-Net is proposed for the classification and detection of sealing defects in thermal infrared images. In the proposed method, EfficientNet-B0 is utilized as the backbone network to improve its adaptability for industrial deployment. Furthermore, the Atrous Spatial Pyramid Pooling module is incorporated to enhance the multi-scale perception of defect regions, while the Channel–Spatial Attention Mixing with Channel Shuffle module is adopted to strengthen the focus on critical thermal features. Significant improvements in recognition performance were verified in experiments, while both computational complexity and inference latency were effectively kept at low levels. In the experiments, EAC-Net demonstrated an accuracy of 99.06% and a precision of 99.07%, indicating its high robustness and application potential.

1. Introduction

With the rapid development of the consumer goods industry, aluminum foil packaging materials are increasingly being utilized in sectors such as food [1] and pharmaceuticals [2], playing a pivotal role in ensuring product quality and consumer health. The sealing of aluminum foil not only affects the integrity of a product’s appearance but also determines its sealing performance and storage stability. However, traditional sealing techniques still encounter numerous technical bottlenecks, particularly in terms of issues with sealing defects that may lead to product deterioration, contamination, and even recall incidents. Consequently, enhancing the precision and stability of defect detection in aluminum foil sealing has become a pressing technical challenge for the industry.
Traditional inspection approaches for aluminum foil sealing, namely manual visual inspection and mechanized inspection, have notable limitations in practical applications. Manual visual inspection relies heavily on the operators’ experience, making it susceptible to subjective judgment and operator fatigue; as a result, minor or hidden defects may be hard to identify consistently [3]. In industrial contexts, the overall accuracy of manual visual inspection is around 80% [4]. Although mechanized inspection improves efficiency, its capability remains limited in detecting thermal anomalies and low-contrast regions, so it does not fully meet the modern packaging industry’s requirements for detection accuracy and adaptability [5]. Therefore, the adoption of efficient, intelligent, and non-destructive detection technologies is essential to compensate for the deficiencies of traditional methods and improve the reliability of aluminum foil sealing inspection [6].
With the rapid advancement of industrial automation and smart manufacturing technologies, thermal imaging—a non-destructive testing technique—has demonstrated significant potential in detecting defects in aluminum foil sealing, owing to its advantages such as harmlessness to humans, excellent concealment, and all-weather functionality [7]. In recent years, thermal imaging technology has been widely applied in areas such as medical diagnostics [8,9] and industrial equipment monitoring [10,11]. Its stable performance across diverse scenarios has provided both technical support and a methodological reference for its application in the classification and detection of aluminum foil sealing defects.
In the field of defect detection for aluminum foil sealing, some researchers have diligently explored the integration of thermal imaging technology with image processing and pattern recognition methods. By leveraging edge detection techniques based on grayscale images, texture analysis methods, and Gabor transforms, they precisely extracted defect features. Furthermore, they effectively identified these defects using traditional classification algorithms such as the Extreme Learning Machine (ELM) and Support Vector Machine (SVM). For instance, Zhou et al. [12] combined Gabor transform and ELM for texture feature extraction and classification, which led to enhanced detection performance but still exhibited deficiencies in terms of real-time capability and automation. Zhao et al. [13] proposed a detection method based on optimized Back Propagation (BP) neural networks, which improved the recognition rate. However, these methods relied on handcrafted features, exhibited low computational efficiency, and often failed to meet the real-time requirements of industrial online detection. For instance, Liu et al. reported CPU inference times of 583 ms, 142 ms, and 248 ms per image for SIFT, SURF, and GLCM, respectively [14]. Similarly, Fejér et al. measured that SIFT key-point detection required approximately 193 ms per image on an ARM Cortex-A53 CPU (Arm Ltd., Cambridge, UK) [15]. In contrast, inference times shorter than approximately 33 ms per image (i.e., above 30 FPS) are generally considered the benchmark for industrial real-time detection [16,17].
To address the limitations of traditional methods in terms of feature representation and computational efficiency, the integration of thermal imaging technology with deep learning has achieved positive advancements in the field of intelligent detection in recent years. Due to issues such as low overall grayscale, a low signal-to-noise ratio, and low contrast that often affect thermal images [18], defect recognition based on traditional image processing methods is challenging. In contrast, deep learning methods, with their advantages such as automatic feature extraction and end-to-end training, have demonstrated robustness and generalization capabilities in various tasks. For instance, Cheng Chen et al. significantly enhanced the accuracy of pavement thermal imaging damage detection by combining the EfficientNet-B4 model with data augmentation techniques [19]. Low et al. showcased the potential of thermal imaging in agriculture by applying deep learning to guava maturity assessment [20]. P. Pak et al. proposed a deep learning method which is capable of accurately detecting porosity in laser powder bed fusion manufacturing [21]. C. Cui et al. introduced a thermal error prediction model based on thermal images and a deep attention residual network, optimizing the network structure through transfer learning to achieve high-accuracy thermal error compensation [22]. X. H. Liu successfully differentiated the mechanical strength of thermal protective fabrics after thermal imaging aging using transfer learning and data augmentation techniques [23]. M. N. Reza explored the integration of infrared thermal imaging and deep learning techniques in the pig farming industry, enhancing the efficiency of early disease detection and compression symptom monitoring [24]. L. Wanqing et al. proposed a method for detecting heat sealing quality based on infrared thermal imaging and deep learning, utilizing a deep convolutional neural network for defect classification [25]. 
In addition to data-driven approaches, thermographic non-destructive testing (TNDT) also encompasses physics-based and hybrid techniques that leverage heat-transfer models or multi-layer thermal analyses to estimate material parameters and enhance interpretability [26,27,28].
The aforementioned studies, encompassing both deep learning approaches and physics-based or hybrid TNDT methods, have comprehensively demonstrated the applicability of thermal imaging-based technologies in various detection tasks, providing valuable theoretical support and methodological references for the intelligent identification of defects in aluminum foil sealing. Nevertheless, the majority of current research endeavors continue to concentrate on model performance optimization and experimental validation. They remain insufficient in comprehensively and systematically addressing the core requirements posed by seal defect detection in industrial settings, including deployment adaptability, real-time processing efficiency, and system stability, thereby somewhat hindering their widespread adoption in practical applications.
Although the integration of deep learning and thermal imaging technology has demonstrated promising prospects in the field of industrial inspection, several pressing issues remain in the task of detecting defects in aluminum foil seals. First, the thermal field distribution within the sealed area exhibits non-uniformity or lacks a sharp boundary transition, potentially diminishing the model’s discriminative ability for features and consequently adversely impacting the accuracy and stability of classification. Second, infrared thermal images are often susceptible to various disturbances, including imaging sensor noise and fluctuations in equipment stability, which introduce a certain level of noise into the images and inevitably decrease a model’s effectiveness in accurately identifying sealing state characteristics. Moreover, many deep neural network architectures are characterized by high structural complexity and large parameter counts, making it challenging to meet the stringent constraints on inference latency and computational resources in industrial settings. Beyond these image- and model-level challenges, industrial deployment imposes concrete, quantified constraints on system design. In inline packaging, line throughput can reach about 250 bottles per minute on glass bottle lines, leaving little per-item processing time for inspection [29]. In vision-based inspection, real-time operation typically requires about 33 ms per image [30]. Deployment frequently relies on embedded GPUs: evaluations on NVIDIA Jetson platforms (NVIDIA Corp., Santa Clara, CA, USA) show that latency and throughput are sensitive to model size and to the inference engine, thereby constraining large or compute-intensive models [31,32]. Although quantitative reports specific to aluminum foil sealing are limited, these constraints are broadly consistent with conditions observed in inline packaging lines and with typical embedded GPU deployments. We therefore adopt them as operating assumptions in this study. These requirements call for a lightweight network architecture that maintains detection accuracy while satisfying stringent throughput, latency, and embedded hardware constraints.
Addressing the aforementioned issues, this study introduces an enhanced model, EfficientNet with Atrous Spatial Pyramid Pooling and Channel–Spatial Attention Mixing with Channel Shuffle (EAC-Net), which is based on EfficientNet-B0. The primary contributions are summarized as follows:
  • An enhanced EfficientNet-B0 model is proposed to improve task adaptability by addressing the uneven heat distribution and blurry boundaries in thermal images of aluminum foil sealing. Specifically, the model incorporates Multi-scale Atrous Convolution and Channel–Spatial Attention Mixing with Channel Shuffle modules. The multi-scale atrous convolution improves sensitivity to defect regions across spatial sizes—from small, localized anomalies, for example, minor cold spots and asymmetric thermal footprints, to extended thermal deviations, for example, underheating or overheating over a large area of aluminum foil—and across contrast levels, from weak temperature differences near the background to strong, well-defined thermal responses. The channel and spatial attention mixing with channel shuffle further emphasizes seal-relevant areas while suppressing backgrounds. These effects are verified by ablation against a single-scale baseline, in which removing either component reduces accuracy and precision, and by Grad-CAM, which shows more concentrated, higher-intensity activations within the annotated sealing region and fewer background activations.
  • An attention module named CSAMix is designed, which integrates both channel attention and spatial attention mechanisms by introducing a channel shuffling strategy. This module improves the efficiency of information exchange across feature dimensions, enhances the representation of regional thermal features, strengthens the focus on key thermal areas, and increases the robustness of the model.
  • The proposed EAC-Net model was verified on thermal images in the context of aluminum foil sealing. The experimental results verify that the proposed method achieves higher detection accuracy and greater stability while maintaining higher efficiency when compared to existing methods.

2. Related Work

2.1. EfficientNet

EfficientNet [33] adopts a compound scaling strategy to balance network depth, width, and input resolution, thereby achieving improved performance and computational efficiency. The corresponding formulations are presented in Equations (1)–(5).
d = α^φ
w = β^φ
r = γ^φ
s.t. α · β² · γ² ≈ 2
α ≥ 1, β ≥ 1, γ ≥ 1
where d, w, and r represent the network depth, width, and input image resolution, respectively; α, β, and γ are the base scaling factors for depth, width, and resolution; φ denotes the unified compound scaling coefficient; and “subject to” (s.t.) indicates the constraint condition.
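As a worked illustration of Equations (1)–(5), the following sketch computes the depth, width, and resolution multipliers for a given compound coefficient φ. The base factors α = 1.2, β = 1.1, γ = 1.15 are the values reported in the original EfficientNet paper, not parameters fitted in this work.

```python
# Illustrative sketch of the compound scaling rule in Equations (1)-(5).
# alpha=1.2, beta=1.1, gamma=1.15 are the base factors from the original
# EfficientNet paper (an assumption here, not a result of this study).
def compound_scale(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """Return (depth, width, resolution) multipliers for coefficient phi."""
    assert alpha >= 1 and beta >= 1 and gamma >= 1  # Eq. (5)
    d = alpha ** phi  # depth multiplier, Eq. (1)
    w = beta ** phi   # width multiplier, Eq. (2)
    r = gamma ** phi  # resolution multiplier, Eq. (3)
    return d, w, r

# The constraint alpha * beta^2 * gamma^2 ≈ 2 (Eq. (4)) means that raising
# phi by one roughly doubles the FLOPs of the scaled network.
flops_growth = 1.2 * 1.1 ** 2 * 1.15 ** 2  # close to 2
```

Because the constraint fixes the joint product near 2, a single coefficient φ trades off accuracy against compute along one axis instead of three.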
To balance recognition accuracy and computational efficiency, EfficientNet-B0 [34] was adopted as the baseline model in this study. Designed through neural architecture search (NAS), EfficientNet-B0 serves as the foundational architecture for the EfficientNet family. It features a compact and efficient structure consisting of two standard convolutional layers, a series of Mobile Inverted Bottleneck Convolutions (MBConv) [35], an average pooling layer, and a fully connected layer.
The core component of EfficientNet-B0 is the MBConv block, which applies Depthwise Separable Convolution (DS Conv) to reduce the number of parameters and computational complexity. This convolution operation consists of a depthwise convolution (DW Conv) and a pointwise convolution (PW Conv). The DW Conv operates independently on each input channel, while the PW Conv adjusts the number of channels and enables cross-channel feature fusion. In addition, the MBConv block incorporates the Squeeze-and-Excitation (SE) attention module, which adaptively reweights feature channels to improve the representation of informative features.
The structure of the MBConv block is illustrated in Figure 1a. The DS Conv is shown in Figure 1b, where DW Conv refers to depthwise convolution and PW Conv denotes pointwise convolution. The structure of the SE module is depicted in Figure 1c. Batch Normalization (BN) represents the normalization layer, and the Swish activation function [36] is applied to enhance the nonlinear representation capability of the model, as defined in Equation (6). Swish is a smooth, non-monotonic activation that facilitates gradient-based optimization and preserves small negative responses, thereby aiding the extraction of fine-grained features. Compared with ReLU, it achieves stronger feature continuity, and its performance is comparable to GELU on standard image classification benchmarks [37]. In thermal infrared image analysis, Swish has been adopted in YOLO-based detectors for low-visibility scenes [38], and it is also used in industrial infrared steel detection within the detector’s multi-scale feature-fusion module [39]. In a related low-contrast task of MR image denoising, replacing ReLU with Swish increased PSNR and SSIM, indicating better preservation of subtle intensity variations [40]. Within the SE module, AvgPooling indicates average pooling, as formulated in Equation (7), and Fully Connected (FC) denotes the fully connected layer.
f(x) = x · σ(x) = x · 1 / (1 + e^(−x))
S_c = (1 / (H × W)) Σ_{i=1}^{H} Σ_{j=1}^{W} X_c(i, j)
where σ(x) denotes the sigmoid function, S_c represents the average activation of channel c, H and W are the spatial dimensions of the feature map, and X_c(i, j) denotes the feature value at spatial position (i, j) in channel c.
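Both operations can be stated in a few lines of plain Python; this sketch is purely illustrative of Equations (6) and (7), with a channel represented as a list of rows.

```python
import math

def swish(x):
    # Swish activation, Eq. (6): f(x) = x * sigmoid(x) = x / (1 + exp(-x))
    return x / (1.0 + math.exp(-x))

def channel_avg_pool(channel):
    # SE squeeze step, Eq. (7): average activation S_c over one H x W channel
    h, w = len(channel), len(channel[0])
    return sum(sum(row) for row in channel) / (h * w)
```

Unlike ReLU, swish(−1) is small but nonzero, which is the “preserved small negative response” property mentioned above.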

2.2. ASPP Module

In the classification of aluminum foil sealing defects, a model that can simultaneously capture local details and global structures is required due to their thermal distribution patterns. However, standard convolutional layers are limited by fixed receptive fields, leading to insufficient multi-scale feature representation and degraded classification performance.
To enhance the multi-scale adaptability of the model, a multi-scale feature fusion module based on atrous convolution, known as Atrous Spatial Pyramid Pooling (ASPP) [41], has been introduced. A broader receptive field is achieved by incorporating multi-scale feature extraction. This design enhances the model’s ability to capture discriminative features at different levels, which is beneficial for improving classification performance.
The ASPP module consists of five parallel branches that extract multi-scale contextual features, including a 1 × 1 convolution; three 3 × 3 convolutions with dilation rates of 6, 12, and 18; and a global average pooling branch. The resulting feature maps are concatenated and subsequently fused using a 1 × 1 convolution. The structure of the ASPP module is illustrated in Figure 2.
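The five-branch structure described above can be sketched in PyTorch as follows. The channel counts and the nearest-neighbor upsampling of the pooled branch are illustrative assumptions, not the paper’s exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as TF

class ASPP(nn.Module):
    """Minimal ASPP sketch: a 1x1 conv, three 3x3 convs with dilation rates
    6/12/18, and a global-average-pooling branch, fused by a 1x1 conv."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, 1),
            nn.Conv2d(in_ch, out_ch, 3, padding=6, dilation=6),
            nn.Conv2d(in_ch, out_ch, 3, padding=12, dilation=12),
            nn.Conv2d(in_ch, out_ch, 3, padding=18, dilation=18),
        ])
        self.gap = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                 nn.Conv2d(in_ch, out_ch, 1))
        self.fuse = nn.Conv2d(5 * out_ch, out_ch, 1)  # concat of 5 branches

    def forward(self, x):
        feats = [b(x) for b in self.branches]
        # Broadcast the pooled global context back to the input resolution.
        g = TF.interpolate(self.gap(x), size=x.shape[2:], mode="nearest")
        return self.fuse(torch.cat(feats + [g], dim=1))
```

With `padding = dilation` for the 3 × 3 branches, every branch preserves the spatial size, so the five feature maps can be concatenated directly.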

2.3. CBAM Module

In deep learning tasks, the effectiveness of feature extraction directly influences classification accuracy and model robustness. Convolutional operations typically apply uniform weights across all channels and spatial locations. This uniform treatment may lead to the omission of important features or the inclusion of redundant information, thereby reducing classification performance. To address this issue, the design concept of the Convolutional Block Attention Module (CBAM) [42] was introduced to enhance feature representation along both channel and spatial dimensions.
CBAM consists of a Channel Attention Module (CAM) and a Spatial Attention Module (SAM), which are sequentially connected to refine the input feature map by applying attention along the channel and spatial dimensions, as illustrated in Figure 3a. Specifically, the input feature map, F, is first element-wise multiplied with the channel attention map, M_c, to generate the channel-refined feature map, F′. Subsequently, F′ is further refined through element-wise multiplication with the spatial attention map, M_s, resulting in the final output feature map, F″, as formulated in Equation (8).
The input feature map, F, is first passed through CAM, where global average pooling and global max pooling are performed independently to capture channel-wise statistical information. The pooled features are then forwarded to a shared Multi-Layer Perceptron (MLP) consisting of two fully connected layers. The resulting outputs are summed and passed through a sigmoid activation function to generate the channel attention map, M_c, as defined in Equation (9). The structure of CAM is shown in Figure 3b. Subsequently, F′ is fed into SAM. In this module, channel-wise max pooling and average pooling are applied to compress the channel dimension. The resulting two-dimensional descriptors are concatenated and convolved using a 7 × 7 kernel. The output is then activated by a sigmoid function to yield the spatial attention map, M_s. The computation process is represented by Equation (10). The structure of SAM is illustrated in Figure 3c.
F′ = M_c(F) ⊗ F
F″ = M_s(F′) ⊗ F′
M_c(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F)))
M_s(F′) = σ(f^(7×7)([AvgPool(F′); MaxPool(F′)]))
where F″ denotes the feature map obtained by element-wise multiplication of the spatial attention map, M_s, with the channel-refined feature map, F′; F′ is obtained by multiplying the channel attention map, M_c, with the input feature map, F; ⊗ represents element-wise multiplication; MLP denotes a shared Multi-Layer Perceptron; σ(·) denotes the sigmoid activation function; and f^(7×7) represents a convolution operation with a 7 × 7 kernel.
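The two-stage refinement of Equations (8)–(10) can be sketched compactly in PyTorch. The reduction ratio of 16 in the shared MLP is an assumption carried over from the CBAM paper’s default, not a value specified here.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Minimal CBAM sketch: channel attention from a shared MLP over
    avg/max-pooled descriptors, then spatial attention from a 7x7 conv
    over channel-wise avg/max maps (Eqs. 8-10)."""
    def __init__(self, ch, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(ch, ch // reduction), nn.ReLU(),
                                 nn.Linear(ch // reduction, ch))
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, F):
        b, c, h, w = F.shape
        # Channel attention: M_c = sigmoid(MLP(AvgPool F) + MLP(MaxPool F))
        avg = self.mlp(F.mean(dim=(2, 3)))
        mx = self.mlp(F.amax(dim=(2, 3)))
        Mc = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        Fp = Mc * F  # channel-refined map
        # Spatial attention: sigmoid(conv7x7([AvgPool F'; MaxPool F']))
        s = torch.cat([Fp.mean(1, keepdim=True),
                       Fp.amax(1, keepdim=True)], dim=1)
        Ms = torch.sigmoid(self.conv(s))
        return Ms * Fp  # final refined map
```

Both attention maps are broadcast multiplications, so the module preserves the input tensor’s shape and can be dropped between any two convolutional stages.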

3. Proposed Method

3.1. EAC-Net Structure

This study proposes an enhanced network architecture, referred to as EfficientNet with Atrous Spatial Pyramid Pooling and Channel–Spatial Attention Mixing with Channel Shuffle (EAC-Net), which is constructed based on the EfficientNet-B0 backbone. The model is designed to learn discriminative features from infrared images of aluminum foil sealing defects with blurred boundaries and indistinct thermal transitions.
To better align with the spatial resolution characteristics of thermal infrared data and accommodate resource constraints in practical deployment, the original 640 × 480 images were processed by removing irrelevant background regions and retaining only the effective sealing region (ROI), which was then uniformly resized to 160 × 160 as the model input. Compared with the commonly used 224 × 224, this resolution helps preserve the discriminative local thermal anomaly features for defect recognition while reducing computational complexity and inference latency, thus making it more compatible with the real-time requirements of industrial inspection. In contrast, upsampling the ROI to 224 × 224 would only generate additional pixels through interpolation, which does not essentially provide new discriminative information and may instead weaken local thermal contrast while increasing computational overhead. Moreover, as EAC-Net adopts a largely convolutional structure, it exhibits good adaptability to varying input sizes. Consequently, the change in resolution only affects the spatial dimensions of intermediate feature maps, without altering the model architecture or training procedure.
The network takes a 160 × 160 × 3 thermal image as input, which first passes through a convolutional layer, followed by a sequence of MBConv blocks. These layers progressively reduce the spatial resolution while increasing the feature depth: from 160 × 160 to 80 × 80, then to 40 × 40, and eventually to 20 × 20. To balance semantic representation and spatial detail, the ASPP module is placed after the 20 × 20 × 80 feature map stage. It enhances feature extraction across multiple receptive fields through parallel dilated convolutions, thereby improving the network’s adaptability to diverse thermal distribution patterns. Following ASPP, a Channel–Spatial Attention Mixing with Channel Shuffle (CSAMix) module is applied to further improve the model’s attention to critical thermal regions. After the CSAMix module, the enhanced feature maps are fed into a series of deeper MBConv blocks, where the spatial dimensions are progressively reduced to 10 × 10 and then to 5 × 5, while the channel dimension increases to 320. A 1 × 1 convolution is then applied to project the feature maps onto a 1280-dimensional space. Subsequently, global spatial information is aggregated using an adaptive average pooling layer, resulting in a compact 1 × 1 × 1280 feature description vector. The feature vector is processed by a fully connected layer, following the linear transformation defined in Equation (11), resulting in an output vector of size 1 × 8 that corresponds to eight categories of aluminum foil sealing states. The overall network architecture is illustrated in Figure 4.
y = Wx + b,  x ∈ ℝ^1280,  y ∈ ℝ^8
where W and b denote the weights and bias of the fully connected layer, respectively; x denotes the input feature vector; and y denotes the final classification output.
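The final pooling and classification stage described above can be sketched as a short PyTorch pipeline; this is an illustrative fragment of the head only, not the full EAC-Net.

```python
import torch
import torch.nn as nn

# Sketch of the classification head (Eq. 11): adaptive average pooling
# collapses the 5x5x1280 feature map to a 1280-d vector, and a fully
# connected layer maps it to the eight sealing-state classes.
head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),  # (B, 1280, 5, 5) -> (B, 1280, 1, 1)
    nn.Flatten(),             # -> (B, 1280)
    nn.Linear(1280, 8),       # y = Wx + b, x in R^1280, y in R^8
)
logits = head(torch.randn(2, 1280, 5, 5))
```

Because the pooling is adaptive, the same head also accepts the larger intermediate maps produced by other input resolutions, which is what makes the 160 × 160 input change architecture-neutral.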

3.2. CSAMix Module

To improve the network’s responsiveness to key thermal regions and enhance its capability to represent discriminative features, a Channel–Spatial Attention Mixing with Channel Shuffle (CSAMix) module is introduced, as illustrated in Figure 5. Drawing upon the design concept of CBAM, it consists of Channel Attention and Spatial Attention branches and employs a channel shuffle strategy [43] to enhance cross-channel information interaction.
In the channel attention branch, the input feature map, F , is first subjected to dimensionality permutation, transforming from C × H × W to W × H × C, as formulated in Equation (12). An MLP [44] comprising two fully connected layers is applied to compress and expand the channel dimension. Next, a reverse permutation operation is performed to restore the dimensions to C × H × W, and a channel attention map, M c , is generated through the sigmoid activation function. The input feature map, F , is multiplied element-wise with the channel attention map, M c , to obtain the channel-enhanced feature map, F c h a n n e l . The mathematical formula is shown in (13). To enhance cross-channel feature interaction, CSAMix applies a channel shuffle operation after channel attention. The feature channels are first divided into four groups and then permuted across group boundaries to facilitate inter-group information exchange. This operation is implemented through tensor dimension rearrangement without introducing additional parameters, thereby maintaining low computational overhead. In the spatial attention branch, the input feature map, F s h u f f l e , is first processed by a 7 × 7 convolutional layer, which reduces the number of channels to one-fourth of its original size. Then, BN and the ReLU activation function are applied for nonlinear transformation. Next, the number of channels is restored to the original dimension, C, through a second 7 × 7 convolutional layer. Finally, the spatial attention map, M s , is generated via the sigmoid activation function. The feature map, F s h u f f l e , after channel rearrangement, is multiplied element-wise with the spatial attention map, M s , to obtain the final output feature map, F s p a t i a l . The mathematical formula is shown in (14).
In summary, the CSAMix module integrates channel attention, channel rearrangement, and spatial attention mechanisms. Without significantly increasing the number of parameters or computational complexity, it enables cross-channel feature interaction and spatial context enhancement, making it suitable for the thermal image classification task of aluminum foil sealing.
F ∈ ℝ^(C×H×W) → F̃ ∈ ℝ^(W×H×C)
F_channel = σ(MLP(F̃)) ⊗ F
F_spatial = σ(f^(7×7)(f^(7×7)(F_shuffle))) ⊗ F_shuffle
where σ(·) denotes the sigmoid activation function; ⊗ represents element-wise multiplication; F refers to the input feature map; F̃ denotes the dimensionally permuted version of F, with MLP(F̃) permuted back to ℝ^(C×H×W) before the sigmoid is applied; F_channel denotes the channel-refined feature map, obtained by performing element-wise multiplication between the input feature map, F, and the channel attention map, M_c; F_shuffle refers to the feature map after group-wise channel shuffle; and F_spatial denotes the final output of the spatial attention branch, obtained by performing element-wise multiplication between the shuffled feature map, F_shuffle, and the spatial attention map, M_s.
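The three stages described above can be sketched in PyTorch as follows. The MLP reduction ratio is an assumption (the text specifies a one-fourth bottleneck only for the spatial branch), and other hidden sizes are illustrative.

```python
import torch
import torch.nn as nn

def channel_shuffle(x, groups=4):
    """Four-group channel shuffle via tensor reshaping; no parameters added."""
    b, c, h, w = x.shape
    return (x.view(b, groups, c // groups, h, w)
             .transpose(1, 2).reshape(b, c, h, w))

class CSAMix(nn.Module):
    """Sketch of CSAMix: channel attention via an MLP after a C,H,W -> W,H,C
    permutation (Eqs. 12-13), a 4-group channel shuffle, then spatial
    attention from two 7x7 convs with a 4x channel bottleneck (Eq. 14)."""
    def __init__(self, ch, reduction=4):  # reduction=4 is an assumption
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(ch, ch // reduction), nn.ReLU(),
                                 nn.Linear(ch // reduction, ch))
        self.spatial = nn.Sequential(
            nn.Conv2d(ch, ch // 4, 7, padding=3),
            nn.BatchNorm2d(ch // 4), nn.ReLU(),
            nn.Conv2d(ch // 4, ch, 7, padding=3))

    def forward(self, F):
        Ft = F.permute(0, 3, 2, 1)                    # C,H,W -> W,H,C
        Mc = torch.sigmoid(self.mlp(Ft)).permute(0, 3, 2, 1)  # back to C,H,W
        F_channel = Mc * F                            # Eq. (13)
        F_shuffle = channel_shuffle(F_channel, groups=4)
        Ms = torch.sigmoid(self.spatial(F_shuffle))
        return Ms * F_shuffle                         # Eq. (14)
```

The shuffle is a pure reindexing of channels, which is why it adds interaction between the four groups at zero parameter cost.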

4. Analysis and Discussion of Experimental Results

4.1. Experimental Platform and Dataset

To acquire thermal infrared images of aluminum foil sealing defects, an experimental platform was established, consisting of a conveyor belt, a sealing machine, and an infrared camera, as shown in Figure 6. High-density polyethylene (HDPE) bottles were used as the sealed containers, and the sealing process was based on the principle of electromagnetic induction heating [45], as illustrated in Figure 7. The thermal camera captured thermal infrared images at a resolution of 640 × 480 for eight sealing conditions, namely, aluminum foil defect, double-layer aluminum foil, loose cap, low temperature, no aluminum foil, overheating, reversed aluminum foil, and proper sealing, as illustrated in Figure 8.
The loose cap category, while not a defect of the aluminum foil material itself, was included in the dataset due to its practical relevance in sealing operations and its direct impact on packaging reliability. An insufficiently tightened cap reduces the applied sealing pressure during the induction sealing process, thereby limiting the effective contact between the heated aluminum foil and the container rim. This inadequate pressure hinders complete melting and bonding of the polymer layer, potentially leading to weak adhesion or incomplete sealing, often referred to in industrial practice as false sealing. Containers with loose caps are prone to leakage during handling and storage. Moreover, the reduced contact area alters local heat transfer, producing characteristic anomalies in the thermal distribution pattern that can be effectively captured via infrared thermography. Including this class is therefore consistent with our study objective of evaluating sealing quality under realistic operating conditions.

4.2. Experimental Setup

The infrared image dataset used in this study consisted of 800 original samples, covering eight types of sealing conditions: aluminum foil defect, double-layer aluminum foil, loose cap, low temperature, no aluminum foil, overheating, reversed aluminum foil, and proper sealing. Each image in the dataset was annotated according to a predefined correspondence between image identifiers and defect categories. The eight classes were encoded into unique numerical labels for model training. The finalized annotations were stored in a structured annotation format linking each image to its corresponding defect type, ensuring consistent and reproducible labeling across the dataset. The dataset was split into a training set and a test set in a 6:4 ratio, ensuring a balanced class distribution in both subsets to meet the basic requirements for model training and evaluation.
Due to the limited sample size in the original training set, a common data augmentation strategy was applied during the training phase, implemented through the Albumentations [46] image processing library, in order to enhance the model’s stability and feature generalization capabilities under conditions of limited samples. The data augmentation pipeline included random horizontal flipping (probability = 0.5), brightness and contrast adjustment (probability = 0.5), small-angle rotation (±15°, probability = 0.5), Gaussian blur (probability = 0.2), and color jitter (probability = 0.3). These operations were applied sequentially according to their specified probabilities to simulate a broader range of potential variations in the training data. Through these augmentation operations, the training set was expanded to 4800 images, thereby increasing sample diversity. The test set was left unchanged to ensure objective model evaluation.
The experiment was conducted on a computing platform equipped with an NVIDIA RTX 3090Ti GPU (24 GB VRAM). The training process was based on PyTorch [47] 2.0.1 and torchvision [48] 0.15.2, within the Python 3.9 and CUDA 11.8 environments. EfficientNet-B0 was selected as the base network, with a fixed input size of 160 × 160. Training was performed with a batch size of 64, an initial learning rate of 0.00001, the Adam optimizer, and 100 training epochs.

4.3. Experimental Results and Analysis

4.3.1. Evaluation Metrics

To evaluate the classification performance and training effectiveness of the model, the following metrics were adopted:
(1) Accuracy
Accuracy [49] is used to measure the overall correctness of the model’s predictions. It is defined as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
(2) Precision
Precision [49] is the proportion of correctly predicted positive samples among all samples predicted as positive:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
where TP and TN denote the numbers of correctly predicted positive and negative samples, respectively, while FP and FN represent the numbers of incorrectly predicted positive and negative samples.
This study adopts Macro Precision as the primary precision metric, calculated by averaging the precision scores across all categories. This metric provides a balanced evaluation of the model’s overall performance in multi-class classification tasks.
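Both metrics can be computed directly from the predictions. A dependency-free sketch of accuracy and macro precision as defined above:

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_precision(y_true, y_pred, n_classes):
    """Per-class precision TP / (TP + FP), averaged over all classes."""
    scores = []
    for c in range(n_classes):
        # true labels of every sample the model assigned to class c
        assigned = [t for t, p in zip(y_true, y_pred) if p == c]
        if assigned:  # guard against division by zero for unpredicted classes
            scores.append(sum(t == c for t in assigned) / len(assigned))
        else:
            scores.append(0.0)
    return sum(scores) / n_classes

# Toy 3-class example
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
acc = accuracy(y_true, y_pred)           # 4 of 6 correct
mp = macro_precision(y_true, y_pred, 3)  # mean of 1/2, 2/3, and 1
```

Averaging the per-class precisions, rather than pooling all predictions, weights each category equally regardless of its sample count.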

4.3.2. Comparative Experiment

To assess the efficacy of the proposed EAC-Net model, several representative convolutional neural networks were selected for comparison, including MobileNetV2 [50], MobileNetV3 [51], ShuffleNetV2 [52], EfficientNet-B0 [53], GoogleNet [53], ResNet101 [54], and RegNetX [55]. All models were evaluated on the same thermal image classification task using a fixed input resolution of 160 × 160 × 3. Evaluation metrics included FLOPs, parameter count, model size, inference time, accuracy, and precision. The comparative results are summarized in Table 1.
These models were selected from widely used convolutional neural network architectures, which have demonstrated effectiveness in image classification tasks and have been applied to thermal imaging or related industrial inspection. To ensure fairness, all methods adopted a unified preprocessing pipeline: input images were resized to 160 × 160 and normalized with the dataset-specific mean and standard deviation computed from the original training set.
EfficientNet-B0 achieved an accuracy of 97.19% and a precision of 97.21%, offering a balanced trade-off between performance and computational cost. MobileNetV2, despite having the smallest model size and the fastest inference time, yielded the lowest accuracy and precision among the compared models, at 95.31% and 95.46%, respectively. MobileNetV3 achieved 96.67% accuracy and 96.91% precision, slightly outperforming MobileNetV2 but with a larger model size. ShuffleNetV2 obtained 96.54% accuracy and 96.76% precision. GoogleNet achieved 96.82% accuracy and 96.91% precision with moderate complexity. ResNet101 reached 97.81% accuracy and 97.93% precision but incurred substantially higher computational costs. RegNetX maintained a favorable balance between performance and complexity, reaching 98.12% accuracy and 98.25% precision.
EfficientNet-B0 was selected as the backbone network after a quantitative comparison with several commonly used architectures in industrial deep learning, as summarized in Table 1. Among them, while MobileNetV3 and ShuffleNetV2 achieve low computational costs, EfficientNet-B0 attains a relatively higher classification accuracy while maintaining low computational complexity, making it a balanced and appropriate choice for real-time industrial deployment.
The proposed EAC-Net achieved an accuracy of 99.06% and a precision of 99.07%, with a parameter count of 4.40 M and an inference time of 20.62 ms, outperforming all comparative models. These results demonstrate that EAC-Net delivers substantially enhanced discriminative capability while maintaining a compact architecture and high operational efficiency, indicating strong potential for practical industrial deployment.

4.3.3. Ablation Study

To evaluate the effectiveness of the proposed modules in aluminum foil sealing defect classification, an ablation study was conducted using EfficientNet-B0 as the backbone. ASPP and CSAMix were sequentially introduced to construct multiple model variants. Classification performance and model complexity were then compared under different module configurations. To further assess the contribution of atrous convolution, the atrous convolution in ASPP was replaced with standard convolution, forming the Convolutional Spatial Pyramid Pooling (ConvSPP) module. CBAM was further introduced to investigate the individual and combined effects of each sub-module. Table 2 summarizes the FLOPs, parameter count (Params), and model size (Size) for each model variant, while Figure 9 presents the corresponding accuracy (Acc), precision (P), and inference time (Time).
The experimental results show that the baseline EfficientNet-B0 achieved 97.19% accuracy and 97.21% precision without structural modifications. Incorporating the ConvSPP module improved the accuracy and precision to 98.23% and 98.35%, while replacing ConvSPP with ASPP further enhanced these metrics to 98.56% and 98.67%, respectively, due to the enlarged receptive field provided by atrous convolutions. Attention mechanisms also contributed positively. Integrating CBAM yielded 98.12% accuracy and 98.25% precision. Replacing CBAM with the proposed CSAMix module improved the accuracy to 98.44% and precision to 98.52%, with only a marginal increase in inference time.
Combining attention with multi-scale modules provided additional gains. The CBAM + ConvSPP configuration achieved 98.69% accuracy and 98.75% precision, while CBAM + ASPP reached 98.82% and 98.87%, respectively. The final EAC-Net model, integrating both ASPP and CSAMix, delivered the best performance: 99.06% accuracy and 99.07% precision, with an inference time of 20.62 ms. This represents an improvement of 1.87% in accuracy and 1.86% in precision over the baseline, confirming the effectiveness of the proposed combination of modules.
To further validate the discriminative performance of the proposed EAC-Net model under multi-category sealing conditions, the confusion matrix on the test set is visualized in Figure 10. For clarity, the eight sealing status categories are denoted as F1–F8, where F1 refers to aluminum foil defect, F2 to double-layer aluminum foil, F3 to loose cap, F4 to low temperature, F5 to no aluminum foil, F6 to proper sealing, F7 to overheated, and F8 to reversed aluminum foil.
As shown in Figure 10, the normalized confusion matrix presents the classification results in percentage form, where diagonal values represent the True-Positive Rate (TPR) for each class; the mathematical formula is shown in (17), and off-diagonal values indicate the misclassification rate. The EAC-Net model achieves a high TPR across most sealing categories. This performance is attributed to the ASPP module’s ability to integrate multi-scale contextual information and the CSAMix module’s enhancement of critical regional features. The combination of these two modules substantially improves the model’s recognition capability and stability under diverse sealing defect conditions.
$$\mathrm{TPR} = \frac{TP}{TP + FN} \times 100\%$$
where TP denotes the number of correctly predicted positive samples (true positives), and FN denotes the number of actual positive samples incorrectly predicted as negative (false negatives).
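A row-normalized confusion matrix of this form, with per-class TPR on the diagonal and misclassification rates off the diagonal, can be sketched as:

```python
import numpy as np

def normalized_confusion(y_true, y_pred, n_classes):
    """Row-normalized confusion matrix in percent: entry (i, j) is the share
    of class-i samples predicted as class j; the diagonal holds each class's
    TPR as defined in (17)."""
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm / cm.sum(axis=1, keepdims=True) * 100

# Toy two-class example: 3 of 4 samples correct in each class -> TPR = 75%
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0]
cm = normalized_confusion(y_true, y_pred, 2)
```

Normalizing each row by the number of true samples in that class makes the matrix comparable across categories with different sample counts.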

4.4. Discussion

4.4.1. Robustness Evaluation Under Gaussian Noise Perturbation

To evaluate the model’s robustness under sensor-induced random disturbances, Gaussian noise [56] (σ = 5) was added to the test set while keeping all other conditions unchanged.
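The perturbation can be sketched as additive zero-mean Gaussian noise applied to 8-bit thermal images; the seeding and clipping choices below are assumptions, not the authors' reported procedure:

```python
import numpy as np

def add_gaussian_noise(image, sigma=5.0, seed=0):
    """Add zero-mean Gaussian noise (std = sigma) to an 8-bit image,
    clipping back to the valid [0, 255] intensity range."""
    rng = np.random.default_rng(seed)
    noisy = image.astype(np.float32) + rng.normal(0.0, sigma, image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

clean = np.full((160, 160, 3), 128, dtype=np.uint8)  # stand-in test image
noisy = add_gaussian_noise(clean, sigma=5.0)
```

Applying this to every test image while leaving the models untouched isolates the effect of sensor-like random disturbances on recognition performance.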
As illustrated in Figure 11, all models experienced performance degradation under Gaussian noise (σ = 5), to varying degrees. The baseline model (M1) showed the most notable drop, with its accuracy falling from 97.19% to 96.38% and precision from 97.21% to 96.45%, indicating high sensitivity to noise perturbations. Models M2 through M7 demonstrated improved robustness due to the integration of modules such as ConvSPP, ASPP, CBAM, and CSAMix, exhibiting smaller declines in both metrics. Among them, the EAC-Net model showed the best noise resistance, with accuracy decreasing from 99.06% to 98.79% and precision from 99.07% to 98.84%, reductions of only 0.27 and 0.23 percentage points, respectively. These results confirm that the combination of ASPP and CSAMix enhances anti-noise capability.

4.4.2. Analysis of Discriminative Feature Localization Through Grad-CAM

Gradient-Weighted Class Activation Mapping (Grad-CAM) [57,58] is a visualization method that presents the attention regions of a convolutional neural network as heatmaps. Grad-CAM computes the gradients of the target class score with respect to the feature maps of the final convolutional layer, highlighting the response intensity of key regions in the model's decision-making process. A pseudo-color scheme represents the response intensity, where yellow–red areas indicate strong gradient responses and blue areas denote weak responses.
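The procedure can be sketched on a toy CNN standing in for the network's final convolutional stage; `TinyNet` and all shapes below are illustrative, not EAC-Net itself:

```python
import torch
from torch import nn

class TinyNet(nn.Module):
    """Toy CNN: one conv stage followed by pooling and a classification head."""
    def __init__(self, n_classes=8):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.head = nn.Linear(16, n_classes)

    def forward(self, x):
        fmap = self.features(x)          # B x 16 x H x W feature maps
        pooled = fmap.mean(dim=(2, 3))   # global average pooling
        return self.head(pooled), fmap

torch.manual_seed(0)
model = TinyNet().eval()
x = torch.randn(1, 3, 160, 160)
logits, fmap = model(x)
fmap.retain_grad()                       # keep gradients of a non-leaf tensor
logits[0, logits.argmax()].backward()    # backprop from the top class score

weights = fmap.grad.mean(dim=(2, 3), keepdim=True)        # channel importance
cam = torch.relu((weights * fmap).sum(dim=1)).squeeze(0)  # weighted sum + ReLU
cam = cam / (cam.max() + 1e-8)           # normalize to [0, 1] for the heatmap
```

The normalized map is then resized to the input resolution and overlaid on the thermal image with a pseudo-color scheme to produce the heatmaps shown in Figure 12.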
Figure 12 illustrates the Grad-CAM visualization results for eight types of sealing conditions, showing the original thermal image (a), the response of the baseline model EfficientNet-B0 (b), and the response of the enhanced model EAC-Net (c). Overall, the visualization results reveal that EfficientNet-B0 tends to produce more scattered and less focused activation regions, which may limit its ability to localize critical areas. In contrast, EAC-Net exhibits more concentrated and discriminative activation responses, suggesting improved localization capability. This improvement is attributed to the integration of the ASPP and CSAMix modules, which enhance the model’s attention to key regions. The effectiveness of the model in identifying critical regions is further demonstrated by the Grad-CAM visualization.

4.4.3. Analysis of Applicability to Different Defect Types

In the experiments, not all sealing defects could be clearly observed with the naked eye, as some defects only exhibited slight differences in thermal distribution. However, the proposed EAC-Net was able to effectively capture these features and achieve accurate recognition, demonstrating greater stability compared to manual inspection. The method is mainly applied to detect surface sealing defects manifested as abnormal thermal contrast. In some industrial processes, disposable bottle caps cannot be opened for manual inspection, making human verification impractical. In such cases, the proposed method may provide valuable non-destructive and automated detection capabilities.
For hidden subsurface defects such as trapped inclusions and air bubbles, the proposed method has not yet been evaluated. From a heat-transfer perspective, their thermal distribution characteristics may resemble those of the defects investigated in this study to some extent, which implies a potential applicability of the proposed method for detecting such defects. However, their actual behavior is influenced by factors such as impurity composition, bubble size, and spatial distribution. Such variability makes these defects more complex than those considered in this study. Consequently, in the future, further systematic studies are required to validate the applicability of the proposed method to these defect types and to achieve a more comprehensive evaluation of sealing quality.

4.4.4. Feasibility Analysis of Real-Time Industrial Deployment

Experimental results show that EAC-Net achieves high detection accuracy with relatively low computational complexity and a short inference time. These features indicate that the method has the potential to meet real-time operational requirements in industrial environments. The model can operate on high-performance GPUs, and its relatively low computational overhead also indicates the potential for deployment on embedded hardware platforms, thereby offering application prospects in industrial scenarios with limited computing resources.
In terms of industrial applicability, the method has potential application value in scenarios such as beverage filling production lines and pharmaceutical packaging processes, where aluminum foil sealing is widely used to ensure product integrity and safety. In such industrial online inspection systems, several practical constraints usually need to be considered comprehensively: Firstly, throughput is critical because production lines generally operate at high processing speeds. Secondly, delays must be controllable to ensure that the inference process can meet real-time detection requirements. Thirdly, cost factors are equally crucial. Detection systems typically need to maintain a simple structure, be compatible with existing hardware, and minimize additional deployment costs.
A feasible industrial deployment process can be described as follows: after the sealing process is completed, a thermal imager captures images of sealed containers. These images are processed by the EAC-Net model deployed on an industrial PC or an embedded GPU platform for rapid inference. The classification results are then transmitted to the production-line control system via a programmable logic controller (PLC), enabling the automatic sorting or rejection of defective products. Compared with manual inspection, this process has the advantages of simplified operation, lower cost, and higher efficiency, thereby demonstrating considerable potential for application.
No semi-realistic or production-line validation has been conducted at this stage. Future research will further involve evaluations under conditions closer to real production environments to assess the robustness and applicability of the method.

4.4.5. Safety and Reliability Considerations in Industrial Deployment

In real-world manufacturing environments, aluminum foil sealing defects are a major safety concern, as they can compromise product integrity and lead to spoilage, contamination, or other hazards. This is particularly critical in sectors such as food, pharmaceuticals, and hazardous materials, where undetected defects can result in consumer health risks, product recalls, and financial losses. The essential role of defect detection is to identify and remove defective items from the production line before they are distributed, thus improving the overall yield of qualified, safe products and reducing safety risks. While the present study primarily evaluates classification accuracy and computational efficiency, these outcomes have direct practical safety relevance: a highly accurate detection model reduces the likelihood of missed defects, enhancing the mitigation of associated safety risks. Although comprehensive safety validation was not performed here, future work may explore multi-stage verification and additional quality control measures to further enhance defect detection reliability in industrial settings.

4.4.6. Conceptual Comparative Analysis of Physics-Based, Hybrid, and Data-Driven TNDT Approaches

Within thermographic non-destructive testing (TNDT), purely data-driven models can deliver high classification accuracy and low computational cost under the evaluated conditions. In contrast, physics-based methods—such as step-heating laser thermography for mechanical property evaluation [26], eddy-current pulsed thermography for defect–background separation [27], and multi-layer thermal modeling for quantitative parameter estimation [28]—offer strong physical interpretability, as their outputs can be directly related to material properties, but often require longer acquisition times and higher computational resources.
Hybrid strategies could provide complementary benefits by combining the efficiency of data-driven models with the interpretability of physics-based methods. Incorporating physically derived features or constraints into network training may improve consistency with heat-transfer mechanisms and enhance generalization across different heating patterns, materials, and boundary conditions.
Sensitivity to environmental and system conditions is a general concern in TNDT. In industrial settings, fluctuations in operating parameters may affect thermal image quality and detection reliability; thus, preprocessing, adaptive modeling, and calibration procedures are often employed to mitigate such effects and enhance robustness.
In summary, while data-driven approaches such as the proposed EAC-Net excel in efficiency and classification performance, physics-based and hybrid methods provide advantages in interpretability and parameter estimation. The choice of method should be guided by application-specific requirements, balancing the need for interpretability, parameter estimation, computational efficiency, and real-time performance. Future research may explore integrating physically informed constraints into deep learning frameworks to combine the strengths of both paradigms.
Based on the above discussion, the specific positioning of EAC-Net within this landscape can be summarized as follows: In the context of thermographic non-destructive testing, the primary advantages of EAC-Net lie in its high classification accuracy, low computational cost, and applicability to real-time inspection. Its limitations are mainly associated with the lack of direct physical interpretability and potential sensitivity to variations in acquisition conditions. The main innovation of EAC-Net lies in its tailored network architecture, combining EfficientNet-B0 with ASPP and CSAMix modules to exploit multi-scale thermal features and channel–spatial interactions, enabling efficient yet accurate defect detection.

5. Conclusions

This study proposed an enhanced classification framework, called EAC-Net, for defect recognition in the aluminum foil sealing process. The model was developed on the basis of the EfficientNet-B0 backbone and integrates an ASPP module to improve the extraction of thermal features under conditions of uneven heat distribution and blurred boundaries. This module enables the network to capture defect features across multiple receptive fields, thereby improving its adaptability to different thermal distribution patterns. To further enhance the model's ability to focus on key regions, the study incorporated both channel and spatial attention mechanisms, along with a channel shuffle strategy, to construct a novel attention module named CSAMix. This module strengthens information interactions across feature dimensions and improves the model's attention to critical thermal areas. Experimental results on thermal infrared images of aluminum foil sealing defects demonstrated that EAC-Net outperforms all comparative models in terms of classification accuracy and precision, confirming the effectiveness of the proposed module design. In addition, EAC-Net maintains low computational costs with 4.40 × 10⁶ parameters and 0.25 GFLOPs. It achieves an average inference time of 20.62 ms per image and a model size of 16.77 MB, making it feasible for deployment and application in industrial scenarios.
Although the proposed method showed strong performance in terms of classification accuracy and model efficiency, several issues require further work. First, model lightweighting is an inevitable trend in industrial applications, so complex deep models need to be further simplified to achieve high efficiency and low cost. Second, there remains room for optimization in the training process of the proposed method. Finally, building on the limitations identified in the discussion, future work will extend the method to additional defect types, including trapped inclusions and air bubbles, and explore model compression strategies to enable deployment on embedded systems, further enhancing its industrial applicability.

Author Contributions

Conceptualization, Z.H. and Y.C.; methodology, Z.H.; software, Z.H.; validation, Z.H. and Y.C.; formal analysis, Z.H.; investigation, Z.H.; resources, Z.Y.; data curation, Z.H.; writing—original draft, Z.H.; writing—review and editing, Y.C.; visualization, Z.H.; supervision, Y.C.; project administration, Y.C.; funding acquisition, Y.Q. and L.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Open Project of Guangdong Provincial Key Laboratory of Manufacturing Equipment Digitization, the Natural Science Foundation of Hubei Province (Grant No. 2024AFB259), and the Open Project of the National Key Laboratory of Intelligent Manufacturing Equipment and Technology (Grant No. IMETKF2023011).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are not publicly available due to confidentiality and privacy restrictions. However, the data may be made available from the corresponding author upon reasonable request.

Acknowledgments

The authors would like to express their sincere gratitude to the faculty members of the School of Mechanical Engineering and Automation, Wuhan Textile University, for providing the infrared imaging equipment used in the experiments and valuable academic guidance during the course of this research. The authors also wish to thank Zhongbao Xu from Wuhan Dongtai Borui Automation Equipment Co., Ltd., for his technical assistance and provision of essential experimental equipment.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ELM: Extreme Learning Machine
SVM: Support Vector Machine
CSAMix: Channel–Spatial Attention Mixing with Channel Shuffle
MBConv: Mobile Inverted Bottleneck Convolution
DS Conv: Depthwise Separable Convolution
SE: Squeeze-and-Excitation
DW Conv: Depthwise Convolution
PW Conv: Pointwise Convolution
BN: Batch Normalization
FC: Fully Connected
ASPP: Atrous Spatial Pyramid Pooling
ConvSPP: Convolutional Spatial Pyramid Pooling
CBAM: Convolutional Block Attention Module
CAM: Channel Attention Module
SAM: Spatial Attention Module
MLP: Multi-Layer Perceptron
EAC-Net: EfficientNet with Atrous Spatial Pyramid Pooling and Channel–Spatial Attention Mixing with Channel Shuffle
TPR: True-Positive Rate
TNDT: Thermographic Non-Destructive Testing
Grad-CAM: Gradient-Weighted Class Activation Mapping

References

  1. Alamri, M.S.; Qasem, A.A.A.; Mohamed, A.A.; Hussain, S.; Ibraheem, M.A.; Shamlan, G.; Alqah, H.A.; Qasha, A.S. Food packaging’s materials: A food safety perspective. Saudi J. Biol. Sci. 2021, 28, 4490–4499. [Google Scholar] [CrossRef] [PubMed]
  2. Kumar, G. Pharmaceutical drug packaging and traceability: A comprehensive review. Univers. J. Pharm. Pharmacol. 2023, 2, 19–25. [Google Scholar] [CrossRef]
  3. Ren, Z.; Fang, F.; Yan, N.; Wu, Y. State of the Art in Defect Detection Based on Machine Vision. Int. J. Precis. Eng. Manuf.-Green. Technol. 2022, 9, 661–691. [Google Scholar] [CrossRef]
  4. Sundaram, S.; Zeid, A. Artificial Intelligence-Based Smart Quality Inspection for Manufacturing. Micromachines 2023, 14, 570. [Google Scholar] [CrossRef]
  5. Saberironaghi, A.; Ren, J.; El-Gindy, M. Defect Detection Methods for Industrial Products Using Deep Learning Techniques: A Review. Algorithms 2023, 16, 95. [Google Scholar] [CrossRef]
  6. Guillot, V. Infrared Thermography For Seal Defects Detection On Packaged Products: Unbalanced Machine Learning Classification With Iterative Digital Image Restoration. ELCVIA Electron. Lett. Comput. Vis. Image Anal. 2023, 22, 35–51. [Google Scholar] [CrossRef]
  7. Hou, F.; Zhang, Y.; Zhou, Y.; Zhang, M.; Lv, B.; Wu, J. Review on infrared imaging technology. Sustainability 2022, 14, 11161. [Google Scholar] [CrossRef]
  8. Pakarinen, T.; Joutsen, A.; Oksala, N.; Vehkaoja, A. Assessment of chronic limb threatening ischemia using thermal imaging. J. Therm. Biol. 2023, 112, 103467. [Google Scholar] [CrossRef]
  9. Gupta, T.; Sreedevi, I.; Jindal, R. Segmenting hotspots from medical thermal images using Density-based modified FC-PcFS with spatial information. Quant. InfraRed Thermogr. J. 2024, 22, 368–404. [Google Scholar] [CrossRef]
  10. Jeffali, F.; Ouariach, A.; El Kihel, B.; Nougaoui, A. Diagnosis of three-phase induction motor and the impact on the kinematic chain using non-destructive technique of infrared thermography. Infrared Phys. Technol. 2019, 102, 102970. [Google Scholar] [CrossRef]
  11. Zhao, X.; Zhao, Y.; Hu, S.; Wang, H.; Zhang, Y.; Ming, W. Progress in active infrared imaging for defect detection in the renewable and electronic industries. Sensors 2023, 23, 8780. [Google Scholar] [CrossRef]
  12. Zhou, Y.; Li, W.; Zhang, Y. Aluminum foil seal sealing property detection method based on thermal image feature extraction. Chin. J. Electron. Devices 2019, 42, 551–556. [Google Scholar] [CrossRef]
  13. Zhao, S.; Li, W.; Shi, C. The research of tightness detection method of aluminum foil seal based on optimized BP neural network algorithm. J. Liaoning Univ. Pet. Chem. Technol. 2019, 39, 97–100. [Google Scholar] [CrossRef]
  14. Liu, Y.; Xu, K.; Xu, J. An Improved MB-LBP Defect Recognition Approach for the Surface of Steel Plates. Appl. Sci. 2019, 9, 4222. [Google Scholar] [CrossRef]
  15. Fejér, A.; Nagy, Z.; Benois-Pineau, J.; Szolgay, P.; de Rugy, A.; Domenger, J.-P. Implementation of Scale Invariant Feature Transform detector on FPGA for low-power wearable devices for prostheses control. Int. J. Circuit Theory Appl. 2021, 49, 2255–2273. [Google Scholar] [CrossRef]
  16. Chan, S.; Li, S.; Zhang, H.; Zhou, X.; Mao, J.; Hong, F. Feature optimization-guided high-precision and real-time metal surface defect detection network. Sci. Rep. 2024, 14, 31941. [Google Scholar] [CrossRef] [PubMed]
  17. Wang, W.; Mi, C.; Wu, Z.; Lu, K.; Long, H.; Pan, B.; Li, D.; Zhang, J.; Chen, P.; Wang, B. A Real-Time Steel Surface Defect Detection Approach With High Accuracy. IEEE Trans. Instrum. Meas. 2022, 71, 5005610. [Google Scholar] [CrossRef]
  18. Liu, H.; Bao, C.; Xie, T.; Gao, S.; Song, X.; Wang, W. Research on the intelligent diagnosis method of the server based on thermal image technology. Infrared Phys. Technol. 2019, 96, 390–396. [Google Scholar] [CrossRef]
  19. Chen, C.; Chandra, S.; Han, Y.; Seo, H. Deep learning-based thermal image analysis for pavement defect detection and classification considering complex pavement conditions. Remote Sens. 2021, 14, 106. [Google Scholar] [CrossRef]
  20. Low, E.S.; Ong, P.; Sim, J.Q.; Sia, C.K.; Ismon, M. Integrating deep learning with non-destructive thermal imaging for precision guava ripeness determination. J. Sci. Food Agric. 2024, 104, 7843–7853. [Google Scholar] [CrossRef]
  21. Pak, P.; Ogoke, F.; Polonsky, A.; Garland, A.; Bolintineanu, D.S.; Moser, D.R.; Arnhart, M.; Madison, J.; Ivanoff, T.; Mitchell, J.; et al. ThermoPore: Predicting part porosity based on thermal images using deep learning. Addit. Manuf. 2024, 95, 104503. [Google Scholar] [CrossRef]
  22. Cui, C.; Zan, T.; Ma, S.; Sun, T.; Lu, W.; Gao, X. Thermal image-driven thermal error modeling and compensation in CNC machine tools based on deep attentional residual network. Int. J. Adv. Manuf. Technol. 2024, 134, 3153–3169. [Google Scholar] [CrossRef]
  23. Liu, X.; Tian, M.; Wang, Y. Mechanical strength recognition and classification of thermal protective fabric images after thermal aging based on deep learning. Int. J. Occup. Saf. Ergon. 2024, 30, 765–773. [Google Scholar] [CrossRef]
  24. Reza, M.N.; Ali, M.R.; Samsuzzaman; Kabir, M.S.N.; Karim, M.R.; Ahmed, S.; Kyoung, H.; Kim, G.; Chung, S.-O. Thermal imaging and computer vision technologies for the enhancement of pig husbandry: A review. J. Anim. Sci. Technol. 2024, 66, 31–56. [Google Scholar] [CrossRef]
  25. Wanqing, L.; Qinghua, Z.; Xiaowei, Z. Deep learning based heat sealing quality inspection. In Proceedings of the 2021 9th International Symposium on Next Generation Electronics (ISNE), Changsha, China, 9–11 July 2021; pp. 1–4. [Google Scholar]
  26. Dell’Avvocato, G.; Rashkovets, M.; Mancini, E.; Contuzzi, N.; Casalino, G.; Palumbo, D.; Galietti, U. Innovative non-destructive thermographic evaluation of mechanical properties in dissimilar aluminium probeless friction stir spot welded (P-FSSW) joints. Eng. Fail. Anal. 2025, 177, 109675. [Google Scholar] [CrossRef]
  27. Wang, D.; Yi, Q.; Liu, Y.; Lu, R.; Tian, G. Advanced detection and reconstruction of welding defects in irregular geometries using eddy current pulsed thermography. NDT E Int. 2025, 154, 103398. [Google Scholar] [CrossRef]
  28. Simmen, K.; Buch, B.; Breitbarth, A.; Notni, G. Non-destructive inspection system for MAG welding processes by combining multimodal data. Quant. InfraRed Thermogr. J. 2021, 18, 1–17. [Google Scholar] [CrossRef]
  29. Bereciartua-Perez, A.; Duro, G.; Echazarra, J.; González, F.J.; Serrano, A.; Irizar, L. Deep Learning-Based Method for Accurate Real-Time Seed Detection in Glass Bottle Manufacturing. Appl. Sci. 2022, 12, 11192. [Google Scholar] [CrossRef]
  30. Lema, D.G.; Usamentiaga, R.; García, D.F. Quantitative comparison and performance evaluation of deep learning-based object detection models on edge computing devices. Integration 2024, 95, 102127. [Google Scholar] [CrossRef]
  31. Shin, D.-J.; Kim, J.-J. A Deep Learning Framework Performance Evaluation to Use YOLO in Nvidia Jetson Platform. Appl. Sci. 2022, 12, 3734. [Google Scholar] [CrossRef]
  32. Choe, C.; Choe, M.; Jung, S. Run Your 3D Object Detector on NVIDIA Jetson Platforms: A Benchmark Analysis. Sensors 2023, 23, 4005. [Google Scholar] [CrossRef]
  33. Ul Amin, S.; Sibtain Abbas, M.; Kim, B.; Jung, Y.; Seo, S. Enhanced anomaly detection in pandemic surveillance videos: An attention approach with EfficientNet-B0 and CBAM integration. IEEE Access 2024, 12, 162697–162712. [Google Scholar] [CrossRef]
  34. Abd El-Ghany, S.; Mahmood, M.A.; Abd El-Aziz, A.A. Adaptive dynamic learning rate optimization technique for colorectal cancer diagnosis based on histopathological image using EfficientNet-B0 deep learning model. Electronics 2024, 13, 3126. [Google Scholar] [CrossRef]
  35. Sun, X.; Huo, H. Corn leaf disease recognition based on improved EfficientNet. IET Image Process 2025, 19, e13288. [Google Scholar] [CrossRef]
  36. Zhang, Y.-D.; Pei, Y.; Górriz, J.M. SCNN: A explainable swish-based CNN and mobile app for COVID-19 diagnosis. Mob. Netw. Appl. 2023, 28, 1936–1949. [Google Scholar] [CrossRef]
  37. Dubey, S.R.; Singh, S.K.; Chaudhuri, B.B. Activation functions in deep learning: A comprehensive survey and benchmark. Neurocomputing 2022, 503, 92–108. [Google Scholar] [CrossRef]
  38. Tan, H.; Ou, D.; Zhang, L.; Shen, G.; Li, X.; Ji, Y. Infrared Sensation-Based Salient Targets Enhancement Methods in Low-Visibility Scenes. Sensors 2022, 22, 5835. [Google Scholar] [CrossRef]
  39. Zou, Y.; Fan, Y. An Infrared Image Defect Detection Method for Steel Based on Regularized YOLO. Sensors 2024, 24, 1674. [Google Scholar] [CrossRef]
  40. Sugai, T.; Takano, K.; Ouchi, S.; Ito, S. Introducing Swish and Parallelized Blind Removal Improves the Performance of a Convolutional Neural Network in Denoising MR Images. Magn. Reson. Med. Sci. 2021, 20, 410–424. [Google Scholar] [CrossRef]
  41. Cai, B.; Xu, Q.; Yang, C.; Lu, Y.; Ge, C.; Wang, Z.; Liu, K.; Qiu, X.; Chang, S. Spine MRI image segmentation method based on ASPP and U-Net network. Math. Biosci. Eng. 2023, 20, 15999–16014. [Google Scholar] [CrossRef]
  42. Ma, R.; Wang, J.; Zhao, W.; Guo, H.; Dai, D.; Yun, Y.; Li, L.; Hao, F.; Bai, J.; Ma, D. Identification of maize seed varieties using MobileNetV2 with improved attention mechanism CBAM. Agriculture 2023, 13, 11. [Google Scholar] [CrossRef]
  43. Jian, M.; Huang, H.; Zhang, H.; Wang, R.; Li, X.; Yu, H. CSSANet: A channel shuffle slice-aware network for pulmonary nodule detection. Neurocomputing 2025, 615, 128827. [Google Scholar] [CrossRef]
  44. Li, J.; Yang, R.; Cao, X.; Zeng, B.; Shi, Z.; Ren, W.; Cao, X. Inception MLP: A vision MLP backbone for multi-scale feature extraction. Inf. Sci. 2025, 701, 121865. [Google Scholar] [CrossRef]
  45. Mariani, A.; Malucelli, G. Insights into induction heating processes for polymeric materials: An overview of the mechanisms and current applications. Energies 2023, 16, 4535. [Google Scholar] [CrossRef]
  46. Zhong, M.; Li, Y.; Gao, Y. Research on small-target detection of flax pests and diseases in natural environment by integrating similarity-aware activation module and bidirectional feature pyramid network module features. Agronomy 2025, 15, 187. [Google Scholar] [CrossRef]
  47. Suhng, B.; Lee, W. Fast Zernike moment computation using PyTorch in a multiple-GPU environment. J. Electr. Eng. Technol. 2025, 20, 845–854. [Google Scholar] [CrossRef]
  48. Robles-Guerrero, A.; Gómez-Jiménez, S.; Saucedo-Anaya, T.; López-Betancur, D.; Navarro-Solís, D.; Guerrero-Méndez, C. Convolutional neural networks for real time classification of beehive acoustic patterns on constrained devices. Sensors 2024, 24, 6384. [Google Scholar] [CrossRef]
  49. Hicks, S.A.; Strümke, I.; Thambawita, V.; Hammou, M.; Riegler, M.A.; Halvorsen, P.; Parasa, S. On evaluation metrics for medical applications of artificial intelligence. Sci. Rep. 2022, 12, 5979. [Google Scholar] [CrossRef]
  50. Pozzer, S.; Rezazadeh Azar, E.; Dalla Rosa, F.; Chamberlain Pravia Zacarias, M. Semantic Segmentation of Defects in Infrared Thermographic Images of Highly Damaged Concrete Structures. J. Perform. Constr. Facil. 2021, 35, 04020131. [Google Scholar] [CrossRef]
  51. Tingting, Z.; Xunru, L.; Bohuan, X.; Xiaoyu, T. An in-vehicle real-time infrared object detection system based on deep learning with resource-constrained hardware. Intell. Robot. 2024, 4, 276–292. [Google Scholar] [CrossRef]
  52. Guo, C.; Ren, K.; Chen, Q. YOLO-SGF: Lightweight network for object detection in complex infrared images based on improved YOLOv8. Infrared Phys. Technol. 2024, 142, 105539. [Google Scholar] [CrossRef]
  53. Glowacz, A. Thermographic Fault Diagnosis of Shaft of BLDC Motor. Sensors 2022, 22, 8537. [Google Scholar] [CrossRef]
  54. Pan, P.; Zhang, R.; Zhang, Y.; Li, H. Detecting Internal Defects in FRP-Reinforced Concrete Structures through the Integration of Infrared Thermography and Deep Learning. Materials 2024, 17, 3350. [Google Scholar] [CrossRef] [PubMed]
  55. Santos, L.F.D.d.; Canuto, J.L.d.S.; Souza, R.C.T.d.; Aylon, L.B.R. Thermographic image-based diagnosis of failures in electrical motors using deep transfer learning. Eng. Appl. Artif. Intell. 2023, 126, 107106. [Google Scholar] [CrossRef]
  56. Li, J.; Zheng, W.X.; Yang, L. Fast converging algorithm for blind equalization with gaussian and impulsive noises. IEEE Trans. Signal Process 2025, 73, 372–385. [Google Scholar] [CrossRef]
  57. Wu, L.; Chen, H.; Li, P.; Yang, K. A novel ensemble approach for rib fracture detection and visualization using CNNs and Grad-CAM. Ann. Ital. Chir. 2025, 96, 86–97. [Google Scholar] [CrossRef] [PubMed]
  58. Gopalan, K.; Srinivasan, S.; Pragya; Singh, M.; Mathivanan, S.K.; Moorthy, U. Corn leaf disease diagnosis: Enhancing accuracy with resnet152 and grad-cam for explainable AI. BMC Plant Biol. 2025, 25, 440. [Google Scholar] [CrossRef]
Figure 1. The structure of the Mobile Inverted Bottleneck Convolutions (MBConv) module. (a) Overall structure. (b) The structure of the Depthwise Separable Convolution (DS Conv) module. (c) The structure of the Squeeze-and-Excitation (SE) module.
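The parameter savings behind the DS Conv block in Figure 1b can be illustrated with a small counting sketch (standard convolution vs. depthwise + pointwise decomposition); the channel sizes below are illustrative, not taken from the paper.

```python
def conv2d_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k

def ds_conv_params(c_in, c_out, k):
    """Depthwise separable conv: depthwise k x k + pointwise 1 x 1."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv mixing channels
    return depthwise + pointwise

# Example: 3 x 3 conv mapping 32 -> 64 channels
standard = conv2d_params(32, 64, 3)    # 18432 parameters
separable = ds_conv_params(32, 64, 3)  # 288 + 2048 = 2336 parameters
print(standard, separable, round(standard / separable, 1))
```

The roughly 8x reduction at this setting is what lets MBConv expand channels aggressively while staying lightweight.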
Figure 2. The structure of the Atrous Spatial Pyramid Pooling (ASPP) module. BN represents Batch Normalization; Rate denotes the dilation rate used in atrous convolutions; AdaptiveAvgPool refers to global average pooling; Bilinear Interpolate represents bilinear interpolation used for upsampling.
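The multi-scale perception that ASPP provides comes from dilation enlarging a kernel's spatial extent without adding parameters; a quick calculation shows the effect. The rates 6/12/18 below are the classic DeepLab choices used purely for illustration — the paper's actual rates are those shown in Figure 2.

```python
def dilated_kernel_extent(k, rate):
    """Spatial extent covered by a k x k kernel with dilation `rate`."""
    return k + (k - 1) * (rate - 1)

# A 3 x 3 kernel at increasing dilation rates
for rate in (1, 6, 12, 18):
    print(rate, dilated_kernel_extent(3, rate))
```

At rate 18 the same nine weights span a 37-pixel window, which is how parallel ASPP branches capture defect cues at several scales at once.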
Figure 3. The structure of the Convolutional Block Attention Module (CBAM) (a). It consists of a Channel Attention Module (b) and a Spatial Attention Module (c). The input feature, F, is sequentially refined by these two modules to generate the final refined feature, F″.
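The channel branch in Figure 3b can be sketched in NumPy: average- and max-pooled channel descriptors pass through a shared two-layer MLP and a sigmoid, producing per-channel weights. The random MLP weights and the reduction ratio here are stand-ins, not the paper's trained values.

```python
import numpy as np

def channel_attention(feat, reduction=8):
    """CBAM-style channel attention on a (C, H, W) feature map (sketch)."""
    c = feat.shape[0]
    rng = np.random.default_rng(0)
    w1 = rng.standard_normal((c // reduction, c)) * 0.1  # shared MLP, layer 1
    w2 = rng.standard_normal((c, c // reduction)) * 0.1  # shared MLP, layer 2
    avg = feat.mean(axis=(1, 2))                 # global average pooling -> (C,)
    mx = feat.max(axis=(1, 2))                   # global max pooling -> (C,)
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0)   # shared MLP with ReLU
    weights = 1.0 / (1.0 + np.exp(-(mlp(avg) + mlp(mx))))  # sigmoid -> (0, 1)
    return feat * weights[:, None, None]         # rescale each channel

feat = np.random.default_rng(1).random((16, 8, 8))
out = channel_attention(feat)
print(out.shape)
```

The spatial branch (Figure 3c) then repeats the idea across the spatial axes, pooling over channels instead.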
Figure 4. The architecture of the proposed EfficientNet with Atrous Spatial Pyramid Pooling and Channel–Spatial Attention Mixing with Channel Shuffle (EAC-Net) network.
Figure 5. The structure of the Channel–Spatial Attention Mixing with Channel Shuffle (CSAMix) module.
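The channel-shuffle step that gives CSAMix its cross-group information exchange is the standard reshape–transpose–flatten trick; a minimal sketch, with a toy 8-channel tensor for clarity:

```python
import numpy as np

def channel_shuffle(x, groups):
    """ShuffleNet-style channel shuffle on a (C, H, W) tensor."""
    c, h, w = x.shape
    assert c % groups == 0
    # (groups, C//groups, H, W) -> swap first two axes -> flatten back to (C, H, W)
    return x.reshape(groups, c // groups, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)

# Tag each channel with its index so the permutation is visible
x = np.arange(8)[:, None, None] * np.ones((8, 2, 2))
shuffled = channel_shuffle(x, groups=2)
print(shuffled[:, 0, 0])  # channels interleave across the two groups
```

After the shuffle, channels from different groups are interleaved, so a subsequent grouped or split operation sees information from every group.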
Figure 6. Experimental platform for aluminum foil sealing using thermal imaging, consisting primarily of the aluminum foil sealing machine (with an internal induction heating unit), a thermal camera, and a conveyor system for image acquisition.
Figure 7. Schematic diagram of the aluminum foil sealing process, illustrating the sequence from bottle preparation and capping (with the internal structure of the aluminum foil piece), through electromagnetic induction heating, to cooling and sealing completion. The color distribution indicates temperature variation, where darker colors correspond to relatively lower temperatures and brighter colors correspond to relatively higher temperatures.
Figure 8. Visible and infrared thermal images for eight aluminum foil sealing statuses: (a) aluminum foil defect—local cold spots; (b) double-layer aluminum foil—enlarged cool center; (c) loose cap—reduced thermal symmetry; (d) low temperature—dim image with weak ring; (e) no aluminum foil—absence of thermal pattern; (f) overheating—high-intensity response; (g) reversed aluminum foil—cooler center with higher thermal response in the outer ring; (h) proper sealing—uniform and well-defined thermal ring. The color distribution indicates temperature variation, where darker colors correspond to relatively lower temperatures and brighter colors correspond to relatively higher temperatures.
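The thermal signatures in Figure 8 — a uniform ring for proper sealing versus local cold spots for foil defects — suggest a simple hand-crafted baseline: the coefficient of variation of intensities inside the sealing ring. This metric is a hypothetical illustration of why such rule-based screening is fragile compared with learned features, not a method from the paper.

```python
import numpy as np

def ring_uniformity(img, r_in, r_out):
    """Coefficient of variation of pixel intensities inside an annulus
    centered on the image: low for a uniform thermal ring, higher when
    local cold spots or asymmetry are present."""
    h, w = img.shape
    yy, xx = np.mgrid[:h, :w]
    r = np.hypot(yy - h / 2, xx - w / 2)
    vals = img[(r >= r_in) & (r < r_out)]
    return float(vals.std() / (vals.mean() + 1e-8))

good = np.ones((64, 64))     # idealized uniform ring
bad = good.copy()
bad[20:28, 28:36] = 0.2      # local cold spot crossing the ring
print(ring_uniformity(good, 10, 14), ring_uniformity(bad, 10, 14))
```

Such thresholds break down under the non-uniform heating and blurred boundaries noted in the abstract, which motivates the learned approach.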
Figure 9. Comparison of accuracy, precision, and inference time among different variant models (M1–M7 and EAC-Net).
Figure 10. Normalized confusion matrix (percentage form) for the EAC-Net model on the aluminum foil sealing defect classification task, illustrating the classification accuracy for each defect category and the distribution of misclassifications.
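The percentage form in Figure 10 is obtained by row-normalizing the raw counts, so each row (true class) sums to 100. A minimal sketch with a toy two-class matrix:

```python
import numpy as np

def normalize_confusion(cm):
    """Row-normalize a confusion matrix to percentages (rows = true classes)."""
    cm = np.asarray(cm, dtype=float)
    return 100.0 * cm / cm.sum(axis=1, keepdims=True)

cm = [[48, 2], [1, 49]]            # toy 2-class counts
print(normalize_confusion(cm))     # each row sums to 100
```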
Figure 11. Accuracy and precision of eight variant models (M1–M7 and EAC-Net) under Gaussian noise disturbance (σ = 5), illustrating the robustness of each model to noise in the aluminum foil sealing defect classification task.
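The perturbation behind Figure 11 can be reproduced by adding zero-mean Gaussian noise with σ = 5 to each 8-bit thermal image and clipping to the valid range; the image size and seed below are illustrative.

```python
import numpy as np

def add_gaussian_noise(img, sigma=5.0, seed=0):
    """Perturb an 8-bit image with zero-mean Gaussian noise, then clip."""
    rng = np.random.default_rng(seed)
    noisy = img.astype(float) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

img = np.full((32, 32), 128, dtype=np.uint8)  # flat toy "thermal" patch
noisy = add_gaussian_noise(img, sigma=5.0)
print(noisy.dtype, noisy.shape)
```

Evaluating the unmodified models on these perturbed inputs isolates robustness to sensor noise from ordinary test-set accuracy.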
Figure 12. Comparison of Grad-CAM heatmaps for different sealing states. (a) Original thermal images; (b) Grad-CAM heatmaps from the EfficientNet-B0 model; (c) Grad-CAM heatmaps from the proposed EAC-Net model. The color distribution in the Grad-CAM heatmaps represents the gradient response intensity of the model, where brighter regions indicate stronger responses and darker regions indicate weaker responses.
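The heatmaps in Figure 12 follow the standard Grad-CAM recipe: global-average-pool the gradients of the target class score with respect to a convolutional layer's activation maps, use them as channel weights, sum, and apply ReLU. A framework-free sketch of that core step, with random arrays standing in for real activations and gradients:

```python
import numpy as np

def grad_cam(activations, gradients):
    """Core Grad-CAM step on (C, H, W) arrays: weight each activation map
    by its spatially pooled gradient, sum over channels, ReLU, normalize."""
    weights = gradients.mean(axis=(1, 2))  # global-average-pooled gradients
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0)
    if cam.max() > 0:
        cam = cam / cam.max()              # scale to [0, 1] for display
    return cam

rng = np.random.default_rng(0)
cam = grad_cam(rng.random((8, 7, 7)), rng.random((8, 7, 7)))
print(cam.shape)
```

The resulting low-resolution map is upsampled to the input size and overlaid on the thermal image to produce heatmaps like those shown.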
Table 1. Comparison of model complexity and performance.
| Models | FLOPs (G) | Params (×10⁶) | Size (MB) | Time (ms) | Acc (%) | P (%) |
|---|---|---|---|---|---|---|
| EfficientNet-B0 | 0.21 | 4.02 | 15.33 | 16.90 | 97.19 | 97.21 |
| MobileNetV2 | 0.17 | 2.23 | 8.52 | 12.81 | 95.31 | 95.46 |
| MobileNetV3 | 0.12 | 4.21 | 16.07 | 12.64 | 96.88 | 97.11 |
| ShuffleNetV2 | 0.16 | 2.49 | 9.49 | 13.28 | 96.54 | 96.76 |
| GoogleNet | 0.77 | 5.61 | 21.39 | 17.33 | 96.67 | 96.86 |
| ResNet101 | 4.01 | 42.52 | 162.19 | 25.09 | 97.81 | 97.93 |
| RegNetX | 0.42 | 6.59 | 25.15 | 19.98 | 98.12 | 98.25 |
| EAC-Net | 0.25 | 4.40 | 16.77 | 20.62 | 99.06 | 99.07 |
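The Size column in Table 1 is consistent with a quick sanity check: an FP32 model stores roughly 4 bytes per parameter, so size (MB) is approximately params × 4 / 2²⁰.

```python
def fp32_size_mb(n_params):
    """Approximate on-disk size of an FP32 model: 4 bytes per parameter."""
    return n_params * 4 / 2**20

# EAC-Net's 4.40M parameters land near the 16.77 MB reported in Table 1
print(round(fp32_size_mb(4.40e6), 2))
```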
Table 2. Computational and space complexity of variant models.
| Label | Models | FLOPs (G) | Params (×10⁶) | Size (MB) |
|---|---|---|---|---|
| M1 | EfficientNet-B0 | 0.21 | 4.02 | 15.33 |
| M2 | EfficientNet-B0 + ConvSPP | 0.23 | 4.24 | 16.16 |
| M3 | EfficientNet-B0 + ASPP | 0.23 | 4.24 | 16.16 |
| M4 | EfficientNet-B0 + CBAM | 0.21 | 4.02 | 15.33 |
| M5 | EfficientNet-B0 + CSAMix | 0.23 | 4.18 | 15.94 |
| M6 | EfficientNet-B0 + ConvSPP + CBAM | 0.24 | 4.34 | 16.34 |
| M7 | EfficientNet-B0 + ASPP + CBAM | 0.24 | 4.34 | 16.34 |
| Proposed | EAC-Net | 0.25 | 4.40 | 16.77 |

Share and Cite

Hao, Z.; Chen, Y.; Yu, Z.; Qian, Y.; Zhao, L. Thermal Imaging-Based Defect Detection Method for Aluminum Foil Sealing Using EAC-Net. Appl. Sci. 2025, 15, 9964. https://doi.org/10.3390/app15189964

