Article

A Study on ACCC Surface Defect Classification Method Using ResNet18 with Integrated SE Attention Mechanism

1 Department of Electronics and Electrical Engineering, East China University of Technology, Nanchang 330013, China
2 Jiangxi Province Engineering Research Center of New Energy Technology and Equipment, East China University of Technology, Nanchang 330013, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(4), 1899; https://doi.org/10.3390/app16041899
Submission received: 18 January 2026 / Revised: 8 February 2026 / Accepted: 11 February 2026 / Published: 13 February 2026
(This article belongs to the Special Issue AI Applications in Modern Industrial Systems)

Abstract

Surface defect detection in aluminum-based composite core conductors (ACCC) via X-ray imaging has long been constrained by small sample sizes, class imbalance, model redundancy, and inadequate adaptation to single-channel industrial images. To address these challenges, this paper proposes SE-ResNet18, a lightweight classification model designed specifically for industrial single-channel X-ray images. The model features a co-adapted architecture in which a single-channel input layer (preserving native image information and eliminating RGB-conversion overhead) is coupled with a channel attention mechanism (to amplify subtle defect features) within a globally optimized lightweight framework. Combined with targeted data augmentation and robust training strategies, the model achieves superior performance on the ACCC defect dataset: classification accuracy reaches 98.39%, with a compact parameter budget (12.0 million parameters) and real-time capability (0.44 ms/image inference speed). These results indicate that the proposed model combines high classification accuracy with strong lightweight characteristics and inference efficiency, providing a feasible solution for high-precision, real-time detection in industrial scenarios and showing potential for ACCC online inspection applications.

1. Introduction

As modern power grids worldwide evolve toward higher efficiency and resilience, aluminum-based composite core conductors (ACCC) have become critical components for long-distance energy transmission, owing to their lightweight construction, high current-carrying capacity, and low line losses. Their high capacity-to-weight ratio and low thermal sag enable significant capacity upgrades on existing transmission corridors, making them a key technology for modern grid enhancement [1]. The load-bearing carbon-composite core, however, is susceptible to surface and internal defects—typically categorized as Splitting, Fracture, Sawing, and Displacement—during manufacturing, installation, and service [2]. If undetected, such defects can act as stress concentrators and significantly compromise mechanical integrity. For instance, studies have shown that under bending loads, localized stresses in defective regions can exceed 1 GPa, surpassing the compressive strength of the carbon core (~724 MPa) and leading to microcracking, fiber damage, and potential conductor failure [3]. The inability to identify these defects in a timely manner therefore poses direct risks to grid reliability and safety [4]. Hence, developing efficient and accurate detection technologies for ACCC surface defects is of substantial practical and operational importance.
X-ray imaging is a predominant and well-established non-destructive testing technique for internal inspection of industrial components, particularly for layered or composite structures [4,5]. However, X-ray inspection faces three core technical challenges in practical application. First, defect samples are scarce and exhibit category imbalance. Insufficient data for rare defects (e.g., microscopic splits) limits the generalization capability of traditional detection methods [6,7,8,9]. Second, industrial online inspection demands stringent real-time performance. Existing high-precision models (e.g., Inception-ResNet-V2 [1]) feature large parameter counts (approx. 22M) and incur high computational costs, which limit their suitability for real-time applications where rapid inference is critical. Third, X-ray images are in single-channel grayscale format. General RGB image classification models require additional channel-conversion preprocessing. This not only increases computational overhead but also creates a suboptimal feature representation for single-channel source data, as it introduces redundant, synthetic channels that do not correspond to original signal variations. Our design choice of native single-channel processing is empirically justified, as it contributes to both higher inference efficiency and improved accuracy in our ablation study (Section 4.4).
Traditional ACCC defect detection methods primarily rely on manually designed feature extraction combined with classical classifiers (e.g., LBP, SVM, and random forests). Such approaches are constrained by the expressiveness of artificial features, exhibit poor adaptability to low-contrast and small-sized defects, and demonstrate weak generalization capabilities, making them unsuitable for meeting the real-time demands of industrial online inspection [10,11,12,13]. With the advancement of deep learning, convolutional neural network-based defect detection methods have emerged as the mainstream approach. Object detection models (e.g., Faster R-CNN and YOLO) can localize and identify defects but suffer from complex structures and high computational costs, rendering them inefficient for scenarios like ACCC, where defect locations are relatively fixed [14,15,16,17,18]. Image classification models (e.g., Inception series [19,20,21] and DenseNet [22]) enhance recognition capabilities through end-to-end learning. Du et al. [23] employed a Feature Pyramid Network (FPN) to detect X-ray defects in aluminum alloy castings, utilizing multi-scale features to improve recognition of small defects. Zhang et al. [24] employed an enhanced DETR algorithm framework, utilizing ResNet fused with ECA-Net attention modules as the backbone network. They introduced a multi-scale deformable attention mechanism, effectively improving the detection performance of small defects in castings. However, these approaches are primarily designed for RGB images and lack optimization for the single-channel characteristics of X-ray data. Furthermore, the models feature large parameter counts and low inference efficiency, limiting their generalization capabilities in industrial scenarios with scarce samples. For ACCC-specific detection, Hu et al. [1] proposed a classification method based on an enhanced Inception-ResNet-V2. 
While it achieves high recognition rates, the model is structurally complex, lacks optimization for single-channel inputs, and exhibits low recall for small-sample defect classes.
To address these issues, this paper proposes SE-ResNet18: a lightweight classification architecture synergistically designed for single-channel X-ray images of ACCC conductors. Built upon ResNet18, which is suitable for small-sample scenarios, our model features a co-adapted design where a natively single-channel input stream is coupled with an embedded SE attention mechanism for channel-wise adaptive feature calibration. This integrated approach not only preserves original defect information by eliminating unnecessary RGB conversion but also enhances feature discrimination, thereby improving classification accuracy while ensuring inference efficiency for industrial deployment.

2. Methods

2.1. Holistic Methodological Framework

To address the intertwined challenges of single-channel input, subtle defect discrimination, and industrial deployment efficiency, we propose a synergistic lightweight architecture. Rather than applying independent modifications, our design ensures that the adaptation to grayscale input, the enhancement of critical features, and the reduction of model complexity work in concert. Figure 1 illustrates this integrated framework, where the single-channel input directly informs the feature extraction and attention calibration processes within a streamlined ResNet18 backbone.

2.2. Single-Channel Adaptive ResNet18 Backbone Network

The SE-ResNet18 backbone architecture proposed in this study is illustrated in Figure 2. Designed according to the principles of “lightweight, efficient, and adaptive,” it comprises three key components: (1) a single-channel adaptive input and feature extraction head (corresponding to the Input to MaxPool section in the figure), (2) a deep residual feature extractor integrating SE attention (corresponding to the four SE-Residual Blocks in the figure), and (3) a lightweight classification output head (corresponding to the GAP to Output section in the figure). This section first elaborates on the first component: the single-channel input adaptation design tailored for X-ray grayscale images.
The traditional ResNet18 network is designed for RGB three-channel images, with its input layer convolutional kernel dimensions set at 7 × 7 × 3 × 64. However, X-ray inspection images of ACCC conductors are in a single-channel grayscale format. Directly applying a three-channel input approach would cause three major issues: information redundancy (requiring replication of the single-channel image into three channels), computational waste (rendering approximately 67% of input channel computations ineffective), and feature dilution (introducing interference from irrelevant channels). To address this, this paper performs targeted single-channel adaptation modifications to the ResNet18 backbone network.
The core of the single-channel input layer reconstruction lies in reducing the first convolutional layer's input channels from 3 to 1, changing the kernel dimensions to 7 × 7 × 1 × 64. This modification enables the network to directly process single-channel grayscale images without the traditional three-channel conversion, thereby fundamentally avoiding information loss and computational redundancy. For parameter initialization, the Kaiming normal distribution strategy is employed with mode = ‘fan_out’ to accommodate the ReLU activation function, ensuring training stability under single-channel input. Theoretically, the single-channel convolution operation can be expressed as:
$Y = X \ast K = \sum_{i=1}^{H} \sum_{j=1}^{W} X_{i,j} \cdot K_{i,j}$
where $X$ is the single-channel input image and $K$ is the convolution kernel. Compared to three-channel input ($Y = \sum_{c=1}^{3} X_c \ast K_c$), single-channel input eliminates the convolutional summation over redundant channels, thereby improving computational efficiency.
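As a concrete sketch, the input-layer adaptation described above can be written in a few lines of PyTorch; the variable names are illustrative and this is not the authors' released code:

```python
import torch
import torch.nn as nn

# Single-channel stem: kernel dimensions 7 x 7 x 1 x 64 instead of 7 x 7 x 3 x 64.
conv1 = nn.Conv2d(in_channels=1, out_channels=64, kernel_size=7,
                  stride=2, padding=3, bias=False)

# Kaiming normal initialization with mode='fan_out', matching the ReLU that follows.
nn.init.kaiming_normal_(conv1.weight, mode='fan_out', nonlinearity='relu')

# A 224 x 224 single-channel X-ray image now passes through without RGB conversion.
x = torch.randn(1, 1, 224, 224)
y = conv1(x)
print(tuple(y.shape))  # (1, 64, 112, 112)
```

With stride 2 and padding 3, the 224 × 224 input is downsampled to the 112 × 112 feature map listed in Table 1.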
While completing the input layer adaptation, this paper fully preserves the core residual structure of ResNet18 (including Layers 1 to 4, with two BasicBlocks per layer). Residual connections resolve the gradient vanishing problem in deep networks through identity mapping, with the fundamental formula being:
$F(x) = H(x) - x$
$y = F(x, \{W_i\}) + x$
Here, $x$ and $y$ represent the input and output of the block, respectively; $H(x)$ denotes the desired underlying mapping, and $F(x, \{W_i\})$ is the residual function. This architecture ensures the network maintains stable training dynamics and robust performance even as it deepens.
Table 1 presents the specific layer configuration of the SE-ResNet18 network. The model comprises five main stages: Conv1 is a standard 7 × 7 convolutional layer that maps the single-channel input to a 64-dimensional feature space, with an output size of 112 × 112; Layers 1 to 4 are four stacked stages, each containing two SE–BasicBlocks. These progressively expand the number of channels to 64, 128, 256, and 512, respectively, while correspondingly reducing the feature map dimensions. One SE module is embedded at the end of each residual block (eight in total) to achieve adaptive calibration of feature channels.

2.3. Embedding and Optimization of the SE Attention Mechanism

To enhance the model’s ability to capture subtle features of ACCC defects, this study embedded SE (Squeeze-and-Excitation) attention modules after each residual block in ResNet18. This module employs a channel-wise adaptive feature recalibration mechanism to strengthen the network’s focus on critical defect features while suppressing interference from background and irrelevant information, thereby improving the ability to distinguish morphologically similar defects. The structural design of the SE attention module follows a three-stage workflow, “Squeeze–Excitation–Remapping,” as illustrated in Figure 3.
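The three-stage workflow can be sketched as a compact PyTorch module. This is a minimal reference implementation of the standard SE block with the paper's reduction ratio r = 16; class and variable names are assumptions, not the authors' code:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-Excitation-Remapping with reduction ratio r."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)       # Squeeze: global average pooling
        self.excitation = nn.Sequential(             # Excitation: two small FC layers
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),                            # channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.excitation(self.squeeze(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                 # Remapping: rescale each channel

se = SEBlock(channels=64, reduction=16)
out = se(torch.randn(2, 64, 56, 56))
print(tuple(out.shape))  # (2, 64, 56, 56)
```

The block preserves the input's spatial shape; only the per-channel scale changes, which is why it adds so few parameters.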

2.3.1. SE Module Structural Design

SE modules were integrated into the modified ResNet18 to enhance subtle defect recognition, with specific details as follows:
(I)
Embedding Position: One SE module was embedded into each BasicBlock, precisely positioned after the second 3 × 3 convolutional layer and before the residual connection summation—ensuring feature calibration without disrupting gradient flow.
(II)
Configuration Details: A total of 8 SE modules were deployed (2 per layer across Layers 1–4, detailed in Table 1). The reduction ratio was set to r = 16 (optimized via ablation experiments for small defect recognition).
(III)
Lightweight Advantage: The modules add only 0.3M additional parameters (2.5% of the total 12.0M model parameters) and incur no extra inference overhead, maintaining a 0.44 ms/image inference speed.
(IV)
Task-Specific Effect: Enhances channel-wise responses of small/low-contrast defects (e.g., Splitting, <5 pixels) in single-channel data, increasing Splitting defect recall from 88.2% (baseline ResNet18) to 94.0% and reducing misclassifications between morphologically similar defects (Splitting vs. Fracture).

2.3.2. Integration Strategy for SE Modules and ResNet18

To enhance the model’s ability to capture subtle features of ACCC defects, this study embeds the SE (Squeeze-and-Excitation) attention module into each BasicBlock of ResNet18, constructing the SE–BasicBlock as the core unit of the network. The embedding position is set after the second convolutional layer and before the summation of the residual connection within each block. This ensures features participate in subsequent information propagation only after calibration while preserving the original gradient flow of the residual structure, thereby maintaining training stability and efficiency. Across the four-layer residual architecture (Layer1–Layer4), each layer incorporates two SE–BasicBlocks, totaling eight embedded SE modules. Their hierarchical and channel configurations are detailed in Table 1. The entire design demonstrates high parameter efficiency, with the SE modules introducing only 0.3M additional parameters (2.5% of the total). The SE attention module introduces negligible inference overhead: compared to the baseline ResNet18 (0.44 ms/image), the SE-ResNet18 maintains the same inference speed (0.44 ms/image). This is because the SE module’s squeeze–excitation operations involve only global average pooling and two lightweight fully connected layers, whose computational cost is negligible compared to the residual feature extraction process, achieving performance gains characterized by ‘high accuracy with minimal overhead.’
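A minimal sketch of the SE–BasicBlock described above, with the SE recalibration applied after the second 3 × 3 convolution and before the residual summation, might look as follows. This is an illustrative reconstruction under the stated design, not the authors' code; the 1 × 1 excitation convolutions are equivalent to the usual fully connected layers:

```python
import torch
import torch.nn as nn

class SEBasicBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1, reduction: int = 16):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        # SE branch: squeeze (GAP) + excitation (bottleneck with ratio r).
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(out_ch, out_ch // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch // reduction, out_ch, 1, bias=False),
            nn.Sigmoid(),
        )
        # 1x1 projection when the shape changes, as in standard ResNet.
        self.downsample = None
        if stride != 1 or in_ch != out_ch:
            self.downsample = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        identity = x if self.downsample is None else self.downsample(x)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out * self.se(out)          # calibrate channels before the summation
        return self.relu(out + identity)  # original residual gradient path preserved

block = SEBasicBlock(64, 128, stride=2)
y = block(torch.randn(1, 64, 56, 56))
print(tuple(y.shape))  # (1, 128, 28, 28)
```

Because the SE weighting happens before the addition, the identity path is untouched and gradient flow through the skip connection is preserved, as the text requires.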
In the ACCC defect detection task, the SE module enhances model performance through three mechanisms. First, via channel-selective enhancement, the model adaptively amplifies feature channel responses associated with carbon core defects while suppressing irrelevant information such as the aluminum-stranded-wire background, thereby focusing on critical regions. Second, for minute, low-contrast defects such as Splitting, the SE module amplifies their faint signals in feature channels, improving recognition of “hard samples.” Finally, by strengthening discriminative channel features across defect categories, the SE module enhances feature space separability, effectively reducing misclassifications between morphologically similar defects (e.g., Splitting and Fracture). The experiments demonstrate that incorporating the SE module raises the recall rate for Splitting defects from 88.2% in the baseline ResNet18 to 94.0%, validating its effectiveness in addressing small-sample and class-imbalanced problems. The comprehensive advantages of the SE module over other attention mechanisms are discussed in depth in Section 4, supported by experimental results.

2.4. Lightweight Output Layer Design and Training Strategy

To ensure efficient training convergence, strong generalization, and industrial deployability, targeted training strategies were designed in conjunction with the lightweight output layer detailed in Section 2.2. As modified earlier, ResNet18's redundant ImageNet-specific classification head is replaced with a streamlined structure: AdaptiveAvgPool2d (GAP) compresses the 7 × 7 × 512 feature map into a 512-dimensional vector, followed by a 512 × 5 fully connected layer with Softmax activation, reducing output layer parameters from approximately 500k to 2.56k while controlling model complexity and overfitting risks.
For data augmentation—exclusively applied to the training set to mitigate sample scarcity and class imbalance without compromising objective evaluation of the validation and test sets—we employed horizontal flipping, random rotation (±15°), slight shifting (±5 pixels), scaling (0.9–1.1×), minor shearing (±5°), Gaussian blur (kernel size 3 × 3), and brightness/contrast adjustment (±10%), with targeted augmentation for the least represented Splitting defect (Class_0) via symmetric flipping and mild Gaussian noise addition (σ = 0.05) to double its sample count.
For training optimization, we adopted the AdamW optimizer with an initial learning rate of 1 × 10−3 and weight decay of 1 × 10−4, paired with a 5-epoch linear warm-up + cosine annealing scheduling strategy (final learning rate = 1 × 10−5), and set the batch size to 64 for 200 total epochs with mixed-precision training (FP16). To enhance training stability and efficiency, gradient clipping (max norm = 1.0) and early stopping (patience = 30, monitored via validation accuracy) were also incorporated.
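The optimizer and learning-rate schedule described above can be sketched as follows. The tiny placeholder model stands in for SE-ResNet18, and the warm-up/cosine curve is implemented with a plain LambdaLR for clarity; this is one way to realize the stated schedule, not necessarily the authors' exact implementation:

```python
import math
import torch
import torch.nn as nn

model = nn.Linear(512, 5)  # placeholder standing in for SE-ResNet18's parameters
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)

EPOCHS, WARMUP, BASE_LR, MIN_LR = 200, 5, 1e-3, 1e-5

def lr_lambda(epoch):
    if epoch < WARMUP:                            # 5-epoch linear warm-up
        return (epoch + 1) / WARMUP
    t = (epoch - WARMUP) / (EPOCHS - WARMUP)      # cosine annealing toward MIN_LR
    return (MIN_LR + 0.5 * (BASE_LR - MIN_LR) * (1 + math.cos(math.pi * t))) / BASE_LR

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

lrs = []
for epoch in range(EPOCHS):
    # ... forward/backward pass would go here, with
    # nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0) before stepping
    optimizer.step()
    lrs.append(optimizer.param_groups[0]['lr'])
    scheduler.step()

print(round(lrs[4], 6))  # 0.001 -- peak learning rate reached at the end of warm-up
```

The recorded curve rises linearly to 1 × 10−3 over the first five epochs and decays smoothly toward 1 × 10−5 by epoch 200, matching the schedule in the text.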

3. Dataset

3.1. Dataset Construction

This study utilizes the aluminum-based composite core conductor (ACCC) X-ray scan dataset publicly released by Wang et al. [25]. The dataset was collected from industrial ACCC conductor X-ray scanning processes, with raw data generated through continuous scanning and containing substantial inter-frame redundancy. To construct a specialized dataset suitable for classification tasks, this study performed rigorous quality screening, inter-frame redundancy removal, and category balancing on the public dataset. This resulted in a specialized classification dataset comprising 2488 images, as shown in Figure 4, featuring four typical surface defects (Splitting, Fracture, Sawing, and Displacement) and defect-free samples. Data acquisition utilized an IXS120BP120P357 X-ray source paired with an XRpad2 3025 detector. Scan parameters were set to tube voltage 70–75 kV and current-time product 0.08–0.12 mA·s to ensure clear defect visualization.
To clearly illustrate the visual characteristics of each defect category presented to the model, representative samples are displayed in Figure 4. These samples were selected from the independent test set (i.e., they were not used in training or validation). The selection criteria prioritized typicality and clarity, meaning each image was chosen to be a clear, unambiguous example of its designated class, covering a range of defect sizes and contrasts present in the dataset. This approach ensures that the figure fairly represents the classification task without bias towards exceptionally easy or hard cases.
To ensure a consistent definition of defect types throughout the study, Table 2 summarizes each category with key morphology, quantitative metrics, and industrial causes. Five classes are included: four typical defects (Splitting, Fracture, Sawing, and Displacement) and defect-free samples, which form the basis of the 5-class classification task.

3.2. Data Preprocessing

Given the characteristics of X-ray grayscale images and the high redundancy of industrial continuous scanning data, this study designed a four-step preprocessing workflow comprising data cleaning, inter-frame redundancy removal, dimensional standardization, and normalization:
(I)
Data Validity Screening
Through path validation and image readability checks, invalid images caused by storage anomalies or transmission corruption (e.g., path errors and undecodable files) were filtered out. After screening, the training, validation, and test sets all achieved 100% valid image coverage, providing a high-quality data foundation for model training.
(II)
Category-Adaptive Inter-Frame Redundancy Removal
Raw data originates from continuous scanning, resulting in high redundancy between adjacent frames. To enhance data quality and training efficiency, this study designed a category-adaptive inter-frame redundancy removal algorithm. Its core principle dynamically adjusts redundancy removal thresholds based on visual saliency differences among defect types, maximizing redundancy elimination while preserving critical features. The algorithm flow is as follows:
Differentiated parameter configuration: Different differential thresholds (diff_threshold) and minimum retention intervals (min_save_interval) are set for distinct defect types. For defects with prominent, easily identifiable features (e.g., fractures), a higher threshold (6.0) and larger interval (6 frames) are applied to aggressively eliminate redundancy. For subtle, hard-to-identify defects (e.g., saw cuts and displacements), employ lower thresholds (3.0–3.5) and shorter intervals (3 frames) to conservatively retain more samples and prevent critical feature loss.
Inter-frame differential screening: Iterate through each category’s continuous image sequence, calculating the absolute differential mean between the current frame and the last retained frame. If the difference value exceeds the category threshold or reaches the forced retention interval, the frame is retained.
Hard-to-classify sample protection mechanism: For the least frequent Splitting defect, an additional medium threshold (4.0) and moderate retention interval (4 frames) are set to ensure small-sample categories are not excessively discarded during redundancy removal.
(III)
Dimension Standardization and Single-Channel Adaptation
All images are uniformly resized to 224 × 224 pixels to meet ResNet18's input requirements. The original single-channel grayscale format is retained to avoid the information redundancy and computational overhead of converting grayscale images to three-channel RGB format.
(IV)
Normalization Processing
A global mean-variance normalization method is applied, linearly mapping pixel values to the [−1, 1] range based on training set statistics (mean = 0.5, standard deviation = 0.5). This effectively reduces brightness fluctuations caused by imaging variations, accelerating model convergence.
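Steps (II) and (IV) above can be sketched in a few lines of Python. The thresholds mirror the values given in the text, while the function names and synthetic frames are assumptions for illustration only:

```python
import numpy as np

# Step (II): category-adaptive inter-frame redundancy removal.
# (diff_threshold, min_save_interval) per defect type, as given in the text.
PARAMS = {
    "Fracture":     (6.0, 6),
    "Sawing":       (3.5, 3),
    "Displacement": (3.0, 3),
    "Splitting":    (4.0, 4),   # protected small-sample category
}

def deduplicate(frames, category):
    threshold, interval = PARAMS[category]
    kept = [0]                                   # always keep the first frame
    last = frames[0].astype(np.float32)
    since_last = 0
    for i in range(1, len(frames)):
        f = frames[i].astype(np.float32)
        diff = np.abs(f - last).mean()           # mean absolute inter-frame difference
        since_last += 1
        # retain the frame if it differs enough or the forced interval is reached
        if diff > threshold or since_last >= interval:
            kept.append(i)
            last, since_last = f, 0
    return kept

# Step (IV): with mean = 0.5 and std = 0.5 on [0, 1]-scaled pixels,
# values map linearly onto [-1, 1].
def normalize(img_u8, mean=0.5, std=0.5):
    x = img_u8.astype(np.float32) / 255.0
    return (x - mean) / std

# Ten near-identical frames: only the forced-interval frame survives.
frames = [np.full((8, 8), 100, dtype=np.uint8) for _ in range(10)]
print(deduplicate(frames, "Fracture"))           # [0, 6]

out = normalize(np.array([[0, 128, 255]], dtype=np.uint8))
print(out.min(), out.max())                      # -1.0 1.0
```

For prominent defects like Fracture the high threshold and long interval discard nearly all redundant frames, while the lower Splitting threshold retains more samples, matching the protection mechanism described above.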
The dataset was randomly partitioned into training (1741 images), validation (249 images), and test (498 images) sets at a 7:1:2 ratio. Partitioning strictly adhered to the “sample independence and distribution consistency” principle to ensure objectivity in validation and testing outcomes. The category distribution of the dataset is shown in Table 3, exhibiting inherent class imbalance. Sawing defects constitute the largest sample size (667 images, 26.8%), while Splitting defects are the least numerous (310 images, 12.5%). This imbalance stems from the actual occurrence probabilities of various defects in industrial scenarios and will be mitigated through data augmentation strategies in subsequent steps.

4. Experiments and Results

4.1. Experimental Setup

To ensure reproducibility and reliability, the experiments in this study were conducted under a unified hardware and software environment, utilizing explicitly defined training parameters and a standardized dataset partition. Specific configurations are as follows:
Hardware and software environment: The experiments were conducted on a hardware platform equipped with an Intel Core i7-13700H CPU and an NVIDIA GeForce RTX 4070 Laptop GPU (8 GB VRAM), supplemented by 16 GB DDR5 memory and 1 TB SSD storage to meet the computational demands of deep learning model training and inference. The software environment was built on Python 3.11, leveraging the PyTorch 2.0 deep learning framework. It integrated key libraries, including OpenCV 4.8 (image processing), Albumentations 1.3 (data augmentation), Scikit-learn 1.2 (performance evaluation), and Matplotlib 3.7 (result visualization), to ensure stable execution of the experimental workflow and reproducibility of results.
Training parameter settings: The training process strictly follows the optimization strategy described in Section 2.4. Key parameter configurations include: total training epochs (Epochs) set to 200, batch size (Batch Size) set to 64, and the AdamW optimizer selected (initial learning rate of 1 × 10−3 and weight decay of 1 × 10−4). A “5-epoch linear warm-up + cosine annealing” mechanism was employed for learning rate scheduling. Mixed-precision training and gradient clipping (maximum norm of 1.0) were enabled to enhance efficiency and stability. Validation set accuracy was continuously monitored during training, with an early stopping patience threshold of 30. The model weights achieving the best performance on the validation set were ultimately saved for testing.

4.2. Evaluation Indicators

To address the requirements of “precision” and “practicality” in industrial defect detection, model performance is primarily evaluated through five core metrics, as shown in Table 4:
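Assuming the metrics in Table 4 are the standard classification measures (accuracy, precision, recall, macro-F1) plus parameter count and inference speed, the statistical ones can be computed directly with scikit-learn, which is part of the experimental environment in Section 4.1; the label arrays below are toy data:

```python
from sklearn.metrics import accuracy_score, f1_score, recall_score

# Toy labels over the five classes (0 = Splitting ... 4 = Displacement).
y_true = [0, 1, 2, 3, 4, 0, 1, 2]
y_pred = [0, 1, 2, 3, 4, 3, 1, 2]   # one Splitting sample misread as Fracture

acc = accuracy_score(y_true, y_pred)
macro_f1 = f1_score(y_true, y_pred, average="macro")       # unweighted mean over classes
macro_recall = recall_score(y_true, y_pred, average="macro")
print(acc)  # 0.875
```

Macro averaging weights every class equally regardless of sample count, which is why Macro-F1 is the appropriate headline metric for this class-imbalanced dataset.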

4.3. Core Results Analysis

4.3.1. Overall Performance of SE-ResNet18

The overall performance of the SE-ResNet18 model on the test set is shown in Table 5, with all core metrics achieving high levels: accuracy of 98.39%, Macro-F1 of 0.996, and recall of 98.2%. The Macro-F1 score demonstrates the model’s equally outstanding recognition capability for both the more frequent Sawing defects (667 images) and the less frequent Splitting defects (310 images), addressing the core challenge of “missing rare defects” in industrial defect detection. Additionally, the model features only 12.0 million parameters and achieves an inference speed of 0.44 ms per image, demonstrating significant advantages in lightweight design and real-time performance. It can be directly integrated into industrial online inspection equipment.
Figure 5 displays the accuracy and loss curves for the training and validation sets during the training process of the SE-ResNet18 model. It can be observed that as the number of training iterations increases, both the training set and validation set accuracy steadily rise and eventually stabilize, while the loss value continuously decreases until convergence. No significant overfitting was observed during training. The validation set curve is smooth and closely aligns with the training set performance, indicating that the adopted training strategies, such as data augmentation, learning rate scheduling, and regularization, effectively enhance the model’s generalization capability and training stability.

4.3.2. Category and Performance Analysis

The detailed performance metrics of SE-ResNet18 across categories on the test set are shown in Table 6. The table indicates that while the model exhibits some variation in recognition performance across different categories, its overall performance remains excellent:
Class_1 (No Defect) and Class_2 (Sawing Defect): Precision, recall, and F1 score all reach 100%, indicating the model’s exceptional feature extraction capability for these two categories, with no missed detections or false positives.
Class_0 (Splitting Defects): Precision and recall both reached 94%, with an F1 score of 0.94. This category contained 62 samples, with four misclassified instances (misclassified as Class_3 and Class_4). The primary cause was morphological similarities between Splitting defects and Fracture/Displacement defects, coupled with extremely small defect sizes and low contrast in some samples.
Class_3 (Fracture Defects): Precision 98%, Recall 99%, and F1 score 0.99. Among 130 samples, only one was misclassified as Category 0, indicating strong Fracture defect recognition capability.
Class_4 (Displacement Defects): Precision 98%, Recall 97%, and F1-score 0.97. Among 96 samples, three were misclassified as Class_0, primarily resulting from overlapping distributions of Displacement and Splitting defects in the feature space.

4.3.3. Confusion Matrix Analysis

To thoroughly evaluate the model’s classification performance across categories, a systematic analysis of the test set confusion matrix was conducted, as shown in Figure 6. Overall, the model demonstrated excellent classification capabilities, with the vast majority of samples correctly classified. Key findings are as follows:
Correct Classification: Except for Class_0 (Splitting defect), all samples in the remaining categories—including defect-free (Class_1), Sawing defect (Class_2), Fracture defect (Class_3), and Displacement defect (Class_4)—were perfectly classified. The values along the diagonal correspond to the total number of samples in each category, indicating the model’s strong recognition capability for these classes.
Misclassifications Concentrated in Class_0: The model produced only five misclassified samples, all originating from Class_0 (Splitting defects). Among these, two samples were misclassified as Class_3 (Fracture defects), and three samples were misclassified as Class_4 (Displacement defects). No cross-misclassifications occurred between other categories. This indicates the model clearly distinguishes most categories, with misclassifications primarily confined to a few defect types with similar morphologies.
Misclassification Root Cause Analysis: All five misclassified samples were extreme “hard examples” sharing common characteristics: extremely small defect size (<5 pixels), low contrast (grayscale difference ≤ 10), and local visual similarities to Fracture or Displacement defects. In industrial scenarios, even human inspectors require image magnification to clearly identify such samples, reflecting the model’s recognition limitations for ultra-fine, low-contrast defects.
Industrial Scenario Adaptability: The overall misclassification rate is approximately 1.01% (5/498), aligning with the acceptable threshold of “misclassification rate ≤ 1%” in industrial inspection. Since misclassified samples represent edge cases difficult for humans to distinguish, practical applications can incorporate a “high-confidence borderline sample secondary review” mechanism to address them, resulting in limited impact on overall system performance.
In summary, confusion matrix analysis indicates that SE-ResNet18 performs flawlessly across most categories, with misclassifications occurring only in a small number of morphologically similar, weakly featured extreme samples. This further confirms the model’s strong feature discrimination capability and industrial practical potential.
To further explain this misclassification phenomenon at the feature level, PCA dimensionality reduction visualization was applied to the high-dimensional defect features extracted by the model. The results are shown in Figure 7:
Figure 7 presents a two-dimensional visualization of the high-dimensional defect features extracted by the SE-ResNet18 model after dimensionality reduction via Principal Component Analysis (PCA). The horizontal axis (PC1) and vertical axis (PC2) represent the first and second principal component directions, together explaining approximately 58% of the cumulative variance. Samples of different categories are distinguished by color. Defect-free samples and Sawing, Fracture, and Displacement defects form largely distinct clusters with clear boundaries in the feature space, indicating the model’s strong capability to characterize and separate these classes. Notably, slight edge overlap exists between the cluster of Splitting defects (Category 0) and those of Fracture defects (Category 3) and Displacement defects (Category 4). This aligns with the misclassifications observed in the confusion matrix and intuitively confirms the high morphological and textural similarity among these defect types, which constitutes the primary classification challenge for the model. Overall, the PCA visualization confirms strong inter-class separability at the feature-distribution level, while suggesting that improved feature representation could further enhance discrimination between similar defects.
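The projection in Figure 7 follows the standard PCA recipe: center the feature matrix, then project it onto the top two right-singular vectors. The NumPy sketch below illustrates only the computation pattern; the random matrix is a stand-in for the model's actual 512-D penultimate-layer features, which are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(42)
features = rng.normal(size=(498, 512))  # stand-in for the 512-D SE-ResNet18 features

# Center the data, then obtain principal directions via SVD.
centered = features - features.mean(axis=0)
_, s, vt = np.linalg.svd(centered, full_matrices=False)

pc2d = centered @ vt[:2].T               # (498, 2) coordinates for the scatter plot
explained = s**2 / (s**2).sum()          # per-component explained-variance ratio
```

On real features, `explained[:2].sum()` would reproduce the roughly 58% cumulative variance cited for Figure 7; on this random stand-in the ratio is naturally much lower.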

4.4. Ablation Experiment

To validate the effectiveness of the four core modules ("single-channel adaptation," "data augmentation," "SE attention," and "learning rate scheduling (warm-up + cosine annealing)"), an ablation study was designed using the controlled-variable method. The baseline model was "original ResNet18 (3-channel input) + no augmentation + fixed learning rate." The experimental results are presented across multiple dimensions in Table 7 and Figure 8 (Comprehensive Analysis of Ablation Study):
Ablation experiments demonstrate that single-channel adaptation improves accuracy by 0.40 percentage points, validating the effectiveness of the input-layer modification for X-ray grayscale images; this approach avoids the information loss of 3-channel conversion while reducing per-image latency by 40.5% (from 0.74 ms to 0.44 ms). Data augmentation boosts accuracy by 2.72 percentage points and Macro-F1 by 0.033, indicating that this strategy effectively mitigates class imbalance and data scarcity while significantly enhancing generalization for minority defect categories. The SE attention mechanism improves accuracy by 0.61 percentage points, demonstrating that the module enhances defect-detail capture through adaptive weighting of key feature channels without additional inference overhead. Learning rate scheduling (warm-up + cosine annealing) improves accuracy by 1.00 percentage point and serves as a critical stabilizing module, effectively preventing training oscillations and late-stage convergence stagnation. The synergistic effect of these four modules ultimately raises accuracy from 95.78% (baseline) to 98.39%, validating the rationality and effectiveness of the design.
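The warm-up plus cosine-annealing rule evaluated above can be written as a simple per-epoch function. The sketch below assumes the 5-epoch warm-up and 200-epoch budget reported for training (Figure 5); the base learning rate of 1e-3 is an illustrative assumption, not a value stated in this section.

```python
import math

def lr_at(epoch, total_epochs=200, warmup_epochs=5, base_lr=1e-3):
    """Linear warm-up for the first epochs, then cosine annealing toward zero."""
    if epoch < warmup_epochs:
        # Ramp linearly from base_lr / warmup_epochs up to base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine decay over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

The schedule is continuous at the warm-up boundary (epoch 4 and epoch 5 both yield the base rate), which is what prevents the early-training oscillations mentioned above.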
The ablation results validate our engineering design choices. The performance improvement is not from a single ‘novel’ component, but from their orchestrated integration. The single-channel adaptation provides a lean foundation, the SE module effectively exploits this foundation for discrimination, and the training strategies stabilize learning. This underscores the value of a holistic optimization approach for complex industrial tasks.

4.5. Comparative Experiment

4.5.1. Validation of SE Attention Mechanism Effectiveness (Compared to Baseline ResNet18)

To directly validate the contribution of the SE (Squeeze-and-Excitation) attention module in the ACCC defect classification task, this section compares the proposed SE-ResNet18 with the base ResNet18 (without attention mechanism) under identical datasets and training strategies. The results are shown in Table 8.
  • Comparison Analysis:
(I) Significant Accuracy Improvement: Compared to the baseline ResNet18, SE-ResNet18 achieves a 0.60 percentage point increase in accuracy and a 0.008 improvement in Macro-F1 score. This directly demonstrates that the SE module effectively enhances the model’s ability to capture and distinguish key features of ACCC defects through adaptive feature recalibration in the channel dimension.
(II) Minimal Cost for Performance Gain: The SE module adds only 0.3 million parameters (approximately 2.5%), with inference speed remaining unchanged. This demonstrates that the SE mechanism achieves significant classification performance improvement at minimal computational overhead, offering excellent cost-effectiveness.
(III) Error Pattern Analysis: The baseline ResNet18 produced 11 misclassified samples on the test set, primarily arising from confusion between “Splitting” (Class_0) and “Shifting” (Class_4). This reflects the model’s insufficient ability, without attention guidance, to distinguish morphologically similar defect categories with subtle features. The SE module addresses this by amplifying relevant feature channels and suppressing irrelevant background information.
Conclusion: Comparative experiments with the baseline ResNet18 strongly demonstrate that the SE attention mechanism is pivotal to the performance gains of the proposed model. At negligible computational and speed costs, it significantly enhances model accuracy, particularly strengthening the discrimination of easily confused defects. This validates its effectiveness when combined with ResNet backbone networks for industrial defect classification tasks.

4.5.2. Comparison with Other Models

To comprehensively evaluate the performance advantages of the proposed SE-ResNet18 model in the ACCC surface defect classification task, this paper selected five representative comparison models covering ultra-lightweight models, state-of-the-art domain models, deep/dense networks, and classic lightweight networks. All comparative experiments were conducted under identical hardware environments, identical training strategies, and identical dataset partitions to ensure comparability and fairness of results. The test set comprised 498 images across five categories (four defect classes + no defect).
Table 9 presents key performance metrics for SE-ResNet18 and each benchmark model on the test set, including accuracy, macro-average F1 score, parameter count, and inference speed.
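For reference, the accuracy and macro-average F1 reported in Table 9 follow the standard definitions: accuracy is the trace of the confusion matrix over its total, and macro-F1 is the unweighted mean of per-class F1 scores. The sketch below computes both from a confusion matrix (rows = actual, columns = predicted, matching Figure 6); the toy 3-class matrix is illustrative, not the paper's data.

```python
import numpy as np

def accuracy(cm):
    """Overall accuracy from a confusion matrix (rows = actual, cols = predicted)."""
    cm = np.asarray(cm, dtype=float)
    return np.trace(cm) / cm.sum()

def macro_f1(cm):
    """Unweighted mean of per-class F1 scores."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / np.maximum(cm.sum(axis=0), 1e-12)  # column sums = predicted counts
    recall = tp / np.maximum(cm.sum(axis=1), 1e-12)     # row sums = actual counts
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return f1.mean()

toy = [[10, 0, 0], [0, 9, 1], [0, 0, 10]]  # hypothetical 3-class confusion matrix
```

Because macro-F1 weights every class equally, it is the more sensitive of the two metrics to errors on minority classes such as Splitting defects.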
Figure 9 shows the accuracy–recall distribution of the different models on the ACCC defect detection task. Different shapes and colors correspond to different models (legend provided), with the values beside each marker indicating that model’s accuracy and recall. The figure visually confirms the dual advantages of the proposed model in classification accuracy and defect-recall completeness, consistent with the quantitative results in Table 9.
(I) Compared with lightweight models: significantly higher accuracy.
SqueezeNet1.1, an extremely lightweight model with only 0.73M parameters, reaches an inference speed of 1.8 ms/image but the lowest accuracy (95.18%). This indicates limited expressive power, making it difficult to capture the subtle features of ACCC defects.
AlexNet, with 57.01M parameters, achieves only 95.58% accuracy, demonstrating that its architecture is unsuitable for current high-precision classification tasks.
ResNet34 achieves 97.79% accuracy with 21.28M parameters, outperforming the previous two models but still falling short of SE-ResNet18.
(II) Comparison with high-performance models: superior accuracy balancing efficiency and lightweight design.
The improved Inception-ResNet-V2 serves as a state-of-the-art model in this domain, achieving 98.19% accuracy with 22.00M parameters and notable inference speed advantages. However, its complex architecture and specialized input size (90 × 256) limit industrial adaptability.
DenseNet121 excels in inference speed (0.94 ms/image) and lightweight design (6.95M parameters), but its accuracy (97.99%) falls below SE-ResNet18 (98.39%). SE-ResNet18 holds a 0.40 percentage point accuracy advantage, with 12.0M parameters and an inference speed of 0.44 ms/image. It achieves a superior balance between precision and efficiency, making it particularly suitable for industrial inspection scenarios demanding stringent accuracy.

5. Discussion

5.1. Synergistic Effect of Single-Channel Adaptation and Attention Mechanism

The core innovation of this paper lies in proposing the lightweight SE-ResNet18 model, which synergistically combines “single-channel input adaptation” with “SE attention mechanism embedding.” This design directly addresses the industrial characteristics and detection challenges of ACCC conductor X-ray imaging, achieving unified improvements in accuracy, efficiency, and deployment friendliness.
Single-channel adaptation resolves the compatibility and efficiency issues that arise when processing X-ray grayscale images with traditional RGB models. By modifying ResNet18's input-layer convolutional kernel to 7 × 7 × 1 × 64, the model processes single-channel images directly, eliminating the information redundancy and the roughly 67% of input-layer computation wasted by replicating single-channel data into three channels. This approach preserves the original grayscale features of defect regions, reduces preprocessing steps, and significantly boosts inference efficiency (0.44 ms/image). The adaptation aligns the model with real-world industrial data streams, providing a clean and efficient data interface for subsequent feature extraction.
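One common way to realize a 7 × 7 × 1 × 64 input layer while still reusing ImageNet-pretrained RGB weights is to collapse the pretrained kernel over its channel axis. Whether the authors initialized the layer this way is not stated, so the NumPy sketch below is a hedged illustration of the general technique, with random weights standing in for pretrained ones.

```python
import numpy as np

# Hypothetical pretrained first-layer weights in (out, in, kH, kW) layout,
# matching ResNet18's 7 x 7 stem convolution with 3 input channels.
rgb_w = np.random.default_rng(0).normal(size=(64, 3, 7, 7)).astype(np.float32)

# Collapse the RGB axis by summing. For a grayscale image that would otherwise
# be replicated into three identical channels, the layer's response is preserved,
# while the input-layer multiply-accumulates drop by roughly 67%.
gray_w = rgb_w.sum(axis=1, keepdims=True)  # shape (64, 1, 7, 7)
```

The equivalence holds exactly: summing weights over channels gives the same dot product as applying the original kernel to a grayscale image copied into three channels, which is why no accuracy is lost by the adaptation itself.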
Building upon this foundation, the Squeeze-and-Excitation (SE) mechanism enhances the capture and discrimination of critical defect features through channel-adaptive recalibration. With only 0.3 million additional parameters, the SE module raises test-set accuracy by 0.60 percentage points and notably increases the recall of the most challenging Splitting defects from 88.2% to 94.0%. Its “compression–excitation–recalibration” process amplifies responses in defect-related feature channels while suppressing background and irrelevant information, significantly improving the model’s ability to distinguish small-sample, low-contrast, and morphologically similar defects.
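The “compression–excitation–recalibration” pipeline can be sketched in a few lines. The NumPy sketch below is a simplified, generic SE block (the reduction ratio r = 16 is the common default and an assumption here; bias terms are omitted), not the paper's exact implementation.

```python
import numpy as np

def se_block(x, w1, w2):
    """Squeeze-and-Excitation on a (C, H, W) feature map.

    Squeeze:     global average pooling -> (C,)
    Excitation:  FC (C -> C/r) + ReLU, then FC (C/r -> C) + sigmoid
    Recalibrate: scale each channel of x by its learned weight in (0, 1).
    """
    z = x.mean(axis=(1, 2))                  # squeeze
    h = np.maximum(0.0, w1 @ z)              # excitation, bottleneck layer
    s = 1.0 / (1.0 + np.exp(-(w2 @ h)))      # per-channel attention weights
    return x * s[:, None, None]              # recalibration

C, r = 64, 16
rng = np.random.default_rng(1)
x = rng.normal(size=(C, 7, 7))
w1 = rng.normal(size=(C // r, C))
w2 = rng.normal(size=(C, C // r))
y = se_block(x, w1, w2)
```

Because the block only rescales channels, it adds no spatial computation, which is consistent with the negligible inference overhead reported above.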
The synergistic design embodies a comprehensive “front-end adaptation–back-end enhancement” optimization approach: single-channel adaptation provides cleaner, more focused feature inputs for the attention mechanism, while the SE module further enhances the discriminative power of these features. Together, they enable the model to balance high accuracy (98.39%) and high speed (0.44 ms/image) within a lightweight architecture (12.0M parameters). This design approach offers a lightweight, high-precision reference solution for similar industrial single-channel image classification tasks.

5.2. Analysis of Misclassified Samples

The test dataset comprises 498 samples distributed across categories as follows: Category 0 (Split defects, 62 samples), Category 1 (No defects, 77 samples), Category 2 (Sawing defects, 133 samples), Category 3 (Fracture defects, 130 samples), and Category 4 (Displacement defects, 96 samples). SE-ResNet18 produced five misclassified samples: four Category 0 samples misclassified as Categories 3/4, and one Category 4 sample misclassified as Category 0. Confusion primarily occurred between “Categories 0, 3, and 4.”
From a feature perspective, combining Figure 10 (feature analysis diagram) and Figure 11 (t-SNE visualization) reveals: The cosine similarity heatmap in Figure 10 shows that the feature similarity between Category 0 and Categories 3/4 reaches 0.46 and 0.70 respectively, significantly higher than similarities with other categories. Moreover, Category 0 exhibits substantial feature volatility (standard deviation exceeding 0.5 in some dimensions). The feature points of these three categories also exhibit local overlap in Figure 11, indicating that their high-dimensional features inherently lack strong discriminative power, making edge samples prone to confusion.
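Class-mean cosine similarities of the kind shown in the Figure 10 heatmap can be computed directly from extracted features. The sketch below is illustrative only: the random features and labels stand in for the model's real embeddings, so the resulting values are not the 0.46/0.70 reported above; only the computation pattern is the point.

```python
import numpy as np

def class_mean_cosine(features, labels, n_classes=5):
    """Cosine similarity between the mean feature vectors of each class."""
    means = np.stack([features[labels == c].mean(axis=0) for c in range(n_classes)])
    unit = means / np.maximum(np.linalg.norm(means, axis=1, keepdims=True), 1e-12)
    return unit @ unit.T  # (n_classes, n_classes); diagonal entries equal 1

rng = np.random.default_rng(7)
feats = rng.normal(size=(498, 512))       # stand-in for test-set features
labs = rng.integers(0, 5, size=498)       # stand-in for the 5 class labels
sim = class_mean_cosine(feats, labs)
```

A high off-diagonal entry (such as the 0.70 between Splitting and Displacement) means the two class centroids point in nearly the same direction in feature space, which is exactly the condition under which edge samples become confusable.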
From the images themselves, all five misclassified samples are marginal cases in the test set: Among the four misclassified Category 0 samples, defect sizes are all <5 pixels (occupying only 1–2% of the carbon core area), and the grayscale difference between the carbon core and defect is ≤10 (below the typical ≥15 for defects). Fine cracks are obscured by background noise, visually converging with Fracture and Displacement defect features. The single Category 4 misclassified sample featured a displacement zone covering only one-fifth of the carbon core, with edge textures highly similar to small-scale Splitting defects.
From the model and sample perspective: Category 0 contained only 62 samples (12.4% of the test set). Although data augmentation mitigated imbalance during training, the learning signal for marginal samples remained insufficient. Concurrently, SE-ResNet18's receptive field is better suited to medium-sized defects (5–30 pixels), leading to inadequate feature extraction for these small, low-contrast marginal samples and ultimately to a small number of misclassifications.
Notably, these five misclassified samples represent “hard cases” in industrial inspection—even human annotators require 200% image magnification for definitive identification. Their misclassification rate, approximately 1.01% (5/498), is essentially at the acceptable industrial threshold of “misclassification rate ≤ 1%.” Moreover, since these samples are edge cases inherently difficult even for humans to distinguish, they can be handled in practice through supplementary workflows such as secondary manual review of borderline samples near the decision boundary, resulting in minimal impact on industrial applications.
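The secondary-review workflow mentioned above amounts to simple confidence-threshold routing on the classifier's softmax output: high-confidence predictions are accepted automatically, while borderline ones are queued for a human inspector. The sketch below is a hypothetical illustration; the 0.9 threshold and the function name are assumptions, not from the paper.

```python
def route_prediction(probs, review_threshold=0.9):
    """Route one prediction: auto-accept confident outputs, flag the rest.

    probs: per-class softmax probabilities for one image.
    Returns (predicted_class_index, routing_decision).
    """
    label = max(range(len(probs)), key=probs.__getitem__)  # argmax class
    confidence = probs[label]
    decision = "auto-accept" if confidence >= review_threshold else "manual-review"
    return label, decision
```

With a misclassification rate around 1% concentrated in low-confidence edge cases, such routing sends only a small fraction of images to manual review while catching most of the hard examples.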

5.3. Limitations

Despite SE-ResNet18 demonstrating strong classification performance and industrial applicability in experiments, several limitations remain, primarily manifested in five aspects:
First, the model still struggles with identifying extremely minute defects (e.g., size < 3 pixels). Such defects occupy a low proportion in the feature space and are susceptible to background noise interference, leading to a significant drop in recognition accuracy. Future work could incorporate attention mechanisms with dual spatial-channel dimensions (e.g., CBAM) and design targeted amplification strategies for ultra-small defects to enhance the model’s ability to capture subtle features.
Second, the current approach relies on a single X-ray imaging perspective, making it challenging to fully characterize the spatial morphology of oblique or concealed defects, potentially leading to missed detections or misclassifications. Subsequent research should explore multi-view image acquisition and feature fusion mechanisms. Constructing a dual-branch network to integrate defect information from different angles could enhance recognition robustness.
Additionally, the model’s robustness in extreme industrial environments remains under-validated. Factors like imaging noise, surface contamination, and voltage fluctuations may compromise detection stability. Future work should establish specialized datasets incorporating complex environmental disturbances and employ strategies such as adversarial training to enhance the model’s environmental adaptability.
Furthermore, the use of a publicly available X-ray image dataset precludes direct micro-structural validation (e.g., SEM analysis) of the specific defect samples. However, as discussed in Section 5.4, the physical relevance of the defined defect categories is strongly supported by their correlation with established failure mechanisms in the literature.
Finally, this study currently achieves only defect type classification without quantifying severity assessment. Industrial maintenance often requires repair strategies based on metrics like defect size and depth. Future work could integrate object detection and image segmentation techniques to establish a defect severity grading model, extending functionality from “detection” to “assessment”.

5.4. Physical Interpretation and Microstructural Correlation of Defect Classes

The defect categories defined in this study are not merely abstract image patterns; they are grounded in distinct physical failure modes of the ACCC’s carbon-fiber composite core. Establishing a clear link between the X-ray morphological features and their underlying micro-scale damage mechanisms is crucial for reinforcing the practical relevance and mechanical significance of the classification task. While the use of a public X-ray image dataset [25] precludes direct destructive sampling and subsequent micro-structural analysis (e.g., scanning electron microscopy, SEM) of the specific specimens, each defined category aligns robustly with well-documented damage mechanisms established in prior destructive and micro-analytical studies of ACCC and analogous composite materials.
I. Splitting defects (Class_0), appearing as fine hairline cracks, correspond to interfacial micro-cracking or fiber-matrix debonding. This fundamental damage mode in composites has been characterized via SEM in ACCC conductors [3], and its characteristic interfacial morphology is extensively documented in studies of glass-fiber/polymer composites under stress corrosion [26].
II. Fracture defects (Class_3), seen as complete core breaks, represent catastrophic brittle fracture. This final failure is directly linked to excessive bending, which can induce compressive stresses (>1 GPa) exceeding the core’s strength (~724 MPa) and initiate internal damage [3].
III. Sawing (Class_2) and Displacement (Class_4) defects, characterized by sharp notches or core misalignment, indicate severe mechanical damage or assembly faults. Their geometric signatures are clear proxies for localized fiber fracture, matrix crushing, or interfacial shear.
In summary, the SE-ResNet18 model’s high accuracy in classifying these X-ray patterns demonstrates its ability to identify image features that serve as effective proxies for structurally significant, micro-structurally grounded failure modes. This link to established physical mechanisms reinforces the practical relevance of the classification task.

6. Conclusions and Contributions

This paper addresses the online defect detection requirements for ACCC conductor X-ray images. To tackle practical challenges such as sparse samples, class imbalance, inadequate single-channel adaptation, and stringent industrial deployment demands, we propose SE-ResNet18—a lightweight single-channel classification model integrating SE attention mechanisms. Through dataset construction and preprocessing, model architecture adaptation, attention enhancement, and training strategy optimization, we achieve synergistic improvements in high accuracy, high efficiency, and strong robustness. Key research conclusions and contributions are as follows:
First, we propose and validate a lightweight, high-precision classification architecture tailored for industrial single-channel images. By reconstructing the ResNet18 input layer into a single-channel convolution, we directly adapt to X-ray grayscale images, eliminating the information redundancy and computational waste inherent in traditional three-channel conversions. Building upon this, we embed an SE attention module to achieve adaptive calibration of feature channels. This design achieves 98.39% accuracy and a 97.91% macro-average F1 score on the test set, with the SE module contributing only 0.3M additional parameters. Notably, recall for the small-sample category (Splitting defects) reaches 94%, significantly enhancing the model’s ability to identify subtle and rare defects.
Second, the method achieves an excellent balance between detection accuracy, lightweight design, and inference efficiency. SE-ResNet18 has only 12.0M parameters, with a single-image inference time of 0.44 ms on an RTX 4070 GPU, processing over 2200 images per second and essentially meeting the real-time requirements of industrial online inspection. While maintaining compactness, the model achieves a 0.60 percentage point accuracy improvement over the baseline ResNet18, and its parameter count is only about 54.5% of comparable high-performance models (e.g., Inception-ResNet-V2), demonstrating the design advantage of “lightweight without sacrificing accuracy.”
Third, we designed an end-to-end training and optimization workflow tailored for industrial scenarios. By integrating fundamental geometric transformations, image robustness enhancement, and targeted small-sample processing into our data augmentation strategy, we effectively mitigated class imbalance and sample scarcity issues. The adoption of learning rate warm-up and cosine annealing scheduling, combined with gradient clipping and early stopping mechanisms, ensured stable convergence and robust generalization during training. The entire methodology requires no complex defect simulation or additional annotation, featuring a streamlined workflow with high reproducibility and strong engineering transferability.
In summary, this paper presents a synergistically designed lightweight model, SE-ResNet18, for industrial single-channel ACCC defect detection. This study demonstrates that by co-adapting the input representation, feature enhancement mechanism, and network complexity into a unified architecture, it is possible to overcome the classic trade-off between accuracy and inference efficiency in industrial image analysis. Our system-level optimization provides a practical and effective solution, highlighting the importance of holistic design over isolated improvements in engineering-driven computer vision tasks.

Author Contributions

Conceptualization, W.X. and R.C.; methodology, W.X.; software, W.X.; validation, W.X.; formal analysis, W.X.; investigation, W.X.; resources, R.C.; data curation, R.C.; writing—review and editing, W.X.; visualization, W.X.; supervision, R.C.; project administration, W.X. and R.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (No. 12365026), the Key Research and Development Project of Jiangxi Province (No. 20232BBE50013), and the Jiangxi Provincial Natural Science Foundation (No. 20242BAB25046).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Hu, Y.; Wang, J.; Zhu, Y.; Wang, Z.; Chen, D.; Zhang, J.; Ding, H. Automatic defect detection from X-ray scans for aluminum conductor composite core wire based on classification neural network. NDT E Int. 2021, 124, 102549. [Google Scholar] [CrossRef]
  2. Hu, Y.; He, N.; Xie, L.; Chen, D.; Gao, C.; Ding, H. Improved automatic defect detection from X-ray scans for aluminum conductor composite core wire based on modified Skip-GANomaly. NDT E Int. 2024, 143, 103050. [Google Scholar] [CrossRef]
  3. Burks, B.; Armentrout, D.L.; Kumosa, M. Failure prediction analysis of an ACCC conductor subjected to thermal and mechanical stresses. IEEE Trans. Power Deliv. 2010, 24, 588–596. [Google Scholar] [CrossRef]
  4. Zhong, F.; Huang, F.; Zhang, C.L.; Zhu, B. Defect Inspection on Carbon Fiber Composite Core Wire. Guangdong Electr. Power 2011, 24, 67–69. [Google Scholar]
  5. Feng, C.; Li, W.; Cao, X.H.; Xie, Y.; Liu, S.W.; Huang, R.; Zhang, J. Radiographic Detection Method of Ablative Defects in 220kV Cable Buffer Layer. High Volt. Eng. 2020, 46, 43–46. [Google Scholar]
  6. Liu, Q.; He, D.; Jin, Z.; Miao, J.; Shan, S.; Chen, Y.; Zhang, M. ViTR-Net: An unsupervised lightweight transformer network for cable surface defect detection and adaptive classification. Eng. Appl. Artif. Intell. 2023, 125, 106–118. [Google Scholar] [CrossRef]
  7. Liu, F.L.; Xia, R.; Li, W.J.; Lu, J.K.; Guo, L.R.; Zeng, H.; Liao, W.L. Research on Detection Method of Buffer Layer Ablation Defect in High Voltage XLPE Cable. High Volt. Eng. 2022, 58, 266–272. [Google Scholar]
  8. Duo, J.; Yang, C.; Bai, H.; Song, N.; Tong, W. Defect detection method of carbon fiber composite conductor based on semi-supervised anomaly detection. In Proceedings of the 2025 6th International Conference on Mechatronics Technology and Intelligent Manufacturing (ICMTIM), Nanjing, China, 11–13 April 2025; IEEE: New York, NY, USA; Volume 6, pp. 495–499.
  9. Pan, Z.; Liu, D.; Wang, Y.X.; Yang, X.L.; Zou, Y.P.; Luo, J.D. X-ray digital imaging testing technology in nondestructive testing for the nuclear industry. Nondestruct. Test. 2025, 47, 25–33. [Google Scholar]
  10. Yang, B.; Wang, J.J.; Li, H.W. Detection of cigarette appearance defects based on ILBP features and XGBoost. J. Image Graph. 2025, 30, 53–62. [Google Scholar]
  11. Ding, S.M.; Liu, Z.F.; Li, C.L. AdaBoost learning for fabric defect detection based on HOG and SVM. In Proceedings of the 2011 International Conference on Multimedia Technology, Hangzhou, China, 26–28 July 2011; pp. 2903–2906. [Google Scholar]
  12. Shipway, N.; Barden, T.; Huthwaite, P.; Lowe, M. Automated defect detection for Fluorescent Penetrant Inspection using Random Forest. NDT E Int. 2019, 101, 113–123. [Google Scholar] [CrossRef]
  13. Jing, J.; Zhang, H.; Wang, J.; Li, P.; Jia, J. Fabric defect detection using Gabor filters and defect classification based on LBP and Tamura method. J. Text. Inst. 2013, 104, 18–27. [Google Scholar] [CrossRef]
  14. Butera, L.; Ferrante, A.; Jermini, M.; Prevostini, M.; Alippi, C. Precise agriculture: Effective deep learning strategies to detect pest insects. IEEE/CAA J. Autom. Sin. 2022, 9, 246–258. [Google Scholar] [CrossRef]
  15. Atac, H.O.; Kayabasi, A.; Aslan, M.F. The study on multi-defect detection for leather using object detection techniques. Collagen Leather 2024, 6, 37. [Google Scholar] [CrossRef]
  16. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 1, 91–99. [Google Scholar] [CrossRef] [PubMed]
  17. Chen, J.; Liu, Z.; Wang, H.; Núñez, A.; Han, Z. Automatic defect detection of fasteners on the catenary support device using deep convolutional neural network. IEEE Trans. Instrum. Meas. 2018, 67, 257–269. [Google Scholar] [CrossRef]
  18. Cui, K.B.; Lv, S.Y.; Yang, L.R. Key Parts and Defect Detection of Transmission Line Based on HCDNet-YOLO11. Electron. Meas. Technol. 2025. [Google Scholar]
  19. Liu, D.; Li, C.H. Aviation cable arc fault detection based on Inception-BiLSTM. Sci. Technol. Eng. 2025, 25, 6100–6108. [Google Scholar]
  20. Wang, B.; Huang, M.; Liu, L.J.; Huang, Q.S.; Dan, W.Q. Multi-Layer Focused Inception-V3 Convolutional Network for Fine-Grained Image Classification. Eng. Appl. Artif. Intell. 2022, 47, 72–78. [Google Scholar]
  21. Li, J.Q.; Ma, Y.P.; Hu, X.D.; Zhang, C.Z. Wind Turbine Bearing Fault Diagnosis Based on CBAM-InceptionV2-Two-stream CNN. Eng. Appl. Artif. Intell. 2023, 58, 28–33. [Google Scholar]
  22. Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269. [Google Scholar]
  23. Du, W.; Shen, H.; Fu, J.; Zhang, G.; He, Q. Approaches for improvement of the X-ray image defect detection of automobile casting aluminum parts based on deep learning. NDT E Int. 2019, 107, 102144. [Google Scholar] [CrossRef]
  24. Zhang, L.; Yan, S.-F.; Hong, J.; Xie, Q.; Zhou, F.; Ran, S.-L. An improved defect recognition framework for casting based on DETR algorithm. J. Iron Steel Res. Int. 2023, 30, 949–959. [Google Scholar] [CrossRef]
  25. Wang, J. Dataset of X-ray Scan for Aluminum Conductor Composite Core Wire from State Grid of China. Mendeley Data 2021. [Google Scholar] [CrossRef]
  26. Kumosa, M.; Kumosa, L.; Armentrout, D. Failure analyses of nonceramic insulators part I: Brittle fracture characteristics. IEEE Electr. Insul. Mag. 2005, 21, 14–27. [Google Scholar] [CrossRef]
Figure 1. Process Framework Diagram.
Figure 2. SE-ResNet18 backbone architecture. Key details: (1) Input layer: 7 × 7 × 1 × 64 convolution kernel (adapted for single-channel X-ray images), (2) Feature extractor: 4 layers of SE–BasicBlocks (total of 8 SE modules, embedded after the second Conv of each BasicBlock), (3) Output layer: AdaptiveAvgPool2d (compressing 7 × 7 × 512 to 512D) + 512 × 5 FC layer. Total parameters: 12.0M. The architecture balances feature extraction capability and industrial deployment efficiency.
Figure 3. SE Attention Module Structure Diagram.
Figure 4. Representative X-ray image samples of the five ACCC conductor conditions from the test set. (a) NO defect. (b) Sawing defect (red box highlights the broken core region). (c) Shifting defect. (d) Fracture defect. (e) Splitting defect. (f) Splitting defect (zoom-in view): magnified detail of (e) to clearly display the subtle crack (red box emphasizes the defect region). (g) Sawing defect (zoom-in view): magnified detail of (b) to highlight the sharp edges of the Sawing defect (red box marks the critical region). All displayed samples are post-preprocessing (normalized and resized) and were selected as typical representatives from the independent test set to illustrate the visual characteristics of each class.
Figure 5. Model Training Curve. SE-ResNet18 training and validation curves over 200 epochs (batch size = 64, AdamW optimizer, 5-round warm-up + cosine annealing). Accuracy curve: training accuracy stabilizes at ~99% and validation accuracy at 98.39% (no overfitting). Loss curve: training/validation loss converges to <0.1 after 80 epochs. The smooth curves verify the effectiveness of data augmentation and regularization strategies in enhancing generalization.
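The warm-up plus cosine-annealing schedule described in the caption can be sketched as a per-epoch function. The base and minimum learning rates below are illustrative assumptions; this section states only the warm-up length (5 epochs) and the total budget (200 epochs):

```python
import math

def lr_at_epoch(epoch, base_lr=1e-3, warmup_epochs=5, total_epochs=200, min_lr=1e-6):
    """Linear warm-up for the first `warmup_epochs`, then cosine annealing
    down to `min_lr` over the remaining epochs (epoch is 0-indexed).
    base_lr and min_lr are assumed values, not taken from the paper."""
    if epoch < warmup_epochs:
        # Ramp linearly from base_lr/warmup_epochs up to base_lr
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine decay from base_lr to min_lr over the post-warm-up epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

The schedule peaks at the end of warm-up (epoch 4) and decays smoothly afterward, which matches the smooth loss curves reported in Figure 5.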
Figure 6. Test Set Confusion Matrix Heatmap. Rows represent actual categories, columns represent predicted categories, color intensity corresponds to sample quantity, and values indicate specific sample counts for each category.
Figure 7. PCA dimensionality reduction visualization of defect features extracted by SE-ResNet18. PC1 (38.5%) and PC2 (19.5%) explain 58% of cumulative variance. Samples are colored by category (Class 0: Splitting, Class 1: No Defect, Class 2: Sawing, Class 3: Fracture, and Class 4: Displacement). Clear inter-class clustering (except for a slight overlap between Splitting and Fracture/Displacement) verifies the model’s strong feature discriminability.
Figure 8. Comprehensive Analysis Chart of Ablation Experiment. (Top left): Multi-indicator radar chart. (Top right): Accuracy–Time–Robustness 3D plot. (Bottom left): Component influence matrix. (Bottom right): Normalized performance comparison.
Figure 9. Accuracy and Recall Distribution Across Different Models. Note: the ResNet18 (No SE) markers are not visually distinguishable because its accuracy and recall (97.79%, 96.86%) are identical to those of ResNet34, so the two markers overlap.
Figure 10. Feature distribution analysis of SE-ResNet18. Top 10 feature dimension means by class: SE module enhances defect-related feature responses. Feature standard deviation: SE reduces intra-class volatility by 12% on average. Inter-class cosine similarity matrix: Splitting (Class_0) has a high similarity with Fracture (0.46) and Displacement (0.70), explaining misclassification. Top 20 variable features: core defect features (e.g., crack texture and edge contrast) are amplified by SE.
Figure 11. t-SNE (perplexity = 30) visualization of high-dimensional defect features. t-SNE dimensions 1 and 2 project the 512-D features to 2D. Class separation is consistent with the PCA results: No Defect, Sawing, and Fracture form distinct clusters, while Splitting and Displacement show minor edge overlap, confirming the model's limitation in distinguishing morphologically similar ultra-small defects.
Table 1. Network Layer Configuration of SE-ResNet18.

| Stage | Residual Block Type | Channels | Output Size | Number of SE Modules |
|---|---|---|---|---|
| Conv1 | Standard Convolution | 64 | 112 × 112 | 0 |
| Layer1 | SE–BasicBlock | 64 | 56 × 56 | 2 |
| Layer2 | SE–BasicBlock | 128 | 28 × 28 | 2 |
| Layer3 | SE–BasicBlock | 256 | 14 × 14 | 2 |
| Layer4 | SE–BasicBlock | 512 | 7 × 7 | 2 |
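Each SE module above recalibrates its block's channels in three steps: squeeze (global average pooling to one descriptor per channel), excitation (two FC layers with ReLU and sigmoid), and scale. A minimal pure-Python sketch of that computation, with toy weight matrices `w1` and `w2` standing in for the learned FC layers:

```python
import math

def se_recalibrate(feature_maps, w1, w2):
    """Squeeze-and-Excitation channel recalibration (pure-Python sketch).

    feature_maps: list of C 2-D maps (each a list of rows of floats)
    w1: (C/r) x C weights for the squeeze FC layer (followed by ReLU)
    w2: C x (C/r) weights for the excitation FC layer (followed by sigmoid)
    Returns the channel-wise rescaled feature maps.
    """
    # Squeeze: global average pooling -> one scalar descriptor per channel
    z = [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
         for fmap in feature_maps]
    # Excitation: FC -> ReLU, then FC -> sigmoid gives per-channel weights
    h = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(w * hi for w, hi in zip(row, h))))
         for row in w2]
    # Scale: multiply every value in channel c by its attention weight s[c]
    return [[[v * s[c] for v in row] for row in fmap]
            for c, fmap in enumerate(feature_maps)]
```

In the full model this gating is what amplifies subtle defect channels (e.g., crack texture) while suppressing uninformative ones, at the cost of only two small FC layers per block.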
Table 2. Unified Definition of ACCC Defect Types.

| Defect Category (Label) | English Name | Features | Quantitative Metrics |
|---|---|---|---|
| Class_0 | Splitting | Fine hairline cracks | <5 pixels; Δgray ≤ 10 |
| Class_1 | No Defect | Intact; uniform gray | No anomalies; σgray < 3 |
| Class_2 | Sawing | Sharp, regular edges | Δedge gray ≥ 20; straight/curved |
| Class_3 | Fracture | Complete carbon core break | ≥20 pixels; Δgray ≥ 15 |
| Class_4 | Shifting | Offset from conductor center | Center shift ≥ 10 pixels; no cracks |
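The quantitative metrics in Table 2 can be read as a decision rule set. The sketch below encodes them for illustration only; the feature names and the order of the checks are assumptions of this sketch, and the actual model learns these cues from images rather than applying hand-written thresholds:

```python
def classify_by_rules(crack_width_px, delta_gray, gray_std,
                      edge_delta_gray, center_shift_px):
    """Illustrative rule set mirroring Table 2's quantitative metrics.
    Feature names and check order are assumptions of this sketch."""
    if gray_std < 3 and crack_width_px == 0:
        return "Class_1"  # No Defect: uniform gray, no anomalies
    if center_shift_px >= 10 and crack_width_px == 0:
        return "Class_4"  # Shifting: core offset without cracks
    if edge_delta_gray >= 20:
        return "Class_2"  # Sawing: sharp, regular edges
    if crack_width_px >= 20 and delta_gray >= 15:
        return "Class_3"  # Fracture: complete carbon core break
    if crack_width_px < 5 and delta_gray <= 10:
        return "Class_0"  # Splitting: fine hairline cracks
    return "unclassified"
```

The near-overlapping thresholds for Splitting (Δgray ≤ 10, <5 px) versus Fracture and Shifting foreshadow the inter-class confusion reported later in Figures 7, 10, and 11.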
Table 3. Dataset Split and Class Distribution.

| Split | Total Samples | Class_0 | Class_1 | Class_2 | Class_3 | Class_4 |
|---|---|---|---|---|---|---|
| Training | 1741 | 336 | 267 | 467 | 217 | 454 |
| Validation | 249 | 48 | 38 | 67 | 31 | 65 |
| Test | 498 | 96 | 77 | 133 | 62 | 130 |
| Total | 2488 | 480 (19.2%) | 382 (15.4%) | 667 (26.8%) | 310 (12.4%) | 649 (26.1%) |
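A quick consistency check of the split counts in Table 3: each split's per-class counts sum to its stated total, and summing down the columns reproduces the Total row (with class shares of roughly 19/15/27/12/26%):

```python
# Per-class counts (Class_0..Class_4) for each split, as listed in Table 3
splits = {
    "Training":   [336, 267, 467, 217, 454],
    "Validation": [48, 38, 67, 31, 65],
    "Test":       [96, 77, 133, 62, 130],
}
# Row sums: should equal the stated split totals (1741 / 249 / 498)
totals = {name: sum(counts) for name, counts in splits.items()}
# Column sums: per-class totals across all three splits
class_totals = [sum(col) for col in zip(*splits.values())]
grand_total = sum(class_totals)
# Class shares of the full dataset, in percent
class_shares = [round(100 * c / grand_total, 1) for c in class_totals]
```

This also makes the class imbalance explicit: Class_3 (Fracture) has less than half as many samples as Class_2 (Sawing), motivating the targeted data augmentation described in the abstract.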
Table 4. Performance Metrics and Mathematical Formulas.

| Metric | Formula |
|---|---|
| Accuracy | Acc = (TP + TN) / (TP + TN + FP + FN) × 100% |
| Macro-F1 | Macro-F1 = (1/C) · Σ_{i=1}^{C} [2 · P_i · R_i / (P_i + R_i)] |
| Recall | R_i = TP_i / (TP_i + FN_i) |
| Parameters | Params = Σ_{p ∈ AllParams} numel(p) |
| Inference Speed | avg_single_time = (avg_batch_time / batch_size) × 1000 |
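The classification metrics in Table 4 can be computed directly from a confusion matrix such as the one in Figure 6. A small sketch (per-class precision P_i is included because Macro-F1 needs it):

```python
def metrics_from_confusion(cm):
    """Compute accuracy (%), per-class recall, and macro-F1 from a
    confusion matrix where cm[i][j] = samples of true class i
    predicted as class j, matching the formulas in Table 4."""
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n_classes))
    accuracy = 100.0 * correct / total
    recalls, f1s = [], []
    for i in range(n_classes):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                               # missed class-i samples
        fp = sum(cm[j][i] for j in range(n_classes)) - tp  # false alarms for class i
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        recalls.append(recall)
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    macro_f1 = sum(f1s) / n_classes  # unweighted mean over classes
    return accuracy, recalls, macro_f1
```

Because Macro-F1 averages the per-class F1 scores with equal weight, it penalizes weakness on minority classes (such as Fracture) that plain accuracy can hide, which is why both are reported.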
Table 5. Overall Performance Metrics of the Model.

| Accuracy | Macro-F1 | Recall | Parameters (M) | Inference Speed (ms/Image) |
|---|---|---|---|---|
| 98.39% | 0.996 | 98.20% | 12.0 | 0.44 |
Table 6. Per-class performance metrics, including the defect-free class (N = 498).

| Category | Sample Size | Accuracy | Recall | F1-Score |
|---|---|---|---|---|
| Class_0 (Splitting) | 62 | 0.94 | 0.94 | 0.94 |
| Class_1 (No Defects) | 77 | 1.00 | 1.00 | 1.00 |
| Class_2 (Sawing) | 133 | 1.00 | 1.00 | 1.00 |
| Class_3 (Fracturing) | 130 | 0.98 | 0.99 | 0.99 |
| Class_4 (Shifting) | 96 | 0.98 | 0.97 | 0.97 |
Table 7. Ablation Performance Comparison.

| Ablation Configuration | Accuracy | Macro-F1 | Inference Speed (ms/img) |
|---|---|---|---|
| Full Model | 98.39% | 0.979 | 0.44 |
| Without Attention Mechanism | 97.79% | 0.971 | 0.44 |
| Without Single-Channel Adaptation | 97.99% | 0.973 | 0.74 |
| Without Data Augmentation | 93.98% | 0.922 | 0.45 |
| Without LR Scheduler | 97.39% | 0.974 | 0.45 |
Table 8. Performance Comparison Between SE-ResNet18 and Base ResNet18.

| Model | Accuracy | Macro-F1 | Recall | Parameters (M) | Inference Speed (ms/img) |
|---|---|---|---|---|---|
| ResNet18 (Baseline) | 97.79% | 0.971 | 96% | 11.7 | 0.44 |
| SE-ResNet18 (Ours) | 98.39% | 0.979 | 98.2% | 12.0 | 0.44 |
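The roughly 0.3M parameter gap between the two models comes from the SE modules' two FC layers (C → C/r → C) added to each of the eight residual blocks. A rough count; the reduction ratio r is an assumption here, since the paper states only the overall sizes (r = 4 approximately reproduces the reported gap, while the common default r = 16 would add far fewer parameters):

```python
def se_extra_params(channels_per_stage=(64, 128, 256, 512),
                    blocks_per_stage=2, reduction=4):
    """Rough count of parameters added by SE modules to ResNet18:
    two FC layers per module (C -> C/r and C/r -> C), biases ignored.
    The reduction ratio is an assumed value, not taken from the paper."""
    total = 0
    for c in channels_per_stage:
        per_module = 2 * c * (c // reduction)  # squeeze FC + excitation FC
        total += blocks_per_stage * per_module
    return total
```

With r = 4 this yields about 0.35M extra parameters, consistent with the 11.7M → 12.0M change in the table; the per-image inference time is unchanged at the reported 0.44 ms because the FC layers are tiny relative to the convolutions.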
Table 9. Performance Comparison with Lightweight Models.

| Model | Accuracy | Macro-F1 | Recall | Parameters (M) | Inference Speed (ms/img) |
|---|---|---|---|---|---|
| SqueezeNet1.1 | 95.18% | 0.943 | 94.80% | 0.73 | 1.8 |
| AlexNet | 95.58% | 0.9438 | 95.30% | 57.0 | 13.2 |
| ResNet34 | 97.79% | 0.9715 | 96.86% | 21.28 | 4.61 |
| Inception-ResNet-V2 | 98.19% | 0.9768 | 97.95% | 22.00 | 1.8 |
| DenseNet121 | 97.99% | 0.9742 | 97.57% | 6.95 | 0.94 |
| SE-ResNet18 (Ours) | 98.39% | 0.979 | 98.20% | 12.0 | 0.44 |

Share and Cite

MDPI and ACS Style

Xiao, W.; Chen, R. A Study on ACCC Surface Defect Classification Method Using ResNet18 with Integrated SE Attention Mechanism. Appl. Sci. 2026, 16, 1899. https://doi.org/10.3390/app16041899

