A Simulation-Based Hybrid Quantum-Classical Channel Attention Network for Reliable Aircraft Skin Defect Recognition

Jiang, Shiqi; Peng, Hai; Zhang, Dingqi; Zhu, Yupei

doi:10.3390/technologies14060361

Open Access Article


¹ School of Automation, Chengdu University of Information Technology, Chengdu 610225, China

² International Joint Research Center for Robotics and Intelligent Systems, Chengdu 610225, China

* Author to whom correspondence should be addressed.

Technologies 2026, 14(6), 361; https://doi.org/10.3390/technologies14060361 (registering DOI)

Submission received: 24 April 2026 / Revised: 3 June 2026 / Accepted: 9 June 2026 / Published: 13 June 2026

(This article belongs to the Section Quantum Technologies)


Abstract

Aircraft skin defect recognition is a safety-critical visual inspection task in which lightweight models must maintain high diagnostic accuracy while suppressing false alarms caused by complex surface textures, illumination variations, and weak defect patterns. This study proposes HQCA-Net, a simulation-based hybrid quantum-classical channel attention network for reliable aircraft skin defect recognition. The core component, termed Residual Quantum Channel Attention (RQCA), embeds a 10-qubit variational quantum circuit into a classical ResNet-18 backbone to perform compact and structured nonlinear feature recalibration, introducing only 30 trainable quantum-gate parameters. The quantum circuit is evaluated using state-vector simulation, and this study focuses on model-level feature recalibration, reliability, and robustness within the evaluated dataset rather than implementation on physical quantum hardware. Experiments on a six-class aircraft skin defect dataset show that HQCA-Net achieves 97.93% classification accuracy and a global false positive rate of 0.49%, outperforming ResNet-18 and classical lightweight attention mechanisms including SE, ECA, and SimAM. Additional analyses using confidence calibration, Grad-CAM visualization, Gaussian noise perturbation, few-shot training, and circuit-depth ablation further indicate that the proposed RQCA module improves feature discrimination and false-alarm suppression under compact parameter constraints. These results suggest that the hybrid quantum-classical attention module can serve as a parameter-efficient nonlinear feature recalibration strategy for reliable visual defect inspection under the tested experimental conditions.

Keywords: aircraft skin defect recognition; hybrid quantum-classical learning; variational quantum circuit; channel attention; visual inspection; lightweight neural network; confidence calibration; false alarm suppression

1. Introduction

Visual inspection of aircraft skin surfaces is a safety-critical task in aviation maintenance and structural health assessment. Surface defects such as cracks, corrosion, dents, missing fastener heads, paint-off regions, and scratches may indicate local material degradation or potential structural risks. In particular, faint cracks or weak surface anomalies can be difficult to distinguish from illumination variations, reflective textures, and normal mechanical edges. The missed detection of such defects may lead to serious safety consequences, while excessive false alarms can increase manual re-inspection costs and interrupt maintenance workflows. Therefore, reliable aircraft skin defect recognition requires not only high classification accuracy but also effective suppression of false positives and stable decision confidence.

In recent years, deep convolutional neural networks have been widely used in industrial visual inspection and surface defect recognition. Compared with traditional handcrafted feature-based methods, CNN-based models can automatically learn hierarchical visual representations and achieve strong recognition performance. However, aircraft skin images present several practical challenges, including high reflectivity, complex background textures, weak defect boundaries, and inter-class visual similarity. Under these conditions, lightweight CNNs may become sensitive to background noise or local texture interference. As a result, models with high overall accuracy may still produce high-confidence false alarms or unstable confidence estimates when dealing with ambiguous samples. This reliability issue is particularly important in aviation inspection, where both missed detections and unnecessary false alarms can have significant operational consequences.

Attention mechanisms have been introduced to improve feature representation in lightweight visual models. Representative channel or spatial attention modules, such as SE, ECA, and SimAM, can enhance informative features and suppress redundant responses to some extent. Nevertheless, many classical attention mechanisms rely on local channel interaction, dimensionality-reduction bottlenecks, or fixed energy-based assumptions. When facing complex surface reflections and weak defect patterns, these mechanisms may have limited ability to model compact yet expressive nonlinear feature interactions without increasing parameter overhead. Transformer-based architectures provide stronger global modeling ability, but their computational cost is often less suitable for lightweight or resource-constrained inspection scenarios. Therefore, it remains meaningful to explore compact feature recalibration mechanisms that can improve reliability while maintaining a lightweight structure.

Hybrid quantum-classical learning provides a possible direction for compact nonlinear feature interaction. Variational quantum circuits can map low-dimensional classical features into a quantum state space and perform parameterized transformations through rotation and entangling operations. In this study, the simulated VQC is not used to claim quantum supremacy or formal quantum computational advantage. Instead, it is treated as a compact and structured nonlinear mapping module for model-level feature recalibration. Although current quantum machine learning studies are often evaluated through simulation rather than physical quantum hardware, simulated VQC modules can still be used as exploratory components to investigate alternative parameter-efficient feature interaction strategies. Inspired by this idea, this study proposes a Hybrid Quantum-Classical Channel Attention Network, termed HQCA-Net, for reliable aircraft skin defect recognition. The core module, Residual Quantum Channel Attention (RQCA), embeds a 10-qubit simulated variational quantum circuit into a classical ResNet-18 backbone to recalibrate deep channel features under a compact bottleneck structure. To improve the transparency and practical interpretability of the proposed design, this study further provides detailed implementation settings for the simulated VQC and quantum-classical optimization process, reports a model-level complexity comparison in terms of parameter count and MACs, and clarifies the current limitations regarding external dataset validation and embedded-device benchmarking.

The main contributions of this study are summarized as follows:

We propose HQCA-Net, a simulation-based hybrid quantum-classical channel attention network for reliable aircraft skin defect recognition, in which a compact variational quantum circuit is embedded into a classical ResNet-18 backbone for deep feature recalibration.
We develop a Residual Quantum Channel Attention module that maps deep channel descriptors into a 10-qubit simulated quantum circuit and generates channel attention weights through quantum measurement, enabling lightweight feature recalibration with only 30 trainable quantum-gate parameters.
We conduct a systematic comparison with classical lightweight attention mechanisms, including SE, ECA, and SimAM, using accuracy, macro F1-score, false positive rate, and confusion matrix analysis to evaluate the empirical effect of the proposed simulation-based quantum-classical attention mechanism on defect recognition.
We analyze the decision reliability and interpretability of HQCA-Net through confidence calibration, reliability diagrams, DET curves, and Grad-CAM visualizations, examining its ability to suppress false alarms and focus on defect-related regions.
We investigate the robustness, trainability, and parameter efficiency of the proposed hybrid quantum-classical attention design under challenging conditions, including Gaussian noise perturbation, few-shot training, fair bottleneck ablation, circuit-depth analysis, and gradient variance tracking of the simulated quantum circuit.

The rest of this paper is organized as follows: Section 2 reviews related studies on visual defect recognition, attention mechanisms, and hybrid quantum-classical learning. Section 3 introduces the proposed HQCA-Net architecture, the RQCA module, and the training strategy. Section 4 presents the experimental results, including classification performance, reliability analysis, robustness tests, ablation studies, and model complexity analysis. Section 5 concludes the study and discusses limitations and future work.

2. Literature Review

2.1. The Overconfidence Crisis and Local Calibration Collapse in Deep Visual Diagnosis

Visual quality inspection of the physical skin integrity of modern aircraft requires near-zero tolerance, as missed detections of faint surface defects often imply catastrophic structural hazards. This harsh reality was profoundly exemplified by the 2002 China Airlines Flight 611 crash, where undetected microscopic fatigue cracks led to mid-air disintegration [1]. To overcome the limitations of manual inspection, Deep Convolutional Neural Networks (DCNNs) have been widely introduced. For instance, Donatus et al. [2] and Plastropoulos et al. [3] demonstrated the efficacy of machine learning in automating aircraft structural diagnosis and estimating skin defect sizes. Furthermore, comprehensive surveys by Cheng et al. [4] and Bai et al. [5] highlighted that while ML-driven material defect detection is highly promising, adapting these deep models to complex, real-world industrial environments remains a significant challenge.

In modern field maintenance, inspections depend heavily on unmanned aerial systems (UAS), which impose strict lightweight constraints on diagnostic models. Connolly et al. [6] explored deep learning-based defect detection for light aircraft using unmanned aircraft systems, emphasizing the necessity of edge-oriented algorithms. However, when these classical lightweight models are deployed at real-world maintenance sites, their relatively weak anti-interference capability becomes apparent. Addressing specific environmental noise, vom Schemm [7] investigated the severe challenges posed by specular reflections in aircraft dent detection. Similarly, Shukla et al. [8] systematically analyzed deep learning-based image anomaly detection in industrial contexts, revealing that constrained models are highly susceptible to feature confusion, frequently misjudging normal mechanical edges as defects. To alleviate this, Wang et al. [9] proposed ESS-DETR, aiming to balance high accuracy with the stringent computational limits of UAV-deployable surface inspection.

Despite these architectural advancements, current mainstream lightweight diagnostic models are generally prone to decision confidence distortion. As revealed by Guo et al. [10], while modern neural networks achieve high test set accuracy, they tend to adopt extremely aggressive feature-fitting strategies, leading to severe miscalibration. In edge-deployed aviation scenarios, this translates to Local Calibration Collapse: when confronting out-of-distribution (OOD) complex industrial noise or ambiguous borderline samples, models either exhibit local blind overconfidence in erroneous judgments or fall into extreme underconfident oscillations. In a field with zero-tolerance safety requirements, this distorted probability mapping directly undermines the engineering reliability of AI systems. Therefore, introducing robust model-level constraints under stringent resource limitations to suppress both local blind overconfidence and unacceptable false alarm rates has become the central challenge for practical aircraft skin inspection.

2.2. From Attention Dimensionality Reduction Bottlenecks to Simulation-Based Quantum-Classical Feature Recalibration

To address the aforementioned challenges of anti-interference and false alarm suppression, Li et al. [11] demonstrated in aluminum surface defect detection that integrating optimized attention modules into lightweight CNN architectures can enhance industrial defect detection performance. Tracing this architectural evolution, Hu et al. [12] introduced the traditional SE module based on a dimensionality-reduction bottleneck, while Woo et al. [13] proposed CBAM to integrate spatial information. To further improve local cross-channel interaction, Wang et al. [14] developed ECA-Net, and Yang et al. [15] subsequently designed the parameter-free 3D attention module SimAM based on energy functions. Despite these advances, existing classical attention mechanisms may still be constrained by local feature aggregation, dimensionality-reduction bottlenecks, or predefined interaction assumptions. As discussed in the review by Niu et al. [16], attention mechanisms can improve feature representation, but their effectiveness still depends strongly on the specific structure of the attention module and the complexity of the target visual task. When dealing with highly reflective aircraft skin images and weak defect patterns, classical lightweight attention modules may struggle to model sufficiently compact and expressive high-order feature interactions without increasing parameter overhead. On the other hand, Dosovitskiy et al. [17] introduced Transformer-based architectures with strong global representation capability. Nevertheless, as highlighted by Cherrat et al. [18], attention-based architectures often involve relatively high computational complexity, which may limit their applicability in lightweight or resource-constrained inspection scenarios.

To explore compact nonlinear feature interaction under limited parameter budgets, Quantum Machine Learning (QML) has attracted increasing attention. Havlíček et al. [19] and Islam and He [20] showed that quantum-enhanced feature spaces and quantum machine learning provide new possibilities for nonlinear feature mapping and visual learning tasks. Building on this foundation, recent studies by Rizvi et al. [21] and Li et al. [22] have explored hybrid quantum-classical vision models and quantum self-attention models in different learning tasks. In visual diagnosis scenarios requiring compact feature aggregation under complex backgrounds, Pandey and Mandal [23] applied a hybrid quantum-classical convolutional neural network with a quantum attention mechanism to skin cancer diagnosis, while Pesah et al. [24] analyzed the trainability and robustness of quantum convolutional architectures. These studies suggest that simulated or hybrid quantum-classical modules may provide useful nonlinear feature representations for visual tasks, although their effectiveness still needs to be evaluated carefully in task-specific scenarios.

Inspired by the recent development of hybrid quantum-classical architectures demonstrated by Mukhanbet and Daribayev [25], this paper explores the use of a simulated variational quantum circuit as a compact channel attention module for aircraft skin defect recognition. Instead of relying on a conventional MLP-based excitation module, the proposed method maps compressed channel descriptors into a low-dimensional quantum circuit and uses parameterized rotation, entangling operations, and quantum measurement to generate channel recalibration weights. In this way, the simulated VQC is used as a model-level nonlinear feature interaction module rather than as a physical quantum-hardware acceleration component. The goal is to investigate whether hybrid quantum-classical channel attention can improve feature discrimination, confidence stability, and false-alarm suppression under compact parameter constraints. While the proposed HQCA-Net utilizes a gate-based VQC, other quantum computing paradigms, such as Quantum Annealing (QA), offer alternative approaches for industrial applications. QA is highly effective for solving discrete combinatorial optimization problems. However, integrating QA directly into the gradient-based backpropagation framework of end-to-end deep convolutional networks poses significant algorithmic challenges. In contrast, the parameterized rotation and entanglement operations in gate-based VQCs can be differentiated natively via the parameter-shift rule. This makes the VQC paradigm currently more suitable for continuous feature recalibration and joint optimization in visual inspection tasks.

3. Materials and Methods

3.1. Overall Hybrid Architecture Design

To meet the stringent dual constraints of extreme lightweight design and an ultra-low false alarm rate required for field inspections of aircraft, this paper proposes a novel Hybrid Quantum-Classical Channel Attention Network. Its overall architecture is illustrated in Figure 1. This architecture combines the visual feature extraction capability of a classical lightweight convolutional backbone with the compact nonlinear feature interaction capability of a simulated VQC-based attention module. The latter introduces a compact and structured nonlinear mapping to recalibrate deep abstract features at the channel level.

To maintain a compact model structure for resource-constrained visual inspection scenarios, HQCA-Net employs ResNet-18, with its terminal fully connected layer removed, as the classical feature extraction backbone. The input defect image $X_{in} \in \mathbb{R}^{3 \times H_0 \times W_0}$ undergoes progressive downsampling sequentially through residual blocks Layer 1 to Layer 4, yielding the deep semantic feature map $X \in \mathbb{R}^{C \times H \times W}$, where the number of channels is $C = 512$.

Although the deep features output by Layer 4 possess high semantic abstraction, they are also highly prone to feature confusion with high-frequency industrial lighting and shadow noise. Therefore, this paper embeds the core component, the RQCA module, immediately after Layer 4. Using a compact simulated VQC-based attention design, this module adaptively redistributes the importance of the 512 deep feature channels through structured nonlinear feature recalibration. Finally, the recalibrated features are reduced via Global Average Pooling (GAP) and fed into the fully connected classification head, giving an end-to-end mapping from the raw defect image to a diagnostic probability.

3.2. Residual Quantum Channel Attention Mechanism

In classical channel attention mechanisms such as SE or CBAM, computational constraints mean that the excitation of channel features typically relies on a dimensionality-reduction bottleneck composed of two fully connected layers. This compulsory dimensional compression can lead to irreversible information loss.

To alleviate this limitation, RQCA introduces a simulated VQC-based structured nonlinear mapping as a compact alternative to the traditional classical MLP excitation bottleneck. The proposed module maps compressed channel descriptors into a low-dimensional latent space and uses parameterized rotation gates, CNOT-based structured coupling, and expectation measurements to generate channel recalibration weights. The specific structure and VQC design of RQCA are illustrated in Figure 2, and its forward propagation process can be divided into the following three stages:

1. Classical Squeeze and Parameter Mapping:

Through global average pooling, the global receptive field information in the spatial dimension is compressed into a channel descriptor. For an input feature map $X = [x_1, x_2, \dots, x_C]$, the statistic vector $z \in \mathbb{R}^{C}$ is calculated as

$$z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} x_c(i, j) \tag{1}$$

To bridge the classical and quantum domains, a learnable linear mapping matrix $W_{sq} \in \mathbb{R}^{N \times C}$ with bias $b_{sq} \in \mathbb{R}^{N}$ is employed to project the high-dimensional vector $z$ into an $N$-dimensional space (balancing quantum simulation cost against representation capability, this paper sets the number of qubits to $N = 10$). The resulting low-dimensional latent vector is used directly as the parameter $\theta$ for the rotation gates in the quantum circuit:

$$\theta = W_{sq} z + b_{sq} \tag{2}$$
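As a minimal sketch of Equations (1) and (2), the squeeze and projection steps can be written in NumPy as follows. The spatial size H = W = 7, the random feature map, and the small random initialization of `w_sq` are assumptions made only for illustration; they are not taken from the paper.

```python
import numpy as np

def squeeze_and_project(x, w_sq, b_sq):
    """Global average pooling (Eq. 1) followed by a linear projection (Eq. 2).

    x    : feature map of shape (C, H, W)
    w_sq : projection matrix of shape (N, C)
    b_sq : bias vector of shape (N,)
    """
    z = x.mean(axis=(1, 2))    # Eq. (1): channel descriptor, shape (C,)
    theta = w_sq @ z + b_sq    # Eq. (2): rotation parameters, shape (N,)
    return z, theta

rng = np.random.default_rng(0)
C, H, W, N = 512, 7, 7, 10                   # H = W = 7 is an assumed spatial size
x = rng.standard_normal((C, H, W))           # stand-in for the Layer 4 output
w_sq = 0.01 * rng.standard_normal((N, C))    # hypothetical initialization
b_sq = np.zeros(N)

z, theta = squeeze_and_project(x, w_sq, b_sq)
```

The 512-dimensional descriptor is reduced to ten values, one rotation parameter per qubit.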

2. Variational Quantum Circuit Excitation:

After obtaining the encoding parameter $\theta \in \mathbb{R}^{N}$, the process enters the quantum excitation stage. The initial quantum state is set to the ground state of $N$ qubits, $|0\rangle^{\otimes N}$. First, an angle embedding method is adopted: the classical continuous parameters are encoded into the quantum state via single-qubit rotation gates $R_y$ around the Y-axis, with rotation angle $\theta_i \cdot \pi$ on qubit $i$:

$$|\psi_{in}\rangle = \bigotimes_{i=1}^{N} R_y(\theta_i \cdot \pi)\,|0\rangle \tag{3}$$

Subsequently, the encoded state is processed by the strongly entangling layer. Considering the model's trainability, convergence speed, and generalization performance, as discussed in the circuit-depth ablation study in Section 4.6.2, the number of entangling layers is set to $L = 1$. This layer consists of parameterized single-qubit rotation gates $U(\phi)$ and cascaded Controlled-NOT (CNOT) gates. The CNOT gates introduce structured inter-qubit coupling, which provides a compact mechanism for modeling nonlinear relationships among the compressed channel descriptors:

$$|\psi_{out}\rangle = U_{ent}(\phi)\,|\psi_{in}\rangle \tag{4}$$

where $\phi$ represents the quantum learnable parameters that can be optimized through backpropagation. Under the architecture set in this paper ($L = 1$ and number of qubits $N = 10$), this module introduces only 30 additional quantum learnable parameters. Compared to the tens of thousands of parameters typical in classical deep networks, this design is highly parameter-efficient. Finally, expectation measurement is performed on each qubit under the Pauli-Z basis to complete the quantum feature decoding:

$$E_i = \langle\psi_{out}|\,\sigma_z^{(i)}\,|\psi_{out}\rangle, \quad i \in \{1, 2, \dots, N\} \tag{5}$$
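The circuit in Equations (3)–(5) is small enough to simulate directly with a state vector of $2^{10} = 1024$ amplitudes. The NumPy sketch below is illustrative only. It assumes that each trainable rotation $U(\phi)$ is a general single-qubit rotation with three angles (RZ·RY·RZ) and that the CNOT gates form a ring, following the commonly used strongly-entangling template; the paper does not spell out these details, and all function names are hypothetical. Under this assumption the parameter count is 3 angles × 10 qubits × 1 layer = 30, which matches the figure stated above.

```python
import numpy as np

def ry(a):
    c, s = np.cos(a / 2), np.sin(a / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def rz(a):
    return np.array([[np.exp(-1j * a / 2), 0], [0, np.exp(1j * a / 2)]], dtype=complex)

def apply_1q(state, gate, q):
    """Apply a 2x2 gate to qubit q of a state tensor with shape (2,) * n."""
    state = np.tensordot(gate, state, axes=([1], [q]))  # gate output axis comes first
    return np.moveaxis(state, 0, q)                     # move it back to position q

def apply_cnot(state, control, target):
    """Flip the target qubit on the slice where the control qubit is 1."""
    state = state.copy()
    idx = [slice(None)] * state.ndim
    idx[control] = 1
    sub = state[tuple(idx)]                             # control axis is dropped here
    t = target if target < control else target - 1
    state[tuple(idx)] = np.flip(sub, axis=t).copy()
    return state

def vqc_expectations(theta, phi):
    """theta: (N,) encoded values; phi: (N, 3) trainable angles of one layer."""
    n = len(theta)
    state = np.zeros((2,) * n, dtype=complex)
    state[(0,) * n] = 1.0                               # |0>^{(x)N}
    for q in range(n):                                  # Eq. (3): angle embedding
        state = apply_1q(state, ry(theta[q] * np.pi), q)
    for q in range(n):                                  # trainable rotations (assumed form)
        u = rz(phi[q, 2]) @ ry(phi[q, 1]) @ rz(phi[q, 0])
        state = apply_1q(state, u, q)
    for q in range(n):                                  # ring of CNOTs (assumed layout)
        state = apply_cnot(state, q, (q + 1) % n)
    probs = np.abs(state) ** 2
    exps = []
    for q in range(n):                                  # Eq. (5): <Z_q> = P(0) - P(1)
        p = probs.sum(axis=tuple(a for a in range(n) if a != q))
        exps.append(p[0] - p[1])
    return np.array(exps)

rng = np.random.default_rng(0)
N = 10
theta = rng.uniform(-1, 1, N)
phi = rng.uniform(0, 2 * np.pi, (N, 3))                 # 30 trainable angles
E = vqc_expectations(theta, phi)
```

Each entry of `E` is a bounded value in [-1, 1], which is what the decoding stage receives.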

3. Quantum Weight Decoding and Residual Recalibration:

The expectation vector $E \in [-1, 1]^{N}$ obtained from the simulated quantum measurement represents bounded nonlinear responses generated through structured rotation and entangling operations applied to the compressed channel descriptor. It is restored to the original channel dimension $C$ via a dimension-raising mapping matrix $W_{ex} \in \mathbb{R}^{C \times N}$, and constrained by the Sigmoid activation function $\sigma(\cdot)$ to obtain the normalized channel weights $W_{att} \in [0, 1]^{C}$.

To ensure the stable propagation of quantum gradients in deep networks and to prevent the initial randomness of the quantum module from destroying pre-trained feature representations, this paper introduces a Residual Recalibration Topology. The final attention-reconstructed feature map $Y \in \mathbb{R}^{C \times H \times W}$ is calculated as:

$$Y_c = X_c \odot (1 + W_{att,c}) \tag{6}$$

where $\odot$ denotes element-wise multiplication along the channel dimension. This residual design helps maintain stable feature propagation: when a certain channel is determined to be pure background noise, $W_{att,c} \to 0$, and the corresponding channel maintains an identity mapping; for channels containing critical defect information, such as cracks or corroded areas, their responses can be adaptively amplified. This residual recalibration mechanism provides a compact way to adjust channel responses while preserving the backbone feature representation. Because the gain $1 + W_{att,c}$ never falls below 1, noise-sensitive channels are suppressed in relative terms: they stay close to the identity while defect-related channels are amplified, which supports improved feature discrimination and false-alarm suppression under compact parameter constraints.
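The decoding and residual recalibration steps can be sketched as follows. This assumes the attention weights are computed as $\sigma(W_{ex} E)$ without a bias term, which is one reading of the description above; the variable names and random inputs are hypothetical.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def residual_recalibrate(x, e, w_ex):
    """Decode expectation values into channel weights and apply Eq. (6).

    x    : feature map of shape (C, H, W)
    e    : Pauli-Z expectation vector of shape (N,), values in [-1, 1]
    w_ex : dimension-raising matrix of shape (C, N)
    """
    w_att = sigmoid(w_ex @ e)                # channel weights, shape (C,)
    y = x * (1.0 + w_att)[:, None, None]     # Eq. (6): residual recalibration
    return y, w_att

rng = np.random.default_rng(0)
C, H, W, N = 512, 7, 7, 10                   # H = W = 7 is an assumed spatial size
x = rng.standard_normal((C, H, W))
e = rng.uniform(-1, 1, N)                    # stand-in for the measured expectations
w_ex = 0.1 * rng.standard_normal((C, N))     # hypothetical initialization

y, w_att = residual_recalibrate(x, e, w_ex)
```

Since the sigmoid output lies strictly between 0 and 1, the per-channel gain in Equation (6) lies strictly between 1 and 2, so the backbone features are never attenuated in absolute terms.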

3.3. Theoretical Interpretation of RQCA as a Structured Nonlinear Feature Recalibration Module

To avoid over-interpreting the empirical performance improvements as evidence of quantum superiority, this subsection provides a theoretical interpretation of the proposed RQCA module from the perspective of structured nonlinear feature recalibration. The goal is to clarify how the simulated VQC-based attention mechanism transforms compressed channel descriptors into channel-wise recalibration weights, rather than to claim quantum supremacy or formal quantum computational advantage.

As described in Section 3.2, the RQCA module first compresses the deep feature map into a channel descriptor through global average pooling, projects the descriptor into a 10-dimensional latent vector, encodes the latent vector into a simulated VQC through rotation gates, and finally obtains bounded Pauli-Z expectation responses for channel recalibration. Therefore, the overall RQCA process in Equations (1)–(6) can be interpreted as a constrained nonlinear mapping from the classical channel descriptor to the final channel attention weights. In other words, RQCA does not function as an unconstrained black-box feature extractor, but as a compact channel-wise recalibration operator.

The structural constraint of RQCA comes from four aspects. First, the original high-dimensional channel descriptor is compressed into a low-dimensional latent space with $N = 10$, which restricts the degrees of freedom of the attention-generation process. Second, the single-layer strongly entangling circuit with $L = 1$ introduces structured coupling among the compressed latent dimensions, but avoids an excessively flexible high-capacity mapping. Third, the Pauli-Z expectation measurements produce bounded nonlinear responses, which prevents the recalibration signal from freely amplifying arbitrary feature perturbations. Fourth, the residual recalibration topology preserves the original backbone features while adaptively adjusting channel importance. Together, these components impose a compact and structured constraint on the deep feature manifold.

This constrained mapping provides a theoretical interpretation for the observed feature separation effect. In aircraft skin images, background reflections, illumination variations, shadows, and normal mechanical edges often produce unstable, weakly correlated, or high-frequency channel responses. In contrast, true structural defects such as cracks, corrosion, missing fastener heads, and scratches tend to activate more consistent semantic channels in the deep feature space. By applying the constrained nonlinear recalibration process defined in Equations (1)–(6), RQCA can suppress less discriminative or noise-sensitive channel responses while preserving or amplifying channels that are more consistently associated with defect-related semantics.

From a perturbation perspective, the RQCA mapping can be abstractly denoted as $W_{att} = g_{RQCA}(z)$, where $z$ is the channel descriptor and $W_{att}$ is the generated channel attention vector. If $\delta z$ denotes a small disturbance caused by illumination variation, surface reflection, or background texture noise, the corresponding change in the attention weights can be locally approximated as $\delta W_{att} \approx J_g(z)\,\delta z$, where $J_g(z)$ is the Jacobian of the RQCA mapping. Since $g_{RQCA}$ is constrained by the low-dimensional bottleneck, the single-layer structured coupling, bounded expectation measurements, and sigmoid normalization, the sensitivity of the channel attention weights to random high-frequency perturbations is restricted. This provides a model-level explanation for why RQCA can reduce false activations caused by reflection and shadow-like artifacts while maintaining sensitivity to defect-related channel patterns.
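The local sensitivity argument can be made slightly more explicit through the chain rule. The following is a sketch under the assumption that the attention weights are computed as $W_{att} = \sigma(W_{ex} E(\theta))$ with $\theta = W_{sq} z + b_{sq}$, as described in Section 3.2; the symbol $J_E$ for the Jacobian of the expectation vector with respect to $\theta$ is introduced here only for illustration.

```latex
J_g(z) = \operatorname{diag}\!\big(\sigma'(W_{ex} E)\big)\; W_{ex}\; J_E(\theta)\; W_{sq},
\qquad
\| J_g(z) \|_2 \;\le\; \tfrac{1}{4}\, \| W_{ex} \|_2 \, \| J_E(\theta) \|_2 \, \| W_{sq} \|_2 ,
\qquad
\big| [J_E(\theta)]_{ij} \big| \;\le\; \pi .
```

The factor $1/4$ is the maximum slope of the sigmoid. The entrywise bound on $J_E$ holds because each $E_i$ lies in $[-1, 1]$ and depends on $\theta_j$ only through the single rotation $R_y(\theta_j \cdot \pi)$. The bounded measurement and the sigmoid therefore cap two of the four factors, while the remaining sensitivity is governed by the learned matrices $W_{sq}$ and $W_{ex}$, whose norms are not fixed by the architecture.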

Compared with a conventional MLP-based excitation module, the proposed VQC-based mapping provides a compact interaction structure with only a small number of trainable gate parameters. The purpose of this design is not to prove that the learned transformation is inaccessible to classical models, but to introduce a parameter-efficient nonlinear recalibration mechanism under lightweight model constraints. Therefore, the improved false-alarm suppression and feature discrimination observed in the experiments should be interpreted as empirical evidence of structured feature recalibration, rather than as evidence of quantum superiority.

It should be emphasized that this theoretical interpretation does not constitute a proof of quantum supremacy, non-classicality, or formal quantum computational advantage. Since the VQC is evaluated by state-vector simulation on classical hardware, the proposed module should be understood as a simulation-based, parameter-efficient nonlinear feature recalibration mechanism. The contribution of RQCA lies in its compact structured mapping and empirical reliability improvement under lightweight model constraints, rather than in demonstrating that the learned feature transformation is inaccessible to classical models.

3.4. Loss Function and Quantum-Classical Joint Optimization Strategy

Building upon the established forward propagation model of HQCA-Net, achieving the end-to-end joint training of the quantum circuit parameters and the classical network weights is a critical aspect of the overall system design. The hybrid quantum-classical joint optimization and gradient backpropagation framework constructed in this paper is illustrated in Figure 3.

1. Loss Function Design for Defect Sample Imbalance and Difficulty Disparity:

In real-world scenarios of aircraft field maintenance and structural inspection, samples of normal skin regions far outnumber defect samples, and there is a significant disparity in sample difficulty across different defect categories (e.g., minute cracks versus large-area paint peeling). To prevent the model training from being dominated by a large number of easily classified samples, this paper employs Focal Loss instead of traditional cross-entropy loss to dynamically adjust the contribution of samples with varying difficulties to the total loss. For the defect diagnosis task comprising 6 categories, its loss function is defined as:

$$L_{Focal} = -\sum_{k=1}^{K} \alpha_k (1 - p_k)^{\gamma}\, y_k \log(p_k) \tag{7}$$

where $K = 6$ represents the total number of defect categories, $y_k$ is the one-hot encoded ground truth label, and $p_k$ is the softmax probability output by the model. $\alpha_k$ is the class balancing factor used to alleviate the problem of severely uneven sample quantity distribution; the focusing parameter $\gamma = 2$ reduces the weight of high-confidence, easily classified samples, effectively guiding the optimization attention toward hard boundary samples, thereby significantly enhancing the model's ability to recognize rare and small-scale defects.
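For a single sample, the one-hot label removes every term of Equation (7) except the one for the true class. A minimal NumPy sketch, assuming uniform class weights $\alpha_k = 1$ in the example call (the paper does not list the values used), is:

```python
import numpy as np

def focal_loss(logits, label, alpha, gamma=2.0):
    """Focal Loss of Eq. (7) for one sample with an integer class label."""
    shifted = logits - logits.max()                  # numerically stable softmax
    p = np.exp(shifted) / np.exp(shifted).sum()
    pk = p[label]                                    # probability of the true class
    return -alpha[label] * (1.0 - pk) ** gamma * np.log(pk)

K = 6
alpha = np.ones(K)                                   # assumed uniform class weights
easy = np.array([8.0, 0.0, 0.0, 0.0, 0.0, 0.0])      # confident, correct prediction
hard = np.array([0.5, 0.4, 0.0, 0.0, 0.0, 0.0])      # ambiguous prediction

loss_easy = focal_loss(easy, 0, alpha)
loss_hard = focal_loss(hard, 0, alpha)
```

With $\gamma = 0$ and $\alpha_k = 1$ the expression reduces to ordinary cross-entropy; with $\gamma = 2$ the confident sample contributes far less than the ambiguous one.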

2. Quantum Gradient Backpropagation Based on the Parameter-Shift Rule:

In the joint optimization process of HQCA-Net, the weights $W$ of the classical convolutional layers can be directly updated via the standard chain rule and automatic differentiation. However, the unitary evolution of the VQC involves physical measurements, making its computational process non-differentiable within classical automatic differentiation frameworks. Consequently, error signals cannot directly propagate through the quantum layer.

To enable the effective backpropagation of classical errors to the quantum parameters $\phi$, this paper introduces the parameter-shift rule at the quantum nodes. For any parameterized unitary gate $U(\phi_i)$ generated by Pauli operators, the analytical gradient of the loss function with respect to the parameter $\phi_i$ can be calculated by shifting the parameter in the positive and negative directions and executing two forward propagation measurements:

$$\frac{\partial L_{Focal}}{\partial \phi_i} = \frac{1}{2}\left[f\left(\phi_i + \frac{\pi}{2}\right) - f\left(\phi_i - \frac{\pi}{2}\right)\right] \tag{8}$$

where $f(\cdot)$ represents the complete forward output mapping, including the quantum measurement. Because the two evaluations use a shift of $\pi/2$ rather than an infinitesimal step, this rule circumvents the direct differentiation of the quantum state vector and yields an exact analytical gradient rather than a finite-difference approximation.

Through this analytical gradient calculation scheme, the gradients of the quantum circuit can be integrated into the computational graph of classical deep learning frameworks. This enables the error signals generated by the Focal Loss to act simultaneously on the classical backbone weights and the quantum gate parameters, thereby supporting joint optimization using the Adam optimizer. From an optimization perspective, this method supports the trainability of HQCA-Net as a hybrid quantum-classical architecture and contributes to its convergence stability on the tested aircraft skin defect dataset.
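The parameter-shift rule in Eq. (8) can be checked on a toy case. The sketch below assumes a single qubit prepared in |0⟩, rotated by $R_{y}(\phi)$, and measured in the Pauli-Z basis, so that the measured expectation is $\cos\phi$. Here `expval_z` plays the role of $f$; in a full network, the shifted expectation gradients would be combined with the classical chain rule to reach the loss. The function names are hypothetical.

```python
import math

def expval_z(phi):
    """<Z> after applying RY(phi) to |0>.

    RY(phi)|0> = cos(phi/2)|0> + sin(phi/2)|1>, so
    <Z> = cos^2(phi/2) - sin^2(phi/2) = cos(phi).
    """
    a0 = math.cos(phi / 2.0)
    a1 = math.sin(phi / 2.0)
    return a0 * a0 - a1 * a1

def parameter_shift_grad(f, phi, shift=math.pi / 2.0):
    """Two-evaluation gradient from Eq. (8)."""
    return 0.5 * (f(phi + shift) - f(phi - shift))

phi = 0.7
grad_shift = parameter_shift_grad(expval_z, phi)
grad_exact = -math.sin(phi)               # analytic derivative of cos(phi)
h = 1e-6                                  # step for a finite-difference check
grad_fd = (expval_z(phi + h) - expval_z(phi - h)) / (2.0 * h)
```

Unlike the finite-difference estimate, the shifted evaluations reproduce the analytic derivative up to floating-point rounding, even though the shift is large.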

3.5. Experimental Setup and Evaluation Metrics

To comprehensively verify the effectiveness and decision-making reliability of the proposed HQCA-Net in complex inspection environments, this paper constructs a rigorous experimental testing platform and evaluation system. This section systematically expounds on the core aspects of the experimental design, including the construction of the aircraft skin defect dataset, the hardware and software infrastructure and hyperparameter configurations for model training, and the multi-dimensional comprehensive evaluation metrics used to quantify model performance.

3.5.1. Aviation Assembly Surface Defect Dataset

This experiment utilizes a skin surface defect dataset that closely reflects real-world commercial aircraft maintenance and routine inspection scenarios. The dataset systematically covers common physical damage types caused by daily service, aerodynamic friction, and external environmental erosion of aircraft skins. It encompasses the following six core defect categories, with typical visual morphologies shown in Figure 4:

Corrosion: Metal surface rusting and material degradation caused by chemical or electrochemical reactions.

Crack: Minute topological fractures on the surface of metal skins or structural components caused by stress concentration.

Dent: Local geometric deformation of the skin caused by external physical impact.

Missing-head: Fasteners with broken or detached heads, constituting a severe structural safety hazard.

Paint-off: Peeling of the surface anti-corrosion coating, often accompanied by strong reflective edges from lighting.

Scratch: Linear, shallow surface damage caused by mechanical friction from sharp objects.

To improve the diversity of training samples under complex aviation lighting conditions and multi-view shooting, a total of 7411 raw images were collected and subjected to strict standardized preprocessing before being input into the network. Specifically, the image resolution is uniformly adjusted to $224 \times 224$ pixels. A combination of offline and online data augmentation strategies is employed, including random horizontal and vertical flipping, small-angle rotations within a range of ±15°, and color jitter. Ultimately, the entire dataset is randomly partitioned into training, validation, and test sets at a ratio of 7:1.5:1.5. During the partitioning process, it is carefully guaranteed that samples from the same source do not leak across sets, thereby ensuring the objectivity and reliability of the model performance evaluation.
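One way to obtain a 7:1.5:1.5 partition without source leakage is to assign whole source groups to a subset. The sketch below is an assumption about how such a split could be implemented, not the authors' procedure; the `source_id` field and the synthetic sample list are hypothetical.

```python
import random
from collections import defaultdict

def split_by_source(samples, ratios=(0.70, 0.15, 0.15), seed=0):
    """Split (image_id, source_id) pairs into train/val/test by source group.

    All images sharing a source_id land in the same subset, so no source
    leaks across sets. Subset sizes only approximate the requested ratios
    because whole groups are assigned at once.
    """
    groups = defaultdict(list)
    for image_id, source_id in samples:
        groups[source_id].append(image_id)
    source_ids = sorted(groups)
    random.Random(seed).shuffle(source_ids)

    targets = [r * len(samples) for r in ratios]
    subsets = [[], [], []]
    for sid in source_ids:
        # Assign the group to the subset that is furthest below its target size
        idx = max(range(3), key=lambda i: targets[i] - len(subsets[i]))
        subsets[idx].extend((img, sid) for img in groups[sid])
    return subsets

# Hypothetical data: 7411 images, roughly five images per source
samples = [(i, i // 5) for i in range(7411)]
train, val, test = split_by_source(samples)
train_src = {sid for _, sid in train}
val_src = {sid for _, sid in val}
test_src = {sid for _, sid in test}
```

Because the split is done at the group level, any augmentation that derives several images from one raw capture should share that capture's `source_id`.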

3.5.2. Experimental Environment and Hyperparameter Configuration

All deep learning and quantum computing simulation experiments were conducted on a unified hardware platform. The hardware configuration utilizes an NVIDIA high-performance GPU (NVIDIA Corporation, Santa Clara, CA, USA) to accelerate tensor computations, as well as quantum forward and gradient propagation. The software environment is based on Python 3.8.20, utilizing PyTorch (version 2.4.1) as the core deep learning framework, and integrates the PennyLane (version 0.29.1) quantum machine learning library to achieve state vector simulation and parameter-shift gradient calculations for the VQCs.

The training employs an end-to-end joint optimization method for the classical backbone and the quantum attention module. The Adam optimizer is selected with an initial learning rate of $1 \times 10^{-4}$, complemented by a cosine annealing scheduling strategy to achieve stable convergence in the later stages of training. The training batch size is set to 16, and the number of training epochs is 50. The loss function utilizes the aforementioned Focal Loss to effectively alleviate the extreme imbalance in sample quantity and difficulty among defect categories. For the quantum attention module, the continuous latent vectors are embedded using $R_{y}$ rotation gates. The strongly entangling layer is configured with a depth of $L = 1$, where generic single-qubit rotations and CNOT gates are used to construct the entanglement topology. Finally, expectation measurements under the Pauli-Z basis are performed on all 10 qubits. To further improve reproducibility, the key implementation details of HQCA-Net are summarized in Table 1.
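For reference, the cosine annealing schedule described above can be written in closed form. The sketch below uses the stated initial learning rate and epoch count; the minimum learning rate of zero and the absence of warm restarts are assumptions, since the text does not specify them.

```python
import math

def cosine_annealing_lr(epoch, base_lr=1e-4, total_epochs=50, min_lr=0.0):
    """Learning rate at a given epoch under cosine annealing.

    base_lr and total_epochs follow the text; min_lr = 0 is an assumption.
    """
    cos_term = 1.0 + math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * cos_term

# Learning rate at the start of each epoch, plus the final value
schedule = [cosine_annealing_lr(e) for e in range(51)]
```

The rate starts at the initial value, passes through half of it at the midpoint, and decays smoothly toward `min_lr` at the end of training.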

3.5.3. Evaluation Metrics and Confidence Calibration

Considering the extreme sensitivity to false alarms in aircraft skin defect diagnosis, this paper abandons traditional single-accuracy-oriented evaluations. Instead, it constructs a dual-track evaluation system that emphasizes both diagnostic accuracy and decision-making reliability.

Basic diagnostic accuracy metrics. For classification performance, overall Accuracy (Acc) is adopted as the global metric. Precision (P), Recall (R), and F1-Score are additionally used for a fine-grained analysis of each defect category. The calculation formulas are as follows:

$$P = \frac{TP}{TP + FP} \tag{9}$$

$$R = \frac{TP}{TP + FN} \tag{10}$$

$$F1 = 2 \cdot \frac{P \cdot R}{P + R} \tag{11}$$

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \tag{12}$$

where $TP$, $TN$, $FP$, and $FN$ represent True Positives, True Negatives, False Positives (false alarms), and False Negatives (missed detections), respectively.
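Eqs. (9) to (11) can be evaluated per class from a multi-class confusion matrix by treating each class one-vs-rest. The following sketch assumes rows are true classes and columns are predictions, and computes overall accuracy as the fraction of correct predictions; the small matrix is made up for illustration only.

```python
def per_class_metrics(cm):
    """Per-class precision, recall, and F1 plus overall accuracy.

    cm[i][j] counts samples of true class i predicted as class j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))
    results = []
    for c in range(k):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(k)) - tp   # others predicted as c
        fn = sum(cm[c]) - tp                        # c predicted as others
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results.append((precision, recall, f1))
    return results, correct / total

# Made-up 3-class confusion matrix, for illustration only
cm = [[8, 2, 0],
      [1, 9, 0],
      [0, 0, 10]]
metrics, accuracy = per_class_metrics(cm)
```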

Core evaluation dimensions for decision reliability and confidence calibration. To quantitatively characterize the degree of model overconfidence and the risk of high-confidence false alarms in inspection scenarios, this paper introduces Reliability Diagrams and the Expected Calibration Error (ECE).

The Reliability Diagram divides the confidence probabilities output by the model into $M$ equally spaced bins, calculating the average predicted confidence $\mathrm{conf}(B_{m})$ and the actual empirical accuracy $\mathrm{acc}(B_{m})$ for each bin $B_{m}$. For an ideally calibrated model, its reliability curve should consistently align with the main diagonal ($\mathrm{conf}(B_{m}) = \mathrm{acc}(B_{m})$).

To further provide a comparable quantitative metric, ECE is defined as the weighted average of the errors across all bins:

$$\mathrm{ECE} = \sum_{m=1}^{M} \frac{|B_{m}|}{N_{total}} \left| \mathrm{acc}(B_{m}) - \mathrm{conf}(B_{m}) \right| \tag{13}$$

where $N_{total}$ is the total number of samples in the test set, and $|B_{m}|$ is the number of samples falling into the $m$-th bin. A lower ECE value indicates that the probability estimates output by the model are more reliable, that is, the predicted confidence more closely matches the observed accuracy. This is of critical engineering significance for zero-tolerance scenarios like aviation assembly.
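A minimal sketch of Eq. (13) is given below. It assumes $M = 10$ equal-width bins, which the text does not specify, and takes the top-class probability as the confidence of each prediction; the sample values are illustrative.

```python
def expected_calibration_error(confidences, correct, num_bins=10):
    """ECE from Eq. (13) with equally spaced confidence bins.

    confidences: top-class probability per sample, in [0, 1];
    correct: 1 if that prediction was right, else 0.
    """
    n = len(confidences)
    conf_sum = [0.0] * num_bins
    hit_sum = [0] * num_bins
    count = [0] * num_bins
    for conf, hit in zip(confidences, correct):
        m = min(int(conf * num_bins), num_bins - 1)  # conf = 1.0 goes to the last bin
        conf_sum[m] += conf
        hit_sum[m] += hit
        count[m] += 1
    ece = 0.0
    for m in range(num_bins):
        if count[m]:                      # empty bins contribute nothing
            avg_conf = conf_sum[m] / count[m]
            acc = hit_sum[m] / count[m]
            ece += count[m] / n * abs(acc - avg_conf)
    return ece

# Illustrative predictions: four at 0.95 confidence, two at 0.55
confidences = [0.95, 0.95, 0.95, 0.95, 0.55, 0.55]
correct = [1, 1, 1, 0, 1, 0]
ece = expected_calibration_error(confidences, correct)
```

The per-bin pairs of average confidence and accuracy computed inside the loop are the same quantities plotted in a reliability diagram.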

3.5.4. Baseline Selection and Fair Ablation Design

To verify whether the performance gain of HQCA-Net stems from the compact structured nonlinear recalibration capability of the simulated VQC-based RQCA module or merely from an increase in parameter scale, this paper designs a multi-level comparative experimental system. To ensure a fair evaluation, all comparative models use ResNet-18 as the unified backbone network. Well-established attention modules, including SE-Net, ECA-Net, and SimAM, are selected as baselines. These models represent different lightweight attention paradigms, ranging from classical fully connected dimensionality reduction to local convolutional interaction and parameter-free energy-based attention, thereby providing a rigorous comparative setting for evaluating HQCA-Net.

In the ablation study, this paper proposes a Fair Bottleneck Ablation scheme. This scheme forcibly compresses the dimension of the latent feature vectors for all attention modules to the same physical threshold, namely:

$$\dim(\theta_{latent}) = N = 10 \tag{14}$$

By consistently aligning this latent vector dimension with the number of qubits of the VQC proposed in this paper, the aim is to explore the feature reconstruction capabilities of different nonlinear mapping paradigms under an extremely narrow information bandwidth. Under this constraint, HQCA-Net is horizontally benchmarked against classical variants with strong nonlinear expression capabilities, such as Deep Multilayer Perceptrons (Deep-MLP) and Polynomial Expansions.

4. Results and Discussion

4.1. Trade-Off Between Baseline Performance and Accuracy-Reliability

In this section, an extensive evaluation of the classification performance is conducted using standard test sets to compare the proposed HQCA-Net against the classic ResNet-18 baseline and models integrated with mainstream lightweight attention modules. When confronting the intense background noise characteristic of complex aircraft skin images, the feature-fitting capabilities of classical architectures often encounter representational bottlenecks as they approach their performance limits. In contrast, HQCA-Net demonstrates improved feature remodeling capabilities that surpass those of the classical control groups.

The dynamic convergence characteristics of a model throughout the training cycle serve as a pivotal dimension for validating its structural advantage. To mitigate visual clutter caused by the overlapping of multiple curves, the evolutionary comparison under the full dataset is categorized into two groups. Figure 5 illustrates the evolution of the validation set F1 scores and training loss convergence curves for the core comparison group (HQCA-Net, classic baseline, and the SE module), while Figure 6 presents the corresponding indicators for the mainstream lightweight comparison group (ECA and SimAM modules).

As observed from the training loss convergence curves in Figure 5b and Figure 6b, all models achieve a rapid reduction in loss during the initial training phase. In the later stages of optimization, however, HQCA-Net follows a smoother and more compact trajectory than the classical lightweight architectures and converges to the lowest final training loss among the compared models. This suggests that inserting the simulated quantum module does not disrupt the gradient flow through the classical ResNet backbone and is consistent with stable optimization under the tested settings.

A further horizontal comparison of the validation set F1 score curves in Figure 5a and Figure 6a shows that classical models, including ECA and SimAM, which emphasize local features, tend to plateau as they approach their respective performance ceilings, which may reflect overfitting to complex background noise. This is manifested as prolonged oscillations or even performance degradation. HQCA-Net also exhibits minor fluctuations in the later stages, but these differ in character from the plateau oscillations of the classical models: whereas the classical models oscillate repeatedly beneath a lower ceiling, HQCA-Net keeps an upward trend, is the first to pass that ceiling, and repeatedly reaches higher F1 values during its fluctuations.

This comprehensive analysis across charts demonstrates that the simulated VQC-based attention module provides effective channel-wise feature recalibration under the tested aircraft skin defect recognition setting. The structured nonlinear mapping introduced by the simulated VQC, implemented via parameterized rotation gates and CNOT-based coupling, contributes to more stable optimization behavior. This compact and constrained feature interaction mechanism helps HQCA-Net achieve improved empirical performance compared with traditional classical architectures on complex aircraft skin defect samples, without implying quantum superiority or formal quantum computational advantage.

Comparing the above metrics, it is evident that HQCA-Net exhibits improved comprehensive evaluation performance, providing a more robust solution to the technical bottleneck of balancing high precision and low false alarms in aircraft skin visual inspection. When approaching performance limits, classical models often rely heavily on the aggressive fitting of high-frequency background noise. To pursue maximum quantitative metrics, their classical attention mechanisms frequently fall into an overly sensitive state of feature activation. In practical aircraft skin inspection scenarios involving sudden lighting changes or minor airborne perspective disturbances, this aggressive feature fitting not only easily translates into high false-alarm costs but also restricts further breakthroughs in the overall accuracy of the network. In contrast, benefiting from the compact structured nonlinear recalibration introduced by the simulated VQC-based RQCA module, HQCA-Net effectively circumvents such aggressive strategies, achieving the highest global accuracy while suppressing the false alarm rate to the lowest level. To further analyze the underlying logic of HQCA-Net in suppressing false alarms for field skin defects at a fine-grained level, this paper comparatively plots the classification confusion matrices of the classical SE-ResNet and HQCA-Net on the test set, as shown in Figure 7.

An analysis of the off-diagonal error distribution in the confusion matrices shows that SE-ResNet exhibits significant inter-class confusion between categories whose visual features are highly susceptible to light and shadow interference. In real-world aviation maintenance standards, cracks are classified as high-risk structural damage that often directly triggers grounding and maintenance orders, whereas minor dents and corrosion have certain monitoring and release tolerances. As shown in Figure 7a, the classical model misclassifies 6 dent samples and 4 corrosion samples as cracks with high confidence; more severely, it misses 12 high-risk crack samples by misjudging them as dents. Such cross-class misjudgments, triggered by highly similar local physical features such as edge sharpness and shadow variations, both drive up false-alarm costs in the field and pose serious safety hazards. This points to the representational limitations of classical convolutional networks when processing three-dimensional deformation features.

In contrast, HQCA-Net embedded with RQCA exhibits a tighter decision boundary. As shown in Figure 7b, benefiting from the structured nonlinear channel recalibration capability of the simulated VQC-based RQCA module, HQCA-Net avoids the oversensitive activation behavior of the classical network. It reduces the number of false-alarm samples, in which dents and corrosion are misjudged as cracks, from 6 and 4 to 3 cases each. More crucially, it reduces the number of dangerous missed detections, in which high-risk cracks are misjudged as dents, from 12 to 9 cases. The model also improves absolute recognition accuracy for specific categories such as paint-off: the overall number of correctly identified samples increases from 399 to 405, and correct identifications for the paint-off category rise from 115 to 117. These quantitative results indicate that HQCA-Net achieves an improved trade-off between high precision and low false alarms in aircraft skin visual inspection tasks. While attaining the highest global classification accuracy, it better meets the two stringent requirements of aircraft skin visual inspection, namely limiting both missed detections and false alarms, and thus provides a basis for more trustworthy intelligent diagnostic systems for aircraft skin defects.

Furthermore, to verify the competitiveness of HQCA-Net in lightweight industrial inspection scenarios, this paper extends the comparison to mainstream lightweight classical channel attention mechanisms, namely Efficient Channel Attention (ECA) and the parameter-free 3D attention SimAM. Although modern Transformer-based attention mechanisms possess stronger global modeling capabilities, their quadratic computational complexity and large parameter scale make them difficult to adapt to the real-time and low-parameter constraints of field aircraft inspection. Therefore, the baseline selection in this paper focuses on lightweight architectures suited to resource-constrained industrial inspection. The key performance indicators for each model on the full test set are presented in Table 2.

Analysis of the data in Table 2 indicates that after introducing classical lightweight attention modules, including SE, ECA, and SimAM, the network’s ability to aggregate local cross-channel features is enhanced, improving the classification accuracy from 95.41% for the pure baseline to 97.39%, while the false positive rate also shows a decreasing trend. However, traditional convolutional attention mechanisms may still face feature representation bottlenecks under complex field noise interference. In comparison, the proposed HQCA-Net benefits from the compact structured nonlinear recalibration introduced by the RQCA module. It achieves the highest classification accuracy of 97.93% among the evaluated models and reduces the global macro FPR to 0.49%, representing a relative reduction of nearly 50% compared with the pure baseline. These results indicate that the proposed simulation-based quantum-classical attention module achieves an improved empirical trade-off between classification accuracy and false-alarm suppression. Rather than increasing the fitting strength to local high-frequency noise, the RQCA module uses a compact structured nonlinear mapping to improve channel recalibration without substantially increasing the parameter scale.
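The macro FPR reported above can be illustrated with a short sketch. It assumes that "global macro FPR" means the unweighted mean of the one-vs-rest false positive rates over all classes, which this section does not state explicitly; the confusion matrix is made up for illustration.

```python
def macro_fpr(cm):
    """Macro-averaged one-vs-rest false positive rate.

    cm[i][j] counts samples of true class i predicted as class j.
    For class c, FP counts other-class samples predicted as c, and
    FP + TN is the number of samples whose true class is not c.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    rates = []
    for c in range(k):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(k)) - tp
        negatives = total - sum(cm[c])          # FP + TN for class c
        rates.append(fp / negatives if negatives else 0.0)
    return sum(rates) / k

# Made-up 3-class confusion matrix, for illustration only
cm_example = [[8, 2, 0],
              [1, 9, 0],
              [0, 0, 10]]
fpr = macro_fpr(cm_example)
```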

The macroscopic metrics in Table 2 reveal the global dominance of HQCA-Net across the entire category distribution. To further quantify the actual engineering and safety benefits brought by this feature extraction advantage in the most stringent real-world aviation inspection scenarios, this paper isolates the extremely high-risk defect “Crack,” which directly threatens flight safety, for a detailed safety performance analysis. In aviation maintenance practices, tolerating reasonable false alarms is the safety baseline, whereas missed detections are absolutely unacceptable fatal hazards. Therefore, under the benchmark of strictly controlling an equivalent reasonable false alarm cost (FPR = 1.54%), an in-depth comparison of the detection error trade-off performance for faint cracks was conducted between the most representative classical baseline ResNet18-SE and HQCA-Net. The intuitive performance comparison between the two in a logarithmic coordinate system is shown in Figure 8.

As shown in Figure 8, plotted on a logarithmic scale, the solid blue DET curve of HQCA-Net lies below the dashed red curve of the classical SE-ResNet model throughout the core operating region that represents high-reliability requirements. This suggests that under the stringent constraint of an ultra-low false alarm threshold, HQCA-Net maintains a lower missed detection level than the classical model.

The underlying statistics further show that near the 10⁻² magnitude on the X-axis, the ultra-low false-positive region of greatest concern to field operations, HQCA-Net eases the usual trade-off in visual inspection in which reducing missed detections increases false alarms. At an equivalent false-alarm cost of FPR ≈ 1.0%, HQCA-Net reduces the FNR of high-risk cracks from approximately 8.0% in the classical model to below 4.0%, a relative reduction of about 50%. Together with its 97.93% global classification accuracy, this indicates a clear improvement over the classical baseline model.
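An operating point on a DET curve can be read off by fixing a false-positive budget and finding the lowest achievable miss rate. The sketch below assumes per-sample crack scores are available for crack and non-crack samples; the scores are made up, and `fnr_at_fpr` is a hypothetical helper rather than the evaluation code used in the study.

```python
def fnr_at_fpr(pos_scores, neg_scores, max_fpr):
    """Lowest miss rate (FNR) achievable with FPR <= max_fpr.

    pos_scores: crack-class scores of true crack samples;
    neg_scores: crack-class scores of all other samples.
    A sample is flagged as a crack when its score >= threshold.
    """
    best_fnr = 1.0   # threshold above every score: nothing is flagged
    for t in sorted(set(pos_scores) | set(neg_scores)):
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        if fpr <= max_fpr:
            fnr = sum(s < t for s in pos_scores) / len(pos_scores)
            best_fnr = min(best_fnr, fnr)
    return best_fnr

def relative_reduction(before, after):
    """Relative decrease from `before` to `after`."""
    return (before - after) / before

pos = [0.9, 0.8, 0.7, 0.4]      # made-up crack scores
neg = [0.6, 0.3, 0.2, 0.1]      # made-up non-crack scores
fnr_loose = fnr_at_fpr(pos, neg, max_fpr=0.25)
fnr_strict = fnr_at_fpr(pos, neg, max_fpr=0.0)
reduction = relative_reduction(0.08, 0.04)   # 8.0% to 4.0%, as quoted in the text
```

With the quoted miss rates of 8.0% and 4.0%, the relative reduction evaluates to 0.5, matching the stated figure of about 50%.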

This multi-dimensional quantitative comparison confirms that the simulation-based quantum-classical attention module provides effective channel-wise feature recalibration at the macroscopic decision-making level. As classical convolutional models approach their performance limits, they are often constrained by limited local receptive fields, making it challenging to balance the suppression of high-frequency background noise with the preservation of faint defect features. In contrast, the HQCA-Net, using the structured nonlinear mapping of the single-layer simulated VQC, achieves improved empirical performance in both classification accuracy and false-positive suppression, helping establish more stable decision boundaries. This compact and structured feature interaction mechanism reduces the likelihood of false alarms while maintaining sensitivity to defect-relevant channels, thereby providing reliable support for automated diagnostic systems without implying quantum superiority or formal quantum computational advantage.

4.2. Decision Reliability and Calibration Analysis

In aircraft skin defect recognition, high-risk defects such as cracks require deep learning models to provide not only accurate classification results but also reliable confidence estimates. Under complex surface conditions, including reflections, weak defect boundaries, and background texture interference, a model with high overall accuracy may still produce unstable or overconfident predictions for ambiguous samples. To evaluate the decision reliability of the compared models, this paper introduces reliability diagrams to analyze the relationship between predicted confidence and empirical accuracy. The comparative results are shown in Figure 9.

As illustrated in the multi-model reliability diagrams in Figure 9, the networks differ in how well their confidence tracks their accuracy. Although the classical ResNet-18 baseline and some of its variants hold a slight numerical advantage in the global Expected Calibration Error (ECE), their calibration curves, especially those of the SimAM and SE variants, exhibit severe local fluctuations and distortions within the critical decision interval of 0.5 to 0.8. This implies that for ambiguous borderline samples, the output probabilities of the classical architectures are unstable and locally poorly calibrated. In contrast, HQCA-Net achieves the highest overall classification accuracy of 97.93% while its empirical accuracy increases smoothly and monotonically across the entire confidence interval, closely tracing the ideal calibration diagonal. These results indicate that HQCA-Net suppresses the local overconfidence or extreme underconfidence common in high-accuracy networks and produces consistent, well-calibrated diagnostic outputs across the probability range.

This smooth and stable calibration performance further supports the practical competitiveness of the proposed simulation-based hybrid architecture in industrial inspection scenarios. In traditional classical deep learning, smoothing local probability fluctuations and performing uncertainty calibration often require post-processing techniques such as Monte Carlo Dropout or Deep Ensembles. However, these techniques usually require multiple repeated forward inferences for the same input or the parallel execution of multiple large baseline networks, leading to a substantial increase in computational overhead and inference latency. This makes them less suitable for the stringent real-time requirements of field aircraft inspections. In contrast, HQCA-Net alleviates this computational burden by using the compact structured nonlinear recalibration mechanism within RQCA. The proposed module imposes a lightweight constraint on the channel feature space, allowing HQCA-Net to achieve a favorable balance between classification accuracy and decision stability with only a single forward pass, while avoiding the additional inference cost required by uncertainty post-processing methods such as Monte Carlo Dropout or Deep Ensembles.

4.3. Fair Bottleneck-Based Ablation Study and Parameter-Efficient Nonlinear Recalibration

In the study of hybrid quantum-classical architectures, an important issue is whether the observed performance improvement comes from the compact structured nonlinear recalibration capability of the simulated VQC-based module or simply from an increased number of trainable parameters. To avoid conflating parameter efficiency with quantum advantage, this study designs a fair bottleneck ablation experiment. The purpose of this experiment is not to prove formal quantum advantage, but to evaluate whether the proposed RQCA module can provide competitive nonlinear feature recalibration under the same 10-dimensional bottleneck with a substantially smaller parameter scale. Specifically, HQCA-Net is compared with four classical feature dimensionality reduction modules under an identical 10-dimensional information bottleneck, including Classical SE, Deep MLP, high-order polynomial expansion, and RBF kernel mechanisms. The validation accuracy evolution curves and quantitative results are shown in Figure 10 and Table 3.

Combining the quantitative comparison data in Table 3 with the long-term convergence trajectories in Figure 10, it can be observed that under the stringent condition where feature channels are compressed to a 10-dimensional bottleneck, different nonlinear feature recalibration mechanisms exhibit different representational characteristics and robustness. The classical linear SE module encounters an expression bottleneck in the ultra-low-dimensional space. It experiences relatively large performance fluctuations during training and ranks last with a final validation accuracy of approximately 95.41%. Meanwhile, the Deep MLP and RBF mechanisms introduce stronger nonlinear fitting ability, but they still exhibit longer recovery periods after learning-rate decay perturbations at the 10th and 20th epochs. These results suggest that increasing nonlinear fitting strength alone does not necessarily guarantee stable feature recalibration under an extremely narrow bottleneck.

In contrast, HQCA-Net demonstrates competitive recovery behavior after performance oscillations and presents a smoother convergence trend in the later training stage. Its final validation accuracy stabilizes at 96.31%, outperforming the classical SE, Deep MLP, and RBF variants, and reaching a performance level close to that of the high-order polynomial expansion. However, the polynomial expansion achieves this slightly higher accuracy at the cost of a very large parameter scale of more than 150 K additional parameters, whereas HQCA-Net uses only approximately 10 K classical projection parameters and 30 trainable quantum-gate parameters in the simulated VQC module.
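The parameter figures quoted above can be reproduced with simple arithmetic. The sketch below assumes that RQCA sits on a 512-channel feature map (the width of the last ResNet-18 stage), that the projections into and out of the 10-dimensional bottleneck carry no bias, and that each generic single-qubit rotation has three trainable angles. These are assumptions used for illustration; Table 1 of the paper is the authoritative source.

```python
def bottleneck_params(channels, bottleneck=10, bias=False):
    """Parameters of a C -> d -> C projection pair around the bottleneck."""
    down = channels * bottleneck + (bottleneck if bias else 0)
    up = bottleneck * channels + (channels if bias else 0)
    return down + up

def vqc_params(num_qubits=10, depth=1, angles_per_rotation=3):
    """Trainable angles when each layer applies one generic rotation per qubit."""
    return depth * num_qubits * angles_per_rotation

classical = bottleneck_params(512)   # 512 channels is an assumption
quantum = vqc_params()               # 10 qubits, depth L = 1
```

Under these assumptions the classical projections account for 10,240 parameters and the simulated circuit for 30, in line with the "approximately 10 K" and "30" figures in the text.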

These results indicate that the proposed RQCA module provides competitive feature recalibration capability under a compact 10-dimensional bottleneck. The comparison should be interpreted as evidence of parameter efficiency rather than as proof of formal quantum advantage. Although HQCA-Net reaches a performance level comparable to the high-order polynomial expansion with far fewer trainable quantum-gate parameters, this does not prove that classical models cannot approximate the same mapping with sufficient parameter scaling. Instead, it shows that the simulated VQC can serve as a highly compact container for nonlinear channel interactions under the tested bottleneck constraint. Therefore, the main claim supported by this ablation study is parameter-efficient nonlinear feature recalibration at the model level, rather than quantum computational superiority or hardware-level quantum acceleration.

4.4. Visual Interpretability Analysis Based on Grad-CAM

To further examine whether the proposed constrained channel-recalibration mechanism produces more defect-focused responses, this paper uses Grad-CAM to qualitatively compare the deep feature activation patterns across six typical aircraft skin defects. It should be emphasized that the Grad-CAM results are used as qualitative evidence consistent with the theoretical interpretation in Section 3.3, rather than as standalone proof of feature separation or quantum superiority. The visual comparison results are shown in Figure 11.
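For readers unfamiliar with Grad-CAM, the core computation is small: each feature map is weighted by the spatial average of its gradient, the weighted maps are summed, and negative values are clipped. The sketch below works on plain nested lists and assumes the activations and gradients of one convolutional layer have already been extracted; the toy tensors are made up.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heatmap from one layer's activations and gradients.

    activations[k][i][j]: feature map k at spatial position (i, j);
    gradients[k][i][j]: gradient of the class score w.r.t. that activation.
    Returns a heatmap scaled so that its maximum is 1 (if any value is positive).
    """
    h, w = len(activations[0]), len(activations[0][0])
    # Channel weight = global average of the gradient over spatial positions
    weights = [sum(map(sum, g)) / (h * w) for g in gradients]
    # Weighted sum over channels followed by ReLU
    cam = [[max(0.0, sum(wk * a[i][j] for wk, a in zip(weights, activations)))
            for j in range(w)] for i in range(h)]
    peak = max(max(row) for row in cam)
    return [[v / peak for v in row] for row in cam] if peak > 0 else cam

# Toy example: two 2x2 feature maps with opposite-sign gradients
activations = [[[1.0, 0.0], [0.0, 0.0]],
               [[0.0, 0.0], [0.0, 2.0]]]
gradients = [[[1.0, 1.0], [1.0, 1.0]],
             [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(activations, gradients)
```

In the toy example only the top-left position stays active, because the second channel has a negative weight and its contribution is removed by the ReLU.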

From the visualization details in Figure 11, it can be observed that classical attention mechanisms tend to produce more dispersed and less defect-focused activation patterns in the deep feature space. On real aviation skin images, the highly activated regions (represented by the red to yellow color spectrum) of classical models like SE-ResNet18 are disordered and diffuse. A large share of the attention weight is allocated to defect-free normal skin areas, complex light-and-shadow gradients, or irrelevant mechanical edges, as observed in the Corrosion and Scratch samples in Figure 11.

In contrast, the attention heatmaps of HQCA-Net demonstrate more concentrated defect-related responses and rigorous spatial constraint capabilities. HQCA-Net exhibits improved spatial localization accuracy. It not only precisely anchors the deep-red, high-response activation core regions directly onto the defect itself but also achieves an efficient reconstruction of the complex geometric topologies of the defects. For instance, when dealing with the complex Paint-off defect, the attention response of HQCA-Net is more concentrated around the crescent-shaped edge region of the defect. When capturing faint Scratches, the highly responsive regions are more consistent with the physical diagonal trajectory of the defect, reducing the influence of surrounding background noise.

These Grad-CAM observations are consistent with the theoretical interpretation in Section 3.3. They suggest that the constrained nonlinear channel recalibration introduced by RQCA can reduce attention dispersion caused by background reflections and irrelevant mechanical edges, while enhancing responses around defect-related regions. However, these visualization results should be interpreted as qualitative support for the proposed feature recalibration mechanism, rather than as direct proof of quantum advantage or formal feature disentanglement.

4.5. Performance Boundary Analysis Under Challenging Conditions

This paper further evaluates the performance limits of HQCA-Net under challenging conditions. The experiments explore the potential limitations of the hybrid quantum-classical architecture along two dimensions: environmental noise sensitivity and sample scarcity.

The Gaussian noise robustness analysis is shown in Figure 12. As the standard deviation of Gaussian noise increases from 0.00 to an extreme threshold of 0.20, the random noise severely disrupts the global pixel correlation and the underlying manifold structure of the images, causing the classification accuracy of all tested models to inevitably exhibit a monotonically decreasing trend. However, during this process, the sensitivities of different attention mechanisms to noise reveal significant differences.
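The noise sweep described above can be sketched as follows. This is an illustrative protocol, not the authors' evaluation script: it assumes images scaled to [0, 1], clips the perturbed pixels back into that range, and takes a hypothetical `predict` callable that maps a batch of images to class indices.

```python
import numpy as np

def add_gaussian_noise(images, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation `sigma` to images in [0, 1]."""
    noisy = images + rng.normal(0.0, sigma, size=images.shape)
    return np.clip(noisy, 0.0, 1.0)  # clipping is an assumption, not stated in the text

def accuracy_under_noise(predict, images, labels, sigmas, seed=0):
    """Return {sigma: accuracy} for a callable `predict(images) -> class indices`."""
    rng = np.random.default_rng(seed)
    results = {}
    for sigma in sigmas:
        preds = predict(add_gaussian_noise(images, sigma, rng))
        results[sigma] = float(np.mean(preds == labels))
    return results
```

A call such as `accuracy_under_noise(model_fn, x_test, y_test, [0.0, 0.05, 0.10, 0.15, 0.20])` would produce one accuracy value per noise level. Only the end points 0.00 and 0.20 come from the text; the intermediate levels are placeholders.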

In the noise-free setting, HQCA-Net leads with an accuracy of 97.97%, indicating high sensitivity to faint defect features under clean conditions. However, when the noise standard deviation increases to 0.20, classical MLP-based attention mechanisms degrade severely. The accuracy of SE-ResNet18 drops to 23.83%, well below the 38.67% achieved by the ResNet18 baseline without any attention mechanism. This suggests that traditional fully connected channel attention can lose its discriminative ability under strong noise interference and may instead amplify high-frequency noise components.

In contrast, HQCA-Net maintains an accuracy of 42.36% under extreme noise conditions. Although lightweight modules such as SimAM and ECA also show some numerical resistance to degradation, this may be attributable to the underfitting tendency of their minimalist structures under complex perturbations rather than to active feature purification. The robustness of HQCA-Net is associated with the compact constraint introduced by the 10-dimensional information bottleneck in the RQCA module. The latent feature vectors are restricted to a highly compressed representation space before being processed by the simulated VQC-based recalibration module. This bottleneck, together with the small parameter budget, appears to act as a filter for random high-frequency signals. As a result, HQCA-Net is less prone to the noise fitting seen in classical modules with more parameters while avoiding the limited representational capacity of minimalist modules, combining detection accuracy with robustness.

This multi-dimensional comparison further supports the model-level robustness of the proposed RQCA module in feature recalibration. Under clean or mildly noisy conditions, the simulated VQC-based attention module uses parameterized rotations, CNOT-based structured coupling, and expectation measurements to generate compact nonlinear channel responses, which helps enhance defect-related features and suppress background interference. When exposed to stronger noise perturbations, the low-dimensional bottleneck and single-layer circuit design constrain the channel recalibration process, thereby reducing the risk of fitting random high-frequency noise. Compared with classical attention modules with larger or less constrained nonlinear mappings, this compact structured feature interaction mechanism can improve robustness while maintaining a lightweight architecture. These results provide empirical evidence for the reliability of the proposed simulation-based feature recalibration strategy, but do not constitute proof of quantum superiority or formal quantum advantage.

In the Few-Shot stress test using only 10% of the training data, as shown in Figure 13 and Table 4, HQCA-Net exhibits dynamic characteristics of rapid initial convergence and broader performance boundaries early in the training phase. At the very early stage of Epoch 2, the validation accuracy of HQCA-Net rapidly climbs to 55.94%, demonstrating a clear convergence advantage over the approximately 42% accuracy of the classical baseline. These results suggest that, in scenarios with extreme data scarcity, the simulated VQC-based RQCA module can provide a compact nonlinear recalibration constraint that helps the model learn defect-related channel patterns more efficiently than several lightweight comparison models. This observation should be interpreted as empirical evidence of parameter-efficient feature recalibration under limited training samples, rather than as proof of superior quantum representational capability.
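One way to build a 10% training subset for such a stress test is class-stratified sampling, sketched below with the Python standard library. Whether the study sampled per class or uniformly at random is not stated here, so the stratification, the seed, and the function name `stratified_subset` are assumptions made for illustration.

```python
import random
from collections import defaultdict

def stratified_subset(labels, fraction=0.10, seed=0):
    """Return sorted indices covering roughly `fraction` of every class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    chosen = []
    for label in sorted(by_class):
        indices = by_class[label]
        k = max(1, round(fraction * len(indices)))  # keep at least one sample per class
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)
```

Sampling per class keeps all six defect categories represented even when only a small fraction of the data is retained, which matters for rare categories.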

As training progresses, HQCA-Net continues to improve and reaches a final generalization accuracy of 82.01%. This result surpasses lightweight attention mechanisms such as ECA (80.22%) and SimAM (80.31%) and approaches that of the classical SE architecture (83.00%). The validation loss in Figure 13b shows that, under extreme data constraints, the model experiences a certain degree of overfitting fluctuation in the mid-to-late stages, which may be related to the strong nonlinear fitting capacity of the simulated VQC-based recalibration module. Even so, Table 4 shows that HQCA-Net reaches a final False Positive Rate (FPR) of 4.27%, lower than that of SimAM (4.60%) although slightly higher than that of SE (4.00%). These data indicate that, under small-sample conditions, HQCA-Net maintains competitive decision boundaries, offering a viable option for defect detection tasks in the aviation industry where sample acquisition is difficult.
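For reference, a multi-class false positive rate of the kind reported in Table 4 can be computed from a confusion matrix in a one-vs-rest fashion. The sketch below returns both the macro and the micro average; which averaging the study uses is not stated in this section, so neither should be read as the authors' exact definition.

```python
import numpy as np

def one_vs_rest_fpr(confusion):
    """confusion[i, j] = number of samples with true class i predicted as class j."""
    confusion = np.asarray(confusion, dtype=float)
    total = confusion.sum()
    tp = np.diag(confusion)
    fp = confusion.sum(axis=0) - tp      # predicted as class k, but true class differs
    fn = confusion.sum(axis=1) - tp      # true class k, but predicted as something else
    tn = total - tp - fp - fn
    per_class = fp / (fp + tn)           # FPR_k = FP_k / (FP_k + TN_k)
    macro = float(per_class.mean())
    micro = float(fp.sum() / (fp.sum() + tn.sum()))
    return per_class, macro, micro
```

The function assumes at least two classes with samples, so that every denominator `fp + tn` is non-zero.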

Taken together, the experiments above indicate the potential advantages of HQCA-Net as a hybrid quantum-classical architecture: rapid initial optimization, precise feature anchoring, and robustness to interference. Under strong noise and severe data scarcity in particular, the architecture shows feature discrimination and generalization behavior comparable to or better than that of modern classical attention mechanisms, limiting the performance degradation caused by external perturbations or scarce data.

4.6. Trainability of the Quantum Module and Hyperparameter Space Exploration

In the design of HQCA-Net, the trainability and hyperparameter configuration of the quantum channel attention module largely determine the final performance of the system. Variational quantum machine learning is prone to the “barren plateau” problem, in which the gradient variance vanishes exponentially as the number of qubits and the circuit depth grow. This section therefore analyzes the gradient evolution trajectory and the architecture design space of HQCA-Net experimentally.

4.6.1. Evolution of Quantum Gradient Variance and Analysis of Vanishing Risk

To quantitatively evaluate the trainability of the Variational Quantum Circuit (VQC) in HQCA-Net, a dedicated gradient probe mechanism was designed. During the model training process, the gradient variance of the core quantum rotation gate parameters within the RQCA module was tracked in real-time and accurately extracted. Its evolution trajectory over the complete 50-epoch training cycle is shown in Figure 14.

As shown in Figure 14, throughout the entire training cycle, the gradient variance of the quantum parameters in HQCA-Net starts smoothly from an initial magnitude of $O(10^{-7})$, gradually converges as the network optimizes, and ultimately remains relatively stable around $10^{-8}$. When confronting the complex non-convex optimization landscape inherent to aircraft skin defect recognition, the gradient variance maintains an effective numerical range without exhibiting obvious exponential decay or convergence to zero. This indicates that the model can continuously obtain effective parameter update directions during training. Based on the recorded training logs, these results suggest that the adopted shallow VQC design does not exhibit obvious gradient-variance collapse in this experimental setting. Therefore, the proposed RQCA module can maintain effective parameter updates during training, thereby supporting the trainability of the hybrid quantum-classical architecture in the tested aircraft skin defect recognition task.
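A gradient probe of this kind can be illustrated on a toy problem. The sketch below is not the probe used in this study: it replaces the RQCA circuit with a product of independent single-qubit RY rotations, whose Pauli-Z expectation is cos(θ), computes the gradient with the two-term parameter-shift rule, and logs the variance of the gradient entries across parameters. How the study aggregates the variance (per step, per epoch, across parameters or across batches) is an assumption here.

```python
import numpy as np

def toy_expectation(thetas):
    """Mean <Z> over qubits, each prepared as RY(theta)|0>, for which <Z> = cos(theta)."""
    return float(np.mean(np.cos(thetas)))

def parameter_shift_grad(cost, thetas, shift=np.pi / 2):
    """Gradient of `cost` for each rotation angle via the two-term parameter-shift rule."""
    grads = np.zeros_like(thetas)
    for i in range(len(thetas)):
        plus, minus = thetas.copy(), thetas.copy()
        plus[i] += shift
        minus[i] -= shift
        grads[i] = (cost(plus) - cost(minus)) / 2.0
    return grads

def gradient_variance(grads):
    """Variance of the gradient entries across parameters (one scalar per logged step)."""
    return float(np.var(grads))

rng = np.random.default_rng(0)
thetas = rng.uniform(-np.pi, np.pi, size=30)   # 30 angles, matching the reported count
grads = parameter_shift_grad(toy_expectation, thetas)
variance_log = [gradient_variance(grads)]      # in training, append one value per step
```

For this toy cost the parameter-shift result equals the analytic gradient, -sin(θ_i)/30, which makes the sketch easy to check. A collapsing `variance_log` over training would be the symptom that the probe is meant to detect.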

Regarding the observed gradient stability, it can be explained from two architectural design aspects:

1. Low-dimensional bottleneck and shallow circuit topology. Through the pre-introduced classical global average pooling and feature bottleneck mechanisms, the latent feature vectors entering the variational quantum circuit are strictly restricted to 10 dimensions. This design not only compresses the quantum operation space to 10 qubits but also, guided by ablation studies, adopts a topological architecture based on a single-layer strongly entangling circuit. This dimensionality reduction and shallow topology help reduce the risk of unstable gradient behavior caused by increased latent dimensionality or excessive circuit depth.

2. Residual recalibration topology for stable gradient propagation. The RQCA module adopts a residual recalibration topology, which helps preserve the stable transmission of deep visual feature flows during forward propagation. During backpropagation, this structure provides an identity-mapping-based residual path for error gradients. This architecture-level skip connection can mitigate gradient attenuation during hybrid module optimization, thereby supporting stable parameter updates in the tested training setting.
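The two design aspects above can be made concrete with a small state-vector sketch in NumPy. It is an illustration under stated assumptions rather than the authors' PyTorch and PennyLane implementation: the circuit uses RY angle embedding, one three-angle rotation per qubit, a ring of CNOTs, and Pauli-Z expectations; the `tanh` scaling of the latent vector, the sigmoid gate, and the residual form F + F ⊙ w are hypothetical choices, and the function names are introduced here for illustration.

```python
import numpy as np

def ry(theta):
    """Single-qubit rotation about the Y axis."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def rz(phi):
    """Single-qubit rotation about the Z axis."""
    return np.array([[np.exp(-0.5j * phi), 0], [0, np.exp(0.5j * phi)]], dtype=complex)

def apply_1q(state, gate, wire):
    """Apply a 2x2 gate to one qubit of a state tensor of shape (2,) * n."""
    state = np.tensordot(gate, state, axes=([1], [wire]))
    return np.moveaxis(state, 0, wire)

def apply_cnot(state, control, target):
    """Flip `target` on the half of the state where `control` is |1>."""
    state = state.copy()
    idx = [slice(None)] * state.ndim
    idx[control] = 1
    axis = target if target < control else target - 1  # axis index once control is dropped
    state[tuple(idx)] = np.flip(state[tuple(idx)], axis=axis).copy()
    return state

def vqc_expectations(latent, weights):
    """latent: (n,) angles; weights: (L, n, 3) trainable angles. Returns (n,) <Z> values."""
    n = len(latent)
    state = np.zeros((2,) * n, dtype=complex)
    state[(0,) * n] = 1.0                                   # start in |0...0>
    for w in range(n):                                      # RY angle embedding
        state = apply_1q(state, ry(latent[w]), w)
    for layer in weights:
        for w in range(n):                                  # three-angle rotation per qubit
            phi, theta, omega = layer[w]
            state = apply_1q(state, rz(omega) @ ry(theta) @ rz(phi), w)
        for w in range(n):                                  # ring of CNOTs
            state = apply_cnot(state, w, (w + 1) % n)
    probs = np.abs(state) ** 2
    z = []
    for w in range(n):
        marginal = probs.sum(axis=tuple(a for a in range(n) if a != w))
        z.append(marginal[0] - marginal[1])                 # <Z> = P(0) - P(1)
    return np.array(z)

def rqca(features, w_down, b_down, w_up, b_up, q_weights):
    """Residual channel recalibration of (C, H, W) features (hypothetical form)."""
    z = features.mean(axis=(1, 2))                          # global average pooling
    latent = np.pi * np.tanh(w_down @ z + b_down)           # bottleneck, squashed to angles
    q_out = vqc_expectations(latent, q_weights)             # simulated circuit outputs
    attn = 1.0 / (1.0 + np.exp(-(w_up @ q_out + b_up)))     # channel weights in (0, 1)
    return features + features * attn[:, None, None]        # identity path plus recalibration

rng = np.random.default_rng(0)
C = 16                                                      # small channel count for the demo
features = rng.normal(size=(C, 7, 7))
q_weights = rng.uniform(0, 2 * np.pi, size=(1, 10, 3))      # one layer: 10 x 3 = 30 angles
out = rqca(features, rng.normal(size=(10, C)), np.zeros(10),
           rng.normal(size=(C, 10)), np.zeros(C), q_weights)
```

With 10 qubits the state vector has only 1024 amplitudes, so the simulation is cheap, and the identity term in the last line of `rqca` is what gives gradients a direct path around the circuit. The sketch uses nearest-neighbour CNOTs for every layer, whereas PennyLane's `StronglyEntanglingLayers` template varies the CNOT range with the layer index when more than one layer is used.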

4.6.2. Ablation Study and Impact Analysis of Circuit Depth on Nonlinear Recalibration Capability

The number of entangling layers directly affects the nonlinear recalibration capacity of the simulated VQC module and the complexity of the optimization landscape. To confirm that the circuit depth in this paper is the optimal configuration, systematic ablation experiments regarding $L$ were conducted on the validation set. The comparisons of final validation accuracy and parameter counts under different depth configurations are shown in Table 5 and Figure 15.

From the experimental results, it can be seen that when the number of entangling layers is L = 1, HQCA-Net achieves a good balance between feature recalibration capability and optimization difficulty, with the validation accuracy peaking at 97.93%. This indicates that, for aircraft skin defect recognition, a single-layer strongly entangling variational quantum circuit already provides sufficient compact nonlinear recalibration capability under the tested dataset and model configuration, without requiring additional circuit-depth stacking. Increasing the circuit depth may introduce more trainable parameters and optimization complexity, which does not necessarily lead to further performance improvement in this task.

However, when the circuit depth is further increased to $L = 2, 3, 4$, the classification performance of the model does not improve as the number of quantum parameters grows from 30 to 120. Instead, it degrades, with the validation accuracy dropping to 96.94%, 95.77%, and 96.94%, respectively. This suggests that, in this industrial visual diagnosis task, additional circuit depth and parameters lead to over-parameterization. An overly deep variational circuit increases the non-convexity of the optimization landscape, and its surplus fitting capacity makes the model prone to overfitting common high-frequency background noise in industrial images. This in turn triggers oscillations in the late training stages (notably at $L = 4$) and ultimately damages the model’s generalization performance.
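The quantum parameter counts quoted for the depth ablation follow from a simple relation, assuming that every entangling layer applies one three-angle rotation to each of the $N$ qubits. This assumption is consistent with the 30 parameters reported for $N = 10$ and $L = 1$:

$$P(L) = 3NL, \qquad N = 10 \;\Rightarrow\; P(1) = 30,\; P(2) = 60,\; P(3) = 90,\; P(4) = 120.$$

The parameter count therefore grows only linearly with depth, so the accuracy drop at larger $L$ is more plausibly tied to the optimization landscape than to the raw size of the module.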

In summary, the results suggest that L = 1 is an appropriate configuration for the quantum channel attention module. With only 30 trainable quantum parameters, this design enables effective recalibration of high-dimensional features while reducing the risk of overfitting and parameter redundancy at the model-design level. It achieves a favorable balance between detection accuracy and trainability in resource-constrained aircraft skin defect recognition scenarios.

4.7. Model Complexity and Practical Feasibility Analysis

In practical aircraft skin defect recognition tasks, lightweight visual models are expected to maintain a favorable balance between recognition performance and computational cost. To further evaluate the lightweight property and practical feasibility of the proposed HQCA-Net, this section provides a model-level complexity comparison with ResNet-18 and classical attention baselines, including SE, ECA, and SimAM. The comparison focuses on total parameter count, attention-module parameter increment, and Multiply-Accumulate Operations (MACs) under a standard input resolution of 224 × 224. It should be noted that these metrics reflect model-level computational complexity rather than measured inference latency, memory footprint, or energy consumption on embedded hardware. The results are presented in Table 6.

4.7.1. Evaluation of Model-Level Computational Burden and Parameter Reduction

As shown in Table 6, all compared models maintain the same model-level MACs of 1.823 G under the standard 224 × 224 input setting. This is because the attention modules are inserted after the final convolutional stage, where the spatial resolution of the feature map has already been substantially reduced. Therefore, the additional computational cost introduced by these attention modules is negligible compared with the convolutional operations in the ResNet-18 backbone.

In terms of parameter count, the SE module increases the model size by 32.76 K parameters due to its fully connected excitation structure. In contrast, the proposed RQCA module introduces 10.76 K additional parameters, including the classical projection and reconstruction layers as well as 30 trainable quantum-gate parameters in the simulated VQC. Therefore, compared with SE-ResNet18, HQCA-Net reduces the attention-module parameter increment by approximately 67.1% while achieving higher classification accuracy and a lower false positive rate in the tested aircraft skin defect recognition task.
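The arithmetic behind this comparison can be checked with a few lines of Python. The first part uses only the rounded figures reported above. The second part is a hypothetical breakdown: it assumes 512 channels at the insertion point, a bias-free SE block with reduction ratio 16, and RQCA projection and reconstruction layers of size 512 → 10 → 512 with biases. These layer shapes are not confirmed by this section.

```python
# Reported attention-module parameter increments (in thousands), taken from the text.
se_params_k = 32.76
rqca_params_k = 10.76

# Relative reduction computed from the rounded figures.
reduction = (se_params_k - rqca_params_k) / se_params_k
print(f"reduction from rounded figures: {reduction:.1%}")

# Hypothetical breakdown (assumptions, see the lead-in).
C, r, bottleneck, quantum = 512, 16, 10, 30
se_count = 2 * C * (C // r)                                         # two bias-free FC layers
rqca_classical = (C * bottleneck + bottleneck) + (bottleneck * C + C)
reduction_exact = 1 - (rqca_classical + quantum) / se_count
print(f"SE: {se_count}, RQCA classical: {rqca_classical}, reduction: {reduction_exact:.1%}")
```

Under these assumptions the counts come out at 32,768 and 10,762, close to the reported 32.76 K and 10.76 K, and the reduction lands at about 67% either way.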

It should be noted that the MACs reported in this section reflect model-level computational complexity rather than measured inference latency on physical devices. Since no embedded hardware benchmarking was conducted in this study, the actual runtime, memory usage, and power consumption of HQCA-Net on edge platforms still require further experimental verification.

4.7.2. Performance–Complexity Trade-Off

The comparison in Table 6 indicates that different attention mechanisms present different trade-offs between parameter overhead and recognition performance. ECA and SimAM introduce almost no additional parameters, but their recognition accuracy is lower and their false positive rate is higher than those of HQCA-Net in this experiment. SE improves the baseline model but introduces a larger parameter increment than the proposed RQCA module. HQCA-Net achieves the highest accuracy of 97.93% and the lowest FPR of 0.49% among the compared models, while keeping the total parameter count at 11.190 M.

These results suggest that the proposed simulated VQC-based attention module can provide a favorable balance between model complexity and recognition reliability. Rather than substantially increasing the parameter scale, RQCA performs channel feature recalibration through a compact 10-dimensional bottleneck and a 10-qubit simulated variational quantum circuit. In this context, the simulated VQC should be understood as a compact container for nonlinear channel interactions, rather than as evidence of formal quantum advantage. This design allows HQCA-Net to improve false-alarm suppression and classification performance with a moderate parameter increment.

In summary, the model complexity analysis shows that HQCA-Net achieves improved recognition performance and reliability with limited additional parameter overhead. This finding should be interpreted as model-level evidence of parameter-efficient nonlinear feature recalibration, not as proof of quantum computational superiority or hardware-level quantum acceleration. However, the present analysis is restricted to parameter count and MACs. Future work should further evaluate the proposed model on practical embedded computing platforms to measure actual inference latency, memory consumption, and energy efficiency.

5. Conclusions and Future Work

Aircraft skin defect recognition is a safety-critical visual inspection task that requires not only high classification accuracy but also reliable decision confidence and effective false-alarm suppression. Existing lightweight deep learning models may suffer from predictive overconfidence and feature confusion when facing highly reflective aircraft skin surfaces, weak defect boundaries, and complex background textures. To address these challenges, this paper proposed HQCA-Net, a simulation-based Hybrid Quantum-Classical Channel Attention Network for reliable aircraft skin defect recognition. The core Residual Quantum Channel Attention (RQCA) module embeds a 10-qubit simulated Variational Quantum Circuit into a classical ResNet-18 backbone to perform compact and structured channel-wise feature recalibration.

Experimental results on a six-class aircraft skin defect dataset show that HQCA-Net achieves a classification accuracy of 97.93% and a global false positive rate of 0.49%, outperforming ResNet-18 and several classical lightweight attention mechanisms, including SE, ECA, and SimAM. In addition to overall classification performance, the proposed model demonstrates improved reliability in terms of confusion matrix analysis, confidence calibration, and false-alarm suppression. The reliability diagrams indicate that HQCA-Net produces a smoother confidence-accuracy relationship than several classical comparison models, suggesting improved decision stability for ambiguous aircraft skin defect samples.

The ablation studies further show that the proposed RQCA module can provide competitive feature recalibration capability under a compact 10-dimensional bottleneck. With only 30 trainable quantum-gate parameters, the simulated VQC-based attention module achieves effective nonlinear feature interaction while maintaining a lightweight parameter scale. It is important to emphasize that this result should be interpreted as evidence of parameter-efficient feature recalibration rather than as proof of formal quantum advantage. In other words, the fair bottleneck comparison demonstrates that the simulated VQC can serve as a highly compact container for nonlinear channel interactions under limited parameter budgets, but it does not prove that the learned transformation is inaccessible to classical models with arbitrary parameter scaling. The main claim supported by this experiment is therefore parameter efficiency and compact feature recalibration, not quantum computational superiority.

Visual interpretation based on Grad-CAM is interpreted together with the theoretical explanation provided in Section 3.3, rather than being used as standalone evidence of feature separation. The visualization results suggest that HQCA-Net can focus more consistently on defect-related regions and reduce attention dispersion caused by background reflections and irrelevant surface textures, which is consistent with the proposed constrained channel-recalibration mechanism. Robustness experiments under Gaussian noise perturbation and few-shot training conditions further indicate that the proposed model maintains competitive performance under challenging visual inspection scenarios. These results suggest that the hybrid quantum-classical channel attention module can serve as a promising parameter-efficient nonlinear feature recalibration strategy for reliable aircraft skin defect recognition.

However, this study still has several limitations. First, the quantum component was evaluated using state-vector simulation, and no physical quantum hardware was involved. Therefore, the reported results should be interpreted as empirical, model-level evidence for the effectiveness of a simulated variational quantum circuit as a compact nonlinear feature recalibration module, rather than as a demonstration of hardware-level quantum acceleration, quantum supremacy, or formal quantum computational advantage. In other words, this study does not prove that the learned feature transformation is inaccessible to classical models; instead, it shows that the simulated VQC can provide an effective and parameter-efficient structured mapping for channel attention in the tested aircraft skin defect recognition scenario.

Second, a notable weakness and necessary trade-off of the proposed method is the computational overhead introduced during the training phase. Although the additional parameter count of the RQCA module is extremely compact, the state-vector simulation of the quantum circuit on classical hardware increases the training time per epoch compared with purely classical attention mechanisms. This trade-off is currently associated with using the simulated VQC as a structured nonlinear interaction module to improve reliability and suppress false alarms. Third, although the model complexity was analyzed in terms of parameters and MACs, real-time implementation on practical embedded platforms, including potential onboard UAV systems, was not conducted in this work. Fourth, the experiments were performed on a single aircraft skin defect dataset, and further validation on larger external datasets is still required to assess generalization ability.

Future work will focus on three aspects. First, more diverse aircraft skin defect datasets will be collected or introduced to evaluate the cross-domain robustness of HQCA-Net under different illumination conditions, surface materials, and imaging devices. Second, the model will be tested on practical edge-computing platforms, such as embedded GPUs, to measure real inference latency, memory usage, and power consumption. Third, hardware-compatible quantum circuit designs, noise-aware quantum simulation, and stronger classical nonlinear baselines will be investigated to further clarify the practical value and theoretical boundary of simulated hybrid quantum-classical attention mechanisms for visual defect inspection.

Author Contributions

Conceptualization, S.J.; methodology, S.J.; software, H.P.; validation, H.P.; formal analysis, H.P.; investigation, H.P.; data curation, H.P.; writing—original draft preparation, S.J.; writing—review and editing, H.P., S.J., D.Z. and Y.Z.; visualization, D.Z. and Y.Z.; supervision, S.J.; project administration, S.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of Sichuan Province, grant number 2024NSFSC0522. The APC was funded by the Natural Science Foundation of Sichuan Province.

Data Availability Statement

The data presented in this study are available from the corresponding author upon reasonable request due to project-related data management restrictions.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

CNN	Convolutional Neural Network
CNOT	Controlled-NOT
DET	Detection Error Tradeoff
ECA	Efficient Channel Attention
ECE	Expected Calibration Error
FNR	False Negative Rate
FPR	False Positive Rate
GAP	Global Average Pooling
Grad-CAM	Gradient-weighted Class Activation Mapping
HQCA-Net	Hybrid Quantum-Classical Channel Attention Network
MACs	Multiply-Accumulate Operations
MLP	Multi-Layer Perceptron
OOD	Out-of-Distribution
QA	Quantum Annealing
QML	Quantum Machine Learning
RQCA	Residual Quantum Channel Attention
SE	Squeeze-and-Excitation

Nomenclature

X	Input defect image
F	Deep semantic feature map extracted by the classical backbone
C	Number of feature channels
H	Height of the feature map
W	Width of the feature map
z	Channel descriptor obtained by global average pooling
N	Number of qubits in the variational quantum circuit
L	Depth of the strongly entangling layer
θ	Classical latent vector encoded into the quantum circuit
ϕ	Trainable parameters of the variational quantum circuit
$R_y$	Single-qubit rotation gate around the Y-axis
$W_{att}$	Channel attention weights generated by the RQCA module
$\hat{y}$	Predicted probability vector
y	Ground-truth label vector
$ℒ$	Training loss function
α	Class balancing factor in Focal Loss
γ	Focusing parameter in Focal Loss
TP	True positive
TN	True negative
FP	False positive
FN	False negative

References

Aviation Safety Council. In-Flight Breakup over the Taiwan Strait Northeast of Makung, Penghu Island, China Airlines Flight CI611, Boeing 747-200, B-18255, 25 May 2002. In Aviation Occurrence Report ASC-AOR-05-02-001; Aviation Safety Council: Taipei, Taiwan, 2005.
Donatus, R.E.; Ubadike, O.C.; Bonet, M.U.; Iyaghiba, S.D.; Donatus, I.H.; Mbada, N.I. Automated aircraft structural defect detection using deep learning and computer vision. Mekatronika 2025, 7, 108–123.
Plastropoulos, A.; Bardis, K.; Yazigi, G.; Avdelidis, N.P.; Droznika, M. Aircraft skin machine learning-based defect detection and size estimation in visual inspections. Technologies 2024, 12, 158.
Cheng, Y.; Cao, Y.; Yao, H.; Luo, W.; Jiang, C.; Zhang, H.; Shen, W. A comprehensive survey for real-world industrial defect detection: Challenges, approaches, and prospects. arXiv 2025, arXiv:2507.13378.
Bai, J.; Wu, D.; Shelley, T.; Schubel, P.; Twine, D.; Russell, J.; Zeng, X.; Zhang, J. A Comprehensive Survey on Machine Learning Driven Material Defect Detection: Challenges, Solutions, and Future Prospects. arXiv 2024, arXiv:2406.07880.
Connolly, L.; Garland, J.; O’Gorman, D.; Tobin, E.F. Deep-Learning-Based Defect Detection for Light Aircraft with Unmanned Aircraft Systems. IEEE Access 2024, 12, 83876–83886.
vom Schemm, R. Aircraft Dent Detection Utilizing Specular Reflections and Deep Learning. Master’s Thesis, DLR Electronic Library, Cologne, Germany, 2025.
Shukla, V.; Shukla, A.; Surya Prakash, S.K.; Shukla, S. A systematic survey: Role of deep learning-based image anomaly detection in industrial inspection contexts. Front. Robot. AI 2025, 12, 1554196.
Wang, Y.; Yao, Y.; Zheng, H.; Han, Y. ESS-DETR: A Lightweight and High-Accuracy UAV-Deployable Model for Surface Defect Detection. Drones 2026, 10, 43.
Guo, C.; Pleiss, G.; Sun, Y.; Weinberger, K.Q. On calibration of modern neural networks. In Proceedings of the 34th International Conference on Machine Learning (ICML), Sydney, Australia, 6–11 August 2017; pp. 1321–1330.
Li, Y.; Xie, Y.; He, H. A Lightweight Deep Learning Network with an Optimized Attention Module for Aluminum Surface Defect Detection. Sensors 2024, 24, 7691.
Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11531–11539.
Yang, L.; Zhang, R.Y.; Li, L.; Xie, X. SimAM: A simple, parameter-free attention module for convolutional neural networks. In Proceedings of the 38th International Conference on Machine Learning (ICML), Virtual, 18–24 July 2021; pp. 11863–11874.
Niu, Z.; Zhong, G.; Yu, H. A review of attention mechanism in deep learning. Neurocomputing 2021, 452, 48–62.
Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. In Proceedings of the International Conference on Learning Representations (ICLR), Vienna, Austria, 3–7 May 2021.
Cherrat, E.A.; Kerenidis, I.; Mathur, N.; Landman, J.; Strahm, M.; Li, Y.Y. Quantum Vision Transformers. Quantum 2024, 8, 1265.
Havlíček, V.; Córcoles, A.D.; Temme, K.; Harrow, A.W.; Kandala, A.; Chow, J.M.; Gambetta, J.M. Supervised learning with quantum-enhanced feature spaces. Nature 2019, 567, 209–212.
Islam, M.M.; He, J.S. Quantum Machine Learning for Computer Vision: A Survey. In Proceedings of the 2024 International Conference on Machine Learning and Applications (ICMLA), Miami, FL, USA, 18–20 December 2024; pp. 1827–1832.
Rizvi, S.M.A.; Paracha, U.I.; Khalid, U.; Lee, K.; Shin, H. Quantum Machine Learning: Towards Hybrid Quantum-Classical Vision Models. Mathematics 2025, 13, 2645.
Li, G.; Zhao, X.; Wang, X. Quantum self-attention neural networks for text classification. Sci. China Inf. Sci. 2024, 67, 142501.
Pandey, P.; Mandal, S. A hybrid quantum-classical convolutional neural network with a quantum attention mechanism for skin cancer. Sci. Rep. 2026, 16, 1639.
Pesah, A.; Cerezo, M.; Wang, S.; Volkoff, T.; Sornborger, A.T.; Coles, P.J. Absence of barren plateaus in quantum convolutional neural networks. Phys. Rev. X 2021, 11, 041011.
Mukhanbet, A.; Daribayev, B. A Hybrid Quantum-Classical Architecture with Data Re-Uploading and Genetic Algorithm Optimization for Enhanced Image Classification. Computation 2025, 13, 185.

Figure 1. Overall framework of HQCA-Net.

Figure 2. Architecture of the Residual Quantum Channel Attention mechanism and the quantum circuit diagram.

Figure 3. Hybrid quantum-classical joint optimization and gradient backpropagation framework.

Figure 4. Examples of the six defect categories in the aircraft skin dataset.

Figure 5. Evolution curves of HQCA-Net, Classical ResNet-18, and SE-ResNet-18 on the full dataset: (a) Evolution curve of validation F1-score; (b) Convergence curve of training loss.

Figure 6. Evolution curves of ECA and SimAM on the full dataset: (a) Evolution curve of validation F1-score; (b) Convergence curve of training loss.

Figure 7. Comparison of classification confusion matrices between ResNet18-SE and HQCA-Net: (a) Confusion matrix of the classical ResNet18-SE; (b) Confusion matrix of the proposed HQCA-Net.

Figure 8. Detection Error Tradeoff (DET) curve.

Figure 9. Comparison of confidence reliability diagrams.

Figure 10. Convergence curves of the fair bottleneck ablation experiments: (a) Global convergence curve; (b) Magnified details from epoch 20 to 50.

Figure 11. Visual interpretability comparison using Grad-CAM.

Figure 12. Robustness comparison under different Gaussian noise levels.

Figure 13. Output results in the few-shot scenario: (a) Convergence curve of validation accuracy; (b) Convergence curve of validation loss; (c) Convergence curve of training loss.

Figure 14. Evolution curve of the gradient variance of quantum parameters.

Figure 15. Evolution curves of the ablation study on the number of quantum entangling layers (L = 1 to L = 4): (a) Evolution curve of validation F1-score; (b) Evolution curve of training loss.

Table 1. Key implementation details of HQCA-Net.

Item	Setting
Backbone and input	ResNet-18 with 224 × 224 input images
Framework	PyTorch and PennyLane
Quantum simulation	State-vector simulator
VQC configuration	10 qubits and one strongly entangling layer
Quantum operations	Ry angle embedding, CNOT entanglement, and Pauli-Z expectation measurement
Quantum parameters	30 trainable quantum-gate parameters
Optimization	Adam optimizer, initial learning rate of 0.0001, batch size of 16, and 50 training epochs
Training strategy	Focal Loss, cosine annealing learning-rate scheduling, and parameter-shift gradient calculation
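
Two entries in Table 1, parameter-shift gradient calculation and cosine annealing of the learning rate, can be illustrated with a minimal sketch. It is not the authors' implementation: the single-qubit RY circuit is a toy stand-in for the 10-qubit VQC, the function names are hypothetical, and the minimum learning rate `eta_min = 0` is an assumption, since Table 1 only states the initial rate (0.0001) and the number of epochs (50).

```python
import math


def expval_z_after_ry(theta: float) -> float:
    """Pauli-Z expectation of the state RY(theta)|0>, which equals cos(theta)."""
    return math.cos(theta)


def parameter_shift_grad(f, theta: float) -> float:
    """Two-term parameter-shift rule with shifts of +/- pi/2.

    Exact for gates of the form exp(-i * theta * P / 2) with P a Pauli
    operator, which covers the RY rotation used here.
    """
    shift = math.pi / 2
    return (f(theta + shift) - f(theta - shift)) / 2.0


def cosine_annealing_lr(epoch: int, base_lr: float = 1e-4,
                        total_epochs: int = 50, eta_min: float = 0.0) -> float:
    """Cosine-annealed learning rate; eta_min = 0 is an assumption."""
    cosine = math.cos(math.pi * epoch / total_epochs)
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + cosine)


theta = 0.3
grad = parameter_shift_grad(expval_z_after_ry, theta)
print("parameter-shift gradient:", grad)
print("analytic gradient       :", -math.sin(theta))
print("learning rate at epochs 0, 25, 50:",
      [cosine_annealing_lr(e) for e in (0, 25, 50)])
```

The parameter-shift estimate agrees with the analytic derivative of cos(theta) because the rule is exact for this gate family rather than a finite-difference approximation.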

Table 2. Comparison of comprehensive performance metrics among different attention mechanisms on the test set.

Model Architecture (Backbone: ResNet-18)	Attention Type	Accuracy (Acc.)	Macro F1-Score	False Positive Rate (FPR)
Classical ResNet-18	None	95.41%	95.47%	1.06%
ResNet-18 + SE	SE	96.67%	96.56%	0.77%
ResNet-18 + ECA	ECA	97.03%	96.82%	0.66%
ResNet-18 + SimAM	SimAM	97.39%	97.19%	0.61%
HQCA-Net	RQCA	97.93%	97.79%	0.49%
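
The three metrics reported in Table 2 can all be derived from a multi-class confusion matrix. The sketch below shows one way to do so. The 3 × 3 matrix is made-up toy data, not the paper's results, and the function names are hypothetical. This section does not state how the global false positive rate is averaged, so both a macro average and a pooled (micro) variant are computed as assumptions.

```python
def per_class_counts(cm):
    """Return (TP, FP, FN, TN) per class; cm[i][j] = true class i, predicted j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    counts = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(cm[k]) - tp                       # true k, predicted otherwise
        tn = total - tp - fp - fn
        counts.append((tp, fp, fn, tn))
    return counts


def summarize(cm):
    """Accuracy, macro F1, and two false-positive-rate variants."""
    counts = per_class_counts(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(tp for tp, _, _, _ in counts) / total
    f1s = [2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
           for tp, fp, fn, _ in counts]
    fprs = [fp / (fp + tn) if (fp + tn) else 0.0 for _, fp, _, tn in counts]
    pooled_fp = sum(fp for _, fp, _, _ in counts)
    pooled_neg = sum(fp + tn for _, fp, _, tn in counts)
    return {
        "accuracy": accuracy,
        "macro_f1": sum(f1s) / len(f1s),
        "macro_fpr": sum(fprs) / len(fprs),   # mean of one-vs-rest FPRs
        "micro_fpr": pooled_fp / pooled_neg,  # FPs pooled over all classes
    }


# Toy confusion matrix (illustrative only).
toy_cm = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 1, 9],
]
metrics = summarize(toy_cm)
print(metrics)
```

With balanced classes, as in the toy matrix, the macro and pooled false positive rates coincide; they diverge when class sizes differ.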

Table 3. Performance comparison of different feature dimensionality reduction mechanisms under a strict 10-dimensional bottleneck.

Feature Remodeling Mechanism (10D Bottleneck)	Additional Parameter Scale (Params)	Final Validation Accuracy (Acc.)
Classical SE	Very low (~10 K)	95.41%
Deep MLP	Higher (~50 K)	95.50%
RBF Kernel	Medium (~25 K)	95.86%
Polynomial Ext.	Very high (>150 K)	96.40%
HQCA-Net (Ours)	Very low (~10 K + 30 quantum)	96.31%

Table 4. Comprehensive performance evaluation and comparison of various models under extreme few-shot conditions.

Model Architecture	Attention Type	Accuracy (Acc.)	Macro F1-Score	False Positive Rate (FPR)
SE-ResNet	SE	83.00%	79.78%	4.00%
HQCA-Net	RQCA	82.01%	78.23%	4.27%
SimAM-ResNet	SimAM	80.31%	77.99%	4.60%
ECA-ResNet	ECA	80.22%	78.70%	4.81%

Table 5. Performance ablation comparison of different numbers of entangling layers L.

Number of Entangling Layers (L)	VQC Parameter Count (Params)	Validation Accuracy (Val Acc.)	Generalization Performance and Phenomenon Description
1 (Ours)	30	97.93%	Pareto optimal; extremely lightweight with optimal feature recalibration.
2	60	96.94%	Over-parameterization tendency emerges; slight drop in generalization accuracy.
3	90	95.77%	Overfitting worsens; non-convexity of the optimization landscape increases significantly.
4	120	96.94%	Oscillations in late training stages; susceptible to interference from high-frequency background noise.
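
The parameter counts in Table 5 grow linearly with circuit depth. A short check, assuming each entangling layer applies one three-angle rotation to every qubit (the weight layout used by PennyLane's `StronglyEntanglingLayers` template, inferred here from the 30-parameter, 10-qubit, one-layer configuration rather than stated in this section), reproduces the listed values. The function name is hypothetical.

```python
def vqc_param_count(n_qubits: int, n_layers: int, angles_per_qubit: int = 3) -> int:
    """Trainable rotation angles in a layered circuit.

    Assumes every layer applies one general single-qubit rotation
    (three angles) to each qubit; entangling CNOTs carry no parameters.
    """
    return n_layers * n_qubits * angles_per_qubit


counts = {depth: vqc_param_count(n_qubits=10, n_layers=depth) for depth in range(1, 5)}
print(counts)  # {1: 30, 2: 60, 3: 90, 4: 120}
```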

Table 6. Model-level computational complexity comparison of different attention-based architectures.

Model Architecture	Total Parameters (Params/M)	Multiply-Accumulate Operations (MACs/G)	Module Parameter Increment Relative to Baseline
Classical ResNet-18 (Baseline)	11.179	1.823	0
SE-ResNet18	11.212	1.823	+32.76 K
ECA-ResNet18	11.179	1.823	Negligible
SimAM-ResNet18	11.179	1.823	0 (Parameter-free)
HQCA-Net (Ours)	11.190	1.823	+10.76 K
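
The parameter increments in Table 6 can be sanity-checked with simple arithmetic. The sketch below rests on assumptions not confirmed in this section: the attention module sits on a 512-channel feature map, the SE block uses a reduction ratio of 16 without biases, and the RQCA bottleneck uses two fully connected layers (512 → 10 → 512) with biases. Under these assumptions the counts come close to the reported +32.76 K and +10.76 K. The function names are hypothetical.

```python
def linear_params(in_features: int, out_features: int, bias: bool) -> int:
    """Weights (plus optional biases) of one fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)


def bottleneck_params(channels: int, hidden: int, bias: bool) -> int:
    """Squeeze (channels -> hidden) followed by excite (hidden -> channels)."""
    return (linear_params(channels, hidden, bias)
            + linear_params(hidden, channels, bias))


# Assumed configurations (not taken from the paper).
se_increment = bottleneck_params(channels=512, hidden=512 // 16, bias=False)
rqca_classical = bottleneck_params(channels=512, hidden=10, bias=True)
rqca_with_quantum = rqca_classical + 30  # 30 quantum-gate parameters from Table 1

print("SE increment        :", se_increment)       # 32768
print("RQCA classical part :", rqca_classical)     # 10762
print("RQCA incl. quantum  :", rqca_with_quantum)  # 10792
```

Under these assumptions the classical part of the RQCA bottleneck alone already accounts for about 10.76 K parameters, which would suggest that the 30 quantum-gate parameters are a negligible share of the module's footprint.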


Share and Cite

MDPI and ACS Style

Jiang, S.; Peng, H.; Zhang, D.; Zhu, Y. A Simulation-Based Hybrid Quantum-Classical Channel Attention Network for Reliable Aircraft Skin Defect Recognition. Technologies 2026, 14, 361. https://doi.org/10.3390/technologies14060361

