4.1. Trade-Off Between Baseline Performance and Accuracy-Reliability
In this section, an extensive evaluation of the classification performance is conducted using standard test sets to compare the proposed HQCA-Net against the classic ResNet-18 baseline and models integrated with mainstream lightweight attention modules. When confronting the intense background noise characteristic of complex aircraft skin images, the feature-fitting capabilities of classical architectures often encounter representational bottlenecks as they approach their performance limits. In contrast, HQCA-Net demonstrates improved feature remodeling capabilities that surpass those of the classical control groups.
The dynamic convergence characteristics of a model throughout the training cycle serve as a pivotal dimension for validating its structural advantage. To mitigate visual clutter caused by the overlapping of multiple curves, the evolutionary comparison under the full dataset is categorized into two groups.
Figure 5 illustrates the evolution of the validation set F1 scores and training loss convergence curves for the core comparison group (HQCA-Net, classic baseline, and the SE module), while Figure 6 presents the corresponding indicators for the mainstream lightweight comparison group (ECA and SimAM modules).
As observed from the training loss convergence curves in Figure 5b and Figure 6b, all models achieve a rapid reduction in loss during the initial training phase. In the late optimization stage, however, HQCA-Net follows a smoother and more compact trajectory than the other lightweight architectures and converges to the lowest final training loss among the compared models. This suggests that integrating the simulated quantum module does not disrupt the backpropagation gradient flow of the classical ResNet; rather, it appears to improve the model’s optimizability and robustness within complex parameter spaces.
A further horizontal comparison of the validation set F1 score curves in Figure 5a and Figure 6a reveals that, due to a propensity for overfitting complex background noise, classical models (including ECA and SimAM, which emphasize local features) tend to fall into local optima when approaching their respective performance ceilings. This manifests as prolonged oscillations or even performance degradation. HQCA-Net also exhibits minor fluctuations in the later stages, but these differ in character from the bottleneck oscillations of the classical models: whereas the classical models oscillate repeatedly beneath a lower performance ceiling, HQCA-Net maintains an upward trend, is the first to surpass that ceiling, and repeatedly reaches higher performance intervals during its exploratory fluctuations.
This comprehensive analysis across charts demonstrates that the simulated VQC-based attention module provides effective channel-wise feature recalibration under the tested aircraft skin defect recognition setting. The structured nonlinear mapping introduced by the simulated VQC, implemented via parameterized rotation gates and CNOT-based coupling, contributes to more stable optimization behavior. This compact and constrained feature interaction mechanism helps HQCA-Net achieve improved empirical performance compared with traditional classical architectures on complex aircraft skin defect samples, without implying quantum superiority or formal quantum computational advantage.
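As a rough illustration of the type of circuit described above, the following sketch simulates a small single-layer variational circuit in plain Python: each qubit receives a rotation that encodes one input feature, then a trainable rotation, then a ring of CNOT gates couples neighbouring qubits, and the Pauli-Z expectation of every qubit is read out as a bounded nonlinear response. The restriction to RY gates, the angle encoding, the ring ordering, and the function names (`apply_ry`, `apply_cnot`, `expect_z`, `run_vqc`) are assumptions made for illustration only; the circuit actually used in RQCA may differ in gate set and layout.

```python
import math

def apply_ry(state, qubit, theta, n):
    """Apply RY(theta) to `qubit` of an n-qubit statevector (real amplitudes)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    mask = 1 << qubit
    new = state[:]
    for i in range(1 << n):
        if not i & mask:              # i has the qubit in |0>, j in |1>
            j = i | mask
            a0, a1 = state[i], state[j]
            new[i] = c * a0 - s * a1
            new[j] = s * a0 + c * a1
    return new

def apply_cnot(state, control, target, n):
    """Swap amplitudes of basis states that differ only in `target` when `control` is 1."""
    cm, tm = 1 << control, 1 << target
    new = state[:]
    for i in range(1 << n):
        if i & cm and not i & tm:
            j = i | tm
            new[i], new[j] = state[j], state[i]
    return new

def expect_z(state, qubit):
    """Pauli-Z expectation value of one qubit, always in [-1, 1]."""
    mask = 1 << qubit
    return sum((-1 if i & mask else 1) * a * a for i, a in enumerate(state))

def run_vqc(features, weights):
    """Single-layer circuit: encode -> trainable rotations -> CNOT ring -> <Z>."""
    n = len(features)
    state = [0.0] * (1 << n)
    state[0] = 1.0                                        # start in |0...0>
    for q in range(n):
        state = apply_ry(state, q, features[q], n)        # angle encoding (assumed)
    for q in range(n):
        state = apply_ry(state, q, weights[q], n)         # trainable rotation
    if n > 1:
        for q in range(n):
            state = apply_cnot(state, q, (q + 1) % n, n)  # ring coupling (assumed)
    return [expect_z(state, q) for q in range(n)]

# Hypothetical 3-dimensional bottleneck vector and trainable angles.
responses = run_vqc([0.2, 1.1, -0.4], [0.05, -0.3, 0.7])
```

Because every output is an expectation value in [-1, 1] and depends on the inputs through products of sines and cosines, even this small circuit produces a bounded, coupled nonlinear mapping of the bottleneck features.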
Comparing the above metrics, HQCA-Net exhibits improved overall evaluation performance and offers a more robust answer to the problem of balancing high precision against low false alarms in aircraft skin visual inspection. When approaching their performance limits, classical models often rely heavily on aggressively fitting high-frequency background noise, and in pursuit of maximum quantitative metrics their attention mechanisms frequently become overly sensitive in feature activation. In practical inspection scenarios involving sudden lighting changes or minor airborne perspective disturbances, this aggressive fitting not only translates readily into high false-alarm costs but also restricts further gains in overall accuracy. In contrast, benefiting from the compact structured nonlinear recalibration introduced by the simulated VQC-based RQCA module, HQCA-Net avoids such aggressive strategies, achieving the highest overall accuracy and the lowest false alarm rate among the compared models. To analyze at a fine-grained level how HQCA-Net suppresses false alarms for field skin defects, the classification confusion matrices of the classical SE-ResNet and HQCA-Net on the test set are plotted in Figure 7.
An analysis of the off-diagonal error distribution in the confusion matrices shows that SE-ResNet exhibits significant inter-class confusion between categories whose visual features are highly susceptible to light and shadow interference. Under real-world aviation maintenance standards, cracks are classified as high-risk structural damage that often directly triggers grounding and maintenance orders, whereas minor dents and corrosion have certain monitoring and release tolerances. As shown in Figure 7a, the classical model misclassifies 6 dent samples and 4 corrosion samples as cracks with high confidence; more seriously, it misses 12 high-risk crack samples by misjudging them as dents. Such cross-class misjudgments, triggered by highly similar local physical features such as edge sharpness and shadow variations, are a primary source of high false-alarm costs in the field and also pose serious safety hazards. This points to the representational limitations of classical convolutional networks when processing three-dimensional deformation features.
In contrast, HQCA-Net with the embedded RQCA module exhibits a more precise decision boundary. As shown in Figure 7b, benefiting from the structured nonlinear channel recalibration of the simulated VQC-based RQCA module, HQCA-Net avoids the oversensitive activation strategies of classical networks. It reduces the number of false-alarm samples, in which dents or corrosion are misjudged as cracks, to 3 cases each (from 6 and 4, respectively). More importantly, it reduces the number of dangerous missed detections, in which high-risk cracks are misjudged as dents, from 12 to 9. The model also improves absolute recognition accuracy: the overall number of correctly identified samples increases from 399 to 405, and correct identifications for the paint-off category rise from 115 to 117. These quantitative results support the conclusion that HQCA-Net achieves an improved trade-off between high precision and low false alarms in aircraft skin visual inspection tasks. While attaining the highest overall classification accuracy, it aligns with the two stringent criteria of the task, preventing both missed detections and false alarms, and thereby provides a foundation for trustworthy intelligent diagnostic systems for aircraft skin defects.
Furthermore, to assess the competitiveness of HQCA-Net in lightweight industrial inspection scenarios, this paper introduces mainstream lightweight classical channel attention mechanisms, such as Efficient Channel Attention (ECA) and the parameter-free 3D attention SimAM, for an expanded horizontal comparison. Although modern Transformer-based attention mechanisms possess stronger global modeling capabilities, their quadratic computational complexity and massive parameter scale make them difficult to adapt to the stringent real-time and low-parameter constraints of field aircraft inspection. Therefore, the baseline selection in this paper focuses on lightweight architectures suited to resource-constrained industrial inspection. The key performance indicators for each model on the full test set are presented in Table 2.
Analysis of the data in Table 2 indicates that after introducing classical lightweight attention modules, including SE, ECA, and SimAM, the network’s ability to aggregate local cross-channel features is enhanced, improving the classification accuracy from 95.41% for the pure baseline to 97.39%, while the false positive rate also shows a decreasing trend. However, traditional convolutional attention mechanisms may still face feature representation bottlenecks under complex field noise interference. In comparison, the proposed HQCA-Net benefits from the compact structured nonlinear recalibration introduced by the RQCA module. It achieves the highest classification accuracy of 97.93% among the evaluated models and reduces the global macro FPR to 0.49%, representing a relative reduction of nearly 50% compared with the pure baseline. These results indicate that the proposed simulation-based quantum-classical attention module achieves an improved empirical trade-off between classification accuracy and false-alarm suppression. Rather than increasing the fitting strength to local high-frequency noise, the RQCA module uses a compact structured nonlinear mapping to improve channel recalibration without substantially increasing the parameter scale.
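For readers who wish to reproduce this type of metric, the sketch below shows one common way to derive overall accuracy, per-class false positive and false negative rates, and a macro-averaged FPR from a multi-class confusion matrix. It assumes that the macro FPR is the unweighted mean of the one-vs-rest FPR of each class, which is a conventional definition but is not stated explicitly here; the 3-class matrix is a toy example and does not reproduce the matrices in Figure 7.

```python
def per_class_rates(cm):
    """cm[i][j] = number of samples of true class i predicted as class j.
    Returns a list of (FPR, FNR) pairs, one per class (one-vs-rest)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    rates = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                          # class k predicted as something else
        fp = sum(cm[i][k] for i in range(n)) - tp     # other classes predicted as k
        tn = total - tp - fn - fp
        fpr = fp / (fp + tn) if fp + tn else 0.0
        fnr = fn / (fn + tp) if fn + tp else 0.0
        rates.append((fpr, fnr))
    return rates

def macro_fpr(cm):
    """Unweighted mean of the per-class one-vs-rest FPR (assumed definition)."""
    rates = per_class_rates(cm)
    return sum(fpr for fpr, _ in rates) / len(rates)

def accuracy(cm):
    """Fraction of samples on the diagonal."""
    total = sum(sum(row) for row in cm)
    return sum(cm[k][k] for k in range(len(cm))) / total

# Toy 3-class confusion matrix (hypothetical counts).
toy_cm = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]
toy_accuracy = accuracy(toy_cm)
toy_macro_fpr = macro_fpr(toy_cm)
```

Under this definition, a single off-diagonal cell contributes to the FNR of its true class and to the FPR of its predicted class, which is why reducing dent-to-crack and corrosion-to-crack errors lowers the crack FPR directly.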
The macroscopic metrics in Table 2 show that HQCA-Net leads across the entire category distribution. To further quantify the engineering and safety benefits of this advantage in the most stringent real-world aviation inspection scenarios, this paper isolates the high-risk defect “Crack,” which directly threatens flight safety, for a detailed safety performance analysis. In aviation maintenance practice, a reasonable level of false alarms is tolerated as the price of safety, whereas missed detections are unacceptable hazards. Therefore, under an equivalent, reasonable false alarm cost (FPR = 1.54%), the detection error trade-off (DET) performance for faint cracks is compared between the most representative classical baseline, ResNet18-SE, and HQCA-Net. The comparison on a logarithmic coordinate system is shown in Figure 8.
As shown in Figure 8, plotted on a logarithmic scale, the solid blue DET curve of HQCA-Net lies below the dashed red curve of the classical SE-ResNet model throughout the core operating region representing high-reliability requirements. This indicates that, under the stringent constraint of an ultra-low false alarm threshold, HQCA-Net maintains a lower missed detection level than the classical model.
The detailed statistics further show that near $10^{-2}$ on the X-axis, the ultra-low false-positive region of greatest concern to field operations, HQCA-Net eases the usual trade-off in visual inspection whereby preventing missed detections increases false alarms. At an equivalent false-alarm cost of FPR ≈ 1.0%, HQCA-Net reduces the FNR of high-risk cracks from approximately 8.0% for the classical model to below 4.0%, a relative reduction of about 50%. This reduction, together with its 97.93% overall classification accuracy, indicates a clear improvement over the classical baseline model.
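The operating-point comparison above can be sketched as follows: for a one-vs-rest crack detector, the decision threshold is lowered as far as possible while the false positive rate stays within a budget, and the false negative rate at that threshold is reported. The function names and the toy scores are hypothetical; only the final arithmetic uses figures quoted in the text (an FNR of roughly 8.0% falling to roughly 4.0%).

```python
def fnr_at_fpr(scores, labels, max_fpr):
    """scores: predicted crack probability; labels: 1 = crack, 0 = any other class.
    Returns (threshold, fpr, fnr) for the lowest threshold whose FPR <= max_fpr,
    or None if even the strictest threshold exceeds the budget."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best = None
    # FPR can only grow as the threshold is lowered, so scan from high to low.
    for t in sorted(set(scores), reverse=True):
        fpr = sum(s >= t for s in neg) / len(neg)
        fnr = sum(s < t for s in pos) / len(pos)
        if fpr > max_fpr:
            break
        best = (t, fpr, fnr)
    return best

def relative_reduction(old, new):
    """Fractional reduction of `new` relative to `old`."""
    return (old - new) / old

# Toy scores (hypothetical): three cracks and five non-crack samples.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0]
strict = fnr_at_fpr(toy_scores, toy_labels, max_fpr=0.0)
relaxed = fnr_at_fpr(toy_scores, toy_labels, max_fpr=0.2)

# Figures quoted in the text: FNR of about 8.0% reduced to about 4.0%.
quoted_reduction = relative_reduction(8.0, 4.0)
```

Evaluating both models at the same FPR budget, rather than at each model's default threshold, is what makes the FNR figures directly comparable.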
This multi-dimensional quantitative comparison confirms that the simulation-based quantum-classical attention module provides effective channel-wise feature recalibration at the macroscopic decision-making level. As classical convolutional models approach their performance limits, they are often constrained by limited local receptive fields, making it challenging to balance the suppression of high-frequency background noise with the preservation of faint defect features. In contrast, the HQCA-Net, using the structured nonlinear mapping of the single-layer simulated VQC, achieves improved empirical performance in both classification accuracy and false-positive suppression, helping establish more stable decision boundaries. This compact and structured feature interaction mechanism reduces the likelihood of false alarms while maintaining sensitivity to defect-relevant channels, thereby providing reliable support for automated diagnostic systems without implying quantum superiority or formal quantum computational advantage.
4.2. Decision Reliability and Calibration Analysis
In aircraft skin defect recognition, high-risk defects such as cracks require deep learning models to provide not only accurate classification results but also reliable confidence estimates. Under complex surface conditions, including reflections, weak defect boundaries, and background texture interference, a model with high overall accuracy may still produce unstable or overconfident predictions for ambiguous samples. To evaluate the decision reliability of the compared models, this paper introduces reliability diagrams to analyze the relationship between predicted confidence and empirical accuracy. The comparative results are shown in Figure 9.
As illustrated in the multi-model reliability diagrams in Figure 9, the networks differ markedly in their confidence behavior. Although the classical ResNet-18 baseline and some of its variants hold a slight numerical advantage in the global Expected Calibration Error (ECE), their calibration curves (especially those of the SimAM and SE variants) exhibit severe local fluctuations and distortions within the critical decision interval of 0.5 to 0.8. This implies that, when confronting ambiguous borderline samples, the output probabilities of the classical architectures are unstable and prone to local calibration failure. In contrast, HQCA-Net, which also achieves the highest overall classification accuracy of 97.93%, maintains a smooth and monotonically increasing empirical accuracy across the entire interval, closely tracing the ideal calibration diagonal. These results indicate that HQCA-Net suppresses the local overconfidence or extreme underconfidence common in high-accuracy networks and outputs consistent, well-calibrated diagnostic results across the probability spectrum.
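The quantities behind a reliability diagram can be computed as in the following sketch, which bins predictions by confidence, compares the mean confidence of each bin with its empirical accuracy, and aggregates the weighted gaps into the ECE. The use of 10 equal-width bins and the function name are assumptions; the binning actually used for Figure 9 is not specified here.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """confidences: top-class probability per sample; correct: 1 if the prediction
    was right, else 0. Returns (ece, bins) where each entry of `bins` is
    (mean confidence, empirical accuracy, sample count) for a non-empty bin."""
    buckets = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)   # confidence 1.0 goes to the last bin
        buckets[idx].append((conf, ok))
    n = len(confidences)
    ece, bins = 0.0, []
    for bucket in buckets:
        if not bucket:
            continue
        mean_conf = sum(c for c, _ in bucket) / len(bucket)
        acc = sum(ok for _, ok in bucket) / len(bucket)
        ece += len(bucket) / n * abs(acc - mean_conf)   # weighted calibration gap
        bins.append((mean_conf, acc, len(bucket)))
    return ece, bins

# Toy predictions (hypothetical): two confident hits, two borderline cases.
toy_ece, toy_bins = expected_calibration_error([0.95, 0.95, 0.55, 0.55], [1, 1, 1, 0])
```

Because the ECE is a sample-weighted average, a model can obtain a low global value while individual bins, such as those between 0.5 and 0.8, still deviate strongly from the diagonal; inspecting the per-bin gaps exposes exactly the local behavior discussed above.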
This smooth and stable calibration performance further supports the practical competitiveness of the proposed simulation-based hybrid architecture in industrial inspection scenarios. In traditional classical deep learning, smoothing local probability fluctuations and performing uncertainty calibration often require post-processing techniques such as Monte Carlo Dropout or Deep Ensembles. However, these techniques usually require multiple repeated forward inferences for the same input or the parallel execution of multiple large baseline networks, leading to a substantial increase in computational overhead and inference latency. This makes them less suitable for the stringent real-time requirements of field aircraft inspections. In contrast, HQCA-Net alleviates this computational burden by using the compact structured nonlinear recalibration mechanism within RQCA. The proposed module imposes a lightweight constraint on the channel feature space, allowing HQCA-Net to achieve a favorable balance between classification accuracy and decision stability with only a single forward pass, while avoiding the additional inference cost required by uncertainty post-processing methods such as Monte Carlo Dropout or Deep Ensembles.
4.3. Fair Bottleneck-Based Ablation Study and Parameter-Efficient Nonlinear Recalibration
In the study of hybrid quantum-classical architectures, an important issue is whether the observed performance improvement comes from the compact structured nonlinear recalibration capability of the simulated VQC-based module or simply from an increased number of trainable parameters. To avoid conflating parameter efficiency with quantum advantage, this study designs a fair bottleneck ablation experiment. The purpose of this experiment is not to prove formal quantum advantage, but to evaluate whether the proposed RQCA module can provide competitive nonlinear feature recalibration under the same 10-dimensional bottleneck with a substantially smaller parameter scale. Specifically, HQCA-Net is compared with four classical feature dimensionality reduction modules under an identical 10-dimensional information bottleneck, including Classical SE, Deep MLP, high-order polynomial expansion, and RBF kernel mechanisms. The validation accuracy evolution curves and quantitative results are shown in Figure 10 and Table 3.
Combining the quantitative comparison data in Table 3 with the long-term convergence trajectories in Figure 10, it can be observed that under the stringent condition where feature channels are compressed to a 10-dimensional bottleneck, different nonlinear feature recalibration mechanisms exhibit different representational characteristics and robustness. The classical linear SE module encounters an expression bottleneck in the ultra-low-dimensional space. It experiences relatively large performance fluctuations during training and ranks last with a final validation accuracy of approximately 95.41%. Meanwhile, the Deep MLP and RBF mechanisms introduce stronger nonlinear fitting ability, but they still exhibit longer recovery periods after learning-rate decay perturbations at the 10th and 20th epochs. These results suggest that increasing nonlinear fitting strength alone does not necessarily guarantee stable feature recalibration under an extremely narrow bottleneck.
In contrast, HQCA-Net demonstrates competitive recovery behavior after performance oscillations and presents a smoother convergence trend in the later training stage. Its final validation accuracy stabilizes at 96.31%, outperforming the classical SE, Deep MLP, and RBF variants, and reaching a performance level close to that of the high-order polynomial expansion. However, the polynomial expansion achieves this slightly higher accuracy at the cost of a very large parameter scale of more than 150 K additional parameters, whereas HQCA-Net uses only approximately 10 K classical projection parameters and 30 trainable quantum-gate parameters in the simulated VQC module.
These results indicate that the proposed RQCA module provides competitive feature recalibration capability under a compact 10-dimensional bottleneck. The comparison should be interpreted as evidence of parameter efficiency rather than as proof of formal quantum advantage. Although HQCA-Net reaches a performance level comparable to the high-order polynomial expansion with far fewer trainable parameters, this does not prove that classical models cannot approximate the same mapping with sufficient parameter scaling. Instead, it shows that the simulated VQC can serve as a highly compact container for nonlinear channel interactions under the tested bottleneck constraint. Therefore, the main claim supported by this ablation study is parameter-efficient nonlinear feature recalibration at the model level, rather than quantum computational superiority or hardware-level quantum acceleration.
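The parameter budget discussed in this subsection can be checked with simple arithmetic. The sketch below assumes that the classical projection consists of two fully connected layers with biases (channels to bottleneck and back), that the attended feature map has 512 channels as in the final stage of ResNet-18, and that each qubit carries three trainable rotation angles per layer. These layer shapes are assumptions chosen because they are consistent with the quoted figures of approximately 10 K classical and 30 quantum-gate parameters; they are not confirmed details of the RQCA implementation.

```python
def linear_params(n_in, n_out, bias=True):
    """Trainable parameters of one fully connected layer."""
    return n_in * n_out + (n_out if bias else 0)

def rqca_param_budget(channels=512, bottleneck=10, layers=1, angles_per_qubit=3):
    """Returns (classical projection parameters, quantum-gate parameters).
    All default values are assumptions used for illustration."""
    classical = linear_params(channels, bottleneck) + linear_params(bottleneck, channels)
    quantum = layers * bottleneck * angles_per_qubit   # one qubit per bottleneck feature
    return classical, quantum

classical_count, quantum_count = rqca_param_budget()
```

Under these assumptions the trainable parameters of the recalibration path sit almost entirely in the classical projections, while the simulated circuit itself contributes only a few dozen angles, which is the sense in which the module is described as parameter-efficient.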
4.4. Visual Interpretability Analysis Based on Grad-CAM
To further examine whether the proposed constrained channel-recalibration mechanism produces more defect-focused responses, this paper uses Grad-CAM to qualitatively compare the deep feature activation patterns across six typical aircraft skin defects. It should be emphasized that the Grad-CAM results are used as qualitative evidence consistent with the theoretical interpretation in Section 3.3, rather than as standalone proof of feature separation or quantum superiority. The visual comparison results are shown in Figure 11.
The visualizations in Figure 11 show that classical attention mechanisms tend to produce more dispersed and less defect-focused activation patterns in the deep feature space. On real aircraft skin images, the highly activated regions (shown in the red-to-yellow range) of classical models such as SE-ResNet18 are disordered and diffuse. A large share of the attention weight is allocated to defect-free skin areas, complex light-and-shadow gradients, or irrelevant mechanical edges, as observed in the Corrosion and Scratch samples in Figure 11.
In contrast, the attention heatmaps of HQCA-Net demonstrate more concentrated defect-related responses and rigorous spatial constraint capabilities. HQCA-Net exhibits improved spatial localization accuracy. It not only precisely anchors the deep-red, high-response activation core regions directly onto the defect itself but also achieves an efficient reconstruction of the complex geometric topologies of the defects. For instance, when dealing with the complex Paint-off defect, the attention response of HQCA-Net is more concentrated around the crescent-shaped edge region of the defect. When capturing faint Scratches, the highly responsive regions are more consistent with the physical diagonal trajectory of the defect, reducing the influence of surrounding background noise.
These Grad-CAM observations are consistent with the theoretical interpretation in Section 3.3. They suggest that the constrained nonlinear channel recalibration introduced by RQCA can reduce attention dispersion caused by background reflections and irrelevant mechanical edges, while enhancing responses around defect-related regions. However, these visualization results should be interpreted as qualitative support for the proposed feature recalibration mechanism, rather than as direct proof of quantum advantage or formal feature disentanglement.
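For reference, the core Grad-CAM computation behind such heatmaps can be sketched in a few lines: each feature map is weighted by the spatial mean of the class-score gradient with respect to that map, the weighted maps are summed, negative values are clipped, and the result is rescaled to [0, 1]. The sketch operates on plain nested lists with hypothetical toy values; in practice the activations and gradients would be captured from the last convolutional stage of the network, and the choice of that layer is an assumption here.

```python
def grad_cam(activations, gradients):
    """activations, gradients: lists of K feature maps, each an H x W nested list.
    gradients[k][i][j] is the derivative of the class score w.r.t. activations[k][i][j].
    Returns an H x W heatmap scaled to [0, 1]."""
    h, w = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for a_k, g_k in zip(activations, gradients):
        alpha = sum(sum(row) for row in g_k) / (h * w)   # channel weight: mean gradient
        for i in range(h):
            for j in range(w):
                cam[i][j] += alpha * a_k[i][j]
    cam = [[max(v, 0.0) for v in row] for row in cam]    # ReLU keeps positive evidence
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam

# Toy example (hypothetical): channel 0 supports the class, channel 1 opposes it.
toy_acts = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 1.0], [0.0, 0.0]]]
toy_grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
toy_heatmap = grad_cam(toy_acts, toy_grads)
```

Since the channel weights enter the heatmap directly, a channel-recalibration module that down-weights background-sensitive channels would be expected to yield more concentrated maps, which is consistent with the qualitative reading given above.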
4.5. Performance Boundary Analysis Under Challenging Conditions
To delineate the limits of the proposed method, this paper further evaluates the performance of HQCA-Net under challenging conditions. The experiments explore the potential limitations of the hybrid quantum-classical architecture along two dimensions: environmental noise sensitivity and sample scarcity.
The Gaussian noise robustness analysis is shown in Figure 12. As the standard deviation of the Gaussian noise increases from 0.00 to an extreme threshold of 0.20, the random noise severely disrupts the global pixel correlation and the underlying manifold structure of the images, causing the classification accuracy of all tested models to decrease monotonically. However, the attention mechanisms differ significantly in their sensitivity to noise.
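A noise sweep of this kind can be sketched as follows. The code assumes that pixel intensities are normalized to [0, 1], that zero-mean Gaussian noise is added independently to every pixel, and that the result is clipped back to the valid range; the intermediate noise levels are hypothetical, since only the end points 0.00 and 0.20 are given in the text.

```python
import random

def add_gaussian_noise(image, sigma, rng):
    """image: H x W nested list of intensities in [0, 1] (assumed normalization).
    Adds N(0, sigma^2) noise per pixel and clips the result to [0, 1]."""
    return [
        [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
        for row in image
    ]

# Hypothetical sweep; only 0.00 and 0.20 are taken from the text.
NOISE_LEVELS = (0.00, 0.05, 0.10, 0.15, 0.20)

rng = random.Random(0)                        # fixed seed for repeatable perturbations
toy_image = [[0.0, 0.5], [0.5, 1.0]]          # toy 2 x 2 image
noisy_versions = {s: add_gaussian_noise(toy_image, s, rng) for s in NOISE_LEVELS}
```

Applying the same seeded perturbation to the test images of every model keeps the comparison across attention mechanisms fair, because each model then sees identical corrupted inputs at each noise level.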
In a pristine, noise-free environment, HQCA-Net leads with an accuracy of 97.97%, fully demonstrating its exceptionally high acuity for faint defect features under clean conditions. Nevertheless, when the noise increases to 0.20, classical attention mechanisms based on MLP expose severe vulnerabilities. The accuracy of the most representative SE-ResNet18 plummets to 23.83%, performing significantly worse than the 38.67% achieved by the classical ResNet18 baseline without any attention mechanism. This phenomenon indicates that traditional fully connected channel attention is highly susceptible to losing its feature discriminative ability under strong noise interference, mistakenly amplifying high-frequency noise components instead.
In contrast, HQCA-Net maintains an accuracy of 42.36% under extreme noise conditions. Although lightweight modules like SimAM and ECA numerically exhibit a certain level of resistance to degradation, this is primarily attributable to the underfitting tendency of their minimalist structures when facing complex perturbations, rather than active feature purification. The robustness of HQCA-Net is associated with the compact constraint introduced by the 10-dimensional information bottleneck in the RQCA module. The latent feature vectors are restricted to a highly compressed representation space before being processed by the simulated VQC-based recalibration module. This compact bottleneck-based feature constraint and stringent parameter constraints act as a natural filter for random high-frequency signals. This mechanism enables HQCA-Net to effectively circumvent the noise-fitting traps of classical large-parameter modules while overcoming the representational deficiencies of minimalist modules, achieving a unification of detection accuracy and robustness.
This multi-dimensional comparison further supports the model-level robustness of the proposed RQCA module in feature recalibration. Under clean or mildly noisy conditions, the simulated VQC-based attention module uses parameterized rotations, CNOT-based structured coupling, and expectation measurements to generate compact nonlinear channel responses, which helps enhance defect-related features and suppress background interference. When exposed to stronger noise perturbations, the low-dimensional bottleneck and single-layer circuit design constrain the channel recalibration process, thereby reducing the risk of fitting random high-frequency noise. Compared with classical attention modules with larger or less constrained nonlinear mappings, this compact structured feature interaction mechanism can improve robustness while maintaining a lightweight architecture. These results provide empirical evidence for the reliability of the proposed simulation-based feature recalibration strategy, but do not constitute proof of quantum superiority or formal quantum advantage.
In the Few-Shot stress test using only 10% of the training data, as shown in Figure 13 and Table 4, HQCA-Net exhibits dynamic characteristics of rapid initial convergence and broader performance boundaries early in the training phase. At the very early stage of Epoch 2, the validation accuracy of HQCA-Net rapidly climbs to 55.94%, demonstrating a clear convergence advantage over the approximately 42% accuracy of the classical baseline. These results suggest that, in scenarios with extreme data scarcity, the simulated VQC-based RQCA module can provide a compact nonlinear recalibration constraint that helps the model learn defect-related channel patterns more efficiently than several lightweight comparison models. This observation should be interpreted as empirical evidence of parameter-efficient feature recalibration under limited training samples, rather than as proof of superior quantum representational capability.
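One way to construct such a reduced training set is sketched below. It assumes class-stratified sampling, so that every defect category keeps roughly 10% of its samples, and a fixed random seed; whether the subset used in this experiment was stratified is not stated, so the sampling scheme and the function name should be read as illustrative assumptions.

```python
import random
from collections import defaultdict

def stratified_subset(labels, fraction, seed=0):
    """Returns sorted sample indices that keep about `fraction` of each class,
    with at least one sample per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    chosen = []
    for indices in by_class.values():
        k = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, k))   # sample without replacement
    return sorted(chosen)

# Toy label list (hypothetical class sizes).
toy_labels = ["crack"] * 50 + ["dent"] * 30 + ["scratch"] * 20
subset = stratified_subset(toy_labels, fraction=0.10)
```

Stratification matters in this setting because a purely random 10% draw could leave a rare but safety-critical class such as cracks with very few examples.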
As training progresses, HQCA-Net maintains its upward momentum and achieves a final generalization accuracy of 82.01%. This result surpasses modern mainstream lightweight attention mechanisms such as ECA (80.22%) and SimAM (80.31%) and closely approaches the level of the classical SE architecture (83.00%). It should be noted that, as the validation loss evolution in Figure 13b shows, the strong nonlinear fitting capacity of the simulated VQC-based recalibration module leads to a certain degree of overfitting fluctuation in the mid-to-late stages under extreme data constraints. However, the metrics in Table 4 show that, despite this overfitting pressure, HQCA-Net still reduces the final False Positive Rate (FPR) to 4.27% through its compact structured recalibration design, outperforming the other modern lightweight comparison models. The experimental data indicate that under small-sample conditions HQCA-Net can construct resilient decision boundaries, providing a viable technical solution for defect detection tasks in the aviation industry where sample acquisition is difficult.
The comprehensive multi-dimensional experimental analysis above demonstrates the potential advantages of HQCA-Net as a typical hybrid quantum-classical architecture: rapid initial optimization, precise feature anchoring, and strong anti-interference robustness. Particularly under extreme constraint environments such as strong noise interference and small-sample starvation, this architecture exhibits improved feature discrimination and generalization behavior comparable to or better than those of modern classical attention mechanisms, effectively curbing performance degradation caused by external perturbations or data scarcity.
4.6. Trainability of the Quantum Module and Hyperparameter Space Exploration
In the design of HQCA-Net, the trainability and hyperparameter configuration of the quantum channel attention module are core elements determining the final performance of the system. To address the prevalent “barren plateau” problem in variational quantum machine learning, in which the gradient variance can shrink exponentially as the number of qubits and the circuit depth grow, this section presents an experimental and physical analysis of the gradient evolution trajectory and architecture design space of HQCA-Net.
4.6.1. Evolution of Quantum Gradient Variance and Analysis of Vanishing Risk
To quantitatively evaluate the trainability of the Variational Quantum Circuit (VQC) in HQCA-Net, a dedicated gradient probe mechanism was designed. During the model training process, the gradient variance of the core quantum rotation gate parameters within the RQCA module was tracked in real-time and accurately extracted. Its evolution trajectory over the complete 50-epoch training cycle is shown in Figure 14.
As shown in Figure 14, throughout the entire training cycle, the gradient variance of the quantum parameters in HQCA-Net decreases gradually from its initial magnitude as the network optimizes and ultimately remains relatively stable around $10^{-8}$. When confronting the complex non-convex optimization landscape inherent to aircraft skin defect recognition, the gradient variance stays within an effective numerical range without exhibiting obvious exponential decay or convergence to zero, which indicates that the model continues to obtain useful parameter update directions during training. Based on the recorded training logs, these results suggest that the adopted shallow VQC design does not exhibit obvious gradient-variance collapse in this experimental setting, supporting the trainability of the hybrid quantum-classical architecture in the tested aircraft skin defect recognition task.
The observed gradient stability can be explained by two aspects of the architectural design:
1. Low-dimensional bottleneck and shallow circuit topology. Through the pre-introduced classical global average pooling and feature bottleneck mechanisms, the latent feature vectors entering the variational quantum circuit are strictly restricted to 10 dimensions. This design not only compresses the quantum operation space to 10 qubits but also, guided by ablation studies, adopts a topological architecture based on a single-layer strongly entangling circuit. This dimensionality reduction and shallow topology help reduce the risk of unstable gradient behavior caused by increased latent dimensionality or excessive circuit depth.
2. Residual recalibration topology for stable gradient propagation. The RQCA module adopts a residual recalibration topology, which helps preserve the stable transmission of deep visual feature flows during forward propagation. During backpropagation, this structure provides an identity-mapping-based residual path for error gradients. This architecture-level skip connection can mitigate gradient attenuation during hybrid module optimization, thereby supporting stable parameter updates in the tested training setting.
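A gradient probe of the kind described in this subsection can be sketched as a small logger that records, at each logging step, the variance of the gradients of the quantum rotation parameters and flags a collapse when that variance falls below a floor. The sketch assumes that the variance is taken across the parameter gradients at one step; the class name, the floor value, and the toy gradients are hypothetical, and the probe actually used for Figure 14 may aggregate differently (for example, across mini-batches).

```python
from statistics import pvariance

class GradientVarianceProbe:
    """Records the variance of the quantum-parameter gradients over training."""

    def __init__(self, floor=1e-12):
        self.floor = floor        # hypothetical threshold for "vanished" gradients
        self.history = []

    def record(self, grads):
        """grads: gradients of the rotation-gate parameters at one logging step."""
        value = pvariance(grads)  # population variance across the parameters
        self.history.append(value)
        return value

    def collapsed(self):
        """True if the most recent variance is below the floor."""
        return bool(self.history) and self.history[-1] < self.floor

# Toy gradients (hypothetical): the variance settles near 1e-8 instead of vanishing.
probe = GradientVarianceProbe()
probe.record([3e-3, -3e-3, 1e-3, -1e-3])
probe.record([1e-4, -1e-4])
```

Plotting `probe.history` on a logarithmic axis gives a curve of the type shown in Figure 14, where a plateau indicates that usable gradients remain available, whereas a steady exponential decline would indicate a barren-plateau regime.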
4.6.2. Ablation Study and Impact Analysis of Circuit Depth on Nonlinear Recalibration Capability
The number of entangling layers directly affects the nonlinear recalibration capacity of the simulated VQC module and the complexity of the optimization landscape. To examine whether the circuit depth adopted in this paper is appropriate, systematic ablation experiments on the number of entangling layers L were conducted on the validation set. The final validation accuracy and parameter counts under different depth configurations are shown in
Table 5 and
Figure 15.
The experimental results show that with L = 1 entangling layer, HQCA-Net achieves a good balance between feature recalibration capability and optimization difficulty, with the validation accuracy peaking at 97.93%. This indicates that, for aircraft skin defect recognition, a single-layer strongly entangling variational quantum circuit already provides sufficient nonlinear recalibration capability under the tested dataset and model configuration, without additional circuit-depth stacking. Increasing the circuit depth introduces more trainable parameters and optimization complexity, which does not necessarily lead to further performance improvement in this task.
However, when the circuit depth is further increased to L = 2, 3, and 4, the classification performance does not improve as the number of quantum parameters grows from 30 to 120. Instead, it degrades, with the validation accuracy falling to 96.94%, 95.77%, and 96.94%, respectively. This suggests that, in industrial visual diagnosis tasks of this kind, increasing the circuit depth and parameter count may lead to over-parameterization. An overly deep variational circuit increases the non-convexity of the optimization landscape, and its surplus fitting capacity makes the model prone to overfitting the high-frequency background noise common in industrial images. This in turn triggers oscillations in the late training stages, ultimately damaging the model’s generalization performance.
In summary, the results suggest that L = 1 is an appropriate configuration for the quantum channel attention module. With only 30 trainable quantum parameters, this design enables effective recalibration of high-dimensional features while reducing the risk of overfitting and parameter redundancy at the model-design level. It achieves a favorable balance between detection accuracy and trainability in resource-constrained aircraft skin defect recognition scenarios.
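The quoted parameter counts (30 at the shallowest depth, 120 at the deepest) are consistent with a layout in which every layer applies one three-angle rotation to each of the 10 qubits. The following is a minimal sketch of that arithmetic, assuming three rotation angles per qubit per layer, as in common strongly entangling templates; the function name is hypothetical and not taken from the authors' code.

```python
def vqc_param_count(n_qubits: int, n_layers: int, angles_per_rotation: int = 3) -> int:
    """Trainable rotation angles in a layered circuit: layers x qubits x angles.

    Assumption: each layer applies one general single-qubit rotation
    (three angles) to every qubit; entangling gates carry no parameters.
    """
    return n_layers * n_qubits * angles_per_rotation


# Parameter count for each tested depth with the 10-qubit circuit from the text.
counts = {depth: vqc_param_count(10, depth) for depth in (1, 2, 3, 4)}
for depth, count in counts.items():
    print(f"L = {depth}: {count} quantum parameters")
```

Under this assumption the count grows linearly with depth, so each extra layer adds 30 angles to a module whose classical part is unchanged.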
4.7. Model Complexity and Practical Feasibility Analysis
In practical aircraft skin defect recognition tasks, lightweight visual models are expected to maintain a favorable balance between recognition performance and computational cost. To further evaluate the lightweight property and practical feasibility of the proposed HQCA-Net, this section provides a model-level complexity comparison with ResNet-18 and classical attention baselines, including SE, ECA, and SimAM. The comparison focuses on total parameter count, attention-module parameter increment, and Multiply-Accumulate Operations (MACs) under a standard input resolution of 224 × 224. It should be noted that these metrics reflect model-level computational complexity rather than measured inference latency, memory footprint, or energy consumption on embedded hardware. The results are presented in
Table 6.
4.7.1. Evaluation of Model-Level Computational Burden and Parameter Reduction
As shown in
Table 6, all compared models maintain the same model-level MACs of 1.823 G under the standard 224 × 224 input setting. This is because the attention modules are inserted after the final convolutional stage, where the spatial resolution of the feature map has already been substantially reduced. Therefore, the additional computational cost introduced by these attention modules is negligible compared with the convolutional operations in the ResNet-18 backbone.
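A rough estimate illustrates why the reported MACs are unchanged at this precision. The sketch below assumes that the final ResNet-18 stage outputs 512 channels, that the attention branch acts on the globally pooled channel vector, and that only the two classical linear maps around the 10-dimensional bottleneck are counted; the cost of simulating the circuit itself is not expressed in MACs here. The variable names are illustrative.

```python
CHANNELS = 512            # output channels of the final ResNet-18 stage
BOTTLENECK = 10           # latent dimension fed to the simulated VQC (from the text)
BACKBONE_MACS = 1.823e9   # reported model-level MACs at 224 x 224 input

# One multiply-accumulate per weight in the projection (512 -> 10)
# and reconstruction (10 -> 512) layers applied to the pooled vector.
extra_macs = CHANNELS * BOTTLENECK + BOTTLENECK * CHANNELS
relative_overhead = extra_macs / BACKBONE_MACS

print(f"extra MACs: {extra_macs}")
print(f"relative overhead: {relative_overhead:.2e}")
```

Under these assumptions the added cost is a few parts per million of the backbone, which disappears when MACs are reported to three decimal places in units of G.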
In terms of parameter count, the SE module increases the model size by 32.76 K parameters due to its fully connected excitation structure. In contrast, the proposed RQCA module introduces 10.76 K additional parameters, including the classical projection and reconstruction layers as well as 30 trainable quantum-gate parameters in the simulated VQC. Therefore, compared with SE-ResNet18, HQCA-Net reduces the attention-module parameter increment by approximately 67.1% while achieving higher classification accuracy and a lower false positive rate in the tested aircraft skin defect recognition task.
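The relative reduction can be checked directly from the two reported increments. This is a small arithmetic sketch using the rounded values given in the text; the variable names are illustrative.

```python
se_increment_k = 32.76     # SE attention parameters, in thousands (from the text)
rqca_increment_k = 10.76   # RQCA attention parameters, in thousands (from the text)

# Fraction of the SE parameter increment that RQCA avoids.
reduction = 1.0 - rqca_increment_k / se_increment_k
print(f"relative reduction: {reduction:.1%}")
```

With the rounded figures this evaluates to roughly 67%, in line with the reported value; a difference in the last digit can arise from rounding of the parameter counts.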
It should be noted that the MACs reported in this section reflect model-level computational complexity rather than measured inference latency on physical devices. Since no embedded hardware benchmarking was conducted in this study, the actual runtime, memory usage, and power consumption of HQCA-Net on edge platforms still require further experimental verification.
4.7.2. Performance–Complexity Trade-Off
The comparison in
Table 6 indicates that different attention mechanisms present different trade-offs between parameter overhead and recognition performance. ECA and SimAM introduce almost no additional parameters, but in this experiment their recognition accuracy is lower and their false positive rate is higher than those of HQCA-Net. SE improves the baseline model but introduces a larger parameter increment than the proposed RQCA module. HQCA-Net achieves the highest accuracy of 97.93% and the lowest FPR of 0.49% among the compared models, while keeping the total parameter count at 11.190 M.
These results suggest that the proposed simulated VQC-based attention module can provide a favorable balance between model complexity and recognition reliability. Rather than substantially increasing the parameter scale, RQCA performs channel feature recalibration through a compact 10-dimensional bottleneck and a 10-qubit simulated variational quantum circuit. In this context, the simulated VQC should be understood as a compact container for nonlinear channel interactions, rather than as evidence of formal quantum advantage. This design allows HQCA-Net to improve false-alarm suppression and classification performance with a moderate parameter increment.
In summary, the model complexity analysis shows that HQCA-Net achieves improved recognition performance and reliability with limited additional parameter overhead. This finding should be interpreted as model-level evidence of parameter-efficient nonlinear feature recalibration, not as proof of quantum computational superiority or hardware-level quantum acceleration. However, the present analysis is restricted to parameter count and MACs. Future work should further evaluate the proposed model on practical embedded computing platforms to measure actual inference latency, memory consumption, and energy efficiency.