An Adaptive Attention DropBlock Framework for Real-Time Cross-Domain Defect Classification

Pasupuleti, Shailaja; Krishnamoorthy, Ramalakshmi; Gunasekaran, Hemalatha

doi:10.3390/ai7020056

Open Access Article


by Shailaja Pasupuleti ¹, Ramalakshmi Krishnamoorthy ¹,* and Hemalatha Gunasekaran ²,*

¹ AU–Centre of Excellence, Computer Vision, Alliance School of Advanced Computing, Alliance University, Bangalore 562106, India

² College of Computing and Information Sciences, University of Technology and Applied Sciences, Ibri 516, Oman

* Authors to whom correspondence should be addressed.

AI 2026, 7(2), 56; https://doi.org/10.3390/ai7020056

Submission received: 18 November 2025 / Revised: 24 January 2026 / Accepted: 25 January 2026 / Published: 3 February 2026

(This article belongs to the Topic AI and Data-Driven Advancements in Industry 4.0, 2nd Edition)


Abstract

Real-time defect classification across heterogeneous domains is a long-standing challenge for industrial visual inspection systems, primarily because of large visual variations and the scarcity of labelled data in real-world inspection settings. This work presents the Adaptive Attention DropBlock (AADB) framework, a lightweight deep learning framework designed for cross-domain defect classification through attention-guided regularization. The proposed architecture integrates the Convolutional Block Attention Module (CBAM) with a structured DropBlock-based regularization scheme in a single, unified framework. Although CBAM-based approaches improve the localization of defect-related regions and standard DropBlock provides generic spatial regularization, neither is specifically designed to reduce overfitting to the source domain. To address this limitation, AADB combines attention-directed feature refinement with a progressive, transfer-aware dropout policy that promotes the learning of domain-invariant representations. The model is built on a MobileNetV2 backbone and trained with a two-phase transfer learning regime: the first phase pretrains on a source domain, and the second adapts the model to a visually dissimilar target domain under constrained supervision. Experiments with a metal surface defect dataset (source domain) and an aircraft surface defect dataset (target domain) show that AADB outperforms CBAM-only, DropBlock-only, and conventional MobileNetV2 models, achieving an overall accuracy of 91.06%, a macro-F1 of 0.912, and a Cohen’s κ of 0.866. Qualitative analyses using Principal Component Analysis (PCA) and Grad-CAM further illustrate improved feature separability and defect localization. Overall, the framework provides a practical, interpretable, and edge-deployable solution for cross-domain defect classification in industrial inspection.

Keywords:

cross-domain classification; CBAM; MobileNetV2; transfer learning; real-time image processing

1. Introduction

Automated visual inspection is critical to operational safety [1], particularly in the high-stakes aerospace, metallurgy, and manufacturing sectors, where surface defects such as cracks, dents, and corrosion are among the major causes of maintenance downtime and structural failure [2]. Timely detection allows defects to be corrected before they escalate, reducing downtime and helping to secure airworthiness [3]. Manual inspection, despite its usefulness, is labour-intensive and prone to subjective judgments, inconsistent results, and low inspection throughput [4]. As a result, deep learning-based systems, especially convolutional neural networks (CNNs), have become the standard for automated visual inspection [1].

CNNs outperform conventional feature-engineering approaches because they learn hierarchical features directly from raw image data [5]. However, their performance is known to degrade considerably in cross-domain deployments [6]. Variations in surface texture, lighting, curvature, and defect morphology cause a domain shift, and the resulting change in the distribution of learned features leads CNNs to misclassify novel visual patterns [7]. The difficulty is even more acute in industrial settings, where labelled target-domain data are limited or absent because of proprietary restrictions, annotation time and cost, or low defect prevalence [8].

Despite impressive advances in deep learning-based defect detectors, several challenges remain for deployment in real-world settings. Much of the literature either focuses on lightweight architectures that lack explicit mechanisms against domain shift, or employs advanced domain-adaptation algorithms that are too computationally expensive for real-time or edge-based inspection systems. Specifically, existing techniques have three main weaknesses: (i) lightweight convolutional neural networks lack explicit mechanisms to mitigate domain shift between visually dissimilar inspection conditions; (ii) attention-based models localize defect-relevant regions well but tend to overfit to domain-specific visual features; and (iii) existing regularization techniques, such as DropBlock, are not domain-aware and are optimized for single-domain robustness rather than cross-domain transferability. As a result, deployed methods often fail to generalize to visually dissimilar inspection settings, especially when annotated target-domain data are limited. This work addresses these gaps by combining attention-guided feature refinement with progressive, transfer-aware spatial regularization within a lightweight MobileNetV2 backbone.

We aim to combine the hierarchical feature learning ability of CNN architectures with recently developed attention mechanisms [9]. To this end, we propose a new classification framework based on the MobileNetV2 backbone and enhanced with two synergistic components:

The Convolutional Block Attention Module (CBAM): This module sequentially applies channel-wise and spatial attention, directing the model’s focus toward defect-relevant regions such as micro-cracks or corrosion streaks [10].
The Progressive Transfer DropBlock (TDropBlock): a novel regularization module that generates attention-guided spatial masks encouraging the learning of semantically diverse and domain-transferable features by selectively suppressing overactive regions during training [11].

Novelty and Key Contributions

Novel architectural fusion: first integration of CBAM with a progressive, transfer-aware dropout mechanism (TDropBlock) for enhanced attention and regularization under domain-shifted conditions.
Two-phase training protocol: combines source-domain pretraining with target-domain fine-tuning using minimal supervision.
Comprehensive evaluation: validated under zero-shot and fine-tuned settings, with ablation studies, PCA visualizations, and confusion-matrix analyses.
Deployable design: achieves high accuracy while remaining lightweight and edge-compatible, suitable for real-time industrial inspection systems.

The rest of this paper is organized as follows. Section 2 reviews the literature on defect detection, attention-based models, and cross-domain learning. Section 3 presents the proposed framework, including the network architecture and the training strategy. Section 4 describes the experimental set-up and the quantitative performance analysis, and Section 5 provides the qualitative analysis, ablation study, discussion, and conclusion.

2. Related Work

2.1. Architecture

Recent years have seen dramatic progress in automated defect detection with lightweight CNNs, especially for real-time and resource-limited industrial applications. Among these, MobileNetV2 has become popular because its inverted residual architecture and depthwise separable convolutions provide a good trade-off between speed and performance. Zhang et al. demonstrated its practicality for weld-surface inspection, achieving high accuracy even when deployed on edge devices [12]. Liu et al. further embedded attention mechanisms into MobileNetV2 for aluminum surface evaluation, improving robustness under lighting variations [13]. Similar studies [12,14] also highlight that MobileNetV2 adapts well across various defect types and hardware setups. This flexibility across modalities, defect types, and hardware motivates its use in our architecture.

Attention models, particularly CBAM, have been found to be useful in improving feature selection at both the channel and spatial levels. Huang et al. applied CBAM to a multi-scale CNN for weld defect detection [14]. Pan et al. showed its utility in rolling bearing classification under noisy loads, reporting enhanced model stability [15]. Many works similar in spirit [10,14,16] show the effectiveness of CBAM. Yet, attention alone does not address model overfitting, especially under domain shift or low data conditions.

To place the proposed methodology in the context of the existing literature, Table 1 summarizes a sample of previous studies closely related to the current investigation. Rather than offering a broad, metric-based comparison, the table focuses on the architectural choices made by the various researchers, including the backbone network, the inclusion of attention mechanisms, and the use of regularisation methods, alongside the limitations reported by the corresponding authors. A descriptive comparison is used because most existing methods are evaluated on heterogeneous datasets and under differing experimental conditions, which makes direct numerical comparison across studies difficult.

2.2. Regularization

Traditional dropout reduces co-adaptation but lacks spatial awareness. DropBlock [11], Tensor Dropout [19], and Checkerboard Dropout [20] each apply structured spatial masking for CNN regularization, but their fixed masking patterns limit adaptability. Adaptive approaches such as AutoDropout [11] learn dropout masks dynamically, while group-wise Dynamic Dropout [17] adjusts dropout probabilities based on semantic density. However, most existing dropout techniques ignore inter-class dependencies and domain-specific challenges, especially in visual inspection settings where classes may differ subtly but critically.

Our proposed Progressive Transfer DropBlock (TDropBlock) addresses this issue by generating class-sensitive, attention-guided masks that evolve across different network layers, promoting transferable yet discriminative feature representations.

2.3. Cross-Domain Methods

In cross-domain contexts, another key challenge is the distribution mismatch between source and target datasets. Domain adversarial training methods like Domain Adversarial Transfer Network (DATN) [21] and Domain Invariant Region Proposal Networks (DIR) [22] have attempted to align feature spaces using gradient reversal techniques. Robust discriminative learning [23] and double-consistency networks have shown strong performance in low-label scenarios; these have helped to minimize the effects of negative transfer. Similarly, domain-adaptive intelligence techniques [18] and domain-transferability-based methods [24] have also been studied to improve cross-domain robustness.

2.4. Training Strategies

Many of these frameworks, however, lack spatial regularization and require heavy computation, which makes them difficult to deploy on edge-based industrial systems. To overcome target-domain data scarcity and mitigate domain shift, two-phase transfer learning has proven to be a practical and widely used solution. Tang and Xie pretrained on concrete defect data and then fine-tuned on real-world structures, achieving performance gains even with limited target-domain annotations [25]. Dual-domain adaptation has been used for tire defect detection, bridging both marginal and conditional distribution gaps [26]. Further enhancements using discrepancy minimization [27], partial-label mappings [28], and reconstruction-based adaptation [29] show that transfer learning is significantly more effective when paired with mechanisms that preserve spatial and semantic structure.

Despite these developments, significant gaps remain. Very few works combine a lightweight architecture, attention, and adaptive dropout in a single pipeline that can be deployed at the edge. Dropout mechanisms remain largely agnostic to semantic structure, as they do not learn class-conditional variations in defect structure. Moreover, most domain adaptation approaches treat regularization and transfer as unrelated, rather than learning features synergistically. To address these gaps, we present a new classification framework that builds on MobileNetV2 and adds CBAM and a Progressive Transfer DropBlock module. By coupling attention-guided regularisation with two-phase training, our model learns spatially diverse, class-sensitive representations that generalize across domains.

3. Methodology

3.1. Overview of the Proposed Framework

To address cross-domain defect classification in industrial settings, we introduce a lightweight and generalizable deep learning framework based on MobileNetV2 and enhanced with two synergistic modules, namely the Convolutional Block Attention Module (CBAM) and the Progressive Transfer DropBlock (TDropBlock). The framework is specifically designed to mitigate performance drops caused by domain shift, particularly when the target domain has limited labelled data.

CBAM reweights features along the channel and spatial dimensions, focusing the network on defect-salient regions and enhancing its ability to detect subtle patterns such as micro-cracks or diffuse corrosion.

Our regularization approach, TDropBlock, builds on structured spatial dropout and gradually introduces attention-directed suppression at progressively deeper levels of the network. Specifically, it derives inverted attention masks from intermediate feature activations and masks out the most salient regions, forcing the model to explore redundant spatial pathways. This class-sensitive, dynamic masking encourages feature diversity and reduces overfitting to domain-specific cues, which is crucial for improving generalisation.

Figure 1 illustrates the framework: source-domain pretraining followed by target-domain fine-tuning for cross-domain defect classification. Training proceeds in two consecutive stages that resemble a realistic deployment scenario:

Phase 1—Pretraining: Initially the model is trained on a labeled and balanced metal surface dataset comprising visually homogeneous defects.
Phase 2—Fine-tuning: The pretrained model is adapted to a visually dissimilar aircraft defect dataset using transfer learning under low-supervision settings.

This two-phase pipeline enables the model to learn generalizable defect representations from a well-labelled source domain and encourages robust adaptation to a new domain with minimal data and annotation effort. Pretraining on the metal surface dataset teaches generic defect-related features, and fine-tuning on the aircraft surface dataset transfers this knowledge from a controlled industrial domain to a more complex and unconstrained real-world domain.

3.2. Architecture Design

3.2.1. Backbone: MobileNetV2

MobileNetV2 was chosen as the backbone architecture [30] to balance representational capacity and computational efficiency, an essential requirement in industrial inspection and edge deployment. MobileNetV2 uses depth-wise separable convolutions and inverted residual blocks to significantly reduce computational cost compared with heavier convolutional networks, while preserving the ability to learn discriminative feature representations [31]. This efficiency is particularly relevant for cross-domain defect classification, because the proposed framework adds modules for attention and transfer-aware regularization. A lightweight backbone keeps the overall model complexity manageable and deployment feasible in resource-constrained industrial environments, such as embedded inspection systems or on-site monitoring platforms [31]. Moreover, MobileNetV2 has demonstrated consistent performance across a range of visual inspection tasks, making it a practical and reliable choice for studying the effects of attention mechanisms and progressive regularization under domain shift [31]. MobileNetV2 is therefore adopted as the backbone, with its original classification head removed so that the proposed attention (CBAM) and regularisation (TDropBlock) modules can be integrated.
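The efficiency argument can be made concrete with the usual multiply-accumulate (MAC) cost model for a stride-1 convolution with same padding and no bias. The sketch below compares a standard k × k convolution with its depthwise separable factorization (a k × k depthwise convolution followed by a 1 × 1 pointwise convolution); the layer sizes are illustrative and not taken from the paper.

```python
def standard_conv_macs(h, w, k, c_in, c_out):
    """MACs of a standard k x k convolution producing an h x w x c_out map."""
    return h * w * k * k * c_in * c_out


def depthwise_separable_macs(h, w, k, c_in, c_out):
    """MACs of a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = h * w * k * k * c_in   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 convolution mixes channels
    return depthwise + pointwise


def cost_ratio(k, c_out):
    """Closed form of separable / standard cost: 1 / c_out + 1 / k^2."""
    return 1.0 / c_out + 1.0 / (k * k)


# Illustrative layer: 56 x 56 spatial map, 3 x 3 kernel, 64 -> 128 channels.
std = standard_conv_macs(56, 56, 3, 64, 128)
sep = depthwise_separable_macs(56, 56, 3, 64, 128)
print(f"standard: {std:,} MACs, separable: {sep:,} MACs, ratio: {sep / std:.3f}")
```

For this hypothetical 3 × 3, 64 → 128 layer the separable form needs roughly 12% of the MACs of the standard convolution, which matches the closed form 1/c_out + 1/k².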

3.2.2. Convolutional Block Attention Module (CBAM)

To enhance the localization of defect-relevant regions, CBAM is inserted after the final convolutional block of MobileNetV2. This module sequentially applies:

Channel Attention: Emphasizes salient features by processing global average and max-pooled vectors through a shared multi-layer perceptron (MLP).
Spatial Attention: Applies a 2D convolution over pooled channel features to highlight informative spatial regions.

This dual attention mechanism strengthens the network’s capability to extract fine-grained, domain-relevant defect features, improving robustness in both the source and target domains.
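The two CBAM steps can be sketched on a toy feature map with the Python standard library alone. The sketch follows the standard CBAM formulation: channel weights come from a shared MLP applied to average- and max-pooled channel descriptors, and spatial weights come from a convolution over the channel-wise average and max maps. The reduction ratio, the 3 × 3 spatial kernel (CBAM commonly uses 7 × 7), and the random weights are illustrative assumptions; in the framework the weights are learned, and the input would typically be the final MobileNetV2 feature map (7 × 7 × 1280 for a 224 × 224 input).

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def shared_mlp(vec, w1, w2):
    """Two-layer MLP with ReLU (C -> C/r -> C), shared by both pooled descriptors."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(fmap, w1, w2):
    """fmap is C x H x W (nested lists); returns C weights in (0, 1)."""
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]
    mx = [max(map(max, ch)) for ch in fmap]
    return [sigmoid(a + m) for a, m in zip(shared_mlp(avg, w1, w2), shared_mlp(mx, w1, w2))]


def spatial_attention(fmap, kernel):
    """kernel is 2 x k x k, applied to the channel-wise [average, max] maps."""
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    avg = [[sum(fmap[z][i][j] for z in range(c)) / c for j in range(w)] for i in range(h)]
    mx = [[max(fmap[z][i][j] for z in range(c)) for j in range(w)] for i in range(h)]
    k = len(kernel[0])
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for src, ker in ((avg, kernel[0]), (mx, kernel[1])):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < h and 0 <= x < w:  # zero padding at the borders
                            s += ker[di][dj] * src[y][x]
            out[i][j] = sigmoid(s)
    return out


def cbam(fmap, w1, w2, kernel):
    """Sequential CBAM: channel attention first, then spatial attention."""
    mc = channel_attention(fmap, w1, w2)
    refined = [[[mc[z] * v for v in row] for row in ch] for z, ch in enumerate(fmap)]
    ms = spatial_attention(refined, kernel)
    out = [[[ms[i][j] * v for j, v in enumerate(row)] for i, row in enumerate(ch)]
           for ch in refined]
    return out, mc, ms


# Toy example: channels, height, width, reduction ratio, spatial kernel size.
rng = random.Random(0)
C, H, W, R, K = 4, 5, 5, 2, 3
fmap = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[rng.gauss(0, 0.5) for _ in range(C)] for _ in range(C // R)]
w2 = [[rng.gauss(0, 0.5) for _ in range(C // R)] for _ in range(C)]
kernel = [[[rng.gauss(0, 0.5) for _ in range(K)] for _ in range(K)] for _ in range(2)]
out, mc, ms = cbam(fmap, w1, w2, kernel)
```

Each output value is the input scaled by its channel weight and its spatial weight, so CBAM reweights features without changing the feature-map shape.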

3.2.3. Regularization Module: Progressive Transfer DropBlock (TDropBlock)

We introduce a novel regularization mechanism, Progressive Transfer DropBlock (TDropBlock), designed to improve generalization across domain boundaries. Progressive TDropBlock differs from existing adaptive dropout methods, such as AutoDropout and group-wise dynamic dropout, in both its objectives and its design. Whereas prior methods adapt dropout rates or channel groups within a single domain, primarily to reduce overfitting, Progressive TDropBlock is explicitly formulated for cross-domain transfer learning. It employs spatially structured block-wise dropout with a drop probability that increases progressively during fine-tuning, allowing the model to gradually suppress source-domain-specific spatial patterns while promoting domain-invariant representations. This transfer-stage-aware regularization strategy enables more stable and robust adaptation across domains, which existing adaptive dropout approaches do not address.

Furthermore, unlike conventional dropout, which deactivates neurons uniformly at random, TDropBlock generates inverted attention masks through lightweight depth-wise convolutions. These saliency-guided masks are applied in a depth-aware fashion, with dropout strength increasing in deeper layers. This approach simulates occlusion of dominant activations and encourages the network to explore alternative discriminative pathways, reducing overfitting and introducing class-sensitive spatial regularization. By progressively masking high-activation regions, the model learns redundant but transferable features that are vital for domain adaptation. Despite this added complexity, TDropBlock introduces negligible computational overhead, making it suitable for real-time industrial deployment. In short, TDropBlock extends standard DropBlock by progressively increasing the regularization effect during training. The key idea is to suppress high-activation spatial blocks in the feature map using an inverted attention mask computed from CBAM’s attention maps, so that low-attention regions are retained while high-attention regions are suppressed.

Algorithm 1 summarizes the key steps of the proposed Progressive Transfer DropBlock (TDropBlock) regularization strategy.

Algorithm 1 Progressive Transfer DropBlock (TDropBlock)

Require: Feature map $F$, CBAM attention map $A$, epoch $e$
Ensure: Regularized feature map $F^{'}$

1: Compute block size $b(e) \leftarrow b_{0} + \alpha \cdot e$
2: Compute drop probability $p(e) \leftarrow p_{0} + \beta \cdot e$
3: Invert CBAM attention $A$ to obtain suppression mask $M$
4: Randomly drop a proportion $p(e)$ of $b(e) \times b(e)$ regions, guided by $M$
5: Apply mask: $F^{'} \leftarrow F \odot M$
6: return $F^{'}$
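The steps of Algorithm 1 can be sketched in plain Python for a single-channel H × W feature map. This is one possible reading, not the authors' implementation: the hyperparameter defaults, the clamping of b(e) and p(e), and the way M "guides" the drop are all assumptions. Here each candidate block's seed probability is scaled by the CBAM attention at its centre relative to the mean attention, so salient regions are suppressed more often. The base seed rate reuses the γ formula from the original DropBlock paper, and, following step 5, the masked map is returned without the count(M)/count_ones(M) rescaling that standard DropBlock applies.

```python
import random


def schedule(epoch, b0, alpha, p0, beta, b_max, p_max):
    """Linear schedules of steps 1-2, clamped to [1, b_max] and [0, p_max] (assumed)."""
    b = min(b_max, max(1, round(b0 + alpha * epoch)))
    p = min(p_max, max(0.0, p0 + beta * epoch))
    return b, p


def tdropblock(fmap, attn, epoch, rng, b0=1, alpha=0.5, p0=0.05, beta=0.01,
               b_max=5, p_max=0.3):
    """One training-time pass of Algorithm 1 on a single-channel H x W map.

    attn holds CBAM spatial attention in [0, 1]; returns (masked map, binary mask M).
    The default hyperparameters are hypothetical placeholders.
    """
    h, w = len(fmap), len(fmap[0])
    b, p = schedule(epoch, b0, alpha, p0, beta, b_max, p_max)          # steps 1-2
    assert b <= min(h, w), "block larger than the feature map"
    valid = (h - b + 1) * (w - b + 1)        # block positions that fit inside the map
    gamma = p / (b * b) * (h * w) / valid     # DropBlock seed rate for drop fraction p
    mean_attn = sum(map(sum, attn)) / (h * w) or 1.0
    mask = [[1.0] * w for _ in range(h)]
    for i in range(h - b + 1):
        for j in range(w - b + 1):
            centre = attn[i + b // 2][j + b // 2]
            # Steps 3-4: seeds are likelier where attention is high, so the
            # inverted mask M zeroes salient blocks and keeps low-attention ones.
            if rng.random() < min(1.0, gamma * centre / mean_attn):
                for y in range(i, i + b):
                    for x in range(j, j + b):
                        mask[y][x] = 0.0
    masked = [[fmap[y][x] * mask[y][x] for x in range(w)] for y in range(h)]  # step 5
    return masked, mask


rng = random.Random(0)
fmap = [[1.0] * 8 for _ in range(8)]
attn = [[0.0] * 8 for _ in range(8)]
attn[4][4] = 1.0  # a single highly salient position
masked, mask = tdropblock(fmap, attn, epoch=10, rng=rng)
```

In this toy case, the only block that can be dropped is the 5 × 5 block centred on the salient position, so exactly that region is zeroed. In a training loop, such a module would run in the forward pass during fine-tuning with `epoch` advanced once per epoch, and, as with standard DropBlock, it would normally be disabled at inference.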

The block size b and drop probability p increase linearly with the training epoch e:

$b(e) = b_{0} + \alpha \cdot e, \quad p(e) = p_{0} + \beta \cdot e$

(1)

This encourages the network to explore alternate discriminative regions and avoid overfitting to high saliency zones in the source domain. The linear growth strategy was selected for its simplicity, stability, and predictable regularization behavior during fine-tuning, allowing gradual adaptation without introducing abrupt changes that could destabilize training. While more complex schedules are possible, empirical validation showed that linear progression provides a reliable balance between performance and training stability.

In Equation (1), the initial block size $b_{0}$ and drop probability $p_{0}$ control the strength of regularization at the beginning of fine-tuning, ensuring stable knowledge transfer from the source domain without excessive feature suppression. The growth rates $\alpha$ and $\beta$ determine how rapidly spatial occlusion and stochastic feature dropping are intensified across training epochs. As training progresses, the gradual increase in both block size and drop probability enforces stronger regularization, discouraging over-reliance on source-domain-specific high-saliency regions and encouraging more robust, domain-invariant representations under domain shift. The hyperparameters $b_{0}$, $p_{0}$, $\alpha$, and $\beta$ were selected via a grid search on the validation set, in which candidate values were evaluated to balance regularization strength and classification performance.
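A grid search over $b_{0}$, $p_{0}$, $\alpha$, and $\beta$ can be organized as an exhaustive loop over candidate tuples. The paper does not list the candidate values, the fine-tuning length, or any caps on b(e) and p(e), so every value below is hypothetical, and `evaluate` is a placeholder for "fine-tune with this schedule and return the validation score". Skipping tuples whose linear schedule would exceed a cap before the last epoch is an added safeguard, since Equation (1) is otherwise unbounded and a drop probability above 1 is meaningless.

```python
import itertools


def schedule_fits(cfg, epochs, b_max, p_max):
    """True if Equation (1) stays within the caps up to the last epoch."""
    last = epochs - 1
    return (cfg["b0"] + cfg["alpha"] * last <= b_max
            and cfg["p0"] + cfg["beta"] * last <= p_max)


def grid_search(grid, evaluate, epochs=20, b_max=7, p_max=0.5):
    """Exhaustive search over the Cartesian product of candidate values.

    `evaluate(cfg)` is a hypothetical callback: fine-tune with this schedule and
    return a validation score where higher is better (e.g., macro-F1).
    """
    best_cfg, best_score = None, float("-inf")
    keys = ("b0", "p0", "alpha", "beta")
    for values in itertools.product(*(grid[k] for k in keys)):
        cfg = dict(zip(keys, values))
        if not schedule_fits(cfg, epochs, b_max, p_max):
            continue  # skip schedules that would grow past the caps
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


# Hypothetical candidates and a toy stand-in for the validation score.
grid = {"b0": [1, 3], "p0": [0.05, 0.1], "alpha": [0.0, 0.1], "beta": [0.005, 0.01]}


def toy_score(cfg):
    # Peaks when the block size is 3 and the drop probability is 0.12 at epoch 10.
    return -(abs(cfg["b0"] + 10 * cfg["alpha"] - 3)
             + abs(cfg["p0"] + 10 * cfg["beta"] - 0.12))


best_cfg, best_score = grid_search(grid, toy_score)
```

With 16 candidate tuples the search is cheap; in practice each `evaluate` call is a full fine-tuning run, so the grid size dominates the cost.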

3.3. Dataset Description

To evaluate the cross-domain generalization capability of the proposed framework, experiments were conducted using two visually and contextually diverse datasets: a metal surface defect dataset as the source domain and an aircraft surface defect dataset as the target domain.

Source Domain–Metal Surface Dataset: The metal surface dataset consists of a total of 1104 images spanning three defect categories, namely crazing (276 images), pitted (552 images), and rolled (276 images). The images are acquired under controlled industrial inspection settings, with relatively uniform illumination, planar surfaces, and minimal background clutter. Due to its structured appearance and limited intra-class variability, this dataset serves as a suitable source domain for initial feature learning and pretraining.
Target Domain–Aircraft Surface Dataset: The aircraft surface dataset contains 11,121 images annotated into three defect classes: crack, dent, and rust, which are semantically aligned with the defect taxonomy of the metal dataset. The dataset is partitioned into training, validation, and testing subsets. The training set consists of 2314 crack images, 2648 dent images, and 2822 rust images. The validation set contains 496 crack, 567 dent, and 605 rust images, while the test set includes 496 crack, 568 dent, and 605 rust images. Compared to the source domain, this dataset exhibits significantly higher intra-class variability due to complex surface curvature, reflections, diverse lighting conditions, and irregular defect morphology. These characteristics closely resemble real-world aircraft maintenance and inspection scenarios.

Dataset splits are predefined and stratified by class so that evaluation is consistent across experiments, and class-wise distributions are reported explicitly for transparency and reproducibility. For the aircraft dataset, the reported counts (7784 training, 1668 validation, and 1669 test images) correspond to approximately a 70%/15%/15% split. Figure 2 presents representative crack, dent, and rust defects from the aircraft surface dataset, and Figure 3 illustrates sample crazing, pitted, and rolled defects from the metal surface dataset.
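A per-class (stratified) split can be sketched with the standard library alone. Applied with 70%/15%/15% fractions and per-class rounding, it reproduces the class-wise aircraft counts given in the dataset description exactly (crack 2314/496/496, dent 2648/567/568, rust 2822/605/605). This is only a consistency check: the splits are stated to be predefined, and the seed and rounding rule here are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Split sample indices into (train, val, test), class by class.

    Per-class train and val sizes are rounded; the remainder goes to test.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_train = round(fractions[0] * len(idxs))
        n_val = round(fractions[1] * len(idxs))
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]
    return train, val, test


# Class totals implied by the reported aircraft splits (train + val + test).
labels = ["crack"] * 3306 + ["dent"] * 3783 + ["rust"] * 4032
train, val, test = stratified_split(labels)
for name, part in (("train", train), ("val", val), ("test", test)):
    counts = {c: sum(labels[i] == c for i in part) for c in ("crack", "dent", "rust")}
    print(name, counts)
```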

3.4. Data Augmentation and Qualitative Analysis Tools

To ensure input consistency and improve generalization, images are resized to a fixed dimension of 224 × 224 × 3 and normalized to the [0, 1] range. During training, on-the-fly data augmentation is applied to introduce variation and reduce overfitting.

The augmentation strategies include:

Random horizontal flipping.
Random rotation within $\pm 15^{\circ}$ .
Zoom augmentation up to 20%.
Width and height shift up to ±10%.

These augmentations model real-world variations in imaging conditions, such as camera angle, position, and distance. Synthetic class balancing is not applied, as neither dataset is strongly imbalanced (the largest class ratio is 2:1, between pitted and the other metal classes). All transformations are implemented with TensorFlow’s tf.image API within the data pipeline to keep it efficient and reproducible.
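The augmentation ranges translate directly into the factor conventions of the Keras preprocessing layers (`tf.keras.layers.RandomFlip`, `RandomRotation`, `RandomZoom`, `RandomTranslation`, `Rescaling`), which are a swapped-in alternative to the tf.image route described above. The subtle point is rotation: `RandomRotation` takes a fraction of a full turn, so ±15° becomes 15/360 ≈ 0.042. The pure-Python sketch below computes these factors and samples one set of per-image parameters; treating each range as uniform and symmetric (including zoom as ±20%) is an assumption.

```python
import random

# Augmentation ranges stated in Section 3.4.
MAX_ROTATION_DEG = 15.0   # random rotation within +/-15 degrees
MAX_ZOOM = 0.20           # zoom up to 20% (assumed symmetric: in or out)
MAX_SHIFT = 0.10          # width/height shift up to +/-10% of the image size


def keras_layer_factors():
    """Factors in the units expected by Keras preprocessing layers.

    RandomRotation takes a fraction of a full 360-degree turn; RandomZoom and
    RandomTranslation take fractions of the image height/width.
    """
    return {
        "rotation_factor": MAX_ROTATION_DEG / 360.0,
        "zoom_factor": MAX_ZOOM,
        "translation_factor": MAX_SHIFT,
    }


def sample_augmentation(rng):
    """Draw one image's parameters, assuming uniform sampling over each range."""
    return {
        "flip": rng.random() < 0.5,
        "angle_deg": rng.uniform(-MAX_ROTATION_DEG, MAX_ROTATION_DEG),
        "zoom": rng.uniform(1.0 - MAX_ZOOM, 1.0 + MAX_ZOOM),
        "shift_x": rng.uniform(-MAX_SHIFT, MAX_SHIFT),
        "shift_y": rng.uniform(-MAX_SHIFT, MAX_SHIFT),
    }


def normalize(pixels):
    """Map 8-bit intensities to the [0, 1] range."""
    return [p / 255.0 for p in pixels]


factors = keras_layer_factors()
params = sample_augmentation(random.Random(0))
```

The factors would then be passed as, for example, `tf.keras.layers.RandomRotation(factors["rotation_factor"])` and `tf.keras.layers.RandomZoom(factors["zoom_factor"])`; for `RandomZoom`, a single float `0.2` yields a random zoom between 20% in and 20% out, and `tf.keras.layers.Rescaling(1.0 / 255)` performs the [0, 1] normalization.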

In addition to quantitative performance measures, Principal Component Analysis (PCA) is used as a qualitative tool to investigate the structure of the learned feature space. By projecting high-dimensional features onto a lower-dimensional representation, PCA enables visual evaluation of feature separability and of the alignment between the source and target domains before and after transfer learning. To enhance the interpretability of the proposed model and examine its decision-making behavior, Gradient-weighted Class Activation Mapping (Grad-CAM) is used to produce visual explanations of the network’s predictions. Grad-CAM highlights the class-discriminative regions of the input images, allowing us to check whether the model concentrates on defect regions or is distracted by spurious background cues.
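Grad-CAM combines a convolutional layer's activation maps $A^{k}$ with the gradients of the class score $y^{c}$: each channel is weighted by its spatially averaged gradient, the weighted maps are summed, and a ReLU keeps only positive evidence. The sketch below performs this combination step on plain nested lists; in TensorFlow the gradients would typically be obtained with `tf.GradientTape`, and upsampling the map to the 224 × 224 input is omitted.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heatmap for one convolutional layer.

    activations: K x H x W feature maps A^k of the chosen layer.
    gradients:   K x H x W partial derivatives of the class score y^c w.r.t. A^k.
    Returns an H x W map scaled to [0, 1] (all zeros if there is no positive evidence).
    """
    k, h, w = len(activations), len(activations[0]), len(activations[0][0])
    # alpha_k^c: global-average-pooled gradient of each channel.
    weights = [sum(map(sum, g)) / (h * w) for g in gradients]
    # ReLU(sum_k alpha_k^c * A^k): keep only features that raise the class score.
    cam = [[max(0.0, sum(weights[c] * activations[c][i][j] for c in range(k)))
            for j in range(w)] for i in range(h)]
    peak = max(map(max, cam))
    return [[v / peak for v in row] for row in cam] if peak > 0 else cam


# Toy layer with two channels: channel 0 supports the class, channel 1 opposes it.
acts = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(acts, grads)
```

Only the location where the class-supporting channel fires is highlighted; the location driven by the opposing channel is clipped to zero by the ReLU.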

4. Experimental Results

4.1. Experimental Configuration

All experiments were implemented in TensorFlow 2.13 and run on a machine equipped with an NVIDIA RTX GPU (16 GB VRAM). The metal and aircraft datasets were preprocessed identically, resized to 224 × 224 × 3 pixels, and split into training, validation, and test subsets.

Training Parameters:

Batch Size: 32
Loss Function: Categorical Cross-Entropy
Input Shape: 224 × 224 × 3
Classifier Head: Global Average Pooling followed by a Dense Softmax layer (3-class output)

Training Callbacks:

EarlyStopping with patience of 5 epochs
ReduceLROnPlateau (factor = 0.5, patience = 3)
ModelCheckpoint for saving the best performing model based on validation loss
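The interaction between these callbacks can be seen by replaying a validation-loss history. The sketch below is a simplified re-implementation, not Keras itself: it assumes both EarlyStopping and ReduceLROnPlateau monitor validation loss (the Keras default), and it uses zero `min_delta`, no cooldown, and no minimum learning rate. The initial learning rate and the loss history are hypothetical.

```python
def simulate_callbacks(val_losses, lr=1e-3, es_patience=5, rlr_patience=3, factor=0.5):
    """Replay a validation-loss history through simplified callback logic.

    Mimics EarlyStopping(patience=5) and ReduceLROnPlateau(factor=0.5, patience=3)
    on val_loss. Returns (epochs_run, lr_per_epoch, best_epoch); best_epoch is the
    epoch whose weights a best-only ModelCheckpoint on val_loss would hold.
    """
    best, best_epoch = float("inf"), -1
    es_wait = rlr_wait = 0
    lrs = []
    for epoch, loss in enumerate(val_losses):
        lrs.append(lr)                    # learning rate used during this epoch
        if loss < best:                   # improvement: reset both counters
            best, best_epoch = loss, epoch
            es_wait = rlr_wait = 0
            continue
        es_wait += 1
        rlr_wait += 1
        if rlr_wait >= rlr_patience:      # plateau: shrink the LR for the next epoch
            lr *= factor
            rlr_wait = 0
        if es_wait >= es_patience:        # stop after `es_patience` stale epochs
            return epoch + 1, lrs, best_epoch
    return len(val_losses), lrs, best_epoch


# Hypothetical history: improvement until epoch 1, then a plateau.
history = [1.0, 0.9, 0.95, 0.96, 0.97, 0.98, 0.99, 0.85]
epochs_run, lrs, best_epoch = simulate_callbacks(history)
```

With this history, the learning rate is halved after three epochs without improvement, training stops after five, and the checkpoint holds the epoch-1 weights; the later improvement at epoch 7 is never reached, which illustrates how the patience values bound the search.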

To ensure reproducibility and transparency, all code, preprocessed datasets, and trained model checkpoints have been archived and are intended for public release upon publication.

4.2. Evaluation Metrics

A range of complementary metrics is used to evaluate both in-domain classification performance and cross-domain generalization. They are reported separately for zero-shot evaluation and after transfer fine-tuning, capturing different aspects of predictive reliability.

Primary Metrics:

Accuracy: Represents the proportion of total correct predictions. While commonly reported, it may obscure model weaknesses on minority classes. It is defined as:

$Accuracy = \frac{\sum_{i = 1}^{C} T P_{i}}{N}$

(2)

where $T P_{i}$ denotes the number of true positives for class i, C is the number of classes (here $C = 3$ ), and N is the total number of samples.
F1-Score: Represents the harmonic mean of precision and recall, making it effective for evaluating performance on hard-to-separate or ambiguous defect categories:

${F 1}_{i} = \frac{2 \times {Precision}_{i} \times {Recall}_{i}}{{Precision}_{i} + {Recall}_{i}}$

(3)

where

${Precision}_{i} = \frac{T P_{i}}{T P_{i} + F P_{i}}, {Recall}_{i} = \frac{T P_{i}}{T P_{i} + F N_{i}}$

(4)
Macro F1-Score: Computes the F1-score independently for each class and then averages them, treating all classes equally and highlighting performance on underrepresented categories:

$Macro - F 1 = \frac{1}{C} \sum_{i = 1}^{C} {F 1}_{i}$

(5)
Cohen’s Kappa ($κ$): Measures the agreement between predicted and true labels, corrected for chance, providing an additional view of prediction reliability under domain shift:

$κ = \frac{p_{o} - p_{e}}{1 - p_{e}}$

(6)

where $p_{o}$ is the observed agreement and $p_{e}$ is the expected agreement by chance, computed from the class-wise marginal probabilities.
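Equations (2)-(6) can all be computed from a single confusion matrix. The sketch below uses the equivalent form F1 = 2TP/(2TP + FP + FN) of Equation (3) and assigns an F1 of 0 to a class with no true or predicted samples, which is a convention choice.

```python
def classification_metrics(cm):
    """cm[i][j] = number of samples of true class i predicted as class j."""
    c = len(cm)
    n = sum(map(sum, cm))
    row = [sum(cm[i]) for i in range(c)]                        # true-class totals
    col = [sum(cm[i][j] for i in range(c)) for j in range(c)]   # predicted totals
    tp = [cm[i][i] for i in range(c)]
    accuracy = sum(tp) / n                                      # Eq. (2)
    f1 = []
    for i in range(c):
        fp, fn = col[i] - tp[i], row[i] - tp[i]
        denom = 2 * tp[i] + fp + fn
        f1.append(2 * tp[i] / denom if denom else 0.0)          # Eqs. (3)-(4)
    macro_f1 = sum(f1) / c                                      # Eq. (5)
    p_o = accuracy
    p_e = sum(row[i] * col[i] for i in range(c)) / (n * n)      # chance agreement
    kappa = (p_o - p_e) / (1 - p_e) if p_e != 1 else 0.0        # Eq. (6)
    return accuracy, f1, macro_f1, kappa


# Hypothetical degenerate classifier: every aircraft test image predicted as dent
# (rows/columns ordered crack, dent, rust; 496 / 568 / 605 test images).
degenerate = [[0, 496, 0], [0, 568, 0], [0, 605, 0]]
acc, f1, macro_f1, kappa = classification_metrics(degenerate)
print(f"accuracy={acc:.3f}, macro-F1={macro_f1:.3f}, kappa={kappa:.3f}")
```

For this hypothetical dent-only classifier, accuracy is about 34.0%, macro-F1 about 0.17, and κ is 0, since observed agreement equals chance agreement. These values are close to the zero-shot figures reported in Section 4.3, which illustrates how a collapse onto one class drives κ to chance level even when accuracy looks non-trivial.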

These metrics provide a multi-faceted evaluation of the model’s classification accuracy and its robustness to domain variability, both of which are essential in real-world deployment scenarios such as aircraft maintenance or industrial quality control. Table 2 summarizes the quantitative comparison between zero-shot evaluation and transfer learning performance across standard classification metrics.

4.3. Zero-Shot Evaluation: Evidence of Domain Shift

To simulate real-world deployment without prior exposure to the target domain, a zero-shot evaluation was performed in which the model trained exclusively on the metal defect dataset was tested directly on aircraft defect images without any fine-tuning. As expected, performance deteriorated significantly because of domain shift:

Accuracy decreased to 35.40%
Macro F1-Score fell to 0.20
Cohen’s Kappa was approximately 0.02, indicating near-random agreement

Interpretation: A class-wise analysis shows a strong prediction bias toward the dent class, while crack and rust defects were either misclassified or missed entirely, suggesting that the features learned during pretraining failed to generalize to the target domain. Such a collapse indicates that the learned representations do not transfer across domains with distinct visual characteristics. Differences in surface texture, defect morphology, and lighting conditions between the metal and aircraft datasets likely disrupted internal feature alignment, exacerbating the domain-induced degradation.

The bar plot in Figure 4 illustrates the stark contrast between zero-shot and transfer learning performance. While the pretrained model performs poorly on unseen aircraft data (Accuracy: 35.00%, Macro F1: 0.20, Cohen’s Kappa: 0.02), fine-tuning on the target domain significantly boosts all metrics, achieving 86.45% accuracy, a macro F1-score of 0.85, and a Cohen’s Kappa of 0.78, which highlights the critical role of domain adaptation.

4.4. Transfer Learning Results

Following the application of transfer learning on the aircraft defect dataset, the model demonstrates a prominent performance boost, confirming the efficacy of domain adaptation. Fine-tuning the pretrained weights with a limited set of labeled aircraft images led to marked improvements across all key evaluation metrics.

Overall Performance:

Validation Accuracy: 86.45%
Macro F1-Score: 0.85
Weighted F1-Score: 0.85
Cohen’s Kappa: 0.78

These results reflect a high degree of prediction consistency and strong generalization to the target domain, which is especially notable given the sharp performance degradation observed in the zero-shot setting.

Combined with the substantial improvement in Macro F1-score, this evidence indicates that the model effectively recalibrates its internal representations through low-shot adaptation, aligning them with the semantic structure of the target domain.

The learning curves in Figure 5 show a progressive decline in validation loss accompanied by a steady increase in accuracy during the fine-tuning phase. This trend indicates that the optimization process is stable and that the model converges effectively under transfer learning.

To better understand the model’s behavior across individual defect categories, Table 3 presents the per-class precision, recall, and F1-scores.

Table 4 summarizes the overall performance metrics, reinforcing the effectiveness of the proposed framework following transfer learning.
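As a quick consistency check between the two tables, each per-class F1 in Table 3 is the harmonic mean of that class’s precision $P_c$ and recall $R_c$, and the macro-F1 in Table 4 is their unweighted mean over the $C = 3$ classes:

```latex
\begin{aligned}
\mathrm{F1}_c &= \frac{2\,P_c R_c}{P_c + R_c}\\
\mathrm{F1}_{\text{crack}} &= \frac{2(0.89)(0.86)}{0.89 + 0.86} \approx 0.87, \quad
\mathrm{F1}_{\text{dent}} = \frac{2(0.85)(0.88)}{0.85 + 0.88} \approx 0.86, \quad
\mathrm{F1}_{\text{rust}} = \frac{2(0.81)(0.83)}{0.81 + 0.83} \approx 0.82\\
\text{Macro-F1} &= \frac{1}{C}\sum_{c=1}^{C} \mathrm{F1}_c = \frac{0.87 + 0.86 + 0.82}{3} = 0.85
\end{aligned}
```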

4.5. Confusion Matrix and Visualizations

To better understand the model’s class-wise performance and its ability to differentiate features under domain shift, we use confusion matrix analysis to assess classification reliability and feature-space visualizations to assess feature separability.

Following the application of transfer learning, the class-wise F1-score comparison shown in Figure 6 highlights the differential improvement across defect categories.

The confusion matrix generated on the aircraft test set (Figure 7) provides a granular view of classification performance.

The rust class exhibits near-perfect classification, with 593 of 605 instances correctly identified.
The crack and dent classes show moderate confusion, particularly crack instances misclassified as dent.

The higher classification accuracy observed for the rust class can be attributed to its distinctive visual characteristics. Rust defects typically exhibit consistent reddish-brown color patterns and textured corrosion regions, which provide strong chromatic and appearance cues for the network. The confusion between crack and dent likely stems from visual similarities, such as shared linear edge textures or surface discontinuities, which reduce separability without high-resolution localization.

Despite these challenges, the confusion matrix indicates that the model has learned robust inter-class distinctions after adaptation.

5. Qualitative Interpretation

5.1. PCA: Feature Embedding Evolution

To analyse how the model’s internal representations evolve under domain shift, Principal Component Analysis (PCA) is employed as a qualitative interpretability tool on feature embeddings extracted from the penultimate layer (after Global Average Pooling). These embeddings capture the high-level semantic information used for classification. The PCA visualization in Figure 8 shows that post-adaptation embeddings become more compact and separable for each defect class, indicating effective domain alignment and improved cross-domain generalization.

(a) Zero-Shot Scenario:

Significant cluster overlap across crack, dent, and rust classes.
Absence of clear decision boundaries, indicating poor domain-invariant feature learning.
Evidence of feature collapse, a phenomenon typical of severe domain shift.

(b) Post Transfer Learning:

PCA reveals three well-separated clusters, each aligned with a defect class.
Increased intra-class compactness and improved inter-class separation.
Strong evidence that the model successfully adapts to the target domain.
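The PCA step described above can be sketched as follows, under stated assumptions: embeddings are the rows of a matrix (one per image), PCA is computed by SVD of the mean-centred matrix, and a between-/within-class scatter ratio is added as a hypothetical numeric summary of compactness and separability (the paper reports PCA plots, not this ratio). The random embeddings stand in for real penultimate-layer features; 1280 is the pooled width of a standard MobileNetV2 and may differ from the paper’s model.

```python
import numpy as np


def pca_project(features, n_components=2):
    """Project row-wise feature vectors onto their top principal components."""
    centered = features - features.mean(axis=0, keepdims=True)
    # Right singular vectors of the centred data are the principal axes.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return centered @ vt[:n_components].T, explained[:n_components]


def scatter_ratio(points, labels):
    """Between-class over within-class scatter (higher = more separable)."""
    overall = points.mean(axis=0)
    within = between = 0.0
    for c in np.unique(labels):
        cls = points[labels == c]
        mu = cls.mean(axis=0)
        within += np.sum((cls - mu) ** 2)
        between += len(cls) * np.sum((mu - overall) ** 2)
    return between / within


rng = np.random.default_rng(0)
dim, per_class = 1280, 100  # assumed embedding width and class size
labels = np.repeat(np.arange(3), per_class)

# Hypothetical "zero-shot" embeddings: all classes share one distribution.
mixed = rng.normal(size=(3 * per_class, dim))
# Hypothetical "post-adaptation" embeddings: each class has its own mean.
class_means = rng.normal(scale=0.5, size=(3, dim))
separated = class_means[labels] + rng.normal(size=(3 * per_class, dim))

for name, feats in [("zero-shot", mixed), ("adapted", separated)]:
    proj, var = pca_project(feats)
    print(name, proj.shape, np.round(var, 3), round(scatter_ratio(proj, labels), 2))
```

A higher ratio for the adapted embeddings corresponds to the tighter, better-separated clusters described for the post-transfer projection.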

Figure 6 additionally presents the class-wise F1-score comparison between zero-shot evaluation and post-transfer learning on the aircraft dataset. The most noticeable improvement is observed for crack and corrosion-related (rust) patterns, which are highly sensitive to domain-specific texture variations, suggesting that the proposed transfer-aware regularization is effective in improving cross-domain generalization.

To further examine the extent of domain shift and the need for adaptation, we performed a cross-domain PCA on the penultimate-layer embeddings obtained from the model trained solely on the metal dataset (i.e., before transfer learning). This projection includes features from both domains:

Blue points: Metal surface defects (source domain)
Red points: Aircraft surface defects (target domain)

In this setting, PCA serves as an interpretive tool for exploring feature representations across domains: by projecting the high-dimensional representations of both the source and target datasets into a shared low-dimensional space, it gives an intuitive view of the underlying domain shift. Before transfer learning, the visible separation between source and target features reflects the source-domain bias of the learned representations; fine-tuning reduces this gap and increases semantic alignment between the domains, and the post-adaptation projection shows well-separated defect clusters. This suggests that the proposed framework supports the acquisition of domain-invariant yet defect-discriminative representations, which helps explain its cross-domain generalization performance.

Taken together, the confusion matrix, within-domain PCA, and cross-domain PCA provide consistent evidence of domain shift and of the model’s successful adaptation through fine-tuning. These visualizations support the role of CBAM and Progressive TDropBlock in:

Improving class-level defect classification under domain shift.
Producing compact, well-structured feature embeddings in which defect classes are semantically separable.
Increasing robustness to limited supervision in the target domain, which supports generalization in cross-domain inspection.
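For the cross-domain view, the key detail is that a single PCA is fitted on the pooled source and target features, so both domains are projected onto the same axes. The sketch below assumes this; the synthetic Gaussian features, the constant domain offset, and the centroid-gap statistic `domain_gap` are hypothetical illustrations, not quantities reported in the paper.

```python
import numpy as np


def shared_pca(source, target, n_components=2):
    """Fit one PCA on the pooled source+target features and project both domains."""
    pooled = np.vstack([source, target])
    mean = pooled.mean(axis=0)
    _, _, vt = np.linalg.svd(pooled - mean, full_matrices=False)
    axes = vt[:n_components].T
    return (source - mean) @ axes, (target - mean) @ axes


def domain_gap(src_2d, tgt_2d):
    """Distance between domain centroids, scaled by the pooled spread (hypothetical metric)."""
    gap = np.linalg.norm(src_2d.mean(axis=0) - tgt_2d.mean(axis=0))
    spread = np.sqrt(np.vstack([src_2d, tgt_2d]).var(axis=0).sum())
    return gap / spread


rng = np.random.default_rng(1)
dim = 256                          # hypothetical embedding width
source = rng.normal(size=(200, dim))
shift = np.full(dim, 0.3)          # hypothetical domain offset
target_before = rng.normal(size=(200, dim)) + shift       # strong shift
target_after = rng.normal(size=(200, dim)) + 0.1 * shift  # shift reduced by adaptation

for name, tgt in [("before", target_before), ("after", target_after)]:
    s2, t2 = shared_pca(source, tgt)
    print(name, round(domain_gap(s2, t2), 3))
```

Fitting PCA separately per domain would give each domain its own axes and hide the shift, which is why the projection uses shared axes.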

5.2. Interpretability via Grad-CAM

To enhance model interpretability and explain misclassification behaviour, Grad-CAM heatmaps (Figure 9) were generated for both correctly classified and misclassified samples across the defect classes. Grad-CAM is used here not only as a visualization tool but as an interpretability tool to investigate how the proposed CBAM-TDropBlock framework affects feature attention under cross-domain conditions: comparing the activation maps of correctly and incorrectly labeled samples reveals whether the network concentrates on defect-relevant areas or on background, domain-specific patterns.

For correctly predicted cases, high-activation zones (red/yellow) aligned closely with the true defect regions, indicating that the model learned spatially meaningful features. In contrast, misclassified samples frequently displayed scattered or distorted attention, concentrating on shadows or background textures instead of localized defect cues. This visual evidence helps explain the persistent confusion between visually similar classes such as cracks and dents. In some incorrect instances, attention still partially covered the defect area, suggesting that transfer learning provided a useful inductive bias, although additional fine-tuning or class-specific augmentation may be needed. Overall, the analysis provides qualitative evidence that the proposed design promotes more discriminative and transferable feature learning, which contributes to the model’s robustness and its generalization to other visually dissimilar inspection domains; these attention maps also improve the transparency and dependability of the model in real-world aircraft maintenance workflows.
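For reference, the standard Grad-CAM computation (Selvaraju et al.) weights each feature map of a chosen convolutional layer by the spatially averaged gradient of the class score and keeps the positive part of the weighted sum. The NumPy sketch below assumes the activations and gradients have already been extracted from the network (how they are obtained depends on the deep learning framework and is not shown); the toy 4-channel, 7×7 layer is hypothetical.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Grad-CAM map for one convolutional layer.

    activations, gradients: arrays of shape (K, H, W) holding the feature maps A^k
    and the gradients dy^c/dA^k of the target class score y^c.
    """
    weights = gradients.mean(axis=(1, 2))             # alpha_k: global-average-pooled gradients
    cam = np.tensordot(weights, activations, axes=1)  # sum_k alpha_k * A^k -> (H, W)
    cam = np.maximum(cam, 0.0)                        # ReLU keeps positive evidence only
    peak = cam.max()
    return cam / peak if peak > 0 else cam            # scale to [0, 1] for overlay


# Hypothetical layer where channel 0 fires on a "defect" at position (2, 5).
rng = np.random.default_rng(0)
acts = rng.uniform(0.0, 0.1, size=(4, 7, 7))
acts[0, 2, 5] = 1.0
grads = np.zeros_like(acts)
grads[0] = 0.5   # the class score increases with channel 0
grads[1] = -0.2  # and decreases with channel 1

heatmap = grad_cam(acts, grads)
print(np.unravel_index(heatmap.argmax(), heatmap.shape))
```

In practice the map is upsampled to the input resolution and overlaid on the image as a heatmap.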

5.3. Ablation Study

To quantitatively assess the individual and combined contributions of the Convolutional Block Attention Module (CBAM) and the Progressive Transfer DropBlock (TDropBlock), a comprehensive ablation study was performed; the results are summarised in Table 5. The analysis shows how each component, and their combination, contributes to the model’s ability to generalize under domain shift.

5.4. Key Observations

Variant A (CBAM + TDropBlock): Variant A achieved the strongest results across all evaluation metrics, with 91.06% accuracy, a macro F1-score of 0.9122, and a Cohen’s Kappa of 0.866. These results suggest that combining attention with adaptive dropout creates a strong synergy, especially when the model is evaluated across domains.
Variant B (TDropBlock only): This variant falls below the baseline on every metric (63.01% accuracy, macro F1-score of 0.5906, Kappa of 0.445), indicating that dropout-based regularization alone, although useful as a regularizer, is not sufficient to capture defect-specific context.
Variant C (CBAM only): Variant C shows marked improvements over the baseline (86.18% vs. 69.92% accuracy; macro F1-score of 0.8623 vs. 0.6915), substantiating the role of attention in enhancing class-wise discrimination. The model benefits from refined localization of defect regions, which improves feature saliency for subtle surface anomalies.
Variant D (Baseline, MobileNetV2 only): The plain backbone reaches 69.92% accuracy, a macro F1-score of 0.6915, and a Kappa of 0.549, well below both CBAM-equipped variants, reflecting its limited capacity to handle domain shift without attention-based feature refinement.
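The complementarity can be made explicit with simple differences in accuracy from Table 5 (in percentage points). Adding TDropBlock to the CBAM model (A vs. C) helps, whereas adding it to the plain backbone (B vs. D) hurts; the interaction term below is a descriptive calculation on the reported numbers, not a statistical test.

```latex
\begin{aligned}
\Delta_{\mathrm{CBAM}} &= \mathrm{Acc}_C - \mathrm{Acc}_D = 86.18 - 69.92 = +16.26\\
\Delta_{\mathrm{TDB}\mid\mathrm{CBAM}} &= \mathrm{Acc}_A - \mathrm{Acc}_C = 91.06 - 86.18 = +4.88\\
\Delta_{\mathrm{TDB}\mid\neg\mathrm{CBAM}} &= \mathrm{Acc}_B - \mathrm{Acc}_D = 63.01 - 69.92 = -6.91\\
\text{Interaction} &= \Delta_{\mathrm{TDB}\mid\mathrm{CBAM}} - \Delta_{\mathrm{TDB}\mid\neg\mathrm{CBAM}} = 4.88 - (-6.91) = 11.79
\end{aligned}
```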

Overall, these results indicate that CBAM and TDropBlock contribute distinct yet complementary advantages. While CBAM improves spatial and channel-wise feature localization, TDropBlock enforces structured regularization that promotes generalizable learning, and its benefit emerges when it is combined with attention (Variant A vs. Variant C). Their combination enables the model both to differentiate between subtle classes and to maintain robustness under cross-domain settings.
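To make the channel and spatial refinement concrete, the sketch below follows the published CBAM formulation (Woo et al. [10]): a shared two-layer MLP over average- and max-pooled channel descriptors, then a 7×7 convolution over the channel-wise average and max maps, each gated by a sigmoid. It is a NumPy illustration with random weights and no bias terms; the reduction ratio, the feature-map size, and where CBAM sits inside MobileNetV2 are assumptions, not the paper’s configuration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(x, w0, w1):
    """x: (C, H, W). Shared MLP (C -> C/r -> C) on avg- and max-pooled descriptors."""
    def mlp(v):
        return w1 @ np.maximum(w0 @ v, 0.0)
    return sigmoid(mlp(x.mean(axis=(1, 2))) + mlp(x.max(axis=(1, 2))))  # (C,)


def spatial_attention(x, kernel):
    """kernel: (2, k, k) applied to [channel-avg; channel-max] maps with 'same' padding."""
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    h, w = x.shape[1:]
    logits = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            logits[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(logits)  # (H, W)


def cbam(x, w0, w1, kernel):
    x = x * channel_attention(x, w0, w1)[:, None, None]  # reweight channels ("what")
    return x * spatial_attention(x, kernel)[None]        # reweight positions ("where")


rng = np.random.default_rng(0)
c, r = 32, 8                                    # channels and reduction ratio (hypothetical)
feat = rng.uniform(size=(c, 7, 7))              # stand-in non-negative feature map
w0 = rng.normal(scale=0.1, size=(c // r, c))
w1 = rng.normal(scale=0.1, size=(c, c // r))
kernel = rng.normal(scale=0.1, size=(2, 7, 7))  # 7x7 spatial kernel, as in CBAM
out = cbam(feat, w0, w1, kernel)
print(out.shape)
```

Because both gates lie in (0, 1), CBAM only rescales activations downward; with non-negative inputs the output never exceeds the input elementwise.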

Table 6 presents a comparison of lightweight architectures evaluated under identical training and evaluation conditions. The proposed framework improves architectures with sufficient representational capacity, particularly MobileNetV2 [32] and NASNetMobile [33], both of which gain in accuracy, macro-F1, and Cohen’s κ (substantially for MobileNetV2, whose accuracy rises from 0.699 to 0.911, and modestly for NASNetMobile, from 0.820 to 0.830). In contrast, extremely compact architectures exhibit limited or inconsistent benefits: MobileNetV3-Small remains at chance level (accuracy 0.340, κ close to 0) in both configurations, and SqueezeNet [34] attains slightly higher accuracy and κ in its baseline configuration than in its proposed variant (0.670 vs. 0.660 and 0.494 vs. 0.481). This SqueezeNet difference is marginal, however, and is accompanied by a substantial increase in macro-F1 (0.573 vs. 0.638). Moreover, both SqueezeNet configurations perform substantially worse than MobileNetV2 integrated with the proposed framework, which achieves markedly higher and more stable performance across all metrics. This behavior is consistent with SqueezeNet’s design objective of aggressively reducing parameters and model size, which inherently constrains representational capacity and limits the effective integration of additional modules. Consequently, SqueezeNet’s marginal accuracy advantage does not translate into robust or balanced performance, and model selection based on overall reliability rather than isolated accuracy favors architectures such as MobileNetV2 combined with the proposed framework.

5.5. Discussion

Results from the ablation study (Figure 10) highlight the complementary roles of CBAM and TDropBlock in cross-domain defect classification: (i) CBAM enhances defect localization by improving spatial attention, producing a higher macro F1-score; (ii) TDropBlock improves generalization by enforcing spatial regularization, raising macro F1 and Cohen’s Kappa when added to the CBAM-equipped model (Variant A vs. Variant C in Table 5); and (iii) their combination (Variant A) delivers the best overall accuracy, confirming the synergy between attention and adaptive dropout.

MobileNetV2 enhanced with CBAM-TDropBlock improved accuracy from 35% (zero-shot) to 86% after transfer, while Cohen’s Kappa increased from 0.02 to 0.78, highlighting the benefit of limited target-domain supervision. CBAM helped the network focus on subtle defect cues, such as crack edges and surface textures, whereas TDropBlock progressively suppressed dominant activations via inverted saliency masks, encouraging exploration of alternative discriminative regions and reducing overfitting.

Residual confusion between crack and dent likely arises from their similar elongated edges and low-contrast textures; adding class-specific or multi-scale attention heads may help disentangle them. TDropBlock adds negligible computational cost because it relies on depthwise convolutions, making the framework practical for real-time inspection on embedded devices such as drones or edge cameras.

In addition, the cross-domain evaluation with different lightweight backbones (Table 6) shows clear gains for backbones with sufficient representational capacity (MobileNetV2 and NASNetMobile), indicating that the improvement stems from the attention-guided regularization rather than from the choice of backbone alone, provided the backbone has adequate capacity; for extremely compact backbones (MobileNetV3-Small and SqueezeNet), the benefits are limited or inconsistent.

Although the method performs reliably under moderate domain differences, performance may degrade under extreme shifts (e.g., visible-to-thermal transitions or severe class imbalance). Future extensions could incorporate unsupervised adaptation, adversarial learning, or domain-invariant constraints to improve robustness. Finally, a direct quantitative comparison with prior studies is limited, as the aircraft defect dataset used in this study is not publicly available.
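Algorithm 1 is not reproduced in this section, so the following is only an assumption-laden sketch of the two ideas described here, not the authors’ TDropBlock: standard DropBlock (Ghiasi et al. [11]) with a drop probability that grows linearly over training, plus one possible saliency weighting in which seeds are drawn preferentially at high-saliency positions so that dominant regions are suppressed. The function names, the schedule parameters, and the saliency weighting are hypothetical, and the depthwise-convolution implementation mentioned above is not modelled.

```python
import numpy as np


def linear_drop_prob(step, total_steps, final_prob=0.1):
    """Hypothetical linear schedule: 0 at step 0, rising to final_prob."""
    return final_prob * min(step / max(total_steps, 1), 1.0)


def saliency_dropblock(x, drop_prob, block_size, saliency=None, rng=None):
    """DropBlock on a (C, H, W) feature map with one mask shared by all channels.

    If `saliency` (H, W) is given, seed probabilities are scaled in proportion to
    it, so high-saliency (dominant) regions are dropped more often. This weighting
    is an assumed reading of "inverted saliency masks".
    """
    if drop_prob <= 0.0:
        return x
    rng = np.random.default_rng() if rng is None else rng
    _, h, w = x.shape
    # Seed rate from the DropBlock paper, generalized to non-square maps.
    gamma = (drop_prob / block_size**2) * (h * w) / (
        (h - block_size + 1) * (w - block_size + 1))
    if saliency is None:
        p = np.full((h, w), gamma)
    else:
        p = np.clip(gamma * saliency / (saliency.mean() + 1e-8), 0.0, 1.0)
    # Only positions whose whole block fits inside the map may become seeds.
    half = block_size // 2
    valid = np.zeros((h, w), dtype=bool)
    valid[half:h - block_size + half + 1, half:w - block_size + half + 1] = True
    seeds = (rng.random((h, w)) < p) & valid
    # Zero a block_size x block_size square around every seed.
    mask = np.ones((h, w))
    for i, j in zip(*np.nonzero(seeds)):
        mask[i - half:i - half + block_size, j - half:j - half + block_size] = 0.0
    kept = mask.sum()
    if kept == 0:
        return np.zeros_like(x)
    # Normalize by the kept fraction, as in DropBlock.
    return x * mask[None] * (mask.size / kept)


# Usage: halfway through training, with a single dominant activation at (4, 4).
x = np.ones((2, 8, 8))
sal = np.zeros((8, 8))
sal[4, 4] = 1.0
p = linear_drop_prob(step=50, total_steps=100, final_prob=0.2)  # 0.1
out = saliency_dropblock(x, p, block_size=3, saliency=sal,
                         rng=np.random.default_rng(0))
```

With a single dominant activation at the centre, the sketch drops the 3×3 block around it and rescales the remaining 55 positions by 64/55, so the total activation of each channel is unchanged.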

5.6. Conclusions and Future Work

This study proposed an Adaptive Attention DropBlock framework for domain-adaptive defect classification. By combining CBAM-based spatial attention with the novel TDropBlock regularizer inside a lightweight MobileNetV2 backbone, the model achieved large gains in cross-domain accuracy (from 35% to 86%) and reliability (Cohen’s Kappa of 0.78). Visualization through PCA and Grad-CAM indicated enhanced feature separation and focused defect localization. The framework integrates adaptive attention-guided dropout with progressive depth scheduling, an approach that, to the authors’ knowledge, has not previously been explored for real-time defect classification. This design increases spatial diversity, limits overfitting, and supports interpretable, edge-deployable performance. Remaining challenges include disambiguating visually similar defect types and extending adaptability to wider modality gaps. Future work will pursue unsupervised or adversarial adaptation and adaptive regularization to strengthen domain invariance and broaden industrial applicability; alternatives to the linear growth schedule used in Algorithm 1 could also be investigated.

Author Contributions

S.P. carried out the conceptualization, methodology design, software implementation, experimentation, data curation, and preparation of the original manuscript draft. R.K. supervised the study, validated the results, provided critical review and editing of the manuscript, and guided the overall research direction. H.G. contributed to the visualisation of experimental results, qualitative analysis, and critical review of the manuscript for technical clarity and consistency. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.

Acknowledgments

The authors thank the Alliance School of Advanced Computing, Alliance University, for research infrastructure and institutional support.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AADB	Adaptive Attention DropBlock
CBAM	Convolutional Block Attention Module
CNN	Convolutional Neural Network
TDropBlock	Progressive Transfer DropBlock
DL	Deep Learning
ML	Machine Learning
TL	Transfer Learning
PCA	Principal Component Analysis
F1	F1-Score
GPU	Graphics Processing Unit

References

Hütten, N.; Gomes, M.A.; Hölken, F.; Andricevic, K.; Meyes, R.; Meisen, T. Deep Learning for Automated Visual Inspection in Manufacturing and Maintenance: A Survey of Open-Access Papers. Appl. Syst. Innov. 2024, 7, 11.
Kacprzynski, G.J. Sensor/Model Fusion for Adaptive Prognosis of Structural Corrosion Damage; Defense Technical Information Center: Fort Belvoir, VA, USA, 2006.
Ulus, Ö.; Davarcı, F.E.; Gültekin, E.E. Non-destructive testing methods commonly used in aviation. Int. J. Aeronaut. Astronaut. 2024, 5, 10–22.
Li, X.; Li, F.; Yang, H.; Wang, P. Deep Learning-Enabled Visual Inspection of Gap Spacing in High-Precision Equipment: A Comparative Study. Machines 2025, 13, 74.
Mienye, I.D.; Swart, T.G.; Obaido, G.; Jordan, M.; Ilono, P. Deep Convolutional Neural Networks: A Comprehensive Review. Preprints 2024.
Zhang, S.; Su, L.; Li, K.; Li, F. A Semantic Transferable Feature Extraction Strategy for Cross-Domain Detection. IEEE Trans. Ind. Inform. 2025, 10, 7392–7402.
Zhang, S.; Su, L.; Gu, J.; Li, K.; Wu, W.; Pecht, M. Category-level selective dual-adversarial network using significance-augmented unsupervised domain adaptation for surface defect detection. Expert Syst. Appl. 2023, 238, 121879.
Gerschner, F.; Paul, J.; Schmid, L.; Barthel, N.; Gouromichos, V.; Schmid, F.; Atzmueller, M.; Theissler, A. Domain Transfer for Surface Defect Detection Using Few-Shot Learning on Scarce Data. In Proceedings of the IEEE International Conference on Industrial Informatics (INDIN), Lemgo, Germany, 18–20 July 2023; pp. 1–7.
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30.
Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
Ghiasi, G.; Lin, T.Y.; Le, Q.V. DropBlock: A Regularization Method for Convolutional Networks. arXiv 2018, arXiv:1810.12890.
Ding, K.; Niu, Z.; Hui, J.; Zhou, X.; Chan, F.T.S. A Weld Surface Defect Recognition Method Based on Improved MobileNetV2 Algorithm. Mathematics 2022, 10, 3678.
Liu, Z.; Cui, J.; Li, C.; Ding, S.; Xu, Q. Real-time Fabric Defect Detection based on Lightweight Convolutional Neural Network. In Proceedings of the ACM International Conference on Artificial Intelligence and Computer Engineering (ICAICE), Beijing, China, 23–25 October 2019; pp. 122–127.
Huang, L.; Zhang, S.; Li, R.; Cai, X.; Zhang, S.; Cao, H. Weld defect detection based on improved multi-scale CNN with CBAM attention. In Proceedings of the International Conference on Intelligent Image Processing (ICIIP), Bucharest, Romania, 21–22 November 2023.
Qin, H.; Pan, J.; Li, J.; Huang, F. Fault diagnosis method of rolling bearing based on CBAM_ResNet and ACON activation function. Appl. Sci. 2023, 13, 7593.
Xie, X.; Xu, L.; Li, X.; Wang, B.; Wan, T. A high-effective multitask surface defect detection method based on CBAM and atrous convolution. J. Adv. Mech. Des. Syst. Manuf. 2022, 16, JAMDSM0063.
Ke, Z.; Wen, Z.; Xie, W.; Wang, Y.; Shen, L. Group-wise dynamic dropout based on latent semantic variations. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 11229–11236.
Cao, X.; Wen, G.; Xie, J.; Guo, X.; Tang, B.; Chen, X. Domain-adaptive intelligence for fault diagnosis based on deep transfer learning from scientific test rigs to industrial applications. Neural Comput. Appl. 2021, 33, 16627–16644.
Zeng, Y.; Dai, T.; Chen, B.; Xia, S.T. Correlation-based structural dropout for convolutional neural networks. Pattern Recognit. 2021, 116, 108117.
Nguyen, K.B.; Choi, J.; Yang, J.S. Checkerboard dropout: A structured dropout with checkerboard pattern. IEEE Access 2022, 10, 76044–76054.
Chen, Z.; He, G.; Li, J.; Liao, Y.; Gryllias, K.; Li, W. Domain adversarial transfer network for cross-domain fault diagnosis of rotary machinery. IEEE Trans. Instrum. Meas. 2020, 69, 8702–8712.
Yang, X.; Wan, S.; Jin, P. Domain-Invariant Region Proposal Network for Cross-Domain Detection. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), London, UK, 6–10 July 2020; pp. 1–6.
Pan, Y.; Ma, A.J.; Gao, Y.; Wang, J.; Lin, Y. Multi-scale Adversarial Cross-Domain Detection with Robust Discriminative Learning. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), Snowmass Village, CO, USA, 1–5 March 2020; pp. 1324–1332.
Shi, Y.; Deng, A.; Deng, M.; Li, J.; Xu, M.; Zhang, S. Domain Transferability-Based Deep Domain Generalization Method Towards Actual Fault Diagnosis Scenarios. IEEE Trans. Ind. Inform. 2023, 19, 7355–7366.
Tang, H.; Xie, Y. Deep Transfer Learning for Connection Defect Identification in Prefabricated Structures. Struct. Health Monit. 2022, 22, 2128–2146.
Zhang, Y.; Wang, Y.; Jiang, Z.; Zheng, L.; Chen, J.; Lu, J. Tire Defect Detection by Dual-Domain Adaptation-Based Transfer Learning Strategy. IEEE Sens. J. 2022, 22, 18804–18814.
Su, Z.; Zhang, J.; Tang, J.; Wang, Y.; Xu, H.; Zou, J.; Fan, S. A Novel Deep Transfer Learning Method with Inter-Domain Decision Discrepancy Minimization. Knowl. Based Syst. 2022, 259, 110065.
Tan, S.; Wang, K.; Shi, H.; Song, B. A Novel Multiview Predictive Local Adversarial Network for Partial Transfer Learning in Cross-Domain Fault Diagnostics. IEEE Trans. Instrum. Meas. 2023, 72, 3504712.
Guo, L.; Yu, Y.; Liu, Y.; Gao, H.; Chen, T. Reconstruction Domain Adaptation Transfer Network for Partial Transfer Learning of Machinery Fault Diagnostics. IEEE Trans. Instrum. Meas. 2022, 71, 3129213.
Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520.
Nimsarkar, K.R.; Sawwashere, S.; Sonekar, S.V.; Lanjewar, A.; Baig, M.M. Steel Plate Defect Detection and Classification Using Lightweight MobileNetV2 Model. In Proceedings of the IEEE International Conference on Intelligent Data Analytics for Industry and Education (IDICAIEI), Wardha, India, 29–30 November 2024; pp. 1–6.
Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861.
Zoph, B.; Vasudevan, V.; Shlens, J.; Le, Q.V. Learning Transferable Architectures for Scalable Image Recognition. arXiv 2018, arXiv:1707.07012.
Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50× fewer parameters and <0.5MB model size. arXiv 2016, arXiv:1602.07360.

Figure 1. Proposed two-phase transfer learning framework based on MobileNetV2 backbone integrating CBAM and TDropBlock modules.

Figure 2. Sample aircraft surface defect images showing representative crack, dent, and rust patterns.

Figure 3. Samples from the metal surface defect dataset showing crazing, pitted, and rolled patterns.

Figure 4. Zero-shot vs transfer learning performance across evaluation metrics.

Figure 5. Training and validation curves during the two-phase learning process.

Figure 6. Class-wise F1-score comparison between zero-shot and transfer learning on the aircraft dataset.

Figure 7. Confusion matrices on the aircraft dataset before and after transfer learning.

Figure 8. PCA projections of penultimate-layer features before and after transfer learning.

Figure 9. Grad-CAM visualizations for correctly and incorrectly classified aircraft defect samples.

Figure 10. Macro F1-score and Cohen’s Kappa across ablation variants. (A) Baseline MobileNetV2; (B) MobileNetV2 with CBAM only; (C) MobileNetV2 with TDropBlock only; (D) MobileNetV2 with combined CBAM and TDropBlock.

Table 1. Qualitative comparison of representative studies with respect to backbone architecture, attention mechanisms, regularization strategies, and their applicability to cross-domain industrial defect detection.

Study (Ref.)	Backbone	Attention Mechanism	Regularization/Dropout	Domain Adaptation
Zhang et al. [6]	MobileNetV2	No	No	No
Huang et al. [14]	Multiscale CNN	CBAM	No	No
Pan et al. [15]	CNN	CBAM	No	No
Ghiasi et al. [11]	CNN	No	DropBlock	No
Ke et al. [17]	CNN	No	Dynamic Dropout	No
Cao et al. [18]	CNN	No	No	Yes (Transfer Learning)

Table 2. Comparison of zero-shot and transfer learning performance across evaluation metrics.

Metric	Zero-Shot Evaluation	Transfer Learning
Accuracy	35.40%	86.45%
Macro F1-Score	0.20	0.85
Weighted F1	0.20	0.85
Cohen’s Kappa	0.02	0.78

Table 3. Per-class classification report on the aircraft dataset (post-transfer learning).

Class	Precision	Recall	F1-Score
Crack	0.89	0.86	0.87
Dent	0.85	0.88	0.86
Rust	0.81	0.83	0.82

Table 4. Overall evaluation metrics after transfer learning.

Metric	Value
Validation Accuracy	86.45%
Macro F1-Score	0.85
Weighted F1-Score	0.85
Cohen’s Kappa	0.78

Table 5. Ablation study results across architectural variants.

Variant	CBAM	TDropBlock	Accuracy	Macro F1	Cohen’s Kappa
A (Full)	Yes	Yes	91.06%	0.9122	0.866
B	No	Yes	63.01%	0.5906	0.445
C	Yes	No	86.18%	0.8623	0.793
D (Baseline)	No	No	69.92%	0.6915	0.549

Table 6. Cross-domain performance comparison of lightweight backbones with baseline and proposed architectures. The proposed architecture corresponds to the CBAM–Progressive TDropBlock framework described in this work.

Model	Architecture	Accuracy	Macro-F1	Cohen’s κ
MobileNetV2	Baseline	0.699	0.692	0.549
MobileNetV2	Proposed	0.911	0.912	0.866
MobileNetV3-Small	Baseline	0.340	0.170	0.000
MobileNetV3-Small	Proposed	0.340	0.176	0.002
NASNetMobile	Baseline	0.820	0.816	0.726
NASNetMobile	Proposed	0.830	0.825	0.744
SqueezeNet	Baseline	0.670	0.573	0.494
SqueezeNet	Proposed	0.660	0.638	0.481

Note: Bold values indicate the best performance achieved for each metric within a given backbone configuration (baseline vs. proposed), and are provided solely to facilitate visual comparison.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Pasupuleti, S.; Krishnamoorthy, R.; Gunasekaran, H. An Adaptive Attention DropBlock Framework for Real-Time Cross-Domain Defect Classification. AI 2026, 7, 56. https://doi.org/10.3390/ai7020056

AMA Style

Pasupuleti S, Krishnamoorthy R, Gunasekaran H. An Adaptive Attention DropBlock Framework for Real-Time Cross-Domain Defect Classification. AI. 2026; 7(2):56. https://doi.org/10.3390/ai7020056

Chicago/Turabian Style

Pasupuleti, Shailaja, Ramalakshmi Krishnamoorthy, and Hemalatha Gunasekaran. 2026. "An Adaptive Attention DropBlock Framework for Real-Time Cross-Domain Defect Classification" AI 7, no. 2: 56. https://doi.org/10.3390/ai7020056

APA Style

Pasupuleti, S., Krishnamoorthy, R., & Gunasekaran, H. (2026). An Adaptive Attention DropBlock Framework for Real-Time Cross-Domain Defect Classification. AI, 7(2), 56. https://doi.org/10.3390/ai7020056
