Article

Advancing SAR Target Recognition Through Hierarchical Self-Supervised Learning with Multi-Task Pretext Training

Electrical & Computer Engineering Department, Tuskegee University, Tuskegee, AL 36088, USA
*
Author to whom correspondence should be addressed.
Sensors 2026, 26(1), 122; https://doi.org/10.3390/s26010122
Submission received: 28 October 2025 / Revised: 9 December 2025 / Accepted: 16 December 2025 / Published: 24 December 2025

Abstract

Synthetic Aperture Radar (SAR) Automatic Target Recognition (ATR) systems face significant challenges due to limited labeled data availability and persistent domain gaps between synthetic and measured imagery. This paper presents a comprehensive self-supervised learning (SSL) framework that eliminates dependency on synthetic data while achieving state-of-the-art performance through multi-task pretext training and extensive downstream classifier evaluation. We systematically evaluate our SSL framework across diverse downstream classifiers spanning different computational paradigms and architectural families. Our study encompasses traditional machine learning approaches (SVM, Random Forest, XGBoost, Gradient Boosting), deep convolutional neural networks (ResNet, U-Net, MobileNet, EfficientNet), and a generative adversarial network. We conduct extensive experiments using the SAMPLE dataset with rigorous evaluation protocols. Results demonstrate that SSL significantly improves SAR ATR performance, with SVM achieving 99.63% accuracy, ResNet18 reaching 97.40% accuracy, and Random Forest demonstrating 99.26% accuracy. Our multi-task SSL framework employs nine carefully designed pretext tasks, including geometric invariance, signal robustness, and multi-scale analysis. Cross-validation experiments validate the generalizability and robustness of our findings. Rigorous comparison with SimCLR baseline validates that task-based SSL outperforms contrastive learning for SAR ATR. This work establishes a new paradigm for SAR ATR that leverages inherent radar data structure without synthetic augmentation, providing practical guidelines for deploying SSL-based SAR ATR systems and a foundation for future domain-specific self-supervised learning research in remote sensing applications.

1. Introduction

Synthetic Aperture Radar (SAR) systems have become indispensable tools for military surveillance, reconnaissance, and civilian monitoring applications because of their unique capability to provide high-resolution imagery regardless of weather conditions, cloud cover, or illumination. Unlike electro-optical (EO) sensors, SAR systems actively transmit electromagnetic signals and measure the backscattered energy, enabling consistent performance under diverse environmental conditions [1,2,3]. This all-weather capability makes SAR particularly valuable for modern defense systems and enables the rapid identification and classification of targets in complex operational environments [4,5].
The development of Automatic Target Recognition (ATR) systems for SAR imagery presents unique challenges that distinguish them from conventional computer vision tasks. SAR images exhibit “non-literal” characteristics where electromagnetic scattering properties dominate visual appearance, making human interpretation and traditional computer vision approaches less effective. The radar backscatter depends on complex interactions between electromagnetic waves and target geometry, material properties, and viewing angles. This results in imagery that can be fundamentally different from expectations in the optical domain [3].
A fundamental challenge in SAR ATR stems from the vast operating condition (OC) space that encompasses sensor parameters (frequency, polarization, resolution), target configurations (orientation, articulation), and environmental factors (weather, terrain, clutter) [6]. This multidimensional parameter space makes comprehensive data collection impractical, leading to the persistent problem of limited labeled training data for machine learning approaches [7]. Traditional datasets like MSTAR represent only a small fraction of the possible OC space, potentially limiting the generalizability of trained models [8].
To address data scarcity, researchers have extensively explored synthetic data generation using electromagnetic modeling and computational tools [7]. Sophisticated simulators such as RaySAR, CohRaS, and SARViz can generate realistic SAR imagery through ray tracing, full-wave electromagnetic modeling, and scattering-center prediction [9,10,11]. However, despite careful Computer-Aided Design (CAD) model truthing and parameter matching, persistent domain gaps between synthetic and measured imagery continue to limit ATR performance [3,12].
The SAMPLE dataset introduced by Lewis et al. represents a comprehensive effort to understand and quantify these domain gaps through carefully matched synthetic and measured SAR image pairs [3]. Their extensive experiments demonstrated that classification accuracy degrades significantly when synthetic data comprises more than 40% of the training set, even with meticulously truthed CAD models and matched collection parameters. This finding has highlighted the fundamental limitations of synthetic data approaches and motivated the exploration of alternative paradigms [3,12].
Traditional SAR ATR approaches are heavily based on supervised learning methods that require large amounts of labeled training data [13,14,15]. However, obtaining labeled SAR data presents significant challenges due to security considerations, expert annotation requirements, and the high cost of data collection campaigns. This data scarcity problem has motivated researchers to explore alternative learning paradigms, with self-supervised learning (SSL) emerging as a promising approach [15].
Self-supervised learning has emerged as a powerful paradigm for learning meaningful representations from unlabeled data across various domains. Recent advances in SSL have demonstrated remarkable success in computer vision through contrastive learning, masked autoencoding, and pretext task learning [16]. The core principle of SSL, deriving supervision signals directly from the data structure without external labels, aligns well with the rich internal structure present in SAR images [15,17]. Radar data contains inherent geometric relationships, electromagnetic scattering patterns, and coherent imaging properties that can be exploited for representation learning [16].
Despite the growing interest in SSL for SAR applications, there exists a significant gap in comprehensive evaluation frameworks that systematically assess the effectiveness of different SSL approaches across diverse downstream tasks and data availability scenarios. Most existing studies focus on specific model architectures or limited experimental settings, making it difficult to draw generalizable conclusions about the optimal deployment strategies for SSL-based SAR ATR systems [18].
This paper presents a comprehensive, improved self-supervised learning (SSL) framework specifically designed for synthetic aperture radar automatic target recognition (SAR ATR) applications. Our approach addresses the limitations of existing methods through several key innovations:
  • Multi-Task Pretext Learning Architecture: We develop a sophisticated SSL framework employing nine complementary pretext tasks specifically tailored to SAR imagery characteristics, including geometric transformations (rotations, flips), signal processing operations (denoising, blurring), and multi-scale analysis (zoom transformations).
  • Comprehensive Downstream Evaluation: We provide extensive comparison across diverse architectural families, including traditional machine learning approaches (SVM, XGBoost, Random Forest, Gradient Boosting) and modern deep learning architectures (ResNet18, U-Net, MobileNet variants, EfficientNet variants, and GAN-based classifiers) to demonstrate the versatility and effectiveness of our learned representations.
  • Elimination of Synthetic Data Dependency: Our framework demonstrates competitive performance using exclusively measured SAR data, directly addressing the fundamental domain gap problem that has limited previous approaches [3].
  • Operational Performance Analysis: We provide a comprehensive evaluation including timing analysis, false positive rate characterization, and computational efficiency metrics essential for operational deployment considerations.
  • Validation and Robustness: We perform a comprehensive analysis of performance across varying data availability scenarios (5% to 100% of training data), and cross-validation ensures reliable performance assessment.
The remainder of this paper is organized as follows. Section 2 provides a comprehensive review of related work in SAR ATR and self-supervised learning. Section 3 details our enhanced SSL framework, including architecture design, pretext tasks selection, and downstream evaluation methodology. Section 4 presents extensive experimental results and comparative evaluation. Section 5 discusses practical implications, limitations, and theoretical contributions as well as future research directions and broader impact considerations.

2. Related Work

2.1. Evolution of SAR Automatic Target Recognition

SAR ATR has evolved through several methodological paradigms over the past decades. Early approaches relied on template matching and correlation-based techniques, which compared measured SAR signatures with manually curated reference templates [5]. Although computationally efficient, these methods lacked robustness to target aspect variations, articulation, and cluttered backgrounds. Subsequently, physics-informed methods introduced electromagnetic and geometric feature extraction techniques. Attributed Scattering Centers (ASC), target contour descriptors, and polarimetric signatures were employed to enhance target discrimination and interpretability [6,19]. However, these methods required extensive domain expertise and were often sensitive to sensor configurations and environmental noise.
The advent of traditional machine learning, particularly Support Vector Machines (SVM) and Random Forests, offered automated feature selection and improved generalization under moderate data availability [8]. Machine learning for SAR ATR involves two main steps: extracting discriminative features such as geometric structures, scattering characteristics, or Fourier-based transforms, and then classifying them using methods like SVM, K-NN, or neural networks [6].
Deep learning methods marked a major paradigm shift in SAR ATR. Convolutional Neural Networks (CNNs) were introduced to SAR ATR by Morgan and Chen [20], enabling end-to-end learning from raw SAR images. Recent advances have demonstrated that deep learning, with abundant training data, has the potential to greatly improve SAR ATR performance by establishing foundations for large-scale implementation [21]. The limited availability of high-quality SAR images has been identified as a critical bottleneck affecting the accuracy and robustness of target detection, classification, and segmentation tasks [22].
Recent studies, such as Li et al.’s SARATR-X [16], further demonstrate that foundation-level architectures tailored to radar data (e.g., including phase information, polarimetry, or multi-aspect views) outperform traditional pipelines. Hybrid models have also gained traction, combining domain-specific knowledge with learnable architectures. For instance, polarimetric CNNs leverage multi-channel SAR input to extract polarization-specific features [6]. Others incorporate synthetic aperture physics or prior target structure using attention layers or residual learning mechanisms [14]. Modern SAR ATR systems face significant deployment challenges due to the sequential arrival of training data and high retraining costs, motivating the development of incremental learning approaches [23]. Recent developments apply transformer architectures to sequence-like SAR inputs, moving beyond historic RNN/CNN designs. For instance, Li et al. propose a multi-aspect SAR target recognition transformer that mines inter-frame correlations using self-attention [24], while Zhao et al. highlight the transition from RNN/CNN to a lightweight vision transformer for SAR ATR in sequence contexts [25].
These developments underscore the importance of integrating radar phenomenology with modern learning frameworks, which is now enabling SAR ATR systems to reach operational readiness across a broader set of deployment conditions.

2.2. Synthetic Data Generation and Domain Gap Challenges

Synthetic SAR data generation has become a widely adopted approach to mitigate the scarcity of labeled data in SAR ATR tasks. Tools such as RaySAR, SARViz, and CohRaS facilitate the generation of ray-traced or wave-based SAR images by simulating electromagnetic backscatter under controlled conditions [9,10,11]. However, a persistent challenge lies in the domain gap between synthetic and measured SAR imagery, stemming from differences in background clutter, sensor noise, and real-world scene complexity [3,12]. Recent research has highlighted that advances in artificial intelligence have increased the demand for labeled data, often outpacing availability, while synthetic data generation provides a cost-effective alternative that still suffers from inherent domain shift problems [26].
Lewis et al. [3] introduced the SAMPLE dataset, which comprises paired synthetic and measured SAR image samples under matched parameters. Their study revealed that even with rigorous CAD truthing, classification performance degraded significantly when synthetic data accounted for more than 60% of the training set. Inkawhich et al. [12] further confirmed that models trained on synthetic data fail to generalize to real measurements, especially under domain shifts. Contemporary approaches have proposed hybrid dataset methods that combine synthetic and measured data to tackle the challenges hindering automatic target detection algorithms for ground targets in SAR images [27].
More recently, Kim et al. [7] proposed Soft Segmented Randomization (SSR) to improve domain generalization by applying controlled noise in segmentation masks. Despite these efforts, most methods cannot completely close the synthetic-to-real gap, thus motivating the need for learning frameworks that work exclusively with measured data. Novel diffusion-model-based approaches have emerged that require only single training samples to generate realistic SAR images, offering potential solutions to data scarcity challenges [22].

2.3. Self-Supervised Learning Paradigms

Self-supervised learning (SSL) has emerged as a promising approach to learning meaningful representations from unlabeled data [28]. In SAR, the structured nature of radar signals, including speckle statistics, aspect dependence, and coherence, provides rich signals for pretext learning tasks [14,15]. Recent work by Pei et al. [14] applied contrastive learning to SAR ATR and showed significant improvements over supervised baselines using the MSTAR and SAMPLE datasets. Li et al. [15] introduced a predictive gradient-based embedding architecture and demonstrated robust performance in low-data scenarios.
Foundation model initiatives, such as SARATR-X [16], leverage large-scale SSL to pretrain on multiple SAR modalities. Furthermore, Muzeau et al. [18] proposed SAFE, an SAR feature extractor based on masked Siamese ViTs, which achieves state-of-the-art results on several SAR benchmarks. Recent advances in adversarial self-supervised learning have introduced novel defense methods such as unsupervised adversarial contrastive learning (UACL), which explicitly suppresses vulnerability in the representation space by maximizing similarity between clean data and corresponding adversarial examples, demonstrating the potential of SSL to improve model robustness in SAR target recognition [29]. These methods highlight the potential of SSL in reducing dependency on labeled data while improving generalization.
Among contrastive learning approaches, SimCLR (Simple Framework for Contrastive Learning of Visual Representations) [30] has emerged as a foundational method in self-supervised learning. SimCLR learns representations by maximizing agreement between differently augmented views of the same image through a contrastive loss function, without requiring any task-specific labels during pretraining. The framework employs data augmentation, a learnable nonlinear projection head, and the normalized temperature-scaled cross-entropy (NT-Xent) loss to train encoders that produce invariant representations. While originally developed for natural images, SimCLR’s simplicity and effectiveness have led to its adoption across diverse imaging domains. However, its application to SAR imagery presents unique challenges due to fundamental differences in image formation physics, the scarcity of large-scale SAR datasets compared to natural image collections, and domain-specific characteristics such as speckle noise and aspect-dependent scattering [31]. Despite these challenges, SimCLR represents an important baseline for evaluating SSL approaches in SAR ATR, as it enables direct comparison between general contrastive learning methods and domain-specific pretext task approaches.

2.4. Multi-Task Learning and Feature Representation

In SAR applications, auxiliary tasks can potentially align with underlying properties of radar imaging, such as translation and orientation invariance. Our previous work [32] showed that multi-task pretext training with complementary transformation tasks enables robust feature learning from measured SAR data without synthetic augmentation. Our proposed SSL framework employs nine tasks ranging from geometric to signal processing transformations, designed to learn robust and transferable feature embeddings.

3. Materials and Methods

3.1. Problem Formulation and Framework Overview

Let $X = \{x_i\}_{i=1}^{N}$ represent a dataset of $N$ measured SAR images, where each $x_i \in \mathbb{R}^{64 \times 64}$ is a single-channel image of spatial dimensions $64 \times 64$. Our goal is to learn a feature representation function $f_\theta : \mathbb{R}^{64 \times 64} \rightarrow \mathbb{R}^{2048}$, parameterized by $\theta$, that maps input images to 2048-dimensional feature vectors suitable for downstream target classification. The self-supervised learning framework consists of three main components:
  • Pretext Learning Phase: Learn feature representations $f_\theta$ by training on multiple pretext tasks $\{T_k\}_{k=1}^{K}$ using unlabeled SAR imagery.
  • Feature Extraction Phase: Apply the learned encoder $f_\theta$ to extract features from training and test data.
  • Downstream Classification Phase: Train classifiers on the extracted features for target recognition.
Our proposed self-supervised learning-based SAR ATR architecture is depicted in Figure 1.
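The three phases can be sketched end to end as follows. This is an illustrative toy rather than our implementation: a fixed random projection stands in for the trained encoder $f_\theta$, and a nearest-centroid rule stands in for the downstream classifiers; only the shapes (64 × 64 inputs, 2048-dimensional features) follow the text above.

```python
import numpy as np

# Phase 1 (pretext training) would learn the projection W via the nine
# pretext tasks; here W stays a fixed random stand-in for f_theta.
rng = np.random.default_rng(0)
W = rng.standard_normal((64 * 64, 2048)) / 64.0

def f_theta(images):
    """Phase 2: map a batch of (N, 64, 64) images to (N, 2048) frozen features."""
    return np.maximum(images.reshape(len(images), -1) @ W, 0.0)

x_train = rng.random((8, 64, 64))
x_test = rng.random((4, 64, 64))
z_train, z_test = f_theta(x_train), f_theta(x_test)

# Phase 3: a toy nearest-centroid classifier consumes the frozen features;
# in the paper this slot is filled by SVM, Random Forest, ResNet18, etc.
y_train = np.array([0, 0, 1, 1, 2, 2, 3, 3])
centroids = np.stack([z_train[y_train == c].mean(0) for c in range(4)])
pred = np.argmin(((z_test[:, None] - centroids) ** 2).sum(-1), axis=1)
assert z_train.shape == (8, 2048) and pred.shape == (4,)
```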

3.2. Dataset Description and Preprocessing

Our experiments utilize the Synthetic and Measured Paired and Labeled Experiment (SAMPLE) dataset [3], a comprehensive SAR ATR benchmark designed to facilitate research on bridging the domain gap between synthetic and real-world SAR imagery. The dataset exhibits a paired structure where each measured (real) SAR image is accompanied by a corresponding synthetic image generated from high-fidelity computer-aided design (CAD) models. Our study focuses exclusively on measured imagery to evaluate SSL performance under realistic operational conditions without synthetic data dependency. The dataset contains ten military vehicle classes with both synthetic and measured SAR images for each vehicle. Synthetic data were generated using meticulously truthed CAD models closely matched to real vehicle images from the publicly available Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset [33]. Images were collected at elevation angles between 14° and 17° and azimuth angles ranging from 10° to 80°.
We gathered the data from the publicly available GitHub repository [34], where the dataset has been made available. The repository contains two kinds of files: MATLAB files and PNG images. The PNG images reside in two distinct directories, ‘decibel’ and ‘qpm’, whose contents are identical; within each directory, one subfolder holds the measured images and the other holds the synthetic counterpart of each measured image. Each image filename follows a standardized naming convention encoding parsable metadata: target class identifier, elevation angle (14°, 15°, 16°, or 17°), azimuth angle (10° to 80° at variable intervals), and data type (synthetic/measured). This paired structure enables one-to-one correspondence between synthetic and measured images for the same target at matching viewing geometries.
Our Study’s Data Usage Rationale: We utilize only the measured (real) images from the SAMPLE dataset, totaling 1345 images across 10 classes, to evaluate SSL performance under authentic operational conditions without synthetic data augmentation. This choice is motivated by three considerations: (1) Realism: measured images contain authentic sensor noise, atmospheric effects, and target variability absent in synthetic data; (2) SSL evaluation: self-supervised learning should demonstrate effectiveness on limited real data; and (3) Operational relevance: deployed SAR ATR systems process real sensor acquisitions. Following [3], we adopt an elevation-based train-test split in which all images at 17° elevation (539 samples) serve as the held-out test set, while images at 14°, 15°, and 16° elevations (806 samples) constitute the training pool. Table 1 reports the complete distribution.
The preprocessing pipeline involves two key steps. First, intensity normalization is performed by scaling pixel values to a range of [0, 1] using min-max scaling, ensuring consistent intensity distributions across images. Second, spatial standardization is applied by center-cropping and resizing the images to 64 × 64 pixels, maintaining spatial consistency while preserving target details. We employ a systematic experimental protocol to ensure reproducible and reliable results. The experimental framework evaluates SSL approaches using the SAMPLE dataset with both held-out test data and rigorous cross-validation protocols to assess the performance reliability and generalizability.
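A minimal sketch of the two preprocessing steps follows. The min-max scaling and the 64 × 64 target size match the text; the nearest-neighbor interpolation used for resizing is our simplification, since the exact interpolation method is not specified above.

```python
import numpy as np

def preprocess(img, out_size=64):
    """Min-max normalize to [0, 1], center-crop to a square, and resize
    to out_size x out_size (nearest-neighbor; interpolation is assumed)."""
    img = img.astype(np.float64)
    lo, hi = img.min(), img.max()
    img = (img - lo) / (hi - lo) if hi > lo else np.zeros_like(img)
    # Center-crop the largest square region.
    h, w = img.shape
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    crop = img[top:top + side, left:left + side]
    # Nearest-neighbor resize via index remapping.
    idx = np.arange(out_size) * side // out_size
    return crop[np.ix_(idx, idx)]

x = preprocess(np.random.default_rng(1).random((128, 100)) * 255.0)
assert x.shape == (64, 64) and x.min() >= 0.0 and x.max() <= 1.0
```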
Data Leakage Prevention Protocol: To ensure rigorous evaluation and prevent data leakage between training and testing phases, we implement a strict elevation-based data separation strategy. Following the SAMPLE dataset protocol [3], all images at 17° elevation (539 test samples across 10 classes) are held out from both the SSL pretext training and downstream classifier training phases. Only images at 14°, 15°, and 16° elevation angles (806 training samples) are available for pretext task learning and downstream training. This elevation-based split ensures that test data remains completely unseen throughout the entire training pipeline, preventing any form of data leakage. Additionally, during downstream classifier training, we employ stratified train-validation splitting where 15% of the training data is reserved for validation, with class proportions preserved to maintain balanced representation. This approach ensures that validation performance accurately reflects generalization capability across all target classes, providing reliable early stopping criteria without introducing bias toward majority classes. This rigorous protocol combining held-out test elevation, proportionate validation, and consistent data partitioning across all k-value experiments substantiates that reported performance metrics represent genuine generalization to unseen data rather than memorization of training examples.
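The stratified 15% validation split can be sketched with the hypothetical helper below; the elevation-based test hold-out is assumed to have been applied beforehand, so only training-pool labels enter this function.

```python
import numpy as np

def stratified_val_split(labels, val_frac=0.15, seed=0):
    """Return (train_idx, val_idx) with per-class proportions preserved,
    mirroring the 15% validation hold-out described in the protocol."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, val_idx = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        n_val = max(1, int(round(val_frac * len(idx))))  # >=1 sample per class
        val_idx.extend(idx[:n_val])
        train_idx.extend(idx[n_val:])
    return np.array(train_idx), np.array(val_idx)

# Toy stand-in for the 806 training labels: 10 balanced classes.
y = np.repeat(np.arange(10), 80)
tr, va = stratified_val_split(y)
assert len(tr) + len(va) == len(y)
assert set(tr.tolist()).isdisjoint(va.tolist())
```

Because the split is computed per class, each class contributes the same fraction of its samples to validation, which is what keeps early stopping unbiased toward majority classes.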

3.3. Multi-Task Pretext Learning Framework

3.3.1. Two-Stage Hierarchical SSL Pipeline with Multi-Task Pretext Learning

Our framework implements a two-stage hierarchical SSL pipeline. Stage 1 (Pretext Training) learns general-purpose representations from unlabeled data through pretext tasks. Stage 2 (Downstream Evaluation) transfers these learned features to target classification by freezing the pretrained encoder and training only the downstream classifier. This hierarchical separation ensures that pretext task learning captures domain-invariant features without overfitting to specific downstream labels.
Within the first stage (pretext training), we employ multi-task learning where a single shared encoder is trained simultaneously on nine pretext tasks. Our multi-task pretext objective jointly optimizes all nine tasks through a unified loss function, which enables the network to learn a unified representation space that captures multiple complementary aspects of SAR imagery simultaneously. In the second stage (downstream evaluation), we freeze the pretrained encoder and extract 2048-length feature vectors for all training samples. These extracted features are then used to train various downstream classifiers for the target classification task.
We adopt multi-task learning within the pretext stage (rather than training nine separate single-task models) for several key advantages: (1) Training efficiency: all nine pretext tasks are trained simultaneously in 2–3 min on the SAMPLE dataset; (2) Feature complementarity: ablation studies confirm that multi-task synergy outperforms the best single-task model through simultaneous gradient contributions from all tasks; and (3) Data efficiency: each of the 806 images provides nine training signals (7254 effective examples), compensating for limited labeled data with minimal computational overhead.

3.3.2. Pretext Tasks Design and Theoretical Justification

We propose a comprehensive self-supervised learning framework comprising nine methodically constructed pretext tasks, each designed to exploit distinct structural characteristics inherent to synthetic aperture radar (SAR) imagery. Our approach is grounded in rigorous theoretical principles derived from established radar phenomenology and signal processing theory. The foundation of our methodology rests upon several key observations from the radar remote sensing literature. First, electromagnetic scattering mechanisms in SAR targets demonstrate invariance properties under specific geometric and radiometric transformations [35,36], providing a theoretical basis for transformation-based self-supervision. Second, operational SAR systems are inherently subjected to multiplicative speckle noise, atmospheric propagation effects, and system-induced artifacts [37], which require robust feature representations that can model these degradation processes. Third, SAR backscatter signatures exhibit multi-scale spatial dependencies arising from the complex interplay between target geometry and radar viewing parameters [38,39], motivating the development of hierarchical feature extraction mechanisms. Building upon these theoretical foundations, our pretext task design systematically addresses each aspect of SAR data complexity while maintaining computational scalability for large-scale applications. The pretext tasks can be classified into the following broad categories:
Original Image Prediction ($T_0$): Classification of unmodified images serves as a baseline task and helps maintain original SAR signature characteristics.
Geometric Invariance Tasks ($T_1$–$T_5$): Based on the principle that SAR target signatures should be recognizable regardless of platform orientation or viewing geometry, these tasks force the network to learn rotation- and reflection-invariant features essential for multi-aspect target recognition. These tasks include:
  • 90° Rotation ($T_1$): Rotates the image counterclockwise by 90°, leveraging the viewpoint-invariant nature of SAR targets.
    $R_{90}(x) = \mathrm{rot90}(x, k=1)$
  • 180° Rotation ($T_2$): Rotation by 180°, helping to learn orientation-invariant features.
    $R_{180}(x) = \mathrm{rot90}(x, k=2)$
  • 270° Rotation ($T_3$): Counterclockwise rotation by 270° (equivalent to 90° clockwise), helping rotational feature learning for multi-aspect recognition.
    $R_{270}(x) = \mathrm{rot90}(x, k=3)$
  • Horizontal Flip ($T_4$): Flips the image horizontally, creating a mirrored version to learn geometric invariances.
    $F_h(x) = \mathrm{flip}(x, \mathrm{axis}=\mathrm{width})$
  • Vertical Flip ($T_5$): Flips the image vertically, producing an upside-down version, enhancing robustness to target orientation changes.
    $F_v(x) = \mathrm{flip}(x, \mathrm{axis}=\mathrm{height})$
Signal Quality Robustness Tasks ($T_6$–$T_7$): Motivated by operational requirements where SAR imagery may be degraded by atmospheric effects, sensor noise, or processing artifacts, these tasks ensure the learned representations are robust to signal quality variations commonly encountered in operational scenarios. These transformations include:
  • Denoising ($T_6$): The network learns to identify images processed with Gaussian smoothing (implemented as the “denoising” task). This applies a 3 × 3 Gaussian kernel with fixed standard deviation $\sigma = 0.5$, which reduces noise and high-frequency details in SAR imagery.
    $S(x) = x * G_{\sigma=0.5}$
    where $G_{\sigma=0.5}$ is a 3 × 3 Gaussian kernel with standard deviation 0.5.
  • Blur Prediction ($T_7$): The network applies Gaussian blur transformations with a 5 × 5 kernel, simulating atmospheric effects and resolution variations. The standard deviation is drawn uniformly between 0.5 and 1.0 to create varying levels of blur.
    $B(x) = x * G_{\sigma_b}$
    where $G_{\sigma_b}$ is a 5 × 5 Gaussian kernel with standard deviation $\sigma_b \sim U(0.5, 1.0)$.
Multi-Scale Analysis Task ($T_8$): To address the multi-resolution nature of SAR phenomenology, where target features manifest at different spatial scales depending on sensor parameters and target geometry, we implement a zoom-in transformation. The network predicts zoom-in transformations that upscale the image by a factor between 1.2 and 1.5 and then crop the center region back to the original size, creating a zoom effect. This corresponds to 20% to 50% zoom levels, enabling multi-scale feature learning important for variable-resolution scenarios.
$Z(x) = \mathrm{CenterCrop}(\mathrm{Resize}(x, s \cdot \mathrm{size}(x)), \mathrm{size}(x))$
where $s \sim U(1.2, 1.5)$ is the scaling factor.
Figure 2 illustrates the complete set of pretext transformations applied to representative samples from each target class in our dataset.
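For concreteness, the nine transformations $T_0$–$T_8$ can be sketched in NumPy as below. The kernel sizes, standard deviations, and zoom range follow the text; the nearest-neighbor upsampling in the zoom task and the edge padding in the convolution are our simplifications.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Separable 2-D Gaussian kernel of odd size, normalized to sum to 1."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def conv2d_same(x, k):
    """Naive same-padded 2-D convolution, adequate for 64x64 sketches."""
    r = k.shape[0] // 2
    xp = np.pad(x, r, mode="edge")
    out = np.zeros_like(x, dtype=np.float64)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + k.shape[0], j:j + k.shape[1]] * k)
    return out

def zoom_in(x, s):
    """T8: upscale by factor s (nearest-neighbor), then center-crop back."""
    n = x.shape[0]
    big = int(round(n * s))
    idx = np.arange(big) * n // big
    up = x[np.ix_(idx, idx)]
    t = (big - n) // 2
    return up[t:t + n, t:t + n]

rng = np.random.default_rng(0)
x = rng.random((64, 64))
tasks = [
    x,                                                    # T0: original
    np.rot90(x, 1), np.rot90(x, 2), np.rot90(x, 3),       # T1-T3: rotations
    np.flip(x, axis=1), np.flip(x, axis=0),               # T4-T5: flips
    conv2d_same(x, gaussian_kernel(3, 0.5)),              # T6: smoothing
    conv2d_same(x, gaussian_kernel(5, rng.uniform(0.5, 1.0))),  # T7: blur
    zoom_in(x, rng.uniform(1.2, 1.5)),                    # T8: zoom-in
]
assert len(tasks) == 9 and all(t.shape == (64, 64) for t in tasks)
```

Each transformed image is paired with its task index, so every unlabeled image contributes nine (image, label) pretext examples.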

3.3.3. Pretext Network Architecture

Our CNN architecture is specifically designed to capture the characteristics of SAR imagery, following the implementation of Lewis et al. [3] to ensure fair comparison and reproducibility. The network employs progressive channel expansion (16 → 32 → 64 → 128) to enable feature learning. Spatial resolution is systematically reduced using max-pooling operations that preserve essential spatial relationships while lowering computational complexity. The pretext classifier consists of a fully connected head (2048 → 1000 → 500 → 250 → N) with ReLU activations, where N denotes the number of pretext transformation classes. All convolutional layers use 3 × 3 kernels with same-padding to preserve spatial dimensions prior to pooling. Pretext training is formulated as a single-label classification problem over the N transformation classes and is optimized using the categorical cross-entropy loss
L_pretext = L_CE(p, y),
where p represents the predicted probability distribution over the transformation classes and y is the corresponding ground-truth label. Optimization is performed using the Adam optimizer (learning rate = 0.001, batch size = 16) with early stopping (patience = 5 epochs) to mitigate overfitting and encourage generalizable representations for downstream tasks. The architecture is shown in Table 2.
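An illustrative PyTorch sketch of this pretext network follows (reconstructed from the description above rather than taken from the released code; for a 64 × 64 single-channel input, four pooling stages yield the 128 × 4 × 4 = 2048 features that feed the fully connected head):

```python
import torch
import torch.nn as nn

class PretextCNN(nn.Module):
    """Sketch of the pretext network: four 3x3 conv blocks with progressive
    channel expansion (16 -> 32 -> 64 -> 128), same-padding, and max pooling,
    followed by the FC head 2048 -> 1000 -> 500 -> 250 -> N."""
    def __init__(self, num_classes):
        super().__init__()
        chans = [1, 16, 32, 64, 128]
        blocks = []
        for cin, cout in zip(chans, chans[1:]):
            blocks += [nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)]
        self.features = nn.Sequential(*blocks)
        self.head = nn.Sequential(
            nn.Flatten(),                      # 128 x 4 x 4 -> 2048 for 64x64 input
            nn.Linear(2048, 1000), nn.ReLU(),
            nn.Linear(1000, 500), nn.ReLU(),
            nn.Linear(500, 250), nn.ReLU(),
            nn.Linear(250, num_classes),       # N pretext transformation classes
        )

    def forward(self, x):
        return self.head(self.features(x))
```

The 2048-dimensional activation entering the head is the representation later reused as the downstream feature vector.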
Computational Resources: Pretext training requires approximately 2–3 min on a single GPU (NVIDIA GeForce RTX 3090; NVIDIA, Santa Clara, CA, USA) for the complete SAMPLE dataset. Feature extraction for all downstream classifiers adds an average of 15.5 ms per 64 × 64 image. The framework is designed for computational efficiency, making it practical for research and operational deployment scenarios.

3.4. SimCLR Baseline Implementation

To establish a rigorous comparison with established SSL methods, we implement SimCLR (Simple Framework for Contrastive Learning of Visual Representations) [30] as a strong baseline, following the original paper’s methodology as closely as possible while making necessary adaptations for SAR imagery and our experimental conditions. SimCLR represents a fundamentally different SSL paradigm from our task-based approach: while our method employs explicit pretext tasks, SimCLR learns representations through instance discrimination, where the model learns to distinguish between different images while recognizing augmented views of the same image as equivalent.
Our implementation faithfully replicates the core architectural and algorithmic components of the original SimCLR paper [30]. We preserve the fundamental two-phase training protocol (contrastive pretraining followed by linear evaluation), the exact projection head architecture with batch normalization, the NT-Xent loss formulation with cosine similarity, and the linear evaluation protocol for representation quality assessment. We employ the LARS optimizer with learning rate scaling (lr = 0.3 × batch size / 256), linear warmup for the first 10 epochs, and cosine decay learning rate scheduling. The temperature parameter (τ = 0.5), weight decay (10⁻⁶), momentum (0.9), and LARS trust coefficient (0.001) are set exactly as recommended in the original work. This faithful implementation ensures our comparison reflects the true performance characteristics of SimCLR rather than implementation-specific variations.
The SimCLR framework consists of two main phases. During contrastive pretraining, each SAR image x undergoes random augmentation to generate two correlated views (x_i, x_j). For each image, we randomly select two different augmentations from our SAR-adapted transformation set (T_0–T_8). Each view applies exactly one randomly selected transformation, with the constraint that the two views use different transformations to ensure meaningful view diversity while preserving target-specific SAR signatures.
Both augmented views are processed through a shared ResNet-18 encoder f(·) to extract 512-dimensional feature representations. Following the original SimCLR protocol exactly, we employ a two-layer MLP projection head g(·) with the architecture specified in the original paper: Linear(512 → 2048) → BatchNorm → ReLU → Linear(2048 → 128). This produces 128-dimensional normalized embeddings z = g(f(x)) for contrastive learning. The projection head, including batch normalization between layers as specified in the original work, is crucial for effective contrastive learning but is discarded during downstream evaluation.
The contrastive loss encourages agreement between positive pairs (augmented views of the same image) while maximizing disagreement with negative pairs (views from different images). For a batch of N images producing 2N augmented views, we apply the normalized temperature-scaled cross-entropy (NT-Xent) loss exactly as defined in the original paper. For a positive pair (z_i, z_j):
L_{i,j} = −log [ exp(sim(z_i, z_j) / τ) / Σ_{k=1}^{2N} 1[k ≠ i] exp(sim(z_i, z_k) / τ) ]
where sim(z_i, z_j) denotes cosine similarity between L2-normalized embeddings, τ = 0.5 is the temperature parameter (the original paper’s recommended value), and the indicator 1[k ≠ i] excludes self-comparisons. The final loss averages over all 2N positive pairs in the batch, precisely following the original implementation.
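The NT-Xent loss above can be written compactly as follows (our own NumPy sketch for illustration, with embeddings arranged so that rows 2k and 2k + 1 are the two views of image k):

```python
import numpy as np

def nt_xent(z, tau=0.5):
    """NT-Xent loss over 2N embeddings z (shape (2N, d)); rows 2k and 2k+1
    are assumed to be the two augmented views of image k."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize embeddings
    sim = z @ z.T / tau                                # cosine similarity / temperature
    np.fill_diagonal(sim, -np.inf)                     # indicator 1[k != i]: drop self-terms
    n = len(z)
    pos = np.arange(n) ^ 1                             # index of each row's positive pair
    log_prob = sim[np.arange(n), pos] - np.log(np.exp(sim).sum(axis=1))
    return -log_prob.mean()                            # average over all 2N positive pairs
```

Aligned positive pairs with orthogonal negatives drive the loss toward zero, while misaligned pairs are penalized, matching the behaviour of the equation above.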
We replicate SimCLR’s training protocol with dataset-appropriate adaptations. Following the original paper’s optimizer recommendations, we use LARS (Layer-wise Adaptive Rate Scaling) with base learning rate 0.3, momentum 0.9, weight decay 10⁻⁶, and trust coefficient 0.001. The learning rate is scaled by batch size: lr = 0.3 × (batch size / 256), maintaining the original paper’s linear scaling rule. We implement the original paper’s learning rate schedule: linear warmup for the first 10 epochs, followed by cosine decay to zero over the remaining epochs. The original SimCLR employs batch sizes of 256–8192 to provide sufficient negative samples; for our dataset constraints, we use an adaptive batch sizing strategy where the batch size is set to min(64, N_train/2) to ensure at least two batches per epoch while maximizing negative sample diversity. This adaptation is necessary at low k-values where training samples are limited, but preserves the core contrastive learning principle. We train for 200 epochs, consistent with one of the pretraining durations explored in the original SimCLR paper.
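The batch-size and learning-rate adaptations described above amount to the following trivial helper (our own, shown only to make the rules explicit):

```python
def simclr_hparams(n_train, base_lr=0.3, ref_batch=256, max_batch=64):
    """Adaptive batch size and LARS learning rate:
    batch = min(max_batch, n_train // 2)  -> at least two batches per epoch
    lr    = base_lr * batch / ref_batch   -> linear scaling rule."""
    batch = min(max_batch, n_train // 2)
    lr = base_lr * batch / ref_batch
    return batch, lr
```

For example, the full SAMPLE training set uses the 64-sample cap, while very low k-values fall back to half the available samples.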
The second phase, linear evaluation, follows the standard protocol established by the original SimCLR paper. We freeze the trained ResNet-18 encoder and train only a single linear layer (512-dimensional input to 10 target classes) using supervised labels and cross-entropy loss. This evaluation protocol, widely adopted in the SSL literature, tests whether the encoder has learned features that are linearly separable for the downstream task, providing a fair comparison between SSL methods. Following the original paper’s specifications, we train the linear classifier for 100 epochs using SGD with learning rate 0.1, momentum 0.9, weight decay 0.0, and early stopping (patience = 10 epochs) on validation loss.
We modified ResNet-18’s initial convolutional layer to accept grayscale input (1 channel instead of 3) to accommodate the single-channel SAR data. Our adaptive batch sizing ensures stable contrastive learning even with limited training data at low k-values, preventing training failures while maintaining meaningful negative sampling.

3.5. Downstream Classification Approaches

To evaluate the quality of learned representations from our SSL framework, we conduct comprehensive downstream classification experiments using both traditional machine learning and deep learning approaches. All classifiers operate on 2048-dimensional feature vectors extracted from the pre-trained SSL encoder network.

3.5.1. Traditional Machine Learning Classifiers

We implement four well-established machine learning algorithms, each configured following standard practices and widely used default hyperparameter settings for high-dimensional feature classification.
Support Vector Machine (SVM): We employ an SVM with linear kernel optimized for high-dimensional features commonly encountered in deep learning representations. The implementation utilizes scikit-learn’s SVC with probability estimation enabled for comprehensive performance analysis.
XGBoost: Our XGBoost configuration implements an ensemble of 100 decision trees. The classifier is configured for multi-class soft probability prediction with 10 output classes and automatic label encoding disabled.
Random Forest: The Random Forest classifier employs 100 estimators. Bootstrap sampling and other ensemble parameters maintain default scikit-learn configurations.
Gradient Boosting: The Gradient Boosting implementation utilizes 100 estimators. The implementation follows scikit-learn’s gradient boosting approach with learning rate, maximum tree depth, and other regularization parameters configured according to the hyperparameter optimization results to prevent overfitting while maintaining classification performance.
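The downstream stage for the traditional classifiers then reduces to fitting standard scikit-learn estimators on the extracted features, sketched below with synthetic stand-in data (the feature matrix and labels here are placeholders, not SAMPLE data; only the SVM and Random Forest configurations described above are shown):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2048))          # stand-in for 2048-dim SSL features
y = np.repeat(np.arange(10), 10)          # stand-in labels: 10 balanced classes

# Linear-kernel SVM with probability estimation enabled, as in the text above.
svm = SVC(kernel="linear", probability=True).fit(X, y)

# Random Forest with 100 estimators and default scikit-learn ensemble settings.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
```

XGBoost and Gradient Boosting plug into the same fit/predict interface with their 100-tree configurations.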

3.5.2. Deep Learning Architectures

We evaluate several state-of-the-art deep neural network architectures, each adapted for SAR imagery classification.
ResNet18: We implement a modified ResNet18 architecture that accepts either raw images or SSL feature vectors as input. The implementation includes adaptive input processing that reshapes 2048-dimensional feature vectors into spatial representations suitable for convolutional processing.
U-Net: We adapt the U-Net encoder-decoder architecture for classification tasks by incorporating global average pooling in the decoder bottleneck. The encoder extracts hierarchical features, while skip connections preserve fine-grained spatial information, making this architecture particularly suitable for SAR imagery, where spatial context is crucial for target recognition.
MobileNet variants (v1 with different width multipliers: 1.0, 0.75, 0.5, 0.25): We implement the MobileNet v1 architecture with different width multipliers to explore the trade-offs between computational efficiency and classification performance. The architecture employs depthwise separable convolutions to reduce computational cost while maintaining feature extraction capability.
EfficientNet variants (B0, B1, B2, B3): Our EfficientNet implementation leverages compound scaling to systematically balance network depth, width, and resolution. We evaluated EfficientNet-B0 through B3 variants. The balanced scaling approach of EfficientNet makes it particularly suitable for SAR applications where computational efficiency is crucial while maintaining high accuracy requirements.
GAN Classifier: We implement a discriminator-based classification approach that leverages generative adversarial network (GAN) principles for target recognition. The GAN classifier consists of a discriminator network trained to distinguish between different target classes rather than real versus synthetic data. This approach utilizes the representational power of adversarial training to learn robust feature discriminations.
CNN: Additionally, we employ the same CNN architecture used for pretext training, adapted to process one-dimensional feature vectors while keeping the architecture parameters unchanged: for downstream classification, 1D convolutions and filters replace their 2D counterparts.
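A PyTorch sketch of this 1D adaptation follows (our own reconstruction; the channel widths mirror the 2D pretext network, while the classification head is reduced to a single linear layer for brevity, which is an assumption on our part):

```python
import torch
import torch.nn as nn

class PretextCNN1D(nn.Module):
    """1D variant of the pretext CNN for 2048-dim feature-vector input:
    the same 16 -> 32 -> 64 -> 128 channel progression, but with Conv1d
    and MaxPool1d replacing their 2D counterparts."""
    def __init__(self, num_classes=10):
        super().__init__()
        chans = [1, 16, 32, 64, 128]
        blocks = []
        for cin, cout in zip(chans, chans[1:]):
            blocks += [nn.Conv1d(cin, cout, 3, padding=1), nn.ReLU(), nn.MaxPool1d(2)]
        self.features = nn.Sequential(*blocks)
        # After four poolings: length 2048 / 16 = 128, with 128 channels.
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(128 * 128, num_classes))

    def forward(self, x):          # x: (batch, 1, 2048)
        return self.head(self.features(x))
```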
Table 3 provides the hyperparameter specifications used to implement the downstream classifiers.

3.6. Experimental Design and Evaluation Methodology

3.6.1. Evaluation Metrics

  • Accuracy: A = (TP + TN) / (TP + TN + FP + FN), representing the overall percentage of correctly classified samples and providing a general measure of classifier effectiveness.
  • Precision: P = TP / (TP + FP), macro-averaged across all classes to account for potential class imbalance effects.
  • Recall: R = TP / (TP + FN), macro-averaged to ensure equal consideration of all target classes regardless of frequency.
  • F1-Score: F1 = 2PR / (P + R), representing the harmonic mean of precision and recall to provide a balanced performance measure.
  • Area Under the Curve (AUC): AUC = ∫₀¹ TPR(FPR⁻¹(t)) dt, providing a threshold-independent measure of classifier discriminative ability across all possible decision boundaries.
  • True Positive Rate at Fixed False Positive Rates: TPR = TP / (TP + FN), evaluated at strategically chosen FPR thresholds of 1%, 5%, 10%, 15%, and 20% to assess performance under varying operational constraints typical in SAR target recognition applications.
  • Pretext Training Time: Total wall-clock time required for SSL pretext tasks training, measured across multiple epochs until convergence.
  • Downstream Classifier Training Time: Training duration for each downstream classifier using extracted feature representations.
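The macro-averaged metrics and the TPR-at-fixed-FPR evaluation listed above can be computed with scikit-learn as follows (a minimal sketch with illustrative inputs):

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score, roc_curve

def macro_metrics(y_true, y_pred):
    """Macro-averaged precision, recall, and F1, treating all classes equally."""
    return (precision_score(y_true, y_pred, average="macro", zero_division=0),
            recall_score(y_true, y_pred, average="macro", zero_division=0),
            f1_score(y_true, y_pred, average="macro", zero_division=0))

def tpr_at_fpr(y_bin, scores, fpr_target):
    """One-vs-rest TPR at a fixed FPR threshold, read off the ROC curve."""
    fpr, tpr, _ = roc_curve(y_bin, scores)
    return float(np.interp(fpr_target, fpr, tpr))
```

For the multi-class case, `tpr_at_fpr` is applied per class in one-vs-rest fashion and the results are averaged.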

3.6.2. Data Availability Scenarios and k-Value Selection

Our experimental framework evaluates performance across varying fractions of measured training data, denoted as k ranging from 0.05 to 1.00, where k represents the percentage of available measured (real) training data used for both pretext and downstream classifier training. The k parameter constitutes a critical experimental design element that enables systematic assessment of SSL framework effectiveness under different data availability constraints commonly encountered in operational SAR ATR scenarios. The significance of k-value analysis extends beyond academic evaluation, providing essential guidance for operational deployment decisions where data acquisition costs, labeling requirements, and rapid deployment constraints directly impact system viability.
While our framework supports comprehensive evaluation across all k-values for traditional machine learning and most deep learning architectures, generative adversarial network (GAN) based classifiers require special consideration at extremely low k-values. At k = 0.05 (5% of training data), the available sample size (43 training samples from the SAMPLE dataset) falls below the minimum requirements for stable GAN training. GANs require adequate sample diversity for stable adversarial training; with so few samples they exhibit several critical issues, including numerical instability due to insufficient batch statistics for BatchNorm layers, gradient instability leading to mode collapse and vanishing gradients, and outright convergence failure. These limitations align with established findings in the GAN literature [40,41], where minimum dataset sizes are necessary for stable training. Consequently, our evaluation presents GAN-based classifier results for k ≥ 0.10 (at least 85 training samples).
The k-value methodology provides valuable insights into the relationship between data availability and model performance across diverse architectural families. This systematic approach enables identification of operational thresholds where specific architectures become viable, quantification of performance degradation under data constraints, and optimization of resource allocation for data collection efforts. For operational SAR ATR systems, k-value analysis directly informs critical deployment decisions, including minimum data collection requirements, expected performance under constrained scenarios, and cost-benefit analysis of additional data acquisition investments. The distribution of data for each k (fraction of real data used) is shown in Table 4.

3.6.3. Cross-Validation Protocol

To ensure robust and generalizable results, we also implement a rigorous cross-validation protocol. The cross-validation experiments validate our findings using independent data splits and provide confidence intervals for performance estimates.
Addressing the inherent data scarcity challenges in SAR ATR applications, our cross-validation methodology employs a conservative approach with reduced data usage to better reflect real-world operational constraints. We implement a 5-fold stratified cross-validation protocol using exclusively real SAR imagery, selecting the first 92 images per class lexicographically to ensure reproducible dataset composition and eliminate potential selection bias. This results in a total dataset of 920 images ( 92 × 10 classes).
The cross-validation architecture differs fundamentally from our k-value experimental framework in several key aspects: (1) Data Volume: Cross-validation uses 920 real images compared to the full dataset of 1345 images (the original train and test sets combined), representing evaluation under data-scarce conditions; (2) Pretext Training: All 920 images are used for self-supervised pretext training without fold splits, ensuring the SSL feature extractor learns from the complete available dataset; (3) Downstream Evaluation: The same 920 images are subjected to 5-fold cross-validation where each fold uses approximately 736 images for training and 184 for testing, with 25% of training data reserved for validation; (4) Statistical Rigor: This protocol provides confidence intervals and standard deviations across folds, offering more reliable performance estimates than the single train-test splits used in the main k-value experiments.
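The downstream partitioning described in point (3) can be sketched with scikit-learn (illustrative only; the seed and shuffling choices here are our own):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

y = np.repeat(np.arange(10), 92)          # 10 classes x 92 images = 920 labels
X = np.zeros((len(y), 2048))              # stand-in for SSL feature vectors

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
folds = []
for train_idx, test_idx in skf.split(X, y):
    # Hold out 25% of each fold's training portion for validation, class-stratified.
    tr_idx, val_idx = train_test_split(
        train_idx, test_size=0.25, stratify=y[train_idx], random_state=0)
    folds.append((tr_idx, val_idx, test_idx))
```

Each fold thus yields roughly 552 training, 184 validation, and 184 test images, with class proportions preserved in every split.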

4. Experimental Results and Analysis

4.1. Overall Performance Analysis

Our comprehensive experimental evaluation encompasses 16 diverse downstream classifier architectures systematically evaluated across four primary computational paradigms: traditional machine learning (SVM, Random Forest, XGBoost, Gradient Boosting), convolutional neural networks (CNN, ResNet18, U-Net), efficient architectures (MobileNet variants with width multipliers 1.0, 0.75, 0.5, 0.25; EfficientNet variants B0–B3), and generative adversarial networks (GAN classifier). This systematic evaluation across values of k (from 0.05 to 1.0) provides comprehensive insights into SSL framework performance under diverse data availability constraints, generating 320 experimental configurations that thoroughly validate our approach across operational scenarios typical of real-world SAR ATR deployments.
The experimental results reveal striking performance patterns that fundamentally challenge conventional assumptions about deep learning superiority in computer vision tasks. Traditional machine learning approaches consistently demonstrate exceptional performance, with SVM achieving a remarkable 99.63% accuracy, 99.63% precision, and 99.66% recall at full training data availability (k = 1.0), accompanied by near-perfect ROC AUC scores of 1.0000 for all target classes. Random Forest maintains robust second-best performance with 99.26% accuracy, 99.30% precision, and 99.22% recall, achieving ROC AUC values between 0.9990 and 1.0000 across all target classes. XGBoost delivers competitive 93.88% accuracy with consistently high per-class discrimination above 0.9930. These results establish traditional ML methods as the optimal choice for SSL-based SAR ATR applications, contradicting the common bias toward deep neural architectures in computer vision research.
Among deep learning architectures, ResNet18 emerges as the top performer with 97.40% accuracy at k = 1.0, achieving exceptional per-class ROC AUC values between 0.9960 and 1.0000 across classes, demonstrating robust discrimination across all target types. CNN provides competitive baseline performance at 94.25% accuracy, while U-Net achieves 84.04% accuracy despite higher computational requirements. The efficient architecture family demonstrates compelling performance-efficiency trade-offs, with EfficientNet B3 reaching peak efficient architecture performance at 95.92% accuracy and per-class ROC AUC values between 0.9733 and 1.0000, EfficientNet B2 achieving 89.42%, EfficientNet B1 attaining 95.18%, and EfficientNet B0 delivering baseline 92.95% accuracy.
MobileNet variants showcase scalability across computational constraints with remarkable performance. Standard MobileNet achieves an exceptional 98.70% accuracy with per-class ROC AUC values ranging from 0.9995 to 1.0000. MobileNet 0.75 reaches 97.77%, MobileNet 0.5 attains 95.73%, and MobileNet 0.25 delivers 92.21% accuracy. GAN-based classifiers demonstrate promising generative learning capabilities with 96.85% accuracy and consistently high per-class discrimination above 0.9785, validating the framework’s applicability to adversarial learning paradigms while maintaining competitive performance across diverse target classes.

4.2. Data Availability Impact and k-Value Analysis

The k-value experimental design provides critical insights into SSL framework robustness under varying data constraints, systematically evaluating performance degradation and recovery patterns across data availability scenarios from extreme scarcity (k = 0.05, 5% of training data with only 43 total samples) to full availability (k = 1.0, 100% with 806 samples). Our systematic evaluation reveals dramatic performance variations that directly inform operational deployment strategies across diverse resource-constrained environments typical of real-world SAR ATR applications.
At extreme data scarcity (k = 0.05), traditional machine learning approaches demonstrate remarkable resilience that far exceeds random chance performance. SVM achieves 51.95% accuracy under these severe constraints with per-class ROC AUC values ranging from 0.7015 to 0.9970, while Random Forest maintains 46.01% accuracy with consistent discrimination above 0.5385 across all target classes. These performance levels remain operationally relevant for preliminary target screening applications, demonstrating the framework’s capability to extract meaningful features even from minimal training data. In contrast, deep learning architectures exhibit significant performance degradation, with ResNet18 achieving only 24.86% accuracy and CNN reaching 27.27% accuracy at k = 0.05.
The performance trajectory from k = 0.05 to k = 1.0 reveals distinct architectural families’ sensitivity to data availability. Traditional ML methods exhibit graceful degradation under data constraints while maintaining consistent relative performance across all k-values, with SVM improving from 51.95% to 99.63% accuracy and Random Forest advancing from 46.01% to 99.26%. Deep learning architectures show higher variance at low k-values but demonstrate rapid improvement as data availability increases: ResNet18 advancing from 24.86% accuracy at k = 0.05 to 97.40% at k = 1.0, and CNN improving from 27.27% to 94.25%. EfficientNet variants display particularly interesting scaling behavior, with B3 improving from 15.96% at k = 0.05 to 95.92% at k = 1.0, while B0 advances from 19.48% to 92.95%, indicating architectural depth’s relationship to data requirements. The accuracy on the full test set in varying training data availability scenarios is shown in Figure 3, while the accuracy curves with different k values for top downstream classifiers are depicted in Figure 4.

4.3. Computational Efficiency and Operational Metrics

The timing analysis reveals highly favorable computational characteristics essential for operational deployment across diverse hardware configurations and real-time processing requirements. SSL feature extraction from the trained pretext model incurs a computational cost of 15.35 ms per image. Inference times with the extracted features vary dramatically across architectural families, enabling precise performance-efficiency optimization. Traditional ML methods achieve exceptional computational efficiency: Gradient Boosting requires only 0.01 ms per image, XGBoost needs 0.02 ms per image, Random Forest demands 0.05 ms per image, and SVM requires 0.08 ms per image despite its superior accuracy. Deep learning architectures require 0.05–0.37 ms per image, with ResNet18 achieving a competitive 0.13 ms per image, CNN requiring 0.05–0.06 ms per image, and U-Net demanding 0.35–0.37 ms per image due to its complex encoder-decoder architecture.
Efficient architectures maintain competitive timing profiles optimized for resource-constrained deployments. EfficientNet variants require 0.16–0.28 ms per image across B0–B3 configurations, with B0 achieving 0.16 ms per image and B3 reaching 0.27–0.28 ms per image, demonstrating excellent scalability. MobileNet variants need 0.05–0.11 ms per image depending on the width multiplier, with the 0.25 width multiplier variant achieving a fast 0.09 ms per image while maintaining 92.21% accuracy. GAN classifiers provide balanced performance at 0.08–0.09 ms per image, making them suitable for specialized applications requiring generative model capabilities.
Total processing times range from 15.36 ms (Gradient Boosting: 15.35 ms feature extraction and 0.01 ms inference) to 15.72 ms (U-Net: 15.35 ms feature extraction and 0.37 ms inference), enabling real-time processing capabilities for operational SAR ATR systems. The minimal variance in total processing time (±0.36 ms) across architectural families confirms that feature extraction dominates computational cost, making downstream classifier selection primarily a performance consideration rather than a computational constraint. This characteristic enables deployment flexibility where high-accuracy classifiers like SVM (99.63% accuracy, 15.43 ms total) can be deployed when computational resources permit, while fast alternatives like Gradient Boosting (92.76% accuracy, 15.36 ms total) provide viable options for extreme real-time constraints or battery-powered platforms.

4.4. Per-Class Performance Analysis

Class-wise ROC AUC analysis reveals exceptional discrimination capability across all target types, with particularly strong performance for specific vehicle classes that demonstrate distinct feature characteristics in the SSL representation space. The per-class analysis provides critical insights into target-specific recognition capabilities and potential operational deployment considerations for diverse mission requirements.
SVM achieves near-perfect per-class discrimination with ROC AUC values demonstrating exceptional consistency. This performance demonstrates the SSL framework’s ability to learn discriminative features for diverse target types, from tracked tanks to wheeled vehicles. Random Forest maintains robust per-class performance with ROC AUC values ranging from 0.9990 to 1.0000 across all target classes. The minimal performance variation indicates consistent feature quality across vehicle types. XGBoost demonstrates reliable per-class discrimination with ROC AUC values consistently above 0.9941. ResNet18 exhibits competitive per-class discrimination with ROC AUC values between 0.9974 and 1.0000, achieving perfect discrimination for Classes 5 (M35) and 6 (M548).
Class-specific insights reveal interesting patterns in target discrimination. Class 6 (M548 tracked carrier) exhibits perfect discrimination (ROC AUC = 1.0000) across all top-performing classifiers (SVM, Random Forest, XGBoost), indicating highly distinctive SSL-learned features that separate this tracked utility vehicle from combat platforms. Classes 3 (M1 tank) and 5 (M35 truck) consistently achieve perfect or near-perfect discrimination, suggesting that main battle tanks and wheeled logistics vehicles possess easily distinguishable radar signatures in the learned feature space. The minimal interclass performance variation (typically within 0.001–0.002 ROC AUC) across top classifiers demonstrates the SSL framework’s balanced feature learning capability for diverse military vehicle types encountered in operational SAR ATR scenarios.

4.5. Operational Performance Thresholds

The analysis of True Positive Rate (TPR) at fixed False Positive Rate (FPR) thresholds provides operationally critical performance metrics that directly inform deployment decisions across diverse mission requirements and operational constraints. These metrics represent real-world performance under varying tolerance levels for false alarms, allowing precise system configuration for specific operational scenarios from high-precision reconnaissance to rapid threat assessment.
At the stringent 1% FPR threshold, representing high-precision applications where false alarms carry significant operational costs, SVM achieves an exceptional 99.89% TPR, demonstrating near-perfect target detection with minimal false alarm rates suitable for critical decision-making scenarios. Random Forest attains 99.65% TPR at this threshold, while XGBoost reaches 96.10% TPR, and ResNet18 achieves 98.70% TPR. These performance levels indicate excellent discrimination capability suitable for high-stakes applications such as threat identification in densely populated areas or precision targeting scenarios where misclassification consequences are severe.
At the moderate 5% FPR threshold, providing operational flexibility for scenarios tolerating moderately higher false alarm rates in exchange for enhanced target detection, SVM, Random Forest, and XGBoost all achieve perfect 100% TPR, while ResNet18 reaches 99.26%. MobileNet standard also achieves perfect 100% TPR, while EfficientNet B3 reaches 99.44% and Gradient Boosting attains 97.40%. This threshold represents the optimal operating point for many operational scenarios where some false alarms are acceptable to ensure comprehensive target detection.
The 10% FPR threshold demonstrates robust detection performance under permissive operational constraints. Perfect 100% TPR is achieved by SVM, XGBoost, ResNet18, MobileNet standard, and multiple EfficientNet variants. CNN reaches 98.33%, U-Net achieves 97.96%, and even challenging scenarios maintain TPR above 94%. The consistent achievement of 100% TPR at this threshold across top-tier classifiers demonstrates robust detection performance suitable for surveillance applications where comprehensive area monitoring is prioritized over precision.
At higher FPR thresholds (15% and 20%), all classifiers achieve TPR values above 98%, with many reaching perfect 100% detection. SVM maintains perfect 100% TPR at both 15% and 20% FPR thresholds, as do Random Forest, XGBoost, and most deep learning architectures. This performance consistency across varying operational constraints validates the SSL framework’s robustness and provides confidence for deployment across diverse mission requirements.
Operational deployment guidelines emerge from threshold analysis. For high-precision applications requiring minimal false alarms (≤1% FPR), SVM provides optimal performance, making it ideal for critical identification tasks, precision targeting, or operations in sensitive areas where false positives have severe consequences. For balanced operational requirements (5% FPR), multiple classifiers achieve perfect or near-perfect TPR, enabling selection based on computational constraints, with SVM and Random Forest providing optimal accuracy while Gradient Boosting offers fast processing for time-critical applications. For surveillance and area monitoring scenarios tolerating higher false alarm rates (≥10% FPR), the framework provides exceptional flexibility with multiple high-performance options enabling deployment optimization based on hardware limitations, power constraints, or specific mission requirements rather than accuracy considerations.

4.6. Cross-Validation Results and Statistical Validation

Our rigorous cross-validation methodology employs a conservative 5-fold class-proportionate approach using exclusively real SAR imagery. To ensure reproducible dataset composition and eliminate potential selection bias, the first 92 images per class are selected lexicographically. This results in a total dataset of 920 images ( 92 × 10 classes), representing a constrained data regime characteristic of operational SAR ATR scenarios, where labeled training data are inherently limited and costly to acquire. Unlike the main k-value experiment, this cross-validation setup uses the full 920-image dataset for SSL pretext training while subjecting the same data to a rigorous 5-fold class-proportionate partitioning for downstream classification evaluation.
The cross-validation results demonstrate exceptional consistency and statistical significance across all evaluated classifiers, providing robust validation of our main experimental findings. SVM maintains the best and most consistent performance with 99.13% mean accuracy across folds, indicating exceptional stability across different data partitions and confirming its position as the optimal classifier for SSL-based SAR ATR. The narrow confidence interval demonstrates statistical significance of performance differences between methods and validates the reliability of SVM’s superiority across diverse data configurations.
Random Forest achieves a robust 96.96% mean accuracy, while ResNet18 delivers a competitive 96.52% mean performance among deep learning approaches. The cross-validation results for additional architectures demonstrate consistent performance patterns: XGBoost reaches 95.65% accuracy, Gradient Boosting achieves 94.13%, and efficient architectures maintain their relative performance hierarchies observed in the experiments. CNN attains 90.22% accuracy, U-Net reaches 87.93%, and MobileNet variants achieve 93.37–95.54% accuracy depending on width multiplier configuration.
Key findings from cross-validation analysis provide compelling evidence for framework robustness. First, consistent high performance across multiple classifiers (99.13% for SVM, 96.96% for Random Forest, 96.52% for ResNet18) indicates results are not attributable to classifier-specific optimization but reflect genuine feature quality from our SSL framework. Second, low standard deviations across all experiments demonstrate reproducibility and reliability, confirming that performance variations are minimal across different data partitions. Third, traditional machine learning approaches exhibit exceptional stability compared to deep learning models while maintaining competitive mean performance, reinforcing their suitability for operational deployment where consistency is crucial.
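As a minimal illustration of the fold-level reporting used here, the snippet below computes a mean, standard deviation, and t-based 95% confidence interval from per-fold accuracies; the fold values shown are placeholders chosen to average to 99.13%, not the paper's actual per-fold numbers:

```python
import statistics
from math import sqrt


def summarize_folds(accs, t_crit=2.776):
    """Mean, sample std. dev., and t-based 95% CI for per-fold accuracies.

    t_crit defaults to t_{0.975, df=4}, appropriate for 5 folds.
    """
    mean = statistics.mean(accs)
    sd = statistics.stdev(accs)
    half = t_crit * sd / sqrt(len(accs))
    return mean, sd, (mean - half, mean + half)


# Illustrative placeholder fold accuracies (mean = 0.9913)
mean, sd, ci = summarize_folds([0.9891, 0.9946, 0.9891, 0.9946, 0.9891])
```

A small `sd` yields the narrow confidence interval discussed above, which is what supports the claim that inter-method differences are statistically meaningful.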
The cross-validation framework validates that results generalize reliably across different data splits, providing confidence in deployment scenarios where data composition may vary from training conditions. The consistent performance patterns between main k-value experiments and cross-validation results demonstrate framework robustness and validate our SSL approach’s effectiveness across diverse experimental protocols, supporting the reliability of our findings for real-world SAR ATR applications where data availability and composition constraints are common operational challenges.

4.7. Comprehensive Architecture Performance Analysis

The systematic evaluation across 16 downstream classifier architectures reveals distinct performance patterns that inform optimal deployment strategies. Traditional ML classifiers demonstrate remarkable consistency: SVM achieves 99.63% accuracy with exceptional precision (99.63%) and recall (99.66%), Random Forest maintains 99.26% accuracy with balanced precision (99.30%) and recall (99.22%), XGBoost delivers competitive 93.88% accuracy, and Gradient Boosting provides rapid inference time of 0.01 ms with 92.76% accuracy.
Deep learning architectures exhibit varied performance characteristics. CNN achieves 94.25% accuracy with moderate computational requirements (0.06 ms inference on extracted features), ResNet18 delivers top deep learning performance at 97.40% accuracy (0.13 ms inference on extracted features), and U-Net provides specialized capabilities with 84.04% accuracy despite higher computational cost (0.37 ms inference on extracted features). The efficient architecture family demonstrates compelling performance-efficiency trade-offs: EfficientNet B3 reaches peak efficient performance at 95.92% accuracy, EfficientNet B2 achieves 89.42%, EfficientNet B1 attains 95.18%, and EfficientNet B0 delivers baseline 92.95% performance.
MobileNet variants showcase scalability across computational constraints: standard MobileNet achieves exceptional 98.70% accuracy, MobileNet 0.75 reaches 97.77%, MobileNet 0.5 attains 95.73%, and MobileNet 0.25 delivers 92.21%. GAN-based classifiers demonstrate promising generative learning capabilities with 96.85% accuracy, validating the framework’s applicability to adversarial learning paradigms.

4.8. Contrastive Learning Baseline: SimCLR Performance Analysis

To establish a rigorous comparison with established contrastive learning methods, we evaluated SimCLR under identical experimental conditions (same dataset and k-value range, 0.05–1.00). SimCLR represents a fundamentally different SSL paradigm: while our multi-task approach learns through explicit transformation classification, SimCLR employs instance discrimination with contrastive loss, making this comparison particularly informative for understanding optimal SSL strategies for SAR ATR.
The SimCLR baseline achieves 92.02% accuracy at full training data availability (k = 1.00), demonstrating that contrastive learning produces viable representations for SAR target recognition. However, this performance falls substantially short of our multi-task SSL framework, which achieves 99.63% accuracy (SVM), 99.26% (Random Forest), and 97.40% (ResNet18) under identical conditions. The performance gap persists across all k-values, with SimCLR reaching only 15.77% accuracy at k = 0.05 compared to 51.95% for our approach. Comprehensive performance metrics, including accuracy, precision, recall, and F1-score across all k-values, are provided in Appendix A Table A5 (SimCLR Performance Metrics). Results reveal that SimCLR requires approximately k ≈ 0.40 (327 training samples) to achieve performance comparable to our method at k = 0.25 (203 samples), indicating superior data efficiency for task-based SSL in the SAR domain.
The performance differential can be attributed to several domain-specific factors. First, SAR imagery exhibits structured transformations (rotations, flips) that are semantically meaningful for target recognition, making explicit task-based learning more effective than generic instance discrimination. Second, contrastive learning relies heavily on negative sampling diversity, requiring large batch sizes for optimal performance [42]; our SimCLR adaptation uses batch size 64 due to dataset constraints, potentially limiting contrastive signal strength. Third, the limited visual variability within SAR target classes (compared to natural images) may reduce the effectiveness of view-based contrastive learning, as augmented views may not provide sufficient discrimination signal.
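For reference, the NT-Xent objective underlying the SimCLR baseline can be sketched in NumPy as follows; the batch layout and temperature are illustrative, and SimCLR itself applies this loss to encoder embeddings of paired augmented views:

```python
import numpy as np


def nt_xent(z, temperature=0.5):
    """NT-Xent (normalized temperature-scaled cross-entropy) loss.

    z: (2N, d) embeddings, where rows i and i+N are two augmented
    views of the same underlying sample.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalize rows
    sim = z @ z.T / temperature                       # scaled cosine similarity
    n = z.shape[0] // 2
    np.fill_diagonal(sim, -np.inf)                    # exclude self-similarity
    # The positive of row i is its other augmented view.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    return float(np.mean(logsumexp - sim[np.arange(2 * n), pos]))
```

Because each row's loss is a softmax over all other embeddings in the batch, the strength of the contrastive signal scales with batch size, which is why the batch-size-64 constraint noted above can limit SimCLR's performance.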

4.9. Task Ablation Study and Pretext Task Analysis

To address fundamental questions about task contribution and framework complexity, we conducted comprehensive ablation studies systematically evaluating 15 task combinations across all 16 downstream classifiers at full data availability (k = 1.00) for our best-performing classifier (SVM). The nine pretext tasks were organized into four semantic groups: T0_Original (identity transformation), T1–T5_Geometric (rotations of 90°, 180°, and 270°; horizontal and vertical flips), T6–T7_SignalQuality (denoise, blur), and T8_MultiScale (zoom). We evaluated individual groups, two-group combinations, and three-group combinations to identify optimal task selection strategies and quantify individual task contributions. The findings of the ablation studies are summarized in Table 5.
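Assuming standard image-processing primitives, the nine pretext transformations can be sketched as follows; the filter sizes and zoom factor are illustrative choices, not necessarily the paper's exact hyperparameters:

```python
import numpy as np
from scipy import ndimage


def pretext_tasks(img):
    """Return {task label: transformed image} for 9-way pretext classification.

    img: 2-D array (single-channel SAR chip). Output images keep the
    input shape so a single classifier head can consume any of them.
    """
    h, w = img.shape
    # Zoom then center-crop back to the original size (illustrative factor 1.5).
    zoomed = ndimage.zoom(img, 1.5, order=1)
    y, x = (zoomed.shape[0] - h) // 2, (zoomed.shape[1] - w) // 2
    return {
        "T0_original": img,
        "T1_rot90":   np.rot90(img, 1),
        "T2_rot180":  np.rot90(img, 2),
        "T3_rot270":  np.rot90(img, 3),
        "T4_hflip":   np.fliplr(img),
        "T5_vflip":   np.flipud(img),
        "T6_denoise": ndimage.median_filter(img, size=3),   # speckle suppression
        "T7_blur":    ndimage.gaussian_filter(img, sigma=1.0),
        "T8_zoom":    zoomed[y:y + h, x:x + w],
    }
```

During pretext training, the encoder sees each transformed chip and must predict which of the nine labels produced it, which is what forces the geometric and signal-robustness invariances discussed above.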
The ablation study reveals that geometric transformation tasks (T1–T5) are unequivocally the most critical component, achieving 96.22% average accuracy across all downstream classifiers independently, substantially outperforming all other task groups. Individual task group performance demonstrates a clear hierarchy: T1–T5_Geometric (96.22%), T0_Original (89.46%), T8_MultiScale (88.71%), and T6–T7_SignalQuality (59.80%). The geometric tasks’ superiority stems from SAR domain-specific characteristics: radar target recognition inherently requires aspect-angle invariance, and rotation/flip transformations directly address arbitrary viewing angles in overhead imaging while preserving electromagnetic scattering properties and target structure.
The ablation study identifies tasks contributing minimally: T6–T7_SignalQuality alone achieves only 59.80% accuracy, indicating poor standalone discrimination. However, when combined with geometric tasks, signal quality tasks contribute positively (T1–T5 + T6–T7 achieves the best performance), suggesting complementary robustness rather than primary discriminative features. Similarly, T8_MultiScale (88.71% alone) provides moderate independent contribution but enhances specific architectures when combined with geometric tasks.
The task contribution analysis for the task groups is depicted in Figure 5. As shown in the heatmap, T0_Original, T1–T5_Geometric, and T8_MultiScale maintain consistently high scores (0.981–0.994) across accuracy, precision, and F1-score, demonstrating robust discriminative capacity. In stark contrast, T6–T7_SignalQuality exhibits substantially degraded performance (0.720–0.732), confirming its limited standalone contribution while validating its role as a complementary robustness enhancer when combined with geometric transformations. The near-uniform performance of geometric tasks across all metrics (0.981–0.982) reinforces their position as the primary feature learning mechanism, while the marginal difference between T8_MultiScale (0.992–0.993) and the geometric tasks suggests that scale invariance provides incremental rather than fundamental improvements to the learned representations. Overall, the 99.63% accuracy achieved when all nine tasks are used together indicates that every task contributes to final performance.

4.10. Baseline Comparison and State-of-the-Art Performance

Our SSL framework demonstrates significant improvements over synthetic data approaches reported by Lewis et al. [3], achieving superior performance using exclusively measured data compared to their hybrid synthetic-measured approaches across all data availability scenarios. The elimination of synthetic data dependency while maintaining exceptional performance (99.63% accuracy for SVM) represents a fundamental advancement over previous methods that struggled with domain gap limitations.
The comparative analysis reveals our approach’s superiority across multiple dimensions: (1) elimination of synthetic data requirements removes domain gap constraints that limited previous approaches; (2) exclusive use of real SAR imagery ensures operational relevance and eliminates modeling assumptions inherent in synthetic generation; (3) superior performance metrics across diverse architectural families demonstrate framework generalizability; (4) robust performance under extreme data constraints (51.95% accuracy at k = 0.05) provides viable solutions for rapid deployment scenarios.
Our SSL framework substantially outperforms prior work across all evaluation metrics. While Lewis et al. [3] achieved 94.5% accuracy using CNN-based models on full measured datasets, our approach delivers superior performance across diverse architectural families. Table 6 presents comprehensive results for all classifiers achieving ≥94.50% accuracy at k = 1.00, demonstrating the framework’s effectiveness across traditional machine learning (SVM: 99.63%, Random Forest: 99.26%), efficient deep architectures (MobileNet variants: 95.73–98.70%, EfficientNet: 95.18–95.92%), standard deep networks (ResNet18: 97.40%), and generative models (GAN Classifier: 96.85%).
The results establish clear performance hierarchies: traditional ML approaches achieve exceptional accuracy when leveraging SSL-extracted features, with SVM demonstrating state-of-the-art performance (99.63%) that substantially exceeds previous benchmarks. Deep learning architectures maintain competitive performance, with ResNet18 leading at 97.40% accuracy, while efficient architectures provide compelling accuracy-efficiency trade-offs suitable for resource-constrained deployment scenarios.
Figure 6 presents ROC curves for the highest-performing classifiers at k = 1.00, demonstrating exceptional discrimination capabilities across all target classes. The visualization reveals near-perfect performance characteristics with AUC values approaching 1.0000 for SVM, Random Forest, ResNet18, and MobileNet variants, validating the superior quality of SSL-extracted features for SAR target recognition.
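The TPR-at-fixed-FPR operating points reported alongside the ROC analysis can be read off a ROC curve as sketched below; this uses scikit-learn with a linear interpolation between ROC points, and the helper name is ours:

```python
import numpy as np
from sklearn.metrics import roc_curve


def tpr_at_fpr(y_true, scores, target_fpr=0.01):
    """True positive rate at a fixed false positive rate operating point.

    y_true: binary labels (one-vs-rest for a given target class);
    scores: classifier confidence for the positive class.
    """
    fpr, tpr, _ = roc_curve(y_true, scores)
    # Linearly interpolate the (monotone) ROC curve at the requested FPR.
    return float(np.interp(target_fpr, fpr, tpr))
```

For multi-class SAR ATR, this would be applied per class in a one-vs-rest fashion and the resulting TPRs averaged, mirroring the TPR@FPR columns in the appendix tables.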
To visualize the quality of learned representations, we employ t-SNE (t-distributed Stochastic Neighbor Embedding) dimensionality reduction to project the 2048-dimensional SSL-extracted features into 2D space. Figure 7 presents the t-SNE visualization for the test set, revealing distinct class clusters with minimal overlap. The clear separation between target classes demonstrates that our multi-task SSL framework learns semantically meaningful features that naturally group similar targets while maintaining discriminative boundaries between classes. SVM performs very well because the 2048-dimensional SSL embeddings are close to linearly separable; a linear SVM maximizes inter-class margins in high dimensions with strong implicit regularization, which is advantageous in low-sample measured SAR. In contrast, tree ensembles and deep heads exhibit higher variance and tend to overfit speckle/idiosyncratic artifacts on limited data, while SVM leverages the global, geometry-aligned cues captured by the pretext encoder.
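A minimal sketch of the t-SNE projection step, assuming scikit-learn and an illustrative perplexity setting:

```python
import numpy as np
from sklearn.manifold import TSNE


def project_2d(features, seed=0):
    """Project (n_samples, n_features) SSL embeddings to 2-D for plotting."""
    tsne = TSNE(n_components=2, perplexity=30, random_state=seed, init="pca")
    return tsne.fit_transform(features)
```

The resulting (n_samples, 2) array can be scatter-plotted with one color per target class; well-separated clusters in this projection are consistent with the near-linear separability that favors the SVM head.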
Comparison with SAR-Specific SSL Methods: Pei et al. [14] achieved 90.71% (using 10% of MSTAR data) and 99.34% (using 30% of MSTAR data) using SimCLR-style contrastive learning with ResNet50. Li et al. [15] introduced SAR-JEPA with predictive gradient-based embeddings, demonstrating robust low-data performance on MSTAR. SARATR-X [16] employs Vision Transformer pretraining on 180K samples from 14 datasets, while SAFE [18] uses masked Siamese ViT with self-distillation. These foundation models require millions of parameters, extensive multi-dataset curation, and substantial computational resources. Our lightweight CNN (97 K parameters, 2–3 min single-GPU training) achieves 99.63% accuracy with only 806 samples on the SAMPLE dataset, which is derived from MSTAR, demonstrating that physics-informed task design provides exceptional performance with dramatically lower cost and complexity compared to large-scale foundation approaches.

4.11. Detailed Results

Comprehensive experimental results are provided in the appendix tables. Appendix A Table A1 (Model Performance Metrics and Timing Results for k-value Experiment) presents detailed performance metrics and timing results for all 16 downstream classifiers across 20 different k-values (0.05–1.00), including accuracy, precision, recall, F1-score, ROC AUC, TPR at various FPR thresholds, training time, and inference time. Appendix A Table A2 (Class-wise ROC AUC Results for k-value Experiment) provides class-wise ROC AUC results for the k-value experiments, showing per-class discrimination performance for all 10 target classes. Appendix A Table A3 (Model Performance Metrics and Timing Results for Cross Validation Experiment) contains cross-validation performance metrics as well as the timing results (average of the fold metrics). Appendix A Table A4 (Class-wise ROC AUC Results for Cross Validation Experiment) presents class-wise ROC AUC results from the cross-validation experiments (average of the fold metrics). Appendix A Table A5 (SimCLR Performance Metrics) provides complete SimCLR baseline performance metrics across all k-values for comparison with our multi-task SSL approach.

5. Conclusions and Future Work

This work presents a comprehensive self-supervised learning framework for SAR automatic target recognition that fundamentally advances the state-of-the-art through systematic multi-task pretext training and extensive architectural evaluation. Our approach successfully eliminates dependency on synthetic data while achieving exceptional performance across diverse computational paradigms, providing critical insights for operational SAR ATR deployment through rigorous experimental validation on the SAMPLE dataset with 16 downstream classifiers spanning traditional machine learning, deep neural networks, efficient architectures, and generative models.

5.1. Key Findings and Contributions

The experimental results reveal fundamental insights that challenge conventional assumptions about deep learning superiority in SAR applications. Traditional machine learning approaches demonstrate exceptional effectiveness when combined with SSL-extracted features, with SVM achieving remarkable 99.63% accuracy, 99.63% precision, and 99.66% recall alongside near-perfect per-class ROC AUC values (1.0000 for six target classes, ≥0.9999 for four classes). Random Forest maintains the second-best performance with 99.26% accuracy and consistent per-class discrimination of 0.9960–1.0000, while XGBoost achieves competitive 93.88% accuracy with per-class ROC AUC above 0.9930. Among deep learning architectures, ResNet18 emerges as the top performer with 97.40% accuracy and exceptional per-class ROC AUC values of 0.9974–1.0000, demonstrating that convolutional approaches remain highly effective for SSL-based SAR feature classification.
The k-value experimental design provides insights into framework robustness under varying data constraints, systematically evaluating performance from extreme scarcity (k = 0.05, 43 samples) to full availability (k = 1.0, 806 samples). Traditional ML methods exhibit remarkable resilience at extreme data scarcity, with SVM achieving 51.95% accuracy and Random Forest reaching 46.01% accuracy at k = 0.05, far exceeding random chance performance (10%) and maintaining operationally relevant capabilities for preliminary target screening. Deep learning architectures show higher sensitivity to data availability but demonstrate rapid improvement as data increases, with ResNet18 advancing from 24.86% accuracy at k = 0.05 to 97.40% at k = 1.0, indicating architectural depth’s relationship to data requirements.
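The k-value protocol of drawing a class-proportionate fraction of the training samples can be sketched as follows (function and variable names are illustrative):

```python
import numpy as np


def subsample_fraction(labels, k, seed=0):
    """Return sorted indices of a class-proportionate fraction k of the data.

    Each class keeps round(k * class_size) samples (at least 1), so
    class balance is preserved even at extreme scarcity (e.g., k = 0.05).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    keep = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        n = max(1, int(round(k * len(idx))))
        keep.extend(rng.choice(idx, size=n, replace=False))
    return np.sort(np.array(keep))
```

Sweeping k from 0.05 to 1.00 with this kind of stratified draw reproduces the data-scarcity axis of the experiment while keeping the ten classes balanced at every operating point.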
Computational efficiency analysis reveals highly favorable operational characteristics essential for real-world deployment. SSL feature extraction dominates processing time at 15.35 ms per image consistently across all classifiers, while inference times range from 0.01 ms (Gradient Boosting) to 0.37 ms (U-Net), enabling total processing times of 15.36–15.72 ms/image and supporting real-time operation at 58–65 Hz frame rates. Operational performance metrics demonstrate exceptional discrimination capability with SVM achieving 99.89% TPR at 1% FPR and perfect 100% TPR at 5% FPR, providing flexible deployment options across diverse mission requirements from high-precision reconnaissance to rapid threat assessment scenarios.

5.2. Theoretical and Practical Implications

The superior performance of traditional ML methods on SSL-extracted features suggests that the learned representations exhibit favorable linear separability characteristics, indicating that complex non-linear transformations may be unnecessary for effective SAR target discrimination when appropriate feature representations are available. This finding has significant practical implications for operational deployment, where traditional ML approaches offer computational efficiency advantages, interpretability benefits, and reduced training complexity compared to deep neural networks while maintaining state-of-the-art accuracy.
The framework’s elimination of synthetic data dependency addresses a critical limitation in SAR ATR research, where domain gap challenges between synthetic and measured imagery have consistently hindered practical deployment. By achieving exceptional performance using exclusively measured SAR imagery, our approach provides a viable path toward operational systems that can be deployed without extensive synthetic data generation infrastructure or domain adaptation techniques.
Cross-validation experiments using a conservative 5-fold protocol with 920 real images validate framework robustness, with SVM achieving 99.13% mean accuracy and Random Forest maintaining 96.96% accuracy. The narrow confidence intervals demonstrate statistical significance and confirm that performance differences between methods reflect genuine capabilities rather than dataset-specific artifacts, providing confidence for deployment across varied operational scenarios.

5.3. Limitations and Methodological Considerations

Several limitations warrant acknowledgment for comprehensive evaluation. Our experimental validation is constrained to the SAMPLE dataset (14–17° elevation angles, 10–80° azimuth range) with 1345 images, representing a specific operational scenario that has yet to be tested across diverse SAR sensor configurations, environmental conditions, or significantly larger target inventories encountered in comprehensive military applications. The framework’s performance across different radar frequencies, polarizations, and operational environments requires systematic investigation to establish generalizability bounds.
The k-value experimental design necessarily focuses on data availability scenarios where k ≥ 0.10 for GAN-based classifiers due to minimum sample requirements for stable adversarial training. While this methodological choice ensures reliable results and prevents training artifacts, it limits insights into extremely low-data scenarios (k < 0.10) for generative approaches. Additionally, while our computational analysis indicates promising timing characteristics, real-time deployment considerations, including hardware optimization, system integration overhead, and edge computing constraints, require empirical validation across diverse operational platforms.

5.4. Future Research Directions

Several promising research directions emerge from this foundation that could significantly extend framework capabilities and operational applicability. Multi-modal integration incorporating multi-polarization, multi-frequency, and multi-temporal SAR data represents a natural extension that could enhance recognition capabilities while maintaining the synthetic data independence that characterizes our approach. SSL-based domain adaptation techniques could address operational condition variations and sensor differences without requiring extensive paired datasets.
Developing incremental learning frameworks that adapt to new target classes without full retraining would improve operational flexibility in dynamic threat environments where new vehicle types or configurations emerge continuously. Advanced pretext tasks design incorporating radar phenomenology principles, such as polarimetric signatures, coherence patterns, or aspect-dependent scattering behaviors, could further improve feature representation quality while maintaining computational efficiency.
Federated learning approaches could enable collaborative model development across multiple operational sites while preserving data privacy and security requirements critical for defense applications. Extension to sequential SAR data and video SAR analysis could leverage temporal information for enhanced recognition performance, particularly for moving targets or complex scenarios requiring temporal context.
Investigation of foundation model architectures specifically designed for radar data could establish large-scale pretraining frameworks that leverage diverse SAR datasets while maintaining the measured data focus that characterizes our current approach. Integration with modern transformer architectures and attention mechanisms could enable more sophisticated feature learning while preserving computational efficiency requirements for operational deployment.
As SAR technology evolves with advanced sensor capabilities, the principles established in this work provide a framework for developing adaptive, efficient automatic target recognition systems. The elimination of synthetic data dependency, combined with exceptional performance characteristics and computational efficiency, positions this approach as a foundation for next-generation SAR ATR applications, contributing to advances in remote sensing, computer vision, and artificial intelligence that extend beyond defense applications to scientific remote sensing, environmental monitoring, and civilian surveillance systems.

Author Contributions

Conceptualization, M.A.S. and D.F.N.; Data curation, D.F.N.; Formal analysis, M.A.S.; Funding acquisition, D.F.N.; Investigation, M.A.S.; Methodology, M.A.S. and D.F.N.; Project administration, D.F.N. and M.N.; Resources, D.F.N., M.N. and J.F.K.; Software, M.A.S.; Supervision, D.F.N., M.N. and J.F.K.; Validation, M.A.S. and D.F.N.; Visualization, M.A.S.; Writing—original draft, M.A.S.; Writing—review & editing, D.F.N., M.N. and J.F.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the funds provided by the National Science Foundation and by DoD OUSD (R&E) under Cooperative Agreement PHY-2229929 (The NSF AI Institute for Artificial and Natural Intelligence).

Data Availability Statement

The SAMPLE dataset used in this research is publicly available at https://github.com/benjaminlewis-afrl/SAMPLE_dataset_public (accessed on 8 December 2025). The implementation code and experimental configurations are available at https://github.com/MdAlSiam/ssl-sar-atr-2-v2/ (accessed on 8 December 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
SAR: Synthetic Aperture Radar
ATR: Automatic Target Recognition
SSL: Self-Supervised Learning
CNN: Convolutional Neural Network
SVM: Support Vector Machine
RF: Random Forest
XGBoost: Extreme Gradient Boosting
GB: Gradient Boosting
ROC: Receiver Operating Characteristic
AUC: Area Under the Curve
TP: True Positive
TN: True Negative
FP: False Positive
FN: False Negative
TPR: True Positive Rate
FPR: False Positive Rate
MSTAR: Moving and Stationary Target Acquisition and Recognition
SAMPLE: Synthetic and Measured Paired and Labeled Experiment
GAN: Generative Adversarial Network
ViT: Vision Transformer

Appendix A

Table A1. Model Performance Metrics and Timing Results (k-value experiment).
Pretext ModelDownstream ModelkAccuracyPrecisionRecallF1 ScoreROC AUCTPR@FPR = 0.01TPR@FPR = 0.05TPR@FPR = 0.10TPR@FPR = 0.15Train Time (s)Test Feat. Ext. (ms/img)Inference Time (ms/img)
CNNCNN0.050.27270.46420.26040.24520.62100.02090.10470.31730.35440.4315.350.06
CNNCNN0.100.41740.57000.41110.38680.72180.10960.37660.46750.53990.7815.340.05
CNNCNN0.150.33580.42920.33040.28280.62360.03000.22630.39330.44161.1715.340.05
CNNCNN0.200.45450.57820.45590.45390.70830.05900.42120.53990.56771.5815.450.05
CNNCNN0.250.50460.66180.49980.50370.78490.03270.47870.62340.67901.9115.360.05
CNNCNN0.300.47120.60300.46590.47060.71660.04830.43410.53990.57702.3115.380.05
CNNCNN0.350.46940.67950.47170.47600.68930.03360.41000.55470.56962.6015.310.05
CNNCNN0.400.55470.58120.54740.53830.82660.18280.55100.64560.70693.0315.360.05
CNNCNN0.450.52500.56310.51820.51250.74410.10300.51390.59740.62893.4915.390.05
CNNCNN0.500.63450.71540.62540.62730.84240.21570.66420.72540.75883.6815.340.06
CNNCNN0.550.59550.68540.59350.59670.78720.05970.61970.68090.70324.1215.280.05
CNNCNN0.600.67900.74430.67070.66630.88130.32650.72910.78290.80894.6015.290.05
CNNCNN0.650.65680.73090.64880.65090.84920.22820.70130.75510.77554.9615.350.05
CNNCNN0.700.74030.76830.73720.73580.90420.38030.80710.84230.85715.3015.360.05
CNNCNN0.750.76810.80670.76290.75990.94150.64560.83670.87760.89985.7515.370.05
CNNCNN0.800.80890.83000.80710.80750.96130.66980.88500.91090.92586.0115.320.05
CNNCNN0.850.82000.82950.81500.81530.95010.71430.90170.92950.93516.1715.310.05
CNNCNN0.900.84970.85280.84730.84770.95320.74950.91090.93320.94066.6415.400.05
CNNCNN0.950.90720.90860.90420.90460.98160.90350.94620.96470.97407.0715.370.05
CNNCNN1.000.94250.94310.94190.94170.98690.95920.98140.98330.98527.3715.350.06
CNNEfficientNet B00.050.19480.39970.19610.17200.51990.00340.05010.23930.30431.4315.350.29
CNNEfficientNet B00.100.23190.45500.23270.21240.58520.00000.09830.28760.33952.5215.340.26
CNNEfficientNet B00.150.35810.62990.35640.37180.66430.01480.26900.41190.46753.8515.340.27
CNNEfficientNet B00.200.40450.55230.40360.41440.80840.18740.37290.46940.55845.0515.450.27
CNNEfficientNet B00.250.51390.60310.51210.53500.87900.26720.51760.63270.71616.1915.360.27
CNNEfficientNet B00.300.48980.57210.49040.48060.82820.26720.47120.60670.67357.6015.380.27
CNNEfficientNet B00.350.54170.62340.54030.54470.85470.36180.54360.70500.77378.6015.310.27
CNNEfficientNet B00.400.63450.66890.62770.61980.90170.39150.66050.73840.79419.8115.360.28
CNNEfficientNet B00.450.60480.64470.60040.60960.87490.40820.62890.69390.745811.1915.390.28
CNNEfficientNet B00.500.65680.74850.65180.65240.92800.49720.68650.77740.827512.1515.340.26
CNNEfficientNet B00.550.62150.70440.61580.61960.89790.42670.65310.73650.769913.6515.280.27
CNNEfficientNet B00.600.67530.74730.67040.67650.94360.47680.72910.81630.860915.1115.290.27
CNNEfficientNet B00.650.69020.71650.68610.68620.93650.53250.76440.85530.883115.8715.350.28
CNNEfficientNet B00.700.71240.74990.71050.70810.92520.46940.76810.84040.864617.0015.360.27
CNNEfficientNet B00.750.82370.82420.82070.81910.97690.78480.87940.94060.962918.5615.370.28
CNNEfficientNet B00.800.78110.79030.78040.77060.97590.67350.87570.93140.962919.5715.320.27
CNNEfficientNet B00.850.84230.87940.84250.84700.98040.79220.93140.95180.964720.6215.310.27
CNNEfficientNet B00.900.92020.92310.91810.91730.99630.93510.98330.99440.994422.2015.400.26
CNNEfficientNet B00.950.94620.94720.94600.94550.99850.95730.99631.00001.000023.2415.370.26
CNNEfficientNet B01.000.92950.93250.93100.93000.99520.94990.99070.99440.994424.3315.350.27
CNNEfficientNet B10.050.20590.39840.20810.19580.54400.00000.04640.24300.31911.3915.350.27
CNNEfficientNet B10.100.27640.39610.27180.21200.60310.01480.20590.30980.35992.5515.340.26
CNNEfficientNet B10.150.39520.49180.39870.39440.73460.11870.34510.49170.59743.9115.340.27
CNNEfficientNet B10.200.38400.57160.38330.36600.70410.22260.36730.41930.48245.0815.450.28
CNNEfficientNet B10.250.46200.56250.46040.46000.82290.22450.44900.56030.61976.1915.360.27
CNNEfficientNet B10.300.56220.59760.56100.56280.88180.32100.55660.67530.75327.5415.380.27
CNNEfficientNet B10.350.51020.62630.50240.50020.83330.34320.50650.61220.67728.6415.310.28
CNNEfficientNet B10.400.60110.62700.59630.59390.89280.41190.63450.73280.77749.8115.360.27
CNNEfficientNet B10.450.62710.67500.62060.61330.89830.40820.65120.74580.812611.1415.390.27
CNNEfficientNet B10.500.66600.72560.65680.65110.92290.51210.71240.80150.842312.0315.340.27
CNNEfficientNet B10.550.58810.68110.58170.58670.84860.31350.60480.69760.731013.6415.280.27
CNNEfficientNet B10.600.70320.72940.69710.69520.95870.59000.77550.86460.912815.1315.290.27
CNNEfficientNet B10.650.72910.76090.72840.73170.94220.61970.79410.85340.885015.9915.350.27
CNNEfficientNet B10.700.76620.78740.76270.75620.95290.66050.83670.87570.918417.1315.360.28
CNNEfficientNet B10.750.82930.83750.82790.82840.98260.75510.91090.96470.972218.5915.370.27
CNNEfficientNet B10.800.78110.80720.77910.77370.97680.64940.86640.94430.974019.4415.320.27
CNNEfficientNet B10.850.84040.85690.83850.83940.98090.79780.91840.95920.974020.4815.310.27
CNNEfficientNet B10.900.92950.93110.92860.92860.99660.94060.98700.99440.996322.2815.400.27
CNNEfficientNet B10.950.90910.91290.90840.90710.99670.91280.98520.99811.000023.3015.370.27
CNNEfficientNet B11.000.95180.95280.95280.95140.99300.96290.98700.99070.990724.0815.350.27
CNN | EfficientNet B2 | 0.05 | 0.2152 | 0.3481 | 0.2188 | 0.1460 | 0.5491 | 0.0148 | 0.1391 | 0.2468 | 0.3321 | 1.40 | 15.35 | 0.27
CNN | EfficientNet B2 | 0.10 | 0.2597 | 0.4622 | 0.2565 | 0.2265 | 0.5779 | 0.0093 | 0.1503 | 0.2950 | 0.3284 | 2.56 | 15.34 | 0.26
CNN | EfficientNet B2 | 0.15 | 0.3544 | 0.6693 | 0.3554 | 0.3765 | 0.6633 | 0.0000 | 0.2560 | 0.4267 | 0.5009 | 3.92 | 15.34 | 0.27
CNN | EfficientNet B2 | 0.20 | 0.4026 | 0.5796 | 0.3941 | 0.3926 | 0.7173 | 0.1373 | 0.3525 | 0.4824 | 0.5325 | 4.98 | 15.45 | 0.27
CNN | EfficientNet B2 | 0.25 | 0.4564 | 0.5329 | 0.4498 | 0.4348 | 0.8238 | 0.2523 | 0.4471 | 0.5566 | 0.6215 | 6.13 | 15.36 | 0.27
CNN | EfficientNet B2 | 0.30 | 0.5306 | 0.6103 | 0.5273 | 0.5207 | 0.8117 | 0.2468 | 0.5232 | 0.6122 | 0.6790 | 7.54 | 15.38 | 0.27
CNN | EfficientNet B2 | 0.35 | 0.4750 | 0.6345 | 0.4688 | 0.4737 | 0.7557 | 0.1985 | 0.4620 | 0.5473 | 0.6234 | 8.77 | 15.31 | 0.27
CNN | EfficientNet B2 | 0.40 | 0.6030 | 0.6251 | 0.5995 | 0.5921 | 0.8753 | 0.4212 | 0.6382 | 0.7199 | 0.7699 | 9.78 | 15.36 | 0.27
CNN | EfficientNet B2 | 0.45 | 0.5325 | 0.6323 | 0.5242 | 0.5264 | 0.8487 | 0.3748 | 0.5362 | 0.6197 | 0.7013 | 11.17 | 15.39 | 0.27
CNN | EfficientNet B2 | 0.50 | 0.5900 | 0.5998 | 0.5926 | 0.5694 | 0.9138 | 0.3952 | 0.6308 | 0.7403 | 0.8182 | 12.03 | 15.34 | 0.27
CNN | EfficientNet B2 | 0.55 | 0.5955 | 0.6929 | 0.5914 | 0.5939 | 0.8536 | 0.4286 | 0.6011 | 0.6994 | 0.7551 | 13.54 | 15.28 | 0.27
CNN | EfficientNet B2 | 0.60 | 0.6475 | 0.7081 | 0.6452 | 0.6519 | 0.9296 | 0.5065 | 0.6865 | 0.8089 | 0.8534 | 15.00 | 15.29 | 0.27
CNN | EfficientNet B2 | 0.65 | 0.7143 | 0.7515 | 0.7075 | 0.7024 | 0.9275 | 0.5696 | 0.7495 | 0.8126 | 0.8590 | 16.13 | 15.35 | 0.27
CNN | EfficientNet B2 | 0.70 | 0.7384 | 0.7823 | 0.7332 | 0.7317 | 0.9287 | 0.5844 | 0.7922 | 0.8516 | 0.8776 | 17.37 | 15.36 | 0.27
CNN | EfficientNet B2 | 0.75 | 0.8442 | 0.8575 | 0.8411 | 0.8437 | 0.9856 | 0.7941 | 0.9184 | 0.9629 | 0.9870 | 18.41 | 15.37 | 0.27
CNN | EfficientNet B2 | 0.80 | 0.7792 | 0.8161 | 0.7779 | 0.7786 | 0.9679 | 0.6586 | 0.8497 | 0.8924 | 0.9388 | 19.38 | 15.32 | 0.27
CNN | EfficientNet B2 | 0.85 | 0.9091 | 0.9133 | 0.9081 | 0.9078 | 0.9956 | 0.9128 | 0.9759 | 0.9944 | 0.9981 | 20.48 | 15.31 | 0.27
CNN | EfficientNet B2 | 0.90 | 0.9128 | 0.9163 | 0.9121 | 0.9112 | 0.9960 | 0.9147 | 0.9777 | 0.9926 | 0.9981 | 22.01 | 15.40 | 0.27
CNN | EfficientNet B2 | 0.95 | 0.9462 | 0.9479 | 0.9457 | 0.9451 | 0.9968 | 0.9666 | 0.9907 | 0.9907 | 0.9963 | 23.16 | 15.37 | 0.27
CNN | EfficientNet B2 | 1.00 | 0.8942 | 0.8999 | 0.8914 | 0.8918 | 0.9914 | 0.8794 | 0.9759 | 0.9852 | 0.9907 | 24.33 | 15.35 | 0.27
CNN | EfficientNet B3 | 0.05 | 0.1596 | 0.4854 | 0.1623 | 0.1478 | 0.5148 | 0.0000 | 0.0204 | 0.1874 | 0.3080 | 1.39 | 15.35 | 0.26
CNN | EfficientNet B3 | 0.10 | 0.2672 | 0.4860 | 0.2678 | 0.2291 | 0.6116 | 0.0482 | 0.1800 | 0.2968 | 0.3711 | 2.56 | 15.34 | 0.27
CNN | EfficientNet B3 | 0.15 | 0.2987 | 0.5224 | 0.2891 | 0.2523 | 0.6077 | 0.0114 | 0.1929 | 0.3395 | 0.3952 | 3.97 | 15.34 | 0.27
CNN | EfficientNet B3 | 0.20 | 0.4286 | 0.5469 | 0.4230 | 0.4120 | 0.7409 | 0.1521 | 0.3840 | 0.4935 | 0.5622 | 4.96 | 15.45 | 0.27
CNN | EfficientNet B3 | 0.25 | 0.4657 | 0.5952 | 0.4614 | 0.4682 | 0.8078 | 0.1911 | 0.4453 | 0.5436 | 0.6122 | 6.16 | 15.36 | 0.27
CNN | EfficientNet B3 | 0.30 | 0.4638 | 0.6367 | 0.4628 | 0.4568 | 0.7918 | 0.2078 | 0.4416 | 0.5510 | 0.5993 | 7.53 | 15.38 | 0.27
CNN | EfficientNet B3 | 0.35 | 0.4879 | 0.5670 | 0.4834 | 0.4748 | 0.7825 | 0.2579 | 0.4694 | 0.5584 | 0.6234 | 8.68 | 15.31 | 0.27
CNN | EfficientNet B3 | 0.40 | 0.6271 | 0.6569 | 0.6213 | 0.6082 | 0.9042 | 0.3989 | 0.6568 | 0.7328 | 0.7774 | 9.85 | 15.36 | 0.27
CNN | EfficientNet B3 | 0.45 | 0.5770 | 0.6732 | 0.5712 | 0.5667 | 0.8643 | 0.4304 | 0.5881 | 0.6531 | 0.6994 | 11.17 | 15.39 | 0.27
CNN | EfficientNet B3 | 0.50 | 0.6048 | 0.6804 | 0.6016 | 0.6125 | 0.9050 | 0.3284 | 0.6289 | 0.7291 | 0.7885 | 11.98 | 15.34 | 0.28
CNN | EfficientNet B3 | 0.55 | 0.6475 | 0.6948 | 0.6463 | 0.6487 | 0.8745 | 0.3711 | 0.6920 | 0.7718 | 0.8163 | 13.46 | 15.28 | 0.27
CNN | EfficientNet B3 | 0.60 | 0.6605 | 0.7469 | 0.6551 | 0.6527 | 0.9174 | 0.5121 | 0.7050 | 0.7718 | 0.8145 | 14.82 | 15.29 | 0.27
CNN | EfficientNet B3 | 0.65 | 0.7069 | 0.7412 | 0.7023 | 0.6971 | 0.9260 | 0.5510 | 0.7570 | 0.8293 | 0.8646 | 16.10 | 15.35 | 0.27
CNN | EfficientNet B3 | 0.70 | 0.7273 | 0.7666 | 0.7241 | 0.7242 | 0.9436 | 0.5510 | 0.7792 | 0.8497 | 0.8887 | 17.28 | 15.36 | 0.27
CNN | EfficientNet B3 | 0.75 | 0.7477 | 0.7721 | 0.7426 | 0.7376 | 0.9625 | 0.6438 | 0.8312 | 0.8942 | 0.9202 | 18.55 | 15.37 | 0.27
CNN | EfficientNet B3 | 0.80 | 0.8033 | 0.8229 | 0.8013 | 0.7953 | 0.9728 | 0.7050 | 0.8757 | 0.9221 | 0.9443 | 19.47 | 15.32 | 0.27
CNN | EfficientNet B3 | 0.85 | 0.8423 | 0.8556 | 0.8383 | 0.8383 | 0.9821 | 0.8126 | 0.9258 | 0.9536 | 0.9685 | 20.70 | 15.31 | 0.27
CNN | EfficientNet B3 | 0.90 | 0.9128 | 0.9192 | 0.9115 | 0.9137 | 0.9932 | 0.9091 | 0.9610 | 0.9814 | 0.9870 | 22.10 | 15.40 | 0.27
CNN | EfficientNet B3 | 0.95 | 0.9406 | 0.9413 | 0.9393 | 0.9395 | 0.9988 | 0.9685 | 0.9981 | 1.0000 | 1.0000 | 23.03 | 15.37 | 0.27
CNN | EfficientNet B3 | 1.00 | 0.9592 | 0.9589 | 0.9599 | 0.9592 | 0.9958 | 0.9814 | 0.9944 | 0.9944 | 0.9944 | 24.38 | 15.35 | 0.28
CNN | GAN Classifier | 0.10 | 0.3599 | 0.4363 | 0.3464 | 0.2948 | 0.7647 | 0.1521 | 0.3265 | 0.4508 | 0.5436 | 0.75 | 15.34 | 0.09
CNN | GAN Classifier | 0.15 | 0.3748 | 0.6377 | 0.3710 | 0.4010 | 0.6660 | 0.0047 | 0.3006 | 0.4286 | 0.4917 | 1.14 | 15.34 | 0.08
CNN | GAN Classifier | 0.20 | 0.3618 | 0.5218 | 0.3622 | 0.3515 | 0.7119 | 0.0315 | 0.2801 | 0.4323 | 0.5250 | 1.35 | 15.45 | 0.08
CNN | GAN Classifier | 0.25 | 0.5009 | 0.6768 | 0.4966 | 0.5060 | 0.8207 | 0.2709 | 0.4824 | 0.5881 | 0.6939 | 1.45 | 15.36 | 0.08
CNN | GAN Classifier | 0.30 | 0.5195 | 0.6872 | 0.5174 | 0.5297 | 0.7928 | 0.0241 | 0.4991 | 0.6178 | 0.6642 | 1.77 | 15.38 | 0.08
CNN | GAN Classifier | 0.35 | 0.5603 | 0.7556 | 0.5546 | 0.5745 | 0.7896 | 0.2078 | 0.5622 | 0.6642 | 0.7050 | 2.39 | 15.31 | 0.08
CNN | GAN Classifier | 0.40 | 0.6104 | 0.6443 | 0.6076 | 0.5875 | 0.8904 | 0.4119 | 0.6401 | 0.7403 | 0.7922 | 2.73 | 15.36 | 0.08
CNN | GAN Classifier | 0.45 | 0.6586 | 0.7402 | 0.6578 | 0.6617 | 0.8841 | 0.4416 | 0.6957 | 0.7811 | 0.8145 | 3.32 | 15.39 | 0.08
CNN | GAN Classifier | 0.50 | 0.6809 | 0.7557 | 0.6731 | 0.6639 | 0.9209 | 0.4972 | 0.7180 | 0.8293 | 0.8646 | 3.56 | 15.34 | 0.08
CNN | GAN Classifier | 0.55 | 0.6939 | 0.7763 | 0.6903 | 0.6952 | 0.8716 | 0.2044 | 0.7440 | 0.7978 | 0.8163 | 4.24 | 15.28 | 0.08
CNN | GAN Classifier | 0.60 | 0.6549 | 0.7318 | 0.6502 | 0.6487 | 0.9268 | 0.4805 | 0.6883 | 0.7774 | 0.8367 | 4.14 | 15.29 | 0.09
CNN | GAN Classifier | 0.65 | 0.7254 | 0.7521 | 0.7230 | 0.7201 | 0.9208 | 0.5121 | 0.7866 | 0.8497 | 0.8701 | 4.86 | 15.35 | 0.08
CNN | GAN Classifier | 0.70 | 0.7551 | 0.8066 | 0.7494 | 0.7488 | 0.9270 | 0.5269 | 0.8126 | 0.8590 | 0.8794 | 5.16 | 15.36 | 0.08
CNN | GAN Classifier | 0.75 | 0.7959 | 0.8315 | 0.7935 | 0.7905 | 0.9751 | 0.7347 | 0.8664 | 0.9314 | 0.9573 | 5.53 | 15.37 | 0.08
CNN | GAN Classifier | 0.80 | 0.7941 | 0.8258 | 0.7930 | 0.7941 | 0.9747 | 0.6642 | 0.8887 | 0.9406 | 0.9647 | 5.48 | 15.32 | 0.08
CNN | GAN Classifier | 0.85 | 0.8664 | 0.8769 | 0.8650 | 0.8650 | 0.9780 | 0.7978 | 0.9314 | 0.9666 | 0.9796 | 6.38 | 15.31 | 0.09
CNN | GAN Classifier | 0.90 | 0.9202 | 0.9250 | 0.9194 | 0.9189 | 0.9944 | 0.9332 | 0.9796 | 0.9907 | 0.9944 | 6.75 | 15.40 | 0.08
CNN | GAN Classifier | 0.95 | 0.9555 | 0.9575 | 0.9552 | 0.9550 | 0.9984 | 0.9685 | 0.9944 | 0.9981 | 1.0000 | 7.36 | 15.37 | 0.08
CNN | GAN Classifier | 1.00 | 0.9685 | 0.9688 | 0.9679 | 0.9679 | 0.9946 | 0.9907 | 0.9926 | 0.9944 | 0.9944 | 6.88 | 15.35 | 0.08
CNN | Gradient Boosting | 0.05 | 0.2616 | 0.4856 | 0.2595 | 0.2562 | 0.6416 | 0.0816 | 0.1707 | 0.3098 | 0.3859 | 4.58 | 15.35 | 0.01
CNN | Gradient Boosting | 0.10 | 0.3469 | 0.5991 | 0.3361 | 0.3354 | 0.7043 | 0.1113 | 0.2653 | 0.4174 | 0.4712 | 13.39 | 15.34 | 0.01
CNN | Gradient Boosting | 0.15 | 0.4787 | 0.5916 | 0.4787 | 0.4812 | 0.8224 | 0.2950 | 0.4638 | 0.5793 | 0.6549 | 18.17 | 15.34 | 0.01
CNN | Gradient Boosting | 0.20 | 0.4879 | 0.5870 | 0.4876 | 0.4946 | 0.8396 | 0.3098 | 0.4527 | 0.5788 | 0.6456 | 26.25 | 15.45 | 0.01
CNN | Gradient Boosting | 0.25 | 0.4434 | 0.5764 | 0.4379 | 0.4467 | 0.8080 | 0.3043 | 0.4193 | 0.4991 | 0.5492 | 28.91 | 15.36 | 0.01
CNN | Gradient Boosting | 0.30 | 0.4861 | 0.5733 | 0.4803 | 0.4775 | 0.8426 | 0.3228 | 0.4638 | 0.5640 | 0.6327 | 34.86 | 15.38 | 0.01
CNN | Gradient Boosting | 0.35 | 0.5362 | 0.6589 | 0.5284 | 0.5312 | 0.8399 | 0.3098 | 0.5250 | 0.5881 | 0.6327 | 53.46 | 15.31 | 0.01
CNN | Gradient Boosting | 0.40 | 0.5751 | 0.6073 | 0.5691 | 0.5714 | 0.8781 | 0.3859 | 0.5863 | 0.6698 | 0.7328 | 40.68 | 15.36 | 0.01
CNN | Gradient Boosting | 0.45 | 0.5232 | 0.6409 | 0.5164 | 0.5052 | 0.8217 | 0.3395 | 0.4972 | 0.5844 | 0.6549 | 58.62 | 15.39 | 0.01
CNN | Gradient Boosting | 0.50 | 0.6679 | 0.7282 | 0.6675 | 0.6744 | 0.9289 | 0.5436 | 0.7069 | 0.7681 | 0.8293 | 61.47 | 15.34 | 0.01
CNN | Gradient Boosting | 0.55 | 0.6475 | 0.7080 | 0.6423 | 0.6453 | 0.9263 | 0.5306 | 0.6679 | 0.7737 | 0.8312 | 43.10 | 15.28 | 0.01
CNN | Gradient Boosting | 0.60 | 0.6605 | 0.7655 | 0.6573 | 0.6581 | 0.9053 | 0.5176 | 0.6920 | 0.7477 | 0.7718 | 60.45 | 15.29 | 0.01
CNN | Gradient Boosting | 0.65 | 0.7217 | 0.7385 | 0.7134 | 0.7026 | 0.9388 | 0.6085 | 0.7681 | 0.8089 | 0.8442 | 60.62 | 15.35 | 0.01
CNN | Gradient Boosting | 0.70 | 0.7291 | 0.7503 | 0.7226 | 0.7236 | 0.9136 | 0.5622 | 0.7848 | 0.8312 | 0.8571 | 109.09 | 15.36 | 0.01
CNN | Gradient Boosting | 0.75 | 0.8386 | 0.8541 | 0.8369 | 0.8380 | 0.9802 | 0.7811 | 0.9202 | 0.9518 | 0.9666 | 84.51 | 15.37 | 0.01
CNN | Gradient Boosting | 0.80 | 0.7959 | 0.8161 | 0.7948 | 0.7966 | 0.9718 | 0.7310 | 0.8646 | 0.9128 | 0.9425 | 90.14 | 15.32 | 0.01
CNN | Gradient Boosting | 0.85 | 0.8349 | 0.8484 | 0.8317 | 0.8333 | 0.9744 | 0.7941 | 0.8942 | 0.9109 | 0.9332 | 100.24 | 15.31 | 0.01
CNN | Gradient Boosting | 0.90 | 0.8738 | 0.8853 | 0.8703 | 0.8731 | 0.9910 | 0.8442 | 0.9499 | 0.9759 | 0.9852 | 101.46 | 15.40 | 0.01
CNN | Gradient Boosting | 0.95 | 0.9332 | 0.9362 | 0.9307 | 0.9315 | 0.9965 | 0.9443 | 0.9907 | 0.9926 | 0.9944 | 83.16 | 15.37 | 0.01
CNN | Gradient Boosting | 1.00 | 0.9276 | 0.9302 | 0.9279 | 0.9275 | 0.9951 | 0.9443 | 0.9740 | 0.9796 | 0.9944 | 91.64 | 15.35 | 0.01
CNN | MobileNet v1 | 0.05 | 0.3377 | 0.5454 | 0.3298 | 0.3129 | 0.7071 | 0.1224 | 0.3080 | 0.3952 | 0.4805 | 0.51 | 15.35 | 0.11
CNN | MobileNet v1 | 0.10 | 0.3432 | 0.6423 | 0.3431 | 0.3287 | 0.7613 | 0.2134 | 0.3080 | 0.4026 | 0.4991 | 0.85 | 15.34 | 0.10
CNN | MobileNet v1 | 0.15 | 0.4212 | 0.6155 | 0.4162 | 0.3812 | 0.7888 | 0.2597 | 0.3952 | 0.4935 | 0.5380 | 1.29 | 15.34 | 0.10
CNN | MobileNet v1 | 0.20 | 0.4750 | 0.6881 | 0.4728 | 0.4695 | 0.8260 | 0.3265 | 0.4360 | 0.5547 | 0.6401 | 1.68 | 15.45 | 0.11
CNN | MobileNet v1 | 0.25 | 0.5288 | 0.6570 | 0.5244 | 0.5239 | 0.8676 | 0.3432 | 0.5362 | 0.6475 | 0.7180 | 2.11 | 15.36 | 0.10
CNN | MobileNet v1 | 0.30 | 0.5974 | 0.7276 | 0.5952 | 0.5858 | 0.8962 | 0.4620 | 0.6104 | 0.7161 | 0.7737 | 2.48 | 15.38 | 0.11
CNN | MobileNet v1 | 0.35 | 0.5659 | 0.7073 | 0.5627 | 0.5426 | 0.8619 | 0.3822 | 0.5659 | 0.6345 | 0.6753 | 2.92 | 15.31 | 0.10
CNN | MobileNet v1 | 0.40 | 0.6085 | 0.7362 | 0.6039 | 0.6038 | 0.9003 | 0.4583 | 0.6327 | 0.7161 | 0.7737 | 3.23 | 15.36 | 0.10
CNN | MobileNet v1 | 0.45 | 0.6846 | 0.7863 | 0.6779 | 0.6758 | 0.9428 | 0.6030 | 0.7384 | 0.8219 | 0.8776 | 3.77 | 15.39 | 0.10
CNN | MobileNet v1 | 0.50 | 0.6345 | 0.8017 | 0.6311 | 0.6450 | 0.9085 | 0.3636 | 0.6716 | 0.7866 | 0.8386 | 4.09 | 15.34 | 0.10
CNN | MobileNet v1 | 0.55 | 0.7013 | 0.7815 | 0.6962 | 0.6975 | 0.9546 | 0.5844 | 0.7607 | 0.8590 | 0.8980 | 4.48 | 15.28 | 0.10
CNN | MobileNet v1 | 0.60 | 0.7570 | 0.8275 | 0.7511 | 0.7499 | 0.9508 | 0.6623 | 0.8256 | 0.8794 | 0.8961 | 4.88 | 15.29 | 0.10
CNN | MobileNet v1 | 0.65 | 0.8033 | 0.8312 | 0.7988 | 0.7948 | 0.9732 | 0.7384 | 0.8627 | 0.9017 | 0.9221 | 5.31 | 15.35 | 0.10
CNN | MobileNet v1 | 0.70 | 0.7532 | 0.8070 | 0.7476 | 0.7472 | 0.9634 | 0.6568 | 0.8367 | 0.9054 | 0.9499 | 5.77 | 15.36 | 0.10
CNN | MobileNet v1 | 0.75 | 0.8163 | 0.8350 | 0.8121 | 0.8091 | 0.9836 | 0.7829 | 0.9035 | 0.9425 | 0.9647 | 6.20 | 15.37 | 0.10
CNN | MobileNet v1 | 0.80 | 0.8071 | 0.8427 | 0.8042 | 0.7973 | 0.9778 | 0.7328 | 0.8776 | 0.9443 | 0.9629 | 6.40 | 15.32 | 0.10
CNN | MobileNet v1 | 0.85 | 0.9035 | 0.9113 | 0.9019 | 0.9006 | 0.9948 | 0.9165 | 0.9722 | 0.9852 | 0.9889 | 6.98 | 15.31 | 0.10
CNN | MobileNet v1 | 0.90 | 0.9276 | 0.9333 | 0.9268 | 0.9261 | 0.9964 | 0.9443 | 0.9777 | 0.9889 | 0.9981 | 7.40 | 15.40 | 0.10
CNN | MobileNet v1 | 0.95 | 0.9610 | 0.9618 | 0.9598 | 0.9602 | 0.9990 | 0.9814 | 0.9944 | 0.9963 | 1.0000 | 7.84 | 15.37 | 0.10
CNN | MobileNet v1 | 1.00 | 0.9870 | 0.9873 | 0.9867 | 0.9868 | 0.9998 | 0.9963 | 1.0000 | 1.0000 | 1.0000 | 8.02 | 15.35 | 0.10
CNN | MobileNet v1 0.25 | 0.05 | 0.2430 | 0.3416 | 0.2350 | 0.1535 | 0.6048 | 0.0612 | 0.1725 | 0.2393 | 0.2876 | 0.46 | 15.35 | 0.11
CNN | MobileNet v1 0.25 | 0.10 | 0.2894 | 0.4839 | 0.2807 | 0.2569 | 0.6742 | 0.0891 | 0.2078 | 0.4304 | 0.4954 | 0.81 | 15.34 | 0.09
CNN | MobileNet v1 0.25 | 0.15 | 0.3896 | 0.5768 | 0.3828 | 0.3523 | 0.7884 | 0.2078 | 0.3655 | 0.4917 | 0.5603 | 1.24 | 15.34 | 0.09
CNN | MobileNet v1 0.25 | 0.20 | 0.4434 | 0.5506 | 0.4346 | 0.4201 | 0.8629 | 0.2690 | 0.4490 | 0.5844 | 0.6827 | 1.65 | 15.45 | 0.09
CNN | MobileNet v1 0.25 | 0.25 | 0.4750 | 0.6339 | 0.4699 | 0.4545 | 0.7989 | 0.3024 | 0.4545 | 0.5455 | 0.6252 | 2.02 | 15.36 | 0.09
CNN | MobileNet v1 0.25 | 0.30 | 0.5065 | 0.6568 | 0.4985 | 0.4949 | 0.8440 | 0.2597 | 0.4972 | 0.6141 | 0.6939 | 2.51 | 15.38 | 0.10
CNN | MobileNet v1 0.25 | 0.35 | 0.5195 | 0.7129 | 0.5173 | 0.5222 | 0.8373 | 0.2931 | 0.5102 | 0.6327 | 0.7050 | 2.77 | 15.31 | 0.09
CNN | MobileNet v1 0.25 | 0.40 | 0.6178 | 0.6587 | 0.6131 | 0.5954 | 0.9281 | 0.4378 | 0.6494 | 0.7551 | 0.8312 | 3.24 | 15.36 | 0.09
CNN | MobileNet v1 0.25 | 0.45 | 0.6030 | 0.6734 | 0.5998 | 0.5976 | 0.9158 | 0.4564 | 0.6438 | 0.7570 | 0.8293 | 3.71 | 15.39 | 0.10
CNN | MobileNet v1 0.25 | 0.50 | 0.5881 | 0.6363 | 0.5809 | 0.5667 | 0.9046 | 0.4453 | 0.6234 | 0.7477 | 0.8108 | 4.01 | 15.34 | 0.09
CNN | MobileNet v1 0.25 | 0.55 | 0.7625 | 0.7902 | 0.7546 | 0.7517 | 0.9655 | 0.6419 | 0.8200 | 0.8868 | 0.9276 | 4.28 | 15.28 | 0.09
CNN | MobileNet v1 0.25 | 0.60 | 0.6790 | 0.7531 | 0.6713 | 0.6703 | 0.9497 | 0.5751 | 0.7403 | 0.8219 | 0.8905 | 4.93 | 15.29 | 0.09
CNN | MobileNet v1 0.25 | 0.65 | 0.7199 | 0.7767 | 0.7160 | 0.7200 | 0.9498 | 0.5195 | 0.7941 | 0.8683 | 0.8998 | 5.17 | 15.35 | 0.09
CNN | MobileNet v1 0.25 | 0.70 | 0.7254 | 0.8008 | 0.7205 | 0.7264 | 0.9486 | 0.5677 | 0.8052 | 0.8720 | 0.9035 | 5.43 | 15.36 | 0.09
CNN | MobileNet v1 0.25 | 0.75 | 0.8200 | 0.8277 | 0.8167 | 0.8170 | 0.9827 | 0.7440 | 0.9165 | 0.9573 | 0.9722 | 6.00 | 15.37 | 0.09
CNN | MobileNet v1 0.25 | 0.80 | 0.7477 | 0.7862 | 0.7444 | 0.7428 | 0.9488 | 0.6104 | 0.8200 | 0.8813 | 0.9091 | 6.15 | 15.32 | 0.09
CNN | MobileNet v1 0.25 | 0.85 | 0.8553 | 0.8663 | 0.8523 | 0.8520 | 0.9885 | 0.8163 | 0.9388 | 0.9685 | 0.9870 | 6.80 | 15.31 | 0.09
CNN | MobileNet v1 0.25 | 0.90 | 0.8646 | 0.8687 | 0.8630 | 0.8618 | 0.9892 | 0.8312 | 0.9481 | 0.9703 | 0.9796 | 6.94 | 15.40 | 0.09
CNN | MobileNet v1 0.5 | 0.60 | 0.7328 | 0.8209 | 0.7282 | 0.7353 | 0.9479 | 0.5807 | 0.7811 | 0.8571 | 0.8905 | 4.87 | 15.29 | 0.10
CNN | MobileNet v1 0.5 | 0.65 | 0.7811 | 0.8168 | 0.7792 | 0.7767 | 0.9663 | 0.6475 | 0.8553 | 0.8998 | 0.9276 | 5.12 | 15.35 | 0.10
CNN | MobileNet v1 0.5 | 0.70 | 0.7792 | 0.8056 | 0.7769 | 0.7739 | 0.9657 | 0.7161 | 0.8404 | 0.8887 | 0.9221 | 5.41 | 15.36 | 0.10
CNN | MobileNet v1 0.5 | 0.75 | 0.8293 | 0.8664 | 0.8269 | 0.8281 | 0.9780 | 0.7699 | 0.8998 | 0.9443 | 0.9573 | 6.08 | 15.37 | 0.09
CNN | MobileNet v1 0.5 | 0.80 | 0.8219 | 0.8368 | 0.8223 | 0.8162 | 0.9825 | 0.7403 | 0.9072 | 0.9536 | 0.9685 | 6.20 | 15.32 | 0.10
CNN | MobileNet v1 0.5 | 0.85 | 0.8497 | 0.8702 | 0.8457 | 0.8448 | 0.9844 | 0.8312 | 0.9109 | 0.9462 | 0.9685 | 6.85 | 15.31 | 0.10
CNN | MobileNet v1 0.5 | 0.90 | 0.9017 | 0.9120 | 0.8991 | 0.9011 | 0.9959 | 0.9091 | 0.9852 | 0.9963 | 0.9981 | 7.10 | 15.40 | 0.09
CNN | MobileNet v1 0.5 | 0.95 | 0.9629 | 0.9633 | 0.9626 | 0.9624 | 0.9990 | 0.9833 | 0.9944 | 1.0000 | 1.0000 | 7.66 | 15.37 | 0.09
CNN | MobileNet v1 0.5 | 1.00 | 0.9573 | 0.9572 | 0.9577 | 0.9569 | 0.9992 | 0.9740 | 0.9981 | 1.0000 | 1.0000 | 8.01 | 15.35 | 0.09
CNN | MobileNet v1 0.75 | 0.05 | 0.3154 | 0.4264 | 0.3120 | 0.2958 | 0.6829 | 0.0724 | 0.2393 | 0.3803 | 0.4638 | 0.46 | 15.35 | 0.10
CNN | MobileNet v1 0.75 | 0.10 | 0.2931 | 0.5609 | 0.2896 | 0.2724 | 0.6815 | 0.1447 | 0.2468 | 0.3469 | 0.4471 | 0.79 | 15.34 | 0.09
CNN | MobileNet v1 0.75 | 0.15 | 0.3840 | 0.6555 | 0.3832 | 0.3712 | 0.7509 | 0.2208 | 0.3395 | 0.4508 | 0.5417 | 1.23 | 15.34 | 0.09
CNN | MobileNet v1 0.75 | 0.20 | 0.4341 | 0.6001 | 0.4259 | 0.4202 | 0.7821 | 0.2041 | 0.4212 | 0.5399 | 0.6030 | 1.58 | 15.45 | 0.09
CNN | MobileNet v1 0.75 | 0.25 | 0.5176 | 0.7160 | 0.5059 | 0.5056 | 0.8138 | 0.2412 | 0.5083 | 0.5900 | 0.6438 | 2.02 | 15.36 | 0.09
CNN | MobileNet v1 0.75 | 0.30 | 0.4434 | 0.6759 | 0.4306 | 0.4012 | 0.7875 | 0.1911 | 0.4286 | 0.5269 | 0.5770 | 2.35 | 15.38 | 0.09
CNN | MobileNet v1 0.75 | 0.35 | 0.5362 | 0.6925 | 0.5346 | 0.5343 | 0.8521 | 0.3395 | 0.5380 | 0.6308 | 0.6957 | 2.86 | 15.31 | 0.09
CNN | MobileNet v1 0.75 | 0.40 | 0.6494 | 0.7882 | 0.6413 | 0.6348 | 0.9248 | 0.4620 | 0.6865 | 0.7755 | 0.8275 | 3.05 | 15.36 | 0.09
CNN | MobileNet v1 0.75 | 0.45 | 0.6679 | 0.7150 | 0.6646 | 0.6489 | 0.9217 | 0.5232 | 0.7087 | 0.7848 | 0.8293 | 3.60 | 15.39 | 0.09
CNN | MobileNet v1 0.75 | 0.50 | 0.6623 | 0.7339 | 0.6593 | 0.6483 | 0.9310 | 0.5603 | 0.6957 | 0.7978 | 0.8442 | 3.99 | 15.34 | 0.09
CNN | MobileNet v1 0.75 | 0.55 | 0.6883 | 0.7742 | 0.6820 | 0.6740 | 0.9340 | 0.4991 | 0.7532 | 0.8182 | 0.8646 | 4.38 | 15.28 | 0.09
CNN | MobileNet v1 0.75 | 0.60 | 0.7384 | 0.8034 | 0.7367 | 0.7474 | 0.9614 | 0.6067 | 0.8126 | 0.8924 | 0.9258 | 4.66 | 15.29 | 0.09
CNN | MobileNet v1 0.75 | 0.65 | 0.8071 | 0.8401 | 0.8018 | 0.7994 | 0.9757 | 0.7310 | 0.8757 | 0.9147 | 0.9481 | 5.17 | 15.35 | 0.10
CNN | MobileNet v1 0.75 | 0.70 | 0.7625 | 0.8233 | 0.7564 | 0.7527 | 0.9467 | 0.6494 | 0.8237 | 0.8683 | 0.8942 | 5.32 | 15.36 | 0.09
CNN | MobileNet v1 0.75 | 0.75 | 0.8330 | 0.8670 | 0.8294 | 0.8306 | 0.9790 | 0.7551 | 0.8924 | 0.9425 | 0.9592 | 6.07 | 15.37 | 0.09
CNN | MobileNet v1 0.75 | 0.80 | 0.8293 | 0.8611 | 0.8262 | 0.8237 | 0.9811 | 0.7477 | 0.8961 | 0.9518 | 0.9647 | 6.22 | 15.32 | 0.09
CNN | MobileNet v1 0.75 | 0.85 | 0.8980 | 0.9134 | 0.8961 | 0.8943 | 0.9932 | 0.8961 | 0.9666 | 0.9814 | 0.9889 | 6.61 | 15.31 | 0.09
CNN | MobileNet v1 0.75 | 0.90 | 0.9184 | 0.9207 | 0.9174 | 0.9171 | 0.9952 | 0.9314 | 0.9833 | 0.9907 | 0.9963 | 6.92 | 15.40 | 0.09
CNN | MobileNet v1 0.75 | 0.95 | 0.9314 | 0.9368 | 0.9306 | 0.9313 | 0.9978 | 0.9425 | 0.9944 | 0.9981 | 0.9981 | 7.57 | 15.37 | 0.09
CNN | MobileNet v1 0.75 | 1.00 | 0.9777 | 0.9780 | 0.9771 | 0.9771 | 0.9997 | 0.9944 | 1.0000 | 1.0000 | 1.0000 | 7.48 | 15.35 | 0.09
CNN | Random Forest | 0.05 | 0.4601 | 0.5840 | 0.4445 | 0.4029 | 0.8301 | 0.2892 | 0.4695 | 0.5489 | 0.6177 | 0.09 | 15.35 | 0.03
CNN | Random Forest | 0.10 | 0.5232 | 0.7121 | 0.5136 | 0.4985 | 0.8829 | 0.3206 | 0.5343 | 0.6418 | 0.7141 | 0.09 | 15.34 | 0.04
CNN | Random Forest | 0.15 | 0.5455 | 0.7296 | 0.5410 | 0.5241 | 0.8923 | 0.3938 | 0.5489 | 0.6628 | 0.7370 | 0.08 | 15.34 | 0.05
CNN | Random Forest | 0.20 | 0.5362 | 0.7023 | 0.5331 | 0.5323 | 0.8712 | 0.4428 | 0.5312 | 0.6424 | 0.7066 | 0.08 | 15.45 | 0.04
CNN | Random Forest | 0.25 | 0.6122 | 0.7195 | 0.6099 | 0.6020 | 0.9356 | 0.5013 | 0.6663 | 0.7604 | 0.8586 | 0.09 | 15.36 | 0.05
CNN | Random Forest | 0.30 | 0.6048 | 0.7365 | 0.5992 | 0.6006 | 0.9359 | 0.5127 | 0.6291 | 0.7546 | 0.8374 | 0.08 | 15.38 | 0.04
CNN | Random Forest | 0.35 | 0.6141 | 0.7544 | 0.6085 | 0.6076 | 0.9204 | 0.4659 | 0.6460 | 0.7566 | 0.8192 | 0.09 | 15.31 | 0.04
CNN | Random Forest | 0.40 | 0.7069 | 0.7705 | 0.6994 | 0.6880 | 0.9463 | 0.6069 | 0.7449 | 0.8182 | 0.8633 | 0.10 | 15.36 | 0.05
CNN | Random Forest | 0.45 | 0.7087 | 0.8153 | 0.7033 | 0.6962 | 0.9447 | 0.5864 | 0.7253 | 0.8095 | 0.8702 | 0.09 | 15.39 | 0.05
CNN | Random Forest | 0.50 | 0.7161 | 0.7657 | 0.7083 | 0.7015 | 0.9659 | 0.6076 | 0.7891 | 0.8746 | 0.9353 | 0.09 | 15.34 | 0.05
CNN | Random Forest | 0.55 | 0.7848 | 0.8303 | 0.7770 | 0.7744 | 0.9766 | 0.7067 | 0.8488 | 0.9307 | 0.9585 | 0.09 | 15.28 | 0.05
CNN | Random Forest | 0.60 | 0.8071 | 0.8412 | 0.8006 | 0.7968 | 0.9747 | 0.7405 | 0.8646 | 0.9204 | 0.9506 | 0.09 | 15.29 | 0.05
CNN | Random Forest | 0.65 | 0.8442 | 0.8656 | 0.8394 | 0.8402 | 0.9839 | 0.7676 | 0.8989 | 0.9509 | 0.9778 | 0.09 | 15.35 | 0.05
CNN | Random Forest | 0.70 | 0.8312 | 0.8543 | 0.8259 | 0.8262 | 0.9844 | 0.7906 | 0.9063 | 0.9391 | 0.9751 | 0.10 | 15.36 | 0.04
CNN | Random Forest | 0.75 | 0.8794 | 0.8900 | 0.8761 | 0.8747 | 0.9888 | 0.8255 | 0.9377 | 0.9742 | 0.9833 | 0.10 | 15.37 | 0.04
CNN | Random Forest | 0.80 | 0.8813 | 0.8948 | 0.8782 | 0.8780 | 0.9881 | 0.8286 | 0.9335 | 0.9691 | 0.9809 | 0.11 | 15.32 | 0.05
CNN | Random Forest | 0.85 | 0.9035 | 0.9077 | 0.9011 | 0.9003 | 0.9927 | 0.8949 | 0.9493 | 0.9825 | 0.9899 | 0.10 | 15.31 | 0.05
CNN | Random Forest | 0.90 | 0.9388 | 0.9463 | 0.9373 | 0.9385 | 0.9960 | 0.9342 | 0.9870 | 0.9926 | 0.9944 | 0.11 | 15.40 | 0.04
CNN | Random Forest | 0.95 | 0.9722 | 0.9731 | 0.9709 | 0.9716 | 0.9993 | 0.9837 | 0.9981 | 1.0000 | 1.0000 | 0.10 | 15.37 | 0.05
CNN | Random Forest | 1.00 | 0.9926 | 0.9930 | 0.9922 | 0.9924 | 0.9998 | 0.9965 | 1.0000 | 1.0000 | 1.0000 | 0.10 | 15.35 | 0.05
CNN | ResNet18 | 0.05 | 0.2486 | 0.4883 | 0.2466 | 0.2021 | 0.5780 | 0.0445 | 0.1967 | 0.2764 | 0.3395 | 0.69 | 15.35 | 0.15
CNN | ResNet18 | 0.10 | 0.4768 | 0.7477 | 0.4747 | 0.4427 | 0.7279 | 0.2356 | 0.4601 | 0.5288 | 0.5584 | 1.13 | 15.34 | 0.13
CNN | ResNet18 | 0.15 | 0.4045 | 0.6644 | 0.4035 | 0.3709 | 0.7427 | 0.2375 | 0.3748 | 0.4360 | 0.5083 | 1.74 | 15.34 | 0.13
CNN | ResNet18 | 0.20 | 0.4119 | 0.6128 | 0.4131 | 0.4034 | 0.7502 | 0.0482 | 0.3729 | 0.5269 | 0.5770 | 2.22 | 15.45 | 0.13
CNN | ResNet18 | 0.25 | 0.5584 | 0.7627 | 0.5551 | 0.5530 | 0.8593 | 0.3024 | 0.5566 | 0.6883 | 0.7365 | 2.72 | 15.36 | 0.13
CNN | ResNet18 | 0.30 | 0.5121 | 0.7305 | 0.5097 | 0.5210 | 0.7715 | 0.1002 | 0.5028 | 0.6104 | 0.6605 | 3.28 | 15.38 | 0.13
CNN | ResNet18 | 0.35 | 0.5306 | 0.7097 | 0.5309 | 0.5375 | 0.7885 | 0.1725 | 0.5473 | 0.6197 | 0.6846 | 3.79 | 15.31 | 0.13
CNN | ResNet18 | 0.40 | 0.6475 | 0.7838 | 0.6462 | 0.6392 | 0.8838 | 0.4304 | 0.6846 | 0.7458 | 0.7737 | 4.27 | 15.36 | 0.13
CNN | ResNet18 | 0.45 | 0.5659 | 0.7095 | 0.5596 | 0.5653 | 0.8522 | 0.3711 | 0.5788 | 0.6920 | 0.7514 | 4.88 | 15.39 | 0.13
CNN | ResNet18 | 0.50 | 0.6438 | 0.7389 | 0.6382 | 0.6295 | 0.8663 | 0.4286 | 0.6939 | 0.7570 | 0.8126 | 5.21 | 15.34 | 0.13
CNN | ResNet18 | 0.55 | 0.6753 | 0.7442 | 0.6680 | 0.6708 | 0.8981 | 0.5269 | 0.7087 | 0.7662 | 0.8071 | 5.80 | 15.28 | 0.13
CNN | ResNet18 | 0.60 | 0.6846 | 0.8173 | 0.6804 | 0.6899 | 0.9267 | 0.5547 | 0.7440 | 0.8182 | 0.8627 | 6.49 | 15.29 | 0.13
CNN | ResNet18 | 0.65 | 0.7310 | 0.8101 | 0.7256 | 0.7321 | 0.9346 | 0.5733 | 0.7978 | 0.8534 | 0.8757 | 6.81 | 15.35 | 0.13
CNN | ResNet18 | 0.70 | 0.7347 | 0.8045 | 0.7299 | 0.7281 | 0.8778 | 0.6197 | 0.7904 | 0.8367 | 0.8497 | 7.34 | 15.36 | 0.13
CNN | ResNet18 | 0.75 | 0.8553 | 0.8610 | 0.8524 | 0.8486 | 0.9788 | 0.8275 | 0.9295 | 0.9481 | 0.9647 | 8.06 | 15.37 | 0.13
CNN | ResNet18 | 0.80 | 0.8071 | 0.8172 | 0.8045 | 0.8016 | 0.9737 | 0.7495 | 0.8924 | 0.9369 | 0.9536 | 8.36 | 15.32 | 0.13
CNN | ResNet18 | 0.85 | 0.9017 | 0.9096 | 0.8991 | 0.8993 | 0.9807 | 0.8998 | 0.9443 | 0.9647 | 0.9777 | 8.81 | 15.31 | 0.13
CNN | ResNet18 | 0.90 | 0.9276 | 0.9359 | 0.9261 | 0.9262 | 0.9946 | 0.9332 | 0.9814 | 0.9889 | 0.9944 | 9.51 | 15.40 | 0.13
CNN | ResNet18 | 0.95 | 0.9369 | 0.9425 | 0.9368 | 0.9372 | 0.9977 | 0.9481 | 0.9926 | 0.9981 | 0.9981 | 10.00 | 15.37 | 0.13
CNN | ResNet18 | 1.00 | 0.9740 | 0.9741 | 0.9751 | 0.9741 | 0.9953 | 0.9870 | 0.9926 | 0.9926 | 0.9944 | 10.42 | 15.35 | 0.13
CNN | SVM | 0.05 | 0.5195 | 0.7169 | 0.5133 | 0.4838 | 0.8550 | 0.2375 | 0.4842 | 0.6122 | 0.6698 | 0.00 | 15.35 | 0.01
CNN | SVM | 0.10 | 0.5139 | 0.6975 | 0.5146 | 0.5153 | 0.8899 | 0.2690 | 0.5770 | 0.7032 | 0.7495 | 0.01 | 15.34 | 0.02
CNN | SVM | 0.15 | 0.5380 | 0.6920 | 0.5380 | 0.5258 | 0.9026 | 0.3562 | 0.6048 | 0.7032 | 0.7625 | 0.01 | 15.34 | 0.02
CNN | SVM | 0.20 | 0.5955 | 0.6932 | 0.5939 | 0.5882 | 0.9186 | 0.3822 | 0.6642 | 0.7551 | 0.7978 | 0.02 | 15.45 | 0.03
CNN | SVM | 0.25 | 0.6456 | 0.7652 | 0.6429 | 0.6466 | 0.9392 | 0.5158 | 0.6716 | 0.7885 | 0.8683 | 0.02 | 15.36 | 0.04
CNN | SVM | 0.30 | 0.6419 | 0.7670 | 0.6375 | 0.6474 | 0.9278 | 0.4675 | 0.6883 | 0.7737 | 0.8516 | 0.04 | 15.38 | 0.04
CNN | SVM | 0.35 | 0.6364 | 0.7935 | 0.6307 | 0.6417 | 0.9430 | 0.4917 | 0.7161 | 0.7978 | 0.8794 | 0.03 | 15.31 | 0.04
CNN | SVM | 0.40 | 0.6586 | 0.7495 | 0.6533 | 0.6531 | 0.9465 | 0.5436 | 0.7161 | 0.8349 | 0.8924 | 0.05 | 15.36 | 0.05
CNN | SVM | 0.45 | 0.7310 | 0.8283 | 0.7250 | 0.7260 | 0.9599 | 0.6605 | 0.7755 | 0.8442 | 0.9035 | 0.08 | 15.39 | 0.05
CNN | SVM | 0.50 | 0.7662 | 0.8512 | 0.7603 | 0.7616 | 0.9671 | 0.6494 | 0.8404 | 0.8998 | 0.9332 | 0.06 | 15.34 | 0.05
CNN | SVM | 0.55 | 0.7718 | 0.8297 | 0.7675 | 0.7668 | 0.9797 | 0.7013 | 0.8757 | 0.9499 | 0.9759 | 0.08 | 15.28 | 0.05
CNN | SVM | 0.60 | 0.8163 | 0.8642 | 0.8136 | 0.8158 | 0.9828 | 0.7551 | 0.9017 | 0.9555 | 0.9759 | 0.13 | 15.29 | 0.06
CNN | SVM | 0.65 | 0.8256 | 0.8661 | 0.8198 | 0.8201 | 0.9791 | 0.7718 | 0.8850 | 0.9388 | 0.9610 | 0.12 | 15.35 | 0.07
CNN | SVM | 0.70 | 0.8293 | 0.8767 | 0.8241 | 0.8259 | 0.9815 | 0.7904 | 0.9109 | 0.9499 | 0.9703 | 0.12 | 15.36 | 0.07
CNN | SVM | 0.75 | 0.8868 | 0.8992 | 0.8842 | 0.8830 | 0.9917 | 0.8738 | 0.9555 | 0.9722 | 0.9814 | 0.18 | 15.37 | 0.07
CNN | SVM | 0.80 | 0.8813 | 0.9022 | 0.8787 | 0.8784 | 0.9915 | 0.8701 | 0.9481 | 0.9777 | 0.9926 | 0.16 | 15.32 | 0.09
CNN | SVM | 0.85 | 0.9332 | 0.9408 | 0.9314 | 0.9314 | 0.9971 | 0.9425 | 0.9777 | 0.9944 | 1.0000 | 0.15 | 15.31 | 0.08
CNN | SVM | 0.90 | 0.9536 | 0.9554 | 0.9528 | 0.9529 | 0.9992 | 0.9722 | 0.9981 | 1.0000 | 1.0000 | 0.23 | 15.40 | 0.08
CNN | SVM | 0.95 | 0.9889 | 0.9888 | 0.9890 | 0.9889 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.23 | 15.37 | 0.09
CNN | SVM | 1.00 | 0.9963 | 0.9963 | 0.9966 | 0.9964 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.21 | 15.35 | 0.08
CNN | U-Net | 0.05 | 0.2783 | 0.6098 | 0.2689 | 0.2403 | 0.6347 | 0.0649 | 0.1577 | 0.3618 | 0.4304 | 1.64 | 15.35 | 0.39
CNN | U-Net | 0.10 | 0.3117 | 0.6361 | 0.3035 | 0.2839 | 0.7050 | 0.1429 | 0.2727 | 0.3822 | 0.4508 | 3.05 | 15.34 | 0.37
CNN | U-Net | 0.15 | 0.4267 | 0.6452 | 0.4258 | 0.4023 | 0.7622 | 0.1651 | 0.4137 | 0.5046 | 0.5603 | 4.64 | 15.34 | 0.37
CNN | U-Net | 0.20 | 0.4768 | 0.5695 | 0.4754 | 0.4415 | 0.8321 | 0.2672 | 0.4991 | 0.6234 | 0.6865 | 6.03 | 15.45 | 0.37
CNN | U-Net | 0.25 | 0.4323 | 0.4564 | 0.4231 | 0.4008 | 0.8347 | 0.2579 | 0.4212 | 0.5250 | 0.6549 | 7.35 | 15.36 | 0.37
CNN | U-Net | 0.30 | 0.4323 | 0.5781 | 0.4250 | 0.3851 | 0.7902 | 0.2560 | 0.4137 | 0.5492 | 0.6252 | 8.90 | 15.38 | 0.37
CNN | U-Net | 0.35 | 0.6085 | 0.7284 | 0.5994 | 0.5873 | 0.8860 | 0.3915 | 0.6531 | 0.7254 | 0.7978 | 10.41 | 15.31 | 0.37
CNN | U-Net | 0.40 | 0.5659 | 0.7267 | 0.5583 | 0.5495 | 0.8585 | 0.3024 | 0.5733 | 0.6605 | 0.7180 | 11.75 | 15.36 | 0.37
CNN | U-Net | 0.45 | 0.5436 | 0.6260 | 0.5357 | 0.5184 | 0.8795 | 0.4360 | 0.5714 | 0.6939 | 0.7811 | 13.33 | 15.39 | 0.37
CNN | U-Net | 0.50 | 0.7032 | 0.7533 | 0.7012 | 0.6953 | 0.9449 | 0.5455 | 0.7477 | 0.8200 | 0.8757 | 14.46 | 15.34 | 0.37
CNN | U-Net | 0.55 | 0.6531 | 0.7006 | 0.6515 | 0.6456 | 0.9311 | 0.5195 | 0.6939 | 0.8015 | 0.8776 | 16.11 | 15.28 | 0.37
CNN | U-Net | 0.60 | 0.7495 | 0.7794 | 0.7457 | 0.7388 | 0.9716 | 0.6698 | 0.8479 | 0.9165 | 0.9406 | 17.65 | 15.29 | 0.37
CNN | U-Net | 0.65 | 0.7588 | 0.7811 | 0.7534 | 0.7494 | 0.9682 | 0.6197 | 0.8553 | 0.9109 | 0.9406 | 19.08 | 15.35 | 0.37
CNN | U-Net | 0.70 | 0.7421 | 0.7934 | 0.7370 | 0.7327 | 0.9421 | 0.6048 | 0.7959 | 0.8497 | 0.8757 | 20.48 | 15.36 | 0.37
CNN | U-Net | 0.75 | 0.7996 | 0.8125 | 0.7961 | 0.7876 | 0.9674 | 0.7310 | 0.8701 | 0.8980 | 0.9221 | 22.05 | 15.37 | 0.37
CNN | U-Net | 0.80 | 0.7941 | 0.8120 | 0.7905 | 0.7872 | 0.9789 | 0.6846 | 0.8813 | 0.9462 | 0.9666 | 23.27 | 15.32 | 0.37
CNN | U-Net | 0.85 | 0.8664 | 0.8775 | 0.8637 | 0.8604 | 0.9879 | 0.8497 | 0.9406 | 0.9685 | 0.9814 | 24.65 | 15.31 | 0.37
CNN | U-Net | 0.90 | 0.8479 | 0.8675 | 0.8436 | 0.8469 | 0.9816 | 0.7458 | 0.9239 | 0.9518 | 0.9703 | 26.21 | 15.40 | 0.37
CNN | U-Net | 0.95 | 0.9184 | 0.9223 | 0.9163 | 0.9163 | 0.9970 | 0.9295 | 0.9889 | 0.9981 | 0.9981 | 27.69 | 15.37 | 0.37
CNN | U-Net | 1.00 | 0.8404 | 0.8636 | 0.8383 | 0.8373 | 0.9874 | 0.7718 | 0.9425 | 0.9796 | 0.9907 | 28.98 | 15.35 | 0.37
CNN | XGBoost | 0.05 | 0.3673 | 0.4855 | 0.3533 | 0.3295 | 0.6944 | 0.1558 | 0.3098 | 0.3581 | 0.4249 | 0.28 | 15.35 | 0.04
CNN | XGBoost | 0.10 | 0.3210 | 0.5714 | 0.3084 | 0.3010 | 0.7239 | 0.0983 | 0.2690 | 0.3803 | 0.5232 | 0.32 | 15.34 | 0.02
CNN | XGBoost | 0.15 | 0.4453 | 0.5313 | 0.4380 | 0.4100 | 0.7923 | 0.2393 | 0.4286 | 0.5269 | 0.5993 | 0.36 | 15.34 | 0.02
CNN | XGBoost | 0.20 | 0.5083 | 0.6166 | 0.5071 | 0.4895 | 0.8546 | 0.3284 | 0.5213 | 0.6104 | 0.6735 | 0.41 | 15.45 | 0.02
CNN | XGBoost | 0.25 | 0.5046 | 0.5653 | 0.4982 | 0.4915 | 0.8673 | 0.3711 | 0.5213 | 0.6456 | 0.7310 | 0.47 | 15.36 | 0.02
CNN | XGBoost | 0.30 | 0.5213 | 0.6166 | 0.5143 | 0.5095 | 0.8836 | 0.3525 | 0.5380 | 0.6568 | 0.7495 | 0.52 | 15.38 | 0.02
CNN | XGBoost | 0.35 | 0.5955 | 0.6787 | 0.5898 | 0.5792 | 0.8961 | 0.3933 | 0.6160 | 0.7291 | 0.7755 | 0.64 | 15.31 | 0.02
CNN | XGBoost | 0.40 | 0.6382 | 0.6837 | 0.6326 | 0.6207 | 0.9300 | 0.5102 | 0.6809 | 0.7941 | 0.8609 | 0.60 | 15.36 | 0.02
CNN | XGBoost | 0.45 | 0.5733 | 0.7111 | 0.5667 | 0.5698 | 0.8576 | 0.3358 | 0.6104 | 0.6883 | 0.7273 | 0.65 | 15.39 | 0.02
CNN | XGBoost | 0.50 | 0.6939 | 0.7200 | 0.6907 | 0.6932 | 0.9636 | 0.5900 | 0.7755 | 0.8757 | 0.9351 | 0.73 | 15.34 | 0.02
CNN | XGBoost | 0.55 | 0.7365 | 0.7844 | 0.7294 | 0.7297 | 0.9690 | 0.6419 | 0.8312 | 0.9184 | 0.9462 | 0.63 | 15.28 | 0.02
CNN | XGBoost | 0.60 | 0.7273 | 0.8107 | 0.7222 | 0.7231 | 0.9590 | 0.6494 | 0.7904 | 0.8794 | 0.9184 | 0.74 | 15.29 | 0.02
CNN | XGBoost | 0.65 | 0.7588 | 0.7712 | 0.7531 | 0.7513 | 0.9607 | 0.6512 | 0.8163 | 0.8776 | 0.9202 | 0.75 | 15.35 | 0.02
CNN | XGBoost | 0.70 | 0.7328 | 0.7550 | 0.7263 | 0.7222 | 0.9606 | 0.6289 | 0.8108 | 0.8998 | 0.9221 | 0.96 | 15.36 | 0.02
CNN | XGBoost | 0.75 | 0.8794 | 0.8916 | 0.8768 | 0.8780 | 0.9889 | 0.8534 | 0.9555 | 0.9796 | 0.9833 | 0.91 | 15.37 | 0.02
CNN | XGBoost | 0.80 | 0.8757 | 0.8954 | 0.8738 | 0.8746 | 0.9853 | 0.8330 | 0.9314 | 0.9499 | 0.9666 | 0.95 | 15.32 | 0.02
CNN | XGBoost | 0.85 | 0.8831 | 0.8905 | 0.8804 | 0.8813 | 0.9914 | 0.8720 | 0.9573 | 0.9759 | 0.9889 | 0.92 | 15.31 | 0.02
CNN | XGBoost | 0.90 | 0.8998 | 0.9055 | 0.8980 | 0.8974 | 0.9943 | 0.8942 | 0.9703 | 0.9870 | 0.9926 | 0.99 | 15.40 | 0.02
CNN | XGBoost | 0.95 | 0.9610 | 0.9631 | 0.9595 | 0.9600 | 0.9990 | 0.9703 | 1.0000 | 1.0000 | 1.0000 | 0.94 | 15.37 | 0.02
CNN | XGBoost | 1.00 | 0.9388 | 0.9408 | 0.9387 | 0.9390 | 0.9979 | 0.9610 | 0.9889 | 0.9963 | 0.9981 | 0.97 | 15.35 | 0.02
Table A2. Class-wise ROC AUC Results.
Pretext Model | Downstream Model | k | ROC AUC Class 0 | ROC AUC Class 1 | ROC AUC Class 2 | ROC AUC Class 3 | ROC AUC Class 4 | ROC AUC Class 5 | ROC AUC Class 6 | ROC AUC Class 7 | ROC AUC Class 8 | ROC AUC Class 9
CNN | CNN | 0.05 | 0.5568 | 0.7209 | 0.5219 | 0.4380 | 0.6457 | 0.7334 | 0.7919 | 0.6927 | 0.4587 | 0.7500
CNN | CNN | 0.10 | 0.7390 | 0.8567 | 0.5920 | 0.7426 | 0.8580 | 0.3955 | 0.8208 | 0.8817 | 0.7599 | 0.8620
CNN | CNN | 0.15 | 0.7413 | 0.7178 | 0.5119 | 0.6032 | 0.5930 | 0.2652 | 0.7382 | 0.5903 | 0.6354 | 0.9373
CNN | CNN | 0.20 | 0.7674 | 0.8638 | 0.6368 | 0.5936 | 0.5579 | 0.7122 | 0.9944 | 0.5998 | 0.6374 | 0.7939
CNN | CNN | 0.25 | 0.7984 | 0.9121 | 0.6135 | 0.6763 | 0.5776 | 0.9331 | 0.9940 | 0.6906 | 0.7286 | 0.9559
CNN | CNN | 0.30 | 0.8041 | 0.7265 | 0.5957 | 0.6430 | 0.5632 | 0.7720 | 0.7124 | 0.6803 | 0.7229 | 0.9490
CNN | CNN | 0.35 | 0.7636 | 0.9306 | 0.6713 | 0.4892 | 0.5066 | 0.6423 | 0.8176 | 0.5022 | 0.7995 | 0.8855
CNN | CNN | 0.40 | 0.8419 | 0.7903 | 0.6702 | 0.8461 | 0.7251 | 0.7800 | 0.8889 | 0.8962 | 0.7914 | 0.9722
CNN | CNN | 0.45 | 0.7527 | 0.6353 | 0.6523 | 0.8581 | 0.7487 | 0.6378 | 0.7648 | 0.8750 | 0.8128 | 0.7460
CNN | CNN | 0.50 | 0.9615 | 0.8719 | 0.7120 | 0.6624 | 0.7222 | 0.9026 | 0.9357 | 0.8968 | 0.8392 | 0.9835
CNN | CNN | 0.55 | 0.8395 | 0.8092 | 0.7108 | 0.7896 | 0.6909 | 0.5993 | 0.9503 | 0.8256 | 0.8453 | 0.8172
CNN | CNN | 0.60 | 0.9653 | 0.9339 | 0.8867 | 0.9202 | 0.8470 | 0.7131 | 0.6224 | 0.9415 | 0.9289 | 0.9944
CNN | CNN | 0.65 | 0.9259 | 0.9036 | 0.8015 | 0.7302 | 0.7924 | 0.9077 | 0.6826 | 0.9665 | 0.9155 | 0.8765
CNN | CNN | 0.70 | 0.8832 | 0.9748 | 0.8903 | 0.8994 | 0.8695 | 0.9111 | 0.7348 | 0.8954 | 0.9402 | 0.9991
CNN | CNN | 0.75 | 0.9564 | 0.9245 | 0.9459 | 0.8746 | 0.8925 | 0.9730 | 0.9882 | 0.9491 | 0.9758 | 0.9634
CNN | CNN | 0.80 | 0.9810 | 0.9282 | 0.9748 | 0.9134 | 0.9328 | 0.9959 | 0.9741 | 0.9673 | 0.9720 | 0.9769
CNN | CNN | 0.85 | 0.9920 | 0.8566 | 0.9258 | 0.9127 | 0.9265 | 0.9776 | 0.9342 | 0.9637 | 0.9869 | 0.9921
CNN | CNN | 0.90 | 0.9810 | 0.9669 | 0.9022 | 0.9524 | 0.8673 | 0.9637 | 0.9557 | 0.9828 | 0.9457 | 0.9935
CNN | CNN | 0.95 | 0.9754 | 0.9620 | 0.9752 | 0.9903 | 0.9643 | 0.9800 | 1.0000 | 0.9950 | 0.9813 | 0.9929
CNN | CNN | 1.00 | 0.9989 | 0.9985 | 0.9900 | 0.9808 | 0.9949 | 0.9677 | 0.9962 | 0.9490 | 0.9952 | 1.0000
CNN | EfficientNet B0 | 0.05 | 0.4932 | 0.6559 | 0.4841 | 0.4464 | 0.4314 | 0.5132 | 0.5381 | 0.4369 | 0.5265 | 0.6345
CNN | EfficientNet B0 | 0.10 | 0.7095 | 0.8147 | 0.1334 | 0.6790 | 0.6787 | 0.3839 | 0.4974 | 0.5951 | 0.6924 | 0.8371
CNN | EfficientNet B0 | 0.15 | 0.8558 | 0.7545 | 0.3730 | 0.6349 | 0.6438 | 0.5806 | 0.6745 | 0.5923 | 0.7538 | 0.8760
CNN | EfficientNet B0 | 0.20 | 0.7122 | 0.9366 | 0.9440 | 0.5616 | 0.8063 | 0.7635 | 0.7969 | 0.7337 | 0.8197 | 0.9639
CNN | EfficientNet B0 | 0.25 | 0.9355 | 0.9574 | 0.9725 | 0.7361 | 0.6659 | 0.7761 | 0.9619 | 0.8864 | 0.8216 | 0.9889
CNN | EfficientNet B0 | 0.30 | 0.8688 | 0.8132 | 0.8677 | 0.8318 | 0.6778 | 0.8534 | 0.7690 | 0.8232 | 0.8743 | 0.9783
CNN | EfficientNet B0 | 0.35 | 0.6965 | 0.9921 | 0.7626 | 0.8726 | 0.9347 | 0.8010 | 0.7752 | 0.8396 | 0.9247 | 0.9184
CNN | EfficientNet B0 | 0.40 | 0.9670 | 0.9212 | 0.9354 | 0.7887 | 0.6692 | 0.8088 | 0.9810 | 0.9709 | 0.9570 | 0.9996
CNN | EfficientNet B0 | 0.45 | 0.8457 | 0.7836 | 0.9511 | 0.8927 | 0.8980 | 0.8557 | 0.9029 | 0.8657 | 0.8644 | 0.9108
CNN | EfficientNet B0 | 0.50 | 0.9810 | 0.9797 | 0.9618 | 0.9167 | 0.8664 | 0.9977 | 0.9838 | 0.9078 | 0.9200 | 0.9670
CNN | EfficientNet B0 | 0.55 | 0.9390 | 0.9362 | 0.8504 | 0.8789 | 0.8826 | 0.9024 | 0.9378 | 0.9907 | 0.8854 | 0.8636
CNN | EfficientNet B0 | 0.60 | 0.9606 | 0.9776 | 0.9621 | 0.8785 | 0.9043 | 0.8644 | 0.9987 | 0.9730 | 0.9138 | 0.9922
CNN | EfficientNet B0 | 0.65 | 0.9554 | 0.9721 | 0.9655 | 0.8953 | 0.9591 | 0.9764 | 0.8396 | 0.9665 | 0.8835 | 0.9846
CNN | EfficientNet B0 | 0.70 | 0.9247 | 0.9944 | 0.9741 | 0.8758 | 0.9128 | 0.9239 | 0.8614 | 0.9573 | 0.9254 | 0.9942
CNN | EfficientNet B0 | 0.75 | 0.9720 | 0.9675 | 0.9928 | 0.9521 | 0.9429 | 0.9686 | 0.9995 | 0.9712 | 0.9852 | 0.9998
CNN | EfficientNet B0 | 0.80 | 0.9863 | 0.9879 | 0.9886 | 0.9495 | 0.9125 | 0.9659 | 0.9971 | 0.9784 | 0.9874 | 1.0000
CNN | EfficientNet B0 | 0.85 | 0.9977 | 0.9987 | 0.9950 | 0.9882 | 0.9583 | 0.9483 | 0.9882 | 0.9821 | 0.9805 | 0.9999
CNN | EfficientNet B0 | 0.90 | 0.9908 | 0.9953 | 0.9943 | 0.9952 | 0.9868 | 0.9999 | 0.9993 | 0.9991 | 0.9988 | 1.0000
CNN | EfficientNet B0 | 0.95 | 0.9995 | 0.9994 | 0.9997 | 0.9977 | 0.9975 | 0.9998 | 1.0000 | 0.9975 | 0.9917 | 1.0000
CNN | EfficientNet B0 | 1.00 | 0.9945 | 0.9993 | 0.9993 | 0.9802 | 0.9974 | 0.9975 | 1.0000 | 0.9962 | 0.9996 | 0.9959
CNN | EfficientNet B1 | 0.05 | 0.5123 | 0.4814 | 0.4072 | 0.4926 | 0.6544 | 0.5223 | 0.6182 | 0.4733 | 0.5364 | 0.6883
CNN | EfficientNet B1 | 0.10 | 0.6943 | 0.8577 | 0.1535 | 0.7190 | 0.7823 | 0.3261 | 0.4823 | 0.7443 | 0.7419 | 0.6872
CNN | EfficientNet B1 | 0.15 | 0.7478 | 0.8197 | 0.8953 | 0.4795 | 0.5056 | 0.9079 | 0.7924 | 0.5631 | 0.8795 | 0.9035
CNN | EfficientNet B1 | 0.20 | 0.7210 | 0.9099 | 0.4593 | 0.7162 | 0.6811 | 0.5622 | 0.9205 | 0.5711 | 0.7989 | 0.8131
CNN | EfficientNet B1 | 0.25 | 0.8807 | 0.9033 | 0.8719 | 0.7966 | 0.7762 | 0.8269 | 0.7524 | 0.7874 | 0.8995 | 0.9333
CNN | EfficientNet B1 | 0.30 | 0.9049 | 0.9086 | 0.8107 | 0.8252 | 0.8795 | 0.9129 | 0.9086 | 0.8965 | 0.8469 | 0.9191
CNN | EfficientNet B1 | 0.35 | 0.8690 | 0.9515 | 0.7898 | 0.6801 | 0.7599 | 0.9612 | 0.7831 | 0.9001 | 0.9110 | 0.9385
CNN | EfficientNet B1 | 0.40 | 0.9409 | 0.9412 | 0.8219 | 0.8275 | 0.8329 | 0.8422 | 0.9482 | 0.8985 | 0.9081 | 0.9849
CNN | EfficientNet B1 | 0.45 | 0.8401 | 0.9415 | 0.8252 | 0.8720 | 0.8814 | 0.9519 | 0.9432 | 0.8705 | 0.9424 | 0.9889
CNN | EfficientNet B1 | 0.50 | 0.9798 | 0.9566 | 0.8965 | 0.7691 | 0.8682 | 0.9729 | 0.9915 | 0.9736 | 0.9423 | 0.9866
CNN | EfficientNet B1 | 0.55 | 0.9013 | 0.9800 | 0.8391 | 0.8777 | 0.8921 | 0.9122 | 0.5611 | 0.9368 | 0.8111 | 0.8843
CNN | EfficientNet B1 | 0.60 | 0.9675 | 0.9776 | 0.9017 | 0.9324 | 0.9375 | 0.9418 | 0.9848 | 0.9814 | 0.9647 | 0.9997
CNN | EfficientNet B1 | 0.65 | 0.9151 | 0.9795 | 0.9509 | 0.8438 | 0.9511 | 0.9865 | 0.9111 | 0.9532 | 0.9707 | 0.9700
CNN | EfficientNet B1 | 0.70 | 0.9886 | 0.9964 | 0.9799 | 0.9546 | 0.9323 | 0.9761 | 0.8595 | 0.9590 | 0.9527 | 0.9999
CNN | EfficientNet B1 | 0.75 | 0.9836 | 0.9934 | 0.9659 | 0.9643 | 0.9778 | 0.9889 | 0.9877 | 0.9837 | 0.9858 | 0.9982
CNN | EfficientNet B1 | 0.80 | 0.9863 | 0.9886 | 0.9866 | 0.9472 | 0.9517 | 0.9769 | 0.9948 | 0.9743 | 0.9724 | 0.9828
CNN | EfficientNet B1 | 0.85 | 0.9986 | 0.9333 | 0.9901 | 0.9863 | 0.9781 | 0.9811 | 0.9936 | 0.9849 | 0.9862 | 0.9996
CNN | EfficientNet B1 | 0.90 | 0.9984 | 0.9977 | 0.9954 | 0.9932 | 0.9891 | 0.9999 | 1.0000 | 0.9970 | 0.9962 | 1.0000
CNN | EfficientNet B1 | 0.95 | 0.9992 | 0.9989 | 0.9982 | 0.9931 | 0.9884 | 0.9982 | 0.9998 | 0.9952 | 0.9919 | 1.0000
CNN | EfficientNet B1 | 1.00 | 0.9991 | 0.9935 | 0.9950 | 0.9937 | 0.9992 | 0.9937 | 0.9996 | 0.9640 | 0.9976 | 1.0000
CNN | EfficientNet B2 | 0.05 | 0.5494 | 0.8645 | 0.4849 | 0.4274 | 0.6190 | 0.7308 | 0.5941 | 0.3269 | 0.4660 | 0.6657
CNN | EfficientNet B2 | 0.10 | 0.6972 | 0.8507 | 0.2254 | 0.7244 | 0.6955 | 0.2878 | 0.4285 | 0.7172 | 0.7331 | 0.6587
CNN | EfficientNet B2 | 0.15 | 0.6768 | 0.7524 | 0.5497 | 0.5571 | 0.5059 | 0.8802 | 0.5800 | 0.6390 | 0.7664 | 0.8207
CNN | EfficientNet B2 | 0.20 | 0.7000 | 0.9512 | 0.5908 | 0.6733 | 0.7593 | 0.4845 | 0.4921 | 0.8677 | 0.8573 | 0.8672
CNN | EfficientNet B2 | 0.25 | 0.8247 | 0.8569 | 0.6395 | 0.8326 | 0.7541 | 0.8678 | 0.9334 | 0.8938 | 0.8886 | 0.9552
CNN | EfficientNet B2 | 0.30 | 0.9087 | 0.9307 | 0.8724 | 0.7825 | 0.7919 | 0.7380 | 0.8092 | 0.7210 | 0.7333 | 0.9983
CNN | EfficientNet B2 | 0.35 | 0.8400 | 0.9739 | 0.8854 | 0.4540 | 0.6242 | 0.7235 | 0.7164 | 0.8154 | 0.6133 | 0.9949
CNN | EfficientNet B2 | 0.40 | 0.9409 | 0.8096 | 0.9500 | 0.8563 | 0.5479 | 0.8565 | 0.9166 | 0.9612 | 0.9285 | 0.9952
CNN | EfficientNet B2 | 0.45 | 0.9171 | 0.7505 | 0.8685 | 0.8290 | 0.8727 | 0.8345 | 0.8954 | 0.8971 | 0.8525 | 0.9395
CNN | EfficientNet B2 | 0.50 | 0.8460 | 0.9558 | 0.9719 | 0.9032 | 0.8626 | 0.9125 | 0.9767 | 0.8959 | 0.9131 | 0.9790
CNN | EfficientNet B2 | 0.55 | 0.9506 | 0.9811 | 0.8792 | 0.8253 | 0.6980 | 0.6320 | 0.9755 | 0.8728 | 0.8824 | 0.8770
CNN | EfficientNet B2 | 0.60 | 0.9523 | 0.9814 | 0.9394 | 0.8717 | 0.9036 | 0.9533 | 0.8732 | 0.9509 | 0.9620 | 0.9979
CNN | EfficientNet B2 | 0.65 | 0.9264 | 0.9867 | 0.9352 | 0.9053 | 0.9385 | 0.9689 | 0.7508 | 0.9794 | 0.9528 | 0.9891
CNN | EfficientNet B2 | 0.70 | 0.9186 | 0.9959 | 0.9359 | 0.8939 | 0.9303 | 0.9336 | 0.8237 | 0.9666 | 0.9405 | 0.9933
CNN | EfficientNet B2 | 0.75 | 0.9898 | 0.9950 | 0.9888 | 0.9636 | 0.9594 | 0.9898 | 0.9986 | 0.9904 | 0.9844 | 0.9960
CNN | EfficientNet B2 | 0.80 | 0.9934 | 0.9946 | 0.9908 | 0.9169 | 0.9308 | 0.9386 | 0.9733 | 0.9795 | 0.9724 | 0.9933
CNN | EfficientNet B2 | 0.85 | 0.9944 | 0.9976 | 0.9973 | 0.9973 | 0.9880 | 0.9878 | 0.9998 | 0.9955 | 0.9991 | 1.0000
CNN | EfficientNet B2 | 0.90 | 0.9983 | 0.9953 | 0.9998 | 0.9963 | 0.9780 | 0.9975 | 0.9977 | 0.9983 | 0.9961 | 1.0000
CNN | EfficientNet B2 | 0.95 | 0.9949 | 0.9926 | 0.9983 | 0.9992 | 0.9976 | 0.9915 | 1.0000 | 0.9993 | 0.9912 | 1.0000
CNN | EfficientNet B2 | 1.00 | 0.9977 | 0.9948 | 0.9965 | 0.9949 | 0.9953 | 0.9779 | 1.0000 | 0.9749 | 0.9909 | 0.9967
CNN | EfficientNet B3 | 0.05 | 0.4968 | 0.3798 | 0.5312 | 0.4083 | 0.5783 | 0.6218 | 0.7011 | 0.2815 | 0.4276 | 0.6525
CNN | EfficientNet B3 | 0.10 | 0.7254 | 0.8349 | 0.2088 | 0.5679 | 0.6688 | 0.5045 | 0.7333 | 0.6541 | 0.7243 | 0.8149
CNN | EfficientNet B3 | 0.15 | 0.7871 | 0.8449 | 0.4397 | 0.5949 | 0.5318 | 0.4287 | 0.4216 | 0.7417 | 0.6309 | 0.8602
CNN | EfficientNet B3 | 0.20 | 0.9196 | 0.8459 | 0.6051 | 0.6711 | 0.7909 | 0.8447 | 0.3569 | 0.7604 | 0.7678 | 0.9208
CNN | EfficientNet B3 | 0.25 | 0.9394 | 0.9038 | 0.8097 | 0.7203 | 0.6073 | 0.7834 | 0.8705 | 0.8188 | 0.6895 | 0.9765
CNN | EfficientNet B3 | 0.30 | 0.8524 | 0.8011 | 0.9392 | 0.7108 | 0.6294 | 0.8560 | 0.7441 | 0.8423 | 0.8565 | 0.9938
CNN | EfficientNet B3 | 0.35 | 0.7299 | 0.9699 | 0.6498 | 0.6786 | 0.8453 | 0.8155 | 0.7869 | 0.7109 | 0.7695 | 0.9580
CNN | EfficientNet B3 | 0.40 | 0.8691 | 0.9635 | 0.9478 | 0.8433 | 0.7355 | 0.8040 | 0.9587 | 0.9538 | 0.9450 | 0.9999
CNN | EfficientNet B3 | 0.45 | 0.9471 | 0.9412 | 0.9028 | 0.7484 | 0.7962 | 0.9177 | 0.8549 | 0.9222 | 0.9557 | 0.8275
CNN | EfficientNet B3 | 0.50 | 0.9626 | 0.9364 | 0.9367 | 0.7504 | 0.8178 | 0.9631 | 0.9093 | 0.8435 | 0.9188 | 0.9751
CNN | EfficientNet B3 | 0.55 | 0.9258 | 0.9188 | 0.8911 | 0.8467 | 0.7951 | 0.8606 | 0.9909 | 0.8180 | 0.8390 | 0.8614
CNN | EfficientNet B3 | 0.60 | 0.9726 | 0.9848 | 0.9134 | 0.8355 | 0.7802 | 0.9018 | 0.9919 | 0.9698 | 0.9690 | 0.9926
CNN | EfficientNet B3 | 0.65 | 0.9411 | 0.9953 | 0.9616 | 0.9063 | 0.7522 | 0.9654 | 0.9256 | 0.9773 | 0.9221 | 0.9326
CNN | EfficientNet B3 | 0.70 | 0.9711 | 0.9579 | 0.9819 | 0.7997 | 0.8927 | 0.9579 | 0.9642 | 0.9655 | 0.9634 | 0.9990
CNN | EfficientNet B3 | 0.75 | 0.9880 | 0.9748 | 0.9705 | 0.9333 | 0.9118 | 0.9430 | 0.9916 | 0.9831 | 0.9657 | 0.9656
CNN | EfficientNet B3 | 0.80 | 0.9933 | 0.9956 | 0.9943 | 0.9195 | 0.9667 | 0.9300 | 0.9938 | 0.9834 | 0.9731 | 0.9846
CNN | EfficientNet B3 | 0.85 | 0.9966 | 0.9919 | 0.9935 | 0.9656 | 0.9807 | 0.9694 | 0.9718 | 0.9930 | 0.9873 | 1.0000
CNN | EfficientNet B3 | 0.90 | 0.9944 | 0.9986 | 0.9980 | 0.9945 | 0.9855 | 0.9873 | 0.9963 | 0.9903 | 0.9901 | 1.0000
CNN | EfficientNet B3 | 0.95 | 0.9999 | 0.9986 | 0.9991 | 0.9966 | 0.9962 | 1.0000 | 1.0000 | 0.9985 | 0.9955 | 1.0000
CNN | EfficientNet B3 | 1.00 | 0.9982 | 0.9986 | 0.9996 | 0.9994 | 0.9999 | 0.9984 | 1.0000 | 0.9733 | 0.9968 | 1.0000
CNN | GAN Classifier | 0.10 | 0.7651 | 0.9425 | 0.5380 | 0.7242 | 0.8234 | 0.6683 | 0.8015 | 0.9217 | 0.7988 | 0.9872
CNN | GAN Classifier | 0.15 | 0.8616 | 0.7816 | 0.5147 | 0.6475 | 0.5478 | 0.5531 | 0.4636 | 0.7321 | 0.8074 | 0.7218
CNN | GAN Classifier | 0.20 | 0.8412 | 0.8776 | 0.3709 | 0.8019 | 0.7156 | 0.7122 | 0.6487 | 0.6157 | 0.8263 | 0.8386
CNN | GAN Classifier | 0.25 | 0.9300 | 0.9408 | 0.7329 | 0.7651 | 0.7195 | 0.9614 | 0.9189 | 0.6469 | 0.7851 | 0.9999
CNN | GAN Classifier | 0.30 | 0.8488 | 0.8877 | 0.7506 | 0.7860 | 0.5231 | 0.9518 | 0.7367 | 0.7562 | 0.8158 | 0.9818
CNN | GAN Classifier | 0.35 | 0.8117 | 0.9495 | 0.7226 | 0.5537 | 0.6264 | 0.8750 | 0.8817 | 0.8340 | 0.6901 | 0.9861
CNN | GAN Classifier | 0.40 | 0.8548 | 0.9059 | 0.9365 | 0.8948 | 0.7913 | 0.7726 | 0.9548 | 0.8897 | 0.9440 | 0.9819
CNN | GAN Classifier | 0.45 | 0.8528 | 0.8532 | 0.9275 | 0.9478 | 0.8640 | 0.9567 | 0.9532 | 0.8553 | 0.9284 | 0.7513
CNN | GAN Classifier | 0.50 | 0.9787 | 0.9837 | 0.9133 | 0.9149 | 0.8054 | 0.9908 | 0.8915 | 0.8969 | 0.9191 | 0.9897
CNN | GAN Classifier | 0.55 | 0.9473 | 0.9441 | 0.8870 | 0.8755 | 0.8416 | 0.9462 | 0.8428 | 0.7935 | 0.8596 | 0.8275
CNN | GAN Classifier | 0.60 | 0.9698 | 0.9842 | 0.9649 | 0.9000 | 0.8302 | 0.8585 | 0.9043 | 0.9511 | 0.9575 | 0.9885
CNN | GAN Classifier | 0.65 | 0.9429 | 0.9640 | 0.9213 | 0.9445 | 0.8431 | 0.9604 | 0.7882 | 0.9652 | 0.9837 | 0.9162
CNN | GAN Classifier | 0.70 | 0.9321 | 0.9564 | 0.9082 | 0.9194 | 0.9155 | 0.9457 | 0.7466 | 0.9736 | 0.9776 | 0.9914
CNN | GAN Classifier | 0.75 | 0.9920 | 0.9947 | 0.9888 | 0.9768 | 0.9076 | 0.9366 | 0.9986 | 0.9841 | 0.9888 | 0.9999
CNN | GAN Classifier | 0.80 | 0.9844 | 0.9873 | 0.9733 | 0.9483 | 0.9632 | 0.9571 | 0.9758 | 0.9903 | 0.9774 | 0.9828
CNN | GAN Classifier | 0.85 | 0.9910 | 0.9102 | 0.9695 | 0.9975 | 0.9767 | 0.9945 | 0.9910 | 0.9874 | 0.9982 | 0.9801
CNN | GAN Classifier | 0.90 | 0.9934 | 0.9955 | 0.9973 | 0.9945 | 0.9696 | 0.9998 | 0.9989 | 0.9985 | 0.9973 | 0.9983
CNN | GAN Classifier | 0.95 | 0.9965 | 1.0000 | 0.9996 | 0.9992 | 0.9971 | 0.9990 | 0.9988 | 0.9997 | 0.9992 | 1.0000
CNN | GAN Classifier | 1.00 | 0.9997 | 0.9998 | 0.9975 | 0.9785 | 0.9999 | 0.9826 | 1.0000 | 0.9999 | 0.9998 | 0.9897
CNN | Gradient Boosting | 0.05 | 0.5845 | 0.8453 | 0.7598 | 0.4249 | 0.5479 | 0.6471 | 0.5792 | 0.8238 | 0.4277 | 0.7450
CNN | Gradient Boosting | 0.10 | 0.7914 | 0.9006 | 0.8271 | 0.5020 | 0.7584 | 0.7133 | 0.4511 | 0.7595 | 0.5627 | 0.8635
CNN | Gradient Boosting | 0.15 | 0.8227 | 0.9620 | 0.8891 | 0.5743 | 0.8273 | 0.8345 | 0.9824 | 0.6842 | 0.7786 | 0.9589
CNN | Gradient Boosting | 0.20 | 0.8219 | 0.9204 | 0.8036 | 0.8227 | 0.8703 | 0.8497 | 0.9839 | 0.6714 | 0.8485 | 0.8845
CNN | Gradient Boosting | 0.25 | 0.8058 | 0.8887 | 0.8468 | 0.7657 | 0.7853 | 0.7341 | 0.7660 | 0.7880 | 0.7579 | 0.9859
CNN | Gradient Boosting | 0.30 | 0.8485 | 0.8511 | 0.7048 | 0.9094 | 0.8071 | 0.7709 | 0.9114 | 0.8049 | 0.8222 | 0.9901
CNN | Gradient Boosting | 0.35 | 0.8153 | 0.9375 | 0.7678 | 0.7328 | 0.8400 | 0.8217 | 0.9220 | 0.8625 | 0.7754 | 0.9645
CNN | Gradient Boosting | 0.40 | 0.9354 | 0.9104 | 0.7300 | 0.8817 | 0.8548 | 0.7308 | 0.9791 | 0.8815 | 0.8831 | 0.9977
CNN | Gradient Boosting | 0.45 | 0.9070 | 0.9581 | 0.6708 | 0.7045 | 0.8464 | 0.7959 | 0.8707 | 0.7604 | 0.8730 | 0.9494
CNN | Gradient Boosting | 0.50 | 0.9530 | 0.9486 | 0.9304 | 0.9061 | 0.9087 | 0.9730 | 0.9995 | 0.8390 | 0.8809 | 0.9418
CNN | Gradient Boosting | 0.55 | 0.9354 | 0.9799 | 0.8900 | 0.8917 | 0.9693 | 0.9280 | 0.9177 | 0.9642 | 0.8518 | 0.9681
CNN | Gradient Boosting | 0.60 | 0.9074 | 0.9763 | 0.8691 | 0.8619 | 0.8783 | 0.8494 | 0.9862 | 0.8855 | 0.9296 | 0.9901
CNN | Gradient Boosting | 0.65 | 0.9845 | 0.9885 | 0.9047 | 0.9176 | 0.9467 | 0.8461 | 0.8992 | 0.9969 | 0.9457 | 0.9972
CNN | Gradient Boosting | 0.70 | 0.9865 | 0.9826 | 0.9089 | 0.9428 | 0.9698 | 0.6661 | 0.7770 | 0.9855 | 0.9481 | 0.9963
CNN | Gradient Boosting | 0.75 | 0.9822 | 0.9965 | 0.9898 | 0.9867 | 0.9445 | 0.9892 | 0.9455 | 0.9899 | 0.9681 | 1.0000
CNN | Gradient Boosting | 0.80 | 0.9740 | 0.9947 | 0.9897 | 0.9128 | 0.9708 | 0.9592 | 0.9719 | 0.9800 | 0.9773 | 0.9935
CNN | Gradient Boosting | 0.85 | 0.9900 | 0.9872 | 0.9775 | 0.9492 | 0.9714 | 0.9521 | 0.9662 | 0.9890 | 0.9795 | 0.9958
CNN | Gradient Boosting | 0.90 | 0.9970 | 0.9861 | 0.9958 | 0.9882 | 0.9800 | 0.9956 | 0.9908 | 1.0000 | 0.9927 | 0.9963
CNN | Gradient Boosting | 0.95 | 0.9973 | 0.9945 | 0.9921 | 0.9988 | 0.9950 | 0.9984 | 0.9946 | 0.9998 | 0.9983 | 0.9998
CNN | Gradient Boosting | 1.00 | 0.9873 | 0.9996 | 1.0000 | 0.9993 | 0.9973 | 0.9933 | 0.9979 | 0.9989 | 0.9806 | 0.9968
CNNMobileNet v10.050.65880.88630.68300.55000.75350.88350.97100.87440.56010.7850
CNNMobileNet v10.100.69570.88960.90730.77090.77910.85120.98750.82530.79400.8703
CNNMobileNet v10.150.79930.94400.77270.69580.72550.80560.90400.90700.83260.8828
CNNMobileNet v10.200.88130.97500.87350.77480.55790.92530.96080.78210.89730.9729
CNNMobileNet v10.250.92210.97560.90010.81940.72730.87180.90830.91710.92320.9939
CNNMobileNet v10.300.95750.96070.78060.88620.77340.98960.92870.96770.93540.9985
CNNMobileNet v10.350.87650.97160.95120.81870.76000.94970.93210.80650.92350.9856
CNNMobileNet v10.400.96610.96690.85640.85820.83020.85850.95920.97480.93780.9995
CNNMobileNet v10.450.94750.98840.93430.95810.94990.87100.96670.98460.96641.0000
CNNMobileNet v10.500.97900.98170.93040.91710.70870.97850.90470.92580.94240.9925
CNNMobileNet v10.550.96310.98740.88370.91990.97360.96440.95270.98890.95680.9997
CNNMobileNet v10.600.99660.99190.99000.89680.87900.97610.99740.98860.95050.9982
CNNMobileNet v10.650.98080.99590.98200.93350.96280.99250.99390.99260.97240.9997
CNNMobileNet v10.700.98820.98810.96830.95870.87320.95900.99320.97620.97990.9990
CNNMobileNet v10.750.99330.99800.99810.95480.96240.97650.99490.99750.99291.0000
CNNMobileNet v10.800.99250.99080.98880.96400.91720.99240.98250.99230.98531.0000
CNNMobileNet v10.850.99950.99870.99860.98740.98170.99701.00000.99431.00001.0000
CNNMobileNet v10.900.99560.99911.00000.99420.98870.99981.00000.99840.99641.0000
CNNMobileNet v10.950.99900.99940.99930.99900.99231.00001.00000.99980.99911.0000
CNNMobileNet v11.001.00000.99980.99950.99990.99991.00001.00000.99991.00001.0000
CNNMobileNet v1 0.250.050.68220.83690.45340.64020.55050.67920.64250.90790.54510.8031
CNNMobileNet v1 0.250.100.56380.85130.89730.43390.54170.67110.94260.85910.53010.8339
CNNMobileNet v1 0.250.150.87150.90530.88750.90120.81040.77140.84940.91510.80820.7090
CNNMobileNet v1 0.250.200.87710.94220.72280.79720.89260.87520.91770.91260.91960.9344
CNNMobileNet v1 0.250.250.85650.95840.86920.76430.68090.86660.72250.95620.88490.9900
CNNMobileNet v1 0.250.300.86490.96330.83160.75750.74080.90710.89650.87390.90360.9568
CNNMobileNet v1 0.250.350.88920.92280.91490.62090.65110.85390.98800.82700.92540.9875
CNNMobileNet v1 0.250.400.91020.95870.89690.91200.83790.96510.98290.93750.91430.9999
CNNMobileNet v1 0.250.450.88210.93160.94070.80690.88370.90920.95880.96810.96310.9980
CNNMobileNet v1 0.250.500.94820.97690.97510.82690.83790.96500.99240.93220.88060.9721
CNNMobileNet v1 0.250.550.96810.99550.92670.94400.97100.96890.94790.99220.96680.9957
CNNMobileNet v1 0.250.600.96850.98630.92930.93040.92770.94930.96250.96010.95850.9910
CNNMobileNet v1 0.250.650.96660.97950.97510.94520.88710.96580.93380.96010.96190.9881
CNNMobileNet v1 0.250.700.95160.98120.97430.91410.86670.97500.97240.97910.95300.9953
CNNMobileNet v1 0.250.750.98330.99390.99510.93230.97000.97760.98890.99120.97601.0000
CNNMobileNet v1 0.250.800.97700.97270.96160.84020.82490.97220.99390.98470.95691.0000
CNNMobileNet v1 0.250.850.98040.99440.99120.98900.96770.99530.99240.99280.98981.0000
CNNMobileNet v1 0.250.900.97430.99170.99690.98360.96660.99520.99170.99790.99590.9999
CNNMobileNet v1 0.250.950.99870.98890.98980.99200.97120.99750.99830.99620.99121.0000
CNNMobileNet v1 0.251.000.99450.99880.99750.99900.98900.99920.99980.99970.99071.0000
CNNMobileNet v1 0.50.050.68280.65870.97850.59410.64810.71670.57750.89440.46160.8205
CNNMobileNet v1 0.50.100.65540.90180.84930.68710.61380.96330.96310.85960.80080.6343
CNNMobileNet v1 0.50.150.86200.87830.84640.63100.65530.75240.91150.81410.69630.8648
CNNMobileNet v1 0.50.200.91230.94870.77370.78790.62140.95620.93740.85320.79940.9764
CNNMobileNet v1 0.50.250.86400.95320.93520.79140.86250.97880.96540.92460.86260.9817
CNNMobileNet v1 0.50.300.94660.98400.96830.80510.78590.93270.99250.95240.89990.9922
CNNMobileNet v1 0.50.350.73750.97510.91160.68400.63800.90950.98490.85790.89660.9744
CNNMobileNet v1 0.50.400.94190.97050.91490.83330.81360.80150.97200.92360.93730.9999
CNNMobileNet v1 0.50.450.91050.96800.87430.86700.83240.96470.97350.95230.93800.9999
CNNMobileNet v1 0.50.500.95380.91290.97280.74650.73780.98210.99390.96540.94370.9995
CNNMobileNet v1 0.50.550.97350.99050.83560.89410.90470.97870.98760.97840.96120.9988
CNNMobileNet v1 0.50.600.98340.98350.91120.91390.88990.93720.99330.97830.96840.9988
CNNMobileNet v1 0.50.650.98570.99400.98560.94320.89340.98830.95590.97810.96070.9986
CNNMobileNet v1 0.50.700.97280.99290.98830.91920.91580.98290.99120.96330.96660.9989
CNNMobileNet v1 0.50.750.98970.99190.99230.95610.91090.98550.99910.99280.97941.0000
CNNMobileNet v1 0.50.800.99380.98550.99770.96210.92860.99210.99110.98700.99311.0000
CNNMobileNet v1 0.50.850.99560.99320.99480.96210.96500.99690.99550.99360.99111.0000
CNNMobileNet v1 0.50.900.99490.99690.99920.99460.97100.99931.00000.99840.99791.0000
CNNMobileNet v1 0.50.950.99740.99960.99850.99920.99591.00001.00000.99990.99821.0000
CNNMobileNet v1 0.51.000.99860.99880.99920.99970.99890.99971.00000.99970.99601.0000
CNNMobileNet v1 0.750.050.64420.71830.68140.52900.58980.98040.77360.82120.59580.7322
CNNMobileNet v1 0.750.100.76120.95600.47030.65870.64310.76290.65020.93300.66970.8744
CNNMobileNet v1 0.750.150.72460.94970.65540.77850.67020.82110.95800.89780.87750.5734
CNNMobileNet v1 0.750.200.84270.92040.88800.71820.77610.49290.89160.81200.87740.9951
CNNMobileNet v1 0.750.250.91850.95920.60370.83880.75260.76460.92060.94260.88780.9987
CNNMobileNet v1 0.750.300.84580.92430.59310.80850.68240.92220.89390.93240.83140.9962
CNNMobileNet v1 0.750.350.90730.95780.95690.63600.72160.92360.98370.77180.90160.9949
CNNMobileNet v1 0.750.400.95210.97600.91820.83050.83290.94350.98220.97270.94541.0000
CNNMobileNet v1 0.750.450.93380.98220.96270.84220.79840.91360.98540.97430.93880.9987
CNNMobileNet v1 0.750.500.95890.98540.98000.82900.92320.99300.94480.96480.95570.9794
CNNMobileNet v1 0.750.550.97780.99000.79010.89930.97890.94310.91950.97220.96370.9992
CNNMobileNet v1 0.750.600.95860.99150.97830.95520.91760.93000.98920.97520.96410.9997
CNNMobileNet v1 0.750.650.98060.99550.98430.96830.95550.98810.96790.99520.97981.0000
CNNMobileNet v1 0.750.700.98340.99840.98520.89120.89600.98210.93970.97510.96820.9999
CNNMobileNet v1 0.750.750.99080.99620.99370.94740.95250.99130.99860.98680.98881.0000
CNNMobileNet v1 0.750.800.99550.99570.99730.94400.94970.98680.99240.99380.99141.0000
CNNMobileNet v1 0.750.850.99990.99680.99980.99500.96260.99921.00000.99710.99971.0000
CNNMobileNet v1 0.750.900.99540.99321.00000.99650.95700.99990.99990.99800.99521.0000
CNNMobileNet v1 0.750.950.99270.99680.99980.99890.99141.00000.99980.99930.99801.0000
CNNMobileNet v1 0.751.001.00000.99990.99950.99970.99971.00001.00001.00000.99991.0000
CNNRandom Forest0.050.82640.89430.91160.53850.93290.89300.91910.95310.80890.9982
CNNRandom Forest0.100.91660.94120.97280.67310.93970.83460.98530.86860.84160.9985
CNNRandom Forest0.150.94660.96620.97060.74360.96600.84890.98400.82800.91740.9950
CNNRandom Forest0.200.88710.95900.94730.69910.94670.81150.98520.81940.89220.9982
CNNRandom Forest0.250.88670.97150.98690.90340.94230.91190.99370.90520.93650.9996
CNNRandom Forest0.300.93510.96090.96940.89780.93460.93170.99380.91020.91220.9982
CNNRandom Forest0.350.95850.97830.96980.73020.95900.92290.98120.90540.90880.9976
CNNRandom Forest0.400.96150.98740.96370.85970.93230.89140.99250.95890.95270.9984
CNNRandom Forest0.450.96910.98940.95890.83540.94020.96650.99600.94410.96650.9976
CNNRandom Forest0.500.98130.98060.98180.89650.95880.95720.99820.97970.95591.0000
CNNRandom Forest0.550.99590.99380.98140.95740.96990.95110.98960.99330.98020.9999
CNNRandom Forest0.600.99370.99250.99020.93040.97460.93960.99920.98660.97291.0000
CNNRandom Forest0.650.99650.99650.98140.97240.97320.96940.99400.99530.98380.9999
CNNRandom Forest0.700.99820.99870.99440.97210.97930.96040.99380.99660.97371.0000
CNNRandom Forest0.750.99530.99800.99880.95930.95570.98380.99900.99780.97731.0000
CNNRandom Forest0.800.99630.99810.99860.96040.96150.98020.99740.99680.99551.0000
CNNRandom Forest0.850.99870.99970.99940.98720.96960.98540.99890.99730.99301.0000
CNNRandom Forest0.900.99840.99850.99990.99810.97010.99660.99990.99910.99561.0000
CNNRandom Forest0.950.99970.99960.99981.00000.99461.00001.00000.99990.99911.0000
CNNRandom Forest1.000.99980.99991.00000.99990.99901.00001.00000.99960.99961.0000
CNNResNet180.050.58100.77750.44310.52070.71480.88720.46960.52860.59940.5979
CNNResNet180.100.75980.94780.93740.72720.68780.86700.63570.96560.55100.7344
CNNResNet180.150.74260.89010.62790.77370.75560.54380.96200.79740.87730.7288
CNNResNet180.200.87140.92850.61460.45420.63980.92920.89200.50110.80700.9349
CNNResNet180.250.91410.96720.72920.80370.63240.97060.95260.87910.93840.9970
CNNResNet180.300.89640.88240.61250.63920.57800.92760.69610.80850.82160.9715
CNNResNet180.350.87710.96790.88120.74180.72710.85700.68450.59300.74000.9682
CNNResNet180.400.94580.91450.95930.78520.62740.85980.99530.95430.93301.0000
CNNResNet180.450.81110.86370.85680.73770.75430.96770.94670.84440.95850.9635
CNNResNet180.500.94470.96240.86700.77710.70370.95870.99920.77710.84100.9836
CNNResNet180.550.98520.95450.82270.88420.84640.90910.88410.98330.90250.8963
CNNResNet180.600.97940.97810.98180.90750.81790.84320.99340.97250.94880.9954
CNNResNet180.650.96640.98340.94760.89850.88910.97250.93920.97160.92510.9654
CNNResNet180.700.95420.99580.99400.78920.78920.96850.58480.90930.88600.9994
CNNResNet180.750.99080.99600.99090.95540.87970.98690.99990.99830.99601.0000
CNNResNet180.800.99530.99490.99560.91600.94970.94850.98940.98680.99000.9828
CNNResNet180.850.99950.90700.99770.95910.96960.99560.99580.99600.99841.0000
CNNResNet180.900.99810.99960.99980.99520.98510.99710.98250.99890.99840.9993
CNNResNet180.950.99780.99960.99740.99990.99350.99901.00000.99940.99361.0000
CNNResNet181.000.99800.99600.99941.00000.99990.98330.99980.97971.00001.0000
CNNSVM0.050.82000.92080.96220.70150.94150.94610.99700.97300.69980.9893
CNNSVM0.100.85350.96310.97500.75410.89770.89610.92500.89450.91500.9995
CNNSVM0.150.92280.96320.92160.82670.93420.94250.97660.84160.94270.9997
CNNSVM0.200.89550.96860.91500.88750.95540.94680.98210.78950.92990.9999
CNNSVM0.250.93260.97970.97890.90130.93960.95810.99980.88150.93121.0000
CNNSVM0.300.92290.96730.96620.82310.92980.92320.95480.94060.94201.0000
CNNSVM0.350.98360.97990.95050.90080.93510.94760.99880.94150.94021.0000
CNNSVM0.400.97610.98000.95930.89220.92220.89730.99830.95400.96241.0000
CNNSVM0.450.98040.98820.96700.94930.93500.97270.99880.97530.98280.9999
CNNSVM0.500.98980.98980.97960.88650.90090.99061.00000.98900.97221.0000
CNNSVM0.550.99350.98960.97660.97310.98530.97190.99660.99390.99320.9999
CNNSVM0.600.99690.99560.99910.97700.97800.96420.99960.99080.99181.0000
CNNSVM0.650.99180.99710.99190.95080.95420.98320.99340.99940.98520.9889
CNNSVM0.700.99480.99770.99640.93730.97570.97890.99340.99320.97561.0000
CNNSVM0.750.99940.99920.99920.98050.97160.98561.00000.99910.99681.0000
CNNSVM0.800.99960.99360.99800.97420.98710.99060.99970.99600.99641.0000
CNNSVM0.851.00000.99971.00000.99370.99230.99400.99990.99980.99951.0000
CNNSVM0.900.99990.99980.99990.99860.99510.99961.00000.99900.99951.0000
CNNSVM0.950.99991.00001.00001.00000.99951.00001.00001.00001.00001.0000
CNNSVM1.001.00001.00001.00001.00001.00001.00001.00001.00001.00001.0000
CNNU-Net0.050.59230.91370.70790.53640.63870.74610.62870.81240.52920.6926
CNNU-Net0.100.78840.94050.63410.70850.75390.67230.76250.89680.84470.6275
CNNU-Net0.150.79930.89060.45840.72670.70740.87900.99040.85150.86510.8097
CNNU-Net0.200.89450.93670.55900.76530.81740.95210.94720.89780.90770.9920
CNNU-Net0.250.80270.93790.96610.75510.74290.71450.87950.86910.86341.0000
CNNU-Net0.300.89200.86370.73440.76710.57100.89020.95680.80390.82850.9964
CNNU-Net0.350.94780.98370.91780.86590.57250.94490.98030.97220.93190.9957
CNNU-Net0.400.82740.92080.79920.73590.77500.74180.91350.97630.93040.9970
CNNU-Net0.450.91250.96470.89980.80220.59380.93360.98210.96640.96171.0000
CNNU-Net0.500.98160.98520.99170.91690.85190.96490.94730.96090.93650.9998
CNNU-Net0.550.94420.98690.95180.96960.89050.92230.98300.92400.92790.8919
CNNU-Net0.600.98310.99150.99400.99020.90730.96190.99340.98480.97001.0000
CNNU-Net0.650.96830.95450.99070.91730.92780.97970.98710.99320.97440.9993
CNNU-Net0.700.97820.98620.98920.86230.89510.94010.94890.98740.96730.9994
CNNU-Net0.750.99270.97460.99640.90030.92040.97600.99900.98520.98410.9999
CNNU-Net0.800.98490.99400.98950.95400.95130.99741.00000.98970.97430.9982
CNNU-Net0.850.99430.99490.99980.98120.94740.99090.99950.99190.98581.0000
CNNU-Net0.900.99190.99830.99680.96300.94440.99370.99950.98610.95941.0000
CNNU-Net0.950.99840.99910.99850.99530.98780.99860.99951.00000.99121.0000
CNNU-Net1.000.97120.99940.99950.99950.97780.99831.00000.97850.96490.9987
CNNXGBoost0.050.64250.76940.64230.40940.59160.85690.54130.86300.73820.8445
CNNXGBoost0.100.78290.69330.79150.54750.71010.84370.86130.76550.52670.8597
CNNXGBoost0.150.88410.87310.68800.51990.91190.82250.95650.67370.72460.9732
CNNXGBoost0.200.92300.87740.84770.81190.87480.74820.99240.70100.89300.9251
CNNXGBoost0.250.86810.94650.71430.82370.82750.91750.72590.92920.88230.9993
CNNXGBoost0.300.89130.89620.73030.82950.93780.87610.94980.87700.87070.9946
CNNXGBoost0.350.92020.97540.82000.86340.94880.87370.97130.83950.87880.9752
CNNXGBoost0.400.96620.94620.77170.90580.89130.91790.99430.95610.95860.9944
CNNXGBoost0.450.90730.98520.84860.72890.92380.80770.94470.72210.86450.9337
CNNXGBoost0.500.98930.94840.95280.91380.97570.94530.99750.96050.96190.9891
CNNXGBoost0.550.95970.98410.95340.97050.97430.98230.98600.98930.93700.9890
CNNXGBoost0.600.98670.98690.92910.94380.93830.94590.99860.96770.95671.0000
CNNXGBoost0.650.97750.99230.92980.92920.95520.92980.93510.99420.96930.9999
CNNXGBoost0.700.97720.99560.95660.96360.95540.86640.97270.99330.94970.9996
CNNXGBoost0.750.99520.99470.99840.97690.95950.99030.99780.99610.98401.0000
CNNXGBoost0.800.99150.99630.99940.94270.97760.98360.99970.99230.98791.0000
CNNXGBoost0.850.99620.99700.99620.98610.98440.97780.99810.99840.98961.0000
CNNXGBoost0.900.99510.99910.99380.98430.98860.99680.99760.99850.99700.9994
CNNXGBoost0.950.99950.99980.99950.99920.99641.00001.00000.99960.99831.0000
CNNXGBoost1.000.99830.99301.00000.99790.99740.99841.00000.99850.99511.0000
Table A3. Model Performance Metrics and Timing Results (Cross Validation Experiment).
Pretext Model | Downstream Model | Accuracy | Precision | Recall | F1 Score | ROC AUC | TPR@FPR = 0.01 | TPR@FPR = 0.05 | TPR@FPR = 0.10 | TPR@FPR = 0.15 | Test Feat. Ext. (ms/img) | Inference Time (ms/img)
CNN | XGBoost | 0.9565 | 0.9592 | 0.9564 | 0.9567 | 0.9987 | 0.9761 | 0.9978 | 0.9989 | 0.9989 | 15.35 | 0.04
CNN | Random Forest | 0.9696 | 0.9705 | 0.9694 | 0.9694 | 0.9986 | 0.9747 | 0.9957 | 0.9978 | 0.9989 | 15.35 | 0.05
CNN | SVM | 0.9913 | 0.9917 | 0.9913 | 0.9914 | 0.9999 | 0.9989 | 1.0000 | 1.0000 | 1.0000 | 15.35 | 0.08
CNN | Gradient Boosting | 0.9413 | 0.9467 | 0.9412 | 0.9413 | 0.9959 | 0.9543 | 0.9815 | 0.9902 | 0.9935 | 15.35 | 0.01
CNN | ResNet18 | 0.9652 | 0.9680 | 0.9654 | 0.9655 | 0.9994 | 0.9848 | 0.9978 | 1.0000 | 1.0000 | 15.35 | 0.13
CNN | CNN | 0.9022 | 0.9069 | 0.9022 | 0.9026 | 0.9916 | 0.8946 | 0.9739 | 0.9870 | 0.9880 | 15.35 | 0.03
CNN | U-Net | 0.8793 | 0.8897 | 0.8795 | 0.8788 | 0.9924 | 0.8511 | 0.9598 | 0.9848 | 0.9946 | 15.35 | 0.35
CNN | MobileNet v1 | 0.9554 | 0.9598 | 0.9552 | 0.9554 | 0.9984 | 0.9707 | 0.9967 | 0.9967 | 0.9978 | 15.35 | 0.05
CNN | MobileNet v1 0.75 | 0.9261 | 0.9317 | 0.9264 | 0.9267 | 0.9976 | 0.9337 | 0.9935 | 0.9967 | 0.9978 | 15.35 | 0.05
CNN | MobileNet v1 0.5 | 0.9359 | 0.9385 | 0.9359 | 0.9358 | 0.9972 | 0.9522 | 0.9902 | 0.9957 | 0.9978 | 15.35 | 0.11
CNN | MobileNet v1 0.25 | 0.8804 | 0.8882 | 0.8809 | 0.8788 | 0.9910 | 0.8326 | 0.9576 | 0.9826 | 0.9902 | 15.35 | 0.05
CNN | EfficientNet B0 | 0.9337 | 0.9383 | 0.9338 | 0.9335 | 0.9968 | 0.9402 | 0.9913 | 0.9946 | 0.9967 | 15.35 | 0.16
CNN | EfficientNet B1 | 0.9315 | 0.9353 | 0.9315 | 0.9311 | 0.9969 | 0.9402 | 0.9870 | 0.9924 | 0.9957 | 15.35 | 0.22
CNN | EfficientNet B2 | 0.9109 | 0.9206 | 0.9116 | 0.9115 | 0.9947 | 0.9109 | 0.9761 | 0.9891 | 0.9957 | 15.35 | 0.25
CNN | EfficientNet B3 | 0.9174 | 0.9262 | 0.9177 | 0.9178 | 0.9960 | 0.9217 | 0.9804 | 0.9946 | 0.9967 | 15.35 | 0.28
CNN | GAN Classifier | 0.9533 | 0.9578 | 0.9536 | 0.9536 | 0.9978 | 0.9663 | 0.9957 | 0.9967 | 0.9989 | 15.35 | 0.04
Table A4. Class-wise ROC AUC Results (Cross Validation Experiment).
Pretext Model | Downstream Model | ROC AUC Class 0 | Class 1 | Class 2 | Class 3 | Class 4 | Class 5 | Class 6 | Class 7 | Class 8 | Class 9
CNN | XGBoost | 0.9941 | 0.9997 | 0.9996 | 0.9988 | 0.9984 | 0.9999 | 1.0000 | 0.9984 | 0.9976 | 0.9999
CNN | Random Forest | 0.9960 | 0.9987 | 0.9997 | 0.9972 | 0.9981 | 0.9998 | 1.0000 | 0.9974 | 0.9981 | 0.9999
CNN | SVM | 0.9999 | 0.9999 | 0.9999 | 1.0000 | 0.9999 | 1.0000 | 1.0000 | 0.9998 | 1.0000 | 0.9999
CNN | Gradient Boosting | 0.9880 | 0.9946 | 0.9953 | 0.9986 | 0.9959 | 0.9986 | 0.9987 | 0.9986 | 0.9974 | 0.9996
CNN | ResNet18 | 0.9995 | 0.9995 | 0.9999 | 0.9999 | 0.9974 | 1.0000 | 1.0000 | 0.9991 | 0.9988 | 0.9999
CNN | CNN | 0.9830 | 0.9869 | 0.9849 | 0.9858 | 0.9879 | 0.9948 | 0.9958 | 0.9976 | 0.9906 | 0.9991
CNN | U-Net | 0.9805 | 0.9979 | 0.9993 | 0.9872 | 0.9892 | 0.9928 | 0.9982 | 0.9963 | 0.9845 | 0.9999
CNN | MobileNet v1 | 0.9966 | 0.9992 | 0.9999 | 0.9989 | 0.9975 | 0.9997 | 1.0000 | 0.9968 | 0.9959 | 0.9999
CNN | MobileNet v1 0.75 | 0.9946 | 0.9991 | 0.9989 | 0.9963 | 0.9958 | 0.9978 | 0.9999 | 0.9954 | 0.9968 | 1.0000
CNN | MobileNet v1 0.5 | 0.9985 | 0.9978 | 0.9988 | 0.9923 | 0.9917 | 0.9989 | 1.0000 | 0.9953 | 0.9931 | 1.0000
CNN | MobileNet v1 0.25 | 0.9886 | 0.9962 | 0.9990 | 0.9813 | 0.9719 | 0.9951 | 0.9984 | 0.9892 | 0.9846 | 1.0000
CNN | EfficientNet B0 | 0.9965 | 0.9998 | 0.9963 | 0.9947 | 0.9919 | 0.9994 | 0.9997 | 0.9943 | 0.9965 | 0.9947
CNN | EfficientNet B1 | 0.9919 | 0.9997 | 0.9987 | 0.9953 | 0.9930 | 0.9972 | 0.9992 | 0.9971 | 0.9951 | 1.0000
CNN | EfficientNet B2 | 0.9882 | 0.9971 | 0.9974 | 0.9895 | 0.9893 | 0.9966 | 0.9992 | 0.9932 | 0.9925 | 1.0000
CNN | EfficientNet B3 | 0.9966 | 0.9980 | 0.9989 | 0.9918 | 0.9922 | 0.9976 | 0.9994 | 0.9922 | 0.9940 | 0.9999
CNN | GAN Classifier | 0.9995 | 0.9998 | 0.9997 | 0.9967 | 0.9982 | 0.9991 | 0.9998 | 0.9975 | 0.9965 | 0.9942
Table A5. SimCLR Performance Metrics.
k | Accuracy | Precision | Recall | F1 Score
0.05 | 0.1577 | 0.1233 | 0.1577 | 0.0863
0.10 | 0.3135 | 0.3633 | 0.3135 | 0.2360
0.15 | 0.3210 | 0.5349 | 0.3210 | 0.2633
0.20 | 0.4545 | 0.5505 | 0.4545 | 0.4222
0.25 | 0.4675 | 0.7500 | 0.4675 | 0.4763
0.30 | 0.5900 | 0.6891 | 0.5900 | 0.5763
0.35 | 0.6085 | 0.6602 | 0.6085 | 0.5994
0.40 | 0.6382 | 0.6655 | 0.6382 | 0.6192
0.45 | 0.5436 | 0.6348 | 0.5436 | 0.5211
0.50 | 0.6846 | 0.7175 | 0.6846 | 0.6719
0.55 | 0.7124 | 0.7541 | 0.7124 | 0.7120
0.60 | 0.7551 | 0.7975 | 0.7551 | 0.7524
0.65 | 0.7532 | 0.7792 | 0.7532 | 0.7490
0.70 | 0.7514 | 0.7698 | 0.7514 | 0.7449
0.75 | 0.6809 | 0.7821 | 0.6809 | 0.6779
0.80 | 0.7737 | 0.8043 | 0.7737 | 0.7690
0.85 | 0.8126 | 0.8278 | 0.8126 | 0.8060
0.90 | 0.8738 | 0.8783 | 0.8738 | 0.8719
0.95 | 0.8794 | 0.8863 | 0.8794 | 0.8779
1.00 | 0.9202 | 0.9215 | 0.9202 | 0.9199

Figure 1. Detailed architecture of the proposed self-supervised learning framework. Main stages: (1) Input Processing: converting raw SAR images to normalized format with elevation-based data splitting, (2) Pretext Tasks: nine distinct transformation tasks for self-supervised learning, (3) CNN Architecture: specialized network with four convolutional blocks and dense layer for feature learning, and (4) Downstream Classification: multiple classifier evaluation using extracted features.
Figure 2. Visual demonstration of all nine pretext transformations ( T 0 to T 8 ) applied to representative SAR images from each target class. Each row shows a different vehicle class, while each column represents a specific transformation.
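As a hedged illustration of Figure 2, the nine pretext tasks (T0 to T8) can be sketched with numpy/scipy; the specific filter sizes, blur sigma, and zoom factor shown here are assumptions for demonstration, not the paper's exact settings:

```python
import numpy as np
from scipy import ndimage

def pretext_transforms(img):
    """Generate the nine (transformed image, task label) pairs T0-T8.

    T0: identity; T1-T5: geometric (rotations, flips);
    T6-T7: signal quality (denoise, blur); T8: multi-scale (center zoom).
    Parameter values below are illustrative assumptions.
    """
    h, w = img.shape
    zoomed = ndimage.zoom(img, 1.5, order=1)       # upscale for T8
    top = (zoomed.shape[0] - h) // 2
    left = (zoomed.shape[1] - w) // 2
    tasks = [
        img,                                        # T0 identity
        np.rot90(img, 1),                           # T1 rotate 90
        np.rot90(img, 2),                           # T2 rotate 180
        np.rot90(img, 3),                           # T3 rotate 270
        np.fliplr(img),                             # T4 horizontal flip
        np.flipud(img),                             # T5 vertical flip
        ndimage.median_filter(img, size=3),         # T6 speckle denoise
        ndimage.gaussian_filter(img, sigma=1.0),    # T7 blur
        zoomed[top:top + h, left:left + w],         # T8 center crop of zoom
    ]
    return [(t.astype(np.float32), label) for label, t in enumerate(tasks)]

pairs = pretext_transforms(np.random.rand(64, 64))
```

The pretext CNN is then trained to predict the task label from the transformed image, so every unlabeled SAR chip yields nine supervised examples.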
Figure 3. Comprehensive performance evaluation of different downstream classifiers in varying data availability scenarios.
Figure 4. Learning curves showing accuracy evolution of top 5 classifiers across different training data fractions (k = 0.05–1.00). Each colored line represents a classifier’s performance trajectory.
Figure 5. Task group contribution analysis at full data availability (k = 1.00) using SVM classifier. The heatmap shows performance metrics (accuracy, precision, F1-score) for individual task groups: T0_Original (identity transformation), T1–T5_Geometric (rotation and flip transformations), T6–T7_SignalQuality (denoise and blur), and T8_MultiScale (zoom).
Figure 6. ROC curves for top-performing classifiers at k = 1.00, demonstrating exceptional discrimination capabilities across all SAR target classes. The near-perfect curves validate the superior quality of SSL-extracted features and confirm the framework’s effectiveness for operational SAR ATR applications. (a) SVM (99.63% accuracy); per-class AUC ≥ 0.9999. (b) Random Forest (99.26% accuracy); AUC 0.9996–1.0000. (c) ResNet18 (97.40% accuracy); AUC 0.9994–1.0000. (d) MobileNet (98.70% accuracy); efficient and near-perfect AUC.
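The per-class curves behind Figure 6 follow the standard one-vs-rest construction for multi-class ROC analysis. A minimal scikit-learn sketch, using toy labels and scores in place of the actual SAMPLE classifier outputs (all data here is illustrative):

```python
# Hedged sketch: one-vs-rest per-class AUC, the construction used for
# multi-class ROC curves such as Figure 6. Toy labels and scores stand
# in for real SAMPLE classifier outputs.
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

rng = np.random.default_rng(0)
n_samples, n_classes = 200, 10
y_true = rng.integers(0, n_classes, size=n_samples)

# Toy scores that always rank the true class highest, so each
# one-vs-rest curve is (near-)perfect.
scores = rng.random((n_samples, n_classes))
scores[np.arange(n_samples), y_true] += 2.0

# Binarize labels and compute AUC separately for each class.
y_bin = label_binarize(y_true, classes=list(range(n_classes)))
per_class_auc = [roc_auc_score(y_bin[:, c], scores[:, c])
                 for c in range(n_classes)]
```

Averaging `per_class_auc` (macro-averaging) would give a single summary number comparable across classifiers.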
Figure 7. t-SNE visualization of the 2048-dimensional SSL features for the test set.
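A projection like the one in Figure 7 can be reproduced along these lines with scikit-learn; random vectors stand in for the actual SSL test-set features (a sketch, not the paper's plotting code):

```python
# Sketch: 2-D t-SNE projection of 2048-d feature vectors, as in Figure 7.
# Random vectors stand in for the SSL features of the test images.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
features = rng.standard_normal((200, 2048))  # stand-in for SSL features

# Perplexity must be smaller than the number of samples.
embedding = TSNE(n_components=2, perplexity=30,
                 random_state=0).fit_transform(features)
```

Coloring `embedding` by class label would then reveal whether the SSL features form the well-separated clusters the figure shows.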
Table 1. SAMPLE Dataset Distribution of Measured Images by Class and Elevation-Based Train/Test Split.
| Class | Type | Tracked/Wheeled | Training Images | Test Images | Total |
|-------|------|-----------------|-----------------|-------------|-------|
| 2S1 | Tank | Tracked | 116 | 58 | 174 |
| BMP2 | Tank | Tracked | 55 | 52 | 107 |
| BTR70 | Tank | Wheeled | 43 | 49 | 92 |
| M1 | Tank | Tracked | 78 | 51 | 129 |
| M2 | Tank | Tracked | 75 | 53 | 128 |
| M35 | Truck | Wheeled | 76 | 53 | 129 |
| M548 | Truck | Tracked | 75 | 53 | 128 |
| M60 | Tank | Tracked | 116 | 60 | 176 |
| T72 | Tank | Tracked | 56 | 52 | 108 |
| ZSU23 | Tank | Tracked | 116 | 58 | 174 |
| **Total** | | | 806 | 539 | 1345 |
Table 2. CNN Architecture for Pretext Learning.
| Layer | Output Shape | Parameters | Activation/Action |
|-------|--------------|------------|-------------------|
| Input | (64, 64, 1) | — | Grayscale SAR image |
| Conv2D-1 | (64, 64, 16) | 3 × 3 × 1 × 16 | ReLU |
| MaxPool-1 | (32, 32, 16) | 2 × 2 | Spatial reduction |
| Conv2D-2 | (32, 32, 32) | 3 × 3 × 16 × 32 | ReLU |
| MaxPool-2 | (16, 16, 32) | 2 × 2 | Spatial reduction |
| Conv2D-3 | (16, 16, 64) | 3 × 3 × 32 × 64 | ReLU |
| MaxPool-3 | (8, 8, 64) | 2 × 2 | Spatial reduction |
| Conv2D-4 | (8, 8, 128) | 3 × 3 × 64 × 128 | ReLU |
| MaxPool-4 | (4, 4, 128) | 2 × 2 | Spatial reduction |
| Dense-1 | (1000) | 2048 × 1000 | ReLU |
| Dense-2 | (500) | 1000 × 500 | ReLU |
| Dense-3 | (250) | 500 × 250 | ReLU |
| Output | (9) | 250 × 9 | Softmax |
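As a quick consistency check on Table 2, the layer shapes can be reproduced with simple bookkeeping; this sketch counts weights only (biases omitted) and is not the training code:

```python
# Sketch: shape/parameter bookkeeping for the pretext CNN of Table 2.
# Counts cover weights only; bias terms are omitted for simplicity.
conv_shapes = [(3, 3, 1, 16), (3, 3, 16, 32), (3, 3, 32, 64), (3, 3, 64, 128)]
dense_shapes = [(2048, 1000), (1000, 500), (500, 250), (250, 9)]

def n_weights(shape):
    """Product of the dimensions of one weight tensor."""
    total = 1
    for d in shape:
        total *= d
    return total

# Four 2x2 max-pools shrink the 64x64 input to 4x4; with 128 channels this
# flattens to the 2048-length vector that feeds Dense-1.
flattened = (64 // 2 ** 4) ** 2 * 128
total_weights = sum(n_weights(s) for s in conv_shapes + dense_shapes)
```

The flattened size (2048) is exactly the SSL feature dimension used by every downstream classifier in Table 3.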
Table 3. Downstream Classifier Hyperparameters.
| Model/Group | Key Hyperparameters |
|-------------|---------------------|
| All Models | Feature scaling: StandardScaler (z-score); Feature dimension: 2048 |
| **Traditional Machine Learning Models** | |
| SVM | Kernel: Linear; Probability output: Enabled; Random state: 42 |
| Random Forest | Estimators: 100; Random state: 42; Parallel jobs: All cores |
| Gradient Boosting | Estimators: 100; Learning rate: 0.1; Random state: 42 |
| XGBoost | Estimators: 100; Tree method: gpu_hist; Objective: multi:softprob; Random state: 42 |
| **Deep Learning Models** (All: LR = 0.001, Epochs = 30, Batch = 8, Adam, Val = 0.15, Patience = 5) | |
| ResNet18 | Input features: 2048 |
| CNN | 4 convolution layers |
| U-Net | Encoder depth: 4 levels |
| MobileNet v1 | Width multipliers: 1.0, 0.75, 0.5, 0.25 |
| EfficientNet | Variants: B0, B1, B2, B3 |
| GAN Classifier | Generator LR: 0.0005; Discriminator LR: 0.001; Adversarial weight: 0.5 |
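The traditional-ML settings in Table 3 map directly onto scikit-learn. A minimal sketch with toy data in place of the 2048-dimensional SSL features; the XGBoost GPU settings and the deep models are omitted here, and the toy labels are illustrative only:

```python
# Hedged sketch: downstream classifiers from Table 3, each wrapped with the
# StandardScaler applied to the 2048-d features. Toy data stands in for the
# real SSL features.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

models = {
    "SVM": SVC(kernel="linear", probability=True, random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100,
                                            random_state=42, n_jobs=-1),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=100,
                                                    learning_rate=0.1,
                                                    random_state=42),
}

rng = np.random.default_rng(42)
X = rng.standard_normal((40, 2048))   # toy stand-in for SSL features
y = rng.integers(0, 2, size=40)       # toy binary labels

# Scaling is fitted inside each pipeline, so test data is scaled with
# training statistics only.
fitted = {name: make_pipeline(StandardScaler(), clf).fit(X, y)
          for name, clf in models.items()}
```

Enabling `probability=True` on the SVM is what makes the ROC analysis of Figure 6 possible, at the cost of an internal cross-validation during fitting.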
Table 4. Number of Training Samples Selected per Class for Different k Values.
| k | 2S1 (0) | BMP2 (1) | BTR70 (2) | M1 (3) | M2 (4) | M35 (5) | M548 (6) | M60 (7) | T72 (8) | ZSU23 (9) | Total |
|---|---------|----------|-----------|--------|--------|---------|----------|---------|---------|-----------|-------|
| 0.05 | 6 | 3 | 3 | 4 | 4 | 4 | 4 | 6 | 3 | 6 | 43 |
| 0.10 | 12 | 6 | 5 | 8 | 8 | 8 | 8 | 12 | 6 | 12 | 85 |
| 0.15 | 18 | 9 | 7 | 12 | 12 | 12 | 12 | 18 | 9 | 18 | 127 |
| 0.20 | 24 | 11 | 9 | 16 | 15 | 16 | 15 | 24 | 12 | 24 | 166 |
| 0.25 | 29 | 14 | 11 | 20 | 19 | 19 | 19 | 29 | 14 | 29 | 203 |
| 0.30 | 35 | 17 | 13 | 24 | 23 | 23 | 23 | 35 | 17 | 35 | 245 |
| 0.35 | 41 | 20 | 16 | 28 | 27 | 27 | 27 | 41 | 20 | 41 | 288 |
| 0.40 | 47 | 22 | 18 | 32 | 30 | 31 | 30 | 47 | 23 | 47 | 327 |
| 0.45 | 53 | 25 | 20 | 36 | 34 | 35 | 34 | 53 | 26 | 53 | 369 |
| 0.50 | 58 | 28 | 22 | 39 | 38 | 38 | 38 | 58 | 28 | 58 | 405 |
| 0.55 | 64 | 31 | 24 | 43 | 42 | 42 | 42 | 64 | 31 | 64 | 447 |
| 0.60 | 70 | 34 | 26 | 47 | 46 | 46 | 46 | 70 | 34 | 70 | 489 |
| 0.65 | 76 | 36 | 28 | 51 | 49 | 50 | 49 | 76 | 37 | 76 | 528 |
| 0.70 | 82 | 39 | 31 | 55 | 53 | 54 | 53 | 82 | 40 | 82 | 571 |
| 0.75 | 88 | 42 | 33 | 59 | 57 | 58 | 57 | 88 | 43 | 88 | 613 |
| 0.80 | 93 | 44 | 35 | 63 | 60 | 61 | 60 | 93 | 45 | 93 | 647 |
| 0.85 | 99 | 47 | 37 | 67 | 64 | 65 | 64 | 99 | 48 | 99 | 689 |
| 0.90 | 105 | 50 | 39 | 71 | 68 | 69 | 68 | 105 | 51 | 105 | 731 |
| 0.95 | 111 | 53 | 41 | 75 | 72 | 73 | 72 | 111 | 54 | 111 | 773 |
| 1.00 | 116 | 55 | 43 | 78 | 75 | 76 | 75 | 116 | 56 | 116 | 806 |
| Test | 58 | 52 | 49 | 51 | 53 | 53 | 53 | 60 | 52 | 58 | 539 |
This table shows the number of training samples selected for each class at different k-values, where k represents the fraction of available training data used. The last row shows the test data distribution across all classes.
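The per-class counts in Table 4 appear to follow a ceiling rule applied to each class's training pool from Table 1 — an inference from the numbers, not a rule stated in the text (the k = 0.60 row is one higher on exact multiples, consistent with floating-point rounding in the original pipeline):

```python
# Hypothetical sketch of the subsampling rule behind Table 4: the counts
# match ceil(k * N_c) for each class's training pool N_c. This rule is
# inferred from the table, not stated in the paper.
import math

train_pool = {"2S1": 116, "BMP2": 55, "BTR70": 43, "M1": 78, "M2": 75,
              "M35": 76, "M548": 75, "M60": 116, "T72": 56, "ZSU23": 116}

def samples_per_class(k):
    """Number of training images drawn from each class at fraction k."""
    return {cls: math.ceil(k * n) for cls, n in train_pool.items()}

row_005 = samples_per_class(0.05)   # smallest split: 43 images in total
```

Taking the ceiling guarantees every class contributes at least one sample even at the smallest fraction, which keeps all ten classes represented at k = 0.05.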
Table 5. Task Ablation Study Findings.
| Rank | Task Combination | # Tasks | Avg Accuracy (%) |
|------|------------------|---------|------------------|
| 1 | T1–T5_Geometric + T6–T7_SignalQuality | 7 | 96.34 |
| 2 | T1–T5_Geometric | 5 | 96.22 |
| 3 | T0_Original + T1–T5_Geometric | 6 | 96.22 |
| 4 | T1–T5_Geometric + T8_MultiScale | 6 | 96.01 |
| 5 | T1–T5 + T6–T7 + T8 | 8 | 95.03 |
| 6 | T0 + T1–T5 + T8 | 7 | 94.58 |
| 7 | All_Tasks (Full Framework) | 9 | 94.18 |
| 8 | T0 + T1–T5 + T6–T7 | 8 | 92.56 |
| 9 | T6–T7 + T8 | 3 | 91.91 |
| 10 | T0_Original | 1 | 89.46 |
| 11 | T8_MultiScale | 1 | 88.71 |
| 12 | T0 + T8 | 2 | 87.78 |
| 13 | T0 + T6–T7 | 3 | 72.99 |
| 14 | T0 + T6–T7 + T8 | 4 | 60.32 |
| 15 | T6–T7_SignalQuality | 2 | 59.80 |
Table 6. Accuracy (%) for Top Performing Classifiers (Those with ≥ 94.50% at k = 1.00).
| k-Value | EfficientNet B1 | EfficientNet B3 | GAN | MobileNet v1 | MobileNet v1 0.5 | MobileNet v1 0.75 | Random Forest | ResNet18 | SVM | XGBoost |
|---------|-----------------|-----------------|-----|--------------|------------------|-------------------|---------------|----------|-----|---------|
| 0.05 | 20.59 | 15.96 | N/A | 33.77 | 23.38 | 31.54 | 46.01 | 24.86 | 51.95 | 36.73 |
| 0.10 | 27.64 | 26.72 | 35.99 | 34.32 | 37.66 | 29.31 | 52.32 | 47.68 | 51.39 | 32.10 |
| 0.15 | 39.52 | 29.87 | 37.48 | 42.12 | 38.03 | 38.40 | 54.55 | 40.45 | 53.80 | 44.53 |
| 0.20 | 38.40 | 42.86 | 36.18 | 47.50 | 43.60 | 43.41 | 53.62 | 41.19 | 59.55 | 50.83 |
| 0.25 | 46.20 | 46.57 | 50.09 | 52.88 | 53.80 | 51.76 | 61.22 | 55.84 | 64.56 | 50.46 |
| 0.30 | 56.22 | 46.38 | 51.95 | 59.74 | 65.49 | 44.34 | 60.48 | 51.21 | 64.19 | 52.13 |
| 0.35 | 51.02 | 48.79 | 56.03 | 56.59 | 52.69 | 53.62 | 61.41 | 53.06 | 63.64 | 59.55 |
| 0.40 | 60.11 | 62.71 | 61.04 | 60.85 | 61.97 | 64.94 | 70.69 | 64.75 | 65.86 | 63.82 |
| 0.45 | 62.71 | 57.70 | 65.86 | 68.46 | 60.11 | 66.79 | 70.87 | 56.59 | 73.10 | 57.33 |
| 0.50 | 66.60 | 60.48 | 68.09 | 63.45 | 66.42 | 66.23 | 71.61 | 64.38 | 76.62 | 69.39 |
| 0.55 | 58.81 | 64.75 | 69.39 | 70.13 | 69.57 | 68.83 | 78.48 | 67.53 | 77.18 | 73.65 |
| 0.60 | 70.32 | 66.05 | 65.49 | 75.70 | 73.28 | 73.84 | 80.71 | 68.46 | 81.63 | 72.73 |
| 0.65 | 72.91 | 70.69 | 72.54 | 80.33 | 78.11 | 80.71 | 84.42 | 73.10 | 82.56 | 75.88 |
| 0.70 | 76.62 | 72.73 | 75.51 | 75.32 | 77.92 | 76.25 | 83.12 | 73.47 | 82.93 | 73.28 |
| 0.75 | 82.93 | 74.77 | 79.59 | 81.63 | 82.93 | 83.30 | 87.94 | 85.53 | 88.68 | 87.94 |
| 0.80 | 78.11 | 80.33 | 79.41 | 80.71 | 82.19 | 82.93 | 88.13 | 80.71 | 88.13 | 87.57 |
| 0.85 | 84.04 | 84.23 | 86.64 | 90.35 | 84.97 | 89.80 | 90.35 | 90.17 | 93.32 | 88.31 |
| 0.90 | 92.95 | 91.28 | 92.02 | 92.76 | 90.17 | 91.84 | 93.88 | 92.76 | **95.36** | 89.98 |
| 0.95 | 90.91 | 94.06 | **95.55** | **96.10** | **96.29** | 93.14 | **97.22** | 93.69 | **98.89** | **96.10** |
| 1.00 | **95.18** | **95.92** | **96.85** | **98.70** | **95.73** | **97.77** | **99.26** | **97.40** | **99.63** | 93.88 |
Bold values denote classifier accuracies meeting or exceeding the 94.50% threshold used to define top-performing models.

Share and Cite

MDPI and ACS Style

Siam, M.A.; Noor, D.F.; Ndoye, M.; Khan, J.F. Advancing SAR Target Recognition Through Hierarchical Self-Supervised Learning with Multi-Task Pretext Training. Sensors 2026, 26, 122. https://doi.org/10.3390/s26010122
