Article

Pseudo-Sample Generation and Self-Supervised Framework for Infrared Dim and Small Target Detection

School of Electronic and Information Engineering, Changchun University of Science and Technology, Changchun 130022, China
*
Author to whom correspondence should be addressed.
Entropy 2025, 27(12), 1212; https://doi.org/10.3390/e27121212
Submission received: 2 November 2025 / Revised: 26 November 2025 / Accepted: 26 November 2025 / Published: 28 November 2025
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract

Infrared dim and small target detection is crucial for long-range sensing. However, its deep representation learning is severely constrained by the scarcity of accurately annotated real data, and related research remains underdeveloped. Existing data generation methods based on patch synthesis or geometric transformations fail to incorporate the physical degradation mechanisms of infrared imaging systems and reasonable environmental constraints, leading to significant discrepancies between synthetic data and real-world scenarios. To address this issue, this paper proposes a novel pseudo-sample generation paradigm based on physics-informed degradation modeling and high-order constraints. First, we construct an infrared image degradation model that decouples the degradation processes of targets and backgrounds at the signal level, achieving accurate modeling of real infrared imaging while ensuring the reliability of the degradation process through information fidelity optimization. Second, an online grid-based high-order constraint strategy is designed, which synergistically integrates global semantic, local structural, and grayscale constraints based on statistical distribution consistency to generate a high-fidelity infrared simulation dataset. Finally, we build a complete self-supervised detection framework incorporating classical neural networks, customized loss functions, and two-dimensional information evaluation metrics. Extensive experiments demonstrate that the synthetic data generated by our method significantly outperforms existing simulated datasets on authenticity metrics. It also effectively enhances the generalization performance of various detectors in real-world scenarios, achieving detection accuracy superior to baseline models trained on traditional simulated data.

1. Introduction

Infrared dim and small target detection serves as one of the core technologies in information processing fields such as infrared early warning, precision guidance, and remote monitoring [1,2,3,4]. However, infrared dim and small targets in natural scenes typically exhibit extremely low signal-to-noise ratios and information entropy, lack distinct texture information and shape features, and are susceptible to interference from complex backgrounds. These factors pose severe challenges to the detection task. Although deep learning methods have achieved remarkable success in visible light object detection, their application in infrared dim and small target detection is severely constrained by the scarcity of high-quality annotated data. Acquiring large-scale, diverse real infrared images along with pixel-level annotations is not only costly but also nearly infeasible in sensitive domains such as defense, constituting a primary bottleneck for the development of this field.
To overcome this data bottleneck, using synthetic data to train detection models has emerged as a promising alternative. Traditional methods based on simple patch pasting or geometric transformations [5,6] can expand data volume to some extent, but their core flaw lies in the failure to embed the physical mechanisms of infrared imaging systems. The pseudo-samples generated by these methods show significant differences from real physical processes in terms of degradation characteristics, radiation properties, and target–background interactions, leading to severe performance degradation when models trained on such data are generalized to real complex scenarios.
In recent years, the development of pseudo-sample generation algorithms for real infrared dim and small targets has remained largely stagnant. However, several existing heuristic data synthesis methods have provided important references and insights for this study. As shown in Figure 1, current pseudo-sample generation methods start from real infrared images and employ noise sampling and modeling techniques combined with hybrid enhancement strategies to produce pseudo-samples for network training. Although these approaches achieve certain effectiveness in expanding data volume, they fundamentally rely on statistical resampling and noise injection of existing real images. They lack in-depth modeling of information entropy and feature distributions, failing to deeply capture the intrinsic degradation mechanisms between targets and backgrounds in infrared imaging at a physical level. This limitation consequently restricts the generalization performance of detection models trained on such data.
In addition, several other representative methods have explored pseudo-sample generation from different perspectives. Sun et al. [7] proposed a generation method based on physical priors in the RDIAN network, which uses a Gaussian sphere model to simulate the radiation characteristics of targets and integrates them with real collected infrared background images while introducing sensor-specific noise. The ISD-DCGAN method proposed by Zhang et al. [6] explores the application of generative adversarial networks in infrared sequence data generation. This approach decouples the processes of background generation, target generation, and sequence construction through an improved deep convolutional generative adversarial network, attempting to ensure visual realism and sequence continuity of generated samples through adversarial training. However, its integration of infrared physical mechanisms remains relatively indirect, focusing more on data distribution matching. In the LASNet study, Chen et al. [8] further proposed the DISTG algorithm for pseudo-sample generation targeting dense objects. This method models the spatiotemporal distribution and thermodynamic interactions of target clusters through physical simulation, demonstrating its distinctive strength in simulating dense, moving small targets. Nevertheless, its complexity is relatively high, and the modeling of degradation details for targets and backgrounds in single-frame static images could be further refined.
In summary, although existing data synthesis methods have made certain progress, they generally exhibit deficiencies in the depth of physical degradation information modeling and the strength of constraints in the generation process. Specifically, most of these methods remain at the level of appearance enhancement or the introduction of partial physical parameters, failing to systematically construct a complete degradation model. They also lack the application of multi-level constraints that integrate global and local coordination to ensure the physical authenticity and distributional rationality of the generated samples.
To address the aforementioned challenges, this study focuses on the critical issue of high-fidelity infrared pseudo-sample generation, aiming to enhance the quality of synthetic data from both physical degradation and data information distribution perspectives. We contend that effective simulation of infrared dim and small targets must satisfy three requirements: first, establishing an accurate imaging degradation model that separately characterizes the radiation and noise properties of targets and backgrounds; second, adhering to the spatial information distribution patterns of targets in real scenarios, such as occurrence density and positional preferences; and third, ensuring the generated data can effectively drive the detection model to learn feature representations with generalization capability, rather than overfitting to simulation artifacts. Based on this, we propose an information-driven physical degradation framework for high-fidelity infrared dim and small target pseudo-sample generation and self-supervised detection, providing an important solution to address the issues of data scarcity and reliability verification in this field.
In summary, the contributions of this paper can be summarized as follows:
  • We propose a real physics-driven infrared image degradation model, termed TOA, which decouples the degradation processes of targets and backgrounds at the signal level, achieving accurate modeling of the real infrared imaging process.
  • We propose an online grid-based high-order constraint pseudo-sample generation method. This method constrains the position, quantity, and authenticity of dim and small targets in generated pseudo-sample images, bridging the information gap between pseudo-samples and real samples.
  • We construct a complete self-supervised learning detection framework. For the customized pseudo-labels, we designed a dedicated novel position-confidence loss function and a position deviation rate evaluation metric.

2. Related Work

2.1. Infrared Dim and Small Target Detection

The technology of infrared dim and small target detection has undergone a paradigm shift from traditional model-driven approaches to modern data and information-driven methods. Traditional approaches primarily relied on handcrafted features or prior models, such as filter-based methods [9], human visual system-based methods [10], and low-rank matrix recovery-based methods [11]. These methods achieve target detection by constructing complex target-background separation models. However, their performance heavily depends on the accuracy of the preset models, resulting in limited generalization capability in complex and variable real-world scenarios.
With breakthroughs in deep learning, data and information-driven detection methods have demonstrated significant advantages. Based on the evolution of network architectures, existing deep learning methods can be broadly categorized into two main streams: convolutional neural network (CNN)-based methods and Transformer-based methods.
CNN-based methods aim to address the conflict between clutter suppression and feature information retention. Existing work, such as ISTC [12], introduces a context modulation mechanism focusing on enhancing the correlation between target pixels to mitigate clutter effects, yet its background suppression capability remains incomplete. To improve on this, densely nested networks [13,14] combined with attention mechanisms attempt to preserve deep target semantics by deepening the network structure and strengthening cross-layer feature interactions, but they fail to fundamentally resolve the class imbalance between foreground targets and background clutter. To address this imbalance, RDIAN [7] utilizes multi-directional convolutional layers to extract target features and dynamically balances the feature responses of targets and backgrounds through progressive interactions within receptive fields, enhancing the model’s discriminative ability. Furthermore, networks based on the U-Net backbone, such as UIU-Net [15], UCDnet [16], and MSHNet [17], have become mainstream. Transformer-based methods (ISTT [18,19]), on the other hand, leverage self-attention mechanisms to model long-range dependencies in images, overcoming the limitations of the local receptive fields in CNNs.
Nevertheless, existing deep learning methods heavily rely on large amounts of accurately annotated data, and the scarcity of such data has become a major bottleneck for further performance improvement.

2.2. Sample Generation

Pseudo-sample generation represents the most critical step in applying self-supervised learning to infrared dim and small target detection [20,21,22]. The selection of positive target samples and negative background samples directly determines the authenticity of data information in pseudo-samples. Positive target samples can be obtained through two methods. The first is simulation-based generation, where each generated positive sample image must undergo strict constraints to maintain a feature distribution consistent with real positive sample images. Crucially, simulated positive target samples should cover different target sizes, shapes, and intensity characteristics to ensure the model learns truly useful target features. The alternative method involves first acquiring image samples containing small targets and then extracting real positive target samples from them. This study combines both approaches, generating a total of 2929 positive sample images for final sample generation.
The most direct method for obtaining negative background samples is field photography, which follows three principles [23]: (1) Multi-scenario diversity: Ensuring diversity of negative background images by covering different environments and scenario conditions, with random variations in location, weather, time, and season to obtain diverse negative background samples. (2) Distribution balance: Maintaining balanced selection of negative background images to avoid excessive bias toward any particular background type. A balanced negative background dataset helps the model better distinguish between targets and backgrounds. (3) Interference-free: Ensuring background images contain no sharp interfering elements or non-target objects, preventing the network from learning incorrect behaviors when encountering unlabeled interfering targets. This study collected 21,419 negative background images covering multiple scenarios including clouds, buildings, forests, ridges, pedestrians, and vehicles. The obtained negative background samples are highly diverse, essentially covering all common scenarios encountered in daily environments.

2.3. Image Degradation Model

No imaging system in the real world is perfect; all are affected by various image degradation mechanisms [24], which underscores the necessity of researching image enhancement and information processing technologies. Therefore, the target positive sample images and background negative sample images obtained in this study cannot be directly used for infrared small target pseudo-sample synthesis. They must undergo a degradation model to simulate the information distribution of real-world images.
Image degradation models are widely applied in tasks such as image super-resolution reconstruction, image dehazing, and deblurring [25,26,27]. Infrared dim and small target images in real environments suffer from complex degradation mechanisms. If internal camera noise degradation is neglected, the degradation of moving target images can be divided into two cases: static background and dynamic background. Detecting static dim small targets is relatively easier, with the most direct influencing factor being noise degradation. In dynamic backgrounds, degradations such as random Gaussian blur, motion blur, and random scaling, formed by changes in both targets and backgrounds, present greater detection challenges.
Currently, isotropic Gaussian blur and anisotropic Gaussian blur are two commonly used image degradation methods. Since the motion direction of infrared sensors during real-world image acquisition is randomly variable, we assume that the images are affected by random anisotropic Gaussian blur. The causes of motion blur in infrared dim small targets can be attributed to three factors: (1) Imaging system shake: during imaging, slight shaking or movement of the camera causes blur in all targets in the image. (2) Target motion: movement of the target during imaging leads to changes in its position during the exposure time, leaving motion trails in the image. (3) Long exposure time: under low-light conditions with long exposure times, any slight movement can cause significant blurring effects [28].
Extensive research shows that image problems caused by multi-scale and random imaging distances can be equivalently treated as target random scale degradation due to resolution [29,30]. In addition, noise degradation, random target rotation, and radiation intensity degradation should also be considered as important imaging degradation modes [31,32].

3. Methods

In this section, we first present the TOA degradation model in Section 3.1. Subsequently, an online grid-based high-order constrained pseudo-sample generation method is designed in Section 3.2, where the formulations and detailed parameter settings of three constraint conditions are provided. Finally, a self-supervised training framework is constructed, including pseudo-label specification, training strategy, and loss function in Section 3.3.

3.1. TOA Degradation Model

Infrared dim and small target images in real-world environments undergo complex physical degradation during formation, exhibiting significant signal-to-noise ratio reduction, edge blurring, and radiation distortion. To systematically simulate this process, this paper proposes the Target-Oriented Adaptation (TOA) degradation model, whose overall architecture is shown in Figure 2. The core concept of the TOA model is to decouple the overall degradation process into a background degradation path and a target degradation path, thereby separately characterizing the texture evolution of background clutter and the optical attenuation characteristics of dim small targets.
The TOA model constructs a degradation function library comprising operations such as anisotropic blur, multi-scale sampling, and radiation intensity modulation. On this basis, a Degradation Shuffle mechanism is proposed, which achieves flexible control and diversified generation of the degradation process by randomly combining different degradation factors through information entropy optimization methods, enhancing the authenticity and coverage of synthetic samples.
In natural imaging processes, blurring caused by camera motion typically occurs first, while degradation in target radiation intensity and scale further develops based on the blurred images. This is because adding optical degradation factors to blurred images will further reduce the saliency of targets. Scale degradation usually refers to simulating distance changes or target volume variations. Applying scale changes last better aligns with the characteristics of long-distance, low-radiation targets in the imaging process, and can reduce the loss of target details after blur and brightness processing, preserving important feature information to facilitate subsequent pseudo-sample generation and target detection tasks.
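The fixed ordering motivated above (blur first, radiation second, scale last) can be sketched as a minimal pipeline. All function bodies and parameter ranges below are illustrative stand-ins, not the paper's actual TOA implementation:

```python
import numpy as np

rng = np.random.default_rng(7)

def blur(img):
    # simplified stand-in for anisotropic Gaussian blur: a 3x3 mean filter
    pad = np.pad(img, 1, mode="edge")
    return sum(pad[i:i + img.shape[0], j:j + img.shape[1]]
               for i in range(3) for j in range(3)) / 9.0

def radiate(img):
    # linear radiation model B = I * g + o with illustrative gain/offset ranges
    g, o = rng.uniform(0.8, 1.2), rng.uniform(-10, 10)
    return np.clip(img * g + o, 0, 255)

def scale(img):
    # nearest-neighbour downsampling by a random factor from {2, 3, 4}
    lam = int(rng.choice([2, 3, 4]))
    return img[::lam, ::lam]

def toa(img):
    # apply the degradation factors in the fixed order motivated above
    for f in (blur, radiate, scale):
        img = f(img)
    return img
```

Applying scale last, as in this sketch, means the blur and radiation operators act on the full-resolution image, preserving target detail until the final resampling step.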
Next, we will detail each degradation factor.

3.1.1. Blur Degradation

During the infrared imaging process, influenced by the relative motion between the camera and the target, dim small targets and background information in real scenes often exhibit significant directional blur degradation characteristics. To simulate this physical process, this paper employs an anisotropic Gaussian blur model [33] to achieve directionally sensitive control of the blur effect.
A differentiated blur strategy is designed according to the degradation characteristics of background and target regions. Background regions typically contain rich textures and high-frequency clutter; therefore, larger random anisotropic Gaussian kernels with a wider standard deviation range are used to effectively smooth details and simulate background degradation in practical imaging. Target regions, especially dim small targets, are prone to loss due to excessive blur; hence, smaller Gaussian kernels with limited standard deviation are applied to maintain degradation rationality while avoiding loss of target saliency.
The mathematical formulation of the anisotropic Gaussian blur is as follows:
For the input image $I_{img}$, its blurred result $G_{img}$ is generated by a two-dimensional convolution operation:

$$G_{img} = I_{img} \ast G(k_w, k_h, \sigma_x, \sigma_y)$$

where $\ast$ denotes the convolution operation; $G(k_w, k_h, \sigma_x, \sigma_y)$ represents the anisotropic Gaussian kernel function, which is defined on a grid of size $k_w \times k_h$ with its center at $(0,0)$. For any coordinate $(x, y)$ on this grid, where $x \in [-\lfloor k_w/2 \rfloor, \lfloor k_w/2 \rfloor]$ and $y \in [-\lfloor k_h/2 \rfloor, \lfloor k_h/2 \rfloor]$, the value of the kernel function is defined as:

$$G(x, y, \sigma_x, \sigma_y) = \frac{1}{2\pi\sigma_x\sigma_y} \exp\left[-\frac{1}{2}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2}\right)\right]$$

where $k_w$ and $k_h$ denote the width and height of the Gaussian kernel, respectively; $\sigma_x$ and $\sigma_y$ represent the standard deviations in the two orthogonal directions, controlling the degree and anisotropy of the blur.
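The anisotropic kernel above can be sketched directly in NumPy; the kernel size and standard deviations in the usage below are illustrative choices, not values from the paper:

```python
import numpy as np

def aniso_gauss_kernel(kw, kh, sx, sy):
    # anisotropic Gaussian kernel on a kw x kh grid centred at (0, 0)
    x = np.arange(kw) - kw // 2
    y = np.arange(kh) - kh // 2
    X, Y = np.meshgrid(x, y)
    g = np.exp(-0.5 * (X**2 / sx**2 + Y**2 / sy**2)) / (2 * np.pi * sx * sy)
    return g / g.sum()   # normalise so blurring preserves mean intensity

def aniso_blur(img, kernel):
    # direct 2-D convolution of img with the symmetric Gaussian kernel
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="reflect")
    out = np.empty(img.shape, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = (padded[i:i + kh, j:j + kw] * kernel).sum()
    return out
```

With $\sigma_x > \sigma_y$ the kernel spreads energy further along the horizontal axis, producing the directionally sensitive blur described above.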

3.1.2. Radiation Intensity Degradation

In real infrared imaging scenarios, non-uniform radiation intensity degradation often occurs in both background and target regions due to variations in environmental illumination, manifesting as overall grayscale distortion or local contrast reduction [34]. Background areas are susceptible to influences such as low light or overexposure, leading to global shifts or compression in their grayscale distribution. In contrast, dim-small targets exhibit more pronounced brightness fluctuations and contrast attenuation caused by imaging distance, orientation, and environmental stray light. Such degradations are typically more prominent in the grayscale domain.
To accurately simulate this complex radiation distortion process, we introduce a linear radiation response model that jointly accounts for both brightness and contrast degradation. This model learns the statistical distribution characteristics of radiation intensity in real environments by randomly sampling a gain factor and an offset factor. The degradation function is formulated as:
$$B_{img} = I_{img} \times g_{factor} + o_{factor}$$

where $B_{img}$ denotes the degraded image; $I_{img}$ denotes the input image; $g_{factor} \in (g_{min}, g_{max})$ is a random gain factor simulating the contrast degradation process, and $o_{factor} \in (o_{min}, o_{max})$ is a random offset factor simulating the overall brightness degradation.
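A minimal sketch of this linear radiation response model follows; the gain and offset ranges are illustrative defaults, not the paper's tuned values:

```python
import numpy as np

def radiation_degrade(img, g_range=(0.7, 1.3), o_range=(-15.0, 15.0), rng=None):
    # B = I * g + o: a random gain simulates contrast degradation,
    # a random offset simulates an overall brightness shift
    rng = rng or np.random.default_rng()
    g = rng.uniform(*g_range)
    o = rng.uniform(*o_range)
    return np.clip(img * g + o, 0.0, 255.0), g, o
```

Clipping to the 8-bit grayscale range keeps degraded samples within valid pixel values after the gain/offset shift.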

3.1.3. Scale Degradation

In the generation of pseudo-samples for infrared dim and small targets, scale degradation is a key aspect for simulating variability in real imaging conditions. Scale degradation of the background primarily arises from differences in camera models, resolution configurations, and imaging distances. In contrast, scale degradation of targets is significantly influenced by both imaging distance and changes in the target’s own posture. Therefore, modeling target scale requires comprehensive consideration of both random size degradation and random posture degradation [35].
To address this, we construct a scale degradation function, expressed mathematically as:
$$M_{img} = R_\theta(S_{\lambda, w}(I_{img}))$$

where $S_{\lambda, w}(\cdot)$ denotes a stochastic resampling function. The scaling factor $\lambda$ is randomly selected from the set $\{2, 3, 4\}$, indicating that the width and height of the input image $I_{img}$ are rescaled to $\lambda$ times (upsampling) or $1/\lambda$ times (downsampling) of the original dimensions. The sampling method $w$ is randomly chosen from {nearest-neighbor, bilinear, bicubic} to simulate the reconstruction differences introduced by various interpolation algorithms. $R_\theta(\cdot)$ represents a random rotation transformation, with the rotation angle $\theta$ uniformly generated within the range of $(-60°, 60°)$, used to simulate the apparent target variations caused by relative pose changes. It is important to note that the scale degradation for background samples applies only $S_{\lambda, w}(\cdot)$, without incorporating the pose rotation transformation.
The selection of the λ range is based on considerations of the physical constraints of the imaging system. As mentioned in Section 3.2.2, the input images are standardized to a resolution of 512 × 512 . Choosing { 2 , 3 , 4 } ensures that the resolution of the generated samples remains within a reasonable interval of approximately 170 × 170 to 2048 × 2048 . This range not only covers mid-to-long-range scale variations but also avoids producing invalid samples with excessively low or high resolutions, thereby ensuring the physical plausibility of the simulated data.
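The composition $R_\theta(S_{\lambda,w}(\cdot))$ can be sketched as below. For self-containment this sketch uses only nearest-neighbour resampling and rotation; the bilinear/bicubic options and all parameter draws are simplified stand-ins for the paper's method:

```python
import numpy as np

def resample(img, lam, up=True):
    # nearest-neighbour stand-in for the stochastic resampling S_{lam, w}
    if up:
        return np.repeat(np.repeat(img, lam, axis=0), lam, axis=1)
    return img[::lam, ::lam]

def rotate_nn(img, theta_deg):
    # nearest-neighbour rotation about the image centre (inverse mapping)
    h, w = img.shape
    t = np.deg2rad(theta_deg)
    cy, cx = (h - 1) / 2, (w - 1) / 2
    ys, xs = np.mgrid[0:h, 0:w]
    sx = np.cos(t) * (xs - cx) + np.sin(t) * (ys - cy) + cx
    sy = -np.sin(t) * (xs - cx) + np.cos(t) * (ys - cy) + cy
    si, sj = np.rint(sy).astype(int), np.rint(sx).astype(int)
    valid = (si >= 0) & (si < h) & (sj >= 0) & (sj < w)
    out = np.zeros_like(img)
    out[valid] = img[si[valid], sj[valid]]
    return out

def scale_degrade(img, rng=None):
    # M = R_theta(S_{lam, w}(I)): random scale, then random pose rotation
    rng = rng or np.random.default_rng()
    lam = int(rng.choice([2, 3, 4]))
    theta = rng.uniform(-60, 60)
    return rotate_nn(resample(img, lam, up=bool(rng.integers(2))), theta)
```

For background samples, only `resample` would be applied, matching the note above that backgrounds skip the pose rotation.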

3.2. Pseudo-Sample Generation

As described in Section 2.2, directly using original background and target samples cannot effectively drive the network to learn the distribution of real image information. Moreover, images processed through degradation still require strict constraints during pseudo-sample synthesis to accurately simulate the actual imaging mechanism.
To address this, this paper proposes a grid-based high-order constrained pseudo-sample generation method. This approach collaboratively generates high-fidelity pseudo-sample data with authentic image information through global target constraints, local target constraints, and local grayscale-level constraints. The overall framework of the method is shown in Figure 3, with the algorithm pseudo-code provided in Algorithm 1. The method does not restrict the number of generated targets and allows flexible setting of the target quantity range to adapt to different task requirements.
Algorithm 1 Pseudo-code of Pseudo-Sample Generation
Require: $I_b$: background image; $I_t$: target images; $A$: anchor size for grid partitioning; $M_o$: maximum number of objects per image; $T_a$: maximum number of targets per anchor
Ensure: $I_g$: generated image; $I_m$: mask; $L$: $Label_{cc}$
 1: Initialize: $V(I)$: variance computation; $R_{pos}(U, w, h, w_s, h_s)$: function to generate a random position $(x, y)$; $\tau$: threshold interval; $L \leftarrow 0$; $U \leftarrow \emptyset$
 2: $I_{bd} \leftarrow \mathrm{TOA}(I_b)$
 3: Sample $N_o \sim U(1, M_o)$
 4: for $i = 1$ to $N_o$ do
 5:   $I_{td} \leftarrow \mathrm{TOA}(I_t)$
 6:   $(h_s, w_s) \leftarrow \mathrm{size}(I_{td})$; $(h, w) \leftarrow \mathrm{size}(I_{bd})$
 7:   $(x, y) \leftarrow R_{pos}(U, w, h, w_s, h_s)$
 8:   $U \leftarrow U \cup \{(x, y)\}$
 9:   Compute $\sigma_{before} \leftarrow V(I_{bd}[x : x + w_s, y : y + h_s])$
10:   Blend $I_{td}$ into $I_{bd}[x : x + w_s, y : y + h_s]$
11:   Compute $\sigma_{after} \leftarrow V(I_{bd}[x : x + w_s, y : y + h_s])$
12:   if $|\sigma_{before} - \sigma_{after}| \le \tau$ then
13:     $I_g \leftarrow I_{bd}$
14:     Update mask $I_m[x : x + w_s, y : y + h_s]$
15:     $(c_x, c_y) \leftarrow (x + w_s/2, y + h_s/2)$
16:     $(a_x, a_y) \leftarrow (\lfloor c_x / A \rfloor, \lfloor c_y / A \rfloor)$
17:     $\Delta x \leftarrow (c_x \bmod A)/A - 0.5$; $\Delta y \leftarrow (c_y \bmod A)/A - 0.5$
18:     Update $L[a_x, a_y, \mathrm{next}] \leftarrow (\Delta x, \Delta y, 1)$
19:   else
20:     Delete $(x, y)$
21:   end if
22: end for
23: Normalize $I_g$ and apply $T$ if defined
24: Return $I_g$, $I_m$, $L$

3.2.1. Global Target Constraint

Global constraints are primarily applied to both target quantity and spatial distribution. First, the total number of targets N added to the entire image is constrained to satisfy N [ 0 , N max ] , ensuring the reasonableness of target quantity and consistency across samples. Second, for an image size of H × W , target positions ( x i , y i ) (where i denotes the target index) are set according to the content distribution characteristics of the images in the dataset. For images with simple and uniform content, targets can be randomly added across the entire image, i.e., 0 x i W 1 , 0 y i H 1 . For complex scene images containing multiple elements such as sky, buildings, trees, and pedestrians, targets should be preferentially added to the upper region of the image, with a minority distributed in other areas.
This constraint can be formally expressed as:
$$\sum_{i=1}^{N} \mathbb{I}(y_i \le \varepsilon H) \ge \alpha N, \quad \sum_{i=1}^{N} \mathbb{I}(y_i > \varepsilon H) \le (1 - \alpha) N, \quad (\alpha > \varepsilon)$$
where I is the indicator function that equals 1 when the condition is satisfied and 0 otherwise; α is the proportion parameter, and ε controls the division ratio of the upper region.
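A position sampler satisfying this upper-region bias can be sketched as follows; the specific values of `alpha` and `eps` here are illustrative, not the paper's settings:

```python
import numpy as np

def sample_positions(n, H, W, alpha=0.8, eps=0.4, rng=None):
    # place about alpha*n targets in the upper eps*H band, the rest below it,
    # so the counts satisfy the indicator-function constraint above
    rng = rng or np.random.default_rng()
    n_up = int(round(alpha * n))
    ys = np.concatenate([
        rng.integers(0, max(int(eps * H), 1), size=n_up),      # upper region
        rng.integers(int(eps * H), H, size=n - n_up),          # remaining area
    ])
    xs = rng.integers(0, W, size=n)
    return list(zip(xs.tolist(), ys.tolist()))
```

For simple, uniform scenes the same function with `alpha = eps` degenerates to near-uniform placement over the whole image.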

3.2.2. Local Target Constraint

To enhance the local authenticity of generated samples, this paper further introduces grid-based local constraints. First, the image is uniformly scaled to a resolution of $512 \times 512$ and divided into $8 \times 8$ image blocks $B_{ij}$, each with a size of $64 \times 64$:

$$B_{ij} = I[64i : 64(i+1) - 1, \; 64j : 64(j+1) - 1], \quad i, j = 0, 1, 2, \ldots, 7$$
Subsequently, the number of targets $N_{i,j}$ in each image block $B_{ij}$ is constrained to $N_{i,j} \in [0, N_{max}]$. This local target constraint effectively avoids the issue of uneven target distribution in local regions and eliminates the occurrence of “pseudo-targets”.
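The grid partition above amounts to a simple slicing operation, sketched below for a square image:

```python
import numpy as np

def grid_blocks(img, g=8):
    # partition a square image (e.g. 512x512) into a g x g grid of blocks,
    # mirroring the B_ij definition above
    s = img.shape[0] // g
    return {(i, j): img[s * i:s * (i + 1), s * j:s * (j + 1)]
            for i in range(g) for j in range(g)}
```

Counting targets per block and rejecting placements once a block reaches its quota then enforces the local constraint.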

3.2.3. Local Grayscale Constraint

After the degradation processing described in Section 3.1, the visibility of some target samples is significantly reduced. Moreover, global and local positional constraints alone cannot guarantee the compatibility of radiation characteristics between targets and backgrounds. If a target is placed in an area with highly similar radiation features, it should be excluded. To address this, the grayscale values of image block $B_{ij}$ before and after target generation are defined as $I_{ijk}$ and $I'_{ijk}$ (where $k$ is the pixel index), and a tolerable local standard deviation change threshold $\Delta\sigma_{th}$ is set to evaluate the rationality of target placement:

$$\sigma_{pre}^{ij} = \sqrt{\frac{1}{n}\sum_{k=1}^{n}\left(I_{ijk} - \mu_{ij}\right)^2}, \quad \sigma_{post}^{ij} = \sqrt{\frac{1}{n}\sum_{k=1}^{n}\left(I'_{ijk} - \mu'_{ij}\right)^2}$$

where $n$ denotes the total number of pixels in the image patch $B_{ij}$; $\sigma_{pre}^{ij}$ and $\sigma_{post}^{ij}$ are the standard deviations within the block before and after adding the target, respectively; $\mu_{ij}$ and $\mu'_{ij}$ are the average grayscale values of the corresponding block. If the condition $\Delta\sigma_{ij} = |\sigma_{post}^{ij} - \sigma_{pre}^{ij}| \le \Delta\sigma_{th}$ is satisfied, the target placement at this location is considered radiometrically reasonable and consistent with real imaging characteristics.
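This acceptance test reduces to comparing block standard deviations before and after blending; the threshold value below is an illustrative placeholder for the paper's tuned setting:

```python
import numpy as np

def placement_ok(block_before, block_after, dsigma_th=5.0):
    # accept a target placement only when the local standard-deviation
    # shift stays within tolerance (dsigma_th is an illustrative value)
    return abs(block_after.std() - block_before.std()) <= dsigma_th
```

A placement that drastically changes the local grayscale statistics is rejected, matching the variance check in Algorithm 1 (step 12).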

3.3. Self-Supervised Learning Framework

This paper constructs a complete self-supervised learning framework consisting of three components: pseudo-sample generation, a neural network library, and an adaptive training strategy. The overall structure is shown in Figure 4. The framework takes the degraded samples and corresponding labels generated earlier as input, optimizes the detection model in an end-to-end manner, and finally validates its generalization capability on real-world data.

3.3.1. Pseudo-Labels

Traditional grid labels rigidly assign targets to specific grid cells. This discrete assignment method tends to introduce annotation noise when targets are near grid boundaries or partially occluded, leading to unstable training and limited localization accuracy.
Unlike the widely used fixed grid labels, the pseudo-labels generated in this study are target position-confidence array labels of size B a t c h × 8 × 8 × 3 × 3 . Through soft assignment and continuous supervision of target positions, they more finely reflect the spatial distribution of targets. This approach is particularly beneficial for improving the detection performance of small and occluded targets.
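The soft assignment of a target centre to a grid cell plus a continuous offset (as in steps 15–17 of Algorithm 1) can be sketched as below; the anchor size default is taken from the 64-pixel grid of Section 3.2.2:

```python
def center_to_label(cx, cy, A=64):
    # map a target centre to its grid cell (ax, ay) and a normalised,
    # continuous offset (dx, dy) in [-0.5, 0.5) within that cell
    ax, ay = int(cx // A), int(cy // A)
    dx = (cx % A) / A - 0.5
    dy = (cy % A) / A - 0.5
    return ax, ay, dx, dy
```

Because the offsets vary continuously across cell boundaries, a target sitting near a grid edge receives a smooth label rather than an abrupt cell reassignment, which is what mitigates the boundary noise described above.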

3.3.2. Training Strategy

The training process adopts an end-to-end supervised approach. Using the pseudo-samples and their corresponding pseudo-labels generated in Section 3.2 as training data, parameter optimization is achieved by minimizing the difference between the network output and the pseudo-labels. The loss function employs the position-confidence loss function L c c defined in Section 3.3.3.

3.3.3. Loss Function

To compute the loss with pseudo-labels, we design a position-confidence loss function ($L_{cc}$). The $L_{cc}$ introduces a weighted combination of position loss ($L_{cd}$) and confidence loss ($L_{cf}$), which ensures accurate judgment of target existence while enhancing the fine-grained prediction capability of target locations. The kernel of both $L_{cf}$ and $L_{cd}$ uses the $L_2$-norm [36].
The output of the neural network is a five-dimensional array $P$ of size $Batch \times 8 \times 8 \times 3 \times 3$. The last dimensions of $P$ and of the pseudo-label five-dimensional array $T$ contain the target’s position offset and confidence information, respectively. Specifically, $P[\ldots, 0{:}2]$ and $T[\ldots, 0{:}2]$ represent the predicted and labeled two-dimensional position coordinates $(x, y)$, while $P[\ldots, 2]$ and $T[\ldots, 2]$ indicate the predicted and labeled target confidence.
Therefore, L_cd can be expressed as:
L_{cd} = \sum_{i,j,k,l} \left\| M_{coord} \cdot P_{pos}[i,j,k,l] - M_{coord} \cdot T_{pos}[i,j,k,l] \right\|_2^2
where P_pos[i,j,k,l] denotes the two-dimensional coordinates (x, y) predicted by the network, corresponding to the original array P[i,j,k,l,0:2]; T_pos[i,j,k,l] represents the two-dimensional coordinates of the pseudo-label at the corresponding position, i.e., the original array T[i,j,k,l,0:2]; and M_coord is a mask (also a five-dimensional array) generated from the pseudo-label confidence, ensuring that only positions with non-zero confidence participate in the loss calculation.
The L_cf can be expressed as:
L_{cf} = \sum_{i,j,k,l} \left( P_{conf}[i,j,k,l] - T_{conf}[i,j,k,l] \right)^2
where P_conf[i,j,k,l] denotes the target confidence predicted by the network, i.e., the original array P[i,j,k,l,2]; and T_conf[i,j,k,l] represents the target confidence of the pseudo-label, i.e., the original array T[i,j,k,l,2].
Finally, the L_cc loss function is defined as:
L_{cc} = \lambda_1 \cdot L_{cd} + \lambda_2 \cdot L_{cf}
where λ1 and λ2 are balance factors for the position and confidence losses, respectively. When dealing with datasets containing a larger number of target categories or greater scale variations, the training process can be optimized by adjusting λ1 and λ2.
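A compact NumPy stand-in for the loss above may clarify the masking and weighting; the actual implementation would use PyTorch tensors, and `lcc_loss` is a hypothetical name. The squared-L2 terms follow the definitions of L_cd and L_cf, with the mask playing the role of M_coord.

```python
import numpy as np

def lcc_loss(P, T, lam1=0.2, lam2=0.8):
    """Position-confidence loss L_cc = lam1 * L_cd + lam2 * L_cf (sketch).
    P, T: arrays of shape (Batch, 8, 8, 3, 3); last dim = (x, y, confidence)."""
    # M_coord: only positions with non-zero label confidence contribute to L_cd
    mask = (T[..., 2:3] > 0).astype(P.dtype)
    l_cd = np.sum((mask * P[..., 0:2] - mask * T[..., 0:2]) ** 2)  # position term
    l_cf = np.sum((P[..., 2] - T[..., 2]) ** 2)                    # confidence term
    return lam1 * l_cd + lam2 * l_cf
```

Broadcasting the (…, 1)-shaped mask over the two coordinate channels keeps background cells from contributing position gradients, while the confidence term is computed densely over all cells.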

4. Experiments

4.1. Datasets Settings

A. For network training. Background samples are shown in Figure 5e. We selected 6000 images from the public FLIR [37], TNO [38], and MSRS [39] datasets as background samples. Additionally, we constructed an infrared image acquisition platform to collect 2659 custom background samples as supplementary data. The background sample set was then expanded to 21,419 images through data augmentation. Target samples are shown in Figure 5c. We first acquired three types of infrared drone small-target images. Then, we designed a small target morphology detection algorithm to obtain 1500 real target samples with a size of 30 × 30 pixels. Furthermore, we generated 1429 simulated target samples with random morphologies through simulation, bringing the total number of target sample images to 2929.
B. For network testing. We used public real datasets NUAA-SIRST (ACM) [12] and IRDST-real (RDIAN) [7] for evaluation. Since the labels of these two datasets lack the confidence dimension, we extended them with a confidence dimension to match the output format of our model.

4.2. Network Training Details

All software components were implemented with the open-source PyTorch 2.4.0 framework. Training, testing, model tuning, and subsequent program modularization were performed on a computer equipped with an NVIDIA GeForce RTX 4090 GPU. In the loss function L_cc, the balance factors for the position and confidence losses were set to λ1 = 0.2 and λ2 = 0.8, respectively. During training, the learning rate was set to 0.01 and optimization used SGD with a batch size of 32. The network achieved stable convergence after 200 epochs.

4.3. Evaluation Metrics

To comprehensively evaluate the performance of the proposed method, a quantitative evaluation system is established from two dimensions: pseudo-sample generation quality and target detection performance.

4.3.1. Pseudo-Sample Quality Metrics

To assess the authenticity of the pseudo-samples, the Signal-to-Clutter Ratio (SCR) is adopted to evaluate the difficulty of the target detection task. The SCR is defined as:
SCR = \frac{\left| \mu_t - \mu_b \right|}{\sigma_b}
where μ_t represents the mean grayscale intensity of the target region, while μ_b and σ_b denote the mean and standard deviation of the background region, respectively.
Information Entropy (IE) is used to describe the texture complexity and information richness of the image. The IE is defined as:
IE = -\sum_{i=0}^{255} p_i \cdot \log_2 p_i
where p_i indicates the probability of the corresponding grayscale intensity.
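Both quality metrics can be computed directly from a grayscale image; a minimal NumPy sketch follows. The absolute value in SCR and the boolean `target_mask` input are our assumptions about how the regions are delimited, not details taken from the paper.

```python
import numpy as np

def scr(img, target_mask):
    """Signal-to-Clutter Ratio: |mu_t - mu_b| / sigma_b over a grayscale image.
    `target_mask` is a boolean array marking target pixels."""
    t = img[target_mask]
    b = img[~target_mask]
    return abs(t.mean() - b.mean()) / b.std()

def information_entropy(img):
    """Shannon entropy (bits) of the 8-bit grayscale histogram."""
    hist = np.bincount(img.ravel().astype(np.uint8), minlength=256)
    p = hist / hist.sum()
    p = p[p > 0]                      # 0 * log(0) is taken as 0
    return float(-np.sum(p * np.log2(p)))
```

A flat image yields IE = 0, while a richly textured background drives IE toward 8 bits; a low SCR marks a harder detection task.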

4.3.2. Detection Metrics

To evaluate the target localization accuracy of the proposed method, we design a Position Deviation (P_od) metric, which measures the average Euclidean distance between the predicted target positions and the ground-truth positions, directly reflecting the localization precision. The P_od is formulated as:
P_{od} = \frac{1}{N_{label}} \sum_{i=1}^{N_{label}} \left\| (x, y)_{output,i} - (x, y)_{label,i} \right\|_2
where N_label denotes the number of real targets in the pseudo-labels; (x, y)_{output,i} and (x, y)_{label,i} represent the predicted coordinates and the pseudo-label ground-truth coordinates of the i-th target, respectively. This metric measures the average pixel distance, with a value range of [0, +∞) and units of pixels. A smaller value indicates higher localization accuracy.
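Assuming predictions have already been matched one-to-one with ground-truth targets (the matching rule is not specified in the paper), P_od reduces to a mean of Euclidean norms; a minimal sketch:

```python
import numpy as np

def position_deviation(pred_xy, label_xy):
    """Average Euclidean distance (pixels) between predicted and ground-truth
    target centers; row i of each array is the matched pair for target i."""
    pred_xy = np.asarray(pred_xy, dtype=float)
    label_xy = np.asarray(label_xy, dtype=float)
    return float(np.linalg.norm(pred_xy - label_xy, axis=1).mean())
```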
Furthermore, we employ the F1-measure (F1), precision rate (Prec), false alarm rate (F_a), and Intersection over Union (IoU) [40] metrics to comprehensively evaluate the performance of the proposed method. These metrics (except F_a) are proportional values with a range of [0, 1] and are expressed in percentages (%). Larger values indicate better performance.
The F1 represents the harmonic mean of precision and recall, providing a comprehensive assessment of the detection performance. It is calculated as:
F_1 = \frac{2 \times Prec \times Rec}{Prec + Rec}
where Prec and Rec denote the precision rate and recall rate, respectively.
The precision rate (Prec) indicates the proportion of true positive targets among all detections predicted by the model as targets. It is calculated as:
Prec = \frac{TP}{TP + FP}
where TP represents the number of true positive targets correctly detected by the model, and FP denotes the number of false positive targets incorrectly detected by the model.
The false alarm rate (F_a) quantifies the number of false targets per unit area or unit quantity. In this paper, it is defined as the number of false alarms per million pixels:
F_a = \frac{FP}{N_{total}} \times 10^6
where N_total represents the normalized total number of reference pixels. The value range of F_a is [0, +∞), reported per 10^6 pixels. A smaller value indicates better performance.
The Intersection over Union (IoU) measures the overlap between the predicted target bounding box and the corresponding ground-truth bounding box:
IoU = \frac{A_P \cap A_T}{A_P \cup A_T}
where A_P represents the area of the predicted box, and A_T represents the area of the ground-truth box.
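The counting metrics can be sketched from TP/FP/FN counts, and IoU from box coordinates; `detection_metrics` and the (x1, y1, x2, y2) box convention are illustrative assumptions rather than the paper's implementation.

```python
def detection_metrics(tp, fp, fn, n_pixels):
    """Prec, Rec, F1 (all in %) and false alarms per 1e6 pixels."""
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * prec * rec / (prec + rec)
    fa = fp / n_pixels * 1e6
    return 100 * prec, 100 * rec, 100 * f1, fa

def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
             + (box_b[2] - box_b[0]) * (box_b[3] - box_b[1]) - inter)
    return inter / union
```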

4.4. Comparison

To evaluate the training foundation provided by pseudo-samples for self-supervised learning, this study conducts comparative experiments with datasets generated by five classical pseudo-sample synthesis methods. The compared datasets include: the NUDT-SIRST dataset proposed by DNA-Net [13], the NUST-SIRST dataset proposed by MDvsFA [41], the IRDST-simulation dataset proposed by RDIAN [7], the IRSTD-1K dataset proposed by ISNet [42], and the SIRST-5K dataset proposed by AFFNet [20].

4.4.1. Pseudo-Sample Quality Comparison

This section presents a quantitative comparison between the generated pseudo-samples and the five aforementioned simulated datasets using the SCR and IE metrics. The results are shown in Figure 6a,b. It can be observed that the pseudo-samples proposed in this paper outperform existing simulated datasets in terms of image complexity and realism.

4.4.2. Comparison of Detection Performance

A. Quantitative comparison. To comprehensively evaluate the effectiveness and generalization ability of the pseudo-samples generated in this paper, we conduct cross-training experiments using five baseline detection networks (DNA-Net, MDvsFA, RDIAN, ISNet, SIRST-5K) with six simulated datasets (including the pseudo-sample dataset generated in this paper and five baseline simulated datasets). The evaluation is performed on two real infrared dim and small target datasets (NUAA-SIRST and IRDST-real). The quantitative results are presented in Table 1 and Table 2.
As shown in Table 1, the RDIAN network trained with the pseudo-samples from this paper achieves four optimal performance metrics on the real NUAA-SIRST dataset, significantly outperforming results trained with other simulated data. It is worth noting that when trained using the NUST-SIRST simulated dataset, which has high homology with the real test set, RDIAN only performs well on some metrics. This indicates that network performance is still affected by the distribution similarity between the training and test sets, reflecting the limitations of traditional simulation data in cross-scene generalization.
The results in Table 2 further demonstrate that models trained with the proposed method maintain stable advantages on the IRDST-real dataset, particularly excelling in target localization accuracy. The RDIAN baseline model achieves two optimal values and one sub-optimal value. In contrast, although the model trained with the homologous IRDST-simulation data achieves some relatively good results when testing on IRDST-real, its performance significantly depends on the scene homology between the training and test sets. This again confirms the limitations of such methods in cross-scene generalization capability.
Figure 7 shows the ROC curves of five networks trained on six types of simulated data. It can be observed that regardless of the base network architecture, training with the pseudo-samples generated in this paper enables the models to converge faster to better performance regions, and the area under the curve (AUC) is generally higher than results trained with other simulated data. This indicates that the proposed pseudo-samples better approximate real infrared imaging characteristics in terms of feature distribution, noise structure, and target-background interaction relationships, thereby supporting stable detection performance in unknown real-world scenarios.
B. Qualitative comparison. To validate the effectiveness of the proposed method, we randomly selected 10 representative single-target and multi-target infrared images from the NUAA-SIRST and IRDST-real datasets for testing. The experimental results are shown in Figure 8 and Figure 9. Since the maximum number of targets in the IRDST-real dataset is two, the multi-target test only includes dual-target scenarios. To further analyze the detection accuracy of different methods, we simultaneously visualized the centroid localization results of each method.
Taking the single-target scenario as an example (Figure 8a), the centroid localization deviation of the proposed method is only 4 pixels, while the deviations of other methods are 8, 9, 15, 26, and 40 pixels, respectively. In the multi-target scenario (Figure 8e), the deviation of the proposed method is 14 pixels, while the deviations of other methods are 28, 32, 44, 63, and 105 pixels, respectively. The following conclusions can be drawn: (1) The adaptability of detection models to single-target and multi-target tasks differs, with single-target detection being significantly easier; (2) The catastrophic forgetting problem still exists in infrared dim and small target detection tasks and has a significant impact on model performance. The superior performance of the proposed method mainly benefits from the TOA simulation mechanism, which covers diverse imaging conditions in real environments, generates infrared samples with varying morphologies, sizes, visibility levels, and background complexities, and introduces high-order degradation constraints during pseudo-sample generation, thereby effectively enhancing the generalization capability of the model.
Figure 9 presents the qualitative comparison results on the IRDST-real dataset. Combined with the quantitative data in Table 3, it can be observed that the proposed method achieves the lowest total positional deviation among all models, with a value of 58 pixels. In comparison, the total deviations of models trained on NUST-SIRST, NUDT-SIRST, IRSTD-1K, SIRST-5K, and IRDST-simulation are 844, 153, 216, 76, and 56 pixels, respectively. This result further verifies that the proposed method exhibits superior localization accuracy and stability in real-world scenarios.

4.5. Ablation Study

This section conducts detailed ablation studies on the TOA degradation model, the high-order constraint algorithm, and the loss balance factors to validate the effectiveness of the proposed pseudo-sample generation method and loss design. All experiments are performed under identical training settings to ensure a fair comparison.

4.5.1. Ablation Study on TOA

This subsection examines the impact of pseudo-samples generated under different degradation configurations on the detection model: no degradation (Scheme A), then sequentially adding blur degradation D_blur (Scheme B), radiation intensity degradation D_bc (Scheme C), and scale degradation D_scale (Scheme D). The results are shown in Table 4 and Figure 10. To augment the samples, random cropping was applied in each scheme.
As shown in Table 4, the detection model trained on pseudo-samples without any degradation experiences a sharp decline in performance on the real dataset. In contrast, consistent performance improvements are observed as blur degradation D_blur, radiation intensity degradation D_bc, and scale degradation D_scale are incrementally incorporated. This occurs because models trained on data generated without degradation fail to simulate the degradation mechanisms present in real images, leading to poor generalization. It is worth noting that in tests on IRDST-real, which contains images with extremely small target sizes, the model trained with the addition of D_scale shows a significant performance boost.
Next, we perform detailed ablation experiments for each type of degradation. The default configuration includes all three degradation types, with only one degradation type’s configuration altered at a time.
A. Blur Degradation (D_blur). To validate the impact of blur degradation modeling on detection performance, we compare traditional motion blur degradation (Scheme A) with the proposed anisotropic Gaussian blur degradation (Scheme B). Using the real NUAA-SIRST and IRDST-real datasets described in Section 4.1 for testing, we evaluate the effectiveness of the different blur models within the self-supervised learning framework. The results are shown in Table 5.
Table 5 demonstrates that Scheme B (anisotropic Gaussian blur) significantly outperforms Scheme A (traditional motion blur) across all evaluation metrics. Specifically, on the NUAA-SIRST dataset, the P_od of Scheme B increases from 0.697 to 0.736, the Prec improves from 98.013 to 98.841, and the F_a decreases from 2.994 to 2.614. A consistent trend is observed on the IRDST-real dataset, where Scheme B maintains a high detection rate while achieving notably better localization deviation and false alarm rates than Scheme A. These results demonstrate that the anisotropic Gaussian blur degradation adopted in this paper more accurately simulates the direction-sensitive blur caused by target-sensor relative motion and atmospheric turbulence in real imaging.
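For intuition, an anisotropic Gaussian blur kernel of the kind discussed here can be built by rotating an elliptical Gaussian. This sketch assumes a (σx, σy, θ) parameterization, which the paper does not state explicitly; the kernel would then be convolved with the image (e.g., via `scipy.ndimage.convolve`).

```python
import numpy as np

def anisotropic_gaussian_kernel(size=9, sigma_x=2.0, sigma_y=0.5, theta=0.0):
    """Rotated elliptical Gaussian kernel: sigma_x / sigma_y set the blur
    strength along / across the direction theta (radians); normalized to sum 1."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    xr = xx * np.cos(theta) + yy * np.sin(theta)   # rotate coordinates
    yr = -xx * np.sin(theta) + yy * np.cos(theta)
    k = np.exp(-(xr ** 2 / (2 * sigma_x ** 2) + yr ** 2 / (2 * sigma_y ** 2)))
    return k / k.sum()
```

Unlike a motion blur line kernel, this smooth elliptical profile degrades the target gradually in one direction, which is closer to the combined effect of relative motion and turbulence.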
B. Radiation Intensity Degradation (D_bc). We retrained three detection models using the pseudo-samples generated in this paper for comparison: radiation intensity degradation applied only to targets (Scheme A), only to backgrounds (Scheme B), and to both targets and backgrounds (Scheme C). The radiation intensity degradation factor c_factor in this section is set according to Equation (4). The networks are again tested on the two real datasets to validate the impact of radiation intensity degradation on self-supervised learning. The results are shown in Table 6.
As shown in Table 6, the complete radiation intensity degradation (Scheme C) achieves three optimal values on both real test datasets, verifying that both infrared backgrounds and targets undergo radiation intensity degradation in the real world. The proposed complete radiation intensity degradation effectively enhances target detection performance under random brightness distributions. The schemes with degradation applied only to targets (Scheme A) or only to backgrounds (Scheme B) each achieve three sub-optimal values on the two test sets. This indicates that the NUAA-SIRST dataset exhibits relatively severe background degradation, while target information is more prominent compared to the IRDST-real dataset. Conversely, the IRDST-real dataset suffers from significant target degradation, while background information is clearer compared to the NUAA-SIRST dataset.
C. Scale Degradation (D_scale). In this section, we retrain three detection models for comparison: scale degradation applied only to the background (Scheme A), only to the target (Scheme B), and to both target and background (Scheme C). The sampling factor and rotation factor are set according to Equation (4). We validate the generalization ability of models trained on simulated data generated with the scale degradation mechanism on real data. The results are shown in Table 7.
As shown in Table 7, the complete scale degradation (Scheme C) achieves a total of six optimal values across the two real datasets. The Prec metric improves by 2.23 and 0.07 points over Scheme A, and by 1.69 and 1.10 points over Scheme B, on the two real datasets, respectively. This verifies the importance of scale degradation in enhancing dim and small target detection performance. In particular, on IRDST-real, scale degradation significantly alleviates the randomness in target sample scale and pose.

4.5.2. Ablation Study on Pseudo-Sample Generation

In this section, we conduct a stepwise ablation on the three constraint conditions for pseudo-sample generation to validate whether the hierarchical constraint strategy (C_g, C_l, and C_gl) has a significant impact on the trained detection models. We train three detection models: one with only the global target constraint (C_g), one with the global plus local target constraints (C_g + C_l), and one with the complete set of global target, local target, and local grayscale constraints (C_g + C_l + C_gl). The three schemes are tested on the two real datasets, and the experimental results are shown in Table 8 and Figure 11.
As shown in Table 8, the complete constraint scheme C_g + C_l + C_gl achieves a total of nine optimal values across the three evaluation metrics on the two real test datasets, while the C_g + C_l scheme achieves seven sub-optimal values. Furthermore, compared with the image degradation ablations, the performance gaps among the pseudo-sample generation ablation schemes narrow gradually. This verifies that the proposed pseudo-sample generation constraints effectively complement the detection model on top of image degradation.
As shown in Figure 11, to observe more intuitively whether the number, position, and global layout of pseudo-sample targets generated by the three constraint schemes approach the real data distribution, we visualize the results of the three schemes, marking unreasonable targets produced by the incomplete generation schemes with red closed curves. Adding the local target and local grayscale constraints yields generated targets that are more reasonable in local density and more realistic in global layout.

4.5.3. Ablation Study on λ1 and λ2

To validate the impact of the position loss weight λ1 and the confidence loss weight λ2 on model performance, and to explore the rationality of the optimal weight combination (0.2, 0.8), this section designs an ablation experiment on the loss balance factors. The results are shown in Table 9.
Under the constraint λ1 + λ2 = 1, we evaluate nine combinations (λ1, λ2) = {(0.1, 0.9), (0.2, 0.8), (0.3, 0.7), (0.4, 0.6), (0.5, 0.5), (0.6, 0.4), (0.7, 0.3), (0.8, 0.2), (0.9, 0.1)}. All other hyperparameters (learning rate, batch size, network architecture, etc.) and training settings (dataset, number of iterations, etc.) are kept consistent.
As can be seen from Table 9, the configuration of the loss balance factors λ1 and λ2 has a significant non-linear impact on model performance. The optimal combination (λ1 = 0.2, λ2 = 0.8) performs best, with the model beginning to converge after only 23 epochs, 26% faster than the sub-optimal combination (0.3, 0.7). Further analysis reveals that when the ratio λ2/λ1 remains within 4.0 ± 0.5, model performance shows a stable plateau; outside this range, performance drops sharply. This validates the physical rationality of the confidence-dominated strategy, indicating that target existence judgment is the foundation for precise localization.

5. Discussion

5.1. Limitations Analysis

The proposed TOA degradation model effectively simulates the “dim, weak, and small” physical characteristics of infrared dim-small targets by integrating blur, radiation intensity, and scale degradations. However, this study has two main limitations. First, the designed degradation pipeline does not yet fully replicate all disturbance factors present in real-world environments, particularly in modeling complex background noise, which somewhat restricts the comprehensiveness of the synthetic data. Second, the degradation factors in our method are set globally based on the entire dataset, lacking adaptive adjustment according to the semantic content of individual images. This may lead to the loss of critical information in some samples under strong degradation.
To address these issues, future work will focus on the following directions: First, we will explore and integrate more sophisticated noise models and physical radiation transfer models to enhance the diversity of degradations. Second, we will investigate an adaptive degradation mechanism based on learnable information channels [43], enabling content-aware modulation of degradation factors to better preserve image fidelity. Third, we plan to incorporate continual learning paradigms [44] to improve feature generalization and mitigate catastrophic forgetting in cross-database scenarios.

5.2. Feasibility Analysis

The generalization capability of deep learning models in real-world scenarios is highly dependent on the distributional consistency between the training data and real-world data. However, acquiring large-scale, high-quality, and accurately annotated real datasets is often time-consuming, labor-intensive, and costly, which has become a critical bottleneck restricting the development of this field [45]. Therefore, generating high-fidelity pseudo-sample data to supplement or replace real data presents a significant solution. However, the core of its feasibility lies in whether the generated pseudo-samples can enable the model to acquire generalization capabilities for real-world scenarios.
The feasibility of the method proposed in this paper is not based on theoretical assumptions but is grounded on empirical evidence from a series of experiments. Specifically, models trained solely on our synthetic data still exhibit excellent and stable detection performance on real-world datasets that were not involved in training, significantly outperforming various baseline simulated data. This demonstrates that the data generated by our method can effectively support the model in learning feature representations that are transferable to the real world. Furthermore, systematic comparisons with existing methods indicate that the physical degradation model and high-order constraints can continuously reduce the distributional discrepancy between simulated and real data. Consequently, through physically-guided generation design and empirical performance validation, this work collectively substantiates the feasibility and reliability of the proposed technical pathway in addressing the challenge of real data scarcity.

6. Conclusions

This paper proposes a novel pseudo-sample generation method based on physics-informed degradation modeling and high-order constraints, overcoming the limitation of existing fully supervised learning approaches that heavily rely on difficult-to-annotate real infrared datasets for training. First, we developed the TOA image degradation model, which preserves a reasonable information entropy distribution to create a degraded image database for pseudo-sample generation. Second, we introduced an online grid-based high-order constrained pseudo-sample generation method that integrates global target, local target, and local grayscale constraints. Through feature distribution consistency optimization, this approach ensures more reasonable target positioning, layout, and realism in the generated images. Additionally, we created target position-confidence labels for network training, guaranteeing information integrity in feature representation. Finally, we established a complete network training framework that achieves stable convergence. Experimental results demonstrate that our self-supervised learning framework not only effectively suppresses complex degradation effects in real-world imaging and improves detection efficiency, but also addresses numerous challenges arising from data scarcity in deep learning for infrared dim and small target detection.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/e27121212/s1.

Author Contributions

Conceptualization, J.G.; Methodology, W.Z.; Formal analysis, D.Z. and D.H.; Investigation, Y.J.; Resources, J.G.; Data curation, D.H., Y.C. and Y.J.; Writing—original draft preparation, J.G.; Writing—review and editing, W.Z.; Visualization, D.Z. and Y.C.; Supervision, W.Z.; Funding acquisition, J.G. and W.Z.; Validation, X.X. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the first batch of the “Jilin Province Doctoral Student Support Program” from the Jilin Association for Science and Technology, China (Grant No. BST202532).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Qin, H.L.; Xu, T.F.; Tang, Y.; Xu, F.; Li, J. Osformer: One-step transformer for infrared video small object detection. IEEE Trans. Image Process. 2025, 34, 5725–5736. [Google Scholar] [CrossRef]
  2. Zhang, M.J.; Li, X.L.; Gao, F.; Guo, J.; Gao, X.; Zhang, J. SAIST: Segment Any Infrared Small Target Model Guided by Contrastive Language-Image Pretraining. In Proceedings of the Computer Vision and Pattern Recognition Conference, Nashville, TN, USA, 10–17 June 2025; pp. 9549–9558. [Google Scholar]
  3. Han, J.H.; Moradi, S.; Zhou, B.; Wang, W.; Zhao, Q.; Luo, Z. A True Global Contrast Method for IR Small Target Detection under Complex Background. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5004424. [Google Scholar] [CrossRef]
  4. Yue, T.R.; Lu, X.J.; Cai, J.X.; Chen, Y.; Chu, S. SDS-Net: Shallow-Deep Synergism-detection Network for infrared small target detection. arXiv 2025, arXiv:2506.06042. [Google Scholar] [CrossRef]
  5. Kim, J.H.; Hwang, Y. GAN-based synthetic data augmentation for infrared small target detection. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5002512. [Google Scholar] [CrossRef]
  6. Zhang, L.H.; Lin, W.H.; Shen, Z.M.; Zhang, D.; Xu, B.; Wang, K.; Chen, J. Infrared dim and small target sequence dataset generation method based on generative adversarial networks. Electronics 2023, 12, 3625. [Google Scholar] [CrossRef]
  7. Sun, H.; Bai, J.X.; Yang, F.; Bai, X. Receptive-field and direction induced attention network for infrared dim small target detection with a large-scale dataset IRDST. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5000513. [Google Scholar] [CrossRef]
  8. Chen, S.J.; Ji, L.P.; Zhu, S.C.; Ye, M.; Ren, H.; Sang, Y. Towards dense moving infrared small target detection: New datasets and baseline. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5005513. [Google Scholar] [CrossRef]
  9. Li, Y.S.; Li, Z.Z.; Zhang, C.; Luo, Z.; Zhu, Y.; Ding, Z.; Qin, T. Infrared maritime dim small target detection based on spatiotemporal cues and directional morphological filtering. Infrared Phys. Technol. 2021, 115, 103657. [Google Scholar] [CrossRef]
  10. Cui, H.X.; Li, L.Y.; Liu, X.; Su, X.; Chen, F. Infrared small target detection based on weighted three-layer window local contrast. IEEE Geosci. Remote Sens. Lett. 2021, 19, 7505705. [Google Scholar] [CrossRef]
  11. Yan, F.J.; Xu, G.L.; Wu, Q.; Wang, J.; Li, Z. Infrared small target detection using kernel low-rank approximation and regularization terms for constraints. Infrared Phys. Technol. 2022, 125, 104222. [Google Scholar] [CrossRef]
  12. Dai, Y.M.; Wu, Y.Q.; Zhou, F.; Barnard, K. Asymmetric contextual modulation for infrared small target detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 5–9 January 2021; pp. 950–959. [Google Scholar]
  13. Li, B.Y.; Xiao, C.; Wang, L.G.; Wang, Y.; Lin, Z.; Li, M.; An, W.; Guo, Y. Dense nested attention network for infrared small target detection. IEEE Trans. Image Process. 2022, 32, 1745–1758. [Google Scholar] [CrossRef]
Figure 1. Comparison between the proposed method and existing pseudo-sample generation and network learning approaches. Section (a) shows existing methods, while section (b) presents our proposed framework. The proposed online pseudo-sample generation method simulates real image degradation mechanisms and incorporates high-order constraints, resulting in generated targets that exhibit closer alignment with real images in terms of quantity, spatial distribution, and information characteristics.
Figure 2. TOA image degradation model architecture.
Figure 3. The framework of the high-order constraint pseudo-sample generation algorithm proposed in this paper.
Figure 4. The self-supervised learning training framework proposed in our paper.
Figure 5. Description of the simulated dataset acquisition process. (a) Original image library used to acquire small-target binary labels. (b) Examples of the acquired small-target binary images. (c) The resulting library of 2929 small-target label images, each 30 × 30 pixels. (d) The library of 8659 infrared background images (512 × 512 pixels) to which small targets are added. (e) The infrared background library after data augmentation, containing 21,419 images.
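Steps (a)–(c) above can be sketched in plain NumPy. The snippet below is a hypothetical reconstruction, not the authors' released tooling: it labels connected components of a binary label image with a simple flood fill (a real pipeline might use `scipy.ndimage.label`) and crops a fixed 30 × 30 chip centred on each target, clamped to stay inside the image.

```python
import numpy as np

def extract_target_chips(binary_mask, chip_size=30):
    """Crop fixed-size chips centred on each connected target in a binary mask."""
    H, W = binary_mask.shape
    visited = np.zeros((H, W), dtype=bool)
    chips = []
    half = chip_size // 2
    for y in range(H):
        for x in range(W):
            if binary_mask[y, x] and not visited[y, x]:
                # Flood fill (4-connectivity) to collect this component's pixels.
                stack, pixels = [(y, x)], []
                visited[y, x] = True
                while stack:
                    cy, cx = stack.pop()
                    pixels.append((cy, cx))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < H and 0 <= nx < W
                                and binary_mask[ny, nx] and not visited[ny, nx]):
                            visited[ny, nx] = True
                            stack.append((ny, nx))
                ys, xs = zip(*pixels)
                cy, cx = int(np.mean(ys)), int(np.mean(xs))
                # Clamp so the chip window stays inside the image bounds.
                y0 = min(max(cy - half, 0), H - chip_size)
                x0 = min(max(cx - half, 0), W - chip_size)
                chips.append(binary_mask[y0:y0 + chip_size, x0:x0 + chip_size].copy())
    return chips
```

Applied to a 512 × 512 label image, this yields one 30 × 30 binary chip per target, matching the chip-library format described in (c).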
Figure 6. Distribution of the proposed pseudo-sample quality in terms of SCR and IE. (a) The x-axis represents the SCR range (0–50), where lower values indicate more challenging detection targets. The y-axis shows the proportion of samples under different SCR values relative to the total dataset. (b) The x-axis represents image complexity and information content, while the y-axis indicates the proportion of samples under different IE values relative to the total dataset. The orange area represents the distribution of high-quality pseudo-samples. The more data points in this area, the better the quality. The pseudo-samples obtained in this study exhibit higher detection complexity, richer information content, and greater structural authenticity.
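The SCR and IE statistics plotted in Figure 6 can be computed as in the minimal sketch below, using the common definitions: SCR = |μ_t − μ_b| / σ_b over a local background window around the target, and IE as the Shannon entropy of the grey-level histogram. The neighbourhood margin and bin count here are assumptions; the paper's exact settings are not restated in this excerpt.

```python
import numpy as np

def scr(image, target_mask, bg_margin=20):
    """Signal-to-clutter ratio |mu_t - mu_b| / sigma_b over a local window."""
    ys, xs = np.nonzero(target_mask)
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    # Local region: target bounding box expanded by bg_margin pixels.
    Y0, Y1 = max(y0 - bg_margin, 0), min(y1 + bg_margin, image.shape[0])
    X0, X1 = max(x0 - bg_margin, 0), min(x1 + bg_margin, image.shape[1])
    region = image[Y0:Y1, X0:X1].astype(np.float64)
    local_mask = target_mask[Y0:Y1, X0:X1].astype(bool)
    mu_t = region[local_mask].mean()
    bg = region[~local_mask]
    return abs(mu_t - bg.mean()) / (bg.std() + 1e-12)

def image_entropy(image, bins=256):
    """Shannon entropy (bits) of the grey-level histogram."""
    hist, _ = np.histogram(image, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())
```

Lower SCR corresponds to dimmer, harder targets (left of panel (a)); higher IE corresponds to more complex, information-rich backgrounds (right of panel (b)).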
Figure 7. ROC-curve evaluation of the five networks, each trained on the six simulated datasets.
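Each ROC curve in Figure 7 is traced by sweeping a detection threshold over a network's score map. The sketch below uses the usual pixel-wise definitions of detection probability and false-alarm rate; the paper may instead count targets rather than pixels, so treat this as an illustrative reading.

```python
import numpy as np

def roc_points(score_map, gt_mask, thresholds):
    """Return (false-alarm rate, detection probability) pairs over a threshold sweep."""
    gt = gt_mask.astype(bool)
    n_pos = gt.sum()
    n_neg = gt.size - n_pos
    points = []
    for t in thresholds:
        det = score_map >= t
        pd = (det & gt).sum() / max(n_pos, 1)    # true-positive (detection) rate
        fa = (det & ~gt).sum() / max(n_neg, 1)   # false-alarm rate
        points.append((fa, pd))
    return points
```

Plotting the returned pairs for a dense list of thresholds reproduces one curve per trained model.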
Figure 8. Qualitative visualization results on the real dataset NUAA-SIRST. Each row of images corresponds to the centroid localization results of five baseline networks: the first column shows the original infrared image, the second column shows the ground truth centroid annotation, and the third to eighth columns display the detection results of models trained on the six simulated datasets. It can be clearly observed that the self-supervised learning method proposed in this paper outperforms the comparative methods in terms of false alarm rate, detection rate, and localization accuracy.
Figure 9. Qualitative visualization results on the real dataset IRDST-real. Each row of images represents the centroid visualization results of five baseline networks. The first column displays the original image; the second column shows the ground truth centroid visualization; and the third to eighth columns present the centroid visualizations of detection results from models trained on the six simulated datasets. It can be clearly seen that the self-supervised learning method proposed in this paper achieves a lower false alarm rate, a higher detection rate, and superior target localization accuracy.
Figure 10. Pseudo-sample images generated by four different degradation schemes (magnification recommended for detailed observation). From left to right: no degradation, D_blur, D_blur + D_br, and D_blur + D_br + D_scale.
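The three degradation operators named in the caption can be composed sequentially as in the toy sketch below: a Gaussian blur standing in for D_blur, radiation-intensity attenuation for D_br, and nearest-neighbour down-sampling for D_scale. This is an illustrative stand-in for the paper's physics-informed degradation model; the kernel size, attenuation factor, and scale factor are assumptions, not the calibrated settings.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Normalised 2-D Gaussian kernel."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def degrade_chip(chip, sigma=1.0, brightness=0.6, scale=0.5):
    """Apply D_blur, then D_br, then D_scale to a target chip."""
    # D_blur: 2-D convolution with a normalised Gaussian kernel (edge padding).
    k = gaussian_kernel(sigma=sigma)
    pad = k.shape[0] // 2
    padded = np.pad(chip.astype(np.float64), pad, mode="edge")
    blurred = np.zeros(chip.shape, dtype=np.float64)
    for y in range(chip.shape[0]):
        for x in range(chip.shape[1]):
            blurred[y, x] = (padded[y:y + k.shape[0], x:x + k.shape[1]] * k).sum()
    # D_br: attenuate the radiated intensity.
    dimmed = blurred * brightness
    # D_scale: nearest-neighbour down-scaling by integer stride.
    step = int(round(1 / scale))
    return dimmed[::step, ::step]
```

A 30 × 30 chip with a single bright point comes back as a 15 × 15 chip whose peak is spread and attenuated, qualitatively matching the left-to-right progression in Figure 10.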
Figure 11. Pseudo-sample images generated by three different constraint schemes (magnification recommended for detailed observation). From left to right: C_g, C_g + C_l, and C_g + C_l + C_gl.
Table 1. Quantitative comparison results of five networks trained on six simulated datasets and evaluated on the real dataset NUAA-SIRST. The gray-shaded row highlights the results of our proposed method. The best and second-best results are highlighted in red and green, respectively. Symbols ↑ and ↓ indicate that higher and lower values are better, correspondingly.
| Network | Train Dataset | Pod (Pixel) ↓ | F1 (%) ↑ | Prec (%) ↑ | Fa (10⁻⁶) ↓ | IoU (%) ↑ |
|---|---|---|---|---|---|---|
| MDvsFA | NUDT-SIRST | 0.552 | 80.741 | 94.328 | 6.215 | 81.624 |
| MDvsFA | NUST-SIRST | 0.405 | 94.215 | 95.743 | 5.322 | 91.842 |
| MDvsFA | IRDST-simulation | 9.745 | 67.217 | 74.324 | 20.211 | 61.835 |
| MDvsFA | IRSTD-1K | 0.468 | 86.844 | 96.745 | 7.417 | 85.328 |
| MDvsFA | SIRST-5K | 0.583 | 73.328 | 92.212 | 6.746 | 73.467 |
| MDvsFA | Ours dataset | 0.342 | 91.745 | 95.328 | 6.219 | 93.433 |
| ISNet | NUDT-SIRST | 0.387 | 81.743 | 93.328 | 6.215 | 79.338 |
| ISNet | NUST-SIRST | 0.283 | 92.328 | 96.467 | 4.328 | 93.229 |
| ISNet | IRDST-simulation | 8.294 | 69.845 | 77.328 | 18.743 | 62.327 |
| ISNet | IRSTD-1K | 0.375 | 87.587 | 97.841 | 4.216 | 87.648 |
| ISNet | SIRST-5K | 0.548 | 74.828 | 91.215 | 8.745 | 74.437 |
| ISNet | Ours dataset | 0.197 | 95.215 | 97.236 | 4.322 | 94.339 |
| AFFNet | NUDT-SIRST | 0.573 | 79.416 | 91.649 | 9.328 | 78.564 |
| AFFNet | NUST-SIRST | 0.472 | 94.745 | 92.734 | 5.845 | 94.393 |
| AFFNet | IRDST-simulation | 9.884 | 67.324 | 72.845 | 24.417 | 60.845 |
| AFFNet | IRSTD-1K | 0.552 | 85.328 | 98.367 | 9.743 | 85.448 |
| AFFNet | SIRST-5K | 0.615 | 72.417 | 89.328 | 9.216 | 71.206 |
| AFFNet | Ours dataset | 0.327 | 94.323 | 93.745 | 5.639 | 91.245 |
| DNA-Net | NUDT-SIRST | 0.372 | 82.323 | 94.744 | 5.843 | 80.215 |
| DNA-Net | NUST-SIRST | 0.198 | 95.849 | 97.215 | 3.747 | 92.328 |
| DNA-Net | IRDST-simulation | 7.845 | 69.747 | 76.213 | 16.828 | 62.434 |
| DNA-Net | IRSTD-1K | 0.387 | 88.745 | 97.328 | 3.842 | 87.406 |
| DNA-Net | SIRST-5K | 0.465 | 75.116 | 93.324 | 6.219 | 74.745 |
| DNA-Net | Ours dataset | 0.043 | 92.321 | 98.745 | 2.844 | 95.215 |
| RDIAN | NUDT-SIRST | 0.371 | 83.007 | 96.975 | 4.542 | 82.539 |
| RDIAN | NUST-SIRST | 0.112 | 96.884 | 98.735 | 2.755 | 91.447 |
| RDIAN | IRDST-simulation | 8.275 | 69.862 | 77.527 | 16.623 | 64.325 |
| RDIAN | IRSTD-1K | 0.335 | 87.613 | 97.304 | 3.492 | 89.106 |
| RDIAN | SIRST-5K | 0.446 | 75.243 | 95.363 | 4.804 | 76.009 |
| RDIAN | Ours dataset | 0.036 | 94.305 | 98.841 | 2.614 | 95.386 |
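The pixel-level columns of Tables 1 and 2 can be computed from binary prediction and ground-truth masks as below. This is a plain reading of the column definitions (Fa as false pixels per 10⁶ pixels), not the authors' exact evaluation code, so exact numerical agreement is not guaranteed.

```python
import numpy as np

def pixel_metrics(pred_mask, gt_mask, fa_scale=1e6):
    """Pixel-level Prec, F1, IoU (percent) and false-alarm rate Fa (per 1e6 pixels)."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    tp = (pred & gt).sum()
    fp = (pred & ~gt).sum()
    fn = (~pred & gt).sum()
    prec = tp / max(tp + fp, 1)
    rec = tp / max(tp + fn, 1)
    f1 = 2 * prec * rec / max(prec + rec, 1e-12)
    iou = tp / max(tp + fp + fn, 1)
    fa = fp / gt.size * fa_scale          # false positives per fa_scale pixels
    return {"Prec": 100 * prec, "F1": 100 * f1, "IoU": 100 * iou, "Fa": fa}
```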
Table 2. Quantitative comparison results of five networks trained on six simulated datasets and evaluated on the real dataset IRDST-real. The gray-shaded row highlights the results of our proposed method. The best and second-best results are highlighted in red and green, respectively. Symbols ↑ and ↓ indicate that higher and lower values are better, correspondingly.
| Network | Train Dataset | Pod (Pixel) ↓ | F1 (%) ↑ | Prec (%) ↑ | Fa (10⁻⁶) ↓ | IoU (%) ↑ |
|---|---|---|---|---|---|---|
| MDvsFA | NUDT-SIRST | 0.487 | 81.328 | 92.407 | 8.745 | 80.215 |
| MDvsFA | NUST-SIRST | 17.215 | 70.745 | 85.215 | 15.322 | 67.884 |
| MDvsFA | IRDST-simulation | 0.233 | 92.367 | 94.835 | 5.713 | 91.215 |
| MDvsFA | IRSTD-1K | 0.572 | 79.435 | 90.311 | 9.747 | 75.835 |
| MDvsFA | SIRST-5K | 0.387 | 85.745 | 95.205 | 5.479 | 85.545 |
| MDvsFA | Ours dataset | 0.058 | 91.219 | 94.845 | 5.329 | 91.735 |
| ISNet | NUDT-SIRST | 0.573 | 80.313 | 90.213 | 10.634 | 82.862 |
| ISNet | NUST-SIRST | 15.416 | 68.437 | 83.205 | 17.145 | 69.954 |
| ISNet | IRDST-simulation | 0.128 | 94.235 | 96.745 | 3.528 | 92.135 |
| ISNet | IRSTD-1K | 0.673 | 77.417 | 92.339 | 8.745 | 76.267 |
| ISNet | SIRST-5K | 0.548 | 86.348 | 96.644 | 4.234 | 86.382 |
| ISNet | Ours dataset | 0.157 | 92.864 | 95.706 | 3.228 | 91.015 |
| AFFNet | NUDT-SIRST | 0.412 | 79.215 | 89.348 | 19.449 | 79.463 |
| AFFNet | NUST-SIRST | 18.745 | 67.431 | 81.397 | 18.338 | 65.675 |
| AFFNet | IRDST-simulation | 0.022 | 91.369 | 91.615 | 4.739 | 90.396 |
| AFFNet | IRSTD-1K | 0.739 | 76.328 | 89.467 | 9.326 | 75.254 |
| AFFNet | SIRST-5K | 0.318 | 86.845 | 91.228 | 5.744 | 84.379 |
| AFFNet | Ours dataset | 0.637 | 90.375 | 92.547 | 7.764 | 89.328 |
| DNA-Net | NUDT-SIRST | 0.512 | 81.841 | 93.328 | 5.413 | 83.841 |
| DNA-Net | NUST-SIRST | 14.326 | 69.463 | 83.745 | 12.218 | 69.486 |
| DNA-Net | IRDST-simulation | 0.053 | 94.719 | 97.328 | 3.845 | 94.728 |
| DNA-Net | IRSTD-1K | 0.515 | 78.337 | 92.746 | 6.309 | 74.229 |
| DNA-Net | SIRST-5K | 0.472 | 87.845 | 97.215 | 3.795 | 87.336 |
| DNA-Net | Ours dataset | 0.089 | 93.703 | 96.328 | 3.467 | 92.845 |
| RDIAN | NUDT-SIRST | 0.548 | 86.243 | 95.383 | 3.741 | 84.331 |
| RDIAN | NUST-SIRST | 11.563 | 73.318 | 87.534 | 13.365 | 70.198 |
| RDIAN | IRDST-simulation | 0.046 | 94.642 | 98.046 | 2.881 | 93.142 |
| RDIAN | IRSTD-1K | 0.442 | 81.227 | 93.224 | 6.804 | 78.247 |
| RDIAN | SIRST-5K | 0.363 | 89.145 | 97.512 | 3.416 | 87.512 |
| RDIAN | Ours dataset | 0.051 | 94.817 | 97.663 | 2.552 | 92.856 |
Table 3. Statistics of position pixel deviation (unit: pixel) for various models on the real dataset IRDST-real.
| Model / Train Dataset | NUST-SIRST | NUDT-SIRST | IRSTD-1K | SIRST-5K | IRDST-Simulation | Ours Dataset |
|---|---|---|---|---|---|---|
| MDvsFA | 45 | 15 | 20 | 20 | 20 | 12 |
| ISNet | 121 | 34 | 65 | 8 | 18 | 7 |
| AFFNet | 123 | 34 | 58 | 19 | 5 | 8 |
| DNA-Net | 456 | 21 | 33 | 6 | 5 | 12 |
| RDIAN | 99 | 49 | 40 | 23 | 8 | 19 |
| Total | 844 | 153 | 216 | 76 | 56 | 58 |
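The position pixel deviation of Table 3 can be read as the distance between predicted and ground-truth target centroids. The sketch below computes the Euclidean centroid distance for a single target pair; how the paper accumulates deviations across targets and images is not restated here, so this is one plausible implementation.

```python
import numpy as np

def centroid_deviation(pred_mask, gt_mask):
    """Euclidean distance (pixels) between predicted and ground-truth centroids."""
    def centroid(m):
        ys, xs = np.nonzero(m)
        return np.array([ys.mean(), xs.mean()])
    return float(np.linalg.norm(centroid(pred_mask) - centroid(gt_mask)))
```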
Table 4. Quantitative comparison of four schemes on five evaluation metrics for the real datasets NUAA-SIRST and IRDST-real. The best and second-best results are highlighted in red and green, respectively. Symbols ↑ and ↓ indicate that higher and lower values are better, correspondingly.
| Scheme | NUAA Pod (Pixel) ↓ | NUAA F1 (%) ↑ | NUAA Prec (%) ↑ | NUAA Fa (10⁻⁶) ↓ | NUAA IoU (%) ↑ | IRDST Pod (Pixel) ↓ | IRDST F1 (%) ↑ | IRDST Prec (%) ↑ | IRDST Fa (10⁻⁶) ↓ | IRDST IoU (%) ↑ |
|---|---|---|---|---|---|---|---|---|---|---|
| A | 8.935 | 75.601 | 79.549 | 11.842 | 77.163 | 13.107 | 71.479 | 73.004 | 16.352 | 75.060 |
| B | 4.337 | 90.734 | 94.046 | 7.485 | 89.549 | 8.104 | 88.592 | 85.116 | 10.440 | 82.339 |
| C | 1.205 | 88.249 | 97.118 | 3.117 | 96.330 | 6.996 | 85.380 | 93.810 | 7.637 | 87.548 |
| D | 0.036 | 94.305 | 98.841 | 2.614 | 95.386 | 0.051 | 92.817 | 97.663 | 2.552 | 92.856 |
Table 5. Quantitative comparison of two blur degradation schemes on two real datasets and three evaluation metrics. The best results are highlighted in red. Symbols ↑ and ↓ indicate that higher and lower values are better, correspondingly.
| Scheme | NUAA Pod (Pixel) ↓ | NUAA Prec (%) ↑ | NUAA Fa (10⁻⁶) ↓ | IRDST Pod (Pixel) ↓ | IRDST Prec (%) ↑ | IRDST Fa (10⁻⁶) ↓ |
|---|---|---|---|---|---|---|
| A | 0.697 | 98.013 | 2.994 | 2.510 | 95.421 | 4.232 |
| B | 0.036 | 98.841 | 2.614 | 0.051 | 97.663 | 2.552 |
Table 6. Quantitative comparison of three radiation-intensity degradation schemes on two real datasets and three evaluation metrics. The best and second-best results are highlighted in red and green, respectively. Symbols ↑ and ↓ indicate that higher and lower values are better, correspondingly.
| Scheme | NUAA Pod (Pixel) ↓ | NUAA Prec (%) ↑ | NUAA Fa (10⁻⁶) ↓ | IRDST Pod (Pixel) ↓ | IRDST Prec (%) ↑ | IRDST Fa (10⁻⁶) ↓ |
|---|---|---|---|---|---|---|
| A | 2.196 | 96.164 | 3.087 | 1.195 | 96.360 | 3.260 |
| B | 0.171 | 97.913 | 2.726 | 3.997 | 95.191 | 3.793 |
| C | 0.036 | 98.841 | 2.614 | 0.051 | 97.663 | 2.552 |
Table 7. Quantitative comparison of three scale degradation schemes on two real datasets and three evaluation metrics. The best and second-best results are highlighted in red and green, respectively. Symbols ↑ and ↓ indicate that higher and lower values are better, correspondingly.
| Scheme | NUAA Pod (Pixel) ↓ | NUAA Prec (%) ↑ | NUAA Fa (10⁻⁶) ↓ | IRDST Pod (Pixel) ↓ | IRDST Prec (%) ↑ | IRDST Fa (10⁻⁶) ↓ |
|---|---|---|---|---|---|---|
| A | 3.650 | 96.684 | 3.441 | 1.237 | 97.586 | 2.603 |
| B | 2.227 | 97.169 | 2.691 | 1.572 | 96.597 | 5.961 |
| C | 0.036 | 98.841 | 2.614 | 0.051 | 97.663 | 2.552 |
Table 8. Quantitative comparison of three constraint schemes on five evaluation metrics for the real datasets NUAA-SIRST and IRDST-real. The best and second-best results are highlighted in red and green, respectively. Symbols ↑ and ↓ indicate that higher and lower values are better, correspondingly.
| C_g | C_l | C_gl | NUAA Pod (Pixel) ↓ | NUAA F1 (%) ↑ | NUAA Prec (%) ↑ | NUAA Fa (10⁻⁶) ↓ | NUAA IoU (%) ↑ | IRDST Pod (Pixel) ↓ | IRDST F1 (%) ↑ | IRDST Prec (%) ↑ | IRDST Fa (10⁻⁶) ↓ | IRDST IoU (%) ↑ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ✓ |  |  | 1.639 | 91.064 | 98.967 | 3.916 | 94.225 | 1.938 | 85.394 | 91.031 | 5.190 | 87.438 |
| ✓ | ✓ |  | 0.227 | 93.228 | 96.371 | 2.837 | 93.187 | 4.204 | 88.712 | 95.194 | 3.776 | 90.550 |
| ✓ | ✓ | ✓ | 0.036 | 94.305 | 98.841 | 2.614 | 95.386 | 0.051 | 92.817 | 97.663 | 2.552 | 92.856 |
Table 9. Impact of different loss balancing factor combinations λ1 and λ2 on self-supervised performance. “Epoch” indicates the iteration at which the model begins to converge. The best and second-best results are highlighted in red and green, respectively. Symbols ↑ and ↓ indicate that higher and lower values are better, correspondingly.
| Scheme (λ1, λ2) | Pod (Pixel) ↓ | F1 (%) ↑ | Prec (%) ↑ | Fa (10⁻⁶) ↓ | Epoch ↓ |
|---|---|---|---|---|---|
| (0.1, 0.9) | 0.042 | 92.85 | 96.31 | 3.87 | 36 |
| (0.2, 0.8) | 0.036 | 94.30 | 98.84 | 2.61 | 23 |
| (0.3, 0.7) | 0.039 | 94.67 | 97.92 | 3.04 | 28 |
| (0.4, 0.6) | 0.045 | 93.95 | 95.76 | 3.78 | 35 |
| (0.5, 0.5) | 0.057 | 91.20 | 96.80 | 4.33 | 67 |
| (0.6, 0.4) | 0.051 | 88.37 | 94.20 | 5.12 | 65 |
| (0.7, 0.3) | 0.063 | 89.50 | 92.45 | 7.81 | 86 |
| (0.8, 0.2) | 0.075 | 87.40 | 90.10 | 10.25 | 90 |
| (0.9, 0.1) | 0.088 | 83.92 | 88.20 | 12.64 | 112 |
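The grid in Table 9 amounts to weighting the framework's two loss terms as L = λ1·L_a + λ2·L_b with λ2 = 1 − λ1 and keeping the best-scoring pair. The sketch below reproduces that sweep; which concrete losses λ1 and λ2 weight is not restated in this excerpt, so the two terms and the `evaluate` callable (one train-and-validate run returning a scalar score) are hypothetical interfaces.

```python
def combined_loss(loss_a, loss_b, lam1=0.2, lam2=0.8):
    """Weighted sum of the framework's two loss terms."""
    return lam1 * loss_a + lam2 * loss_b

def sweep_balancing_factors(evaluate):
    """Grid-search lambda_1 in {0.1, ..., 0.9} with lambda_2 = 1 - lambda_1.
    `evaluate(lam1, lam2)` must return a scalar score to maximise."""
    results = []
    for i in range(1, 10):
        lam1 = round(0.1 * i, 1)
        lam2 = round(1.0 - lam1, 1)
        results.append(((lam1, lam2), evaluate(lam1, lam2)))
    best_pair, best_score = max(results, key=lambda r: r[1])
    return best_pair, best_score
```

With Table 9's numbers, the sweep would settle on (0.2, 0.8), the setting used in the other experiments.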

Share and Cite

MDPI and ACS Style

Guo, J.; Zhan, W.; Huo, D.; Zhu, D.; Chen, Y.; Jiang, Y.; Xu, X. Pseudo-Sample Generation and Self-Supervised Framework for Infrared Dim and Small Target Detection. Entropy 2025, 27, 1212. https://doi.org/10.3390/e27121212
