Figure 1.
Comparison between the proposed method and existing pseudo-sample generation and network learning approaches. Section (a) shows existing methods, while section (b) presents our proposed framework. The proposed online pseudo-sample generation method simulates real image degradation mechanisms and incorporates high-order constraints, resulting in generated targets that exhibit closer alignment with real images in terms of quantity, spatial distribution, and information characteristics.
Figure 2.
TOA image degradation model architecture.
Figure 3.
The framework of the high-order constraint pseudo-sample generation algorithm proposed in this paper.
Figure 4.
The self-supervised training framework proposed in this paper.
Figure 5.
Description of the simulated dataset acquisition process. (a) Original image library used to acquire small target binary labels. (b) Examples of the acquired small target binary images. (c) The acquired library of 2929 fixed-size small target label images. (d) The acquired library of 8659 fixed-size infrared background images, to which small targets are to be added. (e) Infrared background image library containing 21,419 images after data augmentation.
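The augmentation step in (e) roughly 2.5×'s the background library (8659 to 21,419 images). The exact recipe is not given in this section; the sketch below illustrates one plausible geometric scheme (flips and a 90° rotation), where `augment` is a hypothetical helper, not the paper's implementation.

```python
import numpy as np

def augment(image: np.ndarray) -> list:
    """Hypothetical geometric augmentation: returns the original image
    plus flipped and rotated variants. The paper's actual recipe that
    grows 8659 backgrounds to 21,419 images is not specified here."""
    return [
        image,             # original
        np.fliplr(image),  # horizontal flip
        np.flipud(image),  # vertical flip
        np.rot90(image),   # 90-degree rotation
    ]
```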
Figure 6.
Quality distribution of the proposed pseudo-samples in terms of SCR (signal-to-clutter ratio) and IE (information entropy). (a) The x-axis spans the SCR range (0–50), where lower values indicate more challenging targets; the y-axis shows the proportion of samples at each SCR value relative to the total dataset. (b) The x-axis represents image complexity and information content (IE); the y-axis shows the proportion of samples at each IE value relative to the total dataset. The orange area marks the distribution of high-quality pseudo-samples: the more data points fall in this area, the better the quality. The pseudo-samples obtained in this study exhibit higher detection complexity, richer information content, and greater structural authenticity.
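For reference, the two quality measures in Figure 6 can be computed as follows. This is a minimal sketch using the standard definitions, SCR = |μ_t − μ_b| / σ_b over a local background window and IE as the Shannon entropy of the gray-level histogram; the window margin and histogram binning are assumptions, since the paper's exact settings are not given in this section.

```python
import numpy as np

def scr(image: np.ndarray, target_mask: np.ndarray, margin: int = 10) -> float:
    """Signal-to-clutter ratio |mu_t - mu_b| / sigma_b, with background
    statistics taken from a local window around the target.
    The window margin is an assumed setting."""
    tgt = target_mask.astype(bool)
    ys, xs = np.nonzero(tgt)
    y0, y1 = max(ys.min() - margin, 0), ys.max() + margin + 1
    x0, x1 = max(xs.min() - margin, 0), xs.max() + margin + 1
    window = image[y0:y1, x0:x1]
    background = window[~tgt[y0:y1, x0:x1]]
    mu_t = image[tgt].mean()
    return abs(mu_t - background.mean()) / (background.std() + 1e-8)

def information_entropy(image: np.ndarray, bins: int = 256) -> float:
    """Shannon entropy (bits) of the gray-level histogram; 8-bit range assumed."""
    hist, _ = np.histogram(image, bins=bins, range=(0, 255))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())
```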
Figure 7.
ROC curve evaluation of the five networks, each trained on the six simulated datasets.
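Each ROC point in Figure 7 pairs a detection rate with a false alarm rate at one confidence threshold. A minimal sketch of this sweep is shown below; it uses a pixel-level simplification of the detection and false alarm rates, since the section does not spell out the exact target-level matching rule.

```python
import numpy as np

def roc_points(score_map: np.ndarray, gt_mask: np.ndarray, n_thresh: int = 50):
    """Sweep confidence thresholds over a network's score map and collect
    (false alarm rate, detection rate) pairs. Pixel-level rates are used
    here as a simplification of the paper's evaluation protocol."""
    gt = gt_mask.astype(bool)
    points = []
    for t in np.linspace(score_map.min(), score_map.max(), n_thresh):
        pred = score_map >= t
        pd = (pred & gt).sum() / max(gt.sum(), 1)      # detection rate
        fa = (pred & ~gt).sum() / max((~gt).sum(), 1)  # false alarm rate
        points.append((fa, pd))
    return points
```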
Figure 8.
Qualitative visualization results on the real dataset NUAA-SIRST. Each row shows the centroid localization results for one of the five baseline networks: the first column shows the original infrared image, the second column the ground-truth centroid annotation, and the third to eighth columns the detection results of models trained on the six simulated datasets. The proposed self-supervised learning method clearly outperforms the comparative methods in false alarm rate, detection rate, and localization accuracy.
Figure 9.
Qualitative visualization results on the real dataset IRDST-real. Each row shows the centroid visualization results for one of the five baseline networks: the first column displays the original image, the second column the ground-truth centroid visualization, and the third to eighth columns the centroid visualizations of detection results from models trained on the six simulated datasets. The proposed self-supervised learning method clearly achieves a lower false alarm rate, a higher detection rate, and superior target localization accuracy.
Figure 10.
Pseudo-sample images generated by four different degradation schemes (magnification recommended for detailed observation). From left to right: no degradation, a single degradation term, two degradation terms combined, and all three degradation terms combined.
Figure 11.
Pseudo-sample images generated by three different constraint schemes (magnification recommended for detailed observation). From left to right: a single constraint, two constraints combined, and all three constraints combined.
Table 1.
Quantitative comparison of five networks trained on six simulated datasets and evaluated on the real dataset NUAA-SIRST. The gray-shaded row highlights the results of our proposed method. The best and second-best results are highlighted in red and green, respectively. Symbols ↑ and ↓ indicate that higher and lower values are better.
| Network | Training Dataset | ↓ | ↑ | ↑ | ↓ | ↑ |
|---|---|---|---|---|---|---|
| MDvsFA | NUDT-SIRST | 0.552 | 80.741 | 94.328 | 6.215 | 81.624 |
| | NUST-SIRST | 0.405 | 94.215 | 95.743 | 5.322 | 91.842 |
| | IRDST-simulation | 9.745 | 67.217 | 74.324 | 20.211 | 61.835 |
| | IRSTD-1K | 0.468 | 86.844 | 96.745 | 7.417 | 85.328 |
| | SIRST-5K | 0.583 | 73.328 | 92.212 | 6.746 | 73.467 |
| | our dataset | 0.342 | 91.745 | 95.328 | 6.219 | 93.433 |
| ISNet | NUDT-SIRST | 0.387 | 81.743 | 93.328 | 6.215 | 79.338 |
| | NUST-SIRST | 0.283 | 92.328 | 96.467 | 4.328 | 93.229 |
| | IRDST-simulation | 8.294 | 69.845 | 77.328 | 18.743 | 62.327 |
| | IRSTD-1K | 0.375 | 87.587 | 97.841 | 4.216 | 87.648 |
| | SIRST-5K | 0.548 | 74.828 | 91.215 | 8.745 | 74.437 |
| | our dataset | 0.197 | 95.215 | 97.236 | 4.322 | 94.339 |
| AFFNet | NUDT-SIRST | 0.573 | 79.416 | 91.649 | 9.328 | 78.564 |
| | NUST-SIRST | 0.472 | 94.745 | 92.734 | 5.845 | 94.393 |
| | IRDST-simulation | 9.884 | 67.324 | 72.845 | 24.417 | 60.845 |
| | IRSTD-1K | 0.552 | 85.328 | 98.367 | 9.743 | 85.448 |
| | SIRST-5K | 0.615 | 72.417 | 89.328 | 9.216 | 71.206 |
| | our dataset | 0.327 | 94.323 | 93.745 | 5.639 | 91.245 |
| DNA-Net | NUDT-SIRST | 0.372 | 82.323 | 94.744 | 5.843 | 80.215 |
| | NUST-SIRST | 0.198 | 95.849 | 97.215 | 3.747 | 92.328 |
| | IRDST-simulation | 7.845 | 69.747 | 76.213 | 16.828 | 62.434 |
| | IRSTD-1K | 0.387 | 88.745 | 97.328 | 3.842 | 87.406 |
| | SIRST-5K | 0.465 | 75.116 | 93.324 | 6.219 | 74.745 |
| | our dataset | 0.043 | 92.321 | 98.745 | 2.844 | 95.215 |
| RDIAN | NUDT-SIRST | 0.371 | 83.007 | 96.975 | 4.542 | 82.539 |
| | NUST-SIRST | 0.112 | 96.884 | 98.735 | 2.755 | 91.447 |
| | IRDST-simulation | 8.275 | 69.862 | 77.527 | 16.623 | 64.325 |
| | IRSTD-1K | 0.335 | 87.613 | 97.304 | 3.492 | 89.106 |
| | SIRST-5K | 0.446 | 75.243 | 95.363 | 4.804 | 76.009 |
| | our dataset | 0.036 | 94.305 | 98.841 | 2.614 | 95.386 |
Table 2.
Quantitative comparison of five networks trained on six simulated datasets and evaluated on the real dataset IRDST-real. The gray-shaded row highlights the results of our proposed method. The best and second-best results are highlighted in red and green, respectively. Symbols ↑ and ↓ indicate that higher and lower values are better.
| Network | Training Dataset | ↓ | ↑ | ↑ | ↓ | ↑ |
|---|---|---|---|---|---|---|
| MDvsFA | NUDT-SIRST | 0.487 | 81.328 | 92.407 | 8.745 | 80.215 |
| | NUST-SIRST | 17.215 | 70.745 | 85.215 | 15.322 | 67.884 |
| | IRDST-simulation | 0.233 | 92.367 | 94.835 | 5.713 | 91.215 |
| | IRSTD-1K | 0.572 | 79.435 | 90.311 | 9.747 | 75.835 |
| | SIRST-5K | 0.387 | 85.745 | 95.205 | 5.479 | 85.545 |
| | our dataset | 0.058 | 91.219 | 94.845 | 5.329 | 91.735 |
| ISNet | NUDT-SIRST | 0.573 | 80.313 | 90.213 | 10.634 | 82.862 |
| | NUST-SIRST | 15.416 | 68.437 | 83.205 | 17.145 | 69.954 |
| | IRDST-simulation | 0.128 | 94.235 | 96.745 | 3.528 | 92.135 |
| | IRSTD-1K | 0.673 | 77.417 | 92.339 | 8.745 | 76.267 |
| | SIRST-5K | 0.548 | 86.348 | 96.644 | 4.234 | 86.382 |
| | our dataset | 0.157 | 92.864 | 95.706 | 3.228 | 91.015 |
| AFFNet | NUDT-SIRST | 0.412 | 79.215 | 89.348 | 19.449 | 79.463 |
| | NUST-SIRST | 18.745 | 67.431 | 81.397 | 18.338 | 65.675 |
| | IRDST-simulation | 0.022 | 91.369 | 91.615 | 4.739 | 90.396 |
| | IRSTD-1K | 0.739 | 76.328 | 89.467 | 9.326 | 75.254 |
| | SIRST-5K | 0.318 | 86.845 | 91.228 | 5.744 | 84.379 |
| | our dataset | 0.637 | 90.375 | 92.547 | 7.764 | 89.328 |
| DNA-Net | NUDT-SIRST | 0.512 | 81.841 | 93.328 | 5.413 | 83.841 |
| | NUST-SIRST | 14.326 | 69.463 | 83.745 | 12.218 | 69.486 |
| | IRDST-simulation | 0.053 | 94.719 | 97.328 | 3.845 | 94.728 |
| | IRSTD-1K | 0.515 | 78.337 | 92.746 | 6.309 | 74.229 |
| | SIRST-5K | 0.472 | 87.845 | 97.215 | 3.795 | 87.336 |
| | our dataset | 0.089 | 93.703 | 96.328 | 3.467 | 92.845 |
| RDIAN | NUDT-SIRST | 0.548 | 86.243 | 95.383 | 3.741 | 84.331 |
| | NUST-SIRST | 11.563 | 73.318 | 87.534 | 13.365 | 70.198 |
| | IRDST-simulation | 0.046 | 94.642 | 98.046 | 2.881 | 93.142 |
| | IRSTD-1K | 0.442 | 81.227 | 93.224 | 6.804 | 78.247 |
| | SIRST-5K | 0.363 | 89.145 | 97.512 | 3.416 | 87.512 |
| | our dataset | 0.051 | 94.817 | 97.663 | 2.552 | 92.856 |
Table 3.
Statistics of centroid position deviation (unit: pixels) for the five models on the real dataset IRDST-real.
| Model \ Training Dataset | NUST-SIRST | NUDT-SIRST | IRSTD-1K | SIRST-5K | IRDST-Simulation | Our Dataset |
|---|---|---|---|---|---|---|
| MDvsFA | 45 | 15 | 20 | 20 | 20 | 12 |
| ISNet | 121 | 34 | 65 | 8 | 18 | 7 |
| AFFNet | 123 | 34 | 58 | 19 | 5 | 8 |
| DNA-Net | 456 | 21 | 33 | 6 | 5 | 12 |
| RDIAN | 99 | 49 | 40 | 23 | 8 | 19 |
| Total | 844 | 153 | 216 | 76 | 56 | 58 |
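The deviations in Table 3 measure how far each predicted centroid lands from the ground-truth centroid, accumulated over the test set. A minimal single-target sketch of this measurement follows; the aggregation over multiple targets and images is assumed, as the section does not detail it.

```python
import numpy as np
from scipy import ndimage

def centroid_deviation(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """Euclidean pixel distance between the centroids of a predicted and a
    ground-truth target mask (single-target case). Table 3 presumably sums
    such per-image deviations into per-dataset totals."""
    cy_p, cx_p = ndimage.center_of_mass(pred_mask.astype(float))
    cy_g, cx_g = ndimage.center_of_mass(gt_mask.astype(float))
    return float(np.hypot(cy_p - cy_g, cx_p - cx_g))
```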
Table 4.
Quantitative comparison of four schemes (A–D) on five evaluation metrics for the real datasets NUAA-SIRST and IRDST-real. The best and second-best results are highlighted in red and green, respectively. Symbols ↑ and ↓ indicate that higher and lower values are better.
| Scheme | NUAA-SIRST | | | | | IRDST-Real | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | ↓ | ↑ | ↑ | ↓ | ↑ | ↓ | ↑ | ↑ | ↓ | ↑ |
| A | 8.935 | 75.601 | 79.549 | 11.842 | 77.163 | 13.107 | 71.479 | 73.004 | 16.352 | 75.060 |
| B | 4.337 | 90.734 | 94.046 | 7.485 | 89.549 | 8.104 | 88.592 | 85.116 | 10.440 | 82.339 |
| C | 1.205 | 88.249 | 97.118 | 3.117 | 96.330 | 6.996 | 85.380 | 93.810 | 7.637 | 87.548 |
| D | 0.036 | 94.305 | 98.841 | 2.614 | 95.386 | 0.051 | 92.817 | 97.663 | 2.552 | 92.856 |
Table 5.
Quantitative comparison of two blur degradation schemes on two real datasets and three evaluation metrics. The best results are highlighted in red. Symbols ↑ and ↓ indicate that higher and lower values are better.
| Scheme | NUAA-SIRST | | | IRDST-Real | | |
|---|---|---|---|---|---|---|
| | ↓ | ↑ | ↓ | ↓ | ↑ | ↓ |
| A | 0.697 | 98.013 | 2.994 | 2.510 | 95.421 | 4.232 |
| B | 0.036 | 98.841 | 2.614 | 0.051 | 97.663 | 2.552 |
Table 6.
Quantitative comparison of three radiation intensity degradation schemes on two real datasets and three evaluation metrics. The best and second-best results are highlighted in red and green, respectively. Symbols ↑ and ↓ indicate that higher and lower values are better.
| Scheme | NUAA-SIRST | | | IRDST-Real | | |
|---|---|---|---|---|---|---|
| | ↓ | ↑ | ↓ | ↓ | ↑ | ↓ |
| A | 2.196 | 96.164 | 3.087 | 1.195 | 96.360 | 3.260 |
| B | 0.171 | 97.913 | 2.726 | 3.997 | 95.191 | 3.793 |
| C | 0.036 | 98.841 | 2.614 | 0.051 | 97.663 | 2.552 |
Table 7.
Quantitative comparison of three scale degradation schemes on two real datasets and three evaluation metrics. The best and second-best results are highlighted in red and green, respectively. Symbols ↑ and ↓ indicate that higher and lower values are better.
| Scheme | NUAA-SIRST | | | IRDST-Real | | |
|---|---|---|---|---|---|---|
| | ↓ | ↑ | ↓ | ↓ | ↑ | ↓ |
| A | 3.650 | 96.684 | 3.441 | 1.237 | 97.586 | 2.603 |
| B | 2.227 | 97.169 | 2.691 | 1.572 | 96.597 | 5.961 |
| C | 0.036 | 98.841 | 2.614 | 0.051 | 97.663 | 2.552 |
Table 8.
Quantitative comparison of three constraint schemes on five evaluation metrics for the real datasets NUAA-SIRST and IRDST-real. The best and second-best results are highlighted in red and green, respectively. Symbols ↑ and ↓ indicate that higher and lower values are better.
| Constraints | | | NUAA-SIRST | | | | | IRDST-Real | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | ↓ | ↑ | ↑ | ↓ | ↑ | ↓ | ↑ | ↑ | ↓ | ↑ |
| ✓ | | | 1.639 | 91.064 | 98.967 | 3.916 | 94.225 | 1.938 | 85.394 | 91.031 | 5.190 | 87.438 |
| ✓ | ✓ | | 0.227 | 93.228 | 96.371 | 2.837 | 93.187 | 4.204 | 88.712 | 95.194 | 3.776 | 90.550 |
| ✓ | ✓ | ✓ | 0.036 | 94.305 | 98.841 | 2.614 | 95.386 | 0.051 | 92.817 | 97.663 | 2.552 | 92.856 |
Table 9.
Impact of different loss balancing factor combinations on self-supervised performance. “Epoch” indicates the iteration at which the model begins to converge. The best and second-best results are highlighted in red and green, respectively. Symbols ↑ and ↓ indicate that higher and lower values are better.
| Scheme | ↓ | ↑ | ↑ | ↓ | Epoch |
|---|---|---|---|---|---|
| (0.1, 0.9) | 0.042 | 92.85 | 96.31 | 3.87 | 36 |
| (0.2, 0.8) | 0.036 | 94.30 | 98.84 | 2.61 | 23 |
| (0.3, 0.7) | 0.039 | 94.67 | 97.92 | 3.04 | 28 |
| (0.4, 0.6) | 0.045 | 93.95 | 95.76 | 3.78 | 35 |
| (0.5, 0.5) | 0.057 | 91.20 | 96.80 | 4.33 | 67 |
| (0.6, 0.4) | 0.051 | 88.37 | 94.20 | 5.12 | 65 |
| (0.7, 0.3) | 0.063 | 89.50 | 92.45 | 7.81 | 86 |
| (0.8, 0.2) | 0.075 | 87.40 | 90.10 | 10.25 | 90 |
| (0.9, 0.1) | 0.088 | 83.92 | 88.20 | 12.64 | 112 |
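Table 9 fixes only the two balancing weights, whose symbols were lost in extraction; the two loss terms themselves belong to the training framework of Figure 4 and are not specified in this section. A minimal sketch of the weighted combination, with hypothetical term names, follows.

```python
import torch

def total_loss(loss_a: torch.Tensor, loss_b: torch.Tensor,
               w_a: float = 0.2, w_b: float = 0.8) -> torch.Tensor:
    """Convex combination of two self-supervised loss terms (w_a + w_b = 1).
    The (0.2, 0.8) default mirrors the best-performing row of Table 9;
    loss_a and loss_b are hypothetical placeholders for the paper's terms."""
    return w_a * loss_a + w_b * loss_b
```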