Article

Adaptive Dual-Frequency Denoising Network-Based Strip Non-Uniformity Correction Method for Uncooled Long Wave Infrared Camera

1 School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
2 Jiangsu Key Laboratory of Visual Sensing & Intelligent Perception, Nanjing University of Science and Technology, Nanjing 210094, China
3 State Key Laboratory of Extreme Environment Optoelectronic Dynamic Measurement Technology and Instrument, Nanjing University of Science and Technology, Nanjing 210094, China
4 Advanced Interdisciplinary Research Center for Optics, Nanjing University of Science and Technology, Nanjing 210094, China
5 National Key Laboratory of Transient Physics, Nanjing University of Science and Technology, Nanjing 210094, China
6 School of Information and Communication Engineering, North University of China, Taiyuan 030051, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Appl. Sci. 2026, 16(2), 1052; https://doi.org/10.3390/app16021052
Submission received: 17 November 2025 / Revised: 2 January 2026 / Accepted: 5 January 2026 / Published: 20 January 2026
(This article belongs to the Section Optics and Lasers)

Abstract

The imaging quality of uncooled long wave infrared (IR) cameras is always limited by stripe non-uniformity, mainly caused by fixed pattern noise (FPN). In this paper, we propose an adaptive dual-frequency denoising network-based stripe non-uniformity correction (NUC) method, namely ADFDNet, to balance FPN removal and image detail preservation. ADFDNet takes the dual-frequency feature decomposition module as its core, which decomposes the IR image into high-frequency and low-frequency features and performs targeted processing through a detail enhancement branch and a sparse denoising branch. The former improves detail preservation through multi-scale convolution and a pixel attention mechanism, while the latter combines a sparse attention mechanism and a dilated convolution design to suppress high-frequency FPN. Furthermore, dynamic weight fusion of features is realized using the adaptive dual-frequency fusion module, which better integrates detail information. In our study, a 420-pair image dataset covering different noise levels is constructed for model training and evaluation. Experiments verify that the presented ADFDNet method significantly improves image clarity in both real and simulated noise scenes, and achieves a better balance between FPN suppression and detail preservation than existing methods.

1. Introduction

Nowadays, long wave infrared (IR) imaging has been extensively applied in both civil and military applications [1], such as security monitoring [2], non-contact temperature measurement [3] and optical guidance [4]. Among all the existing devices, microbolometer focal plane array (FPA)-based uncooled long wave IR cameras have shown their superiority owing to the advantages of high sensitivity, fast response speed, low cost, small size and so on. However, the imaging quality still suffers from the non-uniformity caused by fixed pattern noise (FPN), mainly the stripe noise [5,6]. Therefore, it is of great necessity to develop a high-quality stripe non-uniformity correction (NUC) method for uncooled long wave IR cameras.
As shown in Figure 1a,b, the output of commercial uncooled long wave IR cameras is contaminated by stripe noise (i.e., stripe non-uniformity), which greatly decreases the overall clarity of images and inevitably limits the precision of IR object detection. In essence, the stripe non-uniformity is derived from the response non-uniformity of the microbolometer FPA. As depicted in Figure 1c, the main structure of the FPA includes a detector array, column-parallel blind bolometers, column-parallel accumulators and column-parallel analog-to-digital converters (ADCs) [7]. Since the response characteristics of the blind bolometers and ADCs in different columns cannot be perfectly uniform, vertical stripe noise is created and further amplified by the non-uniformity of the readout circuit. In this paper, IR non-uniformity refers to systematic pixel-to-pixel variations in the electrical response of the microbolometer FPA under a spatially uniform radiance field, caused by gain mismatch, readout-circuit non-uniformity and long-term drift. This fixed-pattern response non-uniformity manifests in the image domain as structured artifacts such as vertical stripes, banding and blotchy patches; IR NUC aims to estimate and compensate these pixel-level errors so that an ideally uniform scene is rendered with approximately uniform brightness. In the following, we focus on stripe-type non-uniformity, in particular the column-wise vertical stripes induced by column FPN in uncooled long wave IR cameras. The proposed ADFDNet is designed as an image-level complement to the hardware NUC chain, modeling and suppressing these stripe non-uniformity components superimposed on the thermal image.
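As a hedged illustration of this column-wise mechanism (not a model taken from the paper), the sketch below superimposes a per-column gain and bias error on a clean frame; the function name `add_column_fpn` and the magnitudes `sigma_gain`/`sigma_bias` are our own illustrative choices:

```python
import numpy as np

def add_column_fpn(img, sigma_gain=0.02, sigma_bias=2.0, seed=0):
    """Superimpose simulated column fixed-pattern noise on a clean image.

    Each column c gets its own multiplicative gain error g_c and additive
    bias b_c, mimicking the column-wise mismatch of blind bolometers,
    accumulators and ADCs in the readout chain.
    """
    rng = np.random.default_rng(seed)
    h, w = img.shape
    gain = 1.0 + sigma_gain * rng.standard_normal(w)  # per-column gain g_c
    bias = sigma_bias * rng.standard_normal(w)        # per-column offset b_c
    return img * gain[None, :] + bias[None, :]

# A flat (uniform) scene turns into vertical stripes: every row is
# identical, but values vary from column to column.
flat = np.full((8, 8), 100.0)
striped = add_column_fpn(flat)
```

Because the errors depend only on the column index, the stripes are purely vertical, which is exactly the structure that column-oriented NUC methods exploit.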
In order to overcome the interference caused by stripe non-uniformity, researchers have undertaken a great deal of work on this topic during the past decades. Generally speaking, the existing NUC methods can be roughly divided into two mainstream types: calibration-based and scene-based methods [8]. To be specific, the core principle of calibration-based NUC methods is to establish a mathematical model and compensate for non-uniformity by measuring the response characteristics of each pixel under different input radiation. The most representative models involve single-point, two-point and multi-point correction models [9]. Although calibration-based methods have relatively high measurement accuracy, they rely heavily on known reference sources and require periodic re-calibration. In addition, the calibration task needs to be implemented in a static and uniform scene, and dynamic targets in the scene may introduce immeasurable calibration errors. On the other hand, scene-based NUC methods aim to dynamically calculate and update the correction parameters during IR image acquisition based on priors from the scene and the stripe non-uniformity itself. Currently, many useful techniques have been exploited for scene-based NUC methods, e.g., constant statistics approaches [10], temporal high-pass approaches [11], and image registration-based approaches [12]. It is noticeable that although such approaches can adapt to environmental changes, especially in complex and dynamic scenes, ghosting artifacts and convergence speed are two thorny problems.
With the rapid development of deep learning techniques, researchers have begun to exploit convolutional neural network (CNN)-based methods to perform stripe NUC for IR images [13,14,15]. Owing to their powerful spatial feature extraction and non-linear fitting abilities, deep learning-based NUC methods have shown promising performance in stripe noise removal. It should be noted that these methods essentially belong to the scene-based methods discussed above: they do not depend on offline calibration and can complete the NUC using the scene information from only a single frame. However, deep learning-based NUC methods still face several challenges in practical applications. On the one hand, existing CNN models struggle to achieve both noise suppression and detail preservation. That is to say, scene details, such as edges and contours, may be blurred, which leads to serious image quality degradation after NUC. More seriously, long-distance dim and small targets may be lost entirely, which is disastrous for IR object detection and tracking tasks. On the other hand, real datasets that contain both real noisy images and clean counterparts are difficult to obtain. Existing studies usually use different FPN simulation models [14] to add column stripe noise to real-captured IR images as the noisy images to train and test their CNN models. Despite the satisfactory performance on simulated test images, generalization cannot be guaranteed when these models are applied to real-captured IR images with real FPN. For standard visible and near-IR digital cameras, Bernacki et al. [16] and Volkov et al. [17] systematically reviewed deep learning-based methods for FPN modeling, sensor noise removal and source camera identification.
Different from these stripe-oriented and general denoising frameworks in the visible domain, our work specifically targets column-wise stripe FPN in uncooled long wave IR microbolometer FPAs, and designs a dual-frequency CNN to balance stripe suppression and IR detail preservation under low-SNR thermal imaging conditions.
To solve the above-discussed problems of stripe NUC for uncooled long wave IR cameras, this paper proposes an adaptive dual-frequency denoising network, namely ADFDNet, for effective FPN removal and image detail preservation. First, the network separates high-frequency noise and low-frequency structural information through a dual-frequency feature decomposition module. Then, the presented detail enhancement branch strengthens low-frequency features, the sparse denoising branch suppresses high-frequency noise, and the adaptive dual-frequency fusion module dynamically adjusts weights to optimize feature fusion. In addition, a loss function combining structural similarity and mean square error is developed to further balance global consistency and pixel-level accuracy. To better train and evaluate the presented network, we also construct a diverse dataset, including 105 pairs of noise-free IR images and 315 pairs of simulated noisy images, covering three different noise intensities.
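A loss combining structural similarity and mean square error can be sketched as below. This is a minimal, assumption-laden illustration: the weight `alpha` is a hypothetical hyperparameter, and SSIM is computed globally over the whole image rather than with the sliding-window formulation used in the standard SSIM definition:

```python
import numpy as np

def global_ssim(x, y, c1=1e-4, c2=9e-4):
    """Simplified single-window SSIM computed over the whole image."""
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

def combined_loss(pred, target, alpha=0.5):
    """alpha * MSE + (1 - alpha) * (1 - SSIM):
    MSE enforces pixel-level accuracy, the SSIM term global consistency."""
    mse = ((pred - target) ** 2).mean()
    return alpha * mse + (1.0 - alpha) * (1.0 - global_ssim(pred, target))
```

For identical images the loss vanishes (MSE = 0, SSIM = 1), while structural distortions are penalized even when their pixel-wise error is small.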
The main contributions of our study can be summarized as follows:
1.
We propose a hierarchical processing architecture for high and low frequency image information. The dual-frequency feature decomposition (DFD) module decomposes the input noisy image into high-frequency and low-frequency components, which are used to suppress noise and preserve background structure, respectively.
2.
A detail enhancement branch (DEB) module is especially designed to process low-frequency feature submaps, which consists of multi-scale convolutional layers, an enhanced residual convolutional module and a pixel attention (PA) module. This module is designed to enhance the low-frequency information of the image and reconstruct details.
3.
A sparse noise reduction branch (SNRB) module is exploited to process high-frequency feature sub-images. This module consists of a sparse attention (SA) module based on a sparse mechanism, primarily used for enhancing high-frequency information and removing noise.
4.
Both qualitative and quantitative experimental results demonstrate that our proposed ADFDNet significantly outperforms existing stripe NUC methods on both real and simulated datasets, with stronger denoising ability, better detail preservation and stronger generalization ability.

2. Related Work

During the past decades, researchers have developed plenty of NUC methods to deal with the problem of IR image stripe noise suppression. In general, the existing stripe NUC methods can be classified into calibration-based and scene-based methods. In addition, deep learning-based methods, which essentially belong to scene-based methods, have emerged rapidly in recent years.

2.1. Calibration-Based Methods

Calibration-based methods use a blackbody to calibrate the output response of the detector offline under uniform radiation, and compute the gain and bias parameters using a linear model [18]. Among these methods, Helfrich et al. [19] proposed a real-time compensation method that uses a microprocessor to calibrate against two blackbody sources with different temperatures and to calculate the responsivity and bias, respectively. By this means, the output signal can be corrected in real time, thus eliminating the FPN effectively. To address the insufficient accuracy of the calibration coefficients and the large errors of the two-point temperature calibration method, Hu et al. [20] proposed a two-point temperature calibration scheme with secondary parameter compensation, in which the initial calibration parameters are compensated via the real-time signal dynamic range so that the parameters move closer to their theoretical values. In addition, Li et al. [21] proposed a polynomial fitting non-uniformity correction algorithm based on the piecewise linear correction algorithm, which improves the performance of the NUC algorithm. In view of the limitations of the univariate linear theoretical model, Qu et al. [22] established for the first time a binary nonlinear non-uniformity theoretical model for IR focal plane arrays, which greatly increases the prediction accuracy of the IR focal plane array response curve and its non-uniformity. Overall, calibration-based NUC methods have the advantages of being simple, efficient and easy to implement in hardware, but they cannot completely avoid the problem of temporal drift, i.e., periodic recalibration is necessary. For example, a recent multi-segment second-order calibration-based NUC scheme implemented on FPGA by Yang et al. [23] achieves high correction accuracy but still relies on a blackbody reference and large segment-wise parameter storage, whereas the ADFDNet proposed in this paper dispenses with external calibration and, through a data-driven dual-frequency deep network, adaptively suppresses stripe FPN while better preserving fine structural details, thus providing a complementary image-level solution to these hardware-oriented methods.
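The two-point correction model discussed above admits a compact sketch. Assuming a linear per-pixel response, two flat-field (blackbody) exposures suffice to solve for a per-pixel gain and offset that map every pixel onto the frame mean; the helper names below are our own:

```python
import numpy as np

def two_point_nuc(x_low, x_high):
    """Per-pixel gain/offset from two flat-field (blackbody) exposures.

    Chooses gain and offset so that both calibration frames become
    spatially uniform, equal to their respective frame means.
    """
    y_low, y_high = x_low.mean(), x_high.mean()   # target (uniform) levels
    gain = (y_high - y_low) / (x_high - x_low)    # per-pixel gain
    offset = y_low - gain * x_low                 # per-pixel offset
    return gain, offset

def apply_nuc(x, gain, offset):
    return gain * x + offset

# Simulated FPA with per-pixel gain/offset errors, observed at two
# blackbody levels (50 and 200 digital counts).
rng = np.random.default_rng(1)
g_true = 1.0 + 0.05 * rng.standard_normal((4, 4))
b_true = 3.0 * rng.standard_normal((4, 4))
x_low = g_true * 50.0 + b_true
x_high = g_true * 200.0 + b_true
gain, offset = two_point_nuc(x_low, x_high)

# Any intermediate flat scene is rendered uniform after correction.
corrected = apply_nuc(g_true * 125.0 + b_true, gain, offset)
```

The same sketch also exposes the method's weakness noted in the text: `gain` and `offset` are frozen at calibration time, so any temporal drift of `g_true`/`b_true` reintroduces FPN until recalibration.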

2.2. Scene-Based Methods

Scene-based methods do not need an external reference source, i.e., a blackbody source, during processing. Instead, they correct the non-uniformity of the detector array by analyzing the information of the IR scene itself. Compared with traditional calibration-based methods, scene-based NUC methods can perform real-time dynamic correction and can adapt to more complex application scenarios. Scribner et al. [24] first proposed the temporal high-pass filter method, which assumes that, along the time axis, the high-frequency information belongs to the scene details while the low-frequency information belongs to the FPN; therefore, FPN can be estimated by low-pass filtering the image sequence along the time axis. Narendra [25] first presented the constant statistics-based NUC method, based on the assumption that all IR detectors receive radiation with the same statistical characteristics over a long enough period in a dynamically changing scene; these statistics can then serve as a unified reference to calculate and compensate for the offset and gain differences of each detector. Harris et al. [26] detected the magnitude of pixel changes and selectively updated the statistics, adjusting the correction parameters only when significant changes exist, which effectively reduces ghosting artifacts and improves the accuracy and robustness of NUC. In order to better capture stripe FPN and accelerate the convergence of the optimization process, Kim et al. [27] combined a weighted detection strategy with the acceleration strategy of the alternating direction method of multipliers (ADMM) to propose an ADMM-based optimization model, called ADOM, for stripe noise removal. Cao et al. [28] used a local linear model to describe the relationship between column FPN and thermal radiation, applying a one-dimensional row-guided filter to smooth the image edges in the horizontal direction and a one-dimensional column-guided filter to separate column FPN from other high-frequency signals. Although this method achieves a good balance between removing noise and preserving details, it still mistakenly removes some vertical structures. In general, scene-based NUC methods are sensitive to motion and are easily affected by extreme scenes [9,11], and it is also difficult to ensure both the convergence speed and the stability of the algorithm. Lin et al. [29] proposed a scene-based NUC method for IR motion-blurred images, in which non-uniformity is estimated from motion blur and then compensated; although this approach can effectively suppress column-wise stripe noise, it depends strongly on sufficient scene motion, and its ability to handle more complex spatially distributed non-uniformity is limited. In addition, Liu et al. [30] combined a deep image prior with step-variable total variation regularization to perform reference-free spatial noise and FPN removal for thermal images, providing an unsupervised alternative to calibration-based and conventional scene-based NUC schemes.
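The constant-statistics idea can be caricatured in a single frame: if every column is assumed to see radiation with identical statistics, forcing all columns to share the global mean and standard deviation removes column-wise gain and bias stripes. This is only an illustrative sketch (real constant-statistics methods accumulate the statistics over long image sequences, not one frame), and it exhibits exactly the failure mode noted above, erasing genuine vertical structure:

```python
import numpy as np

def column_stats_destripe(img):
    """Single-frame caricature of constant-statistics destriping:
    normalize each column to the global mean and standard deviation."""
    col_mean = img.mean(axis=0, keepdims=True)
    col_std = img.std(axis=0, keepdims=True) + 1e-12
    return (img - col_mean) / col_std * img.std() + img.mean()

# Scene plus a smooth per-column bias ramp (simulated column FPN).
rng = np.random.default_rng(2)
scene = rng.random((32, 16))
striped = scene + np.linspace(0.0, 5.0, 16)[None, :]
clean = column_stats_destripe(striped)
```

After normalization every column has the same mean and spread, so the bias ramp disappears; but any real scene gradient aligned with the columns would be flattened as well.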

2.3. Deep Learning-Based Methods

In recent years, deep learning-based methods have developed rapidly owing to the strong non-linear fitting ability of convolutional neural networks (CNNs). Typically, Kuang et al. [31] designed a stripe noise removal CNN (SNRCNN), which achieved satisfactory performance on natural images with simulated stripe noise. However, SNRCNN is trained only on visible light images, without considering the natural differences between IR and visible light images, and its performance drops significantly when applied to real captured IR images. He et al. [7] proposed a deep learning-based stripe NUC method (DLSNUC) that uses a residual network architecture with a large receptive field to obtain better processing results, but it reduces image contrast to a certain degree. In addition, Kuang et al. [32] proposed a data-driven learning-based stripe removal method (DDL-SR) that uses a UNet-like network to learn the regularity of complex stripe noise features. Guan et al. [33] used a stripe noise removal wavelet deep neural network (SNRDWNN) to suppress FPN by taking advantage of the special characteristics of stripe noise and the complementary information between different wavelet sub-band coefficients. However, this method is computationally expensive and suffers serious detail loss at strong noise levels. Chang et al. [34] designed a two-stream CNN that can simultaneously model stripes and images, and embedded wavelets into this CNN model to better learn the internal directional characteristics of stripes. In order to solve the problems of image detail loss and edge blur in existing approaches, Li et al. [35] proposed a long-short connection (LSC-CNN) structure-based IR image NUC method, where the network depth is increased to fully learn the noise via short connections, and the long connection is used to decrease the image information loss caused by transposed convolution.
Specifically, in this work we focus on a self-developed medium-resolution uncooled long wave IR focal-plane-array camera (original sensor resolution 640 × 512, with training and testing performed at a working resolution of 480 × 480), which is complementary in sensor structure and resolution scale to the ultra-high-resolution IR line-scan systems reported in recent studies [36]. However, since most current deep learning-based strip NUC methods are trained mainly on simulated stripe datasets, their generalization ability to real-captured IR images with complex superimposed noise still needs to be further verified in practical applications.

3. Method

3.1. Overview

The overall structure of our proposed ADFDNet is shown in Figure 2. The input IR noisy image is first processed by a DFD module for frequency domain decomposition. Then, two sub-branches are generated, and each branch processes information with different frequencies. The above processed information is further fused together through an adaptive dual-frequency fusion (ADF) module, which ultimately outputs the final denoised image. To be specific, the DFD module decomposes the input noisy image into high-frequency and low-frequency components. The former component mainly contains noise and edge details while the latter component primarily represents the background information and global structure. These high-frequency and low-frequency feature maps are then fed into two separate branches for subsequent processing. The DEB processes the low-frequency subsidiary feature map, which consists of multi-scale convolutional layers, an enhanced residual convolution module and a PA module. Its goal is to enhance the low-frequency information and reconstruct details. The low-frequency information helps remove background noise and reconstruct the image structural features. The SNRB processes the high-frequency subsidiary feature map, which is composed of an SA module. It is primarily used for enhancing high-frequency information and noise removal. The SA mechanism allows the network to focus more on the noisy regions so as to improve the performance of denoising. The ADF module fuses the processed feature maps from the two branches through learnable adaptive weights in order to avoid the challenge of manually adjusting fusion parameters. The adaptive weight mechanism makes the fusion process more flexible and precise. By this means, the low-frequency and high-frequency information can be effectively combined to produce the final corrected IR image. 
Note that ADFDNet is specifically designed as a two-dimensional IR image NUC framework, in which all feature modeling and convolution operations are performed in the spatial domain; extending this architecture to one-dimensional, low-dimensional signals would require substantial redesign of both the network structure and the loss function and is therefore beyond the scope of this work.

3.2. DFD Module

Due to their inherent imaging characteristics, IR images often exhibit a low signal-to-clutter ratio (SCR), which makes it difficult to balance noise removal and detail preservation. During denoising, aggressive noise removal typically results in the loss of detailed information, whereas preserving details often leaves residual noise. This trade-off renders single-objective noise removal methods inadequate for achieving effective denoising and detail preservation in IR image processing at the same time.
The digital detail enhancement (DDE) method, first proposed by FLIR, employs an image layer-based strategy. Its core idea is to decompose the image into a background layer and a detail layer: the background layer typically contains high-contrast, obvious grayscale variations, while the detail layer contains low-contrast, subtle grayscale changes. By processing the background and detail layers separately, the DDE method improves image quality without losing critical details. Inspired by the concept of DDE, this paper proposes a dual-frequency network based on hierarchical processing, which aims at processing different frequency characteristics within images to achieve noise removal and detail preservation. To achieve image feature separation, a dual-frequency decomposition (DFD) module is designed, as illustrated by the blue dashed box in Figure 2. The specific operations of the DFD module are as follows: (1) Gaussian filtering performs low-pass processing on the input image, which generates a low-frequency feature map. (2) The high-frequency feature map is calculated as the difference between the original image and the low-frequency feature map. (3) The low-frequency and high-frequency feature maps are separately concatenated with the original image, which produces feature map X containing low-frequency information and feature map Y containing high-frequency information. These two feature maps serve as inputs to the subsequent network modules, which provides support for efficient image denoising and detail enhancement. This process is defined in Equation (1) as follows:
$$X = \mathrm{concat}\left(G(F_{noi}),\ F_{noi}\right), \qquad Y = \mathrm{concat}\left(F_{noi} - G(F_{noi}),\ F_{noi}\right) \tag{1}$$

where $G(\cdot)$ denotes a Gaussian filter, $F_{noi}$ is the input noisy image, and $\mathrm{concat}(\cdot)$ denotes channel-wise concatenation. In this work, $G(\cdot)$ is implemented as a fixed Gaussian low-pass filter. This simple and stable operator has a clear physical interpretation and is sufficient to separate the slowly varying background from detail-plus-noise, while avoiding the extra learnable parameters and computational burden introduced by more complex filters; the kernel size and standard deviation are empirically selected on a validation set and kept fixed in all experiments.
The fixed Gaussian filtering used in the DFD module is primarily intended for coarse-grained, lightweight pre-decomposition. The actual adaptive content modeling is performed by the subsequent learnable branches, the DEB and SNRB, which work together to achieve more precise, context-aware processing. The DFD module separates low- and high-frequency components using Gaussian filtering, extracting a smooth low-frequency image and a complementary high-frequency image. These components are then concatenated with the original noisy image and fed into the network, ensuring that structural information is not irreversibly discarded during the decomposition stage. Since both branches retain access to the full original image, the network can recover any information misallocated by the fixed decomposition.
We emphasize that the choice of fixed Gaussian smoothing is driven by the need for stability, generalization and computational efficiency in IR camera deployment. It effectively separates large-scale biases from local contrast variations without adding excessive learnable parameters, thus reducing overfitting, especially with limited data. Preliminary experiments showed that fully learnable filtering increased instability without improving peak signal-to-noise ratio (PSNR) or structural similarity index measure (SSIM).
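A minimal NumPy sketch of the decomposition in Equation (1) is given below, assuming (our choice, not the paper's) a separable Gaussian kernel with edge-replicated padding and channel stacking in place of `concat`:

```python
import numpy as np

def gaussian_kernel1d(sigma=1.5, radius=4):
    """Normalized 1-D Gaussian kernel."""
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def gaussian_blur(img, sigma=1.5, radius=4):
    """Separable Gaussian low-pass filter G(.) with edge replication."""
    k = gaussian_kernel1d(sigma, radius)
    pad = np.pad(img, radius, mode='edge')
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode='valid'), 1, pad)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode='valid'), 0, rows)

def dfd(noisy, sigma=1.5):
    """Dual-frequency decomposition: X carries the low-frequency content,
    Y the high-frequency residual; both keep the original image attached."""
    low = gaussian_blur(noisy, sigma)
    high = noisy - low
    X = np.stack([low, noisy])    # input to the low-frequency branch (DEB)
    Y = np.stack([high, noisy])   # input to the high-frequency branch (SNRB)
    return X, Y
```

By construction the two components sum back to the input (`low + high == noisy`), so the decomposition itself discards nothing; what to keep and what to suppress is left to the learnable branches.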

3.3. DEB Module

IR low-frequency feature maps contain important image information, including the global background of the scene and structural details in low-contrast regions, rather than non-uniform noise. These low-frequency features often determine the overall brightness distribution and global texture of the image. However, the detail information is relatively sparse, which makes it susceptible to noise interference and can result in the loss of key features. Therefore, effectively mining potential detailed features from low-frequency information and enhancing detail representation is a critical challenge in improving the quality of denoised images. The key idea in processing low-frequency information is to enhance detail representation while preventing the addition of extra noise. To this end, this study designs a DEB module that progressively strengthens the detailed features within the low-frequency information; it consists of a multi-scale feature extraction layer, a PA module and an enhanced residual convolution module.
Firstly, the low-frequency feature map $X$ undergoes a convolution operation to expand its feature channels, and the result is fed into the multi-scale feature extraction layer. This layer is composed of three convolution blocks with kernel sizes of $3\times 3$, $5\times 5$ and $7\times 7$, respectively. By extracting multi-scale features from the low-frequency information under different receptive fields, it can capture hidden details in low-frequency regions. The extracted multi-scale feature maps are then concatenated for further enhancement processing. The formulas are defined in Equation (2) as follows:

$$X' = f(X), \qquad X_{multi} = \mathrm{concat}\left(f_3(X'),\ f_5(X'),\ f_7(X')\right) \tag{2}$$

where $f(\cdot)$ denotes a standard convolution; $f_3(\cdot)$, $f_5(\cdot)$ and $f_7(\cdot)$ denote convolutions with kernel sizes $3\times 3$, $5\times 5$ and $7\times 7$, respectively; $X'$ denotes the low-frequency map after channel expansion via convolution; and $X_{multi}$ denotes the multi-scale feature map.
Next, the weights of all pixels are adaptively adjusted through the PA module, which guides the network to focus on the important details within the low-frequency regions. This process further improves the network’s capability to enhance details. The structure of the PA module is illustrated in Figure 3a. Concretely, the PA module combines global average pooling and global max pooling along the channel dimension to capture both background statistics and locally salient responses, and then uses a 1 × 1 convolution followed by a sigmoid activation to generate a pixel-wise attention map.
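The PA computation just described can be sketched as follows. In the real network the 1×1 convolution weights are learned; here, purely for illustration, we replace that convolution on the 2-channel stack with a fixed weighted sum (`w`, `b` are hypothetical values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pixel_attention(feat, w=(0.6, 0.4), b=0.0):
    """Pixel attention sketch for a (C, H, W) feature map.

    Channel-wise average and max pooling give two (H, W) statistics maps;
    a 1x1 convolution over this 2-channel stack (here a fixed weighted sum
    with weights w and bias b) followed by a sigmoid yields a per-pixel
    attention map that rescales every channel of the input.
    """
    avg_map = feat.mean(axis=0)   # background statistics, shape (H, W)
    max_map = feat.max(axis=0)    # locally salient responses, shape (H, W)
    att = sigmoid(w[0] * avg_map + w[1] * max_map + b)
    return feat * att[None, :, :], att

rng = np.random.default_rng(0)
feat = rng.random((8, 6, 6))
out, att = pixel_attention(feat)
```

Because the sigmoid bounds the attention map in (0, 1), the module can only down-weight pixels relative to one another, steering the branch toward detail-bearing regions without amplifying noise.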
After that, the multi-scale feature map $X_{multi}$ enters $n$ enhanced residual convolutional modules. These modules strengthen feature extraction through residual connections while effectively preventing network performance degradation, which ensures that the overall structural characteristics of the low-frequency information are preserved while detail representation is enhanced. The enhanced residual convolution modules used in this paper differ from traditional ones in that the batch normalization (BN) layer and the ReLU activation layer after the shortcut connection are removed. According to findings in [37], the BN layer normalizes features, which introduces unnecessary blurring effects in images and limits the network's adaptability; moreover, removing the ReLU layer after the shortcut connection helps accelerate convergence during training. Figure 3b,c illustrates the structural comparison between the traditional residual module and the enhanced residual convolution module adopted in this paper.
Finally, the feature maps processed by the DEB module are concatenated and fused with the original low-frequency maps to obtain the final feature map $F_l$. This process is expressed in Equation (3) as follows:

$$F_l = f\left(\mathrm{concat}(X',\ X_{enh})\right) \tag{3}$$

where $X'$ denotes the low-frequency feature map after channel expansion via convolution, and $X_{enh}$ denotes the output feature map of the enhanced residual convolution module.

3.4. SNRB Module

IR high-frequency feature maps primarily include edges, textures and other fine-grained image details, as well as noise, because noise in high-frequency components typically manifests as subtle pixel fluctuations or outliers. These components not only contain valuable target details but also suffer severe distortion of their authentic representation due to noise interference. Therefore, denoising is the key task in processing high-frequency feature maps. To address the above issues, the primary goals for handling high-frequency information are to effectively suppress noise while preserving subtle image details. For this purpose, this paper proposes an SNRB module, which attenuates noise components while enhancing genuine details. The sparse regularization mechanism [38] has been extensively utilized in image processing tasks. Notably, Tian et al. [37] integrated it into a convolutional neural network to prove its effectiveness in denoising tasks. Inspired by this, we introduce an SA module within the SNRB framework, which effectively separates noise from meaningful high-frequency information to enhance denoising performance.
Specifically, the proposed sparse module differs from common sparse modules: the SA module is designed based on dilated and regular convolutions from convolutional neural networks, combined with attention mechanisms. The SA module, with twelve layers, is composed of two layer types: dilated attention convolutional layers and regular convolutional layers. A regular convolutional layer is the combination of normal convolution, batch normalization and ReLU activation. In contrast, the dilated attention convolutional layers incorporate a channel attention mechanism and dilated convolution with an expansion factor of 2, which are further integrated with BN and ReLU activation. Notably, the squeeze and excitation (SE) module serves as the channel attention mechanism, which enhances denoising performance through global information compression and channel-wise weighting. This dual mechanism effectively suppresses noise components while enhancing critical image features, achieving efficient separation between noise and meaningful high-frequency information. The structure of the dilated attention layer is illustrated in Figure 4.
In this work, “sparse attention” does not specifically refer to the sparse connection pattern of dilated convolutions. While dilated convolutions do create a “hole-like” sparse sampling structure at certain positions, the focus of “sparse attention” is not on the sparsity of the convolutional kernel connections but on the strategic distribution of attention across key layers in the network. Specifically, the SNRB consists of 12 convolutional layers, with only a few layers (such as layers 2, 5, 9 and 12) incorporating SA modules. These SA modules are not standard convolutional layers but rather a combination of dilated convolutions and SE. The dilated convolutions capture long-range dependencies and column-oriented stripe patterns, while the SE channel attention adaptively adjusts channel weights, emphasizing responses relevant to stripe noise and suppressing irrelevant or redundant features.
The design intentionally places these “enhanced” attention modules in only a few key layers, rather than stacking them across all layers. This strategy minimizes the computational cost of the attention mechanism while ensuring effective modeling of critical tasks such as stripe noise removal. This is what we refer to as the “sparseness” of “sparse attention”: it emphasizes the selective, strategic deployment of attention modules within the network, rather than focusing on numerical sparsity or nullifying convolutional kernel parameters.
In SNRB, the dilated attention layers are positioned at layers 2, 5, 9 and 12. These layers capture more contextual dependencies through dilated convolution [39] and are regarded as high-energy nodes. Normal convolution layers occupy layers 1, 3, 4, 6, 7, 8, 10 and 11 and are classified as low-energy nodes. The sparse property is realized by combining a few high-energy nodes with many low-energy nodes. In addition, sparse modules utilize fewer high-energy nodes, rather than a large number of them, to capture more useful information. This approach not only enhances denoising performance and training efficiency but also reduces network complexity. Finally, the feature maps processed by SNRB are concatenated and fused with the original high-frequency feature maps to obtain the final feature map $F_h$. This process is described in Equation (4) as follows:
$F_h = f_{\mathrm{concat}}(Y, Y_{sa})$
where $Y$ denotes the high-frequency feature map after channel expansion via convolution, and $Y_{sa}$ denotes the output of the SA module. In practice, the dilation rates, the number and positions of the dilated attention layers, and other structural hyper-parameters of DEB and SNRB are empirically selected based on preliminary experiments and kept fixed for all experiments.
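The SE channel attention used inside the dilated attention layers can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the weight matrices `w1` and `w2` are hypothetical stand-ins for the two learned fully connected excitation layers, and the reduction ratio is assumed.

```python
import numpy as np

def se_channel_attention(feat, w1, w2):
    """Squeeze-and-excitation sketch over a feature map of shape (C, H, W).
    w1: (C // r, C) and w2: (C, C // r) play the role of the two learned
    fully connected layers (r is the channel reduction ratio)."""
    # Squeeze: global average pooling over the spatial dimensions -> (C,)
    z = feat.mean(axis=(1, 2))
    # Excitation: bottleneck FC -> ReLU -> FC -> sigmoid gate per channel
    s = np.maximum(w1 @ z, 0.0)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ s)))
    # Re-weight each channel; channels irrelevant to the stripes get small gates
    return feat * gate[:, None, None]
```

Because the gates lie in (0, 1), the module can only scale channels down or leave them nearly unchanged, which is how it suppresses noise-dominated responses without altering the spatial layout of the features.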

3.5. ADF Module

In IR image processing, the fusion of background and detail layers is a critical step. However, this fusion typically faces two major issues: (1) the setting of the fusion weight parameters significantly affects the final result; (2) a single fixed coefficient has significant limitations, since it cannot adapt to the varying distribution of background and detail information across images, which can unbalance the weights during fusion and thereby degrade denoising performance or detail preservation. To address these challenges, this paper proposes the ADF module, which introduces an adaptive weight mechanism that adjusts the fusion weights according to the distribution of the input high-frequency and low-frequency features, achieving more precise fusion. The ADF module is shown in Figure 5.
To be specific, the module first concatenates the high-frequency feature map F h with the low-frequency feature map F to obtain a fused feature map. Subsequently, a convolutional layer is utilized to generate an adaptive weight map. This map passes through a sigmoid activation function to scale its values between 0 and 1, which represents the relative importance of high-frequency and low-frequency features at each pixel location. The formula for calculating the adaptive weights is given in Equation (5) as follows:
$F_{weight} = \sigma\left(f_1\left(\mathrm{concat}(F_h, F)\right)\right)$
where $F_{weight}$ is the generated adaptive fusion weight map, $\sigma(\cdot)$ denotes the sigmoid activation and $f_1(\cdot)$ denotes a 1 × 1 convolution.
Figure 6. Weighted maps generated by the ADF module. (a) Noisy image. (b) Denoised image. (c) Weighted map.
Several weight maps generated by the ADF module are shown in Figure 6. Through this weight map, the ADF module performs a weighted fusion of the high-frequency and low-frequency features; the detailed computation formula is defined in Equation (6) as follows:
$F_{fusion} = F_{weight} \otimes F + (1 - F_{weight}) \otimes F_h$
where ⊗ denotes element-wise multiplication. The fused feature map passes through a convolutional layer and a batch normalization layer to further extract both global and local information from the fused features. In addition, the channel number of the feature maps is adjusted through a 3 × 3 convolutional layer, which ensures that the output features have a uniform dimension. This yields the final clean image. This process is expressed in Equation (7) as follows:
$F_{out} = f_3\left(f(F_{fusion})\right)$
where $f(\cdot)$ denotes the convolution and batch normalization operations and $f_3(\cdot)$ denotes the 3 × 3 convolution.
Compared with traditional fusion methods that use fixed weights, this module significantly enhances the capability to dynamically balance background and detail features through an adaptive weight generation mechanism, which achieves a better balance between denoising and detail preservation. This design offers feature representation with higher quality for subsequent tasks, which effectively enhances the robustness and generalization performance of the network.
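Equations (5) and (6) amount to a per-pixel convex combination of the two branches. A minimal NumPy sketch follows; the 1 × 1 convolution is reduced to a per-pixel linear map with a hypothetical weight matrix `conv1x1_w`, and the bias and the trailing conv-BN stage of Equation (7) are omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adf_fuse(F_h, F_l, conv1x1_w):
    """Adaptive dual-frequency fusion sketch of Eqs. (5)-(6).
    F_h, F_l: (C, H, W) high-/low-frequency feature maps.
    conv1x1_w: (1, 2C) weights of the illustrative 1x1 conv mapping the
    concatenated features to a single-channel weight map."""
    cat = np.concatenate([F_h, F_l], axis=0)                 # (2C, H, W)
    # A 1x1 convolution is a per-pixel linear map over channels
    w_map = sigmoid(np.tensordot(conv1x1_w, cat, axes=([1], [0])))[0]  # (H, W)
    # Weighted fusion: w * low-frequency + (1 - w) * high-frequency
    return w_map * F_l + (1.0 - w_map) * F_h
```

Since the sigmoid keeps the weight map in (0, 1), every fused value is a convex combination of the two branches at that pixel, which is what prevents either branch from being overwritten outright.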

3.6. Loss Function

To optimize network performance and achieve more efficient noise removal and detail preservation, this paper proposes a weighted combination loss function. This loss function integrates SSIM loss and mean squared error (MSE) loss, which adjusts the relative importance of the two components through weights. It better balances the preservation of global structural information and the minimization of pixel-level errors.
(1)
MSE loss: It is a commonly used metric to measure the point-by-point difference between predicted values and target values. It directly reflects the degree of deviation of the predicted results from the targets by calculating the squared error between them and taking the average, which is defined in Equation (8) as follows:
$L_{MSE} = \frac{1}{N}\sum_{i=1}^{N} \left\| I_{pred}(i) - I_{gt}(i) \right\|_2^2$
where $N$ is the total number of pixels in the image, $I_{pred}(i)$ is the predicted value of the $i$-th pixel and $I_{gt}(i)$ is the target value of the $i$-th pixel.
(2)
SSIM loss: The SSIM loss quantifies similarities between two images by evaluating structural integrity, luminance consistency and contrast alignment, which makes it particularly effective for assessing perceptual quality. Unlike MSE loss, SSIM loss focuses on the global structural image features rather than pixel-level intensity discrepancies. The SSIM loss function is derived from the SSIM metric, which is defined as the complement of SSIM, defined in Equation (9) as follows:
$L_{SSIM} = 1 - SSIM\left(I_{pred}, I_{gt}\right).$
(3)
Total weighting combined loss function: This weighted combination integrates MSE and SSIM losses, which sufficiently considers the characteristics of non-uniform correction tasks. Namely, it requires a balance between global structural consistency and local detail restoration. By introducing SSIM loss, the network can better focus on the perceptual quality of the image, while the MSE loss provides a rigorous numerical optimization. This approach enhances the model’s denoising performance and detail preservation capabilities. The definition is given in Equation (10) as follows:
$L_{total} = \alpha L_{SSIM} + \gamma L_{MSE}$
where $\alpha$ controls the contribution of the SSIM loss to the total loss; a larger $\alpha$ makes the network focus on preserving global structural information. $\gamma$ controls the weight of the MSE loss $L_{MSE}$; a larger $\gamma$ makes the network focus on pixel-level accuracy. Before computing both loss terms, all IR images are linearly normalized to the range [0, 1], so that the numerical scales of the MSE and SSIM losses are comparable and the linear weighting in Equation (10) behaves stably. In all experiments, the weighting coefficients are empirically chosen and fixed as $\alpha = 1$ and $\gamma = 1.5$, providing a simple trade-off between pixel-wise fidelity and structural preservation.
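As a sketch, the combined loss of Equation (10) can be written as follows. Note that `global_ssim` here uses a simplified single-window SSIM computed from global image statistics rather than the usual sliding Gaussian window, so its values differ slightly from a full SSIM implementation.

```python
import numpy as np

def global_ssim(x, y, c1=0.01**2, c2=0.03**2):
    """Simplified single-window SSIM over whole images in [0, 1]."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx**2 + my**2 + c1) * (vx + vy + c2))

def combined_loss(pred, gt, alpha=1.0, gamma=1.5):
    """L_total = alpha * (1 - SSIM) + gamma * MSE, as in Equation (10)."""
    l_mse = np.mean((pred - gt) ** 2)
    l_ssim = 1.0 - global_ssim(pred, gt)
    return alpha * l_ssim + gamma * l_mse
```

With images normalized to [0, 1], both terms stay on comparable scales, so the fixed weighting $\alpha = 1$, $\gamma = 1.5$ behaves stably.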

4. Experimental Results

4.1. Experimental Setup

4.1.1. Dataset

To train the IR non-uniformity correction method and evaluate its performance, a commercial uncooled long wave thermal IR camera is used to capture image sequences of 105 different scenes. The camera is fixed on a gimbal and collects data by panning slowly and uniformly, capturing 500 continuous frames per scene. These multi-frame sequences are then processed with a time-domain multi-frame denoising algorithm to generate 105 pairs of real noisy and high-quality noise-free IR images with a resolution of 480 × 480. The specific steps are as follows:
(1)
Data preparation: Continuous frame images of the same scene are collected. During the shooting process, the camera is panned as slowly as possible to maximize the consistency between frames, thereby providing higher accuracy for the subsequent alignment and denoising steps.
(2)
Frame alignment: To ensure the effectiveness of time-domain denoising, the multi-frame images are aligned. First, a frame is randomly selected from the first 50 still images as a reference frame. The phase correlation method is then used to calculate the displacement vector of each frame relative to the reference frame. Phase correlation, based on frequency domain transformation, can quickly and accurately estimate the displacement between images. Based on the calculated displacement vector, interpolation or image transformation techniques (such as translation, rotation and scaling) are used to align all frames with the reference frame. This alignment process may result in blank areas at the edges, which are addressed using interpolation or appropriate padding strategies to prevent impact on subsequent steps.
(3)
Temporal denoising: After frame alignment, a temporal multi-frame filtering algorithm is used to denoise the image sequence. The specific calculation is shown in Equation (11):
$R(n) = \alpha R(n-1) + (1 - \alpha) I(n)$
where $R(n)$ represents the output of frame $n$, i.e., the pixel values of the denoised image; $R(n-1)$ represents the output of frame $n-1$, i.e., the previous denoising result; and $I(n)$ represents the input of frame $n$, i.e., the pixel values of the original image. $\alpha$ is a weighting factor ranging from 0 to 1: the closer $\alpha$ is to 1, the more the filter relies on historical data; the closer it is to 0, the more it relies on the current input. This algorithm uses an exponentially weighted moving average to smooth the image, effectively reducing noise while preserving image detail. In experiments, the value of $\alpha$ was tuned to achieve the optimal balance between denoising and detail preservation.
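A minimal sketch of the recursive filter in Equation (11) over an aligned frame stack (the default `alpha=0.9` is an illustrative choice; the paper tunes this value experimentally):

```python
import numpy as np

def temporal_denoise(frames, alpha=0.9):
    """Exponentially weighted moving average of Equation (11).
    frames: aligned stack of shape (N, H, W); alpha in (0, 1) weights history."""
    r = frames[0].astype(np.float64)          # R(0) initialized from the first frame
    for frame in frames[1:]:
        r = alpha * r + (1.0 - alpha) * frame  # R(n) = a*R(n-1) + (1-a)*I(n)
    return r
```

Over a long stack of aligned frames, the steady-state noise variance of this filter is roughly $(1-\alpha)/(1+\alpha)$ times that of a single frame, which is why a slowly panned 500-frame sequence yields a usable noise-free reference.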
The effectiveness of the real IR stripe non-uniformity dataset in this study is closely tied to the accuracy of image registration. Significant registration errors can cause blurring and distortion in the ground truth images. Since image registration relies on the presence of sufficient feature points in the image pair, flat, clean backgrounds, which generally lack such features, pose a challenge. To address this, scenes rich in high-frequency components, such as buildings, vehicles and grass, are deliberately chosen during dataset acquisition. Scenes with low feature extractability, such as clear skies and calm water surfaces, are avoided. This strategy ensures reliable feature matching and maximizes registration accuracy during image capture.
To further assess the accuracy of the method used to create the paired dataset, we conduct a quantification process as illustrated in Figure 7. For each noisy image (480 × 480) in our dataset, we introduce a random pixel shift $\delta_i^0$ ranging from 1 to 5 pixels. We then crop a pair of corresponding images before and after the shift (460 × 480) and register them using a phase correlation alignment algorithm. By calculating the sub-pixel shift value $\delta_i$ and comparing it with the randomly generated true shift, we derive the registration deviation $\varepsilon_i = |\delta_i^0 - \delta_i|$. Finally, the average registration deviation is calculated as $\bar{\varepsilon} = \frac{1}{N}\sum_{i=1}^{N} \varepsilon_i$, where $N = 105$. Statistically, the average registration deviation is found to be $\bar{\varepsilon} = 0.89$ pixel, confirming the reliability of our dataset.
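The integer part of the phase-correlation displacement estimate can be sketched as follows (the sub-pixel refinement used in the validation above is omitted; this is an illustration of the standard cross-power-spectrum technique, not the authors' exact code):

```python
import numpy as np

def phase_correlation_shift(ref, moved):
    """Estimate (dy, dx) such that moved ~= np.roll(ref, (dy, dx), axis=(0, 1)),
    using the phase of the normalized cross-power spectrum."""
    F_ref, F_mov = np.fft.fft2(ref), np.fft.fft2(moved)
    cross = F_mov * np.conj(F_ref)
    cross /= np.abs(cross) + 1e-12            # keep phase information only
    corr = np.fft.ifft2(cross).real           # near-impulse at the displacement
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Wrap peaks past the half-size back to negative shifts
    h, w = ref.shape
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return dy, dx
```

Because the estimate rests on a single sharp correlation peak, it works best on scenes with abundant high-frequency content, which motivates the scene-selection strategy described above.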
After the above processing, 105 pairs of high-quality noise-free IR images are finally generated, providing a reliable data basis for the subsequent simulation of column fixed pattern noise and training of deep network models. For each noise-free image, column fixed pattern noise is simulated based on the noise model proposed in [7]. This model describes the relationship between column FPN and thermal radiation and combines a variety of classic noise models. For example, the simple offset model [31], the linear correction model [40,41] and the quadratic polynomial model derived from thermal calibration experiments [5]. This paper adopts a comprehensive column FPN model to characterize the non-uniformity of the detector through Equation (12):
$S(i,j) = a_M V(i,j)^M + a_{M-1} V(i,j)^{M-1} + \cdots + a_1 V(i,j) + a_0$
where $V(i,j)$ is the thermal response of detector $(i,j)$ and $S(i,j)$ is the column FPN. $a_M^j, a_{M-1}^j, \ldots, a_0^j$ are the polynomial coefficients of column $j$, which can be adjusted to generate noise patterns of varying intensities. When the polynomial order $M$ is set to 0, 1 or 2, the model degrades to the simple offset model, linear model or quadratic polynomial model, respectively. While these can be used for training, they may oversimplify the FPN in real systems, which is influenced by factors like irradiance and temperature drift. Increasing the polynomial order to 4 or 5 leads to excessive oscillations and local extrema, creating “overfitted” stripes that do not match actual physical systems and worsening the discrepancy between simulated and real noise. To balance model complexity and physical realism, a cubic (third-order) polynomial is therefore selected as the default in this work, integrating the advantages of the lower-order models while avoiding overfitting and instability, and the polynomial coefficients are randomly assigned within the range [−0.1, 0.1]. Based on the above formula, the generated column FPN $S(i,j)$ is added to the ground truth image $V(i,j)$ obtained after temporal denoising, which produces the simulated noisy image. It is important to note that each simulated noisy image has a unique column FPN pattern.
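A sketch of how a column-wise polynomial stripe pattern can be simulated under this model follows. This is an illustrative reading of Equation (12), not the authors' exact generation code; the function name and `seed` parameter are our own.

```python
import numpy as np

def add_column_fpn(clean, order=3, coef_range=0.1, seed=None):
    """Add simulated column FPN per Equation (12): every column j receives its
    own random polynomial a_M^j V^M + ... + a_0^j of the pixel response V."""
    rng = np.random.default_rng(seed)
    h, w = clean.shape
    # One coefficient set (a_0 .. a_M) per column, drawn from [-coef_range, coef_range]
    coefs = rng.uniform(-coef_range, coef_range, size=(order + 1, w))
    stripe = np.zeros_like(clean, dtype=np.float64)
    v = clean.astype(np.float64)
    for m in range(order + 1):
        stripe += coefs[m] * v**m             # broadcasts the same a_m^j down column j
    return clean + stripe
```

Because the coefficients are constant down each column but vary across columns, the resulting artifact is the vertical stripe pattern characteristic of column FPN.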
Finally, this paper collects 315 pairs of images with simulated column FPN and their corresponding noise-free images, covering three noise levels (low, medium and high). Combined with the 105 pairs of real noisy images, a total of 420 pairs form the IR non-uniformity correction dataset. Some example image pairs are shown in Figure 8, where (a) denotes the ground truth image, (b) denotes real noise, and (c)–(e) display simulated noisy images with low, medium and high noise levels, respectively. Note that the low, medium and high noise levels are quantified using standard deviation and PSNR as objective metrics [42]. The statistical results are reported in Table 1, in which the average, minimum and maximum values of each metric are calculated.
By constructing a diverse dataset covering various FPN patterns, the trained deep network achieves significantly enhanced robustness in image restoration, which makes it suitable for real-world IR image processing. In this work, we focus on a self-developed uncooled long wave IR microbolometer camera, whose FPN pattern, drift behaviour and imaging noise characteristics are strongly coupled with its specific detector architecture and readout chain. Therefore, the above dataset is collected and constructed using this camera so that the training and testing conditions of ADFDNet are consistent with the actual deployment scenario. We emphasize that the approach employed in this work avoids the two-stage process of pre-training with pure simulation and then fine-tuning with real data. Instead, a hybrid real and simulated data strategy is used for end-to-end training from the outset. This strategy leverages simulated samples to expand the diversity of stripe patterns and intensities, while simultaneously anchoring the network’s statistical learning to the real sensor noise distribution through real samples. This hybrid approach mitigates the risk of overfitting to a single polynomial noise model. Quantitative evaluations and visualizations based on real stripe non-uniformity test images indicate that the model’s performance improvement trend on real data aligns with the conclusions derived from simulated data.

4.1.2. Evaluation Indicators

To comprehensively evaluate the performance of denoising networks, this study selects three widely used metrics for image quality evaluation: PSNR, SSIM and the Pearson correlation coefficient (PCC) as a normalized pixel-level metric. PSNR assesses the difference between a denoised image and its noise-free reference image, while SSIM captures the structural information of the image by jointly considering similarity in luminance, contrast and structure. The mathematical formulas of these two metrics are defined in [43]. PCC measures the linear relationship between pixel intensities in the denoised and reference images. Let the denoised image be $I_{pred}$ and the reference image be $I_{gt}$, both flattened into pixel vectors of length $N$: $\{x_i\}_{i=1}^{N}$ and $\{y_i\}_{i=1}^{N}$, respectively. Its definition is given in Equation (13) as follows:
$PCC = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{N}(y_i - \bar{y})^2}},$
where $\bar{x}$ and $\bar{y}$ are the means of the pixel values of the two images, and $PCC \in [-1, 1]$. The closer the value is to 1, the more consistent the denoised result is with the reference image in terms of grayscale variation trends.
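Equation (13) is equivalent to the off-diagonal entry of `np.corrcoef` on the flattened images; a direct sketch:

```python
import numpy as np

def pcc(pred, gt):
    """Pearson correlation coefficient between two images, per Equation (13)."""
    x = np.asarray(pred, dtype=np.float64).ravel()
    y = np.asarray(gt, dtype=np.float64).ravel()
    xc, yc = x - x.mean(), y - y.mean()       # center both pixel vectors
    return (xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc))
```

Because PCC is invariant to global gain and offset, it isolates how well the grayscale variation trend is preserved, complementing the absolute-error view of PSNR.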

4.1.3. Implementation Details

The proposed ADFDNet model was implemented on an NVIDIA GeForce RTX 3080Ti GPU with 12 GB of VRAM. Training used the Adam optimizer and a cosine annealing learning rate scheduler with an initial learning rate of 0.0002, and ran for 300 epochs with a batch size of 12. The dataset was divided into training, testing and validation sets in a ratio of 7:2:1. To facilitate training, the images were cropped into 64 × 64 sub-images and the number of training samples was augmented through data augmentation techniques (left-right flipping, rotation and scaling). In this paper, ADFDNet is compared with six state-of-the-art single-image non-uniformity correction methods. These include two manually designed IR non-uniformity correction methods: the 1D Column Guided Filter (GF) [28], based on one-dimensional guided filtering, and the ADMM-Based Optimization Model (ADOM) [27], based on the alternating direction method of multipliers. Additionally, four deep learning-based IR non-uniformity correction methods are included: DLS-NUC [7], DDL-SR [32], SNRWDNN [33] and SNRCNN [31]. In all experiments presented in this paper, the deep learning methods are retrained on the dataset we constructed, without utilizing the pre-trained weights provided in their original studies. This ensures that all models are evaluated under consistent conditions on the same dataset, maintaining fairness and reliability in the comparison.
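The patch extraction and flip/rotation augmentation described above can be sketched as follows. The stride and the exact augmentation set are our assumptions (the paper also mentions scaling, omitted here for brevity).

```python
import numpy as np

def make_patches(img, size=64, stride=64):
    """Crop an image into size x size training patches (non-overlapping by default)."""
    h, w = img.shape
    return [img[i:i + size, j:j + size]
            for i in range(0, h - size + 1, stride)
            for j in range(0, w - size + 1, stride)]

def augment(patch):
    """Eight dihedral variants: four 90-degree rotations, each with a left-right flip."""
    out = []
    for k in range(4):
        r = np.rot90(patch, k)
        out.extend([r, np.fliplr(r)])
    return out
```

On a 480 × 480 image this yields 49 non-overlapping 64 × 64 patches, each of which the augmentation multiplies eightfold.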
Other classical NUC and general denoising methods are excluded from the quantitative comparison for the following reasons:
(1)
Comparability: Many standard NUC methods rely on multi-frame sequences, black-body calibration or specific hardware setups, which differ significantly from the single-frame approach used in this study in terms of input requirements and underlying assumptions.
(2)
Task Specificity: General image denoising networks are not tailored to the directional and column-correlated nature of IR stripe FPN. These methods would require substantial retraining and structural adjustments to be applicable.
(3)
Practical Scope: In order to balance methodological coverage and practical feasibility, we focus on two representative traditional single-frame stripe removal methods and four established deep learning-based approaches. Other categories are discussed in the related work section but were not quantitatively evaluated here.
Under the same hardware platform and unified training configuration, the average time taken to complete a full training cycle for each deep learning method is shown in Table 2. As can be seen from Figure 9, 300 epochs are sufficient for network convergence, and further increases in training epochs provide very limited improvements in quantitative metrics.

4.2. Qualitative Experiment

To evaluate and compare the performance of different NUC methods, this paper tests the proposed method against six comparison methods. Figure 10 shows the real-noise test results for scenes 1 and 2, and the simulated-noise test results for scenes 3 and 4, where scene 3 represents a medium noise level and scene 4 a high noise level.
In addition, for the results of the proposed method, we compute the pixel-wise absolute error between the corrected image and its corresponding reference image and visualize it as a pseudocolor residual error map with an accompanying color bar indicating error magnitude, where warmer colors (toward yellow) represent larger residuals and cooler colors (toward purple) represent smaller residuals, as illustrated in Figure 11.
In real IR noisy images, the traditional methods GF and ADOM produce significant artifacts in both test images. In contrast, among the deep learning-based methods, SNRWDNN is able to preserve more image details but has obvious noise residuals, and DLS-NUC also has a small amount of noise residuals. SNRCNN removes noise while erasing a large amount of image details. Although DDL-SR effectively removes noise, it generates images with overall lower clarity. These results indicate that for real scenes with lower noise intensity, most algorithms can successfully remove noise, but the clarity of the processed images is generally low. ADFDNet achieves the best balance between noise removal and image clarity, which effectively removes noise while generating the highest-clarity denoised images.
In the tests on simulated IR noisy images, SNRCNN exhibits a significant amount of noise residuals under both noise levels. This is mainly due to the use of a three-layer convolutional structure and training only on visible images, which ignores the natural differences between IR and visible images. Therefore, when processing IR images with strong fixed pattern noise (FPN), its performance significantly degrades. The other three deep learning-based methods can remove most of the noise under both noise levels, but there are still a small number of local residuals. For traditional methods, at medium noise levels (scene 3), GF and ADOM can remove part of noise, but residuals still exist in local areas of the image; at high noise levels (scene 4), both methods show severe noise residuals. This further demonstrates that deep learning-based methods have stronger reconstruction capabilities and can better adapt to different intensities of noise, which has particularly better performance at higher noise levels.

4.3. Quantitative Experiment

To more accurately measure the advantages and disadvantages among the various algorithms, this section calculates the three aforementioned image quality evaluation metrics. Table 3, Table 4 and Table 5 present the average metrics for each method on the test images, including real image noise and three different levels of simulated noise images.
From Table 3, it can be observed that deep learning-based methods generally outperform traditional methods in terms of SSIM, with ADFDNet performing particularly well in multiple scenarios. In real noise scenarios, ADFDNet achieves the best SSIM score, 3% higher than the second-ranked DDL-SR. The scores of the traditional methods (GF and ADOM) are 4.6% and 6.1% lower than ADFDNet’s, respectively, reflecting their disadvantage in detail preservation; among the deep learning methods, ADFDNet in particular better preserves the detailed structure of the image while removing noise. Under simulated low-level and medium-level noise, ADFDNet is only 0.43% and 0.48% lower than the first-ranked SNRWDNN, which demonstrates superior detail preservation in these settings; nevertheless, ADFDNet remains close and performs most stably among all deep learning methods. In scenarios with higher noise levels, ADFDNet achieves the best SSIM score of 0.9780, while the traditional methods GF and ADOM score 3.2% and 2.6% lower, respectively. Under stronger noise intensities, deep learning methods thus retain a clear SSIM advantage, and ADFDNet in particular best balances noise removal and detail preservation.
From the PSNR results in Table 4, it can be seen that ADFDNet achieves a significant advantage in real noise images while maintaining excellent performance in simulated noise images: in real noise scenarios, ADFDNet ranks first with a PSNR score of 27.81, which is 1.5% and 1.3% higher than the second-ranked GF and third-ranked SNRWDNN, respectively. ADFDNet’s optimal PSNR performance in real noise scenarios indicates its outstanding noise removal capabilities while effectively avoiding signal loss due to over-smoothing. In scenarios with simulated noise levels 1 and 2, SNRWDNN performs slightly better than ADFDNet. In scenarios with lower and medium noise levels, SNRWDNN demonstrates strong noise removal capabilities, and ADFDNet maintains high performance and stability. In scenarios with simulated high-level noise, ADFDNet ranks first with a PSNR score of 34.53, which is 1.8% higher than the second-ranked method. The performance of DDL-SR and DLS-NUC declines significantly, and traditional methods have limited performance under high noise. ADFDNet maintains a high PSNR score under high noise conditions, which demonstrates its superior noise removal effect and adaptability compared to other methods.
Based on the PCC results shown in Table 5, our ADFDNet outperforms the other methods in almost all noise scenarios. In real noise, it achieves the highest PCC of 0.9582, slightly surpassing ADOM and GF. Under simulated low-level noise, ADFDNet reaches a PCC of 0.9875, trailing SNRWDNN by 0.0113. At the medium noise level, it maintains a strong PCC of 0.9942, surpassing both the traditional methods and SNRWDNN (0.9933). In high-level noise scenarios, ADFDNet remains highly effective with a PCC of 0.9916, well ahead of SNRWDNN and ADOM. These results highlight ADFDNet’s ability to preserve image details and structure while effectively suppressing noise.
Through comprehensive analysis, it can be concluded that SNRWDNN performs excellently at simulated low and medium noise levels, ranking first in both PSNR and SSIM; however, its performance decreases significantly in real noise scenarios. Its training is based solely on simulated data, without fully considering the complexity and characteristics of real noise, which results in insufficient generalization in practical applications. In contrast, ADFDNet demonstrates stable performance in both real and simulated noise scenarios, with SSIM, PSNR and PCC scores consistently among the top, especially in real noise and high noise level scenarios. This is attributed to ADFDNet’s ability to model the characteristics of real noise and its adaptability to noise of different intensities, which enables the best balance between noise removal and detail preservation. Traditional methods such as GF and ADOM remain competitive in low noise scenarios, but their PSNR and SSIM scores decline rapidly as noise intensity increases, indicating insufficient adaptability to high noise scenarios and reflecting their limitations in noise removal and detail preservation. These comparisons are based on the datasets and competing methods considered in this work, and do not claim to cover all possible deep learning-based NUC schemes.
Beyond image quality evaluation metrics, this section also examines the processing speed of each method, i.e., the time required to process images of the same size; the faster the method, the better it can adapt to the needs of real-time applications. The processing speeds of the methods are shown in Table 6. It is worth mentioning that the method proposed in this paper requires only 0.089 s of processing time, which is 65.30% faster than the second-ranked GF method and demonstrates a significant advantage in processing speed.
The total parameter size of our ADFDNet is approximately 10.60 MB, equivalent to around 2.8 × 10^6 learnable parameters. It is important to note that the current version of ADFDNet is primarily focused on validating the algorithm’s effectiveness and exploring its performance potential. However, it has not yet achieved the engineering maturity necessary for direct integration into the internal hardware of IR cameras. The computational and storage limitations inherent in thermal imagers, coupled with strict power consumption budgets, pose challenges for deploying the current multi-branch structure on embedded platforms. Future work will focus on lightweighting the network while preserving its denoising performance.

4.4. Ablation Experiment

Table 7 and Table 8 present the SSIM and PSNR metrics of ADFDNet under different ablation experiment configurations, which evaluates the performance of the complete network (including DFD, SA and ADF modules) and the network after removing some modules. Specifically, we also consider a configuration where the network retains its standard residual units and single-branch topology, without introducing the DFD, SA and ADF modules.
Across all datasets, the complete ADFDNet (including DFD, SA and ADF modules) performs best, with higher SSIM and PSNR metrics than networks with any single module removed. After removing the ADF module, the network’s SSIM and PSNR decrease to varying degrees in all scenarios, with SSIM dropping by 1.6% and PSNR dropping by 1.7% in high simulated noise levels. In real noise scenarios, SSIM also decreases by 0.7%. The ADF module fuses features from high-frequency and low-frequency branches using learnable adaptive weights, which avoids the inflexibility of manually adjusting weights. Its removal leads to weakened fusion of high-frequency and low-frequency information, which reduces the accuracy of noise removal and the ability to preserve image details. After removing the SA module, the network’s PSNR decreases significantly in high simulated noise scenarios, dropping by 5.7%. This result indicates that the SA module plays a crucial role in improving PSNR metrics. The SA module accurately locates and enhances the noise suppression capability of the high-frequency branch through a sparse mechanism, which enables the network to effectively remove high-intensity noise while avoiding over-smoothing of image details. Its removal directly leads to weakened high-frequency noise processing capabilities, which results in significant degradation in overall denoising performance, especially in high noise scenarios where PSNR decreases significantly. This phenomenon highlights the importance of the SA module in ensuring the network’s adaptability and robustness to high noise scenarios. After removing the DFD module, the network’s SSIM and PSNR performance slightly decreases. In high simulated noise scenarios, SSIM drops by 0.4% and PSNR drops by 1.4%. The DFD module decomposes the input image into high-frequency and low-frequency features to remove background and detail noise, respectively. 
Its removal prevents the network from distinguishing between different frequency features, which affects the performance of subsequent modules. When all three modules, i.e., DFD, SA and ADF, are removed, the network’s performance drops to its lowest across all noise scenarios. This underscores that the absence of these modules severely compromises the network’s denoising effectiveness, making it incapable of efficiently addressing different noise types.
The ablation experiments demonstrate that the three modules (DFD, SA and ADF) of ADFDNet all play crucial roles in the network, and the absence of any one of them leads to significant performance degradation. The introduction of the DFD module ensures effective separation of high-frequency and low-frequency features, which provides more targeted processing paths for subsequent branches. The SA module significantly enhances the removal capability of high-frequency noise through a sparse attention mechanism. The ADF module improves the network’s overall performance in noise removal and detail preservation by adaptively fusing high-frequency and low-frequency information. The complete ADFDNet exhibits optimal performance in both SSIM and PSNR metrics, especially in scenarios with real noise and high simulated noise levels, where its denoising effect and detail preservation capabilities are the best.
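As a rough illustration of the frequency separation performed by the DFD module, the sketch below splits an image with a separable Gaussian low-pass filter; the residual after subtraction carries the high-frequency content, where column-wise stripe FPN concentrates. This is an illustrative stand-in only (the filter size and sigma are hypothetical, not the network's learned behavior).

```python
import numpy as np

def gaussian_kernel1d(sigma: float, radius: int) -> np.ndarray:
    """1-D Gaussian kernel, normalized to sum to 1."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def dual_frequency_decompose(image: np.ndarray, sigma: float = 2.0):
    """Split an image into a low-frequency base and a high-frequency residual.

    Illustrative stand-in for the DFD idea: a separable Gaussian low-pass
    gives the low-frequency component; subtracting it leaves the
    high-frequency component, where column-wise stripe FPN concentrates.
    """
    radius = int(3 * sigma)
    k = gaussian_kernel1d(sigma, radius)
    padded = np.pad(image, radius, mode="reflect")
    # Separable Gaussian low-pass: filter rows, then columns.
    low = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    low = np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, low)
    return low, image - low

# A smooth ramp plus a column-wise stripe pattern: after decomposition,
# the stripes sit almost entirely in the high-frequency component.
rng = np.random.default_rng(0)
clean = np.linspace(0.0, 1.0, 64)[None, :] * np.ones((64, 1))
stripes = np.tile(rng.normal(0.0, 0.1, size=(1, 64)), (64, 1))
low, high = dual_frequency_decompose(clean + stripes)
```

The two components sum back exactly to the input, so the split loses no information; the branches then process each component with operators suited to its content.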

4.5. Discussion

4.5.1. Discussion of Loss Function

The values of α and γ are set to their default values of 1.0 and 1.5, respectively. They serve as the weighting coefficients of the two terms in our composite loss: α scales the MSE term, which enforces pixel-level fidelity, while γ scales the SSIM term, which enforces structural consistency. Adjusting their ratio therefore controls the balance between accurate intensity reconstruction and preservation of image structure.
As shown in Table 9, we evaluate model performance under different combinations of α and γ, including configurations using only MSE, only SSIM, and different weight ratios. When MSE is used alone, PSNR is slightly higher, but SSIM decreases significantly, leading to over-smoothing of structural details. Conversely, when only SSIM is used, structural similarity improves, but PSNR decreases. The combination of α = 1.0 and γ = 1.5 achieves the best balance, yielding higher PSNR and SSIM and a more favorable trade-off between pixel fidelity and structural consistency. Therefore, this parameter set is selected as the default configuration for this study.
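Assuming the combination takes the common form L = α·MSE + γ·(1 − SSIM), which matches the weighting implied by Table 9, a minimal sketch with a simplified single-window SSIM looks as follows. The function names and the global (non-sliding-window) SSIM are illustrative simplifications, not the training code.

```python
import numpy as np

def mse(x: np.ndarray, y: np.ndarray) -> float:
    return float(np.mean((x - y) ** 2))

def ssim_global(x: np.ndarray, y: np.ndarray,
                c1: float = 0.01 ** 2, c2: float = 0.03 ** 2) -> float:
    """Simplified single-window SSIM over the whole image (no sliding window)."""
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (x.var() + y.var() + c2)
    return float(num / den)

def composite_loss(pred, target, alpha=1.0, gamma=1.5):
    # alpha weights the pixel-fidelity (MSE) term,
    # gamma weights the structural (1 - SSIM) term.
    return alpha * mse(pred, target) + gamma * (1.0 - ssim_global(pred, target))

img = np.random.default_rng(1).random((32, 32))
# Identical images: both terms vanish (up to floating-point error).
loss_same = composite_loss(img, img)
```

Setting alpha=0.0 or gamma=0.0 reproduces the SSIM-only and MSE-only configurations of Table 9.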

4.5.2. Discussion of K-Fold Cross-Validation

In this study, we employ K-fold cross-validation, with “scene” as the basic unit, to assess the robustness and generalization capabilities of our model. Specifically, the dataset consists of 105 distinct scenes, which are randomly divided into K parts, with K set to 5 for this experiment. Each partition is designed to preserve the integrity of individual scenes (i.e., frames from the same scene are not split across folds). In each iteration, one partition is selected as the test set, while the remaining four are used for training and validation. This process is repeated for all five folds, ensuring that each partition serves as the test set exactly once. For each fold, the ADFDNet model is retrained using identical network architecture, loss function and training configurations. Model performance is evaluated using standard metrics such as PSNR and SSIM on the corresponding test set. Finally, to obtain a comprehensive assessment of the model’s performance, the results from all five folds are averaged, providing a measure of the model’s stability and generalization ability across different scene partitions.
The results of the five-fold cross-validation are summarized in Table 10. The data demonstrate that the average PSNR and SSIM values across the F 1 to F 5 folds remain within a narrow range, with minimal fluctuations observed. These findings highlight the strong stability and generalization ability of the ADFDNet model, indicating consistent performance across different scene partitions. This suggests that the model is not overfitting to specific scene characteristics and is capable of maintaining high performance on unseen data, reinforcing its potential for practical deployment in real-world applications.
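The scene-level splitting described above can be sketched as follows. The helper name and the shuffling scheme are hypothetical; the point is that scenes, not frames, are the unit of partitioning, so frames from one scene never leak between training and testing.

```python
import random

def scene_kfold(scene_ids, k=5, seed=0):
    """Return k (train, test) scene-ID splits for scene-level k-fold CV.

    Each scene lands in exactly one fold, so every scene serves as test
    data exactly once. Illustrative helper, not the paper's code.
    """
    ids = list(scene_ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::k] for i in range(k)]  # round-robin assignment
    splits = []
    for i in range(k):
        test = folds[i]
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        splits.append((train, test))
    return splits

# 105 scenes with k = 5, as in this study: 84 train / 21 test per fold.
splits = scene_kfold(range(105), k=5)
```

Per-fold metrics (PSNR, SSIM) are then computed on each test split and averaged, as reported in Table 10.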

4.5.3. Discussion of Other Details

(1)
Application on treating other direction patterns
It should be noted that the stripe non-uniformity considered in this paper is limited to vertical, column-wise patterns. Whether the network can effectively suppress a given type of stripe noise ultimately depends on whether the training data contain sufficient samples with the corresponding pattern. In principle, the ADFDNet architecture has enough expressive power to learn multi-directional stripe noise (e.g., vertical, horizontal and oblique stripes) if a dataset of comparable scale covering these patterns is constructed; the design and validation of such a multi-directional training scheme are left as an important direction for future work.
(2)
Application on polarized IR imaging
For polarized IR imaging, the same readout-induced stripe non-uniformity may appear in each polarization sub-channel, but additional care is required to ensure that stripe removal does not distort polarization quantities such as the degree and angle of linear polarization (DoLP/AoLP). In principle, ADFDNet could be extended to this scenario by treating the polarization channels as multi-channel inputs and incorporating polarization-consistency constraints into the loss function, provided that a sufficiently large polarized IR dataset with stripe/reference pairs is available. The design and validation of such a polarization-oriented extension are left for future work.
(3)
Perspective on dynamic images
In this paper, ADFDNet is trained and tested on single-frame IR images, and each frame is processed independently in a quasi-static manner. However, in many practical systems, IR data are acquired as continuous video sequences, in which stripe non-uniformity drifts slowly over time while scene targets are constantly moving. Extending ADFDNet to dynamic image sequences by explicitly modeling cross-frame spatio-temporal correlations (e.g., via 3D convolution, ConvLSTM or temporal attention), while keeping the computational cost low enough to avoid large latency or flicker, will be an important direction of future work, in line with recent studies on dynamic imaging and physical-state sensing [44].
(4)
Current limitations
From the viewpoint of noise components, the proposed ADFDNet is mainly designed to suppress column-wise stripe FPN that remains after temporal multi-frame denoising. Benefiting from the DFD module and the SNRB module, this structured stripe component is significantly reduced in both real and simulated cases, as can be seen from the residual maps. In contrast, the residual fine-grained temporal random noise and slow-varying low-frequency background non-uniformity are not explicitly modeled and are only slightly smoothed as a side effect. In extremely low-contrast regions where useful textures and noise are highly entangled, very weak details or small dim targets may be over-smoothed and slight stripe shadows or local ringing artifacts may still appear, which we regard as one of the current limitations of ADFDNet and a direction for further improvement.
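For reference, the polarization quantities mentioned in item (2) above are standard functions of the Stokes parameters, which are measurable from intensities behind linear polarizers at 0°, 45°, 90° and 135°:

```latex
S_0 = I_{0^\circ} + I_{90^\circ}, \qquad
S_1 = I_{0^\circ} - I_{90^\circ}, \qquad
S_2 = I_{45^\circ} - I_{135^\circ},
```

```latex
\mathrm{DoLP} = \frac{\sqrt{S_1^{2} + S_2^{2}}}{S_0}, \qquad
\mathrm{AoLP} = \frac{1}{2}\arctan\!\frac{S_2}{S_1}.
```

Because DoLP and AoLP are nonlinear in the channel intensities, even a small channel-dependent bias introduced by independent per-channel stripe removal propagates into these quantities, which is why polarization-consistency constraints would be needed in such an extension.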

4.5.4. Discussion of Other Standard Methods

Future work will explore the incorporation of multi-frame and unsupervised NUC methods, as well as a broader range of standard denoising networks, to assess their scalability and performance across various types of data and application settings. This will involve testing the proposed approaches on more diverse IR image datasets, including those with varying levels of noise and different environmental conditions. Furthermore, the integration of multi-frame data and unsupervised learning methods could potentially enhance the robustness and adaptability of the model. By expanding the evaluation to include these additional methods, we aim to better understand the strengths and limitations of different denoising techniques in real-world applications. This will provide deeper insights into the practical deployment of NUC and denoising algorithms for IR imaging systems.

4.5.5. Discussion of Advantages and Disadvantages

On the one hand, our proposed ADFDNet method does not rely on motion assumptions and directly performs dual-frequency decomposition and adaptive processing on single-frame images, enabling suppression of stripe non-uniformity patterns while better preserving edges and texture details. On the other hand, as a data-driven multi-branch deep network, it still depends on large paired noisy/clean datasets, entails relatively high computational cost and may be less effective for atypical or extremely low-SNR noise patterns. Its generalization ability also needs further verification and improvement. In fact, these advantages and disadvantages are common to general image reconstruction tasks [45,46].

4.5.6. Discussion of Future Work

1.
Possible algorithm optimization
(1) More refined feature decomposition mechanism: Currently, ADFDNet uses Gaussian filtering for frequency domain decomposition, which is effective but may introduce smoothing effects. We could explore the incorporation of wavelet transforms or frequency domain attention mechanisms to achieve more adaptive frequency band partitioning, improving the separation accuracy of high-frequency noise and low-frequency details.
(2) Dynamic loss function design: The weights of SSIM and MSE in the loss function are currently fixed. We could design an adaptive weight adjustment strategy to dynamically adjust the ratio of these two based on the image noise intensity or content complexity, thereby enhancing training stability and generalization ability.
(3) Incorporating adversarial training: We could integrate the discriminator structure of generative adversarial networks (GANs) to enhance the visual realism of denoised images, especially improving detail preservation and texture recovery in real-world noisy scenarios.
2.
Further computing acceleration
(1) Convolution operation optimization: We could replace some standard convolutions with depthwise separable convolutions or group convolutions to reduce computational complexity without significantly affecting performance.
(2) Multi-scale feature reuse: By sharing some shallow feature extraction layers in the dual-frequency branches, we can reduce redundant computations and improve the efficiency of forward propagation.
(3) Hardware-aware optimization: We could optimize operator fusion and memory access for GPU or NPU architectures, leveraging inference frameworks such as TensorRT and ONNX Runtime to further enhance real-time performance.
3.
Near-future improvements of the method
(1) Build a larger-scale, multi-scenario and multi-noise-type real IR non-uniform noise dataset to reduce the reliance on simulated data and enhance the model’s generalization ability in real-world scenarios.
(2) Integrate non-uniformity correction with object detection tasks for end-to-end joint training, sharing the feature extraction network, to improve the overall system’s efficiency and consistency.
(3) Analyze the network’s decision-making mechanism in noise suppression and detail preservation through visualized attention maps or feature responses, providing a basis for further optimization.
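To make the expected savings of item 2.(1) concrete, the following parameter-count comparison contrasts a standard convolution with its depthwise separable counterpart. The channel sizes are hypothetical examples, not the actual layer dimensions of ADFDNet.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a standard k x k convolution layer (bias ignored)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Depthwise k x k filter per input channel, then 1 x 1 pointwise mixing."""
    return c_in * k * k + c_in * c_out

# Hypothetical layer: 64 input and 64 output channels with a 3 x 3 kernel.
std = conv_params(64, 64, 3)                 # 64 * 64 * 9  = 36864 weights
sep = depthwise_separable_params(64, 64, 3)  # 576 + 4096   = 4672 weights
reduction = sep / std                        # roughly 0.13x the parameters
```

The same ratio applies to multiply-accumulate operations per pixel, which is why this substitution is a standard route to lighter inference on embedded IR hardware.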

5. Conclusions

In this paper, we propose an IR stripe NUC method called ADFDNet. Our method decomposes the input IR image into frequency domains through the DFD module to process low-frequency and high-frequency features separately, and utilizes DEB and SNRB to achieve detail reconstruction of low-frequency features and suppression of high-frequency noise, respectively. Subsequently, the ADF module achieves dynamic feature fusion through an adaptive weight mechanism, which significantly improves the balance between noise removal and detail preservation. In addition, the dataset constructed in our work consists of a total of 420 pairs of IR images covering three different noise levels. The experimental results verify the advantages of our ADFDNet in terms of denoising effect, detail preservation and generalization performance through qualitative and quantitative comparisons in both real and simulated noise scenarios.

Author Contributions

Conceptualization, A.S. and H.H.; methodology, A.S. and H.H.; software, A.S. and G.G. (Guanghui Gao); validation, M.Z. and P.G.; formal analysis, H.H., X.K. and W.Q.; investigation, A.S. and H.H.; resources, W.Q. and X.K.; data curation, H.H., G.G. (Guanghui Gao) and M.Z.; writing—original draft preparation, A.S.; writing—review and editing, H.H.; visualization, M.Z. and M.W.; supervision, G.G. (Guohua Gu), Q.C. and M.W.; project administration, X.K. and M.W.; funding acquisition, X.K. and M.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was financially supported by the Open Fund of State Key Laboratory of Deep-sea Manned Vehicles (2025SKLDMV06), the Equipment Pre-research Weapon Industry Application Innovation Project (627010402), the National Natural Science Foundation of China (62201260 and 62571245), the Fundamental Research Funds for the Central Universities (30924010941 and 30925020226) and the Project of Jiangsu Province Independent Scientific Research Fund (2025-JSS-LB-034-14).

Data Availability Statement

The data of our study are available upon request.

Acknowledgments

The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Yang, W.; Liu, Y.; Chen, X. D3fusion: Decomposition–disentanglement–dynamic compensation framework for infrared-visible image fusion in extreme low-light. Appl. Sci. 2025, 15, 8918.
2. Liao, D.; Shu, X.; Li, Z.; Liu, Q.; Yuan, D.; Chang, X.; He, Z. Fine-grained feature and template reconstruction for TIR object tracking. IEEE Trans. Circuits Syst. Video Technol. 2025, 35, 9276–9286.
3. Marques, G.; Pitarma, R. Non-contact infrared temperature acquisition system based on Internet of things for laboratory activities monitoring. Procedia Comput. Sci. 2019, 155, 487–494.
4. Zhu, Y.; Ma, Y.; Fan, F.; Huang, J.; Yao, Y.; Zhou, X.; Huang, R. Towards robust infrared small target detection via frequency and spatial feature fusion. IEEE Trans. Geosci. Remote Sens. 2025, 63, 2001115.
5. Cao, Y.; Li, Y. Strip non-uniformity correction in uncooled long-wave infrared focal plane array based on noise source characterization. Opt. Commun. 2015, 339, 236–242.
6. Wang, H.; Yang, X.; Wang, Z.; Yang, H.; Wang, J.; Zhou, X. Improved CycleGAN for mixed noise removal in infrared images. Appl. Sci. 2024, 14, 6122.
7. He, Z.; Cao, Y.; Dong, Y.; Yang, J.; Cao, Y.; Tisse, C.L. Single-image-based nonuniformity correction of uncooled long-wave infrared detectors: A deep-learning approach. Appl. Opt. 2018, 57, D155–D164.
8. Li, F.; Zhao, Y.; Luo, H.; Lv, C. Spatio-temporal deep recurrent convolutional neural network for infrared focal plane arrays non-uniformity correction. Infrared Phys. Technol. 2024, 140, 105390.
9. Liu, Y.; Qiu, B.; Tian, Y.; Cai, J.; Sui, X.; Chen, Q. Scene-based dual domain non-uniformity correction algorithm for stripe and optics-caused fixed pattern noise removal. Opt. Express 2024, 32, 16591–16610.
10. Zhang, C.; Zhao, W. Scene-based nonuniformity correction using local constant statistics. J. Opt. Soc. Am. A 2008, 25, 1444–1453.
11. Zuo, C.; Chen, Q.; Gu, G.; Qian, W. New temporal high-pass filter nonuniformity correction based on bilateral filter. Opt. Rev. 2011, 18, 197–202.
12. Zeng, J.; Sui, X.; Gao, H. Adaptive image-registration-based nonuniformity correction algorithm with ghost artifacts eliminating for infrared focal plane arrays. IEEE Photonics J. 2015, 7, 1–16.
13. Li, Y.; Liu, N.; Xu, J. Infrared scene-based non-uniformity correction based on deep learning model. Optik 2021, 227, 165899.
14. Deng, F.; Chen, S.; Ma, Y.; Cheng, S.; Sun, Y.; Yang, J. RCAU-Net: Convolutional networks with residual channel attention for non-uniformity correction. Int. J. Sens. Netw. 2024, 44, 169–181.
15. Chen, S.; Deng, F.; Zhang, H.; Lyu, S.; Kou, Z.; Yang, J. Infrared non-uniformity correction model via deep convolutional neural network. In Proceedings of the IET Conference Proceedings CP820, Beijing, China, 11–13 November 2022; Volume 2022, pp. 178–184.
16. Bernacki, J.; Scherer, R. Algorithms and methods for individual source camera identification: A survey. Sensors 2025, 25, 3027.
17. Volkov, A.A.; Kozlov, A.V.; Cheremkhin, P.A.; Rymov, D.A.; Shifrina, A.V.; Starikov, R.S.; Nebavskiy, V.A.; Petrova, E.K.; Zlokazov, E.Y.; Rodin, V.G. A review of neural network-based image noise processing methods. Sensors 2025, 25, 6088.
18. Liu, T.; Sui, X.; Wang, Y.; Wang, Y.; Chen, Q.; Guan, Z.; Chen, X. Strong non-uniformity correction algorithm based on spectral shaping statistics and LMS. Opt. Express 2023, 31, 30693–30709.
19. Helfrich, R. Programmable compensation technique for staring arrays. In Proceedings of the Smart Sensors, Washington, DC, USA, 17–20 April 1979; Volume 178, pp. 110–123.
20. Hu, X.M. IR FPA non-uniformity correction study based on the approach of modified two point temperature correction accuracy. Infrared Laser Eng. 2000, 29, 19–21.
21. Li, Y.X.; Sun, D.X.; Liu, Y.N. Polynomial-fitting-based nonuniformity correction for infrared focal plane arrays. Laser Infrared 2005, 35, 4.
22. Qu, H.; Chen, Q. A theoretical model on infrared focal plane arrays binary nonlinear nonuniformity. Acta Electron. Sin. 2008, 36, 2150–2153.
23. Yang, H.; Yan, P.; Yang, F.; Cheng, J. Multi-segment second-order non-uniformity correction of IRFPA based on FPGA. Acta Photonica Sin. 2025, 54, 0704001.
24. Scribner, D.A.; Sarkady, K.A.; Kruer, M.R.; Caulfield, J.T.; Hunt, J.; Herman, C. Adaptive nonuniformity correction for IR focal-plane arrays using neural networks. In Proceedings of the Infrared Sensors: Detectors, Electronics, and Signal Processing, San Diego, CA, USA, 24–26 July 1991; Volume 1541, pp. 100–109.
25. Narendra, P. Scene-based nonuniformity compensation for imaging sensors. IEEE Trans. Pattern Anal. Mach. Intell. 1982, PAMI-4, 57–61.
26. Harris, J.G.; Chiang, Y.M. Nonuniformity correction using the constant-statistics constraint: Analog and digital implementations. In Proceedings of the Infrared Technology and Applications XXIII, Orlando, FL, USA, 20–25 April 1997; Volume 3061, pp. 895–905.
27. Kim, N.; Han, S.S.; Jeong, C.S. ADOM: ADMM-based optimization model for stripe noise removal in remote sensing image. IEEE Access 2023, 11, 106587–106606.
28. Cao, Y.; Yang, M.Y.; Tisse, C.L. Effective strip noise removal for low-textured infrared images based on 1-D guided filtering. IEEE Trans. Circuits Syst. Video Technol. 2015, 26, 2176–2188.
29. Lin, Z.; Huang, X.; Li, Z. Non-uniform correction method for infrared motion blurred images. Infrared Technol. 2025, 47, 765.
30. Liu, K.; Chen, H.; Bao, W.; Wang, J. Thermal imaging spatial noise removal via deep image prior and step-variable total variation regularization. Infrared Phys. Technol. 2023, 134, 104888.
31. Kuang, X.; Sui, X.; Liu, Y.; Chen, Q.; Gu, G. Single infrared image optical noise removal using a deep convolutional neural network. IEEE Photonics J. 2017, 10, 1–15.
32. Kuang, X.; Sui, X.; Liu, Y.; Liu, C.; Chen, Q.; Gu, G. Robust destriping method based on data-driven learning. Infrared Phys. Technol. 2018, 94, 142–150.
33. Guan, J.; Lai, R.; Xiong, A. Wavelet deep neural network for stripe noise removal. IEEE Access 2019, 7, 44544–44554.
34. Chang, Y.; Chen, M.; Yan, L.; Zhao, X.L.; Li, Y.; Zhong, S. Toward universal stripe removal via wavelet-based deep convolutional neural network. IEEE Trans. Geosci. Remote Sens. 2019, 58, 2880–2897.
35. Li, T.; Zhao, Y.; Li, Y.; Zhou, G. Non-uniformity correction of infrared images based on improved CNN with long-short connections. IEEE Photonics J. 2021, 13, 1–13.
36. Huang, M.; Chen, W.; Zhu, Y.; Duan, Q.; Zhu, Y.; Zhang, Y. An adaptive weighted residual-guided algorithm for non-uniformity correction of high-resolution infrared line-scanning images. Sensors 2025, 25, 1511.
37. Tian, C.; Xu, Y.; Li, Z.; Zuo, W.; Fei, L.; Liu, H. Attention-guided CNN for image denoising. Neural Netw. 2020, 124, 117–129.
38. Tian, C.; Xu, Y.; Zuo, W. Image denoising using deep CNN with batch renormalization. Neural Netw. 2020, 121, 461–473.
39. Yu, F.; Koltun, V. Multi-scale context aggregation by dilated convolutions. arXiv 2015, arXiv:1511.07122.
40. Liu, Z.; Xu, J.; Wang, X.; Nie, K.; Jin, W. A fixed-pattern noise correction method based on gray value compensation for TDI CMOS image sensor. Sensors 2015, 15, 23496–23513.
41. Narayanan, B.; Hardie, R.C.; Muse, R.A. Scene-based nonuniformity correction technique that exploits knowledge of the focal-plane array readout architecture. Appl. Opt. 2005, 44, 3482–3491.
42. Ma, B.; Yao, J.; Le, Y.; Qin, C.; Yao, H. Efficient image noise estimation based on skewness invariance and adaptive noise injection. IET Image Process. 2020, 14, 1393–1401.
43. Jiang, N.; Zhang, Y.; Li, Q.; Yan, F. An infrared thermal image denoising method focusing on noise feature learning. Opt. Laser Technol. 2025, 184, 112475.
44. Torres, J.A.; Torres-Torres, C.; Vidal, E.; Fernández, F.; de Icaza-Herrera, M.; Loske, A.M. Violin vibration state determined from laser streak patterns. Appl. Acoust. 2022, 185, 108384.
45. Hou, F.; Zhang, Y.; Zhou, Y.; Zhang, M.; Lv, B.; Wu, J. Review on infrared imaging technology. Sustainability 2022, 14, 11161.
46. Li, M.; Wang, Y.; Sun, H. Single-frame infrared image non-uniformity correction based on wavelet domain noise separation. Sensors 2023, 23, 8424.
Figure 1. Illustration of stripe non-uniformity noise contamination. (a) Clean IR image. (b) Noisy image contaminated by stripe noise. (c) Block diagram of uncooled long wave IR FPA.
Figure 2. Overview of the proposed ADFDNet.
Figure 3. Structure diagram of key modules in DEB. (a) PA module. (b) Original residual module. (c) Enhanced residual convolution module.
Figure 4. Structure of the dilated attention convolutional layer.
Figure 5. Structure diagram of ADF module.
Figure 7. An example of images to be registered. (a) Reference image. (b) Image to be registered.
Figure 8. Part of the dataset, including (a) ground truth, (b) real noise, (c) simulated low-level noise, (d) simulated medium-level noise and (e) simulated high-level noise.
Figure 9. Training convergence curve of our method.
Figure 10. Partial test results, including (a) scene 1 real noise test results, (b) scene 2 real noise test results, (c) scene 3 real noise test results and (d) scene 4 real noise test results.
Figure 11. Residual error maps with error bars. (a) Noisy image. (b) Denoised image. (c) Ground truth. (d) Residual error map.
Table 1. Statistical results of standard deviation and PSNR for images with different noise levels.

Noise Level | Metric             | Average | Minimum | Maximum
Low         | Standard deviation | 40.93   | 25.76   | 55.36
Low         | PSNR (dB)          | 39.17   | 35.90   | 41.23
Medium      | Standard deviation | 41.79   | 26.90   | 55.97
Medium      | PSNR (dB)          | 36.15   | 33.58   | 37.23
High        | Standard deviation | 44.24   | 30.14   | 57.86
High        | PSNR (dB)          | 33.84   | 31.88   | 34.79
Table 2. Average time (in seconds) taken to complete a full training cycle for each deep learning method (the best result is marked in bold).

Method   | DLS-NUC | DDL-SR | SNRWDNN | SNRCNN | Ours
Time (s) | 20.56   | 11.03  | 15.99   | 14.06  | 3.43
Table 3. SSIM experimental results of each method on the test dataset (the best results are marked in bold).

Method  | Real Noise | Simulated Low-Level Noise | Simulated Medium-Level Noise | Simulated High-Level Noise
GF      | 0.9097 | 0.9923 | 0.9821 | 0.9458
ADOM    | 0.8959 | 0.9750 | 0.9713 | 0.9525
DLS-NUC | 0.9128 | 0.9869 | 0.9810 | 0.9658
DDL-SR  | 0.9254 | 0.9806 | 0.9801 | 0.9778
SNRWDNN | 0.9046 | 0.9934 | 0.9872 | 0.9765
SNRCNN  | 0.8271 | 0.8701 | 0.7879 | 0.4441
Ours    | 0.9536 | 0.9891 | 0.9824 | 0.9780
Table 4. PSNR experimental results of each method on the test dataset (the best results are marked in bold).

Method  | Real Noise (dB) | Simulated Low-Level Noise (dB) | Simulated Medium-Level Noise (dB) | Simulated High-Level Noise (dB)
GF      | 27.44 | 44.35 | 39.50 | 33.84
ADOM    | 24.16 | 26.45 | 26.34 | 25.24
DLS-NUC | 24.27 | 35.25 | 34.53 | 32.74
DDL-SR  | 27.15 | 29.21 | 29.29 | 29.37
SNRWDNN | 27.39 | 45.15 | 40.31 | 33.90
SNRCNN  | 25.18 | 28.95 | 28.08 | 24.55
Ours    | 27.81 | 40.77 | 38.08 | 34.53
Table 5. PCC experimental results of each method on the test dataset (the best results are marked in bold).

Method  | Real Noise | Simulated Low-Level Noise | Simulated Medium-Level Noise | Simulated High-Level Noise
GF      | 0.9569 | 0.9952 | 0.9930 | 0.9900
ADOM    | 0.9578 | 0.9965 | 0.9937 | 0.9908
DLS-NUC | 0.9487 | 0.9887 | 0.9842 | 0.9816
DDL-SR  | 0.2239 | 0.2068 | 0.2041 | 0.2138
SNRWDNN | 0.9374 | 0.9988 | 0.9933 | 0.9914
SNRCNN  | 0.9008 | 0.9275 | 0.9222 | 0.8782
Ours    | 0.9582 | 0.9875 | 0.9942 | 0.9916
Table 6. Average runtime of each method (the best result is marked in bold).

Method      | GF    | ADOM  | DLS-NUC | DDL-SR | SNRWDNN | SNRCNN | Ours
Runtime (s) | 0.231 | 4.294 | 0.825   | 0.477  | 0.628   | 0.595  | 0.089
Table 7. Results of ablation study in terms of SSIM (the best results are marked in bold). A check mark indicates that the module is included.

DFD | SA | ADF | Real Noise | Simulated Low-Level Noise | Simulated Medium-Level Noise | Simulated High-Level Noise
–   | –  | –   | 0.9420 | 0.9790 | 0.9725 | 0.9620
–   | ✓  | ✓   | 0.9526 | 0.9851 | 0.9817 | 0.9742
✓   | –  | ✓   | 0.9455 | 0.9810 | 0.9755 | 0.9650
✓   | ✓  | –   | 0.9469 | 0.9818 | 0.9771 | 0.9633
✓   | ✓  | ✓   | 0.9536 | 0.9891 | 0.9824 | 0.9780
Table 8. Results of ablation study in terms of PSNR (the best results are marked in bold). A check mark indicates that the module is included.

DFD | SA | ADF | Real Noise | Simulated Low-Level Noise | Simulated Medium-Level Noise | Simulated High-Level Noise
–   | –  | –   | 27.20 | 34.26 | 33.01 | 30.77
–   | ✓  | ✓   | 27.76 | 40.72 | 37.83 | 34.03
✓   | –  | ✓   | 27.34 | 35.99 | 34.74 | 32.53
✓   | ✓  | –   | 27.71 | 39.37 | 37.33 | 33.96
✓   | ✓  | ✓   | 27.81 | 40.77 | 38.08 | 34.53
Table 9. PSNR and SSIM results for different values of α and γ.

α   | γ   | PSNR (dB) | SSIM
0.0 | 1.5 | 33.01 | 0.9477
1.0 | 0.0 | 33.00 | 0.9484
0.5 | 1.0 | 33.50 | 0.9551
1.0 | 1.0 | 34.95 | 0.9690
1.0 | 1.5 | 35.30 | 0.9758
1.0 | 2.0 | 34.55 | 0.9591
Table 10. Results of cross-validation experiment.

Fold    | PSNR (dB) | SSIM
F1      | 35.10 | 0.9691
F2      | 35.27 | 0.9739
F3      | 35.39 | 0.9747
F4      | 35.35 | 0.9751
F5      | 35.42 | 0.9764
Average | 35.31 | 0.9738
