1. Introduction
SWIR imaging technology, owing to its unique spectral characteristics, has demonstrated irreplaceable value in a wide range of applications, including low-light night vision, material identification, and penetrative detection. Among these, high-sensitivity low-light detection, as a frontier direction of SWIR research, plays a crucial role in remote sensing, security defense, and scientific observation. For instance, Pooja Sudha et al. achieved ultra-sensitive single-photon detection in the SWIR band using a silicon-based single-electron transistor (SET) structure [1], while Shu et al. designed a high-performance broadband mid-infrared photodetector based on a MoS2/BP/MoS2 junction field-effect transistor [2].
However, the core challenge in achieving high-sensitivity detection lies in the fact that conventional phototransistors, while amplifying photogenerated signals, inevitably also amplify noise induced by channel defects and impurity scattering, resulting in a deterioration of the SNR. As an emerging detection architecture, the heterojunction gate field-effect transistor (HGFET) effectively addresses this challenge through its intrinsic low-noise and high-gain characteristics. Its unique isolation structure enables the signal generated by the photodiode to efficiently modulate the channel current. This scheme achieves high-gain amplification while preventing intrinsic channel noise from feeding back to the photoelectric interface, thereby realizing a synergistic mechanism of signal amplification and noise suppression at the device level.
The structural configuration of the HGFET SWIR detector is illustrated in Figure 1. From top to bottom, it consists of the PbS QDs (lead sulfide quantum dot) layer serving as the photosensitive absorption layer; the ZnO layer functioning as the electron transport layer (ETL); the Y2O3 layer acting as the gate dielectric; the Pd contacts serving as source and drain electrodes; and the CNT (carbon nanotube) films forming the conduction channel. The entire structure is built on a Si/SiO2 substrate, which provides mechanical support and electrical insulation. Through the synergistic interaction between light absorption in the PbS quantum dot layer and current amplification in the HGFET channel, this device achieves high-gain and low-noise SWIR photodetection. Experimental results demonstrate that the detectivity of HGFET-based detectors can exceed 10^14 Jones, approximately two orders of magnitude higher than that of commercial indium gallium arsenide (InGaAs) detectors [3]. Therefore, it shows tremendous potential in ultra-low-light sensing, high-resolution imaging, and cost-efficient infrared system integration.
Despite the excellent sensitivity of HGFETs at the device level, their weak output electrical signals remain highly susceptible to various noise sources during imaging. Under low-illumination conditions, SWIR images typically suffer from insufficient contrast, spatial detail degradation, and pronounced temporal noise, which severely hinder the full utilization of detector performance. At the image-processing level, existing enhancement algorithms have improved image quality to some extent. For example, Ashiba et al. employed a three-stage hybrid contrast enhancement method for infrared night vision images [4]; Liu et al. applied total variation and low-rank bidirectional twisted tensor decomposition to remove temporal response noise (TRN) from infrared detectors [5]; and Dong et al. proposed a simple real-time SWIR image enhancement method based on Gaussian filtering and histogram remapping differences [6]. However, most of these approaches are designed for conventional commercial detectors based on InGaAs or HgCdTe and are therefore not directly applicable to the unique requirements of HGFET detectors operating in high-sensitivity, low-light scenarios. For example, traditional histogram equalization (HE) methods (e.g., LAHE, MHE) often cause local over-enhancement and detail loss [7]; three-dimensional noise reduction (3DNR) techniques can suppress temporal noise effectively but adapt poorly to scenes with extreme temperature variations [8]; and multi-scale wavelet-based algorithms achieve satisfactory noise separation at the expense of computational complexity, which limits real-time performance [9]. In recent years, deep learning has achieved remarkable progress in infrared–visible image fusion [10]; Tian et al. proposed a GAN-driven multi-scale detail enhancement approach that preserves weak target details during fusion [11]. Nevertheless, such models usually feature complex architectures and high computational costs that make them difficult to deploy efficiently on resource-constrained embedded hardware platforms.
Therefore, to fully exploit the performance potential of HGFET technology in SWIR low-light detection, it is imperative to develop a dedicated image processing algorithm that simultaneously offers strong enhancement capability, effective noise suppression, and hardware-friendly design. This work aims to develop an image enhancement algorithm tailored to the imaging characteristics of HGFET-based detectors under low-light conditions. It is capable of improving image contrast, suppressing background noise, and preserving weak target details. Furthermore, the algorithm is implemented and validated on an FPGA platform to provide a feasible technical pathway for next-generation high-sensitivity SWIR imaging systems.
2. Principle of the Image Enhancement Algorithm for HGFET-Based High-Sensitivity Low-Light Detection
The design of the algorithm proposed in this paper is essentially consistent with the core physical characteristics of the HGFET detector. Firstly, to address the fixed-pattern defects introduced by carbon nanotube growth and lithography processes, the algorithm employs defective row-column correction driven by dark-state reference data at the front end, reconstructing a usable image foundation. Secondly, to match its photoelectric response characteristics of high gain and susceptibility to saturation, an adaptive Gamma correction is introduced to dynamically adjust the response curve, preventing over-exposure and expanding the dynamic range. Finally, to handle the complex noise and weak signals under high-gain amplification, clip-limited adaptive histogram equalization (CLAHE) and variance-based dynamic threshold selection are adopted, effectively suppressing noise amplification while ensuring robust separation of background and weak targets across various scenes. The entire processing chain represents a hardware-algorithm co-optimization tailored to the HGFET’s unique capability of “amplifying signals without amplifying noise”.
2.1. Correction of Defective Rows and Columns
Defective rows and columns, as common pixel-level defects in infrared imaging systems, primarily originate from multiple physical mechanisms, including non-uniform carbon nanotube growth [9], hydrogen-terminated interface state defects, photolithography deviations, poor electrode contact, etching-induced damage [12], electrical overstress breakdown, and environmental oxidation. Electrically, these defects manifest as entire rows or columns exhibiting persistent high-current or zero-current outputs, or as signal responses decoupled from incident optical intensity variations.
At the imaging level, defective rows and columns severely degrade image quality and usability. On the one hand, such defects obscure genuine target structures, compress the system’s effective dynamic range, and lead to decreased spatial resolution and aggravated non-uniformity [13]. Specifically, they directly cause significant grayscale jumps between adjacent pixels; in actual testing, they can deteriorate the system’s signal-to-noise ratio by 3–5 dB, severely reducing image quality. More importantly, these defective pixels seriously interfere with subsequent processing algorithms: their extreme grayscale values distort the statistical distribution of the image and induce enhancement artifacts, pollute surrounding pixels during spatial filtering, and generate false contours in edge detection. Even deep learning methods are hardly immune, and may produce “ghosting” or amplify the defects. In addition, the real-time calibration requirement significantly increases the hardware overhead on the FPGA.
In summary, the influence of defective rows and columns on SWIR imaging systems extends across multiple layers, from physical mechanisms to algorithmic processing; it directly degrades image quality and poses significant challenges to the robustness and accuracy of subsequent processing stages. Consequently, effective detection and correction of such defects constitute an indispensable step in high-accuracy infrared image preprocessing.
In this work, the detector bias and readout timing are configured on a board-level platform to acquire a dark-state image D(x, y), from which the most frequently occurring gray level k is determined across all pixels. Given an input image I(x, y) and an output image O(x, y), where x and y denote the row and column pixel indices, the correction of defective rows and columns is formulated as follows:

O(x, y) = I(x, y) − D(x, y) + k
Since both the input and dark-state images contain the same fixed-pattern noise associated with defective rows and columns, their subtraction effectively eliminates these artifacts. The subsequent addition of the background value k restores the corrected image intensity baseline, yielding an image free from row and column defects.
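The correction step above can be sketched in NumPy as a software reference model; the array shapes, the uint16 storage of 14-bit data, and the helper name are illustrative assumptions, not the authors' FPGA implementation:

```python
import numpy as np

def correct_defective_lines(raw: np.ndarray, dark: np.ndarray) -> np.ndarray:
    """Remove fixed-pattern row/column defects by dark-frame subtraction.

    raw  : input image I(x, y), 14-bit data stored as uint16
    dark : dark-state reference image D(x, y), same shape as raw
    """
    # Mode of the dark frame = background gray level k
    values, counts = np.unique(dark, return_counts=True)
    k = int(values[np.argmax(counts)])
    # O(x, y) = I(x, y) - D(x, y) + k, clipped back to the 14-bit range
    out = raw.astype(np.int32) - dark.astype(np.int32) + k
    return np.clip(out, 0, 16383).astype(np.uint16)
```

Because the defective rows and columns appear identically in both frames, the subtraction cancels them exactly, while adding k restores the background baseline.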
2.2. Adaptive Gamma Correction
Gamma correction is a nonlinear gray-level transformation technique widely used in image processing to adjust the brightness and contrast distribution of an image. Its primary function is to optimize perceived image quality by modifying the mapping curve of pixel intensity values, with particularly notable effectiveness in enhancing images captured under low-illumination (dark-state) conditions.
To improve the response of dark-state regions while avoiding overexposure under high-intensity illumination, the maximum pixel value Emax of the image obtained after defective row and column removal is first calculated. Depending on whether Emax falls within the threshold range corresponding to a dark, normal, or high-brightness environment, different gamma correction strategies are applied accordingly. The parameters γ0 and γ1 are set to values greater than 1 and less than 1, respectively, and their values are adjusted to achieve the optimal imaging performance.
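A minimal software sketch of this adaptive selection follows. The concrete threshold values and gamma constants are illustrative assumptions, since the paper leaves them to empirical tuning:

```python
import numpy as np

# Illustrative thresholds and gammas -- the paper does not state the exact
# numbers, so these are assumptions for the sketch.
T_DARK, T_BRIGHT = 2048, 14000   # on the 14-bit scale [0, 16383]
GAMMA0, GAMMA1 = 1.5, 0.6        # gamma0 > 1, gamma1 < 1

def adaptive_gamma(img: np.ndarray) -> np.ndarray:
    """Apply a gamma curve chosen from the image's maximum pixel value Emax."""
    e_max = int(img.max())
    if e_max < T_DARK:            # dark scene: gamma < 1 lifts weak signals
        gamma = GAMMA1
    elif e_max > T_BRIGHT:        # bright scene: gamma > 1 compresses highlights
        gamma = GAMMA0
    else:                         # normal scene: keep the response linear
        gamma = 1.0
    norm = img.astype(np.float64) / 16383.0
    return np.clip(np.round(norm ** gamma * 16383.0), 0, 16383).astype(np.uint16)
```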
2.3. Clip-Limited Adaptive Histogram Equalization (CLAHE)
CLAHE, as an important improvement over traditional HE, effectively mitigates the problem of over-enhancement caused by global HE through the introduction of a local adaptive processing mechanism [14].
In practice, the algorithm divides the image into small tiles, performs HE on each tile, and constrains the extent of local contrast enhancement (LCE) during the process. This approach prevents over-enhancement of noise or minor details while improving overall visual quality. The implementation steps of CLAHE can be described as follows: image tiling, local histogram computation, histogram clipping, HE, and bilinear interpolation. Specifically, the image is divided into M × N non-overlapping rectangular tiles. For each tile, the grayscale histogram H(i) is computed, where i ∈ [0, 255]. The local histogram is then clipped according to the following formula:

H_clip(i) = min(H(i), clipLimit)

The parameter clipLimit represents the contrast limitation threshold; a higher value results in stronger contrast enhancement. The parameter tileGridSize denotes the number of divided tiles; smaller tiles lead to stronger local adaptivity. The tileSize defines the actual pixel dimensions of each tile, determined by the overall image size and the specified tileGridSize. After clipping, the excess pixels are uniformly redistributed across all gray levels.
The cumulative distribution function (CDF) of the clipped histogram H_clip(i) is then calculated as follows:

CDF(i) = Σ_{j=0}^{i} H_clip(j)

The gray-level transformation function of CLAHE, which maps each level through the CDF, can be expressed as:

T(i) = round( (CDF(i) − CDF_min) / (N_tile − CDF_min) × 255 )

where CDF_min denotes the cumulative count of the minimum gray level within the tile, and N_tile in the denominator represents the total number of pixels in the tile.
To prevent discontinuities between adjacent tiles, bilinear interpolation is applied to the boundary pixels using the CDFs of the four neighboring tiles.
Given that the original image possesses a 14-bit depth (gray level range 0–16,383), whereas the CLAHE algorithm typically operates on 8-bit data to improve statistical and interpolation efficiency, a bit-depth conversion module is introduced prior to enhancement. Specifically, the original 14-bit image is first converted into 8-bit representation through either linear or adaptive mapping, after which the CLAHE process is applied. Following enhancement, the output data are reconverted to the 14-bit domain to maintain compatibility with subsequent processing modules and to preserve sufficient dynamic range for further analysis.
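The bit-depth wrapper and clipped equalization can be sketched as follows. For brevity the sketch treats the whole frame as a single CLAHE tile (no tiling or bilinear interpolation), so it illustrates the clipping and remapping steps rather than the full algorithm; the clip limit value is an assumption:

```python
import numpy as np

def clahe_14bit(img14: np.ndarray, clip_limit: int = 40) -> np.ndarray:
    """Bit-depth conversion wrapper around a clipped histogram equalization.

    img14 : 14-bit image (uint16, values in [0, 16383]).
    """
    # 14-bit -> 8-bit linear mapping
    img8 = (img14.astype(np.uint32) * 255 // 16383).astype(np.uint8)
    # Clipped histogram with uniform redistribution of the excess counts
    hist = np.bincount(img8.ravel(), minlength=256)
    excess = np.maximum(hist - clip_limit, 0).sum()
    hist = np.minimum(hist, clip_limit) + excess // 256
    # Equalization mapping T(i) built from the clipped CDF
    cdf = np.cumsum(hist)
    cdf_min = cdf[np.nonzero(hist)[0][0]]
    n = img8.size
    lut = np.round((cdf - cdf_min) / max(n - cdf_min, 1) * 255).clip(0, 255)
    out8 = lut.astype(np.uint8)[img8]
    # 8-bit -> 14-bit so downstream modules keep their dynamic range
    return (out8.astype(np.uint32) * 16383 // 255).astype(np.uint16)
```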
2.4. OTSU Threshold Segmentation and Statistical Moment Analysis for Background Noise Separation
After image contrast enhancement, a threshold-based background denoising module is designed to effectively suppress background noise and further improve the overall contrast and SNR. This module integrates the OTSU threshold segmentation method with statistical moment analysis to dynamically compute the optimal threshold and separate the target region from the background noise.
In implementation, the variance of the image is first calculated to characterize its gray-level distribution. Based on a predefined variance threshold, an adaptive segmentation strategy is selected: when the variance is below the threshold, the OTSU algorithm is employed for segmentation; when it is above the threshold, the statistical moment analysis method is applied instead. Experimental results demonstrate that this adaptive strategy effectively adjusts to different noise and contrast characteristics across images, achieving superior background noise separation.
This variance value, serving as a measure of scene complexity, is directly fed into the adaptive threshold selector. The underlying logic is as follows: a low variance typically indicates a uniform background and a single dominant target, in which case the OTSU algorithm can effectively perform segmentation by maximizing inter-class variance. Conversely, a high variance suggests a complex scene or significant noise, where the global thresholding nature of OTSU becomes unreliable; thus, the system automatically switches to the statistical-moment–based method, which is inherently more robust to noise. Through this mechanism, the enhanced image produced by the CLAHE stage not only improves visual contrast but also provides statistical characteristics that directly drive the adaptive behavior of the subsequent segmentation stage.
The threshold was determined through a grid-search procedure conducted on a calibrated dataset containing diverse scenes, including dark, low-light, and high-illumination conditions. We evaluated the algorithm’s overall performance, balancing background noise separation and target integrity, while varying the variance threshold within the range [100, 1000].
The experimental results indicate that the algorithm achieves an optimal balance across a wide range of test scenes when the variance value is set to 400. Similarly, the clipLimit in the CLAHE module was set to 3.0, chosen to strike an optimal balance between avoiding over-enhancement artifacts and preserving sufficient local detail.
The OTSU algorithm is an adaptive thresholding technique that automatically determines the optimal threshold by maximizing the inter-class variance, without the need for manual parameter tuning. For a grayscale image, setting a threshold T divides the pixels into two classes: the background class C0 and the target class C1. The inter-class variance is defined as follows:

σ_B²(T) = ω0(T)[μ0(T) − μ]² + ω1(T)[μ1(T) − μ]²

where ω0 and ω1 represent the probabilities of the two pixel classes, μ0 and μ1 denote their respective mean gray levels, and μ is the overall mean gray level of the image. The optimal threshold T* is the value of T that maximizes the between-class variance σ_B²(T).
Statistical moments are mathematical tools used to describe the shape characteristics of probability distributions and are widely employed in image analysis to characterize gray-level distribution features.
Let σ² denote the variance of the image, and T_v be a predefined empirical threshold. The segmentation threshold T_seg is adaptively selected as follows:

T_seg = T_OTSU,  if σ² < T_v
T_seg = μ + kσ,  if σ² ≥ T_v

where T_OTSU is the threshold computed by OTSU on image I, μ + kσ is the statistical moment analysis formula, μ and σ are the mean and standard deviation of the CLAHE-processed image, respectively, and k is a multiple of the standard deviation (adjusted according to the target sparsity).
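This adaptive selection can be sketched as a software reference model (8-bit input assumed, as in the CLAHE stage; function names are illustrative):

```python
import numpy as np

VAR_THRESHOLD = 400.0  # empirical variance threshold T_v from the paper

def otsu_threshold(img: np.ndarray) -> int:
    """Exhaustively search the threshold maximizing inter-class variance."""
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    mu_total = np.sum(np.arange(256) * prob)
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0 = prob[:t].sum()          # class probability of background C0
        w1 = 1.0 - w0                # class probability of target C1
        if w0 == 0 or w1 == 0:
            continue
        mu0 = np.sum(np.arange(t) * prob[:t]) / w0
        mu1 = np.sum(np.arange(t, 256) * prob[t:]) / w1
        var_between = w0 * (mu0 - mu_total) ** 2 + w1 * (mu1 - mu_total) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def adaptive_threshold(img: np.ndarray, k: float = 0.88) -> float:
    """Switch between OTSU and the statistical-moment threshold mu + k*sigma."""
    if img.var() < VAR_THRESHOLD:
        return float(otsu_threshold(img))
    return float(img.mean() + k * img.std())
```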
To determine an appropriate value of k for the proposed algorithm, the Average Gradient (AG) [15] is employed as an evaluation metric. For the same image, the corresponding AG values under different k coefficients are shown in Figure 2.

As illustrated in Figure 2, the AG value reaches its maximum of 770.0489 when k is set to 0.88. Consequently, this value of k is selected as the multiplier for the image standard deviation. Furthermore, for different input images, the proposed algorithm adaptively adjusts the k value to ensure that the AG of the processed image is always maximized.
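For reference, a common formulation of the AG metric can be sketched as follows; the paper's exact variant may differ in the difference scheme or normalization:

```python
import numpy as np

def average_gradient(img: np.ndarray) -> float:
    """Average Gradient (AG): mean magnitude of local gray-level changes.

    Uses the common formulation AG = mean(sqrt((dx^2 + dy^2) / 2)) over
    forward differences on the interior of the image.
    """
    f = img.astype(np.float64)
    dx = f[:-1, 1:] - f[:-1, :-1]   # horizontal forward difference
    dy = f[1:, :-1] - f[:-1, :-1]   # vertical forward difference
    return float(np.mean(np.sqrt((dx ** 2 + dy ** 2) / 2.0)))
```

The optimal k can then be found by a simple scan over candidate values, keeping the one whose processed image yields the largest AG.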
Unlike conventional frameworks that simply cascade adaptive histogram equalization and threshold segmentation, the proposed algorithm constructs a multi-stage processing pipeline with an embedded feedback path. Its mathematical core lies in the introduction of a variance-based decision function, which dynamically switches between the OTSU and statistical moment analysis methods. This mechanism effectively overcomes the inherent limitations of single-threshold algorithms under specific imaging conditions, thereby achieving more robust and adaptive background noise separation.
2.5. Median Filtering
Median filtering can effectively remove impulsive noise, such as salt-and-pepper noise, while preserving image edges and details well [16]. Given that HGFET detectors operate under extremely low-light conditions, their readout circuits and data processing chain are susceptible to significant salt-and-pepper (impulse) noise. Rooted in order-statistics theory, median filtering replaces the central pixel value with the median of its neighborhood, which proves highly effective in removing this type of noise. Its robustness stems from not relying on a specific noise distribution model.
Furthermore, HGFET detectors hold potential for high-resolution imaging, with pixel pitch potentially scalable to below 5 μm. While linear filters, such as Gaussian filters, often cause edge blurring alongside noise smoothing, the median filter excels at maintaining edge sharpness and fine details while removing noise. This capability is crucial for achieving clarity and accuracy in low-light imaging. From a hardware implementation perspective, the median filter algorithm possesses a high degree of parallelism, making it well-suited for high-speed processing on FPGAs or dedicated DSPs.
In this paper, median filtering is implemented with a sliding-window structure analogous to a convolution kernel. An n × n filter kernel is defined and populated with pixel data. The data within the kernel are then ordered using a bubble sort, and the median value is selected as the output for the current pixel. To maintain the original image dimensions, zero-padding is applied to the image borders. For instance, a 3 × 3 kernel requires a one-pixel border of zero-padding around the image. An example of the resulting data matrix is illustrated in Figure 3.
As shown in Figure 3, the red box represents the 3 × 3 filter kernel, the numbers with a yellow background correspond to a 4 × 4 pixel matrix, and the white areas indicate zero-padding.

For the 64 × 64 pixel matrix, a comparative analysis of the imaging results using 3 × 3 and 5 × 5 median filtering is conducted via FPGA board-level verification. The 3 × 3 median filter is found to offer superior edge preservation and more effective removal of salt-and-pepper noise. A visual comparison of the imaging effects of the 3 × 3 and 5 × 5 median filters is presented in Figure 4.
Meanwhile, the selection of window size for median filtering also takes into account the implementation efficiency of FPGA. The 3 × 3 window can efficiently utilize the line cache architecture to achieve pipeline processing, while larger windows significantly increase BRAM consumption and logical latency.
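A software reference model of the 3 × 3 zero-padded median filter follows (using NumPy's median rather than the hardware bubble sort, which yields the same result):

```python
import numpy as np

def median3x3(img: np.ndarray) -> np.ndarray:
    """3x3 median filter with one-pixel zero-padding (reference model).

    Models the FPGA line-buffer pipeline: each output pixel is the median
    of its zero-padded 3x3 neighborhood.
    """
    padded = np.pad(img.astype(np.int32), 1, mode="constant", constant_values=0)
    h, w = img.shape
    # Gather the nine shifted views of the padded image and take the median
    windows = np.stack([padded[r:r + h, c:c + w]
                        for r in range(3) for c in range(3)], axis=0)
    return np.median(windows, axis=0).astype(img.dtype)
```

Note that zero-padding darkens the border pixels (their windows contain padded zeros), matching the behavior of the hardware scheme described above.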
3. FPGA Design and Implementation
The FPGA implementation is not a direct mapping of the algorithm but a hardware-efficient co-design that introduces several key optimizations for real-time performance.
3.1. Hardware System
The imaging system in this work utilizes an HGFET-based high-sensitivity SWIR detector for image acquisition. The readout circuit is composed of an Analog Front-End (AFE) and an ADS850 converter. An FPGA chip from the Altera Cyclone IV series serves as the core processor, and a host computer application, built on the PyCharm 5.0.3 platform, is responsible for image display. A block diagram of the hardware implementation architecture is presented in Figure 5.
As illustrated, the core sensing element of the system is the carbon-based HGFET detector, which is responsible for converting incident infrared radiation into primary electrical signals. The detector control module operates in coordination with the AFE control module to provide stable biasing and manage the detector’s operating state. Depending on varying illumination conditions, the detector performs dynamic gate voltage regulation through the infrared signal preprocessing module, which in turn delivers control signals to the DAC post-processing circuit for precise bottom-gate voltage configuration. The conditioned signals are subsequently acquired by the AFE readout circuit and digitized through the ADC conversion circuit, with the timing precisely controlled by the ADS850 control module. The digitized image data are then stored in the RAM storage, where they are processed by the image enhancement algorithm module to improve overall image quality. The enhanced data are finally transmitted to the host computer via the UART data transmission module for visualization and image reconstruction. The entire system’s operational flow is centrally managed by the timing and reset module, ensuring coordinated and orderly operation among all subsystems. Additionally, the key detection module initiates system startup by monitoring external keypress events, while the parameter configuration module enables configurable control of AFE functionalities through predefined control words. Together, these two modules provide a flexible human–machine interface and parameter configuration capability. This architecture achieves full-chain integration encompassing signal sensing, conditioning, digitization, processing, transmission, and display.
The signal output of the 64 × 64 infrared detector array is influenced by both the source–drain bias voltage (Vds) and the laser power density [17]. Under a fixed bias condition, varying the laser power density generates different output currents that are fed into the AFE. The FPGA provides synchronized control signals for both the AFE and ADC, converting the detector’s output current into voltage and subsequently into 14-bit digital grayscale data. Data transmission is carried out via a UART protocol through a USB interface to the PC, where a Python-based host application (developed in PyCharm 5.0.3) handles data storage and real-time image display.
To enhance system imaging quality and sensitivity, a digital-to-analog conversion circuit based on the MCP4725 DAC and its corresponding FPGA control module are designed. Because the HGFET detector exhibits distinct electrical characteristics under different gate voltages [3], the FPGA dynamically controls the DAC to generate an adjustable back-gate voltage suitable for various illumination conditions. Under low-light or high-intensity illumination, the back-gate voltage is maintained within the range of −0.6 V to −0.3 V to preserve high gain and sensitivity. This scheme ensures strong responsiveness in dim conditions while avoiding saturation under strong illumination. In normal lighting conditions, the gate voltage is tuned between −1.2 V and −1.3 V, allowing the detector to operate in a linear regime where a stable proportional relationship between optical input and photocurrent is maintained. Meanwhile, by adjusting the AFE integration time, the system ensures that the maximum input current corresponds to the maximum grayscale output, thereby optimizing quantization efficiency across the full dynamic range.
Furthermore, the FPGA architecture embodies critical design trade-offs to maximize efficiency under resource constraints. We optimized throughput by implementing a deep pipeline, accepting a marginal increase in latency to achieve one-pixel-per-clock throughput and the resulting high frame rate. An in-place update dataflow was also adopted, which trades increased design complexity for a significant reduction in block RAM consumption. This co-optimized structure ensures that the design meets real-time processing demands while maintaining resource efficiency on the target platform. In addition, we carefully truncated the bit widths of the internal data path while ensuring no significant loss of image quality. For example, in the calculation of statistical moments, the bit width of some intermediate results is reduced from 32 bits to 24 bits. This optimization reduces the consumption of DSP blocks and LUTs while its impact on the final enhancement effect is negligible, reflecting precision optimization under resource constraints.
The 64 × 64 readout circuit consists of a custom-designed mixed-signal AFE chip developed by our research group, together with an ADS850 converter and associated control modules. The AFE chip is shown in Figure 6.
The AFE features dual-mode operation, switchable between Integrate-Then-Read (ITR) and Integrate-While-Read (IWR). The corresponding timing diagrams are shown in Figure 7 and Figure 8, respectively.
Clk represents the system master clock with a frequency set to 20 MHz; rst_n is an active-low asynchronous hardware reset signal; fsync denotes the frame synchronization signal, whose low-level duration (6.5 μs or 36 μs) corresponds to the capacitor integration phase, and the falling edge of this signal marks the beginning of a new frame. lsync serves as the line synchronization signal, remaining high for 20 clock cycles to indicate the start of a new line readout. data_valid is a data-valid indicator that transitions to a high level one clock cycle after the rising edge of col_data_out<n>, remaining high until col_data_out<n> falls. data_flag functions as the sampling signal for the AD module, synchronized with the rising edge of data_valid, and is pulled low after maintaining a high level for one-fifth of a clock cycle.
Apart from the input and output signals, all other signals are internal timing signals, generated by the digital module and transmitted to the analog parts to coordinate on-chip analog–logic operations.
Furthermore, the system incorporates a 39-bit serial control word signal (ctrl_data). The most significant bit serves as the start flag, and the second bit is used for soft-reset control. The subsequent six bits define the operating mode configuration field. The following seven bits, after being quadrupled, specify the number of hold cycles for the column data output. The final four six-bit fields respectively define the start and end addresses for the windowing operation in both the row and column directions. Each bit of the control word must be maintained for eight clock cycles (i.e., 400 ns), and the entire control word must be fully transmitted within 39 × 8 + 10 clock cycles before the falling edge of the external fsync signal. Additionally, the transmission start time must not occur earlier than the rising edge of the previous fsync signal. A schematic diagram of the control word waveform is shown in Figure 9.
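As a sanity check of this timing budget (assuming the 20 MHz master clock stated earlier), the bit period and the total transmission window can be computed directly:

```python
# Timing budget for the 39-bit serial control word at the 20 MHz master clock
CLK_HZ = 20_000_000
CYCLE_NS = 1e9 / CLK_HZ            # 50 ns per clock cycle

BITS = 39
CYCLES_PER_BIT = 8                 # each control-word bit held for 8 cycles
bit_period_ns = CYCLES_PER_BIT * CYCLE_NS      # 400 ns, matching the text

total_cycles = BITS * CYCLES_PER_BIT + 10      # full word + 10-cycle margin
total_us = total_cycles * CYCLE_NS / 1000.0    # window before fsync falls

print(bit_period_ns, total_cycles, total_us)   # 400.0 ns, 322 cycles, 16.1 us
```

The 16.1 µs window must therefore fit entirely between the rising edge of one fsync pulse and the falling edge of the next.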
The schematic diagram of the control signals between the FPGA, AFE, and ADC is shown in Figure 10. The FPGA provides a total of five control signals to the AFE, while simultaneously supplying the clock to the ADC to ensure coordinated operation with the AFE. The 14-bit parallel output data and flag signals from the ADC are received and processed by the FPGA.
3.2. Image Enhancement Algorithm Module
The FPGA-based image enhancement algorithm primarily consists of six core modules: the computation unit, correction of defective rows and columns, median filtering, Gamma correction, CLAHE enhancement, and background noise separation. Among them, the computation unit is responsible for the real-time calculation of key image statistical features, including the mode, variance, maximum value, and mean.
Upon system initialization, the background image data p0 is loaded into RAM0 from the img_bg.mif file. The computation unit extracts the mode c0, which is then used to correct defective rows and columns in the raw image p1. Considering that the detector output typically contains substantial salt-and-pepper noise, a median filter is applied to p2 to prevent noise amplification during subsequent enhancement, producing the filtered image p3. The module simultaneously computes the maximum value c3 and variance c4 of the image, which are used for Gamma correction control and background noise separation threshold determination, respectively. When c4 exceeds the threshold T, statistical moment analysis is chosen to separate the background noise; otherwise, OTSU threshold segmentation is used.
Subsequently, the CLAHE algorithm is employed to further enhance image contrast. The computation unit then extracts the mean c1 and standard deviation c2 of the enhanced image p5, which serve as parameters for statistical moment analysis in background noise separation. The resulting background-free image p6 is processed through an additional median filtering stage to eliminate any residual impulse noise, yielding the final enhanced image p7. The algorithm flowchart is shown in Figure 11.
Regarding data flow scheduling, the raw image data (pic_data) is converted from parallel to serial format and written into RAM1. The output of each processing module is written back to RAM1 for in-place updates. Once the entire frame is processed, the enhanced image data p7 (ram_data) is transmitted to the host computer via the UART communication protocol.
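The module scheduling above can be summarized as a software reference model; the stage functions are passed in as placeholder callables, since the concrete implementations live in the FPGA modules:

```python
import numpy as np

def enhance_frame(p1: np.ndarray, p0: np.ndarray, *,
                  correct, median, gamma, clahe, separate,
                  var_threshold: float = 400.0) -> np.ndarray:
    """Software reference model of the FPGA processing chain p1 -> p7.

    p0 is the dark background frame loaded from img_bg.mif; the stage
    callables stand in for the hardware modules.
    """
    p2 = correct(p1, p0)                # defective row/column correction (uses mode c0)
    p3 = median(p2)                     # first median filtering stage
    c4 = float(p3.var())                # variance drives the threshold selector
    p4 = gamma(p3)                      # adaptive gamma from the maximum value c3
    p5 = clahe(p4)                      # contrast-limited equalization
    use_moments = c4 > var_threshold    # statistical moments (c1, c2) vs. OTSU
    p6 = separate(p5, use_moments)      # background noise separation
    p7 = median(p6)                     # final median filtering stage
    return p7
```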
The architecture of the FPGA-based image enhancement module is illustrated in Figure 12, with the major port descriptions summarized in Table 1.
4. Experimental Results and Analysis
The experiments are conducted on a computer running Windows 10 that is equipped with an Intel(R) Core(TM) i7-9750H 2.60 GHz CPU and 16 GB of RAM. To verify the accuracy and effectiveness of the proposed algorithm, both software simulation and hardware verification are performed. Algorithm simulations are carried out using PyCharm 5.0.3 and ModelSim-Altera 18.0, followed by FPGA board-level validation.
In the experiments, two types of imaging scenes are selected: the “PKU” character pattern and the Boya Pagoda, both captured using the HGFET high-sensitivity SWIR detector. The original images and the images processed by the proposed image enhancement algorithm are compared under various laser power densities to evaluate performance improvements.
Both subjective visual assessment and objective quantitative evaluation are conducted to analyze the enhancement results. The comparison results for the two imaging scenarios are shown in Figure 13 and Figure 14, respectively. The axes represent the pixel count.
To further verify the effectiveness, a comparative study is conducted under the same laser power density of 475 nW/nm². Four representative infrared image enhancement algorithms, namely HE, LCE, top-hat transformation, and Gaussian weighted local thresholding (GWLT), are selected for comparative analysis. The comparison results of the “PKU” character pattern under different algorithms are shown in Figure 15. The axes represent the pixel count. The color bar indicates that higher grayscale values correspond to brighter image intensity.
Human visual perception serves as an important criterion for evaluating image quality [18]. As shown in Figure 13 and Figure 14, the original images (a) exhibit noticeable defective rows and columns, which remain consistent regardless of image content. When the laser power density falls below 500 nW/nm² or exceeds 200 μW/nm², the images show low contrast and insufficient sensitivity in low-light conditions, or local overexposure in high-illumination regions. After processing with the proposed algorithm, the defective rows and columns are effectively removed while the original texture structure is preserved. The contrast and sensitivity under dark conditions are significantly enhanced, overexposure in bright regions is suppressed, and background noise is reduced, thereby improving the overall perceptibility of meaningful information.
The comparison results in Figure 15 demonstrate that although the traditional histogram equalization method improves global contrast, it also amplifies background noise, leaving substantial noise residues after subsequent denoising processes. The local contrast enhancement algorithm offers certain noise suppression capability, but it performs poorly in preserving target structures and exhibits limited contrast improvement. The top-hat transformation preserves target details better; however, it fails to completely remove background noise, and its overall enhancement effect remains weak. The Gaussian weighted local thresholding method effectively enhances image contrast but has limited denoising performance, causing prominent white-edge artifacts along target boundaries that degrade visual quality. In contrast, the proposed algorithm not only significantly enhances the contrast in target regions but also effectively suppresses background noise while maintaining image detail integrity.
From a subjective visual evaluation perspective, the processed images exhibit superior overall visual quality compared with both the original images and those produced by the other enhancement algorithms, demonstrating the proposed algorithm’s outstanding comprehensive enhancement performance.
From an objective evaluation standpoint, the image with defective rows and columns removed is used as the reference image. The effectiveness of the proposed algorithm is quantitatively assessed using four indicators: Average Gradient (AG), Information Entropy (IE) [19], Average Contrast (AC) [20], and SNR. AG reflects image sharpness and detail richness; a higher gradient value indicates a clearer image [4]. IE measures the randomness or uncertainty of the gray-level distribution in the image. A higher entropy indicates richer information content, more complex textures, and greater detail, while a lower entropy suggests smoother or more uniform regions [21]. AC quantifies the luminance difference between the brightest and darkest regions of the image. A higher contrast implies a clearer, more vivid image, whereas lower contrast results in dull or blurred visuals. SNR evaluates image quality by comparing the power of useful signals to that of background noise. A higher SNR indicates better signal quality and less noise interference.
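The four indicators can be computed straightforwardly in NumPy. The sketch below uses common formulations (forward-difference gradients for AG, Shannon entropy of the 8-bit histogram for IE, RMS gray-level contrast for AC, and a reference-based noise term for SNR); the paper may use slightly different variants, so treat these definitions as illustrative.

```python
import numpy as np

def average_gradient(img):
    """AG: mean gradient magnitude; higher = sharper detail."""
    g = img.astype(np.float64)
    gx = np.diff(g, axis=1)[:-1, :]   # horizontal differences
    gy = np.diff(g, axis=0)[:, :-1]   # vertical differences
    return np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0))

def information_entropy(img):
    """IE: Shannon entropy of the 8-bit gray-level histogram, in bits."""
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def average_contrast(img):
    """AC (RMS formulation): standard deviation of gray levels."""
    return img.astype(np.float64).std()

def snr_db(enhanced, reference):
    """SNR in dB, taking (enhanced - reference) as the noise term."""
    sig = np.mean(reference.astype(np.float64) ** 2)
    noise = np.mean((enhanced.astype(np.float64) - reference) ** 2)
    return 10.0 * np.log10(sig / noise) if noise > 0 else np.inf
```

Averaging each metric over the 10 consecutive frames then yields the mean ± error-margin values reported in the tables.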
Under the same optical power density, the original image (with defective rows and columns removed) is compared with the processed image over 10 consecutive frames, and the average ± error margin is calculated for each metric. The comparative results are shown in Table 2, Table 3, Table 4 and Table 5.
As shown in the tables above, the proposed multi-stage enhancement algorithm achieves significant improvements in the objective metrics, including AG, IE, and SNR. These gains are not incidental but arise from the purposefully designed processing stages of the algorithm. The improvement in AG primarily stems from the preceding defective row-column correction and median filtering, which provide a “clean” input for the CLAHE module, ensuring that local contrast enhancement acts on genuine structural details rather than noise. The increase in IE indicates that the adaptive gamma correction and CLAHE effectively expand the dynamic range and local information content of the image. The enhancement of SNR is mainly attributed to the variance-driven adaptive thresholding strategy, which distinguishes complex background variations from true targets, thereby achieving more accurate noise suppression.
The proposed algorithm is implemented on an Altera Cyclone IV E FPGA (model EP4CE10E22C8). To verify its practical feasibility, the algorithm’s performance is analyzed from three perspectives: comparison between software and hardware processing results, system clock cycles, and hardware resource utilization.
Figure 16 presents the physical setup of the infrared readout system designed for the HGFET high-sensitivity detector. The FPGA serves as the core control module, which generates control signals for both the HGFET detector and AFE chip.
As shown in Figure 17, where (a) represents the original image, (b) the Python-based simulation result, and (c) the FPGA-implemented output, the hardware-processed image demonstrates a high degree of consistency with the software simulation, thereby validating the correctness and effectiveness of the FPGA-based implementation.
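Consistency between the software simulation and the FPGA output can be checked pixel-wise. The helper below is a hypothetical sketch (the tolerance value is an assumption): small discrepancies are expected where the hardware uses fixed-point arithmetic in place of floating point.

```python
import numpy as np

def compare_outputs(sw, hw, tol=1):
    """Pixel-wise agreement between software and hardware results.

    Returns (max absolute difference, fraction of pixels within tol).
    Minor differences typically come from fixed-point rounding on the FPGA.
    """
    diff = np.abs(sw.astype(np.int32) - hw.astype(np.int32))
    return int(diff.max()), float(np.mean(diff <= tol))
```

A high within-tolerance fraction (close to 1.0) over the full frame supports the claimed software–hardware consistency.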
Based on measurements obtained through ModelSim-Altera simulation, the single-frame processing time of the infrared readout system incorporating the image enhancement algorithm is shown in Table 6.
The resource utilization of the proposed algorithm, after synthesis using Quartus Prime 18.0, is detailed in Table 7.
Synthesis results show that the theoretical maximum clock frequency (Fmax) of the design reaches 66.7 MHz. The resource utilization is summarized as follows: 2066 LUTs (20%), 1174 FFs (11%), and 4 DSP blocks (9% of the 46 embedded 18-bit multipliers). For images with a resolution of 64 × 64, the design achieves a maximum processing throughput of 1686 fps. Under the same image enhancement task, the proposed FPGA implementation delivers an approximately 42× speedup over an optimized OpenCV implementation running on the test computer (40 fps). In terms of energy efficiency, the estimated energy consumption per frame on the FPGA is 0.08 mJ, significantly lower than the CPU’s 1.5 mJ, representing more than an 18× improvement.
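The reported throughput and efficiency figures can be cross-checked arithmetically. The per-frame cycle count below is inferred from Fmax and the frame rate, so it is an estimate rather than a measured value:

```python
# Reported figures from the synthesis and benchmark results.
fmax_hz = 66.7e6      # theoretical maximum clock frequency
fps_fpga = 1686       # FPGA throughput at 64 x 64 resolution
fps_cpu = 40          # optimized OpenCV baseline on the CPU

cycles_per_frame = fmax_hz / fps_fpga          # ~39.6 k cycles (estimate)
speedup = fps_fpga / fps_cpu                   # ~42x, matching the reported value

energy_fpga_mj = 0.08  # estimated energy per frame on the FPGA
energy_cpu_mj = 1.5    # estimated energy per frame on the CPU
energy_ratio = energy_cpu_mj / energy_fpga_mj  # 18.75x, i.e. "more than 18x"
```

Each derived quantity agrees with the corresponding figure quoted in the text, which is a useful sanity check on the reported results.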
The proposed system achieves an actual maximum frame rate of 1706 Hz under the FPGA resource constraints. In comparison, the 64 × 64 infrared imaging system reported in [25] operates at 100 Hz. The proposed design thus demonstrates a significant improvement in frame-rate performance, far exceeding the 60 Hz refresh rate threshold required for human visual perception. In addition, a detailed power estimation was conducted using the Quartus PowerPlay Power Analyzer with an accurately generated Signal Activity File (SAF). Under typical operating conditions, the FPGA exhibits a total power consumption of 34 mW, demonstrating a clear advantage in energy efficiency compared with the referenced algorithms. This result indicates that the system is fully capable of meeting the stringent real-time image and video processing requirements of modern infrared imaging applications.
Nevertheless, this study has several limitations. First, the scalability of the algorithm to higher-resolution imagery is constrained by on-chip memory and computational resources; future work will investigate more efficient line-buffer architectures. Second, the algorithm is currently optimized for HGFET detectors, and adapting it to other sensor types requires re-calibration of key parameters. Developing an online self-calibration mechanism constitutes an important direction for improving generality. Finally, the robustness of the algorithm has been primarily validated under laboratory conditions. Its performance under more complex real-world scenarios—such as rapidly varying ambient illumination—may require the incorporation of more advanced adaptive mechanisms or evaluation on a broader and more diverse dataset.
5. Conclusions
In summary, this paper addresses the core challenges of weak photoresponse and low image contrast in carbon-based HGFET SWIR detectors under low-light conditions by proposing a dedicated image enhancement algorithm and implementing it on FPGA hardware.
The key insight stemming from our work is that a co-design strategy, which tightly couples algorithm design with hardware implementation constraints, is paramount for unlocking the full potential of novel detectors in practical systems. The proposed algorithm demonstrates several notable advantages. First, the defective row-column correction and background noise separation mechanisms effectively suppress fixed-pattern noise, providing a high-quality preprocessing foundation for weak signal extraction. Second, the adoption of an adaptive Gamma correction and exposure-constrained histogram equalization strategy significantly enhances image contrast while preventing over-enhancement and detail loss. Finally, the variance-based dynamic threshold selection mechanism improves the adaptability of background noise separation across different imaging scenarios, ensuring the reliable extraction of weak target features.
Comprehensive verification is conducted from three perspectives: subjective visual quality, objective performance, and system practicality. Comparative analyses of AG, AC, IE, and SNR confirm the algorithm’s superiority in both visual quality and quantitative performance. Furthermore, evaluation of the single-frame processing time and FPGA resource utilization demonstrates the system’s feasibility in terms of frame rate, power efficiency, and hardware resource allocation.
Experimental results validate that the proposed solution provides a high-performance and real-time processing framework for the next generation of HGFET-based low-light infrared imaging systems.
Our findings not only demonstrate a significant performance improvement but also provide a scalable architectural blueprint for embedding real-time intelligence into next-generation infrared imaging platforms. The principles of adaptive processing and resource-aware design established here are expected to be applicable to a broader class of high-sensitivity sensor systems.