Article

HT-NRC: A High-Throughput and Noise-Resilient Lossless Image Compression Architecture for Deep-Space CMOS Cameras

1 State Key Laboratory of Ultrafast Optical Science and Technology, Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi’an 710119, China
2 University of Chinese Academy of Sciences, Beijing 100049, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(6), 2873; https://doi.org/10.3390/app16062873
Submission received: 5 February 2026 / Revised: 26 February 2026 / Accepted: 4 March 2026 / Published: 17 March 2026
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

Lossless image compression is pivotal for deep-space exploration. Because deep-space missions demand both a high compression ratio and real-time processing under tight onboard resource budgets, traditional image compression algorithms remain the focus of attention. However, existing algorithms struggle with real-time processing speed and with compression degradation in high-noise regions, failing to meet the throughput demands of next-generation sensors. To address these challenges, this paper proposes a high-throughput and noise-resilient lossless image compression architecture, named HT-NRC, for deep-space CMOS cameras. First, to overcome the throughput bottleneck, we introduce a parallel processing method built on an index-based dispatch and reorder mechanism: pixel streams are dynamically distributed to parallel cores, and a Reorder Buffer restores the original sequence at the output. Second, to mitigate low compression efficiency in noisy backgrounds, we present a Heterogeneous Dual-Path Coding scheme, which adaptively separates structural information for predictive coding from stochastic noise for raw packing using a Bit-Plane Slicing (BPS) strategy. The proposed architecture was implemented on a Xilinx Virtex-7 FPGA (Xilinx, Inc., San Jose, CA, USA). Operating at 100 MHz, the system achieves a processing throughput of 414.7 Mpixel/s and a high average compression ratio on deep-space image datasets, while consuming an estimated total on-chip power of only 2.1 W. Experimental results show that the proposed method substantially outperforms existing baseline methods. Specifically, compared to an optimized serial JPEG-LS implementation processing one pixel per clock cycle, our parallel architecture achieves an approximately 314.7% increase in processing throughput.

1. Introduction

In the realm of modern deep-space exploration, high-resolution optical payloads have become indispensable for capturing detailed scientific data, ranging from planetary surface topography to faint astronomical phenomena. To acquire high-fidelity observations, these missions increasingly employ advanced imaging sensors characterized by high spatiotemporal resolution and wide dynamic range [1,2]. However, this pursuit of precision inevitably leads to an exponential surge in raw data generation. Given the severe asymmetry between the massive acquisition rates of these next-generation sensors and the extremely limited downlink bandwidth of deep-space probes [3], coupled with the strict power budgets that constrain onboard storage, the system faces a critical “transmission wall.” For instance, a state-of-the-art CMOS sensor targeted for deep-space exploration features a resolution of 2048 × 2048, a maximum frame rate of 300 fps, and a 12-bit depth. Operating at peak capacity, it generates an overwhelming raw data rate of over 15 Gbps. The continuous transmission bandwidth of standard data acquisition interfaces, by contrast, is highly constrained; typical Camera Link interfaces cap at 2.0 to 3.2 Gbps. Furthermore, saving uncompressed raw data consumes approximately 1.88 GB of onboard storage each second, meaning that a capacity-limited, space-grade solid-state recorder would be completely exhausted in just a few minutes. This severe mismatch between massive data generation, narrow interface bandwidth, and restricted storage capacity inevitably leads to data congestion and frame dropping during real-time observations. Consequently, deploying real-time, lossless image compression logic directly at the detector front-end has evolved from a performance optimization into a mission-critical necessity to guarantee the integrity and continuity of scientific data retrieval [4].
Conventional lossless image compression algorithms typically fall into three primary categories [5]: entropy coding, which targets statistical redundancy; predictive coding, which exploits inter-pixel correlations; and transform coding, which addresses global spatial redundancy. While these paradigms excel in computational efficiency and hardware implementability, they often struggle to balance high throughput with compression robustness when processing scientific imagery characterized by high-entropy noise. Image compression algorithms based on deep learning typically exhibit high compression ratios. These methods employ neural networks, such as Convolutional Neural Networks (CNNs) or Transformers, to capture complex non-linear features. However, these data-driven approaches require massive computational power and memory. Given the strict real-time constraints and limited logic resources of radiation-hardened FPGAs, researchers still prefer traditional algorithms for real-time processing implementation.
Due to its high compression ratio, the JPEG-LS standard is widely applied to images from deep-space exploration [6]. Its popularity stems from three key advantages: ultra-low computational complexity, minimal memory usage, and high efficiency. However, when processing raw data from high-speed CMOS sensors, the standard JPEG-LS method faces two critical limitations: First, the algorithm inherently resists parallel processing. Its core context modeling relies on immediate feedback from previous pixels to update prediction parameters [7]. This strict recursive dependency prevents simultaneous multi-pixel processing. Consequently, it fails to fully exploit the parallel acceleration capabilities of FPGAs. Second, compression efficiency degrades substantially in noisy image regions [8]. JPEG-LS operates by predicting values based on adjacent pixel correlations. However, noise typically manifests as a random and irregular distribution. This unpredictability disrupts the prediction model, resulting in low compression ratios for these areas.
To overcome these challenges, this paper proposes a Content-Aware Heterogeneous Compression Architecture, named HT-NRC. First, to solve the throughput bottleneck, we introduce an Index-Based Dispatch and Reorder mechanism. We distribute pixels to parallel processing cores based on their index, which effectively breaks serial dependencies. At the output stage, a Reorder Buffer (ROB) restores the data to the correct input order. Second, to improve compression ratio in noisy regions, we propose a Heterogeneous Dual-Path Coding scheme. We use Bit-Plane Slicing (BPS) to separate pixel data into two parts: Most Significant Bits (MSBs) for structural information and Least Significant Bits (LSBs) for random noise. We compress the MSBs using an optimized predictive encoder. Meanwhile, we directly pack the LSBs to avoid data expansion. The overall workflow is: the system dispatches raw pixels to parallel cores, splits the signal into structure and noise paths, and finally re-aligns the processed packets using the ROB. Experiments show that our method achieves the high throughput required for CMOS sensors and effectively compresses images with high background noise. Specifically, this work proposes an N-pixel/cycle content-aware pipeline with a provably lossless Bit-Plane Slicing (BPS) path and Reorder Buffer (ROB), achieving a throughput of over 400 Mpixel/s on a Virtex-7 FPGA.
Our main contributions are summarized as follows:
  • We propose a high-throughput, content-aware lossless image compression architecture specifically tailored for deep-space exploration payloads. This architecture effectively addresses the dual challenges of extreme processing speed requirements and efficiency degradation in high-noise images.
  • We introduce an index-based dispatch and reorder mechanism to overcome the inherent serial dependencies of the JPEG-LS algorithm. By decoupling context modeling from pixel order and restoring the sequence via hardware logic, this design enables scalable parallel processing on FPGAs.
  • We develop a heterogeneous dual-path coding scheme based on Bit-Plane Slicing (BPS). This method adaptively separates structural information from stochastic noise, applying predictive coding only in effective regions, substantially improving compression ratios in noisy image regions.
  • We implement the proposed system on a Xilinx Virtex-7 FPGA. Experimental results demonstrate that our method achieves a throughput of 414.7 Mpixel/s and a high average compression ratio, substantially outperforming existing solutions in both speed and robustness. Furthermore, post-implementation analysis estimates a total on-chip power consumption of 2.1 W, demonstrating high energy efficiency for power-constrained deep-space payloads.
The remainder of this paper is organized as follows. Section 2 reviews the related work on conventional and learning-based image compression algorithms, particularly focusing on their applications in deep space. Section 3 details the proposed HT-NRC methodology, encompassing the standard JPEG-LS algorithm, the content-aware parallel classification and allocation strategy, the noise-resilient bit-plane slicing mechanism, and the reorder buffer. Section 4 presents the experimental settings, comprehensive performance evaluations, and ablation studies. Finally, Section 5 provides the conclusions, alongside a discussion of the architecture’s limitations and directions for future work.

2. Related Work

To contextualize the proposed HT-NRC architecture, this section categorizes existing literature into three distinct domains: traditional standard algorithmic families, learning-based image compression, and image compression for deep-space images.

2.1. Image Compression Algorithms

Entropy coding, predictive coding, and transform coding constitute the three primary pillars of conventional lossless image compression. Entropy coding reduces statistical redundancy by modeling symbol probabilities. While older formats like PNG rely on simple pre-processing filters combined with Deflate (LZ77 and Huffman coding), modern standards like JPEG XL integrate advanced decision-tree context models (MANIAC) and Asymmetric Numeral Systems (ANS) to achieve superior throughput [9]. At the other end of the spectrum, low-complexity formats like QOI utilize hash-indexed streaming to deliver ultra-fast speeds for real-time applications [10]. Predictive coding minimizes information entropy by exploiting inter-pixel correlations. Complex methods like CALIC [11] and TMW [12] use adaptive gradient adjustments or multi-pass global optimization to maximize compression ratios. To balance performance with hardware complexity, the JPEG-LS standard [13] employs the simpler LOCO-I algorithm with Median Edge Detection (MED), making it highly suitable for onboard implementation. Finally, transform coding addresses global spatial redundancy. Standards like JPEG 2000 [14] use Integer Wavelet Transforms to support features like multi-resolution display, while JPEG XR [15] utilizes Lapped Biorthogonal Transforms to efficiently handle high-dynamic-range imaging. Collectively, these methods often face trade-offs between computational complexity and compression efficiency.

2.2. Learning-Based Image Compression

Concurrently, the rapid evolution of deep learning has promoted the application of numerous learning-based image compression algorithms [16]. To address bandwidth bottlenecks, researchers have explored FPGA-based acceleration for these models. For instance, Nakahara et al. [17] proposed a high-throughput CNN inference scheme using custom JPEG compression, achieving a 3.84× throughput increase via a fully pipelined architecture, though its block-based nature limits the compression ratio. Wang et al. introduced AsymLLIC [18], an asymmetric architecture that uses progressive training to create a lightweight decoder (19.65 M parameters) suitable for low-power terminals. Similarly, Mazouz et al. [19] optimized deployment using knowledge distillation and structured pruning, achieving 58.7 FPS on a ZCU102 FPGA (Xilinx, Inc., San Jose, CA, USA). While these data-driven methods offer superior compression ratios and reconstruction quality compared to traditional algorithms, their computational demands remain prohibitively high for real-time processing applications. Even with pruning and quantization, Learned Image Compression (LIC) models require thousands of DSP units for massive matrix operations. In stark contrast to hardware-friendly algorithms like JPEG-LS, deep learning approaches exhibit suboptimal power and area efficiency. The massive parameter scale and memory bandwidth requirements pose significant challenges for radiation-hardened FPGAs, where logic resources and power budgets are strictly constrained.

2.3. Image Compression for Deep-Space Image

High processing throughput is a fundamental prerequisite for realizing real-time image compression in deep-space camera systems. Researchers generally prioritize conventional low-complexity algorithms as the primary solution. Santos et al. [20] designed and implemented technology-independent IP cores compliant with the CCSDS 121.0-B-2 standard [21] for multispectral and hyperspectral satellite imagery. Their implementation, capable of supporting configurable compression orders (BSQ/BIP/BIL), achieved a maximum throughput of 153.5 MSamples/s on a Virtex-5 device and is adaptable to space-qualified FPGAs. Li et al. [22] proposed a real-time lossless compression system for Bayer pattern images based on a modified JPEG-LS algorithm, achieving a maximum throughput of 346.41 MPixel/s on a Xilinx XC7Z045 SoC. Machairas and Kranitis [23] proposed a high-performance architecture for the 9/7M Integer Discrete Wavelet Transform (DWT) compliant with the CCSDS 122.0-B-1 standard [24], adopting parallel design and elastic pipeline principles to achieve a throughput of 2 samples/cycle.
Existing approaches often struggle to balance high processing throughput with robust compression performance in the presence of sensor noise. To address the throughput bottleneck, we propose an Index-Based Dispatch and Reorder architecture to enable scalable hardware parallelism. Simultaneously, to resolve the efficiency degradation in noisy regions, we design a Heterogeneous Dual-Path Coding scheme that utilizes Bit-Plane Slicing to adaptively decouple signal from noise.

3. Proposed Methodology

This section details the architectural framework of the proposed HT-NRC. The top-level hardware architecture is illustrated in Figure 1, delineating the end-to-end superscalar pipeline from parallel pixel ingestion to serialized bitstream generation. The comprehensive workflow is organized into four logical stages: Section 3.1 introduces the Standard JPEG-LS Algorithm as the core compression engine for high-signal regions; Section 3.2 describes the Content-Aware Parallel Classification and Allocation system designed to resolve context dependencies; Section 3.3 details the Noise-Resilient Bit-Plane Slicing mechanism for efficient background noise processing; Section 3.4 elucidates the Reorder Buffer mechanism, which manages out-of-order execution to ensure the strict consistency of the output stream; and finally, Section 3.5 presents the corresponding decoder architecture to validate the bit-exact, strictly lossless reconstruction of the generated bitstream.

3.1. Standard JPEG-LS Algorithm

To process the spatially correlated Spot Regions containing critical scientific data, we employ the standard JPEG-LS algorithm to guarantee strict lossless compression. This choice exploits the local smoothness of the spot signal to maximize decorrelation efficiency. The fundamental working principles of the LOCO-I algorithm, which underpins the JPEG-LS standard, are detailed as follows.
The encoding process initiates with the elimination of spatial redundancy using a non-linear predictor. Let $x_i$ be the current pixel, and $R_a$, $R_b$, $R_c$ be its reconstructed neighbors (left, up, and up-left, respectively). The Median Edge Detector (MED) predicts the value of $x_i$ by detecting vertical or horizontal edges in the local neighborhood, as shown in Equation (1).
$$P_{\mathrm{MED}} = \begin{cases} \min(R_a, R_b), & \text{if } R_c \geq \max(R_a, R_b) \\ \max(R_a, R_b), & \text{if } R_c \leq \min(R_a, R_b) \\ R_a + R_b - R_c, & \text{otherwise} \end{cases}$$
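As a reference for the description above, the MED predictor of Equation (1) can be sketched in a few lines (an illustrative software model, not the FPGA implementation):

```python
def med_predict(Ra: int, Rb: int, Rc: int) -> int:
    """Median Edge Detector (MED) from LOCO-I / JPEG-LS."""
    if Rc >= max(Ra, Rb):
        return min(Ra, Rb)   # horizontal edge above: clamp to the smaller neighbor
    if Rc <= min(Ra, Rb):
        return max(Ra, Rb)   # vertical edge to the left: clamp to the larger neighbor
    return Ra + Rb - Rc      # smooth region: local plane model
```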
This predictor implicitly assumes a local plane model ($x = R_a + R_b - R_c$) in smooth regions, while clamping the prediction to the nearest neighbor in the presence of sharp edge transitions, thereby minimizing the energy of the prediction error. To effectively capture local texture characteristics and exploit conditional probability models, the algorithm must first compute local gradients and map them to a unique context index $Q$. The system calculates three local gradient differences within the causal neighborhood, as shown in Equation (2).
$$D_1 = R_d - R_b, \quad D_2 = R_b - R_c, \quad D_3 = R_c - R_a$$
Since the dynamic range of the raw gradients $D_j$ is too large for direct indexing, they are quantized into low-range integers $Q_j \in \{-4, \ldots, +4\}$ using a set of predefined thresholds $\{T_1, T_2, T_3\}$. The quantization function $Q(\cdot)$ is defined in Equation (3).
$$Q_j = \begin{cases} -4, & \text{if } D_j \leq -T_3 \\ -3, & \text{if } -T_3 < D_j \leq -T_2 \\ -2, & \text{if } -T_2 < D_j \leq -T_1 \\ -1, & \text{if } -T_1 < D_j < 0 \\ 0, & \text{if } D_j = 0 \\ +1, & \text{if } 0 < D_j \leq T_1 \\ +2, & \text{if } T_1 < D_j \leq T_2 \\ +3, & \text{if } T_2 < D_j \leq T_3 \\ +4, & \text{if } D_j > T_3 \end{cases}$$
The resulting triplet $(Q_1, Q_2, Q_3)$ constitutes $9^3 = 729$ possible states. To reduce the context memory size, the state space is collapsed by exploiting the symmetry of image gradients. A sign control variable $s$ is defined such that if the first non-zero quantized gradient is negative, the signs of all gradients are flipped. The final unique context index $Q$, compressed to the range $[0, 364]$, is generated via a mixed-radix mapping, as shown in Equation (4).
$$Q = 81 Q_1 + 9 Q_2 + Q_3$$
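The gradient quantization of Equation (3), the sign-flip symmetry reduction, and the mixed-radix mapping of Equation (4) can be modeled together as follows (a software sketch; the thresholds T1 = 3, T2 = 7, T3 = 21 are the common JPEG-LS defaults for 8-bit data and are assumptions here, not values specified in this paper):

```python
def quantize_gradient(d: int, T1: int = 3, T2: int = 7, T3: int = 21) -> int:
    """Map a raw gradient to Q_j in {-4..+4} (thresholds are assumed 8-bit defaults)."""
    mag = abs(d)
    if mag == 0:
        q = 0
    elif mag <= T1:
        q = 1
    elif mag <= T2:
        q = 2
    elif mag <= T3:
        q = 3
    else:
        q = 4
    return -q if d < 0 else q

def context_index(Rd: int, Rb: int, Rc: int, Ra: int) -> tuple:
    """Return (Q, s): the context index in [0, 364] and the sign-flip flag s."""
    qs = [quantize_gradient(Rd - Rb),
          quantize_gradient(Rb - Rc),
          quantize_gradient(Rc - Ra)]
    s = 1
    first = next((q for q in qs if q != 0), 0)
    if first < 0:                      # flip so the first non-zero gradient is positive
        qs = [-q for q in qs]
        s = -1
    return 81 * qs[0] + 9 * qs[1] + qs[2], s
```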
This index $Q$ points to a specific entry in the global context memory, which stores cumulative prediction error statistics ($C[Q]$), occurrence counts ($N[Q]$), accumulated absolute errors ($A[Q]$), and bias correction values ($B[Q]$). To eliminate systematic errors, the MED prediction is refined using the historical average bias of the current context, as shown in Equation (5).
$$P_{\mathrm{corrected}} = P_{\mathrm{MED}} + s \cdot B[Q]$$
This feedback loop ensures that systematic offsets caused by specific sensor characteristics or spot intensity distributions are adaptively cancelled out over time. Simultaneously, the context model determines the optimal Golomb-Rice coding parameter $k$ for the current context, derived from the average absolute error estimate to control the code length in the subsequent stage, as shown in Equation (6).
$$k = \min \{ k \in \mathbb{Z}_{\geq 0} \mid 2^k \cdot N[Q] \geq A[Q] \}$$
The final arithmetic step involves calculating the prediction residual $\epsilon = x_i - P_{\mathrm{corrected}}$ (typically computed modulo $2^W$ to handle boundary conditions) and mapping it to a non-negative integer. Since standard entropy coders operate on non-negative values, the signed error $\epsilon$ is transformed using the interleaved mapping function defined in the standard, as shown in Equation (7).
$$M(\epsilon) = \begin{cases} 2\epsilon, & \epsilon \geq 0 \\ 2|\epsilon| - 1, & \epsilon < 0 \end{cases}$$
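Equations (6) and (7) translate directly into the following sketch (illustrative only; the loop form of the k search is a standard way to realize the minimization in hardware-friendly terms):

```python
def golomb_k(N: int, A: int) -> int:
    """Smallest k with 2^k * N >= A (Equation (6))."""
    k = 0
    while (N << k) < A:
        k += 1
    return k

def map_error(eps: int) -> int:
    """Interleave signed residuals into non-negative integers (Equation (7))."""
    return 2 * eps if eps >= 0 else 2 * abs(eps) - 1
```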
Upon completion, the context variables $A[Q]$, $B[Q]$, $C[Q]$, $N[Q]$ are updated based on the current $\epsilon$ to provide more accurate statistical support for future occurrences of the same texture.

3.2. Content-Aware Parallel Classification and Allocation

As illustrated in Figure 2, this architecture ingests N pixels per cycle in parallel. It first employs a gradient-based region identification algorithm to separate valid structural signals from sensor noise with minimal latency. Within the causal neighborhood of the current pixel $x_i$, let $R_a$, $R_b$, $R_c$, and $R_d$ denote the reconstructed pixels located to the left, above, top-left, and top-right of the current position, respectively. The system calculates the local texture complexity $D_{\mathrm{local}}$ according to Equation (8).
$$D_{\mathrm{local}}(x_i) = |R_d - R_b| + |R_b - R_c| + |R_c - R_a|$$
To generate the control mask $M(x_i)$ used for subsequent distribution, the gradient metric is compared against an adaptive threshold derived from the sensor’s noise characteristics. Let $\sigma_{\mathrm{sensor}}$ be the inherent standard deviation of the sensor noise and $\gamma$ be the confidence coefficient. The classification decision function is defined as Equation (9).
$$M(x_i) = \begin{cases} 0\ (\mathrm{Noise}), & \text{if } D_{\mathrm{local}}(x_i) < \gamma \sigma_{\mathrm{sensor}} \\ 1\ (\mathrm{Spot}), & \text{if } D_{\mathrm{local}}(x_i) \geq \gamma \sigma_{\mathrm{sensor}} \end{cases}$$
In practical on-orbit operations, the baseline noise standard deviation $\sigma_{\mathrm{sensor}}$ is not statically hardcoded but periodically updated. It is dynamically estimated utilizing the sensor’s optical black (OB) pixels at the edge of the CMOS array, or calculated during routine dark-frame calibration phases. This ensures the threshold reliably adapts to temperature fluctuations and radiation-induced dark current degradation over the mission lifetime. The confidence coefficient $\gamma$ is a pre-configured empirical parameter that controls the sensitivity of the mask. A higher $\gamma$ severely restricts the spot path to only strong structural gradients, thereby minimizing False Positives (noise falsely identified as structure) but potentially increasing False Negatives (weak structural edges bypassed to the noise path).
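The classification of Equations (8) and (9) reduces to a few comparisons, as the following software sketch shows (the default gamma = 3.0 is an illustrative choice, not a value reported in the paper):

```python
def classify(Rd: int, Rb: int, Rc: int, Ra: int,
             sigma_sensor: float, gamma: float = 3.0) -> int:
    """Return 1 (spot) when local texture exceeds the noise-adaptive
    threshold gamma * sigma_sensor, else 0 (noise). gamma is illustrative."""
    d_local = abs(Rd - Rb) + abs(Rb - Rc) + abs(Rc - Ra)
    return 1 if d_local >= gamma * sigma_sensor else 0
```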
The allocation unit distributes pixels to the target processing core based on the classification mask M . The strategy dynamically switches scheduling logic based on the value of mask M . For noise pixels ( M = 0 ), a stateless round-robin schedule is adopted to maximize throughput. For spot pixels ( M = 1 ), the system utilizes a look-ahead computation module to pre-calculate the context index Q . It then applies a Bit-XOR strategy for mapping to address the load imbalance caused by the distribution of gradients. The unified allocation function is expressed as Equation (10).
$$\mathrm{Core}_{\mathrm{target}}(P_i) = \begin{cases} i \bmod N_{bg}, & \text{if } M(x_i) = 0 \\ (Q_{low} \oplus Q_{high}) \bmod N_{groups}, & \text{if } M(x_i) = 1 \end{cases}$$
where $Q_{high}$ and $Q_{low}$ represent the upper and lower halves of the context index $Q$, respectively. Specifically, assuming $Q$ is mapped to a 10-bit integer, it is symmetrically split such that $Q_{high} = Q[9{:}5]$ and $Q_{low} = Q[4{:}0]$. The rationale behind this XOR-based splitting logic ($Q_{low} \oplus Q_{high}$) is to implement a lightweight, single-cycle hardware spatial hashing mechanism. In typical remote sensing images, contiguous pixels often share identical or highly correlated context states, which would naturally lead to severe localized congestion if a linear modulo assignment were used. By XOR-folding the high and low bits of $Q$, the architecture effectively breaks these regular spatial strides, pseudo-randomly distributing the context mappings across the available processing groups. This substantially mitigates structural hazards and guarantees a highly balanced workload distribution among the parallel cores.
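A minimal software model of the allocation function in Equation (10) is given below (the core counts n_bg and n_groups are illustrative; the 5-bit/5-bit split assumes the 10-bit Q described above):

```python
def core_target(i: int, mask: int, Q: int,
                n_bg: int = 4, n_groups: int = 32) -> int:
    """Equation (10): round-robin for noise pixels, XOR-folded context
    hash for spot pixels. Core counts are illustrative assumptions."""
    if mask == 0:
        return i % n_bg                    # stateless round-robin over noise cores
    q_high = (Q >> 5) & 0x1F               # Q[9:5]
    q_low = Q & 0x1F                       # Q[4:0]
    return (q_low ^ q_high) % n_groups     # spatial hash breaks regular context strides
```

Note that two pixels with identical Q always hash to the same group (e.g., Q = 5 maps to group 5), which is exactly what lets the scheduler detect context hazards locally within a group.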
To elucidate the scheduling behavior of this algorithm under mixed workloads, we present a practical case study illustrated in Figure 3, where spot encoding resources are divided into N processing groups. Consider 4 input pixels [P0, P1, P2, P3]. Through look-ahead computation, P1 is identified as a noise pixel, while the others are spot pixels with context indices $Q(P_0) = 5$, $Q(P_2) = 5$, and $Q(P_3) = 37$.
The allocation unit routes pixel P1 to the noise cores based on its mask. Simultaneously, the system maps the remaining spot pixels. For P3, the XOR-folded index $Q = 37$ maps to processing group $Group_4$; since this group is idle, P3 is immediately assigned an encoder and processed in parallel with P1. A resource conflict arises for P0 and P2: both are simultaneously assigned to the same group, $Group_5$. To resolve this conflict, the scheduler performs a precise comparison of their context indices. Upon detecting that $Q(P_0)$ and $Q(P_2)$ are identical (both equal 5), the scheduler identifies a context hazard. Consequently, pixel P0 is immediately processed, while P2 is forcibly assigned to the same encoder but placed in a wait queue, commencing processing only in the subsequent clock cycle after P0 completes its context update. Conversely, had P0 and P2 mapped to the same group but possessed distinct $Q$ values, the system would have assigned them to different encoders within that group to achieve intra-group parallelism. This two-stage scheduling strategy maximizes hardware utilization while guaranteeing context consistency.
Crucially, this conflict resolution mechanism provides a theoretical guarantee that the proposed parallel architecture strictly preserves the core semantics of the standard JPEG-LS algorithm. In a traditional single-core raster-scan encoder, the context variables ( A [ Q ] , B [ Q ] , C [ Q ] , N [ Q ] ) are updated strictly sequentially. Our index-based dispatch ensures that any concurrent pixels sharing an identical context state are forcefully serialized via the wait queue or assigned to the same processing core to maintain their original relative order. Combined with the Reorder Buffer (ROB) at the output stage, which restores the absolute spatial sequence of the bitstream, the predictive residuals and updated context variables generated by our parallel MSB path are mathematically identical to those of a serial encoder. Consequently, the architecture strictly maintains the bit-exact lossless nature of JPEG-LS without any algorithmic deviation.
To formalize scheduling correctness: two concurrently dispatched pixels $P_i$ and $P_j$ can safely execute in parallel without interleaving hazards if and only if their context indices differ. If identical contexts are detected, a structural hazard occurs; to strictly maintain the sequential context-update semantics of JPEG-LS, the pixels must be forcefully serialized, with the chronologically later pixel temporarily buffered in a Wait Queue (WQ).
Furthermore, we must consider pathological load imbalances, such as when processing extremely large homogeneous regions where all N concurrent pixels might share the exact same context index. To guarantee system stability under such extreme conditions, a hardware back-pressure mechanism is implemented. Assuming the Wait Queue has a maximum depth of $D_{WQ}$, in the absolute worst-case scenario where all N pixels continuously target the same processing group, 1 pixel is actively processed while $N-1$ pixels are pushed into the queue per clock cycle. The wait queue therefore reaches full capacity in $\lceil D_{WQ}/(N-1) \rceil$ cycles. Upon saturation, a back-pressure stall signal is immediately asserted to the upstream Dispatch Unit, temporarily halting the ingestion of new pixels until the queue drains. This worst-case stall bound ensures that no data packets are overwritten or dropped during extreme local congestion, thereby guaranteeing deterministic timing closure and absolutely lossless behavior.
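The worst-case fill bound can be checked with a one-line model (the queue depth and lane count used below are illustrative, not the implemented configuration):

```python
import math

def stall_cycles(d_wq: int, n: int) -> int:
    """Worst case: all N lanes target one group, so 1 pixel is served and
    N-1 are queued per cycle; the queue saturates in ceil(D_WQ/(N-1)) cycles."""
    return math.ceil(d_wq / (n - 1))
```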

3.3. Noise-Resilient Bit-Plane Slicing

The pixels in the background area are usually composed of sensor readout noise. Applying conventional differential predictive coding to these stochastic regions often fails to reduce data redundancy. Consequently, this section introduces a Noise-Resilient Bit-Plane Slicing (BPS) mechanism designed to physically segregate the stochastic noise components from the deterministic structural information.
The algorithm first determines the optimal bit-plane slicing depth, denoted as $S$, based on the characteristics of the CMOS sensor or image. The objective of this parameter is to encapsulate the picture’s inherent noise floor within the lower bit planes, ensuring that the information retained in the upper bit planes is dominated by illumination trends rather than shot noise. Let $\sigma_{\mathrm{noise}}$ represent the standard deviation of the noise at the current gain level, and $\alpha$ be a confidence coefficient. The slicing depth $S$ is defined as the smallest integer bit-width sufficient to cover the noise amplitude:
$$S = \lceil \log_2(\alpha \cdot \sigma_{\mathrm{noise}}) \rceil$$
Through this calculation, the $W$-bit data of the input pixel is logically partitioned into a Most Significant Bit (MSB) segment and a Least Significant Bit (LSB) segment.
As illustrated in Figure 4, the input pixel x i is decomposed into two orthogonal data streams processed via distinct pipelines.
The Structure Path processes the deterministic high-order bits, $x_{MSB}$. Since the high-frequency stochastic noise has been truncated, this component retains strong spatial correlation. These values are forwarded to a lightweight linear predictor to remove spatial redundancy, followed by Golomb-Rice entropy coding to generate the final MSB bitstream.
The Noise Path handles the stochastic low-order bits, $x_{LSB}$. This component comprises thermal and shot noise, which is statistically incompressible. Instead of attempting entropy coding, the system employs a Raw Packing strategy: these $S$ bits are transmitted directly to the output buffer, avoiding the computational cost and probability-modeling overhead associated with predictive coding.
To explicitly clarify the lossless nature of this architecture, it is imperative to note that the BPS mechanism strictly preserves all W bits of the original pixel. Mathematically, the original pixel x i is exactly reconstructed at the decoder by concatenating the two data streams. Because the MSB path employs strictly lossless predictive coding and the LSB path utilizes raw bitstream packing without any quantization or truncation, no information is discarded. Consequently, the selection of the slicing depth S merely determines the boundary between the predictive and raw-packing domains. It does not introduce any bias to the MSB predictor, nor does it compromise the bit-exact reconstruction of the original data, ensuring that the system remains 100% mathematically lossless regardless of the S value.
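The split-and-reassemble behavior described above, together with the slicing-depth rule for S, can be verified with a short software model (alpha = 2.0 is an illustrative default; the exact invertibility of the concatenation is what guarantees losslessness):

```python
import math

def slicing_depth(sigma_noise: float, alpha: float = 2.0) -> int:
    """Smallest bit-width covering the noise amplitude: S = ceil(log2(alpha * sigma))."""
    return math.ceil(math.log2(alpha * sigma_noise))

def bps_split(x: int, s: int) -> tuple:
    """Slice a W-bit pixel into (MSB, LSB) planes; no bit is discarded."""
    return x >> s, x & ((1 << s) - 1)

def bps_join(msb: int, lsb: int, s: int) -> int:
    """Exact reconstruction by concatenation, so the scheme is lossless for any S."""
    return (msb << s) | lsb
```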

3.4. Reorder Buffer

As described in the previous section, the system processes “spot” and “background” pixels in parallel. However, these two paths have different processing speeds. The spot path is slower because it requires complex context updates, while the background path is very fast. This speed difference causes pixels to finish in a random, out-of-order sequence. To solve this, we introduce a Reorder Buffer (ROB) to restore the strict raster-scan order required for the final image.
The ROB acts as a smart “sorting pool.” It uses a “Reserve-Cache-Sort” mechanism to ensure that data leaves the system in the exact same order it entered, regardless of when it finishes processing. The specific workflow consists of three stages, as shown below.
  • Before processing begins, the system assigns a “seat” in the ROB for every input pixel. An Allocation Pointer moves sequentially to reserve N empty entries for the current batch of pixels. The address of each reserved entry serves as a unique Transaction Tag. This tag is sent along with the pixel data to the processing units, effectively marking the pixel’s correct position in the final sequence.
  • Once a processing unit finishes compressing a pixel, it uses the Transaction Tag to find its reserved “seat” in the ROB. It writes the compressed data directly into that entry and sets a Valid Bit to 1. Since background pixels are processed faster, they may arrive at the ROB much earlier than spot pixels. The ROB temporarily holds these early results until the slower predecessors catch up.
  • At the output end, a Commit Pointer monitors the ROB. It strictly checks entries one by one in the original order. If the current entry is ready, the system reads out the data and moves the pointer to the next entry. If the current entry is not ready, the pointer pauses and waits. This mechanism forces all subsequent data to wait, guaranteeing that the final output stream is strictly continuous and ordered.
It is important to note that the primary function of the ROB is to guarantee strict spatial sequence restoration for pixels processed by the heterogeneous paths. To prevent the ROB from becoming a new system bottleneck due to latency mismatches (specifically, the need to cache "fast" background pixels while waiting for "slower" predictive spot pixels), its depth is highly scalable. Given the abundant Block RAM (BRAM) resources on the target Xilinx Virtex-7 FPGA (1470 BRAMs totaling 52.9 Mb), the ROB capacity can be expanded substantially to accommodate high-resolution images with complex spot distributions. In our implementation, the ROB is configured with a deep buffer of 2048 entries, consuming less than 1.5% of the available on-chip BRAM (approximately 22 BRAM tiles). Because this buffer occupies only a marginal fraction of on-chip memory while being deep enough to absorb worst-case latency variations, commit stalls are practically eliminated at the 100 MHz operating frequency, guaranteeing deterministic system throughput.
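The Reserve-Cache-Sort workflow above can be modeled compactly in software. In the sketch below (names are illustrative; the hardware uses BRAM-backed storage rather than Python lists), entry addresses double as transaction tags, results may be written back out of order, and the commit pointer stalls until the oldest entry is valid:

```python
class ReorderBuffer:
    """Minimal software model of the Reserve-Cache-Sort mechanism."""

    def __init__(self, depth):
        self.entries = [None] * depth   # compressed payload per slot
        self.valid = [False] * depth
        self.alloc_ptr = 0              # reserves seats in input order
        self.commit_ptr = 0             # drains seats in the same order

    def allocate(self):
        """Reserve a seat; the returned tag travels with the pixel to a core."""
        tag = self.alloc_ptr
        self.alloc_ptr = (self.alloc_ptr + 1) % len(self.entries)
        return tag

    def writeback(self, tag, data):
        """Out-of-order completion: a core fills its reserved seat."""
        self.entries[tag] = data
        self.valid[tag] = True

    def commit(self):
        """Drain consecutive ready entries; stall at the first missing one."""
        out = []
        while self.valid[self.commit_ptr]:
            out.append(self.entries[self.commit_ptr])
            self.valid[self.commit_ptr] = False
            self.commit_ptr = (self.commit_ptr + 1) % len(self.entries)
        return out

rob = ReorderBuffer(8)
tags = [rob.allocate() for _ in range(4)]
rob.writeback(tags[2], "bg2")   # fast background pixel finishes first
assert rob.commit() == []       # commit stalls: slot 0 is not ready yet
rob.writeback(tags[0], "spot0")
rob.writeback(tags[1], "bg1")
assert rob.commit() == ["spot0", "bg1", "bg2"]
```

The final assertion shows the key property: even though "bg2" arrived first, the output order matches the allocation (raster-scan) order.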

3.5. Decoder Architecture

This section details the corresponding decoder architecture. Because the Reorder Buffer (ROB) at the encoding stage perfectly restores the processed data to its original raster-scan order, the decoder can seamlessly process the interleaved bitstream through a unified state machine and a shared line buffer, enabling deterministic, pixel-by-pixel recovery.
The decoder initially parses the global header of the compressed bitstream to extract fundamental image parameters, including resolution, pixel bit-depth W, and the slicing parameter S. Crucially, the decoder must synchronously acquire the classification map generated during the encoding phase. This map serves as the primary control signal for the decoding state machine, accurately identifying the category (spot or background) of each interleaved pixel within the bitstream and directing it to the appropriate decoding branch.
The decoder processes the image strictly in the raster-scan order and maintains a globally shared on-chip line buffer. Because spot and background pixels are spatially adjacent and interleaved in natural images, this unified line buffer ensures that regardless of the current pixel’s category, its prediction process always accesses the most recently updated and accurate causal neighborhood. For the i-th pixel, the decoder enters different parsing branches based on the classification signal:
  • Spot Pixel Branch: If the current pixel is flagged as a spot region, the decoder reads the variable-length Golomb-Rice codeword from the main bitstream and applies an inverse mapping to obtain the residual value. It then uses the neighboring pixels stored in the line buffer to perform standard linear prediction, and the prediction value is added to the residual to losslessly recover the original spot pixel x_i.
  • Background Pixel Branch: If flagged as a background region, bitstream parsing involves two sequential operations. First, the decoder reads the variable-length Golomb-Rice codeword and combines it with the causal prediction to recover the Most Significant Bits (x_MSB). Immediately afterward, the decoder extracts a fixed length of S bits from the bitstream, which serve as the transparently packed Least Significant Bits (x_LSB). Finally, the complete background pixel is losslessly reconstructed via a deterministic bitwise shift-and-add operation.
Upon completion of either decoding branch, the fully reconstructed pixel x_i is immediately written back to the shared line buffer, providing an accurate causal reference for the subsequent pixel. This cycle repeats until the entire image is perfectly reconstructed.
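Both branches parse variable-length Golomb-Rice codewords. As a hedged illustration of that codeword format (the textbook unary-quotient plus binary-remainder scheme; the hardware coder's parameter adaptation and signed-residual mapping are not reproduced here), the following Python sketch encodes and decodes non-negative values with a fixed parameter k:

```python
def rice_encode(value, k):
    """Golomb-Rice code: unary quotient, a '0' stop bit, then k remainder bits."""
    q, r = value >> k, value & ((1 << k) - 1)
    return "1" * q + "0" + format(r, f"0{k}b")

def rice_decode(bits, pos, k):
    """Parse one codeword starting at bit offset pos; return (value, new_pos)."""
    q = 0
    while bits[pos] == "1":     # count the unary quotient
        q += 1
        pos += 1
    pos += 1                    # skip the terminating '0'
    r = int(bits[pos:pos + k], 2)
    return (q << k) | r, pos + k

# Round-trip a small residual sequence through one concatenated bitstream.
k = 2
stream = "".join(rice_encode(v, k) for v in [0, 5, 13])
pos, decoded = 0, []
while pos < len(stream):
    v, pos = rice_decode(stream, pos, k)
    decoded.append(v)
assert decoded == [0, 5, 13]
```

Because each codeword is self-delimiting (the '0' stop bit ends the quotient), the decoder can walk the interleaved stream pixel by pixel without explicit length fields.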
To empirically verify the absolute losslessness of this heterogeneous interleaved decoding logic, an end-to-end software testbench was established. For all image frames within the evaluation datasets detailed in Section 4, the mixed compressed bitstreams generated by the FPGA encoder were fed into the decoder. The reconstructed output images were then subjected to a rigorous end-to-end MD5 hash checksum matching against the original, uncompressed raw images. The test results demonstrated a 100% perfect match across all image frames.

4. Results and Analysis

To rigorously evaluate the proposed architecture, this section first outlines the experimental setup and evaluation metrics. Subsequently, it presents a comprehensive comparative analysis of processing throughput, hardware resource utilization, and compression efficiency against baseline models.

4.1. Experiment Setting

To ensure a comprehensive and rigorous evaluation of the proposed HT-NRC architecture, we established a diversified experimental environment covering dataset construction, evaluation metrics definition, and hardware implementation details.

4.1.1. Benchmark Datasets

To evaluate the compression performance of the proposed HT-NRC architecture, we utilized the high-resolution (HR) ground-truth images from the STAR benchmark dataset [25]. This dataset is highly suitable for our evaluation because its HR corpus comprehensively covers dense star clusters, sparse galactic fields, and regions with varying background noise. These characteristics perfectly align with the typical observational scenarios and noise profiles targeted by our hardware compressor.
Specifically, we selected 200 HR images featuring “sparse galactic fields” from the “×4” subset of the STAR dataset. Since the original astronomical observations are provided in scientific data formats—namely, .npy arrays and .fits files—a standardized preprocessing pipeline was implemented to prepare the data for the hardware testbench. For the .npy files, the raw array data was loaded, normalized, and mapped to an 8-bit grayscale depth. Similarly, for the .fits files, the primary image extension data was extracted and scaled to an 8-bit range. Following this bit-depth conversion, all images were exported as uncompressed Portable Gray Map (.pgm) files. These 8-bit .pgm files were subsequently utilized as the direct pixel input stimuli for our FPGA hardware verification environment.
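The .npy branch of this preprocessing pipeline can be sketched as follows. The exact normalization used is not specified above, so min-max scaling to 8 bits is assumed here, and .fits files would be handled analogously via a FITS reader (e.g., astropy, omitted for brevity); all names are illustrative:

```python
import numpy as np

def npy_to_pgm(npy_path, pgm_path):
    """Load a raw .npy array, map it to 8-bit grayscale, and export binary PGM (P5).

    Assumption: min-max normalization; the paper only states that arrays were
    'normalized and mapped to an 8-bit grayscale depth'.
    """
    data = np.load(npy_path).astype(np.float64)
    lo, hi = data.min(), data.max()
    scaled = (data - lo) / (hi - lo) if hi > lo else np.zeros_like(data)
    img = (scaled * 255.0).round().astype(np.uint8)
    h, w = img.shape
    with open(pgm_path, "wb") as f:
        f.write(f"P5\n{w} {h}\n255\n".encode("ascii"))  # PGM header: magic, size, maxval
        f.write(img.tobytes())                           # raw 8-bit pixel payload
```

The resulting uncompressed .pgm files can then be streamed byte-for-byte into the FPGA testbench as pixel stimuli.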

4.1.2. Evaluation Metrics

We employ three quantitative metrics to assess the compression efficiency, hardware performance, and reconstruction quality:
  1. Processing Throughput (T_p): Defined as the actual number of pixels processed per second, measured in Megapixels per second (Mpixel/s). It is empirically calculated from the measured hardware execution time, where N_total represents the total number of pixels processed and T_process denotes the actual processing time consumed by the FPGA hardware in seconds, as shown in Equation (12):

T_p = N_total / (T_process × 10^6)

  2. Compression Ratio (CR): Defined as the ratio of the original raw data size to the compressed bitstream size, as shown in Equation (13):

CR = Size_original / Size_compressed

A higher CR indicates better storage-saving capability.
  3. Hardware Resource Utilization: Assessed by the consumption of Look-Up Tables (LUTs) on the target FPGA, representing the implementation cost.
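Equations (12) and (13) translate directly into code; the following minimal Python helpers (illustrative names) compute both metrics:

```python
def throughput_mpixels(n_total, t_process_s):
    """Equation (12): pixels processed per second, expressed in Mpixel/s."""
    return n_total / (t_process_s * 1e6)

def compression_ratio(size_original, size_compressed):
    """Equation (13): original size over compressed size; CR > 1 means net saving."""
    return size_original / size_compressed

# One megapixel processed in 2.5 ms corresponds to 400 Mpixel/s.
assert abs(throughput_mpixels(1_000_000, 0.0025) - 400.0) < 1e-9
# A frame compressed from 1000 bytes down to 400 bytes gives CR = 2.5.
assert compression_ratio(1000, 400) == 2.5
```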

4.1.3. Implementation Details

The proposed HT-NRC architecture was described in Verilog HDL and synthesized using Xilinx Vivado Design Suite 2016.2. The target hardware platform is the Xilinx Virtex-7 FPGA (XC7VX690T). As a high-performance logic device suitable for intensive parallel computing, this specific FPGA features 433,200 Look-Up Tables (LUTs), 1470 Block RAMs, and 3600 DSP48E1 slices. The image compression module is optimized using a ten-stage pipeline, achieving an operating frequency of 100 MHz. For comparative analysis, three representative baselines were selected: 1. JPEG-LS (ISO/IEC 14495-1 [26]): A strictly lossless standard based on LOCO-I predictive coding. 2. JPEG 2000 (ISO/IEC 15444-1 [27]): A transform-based standard using Discrete Wavelet Transform (DWT). 3. PNG (RFC 2083 [28]): A dictionary-based software standard using the Deflate algorithm.

4.2. Main Results

4.2.1. Performance Evaluation and Comparative Analysis

To comprehensively validate the superiority of the proposed HT-NRC architecture in high-speed space-borne applications, we conducted a quantitative benchmarking against existing implementations. Table 1 presents a detailed comparison of resource utilization, operating frequency, and processing throughput across different FPGA platforms.
As shown in Table 1, the proposed HT-NRC architecture achieves a processing throughput of 414.7 Mpixel/s on the Xilinx Virtex-7 platform, substantially outperforming existing FPGA-based solutions. Specifically, compared to the optimized JPEG-LS implementation in [29], our design roughly doubles the throughput.
It is important to note the relationship between clock frequency and throughput. The method in [29] relies on a high operating frequency of 208 MHz to reach its peak performance. In contrast, our architecture achieves double the throughput while running at a much lower frequency of 100 MHz. This demonstrates the high efficiency of the proposed parallel allocation strategy. By processing multiple pixels per clock cycle, we overcome the serial bottleneck that limits traditional algorithms like [30,31], which are constrained to processing one pixel per cycle.
Table 1. Overall Performance Benchmark.

| Work | Algorithm | Technology | Resource | Frequency (MHz) | Throughput (Mpixel/s) | Throughput/kLUT |
|---|---|---|---|---|---|---|
| ICEE 2011 [32] | JPEG-LS | Altera Stratix II | - | 155.2 | 154.2 | - |
| TCSVT 2014 [30] | JPEG2000 | Virtex-4 XC4VLX80 | 750 Slices | 106 | 106 | - |
| TCSVT 2018 [31] | JPEG-LS | Virtex-6 XC6VCX75T | 8354 Slices | 51.684 | 51.684 | - |
| VLSI-SoC 2018 [29] | JPEG-LS | Virtex-7 XC7VX485 | 1.4 k LUT | 208 | 208 | 148.57 |
| IEEE Access 2024 [33] | JPEG-LS | Zynq-7000 XC7Z020 | 1.3 k LUT | 108.6 | 43.03 | 33.1 |
| Mathematics 2025 [22] | JPEG-LS | Zynq-7000 XC7Z045 | 18.28 k LUT | 350 | 346.41 | 18.95 |
| ACOMPA 2024 [34] | JPEG | Virtex-7 VC709 | 258.27 k LUT | 100.7 | 65.6 | 0.25 |
| JSCDM 2025 [35] | Searchless FIC | Cyclone V | - | 50 | 27.17 | - |
| Proposed | HT-NRC | Virtex-7 XC7VX690T | 18 k LUT | 100 | 414.7 | 23.04 |
Furthermore, compared to the JPEG2000 implementation in [30] and the recent work in [33], our method provides a 4× to 10× speedup. This substantial improvement confirms that the HT-NRC architecture is well-suited for the real-time processing requirements of next-generation high-speed CMOS sensors in deep-space images.
An essential aspect of hardware design for deep-space missions is resource efficiency. To address this transparently, Table 1 includes a "Throughput/kLUT" metric. While the proposed HT-NRC achieves approximately a 2× throughput improvement over the highly serialized method in [29], it incurs a roughly 12× higher logic cost (18 k LUTs vs. 1.4 k LUTs). This drop in normalized throughput-per-LUT is a deliberate architectural trade-off: breaking the one-pixel-per-cycle limit to sustain over 400 Mpixel/s, a strict requirement for real-time processing of modern high-speed CMOS sensors, fundamentally requires a superscalar parallel pipeline, replication of context variables across cores, and a large-capacity Reorder Buffer (ROB). For resource-limited space platforms, dedicating approximately 4.1% of the available logic to remove the onboard data-throughput bottleneck is a practical and favorable engineering compromise.

4.2.2. Evaluation of Compression Ratio

This section evaluates the algorithm’s ability to maintain data reduction efficiency under varying deep-space imaging conditions, specifically focusing on background noise intensity and valid signal density.
As presented in Table 2, we injected additive Gaussian noise with standard deviation σ ranging from 0.5 to 10.0 into the test images while keeping the Spot Occupancy Ratio (SOR) constant at 5%. These noise levels were chosen to span a comprehensive range of practical deep-space imaging scenarios: σ = 0.5 represents a near-ideal, clean sensor baseline under optimal calibration; σ = 2 and σ = 5 correspond to the low-to-medium noise levels encountered during nominal operations and moderate sensor aging; and the extreme value of σ = 10 simulates worst-case environments, such as observations under ultra-low illumination requiring extreme sensor gain or severe radiation-induced dark-current degradation.
Under low-noise conditions (σ = 0.5), all algorithms perform effectively. JPEG-LS achieves the highest compression ratio (3.62) owing to its efficient run-length coding for clean backgrounds, with HT-NRC following closely at 3.58. The slight deficit of HT-NRC is attributed to the overhead of the region identification mask, which is negligible in clean images.
As noise intensity increases, however, the performance of traditional algorithms degrades substantially. Notably, at high noise levels (σ = 10.0), the compression ratios of PNG and JPEG-LS drop below 1.0 (0.88 and 0.65, respectively), indicating data expansion, i.e., "negative compression." This occurs because the high-entropy noise disrupts the dictionary matches in PNG and the prediction models in JPEG-LS, causing the compressed stream to exceed the size of the raw input. In contrast, HT-NRC maintains a compression ratio of 1.24 even in this worst-case scenario. This robustness is achieved by the Bit-Plane Slicing (BPS) module, which adaptively separates the noise floor from the structural data, preventing the noise entropy from overwhelming the predictive encoder.
Further evaluation was conducted under varying signal densities (SOR ranging from 1% to 40%) with fixed background noise ( σ = 2.0 ), as shown in Table 3. The results demonstrate that HT-NRC consistently outperforms the standard JPEG-LS across all density levels. The performance gain is most pronounced in sparse scenes (SOR = 1%), where the proposed architecture achieves a 7.8% improvement. This advantage stems from the dual-path processing mechanism: the background handling path effectively filters out noise in the vast empty regions, while the predictive path focuses solely on the valid signal spots. As the scene becomes denser (SOR = 40%), the improvement margin narrows to 3.8% because the image is dominated by signal textures, causing the system behavior to converge towards the standard predictive coding model. Nevertheless, HT-NRC maintains a steady advantage, proving its efficacy for diverse observation targets ranging from sparse cosmic ray hits to dense star clusters.

4.3. Ablation Study

To systematically quantify the individual contributions of the proposed architectural innovations and optimize key hyperparameters, we conducted a series of ablation studies. These experiments decouple the effects of the Parallel Allocation mechanism and the Bit-Plane Slicing (BPS) noise encoder, and further analyze the system’s sensitivity to slicing depth ( S ).

4.3.1. Impact of Core Modules

We defined the standard serial JPEG-LS (LOCO-I) as the Baseline model and progressively integrated the proposed modules to observe their impact on Compression Ratio (CR) under noisy conditions ( σ = 5.0 ) and Processing Throughput. The results are summarized in Table 4.
As seen in Variant A, enabling the Index-Based Parallelism ( N = 4 ) yields a nearly linear increase in throughput from 100 Mpixel/s to 414.7 Mpixel/s. However, without the BPS module, the compression ratio remains at a suboptimal 0.98, failing to achieve a high compression ratio under high noise environments. This confirms that parallelism is strictly responsible for the speed dimension. Variant B introduces the BPS mechanism into a serial pipeline. While the throughput gain is negligible, the compression ratio dramatically improves from 0.98 to 1.65. This proves that the BPS module is the sole contributor to noise robustness, effectively decoupling the high-entropy noise from the predictive path. The full HT-NRC architecture combines both advantages, achieving high throughput and high compression simultaneously without mutual interference.

4.3.2. Sensitivity to Slicing Depth

The Slicing Depth parameter ( S ) determines the boundary between the “Spot” (MSB) and “Noise” (LSB) paths. We evaluated the impact of varying S (from 0 to 5) on the overall Compression Ratio under typical noise conditions.
As illustrated in Figure 5, when S is small (S = 1, 2), significant noise bits remain in the structure path (MSB), disrupting the prediction model and limiting coding efficiency. Conversely, when S is excessively large (S = 4, 5), useful structural information is misclassified as noise and transmitted in raw format, leading to under-compression. The experimental results demonstrate that S = 3 yields the optimal trade-off, achieving a peak average compression ratio of 1.58 by effectively isolating stochastic noise while preserving the compressibility of structural features.

4.3.3. Threshold Ablation and Misclassification Risk

To evaluate the robustness of the content-aware classification, we analyzed the misclassification risk and its downstream impact on compression performance under a high-noise scenario (σ = 5.0). Table 5 presents an ablation study comparing a fixed-threshold strategy against the proposed adaptive threshold (D_local(x_i) < γ·σ_sensor) with varying γ values.
False Positives (FP) occur when stochastic noise is misclassified as valid spot data, forcing it into the predictive path. This leads to context state pollution and wait-queue congestion, which noticeably degrades both Compression Ratio (CR) and Processing Throughput (as seen when γ = 1 or using an improper fixed threshold). Conversely, False Negatives (FN) occur when valid structural pixels are misclassified as noise. It is crucial to note that the architecture remains strictly lossless in FN scenarios, as these pixels are safely routed to the raw-packing noise cores. However, bypassing the predictive coding inevitably reduces the overall CR (as seen when γ = 3 ). As demonstrated in Table 5, the adaptive threshold with γ = 2 dynamically balances these risks, effectively mitigating throughput penalties while preserving optimal compression efficiency compared to a rigid fixed-threshold approach.
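The adaptive classification rule can be sketched as follows. The exact form of D_local is not restated in this section, so the sketch assumes a sum of absolute causal-neighbor gradients, consistent with the gradient-based module of Figure 2; the function name and neighbor choice are illustrative:

```python
def classify_pixel(w_pix, nw, n, ne, gamma, sigma_sensor):
    """Route a pixel by comparing a causal gradient activity measure
    against the adaptive noise threshold gamma * sigma_sensor.

    D_local is assumed here to be the sum of absolute gradients over the
    causal neighbors (west, northwest, north, northeast).
    """
    d_local = abs(n - nw) + abs(ne - n) + abs(w_pix - nw)
    return "background" if d_local < gamma * sigma_sensor else "spot"

# Flat, noise-dominated neighborhood -> background (raw-packing path).
assert classify_pixel(100, 101, 100, 99, gamma=2, sigma_sensor=5.0) == "background"
# Sharp local structure (e.g., a star spot) -> spot (predictive path).
assert classify_pixel(100, 20, 25, 200, gamma=2, sigma_sensor=5.0) == "spot"
```

Raising γ moves the boundary toward "background" (more FN, never lossy), while lowering it moves it toward "spot" (more FP, polluting the predictive context), which is the trade-off Table 5 quantifies.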

5. Conclusions, Limitations, and Future Work

5.1. Conclusions

In this paper, we introduced HT-NRC, a high-speed lossless compression architecture designed for deep-space CMOS cameras. This system addresses two critical bottlenecks in current space missions: the inability of traditional serial algorithms to match the throughput of modern sensors, and the severe degradation of compression efficiency caused by inherent sensor noise. The core innovation of HT-NRC is its content-aware parallel allocation system, which dynamically categorizes and routes pixel streams. For signal-rich spot regions, an index-based grouping mechanism safely decouples data dependencies, enabling standard JPEG-LS cores to execute in parallel while strictly preserving lossless semantics. For noisy backgrounds, a noise-resilient Bit-Plane Slicing (BPS) technique physically segregates stochastic noise from structural data, effectively preventing the data expansion phenomenon common in predictive coding.
Implemented on a Xilinx Virtex-7 FPGA, HT-NRC achieves a throughput of 414.7 Mpixel/s at 100 MHz, operating with an estimated on-chip power consumption of 2.1 W. This performance is approximately four times that of optimized serial implementations, and the architecture maintains a compression ratio above 1.2 even under extreme noise conditions. Utilizing less than 5% of the available LUT resources, HT-NRC offers a practical, energy-efficient, and robust solution for onboard scientific payloads.

5.2. Limitations

Despite these substantial performance gains, the proposed architecture has certain limitations. First, as a proof-of-concept prototype validated in a controlled laboratory environment, the current FPGA implementation lacks physical verification against radiation-induced single-event effects (SEEs) inherent to actual spaceflight. Second, while the system is highly robust under standardized noise conditions, its adaptability to extreme edge cases—such as ultra-low bit-depth sensors, High Dynamic Range (HDR) sensors employing non-linear companding, or severe radiation-induced gain drifts—remains constrained by the statically configured slicing depth. Finally, overcoming the physical throughput limit inherently mandates an architectural trade-off, resulting in higher normalized logic resource consumption (due to parallel core replication and the Reorder Buffer) compared to heavily serialized baselines.

5.3. Future Work

To address these limitations and broaden the architecture’s impact, our future research will proceed along two primary trajectories: spaceflight hardware upgrading and cross-disciplinary application.
Regarding spaceflight deployment, a primary focus is upgrading this prototype into a flight-ready model. We plan to migrate the architecture to radiation-hardened ASICs and incorporate robust mitigation strategies, including Triple Modular Redundancy (TMR) and watchdog timers for critical control paths, Error Correction Codes (ECC) for on-chip memory, and a hardware fail-safe mechanism that gracefully degrades into a raw passthrough mode upon uncorrectable bit-flips. To handle extreme sensor edge cases, we will develop a dynamic on-orbit calibration module capable of adaptively updating the slicing depth S based on real-time dark-frame evaluations. Additionally, we will conduct comprehensive quantitative profiling of the Reorder Buffer (ROB) occupancy across diverse celestial workloads to further optimize memory efficiency.
Beyond aerospace, we will explore Region-of-Interest (ROI) near-lossless modes to maximize data efficiency under severe bandwidth limits. Furthermore, the architecture's core characteristics (strictly lossless reconstruction, ultra-low latency, and noise resilience) highlight significant potential for other artifact-intolerant domains. We intend to extend this hardware-accelerated pipeline to advanced medical imaging workflows, particularly those driven by Large Language Models (LLMs) and Vision-Language Models (VLMs) [36,37]. In fields such as dermatology, where diagnostic cues are subtle and AI models are highly sensitive to compression artifacts, adapting our architecture to preserve fine morphological details while easing transmission constraints represents a promising interdisciplinary research direction.

Author Contributions

Conceptualization, H.W. and J.G.; methodology, H.W.; software, J.G.; validation, H.W. and J.G.; formal analysis, H.W.; investigation, H.W.; resources, Y.B.; data curation, H.W.; writing—original draft preparation, H.W.; writing—review and editing, Y.B.; funding acquisition, Y.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by National Natural Science Foundation of China, grant number 12027803.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Chen, J.; Chen, N.; Wang, Z.; Dou, R.; Liu, J.; Wu, N.; Liu, L.; Feng, P.; Wang, G. A review of recent advances in high-dynamic-range CMOS image sensors. Chips 2025, 4, 8. [Google Scholar] [CrossRef]
  2. Bautz, M.W.; Miller, E.D.; Prigozhin, G.Y.; LaMarr, B.J.; Malonis, A.; Foster, R.; Grant, C.E.; Schneider, B.; Leitz, C.; Donlon, K.; et al. Fast, low-noise image sensor technology for strategic x-ray astrophysics missions. In Space Telescopes and Instrumentation 2024: Ultraviolet to Gamma Ray; SPIE: Bellingham, WA, USA, 2024; Volume 13093, pp. 528–538. [Google Scholar]
  3. Cesarone, R.J.; Abraham, D.S.; Shambayati, S.; Rush, J. Deep-space optical communications. In Proceedings of the International Conference on Space Optical Systems and Applications 2011, Santa Monica, CA, USA, 11–13 May 2011; pp. 410–423. [Google Scholar]
  4. Lambert-Nebout, C.; Latry, C.; Moury, G.A.; Parisot, C.; Antonini, M.; Barlaud, M. On-board optical image compression for future high-resolution remote sensing systems. Appl. Digit. Image Process. 2000, XXIII 4115, 332–346. [Google Scholar]
  5. Elakkiya, S.; Thivya, K.S. Comprehensive review on lossy and lossless compression techniques. J. Inst. Eng. Ser. B 2022, 103, 1003–1012. [Google Scholar] [CrossRef]
  6. Maireles-González, Ò.; Bartrina-Rapesta, J.; Hernández-Cabronero, M.; Serra-Sagristà, J. Analysis of lossless compressors applied to integer and floating-point astronomical data. In Proceedings of the Data Compression Conference 2022, Snowbird, UT, USA, 22–25 March 2022; pp. 389–398. [Google Scholar]
  7. Papadonikolakis, M.E.; Kakarountas, A.P.; Goutis, C.E. Efficient high-performance implementation of JPEG-LS encoder. J. Real-Time Image Process. 2008, 3, 303–310. [Google Scholar] [CrossRef]
  8. Pence, W.D.; Seaman, R.; White, R.L. Lossless astronomical image compression and the effects of noise. Publ. Astron. Soc. Pac. 2009, 121, 414. [Google Scholar] [CrossRef]
  9. Alakuijala, J.; Van Asseldonk, R.; Boukortt, S.; Bruse, M.; Comșa, I.M.; Firsching, M.; Fischbacher, T.; Kliuchnikov, E.; Gomez, S.; Obryk, R.; et al. JPEG XL next-generation image compression architecture and coding tools. Appl. Digit. Image Process. 2019, 11137, 112–124. [Google Scholar]
  10. Ciobâcă, Ş.; Gratie, D.E. Implementing, specifying, and verifying the QOI format in dafny: A case study. In Proceedings of the International Conference on Integrated Formal Methods 2024, Manchester, UK, 13–15 November 2024; pp. 35–52. [Google Scholar]
  11. Wu, X.; Memon, N. CALIC-a context based adaptive lossless image codec. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings 1996, Atlanta, GA, USA, 7–10 May 1996; Volume 4, pp. 1890–1893. [Google Scholar]
  12. Meyer, B.; Tischer, P. TMW-a new method for lossless image compression. In Proceedings of the International Picture Coding Symposium 1997, Berlin, Germany, 10–12 September 1997; Volume 2, pp. 533–540. [Google Scholar]
  13. Weinberger, M.J.; Seroussi, G.; Sapiro, G. LOCO-I: A low complexity, context-based, lossless image compression algorithm. In Proceedings of the Data Compression Conference-DCC 1996, Snowbird, UT, USA, 31 March–3 April 1996; pp. 140–149. [Google Scholar]
  14. Bartrina-Rapesta, J.; Serra-Sagrista, J.; Auli-Llinas, F.; Gomez, J.M. JPEG2000 ROI coding method with perfect fine-grain accuracy and lossless recovery. In Proceedings of the Conference Record of the Forty-Third Asilomar Conference on Signals, Systems and Computers 2009, Pacific Grove, CA, USA, 1–4 November 2009; pp. 558–562. [Google Scholar]
  15. Dufaux, F.; Sullivan, G.J.; Ebrahimi, T. The JPEG XR image coding standard [Standards in a Nutshell]. IEEE Signal Process. Mag. 2009, 26, 195–204. [Google Scholar] [CrossRef]
  16. Trigka, M.; Dritsas, E. A comprehensive survey of deep learning approaches in image processing. Sensors 2025, 25, 531. [Google Scholar] [CrossRef] [PubMed]
  17. Nakahara, H.; Que, Z.; Luk, W. High-throughput convolutional neural network on an FPGA by customized JPEG compression. In Proceedings of the IEEE 28th Annual International Symposium on Field-Programmable Custom Computing Machines 2020, Fayetteville, AR, USA, 3–6 May 2020; pp. 1–9. [Google Scholar]
  18. Wang, S.; Cheng, Z.; Feng, D.; Lu, G.; Song, L.; Zhang, W. Asymllic: Asymmetric lightweight learned image compression. In Proceedings of the IEEE International Conference on Visual Communications and Image Processing 2024, Tokyo, Japan, 8–11 December 2024; pp. 1–5. [Google Scholar]
  19. Mazouz, A.; Chaudhuri, S.; Cagnanzzo, M.; Mitrea, M.; Tartaglione, E.; Fiandrotti, A. Lightweight Embedded FPGA Deployment of Learned Image Compression with Knowledge Distillation and Hybrid Quantization. arXiv 2025, arXiv:2503.04832. [Google Scholar]
  20. Santos, L.; Gomez, A.; Sarmiento, R. Implementation of CCSDS standards for lossless multispectral and hyperspectral satellite image compression. IEEE Trans. Aerosp. Electron. Syst. 2019, 56, 1120–1138. [Google Scholar] [CrossRef]
  21. CCSDS 121.0-B-2; Lossless Data Compression. CCSDS: Washington, DC, USA, 2012.
  22. Li, X.; Zhou, L.; Zhu, Y. Real-Time Lossless Compression System for Bayer Pattern Images with a Modified JPEG-LS. Mathematics 2025, 13, 3245. [Google Scholar] [CrossRef]
  23. Machairas, E.; Kranitis, N. A 13.3 Gbps 9/7M discrete wavelet transform for CCSDS 122.0-B-1 image data compression on a space-grade SRAM FPGA. Electronics 2020, 9, 1234. [Google Scholar] [CrossRef]
  24. CCSDS 122.0-B-1; Image Data Compression. CCSDS: Washington, DC, USA, 2005.
  25. Wu, K.C.; Zhuang, G.; Huang, J.; Zhang, X.; Ouyang, W.; Lu, Y. Star: A benchmark for astronomical star fields super-resolution. arXiv 2025, arXiv:2507.16385. [Google Scholar] [CrossRef]
  26. ISO/IEC 14495-1; Information Technology—Lossless and Near-Lossless Compression of Continuous-Tone Still Images: Baseline. ISO/IEC: Geneva, Switzerland, 1999.
  27. ISO/IEC 15444-1; Information Technology—JPEG 2000 Image Coding System: Core Coding System. ISO/IEC: Geneva, Switzerland, 2000.
  28. RFC 2083; PNG (Portable Network Graphics) Specification Version 1.0. IETF: Fremont, CA, USA, 1997.
Figure 1. Top-level architecture of the proposed HT-NRC system. The design features an index-based parallel dispatch unit that dynamically allocates input pixels into two parts.
Figure 2. Hardware architecture of the gradient-based region identification module. It computes local gradients using causal neighbors and generates a control mask by comparing the gradient sum against an adaptive noise threshold for subsequent pixel classification.
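The classification rule described in the Figure 2 caption can be sketched in software as follows. The causal-neighbor naming (a = left, b = above, c = above-left, d = above-right) follows the LOCO-I/JPEG-LS convention; the exact on-chip gradient rule and threshold comparison are assumptions here, for illustration only.

```python
# Hypothetical sketch of the gradient-based region identification module.
# Neighbor naming follows the JPEG-LS causal template; the threshold rule
# below is an illustrative assumption, not the exact hardware logic.

def classify_pixel(a: int, b: int, c: int, d: int, threshold: int) -> bool:
    """Return True if the pixel is flagged as noise (background), False if spot."""
    # Local gradients from the causal neighborhood, as in LOCO-I/JPEG-LS.
    g1 = abs(d - b)
    g2 = abs(b - c)
    g3 = abs(c - a)
    # A small gradient sum means the neighborhood carries little structure,
    # so the pixel is routed to the raw-packing (noise) path.
    return (g1 + g2 + g3) <= threshold
```

A flat neighborhood (values near 100) yields a small gradient sum and is flagged as noise, while a high-contrast spot neighborhood exceeds the threshold and takes the predictive path.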
Figure 3. Detailed schematic of the pixel allocation method and conflict resolution strategy. The figure illustrates a practical case study involving four input pixels. The allocation unit dynamically dispatches the noise pixel (P1) to the noise cores, and the spot pixels (P0, P2, P3) to the corresponding spot processing groups based on their context indices ( Q ).
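The dispatch-and-reorder flow of Figure 3 can be mimicked in software as below. The core counts, the modulo mapping from the context index Q to a spot group, and the queue discipline are assumptions for illustration; the hardware uses dedicated noise cores, spot processing groups, and a Reorder Buffer.

```python
# Minimal software sketch of index-based dispatch plus a Reorder Buffer.
# Group count and the Q -> group mapping are illustrative assumptions.

from collections import deque

def dispatch(pixels, classify, num_spot_groups=3):
    """Tag each pixel with its sequence index and route it to a core queue.

    `pixels` is a list of (value, context_index_Q) pairs.
    """
    noise_q = deque()
    spot_qs = [deque() for _ in range(num_spot_groups)]
    for idx, (p, q_ctx) in enumerate(pixels):
        if classify(p):
            noise_q.append((idx, p))  # noise path: raw packing cores
        else:
            # Spot path: choose a processing group from the context index Q.
            spot_qs[q_ctx % num_spot_groups].append((idx, p))
    return noise_q, spot_qs

def reorder(tagged_results):
    """Reorder Buffer: restore original pixel order from out-of-order results."""
    return [p for _, p in sorted(tagged_results)]
```

Because every pixel carries its sequence index through the parallel cores, the reorder step recovers the original raster order regardless of per-core completion order.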
Figure 4. Schematic of the Noise-Resilient Bit-Plane Slicing (BPS) architecture. The input W-bit pixel is decomposed into Most Significant Bits (MSBs) for spatial redundancy removal via a linear predictor and entropy coding, and Least Significant Bits (LSBs) which bypass the predictive model and are directly packed into the stream combiner as raw bitstreams.
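The MSB/LSB split in Figure 4 reduces to simple shift-and-mask operations. The sketch below assumes a 12-bit pixel and slicing depth S = 3, and stands in a trivial left-neighbor difference for the linear predictor and entropy coder; these choices are illustrative, not the paper's exact pipeline.

```python
# Sketch of the Bit-Plane Slicing (BPS) step: the upper W-S bits keep
# spatial correlation and feed the predictor, while the lower S
# noise-dominated bits are packed verbatim. The left-neighbor residual
# below is a stand-in for the actual linear predictor + entropy coder.

def bps_split(pixel: int, w: int = 12, s: int = 3):
    """Split a w-bit pixel into its (w - s) MSBs and s LSBs."""
    msb = pixel >> s               # structural part -> predictive coding
    lsb = pixel & ((1 << s) - 1)   # stochastic part -> raw packing
    return msb, lsb

def bps_encode_row(row, w=12, s=3):
    """Encode one row: residuals on MSBs, raw pass-through for LSBs."""
    residuals, raw_lsbs = [], []
    prev_msb = 0
    for p in row:
        msb, lsb = bps_split(p, w, s)
        residuals.append(msb - prev_msb)  # prediction residual on MSBs only
        raw_lsbs.append(lsb)              # LSBs bypass the predictive model
        prev_msb = msb
    return residuals, raw_lsbs
```

The split is exactly invertible, `(msb << s) | lsb == pixel`, so the scheme stays lossless while keeping noisy low-order bits out of the prediction loop.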
Figure 5. Compression Ratio vs. Slicing Depth ( S ).
Table 2. Compression ratio comparison under varying noise levels.
| Noise Level (σ) | PNG | JPEG 2000 | JPEG-LS | HT-NRC (Proposed) |
|---|---|---|---|---|
| 0.5 (Clean) | 2.85 ± 0.35 | 3.55 ± 0.42 | 3.62 ± 0.45 | 3.58 ± 0.40 |
| 2.0 (Low) | 1.95 ± 0.28 | 2.30 ± 0.32 | 2.15 ± 0.36 | 2.21 ± 0.28 |
| 5.0 (Medium) | 1.15 ± 0.24 | 1.45 ± 0.28 | 0.98 ± 0.35 | 1.65 ± 0.25 |
| 10.0 (High) | 0.88 ± 0.20 | 1.05 ± 0.22 | 0.65 ± 0.28 | 1.24 ± 0.20 |
Table 3. Compression ratio comparison under varying spot densities.
| SOR (%) | JPEG-LS | HT-NRC (Proposed) | Improvement |
|---|---|---|---|
| 1% (Sparse) | 2.18 ± 0.38 | 2.35 ± 0.30 | +7.8% |
| 10% (Medium) | 2.05 ± 0.32 | 2.15 ± 0.26 | +4.9% |
| 40% (Dense) | 1.85 ± 0.28 | 1.92 ± 0.22 | +3.8% |
Table 4. Impact of Core Modules under different variants.
| Configuration | Module: Parallelization (N = 4) | Module: Background BPS (S = 3) | Throughput (Mpixel/s) | Compression Ratio (σ = 5.0) |
|---|---|---|---|---|
| Baseline (JPEG-LS) | × | × | 100 | 0.98 |
| Variant A | ✓ | × | 401.3 | 0.98 |
| Variant B | × | ✓ | 100 | 1.65 |
| HT-NRC (Full) | ✓ | ✓ | 414.7 | 1.65 |
Table 5. Impact of classification thresholds on misclassification rates and system performance.
| Threshold Strategy | False Positive Rate (FP) | False Negative Rate (FN) | Throughput (Mpixel/s) | Compression Ratio |
|---|---|---|---|---|
| Fixed Threshold (Th = 10) | 18.4% | 4.2% | 365.2 | 1.32 |
| Adaptive (γ = 1) | 14.5% | 2.1% | 382.4 | 1.41 |
| Adaptive (γ = 2) | 3.2% | 5.6% | 414.7 | 1.65 |
| Adaptive (γ = 3) | 0.8% | 15.3% | 420.1 | 1.50 |
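Table 5 sweeps a scaling factor γ for the adaptive threshold. The paper's exact on-chip estimator is not reproduced here; a common noise-adaptive rule of the form Th = μ + γ·σ, computed from local background statistics, is sketched below purely as an assumption to illustrate the FP/FN trade-off: larger γ raises the threshold, cutting false positives at the cost of more false negatives.

```python
# Hedged sketch of an adaptive threshold Th = mu + gamma * sigma, a common
# choice for noise-adaptive gating; this is illustrative only and not the
# exact estimator implemented in the HT-NRC hardware.

def adaptive_threshold(samples, gamma: float) -> float:
    """Threshold from the mean and standard deviation of background samples."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((x - mu) ** 2 for x in samples) / n
    return mu + gamma * var ** 0.5
```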
Wu, H.; Bai, Y.; Gao, J. HT-NRC: A High-Throughput and Noise-Resilient Lossless Image Compression Architecture for Deep-Space CMOS Cameras. Appl. Sci. 2026, 16, 2873. https://doi.org/10.3390/app16062873