Distributed Latent Representation Clustering for Efficient Multi-Satellite Image Compression

Lu, Xiandong; Guan, Xingyu; Wang, Pengcheng; Cai, Zhiming; Zhang, Yonghe

doi:10.3390/rs18091355

Open Access Article

by Xiandong Lu ¹,², Xingyu Guan ²,³, Pengcheng Wang ²,⁴,⁵, Zhiming Cai ⁴ and Yonghe Zhang ¹,²,⁴,⁵,*

¹ School of Fundamental Physics and Mathematical Sciences, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
² University of Chinese Academy of Sciences, Beijing 100049, China
³ School of Intelligent Science and Technology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
⁴ Innovation Academy of Microsatellites, Chinese Academy of Sciences, Shanghai 201306, China
⁵ Key Laboratory for Satellite Digitalization Technology, Chinese Academy of Sciences, Shanghai 201210, China

* Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(9), 1355; https://doi.org/10.3390/rs18091355

Submission received: 17 February 2026 / Revised: 4 April 2026 / Accepted: 20 April 2026 / Published: 28 April 2026

(This article belongs to the Special Issue Advanced Technology for Remote Sensing Image Analysis and Applications)

Highlights

What are the main findings?

DLRC is the first framework to integrate real-time multi-satellite observation redundancy elimination into learned image compression.
This method achieves a significant reduction in bits per pixel compared to baselines while maintaining virtually identical reconstruction quality.

What are the implications of the main findings?

DLRC establishes an efficient distributed multi-satellite image compression architecture and allows for seamless compatibility with existing models.
The experimental results reveal the substantial potential of eliminating multi-satellite observation redundancy to enhance image compression performance.

Abstract

With the increasing number and enhanced sensing capabilities of satellites, the volume of satellite imagery has substantially surpassed the available bandwidth of satellite-to-ground links. Recently, with the adoption of commercial on-board GPUs, Learned Image Compression (LIC) offers the potential to mitigate this bottleneck by virtue of its superior rate–distortion performance over traditional codecs. However, existing LIC solutions operate in isolation on single satellites and underutilize the overlapping observations, which limits further gains in compression performance. In this paper, we propose Distributed Latent Representation Clustering (DLRC), which represents the first attempt to integrate real-time multi-satellite observation redundancy elimination into LIC. DLRC first introduces a local latent representation clustering mechanism. It discretizes the latent representation of LIC into compact cluster signatures on each satellite with lightweight computational overhead. Subsequently, DLRC presents a global cluster signature synchronization strategy. By exchanging signatures with negligible communication overhead, it enables multiple satellites to identify globally redundant local observations on a per-signature basis. By coding and downlinking only the latent representation corresponding to globally unique signatures, DLRC achieves non-redundant downlink in a training-free paradigm while remaining compatible with existing LIC architectures. Through extensive experiments, we demonstrate that DLRC achieves a substantial reduction in bits per pixel compared to independent LIC solutions while maintaining comparable reconstruction quality.

Keywords:

earth observation imagery; learned image compression; locality-sensitive hashing

1. Introduction

Large-scale Low Earth Orbit (LEO) constellations provide Earth Observation (EO) imagery that supports critical applications in climate monitoring [1,2,3], urban planning [4,5,6], and disaster response [7,8,9]. Due to limited on-board resources, massive volumes of captured imagery must be downlinked to ground stations for processing and storage. In recent years, the number of EO satellites has increased by hundreds annually [10]. A constellation can generate hundreds of terabytes of data per day [11], whereas satellite-to-ground links typically offer only Mbps-level bandwidth [12]. Furthermore, the dynamic characteristics of LEO satellites result in intermittent connectivity, which provides only a few high-quality contact windows daily [13]. This gap prevents approximately 98% of captured images from being downloaded within a day [14], which severely hinders the timeliness of satellite data applications. Consequently, efficient on-orbit satellite image compression has become essential.

Conventional industrial satellite image compression solutions primarily rely on lossless or lossy hand-crafted methods [15,16,17]. These methods have inferior compression performance and provide limited relief for the downlink bottleneck. In contrast, Learned Image Compression (LIC) methods have demonstrated superior Rate–Distortion (R–D) performance in the general image compression domain [18,19,20,21,22,23]. With emerging but limited on-board computing capabilities [24], recent work has begun to explore the deployment of learning-based methods on satellites to enhance compression performance on EO imagery [25,26,27,28,29,30,31].

Despite advances in single-satellite compression, existing studies have largely overlooked the potential for multi-satellite collaboration within the LIC framework. This is a missed opportunity, because during a single EO pass the observations of multiple satellites exhibit significant spatial overlap to ensure gapless coverage [32,33,34]. Such overlapping observations inherently produce inter-image redundancy, which existing independent LIC pipelines do not eliminate; instead, it persists through the non-linear transform and manifests as repetitive information in the latent domain. This results in additional bit overhead, as identical content is independently coded and transmitted multiple times. Real-time elimination of such latent-domain redundancy among concurrent observations therefore offers an opportunity to further improve compression performance.

However, achieving a real-time multi-satellite LIC solution is hindered primarily by stringent on-board runtime resource constraints and the intrinsic limitations of the LIC architecture. First, satellites typically operate under severe computational and power budgets, as the majority of energy is prioritized for attitude and orbit control systems and communication systems [35]. This makes it difficult to support centralized processing of observations from multiple satellites to identify and eliminate redundancies. Second, the highly dynamic motion of current LEO satellites cannot guarantee stable, high-speed Inter-Satellite Links (ISLs) between satellite pairs at arbitrary times [30]. This constraint hinders raw image sharing within a single observation pass due to the insufficient duration of contact windows. Third, mainstream LIC models typically integrate context-based probability distribution prediction, so redundancy elimination cannot be applied straightforwardly in the latent domain, as it may break the causal dependencies required by the LIC compression and decompression process.

To address these challenges, we propose Distributed Latent Representation Clustering (DLRC), a framework that efficiently extends standard single-satellite on-board LIC pipelines into a multi-satellite redundancy elimination framework. At its core, DLRC operates through a distributed architecture to achieve a globally equivalent effect, where each satellite performs deduplication-related computations exclusively on its own payload data, while striving to minimize the synchronization overhead.

Specifically, DLRC implements lightweight operations to accommodate stringent on-board constraints. First, DLRC introduces a local latent representation clustering mechanism. It leverages Locality-Sensitive Hashing (LSH) to efficiently cluster the latent representation extracted by the LIC encoder into hash signatures. Crucially, DLRC adopts a unified projection space by employing identical hash functions across all participating satellites, ensuring that latent representations map to globally comparable signatures. Second, DLRC presents a global signature synchronization strategy. Instead of transmitting voluminous raw images [30] or latent data [35] as in conventional methods, each participating satellite exchanges its local integer signatures with others. By cross-referencing the received global signatures, each satellite independently identifies and retains only those local signatures that are globally unique. Third, DLRC leverages the entropy encoder of pre-trained LIC models, where each satellite performs the actual coding on the local latent representation corresponding to the retained signatures. This ensures that only a single instance of data is preserved for each globally unique cluster, achieving non-redundant downlinking without requiring additional training. Finally, all bitstreams from participating satellites are collected at the Ground Station (GS), where they are decoded sequentially following the encoding order to satisfy the context dependencies. The latent representations are then restored to their full spatial structure via mutual referencing among deduplicated representations, enabling subsequent image reconstruction and ensuring compatibility with existing LIC architectures.

We conduct a comprehensive evaluation of DLRC on representative EO imagery datasets. The results demonstrate that DLRC achieves a reduction of over 30% in bits per pixel (bpp) compared to existing LIC methods while maintaining nearly identical reconstruction metrics. Furthermore, DLRC exhibits performance equivalent to baselines in terms of both visual quality and downstream task accuracy. In addition, through practical measurements on resource-constrained hardware and a communication simulator that we construct, we provide a complete characterization of the practical system overhead of DLRC. We also quantify its end-to-end downlink gain under representative configurations and validate its robustness under real viewing-geometry variations.

The remainder of this paper is organized as follows. Section 2 reviews the background of satellite image downlinking and the current state of LIC. Section 3 introduces our proposed multi-satellite redundancy elimination framework, DLRC. Section 4 presents experiments on compression performance and overhead. Section 5 discusses the potential future work of DLRC, and Section 6 concludes the paper. Appendix A details the communication simulator.

2. Background

2.1. Satellite Image Downlink Missions

EO imagery involves capturing terrestrial, oceanic, and atmospheric data through specialized satellite sensors. These images primarily comprise multiple spectral bands (e.g., RGB, near-infrared, and short-wave infrared) [36], which provide rich physical information. Compared to other imaging domains, EO operates across vast geographical scales, yielding a significantly larger total pixel count despite having relatively lower spatial resolutions (e.g., several meters per pixel) [37]. An uncompressed image can reach several gigabytes in size, with a single high-duty-cycle satellite generating nearly a terabyte of data daily [11].

State-of-the-art EO satellites are equipped with small GPUs (e.g., NVIDIA Jetson AGX Xavier with 32 TOPS) to provide on-board image processing capability [24]. In practice, these hardware components are further constrained by low-priority power budgets, radiation-induced system resets, and thermally dictated short duty cycles [38]. As a result, direct on-board processing of compute-intensive image pipelines remains difficult. Meanwhile, mainstream EO satellites are equipped with only a few terabytes of SSD storage per satellite [39]. In the space environment, the necessity for multi-level storage redundancy further limits the actual available capacity [40]. Such configurations are insufficient to support long-term storage of massive images. This limitation necessitates frequent local data replacement, which can cause irrecoverable data loss. Furthermore, EO imagery collected by satellites is often required to be delivered to end-users for time-series comparative studies and long-term utilization. Therefore, downlinking large volumes of image data to ground stations remains a persistent requirement.

Despite this imperative, satellite image downlink continues to face critical bottlenecks arising from tenuous connectivity to the GS. EO satellites typically operate in LEO to achieve global coverage. Their orbital mechanics dictate that each communication window with a terrestrial receiver lasts less than ten minutes and occurs only a few times per day [13]. In tandem, space-to-ground links rely on radio transmission, which restricts the downlink rates to the order of a few hundred megabits per second [12]. Accumulated EO imagery may consequently suffer from severe transmission latencies, ranging from several hours to multiple days [13]. For latency-sensitive monitoring applications (e.g., disaster response or maritime security), such delays are unacceptable. This necessitates lightweight on-board compression. By shrinking the raw data size before it enters the downlink queue, compression methods provide an opportunity to maximize the volume of imagery that can be offloaded within these brief downlink windows.

Figure 1 illustrates several sequential stages of satellite image downlink missions. Satellites acquire images in a push-broom manner along their ground tracks. These images are initially processed with on-board radiometric calibration [41] and geometric correction [42] to rectify raw observations. These pre-processed images then undergo on-board encoding before being transmitted to the GS. Current constellations enable satellites to transfer encoded data packets to a satellite with an active ground-station contact via ISLs [43], which facilitates global downlink pipelines. Finally, the ground segment performs the corresponding decoding to generate the reconstructed images, followed by image registration and mosaicking to produce comprehensive observations.

2.2. Satellite Image Compression Methods

Traditional satellite image compression methods, often referred to as codecs, consist of a pipeline of components crafted by human experts to exploit known statistical structures in image signals. These methods first employ predefined mathematical transforms or linear prediction to represent signals in images with fewer symbols. Examples include the Discrete Cosine Transform (DCT) used in JPEG [15], the Discrete Wavelet Transform (DWT) in JPEG2000 [16], or the neighboring pixel prediction in CCSDS-123 [17]. The resulting symbols then undergo quantization and entropy coding (e.g., Huffman or arithmetic coding [44]) to minimize the generated bits. As established standards, these codecs have long been utilized to provide both lossless and lossy compression under on-board computing environments. However, their hand-crafted design limits their ability to model complex image characteristics, making it difficult to increase compression ratios while maintaining high fidelity for the growing volume of EO data.

With the development of deep learning, LIC has emerged as a powerful alternative. LIC also follows the transform, quantization, and entropy coding framework, but trains all components under a unified R–D objective. For state-of-the-art LIC with joint hyperprior and context-based entropy modeling, Figure 2 illustrates the processing pipeline. Concretely, given an input image $x \in \mathbb{R}^{3 \times H \times W}$, the encoder $g_a$, referred to as the analysis transform, maps it into a latent representation $y \in \mathbb{R}^{C \times H' \times W'}$, which typically has reduced spatial resolution and increased channel dimensionality to retain salient information. A hyperprior $z$ is further extracted from $y$ by the hyperprior encoder $h_a$ to capture its spatially varying statistics:

$$y = g_a(x), \quad z = h_a(y) \tag{1}$$

Both $y$ and $z$ are quantized to support entropy coding, where $Q(\cdot)$ denotes the quantization function:

$$\hat{y} = Q(y), \quad \hat{z} = Q(z) \tag{2}$$

In the entropy modeling procedure, the hyperprior decoder $h_s$ extracts hyperprior features $\psi \in \mathbb{R}^{2C \times H' \times W'}$ from $\hat{z}$, while the context prediction model $g_{cp}$ extracts context features $\phi \in \mathbb{R}^{2C \times H' \times W'}$ from $\hat{y}$. The entropy parameter model $g_{ep}$ then takes $\psi$ and $\phi$ as input and predicts the distribution parameters $(\mu_i, \sigma_i)$ for position $i$ of $\hat{y}$:

$$\psi = h_s(\hat{z}), \quad \phi_i = g_{cp}(\hat{y}_{<i}), \quad (\mu_i, \sigma_i) = g_{ep}(\psi, \phi_i) \tag{3}$$

Subsequently, the probability of $\hat{z}$ is modeled by a factorized distribution, while the probability of $\hat{y}$ is modeled by a conditional Gaussian distribution:

$$p(\hat{z}) = \prod_i p(\hat{z}_i), \quad p(\hat{y} \mid \hat{z}) = \prod_i \mathcal{N}(\hat{y}_i \mid \mu_i, \sigma_i^2) \tag{4}$$

The distributions $p(\hat{z})$ and $p(\hat{y} \mid \hat{z})$ are then used for the entropy coding of $\hat{z}$ and $\hat{y}$, respectively, to generate the compressed bitstream.

At the receiver, $\hat{z}$ is directly decoded from the bitstream based on the hyperprior metadata and then used in the same entropy modeling procedure to decode $\hat{y}$. Finally, the decoder $g_s$, referred to as the synthesis transform, maps $\hat{y}$ back into the pixel space and reconstructs the image $\hat{x}$:

$$\hat{x} = g_s(\hat{y}) \tag{5}$$

The training objective of LIC minimizes the R–D loss $\mathcal{L}$, where $R$ represents the expected bit rate, $D$ denotes a distortion metric (e.g., Mean Squared Error (MSE) or Multi-Scale Structural Similarity Index Measure (MS-SSIM) [45]), and $\lambda$ controls the R–D trade-off:

$$\mathcal{L} = R + \lambda D = \mathbb{E}\left[-\log_2 p(\hat{y} \mid \hat{z}) - \log_2 p(\hat{z})\right] + \lambda\, \mathbb{E}\left[d(x, \hat{x})\right] \tag{6}$$
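To make the rate term of Eq. (6) concrete, the following minimal Python sketch estimates the bits needed for quantized latent elements under the Gaussian model of Eq. (4). It assumes the common LIC practice of integrating the Gaussian over each unit-width quantization bin to obtain a discrete probability; the function names are hypothetical and are not taken from any specific LIC codebase.

```python
import math


def gaussian_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def symbol_bits(y_hat: int, mu: float, sigma: float) -> float:
    """Estimated bits for one quantized latent element.

    Assumption: N(mu, sigma^2) is integrated over the quantization bin
    [y_hat - 0.5, y_hat + 0.5] to obtain a discrete probability.
    """
    upper = gaussian_cdf((y_hat + 0.5 - mu) / sigma)
    lower = gaussian_cdf((y_hat - 0.5 - mu) / sigma)
    prob = max(upper - lower, 1e-9)  # clamp so log2 never sees 0
    return -math.log2(prob)


def rate_bits(y_hat, mus, sigmas) -> float:
    """Total bits for a list of elements: the -log2 p(y_hat | z_hat) part of Eq. (6)."""
    return sum(symbol_bits(v, m, s) for v, m, s in zip(y_hat, mus, sigmas))


# A symbol near the predicted mean is cheap; an unexpected one is expensive.
cheap = symbol_bits(0, mu=0.0, sigma=0.5)
costly = symbol_bits(3, mu=0.0, sigma=0.5)
print(f"{cheap:.2f} bits vs {costly:.2f} bits")
```

Because an arithmetic coder spends roughly $-\log_2 p$ bits per symbol, every latent vector removed before coding saves the full per-element cost of all its channels.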

LIC models designed for natural visual imagery [18,19,20,21,22,23] have achieved superior lossy R–D performance across various metrics compared to traditional codecs, at the cost of increased model parameters and computational complexity.

Recently, the exploration of LIC in satellite systems is gaining increasing momentum, with a primary focus on optimizing efficiency by reducing on-board compression overhead [25,26,27,28]. However, these methods operate on individual satellites in isolation and do not exploit the potential redundancy arising from overlapping multi-satellite observations.

Among existing efforts in multi-satellite scenarios, Earth+ [30] proposes sharing historical images across the constellation to skip the downlink of unchanged redundant tiles. DeepSpace [31] also relies on historical images, but adopts a super-resolution framework in which low-resolution tiles are downlinked and reconstructed on the ground. Nevertheless, these methods essentially remain single-satellite offline operations, as they rely on intermittent ground-side uplinks to provide reference images from prior days. Moreover, their reduction in downlink data volume is achieved by filtering image tiles outside the compression pipeline, making them largely orthogonal to LIC methods. Consequently, an LIC-based real-time framework capable of eliminating redundancy in overlapping multi-satellite observations remains an open challenge.

3. Method

3.1. Motivation and Challenges

LEO satellites move rapidly above the Earth, often providing multi-sensor, multi-angle, and multi-orbital coverage of target areas. To minimize data gaps within limited observation windows, the footprints of multiple satellites inevitably produce spatial overlap [32,33,34]. This spatial overlap naturally leads to repeated visual content across captured images. We further observe that such inter-image redundancy does not vanish after feature extraction, but is largely preserved in the latent space of LIC.

To illustrate this phenomenon, Figure 3 presents two representative examples from overlapping observations. After aligning each image pair captured by different satellites using geographical coordinates, we transform them with ELIC [21], which is adopted here as a strong and widely used LIC baseline, using $\lambda = 0.0067$ to examine the most challenging high-compression setting (details in Section 4.1). For the overlapping regions, we compute the cosine similarity between corresponding latent representations:

$$S(y_i, y_j) = \frac{y_i \cdot y_j}{\lVert y_i \rVert \, \lVert y_j \rVert} \tag{7}$$

where $y_i$ and $y_j$ denote the latent vectors at the corresponding location. Cosine similarity is commonly used as a proxy for semantic or structural consistency.
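A minimal sketch of Eq. (7), applied position by position to two aligned lists of latent vectors, is shown below. The toy vectors are illustrative only and are not taken from Figure 3; returning 0 for an all-zero vector is an assumed convention.

```python
import math


def cosine_similarity(a, b) -> float:
    """Eq. (7): S(y_i, y_j) = (y_i . y_j) / (||y_i|| ||y_j||)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumed convention for all-zero vectors
    return dot / (norm_a * norm_b)


def similarity_map(vectors_a, vectors_b):
    """Similarity at each aligned position of two overlapping latent regions."""
    return [cosine_similarity(a, b) for a, b in zip(vectors_a, vectors_b)]


# Toy example with C = 3: the first pair is nearly parallel, the second orthogonal.
region_a = [[1.0, 2.0, 0.5], [0.0, 1.0, 0.0]]
region_b = [[1.1, 1.9, 0.4], [1.0, 0.0, 0.0]]
print(similarity_map(region_a, region_b))
```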

In EO missions, to optimize coverage and mosaicking quality, imaging strips are typically planned to run in parallel or to have only small relative rotations [46]. Therefore, Figure 3a,b illustrates two representative cases, one without rotation and the other with a 3° rotation. As shown in Figure 3a, even near the boundaries of the overlapping region, where convolutional receptive field differences are more pronounced, the corresponding latent representations remain highly similar. Figure 3b further shows that, even when regions with clear geometric structures, such as buildings, exhibit visible misalignment and reduced similarity, more homogeneous natural regions, such as rivers, remain highly similar.

Importantly, the latent representation is the direct coding object in LIC. Therefore, eliminating redundancy in the latent space can directly reduce the amount of data that needs to be entropy-coded and transmitted, thereby alleviating the downlink bottleneck. However, realizing this idea is non-trivial due to several fundamental challenges:

Constrained on-board computation. Satellite payload processors have limited computing capability, making expensive centralized pairwise similarity computation difficult to perform in real time.
Limited inter-satellite communication. Redundancy detection cannot rely on exchanging large volumes of high-dimensional latent representations, since ISL bandwidth is also scarce.
Codability and decodability of latent representations. Due to the spatial dependencies in entropy modeling, once redundancy elimination changes the spatial structure of the latent representation, direct encoding and decoding under the original LIC pipeline are no longer feasible.

3.2. Overview

To address the above challenges, we propose Distributed Latent Representation Clustering (DLRC) to enable the elimination of redundancy in real-time multi-satellite observations. DLRC performs distributed redundancy identification and latent representation deduplication through lightweight hash computation and signature exchange, thereby accommodating the stringent resource constraints on satellites. It then codes the retained latent representation using entropy-modeling results generated from the original one, and realizes the same decoding order at the ground station through cross-bitstream reference completion, thereby preserving compatibility with mainstream LIC architectures.

Specifically, DLRC is integrated into the standard LIC pipeline and identically deployed across all participating satellites. Its overall architecture comprises four lightweight on-board modules and a single ground-side module, as illustrated in Figure 4:

Recognizer. The Recognizer captures the directional characteristics of high-dimensional latent vectors and generates a discrete signature for each vector.

Selector. The Selector performs intra-satellite signature de-duplication. It clusters identical signatures and corresponding vectors and selects a single representative for each cluster. Meanwhile, it records the assignment map as metadata, which maps each original spatial position to its corresponding retained representative.

Arbiter. The Arbiter performs inter-satellite signature de-duplication. It detects whether local signatures are also held by other satellites and determines the retention or discard of these signatures. Accordingly, it updates the assignment map to ensure that spatial positions associated with discarded signatures are indexed to the representatives retained by other satellites.

Filter. The Filter removes latent vectors from the original latent representation if their associated signatures are excluded from the final signature set obtained after the two-stage de-duplication.

Reconstructor. The Reconstructor re-indexes the retained latent vectors from the global pool back into their original spatial positions based on the recorded assignment map.

Following the common practice in LIC, we view a latent representation $y \in \mathbb{R}^{C \times H' \times W'}$ as a collection of $H'W'$ latent vectors, denoted as $y_1, y_2, \dots, y_{H'W'}$. Each latent vector $y_i \in \mathbb{R}^{C}$ corresponds to the latent elements across all channels at the $i$-th spatial location.
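The view of $y$ as $H'W'$ latent vectors can be sketched as follows. The sketch assumes that $y$ is stored as nested lists indexed [channel][row][column] and that spatial positions are numbered in raster order; both the storage layout and the function name are hypothetical.

```python
def latent_to_vectors(y):
    """Split a C x H' x W' latent (nested lists [c][row][col]) into H'W' vectors.

    Vector i collects all C channel values at row i // W', column i % W'
    (raster-order indexing is an assumption of this sketch).
    """
    channels, height, width = len(y), len(y[0]), len(y[0][0])
    return [
        [y[c][i // width][i % width] for c in range(channels)]
        for i in range(height * width)
    ]


# C = 2 channels on a 2 x 3 grid gives six 2-dimensional latent vectors.
y = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]
vectors = latent_to_vectors(y)
print(vectors)
```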

These latent vectors from $y$ are first forwarded to the Recognizer, which generates a scalar signature for each vector. The mapping from vectors to signatures is globally unified, ensuring that signatures are eligible for direct cross-satellite comparison. This is a fundamental prerequisite for distributed clustering. The Selector then removes duplicate signatures to obtain the set of locally unique signatures. Each satellite subsequently exchanges its local signature set with other participating satellites. By consulting the global signature pool, the Arbiter determines which locally unique signatures are retained as part of the globally unique set based on the relative priority of the local satellite. The final assignment map produced by the two-stage signature deduplication is losslessly compressed and transmitted alongside the standard LIC metadata. Notably, the aforementioned process operates in parallel with the hyperprior entropy modeling of the LIC framework, thereby hiding waiting-state latency in the compression pipeline. Finally, the quantized latent representation $\hat{y}$ and the finalized signature set are jointly provided to the Filter to produce a compact, pruned latent representation, which is then entropy-coded according to its corresponding probability distribution entries.
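As an illustration of the Filter step alone, the sketch below keeps, for every signature in the finalized set, the latent vector at its first (smallest-index) position and gathers the matching per-position entropy parameters computed on the full-size latent. It assumes the smallest-index retention rule of the Selector described in Section 3.3; the function name and data layout are hypothetical.

```python
def filter_latent(y_hat, signatures, final_set, mus, sigmas):
    """Hypothetical Filter: keep one vector per signature in the finalized set.

    y_hat, signatures, mus, sigmas are per-position lists in raster order;
    mus/sigmas come from entropy modeling on the full, unpruned latent.
    Returns (position, vector, mu, sigma) tuples for the kept positions.
    """
    seen = set()
    kept = []
    for i, (vec, sig) in enumerate(zip(y_hat, signatures)):
        # The first occurrence is the smallest index, i.e. the Selector's representative.
        if sig in final_set and sig not in seen:
            seen.add(sig)
            kept.append((i, vec, mus[i], sigmas[i]))
    return kept


# Signature 5 appears twice locally; signature 6 was retained by another satellite.
y_hat = [[1], [2], [3], [4]]
signatures = [5, 6, 5, 7]
mus = [[0.1], [0.2], [0.3], [0.4]]
sigmas = [[1.0], [1.0], [1.0], [1.0]]
kept = filter_latent(y_hat, signatures, {5, 7}, mus, sigmas)
print([pos for pos, *_ in kept])
```

Reusing the full-size parameter maps means that no entropy model needs to be retrained; each kept vector is simply coded with the entries that the pre-trained model already produced for its position.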

Since each satellite now only needs to code a subset of $\hat{y}$, the number of symbols entering the arithmetic coder is significantly reduced, leading to a direct decrease in the total bitstream size. The final bitstream therefore comprises the hyperprior $\hat{z}$, the hyperprior metadata, the pruned $\hat{y}$, and the assignment map. Consequently, DLRC ensures that the data downlinked from multiple satellites contain mutually complementary and non-redundant information.

Upon receiving the downlink bitstreams from all participating satellites, the GS decodes the pruned $\hat{y}$ step by step. Finally, according to the assignment map carried in the downlink metadata, the Reconstructor copies and assembles the recovered subsets to reconstruct each satellite's approximation $\tilde{y}$ of the original $y$ in its original shape for subsequent image reconstruction.
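A minimal sketch of the Reconstructor's copy step follows. It assumes, hypothetically, that each assignment-map entry has already been resolved into a pair (owner satellite ID, spatial position in the owner's latent) and that decoded vectors are stored per satellite in dictionaries keyed by position; the actual metadata encoding is not specified here.

```python
def reconstruct(assignment, pools):
    """Hypothetical Reconstructor: rebuild y_tilde for one satellite.

    assignment: H'W' entries (owner_id, position), already resolved.
    pools: {owner_id: {position: decoded latent vector}} over all satellites.
    Returns H'W' vectors in raster order (copies, so pools stay unchanged).
    """
    return [list(pools[owner][pos]) for owner, pos in assignment]


# Decoded, de-duplicated vectors from two satellites.
pools = {
    0: {0: [0.5, 1.0], 2: [2.0, -1.0]},
    1: {1: [3.0, 0.0]},
}
# Satellite 1 kept only position 1; its positions 0 and 2 reuse satellite 0's vectors.
assignment_sat1 = [(0, 0), (1, 1), (0, 2)]
y_tilde = reconstruct(assignment_sat1, pools)
print(y_tilde)
```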

3.3. Local Latent Representation Clustering

In this section, we detail the on-board process for local latent representation clustering, which comprises the Recognizer and the Selector.

The core of the Recognizer is a clustering algorithm based on Locality-Sensitive Hashing (LSH) [47], a widely adopted technique for capturing similarities among high-dimensional vectors. Specifically, for each latent vector $y_i \in \mathbb{R}^{C}$ in latent representation $y$, we implement LSH using Cross-Polytope Hashing (CPH) [48] with random projections. The hash function for a single hash round is defined as:

$$h(y_i) = \operatorname*{argmax}_{j \in \{\pm 1, \pm 2, \dots, \pm C\}} \left| R y_i \right|_j \tag{8}$$

where $R \in \mathbb{R}^{C \times C}$ is a random rotation matrix, and $\left| R y_i \right|_j$ is the absolute value of the $j$-th component of the rotated $y_i$. The sign of index $j$ is chosen to match the sign of the corresponding component in $R y_i$. Since a single hash function may not be sufficiently discriminative, we perform $k$ independent hash rounds, each with a newly sampled $R$, to generate multiple hash codes $h_1(y_i), \dots, h_k(y_i)$. The final hash signature of $y_i$, denoted as $s_i$, is generated by iteratively mixing the codes from all $k$ rounds into a single 64-bit integer:

$$s_i^{(r)} = F\left(h_r(y_i), s_i^{(r-1)}\right) \bmod 2^{64} \tag{9}$$

where $r \in \{1, \dots, k\}$, $s_i^{(0)} = 0$, $s_i = s_i^{(k)}$, and $F$ denotes a non-linear mixing function (e.g., a combination of bitwise XOR, shifts, and addition to promote uniform distribution). By applying the modulo operation with $2^{64}$, the signature corresponding to each latent vector is constrained to 64 bits. In addition, by deploying the same ordered set of random projection matrices $R$ across all satellites participating in collaborative compression, identical or highly similar latent vectors can be mapped to the same hash signature. This ensures that signatures generated independently by different satellites remain globally comparable.
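The Recognizer can be sketched in pure Python as below. Several details are assumptions rather than the paper's specification: the random rotations are drawn by Gram–Schmidt orthonormalization of Gaussian vectors (yielding an orthogonal matrix that may include a reflection), each signed index is encoded as a code in $[0, 2C)$, and the mixing function $F$ is a hypothetical combination of XOR, shifts, and addition followed by the SplitMix64 finalizer. What the sketch does reflect from the text is that all satellites derive the same ordered rotations from a shared seed, so their signatures are directly comparable.

```python
import math
import random

MASK64 = (1 << 64) - 1


def random_rotation(dim, rng):
    """Random orthogonal matrix (rows) via Gram-Schmidt on Gaussian vectors."""
    basis = []
    while len(basis) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for b in basis:  # remove components along previously accepted rows
            proj = sum(x * y for x, y in zip(v, b))
            v = [x - proj * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-9:  # skip (practically impossible) degenerate draws
            basis.append([x / norm for x in v])
    return basis


def cross_polytope_code(rotation, vec):
    """Eq. (8): signed index of the largest-magnitude rotated coordinate.

    Encoded as 2*j for +e_j and 2*j + 1 for -e_j, so codes lie in [0, 2C).
    """
    rotated = [sum(r * x for r, x in zip(row, vec)) for row in rotation]
    j = max(range(len(rotated)), key=lambda idx: abs(rotated[idx]))
    return 2 * j + (1 if rotated[j] < 0 else 0)


def mix64(code, state):
    """Hypothetical F: fold one code into the running state, modulo 2^64."""
    x = (state ^ (code + 0x9E3779B97F4A7C15 + (state << 6) + (state >> 2))) & MASK64
    # SplitMix64 finalizer to spread bits more uniformly.
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)


def make_recognizer(dim, rounds, seed):
    """Shared seed -> identical ordered rotations on every satellite."""
    rng = random.Random(seed)
    rotations = [random_rotation(dim, rng) for _ in range(rounds)]

    def signature(vec):
        state = 0  # s^(0) = 0
        for rotation in rotations:  # rounds r = 1..k, Eq. (9)
            state = mix64(cross_polytope_code(rotation, vec), state)
        return state

    return signature


# Two satellites configured independently with the same shared seed.
sign_sat_a = make_recognizer(dim=8, rounds=4, seed=2026)
sign_sat_b = make_recognizer(dim=8, rounds=4, seed=2026)
vec = [0.3, -1.2, 0.8, 0.05, 2.0, -0.4, 0.1, 0.9]
print(sign_sat_a(vec) == sign_sat_b(vec))
```

Since Eq. (8) depends only on the direction of $y_i$, rescaling a vector by a positive factor leaves its signature unchanged; only the shared seed, not any exchange of matrices, is needed for cross-satellite agreement.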

The Recognizer introduces only a low additional computational cost. For a latent representation $y \in \mathbb{R}^{C \times H' \times W'}$, the computational complexity of the Recognizer is:

$$O_{\text{rec}} = O(H' W' k C^2) \tag{10}$$

where $k$ is the number of hash rounds. This is of the same order as a single $1 \times 1$ convolution layer that maps the $C$ channels of $y$ to $kC$ output channels, and is much smaller than the overall computation of the original LIC pipeline.
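For a rough sense of scale, the worked arithmetic below evaluates Eq. (10) for the latent size used in Section 3.4 ($C = 192$, $H' = W' = 16$) with a hypothetical $k = 4$ hash rounds (the number of rounds is not stated in this section), and compares it with a $3 \times 3$ convolution mapping 192 to 192 channels on the same grid, counting multiply-accumulate operations.

```latex
\begin{aligned}
O_{\text{rec}} &\approx H'W'kC^2 = 16^2 \cdot 4 \cdot 192^2 = 256 \cdot 4 \cdot 36{,}864 = 37{,}748{,}736 \approx 3.8 \times 10^{7} \\
O_{3 \times 3\ \text{conv}} &\approx 9\,H'W'C^2 = 9 \cdot 256 \cdot 36{,}864 = 84{,}934{,}656 \approx 8.5 \times 10^{7}
\end{aligned}
```

Under these assumed settings, the Recognizer costs less than half of one $3 \times 3$ convolution layer at the latent resolution.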

Building on the obtained hash signatures, the Selector's operation consists of two steps. First, it performs de-duplication on these signatures to extract unique local signatures for immediate global synchronization. Specifically, among identical signatures, we directly retain the one with the smallest spatial index. Second, to support actual pruning in the Filter, a corresponding latent vector must be assigned to each retained signature. Latent vectors with identical signatures are grouped into the same cluster. Since the vectors within each cluster are already sufficiently similar, we directly use the original latent vector at the retained position as the representative. This preserves alignment with the latent distribution expected by the pre-trained hyperprior entropy model, which potentially maximizes compression gain. Alternatively, we can compute the mean $\bar{y}_m$ of all latent vectors in the $m$-th cluster to serve as an unbiased representative and use it to replace all latent vectors belonging to that cluster in the latent representation $y$:

$$\bar{y}_m = \frac{1}{n} \sum_{i=1}^{n} y_{m,i} \tag{11}$$

where $n$ is the number of latent vectors in the cluster and $y_{m,i}$ is its $i$-th member. This averaging acts as a denoising mechanism intended to reduce reconstruction distortion. These two strategies represent distinct fidelity–bpp trade-offs rather than a hierarchy of superiority, allowing selection based on specific requirements. The resulting $y$ is then fed into the standard LIC entropy modeling procedure.

In the meantime, an assignment map $A \in \mathbb{N}^{H' \times W'}$ is constructed to record these representative relationships. For each cluster, the spatial index of the retained position is assigned to the corresponding entries of $A$ for all latent vectors in that cluster.
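The Selector logic described above can be sketched as follows, covering both representative strategies and the construction of the assignment map $A$ (flattened to a list in raster order). Function and argument names are hypothetical, and vectors are plain Python lists.

```python
def select_local(signatures, vectors, use_mean=False):
    """Hypothetical Selector: intra-satellite de-duplication.

    signatures: H'W' signatures in raster order; vectors: matching latent vectors.
    Returns (unique_signatures, assignment_map, vectors_for_entropy_modeling).
    """
    first_pos = {}  # signature -> smallest spatial index holding it
    members = {}    # signature -> all spatial indices in its cluster
    for i, sig in enumerate(signatures):
        first_pos.setdefault(sig, i)
        members.setdefault(sig, []).append(i)

    # dicts keep insertion order, so this is ordered by retained position.
    unique = list(first_pos)
    # Assignment map A (flattened): every position points to its representative.
    assignment = [first_pos[sig] for sig in signatures]

    out = [list(v) for v in vectors]
    if use_mean:
        # Eq. (11): replace every member with the cluster mean.
        dim = len(vectors[0])
        for idx in members.values():
            mean = [sum(vectors[i][c] for i in idx) / len(idx) for c in range(dim)]
            for i in idx:
                out[i] = list(mean)
    # Otherwise y is left unchanged; the vector at the retained position
    # serves as the representative of its cluster.
    return unique, assignment, out


sigs = [7, 3, 7, 9, 3]
vecs = [[1.0], [2.0], [3.0], [4.0], [6.0]]
unique, assignment, averaged = select_local(sigs, vecs, use_mean=True)
print(unique, assignment, averaged)
```

Both passes touch each position a constant number of times per channel, which is consistent with the linear complexity stated next.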

Since the Selector only involves simple accumulation and averaging, its computational complexity is linear in the size of $y$:

$$O_{\text{sel}} = O(H' W' C) \tag{12}$$

3.4. Global Cluster Signature Synchronization

In this section, we elaborate on the global cluster signature synchronization and the follow-up coding, which comprises the Arbiter and the Filter.

Following the intra-satellite signature de-duplication, each satellite exchanges its local signatures with the other satellites within the same observation region. These signatures are transmitted in their original spatial order, and each signature is a compact 64-bit integer (int64). The per-transmission communication volume $V$ is given by:

V = M \cdot C \cdot B

(13)

where $M$ is the number of spatial positions, $C$ is the number of channels per position, and $B$ is the bit width. For a $256 \times 256$ image, its corresponding latent representation, and the resulting DLRC signatures, the typical parameter sets $(M, C, B)$ are $(256^{2}, 3, 8)$ for raw pixels, $(16^{2}, 192, 32)$ for latent vectors, and $(16^{2}, 1, 64)$ for the signatures before de-duplication. Comparing these configurations, the signatures provide a 96-fold reduction in communication volume relative to both the raw and the latent data (which have the same volume, 192 KiB, under this configuration), resulting in a per-transmission volume of 2 KiB.
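The arithmetic behind Eq. (13) can be checked with a few lines of Python. The helper names are ours, and `signature_kib` additionally assumes the same 16× spatial downsampling implied by the $16^{2}$ latent grid of a $256 \times 256$ input.

```python
KIB_BITS = 8 * 1024  # bits per KiB


def comm_volume_bits(m: int, c: int, b: int) -> int:
    """Per-transmission volume V = M * C * B in bits (Eq. 13)."""
    return m * c * b


def signature_kib(patch: int, downsample: int = 16, bits: int = 64) -> float:
    """Signature volume in KiB for a square patch, assuming one int64
    signature per latent position and a 16x downsampled latent grid."""
    return comm_volume_bits((patch // downsample) ** 2, 1, bits) / KIB_BITS


# (M, C, B) for a 256x256 input, as listed in the text.
configs = {
    "raw pixels": (256**2, 3, 8),
    "latent vectors": (16**2, 192, 32),
    "signatures": (16**2, 1, 64),
}
volumes = {name: comm_volume_bits(*mcb) for name, mcb in configs.items()}
for name, v in volumes.items():
    print(f"{name:>15}: {v / KIB_BITS:g} KiB")
print("reduction factor:", volumes["raw pixels"] // volumes["signatures"])
print([signature_kib(p) for p in (256, 512, 1024, 2048)])
```

This prints 192 KiB for both the raw and the latent data, 2 KiB for the signatures, and a reduction factor of 96. Under the stated downsampling assumption, the last line gives 2, 8, 32, and 128 KiB for patch sizes 256 to 2048, matching the per-round signature volumes used in the communication-overhead evaluation.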

Then, after receiving the signature sets from the other participating satellites, the Arbiter checks whether any signature in its locally unique set also appears elsewhere. In case of a conflict, the satellite that retains the signature is determined by customized global rules of the satellite system. For instance, the on-board system can evaluate a priority metric at runtime, such as link quality [49] or energy state [50], or simply use the satellite ID. If the local satellite has the highest priority among all satellites holding the same signature, it retains the conflicting signature; otherwise, the signature is discarded. Because every satellite applies the same rule to the same synchronized signature sets, this independent arbitration yields consistent global de-duplication decisions in a distributed manner.
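A minimal sketch of this arbitration rule is given below. The function name `arbitrate`, the scalar `priority` map, and the tie-breaking by smaller satellite ID are illustrative assumptions rather than the paper's specification; the point is that a deterministic rule evaluated on identical synchronized inputs gives every satellite the same answer.

```python
from typing import Dict, List, Sequence, Set, Tuple


def arbitrate(
    local_id: int,
    local_sigs: Sequence[int],
    remote_sigs: Dict[int, Sequence[int]],
    priority: Dict[int, float],
) -> List[int]:
    """Hypothetical Arbiter sketch: return the local signatures to keep.

    priority[s] is any scalar priority metric for satellite s (higher wins).
    """
    # Record which remote satellites hold each signature.
    holders: Dict[int, Set[int]] = {}
    for sat, sigs in remote_sigs.items():
        for sig in sigs:
            holders.setdefault(sig, set()).add(sat)

    def rank(sat: int) -> Tuple[float, int]:
        # Higher metric wins; ties go to the smaller satellite ID (assumption).
        return (priority[sat], -sat)

    kept: List[int] = []
    for sig in local_sigs:
        rivals = holders.get(sig, set())
        # Keep the signature only if this satellite outranks every rival holder.
        if all(rank(local_id) > rank(r) for r in rivals):
            kept.append(sig)
    return kept
```

Running `arbitrate` on every satellite with the same inputs partitions the union of all signatures into disjoint per-satellite sets, so no signature is kept twice or dropped everywhere.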

The assignment map $A \in \mathbb{N}^{H' \times W'}$ is also updated during this global de-duplication step. For each discarded local signature, the index of the winning signature from the other satellite, derived from its position in the transmitted spatial order, is written into $A$ at the positions that referred to the discarded local signature. After this stage, $A$ consists of $H' \times W'$ indices referring to globally unique representatives. Each index must encode both the satellite identity and the spatial position within that satellite, which requires a relatively large bit width per entry. To reduce this overhead, the Arbiter performs a dense remapping of these global indices. Since each satellite holds the complete set of signatures, the indices in $A$ can be independently remapped at each satellite into contiguous, non-overlapping dense number ranges, where the starting offset of each satellite is the total number of unique representatives retained by all higher-priority satellites. After this remapping, the entries in $A$ no longer indicate the global spatial position of a representative but its global sequential number; each number remains in one-to-one correspondence with a representative, while the bit width per entry is reduced. The final assignment map $A$ is further compressed with a lossless algorithm (e.g., LZMA [51]) and downlinked alongside the compressed bitstream as metadata. Moreover, since all signatures are synchronized across satellites, the assignment maps of all satellites can be computed entirely on a single satellite at the same overhead, and compressing them together is typically more favorable for LZMA.
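The dense remapping and the joint LZMA step can be sketched with Python's standard `lzma` module. This is an illustrative sketch, not the authors' code: the helper names are hypothetical, representatives within one satellite are assumed to be numbered in spatial order, and the byte-aligned serialization fed to LZMA is our assumption (the text specifies only the minimum fixed-length bit width for the raw map).

```python
import lzma
from typing import Dict, List, Sequence, Tuple

GlobalIndex = Tuple[int, int]  # (satellite ID, spatial index)


def dense_remap(
    maps: Dict[int, Sequence[GlobalIndex]],
    retained: Dict[int, Sequence[int]],
    order: Sequence[int],
) -> Dict[int, List[int]]:
    """Hypothetical dense remapping of global indices.

    maps[s]     : raster-flattened assignment map of satellite s, each entry
                  the global index of its representative.
    retained[s] : spatial indices of the representatives kept by satellite s.
    order       : satellite IDs from highest to lowest priority.
    """
    number: Dict[GlobalIndex, int] = {}
    offset = 0  # representatives retained by all higher-priority satellites
    for sat in order:
        for pos in sorted(retained[sat]):  # spatial order (assumption)
            number[(sat, pos)] = offset
            offset += 1
    return {sat: [number[g] for g in maps[sat]] for sat in maps}


def pack_and_compress(dense: Dict[int, List[int]], order: Sequence[int]) -> Tuple[int, bytes]:
    """Flatten all maps in priority order; return (raw fixed-width bits, LZMA blob)."""
    flat = [v for sat in order for v in dense[sat]]
    width = max(1, max(flat).bit_length())  # minimum fixed-length bit width
    nbytes = (width + 7) // 8  # byte-aligned serialization for LZMA (assumption)
    payload = b"".join(v.to_bytes(nbytes, "little") for v in flat)
    return width * len(flat), lzma.compress(payload)
```

For two satellites with $2 \times 2$ maps, where satellite 0 (higher priority) keeps positions 0, 1, and 3 and satellite 1 keeps only position 2, the dense maps become `[0, 1, 0, 2]` and `[1, 2, 3, 3]`; every entry then fits in 2 bits instead of a satellite-plus-position index.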

The computational complexity of the Arbiter is nearly linear in the number of involved signatures and assignment entries:

O_{arb} = O (N H^{'} W^{'})

(14)

where N denotes the number of participating satellites.

Finally, the Filter operates on the signature set remaining after the two-stage de-duplication, which is the local subset of the globally unique signatures. It prunes the quantized latent representation $\hat{y}$ accordingly, retaining only the latent vectors whose signatures are included in this set.

Before this step, entropy modeling has already been performed on $y$ (after the optional cluster-wise mean replacement by the Selector) to obtain the distribution parameters $(\mu, \sigma)$ for the entire latent space, each in $\mathbb{R}^{C \times H' \times W'}$ and thus of the same shape as $y$. By selecting the parameters $(\mu, \sigma)$ at the positions of the retained latent vectors, these vectors are directly encoded into the bitstream by the arithmetic encoder. This obviates the need for additional training to fit the distribution of the pruned latent representation.
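The following sketch shows the Filter's selection step and why no retraining is needed: the full-grid $(\mu, \sigma)$ are simply indexed at the retained positions. It is a hedged illustration with hypothetical names; arrays are laid out as `[position][channel]` over the raster-flattened grid (the paper uses channel-first tensors), and instead of a real arithmetic coder it only estimates the rate with the discretized Gaussian likelihood commonly used in hyperprior LIC models.

```python
import math
from typing import List, Sequence, Tuple

Grid = Sequence[Sequence[float]]  # [position][channel]


def filter_latents(
    y_hat: Grid, mu: Grid, sigma: Grid, keep_positions: Sequence[int]
) -> Tuple[List[Sequence[float]], List[Sequence[float]], List[Sequence[float]]]:
    """Keep only retained latent vectors together with their (mu, sigma) slices."""
    keep = sorted(keep_positions)  # preserve raster order for the encoder
    return [y_hat[p] for p in keep], [mu[p] for p in keep], [sigma[p] for p in keep]


def _phi(t: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def estimated_bits(y: Grid, mu: Grid, sigma: Grid) -> float:
    """Rate estimate: -log2 of the Gaussian probability mass of each integer symbol."""
    total = 0.0
    for yv, mv, sv in zip(y, mu, sigma):
        for v, m, s in zip(yv, mv, sv):
            p = _phi((v - m + 0.5) / s) - _phi((v - m - 0.5) / s)
            total += -math.log2(max(p, 1e-9))  # clamp to avoid log(0)
    return total
```

Because the parameters of every position are computed before pruning, dropping a position simply removes its symbols (and their estimated bits) from the stream; the remaining symbols are coded with exactly the parameters the pre-trained model would have used.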

The Filter only requires a linear scan over the latent space, and its computational complexity is:

O_{fil} = O (H^{'} W^{'} C)

(15)

Overall, the computational complexity of the proposed on-board modules is dominated by the Recognizer term and can be summarized as:

O_{total} = O (H^{'} W^{'} k C^{2})

(16)

3.5. Centralized Latent Representation Reconstruction

In this section, we describe the workflow of centralized latent representation reconstruction on the ground, which is performed by the Reconstructor.

After the bitstreams from all satellites are aggregated at the GS, the hyperpriors $z$ of each satellite are directly decoded following the original LIC pipeline. However, as introduced in Section 2.2, for LIC models with joint hyperprior and context entropy modeling, the distribution parameters $(\mu, \sigma)$ of the quantized latent representation $\hat{y}$ are not determined by $z$ alone. Instead, they are generated sequentially from the decoded $z$ together with the previously reconstructed parts of $\hat{y}$. Therefore, $\hat{y}$ must also be decoded and reconstructed sequentially in spatial order.

Accordingly, we recover the $\hat{y}$ of different satellites sequentially, following the inter-satellite priority determined during global arbitration. As illustrated in Figure 5, in the decoding process integrated with DLRC, the Reconstructor traverses the dense assignment map $A$ of each satellite in raster order (left to right, top to bottom). At each spatial position, it first derives the current distribution parameters from the decoded $z$ and the previously reconstructed latent context. Since the entries in $A$ are dense sequential numbers, the retained latent vectors are ordered by these numbers. Specifically, when a new index is encountered in $A$, the corresponding latent vector is entropy-decoded from the bitstream of the current satellite using these distribution parameters. When an index that has already appeared is encountered, the Reconstructor directly copies the corresponding latent vector from the previously reconstructed latent representation. Repeating this process sequentially reconstructs the full latent representation $\tilde{y}$ of each satellite. As a result, $\tilde{y}$ is a cluster-consistent approximation of the original $\hat{y}$, in which all spatial positions sharing the same signature are represented by a single representative latent vector. It can then be fed into the original LIC decoder and mapped back to the pixel domain to produce the reconstructed image.
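The decode-or-copy traversal can be sketched as below. This is a hypothetical illustration rather than the Reconstructor's actual code: `decode_next` is a stand-in for context-aware entropy decoding of the next vector from the current satellite's bitstream, and a table keyed by dense number is shared across satellites that are processed in priority order. Under the numbering assumed in the previous sketches, an index that has not been seen yet always belongs to the current satellite, because conflicts are resolved in favor of higher-priority satellites and each cluster's retained position is its first position in raster order.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def reconstruct(
    dense_map: Sequence[int],
    decode_next: Callable[[int, List[Vector]], Vector],
    decoded: Dict[int, Vector],
) -> List[Vector]:
    """Hypothetical Reconstructor sketch for one satellite.

    dense_map   : raster-flattened dense assignment map A of this satellite.
    decode_next : stand-in for entropy decoding; receives the position and the
                  vectors reconstructed so far (the latent context).
    decoded     : dense number -> vector table shared across satellites.
    """
    y_tilde: List[Vector] = []
    for pos, num in enumerate(dense_map):
        if num not in decoded:
            # New index: entropy-decode from this satellite's bitstream.
            decoded[num] = decode_next(pos, y_tilde)
        # Seen index (or just decoded): copy the representative vector.
        y_tilde.append(decoded[num])
    return y_tilde
```

Using the dense maps `[0, 1, 0, 2]` and `[1, 2, 3, 3]` from the remapping sketch, the first satellite decodes three vectors from its bitstream, while the second decodes only one and copies the other three entries.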

4. Evaluation

4.1. Experimental Setup

Environment. Our experimental testbed is a node equipped with 8 NVIDIA A100 SXM4 80 GB GPUs (NVIDIA, Santa Clara, CA, USA), two 32-core Intel(R) Xeon(R) Platinum 8369B @ 2.90 GHz CPUs (Intel, Santa Clara, CA, USA), and 2 TB of RAM. The node runs Ubuntu 24.04.2 LTS with Linux kernel 5.10, with a software stack consisting of Python 3.12.3, PyTorch 2.8.0a0, and CUDA 12.9. We use this high-performance node to accelerate the training and evaluation of the baseline models. In addition, we employ a lower-performance node as a proxy for on-board hardware to evaluate the practical overhead of DLRC. This node is equipped with an NVIDIA TITAN V 12 GB GPU (NVIDIA, Santa Clara, CA, USA), two 48-core Intel(R) Xeon(R) E5-2650 v4 @ 2.20 GHz CPUs (Intel, Santa Clara, CA, USA), and 62 GB of RAM, and runs Ubuntu 20.04.6 LTS with Linux 5.15.0-113-generic, Python 3.10.15, PyTorch 2.4.1, and CUDA 12.1. We implement DLRC in Python and integrate it with the CompressAI library [52]. While CompressAI employs the legacy DataParallel (DP) from torch.nn, we re-implement the training pipeline using DistributedDataParallel (DDP) for two reasons. First, the multi-process distributed paradigm of DDP significantly improves training efficiency and scalability on multi-GPU systems compared with the single-process DP approach. Second, and more importantly, DDP provides the inter-device communication primitives required to evaluate DLRC, enabling synchronized operations across GPU devices that are not feasible under the DP configuration.

Datasets. We use the large-scale EO dataset STAR [37]. STAR contains 1273 high-resolution images with spatial resolutions ranging from 0.15 m to 1 m, and most of these images are large, with dimensions ranging from 2000 to 16,000 pixels. The dataset covers 11 categories of geospatial scenarios closely associated with human activities, including airports, ports, nuclear power stations, and dams. The training set of STAR is used to train the baseline models, and comprehensive evaluations are conducted on the validation and test sets. For training, we randomly crop $256 \times 256$ patches from the original images of diverse sizes. For evaluation, we run an identical model instance on each of 8 GPU devices to simulate an overlapping observation scenario involving multiple satellites. We first randomly select observation windows of size $W \times W$, where $W \in \{512, 1024\}$, within each image. Each device then independently crops $256 \times 256$ patches at random positions within the selected windows. A stride of 16 pixels is used for the random patch crops to induce clearly distinguishable overlap patterns. Specifically, we define the overlap degree $o_{\exp}$ as the expected number of observations covering the same ground pixel across the simulated satellites. Formally, $o_{\exp}$ is defined as:

o_{\exp} = \frac{N \cdot P^{2}}{W^{2}}

(17)

where $N$ is the number of satellites and $P$ is the input patch size. Accordingly, under the fixed setup of $N = 8$ and $P = 256$, $W = 512$ corresponds to a denser overlap setting with $o_{\exp} = 2.0$, meaning that each ground pixel is expected to be captured by two satellites under different image contexts. In contrast, $W = 1024$ corresponds to a sparser overlap setting with $o_{\exp} = 0.5$, meaning that the total area of the patches from all satellites amounts to only half of the observation window. This design enables us to approximate a range of realistic multi-satellite overlap scenarios in the absence of real-time multi-satellite observation datasets. In addition, we use the EarthView dataset [53] to further investigate the impact of real viewing geometry in overlapping observations on DLRC. The dataset provides EO images with a ground sampling distance of 1 m, together with comprehensive metadata, including viewing azimuth, off-nadir angle, sun elevation, and sun azimuth. From its Satellogic subset, we randomly select 4024 images for training and another 128 revisit images from different regions for testing. The input consists of the original $384 \times 384$ RGB images.
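The overlap setup of Eq. (17) can be illustrated with a small simulation on the 16-pixel crop grid. The helper names and the one-patch-per-satellite assumption are ours. Since every patch lies entirely inside the window, the window-averaged observation count equals $o_{\exp}$ exactly for any draw; $o_{\exp}$ is therefore a mean, and individual cells still vary between 0 and $N$.

```python
import random
from typing import List


def overlap_degree(n: int, patch: int, window: int) -> float:
    """Expected overlap degree o_exp = N * P^2 / W^2 (Eq. 17)."""
    return n * patch**2 / window**2


def simulate_coverage(
    n: int, patch: int, window: int, stride: int = 16, seed: int = 0
) -> List[List[int]]:
    """Count observations per stride x stride cell for n random crops.

    Assumption: each simulated satellite draws one patch whose top-left
    corner lies on the stride grid inside the window.
    """
    rng = random.Random(seed)
    cells, pcells = window // stride, patch // stride
    cov = [[0] * cells for _ in range(cells)]
    for _ in range(n):
        r0 = rng.randrange(0, cells - pcells + 1)
        c0 = rng.randrange(0, cells - pcells + 1)
        for r in range(r0, r0 + pcells):
            for c in range(c0, c0 + pcells):
                cov[r][c] += 1
    return cov


for w in (512, 1024):
    cov = simulate_coverage(8, 256, w)
    mean = sum(map(sum, cov)) / len(cov) ** 2
    print(w, overlap_degree(8, 256, w), mean, max(map(max, cov)))
```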

Baselines. Since DLRC represents the first real-time multi-satellite redundancy elimination framework within the LIC pipeline, prior multi-satellite methods [30,31] are methodologically orthogonal to ours and do not provide directly comparable baselines for LIC tasks (details in Section 2.2). Therefore, we evaluate our framework by integrating it with four representative LIC models. These models often serve as backbones for recent advances. They represent CNN and lightweight attention-based architectures commonly adopted in practical on-board scenarios. BMSHJ [18] is a foundational work that pioneered the use of a scale hyperprior in end-to-end image compression, which establishes the standard entropy modeling framework adopted by most subsequent LIC methods. MBT [19] is a canonical autoregressive-based model that significantly improves entropy estimation through joint autoregressive and hierarchical priors. Cheng [20] exemplifies attention-based LIC models, demonstrating how lightweight attention mechanisms can be integrated into convolutional architectures to enhance contextual modeling. ELIC [21] is a widely adopted CNN baseline that achieves strong compression performance through unevenly grouped space–channel context modeling and an efficient checkerboard-based context decoding strategy. In addition, we include COSMIC [28], a recent satellite-oriented LIC model with a lightweight encoder and a ground-side compensation mechanism, only for computational-overhead evaluation, given its emphasis on efficient on-board compression.

Training details. On the STAR dataset, the four representative LIC models are trained from scratch for 100 epochs on 8 NVIDIA A100 GPUs using DDP to ensure competitive compression performance. With a batch size of 16 per device, this configuration yields a global batch size of 128. We use the Mean Squared Error (MSE) as the distortion term of the objective function and optimize it with the Adam optimizer [54] at a fixed learning rate of $1 \times 10^{-4}$. To support performance evaluation across different bits per pixel (bpp) regimes, we train a separate model for each $\lambda \in \{0.0067, 0.013, 0.025, 0.0483\}$, covering a representative range of the Rate–Distortion (R–D) trade-off. On the EarthView dataset, we follow the same training setup but train only the ELIC model.
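For readers unfamiliar with how $\lambda$ enters training, the sketch below writes out an R–D objective with an MSE distortion term. It assumes the weighting used in CompressAI's reference rate-distortion loss, $L = \mathrm{bpp} + \lambda \cdot 255^{2} \cdot \mathrm{MSE}$ with MSE measured on images scaled to $[0, 1]$; the exact objective used for the baselines is not spelled out here, and the function name is hypothetical.

```python
import math
from typing import Iterable


def rd_loss(likelihoods: Iterable[float], num_pixels: int, mse: float, lam: float) -> float:
    """R-D objective L = bpp + lam * 255**2 * MSE (assumed CompressAI-style weighting).

    likelihoods : probabilities assigned by the entropy model to each coded symbol
    mse         : distortion on images scaled to [0, 1]
    """
    bits = -sum(math.log2(p) for p in likelihoods)  # estimated rate in bits
    bpp = bits / num_pixels
    return bpp + lam * 255**2 * mse


# One separate model per lambda; larger lambda weights distortion more heavily.
for lam in (0.0067, 0.013, 0.025, 0.0483):
    print(lam, rd_loss([0.5] * 4, 4, 0.001, lam))
```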

Performance metrics. To evaluate reconstruction quality, we employ two widely used metrics: Peak Signal-to-Noise Ratio (PSNR) [55] and Multi-Scale Structural Similarity Index Measure (MS-SSIM) [45]. Both metrics are computed on images quantized to the integer range $[0, 255]$. PSNR is computed from the MSE between the reconstructed image $\hat{x}$ and the original image $x$, where $MAX_{I} = 255$ denotes the maximum possible pixel value:

PSNR(x, \hat{x}) = 10 \log_{10} \left( \frac{MAX_{I}^{2}}{MSE(x, \hat{x})} \right)

(18)
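Eq. (18) translates directly into code. The sketch below is a plain-Python illustration with hypothetical helper names; it first maps $[0, 1]$ floats to integers in $[0, 255]$ (rounding and clipping), as the text specifies that both metrics are computed on quantized images.

```python
import math
from typing import List, Sequence


def to_uint8_range(values: Sequence[float]) -> List[int]:
    """Map [0, 1] floats to integers in [0, 255] with rounding and clipping."""
    return [min(255, max(0, round(v * 255))) for v in values]


def psnr(x: Sequence[int], x_hat: Sequence[int], max_i: int = 255) -> float:
    """PSNR (Eq. 18) between two flattened images with values in [0, max_i]."""
    if len(x) != len(x_hat):
        raise ValueError("images must have the same number of samples")
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_i**2 / mse)
```

For example, an error of one gray level at every pixel gives $\mathrm{MSE} = 1$ and hence $10 \log_{10}(255^{2}) \approx 48.13$ dB, which puts the 0.01 dB variations reported below in perspective.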

MS-SSIM is calculated following the multi-scale structural similarity formulation:

\text{MS-SSIM}(x, \hat{x}) = [l_{J}(x, \hat{x})]^{\alpha_{J}} \prod_{j = 1}^{J} [c_{j}(x, \hat{x})]^{\beta_{j}} [s_{j}(x, \hat{x})]^{\gamma_{j}}

(19)

where $J$ denotes the number of scales, $l_{J}(\cdot)$ represents the luminance comparison at the coarsest scale, and $c_{j}(\cdot)$ and $s_{j}(\cdot)$ are the contrast and structure comparison functions defined in SSIM, respectively, evaluated at scale $j$. The scale exponents $\alpha_{J}$, $\beta_{j}$, and $\gamma_{j}$ are set to the standard values defined in [45]. While PSNR reflects pixel-wise reconstruction fidelity, MS-SSIM correlates more closely with human perception. For both metrics, higher values indicate better reconstruction quality. Together, they offer a standardized and comprehensive assessment of R–D performance.

DLRC configuration. Unless otherwise specified, the number of hash functions (i.e., hashing rounds) $k$ in DLRC is set to 20 to suppress approximation errors introduced during clustering. When generating cluster representatives, we employ the mean-based strategy by default to optimize reconstruction quality. We treat each device in our testbed as an individual satellite and use the global rank of each device as a straightforward instantiation of the downlinking priority. These static configurations ensure both a fair comparison and experimental reproducibility.

4.2. Experimental Results

R–D performance. Figure 6 presents the R–D performance of the four baselines and the gains achieved by integrating them with DLRC. To fully reveal the impact of latent representation pruning on reconstruction quality, we focus here on the primary entropy-coded bpp for both the baselines and their DLRC-integrated versions. All results are obtained on the STAR test set and organized into two parts: Figure 6a shows the results in terms of PSNR, while Figure 6b shows MS-SSIM. Within each part, rows correspond to specific baseline architectures, and within each row, we compare results under two overlap degrees: $o_{\exp} = 2.0$ and $o_{\exp} = 0.5$. Because different overlap degrees correspond to distinct input image compositions, their results are presented in separate subplots to ensure a fair comparison. In each plot, curves closer to the top-left corner indicate superior performance, as they achieve higher fidelity (PSNR) or perceptual quality (MS-SSIM) at a lower bpp. As illustrated, across all evaluated models, DLRC achieves a bpp reduction ranging from 33.46% to 39.70% under the denser overlap degree ($o_{\exp} = 2.0$), whereas the reduction remains between 9.29% and 11.78% under the sparser overlap degree ($o_{\exp} = 0.5$). Meanwhile, the corresponding PSNR variations are typically confined within 0.01 dB, and the MS-SSIM differences remain within 0.001. These results indicate that DLRC maintains nearly identical reconstruction quality to the baselines while substantially reducing the bpp. The consistency of the gains across diverse baseline architectures further demonstrates the robustness of DLRC. This also confirms its effectiveness in capturing and eliminating cross-observation redundancies, with efficiency that scales with increasing multi-satellite coverage density.

Metadata overhead. Figure 7 compares the total bpp of the four baseline models and their DLRC-integrated versions. For the latter, the total bpp includes both the entropy-coded part and the metadata overhead, which is reported separately for the dense-remapped assignment map and for its LZMA-compressed version. The baseline results and the entropy-coded bpp are identical to those in Figure 6, and the evaluation follows the same configuration and procedure. For the raw assignment map, after dense remapping, each sample is stored using the minimum fixed-length bit width required to represent all of its assignment indices. For the LZMA case, as described in Section 3.4, the complete assignment maps from all participating satellites are sequentially flattened and compressed together. As shown in Figure 7, the raw assignment overhead remains relatively stable, mainly because most entries have a fixed 16-bit length. Under sparse overlap (Figure 7b), this overhead can partially offset the coding gain of DLRC and even make the total bpp slightly higher than that of the original baseline. In contrast, after LZMA compression, the metadata overhead becomes negligible across all model architectures, $\lambda$ settings, and overlap degrees. Overall, after accounting for the LZMA-compressed assignment metadata, DLRC achieves a bpp reduction ranging from 28.74% to 38.91% under the denser overlap setting ($o_{\exp} = 2.0$), while the reduction remains between 6.00% and 11.15% under the sparser overlap setting ($o_{\exp} = 0.5$).
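The bookkeeping behind these totals is simple enough to write down. In the sketch below (hypothetical helper names), a $256 \times 256$ patch with a $16 \times 16$ assignment map and 16-bit entries costs $16 \cdot 16 \cdot 16 / 256^{2} = 0.0625$ bpp of raw metadata, which shows how an uncompressed map can outweigh a modest entropy-coded saving. The bpp values in the usage lines are illustrative numbers only, not results from the figures.

```python
def metadata_bpp(num_entries: int, bits_per_entry: int, num_pixels: int) -> float:
    """Raw (fixed-width) assignment-map overhead in bits per pixel."""
    return num_entries * bits_per_entry / num_pixels


def bpp_reduction(base_bpp: float, dlrc_entropy_bpp: float, meta_bpp: float) -> float:
    """Relative total-bpp reduction (%) of DLRC over the baseline."""
    return 100.0 * (1.0 - (dlrc_entropy_bpp + meta_bpp) / base_bpp)


raw_meta = metadata_bpp(16 * 16, 16, 256 * 256)  # 0.0625 bpp
# Illustrative numbers only: a ~7% entropy-coded saving is wiped out by the raw map.
print(raw_meta, bpp_reduction(0.30, 0.28, raw_meta))
```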

Visual results. Figure 8 shows reconstructed image samples from the validation set at varying bpp levels. To evaluate reconstruction quality under extreme constraints, we select results from the device with the lowest downlinking priority at an overlap degree of $o_{\exp} = 2.0$. This configuration typically yields the largest quality degradation, since the lowest-priority satellite loses every signature conflict during arbitration, so more of its latent vectors are represented by vectors from other satellites. Specifically, Figure 8a presents the results of models with $\lambda = 0.0067$ at a lower bpp level, while Figure 8b depicts those with $\lambda = 0.0483$ at a higher bpp level. Each subfigure is organized into two rows: the first row presents the original image followed by the results of the four baseline models, while the second row displays the corresponding baselines integrated with DLRC. The model name, bpp, and quality metrics are provided below each image for comparison. As observed in the figure, the structures and colors of the reconstructed images remain virtually unchanged between each baseline and its DLRC-enhanced version, while the bpp is substantially reduced. The metrics below each image confirm this, consistent with the R–D results. This indicates that DLRC maintains consistent performance even in the worst case, rather than achieving its average gains at the expense of significant quality degradation on the low-priority satellite.

Impact on downstream tasks. Beyond the standard evaluations within the image compression domain, we further investigate the impact of DLRC on downstream EO image tasks. We adopt the object detection task defined in the original STAR work [37] to examine whether DLRC affects the semantic analysis of reconstructed images. Specifically, image patches are obtained from each image in the STAR validation set under an overlap degree of $o_{\exp} = 2.0$. These patches are then compressed and reconstructed using models with $\lambda = 0.0483$ to generate the input for the detection task. Subsequently, we follow the detection pipeline established in STAR, where objects truncated at the patch boundaries are excluded from the evaluation. The results are summarized in Table 1, where the percentages listed below each model represent the mean Average Precision (mAP) for object detection. Compared with the corresponding baselines, the decrease in mAP induced by DLRC remains consistently below 1%, which indicates that DLRC maintains high semantic fidelity for downstream analysis. Notably, mAP increases are observed for the DLRC-enhanced versions of MBT and Cheng. This improvement may be attributed to the representation merging induced by the clustering mechanism of DLRC, which could reinforce certain semantic features in EO images.

Computational overhead. We evaluate the computational overhead on the NVIDIA TITAN V node described in Section 4.1, which serves as a proxy for a resource-constrained environment. Table 2 first presents a detailed breakdown of the encoding overhead of the baseline models and the additional cost introduced by DLRC, in terms of computation time and overall memory usage. For Table 2, the input patch size is fixed at $256 \times 256$, and the results are reported at $\lambda = 0.0067$. Models with $\lambda$ from 0.0067 to 0.025 typically share the same model setting (e.g., latent channel dimension $C = 192$) and thus have the same computational overhead, while the model with $\lambda = 0.0483$ uses a larger setting (e.g., $C = 320$). Although DLRC can in principle be parallelized with parts of the LIC pipeline, such parallel execution is often difficult to fully realize on resource-constrained hardware. Therefore, we conservatively treat the computation of DLRC as serial overhead in our evaluation. As shown in the table, owing to the low computational complexity of DLRC, its relative time is very low for baselines with heavier context modeling (Cheng and ELIC). In contrast, for simpler hyperprior-only models, the same DLRC computation accounts for a larger fraction of the total encoding time. Meanwhile, the memory usage of DLRC is negligible compared with that of the baseline models, since it operates directly on the existing latent representations and only introduces index data. Figure 9 then shows how the computational overhead of the baseline models and DLRC grows with increasing patch size. In addition to $\lambda = 0.0067$, we also report results for $\lambda = 0.0483$ to include a larger model setting. As Figure 9a shows, although the relative time of DLRC is more noticeable for small patches in the hyperprior-only models, its proportion decreases rapidly as the patch size increases. As Figure 9b shows, the memory usage of DLRC grows much more slowly than that of the baseline models and remains very low across all patch sizes. Overall, these results indicate that DLRC introduces only limited computational overhead, especially for larger patch sizes and stronger LIC baselines.

Communication overhead. To enable a fine-grained evaluation of the communication overhead, we build a satellite communication simulator, EdgeSpaceCom. Detailed descriptions of the simulator design and runtime settings are provided in Appendix A. At each sampling instant, the simulator derives the real-time inter-satellite link bandwidth and topology among the 8 satellites in the observing region from the current constellation state, and then simulates the synchronization traffic of local signatures at the flow level. A sync round is defined as the waiting time from when one satellite sends its local signatures to when it receives all signatures from the other participating satellites. In this experiment, the input patch sizes are $\{256, 512, 1024, 2048\}$. According to Section 3.4, the corresponding signature data volumes are $\{2, 8, 32, 128\}$ KiB per satellite in each sync round. As shown in Figure 10, the simulator uses a sampling interval of 1 min over a total duration of 12 h, resulting in 720 samples. The simulation is based on a classical Walker constellation [56] and therefore exhibits a periodic pattern in communication overhead. Under this configuration, the synchronization time remains short and fluctuates only slightly. Since this communication can naturally overlap with the LIC entropy modeling procedure, its overhead can be fully hidden by that stage when context models are involved, according to the entropy modeling times in Table 2. For hyperprior-only models, it may still introduce some synchronization waiting time. However, according to Figure 9 and Figure 10, this overhead can also be fully hidden as the patch size increases, since the computation time of the baseline models grows much faster.

End-to-end gain. To formalize the end-to-end gain of DLRC for satellite image downlink systems, we denote the baseline and DLRC by B and D, respectively. Under a given downlink configuration $(N_{w}, \tau_{w}, x)$, where $N_{w}$ is the number of communication windows per day, $\tau_{w}$ is the duration of each window, and $x$ is the downlink rate, the daily volume of raw data processed by the two methods is defined as:

P_{B} = \frac{T_{day} S_{raw}}{t_{B}}, \quad P_{D} = \frac{T_{day} S_{raw}}{t_{D}}

(20)

where $T_{day}$ is the duration of one day, $S_{raw}$ is the raw bit size of a single image patch, and $t_{B}$ and $t_{D}$ are the per-patch processing times. The corresponding daily raw-equivalent downlink capability is given by:

L_{B} = \frac{N_{w} \tau_{w} x \, r_{raw}}{\mathrm{bpp}_{B}}, \quad L_{D} = \frac{N_{w} \tau_{w} x \, r_{raw}}{\mathrm{bpp}_{D}}

(21)

where $r_{raw}$ denotes the total raw bit depth per pixel. Since the actually delivered data are bounded by both production and transmission, the effective daily downlinked data for each method is the minimum of these two terms. Therefore, the downlink gain of DLRC is formulated as:

G_{down}(N_{w}, \tau_{w}, x) = \frac{\min(P_{D}, L_{D})}{\min(P_{B}, L_{B})} - 1

(22)

Accordingly, $G_{down} = P_{D}/L_{B} - 1$ indicates the regime where only the baseline is downlink-limited, so the bpp gain of DLRC is weakened by its additional processing overhead. Under tighter downlink configurations, $G_{down} = L_{D}/L_{B} - 1$ indicates the regime where both methods are downlink-limited, so DLRC fully attains the gain from the bpp reduction. Figure A2 in Appendix B further visualizes these results, using the total bpp in Figure 7, the computation time in Figure 9, and the maximum communication time in Figure 10.
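Eqs. (20) to (22) can be evaluated directly, which makes the regimes easy to explore. The sketch below uses hypothetical parameter names and units (seconds, bit/s, bits); all numbers in the usage lines are illustrative, not measurements from this work. It also exposes the third regime implied by Eq. (22): when the downlink is loose enough that both methods are production-limited, $G_{down} = P_{D}/P_{B} - 1 = t_{B}/t_{D} - 1$, which is non-positive whenever DLRC adds processing time.

```python
def downlink_gain(
    n_w: float, tau_w: float, x: float,
    t_b: float, t_d: float, bpp_b: float, bpp_d: float,
    s_raw: float, r_raw: float, t_day: float = 86400.0,
) -> float:
    """G_down from Eqs. (20)-(22); times in s, x in bit/s, sizes in bits."""
    p_b = t_day * s_raw / t_b  # daily raw data produced, baseline (Eq. 20)
    p_d = t_day * s_raw / t_d  # daily raw data produced, DLRC
    l_b = n_w * tau_w * x * r_raw / bpp_b  # raw-equivalent downlink (Eq. 21)
    l_d = n_w * tau_w * x * r_raw / bpp_d
    return min(p_d, l_d) / min(p_b, l_b) - 1.0  # Eq. (22)


# Illustrative numbers only: 256x256 RGB patches at 24 bits per pixel.
common = dict(n_w=10, tau_w=600, t_b=0.10, t_d=0.12, bpp_b=0.5, bpp_d=0.4,
              s_raw=256 * 256 * 24, r_raw=24)
print(downlink_gain(x=1e6, **common))  # tight downlink: bpp_b / bpp_d - 1 = 0.25
print(downlink_gain(x=1e9, **common))  # loose downlink: t_b / t_d - 1 < 0
```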

Viewing geometry effect. Since real-time multi-satellite observation datasets with complete annotations are currently unavailable, we use historical revisit data to explore the potential impact of viewing geometry on the performance of DLRC. This experiment is conducted on a test set constructed from cross-temporal revisit images of several regions in the Satellogic subset of the EarthView dataset. The evaluation procedure is similar to that of the R–D performance experiment, except that no overlap-degree distinction is made and ELIC is used as the representative baseline. The angle ranges among different revisit images, including changes in viewing azimuth, off-nadir angle, sun elevation, and sun azimuth, are summarized in Appendix B, Table A2. As shown in Figure 11, even under substantial differences in imaging conditions, DLRC still provides modest performance gains, reducing bpp by up to 8.12% at comparable quality, which demonstrates its robustness.

4.3. Ablation Study

Number of hash functions. We perform the ablation study on the representative ELIC baseline under an overlap degree of $o_{\exp} = 2.0$. Figure 12 shows the impact of the number of hash functions $k$ on the key metrics of DLRC, including R–D performance and computation time (at $\lambda = 0.0067$). As shown in the figure, using more hash functions leads to a higher bpp but better reconstruction quality. This indicates that DLRC can effectively identify redundant latent representations, and that its recognition accuracy improves rapidly as the number of hash functions increases. In particular, $k = 8$ already provides sufficient accuracy to avoid severe quality degradation caused by incorrect merging. Therefore, the number of hash functions serves as a practical lever to balance compression performance against computational overhead.

Representative selection strategies. Figure 12 also compares the two representative selection strategies under the same number of hash functions. When $k$ is small, the difference between the two strategies is more pronounced, and the Original Rep. strategy achieves a clearer reduction in bpp. As $k$ increases, however, the gap gradually narrows. This is because higher recognition accuracy makes the original latent vector and the cluster mean increasingly similar within each cluster, eventually leaving almost no difference between the two strategies. Meanwhile, the Original Rep. strategy yields only a limited improvement in computational efficiency, mainly because the Selector, unlike the Recognizer, is not the dominant source of computation in DLRC.

5. Discussion

Perspective on Tile-Level Processing. Since satellites capture images covering vast geographical areas, on-board computational constraints often necessitate partitioning these large-scale images into smaller tiles for individual processing [14,57]. Consequently, the redundancy within the latent representation may also be fragmented across these tiles. This fragmentation motivates extending DLRC from spatial synchronization within each single round to spatial-temporal redundancy elimination across multiple rounds. Doing so requires treating such redundancy elimination as a coordinated task, which leads us to further position DLRC as a foundational framework for joint optimization within a highly dynamic multi-satellite system to maximize the overall compression gain.

Flexibility and Optimization Trade-off. DLRC achieves architectural compatibility and obviates the need to retrain each individual LIC model. Conversely, the transforms and entropy model of the LIC model are not retrained for the post-pruning distribution, which introduces two limitations. First, the transforms are not optimized to produce a DLRC-adaptive latent representation or to maximize reconstruction quality from the pruned latent representation. Second, the distribution parameters predicted by the entropy model are not estimated from the actual pruned latent representation, resulting in suboptimal bit allocation. Future research could explore hybrid models that adaptively pursue the R–D performance frontier while maintaining the flexibility of DLRC.

6. Conclusions

In this work, we propose Distributed Latent Representation Clustering (DLRC), an efficient image compression framework designed to exploit and eliminate redundancy in multi-satellite observations. DLRC relies only on lightweight hash computation and negligible signature communication, and it integrates seamlessly with existing learned image compression models without retraining. Experimental results demonstrate that, on the STAR dataset, DLRC achieves up to a 38.91% reduction in total bits per pixel compared with the baseline models, while PSNR variations are typically confined within 0.01 dB and MS-SSIM differences remain within 0.001. On our resource-constrained proxy platform, the computational overhead of DLRC remains low, and in our simulator, its communication overhead can be hidden. Through end-to-end analysis, DLRC translates its bpp reduction into an increase in downlinked data under multiple practical constraints. On the EarthView dataset, with large variations in viewing geometry and illumination, DLRC still preserves up to 8.12% bpp savings. Owing to these properties, DLRC is well suited for deployment in current satellite constellations to substantially enhance real-time compression performance.

Author Contributions

Conceptualization, X.L.; methodology, X.L.; software, X.L. and X.G.; validation, X.L. and X.G.; formal analysis, X.L.; investigation, X.L. and X.G.; resources, P.W., Z.C. and Y.Z.; data curation, X.L.; writing—original draft preparation, X.L.; writing—review and editing, P.W., Z.C. and Y.Z.; visualization, X.L. and X.G.; supervision, P.W., Z.C. and Y.Z.; project administration, P.W., Z.C. and Y.Z.; funding acquisition, P.W., Z.C. and Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the National Key Research and Development Program of China (2022YFC2203700).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

EO	Earth Observation
ISL	Inter-Satellite Link
GS	Ground Station
LIC	Learned Image Compression
DLRC	Distributed Latent Representation Clustering

Appendix A

This appendix describes EdgeSpaceCom, a simulator of satellite communication overhead. EdgeSpaceCom models the inter-satellite communication overhead of a constellation in discrete time: it reconstructs the time-varying inter-satellite topology from offline orbital trajectories, maps a link budget to link rates and per-hop costs, and then computes the completion time of parallel-unicast synchronization rounds under multi-hop routing.

As illustrated in Figure A1, EdgeSpaceCom provides a communication-simulation framework for overhead analysis in mega-constellations. It first samples constellation trajectories within a specified time window, then constructs a time-slotted topology graph and computes link costs at each time step. Next, a routing solver computes minimum-delay paths on the resulting time-varying graphs. Finally, it aggregates the round completion time and the topology-event overhead to evaluate network performance.

Figure A1. Overall workflow of the communication-overhead simulator.

EdgeSpaceCom first converts continuous trajectories into a unified sequence of discrete sampling instants: the simulation window is $T = 12$ h, the sampling step is $\Delta t = 1$ min, and the $k$-th sampling instant is denoted by $t_{k}$.

Orbital trajectories are exported from the STK simulator. The scenario uses a Walker constellation of 1000 satellites at an altitude of 550 km and covers their full 24 h motion. Trajectory data are recorded as timestamp–3D position pairs, with timestamps in UTCG format and positions in the ECI coordinate frame. In offline simulation, a window of length $T$ is cropped from the full trajectory and then sampled with step $\Delta t$ to obtain the discrete-time sequence. All subsequent topology construction, link-cost computation, and round-delay statistics share this sequence as the common time base. The current implementation generates trajectory snapshots using a fixed time window and uniform sampling, and triggers one full network computation at each sampling instant.
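As a minimal sketch of how such a common time base could be built, the snippet below crops a 12 h window from one satellite's recorded trajectory and resamples it every 1 min. The function name `sample_window`, the use of seconds since an epoch instead of UTCG strings, keeping both window endpoints, and linear interpolation between recorded ECI positions are all illustrative assumptions, not details of EdgeSpaceCom.

```python
import numpy as np


def sample_window(times_s, positions_km, t_start_s, window_s=12 * 3600, step_s=60):
    """Crop a window of length T from one satellite's trajectory and resample it.

    times_s: (M,) increasing timestamps in seconds (assumed already converted from UTCG).
    positions_km: (M, 3) ECI positions in km at those timestamps.
    Returns the sampling instants t_k and the (K, 3) positions at those instants.
    """
    # Uniform instants t_k = t_start + k * step, keeping both window endpoints (assumption).
    t_k = t_start_s + np.arange(0, window_s + step_s, step_s)
    if t_k[0] < times_s[0] or t_k[-1] > times_s[-1]:
        raise ValueError("sampling window lies outside the recorded trajectory")
    # Linearly interpolate each ECI coordinate onto the common time base (assumption).
    pos_k = np.column_stack(
        [np.interp(t_k, times_s, positions_km[:, axis]) for axis in range(3)]
    )
    return t_k, pos_k
```

With $T = 12$ h and $\Delta t = 1$ min, this convention yields 721 sampling instants; excluding the final endpoint would give 720.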

Given the discrete instants $\{t_{k}\}$, the simulator selects $N$ satellites in the region of interest at each $t_{k}$ as participating nodes (by default, a spatially compact subset), then constructs the instantaneous topology and computes multi-hop paths and end-to-end delays. The network and link-budget parameters used in EdgeSpaceCom are listed in Table A1.
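The exact rule EdgeSpaceCom uses to pick a "spatially compact subset" is not specified here. Purely as a hypothetical illustration, one simple rule is to take the satellite closest to a region-of-interest point and then its $N - 1$ nearest neighbours at each instant $t_{k}$. The function name `select_compact_subset` and the ROI-point input are assumptions.

```python
import numpy as np


def select_compact_subset(positions_km, roi_point_km, n=8):
    """Hypothetical selection of n mutually close satellites at one sampling instant.

    positions_km: (M, 3) ECI positions of all satellites at t_k.
    roi_point_km: (3,) ECI point representing the region of interest.
    Returns the indices of the selected satellites, anchor first.
    """
    positions_km = np.asarray(positions_km, dtype=float)
    # Anchor: the satellite closest to the region of interest.
    anchor = int(np.argmin(np.linalg.norm(positions_km - roi_point_km, axis=1)))
    # Compact cluster: the anchor plus its n-1 nearest neighbours.
    dist_to_anchor = np.linalg.norm(positions_km - positions_km[anchor], axis=1)
    return np.argsort(dist_to_anchor, kind="stable")[:n]
```

The default `n=8` mirrors $N$ in Table A1; a real selector might also enforce that the chosen satellites can actually reach each other over ISLs.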

Table A1. Network and link-budget parameters used in EdgeSpaceCom.

Parameter	Symbol	Value	Unit
Satellites in region	$N$	8	—
Direct-link distance threshold	$d_{max}$	3000	km
Unicast payload	$S$	2/8/32/128	KiB
Speed of light	$c$	$3 \times 10^{8}$	m/s
Relay processing overhead	$t_{relay}$	0.5	ms/hop
Fixed control overhead	$t_{ctrl}$	0.03	s/round/sat
Link setup cost	$C_{setup}$	5	ms/event
Link teardown cost	$C_{teardown}$	2	ms/event
Transmit power	$P_{t}$	40	dBm
Tx antenna gain	$G_{t}$	35	dBi
Rx antenna gain	$G_{r}$	35	dBi
Carrier frequency	$f$	20	GHz
Bandwidth	$W$	100	MHz
Noise temperature	$T_{n}$	290	K
System loss	$L_{sys}$	10	dB

Note: Values are those used in our experiments; “—” indicates a dimensionless quantity.

At any discrete instant $t$, the position of satellite $i$ is denoted by $p_{i}(t)$, and the distance between satellites $i$ and $j$ is:

$$d_{ij}(t) = \lVert p_{i}(t) - p_{j}(t) \rVert_{2} \tag{A1}$$

If $d_{ij}(t) \leq d_{max}$, the link is considered available; formally,

$$\mathrm{link}(i, j, t) = \mathbb{I}\left(d_{ij}(t) \leq d_{max}\right) \tag{A2}$$

where $\mathbb{I}(\cdot)$ is the indicator function.
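Equations (A1) and (A2) vectorize naturally over all satellite pairs at one instant. The sketch below, using a hypothetical helper `link_matrix`, returns the pairwise distance matrix and the boolean availability matrix; excluding self-links on the diagonal is our assumption.

```python
import numpy as np


def link_matrix(positions_km, d_max_km=3000.0):
    """Pairwise distances d_ij(t) (Eq. A1) and availability link(i, j, t) (Eq. A2)."""
    p = np.asarray(positions_km, dtype=float)
    # d_ij = ||p_i - p_j||_2 for every pair, via broadcasting to shape (N, N, 3).
    dist = np.linalg.norm(p[:, None, :] - p[None, :, :], axis=-1)
    # Indicator I(d_ij <= d_max); a satellite is not linked to itself.
    avail = dist <= d_max_km
    np.fill_diagonal(avail, False)
    return dist, avail
```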

The link cost combines propagation delay and transmission delay. For a unicast flow with payload size $S$ (KiB), we first convert it to bits as $S_{bit} = S \times 1024 \times 8$. Let the effective link rate be $B_{ij}(t)$ (bps). The per-hop delay is:

$$t_{hop}(i, j, t) = \frac{d_{ij}(t) \cdot 1000}{c} + \frac{S_{bit}}{B_{ij}(t)} \tag{A3}$$

where the factor 1000 converts $d_{ij}(t)$ from km to m.
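A direct transcription of Eq. (A3) makes the unit handling explicit: $d_{ij}(t)$ in km, $S$ in KiB, $B_{ij}(t)$ in bps, and $c = 3 \times 10^{8}$ m/s as in Table A1. The function name `hop_delay_s` is illustrative.

```python
C_M_PER_S = 3e8  # speed of light, as listed in Table A1


def hop_delay_s(d_km, payload_kib, rate_bps):
    """Per-hop delay t_hop (Eq. A3) in seconds: propagation plus transmission."""
    s_bit = payload_kib * 1024 * 8          # KiB -> bits
    propagation = d_km * 1000 / C_M_PER_S   # km -> m, then divide by c
    transmission = s_bit / rate_bps         # bits / (bits per second)
    return propagation + transmission
```

For example, with an arbitrarily chosen rate of 100 Mbps, a 2000 km hop carrying a 32 KiB payload takes about 6.67 ms of propagation plus 262,144 / 10^8 ≈ 2.62 ms of transmission, roughly 9.29 ms in total.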

Under multi-hop forwarding, let $P_{s \to d}(t)$ be the minimum-delay path from source $s$ to destination $d$ and $H_{s,d}(t)$ its hop count. The path delay and flow delay are:

$$t_{path}(s, d, t) = \sum_{(u, v) \in P_{s \to d}(t)} t_{hop}(u, v, t), \tag{A4}$$

$$t_{flow}(s, d, t) = t_{path}(s, d, t) + \left(H_{s,d}(t) - 1\right) \cdot t_{relay} \tag{A5}$$

where $H_{s,d}(t) - 1$ is the number of intermediate relay satellites on the path.
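Because all per-hop delays are non-negative, the minimum-delay path in Eq. (A4) can be found with Dijkstra's algorithm. The sketch below uses only the standard library. Representing the per-instant graph as a dictionary of per-hop delays, the function name `flow_delay`, and returning `None` for unreachable destinations are assumptions; it also assumes the path is chosen by $t_{path}$ alone, with the relay term of Eq. (A5) added afterwards.

```python
import heapq


def flow_delay(hop_delay, src, dst, t_relay_s=0.5e-3):
    """Minimum-delay path via Dijkstra (Eq. A4) plus relay overhead (Eq. A5).

    hop_delay: dict mapping node -> {neighbour: t_hop in seconds} over available links.
    Returns (t_flow, path), or None if dst is unreachable from src.
    """
    if src == dst:
        raise ValueError("Eqs. (A4)-(A5) are defined for src != dst")
    best = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        t, u = heapq.heappop(heap)
        if u == dst:
            break  # first pop of dst carries its minimum delay
        if t > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in hop_delay.get(u, {}).items():
            cand = t + w
            if cand < best.get(v, float("inf")):
                best[v] = cand
                prev[v] = u
                heapq.heappush(heap, (cand, v))
    if dst not in best:
        return None
    # Rebuild the path to obtain the hop count H_{s,d}.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    path.reverse()
    hops = len(path) - 1
    return best[dst] + (hops - 1) * t_relay_s, path
```

The default `t_relay_s` corresponds to the 0.5 ms/hop relay overhead in Table A1.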

In a parallel-unicast synchronization round, each satellite unicasts its payload to every other participating satellite in parallel. The completion times for sending and receiving at satellite $s$ are:

$$T_{send}(s, t) = \max_{d \neq s} t_{flow}(s, d, t), \qquad T_{recv}(s, t) = \max_{u \neq s} t_{flow}(u, s, t) \tag{A6}$$

The synchronization completion time at satellite $s$ and the round completion time are:

$$T_{sync}(s, t) = \max\left(T_{send}(s, t), T_{recv}(s, t)\right) + t_{ctrl}, \tag{A7}$$

$$T_{round}(t) = \max_{s} T_{sync}(s, t) \tag{A8}$$
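Given a matrix of flow delays among all participating satellites at one instant, Eqs. (A6) to (A8) reduce to row maxima, column maxima, and one global maximum. In this sketch, `flow_s[s, d]` is assumed to hold $t_{flow}(s, d, t)$ in seconds, the diagonal is ignored, and the function name `round_completion` is hypothetical.

```python
import numpy as np


def round_completion(flow_s, t_ctrl_s=0.03):
    """Per-satellite sync times T_sync (Eq. A7) and round time T_round (Eq. A8).

    flow_s: (N, N) array with flow_s[s, d] = t_flow(s, d, t); the diagonal is ignored.
    """
    f = np.array(flow_s, dtype=float)      # copy, so the caller's array is untouched
    np.fill_diagonal(f, -np.inf)           # exclude d == s and u == s from the maxima
    t_send = f.max(axis=1)                 # Eq. A6: max over destinations d != s
    t_recv = f.max(axis=0)                 # Eq. A6: max over sources u != s
    t_sync = np.maximum(t_send, t_recv) + t_ctrl_s  # Eq. A7
    return t_sync, t_sync.max()                     # Eq. A8
```

The default `t_ctrl_s=0.03` is the fixed control overhead from Table A1.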

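The topology-event overhead between two adjacent sampling instants is defined as:

$$\Omega_{topo}(t) = N_{setup}(t) \cdot C_{setup} + N_{teardown}(t) \cdot C_{teardown} \tag{A9}$$

where $N_{setup}(t)$ and $N_{teardown}(t)$ are the numbers of links established and torn down, respectively, since the previous sampling instant.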

Here, $t_{ctrl}$ captures the fixed per-round control processing time, whereas $\Omega_{topo}(t)$ captures the additional control burden induced by topology dynamics, i.e., link setups and teardowns. $\Omega_{topo}(t)$ is recorded as a separate metric: the round completion time $T_{round}(t)$ is the worst-case synchronization completion time across satellites and excludes $\Omega_{topo}(t)$, so the two are not double-counted.
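One way to obtain $N_{setup}(t)$ and $N_{teardown}(t)$ for Eq. (A9) is to compare the availability matrices of two consecutive sampling instants. Counting each undirected link once, the function name `topo_overhead_ms`, and the use of boolean adjacency matrices are assumptions of this sketch; the default costs are those of Table A1.

```python
import numpy as np


def topo_overhead_ms(avail_prev, avail_curr, c_setup_ms=5.0, c_teardown_ms=2.0):
    """Topology-event overhead Omega_topo (Eq. A9) between adjacent instants, in ms."""
    prev = np.asarray(avail_prev, dtype=bool)
    curr = np.asarray(avail_curr, dtype=bool)
    # Upper triangle only, so each undirected link (i, j) is counted once.
    iu = np.triu_indices_from(curr, k=1)
    n_setup = int(np.sum(curr[iu] & ~prev[iu]))     # links that appeared
    n_teardown = int(np.sum(prev[iu] & ~curr[iu]))  # links that disappeared
    return n_setup * c_setup_ms + n_teardown * c_teardown_ms
```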

In the end-to-end delay computation above, the key quantity is the effective rate $B_{ij}(t)$ of each available link at time $t$. We use an RF free-space link budget together with a Shannon-capacity approximation to map distance to effective rate. For a link of length $d_{ij}(t)$ (km), the free-space path loss (dB) is:

$$L_{fs}(d_{ij}(t), f) = 32.44 + 20\log_{10}\left(d_{ij}(t)\right) + 20\log_{10}(f) \tag{A10}$$

where the constant 32.44 dB applies for $d_{ij}(t)$ in km and $f$ in MHz.
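As a worked check, evaluating Eq. (A10) at the direct-link threshold $d_{max} = 3000$ km and the 20 GHz carrier of Table A1 (i.e., $f = 20{,}000$ MHz) gives:

$$L_{fs}(3000, 20{,}000) = 32.44 + 20\log_{10}(3000) + 20\log_{10}(20{,}000) \approx 32.44 + 69.54 + 86.02 = 188.00\ \text{dB}$$

Because $L_{fs}$ grows with distance, every available link ($d_{ij}(t) \leq d_{max}$) sees a free-space loss of at most about 188 dB.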

The received power $P_{r}(i, j, t)$ (dBm) is:

$$P_{r}(i, j, t) = P_{t} + G_{t} + G_{r} - L_{fs}(d_{ij}(t), f) - L_{sys} \tag{A11}$$

The thermal-noise power $N(i, j)$ (dBm), with bandwidth $W$ in MHz, is:

$$N(i, j) = -174 + 10\log_{10}(T_{n}) + 10\log_{10}\left(W \cdot 10^{6}\right) \tag{A12}$$

Thus, the signal-to-noise ratio $SNR(i, j, t)$ is given by:

$$SNR_{dB}(i, j, t) = P_{r}(i, j, t) - N(i, j), \qquad SNR(i, j, t) = 10^{SNR_{dB}(i, j, t)/10} \tag{A13}$$

Let $\delta$ denote the protocol-overhead factor (we use $\delta = 0.2$). The effective link rate (bps) is:

$$B_{ij}(t) = \left(W \cdot 10^{6}\right) \log_{2}\left(1 + SNR(i, j, t)\right) \cdot (1 - \delta) \tag{A14}$$
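The sketch below chains Eqs. (A10) to (A14) to map a link distance to its effective rate, using the Table A1 values as defaults. It follows the equations as written, with $f$ and $W$ both in MHz; the function name `effective_rate_bps` is hypothetical.

```python
import math


def effective_rate_bps(d_km, f_mhz=20_000.0, w_mhz=100.0, p_t_dbm=40.0,
                       g_t_dbi=35.0, g_r_dbi=35.0, l_sys_db=10.0,
                       t_n_k=290.0, delta=0.2):
    """Effective link rate B_ij(t) in bps from Eqs. (A10)-(A14)."""
    # Eq. (A10): free-space path loss in dB (d in km, f in MHz).
    l_fs = 32.44 + 20 * math.log10(d_km) + 20 * math.log10(f_mhz)
    # Eq. (A11): received power in dBm.
    p_r = p_t_dbm + g_t_dbi + g_r_dbi - l_fs - l_sys_db
    # Eq. (A12) as written, in dBm. Note: -174 dBm/Hz is itself the
    # thermal-noise density k*T evaluated at T = 290 K.
    noise = -174 + 10 * math.log10(t_n_k) + 10 * math.log10(w_mhz * 1e6)
    # Eq. (A13): SNR in dB converted to linear scale.
    snr = 10 ** ((p_r - noise) / 10)
    # Eq. (A14): Shannon capacity scaled by the protocol-overhead factor.
    return (w_mhz * 1e6) * math.log2(1 + snr) * (1 - delta)
```

With these defaults, the function evaluates to roughly 13.4 Mbps at 1000 km and roughly 1.6 Mbps at the 3000 km threshold; the result can be passed as `rate_bps` to a per-hop delay computation such as Eq. (A3).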

EdgeSpaceCom loads the hardware parameters from the corresponding entries in Table A1 and applies the above rate computation uniformly to all available links at each sampling instant.

Putting all steps together yields one simulation iteration. At runtime, the simulator samples at fixed intervals; at each sampling instant, it computes the inter-satellite topology and link rates, collects flow statistics, and derives the synchronization time of each satellite. This synchronization time serves as the reference for the communication overhead of the proposed DLRC.

Appendix B

Figure A2. End-to-end gain of DLRC at patch size $256 \times 256$ and $\lambda = 0.0483$ on four-Titan-V satellites. The curved segment follows $G_{down} = P_{D} / L_{B} - 1$; the horizontal segment follows $G_{down} = L_{D} / L_{B} - 1$.

Table A2. Angle ranges of the revisit images in the Satellogic subset of the EarthView dataset.

Parameter	Range (°)
Viewing Azimuth	96.92–282.26
Off-Nadir Angle	1.74–23.83
Sun Azimuth	148.22–213.57
Sun Elevation	18.33–57.27

References

Yang, J.; Gong, P.; Fu, R.; Zhang, M.; Chen, J.; Liang, S.; Xu, B.; Shi, J.; Dickinson, R. The role of satellite remote sensing in climate change studies. Nat. Clim. Change 2013, 3, 875–883.
Chen, C.; Dubovik, O.; Schuster, G.L.; Chin, M.; Henze, D.K.; Lapyonok, T.; Li, Z.; Derimian, Y.; Zhang, Y. Multi-angular polarimetric remote sensing to pinpoint global aerosol absorption and direct radiative forcing. Nat. Commun. 2022, 13, 7459.
Sarah, C.; Rochelle, S.; Johanna, N.; Michelle, H.; Sofia, F.; Ying, W.; Michael, R.; Kristin, A.; Jean-Philippe, A.; Mark, D.; et al. Earth observations for climate adaptation: Tracking progress towards the Global Goal on Adaptation through satellite-derived indicators. npj Clim. Atmos. Sci. 2025, 8, 359.
Adamiak, M.; Grinblat, Y.; Psotta, J.; Fulman, N.; Mazumdar, H.; Tang, S.; Zipf, A. Deep Learning Enhanced Road Traffic Analysis: Scalable Vehicle Detection and Velocity Estimation Using PlanetScope Imagery. arXiv 2024, arXiv:2410.14698.
Zhang, L.; Guo, H.; Liang, D.; Lv, Z.; Li, Z.; Geng, Y.; Liu, X.; Lv, M.; Dou, C. A study on detection of human activity using SDGSAT-1 glimmer imager data over urban agglomerations in China. Remote Sens. Environ. 2025, 328, 114886.
Xu, Y.; Gao, S.; Huang, Q.; Göçmen, A.; Zhu, Q.; Zhang, F. Predicting human mobility flows in cities using deep learning on satellite imagery. Nat. Commun. 2025, 16, 10372.
Zheng, Z.; Zhong, Y.; Wang, J.; Ma, A.; Zhang, L. Building damage assessment for rapid disaster response with a deep object-based semantic change detection framework: From natural disasters to man-made disasters. Remote Sens. Environ. 2021, 265, 112636.
Shastry, A.; Carter, E.; Coltin, B.; Sleeter, R.; McMichael, S.; Eggleston, J. Mapping floods from remote sensing data and quantifying the effects of surface obstruction by clouds and vegetation. Remote Sens. Environ. 2023, 291, 113556.
Eudaric, J.; Kreibich, H.; Camero, A.; Rafiezadeh Shahi, K.; Martinis, S.; Zhu, X.X. A satellite imagery-driven framework for rapid resource allocation in flood scenarios to enhance loss and damage fund effectiveness. Sci. Rep. 2024, 14, 19290.
Gomes, C.; Wittmann, I.; Robert, D.; Jakubik, J.; Reichelt, T.; Maurogiovanni, S.; Vinge, R.; Hurst, J.; Scheurer, E.; Sedona, R.; et al. Lossy neural compression for geospatial analytics: A review. IEEE Geosci. Remote Sens. Mag. 2025, 13, 97–135.
Tao, B.; Masood, M.; Gupta, I.; Vasisht, D. Transmitting, fast and slow: Scheduling satellite traffic through space and time. In Proceedings of the 29th Annual International Conference on Mobile Computing and Networking, Madrid, Spain, 2–6 October 2023; pp. 1–15.
Starlink. Starlink for Businesses. 2024. Available online: https://www.starlink.com/ca/business (accessed on 17 February 2026).
Tao, B.; Chabra, O.; Janveja, I.; Gupta, I.; Vasisht, D. Known knowns and unknowns: Near-realtime earth observation via query bifurcation in serval. In Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24), Santa Clara, CA, USA, 16–18 April 2024; pp. 809–824.
Denby, B.; Chintalapudi, K.; Chandra, R.; Lucia, B.; Noghabi, S. Kodan: Addressing the computational bottleneck in space. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Vancouver, BC, Canada, 25–29 March 2023; Volume 3, pp. 392–403.
Wallace, G.K. The JPEG still picture compression standard. Commun. ACM 1991, 34, 30–44.
Taubman, D.S.; Marcellin, M.W.; Rabbani, M. JPEG2000: Image compression fundamentals, standards and practice. J. Electron. Imaging 2002, 11, 286–287.
Hernández-Cabronero, M.; Kiely, A.B.; Klimesh, M.; Blanes, I.; Ligo, J.; Magli, E.; Serra-Sagrista, J. The ccsds 123.0-b-2 “low-complexity lossless and near-lossless multispectral and hyperspectral image compression” standard: A comprehensive review. IEEE Geosci. Remote Sens. Mag. 2021, 9, 102–119.
Ballé, J.; Minnen, D.; Singh, S.; Hwang, S.J.; Johnston, N. Variational image compression with a scale hyperprior. arXiv 2018, arXiv:1802.01436.
Minnen, D.; Ballé, J.; Toderici, G.D. Joint autoregressive and hierarchical priors for learned image compression. In Proceedings of the Advances in Neural Information Processing Systems, Montréal, QC, Canada, 3–8 December 2018; Volume 31.
Cheng, Z.; Sun, H.; Takeuchi, M.; Katto, J. Learned image compression with discretized gaussian mixture likelihoods and attention modules. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 7939–7948.
He, D.; Yang, Z.; Peng, W.; Ma, R.; Qin, H.; Wang, Y. Elic: Efficient learned image compression with unevenly grouped space-channel contextual adaptive coding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5718–5727.
Liu, J.; Sun, H.; Katto, J. Learned image compression with mixed transformer-cnn architectures. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 14388–14397.
Li, Y.; Zhang, H.; Li, L.; Liu, D.; Wu, F. Scaling Learned Image Compression Models up to 1 Billion. arXiv 2025, arXiv:2508.09075.
Satellogic. We Reinvented the Satellite from the Ground Up. 2025. Available online: https://satellogic.com/technology/satellites (accessed on 17 February 2026).
Zhang, L.; Hu, X.; Pan, T.; Zhang, L. Global priors with anchored-stripe attention and multiscale convolution for remote sensing image compression. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 17, 138–149.
Xiang, S.; Liang, Q. Remote sensing image compression based on high-frequency and low-frequency components. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–15.
Guerrisi, G.; Del Frate, F.; Schiavon, G. Artificial intelligence based on-board image compression for the φ-sat-2 mission. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 8063–8075.
Zhang, Z.; Qiu, H.; Zhang, M.; Liu, J.; Chen, B.; Zhang, T.; Li, H. Cosmic: Compress satellite images efficiently via diffusion compensation. In Proceedings of the NeurIPS, Vancouver, BC, Canada, 10–15 December 2024; pp. 10–15.
Furutanpey, A.; Zhang, Q.; Raith, P.; Pfandzelter, T.; Wang, S.; Dustdar, S. Fool: Addressing the downlink bottleneck in satellite computing with neural feature compression. IEEE Trans. Mob. Comput. 2025, 24, 6747–6764.
Du, K.; Cheng, Y.; Olsen, P.; Noghabi, S.; Jiang, J. Earth+: On-board satellite imagery compression leveraging historical earth observations. In Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Rotterdam, The Netherlands, 30 March–3 April 2025; Volume 1, pp. 361–376.
Sun, C.; Zhang, Y.; Tao, B.; Vasisht, D.; Marina, M. DeepSpace: Super Resolution Powered Efficient and Reliable Satellite Image Data Acquistion. In Proceedings of the ACM SIGCOMM 2025 Conference, Coimbra, Portugal, 8–11 September 2025; pp. 311–328.
Boukabara, S.A.; Eyre, J.; Anthes, R.A.; Holmlund, K.; Germain, K.M.S.; Hoffman, R.N. The Earth-Observing Satellite Constellation: A review from a meteorological perspective of a complex, interconnected global system with extensive applications. IEEE Geosci. Remote Sens. Mag. 2021, 9, 26–42.
Aati, S.; Avouac, J.P.; Rupnik, E.; Deseilligny, M.P. Potential and limitation of planetscope images for 2-D and 3-D earth surface monitoring with example of applications to glaciers and earthquakes. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–19.
Roy, D.P.; Huang, H.; Houborg, R.; Martins, V.S. A global analysis of the temporal availability of PlanetScope high spatial resolution multi-spectral imagery. Remote Sens. Environ. 2021, 264, 112586.
Liu, Y.; Jin, H.; Yao, Y.W.; Chen, Y.; Zhao, Y.; Kong, L.; Li, R.; Liu, X.; Chen, G. Distributed On-Orbit Sparse Coding for Efficient Space Situational Awareness Image Transmission. In Proceedings of the IEEE INFOCOM 2025-IEEE Conference on Computer Communications, London, UK, 19–22 May 2025; pp. 1–10.
Clauson, J.; Cantrell, S.; Vrabel, J.; Oeding, J.; Ranjitkar, B.; Rusten, T.; Ramaseri, S.; Casey, K. Earth Observing Satellites Online Compendium; US Geological Survey: Reston, VA, USA, 2024; Volume 1.
Li, Y.; Wang, L.; Wang, T.; Yang, X.; Luo, J.; Wang, Q.; Deng, Y.; Wang, W.; Sun, X.; Li, H.; et al. STAR: A first-ever dataset and a large-scale benchmark for scene graph generation in large-size satellite imagery. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 1832–1849.
y Arcas, B.A.; Beals, T.; Biggs, M.; Bloom, J.V.; Fischbacher, T.; Gromov, K.; Köster, U.; Pravahan, R.; Manyika, J. Towards a future space-based, highly scalable AI infrastructure system design. arXiv 2025, arXiv:2511.19468.
exadevice. Aerospace Storage System. 2025. Available online: https://www.exadevice.com/as3 (accessed on 17 February 2026).
Zhao, C.; Pan, J.; Sun, H.; Li, X.; Xu, K.; Zhao, Y.; Zhang, L. Reliability Case Study of COTS Storage on the Jilin-1 KF Satellite: On-Board Operations, Failure Analysis, and Closed-Loop Management. Aerospace 2026, 13, 116.
Huang, D.; Li, X.; Zheng, X.; Wei, W.; Zhang, Q.; Guo, F. Evaluation and application of on-orbit calibration of the automated vicarious calibration system. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–15.
Chen, X.; Xing, F.; You, Z.; Zhong, X.; Qi, K. On-orbit high-accuracy geometric calibration for remote sensing camera based on star sources observation. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–11.
Amazon. Amazon’s Project Kuiper Completes Successful Tests of Optical Mesh Network in Low Earth Orbit. 2023. Available online: https://www.aboutamazon.com/news/innovation-at-amazon/amazon-project-kuiper-oisl-space-laser-december-2023-update (accessed on 17 February 2026).
Witten, I.H.; Neal, R.M.; Cleary, J.G. Arithmetic coding for data compression. Commun. ACM 1987, 30, 520–540.
Wang, Z.; Simoncelli, E.P.; Bovik, A.C. Multiscale structural similarity for image quality assessment. In Proceedings of the Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, Pacific Grove, CA, USA, 9–12 November 2003; Volume 2, pp. 1398–1402.
Niu, X.; Tang, H.; Wu, L. Satellite scheduling of large areal tasks for rapid response to natural disaster using a multi-objective genetic algorithm. Int. J. Disaster Risk Reduct. 2018, 28, 813–825.
Indyk, P.; Motwani, R. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, Dallas, TX, USA, 24–26 May 1998; pp. 604–613.
Andoni, A.; Indyk, P.; Laarhoven, T.; Razenshteyn, I.; Schmidt, L. Practical and optimal LSH for angular distance. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; Volume 28.
Vasisht, D.; Shenoy, J.; Chandra, R. L2D2: Low latency distributed downlink for LEO satellites. In Proceedings of the 2021 ACM SIGCOMM 2021 Conference, Virtual Event, USA, 23–27 August 2021; pp. 151–164.
Li, Q.; Wang, S.; Ma, X.; Zhou, A.; Wang, Y.; Huang, G.; Liu, X. Battery-aware energy optimization for satellite edge computing. IEEE Trans. Serv. Comput. 2024, 17, 437–451.
Pavlov, I. LZMA SDK (Software Development Kit). 2026. Available online: https://www.7-zip.org/sdk.html (accessed on 17 February 2026).
Bégaint, J.; Racapé, F.; Feltman, S.; Pushparaja, A. Compressai: A pytorch library and evaluation platform for end-to-end compression research. arXiv 2020, arXiv:2011.03029.
Velazquez, D.; Rodriguez, P.; Alonso, S.; Gonfaus, J.M.; Gonzalez, J.; Richarte, G.; Marin, J.; Bengio, Y.; Lacoste, A. EarthView: A large scale remote sensing dataset for self-supervision. In Proceedings of the Winter Conference on Applications of Computer Vision, Tucson, AZ, USA, 28 February–4 March 2025; pp. 1228–1237.
Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
Hore, A.; Ziou, D. Image quality metrics: PSNR vs. SSIM. In Proceedings of the 2010 20th International Conference on Pattern Recognition, Istanbul, Turkey, 23–26 August 2010; pp. 2366–2369.
Walker, J.G. Satellite constellations. J. Br. Interplanet. Soc. 1984, 37, 559.
Huang, B.; Reichman, D.; Collins, L.M.; Bradbury, K.; Malof, J.M. Tiling and stitching segmentation output for remote sensing: Basic challenges and recommendations. arXiv 2018, arXiv:1805.12219.

Figure 1. Schematic of the satellite image downlink mission. The sequential process includes satellite-based EO image capture, on-board encoding, data transmission via the ISL and the satellite-to-ground link, and decoding at the GS.

Figure 2. Pipeline of LIC with joint hyperprior and context entropy modeling. The right side illustrates the main LIC workflow, from raw pixels to the compressed bitstream, while the left side shows the hyperprior and context branches. The hyperprior metadata is encapsulated into the final bitstream along with the latents.

Figure 3. Redundancy of latent representation in multi-satellite observations. (a) without rotation; (b) with rotation. In each row, the left panel shows two EO images aligned by geographical coordinates, and the right panel shows the cosine similarity map of the corresponding latent representations in the overlapping region.

Figure 4. Architecture of DLRC. The gray-shaded region corresponds to the baseline LIC pipeline, while the blue-shaded region and the Filter correspond to our proposed DLRC. The right portion of the figure denotes the data exchange with other satellites. Arabic numerals indicate signature values, while circled numerals denote entries in the assignment map.

Figure 5. Workflow of latent representation decoding and reconstruction. In each step from left to right, the top row shows the current assignment map, where red entries indicate positions already used; the middle row indicates the source of the latent vectors; and the bottom row presents the reconstruction progress of the latent representation.

Figure 6. R–D performance comparison between the original baseline models and those integrated with DLRC on the STAR test set: (a) PSNR; (b) MS-SSIM. Each data point represents the average performance across all devices.

Figure 7. Bpp breakdown on the STAR test set for the original baseline models and the versions integrated with DLRC: (a) $o_{\exp} = 2.0$; (b) $o_{\exp} = 0.5$. The value above each bar indicates the total bpp.

Figure 8. Visual comparison on representative samples from the STAR validation set: (a) lower bpp level; (b) higher bpp level. The reconstruction quality information is reported in the format: bpp/PSNR (dB)/MS-SSIM.

Figure 9. Computational overhead of the baseline models and DLRC under increasing patch sizes: (a) computation time; (b) memory usage. ELIC remains a fixed model setting across all $\lambda$ values.

Figure 10. Communication overhead of DLRC under different patch sizes obtained from the simulator. Mean and maximum values across the 8 satellites are reported.

Figure 11. R–D performance comparison between ELIC and the version integrated with DLRC on the EarthView dataset.

Figure 12. Ablation studies of DLRC based on ELIC, reporting R–D performance and computation time. Here, k denotes the number of hash functions in the Recognizer; Original Rep. and Mean Rep. denote that the Selector uses the original latent vector and the mean vector as the representative, respectively.

Table 1. mAP for the STAR detection task on the validation set. The Original column gives the absolute mAP; all other values are the absolute increase or decrease in percentage points ($\Delta$mAP) relative to the Original.

Original	BMSHJ	BMSHJ + DLRC	MBT	MBT + DLRC	Cheng	Cheng + DLRC	ELIC	ELIC + DLRC
42.49%	-0.48 pp	-0.8 pp	-0.83 pp	+0.68 pp	+0.86 pp	+1.19 pp	+2.83 pp	+2.21 pp

Table 2. Computational overhead breakdown of different baseline models and DLRC. The total LIC time is further decomposed into analysis transform, entropy modeling, and entropy coding.

Baseline	LIC Time (ms)	LIC Mem (MiB)	DLRC Time (ms)	DLRC Mem (MiB)	Analysis Transform (ms)	Entropy Modeling (ms)	Entropy Coding (ms)
BMSHJ	24.68	449	+12.89	+2	2.31	7.85	14.52
MBT	24.71	469	+12.89	+2	2.39	7.98	14.34
Cheng	985.83	493	+12.89	+2	7.41	963.95	14.48
ELIC	274.07	791	+12.89	+2	8.31	251.45	14.31
COSMIC	25.75	453	+12.89	+2	3.81	7.81	14.13

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
