1. Introduction
Remote sensing (RS) plays an increasingly important role in earth observation (EO) with the help of multimodal sensors mounted on satellites or unmanned aerial vehicles (UAVs). By capturing all-weather, round-the-clock data across various electromagnetic spectra, RS provides critical insights for many applications, e.g., environmental monitoring, urban planning, land use mapping, disaster response, and climate change analysis [1,2,3].
The traditional downlink-all strategy, which transmits all raw data from orbit to ground stations for processing, faces severe bandwidth and latency constraints due to the huge data volumes generated by modern high-resolution sensors [4]. Beyond wasting transmission resources, this limitation also hinders real-time decision making and timely responses in critical applications such as disaster management. To address these challenges, there is a growing emphasis on onboard techniques, where data is analyzed directly on the sensing platforms (e.g., satellites and UAVs) before transmission. Onboard processing reduces reliance on ground stations, minimizes data transmission costs, enhances data privacy, and enables real-time analytics in communication-denied environments [5].
To enable this onboard intelligence, numerous near-real-time, lightweight deep learning algorithms have been developed for RS applications [6,7]. Although these methods are computationally efficient and suitable for resource-constrained edge devices, they often lack the generalization capabilities required to handle the diverse and complex conditions of RS data. Specifically, small networks trained on limited datasets are task-specific and struggle to adapt to new scenarios and sensor modalities without extensive retraining and redeployment, limiting the autonomy and flexibility of onboard computing.
More recently, the emergence of foundation models (FMs), large-scale pretrained models capable of generalizing across diverse tasks, has revolutionized the field of artificial intelligence (AI) [8]. Compared to lightweight networks, FMs have demonstrated superior performance across various domains, owing to their massive parameter counts in line with scaling laws [9]. Furthermore, they are more flexible for onboard deployment: after pretraining, FMs can be adapted to a wide range of downstream tasks via parameter-efficient fine-tuning on minimal task-specific data [10], significantly reducing the number of parameters that must be updated and uplinked. The RS community has also witnessed the development of large-scale remote sensing foundation models (RSFMs) [11,12,13] tailored for applications such as scene classification, semantic segmentation, target detection, and change detection. These models leverage vast amounts of multimodal RS data to learn rich representations that capture the complex spatial, spectral, and temporal patterns inherent in EO imagery.
However, deploying RSFMs on resource-constrained onboard devices such as satellites remains a significant challenge due to their substantial compute, memory, and energy requirements [4]. The gap between model complexity and device capability hinders the practical adoption of RSFMs for real-time onboard analytics. To bridge this gap, various model compression and optimization techniques, including quantization [14,15], pruning [16,17], and knowledge distillation [18,19,20], can be applied to reduce the size and computational demands of RSFMs. Furthermore, advancements in specialized hardware accelerators [21] and edge computing architectures [22] are essential to support efficient execution.
Although onboard RSFMs show great potential over traditional AI in both performance and flexibility, research on them remains largely unexplored. Regarding existing reviews, on the one hand, several surveys have examined onboard techniques for RS [4,5,6,21,23]; they primarily focus on the development of lightweight AI algorithms without extending to onboard FMs. On the other hand, many reviews have summarized advancements in RSFMs and highlighted the capabilities of FMs in handling diverse RS tasks [11,12,13,24], but they do not delve into the unique considerations and solutions required for deploying these models on edge devices in real-world applications.
To this end, this article presents, for the first time, a comprehensive review of the deployment of RSFMs on resource-constrained onboard devices, aiming to fill the gap between existing research on onboard processing techniques and advancements in RSFMs. Specifically, a promising pipeline covering hardware analysis, RSFM development, and model compression techniques is introduced. The remainder of this review is organized as follows. Section 2 briefly surveys the background of onboard techniques and FMs for RS. Section 3 discusses promising pipelines for the deployment of onboard RSFMs and the available hardware platforms and products. Section 4 introduces the frameworks and datasets of existing RSFMs. Section 5 reviews model compression and acceleration methods for deploying RSFMs on resource-constrained devices. Section 6 discusses the challenges and opportunities of onboard RSFMs as future research directions.
4. Remote Sensing Foundation Models (RSFMs)
The current literature typically categorizes RSFMs into two primary streams based on the modalities involved: vision–language models (VLMs) and vision foundation models (VFMs) [11]. In this review, we focus on VFMs, as they are most relevant to the deployment of onboard RSFMs. Table 5 gives an overview of existing VFMs in the field of RS, including their backbone, pretraining strategy, data modality, and model size.
4.1. Architecture and Pretraining Strategy
The rapid evolution of VFMs in RS has been marked by a transition from convolutional neural networks (CNNs) to vision transformers as the dominant backbone. Early efforts primarily utilized CNN-based architectures such as ResNet [66] to extract hierarchical features. However, the field has largely shifted towards transformer-based backbones such as the basic vision transformer (ViT) [67] and Swin transformer (SwinT) [68], due to their superior ability to model long-range dependencies and global context in complex, multi-scale geospatial data. Some methods also combine the strengths of both CNNs and ViTs to further enhance feature extraction capability and efficiency.
Pretraining paradigms fall into supervised learning and self-supervised learning. However, supervised pretraining requires large-scale annotated datasets [128,133,144], which are hard to obtain in RS due to the high cost of expert labeling. To leverage the massive volumes of unlabeled Earth observation data, self-supervised learning, typically involving contrastive learning and masked image modeling (MIM), as shown in Figure 2, has become the predominant pretraining strategy for VFMs.
4.1.1. Contrastive Learning
Contrastive learning trains representations by pulling together the embeddings of related views and pushing apart those of unrelated views without supervision. In practice, this is implemented by forming positive pairs (different augmentations, temporal revisits, or cross-modal pairs) and contrasting them against many negatives using an objective such as InfoNCE [69]. Building on this foundation, MoCo [70] introduces a momentum encoder and a dynamic queue to efficiently maintain a large dictionary of negatives, while SimCLR [71] demonstrates the importance of large-batch training and strong augmentation pipelines for contrastive success.
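As an illustration, the following is a minimal PyTorch sketch of the InfoNCE objective, assuming two batches of projected embeddings from different augmented views of the same images; the function name and temperature value are illustrative rather than taken from any specific paper's implementation.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.07):
    # Each embedding in z1 is attracted to its positive counterpart in z2
    # and repelled from all other samples in the batch (the negatives).
    z1 = F.normalize(z1, dim=1)          # (N, D) view-1 embeddings
    z2 = F.normalize(z2, dim=1)          # (N, D) view-2 embeddings
    logits = z1 @ z2.t() / temperature   # (N, N) scaled cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)  # positives on the diagonal
    return F.cross_entropy(logits, targets)

# e.g., 256 image pairs with 128-dimensional projections
loss = info_nce_loss(torch.randn(256, 128), torch.randn(256, 128))
```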
In the context of RS, research utilizing contrastive learning focuses on exploiting the unique meta-information inherent in RS data, such as temporal timestamps and geolocation. SeCo [122] pioneers the use of temporal invariance, treating images of the same location captured at different times (seasons) as positive pairs to learn representations that are invariant to seasonal changes but sensitive to semantic content. Extending this concept to the spatial domain, GASSL [123] and CSP [126] integrate geospatial coordinates directly into the pretraining objective. Specifically, CSP employs a dual-encoder architecture to separately encode images and their geolocations, aligning visual features with the corresponding geolocation embeddings to improve performance. MATTER [124] introduces a method centered on material and textural consistency, leveraging multi-temporal alignment to learn representations invariant to illumination and viewing angles. SkySense [139] and its successor SkySense V2 [147] employ massive SwinT backbones combined with ViT, with up to billions of parameters; V2 introduces a mixture-of-experts (MoE) strategy to further scale model capacity while maintaining computational efficiency during inference.
4.1.2. Masked Image Modeling
Masked image modeling (MIM) learns representations by masking parts of the input and training the model to reconstruct the missing content or its discrete proxy, encouraging the encoder to capture both local texture and broader context required for reconstruction. In computer vision, two complementary flavors emerge: the first is MAE-based pixel reconstruction [
73], which masks many image patches and reconstructs raw pixels with an asymmetric encoder–decoder. The second is token-based MIM introduced in BEiT [
149], which tokenizes images into discrete visual tokens and predicts token IDs for masked locations. A simpler variant, SimMIM [
72], demonstrates that straightforward random masking plus pixel reconstruction can be highly effective.
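To make the mechanism concrete, below is a minimal sketch of MAE-style per-sample random masking, assuming patch embeddings of shape (batch, tokens, dim); the 75% mask ratio follows common MAE practice, while the function itself is illustrative.

```python
import torch

def random_masking(patches, mask_ratio=0.75):
    # patches: (N, L, D) patch embeddings; keep a random subset per sample.
    N, L, D = patches.shape
    len_keep = int(L * (1 - mask_ratio))
    noise = torch.rand(N, L, device=patches.device)   # random score per token
    ids_shuffle = torch.argsort(noise, dim=1)         # random permutation
    ids_keep = ids_shuffle[:, :len_keep]
    visible = torch.gather(patches, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
    mask = torch.ones(N, L, device=patches.device)    # 1 = masked, 0 = visible
    mask.scatter_(1, ids_keep, 0.0)
    return visible, mask  # encoder sees only `visible`; decoder reconstructs the rest

# e.g., a ViT-B/16 on 224x224 inputs yields 196 tokens of width 768
visible, mask = random_masking(torch.randn(8, 196, 768))
```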
MIM methods are well suited to RS because they force models to capture fine-grained spatial and spectral details. For instance, SatMAE [125] adapts the MAE framework to multispectral RS images, introducing temporal and spectral positional embeddings that allow the model to independently mask and reconstruct patches across time series and multispectral bands. 3DMAE [150] employs a novel 3D vertical masking strategy to effectively capture inter- and intra-modality correlations between paired SAR and optical images. To address the significant resolution variations in RS imagery, Scale-MAE [129] incorporates a ground sample distance (GSD)-based positional encoding, combined with a Laplacian pyramid decoder, to explicitly learn scale-invariant representations. RingMo [132] proposes a specialized masking strategy that preserves the unmasked tokens of small targets, preventing them from being lost during random masking. Similarly, MA3E [136] introduces angle-aware embeddings into the reconstruction objectives, forcing the model to learn the rotational invariance crucial for oriented object detection. To extend MIM to multispectral data, S2MAE [141] utilizes 3D masking to capture continuous spectral signatures in hyperspectral data, whereas HyperSIGMA [143] applies a specialized MIM strategy to reduce the high dimensionality of hyperspectral imagery. RobSense [146] leverages MIM to reconstruct missing modalities, thereby enhancing robustness against incomplete data inputs.
4.1.3. Hybrid Strategy
An increasing number of VFMs combine contrastive and MIM objectives to exploit both invariance learning and generative understanding. CMID [
134] unifies contrastive and MIM tasks under a single architecture, introducing cross-view interactions that enhance spatial–spectral coherence. Cross-Scale MAE [
131] utilizes scale augmentation to enforce consistency between synthesized multi-scale views of the same input using both contrastive and generative losses. OmniSat [
135] focuses on modality fusion, exploiting the precise spatial alignment of different sensors to learn joint embeddings that remain effective even when specific modalities are missing during inference. AnySat [
145] utilizes a joint embedding predictive architecture (JEPA) with scale-adaptive encoders to predict latent representations rather than raw pixels, facilitating scalable learning across resolutions.
4.2. Data Modality
Different from natural images, RS data span various modalities, as shown in Figure 3, including high-resolution optical images (RGB), multispectral imagery (MSI), synthetic aperture radar (SAR), hyperspectral imagery (HSI), the digital elevation model (DEM), etc. While the majority of existing VFMs focus on RGB images due to the abundance of available data, some methods address the unique physical challenges of other modalities. For instance, SARATR-X [142] represents a pioneering effort in SAR-specific FMs, incorporating speckle noise modeling into MIM pretraining to enhance robustness against SAR artifacts. The fusion of optical and SAR imagery is also surveyed by Zhang et al. [2]. HyperSIGMA [143] tackles the challenge of extreme spectral redundancy across hundreds of contiguous bands by utilizing a sparse sampling attention (SSA) mechanism, allowing the model to scale to over one billion parameters while effectively modeling the intricate spectral correlations that standard ViTs miss.
To handle multimodal diversity, recent advancements have shifted toward unified architectures capable of fusing diverse RS modalities to improve robustness against the extreme domain gaps between them. For example, SkySense [139] and SkySense V2 [147] integrate RGB, MSI, and SAR data during pretraining, leveraging their complementary information to learn more holistic representations. msGFM [140] focuses on self-supervised fusion by exploiting the spatial alignment of different sensors and using sensor-specific embeddings to bridge the domain gap between optical, SAR, and DEM data. AnySat [145] introduces a JEPA equipped with scale-adaptive encoders, allowing the model to map data of any resolution, scale, or modality into a common latent space and to predict representations rather than raw pixels. TerraMind [148] pushes the generative frontier with an any-to-any framework based on a symmetric transformer. It introduces “thinking-in-modalities”, a mechanism that generates synthetic intermediate modalities (e.g., creating a SAR image from an optical input) to enhance performance on downstream tasks. CDPrompt [151] utilizes SAM with a lightweight domain tuning module and automatically generated in-domain prompts derived from SAR to enable robust multimodal change detection in missing-modality scenarios.
To capture the dynamic evolution of the Earth’s surface, many models also integrate explicit mechanisms for time series processing. SatMAE [125] adapts the MAE framework to temporal data by introducing temporal positional encodings and employing an independent masking strategy across time steps, forcing the model to reconstruct distinct temporal states. SpectralGPT [138] utilizes a 3D generative pretrained transformer architecture that treats sequential data as a continuous stream of tokens, modeling the 3D dependencies inherent in volumetric data. AnySat [145] handles temporal dynamics by defining input patches as 3D tensors (height × width × time) within its scale-adaptive encoder, allowing it to process varying sequence lengths naturally.
4.3. Parameter-Efficient Fine-Tuning (PEFT)
The adaptation of pretrained VFMs to downstream applications relies on rigorous fine-tuning methodologies designed to transfer knowledge from generalized pretext tasks to specific EO objectives. Historically, the standard adaptation protocol has been full fine-tuning (FFT), wherein all parameters of the pretrained backbone are updated via backpropagation using task-specific loss functions. However, this is not suitable for deployment onboard resource-constrained satellites. For a multi-billion-parameter model, FFT requires high-end GPU clusters with massive RAM capacity to store gradient states and optimizer moments. Furthermore, FFT creates a separate, full-sized copy of the model for each downstream task. In a practical remote sensing workflow, where a single satellite operator might require distinct models for different tasks, the storage requirements of FFT become prohibitive. Additionally, FFT on small, specialized remote sensing datasets carries a high risk of catastrophic forgetting, where the model overfits to the narrow downstream distribution and loses the generalizable feature representations acquired during pretraining. Linear probing offers a more efficient alternative by freezing the backbone and training only a lightweight task-specific head on top of the features extracted by the pretrained model. However, it may under-utilize the rich representations learned during pretraining.
To address these challenges, parameter-efficient fine-tuning (PEFT) techniques have emerged as a dominant frontier. PEFT freezes the vast majority of the pretrained weights and updates only a minimal subset of learnable parameters (often fewer than 1%) to reach performance comparable to FFT. Common PEFT methods include adapter tuning, prompt tuning, and reparameterization tuning, as shown in Figure 4.
Adapter tuning is arguably the most versatile PEFT approach, introducing small bottleneck modules (adapters) into each layer of the pretrained model. During fine-tuning, only the adapter parameters are updated, while the original model weights remain frozen. This modular design allows easy integration into existing architectures and can be tailored to different tasks by varying the adapter size and placement. In RS applications, to address the limitations of adapters in dense prediction tasks, UPetu [75] proposes a unified approach specifically for RSFMs, arguing that standard adapters are designed for classification and lack the spatial sensitivity required for pixel-level tasks. It integrates two complementary modules, i.e., an efficient quantization adapter module and a context-aware prompt module, to strengthen the correlation between fine-grained feature information and task-specific knowledge.
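A minimal sketch of a generic bottleneck adapter inserted residually after a frozen sub-layer is shown below; the dimensions and the zero-initialized up-projection (which makes the adapter start as an identity mapping) are common conventions, not the specific UPetu design.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    # Bottleneck adapter: down-project, non-linearity, up-project, residual add.
    def __init__(self, dim=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)  # adapter starts as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))

adapter = Adapter()
y = adapter(torch.randn(4, 196, 768))
# During fine-tuning, the backbone is frozen and only adapters are trained:
# for p in backbone.parameters(): p.requires_grad = False
# for p in adapter.parameters(): p.requires_grad = True
```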
Prompt tuning draws inspiration from natural language processing, where task-specific prompts are prepended to the input data to steer the model’s attention during inference. In vision transformers, this involves learning a set of prompt tokens that are concatenated with the input patch embeddings. During fine-tuning, only these prompt tokens are updated, allowing the model to adapt to new tasks with minimal parameter changes. Some recent works explore prompt tuning for adapting the segment anything model (SAM) [65] to RS tasks [76,77].
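The following sketch illustrates this idea for vision transformers: learnable tokens are concatenated with the frozen patch embeddings, and only the prompts receive gradients. The class and parameter names are illustrative assumptions in the style of visual prompt tuning.

```python
import torch
import torch.nn as nn

class PromptTokens(nn.Module):
    # Learnable prompt tokens prepended to the patch embeddings of a frozen ViT.
    def __init__(self, num_prompts=10, dim=768):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(1, num_prompts, dim) * 0.02)

    def forward(self, patch_embeddings):              # (N, L, D)
        n = patch_embeddings.size(0)
        prompts = self.prompts.expand(n, -1, -1)      # broadcast over the batch
        return torch.cat([prompts, patch_embeddings], dim=1)  # (N, P+L, D)

tokens = PromptTokens()(torch.randn(4, 196, 768))     # -> (4, 206, 768)
```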
A typical approach to reparameterization tuning is low-rank adaptation (LoRA) [56]. The efficacy of LoRA relies on the intrinsic dimension hypothesis, which posits that the optimal parameters for a specific downstream task reside in a low-dimensional subspace of the high-dimensional parameter space of the pretrained model. Instead of updating the full weight matrix, LoRA optimizes a low-rank approximation of the update, significantly reducing memory requirements without introducing inference latency.
The structure of LoRA is illustrated in Figure 4c. In a standard neural network layer, the forward pass is defined as $h = W_0 x$, where $x$ is the input and $W_0 \in \mathbb{R}^{d \times k}$ is the frozen pretrained weight. LoRA decomposes the weight update $\Delta W$ into two smaller matrices $A$ and $B$ as follows:
$$h = W_0 x + \Delta W x = W_0 x + B A x,$$
where $\Delta W = B A$ is a parallel path for the weight update, and $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$, with $r \ll \min(d, k)$, are the trainable low-rank matrices. During fine-tuning, only $A$ and $B$ are updated, while $W_0$ remains frozen. To ensure that training begins exactly at the pretrained state (i.e., $\Delta W = 0$ at step zero), $A$ is initialized with Gaussian noise $\mathcal{N}(0, \sigma^2)$ and $B$ is initialized to zero. In this way, the total number of trainable parameters is reduced from $d \times k$ to $r \times (d + k)$, leading to significant memory savings.
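A minimal PyTorch sketch of a LoRA-augmented linear layer implementing the formulation above is given below; the rank, initialization scale, and α/r scaling factor are illustrative hyperparameter choices following common LoRA conventions.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Frozen pretrained linear layer W0 plus a trainable low-rank update BA.
    def __init__(self, base: nn.Linear, rank=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                         # freeze W0
        d, k = base.out_features, base.in_features
        self.A = nn.Parameter(torch.randn(rank, k) * 0.01)  # Gaussian init
        self.B = nn.Parameter(torch.zeros(d, rank))         # zero init => dW = 0
        self.scaling = alpha / rank

    def forward(self, x):
        # h = W0 x + B A x (the low-rank path runs in parallel)
        return self.base(x) + (x @ self.A.t() @ self.B.t()) * self.scaling

layer = LoRALinear(nn.Linear(768, 768), rank=8)
y = layer(torch.randn(4, 768))   # only A and B train (~2% of W0's parameters here)
```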
LoRA has been successfully applied to various VFMs in RS, demonstrating its effectiveness in adapting large models to specific EO tasks with minimal computational overhead. Some LoRA-based strategies have also been proposed to further address the unique challenges of RS. For instance, directly adapting a model trained on RGB or MSI to HSI (which has hundreds of spectral bands) requires handling 3D data cubes, where spectral correlation is as important as spatial correlation. Standard LoRA, which operates on 2D matrices, may fail to capture these tensor interactions efficiently. To this end, Ligan et al. [74] apply Kronecker product adaptation (KronA) for fine-tuning SpectralGPT [138] for HSI classification, which is superior to standard LoRA and achieves accuracy competitive with FFT while updating only 0.056% of the parameters.
Table 6 summarizes the comparative performance of typical PEFT methods applied to RS. Reparameterization tuning (e.g., LoRA and its variants) requires the fewest trainable parameters and introduces no additional inference latency, since it only adds a parallel low-rank path during training. Moreover, it maintains the original compute graph and memory access patterns, making it hardware-friendly for onboard applications where real-time processing is critical. Adapter tuning introduces moderate storage overhead due to the additional adapter layers, which require further optimization such as quantization for efficient deployment.
4.4. Dataset
The success of VFMs heavily relies on large-scale and diverse pretraining datasets that capture the complex variability of the Earth’s surface. Early pretraining efforts largely relied on MillionAID [152] and fMoW [153]. MillionAID is a large-scale benchmark containing over one million RGB images across 51 categories, primarily designed for scene classification. fMoW contains temporal sequences of high-resolution MSI imagery with metadata (time, location, and angles), enabling models to learn temporal dynamics beyond mere static appearance. However, these annotated datasets are limited in scale and diversity compared to the massive corpora used for natural image FMs. Recent state-of-the-art VFMs instead collect massive unlabeled multimodal datasets from various satellite platforms (e.g., Sentinel-1, Sentinel-2, Landsat, ALOS-2, NAIP, etc.) to fully leverage self-supervised pretraining, as summarized in Table 7.
After pretraining, the utility of RS VFMs is rigorously evaluated through their transferability to downstream EO tasks. Following the pretrain-then-fine-tune paradigm, they leverage generalized representations learned from massive unlabeled corpora to achieve state-of-the-art performance on specific target tasks. Typical downstream tasks and their benchmarks include scene classification (UC Merced Land Use (UCM) [154], EuroSAT [155], BigEarthNet (BEN) [156], and NWPU-RESISC45 [157]), semantic segmentation (Potsdam and Vaihingen [158], iSAID [159]), change detection (OSCD [160], LEVIR-CD [161]), and object detection (DOTA [162], DIOR [163]).
5. Model Optimization and Compression
To enable the deployment of large-scale RSFMs onboard resource-constrained satellites, model optimization and compression techniques are essential to reduce model size and computational requirements while maintaining performance. Key strategies include quantization, pruning, and knowledge distillation, as illustrated in Figure 5. The applications of these compression techniques to RS models are also discussed in this section.
5.1. Quantization
Quantization reduces the precision of model weights and activations from high-precision formats (e.g., 32-bit floating point) to lower-precision formats (e.g., 8-bit integer), significantly reducing memory footprint and computational load. In addition, some hardware accelerators only support specific precision formats. The implementation of quantization generally falls into post-training quantization (PTQ) and quantization-aware training (QAT): PTQ applies quantization after model training, while QAT incorporates quantization effects during training to mitigate accuracy loss.
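As a concrete example, the following sketch shows uniform affine INT8 quantization with a scale and zero-point derived from simple min/max calibration; the function names are illustrative.

```python
import torch

def quantize_int8(x, scale, zero_point):
    # Affine mapping to the integer grid: q = round(x / scale) + zero_point.
    q = torch.round(x / scale) + zero_point
    return q.clamp(-128, 127).to(torch.int8)

def dequantize(q, scale, zero_point):
    # Approximate reconstruction: x_hat = (q - zero_point) * scale.
    return (q.to(torch.float32) - zero_point) * scale

w = torch.randn(64, 64)
scale = (w.max() - w.min()) / 255.0              # min/max calibration
zero_point = torch.round(-w.min() / scale) - 128
q = quantize_int8(w, scale, zero_point)
w_hat = dequantize(q, scale, zero_point)         # w - w_hat is the quantization error
```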
5.1.1. Post-Training Quantization (PTQ)
PTQ is a straightforward approach that quantizes a pretrained model without further retraining, which is computationally efficient and easy to implement. However, PTQ may lead to significant accuracy degradation if the model’s weight distribution is susceptible to quantization noise. This risk is elevated in RSFMs where subtle spectral distinctions (e.g., distinguishing between stressed and healthy vegetation or different mineral types) rely on high-precision feature representations in the deeper layers of the network. To address this challenge, advanced PTQ methods mainly focus on preserving relevant information in weights and activations during quantization, which can be divided into calibration-only and reconstruction-based techniques.
Calibration-only PTQ estimates quantization parameters (e.g., range, scale, and zero-point) via simple statistics or analytic rules on a small calibration set. Simple analytic corrections are applied, e.g., min/max or percentile clipping for activations, per-channel or per-tensor scale choices, and symmetric/asymmetric zero-points. For instance, to address scenarios where training data is unavailable, Nagel et al. [166] propose a data-free method that utilizes weight equalization and bias correction to mitigate quantization errors directly from the pretrained model parameters. ZeroQ [167] constructs a synthetic calibration dataset by optimizing input noise to match the batch normalization statistics of the original full-precision network. Building upon synthetic data generation, GDFQ [168] employs a generative adversarial network (GAN) to produce diverse, distribution-matching samples that facilitate knowledge distillation for high-precision quantization.
Reconstruction-based PTQ further refines the quantized weights by minimizing the reconstruction error between the outputs of the full-precision and quantized models on a calibration set; such methods are compute-intensive at quantization time but achieve much stronger accuracy at low bitwidths. For instance, AdaRound [169] challenges the assumption that rounding to the nearest integer is optimal by formulating weight quantization as a quadratic unconstrained binary optimization problem that minimizes the layer-wise reconstruction error. BRECQ [170] extends the scope of optimization from individual layers to residual blocks, demonstrating that block-wise reconstruction guided by the Hessian of the task loss significantly reduces error accumulation compared to layer-wise methods. Adapting these reconstruction principles to non-convolutional architectures, APHQ-ViT [171] addresses the unique activation distributions of vision transformers by employing an average perturbation Hessian metric to calibrate weights according to the model’s sensitivity.
5.1.2. Quantization-Aware Training (QAT)
QAT is a more advanced approach that simulates quantization errors during the training or fine-tuning process, allowing the model to adapt its weights to the lower precision representation and learn to be robust to quantization noise. Specifically, during forward propagation, weights and activations are quantized to the target precision using simulated quantization functions, thus injecting quantization noise into the training signal. During backpropagation, a differentiable surrogate (commonly the straight-through estimator, STE) is used to propagate gradients through non-differentiable rounding/clamping operations. Although requiring access to the original training data and involving additional computational overhead during training, QAT can significantly improve the final accuracy of quantized models compared to PTQ, especially for aggressive quantization levels.
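A minimal sketch of the fake-quantization operator with a straight-through estimator is shown below: the forward pass rounds to an INT8 grid, while the backward pass treats rounding as the identity. The class name and the fixed scale are illustrative; real QAT pipelines learn or calibrate the scale per tensor or channel.

```python
import torch

class FakeQuantSTE(torch.autograd.Function):
    # Simulated INT8 quantization: forward rounds and clamps, backward is identity.
    @staticmethod
    def forward(ctx, x, scale):
        q = torch.clamp(torch.round(x / scale), -128, 127)
        return q * scale                      # dequantized value used downstream

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: pass the gradient through the rounding op.
        return grad_output, None

x = torch.randn(16, 128, requires_grad=True)
y = FakeQuantSTE.apply(x, torch.tensor(0.05))
y.sum().backward()                            # gradients reach x despite rounding
```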
Pioneering the extreme limit of low-precision training, Hubara et al. [172] introduce binarized neural networks (BNNs) to radically reduce memory consumption by constraining weights and activations to single-bit values $\{-1, +1\}$ during the forward pass. Moving towards practical deployment on standard hardware, Jacob et al. [173] propose a simulation framework that models quantization noise during training to enable inference using strictly integer arithmetic without sacrificing accuracy. As attention-based models gained prominence, Q-ViT [174] identifies that the self-attention mechanism is highly sensitive to quantization noise and proposes an information rectification module to preserve the distribution of attention scores. To scale QAT to the era of large language models, EfficientQAT [175] overcomes the prohibitive memory costs of end-to-end training by employing a block-wise reconstruction strategy that allows efficient fine-tuning of massive parameter counts.
5.2. Pruning
Model pruning removes parameters (weights, channels, or structures) from a trained neural network to reduce storage, FLOPs, and latency while attempting to preserve task performance. Pruning methods differ by granularity (unstructured and structured), timing (post-training, one-shot, during fine-tuning, and sparse training from scratch), and criterion (magnitude, sensitivity, second-order, gradient-based, etc.). This review categorizes pruning techniques into unstructured and structured methods.
5.2.1. Unstructured Pruning
Unstructured pruning removes individual scalar weights without regard to their spatial or channel-wise organization. Given the network weights $W$ and the dataset $\mathcal{D}$, unstructured pruning applies a sparse binary mask $m \in \{0, 1\}^{|W|}$ to zero out unimportant weights, as follows:
$$\min_{m} \; \mathcal{L}(m \odot W; \mathcal{D}) \quad \text{s.t.} \quad \|m\|_0 \leq k,$$
where $\mathcal{L}$ is the loss function, $\|m\|_0$ counts the number of retained weights, and $k$ determines the desired sparsity level.
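A minimal sketch realizing such a mask via global magnitude ranking (the simplest saliency criterion) is given below; the function name and sparsity level are illustrative.

```python
import torch

def global_magnitude_masks(weights, sparsity=0.9):
    # weights: dict of layer name -> tensor. Keep the k largest-magnitude
    # weights across all layers; zero out the rest via a binary mask m.
    scores = torch.cat([w.abs().flatten() for w in weights.values()])
    k = max(1, int(scores.numel() * (1 - sparsity)))   # retained weights
    threshold = torch.topk(scores, k).values.min()
    return {name: (w.abs() >= threshold).float() for name, w in weights.items()}

weights = {"fc1": torch.randn(512, 256), "fc2": torch.randn(10, 512)}
masks = global_magnitude_masks(weights)
pruned = {n: weights[n] * masks[n] for n in weights}   # W <- m ⊙ W
```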
Early theoretical approaches framed pruning as an optimization problem that minimizes the loss increase due to weight removal. Optimal brain damage [176] utilizes the diagonal of the Hessian matrix to estimate weight saliency, proving that low-sensitivity weights can be removed with minimal error. This is refined by optimal brain surgeon [177], which employs the full inverse Hessian to update the remaining weights, eliminating the need for retraining. Han et al. [178,179] present a practical magnitude-based pipeline, which combines pruning with quantization and entropy coding to obtain large storage reductions.
To avoid the cost of dense pretraining, a class of single-shot, pruning-at-initialization methods has emerged. SNIP [180] scores connections by gradient sensitivity evaluated on mini-batches and prunes once at initialization. Wang et al. propose gradient signal preservation (GraSP) [181] to choose masks that preserve gradient flow. Iterative synaptic flow pruning (SynFlow) [182] introduces a data-agnostic synaptic flow saliency to avoid layer collapse during iterative pruning. These approaches motivate deeper analysis of when and why pruning should occur at initialization.
An alternative trajectory focuses on dynamic sparse training (DST), which maintains a sparse model throughout training and periodically rewires connections (dropping low-utility weights and growing promising ones). Methods such as RigL [183] use gradient and magnitude signals to guide growth steps. DST directly targets training-time FLOP and memory reductions rather than only post hoc compression, approaching dense accuracy while offering substantial compute savings during training.
Scaling unstructured pruning to billion-parameter LLMs has produced pragmatic one-shot methods that avoid costly retraining or heavy curvature estimation. SparseGPT [184] introduces an approximate second-order reconstruction scheme tailored to GPT-family architectures and demonstrates accurate one-shot pruning. Wanda [185] proposes a simple per-output score (weight magnitude × input activation norm) that outperforms plain magnitude pruning without retraining.
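The Wanda criterion is simple enough to sketch directly; the following is a minimal per-output-row implementation under the assumption of a linear layer and a batch of calibration activations (the function and variable names are illustrative).

```python
import torch

def wanda_prune(weight, activations, sparsity=0.5):
    # weight: (d_out, d_in); activations: (num_tokens, d_in) calibration inputs.
    # Score = |W_ij| * ||x_j||_2, ranked independently within each output row.
    scores = weight.abs() * activations.norm(p=2, dim=0)   # broadcast over rows
    k = int(weight.size(1) * sparsity)                     # weights removed per row
    drop = torch.topk(scores, k, dim=1, largest=False).indices
    mask = torch.ones_like(weight).scatter_(1, drop, 0.0)
    return weight * mask                                   # one-shot, no retraining

w = wanda_prune(torch.randn(768, 768), torch.randn(2048, 768))
```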
5.2.2. Structured Pruning
On the other hand, structured pruning removes entire groups of parameters (filters, channels, neurons, attention heads, blocks, etc.) so that the pruned network contains smaller dense tensors rather than highly irregular sparsity. As modern accelerators are optimized for dense matrix operations, structured pruning explicitly targets hardware friendliness by enforcing block patterns and creating models that map directly to existing BLAS libraries without requiring specialized sparse kernels.
Early practical approaches frame structured pruning as filter or channel removal chosen by simple importance criteria. Li et al. [186] propose pruning entire convolutional filters whose removal minimally affects accuracy before fine-tuning, showing direct inference speed gains. ThiNet [187] introduces a data-driven filter selection strategy that estimates a filter’s contribution by solving a small reconstruction problem with responses from the next layer, illustrating that per-filter selection benefits from inter-filter redundancy and task signals rather than only within-filter magnitudes. Network slimming [188] moves towards training-time structured sparsity induction by introducing $\ell_1$ regularization on BatchNorm scale parameters to encourage channel sparsity during training. Subsequent structured methods emphasize better criteria and redundancy removal. FPGM [189] prunes filters according to their redundancy relative to others (the geometric median) rather than those with the smallest norms. NISP [190] propagates final-response importance scores backward to rank filters globally, offering another principled way to prioritize structural removals.
A complementary line of work shifts from hand-crafted heuristics to automated, device-aware pruning pipelines. NetAdapt [191] iteratively selects per-layer compression ratios using real latency or energy measurements on the target hardware instead of proxy FLOPs. AMC [192] formulates layer-wise pruning as a reinforcement learning policy search, automatically discovering compression schedules that balance accuracy against latency constraints.
The hardware ecosystem motivates a new middle ground between fully unstructured and coarse structured sparsity. To balance flexibility and efficiency, N:M sparsity [193] introduces a semi-structured pattern (e.g., 2:4 sparsity) supported by NVIDIA Ampere GPUs, where two out of every four weights are zeroed to double compute throughput. STEP [194] further proposes a preconditioned training framework that learns N:M fine-grained structured sparsity masks from scratch, enabling high-accuracy sparse networks under hardware-friendly N:M constraints without relying on dense pretrained models. Isomorphic pruning [195] proposes grouping parameters by computational topology to effectively handle the heterogeneous substructures of attention mechanisms.
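A minimal sketch of enforcing a 2:4 pattern by magnitude is shown below: within every contiguous group of four weights along the input dimension, the two largest-magnitude entries are kept; the function name is illustrative.

```python
import torch

def prune_2_of_4(weight):
    # Semi-structured 2:4 sparsity: keep the 2 largest-magnitude weights
    # in every contiguous group of 4 along the input dimension.
    d_out, d_in = weight.shape
    assert d_in % 4 == 0
    groups = weight.abs().reshape(d_out, d_in // 4, 4)
    keep = torch.topk(groups, k=2, dim=2).indices
    mask = torch.zeros_like(groups).scatter_(2, keep, 1.0)
    return weight * mask.reshape(d_out, d_in)

w = prune_2_of_4(torch.randn(128, 256))   # exactly 50% sparsity, hardware-friendly
```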
5.3. Knowledge Distillation
Knowledge distillation transfers knowledge from a large, complex teacher model to a smaller, simpler student model by training the student to mimic the teacher’s outputs. For model compression, the teacher is typically a pretrained VFM, while the student is a compact model suitable for deployment on resource-constrained platforms. The most influential formulation, which established the dominant response-based paradigm, is that of Hinton et al. [196], where the student model is trained to match the softened output probabilities of the teacher using a temperature parameter to smooth the logits.
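A minimal sketch of this response-based loss is shown below, blending the temperature-softened KL term with the standard cross-entropy on hard labels; the temperature, weighting, and class count are illustrative.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    # Soft targets: the teacher's probabilities smoothed by the temperature.
    soft_targets = F.softmax(teacher_logits / temperature, dim=1)
    log_preds = F.log_softmax(student_logits / temperature, dim=1)
    # T^2 keeps soft-target gradients on the same scale as the hard-label term.
    kd = F.kl_div(log_preds, soft_targets, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

loss = distillation_loss(torch.randn(32, 45), torch.randn(32, 45),
                         torch.randint(0, 45, (32,)))   # e.g., 45 scene classes
```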
While response-based methods have proven effective, they fail to capture the internal representational power of deep networks, particularly in very deep architectures where the final logit output is insufficient to guide the intermediate layers. To this end, feature-based distillation methods have emerged that align intermediate feature maps between the teacher and student. FitNets [197] pioneers this direction by introducing auxiliary regression losses on hidden layer outputs, allowing the student to learn richer representations beyond final outputs. Attention transfer [198] further refines this idea by matching attention maps derived from feature activations, allowing the student to learn where to look rather than exactly what to see.
Rather than distilling individual data points, relation-based methods transfer the relationships among multiple samples encoded by the teacher. Relational knowledge distillation (RKD) [199] encourages the student to preserve the distances and angles between sample pairs in the embedding space, ensuring that the structural topology of the teacher’s learned manifold is maintained. Contrastive representation distillation (CRD) [200] integrates contrastive learning by maximizing a lower bound on the mutual information between the teacher and student representations.
With the development of large-scale FMs and LLMs, some distillation efforts focus on scaling laws and efficient distillation pipelines. DistilBERT [201] demonstrates that a student model with half the layers of BERT can retain 97% of its performance by combining response-based distillation with masked language modeling during pretraining. TinyBERT [202] introduces a two-stage distillation process that first pretrains the student on large corpora using both response-based and feature-based losses, followed by task-specific fine-tuning with additional losses on attention distributions and hidden states.
5.4. Application in RS
The model compression techniques discussed above have been applied in various RS tasks to enable efficient deployment and near-real-time inference onboard resource-constrained devices.
Guo et al. [16] and Lu et al. [17] both explore structured pruning of CNNs such as VGG and ResNet tailored for RS image classification. The former utilizes a sensitivity function to selectively remove non-semantic filters, while the latter proposes an energy-based framework built on singular value decomposition (SVD) that remains robust on undertrained models. Both methods achieve moderate reductions in model size and FLOPs with comparable accuracy (e.g., compressing ResNet-50 from 25.78 M to 15∼18 M parameters).
Focusing on distilling knowledge from pretrained CNNs into a lightweight student, CDKD [18] and DKD [19] are proposed for change detection and target detection, respectively. The student model of CDKD (FC-Siam-conc) achieves performance comparable to the full ResNet-50 or VGG-19 with only 1.54 M parameters. DKD reduces the parameters of a heavy RetinaNet-152 by nearly 4× (from 71.03 M to 19.9 M) and the FLOPs by over 2×. In the context of transformer-based architectures, Wang et al. [20] propose burden-free distillation (BFD) to transfer general semantic knowledge from the visual encoder of CLIP (ViT-B/16) to task-specific change detection models via dual-temporal feature matching and a patch contrastive loss. Compared to network pruning, distillation generally achieves higher compression ratios with less accuracy drop, but requires additional training effort.
With respect to quantization, Li et al. [14] propose SPMix-Q, which leverages layer-wise sensitivity heterogeneity to assign progressively decreasing bit-widths, achieving segmentation performance comparable to the full-precision counterpart with only 1/13 of the model size and 1/29 of the computational cost. GHOST [15] employs a clustering-based hybrid quantization strategy to automatically optimize bit-widths. It also integrates distillation, utilizing a one-to-one self-teaching mechanism to distill knowledge from a full-precision teacher. To address the efficiency of fine-tuning large-scale RSFMs, Dong et al. [75] present UPetu, a unified PEFT framework that integrates quantization directly into adapter modules (EQAM) to reduce the size of the updated parameters.
Due to its simplicity and flexibility, quantization is often combined with other techniques such as distillation [15] and PEFT [75]. Furthermore, the hardware friendliness of quantization makes it a critical enabler for onboard deployment, often integrated directly into hardware-specific compilation pipelines to optimize latency, memory, and energy efficiency. For FPGA-based deployments, D’Abbondanza et al. [203] utilize hls4ml [80] to convert a QAT-trained U-Net, quantized to four bits via Brevitas, into FPGA firmware, achieving an 8.8× efficiency gain over Vitis AI’s DPU. Ziaja et al. [204] validate the standard Vitis AI [79] workflow by quantifying the performance of models converted to INT8 for DPU execution. Neris et al. [205] demonstrate that converting floating-point CNNs to 16-bit fixed-point precision using Vitis HLS significantly reduces resource usage on the Xilinx Kintex UltraScale [98]. On GPU-accelerated embedded systems, both Ijaz et al. [206] and Jankovic et al. [61] leverage TensorRT [62] (the standard optimization engine for NVIDIA GPUs) to perform PTQ. The former find that FP16 quantization on the Jetson Xavier NX or Nano offers the best balance of throughput and accuracy for disaster management CNNs, while the latter demonstrate that INT8 quantization is essential for real-time transformer inference on UAVs, reducing model size by ∼70%.
While low-bit quantization (e.g., INT4 or binary) offers substantial memory savings, it poses a significant risk to small-target representation in RS imagery. Unlike natural images, where targets often dominate the frame, RS images frequently contain small targets, such as vehicles and ships, that occupy only a few pixels due to high-altitude imaging. The activation maps corresponding to these objects rely on subtle high-frequency variations that are easily zeroed out or merged with the background noise floor when the dynamic range is compressed into a low-bit format. For instance, on the NUAA-SIRST dataset, reducing precision from 32-bit to INT4 using standard PTQ caused an IoU drop from 72.69% to 60.32% [14]. Empirical evidence indicates that INT8 is generally regarded as the “safe boundary” for uniform quantization. To push beyond this limit without losing small targets, SPMix-Q [14] proposes a mixed-precision strategy: it maintains high precision, such as FP16 or INT8, for the initial shallow layers that capture fine-grained spatial details, while allowing aggressive quantization, such as 2–4 bits, for the deeper semantic layers.
Most existing model optimization techniques are primarily developed and validated on optical images, which may not fully capture the unique characteristics of other data modalities, e.g., high spectral dimensionality, varying spatial resolutions, and sensor-specific noise patterns. For instance, Shinde et al. [207] show that uniform pruning or fixed-bit quantization is suboptimal for land cover classification of RS imagery, which often involves multi-resolution input and varying layer importance. Their adaptive layer-wise pruning combined with resolution scaling jointly considers spatial resolution and spectral information to preserve discriminative features. For multispectral input data, the high dimensionality and strong inter-band correlation allow channel-wise pruning and low-rank factorization with limited accuracy loss. Zou et al. [208] observe that channel importance in RS models is less separable than in natural-image models due to top-down viewpoints, scale variation, and atmospheric noise. They propose RemoteTrimmer, which enables effective structural pruning by amplifying inter-channel importance differences via channel attention and introducing an adaptive mining loss that focuses training on difficult, noise-corrupted samples. For SAR input data, aggressive early-layer pruning or quantization can significantly degrade performance due to the need to preserve scattering and texture statistics under speckle noise [142].
7. Conclusions
The rapid evolution of Earth observation systems, together with the increasing diversity and resolution of remote sensing modalities, has created an urgent demand for intelligent onboard processing capable of overcoming the long-standing limitations of bandwidth, latency, and operational inflexibility. This review provides the first perspective on the deployment of RSFMs on resource-constrained onboard platforms, bridging the gap between recent advances in foundation model research and the practical constraints of spaceborne computing. While RSFMs offer superior performance and generalization, their implementation faces severe challenges stemming from resource constraints, environmental factors, and the energy efficiency ratio of the hardware. To narrow this gap, from the algorithm perspective, model compression techniques including quantization, pruning, and knowledge distillation are analyzed as essential enablers for saving onboard resources. From the hardware and resource perspective, a typical case study and analysis are presented to demonstrate the feasibility of deploying RSFMs onboard LEO satellites under diverse scenarios, considering critical memory, power, and compute constraints. Ongoing progress in hardware–software co-optimization and edge-oriented toolkits has also made the deployment pipeline more convenient and efficient.
Furthermore, continued research on radiation-tolerant high-performance platforms, memory optimization, distributed inference, and human-centric VLM interfaces will further unlock the potential of larger-scale RSFMs for onboard autonomy in deeper space. We hope this review can open the door for future developments of next-generation intelligent Earth observation agents empowered by RSFMs.