Article

A Unified Framework for Cross-Domain Space Drone Pose Estimation Integrating Offline Domain Generalization with Online Domain Adaptation

1 College of Aerospace Science and Engineering, National University of Defense Technology, Changsha 410000, China
2 Hunan Provincial Key Laboratory of Image Measurement and Vision Navigation, Changsha 410073, China
* Author to whom correspondence should be addressed.
Drones 2025, 9(11), 774; https://doi.org/10.3390/drones9110774
Submission received: 1 September 2025 / Revised: 20 October 2025 / Accepted: 6 November 2025 / Published: 7 November 2025

Abstract

In this paper, we present a Unified Framework for cross-domain Space drone Pose Estimation (UF-SPE), addressing the simulation-to-reality gap that limits the deployment of deep learning models in real space missions. The proposed UF-SPE framework integrates offline domain generalization with online unsupervised domain adaptation. During offline training, the model relies exclusively on synthetic images; it employs advanced augmentation techniques and a multi-task architecture equipped with Domain Shifting Uncertainty modules to improve the learning of domain-invariant features. In the online phase, normalization layers are fine-tuned using unlabeled real-world imagery via entropy minimization, allowing the system to adapt to target-domain distributions without manual labels. Experiments on the SPEED+ benchmark demonstrate that UF-SPE achieves competitive accuracy with just 12.9 M parameters, outperforming a comparable lightweight baseline method by 37.5% in pose estimation accuracy. The results validate the framework's efficacy and efficiency for robust cross-domain space drone pose estimation, indicating promise for applications such as on-orbit servicing, debris removal, and autonomous rendezvous.

1. Introduction

Deep learning-based monocular pose estimation for non-cooperative space drones has garnered significant attention in recent years due to its low power consumption, structural simplicity, and cost-effectiveness, holding significant potential for on-orbit servicing, debris removal, and autonomous rendezvous [1,2]. Given the autonomous and intelligent nature of the spacecraft discussed in this article, we refer to it as a space drone. Since the inaugural Satellite Pose Estimation Challenge in 2019 (SPEC2019), key empirical insights have emerged, such as the superiority of combining landmark localization with Perspective-n-Point (PnP) solvers over end-to-end regression approaches and the role of preliminary object detection in mitigating perspective scaling issues while enhancing landmark localization accuracy [3]. These foundational works have profoundly advanced monocular Space drone Pose Estimation, driving developments in benchmark datasets (e.g., SPEED [4], SwissCube [5]), prediction accuracy [6], and model lightweighting [7,8]. State-of-the-art performance has improved significantly since the leading solution of 2019, with rotation error decreasing from 0.728° to 0.266° and translation error from 0.036 m to 0.017 m, improvements of 63.5% and 52.8%, respectively [7,9].
However, these methods are predominantly trained on computer-simulated imagery and validated on simulation data of identical distribution. Due to unknown distribution shifts between real imagery of operational on-orbit space drones and synthetic renders, such approaches exhibit limited applicability in real-world scenarios [10]. To investigate the domain gap in Space drone Pose Estimation, Stanford University and the European Space Agency (ESA) jointly organized the 2nd Satellite Pose Estimation Competition in 2021 (SPEC2021) [11]. The competition provided annotated synthetic data alongside two unlabeled real-world datasets (Lightbox and Sunlamp) captured in laboratory settings emulating distinct space illumination conditions. Given the challenges in acquiring real on-orbit imagery imposed by orbital dynamics and security protocols, the competition designated these laboratory-captured images as practical proxies for operational space drone imagery. Experiments in Ref. [12] demonstrate that the accuracy distribution of tested models on this proxy test set aligns well with performance on real imagery from the PRISMA mission [13], confirming the proxy's practical significance.
All top three winning solutions [11,14,15] in SPEC2021 consistently employed test-domain imagery during the offline training phase. Wang et al. [14] and Pérez et al. [15] utilized pseudo-labeling integrated with self-training to effectively extract transferable features from unlabeled target-domain imagery, while the remaining solution refined feature alignment through generative-discriminative optimization [11]. Although these domain-adaptation methods have achieved remarkable performance metrics, as noted in Ref. [16], access to real on-orbit imagery during the offline training phase remains highly constrained in most practical scenarios. This limitation restricts the applicability of existing offline domain-adaptation approaches. To address this, SPNv2 introduced Online Domain Refinement (ODR), pioneering a more pragmatic pathway toward real-world deployment [16]. Subsequently, research has increasingly focused on domain generalization techniques, including multi-task learning [17,18], data augmentation [19,20,21], vision Transformers [22], and dense landmark regression [23].
However, to the best of our knowledge, existing studies predominantly address isolated aspects of Space drone Pose Estimation. The aforementioned methods generally suffer from either limited accuracy or excessive model complexity. Consequently, the community urgently requires a unified framework that delivers competitive accuracy at a compact model scale. To bridge this gap, we propose a Unified Framework for cross-domain Space drone Pose Estimation (UF-SPE) integrating two core phases: (1) offline domain generalization leveraging data augmentation and network architecture design to enhance domain-invariant feature extraction, and (2) online domain adaptation utilizing unlabeled on-orbit space drone imagery to refine model parameters and align output distributions. The contributions of this work can be summarized as follows:
(1) We propose to integrate style transfer augmentation [24] and constituent-aware domain randomization to synthesize diverse training images.
(2) We propose to integrate a Domain Shifting Uncertainty (DSU) [25] estimation module with a multi-task learning architecture to bolster the model's generalization to unseen domains.
(3) During online adaptation, we adopt SPNv2's ODR [16] to fine-tune normalization layers using unlabeled real on-orbit image streams, achieving output distribution alignment.
Experiments on the SPEED+ benchmark validate the framework's efficacy. With only 12.9 million parameters, our method ranks third in accuracy. The top two ranked methods [22,23] require over six times the parameter count of our approach (e.g., 86.3 M vs. 12.9 M), demonstrating that the proposed UF-SPE achieves a substantial reduction in computational resource demands while maintaining competitive accuracy. Compared to SPNv2, which has a similar parameter scale, our approach reduces the pose estimation error (metric: $\bar{E}_{Score}$) by 37.5%. Generally, a model with fewer parameters has lower complexity, consumes fewer computational and memory resources, and consequently imposes lower hardware requirements; such models often exhibit higher deployment feasibility in resource-constrained environments [6]. Therefore, the trade-off our method achieves between parameter count and accuracy holds significant practical value.
The remainder of this paper is organized as follows: Section 2 reviews the related work; Section 3 presents the preliminaries and notation; Section 4 describes the proposed UF-SPE method in detail, including offline style-transfer and constituent-aware domain randomization, the DSU module, and the online unsupervised domain-adaptation strategy; Section 5 details the experimental setup, datasets, and evaluation metrics, and compares our method with existing approaches; Section 6 presents ablation studies analyzing the contribution of each component; and Section 7 concludes this paper and discusses directions for future research.

2. Related Work

2.1. Vanilla Space Drone Pose Estimation from Monocular Images

Prior to the advent of deep learning methodologies, conventional approaches relied on handcrafted features, such as the Scale-Invariant Feature Transform (SIFT) [26], Oriented FAST and Rotated BRIEF (ORB) [27], and Affine-SIFT (ASIFT) [28,29], to accomplish pose estimation tasks. While these methods advanced the development of pose estimation techniques, their limited representational capacity for shallow features rendered them inadequate for high-precision pose estimation of non-cooperative space drones.
Driven by the availability of large-scale annotated datasets, Deep Neural Networks (DNNs) have emerged as the dominant paradigm for this task in recent years. DNN-based methods, broadly categorized into direct methods and PnP-based methods, significantly surpass traditional approaches in pose estimation accuracy [1].
Direct methods utilize DNNs to regress 6-DoF pose parameters end-to-end. To address the sensitivity of traditional vision-based methods to illumination variations, Sharma et al. conducted early explorations of DNN-based approaches. They generated large-scale annotated synthetic imagery through OpenGL rendering and proposed the space drone pose network (SPN) to directly predict pose parameters for non-cooperative space drone [4]. Subsequently, Sharma et al. released the Spacecraft Pose Estimation Dataset (SPEED) and organized the inaugural Satellite Pose Estimation Challenge in 2019 (SPEC2019) to benchmark methods [3]. Phisannupawong et al. [30] pioneered a GoogLeNet-based CNN architecture for directly regressing the 7D pose vector (position and orientation quaternion). Proença et al. refined the rotation regression component of direct methods by discretizing the rotation space via soft classification technology [31]. However, empirical evidence indicates that direct methods generally underperform PnP-based methods in pose estimation accuracy, primarily due to the inherent difficulty of DNNs in learning complex mappings from images to 6-DoF poses [3].
PnP-based methods integrate Deep Neural Networks with PnP algorithms through three sequential stages: object detection, landmark localization, and PnP pose solving [7]. Landmark localization mainly encompasses two distinct methodologies: heatmap-based and coordinate-based approaches. The former models landmarks as probability heatmaps, regressing high-dimensional feature maps via DNNs and subsequently recovering coordinate values through post-processing, whereas the latter directly regresses coordinate values using DNNs. Chen et al. [9] adopted the heatmap-based method, securing first place in the 2019 Satellite Pose Estimation Challenge (SPEC2019). However, the high precision of heatmap-based methods critically depends on high-resolution inputs, high-resolution feature maps, and additional post-processing steps (e.g., sub-pixel refinement) to mitigate quantization errors [32]. In contrast, coordinate-based methods offer a more streamlined implementation. Subsequently, Park et al. proposed the Keypoints Regression Network (KRN) for direct coordinate regression [18]. Yu et al. conducted a comprehensive study on non-cooperative Space drone Pose Estimation; leveraging architectural advantages, their coordinate-based approach achieved state-of-the-art performance, ranking first on the SPEC2019 post-competition leaderboard [7].
Recently, significant efforts have focused on model lightweighting and edge-device deployment [6,8]. Wang et al. introduced an upsampling module based on DiteHRNet [33] and a top-k weighted heatmap decoding strategy, achieving an optimal balance between accuracy and computational complexity under hardware constraints [6].
Nevertheless, all of the aforementioned works were trained and evaluated on simulation data drawn from the same distribution. As pointed out in Ref. [16], real space drone imagery is often unavailable during the offline training phase. Therefore, direct application to real-world scenarios often encounters a significant simulation-to-reality domain gap, primarily due to discrepancies in illumination, material reflectance, and sensor noise characteristics.

2.2. Cross-Domain Space Drone Pose Estimation from Monocular Images

The organization of the 2nd Spacecraft Pose Estimation Competition 2021 (SPEC2021) and the release of the SPEED+ dataset have significantly heightened research focus on cross-domain Space drone Pose Estimation [14,15]. In SPEC2021, the top-performing solutions in both the Sunlamp and Lightbox categories adopted generative adversarial modules to address domain-shift challenges [34]. Specifically, the Sunlamp winner leveraged pseudo-labeling integrated with iterative self-training, enabling progressive feature extraction from unlabeled target domains. This strategy enhanced cross-domain representation learning and significantly improved target-domain test performance [14]. Meanwhile, the Lightbox winner augmented the state-of-the-art SPEC2019 methodology by incorporating adversarial training paradigms, which refined feature alignment through generative-discriminative optimization [11]. Notably, the runner-up solution in both categories employed a unified framework combining pseudo-label refinement, self-training cycles, and 3D geometry-guided loss combinations. This multi-component approach enhanced pose estimation robustness against appearance variations [15,35]. To further mitigate error propagation in pseudo-labels, Liu et al. introduced RANSAC-EPnP consensus mechanisms, which iteratively rejected outlier predictions during self-training [10].
The aforementioned methods implicitly assume the availability of real space drone imagery during the offline training phase. However, since real in-orbit imagery is frequently unavailable at that stage, subsequent research has increasingly prioritized domain generalization and online training paradigms to mitigate simulation-to-reality performance degradation.
The pioneering SPNv2 approach adopted a multi-task, multi-scale architecture to enhance domain-invariant feature extraction, integrated with ODR, which leverages self-supervised entropy minimization during the test phase for target-domain adaptation at marginal computational overhead [16]. Subsequent work introduced a Keypoint Positioning Network trained on two auxiliary tasks (spacecraft segmentation and component-wise facet identification), augmented by domain randomization and histogram equalization [21]. Further architectural innovations include pyramid-structured vision transformers to strengthen cross-domain generalization [17] and NeRF [36] for synthetic training data expansion [19,20].
Recently, SPNv3 advanced this paradigm through extensive data augmentation and Vision Transformer-based transfer learning [22]. The EagleNet proposed dense 2D–3D correspondence prediction with pixel-wise uncertainty estimation, generating multiple pose hypotheses refined through iterative optimization [23]. However, these two high-precision approaches exhibit notable limitations in model complexity and parameter efficiency. To address this, a Feature-aided Variational Auto-Encoder (FA-VAE) was developed, achieving competitive accuracy with superior inference speed, although its cross-domain generalization remains constrained by domain-specific training [37].
Addressing the prevailing challenge of inaccessible real space drone imagery during offline training and unlabeled real-time data during online operation, this paper proposes a unified framework for cross-domain Space drone Pose Estimation. The framework integrates offline domain generalization with online unsupervised domain adaptation, effectively leveraging both labeled data from the training phase and unlabeled data during deployment, thereby exhibiting significant practical potential.

3. Preliminaries

This study addresses the task of Space drone Pose Estimation, specifically the recovery of a target space drone's 6-Degrees-of-Freedom (6DoF) pose relative to the camera from a single image $I \in \mathbb{R}^{H \times W \times 3}$. The pose comprises a rotation $R \in SO(3)$ and a translation $T \in \mathbb{R}^3$. Let $M$ denote the number of landmarks, and let the $i$-th landmark's 2D image coordinates and 3D world coordinates be represented as $p_i$ and $P_i$, respectively, where $i = 1, 2, \ldots, M$. The PnP problem can be formulated as follows [38]:

$$\min_{R, T} \sum_{i=1}^{M} \left\| \lambda_i p_i - K [R \mid T] P_i \right\|^2,$$

where $K$ is the camera intrinsic matrix and $\lambda_i$ is the depth of the $i$-th landmark. Following Yu et al. [7], this work employs EPnP [39] as the foundational pose solver, integrates RANSAC (Random Sample Consensus) [40] for outlier rejection, and adopts a nonlinear iterative optimization algorithm to refine the estimated pose.
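As a concrete illustration of this pipeline, the following sketch uses OpenCV's EPnP solver inside a RANSAC loop followed by Levenberg-Marquardt refinement. The reprojection-error threshold and the choice of cv2.solvePnPRefineLM as the nonlinear refinement step are assumptions for illustration, not details taken from Ref. [7].

```python
import numpy as np
import cv2

def solve_pose(pts_3d, pts_2d, K):
    """EPnP + RANSAC outlier rejection + nonlinear refinement (sketch)."""
    obj = np.ascontiguousarray(pts_3d, dtype=np.float64).reshape(-1, 1, 3)
    img = np.ascontiguousarray(pts_2d, dtype=np.float64).reshape(-1, 1, 2)
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        obj, img, K, None, flags=cv2.SOLVEPNP_EPNP, reprojectionError=3.0)
    if not ok or inliers is None:
        return None
    idx = inliers.ravel()
    # Iterative (Levenberg-Marquardt) refinement on the inlier correspondences.
    rvec, tvec = cv2.solvePnPRefineLM(obj[idx], img[idx], K, None, rvec, tvec)
    R, _ = cv2.Rodrigues(rvec)  # axis-angle -> rotation matrix
    return R, tvec
```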

4. Method

To resolve the domain shift problem arising from unavailable real space images during offline training and unlabeled target-domain images during online deployment, we propose the UF-SPE, integrating offline domain generalization with online domain refinement. The framework synergistically coordinates three components (Figure 1), and the overall flowchart of the proposed UF-SPE framework is shown in Figure 2.
Style transfer and constituent-aware randomization techniques are implemented to augment textural diversity in training images. This strategy explicitly attenuates the model’s dependence on textural features, which exhibit high susceptibility to domain shift, while reinforcing the extraction of cross-domain invariant geometric features.
A multi-task architecture incorporating uncertainty modeling modules is adopted, wherein domain shifts are parameterized as a multivariate Gaussian distribution. This design enhances the network’s capacity to isolate domain-invariant features through probabilistic regularization.
During real-time deployment, an unsupervised adaptation scheme fine-tunes the model using unlabeled target-domain images. This closed-loop mechanism iteratively aligns the model with target-domain distributions.

4.1. Augmentation

This study adopts style augmentation [24] and Constituent-Aware Domain Randomization (CADR) for data enhancement. The style augmentation follows the methodology described in the existing literature, loading a pre-trained style transfer model to randomly transform the texture and style of training images. Let the $i$-th input training image be $I_{syn}^i$ and the style augmentation transformation be $\mathcal{S}$; the style-augmented image is then $I_{style}^i = \mathcal{S}(I_{syn}^i)$.
The CADR augmentation $\mathcal{C}$ utilizes the binary segmentation mask $S_{syn}^i$ of the space drone in the synthetic image $I_{syn}^i$ to randomly replace both the surface textures of the space drone and the background textures with textures $J_{COCO}^i$ sampled from the COCO dataset [41].
We first define a piecewise function as follows:

$$P_p(\mathcal{T}) = \begin{cases} \mathcal{T}, & \text{triggered with probability } p, \\ \mathcal{I}, & \text{triggered with probability } (1 - p), \end{cases}$$

where $\mathcal{T} \in \{\mathcal{S}, \mathcal{C}\}$ is an augmentation function and $\mathcal{I}$ is the identity mapping (e.g., $\mathcal{I}(\mathcal{S}) = \mathcal{S}$). Consequently, the entire augmentation procedure can be formally formulated as follows:

$$I_{aug}^i = P_{p_s}(\mathcal{S}) \circ P_{p_c}(\mathcal{C}) \left( I_{syn}^i, S_{syn}^i, J_{COCO}^i \right),$$

where the operator $\circ$ denotes function composition and $p_s$ and $p_c$ denote the probabilities of style augmentation and CADR augmentation, respectively.
After applying the above-mentioned style augmentation and CADR augmentation, the basic augmentations of the albumentations library [42] are also utilized during model training, following the same settings as in Ref. [16].
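To make the composition above concrete, the sketch below implements the trigger function $P_p(\cdot)$ and a simplified CADR step. The helper names (p_trigger, cadr, augment), the alpha-blended texture compositing, and the reading of $J_{COCO}^i$ as two sampled COCO textures (one for the drone surface, one for the background) are assumptions made for illustration; the pre-trained style-transfer model of Ref. [24] is passed in as an opaque callable. The default probabilities match the (0.6, 0.4) configuration found optimal in Section 6.1.

```python
import random
import numpy as np

def p_trigger(transform, p):
    """P_p(T): return `transform` with probability p, the identity otherwise."""
    if random.random() < p:
        return transform
    return lambda image: image

def cadr(image, mask, coco_fg, coco_bg, alpha=0.5):
    """Constituent-aware randomization sketch (assumed blending scheme):
    mix a random COCO texture into the drone region (mask > 0) and another
    into the background, keeping part of the original shading."""
    m = (mask > 0)[..., None].astype(np.float32)
    textured = m * coco_fg + (1.0 - m) * coco_bg
    return ((1.0 - alpha) * image + alpha * textured).astype(image.dtype)

def augment(image, mask, coco_fg, coco_bg, style_fn, p_s=0.6, p_c=0.4):
    """I_aug = P_{p_s}(S) o P_{p_c}(C): CADR is applied first, then style."""
    image = p_trigger(lambda im: cadr(im, mask, coco_fg, coco_bg), p_c)(image)
    image = p_trigger(style_fn, p_s)(image)
    return image
```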

4.2. Domain Uncertainty Estimation

To account for the uncertainty and stochasticity of domain shifts, we assume they follow a multivariate Gaussian distribution characterized by both mean and covariance parameters. A Domain Shifting Uncertainty module is designed to explicitly quantify this domain shift uncertainty, thereby enhancing the model’s domain generalization capability. This module is integrated into a multi-task learning network at the positions indicated by arrows a, b, c, and d in Figure 1.
Let the input feature tensor of the DSU module be $x \in \mathbb{R}^{B \times C \times H \times W}$, and assume the channel-wise mean and standard deviation follow the normal distributions $\mathcal{N}(\mu, \Sigma_{\mu}^2)$ and $\mathcal{N}(\sigma, \Sigma_{\sigma}^2)$, respectively; the variances of the mean and standard deviation can then be formulated as follows [25]:

$$\Sigma_{\mu}^2(x) = \frac{1}{B} \sum_{b=1}^{B} \left( \mu(x) - \mathbb{E}_b[\mu(x)] \right)^2, \qquad \Sigma_{\sigma}^2(x) = \frac{1}{B} \sum_{b=1}^{B} \left( \sigma(x) - \mathbb{E}_b[\sigma(x)] \right)^2,$$

where $\Sigma_{\mu} \in \mathbb{R}^C$ and $\Sigma_{\sigma} \in \mathbb{R}^C$ denote the uncertainty of the feature mean $\mu$ and standard deviation $\sigma$. New feature statistics are randomly sampled according to the following formulas:

$$\beta(x) = \mu(x) + \epsilon_{\mu} \Sigma_{\mu}(x), \quad \epsilon_{\mu} \sim \mathcal{N}(0, 1), \qquad \gamma(x) = \sigma(x) + \epsilon_{\sigma} \Sigma_{\sigma}(x), \quad \epsilon_{\sigma} \sim \mathcal{N}(0, 1).$$

Inspired by AdaIN [43], the final formulation of our approach is defined as

$$\mathrm{DSU}(x) = \underbrace{\left( \sigma(x) + \epsilon_{\sigma} \Sigma_{\sigma}(x) \right)}_{\gamma(x)} \cdot \frac{x - \mu(x)}{\sigma(x)} + \underbrace{\left( \mu(x) + \epsilon_{\mu} \Sigma_{\mu}(x) \right)}_{\beta(x)}.$$
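A minimal PyTorch sketch of the DSU module described above, following the formulation of Ref. [25], is given below. The activation probability of 0.5 matches the setting reported in Section 5.2; the epsilon constant is an assumed numerical-stability detail.

```python
import torch
import torch.nn as nn

class DSU(nn.Module):
    """Domain Shifting Uncertainty: perturb per-channel feature statistics
    with noise whose scale is the batch-level variance of those statistics."""

    def __init__(self, p=0.5, eps=1e-6):
        super().__init__()
        self.p = p      # probability of applying the perturbation
        self.eps = eps

    def forward(self, x):  # x: (B, C, H, W)
        if not self.training or torch.rand(1).item() > self.p:
            return x
        mu = x.mean(dim=(2, 3), keepdim=True)                              # (B, C, 1, 1)
        sig = (x.var(dim=(2, 3), keepdim=True, unbiased=False) + self.eps).sqrt()
        # Uncertainty of the statistics, estimated across the batch dimension.
        sig_mu = (mu.var(dim=0, keepdim=True, unbiased=False) + self.eps).sqrt()
        sig_sig = (sig.var(dim=0, keepdim=True, unbiased=False) + self.eps).sqrt()
        beta = mu + torch.randn_like(mu) * sig_mu        # perturbed mean
        gamma = sig + torch.randn_like(sig) * sig_sig    # perturbed std
        return gamma * (x - mu) / sig + beta
```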

4.3. Online Unsupervised Domain Adaptation

To achieve Online Unsupervised Domain Adaptation (OUDA), the ODR module from SPNv2 [16] was implemented during the online training phase. As shown in Figure 1, this framework dynamically fine-tunes the statistical parameters of the normalization layers using unlabeled real-world images acquired in real time while keeping all other parameters frozen. Formally, let $\theta$ denote the set of all trainable parameters in the model; the normalization-layer parameters $\psi \subset \theta$ are then optimized by minimizing the following loss function:

$$\min_{\psi} \frac{1}{n} \sum_{i=1}^{n} \ell_{ent}(x_i; \psi, \theta),$$

where $x_i$ is an unlabeled target-domain image and the Shannon entropy loss is computed as

$$\ell_{ent}(x_i; \psi, \theta) = - \sum_{p} \sigma(\hat{y}_{i,p}) \log \sigma(\hat{y}_{i,p}),$$

where $\hat{y}_{i,p}$ is the $p$-th pixel of the segmentation-head output for $x_i$.

The ODR framework is explicitly designed to accommodate three critical constraints of online devices: source-free operation; sequential, incremental adaptation with streaming data; and computational efficiency. When hardware resources permit, the framework allows substitution with source-free pseudo-label-based self-training to achieve higher accuracy, albeit at the expense of greater computational demands.
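The following sketch illustrates one such online refinement update in PyTorch: only the affine parameters of the normalization layers are handed to the optimizer, and the entropy of the sigmoid segmentation output is minimized. The output key "segmentation", the use of a sigmoid for $\sigma(\cdot)$, the pixel-wise averaging, and the learning rate are assumptions for illustration rather than details of SPNv2's implementation.

```python
import torch
import torch.nn as nn

def collect_norm_params(model):
    """Gather only the normalization-layer parameters (the set psi);
    the remaining parameters of theta are left untouched."""
    params = []
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm2d, nn.GroupNorm, nn.LayerNorm)):
            params.extend(p for p in m.parameters(recurse=False))
    return params

def entropy_loss(seg_logits, eps=1e-8):
    """l_ent = -sum_p sigma(y_p) log sigma(y_p), averaged over pixels here."""
    p = torch.sigmoid(seg_logits)
    return -(p * torch.log(p + eps)).mean()

def odr_step(model, images, optimizer):
    """One online refinement step on a small unlabeled batch."""
    seg_logits = model(images)["segmentation"]  # assumed output dictionary key
    loss = entropy_loss(seg_logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage sketch: optimizer over psi only.
# optimizer = torch.optim.Adam(collect_norm_params(model), lr=1e-4)  # assumed lr
```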

5. Experiments

5.1. Datasets and Evaluation Metrics

The experimental validation of the proposed method was conducted using the SPEED+ dataset [12]. This dataset comprises synthetic and real images of the Tango space drone utilized in the PRISMA [13] mission, specifically designed to investigate domain gaps in Space drone Pose Estimation. As shown in Figure 3, it contains three subsets: (1) synthetic images generated from the CAD model (47,966 labeled training and 11,994 labeled validation images for model training), (2) Lightbox images simulating Earth albedo under laboratory conditions, and (3) Sunlamp images mimicking direct high-intensity homogeneous solar illumination. The Lightbox and Sunlamp subsets serve as weaker real-world surrogates of on-orbit imagery of the target space drone for accuracy evaluation [12].
Pose estimation accuracy was evaluated using the metric adopted in SPEC2021 [11]. For a single test image, the translation error and rotation error are, respectively, defined as follows:

$$\xi_T = \| \hat{T} - T \|, \qquad \bar{\xi}_T = \| \hat{T} - T \| / \| T \|,$$

$$\xi_R = 2 \arccos \left( \langle \hat{q}, q \rangle \right), \qquad \bar{\xi}_R = 360 \arccos \left( \langle \hat{q}, q \rangle \right) / \pi,$$

where $(T, q)$ and $(\hat{T}, \hat{q})$ denote the ground-truth and estimated values, respectively, and $\langle \cdot, \cdot \rangle$ represents the vector dot product. Integrating both translation and rotation errors, the pose score for a single test image is defined as follows:

$$Score = \xi_R + \bar{\xi}_T.$$

For the entire test set comprising $N$ images, the average translation error $E_T$, rotation error $E_R$, and pose score $E_{Score}$ are defined as follows:

$$E_T = \frac{1}{N} \sum_{i=1}^{N} \xi_T^i, \qquad E_R = \frac{1}{N} \sum_{i=1}^{N} \bar{\xi}_R^i, \qquad E_{Score} = \frac{1}{N} \sum_{i=1}^{N} Score^i.$$

Furthermore, the model's comprehensive cross-domain accuracy is evaluated using the mean pose score across multiple test sets, denoted as $\bar{E}_{Score}$.
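For reference, a direct NumPy transcription of these per-image error definitions might look as follows; the absolute value on the quaternion dot product (to handle the q / -q sign ambiguity) and the clipping for numerical safety are small assumptions beyond the formulas as printed.

```python
import numpy as np

def pose_errors(t_gt, q_gt, t_est, q_est):
    """Per-image SPEC2021-style errors following the definitions above.
    Quaternions are assumed unit-norm and in a consistent convention."""
    xi_t = np.linalg.norm(np.asarray(t_est) - np.asarray(t_gt))   # xi_T [m]
    xi_t_bar = xi_t / np.linalg.norm(t_gt)                        # normalized translation error
    dot = np.clip(abs(np.dot(q_est, q_gt)), -1.0, 1.0)
    xi_r = 2.0 * np.arccos(dot)                                   # xi_R [rad]
    xi_r_bar = np.degrees(xi_r)                                   # = 360*arccos(.)/pi [deg]
    score = xi_r + xi_t_bar                                       # per-image pose score
    return xi_t, xi_t_bar, xi_r, xi_r_bar, score
```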

5.2. Implementation Details

During the offline training phase, the model was exclusively trained on synthetic imagery without utilizing any target-domain images. The offline-trained model was then evaluated on both the Lightbox and Sunlamp subsets. For data augmentation, in addition to the CADR augmentation introduced in Section 4.1, style transfer augmentation [24] and basic image augmentations were employed, including random brightness/contrast adjustment, Random Erase, Sun Flare effects, Gaussian blur, and additive noise. Detailed parameter configurations follow Ref. [16].
The model was implemented using the PyTorch 2.0.1 framework and trained for 20 epochs with the Adam optimizer. A batch size of 16 was used, with an initial learning rate of $1 \times 10^{-3}$, which was multiplied by 0.1 after the 15th and 18th epochs. Input images were resized to 768 × 512 pixels, and training was conducted on an NVIDIA H800 GPU.
In experiments, the domain-shift uncertainty estimation module was optionally inserted at positions (a), (b), (c), or (d) depicted in Figure 1, with an activation probability of 0.5. This module was exclusively utilized during offline training.
At the online training stage, leveraging the continuous stream of unlabeled real images available during space drone operation, an online source-free unsupervised domain adaptation (UDA) method was employed to adapt the parameters of the normalization layers. Following SPNv2 [16], a batch size of 4 was used, and the first 1024 images were used for online training before evaluating the model's performance.
The final pose solution is primarily derived from the output of the heatmap regression head. The network’s direct pose regression output serves as a fallback solution only when pose derivation from heatmap-based predictions fails.
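A minimal sketch of this selection logic is given below; the confidence threshold, the simple arg-max peak decoding, and the minimum-landmark count are assumptions used only to illustrate how the heatmap branch and the direct-regression fallback could be combined.

```python
import numpy as np

def decode_heatmaps(heatmaps, conf_thresh=0.3):
    """Pick each landmark's heatmap peak, keeping confident detections only.
    heatmaps: array of shape (M, H, W)."""
    pts_2d, keep = [], []
    for i, hm in enumerate(heatmaps):
        y, x = np.unravel_index(np.argmax(hm), hm.shape)
        if hm[y, x] >= conf_thresh:
            pts_2d.append((float(x), float(y)))
            keep.append(i)
    return np.asarray(pts_2d), keep

def select_pose(heatmaps, landmarks_3d, K, direct_pose, solver, min_pts=6):
    """Use heatmap landmarks + PnP when possible; otherwise fall back to the
    direct regression output. `solver` is any PnP routine with the signature
    (pts_3d, pts_2d, K) -> pose or None."""
    pts_2d, keep = decode_heatmaps(heatmaps)
    if len(keep) >= min_pts:
        pose = solver(np.asarray(landmarks_3d)[keep], pts_2d, K)
        if pose is not None:
            return pose
    return direct_pose
```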

5.3. Comparative Analysis with State-of-the-Art

To validate the efficacy of the proposed method, we conducted experiments on the SPEED+ benchmark and compared the results against state-of-the-art approaches. As detailed in Table 1, the proposed UF-SPE achieves the third-best performance on the $\bar{E}_{Score}$ metric while utilizing only approximately 12.9 million parameters, demonstrating an optimal balance between accuracy and model compactness. Excluding methods with significantly larger model sizes, our approach attains superior performance across multiple metrics, outperforming recent techniques including PVSPE [17] (which employs a pyramid-structured multi-task Vision Transformer), Ref. [19] (which leverages an additional NeRF [36] model for synthetic data generation), and Ref. [21]. Notably, despite Ref. [21]'s use of ground-truth bounding boxes for object detection to aid subsequent pose estimation, our method maintains a competitive advantage.
As illustrated in Figure 4, the test-domain imagery (denoted as Original) exhibits visually challenging characteristics and a significant domain gap compared to the training samples in Figure 3. While the baseline method (SPNv2) fails to recover correct poses under these conditions, the proposed approach achieves robust pose recovery for the target space drone.

5.4. Temporal Stability and Efficiency Analysis

To evaluate the temporal stability of the proposed method, we analyzed the frame-by-frame pose estimation results on the continuous image sequences ROE1 and ROE2 from the Satellite Hardware-In-the-loop Rendezvous Trajectories (SHIRT) dataset. The SHIRT dataset comprises high-fidelity synthetic spaceborne images of the Tango space drone, captured along two simulated rendezvous trajectories in Low Earth Orbit (LEO). It also contains real image sequences of these trajectories captured in a laboratory environment, featuring illumination from Earth albedo light boxes as used in the creation of the SPEED+ Lightbox domain imagery [44]. The ROE1 scenario is characterized by the servicer maintaining a standard v-bar hold point at a typical along-track separation, during which the target rotates about one principal axis. Conversely, the ROE2 scenario entails a slow approach of the servicer toward a target tumbling about two principal axes. The dataset provides ground-truth 6-DoF pose annotations for each image frame, facilitating quantitative evaluation of cross-domain pose estimation accuracy over time.
In this experiment, the model was trained offline using synthetic data from the SPEED+ dataset and was subsequently evaluated on real laboratory data from the two consecutive image sequences (ROE1 and ROE2) of the SHIRT dataset. Quantitative evaluation results, as illustrated in Figure 5, demonstrate the effectiveness of individual components within the UF-SPE framework when tested on consecutive frames. Both style augmentation and CADR augmentation contribute to improved accuracy on sequential frames, while the introduction of ODR further improves the model's average pose accuracy across consecutive frames.
Figure 6 presents the stability analysis of ODR on the ROE1 and ROE2 sequences. The horizontal axis represents the difference in pose estimation scores between the results obtained with ODR activated and deactivated, where smaller values indicate better ODR performance. It can be observed that the majority of ODR instances effectively improve the model’s accuracy on consecutive frames, as evidenced by the mean differences being consistently less than zero. These results confirm the effectiveness and stability of ODR, demonstrating a very low probability of performance collapse.
For the efficiency analysis, we evaluated the model's average training time, average ODR processing time, average inference time, and average pose estimation time. The average training time was measured on a server equipped with an H800 GPU and an Intel(R) Xeon(R) Platinum 8458P vCPU. All other timing statistics were obtained on a device with a 4090 GPU and an Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80 GHz. The number of parameters and floating-point operations (FLOPs) were calculated using the thop v0.1.1 library in Python 3.8.20. The timing results for different stages of the proposed method are summarized in Table 2. Notably, the time consumption for ODR is significantly lower than that of the offline training phase. The model's parameter count, FLOPs, and total pose-solving time are reported in Table 3. The results indicate that UF-SPE achieves a favorable balance, maintaining high accuracy while demonstrating competitive computational efficiency.
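For reproducibility, parameter and operation counts can be obtained with thop roughly as follows. The stand-in torchvision model and the 768 × 512 dummy input are placeholders (the actual UF-SPE network is not shown here), and note that thop's profile returns multiply-accumulate counts, which papers often quote directly as FLOPs.

```python
import torch
import torchvision
from thop import profile

# Stand-in backbone for illustration; substitute the actual UF-SPE model.
model = torchvision.models.resnet18()
dummy = torch.randn(1, 3, 512, 768)  # 768 x 512 input resolution used in the paper
ops, params = profile(model, inputs=(dummy,))
print(f"Params: {params / 1e6:.1f} M, MACs: {ops / 1e9:.2f} G")
```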

6. Ablation Studies

Comprehensive ablative experiments were designed to quantify the individual and synergistic contributions of (i) the data augmentation pipeline, (ii) the Domain Shifting Uncertainty (DSU) module, and (iii) the online domain fine-tuning mechanism. Optimal configurations for each module were derived through a grid search over the hyperparameter space.

6.1. Effects of Data Augmentation

Twenty experimental trials were conducted to optimize the activation probabilities of style augmentation and CADR augmentation, with both probabilities treated as independent variables. The style augmentation probability ranged from 0.3 to 0.7, while the CADR augmentation probability was set between 0.3 and 0.6, both with a uniform step size of 0.1. Performance was evaluated using the $\bar{E}_{Score}$ metric. As illustrated in Figure 7, the model achieved superior performance in regions characterized by a high style augmentation probability and a low CADR augmentation probability. The optimal configuration (0.6, 0.4) yielded the best $\bar{E}_{Score}$.
Further experiments investigated the individual and synergistic contributions of style augmentation and CADR augmentation. We employed boxplots to visualize the impact of different data augmentation strategies on the final pose estimation scores of the model. As illustrated in Figure 8, the box represents the interquartile range (IQR), the line inside the box denotes the median, and the diamond marker indicates the mean value. The boxes from left to right correspond to the following augmentation configurations: Basic Augmentation, Basic Augmentation + Style Augmentation, and Basic Augmentation + Style Augmentation + CADR Augmentation. As shown in Figure 8, style augmentation demonstrated limited efficacy in the Lightbox test domain, whereas it significantly improved robustness in the Sunlamp domain. In contrast, CADR augmentation consistently enhanced model performance across both test domains. Table 4 quantifies the impact of each augmentation strategy on key evaluation metrics. Experimental results demonstrate that progressive stacking of augmentation techniques yields consistent improvements in model accuracy.

6.2. Effects of Different Insertion Locations and Their Combinations

To assess the impact of the DSU module on model performance, we conducted experiments by inserting the DSU module at four distinct locations within the network architecture, as indicated by arrows a, b, c, and d in Figure 1: (a) downstream of the image input layer, (b) following the EfficientNet backbone [45], (c) within the stacked BiFPN [46] module, and (d) after the BiFPN output. The configuration bc refers to inserting DSU modules simultaneously at positions b and c, while bcd and abcd refer to inserting DSU modules simultaneously at positions b, c, and d, and at positions a, b, c, and d, respectively. Each configuration was evaluated independently to determine its effect on cross-domain pose estimation accuracy. Ablation studies were conducted under two distinct training regimes: without ODR and with ODR.
As demonstrated in Figure 9, the bcd configuration (locations b, c, and d) significantly enhanced model performance in both the Lightbox and Sunlamp test domains, with $\bar{E}_{Score}$ improvements of 17.3% and 14.8% over the base configuration, respectively. This validates the efficacy of the uncertainty estimation module in enhancing cross-domain robustness.
Figure 10 further confirms that the performance gains achieved by the bcd configuration were consistently maintained after integrating ODR ($\bar{E}_{Score}$ improvements of 20.5% and 14.4% over the base configuration on the Lightbox and Sunlamp test domains, respectively), indicating that the uncertainty module remains effective in raising the upper limit of accuracy. The improvement is calculated as

$$\frac{\left| B_{value} - A_{value} \right|}{A_{value}} \times 100\%,$$

where $A_{value}$ and $B_{value}$ are the $\bar{E}_{Score}$ values of the two configurations being compared.
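As a concrete illustration using values already reported in Table 1: taking SPNv2's $\bar{E}_{Score}$ of 0.16 as $A_{value}$ and the ODR-refined UF-SPE score of 0.10 as $B_{value}$ gives

$$\frac{|0.10 - 0.16|}{0.16} \times 100\% = 37.5\%,$$

which is the relative improvement over the baseline quoted in the Abstract and Section 5.3.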

6.3. Module Ablation in the Unified Framework

As illustrated in Figure 11, all modules consistently enhance pose estimation accuracy across both the Lightbox and Sunlamp test domains. It is noteworthy that, in Figure 11a, as modules accumulate, the interquartile ranges (IQRs) in the boxplots remain stable, while the diamond-shaped markers representing mean values exhibit a significant decline. This demonstrates that the model consistently attains high-precision pose recovery across an increasing number of test samples. A similar pattern is observed in the last three boxplots of Figure 11b: although the IQRs exhibit minimal variation, the mean values show a distinct gradual decline, as indicated by the green dashed arrows.
Table 5 comprehensively summarizes the impact of each module on specific evaluation metrics. With progressive module integration, all error metrics exhibit significant reductions, which validates the efficacy of the proposed methodology.

7. Conclusions

To address cross-domain Space drone Pose Estimation, this work proposes a unified framework integrating offline domain generalization and online domain adaptation. The framework employs stochastic style augmentation and CADR to randomize the texture features susceptible to domain shifts while enhancing the model's capacity for extracting cross-domain invariant geometric features. The Domain Shifting Uncertainty module, which parameterizes domain shifts as multivariate Gaussian distributions, was integrated into a multi-task learning architecture, further strengthening the model's generalization capability. During online deployment, unsupervised adaptation was performed by fine-tuning normalization layers with unlabeled real-world images, achieving progressive alignment with target-domain distributions without manual annotations.
Extensive experiments on the SPEED+ benchmark validated the efficacy of our approach. With only 12.9 M parameters, the model secured the third-highest accuracy among state-of-the-art methods. The proposed framework achieved a 37.5% improvement in the pose estimation metric $\bar{E}_{Score}$ over the baseline method (SPNv2) under cross-domain space drone scenarios.
Regarding the implementation of the proposed method, we envisage a practical pathway for its adoption. This includes developing lightweight model variants suitable for edge devices, optimizing inference pipelines, and leveraging ODR to adapt the model to specific deployment scenarios. We plan to validate the method on hardware platforms.
While the UF-SPE framework demonstrates promising results in controlled environments, its deployment on resource-constrained edge devices like the Jetson Nano presents several challenges, including hardware limitations and real-time processing requirements. The Jetson Nano's limited memory and processing power may hinder the real-time execution of complex UF-SPE modules, especially under high-throughput scenarios, and balancing low-latency inference with power constraints is critical for edge devices. Regarding these issues, model quantization and pruning present viable approaches to mitigating the computational overhead. Furthermore, the speed, stability, and accuracy of pose estimation within the UF-SPE framework can be enhanced through the integration of pose tracking algorithms [10] and Kalman filtering [44,47]. Future research will be dedicated to addressing the aforementioned challenges and validating the proposed solutions on spacecraft-grade radiation-hardened processors.

Author Contributions

Conceptualization, Y.Y. and Z.L.; methodology, Y.Y.; software, Y.Y.; validation, Y.Y., Z.L. and Q.Y.; formal analysis, Y.Y.; investigation, Y.Y.; resources, Y.Y.; data curation, Y.Y.; writing—original draft preparation, Y.Y.; writing—review and editing, Z.L.; visualization, Y.Y.; supervision, Q.Y.; project administration, Q.Y. and Z.L.; funding acquisition, Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science and Technology Innovation Program of Hunan Province under Grant No. 2022RC1196 and the National Natural Science Foundation of China under Grant No. 12472189.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author(s).

Acknowledgments

This article is supported by the Hunan Provincial Key Laboratory of Image Measurement and Vision Navigation.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Pauly, L.; Rharbaoui, W.; Shneider, C.; Rathinam, A.; Gaudillière, V.; Aouada, D. A survey on deep learning-based monocular spacecraft pose estimation: Current state, limitations and prospects. Acta Astronaut. 2023, 212, 339–360. [Google Scholar] [CrossRef]
  2. Sajjad, N.; Mirshams, M.; Hein, A.M. Spaceborne and ground-based sensor collaboration: Advancing resident space objects’ orbit determination for space sustainability. Astrodynamics 2024, 8, 325–347. [Google Scholar] [CrossRef]
  3. Kisantal, M.; Sharma, S.; Park, T.H.; Izzo, D.; Märtens, M.; D’Amico, S. Satellite Pose Estimation Challenge: Dataset, Competition Design, and Results. IEEE Trans. Aerosp. Electron. Syst. 2020, 56, 4083–4098. [Google Scholar] [CrossRef]
  4. Sharma, S.; Beierle, C.; D’Amico, S. Pose estimation for non-cooperative spacecraft rendezvous using convolutional neural networks. In Proceedings of the 2018 IEEE Aerospace Conference, Big Sky, MT, USA, 3–10 March 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–12. [Google Scholar]
  5. Hu, Y.; Speierer, S.; Jakob, W.; Fua, P.; Salzmann, M. Wide-depth-range 6d object pose estimation in space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 15870–15879. [Google Scholar]
  6. Wang, Z.; Wang, J.; Yu, J.; Li, Z.; Yu, Q. High-accuracy real-time satellite pose estimation for in-orbit applications. Chin. J. Aeronaut. 2025, 38, 103458. [Google Scholar] [CrossRef]
  7. Yu, Y.; Wang, Z.; Li, Z.; Yu, Q. A comprehensive study on PnP-based pipeline for pose estimation of noncooperative satellite. Acta Astronaut. 2024, 224, 486–496. [Google Scholar] [CrossRef]
  8. Lotti, A.; Modenini, D.; Tortora, P.; Saponara, M.; Perino, M.A. Deep Learning for Real-Time Satellite Pose Estimation on Tensor Processing Units. J. Spacecr. Rocket. 2023, 60, 1034–1038. [Google Scholar] [CrossRef]
  9. Chen, B.; Cao, J.; Parra, A.; Chin, T.J. Satellite Pose Estimation with Deep Landmark Regression and Nonlinear Pose Refinement. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Republic of Korea, 27–28 October 2019; pp. 2816–2824. [Google Scholar]
  10. Liu, K.; Yu, Y. Revisiting the Domain Gap Issue in Non-cooperative Spacecraft Pose Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 6864–6873. [Google Scholar]
  11. Park, T.H.; Märtens, M.; Jawaid, M.; Wang, Z.; Chen, B.; Chin, T.J.; Izzo, D.; D’Amico, S. Satellite Pose Estimation Competition 2021: Results and Analyses. Acta Astronaut. 2023, 204, 640–665. [Google Scholar] [CrossRef]
  12. Park, T.H.; Märtens, M.; Lecuyer, G.; Izzo, D.; D’Amico, S. SPEED+: Next-Generation Dataset for Spacecraft Pose Estimation across Domain Gap. In Proceedings of the 2022 IEEE Aerospace Conference (AERO), Big Sky, MT, USA, 5–12 March 2022; pp. 1–15. [Google Scholar] [CrossRef]
  13. D’Amico, S.; Bodin, P.; Delpech, M.; Noteborn, R. PRISMA. In Distributed Space Missions for Earth System Monitoring; Springer: New York, NY, USA, 2013; pp. 599–637. [Google Scholar] [CrossRef]
  14. Wang, Z.; Chen, M.; Guo, Y.; Li, Z.; Yu, Q. Bridging the Domain Gap in Satellite Pose Estimation: A Self-Training Approach Based on Geometrical Constraints. IEEE Trans. Aerosp. Electron. Syst. 2023, 60, 2500–2514. [Google Scholar] [CrossRef]
  15. Pérez-Villar, J.I.B.; García-Martín, Á.; Bescós, J.; Escudero-Viñolo, M. Spacecraft Pose Estimation: Robust 2-D and 3-D Structural Losses and Unsupervised Domain Adaptation by Intermodel Consensus. IEEE Trans. Aerosp. Electron. Syst. 2024, 60, 2515–2525. [Google Scholar] [CrossRef]
  16. Park, T.H.; D’Amico, S. Robust multi-task learning and online refinement for spacecraft pose estimation across domain gap. Adv. Space Res. 2023, 73, 5726–5740. [Google Scholar] [CrossRef]
  17. Yang, H.; Xiao, X.; Yao, M.; Xiong, Y.; Cui, H.; Fu, Y. PVSPE: A pyramid vision multitask transformer network for spacecraft pose estimation. Adv. Space Res. 2024, 74, 1327–1342. [Google Scholar] [CrossRef]
  18. Park, T.H.; Sharma, S.; D’Amico, S. Towards Robust Learning-Based Pose Estimation of Noncooperative Spacecraft. arXiv 2019, arXiv:1909.00392. [Google Scholar] [CrossRef]
  19. Legrand, A.; Detry, R.; De Vleeschouwer, C. Leveraging Neural Radiance Fields for Pose Estimation of an Unknown Space Object during Proximity Operations. In Proceedings of the IEEE International Conference on Space Robotics, Luxembourg, 24–27 June 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–10. [Google Scholar]
  20. Legrand, A.; Detry, R.; De Vleeschouwer, C. Domain Generalization for 6D Pose Estimation Through NeRF-based Image Synthesis. arXiv 2024, arXiv:2407.10762. [Google Scholar]
  21. Legrand, A.; Detry, R.; De Vleeschouwer, C. Domain Generalization for In-Orbit 6D Pose Estimation. arXiv 2024, arXiv:2406.11743. [Google Scholar] [CrossRef]
  22. Park, T.H.; D’Amico, S. Bridging Domain Gap for Flight-Ready Spaceborne Vision. arXiv 2024, arXiv:2409.11661. [Google Scholar] [CrossRef]
  23. Ulmer, M.; Durner, M.; Sundermeyer, M.; Stoiber, M.; Triebel, R. 6D Object Pose Estimation from Approximate 3D Models for Orbital Robotics. In Proceedings of the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 1–5 October 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 10749–10756. [Google Scholar]
  24. Jackson, P.T.; Abarghouei, A.A.; Bonner, S.; Breckon, T.P.; Obara, B. Style augmentation: Data augmentation via style randomization. In Proceedings of the CVPR Workshops, Long Beach, CA, USA, 16–20 June 2019; Volume 6, pp. 10–11. [Google Scholar]
  25. Li, X.; Dai, Y.; Ge, Y.; Liu, J.; Shan, Y.; Duan, L. Uncertainty Modeling for Out-of-Distribution Generalization. In Proceedings of the International Conference on Learning Representations, Virtual, 25–29 April 2022; pp. 1–17. [Google Scholar]
  26. Lowe, D.G. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef]
  27. Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An efficient alternative to SIFT or SURF. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2564–2571. [Google Scholar] [CrossRef]
  28. Yu, Y.; Guan, B.; Sun, X.; Li, Z. Self-calibration of cameras using affine correspondences and known relative rotation angle. Appl. Opt. 2021, 60, 10785–10794. [Google Scholar] [CrossRef]
  29. Yu, Y.; Guan, B.; Sun, X.; Li, Z.; Fraundorfer, F. Rotation alignment of camera-IMU system using a single affine correspondence. Appl. Opt. 2021, 60, 7455–7465. [Google Scholar] [CrossRef]
  30. Phisannupawong, T.; Kamsing, P.; Torteeka, P.; Channumsin, S.; Sawangwit, U.; Hematulin, W.; Jarawan, T.; Somjit, T.; Yooyen, S.; Delahaye, D.; et al. Vision-Based Spacecraft Pose Estimation via a Deep Convolutional Neural Network for Noncooperative Docking Operations. Aerospace 2020, 7, 126. [Google Scholar] [CrossRef]
  31. Proença, P.F.; Gao, Y. Deep Learning for Spacecraft Pose Estimation from Photorealistic Rendering. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 6007–6013. [Google Scholar]
  32. Li, Y.; Yang, S.; Liu, P.; Zhang, S.; Wang, Y.; Wang, Z.; Yang, W.; Xia, S.T. SimCC: A Simple Coordinate Classification Perspective for Human Pose Estimation. In Computer Vision—ECCV 2022; Springer Nature: Cham, Switzerland, 2022; pp. 89–106. [Google Scholar]
  33. Li, Q.; Zhang, Z.; Xiao, F.; Zhang, F.; Bhanu, B. Dite-HRNet: Dynamic lightweight high-resolution network for human pose estimation. arXiv 2022, arXiv:2204.10762. [Google Scholar]
  34. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 27, 139–144. [Google Scholar]
  35. Pérez-Villar, J.I.B.; García-Martín, Á.; Bescós, J. Spacecraft pose estimation based on unsupervised domain adaptation and on a 3d-guided loss combination. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 37–52. [Google Scholar]
  36. Mildenhall, B.; Srinivasan, P.P.; Tancik, M.; Barron, J.T.; Ramamoorthi, R.; Ng, R. NeRF: Representing scenes as neural radiance fields for view synthesis. Commun. ACM 2021, 65, 99–106. [Google Scholar] [CrossRef]
  37. Liu, Y.; Zhou, R.; Du, D.; Cao, S.; Qi, N. Feature-aided pose estimation approach based on variational auto-encoder structure for spacecrafts. Chin. J. Aeronaut. 2024, 37, 329–341. [Google Scholar] [CrossRef]
  38. Hartley, R.; Zisserman, A. Multiple View Geometry in Computer Vision, 2nd ed.; Cambridge University Press: Cambridge, UK, 2004. [Google Scholar] [CrossRef]
  39. Lepetit, V.; Moreno-Noguer, F.; Fua, P. EPnP: An Accurate O(n) Solution to the PnP Problem. Int. J. Comput. Vis. 2009, 81, 155–166. [Google Scholar] [CrossRef]
  40. Fischler, M.A.; Bolles, R.C. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Readings Comput. Vis. 1987, 726–740. [Google Scholar] [CrossRef]
  41. Lin, T.Y.; Maire, M.; Belongie, S.J.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 740–755. [Google Scholar]
  42. Buslaev, A.; Iglovikov, V.I.; Khvedchenya, E.; Parinov, A.; Druzhinin, M.; Kalinin, A.A. Albumentations: Fast and flexible image augmentations. Information 2020, 11, 125. [Google Scholar] [CrossRef]
  43. Huang, X.; Belongie, S. Arbitrary style transfer in real-time with adaptive instance normalization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1501–1510. [Google Scholar]
  44. Park, T.H.; D’Amico, S. Adaptive Neural Network-Based Unscented Kalman Filter for Robust Pose Tracking of Noncooperative Spacecraft. J. Guid. Control. Dyn. 2023, 46, 1671–1688. [Google Scholar] [CrossRef]
  45. Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114. [Google Scholar]
  46. Tan, M.; Pang, R.; Le, Q.V. Efficientdet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790. [Google Scholar]
  47. Zhang, P.; Wu, D.; Baoyin, H. Real-time hybrid method for maneuver detection and estimation of non-cooperative space targets. Astrodynamics 2024, 8, 437–453. [Google Scholar] [CrossRef]
Figure 1. Illustration of the proposed UF-SPE. The UF-SPE integrates offline domain generalization (light blue background) and online unsupervised domain adaptation (light green background). DSU modules are optionally inserted at four positions (arrows a–d).
Figure 2. Flowchart of the proposed UF-SPE framework.
Figure 3. Dataset samples: Row 1 displays synthetic imagery; Row 2 shows Lightbox imagery simulating surface albedo; Row 3 presents Sunlamp imagery mimicking homogeneous solar illumination.
Figure 4. Qualitative results of the proposed UF-SPE on the (a) Lightbox and (b) Sunlamp subsets.
Figure 5. Effects of style augmentation, CADR augmentation and ODR on model performance on (a) ROE1 and (b) ROE2 sequences.
Figure 6. Difference in score between ODR and non ODR on (a) ROE1 and (b) ROE2 sequences.
Figure 7. Effects of probabilistic combination strategies for style augmentation and CADR augmentation on cross-domain pose estimation accuracy.
Figure 8. Effects of style augmentation and CADR augmentation on model performance on (a) Lightbox and (b) Sunlamp subsets.
Figure 9. Impact of insertion locations and integration schemes for the uncertainty module on model performance on (a) Lightbox and (b) Sunlamp subsets.
Figure 10. Effects of online domain adaptation fine-tuning on the uncertainty estimation module.
Figure 11. Effects of data augmentation, uncertainty estimation, and online domain refinement in (a) Lightbox and (b) Sunlamp subsets.
Table 1. Comparison with state-of-the-art methods.

Method | Params | Lightbox: $E_R$ [°] / $E_T$ [cm] / $E_{Score}$ | Sunlamp: $E_R$ [°] / $E_T$ [cm] / $E_{Score}$ | $\bar{E}_{Score}$
SPNv3 [22] | 86.3 M | 2.03 / 1.2 / 0.05 | 3.4 / 1.5 / 0.07 | 0.06
Ref. [23] | 88.6 M | 1.75 / 8.5 / 0.04 | 2.66 / 1.3 / 0.06 | 0.05
SPN [4] | / | 65.12 / 45.0 / 1.21 | 92.95 / 65.0 / 1.73 | 1.47
KRN [18] | / | 33.62 / 95.0 / 0.74 | 65.37 / 204.0 / 1.47 | 1.11
Ref. [19] | / | 7.20 / 20.0 / 0.16 | 16.92 / 9.0 / 0.34 | 0.25
PVSPE [17] | / | 4.81 / – / 0.10 | 8.94 / – / 0.18 | 0.14
Ref. [19] | / | 3.66 / 14.1 / 0.09 | 7.82 / 21.4 / 0.17 | 0.13
Ref. [21] | / | 4.32 / 9.0 / 0.09 | 6.94 / 14.0 / 0.14 | 0.12
Ref. [10] | / | 6.68 / 17.0 / 0.15 | 5.51 / 22.0 / 0.13 | 0.14
SPNv2 [16] | 12.0 M | 5.62 / 14.2 / 0.12 | 9.60 / 18.2 / 0.20 | 0.16
UF-SPE | 12.9 M | 5.71 / 15.2 / 0.12 | 6.62 / 13.4 / 0.14 | 0.13
UF-SPE ※ | 12.9 M | 3.97 / 11.2 / 0.09 | 5.69 / 11.0 / 0.12 | 0.10
※: Utilized the online domain refinement.
Table 2. List of time consumption for UF-SPE's main stages (ms).

Training | ODR | Inferring | Pose solving
255 | 93 | 24 | 31
Table 3. List of model parameter quantity, computational complexity, and pose solving time.

Params (M) | FLOPs (G) | Time (ms)
12.9 | 128.04 | 55
Table 4. Effects of style augmentation and CADR augmentation.

Method | Lightbox: $E_R$ [°] / $E_T$ [cm] / $E_{Score}$ | Sunlamp: $E_R$ [°] / $E_T$ [cm] / $E_{Score}$ | $\bar{E}_{Score}$
base | 8.82 / 20.9 / 0.18 | 25.64 / 56.8 / 0.53 | 0.36
+Style | 8.59 / 21.9 / 0.18 | 11.08 / 17.7 / 0.22 | 0.20
+CADR | 6.82 / 18.7 / 0.15 | 7.75 / 15.1 / 0.16 | 0.16
Table 5. Ablation experiments of the proposed unified framework.

Method | Lightbox: $E_R$ [°] / $E_T$ [cm] / $E_{Score}$ | Sunlamp: $E_R$ [°] / $E_T$ [cm] / $E_{Score}$ | $\bar{E}_{Score}$
base | 8.82 / 20.9 / 0.18 | 25.64 / 56.8 / 0.53 | 0.36
+Aug | 6.82 / 18.7 / 0.15 | 7.75 / 15.1 / 0.16 | 0.16
+Unc | 5.71 / 15.2 / 0.12 | 6.62 / 13.4 / 0.14 | 0.13
+Odr | 3.97 / 11.2 / 0.09 | 5.69 / 11.0 / 0.12 | 0.11
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
