A Deployment-Oriented Case Study of YOLO-Based Model Compression for On-Board Space Debris Detection

Kerr, Liam; Arandjelović, Ognjen

doi:10.3390/info17070650

Open Access Article


by Liam Kerr and Ognjen Arandjelović *

School of Computer Science, University of St Andrews, St Andrews KY16 9AJ, UK

* Author to whom correspondence should be addressed.

Information 2026, 17(7), 650; https://doi.org/10.3390/info17070650

Submission received: 26 May 2026 / Revised: 26 June 2026 / Accepted: 29 June 2026 / Published: 3 July 2026

(This article belongs to the Special Issue Computer Vision and Image Processing: Technologies and Applications for Multimedia Systems)


Abstract

Space debris presents a growing operational risk to spacecraft, especially in low Earth orbit, where collisions can generate further debris and increase future collision probability. Active debris removal and in-orbit servicing require robust close-range perception, but on-board systems are constrained by power, memory, processing capability and the need for reliable real-time operation. This paper investigates convolutional object detection for on-board space debris detection using the SPARK 2022 spacecraft detection dataset. A YOLOv3 detector is fine-tuned and used to evaluate post-training compression through static quantisation and pruning. A lightweight architectural variant, YOLO-DWSC, is also introduced by replacing the YOLOv3-tiny backbone convolutions with depthwise separable convolutions while retaining the detection head. The full-precision YOLOv3 model achieves 0.972 mAP₅₀ and 0.884 mAP₅₀:₉₅, while 8-bit static quantisation reduces model size from 405 MB to 102 MB with only a small reduction in mAP₅₀, although tighter localisation accuracy is more affected. YOLO-DWSC is much smaller and faster, reaching 256.4 FPS on the tested GPU with a 43 MB model, but with reduced accuracy. We present this work as a controlled case study rather than an attempt at state-of-the-art SPARK 2022 performance. The original challenge test labels were unavailable, and the experiments therefore use a class-balanced re-split of the labelled data. The results should consequently be interpreted as internally controlled comparisons of compression behaviour, not as leaderboard-comparable benchmark results. Pruning and a two-pass refinement method are also evaluated. The results indicate that simple compression methods can be useful for broad region-of-interest detection, but they also show that claims about on-board deployment require caution. Speed benefits are hardware- and runtime-dependent, and safety-critical proximity operations require evaluation criteria better aligned with full-object containment.

Keywords:

space debris; object detection; YOLOv3; quantisation; pruning; depthwise separable convolution; on-board inference; active debris removal; SPARK 2022; lightweight CNN

1. Introduction

Space debris is an increasingly serious problem for the long-term sustainability of space operations. Large non-functional spacecraft, spent rocket bodies, mission-related objects and smaller fragments occupy operational regions around the Earth, especially low Earth orbit (LEO). These objects threaten active satellites, crewed platforms, and future exploration missions. The risk is not limited to a single collision event, since a collision can generate many further fragments, raising the probability of subsequent collisions and contributing to the self-reinforcing scenario often associated with the Kessler syndrome [1,2]. Recent ESA reports paint a stark picture of the present debris environment, one with serious practical consequences for operations. Tens of thousands of objects are tracked, while much larger populations of centimetre- and millimetre-scale debris are inferred statistically rather than routinely catalogued [2,3]. Although these smaller objects are not of primary interest in the present paper, they motivate the wider problem. Even when the immediate focus is on detecting large known spacecraft or large debris objects during close-proximity operations, the environment in which these operations take place is already congested and dynamic, and the consequences of error are potentially catastrophic.

Current mitigation practices, including end-of-life disposal and collision avoidance, are necessary but may not be sufficient. Active debris removal proposes intercepting selected high-risk objects, such as large defunct spacecraft or rocket bodies, and safely disposing of them [4]. In-orbit servicing and manufacturing similarly require spacecraft to approach, inspect and sometimes manipulate non-cooperative or only partly cooperative targets. These missions depend on close-range perception. A chaser spacecraft must detect the target, estimate its pose, track its motion and, in some cases, support detumbling or capture. A failure during this process could damage the target, the servicing spacecraft, or nearby assets, and could create further debris.

The perception problem is difficult because space hardware is heavily constrained. Spacecraft are limited by mass, power availability, processing capacity, memory and the use of specialised or radiation-tolerant components. A model that performs well on a modern desktop GPU is therefore not automatically suitable for an on-board mission. This important fact is often sidelined in published computer vision benchmarking, with the primary interest excessively focused on accuracy on a standard dataset. For actual space applications, inference latency, model size, numerical precision, hardware support and robustness to operating conditions are issues that cannot be separated from a method’s efficacy. Succinctly put, a detector is useful only insofar as it can be employed within a credible operational pipeline.

This paper investigates the use of convolutional neural network (CNN)-based object detection models for space debris detection and evaluates methods by which they can be compressed to run faster, use less memory and reduce computational demand. The work focuses on YOLOv3 and YOLOv3-tiny [5], using the SPARK 2022 spacecraft detection dataset [6]. We study three compression directions. First, static post-training quantisation is applied to YOLOv3 using 8-bit and 4-bit weight representations. Second, pruning is applied to investigate whether sparsity can reduce the effective network footprint while preserving accuracy. Third, a lightweight architectural variant, YOLO-DWSC, is introduced. YOLO-DWSC modifies the YOLOv3-tiny backbone by replacing standard convolutions with depthwise separable convolutions, while preserving the detection head.

The key contributions of this work are as follows. First, we conduct a controlled case study of YOLOv3 compression on the SPARK 2022 dataset, the first of its kind, reporting and analysing a diverse range of metrics, including accuracy, model size and CPU/GPU speed measurements. Second, we evaluate a lightweight YOLOv3-tiny-derived variant, YOLO-DWSC, as an architectural test of the accuracy-efficiency trade-off produced by replacing standard backbone convolutions with depthwise separable convolutions. Third, we critically analyse a two-pass region-of-interest refinement strategy intended to improve prediction accuracy after an initial detection. Fourth, reflecting on evaluation practices in the published literature, which we also adopted in the experimental part of this paper, we argue that in active debris removal contexts standard mean average precision (mAP) should in future be supplemented by containment-oriented metrics.

To properly contextualise the present work at the very outset, we emphasise that our aim herein is not to develop the most performant detector for SPARK 2022. Indeed, more recent YOLO variants, transformer-based detectors and task-specific models can achieve higher accuracy on many object detection problems, and recent space-object work has also moved beyond YOLOv3 [7,8,9,10]. Our aim is instead to examine how a widely used detector family behaves under specific architectural simplification and compression, and to discuss the consequent limits of using standard object-detection metrics for an active debris removal perception pipeline. This aim is motivated by the broader context, namely by the fact that on-board adoption is shaped not only by peak mAP, but also by the relationship between accuracy, model footprint, inference path, hardware support and the detector’s role in the mission stack.

Operational Role of Detection in Active Debris Removal

The detection task considered in this paper should be understood as one component within a wider active debris removal or in-orbit servicing pipeline. A monocular detector is unlikely to be the only perception system used during final capture. More likely, object detection would support target acquisition and region-of-interest selection, and provide a reliable input crop for later pose estimation or tracking modules. This distinction matters for how the reported results should be interpreted. A detector used for early target acquisition could have looser localisation requirements if it can reliably identify the target and provide a sufficiently narrow region of interest for downstream processing. A detector used immediately before capture, however, would be subject to much stricter localisation and robustness requirements.

This operational framing also clarifies the role of compression. A smaller and faster model need not replace a larger detector in every part of a mission. Instead, it may allow a spacecraft to perform frequent low-cost monitoring and trigger higher-cost perception only when needed, or to send a compact region of interest for more detailed processing on the ground or on-board; see Table 1. Understood in this way, the evaluation we report concerns not only compression as a means of preserving a single model's behaviour at lower numerical precision or with a smaller footprint, but also compression as a way of exploring different possible roles in a perception stack.

2. Related Work

2.1. Space Debris Detection and Close-Proximity Perception

Ground-based radar and optical telescopes remain central to space situational awareness. Radar systems can estimate range and velocity by transmitting radio waves and measuring returns from resident space objects, and optical telescopes are widely used for more distant or geostationary objects. Systems such as the Tracking and Imaging Radar (TIRA) have contributed to European debris surveillance and object characterisation [11]. These systems are effective for cataloguing larger objects, but they have inherent limitations due to atmospheric effects, line-of-sight restrictions, latency, incomplete coverage, and reduced sensitivity to smaller debris.

Space-based and on-board sensing approaches can complement ground-based surveillance. Liu et al. [12] investigate space debris detection and positioning using multiple star trackers, arguing that space-based observations avoid some weather and day-night limitations of ground systems. For close-proximity operations, on-board perception is even more important because a chaser spacecraft needs immediate local information about the target, its apparent geometry and its motion. This is especially relevant when the object is non-cooperative or tumbling.

Deep learning has become increasingly important in spacecraft perception. Pose estimation for non-cooperative spacecraft has attracted substantial attention, especially because monocular cameras are passive, relatively low-power and already common on spacecraft. Lotti et al. [13] investigate real-time satellite pose estimation on low-power Edge TPU hardware, explicitly addressing accuracy-latency trade-offs. Pauly et al. [14] survey deep learning-based monocular spacecraft pose estimation and emphasise two persistent barriers to deployment: computation-intensive methods and the domain gap between synthetic or laboratory imagery and real orbital imagery. These concerns apply directly to object detection. A detector trained on rendered or laboratory images must be evaluated not merely as a dataset-specific pattern recogniser, but as a candidate component in a safety-critical perception pipeline.

2.2. CNN-Based Object Detection

Object detection involves localising and classifying objects in an image, usually by predicting bounding boxes and class probabilities. The main CNN-based families differ in how they balance accuracy and speed. Faster R-CNN uses a region proposal network followed by a detection stage [15]. It can be highly accurate, but the two-stage structure is often less suitable for real-time resource-constrained deployment. SSD performs detection in a single pass using default boxes at multiple scales [16]. YOLO also performs detection in one pass, dividing the image into a grid and predicting bounding boxes and class probabilities directly [17]. YOLOv3 extends the original YOLO formulation through multi-scale prediction and a Darknet-53 backbone [5].

YOLO-style detectors are attractive for space debris detection because they offer a favourable speed-accuracy trade-off and have mature implementations. However, general-purpose object detectors are normally developed and benchmarked on terrestrial datasets such as COCO [18], whereas space imagery differs significantly. Targets may be small, low contrast, partially illuminated, highly reflective, or set against backgrounds containing Earth, stars, sensor noise or darkness. The operationally realistic input to the detector therefore differs in its characteristics from the natural images that dominate the object detection literature and that have shaped the design of existing detectors. In this context, transfer learning remains useful, but it does not remove the need for domain-specific adaptations to the model.

Recent work has started to address space object detection more directly. AlDahoul et al. [19] use EfficientDet and decision fusion for space situational awareness on SPARK-related data. Li et al. [20] propose a target localisation method for non-cooperative spacecraft in on-orbit servicing, using a lightweight CNN-based detector and explicitly considering efficiency. Zhou et al. [7] propose an improved YOLO11-based detector for SPARK 2022, incorporating image preprocessing, data augmentation and architectural modifications to improve small object detection. Guo et al. [8] propose an enhanced YOLOv8-based approach for space debris detection using cross-scale feature fusion. These studies support the broader direction pursued here, but they also show that the literature is rapidly moving towards increasingly task-specific detectors. The present paper sits slightly differently within this landscape, focusing on model compression and the practical consequences of making a detector lightweight.

2.3. Modern Real-Time, Small-Object and Video-Aware Detectors

Our choice of YOLOv3 is best understood against the background of rapid development in real-time detection models. Specifically, YOLOv5 and YOLOv8 made the YOLO family easier to train, deploy and export in PyTorch-based workflows, while later variants such as YOLOv10 have demonstrated lower latency and end-to-end operation by reducing reliance on non-maximum suppression and redesigning efficiency-critical model components [9,21]. RT-DETR represents a different line of development, using an efficient real-time detection transformer with an encoder-decoder structure intended to preserve accuracy while reducing the computational burden usually associated with DETR-like models [10]. These models are essential baselines for work adjacent to ours, namely inter-model comparison, especially where the aim is to produce a detector that is competitive with current real-time object detection systems.

Work on small-object and surveillance video detection also offers pertinent ideas, although its applicability to SPARK 2022 is not immediate, since surveillance video systems often exploit temporal information, scene- or context-specific background information, and motion cues [22,23]. For example, Yu et al. [24] propose adaptive omni-attention over inter-frame and intra-frame context, using temporal information to improve difficult surveillance frames and suppress false positives in background regions. Wang et al. [25] use motion priors for surveillance vehicle detection, again exploiting information that is not available in a single still image. These approaches are potentially relevant to space debris detection when a video stream from an approaching spacecraft is available rather than individual images. A chaser spacecraft would normally observe a target across time, so temporal consistency, motion priors, and track-aware detection could all improve robustness, especially when dealing with small apparent target size, motion blur and intermittent low-quality frames.

The experiments we report in the present paper evaluate single-image rather than video-based detection on SPARK 2022. Nevertheless, the surveillance literature is useful for contextualising our work and suggests promising follow-up work, such as pairing a compressed on-board detector with temporal feature aggregation or motion-aware filtering, particularly during approach phases in which the same target is observed over consecutive frames.

2.4. Model Compression for On-Board Inference

Quantisation reduces the precision used to represent weights, activations, or both. Instead of 32-bit floating-point values, the model may use 8-bit or lower-precision integer values, thereby reducing memory footprint and potentially accelerating inference when the hardware and inference runtime support the lower-precision operations efficiently [26]. Post-training quantisation is attractive because it can be applied to an already trained model without retraining, but it is often less accurate than quantisation-aware training, which simulates quantisation effects during training.
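To make the idea concrete, the following is a minimal sketch of affine (scale and zero-point) quantisation to signed integers in plain Python. It assumes simple min/max calibration, one common way of estimating quantisation parameters from representative data in static quantisation. The function names are illustrative and do not correspond to ONNX Runtime's internals.

```python
def calibrate_affine(calibration_values, num_bits=8):
    """Min/max calibration for signed affine quantisation; returns (scale, zero_point)."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    # Widen the observed range to include 0.0 so that zero is exactly representable.
    lo = min(min(calibration_values), 0.0)
    hi = max(max(calibration_values), 0.0)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = min(max(qmin - round(lo / scale), qmin), qmax)
    return scale, zero_point


def quantise(values, scale, zero_point, num_bits=8):
    """Map floats to signed integer codes, clipping anything outside the calibrated range."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    return [min(max(round(v / scale) + zero_point, qmin), qmax) for v in values]


def dequantise(codes, scale, zero_point):
    """Map integer codes back to approximate float values."""
    return [(q - zero_point) * scale for q in codes]
```

With calibration values spanning -1.0 to 2.0, values inside that range are reconstructed to within half a quantisation step. A value such as 10.0 is clipped and comes back as about 2.0. This clipping is one reason the calibration data should cover the conditions expected at inference time.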

Pruning removes weights, filters, channels or other structures judged to contribute little to model performance. Unstructured pruning can create sparse weight matrices but does not automatically yield speed improvements unless the hardware and libraries exploit sparsity. Structured pruning removes larger units, such as filters or channels, and can therefore be easier to accelerate in practice, but it is more likely to damage representational capacity if applied without care [27,28]. Knowledge distillation is another common compression method, where a smaller student model is trained to imitate a larger teacher model [29]. It was not implemented in this study, but it is a natural future extension.

Architectural compression changes the network itself. Depthwise separable convolutions, used in models such as MobileNet [30] and Xception [31], split a standard convolution into depthwise and pointwise stages. If a standard convolution has kernel size $k \times k$, $C_{in}$ input channels and $C_{out}$ output channels, then its parameter count is

$$k^{2} C_{in} C_{out}. \tag{1}$$

In contrast, a depthwise separable convolution has

$$k^{2} C_{in} + C_{in} C_{out} \tag{2}$$

parameters. This is a substantial reduction when $C_{out}$ is large and $k > 1$. However, the reduction does not come without cost, because separating spatial and channel mixing can reduce expressive capacity. The practical question is therefore an empirical one, asking how much accuracy is thereby lost, and how much efficiency is gained, on a space-object detection task.
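As a quick check of Eqs. (1) and (2), the sketch below counts weights, ignoring biases, for an illustrative 3×3 layer with 256 input and 512 output channels. These layer sizes are hypothetical and are not taken from YOLO-DWSC. Dividing Eq. (2) by Eq. (1) gives $1/C_{out} + 1/k^{2}$, so for 3×3 kernels the reduction approaches a factor of nine as $C_{out}$ grows.

```python
def standard_conv_params(k, c_in, c_out):
    """Eq. (1): weights of a k x k convolution, bias terms ignored."""
    return k * k * c_in * c_out


def dwsc_params(k, c_in, c_out):
    """Eq. (2): depthwise (k*k per input channel) plus pointwise (1x1) weights, biases ignored."""
    return k * k * c_in + c_in * c_out


def reduction_ratio(k, c_in, c_out):
    """Ratio of Eq. (2) to Eq. (1); algebraically equal to 1/c_out + 1/k**2."""
    return dwsc_params(k, c_in, c_out) / standard_conv_params(k, c_in, c_out)
```

For this example, the standard layer has 1,179,648 weights and the depthwise separable layer has 133,376, a ratio of about 0.113.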

3. Materials and Methods

3.1. Experimental Scope and Choice of Baseline

In this work we adopt YOLOv3 as the baseline detector. Although it is unlikely that this model is capable of delivering the absolute state-of-the-art performance, this choice is nevertheless particularly well suited to the aims we pursue in the present work. Specifically, unlike the actual state-of-the-art contenders, which are recent and perpetually subject to small alterations, YOLOv3 is a well-known and well-understood reference model, while also sufficiently performant for the study of compression. The model is large enough for quantisation and pruning effects to be measurable, it has a readily available mature implementation crucial for reproducibility, and it facilitates comparison between full-precision, quantised, and pruned versions without a need to change the detector family.

Our key contribution is therefore a task-specific case study of how a particular kind of architectural alteration behaves in a YOLOv3-tiny-style detector for SPARK-like space-object imagery when considered alongside post-training quantisation, pruning, and mission-oriented evaluation concerns. This scope also explains the absence of comparisons with YOLOv5, YOLOv8, YOLOv10 or RT-DETR, which are discussed to the extent that this is useful for placing our work in context and for indicating how a conceptually adjacent, state-of-the-art benchmark study would have to be designed.

3.2. Dataset

For our experiments we use the SPARK 2022 spacecraft detection dataset [6], which contains RGB images of spacecraft and space debris at $1024 \times 1024$ resolution. The detection stream contains 11 classes, namely 10 classes of uncooperative spacecraft and one debris class. The task is to localise and classify the object in each image by estimating its bounding box and label.

The original challenge structure included a test set for which labels were not available for the present study. Therefore, the labelled data available for the experiments were re-split into training, validation, and test subsets. The split used 58,000 training images, 15,000 validation images, and 15,000 test images. To maintain class balance, images and labels were grouped by class and sampled randomly for each subset. This avoided placing an entire contiguous block of images into only one split, which could have introduced unintended correlations if image filenames or ordering were related to scene conditions such as Earth background, illumination, or camera pose.
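A minimal sketch of a class-balanced random split of this kind is shown below. It assumes each image carries a single class label, as in the SPARK 2022 detection stream used here, and it uses hypothetical names. The per-class rounding rule and seed handling are our own choices, not details reported for the original split.

```python
import random
from collections import defaultdict


def class_balanced_split(samples, fractions=(58, 15, 15), seed=0):
    """Split (image_id, class_id) pairs into train/val/test, sampling randomly within each class.

    `fractions` are relative weights; the default mirrors the 58,000/15,000/15,000
    proportions reported in the text. Images left over after rounding go to train.
    """
    by_class = defaultdict(list)
    for image_id, class_id in samples:
        by_class[class_id].append(image_id)

    rng = random.Random(seed)
    total = sum(fractions)
    train, val, test = [], [], []
    for class_id in sorted(by_class):
        ids = by_class[class_id][:]
        rng.shuffle(ids)  # random order within the class, not contiguous filename blocks
        n_val = round(len(ids) * fractions[1] / total)
        n_test = round(len(ids) * fractions[2] / total)
        val.extend(ids[:n_val])
        test.extend(ids[n_val:n_val + n_test])
        train.extend(ids[n_val + n_test:])
    return train, val, test
```

Like the split described in the text, this sketch does not group near-duplicate images. It therefore cannot rule out leakage between visually similar scenes.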

It is important to stress that the re-split we performed is appropriate for the specific goal of the present paper but must not be read as directly comparable with benchmark values reported in the literature using the original split. The purpose of our study is not to establish state-of-the-art SPARK performance but to compare compressed variants under the same train-test conditions, and all reported models are evaluated on the same held-out test set; the resulting internal comparisons are therefore valid for what we aim to establish. We discuss this in more detail next.

3.3. Dataset Split, Comparability and Leakage Risk

The use of a re-split rather than the original SPARK 2022 challenge test protocol is an important methodological departure from previous work and warrants brief discussion. As the original test labels were not available, the labelled data were reorganised into training, validation and test subsets. This decision makes the reported results suitable for internal comparison between the models evaluated here, but it limits direct comparison with original-split SPARK 2022 results. For this reason, the numerical results in this paper should not be read as competing with state-of-the-art SPARK results but rather as an analysis of relative behaviour under a common re-split. In short, the experimental results we report reveal how accuracy, size, and speed change when the same baseline detector is quantised, pruned or replaced by the YOLO-DWSC variant; see Table 2.

We followed standard practice in constructing the split. In particular, the split was class-balanced with random selection within each class, reducing the risk that one class is over-represented in the training set or under-represented in the test set, and avoiding the use of contiguous filename blocks that might correlate with similar rendering or background conditions. However, synthetic datasets can contain near-duplicate scenes, similar target poses or rendering conditions that are not captured by class labels alone. Thus, we note that without additional grouping metadata, such as trajectory identity, rendering seed or sequence membership, information leakage through visually similar images cannot be ruled out with certainty.

3.4. Label and Prediction Quality Control

Because our experiments rely on bounding-box localisation rather than image-level classification, we purposefully did not separate label formatting from the experimental protocol. The SPARK labels were read in YOLO format and converted into corner coordinates for IoU and mAP computation. During implementation, prediction overlays were generated by drawing both predicted and ground-truth boxes on the original images. This visual check exposed a class-specific coordinate-ordering error in the prepared labels, where some bounding boxes were visually inconsistent with the target object despite the numerical pipeline appearing to run correctly. The affected labels were corrected before the final evaluation.

This finding highlights the importance of quality control, because localisation metrics alone cannot catch errors such as inconsistent coordinate conventions. A model may thus appear to perform poorly on particular classes even when the underlying detector is not the source of the error. Conversely, a preprocessing mistake can create an artificial performance boost if the same error is propagated through prediction and evaluation. For this reason, our final pipeline used both numerical checks and image-level overlays before reporting mAP, speed, and model-size comparisons. Figure 1 illustrates the type of visual inspection used to verify the relationship between predicted and target boxes.
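Numerical checks of the kind mentioned above can be as simple as the following sketch, which validates one YOLO-format label. The function name and the tolerance are hypothetical. A label whose x and y values have been interchanged can still pass every check here, which is why image overlays remain necessary.

```python
def check_yolo_label(label, num_classes=11, tol=1e-6):
    """Return a list of problems found in one YOLO-format label (class_id, xc, yc, w, h)."""
    class_id, xc, yc, w, h = label
    problems = []
    if not (isinstance(class_id, int) and 0 <= class_id < num_classes):
        problems.append("class id out of range")
    if not all(-tol <= v <= 1.0 + tol for v in (xc, yc, w, h)):
        problems.append("value outside normalised [0, 1] range")
    if w <= 0.0 or h <= 0.0:
        problems.append("non-positive width or height")
    # Corner form as in the YOLO-to-corner conversion; the box should lie inside the image.
    x_min, x_max = xc - w / 2, xc + w / 2
    y_min, y_max = yc - h / 2, yc + h / 2
    if x_min < -tol or y_min < -tol or x_max > 1.0 + tol or y_max > 1.0 + tol:
        problems.append("box extends beyond image bounds")
    return problems
```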

3.5. Baseline Detector

Our primary baseline is YOLOv3 [5], initialised from weights trained on COCO [18] and fine-tuned on the SPARK 2022 training split. YOLOv3 was chosen because it is a well-understood one-stage detector and because its architecture is sufficiently large to make compression meaningful. The model contains 65,252,682 parameters and, in its full-precision form, occupies 405 MB in the evaluation environment.

Fine-tuning was performed for 50 epochs using a learning rate of 0.01 and momentum of 0.937. A patience value of 5 was used for early stopping, but the stopping criterion was not triggered, so training ran for the full 50 epochs. Training used an NVIDIA RTX 3060 GPU with 12 GB of VRAM. The batch size was selected automatically by the Ultralytics training framework [21]. Automatic mixed precision was used during training to accelerate computation by performing some operations in 16-bit floating point. Training took approximately six days.

After fine-tuning, the full-precision PyTorch model was exported to ONNX format. This was done because ONNX Runtime provides convenient support for post-training dynamic and static quantisation [32]. CUDA execution was used where possible for GPU inference.
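The fine-tuning and export steps above can be sketched with the Ultralytics Python API as follows. The weight and dataset file names are hypothetical placeholders. Setting `optimizer="SGD"` is our assumption: in recent Ultralytics versions, the default `optimizer="auto"` may choose the optimiser itself and ignore `lr0` and `momentum`. This is an illustration, not the authors' training script.

```python
# Hypothetical file names: neither the checkpoint nor the dataset YAML is named in the text.
WEIGHTS = "coco_pretrained_yolov3.pt"
DATA_YAML = "spark2022_resplit.yaml"

# Settings reported in the text, expressed as Ultralytics training arguments.
TRAIN_ARGS = {
    "epochs": 50,
    "lr0": 0.01,          # initial learning rate
    "momentum": 0.937,
    "patience": 5,        # early-stopping patience
    "batch": -1,          # -1 lets Ultralytics choose the batch size automatically
    "amp": True,          # automatic mixed precision
    "optimizer": "SGD",   # assumption: prevents "auto" from overriding lr0 and momentum
}


def fine_tune_and_export(weights=WEIGHTS, data=DATA_YAML):
    """Fine-tune a COCO-initialised YOLOv3 checkpoint, then export it to ONNX."""
    from ultralytics import YOLO  # imported here so the settings above load without it

    model = YOLO(weights)
    model.train(data=data, **TRAIN_ARGS)
    return model.export(format="onnx")  # path of the exported .onnx file
```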

3.6. YOLO-DWSC Architecture

YOLO-DWSC is a lightweight architectural variant based on YOLOv3-tiny that retains the detection head but replaces the backbone convolutions with depthwise separable convolutional layers; see Figure 2 and Table 3. The intention of this alteration is to reduce the number of stored weights and the computational complexity of the convolutional backbone, while keeping the overall detector structure close to YOLOv3-tiny.

We thus adopt YOLO-DWSC as a meaningful architectural compression variant that facilitates a systematic analysis of the effects of making the YOLOv3-tiny-style backbone substantially lighter through depthwise separable convolutions. Specifically, our comparison is focused on the resulting size, speed and accuracy trade-off, together with quantisation and pruning, and is not premised on YOLO-DWSC being superior to YOLOv3-tiny or to other lightweight detectors.

A standard convolution applies filters over all input channels simultaneously. In contrast, a depthwise separable convolution first applies a depthwise convolution, in which each input channel is filtered separately, and then applies a pointwise $1 \times 1$ convolution across channels. The pointwise stage reintroduces channel mixing, while the depthwise stage substantially reduces the number of spatial convolution operations. This structure has been widely used in mobile and embedded CNNs [30,31].
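The two stages can be written out directly. The pure-Python sketch below uses stride 1, no padding and no bias, and its names are hypothetical. It is meant for clarity rather than speed. Deep learning frameworks typically implement the depthwise stage as a grouped convolution with one group per input channel, for example `torch.nn.Conv2d(c_in, c_in, k, groups=c_in)` followed by a 1×1 `Conv2d`.

```python
def depthwise_separable_conv(x, dw_kernels, pw_weights):
    """Depthwise separable convolution (stride 1, no padding, no bias), pure Python.

    x:          C_in x H x W nested lists (input feature map)
    dw_kernels: C_in x k x k (one spatial filter per input channel)
    pw_weights: C_out x C_in (1 x 1 pointwise weights that mix channels)
    Returns a C_out x (H - k + 1) x (W - k + 1) output.
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    k = len(dw_kernels[0])
    out_h, out_w = h - k + 1, w - k + 1

    # Depthwise stage: each channel is filtered independently (no sums across channels).
    depthwise = [
        [
            [
                sum(dw_kernels[c][u][v] * x[c][i + u][j + v] for u in range(k) for v in range(k))
                for j in range(out_w)
            ]
            for i in range(out_h)
        ]
        for c in range(c_in)
    ]

    # Pointwise stage: a 1 x 1 convolution, i.e. a weighted sum over channels at each pixel.
    return [
        [
            [sum(pw_weights[o][c] * depthwise[c][i][j] for c in range(c_in)) for j in range(out_w)]
            for i in range(out_h)
        ]
        for o in range(len(pw_weights))
    ]
```

The depthwise kernels hold $k^{2} C_{in}$ weights and the pointwise matrix holds $C_{in} C_{out}$ weights, matching the parameter count of Eq. (2).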

3.7. Quantisation

Static post-training quantisation was applied to the fine-tuned YOLOv3 model. In static quantisation, a representative calibration set is passed through the model to estimate quantisation parameters before inference. The calibration set used 80 images per class, chosen to provide a range of image types while keeping the calibration stage tractable. Quantisation was applied per channel where supported, as preliminary testing showed that per-channel quantisation substantially improved performance relative to coarser scaling.

Two static quantisation settings were evaluated for YOLOv3: Int8 weights with Int8 activations, and Int4 weights with Int8 activations. The object detection head was excluded from quantisation where possible, because preliminary testing indicated that the detection head was particularly sensitive to precision reduction. The most promising quantisation setting, Int8 weights and Int8 activations, was also applied to YOLO-DWSC.
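The sketch below illustrates why per-channel scaling can help. It uses symmetric max-abs scaling of weights stored as one list per output channel. The scaling rule, names and example values are illustrative assumptions rather than the ONNX Runtime configuration used in the experiments, and only weights are quantised here.

```python
def symmetric_scale(values, num_bits):
    """Symmetric scale mapping max|value| to the largest positive code (127 for Int8, 7 for Int4)."""
    qmax = 2 ** (num_bits - 1) - 1
    peak = max(abs(v) for v in values)
    return peak / qmax if peak > 0 else 1.0


def fake_quantise(values, scale, num_bits):
    """Quantise to signed integers in [-qmax, qmax], then dequantise to measure error."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def weight_quantisation_error(weights, num_bits, per_channel):
    """Mean absolute error after quantising weights given as one list per output channel."""
    if per_channel:
        scales = [symmetric_scale(ch, num_bits) for ch in weights]
    else:
        shared = symmetric_scale([v for ch in weights for v in ch], num_bits)
        scales = [shared] * len(weights)
    errors = [
        abs(v - q)
        for ch, s in zip(weights, scales)
        for v, q in zip(ch, fake_quantise(ch, s, num_bits))
    ]
    return sum(errors) / len(errors)
```

Consider weights `[[0.01, -0.02, 0.015], [1.5, -2.0, 0.75]]`. A single per-tensor Int4 scale rounds the whole first channel to zero, whereas per-channel scales preserve it. In either mode, Int4 error exceeds Int8 error for these values.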

3.8. Pruning

Pruning was implemented in PyTorch 2.5.1. Three pruning strategies were evaluated, namely unstructured L1 pruning, random unstructured pruning and structured L2 pruning. Pruning was applied to convolutional and linear layers. Sparsity levels of 0.1 and 0.3 were tested. These values were chosen because preliminary results showed severe performance degradation at or before this range for several pruning methods. Structured L2 pruning was applied across dimension 1, which for convolutional layers prunes entire input channels.
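The three strategies can be sketched without PyTorch as follows, for a convolution weight stored as `weight[out_channel][in_channel]` holding a flattened k×k kernel. The sketch follows our understanding of the semantics of `torch.nn.utils.prune.l1_unstructured`, `random_unstructured` and `ln_structured(..., n=2, dim=1)`, including rounding the number of pruned elements. It is an illustration, not the code used in the experiments. Like PyTorch's pruning utilities, it only zeroes values and leaves tensor shapes unchanged, so on its own it yields no speed-up.

```python
import math
import random


def _positions(weight):
    """All (out_channel, in_channel, kernel_index) positions of a weight tensor."""
    return [(o, c, i) for o, f in enumerate(weight) for c, ch in enumerate(f) for i in range(len(ch))]


def _copy(weight):
    return [[list(ch) for ch in f] for f in weight]


def l1_unstructured(weight, amount):
    """Zero the fraction `amount` of individual weights with the smallest absolute value."""
    pruned = _copy(weight)
    ranked = sorted(_positions(weight), key=lambda p: abs(weight[p[0]][p[1]][p[2]]))
    for o, c, i in ranked[: round(amount * len(ranked))]:
        pruned[o][c][i] = 0.0
    return pruned


def random_unstructured(weight, amount, seed=0):
    """Zero a random fraction `amount` of individual weights."""
    pruned = _copy(weight)
    positions = _positions(weight)
    for o, c, i in random.Random(seed).sample(positions, round(amount * len(positions))):
        pruned[o][c][i] = 0.0
    return pruned


def l2_structured_dim1(weight, amount):
    """Zero whole input channels (dim 1) with the smallest L2 norm across all filters."""
    c_in = len(weight[0])
    norms = [math.sqrt(sum(v * v for f in weight for v in f[c])) for c in range(c_in)]
    drop = set(sorted(range(c_in), key=lambda c: norms[c])[: round(amount * c_in)])
    return [[[0.0] * len(ch) if c in drop else list(ch) for c, ch in enumerate(f)] for f in weight]


def sparsity(weight):
    """Fraction of weights that are exactly zero."""
    values = [v for f in weight for ch in f for v in ch]
    return sum(v == 0.0 for v in values) / len(values)
```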

A limitation of this experiment is that the object detection head was not excluded from pruning. This matters because the detection head directly parameterises class and bounding-box predictions, and pruning it can cause sharp class imbalance or localisation failure. Our pruning experiment should therefore be read as a sensitivity test rather than as a state-of-the-art, optimised pruning solution.

3.9. Prediction and Evaluation Pipeline

The evaluation pipeline separates prediction from metric computation. During prediction, each model is run over the test images and its highest-confidence prediction for each image is stored with the corresponding label. The SPARK 2022 stream used in this work contains a single target object per image, and the models typically produced either one prediction or no prediction. Storing predictions in a file made it possible to compute multiple evaluation metrics over the same predictions without repeating inference.
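A minimal sketch of this store-once, evaluate-many design is shown below, using JSON Lines files. The file format, field names and function names are hypothetical, since the text does not specify how predictions were stored. Images without a prediction are kept, so that they still count as missed detections when metrics are computed.

```python
import json


def top_prediction(detections):
    """Highest-confidence detection for one image, or None if nothing was predicted."""
    return max(detections, key=lambda d: d["confidence"], default=None)


def write_predictions(path, results):
    """Write one JSON object per test image.

    results: iterable of (image_id, label, detections), where label is a dict with
    "class_id" and "box", and detections is a (possibly empty) list of dicts with
    "class_id", "confidence" and "box". Boxes are (x_min, y_min, x_max, y_max).
    """
    with open(path, "w", encoding="utf-8") as handle:
        for image_id, label, detections in results:
            record = {"image_id": image_id, "label": label, "prediction": top_prediction(detections)}
            handle.write(json.dumps(record) + "\n")


def read_predictions(path):
    """Load stored records so several metrics can be computed from the same predictions."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

JSON stores tuples as lists, so boxes are read back as lists.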

Labels are stored in YOLO format as normalised centre coordinates, width and height:

$$(x_{c}, y_{c}, w, h). \tag{3}$$

These values are converted into corner coordinates:

$$x_{\min} = x_{c} - \frac{w}{2}, \qquad y_{\min} = y_{c} - \frac{h}{2}, \tag{4}$$

$$x_{\max} = x_{c} + \frac{w}{2}, \qquad y_{\max} = y_{c} + \frac{h}{2}. \tag{5}$$

The same coordinate form is used for predicted bounding boxes and ground-truth boxes.
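Eqs. (4) and (5) translate directly into code. In the sketch below, the function name is hypothetical. Passing the image size (for example 1024 for SPARK 2022) yields pixel coordinates, and applying the same function to labels and predictions keeps both in one convention.

```python
def yolo_to_corners(xc, yc, w, h, image_width=1.0, image_height=1.0):
    """Eqs. (4)-(5): convert a normalised YOLO box to (x_min, y_min, x_max, y_max).

    With the default image size of 1.0 the corners stay normalised; pass the image
    size (e.g. 1024) to obtain pixel coordinates instead.
    """
    x_min = (xc - w / 2) * image_width
    y_min = (yc - h / 2) * image_height
    x_max = (xc + w / 2) * image_width
    y_max = (yc + h / 2) * image_height
    return x_min, y_min, x_max, y_max
```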

The main evaluation metric we adopted is mean average precision. A prediction is considered correct for a class at a given threshold when the predicted class matches the ground-truth class and the intersection over union (IoU) between predicted and target boxes is at least the threshold:

IoU (B_{p}, B_{g}) = \frac{| B_{p} \cap B_{g} |}{| B_{p} \cup B_{g} |} .

(6)
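For axis-aligned boxes in corner form, Equation (6) reduces to clipping the overlap rectangle at zero. A minimal sketch with hypothetical helper names:

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def box_area(b: Box) -> float:
    # Negative extents (no overlap) are clipped to an empty box.
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iou(bp: Box, bg: Box) -> float:
    """Equation (6): |Bp ∩ Bg| / |Bp ∪ Bg| for axis-aligned corner boxes."""
    inter = box_area((max(bp[0], bg[0]), max(bp[1], bg[1]),
                      min(bp[2], bg[2]), min(bp[3], bg[3])))
    union = box_area(bp) + box_area(bg) - inter
    return inter / union if union > 0 else 0.0
```

For example, two 2 × 2 boxes offset by one unit in each direction, `iou((0, 0, 2, 2), (1, 1, 3, 3))`, overlap in a unit square and give 1/7, well below the 0.50 threshold even though they visibly overlap.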

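Precision and recall are defined in terms of the numbers of true positives (TP), false positives (FP) and false negatives (FN) as

\mathrm{Precision} = \frac{TP}{TP + FP}, \quad \mathrm{Recall} = \frac{TP}{TP + FN} .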

(7)

Predictions are sorted in descending order of confidence, and precision and recall are accumulated along this ranking. Average precision is computed from the resulting precision-recall curve for each class, and mAP is computed by averaging over classes. We report results using mAP₅₀ and mAP_50:95, where mAP_50:95 averages mAP over the ten IoU thresholds from 0.50 to 0.95 in increments of 0.05.
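The ranking-and-accumulation procedure can be sketched for the single-prediction-per-image setting used here. This is an illustration under assumptions, not the evaluator behind the reported numbers: it uses all-point interpolation of the precision envelope (as in PASCAL VOC from 2010 onward), whereas COCO-style tools use 101-point interpolation, so values can differ slightly. Each record is a hypothetical `(gt_class, gt_box, pred)` tuple, with `pred` either `None` or `(class_id, confidence, box)`.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]           # (x_min, y_min, x_max, y_max)
Pred = Optional[Tuple[int, float, Box]]           # (class_id, confidence, box) or None
Record = Tuple[int, Box, Pred]                    # (gt_class, gt_box, top prediction)


def iou(a: Box, b: Box) -> float:
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(records: List[Record], cls: int, thr: float) -> float:
    n_gt = sum(1 for g, _, _ in records if g == cls)
    if n_gt == 0:
        return 0.0
    # One prediction per image, so each detection is a TP or FP without duplicate matching.
    dets = [(p[1], g == cls and iou(p[2], b) >= thr)
            for g, b, p in records if p is not None and p[0] == cls]
    dets.sort(key=lambda d: d[0], reverse=True)   # descending confidence
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in dets:
        tp, fp = tp + int(is_tp), fp + int(not is_tp)
        recalls.append(tp / n_gt)
        precisions.append(tp / (tp + fp))
    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p                    # area under the step-wise envelope
        prev_r = r
    return ap


def mean_ap(records: List[Record], thr: float) -> float:
    classes = sorted({g for g, _, _ in records})  # classes that have ground truth
    return sum(average_precision(records, c, thr) for c in classes) / len(classes)


def map_50_95(records: List[Record]) -> float:
    thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]  # 0.50, 0.55, ..., 0.95
    return sum(mean_ap(records, t) for t in thresholds) / len(thresholds)
```

Rounding the thresholds avoids floating-point drift such as 0.9500000000000001, which would otherwise matter only for IoU values lying exactly on a threshold.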

Figure 1 also illustrates why a single IoU threshold can be misleading in a mission context. In particular, a prediction may overlap the target substantially while still failing to capture all of the object geometry needed by a later pose estimation or control stage.

3.10. Two-Pass Region-of-Interest Refinement

A two-pass refinement method was evaluated as a post-training optimisation. The method first runs the detector on the full image. Then, the highest-confidence bounding box is expanded by a fixed scale factor to define a region of interest (ROI). The ROI is cropped from the original image, adjusted to be square to preserve aspect ratio, resized to 1024 × 1024, and passed through the detector again. If the second pass produces no prediction, the original prediction is retained; see Figure 3.

Let the first-pass bounding box be (x_{1}, y_{1}, x_{2}, y_{2}) in the original image, with width w = x_{2} - x_{1} and height h = y_{2} - y_{1}. With scale factor s = 1.2, the expanded coordinates are

x_{1}^{R} = \max (0, x_{1} - (s - 1) w / 2),

(8)

y_{1}^{R} = \max (0, y_{1} - (s - 1) h / 2),

(9)

x_{2}^{R} = \min (1024, x_{2} + (s - 1) w / 2),

(10)

y_{2}^{R} = \min (1024, y_{2} + (s - 1) h / 2) .

(11)
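Equations (8) to (11) translate directly into code. A minimal sketch, assuming pixel coordinates in an image whose width and height equal the clamp value of 1024 (exposed here as a hypothetical `img_size` parameter):

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def expand_box(box: Box, s: float = 1.2, img_size: float = 1024.0) -> Box:
    """Grow the box by factor s about its centre, clamped to the image (Eqs. 8-11)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    dx, dy = (s - 1) * w / 2, (s - 1) * h / 2   # half of the extra width/height
    return (max(0.0, x1 - dx), max(0.0, y1 - dy),
            min(img_size, x2 + dx), min(img_size, y2 + dy))
```

With s = 1.2 each side moves outwards by 10% of the box extent, so a 100 × 200 box becomes 120 × 240 unless it touches the image border.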

After the ROI has been made square and resized, the second-pass prediction must be mapped back to the original image coordinates. If the square ROI has top-left coordinate (x_{1}^{R}, y_{1}^{R}) and side length l_{R}, and the resized detector input has side length S = 1024, then a second-pass coordinate \hat{x} maps back as

x_{orig} = x_{1}^{R} + \frac{l_{R}}{S} \hat{x},

(12)

and analogously for the vertical coordinate. This mapping is essential because the second-pass coordinates are expressed in the resized ROI coordinate system, not directly in the original image coordinate system.
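The text does not specify how the expanded ROI is made square, so the sketch below assumes one plausible rule: grow the shorter side symmetrically about the ROI centre, then shift the square back inside the image. The mapping follows Equation (12), and the fallback mirrors the rule that the first-pass prediction is retained when the second pass finds nothing. All function names are hypothetical.

```python
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def make_square(roi: Box, img_size: float = 1024.0) -> Tuple[float, float, float]:
    """Return (x0, y0, side) of a square covering `roi` (assumed squaring rule)."""
    x1, y1, x2, y2 = roi
    side = min(max(x2 - x1, y2 - y1), img_size)
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    # Centre the square on the ROI, then clamp so it stays inside the image.
    x0 = min(max(cx - side / 2, 0.0), img_size - side)
    y0 = min(max(cy - side / 2, 0.0), img_size - side)
    return x0, y0, side


def map_back(box_hat: Box, x0: float, y0: float, side: float, S: float = 1024.0) -> Box:
    """Equation (12): x_orig = x0 + (l_R / S) * x_hat, and likewise for y."""
    k = side / S
    return (x0 + k * box_hat[0], y0 + k * box_hat[1],
            x0 + k * box_hat[2], y0 + k * box_hat[3])


def refine(first_box: Box, square: Tuple[float, float, float],
           second_box_hat: Optional[Box]) -> Box:
    """Keep the first-pass box when the second pass returns no prediction."""
    if second_box_hat is None:
        return first_box
    x0, y0, side = square
    return map_back(second_box_hat, x0, y0, side)
```

For a 256-pixel square at (100, 200), a second-pass box covering the whole 1024 × 1024 input maps back to (100, 200, 356, 456), i.e. exactly the crop.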

The example presented in Figure 4 illustrates the value of evaluating the method despite the eventual negative result. In the case shown, ROI refinement appears geometrically sensible and yet, since the target is compact and set against an expansive background, the cropping and resizing step changes scale, context, and effective image resolution.

4. Results

4.1. Quantisation of YOLOv3

Table 4 shows the results for full-precision YOLOv3 and two statically quantised versions. The full-precision model achieves the highest accuracy, with 0.972 mAP₅₀ and 0.884 mAP_50:95. Static Int8 quantisation reduces the model size from 405 MB to 102 MB while reducing mAP₅₀ only from 0.972 to 0.965. The reduction in mAP_50:95 is larger, from 0.884 to 0.823, indicating that stricter localisation thresholds are more sensitive to quantisation. Static Int4 weight quantisation further reduces model size to 52 MB but produces a larger accuracy decrease. See Figure 5.

The trade-off is even more apparent in Table 5. It can be seen that the Int8 model preserves almost all of the coarse detection performance while reducing the model size by roughly three quarters. The loss is more apparent at stricter IoU thresholds, which indicates that low-precision inference affects precise localisation more strongly than it affects target detection. This finding is of crucial importance for on-board use, because a detector used to identify a broad region of interest may still be useful after Int8 quantisation, whereas a detector used to provide final high-precision localisation would require further validation.
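The storage side of this trade-off follows from simple arithmetic: FP32 uses 32 bits per parameter, so storing weights in 8 or 4 bits gives ideal reductions of 4 and 8 times, i.e. about 101 MB and 51 MB from 405 MB. The measured 102 MB and 52 MB are slightly larger, plausibly because of stored scale parameters and any tensors kept at higher precision; this explanation is our assumption. The rounding error that low precision introduces can be illustrated with a simplified sketch of symmetric per-tensor weight quantisation in plain Python. It is not the PyTorch static quantisation flow used in the study, which additionally calibrates activation ranges and may use per-channel weight scales, and the function names are hypothetical.

```python
import random


def quantize_symmetric(weights, bits=8):
    """Symmetric per-tensor quantisation of a list of floats to signed integers."""
    qmax = 2 ** (bits - 1) - 1                               # 127 for Int8, 7 for Int4
    scale = (max(abs(w) for w in weights) / qmax) or 1.0     # guard all-zero input
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    return [v * scale for v in q]


def ideal_size_mb(fp32_mb, bits):
    """Storage if every 32-bit parameter were stored with `bits` bits."""
    return fp32_mb * bits / 32


if __name__ == "__main__":
    random.seed(0)
    w = [random.gauss(0.0, 0.05) for _ in range(10_000)]
    for bits in (8, 4):
        q, s = quantize_symmetric(w, bits)
        err = max(abs(a - b) for a, b in zip(w, dequantize(q, s)))
        print(f"Int{bits}: ideal size {ideal_size_mb(405, bits):.2f} MB, max error {err:.5f}")
```

Each weight is off by at most half a quantisation step, and for the same weight range that step is 127/7 ≈ 18 times larger at Int4 than at Int8, which is consistent with the larger accuracy loss of the Int4 model.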

In Table 6 we summarize the computational results of the present comparison. We stress that while FLOP counts, that is, the nominal number of arithmetic operations, can be useful as architecture-level proxies, they do not by themselves answer the deployment question we consider. In particular, for quantised and pruned models, practical speed depends on a number of factors, such as the exported model representation, the efficiency of the low-level implementation of the relevant operations, memory management, batching, and hardware support for low-precision or sparse operations. Thus, the measured FPS values are informative for the present controlled comparison, but a deployment-oriented study for a spacecraft would further require latency, memory, and energy measurements on the specific target hardware.

The speed results are more complicated than the model-size results. On CPU, the quantised YOLOv3 models were modestly faster than FP32, but on the NVIDIA RTX 3060 GPU the Int8 model was slower than the full-precision model. This counterintuitive result should not be interpreted as an effect of quantisation in general but rather as a property of the specific hardware and software combination tested. Quantisation reduces the memory footprint, but latency improves only when the inference backend maps the quantised operations to efficient low-precision kernels and when quantisation and dequantisation overhead is not dominant. This finding is central to understanding the bearing of the present work on deployment. Our experiments show that quantisation can produce large storage savings while retaining much of the detection performance, but they do not thereby validate the models on flight-relevant processors, radiation-tolerant GPUs, FPGAs or Edge TPU-style accelerators. Therefore, herein we use the phrase “deployment-oriented” to mean that we evaluate quantities relevant to deployment, namely size, CPU/GPU FPS and accuracy degradation, and not that the resulting model has been validated for a specific spacecraft computer.
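Because measured speed depends on the runtime, comparing FP32 and quantised models requires a consistent timing protocol. The sketch below shows one such protocol under stated assumptions (a single input per call, discarded warm-up runs, median latency reported); it is not necessarily how the reported FPS values were obtained, and `measure_fps` and `infer` are hypothetical names.

```python
import statistics
import time
from typing import Callable


def measure_fps(infer: Callable[[], object], warmup: int = 10, iters: int = 100) -> float:
    """Throughput implied by the median latency of a zero-argument inference callable.

    For GPU back ends, `infer` should block until the result is ready (for
    example by calling torch.cuda.synchronize()); otherwise asynchronous
    kernel launches make the timings meaningless.
    """
    for _ in range(warmup):              # let caches, allocators and kernel selection settle
        infer()
    latencies = []
    for _ in range(iters):
        start = time.perf_counter()
        infer()
        latencies.append(time.perf_counter() - start)
    return 1.0 / max(statistics.median(latencies), 1e-12)
```

The median is less sensitive than the mean to occasional scheduler or allocator stalls, which matters when differences between FP32 and Int8 are modest.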

4.2. YOLO-DWSC

Table 7 shows the results for YOLO-DWSC in full precision and Int8 static quantised form. YOLO-DWSC is much smaller and faster than full YOLOv3. The full-precision YOLO-DWSC model is 43 MB and reaches 256.4 FPS on the tested GPU, compared with 405 MB and 50 FPS for full YOLOv3. The cost is lower accuracy: mAP₅₀ decreases from 0.972 for YOLOv3 to 0.849 for YOLO-DWSC, and mAP_50:95 decreases from 0.884 to 0.631. The derived size, speed, and accuracy retention values relative to full-precision YOLOv3 are summarized in Table 8.
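Relative values of the kind summarized in Table 8 can be derived from the figures quoted above. The exact definitions used in Table 8 are not restated here, so simple ratios to full-precision YOLOv3 are assumed in this illustrative sketch.

```python
REPORTED = {
    "YOLOv3 FP32":    {"size_mb": 405, "gpu_fps": 50.0,  "map50": 0.972, "map50_95": 0.884},
    "YOLO-DWSC FP32": {"size_mb": 43,  "gpu_fps": 256.4, "map50": 0.849, "map50_95": 0.631},
}


def relative_to(reference: dict, model: dict) -> dict:
    """Ratio of each quantity to the reference model (size < 1 means smaller)."""
    return {k: model[k] / reference[k] for k in reference}


rel = relative_to(REPORTED["YOLOv3 FP32"], REPORTED["YOLO-DWSC FP32"])
for key, value in rel.items():
    print(f"{key}: {value:.3f}")
```

This gives ratios of about 0.106 for size, 5.13 for GPU speed, 0.873 for mAP₅₀ retention and 0.714 for mAP_50:95 retention, making explicit that the stricter metric loses roughly twice as much as mAP₅₀ in relative terms.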

The YOLO-DWSC results correspond to a different kind of compression from quantisation. Instead of preserving the original YOLOv3 representation at lower precision, YOLO-DWSC alters the representational power of the detector by replacing standard backbone convolutions with depthwise separable convolutions. The full-precision YOLO-DWSC model is only about one tenth of the size of full YOLOv3 and is more than five times faster on the GPU used. The price is a larger drop in localisation quality, especially under the stricter mAP_50:95 criterion. These findings suggest that YOLO-DWSC is better placed as a candidate low-cost acquisition or monitoring detector than as a realistic replacement for full YOLOv3 for final localisation.
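The saving from a depthwise separable convolution can be seen from a per-layer weight count: a standard k × k convolution needs k²·C_in·C_out weights, whereas a depthwise k × k filter per input channel followed by a 1 × 1 pointwise projection needs k²·C_in + C_in·C_out. The sketch below counts weights only (biases and batch normalisation ignored), and the channel widths are illustrative, not the YOLO-DWSC configuration.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k convolution mapping c_in to c_out channels (bias ignored)."""
    return k * k * c_in * c_out


def dwsc_params(k: int, c_in: int, c_out: int) -> int:
    """Depthwise k x k filter per input channel, then a 1 x 1 pointwise projection."""
    return k * k * c_in + c_in * c_out


for c in (64, 256, 512):
    ratio = dwsc_params(3, c, c) / standard_conv_params(3, c, c)
    print(f"C_in = C_out = {c}: DWSC uses {ratio:.3f} of the weights")
```

The ratio equals 1/C_out + 1/k², approaching 1/9 for 3 × 3 kernels. The whole-model comparison with full YOLOv3 also reflects the smaller YOLOv3-tiny base and the retained standard-convolution detection head, so this per-layer ratio should not be read as a prediction of the 43 MB figure.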

The resulting size–accuracy trade-off across the full-precision, quantised, and compressed models is plotted in Figure 6.

The Int8 YOLO-DWSC model reduces size further to 11 MB, but the accuracy drop is more substantial than for YOLOv3 Int8. This suggests that architectural compression and numerical compression interact. YOLO-DWSC already has reduced representational capacity, so additional precision reduction has less redundancy to absorb quantisation error.

4.3. Pruning

Table 9 reports pruning results for YOLOv3. Unstructured L1 pruning at 0.1 sparsity preserves most of the mAP₅₀ performance, reducing it from 0.972 to 0.968, but the mAP_50:95 drop is larger, from 0.884 to 0.854. At 0.3 sparsity, performance collapses. Random unstructured pruning performs very poorly even at 0.1 sparsity, which is expected because random removal does not preserve high-magnitude or high-importance weights. Structured L2 pruning at 0.1 sparsity retains 0.937 mAP₅₀ but drops to 0.808 mAP_50:95; at 0.3 sparsity it fails completely.

The measured GPU FPS differences are small and do not reveal a trend. This is consistent with a known limitation of pruning experiments that measure speed on runtimes without specialised sparse kernels: the model may contain fewer non-zero weights, but the dense tensor operations performed by the runtime remain largely unchanged. In this setting, pruning does not provide an effective speed optimisation. It is, however, informative as a measure of sensitivity. In particular, we find that YOLOv3 can tolerate light magnitude-based pruning but is not robust to unstructured random pruning or more aggressive structured pruning applied in this simple way.

As stressed earlier, our pruning experiments are intended as a sensitivity analysis, not a study of pruning as such. Indeed, a practical pruning protocol would normally prune iteratively, fine-tune after each pruning stage, protect or separately analyse sensitive layers, and evaluate speed on a runtime that exploits sparsity. Our present experiments did not include these steps, and, in particular, pruning was applied to the detection head as well as the backbone, which likely contributed to the rapid performance collapse under the more aggressive settings. The results therefore show that naive pruning is not useful in this setting, not that carefully designed pruning is unsuitable for space-object detection. A comprehensive future study of pruning simpliciter would need to treat backbone-only pruning, detection-head-only pruning, and joint pruning with recovery fine-tuning separately. It would also need to report layer sensitivity, since early convolutional layers, feature aggregation layers and final prediction layers may have very different tolerance to sparsification. Finally, hardware-aware sparse inference should be evaluated separately from accuracy retention, because unstructured sparsity need not improve latency on dense tensor kernels.

4.4. Two-Pass Refinement

Table 10 shows the results of the two-pass method. The method does not improve performance: for the full-precision YOLOv3 model, mAP₅₀ decreases to 0.881 and mAP_50:95 to 0.578. The Int8 model also performs worse than its single-pass Int8 result.

The observed degradation has several likely causes. The method is applied only after training, so the model has not learnt to detect targets in cropped and rescaled images where the object occupies a much larger portion of the frame. The resizing step also introduces interpolation artefacts: because the ROI is upsampled to 1024 × 1024, many pixels in the second-pass input are interpolated rather than observed, which may remove some of the fine structure that the detector uses for classification and localisation. Finally, any error in the first-pass bounding box can be amplified by the crop; in particular, if the expanded first-pass box excludes part of the object, the second pass cannot recover it.

Though negative, these results nevertheless provide useful insight. First, a region refinement method that is geometrically sensible need not be beneficial in practice. Second, if such a method is to be used, it may be better incorporated into training than applied as a post hoc add-on. For example, a learned coarse-to-fine detector could be devised so that the first pass is trained to produce a conservative object-containing ROI and the second pass is trained on the resulting cropped imagery.

5. Discussion

Our results show that lightweight space-object detection is feasible, but also that the trade-offs are more nuanced than model size alone suggests. The most reliable result from a compression perspective concerns static Int8 quantisation of YOLOv3, which reduces the model from 405 MB to 102 MB while preserving mAP₅₀ almost entirely. The larger drop in mAP_50:95 indicates that quantisation affects precise localisation more than coarse detection. For an early-stage region-of-interest detector this may be acceptable, but that may not be the case for a final detector used directly for capture or pose estimation.

5.1. Relation to Current Detectors and Limits of the Comparison

In order to ensure that our results are interpreted properly, we emphasise the need to distinguish two questions. The first is whether YOLOv3 and YOLO-DWSC are competitive with the best currently available detectors. The experiments presented in this paper cannot and do not answer that question. An up-to-date comparison would need to include YOLOv5, YOLOv8, YOLOv10, RT-DETR and recent space-object detectors, all trained and evaluated under the same split and deployment pipeline. The second question is whether specific compression operations applied to a YOLOv3-family detector preserve enough performance to be plausible for lower-cost roles in an on-board perception stack. It is this question that our experiments address.

This distinction is important because the main empirical comparison in our experiments is within a specific family of models and compression variants. The results thus support conclusions about relative compression behaviour, such as the stronger preservation of mAP₅₀ than of mAP_50:95 under Int8 quantisation, and the much larger speed and size gains, together with the accuracy loss, associated with YOLO-DWSC. They do not support general claims about YOLO-DWSC as a lightweight detector for SPARK 2022 or for on-board space debris detection broadly. Put differently, in this work YOLO-DWSC serves as a concrete instrument for an architectural test rather than as a state-of-the-art proposal.

YOLO-DWSC presents a trade-off. It is much smaller and faster than YOLOv3, and its full-precision size of 43 MB makes it a far more realistic choice for constrained hardware. Its mAP₅₀ of 0.849 is still useful for many detection purposes, but the mAP_50:95 value of 0.631 reveals reduced localisation precision. This suggests that replacing standard convolutions with depthwise separable convolutions can produce an efficient detector, but the architecture may need further compensation, such as improved feature fusion, higher-resolution detection heads, better augmentation, or knowledge distillation from a larger model.

The pruning results are weaker. Light unstructured L1 pruning is tolerated, but it does not improve GPU speed in this implementation. Random pruning and aggressive structured pruning damage the model sharply. These findings should not be interpreted as suggesting that pruning is useless for space-object detection. Rather, they only show that simple off-the-shelf pruning, applied without excluding sensitive layers or fine-tuning after pruning, is insufficient by itself. Therefore, future pruning work should treat the detection head carefully, apply iterative pruning with recovery fine-tuning, and evaluate speed only on hardware and runtimes that exploit sparsity.

The two-pass method provides the clearest negative result of our study. It is tempting to assume that cropping around an initial prediction should increase the signal-to-background ratio and improve localisation, but in practice the method degraded performance. This degradation is probably a consequence of the mismatch between the second-pass input distribution and the training data distribution, as the model was trained on full images, not on enlarged crops containing partially interpolated structure. The result therefore supports a wider methodological conclusion: post-training heuristics can fail to improve performance when they present the second-stage detector with inputs that differ systematically from the images used during training, even if the heuristic appears geometrically sensible.

A further limitation of the present work is that the experiments were not run on representative spacecraft hardware. The GPU used in this study is useful for controlled comparison, but it does not provide strong and direct evidence of on-board suitability. The surprising result that Int8 inference was slower than FP32 on GPU is a reminder that compression claims are hardware-dependent: a model that is smaller in memory is not necessarily faster in a particular runtime. Conversely, the same quantised model could be much faster on an accelerator with efficient integer operations. Future work should evaluate these models on embedded GPUs, FPGAs, Edge TPUs or radiation-tolerant processors relevant to space missions.

The evaluation metric used also deserves comment. Mean average precision is a standard metric for object detection comparisons, and it is undoubtedly useful here too. However, active debris removal and in-orbit servicing impose application-specific requirements that mAP does not capture well. For example, in a pose estimation pipeline, a bounding box with high IoU may still be problematic if it excludes a small but important part of a spacecraft, such as the end of a solar array. For downstream pose estimation, full-object containment or the inclusion of specific object parts may be more important than close overlap. A conservative box that includes the whole spacecraft and some background may be preferable to a high-IoU box that cuts off a structurally important part.

A mission-oriented evaluation should therefore include additional metrics. One candidate is containment recall: the proportion of ground-truth boxes fully contained within the predicted box, perhaps with a tolerance margin. Another is asymmetric IoU, where a missing part of the ground-truth object is penalised more strongly than including extra background. A third is downstream task performance: if detection is used to crop inputs for pose estimation, the detector should be evaluated by the pose estimation accuracy it enables. These metrics would better reflect the role of detection in the broader operational pipeline.

5.2. Beyond IoU: Containment-Oriented Evaluation

The standard mAP evaluation used in this paper is appropriate for comparison with object detection literature, but it does not fully capture the requirements of active debris removal. In a conventional detection benchmark, a high-IoU prediction is normally treated as good evidence of localisation quality, but in a servicing or removal pipeline, the more important question may be whether the whole spacecraft has been included in the predicted region. A bounding box that excludes a solar array tip, antenna or other protruding component may still achieve a high IoU if the missed region is small relative to the whole box. The downstream impact could nevertheless be significant if the cropped region is passed to a pose-estimation or control module.

The containment metric proposed below is introduced as an evaluation extension motivated by the present study rather than as an additional result of it. The experiments reported here use standard object detection metrics, model size, and inference speed to characterize the compression behaviour of the tested models. Containment recall is introduced to address a mission-oriented concern that is not well captured by mAP and that should be quantified in future work, using the same predicted and ground-truth boxes as conventional detection evaluation.

A simple containment-oriented metric can be defined using the fraction of the ground-truth object box covered by the prediction:

Coverage (B_{p}, B_{g}) = \frac{| B_{p} \cap B_{g} |}{| B_{g} |},

(13)

where B_{p} is the predicted bounding box and B_{g} is the ground-truth bounding box. A containment recall score at tolerance ϵ can then be defined as

\mathrm{CR}_{ϵ} = \frac{1}{N} \sum_{i = 1}^{N} \mathbb{I} \left[ \frac{| B_{g, i} \setminus B_{p, i} |}{| B_{g, i} |} \leq ϵ \right] ,

(14)

where N is the number of ground-truth boxes (one per test image in the stream used here) and \mathbb{I}[\cdot] is the indicator function. Since |B_{g} \setminus B_{p}| = |B_{g}| - |B_{p} \cap B_{g}|, the quantity inside the indicator equals 1 - \mathrm{Coverage}(B_{p}, B_{g}). For a region-of-interest detector, ϵ could be set close to zero, allowing only a very small missed fraction of the ground-truth box. The design of this metric deliberately places greater weight on under-coverage than on over-coverage: a slightly larger crop may be wasteful and include irrelevant background, but a crop that excludes part of the target removes information that cannot subsequently be recovered. Reporting containment recall alongside mAP would therefore provide a more application-specific picture of detector reliability.
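Equations (13) and (14) can be computed from the same corner-format boxes used for mAP. A minimal sketch follows; treating an image with no prediction as a containment failure is our assumption (it is equivalent to an empty B_p), and the helper names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection_area(a: Box, b: Box) -> float:
    return area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))


def coverage(bp: Box, bg: Box) -> float:
    """Equation (13): fraction of the ground-truth box covered by the prediction."""
    return intersection_area(bp, bg) / area(bg)


def containment_recall(pairs: Sequence[Tuple[Optional[Box], Box]], eps: float = 0.0) -> float:
    """Equation (14); an image with no prediction counts as a containment failure."""
    hits = 0
    for bp, bg in pairs:
        if bp is None:
            continue
        missed = (area(bg) - intersection_area(bp, bg)) / area(bg)  # |Bg \ Bp| / |Bg|
        hits += int(missed <= eps)
    return hits / len(pairs)
```

For a 10 × 10 ground-truth box, a prediction that clips one unit off the bottom edge has IoU 0.9 yet fails CR at ϵ = 0, which is exactly the failure mode that mAP overlooks.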

Another useful quantity is crop expansion cost. If a detector is modified to produce conservative boxes, it may improve containment by including more background. That trade-off can be measured as

Expansion (B_{p}, B_{g}) = \frac{| B_{p} |}{| B_{g} |} .

(15)

What we are trying to capture here is that the kind of detector needed in a pose-estimation pipeline is one that achieves high containment recall with an expansion cost low enough that the downstream pose model gets a useful, target-dominated crop. This idea encapsulates the intuition behind the two-pass refinement experiment and suggests a clearer evaluation protocol for future work.

The most relevant single metric for the detector role considered here is therefore containment recall at a small tolerance, accompanied by expansion cost. Containment recall alone could be maximised trivially by predicting very large boxes, while expansion cost alone would favour tight boxes that may miss target extremities. The pair of quantities captures the operational trade-off more directly by pushing the detector towards including the whole target without producing a crop so large that downstream pose estimation loses effective resolution or becomes dominated by background.
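The pairing can be sketched by reporting CR_ϵ together with an aggregate of Equation (15). Averaging expansion over images that have a prediction is our assumption, not a protocol defined above, and the helper names are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection_area(a: Box, b: Box) -> float:
    return area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))


def expansion(bp: Box, bg: Box) -> float:
    """Equation (15): predicted-box area relative to the ground-truth area."""
    return area(bp) / area(bg)


def containment_summary(pairs: Sequence[Tuple[Optional[Box], Box]],
                        eps: float = 0.0) -> Tuple[float, float]:
    """Return (CR_eps, mean expansion over images that have a prediction)."""
    predicted = [(bp, bg) for bp, bg in pairs if bp is not None]
    contained = sum(
        int((area(bg) - intersection_area(bp, bg)) / area(bg) <= eps) for bp, bg in predicted
    )
    cr = contained / len(pairs)
    exp = mean(expansion(bp, bg) for bp, bg in predicted) if predicted else float("nan")
    return cr, exp
```

A whole-image box around a 100 × 100 target on a 1024 × 1024 frame reaches CR = 1 with an expansion of about 105, while a tight box reaches CR = 1 with expansion 1; only the pair distinguishes the trivial solution from a useful one.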

Another concern that must not be overlooked is that the SPARK 2022 data are synthetic. Although the dataset authors state that “the data have been generated under a realistic space simulation environment, with a large diversity in sensing conditions, including extreme and challenging ones for different orbital scenarios, background noise, low signal-to-noise ratio (SNR), and high image contrast that defines actual space imagery”, real orbital images are still likely to differ from synthetic ones in noise, illumination, sensor response, compression artefacts, and unmodelled physical effects. Robustness tests should include brightness shifts, blur, sensor noise, partial occlusion, Earth-background variation and out-of-distribution target behaviour. This is particularly important in the present context, since a compressed model, having less representational redundancy, may be more sensitive to such perturbations; the concern is especially salient for YOLO-DWSC and low-bit quantised variants.

Lastly, we note that future work should include per-class failure analysis. Since the SPARK 2022 classes differ in object geometry, apparent scale, and visual ambiguity, aggregate mAP may conceal problems in dealing with small structures, unusual aspect ratios, or lower contrast. This is particularly relevant for deployment, as rare but difficult target configurations may be more operationally important than average performance across a balanced test split. Future work should therefore report per-class AP, class-conditioned containment recall, and representative qualitative failures for each compression method.

The main limitations of the present study and their implications are summarized in Table 11.

6. Conclusions

Space debris is a growing threat to the sustainability of space operations, and close-proximity missions such as active debris removal and in-orbit servicing require reliable on-board perception. This paper investigated YOLO-based object detection for space debris detection using the SPARK 2022 dataset, focusing on compression methods relevant to constrained spacecraft hardware.

Full-precision YOLOv3 achieved the strongest detection performance, with 0.972 mAP₅₀ and 0.884 mAP_50:95. Static Int8 quantisation reduced the model size from 405 MB to 102 MB while preserving most of the mAP₅₀ performance, although stricter localisation accuracy degraded more noticeably. The proposed YOLO-DWSC architecture produced a substantially smaller and faster model, reaching 256.4 FPS on the tested GPU at 43 MB, but with reduced detection accuracy. Pruning showed limited benefit in the tested setting, largely because the runtime did not exploit sparsity and because the detection head was not protected from pruning. The two-pass refinement method degraded performance, suggesting that post-training region refinement is unreliable unless the refined inputs are represented in the training distribution.

The main conclusion is that compression methods are promising for on-board space debris detection, but their value depends on the intended role of the detector. If the detector is used for broad target acquisition or region-of-interest proposal, Int8-quantised YOLOv3 and YOLO-DWSC may be viable candidates. If the detector is used for safety-critical localisation or as a precursor to precise pose estimation, further work is needed. That work should include representative hardware evaluation, quantisation-aware training, distillation, better pruning protocols, robustness testing and mission-specific metrics that account for full-object containment.

The overarching message that appears to emerge is that the best practical solution does not lie in the choice of a single compression technique. Rather, the detector architecture, training procedure, evaluation metrics, and inference hardware have to be considered and chosen together. In particular, lightweight backbones should be paired with quantisation-aware training, conservative region-of-interest metrics, and evaluation on representative hardware. The results reported here provide a first picture of this trade-off for YOLO-based space object detection, while also identifying the additional empirical work that would have to be performed before such a detector could be treated as truly deployment-ready.

Author Contributions

Conceptualization, L.K. and O.A.; methodology, L.K. and O.A.; software, L.K.; validation, L.K.; formal analysis, L.K. and O.A.; investigation, L.K.; resources, O.A.; data curation, L.K.; writing—original draft preparation, L.K. and O.A.; writing—review and editing, L.K. and O.A.; visualization, L.K. and O.A.; supervision, O.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable. The study used secondary image datasets and did not involve human participants or animals.

Informed Consent Statement

Not applicable.

Data Availability Statement

The SPARK 2022 dataset is available through Zenodo [6]. Code and trained model files can be made available by the authors upon reasonable request, subject to repository preparation and institutional requirements.

Acknowledgments

The authors thank the School of Computer Science, University of St Andrews, for access to GPU resources used during training and evaluation.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Kessler, D.J.; Cour-Palais, B.G. Collision Frequency of Artificial Satellites: The Creation of a Debris Belt. J. Geophys. Res. Space Phys. 1978, 83, 2637–2646.
2. European Space Agency. ESA Space Environment Report 2025. 2025. Available online: https://www.esa.int/Space_Safety/Space_Debris/ESA_Space_Environment_Report_2025 (accessed on 14 May 2026).
3. European Space Agency. Space Debris by the Numbers. 2026. Available online: https://www.esa.int/Space_Safety/Space_Debris/Space_debris_by_the_numbers (accessed on 14 May 2026).
4. European Space Agency. Active Debris Removal. 2026. Available online: https://www.esa.int/Space_Safety/Space_Debris/Active_debris_removal (accessed on 14 May 2026).
5. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
6. Rathinam, A.; Gaudilliere, V.; Mohamed Ali, M.A.; Ortiz Del Castillo, M.; Pauly, L.; Aouada, D. SPARK 2022 Dataset: Spacecraft Detection and Trajectory Estimation. Zenodo 2022.
7. Zhou, Y.; Zhang, T.; Li, Z.; Qiu, J. Improved Space Object Detection Based on YOLO11. Aerospace 2025, 12, 568.
8. Guo, Y.; Yin, X.; Xiao, Y.; Zhao, Z.; Yang, X.; Dai, C. Enhanced YOLOv8-Based Method for Space Debris Detection Using Cross-Scale Feature Fusion. Discov. Appl. Sci. 2025, 7, 95.
9. Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-Time End-to-End Object Detection. Adv. Neural Inf. Process. Syst. 2024, 37, 107984–108011.
10. Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs Beat YOLOs on Real-Time Object Detection. arXiv 2024, arXiv:2304.08069.
11. Fraunhofer Institute for High Frequency Physics and Radar Techniques FHR. Space Observation Radar TIRA. 2026. Available online: https://www.fhr.fraunhofer.de/en/the-institute/technical-equipment/Space-observation-radar-TIRA.html (accessed on 14 May 2026).
12. Liu, M.; Wang, H.; Wang, H.; Zhao, L.; Peng, Q.; Zhang, S.; Chen, W. Space Debris Detection and Positioning Technology Based on Multiple Star Trackers. Appl. Sci. 2022, 12, 3593.
13. Lotti, A.; Modenini, D.; Tortora, P.; Saponara, M.; Perino, M.A. Deep Learning for Real Time Satellite Pose Estimation on Low Power Edge TPU. arXiv 2022, arXiv:2204.03296.
14. Pauly, L.; Rharbaoui, W.; Shneider, C.; Rathinam, A.; Gaudilliere, V.; Aouada, D. A Survey on Deep Learning-Based Monocular Spacecraft Pose Estimation: Current State, Limitations and Prospects. Acta Astronaut. 2023, 212, 339–360.
15. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
16. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the Computer Vision—ECCV 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37.
17. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
18. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the Computer Vision—ECCV 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 740–755.
19. AlDahoul, N.; Karim, H.A.; De Castro, A.; Tan, M.J.T. Localization and Classification of Space Objects Using EfficientDet Detector for Space Situational Awareness. Sci. Rep. 2022, 12, 21896.
20. Li, Y.; Huo, J.; Ma, P.; Jiang, R. Target Localization Method of Non-Cooperative Spacecraft on On-Orbit Service. Chin. J. Aeronaut. 2022, 35, 336–348.
21. Jocher, G.; Chaurasia, A.; Qiu, J. Ultralytics YOLO, Version 8.0.0; Computer Software. 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 28 June 2026).
22. Pham, D.S.; Arandjelović, O.; Venkatesh, S. Detection of dynamic background due to swaying movements from motion features. IEEE Trans. Image Process. 2014, 24, 332–344.
23. Arandjelović, O.; Pham, D.S.; Venkatesh, S. CCTV scene perspective distortion estimation from low-level motion features. IEEE Trans. Circuits Syst. Video Technol. 2015, 26, 939–949.
24. Yu, T.; Chen, C.; Zhou, Y.; Hu, X. Improving Surveillance Object Detection with Adaptive Omni-Attention over both Inter-Frame and Intra-Frame Context. In Proceedings of the Asian Conference on Computer Vision (ACCV), Macao, China, 4–8 December 2022; pp. 2697–2712.
25. Wang, X.; Hu, X.; Chen, C.; Fan, Z.; Peng, S. Illuminating Vehicles with Motion Priors for Surveillance Vehicle Detection. In Proceedings of the 2020 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 25–28 October 2020; pp. 2021–2025.
26. Jacob, B.; Kligys, S.; Chen, B.; Zhu, M.; Tang, M.; Howard, A.; Adam, H.; Kalenichenko, D. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2704–2713.
27. Han, S.; Mao, H.; Dally, W.J. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. arXiv 2016, arXiv:1510.00149.
28. Frankle, J.; Carbin, M. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks. arXiv 2019, arXiv:1803.03635.
29. Hinton, G.; Vinyals, O.; Dean, J. Distilling the Knowledge in a Neural Network. arXiv 2015, arXiv:1503.02531.
30. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861.
31. Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258.
ONNX Runtime Developers. Model Optimizations: Quantization. 2026. Available online: https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html (accessed on 14 May 2026).

Figure 1. Illustrative bounding-box prediction on an image from the SPARK 2022 dataset. The example shows the ground-truth object extent, a detector prediction and the corresponding IoU value, illustrating the localisation criterion used when computing average precision and mAP.
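The IoU value in Figure 1 is the area of overlap between the predicted and ground-truth boxes divided by the area of their union. The minimal Python sketch below computes it, assuming boxes are given as pixel corner coordinates (x1, y1, x2, y2); the function name `box_iou` and the example coordinates are hypothetical and are not taken from the SPARK 2022 annotation format.

```python
def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Hypothetical example: a 100 x 100 ground-truth box and a prediction shifted by (20, 10) pixels.
gt = (100, 100, 200, 200)
pred = (120, 110, 220, 210)
print(box_iou(gt, pred))
```

In this example the shift alone brings the IoU down to 0.5625, only just above the 0.5 threshold used for mAP₅₀.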

Figure 2. Simplified YOLO-DWSC design. The detection head is kept close to YOLOv3-tiny, while the backbone convolutional layers are replaced by depthwise separable convolutions.

Figure 3. Two-pass ROI refinement. The method attempts to use the first prediction to crop a higher-signal region for a second prediction.

Figure 4. Example of the two-pass refinement process on a SPARK 2022 image. The first-pass detector output defines an expanded region of interest, which is cropped, resized and processed again. The example shows both the intuitive appeal of the procedure and the potential source of error, namely that the second-pass image is no longer drawn from the same distribution as the full-frame training images.
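The geometry of the second pass can be sketched as two coordinate transformations: expanding the first-pass box into a region of interest, and mapping a box predicted on the resized crop back into full-frame coordinates. The sketch below assumes a relative margin added on each side and a crop resized directly to a square detector input without letterboxing; the names `expand_box` and `crop_to_frame`, the margin and the input size are illustrative assumptions rather than the settings used in this study.

```python
def expand_box(box, margin, width, height):
    """Grow a first-pass box by a relative margin on each side, clipped to the image."""
    x1, y1, x2, y2 = box
    dx, dy = margin * (x2 - x1), margin * (y2 - y1)
    return (max(0.0, x1 - dx), max(0.0, y1 - dy),
            min(float(width), x2 + dx), min(float(height), y2 + dy))

def crop_to_frame(box_in_crop, roi, input_size):
    """Map a box predicted on the resized crop back to full-frame pixel coordinates.

    Assumes the ROI was resized (without padding) to input_size x input_size.
    """
    rx1, ry1, rx2, ry2 = roi
    sx = (rx2 - rx1) / input_size  # full-frame pixels per detector-input pixel (x)
    sy = (ry2 - ry1) / input_size  # full-frame pixels per detector-input pixel (y)
    bx1, by1, bx2, by2 = box_in_crop
    return (rx1 + bx1 * sx, ry1 + by1 * sy, rx1 + bx2 * sx, ry1 + by2 * sy)

# Hypothetical example: 1024 x 1024 frame, 25% margin, 600-pixel detector input.
roi = expand_box((400, 400, 600, 600), margin=0.25, width=1024, height=1024)
print(roi)
print(crop_to_frame((100, 100, 500, 500), roi, input_size=600))
```

Because the crop is enlarged before the second pass (a 300-pixel region fills a 600-pixel input in this example), the target occupies a larger fraction of the image than in the full-frame training data, which is the distribution shift described in the caption.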

Figure 5. Accuracy effects of YOLOv3 post-training quantisation. The difference between mAP₅₀ and mAP_50:95 shows that tighter localisation is more strongly affected by precision reduction.
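One way to see why mAP_50:95 reacts more strongly than mAP₅₀ is to count the IoU thresholds at which a single matched detection still counts as a true positive. Assuming the conventional COCO-style thresholds from 0.50 to 0.95 in steps of 0.05, the sketch below shows that a box whose IoU falls from 0.82 to 0.70 is unaffected at the 0.50 threshold but passes two fewer of the ten thresholds. It illustrates only this per-match effect; actual AP also depends on confidence ranking and recall.

```python
# COCO-style IoU thresholds 0.50, 0.55, ..., 0.95 (assumed convention for mAP_50:95).
THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

def thresholds_passed(iou):
    """Number of the ten thresholds at which a match with this IoU is a true positive."""
    return sum(iou >= t for t in THRESHOLDS)

for iou in (0.95, 0.82, 0.70, 0.55):
    print(iou, thresholds_passed(iou), "/ 10")
```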

Figure 6. Model size and mAP₅₀ trade-off across full-precision, quantised and architecture-compressed models.

Table 1. Possible roles of object detection in an active debris removal or servicing pipeline. The same detector performance may be acceptable in one role and inadequate in another.

Mission Stage	Detection Role	Main Evaluation Concern
Longer-range approach	Acquire the target and reject background clutter	High recall and stable target presence detection
Intermediate approach	Provide a region of interest for tracking or pose estimation	Conservative target containment and low latency
Close proximity	Support hand-off to pose estimation and control	Precise localisation, robustness and predictable failure modes
Fallback or monitoring mode	Run continuously under power or compute limits	Low memory footprint, low inference cost and graceful degradation

Table 2. Summary of the re-split SPARK 2022 evaluation protocol used in this study.

Aspect	Interpretation
Class balance	Maintained by random class-wise sampling into training, validation and test subsets.
Benchmark comparability	Limited because the original challenge test labels were unavailable and the standard split was not used.
Internal comparison	Valid for comparing the models reported in this paper, since all variants are evaluated on the same held-out test subset.
Leakage risk	Reduced by random sampling within classes, but not eliminated because synthetic near-duplicates or shared rendering conditions may remain.
Empirical goal	Compression case study under a controlled split, not state-of-the-art benchmark performance.
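A class-balanced random re-split of the kind summarised in Table 2 can be sketched as stratified sampling: samples are grouped by class, shuffled with a fixed seed and divided within each class. The 70/15/15 fractions, the seed and the function name below are assumptions for illustration, not the proportions used in this study.

```python
import random
from collections import defaultdict

def class_balanced_split(items, fractions=(0.7, 0.15, 0.15), seed=0):
    """Split (sample_id, class_label) pairs class by class into train/val/test lists.

    The fractions are hypothetical; the test subset receives whatever remains.
    """
    by_class = defaultdict(list)
    for sample_id, label in items:
        by_class[label].append(sample_id)
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, val, test = [], [], []
    for label in sorted(by_class):
        ids = by_class[label]
        rng.shuffle(ids)
        n_train = int(round(fractions[0] * len(ids)))
        n_val = int(round(fractions[1] * len(ids)))
        train.extend(ids[:n_train])
        val.extend(ids[n_train:n_train + n_val])
        test.extend(ids[n_train + n_val:])
    return train, val, test

# Hypothetical usage: 300 images spread evenly over three classes.
items = [(f"img{i}", i % 3) for i in range(300)]
train, val, test = class_balanced_split(items)
print(len(train), len(val), len(test))
```

As the leakage row of Table 2 notes, this preserves class balance but not independence: near-duplicate renders can still fall on both sides of the split. Splitting by groups (for example by a scene identifier, if one is available) rather than by individual images would reduce that risk.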

Table 3. Backbone-level architectural comparison between YOLOv3-tiny and YOLO-DWSC.

Component	YOLOv3-Tiny	YOLO-DWSC
Backbone convolution	Standard convolutional layers	Depthwise separable convolutional layers
Downsampling	Max-pooling layers	Max-pooling layers retained
Detection head	YOLOv3-tiny detection head	YOLOv3-tiny detection head retained
Main design aim	Compact YOLO detector	Further reduction in model footprint and convolutional cost
Expected weakness	Lower accuracy than full YOLOv3	Reduced representational capacity from separable convolutions
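The footprint argument behind Table 3 can be made concrete by counting weights. A standard k × k convolution from C_in to C_out channels has k²·C_in·C_out weights, whereas a depthwise separable replacement has k²·C_in (depthwise) plus C_in·C_out (pointwise), giving a ratio of 1/C_out + 1/k². The sketch below evaluates this for a few channel widths; the widths are illustrative, and bias and batch-normalisation parameters are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias and batch-norm terms ignored)."""
    return k * k * c_in * c_out

def dwsc_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out

# Illustrative channel widths with 3 x 3 kernels.
for c_in, c_out in [(16, 32), (128, 256), (512, 1024)]:
    std = standard_conv_params(c_in, c_out, 3)
    sep = dwsc_params(c_in, c_out, 3)
    print(f"{c_in}->{c_out}: {std} vs {sep} weights ({sep / std:.1%})")
```

Since the same factors multiply every output position, the ratio also approximates the reduction in multiply-accumulate operations. It should not be read as an explanation of the full 405 MB to 43 MB difference between YOLOv3 and YOLO-DWSC, which also reflects the smaller YOLOv3-tiny layout.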

Table 4. Quantised and full-precision YOLOv3 results on the held-out SPARK 2022 test split.

Method	mAP₅₀	mAP_50:95	CPU FPS	GPU FPS	Size (MB)
FP32	0.972	0.884	1.1	50.0	405
Static Int8wInt8a	0.965	0.823	1.4	35.0	102
Static Int4wInt8a	0.904	0.731	1.6	N/A	52
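The effect of bit width on reconstruction error can be illustrated with per-tensor affine quantisation, in which a real value v is stored as an integer q = round(v / s) + z for a scale s and zero point z. The sketch below quantises a synthetic weight tensor to 8 and 4 bits and reports the largest reconstruction error. It is a simplified illustration, assuming min/max range estimation and a single scale per tensor, and it does not reproduce the calibration procedure of any particular toolkit.

```python
import random

def quantise(values, num_bits):
    """Per-tensor affine quantisation of floats to signed num_bits integers."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    # Min/max range, widened if necessary so that 0.0 is exactly representable.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = int(round(qmin - lo / scale))
    q = [max(qmin, min(qmax, int(round(v / scale)) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantise(q, scale, zero_point):
    """Map integers back to approximate real values."""
    return [(qi - zero_point) * scale for qi in q]

def max_error(values, num_bits):
    """Largest absolute error after a quantise/dequantise round trip."""
    q, scale, zero_point = quantise(values, num_bits)
    return max(abs(v - r) for v, r in zip(values, dequantise(q, scale, zero_point)))

# Synthetic stand-in for a layer's weights (not taken from the trained models).
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.05) for _ in range(1000)]
for bits in (8, 4):
    print(f"{bits}-bit max reconstruction error: {max_error(weights, bits):.5f}")
```

With only 16 levels at 4 bits, each quantisation step is 17 times coarser than at 8 bits for the same range, which is consistent with the larger accuracy drop of Int4wInt8a in Table 4, particularly for mAP_50:95.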

Table 5. Derived compression summary for YOLOv3 relative to the full-precision model. Size reduction is computed as one minus the ratio of the compressed size to the FP32 size; accuracy retention is computed as the compressed model’s mAP divided by the corresponding FP32 mAP.

Method	Size Reduction	mAP₅₀ Retention	mAP_50:95 Retention	GPU Speed Relative to FP32
Static Int8wInt8a	74.8%	99.3%	93.1%	0.70×
Static Int4wInt8a	87.2%	93.0%	82.7%	N/A
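The derived quantities in Table 5 follow directly from the values in Table 4. The short sketch below recomputes them; the dictionary layout and the function name `derived_row` are illustrative.

```python
# Values copied from Table 4 (YOLOv3 on the held-out SPARK 2022 test split).
FP32 = {"map50": 0.972, "map50_95": 0.884, "gpu_fps": 50.0, "size_mb": 405}
COMPRESSED = {
    "Static Int8wInt8a": {"map50": 0.965, "map50_95": 0.823, "gpu_fps": 35.0, "size_mb": 102},
    "Static Int4wInt8a": {"map50": 0.904, "map50_95": 0.731, "gpu_fps": None, "size_mb": 52},
}

def derived_row(m, ref=FP32):
    """Size reduction, accuracy retention and GPU speed ratio relative to a reference model."""
    return {
        "size_reduction_pct": 100 * (1 - m["size_mb"] / ref["size_mb"]),
        "map50_retention_pct": 100 * m["map50"] / ref["map50"],
        "map50_95_retention_pct": 100 * m["map50_95"] / ref["map50_95"],
        "gpu_speed_ratio": None if m["gpu_fps"] is None else m["gpu_fps"] / ref["gpu_fps"],
    }

for name, m in COMPRESSED.items():
    r = derived_row(m)
    speed = "N/A" if r["gpu_speed_ratio"] is None else f"{r['gpu_speed_ratio']:.2f}x"
    print(f"{name}: size -{r['size_reduction_pct']:.1f}%, "
          f"mAP50 {r['map50_retention_pct']:.1f}%, "
          f"mAP50:95 {r['map50_95_retention_pct']:.1f}%, GPU {speed}")
```

Passing the YOLO-DWSC rows of Table 7 to `derived_row` with the same FP32 reference reproduces the retention and GPU speed figures of Table 8; the size column there is reported as the compressed size as a percentage of the YOLOv3 FP32 size, that is, 100 minus the size reduction.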

Table 6. Deployment-relevant quantities measured in the present study: model size and CPU/GPU throughput for each model and precision, all obtained under the same software and hardware setup.

Model	Precision	Size (MB)	CPU FPS	GPU FPS
YOLOv3	FP32	405	1.1	50.0
YOLOv3	Static Int8wInt8a	102	1.4	35.0
YOLOv3	Static Int4wInt8a	52	1.6	N/A
YOLO-DWSC	FP32	43	23.3	256.4
YOLO-DWSC	Static Int8wInt8a	11	17.4	104.2
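Throughput figures such as those in Table 6 depend on exactly what is timed. The following sketch shows one common way to measure frames per second with warm-up calls excluded and a single image per call; `infer` and `dummy_infer` are hypothetical stand-ins for an inference callable, and whether pre- and post-processing belong inside it is an assumption each setup must fix explicitly.

```python
import time

def measure_fps(infer, inputs, warmup=5):
    """Average frames per second of `infer` over `inputs`, one input per call.

    The first `warmup` inputs are run once beforehand and not timed, so that
    one-off costs such as lazy initialisation do not distort the rate.
    """
    if not inputs:
        raise ValueError("inputs must be non-empty")
    for x in inputs[:warmup]:
        infer(x)
    start = time.perf_counter()
    for x in inputs:
        infer(x)
    elapsed = time.perf_counter() - start
    return len(inputs) / elapsed

# Hypothetical stand-in for a detector: a fixed amount of CPU work per frame.
def dummy_infer(frame):
    return sum(i * i for i in range(10_000))

print(f"{measure_fps(dummy_infer, list(range(50))):.1f} FPS")
```

For asynchronous GPU runtimes, the device must be synchronised before the timer is read (for example with `torch.cuda.synchronize()` in PyTorch); otherwise the measured rate can reflect kernel launches rather than completed inference.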

Table 7. Quantised and full-precision YOLO-DWSC results on the held-out SPARK 2022 test split.

Method	mAP₅₀	mAP_50:95	CPU FPS	GPU FPS	Size (MB)
FP32	0.849	0.631	23.3	256.4	43
Static Int8wInt8a	0.771	0.556	17.4	104.2	11

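Table 8. Derived comparison of YOLO-DWSC with full-precision YOLOv3, computed from the same held-out test evaluation. Size and GPU speed are expressed relative to YOLOv3 FP32; retention is the YOLO-DWSC mAP divided by the corresponding YOLOv3 FP32 mAP.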

Model	Size Relative to YOLOv3 FP32	GPU Speed Relative to YOLOv3 FP32	mAP₅₀ Retention	mAP_50:95 Retention
YOLO-DWSC FP32	10.6%	5.13×	87.3%	71.4%
YOLO-DWSC Int8wInt8a	2.7%	2.08×	79.3%	62.9%

Table 9. Pruning results for full-precision fine-tuned YOLOv3.

Method	mAP₅₀	mAP_50:95	GPU FPS
Unstructured L1, sparsity 0.1	0.968	0.854	39.7
Unstructured L1, sparsity 0.3	0.483	0.328	40.1
Random unstructured, sparsity 0.1	0.040	0.024	40.3
Random unstructured, sparsity 0.3	0.000	0.000	41.3
Structured L2, sparsity 0.1, dim 1	0.937	0.808	40.8
Structured L2, sparsity 0.3, dim 1	0.000	0.000	40.3
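The pruning variants in Table 9 can be described by the masks they apply to a convolution weight tensor of shape (out, in, kh, kw). The sketch below builds those masks in NumPy. It is intended to mirror the documented behaviour of PyTorch's `torch.nn.utils.prune.l1_unstructured`, `random_unstructured` and `ln_structured`, whose names match the table's labels, but the function names here are hypothetical and the code is not the pipeline used in this study. With `dim=1`, structured pruning removes whole input channels, so at sparsity 0.3 a pruned layer loses roughly 30% of its input feature maps at once, which makes a collapse like the one in Table 9 plausible when no fine-tuning follows.

```python
import numpy as np

def l1_unstructured_mask(w, amount):
    """Keep-mask that zeros the `amount` fraction of individual weights with smallest |w|."""
    k = int(round(amount * w.size))
    mask = np.ones(w.size, dtype=bool)
    mask[np.argsort(np.abs(w), axis=None)[:k]] = False
    return mask.reshape(w.shape)

def random_unstructured_mask(w, amount, rng):
    """Keep-mask that zeros a random `amount` fraction of individual weights."""
    k = int(round(amount * w.size))
    mask = np.ones(w.size, dtype=bool)
    mask[rng.choice(w.size, size=k, replace=False)] = False
    return mask.reshape(w.shape)

def ln_structured_mask(w, amount, n=2, dim=1):
    """Keep-mask that zeros whole slices along `dim` with the smallest L_n norm."""
    n_slices = w.shape[dim]
    k = int(round(amount * n_slices))
    other_axes = tuple(a for a in range(w.ndim) if a != dim)
    norms = (np.abs(w) ** n).sum(axis=other_axes) ** (1.0 / n)  # one norm per slice
    keep = np.ones(n_slices, dtype=bool)
    keep[np.argsort(norms)[:k]] = False
    shape = [1] * w.ndim
    shape[dim] = n_slices
    return np.broadcast_to(keep.reshape(shape), w.shape)

# Hypothetical conv weight (out=64, in=32, 3 x 3); the pruned tensor would be w * mask.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 32, 3, 3))
masks = {
    "unstructured L1, 0.3": l1_unstructured_mask(w, 0.3),
    "random unstructured, 0.3": random_unstructured_mask(w, 0.3, rng),
    "structured L2 (dim 1), 0.3": ln_structured_mask(w, 0.3, n=2, dim=1),
}
for name, mask in masks.items():
    removed = int((~mask).all(axis=(0, 2, 3)).sum())
    print(f"{name}: {1 - mask.mean():.3f} of weights zeroed, "
          f"{removed} of 32 input channels removed")
```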

Table 10. Two-pass ROI refinement results using fine-tuned YOLOv3.

Method	mAP₅₀	mAP_50:95
FP32 two-pass	0.881	0.578
Static Int8wInt8a two-pass	0.843	0.578

Table 11. Main limitations of the present study and their implications.

Limitation	Consequence
No standard SPARK test-label evaluation	Results are internally comparable but not directly leaderboard-comparable.
No modern-detector training runs	The study does not establish competitiveness against YOLOv5/8/10, RT-DETR or recent task-specific detectors.
No YOLOv3-tiny baseline under identical conditions	YOLO-DWSC cannot be interpreted as a full ablation of YOLOv3-tiny.
No representative spacecraft hardware	Speed results are runtime-specific and cannot validate flight deployment.
No optimised pruning pipeline	Pruning results show the weakness of naive pruning, not the limit of hardware-aware iterative pruning.
Containment recall is not computed	The containment metric is a proposed mission-oriented extension, not an empirical result of this study.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Kerr, L.; Arandjelović, O. A Deployment-Oriented Case Study of YOLO-Based Model Compression for On-Board Space Debris Detection. Information 2026, 17, 650. https://doi.org/10.3390/info17070650

AMA Style

Kerr L, Arandjelović O. A Deployment-Oriented Case Study of YOLO-Based Model Compression for On-Board Space Debris Detection. Information. 2026; 17(7):650. https://doi.org/10.3390/info17070650

Chicago/Turabian Style

Kerr, Liam, and Ognjen Arandjelović. 2026. "A Deployment-Oriented Case Study of YOLO-Based Model Compression for On-Board Space Debris Detection" Information 17, no. 7: 650. https://doi.org/10.3390/info17070650

APA Style

Kerr, L., & Arandjelović, O. (2026). A Deployment-Oriented Case Study of YOLO-Based Model Compression for On-Board Space Debris Detection. Information, 17(7), 650. https://doi.org/10.3390/info17070650
