1. Introduction
Space debris is an increasingly serious problem for the long-term sustainability of space operations. Large non-functional spacecraft, spent rocket bodies, mission-related objects and smaller fragments occupy operational regions around the Earth, especially low Earth orbit (LEO). These objects threaten active satellites, crewed platforms, and future exploration missions. The risk is not limited to a single collision event, since a collision can generate many further fragments, raising the probability of subsequent collisions and contributing to the self-reinforcing scenario often associated with the Kessler syndrome [1,2]. Recent ESA reports paint a stark picture of the present debris environment, with serious practical consequences for spacecraft operations. Tens of thousands of objects are tracked, while much larger populations of centimetre- and millimetre-scale debris are inferred statistically rather than routinely catalogued [2,3]. Although the smaller objects are not of primary interest in the present paper, they do motivate the wider problem. Even when the immediate focus is on detecting large known spacecraft or large debris objects during close-proximity operations, the environment in which these operations take place is already congested and dynamic, and the consequences of error are potentially catastrophic.
Current mitigation practices, including end-of-life disposal and collision avoidance, are necessary but may not be sufficient. Active debris removal proposes intercepting selected high-risk objects, such as large defunct spacecraft or rocket bodies, and safely disposing of them [4]. In-orbit servicing and manufacturing similarly require spacecraft to approach, inspect and sometimes manipulate non-cooperative or only partly cooperative targets. These missions depend on close-range perception. A chaser spacecraft must detect the target, estimate its pose, track its motion and, in some cases, support detumbling or capture. A failure during this process could damage the target, the servicing spacecraft, or nearby assets, and could create further debris.
The perception problem is difficult because space hardware is heavily constrained. Spacecraft are limited by mass, power availability, processing capacity, memory and the use of specialised or radiation-tolerant components. A model that performs well on a modern desktop GPU is therefore not automatically suitable for an on-board mission. This important fact is often sidelined in published computer vision benchmarking, with the primary interest excessively focused on accuracy on a standard dataset. For actual space applications, inference latency, model size, numerical precision, hardware support and robustness to operating conditions are issues that cannot be separated from a method’s efficacy. Succinctly put, a detector is useful only insofar as it can be employed within a credible operational pipeline.
This paper investigates the use of convolutional neural network (CNN)-based object detection models for space debris detection and evaluates methods by which they can be compressed to run faster, use less memory and reduce computational demand. The work focuses on YOLOv3 and YOLOv3-tiny [5], using the SPARK 2022 spacecraft detection dataset [6]. We study three compression directions. First, static post-training quantisation is applied to YOLOv3 using 8-bit and 4-bit weight representations. Second, pruning is applied to investigate whether sparsity can reduce the effective network footprint while preserving accuracy. Third, a lightweight architectural variant, YOLO-DWSC, is introduced. YOLO-DWSC modifies the YOLOv3-tiny backbone by replacing standard convolutions with depthwise separable convolutions, while preserving the detection head.
The key contributions of this work are as follows. First, we conduct the first controlled case study of YOLOv3 compression on the SPARK 2022 dataset, reporting and analysing a diverse range of metrics, including accuracy, model size and CPU/GPU speed measurements. Second, we evaluate a lightweight YOLOv3-tiny-derived variant, YOLO-DWSC, as an architectural test of the accuracy-efficiency trade-off produced by replacing standard backbone convolutions with depthwise separable convolutions. Third, we critically analyse a two-pass region-of-interest refinement strategy intended to improve prediction accuracy after an initial detection. Fourth, reflecting on the practices of the published literature, which we also adopted in the experimental part of this paper, we argue that future work in active debris removal contexts should supplement standard mean average precision (mAP) with containment-oriented metrics.
To properly contextualise the present work at the very outset, we emphasise that our aim herein is not to develop the most performant detector for SPARK 2022. Indeed, more recent YOLO variants, transformer-based detectors and task-specific models can achieve higher accuracy on many object detection problems, and recent space-object work has also moved beyond YOLOv3 [7,8,9,10]. Our aim is instead to examine how a widely used detector family behaves under specific architectural simplification and compression, and to discuss the consequent limits of using standard object-detection metrics for an active debris removal perception pipeline. This aim is motivated by the broader context, namely by the fact that on-board adoption is shaped not only by peak mAP, but also by the relationship between accuracy, model footprint, inference path, hardware support and the detector’s role in the mission stack.
Operational Role of Detection in Active Debris Removal
The detection task considered in this paper should be understood as one component within a wider active debris removal or in-orbit servicing pipeline. A monocular detector is unlikely to be the only perception system used during final capture. What is more likely is that object detection would provide the acquisition target, region-of-interest selection, and a reliable input crop for later pose estimation or tracking modules. This realisation is significant in that it sheds light on how the reported results should be interpreted. A detector used for early target acquisition could have looser localisation requirements if it can reliably identify the target and provide a sufficiently narrow region of interest for downstream processing. A detector used immediately before capture, however, would be subject to much stricter localisation and robustness requirements.
This operational framing also contextualises the significance and the role of compression. A smaller and faster model need not replace a larger detector in every part of a mission. Instead, it may be preferable for an on-board system to perform frequent low-cost monitoring, triggering higher-cost perception only when needed, or sending a compact region of interest for more detailed processing on the ground or on board; see Table 1. Thus, understood properly, the evaluation we report in the present paper does not concern only compression as a means of preserving a single model’s behaviour under a lower numerical precision or a smaller footprint, but compression as a way of exploring different possible roles in a perception stack.
5. Discussion
Our results show that lightweight space-object detection is feasible, but also that the trade-offs are more nuanced than model size alone suggests. The most reliable result from a compression perspective concerns static Int8 quantisation of YOLOv3, which reduces the model from 405 MB to 102 MB while preserving mAP50 almost entirely. The larger drop in mAP50:95 indicates that quantisation affects precise localisation more than coarse detection. For an early-stage region-of-interest detector this may be acceptable, but that may not be the case for a final detector used directly for capture or pose estimation.
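To make the size and precision trade-off concrete, the following minimal sketch quantises a few weights to signed integers with a single symmetric scale and maps them back. It is a simplified illustration with hypothetical weight values and function names, not the static post-training pipeline used in this study, which also quantises activations.

```python
def quantise_symmetric(weights, bits=8):
    """Map floats to signed integers using one symmetric scale (illustrative only)."""
    qmax = 2 ** (bits - 1) - 1                      # 127 for 8 bits, 7 for 4 bits
    scale = max(abs(w) for w in weights) / qmax     # largest weight maps to qmax
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantise(q, scale):
    """Recover approximate float weights from their integer codes."""
    return [v * scale for v in q]


def max_error(weights, bits):
    """Largest absolute round-trip error, together with the scale used."""
    q, scale = quantise_symmetric(weights, bits)
    restored = dequantise(q, scale)
    return max(abs(w - r) for w, r in zip(weights, restored)), scale


# Hypothetical weights, chosen only for illustration.
weights = [0.50, -0.25, 0.125, -0.01, 0.003]
err8, scale8 = max_error(weights, bits=8)
err4, scale4 = max_error(weights, bits=4)
print(err8, err4)
```

The round-trip error is bounded by half the scale, and the scale grows as the bit width shrinks. Storing one byte per weight instead of four is also consistent with the roughly fourfold size reduction reported for Int8 (405 MB to 102 MB).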
5.1. Relation to Current Detectors and Limits of the Comparison
In order to ensure that our results are interpreted properly, we emphasise the need to distinguish two questions. The first is whether YOLOv3 and YOLO-DWSC are competitive against the best currently available detectors. The experiments we presented in the present paper cannot and do not answer that question. An up-to-date comparison would need to include YOLOv5, YOLOv8, YOLOv10, RT-DETR and recent space-object detectors, all trained and evaluated under the same split and deployment pipeline. The second question is whether specific compression operations applied to a YOLOv3-family detector preserve enough performance to be plausible for lower-cost roles in an on-board perception stack. It is this question that our experiments address.
The aforementioned distinction is important because the main empirical comparison in our experiments is within a specific family of models and compression variants. The results thus support conclusions about relative compression behaviour, such as the stronger preservation of mAP50 than mAP50:95 under Int8 quantisation, and the much larger speed and size gain but accuracy loss associated with YOLO-DWSC. They do not support general claims about YOLO-DWSC as a lightweight detector for SPARK 2022 or for on-board space debris detection broadly. Put differently, in this work YOLO-DWSC functions as a concrete instrument for an architectural test rather than as a state-of-the-art proposal.
YOLO-DWSC presents a trade-off. It is much smaller and faster than YOLOv3, and its full-precision size of 43 MB makes it a far more realistic choice for constrained hardware. Its mAP50 of 0.849 is still useful for many detection purposes, but the mAP50:95 value of 0.631 reveals reduced localisation precision. This suggests that replacing standard convolutions with depthwise separable convolutions can produce an efficient detector, but the architecture may need further compensation, such as improved feature fusion, higher-resolution detection heads, better augmentation, or knowledge distillation from a larger model.
The pruning results are weaker. Light unstructured L1 pruning is tolerated, but it does not improve GPU speed in this implementation. Random pruning and aggressive structured pruning damage the model sharply. These findings should not be interpreted as suggesting that pruning is useless for space-object detection. Rather, they only show that simple off-the-shelf pruning, applied without excluding sensitive layers or fine-tuning after pruning, is insufficient by itself. Therefore, future pruning work should treat the detection head carefully, apply iterative pruning with recovery fine-tuning, and evaluate speed only on hardware and runtimes that exploit sparsity.
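The difference between magnitude-based and random unstructured pruning can be illustrated on a toy weight vector. This is a minimal pure-Python sketch under illustrative assumptions (hypothetical weights and function names), not the pruning routine used in the experiments: L1 pruning zeroes the smallest-magnitude weights, whereas random pruning may zero large ones.

```python
import random


def l1_unstructured_prune(weights, sparsity):
    """Zero the `sparsity` fraction of weights with the smallest absolute value."""
    n_prune = round(sparsity * len(weights))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


def random_unstructured_prune(weights, sparsity, seed=0):
    """Zero a randomly chosen `sparsity` fraction of weights."""
    n_prune = round(sparsity * len(weights))
    pruned = set(random.Random(seed).sample(range(len(weights)), n_prune))
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


def removed_l1_mass(original, pruned):
    """Fraction of the total absolute weight removed by pruning."""
    total = sum(abs(w) for w in original)
    return sum(abs(o) - abs(p) for o, p in zip(original, pruned)) / total


# Hypothetical weights, chosen only for illustration.
weights = [0.9, -0.02, 0.4, 0.01, -0.7, 0.05, 0.3, -0.03, 0.6, 0.04]
l1 = l1_unstructured_prune(weights, 0.3)
rnd = random_unstructured_prune(weights, 0.3)
print(removed_l1_mass(weights, l1), removed_l1_mass(weights, rnd))
```

Removing the smallest weights discards the least total magnitude for a given sparsity, which is one intuition for why L1 pruning at sparsity 0.1 was tolerated while random pruning at the same sparsity was not.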
The two-pass method provides the clearest negative result of our study. It is tempting to assume that cropping around an initial prediction should increase the signal-to-background ratio and improve localisation, but in practice the method degraded performance. This degradation is probably a consequence of the mismatch between the second-pass input distribution and the training distribution: the model was trained on full images, not on enlarged crops containing partially interpolated structures. The result therefore supports a wider methodological conclusion: post-training heuristics can fail to improve performance when they present the second-stage detector with inputs that differ systematically from the images used during training, even if the heuristic appears geometrically sensible.
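The cropping step of such a two-pass scheme can be sketched as follows. This is a hypothetical illustration, not the code used in the experiments: the 20% margin, the 416-pixel network input, the 1024 × 1024 image size, the function names and the `(x1, y1, x2, y2)` pixel convention are all assumptions.

```python
def expand_and_clamp(box, img_w, img_h, margin=0.2):
    """Grow a first-pass box by `margin` of its size on each side, clamped to the image."""
    x1, y1, x2, y2 = box
    dx, dy = margin * (x2 - x1), margin * (y2 - y1)
    return (max(0, x1 - dx), max(0, y1 - dy), min(img_w, x2 + dx), min(img_h, y2 + dy))


def upscale_factor(roi, net_size=416):
    """Enlargement applied when the ROI's longer side is resized to the network input."""
    x1, y1, x2, y2 = roi
    return net_size / max(x2 - x1, y2 - y1)


# Hypothetical first-pass detection near the left image border.
first_pass_box = (10, 200, 110, 260)
roi = expand_and_clamp(first_pass_box, img_w=1024, img_h=1024)
factor = upscale_factor(roi)
print(roi, factor)
```

In this example the crop is enlarged by a factor of about 3.2 before the second pass, so the detector sees interpolated structures at a scale it did not encounter when trained on full frames.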
A further limitation of the present work is that the experiments were not run on representative spacecraft hardware. The GPU used in this study is useful for controlled comparison, but it does not provide strong and direct evidence of on-board suitability. The surprising result that Int8 inference was slower than FP32 on GPU (35.0 versus 50.0 FPS for YOLOv3) is a reminder that compression claims are hardware-dependent: a model that is smaller in memory is not necessarily faster in a particular runtime. Conversely, the same quantised model could be much faster on an accelerator with efficient integer operations. Future work should evaluate these models on embedded GPUs, FPGAs, Edge TPUs or radiation-tolerant processors relevant to space missions.
The evaluation metric used is also worth commenting upon. Mean average precision is one of many metrics appropriate in standard object detection comparisons, and it is undoubtedly useful here too. However, active debris removal and in-orbit servicing impose distinct application-specific requirements that mAP cannot fully capture. For example, in a pose estimation pipeline, a bounding box with high IoU may still be problematic if it excludes a small but important part of a spacecraft, such as the end of a solar array. For downstream pose estimation, full-object containment or specific object-part inclusion may be more important than close overlap. A conservative box that includes the whole spacecraft and some background may be preferable to a high-IoU box that cuts off a structurally important part.
A mission-oriented evaluation should therefore include additional metrics. One candidate is containment recall: the proportion of ground-truth boxes fully contained within the predicted box, perhaps with a tolerance margin. Another is asymmetric IoU, where a missing part of the ground-truth object is penalised more strongly than including extra background. A third is downstream task performance: if detection is used to crop inputs for pose estimation, the detector should be evaluated by the pose estimation accuracy it enables. These metrics would better reflect the role of detection in the broader operational pipeline.
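One possible way to realise the asymmetric IoU mentioned above is a Tversky-style index, in which the part of the ground truth that is missed and the background that is included receive different weights. The following sketch is an assumption for illustration only: the weights `alpha` and `beta`, the function names and the `(x1, y1, x2, y2)` box convention are not taken from this study. With `alpha = beta = 1` the score reduces to the ordinary IoU.

```python
def box_area(b):
    """Area of an axis-aligned box (x1, y1, x2, y2); zero if the box is empty."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def asymmetric_iou(pred, gt, alpha=0.5, beta=2.0):
    """Tversky-style overlap: alpha weights extra background, beta weights missed target."""
    inter = box_area((max(pred[0], gt[0]), max(pred[1], gt[1]),
                      min(pred[2], gt[2]), min(pred[3], gt[3])))
    extra = box_area(pred) - inter     # background included by the prediction
    missed = box_area(gt) - inter      # part of the target left outside the prediction
    return inter / (inter + alpha * extra + beta * missed)


# Hypothetical boxes: both predictions have the same ordinary IoU of 0.9.
gt = (0.0, 0.0, 90.0, 40.0)
clipped = (0.0, 0.0, 81.0, 40.0)    # cuts off one end of the target
padded = (0.0, 0.0, 100.0, 40.0)    # contains the target plus background

plain_clipped = asymmetric_iou(clipped, gt, alpha=1.0, beta=1.0)
plain_padded = asymmetric_iou(padded, gt, alpha=1.0, beta=1.0)
asym_clipped = asymmetric_iou(clipped, gt)
asym_padded = asymmetric_iou(padded, gt)
print(plain_clipped, plain_padded, asym_clipped, asym_padded)
```

With `beta` larger than `alpha`, the padded box scores higher than the clipped one even though ordinary IoU cannot distinguish them.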
5.2. Beyond IoU: Containment-Oriented Evaluation
The standard mAP evaluation used in this paper is appropriate for comparison with object detection literature, but it does not fully capture the requirements of active debris removal. In a conventional detection benchmark, a high-IoU prediction is normally treated as good evidence of localisation quality, but in a servicing or removal pipeline, the more important question may be whether the whole spacecraft has been included in the predicted region. A bounding box that excludes a solar array tip, antenna or other protruding component may still achieve a high IoU if the missed region is small relative to the whole box. The downstream impact could nevertheless be significant if the cropped region is passed to a pose-estimation or control module.
The containment metric we propose below is introduced as an evaluation extension motivated by the present study rather than as an additional result of it. The experiments reported here use standard object detection metrics, model size, and inference speed to characterise the compression behaviour of the tested models. Containment recall is introduced to address a mission-oriented concern that is not well captured by mAP and that should be quantified in future work using the same predicted and ground-truth boxes used for conventional detection evaluation.
A simple containment-oriented metric can be defined using the fraction of the ground-truth object box covered by the prediction:
$$C(B_p, B_{gt}) = \frac{|B_p \cap B_{gt}|}{|B_{gt}|},$$
where $B_p$ is the predicted bounding box and $B_{gt}$ is the ground-truth bounding box. A containment recall score at tolerance $\tau$ can then be defined as
$$\mathrm{CR}_{\tau} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left[ C\left(B_p^{(i)}, B_{gt}^{(i)}\right) \geq 1 - \tau \right],$$
where $N$ is the number of ground-truth boxes evaluated. For a region-of-interest detector, $\tau$ could be set close to zero, allowing only a very small missed fraction of the ground-truth box. The design of this metric purposefully places greater weight on under-coverage than on over-coverage: a slightly larger crop may be wasteful and include irrelevant background information, but a crop that excludes part of the target removes information that becomes subsequently unrecoverable. Reporting containment recall alongside mAP would therefore provide a more application-specific picture of detector reliability.
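Another useful quantity is crop expansion cost. If a detector is modified to produce conservative boxes, it may improve containment by including more background. That trade-off can be measured as
$$E(B_p, B_{gt}) = \frac{|B_p|}{|B_{gt}|},$$
the ratio of the predicted box area $|B_p|$ to the ground-truth box area $|B_{gt}|$.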
What we are trying to capture here is that the kind of detector needed in a pose-estimation pipeline is one that achieves high containment recall with an expansion cost low enough that the downstream pose model gets a useful, target-dominated crop. This idea encapsulates the intuition behind the two-pass refinement experiment and suggests a clearer evaluation protocol for future work.
The most relevant single metric for the detector role considered here is therefore containment recall at a small tolerance, accompanied by expansion cost. Containment recall alone could be maximised trivially by predicting very large boxes, while expansion cost alone would favour tight boxes that may miss target extremities. The pair of quantities captures the operational trade-off more directly by pushing the detector towards including the whole target without producing a crop so large that downstream pose estimation loses effective resolution or becomes dominated by background.
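The pairing described above can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not an evaluation performed in this study: boxes are axis-aligned `(x1, y1, x2, y2)` tuples, each prediction is already matched to one ground-truth box, coverage is the fraction of the ground-truth area inside the prediction, and expansion cost is taken to be the ratio of predicted to ground-truth box area. All function names and example boxes are hypothetical.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2); assumed convention


def area(b: Box) -> float:
    """Area of an axis-aligned box; zero if the box is empty."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection_area(a: Box, b: Box) -> float:
    """Area of the overlap between two boxes."""
    overlap = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return area(overlap)


def iou(pred: Box, gt: Box) -> float:
    """Ordinary intersection over union."""
    inter = intersection_area(pred, gt)
    return inter / (area(pred) + area(gt) - inter)


def coverage(pred: Box, gt: Box) -> float:
    """Fraction of the ground-truth box covered by the prediction."""
    return intersection_area(pred, gt) / area(gt)


def containment_recall(pairs: list[tuple[Box, Box]], tau: float) -> float:
    """Share of (prediction, ground truth) pairs whose coverage is at least 1 - tau."""
    hits = sum(coverage(p, g) >= 1.0 - tau for p, g in pairs)
    return hits / len(pairs)


def expansion_cost(pred: Box, gt: Box) -> float:
    """Assumed definition: predicted box area relative to ground-truth box area."""
    return area(pred) / area(gt)


# Hypothetical boxes: one prediction clips the target, the other is conservative.
gt = (0.0, 0.0, 100.0, 40.0)
clipped = (0.0, 0.0, 92.0, 40.0)
conservative = (-5.0, -5.0, 105.0, 45.0)

cr = containment_recall([(clipped, gt), (conservative, gt)], tau=0.02)
print(iou(clipped, gt), coverage(clipped, gt), expansion_cost(clipped, gt))
print(iou(conservative, gt), coverage(conservative, gt), expansion_cost(conservative, gt))
print(cr)
```

In this toy case the clipped box has the higher IoU (0.92 against roughly 0.73) but fails containment at a tolerance of 0.02, while the conservative box passes at an expansion cost of 1.375.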
Another potential concern that must not be overlooked is that SPARK 2022 data is synthetic. Although the authors claim that “the data have been generated under a realistic space simulation environment, with a large diversity in sensing conditions, including extreme and challenging ones for different orbital scenarios, background noise, low signal-to-noise ratio (SNR), and high image contrast that defines actual space imagery”, it is likely that real orbital images can still differ from synthetic ones in noise, illumination, sensor response, compression artefacts, and unmodelled physical effects. Robustness tests should include brightness shifts, blur, sensor noise, partial occlusion, Earth-background variation and out-of-distribution target behaviour. This is most important in the present context since a compressed model may be more sensitive under such perturbations due to less representational redundancy. This is especially salient for YOLO-DWSC and low-bit quantised variants.
Lastly, we note that future work should include per-class failure analysis. Since the SPARK 2022 classes differ in object geometry, apparent scale, and visual ambiguity, aggregate mAP may conceal problems in dealing with small structures, unusual aspect ratios, or lower contrast. This is particularly relevant for deployment, as rare but difficult target configurations may be more operationally important than average performance across a balanced test split. Future work should therefore report per-class AP, class-conditioned containment recall, and representative qualitative failures for each compression method.
The main limitations of the present study and their implications are summarised in Table 11.
Author Contributions
Conceptualization, L.K. and O.A.; methodology, L.K. and O.A.; software, L.K.; validation, L.K.; formal analysis, L.K. and O.A.; investigation, L.K.; resources, O.A.; data curation, L.K.; writing—original draft preparation, L.K. and O.A.; writing—review and editing, L.K. and O.A.; visualization, L.K. and O.A.; supervision, O.A. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable. The study used secondary image datasets and did not involve human participants or animals.
Informed Consent Statement
Not applicable.
Data Availability Statement
The SPARK 2022 dataset is available through Zenodo [6]. Code and trained model files can be made available by the authors upon reasonable request, subject to repository preparation and institutional requirements.
Acknowledgments
The authors thank the School of Computer Science, University of St Andrews, for access to GPU resources used during training and evaluation.
Conflicts of Interest
The authors declare no conflicts of interest.
References
1. Kessler, D.J.; Cour-Palais, B.G. Collision Frequency of Artificial Satellites: The Creation of a Debris Belt. J. Geophys. Res. Space Phys. 1978, 83, 2637–2646.
2. European Space Agency. ESA Space Environment Report 2025. 2025. Available online: https://www.esa.int/Space_Safety/Space_Debris/ESA_Space_Environment_Report_2025 (accessed on 14 May 2026).
3. European Space Agency. Space Debris by the Numbers. 2026. Available online: https://www.esa.int/Space_Safety/Space_Debris/Space_debris_by_the_numbers (accessed on 14 May 2026).
4. European Space Agency. Active Debris Removal. 2026. Available online: https://www.esa.int/Space_Safety/Space_Debris/Active_debris_removal (accessed on 14 May 2026).
5. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
6. Rathinam, A.; Gaudilliere, V.; Mohamed Ali, M.A.; Ortiz Del Castillo, M.; Pauly, L.; Aouada, D. SPARK 2022 Dataset: Spacecraft Detection and Trajectory Estimation. Zenodo 2022.
7. Zhou, Y.; Zhang, T.; Li, Z.; Qiu, J. Improved Space Object Detection Based on YOLO11. Aerospace 2025, 12, 568.
8. Guo, Y.; Yin, X.; Xiao, Y.; Zhao, Z.; Yang, X.; Dai, C. Enhanced YOLOv8-Based Method for Space Debris Detection Using Cross-Scale Feature Fusion. Discov. Appl. Sci. 2025, 7, 95.
9. Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-Time End-to-End Object Detection. Adv. Neural Inf. Process. Syst. 2024, 37, 107984–108011.
10. Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs Beat YOLOs on Real-Time Object Detection. arXiv 2024, arXiv:2304.08069.
11. Fraunhofer Institute for High Frequency Physics and Radar Techniques FHR. Space Observation Radar TIRA. 2026. Available online: https://www.fhr.fraunhofer.de/en/the-institute/technical-equipment/Space-observation-radar-TIRA.html (accessed on 14 May 2026).
12. Liu, M.; Wang, H.; Wang, H.; Zhao, L.; Peng, Q.; Zhang, S.; Chen, W. Space Debris Detection and Positioning Technology Based on Multiple Star Trackers. Appl. Sci. 2022, 12, 3593.
13. Lotti, A.; Modenini, D.; Tortora, P.; Saponara, M.; Perino, M.A. Deep Learning for Real Time Satellite Pose Estimation on Low Power Edge TPU. arXiv 2022, arXiv:2204.03296.
14. Pauly, L.; Rharbaoui, W.; Shneider, C.; Rathinam, A.; Gaudilliere, V.; Aouada, D. A Survey on Deep Learning-Based Monocular Spacecraft Pose Estimation: Current State, Limitations and Prospects. Acta Astronaut. 2023, 212, 339–360.
15. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
16. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the Computer Vision – ECCV 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37.
17. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
18. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the Computer Vision – ECCV 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 740–755.
19. AlDahoul, N.; Karim, H.A.; De Castro, A.; Tan, M.J.T. Localization and Classification of Space Objects Using EfficientDet Detector for Space Situational Awareness. Sci. Rep. 2022, 12, 21896.
20. Li, Y.; Huo, J.; Ma, P.; Jiang, R. Target Localization Method of Non-Cooperative Spacecraft on On-Orbit Service. Chin. J. Aeronaut. 2022, 35, 336–348.
21. Jocher, G.; Chaurasia, A.; Qiu, J. Ultralytics YOLO, Version 8.0.0; Computer Software. 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 28 June 2026).
22. Pham, D.S.; Arandjelović, O.; Venkatesh, S. Detection of dynamic background due to swaying movements from motion features. IEEE Trans. Image Process. 2014, 24, 332–344.
23. Arandjelović, O.; Pham, D.S.; Venkatesh, S. CCTV scene perspective distortion estimation from low-level motion features. IEEE Trans. Circuits Syst. Video Technol. 2015, 26, 939–949.
24. Yu, T.; Chen, C.; Zhou, Y.; Hu, X. Improving Surveillance Object Detection with Adaptive Omni-Attention over both Inter-Frame and Intra-Frame Context. In Proceedings of the Asian Conference on Computer Vision (ACCV), Macao, China, 4–8 December 2022; pp. 2697–2712.
25. Wang, X.; Hu, X.; Chen, C.; Fan, Z.; Peng, S. Illuminating Vehicles with Motion Priors for Surveillance Vehicle Detection. In Proceedings of the 2020 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 25–28 October 2020; pp. 2021–2025.
26. Jacob, B.; Kligys, S.; Chen, B.; Zhu, M.; Tang, M.; Howard, A.; Adam, H.; Kalenichenko, D. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2704–2713.
27. Han, S.; Mao, H.; Dally, W.J. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. arXiv 2016, arXiv:1510.00149.
28. Frankle, J.; Carbin, M. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks. arXiv 2019, arXiv:1803.03635.
29. Hinton, G.; Vinyals, O.; Dean, J. Distilling the Knowledge in a Neural Network. arXiv 2015, arXiv:1503.02531.
30. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861.
31. Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258.
32. ONNX Runtime Developers. Model Optimizations: Quantization. 2026. Available online: https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html (accessed on 14 May 2026).
Figure 1. Illustrative bounding-box prediction from the SPARK 2022 dataset. The example shows the ground-truth object extent and a detector prediction, together with the corresponding IoU value, thus visually illustrating the character of the localisation criterion used when computing average precision and mAP.
Figure 2. Simplified YOLO-DWSC design. The detection head is kept close to YOLOv3-tiny, while the backbone convolutional layers are replaced by depthwise separable convolutions.
Figure 3. Two-pass ROI refinement. The method attempts to use the first prediction to crop a higher-signal region for a second prediction.
Figure 4. Example of the two-pass refinement process on a SPARK 2022 image. The first-pass detector output defines an expanded region of interest, which is cropped, resized and processed again. The example shows both the intuitive appeal of the procedure and the potential source of error, namely that the second-pass image is no longer drawn from the same distribution as the full-frame training images.
Figure 5. Accuracy effects of YOLOv3 post-training quantisation. The difference between mAP50 and mAP50:95 shows that tighter localisation is more strongly affected by precision reduction.
Figure 6. Model size and mAP50 trade-off across full-precision, quantised and architecture-compressed models.
Table 1. Possible roles of object detection in an active debris removal or servicing pipeline. The same detector performance may be acceptable in one role and inadequate in another.
| Mission Stage | Detection Role | Main Evaluation Concern |
|---|---|---|
| Longer-range approach | Acquire the target and reject background clutter | High recall and stable target presence detection |
| Intermediate approach | Provide a region of interest for tracking or pose estimation | Conservative target containment and low latency |
| Close proximity | Support hand-off to pose estimation and control | Precise localisation, robustness and predictable failure modes |
| Fallback or monitoring mode | Run continuously under power or compute limits | Low memory footprint, low inference cost and graceful degradation |
Table 2. Summary of the re-split SPARK 2022 evaluation protocol used in this study.
| Aspect | Interpretation |
|---|---|
| Class balance | Maintained by random class-wise sampling into training, validation and test subsets. |
| Benchmark comparability | Limited because the original challenge test labels were unavailable and the standard split was not used. |
| Internal comparison | Valid for comparing the models reported in this paper, since all variants are evaluated on the same held-out test subset. |
| Leakage risk | Reduced by random sampling within classes, but not eliminated because synthetic near-duplicates or shared rendering conditions may remain. |
| Empirical goal | Compression case study under a controlled split, not state-of-the-art benchmark performance. |
Table 3. Backbone-level architectural comparison between YOLOv3-tiny and YOLO-DWSC.
| Component | YOLOv3-Tiny | YOLO-DWSC |
|---|---|---|
| Backbone convolution | Standard convolutional layers | Depthwise separable convolutional layers |
| Downsampling | Max-pooling layers | Max-pooling layers retained |
| Detection head | YOLOv3-tiny detection head | YOLOv3-tiny detection head retained |
| Main design aim | Compact YOLO detector | Further reduction in model footprint and convolutional cost |
| Expected weakness | Lower accuracy than full YOLOv3 | Reduced representational capacity from separable convolutions |
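The footprint reduction targeted by replacing standard convolutions with depthwise separable ones can be illustrated with a parameter count. The following is a minimal sketch, not the authors' implementation; the layer shape (3 × 3 kernel, 256 input and 512 output channels) is a hypothetical example, and biases and batch-normalisation parameters are ignored.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer shape, chosen only for illustration.
k, c_in, c_out = 3, 256, 512
std = standard_conv_params(k, c_in, c_out)
dws = depthwise_separable_params(k, c_in, c_out)
ratio = dws / std  # equals 1 / c_out + 1 / k**2
print(std, dws, round(ratio, 3))
```

For this layer the separable form needs roughly 11% of the weights of the standard convolution, which indicates where the reduction in model footprint comes from and also why representational capacity may be reduced.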
Table 4. Quantised and full-precision YOLOv3 results on the held-out SPARK 2022 test split.
| Method | mAP50 | mAP50:95 | CPU FPS | GPU FPS | Size (MB) |
|---|---|---|---|---|---|
| FP32 | 0.972 | 0.884 | 1.1 | 50.0 | 405 |
| Static Int8wInt8a | 0.965 | 0.823 | 1.4 | 35.0 | 102 |
| Static Int4wInt8a | 0.904 | 0.731 | 1.6 | N/A | 52 |
Table 5. Derived compression summary for YOLOv3 relative to the full-precision model. Accuracy retention is computed as the compressed model’s mAP divided by the corresponding FP32 mAP.
| Method | Size Reduction | mAP50 Retention | mAP50:95 Retention | GPU Speed Relative to FP32 |
|---|---|---|---|---|
| Static Int8wInt8a | 74.8% | 99.3% | 93.1% | 0.70× |
| Static Int4wInt8a | 87.2% | 93.0% | 82.7% | N/A |
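The derived values above follow directly from the measurements in Table 4. A short sketch of the arithmetic, using only the reported numbers; the dictionary layout and function name are illustrative.

```python
# Values copied from Table 4 (YOLOv3, held-out SPARK 2022 test split).
fp32 = {"map50": 0.972, "map50_95": 0.884, "gpu_fps": 50.0, "size_mb": 405}
int8 = {"map50": 0.965, "map50_95": 0.823, "gpu_fps": 35.0, "size_mb": 102}
int4 = {"map50": 0.904, "map50_95": 0.731, "gpu_fps": None, "size_mb": 52}


def summarise(model: dict, base: dict) -> dict:
    """Size reduction and accuracy retention relative to the baseline, in percent."""
    out = {
        "size_reduction": round(100 * (1 - model["size_mb"] / base["size_mb"]), 1),
        "map50_retention": round(100 * model["map50"] / base["map50"], 1),
        "map50_95_retention": round(100 * model["map50_95"] / base["map50_95"], 1),
    }
    if model["gpu_fps"] is not None:
        # GPU throughput as a multiple of the baseline (below 1 means slower).
        out["gpu_speed_ratio"] = round(model["gpu_fps"] / base["gpu_fps"], 2)
    return out


int8_summary = summarise(int8, fp32)
int4_summary = summarise(int4, fp32)
print(int8_summary)
print(int4_summary)
```

The accuracy columns are retention figures: a value of 99.3% means that the Int8 model keeps 99.3% of the FP32 mAP50, not that it loses that much.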
Table 6. Computational and deployment-relevant quantities measured in the present study. The table reports the quantities used for the central comparison, namely model size and measured CPU/GPU throughput under the same software and hardware setup.
| Model | Precision | Size (MB) | CPU FPS | GPU FPS |
|---|---|---|---|---|
| YOLOv3 | FP32 | 405 | 1.1 | 50.0 |
| YOLOv3 | Static Int8wInt8a | 102 | 1.4 | 35.0 |
| YOLOv3 | Static Int4wInt8a | 52 | 1.6 | N/A |
| YOLO-DWSC | FP32 | 43 | 23.3 | 256.4 |
| YOLO-DWSC | Static Int8wInt8a | 11 | 17.4 | 104.2 |
Table 7. Quantised and full-precision YOLO-DWSC results on the held-out SPARK 2022 test split.
| Method | mAP50 | mAP50:95 | CPU FPS | GPU FPS | Size (MB) |
|---|---|---|---|---|---|
| FP32 | 0.849 | 0.631 | 23.3 | 256.4 | 43 |
| Static Int8wInt8a | 0.771 | 0.556 | 17.4 | 104.2 | 11 |
Table 8. Derived comparison of YOLO-DWSC with full-precision YOLOv3. Values are computed from the same held-out test evaluation.
| Model | Size Relative to YOLOv3 FP32 | GPU Speed Relative to YOLOv3 FP32 | mAP50 Retention | mAP50:95 Retention |
|---|---|---|---|---|
| YOLO-DWSC FP32 | 10.6% | 5.13× | 87.3% | 71.4% |
| YOLO-DWSC Int8wInt8a | 2.7% | 2.08× | 79.3% | 62.9% |
Table 9. Pruning results for full-precision fine-tuned YOLOv3.
| Method | mAP50 | mAP50:95 | GPU FPS |
|---|---|---|---|
| Unstructured L1, sparsity 0.1 | 0.968 | 0.854 | 39.7 |
| Unstructured L1, sparsity 0.3 | 0.483 | 0.328 | 40.1 |
| Random unstructured, sparsity 0.1 | 0.040 | 0.024 | 40.3 |
| Random unstructured, sparsity 0.3 | 0.000 | 0.000 | 41.3 |
| Structured L2, sparsity 0.1, dim 1 | 0.937 | 0.808 | 40.8 |
| Structured L2, sparsity 0.3, dim 1 | 0.000 | 0.000 | 40.3 |
Table 10. Two-pass ROI refinement results using fine-tuned YOLOv3.
| Method | mAP50 | mAP50:95 |
|---|---|---|
| FP32 two-pass | 0.881 | 0.578 |
| Static Int8wInt8a two-pass | 0.843 | 0.578 |
Table 11. Main limitations of the present study and their implications.
| Limitation | Consequence |
|---|---|
| No standard SPARK test-label evaluation | Results are internally comparable but not directly leaderboard-comparable. |
| No modern-detector training runs | The study does not establish competitiveness against YOLOv5/8/10, RT-DETR or recent task-specific detectors. |
| No YOLOv3-tiny baseline under identical conditions | YOLO-DWSC cannot be interpreted as a full ablation of YOLOv3-tiny. |
| No representative spacecraft hardware | Speed results are runtime-specific and cannot validate flight deployment. |
| No optimised pruning pipeline | Pruning results show the weakness of naive pruning, not the limit of hardware-aware iterative pruning. |
| Containment recall is not computed | The containment metric is a proposed mission-oriented extension, not an empirical result of this study. |