Article

SLR-Net: Lightweight and Accurate Detection of Weak Small Objects in Satellite Laser Ranging Imagery

1 Institute of Seismology, China Earthquake Administration, Wuhan 430071, China
2 Hubei Key Laboratory of Earthquake Early Warning, Hubei Earthquake Agency, Wuhan 430071, China
3 Xinjiang Astronomical Observatory, Chinese Academy of Sciences, Urumqi 830011, China
* Author to whom correspondence should be addressed.
Sensors 2026, 26(2), 732; https://doi.org/10.3390/s26020732
Submission received: 24 December 2025 / Revised: 15 January 2026 / Accepted: 19 January 2026 / Published: 22 January 2026
(This article belongs to the Section Remote Sensors)

Abstract

To address the challenges of insufficient efficiency and accuracy in traditional detection models caused by minute target sizes, low signal-to-noise ratios (SNRs), and feature volatility in Satellite Laser Ranging (SLR) images, this paper proposes an efficient, lightweight, and high-precision detection model. The core motivation of this study is to fundamentally enhance the model’s capabilities in feature extraction, fusion, and localization for minute and blurred targets through a specifically designed network architecture and loss function, without significantly increasing the computational burden. To achieve this goal, we first design a DMS-Conv module. By employing dense sampling and channel function separation strategies, this module effectively expands the receptive field while avoiding the high computational overhead and sampling artifacts associated with traditional multi-scale methods, thereby significantly improving feature representation for faint targets. Secondly, to optimize information flow within the feature pyramid, we propose a Lightweight Upsampling Module (LUM). Integrating depthwise separable convolutions with a channel reshuffling mechanism, this module replaces traditional transposed convolutions at a minimal computational cost, facilitating more efficient multi-scale feature fusion. Finally, addressing the stringent requirements for small target localization accuracy, we introduce the MPD-IoU Loss. By incorporating the diagonal distance of bounding boxes as a geometric penalty term, this loss function provides finer and more direct spatial alignment constraints for model training, effectively boosting localization precision. Experimental results on a self-constructed real-world SLR observation dataset demonstrate that the proposed model achieves an mAP50:95 of 47.13% and an F1-score of 88.24%, with only 2.57 M parameters and 6.7 GFLOPs, outperforming various mainstream lightweight detectors in combined precision and recall. These results validate that our method effectively resolves the small-target detection challenges in SLR scenarios while maintaining a lightweight design, exhibiting superior performance and practical value.

1. Introduction

Satellite Laser Ranging (SLR) is currently recognized as one of the most precise ground-based optical space geodetic techniques [1,2,3,4]. Its fundamental principle involves calculating the satellite-to-ground distance by precisely measuring the round-trip time of flight of laser pulses between a ground station and cooperative components carried by the target satellite, such as Corner Cube Retroreflectors (CCRs) or Laser Retroreflector Arrays (LRAs). SLR observational data plays an irreplaceable role in fields such as earth-satellite laser time and frequency transfer [5,6], precise orbit determination (POD) [7], the determination of global geodetic reference frames and parameters [8,9,10], and space debris monitoring [11].
The implementation of SLR technology relies on the precise tracking of target satellites. However, in actual observations, constrained by multiple factors such as ephemeris prediction errors, telescope optical axis pointing deviations, and jitter caused by atmospheric turbulence, the accuracy of blind tracking relying solely on ephemeris predictions often fails to meet requirements. Therefore, during the target’s optical visibility period (specifically during nighttime observations, excluding Earth shadow eclipses, where optical imaging is ineffective), it is typically necessary to utilize images acquired by CCD cameras for closed-loop correction to ensure the target remains in the center of the detector’s effective Field of View (FOV) in real time. However, limited by weak echo signals, stellar interference, and complex noise environments [12] (encompassing sky background radiation, detector dark current, and readout noise), target satellites in CCD images are often submerged in strong background noise, appearing as minute, dim point targets that are extremely difficult to distinguish. Moreover, a ‘stop-and-go’ strategy to avoid laser interference is operationally infeasible, as the introduced latency fails to compensate for high-frequency pointing jitter, necessitating simultaneous detection during laser emission. Consequently, achieving precise detection of such targets has become a critical bottleneck currently constraining the development of end-to-end automated SLR.
Traditional object detection algorithms (e.g., Viola-Jones [13], HOG [14], and DPM [15]) rely on hand-crafted feature design. Constrained by shallow representation capabilities, these methods struggle to capture deep semantic information, resulting in poor robustness to noise and limited generalization performance [16]. In contrast, deep learning-based algorithms have evolved into three mainstream paradigms: the two-stage architecture (e.g., Faster R-CNN [17]) that pursues high precision via a “coarse-to-fine” strategy; the one-stage architecture (e.g., YOLO [18,19,20]) that models detection as a regression problem to balance efficiency; and the Transformer paradigm (e.g., DETR [21]) that introduces self-attention mechanisms to break the locality limitation of convolutions [22].
Benefiting from the powerful feature extraction and semantic abstraction capabilities of deep networks, the aforementioned generic detectors perform exceptionally well in related fields such as astronomical observation and remote sensing interpretation. However, directly transferring them to the SLR task yields suboptimal results. SLR-CCD images are characterized by extremely small targets (sub-pixel level) and high background noise. In such extreme scenarios, the classic downsampling mechanism in CNN architectures often leads to the irreversible loss of critical spatial details. Meanwhile, the global attention mechanism relied upon by Transformers struggles to focus on informative regions for point targets lacking texture and shape semantics, tending instead to aggregate background noise. Given the dilemma between feature preservation and noise suppression faced by existing methods, designing an efficient detection network specialized for SLR dim and small targets has become an urgent need in this field.
The remainder of this paper is organized as follows: Section 2 details the proposed SLR-Net architecture, including the DMS-Conv and LUM modules. Section 3 introduces the dataset collection and analyzes target features, followed by the experimental setup, comparative results, and ablation studies. Finally, Section 4 concludes the paper.

2. Method

To effectively address the unique challenges characterizing Satellite Laser Ranging (SLR) imagery—namely, tiny object sizes, low Signal-to-Noise Ratio (SNR), and feature ambiguity caused by blurring—this paper proposes a specialized, optimized, lightweight, and high-performance object detection network.
This study adopts the efficient and concise architecture of YOLOv11 [23] as the baseline framework. While retaining the design philosophy of the CSPNet (Cross Stage Partial Network) backbone and the PANet (Path Aggregation Network) neck, we implement targeted reconstructions at key nodes of feature extraction and fusion. When directly applied to scenarios with extremely small targets such as SLR, the standard components of YOLOv11 exhibit limitations in feature extraction, multi-scale fusion, and bounding box regression.
Specifically, the main innovations of the proposed method are concentrated in the following three aspects:
  • Feature Extraction Optimization: To address the issue of tiny object features being easily lost in deep networks, we design a novel convolution module, DMS-Conv, to enhance the network’s feature representation capabilities.
  • Feature Fusion Enhancement: To improve the efficiency of information flow between feature maps of different scales, we propose a more lightweight and efficient upsampling fusion mechanism, the Lightweight Upsampling Module (LUM).
  • Localization Accuracy Improvement: To overcome the deficiencies of traditional loss functions in bounding box regression for small targets, we introduce a new geometric constraint, MPD-IoU Loss, to guide the model toward more precise localization.
The overall architecture of the proposed SLR-Net is illustrated in Figure 1.

2.1. Dense Multi-Scope Convolution (DMS-Conv)

For the multi-scale, star-like small targets in SLR, expanding the receptive field to fuse contextual information is crucial during feature extraction. Common methods employ parallel multi-scale convolutions or multi-scale dilated convolutions. However, parallel multi-scale convolutions inevitably introduce high computational costs and parameter overheads. While multi-scale dilated convolutions can effectively expand the receptive field at a lower computational cost, their inherent grid effect leads to discontinuous feature sampling. For small targets in SLR, which occupy only a few pixels, it is extremely easy to completely miss the target within the sampling “holes” or destroy the integrity of their feature distribution due to discontinuous sampling, resulting in missed detections or false alarms.
To address this challenge, we designed an efficient Dense Multi-Scope Convolution (DMS-Conv), the structure of which is illustrated in Figure 2. It aims to achieve powerful and rich feature representation with extremely low computational overhead through a Dense Sampling strategy without introducing sampling artifacts. The core idea lies in functionally allocating computational resources along the channel dimension, restricting expensive spatial convolution operations to feature subspaces, thereby capturing diverse receptive field features without introducing sampling artifacts.
Specifically, for a given input feature map $X \in \mathbb{R}^{B \times C \times H \times W}$, DMS-Conv first splits it evenly along the channel dimension into two parallel branches: $X = \{X_{\mathrm{cheap}}, X_{\mathrm{group}}\}$. Here, $X_{\mathrm{cheap}} \in \mathbb{R}^{B \times C/2 \times H \times W}$ serves as an approximate identity mapping and is passed directly to the output to preserve the original information in the input features. The other branch, $X_{\mathrm{group}} \in \mathbb{R}^{B \times C/2 \times H \times W}$, is used for subsequent multi-scale feature extraction, as shown in Equation (1).
We reshape $X_{\mathrm{group}}$ along the channel dimension into $G$ groups and apply a corresponding spatial convolution $F_{k_g}$ from a preset kernel set $K = \{k_1, \ldots, k_G\}$ to each group. To maximize efficiency, the lightweight spatial convolution within each group is performed on a compressed channel width $\mathit{min\_ch}$ (experimentally set to $\mathit{min\_ch} = 16$). The outputs of all groups $\{Y_g\}_{g=1}^{G}$ are concatenated along the channel dimension to form $Y_{\mathrm{group}}$, as shown in Equation (2). Finally, we concatenate the output of the cheap branch ($X_{\mathrm{cheap}}$) with the output of the multi-scale path ($Y_{\mathrm{group}}$) and perform cross-channel information remixing through an efficient $1 \times 1$ convolution (linear projection) to obtain the final output $Y$:
$$X_{\mathrm{cheap}},\ X_{\mathrm{group}} = \mathrm{Split}(X) \tag{1}$$
$$Y_{\mathrm{group}} = \mathrm{Concat}\left(\{\mathrm{Conv}_{k_i}(X_{\mathrm{group}}^{i})\}_{i=1}^{G}\right) \tag{2}$$
$$Y = \mathrm{Conv}_{1 \times 1}\left(\mathrm{Concat}(X_{\mathrm{cheap}}, Y_{\mathrm{group}})\right), \quad Y \in \mathbb{R}^{B \times C \times H \times W} \tag{3}$$
This design strictly limits the computationally intensive k × k convolutions to the feature subspace while utilizing 1 × 1 convolutions to restore full-channel information interaction. DMS-Conv not only effectively expands the receptive field with minimal computational overhead but also enhances feature diversity and expressiveness, forming a richer representation. In the network design, we apply DMS-Conv within the bottleneck structure with channel reduction to maximize its efficiency and performance gains.
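To make the channel-split design concrete, the following is a minimal PyTorch sketch of the DMS-Conv forward path. The kernel set (1, 3, 5, 7), the equal per-group widths, and the omission of the per-group channel compression are illustrative assumptions, not the exact published configuration.

```python
import torch
import torch.nn as nn

class DMSConv(nn.Module):
    """Sketch of Dense Multi-Scope Convolution (assumed layout).

    Half the channels bypass computation unchanged (cheap branch); the
    other half is split into groups, each convolved with a different
    kernel size, and a 1x1 convolution remixes all channels at the end.
    """
    def __init__(self, channels, kernel_sizes=(1, 3, 5, 7)):
        super().__init__()
        half = channels // 2
        g = len(kernel_sizes)
        assert channels % 2 == 0 and half % g == 0
        self.group_ch = half // g
        # One cheap spatial conv per group, each with a different "scope"
        self.convs = nn.ModuleList(
            nn.Conv2d(self.group_ch, self.group_ch, k, padding=k // 2)
            for k in kernel_sizes
        )
        self.project = nn.Conv2d(channels, channels, 1)  # cross-channel remix

    def forward(self, x):
        cheap, group = x.chunk(2, dim=1)                 # Eq. (1): channel split
        parts = group.chunk(len(self.convs), dim=1)
        y_group = torch.cat(                             # Eq. (2): per-group convs
            [conv(p) for conv, p in zip(self.convs, parts)], dim=1)
        return self.project(torch.cat([cheap, y_group], dim=1))  # Eq. (3)
```

Because every k x k convolution sees only `channels / (2G)` input channels, the expensive spatial work stays confined to a small feature subspace, matching the efficiency argument above.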

2.2. Lightweight Upsampling Module (LUM)

In the neck structure of the detector, the upsampling operation is responsible for restoring high-level semantic features to a higher resolution for fusion with low-level detailed features. However, traditional upsampling methods such as transposed convolution incur substantial computational overheads and often become a bottleneck in lightweight model design. To construct a more efficient and powerful feature fusion path, we designed and proposed a novel Lightweight Upsampling Module (LUM), the structure of which is illustrated in Figure 3. It aims to replace traditional upsampling layers in a lightweight manner to achieve a balance between efficiency and performance.
Specifically, for a given deep input feature map $X_{\mathrm{in}} \in \mathbb{R}^{B \times C \times H \times W}$, the module first doubles its spatial dimensions via bilinear interpolation and immediately applies a Depthwise Separable Convolution (DWC) for preliminary spatial feature extraction, as shown in Equation (4). Since depthwise convolution operates independently per channel, to break the limitation of isolated channel information, we subsequently introduce a Channel Shuffle mechanism. Drawing on the idea of ShuffleNet, this operation breaks the independence between channels through efficient and uniform rearrangement, laying a more efficient feature foundation for subsequent cross-channel information fusion, as shown in Equation (5). Finally, we utilize a $1 \times 1$ Pointwise Convolution (PWC) to perform a weighted combination of the shuffled features, achieving cross-channel information fusion while generating the final upsampled output $Y \in \mathbb{R}^{B \times C \times 2H \times 2W}$. The entire process can be represented by the following sequence of equations:
$$X' = F_{\mathrm{DWC}}(\mathrm{Upsample}(X_{\mathrm{in}})) \tag{4}$$
$$X'' = \mathrm{ChannelShuffle}(X') \tag{5}$$
$$Y = F_{\mathrm{PWC}}(X'') \tag{6}$$
In summary, the LUM module constructs an efficient and powerful upsampling unit by ingeniously combining upsampling, depthwise separable convolution, channel shuffle, and pointwise convolution. It not only significantly reduces the computational burden of the upsampling path but also enhances the cross-channel communication capability of features through the introduction of the channel shuffle mechanism, contributing to the quality of multi-scale feature fusion and thereby improving the detection performance for small-sized targets.
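The four-stage pipeline described above can be sketched in PyTorch as follows; the 3 x 3 depthwise kernel and the shuffle group count of 4 are illustrative assumptions rather than the published settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def channel_shuffle(x, groups):
    """Uniformly rearrange channels across groups (ShuffleNet-style)."""
    b, c, h, w = x.shape
    return (x.view(b, groups, c // groups, h, w)
             .transpose(1, 2)
             .reshape(b, c, h, w))

class LUM(nn.Module):
    """Sketch of the Lightweight Upsampling Module (assumed layout):
    bilinear 2x upsample -> depthwise conv -> channel shuffle -> pointwise conv.
    """
    def __init__(self, channels, groups=4):
        super().__init__()
        self.groups = groups
        self.dwc = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.pwc = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        x = F.interpolate(x, scale_factor=2, mode="bilinear",
                          align_corners=False)   # double spatial resolution
        x = self.dwc(x)                          # Eq. (4): per-channel filtering
        x = channel_shuffle(x, self.groups)      # Eq. (5): break channel isolation
        return self.pwc(x)                       # Eq. (6): cross-channel fusion
```

The depthwise plus pointwise split is what keeps the cost well below a transposed convolution of the same channel width.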

2.3. MPD-IoU Loss

Among existing IoU-series loss functions, CIoU has been widely used for bounding box regression in object detection. It introduces center point distance penalties and aspect ratio constraints on top of the traditional IoU, improving localization accuracy to a certain extent. However, CIoU still has deficiencies in its constraint mechanism: its aspect ratio term contributes limitedly to small object scenarios, and the single-point metric of center distance struggles to fully characterize the alignment differences between the predicted box and the ground truth box at the boundaries.
Therefore, this paper introduces a new loss function, MPD-IoU, whose schematic diagram is shown in Figure 4. Its core idea is to further introduce the Euclidean distances between the two pairs of diagonal corner points of the predicted box and the ground truth box as geometric penalties on top of the IoU calculation, thereby describing the spatial consistency of the bounding boxes in greater detail. Specifically, let the predicted box and the ground truth box be $(b_p^{x_{tl}}, b_p^{y_{tl}}, b_p^{x_{br}}, b_p^{y_{br}})$ and $(b_g^{x_{tl}}, b_g^{y_{tl}}, b_g^{x_{br}}, b_g^{y_{br}})$, respectively; MPD-IoU is defined as:
$$\mathrm{MPDIoU} = \mathrm{IoU} - \frac{d_{tl}^{2}}{S} - \frac{d_{br}^{2}}{S}$$
where IoU represents the Intersection over Union of the two boxes, $d_{tl} = \sqrt{(b_g^{x_{tl}} - b_p^{x_{tl}})^2 + (b_g^{y_{tl}} - b_p^{y_{tl}})^2}$ is the Euclidean distance between the top-left corner points, $d_{br} = \sqrt{(b_g^{x_{br}} - b_p^{x_{br}})^2 + (b_g^{y_{br}} - b_p^{y_{br}})^2}$ is the Euclidean distance between the bottom-right corner points, and $S$ is a normalization factor that controls the magnitude of the penalty.
Compared with CIoU, MPD-IoU possesses the following advantages:
  • Enhanced Boundary Alignment Constraint: By simultaneously considering the registration degree of both the top-left and bottom-right corners, MPD-IoU achieves a finer-grained alignment metric at the geometric level compared to a single center point.
  • Adaptation to Small Object Detection: In scenarios with star-like small targets in SLR, aspect ratio differences contribute limitedly to regression, whereas the deviation of boundary points directly determines whether the target is covered. Thus, MPD-IoU fits the task requirements better.
  • Optimized Convergence Stability: Corner distance constraints provide clearer gradient information, enabling the model to converge faster to high-quality bounding box predictions during training.
In summary, MPD-IoU maintains the efficiency of CIoU while further reinforcing the spatial geometric constraints of the bounding box, making it particularly suitable for high-precision localization tasks such as low-SNR, star-like small object detection.
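As a concrete illustration, the following is a plain-Python sketch of the MPD-IoU score for a single pair of axis-aligned boxes; taking the normalization factor $S$ as the squared image diagonal ($w^2 + h^2$) is our assumption for this sketch.

```python
def mpd_iou(pred, gt, img_w, img_h):
    """Sketch of MPD-IoU for boxes given as (x_tl, y_tl, x_br, y_br).

    Returns IoU minus the squared top-left and bottom-right corner
    distances, each normalized by S = img_w**2 + img_h**2 (assumed).
    """
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    # Intersection and union of the two boxes
    ix1, iy1 = max(px1, gx1), max(py1, gy1)
    ix2, iy2 = min(px2, gx2), min(py2, gy2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((px2 - px1) * (py2 - py1)
             + (gx2 - gx1) * (gy2 - gy1) - inter)
    iou = inter / union if union > 0 else 0.0
    # Squared corner distances as geometric penalties
    s = img_w ** 2 + img_h ** 2
    d_tl2 = (gx1 - px1) ** 2 + (gy1 - py1) ** 2
    d_br2 = (gx2 - px2) ** 2 + (gy2 - py2) ** 2
    return iou - d_tl2 / s - d_br2 / s
```

Perfectly overlapping boxes score 1.0, and any corner misalignment subtracts directly from the score, which is the source of the clearer gradient signal noted above.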

3. Experiments

3.1. Dataset

The optimization of deep learning model performance relies heavily on high-quality training data. Addressing the scarcity of CCD image datasets in the Satellite Laser Ranging (SLR) field, this study conducted data collection in the actual operating environment of the TROS1000 [24] system. TROS1000 is the world’s largest aperture mobile SLR system developed by the Institute of Seismology, China Earthquake Administration. It is equipped with a 1-m aperture optical telescope with a maximum range of 36,000 km and is deployed at the Nanshan Observation Station of Xinjiang Astronomical Observatory, Chinese Academy of Sciences.
The dataset was collected from CCD cameras at the ground-based SLR station, covering real observation data under different nights and atmospheric conditions. We used professional annotation tools to precisely label image frames containing satellite laser reflection signals and synchronously labeled non-ranging targets within the field of view. This annotation strategy not only helps accurately screen target candidate regions during SLR observations but also provides data support for multi-task starry sky background detection. Ultimately, the dataset contains 1156 images with 2162 valid target instances. The entire dataset was divided into training, validation, and test sets in a ratio of 7:2:1. Typical samples from the constructed dataset are visualized in Figure 5.

3.2. Dataset Feature Analysis

After constructing the dataset, we systematically analyzed the target features. We first statistically analyzed the center positions of all targets in the dataset and plotted a spatial distribution heatmap (Figure 6). It is evident from the figure that the target distribution is not uniformly random but exhibits significant center aggregation characteristics. The vast majority of target instances are concentrated in the center of the image and its vicinity, particularly in the normalized coordinate range of (0.6, 0.6) to (0.8, 0.7), where target density is highest. This distribution characteristic is highly correlated with the automatic tracking task of the SLR system, which strives to maintain the target under test at the center of the field of view.
Target size is also a key factor determining detection difficulty. As shown in Figure 6, we visualized the pixel dimensions (width and height) of all target instances. The scatter plot clearly reveals the core challenge of this dataset: target sizes are generally extremely small. Widths are concentrated between 5 and 15 pixels, and heights between 5 and 20 pixels. The size distribution of the entire dataset forms a dense cluster in the bottom-left corner of the chart, with only a very few large outliers. This typical tiny-target distribution poses a severe challenge to the model’s detection capability.

3.3. Experimental Environment and Evaluation Metrics

The experimental environment configuration is shown in Table 1.
The training parameters were set as follows: training duration of 100 epochs, batch size of 16, and image size of $640 \times 640$. The model uses the SGD optimizer for parameter optimization, with an initial learning rate of 0.01 and a momentum parameter of 0.937. To prevent overfitting, a weight decay of $5 \times 10^{-4}$ was adopted.
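These hyperparameters map directly onto a standard PyTorch optimizer configuration; a minimal sketch with a placeholder model standing in for the detector:

```python
import torch

model = torch.nn.Conv2d(3, 16, 3)  # placeholder; stands in for SLR-Net
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.01,           # initial learning rate
    momentum=0.937,    # momentum parameter
    weight_decay=5e-4, # weight decay against overfitting
)
```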
To comprehensively evaluate the model’s effectiveness, Precision (P), Recall (R), mAP50, mAP75, mAP50:95, and F1-Score were selected as metrics. Additionally, Params and GFLOPs were used to compare model size and computational cost.
The metrics are defined as follows:
  • Precision (P):
    $$P = \frac{TP}{TP + FP},$$
    where TP is True Positives and FP is False Positives.
  • Recall (R):
    $$R = \frac{TP}{TP + FN},$$
    where FN is False Negatives.
  • F1-Score:
    $$F_1 = \frac{2 \times P \times R}{P + R}.$$
  • Mean Average Precision (mAP): for a given IoU threshold $t$,
    $$\mathrm{AP}_t = \int_0^1 \mathrm{precision}(r)\, dr,$$
    and
    $$\mathrm{mAP}_t = \frac{1}{C} \sum_{c=1}^{C} \mathrm{AP}_{t,c}.$$
  • mAP50:95:
    $$\mathrm{mAP}_{50:95} = \frac{1}{10} \sum_{i=0}^{9} \mathrm{mAP}_{0.50 + 0.05i}.$$
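The count-based metrics above can be checked with a few lines of Python; the TP/FP/FN values below are illustrative, not taken from the experiments.

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall, and F1 from raw detection counts."""
    p = tp / (tp + fp)        # fraction of predictions that are correct
    r = tp / (tp + fn)        # fraction of ground-truth targets found
    f1 = 2 * p * r / (p + r)  # harmonic mean of precision and recall
    return p, r, f1

# Illustrative counts only:
precision, recall, f1 = detection_metrics(tp=90, fp=10, fn=15)
```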

3.4. Ablation Experiments

We conducted extensive ablation studies to systematically verify the effectiveness and efficiency of the innovative components proposed in this paper—DMS-Conv, the Lightweight Upsampling Module (LUM), and the MPD-IoU Loss. All experiments were performed on the SLR dataset described in Section 3.1, and results are summarized in Table 2.
Regarding the architectural component analysis, we first validated the performance of the DMS-Conv and LUM components. Integrating either module individually did not yield an overwhelming advantage in the core localization-accuracy metrics. However, when both modules were integrated, we observed significant synergistic gains: the model’s mAP50 and Recall reached globally optimal levels. This clearly indicates a strong complementarity between the powerful feature extraction capability of DMS-Conv and the efficient feature fusion path of LUM. Their combination enables the model to “see clearer and find more completely” amid strong noise.
In the loss function analysis, based on the optimal architecture, the introduction of the MPD-IoU loss function achieved a significant breakthrough in the core localization accuracy metrics, especially under stricter evaluation standards such as mAP50:95 and mAP75. Precision and F1-Score also reached their global best. This result confirms that while the optimized architecture gives the model the ability to “see clearly”, the MPD-IoU loss teaches it how to “draw accurately”. For SLR targets with blurred edges, the direct geometric constraints provided by MPD-IoU are key to high-precision localization.
As for the model complexity analysis, Table 2 also reports the complexity and computational overhead. From the baseline to our final complete model, the parameter count decreased slightly from 2.58 M to 2.57 M, while GFLOPs increased slightly from 6.3 to 6.7. This indicates that the significant performance improvement comes from superior, problem-specific architectural design rather than simply increasing parameters.

3.5. Comparative Experiments

To validate the effectiveness of SLR-Net, we conducted comparative experiments against 15 mainstream object detectors. The benchmarks encompass classical two-stage algorithms (e.g., Faster R-CNN [17]), the YOLO series [20,23,26,27,28,29,30,31], and detectors representing the Transformer paradigm (e.g., DINO [32]). Detailed comparative data are presented in Table 3.

3.5.1. Results and Analysis

In terms of detection accuracy, our model achieved an mAP50 of 92.36% and an F1-Score of 88.24%. With a comparable parameter count (2.57 M), its mAP50 surpassed those of YOLOv10-n (+3.81%) and YOLOv8-n (+10.71%), and also exceeded that of the medium-scale network YOLOv5-m (25.05 M). The data indicate that the feature enhancement module tailored for point targets effectively overcomes the bottleneck of lightweight networks in extracting weak features.
In terms of model adaptability, the experiments revealed that several large-scale general-purpose detectors exhibited suboptimal performance on this specific task. Traditional two-stage algorithms showed limited accuracy; even DINO, a benchmark model based on the Transformer architecture with 47.54 M parameters, achieved an mAP50 of only 87.70%, lower than that of our model. This phenomenon may be attributed to the fact that excessively deep layers or global attention mechanisms, when processing point targets lacking semantic information, are more prone to introducing background noise interference, thereby constraining detection performance.
Regarding inference efficiency and limitations, the model achieved an inference speed of 130.39 FPS on a single GPU (2.57 M Params/6.70 GFLOPs). It is worth noting that the computational modules introduced to enhance the capture of weak targets impose a certain inference burden, resulting in a decrease in speed compared to the base model (201.28 FPS) and some minimalist models (e.g., YOLOv3-tiny). However, considering the stringent accuracy requirements of SLR systems and the fact that 130 FPS far exceeds the real-time processing standard (>30 FPS), this strategy of trading a minor speed loss for significant accuracy gains (+1.45% in mAP50) is acceptable and efficient for practical deployment.
To rigorously validate the effectiveness and stability of the proposed method, we compared SLR-Net with the YOLOv11 baseline across five independent experimental runs to account for stochastic fluctuations. Table 4 reports the mean and standard deviation for the key metrics. As shown, SLR-Net achieves consistent improvements across all indicators. While the improvement in general detection (mAP50) is steady (+0.90%), the most significant gain is observed in strict localization accuracy (mAP75), which increases by 3.88% (from 40.02% to 43.90%). In the context of Satellite Laser Ranging (SLR), the telescope servo system relies on precise centroid coordinates to maintain stable tracking; a “loose” detection (low IoU) can induce jitter. Therefore, this substantial boost in high-IoU performance demonstrates that the proposed MPD-IoU Loss and DMS-Conv significantly refine bounding box regression, transforming “rough detection” into the “high-precision localization” required for automated observations.

3.5.2. Visualization Analysis

To further qualitatively analyze the detection behavior of different models, we provide visualization results including response heatmaps and final detection outputs, as shown in Figure 7 and Figure 8. These visualizations are generated on representative SLR scenes with strong background clutter, varying noise levels, and extremely small targets.
The heatmap comparisons (Figure 7) reveal that mainstream detectors tend to activate strongly on high-intensity background regions or structured noise, which often leads to false positives. In contrast, the proposed SLR-Net produces more compact and target-centered responses, with suppressed background activation. This indicates that SLR-Net is able to better capture discriminative cues of weak targets while mitigating interference from complex background patterns.
In terms of detection results (Figure 8), several challenging cases are illustrated, including sparse star-like targets and low-contrast targets embedded in clutter. As highlighted in the examples, baseline models either miss the target (false negatives) or incorrectly respond to background artifacts (false positives). Benefiting from its enhanced feature representation and balanced precision–recall behavior, SLR-Net successfully detects these targets with accurate localization and reduced false alarms.
Overall, the visualization results are consistent with the quantitative evaluations in Table 3, demonstrating that the proposed model not only achieves competitive performance in terms of metrics, but also exhibits superior robustness and reliability in real SLR detection scenarios.

4. Conclusions

To address the challenges associated with Satellite Laser Ranging (SLR) imagery, specifically minute target sizes, low signal-to-noise ratios (SNRs), and susceptibility to feature loss, this paper proposes an efficient and lightweight detection network named SLR-Net. The network innovatively incorporates the DMS-Conv module, which effectively enhances feature extraction capabilities and expands the receptive field by employing dense sampling and channel separation strategies. Simultaneously, the Lightweight Upsampling Module (LUM) is utilized to optimize multi-scale feature fusion, and in conjunction with the MPD-IoU loss function, the localization accuracy for minute targets is significantly improved.
Experimental results demonstrate that SLR-Net achieves superior performance on real-world SLR datasets. With only 2.57 M parameters, it attains an mAP50:95 of 47.13%, significantly outperforming current mainstream lightweight detectors while maintaining extremely low computational costs. This study not only validates the practical value of the proposed method in automated SLR observation systems but also provides a solid foundation for future real-time deployment on edge computing devices. Future work will focus on further expanding the dataset and exploring the generalization capability of the model in more complex environments.

Author Contributions

Conceptualization, J.H., W.G. and Y.Z.; Methodology, J.H. and Y.Z.; Software, Y.Z.; Validation, J.H. and Y.Z.; Formal analysis, Y.Z.; Investigation, W.G., Y.W. and Y.Z.; Data curation, J.H.; Writing—original draft, Y.Z.; Writing—review & editing, W.Z., J.H., W.G., Y.W. and Y.Z.; Visualization, W.Z., W.G. and Y.Z.; Supervision, W.Z., J.H., W.G. and Y.W.; Project administration, W.Z. and Y.W.; Funding acquisition, W.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Degnan, J.J. Satellite Laser Ranging: Current Status and Future Prospects. IEEE Trans. Geosci. Remote Sens. 1985, GE-23, 398–413. [Google Scholar] [CrossRef]
  2. Schreiber, K.U.; Kodet, J. The Application of Coherent Local Time for Optical Time Transfer and the Quantification of Systematic Errors in Satellite Laser Ranging. Space Sci. Rev. 2017, 214, 22. [Google Scholar] [CrossRef]
  3. Marshall, J.A.; Klosko, S.M.; Ries, J.C. Dynamics of SLR Tracked Satellites. Rev. Geophys. 1995, 33, 353–360. [Google Scholar] [CrossRef]
  4. Fumin, Y. Current Status And Future Plans For The Chinese Satellite Laser Ranging Network. Surv. Geophys. 2001, 22, 465–471. [Google Scholar] [CrossRef]
  5. Wu, Z.; Geng, R.; Tang, K.; Meng, W.; Zhang, H.; Cheng, Z.; Xiao, A.; Gao, S.; Wang, X.; Huang, Y.; et al. Experiments and Progress of Space-to-Ground Laser Time-Frequency Transfer for the China Space Station. Acta Opt. Sin. 2025, 45, 286–294. [Google Scholar]
  6. Geng, R.; Wu, Z.; Huang, Y.; Lin, H.; Yu, R.; Tang, K.; Zhang, H.; Zhang, Z. Experimental Study on Transponder Laser Time Transfer Based on Satellite Retroreflectors. Chin. J. Lasers 2023, 50, 280–289. [Google Scholar]
  7. Xiao, W.; Wu, Z.; Li, Z.; Fan, L.; Guo, S.; Chen, Y. Research on the Autonomous Orbit Determination of Beidou-3 Assisted by Satellite Laser Ranging Technology. Remote Sens. 2025, 17, 2342. [Google Scholar] [CrossRef]
  8. Schreiber, K.U.; Hugentobler, U.; Kodet, J.; Stellmer, S.; Klügel, T.; Wells, J.P.R. Gyroscope Measurements of the Precession and Nutation of Earth’s Axis. Sci. Adv. 2025, 11, eadx6634. [Google Scholar] [CrossRef]
  9. Li, X.; Lou, J.; Yuan, Y.; Wu, J.; Zhang, K. Determination of Global Geodetic Parameters Using Satellite Laser Ranging to Galileo, GLONASS, and BeiDou Satellites. Satell. Navig. 2024, 5, 10. [Google Scholar] [CrossRef]
  10. Cheng, M. Decadal Variations in Equatorial Ellipticity and Principal Axis of the Earth from Satellite Laser Ranging/GRACE. Surv. Geophys. 2024, 45, 1601–1626. [Google Scholar] [CrossRef]
  11. Steindorfer, M.A.; Wang, P.; Koidl, F.; Kirchner, G. Space Debris and Satellite Laser Ranging Combined Using a Megahertz System. Nat. Commun. 2025, 16, 575. [Google Scholar] [CrossRef]
  12. Newberry, M.V. Signal-to-Noise Considerations for Sky-Subtracted CCD Data. Publ. Astron. Soc. Pac. 1991, 103, 122. [Google Scholar] [CrossRef]
  13. Viola, P.; Jones, M. Rapid Object Detection Using a Boosted Cascade of Simple Features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, Kauai, HI, USA, 8–14 December 2001; Volume 1, pp. I–I. [Google Scholar] [CrossRef]
  14. Dalal, N.; Triggs, B. Histograms of Oriented Gradients for Human Detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; Volume 1, pp. 886–893. [Google Scholar] [CrossRef]
  15. Felzenszwalb, P.; McAllester, D.; Ramanan, D. A Discriminatively Trained, Multiscale, Deformable Part Model. In Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 23–28 June 2008; pp. 1–8. [Google Scholar] [CrossRef]
  16. Zhang, X.; Yang, Y.H.; Han, Z.; Wang, H.; Gao, C. Object Class Detection. ACM Comput. Surv. 2013, 46, 1–53. [Google Scholar] [CrossRef]
  17. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
  18. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar] [CrossRef]
  19. Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. arXiv 2016, arXiv:1612.08242. [Google Scholar] [CrossRef]
  20. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar] [CrossRef]
  21. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-End Object Detection with Transformers. In Computer Vision—ECCV 2020; Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M., Eds.; Springer: Cham, Switzerland, 2020; pp. 213–229. [Google Scholar] [CrossRef]
  22. Zou, Z.; Chen, K.; Shi, Z.; Guo, Y.; Ye, J. Object Detection in 20 Years: A Survey. Proc. IEEE 2023, 111, 257–276. [Google Scholar] [CrossRef]
  23. Khanam, R.; Hussain, M. YOLOv11: An Overview of the Key Architectural Enhancements. arXiv 2024, arXiv:2410.17725. [Google Scholar] [CrossRef]
  24. Tangyong, G.; Peiyuan, W.; Xin, L.; Wei, Z.; Tong, Z.; Shipeng, L.; Qingshan, L. Progress of the Satellite Laser Ranging System TROS1000. Geod. Geodyn. 2015, 6, 67–72. [Google Scholar]
  25. Ma, S.; Xu, Y. MPDIoU: A Loss for Efficient and Accurate Bounding Box Regression. arXiv 2023, arXiv:2307.07662. [Google Scholar] [CrossRef]
  26. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934. [Google Scholar] [CrossRef]
  27. Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications. arXiv 2022, arXiv:2209.02976. [Google Scholar] [CrossRef]
  28. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar] [CrossRef]
  29. Yaseen, M. What Is YOLOv8: An In-Depth Exploration of the Internal Features of the Next-Generation Object Detector. arXiv 2024, arXiv:2408.15857. [Google Scholar] [CrossRef]
  30. Wang, C.Y.; Yeh, I.H.; Liao, H.Y.M. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. arXiv 2024, arXiv:2402.13616. [Google Scholar] [CrossRef]
  31. Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-Time End-to-End Object Detection. arXiv 2024, arXiv:2405.14458. [Google Scholar] [CrossRef]
  32. Caron, M.; Touvron, H.; Misra, I.; Jégou, H.; Mairal, J.; Bojanowski, P.; Joulin, A. Emerging Properties in Self-Supervised Vision Transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 9650–9660. [Google Scholar]
  33. Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO Series in 2021. arXiv 2021, arXiv:2107.08430. [Google Scholar] [CrossRef]
  34. Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully Convolutional One-Stage Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9627–9636. [Google Scholar]
  35. Lyu, C.; Zhang, W.; Huang, H.; Zhou, Y.; Wang, Y.; Liu, Y.; Zhang, S.; Chen, K. RTMDet: An Empirical Study of Designing Real-Time Object Detectors. arXiv 2022, arXiv:2212.07784. [Google Scholar] [CrossRef]
  36. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollar, P. Focal Loss for Dense Object Detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
  37. Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into High Quality Object Detection. arXiv 2017, arXiv:1712.00726. [Google Scholar] [CrossRef]
Figure 1. Structure of SLR-Net.
Figure 2. Structure of DMS-Conv.
Figure 3. Structure of LUM.
Figure 4. Schematic diagram of MPD-IoU.
Figure 5. Visualization of typical samples from the dataset.
Figure 6. Spatial and scale distribution analysis of the ground truth bounding boxes.
Figure 7. Visual comparison of response heatmaps between baseline models and SLR-Net.
Figure 8. Qualitative detection results on challenging SLR scenes with different methods.
Table 1. Experimental Environment Configuration.
| Item | Specification |
|---|---|
| Operating System | Ubuntu 22.04 |
| GPU | RTX 4090 (24 GB) |
| CPU | 16 vCPU Intel Xeon(R) Platinum 8352V @ 2.10 GHz (Intel, Santa Clara, CA, USA) |
| Memory | 120 GB |
| Programming Language | Python 3.10 |
| Framework | PyTorch 2.1.0 + CUDA 12.1 |
| IDE | JupyterLab |
Table 2. Ablation study of SLR-Net on the SLR dataset. DMS: DMS-Conv, LUM: Lightweight Upsampling Module, MPD: MPD-IoU loss [25]. Best results are shown in bold, and second-best are underlined.
| Model | DMS | LUM | MPD | Params (M) | GFLOPs | Prec. (%) | Recall (%) | F1 (%) | mAP50 (%) | mAP75 (%) | mAP50:95 (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Baseline | – | – | – | 2.58 | 6.30 | 88.51 | 86.89 | 87.69 | 90.91 | 42.65 | 47.24 |
| + LUM | – | ✓ | – | 2.59 | 6.30 | 88.87 | 86.77 | 87.80 | 91.20 | 42.70 | 47.22 |
| + DMS | ✓ | – | – | 2.56 | 6.20 | 89.23 | 86.64 | 87.91 | 91.49 | 42.76 | 47.20 |
| + DMS + LUM | ✓ | ✓ | – | 2.57 | 6.70 | 89.94 | 86.39 | 88.13 | 92.07 | 42.87 | 47.15 |
| SLR-Net (Proposed) | ✓ | ✓ | ✓ | 2.57 | 6.70 | 90.30 | 86.27 | 88.24 | 92.36 | 42.92 | 47.13 |
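The MPD component in Table 2 refers to the MPD-IoU loss of Ma and Xu [25], which augments the IoU term with squared corner-to-corner distances normalized by the image diagonal. A minimal PyTorch sketch, based on our reading of that paper (the function and variable names are ours, not the authors' released code, and the exact normalization may differ from their implementation):

```python
import torch

def mpdiou_loss(pred, target, img_w, img_h, eps=1e-7):
    """Illustrative MPD-IoU loss sketch for (x1, y1, x2, y2) boxes of shape (N, 4)."""
    # Standard IoU of the predicted and ground-truth boxes
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Squared distances between matching top-left (d1) and bottom-right (d2) corners
    d1 = (pred[:, 0] - target[:, 0]) ** 2 + (pred[:, 1] - target[:, 1]) ** 2
    d2 = (pred[:, 2] - target[:, 2]) ** 2 + (pred[:, 3] - target[:, 3]) ** 2

    # Normalize by the squared image diagonal so the penalty is scale-invariant
    norm = img_w ** 2 + img_h ** 2
    mpdiou = iou - d1 / norm - d2 / norm
    return 1.0 - mpdiou
```

For a perfectly aligned box the corner penalties vanish and the loss approaches zero; any displacement increases both the IoU term and the corner distances, giving the finer spatial alignment gradient that benefits small-target localization.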
Table 3. Comparison with state-of-the-art detectors on the SLR dataset. Best results are shown in bold, and second-best are underlined.
| Model | Params (M) | GFLOPs | Prec. (%) | Recall (%) | F1 (%) | mAP50 (%) | mAP75 (%) | mAP50:95 (%) | FPS |
|---|---|---|---|---|---|---|---|---|---|
| SLR-Net (Proposed) | 2.57 | 6.70 | 90.30 | 86.27 | 88.24 | 92.36 | 42.92 | 47.13 | 130.39 |
| Base | 2.58 | 6.30 | 88.51 | 86.89 | 87.69 | 90.91 | 42.65 | 47.24 | 201.28 |
| *YOLO-based Detectors* | | | | | | | | | |
| YOLOv3 [20] | 103.67 | 282.20 | 89.57 | 84.15 | 86.77 | 89.45 | 40.11 | 46.93 | 121.94 |
| YOLOv3-tiny [20] | 12.13 | 18.90 | 79.17 | 37.25 | 50.67 | 59.26 | 14.16 | 24.92 | 275.76 |
| YOLOv5-m | 25.05 | 64.00 | 87.85 | 85.29 | 86.55 | 91.41 | 43.64 | 47.27 | 176.75 |
| YOLOv5-n | 2.50 | 7.10 | 86.98 | 85.11 | 86.03 | 87.44 | 36.50 | 44.11 | 216.34 |
| YOLOv6-m [27] | 51.98 | 161.10 | 84.30 | 77.45 | 80.73 | 83.36 | 23.76 | 38.00 | 195.63 |
| YOLOv6-n [27] | 4.23 | 11.80 | 70.90 | 88.24 | 78.63 | 82.79 | 19.21 | 35.99 | 221.42 |
| YOLOv8-m [29] | 25.84 | 78.70 | 86.45 | 87.25 | 86.85 | 91.35 | 42.57 | 48.22 | 195.79 |
| YOLOv8-n [29] | 3.01 | 8.10 | 81.85 | 83.82 | 82.83 | 81.65 | 30.83 | 38.90 | 211.24 |
| YOLOv9-m [30] | 20.16 | 77.00 | 86.32 | 86.63 | 86.48 | 90.94 | 40.08 | 46.50 | 124.36 |
| YOLOv9-t [30] | 1.97 | 7.60 | 87.90 | 87.75 | 87.82 | 90.94 | 43.34 | 47.35 | 137.14 |
| YOLOv10-m [31] | 15.31 | 58.90 | 84.35 | 79.90 | 82.07 | 89.54 | 46.06 | 48.43 | 161.16 |
| YOLOv10-n [31] | 2.27 | 6.50 | 86.31 | 80.37 | 83.23 | 88.55 | 38.24 | 45.63 | 205.11 |
| *Anchor-Free/Transformer-based Detectors* | | | | | | | | | |
| YOLOX-S [33] | 8.94 | 8.52 | 93.37 | 82.84 | 87.79 | 85.80 | 27.80 | 39.50 | 120.35 |
| DINO [32] | 47.54 | 80.70 | 85.37 | 85.78 | 85.57 | 87.70 | 25.60 | 38.50 | 39.95 |
| FCOS [34] | 32.11 | 50.29 | 97.50 | 38.24 | 54.93 | 78.90 | 20.30 | 32.00 | 100.87 |
| RTMDet-S [35] | 8.86 | 9.44 | 89.94 | 70.10 | 78.79 | 79.70 | 16.70 | 30.80 | 106.15 |
| RetinaNet [36] | 36.33 | 52.20 | 52.53 | 76.47 | 62.28 | 60.80 | 3.30 | 20.30 | 99.63 |
| Faster R-CNN [17] | 41.35 | 63.18 | 74.36 | 14.22 | 23.87 | 18.50 | 2.20 | 5.70 | 91.12 |
| Cascade R-CNN [37] | 69.15 | 90.98 | 33.33 | 1.47 | 2.82 | 3.10 | 0.00 | 0.60 | 68.43 |
Table 4. Statistical Performance Comparison. Evaluation of stability and accuracy between YOLOv11 and SLR-Net across 5 independent runs (N = 5). The results (Mean ± Std) demonstrate significant improvements in high-precision localization metrics (mAP75).

| Metric | YOLOv11 (Base) | SLR-Net (Proposed) | ΔMean |
|---|---|---|---|
| mAP50 (%) | 92.40 ± 1.25 | 93.30 ± 0.71 | +0.90 |
| mAP50:95 (%) | 46.20 ± 0.99 | 47.78 ± 0.87 | +1.58 |
| mAP75 (%) | 40.02 ± 3.23 | 43.90 ± 2.55 | +3.88 |
| Precision (%) | 90.00 ± 1.80 | 90.64 ± 1.19 | +0.64 |
| Recall (%) | 86.48 ± 1.59 | 87.50 ± 1.58 | +1.02 |
| F1-Score (%) | 88.18 ± 0.73 | 89.02 ± 0.63 | +0.84 |
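The Mean ± Std figures in Table 4 aggregate five independent training runs per model. A small stdlib sketch of that aggregation (the helper name is ours; it uses the sample standard deviation, since the paper does not state which convention was used):

```python
import statistics

def summarize_runs(base_runs, ours_runs):
    """Summarize one metric across independent runs as Mean ± Std and ΔMean."""
    mean_b, std_b = statistics.mean(base_runs), statistics.stdev(base_runs)
    mean_o, std_o = statistics.mean(ours_runs), statistics.stdev(ours_runs)
    return {
        "base": f"{mean_b:.2f} ± {std_b:.2f}",
        "ours": f"{mean_o:.2f} ± {std_o:.2f}",
        # ΔMean column: mean improvement of the proposed model over the base
        "delta_mean": round(mean_o - mean_b, 2),
    }
```

Reporting a standard deviation alongside the mean, as in Table 4, makes it clear whether a gain (e.g., +3.88 on mAP75) exceeds the run-to-run variability of a single training.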
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhu, W.; Hu, J.; Gong, W.; Wang, Y.; Zhang, Y. SLR-Net: Lightweight and Accurate Detection of Weak Small Objects in Satellite Laser Ranging Imagery. Sensors 2026, 26, 732. https://doi.org/10.3390/s26020732


