Article

Enhanced RT-DETR with Dynamic Cropping and Legendre Polynomial Decomposition Rockfall Detection on the Moon and Mars

College of Earth Sciences, Jilin University, Changchun 130061, China
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(13), 2252; https://doi.org/10.3390/rs17132252
Submission received: 1 May 2025 / Revised: 27 June 2025 / Accepted: 28 June 2025 / Published: 30 June 2025

Abstract

The analysis of rockfall events provides critical insights for deciphering planetary geological processes and reconstructing environmental evolutionary timelines. Conventional visual interpretation methods that rely on orbiter imagery can be inefficient due to the massive datasets involved and the subtle morphological signatures of rockfalls. While deep learning technologies, particularly object detection models, demonstrate transformative potential, they require specific adaptation to planetary imaging constraints, including low contrast, grayscale inputs, and small-target detection. Our coordinated optimization strategy integrates dynamic cropping with architectural innovations: a Kolmogorov–Arnold Network-based C3 module (KANC3) replaces RepC3, using Legendre polynomial decomposition to strengthen feature representation, while our dynamic cropping strategy significantly improves small-target detection in low-contrast grayscale imagery by mitigating background–target imbalance. Experimental validation on the optimized RMaM-2020 dataset demonstrates that the Real-Time Detection Transformer with a ResNet-18 backbone and Kolmogorov–Arnold Network-based C3 module (RT-DETR-R18-KANC3) achieves 0.982 precision, 0.955 recall, and 0.964 mAP50 under low-contrast conditions, representing a 1% improvement over the baseline model and exceeding YOLO-series models by >40% in relative performance metrics.


1. Introduction

The exploration of extraterrestrial environments has been a consistent and significant frontier in scientific research. The Moon and Mars, the nearest celestial bodies, are at the forefront of this venture. In addition to enabling astronomical observations, planetary probes serve as critical research tools. Scientists use the data obtained by probes such as the Lunar Reconnaissance Orbiter (LRO) [1] and Mars Reconnaissance Orbiter (MRO) [2], including large volumes of high-resolution images, for in-depth planetary research. In recent years, deep learning has demonstrated considerable potential in computer vision and has become an essential approach for analyzing the imagery data acquired by these probes [3].
The rockfall phenomenon is a ubiquitous mass wasting process that occurs on many celestial bodies, including the Moon and Mars [4]. Rockfall formation mechanisms encompass complex interactions between natural and environmental factors, primarily induced by endogenous or exogenous events such as impact events [5,6], thermal stress and weathering [7,8], seismic activity [5,9], gravitational influences [5,7], and ice heaving and sublimation [10].
Investigating the origins of rockfalls on extraterrestrial surfaces is crucial for comprehensively understanding the geological characteristics and environmental conditions of these celestial bodies as well as for explaining complex geological phenomena [7]. The earliest lunar observations were conducted during the Apollo missions of the 1970s, where Shoemaker and his colleagues undertook pioneering research [11]. The photographic data from these missions laid the foundation for our understanding of the various processes affecting lunar topography, especially those potentially induced by seismic activity [11]. As we entered the 2000s, extraterrestrial exploration advanced significantly with the application of high-resolution imaging technologies. McEwen et al. (2007) [2] employed HiRISE data collected by the MRO to investigate Martian gullies and slopes, exploring potential rockfall triggers and their implications for water-related processes. Chuang et al. (2007) [12] used high-resolution imagery to analyze the triggering mechanisms of slope stripes, encompassing rockfalls, impact explosions, and seismic activities. Technological innovation drove methodological advancements, exemplified by Senthil Kumar’s team (2016), who innovatively integrated high-resolution image interpretation with numerical simulations during Schrödinger basin structural scarp investigations, quantitatively revealing ballistic transport ranges of lunar surface boulders up to 1.2 km under 1.62 m/s² gravitational acceleration [13]. Kokelaar et al. (2017) conducted an in-depth study of the particle collapse phenomena on the lunar surface utilizing high-resolution images from the LRO [14]. Additionally, Bickel et al. (2019) introduced a novel approach to evaluating the bearing capacity of the lunar surface by measuring rockfall trajectories, with the developed model accurately capturing the soil characteristics of various regions [15]. Notably, machine learning frameworks have been utilized for automatically detecting and classifying rockfall-related features within large-scale image datasets [16], thereby enhancing detection accuracy and providing new insights into their spatial distribution and evolution [17]. Research on rockfalls on the Moon and Mars has advanced from photogeological analyses [17] to machine learning-based detection frameworks. Simultaneously, research methods have evolved from manual feature matching to deep learning implementations, particularly through transfer learning [4,16,17].
A rockfall occurs when meter-scale boulders detach from steep terrains, rapidly descending through rolling, bouncing, and sliding mechanisms, often leaving distinctive trails [18]. A comparative analysis of lunar and Martian rockfalls uncovers fundamental differences in geological expressions and physical mechanisms. Lunar rockfalls exhibit unique dispersion patterns and extended trajectories under microgravity (1.62 m/s²) and near-vacuum conditions (~10⁻¹² kPa), with regolith interactions preserving long-term geological records [19,20,21]. In contrast, Martian rockfalls demonstrate greater dynamic diversity influenced by moderate gravity at 3.71 m/s² and atmospheric drag at approximately 0.6 kPa [22,23], with seasonal water–ice cycles creating transient lubrication layers [7]. These planetary-scale processes collectively record the history of lunar impacts and the evolution of Martian surfaces, while shared characteristics such as boulder morphology and trail persistence provide significant insights. Notably, patterns of rock fractures induced by solar radiation [7] and seismically active zones [20] can be discerned through rockfall distribution analysis. Recent progress in automated rockfall detection has involved employing deep learning frameworks to analyze satellite imagery. Bickel et al. (2019) showcased the efficacy of using RetinaNet for lunar rockfall identification, achieving 83% precision with a 40% increase in processing speed compared to manual methods [17]. Subsequent work by Bickel et al. (2020) [20] generated the first global lunar rockfall inventory with this architecture via transfer learning adaptation, revealing unprecedented surface erosion patterns. Expanding this approach, Bickel et al. (2021) pioneered multi-domain learning by integrating Martian and lunar datasets across six CNN architectures (ResNet-50 to EfficientNet-B7), showing a 22% improvement in accuracy in cross-planetary detection tasks [16]. Zoumpekas et al. (2021) [24] addressed terrestrial applications through an attention-based U-Net framework with synthetic data augmentation, resolving class imbalance issues (an F1-score improvement from 0.62 to 0.79). Crucially, these studies demonstrate that domain adaptation techniques can enhance detector robustness across planetary bodies. The engineering feasibility of YOLOv8 and MobileNetv2 for real-time lunar rock detection was validated by Rajasekhar et al. (2024). Their LSOD-2023 multi-source dataset achieved 37 FPS inference speeds, providing a lightweight solution for autonomous obstacle avoidance in future unmanned probes [25].
Despite these advances, significant challenges persist in adapting deep learning to planetary contexts. Low-contrast conditions in grayscale images obscure subtle textural differences between rockfalls and backgrounds. The absence of chromatic information in grayscale inputs eliminates color-based discriminative features. Small-target detection is compromised by scale imbalances where rockfalls occupy <2% of the image area. These constraints collectively degrade feature extraction and localization accuracy in existing frameworks.
This study conducts a systematic investigation into the dynamic cropping strategy and optimization methodologies for the RT-DETR-R18 model. The introductory section examines the formation mechanisms and geological implications of rockfall phenomena observed on lunar and Martian surfaces, and systematically reviews recent advancements in deep learning-based automated detection methodologies, highlighting their transformative potential in planetary geological analysis. The Materials and Methods section elaborates on the implementation principles of the dynamic cropping strategy while delineating the refinement pathway for RT-DETR-R18 through KANC3 module integration and reconstruction. The Conclusion section shows that dynamic cropping and the architectural modifications effectively enhance rockfall detection performance and discusses the significance of these results. Current constraints, including suboptimal contrast characteristics and textural ambiguities, are identified as critical limiting factors. Future research may prioritize the development of multimodal fusion coupled with advanced image enhancement techniques to address these limitations in grayscale images.

2. Materials and Methods

2.1. Datasets

The dataset used in this study was obtained from the publicly accessible repository at https://edmond.mpdl.mpg.de/imeji/collection/DowTY91csU3jv9S2 (accessed on 27 June 2025). Bickel et al. (2021) [26] developed ‘RMaM-2020’, a well-balanced dataset consisting of 2822 labeled rockfall instances (1508 from Mars and 1314 from the Moon), derived from high-resolution orbital imagery captured by NASA’s HiRISE (Mars Reconnaissance Orbiter, 0.25–0.75 m/pixel) and LRO NAC (Lunar Reconnaissance Orbiter, 0.5–2.2 m/pixel). More specifically, the lunar dataset contains 814 grayscale images with 300 negative instances, while the Martian dataset includes 694 grayscale images with 300 negative instances (Figure 1 and Figure 2).
To explore the characteristics of the dataset, we used a Python script in the Spyder 5.5.1 IDE to generate distribution maps illustrating the area ratio of rockfall features on the Moon and Mars based on information from the CSV files (Figure 3). The resulting figures indicate that the distribution of rockfall area ratios is remarkably similar between the Moon and Mars. The horizontal axis represents the two data sources, “Moon” and “Mars”, while the vertical axis denotes the proportion of fallen-rock area within each image. The box plots clearly depict the data distribution for both celestial bodies, with median area ratios for lunar and Martian rockfalls remaining approximately consistent at around 1.0%. The interquartile ranges are also very close, concentrated primarily between 0.5% and 2.5%. Overall, lunar and Martian rockfall area ratios are predominantly concentrated between 0% and 2%, exhibiting a dense distribution. This indicates that the overwhelming majority of ground truths occupy relatively small areas, leaving micro-scale rockfall targets potentially obscured by complex geological backgrounds.
Based on Figure 3, morphological processing methods were additionally employed to extract features from the raw images. The proposed method achieves collaborative feature extraction of Martian and lunar rockfall boulders and traces through a multi-stage image processing pipeline. Figure 4 and Figure 5 illustrate the processing workflow for two representative samples.
(a)
Original image: The raw grayscale input preserves the original textural features of the lunar and Martian surfaces. Annotations are explicitly marked with blue bounding boxes, with category labels anchored at the top of each box.
(b)
Enhanced boulder: Local contrast is amplified using CLAHE [27] (clip limit = 3.0, tile grid = 2 × 2), coupled with bilateral filtering (σ_d = 85, σ_r = 85) for noise suppression. A spatial weighting matrix with Gaussian attenuation (decay coefficient = 0.3) focuses on regions of interest through weighted fusion (0.7:0.3), accentuating potential boulder structures.
(c)
Boulder detection: Adaptive thresholding with dynamic parameters (Gaussian window = 15 × 15, C = 5) generates binarized images. Morphological gradients (elliptical kernel = 9 × 9) enhance edges, followed by contour filling after area filtering (threshold = 5% of image area).
(d)
High-pass trace: A custom 3 × 3 high-pass kernel (central weight = 9) amplifies gradient responses of linear features. Multi-scale morphological gradients (kernels = 3/5/7) preserve traces of varying thicknesses. Sobel operators [28] (kernel = 9) compute gradient magnitudes, with percentile thresholding (P85) extracting candidate regions.
(e)
Trace detection: Wolf binarization [29] (window = 5 × 5, k = 0.05) optimizes thin-line features, complemented by morphological closing (elliptical kernel = 2 × 2) to connect fragmented segments. Dynamic line parameters (min_length = 3 px, max_gap = 1 px) ensure trace continuity, while skeletonization improves morphological representation.
(f)
Integrated results: Red contours denote detected boulders, while green regions indicate trace distributions. The validation mechanism calculates real-time feature coverage (boulder: 0.5–3.0%; trace: 0.1–2.0%).
The results shown in Figure 4 and Figure 5, while representing the best performance in our experiments, still exhibit elevated anomaly alerts, indicating that morphology-based feature extraction is insufficiently effective for boulder and trace characteristics. A comparative analysis of Figure 4c,e and Figure 5c,e reveals that fixed-scale morphological kernels adapt poorly to dimensional variations in boulders and intensity heterogeneity in trace patterns. This limitation leads to significant spatial overlap of similar features in the composite results (Figure 4f and Figure 5f), substantially reducing filtration efficiency for non-target regions. The rockfall trajectory traces present particularly weak textural signatures in raw grayscale images and are often misclassified as background noise.
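For readers who wish to reproduce this baseline, the following is a minimal OpenCV sketch of the boulder enhancement and detection stages, (b) and (c) above. The parameter values are those quoted in the list; the function structure, variable names, and the reading of the 5% area filter as an upper bound are our assumptions rather than the released code.

```python
import cv2
import numpy as np

def enhance_and_detect_boulders(gray):
    """Sketch of stages (b)-(c): CLAHE + bilateral filtering, then adaptive
    thresholding and morphological gradients to isolate candidate boulders."""
    # (b) Local contrast enhancement and edge-preserving denoising
    clahe = cv2.createCLAHE(clipLimit=3.0, tileGridSize=(2, 2))
    enhanced = clahe.apply(gray)
    denoised = cv2.bilateralFilter(enhanced, d=9, sigmaColor=85, sigmaSpace=85)
    # Weighted fusion (0.7:0.3) of the denoised image with the raw input
    fused = cv2.addWeighted(denoised, 0.7, gray, 0.3, 0)

    # (c) Adaptive (Gaussian-window) thresholding and morphological gradient
    binary = cv2.adaptiveThreshold(fused, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY, 15, 5)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (9, 9))
    edges = cv2.morphologyEx(binary, cv2.MORPH_GRADIENT, kernel)

    # Area filtering: keep contours below 5% of the image area (our reading
    # of the threshold), then fill them to produce a candidate-boulder mask
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    mask = np.zeros_like(gray)
    max_area = 0.05 * gray.shape[0] * gray.shape[1]
    for c in contours:
        if cv2.contourArea(c) < max_area:
            cv2.drawContours(mask, [c], -1, 255, thickness=cv2.FILLED)
    return mask
```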

2.2. Object Detection Model

Currently, object detectors mainly utilize CNN- and Transformer-based architectural frameworks. CNN-based detectors have evolved from two-stage [30] to single-stage [31] models, which can be categorized into anchor-based [30] and anchor-free [32] detection paradigms. In contrast, Transformer-based detectors (DETRs) [33] simplify the detection process by eliminating handcrafted components, such as Non-Maximum Suppression (NMS), thereby enabling end-to-end object detection. Although existing real-time detectors typically employ CNN-based architectures to balance detection speed and accuracy, their dependence on NMS for post-processing results in inference latency and reduced robustness [34]. The Real-Time Detection Transformer (RT-DETR) [35] is an efficient object detection model based on the Transformer architecture, representing an optimized version of DETR. RT-DETR enables real-time object detection by reducing computational complexity and accelerating inference speed. Unlike traditional anchor-based methods, RT-DETR employs an end-to-end detection approach, which directly generates bounding boxes and class labels from images.

2.3. Training, Validation, and Test Datasets

The experimental results illustrated in Figure 1, Figure 2, Figure 3, Figure 4 and Figure 5 reveal that the inherent characteristics of raw grayscale images hinder their direct compatibility with model processing requirements. Primarily, the grayscale imaging mechanism’s reliance on luminance variations results in inadequate representation of edge features and textural attributes, while interfering textures in background regions that resemble target features induce notable feature aliasing. Furthermore, the severe imbalance in pixel distribution between target regions and the entire image leads to critical class imbalance, causing models to learn substantial background noise and consequently exhibit suboptimal performance metrics. These analytical findings collectively substantiate that the direct utilization of unoptimized raw data as model inputs constitutes a non-ideal solution. Therefore, we developed a dynamic cropping strategy to focus on localized features, which directly mitigates the background–target imbalance by selectively amplifying regions of interest while suppressing irrelevant background pixels. The strategy integrates dynamic scaling and boundary verification through a three-stage process built on the following core principles:
1. Coordinate space normalization

A resolution-independent normalized mapping system can be established to achieve cross-scale spatial alignment [34]:

$X_{\mathrm{abs}} = X_{\mathrm{norm}} \times W$ (1)

where normalized coordinates $X_{\mathrm{norm}} \in [0, 1]$ are scaled to absolute pixel values through (1).
2. Logarithmic scaling

An area-adaptive magnification mechanism can be constructed as follows [36]:

$\Delta r = \beta \ln\left(1 + \dfrac{A_{\mathrm{base}}}{A_{\mathrm{obj}} + \epsilon}\right)$ (2)

where the baseline scaling ratio $\beta = 0.35$ balances detail preservation and background correlation, and the reference area $A_{\mathrm{base}} = 400^2$ matches conventional target scale distributions. $A_{\mathrm{obj}}$ denotes the actual pixel area of the target object, calculated from its bounding box coordinates. The small-object enhancement approaches $\Delta r \to \beta \ln(10^5)$ as $A_{\mathrm{obj}} \to 0$, and the stabilization constant $\epsilon = 10^{-5}$ prevents zero-area degeneration.
3. Boundary-constrained clipping

A spatial completeness guarantee mechanism can be constructed as follows [37]:

$x_i^{\mathrm{final}} = \mathrm{clip}\!\left(x_i \pm \dfrac{w \, \Delta r}{2},\ 0,\ W\right)$ (3)

where $w$ is the original width and $\mathrm{clip}(\cdot)$ keeps values within the image dimensions. This constraint system ensures that (1) extended regions remain within physical boundaries; (2) central symmetry $\frac{x_1 + x_2}{2} = C$ holds consistently; and (3) the aspect ratio perturbation rate is <2% through dual verification.
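Taken together, the three stages reduce to a short routine. The sketch below is our reconstruction from Equations (1)–(3) and the stated constants ($\beta = 0.35$, $A_{\mathrm{base}} = 400^2$, $\epsilon = 10^{-5}$); expressing $A_{\mathrm{obj}}$ in units of $A_{\mathrm{base}}$, so that $\Delta r \to \beta \ln(10^5)$ as $A_{\mathrm{obj}} \to 0$, is our reading of the small-object limit, and the function name is hypothetical.

```python
import numpy as np

def dynamic_crop(box_norm, img_w, img_h, beta=0.35, a_base=400.0**2, eps=1e-5):
    """Three-stage dynamic cropping: normalization (Eq. 1), logarithmic
    scaling (Eq. 2), and boundary-constrained expansion (Eq. 3)."""
    # Stage 1: normalized [0, 1] coordinates -> absolute pixel coordinates
    x1, y1, x2, y2 = box_norm
    x1, x2 = x1 * img_w, x2 * img_w
    y1, y2 = y1 * img_h, y2 * img_h
    w, h = x2 - x1, y2 - y1

    # Stage 2: area-adaptive expansion ratio; smaller targets get larger margins
    a_obj = (w * h) / a_base            # target area in units of A_base
    delta_r = beta * np.log(1.0 + 1.0 / (a_obj + eps))

    # Stage 3: symmetric expansion around the box, clipped to image boundaries
    x1f = np.clip(x1 - w * delta_r / 2, 0, img_w)
    x2f = np.clip(x2 + w * delta_r / 2, 0, img_w)
    y1f = np.clip(y1 - h * delta_r / 2, 0, img_h)
    y2f = np.clip(y2 + h * delta_r / 2, 0, img_h)
    return x1f, y1f, x2f, y2f
```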
The positive images (Figure 6 and Figure 7) from the lunar and Martian datasets underwent dynamic adaptive cropping, implementing small-target amplification through the area-adaptive magnification of Equation (2), while all negative samples were preserved intact as a background dataset, resulting in 2000 positive samples and 600 negative samples. During data reorganization, the 2600 cross-domain training [16] samples were randomly divided at an 8:2 ratio to establish new training (2080 samples) and validation (520 samples) sets. Simultaneously, 100 cropped lunar test samples and 121 cropped Martian test samples from the original test set were combined to construct a novel composite test set containing 221 samples. Figure 8 and Figure 9 show the morphological feature extraction results for the cropped images, obtained with the same pipeline as Figure 4 and Figure 5.

2.4. Structure of the Enhanced Object Detection Model

The RT-DETR-R18-KANC3 model (Figure 10) introduces architectural modifications to the baseline RT-DETR-R18 framework, primarily replacing the RepC3 modules [35] in the original Feature Pyramid Network (FPN) with KANC3 modules [38]. Critical improvements occur at four strategic positions in the detection head (Layers 14, 19, 22, and 25), where feature enhancement units based on KALNConv2DLayer reconstruct the multi-scale feature fusion mechanism. While preserving the backbone network and spatial pyramid architecture, this modification substitutes the parametric convolutions in the original 3 × RepC3 modules with KALNConv2DLayer, incorporating Legendre polynomial basis expansions. This architectural evolution maintains computational efficiency while strengthening nonlinear representation capabilities through feature decomposition.
Legendre polynomial basis expansions are carried out as follows [39]:
$f(x) = \sum_{n=0}^{N} a_n P_n(x)$

The target function $f(x)$ is defined on $x \in [-1, 1]$, where the expansion coefficients $a_n$ are given by the following orthogonal projection:

$a_n = \dfrac{2n+1}{2} \int_{-1}^{1} f(x)\, P_n(x)\, dx$

The Legendre polynomials $P_n(x)$ form a complete orthogonal basis of the $L^2[-1, 1]$ function space, satisfying the orthogonality condition:

$\int_{-1}^{1} P_m(x)\, P_n(x)\, dx = \dfrac{2}{2n+1}\, \delta_{mn}, \quad \text{with} \quad \delta_{mn} = \begin{cases} 1, & \text{if } m = n \\ 0, & \text{otherwise} \end{cases}$
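In practice, the basis is generated with the Bonnet recurrence $(n+1)P_{n+1}(x) = (2n+1)\,x\,P_n(x) - n\,P_{n-1}(x)$. The short NumPy sketch below (our own illustration, not the authors' code) evaluates $P_0$ through $P_3$ and numerically verifies the orthogonality condition above.

```python
import numpy as np

def legendre_basis(x, degree):
    """Evaluate Legendre polynomials P_0..P_degree at x via the Bonnet recurrence."""
    basis = [np.ones_like(x), x.copy()]
    for n in range(1, degree):
        # (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
        basis.append(((2 * n + 1) * x * basis[n] - n * basis[n - 1]) / (n + 1))
    return basis[: degree + 1]

# Numerical check of the orthogonality condition over [-1, 1]
x = np.linspace(-1.0, 1.0, 200001)
dx = x[1] - x[0]
P = legendre_basis(x, 3)
for m in range(4):
    for n in range(4):
        integral = np.sum(P[m] * P[n]) * dx
        expected = 2.0 / (2 * n + 1) if m == n else 0.0
        assert abs(integral - expected) < 1e-3, (m, n, integral)
```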
The KANC3 module (Figure 11) achieves structural innovation by inheriting RepC3 architecture while reconstructing feature processing pathways. During feature fusion stages, it employs the serialized stacking of KALNConv2DLayers to establish deep feature interactions through three consecutive KALN convolution operations, directly addressing the integration of fragmented textures in low-visibility environments. The implementation first compresses input channel dimensions using the expansion coefficient e (halving channels when e = 0.5 ); then, it performs feature transformation via three independent grouped polynomial convolutions, culminating in the channel-wise concatenation and fusion of multi-scale features. The processed features undergo channel-wise concatenation to reconstruct topological traces, effectively overcoming conventional convolutions’ limitations in perceiving sparse patterns. Within the RT-DETR-R18-KANC3, this module replaces original RepC3 units at cross-scale feature aggregation nodes in the Feature Pyramid Network. By maintaining 3 × 3 kernel dimensions in spatial domains while introducing polynomial expansion mechanisms in channel dimensions, this dual approach significantly enhances low-contrast target discrimination in dust-obscured regions through multi-scale feature decomposition while preserving computational efficiency.
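Read as code, this description corresponds to a C3-style block whose inner stack is three serialized KALN convolutions. The sketch below is our interpretation: the two-branch layout follows the common C3 pattern, `kaln_layer` stands in for the KALNConv2DLayer sketched in the next subsection, and the published module may differ in detail.

```python
import torch
import torch.nn as nn

class KANC3Sketch(nn.Module):
    """C3-style block with three stacked KALN convolutions in the main branch.
    `kaln_layer(c_in, c_out)` is a factory for the KALN conv module; the
    branch layout is an assumption based on the textual description."""

    def __init__(self, c_in, c_out, kaln_layer, e=0.5):
        super().__init__()
        c_hidden = int(c_out * e)  # channel compression: halved when e = 0.5
        self.cv1 = nn.Conv2d(c_in, c_hidden, 1, bias=False)  # compress main branch
        self.cv2 = nn.Conv2d(c_in, c_hidden, 1, bias=False)  # compress bypass branch
        # Serialized stacking: three consecutive KALN convolution operations
        self.m = nn.Sequential(*(kaln_layer(c_hidden, c_hidden) for _ in range(3)))
        self.cv3 = nn.Conv2d(2 * c_hidden, c_out, 1, bias=False)  # fuse after concat

    def forward(self, x):
        # Channel-wise concatenation of transformed and bypass features, then fusion
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))
```

In the full model, a block of this shape would replace RepC3 at the four FPN fusion positions noted above, e.g., `KANC3Sketch(256, 256, KALNConv2DSketch)` using the layer sketched next.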
The KALNConv2DLayer (Figure 12) operates as a core convolutional module implementing feature enhancement through group-wise polynomial expansion architecture to resolve low target–background contrast and weak texture identification. Building upon standard convolution, this 2D-specialized implementation integrates Legendre polynomial basis expansion. Its architecture comprises multiple independent grouped processing paths, each containing fundamental components: a Conv2d layer, InstanceNorm2d, and SiLU activation. Input features are uniformly partitioned across groups, processed through base convolutional operations, and subsequently combined with polynomial path outputs. The polynomial pathway computes Legendre polynomial expansions of input features (with maximum degree controlled by the parameter degree), followed by learnable polynomial weight tensor convolution. Notably, the module employs Kaiming uniform initialization for both convolutional kernels and polynomial weights, with input features normalized to a [−1, 1] interval to ensure numerical stability in low-illumination conditions during polynomial computation:
$x_{\mathrm{out}} = \mathrm{SiLU}\!\left(\mathrm{InstanceNorm}\!\left(W_{\mathrm{base}} * x + \sum_{k=0}^{3} \alpha_k \odot P_k(x_{\mathrm{norm}})\right)\right)$

where $x_{\mathrm{out}} \in \mathbb{R}^{C_o \times H \times W}$ denotes the output feature tensor with $C_o$ output channels and spatial dimensions $H \times W$. The operator $*$ represents the convolution operation implemented through $W_{\mathrm{base}} \in \mathbb{R}^{C_o \times C_i \times k \times k}$, with the convolutional kernel weights transforming the $C_i$-channel input $x$. The Hadamard product ($\odot$) performs element-wise multiplication between the learnable polynomial coefficients $\alpha_k \in \mathbb{R}^{C_o \times C_i}$ and the $k$-th order Legendre basis functions $P_k$ evaluated at $x_{\mathrm{norm}}$. The min–max normalized input satisfies $x_{\mathrm{norm}} \in [-1, 1]^{C_i \times H \times W}$. The $\mathrm{InstanceNorm}$ operator applies instance-wise standardization, and $\mathrm{SiLU}$ denotes the Sigmoid-weighted Linear Unit activation function.
Kaiming uniform initialization is as follows:
$W \sim U\!\left(-\sqrt{\dfrac{6}{n_{\mathrm{in}}}},\ \sqrt{\dfrac{6}{n_{\mathrm{in}}}}\right)$

where $W$ denotes the weight tensor being initialized, $\sim$ indicates the statistical distribution, $U(a, b)$ represents a continuous uniform distribution over the interval $[a, b]$, and $n_{\mathrm{in}}$ specifies the number of input units (fan-in) to the layer. The scaling factor $\sqrt{6/n_{\mathrm{in}}}$ is derived from variance-scaling principles to maintain activation variances during forward propagation.
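Combining the polynomial expansion, instance normalization, SiLU activation, and the Kaiming uniform bound $\sqrt{6/n_{\mathrm{in}}}$, a simplified single-group KALNConv2DLayer might look as follows. This is a sketch after the TorchKAN design [38], not the repository's code; in particular, realizing each $\alpha_k$ as a convolution over the $k$-th basis map is one plausible reading of the "learnable polynomial weight tensor convolution".

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KALNConv2DSketch(nn.Module):
    """Simplified (single-group) Legendre-expansion conv layer."""

    def __init__(self, c_in, c_out, kernel_size=3, degree=3):
        super().__init__()
        self.degree = degree
        pad = kernel_size // 2
        # Base convolutional path: W_base * x
        self.base_conv = nn.Conv2d(c_in, c_out, kernel_size, padding=pad, bias=False)
        # Polynomial path: one learnable conv per Legendre order (our reading of alpha_k)
        self.poly_convs = nn.ModuleList(
            nn.Conv2d(c_in, c_out, kernel_size, padding=pad, bias=False)
            for _ in range(degree + 1)
        )
        self.norm = nn.InstanceNorm2d(c_out)
        # Kaiming uniform initialization; with gain sqrt(2) the bound is sqrt(6 / n_in)
        for conv in [self.base_conv, *self.poly_convs]:
            nn.init.kaiming_uniform_(conv.weight, nonlinearity="relu")

    def forward(self, x):
        # Min-max normalize inputs to [-1, 1] for numerically stable polynomials
        x_min = x.amin(dim=(2, 3), keepdim=True)
        x_max = x.amax(dim=(2, 3), keepdim=True)
        x_norm = 2 * (x - x_min) / (x_max - x_min + 1e-8) - 1

        # Legendre basis P_0..P_degree via the Bonnet recurrence
        basis = [torch.ones_like(x_norm), x_norm]
        for n in range(1, self.degree):
            basis.append(((2 * n + 1) * x_norm * basis[n] - n * basis[n - 1]) / (n + 1))

        out = self.base_conv(x)
        for k in range(self.degree + 1):
            out = out + self.poly_convs[k](basis[k])
        return F.silu(self.norm(out))
```

The [-1, 1] min–max normalization matches the Legendre domain and keeps high-order terms bounded, which the description above credits for numerical stability under low-illumination conditions.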

2.5. Experimental Process

The model performance was evaluated using five principal metrics: precision, recall, mean average precision at IoU = 0.5 (mAP50), mAP across IoU thresholds from 0.5 to 0.95 (mAP50-95), and F1 score. Precision quantifies the proportion of correctly identified positive instances among all predicted positives, while recall measures the fraction of actual positives successfully detected. These metrics are formally defined as follows:
$\mathrm{Precision} = \dfrac{TP}{TP + FP}$

$\mathrm{Recall} = \dfrac{TP}{TP + FN}$

where TP denotes true positives, FP denotes false positives, and FN denotes false negatives. The F1 score provides the harmonic mean integration of precision and recall:

$F1 = \dfrac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$

Average precision (AP) per class is calculated as the area under the precision–recall curve across recall levels. The mean average precision (mAP) aggregates performance across all $S$ object categories:

$\mathrm{mAP} = \dfrac{1}{S} \sum_{i=1}^{S} \mathrm{AP}_i$

Intersection over Union (IoU) serves as the fundamental localization metric in object detection, measuring bounding box overlap between predictions and ground truths:

$\mathrm{IoU} = \dfrac{\mathrm{Area\ of\ Overlap}}{\mathrm{Area\ of\ Union}}$
The mAP50 represents the mean AP across categories at IoU = 0.5 threshold, while mAP50-95 evaluates detection robustness by averaging AP values over multiple IoU thresholds from 0.5 to 0.95 at 0.05 increments. This multi-threshold metric imposes stricter localization requirements, providing a comprehensive assessment of model performance across varying detection precision levels.
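For concreteness, the box-level IoU and the precision/recall/F1 computations above reduce to a few lines of Python; this generic sketch implements the standard definitions and is not the evaluation code used in the experiments.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def detection_metrics(tp, fp, fn):
    """Precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# A prediction counts as a TP when its IoU with an unmatched ground-truth
# box meets the threshold, e.g. box_iou(pred, gt) >= 0.5 for mAP50.
```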
For Experiment I, we conducted a comparative analysis between raw image inputs and dynamically cropped inputs using the RT-DETR-R18 model. This experiment systematically evaluated performance variations across different input dimensions under identical experimental conditions, revealing the impact of dynamic adaptive cropping on model performance.
Experiment II presents a comprehensive performance comparison between the modified RT-DETR-R18-KANC3 model and its baseline counterpart RT-DETR-R18, along with three state-of-the-art YOLO-series detectors: YOLOv8, YOLOv10, and YOLO11. The evaluation framework maintains identical training and evaluation metrics across all compared models to ensure methodological consistency in architectural capability assessment. The parameter configurations for Experiment II are systematically documented in Table 1. All comparative models adhere to identical training conditions and hardware configurations to ensure fair performance evaluation across different architectures.

3. Results

3.1. Experiment I

The results of Experiment I (Table 2) demonstrate that the RT-DETR-R18 model exhibits distinct performance variations under different input scales. Across the original (uncropped) input dimensions (320/640/1024), enlarging the input size from 640 to 1024 caused the test-set mAP50 to decline significantly from 0.264 to 0.162. Conversely, when employing cropped inputs at a 320 × 320 resolution, the model achieved superior detection precision with a mAP50 of 0.953 while maintaining a high recall rate of 0.955. This indicates that dynamic cropping operations effectively enhance both detection precision and robustness for small-scale input representations.
Figure 13 illustrates the detection performance under the original input dimensions (640 × 640/1024 × 1024): (a) displays ground-truth annotations containing filenames with corresponding “rockfall” labels; (b) shows detection results at a 640 × 640 resolution, where confidence scores primarily range between 0.3 and 0.4 accompanied by false negatives; and (c) presents 1024 × 1024 detection outputs, demonstrating an expanded confidence range (0.3–0.9) while maintaining comparable false negative occurrences.

3.2. Experiment II

Figure 14, Figure 15, Figure 16, Figure 17 and Figure 18 and Table 3 present the results of Experiment II. Table 3 demonstrates the RT-DETR series’ superior performance on the optimized dataset, where the enhanced RT-DETR-R18-KANC3 variant achieves validation/test set mAP50 scores of 0.967/0.964, respectively, outperforming the baseline model RT-DETR-R18 by 0.9–1.1 percentage points, thereby confirming the KANC3 module’s effectiveness. In contrast, YOLOv10 attains a validation mAP50 of only 0.511, with all YOLO variants yielding zero-valued test set metrics. Post hoc analysis revealed substantial false positives and negatives in detection outputs rather than data recording errors.
Figure 14 comparatively visualizes training-phase metrics, including precision, recall, mAP50, and mAP50-95, across different models.
Figure 15 demonstrates a comparative analysis of F1 scores across confidence threshold intervals for rockfall detection models. The RT-DETR-R18-KANC3 model (orange curve) maintains superior performance within 0.6–0.8 confidence ranges, achieving peak F1 scores (≈0.97) that notably surpass baseline RT-DETR-R18 (blue) and YOLO variants (green/red/purple). Notably, YOLOv10 (red) exhibits marked performance fluctuations in low-confidence regimes (0.2–0.4), showing an F1 score decline of approximately 0.3 compared to higher confidence intervals.
Figure 16 visually demonstrates detection failures in YOLO-series implementations through representative samples containing true-positive and false-positive examples; the false positives in (d) are marked with blue boxes.
Figure 17 compares receptive field characteristics between the baseline RT-DETR-R18 and the improved model RT-DETR-R18-KANC3 using gradient-weighted activation maps from the final backbone layer. R18-KANC3 (Figure 17b) exhibits marked differences in receptive field characteristics compared to the baseline R18 (Figure 17a), manifested through expanded spatial coverage of high-contribution regions and reduced effective receptive field radius. This parameter evolution indicates that the R18-KANC3 enhances spatial focusing capacity through optimized attention distribution during feature extraction processes.
Figure 18 visually contrasts detection performance improvements through annotated samples from the test set. Black boxes indicate ground-truth labels, blue boxes show false positives, red boxes denote missed detections, and green boxes represent correct detections.

4. Discussion

This study proposes a coordinated optimization strategy that integrates dynamic cropping refinement with model architectural modifications. The dynamic cropping strategy combines coordinate normalization, logarithmic feature scaling, and boundary constraint mechanisms to enhance small-target feature representation in grayscale images and mitigate background–target imbalance. Architecturally, to address the challenge of low target–background contrast in grayscale imagery, the KANC3 unit replaces the standard RepC3 modules, with its Legendre polynomial-based KALNConv2D layers strengthening feature space decomposition [38,39]. This approach specifically boosts the differential characteristics between targets and backgrounds. Furthermore, the KANC3 unit incorporates multi-scale feature decoupling, which proves highly effective in separating critical target features from pervasive low-illumination noise. To specifically combat the misclassification of faint textural traces, the framework employs grouped polynomial convolutions, which amplify local gradient responses and enhance the discernibility of subtle structures. Complementing this, deep feature interactions facilitated by serialized KANC3 structures enable the reconstruction of trajectory topology through channel-wise compression and fusion mechanisms. Collectively, these mechanisms provide a comprehensive solution for overcoming the perceptual difficulties encountered in grayscale image analysis. The experimental results demonstrate the optimized input scheme’s detection superiority across resolutions, achieving 0.953 mAP50 and 0.955 recall in the optimal configuration. Quantitative analysis confirms 1% improvements in recall, mAP50, and mAP50-95 over the baseline model, establishing a novel technical pathway for addressing morphological ambiguities, spectral information scarcity, and data quality constraints in grayscale image analysis.
In Experiment I, the suboptimal detection performance on raw imagery stems from three interrelated factors:
(1)
Limited spatial resolution obscures micro-scale rockfall targets within complex geological backgrounds, particularly in regions with static megaclasts (e.g., Martian debris cones and aeolian tails) that exhibit morphological similarities to rockfall features [17];
(2)
Grayscale inputs lack multispectral discriminability; combined with the low-illumination dust mantling common in extraterrestrial environments, this further reduces target–background contrast [40];
(3)
Rockfall trajectory traces present weak textural signatures in grayscale imagery, often being misclassified as background noise due to their fragmented morphological patterns (Figure 4 and Figure 5).
In Experiment II, the superior performance of RT-DETR-R18-KANC3 over both its baseline counterpart (RT-DETR-R18) and YOLO-series models (YOLOv8/v10/11) can be attributed to three synergistic factors:
(1)
End-to-end detection paradigm: the elimination of NMS post-processing in RT-DETR reduces false-positive aggregation in low-contrast terrains (Figure 13 and Figure 16) [35];
(2)
RT-DETR’s self-attention mechanism propagates low-contrast edge information across receptive fields without degradation (Figure 17) [33];
(3)
Legendre polynomial decomposition: the KANC3 module enhances sensitivity to texture signatures under grayscale ambiguity, which explains the 1% improvement over its baseline [38].

5. Conclusions

This study presents an enhanced RT-DETR framework optimized for rockfall detection, built on the dynamic cropping strategy and KANC3 module integration. By addressing the challenges associated with low-contrast grayscale imagery and small-target recognition, the proposed RT-DETR-R18-KANC3 model achieves state-of-the-art performance, with 0.964 mAP50 and 0.955 recall on the optimized RMaM-2020 dataset, outperforming the baseline model by 1% and YOLO-series detectors by over 40%. The success of dynamic cropping and Legendre polynomial-based feature decomposition demonstrates significant potential for automated geological analysis in extraterrestrial exploration, while current limitations in texture ambiguity highlight the need for future multimodal fusion approaches.
Future research directions may focus on the following:
(1)
Developing multimodal fusion integrating high-resolution imagery with digital elevation models (DEMs) to enable concurrent detection;
(2)
Implementing a morphology-adaptive cropping strategy based on rock dimension parameters to optimize region-of-interest selection;
(3)
Adopting defect detection-inspired approaches combining background suppression (using wavelet decomposition) and texture enhancement (through Retinex-based processing) to address small-target detection in low-quality planetary imagery.

Author Contributions

Conceptualization, P.Z. and J.H.; methodology, P.Z.; validation, Y.Y., Y.L. and H.Z.; formal analysis, Y.Y.; investigation, Y.L.; resources, J.H.; data curation, H.Z.; writing—original draft preparation, P.Z. and J.H.; writing—review and editing, P.Z. and J.H.; visualization, P.Z.; supervision, J.H.; project administration, P.Z.; funding acquisition, J.H. All authors have read and agreed to the published version of the manuscript.

Funding

This study was partially supported by the National Key Research and Development Program of China (No. 2020YFA0714103).

Data Availability Statement

The data that support the findings of this study are available at https://edmond.mpdl.mpg.de/imeji/collection/DowTY91csU3jv9S2 (accessed on 27 June 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Robinson, M.S.; Brylow, S.M.; Tschimmel, M.; Humm, D.; Lawrence, S.J.; Thomas, P.C.; Denevi, B.W.; Bowman-Cisneros, E.; Zerr, J.; Ravine, M.A.; et al. Lunar Reconnaissance Orbiter Camera (LROC) Instrument Overview. Space Sci. Rev. 2010, 150, 81–124. [Google Scholar] [CrossRef]
  2. McEwen, A.S.; Eliason, E.M.; Bergstrom, J.W.; Bridges, N.T.; Hansen, C.J.; Delamere, W.A.; Grant, J.A.; Gulick, V.C.; Herkenhoff, K.E.; Keszthelyi, L.; et al. Mars Reconnaissance Orbiter’s High Resolution Science Experiment (HiRISE). J. Geophys. Res. Planets 2007, 112, E05S02. [Google Scholar] [CrossRef]
  3. LeCun, Y.; Bengio, Y.; Hinton, G. Deep Learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef]
  4. Bickel, V.T.; Conway, S.J.; Tesson, P.-A.; Manconi, A.; Loew, S.; Mall, U. Deep Learning-Driven Detection and Mapping of Rockfalls on Mars. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 2831–2841. [Google Scholar] [CrossRef]
  5. Bickel, V.T.; Aaron, J.; Manconi, A.; Loew, S. Global Drivers and Transport Mechanisms of Lunar Rockfalls. J. Geophys. Res. Planets 2021, 126, e2021JE006824. [Google Scholar] [CrossRef]
  6. Hörz, F.; Basilevsky, A.T.; Head, J.W.; Cintala, M.J. Erosion of lunar surface rocks by impact processes: A synthesis. Planet. Space Sci. 2020, 194, 105105. [Google Scholar] [CrossRef] [PubMed]
  7. Tesson, P.-A.; Conway, S.J.; Mangold, N.; Ciazela, J.; Lewis, S.R.; Mège, D. Evidence for thermal-stress-induced rockfalls on Mars impact crater slopes. Icarus 2020, 342, 113503. [Google Scholar] [CrossRef]
  8. Molaro, J.; Byrne, S. Rates of Temperature Change of Airless Landscapes and Implications for Thermal Stress Weathering. J. Geophys. Res. Planets 2012, 117, E10011. [Google Scholar] [CrossRef]
  9. Varga, P.; Grafarend, E. Influence of Tidal Force on the Triggering of Seismic Events. Pure Appl. Geophys. 2018, 175, 1649–1657. [Google Scholar] [CrossRef]
  10. Dundas, C.M.; Bramson, A.M.; Ojha, L.; Wray, J.J.; Mellon, M.T.; Byrne, S.; McEwen, A.S.; Putzig, N.E.; Viola, D.; Sutton, S.; et al. Exposed Subsurface Ice Sheets in the Martian Mid-Latitudes. Science 2018, 359, 199–201. [Google Scholar] [CrossRef]
  11. Shoemaker, E.M. The Geology of the Moon. Sci. Am. 1964, 211, 38–47. [Google Scholar] [CrossRef]
  12. Chuang, F.C.; Beyer, R.A.; McEwen, A.S.; Thomson, B.J. HiRISE Observations of Slope Streaks on Mars. Geophys. Res. Lett. 2007, 34, L20204. [Google Scholar] [CrossRef]
  13. Senthil Kumar, P.; Sruthi, U.; Krishna, N.; Lakshmi, K.J.P.; Menon, R.; Amitabh; Gopala Krishna, B.; Kring, D.A.; Head, J.W.; Goswami, J.N.; et al. Recent Shallow Moonquake and Impact-Triggered Boulder Falls on the Moon: New Insights from the Schrödinger Basin. J. Geophys. Res. Planets 2016, 121, 147–179. [Google Scholar] [CrossRef]
  14. Kokelaar, B.P.; Bahia, R.S.; Joy, K.H.; Viroulet, S.; Gray, J.M.N.T. Granular Avalanches on the Moon: Mass-Wasting Conditions, Processes, and Features. J. Geophys. Res. Planets 2017, 122, 1893–1925. [Google Scholar] [CrossRef]
  15. Bickel, V.T.; Honniball, C.I.; Martinez, S.N.; Rogaski, A.; Sargeant, H.M.; Bell, S.K.; Czaplinski, E.C.; Farrant, B.E.; Harrington, E.M.; Tolometti, G.D.; et al. Analysis of Lunar Boulder Tracks: Implications for Trafficability of Pyroclastic Deposits. J. Geophys. Res. Planets 2019, 124, 1296–1314. [Google Scholar] [CrossRef]
  16. Bickel, V.T.; Mandrake, L.; Doran, G. Analyzing Multi-Domain Learning for Enhanced Rockfall Mapping in Known and Unknown Planetary Domains. ISPRS J. Photogramm. Remote Sens. 2021, 182, 1–13. [Google Scholar] [CrossRef]
  17. Bickel, V.T.; Lanaras, C.; Manconi, A.; Loew, S.; Mall, U. Automated Detection of Lunar Rockfalls Using a Convolutional Neural Network. IEEE Trans. Geosci. Remote Sens. 2019, 57, 3501–3511. [Google Scholar] [CrossRef]
  18. Hungr, O.; Leroueil, S.; Picarelli, L. The Varnes Classification of Landslide Types, an Update. Landslides 2014, 11, 167–194. [Google Scholar] [CrossRef]
  19. Perko, H.A.; Nelson, J.D.; Sadeh, W.Z. Surface Cleanliness Effect on Lunar Soil Shear Strength. J. Geotech. Geoenviron. Eng. 2001, 127, 371–383. [Google Scholar] [CrossRef]
  20. Bickel, V.T.; Aaron, J.; Manconi, A.; Loew, S.; Mall, U. Impacts Drive Lunar Rockfalls over Billions of Years. Nat. Commun. 2020, 11, 2862. [Google Scholar] [CrossRef]
  21. Colwell, J.E.; Batiste, S.; Horányi, M.; Robertson, S.; Sture, S. Lunar Surface: Dust Dynamics and Regolith Mechanics. Rev. Geophys. 2007, 45. [Google Scholar] [CrossRef]
  22. Tsige, M.; Ruiz, J.; del Río, I.A.; Jiménez-Díaz, A. Modeling of Landslides in Valles Marineris, Mars, and Implications for Initiation Mechanism. Earth Moon Planets 2016, 118, 15–26. [Google Scholar] [CrossRef]
  23. Roberts, G.P.; Matthews, B.; Bristow, C.; Guerrieri, L.; Vetterlein, J. Possible Evidence of Paleomarsquakes from Fallen Boulder Populations, Cerberus Fossae, Mars. J. Geophys. Res. Planets 2012, 117, E02009. [Google Scholar] [CrossRef]
  24. Zoumpekas, T.; Puig, A.; Salamó, M.; García-Sellés, D.; Blanco Nuñez, L.; Guinau, M. An Intelligent Framework for End-to-End Rockfall Detection. Int. J. Intell. Syst. 2021, 36, 6471–6502. [Google Scholar] [CrossRef]
  25. Rajasekhar, K.; Rashmitha, M.; Priyanka, S.; Uma Mahesh, S. Rockfall Detection on Moon and Mars Using Deep Learning Models. In Proceedings of Fifth Doctoral Symposium on Computational Intelligence; Springer Nature: Singapore, 2024; pp. 361–374. [Google Scholar] [CrossRef]
  26. Bickel, V.T.; Mandrake, L.; Doran, G. A Labeled Image Dataset for Deep Learning-Driven Rockfall Detection on the Moon and Mars. Front. Remote Sens. 2021, 2, 640034. [Google Scholar] [CrossRef]
  27. Mishra, A. Contrast Limited Adaptive Histogram Equalization (CLAHE) Approach for Enhancement of the Microstructures of Friction Stir Welded Joints. arXiv 2021, arXiv:2109.00886. [Google Scholar] [CrossRef]
  28. Duda, R.O.; Hart, P.E. Use of the Hough Transformation to Detect Lines and Curves in Pictures. Commun. ACM 1972, 15, 11–15. [Google Scholar] [CrossRef]
  29. Wolf, C.; Jolion, J.-M.; Chassaing, F. Text Localization, Enhancement and Binarization in Multimedia Documents. In Proceedings of the 2002 International Conference on Pattern Recognition, Montreal, QC, Canada, 11–15 August 2002; Volume 2, pp. 1037–1040. [Google Scholar] [CrossRef]
  30. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar] [CrossRef]
  31. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. arXiv 2016, arXiv:1506.02640. [Google Scholar] [CrossRef]
  32. Zhou, X.; Wang, D.; Krähenbühl, P. Objects as Points. arXiv 2019, arXiv:1904.07850. [Google Scholar] [CrossRef]
  33. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-End Object Detection with Transformers. arXiv 2020, arXiv:2005.12872. [Google Scholar] [CrossRef]
  34. Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2999–3007. [Google Scholar] [CrossRef]
  35. Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs Beat YOLOs on Real-Time Object Detection. arXiv 2024, arXiv:2304.08069. [Google Scholar] [CrossRef]
  36. Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 936–944. [Google Scholar] [CrossRef]
  37. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
  38. Bhattacharjee, S.S. TorchKAN: Simplified KAN Model with Variations. GitHub Repositories. 2024. Available online: https://github.com/1ssb/torchkan/ (accessed on 27 June 2025).
  39. Celeghini, E.; Gadella, M.; del Olmo, M.A. Image processing and Legendre Polynomials. arXiv 2023, arXiv:2312.17743v1. Available online: https://arxiv.org/abs/2312.17743v1 (accessed on 27 June 2025).
  40. Liu, Z.; Xang, J.; Duan, J. A low illumination target detection method based on a dynamic gradient gain allocation strategy. arXiv 2024, arXiv:2404.19756. [Google Scholar] [CrossRef]
Figure 1. Some images from the lunar test dataset. The rockfall regions with red bounding boxes were delineated based on labels from the original CSV file.
Figure 2. Some images from the Martian test dataset. The rockfall regions with red bounding boxes were delineated based on labels from the original CSV file.
Figure 3. The area ratio distribution of rockfalls on the Moon and Mars.
Figure 4. Visualization of feature extraction from the Moon.
Figure 5. Visualization of feature extraction from Mars.
Figure 6. Result of dynamic cropping strategy from the Moon.
Figure 7. Result of dynamic cropping strategy from Mars.
Figure 8. Cropped image from the Moon.
Figure 9. Cropped image from Mars.
Figure 10. Overview of RT-DETR-R18-KANC3.
Figure 11. Overview of KANC3.
Figure 12. Overview of KALNConv2DLayer.
Figure 13. Some of the model-generated detection results. (a) Ground truth; (b) size 640; (c) size 1024.
Figure 14. Model comparison of P/R/mAP metrics from training phase.
Figure 15. F1 scores for rockfall across models.
Figure 16. YOLO-series model-generated detection samples from training results.
Figure 17. Receptive field comparison.
Figure 18. Detection improvement visualization.
Table 1. Experimental setup for Experiment II.

Parameter                 Specification
Operating system          Windows 10
Deep learning framework   PyTorch 1.10
Programming language      Python 3.10
GPU                       NVIDIA GeForce RTX 4090
CPU                       16 vCPU
Image size                320 × 320
Initial learning rate     0.0001
Batch size                16
Epochs                    130
Optimizer                 AdamW
Table 2. Results of Experiment I.

Model          Dataset   Size   Precision   Recall   mAP50   mAP50-95
Original-R18   Val       320    0.332       0.231    0.196   0.0746
Original-R18   Test      320    0.285       0.227    0.171   0.0626
Original-R18   Val       640    0.435       0.356    0.321   0.125
Original-R18   Test      640    0.334       0.343    0.264   0.0939
Original-R18   Val       1024   0.537       0.408    0.414   0.174
Original-R18   Test      1024   0.302       0.244    0.162   0.0579
Cropped-R18    Val       320    0.971       0.949    0.957   0.858
Cropped-R18    Test      320    0.977       0.955    0.953   0.759
Table 3. Results of Experiment II.

Model       Dataset   Precision   Recall   mAP50   mAP50-95
R18         Val       0.971       0.949    0.957   0.858
R18         Test      0.977       0.955    0.953   0.759
R18-KANC3   Val       0.978       0.959    0.967   0.866
R18-KANC3   Test      0.982       0.955    0.964   0.775
YOLOv8      Val       0.883       0.719    0.847   0.592
YOLOv8      Test      0           0        0       0
YOLOv10     Val       0.525       0.468    0.511   0.335
YOLOv10     Test      0           0        0       0
YOLO11      Val       0.944       0.853    0.926   0.666
YOLO11      Test      0           0        0       0
