SCCA-YOLO: Spatial Channel Fusion and Context-Aware YOLO for Lunar Crater Detection

Tang, Jiahao; Gu, Boyuan; Li, Tianyou; Lu, Ying-Bo

doi:10.3390/rs17142380

Open AccessArticle

SCCA-YOLO: Spatial Channel Fusion and Context-Aware YOLO for Lunar Crater Detection

¹

School of Mechanical, Electrical & Information Engineering, Shandong University, Weihai 264209, China

²

Glasgow College, University of Electronic Science and Technology of China, Chengdu 611731, China

³

School of Space Science and Technology, Institute of Space Sciences, Shandong University, Weihai 264209, China

^*

Author to whom correspondence should be addressed.

Remote Sens. 2025, 17(14), 2380; https://doi.org/10.3390/rs17142380

Submission received: 22 May 2025 / Revised: 6 July 2025 / Accepted: 8 July 2025 / Published: 10 July 2025

Download

Browse Figures

Versions Notes

Abstract

Lunar crater detection plays a crucial role in geological analysis and the advancement of lunar exploration. Accurate identification of craters is also essential for constructing high-resolution topographic maps and supporting mission planning in future lunar exploration efforts. However, lunar craters often suffer from insufficient feature representation due to their small size and blurred boundaries. In addition, the visual similarity between craters and surrounding terrain further exacerbates background confusion. These challenges significantly hinder detection performance in remote sensing imagery and underscore the necessity of enhancing both local feature representation and global semantic reasoning. In this paper, we propose a novel Spatial Channel Fusion and Context-Aware YOLO (SCCA-YOLO) model built upon the YOLO11 framework. Specifically, the Context-Aware Module (CAM) employs a multi-branch dilated convolutional structure to enhance feature richness and expand the local receptive field, thereby strengthening the feature extraction capability. The Joint Spatial and Channel Fusion Module (SCFM) is utilized to fuse spatial and channel information to model the global relationships between craters and the background, effectively suppressing background noise and reinforcing feature discrimination. In addition, the improved Channel Attention Concatenation (CAC) strategy adaptively learns channel-wise importance weights during feature concatenation, further optimizing multi-scale semantic feature fusion and enhancing the model’s sensitivity to critical crater features. The proposed method is validated on a self-constructed Chang’e 6 dataset, covering the landing site and its surrounding areas. Experimental results demonstrate that our model achieves an

m A P_{0.5}

of 96.5% and an

m A P_{0.5 : 0.95}

of 81.5%, outperforming other mainstream detection models including the YOLO family of algorithms. These findings highlight the potential of SCCA-YOLO for high-precision lunar crater detection and provide valuable insights into future lunar surface analysis.

Keywords:

lunar crater detection; YOLO11; joint spatial and channel fusion; context aware; channel attention concatenation

1. Introduction

Lunar craters, formed by the high-velocity impacts of meteorites, constitute a fundamental class of surface topographic features on the Moon. These widely distributed craters can offer invaluable geological insights. The morphology, spatial distribution, and scale characteristics of lunar craters provide rich geological data critical for various scientific endeavors, including lunar dating [1], stratigraphy [2], and planetary evolution research [3]. In recent years, with the advancement of lunar exploration activities, the precise topographic mapping of the lunar surface has become increasingly important for scientific research and mission planning. Specifically, the targeted mapping of medium- and small-scale craters (cratered regions larger than 8 ∗ 8 pixels) has become indispensable for the accurate selection and hazard assessment of safe landing and exploration sites on the Moon [4].

The acquisition of high-resolution lunar remote sensing imagery provides a rich data foundation for crater detection while also presenting significant challenges. On one hand, lunar craters are inherently sparse in features, and a significant portion of them exhibit challenging visual characteristics such as small size, blurred boundaries, and low grayscale contrast. These factors lead to incomplete semantic representation during feature extraction, causing significant difficulties to accurately identify crater regions [5]. Due to the sparsity of visual features, the model must rely on learning highly abstract and essential representations to achieve accurate detection. This increases the need for deeper and more complex network structures, which may compromise computational efficiency, especially in real-time applications. These challenges necessitate the need for effective feature enhancement strategy to mitigate the negative effect of down-sampled resolution on semantic representation and detection accuracy [6]. On the other hand, the inherent complexity of the lunar surface, with features such as ravines, faults, and debris that visually resemble craters, can lead to significant background confusion in detection regions [7,8,9]. Constructing global semantic associations in both spatial and channel dimensions to guide the model to focus on target areas while suppressing redundant responses is another key issue in improving crater detection performance.

To address the aforementioned challenges, it is imperative to develop more effective feature enhancement strategies that can improve the model’s capacity for semantic representation of lunar craters and enhance its stability and accuracy under complex background interference. In recent years, the rapid development of artificial intelligence, particularly data-driven methods based on deep learning, has led to transformative advances in various computer vision tasks such as image classification [10], semantic segmentation [11], and object detection [12]. These methods leverage end-to-end learning mechanisms to automatically extract multi-level and multi-scale features from large datasets, significantly improving model adaptability to complex scenes and diverse targets. With the powerful feature representation capabilities of deep neural networks, researchers have increasingly applied these techniques to lunar and planetary surface analysis. In particular, deep learning-based approaches have demonstrated remarkable performance in automatic crater recognition, offering more efficient and intelligent solutions to support lunar geological studies and mission planning [13].

However, degraded crater morphologies characterized by overlapping rims, partial erosion, or burial under regolith remain a primary source of false positives and missed detections. Existing approaches, ranging from morphological template fitting based on mathematical morphology [14], texture-based segmentation using local binary patterns (LBPs) [15] and Gabor filters [9] to hybrid deep learning frameworks that jointly detect and reconstruct incomplete rims, demonstrate promising results on controlled benchmarks but often falter under realistic lunar imaging conditions due to variable illumination, resolution constraints, and heterogeneous terrain. This issue highlights the urgent need for enhanced feature representation.

Regarding the crater feature enhancement, early-stage two-stage models including Fast R-CNN [16] and Faster R-CNN [12] have reached high precision by refining region proposals and extracting features at multiple levels through RoI pooling and cascaded classifiers. These designs are particularly effective for craters with clear boundaries. However, their complex pipelines and slower inference speed limit their applicability to large-scale or real-time crater detection tasks. In contrast, recent single-stage detectors, especially the YOLO family of algorithms [17], have become the mainstream architectures for crater detection owing to a fine balance between speed and accuracy. Various YOLO-based models, including YOLOv7-DCN [18], YOLOv8-LCNet [19], and YOLOv9-GELAN [20], have been proposed by integrating different YOLO versions, yielding significant improvements in feature representation capabilities as well as global detection accuracy. Furthermore, the incorporation of attention mechanisms and multi-scale feature fusion has greatly enhanced the model’s capability of crater feature representation. For instance, HRFPNet [21] employs a feature aggregation module (FAM) to strengthen contextual awareness, and the EfficientDet series [22] introduces a BiFPN structure for efficient information integration. SE-Net [23] boosts detection precision by introducing the CBAM attention module. Additionally, multi-branch convolutions [24] and Transformer structures [25] have also exhibited promising results in feature enhancement.

Building on feature enhancement, the integration of global modeling mechanisms enables the model to further focus on target regions and suppress background interference. CAD-Net [26] introduced global context attention mechanisms to achieve comprehensive semantic understanding of the entire image. Following this, LBA-MCNet [27] proposed the GDAL module to simulate pixel-level spatial dependencies, enhancing edge feature expression. GCNet [28] and SCPNet [29] effectively improved region focusing by combining spatial and channel attention mechanisms. Furthermore, a recently proposed GLGCNet [30], adopted a “global–local–global” three-stage structure to address the issue of varying crater scales, showing superior robustness.

Despite the previous progress in strengthening the target representation and mitigating background interference, two major challenges persist: Firstly, to enhance the global crater detection performance, the local feature representation must be emphasized, especially for the craters with indistinct structural features. Most models deeply rely on the backbone network for shallow local modeling, causing difficulties to fully exploit contextual information for craters, further resulting in insufficient feature expression. Secondly, the visual similarity between craters and surrounding features, such as ridges or faults, often frustrate the correct detection of targeted craters. This problem is especially prominent when crater boundaries are blurred or degraded. Without sufficient contextual reasoning, models may misclassify these ambiguous regions, leading to false positives and missed detections. To address these issues, this paper proposes an improved YOLO11 series model—Spatial Channel Fusion and Context-Aware YOLO—that integrates context-aware modeling with spatial–channel information enhancement, aiming to systematically enhance lunar crater detection performance.

The main contributions of this paper are summarized as follows:

(1) The feature sparsity of lunar craters, caused by their small size and low contrast, often leads to incomplete semantic representation, which makes it difficult for conventional detectors to accurately localize the targets. To address this issue, we develop a Context- Aware Module (CAM) that enhances local contextual information through a multi-scale dilated convolution structure, thereby improving detection robustness. Simultaneously, Channel Attention Concatenation (CAC) is proposed to adaptively adjust the weight of each channel, leading to emphasized weights of important channels.

(2) A Joint Spatial and Channel Fusion Module (SCFM) is constructed to tackle the detection accuracy issue caused by the blurred boundaries and background interference. The cross-dimensional semantic associations employed by the SCFM contributes to advanced capability to focus on crater regions and suppress background interference.

(3) The feasibility of our model is validated using the self-constructed Chang’e 6 dataset. Comparing with mainstream detection models, our model achieves comprehensive results for precision, recall,

m A P_{0.5}

, and

m A P_{0.5 : 0.95}

. Experiments are conducted including model comparisons, ablation studies, and module replacements for the SCFM.

2. Methodology

2.1. Framework of SCCA-YOLO

YOLO11 was adopted as the baseline framework for the proposed model, based on a trade-off between detection performance and the expandability of new modules. Specifically, YOLO11 offers greater flexibility in module integration under the PyTorch framework, making it more suitable for model embedding. To enhance both context awareness and global perceptual capability, two specially designed modules were integrated into the head of YOLO11, alongside an improved feature concatenation strategy.

Initially, the CAM was introduced to enrich the network’s ability to perceive local contextual dependencies. Following this, a CAC strategy was applied to selectively emphasize the important channels. Subsequently, the SCFM was designed to strengthen the global representation by effectively extracting cross-spatial and cross-channel features. Detailed descriptions of the CAM, CAC, and SCFM are provided in Section 2.2 and Section 2.3, respectively.

The input remote sensing images were initially processed through the backbone to extract multi-scale features. These features were further enhanced and fused in the improved SCCA-YOLO head. Eventually, feature maps with three different sizes were utilized for the multi-scale target detection.

The architecture of the proposed SCCA-YOLO model is illustrated in Figure 1.

2.2. Context-Aware Module (CAM)

In lunar crater detection, the challenge often arises from the inherent sparse features, leading to frequent misidentification of the target. However, the backbone network suffers from limited feature extraction ability, resulting from relatively insufficient context information and a shallow perception field, which causes difficulties in the accurate detection of the craters. Worse still, as the network deepens, the spatial resolution decreases, which exacerbates difficulties in detecting small craters.

To enhance the local feature perception of our model, we propose the CAM to optimize feature representation in two key aspects. First, to increase the richness of features, we utilize a multi-branch convolution structure that extracts semantic information from multiple perspectives to better differentiate planetary craters. Second, we expand the receptive field by employing dilated convolutions, which aggregate contextual information at varying scales without adding redundant computation cost. The detailed architecture of the CAM is illustrated in Figure 2.

As shown in Figure 2, the three branches of CAM initially pass through a

1 * 1

convolution to adjust the number of channels, thereby reducing computational complexity while preserving essential feature information. The second and third branches then apply

3 * 3

convolutions to extract basic features. Subsequently, dilated convolutions with dilation rates of

d = 1

,

d = 3

, and

d = 5

are established to represent receptive fields of different sizes. The mathematical representation of the three branches of the CAM is as follows:

M_{1} = f_{d = 1}^{d i c o n v} [f_{k = 1}^{c o n v} (F_{1})]

(1)

M_{2} = f_{d = 3}^{d i c o n v} {f_{k = 3}^{c o n v} [f_{k = 1}^{c o n v} (F_{1})]}

(2)

M_{3} = f_{d = 5}^{d i c o n v} {f_{k = 3}^{c o n v} [f_{k = 1}^{c o n v} (F_{1})]}

(3)

where

f_{k = 1}^{c o n v}

and

f_{k = 3}^{c o n v}

represent the standard convolutions with kernel sizes of

1 * 1

and

3 * 3

, respectively;

f_{d = 1}^{d i c o n v}

,

f_{d = 3}^{d i c o n v}

, and

f_{d = 5}^{d i c o n v}

represent the dilated convolution with a kernel size of

3 * 3

and dilation rates of 1, 3, and 5, respectively;

F_{1}

is the input feature map, and

M_{1}

,

M_{2}

, and

M_{3}

represent the output feature maps of the three branches.

After being processed by the three branches, the outputs are concatenated and passed through a final

1 * 1

convolution to refine the feature map into the output channel. Finally, a residual structure is introduced to pass the input feature map through a

1 * 1

convolution, which retains the key features of the crater. This residual map is then fused with the previously extracted features containing local contextual information. The concatenation and residual connection are expressed as:

M = f_{k = 1}^{c o n v} [C a t (M_{1}, M_{2}, M_{3})]

(4)

F_{2} = R e L U [f_{k = 1}^{c o n v} (F_{1}) \oplus M]

(5)

where

C a t (\cdot)

denotes the feature map concatenation operation; ⊕ denotes the element-wise addition of the feature maps;

R e L U

denotes using the rectified linear unit as the activation function; M represents the feature map of local feature awareness; and

F_{2}

represents the output feature map.

In particular, the choice of dilated convolution was based on the relationship between crater size and receptive field. The input images were sized at 512 × 512 pixels, and we defined a valid crater as having a diameter greater than 8 pixels. However, in the YOLO11 framework, even the deepest feature map (P5) only offers a receptive field of approximately 11 pixels, which is insufficient to cover and characterize larger craters entirely. This limitation restricts the network’s ability to incorporate contextual information around the craters. Therefore, by employing dilated convolutions, we effectively enlarged the receptive field without increasing the model’s depth, strengthening its capability to integrate neighboring features into the crater representation. This strategy is refereed to as context-aware modeling in this study.

Furthermore, the original splicing strategy of the YOLO11 model simply stacks two feature maps without distinguishing the importance of their semantic content. To optimize the information fusion process of impact craters, we propose a Channel Attention Concatenation (CAC) strategy. By assigning attention weights to channel information, the concatenated features pay more attention to channels with more semantic information, providing more effective feature maps for subsequent global information modeling.

As shown in Figure 3, CAC introduces a learnable weight for each input feature map. These adaptive weights are automatically updated during training like convolution parameters, allowing the model to adaptively weight each feature map according to its importance, thus achieving more effective and discriminative feature fusion. Our attention weighting strategy first concatenates the input feature maps, then normalizes the trained weights, multiplies them with the original input features, and finally obtains the reconstructed features in the channel dimension. The mathematical expression of CAC is as follows:

M = C a t (F_{1}, F_{2})

(6)

{\hat{ω}}_{j} = \frac{ω_{j}}{\sum_{i = 1}^{C} ω_{i}}

(7)

F_{3} = \hat{W} ⊙ M

(8)

where

C a t (\cdot)

denotes the feature map concatenation operation, ⊙ denotes the broad multiply operation of feature maps.

F_{1}

,

F_{2}

represent the input feature maps, and M is the concatenated feature map. W represents the adaptive weight vector, and

ω_{i}

represents the ith value of W,

\hat{W}

is the normalized weight vector, and

{\hat{ω}}_{j}

represents the jth value of

\hat{W}

.

F_{3}

represents the output feature map.

Different from the original splicing strategy of YOLO11, which treats all feature maps equally, the proposed CAC strategy introduces an adaptive weighting mechanism in the channel dimension. It assigns different weights according to the importance of each channel for impact crater feature expression. This enables the model to better focus on channels highly relevant to the semantics of impact craters, thereby enhancing the ability to extract key information.

2.3. Joint Spatial and Channel Fusion Module (SCFM)

After the CAM and CAC, the feature maps have captured abundant local contextual information, enhancing the representation of crater features. At this stage, modeling the global relationship between craters and the background allows the model to focus attention more effectively on the target craters. Therefore, we designed the SCFM before the detection head instead of the backbone network. The SCFM integrates global spatial and channel information, representing the relationships across both space and channels between pixels. This helps suppress irrelevant background features and improves the distinction between craters and the background.

As illustrated in Figure 4, the SCFM consists of four branches. The first and second branches, respectively, utilize global max pooling (GMP) and global average pooling (GAP) to integrate global spatial information, representing it as channel weights. The third branch applies a

1 * 1

convolution to map the input feature map. The fourth branch uses a

1 * 1

convolution to adjust the feature map channels to 1, integrating global channel information into spatial position weights. Following this, the third branch performs matrix multiplication with the first and second branches to apply channel-wise weighting to the mapped feature map, emphasizing the more important feature channels. The two multiplication results are subsequently concatenated and processed through a

1 * 1

convolution, yielding a spatial location weight. This spatial attention mechanism based on dual pooling layers contributes to a more concentrated attention on the test-related regions. Simultaneously, the third branch performs matrix multiplication with the fourth branch to selectively focus on the spatial locations in the feature map, obtaining a channel weight. This matrix multiplication result is enhanced by a

1 * 1

convolution, further yielding a channel-aware feature map, which strengthens the model’s ability to focus on the most salient features of craters. The mathematical representations of the operations above are listed as follows:

M_{1} = f_{c_{2} = c_{1}}^{c o n v} (F_{1}) \otimes G M P (F_{1})

(9)

M_{2} = f_{c_{2} = c_{1}}^{c o n v} (F_{1}) \otimes G A P (F_{1})

(10)

M_{s} = f_{c_{2} = 1}^{c o n v} [C a t (M_{1}, M_{2})]

(11)

M_{c} = f_{c_{2} = c_{1}}^{c o n v} (F_{1}) \otimes f_{c_{2} = 1}^{c o n v} (F_{1})

(12)

where

f_{c_{2} = c_{1}}^{c o n v}

and

f_{c_{2} = 1}^{c o n v}

represent the standard convolution with the output channel of

c_{1}

and 1, respectively.

G A P (\cdot)

and

G M P (\cdot)

denote global average pooling and global max pooling, respectively. ⊗ denotes the matrix multiply operation of feature maps.

C a t (\cdot)

denotes the feature map concatenation operation.

M_{1}

and

M_{2}

represent the feature maps of the first and second branches, respectively.

M_{s}

and

M_{c}

represents the spatial-aware and channel-aware feature maps, respectively.

At this stage, both spatial-aware and channel-aware feature maps are prepared. Through a board multiplication of these two feature maps, a feature representation with global spatial and channel information can be obtained. Eventually, a residual net connecting the original feature map is applied to retain the shallow features and avoid issues including gradient vanishing. It is also notable that a learnable residual scaling factor is included in this residual structure. This factor adjusts the weight of global fused information in the SCFM, allowing the model to selectively focus on the original key crater features and the fused cross-channel and cross-space information.

The final-stage pixel-level global information in the SCFM can be represented as follows:

M = M_{c} ⊙ M_{s}

(13)

F_{2} = W \cdot M \oplus f_{c_{2} = c_{1}}^{c o n v} (F_{1})

(14)

where ⊙ denotes the broad multiply operation of feature maps, ⊕ denotes element-wise addition of the feature maps, M is the spatial-aware and channel-aware feature map, W is the residual scaling factor, and

F_{2}

represents the output feature map.

In short, according to Figure 4, the SCFM enhances and fuses channel and spatial information twice, making it highly effective in global perception. This approach allows the model to adopt a holistic view and direct attention toward the target craters, ultimately improving crater detection performance.

3. Experiment Results and Discussions

3.1. Experiment Configurations

3.1.1. Datasets Provenance and Segmentation

The dataset employed in this study originated from the Chang’e-6 landing site and its surrounding areas. The native ground sampling distance of the original images was approximately 0.85 m per pixel. From these original images, we carefully selected representative regions and extracted

512 \times 512

grayscale patches, resulting in a curated dataset consisting of 216 images. Each image covered approximately 200,000 square meters of the lunar surface, and the 216 images together spanned over 40 million square meters. The dataset exhibited strong diversity by covering a wide range of typical detection scenarios.

During the dataset preparing process, all of the reference lunar craters were labeled manually to ensure an optimal dataset quality. A significant portion of these images contained over 100 annotated craters each, with a total of 26,435 crater instances labeled across the dataset. To ensure annotation consistency, we adopted a unified filtering strategy: craters smaller than 8 × 8 pixels were excluded due to their high ambiguity, and for partially truncated craters at the image boundaries, only those with more than half of the structure preserved were annotated. These annotated craters spanned a broad spectrum of scales and morphological variations, ranging from small-scale, well-defined rims to large, heavily degraded structures. Specifically, small craters with diameters less than 50 m account for approximately 92% of all labeled instances in the dataset. The diversity and richness of this dataset provided a robust foundation for evaluating detection performance under challenging conditions including small-object recognition, ambiguous boundaries, and cluttered backgrounds.

Regarding the dataset partitioning, the dataset was split into training, validation, and testing subsets in a ratio of 8:1:1. All performance metrics reported in this study were computed on the testing subset to comprehensively and objectively assess the effectiveness and generalizability of the model.

It is also notable that no redundant cropping was applied during the image pre-processing stage. While overlapping cropping could increase sample diversity and allow some partially cut craters to appear fully in other patches, it may also introduce data leakage and lead to overfitting. The fundamental reason is that the overlapping clipping would cause the same crater to reappear in different data subsets, resulting in advanced exposure of key characteristics. This negative consequence can further lead to falsely high results for the evaluation metrics.

3.1.2. Model Implementation

All experiments in this study were conducted on a workstation equipped with an NVIDIA GeForce RTX 4090 GPU (24,111 MiB memory) (NVIDIA, Santa Clara, CA, USA). The software environment included Python 3.10.8, PyTorch 2.5.1, and the Ultralytics YOLO version 8.3.25 framework.

Among the five different-size YOLO11 variants, YOLO11s was selected as a trade-off between computational efficiency and detection accuracy. The AdamW optimizer was employed for model training, and hyperparameters were empirically selected and fine-tuned based on the validation set performance to ensure both stable convergence and optimal accuracy. The training pipeline was configured for 300 epochs with a batch size of 32. The initial learning rate was set to 0.002, and the momentum coefficient was fixed at 0.9. This implementation setup ensured a robust and reproducible training process that balanced computational efficiency with convergence stability.

3.1.3. Evaluation Metrics

To comprehensively evaluate the proposed model’s detection performance and computational efficiency, we adopted six widely used metrics. Detection performance was assessed using precision (P), recall (R),

m A P_{0.5}

, and

m A P_{0.5 : 0.95}

. For computational efficiency, GFLOPs and Params were utilized as evaluation metrics.

Regarding the detection performance evaluation, precision (P) quantifies the proportion of correctly predicted positive instances among all predicted positives. It is calculated as

P = \frac{T P}{T P + F P}

(15)

where

T P

and

F P

denote true positives and false positives, respectively.

Recall (R) measures the model’s ability to detect actual positive instances. It is defined as

R = \frac{T P}{T P + F N}

(16)

where

F N

refers to false negatives.

mAP provides an overall measure of detection accuracy across various confidence thresholds. It is one of the most widely accepted metrics in object detection tasks and includes two commonly reported forms

m A P_{0.5}

and

m A P_{0.5 : 0.95}

. mAP is calculated as the area under the precision–recall (PR) curve, averaged over all object classes:

\{\begin{matrix} A P = \int_{0}^{1} P (R) d R \\ m A P = \frac{1}{N} \sum_{i = 1}^{N} A P_{i} \end{matrix}

(17)

where N denotes the number of object classes, and

A P_{i}

is the average precision for class i.

With regard to the computational efficiency performance, GFLOPs measure the model’s computational complexity. A lower number of GFLOPs reflects lower computational demands, making the model more suitable for deployment in resource-constrained environments.

Params assesses the model’s size and storage requirement. Smaller models with fewer parameters are easier to store, transfer, and deploy, particularly in embedded or mobile systems.

By jointly considering these multidimensional metrics, we established a systematic and holistic evaluation of the model’s overall performance, encompassing detection accuracy, computational efficiency, and resource consumption.

3.2. Results and Comparisons

3.2.1. Performance Comparison with Mainstream Detection Models

To comprehensively validate the detection performance and application potential of the proposed SCCA-YOLO model, we conducted comparative experiments against a range of representative baseline algorithms. These included the two-stage detector Faster R-CNN, single-stage detectors RetinaNet and CenterNet, and the mainstream YOLO series including YOLOv5, YOLOv6 [31], YOLOv8, YOLOv9 [32], YOLOv10 [33], and YOLO11.

Faster R-CNN, a canonical two-stage detection framework, incorporates a Region Proposal Network (RPN), making it a high-precision model suitable as an accuracy benchmark. RetinaNet addresses class imbalance using the Focal Loss function, offering notable advantages in small-object detection. CenterNet, as a keypoint-based one-stage detector, yields a fine balance between speed and accuracy by directly predicting object centers.

The YOLO family of algorithms have gained wide adoption in practical applications due to their remarkable real-time performance and detection accuracy. Notably, with iterative improvements from YOLOv5 to YOLO11, the architectures have progressively optimized the trade-off between detection accuracy and the interference speed, leading to incrementally robust and efficient models. Consequently, benchmarking against multiple YOLO versions enabled a comprehensive evaluation of SCCA-YOLO’s comprehensive capabilities in terms of detection precision, computational complexity, and real-time applicability within the same algorithm family. The overall comparison experiment results are illustrated in Table 1.

Experimental results indicated that traditional detectors including Faster R-CNN, RetinaNet, and CenterNet fell short in the task of lunar crater detection, largely because their design philosophies are misaligned with the unique characteristics of crater imagery. Firstly, Faster R-CNN’s two-stage pipeline relies on an RPN to generate candidate regions, followed by region-wise pooling that inherently diminishes spatial resolution. Such an approach proves ill-suited for isolating small, densely clustered craters with faint rims, as RPNs are highly sensitive to edge ambiguity, and the pooling layers further erode the already subtle features of degraded crater boundaries. Moreover, the model’s computationally heavy backbone and multi-stage processing significantly hamper throughput, rendering it impractical for large-scale inference on high-resolution planetary datasets.

RetinaNet addresses class imbalance by introducing Focal Loss, which indeed elevates detection metrics—pushing precision to 40.3% and recall to 50.4%—yet it still lacks the architectural innovations necessary for robust multi-scale feature extraction and holistic context reasoning. In lunar surface scenes, many non-crater structures (such as ridges, shadows, and geological faults) exhibit textures and shapes that closely resemble true craters. Without an explicit mechanism for aggregating fine-grained details across scales or for modeling long-range dependencies, RetinaNet frequently confused these terrain artifacts with genuine impact features, capping its mAP at 60.8% (IoU = 0.5) and 36.5% (IoU = 0.5:0.95).

CenterNet reframes detection as keypoint estimation, predicting object centers and regressing object dimensions in a single stage. While this yields commendable inference speed, it rests on implicit assumptions of object centrality and approximate symmetry—assumptions that lunar craters often violate. Craters on planetary surfaces can overlap, erode unevenly, or be partially obscured by ejecta and shadows, disrupting the notion of a well-defined centroid. Consequently, CenterNet’s reliance on a single-point representation led to missed detections and spurious false positives, limiting its precision to 40.6% and recall to 45.2%.

In contrast, the YOLO family generally outperformed these models due to their dense prediction structure, robust feature fusion capabilities, and high-resolution output maps, which are more compatible with the demands of crater detection. For instance, YOLOv5 yielded a precision of 90.7% and a recall of 86.8%, which showcased a great advancement comparing with non-YOLO algorithms. Subsequent versions incorporate attention mechanisms and various feature fusion strategies, leading to similarly acceptable results with minor advancement. By YOLO11, most metrics illustrated an ascent despite the compromise on precision at 88.7%. The recall reached 87.6%, and

m A P_{0.5}

and

m A P_{0.5 : 0.95}

improved to 95.9% and 80.8%, respectively. However, challenges persisted, particularly under high IoU thresholds, where the model exhibited localization inaccuracies reflected in stagnant precision, especially for small-scale craters in complex terrain.

In comparison of all models, the proposed SCCA-YOLO surpassed all baselines across all detection performance. It achieved a precision of 90.8%, recall of 88.5%,

m A P_{0.5}

of 96.5%, and

m A P_{0.5 : 0.95}

of 81.5%, representing respective gains of 2.1%, 0.9%, 0.6%, and 0.7% over YOLO11. Regarding computational efficiency, YOLO11 achieved the best GFLOPs (21.5) while YOLOv9 excelled in Params (7.288M). Although SCCA-YOLO was slightly inferior in these metrics compared with some mainstream models, its computational efficiency had an acceptable tolerance. In short summary, SCCA-YOLO, despite an acceptable sacrifice in computational efficiency, yielded stronger robustness and generalization in lunar crater detection, which is especially critical for representing subtle crater features, extracting semantic information, and suppressing interference from complex backgrounds.

3.2.2. Visual Evaluation of Classic YOLO Algorithms

To further provide an intuitive assessment of model performance under complex and challenging conditions, four representative test images were selected for visual analysis. These samples encompassed typical difficulties encountered in lunar crater detection tasks, including blurred crater boundaries, overlapping crater clusters, large-scale variations, and high-exposure interference. We compared the detection results of the proposed SCCA-YOLO model with those of three mainstream models, including YOLOv5, YOLOv8, and the baseline YOLO11, to evaluate their robustness and adaptability in practical application scenarios. The comparison results are illustrated in Figure 5.

From an overall perspective, YOLOv5 and YOLOv8 demonstrated moderate capability in target boundary localization. However, they frequently suffered from missed detections and false positives when dealing with craters that exhibited weak visual features or were embedded in complex spatial structures. For instance, in the second and third rows of Figure 5, where unclear boundaries, crater overlap, and scale diversity are prominent, both YOLOv5 and YOLOv8 exhibit noticeable performance degradation, failing to accurately detect several targets.

YOLO11, in comparison, demonstrated a marked improvement in detecting craters with weak or subtle features, highlighting its enhanced feature extraction capability. Nonetheless, it still struggled with target omission and misclassification in challenging settings such as high-exposure regions (first row) and complex background interference (fourth row). In these cases, YOLO11 exhibited similar issues such as boundary ambiguity, incomplete crater localization, and spurious detections, necessitating further improvements for robust performance in adverse conditions.

In contrast, the proposed SCCA-YOLO consistently delivered more stable and accurate detection results across all four challenging scenarios. It effectively identified craters with low contrast, ambiguous morphology, or overlapping spatial arrangements, and demonstrates strong resistance to illumination noise and background clutter. These results validate the effectiveness and practicality of SCCA-YOLO in real-world lunar crater detection tasks, especially under conditions that traditionally degrade detection robustness.

The superior performance of SCCA-YOLO stems from comprehensive enhancements in feature extraction and information fusion strategies. On one hand, the model reinforces local contextual understanding and multi-scale feature awareness, enabling the more sensitive detection of small-scale craters. On the other hand, its optimized fusion architecture strengthens salient feature representation and attenuates background noise, improving the model’s ability to discriminate between craters and surrounding terrain. This synergy between local feature enhancement and global semantic modeling significantly improves both detection accuracy and localization precision, thereby ensuring the model’s adaptability and stability in various real-world scenarios.

Notably, as shown in Figure 5, medium- and small-scale craters constitute the vast majority of our annotated population. Because these smaller features also exhibit the greatest degree of degradation, the aggregate recall and precision scores we report predominantly reflect our model’s ability to detect such challenging instances. Consequently, the overall performance metrics can be regarded as a robust proxy for small-crater detection accuracy, thereby guiding targeted improvements in future work.

3.3. Ablation Experimental Result

To further validate the effectiveness of the individual modules in SCCA-YOLO, a series of ablation experiments were designed. These experiments were conducted on the same dataset by selectively removing key modules, including the CAM, CAC, and SCFM. Notably, we did not evaluate the CAC strategy independently (i.e., baseline + CAC or baseline + CAC + SCFM), as CAC was specifically designed to complement the CAM module. As detailed in Section 2.2, CAC performs channel-wise reweighting on the multi-branch outputs produced by CAM. Without the CAM providing enriched feature representations, CAC cannot operate as intended, and its independent evaluation would deviate from the original module design. The results of the ablation experiments are presented in Table 2. To further illustrate the effectiveness of the proposed modules, Figure 6 visualizes the intermediate feature maps across different configurations.

From these results, it can be observed that the baseline model achieved a precision of 88.7% and a recall of 87.6%. The corresponding values of

m A P_{0.5}

and

m A P_{0.5 : 0.95}

were 95.9% and 80.8%, respectively. These results indicate that the baseline model, despite the absence of any added modules, demonstrated moderate detection capability.

After introducing the CAM (Case b) into the baseline model, the precision and recall increased to 90.5% and 87.8%, respectively, while

m A P_{0.5}

slightly improved to 96.0%. These findings suggest that the CAM, through multi-branch dilated convolution, effectively enhances the model’s local contextual awareness and improves its sensitivity to craters with indistinct visual features. To intuitively demonstrate the effectiveness of the CAM, we conducted a visualization of the feature map after processing through the CAM, as illustrated in the third column of Figure 6. According to these feature maps, the CAM contributes to the initial feature enhancement of our model based on the local context information. This partially mitigates the challenge of weak feature representation. However, due to the insufficient fusion of high-level semantic information,

m A P_{0.5 : 0.95}

did not show a substantial improvement.

Upon further integration of the CAC strategy (Case c), the precision and recall remained largely unchanged, whereas

m A P_{0.5 : 0.95}

increased to 81.3%. This demonstrates that the CAC strategy optimizes the feature fusion process through adaptive channel-wise weighting, strengthening semantic associations across multi-scale features and improving overall recognition performance under varying crater scales and background complexities.

When the SCFM was incorporated (Case d), the model’s performance improved comprehensively, with both precision and recall reaching 90.0%. Meanwhile,

m A P_{0.5}

and

m A P_{0.5 : 0.95}

further increased to 96.3% and 81.4%, respectively. The SCFM, by performing global feature fusion along both spatial and channel dimensions, effectively suppresses complex background noise and further enhances the model’s ability to differentiate craters from clutter. The corresponding feature maps (the fourth column of Figure 6) corroborate this improvement: After the SCFM, the model further concentrates on the target craters owing to the module’s superior global modeling capability. Therefore, the craters can be more effectively distinguished with the background interference.

Finally, with all three modules integrated (Case f), the model achieved optimal performance across three of the four metrics: precision improved to 90.8%, recall to 88.5%,

m A P_{0.5}

to 96.5%, and

m A P_{0.5 : 0.95}

to 81.5%. These results validate the effectiveness of the proposed progressive optimization strategy across three stages—feature enhancement, semantic fusion, and global modeling. In particular, the combined application of the CAM, CAC strategy, and SCFM exhibits strong complementarity and synergistic benefits, significantly improving the model’s robustness and generalization capability in detecting small-scale craters and suppressing complex background interference.

3.4. SCFM Replacement

Finally, we conducted comparative experiments against several representative global context modeling methods, including NLBlock, SCPBlock, and GCBlock. The experimental results are summarized in Table 3.

The NLBlock, which models long-range dependencies via non-local operations, improves global contextual awareness by directly computing interactions across all spatial positions. However, its high computational complexity and lack of specialization for small object features or complex backgrounds limit its performance in crater detection tasks. Specifically, it yielded a precision of 83.3%, recall of 84.6%, and a relatively low

m A P_{0.5 : 0.95}

of 69.3%, falling short of practical deployment requirements in high-density, high-noise environments.

The SCPBlock achieved a recall of 90.1%, slightly higher than the SCFM, but its precision dropped to 88.2%, lower than the 90.0% attained by the SCFM. This trade-off suggests that while SCPBlock improves detection coverage, it sacrifices accuracy, likely introducing more false positives. This drawback arises from SCPBlock’s emphasis on enhancing spatial features while lacking adequate channel-wise feature representation. As a result, the model struggles to accurately delineate craters from background textures in complex terrains, ultimately impairing detection precision.

In contrast, GCBlock performed well in terms of precision (91.4%) but exhibited a lower recall of 87.5% compared to the SCFM. This indicates a reduced capacity to detect all craters, especially in challenging scenarios. The GCBlock primarily leverages channel attention for global feature modeling but does not sufficiently utilize spatial information, making it less effective for detecting irregular or edge-degraded crater instances. Consequently, the model is prone to missed detections in regions with ambiguous boundaries, leading to a higher type II error.

The proposed SCFM addresses the limitations of the above approaches by jointly modeling global context across both spatial and channel dimensions. This dual-domain interaction significantly enhances the model’s ability to capture salient features while suppressing background noise. As a result, SCFM achieved balanced and superior performance, with both precision and recall reaching 90.0%. Furthermore, it delivered the best results across two evaluation metrics, attaining

m A P_{0.5}

of 96.3% and

m A P_{0.5 : 0.95}

of 81.4%. On the other two metrics, P and R, it obtained the second-best result among the four modules, yielding 90.0% for both. While our SCFM did not achieve the highest scores on every metric, it nevertheless attained an optimal trade-off between precision and recall.

These findings clearly demonstrate the robustness and generalization capability of the SCFM, particularly in detecting craters amidst complex background conditions. The module’s effectiveness makes it especially suitable for scenarios involving dense crater distributions and topographically heterogeneous surfaces, where conventional global modeling methods fall short.

4. Conclusions

This paper presented SCCA-YOLO, a novel lunar crater detection framework tailored for high-resolution CCD imagery. To address the limitations of traditional methods, particularly the insufficient feature representation and background interference, SCCA-YOLO incorporates two key innovation modules: CAM and SCFM. The CAM utilizes a multi-branch dilated convolution structure to capture diverse semantic information and enhances local contextual perception through diffusion convolutions, effectively detecting craters across various scales. Additionally, an adaptive CAC mechanism was introduced to strengthen critical information pathways during feature fusion, further enhancing feature representation quality. On this basis, the SCFM performs feature fusion along both the global spatial and channel dimensions, directing the model’s attention towards target craters and significantly improving detection capability.

Extensive experiments conducted on the Chang’e-6 dataset demonstrated that SCCA-YOLO consistently outperformed other mainstream detection models in terms of precision, recall, and mAP.

In particular, SCCA-YOLO exhibited a marked advantage in crater detection tasks, especially under challenging scenarios of blurred features and complex background interference. This incremental advancement notably improved detection accuracy and enhanced robustness against complex illumination and topographic variations. Compared to traditional YOLO-based methods, SCCA-YOLO achieved superior stability and resilience.

In conclusion, lunar craters are fundamental geomorphological features of the Moon and are crucial for understanding lunar surface evolution. The efficient and accurate detection of these craters holds significant scientific importance and practical value. Given the diversity in crater sizes, detection results not only affect recognition performance but also directly impact lunar age estimation, where large craters are typically used as statistical indicators. Ensuring accurate detection across scales helps improve the scientific reliability of geological interpretation. In particular, crater statistics derived from high-precision detection results can serve as a key basis for crater-based lunar dating methods, helping to reduce age estimation errors and improve the reliability of geological evolutionary analysis. Future research will focus on further compressing the model architecture and enhancing real-time detection capabilities. In addition, the generalizability and scalability of the proposed method will be explored in surface feature extraction and geological structure recognition tasks on other planetary bodies.

Author Contributions

Conceptualization, J.T.; methodology, J.T. and T.L.; software, J.T. and B.G.; experiment design, J.T.; validation, J.T., B.G., and T.L.; investigation, B.G. and T.L.; resources, B.G.; writing—original draft preparation, B.G. and T.L.; writing—review and editing, J.T., T.L., and Y.-B.L.; visualization, B.G.; supervision, Y.-B.L.; project administration, Y.-B.L.; funding acquisition, Y.-B.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China, grant number 2022YFF0711400, and the National Space Science Data Center Youth Open Project, grant number NSSDC2302001.

Data Availability Statement

The SCCA-YOLO code is available at https://github.com/lingchez/lunar-crater-scca-yolo (accessed on 22 May 2025). The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

SCCA-YOLO	Spatial Channel Fusion and Context-Aware YOLO
CAM	Context-Aware Module
SCFM	Joint Spatial and Channel Fusion Module
CAC	Context Attention Concatenation
GMP	Global Max Pooling
GAP	Global Average Pooling
P	Precision
R	Recall
AP	Average Precision
mAP	Mean Average Precision
GFLOPs	Giga Floating Point Operations
Params	Parameter Counts

References

Hiesinger, H.; van der Bogert, C.H.; Michael, G.; Schmedemann, N.; Iqbal, W.; Robbins, S.J.; Ivanov, B.; Williams, J.P.; Zanetti, M.; Plescia, J.B.; et al. The Lunar Cratering Chronology. Rev. Mineral. Geochem. 2023, 89, 401–452. [Google Scholar] [CrossRef]
Zhang, K.; Liu, J.; Zhang, L.; Gu, Y.; Yue, Z.; Zhang, S.; Zhang, J.; Qin, H.; Liu, J. Quantitative Research on the Morphological Characteristics of Lunar Impact Craters of Different Stratigraphic Ages since the Imbrian Period. Remote Sens. 2024, 16, 1540. [Google Scholar] [CrossRef]
Yue, Z.; Li, H.; Zhang, N.; Gou, S.; Lin, Y. Lunar Evolution Analysis Based on Numerical Simulations of Typical Lunar Impact Craters. Space Sci. Technol. 2023, 3, 0084. [Google Scholar] [CrossRef]
Yang, C.; Wang, X.; Zhao, D.; Guan, R.; Zhao, H. Accurate Mapping and Evaluation of Small Impact Craters within the Lunar Landing Area. Remote Sens. 2024, 16, 2165. [Google Scholar] [CrossRef]
Jing, X.; Liu, X.; Liu, B. Composite Backbone Small Object Detection Based on Context and Multi-Scale Information with Attention Mechanism. Mathematics 2024, 12, 622. [Google Scholar] [CrossRef]
Zhang, H.; An, L.; Chu, V.; Stow, D.; Liu, X.; Ding, Q. Learning Adjustable Reduced Downsampling Network for Small Object Detection in Urban Environments. Remote Sens. 2021, 13, 3608. [Google Scholar] [CrossRef]
Fan, L.; Yuan, J.; Zha, K.; Wang, X. ELCD: Efficient Lunar Crater Detection Based on Attention Mechanisms and Multiscale Feature Fusion Networks from Digital Elevation Models. Remote Sens. 2022, 14, 5225. [Google Scholar] [CrossRef]
Zang, S.; Mu, L.; Xian, L.; Zhang, W. Semi-Supervised Deep Learning for Lunar Crater Detection Using CE-2 DOM. Remote Sens. 2021, 13, 2819. [Google Scholar] [CrossRef]
Tewari, A.; Verma, V.; Srivastava, P.; Jain, V.; Khanna, N. Automated Crater Detection from Co-Registered Optical Images, Elevation Maps and Slope Maps Using Deep Learning. Planet. Space Sci. 2022, 218, 105500. [Google Scholar] [CrossRef]
Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012. [Google Scholar]
Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015. [Google Scholar]
Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef]
Sinha, M.; Paul, S.; Ghosh, M.; Mohanty, S.N.; Pattanayak, R.M. Automated lunar crater identification with Chandrayaan-2 TMC-2 images using deep convolutional neural networks. Sci. Rep. 2024, 14, 8231. [Google Scholar] [CrossRef] [PubMed]
Chen, M.; Liu, D.; Qian, K.; Li, J.; Lei, M.; Zhou, Y. Lunar Crater Detection Based on Terrain Analysis and Mathematical Morphology Methods Using Digital Elevation Models. IEEE Trans. Geosci. Remote Sens. 2018, 56, 3681–3692. [Google Scholar] [CrossRef]
Vickers, M.J.; Cook, A.C. Textural Analysis to Aid Automated Classification of Lunar Craters. In Proceedings of the European Planetary Science Congress, Rome, Italy, 19–24 September 2010; Volume 5, p. EPSC2010-771. [Google Scholar]
Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar] [CrossRef]
Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar] [CrossRef]
Chen, S.; Liang, J.; Zhu, J.; Tian, X. New Methods for Lunar Impact Crater Detection Based on YOLO v7 with Deformable ConvNets. In Proceedings of the 2023 IEEE International Conference on Electrical, Automation and Computer Engineering (ICEACE), Changchun, China, 29–31 December 2023; pp. 123–127. [Google Scholar] [CrossRef]
Nan, J.; Wang, Y.; Di, K.; Xie, B.; Zhao, C.; Wang, B.; Sun, S.; Deng, X.; Zhang, H.; Sheng, R. YOLOv8-LCNET: An Improved YOLOv8 Automatic Crater Detection Algorithm and Application in the Chang’e-6 Landing Area. Sensors 2025, 25, 243. [Google Scholar] [CrossRef] [PubMed]
Deeksha; Meenpal, T. Crater Detection Using YOLOv9 Model. In Proceedings of the 15th International Conference on Computing Communication and Networking Technologies (ICCCNT), Kamand, India, 24–28 June 2024; pp. 1–5. [Google Scholar] [CrossRef]
Yang, S.; Cai, Z. High-Resolution Feature Pyramid Network for Automatic Crater Detection on Mars. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4601012. [Google Scholar] [CrossRef]
Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 10778–10787. [Google Scholar] [CrossRef]
Zhang, J.; Wei, Z.; Zhang, X.; Zhang, X.; Wei, Z. SE-CBAM-YOLOv7: An Improved Lightweight Attention Mechanism-Based YOLOv7 for Real-Time Detection of Small Aircraft Targets in Microsatellite Remote Sensing Imaging. Aerospace 2024, 11, 605. [Google Scholar] [CrossRef]
Ruan, H.; Qian, W.; Zheng, Z.; Peng, Y. A Decoupled Semantic–Detail Learning Network for Remote Sensing Object Detection in Complex Backgrounds. Electronics 2023, 12, 3201. [Google Scholar] [CrossRef]
Jia, Y.; Su, Z.; Wan, G.; Liu, L.; Liu, J. AE-TransUNet+: An Enhanced Hybrid Transformer Network for Detection of Lunar South Small Craters in LRO NAC Images. IEEE Geosci. Remote Sens. Lett. 2023, 20, 6007405. [Google Scholar] [CrossRef]
Zhang, G.; Lu, S.; Zhang, W. CAD-Net: A Context-Aware Detection Network for Objects in Remote Sensing Imagery. IEEE Trans. Geosci. Remote Sens. 2019, 57, 10015–10024. [Google Scholar] [CrossRef]
Xie, Y.; Liu, S.; Chen, H.; Cao, S.; Zhang, H.; Feng, D.; Wan, Q.; Zhu, J.; Zhu, Q. Localization, Balance, and Affinity: A Stronger Multifaceted Collaborative Salient Object Detector in Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2025, 63, 4700117. [Google Scholar] [CrossRef]
Cao, Y.; Xu, J.; Lin, S.; Wei, F.; Hu, H. GCBlock. In Proceedings of the Proceedings of the IEEE International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 11532–11541. [Google Scholar]
Fan, X.; Luo, H.; Zhang, X.; He, L.; Zhang, C.; Jiang, W. SCPNet: Spatial-Channel Parallelism Network for Joint Holistic and Partial Person Re-Identification. arXiv 2018, arXiv:1810.06996. [Google Scholar]
Bai, Z.; Li, G.; Liu, Z. Global–Local–Global Context-Aware Network for Salient Object Detection in Optical Remote Sensing Images. ISPRS J. Photogramm. Remote Sens. 2023, 198, 184–196. [Google Scholar] [CrossRef]
Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications. arXiv 2022, arXiv:2209.02976. [Google Scholar]
Wang, C.Y.; Yeh, I.H.; Liao, H.Y.M. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. arXiv 2024, arXiv:2402.13616. [Google Scholar]
Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-Time End-to-End Object Detection. arXiv 2024, arXiv:2405.14458. [Google Scholar]
Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar] [CrossRef]
Duan, K.; Bai, S.; Xie, L.; Qi, H.; Huang, Q.; Tian, Q. CenterNet: Keypoint Triplets for Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6569–6578. [Google Scholar] [CrossRef]
Wang, X.; Girshick, R.; Gupta, A.; He, K. NLBlock. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 7794–7803. [Google Scholar]
Shan, Y.; Xia, Y.; Chen, Y.; Cremers, D. SCP. arXiv 2023, arXiv:2312.04410. [Google Scholar]

Figure 1. Demonstration of SCCA-YOLO Model framework and workflow.

Figure 2. Architecture of Context-Aware Module (CAM): (a) architecture; (b) details of dilated convolution.

Figure 3. The architecture of Channel Attention Concatenation (CAC).

Figure 4. Architecture of Joint Spatial and Channel Fusion Module (SCFM).

Figure 5. Visual comparison within classic YOLO algorithms. The green boxes represent TP, the blue boxes represent FP, and the red boxes represent FN.

Figure 6. Influences of CAM and SCFM on feature extraction. The brighter color indicates that the model pays more attention to that area.

Table 1. Performance of mainstream detection models.

Model	Backbone	Detection Performance				Computational Efficiency
Model	Backbone	P (%)	R (%)	mAP_0.5 (%)	mAP_0.5:0.95 (%)	GFLOPs	Params/M
Faster R-CNN [16]	ResNet50	32.1	32.2	45.3	28.3	140.7	41.593
RetinaNet [34]	EfficientNetb3	40.3	50.4	60.8	36.5	26.9	21.903
CenterNet [35]	ResNet50	40.6	45.2	68.7	39.2	70.2	32.665
YOLOv5	-	90.7	86.8	96.0	80.5	24.0	9.123
YOLOv6 [31]	EfficientRep	89.8	87.8	96.1	79.9	44.2	16.306
YOLOv8	-	88.2	88.1	95.5	80.1	28.6	11.136
YOLOv9 [32]	-	89.7	87.9	96.1	80.6	27.4	7.288
YOLOv10 [33]	-	88.3	86.8	95.2	80.7	24.8	8.067
YOLO11	-	88.7	87.6	95.9	80.8	21.5	9.428
SCCA-YOLO	-	90.8	88.5	96.5	81.5	29.2	12.196

Table 2. Method evaluation metrics.

Case	Ablation Module				Detection Performance
Case	Baseline	CAM	CAC	SCFM	P (%)	R (%)	mAP_0.5 (%)	mAP_0.5:0.95 (%)
(a)	✓				88.7	87.6	95.9	80.8
(b)	✓	✓			90.5	87.8	96.0	80.3
(c)	✓	✓	✓		90.0	87.5	95.9	81.3
(d)	✓			✓	90.0	90.0	96.3	81.4
(e)	✓	✓		✓	88.2	89.2	96.0	81.1
(f)	✓	✓	✓	✓	90.8	88.5	96.5	81.5

Table 3. Evaluation of global modeling modules.

Method	P (%)	R (%)	${mAP}_{0.5}$ (%)	${mAP}_{0.5 : 0.95}$ (%)
NLBlock [36]	83.3	84.6	91.7	69.3
SCPBlock [37]	88.2	90.1	96.2	80.6
GCBlock [28]	91.4	87.5	96.1	80.7
SCFM	90.0	90.0	96.3	81.4

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Tang, J.; Gu, B.; Li, T.; Lu, Y.-B. SCCA-YOLO: Spatial Channel Fusion and Context-Aware YOLO for Lunar Crater Detection. Remote Sens. 2025, 17, 2380. https://doi.org/10.3390/rs17142380

AMA Style

Tang J, Gu B, Li T, Lu Y-B. SCCA-YOLO: Spatial Channel Fusion and Context-Aware YOLO for Lunar Crater Detection. Remote Sensing. 2025; 17(14):2380. https://doi.org/10.3390/rs17142380

Chicago/Turabian Style

Tang, Jiahao, Boyuan Gu, Tianyou Li, and Ying-Bo Lu. 2025. "SCCA-YOLO: Spatial Channel Fusion and Context-Aware YOLO for Lunar Crater Detection" Remote Sensing 17, no. 14: 2380. https://doi.org/10.3390/rs17142380

APA Style

Tang, J., Gu, B., Li, T., & Lu, Y.-B. (2025). SCCA-YOLO: Spatial Channel Fusion and Context-Aware YOLO for Lunar Crater Detection. Remote Sensing, 17(14), 2380. https://doi.org/10.3390/rs17142380

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

SCCA-YOLO: Spatial Channel Fusion and Context-Aware YOLO for Lunar Crater Detection

Abstract

1. Introduction

2. Methodology

2.1. Framework of SCCA-YOLO

2.2. Context-Aware Module (CAM)

2.3. Joint Spatial and Channel Fusion Module (SCFM)

3. Experiment Results and Discussions

3.1. Experiment Configurations

3.1.1. Datasets Provenance and Segmentation

3.1.2. Model Implementation

3.1.3. Evaluation Metrics

3.2. Results and Comparisons

3.2.1. Performance Comparison with Mainstream Detection Models

3.2.2. Visual Evaluation of Classic YOLO Algorithms

3.3. Ablation Experimental Result

3.4. SCFM Replacement

4. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

Abbreviations

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI