Article

Satellite Optical Target Edge Detection Based on Knowledge Distillation

College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(17), 3008; https://doi.org/10.3390/rs17173008
Submission received: 13 July 2025 / Revised: 17 August 2025 / Accepted: 27 August 2025 / Published: 29 August 2025

Abstract

Edge detection of space targets is vital in aerospace applications, such as satellite monitoring and analysis, yet it faces challenges due to diverse target shapes and complex backgrounds. While deep learning-based edge detection methods dominate due to their powerful feature representation capabilities, they often suffer from large parameter sizes and lack explicit geometric prior constraints for space targets. This paper proposes a novel edge detection method for satellite targets based on knowledge distillation, namely STED-KD. Firstly, a multi-stage distillation strategy is proposed to guide a lightweight, fully convolutional network with fewer parameters to learn key features and decision boundaries from a complex teacher model, achieving model efficiency. Next, a shape prior guidance module is integrated into the student branch, incorporating geometric shape information through shape prior model construction, similarity metric calculation, and feature reconstruction, enhancing adaptability to space targets and improving detection accuracy. Additionally, a curvature-guided edge loss function is designed to ensure continuous and complete edges, minimizing local discontinuities. Experimental results on the UESD space target dataset demonstrate superior performance, with ODS, OIS, and AP scores of 0.659, 0.715, and 0.596, respectively. On the BSDS500, STED-KD achieves ODS, OIS, and AP scores of 0.818, 0.829, and 0.850, respectively, demonstrating strong competitiveness and further confirming its stability.

1. Introduction

Space targets refer to various types of flying objects that are active in outer space around the Earth. They play a pivotal role in modern technology and military applications, supporting functions such as communication, navigation, meteorological monitoring, and remote sensing reconnaissance. With the increasing number and diversity of space objects, detecting and identifying these targets have become critical tasks. Edge detection, as a fundamental step in this process, provides an important basis for further identification, tracking, and analysis. Given an image containing space targets (such as astronomical images taken by satellites or Earth observation images), edge detection aims to extract the contour boundaries of the space targets in the image as well as the edge information with significant features. However, challenges such as complex backgrounds and diverse target shapes complicate this task. As shown in Figure 1, space target edges encompass both object-level boundaries and meaningful local details, often overlapping with background elements like the Earth or nebulae, which increases detection difficulty.
Effective edge detection requires balancing local feature capture with global semantic understanding. Current feature representation methods are mainly divided into traditional and deep learning-based categories. Traditional methods include gradient operator-based [2,3,4,5,6] and handcrafted feature-based [7,8,9,10,11,12,13,14,15,16,17] approaches, which rely on low-level features like color, texture, and gradients but are sensitive to noise and offer limited localization accuracy. In contrast, deep learning-based methods, with their powerful feature representation capabilities, have become dominant. Early non-end-to-end methods, such as N4-fields [18], DeepContour [19], DeepEdge [20], and HFL [21], are simple and easy to understand but limited in applicability. In 2015, Xie et al. [22] proposed HED (Holistically-Nested Edge Detection), an end-to-end method using VGG16 as the backbone network and employing a multi-scale feature fusion structure to learn rich hierarchical features. However, HED's top-down feature extraction can lose fine details. Subsequent methods have focused on fusing shallow and deep features. For example, Liu et al. [23] proposed the RCF (Richer Convolutional Features) method, which uses a Feature Pyramid Network (FPN) structure to achieve cross-layer feature fusion; He et al. [24] proposed BDCN, which adopts a bi-directional cascading network structure and introduces a scale enhancement module to generate multi-scale features; Soria et al. [25] proposed DexiNed, which builds on HED and the Xception model and enhances edge feature extraction through multi-scale feature fusion and cross-layer connections; Pu et al. [26] proposed EDTER, a Transformer-based method that captures coarse-grained global context and fine-grained local context in two stages and combines them through feature fusion to predict edge results. In addition, the UAED proposed by Zhou et al. [27] innovatively models the uncertainty in multi-annotation scenarios by converting the deterministic label space into a learnable Gaussian distribution, demonstrating advantages in improving detection robustness. The DiffusionEdge proposed by Ye et al. [28] pioneered the introduction of the diffusion probabilistic model into the edge detection task. By performing gradual denoising in the latent space and combining adaptive Fourier filtering and uncertainty distillation strategies, it can directly generate clear edges without post-processing, significantly enhancing the crispness of edges. Despite the significant progress made by these methods, the gains often come at the cost of detection speed (for example, the single-image inference time of the EDTER model is as high as 1.2 s). The increased model complexity not only raises computational costs but also makes the training process more challenging, failing to meet the requirements for real-time performance and computational resources in space target edge detection.
To address these challenges, lightweight edge detection methods [29,30,31,32] have emerged. Inspired by traditional methods, Wibisono et al. [29] proposed TIN, which combines a feature extraction module, an Enrichment module, and a Summarizer module, pioneering a new path for lightweight edge detection. FINED [30] further improves detection accuracy on the basis of TIN. Su et al. [31] proposed PiDiNet, which combines traditional edge detection operators with CNN models through pixel difference convolution (PDC) and reduces model parameters using depthwise separable convolution. Soria et al. [32] proposed TEED, which contains only 59k parameters; this model achieves a compact network structure by reducing the number of convolutional layers and skipping batch normalization, and introduces the Smish activation function to further improve training efficiency. Li et al. [33] proposed UHNet, achieving 166 FPS with 42.3k parameters via an innovative PDDP block and pooling-based channel conversion. These lightweight methods have, to some extent, enhanced detection efficiency and saved computational resources, but they still have limitations. Their ultra-lightweight design sacrifices the capture of fine structures, causing edge discontinuities in satellite details and failing to satisfy space targets' dual requirement of complete contours and clear local details. Recently, Liufu et al. [34] proposed SAUGE, which extracts multi-granular features from SAM's intermediate layers via a lightweight STN to generate edge maps of controllable granularity, showing strong cross-dataset generalization. However, SAM's features are biased toward natural scenes and adapt poorly to the geometric distributions of space targets, limiting direct migration to aerospace applications. Therefore, achieving an optimal balance between speed and accuracy, improving inference speed while ensuring edge quality, remains the focus of current research.
In the pursuit of performance enhancement in general scenarios, current research often overlooks the geometric prior constraints unique to the targets. This oversight can lead to semantic confusion, resulting in pseudo-edges that conflict with the geometric morphology of the target, thereby reducing the accuracy of detection. Moreover, due to the influence of occlusion or changes in lighting, the edge prediction of satellite solar panels often experiences local discontinuities or contour distortions, as shown in Figure 2. However, space targets typically possess distinct geometric features, such as the regular shape of satellites, the rectangular outline of solar panels, and the axial symmetry of the satellite body. These features are important identifiers that distinguish them from other objects or backgrounds. The above analysis fully demonstrates the importance of using geometric prior information to guide model learning in space target edge detection.
To address the issues of large parameter volume, low computational efficiency, insufficient detail capture, and failure to fully utilize the geometric prior constraints of targets in existing methods for space target edge detection, especially the phenomenon of local edge discontinuities, this paper proposes a satellite target edge detection method based on knowledge distillation, named STED-KD. STED-KD employs a teacher–student architecture, where both the teacher and student branches are based on an encoder–decoder structure. To deeply mine the shape prior information of space targets, a shape prior guidance module is embedded in the student branch. Additionally, a curvature-guided loss function is designed to fully utilize the characteristic of edge curvature changes (namely, that the curvature of continuous edges typically varies smoothly, while discontinuous edges often exhibit sudden curvature changes at breakpoints), thereby achieving natural edge closure. To comprehensively verify the effectiveness of the proposed method, this paper adopts an experimental design that prioritizes specialized datasets while using general datasets as supplements. With the UESD [1] dataset as the core, it focuses on verifying the targeted performance of the method in the task of space target edge detection. Meanwhile, the BSDS500 [7] general dataset is introduced to verify the generalization ability of the method in a wider range of scenarios, ensuring that the method is not only applicable to specific space target scenarios but also competitive in general edge detection. Extensive experiments conducted on UESD [1] and BSDS500 [7] demonstrate that, compared with current edge detection methods, the proposed STED-KD not only improves the accuracy of edge detection but also significantly reduces model complexity, achieving model lightweighting. The main contributions of this paper are outlined as follows:
  • This paper proposes a novel knowledge distillation-based edge detection method for space targets. By designing a distillation strategy, the student model with fewer parameters and simpler structure can effectively learn the key features and decision boundaries from the teacher model.
  • A shape-prior-guided module is embedded in the student branch. By constructing a shape-prior model, computing similarity, and reconstructing features, this module effectively incorporates the geometric shape information of spatial targets into the edge-detection pipeline, thereby enhancing the model’s adaptability to spatial targets and improving the accuracy of edge detection.
  • This paper designs a curvature-guided loss function. By leveraging the characteristic of edge curvature changes, the model is guided to better restore the continuity of discontinuous edges, thus achieving natural edge closure and further enhancing the overall effect of edge detection.
The remainder of this paper is organized as follows. Section 2 introduces related work on knowledge distillation and the geometric priors of spatial targets. Section 3 describes our STED-KD in detail. Experimental results on single-frame datasets are presented in Section 4. Finally, Section 5 presents our conclusions.

2. Related Work

Knowledge distillation is a typical neural network compression technique, first proposed by Hinton et al. [35] in 2015. Unlike other model compression methods such as model quantization and network pruning, knowledge distillation does not impose strict limitations on network structure and can achieve model compression based on different network frameworks.
Knowledge distillation adopts a teacher–student paradigm, where the teacher model usually has a complex structure and a large number of parameters, enabling it to extract rich feature information, while the student model has a simple structure and fewer parameters. The core of knowledge distillation lies in transforming the knowledge of the teacher model into a concise and effective representation, thereby reducing computational complexity and resource demands while maintaining high performance.
The selection of the teacher model is primarily based on performance. The model needs to have strong feature extraction capabilities to extract multi-level and multi-scale feature information from the input image. Although the main purpose of the teacher model is to provide high-quality knowledge guidance for the student model, its complexity and computational cost should also be within a reasonable range. An overly complex teacher model not only leads to high computational costs during training but also increases the difficulty of knowledge distillation. Therefore, the teacher model should be a network structure that balances performance and complexity.
The design of the student model mainly follows two core criteria: light weight and efficiency, and compatibility with the teacher model. On the one hand, the core goal of the student model is to achieve low computational demand and low storage occupancy, so as to run efficiently in resource-constrained environments. Specifically, the model should have fewer parameters to reduce storage space and computational burden, and its structure should be concise and efficient to meet the needs of real-time applications. On the other hand, the student model should be feature-aligned with the teacher model and have consistent output, thereby ensuring support for intermediate layer feature distillation and output layer distillation.
The geometric prior of spatial targets includes the following two categories:
(1)
Intrinsic Geometric Shape Features of Spatial Targets: Spatial targets typically possess specific geometric shape features that are important identifiers distinguishing them from other objects or backgrounds. For example, satellite components often exhibit regular rectangular or circular structures, while the background is mostly random noise or irregular structures. This difference between targets and backgrounds provides a crucial entry point for accurately distinguishing spatial targets. By extracting geometric shape features, important prior information can be provided for edge detection, thereby enhancing the performance of the task. In this paper, the Shape Prior Guidance Module we designed is based on this concept. It acquires structural information consistent with the morphological features of spatial targets through template matching, dynamic feature enhancement, and channel dimension filtering techniques and effectively suppresses background responses conflicting with geometric priors, thus improving the accuracy and robustness of edge detection.
(2)
Geometric Distribution and Topological Structure Constraints of Spatial Targets: Spatial targets are complex systems composed of multiple components. The locations and arrangement of these components in space (i.e., geometric distribution) and their interconnections and relative positions (i.e., topological structure) contain rich prior information. For instance, the satellite body usually occupies the central position and has a regular geometric shape (cuboid or cylinder). Solar panels are typically installed symmetrically on both sides of the satellite body and exhibit rectangular or square structures. The body and solar panels usually have axial symmetry and fixed connection points. The specific locations and arrangement of these components in space are crucial for understanding the overall structure of the satellite. By analyzing geometric distribution information, the various components of the satellite can be more accurately identified and interpreted, thereby improving the precision of edge detection. Moreover, introducing topological structure constraints ensures that the generated edge map not only matches the geometric shape features of each component but also maintains consistency with the interconnections and relative positions between components. This is particularly important in complex scenarios, such as when there is occlusion or poor lighting conditions, to achieve fine-grained parsing and understanding of the target’s overall structure and maintain the accuracy and integrity of edge detection.

3. Methodology

In this section, we introduce our STED-KD in detail.

3.1. Overview Architecture

Figure 3 illustrates the overall framework of the proposed method. STED-KD comprises a teacher branch and a student branch, both of which adopt a fully convolutional encoder–decoder structure to accommodate input images of arbitrary sizes. First, the teacher model guides the student model to learn its key features and decision boundaries through knowledge distillation (Section 3.2), achieving model lightweighting. Then, a shape prior guidance module (Section 3.3) is innovatively introduced into the student branch to leverage the geometric shape prior information of space targets for enhanced model learning. Last, the curvature characteristic of edge changes is incorporated into the loss function (Section 3.4) to further enhance the overall effect of edge detection.
The training pipeline consists of two stages: in the first stage, a pre-trained teacher model supervises the lightweight student network through intermediate-feature and output-level distillation to obtain an initial model; in the second stage, the generic features learned in the first stage are frozen, the Shape Prior Guidance Module is embedded, the geometric prior of space targets is fused into the network, and end-to-end fine-tuning is carried out with the curvature-guided edge loss.

3.2. Knowledge Distillation

To fully leverage the teacher model’s knowledge to enhance the performance of the student model, we adopt a multi-stage distillation mechanism. Below, we provide a detailed introduction from three main aspects: teacher model, student model, and the distillation approach.

3.2.1. Teacher Model

For knowledge distillation, the structural design of the teacher model is of vital importance, as it directly determines the quality of the knowledge that can be passed on to the student model. In the method proposed in this paper, the teacher model adopts a fully convolutional encoder–decoder structure. The encoder is responsible for mapping the original input image into a fixed-length high-dimensional feature vector to capture the key feature information within the image. The decoder then converts the feature vector output into specific edge prediction label sequences.
As shown in Figure 4, the encoder consists of 13 convolutional layers and 4 max-pooling layers. With the pooling layers serving as boundaries, the overall structure is divided into 5 blocks. Within each block, the number of feature channels remains consistent, increasing successively to 64, 128, 256, 512, and 512. All convolutional layers employ a 3 × 3 kernel size with a stride of 1 and padding set to “same” to ensure that the spatial dimensions of features remain unchanged before and after convolution. The pooling layers use a 2 × 2 kernel to achieve feature dimensionality reduction. In the encoder structure, as the number of feature channels gradually increases, the model is able to extract from lower-level simple features such as textures to higher-level features with more semantic information. Meanwhile, the progressive halving of the feature space dimensions helps to focus on key areas and eliminate redundant information.
The decoder is composed of upsampling layers, feature fusion layers, convolutional layers, and a softmax layer. The upsampling layer doubles the height and width of the input feature map to restore the spatial dimensions. It is then concatenated with the feature map from the encoder at the corresponding scale to preserve more detailed information. Taking the upsampling layer as a boundary, the overall structure is divided into 4 blocks. Between each block, the spatial dimensions of the features are progressively restored to the original size of the input image, and the number of channels is halved successively to 512, 256, 128, and 64. The SoftMax layer maps the output features to class probabilities, ultimately generating the edge prediction results.
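To make this layout concrete, the following PyTorch sketch mirrors the described teacher: an encoder with 13 convolutional layers arranged in 5 blocks of 64/128/256/512/512 channels separated by 4 max-pooling layers, and a decoder that upsamples, concatenates the encoder feature at the same scale, and halves the channel count stage by stage. The ReLU activations, the use of two convolutions per decoder block, and bilinear upsampling are assumptions not specified in the text; this is a sketch of the structure, not the authors' implementation.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, n_convs):
    """n_convs 3x3 convolutions (stride 1, 'same' padding), each followed by ReLU."""
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

class TeacherEncoder(nn.Module):
    """VGG16-style encoder: 13 conv layers in 5 blocks, separated by 4 max-pooling layers."""
    def __init__(self):
        super().__init__()
        cfg = [(3, 64, 2), (64, 128, 2), (128, 256, 3), (256, 512, 3), (512, 512, 3)]
        self.blocks = nn.ModuleList([conv_block(i, o, n) for i, o, n in cfg])
        self.pool = nn.MaxPool2d(2, 2)

    def forward(self, x):
        skips = []
        for k, block in enumerate(self.blocks):
            x = block(x)
            skips.append(x)
            if k < len(self.blocks) - 1:   # 4 pooling layers between the 5 blocks
                x = self.pool(x)
        return x, skips

class TeacherDecoder(nn.Module):
    """Decoder: upsample, concatenate the encoder skip at the same scale, then convolve;
    channels are halved successively to 512, 256, 128, and 64."""
    def __init__(self):
        super().__init__()
        chs = [(512 + 512, 512), (512 + 256, 256), (256 + 128, 128), (128 + 64, 64)]
        self.blocks = nn.ModuleList([conv_block(i, o, 2) for i, o in chs])
        self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
        self.head = nn.Conv2d(64, 2, 1)   # 2-class (edge / non-edge) logits; softmax is applied later

    def forward(self, x, skips):
        for block, skip in zip(self.blocks, reversed(skips[:-1])):
            x = block(torch.cat([self.up(x), skip], dim=1))
        return self.head(x)
```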

3.2.2. Student Model

When designing the structure of the student model, it is imperative to take into account the requirements of feature knowledge distillation for model structural similarity as well as the need for model lightweighting. Therefore, this paper selects an encoder–decoder structure similar to that of the teacher model as the basic framework for the student model. Specifically, the decoder part of the student model is consistent with that of the teacher model, while the encoder structure has been optimized, as shown in Table 1.
To further reduce computational complexity, depthwise separable convolution is employed in the encoder to replace traditional convolutional layers. Depthwise separable convolution decomposes standard convolution into two separate operations: depthwise convolution and pointwise convolution. Depthwise convolution can be regarded as a special type of group convolution, where the number of groups equals the number of channels, focusing on performing convolution operations on each channel individually. Pointwise convolution then completes the information fusion between different channels through a 1 × 1 ordinary convolution. This design significantly reduces computational complexity while maintaining model structural similarity, achieving the model’s lightweighting.
The encoder of the student model has approximately $1 \times 10^6$ parameters, a significant reduction compared to the teacher model's $1.4 \times 10^7$ parameters. This not only gives the student model an advantage in terms of storage and computing resource requirements, enabling it to better adapt to resource constraints in practical applications, but also effectively increases the model's running speed, meeting the needs of edge detection scenarios with high real-time requirements.
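A minimal PyTorch sketch of the depthwise separable convolution used in the student encoder is shown below; the channel counts in the parameter comparison are illustrative only.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution: a per-channel (depthwise) 3x3 convolution,
    followed by a 1x1 pointwise convolution that mixes information across channels."""
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# Parameter comparison for one 3x3 layer with 256 -> 256 channels (weights only):
#   standard convolution:         256 * 256 * 3 * 3 = 589,824
#   depthwise separable variant:  256 * 3 * 3 + 256 * 256 = 67,840  (~8.7x fewer)
```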

3.2.3. Multi-Stage Distillation

In the multi-stage distillation mechanism, each stage has its own unique method of knowledge transfer and learning focus. In the first stage, the intermediate feature maps from the encoder part of the teacher model are passed on as knowledge to the student model, and the student model’s feature representation capability is constrained by the feature distillation [36,37] loss function shown in Equation (1). This stage primarily focuses on the student model’s learning of the teacher model’s feature extraction ability, enabling it to initially grasp the teacher model’s multi-level feature representation of the input image.
$L_{mid} = \left\| \phi_t\left(f_t^x\right) - \phi_s\left(f_s^x\right) \right\|_2^2$
where $f_t^x$ and $f_s^x$ represent the intermediate-layer features output by the encoders of the teacher and student models, respectively. Since the network structures of the teacher and student models differ, the shapes and dimensions of their intermediate feature maps may not be consistent. $\phi_t(f_t^x)$ and $\phi_s(f_s^x)$ represent the transformed features of the teacher and student models, respectively, aligning them to the same feature map shape and dimension, thereby achieving one-to-one correspondence of feature points. The transformation functions are implemented using simple convolutional layers.
In the second stage, the output probability distribution of the teacher model is utilized. The student model’s output layer learning is guided by the Kullback–Leibler (KL) divergence loss function, as shown in Equation (2). This enables the student model to better approximate the teacher model’s classification decision-making ability. This stage focuses on the student model’s learning of the teacher model’s decision logic, thereby further enhancing the student model’s performance.
$L_{out} = \tau^2 \cdot \mathrm{KL}\left(p_t(x; \tau), p_s(x; \tau)\right)$
where $\mathrm{KL}$ denotes the loss incurred when approximating the probability distribution of the teacher network with that of the student network; a larger KL divergence indicates a poorer approximation. $\tau$ represents the temperature parameter in knowledge distillation, which controls the smoothness of the output probability vectors. The $\tau^2$ term maintains scale consistency, ensuring that the magnitude of the gradients is not affected by the temperature parameter. The temperature parameter $\tau$ adopts a dynamically adjusted strategy: it is set to 8 in the initial training stage to learn the global feature distribution, reduced to 5 in the middle stage to focus on key decision boundaries, and fixed at 3 in the later stage to strengthen the discrimination of clear edges. $p_t(x; \tau)$ and $p_s(x; \tau)$ represent the output probabilities of the teacher and student models, respectively, and can be expressed as Equations (3) and (4).
$p_t(x; \tau) = \mathrm{softmax}\left(z_t(x) / \tau\right)$
$p_s(x; \tau) = \mathrm{softmax}\left(z_s(x) / \tau\right)$
where $z_t(x)$ and $z_s(x)$ represent the original logit outputs of the teacher model and the student model for sample $x$, respectively. The overall loss for the knowledge-distillation stage is formulated as Equation (5).
$L_{kd} = \alpha \cdot L_{mid} + (1 - \alpha) \cdot L_{out}$
Here, the weight parameter α is set to a fixed value of 0.5. This specific value is chosen to balance the contributions of the intermediate feature distillation loss and the output-layer distillation loss. It ensures that the student model can effectively learn the teacher model’s intermediate-layer feature representation capabilities—enabling it to capture detailed features of spatial target edges—while also promoting alignment between the output probability distributions of the two models, thereby strengthening the learning of decision boundaries. This configuration thus accommodates the dual requirements of spatial target edge detection: accurate capture of feature details and robust learning of decision logic.
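The two distillation losses and their combination in Equation (5) can be expressed compactly in PyTorch, as sketched below. The 1×1 convolution used for feature alignment and the epoch boundaries of the temperature schedule are assumptions; the paper only states that the transformation functions are simple convolutional layers and that τ follows the 8→5→3 schedule.

```python
import torch.nn as nn
import torch.nn.functional as F

class FeatureAlign(nn.Module):
    """phi_t / phi_s in Eq. (1): a 1x1 convolution mapping teacher and student
    encoder features to a common shape before the L2 comparison (assumed form)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, f):
        return self.proj(f)

def feature_distill_loss(phi_t_ft, phi_s_fs):
    """Stage 1, Eq. (1): squared L2 distance between the aligned intermediate features."""
    return F.mse_loss(phi_s_fs, phi_t_ft)

def output_distill_loss(z_t, z_s, tau):
    """Stage 2, Eq. (2): temperature-scaled KL divergence between the teacher and
    student probability maps, multiplied by tau^2 to keep the gradient scale stable."""
    p_t = F.softmax(z_t / tau, dim=1)
    log_p_s = F.log_softmax(z_s / tau, dim=1)
    return (tau ** 2) * F.kl_div(log_p_s, p_t, reduction='batchmean')

def temperature_schedule(epoch, total_epochs):
    """Dynamic tau: 8 early, 5 in the middle, 3 late.
    The exact epoch boundaries are an assumption, not given in the paper."""
    if epoch < total_epochs // 3:
        return 8.0
    if epoch < 2 * total_epochs // 3:
        return 5.0
    return 3.0

def distillation_loss(phi_t_ft, phi_s_fs, z_t, z_s, tau, alpha=0.5):
    """Eq. (5): L_kd = alpha * L_mid + (1 - alpha) * L_out, with alpha fixed at 0.5."""
    return alpha * feature_distill_loss(phi_t_ft, phi_s_fs) \
        + (1.0 - alpha) * output_distill_loss(z_t, z_s, tau)
```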

3.3. Shape Prior Guidance

To fully exploit the geometric prior constraints of space targets, a shape prior guidance module is designed. The core of this module lies in enhancing the accuracy of edge detection through shape prior information. As shown in Figure 5, the module is mainly divided into three phases: shape prior model construction, similarity calculation, and feature reconstruction.

3.3.1. Shape Prior Model Construction

The shape prior database is constructed to provide a high-quality and representative data foundation for the subsequent shape prior encoder. This process includes several key steps:
(1)
Data Collection and Edge Annotation. After clarifying the target object or scene, widely collect relevant images to ensure coverage of variations in different angles, lighting conditions, and background environments to enhance the generalization ability of the shape prior model. For space targets, this paper collects images of key components (including solar panels, satellite bodies, antennae, etc.) under different angles and lighting conditions. Data annotation tools are used to annotate the edges of these key components, and they are roughly categorized into two types based on shape: circular and quadrilateral, as shown in Figure 6; these images are all from public datasets.
(2)
Statistical Feature Extraction. Conduct feature analysis on the annotated images to extract statistical features of the shape of the target components, including calculating the curvature distribution of edges and analyzing the directional features of edges, etc. For example, solar panels usually have regular rectangular edges with a relatively concentrated curvature distribution and mainly linear directional features. The curvature distribution can reflect the smoothness and bending of the object’s contour, while the directional features help to understand the main direction of the edges. These statistical features can quantitatively describe the typical characteristics of the target shape and provide a key basis for target recognition and classification.
(3)
Construction of the Shape Prior Model. Based on the statistical features, further use principal component analysis (PCA) to establish the shape prior model for each component separately. Combine the annotated edge coordinates with the extracted statistical features to form the feature vector f, and perform PCA analysis on the feature vector to extract the main feature vectors (principal components) and corresponding eigenvalues, as shown in Equation (6). The main reason for using standard PCA [38] for feature extraction when constructing the shape prior model is as follows: The regular geometric structure of space targets needs to be characterized by a continuous and complete feature distribution. Standard PCA can effectively retain the main feature components of the shape and ensure feature continuity, which meets the requirement of structural integrity for edge detection. Meanwhile, its calculation process is compatible with the existing model framework, which can avoid introducing additional complexity while satisfying the feature extraction effect.
$U, \Lambda = \mathrm{PCA}(f)$
where $U$ represents the matrix of principal components and $\Lambda$ represents the matrix of eigenvalues. Select the first three principal components of each component as the shape prior feature vectors and combine them to form the shape prior model $M$, as shown in Equation (7).
$M = \begin{bmatrix} u_{1,1} & u_{1,2} & u_{1,3} \\ u_{2,1} & u_{2,2} & u_{2,3} \\ u_{3,1} & u_{3,2} & u_{3,3} \end{bmatrix}$
where $u_{k,i}$ represents the $i$-th principal component feature vector of the $k$-th key component.
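The NumPy sketch below illustrates one way to build the shape prior model $M$ from annotated component features using standard PCA. The layout of the feature vector $f$ (edge coordinates stacked with curvature and direction statistics) and the synthetic data are assumptions used only for illustration.

```python
import numpy as np

def build_shape_prior(component_features, n_components=3):
    """Build the shape prior model M (Eqs. (6)-(7)).

    component_features: dict mapping a component name (e.g. 'solar_panel') to an
    array of shape (num_samples, feature_dim) that stacks annotated edge coordinates
    with the statistical features (curvature / direction descriptors).
    """
    prior_rows = []
    for name, f in component_features.items():
        f_centered = f - f.mean(axis=0, keepdims=True)
        # Standard PCA via SVD: rows of vt are the principal directions.
        _, _, vt = np.linalg.svd(f_centered, full_matrices=False)
        prior_rows.append(vt[:n_components])      # first 3 principal components u_{k,1..3}
    return np.stack(prior_rows)                   # M: (num_components, 3, feature_dim)

# Example with synthetic data: three key components, 50 samples of a 16-dim feature each.
rng = np.random.default_rng(0)
components = {k: rng.normal(size=(50, 16)) for k in ('body', 'solar_panel', 'antenna')}
M = build_shape_prior(components)
print(M.shape)   # (3, 3, 16)
```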

3.3.2. Similarity Calculation

Through the interaction between the encoder features $F_{enc} \in \mathbb{R}^{H \times W \times C}$ of the student branch and the prior model $M$, generate spatially adaptive convolutional kernel weights. First, calculate the similarity between the features and the template using channel-wise convolution, as shown in Equation (8).
$S = \mathrm{Sigmoid}\left(\mathrm{Conv}_{1 \times 1}\left(F_{enc} \otimes M\right)\right)$
where $\otimes$ denotes channel-wise correlation calculation and $S \in \mathbb{R}^{N \times H \times W}$ reflects the matching degree of each position with the template.

3.3.3. Feature Reconstruction

(1)
Dynamic Convolutional Kernel Generation. Based on the similarity matrix $S$, weight the base convolutional kernel $W_{base}$ to generate target-oriented dynamic convolutional kernels, as shown in Equation (9).
$W_{dyn} = S \cdot W_{base}$
The dynamic kernel $W_{dyn}$ adaptively focuses on feature regions that highly match the target components. In this way, the model can dynamically adjust the convolutional kernel parameters to enhance the feature response that conforms to the shape of the key components of the space target.
(2)
Feature Reconstruction. Use the dynamic convolutional kernel to guide feature reconstruction, as shown in Equation (10).
$F_{geo} = \mathrm{Conv}\left(F_{enc}, W_{dyn}\right)$
The reconstructed feature $F_{geo}$ can better capture the contour features of the target components, thereby improving the performance of edge detection.
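A simplified PyTorch sketch of the whole Shape Prior Guidance Module (Equations (8)-(10)) is given below. Representing the PCA-based prior $M$ as learnable template kernels, and collapsing the similarity map $S$ into per-template weights before combining the base kernels, are simplifying assumptions about one plausible realization of the dynamic kernel; the paper does not spell out these implementation details.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShapePriorGuidance(nn.Module):
    """Sketch of the Shape Prior Guidance Module: similarity with N shape templates
    (Eq. (8)), a similarity-weighted dynamic kernel (Eq. (9)), and feature
    reconstruction with that kernel (Eq. (10))."""
    def __init__(self, channels, num_templates=3, kernel_size=3):
        super().__init__()
        self.templates = nn.Parameter(torch.randn(num_templates, channels, kernel_size, kernel_size))
        self.base_kernels = nn.Parameter(torch.randn(num_templates, channels, channels, kernel_size, kernel_size))
        self.fuse = nn.Conv2d(num_templates, num_templates, kernel_size=1)
        self.pad = kernel_size // 2

    def forward(self, f_enc):
        # Eq. (8): channel-wise correlation with the templates -> N similarity maps.
        corr = F.conv2d(f_enc, self.templates, padding=self.pad)      # (B, N, H, W)
        S = torch.sigmoid(self.fuse(corr))                            # (B, N, H, W)

        # Eq. (9): collapse S spatially and weight the base kernels per image.
        weights = S.mean(dim=(2, 3))                                   # (B, N)
        f_geo = []
        for b in range(f_enc.size(0)):
            w_dyn = (weights[b].view(-1, 1, 1, 1, 1) * self.base_kernels).sum(dim=0)
            # Eq. (10): reconstruct the feature with the dynamic kernel.
            f_geo.append(F.conv2d(f_enc[b:b + 1], w_dyn, padding=self.pad))
        return torch.cat(f_geo, dim=0)                                 # (B, C, H, W)

# Usage: spgm = ShapePriorGuidance(channels=64); f_geo = spgm(torch.randn(2, 64, 32, 32))
```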

3.4. Loss Function

The proposed training pipeline consists of two stages. The first stage is knowledge-distillation training, whose loss function is detailed in Section 3.2.3. The second stage is shape-prior-augmented training, in which the student model's predictions are compared with the ground-truth edge map. The standard cross-entropy loss [39] $L_{CE}$ for edge detection is given by Equation (11).
$L_{CE} = -\frac{1}{N} \sum_{i=1}^{N} y_i \log p_S(x_i)$
where $N$ denotes the total number of pixels, $y_i$ represents the true label of the $i$-th pixel in the ground-truth edge map, and $p_S(x_i)$ represents the prediction probability of the student model for pixel $x_i$. $y_i$ takes the value 1 if the pixel belongs to an edge and 0 otherwise, while $p_S(x_i)$ is a probability in $[0, 1]$.
The cross-entropy loss mainly focuses on pixel-level classification accuracy. However, for the task of edge detection, the continuity and integrity of edges are equally crucial. To enable the model to generate smoother and more continuous edges, this paper innovatively incorporates curvature information [40] into the loss function, as shown in Equation (12), defined as the mean squared error between the predicted curvature $C(\hat{y})$ and the true curvature $C(y)$. By comparing the curvature of the predicted edge and the true edge, the model is guided to more accurately extract the geometric features of the edge, thereby generating prediction results that better match the true edge.
$L_{cur} = \frac{1}{N} \sum_{i=1}^{N} \left( C\left(p_S(x_i)\right) - C\left(y_i\right) \right)^2$
where $C(p_S(x_i))$ and $C(y_i)$ represent the curvatures of the predicted edge and the true edge at the $i$-th pixel, respectively.
Thus, the overall loss for the shape-prior-augmented training stage is given in Equation (13).
$L_{spa} = \gamma \cdot L_{CE} + \lambda \cdot L_{cur}$
Here, γ and λ are fixed weight parameters, set to 0.7 and 0.3, respectively, with no adaptive updates. γ is assigned a larger weight for the cross-entropy loss to prioritize the model’s accurate classification of edge pixels, while λ allocates an appropriate weight to the curvature loss, guiding the model to focus on the continuity of edge curvature changes.
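The second-stage loss can be sketched in PyTorch as follows. The curvature estimator, implemented here as the divergence of the normalized gradient field computed with finite differences, is an assumption, since the paper does not specify its discretization; the classification term uses the standard binary cross-entropy.

```python
import torch
import torch.nn.functional as F

def curvature(p, eps=1e-6):
    """Approximate curvature of an edge map as kappa = div(grad p / |grad p|),
    using forward finite differences (an assumed discretization)."""
    dy = F.pad(p[:, :, 1:, :] - p[:, :, :-1, :], (0, 0, 0, 1))   # d/dy, padded back to (H, W)
    dx = F.pad(p[:, :, :, 1:] - p[:, :, :, :-1], (0, 1, 0, 0))   # d/dx
    norm = torch.sqrt(dx ** 2 + dy ** 2 + eps)
    nx, ny = dx / norm, dy / norm
    dnx = F.pad(nx[:, :, :, 1:] - nx[:, :, :, :-1], (0, 1, 0, 0))
    dny = F.pad(ny[:, :, 1:, :] - ny[:, :, :-1, :], (0, 0, 0, 1))
    return dnx + dny

def curvature_loss(pred, target):
    """Eq. (12): mean squared error between predicted and ground-truth curvature."""
    return F.mse_loss(curvature(pred), curvature(target))

def shape_prior_stage_loss(pred, target, gamma=0.7, lam=0.3):
    """Eq. (13): gamma * L_CE + lambda * L_cur, with the fixed weights 0.7 / 0.3.
    pred and target are edge probability maps / binary edge maps in [0, 1]."""
    ce = F.binary_cross_entropy(pred, target)
    return gamma * ce + lam * curvature_loss(pred, target)
```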
The proposed knowledge-distillation-based edge detection procedure is summarized in Algorithm 1.
Algorithm 1 STED-KD
Input: image $I \in \mathbb{R}^{H \times W \times 3}$ // RGB space target image
Output: edge map $E \in \mathbb{R}^{H \times W}$
1. Teacher model ← pre-trained CNN
 teacher_features f_t ← Extract_features(Teacher_model, I);
 teacher_probs p_T ← Softmax(Teacher_model(I));
2. Multi-stage Knowledge Distillation ← Teacher model
 Stage-1: intermediate-feature distillation
  student_features f_s ← Extract_features(Student_model, I);
  Loss_mid ← MSE(teacher_features f_t, student_features f_s);
 Stage-2: output-level distillation
  student_probs p_S ← Softmax(Student_model(I));
  Loss_out ← KL(teacher_probs p_T, student_probs p_S);
  Loss_kd ← alpha * Loss_mid + (1 - alpha) * Loss_out;
 return pre-trained Student model.
3. Shape Prior Augmented Training ← ground_truth, pre-trained Student model
 Shape Prior Guidance Module
  1. Build shape prior:
   shape_prior M ← PCA(Edge_templates);
  2. Similarity computation:
   student_features F_enc ← Extract_features(pre-trained Student model, I);
   S ← Similarity(student_features F_enc, shape_prior M);
  3. Feature reconstruction:
   dynamic_kernel W_dyn ← Weighted_kernel(S, base_kernel W_base);
   reconstructed_feature F_geo ← DynamicConv(student_features F_enc, W_dyn);
   student_features' ← reconstructed_feature F_geo;
   student_probs' ← pre-trained Student model(student_features');
 Loss function:
  Loss_ce ← cross_entropy(student_probs', ground_truth);
  Loss_cur ← curvature_penalty(student_probs', ground_truth);
  Loss_spa ← gamma * Loss_ce + lambda * Loss_cur.

4. Experiments

4.1. Datasets

This paper uses the UESD [1] and BSDS500 [7] datasets for training and evaluating the proposed method. The experimental validation system is constructed around the core task of space target edge detection, with the selection of datasets and experimental design all guided by this core. Among them, the UESD [1] dataset, as a specialized dataset for space targets, serves as the core benchmark for evaluating the targeted performance of the method, and the effectiveness of all innovative modules (such as the shape prior guidance module and curvature loss function) is mainly based on the results from this dataset; the BSDS500 [7] dataset, on the other hand, acts as a general edge detection benchmark and is used to verify the generalization ability of the method.
The UESD dataset [1] was constructed by Zhao et al. specifically for satellite component recognition. A realistic space simulation environment was established based on Unreal Engine 4, and 33 different high-quality satellite models were imported into this environment, eventually generating 10,000 satellite images with different attitudes and backgrounds. Given the current lack of public datasets dedicated to space target edge detection, this paper selected 200 representative images from the UESD dataset [1] and annotated the edge information using the LabelMe [41] tool, as shown in Figure 7.
During the annotation process, emphasis was placed on the edges of components such as the satellite body, solar panels, and antennas, while edges in the background and minor detail edges within the components were consciously ignored. This selective annotation method significantly reduced the workload but may have a certain impact on subsequent evaluation results, especially in the calculation of precision and recall. This will be discussed in detail in the experimental analysis section. The annotation results are saved in JSON format for subsequent processing and analysis.
The BSDS500 dataset [7] is one of the most widely used datasets in the field of edge detection, containing 500 images (200 training images, 100 validation images, and 200 test images). BSDS500 covers a variety of different scenes and objects, ranging from natural landscapes to man-made structures and from simple backgrounds to complex textures. Each image has been annotated by multiple annotators to ensure the accuracy and reliability of the annotations. As a general dataset, its role is to reflect the basic performance of the method in general edge detection tasks, proving that it is not only “overfitted” to space target scenarios but also has universal applicability.

4.2. Implementation Details

Experiments are implemented in PyTorch with Python 3.7. Training and testing are conducted on the BSDS500 [7] and UESD [1] datasets. The final test results are evaluated using MATLAB R2024 tools. The training process uses the Adam optimizer with a learning rate of $1 \times 10^{-5}$, a weight decay of $5 \times 10^{-4}$, and a batch size of 8. The temperature parameter $\tau$ in knowledge distillation is dynamically adjusted according to the training stage (8→5→3).
To ensure a fair comparison with other works, this paper adopts commonly used evaluation metrics to measure the performance of edge detection algorithms, including Average Precision (AP) and the F-measure at Optimal Dataset Scale (ODS) and Optimal Image Scale (OIS). The F-measure is defined as Equation (14).
$F\text{-}measure = \frac{2 \times P \times R}{P + R}$
where P represents the precision, which is the proportion of true edge pixels among the detected edge pixels. R represents the recall, which is the proportion of all true edge pixels that are successfully detected. The calculation formulas for precision and recall are as follows.
$P = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}$
$R = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}$
In the formula, TP (True Positive) represents the number of edge pixels that are correctly predicted as edges, meaning the algorithm correctly identifies edge pixels as edges. FP (False Positive) indicates the number of non-edge pixels that are incorrectly predicted as edges, meaning the algorithm mistakenly identifies non-edge pixels as edges. FN (False Negative) represents the number of edge pixels that are incorrectly predicted as non-edges, meaning the algorithm fails to identify edge pixels as edges. The higher the values of precision and recall, the better the performance of the edge detection algorithm.
It is worth noting that edge detection algorithms typically output an edge probability map, where each pixel value indicates the probability of that point being an edge point, with values ranging from 0 to 1. To convert the edge probability map into a binary edge map, binarization processing through threshold setting is required. The Optimal Dataset Scale (ODS) applies a unified fixed threshold to all images to maximize the F-measure across the entire dataset. In contrast, the Optimal Image Scale (OIS) selects an optimal threshold for each image individually to maximize the F-measure for that specific image.
Furthermore, by calculating precision and recall at different thresholds, the Precision-Recall (PR) curve can be plotted. The Average Precision (AP) corresponds to the area under the PR curve, reflecting the algorithm’s average precision performance at various recall levels. Together, these evaluation metrics provide a comprehensive and quantitative assessment of the edge detection algorithm’s performance.
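The sketch below illustrates how ODS, OIS, and AP follow from the per-threshold precision and recall just defined. It computes pixel-wise matches and omits the small-distance boundary-matching tolerance used by the official BSDS benchmark, so it is intended only to make the definitions concrete, not to reproduce the benchmark scores.

```python
import numpy as np

def f_measure(p, r, eps=1e-12):
    return 2 * p * r / (p + r + eps)

def pr_at_threshold(prob_map, gt, thr):
    """Pixel-wise precision / recall at one threshold (gt is a boolean edge mask)."""
    pred = prob_map >= thr
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    return tp / (tp + fp + 1e-12), tp / (tp + fn + 1e-12)

def ods_ois(prob_maps, gts, thresholds=np.linspace(0.01, 0.99, 99)):
    """ODS: one threshold maximising the F-measure over the whole dataset.
    OIS: the best per-image threshold, averaged over images."""
    per_img = np.array([[f_measure(*pr_at_threshold(pm, gt, t)) for t in thresholds]
                        for pm, gt in zip(prob_maps, gts)])   # (num_images, num_thresholds)
    return per_img.mean(axis=0).max(), per_img.max(axis=1).mean()

def average_precision(prob_maps, gts, thresholds=np.linspace(0.01, 0.99, 99)):
    """AP: area under the dataset-level precision-recall curve (trapezoidal rule)."""
    prs = []
    for t in thresholds:
        tps = fps = fns = 0
        for pm, gt in zip(prob_maps, gts):
            pred = pm >= t
            tps += np.logical_and(pred, gt).sum()
            fps += np.logical_and(pred, ~gt).sum()
            fns += np.logical_and(~pred, gt).sum()
        prs.append((tps / (tps + fps + 1e-12), tps / (tps + fns + 1e-12)))
    p, r = zip(*prs)
    order = np.argsort(r)
    return np.trapz(np.array(p)[order], np.array(r)[order])
```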

4.3. Comparison with State-of-the-Arts

On BSDS500 [7], we compare our model with 22 edge detection methods to comprehensively evaluate the performance, including traditional gradient-based methods such as Sobel [5] and Canny [6]; handcrafted feature-based methods like MShift [14], EGB [15], and OEF [11]; early non-end-to-end deep learning-based methods like DeepEdge [20], DeepContour [19], and HFL [21]; classic end-to-end deep learning-based methods like HED [22], BDCN [24], and RCF [23]; lightweight deep learning-based methods like PiDiNet [31] and TIN [29]; and more advanced methods proposed in the last two years, such as UAED [27] and DiffusionEdge [28]. The evaluation results for all methods are taken from their publications. Quantitative results are shown in Table 2, and Figure 8 shows the Precision–Recall curves of all methods. It should be noted that the experiment on the BSDS500 dataset focuses on evaluating the generality of the method in general scenarios, while the edge detection performance for the specific scenario of space targets will be verified through subsequent qualitative and quantitative results on the UESD [1] dataset. These two parts together form a comprehensive evaluation of the method’s performance.
As presented in Table 2, the proposed method attains scores of 0.818, 0.829, and 0.850 on the ODS, OIS, and AP metrics, respectively. Although these results have not yet reached state-of-the-art performance levels, our method still exhibits strong competitiveness and offers unique advantages. Compared with traditional edge detection methods and most deep learning-based approaches, our method achieves significant performance improvements. For instance, compared with HED [22], our method achieves improvements of approximately 3.0%, 2.1%, and 1.0% on the ODS, OIS, and AP metrics, respectively, while reducing the model parameter count by approximately 66.7%. Similarly, compared with BDCN [24], our method achieves improvements of about 1.2%, 0.3%, and 0.3% on the ODS, OIS, and AP metrics, respectively, while also reducing the model parameter count by approximately 70.2%. It is evident that our method maintains high edge detection performance while effectively reducing model complexity. In addition, compared with lightweight deep-learning methods such as PiDiNet [31] and TIN [29], the proposed method also demonstrates significant advantages. On the ODS, OIS, and AP metrics, it achieves improvements of approximately 3.9%, 3.1%, and 4.0% over PiDiNet [31], respectively. On the ODS and OIS metrics, it achieves improvements of approximately 5.9% and 4.3% over TIN [29], respectively. This further validates that the proposed method can more effectively capture edge features and improve detection accuracy while achieving model lightweighting.
Figure 9 illustrates the edge detection results of Canny [6], HED [22], PiDiNet [31], DexiNed [25], BDCN [24], UAED [27], DiffusionEdge [28], and the method proposed in this paper. Although there is still a certain gap compared with UAED [27] and DiffusionEdge [28], the proposed method generates clearer and more easily identifiable edge lines than most other methods. The proposed method outperforms methods such as Canny [6] and DexiNed [25] by more effectively suppressing background noise and reducing the number of false positives. Additionally, compared with methods such as BDCN [24] and PiDiNet [31], the proposed method can retain richer and more complete target edges as much as possible.
In the proposed method, the Shape Prior Guidance Module plays a key role. This module guides the feature learning process of the model by constructing prior models, which are simple yet representative and can provide effective prior knowledge for edge detection of space targets. However, this prior model fails to achieve ideal results on the BSDS500 [7] dataset. Specifically, space targets usually have relatively regular and simple geometric shapes, such as rectangular solar panels and cuboid, cylindrical, or spherical bodies. These simple geometric features enable the model to relatively easily extract effective shape prior information, thereby achieving better detection results on the space target dataset. In contrast, the images in the BSDS500 [7] dataset contain a large number of complex natural scenes and objects, whose edges and shape features are far more complex than those of space targets. The simple prior model struggles to fully capture these complex features, resulting in less significant performance improvements on this dataset. In the future, we plan to further optimize the Shape Prior Guidance Module and explore more complex and effective prior models. This will enhance the model's performance, particularly in handling complex structures, and enable it to achieve better results on a wider variety of image types.
On UESD [1], the proposed method is compared with various edge detectors that exhibit relatively superior performance on the BSDS500 dataset, including Canny [6], MCG [17], OEF [11], HED [22], BDCN [24], DexiNed [25], PidiNet [31], and DiffusionEdge [28]. The evaluation results are presented in Table 3, and the precision–recall curves are depicted in Figure 10. The proposed method achieves scores of 0.659, 0.715, and 0.596 on the ODS, OIS, and AP metrics, respectively, outperforming all other edge detectors. Compared with the second-ranked method, the proposed method improves by 2.4%, 1.3%, and 1.3% on the ODS, OIS, and AP metrics, respectively.
It is worth noting that all methods are pre-trained on the BSDS500 dataset and then directly transferred to the UESD [1] dataset for evaluation. This experimental design does not ignore the value of in-domain training but is based on the practical constraints of the space target edge detection task, namely that dedicated annotated data in this field is extremely scarce, making it difficult to support independent training of the model. Although we have reason to believe that training directly on UESD [1] could further improve the model's metrics, in this scenario the practical value of the model is reflected not only in in-domain optimality but, more crucially, in the cross-domain transfer capability from general scenarios to the space target domain.
When evaluating on UESD [1] dataset, it can be observed that although quantitative assessment results reflect the performance to a certain extent, they fail to accurately represent the actual effectiveness of the method due to the special handling in the data annotation process. Specifically, during the annotation of data, we focused on the satellite targets themselves, annotating only the prominent edges of key satellite components while neglecting the background edges and finer details. While this approach simplifies the annotation workflow, it introduces potential biases into the evaluation results. Taking the calculation of precision as an example, the lack of annotations for background edges and the finer edges makes it difficult to accurately assess the model’s detection results for these edges. The model is prone to misclassifying unannotated edges as true edges, and these misclassifications are categorized as false positives (FP) during evaluation, leading to an increase in the FP value and resulting in a precision evaluation that is lower than the model’s actual performance level. Furthermore, this annotation approach may also affect the calculation of recall. Since some true edges are not annotated, even if the model correctly detects these edges, they cannot be counted as true positives (TP) in the evaluation, which may cause the recall evaluation to be lower than the actual level as well. Therefore, when analyzing the evaluation results, it is essential to fully consider the impact of this annotation bias on the performance metrics to gain a more comprehensive understanding of the model’s actual performance.
Figure 9. Qualitative comparisons with previous state-of-the-arts on BSDS500 [7]. Experiments on this dataset aim to verify the generalization ability and competitiveness of the method in general edge detection scenarios. For qualitative results specific to space target scenarios, please refer to Figure 11.
A further analysis of the aforementioned phenomenon is reflected in the precision-recall curve. As the threshold gradually decreases, the number of edges that the model identifies as positive samples increases, encompassing both more true edges (TP increases) and some background noise (FP increases). Meanwhile, the number of undetected true edges (FN) decreases correspondingly, driving the recall rate to continuously rise. The fluctuation of precision is constrained by the dual factors of the growth rates of TP and FP. Under ideal circumstances, in the low recall range, the model can only detect edges with the highest confidence, which are typically true edges, resulting in high precision and a relatively flat curve. As the recall rate increases, precision gradually decreases, but at a slower rate, leading to a smoother curve. However, due to the fact that only some prominent edges are annotated in the images, the few edges detected by the model at low recall may include some misclassifications, resulting in a lower precision. As the threshold further decreases and the model gradually detects more edges, the growth rate of TP may exceed that of FP at this stage, causing precision to initially increase, as shown in Figure 10.
Figure 10. The precision–recall curves on UESD [1,6,11,17,22,24,25,28,31].
As can be observed from Table 3, HED [22], which performs slightly less effectively on BSDS500 [7], demonstrates relatively outstanding performance in the task of spatial target edge detection. Below, we explore the main reasons for this phenomenon by combining the qualitative results in Figure 11. The HED [22] method adopts a top-down progressive feature extraction approach. While this method can obtain relatively macroscopic feature information to a certain extent, it inevitably leads to the loss of some detail information. The loss of detail information, on the one hand, results in the loss of target detail features; on the other hand, it also reduces the misjudgment of unlabeled background or target detail edges as real edges. The reduction in the number of FPs directly leads to an increase in precision, thereby enabling the HED [22] method to exhibit certain advantages in the evaluation metrics of the spatial target edge detection task. However, it should be noted that this good performance in evaluation metrics does not fully demonstrate the method's reliability and stability in complex and variable spatial environments.
The spatial target edge detection results of various edge detectors are shown in Figure 11. Column (a) displays satellite images with simple backgrounds but rich in detail. It can be observed that methods such as HED [22], DexiNed [25], TEED [32], and BDCN [24] often tend to miss details and fail to detect complete edges when processing such images. For instance, in the regions indicated by the red boxes in the figure, the aforementioned methods struggle to accurately present the contours and details of the satellite body. In contrast, the method proposed in this paper can accurately detect the complete shape and subtle edges of the satellite body, demonstrating its advantages in capturing high-level semantic information and low-level details. Focusing further on the solar panel component, compared with methods like DexiNed [25] and TEED [32], the edges extracted by the proposed method eliminate the interference of redundant and cluttered lines, showing significant advantages in terms of clarity and accuracy. Column (b) consists of satellite images affected by lighting conditions. Lighting variations cause edges to break locally or disappear entirely in certain areas, making edge detection more challenging. For instance, in the areas marked by red boxes in the figure, existing methods cannot close the broken parts of the solar panel edges, resulting in incomplete edge maps. The proposed method can effectively close the broken parts and generate continuous, smooth edges. Columns (c) and (d) present satellite images with complex backgrounds. It can be seen that methods like DexiNed [25] and TEED [32] detect a large number of false edges, making it difficult to clearly identify the satellite edges. The proposed method can effectively suppress background noise and detect clearer and more distinct satellite edges.
Figure 11. Edge detection results examples of various detectors on UESD [1] dataset. (a) Satellite images with simple backgrounds but rich in details; (b) satellite images affected by illumination; (c,d) satellite images with complex backgrounds.

4.4. Ablation Study

This section conducts three sets of ablation experiments to validate the effectiveness of each component (knowledge distillation, shape prior guidance, and curvature loss) proposed in this paper, as shown in Table 4. “Baseline-T” represents the teacher model as the baseline, while “Baseline-S” represents the student model as the baseline. Neither baseline adopts knowledge distillation or includes the shape prior guidance module; both use only the traditional cross-entropy loss $L_{CE}$ as the loss function. To further verify the role of each component, different strategies and modules are gradually introduced into the baselines. Specifically, “+KD” indicates the introduction of the knowledge distillation strategy, where the loss function consists of two parts: the knowledge distillation loss between the student model and the teacher model, and the cross-entropy loss between the student model’s output and the true edges, with the weight coefficients of both set to 0.5. “+SPGM” denotes the embedding of the shape prior guidance module in the encoder–decoder structure of the student model. “+$L_{cur}$” signifies the incorporation of curvature loss into the loss function.
In the first set of experiments, the performance differences between the two baselines, “Baseline-S” and “Baseline-T”, are compared. As shown in Table 4, the ODS, OIS, and AP of “Baseline-S” are 0.598, 0.632, and 0.535, respectively, which exhibit a certain gap compared to “Baseline-T”. This indicates that the teacher model has advantages in feature representation and decision-making capabilities, while the student model still has room for improvement in these aspects.
In the second set of experiments, the contributions of the three modules (knowledge distillation, shape prior guidance, and curvature loss) to enhancing the performance of the student model were individually assessed. As shown in Table 4, the ODS, OIS, and AP values for “+KD” are 0.630, 0.686, and 0.571, respectively, representing improvements of 3.2%, 2.4%, and 2.0% over “Baseline-S”. The experimental results demonstrate that knowledge distillation enables the student model to effectively learn the critical features of the teacher model, thereby enhancing its feature representation and decision-making capabilities. Knowledge distillation significantly improves the performance of the student model, with all of its metrics surpassing those of the Baseline-S network, fully validating the effectiveness of the knowledge distillation strategy. The ODS, OIS, and AP values for “+SPGM” are 0.628, 0.688, and 0.575, respectively, representing improvements of 3.0%, 2.6%, and 2.4% over “Baseline-S”, which confirms that the shape prior guidance module effectively enhances the accuracy of edge detection. The ODS, OIS, and AP values for “+$L_{cur}$” are 0.608, 0.674, and 0.559, respectively, and the experiments indicate that incorporating curvature loss can achieve edge optimization.
In the third set of experiments, the components are optimized jointly to evaluate their synergy. The results show that pairwise combinations of the components generally yield further gains over the individual components, and the full model combining all three achieves the best results (ODS 0.659, OIS 0.715, AP 0.596). This further validates the effectiveness and complementarity of knowledge distillation, shape prior guidance, and curvature loss in edge detection.

4.5. Parameter Analysis

To verify the impact of the weight parameters γ (cross-entropy loss weight) and λ (curvature loss weight) in the loss function on model performance, this section systematically tests different parameter combinations through controlled-variable experiments. The experiments are conducted on the UESD dataset with a fixed learning rate of 0.001 and a batch size of 8, and the core metrics are ODS, OIS, and AP. The experiments cover key combinations of γ (0.5/0.6/0.7/0.8) and λ (0.2/0.3/0.4/0.5), focusing on how parameter changes affect space target edge detection. The results are shown in Table 5. When γ = 0.7 and λ = 0.3, the model achieves the best performance on all three metrics. With this combination, a suitably large γ keeps the cross-entropy loss dominant for edge pixel classification (improving localization accuracy), while a moderate λ allows the curvature loss to effectively constrain edge continuity (reducing breaks and redundant responses), matching the detection requirements of space targets.
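As a reading aid for Table 5, the sketch below shows one way a weighted objective of the form γ·L_CE + λ·L_cur could be written. The Laplacian-based curvature term is only an illustrative stand-in for the curvature-guided loss defined earlier in the paper, and the distillation terms of the full objective are omitted here.

import torch
import torch.nn.functional as F

def weighted_edge_loss(pred_logits, gt_edges, gamma=0.7, lam=0.3):
    # Cross-entropy term (weight gamma): pixel-wise edge / non-edge classification.
    l_ce = F.binary_cross_entropy_with_logits(pred_logits, gt_edges)
    # Illustrative curvature-style term (weight lambda): penalise second-order
    # variation of the edge probability map, discouraging jagged or broken responses.
    prob = torch.sigmoid(pred_logits)                       # (B, 1, H, W)
    lap = torch.tensor([[0., 1., 0.],
                        [1., -4., 1.],
                        [0., 1., 0.]], device=prob.device).view(1, 1, 3, 3)
    l_cur = F.conv2d(prob, lap, padding=1).pow(2).mean()
    return gamma * l_ce + lam * l_cur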
As can be seen from Table 5, γ controls the weight of the cross-entropy loss and directly affects the classification accuracy of edge pixels. When γ is in the range 0.5–0.7, all three metrics improve steadily as γ increases (e.g., from (0.3, 0.5) to (0.3, 0.7), ODS increases by 0.024), indicating that appropriately increasing the weight of pixel classification improves localization accuracy. However, when γ exceeds 0.7 (e.g., γ = 0.8), the metrics drop instead (ODS decreases by 0.018), because excessive focus on local pixels sacrifices the consistency of the overall edge morphology. λ controls the weight of the curvature loss and adjusts edge continuity. When λ is in the range 0.2–0.3, performance improves gradually as λ increases (e.g., from (0.2, 0.7) to (0.3, 0.7), OIS increases by 0.012), showing that a moderate curvature constraint enhances edge smoothness. However, when λ exceeds 0.3 (e.g., λ = 0.5), OIS drops noticeably by 0.023, because over-smoothing loses key details such as satellite antennas, confirming the need to balance curvature constraints with detail preservation.
When γ and λ deviate from the optimal ratio (e.g., (0.4, 0.8)), performance declines across all metrics, indicating that the weights of the cross-entropy loss (responsible for localization) and the curvature loss (responsible for continuity) must be tuned jointly; strengthening either loss unilaterally leads to unbalanced detection results.

5. Conclusions

This paper proposes a satellite target edge detection method based on knowledge distillation. A multi-stage distillation strategy guides a lightweight, fully convolutional network with fewer parameters to learn critical features and decision boundaries from a complex teacher model. A shape prior guidance module embedded in the student branch integrates the geometric prior information of space targets, and a curvature-guided edge loss function effectively alleviates local edge breakage. The experimental validation centers on space target edge detection: the results on the UESD dataset demonstrate the targeted effectiveness of the method for space targets, while the results on BSDS500 verify its generalization ability. Together, these results support the effectiveness of the method.
Although progress has been made on the task of space target edge detection, several directions remain worth further research and improvement. In the future, we plan to work on the following aspects:
(1)
In the Related Work section, this paper summarizes the geometric prior information of space targets, including geometric distribution and topological structure constraints. Subsequent research will examine how to exploit this prior information more effectively to guide model learning, for example, using the topological constraints of space targets to optimize edge connectivity, ensuring that detected edges conform to the actual topological relationships of the targets and reducing unreasonable edge breaks and connections.
(2)
In the proposed method, simple and representative prior models are constructed to guide feature learning, providing effective prior knowledge for space target edge detection. However, this prior model performs relatively poorly on the BSDS500 dataset: the geometric shapes of space targets are simple, whereas BSDS500 contains complex natural scenes and objects whose features a simple prior model struggles to capture. In the future, we will further refine the shape prior guidance module and explore more expressive prior models to improve the handling of complex structures across a wider variety of images.
(3)
For occlusion and illumination variation in complex scenes, although the current method has some ability to cope, there is still room for improvement. In the future, edge recovery strategies can be explored, such as edge completion based on Generative Adversarial Networks (GANs), using the adversarial training of a generator and a discriminator to predict the edges of occluded parts.

Author Contributions

Conceptualization, L.Z., Y.Z., M.H., F.Z. and X.S.; formal analysis, Y.M. and L.Z.; investigation, Y.M.; methodology, Y.M., L.Z., Y.Z., M.H., F.Z. and X.S.; software, Y.M.; supervision, Y.Z., M.H. and F.Z.; validation, Y.M.; visualization, X.S.; writing—original draft, Y.M.; writing—review and editing, L.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Zhao, Y.; Zhong, R.; Cui, L. Intelligent recognition of spacecraft components from photorealistic images based on Unreal Engine 4. Adv. Space Res. 2023, 71, 3761–3774.
2. Roberts, L. Machine Perception of Three-Dimensional Solids; Massachusetts Institute of Technology: Cambridge, MA, USA, 1963.
3. Prewitt, J.M.S. Object enhancement and extraction. Pict. Process. Psychopictorics 1970, 10, 15–19.
4. Kittler, J. On the accuracy of the Sobel edge detector. Image Vis. Comput. 1983, 1, 37–42.
5. Torre, V.; Poggio, T.A. On edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 1986, PAMI-8, 147–163.
6. Canny, J. A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 1986, PAMI-8, 679–698.
7. Arbelaez, P.; Maire, M.; Fowlkes, C.; Malik, J. Contour detection and hierarchical image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 898–916.
8. Ren, X.; Bo, L. Discriminatively trained sparse code gradients for contour detection. In Proceedings of the 26th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–8 December 2012; Volume 1, pp. 584–592.
9. Martin, D.R.; Fowlkes, C.C.; Malik, J. Learning to detect natural image boundaries using local brightness, color, and texture cues. IEEE Trans. Pattern Anal. Mach. Intell. 2004, 26, 530–549.
10. Zhang, Z.; Xing, F.; Shi, X.; Yang, L. Semicontour: A semi-supervised learning approach for contour detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 251–259.
11. Hallman, S.; Fowlkes, C.C. Oriented edge forests for boundary detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1732–1740.
12. Konishi, S.; Yuille, A.L.; Coughlan, J.M.; Zhu, S.C. Statistical edge detection: Learning and evaluating edge cues. IEEE Trans. Pattern Anal. Mach. Intell. 2003, 25, 57–74.
13. Dollár, P.; Zitnick, C.L. Fast edge detection using structured forests. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1558–1570.
14. Comaniciu, D.; Meer, P. Mean shift: A robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 603–619.
15. Felzenszwalb, P.F.; Huttenlocher, D.P. Efficient graph-based image segmentation. Int. J. Comput. Vis. 2004, 59, 167–181.
16. Cheng, M.M.; Liu, Y.; Hou, Q.; Bian, J.; Torr, P.; Hu, S.M.; Tu, Z. HFS: Hierarchical feature selection for efficient image segmentation. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part III; Springer International Publishing: Berlin/Heidelberg, Germany, 2016; pp. 867–882.
17. Arbeláez, P.; Pont-Tuset, J.; Barron, J.T.; Marques, F.; Malik, J. Multiscale combinatorial grouping. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 328–335.
18. Ganin, Y.; Lempitsky, V. N4-Fields: Neural network nearest neighbor fields for image transforms. In Proceedings of the Asian Conference on Computer Vision (ACCV), Singapore, 1–5 November 2014.
19. Shen, W.; Wang, X.; Wang, Y.; Bai, X.; Zhang, Z. DeepContour: A deep convolutional feature learned by positive-sharing loss for contour detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3982–3991.
20. Bertasius, G.; Shi, J.; Torresani, L. DeepEdge: A multi-scale bifurcated deep network for top-down contour detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 4380–4389.
21. Bertasius, G.; Shi, J.; Torresani, L. High-for-low and low-for-high: Efficient boundary detection from deep object features and its applications to high-level vision. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 504–512.
22. Xie, S.; Tu, Z. Holistically-nested edge detection. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1395–1403.
23. Liu, Y.; Cheng, M.M.; Hu, X.; Wang, K.; Bai, X. Richer convolutional features for edge detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3000–3009.
24. He, J.; Zhang, S.; Yang, M.; Shan, Y.; Huang, T. Bi-directional cascade network for perceptual edge detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3828–3837.
25. Poma, X.S.; Riba, E.; Sappa, A. Dense extreme inception network: Towards a robust CNN model for edge detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Aspen, CO, USA, 1–5 March 2020; pp. 1923–1932.
26. Pu, M.; Huang, Y.; Liu, Y.; Guan, Q.; Ling, H. EDTER: Edge detection with transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 1402–1412.
27. Zhou, C.; Huang, Y.; Pu, M.; Guan, Q.; Huang, L.; Ling, H. The treasure beneath multiple annotations: An uncertainty-aware edge detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 15507–15517.
28. Ye, Y.; Xu, K.; Huang, Y.; Yi, R.; Cai, Z. DiffusionEdge: Diffusion probabilistic model for crisp edge detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 6675–6683.
29. Wibisono, J.K.; Hang, H.M. Traditional method inspired deep neural network for edge detection. In Proceedings of the 2020 IEEE International Conference on Image Processing (ICIP), Virtual, 25–28 October 2020; IEEE: New York, NY, USA, 2020; pp. 678–682.
30. Wibisono, J.K.; Hang, H.M. FINED: Fast inference network for edge detection. In Proceedings of the 2021 IEEE International Conference on Multimedia and Expo (ICME), Virtual, 5–9 July 2021; IEEE: New York, NY, USA, 2021; pp. 1–6.
31. Su, Z.; Liu, W.; Yu, Z.; Hu, D.; Liao, Q.; Tian, Q.; Liu, L. Pixel difference networks for efficient edge detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 10–17 October 2021; pp. 5117–5127.
32. Soria, X.; Li, Y.; Rouhani, M.; Sappa, A.D. Tiny and efficient model for the edge detection generalization. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 1364–1373.
33. Li, F.; Lin, C. UHNet: An ultra-lightweight and high-speed edge detection network. arXiv 2024, arXiv:2408.04258.
34. Liufu, X.; Tan, C.; Lin, X.; Qi, Y.; Li, J.; Hu, J.F. SAUGE: Taming SAM for Uncertainty-Aligned Multi-Granularity Edge Detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 25 February–4 March 2025; Volume 39, pp. 5766–5774.
35. Hinton, G.; Vinyals, O.; Dean, J. Distilling the knowledge in a neural network. arXiv 2015, arXiv:1503.02531.
36. Gou, J.; Yu, B.; Maybank, S.J.; Tao, D. Knowledge distillation: A survey. Int. J. Comput. Vis. 2021, 129, 1789–1819.
37. Moslemi, A.; Briskina, A.; Dang, Z.; Li, J. A survey on knowledge distillation: Recent advancements. Mach. Learn. Appl. 2024, 18, 100605.
38. Abdi, H.; Williams, L.J. Principal component analysis. Wiley Interdiscip. Rev. Comput. Stat. 2010, 2, 433–459.
39. Mao, A.; Mohri, M.; Zhong, Y. Cross-entropy loss functions: Theoretical analysis and applications. In Proceedings of the International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; pp. 23803–23828.
40. Marin, D.; Zhong, Y.; Drangova, M.; Boykov, Y. Thin structure estimation with curvature regularization. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 397–405.
41. Russell, B.C.; Torralba, A.; Murphy, K.P.; Freeman, W.T. LabelMe: A database and web-based tool for image annotation. Int. J. Comput. Vis. 2008, 77, 157–173.
Figure 1. Example of space target images from the UESD dataset proposed in [1], where the images are generated using NASA’s collected satellite geometric models. (a) MRO3; (b) Aqua; (c) ACRIMSAT_0522; (d) Juno.
Figure 2. Illustration of the results of edge detector, showing local discontinuities or distortions in solar panel edges. (a) BDCN [24]; (b) DexiNed [25]; (c) HED [22]; (d) PiDiNet [31].
Figure 3. Overall framework of the proposed method.
Figure 4. The encoder–decoder structure of the teacher branch.
Figure 5. Architecture of shape prior guidance module.
Figure 6. Annotated component images. (a) Quadrilateral; (b) circular.
Figure 7. Examples of annotation for space target images. The first row shows the original images from the UESD [1] and the second row shows the corresponding binarized images obtained through annotation.
Figure 8. The precision–recall curves on BSDS500 [4,6,7,11,13,14,15,16,17,19,20,21,22,23,27,28,31].
Table 1. Encoder of the student branch.
Encoder Layer    Output Shape         Param
Input            (w, h, 3)            -
DSConv1          (w, h, 64)           219
pool             (w/2, h/2, 64)       -
DSConv2          (w/2, h/2, 128)      8768
pool             (w/4, h/4, 128)      -
DSConv3          (w/4, h/4, 256)      33,920
DSConv4          (w/4, h/4, 256)      67,840
pool             (w/8, h/8, 256)      -
DSConv5          (w/8, h/8, 512)      133,376
DSConv6          (w/8, h/8, 512)      266,752
pool             (w/16, h/16, 512)    -
DSConv7          (w/16, h/16, 512)    266,752
DSConv8          (w/16, h/16, 512)    266,752
Total                                 1,005,589
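The per-layer counts in Table 1 are consistent with bias-free depthwise separable 3 × 3 convolutions (e.g., 3 × 9 + 3 × 64 = 219 for DSConv1 and 64 × 9 + 64 × 128 = 8768 for DSConv2). The PyTorch block below is a minimal sketch under that assumption; the actual encoder may differ in details such as activations or normalization.

import torch.nn as nn

class DSConv(nn.Module):
    # Depthwise separable 3x3 convolution (bias-free), matching the parameter
    # counts listed in Table 1 under the stated assumption.
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1,
                                   groups=in_ch, bias=False)      # in_ch * 9 weights
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)  # in_ch * out_ch weights
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.pointwise(self.depthwise(x)))

# Quick check: DSConv(3, 64) has 3*9 + 3*64 = 219 parameters, as in Table 1.
print(sum(p.numel() for p in DSConv(3, 64).parameters()))  # 219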
Table 2. Quantitative results on BSDS500 [7]. The best two results are highlighted in red and blue, respectively.
Methods              ODS    OIS    AP     Param    FPS
Sobel [4]            .539   .575   .498   18       278.8
Canny [6]            .611   .676   .520   9/25     27.7
MShift [14]          .598   .645   .497   -        -
EGB [15]             .614   .658   .564   -        -
HFS [16]             .650   .688   .201   -        -
Pb [7]               .672   .695   .652   -        -
SE [13]              .743   .764   .800   -        -
MCG [17]             .744   .777   .760   -        -
OEF [11]             .746   .770   .815   -        -
DeepEdge [20]        .753   .772   .807   -        -
DeepContour [19]     .757   .776   .790   -        -
HFL [21]             .767   .788   .795   -        -
DexiNed [25]         .729   .745   .583   35M      1.43
TIN [29]             .772   .795   -      244K     10.1
HED [22]             .788   .808   .840   14.7M    2.91
PiDiNet [31]         .787   .804   .817   710K     3.59
BDCN [24]            .806   .826   .847   16.3M    2.88
RCF [23]             .811   .830   .846   14.8M    2.35
EDTER [26]           .824   .841   .880   468.8M   0.67
UAED [27]            .828   .847   .892   72.5M    1.02
DiffusionEdge [28]   .834   .848   .815   225M     1.54
Ours                 .818   .829   .850   4.91M    3.05
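For reference, the ODS and OIS values reported in Tables 2–5 follow the standard BSDS-style evaluation protocol: ODS takes the F-measure at the single threshold that is best over the whole dataset, while OIS selects the best threshold per image. The sketch below assumes per-image precision/recall curves over a shared set of thresholds have already been computed (edge thinning and ground-truth matching, the expensive steps, are omitted) and aggregates dataset-level precision/recall by simple averaging, which is a simplification of the exact benchmark accumulation.

import numpy as np

def f_measure(p, r, eps=1e-12):
    return 2 * p * r / (p + r + eps)

def ods_ois(precisions, recalls):
    # precisions, recalls: arrays of shape (num_images, num_thresholds) holding
    # per-image precision/recall at a shared set of confidence thresholds.
    # ODS: one threshold for the whole dataset (best F on averaged P/R).
    ods = f_measure(precisions.mean(axis=0), recalls.mean(axis=0)).max()
    # OIS: the best threshold is chosen per image, then the best F is averaged.
    ois = f_measure(precisions, recalls).max(axis=1).mean()
    return ods, ois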
Table 3. Quantitative results when models are trained on BSDS500 [7] but evaluated with UESD [1]. The best two results are highlighted in red and blue, respectively.
Methods              ODS    OIS    AP     Param    FPS
Canny [6]            .444   .477   .341   9/25     58.8
MCG [17]             .489   .535   .355   -        -
OEF [11]             .493   .524   .389   -        -
DexiNed [25]         .611   .676   .564   35M      0.35
HED [22]             .635   .702   .583   14.7M    1.5
PiDiNet [31]         .566   .634   .452   710K     2.59
BDCN [24]            .557   .676   .440   16.3M    1.57
DiffusionEdge [28]   .594   .628   .492   225M     0.19
Ours                 .659   .715   .596   4.91M    2.15
Table 4. Ablation study of the effectiveness of the components of proposed method on UESD [1]. The best results are highlighted in red.
Method                       ODS    OIS    AP
Baseline-T                   .649   .683   .582
Baseline-S                   .598   .662   .551
Baseline-S + KD              .630   .686   .571
Baseline-S + SPGM            .628   .688   .575
Baseline-S + L_cur           .608   .674   .559
Baseline-S + KD + SPGM       .655   .709   .591
Baseline-S + KD + L_cur      .638   .694   .574
Baseline-S + SPGM + L_cur    .635   .695   .579
Ours                         .659   .715   .596
Table 5. Impact of γ and λ parameter combinations on model performance (UESD [1]). The best results are highlighted in red.
(λ, γ)       ODS    OIS    AP
(0.2, 0.7)   .648   .703   .582
(0.3, 0.5)   .635   .692   .559
(0.3, 0.6)   .647   .707   .589
(0.3, 0.7)   .659   .715   .596
(0.3, 0.8)   .641   .705   .579
(0.4, 0.7)   .652   .708   .590
(0.4, 0.8)   .638   .699   .573
(0.5, 0.7)   .647   .692   .582
