1. Introduction
Airport runways are essential military and civilian facilities, and the detection of airport runway areas is therefore of significant importance for emergency rescue, public services, and automatic aircraft landing [1]. Polarimetric Synthetic Aperture Radar (PolSAR) offers all-day, all-weather imaging capability, and PolSAR images contain abundant polarization information that enables a more comprehensive characterization of ground objects. Consequently, the interpretation of PolSAR images holds high research value [2,3]. With the advancement of PolSAR technology, the acquisition of vast amounts of polarimetric SAR data has propelled the application of deep learning algorithms in PolSAR image interpretation research [4,5].
However, the scarcity of labeled images, together with an imaging mechanism that blurs the boundaries of runway areas, makes image annotation difficult. As a result, current research on airport runway area detection predominantly relies on optical images, which are easier to annotate and available in larger quantities [6]. Because of the limited availability of the annotated samples required for effective deep learning training, airport runway area detection in PolSAR images still mostly relies on traditional methods. Traditional PolSAR runway detection techniques extract hand-crafted features to identify regions of interest [7,8,9,10,11]; suspected targets are then classified using manually set thresholds and various classifiers, such as support vector machines [12,13,14,15], to filter out false alarms from other ground objects. Moreover, traditional methods require the integration of prior information about airport runway areas, such as straight-line features, geometric characteristics, and runway dimensions. Their running time is also long, so, compared with deep learning methods, the timeliness of detection is difficult to guarantee [16].
With the development of deep learning theory, neural networks such as convolutional neural networks (CNNs) [17,18,19,20] and quaternion neural networks (QNNs) [21,22] have been successfully applied to PolSAR image processing. QNNs are well suited to multidimensional, complex-valued data and can preserve the interdependence between channels. However, because of their large computational cost and the lack of regularization methods and activation functions designed for them, QNNs are generally kept shallow, which makes it difficult to extract deep features. PolSAR image processing is therefore still dominated by CNNs. With the emergence of accurate and generalizable semantic segmentation networks such as U-Net [17] and DeepLab V3 [20], CNNs have been adopted for airport runway area detection. For instance, in reference [23], a geospatial context attention mechanism integrated into the DeepLab V3 network enhanced the learning of geospatial features. Similarly, in reference [18], a novel atrous convolution module was designed to expand the model's receptive field, reducing the false alarm rate in airport runway area detection and yielding commendable results.
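The receptive-field benefit of atrous (dilated) convolution can be illustrated with a minimal NumPy sketch. This is not the module from reference [18]; the function name and toy inputs are ours. A k × k kernel with dilation d samples the input with stride d inside the window, covering a (d·(k−1)+1)² receptive field with the same number of weights:

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation=1):
    """Valid 2-D convolution with a dilated (atrous) kernel.
    A k x k kernel with dilation d covers a (d*(k-1)+1)^2 receptive
    field without adding any parameters."""
    kh, kw = kernel.shape
    eh = dilation * (kh - 1) + 1  # effective kernel height
    ew = dilation * (kw - 1) + 1  # effective kernel width
    H, W = x.shape
    out = np.zeros((H - eh + 1, W - ew + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # sample the window with step = dilation
            patch = x[i:i + eh:dilation, j:j + ew:dilation]
            out[i, j] = (patch * kernel).sum()
    return out

x = np.arange(36.0).reshape(6, 6)
k = np.ones((3, 3))
y1 = dilated_conv2d(x, k, dilation=1)  # standard 3x3 conv, 3x3 field
y2 = dilated_conv2d(x, k, dilation=2)  # same 9 weights, 5x5 field
print(y1.shape, y2.shape)  # (4, 4) (2, 2)
```

The same nine weights see a 5 × 5 neighborhood at dilation 2, which is why stacking such layers enlarges the receptive field cheaply.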
However, existing deep learning methods for airport runway area detection with PolSAR data still have shortcomings [17,18,19,20,24]. Firstly, owing to the imaging characteristics of PolSAR, ground targets such as rivers exhibit scattering properties similar to those of runways; their colors in the PauliRGB image appear quite alike, producing a significant number of false alarms. Figure 1c shows the detection results of the notable UNet++ [19] network, an improved version of U-Net that has demonstrated favorable outcomes on remote sensing imagery [25,26,27]. When applied to runway area detection, however, limitations arise from the scarcity of annotated training data. This data paucity impairs the network's ability to extract deep semantic information specific to runway areas, or causes critical semantic information to be lost as it propagates through the network, so the network struggles to discern runways from other ground objects. Another shortcoming lies in the segmentation results: detection of certain runways, or of taxiway areas between runways, is relatively poor. The runway areas in PolSAR images contain many narrow taxiways that are difficult for the network to detect accurately; adding edge information allows the network to exploit connected-region information during detection. For example, Figure 1f shows the detection results of the runway area segmentation algorithm D-Unet [18], whose missed detections of taxiway regions are highlighted by the red circle. Despite being an improved version of U-Net specifically tailored for runway area detection, D-Unet still misses taxiway regions. Indeed, many U-Net-based networks with various improvements have shown remarkable performance across a multitude of image segmentation tasks [19,28,29,30,31]. However, for runway area detection in PolSAR data, especially in large-scale scenes, certain limitations hinder their overall effectiveness. These limitations are often rooted in the networks' structural designs, which lose critical semantic information during propagation; for instance, the representational bottleneck structure discussed in Inception V3 [32] can cause information loss. These factors significantly impact the networks' ability to achieve optimal runway area detection with PolSAR data. Other PolSAR runway detection methods also have shortcomings to a greater or lesser extent. Common examples are: the detection scene is too small [33]; the detection time is too long; or only the PauliRGB image is used without other information from the matrix, which avoids the heavy computation caused by complex numbers but also discards much useful information [18].
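For context, the PauliRGB image mentioned above is formed from the standard Pauli decomposition of the polarimetric scattering matrix. Below is a minimal NumPy sketch; the function name, toy inputs, and display normalization are illustrative and not taken from the paper:

```python
import numpy as np

def pauli_rgb(hh, hv, vv):
    """Compute Pauli-basis intensity channels from complex scattering
    matrix components (hh, hv, vv are complex 2-D arrays)."""
    r = np.abs(hh - vv) / np.sqrt(2)   # double-bounce scattering
    g = np.sqrt(2) * np.abs(hv)        # volume scattering
    b = np.abs(hh + vv) / np.sqrt(2)   # surface (odd-bounce) scattering
    rgb = np.stack([r, g, b], axis=-1)
    # per-channel normalization to [0, 1] for display
    rgb /= rgb.max(axis=(0, 1), keepdims=True) + 1e-12
    return rgb

# toy example: a 2x2 scene with random complex scattering values
rng = np.random.default_rng(0)
hh = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
hv = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
vv = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
img = pauli_rgb(hh, hv, vv)
print(img.shape)  # (2, 2, 3)
```

Because smooth runway surfaces and calm water both produce weak backscatter, their Pauli colors are similar, which is the source of the river false alarms discussed above.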
In recent years, remarkable progress has been made in self-supervised learning algorithms. This approach has the distinct advantage of circumventing resource-intensive data collection and annotation by leveraging auxiliary tasks to train models on the data itself [5]. The introduction of MoCo [34] demonstrated the powerful capabilities of self-supervised learning: models trained in this way extract semantic information robustly and generalize excellently to downstream tasks, sometimes even outperforming supervised learning. Consequently, pre-trained networks obtained through self-supervised learning can effectively address the scarcity of annotated data in downstream tasks and enable accurate extraction of deep semantic information. Although some studies have applied self-supervised learning to PolSAR images with outstanding results, most have focused on PolSAR image classification tasks [5,35]. These methods often divide the images into homogeneous color blocks, overlooking crucial information such as the shape of ground objects, so they perform poorly when applied directly to runway area detection. The key challenge is to integrate self-supervised learning with PolSAR images effectively, harnessing its advantages to detect and segment airport runways precisely despite limited annotated samples. Addressing this issue is crucial and forms the central focus of ongoing research efforts.
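The contrastive objective underlying MoCo-style pre-training is the InfoNCE loss, which pulls a query embedding toward its positive key (an augmented view of the same image) and pushes it away from a bank of negative keys. Below is a minimal NumPy sketch, assuming L2-normalized embeddings; the temperature value, array shapes, and variable names are illustrative, not from the paper:

```python
import numpy as np

def info_nce_loss(query, pos_key, neg_keys, temperature=0.07):
    """InfoNCE loss for one query: one positive key and a bank of
    negative keys, all L2-normalized embedding vectors."""
    l_pos = query @ pos_key                       # similarity to positive
    l_neg = neg_keys @ query                      # similarities to negatives
    logits = np.concatenate([[l_pos], l_neg]) / temperature
    logits -= logits.max()                        # numerical stability
    # cross-entropy with the positive at index 0
    return -logits[0] + np.log(np.exp(logits).sum())

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

rng = np.random.default_rng(1)
q = normalize(rng.normal(size=128))
k_pos = normalize(q + 0.05 * rng.normal(size=128))  # augmented view, close to q
k_neg = normalize(rng.normal(size=(64, 128)))       # unrelated samples
loss_aligned = info_nce_loss(q, k_pos, k_neg)
loss_random = info_nce_loss(q, normalize(rng.normal(size=128)), k_neg)
print(loss_aligned < loss_random)  # a matching positive yields a lower loss
```

In MoCo the negative bank is a momentum-updated queue of encoded keys, which is what allows a large number of negatives without large batches.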
In response to the aforementioned challenges, this paper proposes a Self-Enhancing Learning Network (SEL-Net) for runway area detection, which incorporates self-supervised learning to enhance semantic and edge information. Firstly, building upon the self-supervised algorithm MoCo [34], SEL-Net exploits the multi-channel nature of PolSAR data and introduces feature images that emphasize runway characteristics. By augmenting the number of positive samples and introducing pseudo-labeled images, the self-supervised network becomes more attentive to runway areas during pre-training, so the transferred pre-trained model better extracts deep runway-related features in the detection network. Furthermore, the segmentation network in SEL-Net includes a Semantic Information Transmission Module (STM) that facilitates the propagation of semantic information within the network, addressing information loss during transmission; the up-sampling and down-sampling structures are also improved to further mitigate this loss. Additionally, to overcome insufficient edge information extraction, SEL-Net incorporates an Edge Information Extraction Module (EEM) and an Edge Information Fusion Module (EFM), which strengthen the network's ability to extract edge information.
The contributions of this paper can be summarized as follows:
- (1)
A self-supervised learning-based PolSAR image runway area detection network, SEL-Net, is designed. By introducing self-supervised learning and improving the detection network, the effectiveness of runway area detection in PolSAR images is significantly improved under conditions of insufficient annotated data, reducing both the false positive and false negative rates.
- (2)
By capitalizing on the distinctive traits of PolSAR data and employing the MoCo network, we obtain a pre-trained model that prioritizes the recognition of runway region features. Transferring this well-trained model to the downstream segmentation task effectively addresses the insufficient extraction of deep semantic features from the runway region, which was previously constrained by the scarcity of PolSAR data annotations.
- (3)
To enhance the U-Net network's ability to extract edge information, we introduce the EEM and EFM. Furthermore, we design an STM and improve the up- and down-sampling processes to minimize the loss of semantic information during network propagation.
5. Discussion
This paper studies how to use self-supervised networks to mitigate the impact of insufficient PolSAR data annotation on deep learning. In the self-supervised learning stage, feature-channel images are added as pseudo-labels and the loss function is improved. The t-SNE dimensionality reduction visualization in Figure 19 and the heat map visualization in Figure 20 show that, compared with previous self-supervised learning methods, the pre-trained model obtained in this paper pays more attention to extracting semantic information in the runway area. The ablation experiment in Section 4.4.3 also shows that introducing the pre-trained model obtained by self-supervised learning further improves the detection results. In addition, this paper also improves the detection network. The experimental results and channel visualizations in the ablation experiment of Section 4.4.3 show that the modified model enhances the extraction of edge information and reduces the loss of semantic information within the network. Table 5 shows that, in the comparative experiments, our method is optimal in all four evaluation indicators of detection accuracy. The experimental result figures likewise show that our method has the lowest false alarm rate, the best runway integrity, and smoother edges compared with the other methods. Together, these results indicate that our method addresses the high false alarm rate and poor runway integrity caused by insufficient extraction of edge information and deep semantic information in the runway area.
Because many improvements were made to the network to increase accuracy, the complexity of the model has also increased. In the future, model lightweighting could enhance detection efficiency while maintaining detection accuracy. Furthermore, since the number of images used for self-supervised learning is relatively limited, some feature images may have unclear runway characteristics owing to the imaging mechanism; the resulting missing imaging in certain areas leads to slow convergence and higher loss values during self-supervised learning clustering. To address this issue, future work can introduce more polarimetric decomposition methods to obtain feature images with clear runway characteristics. Finally, our network can be applied not only to airport runway area detection but also to the detection of similar objects, such as bridges. It can likewise be used to detect the linear debris-free glacier parts proposed in the literature [44] to warn of avalanches. Specifically, a large number of PolSAR images of glacier areas, together with feature maps of the glacier and wet-snow polarization characteristics proposed in the literature [21], could first be used to generate a pre-trained model through self-supervised learning; a small number of labeled images of debris-free glaciers could then be used to train the final model to detect their distribution.