1. Introduction
With the rapid development of remote sensing technologies, satellites are increasingly able to provide rich ocean data and information. As frequent carriers at sea, ships play a vital role in transportation, trade, and defense; the broad surveillance capability of remote sensing systems therefore offers a convenient way to detect and classify ships in remote sensing imagery [1]. In recent decades, benefiting from the 24/7 all-weather monitoring capability of synthetic aperture radar (SAR) systems, ship detection and classification using satellite SAR images has become increasingly prevalent [2,3,4]. In particular, with the availability of more and more benchmark datasets such as OpenSARShip [2] and FUSAR-Ship [3], SAR ship classification has received increasing attention in the SAR remote sensing community.
OpenSARShip is a widely used SAR ship classification dataset established by the Shanghai Key Laboratory of Intelligent Sensing and Recognition, Shanghai Jiaotong University [2]. The SAR ship images in this dataset are characterized by medium to high resolution, large intra-class variation, and small inter-class separation [2,4]. At the same time, imaging interference in this dataset, including common speckle noise, sidelobes, and smearing effects, makes ship classification challenging [2]. Classifying ship targets using only single-polarized SAR images ignores the complementary features between dual-polarized images, i.e., SAR ship images in the vertical–vertical (VV) and vertical–horizontal (VH) or horizontal–horizontal (HH) and horizontal–vertical (HV) polarization combinations [2]. Making full use of this complementary information is key to suppressing noise interference and improving classification performance; therefore, in this paper we deeply explore dual-polarized SAR images as a way to achieve high-performance SAR ship classification. Experimental validations are performed on the OpenSARShip dataset [2], which provides more paired polarimetric data (e.g., VV-VH/HH-HV SAR ship images) than the FUSAR-Ship dataset [3].
Regarding the SAR ship classification task in the current literature, a number of existing methods mainly use handcrafted features for SAR ship representation, for instance geometric features (e.g., length, width, aspect ratio) [5,6,7,8], scattering features (e.g., radar cross section, RCS) [5,6,7], and other widely used traditional features [9,10]. In an important work, Huang et al. [2] first explored the effectiveness of different manual features for ship classification on the OpenSARShip dataset using the classic k-nearest neighbor (KNN) algorithm [11]. Salerno et al. [7] extensively validated the significant contribution of geometric and scattering features to achieving promising overall ship classification accuracy on low-resolution SAR images. To mitigate the deficiency of single classifiers for ship type prediction, Yan et al. [8] proposed a multiple classifiers ensemble learning (MCEL) method to improve the utility of geometric features for ship classification in SAR images with limited samples. Despite the impressive advances of handcrafted features for SAR ship target classification, a number of challenges remain. Most importantly, obtaining such manually designed features requires expert knowledge, which is time-consuming and labor-intensive. Moreover, performing the training and testing processes that incorporate feature extraction and classifier design is nontrivial and complex. Thus, more efficient feature extraction schemes and classification strategies for SAR ship targets need to be developed.
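To make the classic handcrafted-feature paradigm concrete, the following is a minimal sketch of KNN majority voting over hypothetical ship descriptors such as [length, width, aspect ratio]; the function name, feature values, and parameters are illustrative and not taken from [2] or [11]:

```python
import numpy as np

def knn_predict(train_feats, train_labels, query, k=3):
    """Classic KNN over handcrafted ship descriptors
    (e.g., [length_m, width_m, aspect_ratio])."""
    d = np.linalg.norm(train_feats - query, axis=1)  # Euclidean distances
    nearest = np.argsort(d)[:k]                      # indices of k closest ships
    votes = train_labels[nearest]
    labels, counts = np.unique(votes, return_counts=True)
    return labels[np.argmax(counts)]                 # majority vote
```

In this paradigm, the quality of the distance computation depends entirely on how discriminative the manually designed descriptors are, which is exactly the expert-knowledge bottleneck noted above.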
Recently, deep learning technology has developed rapidly and achieved state-of-the-art (SOTA) performance on many tasks. Motivated by achievements in the computer vision (CV) field, many high-performance convolutional neural networks (CNNs) [12] have been applied to remote sensing image interpretation tasks, including SAR automatic target recognition [13]. Compared with traditional manual features, CNNs can extract deeper and more semantically meaningful features, giving them great potential to improve SAR ship classification performance. On the one hand, to mitigate the unexplainability of deep features and the scarcity of samples in practice, researchers have made efforts to combine traditional manual features with CNNs for both SAR ship detection [14] and classification [15,16,17]. For example, Zhang et al. [15,16] proposed integrating manual features, such as the histogram of oriented gradients (HOG) feature [15,16], the naive geometric feature (NGF) [6], and the local radar cross section (LRCS) feature [15], with advanced CNNs, then fused the features to improve network performance. Similarly, Zheng et al. [17] proposed a multi-feature collaborative fusion network framework to explore the interaction between deep features and handcrafted features. On the other hand, specifically modifying advanced CNNs is another promising way to improve the accuracy of SAR ship classification. He et al. [4] proposed a densely connected triplet CNN model and introduced Fisher discriminant regularization metric learning to help the network extract more robust features. Dong et al. [18] achieved promising results with high-resolution SAR images by using a residual network [19]. Zhao and Lang [20] and Zhao et al. [21] used transfer learning and domain adaptation technologies, respectively, to cope with SAR ship classification on an unlabeled target dataset. As these methods are all based on single-polarization (single-pol) SAR images and ignore the complementary information between dual-polarization (dual-pol) images, their performance improvements are undoubtedly limited, especially when dealing with low-resolution SAR images.
Considering the abundant polarimetric information contained in dual-pol SAR imagery, several works have attempted to use this kind of data to obtain better ship classification performance. Xi et al. [22] proposed a novel feature loss double-fusion Siamese network (DFSN). Their approach first uses a detection network to extract the ship area, eliminating the impact of sea clutter and noise; it then uses a Siamese network to extract deep features of the cropped images (mainly composed of the ship targets), and finally uses multiple losses to jointly supervise the network learning process. However, their network is not end-to-end and needs to first extract the main ship area in the image, which is time-consuming and laborious compared with direct end-to-end classification schemes. Zeng et al. [23] proposed a novel CNN method equipped with a hybrid channel feature loss (HCFL) to sufficiently explore the information contained in dual-polarized SAR ship feature maps at the last layer of the network. Zhang et al. [24] proposed a squeeze-and-excitation Laplacian pyramid network with dual-pol feature fusion (SE-LPN-DPFF). In this method, both the dual-pol information and the fusion strategy are studied; specifically, the deep feature maps from the VV and VH polarized images and their coherence image are concatenated along the channel dimension and processed by attention and multiscale mechanisms to improve performance. He et al. [25] also validated the effectiveness of using polarization for SAR ship classification, and further proposed a group bilinear pooling CNN (GBCNN) [26] with an improved bilinear pooling operation to fuse the dual-pol information, thereby reducing computational complexity and improving classification performance. Xu et al. [27] used a contrastive learning framework to explore the rich dual-polarized information, regarding the VV and VH SAR images as positive sample pairs to strengthen the classifiers. Additionally, in accordance with the single-pol processing paradigm, Xie et al. [28] and Zhang et al. [29] attempted to fuse HOG features and comprehensive geometric features (CGFs), respectively, with the deep features of dual-pol SAR ship images for further performance improvement. Recently, He et al. [30] introduced a multiscale deep feature fusion framework applied to a GBCNN [26], based on which the present paper conducts further exploration to boost SAR ship classification performance.
In summary, the above-mentioned methods have two defects. First, existing dual-polarized SAR ship classification methods generally only use the last feature layer of CNNs for fusion processing, ignoring the abundant feature information of the earlier layers. Although the last-layer convolutional features are the most discriminative semantic features, they lack semantic completeness [31]. Second, simple feature fusion methods (e.g., concatenation, summation, or convolution) cannot take full advantage of the complementary information between dual-pol SAR images, while second-order bilinear pooling operations yield features containing redundant information, resulting in limited performance. To address these limitations, there is an urgent need to use multiscale features to obtain richer representations and to more deeply explore the complementary information of dual-pol SAR images, so as to reduce redundant information and suppress noise as much as possible.
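The redundancy of plain second-order bilinear pooling can be seen directly from its output size: pooling two C-channel feature maps produces a C×C matrix, i.e., C² values. A minimal NumPy sketch (function name and shapes are illustrative, not from any cited method):

```python
import numpy as np

def bilinear_pool(a, b):
    """Second-order pooling of two feature maps: outer product of channel
    descriptors averaged over spatial positions, flattened to a C*C vector."""
    # a, b: (C, N) feature maps with N spatial positions each
    g = (a @ b.T) / a.shape[1]  # (C, C) second-order statistics
    return g.ravel()            # C^2-dimensional vector, largely redundant

a = np.random.rand(64, 49)      # e.g., 64 channels over a 7x7 feature map
b = np.random.rand(64, 49)
print(bilinear_pool(a, b).shape)  # (4096,) -- quadratic in channel count
```

For C = 512, this already yields a 262,144-dimensional feature, which motivates factorized or grouped variants that compress the second-order statistics.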
Motivated by the above analysis, we propose a novel cross-polarimetric interaction network, dubbed CPINet, to deal with the dual-polarized SAR ship classification task. Specifically, we first adopt the SAR-DenseNet-v1 [26] backbone network, tailored for medium-resolution SAR images, and use it to extract multiscale SAR ship features. Second, in order to make full use of the complementary information between dual-pol SAR images, we improve the squeeze–excitation (SE) attention block [32] and propose a mixed-order SE (MO-SE) attention module applied to the last feature layer of the CNNs. Third, to restrain the effect of noise as much as possible, factorized bilinear coding (FBC) [33] is introduced to fuse the deep features of dual-pol SAR ship images. FBC is an improved bilinear pooling method [34] that reduces the number of parameters and the computational cost; it performs a low-value suppression operation after feature fusion, which helps to suppress the effect of noise in SAR images. Finally, in order to obtain more complete semantic information and deeply supervise the network training, the multiscale fused features of the internal and last convolutional layers are sent to the classifier, and the GradNorm algorithm [35] is introduced to learn the weight hyperparameter of each loss component, ensuring that the network is better trained.
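For reference, the standard SE block [32] on which MO-SE builds recalibrates channels via a squeeze step (global average pooling) and an excitation step (a small gating network). A minimal NumPy sketch with hypothetical weight matrices follows; this is the plain SE block, not the proposed MO-SE module:

```python
import numpy as np

def se_block(x, w1, w2):
    """Standard squeeze-and-excitation recalibration.
    x:  (C, H, W) feature maps
    w1: (C//r, C) reduction weights, w2: (C, C//r) expansion weights."""
    z = x.mean(axis=(1, 2))              # squeeze: global average pool -> (C,)
    s = np.maximum(w1 @ z, 0.0)          # excitation: reduce + ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))  # expand + sigmoid -> channel weights
    return x * s[:, None, None]          # rescale each channel's feature map
```

MO-SE, as described above, augments this mechanism so that the statistics of one polarization branch can guide the recalibration of the other.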
The main contributions of this paper are as follows:
- (1)
We propose CPINet, which obtains more complete semantic information by fusing feature representations at different scales and inherently suppresses noise interference while making full use of the complementary information between dual-pol SAR ship images.
- (2)
A novel mixed-order squeeze–excitation (MO-SE) attention augmentation module is proposed and applied to the last feature layer of the CNNs. In this way, the dual-polarized deep features can guide each other, allowing the complementary information between them to be fully mined.
- (3)
The GradNorm algorithm is adapted to dual-polarized SAR ship classification; to the best of our knowledge, this is the first time this method has been extended to adaptively balance multiple losses for a single classification task.
- (4)
Comprehensive experiments demonstrate that the proposed CPINet is superior to the compared methods and achieves SOTA performance on the challenging SAR ship classification task using the commonly used OpenSARShip dataset.
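To convey the flavor of contribution (3), the following is a simplified single-step sketch of a GradNorm-style loss-weight update [35] in NumPy. The exact GradNorm algorithm differentiates a gradient-norm discrepancy loss through autograd; the sign-based approximation, function name, and hyperparameter values here are illustrative assumptions only:

```python
import numpy as np

def gradnorm_update(w, grad_norms, losses, losses0, alpha=1.5, lr=0.025):
    """One simplified GradNorm-style update of per-loss weights w.
    grad_norms: per-loss gradient norms; losses/losses0: current and
    initial loss values; alpha: asymmetry hyperparameter."""
    w = np.asarray(w, dtype=float)
    G = np.asarray(grad_norms, dtype=float) * w          # weighted grad norms
    r = np.asarray(losses, dtype=float) / np.asarray(losses0, dtype=float)
    r = r / r.mean()                                     # relative inverse training rates
    target = G.mean() * r ** alpha                       # desired grad norms
    step = np.sign(G - target) * np.asarray(grad_norms)  # crude gradient of |G - target|
    w = np.clip(w - lr * step, 1e-3, None)               # move toward balance
    return w * len(w) / w.sum()                          # renormalize: sum(w) = #losses
```

The effect is that a loss whose weighted gradient norm exceeds its target sees its weight reduced, so no single loss component dominates training.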
The remainder of this paper is organized as follows: related work is reviewed in Section 2; Section 3 describes the proposed method in detail; Section 4 presents the experiments and analysis; finally, Section 5 concludes the paper.