1. Introduction
Marine pollution has long been a focal point of human concern [1], as it poses a serious threat to ecosystems [2]. With the increasing frequency of oil drilling and shipping activities, concerns about oil-related accidents have continued to grow [3]. Once the leaked oil reaches the coastline or nearshore areas, the situation becomes more complex and worsens [4].
To date, satellite remote-sensing technology has developed into a mainstream approach for marine oil spill monitoring [5]. Synthetic Aperture Radar (SAR), with its unique cloud-penetrating imaging capability and all-weather monitoring features [6], has demonstrated significant technical advantages in oil spill identification [7].
In SAR imaging, the radar actively emits an electromagnetic signal to the ground at a certain angle of incidence and receives and processes the backscattered signal from the ground to generate images. When an oil spill occurs, the oil film covering the sea surface changes its roughness, weakening the backscattered signal and producing distinct “dark spot” features in the radar image [8]. However, other phenomena such as ship wakes and low-wind-speed zones also exhibit low backscattering in the image. Therefore, effectively distinguishing oil spill areas from suspected oil spills is the key to oil spill detection [9].
To mitigate the shortcomings of traditional single-polarization SAR, such as limited information content and vulnerability to background noise interference, multi-polarization SAR has emerged. Zhang et al. proposed an adaptive mechanism based on the Otsu method, which combines region growing with edge detection and threshold segmentation to extract oil spills [10]. While threshold segmentation demonstrates notable advantages in marine oil spill detection, its practical application reveals substantial limitations: the technique is highly sensitive to the signal-to-noise ratio of remote-sensing imagery and is particularly susceptible to sea surface clutter interference.
With the development of machine learning and neural networks, decision trees [11], fuzzy logic [12], and multi-layer perceptron neural networks [13] have been applied to oil spill detection, improving detection accuracy to a certain extent. However, these methods are prone to fragmentation during detection, so that the detected oil spill areas appear scattered or broken, and oil spills are misclassified as seawater or other features. In contrast, deep learning exhibits a powerful capability for representation learning and can automatically extract information from large amounts of data [14]. Ma et al. proposed a deep convolutional neural network (DCNN) with up to 12 weighted layers [15]. The increase in weighted layers enables the network to better extract multi-level oil spill characteristics from SAR images. However, training and updating deep learning networks rely heavily on large numbers of training samples, whereas oil spill accidents are highly polluting, require rapid emergency response, and occur unpredictably, making it difficult to obtain large amounts of oil spill sample data in practical monitoring. In addition, the high cost of conducting oil spill experiments further limits the availability of oil spill samples. Fortunately, the emergence of few-shot learning methods provides new ideas to address this problem.
Methodologically speaking, there are currently three main few-shot learning approaches: model fine-tuning [16], data augmentation [17], and transfer learning [18]. Among them, the meta-learning method within transfer learning is widely used because it can quickly adapt to new classification tasks. Meta-learning has achieved excellent research results in many fields; its core idea is to let machines ‘learn to learn’, so that they can analyze and solve problems in a manner similar to the human brain. At present, the use of meta-learning for few-shot oil spill detection is rare; however, meta-learning methods are commonly used in SAR image classification. For instance, Yuan Tai et al. proposed a complex CNN-based source network using few-shot learning to extract rich sample features, which are then processed by a disconnected attention module to generate output features [19]. These features are selectively transferred to the target network for classifying target domain data. Experimental results on three real SAR datasets demonstrate that the proposed method exhibits significant superiority in ground object classification tasks. Haorun Li et al. proposed an innovative hybrid network for target classification in SAR imagery, designed to independently extract spatial and frequency-domain information [20]. Experimental results demonstrate that the network exhibits excellent classification performance across various few-shot scenarios.
However, convolutional neural networks are limited to capturing only local feature information from images and struggle to model long-range dependencies in oil spill imagery, thereby neglecting global contextual information. This limitation results in insufficient feature extraction for SAR target classification, consequently constraining further improvements in classification accuracy. In addition, in few-shot learning paradigms, upon completion of feature extraction, both support set features and query set samples are projected into a category metric space. Within this space, the classification of unknown samples is achieved by comparing the distances between the features of the samples to be classified in the query set and those of the known categories in the support set. Subsequently, this classification capability is transferred to the target domain through transfer learning, enabling the classification of target domain samples.
Based on this, a series of few-shot learning-based methods have been successively proposed. Dalal Alajaji et al. introduced a deep few-shot learning method for remote-sensing scene classification. This method is based on a prototypical deep neural network framework, combined with a pre-trained SqueezeNet convolutional neural network for image feature embedding, and achieves classification by computing the distance between the support set samples and unlabeled samples for each category in the embedding space [21]. Liu et al. proposed a metric-based deep few-shot learning (DFSL) method [22]. This approach leverages the source domain dataset to learn a category metric space, in which the classification of query samples is achieved by measuring the distance between the samples to be classified and the support set samples. Subsequently, the learned category metric space is generalized to the target domain dataset, thereby achieving effective classification performance on the target domain. However, existing few-shot learning networks primarily focus on the distance between query set samples and support set samples, while neglecting the inter-class distances among samples of different categories within the support set. When samples from different categories in the support set are closely distributed in the feature space, the network struggles to accurately determine the categorical affiliation of query samples relative to the support set samples during classification, thereby compromising classification performance. In the feature extraction of SAR oil spill images, due to the similarity in backscattering characteristics between suspected oil films and actual oil spills, when the features of the support set and query set are mapped into the metric space, the features of suspected oil films and actual oil spills in the support set often exhibit close proximity in distance metrics. This similarity can interfere with distance-based classification decisions, leading to misclassification of query samples with true seawater labels as oil spills. Therefore, there is an urgent need for an effective method to address the challenge of distinguishing between similar samples that are too close in the metric space.
In summary, existing SAR oil spill detection methods still face the problems of too few oil spill samples and insufficient training, and their feature extraction networks fail to effectively mine the global and local feature information of oil spill images simultaneously. Furthermore, the close proximity of oil films and suspected oil films in the few-shot metric space makes it difficult to distinguish between them effectively. To address these challenges, this paper proposes a hybrid attention feature extraction block that integrates both global and local feature mining capabilities. A category-perception loss function is designed to enhance the distinguishability of feature distances, and a high-precision SAR oil spill detection framework is developed under few-shot conditions based on meta-learning strategies. The main contributions of this article are as follows:
(1) This study innovatively applies meta-learning techniques to the field of oil spill detection, aiming to address the critical issue of sample scarcity. Through systematic training of oil spill data from the source domain, the model acquires in-depth knowledge of characteristic features across various categories in oil spill imagery. This knowledge is then effectively transferred to the target domain, enabling accurate classification with limited samples, thereby significantly enhancing the accuracy of oil spill detection in the target domain.
(2) A hybrid attention feature extraction block is constructed to comprehensively explore both global and local features in oil spill imagery. The block consists of three key components. First, the coordinate attention module enables the network to capture long-distance relationships in one spatial direction while retaining precise location information in the other, thereby enhancing the network’s ability to exploit spatial location and channel information. Second, the global self-attention transformer module effectively captures global dependencies by modeling self-correlation among pixels. Finally, the multi-scale self-attention module employs diverse window partitioning strategies, allowing the network to precisely focus on local oil spill characteristics, thus substantially improving the accuracy of oil spill detection.
(3) A novel loss function, termed the category-perception distance loss, is proposed, introducing a new approach to feature space optimization. By minimizing the objective function value, specifically the discrepancy in category-perception distances between the output and input feature vectors, the model achieves dual optimization in the feature space: it significantly reduces intra-class distances (e.g., oil spill–oil spill, seawater–seawater) while effectively expanding inter-class distances (e.g., oil spill–non-oil spill, seawater–ship). This dual optimization mechanism ensures tight clustering of similar samples and effective separation of dissimilar samples in the feature space, thereby substantially enhancing the accuracy of oil spill detection.
The rest of this article is organized as follows. Section 2 introduces the data used and the study area. Section 3 presents the framework of the proposed model. Section 4 describes the experimental environment and data pre-processing. Section 5 presents the results and discussion. Section 6 concludes the paper.
3. Proposed Method
In the few-shot scenario, distinguishing between oil films and suspected oil films proves difficult, primarily due to two factors: their close proximity in the metric space and the insufficient extraction of discriminative features from oil spill images. To address this, this paper proposes a cross-domain few-shot SAR oil spill detection network that incorporates hybrid attention and category-perception mechanisms. In the realm of feature extraction, a hybrid attention block is constructed to thoroughly explore the deep-level characteristics of oil spill imagery. The proposed method first augments the network’s capacity to capture spatial location and channel information through the integration of the coordinate attention module. It then leverages the global self-attention module alongside the multi-scale self-attention module to thoroughly extract both global contextual and local details from the imagery, thereby bolstering the efficacy of oil spill detection. Furthermore, a category-perception loss function is proposed to enhance the network’s capability to distinguish between oil spill and non-oil spill regions.
Based on the aforementioned improvements in both feature extraction and classification decision-making, the proposed detection network in this study significantly enhances the accuracy of oil spill detection across various scenarios.
3.1. Framework
As shown in Figure 2, the specific process of HA-CP-Net mainly includes two stages: (a) the alternate training stage of the source domain data and the training data in the target domain, and (b) the classification stage of the target domain test data. In the first stage, the network initially employs the mapping layer to standardize the channel count of both source and target domain data. Subsequently, it utilizes the hybrid attention feature extraction block to derive global and local features from the two domains. These features are then projected into a metric space defined by category-perception distance. Within this space, samples belonging to the same category are drawn closer together, while those from different categories are pushed further apart, thereby augmenting the model’s classification capacity. During the training process of labeled samples in the target domain, Gaussian noise is strategically incorporated for data augmentation to meet the model’s training requirements. In the second stage, the pre-trained mapping layer and hybrid attention feature extraction block are employed to extract distinctive features from both labeled and unlabeled datasets. These extracted features are then fed into the NN classifier for sample classification, ultimately yielding a comprehensive classification result map that facilitates a thorough evaluation of the model’s performance.
In this study, to facilitate the effective transfer of knowledge acquired from the source domain to the target domain, a strategy of alternately training on source domain and target domain data is implemented. Specifically, $C$ categories are randomly selected from the source domain data, and $K$ and $Q$ samples are then drawn successively from each selected category to form a support set $S_s = \{(x_i^s, y_i^s)\}_{i=1}^{C \times K}$ and a query set $Q_s = \{(x_j^q, y_j^q)\}_{j=1}^{C \times Q}$ of the source domain, where $(x_i^s, y_i^s)$ and $(x_j^q, y_j^q)$ denote the $i$-th sample in the support set with its corresponding label and the $j$-th sample in the query set with its corresponding label, respectively. The selected support set and query set are then reorganized into the training tasks of the source domain. For the target domain data, the target domain dataset is first divided into a training set $D_{train}$ and a test set $D_{test}$ [25].
To meet the needs of few-shot learning tasks, the training set $D_{train}$ is augmented by adding Gaussian noise. Similarly, in the target domain training, samples of the target domain dataset are randomly sampled to form a support set $S_t$ and a query set $Q_t$, which are then reorganized; the learning process is the same as for the source domain data. Taking a sample $x_j^q$ in the query set $Q_t$ as an example, its probability distribution in the feature space is:

$$p\left(y = c \mid x_j^q\right) = \frac{\exp\left(-d\left(f_\varphi(x_j^q),\, f_c\right)\right)}{\sum_{c'=1}^{C} \exp\left(-d\left(f_\varphi(x_j^q),\, f_{c'}\right)\right)}$$

where $d(\cdot,\cdot)$ is the Euclidean distance function, $y$ is the true label of the query sample $x_j^q$, $f_\varphi(\cdot)$ denotes the features extracted with the optimized parameters $\varphi$, $f_c$ is the feature of the $c$-th class in the support set, and $C$ is the number of categories in each episodic training task. To achieve channel-dimension consistency between the source and target domain data when feeding into the feature extraction network, a 2D convolutional layer is first applied to the input data, unifying the channel dimensions of both domains to a standardized size of 18. The output of the mapping layer can be mathematically represented as:

$$X' = W X$$

where $X$ is the input data, $X'$ is the converted data, and $W$ is the conversion matrix.
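The distance-based classification and the channel-unifying mapping layer can be illustrated with the following sketch. Assumptions on our part: PyTorch is used, the input channel count of 30 is only an example, and the class feature $f_c$ is taken as the mean (prototype) of the support samples of that class, which the text does not state explicitly.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical mapping layer: a 1x1 2D convolution that unifies the channel
# count of source/target patches to 18 (input channel count is illustrative).
mapping = nn.Conv2d(in_channels=30, out_channels=18, kernel_size=1)

def classify_queries(query_feat, support_feat, support_labels, n_way):
    """Softmax over negative Euclidean distances to per-class support features."""
    # class features as mean prototypes of the support samples (our assumption)
    prototypes = torch.stack([support_feat[support_labels == c].mean(dim=0)
                              for c in range(n_way)])   # (C, D)
    dists = torch.cdist(query_feat, prototypes)          # (Q, C) Euclidean distances
    return F.softmax(-dists, dim=1)                      # class probability per query
```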
3.2. Hybrid Attention Feature Extraction Block
To better extract both global and localized fine-grained features from oil spill imagery, this paper builds a hybrid attention feature extraction block, which includes three parts: a coordinate attention module, a global self-attention transformer module, and a multi-scale self-attention module. The detailed explanations follow.
3.2.1. Coordinate Attention Module [26]
This paper incorporates coordinate attention to perform global positional encoding on SAR imagery, thereby capturing both spatial and channel information. Notably, coordinate attention captures long-range dependencies along one spatial dimension while retaining precise positional information along the other, helping the network localize the target of interest more accurately. The captured positional information is then integrated through channel-wise weighting, which adaptively modulates the network’s attention to each pixel in SAR oil spill imagery and enhances the network’s feature representation capability.
The structure of coordinate attention is shown in Figure 3. Specifically, given the input feature map $X$, global average pooling is applied over the two spatial extents $(H, 1)$ and $(1, W)$, encoding each channel along the horizontal and vertical directions, respectively, to obtain features in the two directions. These two features can be expressed as

$$z_c^h(h) = \frac{1}{W} \sum_{0 \le i < W} x_c(h, i), \qquad z_c^w(w) = \frac{1}{H} \sum_{0 \le j < H} x_c(j, w)$$

where $z_c^h(h)$ is the output feature at height $h$ of channel $c$, $i$ is the width index, and $W$ is the total width; $z_c^w(w)$ is the output feature at width $w$ of channel $c$, $j$ is the height index, and $H$ is the total height.
Then the two transformed feature maps obtained above are concatenated and fed into a shared 1 × 1 convolution $F_1$ to generate the new feature $f$:

$$f = \delta\left(F_1\left(\left[z^h, z^w\right]\right)\right)$$

where $[\cdot,\cdot]$ is the concatenation operation along the spatial dimension and $\delta$ is a nonlinear activation function.
Next, $f$ is split into two separate tensors $f^h$ and $f^w$ along the spatial dimension. Two additional 1 × 1 convolutional transformations $F_h$ and $F_w$ convert $f^h$ and $f^w$ into features with the same number of channels as the input $X$, yielding $g^h$ and $g^w$, which can be expressed as

$$g^h = \sigma\left(F_h\left(f^h\right)\right), \qquad g^w = \sigma\left(F_w\left(f^w\right)\right)$$

where $\sigma$ denotes the sigmoid activation. The outputs $g^h$ and $g^w$ are then used as attention weights. The final output of the coordinate attention-processed feature map $y$ can be represented as

$$y_c(i, j) = x_c(i, j) \times g_c^h(i) \times g_c^w(j)$$

where $x_c(i, j)$ is the input feature map, $g_c^h(i)$ is the attention weight along the height of channel $c$, and $g_c^w(j)$ is the attention weight along the width of channel $c$.
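A simplified PyTorch sketch of the coordinate attention computation described above is given below; the reduction ratio, choice of activations, and the omission of batch normalization are our simplifications of the module in [26].

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Simplified coordinate attention (cf. [26]); layer settings are assumptions."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)   # shared 1x1 conv F_1
        self.act = nn.ReLU(inplace=True)                       # nonlinear activation delta
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)  # F_h
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)  # F_w

    def forward(self, x):
        b, c, h, w = x.shape
        # direction-wise global average pooling: z^h over width, z^w over height
        z_h = x.mean(dim=3, keepdim=True)                       # (B, C, H, 1)
        z_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)   # (B, C, W, 1)
        f = self.act(self.conv1(torch.cat([z_h, z_w], dim=2)))  # shared transform on [z^h, z^w]
        f_h, f_w = torch.split(f, [h, w], dim=2)
        g_h = torch.sigmoid(self.conv_h(f_h))                   # (B, C, H, 1) height weights
        g_w = torch.sigmoid(self.conv_w(f_w.permute(0, 1, 3, 2)))  # (B, C, 1, W) width weights
        return x * g_h * g_w                                     # y_c(i,j) = x_c(i,j) * g^h * g^w
```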
3.2.2. Global Self-Attention Transformer Module
To extract the global feature representation of oil spill imagery, the inter-pixel self-correlations are modeled using the global self-attention module. This module is composed of patch embedding, multi-head self-attention, a normalization layer, dropout, and an MLP, as illustrated in Figure 4.
Initially, the feature map derived from coordinated attention is transformed into vectors via patch embedding. Subsequently, to acquire the similarity among the input vectors, thereby establishing global dependencies and bolstering the network’s capacity to capture contextual information within the image, the multi-head self-attention mechanism is employed to extract global features. Following this, the features undergo normalization, dropout, and processing through MLP to expedite the model’s convergence, augment its nonlinear expressive power, and further refine its generalization capability.
In particular, the “multi-head” component of the multi-head attention architecture implements parallel self-attention mechanisms that operate independently. This design enables concurrent extraction of heterogeneous feature representations from multiple representation subspaces. The mathematical formulation of the multi-head self-attention mechanism is presented below [27]:

$$\mathrm{MultiHead}(X) = \mathrm{Concat}\left(\mathrm{head}_1, \ldots, \mathrm{head}_h\right) W^O, \qquad \mathrm{head}_i = \mathrm{Attention}\left(X W_i^Q,\, X W_i^K,\, X W_i^V\right)$$

$$\mathrm{Attention}(Q, K, V) = \mathrm{SoftMax}\left(\frac{Q K^T}{\sqrt{d}} + B\right) V$$

where $h$ is the number of attention heads, $X$ is the input vector, $Q$ is the query vector, $K$ is the key vector, and $V$ is the value vector; $\sqrt{d}$ is the scaling factor and $B$ is the relative position encoding.
The term $Q K^T$ computes the query-key similarity matrix, which is scaled by $\sqrt{d}$ to prevent gradient anomalies. The SoftMax normalization then generates attention weights, which are finally weighted and summed with the value matrix $V$. The positional bias term $B$ explicitly introduces spatial constraints. This self-attention design breaks through the limited receptive field of traditional convolutional operations, enabling adaptive focus on salient oil spill features while suppressing background interference during oil spill detection.
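The following sketch illustrates the attention computation above: scaled query-key similarity plus a relative position bias, followed by SoftMax weighting of the values. Tensor shapes and names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def attention_with_bias(q, k, v, rel_pos_bias):
    """Scaled dot-product attention with a relative position bias term.

    q, k, v: (num_heads, N, d) tensors; rel_pos_bias: (num_heads, N, N).
    """
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d ** 0.5   # query-key similarity, scaled by sqrt(d)
    scores = scores + rel_pos_bias                # add positional bias B
    weights = F.softmax(scores, dim=-1)           # attention weights
    return weights @ v                            # weighted sum of the value vectors
```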
3.2.3. Multi-Scale Self-Attention Module
For enhanced extraction of localized features in SAR oil spill images, this module utilizes windows with varying segmentation strategies, which facilitate focused attention on window-specific information. This design significantly improves the network’s capability to capture subtle oil spill patterns, consequently boosting detection accuracy.
To this end, a multi-scale self-attention module (as shown in Figure 5) is designed, which mainly includes the W-MSA (Window Multi-Head Self-Attention) mechanism and the SW-MSA (Shifted Window Multi-Head Self-Attention) mechanism [28]. The specific details follow. W-MSA mechanism: To facilitate image partitioning, the Seq2Img operation is utilized to reshape the input feature vectors into feature maps. Subsequently, through the partitioning operation depicted in Figure 6, the large feature map is divided into multiple non-overlapping small feature maps. Self-attention is then computed within each small feature map to acquire the similarity of pixels within the local window, thereby yielding the weighted feature map.
SW-MSA mechanism: In the W-MSA mechanism, direct communication between pixels across different windows is impeded, which imposes certain limitations on the extraction of neighborhood feature information. To address this, the SW-MSA mechanism re-partitions the image windows as illustrated in Figure 7b. The new windows retain some pixels from the old windows, and self-attention is then computed within these new windows to facilitate information exchange, thereby enabling a more comprehensive extraction of feature information. Notably, to alleviate computational complexity, a shift-splicing operation is applied to the partitioned windows, as illustrated in Figure 7e. Given the discontinuity of pixels within a spliced window, an appropriate mask must be designed during the self-attention computation to exclude extraneous values and ensure that similarity is not calculated between discontinuous pixels, thereby achieving an accurate feature representation. The detailed computational process of the spliced window is depicted in Figure 8. Following the self-attention computation, the shifted windows are restored to their original positions, and the network’s final output features are generated through a global average pooling operation.
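A minimal sketch of the window partitioning used by W-MSA, and of the cyclic shift underlying SW-MSA, is shown below. It assumes the feature map height and width are divisible by the window size, and the attention mask for discontinuous pixels is omitted.

```python
import torch

def window_partition(x, win):
    """Split a (B, H, W, C) feature map into non-overlapping win x win windows."""
    B, H, W, C = x.shape
    x = x.reshape(B, H // win, win, W // win, win, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, win * win, C)   # (num_windows, win*win, C)

def window_reverse(windows, win, B, H, W):
    """Inverse of window_partition: reassemble windows into a (B, H, W, C) map."""
    x = windows.reshape(B, H // win, W // win, win, win, -1)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, -1)

# For SW-MSA, the feature map is cyclically shifted before partitioning so that the
# new windows mix pixels from formerly separate windows (the shift-splicing step), e.g.:
# shifted = torch.roll(x, shifts=(-win // 2, -win // 2), dims=(1, 2))
# ...compute masked window attention on `shifted`, then roll back to the original positions.
```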
3.3. Implementation Procedures
The features of the support set samples and the query set samples are extracted by the hybrid attention feature extraction block and projected into an embedding space based on a category-perception distance metric. From the sample features within the support set, the cumulative distance among intra-class samples and the cumulative distance among inter-class samples are computed, and the category-perception distance is defined as the difference between these two summed distances. The difference between the category-perception distance of the output feature vectors and that of the input feature vectors is then calculated. Next, according to the Euclidean distance between a query set sample and the support set feature vectors, the query feature is assigned to the nearest support set category, and the distance is converted into a category probability. The classification of query set samples is assessed against their true labels. Through continuous updating and refinement of the objective function, the feature distance among samples of the same class is progressively minimized, while the distance between samples of different classes is maximized, thereby enhancing the model’s classification capacity.
(1) Category-Perception Distance Loss Function
To address the challenge of classifying oil spill regions within oil spill imagery, a category-perception loss function is introduced. This function aims to enhance the separability of categories in the few-shot learning metric space of the model. Specifically, this method first calculates the sum of feature distances between query samples and same-class support samples (e.g., oil spill to oil spill), as well as the sum of feature distances between query samples and different-class support samples (e.g., oil spill to seawater). The difference between these two values is then used as the core optimization objective. During model training, network parameters are continuously adjusted through gradient backpropagation, causing the feature distance between same-class samples to progressively decrease while the distance between different-class samples consistently increases.
The category-perception loss function can be defined as:

$$D = \sum_{i=1}^{N_{same}} d\left(f_q,\, f_{s,i}^{same}\right) - \sum_{j=1}^{N_{diff}} d\left(f_q,\, f_{s,j}^{diff}\right), \qquad L_{CP} = D_{out} - D_{in}$$

where $D_{in}$ is the category-perception distance of the input feature vectors and $D_{out}$ is the category-perception distance of the output feature vectors; $d(\cdot,\cdot)$ is the Euclidean distance function, $f_q$ is a query set sample feature, $f_{s,i}^{same}$ is a same-class support set sample feature, $f_{s,j}^{diff}$ is a different-class support set sample feature, and $N_{same}$ and $N_{diff}$ are the total numbers of same-class and different-class samples, respectively.
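The sketch below shows one way the category-perception distance term could be computed from the definition above. It reflects our reading of the formula (sum of same-class distances minus sum of different-class distances, compared between input and output features); the exact sign convention and normalization used in the paper may differ.

```python
import torch

def category_perception_distance(query_feat, same_feat, diff_feat):
    """Summed same-class distances minus summed different-class distances."""
    d_same = torch.cdist(query_feat, same_feat).sum()   # query-to-same-class distances
    d_diff = torch.cdist(query_feat, diff_feat).sum()   # query-to-different-class distances
    return d_same - d_diff

def category_perception_loss(in_feats, out_feats):
    """L_CP: discrepancy between the category-perception distances computed on the
    block's input features and on its output features. Each argument is a
    (query, same-class support, different-class support) tuple of feature tensors."""
    d_in = category_perception_distance(*in_feats)
    d_out = category_perception_distance(*out_feats)
    return d_out - d_in
```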
(2) Cross-Entropy Loss Function [29]
The quantitative relationship between the feature distances and the actual labels is updated through the cross-entropy loss $L_{CE}$, which is defined as:

$$L_{CE} = -\mathbb{E}\left[\log p\left(y = y_j \mid x_j^q\right)\right]$$

where $\mathbb{E}$ is the mathematical expectation, $p(y \mid x_j^q)$ is the conditional probability function, $y_j$ is the true label, and $-\log p(y = y_j \mid x_j^q)$ is the negative log-probability of the query sample.
Table 1 provides example-based explanations of the cross-entropy loss function.
The category-perception distance loss function $L_{CP}$ and the cross-entropy loss function $L_{CE}$ together constitute the loss function of this paper, and the total loss can be expressed as:

$$L_{total} = L_{CE} + L_{CP}$$
4. Experimental Environment and Pre-Processing
In the data preprocessing stage, two scenes of Radarsat-2 quad-polarization SAR oil spill data and one scene of Sentinel-1 dual-polarization SAR oil spill data were selected for the experiments. Among them, the one-scene Sentinel-1 image, which contains five categories, is used as the source domain dataset, and the remaining two scenes are used as the target domain datasets. The SAR images are cropped using SNAP 7.0 software, and the modified Lee filtering method is used to denoise the images. The next step involves extracting the following 30 polarimetric features from the fully polarimetric SAR images: Span, Geometric intensity, VV intensity, Co-polarization phase difference, Co-polarization power ratio, Co-polarization correlation coefficient, Real part of the co-polarization cross product, Muller polarization feature M_33, Consistency coefficient, Polarimetric entropy, Polarimetric scattering anisotropy, Mean scattering angle, Anisotropy, Maximum eigenvalue, Pedestal height, Average intensity, Single bounce eigenvalue relative difference, Polarimetric feature P, Bragg scattering ratio, Self-similarity parameter, Scattering diversity, Surface scattering fraction, Combined features, H_A_12 Combined parameters, H_A Combined, Oil spill detection indicator CT, Correlation coefficient, Cross-polarization ratio, Degree of polarization, and Gini. Additionally, the following 10 textural features are extracted: Angular second moment, Contrast, Dissimilarity, Energy, Entropy, Correlation, Mean, Variance, Homogeneity, and Maximum probability. Using different extraction window sizes (5 × 5, 7 × 7, 9 × 9) yields a total of 30 textural features. Due to polarization mode limitations, only the 30 textural features are extracted for the dual-polarization imagery.
Among them, the polarization features are extracted with MATLAB R2021a software and the texture features with SNAP 7.0 software. Subsequently, feature selection is performed on the extracted features. To mitigate the subjectivity inherent in feature selection and to automate the identification of the optimal feature subset, a recursive feature elimination approach integrated with cross-validation within random forests is employed for feature optimization [30]. The methodology entails partitioning the dataset into K distinct subsets, ensuring that each subset encompasses samples representative of all features. K-1 subsets are randomly selected to train the random forest model, with validation conducted on the remaining subset. Features that contribute minimally to the model are subsequently eliminated. This iterative procedure is repeated to identify the optimal feature subset. As shown in Table 2, Table 3 and Table 4, Dataset 1, Dataset 2, and Dataset 3 retain 16, 18, and 16 features, respectively, and the proportion of each feature is also marked.
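A sketch of this feature-optimization step using scikit-learn's recursive feature elimination with cross-validation wrapped around a random forest is shown below; the fold count, estimator settings, and the X/y placeholders are assumptions rather than the paper's exact configuration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV

# Recursive feature elimination with K-fold cross-validation around a random forest.
# `X` is the (samples x features) matrix of polarimetric/textural features and
# `y` the pixel labels; both names are placeholders.
selector = RFECV(
    estimator=RandomForestClassifier(n_estimators=100, random_state=0),
    step=1,              # drop the least important feature at each iteration
    cv=5,                # K-fold cross-validation (K is not specified in the paper)
    scoring="accuracy",
)
# selector.fit(X, y)
# optimal_subset = selector.get_support(indices=True)   # indices of the retained features
```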
In addition, to meet the model input requirements, the source domain and target domain data are sliced into small patches that are fed into the feature extraction block. Each patch is a 3D cube of size 9 × 9 × chs, obtained by taking a single pixel as the center and extending 4 pixels upward, downward, leftward, and rightward; the category of the central pixel serves as the corresponding label. To accommodate the training requirements, the query set and the support set are delineated from the source domain and target domain datasets, respectively. The query set consists of pixels whose category is treated as unknown during training, while the support set consists of pixels with known category attributes. Regarding sample selection, 200 samples per category are randomly selected from the source domain dataset for training, whereas the target domain is configured with only 5 labeled samples per category for training. All remaining samples are allocated to the test set to validate the accuracy of the network model. In the target domain training, Gaussian noise is employed for data augmentation to fulfill the training requirements, resulting in 200 augmented samples per category. During training, the hyperparameters of the network are set as follows: the learning rate is 0.0001, the batch size is 100, the number of iterations is 20,000, and the Adam optimizer is used to optimize the network. The framework version used in this study and the hardware and software configurations, including the computing platform, are shown in Table 5.
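For illustration, the patch slicing described above (9 × 9 × chs cubes centered on single pixels) might be implemented as follows; the border padding mode is our assumption.

```python
import numpy as np

def extract_patches(image, pixel_coords, half_width=4):
    """Cut (2*half_width+1)-sized patches (9 x 9 x chs by default) centered on given pixels."""
    padded = np.pad(image, ((half_width, half_width),
                            (half_width, half_width), (0, 0)), mode="reflect")
    patches = []
    for r, c in pixel_coords:
        r, c = r + half_width, c + half_width   # offset coordinates into the padded image
        patches.append(padded[r - half_width:r + half_width + 1,
                              c - half_width:c + half_width + 1, :])
    return np.stack(patches)                     # (N, 9, 9, chs)
```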
5. Results and Discussions
To demonstrate the superiority of the proposed network methodology, comparative experiments are conducted on two distinct datasets, evaluating the proposed method against other algorithms. Meanwhile, to validate the efficacy of each enhancement approach, corresponding ablation experiments are conducted. Furthermore, experiments involving feature subset optimization are conducted to investigate the impact of feature redundancy on model performance. Specifically, the five evaluation indicators of overall accuracy (OA), average accuracy (AA), Kappa coefficient, Precision, and F1-score are used to evaluate the classification effect of the model and comprehensively evaluate the effect of oil spill detection.
5.1. Comparative Experiments
To assess the efficacy of the proposed model, four methodologies are chosen for comparative experiments against our HA-CP-Net. These include the machine learning algorithm SVM, the deep learning algorithm DeepLabV3 [31], and the few-shot learning algorithms Gia-CFSL and DCFSL [32]. To ensure the integrity of the experiment, five labeled samples from each category were selected for training, with the learning rate uniformly established at 0.0001 and the number of episodes set at 20,000.
(1) Analysis of results for Dataset 2
Figure 9 presents the visual outcomes of oil spill detection on the oil spill data, utilizing both the proposed algorithm and the comparative algorithms, alongside the corresponding ground truth label maps. In Figure 9a, dark green areas represent oil spills or suspected oil spills, while dark purple regions indicate seawater.
As illustrated in Figure 9, the proposed method demonstrates superior visual performance with optimal classification results. In contrast, the SVM-based machine learning approach exhibits significant misclassification artifacts and substantial noise in its output map.
This phenomenon primarily stems from model underfitting due to insufficient training samples, which restricts the model’s capacity to fully extract discriminative features. When generalizing to new datasets, SVM’s performance is particularly susceptible to suboptimal kernel function selection and inappropriate regularization parameters, potentially leading to significant misclassification errors. The DeepLabV3 algorithm demonstrates reduced noise in its classification results; however, it exhibits significant fragmentation in oil spill regions. This limitation stems from its reliance on atrous convolutions and atrous spatial pyramid pooling, which restrict the model’s ability to capture global contextual information. The inherent “gridding” effect in the receptive field during feature extraction leads to discontinuous sampling and incomplete detail capture, ultimately resulting in insufficient feature learning. The classification efficacy of the few-shot learning algorithm DCFSL has shown some improvement, yet it still presents a significant number of artifacts. This is attributed to the DCFSL algorithm’s reliance solely on distance metrics within the feature space to differentiate categories. Similar samples may be distributed in close proximity within the feature space, and distance information alone may not suffice for reliable category decision-making, thereby resulting in misclassification. The classification performance of Gia-CFSL is relatively commendable, yet it still exhibits some artifacts. This is due to the Gia-CFSL algorithm’s reliance on graph structures, which inadequately separates the feature distances between classes, leading to suboptimal category decision-making by the model. In comparison with the previously mentioned methods, the network proposed in this study exhibits optimal performance in oil spill detection. The sea surface is nearly free of speckle noise, and the detected oil spill areas show remarkable integrity. These results demonstrate that the network successfully captures comprehensive global and local image features, while demonstrating robust classification accuracy in distinguishing between oil spill and non-oil spill regions.
Table 6 delineates the oil spill detection accuracy of the proposed algorithm alongside the comparative methods. The proposed method achieves a significantly higher OA of 97.11%, exceeding the second-best Gia-CFSL by 0.9% and the traditional SVM by 6.51%. This demonstrates that the proposed method exhibits superior reliability in global classification tasks. The leading AA further indicates its more balanced classification capability across different categories. The Kappa coefficient is substantially higher than those of the other methods, confirming that its classification results align closely with the ground truth labels and are minimally affected by random factors. With a precision of 97.21%, 2.65% higher than Gia-CFSL, and an F1-score of 97.33%, the proposed method exhibits low false detection rates, highlighting its robustness in classification performance.
(2) Analysis of results for Dataset 3
Figure 10 presents the visualization results of oil spill detection on the oil spill dataset using both the proposed algorithm and the comparative algorithms, alongside the corresponding ground truth label maps. In Figure 10a, dark green areas represent oil spills or suspected oil spills, while dark purple regions indicate seawater.
As illustrated in Figure 10, a significant number of false detections are observed with the machine learning algorithm SVM and the deep learning algorithm DeepLabV3. This is attributed to the inability of the SVM and DeepLabV3 algorithms to acquire sufficient features to underpin the network’s classification decisions. Consequently, the network struggles to make accurate judgments during decision-making, leading to the erroneous classification of some non-oil spill areas as oil spills and thereby resulting in a high false alarm rate. The classification efficacy of the few-shot learning algorithms DCFSL and Gia-CFSL has shown some advancement, yet instances of misclassification persist. In contrast, the oil spill detection performance of the proposed network demonstrates a more complete delineation of oil spill regions overall, with classification results that most closely approximate the ground truth. The quantitative comparison of oil spill detection accuracy between the proposed algorithm and state-of-the-art methods is summarized in Table 7. The proposed method achieves the best OA, reaching 98.49%, significantly outperforming the other methods. Similarly, it leads in AA with 94.54%, slightly higher than Gia-CFSL’s 93.31% and far exceeding the other methods; this metric reflects the model’s average performance across different categories, indicating that the proposed method performs well in all categories. The Kappa coefficient is 0.8810, markedly higher than that of the other methods; the Kappa coefficient measures the agreement between model predictions and random predictions, with higher values indicating more reliable model performance. The precision ranks first at 96.88%, followed by Gia-CFSL’s 94.89%. The F1-score, the harmonic mean of precision and recall, is 95.83%; a high F1-score indicates a good balance between precision and recall.
5.2. Ablation Experiments
To assess the contribution of each enhancement to the network’s performance, this study integrates various improvement methodologies into the baseline network and evaluates their respective contributions. The methodology proposed in this study has attained the highest precision across the three metrics of OA, AA, and Kappa, thereby demonstrating the efficacy of the three enhancement strategies.
Table 8 shows the evaluation index results of each improvement. The analysis shows that, compared with the baseline network, adding all of the improvements increases the OA of Dataset 2 by 3.28%, the AA by 6.61%, and the Kappa by 0.4057; for Dataset 3, the OA increases by 9.75%, the AA by 5.69%, and the Kappa by 0.3101. When only the coordinate attention module is added, compared with the baseline network, the OA of Dataset 2 improves by 1.64%, the AA by 4.53%, and the Kappa by 0.1389; the OA of Dataset 3 improves by 1.63%, the AA by 1.07%, and the Kappa by 0.0677. When only the multi-scale self-attention module is added, the OA of Dataset 2 improves by 2.02%, the AA by 5.11%, and the Kappa by 0.2999; the OA of Dataset 3 improves by 1.73%, the AA by 0.86%, and the Kappa by 0.1405. When only the category-perception distance loss function is added, the OA of Dataset 2 increases by 3%, the AA by 6%, and the Kappa by 0.2001; the OA of Dataset 3 increases by 6.69%, the AA by 2.36%, and the Kappa by 0.192. It can be seen that the method proposed in this paper significantly improves the accuracy of oil spill detection.
5.3. Feature Optimization Experiments
This section explores the influence of feature optimization on the efficacy of oil spill detection. Insufficient oil spill features fail to capture critical information from the imagery, limiting the network’s expressive capability. Conversely, an excessive number of features not only introduces data redundancy and increases model complexity but also degrades the accuracy of oil spill detection. Therefore, selecting an optimal feature subset to enhance oil spill classification accuracy is of paramount importance. Specifically, the source domain dataset is refined to 16 selected features from an initial set of 60, while the target domain Datasets 2 and 3 are optimized to 18 and 16 features, respectively, from an original pool of 30.
As shown in Figure 11 and Figure 12, both areas achieve the best detection results with the optimal feature set. Dataset 2’s OA, AA, and Kappa increased by 1.24%, 4.02%, and 0.0941, respectively, while Dataset 3’s OA, AA, and Kappa increased by 3.86%, 4.23%, and 0.1395, respectively.
6. Conclusions
This study proposes a cross-domain few-shot SAR oil spill detection network that integrates hybrid attention and category-perception learning, achieving high-precision oil spill detection with limited sample availability. The conclusions are as follows:
(1) A hybrid attention feature extraction block is proposed to thoroughly extract both global and local information from the images. This module incorporates a coordinate attention mechanism, which adeptly captures long-range dependencies along one spatial dimension while meticulously preserving spatial location information along the other. This sophisticated approach facilitates the effective integration of channel-wise and spatial features. Subsequently, the global self-attention module is designed to capture the global contextual relationships between pixels, thereby enabling a comprehensive representation of feature dependencies at a global scale. The multi-scale self-attention module utilizes windows with diverse partitioning strategies to concentrate on local information, thereby enabling the accurate extraction of fine-grained oil spill features, which significantly enhances detection accuracy.
(2) Furthermore, to address the challenge of distinguishing between oil spill and suspected oil spill samples due to their close proximity in the metric space, a category perception distance loss function is designed. This function is formulated to reduce the intra-class distance while increasing the inter-class distance in the feature space, thereby enhancing the network’s capability for oil spill detection.
(3) Experimental results demonstrate that the proposed method exhibits significant advantages over other algorithms in few-shot oil spill detection, achieving high detection accuracy. Moreover, the effectiveness of various improvements and feature selection strategies is validated through ablation experiments and feature selection analysis. The proposed network demonstrates outstanding oil spill detection capability and serves as an effective solution for few-shot oil spill detection.
(4) However, HA-CP-Net has disadvantages such as a long model training time. In future research, we will further explore how to obtain a more lightweight model, accelerating training while maintaining oil spill detection accuracy.