3.1. Environmental Configuration
To ensure fairness, reliability, and reproducibility, all experiments were executed in a standardized computational environment: Ubuntu 20.04 LTS, the PyTorch 1.10.0 deep learning framework (with Python 3.8 and CUDA 11.3), an Intel(R) Xeon(R) Platinum 8358P CPU (2.60 GHz), and an NVIDIA RTX 3090 GPU (24 GB memory) for accelerated model training and inference.
The experimental dataset comprises 2670 SAR oil-spill images. To preserve the distribution of oil spill categories and background contexts across data partitions, the dataset was divided into training (80%), validation (10%), and test (10%) subsets. This stratified split mitigates potential biases induced by uneven data distribution and supports generalizable performance evaluation.
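The stratified 80/10/10 split described above can be sketched in pure Python as follows. This is an illustrative sketch, not the authors' actual pipeline: the class labels, the three-category assumption, and the random seed are hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(samples, labels, ratios=(0.8, 0.1, 0.1), seed=42):
    """Split samples into train/val/test, keeping each class's
    proportion roughly equal across the three subsets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for s, y in zip(samples, labels):
        by_class[y].append(s)
    train, val, test = [], [], []
    for items in by_class.values():
        rng.shuffle(items)           # shuffle within each class
        n_train = int(len(items) * ratios[0])
        n_val = int(len(items) * ratios[1])
        train += items[:n_train]
        val += items[n_train:n_train + n_val]
        test += items[n_train + n_val:]
    return train, val, test

# Illustrative run on 2670 dummy image IDs with 3 hypothetical scene classes
ids = list(range(2670))
labels = [i % 3 for i in ids]
train, val, test = stratified_split(ids, labels)
print(len(train), len(val), len(test))  # → 2136 267 267
```

Splitting within each class first is what keeps the category and background distributions consistent across the three subsets.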
To eliminate confounding from training configurations, consistent hyperparameter settings were used across all comparative models: initial learning rate = 0.01, batch size = 32, total training epochs = 300, and the Stochastic Gradient Descent (SGD) optimizer with a weight decay coefficient of 0.0005. These parameters align with domain-specific best practices for object detection tasks, thereby establishing a rigorous and fair baseline for performance comparison. Four widely recognized, standardized evaluation metrics are used [29]: Precision ($P$), Recall ($R$), F1 score, and mean Average Precision ($mAP$). Precision is the proportion of true positives among all samples classified as positive. Recall is the proportion of actual positive samples correctly classified as positive. The F1 score is the harmonic mean of Precision and Recall. The mAP represents the mean detection accuracy over all categories and reflects the model's overall detection performance. $P$, $R$, F1, and $mAP$ are calculated as follows:

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}$$

$$F1 = \frac{2 \times P \times R}{P + R}, \qquad mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i$$

where $TP$ denotes the number of positive samples correctly predicted by the model, $FP$ the number of samples incorrectly predicted as positive, $FN$ the number of positive samples the model fails to detect, $AP_i$ the detection accuracy of category $i$, and $N$ the number of categories. We also measured the model's performance in terms of model size, parameter count, Frames Per Second (FPS), and Giga Floating-Point Operations (GFLOPs). Smaller model size, parameter count, and GFLOPs indicate a more lightweight model with lower computational complexity, power consumption, and hardware requirements. A higher FPS value indicates faster detection speed and better real-time performance.
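As a minimal illustration of these definitions, the four metrics can be computed from per-class detection counts. The class names and counts below are hypothetical, and the mAP here is a simplified stand-in: a real mAP integrates the precision-recall curve per class rather than averaging a single precision value.

```python
def detection_metrics(per_class):
    """per_class: dict mapping class name -> (TP, FP, FN) counts
    at a fixed IoU and confidence threshold.
    Returns micro-averaged P, R, F1 and a naive per-class AP mean
    (real mAP integrates the full precision-recall curve)."""
    TP = sum(c[0] for c in per_class.values())
    FP = sum(c[1] for c in per_class.values())
    FN = sum(c[2] for c in per_class.values())
    P = TP / (TP + FP)                # precision
    R = TP / (TP + FN)                # recall
    F1 = 2 * P * R / (P + R)          # harmonic mean of P and R
    ap = [c[0] / (c[0] + c[1]) for c in per_class.values()]
    mAP_like = sum(ap) / len(ap)      # mean over categories
    return P, R, F1, mAP_like

# Hypothetical counts for two classes
counts = {"oil_spill": (90, 5, 10), "look_alike": (80, 10, 5)}
P, R, F1, m = detection_metrics(counts)
print(round(P, 3), round(R, 3), round(F1, 3))  # → 0.919 0.919 0.919
```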
We conducted the following experiments to verify the LSFE-YOLO model’s superiority: comparing different lightweight networks based on YOLOv8s, comparative experiments between the SE attention mechanism and other spatial and channel attention mechanisms, ablation studies on the LSFE-YOLO model’s improvements, and comparative experiments before and after the upgrades. Additionally, the enhanced model is compared with current mainstream object detection algorithms.
3.3. Ablation Experiments
To evaluate the impact of each improvement step on detection performance, we conducted ablation experiments on the innovative modules.
Table 4 shows six schemes, S0 through SP, representing different combinations of five improvement strategies. S0 is the unmodified baseline, equivalent to the YOLOv8s model. S1 builds on S0 by incorporating the FasterNet structure into the backbone network; S2 further reduces the network width over S1; S3 introduces the lightweight GN-LSC detection head on top of S2; S4 integrates the Squeeze-and-Excitation (SE) attention mechanism into the backbone of S3; and SP replaces all original C2f modules in S4 with the newly proposed C2f_MBE modules, yielding the complete LSFE-YOLO model. Taking YOLOv8s as the baseline, we reconstructed the backbone based on the FasterNet network, adjusted the network width, and replaced the original detection head with the newly designed lightweight GN-LSC head. These improvements effectively reduce model complexity and computational load. Incorporating the FasterNet network and reducing the network width (S2) cuts parameters by 77.5%, model size by 76.5%, and GFLOPs by 75%, while increasing detection speed by 26%. Further introducing the GN-LSC head, the S3 model decreases parameters by 32%, model size by 32.1%, and GFLOPs by 25.4% relative to S2, with a slight additional gain in detection speed. The experimental results demonstrate that S3 has the fewest parameters, the smallest model size, and the fastest detection speed.
Although the above improvements make the model more lightweight, they reduce its detection accuracy. To recover accuracy, we introduce the SE module and replace the original C2f module in YOLOv8s with the C2f_MBE module. The efficient channel attention mechanism SE enables the network to focus on essential features and suppress unimportant ones; because the SE module is relatively lightweight, with few parameters, it maintains model size and computational load while raising the mAP from 94.7% (S3) to 95.3% (S4). Finally, replacing the C2f module with the C2f_MBE module yields the SP model, which achieves a further 1.1% increase in mAP and a 15.1% reduction in GFLOPs over S4, with a minimal sacrifice in parameters and FPS. This demonstrates that the SP model localizes oil spills more accurately, making sea surface oil spill detection more efficient.
3.5. Comparison of Different Advanced Detection Algorithms
We assess the LSFE-YOLO detection model’s effectiveness by comparing it to several of the most popular and advanced object detection methods of recent years, including Faster R-CNN, SSD, RT-DETR [
30], and the YOLO series. Meanwhile, to ensure the evaluation process’s integrity, we performed all experiments within a consistent experimental environment, using identical data partitioning, hyperparameter configurations, and training iterations, as shown in
Table 6. Given the intricate nature of the sea surface environment, characterized by factors such as low wind conditions, ocean currents, and the presence of biological oil films, the potential for misidentification of oil spills at sea may increase, resulting in a heightened rate of false detections. This phenomenon adversely affects the accuracy of oil spill detection efforts. As the accompanying table illustrates, the enhanced LSFE-YOLO model presented in this study achieved a Precision (P) index of 96.6%, representing a 3.5% improvement over the original YOLOv8s model. A higher Precision value correlates with a reduced rate of erroneous predictions, particularly concerning categories such as low wind, leading to a greater proportion of accurately identified positive samples. This indicates that the improved model exhibits a notable decrease in the false detection rate. Furthermore, when compared to other leading one-stage and two-stage algorithms, the model proposed in this study exhibits the highest Precision value, thereby demonstrating the effectiveness of the enhancements implemented.
As the table presents, the two-stage target detection model, Faster R-CNN, achieves a detection accuracy of only 89.8%, with an FPS rate of 30. This performance is inadequate for the necessary detection accuracy and, more importantly, does not satisfy the real-time requirements for detecting oil spills on the surface of the sea. Although the SSD and YOLOv3 models demonstrate improved detection accuracy compared to the Faster R-CNN model, the overall detection rate for oil spills remains suboptimal, and the model sizes are substantial. In contrast to the two-stage algorithm, the RT-DETR-L, YOLOv5m, YOLOv8m, YOLOv10m, and YOLOv8s-spd-eca-ad models exhibit enhanced detection outcomes. Nonetheless, these models impose a significant computational burden, and their detection speeds do not fulfill the criteria for real-time detection.
In comparison, the YOLOv5s, YOLOv8s, YOLOv10s, and YOLOv8-YP [
31] models exhibit improved detection speeds, reduced model sizes, and lower computational complexity, though their detection accuracy still leaves room for improvement. In contrast, our proposed LSFE-YOLO model performs best on both mean Average Precision (mAP) and F1, achieving 96.4% and 94%, respectively. It also leads on the four efficiency metrics: parameters, model size, GFLOPs, and FPS. Compared to the original YOLOv8s model, LSFE-YOLO reduces the model size to 4.1 MB, the computational cost to 4.5 GFLOPs, and the parameter count to 1.9 M, decreases of 81.9%, 84.2%, and 82.9%, respectively. Its detection speed reaches 116 frames per second, a 20.8% increase.
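The reduction figures quoted above follow directly from the Table 6 entries for YOLOv8s and LSFE-YOLO and can be checked with a few lines of arithmetic:

```python
# (baseline YOLOv8s, LSFE-YOLO) values taken from Table 6
metrics = {
    "model size (MB)": (22.6, 4.1),
    "GFLOPs":          (28.4, 4.5),
    "parameters (M)":  (11.1, 1.9),
    "FPS":             (96.0, 116.0),
}
for name, (base, ours) in metrics.items():
    change = (ours - base) / base * 100  # negative = reduction
    print(f"{name}: {change:+.1f}%")
# → model size (MB): -81.9%
# → GFLOPs: -84.2%
# → parameters (M): -82.9%
# → FPS: +20.8%
```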
Considering the three core dimensions of detection accuracy, model lightweighting, and inference efficiency, the proposed LSFE-YOLO model outperforms recently released YOLO foundation models such as YOLOv9s, YOLOv11n, and YOLOv11s. It achieves the highest Precision (P), mean Average Precision (mAP), and F1 score, while its Recall (R) is on par with the best. In Precision, LSFE-YOLO reaches 96.6%, which is 3.8 percentage points higher than YOLOv9s (92.8%), 2.0 points higher than YOLOv11n (94.6%), and 1.3 points higher than YOLOv11s (95.3%), giving it the lowest false detection risk. In mAP, LSFE-YOLO achieves 96.4%, 1.0 percentage point above YOLOv11s (95.4%), demonstrating the best overall detection ability for targets such as oil slicks. Its Recall of 91.5% is only 0.4 percentage points below YOLOv11s (91.9%) and 1.1 points above YOLOv9s (90.4%), showing no significant deficiency. In addition, LSFE-YOLO is far more lightweight than the three comparison models in model size, parameter count, and GFLOPs, making it better suited to edge devices in oil spill detection scenarios. While remaining lightweight, it achieves a detection speed of 116 frames per second, far exceeding the three comparison models and 63.4% faster than YOLOv9s (71 frames/s). In short, LSFE-YOLO combines an overall lead in detection accuracy with extreme model compactness and the highest inference efficiency; compared with YOLOv9s, YOLOv11n, and YOLOv11s, it better meets the core needs of marine oil pollution detection: high precision, low computing power demand, and real-time performance.
Table 6.
Comparison of Experimental Results of Different Advanced Detection Algorithms.
| Model | P% | R% | mAP% | F1% | Model Size/MB | Parameters/M | GFLOPs | FPS |
|---|---|---|---|---|---|---|---|---|
| Faster RCNN | 55.1 | 92.4 | 89.8 | 69.0 | 108.2 | 136.7 | 369.7 | 30.0 |
| SSD | 92.1 | 88.2 | 93.5 | 90.0 | 90.6 | 23.6 | 174.8 | 91.0 |
| RT-DETR-L | 92.8 | 90.5 | 94.5 | 91.6 | 59.1 | 28.4 | 100.6 | 48.0 |
| YOLOv3 | 94.6 | 85.1 | 93.8 | 89.6 | 207.8 | 103.6 | 282.2 | 49.0 |
| YOLOv5s | 94.7 | 91.8 | 94.4 | 93.2 | 18.6 | 9.1 | 23.8 | 103.0 |
| YOLOv5m | 93.4 | 91.0 | 95.4 | 92.2 | 50.5 | 25.1 | 64.2 | 80.0 |
| YOLOv8s | 93.1 | 91.5 | 95.1 | 92.3 | 22.6 | 11.1 | 28.4 | 96.0 |
| YOLOv8s-spd-eca-ad [25] | 94.3 | 90.3 | 95.4 | 92.3 | 21.2 | 10.3 | 52.1 | 76.0 |
| YOLOv8-YP [29] | 93.5 | 89.4 | 93.9 | 91.4 | 10.9 | 5.3 | 12.7 | 88.0 |
| YOLOv8m | 93.2 | 91.0 | 95.5 | 92.1 | 52.1 | 25.8 | 78.7 | 80.0 |
| YOLOv9s [32] | 92.8 | 90.4 | 94.9 | 91.6 | 14.6 | 7.1 | 26.7 | 71.0 |
| YOLOv10m [33] | 92.2 | 89.1 | 93.6 | 90.6 | 33.5 | 15.3 | 58.9 | 83.0 |
| YOLOv10s [34] | 90.0 | 89.1 | 93.7 | 89.5 | 16.6 | 7.2 | 24.1 | 104.0 |
| YOLOv11n | 94.6 | 90.4 | 94.8 | 92.5 | 5.2 | 2.5 | 6.3 | 86.0 |
| YOLOv11s [35] | 95.3 | 91.9 | 95.4 | 93.6 | 18.3 | 9.4 | 21.3 | 72.0 |
| LSFE-YOLO | 96.6 | 91.5 | 96.4 | 94.0 | 4.1 | 1.9 | 4.5 | 116.0 |
3.6. Detection Effect Display
To demonstrate the detection effect of the improved algorithm more intuitively, we compare results on four randomly selected images containing oil slicks. As
Figure 10 illustrates, we present the detection outcomes of LSFE-YOLO and the other comparative models.
The efficacy of oil slick detection varies among the models. Notably, only YOLOv5s, YOLOv8s-spd-eca-ad, and our proposed LSFE-YOLO detect the oil slicks in all four images without false positives; among these, our model exhibits superior detection accuracy and faster detection speed. Of the remaining comparison models, Faster R-CNN detects only two oil slicks, and SSD detects three but also produces false positives. RT-DETR-L, YOLOv8s, YOLOv8m, and YOLOv10s identify all four oil slicks, but their results show that detection efficacy in scenes with elevated background noise is suboptimal, with a prevalence of false positives. In conclusion, the LSFE-YOLO model exhibits enhanced detection precision while maintaining a commendable detection speed, enabling more efficient marine oil spill detection in complex aquatic environments.