3.1. Environmental Configuration
To ensure fairness, reliability, and reproducibility, all experiments were executed in a standardized computational environment: Ubuntu 20.04 LTS, the PyTorch 1.10.0 deep learning framework (with Python 3.8 and CUDA 11.3), an Intel(R) Xeon(R) Platinum 8358P CPU (2.60 GHz), and an NVIDIA RTX 3090 GPU (24 GB memory) for accelerated model training and inference.
The experimental dataset comprises 2670 SAR oil-spill images. To preserve the distribution of oil spill categories and background contexts across data partitions, the dataset was divided into training (80%), validation (10%), and test (10%) subsets. This stratified split mitigates potential biases induced by uneven data distribution and supports generalizable performance evaluation.
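The stratified 80/10/10 split described above can be sketched in pure Python as follows. This is an illustrative sketch, not the authors' actual pipeline: the class labels, the three-category assumption, and the random seed are hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(samples, labels, ratios=(0.8, 0.1, 0.1), seed=42):
    """Split samples into train/val/test, keeping each class's
    proportion roughly equal across the three subsets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for s, y in zip(samples, labels):
        by_class[y].append(s)
    train, val, test = [], [], []
    for items in by_class.values():
        rng.shuffle(items)           # shuffle within each class
        n_train = int(len(items) * ratios[0])
        n_val = int(len(items) * ratios[1])
        train += items[:n_train]
        val += items[n_train:n_train + n_val]
        test += items[n_train + n_val:]
    return train, val, test

# Illustrative run on 2670 dummy image IDs with 3 hypothetical scene classes
ids = list(range(2670))
labels = [i % 3 for i in ids]
train, val, test = stratified_split(ids, labels)
print(len(train), len(val), len(test))  # → 2136 267 267
```

Splitting within each class first is what keeps the category and background distributions consistent across the three subsets.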
To eliminate confounding from training configurations, consistent hyperparameter settings were used across all comparative models: initial learning rate = 0.01, batch size = 32, total training epochs = 300, and the Stochastic Gradient Descent (SGD) optimizer with a weight decay coefficient of 0.0005. These parameters align with domain-specific best practices for object detection tasks, thereby establishing a rigorous and fair baseline for performance comparison. Four widely recognized, standardized evaluation metrics are used [29]: Precision ($P$), Recall ($R$), F1 score, and mean Average Precision ($mAP$). Precision is the proportion of true positives among all samples classified as positive. Recall is the proportion of actual positive samples correctly classified as positive. The F1 score is the harmonic mean of Precision and Recall. The mAP represents the mean detection accuracy over all categories and reflects the model's overall detection performance. $P$, $R$, F1, and $mAP$ are calculated as follows:

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}$$

$$F1 = \frac{2 \times P \times R}{P + R}, \qquad mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i$$

where $TP$ denotes the number of positive samples correctly predicted by the model, $FP$ the number of samples incorrectly predicted as positive, $FN$ the number of positive samples the model fails to detect, $AP_i$ the detection accuracy of category $i$, and $N$ the number of categories. We also measured the model's performance in terms of model size, parameter count, Frames Per Second (FPS), and Giga Floating-Point Operations (GFLOPs). Smaller model size, parameter count, and GFLOPs indicate a more lightweight model with lower computational complexity, power consumption, and hardware requirements. A higher FPS value indicates faster detection speed and better real-time performance.
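As a minimal illustration of these definitions, the four metrics can be computed from per-class detection counts. The class names and counts below are hypothetical, and the mAP here is a simplified stand-in: a real mAP integrates the precision-recall curve per class rather than averaging a single precision value.

```python
def detection_metrics(per_class):
    """per_class: dict mapping class name -> (TP, FP, FN) counts
    at a fixed IoU and confidence threshold.
    Returns micro-averaged P, R, F1 and a naive per-class AP mean
    (real mAP integrates the full precision-recall curve)."""
    TP = sum(c[0] for c in per_class.values())
    FP = sum(c[1] for c in per_class.values())
    FN = sum(c[2] for c in per_class.values())
    P = TP / (TP + FP)                # precision
    R = TP / (TP + FN)                # recall
    F1 = 2 * P * R / (P + R)          # harmonic mean of P and R
    ap = [c[0] / (c[0] + c[1]) for c in per_class.values()]
    mAP_like = sum(ap) / len(ap)      # mean over categories
    return P, R, F1, mAP_like

# Hypothetical counts for two classes
counts = {"oil_spill": (90, 5, 10), "look_alike": (80, 10, 5)}
P, R, F1, m = detection_metrics(counts)
print(round(P, 3), round(R, 3), round(F1, 3))  # → 0.919 0.919 0.919
```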
We conducted the following experiments to verify the LSFE-YOLO model’s superiority: comparing different lightweight networks based on YOLOv8s, comparative experiments between the SE attention mechanism and other spatial and channel attention mechanisms, ablation studies on the LSFE-YOLO model’s improvements, and comparative experiments before and after the upgrades. Additionally, the enhanced model is compared with current mainstream object detection algorithms.
3.3. Ablation Experiments
To evaluate the impact of each improvement step on detection performance, we conducted ablation experiments on the innovative modules.
Table 4 shows six schemes, S0 through SP, representing different combinations of five improvement strategies. S0 is the unmodified baseline, equivalent to the YOLOv8s model. S1 builds on S0 by incorporating the FasterNet structure into the backbone network; S2 further reduces the network width over S1; S3 introduces the lightweight GN-LSC detection head on top of S2; S4 integrates the Squeeze-and-Excitation (SE) attention mechanism into the backbone of S3; and SP replaces all original C2f modules in S4 with the newly proposed C2f_MBE modules, yielding the complete LSFE-YOLO model. Taking YOLOv8s as the baseline, we reconstructed the backbone based on the FasterNet network, adjusted the network width, and replaced the original detection head with the newly designed lightweight GN-LSC head. These improvements effectively reduce model complexity and computational load. Incorporating the FasterNet network and reducing the network width (S2) cuts parameters by 77.5%, model size by 76.5%, and GFLOPs by 75%, while increasing detection speed by 26%. Further introducing the GN-LSC head, the S3 model decreases parameters by 32%, model size by 32.1%, and GFLOPs by 25.4% relative to S2, with a slight additional gain in detection speed. The experimental results demonstrate that S3 has the fewest parameters, the smallest model size, and the fastest detection speed.
Although the above improvements make the model more lightweight, they reduce its detection accuracy. To recover accuracy, we introduce the SE module and replace the original C2f module in YOLOv8s with the C2f_MBE module. The efficient channel attention mechanism SE enables the network to focus on essential features and suppress unimportant ones; because the SE module is relatively lightweight, with few parameters, it maintains model size and computational load while raising the mAP from 94.7% (S3) to 95.3% (S4). Finally, replacing the C2f module with the C2f_MBE module yields the SP model, which achieves a further 1.1% increase in mAP and a 15.1% reduction in GFLOPs over S4, with a minimal sacrifice in parameters and FPS. This demonstrates that the SP model localizes oil spills more accurately, making sea surface oil spill detection more efficient.
3.5. Comparison of Different Advanced Detection Algorithms
We assess the LSFE-YOLO detection model’s effectiveness by comparing it to several of the most popular and advanced object detection methods of recent years, including Faster R-CNN, SSD, RT-DETR [
30], and the YOLO series. Meanwhile, to ensure the evaluation process’s integrity, we performed all experiments within a consistent experimental environment, using identical data partitioning, hyperparameter configurations, and training iterations, as shown in
Table 6. Given the intricate nature of the sea surface environment, characterized by factors such as low wind conditions, ocean currents, and the presence of biological oil films, the potential for misidentification of oil spills at sea may increase, resulting in a heightened rate of false detections. This phenomenon adversely affects the accuracy of oil spill detection efforts. As the accompanying table illustrates, the enhanced LSFE-YOLO model presented in this study achieved a Precision (P) index of 96.6%, representing a 3.5% improvement over the original YOLOv8s model. A higher Precision value correlates with a reduced rate of erroneous predictions, particularly concerning categories such as low wind, leading to a greater proportion of accurately identified positive samples. This indicates that the improved model exhibits a notable decrease in the false detection rate. Furthermore, when compared to other leading one-stage and two-stage algorithms, the model proposed in this study exhibits the highest Precision value, thereby demonstrating the effectiveness of the enhancements implemented.
As the table presents, the two-stage target detection model, Faster R-CNN, achieves a detection accuracy of only 89.8%, with an FPS rate of 30. This performance is inadequate for the necessary detection accuracy and, more importantly, does not satisfy the real-time requirements for detecting oil spills on the surface of the sea. Although the SSD and YOLOv3 models demonstrate improved detection accuracy compared to the Faster R-CNN model, the overall detection rate for oil spills remains suboptimal, and the model sizes are substantial. In contrast to the two-stage algorithm, the RT-DETR-L, YOLOv5m, YOLOv8m, YOLOv10m, and YOLOv8s-spd-eca-ad models exhibit enhanced detection outcomes. Nonetheless, these models impose a significant computational burden, and their detection speeds do not fulfill the criteria for real-time detection.
In comparison, the YOLOv5s, YOLOv8s, YOLOv10s, and YOLOv8-YP [
31] models exhibit improved detection speeds, reduced model sizes, and lower computational complexity, though their detection accuracy still leaves room for improvement. In contrast, our proposed LSFE-YOLO model performs best on both mean Average Precision (mAP) and F1, achieving 96.4% and 94%, respectively. It also leads on the four efficiency metrics: parameters, model size, GFLOPs, and FPS. Compared to the original YOLOv8s model, LSFE-YOLO reduces the model size to 4.1 MB, the computational cost to 4.5 GFLOPs, and the parameter count to 1.9 M, decreases of 81.9%, 84.2%, and 82.9%, respectively. Its detection speed reaches 116 frames per second, a 20.8% increase.
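The reduction figures quoted above follow directly from the Table 6 entries for YOLOv8s and LSFE-YOLO and can be checked with a few lines of arithmetic:

```python
# (baseline YOLOv8s, LSFE-YOLO) values taken from Table 6
metrics = {
    "model size (MB)": (22.6, 4.1),
    "GFLOPs":          (28.4, 4.5),
    "parameters (M)":  (11.1, 1.9),
    "FPS":             (96.0, 116.0),
}
for name, (base, ours) in metrics.items():
    change = (ours - base) / base * 100  # negative = reduction
    print(f"{name}: {change:+.1f}%")
# → model size (MB): -81.9%
# → GFLOPs: -84.2%
# → parameters (M): -82.9%
# → FPS: +20.8%
```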
Considering the three core dimensions of detection accuracy, model lightweighting, and inference efficiency, the proposed LSFE-YOLO model outperforms recently released YOLO foundation models such as YOLOv9s, YOLOv11n, and YOLOv11s. It achieves the highest Precision (P), mean Average Precision (mAP), and F1 score, while its Recall (R) is on par with the best. In Precision, LSFE-YOLO reaches 96.6%, which is 3.8 percentage points higher than YOLOv9s (92.8%), 2.0 points higher than YOLOv11n (94.6%), and 1.3 points higher than YOLOv11s (95.3%), giving it the lowest false detection risk. In mAP, LSFE-YOLO achieves 96.4%, 1.0 percentage point above YOLOv11s (95.4%), demonstrating the best overall detection ability for targets such as oil slicks. Its Recall of 91.5% is only 0.4 percentage points below YOLOv11s (91.9%) and 1.1 points above YOLOv9s (90.4%), showing no significant deficiency. In addition, LSFE-YOLO is far more lightweight than the three comparison models in model size, parameter count, and GFLOPs, making it better suited to edge devices in oil spill detection scenarios. While remaining lightweight, it achieves a detection speed of 116 frames per second, far exceeding the three comparison models and 63.4% faster than YOLOv9s (71 frames/s). In short, LSFE-YOLO combines an overall lead in detection accuracy with extreme model compactness and the highest inference efficiency; compared with YOLOv9s, YOLOv11n, and YOLOv11s, it better meets the core needs of marine oil pollution detection: high precision, low computing power demand, and real-time performance.
Table 6.
Comparison of Experimental Results of Different Advanced Detection Algorithms.
| Model | P% | R% | mAP% | F1% | Model Size/MB | Parameters/M | GFLOPs | FPS |
|---|---|---|---|---|---|---|---|---|
| Faster RCNN | 55.1 | 92.4 | 89.8 | 69.0 | 108.2 | 136.7 | 369.7 | 30.0 |
| SSD | 92.1 | 88.2 | 93.5 | 90.0 | 90.6 | 23.6 | 174.8 | 91.0 |
| RT-DETR-L | 92.8 | 90.5 | 94.5 | 91.6 | 59.1 | 28.4 | 100.6 | 48.0 |
| YOLOv3 | 94.6 | 85.1 | 93.8 | 89.6 | 207.8 | 103.6 | 282.2 | 49.0 |
| YOLOv5s | 94.7 | 91.8 | 94.4 | 93.2 | 18.6 | 9.1 | 23.8 | 103.0 |
| YOLOv5m | 93.4 | 91.0 | 95.4 | 92.2 | 50.5 | 25.1 | 64.2 | 80.0 |
| YOLOv8s | 93.1 | 91.5 | 95.1 | 92.3 | 22.6 | 11.1 | 28.4 | 96.0 |
| YOLOv8s-spd-eca-ad [25] | 94.3 | 90.3 | 95.4 | 92.3 | 21.2 | 10.3 | 52.1 | 76.0 |
| YOLOv8-YP [29] | 93.5 | 89.4 | 93.9 | 91.4 | 10.9 | 5.3 | 12.7 | 88.0 |
| YOLOv8m | 93.2 | 91.0 | 95.5 | 92.1 | 52.1 | 25.8 | 78.7 | 80.0 |
| YOLOv9s [32] | 92.8 | 90.4 | 94.9 | 91.6 | 14.6 | 7.1 | 26.7 | 71.0 |
| YOLOv10m [33] | 92.2 | 89.1 | 93.6 | 90.6 | 33.5 | 15.3 | 58.9 | 83.0 |
| YOLOv10s [34] | 90.0 | 89.1 | 93.7 | 89.5 | 16.6 | 7.2 | 24.1 | 104.0 |
| YOLOv11n | 94.6 | 90.4 | 94.8 | 92.5 | 5.2 | 2.5 | 6.3 | 86.0 |
| YOLOv11s [35] | 95.3 | 91.9 | 95.4 | 93.6 | 18.3 | 9.4 | 21.3 | 72.0 |
| LSFE-YOLO | 96.6 | 91.5 | 96.4 | 94.0 | 4.1 | 1.9 | 4.5 | 116.0 |
3.6. Detection Effect Display
To demonstrate the detection effect of the improved algorithm more intuitively, we compare results on four randomly selected images containing oil slicks. As
Figure 10 illustrates, we present the detection outcomes of LSFE-YOLO and the other comparative models.
The efficacy of oil slick detection varies among the models. Notably, only YOLOv5s, YOLOv8s-spd-eca-ad, and our proposed LSFE-YOLO detect the oil slicks in all four images without false positives; among these, our model exhibits superior detection accuracy and faster detection speed. Of the remaining comparison models, Faster R-CNN detects only two oil slicks, and SSD detects three but also produces false positives. RT-DETR-L, YOLOv8s, YOLOv8m, and YOLOv10s identify all four oil slicks, but their results show that detection efficacy in scenes with elevated background noise is suboptimal, with a prevalence of false positives. In conclusion, the LSFE-YOLO model exhibits enhanced detection precision while maintaining a commendable detection speed, enabling more efficient marine oil spill detection in complex aquatic environments.